A GAN-CNN Fusion Framework for Deep Learning-Based DOA Estimation in Low-SNR Environments

Zhang, Zhenshan; Xu, Wenjie; Zou, Haitao; Yi, Shichao

doi:10.3390/s26051676


¹ School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang 212003, China

² School of Science, Jiangsu University of Science and Technology, Zhenjiang 212003, China

³ Zhenjiang Jizhi Ship Technology Co., Ltd., Zhenjiang 212003, China

* Author to whom correspondence should be addressed.

Sensors 2026, 26(5), 1676; https://doi.org/10.3390/s26051676

Submission received: 20 January 2026 / Revised: 19 February 2026 / Accepted: 2 March 2026 / Published: 6 March 2026

(This article belongs to the Section Remote Sensors)


Abstract

Direction of Arrival (DOA) estimation faces significant performance degradation under low Signal-to-Noise Ratio (SNR) conditions, where traditional algorithms and deep learning models struggle due to corrupted spatial information and limited training data. To address these challenges, this paper introduces a novel two-stage framework that integrates a Generative Adversarial Network (GAN) for signal enhancement with a complex-valued Convolutional Neural Network (CNN) for DOA estimation. The proposed GAN incorporates an attention mechanism and a dedicated phase-consistent loss function to suppress noise while preserving spatial phase information critical for accurate direction finding. Enhanced signals are transformed into covariance matrices and processed by a complex-valued CNN designed to extract robust spatial features. Extensive experiments demonstrate that the proposed method achieves a DOA accuracy of 72.2% and a Root Mean Square Error (RMSE) of 3.9° at −10 dB SNR with 500 snapshots, substantially outperforming conventional and deep learning baselines. The framework also shows strong robustness to limited data, maintaining 93.8% accuracy with only 50 snapshots. The framework offers a practical solution for reliable DOA estimation in low-SNR and data-scarce environments.

Keywords: DOA estimation; deep learning; generative adversarial network; array signal processing; data augmentation

1. Introduction

DOA estimation represents a fundamental problem in array signal processing with widespread applications in radar systems, wireless communications, sonar, and acoustic signal processing. Accurate DOA estimation is crucial for spatial filtering, beamforming, and source localization in various civilian and military systems [1,2,3].

Traditional DOA estimation algorithms can be broadly categorized into phase-comparison techniques, subspace-based methods, and maximum likelihood approaches. Phase-comparison methods estimate the incident angle by measuring inter-sensor phase differences with relatively low computational cost, which makes them attractive for real-time and hardware-efficient implementations; however, they typically require careful calibration and ambiguity resolution when long baselines are used [4,5,6,7,8,9,10].

Classical subspace-based techniques such as Multiple Signal Classification (MUSIC) and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) have demonstrated remarkable performance under ideal conditions. These methods exploit the eigenstructure of the covariance matrix to separate signal and noise subspaces. Nevertheless, their performance can become more sensitive to noise and sample limitations in low-SNR regimes or with a small number of snapshots, which may lead to reduced robustness in challenging environments [11,12,13]. Maximum likelihood methods, while statistically efficient, suffer from computational complexity that limits their practical implementation.

Recent advances in deep learning have inspired novel approaches to DOA estimation that circumvent limitations of traditional methods. CNNs have been successfully applied to learn spatial features directly from array data or covariance matrices, while Recurrent Neural Networks (RNNs) and their variants have been employed to capture temporal dependencies in signal sequences [14,15].

From a computational perspective, classical subspace-based methods typically involve covariance estimation and subspace separation, where eigendecomposition introduces non-negligible complexity, and MUSIC further requires a grid search over candidate angles, which increases runtime proportionally to the angular resolution. As a result, their time to first estimation (TTFE) is largely determined by the snapshot accumulation needed for a reliable covariance estimate as well as the subsequent subspace separation and spectral search.

In contrast, DNN-based techniques usually shift the main computational burden to the offline training stage. Once trained, inference can be executed via a single forward pass with fixed latency, making them attractive for real-time deployment, especially on GPUs. However, their TTFE still depends on the input representation: models that rely on covariance matrices require collecting sufficient snapshots before forming the sample covariance matrix $R$, whereas end-to-end networks operating on raw snapshots can potentially reduce TTFE by producing estimates with fewer samples or in a streaming manner. Despite these advantages, most deep learning-based DOA estimators remain vulnerable to performance degradation in noisy environments, as they primarily learn mapping functions without explicit denoising mechanisms. Moreover, existing CNN/RNN-based models often focus on feature extraction from raw or preprocessed data while remaining sensitive to noise-induced distortion of the phase information, which is indispensable for spatial parameter estimation [16,17].

In this work, the term real time refers to online operation with bounded and predictable latency, rather than implying a universal microsecond-level end-to-end response across all platforms. We distinguish (i) the algorithmic latency after the input representation is available and (ii) the TTFE, which additionally includes the data acquisition window. For covariance-based pipelines, TTFE is dominated by the snapshot accumulation required to form the sample covariance matrix. Let $K$ be the snapshot number and $f_{s}$ be the effective acquisition rate; the observation window is

$$T_{win} = \frac{K}{f_{s}}, \tag{1}$$

and the TTFE can be approximated as

$$T_{TTFE} \approx T_{win} + T_{alg}, \tag{2}$$

where $T_{alg}$ includes covariance construction and the subsequent inference/processing. Therefore, any absolute time value is scenario-dependent and should not be interpreted as a general definition of real-time DOA estimation. Notably, microsecond-level update rates reported by phase-interferometry/naive techniques are typically achieved via highly optimized hardware pipelines and per-snapshot processing, whereas covariance-based estimators trade longer observation windows for improved robustness under low SNR and limited data.
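As a rough illustration of Equations (1) and (2), the following sketch computes the observation window and the approximate TTFE. The function names and the numerical values (500 snapshots, a 1 MHz acquisition rate, 2 ms of processing) are hypothetical and are not taken from the paper's experiments.

```python
def observation_window(num_snapshots: int, sample_rate_hz: float) -> float:
    """Eq. (1): T_win = K / f_s, returned in seconds."""
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return num_snapshots / sample_rate_hz


def time_to_first_estimate(num_snapshots: int, sample_rate_hz: float,
                           algorithm_latency_s: float) -> float:
    """Eq. (2): T_TTFE ~= T_win + T_alg, returned in seconds."""
    return observation_window(num_snapshots, sample_rate_hz) + algorithm_latency_s


# Hypothetical operating point (assumption, not from the paper).
t_win = observation_window(500, 1.0e6)
t_ttfe = time_to_first_estimate(500, 1.0e6, 2.0e-3)
print(f"T_win = {t_win * 1e3:.3f} ms, T_TTFE = {t_ttfe * 1e3:.3f} ms")
```

With these assumed numbers the acquisition window contributes 0.5 ms of the 2.5 ms total, which shows why an absolute latency figure is only meaningful together with the snapshot count and acquisition rate.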

The integration of signal enhancement techniques with DOA estimation presents a promising direction to address noise robustness. Generative Adversarial Networks (GANs) have emerged as powerful tools for signal enhancement and denoising applications. In audio and speech processing, GAN-based enhancement has demonstrated remarkable capability in preserving signal integrity while suppressing noise. However, the application of GANs to array signal enhancement for DOA estimation has not been fully exploited, particularly concerning the preservation of spatial phase information critical for accurate direction finding [18].

The development of integrated enhancement-estimation frameworks presents several fundamental challenges that require careful consideration [19]. The primary challenge involves maintaining phase coherence during the denoising process to preserve spatial information, which is particularly crucial for DOA estimation as it fundamentally relies on precise phase relationships across array elements. Any phase distortion introduced during signal enhancement can severely degrade angle estimation accuracy and compromise the overall system performance. Another significant challenge lies in ensuring robust generalization capabilities across diverse noise types and varying SNR conditions, as practical operational environments often exhibit characteristics that may differ substantially from the training data distribution. This necessitates the development of algorithms that can adapt to unseen noise patterns and maintain consistent performance across a wide spectrum of signal quality conditions [20,21,22].

Furthermore, achieving an optimal balance between enhancement quality and computational efficiency represents a critical challenge, especially for real-time applications where processing latency and resource constraints impose practical limitations [23]. The enhancement process must deliver substantial noise suppression while maintaining computational tractability to enable deployment in resource-constrained environments. Additionally, the effective integration of enhancement and estimation modules within a unified framework poses substantial design challenges, requiring careful coordination between the two stages to ensure they work synergistically rather than as independent components. This integration must facilitate seamless information flow between modules while maintaining overall architectural coherence and optimization compatibility. These interconnected challenges collectively define the core technical obstacles that must be systematically addressed to realize effective integrated frameworks for DOA estimation in practical scenarios [24,25].

To address these challenges, this paper makes several key contributions that advance the state of the art in DOA estimation for low-SNR environments. We propose a novel two-stage framework that systematically combines GAN-based signal enhancement with CNN-based DOA estimation, creating an integrated architecture specifically engineered to operate effectively in challenging signal-to-noise ratio conditions. This comprehensive approach addresses the fundamental limitation of existing methods that struggle with performance degradation in noisy environments by incorporating a dedicated signal enhancement stage prior to the estimation process [26,27,28,29].

We develop an enhanced GAN architecture that incorporates sophisticated attention mechanisms and phase-consistent loss functions to preserve crucial spatial characteristics during the denoising process. The attention mechanism enables selective focus on temporally significant signal components, while the phase-consistent loss function ensures the preservation of phase information that is critical for accurate spatial processing and DOA estimation. This represents a significant improvement over conventional denoising approaches that often compromise phase integrity in pursuit of noise reduction.

Furthermore, we design a specialized complex-valued CNN architecture capable of effectively processing enhanced covariance matrices for accurate DOA classification. This network leverages the complex nature of array signals through dedicated processing pathways that maintain the rich information content in both real and imaginary components, enabling more effective feature extraction from the spatial covariance matrices that form the foundation of DOA estimation.

The remainder of this paper is structured as follows. Section 2 provides a comprehensive review of traditional DOA estimation methods and recent advances in deep learning-based approaches, along with an introduction to Generative Adversarial Networks. Section 3 details the proposed two-stage framework, including the design of the GAN-based signal enhancement module and the complex-valued CNN architecture for DOA estimation. Section 4 presents extensive experimental results and in-depth performance analysis under various conditions. Finally, Section 5 concludes the paper and suggests directions for future research.

2. Related Work

2.1. Traditional DOA Estimation Methods

Traditional DOA estimation methods can be broadly grouped into beamforming-based techniques, subspace-based methods, and sparse representation approaches [30,31]. Beamforming methods estimate DOA by scanning candidate angles and evaluating an output power criterion, offering relatively low implementation complexity but limited resolution under closely spaced sources or strong interference [32]. Subspace-based algorithms such as MUSIC and ESPRIT achieve super-resolution by exploiting the eigen-structure of the spatial covariance matrix; however, they are known to be sensitive to practical factors such as low SNR, limited snapshots, model mismatch, and source coherence, which can reduce robustness in challenging environments [33,34]. Sparse methods cast DOA estimation as a sparse recovery problem and can be effective with limited snapshots, but their performance depends on dictionary resolution and regularization choices [35].

Since the above principles are well established, we briefly emphasize a recent trend that is highly relevant to practical deployment: efficient real-time and hardware-oriented implementations of classical DOA/beamforming techniques. Recent works report FPGA/parallel architectures for MUSIC-like processing to reduce latency and resource usage, including dedicated acceleration of key blocks and high-level-synthesis (HLS) implementations. Processor-oriented ESPRIT implementations have also been proposed to enable scalable and efficient DOA estimation on digital hardware [36]. For beamforming, practical system-level implementations of adaptive methods on FPGA-based digital beamforming receivers have been investigated to meet real-time constraints [37]. In addition, phase-comparison AoA estimation has attracted renewed interest due to its hardware friendliness and reconfigurable full-digital architectures [38].

Despite these advances in efficient implementations, traditional pipelines remain fundamentally limited by their sensitivity in low-SNR and data-scarce regimes. This limitation motivates hybrid data-driven frameworks that explicitly enhance the received signals prior to DOA estimation, which is the enhancement–estimation paradigm adopted in this work.

2.2. Application of Deep Learning in DOA Estimation

The application of deep learning to DOA estimation represents a paradigm shift, leveraging neural networks to learn direct mappings from array data to source directions and thereby circumventing the stringent statistical assumptions of classical methods. Deep learning-based DOA estimators can be categorized by network architecture, input representation, and learning strategy, each suited to specific operational scenarios [39,40,41]. Among these, CNNs have been widely adopted due to their ability to extract spatial features from structured representations of array data. A common approach treats the spatial covariance matrix as a two-dimensional image, where the fundamental mapping learned by the network can be expressed as

$$\hat{\theta} = f_{CNN}(R), \quad R = \frac{1}{K} \sum_{k=1}^{K} x(k)\, x^{H}(k) \in \mathbb{C}^{N \times N} \tag{3}$$

with $x(k)$ being the array snapshot at time $k$ and $N$ the number of sensors. To preserve the phase information critical for DOA, complex-valued CNNs have been developed that employ complex convolution operations. The output feature map can be written as

$$Y = \sigma(W * X + b) \tag{4}$$

where $*$ denotes complex convolution, and $W$ and $b$ are complex-valued convolution kernels and bias terms, respectively. Here, $\sigma(\cdot)$ denotes a nonlinear activation function. In this work, we adopt a split activation applied independently to the real and imaginary parts, i.e.,

$$\sigma(Z) = \phi(\Re\{Z\}) + j\,\phi(\Im\{Z\}) \tag{5}$$

where $\phi(\cdot)$ is a standard real-valued nonlinearity (e.g., ReLU), $\Re\{\cdot\}$ and $\Im\{\cdot\}$ denote the real and imaginary components, and $j = \sqrt{-1}$.
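The sample covariance in Equation (3) and the split activation in Equation (5) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the array size, snapshot count, and random test data are assumptions, and the snapshots are stored column-wise in an N x K matrix.

```python
import numpy as np


def sample_covariance(snapshots: np.ndarray) -> np.ndarray:
    """Eq. (3): R = (1/K) * sum_k x(k) x(k)^H for snapshots of shape (N, K)."""
    num_snapshots = snapshots.shape[1]
    return snapshots @ snapshots.conj().T / num_snapshots


def split_relu(z: np.ndarray) -> np.ndarray:
    """Eq. (5) with phi = ReLU, applied separately to real and imaginary parts."""
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)


# Hypothetical sizes: N = 8 sensors, K = 500 snapshots of unit-power complex noise.
rng = np.random.default_rng(0)
N, K = 8, 500
X = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
R = sample_covariance(X)

print(R.shape)                                   # N x N complex matrix
print(split_relu(np.array([1 - 2j, -3 + 4j])))   # negative parts are zeroed
```

The resulting matrix is Hermitian and positive semidefinite by construction, which is the structure that subspace methods and covariance-input CNNs both rely on.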

For scenarios involving moving sources or sequential snapshots, recurrent architectures such as recurrent neural networks (RNNs) and Long Short-Term Memory (LSTM) networks are employed to capture temporal dependencies [42]. The state update and DOA prediction at time t are formulated as

$$h_{t} = \mathrm{LSTM}(x_{t}, h_{t-1}), \quad \hat{\theta}_{t} = W_{o} h_{t} + b_{o} \tag{6}$$

where $h_{t}$ is the hidden state. Bidirectional RNNs further incorporate future context to improve accuracy, especially in offline batch processing [43]. A more recent trend emphasizes end-to-end learning frameworks that bypass explicit covariance computation altogether, mapping raw array data directly to DOA estimates:

$$\hat{\theta} = f_{CNN/RNN}(X_{raw}), \quad X_{raw} \in \mathbb{C}^{N \times T} \tag{7}$$

While eliminating the need for statistical preprocessing, such models demand large datasets and careful architectural design to implicitly learn spatial correlations.

To address the pervasive challenge of data scarcity, transfer learning techniques have been explored to adapt models trained on simulated data to real-world conditions [44]. This is typically achieved through composite loss functions that incorporate domain adaptation

$$L_{total} = L_{DOA} + \lambda \cdot L_{domain}(D_{sim}, D_{real}) \tag{8}$$

Despite mitigating the simulation-to-reality gap, these approaches still struggle under low-SNR conditions or when confronted with unseen array geometries.

Notwithstanding significant progress, deep learning-based DOA methods continue to face several unresolved challenges, including pronounced sensitivity to noise, limited generalization across different array configurations, high computational costs for training complex networks, and a general lack of interpretability inherent to black-box models [45,46]. These limitations underscore the necessity for hybrid approaches that synergistically integrate established signal processing principles with the representational power of deep learning. The proposed GAN-CNN framework directly responds to this need by introducing a dedicated signal enhancement stage prior to estimation, thereby addressing the critical issue of noise robustness and paving the way for more reliable DOA systems in practical low-SNR environments.

2.3. GAN Model

Generative Adversarial Networks have been widely adopted in signal processing due to their adversarial learning mechanism [47,48,49]. A GAN consists of a generator (G) and a discriminator (D) trained in a minimax game:

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_{z}}[\log(1 - D(G(z)))] \tag{9}$$

where $x$ denotes real/true samples drawn from the data distribution, and $z$ is the latent input. In this work, each sample corresponds to complex-valued baseband array snapshots. To enable standard deep-learning operations, we represent complex snapshots by two channels, i.e., $x \in \mathbb{R}^{2 \times L}$, where $L$ is the temporal length.
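The two-channel representation can be illustrated with a short NumPy sketch. It assumes a single complex sequence of length L and a channel order of real part first, imaginary part second; the helper names are hypothetical, and how the authors arrange multiple sensors is not specified here.

```python
import numpy as np


def complex_to_channels(x: np.ndarray) -> np.ndarray:
    """Map a complex sequence of shape (L,) to a real array of shape (2, L)."""
    return np.stack([x.real, x.imag], axis=0)


def channels_to_complex(x2: np.ndarray) -> np.ndarray:
    """Inverse mapping: rows 0 and 1 are the real and imaginary parts."""
    return x2[0] + 1j * x2[1]


snapshots = np.array([1 + 2j, -0.5 + 0.25j, 3j])
two_channel = complex_to_channels(snapshots)
print(two_channel.shape)
```

The mapping is lossless, so phase can always be recovered from the two channels with a four-quadrant arctangent.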

In signal enhancement settings, conditional GANs are particularly suitable because the generator learns a supervised mapping from noisy observations to clean targets:

$$G : X_{noisy} \to X_{clean} \tag{10}$$

Here, $X_{noisy}$ and $X_{clean}$ are complex array-snapshot sequences represented by their real/imaginary channels. From a physical perspective, $G(\cdot)$ aims to suppress additive noise while preserving inter-sensor phase relations that encode the DOA information. In our implementation, 1D convolutions are applied along the temporal dimension to capture local time structures in the received signals, while the discriminator employs adaptive pooling to aggregate temporal evidence into a compact decision statistic for distinguishing enhanced signals from clean references [50,51,52,53].

Regarding array calibration, the training data are assumed to be obtained from a calibrated array model. To mitigate potential residual phase offsets after enhancement, a phase alignment step is applied before covariance construction in the preprocessing pipeline. Incorporating explicit sensor gain mismatch modeling via data augmentation or calibration-aware regularization is an important extension and will be considered in future work.

Our implementation builds upon the conditional GAN framework with key enhancements specifically designed for DOA signal processing requirements. The architecture incorporates multi-scale feature extraction through hierarchical downsampling blocks to capture both local and global signal characteristics. An integrated attention mechanism selectively emphasizes temporally significant components, while a symmetric decoder with skip connections ensures the preservation of critical phase information and temporal resolution.

The discriminator employs a computationally efficient design utilizing strided convolutions and adaptive pooling, maintaining strong discriminative capability while minimizing computational overhead. The training process incorporates a composite loss function

$$L = L_{adv} + \lambda_{1} L_{L1} + \lambda_{2} L_{phase}$$

that combines adversarial training with magnitude reconstruction and phase consistency constraints. Notably, the phase consistency loss

$$L_{phase} = \mathbb{E}\left[\left\| \angle G(x_{noisy}) - \angle x_{clean} \right\|_{1}\right]$$

explicitly addresses the preservation of spatial phase relationships essential for accurate DOA estimation.

Our implementation employs a two-phase training approach. First, generator pre-training uses only reconstruction losses ($L_{L1} + L_{phase}$) to provide stable initialization. Second, adversarial fine-tuning involves joint optimization of generator and discriminator with the complete loss function. This approach, combined with dynamic weight adjustment and gradient clipping, promotes stable training convergence and mitigates the mode collapse that is common in GAN training.

The proposed GAN framework demonstrates significant advantages for DOA signal enhancement by simultaneously addressing magnitude reconstruction and phase preservation, while maintaining computational efficiency suitable for real-time applications.

3. The Proposed Method

In this section, we present our proposed approach for DOA estimation in low-SNR environments using a novel two-stage deep learning framework. Traditional DOA estimation methods typically suffer from significant performance degradation under low-SNR conditions due to the corruption of signal subspace information in covariance matrices. This limitation affects both classical subspace-based algorithms and modern learning-based approaches, as noise contamination distorts the essential spatial features required for accurate direction finding. To address this fundamental challenge, we introduce an integrated framework that combines signal enhancement with deep learning-based DOA estimation. Our methodology employs an enhanced GAN architecture specifically designed for array signal denoising in the first stage, followed by a complex-valued CNN for robust DOA estimation in the second stage. The GAN-based enhancement module learns to recover clean signal characteristics from noisy inputs while preserving crucial phase information essential for spatial processing. The enhanced signals are then processed through a specialized CNN architecture that extracts discriminative features from covariance matrices for accurate angle estimation. By leveraging this sequential enhancement-estimation paradigm, our framework demonstrates superior robustness in challenging low-SNR scenarios while maintaining the computational efficiency required for practical implementations.

3.1. GAN Design for Signal Enhancement

3.1.1. Network Architecture

The generator G employs an encoder–decoder architecture with attention mechanisms for processing complex-valued array signals. The input consists of noisy signals $X_{noisy} \in \mathbb{R}^{2 \times L}$, where two channels represent real and imaginary components, and $L$ is the sequence length. This representation preserves complex signal information while maintaining compatibility with standard deep learning frameworks.

The encoder pathway, shown in Figure 1, extracts multi-scale features through three downsampling blocks. The transformation can be expressed as:

$$F_{1} = \mathrm{LeakyReLU}(\mathrm{Conv1D}_{2 \to 64}(X_{noisy})) \tag{11}$$

$$F_{2} = \mathrm{IN}(\mathrm{LeakyReLU}(\mathrm{Conv1D}_{64 \to 128}(F_{1}))) \tag{12}$$

$$F_{3} = \mathrm{IN}(\mathrm{LeakyReLU}(\mathrm{Conv1D}_{128 \to 256}(F_{2}))) \tag{13}$$

where IN denotes instance normalization and strided convolutions reduce temporal resolution by half at each stage. An attention mechanism selectively emphasizes temporally significant components:

$$F_{att} = F_{3} \otimes \sigma(\mathrm{Conv1D}_{256 \to 1}(F_{3})) \tag{14}$$

where $\sigma$ is the sigmoid function and $\otimes$ denotes element-wise multiplication. This enables focus on DOA-critical signal segments while suppressing noisy regions. The decoder performs symmetric upsampling with skip connections:

$$D_{1} = \mathrm{IN}(\mathrm{ReLU}(\mathrm{ConvTranspose1D}_{256 \to 128}(F_{att}))) \tag{15}$$

$$D_{2} = \mathrm{IN}(\mathrm{ReLU}(\mathrm{ConvTranspose1D}_{128 \to 64}(D_{1} + F_{2}))) \tag{16}$$

$$X_{enhanced} = \tanh(\mathrm{Conv1D}_{64 \to 2}(D_{2} + F_{1})) \tag{17}$$

To make the mathematical description directly traceable to the generator blocks in Figure 1, we map each equation to its corresponding module: Equations (11)–(13) implement the three encoder/downsampling blocks (Enc-1/Enc-2/Enc-3) producing $F_{1}$, $F_{2}$, and $F_{3}$; Equation (14) corresponds to the attention gate applied to $F_{3}$ to obtain the reweighted feature $F_{att}$; Equations (15)–(17) implement the symmetric decoder/upsampling blocks (Dec-1/Dec-2) and the output layer, producing $D_{1}$, $D_{2}$, and the enhanced complex-valued signal $X_{enhanced}$.

The above operations follow a standard multi-scale denoising principle implemented by an encoder–decoder architecture with skip connections (U-Net style). The encoder uses strided 1D convolutions to progressively enlarge the receptive field and to capture both local waveform patterns and long-range temporal context, which is beneficial for modeling noise statistics in low-SNR conditions. The decoder reconstructs the enhanced signal from the compressed representation while preserving temporal resolution.

Skip connections (e.g., $D_{1} + F_{2}$ and $D_{2} + F_{1}$ in Equations (16) and (17)) are introduced to retain fine-grained details that may be lost during downsampling and to stabilize optimization by providing short paths for gradient propagation. This is particularly important in DOA enhancement, where subtle phase-related structures must be preserved for subsequent spatial processing.

The attention gate in Equation (14) performs data-adaptive feature reweighting. Under low SNR, not all temporal regions contribute equally to reliable spatial cues; therefore, the learned sigmoid mask acts as a soft selection mechanism that emphasizes DOA-informative segments while suppressing heavily corrupted regions, improving robustness without relying on handcrafted heuristics.
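The gating operation in Equation (14) can be sketched in NumPy as below. The sketch assumes that the 256-to-1 convolution has kernel size 1, so that it reduces to a weighted sum over channels at each time step; the actual kernel size is not stated above, and the weights here are random placeholders rather than trained values.

```python
import numpy as np


def sigmoid(a: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-a))


def attention_gate(features: np.ndarray, weights: np.ndarray, bias: float = 0.0):
    """Eq. (14) under a kernel-size-1 assumption.

    features: (C, T) feature map, weights: (C,) channel-mixing weights.
    Returns the reweighted features (C, T) and the temporal mask (T,).
    """
    logits = weights @ features + bias        # one score per time step
    mask = sigmoid(logits)                    # soft selection in (0, 1)
    return features * mask[np.newaxis, :], mask


# Placeholder feature map and weights (assumptions for illustration only).
rng = np.random.default_rng(1)
F3 = rng.standard_normal((256, 16))
w = 0.05 * rng.standard_normal(256)
F_att, mask = attention_gate(F3, w)
print(F_att.shape, mask.shape)
```

Because the mask has a single channel, it is broadcast across all 256 feature channels, so each time step is scaled by one shared factor.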

The discriminator D employs a lightweight architecture for efficient adversarial training:

$$D(X) = \sigma(\mathrm{FC}_{128 \to 1}(\mathrm{GAP}(f_{conv}(X)))) \tag{18}$$

where GAP denotes global average pooling and $f_{conv}(\cdot)$ represents strided convolutional layers.

3.1.2. Loss Functions

The training employs a composite loss function designed to balance multiple enhancement objectives:

$$L_{G} = L_{adv} + \lambda_{1} L_{L1} + \lambda_{2} L_{phase} \tag{19}$$

where $\lambda_{1} = 10$ and $\lambda_{2} = 2.0$ are empirically determined weighting coefficients. The adversarial loss follows the standard GAN objective:

$$L_{adv} = \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_{z}}[\log(1 - D(G(z)))] \tag{20}$$

This establishes a minimax game between generator G and discriminator D, ensuring enhanced signals follow the statistical distribution of clean references. The reconstruction loss employs L1 distance for signal fidelity:

$$L_{L1} = \left\| G(X_{noisy}) - X_{clean} \right\|_{1} \tag{21}$$

The L1 norm is chosen over L2 for its superior preservation of sharp signal transitions and reduced smoothing effects, crucial for maintaining array signal characteristics. The phase consistency loss preserves spatial phase information essential for DOA estimation:

$$L_{phase} = \frac{1}{L} \sum_{i=1}^{L} \left| \phi_{enhanced}^{(i)} - \phi_{clean}^{(i)} \right| \tag{22}$$

where $\phi_{enhanced}^{(i)} = \operatorname{atan2}(G_{imag}^{(i)}, G_{real}^{(i)})$ and $\phi_{clean}^{(i)} = \operatorname{atan2}(X_{clean,imag}^{(i)}, X_{clean,real}^{(i)})$ are phase angles computed using the four-quadrant arctangent function.
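A minimal NumPy sketch of the reconstruction terms in Equations (21) and (22) follows. It assumes signals stored as two-channel arrays of shape (2, L) and uses a mean-normalized L1 term, as is common in deep learning frameworks; Equation (21) as written does not specify the normalization, so that choice is an assumption. The test signal is synthetic.

```python
import numpy as np


def l1_loss(enhanced: np.ndarray, clean: np.ndarray) -> float:
    """Mean absolute error over both channels (normalized variant of Eq. (21))."""
    return float(np.mean(np.abs(enhanced - clean)))


def phase_loss(enhanced: np.ndarray, clean: np.ndarray) -> float:
    """Eq. (22): mean absolute phase difference; inputs have shape (2, L)."""
    phi_enhanced = np.arctan2(enhanced[1], enhanced[0])
    phi_clean = np.arctan2(clean[1], clean[0])
    return float(np.mean(np.abs(phi_enhanced - phi_clean)))


def reconstruction_loss(enhanced, clean, lambda_1=10.0, lambda_2=2.0) -> float:
    """Weighted sum lambda_1 * L_L1 + lambda_2 * L_phase with the stated weights."""
    return lambda_1 * l1_loss(enhanced, clean) + lambda_2 * phase_loss(enhanced, clean)


# Synthetic unit-modulus signal and a copy rotated by 0.1 rad (illustration only).
phases = np.linspace(-1.0, 1.0, 64)
clean = np.stack([np.cos(phases), np.sin(phases)])
enhanced = np.stack([np.cos(phases + 0.1), np.sin(phases + 0.1)])
print(phase_loss(enhanced, clean), reconstruction_loss(enhanced, clean))
```

Note that Equation (22) as written compares raw arctangent outputs, so two phases on opposite sides of the ±π boundary would be penalized by nearly 2π even when they are close on the unit circle. Whether the authors wrap the difference is not stated; the sketch follows the equation literally.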

In array DOA estimation, the directional information is primarily encoded in the relative phase between sensor channels. For a narrowband far-field source, the complex baseband snapshot at sensor $m$ can be written as $x_{m}(t) = s(t) \exp\{-j \omega_{0} \tau_{m}(\theta)\} + n_{m}(t)$, where $\tau_{m}(\theta)$ is the geometry-dependent propagation delay and $n_{m}(t)$ is noise. Hence, the inter-sensor phase differences $\angle x_{m}(t) - \angle x_{m'}(t)$ carry the DOA-dependent delay information and determine the structure of the spatial covariance matrix used by subspace and learning-based estimators. Additive noise perturbs these inter-channel phase relations and leads to covariance distortion, which is particularly severe at low SNR.
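To make the link between inter-sensor phase and DOA concrete, the sketch below evaluates the noise-free signal model for a uniform linear array with half-wavelength spacing, for which $\omega_{0} \tau_{m}(\theta) = \pi m \sin\theta$. The array geometry, the 20° source angle, and the source symbol are assumptions chosen for illustration; the text above does not fix a geometry.

```python
import numpy as np


def steering_phases(num_sensors: int, theta_rad: float,
                    spacing_wavelengths: float = 0.5) -> np.ndarray:
    """omega_0 * tau_m(theta) for an assumed uniform linear array."""
    m = np.arange(num_sensors)
    return 2.0 * np.pi * spacing_wavelengths * m * np.sin(theta_rad)


def noise_free_snapshot(symbol: complex, num_sensors: int, theta_rad: float) -> np.ndarray:
    """x_m = s * exp(-j * omega_0 * tau_m(theta)) with the noise term omitted."""
    return symbol * np.exp(-1j * steering_phases(num_sensors, theta_rad))


theta = np.deg2rad(20.0)
x = noise_free_snapshot(symbol=0.8 * np.exp(1j * 0.3), num_sensors=4, theta_rad=theta)

# Wrapped phase difference between adjacent sensors: angle(x_1) - angle(x_0).
phase_step = np.angle(x[1] * np.conj(x[0]))
theta_hat = np.arcsin(-phase_step / np.pi)
print(np.rad2deg(theta_hat))
```

In this noise-free case the adjacent-sensor phase step alone recovers the angle, which is why any phase distortion introduced by noise or by the enhancement stage translates directly into angle error.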

The phase-consistency loss $L_{phase}$ explicitly penalizes phase mismatches between the enhanced and clean references, encouraging the generator to preserve DOA-informative inter-channel phase relations during denoising. The weight $\lambda_{2}$ controls the trade-off between amplitude fidelity and phase fidelity; in our implementation, $\lambda_{2} = 2.0$ was selected empirically on the validation set to improve DOA performance without overly constraining the generator.

The weighting strategy assigns higher importance to reconstruction ($\lambda_{1} = 10$) to ensure signal fidelity, while maintaining sufficient emphasis on phase preservation ($\lambda_{2} = 2.0$) for DOA-critical spatial information. This formulation enables simultaneous noise suppression, signal reconstruction, and phase preservation.

3.1.3. Training Strategy

The training employs a two-phase procedure for stable GAN convergence. Phase 1 pre-trains the generator using only reconstruction losses:

$$L_{pre} = \lambda_{1} L_{L1} + \lambda_{2} L_{phase} \tag{23}$$

for a predetermined number of epochs, establishing baseline signal enhancement without adversarial complexity.

Phase 2, depicted in Figure 2, initiates adversarial training with alternating optimization. The discriminator updates every batch:

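$$\theta_{D} \leftarrow \theta_{D} - \eta \nabla_{\theta_{D}} L_{adv}^{D} \tag{24}$$

with $L_{adv}^{D} = -\mathbb{E}[\log D(X_{clean})] - \mathbb{E}[\log(1 - D(G(X_{noisy})))]$. The generator updates every second batch:

$$\theta_{G} \leftarrow \theta_{G} - \eta \nabla_{\theta_{G}} L_{G} \tag{25}$$

where $L_{G} = L_{adv}^{G} + L_{rec}$ and $L_{adv}^{G} = -\mathbb{E}[\log D(G(X_{noisy}))]$. Here, $L_{rec} = \lambda_{1} L_{L1} + \lambda_{2} L_{phase}$ is the weighted reconstruction term, which has the same form as Equation (23).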

A particularly important aspect of our training strategy is the implementation of dynamic weight adjustment for the loss coefficients throughout the adversarial training phase. The reconstruction loss weight $\lambda_{1}$ undergoes gradual attenuation according to a predefined schedule, effectively transitioning the optimization focus from reconstruction-dominated to adversarial-dominated objectives as training progresses. This adaptive weighting mechanism acknowledges that initial training stages benefit from strong reconstruction guidance to establish basic signal enhancement capabilities, while later stages require increased emphasis on adversarial training to refine the perceptual quality and statistical properties of the generated signals. The phase consistency weight $\lambda_{2}$ maintains a constant value throughout training, reflecting the critical importance of phase preservation for DOA estimation applications regardless of the training stage. The dynamic weight adjustment is mathematically expressed as:

$$\lambda_{1}^{(e)} = \lambda_{1}^{(0)} \cdot \gamma^{\lfloor e / T_{\lambda} \rfloor} \tag{26}$$

with a decay factor $\gamma$ and decay period $T_{\lambda}$, where $e$ represents the current epoch. This formulation ensures a gradual, stepwise transition from reconstruction-focused to adversarial-focused optimization, while $\lambda_{2}$ remains constant for continuous phase preservation. Stabilization techniques include gradient clipping:

g \leftarrow \frac{g}{max (1, ∥ g ∥_{2} / c)}

(27)

with clipping threshold c, and learning rate scheduling:

η^{(e)} = η^{(0)} \cdot α^{⌊ e / S_{η} ⌋}

(28)

with decay factor

α

and scheduling period

S_{η}

. The complete procedure, detailed in Algorithm 1, produces enhanced signals with preserved phase relationships for accurate DOA estimation.
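The clipping rule in Equation (27) can be illustrated with a short sketch. It assumes the gradient is available as a flat list of real numbers; in a framework such as PyTorch the same rescaling would be applied across all parameter tensors. The function name is illustrative.

```python
import math


def clip_by_global_norm(grad, c):
    """Rescale grad so that its l2 norm does not exceed c (Equation (27)).

    Returns the rescaled gradient and the norm measured before clipping.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    scale = 1.0 / max(1.0, norm / c)
    return [g * scale for g in grad], norm


# A gradient of norm 5 is scaled back to norm c = 1; a small one is untouched.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], c=1.0)
```

Because the scale factor is exactly 1 whenever the norm is at most c, gradients inside the threshold pass through unchanged and only the direction-preserving rescaling is applied to larger ones.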

Algorithm 1 Two-Phase Training Algorithm for GAN-Based DOA Signal Enhancement

Input: Noisy array signal batches ${(X_{noisy}^{(b)}, X_{clean}^{(b)})}_{b = 1}^{B}$ (where B is total batch number), Pre-training epochs $E_{pre} = 10$ , Adversarial training epochs $E_{adv} = E$ , Loss weights $λ_{1}, λ_{2}$ , Generator G, Discriminator D
Output: Trained generator $G^{*}$ and discriminator $D^{*}$

1: procedure GANTraining
▹ Phase 1: Generator Pre-training (Reconstruction-Oriented)
2:   for $epoch = 1$ to $E_{pre}$ do
3:     for each batch $b = 1$ to B do
4:       Extract batch data: $X_{noisy} = X_{noisy}^{(b)}$ , $X_{clean} = X_{clean}^{(b)}$
5:       Compute reconstruction loss: $L_{rec} = λ_{1} L_{L 1} (G (X_{noisy}), X_{clean}) + λ_{2} L_{phase} (G (X_{noisy}), X_{clean})$
6:       Update generator parameters $θ_{G}$ via backpropagation: $θ_{G} \leftarrow θ_{G} - η \nabla_{θ_{G}} L_{rec}$ (where $η$ is learning rate)
7:     end for
8:   end for
▹ Phase 2: Adversarial Fine-Tuning (Distribution-Matching)
9:   for $epoch = 1$ to $E_{adv}$ do
10:     for each batch $b = 1$ to B do
11:       Extract batch data: $X_{noisy} = X_{noisy}^{(b)}$ , $X_{clean} = X_{clean}^{(b)}$
12:       Generate enhanced signal: $X_{enhanced} = G (X_{noisy})$
13:       Compute adversarial loss for discriminator: $L_{adv}^{D} = - E [log D (X_{clean})] - E [log (1 - D (X_{enhanced}))]$
14:       Update discriminator parameters $θ_{D}$ via backpropagation: $θ_{D} \leftarrow θ_{D} - η \nabla_{θ_{D}} L_{adv}^{D}$
15:       if $b mod 2 = 0$ then ▹ Alternating update: Generator every 2 batches
16:         Compute total generator loss: $L_{G} = L_{adv}^{G} + L_{rec}$ (where $L_{adv}^{G} = - E [log D (X_{enhanced})]$ )
17:         Update generator parameters $θ_{G}$ via backpropagation: $θ_{G} \leftarrow θ_{G} - η \nabla_{θ_{G}} L_{G}$
18:       end if
19:     end for
20:   end for
21:   Set $G^{*} = G$ , $D^{*} = D$
22: end procedure

3.2. DOA Estimation with Enhanced Data

3.2.1. Data Preprocessing Pipeline

Enhanced signals undergo preprocessing to obtain spatial domain representations for DOA estimation. The pipeline consists of covariance matrix computation, phase calibration, and normalization.

First, the spatial covariance matrix is computed from enhanced signals:

R = \frac{1}{K} X_{enhanced} X_{enhanced}^{H}

(29)

where

X_{enhanced} \in C^{N \times K}

with

N = 10

array elements and

K = 200

snapshots, yielding

R \in C^{N \times N}

. This matrix encapsulates spatial correlations:

R_{i j} = \frac{1}{K} \sum_{k = 1}^{K} x_{i} (k) x_{j}^{*} (k)

(30)

where

x_{i} (k)

denotes the k-th snapshot at the i-th array element. The covariance matrix emphasizes correlated signal components while suppressing uncorrelated noise. Phase calibration addresses potential phase distortions:

X_{calibrated} = X_{enhanced} ⊙ exp (j ϕ_{corr} α)

(31)

with phase difference

ϕ_{corr} = ∠ X_{orig} - ∠ X_{enhanced}

and calibration factor

α = 0.7

. This preserves critical phase relationships for DOA estimation.
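Equations (29) and (31) can be sketched with plain Python complex numbers. The sketch assumes the signal is stored as N rows of K complex snapshots and that the phase difference is wrapped to (−π, π] before scaling; the wrapping step and the function names are assumptions, and the sketch does not prescribe the order in which the two operations are applied.

```python
import cmath


def sample_covariance(X):
    """R = (1/K) X X^H for X given as N rows of K complex snapshots."""
    n, k = len(X), len(X[0])
    return [[sum(X[i][t] * X[j][t].conjugate() for t in range(k)) / k
             for j in range(n)] for i in range(n)]


def wrap_phase(phi):
    """Map an angle in radians to the principal interval (-pi, pi]."""
    return cmath.phase(cmath.exp(1j * phi))


def partial_phase_calibration(X_enh, X_orig, alpha=0.7):
    """Rotate each enhanced sample by alpha times its phase offset
    from the corresponding original sample (Equation (31))."""
    out = []
    for row_e, row_o in zip(X_enh, X_orig):
        new_row = []
        for e, o in zip(row_e, row_o):
            phi_corr = wrap_phase(cmath.phase(o) - cmath.phase(e))
            new_row.append(e * cmath.exp(1j * alpha * phi_corr))
        out.append(new_row)
    return out
```

With `alpha=0.7`, a sample whose phase lags the original by 90° is rotated by 63°, which leaves its magnitude unchanged and closes 70% of the phase gap.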

The calibration step applies a partial phase-alignment between the enhanced signal and the original received signal. Specifically,

ϕ_{corr}

captures the average phase offset between

X_{orig}

and

X_{enhanced}

, and

α \in [0, 1]

is a conservative correction gain. From a practical standpoint, using

α = 1

would enforce full phase correction, which may over-compensate under low SNR where the offset estimate is noisy and may introduce phase jitter or discontinuities. Using

α < 1

provides stable under-correction, improving inter-channel phase coherence while avoiding amplification of residual estimation errors. In our experiments,

α = 0.7

was selected empirically based on validation performance as a robust trade-off between phase alignment and stability. Finally, normalization ensures consistent scaling:

\tilde{R} = \frac{R - μ_{R}}{σ_{R}}

(32)

where

μ_{R}

and

σ_{R}

are the mean and standard deviation of

R

, maintaining spatial information while stabilizing CNN processing.
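A minimal sketch of the normalization in Equation (32) follows. It assumes that the mean is taken over all entries of the complex matrix and that the standard deviation is the root mean squared modulus of the centered entries; the text does not specify how these statistics are defined for complex data, so both choices (and the guard `eps`) are assumptions.

```python
import math


def standardize(R, eps=1e-12):
    """Return (R - mu_R) / sigma_R for a complex matrix given as nested lists."""
    entries = [v for row in R for v in row]
    mu = sum(entries) / len(entries)
    variance = sum(abs(v - mu) ** 2 for v in entries) / len(entries)
    sigma = math.sqrt(variance)
    # eps (assumption) avoids division by zero for a constant matrix.
    return [[(v - mu) / (sigma + eps) for v in row] for row in R]
```

Because the same complex offset and the same real scale are applied to every entry, phase differences between the centered entries are unaffected.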

3.2.2. Complex-Valued CNN Architecture

The DOA estimation network is a complex-valued CNN designed to process covariance matrices and extract spatial features for direction finding. The complete network structure is illustrated in Figure 3.

The network follows a structured pipeline in which each operation in Equations (33)–(41) corresponds to a specific block in Figure 3 (Blocks B1–B7).

The network starts from the complex-valued covariance matrix

R \in C^{N \times N}

and applies a complex convolution layer:

Z_{1} = {Conv 2 D}_{1 \to 32} (R)

(33)

which matches Block B1 in Figure 3.

To preserve phase relations while stabilizing feature magnitudes, we normalize the complex feature map by its modulus, i.e.,

Z_{2} = Z_{1} / | Z_{1} |

. A 2D-FFT is then applied to capture frequency-domain patterns:

Z_{3} = F_{2 D} (Z_{2})

(34)

which are depicted as the modulus-normalization step (B2) and the FFT block (B3) in Figure 3.

The complex tensor is decomposed into real and imaginary parts and concatenated along the channel dimension:

Z_{4} = [ℜ (Z_{3}), ℑ (Z_{3})] \in R^{64 \times H \times W}

(35)

consistent with Block B4 in Figure 3.
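The modulus normalization and the real/imaginary split of Equation (35) can be sketched as follows. The sketch operates on nested lists of complex numbers, omits the intermediate 2D-FFT, and assumes real parts are stacked before imaginary parts; the channel ordering, the guard `eps`, and the function names are assumptions.

```python
def unit_modulus(Z, eps=1e-12):
    """Divide each entry of a complex H x W map by its modulus, keeping its phase."""
    return [[z / (abs(z) + eps) for z in row] for row in Z]


def split_real_imag(channels):
    """Turn C complex maps into 2C real maps: all real parts, then all imaginary parts."""
    real = [[[z.real for z in row] for row in ch] for ch in channels]
    imag = [[[z.imag for z in row] for row in ch] for ch in channels]
    return real + imag
```

Applied to the 32 complex feature maps produced by the first convolution, the split yields the 64 real-valued channels stated in Equation (35).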

The network then performs two stages of convolution–activation–pooling:

H_{1} = {MaxPool}_{2 \times 2} (ReLU ({Conv 2 D}_{64 \to 64} (Z_{4})))

(36)

H_{2} = {MaxPool}_{3 \times 3} (ReLU ({Conv 2 D}_{64 \to 64} (H_{1})))

(37)

which correspond to Blocks B5 and B6 in Figure 3, respectively.

Finally, the feature map is flattened and passed through three fully connected layers to output logits over discrete angles:

F = Flatten (H_{2}) \in R^{576}

(38)

H_{3} = ReLU (W_{1} F + b_{1})

(39)

H_{4} = ReLU (W_{2} H_{3} + b_{2})

(40)

P = W_{3} H_{4} + b_{3}

(41)

where

P \in R^{181}

denotes the logits for

θ \in {0 °, 1 °, \dots, 180 °}

. This part corresponds to the Flatten+FC head in Block B7 of Figure 3.

3.2.3. Training Methodology

Training employs cross-entropy loss:

L_{DOA} = - \frac{1}{B} \sum_{i = 1}^{B} \sum_{k = 0}^{180} y_{i, k} log (\frac{exp (P_{i, k})}{\sum_{j = 0}^{180} exp (P_{i, j})})

(42)

where B is batch size,

y_{i, k} \in {0, 1}

are one-hot ground truth labels, and

P_{i} \in R^{181}

are network outputs. This formulation effectively measures the discrepancy between predicted and true angle distributions, providing a well-established optimization target for the multi-class classification task inherent in DOA estimation. Stratified sampling maintains angle distribution balance:

p (θ) \propto \frac{1}{N_{θ}}, θ \in {0 °, 1 °, \dots, 180 °}

(43)

where

N_{θ}

is sample count per angle, preventing bias toward frequent angles. Learning rate scheduling follows:

η_{e} = η_{0} \cdot γ^{⌊ e / T_{η} ⌋}

(44)

with decay factor

γ

and period

T_{η}

, enabling rapid convergence and fine adjustment. To prevent gradient explosion and stabilize optimization, we apply

ℓ_{2}

-norm gradient clipping:

g \leftarrow \frac{g}{max (1, ∥ g ∥_{2} / c)}

(45)

where

g

denotes the concatenated gradient vector of all trainable parameters and c is the clipping threshold. In practice, c should be chosen to (i) avoid suppressing typical gradients and (ii) limit rare large updates that may cause divergence. In our implementation, we set

c = 1.0

, which is a commonly used and conservative value for Adam-based training in deep networks. We also observed that values in the range

c \in [0.5, 2.0]

lead to stable training, while too small c (e.g.,

< 0.2

) may slow convergence and too large c (e.g.,

> 5

) provides little protection.

Assume

{∥ g ∥}_{2} = 3.2

and

c = 1.0

. The scaling factor becomes

s = 1 / max (1, 3.2 / 1.0)

, so the clipped gradient is

g \leftarrow 0.3125 g

and its norm is reduced to

{∥ g ∥}_{2} = c = 1.0

. If instead

{∥ g ∥}_{2} = 0.8 < c

, then

s = 1

and no clipping is applied. Early stopping monitors validation accuracy

A_{val}

, terminating training when:

\frac{A_{val} (t) - A_{val} (t - Δ)}{Δ} < ϵ

(46)

for

Δ

consecutive epochs, maximizing generalization while preventing overfitting.
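One possible reading of the stopping rule in Equation (46) is sketched below. It assumes that the windowed slope is evaluated once per epoch and that training stops when the condition has held for each of the last Δ epochs; the function name and the list-based accuracy history are illustrative.

```python
def should_stop(acc_history, delta, eps):
    """Return True when (A(t) - A(t - delta)) / delta < eps has held
    for the last delta epochs of the validation-accuracy history."""
    if len(acc_history) < 2 * delta:
        return False  # not enough epochs to evaluate delta consecutive windows
    t_last = len(acc_history) - 1
    for t in range(t_last - delta + 1, t_last + 1):
        slope = (acc_history[t] - acc_history[t - delta]) / delta
        if slope >= eps:
            return False
    return True
```

A flat accuracy curve triggers the rule as soon as 2Δ epochs are available, whereas a curve that still rises faster than ε per epoch does not.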

The proposed GAN-based enhancement operates on complex baseband snapshots across sensor channels and does not require an explicit steering vector model; therefore, the enhancement stage is geometry-agnostic. Array geometry affects DOA estimation through the subsequent estimator. Thus, the proposed pipeline can be applied to linear/rectangular/circular arrays, provided that the training/simulation data and calibration settings reflect the target geometry and hardware characteristics. The complete framework synergistically combines GAN-based signal enhancement with complex-valued CNN processing, enabling robust DOA estimation in challenging low-SNR scenarios through complementary improvement of signal quality and spatial feature extraction.

4. Simulations and Performance Evaluation

4.1. Experimental Setup

4.1.1. Datasets and Parameters

We consider a single narrowband far-field source impinging on a uniform linear array (ULA) with M sensors and inter-element spacing

d = λ / 2

. The complex baseband snapshot at time index k is modeled as

x (k) = a (θ) s (k) + n (k), k = 1, \dots, K,

(47)

where

a (θ) \in C^{M \times 1}

is the steering vector and

s (k)

is the source waveform. For the ULA, we use

a (θ) = {[1, e^{- j 2 π \frac{d}{λ} sin θ}, \dots, e^{- j 2 π (M - 1) \frac{d}{λ} sin θ}]}^{T}

(48)

The noise

n (k)

is modeled as spatially and temporally white complex Gaussian noise, unless otherwise stated in the non-Gaussian noise experiments.
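The signal model in Equations (47) and (48) can be simulated with a short sketch. It assumes a circular complex Gaussian source of unit power (the text only requires a zero-mean unit-power sequence), so that the noise variance follows directly from the SNR; the function names and the seeding scheme are illustrative.

```python
import cmath
import math
import random


def steering_vector(theta_deg, m, d_over_lambda=0.5):
    """ULA steering vector of Equation (48) for an angle in degrees."""
    phase = -2.0 * math.pi * d_over_lambda * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * i) for i in range(m)]


def complex_gaussian(rng, variance):
    """Draw one circular complex Gaussian sample with the given variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def simulate_snapshots(theta_deg, m, k, snr_db, seed=0):
    """Return X (m rows, k snapshots) with x(k) = a(theta) s(k) + n(k)."""
    rng = random.Random(seed)
    a = steering_vector(theta_deg, m)
    noise_var = 10.0 ** (-snr_db / 10.0)  # unit-power source assumed
    X = [[0j] * k for _ in range(m)]
    for t in range(k):
        s = complex_gaussian(rng, 1.0)
        for i in range(m):
            X[i][t] = a[i] * s + complex_gaussian(rng, noise_var)
    return X
```

For example, `simulate_snapshots(30.0, m=10, k=200, snr_db=-10.0)` produces one 10 × 200 sample at a fixed DOA and SNR.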

Within each sample, the source DOA

θ

is assumed constant during the K snapshots, which is consistent with common DOA benchmarks and with the covariance-based processing adopted by both classical and learning-based baselines. Across different samples,

θ

is varied according to the predefined angle grid to form a labeled dataset.

A total of 9050 samples are generated. Each sample consists of

K = 200

snapshots collected under a fixed DOA angle and a fixed SNR setting. The source waveform

s (k)

is generated as a zero-mean unit-power complex random sequence to emulate unknown narrowband emissions, and the SNR is controlled by adjusting the noise variance

σ^{2}

.

The simulations assume ideal sensor responses after standard array calibration. This provides a controlled baseline to isolate the impact of noise and snapshot scarcity. We additionally discuss practical non-idealities as future work and note that these effects can be incorporated by perturbing

a (θ)

or by applying per-sensor complex gains in the simulation pipeline.

4.1.2. Implementation Details

The models are implemented using PyTorch 1.9.0 and trained on NVIDIA RTX 4070 Ti GPUs. The GAN component is trained for 100 epochs with a batch size of 32, using the Adam optimizer with initial learning rates of

2 \times 10^{- 4}

and

1 \times 10^{- 4}

for the generator and discriminator, respectively. The CNN-based DOA estimator is trained for 50 epochs with a batch size of 128, employing the Adam optimizer with an initial learning rate of

1 \times 10^{- 3}

and step decay scheduling. The training–validation–test split follows an 80-10-10 ratio with stratified sampling to maintain angle distribution consistency.

4.1.3. Baseline Methods

The proposed method is compared against several established approaches. The conventional MUSIC algorithm serves as a fundamental baseline. In our implementation, MUSIC uses the sample covariance matrix computed from K snapshots and performs an eigen-decomposition to obtain the noise subspace. To mitigate coherence effects, spatial smoothing is applied when needed. The MUSIC pseudo-spectrum is evaluated over a uniform angular grid within

[0 °, 180 °]

with a step size of

Δ θ = 0.5 °

. To ensure a fair comparison with the classification-based CNN outputs, continuous MUSIC estimates are mapped to the nearest integer degree before computing the accuracy and RMSE.

ESPRIT provides another classical benchmark. ESPRIT is implemented using two overlapping subarrays with one-sensor displacement to exploit rotational invariance, and it estimates DOA directly from the eigenvalues without grid search. The number of sources

K_{s}

is set consistently with the simulation setup to avoid bias from model-order mismatch.

For deep learning-based comparisons, a standard Convolutional Neural Network (CNN-Based) architecture without GAN enhancement is implemented to isolate the contribution of the proposed signal enhancement stage. This baseline employs identical input processing and output layers to ensure fair comparison, but processes the original noisy signals directly, without the enhancement stage. Additionally, a Deep Residual Network (ResNet-Based) with identical architecture depth and parameter count is included to evaluate the impact of advanced network architectures alone, without the integrated enhancement framework. Together, these baselines cover both traditional algorithmic approaches and modern deep learning paradigms.

4.2. Performance Metrics

To comprehensively evaluate the performance of the proposed framework across different dimensions, this section employs multiple quantitative metrics. The quality of signal enhancement is quantitatively assessed through the SNR Improvement:

SNR Improvement = {SNR}_{enhanced} - {SNR}_{original}

(49)

This metric directly reflects the noise suppression capability of the GAN-based enhancement module and reveals the critical effectiveness of the preprocessing stage in improving signal quality for subsequent DOA estimation. For the core DOA estimation performance, the evaluation employs DOA Estimation Accuracy, defined as:

Accuracy = \frac{Number of Correct Predictions}{Total Predictions} \times 100 %

(50)

This metric measures the percentage of exactly correct angle predictions across the entire test dataset, providing an intuitive understanding of the overall system performance while facilitating direct comparison with classification-based approaches in the literature. To provide a more nuanced assessment of estimation precision, the RMSE is introduced:

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {({\hat{θ}}_{i} - θ_{i})}^{2}}

(51)

This metric captures the average magnitude of estimation errors and penalizes larger deviations more severely, thereby offering a comprehensive view of estimation consistency across all test samples. In summary, these complementary metrics collectively form a multi-perspective evaluation framework that comprehensively assesses the system from two key dimensions: signal enhancement quality and final DOA estimation performance.
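The three metrics in Equations (49)–(51) can be computed with a few lines of Python. The sketch assumes predictions and labels are integer angles in degrees; the function names are illustrative.

```python
import math


def snr_improvement_db(snr_enhanced_db, snr_original_db):
    """Equation (49): difference between enhanced and original SNR in dB."""
    return snr_enhanced_db - snr_original_db


def accuracy_percent(pred, true):
    """Equation (50): share of exactly correct angle predictions, in percent."""
    correct = sum(1 for p, t in zip(pred, true) if p == t)
    return 100.0 * correct / len(true)


def rmse_deg(pred, true):
    """Equation (51): root mean square angular error in degrees."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))
```

For predictions `[10, 20, 33]` against labels `[10, 20, 30]`, two of three angles are exact (66.7% accuracy) while the single 3° miss yields an RMSE of √3 ≈ 1.73°, which illustrates how the two metrics capture different aspects of the same errors.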

4.3. Results and Analysis

4.3.1. Signal Enhancement Performance

The GAN-based enhancement module demonstrates remarkable noise suppression capabilities across various SNR conditions with 500 snapshots. As shown in Table 1, the proposed method achieves significant SNR improvement, particularly in low-SNR scenarios where traditional methods struggle. The consistent performance gain across all SNR levels underscores the robustness of the proposed attention-enhanced GAN architecture in extracting meaningful signal components from noisy observations.

To visually demonstrate the enhancement performance in the time domain, Figure 4 and Figure 5 present the amplitude waveforms of the noisy input signal and the GAN-enhanced output signal, respectively. Figure 4 exhibits significant amplitude fluctuations and distortion due to additive noise contamination, while Figure 5 shows restored signal integrity with smooth amplitude variations and preserved temporal structure, clearly illustrating the noise suppression capability of the proposed method.

The frequency-domain analysis through spectrograms provides further insight into the enhancement performance. Figure 6 displays the Short-Time Fourier Transform (STFT) representation of the noisy signal, showing widespread noise components across the frequency spectrum. In contrast, Figure 7 demonstrates the cleaned spectrogram after GAN processing, exhibiting concentrated signal energy in the relevant frequency bands with significantly reduced noise floor. This comparison indicates effective frequency-domain denoising while maintaining the essential spectral characteristics.

The phase preservation capability is quantitatively evaluated through phase coherence measurements. The proposed method maintains an average phase error of only

2.3 °

compared to

15.7 °

for conventional denoising methods, confirming its effectiveness in preserving spatial information crucial for DOA estimation. This exceptional phase preservation is attributed to the dedicated phase consistency loss function integrated into the GAN training process, which explicitly optimizes for phase accuracy alongside magnitude reconstruction. The visual evidence from both time and frequency domains corroborates the quantitative results, demonstrating the comprehensive enhancement capability of the proposed GAN framework.

4.3.2. DOA Estimation Accuracy Under Varying SNR Conditions with 500 Snapshots

The end-to-end performance of the proposed framework is evaluated under various SNR conditions with 500 snapshots, as summarized in Table 2 and visually depicted in Figure 8. The proposed method consistently outperforms all baseline approaches across the entire SNR range from -10 dB to 20 dB, with particularly notable advantages in challenging low-SNR regimes.

The accuracy improvement is most pronounced at −10 dB SNR, where the proposed method achieves 72.2% accuracy compared to 65.7% for the best baseline method, an absolute improvement of 6.5 percentage points (about 9.9% in relative terms). This demonstrates the framework’s robustness in extremely challenging low-SNR environments. The consistent performance advantage across the entire SNR spectrum, as visualized in Figure 8, supports the effectiveness of the GAN-enhanced preprocessing stage in mitigating noise impacts on the subsequent DOA estimation.

4.3.3. Performance Analysis Under Different Snapshot Numbers at 20 dB SNR

To evaluate the framework’s capability in scenarios with varying data availability, comprehensive experiments are conducted with different snapshot numbers at 20 dB SNR, as detailed in Table 3 and illustrated in Figure 9. The proposed method demonstrates remarkable resilience to snapshot reduction, maintaining competitive performance even with severely limited temporal samples.

Notably, with only 50 snapshots at 20 dB SNR, the proposed method achieves 93.8% accuracy, significantly outperforming MUSIC (71.3%) and demonstrating a 3.5% relative improvement over the best deep learning baseline (ResNet-Based: 90.6%). The performance advantage is maintained across all snapshot configurations, with the proposed method achieving 98.9% accuracy with 500 snapshots. As evident from Figure 9, this robust performance under limited data conditions highlights the GAN enhancement’s ability to recover meaningful signal components from constrained temporal observations.

4.3.4. Root Mean Square Error Analysis Under Varying Conditions

The RMSE provides a comprehensive evaluation of estimation precision across different operational scenarios. Table 4 and Figure 10 present the RMSE performance under varying SNR conditions with 500 snapshots, demonstrating the proposed method’s superior estimation accuracy particularly in challenging low-SNR environments.

The proposed method achieves the lowest RMSE across all SNR levels, with particularly significant improvements in low-SNR conditions. At −10 dB SNR, the RMSE of 3.9° represents a 17.0% reduction compared to the best baseline method (ResNet-Based: 4.7°) and a 69.5% improvement over traditional MUSIC (12.8°). This substantial error reduction, clearly visible in Figure 10, underscores the effectiveness of the GAN enhancement in mitigating noise-induced estimation errors and improving overall estimation consistency.

Further analysis of RMSE under different snapshot numbers at 20 dB SNR, as shown in Table 5 and Figure 11, reveals the framework’s robustness to limited data availability.

With only 50 snapshots at 20 dB SNR, the proposed method achieves an RMSE of 1.1°, representing a 26.7% improvement over the best baseline (ResNet-Based: 1.5°) and a 73.8% improvement over MUSIC (4.2°). As the number of snapshots increases to 500, the RMSE improves to 0.4°, maintaining a consistent advantage over all comparison methods. The progressive improvement in estimation precision with increasing snapshot numbers is clearly demonstrated in Figure 11, highlighting the framework’s ability to extract reliable spatial information even from severely limited temporal data, a characteristic attributed to the signal enhancement stage’s capacity to recover meaningful components from noisy observations.
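The relative RMSE reductions quoted in this subsection follow from the reported RMSE values, as the short check below shows. The helper name is illustrative; the inputs are the figures stated in the text.

```python
def relative_reduction_percent(baseline, proposed):
    """Relative error reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# RMSE values in degrees as reported in the text.
at_minus10_db = {
    "ResNet-Based": round(relative_reduction_percent(4.7, 3.9), 1),
    "MUSIC": round(relative_reduction_percent(12.8, 3.9), 1),
}
at_50_snapshots = {
    "ResNet-Based": round(relative_reduction_percent(1.5, 1.1), 1),
    "MUSIC": round(relative_reduction_percent(4.2, 1.1), 1),
}
```

The four values evaluate to 17.0, 69.5, 26.7, and 73.8, matching the percentages given above.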

To provide a theoretical performance limit for the considered DOA estimation problem, we additionally report the Cramér–Rao Bound (CRB) as a baseline in Figure 10 and Figure 11. The CRB gives a lower bound on the variance of any unbiased estimator under the assumed statistical model, and therefore serves as a reference for gauging how far practical methods are from the best achievable accuracy.

We compute the CRB under the same narrowband far-field single-source ULA model used in our dataset generation, i.e.,

x (k) = a (θ) s (k) + n (k)

with spatially and temporally white complex Gaussian noise

n (k) \sim CN (0, σ^{2} I)

. Following the standard stochastic CRB formulation, the Fisher information for

θ

is obtained from the derivative of the steering vector

\partial a (θ) / \partial θ

. The bound is then plotted as

{RMSE}_{CRB} (θ) = \sqrt{CRB (θ)}

to match the RMSE metric used in our evaluation.
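For a single source on a ULA, the stochastic CRB admits a textbook closed form, which the sketch below evaluates. It is offered as an illustration under stated assumptions rather than as the exact computation behind Figures 10 and 11: the bound on the spatial frequency ω = 2π(d/λ) sin θ is taken as 6 / (K · SNR · M(M² − 1)) · (1 + 1/(M · SNR)) and mapped to θ through the derivative dω/dθ, with θ measured as in Equation (48). The function name and the handling of cos θ = 0 are assumptions.

```python
import math


def crb_rmse_deg(theta_deg, m, k, snr_db, d_over_lambda=0.5):
    """Square root of the single-source stochastic CRB, expressed in degrees."""
    snr = 10.0 ** (snr_db / 10.0)
    # Bound on the spatial frequency w = 2*pi*(d/lambda)*sin(theta), in rad^2.
    crb_w = 6.0 / (k * snr * m * (m * m - 1)) * (1.0 + 1.0 / (m * snr))
    # Chain rule: var(theta) >= crb_w / (dw/dtheta)^2.
    dw_dtheta = 2.0 * math.pi * d_over_lambda * math.cos(math.radians(theta_deg))
    if abs(dw_dtheta) < 1e-12:
        return math.inf  # the array carries no first-order information here
    return math.degrees(math.sqrt(crb_w) / abs(dw_dtheta))
```

The bound decreases as the SNR, the number of snapshots K, or the number of sensors M increases, and it grows without limit where cos θ approaches zero under this angle convention.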

The comprehensive experimental results, supported by both tabular data and visual representations, demonstrate that the proposed method exhibits graceful performance degradation under adverse conditions, whereas traditional methods suffer from rapid performance deterioration. This robust behavior makes the framework particularly suitable for practical applications where both noise contamination and data limitations are common challenges.

4.3.5. Robustness Analysis Under Various Non-Gaussian Noise Conditions

To evaluate the framework’s robustness in practical scenarios where noise often deviates from ideal Gaussian characteristics, additional experiments are conducted under various non-Gaussian noise conditions at 10 dB SNR with 500 snapshots. Three representative non-Gaussian noise types are considered, each with distinct statistical properties that challenge DOA estimation algorithms.

Laplacian Noise is modeled using the Laplace distribution, which represents a heavy-tailed noise with higher kurtosis than Gaussian noise:

p (x) = \frac{1}{2 b} exp (- \frac{| x - μ |}{b})

(52)

with location parameter

μ = 0

and scale parameter

b = 0.5

, generating noise with heavier tails that better model real-world impulsive interference scenarios.

Uniformly Distributed Noise follows a continuous uniform distribution:

p (x) = \begin{cases} \frac{1}{b - a}, & a \leq x \leq b \\ 0, & otherwise \end{cases}

(53)

with

a = - 1

and

b = 1

, representing bounded amplitude noise commonly encountered in quantization and clipping scenarios.

Mixture Gaussian Noise combines two Gaussian components:

p (x) = ρ N (0, σ_{1}^{2}) + (1 - ρ) N (0, σ_{2}^{2})

(54)

with mixing coefficient

ρ = 0.3

, variances

σ_{1}^{2} = 1

and

σ_{2}^{2} = 4

, simulating noise with multiple variance components that may occur in multi-source interference environments.
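The three noise models in Equations (52)–(54) can be sampled as sketched below, using the stated parameter values as defaults. The sketch draws real-valued samples only; how real and imaginary parts are combined and how the samples are rescaled to reach 10 dB SNR is not specified here, so those steps are left out. The Laplace sampler uses the difference of two unit exponentials, which is one standard construction.

```python
import math
import random


def laplace_sample(rng, mu=0.0, b=0.5):
    """Laplace(mu, b) sample as mu + b * (E1 - E2) with E1, E2 ~ Exp(1)."""
    return mu + b * (rng.expovariate(1.0) - rng.expovariate(1.0))


def uniform_sample(rng, a=-1.0, b=1.0):
    """Continuous uniform sample on [a, b], Equation (53)."""
    return rng.uniform(a, b)


def mixture_gaussian_sample(rng, rho=0.3, var1=1.0, var2=4.0):
    """Two-component zero-mean Gaussian mixture, Equation (54)."""
    variance = var1 if rng.random() < rho else var2
    return rng.gauss(0.0, math.sqrt(variance))


rng = random.Random(0)
laplace_noise = [laplace_sample(rng) for _ in range(1000)]
```

With the default parameters, the Laplace samples have mean absolute value b = 0.5 and the mixture has variance 0.3 · 1 + 0.7 · 4 = 3.1.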

Table 6 presents the DOA estimation accuracy under different non-Gaussian noise conditions. The Laplacian noise, characterized by its heavier tails and higher probability of large-amplitude samples, poses particular challenges for traditional DOA estimation methods that assume Gaussian noise statistics.

As shown in Table 6 and Figure 12, the proposed method maintains superior performance across all non-Gaussian noise conditions. Under Laplacian noise, the GAN-CNN framework demonstrates remarkable robustness with only a 7.0% relative performance drop, while MUSIC suffers a 21.2% degradation. The relative performance analysis in Figure 12b further confirms the framework’s minimal sensitivity to noise distribution variations. The robustness index is quantified as:

R = \frac{A_{non - Gaussian}}{A_{Gaussian}} \times 100 %

(55)

where

A_{Gaussian}

and

A_{non - Gaussian}

represent accuracy under Gaussian and non-Gaussian conditions. The proposed method achieves indices of 93.0%, 95.7%, and 97.7% for Laplacian, uniform, and mixture Gaussian noise respectively, significantly outperforming MUSIC (78.7%, 85.5%, 91.1%).

Further analysis of the RMSE under Laplacian noise conditions reveals consistent advantages. At 10 dB SNR, the proposed method achieves RMSE of 1.2° compared to 3.1° for MUSIC, 1.7° for CNN-Based, and 1.5° for ResNet-Based. This demonstrates the framework’s ability to mitigate noise outliers while maintaining estimation precision.

The experimental results validate the framework’s strong generalization capability in non-Gaussian environments, particularly under challenging heavy-tailed noise conditions. This robustness ensures practical utility in real-world applications where noise characteristics often deviate from Gaussian assumptions.

4.3.6. Computational Overhead and Practical Considerations

To avoid ambiguity, we emphasize that the end-to-end latency depends on both the algorithmic computation and the data acquisition window used to form the input representation. Accordingly, we report computational complexity and discuss the time to first estimate (TTFE) primarily in terms of snapshot number K (and

T_{win} = K / f_{s}

for a given acquisition rate), rather than claiming a fixed millisecond value as representative for all real-time systems.

Subspace-based algorithms such as MUSIC/ESPRIT require forming the sample covariance matrix, followed by eigendecomposition, which typically scales as

O (M^{2} K)

for covariance accumulation and

O (M^{3})

for eigendecomposition with M sensors and K snapshots. MUSIC further introduces a spectral search over a grid of G angles, leading to additional cost proportional to G. These steps can dominate runtime when fine angular resolution is needed or when repeated estimations are required.

CNN/ResNet-based estimators shift the main computational cost to offline training. At inference time, the complexity is dominated by a fixed number of convolution and fully connected operations, resulting in near-constant latency once the input representation is available. Similar to classical methods, if the input is the covariance matrix, a minimum number of snapshots is still required before the first estimate can be produced.

Compared to a CNN-only estimator, our method introduces an additional enhancement module before covariance construction. Hence, the online overhead mainly comes from one extra 1D convolutional encoder–decoder pass per received snapshot block. However, the enhancement stage is fully parallelizable on GPUs and can also be efficiently implemented on edge accelerators due to its convolutional structure. Importantly, we hypothesize that this additional overhead is compensated by improved robustness at low SNR and under limited snapshots: the GAN-based enhancement suppresses noise while the phase-consistent loss encourages preservation of spatial phase relations, which are crucial for DOA inference. Consequently, the subsequent CNN receives a higher-quality covariance representation, leading to the observed accuracy and RMSE gains in challenging regimes. In summary, the proposed approach trades a moderate additional inference cost for a significant robustness improvement in low-SNR and data-scarce scenarios. The training time comparison among different methods is summarized in Table 7.

4.4. Extension to Multi-Source Scenarios and Lightweight Phase-Interferometry Considerations

To address practical environments where multiple sources may coexist, we discuss how the proposed two-stage enhancement–estimation pipeline can be extended beyond the single-source setting. Importantly, the first-stage GAN in our framework operates on the complex-valued array snapshots to suppress noise while preserving phase relations; therefore, it remains applicable in multi-source conditions, as it does not rely on a single-source assumption. The principal modifications are implemented at the second stage, specifically concerning the DOA estimation network and its associated training and inference protocols.

For the common case of two simultaneous sources, the enhanced snapshots are first processed in the same manner as the single-source pipeline: the enhanced complex signals are used to form the sample covariance matrix, which is then fed into the DOA estimator. To enable two-source inference, the CNN is extended to a dual-head architecture: the feature extraction backbone is shared, while the output layer is split into two parallel classification heads, each producing a probability distribution over the 181 discrete angles. Let

p_{1}, p_{2} \in R^{181}

denote the predicted distributions from the two heads, and let

y_{1}, y_{2}

be the one-hot labels corresponding to the two ground-truth DOAs.

Since a two-source DOA target is an unordered set

{θ_{1}, θ_{2}}

, the learning objective should not enforce an artificial ordering between the heads. To explicitly address this permutation symmetry, we employ a permutation-invariant training objective:

L_{PIT} = min \{CE (y_{1}, p_{1}) + CE (y_{2}, p_{2}), CE (y_{1}, p_{2}) + CE (y_{2}, p_{1})\}

(56)

where

CE (y, p) = - \sum_{c = 1}^{181} y_{c} log (p_{c})

is the standard cross-entropy loss. This formulation allows the estimator to learn two consistent angular peaks without requiring a predefined source order.

During inference, each head produces one DOA estimate by selecting the maximum-probability angle, i.e.,

{\hat{θ}}_{1} = arg {max}_{c} p_{1} [c]

and

{\hat{θ}}_{2} = arg {max}_{c} p_{2} [c]

. For evaluation, the predicted pair is matched to the ground truth by selecting the permutation that minimizes the total angular error.
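The permutation-invariant objective in Equation (56) and the matched evaluation described above can be sketched as follows. The sketch assumes each head outputs a probability vector and each label is a one-hot list; the function names and the guard `eps` are illustrative.

```python
import math


def cross_entropy(y, p, eps=1e-12):
    """CE(y, p) = -sum_c y_c log p_c for a one-hot y and a probability vector p."""
    return -sum(yc * math.log(pc + eps) for yc, pc in zip(y, p))


def pit_loss(y1, y2, p1, p2):
    """Equation (56): the smaller of the two possible head-to-label assignments."""
    direct = cross_entropy(y1, p1) + cross_entropy(y2, p2)
    swapped = cross_entropy(y1, p2) + cross_entropy(y2, p1)
    return min(direct, swapped)


def argmax(p):
    """Index of the largest entry, used as the DOA estimate of one head."""
    return max(range(len(p)), key=p.__getitem__)


def matched_abs_errors(pred, true):
    """Pair two estimates with two ground-truth angles so that the
    total absolute error is minimal, and return the per-source errors."""
    (e1, e2), (t1, t2) = pred, true
    direct = (abs(e1 - t1), abs(e2 - t2))
    swapped = (abs(e1 - t2), abs(e2 - t1))
    return direct if sum(direct) <= sum(swapped) else swapped
```

For instance, estimates of (120°, 40°) against ground truth (40°, 121°) are matched in swapped order, giving errors of 1° and 0° rather than 80° and 81°.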

When the number of active sources is unknown, one practical strategy is to augment the estimator with a confidence-based selection mechanism. For example, each head can output a confidence score based on its maximum probability, and only predictions exceeding a threshold are retained as valid sources. Alternatively, a separate lightweight classifier can be added to predict the source count from the same covariance representation, which then activates the corresponding number of output heads. We leave a full implementation and systematic evaluation of source-count estimation as future work, as it requires broader multi-source simulation settings and additional annotations.

Lightweight AoA techniques based on phase interferometry are attractive for real-time and hardware deployment due to their low computational footprint and simple arithmetic operations. However, in multi-source settings, the measured inter-sensor phase differences generally reflect a superposition of multiple contributors, which introduces ambiguity and often requires additional separation steps to recover multiple angles. These methods are typically efficient but can be more sensitive when sources are closely spaced, have unequal powers, or when phase wrapping becomes prominent under low SNR.

In contrast, the proposed GAN-CNN pipeline aims to improve robustness in challenging regimes by enhancing noisy array snapshots while explicitly encouraging phase consistency, thereby providing a higher-quality covariance representation to the DOA estimator. This design can be viewed as complementary to lightweight interferometric approaches: the enhancement stage targets noise suppression and phase preservation, while the estimation stage can be adapted to multi-source outputs via permutation-invariant learning. Integrating the two approaches into a hybrid system is a direction for future work.

5. Conclusions

This paper presents a novel two-stage GAN-CNN fusion framework for robust DOA estimation in low-SNR and data-limited environments. The proposed method integrates an attention-enhanced GAN for signal denoising with a complex-valued CNN for accurate spatial feature extraction, addressing key challenges in conventional and deep learning-based DOA estimation methods. The GAN component, equipped with a phase-consistent loss function, effectively suppresses noise while preserving spatial phase information essential for accurate direction finding. The subsequent complex-valued CNN processes enhanced covariance matrices to extract discriminative spatial features for precise DOA classification.

Extensive experimental evaluations demonstrate the superior performance of the proposed framework across a wide range of SNR conditions, snapshot numbers, and non-Gaussian noise environments. Specifically, the method achieves a DOA accuracy of 72.2% and an RMSE of 3.9° at −10 dB SNR with 500 snapshots, significantly outperforming traditional algorithms such as MUSIC and ESPRIT, as well as state-of-the-art deep learning baselines. The framework also exhibits strong robustness to limited data, maintaining 93.8% accuracy with only 50 snapshots at 20 dB SNR. Furthermore, the proposed approach demonstrates consistent performance under various non-Gaussian noise conditions, highlighting its practical applicability in real-world scenarios.

The results validate the effectiveness of the integrated enhancement-estimation paradigm and underscore the importance of phase-preserving denoising in DOA estimation. Future work will focus on extending the framework to more complex array geometries, exploring online adaptation mechanisms for dynamic environments, and investigating lightweight architectures for real-time deployment on embedded systems.

Author Contributions

Z.Z.: Investigation, Validation, Writing—original draft, Software. W.X.: Investigation, Validation, Data curation. H.Z.: Conceptualization, writing of the original draft. S.Y.: Conceptualization, Methodology design, Writing—original draft, Resources, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the Development of Key Data Algorithms for Jizhi Ship Technology (November 2055072401).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data will be made available by the authors upon request.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.

Conflicts of Interest

Author Shichao Yi was employed by the company Zhenjiang Jizhi Ship Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DOA	Direction of Arrival
SNR	Signal-to-Noise Ratio
GAN	Generative Adversarial Network
CNN	Convolutional Neural Network
MUSIC	Multiple Signal Classification
ESPRIT	Estimation of Signal Parameters via Rotational Invariance Techniques
RMSE	Root Mean Square Error
ULA	Uniform Linear Array
STFT	Short-Time Fourier Transform


Figure 1. Detailed architecture of the generator network, illustrating the encoder–decoder structure with attention mechanism and skip connections for complex-valued signal processing.

Figure 2. Complete architecture of the proposed GAN framework for DOA signal enhancement.

Figure 3. Architecture of the complex-valued CNN for DOA estimation.

Figure 4. Amplitude waveform of the noisy input signal.

Figure 5. Amplitude waveform of the GAN-enhanced output signal.

Figure 6. Spectrogram of the noisy input signal.

Figure 7. Spectrogram of the GAN-enhanced output signal.

Figure 8. DOA estimation accuracy comparison under different SNR conditions with 500 snapshots.

Figure 9. DOA estimation accuracy under different snapshot numbers at 20 dB SNR.

Figure 10. RMSE comparison under different SNR conditions with 500 snapshots.

Figure 11. RMSE under different snapshot numbers at 20 dB SNR.

Figure 12. DOA estimation performance under non-Gaussian noise conditions at 10 dB SNR: (a) Absolute accuracy; (b) Relative performance normalized to Gaussian baseline.

Table 1. SNR improvement comparison with 500 snapshots (dB).

Method	−10 dB	−5 dB	0 dB	10 dB	20 dB
Wavelet Denoising	1.8	2.1	2.8	3.5	3.8
Spectral Subtraction	2.9	3.2	3.9	4.6	4.9
Kalman Filter	3.7	4.1	4.8	5.4	5.7
Proposed GAN	6.2	6.8	7.2	7.8	8.1

Table 2. DOA estimation accuracy comparison under different SNR conditions with 500 snapshots (%).

Method	−10 dB	−5 dB	0 dB	10 dB	20 dB
MUSIC	15.2	24.5	48.3	82.8	95.5
ESPRIT	13.8	22.1	50.7	85.6	96.8
CNN-Based	62.4	72.6	85.8	92.7	97.2
ResNet-Based	65.7	76.3	86.9	93.5	97.8
GAN-CNN	72.2	87.4	92.7	95.9	98.9

Table 3. DOA estimation accuracy under different snapshot numbers at 20 dB SNR (%).

Method	50	80	100	200	500
MUSIC	71.3	76.2	81.4	89.7	95.5
ESPRIT	68.5	78.9	83.6	91.2	96.8
CNN-Based	89.2	91.8	93.5	95.8	97.2
ResNet-Based	90.6	92.7	94.1	96.3	97.8
GAN-CNN	93.8	95.2	96.1	97.6	98.9

Table 4. RMSE comparison under different SNR conditions with 500 snapshots (degrees).

Method	−10 dB	−5 dB	0 dB	10 dB	20 dB
MUSIC	12.8	9.6	5.8	2.3	1.1
ESPRIT	13.5	10.2	6.1	2.1	0.9
CNN-Based	5.2	3.8	2.4	1.3	0.7
ResNet-Based	4.7	3.4	2.1	1.1	0.6
GAN-CNN	3.9	2.6	1.5	0.8	0.4

Table 5. RMSE under different snapshot numbers at 20 dB SNR (degrees).

Method	50	80	100	200	500
MUSIC	4.2	3.1	2.6	1.8	1.1
ESPRIT	3.8	2.8	2.3	1.5	0.9
CNN-Based	1.8	1.4	1.2	0.9	0.7
ResNet-Based	1.5	1.2	1.0	0.8	0.6
GAN-CNN	1.1	0.9	0.8	0.6	0.4

Table 6. DOA estimation accuracy under different non-Gaussian noise conditions at 10 dB SNR with 500 snapshots (%).

Method	Gaussian Noise	Laplacian Noise	Uniform Noise	Mixture Gaussian
MUSIC	82.8	65.2	70.8	75.4
ESPRIT	85.6	68.4	73.1	78.2
CNN-Based	92.7	81.3	85.9	88.6
ResNet-Based	93.5	83.7	87.4	90.1
GAN-CNN	95.9	89.2	91.8	93.7

Table 7. Training time comparison across different methods.

Method	Training Time (s)
MUSIC	N/A (no training)
ESPRIT	N/A (no training)
CNN-Based	312.6
ResNet-Based	498.3
Proposed GAN–CNN	673.9

