1. Introduction
Direction-of-arrival (DOA) estimation is a fundamental problem in array signal processing with widespread applications in radar systems, wireless communications, sonar, and acoustic signal processing. Accurate DOA estimation is crucial for spatial filtering, beamforming, and source localization in various civilian and military systems [1, 2, 3].
Traditional DOA estimation algorithms can be broadly categorized into phase-comparison techniques, subspace-based methods, and maximum likelihood approaches. Phase-comparison methods estimate the incident angle by measuring inter-sensor phase differences with relatively low computational cost, which makes them attractive for real-time and hardware-efficient implementations; however, they typically require careful calibration and ambiguity resolution when long baselines are used [4, 5, 6, 7, 8, 9, 10].
Classical subspace-based techniques such as Multiple Signal Classification (MUSIC) and Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) perform well under ideal conditions. These methods exploit the eigenstructure of the covariance matrix to separate signal and noise subspaces. Nevertheless, their performance degrades in low-SNR regimes or with a small number of snapshots, which reduces robustness in challenging environments [11, 12, 13]. Maximum likelihood methods, while statistically efficient, suffer from high computational complexity that limits their practical implementation.
Recent advances in deep learning have inspired novel approaches to DOA estimation that circumvent limitations of traditional methods. Convolutional Neural Networks (CNNs) have been successfully applied to learn spatial features directly from array data or covariance matrices, while Recurrent Neural Networks (RNNs) and their variants have been employed to capture temporal dependencies in signal sequences [14, 15].
From a computational perspective, classical subspace-based methods typically involve covariance estimation and subspace separation, where eigendecomposition introduces non-negligible complexity, and MUSIC further requires a grid search over candidate angles, which increases runtime proportionally to the angular resolution. As a result, their time to first estimation (TTFE) is largely determined by the snapshot accumulation needed for a reliable covariance estimate as well as the subsequent subspace separation and spectral search.
In contrast, techniques based on deep neural networks (DNNs) usually shift the main computational burden to the offline training stage. Once trained, inference can be executed via a single forward pass with fixed latency, making them attractive for real-time deployment, especially on GPUs. However, their TTFE still depends on the input representation: models that rely on covariance matrices require collecting sufficient snapshots before forming the sample covariance matrix, whereas end-to-end networks operating on raw snapshots can potentially reduce TTFE by producing estimates with fewer samples or in a streaming manner. Despite these advantages, most deep learning-based DOA estimators remain vulnerable to performance degradation in noisy environments, as they primarily learn mapping functions without explicit denoising mechanisms. Moreover, existing CNN/RNN-based models often focus on feature extraction from raw or preprocessed data while remaining sensitive to noise-induced distortion of the phase information that is indispensable for spatial parameter estimation [16, 17].
In this work, the term real time refers to online operation with bounded and predictable latency, rather than implying a universal microsecond-level end-to-end response across all platforms. We distinguish (i) the algorithmic latency after the input representation is available and (ii) the TTFE, which additionally includes the data acquisition window. For covariance-based pipelines, TTFE is dominated by the snapshot accumulation required to form the sample covariance matrix. Let K be the snapshot number and $f_s$ be the effective acquisition rate; the observation window is $T_{\mathrm{obs}} = K/f_s$ and the TTFE can be approximated as
$$\mathrm{TTFE} \approx T_{\mathrm{obs}} + T_{\mathrm{proc}},$$
where $T_{\mathrm{proc}}$ includes covariance construction and the subsequent inference/processing. Therefore, any absolute time value is scenario-dependent and should not be interpreted as a general definition of real-time DOA estimation. Notably, microsecond-level update rates reported by phase-interferometry/naive techniques are typically achieved via highly optimized hardware pipelines and per-snapshot processing, whereas covariance-based estimators trade longer observation windows for improved robustness under low SNR and limited data.
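The TTFE approximation described above (acquisition window plus processing time) can be illustrated with a few lines of Python. This is a minimal sketch: the function name and the numeric values are assumptions chosen for illustration and are not taken from the paper.

```python
def time_to_first_estimate(num_snapshots, acquisition_rate_hz, processing_time_s):
    """Approximate TTFE (seconds) as observation window plus processing time."""
    if acquisition_rate_hz <= 0:
        raise ValueError("acquisition_rate_hz must be positive")
    # Observation window: time needed to collect the snapshots.
    observation_window_s = num_snapshots / acquisition_rate_hz
    return observation_window_s + processing_time_s


# Illustrative values only (hypothetical platform, not from the paper):
# 500 snapshots at 1 MHz effective rate, 2 ms for covariance + inference.
ttfe = time_to_first_estimate(num_snapshots=500,
                              acquisition_rate_hz=1e6,
                              processing_time_s=2e-3)
print(f"TTFE = {ttfe * 1e3:.2f} ms")
```

With these assumed numbers the acquisition window (0.5 ms) is smaller than the processing time; at lower acquisition rates the window term dominates, which is the regime the text describes for covariance-based pipelines.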
The integration of signal enhancement techniques with DOA estimation presents a promising direction to address noise robustness. Generative Adversarial Networks (GANs) have emerged as powerful tools for signal enhancement and denoising applications. In audio and speech processing, GAN-based enhancement has demonstrated remarkable capability in preserving signal integrity while suppressing noise. However, the application of GANs to array signal enhancement for DOA estimation has not been fully exploited, particularly concerning the preservation of spatial phase information critical for accurate direction finding [18].
The development of integrated enhancement-estimation frameworks presents several fundamental challenges that require careful consideration [19]. The primary challenge involves maintaining phase coherence during the denoising process to preserve spatial information, which is particularly crucial for DOA estimation as it fundamentally relies on precise phase relationships across array elements. Any phase distortion introduced during signal enhancement can severely degrade angle estimation accuracy and compromise the overall system performance. Another significant challenge lies in ensuring robust generalization capabilities across diverse noise types and varying SNR conditions, as practical operational environments often exhibit characteristics that may differ substantially from the training data distribution. This necessitates the development of algorithms that can adapt to unseen noise patterns and maintain consistent performance across a wide spectrum of signal quality conditions [20, 21, 22].
Furthermore, achieving an optimal balance between enhancement quality and computational efficiency represents a critical challenge, especially for real-time applications where processing latency and resource constraints impose practical limitations [23]. The enhancement process must deliver substantial noise suppression while maintaining computational tractability to enable deployment in resource-constrained environments. Additionally, the effective integration of enhancement and estimation modules within a unified framework poses substantial design challenges, requiring careful coordination between the two stages to ensure they work synergistically rather than as independent components. This integration must facilitate seamless information flow between modules while maintaining overall architectural coherence and optimization compatibility. These interconnected challenges collectively define the core technical obstacles that must be systematically addressed to realize effective integrated frameworks for DOA estimation in practical scenarios [24, 25].
To address these challenges, this paper makes several key contributions that advance the state of the art in DOA estimation for low-SNR environments. We propose a novel two-stage framework that systematically combines GAN-based signal enhancement with CNN-based DOA estimation, creating an integrated architecture specifically engineered to operate effectively in challenging signal-to-noise ratio conditions. This comprehensive approach addresses the fundamental limitation of existing methods that struggle with performance degradation in noisy environments by incorporating a dedicated signal enhancement stage prior to the estimation process [26, 27, 28, 29].
We develop an enhanced GAN architecture that incorporates sophisticated attention mechanisms and phase-consistent loss functions to preserve crucial spatial characteristics during the denoising process. The attention mechanism enables selective focus on temporally significant signal components, while the phase-consistent loss function ensures the preservation of phase information that is critical for accurate spatial processing and DOA estimation. This represents a significant improvement over conventional denoising approaches that often compromise phase integrity in pursuit of noise reduction.
Furthermore, we design a specialized complex-valued CNN architecture capable of effectively processing enhanced covariance matrices for accurate DOA classification. This network leverages the complex nature of array signals through dedicated processing pathways that maintain the rich information content in both real and imaginary components, enabling more effective feature extraction from the spatial covariance matrices that form the foundation of DOA estimation.
The remainder of this paper is structured as follows. Section 2 provides a comprehensive review of traditional DOA estimation methods and recent advances in deep learning-based approaches, along with an introduction to Generative Adversarial Networks. Section 3 details the proposed two-stage framework, including the design of the GAN-based signal enhancement module and the complex-valued CNN architecture for DOA estimation. Section 4 presents extensive experimental results and in-depth performance analysis under various conditions. Finally, Section 5 concludes the paper and suggests directions for future research.
2. Related Work
2.1. Traditional DOA Estimation Methods
Traditional DOA estimation methods can be broadly grouped into beamforming-based techniques, subspace-based methods, and sparse representation approaches [30, 31]. Beamforming methods estimate DOA by scanning candidate angles and evaluating an output power criterion, offering relatively low implementation complexity but limited resolution under closely spaced sources or strong interference [32]. Subspace-based algorithms such as MUSIC and ESPRIT achieve super-resolution by exploiting the eigen-structure of the spatial covariance matrix; however, they are known to be sensitive to practical factors such as low SNR, limited snapshots, model mismatch, and source coherence, which can reduce robustness in challenging environments [33, 34]. Sparse methods cast DOA estimation as a sparse recovery problem and can be effective with limited snapshots, but their performance depends on dictionary resolution and regularization choices [35].
Since the above principles are well established, we briefly emphasize a recent trend that is highly relevant to practical deployment: efficient real-time and hardware-oriented implementations of classical DOA/beamforming techniques. Recent works report FPGA/parallel architectures for MUSIC-like processing to reduce latency and resource usage, including dedicated acceleration of key blocks and high-level-synthesis (HLS) implementations. Processor-oriented ESPRIT implementations have also been proposed to enable scalable and efficient DOA estimation on digital hardware [36]. For beamforming, practical system-level implementations of adaptive methods on FPGA-based digital beamforming receivers have been investigated to meet real-time constraints [37]. In addition, phase-comparison AoA estimation has attracted renewed interest due to its hardware friendliness and reconfigurable full-digital architectures [38].
Despite these advances in efficient implementations, traditional pipelines remain fundamentally limited by their sensitivity in low-SNR and data-scarce regimes. This limitation motivates hybrid data-driven frameworks that explicitly enhance the received signals prior to DOA estimation, which is the enhancement–estimation paradigm adopted in this work.
2.2. Application of Deep Learning in DOA Estimation
The application of deep learning to DOA estimation represents a paradigm shift, leveraging neural networks to learn direct mappings from array data to source directions and thereby circumventing the stringent statistical assumptions of classical methods. Deep learning-based DOA estimators can be categorized by network architecture, input representation, and learning strategy, each suited to specific operational scenarios [39, 40, 41]. Among these, CNNs have been widely adopted due to their ability to extract spatial features from structured representations of array data. A common approach treats the spatial covariance matrix as a two-dimensional image, where the fundamental mapping learned by the network can be expressed as
$$\hat{\theta} = f_{\mathrm{CNN}}(\hat{\mathbf{R}}), \quad \hat{\mathbf{R}} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{x}(k)\,\mathbf{x}^{H}(k),$$
with $\mathbf{x}(k) \in \mathbb{C}^{N}$ being the array snapshot at time k and N the number of sensors. To preserve the phase information critical for DOA, complex-valued CNNs have been developed that employ complex convolution operations. The output feature map can be written as
$$\mathbf{Y} = \sigma(\mathbf{W} * \mathbf{X} + \mathbf{b}),$$
where ∗ denotes complex convolution, and $\mathbf{W}$ and $\mathbf{b}$ are complex-valued convolution kernels and bias terms, respectively. Here, $\sigma(\cdot)$ denotes a nonlinear activation function. In this work, we adopt a split activation applied independently to the real and imaginary parts, i.e.,
$$\sigma(z) = \phi(\Re(z)) + j\,\phi(\Im(z)),$$
where $\phi(\cdot)$ is a standard real-valued nonlinearity (e.g., ReLU), $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary components, and $j = \sqrt{-1}$.
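The complex convolution and split activation described above can be sketched in NumPy. This is a minimal illustration under stated assumptions: it uses a 1D convolution via `np.convolve` (true convolution, whereas deep-learning frameworks typically implement cross-correlation), and the function names are hypothetical. It shows how a complex convolution decomposes into four real-valued convolutions and how a split ReLU acts on the real and imaginary parts separately.

```python
import numpy as np


def complex_conv_via_real(x, w):
    """Complex 1D convolution built from four real-valued convolutions."""
    xr, xi = x.real, x.imag
    wr, wi = w.real, w.imag
    # (wr + j wi) * (xr + j xi) = (wr*xr - wi*xi) + j (wi*xr + wr*xi)
    real = np.convolve(xr, wr, mode="valid") - np.convolve(xi, wi, mode="valid")
    imag = np.convolve(xr, wi, mode="valid") + np.convolve(xi, wr, mode="valid")
    return real + 1j * imag


def split_relu(z):
    """Apply ReLU independently to the real and imaginary parts."""
    z = np.asarray(z, dtype=complex)
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)


rng = np.random.default_rng(0)
x = rng.standard_normal(16) + 1j * rng.standard_normal(16)  # toy complex input
w = rng.standard_normal(3) + 1j * rng.standard_normal(3)    # toy complex kernel
b = 0.1 - 0.2j                                              # toy complex bias
y = split_relu(complex_conv_via_real(x, w) + b)
```

A split activation treats the two parts as independent real channels, so it does not preserve the phase of each individual activation; this is a commonly used trade-off in complex-valued networks.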
For scenarios involving moving sources or sequential snapshots, recurrent architectures such as recurrent neural networks (RNNs) and Long Short-Term Memory (LSTM) networks are employed to capture temporal dependencies [42]. The state update and DOA prediction at time t are formulated as
$$\mathbf{h}_t = f(\mathbf{h}_{t-1}, \mathbf{x}_t), \quad \hat{\theta}_t = g(\mathbf{h}_t),$$
where $\mathbf{h}_t$ is the hidden state, $\mathbf{x}_t$ is the input at time t, and $f$ and $g$ denote the recurrent update and output mappings. Bidirectional RNNs further incorporate future context to improve accuracy, especially in offline batch processing [43]. A more recent trend emphasizes end-to-end learning frameworks that bypass explicit covariance computation altogether, mapping raw array data directly to DOA estimates. While eliminating the need for statistical preprocessing, such models demand large datasets and careful architectural design to implicitly learn spatial correlations.
To address the pervasive challenge of data scarcity, transfer learning techniques have been explored to adapt models trained on simulated data to real-world conditions [44]. This is typically achieved through composite loss functions that add a domain-adaptation term to the DOA estimation loss. Despite mitigating the simulation-to-reality gap, these approaches still struggle under low-SNR conditions or when confronted with unseen array geometries.
Notwithstanding significant progress, deep learning-based DOA methods continue to face several unresolved challenges, including pronounced sensitivity to noise, limited generalization across different array configurations, high computational costs for training complex networks, and a general lack of interpretability inherent to black-box models [45, 46]. These limitations underscore the necessity for hybrid approaches that synergistically integrate established signal processing principles with the representational power of deep learning. The proposed GAN-CNN framework directly responds to this need by introducing a dedicated signal enhancement stage prior to estimation, thereby addressing the critical issue of noise robustness and paving the way for more reliable DOA systems in practical low-SNR environments.
2.3. GAN Model
Generative Adversarial Networks have been widely adopted in signal processing due to their adversarial learning mechanism [47, 48, 49]. A GAN consists of a generator (G) and a discriminator (D) trained in a minimax game:
$$\min_{G}\max_{D} V(D, G) = \mathbb{E}_{\mathbf{x}\sim p_{\mathrm{data}}}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}}[\log(1 - D(G(\mathbf{z})))],$$
where $\mathbf{x}$ denotes real/true samples drawn from the data distribution, and $\mathbf{z}$ is the latent input. In this work, each sample corresponds to complex-valued baseband array snapshots. To enable standard deep-learning operations, we represent complex snapshots by two real-valued channels holding the real and imaginary parts, each of temporal length L.
In signal enhancement settings, conditional GANs are particularly suitable because the generator learns a supervised mapping from noisy observations to clean targets:
$$\hat{\mathbf{x}} = G(\mathbf{y}) \approx \mathbf{x}.$$
Here, the noisy observation $\mathbf{y}$ and the clean target $\mathbf{x}$ are complex array-snapshot sequences represented by their real/imaginary channels. From a physical perspective, G aims to suppress additive noise while preserving inter-sensor phase relations that encode the DOA information. In our implementation, 1D convolutions are applied along the temporal dimension to capture local time structures in the received signals, while the discriminator employs adaptive pooling to aggregate temporal evidence into a compact decision statistic for distinguishing enhanced signals from clean references [50, 51, 52, 53].
Regarding array calibration, the training data are assumed to be obtained from a calibrated array model. To mitigate potential residual phase offsets after enhancement, a phase alignment step is applied before covariance construction in the preprocessing pipeline. Incorporating explicit sensor gain mismatch modeling via data augmentation or calibration-aware regularization is an important extension and will be considered in future work.
Our implementation builds upon the conditional GAN framework with key enhancements specifically designed for DOA signal processing requirements. The architecture incorporates multi-scale feature extraction through hierarchical downsampling blocks to capture both local and global signal characteristics. An integrated attention mechanism selectively emphasizes temporally significant components, while a symmetric decoder with skip connections ensures the preservation of critical phase information and temporal resolution.
The discriminator employs a computationally efficient design utilizing strided convolutions and adaptive pooling, maintaining strong discriminative capability while minimizing computational overhead. The training process incorporates a composite loss function that combines adversarial training with magnitude reconstruction and phase consistency constraints. Notably, the phase consistency loss explicitly addresses the preservation of spatial phase relationships essential for accurate DOA estimation.
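To make the idea of combining magnitude reconstruction with a phase consistency constraint concrete, the following NumPy sketch shows one common way to write such a composite reconstruction loss. It is an assumption-laden illustration, not the loss used in this work: the L1 magnitude term, the `1 - cos` phase term, the weights `w_mag` and `w_phase`, and the function name are all hypothetical, and the adversarial term is omitted.

```python
import numpy as np


def reconstruction_loss(clean, enhanced, w_mag=1.0, w_phase=1.0, eps=1e-8):
    """Hypothetical magnitude + phase-consistency loss for complex signals."""
    # Magnitude term: mean absolute error between envelopes.
    mag_loss = np.mean(np.abs(np.abs(enhanced) - np.abs(clean)))
    # Phase term: 1 - cos(phase difference), computed from the normalized
    # inner product so no explicit (wrapped) angle subtraction is needed.
    cos_diff = np.real(enhanced * np.conj(clean)) / (
        np.abs(enhanced) * np.abs(clean) + eps
    )
    phase_loss = np.mean(1.0 - cos_diff)
    total = w_mag * mag_loss + w_phase * phase_loss
    return total, mag_loss, phase_loss


rng = np.random.default_rng(0)
clean = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(4, 64)))  # unit-modulus toy signal
_, mag_same, phase_same = reconstruction_loss(clean, clean)
_, mag_flip, phase_flip = reconstruction_loss(clean, -clean)   # phase rotated by pi
```

In this toy case a pure phase flip leaves the magnitude term at zero while the phase term reaches its maximum of about 2, which illustrates why a magnitude-only loss cannot protect the inter-sensor phase relations needed for DOA estimation.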
Our implementation employs a two-phase training approach. First, generator pre-training uses only the reconstruction losses to provide stable initialization. Second, adversarial fine-tuning involves joint optimization of generator and discriminator with the complete loss function. This approach, combined with dynamic weight adjustment and gradient clipping, stabilizes training convergence and helps prevent the mode collapse issues common in GAN training.
The proposed GAN framework demonstrates significant advantages for DOA signal enhancement by simultaneously addressing magnitude reconstruction and phase preservation, while maintaining computational efficiency suitable for real-time applications.
4. Simulations and Performance Evaluation
4.1. Experimental Setup
4.1.1. Datasets and Parameters
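We consider a single narrowband far-field source impinging on a uniform linear array (ULA) with M sensors and inter-element spacing $d$. The complex baseband snapshot at time index k is modeled as
$$\mathbf{x}(k) = \mathbf{a}(\theta)\,s(k) + \mathbf{n}(k),$$
where $\mathbf{a}(\theta)$ is the steering vector and $s(k)$ is the source waveform. For the ULA, we use
$$\mathbf{a}(\theta) = \left[1,\ e^{-j 2\pi d \sin\theta/\lambda},\ \ldots,\ e^{-j 2\pi (M-1) d \sin\theta/\lambda}\right]^{T},$$
with $\lambda$ the carrier wavelength. The noise $\mathbf{n}(k)$ is modeled as spatially and temporally white complex Gaussian noise, unless otherwise stated in the non-Gaussian noise experiments.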
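The single-source ULA snapshot model above can be simulated with a short NumPy sketch. The half-wavelength spacing, the phase sign convention, the unit-power circular Gaussian source, and the function names are assumptions made for illustration; the paper's exact generator settings are not reproduced here.

```python
import numpy as np


def ula_steering(theta_deg, num_sensors, spacing_wavelengths=0.5):
    """ULA steering vector; spacing is given in wavelengths (assumed 0.5)."""
    m = np.arange(num_sensors)
    phase = 2.0 * np.pi * spacing_wavelengths * m * np.sin(np.deg2rad(theta_deg))
    return np.exp(-1j * phase)


def simulate_snapshots(theta_deg, num_sensors, num_snapshots, snr_db, rng):
    """Return (noisy, clean) snapshot matrices of shape (sensors, snapshots)."""
    a = ula_steering(theta_deg, num_sensors)
    # Zero-mean, unit-power complex source waveform.
    s = (rng.standard_normal(num_snapshots)
         + 1j * rng.standard_normal(num_snapshots)) / np.sqrt(2.0)
    clean = np.outer(a, s)
    # Noise variance chosen so that per-sensor SNR equals snr_db.
    noise_var = 10.0 ** (-snr_db / 10.0)
    noise = np.sqrt(noise_var / 2.0) * (
        rng.standard_normal(clean.shape) + 1j * rng.standard_normal(clean.shape)
    )
    return clean + noise, clean


rng = np.random.default_rng(1)
noisy, clean = simulate_snapshots(theta_deg=30.0, num_sensors=8,
                                  num_snapshots=5000, snr_db=10.0, rng=rng)
```

Returning the clean component alongside the noisy one is convenient for building (noisy, clean) training pairs for a supervised enhancement stage.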
Within each sample, the source DOA is assumed constant during the K snapshots, which is consistent with common DOA benchmarks and with the covariance-based processing adopted by both classical and learning-based baselines. Across different samples, $\theta$ is varied according to the predefined angle grid to form a labeled dataset.
A total of 9050 samples are generated. Each sample consists of K snapshots collected under a fixed DOA angle and a fixed SNR setting. The source waveform is generated as a zero-mean unit-power complex random sequence to emulate unknown narrowband emissions, and the SNR is controlled by adjusting the noise variance.
The simulations assume ideal sensor responses after standard array calibration. This provides a controlled baseline to isolate the impact of noise and snapshot scarcity. We additionally discuss practical non-idealities as future work and note that these effects can be incorporated by perturbing the steering vector $\mathbf{a}(\theta)$ or by applying per-sensor complex gains in the simulation pipeline.
4.1.2. Implementation Details
The models are implemented using PyTorch 1.9.0 and trained on NVIDIA RTX 4070 Ti GPUs. The GAN component is trained for 100 epochs with a batch size of 32, using the Adam optimizer with separate initial learning rates for the generator and discriminator. The CNN-based DOA estimator is trained for 50 epochs with a batch size of 128, using the Adam optimizer with step-decay learning-rate scheduling. The training–validation–test split follows an 80-10-10 ratio with stratified sampling to maintain angle distribution consistency.
4.1.3. Baseline Methods
The proposed method is compared with several established approaches. The conventional MUSIC algorithm serves as a fundamental baseline. In our implementation, MUSIC uses the sample covariance matrix computed from K snapshots and performs an eigen-decomposition to obtain the noise subspace. To mitigate coherence effects, spatial smoothing is applied when needed. The MUSIC pseudo-spectrum is evaluated over a uniform angular grid. To ensure a fair comparison with the classification-based CNN outputs, continuous MUSIC estimates are mapped to the nearest integer degree before computing the accuracy and RMSE.
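The MUSIC baseline steps listed above (sample covariance, eigen-decomposition, noise subspace, grid search, rounding to the nearest integer degree) can be sketched as follows. This is a minimal single-source illustration, not the authors' implementation: the half-wavelength spacing, the grid from -90 to 90 degrees with a 0.1 degree step, and all function names are assumptions, and spatial smoothing is omitted.

```python
import numpy as np


def ula_steering(theta_deg, num_sensors, spacing_wavelengths=0.5):
    """Steering matrix of shape (num_sensors, num_angles) for a ULA."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    m = np.arange(num_sensors)[:, None]
    return np.exp(-2j * np.pi * spacing_wavelengths * m * np.sin(theta)[None, :])


def music_doa(snapshots, num_sources=1, grid_deg=None):
    """Return (integer-degree estimate, pseudo-spectrum, grid) for one source."""
    num_sensors, num_snapshots = snapshots.shape
    if grid_deg is None:
        grid_deg = np.linspace(-90.0, 90.0, 1801)  # assumed range and step
    cov = snapshots @ snapshots.conj().T / num_snapshots
    # eigh returns eigenvalues in ascending order, so the first columns
    # span the noise subspace.
    _, eigvecs = np.linalg.eigh(cov)
    noise_subspace = eigvecs[:, : num_sensors - num_sources]
    projection = noise_subspace.conj().T @ ula_steering(grid_deg, num_sensors)
    spectrum = 1.0 / np.maximum(np.sum(np.abs(projection) ** 2, axis=0), 1e-12)
    peak_deg = grid_deg[np.argmax(spectrum)]
    return int(np.round(peak_deg)), spectrum, grid_deg


# Synthetic check: one source at 20 degrees, 8 sensors, 200 snapshots, 20 dB SNR.
rng = np.random.default_rng(0)
M, K, true_deg, snr_db = 8, 200, 20.0, 20.0
s = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2.0)
noise_std = np.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)
noise = noise_std * (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))
X = ula_steering(true_deg, M) @ s[None, :] + noise
estimate_deg, spectrum, grid = music_doa(X)
```

Taking the global maximum is sufficient only for the single-source case considered here; multiple sources would require a peak-picking step over the pseudo-spectrum.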
ESPRIT provides another classical benchmark. ESPRIT is implemented using two overlapping subarrays with one-sensor displacement to exploit rotational invariance, and it estimates DOA directly from the eigenvalues without grid search. The number of sources is set consistently with the simulation setup to avoid bias from model-order mismatch.
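The ESPRIT baseline described above (two overlapping subarrays displaced by one sensor, DOA from eigenvalues without a grid search) can be sketched in NumPy as follows. This is a least-squares ESPRIT illustration under assumptions: half-wavelength spacing, a steering phase convention of exp(-j 2 pi d m sin(theta) / lambda), and hypothetical function names.

```python
import numpy as np


def esprit_doa(snapshots, num_sources=1, spacing_wavelengths=0.5):
    """Least-squares ESPRIT for a ULA; returns sorted DOA estimates in degrees."""
    num_sensors, num_snapshots = snapshots.shape
    cov = snapshots @ snapshots.conj().T / num_snapshots
    # eigh sorts eigenvalues ascending: the last columns span the signal subspace.
    _, eigvecs = np.linalg.eigh(cov)
    signal_subspace = eigvecs[:, num_sensors - num_sources:]
    # Overlapping subarrays: sensors 0..M-2 and 1..M-1.
    upper, lower = signal_subspace[:-1], signal_subspace[1:]
    rotation = np.linalg.lstsq(upper, lower, rcond=None)[0]
    phases = np.angle(np.linalg.eigvals(rotation))
    # With the assumed convention, the rotation phase is -2*pi*d*sin(theta)/lambda.
    sin_theta = -phases / (2.0 * np.pi * spacing_wavelengths)
    return np.sort(np.rad2deg(np.arcsin(np.clip(sin_theta, -1.0, 1.0))))


# Synthetic check: one source at -35 degrees, 8 sensors, 200 snapshots, 20 dB SNR.
rng = np.random.default_rng(0)
M, K, true_deg, snr_db = 8, 200, -35.0, 20.0
a = np.exp(-2j * np.pi * 0.5 * np.arange(M) * np.sin(np.deg2rad(true_deg)))
s = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2.0)
noise_std = np.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)
noise = noise_std * (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))
X = np.outer(a, s) + noise
esprit_estimate = esprit_doa(X)
```

Because the estimate comes directly from the eigenvalue phase, the runtime does not depend on an angular grid resolution, in contrast to a MUSIC spectral search.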
For deep learning-based comparisons, a standard Convolutional Neural Network (CNN-Based) architecture without GAN enhancement is implemented to isolate the contribution of the proposed signal enhancement stage. This baseline employs identical input processing and output layers to ensure fair comparison, but processes the original noisy signals directly without any enhancement. Additionally, a Deep Residual Network (ResNet-Based) with identical architecture depth and parameter count is included to evaluate the impact of advanced network architectures alone, without the integrated enhancement framework. Together, these baselines cover both traditional algorithmic approaches and modern deep learning paradigms.
4.2. Performance Metrics
To evaluate the performance of the proposed framework across different dimensions, this section employs multiple quantitative metrics. The quality of signal enhancement is quantitatively assessed through the SNR Improvement:
$$\Delta\mathrm{SNR} = \mathrm{SNR}_{\mathrm{out}} - \mathrm{SNR}_{\mathrm{in}} \quad (\mathrm{dB}).$$
This metric directly reflects the noise suppression capability of the GAN-based enhancement module and indicates how effectively the preprocessing stage improves signal quality for subsequent DOA estimation. For the core DOA estimation performance, the evaluation employs DOA Estimation Accuracy, defined as:
$$\mathrm{Acc} = \frac{1}{N_{\mathrm{test}}}\sum_{i=1}^{N_{\mathrm{test}}} \mathbb{1}\left(\hat{\theta}_i = \theta_i\right) \times 100\%.$$
This metric measures the percentage of exactly correct angle predictions across the entire test dataset, providing an intuitive understanding of the overall system performance while facilitating direct comparison with classification-based approaches in the literature. To provide a more nuanced assessment of estimation precision, the RMSE is introduced:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N_{\mathrm{test}}}\sum_{i=1}^{N_{\mathrm{test}}}\left(\hat{\theta}_i - \theta_i\right)^{2}}.$$
This metric is the root of the mean squared estimation error and therefore penalizes larger deviations more severely, offering a view of estimation consistency across all test samples. In summary, these complementary metrics assess the system along two key dimensions: signal enhancement quality and final DOA estimation performance.
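The three metrics described above can be computed as in the following NumPy sketch. The function names are hypothetical, and measuring SNR against a known clean reference signal is an assumption that holds for simulated data; the toy numbers are illustrative only.

```python
import numpy as np


def snr_db(clean, observed):
    """SNR in dB of `observed` relative to the clean reference signal."""
    residual = observed - clean
    return 10.0 * np.log10(np.sum(np.abs(clean) ** 2) / np.sum(np.abs(residual) ** 2))


def snr_improvement_db(clean, noisy, enhanced):
    """Output SNR minus input SNR, in dB."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)


def doa_accuracy(true_deg, pred_deg):
    """Percentage of exactly correct angle predictions."""
    return 100.0 * np.mean(np.asarray(true_deg) == np.asarray(pred_deg))


def doa_rmse(true_deg, pred_deg):
    """Root mean square error in degrees."""
    err = np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float)
    return np.sqrt(np.mean(err ** 2))


# Toy example with made-up labels and predictions.
true_angles = [10, 20, 30, 40]
pred_angles = [10, 21, 30, 38]
acc = doa_accuracy(true_angles, pred_angles)   # two of four exact matches
rmse = doa_rmse(true_angles, pred_angles)      # sqrt((0 + 1 + 0 + 4) / 4)

clean_sig = np.ones(4, dtype=complex)
gain = snr_improvement_db(clean_sig, clean_sig + 0.5, clean_sig + 0.25)
```

Halving the residual amplitude, as in the toy signal, corresponds to an SNR improvement of about 6 dB.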
4.3. Results and Analysis
4.3.1. Signal Enhancement Performance
The GAN-based enhancement module demonstrates remarkable noise suppression capabilities across various SNR conditions with 500 snapshots. As shown in Table 1, the proposed method achieves significant SNR improvement, particularly in low-SNR scenarios where traditional methods struggle. The consistent performance gain across all SNR levels underscores the robustness of the proposed attention-enhanced GAN architecture in extracting meaningful signal components from noisy observations.
To visually demonstrate the enhancement performance in the time domain, Figure 4 and Figure 5 present the amplitude waveforms of the noisy input signal and the GAN-enhanced output signal, respectively. Figure 4 exhibits significant amplitude fluctuations and distortion due to additive noise contamination, while Figure 5 shows restored signal integrity with smooth amplitude variations and preserved temporal structure, clearly illustrating the noise suppression capability of the proposed method.
The frequency-domain analysis through spectrograms provides further insight into the enhancement performance. Figure 6 displays the Short-Time Fourier Transform (STFT) representation of the noisy signal, showing widespread noise components across the frequency spectrum. In contrast, Figure 7 shows the cleaned spectrogram after GAN processing, exhibiting concentrated signal energy in the relevant frequency bands with a significantly reduced noise floor. This comparison indicates effective frequency-domain denoising while maintaining the essential spectral characteristics.
The phase preservation capability is quantitatively evaluated through phase coherence measurements. The proposed method maintains a lower average phase error than conventional denoising methods, confirming its effectiveness in preserving spatial information crucial for DOA estimation. This phase preservation is attributed to the dedicated phase consistency loss function integrated into the GAN training process, which explicitly optimizes for phase accuracy alongside magnitude reconstruction. The visual evidence from both time and frequency domains corroborates the quantitative results.
4.3.2. DOA Estimation Accuracy Under Varying SNR Conditions with 500 Snapshots
The end-to-end performance of the proposed framework is evaluated under various SNR conditions with 500 snapshots, as summarized in Table 2 and visually depicted in Figure 8. The proposed method consistently outperforms all baseline approaches across the entire SNR range from −10 dB to 20 dB, with particularly notable advantages in challenging low-SNR regimes.
The accuracy improvement is most pronounced at −10 dB SNR, where the proposed method achieves 72.2% accuracy compared to 65.7% for the best baseline method, an absolute gain of 6.5 percentage points (about 9.9% in relative terms). This demonstrates the framework’s robustness in extremely challenging low-SNR environments. The consistent performance advantage across the entire SNR spectrum, as visualized in Figure 8, validates the effectiveness of the GAN-enhanced preprocessing stage in mitigating noise impacts on the subsequent DOA estimation.
4.3.3. Performance Analysis Under Different Snapshot Numbers at 20 dB SNR
To evaluate the framework’s capability in scenarios with varying data availability, comprehensive experiments are conducted with different snapshot numbers at 20 dB SNR, as detailed in Table 3 and illustrated in Figure 9. The proposed method demonstrates remarkable resilience to snapshot reduction, maintaining competitive performance even with severely limited temporal samples.
Notably, with only 50 snapshots at 20 dB SNR, the proposed method achieves 93.8% accuracy, significantly outperforming MUSIC (71.3%) and demonstrating a 3.5% relative improvement over the best deep learning baseline (ResNet-Based: 90.6%). The performance advantage is maintained across all snapshot configurations, with the proposed method achieving 98.9% accuracy with 500 snapshots. As evident from Figure 9, this robust performance under limited data conditions highlights the GAN enhancement’s ability to recover meaningful signal components from constrained temporal observations.
4.3.4. Root Mean Square Error Analysis Under Varying Conditions
The RMSE provides a comprehensive evaluation of estimation precision across different operational scenarios. Table 4 and Figure 10 present the RMSE performance under varying SNR conditions with 500 snapshots, demonstrating the proposed method’s superior estimation accuracy particularly in challenging low-SNR environments.
The proposed method achieves the lowest RMSE across all SNR levels, with particularly significant improvements in low-SNR conditions. At −10 dB SNR, the RMSE of 3.9° represents a 17.0% reduction compared to the best baseline method (ResNet-Based: 4.7°) and a 69.5% improvement over traditional MUSIC (12.8°). This substantial error reduction, clearly visible in Figure 10, underscores the effectiveness of the GAN enhancement in mitigating noise-induced estimation errors and improving overall estimation consistency.
Further analysis of RMSE under different snapshot numbers at 20 dB SNR, as shown in Table 5 and Figure 11, reveals the framework’s robustness to limited data availability.
With only 50 snapshots at 20 dB SNR, the proposed method achieves an RMSE of 1.1°, representing a 26.7% improvement over the best baseline (ResNet-Based: 1.5°) and a 73.8% improvement over MUSIC (4.2°). As the number of snapshots increases to 500, the RMSE improves to 0.4°, maintaining a consistent advantage over all comparison methods. The progressive improvement in estimation precision with increasing snapshot numbers is clearly demonstrated in Figure 11, highlighting the framework’s ability to extract reliable spatial information even from severely limited temporal data, a characteristic attributed to the signal enhancement stage’s capacity to recover meaningful components from noisy observations.
To provide a theoretical performance limit for the considered DOA estimation problem, we additionally report the Cramér–Rao Bound (CRB) as a baseline in Figure 10 and Figure 11. The CRB gives a lower bound on the variance of any unbiased estimator under the assumed statistical model, and therefore serves as a reference for gauging how far practical methods are from the best achievable accuracy.
We compute the CRB under the same narrowband far-field single-source ULA model used in our dataset generation, i.e., $\mathbf{x}(k) = \mathbf{a}(\theta)\,s(k) + \mathbf{n}(k)$ with spatially and temporally white complex Gaussian noise $\mathbf{n}(k)$. Following the standard stochastic CRB formulation, the Fisher information for $\theta$ is obtained from the derivative of the steering vector, $\partial\mathbf{a}(\theta)/\partial\theta$. The bound is then plotted as $\sqrt{\mathrm{CRB}(\theta)}$ to match the RMSE metric used in our evaluation.
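For reference, the stochastic CRB for a single source on a ULA can be evaluated numerically as sketched below. The sketch uses the standard stochastic CRB expression, with the derivative of the steering vector projected onto the orthogonal complement of the steering vector; the half-wavelength spacing, unit noise variance, and function name are assumptions, and the result is returned as a standard deviation in degrees so that it is comparable to an RMSE.

```python
import numpy as np


def stochastic_crb_deg(theta_deg, num_sensors, num_snapshots, snr_db,
                       spacing_wavelengths=0.5):
    """Square root of the single-source stochastic CRB, in degrees."""
    theta = np.deg2rad(theta_deg)
    m = np.arange(num_sensors)
    c = 2.0 * np.pi * spacing_wavelengths
    a = np.exp(-1j * c * m * np.sin(theta))          # steering vector
    da = -1j * c * m * np.cos(theta) * a             # derivative w.r.t. theta (rad)
    sigma2 = 1.0                                     # noise variance (assumed unit)
    p = sigma2 * 10.0 ** (snr_db / 10.0)             # source power
    cov = p * np.outer(a, a.conj()) + sigma2 * np.eye(num_sensors)
    proj_perp = np.eye(num_sensors) - np.outer(a, a.conj()) / num_sensors
    h = np.real(da.conj() @ proj_perp @ da)
    g = np.real(p * (a.conj() @ np.linalg.solve(cov, a)) * p)
    crb_var_rad2 = sigma2 / (2.0 * num_snapshots * h * g)
    return np.rad2deg(np.sqrt(crb_var_rad2))


crb_500 = stochastic_crb_deg(theta_deg=20.0, num_sensors=8,
                             num_snapshots=500, snr_db=10.0)
crb_50 = stochastic_crb_deg(theta_deg=20.0, num_sensors=8,
                            num_snapshots=50, snr_db=10.0)
```

The bound scales as one over the square root of the snapshot number, so reducing the snapshots from 500 to 50 raises it by a factor of about 3.16. Note that the CRB applies to unbiased continuous-valued estimators, so comparisons with classifiers that output integer-degree labels should be read with that caveat.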
The comprehensive experimental results, supported by both tabular data and visual representations, demonstrate that the proposed method exhibits graceful performance degradation under adverse conditions, whereas traditional methods suffer from rapid performance deterioration. This robust behavior makes the framework particularly suitable for practical applications where both noise contamination and data limitations are common challenges.
4.3.5. Robustness Analysis Under Various Non-Gaussian Noise Conditions
To evaluate the framework’s robustness in practical scenarios where noise often deviates from ideal Gaussian characteristics, additional experiments are conducted under various non-Gaussian noise conditions at 10 dB SNR with 500 snapshots. Three representative non-Gaussian noise types are considered, each with distinct statistical properties that challenge DOA estimation algorithms.
Laplacian Noise is modeled using the Laplace distribution, which represents heavy-tailed noise with higher kurtosis than Gaussian noise:
$$f(x) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right)$$
with location parameter $\mu$ and scale parameter $b$, generating noise with heavier tails that better model real-world impulsive interference scenarios.
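Uniformly Distributed Noise follows a continuous uniform distribution:
$$f(x) = \frac{1}{x_{\max}-x_{\min}}, \quad x_{\min} \le x \le x_{\max}$$
with lower bound $x_{\min}$ and upper bound $x_{\max}$, representing bounded-amplitude noise commonly encountered in quantization and clipping scenarios.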
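Mixture Gaussian Noise combines two zero-mean Gaussian components:
$$f(x) = \epsilon\,\mathcal{N}(x;\,0,\sigma_1^2) + (1-\epsilon)\,\mathcal{N}(x;\,0,\sigma_2^2)$$
with mixing coefficient $\epsilon$ and variances $\sigma_1^2$ and $\sigma_2^2$, simulating noise with multiple variance components that may occur in multi-source interference environments.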
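The three noise models can be simulated along the following lines. This is a minimal sketch: the parameter values (unit Laplace scale, uniform bounds of ±1, `eps`, `sigma1`, `sigma2`) are illustrative placeholders rather than the settings used in our experiments, and the helper names are hypothetical. Each realization is normalized to unit power so that all noise types can be scaled to the same SNR.

```python
import numpy as np

def unit_power(x):
    """Scale samples to unit average power so every noise type shares one SNR."""
    return x / np.sqrt(np.mean(np.abs(x) ** 2))

def make_noise(kind, shape, rng, eps=0.9, sigma1=1.0, sigma2=5.0):
    """Complex noise whose real and imaginary parts follow `kind`.

    All distribution parameters here are placeholders for illustration.
    """
    def draw():
        if kind == "laplacian":
            return rng.laplace(loc=0.0, scale=1.0, size=shape)
        if kind == "uniform":
            return rng.uniform(low=-1.0, high=1.0, size=shape)
        if kind == "mixture":
            pick = rng.random(shape) < eps           # choose component per sample
            narrow = rng.normal(0.0, sigma1, shape)
            wide = rng.normal(0.0, sigma2, shape)
            return np.where(pick, narrow, wide)
        if kind == "gaussian":
            return rng.normal(0.0, 1.0, shape)
        raise ValueError(f"unknown noise kind: {kind}")
    return unit_power(draw() + 1j * draw())

def excess_kurtosis(x):
    """Sample excess kurtosis (0 for Gaussian, positive for heavy tails)."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for kind in ("gaussian", "laplacian", "uniform", "mixture"):
        n = make_noise(kind, (8, 500), rng)
        print(kind, excess_kurtosis(n.real.ravel()))
```

The excess kurtosis makes the difference between the models visible: it is positive for the Laplacian and mixture cases (heavy tails) and negative for the uniform case (bounded amplitude).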
Table 6 presents the DOA estimation accuracy under different non-Gaussian noise conditions. The Laplacian noise, characterized by its heavier tails and higher probability of large-amplitude samples, poses particular challenges for traditional DOA estimation methods that assume Gaussian noise statistics.
As shown in
Table 6 and
Figure 12, the proposed method maintains superior performance across all non-Gaussian noise conditions. Under Laplacian noise, the GAN-CNN framework demonstrates remarkable robustness with only a 7.0% relative performance drop, while MUSIC suffers a 21.2% degradation. The relative performance analysis in
Figure 12b further confirms the framework’s minimal sensitivity to noise distribution variations. The robustness index is quantified as:
$$\mathrm{RI} = \frac{\mathrm{Acc}_{\mathrm{NG}}}{\mathrm{Acc}_{\mathrm{G}}} \times 100\%$$
where $\mathrm{Acc}_{\mathrm{G}}$ and $\mathrm{Acc}_{\mathrm{NG}}$ represent accuracy under Gaussian and non-Gaussian conditions, respectively. The proposed method achieves indices of 93.0%, 95.7%, and 97.7% for Laplacian, uniform, and mixture Gaussian noise, respectively, significantly outperforming MUSIC (78.7%, 85.5%, 91.1%).
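For clarity, the index can be read as the ratio of non-Gaussian to Gaussian accuracy expressed in percent, so that a 7.0% relative drop corresponds to an index of 93.0%. The short sketch below illustrates this relationship; the function names and the accuracy values in the usage lines are hypothetical and are not results from Table 6.

```python
def robustness_index(acc_gaussian, acc_non_gaussian):
    """Accuracy retained under non-Gaussian noise, in percent."""
    return 100.0 * acc_non_gaussian / acc_gaussian

def relative_drop(acc_gaussian, acc_non_gaussian):
    """Relative accuracy loss in percent (complement of the index)."""
    return 100.0 - robustness_index(acc_gaussian, acc_non_gaussian)

if __name__ == "__main__":
    # Hypothetical accuracies, chosen only to show the arithmetic.
    print(robustness_index(90.0, 81.0))
    print(relative_drop(90.0, 81.0))
```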
Further analysis of the RMSE under Laplacian noise conditions reveals consistent advantages. At 10 dB SNR, the proposed method achieves an RMSE of 1.2°, compared to 3.1° for MUSIC, 1.7° for CNN-Based, and 1.5° for ResNet-Based. This demonstrates the framework’s ability to mitigate noise outliers while maintaining estimation precision.
The experimental results validate the framework’s strong generalization capability in non-Gaussian environments, particularly under challenging heavy-tailed noise conditions. This robustness ensures practical utility in real-world applications where noise characteristics often deviate from Gaussian assumptions.
4.3.6. Computational Overhead and Practical Considerations
To avoid ambiguity, we emphasize that the end-to-end latency depends on both the algorithmic computation and the data acquisition window used to form the input representation. Accordingly, we report computational complexity and discuss TTFE primarily in terms of the snapshot number K (and the corresponding acquisition time for a given acquisition rate), rather than claiming a fixed millisecond value as representative for all real-time systems.
Subspace-based algorithms such as MUSIC/ESPRIT require forming the sample covariance matrix, followed by eigendecomposition, which typically scales as $O(M^2 K)$ for covariance accumulation and $O(M^3)$ for eigendecomposition with M sensors and K snapshots. MUSIC further introduces a spectral search over a grid of G angles, leading to additional cost proportional to G. These steps can dominate runtime when fine angular resolution is needed or when repeated estimations are required.
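The three cost contributions can be seen directly in a compact MUSIC sketch. The code below is a generic textbook-style illustration, not the baseline implementation used in our experiments; the half-wavelength ULA, the 1° grid, and the helper names `ula_steering`, `simulate`, and `music_doa` are assumptions of this example.

```python
import numpy as np

def ula_steering(theta_deg, M, d=0.5):
    """Steering vectors of an M-element ULA (spacing d in wavelengths), shape (M, G)."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    m = np.arange(M)[:, None]
    return np.exp(2j * np.pi * d * m * np.sin(theta)[None, :])

def simulate(theta_deg, M, K, snr_db, rng):
    """K snapshots of one source in white complex Gaussian noise (unit noise power)."""
    a = ula_steering(theta_deg, M)[:, 0]
    s = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
    n = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    return np.sqrt(10.0 ** (snr_db / 10.0)) * np.outer(a, s) + n

def music_doa(X, n_sources=1, grid_deg=None):
    """Return the grid angle that maximizes the MUSIC pseudo-spectrum."""
    if grid_deg is None:
        grid_deg = np.arange(-90, 91)             # G = 181 candidate angles
    M, K = X.shape
    R = X @ X.conj().T / K                        # covariance accumulation: O(M^2 K)
    _, V = np.linalg.eigh(R)                      # eigendecomposition: O(M^3)
    En = V[:, : M - n_sources]                    # noise subspace (smallest eigenvalues)
    A = ula_steering(grid_deg, M)
    denom = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    spectrum = 1.0 / (denom + 1e-12)              # grid search: cost grows with G
    return float(grid_deg[np.argmax(spectrum)]), spectrum

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = simulate(20.0, M=8, K=200, snr_db=20.0, rng=rng)
    print(music_doa(X)[0])
```

Refining the grid from 1° to 0.1° multiplies the search cost by ten while leaving the covariance and eigendecomposition costs unchanged, which is why the grid term tends to dominate when fine angular resolution is required.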
CNN/ResNet-based estimators shift the main computational cost to offline training. At inference time, the complexity is dominated by a fixed number of convolution and fully connected operations, resulting in near-constant latency once the input representation is available. Similar to classical methods, if the input is the covariance matrix, a minimum number of snapshots is still required before the first estimate can be produced.
Compared to a CNN-only estimator, our method introduces an additional enhancement module before covariance construction. Hence, the online overhead mainly comes from one extra 1D convolutional encoder–decoder pass per received snapshot block. However, the enhancement stage is fully parallelizable on GPUs and can also be efficiently implemented on edge accelerators due to its convolutional structure. Importantly, we hypothesize that this additional overhead is compensated by improved robustness at low SNR and under limited snapshots: the GAN-based enhancement suppresses noise while the phase-consistent loss encourages preservation of spatial phase relations, which are crucial for DOA inference. Consequently, the subsequent CNN receives a higher-quality covariance representation, leading to the observed accuracy and RMSE gains in challenging regimes. In summary, the proposed approach trades a moderate additional inference cost for a significant robustness improvement in low-SNR and data-scarce scenarios. The training time comparison among different methods is summarized in
Table 7.
4.4. Extension to Multi-Source Scenarios and Lightweight Phase-Interferometry Considerations
To address practical environments where multiple sources may coexist, we discuss how the proposed two-stage enhancement–estimation pipeline can be extended beyond the single-source setting. Importantly, the first-stage GAN in our framework operates on the complex-valued array snapshots to suppress noise while preserving phase relations; therefore, it remains applicable in multi-source conditions, as it does not rely on a single-source assumption. The principal modifications are implemented at the second stage, specifically concerning the DOA estimation network and its associated training and inference protocols.
For the common case of two simultaneous sources, the enhanced snapshots are first processed in the same manner as the single-source pipeline: the enhanced complex signals are used to form the sample covariance matrix, which is then fed into the DOA estimator. To enable two-source inference, the CNN is extended to a dual-head architecture: the feature extraction backbone is shared, while the output layer is split into two parallel classification heads, each producing a probability distribution over the 181 discrete angles. Let $\mathbf{p}_1$ and $\mathbf{p}_2$ denote the predicted distributions from the two heads, and let $\mathbf{y}_1$ and $\mathbf{y}_2$ be the one-hot labels corresponding to the two ground-truth DOAs.
Since a two-source DOA target is an unordered set $\{\theta_1, \theta_2\}$, the learning objective should not enforce an artificial ordering between the heads. To explicitly address this permutation symmetry, we employ a permutation-invariant training objective:
$$\mathcal{L}_{\mathrm{PIT}} = \min\left\{\mathcal{L}_{\mathrm{CE}}(\mathbf{p}_1,\mathbf{y}_1) + \mathcal{L}_{\mathrm{CE}}(\mathbf{p}_2,\mathbf{y}_2),\; \mathcal{L}_{\mathrm{CE}}(\mathbf{p}_1,\mathbf{y}_2) + \mathcal{L}_{\mathrm{CE}}(\mathbf{p}_2,\mathbf{y}_1)\right\}$$
where $\mathcal{L}_{\mathrm{CE}}$ is the standard cross-entropy loss. This formulation allows the estimator to learn two consistent angular peaks without requiring a predefined source order.
During inference, each head produces one DOA estimate by selecting the maximum-probability angle, i.e., $\hat{\theta}_1 = \arg\max_{\theta} \mathbf{p}_1(\theta)$ and $\hat{\theta}_2 = \arg\max_{\theta} \mathbf{p}_2(\theta)$. For evaluation, the predicted pair is matched to the ground truth by selecting the permutation that minimizes the total angular error.
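The permutation-invariant objective and the matched evaluation can be sketched as follows. This is an illustrative NumPy version operating on two probability vectors, not our training code; it assumes that the 181 classes correspond to a 1° grid from -90° to 90° and that labels are given as class indices (equivalent to one-hot labels for cross-entropy). All function names are hypothetical.

```python
import numpy as np

def cross_entropy(p, label_idx, eps=1e-12):
    """Cross-entropy against a one-hot label, i.e. -log of the labelled class."""
    return -np.log(p[label_idx] + eps)

def pit_loss(p1, p2, y1, y2):
    """Minimum summed cross-entropy over the two head-to-label assignments."""
    direct = cross_entropy(p1, y1) + cross_entropy(p2, y2)
    swapped = cross_entropy(p1, y2) + cross_entropy(p2, y1)
    return min(direct, swapped)

def predict_pair(p1, p2, grid):
    """Each head reports its maximum-probability angle."""
    return float(grid[np.argmax(p1)]), float(grid[np.argmax(p2)])

def matched_error(pred, truth):
    """Total angular error under the better of the two pairings."""
    (a, b), (t1, t2) = pred, truth
    return min(abs(a - t1) + abs(b - t2), abs(a - t2) + abs(b - t1))

if __name__ == "__main__":
    grid = np.arange(-90, 91)                     # assumed 1-degree class grid
    p1 = np.full(grid.size, 0.1 / 180); p1[30 + 90] = 0.9    # head 1 peaks at 30 deg
    p2 = np.full(grid.size, 0.1 / 180); p2[-10 + 90] = 0.9   # head 2 peaks at -10 deg
    print(pit_loss(p1, p2, -10 + 90, 30 + 90))    # labels listed in the other order
    print(matched_error(predict_pair(p1, p2, grid), (-10.0, 30.0)))
```

Because the minimum is taken over both assignments, swapping the order of the two labels leaves the loss unchanged, so neither head is tied to a particular source.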
When the number of active sources is unknown, one practical strategy is to augment the estimator with a confidence-based selection mechanism. For example, each head can output a confidence score based on its maximum probability, and only predictions exceeding a threshold are retained as valid sources. Alternatively, a separate lightweight classifier can be added to predict the source count from the same covariance representation, which then activates the corresponding number of output heads. We leave a full implementation and systematic evaluation of source-count estimation as future work, as it requires broader multi-source simulation settings and additional annotations.
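A confidence-based selection of this kind could look like the sketch below. It has not been implemented or evaluated in this work; the threshold value of 0.5, the 1° grid, and the function name `select_sources` are assumptions for illustration only.

```python
import numpy as np

def select_sources(head_probs, grid, threshold=0.5):
    """Keep a head's argmax angle only if its peak probability reaches `threshold`.

    Returns a list of (angle, confidence) pairs for the retained heads.
    """
    kept = []
    for p in head_probs:
        k = int(np.argmax(p))
        if p[k] >= threshold:
            kept.append((float(grid[k]), float(p[k])))
    return kept

if __name__ == "__main__":
    grid = np.arange(-90, 91)
    confident = np.full(grid.size, 0.2 / 180); confident[15 + 90] = 0.8
    flat = np.full(grid.size, 1.0 / grid.size)    # head with no clear peak
    print(select_sources([confident, flat], grid))
```

In practice the threshold would need to be calibrated on validation data, since softmax peak values are not guaranteed to be well-calibrated confidence estimates.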
Lightweight AoA techniques based on phase interferometry are attractive for real-time and hardware deployment due to their low computational footprint and simple arithmetic operations. However, in multi-source settings, the measured inter-sensor phase differences generally reflect a superposition of multiple contributors, which introduces ambiguity and often requires additional separation steps to recover multiple angles. These methods are typically efficient but can be more sensitive when sources are closely spaced, have unequal powers, or when phase wrapping becomes prominent under low SNR.
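To make the low computational footprint and the wrapping issue concrete, the following sketch implements a generic two-element phase interferometer for a single source. It is a textbook-style illustration rather than any specific method from the cited works; the baseline values and the function name `interferometer_aoa_deg` are assumptions of this example.

```python
import numpy as np

def interferometer_aoa_deg(x0, x1, d=0.5):
    """Single-source AoA from the phase difference of two sensors.

    d is the baseline in wavelengths; the estimate is unambiguous only
    while the true phase difference stays within (-pi, pi].
    """
    dphi = np.angle(np.mean(x1 * np.conj(x0)))            # averaged phase difference
    ratio = np.clip(dphi / (2.0 * np.pi * d), -1.0, 1.0)  # guard arcsin domain
    return float(np.rad2deg(np.arcsin(ratio)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    s = rng.standard_normal(64) + 1j * rng.standard_normal(64)
    # Half-wavelength baseline, source at 20 degrees: no wrapping.
    x1 = s * np.exp(2j * np.pi * 0.5 * np.sin(np.deg2rad(20.0)))
    print(interferometer_aoa_deg(s, x1, d=0.5))
    # 1.5-wavelength baseline, source at 60 degrees: the phase wraps.
    x1_long = s * np.exp(2j * np.pi * 1.5 * np.sin(np.deg2rad(60.0)))
    print(interferometer_aoa_deg(s, x1_long, d=1.5))
```

The estimator needs only one complex correlation and one inverse sine per estimate. With the longer baseline, however, the wrapped phase maps to a wrong angle unless an additional ambiguity-resolution step is applied.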
In contrast, the proposed GAN–CNN pipeline aims to improve robustness in challenging regimes by enhancing noisy array snapshots while explicitly encouraging phase consistency, thereby providing a higher-quality covariance representation to the DOA estimator. This design can be viewed as complementary to lightweight interferometric approaches: the enhancement stage targets noise suppression and phase preservation, while the estimation stage can be adapted to multi-source outputs via permutation-invariant learning. A hybrid integration is an interesting direction for future work.
5. Conclusions
This paper presents a novel two-stage GAN-CNN fusion framework for robust DOA estimation in low-SNR and data-limited environments. The proposed method integrates an attention-enhanced GAN for signal denoising with a complex-valued CNN for accurate spatial feature extraction, addressing key challenges in conventional and deep learning-based DOA estimation methods. The GAN component, equipped with a phase-consistent loss function, effectively suppresses noise while preserving spatial phase information essential for accurate direction finding. The subsequent complex-valued CNN processes enhanced covariance matrices to extract discriminative spatial features for precise DOA classification.
Extensive experimental evaluations demonstrate the superior performance of the proposed framework across a wide range of SNR conditions, snapshot numbers, and non-Gaussian noise environments. Specifically, the method achieves a DOA accuracy of 72.2% and an RMSE of 3.9° at −10 dB SNR with 500 snapshots, significantly outperforming traditional algorithms such as MUSIC and ESPRIT, as well as state-of-the-art deep learning baselines. The framework also exhibits strong robustness to limited data, maintaining 93.8% accuracy with only 50 snapshots at 20 dB SNR. Furthermore, the proposed approach demonstrates consistent performance under various non-Gaussian noise conditions, highlighting its practical applicability in real-world scenarios.
The results validate the effectiveness of the integrated enhancement-estimation paradigm and underscore the importance of phase-preserving denoising in DOA estimation. Future work will focus on extending the framework to more complex array geometries, exploring online adaptation mechanisms for dynamic environments, and investigating lightweight architectures for real-time deployment on embedded systems.