DOA Estimation Based on Circular-Attention Residual Network

Zhang, Min; Jiang, Hong; Li, Jia; Qu, Jianglong

doi:10.3390/app16020627

by Min Zhang ¹, Hong Jiang ¹,*, Jia Li ² and Jianglong Qu ²

¹ School of Information Engineering, Southwest University of Science and Technology, Mianyang 621010, China

² School of Manufacturing Science and Engineering, Southwest University of Science and Technology, Mianyang 621010, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(2), 627; https://doi.org/10.3390/app16020627

Submission received: 4 December 2025 / Revised: 30 December 2025 / Accepted: 2 January 2026 / Published: 7 January 2026

Abstract

Direction of arrival (DOA) estimation is a fundamental problem in array signal processing, with extensive applications in radar, communications, sonar, and other fields. Traditional DOA estimation methods, such as MUSIC and ESPRIT, rely on eigenvalue decomposition or spectral peak search, which suffer from high computational complexity and performance degradation under low signal-to-noise ratio (SNR), coherent signals, and array imperfections. Cylindrical arrays offer unique advantages for omnidirectional sensing due to their circular structure and three-dimensional coverage capability; however, their nonlinear array manifold makes estimation more difficult. This paper proposes a circular-attention residual network (CA-ResNet) for DOA estimation using uniform cylindrical arrays. The proposed approach achieves accurate and robust angle estimation through phase difference feature extraction, a multi-scale residual network, an attention mechanism, and a joint output module. Simulation results demonstrate that the proposed CA-ResNet method delivers superior performance under challenging scenarios, including low SNR (−10 dB), a small number of snapshots (L = 5), and multiple sources (1 to 4 signal sources). The corresponding root mean square errors (RMSE) are 0.21°, 0.45°, and below 1.5°, respectively, significantly outperforming traditional methods like MUSIC and ESPRIT, as well as existing deep learning models (e.g., ResNet, CNN, MLP). Furthermore, the algorithm exhibits low computational complexity and a small parameter size, highlighting its potential for practical engineering applications.

Keywords: DOA estimation; signal processing; attention mechanism; residual network

1. Introduction

Array signal processing is a prominent research topic in modern radar systems, with Direction of Arrival (DOA) estimation technology, as a vital branch, finding extensive application. Its core objective is to accurately determine the spatial azimuth information of signal sources by analyzing and processing the spatial signals received by an array. The performance of this technology directly determines the reliability of subsequent tasks such as target detection, tracking, and identification [1,2,3,4]. With the escalating demands of modern communication and detection systems for localization accuracy, real-time performance, and anti-jamming capability, the performance bottlenecks of traditional DOA estimation methods in complex scenarios have become increasingly apparent. In low signal-to-noise ratio (SNR) environments or when the array has a limited number of elements, traditional methods often struggle to maintain performance, leading to inaccurate angle estimation. While high-precision DOA estimation algorithms, such as subspace-based methods, can theoretically offer high accuracy, they require substantial computational resources in practice. The decomposition of the covariance matrix and the spectral peak search steps involve significant computational effort, a burden that becomes particularly heavy when extremely high precision is required [5,6].

Over the past few decades, researchers have proposed a series of classical DOA estimation methods, which can be broadly categorized into three types: beamforming-based, subspace decomposition-based, and maximum likelihood-based. Among them, subspace decomposition algorithms like multiple signal classification (MUSIC) [7] and estimation of signal parameters via rotational invariance techniques (ESPRIT) [8] have been widely used in various array structures for a long time due to their high estimation accuracy. However, the performance of these methods heavily relies on the precise modeling of the array manifold. In practical applications, their estimation performance degrades significantly in the presence of array imperfections (e.g., element position deviations, gain/phase inconsistencies), low SNR, or coherent signal sources. Furthermore, traditional methods typically require complex eigenvalue decomposition or spectral peak search, resulting in high computational complexity that hinders their suitability for real-time processing scenarios. For coherent source DOA estimation, spatial smoothing techniques [9] are commonly used for decorrelation, but these methods are only applicable to specific array structures with translational invariance, such as uniform linear arrays, and they reduce the effective array aperture, leading to degraded resolution. For cylindrical arrays and other irregular arrays, spatial smoothing techniques cannot be directly applied. Maximum likelihood estimation methods [10] can handle coherent signals but involve multi-dimensional joint search, which is computationally intensive and suffers from local optima, limiting their practicality.

In recent years, compressed sensing (CS)-based DOA estimation methods [11] have attracted significant attention by exploiting the inherent sparsity of source locations in the spatial domain. These approaches reformulate DOA estimation as a sparse signal recovery problem and are particularly effective in resolving closely spaced or coherent sources. Representative CS-based methods include greedy pursuit algorithms, convex optimization techniques, and Bayesian sparse learning approaches. For example, an iterative Variational Bayes-based sparse recovery algorithm was proposed in [12] for angle-of-arrival estimation, demonstrating improved robustness under limited snapshots and noisy conditions. Furthermore, more advanced Newton-type forward–backward greedy algorithms have been developed for multi-snapshot compressed sensing, offering faster convergence and improved reconstruction accuracy compared to conventional greedy methods. Despite their promising performance, most CS-based DOA estimation techniques rely on iterative optimization procedures or matrix inversions, resulting in relatively high computational complexity and limiting their applicability in real-time scenarios. In practical array signal processing applications, DOA estimation faces several additional challenges beyond noise and limited snapshots. One important issue is the presence of mutual coupling among array elements, which distorts the array manifold and can significantly degrade estimation accuracy if not properly accounted for. This problem becomes even more challenging in online or time-varying DOA estimation scenarios, where rapid adaptation is required under limited computational resources. Moreover, in complex propagation environments with multipath effects, traditional spatial-only processing methods often fail to provide reliable estimates. 
To address this issue, hybrid techniques, such as spatio-temporal or spatio-frequential smoothing methods, have been proposed to enable joint estimation of angles and times of arrival for multipath signals. Although these approaches improve robustness in multipath-rich environments, they typically impose strict requirements on array structure and incur increased computational complexity.

Array structure design is another critical factor influencing DOA estimation performance. Currently, research on DOA estimation for conventional arrays like linear and planar arrays is relatively mature, but these arrays possess inherent limitations in omnidirectional coverage, three-dimensional (3D) azimuth estimation, and spatial utilization. The cylindrical array, as a typical conformal array structure, offers 360° azimuth coverage capability due to its circularly distributed elements. Its extensibility in the elevation dimension further provides inherent potential for 3D DOA estimation, demonstrating unique advantages in scenarios demanding high omnidirectional perception, such as underwater sonar and omnidirectional communication base stations [13,14]. However, the array manifold of a cylindrical array exhibits strong nonlinearity and coupling characteristics. Estimation algorithms based on the linear array assumption are difficult to adapt directly, necessitating the design of specialized estimation strategies tailored to its structural properties.

In recent years, deep learning technology has provided new solutions for DOA estimation by leveraging its powerful feature learning and nonlinear mapping capabilities. Unlike physics-model-based methods, deep learning constructs neural network models to adaptively learn the complex mapping relationship between the array received signals and the DOAs from large amounts of labeled data, thereby effectively mitigating the impact of array manifold modeling errors [15,16]. Currently, DOA estimation methods based on models such as convolutional neural networks (CNN), recurrent neural networks (RNN), transformer, and generative adversarial networks (GAN) have achieved performance superior to traditional methods in structures like linear and planar arrays, demonstrating particularly strong robustness in scenarios involving low SNR, coherent signals, and array imperfections [17,18,19]. Deep residual networks (ResNet), with their residual block structure, effectively alleviate the vanishing gradient problem in deep networks. Their powerful feature reuse and nonlinear fitting capabilities offer a new pathway for DOA estimation with cylindrical arrays [20,21]. Although research on deep learning in DOA estimation has made progressive advances, studies focusing on the specific structure of cylindrical arrays are still in their infancy. Existing research predominantly concentrates on constructing deep learning-based DOA estimation methods for conventional arrays, often failing to fully account for the 3D spatial coverage capability and symmetry of cylindrical arrays. The circularly symmetric structure of cylindrical arrays increases the complexity of DOA estimation, making it difficult to directly transfer existing deep learning models to them. The standard residual blocks in ResNet lack directional awareness, making it challenging to focus on the key signal features corresponding to different DOAs. 
If the array received data is fed directly into the network without utilizing the properties of the cylindrical array’s covariance matrix for feature dimensionality reduction, it increases the training burden on the network. Moreover, the estimation accuracy and robustness of existing methods in challenging scenarios such as low SNR and a small number of snapshots still need improvement.

This paper proposes a circular-attention residual network (CA-ResNet) for DOA estimation with cylindrical arrays. The core contributions include three parts: (1) Designing a phase difference feature extraction module that utilizes the geometric prior of the cylindrical array to enhance feature representation; (2) Constructing a multi-scale residual network integrated with an attention mechanism to improve feature resolution and robustness; (3) Validating the superior performance of the proposed method under various scenarios through extensive experiments. This research fills a gap in deep learning-based DOA estimation for cylindrical arrays and provides an effective solution for practical applications.

2. Uniform Cylindrical Array Signal Model

This study employs a uniform cylindrical array model composed of M × N sensor elements. The array consists of N identical uniform circular arrays (UCAs) equally spaced along the cylindrical axis (adjacent UCA spacing denoted as h). Each UCA is perpendicular to the cylinder axis, has a uniform radius R, and contains M uniformly distributed elements. The first element of each UCA is aligned on the same vertical line (parallel to the z-axis), ensuring the symmetry and consistency of the array structure. A schematic diagram of the model is shown in Figure 1.

Assume that K far-field narrowband signal sources with wavelength λ, each transmitting from a single antenna, impinge on the array from different directions. Let θ_{k} \in [0, 2 π) and φ_{k} \in [- \frac{π}{2}, \frac{π}{2}) denote the azimuth angle and elevation angle of the k-th signal source, respectively.

The uniform cylindrical array is a two-dimensional array, which can be viewed as N identical UCAs of radius R uniformly arranged along the cylinder’s axis. Each UCA comprises M array elements. The spacing h between adjacent UCAs, which is also the inter-element spacing along the axial direction, satisfies h \leq \frac{λ}{2}. Using the coordinate origin as the phase reference, the array response of the 0-th UCA to the k-th signal source is:

a_{UCA} (θ_{k}, φ_{k}) = {[e^{j \frac{2 π R}{λ} \cos φ_{k} \cos (θ_{k} - ϕ_{0})}, e^{j \frac{2 π R}{λ} \cos φ_{k} \cos (θ_{k} - ϕ_{1})}, \dots, e^{j \frac{2 π R}{λ} \cos φ_{k} \cos (θ_{k} - ϕ_{M - 1})}]}^{T}

(1)

where ϕ_{m} = \frac{2 π m}{M} is the angular position of the m-th element in the UCA.

Alternatively, the cylindrical array can be considered as composed of multiple uniformly spaced uniform linear arrays (ULAs) arranged circularly. Using the 0-th element of the 0-th UCA as the reference point, the array response of the 0-th ULA to the k-th signal source is:

a_{ULA} (φ_{k}) = {[1, e^{j \frac{2 π h}{λ} \sin φ_{k}}, \dots, e^{j \frac{2 π h}{λ} (N - 1) \sin φ_{k}}]}^{T}

(2)

The wave path difference between the m-th element on the n-th UCA and the coordinate origin for the k-th signal source is given by τ_{kmn}:

τ_{kmn} = R \cos φ_{k} \cos (θ_{k} - \frac{2 π m}{M}) + n h \sin φ_{k}

(3)

The array response of the uniform cylindrical array, a (θ_{k}, φ_{k}), can be obtained from Equations (1)–(3) as:

a (θ_{k}, φ_{k}) = a_{UCA} (θ_{k}, φ_{k}) \otimes a_{ULA} (φ_{k})

(4)
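As a minimal sketch of Equations (1)–(4), the steering vector can be assembled with the Python standard library alone. The circular term below uses the radius R and the azimuth θ_k, in agreement with the path difference of Equation (3). The function names and the numerical values are illustrative assumptions, not part of the original model description.

```python
import cmath
import math


def uca_response(theta, phi, M, R, lam):
    """Response of one M-element ring of radius R, phase reference at the origin."""
    return [
        cmath.exp(1j * 2 * math.pi * R / lam
                  * math.cos(phi) * math.cos(theta - 2 * math.pi * m / M))
        for m in range(M)
    ]


def ula_response(phi, N, h, lam):
    """Response of one N-element axial line with spacing h."""
    return [cmath.exp(1j * 2 * math.pi * h / lam * n * math.sin(phi))
            for n in range(N)]


def cylinder_response(theta, phi, M, N, R, h, lam):
    """Kronecker product a_UCA (x) a_ULA; entry m * N + n belongs to element (m, n)."""
    a_uca = uca_response(theta, phi, M, R, lam)
    a_ula = ula_response(phi, N, h, lam)
    return [u * v for u in a_uca for v in a_ula]


def path_difference(theta, phi, m, n, M, R, h):
    """Wave path difference tau_kmn of Equation (3)."""
    return (R * math.cos(phi) * math.cos(theta - 2 * math.pi * m / M)
            + n * h * math.sin(phi))


# Illustrative values only (assumed for this sketch).
M, N, R, h, lam = 8, 3, 0.4, 0.1, 0.2
theta, phi = math.radians(60.0), math.radians(20.0)
a = cylinder_response(theta, phi, M, N, R, h, lam)
```

Each entry of `a` equals exp(j 2π τ_{kmn} / λ), which is a quick way to check that the Kronecker ordering and Equation (3) agree.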

The signal x (t) \in ℂ^{M N \times 1} received by the uniform cylindrical array at snapshot t can be expressed as:

x (t) = A s (t) + n (t)

(5)

where A = [a (θ_{0}, φ_{0}), a (θ_{1}, φ_{1}), \dots, a (θ_{K - 1}, φ_{K - 1})] is the MN × K array manifold matrix; s (t) = {[s_{0} (t), s_{1} (t), \dots, s_{K - 1} (t)]}^{T} is the K × 1 signal waveform vector; and n (t) \in ℂ^{M N \times 1} represents additive white Gaussian noise (AWGN).

The theoretical array output covariance matrix R is:

R = E [x (t) x^{H} (t)] = A R_{s} A^{H} + σ^{2} I_{M N}

(6)

where A is the MN × K array manifold matrix, R_{s} is the source signal covariance matrix, σ^{2} is the noise power, and I_{M N} is the MN × MN identity matrix. In practice, the sample covariance matrix computed from T snapshots is used for estimation:

\hat{R} = \frac{1}{T} \sum_{t = 1}^{T} x (t) x^{H} (t)

(7)
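The estimator in Equation (7) can be sketched in plain Python as below. The snapshot values are toy numbers chosen for illustration, and `sample_covariance` is a hypothetical helper name.

```python
def sample_covariance(snapshots):
    """Average of x(t) x(t)^H over T snapshots; each snapshot is a list of complex values."""
    T = len(snapshots)
    size = len(snapshots[0])
    R_hat = [[0j] * size for _ in range(size)]
    for x in snapshots:
        for i in range(size):
            for j in range(size):
                R_hat[i][j] += x[i] * x[j].conjugate()
    return [[value / T for value in row] for row in R_hat]


# Two toy snapshots of a two-element array (illustrative values).
snapshots = [[1 + 0j, 1j], [1 + 0j, -1j]]
R_hat = sample_covariance(snapshots)
```

The result is Hermitian by construction, so only its upper triangle carries independent phase information.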

3. Network Architecture

This section details the overall architecture design and core modules of the CA-ResNet. As shown in Figure 2, the network consists of five core components: input feature design, an improved residual feature extraction backbone network, multi-scale feature fusion, an attention enhancement module, and a joint DOA output module. This architecture is tailored to the geometric properties of the cylindrical array and the physical characteristics of coherent signal propagation, integrating the prior knowledge of traditional array signal processing with the feature learning capabilities of deep learning. It achieves efficient and robust feature extraction from the signals received by the cylindrical array and enables high-precision DOA estimation.

3.1. Input Feature Design

Traditional DOA estimation methods typically use the covariance matrix as the input feature. However, in coherent signal scenarios, the covariance matrix can become rank-deficient, severely limiting estimation performance. To address this issue, this study proposes using structured phase information as the core input feature, leveraging its insensitivity to signal coherence to enhance model robustness. The phase matrix is defined as the real-valued matrix composed of the phase angles of each element in the sample covariance matrix:

Φ (i, j) = \arg (\hat{R} (i, j))

(8)

where arg(·) denotes the phase angle operation, i and j are sensor element indices (i, j = 1, 2, …, MN), and Φ (i, j) represents the relative phase between the signals received by the i-th and j-th elements.

Since the phase angle is inherently defined on a bounded interval [−π, π], a direct use of arg(·) may introduce artificial discontinuities when the true underlying phase varies smoothly across the boundary (e.g., near π and −π). Such phase wrapping effects can result in abrupt jumps that are physically meaningless and may dominate the learned filters in a deep neural network. To address this issue, the proposed feature construction avoids relying on absolute phase values alone and instead emphasizes relative and differential phase structures, which are invariant to 2π wrapping. Specifically, after extracting the phase matrix Φ, phase difference operations are applied along well-defined array dimensions. These operations implicitly perform phase unwrapping by computing differences between neighboring elements, thereby eliminating discontinuities caused by boundary crossings while preserving the physically meaningful phase gradients associated with the direction of arrival.

The input feature construction follows a systematic processing pipeline. First, the sample covariance matrix \hat{R} \in ℂ^{MN \times MN} of the multi-snapshot signals received by the cylindrical array is calculated, where M is the number of elements per UCA and N is the number of UCA layers along the cylinder axis. Then, the phase matrix Φ is obtained through the phase extraction operation, which effectively suppresses the interference of amplitude fluctuations on feature extraction for coherent signals. To fully characterize the two-dimensional geometry of the cylindrical array, the phase matrix is reshaped into a four-dimensional real-valued tensor Φ_{3 D} \in ℝ^{N \times M \times N \times M}, completely preserving the spatial topological relationships between elements. Based on this tensor, two types of structural features with clear physical meanings are extracted in parallel: (1) Circumferential Phase Difference Features: With the UCA layer index fixed, calculate the phase differences between elements on the same circle, capturing the rotational symmetry. (2) Axial Phase Difference Features: With the element index within the UCA fixed, calculate the phase differences between elements of different layers on the same generatrix, characterizing the translational invariance.

Finally, the circumferential phase difference features, axial phase difference features, and the original phase matrix are stacked along the channel dimension to construct a three-channel input tensor X_{i n} \in ℝ^{M \times N \times 3}. This feature design retains the essential phase information of the signal while explicitly embedding the geometric constraints of the cylindrical array, providing strong prior guidance for the deep learning model.
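The phase extraction of Equation (8) and a wrap-safe neighbor difference can be sketched as follows. This is only an illustration of how differencing removes 2π jumps. The exact indexing used to form the circumferential and axial channels is not specified in the text, so `neighbour_differences` and its `circular` flag are assumptions.

```python
import cmath
import math


def phase_matrix(R_hat):
    """Element-wise phase angle of a complex matrix, as in Equation (8)."""
    return [[cmath.phase(value) for value in row] for row in R_hat]


def wrapped_difference(a, b):
    """Difference a - b mapped back to the principal interval, hiding 2*pi jumps."""
    return cmath.phase(cmath.exp(1j * (a - b)))


def neighbour_differences(phases, circular):
    """Wrapped differences between consecutive entries of a list of phases.

    circular=True also closes the loop (last to first), as on a ring of elements;
    circular=False leaves the ends open, as along an axial line.
    """
    count = len(phases)
    last = count if circular else count - 1
    return [wrapped_difference(phases[(i + 1) % count], phases[i])
            for i in range(last)]


# A phase sequence that crosses the +pi / -pi boundary (illustrative values).
ring_phases = [3.0, -3.1, -2.9]
ring_diffs = neighbour_differences(ring_phases, circular=True)
line_diffs = neighbour_differences([0.0, 1.0, 2.0], circular=False)
```

In `ring_phases`, the raw step from 3.0 to −3.1 is −6.1 rad, while the wrapped difference is the small positive step 2π − 6.1, which is the physically meaningful gradient.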

3.2. Improved Residual Feature Extraction Backbone Network

The feature extraction backbone is based on the standard ResNet. Tailored to the spatial characteristics of array signals, it incorporates three key improvements: multi-scale convolution, a channel attention mechanism, and dense connections, enabling efficient extraction of multi-scale spatial features. The network front-end uses a 7 × 7 convolutional layer followed by a 3 × 3 max pooling layer. The 7 × 7 convolution has a stride of 2 and padding of 3; it increases the number of channels to 64 and halves the spatial resolution while capturing large-range spatial context. The 3 × 3 max pooling layer also has a stride of 2, further compressing the spatial dimensions to reduce subsequent computational complexity.

The core building block of the backbone network is the multi-scale residual module, as illustrated in Figure 3. As shown in the Multi-Scale Fusion block of Figure 3, the module adopts a parallel multi-branch structure to extract spatial features at different effective scales from the array signals. Specifically, the input feature map is first projected by a 1 × 1 convolution for channel adjustment and then fed into three parallel convolutional branches within the Multi-Scale Fusion block. Each branch employs a 3 × 3 convolutional operation, enabling the network to capture complementary spatial information at different receptive scales. The outputs of these parallel branches are concatenated along the channel dimension, forming a fused multi-scale feature representation. Following multi-scale fusion, the concatenated features are further refined by two attention-based aggregation blocks shown in Figure 3, namely Channel Aggregation and Spatial Aggregation. In the Channel Aggregation block, global average pooling is applied to encode global statistical information across channels, followed by two successive 1 × 1 convolution layers with a ReLU activation to model inter-channel dependencies. In parallel, the Spatial Aggregation block aggregates spatial context by applying a pooling operation and a convolutional layer, generating a spatial attention map that emphasizes informative spatial regions. Finally, the refined features are passed through a 1 × 1 convolution to adjust the channel dimension and are combined with the module input via a skip connection, forming a residual learning structure. This residual design facilitates stable gradient propagation and enhances feature representation capability in deep networks.

An Efficient Channel Attention (ECA) module is incorporated at the end of each multi-scale residual module to enhance informative channels with negligible computational overhead. Specifically, global average pooling is first applied to obtain channel-wise statistics, followed by a one-dimensional convolution along the channel dimension to model local cross-channel interactions. A Sigmoid function is then used to generate channel attention weights, which are applied to the input feature map via element-wise multiplication.

The kernel size k of the one-dimensional convolution is adaptively determined by the channel dimension C rather than being fixed across stages. Following the ECA design principle, k is computed as

k = {|\frac{\log_{2} (C)}{2} + 1|}_{odd}

(9)

Accordingly, for the four backbone stages with channel numbers C = 64, 128, 256, and 512, the kernel sizes are set to k = 3, 3, 5, and 5, respectively. This adaptive strategy ensures consistent channel interaction modeling across different network depths. The backbone network consists of four stages with output channel dimensions of 64, 128, 256, and 512, respectively, forming a hierarchical feature representation from low-level spatial details to high-level angular correlations.
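The kernel sizes quoted above can be reproduced with a short check. The sketch assumes that the |·|_{odd} operator in Equation (9) takes the largest odd integer not exceeding its argument; this reading is an assumption, chosen because it yields the stated values.

```python
import math


def eca_kernel_size(channels):
    """k = |log2(C)/2 + 1|_odd, taking the largest odd integer not above the value."""
    t = int(math.log2(channels) / 2 + 1)
    return t if t % 2 == 1 else t - 1


kernel_sizes = [eca_kernel_size(c) for c in (64, 128, 256, 512)]
print(kernel_sizes)  # [3, 3, 5, 5]
```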

For DOA estimation, a classification-based joint two-dimensional output formulation is adopted. The azimuth and elevation ranges are discretized into a 2D angular grid, and the network outputs a confidence map over this grid. Multiple sources are identified as multiple peaks in the output map, enabling joint azimuth–elevation estimation without explicit angle pairing.

3.3. Feature Fusion and Attention Enhancement

Features output by shallow layers have high spatial resolution, enabling precise characterization of fine-grained spatial relationships between elements; features output by deep layers have high semantic information, enabling effective discrimination of angular features from different signal sources. To comprehensively utilize the advantages of both feature types, this paper designs a multi-scale feature fusion module based on the feature pyramid network (FPN) and introduces a convolutional attention module after fusion to further refine the feature representation.

For multi-scale feature fusion, the output feature maps from three successive stages of the backbone network are selected as fusion inputs, as illustrated in Figure 3. Specifically, stage 2, stage 3, and stage 4 correspond to the outputs of the second, third, and fourth residual blocks in the backbone, respectively, and their output features are denoted as C2, C3, and C4. Due to the progressive downsampling operations in the backbone network, the spatial resolutions of C2, C3, and C4 are 1/8, 1/16, and 1/32 of the input feature size, respectively. The feature fusion process follows a top-down upsampling with lateral connection strategy. First, the deepest feature map C4 is upsampled by a factor of 2 using bilinear interpolation to match the spatial resolution of C3. In parallel, C3 is processed by a 1 × 1 convolution to reduce the channel dimension. The two feature maps are then combined by element-wise addition to generate the intermediate fused feature, denoted as P3. Next, P3 is upsampled by a factor of 2 to align its spatial resolution with that of C2. Similarly, C2 is first transformed by a 1 × 1 convolution for channel reduction and then added element-wise with the upsampled P3, producing another fused feature P2. After each fusion operation, a 3 × 3 convolutional layer is applied to suppress aliasing artifacts introduced by upsampling and to further enhance feature representation capability. Finally, the fused features P2, P3, and the upsampled C4 are uniformly resized to 1/4 of the input spatial resolution. These feature maps are concatenated along the channel dimension and passed through a 3 × 3 convolutional layer to generate the final multi-scale fused feature with 256 channels.
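The resolution bookkeeping of the top-down pathway can be followed with exact fractions. The sketch below only tracks scales relative to the input. It assumes that C4 is brought to the common 1/4 scale from its native 1/32 resolution, which the text does not state explicitly.

```python
from fractions import Fraction

# Scales of the backbone outputs relative to the input, as stated in the text.
scale = {"C2": Fraction(1, 8), "C3": Fraction(1, 16), "C4": Fraction(1, 32)}

# Top-down pathway: each x2 upsampling doubles the scale before the addition.
scale["P3"] = scale["C4"] * 2
assert scale["P3"] == scale["C3"]  # upsampled C4 aligns with C3
scale["P2"] = scale["P3"] * 2
assert scale["P2"] == scale["C2"]  # upsampled P3 aligns with C2

# Factor needed to bring each map to the common 1/4 scale before concatenation.
target = Fraction(1, 4)
resize_factor = {name: target / scale[name] for name in ("P2", "P3", "C4")}
```

Under these assumptions the three maps need upsampling factors of 2, 4, and 8, respectively, before they can be concatenated.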

To filter effective information within the fused features, a convolutional block attention module (CBAM) is introduced. This module sequentially stacks channel attention and spatial attention, forming a complementary attention mechanism. The channel attention branch operates as follows: two channel-wise context descriptors are generated via global average pooling and global max pooling, respectively. These two descriptors are fed into a shared multi-layer perceptron (MLP) for nonlinear transformation and summed, and the channel weight vector is then generated via the Sigmoid activation function; the dimension of this vector is consistent with the number of channels in the fused feature. The spatial attention branch operates as follows: average pooling and max pooling are performed along the channel dimension of the input feature, yielding two single-channel feature maps. These two maps are concatenated into a 2-channel feature map, and the spatial weight matrix is then generated via a 7 × 7 convolutional layer and the Sigmoid activation function; the spatial size (H × W) of this matrix is consistent with that of the fused feature.

The complete flow for attention enhancement is as follows: First, the channel weight vector is element-wise multiplied with the fused features, resulting in channel-weighted features. Then, the spatial weight matrix is element-wise multiplied with the channel-weighted features, resulting in spatially weighted features. Finally, the spatially weighted features are added to the original fused features via a residual connection, yielding the final enhanced features. This design both filters key features through the attention mechanism and ensures the stability of information flow through residual connections.

3.4. Joint DOA Output

To balance the robustness and accuracy of DOA estimation, the output module adopts a strategy of joint classification and regression learning. It designs two parallel branches sharing the output of the feature extraction backbone, responsible for learning the probability distribution of angles and predicting precise continuous values, respectively.

The classification branch formulates DOA estimation as a multi-label classification problem. Specific operations are as follows: Discretize the azimuth angle range (0°~360°) into N_{θ} equal intervals, and the elevation angle range (−90°~90°) into N_{φ} equal intervals (these grid sizes are distinct from the array dimensions M and N of Section 2). Convert the enhanced features into a one-dimensional vector via global average pooling. Feed this vector into two fully connected layers, with a ReLU activation function and Dropout regularization inserted between the two layers to prevent overfitting, mapping to an (N_{θ} + N_{φ})-dimensional output space. Finally, output the probability of a signal source being present in each angle interval via the Sigmoid activation function.
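A possible construction of the multi-label target for the classification branch is sketched below. The grid sizes (`n_az`, `n_el`), the ordering of the output vector (azimuth intervals first, then elevation intervals), and the helper name are assumptions made for illustration.

```python
def multi_hot_label(sources_deg, n_az, n_el):
    """Multi-hot vector of length n_az + n_el for (azimuth, elevation) pairs in degrees."""
    label = [0.0] * (n_az + n_el)
    for az, el in sources_deg:
        az_bin = int((az % 360.0) * n_az / 360.0)
        # min() keeps an elevation of exactly +90 degrees inside the last interval.
        el_bin = min(int((el + 90.0) * n_el / 180.0), n_el - 1)
        label[az_bin] = 1.0
        label[n_az + el_bin] = 1.0
    return label


# Hypothetical 1-degree grid and two sources.
label = multi_hot_label([(45.2, -10.5), (300.0, 30.0)], n_az=360, n_el=180)
```

Because azimuth and elevation are marked in separate parts of the vector, this target alone does not say which azimuth belongs to which elevation when several sources are present.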

The regression branch directly predicts the continuous angle values of the signal sources. Specific operations are as follows: Convert the enhanced features into a one-dimensional vector via an independent global average pooling operation. Feed this vector into a fully connected layer, outputting a vector of dimension 2K, where K is the preset maximum number of signal sources; this vector corresponds to the predicted azimuth and elevation angles for the K signal sources. To constrain the output range, use the Tanh activation function to restrict the predicted angle values to the [−1, 1] interval, then map them back to the actual angle intervals via linear transformation—for azimuth angles, the linear transformation uses 0° as the lower limit and 360° as the upper limit; for elevation angles, it uses −90° as the lower limit and 90° as the upper limit.
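The range mapping of the regression branch amounts to a Tanh followed by a linear rescaling, sketched here. The layout of the 2K-dimensional vector (K azimuth values followed by K elevation values) is an assumption, as are the function names.

```python
import math


def to_azimuth_deg(raw):
    """Map an unbounded network output to [0, 360] degrees via tanh."""
    return (math.tanh(raw) + 1.0) * 180.0


def to_elevation_deg(raw):
    """Map an unbounded network output to [-90, 90] degrees via tanh."""
    return math.tanh(raw) * 90.0


def decode_angles(raw_outputs, K):
    """Split a 2K vector into (azimuth, elevation) pairs; assumed layout [az..., el...]."""
    return [(to_azimuth_deg(raw_outputs[k]), to_elevation_deg(raw_outputs[K + k]))
            for k in range(K)]


pairs = decode_angles([0.0, 0.5, 0.0, -0.5], K=2)
```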

The total loss function during the training phase is the weighted sum of the classification loss and the regression loss. The classification loss uses binary cross-entropy loss with a focal factor, focusing on hard-to-classify angle intervals by reducing the weight of easy-to-classify samples. The regression loss uses the smooth L1 loss, which is more robust to outliers. The weights of the two losses are adjusted by the hyperparameter λ, which is determined to be 0.6 via grid search, thereby balancing the classification and regression losses. This joint optimization strategy enables the network to combine the robust learning capability of the classifier for angle distributions with the precise prediction capability of the regressor for continuous angles, effectively improving DOA estimation performance in complex scenarios.
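One way the joint objective might be written is shown below in plain Python. The text does not give the exact weighting form, so the combination λ · L_cls + (1 − λ) · L_reg, the focusing exponent γ = 2, and the smooth L1 threshold β = 1 are all assumptions of this sketch.

```python
import math


def focal_bce(p, y, gamma=2.0, eps=1e-12):
    """Binary cross-entropy scaled by (1 - p_t)^gamma, so easy samples weigh less."""
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))


def smooth_l1(error, beta=1.0):
    """Quadratic for |error| < beta, linear beyond it."""
    e = abs(error)
    return 0.5 * e * e / beta if e < beta else e - 0.5 * beta


def joint_loss(probs, labels, pred_angles, true_angles, lam=0.6):
    """Assumed combination: lam * classification + (1 - lam) * regression."""
    cls = sum(focal_bce(p, y) for p, y in zip(probs, labels)) / len(probs)
    reg = sum(smooth_l1(a - b)
              for a, b in zip(pred_angles, true_angles)) / len(pred_angles)
    return lam * cls + (1.0 - lam) * reg


loss = joint_loss([0.5], [1], [0.5], [0.0])
```

A confident correct prediction such as p = 0.9 for a positive interval contributes far less than an uncertain one at p = 0.5, which is the down-weighting of easy samples described above.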

4. Simulation and Analysis

4.1. Simulation Setup

To evaluate the proposed DOA estimation method for uniform cylindrical arrays, we designed a novel CA-ResNet deep learning algorithm and conducted systematic simulation to validate its performance. The simulation setup employed a uniform cylindrical array model composed of 7 circular layers, with 26 elements arranged on each layer. The array radius was 0.4 m, the total height was 0.6 m, and the spacing between adjacent circular arrays was 0.1 m. The signals were set as narrowband, far-field, and incoherent, with a carrier frequency of 1.5 GHz. The test dataset contained multiple independent samples, each encompassing 1 to 4 signal sources. Their DOAs covered the full azimuth range from 0° to 360° and the full elevation range from −90° to 90°. The SNR was varied from −10 dB to 10 dB in steps of 2 dB. For each SNR level, realistic received signals were simulated by adding white Gaussian noise of the corresponding intensity. This process yielded a total of 7,037,334 samples, which constituted the complete dataset. After random shuffling, the dataset was partitioned into training, validation, and test sets at ratios of 75%, 15%, and 10%, respectively. The test set contained entirely new data that was unseen during model training.
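The stated geometry can be checked against the half-wavelength condition with a few lines of arithmetic. The sketch assumes the rounded propagation speed c = 3 × 10⁸ m/s; the variable names are illustrative.

```python
import math

C_LIGHT = 3.0e8  # m/s, rounded value assumed for this check
carrier_hz = 1.5e9
radius_m, layer_spacing_m = 0.4, 0.1
elements_per_ring, rings = 26, 7

wavelength_m = C_LIGHT / carrier_hz
# Distance between the first and the last ring.
height_m = (rings - 1) * layer_spacing_m
# Straight-line distance between adjacent elements on one ring.
chord_m = 2 * radius_m * math.sin(math.pi / elements_per_ring)

snr_levels_db = list(range(-10, 11, 2))
total_elements = elements_per_ring * rings
```

With these values the wavelength is 0.2 m, so the axial spacing of 0.1 m sits exactly at λ/2, and the spacing between adjacent elements on a ring (about 0.096 m) is slightly below it. The array has 182 elements in total, and the SNR sweep covers 11 levels.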

All simulations were implemented in the PyTorch 2.1.1 framework. The hardware platform consisted of an Intel® Core™ i7-11700 processor (8 cores, 16 threads) with 32 GB of DDR4 memory, running the Windows 11 Professional operating system. The development environment was PyCharm 2024.1, and the programming language was Python 3.12.0. To comprehensively verify the performance of the proposed DOA estimation algorithm for uniform cylindrical arrays, we set up comparative simulations including the classical subspace algorithm MUSIC [21] and various deep learning models, such as MLP [22], CNN [1], and ResNet. All comparative algorithms used identical training and test datasets. The deep learning models underwent hyperparameter optimization (including the number of layers, number of neurons, and learning rate) to ensure that each model was in its optimal performance state.

The simulations focused primarily on three aspects: verification of model training convergence, performance comparison under different SNRs and numbers of snapshots, and validation of angular resolution and multi-source adaptability. Performance evaluation used the Root Mean Square Error (RMSE) as the metric for angle estimation accuracy. This metric jointly accounts for the estimation errors in both azimuth and elevation angles. It is calculated as follows:

RMSE = \sqrt{\frac{1}{K \cdot p} \sum_{i = 1}^{K} \sum_{j = 1}^{p} \left[ {(φ_{ij}^{true} - φ_{ij}^{test})}^{2} + {(θ_{ij}^{true} - θ_{ij}^{test})}^{2} \right]}

(10)

Here, $φ_{ij}^{true}$ and $θ_{ij}^{true}$ represent the true azimuth and elevation angles, respectively, of the j-th signal source in the i-th sample; $φ_{ij}^{test}$ and $θ_{ij}^{test}$ are the corresponding estimated values; K is the total number of test samples, and p is the number of signal sources per sample.
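Equation (10) translates directly into code. The following minimal sketch assumes that the estimated angles have already been paired with the corresponding true sources and, like the formula itself, does not handle azimuth wrap-around at 0°/360°. The function name `doa_rmse` is illustrative.

```python
import math

def doa_rmse(az_true, el_true, az_est, el_est):
    """RMSE over K samples with p sources each, following Equation (10).

    Each argument is a list of K lists holding p angles in degrees.
    """
    K = len(az_true)
    p = len(az_true[0])
    sq_sum = 0.0
    for i in range(K):
        for j in range(p):
            sq_sum += (az_true[i][j] - az_est[i][j]) ** 2   # azimuth error
            sq_sum += (el_true[i][j] - el_est[i][j]) ** 2   # elevation error
    return math.sqrt(sq_sum / (K * p))
```

For a single sample with one source, an azimuth error of 3° and an elevation error of 4° give `doa_rmse([[10.0]], [[5.0]], [[13.0]], [[9.0]]) == 5.0`, which shows that the two error components are combined inside the square root rather than averaged separately.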

4.2. Basic Performance Verification

4.2.1. Model Training Convergence Analysis

The stability of the model training process is fundamental for ensuring reliable algorithm performance. To evaluate the convergence of the proposed algorithm, this simulation set the maximum number of training epochs to 1000, the batch size to 64, the dropout rate to 0.3, and the initial learning rate to 10⁻³, employing an adaptive decay strategy for optimization. The trajectory of the total loss function during training is shown in Figure 4.

As shown in Figure 4, the total loss function exhibits a stable convergence trend within 1000 training epochs. In the early stages of training (epochs < 600), the loss value decreased rapidly from approximately 83 to about 1.5, indicating that the model could quickly learn effective angular features. During the middle stages of training (epochs 600–1000), the rate of decrease slowed, and the loss gradually settled around 0.3. In the later stages of training (epochs > 1000), the loss value eventually stabilized near 0.02 without significant fluctuations. This convergence process indicates that the proposed algorithm showed no obvious signs of overfitting or underfitting during training. The training was sufficient and effective, providing a reliable foundation for subsequent performance evaluation.

4.2.2. DOA Spectrum Verification

To verify the fundamental angle estimation capability of the proposed algorithm in multi-source scenarios, this simulation used a test condition involving three signal sources. Their true DOAs were azimuth angles φ = 55°, 150°, and 250°, with corresponding elevation angles θ = 30°, 50°, and 20°, respectively. The simulation was conducted at an SNR of 0 dB with 1000 snapshots.

Figure 5 shows the DOA estimation spectrum generated by CA-ResNet under this scenario. The horizontal axis represents the Azimuth Angle (°), the vertical axis represents the Elevation Angle (°), and the color intensity (or z-axis) represents the Spectral Density (dB). The peak locations directly correspond to the estimated DOAs of the signals. As can be seen from the figure, the spectrum generated by the proposed algorithm exhibits sharp peaks without spurious or missing peaks. The estimated angles for the three signal sources closely match their true values. The specific errors are as follows: for Source 1, the azimuth estimation error is 0.57°, and the elevation error is 0.20°; for Source 2, the azimuth error is 0.20°, and the elevation error is 1.34°; for Source 3, the azimuth error is 1.27°, and the elevation error is 0.53°. All errors are less than 1.5°, verifying the algorithm’s good fundamental estimation accuracy even in multi-source environments. Figure 6 displays the DOA estimation spectrum of the traditional MUSIC algorithm. Both the azimuth and elevation errors for the three signal sources are larger than those of CA-ResNet.

4.2.3. Resolution Comparison Under Different SNRs

To evaluate the angular resolution capability of the proposed algorithm under different noise environments, the simulation was set up with a single signal source scenario, an angular separation of 2°, a snapshot number of 1000, and an SNR varying from −10 dB to 10 dB. Comparative algorithms included MUSIC, MLP, CNN, and ResNet, using RMSE as the performance evaluation metric. The results are shown in Figure 7.

The simulation results show that the proposed algorithm exhibits excellent robustness under low SNR conditions. When the SNR is −10 dB, its RMSE is 25°, significantly lower than that of MUSIC (45°), CNN (42°), ResNet (32°), and MLP (40°). As the SNR increases above 0 dB, the estimation accuracy of the proposed algorithm further improves. At an SNR of 10 dB, the RMSE of CA-ResNet reaches 0.21°, whereas the RMSE of ResNet, the best-performing comparative algorithm under the same condition, is 0.48°. In summary, the proposed algorithm maintains the lowest RMSE across the wide SNR range from −10 dB to 10 dB, demonstrating superior angle estimation accuracy and noise adaptation capability.

4.2.4. Ablation Study

To systematically evaluate the contribution of each core module in CA-ResNet to the DOA estimation performance, an ablation study was conducted. The results are shown in Figure 8. As the test SNR increases from −10 dB to 10 dB, the RMSE of all models decreases monotonically, indicating that a higher SNR improves DOA estimation accuracy. In terms of performance, the baseline ResNet model exhibits the highest RMSE across the entire SNR range. After sequentially introducing the attention mechanism, multi-scale feature fusion, and the improved residual structure, the estimation errors of the respective ablated models are significantly reduced. Specifically, after introducing the attention mechanism, the model's RMSE decreases noticeably, indicating that this module enhances feature discrimination under low SNR conditions by focusing on key spatial information. Adding multi-scale feature fusion improves the performance further, suggesting that integrating features across hierarchical levels helps resolve multi-source signals. The improved residual structure reduces the estimation error again, indicating that the optimized residual connections make feature propagation more efficient. With all core modules integrated, the CA-ResNet model achieves the lowest RMSE under all SNR conditions.

4.3. Evaluation of Algorithm Accuracy and Generalization

This section evaluates the accuracy and generalization capability of the proposed algorithm along four dimensions: accuracy rate, adaptability to the number of snapshots, robustness in extreme environments, and multi-source adaptability. An analysis of algorithm complexity is also included to complete the evaluation.

4.3.1. Accuracy Comparison Under Different SNRs

This section uses the accuracy rate as the evaluation metric to validate the estimation performance of the proposed algorithm under different SNR conditions. The accuracy rate is defined as the proportion of samples where the combined RMSE for all signal sources is less than 1° to the total number of samples. The simulation was set up with the number of signal sources ranging from 1 to 4, the angular separation between any two signal sources being no less than 7°, the SNR varying in the range of [−10, 10] dB, and 4000 independent test samples generated for each SNR condition. The proposed algorithm is compared against ResNet, representing an advanced deep learning approach, with the results shown in Figure 9.
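The accuracy rate defined above can be computed as in the following sketch. It assumes that each sample is given as a pair of matched (azimuth, elevation) lists, and that the combined RMSE of a sample is the per-sample form of the RMSE metric used elsewhere in the paper. The function names are illustrative.

```python
import math

def sample_rmse(true_angles, est_angles):
    """Combined RMSE (degrees) over the sources of one sample.

    true_angles, est_angles: lists of (azimuth, elevation) pairs, already matched.
    """
    sq = sum((at - ae) ** 2 + (et - ee) ** 2
             for (at, et), (ae, ee) in zip(true_angles, est_angles))
    return math.sqrt(sq / len(true_angles))

def accuracy_rate(samples, threshold_deg=1.0):
    """Fraction of samples whose combined RMSE is below the threshold (1 degree here)."""
    hits = sum(1 for t, e in samples if sample_rmse(t, e) < threshold_deg)
    return hits / len(samples)
```

For example, a set of two single-source samples with combined errors of 0.5° and 2.0° yields an accuracy rate of 0.5, since only the first sample falls below the 1° threshold.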

Under low SNR conditions (e.g., SNR = −4 dB), the accuracy rate of the proposed algorithm reaches 86.7%, which is 1.5 percentage points higher than that of ResNet (85.2%), reflecting its stronger noise robustness. As the SNR increases to the medium-high range (SNR ≥ 2 dB), the accuracy rate of the proposed algorithm continues to improve, approaching 97%. For instance, it reaches 100% at SNR = 7 dB, whereas the accuracy rate of ResNet only ranges between 91.5% and 97.3%. Furthermore, in multi-source scenarios (e.g., number of sources p = 4), the proposed algorithm maintains an accuracy rate of 88.7% even at SNR = 10 dB, significantly higher than the 71.2% achieved by ResNet. This further verifies its superior adaptability in multi-source angle estimation.

4.3.2. Performance Comparison Under Different Numbers of Snapshots

The number of snapshots, reflecting the sample size of the received data, directly impacts the algorithm’s feature extraction capability. In this simulation, with the SNR fixed at 20 dB and the number of signal sources set to 1, the number of snapshots L was varied across several typical values from 5 to 1000. The performance of the proposed algorithm was compared against MUSIC, MLP, CNN, and ResNet using the RMSE metric. The results are shown in Figure 10.

In scenarios with a small number of snapshots (L ≤ 20), the traditional MUSIC algorithm and some deep learning models suffered from insufficient data for adequate feature extraction, resulting in generally high RMSE values. When L = 5, the RMSE values for MUSIC, MLP, CNN, and ResNet were 3.2°, 2.8°, 2.5°, and 2.1°, respectively, whereas the RMSE of the proposed algorithm was only 0.42°, demonstrating its excellent adaptability under limited snapshot conditions.

As the number of snapshots increased to L ≥ 100 (abundant snapshot scenario), the estimation accuracy of the proposed algorithm further improved and stabilized, with the RMSE maintained between 0.08° and 0.10°. This performance is significantly better than that of the comparative algorithms like ResNet (0.22–0.31°) and CNN (0.28–0.35°), indicating that the proposed algorithm can achieve higher precision DOA estimation with sufficient data support.

4.3.3. Performance in Combined Low-SNR and Limited-Snapshot Scenarios

To simulate extreme practical environments such as long-distance signal reception and complex electromagnetic interference, this simulation constructed combined scenarios of low SNR and a limited number of snapshots. The number of snapshots L was set to 50, 100, 150, and 200, while the SNR varied from −10 dB to 10 dB (in steps of 2 dB). For each parameter combination, 800 Monte Carlo trials were conducted, generating independent data and calculating the average RMSE to evaluate algorithm performance. Figure 11 shows the simulation results under these extreme conditions.

Under moderately extreme conditions (L = 200, SNR = 0–10 dB), the RMSE of the proposed algorithm ranged between 0.28° and 0.55°. In a more challenging scenario (L = 100, SNR = −2 dB), the proposed algorithm achieved an RMSE of 0.7°, significantly outperforming MUSIC (1.9°) and ResNet (1.1°). Even under severely extreme conditions (L = 50, SNR = −10 dB), the proposed algorithm still kept the RMSE within 1.82°, whereas the RMSE values for MUSIC and ResNet reached 5.2° and 3.7°, respectively. These results validate the robustness of the proposed algorithm in combined low-SNR and limited-snapshot scenarios.

4.3.4. Multi-Source Estimation Performance Validation

To verify the estimation performance of the proposed algorithm in multi-source scenarios, the simulation set the number of signal sources from 1 to 4. The angles for each source were randomly generated, ensuring an angular separation of no less than 5° between any two sources. The simulation conditions were set to SNR = 20 dB and the number of snapshots L = 1000. Figure 12 shows the DOA estimation spectra for different numbers of signal sources.

The simulation results indicate that across all test scenarios with 1 to 4 signal sources, the proposed algorithm accurately identified the number of sources. The spectra showed no spurious or missing peaks, and the peak locations closely matched the true angles. In the complex four-source scenario, the azimuth estimation errors for the individual sources ranged between 0.11° and 0.28°, and the elevation estimation errors ranged between 0.19° and 0.37°. These results demonstrate that the proposed algorithm effectively mitigates the resolution degradation issue common in traditional methods in multi-source scenarios, confirming its effectiveness and stability in multi-source DOA estimation.

4.3.5. Algorithm Complexity Analysis

The practical utility of an algorithm depends not only on its estimation performance but also on its computational complexity. For a comprehensive evaluation, Table 1 lists the average computation time for each algorithm when processing 1000 test samples (the simulation environment is consistent with Section 4.1), along with the model parameter size reflecting spatial complexity.

In terms of time complexity, the average computation time of the proposed algorithm was 0.00532 s. This is only 43% of ResNet’s time (0.01237 s), 1.85% of CNN’s time (0.287 s), and 0.47% of MUSIC’s time (1.123 s), demonstrating a significant computational efficiency advantage. Regarding spatial complexity, the proposed algorithm’s model has only 9.7k parameters, substantially fewer than the comparative models like ResNet (51.8k), CNN (83.7k), and MLP (102.4k). The low resource consumption makes it more suitable for deployment on resource-constrained platforms such as embedded systems, indicating good practical engineering value.
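The percentages quoted above follow from the values in Table 1. As a quick arithmetic check, the snippet below recomputes them; the helper name `percent_of` is illustrative.

```python
# Testing times (s) and parameter counts (thousands) as listed in Table 1.
times = {"MUSIC": 1.123, "CNN": 0.287, "ResNet": 0.01237, "CA-ResNet": 0.00532}
params_k = {"MLP": 102.4, "CNN": 83.7, "ResNet": 51.8, "CA-ResNet": 9.7}

def percent_of(value, reference):
    """Express value as a percentage of reference."""
    return 100.0 * value / reference

# CA-ResNet time and parameter count relative to each comparison method.
time_ratios = {name: percent_of(times["CA-ResNet"], t)
               for name, t in times.items() if name != "CA-ResNet"}
param_ratios = {name: percent_of(params_k["CA-ResNet"], n)
                for name, n in params_k.items() if name != "CA-ResNet"}

for name, pct in time_ratios.items():
    print(f"time vs {name}: {pct:.2f}%")
for name, pct in param_ratios.items():
    print(f"parameters vs {name}: {pct:.1f}%")
```

The time ratios come out at about 43%, 1.85%, and 0.47% for ResNet, CNN, and MUSIC, matching the text. On the same basis, the 9.7k parameters are roughly 19% of ResNet's, 12% of CNN's, and 9% of MLP's.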

Tables 2 and 3 present the training/testing time and the algorithmic complexity, respectively, of the five DOA estimation methods. The parameters are defined as follows: M denotes the total number of array elements, L is the number of snapshots, K is the number of signal sources, L₁ to L₄ are the numbers of network layers of the respective models, Dₗ is the number of neurons in the l-th layer, C denotes the number of channels, H and W are the feature map height and width, and K′ is the convolution kernel size. As the tables show, compared with the MUSIC and MLP methods, CA-ResNet has lower algorithmic complexity and a shorter testing time while also significantly reducing the DOA estimation error, balancing accuracy, efficiency, and complexity.
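To make the complexity expressions in Table 3 concrete, the sketch below evaluates their dominant terms for given sizes. The MUSIC example uses the array and snapshot settings from the simulation setup. The network layer sizes are not given in the paper, so none are assumed here, and the function names are illustrative.

```python
def music_ops(M, L, K):
    """Dominant-term count M^3 + M^2 * L + K^3 from the MUSIC row of Table 3."""
    return M ** 3 + M ** 2 * L + K ** 3

def dense_ops(widths):
    """Sum of D_l * D_{l+1} over consecutive fully connected layers."""
    return sum(a * b for a, b in zip(widths, widths[1:]))

def conv_ops(layers):
    """Sum of C_l * H_l * W_l * C_{l+1} * K'_l^2 over convolutional layers.

    layers: list of (C_in, H, W, C_out, kernel_size) tuples.
    """
    return sum(c_in * h * w * c_out * k * k for c_in, h, w, c_out, k in layers)

def ca_resnet_ops(layers, widths):
    """CA-ResNet row of Table 3: both sums scaled by the factor 1/4."""
    return (conv_ops(layers) + dense_ops(widths)) / 4

# 7 layers x 26 elements = 182 array elements, 1000 snapshots, 3 sources.
music_count = music_ops(182, 1000, 3)
```

For these settings `music_count` is 39,152,595, dominated by the M²L covariance term. The network expressions depend on layer sizes that would have to be taken from the actual model definitions.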

5. Conclusions

DOA estimation is a critical technology in array signal processing. Traditional methods like MUSIC and ESPRIT suffer from performance limitations in complex scenarios such as low SNR, a small number of snapshots, and coherent signals, coupled with high computational complexity. Although cylindrical arrays offer the advantage of three-dimensional omnidirectional perception, their nonlinear array manifold increases the difficulty of estimation. Existing deep learning models are difficult to apply directly to this structure and have shortcomings in feature utilization and robustness under extreme conditions.

To address the aforementioned problems, this paper proposed a CA-ResNet. The method explicitly utilizes the geometric prior of the cylindrical array through a phase difference feature extraction module; employs a multi-scale residual architecture and an attention mechanism to enhance spatial feature discrimination capability; utilizes a feature pyramid and a dual-attention module to achieve multi-level feature fusion and enhancement; and finally adopts a joint classification–regression output strategy to balance estimation robustness and accuracy.

Systematic experimental validation demonstrates that CA-ResNet exhibits comprehensive performance advantages in complex scenarios: Under low SNR conditions (0 dB), the Root Mean Square Error (RMSE) is only 0.45°, significantly outperforming comparative algorithms like MUSIC (2°) and ResNet (1.2°). With a limited number of snapshots (L = 5), the RMSE is as low as 0.45°, far below those of traditional methods and mainstream deep learning models. Facing multi-source scenarios with 1 to 4 signal sources, the RMSE is consistently controlled within 1.5°, and the algorithm accurately identifies the number of sources without spurious or missing peaks. Simultaneously, the algorithm possesses excellent practical applicability: the processing time per sample is only 0.00532 s (43% of ResNet’s time and 0.47% of MUSIC’s time), and the model has only 9.7k parameters (significantly fewer than ResNet’s 51.8k and CNN’s 83.7k). This low resource consumption makes it suitable for deployment on resource-constrained platforms like embedded systems. Ablation studies further validated the effectiveness of core modules, including the phase difference feature extraction, multi-scale residual structure, attention mechanism, and feature fusion, demonstrating that the synergistic action of these modules significantly enhances the model’s feature discrimination capability and robustness.

This work provides a high-precision, highly robust, and lightweight solution for DOA estimation with cylindrical arrays, also offering new insights for signal processing under complex array structures. Future work will explore its application in broader error models and dynamic scenarios.

Author Contributions

Conceptualization, M.Z.; methodology, M.Z.; software, M.Z. and J.L.; validation, M.Z., J.L. and J.Q.; formal analysis, M.Z.; investigation, M.Z.; resources, M.Z.; data curation, M.Z.; writing—original draft preparation, M.Z.; writing—review and editing, M.Z.; visualization, J.L.; supervision, H.J.; project administration, J.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Papageorgiou, G.K.; Sellathurai, M.; Eldar, Y.C. Deep networks for direction-of-arrival estimation in low SNR. IEEE Trans. Signal Process. 2021, 69, 3714–3729.
2. Yuan, Y.; Wu, S.; Ma, Y.; Huang, L.; Yuan, N. KR product and sparse prior based CNN estimator for 2-D DOA estimation. AEU-Int. J. Electron. Commun. 2021, 137, 153780.
3. Ruan, N.; Wang, H.; Wen, F.; Shi, J. DOA estimation in B5G/6G: Trends and challenges. Sensors 2022, 22, 5125.
4. Compagnoni, M.; Notari, R.; Marcon, M.; Spagnolini, U. An algebraic geometry perspective for the estimation of the directions of arrival. J. Frankl. Inst. 2023, 360, 38–64.
5. Lemos, R.P.; Kunzler, J.A.; de Souza, M.J.; e Silva, H.V.; Ferreira, Y.R.; Flôres, E.L.; Sander, O. Using matrix norms to estimate the direction of arrival of planar waves on an ULA. J. Frankl. Inst. 2019, 356, 4949–4969.
6. Fadakar, A.; Mansourian, A.; Akhavan, S. Deep learning aided multi-source passive 3D AOA wireless positioning using a moving receiver: A low complexity approach. Ad Hoc Netw. 2024, 154, 103382.
7. Schmidt, R. Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 1986, 34, 276–280.
8. Roy, R.; Kailath, T. ESPRIT-Estimation of signal parameters via rotational invariance techniques. Opt. Eng. 1990, 29, 296–313.
9. Wen, J.; Liao, B.; Guo, C. Spatial smoothing based methods for direction-of-arrival estimation of coherent signals in nonuniform noise. Digit. Signal Process. 2017, 67, 116–122.
10. Stoica, P.; Gershman, A.B. Maximum-likelihood DOA estimation by data-supported grid search. IEEE Signal Process. Lett. 1999, 6, 273–275.
11. Massa, A.; Rocca, P.; Oliveri, G. Compressive sensing in electromagnetics-a review. IEEE Antennas Propag. Mag. 2015, 57, 224–238.
12. Bazzi, A.; Slock, D.T.M.; Meilhac, L. Sparse recovery using an iterative Variational Bayes algorithm and application to AoA estimation. In Proceedings of the 2016 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Limassol, Cyprus, 12–14 December 2016; pp. 197–202.
13. Gao, X.F.; Li, P.; Hao, X.H.; Li, G.L.; Kong, Z.J. A novel DOA estimation algorithm using directional antennas in cylindrical conformal arrays. Def. Technol. 2021, 17, 1042–1051.
14. Lin, Z.; Lv, T.; Ni, W.; Zhang, J.A.; Liu, R.P. Nested hybrid cylindrical array design and DoA estimation for massive IoT networks. IEEE J. Sel. Areas Commun. 2020, 39, 919–933.
15. Wu, L.; Fu, Y.; Yang, X.; Xu, L.; Chen, S.; Zhang, Y.; Zhang, J. Research on the multi-signal DOA estimation based on ResNet with the attention module combined with beamforming (RAB-DOA). Appl. Acoust. 2025, 231, 110541.
16. Xu, X.; Huang, Q. MD-DOA: A model-based deep learning DOA estimation architecture. IEEE Sens. J. 2024, 24, 20240–20253.
17. Zhao, Y.; Fan, X.; Liu, J. Robust DOA Estimation via a Deep Learning Framework with Joint Spatial–Temporal Information Fusion. Sensors 2025, 25, 3142.
18. Liu, K.; Cui, H.; Ma, J. Graph Neural Network-Based DOA Estimation Method Exploring Training Data Association. Circuits Syst. Signal Process. 2025, 44, 3641–3658.
19. Liu, W. Super resolution DOA estimation based on deep neural network. Sci. Rep. 2020, 10, 19859.
20. Cao, Y.; Lv, T.; Lin, Z.; Huang, P.; Lin, F. Complex ResNet aided DoA estimation for near-field MIMO systems. IEEE Trans. Veh. Technol. 2020, 69, 11139–11151.
21. Al Kassir, H.; Kantartzis, N.V.; Lazaridis, P.I.; Sarigiannidis, P.; Goudos, S.K.; Christodoulou, C.G.; Zaharis, Z.D. Improving DOA estimation via an optimal deep residual neural network classifier on uniform linear arrays. IEEE Open J. Antennas Propag. 2024, 5, 460–473.
22. Liu, Z.M.; Zhang, C.; Yu, P.S. Direction-of-arrival estimation based on deep neural networks with robustness to array imperfections. IEEE Trans. Antennas Propag. 2018, 66, 7315–7327.

Figure 1. Uniform Cylindrical Array Model.

Figure 2. Schematic Diagram of the Overall CA-ResNet Architecture.

Figure 3. Flowchart of Multi-scale Feature Fusion and Attention Module.

Figure 4. Convergence trajectory of the model training loss value.

Figure 5. DOA estimation spectrum of CA-ResNet in a multi-source scenario.

Figure 6. DOA estimation spectrum of MUSIC in a multi-source scenario.

Figure 7. Variation curve of RMSE with SNR for different algorithms.

Figure 8. Results of the ablation study on DOA estimation under different SNRs.

Figure 9. Accuracy of the proposed algorithm and ResNet versus SNR.

Figure 10. Variation curve of RMSE versus the number of snapshots for different algorithms.

Figure 11. Variation curves of RMSE with SNR under different numbers of snapshots.

Figure 12. DOA estimation spectra for different numbers of sources.

Table 1. Algorithm computation time and parameter size.

Algorithm	Calculation Time	Model Parameter Size (k)
MUSIC	1.123 s	—
MLP	0.814 s	102.4
CNN	0.287 s	83.7
ResNet	0.01237 s	51.8
CA-ResNet	0.00532 s	9.7

Table 2. Time consumption comparison across different DOA estimation methods.

Time (s)	MUSIC	MLP	CNN	ResNet	CA-ResNet
Training	/	7699.9835	7697.4476	7704.4633	7790.8754
Testing	1.123	0.841	0.287	0.01237	0.00532

Table 3. Complexity of different DOA estimation algorithms.

Algorithm	Complexity
MUSIC	$O (M^{3} + M^{2} L + K^{3})$
MLP	$O (\sum_{l = 1}^{L_{1}} D_{l} D_{l + 1})$
CNN	$O (\sum_{l = 1}^{L_{2}} C_{l} H_{l} W_{l} C_{l + 1} (K'_{l})^{2})$
ResNet	$O (\sum_{l = 1}^{L_{3}} C_{l} H_{l} W_{l} C_{l + 1} (K'_{l})^{2} + \sum_{l = 1}^{L_{3}} D_{l} D_{l + 1})$
CA-ResNet	$O (\sum_{l = 1}^{L_{4}} C_{l} H_{l} W_{l} C_{l + 1} (K'_{l})^{2} / 4 + \sum_{l = 1}^{L_{4}} D_{l} D_{l + 1} / 4)$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
