Deep Unfolded Gridless DOA Estimation Networks Based on Atomic Norm Minimization

Deep unfolded networks have recently been regarded as an essential approach to direction of arrival (DOA) estimation due to their fast convergence and high interpretability. However, few of them consider gridless DOA estimation. This paper proposes two deep unfolded gridless DOA estimation networks to resolve this problem. We first consider the atomic norm-based 1D and the decoupled atomic norm-based 2D gridless DOA models, each solved by alternating iterative minimization of its variables. Then, the corresponding deep networks are trained offline after constructing complete training datasets. Finally, the trained networks are applied to realize 1D and 2D DOA estimation, respectively. Simulation results reveal that the proposed networks secure higher 1D and 2D DOA estimation performance while maintaining lower computational expenditure than typical methods.


Introduction
The direction of arrival (DOA) estimation technique is widely used in radar, sonar, and other fields, and is an important research direction in array signal processing [1,2]. Typical DOA estimation methods include MUSIC (Multiple Signal Classification) [3] and ESPRIT (Estimation of Signal Parameters via Rotational Invariance Technique) [4]. Thanks to their successful application to 1D DOA estimation problems, subspace super-resolution algorithms have been extended to 2D DOA estimation problems, such as 2D unitary ESPRIT (U-ESPRIT) [5] and the 2D MUSIC algorithm [6]. These algorithms can obtain high angular resolution provided that multiple-snapshot received data are available and the signal sources are known to be incoherent, but if any of these conditions is not satisfied, their estimation performance degrades or even fails.
In recent years, effective alternative algorithms based on compressive sensing techniques have been introduced into the field of DOA estimation, and fruitful research results have been achieved. Traditional compressed sensing algorithms, for example, orthogonal matching pursuit (OMP) [7] and sparse Bayesian learning (SBL) [8], divide the possible space of the signal source into a finite number of grid points in each dimension. They show excellent estimation performance when the target angle falls exactly on the established grid and can be applied to complex scenarios such as single snapshots, signal coherence, and missing data. On the contrary, if the real signal source does not fall on the established grid, a grid mismatch problem arises and the estimation performance degrades or even fails. In addition, traditional compressed sensing algorithms must satisfy the pairwise isometry property (PIP) [9] and require high-density gridding.
1D Signal Model and Its ANM-DOA Model

A gridless continuous compressive sensing technique based on the atomic norm theory and the Vandermonde decomposition theorem has been proposed to overcome these problems. Assume that K_t far-field narrow-band signals are incident on a uniform linear array (ULA) composed of M array elements. Then, the array received signal can be characterized as

Y = X + N, X = Σ_{k_t=1}^{K_t} a(f_{k_t}) s_{k_t} = Σ_{k_t=1}^{K_t} c_{k_t} a(f_{k_t}) b_{k_t}, (1)

where a(f_{k_t}) = [1, e^{j2π f_{k_t}}, ..., e^{j2π(M−1) f_{k_t}}]^T is the steering vector, f_{k_t} = (d/λ) sin θ_{k_t}, λ is the signal wavelength, d = λ/2 is the array spacing, θ_{k_t} is the angle between the k_t-th signal source and the array, and [•]^T represents transposition; s_{k_t} ∈ C^{1×L} is the complex amplitude matrix; c_{k_t} = ||s_{k_t}||_2 > 0 and b_{k_t} = c_{k_t}^{−1} s_{k_t} with ||b_{k_t}||_2 = 1; and N is the zero-mean Gaussian white noise matrix. According to ANM theory [10], the received signal X can be seen as a linear combination of K_t atoms from the set A_M, which can be defined as

A_M = { a(f) b : f ∈ [0, 1), ||b||_2 = 1 }. (2)

The corresponding atomic l_0 norm of X is

||X||_{A_M,0} = inf { K : X = Σ_{k=1}^{K} c_k a(f_k) b_k, c_k > 0 }. (3)

So, the frequencies can be recovered by solving the following optimization problem:

argmin_X ||X||_{A_M,0} s.t. ||Y − X||_F^2 ≤ ε. (4)

Usually, the solution of (4) is obtained through its dual problem [34,37]. By exploiting the semidefinite, Toeplitz, and low-rank properties of the array covariance matrix R = E{YY^H} = T(u) [10], (4) can also be formulated as the following SDP problem:

argmin_{X,u,W} (1/2)[Tr(W) + Tr(T(u))] s.t. [T(u), X; X^H, W] ⪰ 0, ||Y − X||_F^2 ≤ ε, (5)

where E{•} and Tr(•) denote the expectation and trace operations, T(•) denotes the mapping from a vector u ∈ C^{M×1} to a Hermitian Toeplitz matrix T(u) ∈ C^{M×M}, and W ∈ C^{L×L} denotes an auxiliary variable matrix.
For problem (5), the CVX solver SDPT3 can be utilized to obtain the optimal T(u) efficiently [38]. Then, the DOA results are estimated by Vandermonde decomposition of T(u) [34,37]. Although they completely avoid gridding, the dual ANM and ANM-CVX methods have extensive computational complexity and fail to meet real-time requirements.
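For readers who want to reproduce this baseline, the following is a minimal CVXPY sketch of the regularized form of (5). The Toeplitz structure is imposed through per-diagonal equality constraints; the function name anm_sdp, the regularized (rather than constrained) data-fit term, and the use of a generic installed SDP solver in place of SDPT3 are our illustrative choices, not the paper's code.

```python
import cvxpy as cp
import numpy as np

def anm_sdp(Y, tau):
    """Sketch of the SDP (5) in regularized form: tau/2 * Tr(Z) + 1/2 * ||Y - X||_F^2."""
    M, L = Y.shape
    X = cp.Variable((M, L), complex=True)
    Z = cp.Variable((M + L, M + L), hermitian=True)   # Z = [[T(u), X], [X^H, W]]
    constraints = [Z >> 0, Z[:M, M:] == X]
    # Impose Toeplitz structure on the upper-left M x M block:
    # entries along each diagonal must be equal
    for m in range(M):
        for i in range(M - m - 1):
            constraints.append(Z[i, i + m] == Z[i + 1, i + m + 1])
    # Tr(Z) = Tr(T(u)) + Tr(W), matching the objective of (5)
    objective = tau / 2 * cp.real(cp.trace(Z)) + 0.5 * cp.sum_squares(Y - X)
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return X.value, Z.value[:M, :M]                   # recovered X and T(u)
```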

2D Signal Model and Its DANM-DOA Model
Consider K_t spatial far-field narrowband signals impinging on an N × M uniform rectangular array (URA) with half-wavelength element spacing, as shown in Figure 1. The pitch angle φ_{k_t} and azimuth angle θ_{k_t} of the k_t-th incident signal are related to the angles α_{k_t} and β_{k_t} between the signal and the x and y axes as follows:

φ_{k_t} = arcsin( √(sin²(α_{k_t}) + sin²(β_{k_t})) ), (6)
θ_{k_t} = arctan( sin(β_{k_t}) / sin(α_{k_t}) ). (7)

By finding the angles α_{k_t} and β_{k_t}, we can obtain θ_{k_t} and φ_{k_t} according to Equations (6) and (7). Therefore, this paper uses α_{k_t} and β_{k_t} for signal modeling analysis. The array steering vectors and the steering matrices of the x and y dimensions are

a_x(w_x) = [1, e^{j w_x}, ..., e^{j(M−1) w_x}]^T, a_y(w_y) = [1, e^{j w_y}, ..., e^{j(N−1) w_y}]^T,
A_x = [a_x(w_{x,1}), a_x(w_{x,2}), ..., a_x(w_{x,K_t})], A_y = [a_y(w_{y,1}), a_y(w_{y,2}), ..., a_y(w_{y,K_t})]. (8)

The single-snapshot data can be expressed as

X = Σ_{k_t=1}^{K_t} s_{k_t} a_x(w_{x,k_t}) a_y^T(w_{y,k_t}) = A_x S A_y^T, (9)

where S = diag(s_1, ..., s_{K_t}) is a diagonal amplitude matrix, and s_{k_t} is the amplitude of the k_t-th target signal corresponding to the true frequencies of interest w_{x,k_t} = π sin φ_{k_t} cos θ_{k_t} = π sin α_{k_t} and w_{y,k_t} = π sin φ_{k_t} sin θ_{k_t} = π sin β_{k_t}. The two-dimensional frequency sets are w_x = {w_{x,1}, ..., w_{x,K_t}} and w_y = {w_{y,1}, ..., w_{y,K_t}}. 2D DOA estimation means that w_{x,k_t} and w_{y,k_t} are first recovered from the observation data X, and the angles α_{k_t} and β_{k_t} are then obtained for acquiring the pitch angle φ_{k_t} and azimuth angle θ_{k_t}.
According to [17], the set of atoms A_V in the matrix form of the received signal data X can be expressed as

A_V = { a_x(w_x) a_y^T(w_y) : w_x, w_y ∈ [0, 2π) }, (10)

where each atom is a rank-one matrix. The atomic norm of X is defined as

||X||_{A_V} = inf { Σ_k c_k : X = Σ_k c_k a_x(w_{x,k}) a_y^T(w_{y,k}), c_k > 0 }, (11)

and the denoising problem is

argmin_X ||X||_{A_V} s.t. ||Y − X||_F^2 ≤ ε. (12)

Define the one-level Toeplitz matrices T(u_x) and T(u_y) constructed from their first rows u_x ∈ C^{M×1} and u_y ∈ C^{N×1}, respectively. The problem of minimizing Equation (12) can be transformed into the following semidefinite programming (SDP) problem [19]:

argmin_{X, u_x, u_y} (1/2)[Tr(T(u_x)) + Tr(T(u_y))] s.t. [T(u_x), X; X^H, T(u_y)] ⪰ 0, ||Y − X||_F^2 ≤ ε, (13)

where Y denotes the received data with noise. After obtaining T(u_x) and T(u_y), the x- and y-dimensional DOAs can be obtained by the following decompositions:

T(u_x) = A_x D_x A_x^H, T(u_y) = A_y D_y A_y^H, (14)

where D_x and D_y are diagonal matrices. The final 2D DOA is obtained by the pairing procedure after obtaining the DOA in each dimension [39].
For the DANM optimization model (13), an off-the-shelf CVX solver such as SDPT3 can be utilized to obtain T(u_x) and T(u_y), where the dimensionality of the semidefinite constraint matrix is (N + M) × (N + M) [21]. This directly leads to high computational costs when the dimensions N and M are large. Although the ADMM algorithm in [20,40] is an alternative for solving (13), its computational load still needs to be reconsidered.

DU-DOA Estimation
In order to lighten the computational load and enhance the estimation performance efficiently, this section applies the DU-DOA method to solve (5) for 1D and (13) for 2D DOA estimation. According to [22], the DU approach can be seen as unrolling the iterative steps of a sparse recovery (SR) algorithm to design the structure and parameters of a deep neural network. DU methods such as LISTA, LAMP, and LePOM [22][23][24] can all theoretically achieve 2D DOA estimation. However, these methods are unsuitable for solving the ANM-DOA and DANM-DOA models, i.e., (5) and (13). This paper analyzes the ANM-ADMM and DANM-ADMM algorithms to address this problem in Sections 3.1 and 3.3, respectively. The algorithms are then expanded into the deep neural networks ANM-ADMM-Net and DANM-ADMM-Net to achieve fast and accurate 1D and 2D DOA estimation in Sections 3.2 and 3.4, respectively.
1D ANM-ADMM DOA

To facilitate the solution, (5) is rewritten as (15):

argmin_{X, u, W} (τ/2)[Tr(W) + Tr(T(u))] + (1/2)||Y − X||_F^2 s.t. [T(u), X; X^H, W] ⪰ 0, (15)

where τ ∈ R is the regularization factor.
The augmented Lagrangian function of problem (15) can be defined via [41] as

L_ρ(X, W, u, Θ, Λ) = (τ/2)[Tr(W) + Tr(T(u))] + (1/2)||Y − X||_F^2 + ⟨Λ, Θ − Z⟩ + (ρ/2)||Θ − Z||_F^2, Z = [T(u), X; X^H, W], (16)

where Θ ⪰ 0 is the introduced semidefinite variable, Λ ∈ C^{(M+L)×(M+L)} is the Lagrangian multiplier, and ρ > 0 is the penalty factor. Note that (16) is an unconstrained optimization problem (apart from Θ ⪰ 0), so that, given the received signal Y, the unknown signal component X and vector u can be estimated by alternately minimizing the cost function over X, W, u, Θ, and Λ [14]. The specific iterative process can be found in Appendix A.
In summary, the ADMM algorithm for solving (5) is provided in Algorithm 1.

Algorithm 1: ANM-ADMM.
Input: Y, number of iterations K, penalty factor ρ, regularization factor τ.
Initialization: Θ^(0) = 0_{M+L}, Λ^(0) = 0_{M+L}.
For k = 0, 1, ..., K − 1: update X^(k+1), W^(k+1), u^(k+1), Θ^(k+1), and Λ^(k+1) in turn (see Appendix A).
End
Output: Recovered signal of interest X^(K), the optimal estimate u^(K).
Then: estimate f_{k_t} and s_{k_t} by Vandermonde decomposition of T(u^(K)) and compute the DOA by θ_{k_t} = asind(2 f_{k_t}).
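To make Algorithm 1 concrete, the following NumPy sketch implements one plausible set of closed-form updates derived by us from the augmented Lagrangian (16); the exact steps in Appendix A may differ in scaling, so treat this as an illustration rather than the paper's implementation.

```python
import numpy as np

def Tmap(u):
    """Hermitian Toeplitz matrix T(u) whose first row is u."""
    M = len(u)
    idx = np.arange(M)[None, :] - np.arange(M)[:, None]     # j - i
    return np.where(idx >= 0, u[np.abs(idx)], np.conj(u[np.abs(idx)]))

def psd_project(H):
    """Theta-update: eigendecompose and set negative eigenvalues to zero."""
    w, P = np.linalg.eigh((H + H.conj().T) / 2)
    return (P * np.maximum(w, 0)) @ P.conj().T

def anm_admm(Y, tau, rho=0.5, K=30):
    """One plausible realization of Algorithm 1 (updates derived from (16))."""
    M, L = Y.shape
    Theta = np.zeros((M + L, M + L), dtype=complex)         # Theta^(0) = 0_{M+L}
    Lam = np.zeros((M + L, M + L), dtype=complex)           # Lambda^(0) = 0_{M+L}
    for _ in range(K):
        G = Theta + Lam / rho
        # X-update: balances the data fit against the two off-diagonal blocks of G
        X = (Y + rho * (G[:M, M:] + G[M:, :M].conj().T)) / (1 + 2 * rho)
        # W-update: lower-right block shrunk by the trace penalty
        W = G[M:, M:] - tau / (2 * rho) * np.eye(L)
        # u-update: average each diagonal of the upper-left block;
        # the trace penalty acts on the main diagonal u[0]
        GT = (G[:M, :M] + G[:M, :M].conj().T) / 2
        u = np.array([GT.diagonal(m).mean() for m in range(M)])
        u[0] -= tau / (2 * rho)
        Z = np.block([[Tmap(u), X], [X.conj().T, W]])
        Theta = psd_project(Z - Lam / rho)                  # nonlinear (eigenvalue) step
        Lam = Lam + rho * (Theta - Z)                       # multiplier update
    return X, u
```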
According to the above derivation, the parameters of the model-driven ADMM algorithm, including the penalty factor ρ and the regularization factor τ, need to be set in advance, which is a challenge in practical applications. Meanwhile, inappropriate parameter settings will decrease the convergence speed and accuracy of the ADMM algorithm, thus adding to the computational complexity of solving (5) and degrading the DOA estimation performance. Even if proper parameters can be chosen by theoretical analysis and cross-validation [41], fixed parameter settings fail to guarantee the optimal convergence of the ANM-ADMM algorithm. Based on the idea of the DU method, this paper expands the algorithm into a deep neural network, ANM-ADMM-Net, and learns its optimal parameters from constructed data obeying a particular distribution, thus solving the above problems.
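The Vandermonde decomposition step at the end of Algorithm 1 can be illustrated with a standard subspace trick; the ESPRIT-style shift invariance used below is a common stand-in and is our choice, not necessarily the decomposition used in [34,37].

```python
import numpy as np

def freqs_from_toeplitz(Tu, Kt):
    """Recover f_1..f_Kt from T(u) = A diag(p) A^H via shift invariance (ESPRIT-style)."""
    w, V = np.linalg.eigh(Tu)
    Us = V[:, np.argsort(w)[-Kt:]]                          # top-Kt (signal) eigenvectors
    Phi = np.linalg.lstsq(Us[:-1], Us[1:], rcond=None)[0]   # Us[1:] ~ Us[:-1] @ Phi
    f = np.angle(np.linalg.eigvals(Phi)) / (2 * np.pi)
    return np.sort(f)

# DOA then follows the paper's mapping theta = asind(2 f):
# theta = np.degrees(np.arcsin(2 * freqs_from_toeplitz(Tu, Kt)))
```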
It is necessary to note that the optimal estimate T(u^(K)) of the algorithm is the critical quantity for the subsequent frequency estimation. Therefore, we can consider this output as a label and construct a proper loss function in the next subsection.

1D ANM-ADMM-Net DOA
For the optimization problem shown in (5), when the system parameters are given and the complex amplitude and noise all obey a particular distribution, the received data Y will also have a particular distribution.At this point, assume that an optimal set of parameter sequences exists so that (5) can be solved quickly and accurately by the ADMM algorithm for all received signals, with the DOA obeying a specific distribution.Therefore, this subsection constructs the network ANM-ADMM-Net to tackle the problems of the ADMM algorithm based on its iterative steps.This network couples the interpretability of the model-driven algorithm and the nonlinear fitting capability of the data-driven deep learning method.Training networks based on a sufficient and complete training data set can obtain optimal iterative parameters, thereby reducing the number of iterations and further enabling higher speed and ameliorated performance of DOA estimation.In the following, the four parts of ANM-ADMM-Net: network structures, dataset construction, network initialization, and training are described thoroughly.


Network Structure
According to the steps in Algorithm 1, the ANM-ADMM algorithm can be mapped to a K-layer network, ANM-ADMM-Net, shown in Figure 2, whose inputs are Y, Θ^(0), and Λ^(0) and whose learnable parameters are {ρ_{k+1}, τ_{k+1}, η_{k+1}}_{k=0}^{K−1}, to arrive at the signal component X^(K) and the Toeplitz matrix T(u^(K)). The (k+1)-th layer operation of ANM-ADMM-Net contains five main structural sub-layers: the reconstruction sub-layer A^(k+1), the auxiliary variable update sub-layer B^(k+1), the Toeplitz transform sub-layer C^(k+1), the nonlinear sub-layer D^(k+1), and the multiplier update sub-layer E^(k+1), corresponding to the steps of Algorithm 1. The specific description is as follows.
(1) Reconstruction sub-layer A^(k+1): taking the output Λ^(k)_X of the sub-layer E^(k), the output Θ^(k)_X of the sub-layer D^(k) of the k-th layer, and the received signal Y as inputs, the output X^(k+1) is updated as in the X-update step of Appendix A, where ρ_{k+1} is the learnable parameter. The output X^(k+1) of A^(k+1) is used as the input of the sub-layers D^(k+1) and E^(k+1) in the (k + 1)-th layer.
(2) Auxiliary variable update sub-layer B^(k+1): taking the output Θ^(k)_W of the sub-layer D^(k) and Λ^(k)_W of the sub-layer E^(k) in the k-th layer as inputs, the output of B^(k+1) is given by the W-update step, where τ_{k+1} is the learnable parameter. The output W^(k+1) is the input of the sub-layers D^(k+1) and E^(k+1) in the (k + 1)-th layer.
(3) Toeplitz transform sub-layer C^(k+1): the outputs Θ^(k)_{T(u)} of the sub-layer D^(k) and Λ^(k)_{T(u)} of the sub-layer E^(k) in the k-th layer are used as inputs, and the output u^(k+1) is given by the u-update step; u^(k+1) is the input of the sub-layers D^(k+1) and E^(k+1) in the (k + 1)-th layer.
(4) Nonlinear sub-layer D^(k+1): taking the output Λ^(k) of the sub-layer E^(k) of the k-th layer, the output X^(k+1) of the sub-layer A^(k+1), the output W^(k+1) of the sub-layer B^(k+1), and the output u^(k+1) of the sub-layer C^(k+1) as inputs, the output of D^(k+1) is given by the eigenvalue-projection (Θ-update) step. The output Θ^(k+1) of D^(k+1) is used as the input of the sub-layers A^(k+2), B^(k+2), and C^(k+2) in the (k + 2)-th layer.
(5) Multiplier update sub-layer E^(k+1): taking the output Λ^(k) of the sub-layer E^(k) of the k-th layer and the outputs X^(k+1), W^(k+1), u^(k+1), and Θ^(k+1) of the sub-layers A^(k+1), B^(k+1), C^(k+1), and D^(k+1), respectively, as inputs, the output of E^(k+1) is updated by the multiplier step, where the multiplier update rate η_{k+1} is the learnable parameter. The output Λ^(k+1) of E^(k+1) is used as the input in the (k + 2)-th layer. It is essential to emphasize that the new parameter η_{k+1} is added to further enhance the learning capability and performance of ANM-ADMM-Net compared with updating the multipliers by ρ in ANM-ADMM (as shown in Algorithm 1).
Considering that each sub-layer's parameters are learned and tuned, there will be 3K parameters in total for a K-layer ANM-ADMM-Net, i.e., {ρ_{k+1}, τ_{k+1}, η_{k+1}}_{k=0}^{K−1}. Compared with the ANM-ADMM algorithm, where the parameters are fixed, this parameter learning strategy gives ANM-ADMM-Net superior flexibility and a strong nonlinear fitting capability [36]. More importantly, the design of the network is guided by the model and is highly interpretable.
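A compact PyTorch rendering of one such layer, assuming the same update forms as the NumPy sketch above, may clarify how the 3K parameters enter; the module and helper names are hypothetical.

```python
import torch
import torch.nn as nn

def toeplitz(u):
    """Hermitian Toeplitz matrix from its first row u (differentiable)."""
    M = u.shape[0]
    idx = torch.arange(M)[None, :] - torch.arange(M)[:, None]   # j - i
    return torch.where(idx >= 0, u[idx.abs()], u[idx.abs()].conj())

class ANMADMMLayer(nn.Module):
    """One unfolded layer (sub-layers A-E) with its own learnable rho, tau, eta."""
    def __init__(self, rho0=0.5, tau0=0.01, eta0=0.5):
        super().__init__()
        self.rho = nn.Parameter(torch.tensor(rho0))
        self.tau = nn.Parameter(torch.tensor(tau0))
        self.eta = nn.Parameter(torch.tensor(eta0))

    def forward(self, Y, Theta, Lam):
        M, L = Y.shape
        G = Theta + Lam / self.rho
        X = (Y + self.rho * (G[:M, M:] + G[M:, :M].conj().T)) / (1 + 2 * self.rho)  # A
        W = G[M:, M:] - (self.tau / (2 * self.rho)) * torch.eye(L, dtype=Y.dtype)   # B
        GT = (G[:M, :M] + G[:M, :M].conj().T) / 2
        u = torch.stack([GT.diagonal(m).mean() for m in range(M)])                  # C
        u = torch.cat([u[:1] - self.tau / (2 * self.rho), u[1:]])
        Z = torch.cat([torch.cat([toeplitz(u), X], dim=1),
                       torch.cat([X.conj().T, W], dim=1)], dim=0)
        e, P = torch.linalg.eigh(Z - Lam / self.rho)                                # D
        Theta = (P * e.clamp(min=0)) @ P.conj().T
        Lam = Lam + self.eta * (Theta - Z)            # E: learnable eta replaces rho
        return Theta, Lam, X, u
```

Stacking K such modules, each with its own parameter triple, yields the 3K learnable parameters described above while keeping every operation traceable to a step of Algorithm 1.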

Data Construction
The proposed ANM-ADMM-Net is a sparse recovery approach jointly driven by model and data. The key to its effectiveness is constructing a reasonable dataset with generalization capability. By building an adequate and complete dataset, ANM-ADMM-Net is less prone to overfitting during training and performs better DOA estimation. Thus, this paper randomly generates received data Y obeying a particular distribution together with the signal components X. Specifically:
(1) Given the array element number M, the frequency range (f_min, f_max], the corresponding angle range (θ_min, θ_max] = (asind(2 f_min), asind(2 f_max)], the number of snapshots L, and the number of samples D.
(2) Given the maximum value of the number of sources K_t, randomly generate the source number; in the multiple-source case, the frequency interval between any two sources needs to satisfy min_{i≠j} |f_i − f_j| ≥ 1/4(M − 1). The angles of the sources are then obtained as θ_{1:K_t}.
(3) Generate the received signal Y according to (1), where the amplitude obeys the standard normal distribution and N is Gaussian noise with a given SNR.
(4) Repeat the above to obtain the set of D received signals {Y_d, X_d}_{d=1}^D and the set of the frequencies and angles {f_{1:K_t}, θ_{1:K_t}}_{d=1}^D. The ideal label set {u_d}_{d=1}^D can be obtained by Vandermonde decomposition after the dual atomic norm minimization method [34,37].
(5) Randomly divide the above set into a training dataset {Y_q^train, u_q^train}_{q=1}^{Q=0.8D} and a testing dataset, as sketched below.
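A minimal sketch of steps (1) to (3) for one sample might look as follows; the rejection-sampling loop and the separation 1/4(M − 1) mirror the recipe above, while the function name and SNR handling are our assumptions.

```python
import numpy as np

def gen_1d_sample(M, L, Kt_max, f_range=(-0.5, 0.5), snr_db=None, rng=None):
    """One (Y, X, f) training/testing sample following steps (1)-(3)."""
    rng = rng or np.random.default_rng()
    Kt = int(rng.integers(1, Kt_max + 1))
    sep = 1 / (4 * (M - 1))
    while True:                                              # rejection-sample frequencies
        f = rng.uniform(f_range[0], f_range[1], Kt)
        if Kt == 1 or np.min(np.diff(np.sort(f))) >= sep:
            break
    A = np.exp(2j * np.pi * np.outer(np.arange(M), f))       # M x Kt steering matrix
    S = (rng.standard_normal((Kt, L)) + 1j * rng.standard_normal((Kt, L))) / np.sqrt(2)
    X = A @ S
    Y = X.copy()
    if snr_db is not None:                                   # noise-free data if None
        npow = np.mean(np.abs(X) ** 2) / 10 ** (snr_db / 10)
        Y = X + np.sqrt(npow / 2) * (rng.standard_normal(X.shape)
                                     + 1j * rng.standard_normal(X.shape))
    return Y, X, np.sort(f)
```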

Network Initialization and Training
Consider the difficulty of the mapping T(•) in (5), its adjoint T*(•), and the eigendecomposition in Algorithm 1 when training ANM-ADMM-Net. Proper initialization of the parameters Ω = {ρ_{k+1}, τ_{k+1}, η_{k+1}}_{k=0}^{K−1} and an appropriate training method, including the loss function, optimizer, and learning schedules, will make it easier to reach convergence and, to a certain extent, avoid falling into a locally optimal solution.

(1) Network initialization
The initial values of the parameters in each layer are set as ρ_{1:K} = ρ_0, τ_{1:K} = τ_0, and η_{1:K} = η_0 to enhance the proposed method's flexibility based on the theoretical analysis [41]. Compared with the ANM-ADMM algorithm with fixed parameter settings, ANM-ADMM-Net will substantially increase the convergence rate (i.e., reduce the number of iterations) and shorten the time to solve (5), with guaranteed convergence performance.

(2) Network training
The Adam algorithm is adopted for learning and tuning the parameters with an initial learning rate of 1 × 10^−3 to achieve the possible global optimum rapidly. Based on the training dataset {Y_q^train, u_q^train}_{q=1}^{Q=0.8D} constructed in Section 3.2.2 and given the number of network layers K, the optimal parameters Ω* = {ρ_{k+1}, τ_{k+1}, η_{k+1}}_{k=0}^{K−1} can be obtained by minimizing the following normalized mean square error (NMSE) loss function using the principle of back propagation (BP) [42], i.e.,

L(Ω) = (1/Q) Σ_{q=1}^{Q} ||T(u)^(K)(Ω, Θ^(0), Λ^(0), Y_q^train) − T(u_q^train)||_F^2 / ||T(u_q^train)||_F^2,

where T(u)^(K)(Ω, Θ^(0), Λ^(0), Y_q^train) denotes the estimated output of the K-th Toeplitz transform sub-layer of the network with parameters Ω and Θ^(0) = 0_{M+L}, Λ^(0) = 0_{M+L}, and Y_q^train as inputs.
Then, according to the given testing dataset and after obtaining the optimal parameters Ω*, the recovered signal X_o^test and the Toeplitz matrix T(u_o^test) can be estimated online by forward propagation of the trained network. At the end of the ANM-ADMM-Net DOA method, T(u_o^test) is fed to the Vandermonde decomposition to obtain the frequencies f_{k_t} and amplitude values s_{k_t}, thereby achieving the estimated DOA θ_{k_t} = asind(2 f_{k_t}).
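The offline training described above reduces to a standard Adam loop over the NMSE loss. In this sketch, net stands for a K-layer unfolded module (for example, a stack of the ANMADMMLayer sketched earlier) that maps Y to T(u^(K)); the function names are our own.

```python
import torch

def nmse_loss(T_pred, T_label):
    """Normalized MSE between the network's Toeplitz output and its label."""
    num = torch.sum(torch.abs(T_pred - T_label) ** 2)
    den = torch.sum(torch.abs(T_label) ** 2)
    return num / den

def train(net, dataset, epochs=450, lr=1e-3):
    """Adam training over (Y, T(u)) pairs from the constructed training set."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        for Y, T_label in dataset:
            loss = nmse_loss(net(Y), T_label)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return net
```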

2D DANM-ADMM DOA
To facilitate the solution, (13) is rewritten as (25):

argmin_{X, u_x, u_y} (λ/2)[Tr(T(u_x)) + Tr(T(u_y))] + (1/2)||Y − X||_F^2 s.t. [T(u_x), X; X^H, T(u_y)] ⪰ 0, (25)

where λ ∈ R is the regularization factor. Then, the augmented Lagrangian function of problem (25) can be defined via [41] as

L_ρ(X, u_x, u_y, Θ, Λ) = (λ/2)[Tr(T(u_x)) + Tr(T(u_y))] + (1/2)||Y − X||_F^2 + ⟨Λ, Θ − Z⟩ + (ρ/2)||Θ − Z||_F^2, Z = [T(u_x), X; X^H, T(u_y)], (26)

where Λ ∈ C^{(N+M)×(N+M)} is the Lagrangian multiplier and ρ > 0 is the penalty factor. Note that (26) is an unconstrained optimization problem (apart from Θ ⪰ 0), so that, given the received signal Y, the unknown signal component X and the Toeplitz matrices T(u_x) and T(u_y) can be estimated by alternately minimizing the objective function over X, u_x, u_y, Θ, and Λ. By the same token as in Algorithm 1, the ADMM algorithm for solving (13) is provided in Algorithm 2.

Algorithm 2: DANM-ADMM.
Input: Y, number of iterations K, penalty factor ρ, regularization factor λ.
Initialization: Θ^(0) = 0_{M+N}, Λ^(0) = 0_{M+N}.
For k = 0, 1, ..., K − 1: update X^(k+1), u_x^(k+1), u_y^(k+1), Θ^(k+1), and Λ^(k+1) in turn.
End
Output: Recovered signal of interest X^(K), the optimal estimates u_x^(K) and u_y^(K).
Then: compute T(u_x^(K)) and T(u_y^(K)); retrieve the two-dimensional frequencies (ŵ_x, ŵ_y), without pairing, by the root-MUSIC algorithm or the matrix pencil method [38]; obtain the final 2D DOA estimation by the following pairing technique.
where Θ and Λ are Hermitian matrices; [δ_g]_+ denotes that all eigenvalues δ_g smaller than zero are set to zero; and P is a unitary matrix satisfying PP^H = I_{M+N}. According to the above derivation, the parameters of the model-driven ADMM algorithm, including the penalty factor ρ and the regularization factor λ, need to be set in advance, which is a challenge in practical applications. Based on the idea of the DU method, this paper also expands the algorithm into a deep neural network, DANM-ADMM-Net, in the manner of Section 3.2, thus solving the above problems. The optimal estimates T(u_x^(K)) and T(u_y^(K)) can be considered as labels to construct a proper loss function in the next subsection.
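The pairing technique at the end of Algorithm 2 can be sketched as follows: the separately recovered x and y frequencies are paired through a least-squares amplitude matrix. The greedy per-row matching below is a simplification of the procedure in [39], and the function names are our own.

```python
import numpy as np

def steer(n, w):
    """Steering vector [1, e^{jw}, ..., e^{j(n-1)w}]^T."""
    return np.exp(1j * w * np.arange(n))

def pair_frequencies(X, wx, wy):
    """Pair separately recovered x/y frequencies via the amplitude matrix of X ~ Ax S Ay^T."""
    M, N = X.shape
    Ax = np.stack([steer(M, w) for w in wx], axis=1)        # M x Kt
    Ay = np.stack([steer(N, w) for w in wy], axis=1)        # N x Kt
    S = np.linalg.pinv(Ax) @ X @ np.linalg.pinv(Ay.T)       # S is near-diagonal after pairing
    # For each wx_i, pick the wy_j with the largest amplitude magnitude
    return [(wx[i], wy[int(np.argmax(np.abs(S[i])))]) for i in range(len(wx))]
```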

2D DANM-ADMM-Net DOA
In the following, the four parts of DANM-ADMM-Net (network structure, dataset construction, network initialization, and training) are described thoroughly.
Network Structure

According to the steps in Algorithm 2, the DANM-ADMM algorithm can be mapped to a K-layer network, DANM-ADMM-Net, shown in Figure 3, whose inputs are Y, Θ^(0), and Λ^(0) and whose learnable parameters are {ρ_{k+1}, λ_{k+1}, γ_{k+1}}_{k=0}^{K−1}, to arrive at the signal component X^(K) and the two Toeplitz matrices T(u_x^(K)) and T(u_y^(K)). The (k+1)-th layer operation F^(k+1) of DANM-ADMM-Net contains five main structural sub-layers: the reconstruction sub-layer A^(k+1), the Toeplitz transform sub-layers B^(k+1) and C^(k+1), the nonlinear sub-layer D^(k+1), and the multiplier update sub-layer E^(k+1). The specific description is as follows.

(1) Reconstruction sub-layer A^(k+1): taking the output Λ^(k)_X of the sub-layer E^(k), the output Θ^(k)_X of the sub-layer D^(k) of the k-th layer, and the received signal Y as inputs, the output X^(k+1) is updated as in the X-update step of Algorithm 2, where ρ_{k+1} is the learnable parameter.

(2) and (3) Toeplitz transform sub-layers B^(k+1) and C^(k+1): taking the outputs Θ^(k)_{T(u_x)} and Θ^(k)_{T(u_y)} of the sub-layer D^(k) and Λ^(k)_{T(u_x)} and Λ^(k)_{T(u_y)} of the sub-layer E^(k) in the k-th layer as inputs, the outputs u_x^(k+1) and u_y^(k+1) are given by the corresponding u-update steps, where λ_{k+1} is the learnable parameter.

(4) Nonlinear sub-layer D^(k+1): taking the output Λ^(k) of the sub-layer E^(k) of the k-th layer and the outputs X^(k+1), u_x^(k+1), and u_y^(k+1) of the sub-layers A^(k+1), B^(k+1), and C^(k+1) as inputs, the output Θ^(k+1) is given by the eigenvalue-projection step. The output Θ^(k+1) of D^(k+1) is used as the input of the sub-layers A^(k+2), B^(k+2), and C^(k+2) in the (k + 2)-th layer.

(5) Multiplier update sub-layer E^(k+1): taking the output Λ^(k) of the sub-layer E^(k) of the k-th layer and the outputs X^(k+1), u_x^(k+1), u_y^(k+1), and Θ^(k+1) of the sub-layers A^(k+1), B^(k+1), C^(k+1), and D^(k+1), respectively, as inputs, the output of E^(k+1) is updated by the multiplier step, where the multiplier update rate γ_{k+1} is the learnable parameter; the output Λ^(k+1) of E^(k+1) is used as the input in the (k + 2)-th layer. The new parameter γ_{k+1} is also added to further enhance the learning capability and performance of DANM-ADMM-Net compared with updating the multipliers by ρ in DANM-ADMM (as shown in Algorithm 2).
Considering that each sub-layer's parameters are learned and tuned, there will be 3K parameters in total for a K-layer DANM-ADMM-Net, i.e., {ρ_{k+1}, λ_{k+1}, γ_{k+1}}_{k=0}^{K−1}.

Data Construction
This subsection randomly generates the received data Y and constructs the dataset. Specifically:
(1) Given the array element number M, the pulse number N, the pitch angle range, the azimuth angle range, and the number of samples D.
(2) Given the maximum value of the number of sources K_t, randomly generate the source number k_t.
(3) For each k_t ∈ {1, 2, ..., K_t}, compute w_{x,k_t} = π sin φ_{k_t} cos θ_{k_t} and w_{y,k_t} = π sin φ_{k_t} sin θ_{k_t} satisfying min_{i≠j} |w_{x,i} − w_{x,j}| ≥ 1/4(N − 1) and min_{i≠j} |w_{y,i} − w_{y,j}| ≥ 1/4(M − 1) [19]; generate X = Σ_{k_t} s_{k_t} a_x(w_{x,k_t}) a_y^T(w_{y,k_t}) = A_x S A_y^T, where the amplitude s_{k_t} obeys the complex standard normal distribution, and Y = X + N_p, where N_p is complex Gaussian white noise with a given SNR.
(4) Divide the received signal data {Y_d, X_d}_{d=1}^D into Q training data and D − Q testing data.
(5) Use the DANM-CVX method to address (13), thereby obtaining the training label set {X_q^train, T(u_{x,q}^train), T(u_{y,q}^train)}_{q=1}^Q according to the pairing technique in Algorithm 2. A sketch of steps (1) to (3) follows.
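The following sketch draws a single single-snapshot URA sample per the steps above; the separation conditions are written as printed (their exact units are our reading), and the function name and SNR handling are our assumptions.

```python
import numpy as np

def gen_2d_sample(M, N, Kt_max, snr_db=20, rng=None):
    """One single-snapshot URA sample X = Ax S Ay^T following steps (1)-(3)."""
    rng = rng or np.random.default_rng()
    Kt = int(rng.integers(1, Kt_max + 1))

    def draw(k, sep):
        # Rejection-sample k frequencies with pairwise separation at least sep
        while True:
            w = rng.uniform(-np.pi, np.pi, k)
            if k == 1 or np.min(np.diff(np.sort(w))) >= sep:
                return w

    wx = draw(Kt, 1 / (4 * (N - 1)))                         # separation as printed above
    wy = draw(Kt, 1 / (4 * (M - 1)))
    Ax = np.exp(1j * np.outer(np.arange(M), wx))             # M x Kt
    Ay = np.exp(1j * np.outer(np.arange(N), wy))             # N x Kt
    s = (rng.standard_normal(Kt) + 1j * rng.standard_normal(Kt)) / np.sqrt(2)
    X = Ax @ np.diag(s) @ Ay.T
    npow = np.mean(np.abs(X) ** 2) / 10 ** (snr_db / 10)
    noise = np.sqrt(npow / 2) * (rng.standard_normal((M, N))
                                 + 1j * rng.standard_normal((M, N)))
    return X + noise, X, wx, wy                              # Y, X, true frequencies
```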

Network Initialization and Training
The initial values of the parameters in each layer are set as ρ_{1:K} = ρ_0, λ_{1:K} = λ_0, and γ_{1:K} = γ_0. The Adam algorithm is adopted for learning and tuning the parameters with an initial learning rate of 2 × 10^−3 to achieve the possible global optimum rapidly. Based on the training dataset {X_q^train, T(u_{x,q}^train), T(u_{y,q}^train)}_{q=1}^Q constructed in Section 3.4.2 and given the number of network layers K, the optimal parameters Ω* = {ρ_{k+1}, λ_{k+1}, γ_{k+1}}_{k=0}^{K−1} can be obtained by minimizing the following NMSE loss function, i.e.,

L(Ω) = (1/Q) Σ_{q=1}^{Q} [ ||T(u_x)^(K)(Ω, Θ^(0), Λ^(0), X_q^train) − T(u_{x,q}^train)||_F^2 / ||T(u_{x,q}^train)||_F^2 + ||T(u_y)^(K)(Ω, Θ^(0), Λ^(0), X_q^train) − T(u_{y,q}^train)||_F^2 / ||T(u_{y,q}^train)||_F^2 ],

where T(u_x)^(K)(Ω, Θ^(0), Λ^(0), X_q^train) and T(u_y)^(K)(Ω, Θ^(0), Λ^(0), X_q^train) denote the estimated outputs of the K-th Toeplitz transform sub-layers of the network with parameters Ω and Θ^(0) = 0_{M+N}, Λ^(0) = 0_{M+N}, and X_q^train as inputs.
Then, according to the given testing dataset and after obtaining the optimal parameters Ω*, the recovered signal X_o^test and the Toeplitz matrices T(u_{x,o}^test) and T(u_{y,o}^test) can be estimated online by forward propagation of the trained network. Following the post-processing procedures of the DANM-ADMM algorithm in Algorithm 2, the 2D DOA estimation is thereby achieved at the end of the DANM-ADMM-Net method.

Experiment Results
In this section, we evaluate the DU-DOA methods based on ANM-ADMM-Net and DANM-ADMM-Net through simulation experiments. Considering that the ADMM algorithm is the only iterative algorithm for solving the ANM and DANM models, we compare the proposed networks with it, with fixed parameters, and with traditional 1D and 2D DOA estimation methods. For the convenience of training and testing, all offline training procedures are implemented in Python 3.8 on an Intel(R) Core i7-6246 3.30 GHz CPU and an NVIDIA Quadro GV100 GPU. Once the optimal parameters are obtained after training, all testing simulations are implemented online in MATLAB 2020b.
Since the noise level is generally unknown in practical applications, only noise-free training data are used in the training process of the networks in this paper. Noise is then added to the testing data to verify the performance under different SNRs. The training and testing datasets for DOA estimation are constructed with the parameters in Tables 1 and 2 according to Sections 3.2.2 and 3.4.2, respectively.

Network Convergence Analysis

This subsection investigates the convergence performance of ANM-ADMM-Net and DANM-ADMM-Net under different numbers of layers and compares them with the traditional algorithms with fixed iterative parameters. The iteration parameters of ANM-ADMM are set as ρ0 = 0.5, τ0 = 0.01, and η0 = 0.5. Different numbers of network layers K = 10~40 are set, and the network is initialized and trained (450 epochs); the results are shown in Figure 4. Figure 4a,b shows the training and testing NMSEs when K = 40: both decrease with increasing training time and effectively reach convergence. Figure 4c shows that, as the number of network layers (iterations) increases, the NMSEs of both ANM-ADMM-Net and the ANM-ADMM algorithm gradually decrease, and the former is much smaller than the latter. In addition, their NMSEs are equal only when the number of iterations of the ANM-ADMM algorithm is at least fifty to sixty times the number of layers of ANM-ADMM-Net, which implies that the computing complexity required for convergence is greatly reduced.

The iteration parameters of the DANM-ADMM algorithm are set as ρ0 = 0.1, λ0 = 0.1, and γ0 = 0.1. Different numbers of network layers K = 5~25 are set, and DANM-ADMM-Net is initialized and trained (600 epochs); the results are shown in Figure 5. Figure 5a,b shows the training and testing NMSEs of DANM-ADMM-Net when K = 5/10: both decrease with increasing training time and effectively reach convergence after 500 epochs. Figure 5c shows the NMSEs of both methods when K = 5~25, and Figure 5d reveals the NMSEs of the algorithm when the iterations K = 5~300. As the number of network layers (iterations) increases, the NMSEs of both DANM-ADMM-Net and the DANM-ADMM algorithm decrease, but the NMSE of the former is much smaller than that of the latter. From Figure 5c,d, the two have similar NMSEs only when the number of iterations of the algorithm is fifty times the number of network layers when K = 10. Therefore, it can be concluded that the proposed networks can learn the optimal iteration parameters from the constructed dataset and obtain better convergence performance.

1D DOA Estimation Results Analysis
This subsection investigates the performance of ANM-ADMM-Net for DOA estimation and its advantages over the SBL, dual ANM, ANM-CVX, and ANM-ADMM methods with simulation experiments. The root mean square error (RMSE) is evaluated by M_c = 100 Monte Carlo trials.

The results of different methods are shown in Figure 6, where the grid number N = 100, K = 200, and λ = 1e−3 for the SBL algorithm [8], and K = 30, L = 5 for ANM-ADMM and ANM-ADMM-Net. The results indicate that unsuitable grid division inevitably causes an estimated DOA offset for the SBL algorithm. The dual ANM and ANM-CVX methods always give the optimal estimation results. ANM-ADMM cannot recover the amplitude and angle of the signal perfectly due to the limited number of iterations. However, the proposed ANM-ADMM-Net estimates the amplitude and angle better than the ANM-ADMM algorithm with fixed parameters, demonstrating the effectiveness of the combination of model-driven and data-driven DU methods. In practical applications, the networks can be trained offline under different simulation conditions to determine the range of network layers that obtains better DOA estimation performance at lower computational complexity; the number of network layers can then be selected according to the actual situation.

Figure 7 shows the test RMSEs when the number of network layers K = 10~40. The RMSEs of ANM-ADMM and ANM-ADMM-Net gradually decrease as the number of network layers (iterations) increases, and the latter is much smaller than the former with fixed parameters. These results demonstrate that, the more layers the network has, the more optimal the iterative parameters learned from the constructed dataset, owing to its powerful nonlinear fitting capability and superior flexibility, resulting in better DOA estimation performance. In other words, the DU-gridless DOA method based on ANM-ADMM-Net is suitable for efficiently resolving different sparsity problems (different numbers of target signals) at a lower computational cost.

The testing dataset with SNR = 0~60 dB is constructed to verify the noise robustness according to Section 3.2.2 (i.e., testing data are generated for every SNR, and the noise regularization factor of dual ANM is 0.1), and the test results when K = 40 are shown in Figure 7b. It can be seen that when SNR < 20 dB, ANM-ADMM cannot be implemented for estimation. However, the proposed method still performs well under limited network layers/iterations, demonstrating its higher noise robustness with the optimal parameters. In addition, when SNR > 40 dB, the test RMSE of ANM-ADMM-Net tends to be stable and close to the results obtained in the noise-free case. This implies that, even if the network is trained with noise-free data, the proposed method still obtains good parameters and can perform DOA estimation on actual array data containing noise.

2D DOA Estimation Results Analysis
This subsection investigates the performance of DANM-ADMM-Net for 2D DOA estimation and its upsides over the conventional 2D DOA methods with simulation experiments.The root mean square error (RMSE) is also evaluated by M c = 100 Monte Carlo trials.
The 2D DOA results estimated by different methods when M = 20 and K = 20 are shown in Figure 8, where the parameters of the 2D-MUSIC method are set to SNR = 20 dB and the number of snapshots L = 20. The results in Figure 8b show that the DANM-ADMM algorithm fails to converge precisely to the ground truth within a finite number of iterations due to improper parameter settings, resulting in some deviations in both the azimuth and pitch angles estimated for individual targets. In contrast, the parameters of the proposed method are optimized for the data and the network, so it converges completely to a better result within a finite number of layers.
Figure 9a gives the test RMSEs when the number of array elements M = 20 and the number of network layers K = 5~25. The RMSEs of DANM-ADMM and DANM-ADMM-Net gradually decrease as the number of network layers (iterations) increases. When K = 25, the RMSE of the latter is reduced by 20 dB compared with the former with fixed parameters, which indicates that the more layers the network has, the more optimal the iterative parameters learned from the constructed dataset, resulting in better 2D DOA estimation performance.
The test dataset is constructed according to Section 3.4.3 with SNR = 0~60 dB (i.e., test data are generated for each SNR) to verify the noise robustness of DANM-ADMM-Net. The results when K = 25 are shown in Figure 9b. The tested RMSEs of both methods decrease with increasing SNR under a limited number of network layers/iterations. If the network is trained with noisy data, for example, at SNR = 10 dB, DANM-ADMM-Net performs better than DANM-ADMM under all circumstances. Therefore, to ensure higher noise robustness, one can first estimate the signal-to-noise ratio and then train the network with the corresponding noisy data.

Computational Complexity and Running Time Comparison

This subsection analyzes the computational complexity and running time of the SDPT3-based ANM-CVX method and ANM-ADMM-Net for different numbers of array elements when the network layers K = 40. The running time results when M = 10~50, L = 5 and when M = 10, L = 1~15 are shown in Figure 10. The comparison of DANM-CVX and DANM-ADMM-Net when the network layers K = 25 is also shown in Figure 11.

From [43], the computational complexity of the SDPT3-based CVX solver is O{q_1^2 q_2^2}, where q_1 denotes the number of variables and q_2 denotes the dimensionality of the SDP matrix. Therefore, the computational complexity of the CVX method is O{Q(M + L)^2(ML + M + L)^2} for solving the ANM model and O{Q(M + N)^2(MN + M + N)^2} for solving the DANM model, while that of the ADMM method is O{Q(M + L)^3} and O{Q(M + N)^3}, respectively, where Q denotes the number of iterations. This indicates that, when the matrix size is large, the computational complexity of the ANM-ADMM and DANM-ADMM methods is much smaller than that of the ANM-CVX and DANM-CVX methods, respectively.

It is important to emphasize that the computational complexity analysis of ANM/DANM-ADMM-Net in this paper does not include the cost of network training, since the networks can be trained offline and applied online. Moreover, after training obtains the optimal parameters, the networks and their ADMM algorithms perform identical per-layer computations and differ only in the iterative parameters; thus, they have the same computational complexity for the same number of layers (iterations). However, considering the reduced iteration requirements of the networks, their overall computational cost is further decreased compared with the algorithms.
The results in Figure 10 show that the growth rate of the proposed method's running time is much lower than that of the ANM-CVX and dual ANM methods as the number of array elements or snapshots increases. In conclusion, ANM-ADMM-Net is a viable alternative in the presence of larger-scale arrays or snapshots and high real-time requirements, which is consistent with the above theoretical analysis. The same conclusion can also be drawn for DANM-CVX and DANM-ADMM-Net from Figure 11.

Conclusions
In this paper, a DU-based gridless DOA method is proposed. The ANM-ADMM and DANM-ADMM algorithms are examined in the continuous domain, and the deep neural networks ANM-ADMM-Net and DANM-ADMM-Net are constructed from their iterations. Their network structures, dataset construction methods, network initialization, and training methods are introduced, and their performance is validated by simulation experiments. The results show that, compared with the existing methods, the model-plus-data-driven ANM-ADMM-Net and DANM-ADMM-Net can learn the optimal iteration parameters from the data and quickly obtain more accurate 1D and 2D DOA estimates, while the required computational complexity is reduced by factors of about 50~60 and 20, respectively.

Figure 1. URA signal model.

Figure 4. Convergence performance of ANM-ADMM-Net and its comparison with the ANM-ADMM algorithm. (a) Training NMSEs when M = 10. (b) Testing NMSEs when M = 10. (c) Testing NMSEs when K = 10~40. (d) Testing NMSEs when K = 50~4050.


Figure 11. Running time of different methods versus array element number M.