Direction of Arrival Estimation in Elliptical Models via Sparse Penalized Likelihood Approach

In this paper, an l1-penalized maximum likelihood (ML) approach is developed for estimating the directions of arrival (DOAs) of source signals from complex elliptically symmetric (CES) array outputs. The approach employs the l1-norm penalty to exploit the sparsity of the gridded directions, and the CES distribution setting has the merit of robustness to uncertainty in the distribution of the array output. To solve the constructed non-convex penalized ML optimization for either spatially uniform or non-uniform sensor noise, two majorization-minimization (MM) algorithms based on different majorizing functions are developed, and their computational complexities are analyzed. A modified Bayesian information criterion (BIC) is provided for selecting an appropriate penalty parameter. The effectiveness and superiority of the proposed methods in producing high DOA estimation accuracy are shown in numerical experiments.


Introduction
Estimating the directions of arrival (DOAs) of a number of far-field narrow-band source signals is an important problem in signal processing. Many DOA estimation methods were proposed early on, such as multiple signal classification (MUSIC) [1], estimation of signal parameters via rotational invariance techniques (ESPRIT) [2] and their variants [3,4]. Many of them work well when accurate estimates of the array output covariance matrix and the source number are available. In scenarios with sufficient array snapshots and a moderately high signal-to-noise ratio (SNR), the array output covariance matrix and the source number can be accurately estimated.
Recently, sparse DOA estimation methods based on sparse constructions of the array output model have become popular. They are applicable when the number of sources is unknown, and many of them are effective even with a limited number of data snapshots. In [5], sparse DOA estimation methods are categorized into three groups: on-grid, off-grid and grid-less. The on-grid methods, which are widely studied and straightforward to implement, assume that the true DOAs lie on a predefined grid [6-22]. The off-grid methods also use a prior grid but do not constrain the DOA estimates to be on this grid, at the price of introducing more unknown parameters to be estimated and complicating the algorithms [23-28]. The grid-less methods operate directly in the entire direction domain without a pre-specified grid but, currently, are developed mainly for linear arrays and incur a rather large computational burden [29-31]. Although the on-grid methods may suffer from grid mismatch, they are still attractive due to their easy accessibility in a wide range of applications (see, e.g., [28,32,33]). To alleviate the grid mismatch, methods for selecting an appropriate prior grid are proposed in [9,34].
The main contributions of this paper are summarized as follows:
1. Our method is a sparse on-grid method based on the penalized ML principle and is designed especially for CES random output signals. The Gaussian distribution and many heavy-tailed and light-tailed distributions are included in the class of CES distributions, and it is worth mentioning that heavy-tailed, non-Gaussian output signals are common in the field of signal processing [35,36].
2. The sparsity of the unknown sparse vector is enforced by penalizing its l1-norm. When the penalty parameter becomes zero, the proposed method reduces to a general ML DOA estimation method applicable to scenarios with CES random output signals.
3. Two penalized ML optimization problems are formulated for spatially uniform and non-uniform white sensor noise, respectively. Since it is difficult to solve the two non-convex penalized ML optimizations globally, two majorization-minimization (MM) algorithms with different iterative procedures are developed for seeking locally optimal solutions of each of them. The computational complexities of the two algorithms are discussed, and a way of selecting the penalty parameter is suggested.
4. The proposed methods are evaluated numerically in scenarios with Gaussian and non-Gaussian output signals. In particular, the performance gains originating from the added l1-norm penalty are numerically demonstrated.
The remainder of this paper is organized as follows. Section 2 introduces a sparse CES data model of the array output. In Section 3, a sparse penalized ML method is developed to estimate the DOAs. For solving the proposed l1-penalized ML optimizations, which are non-convex, algorithms in the MM framework are developed in Section 4. Section 5 numerically shows the performance of our method in Gaussian and non-Gaussian scenarios. Finally, some conclusions are given in Section 6.

Notations
The notation C^m (R^m) denotes the set of all m-dimensional complex- (real-) valued vectors, and C^{m×n} (R^{m×n}) denotes the set of all m × n complex- (real-) valued matrices. 1_p and 0_p are the p-dimensional vectors with all elements equal to 1 and 0, respectively. I_p is the p × p identity matrix. ‖·‖_1 and ‖·‖_2 denote the l1-norm and l2-norm of a vector, respectively. The superscripts (·)^T and (·)^H, respectively, denote the transpose and the conjugate transpose of a vector or matrix. The imaginary unit is denoted as ı, defined by ı² = −1.
For a vector x = [x_1, . . . , x_p]^T ∈ C^p, define the element-wise square root function sqrt(x) = [√|x_1|, . . . , √|x_p|]^T and the element-wise division operation x ⊘ y = [x_1/y_1, . . . , x_p/y_p]^T for a vector y = [y_1, . . . , y_p]^T ∈ C^p with non-zero elements. max(x) = max{x_1, . . . , x_p} denotes the maximum element of x, and [x; z] denotes the stacked vector of the column vectors x and z. Diag(x) denotes a square diagonal matrix with the elements of vector x on the main diagonal, and (Diag(x))^{-1} denotes the diagonal matrix with main diagonal elements 1/x_i (i = 1, . . . , p), taking 1/0 = ∞. For a square matrix X ∈ C^{p×p}, X_{i,j} denotes the (i, j)th entry of X, X > 0 (≥ 0) means that X is Hermitian and positive definite (semidefinite), tr(X) denotes the trace of X, and diag(X) denotes the column vector of the main diagonal elements of X.
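These conventions can be mirrored in NumPy. The following snippet is only a notation check; the vectors are arbitrary examples, not quantities from the paper:

```python
import numpy as np

# Illustrative check of the notation: sqrt(x), x ./ y, Diag(.) and diag(.).
x = np.array([4.0 + 3.0j, -2.0, 1.0])      # a vector in C^3; note |4+3i| = 5
y = np.array([2.0, 4.0, 5.0])

sqrt_x = np.sqrt(np.abs(x))                # sqrt(x) = [sqrt|x_1|, ..., sqrt|x_p|]^T
x_div_y = x / y                            # element-wise division x ./ y
D = np.diag(y)                             # Diag(y): vector -> diagonal matrix
d = np.diag(D)                             # diag(D): main diagonal -> vector
```

Note that `np.diag` plays both roles, Diag(·) and diag(·), depending on whether its argument is a vector or a matrix.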

Problem Formulation
Consider the problem of estimating the DOAs of k 0 narrow-band signals impinging on an array of m sensors.
Given a set of grid points θ = {θ_1, . . . , θ_k}, we assume that the true k_0 (k_0 ≪ k) DOAs, respectively denoted as ξ_1, . . . , ξ_{k_0}, take values in it. The array output measurement at the instant t, denoted as x(t) ∈ C^m, can be modeled as

x(t) = A(θ) s(t) + v(t), (2)

where
• A(θ) = [a(θ_1), . . . , a(θ_k)] ∈ C^{m×k} is the known array manifold matrix, with a(θ_i) being the steering vector corresponding to θ_i, i = 1, . . . , k;
• s(t) = [s_1(t), . . . , s_k(t)]^T ∈ C^k is the source signal vector at the time instant t, in which s_i(t) is the unknown random signal from a possible source at θ_i and is zero if θ_i is not in the true DOA set {ξ_1, . . . , ξ_{k_0}};
• v(t) = [v_1(t), . . . , v_m(t)]^T ∈ C^m is the noise vector impinging on the sensor array at the time instant t.
Some necessary statistical assumptions are made as follows:
• The possible source signals s_1(t), . . . , s_k(t) are uncorrelated and zero-mean at any time instant t.
• The n (n > m) snapshots x(1), . . . , x(n) of sensor array signals are independent and identically distributed from a CES distribution.
Note that the zero-mean assumptions above are common in the signal processing literature [5,35]. The CES distribution setting of the array output x(t) enables us to effectively process the Gaussian, heavy-tailed or light-tailed array snapshots, because the class of CES distributions [38] includes the Gaussian distribution, the t-distribution, the K-distribution and so on.
For simplicity of notation, we write A(θ) as A and a(θ_i) as a_i for i = 1, . . . , k in the following. Under the above assumptions, the array output x(t) in Equation (2) at any time instant t has mean zero and covariance matrix

R = A P A^H + Σ,

where P = Diag(p) is the (unknown) source signal covariance matrix with signal power vector p = [p_1, . . . , p_k]^T, and Σ = Diag(σ) is the (unknown) noise covariance matrix with the noise power vector σ = [σ_1², . . . , σ_m²]^T. The matrix R can be rewritten as

R = ∑_{i=1}^k p_i a_i a_i^H + Diag(σ).

For any i = 1, . . . , k, the signal power p_i = 0 if θ_i is not in the set of the true DOAs. Therefore, the true DOAs can be identified by the locations of nonzero elements of the power vector p. In the following, the DOA estimation problem is formulated as a problem of estimating the locations of nonzero elements of the power vector p.
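The covariance model above can be sketched numerically. The snippet below builds R = A Diag(p) A^H + Diag(σ) for a ULA whose steering vectors follow the form used later in Equation (74); the sensor count, grid, active powers, and noise levels are all illustrative assumptions:

```python
import numpy as np

def steering_matrix(grid_deg, m):
    """ULA manifold: a(theta) = [1, e^{i*pi*cos(theta)}, ..., e^{i*pi*(m-1)cos(theta)}]^T."""
    theta = np.deg2rad(np.asarray(grid_deg))
    return np.exp(1j * np.pi * np.outer(np.arange(m), np.cos(theta)))  # m x k

m = 8                                       # illustrative sensor count
grid = np.arange(0.0, 181.0, 1.0)           # 1-degree grid of candidate DOAs
A = steering_matrix(grid, m)

p = np.zeros(len(grid))
p[[60, 100]] = [1.0, 0.5]                   # sparse power vector: two active grid DOAs
sigma = np.full(m, 0.1)                     # noise powers (non-uniform noise would differ per sensor)

R = A @ np.diag(p) @ A.conj().T + np.diag(sigma)   # R = A Diag(p) A^H + Diag(sigma)
```

The locations of the nonzero entries of `p` are exactly what the sparse estimation methods below try to recover.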

Sparse DOA Estimation
The array output x(t) in Equation (2) is CES distributed with mean zero and covariance matrix R, and then the normalized random vector

y(t) = x(t) / ‖x(t)‖_2,

which actually refers to the direction (angle) of the array output vector x(t), has a complex angular central Gaussian (ACG) distribution [35] with probability density function proportional to det(R)^{-1} (y^H R^{-1} y)^{-m}. The negative log-likelihood of y(t) = x(t)/‖x(t)‖_2, t = 1, . . . , n, then becomes

L_0(R) = n log det(R) + m ∑_{t=1}^n log( y(t)^H R^{-1} y(t) ) + c_1,

where c_1 is a constant.
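As a sketch, the ACG negative log-likelihood can be computed directly from the normalized snapshots. The form used below, n log det R + m ∑_t log(y(t)^H R^{-1} y(t)) up to an additive constant, is an assumption consistent with the scale invariance L_0(aR) = L_0(R) stated in Equation (14):

```python
import numpy as np

def acg_negloglik(R, Y):
    """Negative ACG log-likelihood (up to a constant) of unit-norm snapshots.

    Y: m x n matrix whose columns are y(t) = x(t)/||x(t)||_2.
    Assumed form: n*log det(R) + m * sum_t log( y(t)^H R^{-1} y(t) ).
    """
    m, n = Y.shape
    Rinv = np.linalg.inv(R)
    quad = np.real(np.einsum('it,ij,jt->t', Y.conj(), Rinv, Y))  # y(t)^H R^{-1} y(t)
    sign, logdet = np.linalg.slogdet(R)
    return n * logdet + m * np.sum(np.log(quad))

rng = np.random.default_rng(0)
m, n = 4, 50
X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
Y = X / np.linalg.norm(X, axis=0)          # normalized snapshots y(t)
R = np.eye(m, dtype=complex)
L = acg_negloglik(R, Y)
```

A quick sanity check of this form is that scaling R by any a > 0 leaves the value unchanged, matching Equation (14).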

Spatially Non-Uniform White Noise
In the case of spatially non-uniform white sensor noise, not all noise variances σ_i², i = 1, . . . , m, are equal. Assuming σ_m² > 0, we denote r = [p^T, σ_1², . . . , σ_{m−1}²]^T / σ_m² ∈ R^{k+m−1} and

W(r) = B Diag(r) B^H + J_m = R / σ_m², (12)

where B ∈ C^{m×(k+m−1)} consists of the first (k + m − 1) columns of [A, I_m] and J_m is an m × m matrix with the (m, m)-entry being 1 and the other entries being 0. Since

L_0(aR) = L_0(R), ∀a > 0, (14)

and the locations of nonzero elements of the sparse vector r_{[1:k]} identify the true DOAs, we formulate the DOA estimation problem as solving the penalized likelihood optimization problem

arg min_{r∈Ω} L_0(W(r)) + λ ‖r_{[1:k]}‖_1, (15)

where Ω = {r ∈ R^{k+m−1} : r ≥ 0} and λ ≥ 0 is pre-specified. The l1-norm penalty term λ‖r_{[1:k]}‖_1 helps induce a sparse solution, and the penalty parameter λ controls the sparsity level [37].
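The reparameterization W(r) = B Diag(r) B^H + J_m can be checked numerically. The sketch below assumes the natural choice r = [p; σ_1², . . . , σ_{m−1}²]/σ_m² implied by W = R/σ_m²; the array and power values are illustrative:

```python
import numpy as np

def build_W(A, r):
    """W(r) = B Diag(r) B^H + J_m, with B the first k+m-1 columns of [A, I_m]."""
    m, k = A.shape
    B = np.hstack([A, np.eye(m)])[:, : k + m - 1]
    J = np.zeros((m, m))
    J[m - 1, m - 1] = 1.0                      # J_m: only the (m, m)-entry is 1
    return B @ np.diag(r) @ B.conj().T + J

# Illustrative check that W(r) = R / sigma_m^2 under this parameterization.
rng = np.random.default_rng(1)
m, k = 5, 12
A = np.exp(1j * np.pi * np.outer(np.arange(m), np.cos(np.linspace(0.0, np.pi, k))))
p = np.abs(rng.standard_normal(k))              # nonnegative source powers
sigma2 = np.abs(rng.standard_normal(m)) + 0.1   # non-uniform noise variances
R = A @ np.diag(p) @ A.conj().T + np.diag(sigma2)

r = np.concatenate([p, sigma2[:-1]]) / sigma2[-1]   # assumed r = [p; sigma^2_{1:m-1}] / sigma_m^2
W = build_W(A, r)
```

Dividing out σ_m² removes the scale ambiguity of Equation (14), which is exactly why the fixed matrix J_m lets λ control sparsity here (see Remark 1).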

Remark 1.
We explain why the DOAs are not estimated by solving the plausible l1-penalized ML optimization problem

arg min_{p, σ} L_0(R) + λ ‖p‖_1. (17)
Recalling Equation (6), we find that, for any λ_1 > 0 and λ_2 > 0,

L_0(R(p, σ)) + λ_1 ‖p‖_1 = L_0(R((λ_1/λ_2) p, (λ_1/λ_2) σ)) + λ_2 ‖(λ_1/λ_2) p‖_1.

Thus, [p; σ] is a locally optimal solution of Equation (17) with λ = λ_1 if and only if [(λ_1/λ_2) p; (λ_1/λ_2) σ] is a locally optimal solution of Equation (17) with λ = λ_2. This means that, if we estimate p by Equation (17), the parameter λ cannot work theoretically in adjusting the sparsity level of the estimate of p. Instead of considering Equation (17), we formulate the optimization in Equation (15) for the DOA estimation. In Equation (15), owing to the constant matrix J_m in Equation (12), different values of λ result in solutions with different sparsity levels.

Spatially Uniform White Noise
In the case of spatially uniform white sensor noise, σ_1 = · · · = σ_m = σ. Assuming σ² > 0, we denote q = p/σ² and

Q(q) = A Diag(q) A^H + I_m = R / σ².

Estimating the DOAs means identifying the locations of nonzero elements of the vector q. Considering Q = R/σ², we can estimate q through solving the penalized likelihood optimization problem

arg min_{q≥0} L_0(Q(q)) + λ ‖q‖_1, (21)

with λ ≥ 0, where the term λ‖q‖_1 plays the same role as the penalty term in Equation (15). Note that the number of unknown parameters to be estimated is k in the case of spatially uniform noise, in contrast to k + m − 1 in the case of spatially non-uniform noise.

DOA Estimation Algorithms
In this section, we provide methods to solve the optimization problems in Equations (15) and (21) with λ fixed. As Equations (15) and (21) are non-convex, it is generally hard to obtain their globally optimal solutions. Based on the MM framework [39,40], we develop algorithms to find locally optimal solutions of Equations (15) and (21).
A function g(x | x_0) is said to majorize f(x) at x_0 if f(x) ≤ g(x | x_0) for all x and f(x_0) = g(x_0 | x_0). In the MM framework, the problem arg min_x f(x) can be solved through iteratively solving x_{u+1} = arg min_x g(x | x_u).
When solving the problems in Equations (15) and (21) in the MM framework, majorizing functions can be constructed based on the following two inequalities. For any positive definite matrices U ∈ C^{m×m} and U_u ∈ C^{m×m} [40],

log det(U) ≤ log det(U_u) + tr(U_u^{-1}(U − U_u)), (22)

and, for any nonzero y ∈ C^m,

log(y^H U^{-1} y) ≤ log(y^H U_u^{-1} y) + (y^H U^{-1} y)/(y^H U_u^{-1} y) − 1, (23)

where both equalities are achieved at U = U_u.
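To make the MM recipe concrete, here is a small self-contained MM example on a toy problem, not the paper's optimization: minimizing f(x) = ∑_i |x − a_i| with the classical quadratic majorizer |x − a| ≤ (x − a)²/(2|x_u − a|) + |x_u − a|/2, whose surrogate minimizer is a weighted average:

```python
import numpy as np

# Toy MM illustration: each surrogate touches f at the current iterate x_u and
# upper-bounds f elsewhere, so minimizing the surrogate monotonically decreases f.
a = np.array([1.0, 2.0, 7.0, 9.0, 100.0])    # arbitrary sample points

def mm_median(a, x0=0.0, iters=100, eps=1e-12):
    x = x0
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(x - a), eps)   # majorizer weights at x_u
        x = np.sum(w * a) / np.sum(w)              # closed-form surrogate minimizer
    return x

x_hat = mm_median(a)
f = lambda x: np.sum(np.abs(x - a))
```

The iterates descend in f and approach the sample median (7 for this data), illustrating the monotone-descent property that Propositions 1-8 rely on.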

Algorithms for Spatially Non-Uniform White Noise
In this subsection, using two different majorizing functions of the objective function in Equation (15), we develop two different MM algorithms, named MM1 and MM2, to solve Equation (15).
Denote W_u = B Diag(r_u) B^H + J_m, where r_u ∈ Ω. Replacing the U and U_u in Equations (22) and (23) by W and W_u, respectively, we have, for any W > 0 and W_u > 0,

L_0(W) ≤ n tr(W_u^{-1} W) + tr(M_u W^{-1}) + c_2, (25)

where c_2 is a constant, the equality in Equation (25) is achieved at W = W_u, and

M_u = m ∑_{t=1}^n y(t) y(t)^H / ( y(t)^H W_u^{-1} y(t) ). (26)

Denote the sum of the first two terms on the right side of Equation (25) and the l1-norm penalty term in Equation (15) as g_1(r | r_u, λ); due to Equation (12), g_1(r | r_u, λ) can be rewritten, up to an additive constant, as

g_1(r | r_u, λ) = n w_u^T r + tr( M_u (B Diag(r) B^H + J_m)^{-1} ) + λ ‖r_{[1:k]}‖_1,

where w_u = diag(B^H W_u^{-1} B). From Equation (25), g_1(r | r_u, λ) + c_2 is found to be a majorizing convex function of the objective function in Equation (15). Based on the MM framework, the problem in Equation (15) can be solved by iteratively solving the convex optimization problem

arg min_{r ∈ Ω, W(r) > 0} g_1(r | r_u, λ). (30)

Proposition 1 (The MM1 algorithm for Equation (15)). The sequence {r_u} generated by

r_{u+1} = arg min_{r ∈ Ω} g_1(r | r_u, λ) (31)

with any initial value r_0 ∈ Ω converges to a locally optimal solution of the optimization problem in Equation (15).
Proof. Since g_1(r | r_u, λ) tends to +∞ as W goes to the boundary of the positive semidefinite cone, the optimization problems in Equations (30) and (31) are equivalent. The proposition then follows from the convergence property of the MM algorithm [40] and the fact that L_0(W) → +∞ with probability 1 as W tends to the boundary of the positive semidefinite cone [41].
Due to Equation (11), the initial value r_0 = [(r_1)_0, . . . , (r_{k+m−1})_0]^T required in the iteration in Equation (31) can be obtained by inserting the initial estimate of [p; σ] presented in [13]; specifically,

(r_i)_0 = b_i^H R̂ b_i / (b_i^H b_i)², i = 1, . . . , k + m − 1, (32)

where b_i is the ith column of the matrix B and R̂ = (1/n) ∑_{t=1}^n x(t) x(t)^H is the sample covariance matrix. To solve the optimization problem in Equation (31) globally, we develop two available solvers: the coordinate descent (CD) solver and the SPICE-like solver. Denote r = [r_1, . . . , r_{k+m−1}]^T in the following.

Proposition 2 (The CD solver for Equation (31)). The sequence of r generated by iterating the coordinate-wise updates in Equation (34) with any initial value r ≥ 0 converges to the globally optimal solution r_{u+1} of the problem in Equation (31), where δ(·) is the indicator function of the interval (0, ∞) and (w_i)_u is the ith element of the vector w_u.

Proof. By the general analysis of the CD method in [42], the convex problem in Equation (31) can be globally solved by cyclically minimizing g_1(r | r_u, λ) with respect to one element of r at a time, with the others fixed. Solving each of these scalar convex problems by the gradient method yields closed-form updates that are exactly those in Equation (34).
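The closed-form coordinate updates in Equation (34) depend on the structured matrix W(r); as a simplified illustration of the same idea, the sketch below runs cyclic coordinate descent on a generic nonnegativity-constrained convex quadratic plus l1 problem (all data here are synthetic assumptions, not the paper's quantities):

```python
import numpy as np

def cd_nonneg_l1(H, c, lam, iters=500):
    """Cyclic CD for min_{r >= 0} 0.5 r^T H r + c^T r + lam * sum(r), H positive definite.

    Each scalar subproblem has the closed-form minimizer
    r_i = max(0, -(c_i + lam + sum_{j != i} H_ij r_j) / H_ii).
    """
    r = np.zeros(len(c))
    for _ in range(iters):
        for i in range(len(c)):
            rest = H[i] @ r - H[i, i] * r[i]        # off-diagonal contribution
            r[i] = max(0.0, -(c[i] + lam + rest) / H[i, i])
    return r

rng = np.random.default_rng(2)
G = rng.standard_normal((6, 4))
H = G.T @ G + 0.1 * np.eye(4)      # synthetic positive definite Hessian
c = rng.standard_normal(4)
lam = 0.5
r_hat = cd_nonneg_l1(H, c, lam)
```

At convergence the iterate satisfies the KKT conditions of the convex subproblem: coordinates pushed to zero are exactly those whose (penalized) gradient is nonnegative there, mirroring how the CD solver forces weakly correlated powers to zero.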
As the solver in Proposition 2 is derived by the CD method, we call it the CD solver. The MM1 algorithm in Proposition 1 with the CD solver is specially named the MM1-CD algorithm.
For clarity, detailed steps of the MM1-CD algorithm for Equation (15) are presented in Algorithm 1. Note that, in the CD solver, for i = 1, . . . , k, the quantity β_i can be interpreted as the correlation between the signal from a possible source at the grid point θ_i and the array responses, where a_i is the ith column of the matrix A. From the updates in Equation (34), we find that the powers of the assumed signals that are less correlated with the array responses are more likely to be forced to zero.
Proposition 3 (The SPICE-like solver for Equation (31)). The sequence of r generated by iterating Equation (45) with any initial value r > 0 converges to the globally optimal solution r_{u+1} of the problem in Equation (31).

Algorithm 1
The MM1-CD algorithm for spatially non-uniform white noise.
1: Give r_0 and the tolerance ε,
2: u = 0,
3: repeat
4: Calculate M_u and w_u,
5: …
6: repeat
7: …
8: …
9: for i = 1 : k + m − 1 do
10: …
11: Calculate α_i, β_i and γ_i,
12: if β_i ≤ α_i then
13: r_i = 0,
14: else
15: update r_i by the closed form in Equation (34),
16: end if
17: end for
18: until the inner CD loop converges,
19: until the outer MM loop converges.

Proof of Proposition 3. It is obvious that Equation (31) and the SPICE criterion in [15] have similar forms. By the same way as the SPICE criterion is analyzed in [15], we find that the globally optimal solution of the problem in Equation (31) is identical to the minimizer r of the convex optimization problem in Equation (46), in which X_u = [x̃(1), . . . , x̃(n)] with x̃(t) = sqrt( m / (y(t)^H W_u^{-1} y(t)) ) y(t), so that X_u X_u^H = M_u. For a fixed r, the matrix E ∈ C^{(k+m)×n} minimizing Equation (46) can be verified to be the one in Equation (48) [15], and, for a fixed E, the vector r minimizing Equation (46) is readily given by Equation (49). The sequences of E and r, generated by alternately iterating Equations (48) and (49) from any r > 0, converge to the globally optimal solution of the convex problem in Equation (46) [14,15].
Due to X u X H u = M u , iterating Equation (49) is just iterating Equation (45). Thus, the sequence of r generated by Equation (45) converges to the minimizer r of Equation (46).
The solver in Proposition 3 is named the SPICE-like solver, and the MM1 algorithm in Proposition 1 with it is called the MM1-SPICE algorithm. Algorithm 2 summarily illustrates the MM1-SPICE algorithm for Equation (15).
From Propositions 1-3, it is seen that the proposed MM1 algorithm works by an inner-outer iteration loop. The MM1-CD and MM1-SPICE algorithms share an identical outer loop but have different nested inner loops. The relationship and differences between the MM1-CD and the MM1-SPICE algorithms are discussed in Section 4.3.

MM2 Algorithm
When r_u > 0, it is found from [36] that, for any r > 0,

tr( M_u W(r)^{-1} ) ≤ ∑_{i=1}^{k+m−1} (r_i)_u² [diag(C_u^H M_u C_u)]_i / r_i + c_3, (50)

where C_u = W_u^{-1} B and c_3 is a constant. From Equation (50), we have

L_0(W(r)) ≤ n w_u^T r + ∑_{i=1}^{k+m−1} (r_i)_u² [diag(C_u^H M_u C_u)]_i / r_i + c_2, (53)

with the equality achieved at r = r_u. The inequality in Equation (53) is also valid for any r ∈ Ω due to W > 0. Denote

g_2(r | r_u, λ) = n w_u^T r + ∑_{i=1}^{k+m−1} (r_i)_u² [diag(C_u^H M_u C_u)]_i / r_i + λ ‖r_{[1:k]}‖_1. (54)

It is clear from Equation (53) that, when r_u > 0, for any r ∈ Ω,

L_0(W(r)) + λ ‖r_{[1:k]}‖_1 ≤ g_1(r | r_u, λ) + c_2 ≤ g_2(r | r_u, λ) + c_2, (55)

where the equalities are achieved at r = r_u. Therefore, at any r_u > 0, g_2(r | r_u, λ) + c_2 majorizes the objective function in Equation (15) for r ∈ Ω.
Proposition 4 (The MM2 algorithm for Equation (15)). The sequence {r_u} generated by the iteration in Equation (56) with any initial value r_0 > 0 converges to a locally optimal solution of the problem in Equation (15).

Proof.
Through the convergence analysis in [36], we have that, although Equation (55) is valid only when r_u > 0, the sequence {r_u} generated with any initial value r_0 > 0 converges to a locally optimal solution of the problem in Equation (15). It is worth mentioning that the elements of the coefficient vector diag(C_u^H M_u C_u) in Equation (54) are positive. By solving the optimization problem in Equation (57) with the gradient method, the iteration procedure in Equation (57) is found to be exactly that in Equation (56). The r_0 involved in the iteration procedure in Equation (56) can be the one given in Equation (32). Algorithm 3 gives the detailed steps of the MM2 algorithm for the problem in Equation (15).

DOA Estimation for Spatially Uniform White Noise
The DOA estimation in the case of spatially uniform white noise amounts to solving the optimization problem in Equation (21). The problems in Equations (21) and (15) have similar forms, but Equation (21) involves a smaller number of unknown parameters. In the same way as the problem in Equation (15) is analyzed in Section 4.1, we can naturally derive the algorithms to solve Equation (21). The algorithms for Equation (21) are still named MM1 (including MM1-CD and MM1-SPICE) and MM2.

MM1 Algorithm
Proposition 5 (The MM1 algorithm for Equation (21)). Denote the majorizing function of the objective in Equation (21) analogously to g_1 in Section 4.1; then the sequence {q_u} generated by iteratively solving Equation (62) with any initial value q ≥ 0 converges to a locally optimal solution of the problem in Equation (21).
To solve the optimization problem in Equation (62), using the same method as for Equation (31), we offer two different iterative solvers.

Proposition 6 (The CD solver for Equation (62)). The sequence of q generated by iterating the coordinate-wise closed-form updates, in which (e_j)_u is the jth element of the vector e_u and a_j is the jth column of the matrix A, with any initial value q ≥ 0 converges to the globally optimal solution of the problem in Equation (62).

Proposition 6 introduces the nested CD inner loop of the MM1 algorithm for the problem in Equation (21), and can be proven similarly to Proposition 2.

Proposition 7 (The SPICE-like solver for Equation (62)). The sequence of q generated by the SPICE-like iteration with any initial value q > 0 converges to the globally optimal solution of the problem in Equation (62).

The iterative procedure in Proposition 7 is the nested SPICE-like inner loop of the MM1 algorithm for Equation (21), and can be proven similarly to Proposition 3.
The MM procedure in Proposition 5 with the CD nested loop in Proposition 6 is the MM1-CD algorithm for Equation (21). The MM1-SPICE algorithm for Equation (21) is the MM procedure in Proposition 5 with the SPICE-like nested loop in Proposition 7.

MM2 Algorithm
Proposition 8 (The MM2 algorithm for Equation (21)). The sequence {q_u} generated by the MM2 iteration, i.e., the counterpart of Equation (56) with Q(q) in place of W(r), with any initial value q > 0 converges to a locally optimal solution of the problem in Equation (21).

Discussions on the MM1 and MM2 Algorithms
The following discussion focuses on the case of spatially non-uniform noise but also applies to the case of spatially uniform noise.
Both the MM1 and MM2 algorithms decrease the objective function in Equation (15) at each MM iteration and converge locally, though they adopt different majorizing functions. Compared to the MM2 algorithm, the MM1 algorithm has a majorizing function closer to the objective function in Equation (15), due to Equation (55). The computational burdens of the two algorithms are dominated by matrix inversion operations.
Although the MM1-CD and MM1-SPICE algorithms have different nested inner iteration procedures, they theoretically converge to the same local solution because their outer MM iteration procedures are both given by Equation (31). Each nested inner iteration of the MM1-CD algorithm, detailed in Steps 9-17 of Algorithm 1, requires k + m − 1 matrix inversions. In each nested inner iteration of the MM1-SPICE algorithm, presented in Steps 9-10 of Algorithm 2, only one matrix inversion is required.
It is somewhat interesting that the iteration of the MM2 algorithm (see Equation (56)) and the nested inner iteration of the MM1-SPICE algorithm (see Equation (45)) have similar forms. In each iteration of the MM2 algorithm, only one matrix inversion is needed.

Selection of Penalty Parameter
The penalty parameter λ in Equations (15) and (21) affects the sparsity levels of the estimates of r and q. A modified Bayesian information criterion (BIC) [37], which is common and statistically sound, is provided here to choose an appropriate λ. Let r̂_λ and q̂_λ be the solutions of the problems in Equations (15) and (21) with a fixed λ, respectively, and denote Ŵ_λ = B Diag(r̂_λ) B^H + J_m and Q̂_λ = A Diag(q̂_λ) A^H + I_m. The appropriate λ for spatially non-uniform noise and uniform noise are chosen as

arg min_λ 2 L_0(Ŵ_λ) + ẑ_{λ,1} ln n (72)

and

arg min_λ 2 L_0(Q̂_λ) + ẑ_{λ,2} ln n, (73)

respectively, where ẑ_{λ,1} and ẑ_{λ,2} are the numbers of nonzero elements of the vectors (r̂_λ)_{[1:k]} and q̂_λ, respectively. Note that the elements of (r̂_λ)_{[1:k]} and q̂_λ can be treated as 0 if they are smaller than certain thresholds, e.g., ε max((r̂_λ)_{[1:k]}) and ε max(q̂_λ) with a very small ε > 0.
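A sketch of the selection rule follows, assuming the familiar modified-BIC score 2·L_0 + ẑ·ln n; the candidate (L_0, ẑ) values below are hypothetical numbers standing in for the output of some penalized-ML solver:

```python
import numpy as np

def modified_bic(negloglik, n_nonzero, n):
    """Assumed modified-BIC score: 2 * L0(W_hat_lambda) + z_hat * ln(n)."""
    return 2.0 * negloglik + n_nonzero * np.log(n)

def count_nonzero(r, eps=1e-3):
    """Treat entries below eps * max(r) as zero, as suggested in the text."""
    return int(np.sum(np.asarray(r) > eps * np.max(r)))

# Hypothetical candidate grid: lambda -> (L0 value, number of nonzero elements).
n = 100
candidates = {0.1: (520.0, 6), 1.0: (523.0, 2), 10.0: (560.0, 1)}
best = min(candidates, key=lambda lam: modified_bic(*candidates[lam], n))
```

The ẑ·ln n term penalizes dense solutions, so a slightly worse fit with far fewer active grid points (λ = 1.0 in this toy example) wins over an overfitted dense solution.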
Notice that L_0(Ŵ_λ) and L_0(Q̂_λ) in Equations (72) and (73) are substitutes for the marginal likelihood, also called the Bayesian evidence, required in the BIC criterion [43]. When modeling in the Bayesian framework, the marginal likelihood usually cannot be calculated analytically, but it can be approximated by several computational methods [44-46].

Numerical Experiments
In this section, we numerically show the performance of the proposed methods. Consider a uniform linear array (ULA): for each j = 1, . . . , k_0, the steering vector corresponding to the DOA ξ_j is

a(ξ_j) = [1, exp(ıπ cos(ξ_j)), . . . , exp(ıπ(m − 1) cos(ξ_j))]^T. (74)

The array output data x(t) are generated from the model x(t) = Ā s̄(t) + v(t), and both the source signals s̄_1(t), . . . , s̄_{k_0}(t) and the observation noise v(t) are temporally independent. The SNR is defined as

SNR = 10 log_10 ( tr(Ā P̄ Ā^H) / tr(V) ),

where Ā = [a(ξ_1), . . . , a(ξ_{k_0})], P̄ is the covariance matrix of s̄(t) = [s̄_1(t), . . . , s̄_{k_0}(t)]^T, and V is the covariance matrix of v(t). The root mean-square error (RMSE) of the DOA estimates is employed to evaluate the estimation performance, which is approximated by R = 500 Monte Carlo runs as

RMSE = sqrt( (1/(R k_0)) ∑_{i=1}^R ∑_{j=1}^{k_0} (ξ̂_{j,i} − ξ_j)² ),

where ξ̂_{j,i} is the estimate of ξ_j in the ith Monte Carlo run. In the following experiments, we applied the proposed methods as well as the SPICE [13,15] and LIKES [15] methods to estimate the DOAs. We set the grid points θ = {0°, 1°, . . . , 180°} and the convergence tolerance ε = 10^{-8}. All methods were coded in MATLAB and executed on a workstation with two 2.10 GHz Intel Xeon CPUs.
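Equation (74) and the RMSE criterion translate directly to code; the snippet below is a sketch with toy estimates (the true DOAs 60° and 90° and the estimate values are illustrative, not the paper's experimental settings):

```python
import numpy as np

def steering(xi_deg, m):
    """Equation (74): a(xi) = [1, exp(i*pi*cos xi), ..., exp(i*pi*(m-1)*cos xi)]^T."""
    xi = np.deg2rad(xi_deg)
    return np.exp(1j * np.pi * np.arange(m) * np.cos(xi))

def rmse_deg(estimates, truth):
    """RMSE over Monte Carlo runs; `estimates` has shape (R, k0)."""
    estimates = np.asarray(estimates, dtype=float)
    return np.sqrt(np.mean((estimates - np.asarray(truth, dtype=float)) ** 2))

a = steering(90.0, 15)                              # broadside: cos(90 deg) = 0, so a is all ones
est = [[59.0, 91.0], [60.0, 90.0], [61.0, 89.0]]    # toy estimates over 3 runs
err = rmse_deg(est, [60.0, 90.0])
```

Note the mean in `rmse_deg` averages over both the runs and the k_0 sources, matching the 1/(R k_0) normalization above.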

Experiment 1
Consider a scenario with Gaussian source signals and noise:
• The number of sensors in the array is m = 15, and the number of snapshots is n = 100.
• The noise v(t) is zero-mean circular complex Gaussian with covariance matrix V = σ² I_m.

Experimental Results
In Experiment 1, the MM1-CD, MM1-SPICE and MM2 algorithms with λ = 0 were first applied. Table 1 reports the iteration numbers n_1, n_2, n_3, the computational time τ (in seconds), and the DOA estimation RMSEs (in degrees). Specifically, n_1 is the total number of iterations, n_2 is the number of iterations in the outer loop, and n_3 is the average number of iterations in a nested inner loop. Except for the RMSEs, each value in Table 1 is the average of the results of 500 Monte Carlo runs. Note that, even though n_1 = n_2 × n_3 in a single Monte Carlo run, n_1 is not exactly equal to n_2 × n_3 in Table 1 because the table reports averages over 500 runs. As shown in Table 1, the MM1-CD and MM1-SPICE algorithms had similar RMSEs, and the MM1-CD algorithm took much more time than the MM1-SPICE algorithm. Considering that the MM1-CD and MM1-SPICE algorithms theoretically converge to the identical local solution, we believe that they will always have similar RMSEs. Moreover, the MM2 algorithm had the smallest iteration numbers, the least computational time and satisfactory RMSEs. In the following, we present the DOA estimation performance only of the MM1-SPICE and MM2 algorithms. Note that the MM1-SPICE and MM2 algorithms may converge to different local solutions, and they are hence referred to as two different estimation methods hereinafter. Specifically, we compared the following six methods: 1. LIKES: the ML method proposed in [15] under the assumption that the array output x(t) is Gaussian distributed. 2. SPICE: a sparse covariance-based estimation method proposed in [13] with no distribution assumption. 3.-6. MM1-SPICE-0, MM2-0, MM1-SPICE-P and MM2-P: the proposed MM1-SPICE and MM2 methods with λ = 0 (un-penalized) and with λ selected by the modified BIC (penalized), respectively. It is worth noting that, in Experiments 1-3, the standard MUSIC algorithm was found to be almost ineffective; thus, we do not illustrate it.
The RMSE comparisons of the above six methods are illustrated in Figures 1-3 for Experiments 1-3 with different SNRs, respectively. Figures 4 and 5 show, for Experiment 2, the DOA estimation RMSEs of the six methods versus the number of snapshots and the angle separation, respectively. In Figure 1, we can see that the MM1-SPICE-0 and MM2-0 methods were comparable to the LIKES and SPICE methods in the Gaussian cases. As shown in Figures 2-5, in the scenarios of non-Gaussian random source signals and heavy-tailed random noise, the MM1-SPICE-0 and MM2-0 methods performed much better than the LIKES and SPICE methods. In other words, the ML methods designed for the CES outputs were effective in the simulated Gaussian and non-Gaussian scenarios.
As shown in Figures 1-5, the penalized ML methods, i.e., the MM1-SPICE-P and MM2-P methods, had smaller RMSEs than the other four methods. As shown in Figure 4, as the number of snapshots increased, the performance of the MM1-SPICE-P and MM2-P methods improved, while the performance of the MM1-SPICE-0 and MM2-0 methods remained virtually unchanged. Figure 5 shows that, as ∆ξ increased from 2° to 6°, the performance of the MM1-SPICE-P and MM2-P methods became better, whereas, when ∆ξ was larger than 4°, the performance of the MM1-SPICE-0 and MM2-0 methods no longer improved. As shown in Figures 1-3, unsurprisingly, the performance of all six methods improved as the SNR increased.
To illustrate the difference between the penalized and un-penalized ML methods, we evaluated the normalized spectrum (NS) of the estimates r̂ and q̂ of r and q, i.e., each estimated power vector normalized by its maximum element.

Conclusions
This paper provides a penalized ML method for DOA estimation under the assumption that the array output has a CES distribution. Two MM algorithms, named MM1 and MM2, are developed for solving the non-convex penalized ML optimization problems for spatially uniform and non-uniform noise. They converge locally with different rates. The numerical experiments showed that the MM2 algorithm ran faster than, and performed as well as, the MM1 algorithm when estimating the DOAs. Their rates of convergence will be further explored theoretically in the future.
It is worth mentioning that, in the numerical simulations, the proposed l1-norm penalized likelihood method effectively estimated the DOAs, although the values of the nonzero elements of r or p were not estimated very accurately. By replacing the l1-norm penalty with other proper penalties (e.g., the smoothly clipped absolute deviation (SCAD) [47], the adaptive LASSO [48], or the lp-norm penalty (0 ≤ p < 1)), more accurate estimates of the unknown parameters may be obtained in the future.
If the l 1 -norm penalty is replaced by the adaptive LASSO penalty, then the algorithms proposed in this paper can be applied almost without modification because the adaptive LASSO penalty is a weighted l 1 -norm penalty. When the non-convex SCAD or the l p -norm penalty (0 ≤ p < 1) is employed, the convex majorizing functions given in [40,49] can be exploited based on the MM framework, and then the algorithms in this paper with minor modifications are also applicable.
When we have some informative prior knowledge on the directions of the source signals, the problem of estimating the DOAs can be formulated as maximizing a sparse posterior likelihood from the Bayesian perspective. The sparse Bayesian method of estimating DOAs from CES array outputs is interesting and worth studying, though how to formulate and solve a sparse posterior likelihood optimization and how to perform model selection remain challenging. The BIC criterion can be used for the model selection, in which the Bayesian evidence, which is difficult to derive analytically, can be approximated by the methods in [44-46]. More research along this line will be done in the future.

Conflicts of Interest:
The authors declare no conflicts of interest.