Direction of Arrival Estimation in Elliptical Models via Sparse Penalized Likelihood Approach

Chen Chen; Jie Zhou; Mengjiao Tang

doi:10.3390/s19102356

,

and

¹

College of Mathematics, Sichuan University, Chengdu 610064, China

²

Center for Information Engineering Science Research, School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China

^*

Author to whom correspondence should be addressed.

Sensors2019, 19(10), 2356;https://doi.org/10.3390/s19102356

This article belongs to the Section Physical Sensors

Version Notes

Order Reprints

Abstract

In this paper, an

l_{1}

-penalized maximum likelihood (ML) approach is developed for estimating the directions of arrival (DOAs) of source signals from the complex elliptically symmetric (CES) array outputs. This approach employs the

l_{1}

-norm penalty to exploit the sparsity of the gridded directions, and the CES distribution setting has a merit of robustness to the uncertainty of the distribution of array output. To solve the constructed non-convex penalized ML optimization for spatially either uniform or non-uniform sensor noise, two majorization-minimization (MM) algorithms based on different majorizing functions are developed. The computational complexities of the above two algorithms are analyzed. A modified Bayesian information criterion (BIC) is provided for selecting an appropriate penalty parameter. The effectiveness and superiority of the proposed methods in producing high DOA estimation accuracy are shown in numerical experiments.

Keywords:

direction of arrival (DOA); complex elliptically symmetric (CES) distributions; sparse penalized likelihood method; majorization-minimization (MM) algorithm

1. Introduction

Estimating the directions of arrival (DOAs) of a number of far-field narrow-band source signals is an important problem in signal processing. Many DOA estimation methods were proposed early on, such as multiple signal classification (MUSIC) [1], estimation of signal parameters via rotation invariance techniques (ESPRIT) [2] and their variants [3,4]. Many of them work well when having accurate estimates of the array output covariance matrix and source number. In scenarios with sufficient array snapshots and a moderately high signal-to-noise (SNR), the array output covariance matrix and source number can be accurately estimated.

Recently, some sparse DOA estimation methods are popularly proposed based on sparse constructions of the array output model. They are applicable if the number of sources is unknown and many of them are effective when with a limited number of data snapshots. In [5], the sparse DOA estimation methods are categorized into three groups: the on-grid, the off-grid and the grid-less. The on-grid method, which is widely studied and straightforward to implement, assumes that the true DOAs are on a predefined grid [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]. The off-grid method also uses a prior grid but does not constrain the DOA estimates to be on this grid, while it introduces more unknown parameters to be estimated and complicates the algorithm [23,24,25,26,27,28]. The grid-less method directly operates in the entire direction domain without a pre-specified grid but, currently, is developed mainly for linear array and encounters rather large computational burden [29,30,31]. Although the on-grid methods may induce the grid mismatch, it is still attractive due to its easy accessibility in a wide range of applications (see, e.g., [28,32,33]). To alleviate the grid mismatch, methods for selecting an appropriate prior grid are proposed in [9,34].

During the past two decades, various on-grid sparse techniques, such as linear least-square, covariance-based and maximum-likelihood (ML) type methods, were researched. The linear least-square methods [6,9,11,12], minimizing the

l_{2}

-norm of the noise residual of the deterministic model, enforce the sparsity by constraining the

l_{p}

-norm of the signal vector for

p \in [0, 1]

(note that

l_{p}

-norm for

p \in [0, 1)

is defined as the same form as the one for

p \geq 1

but is not truly a norm in mathematics). The covariance-based methods, such as the sparse iterative covariance-based estimation (SPICE) method [13,14] and its variants [15,16], and those proposed in [17,18,19,20,21,22], are derived from different covariance-fitting criteria and use the

l_{p}

-norm penalty to enforce the sparsity. The ML type methods, including the sparse Bayesian learning (SBL) methods [7,8,10] and the likelihood-based estimation of sparse parameters (LIKES) method [15,16], are deduced under the assumption that the array output signal is multivariate Gaussian distributed. The SBL methods use different prior distributions to model the sparsity. The LIKES method appears to be not explicitly utilizing the sparsity, but provides sparse estimates.

The ML principle is generally believed to be statistically more sound than the covariance-fitting principle [15]. The existing on-grid ML methods are developed on the Gaussian distribution assumption that are not satisfied in many signal processing practical applications. The complex elliptically symmetric (CES) distributions, containing the t-distribution, the K-distribution, the Gaussian distribution and so on, can be used to characterise the Gaussian and non-Gaussian random variables. Particularly, the heavy-tailed data, which usually appear in the field of radar and array signal processing [35,36], can be modeled by some CES distributions.

The

l_{p}

-norm penalty with

p \in [0, 1]

is well-known for its contributions in deriving sparse solutions [37]. In particular, the convex

l_{1}

-norm penalty, also known as the least absolute shrinkage and selection operator (LASSO) penalty, is easy to handle and widely used in various applications. Generally, a sparse and accurate estimate of the unknown sparse parameter can be expected when minimizing the negative likelihood plus an

l_{p}

-norm penalty.

For estimating the DOAs from the CES distributed array outputs, we provide an

l_{1}

-norm penalized ML method based on a sparse reconstruction of the array output covariance matrix. The characteristics and advantages of our method are explained as follows:

Our method is a sparse on-grid method based on the penalized ML principle, and is designed especially for the CES random output signals. The Gaussian distribution and many heavy-tailed and light-tailed distributions are included in the class of the CES distributions, and it is worth mentioning that the heavy-tailed output signals that are non-Gaussian are common in the field of signal processing [35,36].
The sparsity of the unknown sparse vector is enforced by penalizing its $l_{1}$ -norm. When the penalty parameter becomes zero, the proposed method is actually the general ML DOA estimation method that is applicable to the scenarios with CES random output signals.
Two penalized ML optimization problems are formulated for spatially uniform and non-uniform white sensor noise, respectively. Since it is difficult to solve the two non-convex penalized ML optimizations globally, two majorization-minimization (MM) algorithms having different iterative procedures are developed for seeking the optimal solutions locally for each of them. Some discussions on the computational complexities of the above two algorithms are provided. In addition, the optimal penalty parameter is suggested.
The proposed methods are evaluated numerically in scenarios with Gaussian and non-Gaussian output signals. Particularly, the performance gains originated from the added $l_{1}$ -norm penalty are numerically demonstrated.

The remainder of this paper is organized as follows. Section 2 introduces a sparse CES data model of the array output. In Section 3, a sparse penalized ML method is developed to estimate the DOAs. For solving the proposed

l_{1}

-penalized ML optimizations that are non-convex, algorithms in the MM framework are developed in Section 4. Section 5 numerically shows the performance of our method in Gaussian and non-Gaussian scenarios. Finally, some conclusions are given in Section 6.

Notations

The notation

C^{m} (R^{m})

denotes the set of all m-dimensional complex- (real-) valued vectors, and

C^{m \times n} (R^{m \times n})

denotes the set of all

m \times n

complex- (real-) valued matrices.

1_{p}

and

0_{p}

are the p-dimensional vectors with all elements equal to 1 and 0, respectively.

I_{p}

is the

p \times p

identity matrix.

{∥ \cdot ∥}_{1}

and

{∥ \cdot ∥}_{2}

denote the

l_{1}

-norm and

l_{2}

-norm of a vector, respectively. The superscripts

{(\cdot)}^{T}

and

{(\cdot)}^{H}

, respectively, denote the transpose and the conjugate transpose of a vector or matrix. The imaginary unit is denoted as ı defined by

ı^{2} = - 1

.

For a vector

x = {[x_{1}, \dots, x_{p}]}^{T} \in C^{p}

, define element-wise square root function

sqrt (x) = {[\sqrt{| x_{1} |}, \dots, \sqrt{| x_{p} |}]}^{T}

and element-wise division operation

x ⊘ y = {[x_{1} / y_{1}, \dots, x_{p} / y_{p}]}^{T}

for a vector

y = {[y_{1}, \dots, y_{p}]}^{T} \in C^{p}

with non-zero elements.

max (x) = max {x_{1}, \dots, x_{p}}

,

x_{[a : b]} = {[x_{a}, \dots, x_{b}]}^{T}

for

1 \leq a \leq b \leq p

,

x > 0 (\geq 0)

means

x_{i} > 0 (\geq 0)

for

i = 1, \dots, p

,

[x; z]

denotes the stacked vector of the column vectors

x

and

z

.

Diag (x)

denotes a square diagonal matrix with the elements of vector

x

on the main diagonal, and

{(Diag (x))}^{- 1}

denotes the diagonal matrix with main diagonal elements

1 / x_{i} (i = 1, \dots, p)

by taking

1 / 0 = \infty

.

For a square matrix

X \in C^{p \times p}

,

X_{i, j}

denotes the

(i, j)

th entry of

X

,

X > 0 (\geq 0)

means that

X

is Hermitian and positive definite (semidefinite),

tr (X)

denotes the trace of

X

, and

diag (X)

denotes a column vector of the main diagonal elements of

X

.

2. Problem Formulation

Consider the problem of estimating the DOAs of

k_{0}

narrow-band signals impinging on an array of m sensors.

Given a set of grid points

{θ_{1}, \dots, θ_{k}},

(1)

we assume that the true

k_{0}

(

k_{0} ≪ k

) DOAs, respectively, denoted as

ξ_{1}, \dots, ξ_{k_{0}}

, take values in it. The array output measurement at the instant t, denoted as

x (t) \in C^{m}

, can be modeled as

x (t) = A (θ) s (t) + v (t),

(2)

where

$θ = {[θ_{1}, \dots, θ_{k}]}^{T}$ ,
$A (θ) = [a (θ_{1}), \dots, a (θ_{k})] \in C^{m \times k}$ is the known array manifold matrix with $a (θ_{i})$ being the steering vector corresponding to $θ_{i}$ , $i = 1, \dots, k$ ;
$s (t) = {[s_{1} (t), \dots, s_{k} (t)]}^{T} \in C^{k}$ is the source signal vector at the time instant t, in which $s_{i} (t)$ is the unknown random signal from a possible source at $θ_{i}$ and then is zero if $θ_{i}$ is not in the true DOA set ${ξ_{1}, \dots, ξ_{k_{0}}}$ , $i = 1, \dots, k$ ;
$v (t) = {[v_{1} (t), \dots, v_{m} (t)]}^{T} \in C^{m}$ is the noise vector impinging on the sensor array at the time instant t.

Some necessary statistical assumptions are made as follows:

The possible source signals $s_{1} (t), \dots, s_{k} (t)$ are uncorrelated and zero-mean at any time instant t.
The noise components $v_{1} (t), \dots, v_{m} (t)$ are uncorrelated, zero-mean, and independent of $s_{1} (t), \dots, s_{k} (t)$ for any time instant t.
The $n (n > m)$ snapshots $x (1), \dots, x (n)$ of sensor array signals are independent and identically distributed from a CES distribution.

Note that the zero-mean assumptions above are common in the signal processing literature [5,35]. The CES distribution setting of the array output

x (t)

enables us to effectively process the Gaussian, heavy-tailed or light-tailed array snapshots, because the class of CES distributions [38] includes the Gaussian distribution, the t-distribution, the K-distribution and so on.

For the simplicity of the notations, we denote

A (θ)

as

A

and

a (θ_{i})

as

a_{i}

for

i = 1, \dots, k

in the following. Under the above assumptions, we can find that the array output

x (t)

in Equation (2) at any time instant t has mean zero and covariance matrix

\begin{matrix} R = E [x (t) x {(t)}^{H}] = A P A^{H} + V, \end{matrix}

(3)

where

\begin{matrix} P = E [s (t) s {(t)}^{H}] = Diag (p) \end{matrix}

(4)

is the (unknown) source signal covariance matrix with signal power vector

p = {[p_{1}, \dots, p_{k}]}^{T}

, and

\begin{matrix} V = E [v (t) v {(t)}^{H}] = Diag (σ) \end{matrix}

(5)

is the (unknown) noise covariance matrix with the noise power vector

σ = {[σ_{1}^{2}, \dots, σ_{m}^{2}]}^{T}

. The matrix

R

can be rewritten as

\begin{matrix} R = [A, I_{m}] Diag ([p; σ]) {[A, I_{m}]}^{H} . \end{matrix}

(6)

For any

i = 1, \dots, k

, the signal power

p_{i} = 0

if

θ_{i}

is not in the set of the true DOAs. Therefore, the true DOAs can be identified by the locations of nonzero elements of the power vector

p

. In the following, the DOA estimation problem is formulated as a problem of estimating the locations of nonzero elements of the power vector

p

.

3. Sparse DOA Estimation

The array output

x (t)

in Equation (2) is CES distributed with mean zero and covariance matrix

R

, and then the normalized random vector

\begin{matrix} y (t) ≜ x (t) / {∥ x (t) ∥}_{2}, \end{matrix}

(7)

which actually refers to the angle of the array output vector

x (t)

, has a complex angular central Gaussian (ACG) distribution [35] with the probability density function

\begin{matrix} p (y (t); R) \propto det {(R)}^{- 1} {(y {(t)}^{H} R^{- 1} y (t))}^{- m} . \end{matrix}

(8)

Denote

\begin{matrix} L_{0} (R) = log det (R) + \frac{m}{n} \sum_{t = 1}^{n} log (x {(t)}^{H} R^{- 1} x (t)), \end{matrix}

(9)

and then the negative log-likelihood of

y (t) = x (t) / {∥ x (t) ∥}_{2}, t = 1, \dots, n

, becomes

\begin{matrix} n L_{0} (R) - m \sum_{t = 1}^{n} {log (∥ x (t) ∥}_{2}^{2}) + c_{1}, \end{matrix}

(10)

where

c_{1}

is a constant.

3.1. Spatially Non-Uniform White Noise

In the case of spatially non-uniform white sensor noise, not all noise variances

σ_{i}^{2}

,

i = 1, \dots, m

, are equal. Assuming

σ_{m}^{2} > 0

, we denote

\begin{matrix} r & = [p; σ_{[1 : m - 1]}] / σ_{m}^{2}, \end{matrix}

(11)

\begin{matrix} W & = B Diag (r) B^{H} + J_{m}, \end{matrix}

(12)

where

B \in C^{m \times (k + m - 1)}

is the first

(k + m - 1)

columns of

[A, I_{m}]

and

J_{m}

is an

m \times m

matrix with the

(m, m)

-entry being 1 and the other entries being 0. Since

\begin{matrix} W & = R / σ_{m}^{2}, \end{matrix}

(13)

\begin{matrix} L_{0} (a R) & = L_{0} (R), \forall a > 0 \end{matrix}

(14)

and that the locations of nonzero elements of the sparse vector

r_{[1 : k]}

identify the true DOAs, we formulate the DOA estimation problem as solving the penalized likelihood optimization problem

\underset{r \in Ω}{arg min} L_{0} (W) + λ {∥ r_{[1 : k]} ∥}_{1},

(15)

where

Ω = {t \in R^{k + m - 1} ∣ t \geq 0, B Diag (t) B^{H} + J_{m} > 0}

(16)

and

λ \geq 0

is pre-specified. The

l_{1}

-norm penalty term

λ ∥ r_{[1 : k]} ∥_{1}

would help deduce a sparse solution, and the penalty parameter

λ

controls the sparsity level [37].

Remark 1.

We explain why the DOAs are not estimated by solving the plausible

l_{1}

-penalized ML optimization problem

\begin{matrix} \underset{p, σ}{arg min} L_{0} (R) + λ {∥ p ∥}_{1} . \end{matrix}

(17)

Recalling Equation (6), we find for any

λ_{1} > 0

and

λ_{2} > 0

,

\begin{matrix} \begin{matrix} L_{0} (R) + λ_{1} {∥ p ∥}_{1} = L_{0} ([A, I_{m}] Diag ([\frac{λ_{1}}{λ_{2}} p; \frac{λ_{1}}{λ_{2}} σ]) {[A, I_{m}]}^{H}) + λ_{2} {∥\frac{λ_{1}}{λ_{2}} p∥}_{1} . \end{matrix} \end{matrix}

(18)

Thus,

[p, σ]

is a locally optimal solution of Equation (17) with

λ = λ_{1}

if and only if

[\frac{λ_{1}}{λ_{2}} p; \frac{λ_{1}}{λ_{2}} σ]

is a locally optimal solution of Equation (17) with

λ = λ_{2}

. This means if we estimate

p

by Equation (17), the parameter λ cannot work theoretically in adjusting the sparsity level of the estimate of

p

. Instead of considering Equation (17), we formulate the optimization (Equation (15)) for the DOA estimation. In Equation (15), noticing the constant matrix

J_{m}

in Equation (12), we find that different values of λ would result in solutions with different sparsity levels.

3.2. Spatially Uniform White Noise

In the case of spatially uniform white sensor noise,

σ_{1} = \dots = σ_{m} = σ

. Assuming

σ^{2} > 0

, we denote

\begin{matrix} q & = p / σ^{2}, \end{matrix}

(19)

\begin{matrix} Q & = A Diag (q) A^{H} + I_{m} . \end{matrix}

(20)

Estimating the DOAs means identifying the locations of nonzero elements of the vector

q

. Considering

Q = R / σ^{2}

, we can estimate

q

through solving the penalized likelihood optimization problem

\underset{q \geq 0}{arg min} L_{0} (Q) + λ {∥ q ∥}_{1}

(21)

with

λ \geq 0

, where the term

{λ ∥ q ∥}_{1}

plays the same role as the penalty term in Equation (15).

Note that the number of unknown parameters to be estimated is k in the case of spatially uniform noise in contrast to

k + m - 1

in the case of spatially non-uniform noise.

4. DOA Estimation Algorithms

In this section, we provide methods to solve the optimization problems in Equations (15) and (21) with

λ

fixed. As Equations (15) and (21) are non-convex, it is generally hard to give their globally optimal solutions. Based on the MM framework [39,40], we develop algorithms to find the locally optimal solutions of Equations (15) and (21).

A function

f (x)

is said to be majorized by a function

g (x | x_{0})

at

x_{0}

, if

f (x) \leq g (x | x_{0})

for all x and

f (x_{0}) = g (x_{0} | x_{0})

. In the MM framework, the problem

arg {min}_{x} f (x)

can be solved through iteratively solving

x_{u + 1} = arg {min}_{x} g (x | x_{u})

.

When solving the problems in Equations (15) and (21) in the MM framework, majorizing functions can be constructed based on the following two inequalities. For any positive definite matrices

U \in C^{m \times m}

and

U_{u} \in C^{m \times m}

[40],

log det (U) \leq log det (U_{u}) + tr (U_{u}^{- 1} U) - m

(22)

and

log (x {(t)}^{H} U^{- 1} x (t)) \leq log (x {(t)}^{H} U_{u}^{- 1} x (t)) + \frac{x {(t)}^{H} U^{- 1} x (t)}{x {(t)}^{H} U_{u}^{- 1} x (t)} - 1,

(23)

where both equalities are achieved at

U = U_{u}

.

4.1. Algorithms for Spatially Non-Uniform White Noise

In this subsection, using two different majorizing functions of the objection function in Equation (15), we develop two different MM algorithms named MM1 and MM2 to solve Equation (15).

4.1.1. MM1 Algorithm

Denote

W_{u} = B Diag (r_{u}) B^{H} + J_{m},

(24)

where

r_{u} \in Ω

. Replacing the

U

and

U_{u}

in Equations (22) and (23) by

W

and

W_{u}

, respectively, we have for any

W > 0, W_{u} > 0

,

\begin{matrix} L_{0} (W) & \leq tr (W_{u}^{- 1} W) + \frac{m}{n} \sum_{t = 1}^{n} \frac{x {(t)}^{H} W^{- 1} x (t)}{x {(t)}^{H} W_{u}^{- 1} x (t)} + c_{2} \end{matrix}

(25)

\begin{matrix} = tr (W_{u}^{- 1} W) + tr (M_{u} W^{- 1}) + c_{2} \end{matrix}

(26)

where

c_{2}

is a constant, the equality in Equation (25) is achieved at

W = W_{u}

, and

M_{u} = \frac{m}{n} \sum_{t = 1}^{n} \frac{x (t) x {(t)}^{H}}{x {(t)}^{H} W_{u}^{- 1} x (t)} .

(27)

Denote the sum of the first two terms on the right side of Equation (26) and the

l_{1}

-norm penalty term in Equation (15) as

g_{1} (r ∣ r_{u}, λ)

, and then due to Equation (12),

g_{1} (r ∣ r_{u}, λ)

can be rewritten as

g_{1} (r ∣ r_{u}, λ) = tr (M_{u} W^{- 1}) + w_{u}^{T} r

(28)

with

\begin{matrix} w_{u} & = diag (B^{H} W_{u}^{- 1} B) + [λ 1_{k}; 0_{m - 1}] . \end{matrix}

(29)

From Equation (25),

g_{1} (r ∣ r_{u}, λ) + c_{2}

is found to be a majorizing convex function of the objective function in Equation (15). Based on the MM framework, the problem in Equation (15) can be solved by iteratively solving the convex optimization problem

r_{u + 1} = \underset{r \in Ω}{arg min} g_{1} (r ∣ r_{u}, λ) .

(30)

Proposition 1

(The MM1 algorithm for Equation (15)). The sequence

{r_{u}}

generated by

r_{u + 1} = \underset{r \geq 0}{arg min} g_{1} (r ∣ r_{u}, λ)

(31)

with any initial value

r_{0} \in Ω

converges to a locally optimal solution of the optimization problem in Equation (15).

Proof.

Since

g_{1} (r ∣ r_{u}, λ)

tends to

+ \infty

as

W

goes to the boundary of the positive semidefinite cone, the optimization problems in Equations (30) and (31) are equivalent.

This proposition follows by the convergence property of the MM algorithm [40] and the fact that

L_{0} (W) \to + \infty

with probability 1 as

W

tends to the boundary of the positive semidefinite cone [41]. □

Due to Equation (11),

r_{0} = {[{(r_{1})}_{0}, \dots, {(r_{k + m - 1})}_{0}]}^{T}

required in the iteration in Equation (31) can be the one obtained by inserting the initial estimate of

[p; σ]

presented in [13]. Specifically,

\begin{matrix} {(r_{i})}_{0} = \frac{b_{i}^{H} \hat{R} b_{i}}{{\hat{R}}_{m, m} {∥ b_{i} ∥}_{2}^{4}} > 0, i = 1, \dots, k + m - 1, \end{matrix}

(32)

where

b_{i}

is the ith column of the matrix

B

and

\begin{matrix} \hat{R} = \frac{1}{n} \sum_{t = 1}^{n} x (t) x {(t)}^{H} . \end{matrix}

(33)

To solve the optimization problem in Equation (31) globally, we develop two available solvers: the coordinate descent (CD) and SPICE-like solvers. Denote

r = {[r_{1}, \dots, r_{k + m - 1}]}^{T}

in the following.

Proposition 2

(The CD solver for Equation (31)). The sequence of

r

generated by iterating

r_{i} = ((\sqrt{\frac{β_{i}}{α_{i}}} - 1) \frac{1}{γ_{i}}) δ (β_{i} - α_{i}), i = 1, \dots, k + m - 1

(34)

with any initial value

r \geq 0

converges to the globally optimal solution

r_{u + 1}

of the problem in Equation (31), where

δ (\cdot)

is the indicator function of the interval

(0, \infty)

,

\begin{matrix} α_{i} & = {(w_{i})}_{u}, \end{matrix}

(35)

\begin{matrix} β_{i} & = b_{i}^{H} N_{i}^{- 1} M_{u} N_{i}^{- 1} b_{i}, \end{matrix}

(36)

\begin{matrix} γ_{i} & = b_{i}^{H} N_{i}^{- 1} b_{i}, \end{matrix}

(37)

with

{(w_{i})}_{u}

being the ith element of the vector

w_{u}

, and

N_{i} = \sum_{l \neq i} r_{l} b_{l} b_{l}^{H} + J_{m} .

(38)

Proof.

By the general analysis of the CD method in [42], it is easy to find that the convex problem in Equation (31) can be globally solved by iterating

r_{i} = \underset{r_{i} \geq 0}{arg min} g_{1} (r ∣ r_{u}, λ), i = 1, \dots, k + m - 1

(39)

until convergence.

Due to

W = B Diag (r) B^{H} + J_{m} = r_{i} b_{i} b_{i}^{H} + N_{i}

(40)

and

{(r_{i} b_{i} b_{i}^{H} + N_{i})}^{- 1} = N_{i}^{- 1} - \frac{r_{i} N_{i}^{- 1} b_{i} b_{i}^{H} N_{i}^{- 1}}{1 + r_{i} b_{i}^{H} N_{i}^{- 1} b_{i}},

(41)

the iteration in Equation (39) can be reformulated as

r_{i} = \underset{r_{i} \geq 0}{arg min} α_{i} r_{i} - \frac{β_{i} r_{i}}{1 + γ_{i} r_{i}}, i = 1 \dots, k + m - 1 .

(42)

In addition,

α_{i} > 0, β_{i} > 0, γ_{i} > 0

for

i = 1, \dots, k + m - 1

. Solving the convex optimization problems in Equation (42) by the gradient method, we conclude that the equations in Equation (42) are equivalent to those in Equation (34). □

As the solver in Proposition 2 is derived by the CD method, we call it the CD solver. The MM1 algorithm in Proposition 1 with the CD solver is specially named the MM1-CD algorithm.

For clarity, detailed steps of the MM1-CD algorithm for Equation (15) are presented in Algorithm 1.

Note that, in the CD solver, for

i = 1, \dots, k

, we have

β_{i} = \frac{m}{n} \sum_{t = 1}^{n} ζ_{i} ζ_{i}^{H}

(43)

where

ζ_{i} = \frac{a_{i}^{H} N_{i}^{- 1} x (t)}{\sqrt{x {(t)}^{H} W_{u}^{- 1} x (t)}}

(44)

with

a_{i}

being the ith column of the matrix

A

. The

β_{i}

can be interpreted as the correlation between the signal from a possible source at the grid

θ_{i}

and the array responses

x (1), \dots, x (n)

. From the indicator function

δ (β_{i} - α_{i})

in Equation (34), we find that it is more likely to force to zeros the powers of the assumed signals that are less correlated with the array responses.

Proposition 3

(The SPICE-like solver for Equation (31)). The sequence of

r

generated by iterating

r = sqrt (diag (Diag (r) B^{H} W^{- 1} M_{u} W^{- 1} B Diag (r)) ⊘ w_{u})

(45)

with any initial value

r > 0

converges to the globally optimal solution

r_{u + 1}

of problem in Equation (31).

Algorithm 1 The MM1-CD algorithm for spatially non-uniform white noise.

1:: Give $r_{0}$ and $ϵ$ ,
2:: $u = 0$ ,
3:: repeat
4:: $W_{u} = B Diag (r_{u}) B^{H} + J_{m}$ ,
5:: Calculate $M_{u}$ and $w_{u}$ ,
6:: $d = 0$ ,
7:: $r^{0} = r_{u}$ , $r = r^{0}$ ,
8:: repeat
9:: for $i = 1 : k + m - 1$ do
10:: $N_{i} = \sum_{l \neq i} r_{l} b_{l} b_{l}^{H} + J_{m}$ ,
11:: Calculate $α_{i}$ , $β_{i}$ and $γ_{i}$ ,
12:: if $β_{i} \leq α_{i}$ then
13:: $r_{i} = 0$ ,
14:: else
15:: $r_{i} = (\sqrt{β_{i} / α_{i}} - 1) / γ_{i}$ ,
16:: end if
17:: end for
18:: $d = d + 1$ ,
19:: $r^{d} = {[r_{1}, \dots, r_{k + m - 1}]}^{T}$ ,
20:: until $∥ r^{d} - r^{d - 1} ∥_{2} / {∥ r^{d - 1} ∥}_{2} < ϵ$ ,
21:: $u = u + 1$ ,
22:: $r_{u} = r^{d}$ ,
23:: until $∥ r_{u} - r_{u - 1} ∥_{2} / {∥ r_{u - 1} ∥}_{2} < ϵ$ .

Proof.

It is obvious that Equation (31) and the SPICE criterion in [15] are with similar forms. By the same way as the SPICE criterion is analyzed in [15], we find that the globally optimal solution of the problem in Equation (31) is identical to the minimizer

r

of the optimization problem

\begin{matrix} \begin{matrix} min_{r \geq 0, E \in C^{(k + m) \times n}} & tr (E^{H} {(Diag ([r; 1]))}^{- 1} E) + w_{u}^{T} r \\ s.t. & [A, I_{m}] E = X_{u}, \end{matrix} \end{matrix}

(46)

where

X_{u} = [\tilde{x} (1), \dots, \tilde{x} (n)]

with

\tilde{x} (t) = \sqrt{\frac{m}{n}} \frac{x (t)}{\sqrt{x {(t)}^{T} W_{u}^{- 1} x (t)}}, t = 1, \dots, n .

(47)

For a fixed

r

, the matrix

E

minimizing Equation (46) can be verified to be [15]

\begin{matrix} E = Diag ([r; 1]) {[A, I_{m}]}^{H} W^{- 1} X_{u}, \end{matrix}

(48)

and for a fixed

E

, the vector

r

minimizing Equation (46) can be readily given by

\begin{matrix} r = sqrt ({(diag (E E^{H}))}_{[1 : k + m - 1]} ⊘ w_{u}) . \end{matrix}

(49)

The sequences of

E

and

r

, generated by alternately iterating Equations (48) and (49) from

r > 0

, converge to the globally optimal solution of the convex problem in Equation (46) [14,15].

Due to

X_{u} X_{u}^{H} = M_{u}

, iterating Equation (49) is just iterating Equation (45). Thus, the sequence of

r

generated by Equation (45) converges to the minimizer

r

of Equation (46). □

The solver in Proposition 3 is named the SPICE-like solver, and the MM1 algorithm in Proposition 1 with it is called the MM1-SPICE algorithm. Algorithm 2 summarily illustrates the MM1-SPICE algorithm for Equation (15).

Algorithm 2 The MM1-SPICE algorithm for spatially non-uniform white noise.

1:: Give $r_{0}$ and $ϵ$ ,
2:: $u = 0$ ,
3:: repeat
4:: $W_{u} = B Diag (r_{u}) B^{H} + J_{m}$ ,
5:: Calculate $M_{u}$ and $w_{u}$ ,
6:: $d = 0$ ,
7:: $r^{0} = r_{u}$ ,
8:: repeat
9:: $W^{d} = B Diag (r^{d}) B^{H} + J_{m}$ ,
10:: $r^{d + 1} = sqrt (diag (Diag (r^{d}) B^{H} {(W^{d})}^{- 1} M_{u} {(W^{d})}^{- 1} B Diag (r^{d})) ⊘ w_{u})$ ,
11:: $d = d + 1$ ,
12:: until $∥ r^{d} - r^{d - 1} ∥_{2} / {∥ r^{d - 1} ∥}_{2} < ϵ$ ,
13:: $u = u + 1$ ,
14:: $r_{u} = r^{d}$ ,
15:: until $∥ r_{u} - r_{u - 1} ∥_{2} / {∥ r_{u - 1} ∥}_{2} < ϵ$ .

From Propositions 1–3, it is found that the proposed MM1 algorithm is by an inner–outer iteration loop. The MM1-CD and the MM1-SPICE algorithms are with the identical outer loop but different nested inner loops. In addition, relationship and difference between the MM1-CD and the MM1-SPICE are discussed in Section 4.3.

4.1.2. MM2 Algorithm

When

r_{u} > 0

, it is found from [36] that, for any

r > 0

,

[\begin{matrix} C_{u} {(Diag ([r; 1]))}^{- 1} C_{u}^{H} & I_{m} \\ I_{m} & W \end{matrix}] = F^{H} F \geq 0,

(50)

where

\begin{matrix} C_{u} & = W_{u}^{- 1} [A, I_{m}] Diag ([r_{u}; 1]), \end{matrix}

(51)

\begin{matrix} F & = [{(Diag ([r; 1]))}^{- \frac{1}{2}} C_{u}^{H}, {(Diag ([r; 1]))}^{\frac{1}{2}} {[A, I_{m}]}^{H}] . \end{matrix}

(52)

From Equation (50), we have

\begin{matrix} W^{- 1} \leq C_{u} {(Diag ([r; 1]))}^{- 1} C_{u}^{H} \end{matrix}

(53)

with the equality achieved at

r = r_{u}

. The inequality in Equation (53) is also valid for any

r \in Ω

due to

W > 0

.

Denote

g_{2} (r ∣ r_{u}, λ) = tr (C_{u}^{H} M_{u} C_{u} {(Diag ([r; 1]))}^{- 1}) + w_{u}^{T} r .

(54)

It is clear from Equation (53) that, when

r_{u} > 0

, for any

r \in Ω

,

g_{1} (r ∣ r_{u}, λ) \leq g_{2} (r ∣ r_{u}, λ),

(55)

where the equality is achieved at

r = r_{u}

. Therefore, at any

r_{u} > 0

,

g_{2} (r ∣ r_{u}, λ) + c_{2}

majorizes the objective function in Equation (15) for

r \in Ω

.

Proposition 4

(The MM2 algorithm for Equation (15)). The sequence

{r_{u}}

generated by

r_{u + 1} = sqrt (diag (Diag (r_{u}) B^{H} W_{u}^{- 1} M_{u} W_{u}^{- 1} B Diag (r_{u})) ⊘ w_{u})

(56)

with any initial value

r_{0} > 0

converges to a locally optimal solution of the problem in Equation (15).

Proof.

Through the convergence analysis in [36], we have that, although Equation (55) is valid only when

r_{u} > 0

, the sequence

{r_{u}}

generated

\begin{matrix} r_{u + 1} = \underset{r \in Ω}{arg min} g_{2} (r ∣ r_{u}, λ) \end{matrix}

(57)

with any initial value

r_{0} > 0

converges to a locally optimal solution of the problem in Equation (15). It is worth mentioning that the elements of the coefficient vector

diag (C_{u}^{H} M_{u} C_{u})

in Equation (54) are positive. By solving the optimization problem in Equation (57) using the gradient method, the iteration procedure in Equation (57) is found to be exactly Equation (56). □

The

r_{0}

involved in the iteration procedure in Equation (56) can be the one given in Equation (32). Summarily, Algorithm 3 gives the detailed steps of the MM2 algorithm for the problem in Equation (15).

Algorithm 3 The MM2 algorithm for spatially non-uniform white noise.

1:: Give $r_{0}$ and $ϵ$ ,
2:: $u = 0$ ,
3:: repeat
4:: $W_{u} = B Diag (r_{u}) B^{H} + J_{m}$ ,
5:: Calculate $M_{u}$ and $w_{u}$ ,
6:: $r_{u + 1} = sqrt (diag (Diag (r_{u}) B^{H} W_{u}^{- 1} M_{u} W_{u}^{- 1} B Diag (r_{u})) ⊘ w_{u})$ ,
7:: $u = u + 1$ ,
8:: until $∥ r_{u} - r_{u - 1} ∥_{2} / {∥ r_{u - 1} ∥}_{2} < ϵ$ .

4.2. DOA Estimation for Spatially Uniform White Noise

The DOA estimation in the case of spatially uniform white noise amounts to solving the optimization problem in Equation (21). Problems in Equations (21) and (15) are with similar forms, but Equation (21) involves a smaller number of unknown parameters. By the same way as the problem in Equation (15) is analyzed in Section 4.1, we can naturally derive the algorithms to solve Equation (21). The algorithms for Equation (21) are still named MM1 (including MM1-CD and MM1-SPICE) and MM2.

4.2.1. MM1 Algorithm

Proposition 5

(The MM1 algorithm for Equation (21)). Denote

\begin{matrix} h_{1} (q ∣ q_{u}, λ) = tr (Z_{u} Q^{- 1}) + e_{u}^{T} q \end{matrix}

(58)

with

\begin{matrix} Z_{u} & = \frac{m}{n} \sum_{t = 1}^{n} \frac{x (t) x {(t)}^{H}}{x {(t)}^{H} Q_{u}^{- 1} x (t)}, \end{matrix}

(59)

\begin{matrix} e_{u} & = diag (A^{H} Q_{u}^{- 1} A) + λ 1_{k}, \end{matrix}

(60)

\begin{matrix} Q_{u} & = A Diag (q_{u}) A^{H} + I_{m}, \end{matrix}

(61)

and then the sequence

{q_{u}}

generated by

\begin{matrix} q_{u + 1} = \underset{q \geq 0}{arg min} h_{1} (q ∣ q_{u}, λ) \end{matrix}

(62)

with any initial value

q \geq 0

converges to a locally optimal solution of the problem in Equation (21).

To solve the optimization problem in Equation (62), using the same method as Equation (31) is solved, we offer two different iterative solvers for Equation (62).

Proposition 6

(The CD solver for Equation (62)). The sequence of

q

generated by iterating

\begin{matrix} q_{j} = (\frac{{\tilde{β}}_{j}}{{\tilde{α}}_{j}} - 1) \frac{1}{{\tilde{γ}}_{j}} δ ({\tilde{β}}_{j} - {\tilde{α}}_{j}), j = 1, \dots, k \end{matrix}

(63)

with any initial value

q \geq 0

converges to the globally optimal solution of the problem in Equation (62), where

\begin{matrix} {\tilde{α}}_{j} & = {(e_{j})}_{u}, \end{matrix}

(64)

\begin{matrix} {\tilde{β}}_{j} & = a_{j}^{H} {(H_{j})}^{- 1} Z_{u} {(H_{j})}^{- 1} a_{j}, \end{matrix}

(65)

\begin{matrix} {\tilde{γ}}_{j} & = a_{j}^{H} {(H_{j})}^{- 1} a_{j}, \end{matrix}

(66)

with

{(e_{j})}_{u}

being the jth element of the vector

e_{u}

,

a_{j}

being the jth column of the matrix

A

and

\begin{matrix} H_{j} = \sum_{l \neq j} q_{l} a_{l} a_{l}^{H} + I_{m} . \end{matrix}

(67)

Proposition 6 introduces the nested CD inner loop of the MM1 algorithm for the problem in Equation (21), and can be proven similar to Proposition 2.

Proposition 7

(The SPICE-like solver for Equation (62)). The sequence of

q

generated by iterating

\begin{matrix} q & = sqrt (diag (Diag (q) A^{H} Q^{- 1} Z_{u} Q^{- 1} A Diag (q)) ⊘ e_{u}) \end{matrix}

(68)

with any initial value

q > 0

converges to the globally optimal solution of the problem in Equation (62).

The iterative procedure in Proposition 7 above is the nested SPICE-like inner loop of the MM1 algorithm for Equation (21), and can be proven similar to Proposition 3.

The MM procedure in Proposition 5 with the CD nested loop in Proposition 6 is the MM1-CD algorithm for Equation (21). The MM1-SPICE algorithm for Equation (21) is the MM procedure in Proposition 5 with the SPICE-like nested loop in Proposition 7.

4.2.2. MM2 Algorithm

Proposition 8

(The MM2 algorithm for Equation (21)). The sequence

{q_{u}}

generated by

q_{u + 1} = sqrt (diag (Diag (q_{u}) A^{H} Q_{u}^{- 1} Z_{u} Q_{u}^{- 1} A Diag (q_{u})) ⊘ e_{u})

(69)

with any initial value

q > 0

converges to a locally optimal solution of the problem in Equation (21).

Due to Equation (19), by means of inserting in

q

the initial estimates of

p

and

σ^{2}

given in [13],

q_{0} = {[{(q_{1})}_{0}, \dots, {(q_{k})}_{0}]}^{T}

required by the iterations in Equations (62) and (69) can be

{(q_{j})}_{0} = \frac{a_{i}^{H} \hat{R} a_{i}}{\hat{σ} {∥ a_{i} ∥}_{2}^{4}} > 0, j = 1, \dots, k,

(70)

where

\hat{R}

is given by Equation (33) and

\hat{σ}

is the mean of the first m smallest values of the set

\{\frac{a_{1}^{H} \hat{R} a_{1}}{∥ a_{1} ∥_{2}^{2}}, \dots, \frac{a_{k}^{H} \hat{R} a_{k}}{∥ a_{k} ∥_{2}^{2}}, {\hat{R}}_{1, 1}, \dots, {\hat{R}}_{m, m}\} .

(71)

4.3. Discussions on the MM1 and MM2 Algorithms

The following arguments are focused on the case of spatially non-uniform noise, which are also applicable to the case of spatially uniform noise.

Both the MM1 and MM2 algorithms decrease the objective function in Equation (15) at each MM iteration and converge locally, in which the different majorizing functions are adopted. Compared to the MM2 algorithm, the MM1 algorithm has a majorizing function closer to the objective function in Equation (15) due to Equation (55). The computational burdens of the two algorithms are mainly caused by the matrix inversion operations.

Although the MM1-CD and MM1-SPICE algorithms have different nested inner iteration procedures, they converge to the same local solution theoretically because their outer MM iteration procedures are both Equation (31). Each nested inner iteration of the MM1-CD algorithm, detailed by Steps 9–17 in Algorithm 1, requires

k + m - 1

matrix inverse operations. In each nested inner iteration of the MM1-SPICE algorithm, presented by Steps 9–10 in Algorithm 2, only one matrix inverse operation is entailed.

It is somewhat interesting to find that the iteration of the MM2 algorithm (see Equation (56)) and the nested inner iteration of the MM1-SPICE algorithm (see Equation (45)) have similar forms. In each iteration of the MM2 algorithm, only one matrix inverse operation is needed.

4.4. Selection of Penalty Parameter

The penalty parameter

λ

in Equations (15) and (21) affects the sparsity levels of the estimates of

r

and

q

. A modified Bayesian information criterion (BIC) method [37], which is common and statistically sound, is provided here to choose an appropriate

λ

. Let

{\hat{r}}_{λ}

and

{\hat{q}}_{λ}

be the solutions of the problems in Equations (15) and (21) with a fixed

λ

, respectively, and denote

{\hat{W}}_{λ} = B Diag ({\hat{r}}_{λ}) B^{H} + J_{m}

and

{\hat{Q}}_{λ} = A Diag ({\hat{q}}_{λ}) A^{H} + I_{m}

. The appropriate

λ

for spatially non-uniform noise and uniform noise are

\underset{λ}{arg min} L_{0} ({\hat{W}}_{λ}) + {\hat{z}}_{λ, 1} log (n) / (2 n)

(72)

and

\underset{λ}{arg min} L_{0} ({\hat{Q}}_{λ}) + {\hat{z}}_{λ, 2} log (n) / (2 n),

(73)

respectively, where

{\hat{z}}_{λ, 1}

and

{\hat{z}}_{λ, 2}

are the numbers of nonzero elements of the vectors

{({\hat{r}}_{λ})}_{[1 : k]}

and

{\hat{q}}_{λ}

, respectively.

Note that the elements of

{({\hat{r}}_{λ})}_{[1 : k]}

and

{\hat{q}}_{λ}

can be treated as 0 if they are smaller than some certain values respectively, e.g.,

ε max ({({\hat{r}}_{λ})}_{[1 : k]})

and

ε max ({\hat{q}}_{λ})

with a very small

ε > 0

.

Notice that the

L_{0} ({\hat{W}}_{λ})

and

L_{0} ({\hat{Q}}_{λ})

in Equations (72) and (73) are substitutes for the marginal likelihood also called as Bayesian evidence required in the BIC criterion [43]. By the way, when modeling in the Bayesian framework, the marginal likelihood usually cannot be analytically calculated, but can be approximated by several computational methods in [44,45,46].

5. Numerical Experiments

In this section, we numerically show the performance of the proposed methods. Consider a uniform linear array (ULA), and, for each

j = 1, \dots, k_{0}

, the steering vector corresponding to the DOA

ξ_{j}

is

\begin{matrix} a (ξ_{j}) = {[1, exp (ı π cos (ξ_{j})), \dots, exp (ı π (m - 1) cos (ξ_{j}))]}^{T} . \end{matrix}

(74)

The array output data

x (t)

are generated from the model

\begin{matrix} x (t) = \sum_{j = 1}^{k_{0}} a (ξ_{j}) {\bar{s}}_{j} (t) + v (t), t = 1, \dots, n, \end{matrix}

(75)

and both the source signals

{\bar{s}}_{1} (t), \dots, {\bar{s}}_{k_{0}} (t)

and the observation noise

v (t)

are temporally independent. The SNR is defined as

SNR (dB) = 10 {log}_{10} (\frac{tr (\bar{A} \bar{P} {\bar{A}}^{H})}{tr (V)}),

(76)

where

\bar{A} = [a (ξ_{1}), \dots, a (ξ_{k_{0}})]

,

\bar{P}

is the covariance matrix of

\bar{s} (t) = {[{\bar{s}}_{1} (t), \dots, {\bar{s}}_{k_{0}} (t)]}^{T}

, and

V

is the covariance matrix of

v (t)

. The root mean-square error (RMSE) of the DOA estimate is employed to evaluate the estimation performance, which is approximated by

R = 500

Monte Carlo runs as

RMSE = \frac{1}{R} \sum_{i = 1}^{R} \sqrt{\frac{1}{k_{0}} \sum_{j = 1}^{k_{0}} {({\hat{ξ}}_{j, i} - ξ_{j})}^{2}},

(77)

where

{\hat{ξ}}_{j, i}

is the estimate of

ξ_{j}

in the ith Monte Carlo run. In the following experiments, we applied the proposed methods and the SPICE [13,15] and LIKES methods [15] to estimate the DOAs. Set the grid points

θ = {0^{\circ}, 1^{\circ}, \dots, 180^{\circ}}

, and the tolerance

ϵ = 10^{- 8}

for convergence. All methods were coded in MATLAB and executed on a workstation with two 2.10 GHz Intel Xeon CPUs.

5.1. Experimental Settings

5.1.1. Experiment 1

Consider a scenario with Gaussian source signals and noise:

The number of sensors in the array is $m = 15$ , and the number of snapshots is $n = 100$ .
There are $k_{0} = 4$ sources locating at $ξ_{1} = 60^{\circ}, ξ_{2} = 80^{\circ}, ξ_{3} = 100^{\circ}, ξ_{4} = 120^{\circ}$ , respectively.
The signal $\bar{s} (t)$ is zero-mean circular complex Gaussian with covariance matrix $\bar{P} = Diag ([100, 100, 100, 100])$ .
The noise $v (t)$ is zero-mean circular complex Gaussian with covariance matrix $V = σ^{2} I_{m}$ .

5.1.2. Experiment 2

Consider the scenario where both the source signals and the observation noise are non-Gaussian. Let

m = 15

. We used the experimental settings:

The $k_{0} = 4$ source signals from $ξ_{1} = 10^{\circ}, ξ_{2} = 40^{\circ}, ξ_{3} = 40^{\circ} + Δ ξ, ξ_{4} = 70^{\circ}$ are, respectively, as ${\bar{s}}_{1} (t) = 5 exp (ı φ_{1} (t)), {\bar{s}}_{2} (t) = 10 exp (ı φ_{2} (t)), {\bar{s}}_{3} (t) = 10 exp (ı φ_{3} (t)), {\bar{s}}_{4} (t) = 5 exp (ı φ_{4} (t))$ , where $Δ ξ$ is the angle separation between $ξ_{2}$ and $ξ_{3}$ , and $φ_{1} (t), \dots, φ_{4} (t)$ are independent and respectively, distributed as

$\begin{matrix} \begin{matrix} φ_{1} (t) & \sim 2 π Gamma (1, 1), \\ φ_{2} (t) & \sim 2 π Gamma (1, 1), \\ φ_{3} (t) & \sim U [0, 2 π], \\ φ_{4} (t) & \sim U [0, 2 π], \end{matrix} \end{matrix}$

(78)

where $Gamma (a, b)$ denotes the Gamma distribution with the shape parameter a and scale parameter b, and $U [a, b]$ denotes the uniform distribution on the interval $[a, b]$ . Note that the source signals ${\bar{s}}_{1} (t)$ , ${\bar{s}}_{2} (t)$ , ${\bar{s}}_{3} (t)$ and ${\bar{s}}_{4} (t)$ have constant modulus, which is common in communication applications [13].
The observation noise $v (t)$ is distributed as

$\begin{matrix} v (t) \sim ϱ n_{1}, \end{matrix}$

(79)

where $ϱ \sim Gamma (1, 6)$ and $n_{1}$ is zero-mean Gaussian with covariance matrix $V = σ^{2} I_{m}$ .

5.1.3. Experiment 3

Consider a scenario with non-Gaussian source signals and non-Gaussian spatially non-uniform white noise. Let

m = 8

and

n = 100

. The

k_{0} = 3

source signals locating at

ξ_{1} = 40^{\circ}, ξ_{2} = 50^{\circ}, ξ_{3} = 80^{\circ}

are

{\bar{s}}_{1} (t) = 10 exp (ı φ_{2} (t))

,

{\bar{s}}_{2} (t) = 30 exp (ı φ_{3} (t))

and

{\bar{s}}_{3} (t) = 10 exp (ı φ_{4} (t))

with

φ_{2} (t), φ_{3} (t), φ_{4} (t)

being given by Equation (78). The observation noise is distributed as

\begin{matrix} v (t) \sim ϱ n_{2}, \end{matrix}

(80)

where

ϱ \sim Gamma (2, 2)

and

n_{2}

is zero-mean Gaussian with covariance matrix

V = σ^{2} Diag ([m, m - 1, \dots, 1])

.

5.2. Experimental Results

In Experiment 1, the MM1-CD, MM1-SPICE and MM2 algorithms with

λ = 0

were firstly applied. Table 1 reports the iteration numbers

n_{1}, n_{2}, n_{3}

, the computational time

τ

(in seconds), and the DOA estimation RMSEs (in degrees). Specifically,

n_{1}

is the number of total iterations,

n_{2}

is the number of iterations in the outer loop, amd

n_{3}

is the average of the iterations in a nested inner loop. Besides the RMSEs, each value in Table 1 is the average of the results of 500 Monte Carlo runs. Note that, even though

n_{1} = n_{2} \times n_{3}

in a Monte Carlo run,

n_{1}

is not exactly equal to

n_{2} \times n_{3}

in Table 1 because they are the average of 500 Monte Carlo runs.

Table 1. Comparison of computational complexities and RMSEs.

As shown in Table 1, the MM1-CD and MM1-SPICE algorithms had similar RMSEs and the MM1-CD algorithm took much more time than the MM1-SPICE algorithm. Considering that the MM1-CD and MM1-SPICE algorithms theoretically converge to the identical local solution, we believe that they will always have similar RMSEs. Moreover, the MM2 algorithm had the smallest iteration numbers, the least computational time and satisfactory RMSEs. In the following, we present the DOA estimation performance only of the MM1-SPICE and MM2 algorithms. Note that the MM1-SPICE and MM2 algorithms may converge to different local solutions, and then they are referred to as two different estimation methods hereinafter. Specifically, we compared the following six methods:

LIKES: The ML method proposed in [15] under the assumption that the array output $x (t)$ is Gaussian distributed.
SPICE: A sparse covariance-based estimation method proposed in [13] with no distribution assumption.
MM1-SPICE-0: The MM1-SPICE method with the penalty parameter $λ = 0$ ;
MM1-SPICE-P: The MM1-SPICE method with the penalty parameter selected by the criterion in Section 4.4.
MM2-0: The MM2 method with the penalty parameter $λ = 0$ .
MM2-P: The MM2 method with the penalty parameter selected by the criterion in Section 4.4.

It is worth presenting that in, Experiments 1–3, the standard MUSIC algorithm was found to be almost ineffective, thus we do not illustrate it.

The RMSE comparisons of the above six methods are illustrated in Figure 1, Figure 2 and Figure 3 for Experiments 1–3 with different SNRs, respectively. Figure 4 and Figure 5 for Experiment 2 show the DOA estimation RMSEs of the above six methods versus the number of snapshots and the angle separation, respectively.

Figure 1. DOA estimation RMSE versus SNR for Experiment 1.

Figure 2. DOA estimation RMSE versus SNR for Experiment 2 with

n = 100

and

Δ ξ = 15^{\circ}

.

Figure 3. DOA estimation RMSE versus SNR for Experiment 3.

Figure 4. DOA estimation RMSE versus the number of snapshots n for Experiment 2 with

SNR = 0 dB

and

Δ ξ = 15^{\circ}

.

Figure 5. DOA estimation RMSE versus the angle separation

Δ ξ

for Experiment 2 with

n = 100

and

SNR = 0 dB

.

In Figure 1, we can see that the MM1-SPICE-0 and MM2-0 methods were comparable to the LIKES and SPICE methods in the Gaussian cases. As shown in Figure 2, Figure 3, Figure 4 and Figure 5, the scenarios of non-Gaussian random source signals and heavy-tailed random noise, the MM1-SPICE-0 and MM2-0 methods performed much better than the LIKES and SPICE methods. In other words, the ML methods designed for the CES outputs were effective in the simulation scenarios of Gaussian and non-Gaussian distributions.

As shown in Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5, we found that the penalized ML methods, i.e., the MM1-SPICE-P and MM2-P methods, had smaller RMSEs than the other four methods. As shown in Figure 4, with the increase of the number of snapshots, the performance of the MM1-SPICE-P and MM2-P methods improved but the performance of the MM1-SPICE-0 and MM2-0 methods remained virtually unchanged. Figure 5 shows that, as

Δ ξ

increased from

2^{\circ}

to

6^{\circ}

, the performance of the MM1-SPICE-P and MM2-P methods became better, while, when

Δ ξ

was larger than

4^{\circ}

, the performance of the MM1-SPICE-0 and MM2-0 methods no longer improved. As shown in Figure 1, Figure 2 and Figure 3, unsurprisingly, as the SNR increased, the performance of all the six methods became better.

For illustrating the difference between the penalized and un-penalized ML methods, we evaluated the normalized spectrum (NS):

\begin{matrix} NS = \frac{{\hat{r}}_{[1 : k]}}{max ({\hat{r}}_{[1 : k]})} or \frac{\hat{q}}{max (\hat{q})}, \end{matrix}

(81)

where

\hat{r}

and

\hat{q}

are, respectively, the estimates of

r

and

q

.

Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10 report the NSs of Experiments 1–3 by the MM2-0 and MM2-P methods (the curves of the NSs of the MM-SPICE-0 and MM2-SPICE-P methods are similar to those of the MM2-0 and MM2-P methods, respectively). Figure 6 and Figure 10 are for Experiments 1 and 3 with

SNR = 0 dB

, respectively. Figure 7 and Figure 8 are both for Experiment 2 with

n = 100

and

SNR = 0 dB

, but they involve two scenarios with different angle separations. Particularly, Figure 8 is for the very small

Δ ξ = 3^{\circ}

, as opposed to Figure 7 for

Δ ξ = 15^{\circ}

. Figure 9 is also for Experiment 2, but it is about the scenario with small snapshot size

n = 20

, small

SNR = - 10 dB

and moderate angle separation

Δ ξ = 10^{\circ}

. Note that the red dashed lines mark the true DOAs and the NS curves in each figure are the results of a randomly selected realization. We can clearly see in Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10 that the MM2-P method having the

l_{1}

-norm penalty added yielded a higher angular resolution.

Figure 6. NSs of Experiment 1 with

SNR = 0 dB

.

Figure 7. NSs of Experiment 2 with

n = 100

,

SNR = 0 dB

and

Δ ξ = 15^{\circ}

.

Figure 8. NSs of Experiment 2 with

n = 100

,

SNR = 0 dB

and

Δ ξ = 3^{\circ}

.

Figure 9. NSs of Experiment 2 with

n = 20

,

SNR = - 10 dB

and

Δ ξ = 10^{\circ}

.

Figure 10. NSs of Experiment 3 with

SNR = 0 dB

.

6. Conclusions

This paper provides a penalized ML method for DOA estimation under the assumption that the array output has a CES distribution. Two MM algorithms, named MM1 and MM2, are developed for solving the non-convex penalized ML optimization problem for spatially uniform or non-uniform noise. They converge locally with different rates. The numerical experiments showed that the MM2 ran faster and performed as well as the MM1 when estimating the DOAs. Their rates of convergence will be further explored theoretically in future.

It is worth mentioning that, in numerical simulations, the proposed

l_{1}

-norm penalized likelihood method effectively estimated the DOAs, although the values of nonzero elements of

r

or

p

were not estimated very accurately. By replacing the

l_{1}

-norm penalty by other proper penalties (e.g., the smoothly clipped absolute deviation (SCAD) [47], adaptive LASSO [48], or

l_{p}

-norm penalty (

0 \leq p < 1

)), more accurate estimates of the unknown parameter may be derived in the future.

If the

l_{1}

-norm penalty is replaced by the adaptive LASSO penalty, then the algorithms proposed in this paper can be applied almost without modification because the adaptive LASSO penalty is a weighted

l_{1}

-norm penalty. When the non-convex SCAD or the

l_{p}

-norm penalty (

0 \leq p < 1

) is employed, the convex majorizing functions given in [40,49] can be exploited based on the MM framework, and then the algorithms in this paper with minor modifications are also applicable.

When we have some informative prior knowledge on the directions of source signals, the problem of estimating the DOAs can be formulated as maximizing a sparse posterior likelihood from the Bayesian perspective. The sparse Bayesian method of estimating DOAs from the CES array outputs is interesting and is worth studying, while how to formulate and solve a sparse posterior likelihood optimization and how to do model selection are challenges. The BIC criterion can be used for the model selection, in which the Bayesian evidence difficult to be analytically derived can be approximated by the methods in [44,45,46]. More research along this line will be done in the future.

Author Contributions

Conceptualization, C.C. and J.Z.; Funding acquisition, J.Z.; Methodology, C.C., J.Z. and M.T.; Software, C.C.; Supervision, J.Z.; Validation, C.C.; Writing—original draft, C.C.; and Writing—review and editing, J.Z. and M.T.

Funding

This work was supported in part by the National Natural Science Foundation of China under grant 61374027, the Application Basic Research Project of Sichuan Province under grant 2019YJ0122, and the Program for Changjiang Scholars and Innovative Research Team in University under grant IRT_16R53 from the Chinese Education Ministry.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Schmidt, R.O. Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 1986, 34, 276–280. [Google Scholar] [CrossRef]
Roy, R.; Kailath, T. ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Trans. Acoust. Speech Signal Process. 1989, 37, 984–995. [Google Scholar] [CrossRef]
Stoica, P.; Sharman, K.C. Maximum likelihood methods for direction-of-arrival estimation. IEEE Trans. Acoust. Speech Signal Process. 1990, 38, 1132–1143. [Google Scholar] [CrossRef]
Friedlander, B. The root-MUSIC algorithm for direction finding with interpolated arrays. Signal Process. 1993, 30, 15–29. [Google Scholar] [CrossRef]
Yang, Z.; Li, J.; Stoica, P.; Xie, L. Sparse methods for direction-of-arrival estimation. In Academic Press Library in Signal Processing, Volume 7: Array, Radar and Communications Engineering; Chellappa, R., Theodoridis, S., Eds.; Academic Press: London, UK, 2018; Chapter 11; pp. 509–581. [Google Scholar]
Gorodnitsky, I.F.; Rao, B.D. Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm. IEEE Trans. Signal Process. 1997, 45, 600–616. [Google Scholar] [CrossRef]
Tipping, M.E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 2001, 1, 211–244. [Google Scholar]
Wipf, D.P.; Rao, B.D. Sparse Bayesian learning for basis selection. IEEE Trans. Signal Process. 2004, 52, 2153–2163. [Google Scholar] [CrossRef]
Malioutov, D.; Çetin, M.; Willsky, A.S. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Process. 2005, 53, 3010–3022. [Google Scholar] [CrossRef]
Babacan, S.D.; Molina, R.; Katsaggelos, A.K. Bayesian compressive sensing using Laplace priors. IEEE Trans. Image Process. 2010, 19, 53–63. [Google Scholar] [CrossRef] [PubMed]
Hyder, M.M.; Mahata, K. Direction-of-arrival estimation using a mixed l_2,0 norm approximation. IEEE Trans. Signal Process. 2010, 58, 4646–4655. [Google Scholar] [CrossRef]
Xu, X.; Wei, X.; Ye, Z. DOA estimation based on sparse signal recovery utilizing weighted l₁-norm penalty. IEEE Signal Process. Lett. 2012, 19, 155–158. [Google Scholar] [CrossRef]
Stoica, P.; Babu, P.; Li, J. SPICE: A sparse covariance-based estimation method for array processing. IEEE Trans. Signal Process. 2011, 59, 629–638. [Google Scholar] [CrossRef]
Stoica, P.; Babu, P.; Li, J. New method of sparse parameter estimation in separable models and its use for spectral analysis of irregularly sampled data. IEEE Trans. Signal Process. 2011, 59, 35–46. [Google Scholar] [CrossRef]
Stoica, P.; Babu, P. SPICE and LIKES: Two hyperparameter-free methods for sparse-parameter estimation. Signal Process. 2012, 92, 1580–1590. [Google Scholar] [CrossRef]
Stoica, P.; Zachariah, D.; Li, J. Weighted SPICE: A unifying approach for hyperparameter-free sparse estimation. Digit. Signal Process. 2014, 33, 1–12. [Google Scholar] [CrossRef]
Yin, J.; Chen, T. Direction-of-arrival estimation using a sparse representation of array covariance vectors. IEEE Trans. Signal Process. 2011, 59, 4489–4493. [Google Scholar] [CrossRef]
Zheng, J.; Kaveh, M. Sparse spatial spectral estimation: A covariance fitting algorithm, performance and regularization. IEEE Trans. Signal Process. 2013, 61, 2767–2777. [Google Scholar] [CrossRef]
Liu, Z.M.; Huang, Z.T.; Zhou, Y.Y. Array signal processing via sparsity-inducing representation of the array covariance matrix. IEEE Trans. Aerosp. Electron. Syst. 2013, 49, 1710–1724. [Google Scholar] [CrossRef]
He, Z.Q.; Shi, Z.P.; Huang, L. Covariance sparsity-aware DOA estimation for nonuniform noise. Digit. Signal Process. 2014, 28, 75–81. [Google Scholar] [CrossRef]
Chen, T.; Wu, H.; Zhao, Z. The real-valued sparse direction of arrival (DOA) estimation based on the Khatri-Rao product. Sensors 2016, 16, 693. [Google Scholar] [CrossRef]
Wang, Y.; Hashemi-Sakhtsari, A.; Trinkle, M.; Ng, B.W.H. Sparsity-aware DOA estimation of quasi-stationary signals using nested arrays. Signal Process. 2018, 144, 87–98. [Google Scholar] [CrossRef]
Shutin, D.; Fleur, B.H. Sparse variational Bayesian SAGE algorithm with application to the estimation of multipath wireless channels. IEEE Trans. Signal Process. 2011, 59, 3609–3623. [Google Scholar] [CrossRef]
Hu, L.; Shi, Z.; Zhou, J.; Fu, Q. Compressed sensing of complex sinusoids: An approach based on dictionary refinement. IEEE Trans. Signal Process. 2012, 60, 3809–3819. [Google Scholar]
Yang, Z.; Zhang, C.; Xie, L. Robustly stable signal recovery in compressed sensing with structured matrix perturbation. IEEE Trans. Signal Process. 2012, 60, 4658–4671. [Google Scholar] [CrossRef]
Yang, Z.; Xie, L.; Zhang, C. Off-grid direction of arrival estimation using sparse Bayesian inference. IEEE Trans. Signal Process. 2013, 61, 38–43. [Google Scholar] [CrossRef]
Fan, Y.; Wang, J.; Du, R.; Lv, G. Sparse method for direction of arrival estimation using denoised fourth-order cumulants vector. Sensors 2018, 18, 1815. [Google Scholar] [CrossRef] [PubMed]
Wu, X.; Zhu, W.P.; Yan, J.; Zhang, Z. Two sparse-based methods for off-grid direction-of-arrival estimation. Signal Process. 2018, 142, 87–95. [Google Scholar] [CrossRef]
Bhaskar, B.N.; Tang, G.; Recht, B. Atomic norm denoising with applications to line spectral estimation. IEEE Trans. Signal Process. 2013, 61, 5987–5999. [Google Scholar] [CrossRef]
Yang, Z.; Xie, L.; Zhang, C. A discretization-free sparse and parametric approach for linear array signal processing. IEEE Trans. Signal Process. 2014, 62, 4959–4973. [Google Scholar] [CrossRef]
Yang, Z.; Xie, L. Enhancing sparsity and resolution via reweighted atomic norm minimization. IEEE Trans. Signal Process. 2016, 64, 995–1006. [Google Scholar] [CrossRef]
Wang, X.; Wang, W.; Li, X.; Liu, Q.; Liu, J. Sparsity-aware DOA estimation scheme for noncircular source in MIMO Radar. Sensors 2016, 16, 539. [Google Scholar] [CrossRef] [PubMed]
Jing, X.; Liu, X.; Liu, H. A sparse recovery method for DOA estimation based on the sample covariance vectors. Circuits Syst. Signal Process. 2017, 36, 1066–1084. [Google Scholar] [CrossRef]
Stoica, P.; Babu, P. Sparse estimation of spectral lines: Grid selection problems and their solutions. IEEE Trans. Signal Process. 2012, 60, 962–967. [Google Scholar] [CrossRef]
Ollila, E.; Tyler, D.E.; Koivunen, V.; Vincent, H. Complex elliptically symmetric distributions: Survey, new results and applications. IEEE Trans. Signal Process. 2012, 60, 5597–5625. [Google Scholar] [CrossRef]
Sun, Y.; Babu, P.; Palomar, D.P. Robust estimation of structured covariance matrix for heavy-tailed elliptical distributions. IEEE Trans. Signal Process. 2016, 64, 3576–3590. [Google Scholar] [CrossRef]
Pourahmadi, M. High-Dimensional Covariance Estimation; John Wiley & Sons Inc.: Hoboken, NJ, USA, 2013. [Google Scholar]
Sangston, K.J.; Gini, F.; Greco, M.S. Coherent radar target detection in heavy-tailed compound-Gaussian clutter. IEEE Trans. Aerosp. Electron. Syst. 2012, 48, 64–77. [Google Scholar] [CrossRef]
Hunter, D.R.; Lange, K. A tutorial on MM algorithms. Am. Stat. 2004, 58, 30–37. [Google Scholar] [CrossRef]
Sun, Y.; Babu, P.; Palomar, D.P. Majorization-Minimization algorithms in signal processing, communications, and machine learning. IEEE Trans. Signal Process. 2017, 65, 794–816. [Google Scholar] [CrossRef]
Kent, J.T.; Tyler, D.E. Maximum likelihood estimation for the wrapped Cauchy distribution. J. Appl. Stat. 1988, 15, 247–254. [Google Scholar] [CrossRef]
Luo, Z.Q.; Tseng, P. On the convergence of the coordinate descent method for convex differentiable minimization. J. Optim. Theory Appl. 1992, 72, 7–35. [Google Scholar] [CrossRef]
Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
Friel, N.; Wyse, J. Estimating the evidence—A review. Stat. Neerl. 2012, 66, 288–308. [Google Scholar] [CrossRef]
Martino, L.; Elvira, V.; Luengo, D.; Corander, J. An adaptive population importance sampler: Learning from uncertainty. IEEE Trans. Signal Process. 2015, 63, 4422–4437. [Google Scholar] [CrossRef]
Martino, L.; Elvira, V.; Luengo, D.; Corander, J. Layered adaptive importance sampling. Stat. Comput. 2017, 27, 599–623. [Google Scholar] [CrossRef]
Fan, J.; Li, R. Variable selection via nonconcave penalized likilihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
Zou, H. The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
Fan, J.; Feng, Y.; Wu, Y. Network exploration via the adaptive LASSO and SCAD penalties. Ann. Appl. Stat. 2009, 3, 521–541. [Google Scholar] [CrossRef] [PubMed]

Figure 1. DOA estimation RMSE versus SNR for Experiment 1.

Figure 2. DOA estimation RMSE versus SNR for Experiment 2 with

n = 100

and

Δ ξ = 15^{\circ}

.

Figure 3. DOA estimation RMSE versus SNR for Experiment 3.

Figure 4. DOA estimation RMSE versus the number of snapshots n for Experiment 2 with

SNR = 0 dB

and

Δ ξ = 15^{\circ}

.

Figure 5. DOA estimation RMSE versus the angle separation

Δ ξ

for Experiment 2 with

n = 100

and

SNR = 0 dB

.

Figure 6. NSs of Experiment 1 with

SNR = 0 dB

.

Figure 7. NSs of Experiment 2 with

n = 100

,

SNR = 0 dB

and

Δ ξ = 15^{\circ}

.

Figure 8. NSs of Experiment 2 with

n = 100

,

SNR = 0 dB

and

Δ ξ = 3^{\circ}

.

Figure 9. NSs of Experiment 2 with

n = 20

,

SNR = - 10 dB

and

Δ ξ = 10^{\circ}

.

Figure 10. NSs of Experiment 3 with

SNR = 0 dB

.

Table 1. Comparison of computational complexities and RMSEs.

SNR	Method	$n_{1}$	$n_{2}$	$n_{3}$	$τ$	RMSE
$- 1 dB$	MM1-CD	$525.5600$	$16.0440$	$32.7550$	$27.9710$	$0.0877$
	MM1-SPICE	$317.6500$	$16.9280$	$18.7370$	$0.3805$	$0.0926$
	MM2	$129.5000$	−	−	$0.3266$	$0.1059$
$- 3 dB$	MM1-CD	$539.2700$	$16.0000$	$33.7040$	$28.7700$	$0.2345$
	MM1-SPICE	$310.0300$	$16.6260$	$18.6070$	$0.3714$	$0.2434$
	MM2	$129.3800$	−	−	$0.3260$	$0.2568$
$- 5 dB$	MM1-CD	$561.0900$	$16.4380$	$34.1500$	$29.3910$	$0.4292$
	MM1-SPICE	$303.1400$	$16.3340$	$18.5130$	$0.3590$	$0.4681$
	MM2	$129.1400$	−	−	$0.3195$	$0.4840$

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Direction of Arrival Estimation in Elliptical Models via Sparse Penalized Likelihood Approach

Abstract

1. Introduction

Notations

2. Problem Formulation

3. Sparse DOA Estimation

3.1. Spatially Non-Uniform White Noise

3.2. Spatially Uniform White Noise

4. DOA Estimation Algorithms

4.1. Algorithms for Spatially Non-Uniform White Noise

4.1.1. MM1 Algorithm

4.1.2. MM2 Algorithm

4.2. DOA Estimation for Spatially Uniform White Noise

4.2.1. MM1 Algorithm

4.2.2. MM2 Algorithm

4.3. Discussions on the MM1 and MM2 Algorithms

4.4. Selection of Penalty Parameter

5. Numerical Experiments

5.1. Experimental Settings

5.1.1. Experiment 1

5.1.2. Experiment 2

5.1.3. Experiment 3

5.2. Experimental Results

6. Conclusions

Author Contributions

Funding

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics