An Adaptive Lp Norm Minimization Algorithm for Direction of Arrival Estimation

Abstract: In this paper, we propose a new direction of arrival (DOA) estimation algorithm, in which DOA estimation is achieved by finding the sparsest support set of multiple measurement vectors (MMV) in an over-complete dictionary. The proposed algorithm is based on ℓp norm minimization, which is a non-convex optimization problem; therefore, a quasi-Newton method is used to make the iterative process converge. The algorithm has two advantages: one is a higher probability of, and resolution for, distinguishing closely spaced sources, and the other is adaptive adjustment of the regularization parameter. Moreover, an accelerating strategy is applied in the computation, and a weighted variant of the proposed algorithm is also introduced to improve accuracy. We conducted experiments to validate the effectiveness of the proposed algorithm. Its performance was compared with those of several popular DOA estimation algorithms and with the Cramér–Rao bound (CRB).


Introduction
With extensive applications, including radar, sonar, wireless communication and remote sensing, DOA estimation has traditionally been a popular branch in the field of array signal processing [1,2]. As a key technology in passive radar, DOA estimation is being extensively developed to locate targets in complex electromagnetic environments without being perceived.
Considering problems such as distinguishing coherent sources and separating closely spaced sources, sparse reconstruction theory plays an important role in DOA estimation [3,4]. By dividing the whole spatial domain of interest into a discrete set of potential grid points, the traditional estimation problem can be converted into an ill-posed inverse problem. The specific physical model is introduced in Section 2.
The MMV data model of an M-sensor array takes the following form:

Y = AX + N, (1)

where Y ∈ C^{M×L} and N ∈ C^{M×L} are the matrices of measurements and noise. The dictionary matrix A ∈ C^{M×N} consists of N steering vectors, determined by N different angles. In addition, X ∈ C^{N×L} is a K-row-sparse matrix that represents the spatial spectrum of K sources. Note that the dictionary matrix A is fat (N > M); hence, the equation Y = AX has infinitely many exact solutions. To reconstruct the sparse signal, a simple and intuitive measure of sparsity is the ℓ0 norm. The Lagrangian relaxation problem with the ℓ2,0 mixed norm is as follows:

min_X ||Y − AX||_F^2 + λ||X||_{2,0}. (2)

Note that the mixed norm ||X||_{2,0} encodes a spatial domain that is sparse but a time domain that is closely related. That is to say, the ℓ2 norm gathers the snapshots of X ∈ C^{N×L} into a spatial spectrum X_vec ∈ C^{N×1}, and the ℓ0 norm measures the sparsity of X_vec. The general formula is:

||X||_{2,0} = ||[||x_1||_2, . . . , ||x_N||_2]^T||_0, (3)

where x_n denotes the nth row of X. Considering that solving (2) directly is an NP-hard problem, alternative approaches are often studied. One alternative method is the greedy algorithm: OMP [5] pursues the current optimal atom in every iteration. Another is an approximate ℓ0 norm method, such as SL0 [6].
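For concreteness, the ℓ2,0 and ℓ2,p mixed norms described above can be sketched as follows (`mixed_norm_2p` is a hypothetical helper name, and the small magnitude threshold used to count nonzero rows is an assumption):

```python
import numpy as np

def mixed_norm_2p(X, p, tol=1e-12):
    """l2,p mixed norm of a row-sparse matrix X (N rows, L snapshots):
    take the l2 norm of every row, then the lp (or l0) norm of that vector."""
    row_energy = np.linalg.norm(X, axis=1)              # spatial spectrum X_vec
    if p == 0:
        return int(np.count_nonzero(row_energy > tol))  # number of active rows
    return float(np.sum(row_energy ** p))

# K = 2 active rows out of N = 5, with L = 3 snapshots
X = np.zeros((5, 3), dtype=complex)
X[1] = 1.0 + 1.0j
X[3] = 2.0
```

With p = 0 this counts the support (here, 2 rows); with p = 1 it reduces to the ℓ2,1 norm used by convex relaxations.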
A common convex approximation of the ℓ2,0 pseudo-norm that is known to promote sparse solutions is the ℓ2,1 norm.
Convex optimization algorithms always converge to the global optimal solution. This feature makes convex optimization algorithms well studied. Malioutov et al. proposed a heuristic approach named ℓ1-SVD [7], which reduces the dimension of the measurement matrix Y by SVD and uses adaptive grid refinement. However, for a large number of snapshots L or a large number of sources K, the problem becomes computationally intractable.
Different from other sparse reconstruction fields, in DOA estimation, the atoms of the dictionary A are completely determined and fixed by the discrete spatial grid. Highly correlated atoms in the dictionary make it difficult to separate adjacent vectors. This is far from the "approximate orthogonality" condition introduced in the Restricted Isometry Property (RIP) [8]. However, ℓp norm optimization has a greater possibility of obtaining a sparse solution, and it is commonly used in imagery target detection [9]. Thus, ℓp norm optimization can also be used to distinguish closely spaced sources.
Hence, in the ℓ2,p-constrained problem, we search for a row-sparse solution that minimizes the fitting error. Algorithms such as iterative reweighted least squares (IRLS) [10] and FOCUSS [11] are popular in DOA estimation. They both approximate the ℓp norm with a weighted ℓ2 norm.
The goal of this paper is to find a practical sparse reconstruction method that fills the application gap left by traditional DOA estimation algorithms. The main contribution of the paper is a method for the automatic selection of the regularization parameter.
In contrast to other existing methods, the proposed method works entirely within the iterative process, which means better adaptability to different situations. The key ingredient of the proposed method is the use of the quasi-Newton algorithm. Without losing accuracy, the iterative process is optimized, and the computation is reduced by the matrix inversion lemma, even with dense grids. In addition, we introduce a weighted improvement of the algorithm. Last but not least, the array does not have to be linear, and the sources can be coherent.
The rest of the paper is organized as follows. The processing of the proposed algorithm and innovative details are considered in Section 2. Potential application scenarios are discussed. Section 3 provides experimental results to compare the proposed algorithm with other DOA estimators. Our conclusions are contained in Section 4.

The Proposed Algorithm
The physical meaning of all parameters in the DOA estimation mathematical model is explained here briefly. An array of omnidirectional sensors receives plane waves from narrow-band signal sources in the far field. All signals s_k (k = 1, 2, . . . , K) impinge from different angles θ_k (k = 1, 2, . . . , K). The phase difference between two adjacent sensors is defined as

∆ϕ_k = 2πd sin(θ_k)/λ_s, (4)

where d denotes the inter-sensor spacing and λ_s denotes the wavelength. Let the sensor at the coordinate origin be the reference. The direction vector can be expressed as a(θ_k) = [1, exp(−j∆ϕ_k), . . . , exp(−j(M − 1)∆ϕ_k)]^T, and thus the manifold matrix can be expressed as A_s = [a(θ_1), . . . , a(θ_K)]. Figure 1 shows the whole processing chain of DOA estimation. In order of the subplots: first, the signal space is divided into N grid points; therefore, the dictionary matrix A can be expressed as A = [a(θ_1), . . . , a(θ_N)]. Then, the sensor data matrix Y ∈ C^{M×L} is the input of the proposed algorithm, where M denotes the number of sensors and L denotes the number of samples. Finally, with proper coefficients, linear combinations of atoms in the dictionary can reconstruct the signal. As long as the sparsest coefficient vector can be found, the directions of the sources can be read from the indices of the non-zero support set.
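The dictionary construction and the data model above can be sketched as follows for a uniform linear array (the source angles, half-wavelength spacing, noise level and 1° grid step are illustrative assumptions, not values from the paper):

```python
import numpy as np

def steering_vector(theta_deg, M, spacing=0.5):
    """a(theta) for a ULA with sensor spacing given in wavelengths:
    adjacent-sensor phase difference is 2*pi*(d/lambda)*sin(theta)."""
    dphi = 2.0 * np.pi * spacing * np.sin(np.deg2rad(theta_deg))
    return np.exp(-1j * dphi * np.arange(M))    # [1, e^{-j dphi}, ..., e^{-j(M-1) dphi}]^T

M, L = 8, 100
grid = np.arange(-90.0, 90.0, 1.0)                            # N candidate angles
A = np.stack([steering_vector(t, M) for t in grid], axis=1)   # M x N dictionary

# K = 2 on-grid sources -> a K-row-sparse X, then measurements Y = A X + N
rng = np.random.default_rng(0)
X = np.zeros((grid.size, L), dtype=complex)
for k in np.flatnonzero(np.isin(grid, [-10.0, 20.0])):
    X[k] = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L)))
Y = A @ X + noise
```

The indices of the two nonzero rows of `X` are exactly the support set from which the DOAs would be read.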

The ℓp Norm Minimization Algorithm
Due to the exponent p (0 < p < 1), the cost function

f(X) = ||Y − AX||_F^2 + λ||X||_{2,p}^p (5)

is not differentiable at 0. The authors in [12] proved that a differentiable approximation suits our purpose well:

||X||_{2,p}^p ≈ Σ_{n=1}^{N} (||x_n||_2^2 + ε)^{p/2}, (7)

where ε ≥ 0 is the smoothing parameter that controls the trade-off between smoothness and approximation quality. Usually, the value of ε is set to 10^{−6} in the DOA estimation setting.
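A small numeric illustration of the trade-off controlled by ε, assuming the standard per-row surrogate (||x_n||_2^2 + ε)^{p/2} (the helper name is hypothetical):

```python
def smoothed_row_term(energy_sq, p, eps=1e-6):
    """Differentiable surrogate (||x_n||_2^2 + eps)^(p/2) for one row's lp term."""
    return (energy_sq + eps) ** (p / 2)

p = 0.8
exact = 2.0 ** p                      # true lp term for a row with ||x_n||_2 = 2
approx = smoothed_row_term(4.0, p)    # surrogate evaluated at energy ||x_n||_2^2 = 4
# at the origin the surrogate equals eps^(p/2) > 0 with a finite slope,
# whereas the true lp term is non-differentiable there
at_zero = smoothed_row_term(0.0, p)
```

For ε = 10⁻⁶ the surrogate is essentially indistinguishable from the true term away from zero, while remaining smooth at zero.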
If ε is too large, the approximation is poor; if ε is too small, the inverse of (11) becomes nearly singular. Substituting (7) for ||X||_{2,p}^p in (5) yields a smooth objective. The Jacobian matrix of the objective function is

J = A^H(AX − Y) + λD_1 X,

where the superscript [•]^H represents the conjugate transpose operation and

D_1 = (p/2) diag{(||x_n||_2^2 + ε)^{p/2−1}}.

The Hessian matrix of the objective function is

H = A^H A + λ(D_1 + D_2), (11)

where the diagonal matrix D_2 collects the second-order terms of the smoothed penalty; its entries are proportional to (p/2 − 1) < 0 and can therefore be negative. Consequently, the Newton method is not guaranteed to converge, due to the possible non-positive definiteness of H. A positive definite approximation of H can be obtained by simply removing the negative part in (11).
In DOA estimation, the dimension of H is determined by the number of grid points; usually N ≥ M, which causes heavy computation. Considering that H consists of A^H A and the diagonal matrix λ(D_1 + D_2), the matrix inversion lemma is introduced to optimize the inversion. Writing D = λ(D_1 + D_2),

(A^H A + D)^{−1} = D^{−1} − D^{−1}A^H(I_M + AD^{−1}A^H)^{−1}AD^{−1}. (14)

Applying (14) reduces the dimension of the matrix to be inverted from N × N to M × M; thus, the computational complexity decreases accordingly. In order to facilitate the subsequent measurement of residuals and to apply (14) to reduce the computational burden, the Newton direction N_d in the proposed algorithm can be expressed as

N_d = −H^{−1}J,

where the repeated parts of the calculation are cached as intermediate terms T_1 to T_4; in particular, T_1 = Y − AX will be used to obtain the noise parameter in Section B.
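The dimensionality reduction given by the matrix inversion lemma can be checked numerically. The sketch below assumes a Hessian of the form A^H A plus a positive diagonal D, as the text describes; the sizes and random values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 60
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
d = rng.uniform(0.5, 2.0, N)                  # diagonal of D = lambda*(D1 + D2), assumed > 0
H = A.conj().T @ A + np.diag(d)               # the N x N matrix we want to invert

# (A^H A + D)^-1 = D^-1 - D^-1 A^H (I_M + A D^-1 A^H)^-1 A D^-1
Dinv = 1.0 / d
core = np.eye(M) + (A * Dinv) @ A.conj().T    # only an M x M system remains
H_inv = np.diag(Dinv) - (Dinv[:, None] * A.conj().T) @ np.linalg.solve(core, A * Dinv)
```

Only `core`, an M × M matrix, is factorized, instead of the full N × N Hessian; for dense grids (N ≫ M) this is where the savings come from.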

Adaptive Method for p
The ability to estimate the DOAs of two closely spaced sources is always an important indicator of a DOA estimation algorithm. Figure 2 shows ℓp norm balls in R² space. The curves show that the lower the value of p, the easier it is to find a sparser solution. When faced with closely spaced sources, ℓp norm optimization has a greater possibility of obtaining a sparse solution and, thereby, of distinguishing the two sources. As stated in the RIP condition, the uniqueness of the solution can be characterized in terms of the constant δ_K of matrix A. Recall that δ_K is the smallest constant, 0 < δ_K ≤ 1, for which the matrix A satisfies the RIP of order K, that is,

(1 − δ_K)||x||_2^2 ≤ ||Ax||_2^2 ≤ (1 + δ_K)||x||_2^2

for every K-sparse vector x. It is clear that every K-sparse vector x is recovered via ℓ0 norm minimization if and only if the strict inequality δ_{2K} < 1 holds. Define a new parameter to represent the RIP condition of matrix A:

γ_{2K} = (1 + δ_{2K})/(1 − δ_{2K}).

The quantity γ_{2K} can be made arbitrarily close to 1 when the columns of A are nearly orthogonal. As stated in [12], given 0 < p ≤ 1, if

γ_{2t} − 1 < 4(√2 − 1)(t/K)^{1/p − 1/2} (20)

for some integer t ≥ K, then every K-sparse vector is exactly recovered by solving the ℓp norm minimization.
When t = K + 1, condition (20) becomes

γ_{2K+2} − 1 < 4(√2 − 1)((K + 1)/K)^{1/p − 1/2}. (21)

From (21), when p → 0, the right side of the inequality tends to +∞, which means the RIP constant is allowed to approach δ_{2K+2} → 1. That is to say, the smaller the value of p, the weaker the 'approximate orthogonality' requirement that sparse reconstruction imposes on matrix A. From the perspective of DOA estimation, as long as p is small enough, the algorithm is better able to overcome the high correlation between adjacent columns of the over-complete basis A, and its ability to separate closely spaced sources is stronger.
When p → 0, the ℓp norm penalty approaches the ℓ0 norm. However, too small a value of p may lead the algorithm to converge to a local optimum. Common ℓp norm optimization algorithms such as FOCUSS usually choose p = 0.8 based on experience [13]. In order to make p sensitive to the convergence of the spectrum and to find a sparser solution, this section defines a decreasing schedule for p, driven by the variation of the spectrum, until p < 0.1. The variation in the kth iteration is measured as

v^{(k)} = ||X^{(k)} − X^{(k−1)}||_F / ||X^{(k−1)}||_F;

when v^{(k)} exceeds a threshold, the spectrum variation of this iteration is too large, and we directly reduce the value of p to half of its previous value. The stopping criterion of the proposed algorithm is v^{(k)} < δ, where δ > 0 is a small constant.
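A sketch of the halving schedule, assuming the relative Frobenius change of the spectrum as the variation measure; the threshold 0.5 and the floor 0.1 on p are assumed values (the text only states that p decreases until p < 0.1):

```python
import numpy as np

def update_p(p, X_new, X_old, threshold=0.5, p_floor=0.1):
    """Halve p when the spectrum variation v of this iteration is too large.

    Returns the (possibly reduced) p and the variation v, which also serves
    as the stopping indicator (stop when v < delta)."""
    v = np.linalg.norm(X_new - X_old) / max(np.linalg.norm(X_old), 1e-12)
    if v > threshold and p / 2 >= p_floor:
        p = p / 2
    return p, v
```

Because v is also the stopping criterion, the same quantity drives both the p schedule and convergence detection.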

Adaptive Regularization Parameter Adjustment
According to (5), the regularization parameter λ has a significant influence on the optimization process. However, balancing the relationship between residuals and sparsity is a difficult problem. At present, methods for selecting regularization parameters include Generalized Cross Validation (GCV) [14], the L-curve [15], etc. However, these methods usually require massive data for prior computation and are thus not suitable for real-time DOA estimation. Another algorithm, L1-SRACV [16], which exploits the distribution of the covariance matrix, can also avoid regularization parameters.
This section provides a practical procedure for adjusting λ during the iteration. The method balances residuals and sparsity adaptively, without massive prior simulation, and works entirely within the iterative process.
The covariance matrix of the received data is

R = (1/L) Y Y^H.

The ascending sorted eigenvalues γ = [γ_1, γ_2, . . . , γ_M] can be obtained by the eigenvalue decomposition of R. Therefore, with K sources, the expected value of the noise energy P_N is

P_N = (ML/(M − K)) Σ_{i=1}^{M−K} γ_i,

i.e., ML times the average of the M − K smallest eigenvalues. If the source number K is unknown, there are two alternative methods. Under a low-SNR scenario, P_N can be expressed through the minimal eigenvalue as P_N = MLγ_min. Under a high-SNR scenario, the source number K can be roughly estimated from the slope mutation of γ.
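The eigenvalue-based noise-energy estimate can be sketched as follows (`noise_power` is a hypothetical helper; the M·L scaling is an assumption chosen so that the estimate is comparable to a Frobenius-norm residual, and the test scenario is illustrative):

```python
import numpy as np

def noise_power(Y, K=None):
    """Expected noise energy P_N from the eigenvalues of R = Y Y^H / L (a sketch).

    With the source number K known, average the M - K smallest eigenvalues;
    otherwise fall back to the minimal eigenvalue (the low-SNR heuristic)."""
    M, L = Y.shape
    R = Y @ Y.conj().T / L                      # sample covariance matrix
    gamma = np.sort(np.linalg.eigvalsh(R))      # ascending eigenvalues
    sigma2 = gamma[: M - K].mean() if K is not None else gamma[0]
    return M * L * sigma2                       # comparable to ||Y - AX||_F^2

# unit-variance complex noise plus one strong source at 20 degrees (ULA, M = 8)
rng = np.random.default_rng(5)
M, L = 8, 2000
noise = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)
a = np.exp(-1j * np.pi * np.sin(np.deg2rad(20.0)) * np.arange(M))
s = rng.standard_normal(L) + 1j * rng.standard_normal(L)
Y = np.outer(a, s) + noise
P_N = noise_power(Y, K=1)
```

For this scenario the true noise energy is M·L·σ² = M·L, and the eigenvalue estimate lands close to it.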
In a convergent and stationary iterative process with accurate estimation results, the power of the residual, P_R = ||T_1||_F^2 = ||Y − AX||_F^2, must converge to the noise power. Therefore, P_R can serve as a measurement indicator for the proposed algorithm, which should be controlled within a certain range. Considering the estimation errors of P_N and of the residual, the range of P_R was determined to be [0.75P_N, 1.25P_N] after many simulation experiments.
The details are as follows: in the kth iteration, if P_R > 1.25P_N, this indicates an overemphasis on the influence of sparsity on the optimal value (the model fits poorly); thus, we reduce λ. If P_R < 0.75P_N, this indicates an overemphasis on the influence of noise (the model is over-fitting); thus, we enlarge λ. If 0.75P_N < P_R < 1.25P_N, the descent direction works well, and nothing has to change. When the support set of the spectrum is determined, P_R no longer fluctuates appreciably, which indicates the convergence of the computation.
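The three-way rule can be written directly (the multiplicative step size `factor` is an assumed value; the text does not state by how much λ is changed per adjustment):

```python
def update_lambda(lam, P_R, P_N, factor=1.5):
    """Keep the residual power P_R inside [0.75*P_N, 1.25*P_N] by scaling lambda."""
    if P_R > 1.25 * P_N:        # sparsity overemphasized, model fits poorly
        return lam / factor
    if P_R < 0.75 * P_N:        # noise overemphasized, model over-fits
        return lam * factor
    return lam                  # residual in range: keep lambda unchanged
```

Because the rule is multiplicative, λ can traverse several orders of magnitude in a few iterations without any prior tuning.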
Proper initialization can considerably reduce the running time of the quasi-Newton method and greatly lowers the possibility of becoming trapped in a local minimum. The product of the dictionary and the received data can serve as the initialization:

X^{(0)} = A^H Y.

The proposed Algorithm 1 is described as follows.

Algorithm 1
The improved ℓp norm DOA estimator with adaptive regularization parameter adjustment.
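A minimal sketch of the core quasi-Newton step of the algorithm, assuming the smoothed cost ||Y − AX||_F² + λΣ(||x_n||² + ε)^{p/2} and a positive definite Hessian approximation of the form A^H A plus a diagonal term. The dense solve stands in for the Woodbury-accelerated inverse, and the adaptive λ and p schedules are omitted; the exact per-row weights are an assumption, not the paper's expressions:

```python
import numpy as np

def lp_newton_step(X, Y, A, lam, p, eps=1e-6):
    """One quasi-Newton step on the smoothed lp cost (sketch).

    The diagonal d comes from the first-order term of the smoothed penalty;
    the negative second-order part is dropped to keep H positive definite."""
    d = lam * (p / 2) * (np.linalg.norm(X, axis=1) ** 2 + eps) ** (p / 2 - 1)
    grad = A.conj().T @ (A @ X - Y) + d[:, None] * X     # Wirtinger gradient
    H = A.conj().T @ A + np.diag(d)                      # positive definite approx.
    return X - np.linalg.solve(H, grad)

# sanity check: with a well-posed A and negligible lam, a few steps
# recover the exact solution of Y = A X from the start X^(0) = A^H Y
rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
X_true = np.zeros((4, 3), dtype=complex)
X_true[1] = 1.0
Y = A @ X_true
X = A.conj().T @ Y                                       # initialization
for _ in range(5):
    X = lp_newton_step(X, Y, A, lam=1e-10, p=0.8)
```

In the real over-complete setting (N ≫ M), the `np.linalg.solve` call would be replaced by the M × M Woodbury computation, and λ and p would be updated between steps.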

Weighted Method
The weighted algorithm adds a weight in front of the spatial spectrum, which is a simple strategy to accelerate convergence and obtain higher resolution.
Set a small weight where a signal may be present and a large weight where there is no signal, so that small values of the spatial spectrum become smaller, large values become larger, and the overall solution is sparser. Normally, the spatial spectrum of the Capon algorithm is used to form the weights.
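A sketch of forming the weights from the Capon spectrum: since the Capon spectrum is large where a signal is likely, its reciprocal gives exactly the small-weight/large-weight pattern described above. The diagonal loading `load` and the scenario values are assumptions:

```python
import numpy as np

def steering(theta_deg, M):
    # half-wavelength ULA direction vector
    return np.exp(-1j * np.pi * np.sin(np.deg2rad(theta_deg)) * np.arange(M))

def capon_weights(Y, A, load=1e-3):
    """weight_n = a_n^H R^-1 a_n = 1 / P_capon(theta_n):
    small near likely signal directions, large elsewhere."""
    M, L = Y.shape
    R = Y @ Y.conj().T / L + load * np.eye(M)      # loaded sample covariance
    Rinv = np.linalg.inv(R)
    return np.real(np.einsum('mn,mk,kn->n', A.conj(), Rinv, A))

M, L = 8, 200
grid = np.arange(-90.0, 90.0, 1.0)
A = np.stack([steering(t, M) for t in grid], axis=1)
rng = np.random.default_rng(3)
s = rng.standard_normal(L) + 1j * rng.standard_normal(L)
noise = 0.1 * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L)))
Y = np.outer(steering(20.0, M), s) + noise
w = capon_weights(Y, A)            # the weight dips where the source sits
```

The weight vector `w` would then multiply the rows of the spatial spectrum inside the penalty term.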
The weighted method can not only accelerate the iteration but also improve the accuracy of the algorithm. Although the weighted method increases the computational complexity, it can improve the estimation success rate even in adverse conditions.
In Figure 3, the simulation is based on an eight-sensor uniform linear array with half-wavelength sensor spacing. The results show that the weighted method can distinguish two sources separated by only 3° with 200 snapshots and SNR = 5 dB.

Experimental Results
In this section, the empirical performance of the proposed algorithm is investigated and compared with those of MUSIC, FOCUSS and L1-SVD (ℓ1 norm optimization via the CVX toolbox [17]). The simulations are based on a 12-sensor uniform linear array with half-wavelength sensor spacing. All methods use a grid of 1801 points in the range [−90°, 90°], i.e., a grid interval of 0.1°.
Considering the stability and good performance of all algorithms, the regularization parameter λ of FOCUSS and L1-SVD was selected by empirical value, and p = 0.8 was used for the FOCUSS algorithm. For the proposed method, we used adaptive λ and p. The signal waveform was modeled as a complex circular Gaussian random variable S(t) ∼ N_C(0_M, I_M). The variance of the Gaussian white noise N(t) is determined by the SNR. According to X(t) = A_s S(t) + N(t), the sensor data are modeled as a random variable X(t) ∼ N_C(0_M, E[X(t)X^H(t)]).
First, the highlight of the proposed algorithm is the adaptive adjustment of parameters in the iterative process. The proposed algorithm automatically adjusts λ according to the expected residual power P_R and adjusts p according to the variation of the spatial spectrum, as shown in Figure 4 with SNR = 0 dB. The top subplot shows the variation of λ, the middle subplot shows the variation of p, and the bottom subplot indicates the convergence of the residual.
In Figure 4, we examine how the proposed method adjusts λ and p to reach convergence. At the beginning, the result is close to the least squares solution, which usually has a small residual. In order to obtain a sparser solution, λ is increased to emphasize the importance of sparsity in the iterations. Throughout the process, the adaptive method keeps the residual within the expected range as much as possible until the iteration converges.
The adaptive methods can not only reduce the influence of the artificial selection of parameters on the results but also speed up the convergence of the algorithm to a certain extent. The convergence speed of the proposed algorithm was improved under the combined effect of the acceleration strategy and the adaptive methods. Figure 5 shows that the improved ℓp method converges within 18 iterations, while the original ℓp method and the FOCUSS algorithm require nearly 24 iterations to converge.

Assume two equal-power uncorrelated signals arrive from 0° and 0° + ∆θ, respectively, where ∆θ is varied from 0.1° to 4°. The SNR is set to 0 dB, and the number of snapshots is 200. The statistical result obtained via 200 multi-snapshot trials is shown in Figure 6 as the correct rate versus angular separation. As is clear from the figure, the proposed algorithm has a better ability to distinguish two closely spaced signals.
The bias curves in Figure 7 were obtained via 200 independent trials. Each multi-snapshot trial had different DOAs, θ_1 and θ_2, drawn within the intervals [−11°, −9°] and [19°, 21°], and the snapshot number was 200. The root mean square error (RMSE) curves for different SNRs show that the resolution of the proposed algorithm meets the standards of existing sparse reconstruction algorithms. Since the grid size is set to 0.1°, when the SNR is high, the resolution of these algorithms is limited by the grid, and the curves gradually become flat.
In conventional DOA estimation, this degree of error can generally be regarded as a correct estimation. The RMSE of an algorithm does not fully reflect its DOA estimation ability. Sometimes, it is more meaningful to distinguish two sources in a certain spatial range with low resolution than to obtain only one estimation result with high resolution, since the latter loses a target.

This section closes by examining the impact of the snapshot size on the error-suppression criterion of the adaptive method. Consider two uncorrelated signals arriving from [−11°, −9°] and [19°, 21°], where the SNR is fixed at 0 dB, the number of data snapshots is varied from 2 to 1000, and the remaining conditions are identical to those above. Figure 8 shows that, compared to the other methods, the proposed algorithm is stable for both small and large snapshot numbers.
As the focus of this paper is improving the ability of the algorithm to distinguish closely spaced sources, the proposed algorithm essentially matches the conventional sparse reconstruction algorithms in terms of the RMSE.

Conclusions
In this paper, an improved ℓp norm minimization DOA estimation algorithm was proposed. The proposed algorithm has two advantages: one is a greater ability to separate two closely spaced sources, and the other is the adaptive adjustment of the regularization parameter. The latter reduces the impact of the manual selection of λ on the results and accelerates convergence. The main motivation of this work comes from the desire to increase the practicability of sparse reconstruction algorithms.
By applying ℓp norm minimization with the quasi-Newton method and acceleration strategies, the proposed algorithm is faster than L1-SVD, the convex optimization algorithm. In addition, the adaptive method for the regularization parameter runs stably in most non-extreme situations. Compared to other, seriously time-consuming methods, or to optimizing the inequality-constrained minimization problem directly, the proposed method is more practical.