Off-Grid DoA Estimation via Two-Stage Cascaded Neural Network

This paper introduces an off-grid DoA estimation algorithm based on a two-stage cascaded network, which resolves the mismatch between true directions-of-arrival (DoAs) and a discrete angular grid. In the first-stage network, initial DoAs are estimated with a convolutional neural network (CNN), where the initial DoAs are mapped onto the discrete angular grid. To deal with the mismatch between the initially estimated DoAs and the true DoAs, the second-stage network estimates a tuning vector that represents the difference between the true DoAs and the nearest discrete angles. Using the tuning vector, the final DoAs are obtained by shifting the initially estimated DoAs by this difference. The limitation on estimation accuracy induced by the discrete angular grid is thereby removed, so the estimation accuracy can be further enhanced. Simulation results show that adding the second-stage network after the first-stage network improves the estimation accuracy by resolving the mismatch induced by the discretized grid. From an implementation standpoint, results also show that using a CNN with PReLU as the activation function is the best option for accurate estimation.


Introduction
Direction-of-arrival (DoA) estimation is one of the longest-studied research topics in array signal processing. DoA estimation algorithms have been adopted in various applications, such as localization and radar [1]. DoA estimation can also play an essential role in cooperative localization for vehicular networks, where cooperative localization between vehicles requires the relative distances and DoAs of neighboring vehicles [2,3]. Using this information, vehicle positions derived from a global positioning system (GPS) can be refined through cooperative localization [2,3]. As the commercialization of automotive multiple-input multiple-output (MIMO) radar progresses [4], vehicles can obtain the DoAs of other vehicles using DoA estimation algorithms.
Traditional DoA estimation algorithms such as MUSIC [5] have high estimation accuracy and resolution. However, they require a large number of snapshots and cannot properly estimate the DoAs of coherent signal sources. To overcome these disadvantages, compressive sensing (CS)-based DoA estimation algorithms have been proposed in [6-8], which exploit a discrete grid consisting of angular bases. Regardless of the type of CS, such as basis pursuit (BP) [9] and sparse Bayesian learning (SBL) [10], CS-based DoA estimation algorithms are commonly robust against low signal-to-noise ratio (SNR) and few snapshots [11]. However, in practice there is a high possibility that the true DoAs do not correspond perfectly with the angular bases. The mismatch between the DoAs and the angular bases is generally referred to as a grid mismatch and induces an inevitable estimation error [12].
Aside from the aforementioned studies, applying machine learning to DoA estimation was first proposed in [13,14] and continues to be studied as deep learning theory and methods develop rapidly [15,16]. In [13,14], support vector regression (SVR), a machine learning technique mainly used for pattern recognition problems, approximates the function that maps received signals to DoAs. Results in [13,14] show that DoAs can be successfully estimated with SVR. However, the estimation accuracy of the algorithms in [13,14] is inferior to that of traditional DoA estimation algorithms such as MUSIC [5]: since computational resources that could support large datasets were not available at the time, only small amounts of data were used for training, resulting in low estimation accuracy.
After the introduction of neural networks (NNs) [16], DoA estimation algorithms based on various types of NN were proposed in [17-21], where these algorithms are trained and tested with massive amounts of data. Although the deep neural network (DNN)-based algorithms in [17,18] have higher estimation accuracy and resolution than traditional DoA estimation algorithms, they can only estimate a fixed number of DoAs, as the output dimension of the DNN is fixed. In practice, the number of signal sources is unknown and variable, so the number of DoAs can vary; thus, the algorithms in [17,18] are not suitable for practical situations. Unlike [17,18], the algorithms in [19,20] can cope with a varying number of DoAs by setting the output of the network to an angular grid, an idea motivated by CS-based DoA estimation. The angular grid represents the discretized angular domain and indicates where the DoAs fall within the grid. In this case, DoAs can be successfully estimated regardless of their number if every DoA matches one of the discrete angles within the grid. However, the algorithms in [19,20] suffer from the same problem as CS-based DoA estimation: a true DoA may not perfectly match the discrete angles in practice, so an inevitable estimation error occurs when estimating DoAs with these algorithms.
To prevent the inevitable estimation error induced by the mismatch between true DoAs and the discrete angles within the grid, we propose a novel off-grid DoA estimation algorithm based on a two-stage cascaded NN. The design of the two-stage cascaded NN is motivated by the two-step approach in [22]. In the first stage, the DoAs are initially estimated with a convolutional neural network (CNN). As the initial DoAs estimated by the first-stage network are mapped onto the discrete angular grid, they contain the estimation error induced by the mismatch. In the second stage, the mismatch between the true DoAs and the angles within the grid is resolved by a DNN that estimates a tuning vector representing the difference between the true DoAs and the nearest discrete angles. The final estimated DoAs are obtained by shifting the initial DoAs by this difference. With this two-stage cascaded NN, the estimation accuracy is no longer limited by the discrete angular grid and can thus be further enhanced. Simulation results show that the two-stage cascaded NN achieves higher estimation accuracy than a single network by resolving the mismatch induced by the discrete angular grid.
Notations: We use lower-case and upper-case bold characters to represent vectors and matrices, respectively, throughout this paper. (·)^T, (·)^H, and (·)^*, respectively, denote the transpose, conjugate transpose, and complex conjugation. ⊗ denotes the Kronecker product. Trace(·) denotes the trace of a matrix. real(·) and imag(·), respectively, denote the real and imaginary parts. a(i) denotes the i-th element of a vector a, and A(i, j) denotes the (i, j)-th element of a matrix A. ‖a‖ denotes the L2 norm of a. 0_N denotes an N × 1 zero vector, and I_N denotes an N × N identity matrix. diag(·) denotes a vector whose entries are the diagonal elements of a given matrix. C^(M×N), R^(M×N), and R_+^(M×N), respectively, denote the sets of M × N matrices whose elements are complex, real, and real positive numbers. If N = 1, they are M × 1 vectors.

Signal Model
We assume that P uncorrelated narrowband signals impinge on a uniform linear array (ULA) with M elements, where the P narrowband signals share the same carrier frequency. The spacing between adjacent antennas is set to the half-wavelength λ/2, where λ denotes the wavelength of the signals. The DoAs of the P signals are denoted as Θ = [θ_1, . . . , θ_P]^T. When a narrowband signal impinges on the antenna array, an identical signal arrives at each antenna with a different time delay, where the time delay depends on the DoA of the signal. For a narrowband signal, the time delay can be translated into a phase shift, and a vector that models the phase shift of each antenna according to the DoA is generally known as the array manifold vector or steering vector [23]. The array manifold vector for DoA θ, a(θ), can be given as in [23] by

a(θ) = [1, e^(−jπ cos θ), . . . , e^(−jπ(M−1) cos θ)]^T,

where the DoA θ ∈ [0°, 180°] is measured from the array axis.
An array manifold matrix for the P signal sources is

A(Θ) = [a(θ_1), . . . , a(θ_P)] ∈ C^(M×P).

The received signal X ∈ C^(M×D) can be written as

X = A(Θ)S + N,

where D is the number of snapshots and S = [s_1, . . . , s_P]^T, with s_p ∈ C^(D×1) denoting the p-th signal vector. N is an additive white Gaussian noise matrix whose columns follow CN(0_M, σ²I_M), where σ² denotes the power of the noise. R_X, the covariance matrix of X, can be defined as

R_X = E[XX^H] = A(Θ)R_S A(Θ)^H + σ²I_M,

where R_S = E[SS^H]. As every signal source is uncorrelated with the others, R_S is a diagonal matrix whose p-th diagonal element equals the power of the p-th signal source. We define z ∈ R_+^(P×1) such that z = diag(R_S). The (i, j)-th element of R_X can be given by

R_X(i, j) = Σ_(p=1)^P z(p) a(θ_p)(i) a(θ_p)(j)^* + σ²δ_(ij),

where δ_(ij) is the Kronecker delta, and z(p), the p-th element of z, denotes the power of the p-th signal source and satisfies z(p) = E[s_p^H s_p]. The vectorized covariance matrix y can be given by

y = vec(R_X) = Ā(Θ)z + σ²i,

where vec(·) denotes the vectorization of a matrix, Ā(Θ) = [a(θ_1)^* ⊗ a(θ_1), . . . , a(θ_P)^* ⊗ a(θ_P)] ∈ C^(M²×P), and i = vec(I_M). Following the CS-based and machine learning-based DoA estimation frameworks [6-8,19,20], y can be represented as the product of a discretized angular grid and a sparse vector, where the discretized angular grid covers potential DoAs ranging over [0°, 180°]. The discretized angular grid Φ can be given by

Φ = [a(ϑ_1)^* ⊗ a(ϑ_1), . . . , a(ϑ_G)^* ⊗ a(ϑ_G)] ∈ C^(M²×G),

where G denotes the size of the angular grid, and ϑ_g denotes the g-th discrete DoA such that ϑ_g = (180g/G)°. Assuming each of θ_1, . . . , θ_P coincides with one of ϑ_1, . . . , ϑ_G, y can be rewritten as

y = Φη + σ²i,

where η ∈ R^(G×1) denotes the sparse vector in which the majority of elements are 0; if θ_p = ϑ_g, then η(g) = z(p).
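As an illustration, the signal model above can be simulated in a few lines of NumPy. This is a minimal sketch under the paper's settings (M = 8, P = 2, half-wavelength spacing); all function and variable names are our own, and the example DoAs and noise power are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def steering_vector(theta_deg, M):
    """ULA manifold vector a(theta) for half-wavelength spacing,
    with the DoA measured from the array axis."""
    m = np.arange(M)
    return np.exp(-1j * np.pi * m * np.cos(np.deg2rad(theta_deg)))

M, P, D = 8, 2, 256                  # antennas, sources, snapshots
thetas = [40.0, 75.0]                # example DoAs in degrees
sigma2 = 0.1                         # noise power (10 dB SNR for unit-power sources)

A = np.stack([steering_vector(t, M) for t in thetas], axis=1)   # M x P
S = (rng.standard_normal((P, D)) + 1j * rng.standard_normal((P, D))) / np.sqrt(2)
N = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, D))
                           + 1j * rng.standard_normal((M, D)))
X = A @ S + N                        # received signal, M x D

R_X = X @ X.conj().T / D             # sample covariance matrix
y = R_X.flatten(order="F")           # vectorized covariance, length M^2

# Each dictionary column a(theta)* ⊗ a(theta) equals vec(a(theta) a(theta)^H):
a = steering_vector(40.0, M)
assert np.allclose(np.kron(a.conj(), a),
                   np.outer(a, a.conj()).flatten(order="F"))
```

As the number of snapshots grows and noise averages out, this sample-covariance-based y approaches the model Φη + σ²i above.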

Off-Grid DoA Estimation via Cascaded Convolutional Neural Network
In practice, θ_1, . . . , θ_P may not perfectly correspond to the potential DoAs in Φ. For example, when θ_p = 75.3° and G = 180, none of the potential DoAs in Φ matches θ_p perfectly. Thus, we propose a DoA estimation algorithm based on a two-stage cascaded NN, which resolves the mismatch between the true DoAs and the discrete angular grid. In the first stage, the discrete angles nearest to the true DoAs are estimated by a CNN. In the second stage, the difference between the true DoAs and the nearest discrete angles is estimated by a DNN. The overall structure of the two-stage cascaded NN is given in Figure 1. Throughout the paper, the term first-stage network denotes the network that initially estimates the discrete angles, and the term second-stage network denotes the network that estimates the difference between the true DoAs and the nearest discrete angles.

First Stage: DoA Estimation via Convolutional Neural Network
The CNN used in the first stage follows the network structure in [20], where 1D convolutional layers extract features from the input and reconstruct η. As the CNN in [20] does not use pooling, the number of rows of the input should equal the size of the output, which is G. To form an input whose number of rows equals G, a resized vector ỹ ∈ C^(G×1) is formulated from y as follows.
As all elements of the input have to be real numbers, the input ζ ∈ R^(G×2) is given by

ζ = [real(ỹ), imag(ỹ)].

The output of the k-th 1D convolutional layer, c^(k), can be represented as in [20]:

c^(k) = Z( f_Act^k( W^(k) * c^(k−1) + b^(k) ) ),   k = 1, . . . , K,

where * denotes convolution, c^(0) = ζ, and K denotes the number of convolutional layers. W^(k) and b^(k) denote the convolution kernel and the bias of the k-th convolutional layer. f_Act^k(·) denotes the activation function that the CNN is using. The padding operator Z(·) adds zeros on both sides of the output of the activation function so that the size of c^(k) remains constant for k = 1, . . . , K.
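To make the layer recursion concrete, here is a minimal single-channel NumPy sketch of one pass through the convolutional stack. The paper's kernels (e.g., 25 × 12) also carry a channel dimension, which is dropped here for brevity; the random weights and all names are illustrative only.

```python
import numpy as np

def prelu(x, alpha=0.25):
    """Parametric ReLU: identity for positive inputs, alpha-scaled otherwise."""
    return np.where(x > 0, x, alpha * x)

def conv1d_layer(c_prev, w, b, act):
    """One layer: 'valid' convolution, activation, then the zero-padding
    operator Z(.) on both sides so the output length matches the input."""
    out = act(np.convolve(c_prev, w, mode="valid") + b)
    pad = len(c_prev) - len(out)
    return np.pad(out, (pad // 2, pad - pad // 2))

G = 180
rng = np.random.default_rng(1)
c = rng.standard_normal(G)                 # stand-in for one column of zeta
for ksize in (25, 15, 5, 3):               # kernel lengths from the paper
    w = 0.1 * rng.standard_normal(ksize)
    c = conv1d_layer(c, w, b=0.0, act=prelu)
print(c.shape)                             # prints (180,): size stays G per layer
```

Because Z(·) restores the length after every "valid" convolution, the output of the final layer can be compared directly against the G-dimensional label η during training.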
The loss function of the CNN is the mean squared error (MSE) ‖η − η̂‖², where η̂ denotes the estimate of η. The convolution kernels and biases that minimize the loss function can be given as in [20] by

{Ŵ^(k), b̂^(k)}_(k=1)^K = argmin_(W^(k), b^(k)) ‖η − η̂‖²,

where Ŵ^(k) and b̂^(k) denote the convolution kernel and the bias of the k-th convolutional layer that minimize the loss function.
The input data and the output data of the first-stage network can be summarized as follows.

Second Stage: Resolving Mismatch via Deep Neural Network
The second-stage network, which follows the first-stage network, resolves the mismatch by estimating the tuning vector ψ ∈ R^(G×1), which represents the difference between the true DoAs and the nearest discrete angles. After obtaining η̂ from the first-stage network, the second-stage network yields ψ̂ from ζ and η̂, where ψ̂ denotes the estimate of ψ. As the input of the DNN must be a real vector, the input χ is formulated by stacking the vectorized ζ and η̂ as

χ = [vec(ζ)^T, η̂^T]^T.

Letting L, d^(l), and I^(l), respectively, denote the number of dense layers, the output of the l-th layer, and the size of d^(l), the j-th element of d^(l) can be given by

d^(l)(j) = tanh( Σ_(i=1)^(I^(l−1)) U^(l)(j, i) d^(l−1)(i) + v^(l)(j) ),

where d^(0) = χ, and U^(l) and v^(l) denote the weights and the bias of the l-th layer, respectively. In the second stage, the hyperbolic tangent tanh is employed as the activation function because the output of the second-stage network must be able to represent negative values. For example, if G = 180, so that the angular grid is discretized in units of 1°, the elements of ψ̂ should be able to range between −0.5° and +0.5°. Note that ReLU and sigmoid cannot be employed as the activation function of the second-stage network, as they only yield values equal to or greater than 0. The loss function of the DNN is the MSE ‖ψ − ψ̂‖². The weights and biases that minimize the loss function can be denoted as

{Û^(l), v̂^(l)}_(l=1)^L = argmin_(U^(l), v^(l)) ‖ψ − ψ̂‖²,

where Û^(l) and v̂^(l), respectively, denote the weights and the bias of the l-th dense layer that minimize the loss function.
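Putting the two stages together, the final off-grid estimates are the grid peaks of η̂ shifted by the matching entries of ψ̂. The sketch below assumes a simple top-P peak selection, which the text does not specify; the function name and toy values are illustrative only.

```python
import numpy as np

def final_doas(eta_hat, psi_hat, G=180, P=2):
    """Pick the P strongest grid angles from the first-stage output eta_hat,
    then shift each by the matching entry of the tuning vector psi_hat."""
    idx = np.argsort(eta_hat)[-P:]            # indices of the P largest peaks
    grid = 180.0 * (idx + 1) / G              # discrete angles, (180g/G) degrees
    return np.sort(grid + psi_hat[idx])       # off-grid DoA estimates in degrees

# toy outputs: first-stage peaks at 40 and 75 degrees, tunings of -0.2 and +0.3
eta_hat = np.zeros(180); eta_hat[39] = 0.9; eta_hat[74] = 1.0
psi_hat = np.zeros(180); psi_hat[39] = -0.2; psi_hat[74] = 0.3
print(final_doas(eta_hat, psi_hat))           # prints [39.8 75.3]
```

This is where the off-grid gain appears: a true DoA of 75.3° is unreachable on a 1° grid, but the 0.3° tuning entry recovers it.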
The input data and the output data of the second stage DNN can be summarized as follows.

Simulation Settings
The performance of the proposed algorithm and L1-SVD [6] is compared in this section. Even though the proposed algorithm is motivated by the work in [22], a comparison with [22] is omitted, as the algorithm in [22] exploits the co-prime array [24] and targets wideband signals. We set M = 8 and P = 2. For all algorithms, the size of the grid G is set to 180, so the discrete grid spacing is 1°. The signal-to-noise ratio (SNR) is defined as

SNR = 10 log_10( z(p)/σ² ) dB,

and the root mean square error (RMSE) is defined as

RMSE = sqrt( (1/(QP)) Σ_(q=1)^Q Σ_(p=1)^P ( θ_p^q − θ̂_p^q )² ),

where Q is the number of Monte Carlo trials for the RMSE calculation, and θ_p^q and θ̂_p^q, respectively, denote the true DoA and the estimated DoA of the p-th signal source in the q-th trial.
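The RMSE definition above translates directly into code; a minimal sketch (the function name and example values are ours):

```python
import numpy as np

def rmse(true_doas, est_doas):
    """RMSE over Q Monte Carlo trials; each row holds the P DoAs of one trial."""
    true_doas = np.asarray(true_doas, dtype=float)
    est_doas = np.asarray(est_doas, dtype=float)
    Q, P = true_doas.shape
    return np.sqrt(np.sum((true_doas - est_doas) ** 2) / (Q * P))

# two trials with two sources each, per-source errors of 0.1 and 0.2 degrees
print(rmse([[40.0, 75.0], [60.0, 120.0]],
           [[40.1, 74.9], [60.2, 119.8]]))    # prints ~0.158
```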
For training, a total of 106,800 training data are used, and 20 percent of the data is held out for testing. When generating the received signal for training, the SNR is randomly set between 0 and 10 dB, while the number of snapshots is set to 256. DoAs are randomly set within [30°, 150°], and z = [1, 1]^T. Once the DoAs and the SNR are determined, the received signal X is generated as in (3), and the covariance matrix R_X is formulated as in (4). Using R_X, the input data of the first-stage network ζ can be formulated by following the procedure of (6)-(8) and (10). The output data of the first-stage network used for training is the sparse vector η.

In the first-stage network, a DNN is implemented as well as the CNN to compare performance according to the type of network. When the DNN is used in the first stage, the input ζ ∈ R^(G×2) is vectorized, as the DNN only supports vectors for both input and output. The output is identical to η̂ in both the CNN- and DNN-based first-stage networks. Regardless of the type of network used in the first stage, three activation functions (PReLU, ReLU, and tanh) are tested, and the number of convolutional layers and dense layers in the first-stage network is set to 4. If the CNN is used in the first stage, the kernel sizes of the layers are set to 25 × 12, 15 × 6, 5 × 3, and 3 × 1. If the DNN is used in the first stage, the number of nodes in every layer is set to 200. The number of dense layers in the second-stage network, L, equals 4, and the number of nodes in every layer is set to 200. The numbers of epochs of the first-stage network and the second-stage network are, respectively, set to 300 and 500, and the batch size is uniformly set to 64. The overall training settings are organized in Table 1.

Performance Analysis According to Hyperparameters
The performance of an NN varies with hyperparameters, which here include the number of epochs and the type of activation function. The performance of the first-stage network is compared with respect to both in Figure 3. Figure 3 shows that the test MSE, i.e., the loss function evaluated on the test data, increases in the order of CNN with PReLU, DNN with PReLU, CNN with tanh, DNN with tanh, and DNN with ReLU; PReLU yields a low test MSE, while tanh yields a high test MSE. Overall, the test MSE tends to converge around the 200-th epoch, except for the DNN with tanh, which converges around the 700-th epoch. Moreover, the CNN has a lower test MSE than the DNN when an identical activation function is used. From Figure 3, we conclude that a CNN with PReLU as the activation function is best suited for DoA estimation in the first stage. Note that the test MSE of the second-stage network according to hyperparameters is omitted, as its activation function is fixed to tanh; in this case, the test MSE converges around the 500-th epoch. Figure 4 shows the outputs of the first-stage and second-stage networks according to the type of NN and activation function. Only the results of CNN with PReLU, CNN with ReLU, and DNN with ReLU are presented, as the others have higher test MSE, and the figures in the second row show the true DoAs and the final estimated DoAs determined by the two-stage cascaded NN. In Figure 4, CNN with PReLU yields distinct peaks around the true DoAs in the first stage and achieves the highest estimation accuracy with the help of the second-stage network. In contrast, DNN with ReLU yields obscure peaks in the first stage, and the output of the second stage shows correspondingly poor performance.
From Figures 3 and 4, we can tell that CNN whose activation function is PReLU is the best option, and the output of the first-stage network affects the output of the second-stage network.

Analysis on Estimation Accuracy and Resolution
Before presenting the simulation results, we explain the terms used in the figures and the analysis to avoid confusion. The term single-stage denotes DoA estimation without the second-stage network, and the term two-stage means that the second-stage network is added after the first-stage network. The type of the first-stage network, either CNN or DNN, is stated after the number of stages. For example, two-stage CNN means that the second-stage network follows a CNN-based first-stage network. Since Figure 3 shows that PReLU outperforms the other activation functions, the activation function of the first-stage network is fixed to PReLU. Figure 5 shows the root mean square error (RMSE) of the DoA estimation algorithms as the SNR and the number of snapshots vary. The proposed two-stage CNN achieves the best estimation accuracy among the compared algorithms, except when few snapshots are used. Because the number of snapshots is fixed to 256 during training, conditions with fewer snapshots are not covered; thus, the RMSE of L1-SVD becomes lower than that of the two-stage CNN and the single-stage CNN when the number of snapshots is below 18. In Figure 5a,b, the RMSE of the single-stage CNN, the single-stage DNN, and L1-SVD does not fall below 0.2°. When the second-stage network is added after the CNN, the RMSE falls below 0.2° by resolving the mismatch between the true DoAs and the discrete angular grid. However, when DoAs are incorrectly estimated in the first stage, the second-stage network does not improve the RMSE, as the incorrect output of the first-stage network degrades the performance of the second stage. To analyze the resolution of the DoA estimation algorithms, we present the RMSE as the difference between two adjacent DoAs varies.
We define the distance between two adjacent DoAs as τ = |(cos θ_1 − cos θ_2)/2|, where τ is used as a criterion of resolution in [25-27]. Figure 6 shows the RMSE versus τ and confirms that the second-stage network helps improve the RMSE. The RMSE of the single-stage CNN and the two-stage CNN converges when τ = 0.02, and the RMSE of the single-stage DNN and the two-stage DNN converges when τ = 0.03. The RMSE of L1-SVD is higher than that of the CNN-based and DNN-based DoA estimation at every τ. From this result, we conclude that the CNN-based DoA estimation has the highest resolution and that the second-stage network further increases the estimation accuracy.
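For reference, the resolution criterion τ above can be computed with a trivial helper (the function name is ours):

```python
import numpy as np

def tau(theta1_deg, theta2_deg):
    """Distance criterion tau = |(cos theta1 - cos theta2) / 2| between two DoAs."""
    t1, t2 = np.deg2rad([theta1_deg, theta2_deg])
    return abs(np.cos(t1) - np.cos(t2)) / 2.0

print(round(tau(88.0, 92.0), 4))   # prints 0.0349
```

Note that τ depends on the cosines of the angles, so the same angular separation in degrees yields a larger τ near broadside (90°) than near the array axis.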

Conclusions
In this paper, we propose an off-grid DoA estimation algorithm that resolves the mismatch induced by a discretized angular grid via a two-stage cascaded NN. The first-stage network employs a CNN, with a covariance matrix resized to fit the output dimension of the CNN used as the input. The first-stage network estimates initial DoAs represented as discrete angles; thus, there is a mismatch between the true DoAs and the initially estimated DoAs. The second-stage network employs a DNN, with both the input and the output of the first-stage network used as its input. The output of the second-stage network is the tuning vector, which represents the difference between the true DoAs and the nearest discrete angles. Using the outputs of the first-stage and second-stage networks, the final estimated DoAs are obtained. Simulation results show that the cascaded NN improves the RMSE by resolving the mismatch induced by the discretized grid. Furthermore, the results show that using a CNN with PReLU as the activation function is the best option for DoA estimation considering both estimation accuracy and resolution. The proposed algorithm can be an attractive option for applications such as far-field localization and far-field radar, which require high estimation accuracy since a small DoA estimation error may cause large performance degradation. However, the performance of the proposed algorithm is analyzed only when the number of signal sources is fixed. In practice, the number of signal sources can vary, so the performance of the proposed algorithm may degrade. Thus, a method addressing this issue needs to be studied for the practical implementation of NN-based DoA estimation.