A-CRNN-Based Method for Coherent DOA Estimation with Unknown Source Number.

Estimating directions of arrival (DOA) without knowledge of the source number is regarded as a challenging task, particularly when coherence among sources exists. Researchers have trained deep learning (DL)-based models to attack the problem of DOA estimation. However, existing DL-based methods for coherent sources do not adapt to variable source numbers or require signal independence. Herein, we put forward a new framework combining parallel DOA estimators with Toeplitz matrix reconstruction to address the problem. Each estimator is constructed by connecting a multi-label classifier to a spatial filter, which is based on convolutional-recurrent neural networks. Spatial filters divide the angle domain into several sectors, so that the following classifiers can extract the arrival directions. Assisted with Toeplitz-based method for source-number determination, pseudo or missed angles classified by the estimators will be reduced. Then, the spatial spectrum can be more accurately recovered. In addition, the proposed method is data-driven, so it is naturally immune to signal coherence. Simulation results demonstrate the predominance of the proposed method and show that the trained model is robust to imperfect circumstances such as limited snapshots, colored Gaussian noise, and array imperfections.


Introduction
Direction-of-arrival (DOA) or spatial spectrum estimation is one of the most important content in array signal processing, which has been widely applied in navigation, acoustics, electronic reconnaissance [1,2], etc. The past few decades have witnessed the emergence of numerous high-resolution DOA estimation algorithms, which can break through the Rayleigh limit [3]. However, the performance remains to be further improved in some non-ideal situations, such as multi-path effect, limited snapshots, and colored noise.
Traditional solutions to high-resolution DOA estimation are well-known subspace-based algorithms, sparse representation-based methods, etc. Taking one of the most representative subspace-based techniques, for instance, multiple signals classification (MUSIC) [4], it conducts signal subspace decomposition to search for steering vectors approximately orthogonal to noise subspace, and the corresponding angles are considered as arrival directions. These MUSIC-based algorithms [5][6][7][8] can achieve considerably high angular-resolution, but usually require the given signal number, signal independence, and adequate snapshots. Fortunately, Ciuonzo [9] has proposed the approach of locating scatterers with unknown number on the foundation of cell-by-cell processing, which provides a new perspective to improve basic MUSICs. Another common improvement for coherent sources is using spatial smoothing [10] to form the full-ranked source covariance matrix, so that the MUSIC algorithm can be successfully applied. During the past two decades, direction finding by sparse reconstruction aims at minimizing the difference between data covariance matrix and the sparsely reconstructed one, which can be formulated into a convex optimization problem [11][12][13]. In this methodology, DOAs can be estimated on-grid or off-grid, and the source number is not compulsorily required. However, the procedure of finding the sparse solution and detecting the peaks located in the sparse spatial spectrum may pose a threat to pratical cases in terms of computation complexity with the increasing of the matrix sizes.
To overcome the difficulties mentioned above, learning-based methods have been introduced to find the directions. These robust and flexible techniques have been proven an excellent tool in synthetic and realistic data sets [14][15][16]. Dating back to the 1990s, the radial-basis function neural network (RBFNN) was adopted to learn nonlinear mappings from covariance matrices to spatial spectrums [15].
The key benefit arising from the RBFNNs is that they are in common use for correlated as well as uncorrelated signals. However, for small angular spacing, RBFNNs may be tough to separate these angles, while deep learning (DL) networks are capable to work well. In recent years, deep convolutional neural networks (CNN) [17] and recurrent neural networks (RNN) [18] have been applied in DOA estimation. Liu et al. took autoencoders and full-connected layers to build the deep neural network (DNN) framework, which could well resist several kinds of array imperfections. Following the one-vs-all classification guideline, DNN models in [19] were able to detect the number of sources. The algorithm has brought many improvements, however it is merely applicable to independent sources for using linear autoencoders. Lately, RNN has been approved of an excellent structure in time-series processing [20]. Perotin et al. stacked RNNs up to convolution layers to locate the sources with a given number, and the database formed by realistic acoustic signals had represented the effectiveness. However, once the source number changed, the model has to be trained again.
Present DNN-based DOA estimation models mostly ask for signal independence or a prior knowledge of source number. Moreover, a large percentage of studies concentrate on acoustic signals or other special application scenarios [16,21]. Motivated by this, we propose a new DL-based method to realize stable spatial spectrum recovery appropriate for varying signal numbers in this paper. The hierarchical DNN-based method [19] and conventional spatial-smooth MUSIC (SS-MUSIC) algorithm [10] are chosen for comparison. Simulations on uniform linear array (ULA) show that the proposed method shows great advantage in terms of mean absolute error (MAE) and has good adaptability to untrained situations. The main contributions made in our work are concluded as below.

•
We design a framework based on alternate convolutional-recurrent neural network (A-CRNN), which is feasible to DOA estimation regardless of the signal coherence.

•
The scheme that source number is jointly determined by multi-label estimators and reconstructed Toeplitz matrices is employed, which greatly improves the performance of direction finding.

•
Considering the class and label imbalances happening during the training of sub-networks, we adopt Focal loss [22] and data augmentation to reduce the negative effects.

•
Colored noise and other array imperfections are considered, which validates the robustness for potential practical systems.
The rest of the paper is organized into five parts. The related mathematical foundations are given in Section 2. Section 3 describes the details of modeling A-CRNNs for DOA estimation. In Section 4, some simulations and comparisons are conducted. Finally, Section 5 draws a conclusion of the whole work.
Before the description, the mathematical notations we used are explained. Boldface letters in upper case and lower case, respectively, denote matrices and column vectors. Scalars are signed with lowercase letters. R and C mean the sets of real numbers and complex numbers. The operations of inverse, transpose, conjugate transpose and complex conjugate are expressed by (·) −1 , (·) T , (·) * and (·) H . E{a} represents the expectation of a and · denotes the Euclidean norm.

Problem Formulations
In this section, we first discuss the mathematical foundations of array receiving model for coherent signals. After that, a Toeplitz matrices-based method for source counting is introduced. To this end, we describe the multi-label classification (MLC) [23] strategy employed in the proposed DL, which fully meets the requirement of multiple directions finding.

Array Signal-Receiving Model
Consider P far-field narrowband signals with a common center frequency f 0 impinging to a ULA as illustrated in Figure 1, the sensor number of which is (2M + 1). We assume that sensors are isotropic with equal inter-element space denoted by d, (M + 1) ≥ P, and K(K ≤ P) signals are coherent. Let the first signal s 0 (t) as the reference without loss of generality, and the rest coherent sources can be expressed as where the complex constant ρ k = α k e jβ k is the correlation coefficient between s 0 (t) and s k (t). Then, array output of the mth element becomes where θ i is the ith arrival direction, c stands for the velocity of light in vacuum, and n m (t) represents the additive white Gaussian noise (AWGN) with zero mean-value. Let d = λ/2, where λ is the wavelength of carriers, and we can write the observation signals in a vector manner as where s(t) = [s 0 (t), . . . , s P−1 (t)] T is the source signal vector, n(t) = [n −M (t), . . . , n 0 (t), . . . , n M (t)] T represents noise vector, and A is steering matrix with e jMπ sin θ 0 e jMπ sin θ 1 · · · e jMπ sin θ P−1 e j(M−1)π sin θ 0 e j(M−1)π sin θ 1 · · · e j(M−1)π sin θ P−1 . . . . . . . . .

Source Number Determination
In pratical spatial spectrum recovery, we will not be informed of the source number beforehand. Therefore, we embed an extra algorithm on the basis of Toeplitz matrices decomposition in our framework to detect the source number, which performs well whether signals are coherent or not.
Taking the mth row of the output covariance matrix R = E{x(t)x H (t)}, we can reshape a Toeplitz matrix R m as below, where I M+1,m ∈ R (M+1)×(M+1) with one on the mth diagonal and zero elsewhere. By conducting the eigen-decomposition on R m , we can get and S m = diag{s m,0 , · · · , s m,P−1 } with We can find that P k,i = 0 for k = i in Equation (8), regardless of the signal coherence. Thus, ∀i ∈ {0, · · · , P − 1}, we have s m,i = 0, which means that the rank of S m is P and totally unrelated to the coherence among sources. Therefore, we can conduct the eigenvalue decomposition on R m and then pick out P larger eigenvalues, which completes the source number estimation. Here, we omit to give the derivations from Equations (5)- (8), which can be seen in [24].

Multi-Label Classification
In this paper, we model the multi-source DOA estimation as a MLC problem [23]. Let U = {θ 0 , θ 1 · · · , θ N−1 } denote a finite label set of discrete angles and X represent the input space. For a given input instance x ∈ X , we suppose that it holds the label set u, and elements of u mean the angle values to be estimated. We can define a corresponding binary vector y = [y 0 , y 1 , · · · , y N−1 ] T to represent the grid directions with where θ ∈ u and ∆θ > 0 is set to be the angular grid resolution. From Equation (9), we can acquire the output space denoted by Y = {0, 1} N . Supposing that (x, y) is an observation independently and identically generated from X × Y, the modeling of MLC can be described as a optimization problem below, where F describes the hypothesis space of the classification model with F = { f |y = f (x)}, L(·) stands for the multi-label loss function like binary-crossentropy, and λJ( f ) is called the regularization or penalty term depicting the model complexity.

A-CRNN-Based DOA Estimation
To obtain higher angular resolution but not deteriorate the estimation accuracy, we design the A-CRNN architecture as illustrated in Figure 2, which is constituted of parallel pairs of spatial filter and multi-label classifier. Spatial filters allocate multiple directions to narrower angular sectors. Afterwards, the classifiers connected to them can give out the probabilities of discretized angles belonging to each sector. Simply following the one-vs-all guideline [19] to determine the source number is not reliable, which acts as setting a threshold of the output probability to identify the appearance of signals. Therefore, we embed the Toeplitz-based method to the networks for better detecting of source number, in order to avoid causing a few false and missed alarms of signal existence. Ultimately, source numbers can be estimated almost completely correct, and we can obtain the final DOA results by concatenating the results from all sectors. For the goal of relieving the computational burdens, the proposed network receives input vector r following the guidelines of [19], where r m 1 ,m 2 represents the (m 1 , m 2 )th element of the estimated covariance matrix, where S is the number of snapshots.

Network Architecture
The proposed DOA estimation network is built by stacking different scales of the CRNN units as depicted in the upper right of Figure 2, which spontaneously establishes the alternative CNN and RNN structure. Unlike 2-dimensional (2-D) images, time series generated by vectorizing the covariance matrices form the 1-dimensional (1-D) input vectors. Furthermore, from Equation (2), it is obvious that the received data keeps invariant against the order of arrival angles. Thus, 1-D CNNs are appropriate for locally low-order feature extraction. However, the input data of the network is time-involved, and merely using of convolution operations can not give better fitting from the covariance data to angle values. Therefore, we are motivated to design the CRNN unit to construct the DOA estimation network.
Among various types of RNN layers, bidirectional long short-term memory (BiLSTM) and bidirectional gated recurrent units (BiGRU) [25] are two of the most popular structures. They can get access to inputs both in the forward and backward directions, which is benefit of exploring the time-dependence. In addition, they are capable to mitigate gradient vanishing. Even though the training of BiGRUs takes less time, they perform rather poorly regarding the tests to new data, which can be numerically verified in the later simulation section. In the long term, we decide to choose BiLSTMs as the recurrent neurons. Consequently, the N-CRNN unit is comprised of a convolution layer, a BiLSTM layer, and a fully-connected feed-forward (FF) [16] layer, all with N kernels.

Spatial Filters
Each spatial filter shown in Figure 2 is formed with a 32-CRNN unit, deciding which classifier the inputs should be sent to. Compared to the linear autoencoder in [19], our filters allow coherent inputs and hold higher division accuracy. Choose L + 1 angles θ 0 < θ 1 < · · · < θ L , uniformly dividing the direction space into L intervals, which means θ 1 Considering the signals arriving in the ULA from several directions of θ 0 , θ 1 , · · · , θ P−1 , we can denote the output of the lth spatial filter as

Multi-Label Classifiers
The second part of the fine-grained DOA estimator consists of L parallel A-CRNN-based classifiers. Larger L implies we need to train more sub-networks, whereas smaller L leads to degradation of estimation precision. Thus, we should consider a compromise when choosing the number of classifiers. All the multi-label classifiers illustrated in Figure 2 are constructed by stacking a 128-CRNN unit up to a 64-CRNN unit, which is then flattened to a dense layer as the output layer.
In order to make a MLC assignment, we sample the direction interval into N discrete angles shown below, where ∆θ is the angular resolution and u l denotes label set of the lth classifier. Let y l = y l,0 , · · · , y l,n , · · · , y l,N−1 remarks output vector of the lth classifier. y l,n , as described in Equation (9), expresses the probability if there exists an arrival direction which is equivalent to the nth element in u l . It should be supplemented that the lth multi-label classifier will be triggered if and only if z l is equal to 1; otherwise, the output vector of it will be directly set to 0. Based on this strategy, the lth classifier does not have to take the DOAs outside of the interval [θ l−1 , θ l ) into consideration, which largely accelerates the training procedure.

Global DOA Estimation
Combining the outputs of L classifiers, the expected result of the whole network is written as y = [y 0,0 , · · · , y 0,N−1 , · · · , y ln , · · · , y L−1,0 , · · · , y L−1, In particular, if the coming signals are all from the same sector, for instance, the lth sector, then the final outputs should be y = [0 T N×1 , · · · , 0 T N×1 , y T l , 0 T N×1 , · · · , 0 T N×1 ] T . Supposing that the source number determined by means of the Toeplitz matrix-based method as aforementioned in Section 2.2 isP 0 . Then,P 0 larger elements over the total L × N outputs from y in (15) are selected out and their corresponding subscripts constitutes a set denoted by To further reduce the rate of determination failure, we set two threshold values p min and p max , with 0 < p min < p max < 1. On one hand, if y d p < p min , d p will be removed from D. On the other, the index of the unselected y ln (15) with y ln > p max will be added into the set D. At last, the source number will be updated toP. We can gradually recover the directions {A 0 , · · · , A p , · · · , AP −1 } impinging to the ULA of multi-sources by where (y d p · ∆θ) accounts for the interpolation within the angular resolution ∆θ.
In the literature of classical estimation theory, the variance of phase estimation is proportional to 1/cos 2 θ, which indicates that the performance will sharply deteriorate when |θ| is close to 90 • [26]. The proposed DOA classifier will also experience a relatively slight deline in accuracy with the angular sector approaching ±90 • , and the corresponding MAE performance is depicted in Figure 3. We choose L = 6 to uniformly split [−60 • , 60 • ). Figures 4 and 5 exhibit two typical testing responses of the proposed Toeplitz A-CRNN model to three sources with the first two of them are coherent. Figure 4a depicts the dividing results of the sources from three different sectors with the directions (−28 • , −5 • , 44 • ), and the concatenated output of six spatial filters (z 0 , z 1 , · · · , z 5 ) is(0, 1, 1, 0, 0, 1). Figure 5a gives the sector division of directions (−38 • , −36 • , −22 • ), which are all from the second sector. The reconstructed spatial spectrums are individually plotted in Figures 4b and 5b.

Simulations and Discussions
In this section, numerical experiments are given and the results are discussed. First of all, the simulation conditions are stated. Then, we compare the coherent DOA estimation performances among the proposed method, hierarchical DNN-based algorithm [19], and traditional SS-MUSIC [10]. Finally, we test our Toeplitz A-CRNN algorithm in diverse untrained circumstances, which reveals well capability of generalization.

Simulation Settings
In the simulations, received data is hexadecimal quadrature amplitude modulation (QAM) signals generated from three narrow-band sources, with the first two of them coherent. Real and imaginary components of the coherent coefficients are randomly generated. Spacing between the array sensors is half of the wavelength, and other settings with respect to data preparation are listed in Table 1. We can construct the database by stochastically choosing 3 angular values from (θ max − θ min ) \ ∆θ = 120 candidates, which totally produces C 3 120 = 280, 840 samples. Different from the identical interval among directions in [19], our arbitrary sampling scheme more approaches the actual situation. To release the training loads, only one-sixth of the samples are reserved at random. We take 80%, 10%, and 10% of the instances for model training, validation, and testing. Training settings concerning spatial filters and multi-label classifiers are enumerated in Table 2. In spatial filters, output layers are activated by the so f tmax function and other upper layers use relu function. For the training dataset to a spatial filter, only one-sixth of the samples belong to the positive class while others are negative for L = 6, which can cause the class imbalance [27]. To settle this problem, we adopt Focal loss [22], which can guide the learning procedure inclining to the less positive samples. Similar to spatial filters, the popular relu is selected as the activation function to the layers of multi-label classifiers except for the last one, which is changed by sigmoid function to match the MLC loss described in (10). Instances in the same sector can be clustered to three categories distinguished by the source number from the current sector, which is possible to be 1, 2, or 3. The statistical distribution of the three cases is drawn in Figure 6. Observations containing two or three sources related to the same sector take proportions of 16.45% and 1.00%, respectively, yet the one-source case accounts for 82.55%, which forms the severe multi-label imbalance [28]. Hence, we re-sample the minority cases under two extra SNRs quite closing to 20 dB to balance the training dataset. noise-signal ratio 20 dB size of mini-batch 50 regularization l 1 -norm optimizer Adam Figure 6. Source number distribution of the same sector.

Comparison and Evaluation
Recalling that autoencoders in [19] are only valid to independent signals, we extend our trained A-CRNN spatial filters to the scene of two independent sources to form a contrast, which reflects the generalization capability of our models to the independent scene as well. The larger area under receiver operating characteristic (ROC) curve [29] indicates the better performance of the binary classifier. Figure 7 plots the ROC curves of the spatial filters applied in the baseline and the proposed framework. The larger area-under-curve (AUC) means that the model will execute a correct classification in higher probability, and our spatial filter represents the superiority.

Three-Source Testing under AWGN
In this part, we check the accuracies of coherent DOA estimation in AWGN environment. First, the SNR remains unchanged at 20 dB. Table 3 reports the results of five algorithms with absolute-error tolerances less than 1 • , 4 • , 7 • , and 10 • of the entire testing set. Moreover, average testing durations to different methods of a single instance are given. The best results are highlight in bold. The framework in [19] is constructed by multi-layer neural networks which are simply full-connected, and we briefly name it as FC-NN. FC-NN presents poor performance while having the best computative efficiency. It reflects that FC-NNs have limited feature extraction ability for multi-source DOA estimation. The proposed framework reaches the highest estimation accuracy and the testing time is less than the traditional SS-MUSIC algorithm, which indicates that our off-line trained models can relieve the computing burden without damaging the performance. Assisted by Toeplitz-based source-counting, 95.77% of the errors are suppressed smaller than < 1 • . Second, testing results in variable levels of SNR are illustrated in Figure 8. We split the testing samples to four subsets with the minimum angle interval through three sources being 3 • , 5 • , 8 • , and 12 • .
Seen from the subfigures in Figure 8, our Toeplitz A-CRNN method works better over the SNR levels when the angular distance is less than 8 • . SS-MUSIC algorithm will outperform if both the SNR is high and the angle interval is significantly large, which are tough conditions in real applications. Finally, it commonly happens that we are not able to get a great enough number of snapshots. Moreover, several kinds of array imperfections always emerge, such as gain inconsistence, sensor-position bias, and inter-sensor mutual coupling, which have been well modeled in [19]. According to the authors of [19], the imperfect steering matrix can be rewritten as where δ describes the intensity of deviations. Expressions of E mutual , γ gain , and e position can be found in [19], which are omitted here. At this time, conventional direction finding algorithms often can not guarantee the effectiveness. Thus, we need to verify whether the proposed network trained on the ideal ULA can adapt to these pratical interferences. Spotted line charts in Figure 9 indicate that the proposed method remains powerful in learning the angle features in the contrast with traditional SS-MUSIC.

Testing in Untrained Numbers of Source
In order to examine the generalization ability of our framework to changeable number of sources, we form the two-signal and four-signal testing sets, which never appear in the training set, and the level of SNR is fixed at 20 dB. As individually illustrated in Figures 10 and 11, subfigures of them depict DOA estimation performance to each one of the two or four signals. From the figures, we can see that most of the colored crosses denoting the predicted angles fall near the hollow black circles which stand for the ground-truth. Therefore, it is reasonable to deduce that the proposed method is self-adaptive to various numbers of arrival directions.

Generalized to Colored Gaussian Noise
In practice, the noise is almost colored. Therefore, it is necessary to test the robustness of our trained models to colored Gaussian noise, which is simply simulated as [30] n c (t) = n(t) + 0.5n(t − 1) (20) where n(t) denotes AWGN. Figure 12 gives out the testing results of proposed Toeplitz A-CRNN network to the signals from three coherent sources contaminated with colored Gaussian noise. Our models are trained in white Gaussian noise at the SNR of 20 dB, and are tested at −20 dB, −10 dB, 0 dB, and 10 dB. Estimation accuracies in the untrained colored-Gaussian-noise scenario approximately go through a 0.5 • decline in terms of the MAE measurement.

Discussions
As shown in above numerical experiments, the Topelitz A-CRNN method outperforms baselines in fitting the mapping from array received data to arrival directions, because BiLSTMs in CRNN units can adequately explore the time-dependency among local features extracted by CNNs. Owing to the Topelitz module in the proposed framework, the trained network can be self-adaptive to changeable number of sources, which even are coherent.
Besides the performances, we also need to analyze the testing computational complexities. As for the FC-NN algorithm [19], the complexity is O(L in L out ), where L denotes the size of layer input or output vector. Computations in SS-MUSIC are mainly generated from subspace decomposition and spectral peak-searching and can be formulated as O((2M + 1) 2 S + (2M + 1) 2 F) [31], where F is the number of spatial frequencies which is related to the resolution. In the proposed network, the heaviest calculation burden arises from the BiLSTM part, which is O(W). W is the number of parameters in BiLSTM layers, which is always far greater than L in L out .

Conclusions
In summary, this work demonstrates a new structure of DL network to address the DOA estimation problem. Designed spatial filters and alternate multi-label classifiers based on CRNN units can recover the arrival angles of coherent signals. With the facilitation of Toeplitz matrix reconstruction, our framework still reaches a high estimation accuracy when the source number is unknown. Simulations on ULA show great advantages to state-of-the-art FC-NN and the conventional SS-MUSIC algorithm especially when arrival directions of the sources are adjacent. Meanwhile, our trained Toeplitz A-CRNN model reveals excellent adaptation to practical conditions such as limited snapshots, array imperfections, lower SNR, and colored Gaussian noise. In addition, as networks can be trained offline, the proposed method is computational efficient in real-time testing phase, which shows good prospects for realistic applications.
Further studies to extend the proposed A-CRNN framework can be making it adaptable to different kinds of input-signals and array geometries. We will consider to introduce the multimodal learning strategy (see, e.g., [32,33]) to the framework for potential realization.

Conflicts of Interest:
The authors declare no conflicts of interest.