Leveraging Deep Learning for Practical DoA Estimation: Experiments with Real Data Collected via USRP

This paper presents an experimental validation of deep learning-based direction-of-arrival (DoA) estimation by using realistic data collected via universal software radio peripheral (USRP). Deep neural network (DNN) and convolutional neural network (CNN) structures are designed to estimate the DoA. Two types of data are used for training networks. One is the data synthesized by the signal model, and the other is the data collected by USRP. Here, the signal model considers both mutual coupling and multipath signals. Experimental results show that the estimation performance is most accurate when training DNN and CNN with the collected data. Furthermore, the estimation tends to be poor in the indoor environment, which suffers from the strong non-line-of-sight (NLoS) signals.


Introduction
Direction-of-arrival (DoA) estimation is one of the long-studied research topics in array signal processing. DoA estimation algorithms have been adopted in various applications, such as localization and radar [1]. Traditional DoA estimation algorithms such as MUSIC [2] and ESPRIT [3] were derived from the characteristics of the signal model. Although they can ideally achieve high estimation accuracy and resolution, unexpected problems (e.g., multipath effect [4], gain-phase error [5], mutual coupling [6], and antenna misalignment [7]) may exist in practice. In this case, the signal model cannot capture the characteristics of a real received signal, thereby degrading the DoA estimation performance.
There have been various studies to deal with problems that may cause model mismatch. One of these studies is coherent DoA estimation, which can estimate multipath signals impinging from different directions [4,8,9]. On the other hand, there are problems induced by hardware impairments such as gain-phase error, mutual coupling, and antenna misalignment. There have been efforts to calibrate these errors without using reference signals, where [5,10,11] deal with gain-phase error, [6,12,13] deal with mutual coupling between antennas in an array, and [7,14] deal with the errors in the steering vector that can be caused by antenna misalignment. However, the performance of the aforementioned works may degrade when several problems simultaneously occur or there are more unexpected problems.
After the introduction of deep learning [15], DoA estimation algorithms based on various types of neural network (NN) have been proposed in [16][17][18][19][20]. One of the benefits of using deep learning is that the user does not have to know the exact signal model if there are sufficient training data. For this reason, it is expected that the deep learning-based DoA estimation does not suffer from model mismatch problems by using data that capture all kinds of errors (e.g., actual measured data). In [16,17], deep neural network (DNN)-based DoA estimation was proposed, where these works report that the DNN-based estimation is more accurate than the traditional DoA estimation. After the proposal of the DNN-based DoA estimation, convolutional neural network (CNN)-based DoA estimation has been studied in [18,19]. The CNN-based DoA estimation shows better estimation accuracy and resolution compared to the DNN-based DoA estimation. In [20], the DoA estimation based on unsupervised learning was proposed, where unsupervised learning can make data collection easier since data labeling is not required. In recent works, there have been efforts to exploit features of classical DoA estimation, rather than solely depending on NN.
Refs. [21][22][23] respectively employ DNN, CNN, and recurrent neural network (RNN) to estimate the ideal noiseless covariance matrix, which is denoted as a pseudo covariance matrix. Then, the classical DoA estimation such as MUSIC and root-MUSIC estimates the DoA with pseudo covariance matrix. In [24], the residual neural network (ResNet) first estimates the candidates of DoAs. From the candidates, the classical maximum likelihood estimation (MLE) [25] picks the final DoAs. A combination of these two methods enhances the accuracy while achieving lower complexity than only using MLE.
However, the existing works on the deep learning-based DoA estimation lack experimental validation, even though the deep learning-based DoA estimation is expected to be effective in a practical situation where there are many problems that cause a model mismatch. In this paper, we validate deep learning-based DoA estimation with realistic data collected by a universal software radio peripheral (USRP). In the experiment, two types of data (data synthesized by the signal model and data collected by USRP) are used for training networks. The estimation accuracy is then analyzed according to the type of training data.

System Model
In this paper, we consider one transmitter, which is equipped with an omni-directional antenna. A receiver is equipped with a uniform linear array (ULA), which has M antenna elements. The spacing between adjacent antennas is set to half-wavelength λ/2, where λ denotes the wavelength of the transmitted signal.
To generate the data for training DNN and CNN, the signal model should be defined. Here, the generated data are expected to be well-suited for training if the signal model can capture the state of the hardware systems. Among the many kinds of hardware-induced problems, the gain-phase error in our systems is calibrated using the method in [26]. The antenna spacing is designed to be half-wavelength so that there is no antenna misalignment. However, the current systems cannot calibrate mutual coupling and multipath effects. Thus, in this paper, we consider mutual coupling and multipath effects to design the received signal model.
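The array manifold vector for a DoA $\theta$, $a(\theta) \in \mathbb{C}^{M \times 1}$, can be given as follows:

$$a(\theta) = \left[1,\ e^{-j\pi\cos\theta},\ \ldots,\ e^{-j\pi(M-1)\cos\theta}\right]^{T}. \tag{1}$$

To capture mutual coupling and multipath effects, we model a received signal $X \in \mathbb{C}^{M \times D}$ as:

$$X = C \sum_{p=0}^{P} \alpha_p\, a(\theta_p)\, s^{T} + N, \tag{2}$$

where $C \in \mathbb{C}^{M \times M}$ denotes the mutual coupling matrix [27]. $P$ denotes the number of non-line-of-sight (NLoS) paths. $\alpha_p$ and $\theta_p$ respectively denote the channel gain and the DoA of the $p$-th signal path. Specifically, $\alpha_0$ and $\theta_0$ denote the channel gain and the DoA of the line-of-sight (LoS) path. $s = [s_1, \ldots, s_D]^{T} \in \mathbb{C}^{D \times 1}$ denotes a signal vector whose power equals $\sigma_s^2$. $D$ is the number of signal snapshots. $N \in \mathbb{C}^{M \times D}$ is a noise matrix whose entries all follow $\mathcal{CN}(0, \sigma^2)$, where $\sigma^2$ denotes the noise power. $R_X$, the covariance matrix of $X$, can be defined as:

$$R_X = \frac{1}{D} X X^{H}. \tag{3}$$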
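As an illustration of the signal model described above (mutual coupling, one LoS path plus P NLoS paths, and additive noise), the following is a minimal NumPy sketch that draws one received block and forms its sample covariance. Several details are assumptions made only for this sketch and are not taken from the paper: the sign convention of the steering vector, the nearest-neighbour structure of the coupling matrix in `coupling_matrix`, the uniform NLoS angle range, and the random phases of the NLoS gains and coupling coefficient. The function names are hypothetical.

```python
import numpy as np

def steering_vector(theta_deg, M):
    """Half-wavelength ULA manifold; theta is measured from the array axis (90 deg = broadside).
    The negative phase sign is an assumed convention."""
    m = np.arange(M)
    return np.exp(-1j * np.pi * m * np.cos(np.deg2rad(theta_deg)))

def coupling_matrix(M, c1):
    """Assumed coupling model: identity plus a nearest-neighbour coupling coefficient c1."""
    C = np.eye(M, dtype=complex)
    idx = np.arange(M - 1)
    C[idx, idx + 1] = c1
    C[idx + 1, idx] = c1
    return C

def synthesize_covariance(theta_los, M=4, D=512, snr_db=10.0, rng=None):
    """Draw one received block X (M x D) and return its sample covariance (M x M)."""
    rng = np.random.default_rng() if rng is None else rng
    P = int(rng.integers(0, 11))                                   # number of NLoS paths, 0..10
    thetas = np.concatenate(([theta_los], rng.uniform(0.0, 180.0, P)))  # assumed NLoS DoA range
    nlos = rng.uniform(0.0, 0.5, P) * np.exp(1j * rng.uniform(0.0, 2 * np.pi, P))  # assumed phases
    alphas = np.concatenate(([1.0], nlos))                         # alpha_0 = 1 for the LoS path
    c1 = rng.uniform(0.0, 0.05) * np.exp(1j * rng.uniform(0.0, 2 * np.pi))
    C = coupling_matrix(M, c1)
    A = np.stack([steering_vector(t, M) for t in thetas], axis=1)  # M x (P + 1)
    h = C @ (A @ alphas)                                           # effective array response
    s = (rng.standard_normal(D) + 1j * rng.standard_normal(D)) / np.sqrt(2)  # signal power 1
    sigma2 = 10.0 ** (-snr_db / 10.0)                              # noise power for the given SNR
    N = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, D)) + 1j * rng.standard_normal((M, D)))
    X = np.outer(h, s) + N                                         # h s^T + N
    return X @ X.conj().T / D

R = synthesize_covariance(70.0, rng=np.random.default_rng(0))
```

The returned matrix is Hermitian by construction, so its real and imaginary parts carry all the information that is later fed to the networks.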

Deep Learning Network Structure for DoA Estimation
This section introduces two network structures for DoA estimation, which are respectively based on DNN and CNN. A scheme of deep learning-based DoA estimation is depicted in Figure 1. In the presence of multipath signals and mutual couplings, the deep learning network aims to estimate the DoA of the LoS path using the covariance matrix.

DoA Estimation via Deep Neural Network
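Since the input of the DNN has to be a real vector, the input of the DNN $\chi_{\mathrm{DNN}} \in \mathbb{R}^{2M^2 \times 1}$ is formulated as:

$$\chi_{\mathrm{DNN}} = \frac{\left[\mathrm{vec}(\mathrm{real}(R_X))^{T},\ \mathrm{vec}(\mathrm{imag}(R_X))^{T}\right]^{T}}{\|R_X\|_F}, \tag{4}$$

where $\mathrm{vec}(\cdot)$ denotes the vectorization. $\mathrm{real}(\cdot)$ and $\mathrm{imag}(\cdot)$ respectively denote the real and imaginary parts of the argument. $\|\cdot\|_F$ denotes the Frobenius norm. Letting $L$, $d^{(l)}$, and $I^{(l)}$ respectively denote the number of dense layers, the output of the $l$-th layer, and the size of $d^{(l)}$, the $j$-th entry $d^{(l)}(j)$ can be given by:

$$d^{(l)}(j) = \mathrm{ReLU}\!\left(\sum_{i=1}^{I^{(l-1)}} U^{(l)}(j, i)\, d^{(l-1)}(i) + v^{(l)}(j)\right), \quad j = 1, \ldots, I^{(l)}, \tag{5}$$

where $d^{(0)} = \chi_{\mathrm{DNN}}$. $\mathrm{ReLU}(\cdot)$ denotes the ReLU function, which is widely used as the activation function of the neuron [15]. $U^{(l)}$ and $v^{(l)}$ denote the weights and the bias of the $l$-th layer. The loss function of the DNN is the MSE, $\|\theta_0 - \hat{\theta}_0\|^2$, where $\hat{\theta}_0$ denotes the estimated DoA of the LoS path. The weights and the bias that minimize the loss function can be denoted as:

$$\left\{\hat{U}^{(l)}, \hat{v}^{(l)}\right\}_{l=1}^{L} = \arg\min_{\{U^{(l)},\, v^{(l)}\}_{l=1}^{L}} \left\|\theta_0 - \hat{\theta}_0\right\|^2, \tag{6}$$

where $\hat{U}^{(l)}$ and $\hat{v}^{(l)}$ denote the weights and the bias of the $l$-th dense layer that minimize the loss function. The minimization in (6) is carried out by backpropagation. The parameter setting for the DNN structure used in this paper is summarized in Table 1.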
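To make the DNN input and the dense-layer recursion concrete, here is a minimal NumPy sketch of the forward pass only. It assumes the input is the stacked real and imaginary parts of the vectorized covariance matrix divided by its Frobenius norm, as the definitions above suggest. The layer sizes, the random weights, and the linear (non-ReLU) output layer are hypothetical choices for illustration; the actual sizes are those of Table 1, and training by backpropagation is not shown.

```python
import numpy as np

def dnn_input(R):
    """Stack real and imaginary parts of vec(R), normalised by ||R||_F (length 2 * M * M)."""
    v = R.reshape(-1, order="F")                 # column-major vectorisation
    return np.concatenate((v.real, v.imag)) / np.linalg.norm(R, "fro")

def dnn_forward(x, weights, biases):
    """Dense layers with ReLU; the linear scalar output layer is an assumption of this sketch."""
    d = x
    for U, v in zip(weights[:-1], biases[:-1]):
        d = np.maximum(U @ d + v, 0.0)           # ReLU(U d + v)
    return (weights[-1] @ d + biases[-1]).item()

def mse_loss(theta_true, theta_hat):
    """Squared error between the labelled and the estimated LoS DoA."""
    return (theta_true - theta_hat) ** 2

rng = np.random.default_rng(0)
M = 4
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = G @ G.conj().T                               # stand-in Hermitian covariance matrix
x = dnn_input(R)

sizes = [2 * M * M, 64, 32, 1]                   # hypothetical layer widths
weights = [0.1 * rng.standard_normal((o, i)) for i, o in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(o) for o in sizes[1:]]
theta_hat = dnn_forward(x, weights, biases)
loss = mse_loss(90.0, theta_hat)
```

Because the normalisation divides by the Frobenius norm, the input vector always has unit Euclidean norm, which removes the dependence on the received power.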

DoA Estimation via Convolutional Neural Network
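Since the input of the CNN can be a three-dimensional tensor, the input of the CNN $\chi_{\mathrm{CNN}} \in \mathbb{R}^{M \times M \times 2}$ is formulated as:

$$\chi_{\mathrm{CNN}} = \left[\mathrm{real}(R_X);\ \mathrm{imag}(R_X)\right], \tag{7}$$

where $;$ denotes an operator that stacks matrices of the same dimension along the third axis. The output of the $k$-th convolutional layer, $C^{(k)}$, can be represented as in [18]:

$$C^{(k)}(:, :, j) = \mathrm{ReLU}\!\left(C^{(k-1)} * W_j^{(k)} + B_j^{(k)}\right), \quad j = 1, \ldots, J(k), \quad k = 1, \ldots, K, \tag{8}$$

where $*$ denotes the convolution. $C^{(k)}(:, :, j)$ denotes the $j$-th channel of the 3D tensor $C^{(k)}$, and $C^{(0)} = \chi_{\mathrm{CNN}}$. $J(k)$ denotes the number of kernels in the $k$-th layer, and $K$ denotes the number of convolutional layers. $W_j^{(k)} \in \mathbb{R}^{Q_k \times Q_k}$ denotes the $j$-th kernel in the $k$-th layer, where $Q_k$ is the dimension of the kernels in the $k$-th layer. $B_j^{(k)}$ denotes the bias for the $j$-th kernel in the $k$-th convolutional layer.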
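After the $K$ convolutional layers, all values of $C^{(K)}$ are summed to yield the output. The loss function of the CNN is the MSE, $\|\theta_0 - \hat{\theta}_0\|^2$. The convolution kernels and the biases that minimize the loss function can be given by:

$$\left\{\hat{W}_j^{(k)}, \hat{B}_j^{(k)}\right\} = \arg\min_{\{W_j^{(k)},\, B_j^{(k)}\}} \left\|\theta_0 - \hat{\theta}_0\right\|^2, \tag{9}$$

where $\hat{W}_j^{(k)}$ and $\hat{B}_j^{(k)}$ denote the $j$-th convolution kernel and the bias of the $k$-th layer that minimize the loss function. The minimization in (9) is carried out by backpropagation. The parameter setting for the CNN structure used in this paper is summarized in Table 2.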
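The CNN pipeline described above (a two-channel real input, K convolutional layers with ReLU, and a final sum over all output values) can be sketched in plain NumPy as follows. This is an illustration under assumptions rather than the network of Table 2: the kernels here span all input channels and use "valid" cross-correlation without flipping, as is common in deep learning frameworks, and the kernel counts and sizes are hypothetical.

```python
import numpy as np

def cnn_input(R):
    """Stack real and imaginary parts of R along a third axis (M x M x 2)."""
    return np.stack((R.real, R.imag), axis=-1)

def conv_layer(C_prev, kernels, biases):
    """One convolutional layer followed by ReLU.
    kernels has shape (J, Q, Q, channels_in); biases has shape (J,)."""
    H, W, _ = C_prev.shape
    J, Q, _, _ = kernels.shape
    out = np.empty((H - Q + 1, W - Q + 1, J))
    for j in range(J):
        for r in range(H - Q + 1):
            for c in range(W - Q + 1):
                patch = C_prev[r:r + Q, c:c + Q, :]
                out[r, c, j] = np.sum(patch * kernels[j]) + biases[j]
    return np.maximum(out, 0.0)

def cnn_forward(x, layers):
    """Apply every (kernels, biases) pair in turn, then sum all values of the last output."""
    for kernels, biases in layers:
        x = conv_layer(x, kernels, biases)
    return float(x.sum())

M = 4
R = np.eye(M, dtype=complex)                     # stand-in covariance matrix
x = cnn_input(R)

# Hypothetical single layer: three 2x2 kernels of zeros with unit biases,
# chosen so the output can be checked by hand (3 * 3 * 3 entries, each equal to 1).
layers = [(np.zeros((3, 2, 2, 2)), np.ones(3))]
theta_hat = cnn_forward(x, layers)
```

With zero kernels and unit biases every one of the 27 output entries equals 1, so the summed output is 27; trained kernels would replace these placeholder values.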

Parameter             Value (or Type)
Loss function         MSE
Optimizer             Adam
Activation function   ReLU
Batch size            100

Experimental Setup
In this paper, we use two types of data. One is synthesized data generated based on the signal model in (2). We generated 4,000,000 synthesized data samples via MATLAB. Here, M, D, and the maximum mutual coupling strength were respectively set to 4, 512, and 0.05. $\alpha_0$ was fixed to 1. P was randomly set between 0 and 10, and $\alpha_p$ was randomly set between 0 and 0.5. The signal-to-noise ratio (SNR) of the synthesized data was also randomly set between 0 dB and 20 dB, where the SNR is defined as $10 \log_{10}(\sigma_s^2 / \sigma^2)$ [dB]. The other type of data is that collected with USRP. Note that these data may differ from the signal model in (2). If so, the estimation is expected to be inaccurate when the network is trained with synthesized data. Figure 2 shows the transmitter and the receiver used for the experiment. The transmitter mainly consists of a USRP 2954R and the transmitting antenna. The receiver mainly consists of a USRP 2955 and a receiving antenna array. Although the USRP 2954R and USRP 2955 support a frequency range of 10 MHz to 6 GHz, we set the carrier frequency to 5.8 GHz, which is the center frequency of the antennas. For this reason, the spacing between patch antennas in the array is designed to be half the wavelength at 5.8 GHz. The USRP 2954R generates a 5.8 GHz cosine wave, and the transmitting antenna emits the wave. Then, the USRP 2955 receives the cosine wave via the antenna array at a sampling rate of 1 MHz. By using GNU Radio, the covariance matrices of the received signals are collected with the USRP 2955. As shown in Figure 3, we collected data in two different environments, the indoor hallway and the outdoor parking lot. In the indoor hallway, the NLoS signals were expected to be stronger than those in the outdoor parking lot. For this reason, the DoA estimation was expected to be less accurate in the indoor hallway. The DoA estimation range is restricted to [40°, 140°] since the radiation pattern of each patch antenna in the array is directional.
From 40° to 140° in 10° increments, a total of 17,600 covariance matrices were collected, where half of the data were collected in the indoor hallway while the other half were collected in the outdoor parking lot. During the experiment, the transmitting power was set to 20 dBm, and the distance between the transmitter and the receiver was fixed to 6 m. With the collected data, the DoA estimation accuracy is analyzed in the following subsection.
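The numbers in this setup can be checked with a short calculation. The sketch below derives the half-wavelength element spacing at 5.8 GHz and the number of labelled angles. The per-angle counts assume that the 17,600 covariance matrices are spread evenly over the angles, which the text does not state explicitly.

```python
C_LIGHT = 299_792_458.0                  # speed of light in m/s
carrier_hz = 5.8e9                       # carrier frequency used in the experiment

wavelength_mm = C_LIGHT / carrier_hz * 1e3
spacing_mm = wavelength_mm / 2           # half-wavelength element spacing, about 25.8 mm

angles_deg = list(range(40, 141, 10))    # 40, 50, ..., 140 degrees
total_matrices = 17_600

# Assumption: the same number of covariance matrices was collected at every angle.
per_angle = total_matrices // len(angles_deg)
per_angle_per_env = per_angle // 2       # half indoor, half outdoor

print(f"spacing = {spacing_mm:.2f} mm, {len(angles_deg)} angles, "
      f"{per_angle} matrices per angle, {per_angle_per_env} per environment")
```

Under the even-spread assumption, each of the 11 angles has 1600 covariance matrices, 800 from each environment.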

Performance Analysis and Discussion
Before analyzing the DoA estimation performance, we checked the similarity between collected data and synthesized data. Since the DoAs of the multipath signals were not measured, we compared the collected covariance matrices with ideal covariance matrices. Ideal covariance matrices are the covariance matrices calculated without considering multipath signals and other hardware-induced errors. The ideal covariance matrix is defined as:

$$R_{\mathrm{ideal}}(\theta) = a(\theta)\, a^{H}(\theta), \quad \theta \in \Theta, \tag{10}$$

where $R_{\mathrm{ideal}}(\theta)$ denotes the ideal covariance matrix for DoA $\theta$. $\Theta$ is the set of labeled DoAs of the collected data, which equals {40°, 50°, 60°, 70°, 80°, 90°, 100°, 110°, 120°, 130°, 140°}. The correlation between collected covariance matrices and ideal covariance matrices is defined as:

$$\rho(\theta) = E\!\left[\frac{\langle R_{\mathrm{col}}(\theta), R_{\mathrm{ideal}}(\theta)\rangle}{\|R_{\mathrm{col}}(\theta)\|_F\, \|R_{\mathrm{ideal}}(\theta)\|_F}\right], \tag{11}$$

where $\rho(\theta)$ denotes the correlation for DoA $\theta$. $R_{\mathrm{col}}(\theta)$ denotes a collected covariance matrix whose DoA label is $\theta$. $\langle A, B\rangle$ denotes the correlation between two matrices, which equals $\mathrm{real}(\mathrm{trace}(A^{H} B))$. $E[\cdot]$ denotes the mean calculated over the collected data. $\rho(\theta) \in [0, 1]$, where $\rho(\theta) = 1$ when $R_{\mathrm{col}}(\theta)$ can be represented as $a R_{\mathrm{ideal}}(\theta)$ for a positive constant $a$. Figure 4 shows the correlation between collected covariance matrices and ideal covariance matrices according to the DoA and the experiment environment. As expected, the data collected in the indoor environment have a low correlation since they suffer from strong multipath signals. On the other hand, the data collected in the outdoor environment have a higher correlation since there are fewer objects that can create multipath signals. To analyze the DoA estimation performance, we compared five algorithms. One was MUSIC [2]. Two were based on the DNN and CNN in Section 3 and were trained with 4,000,000 synthesized data samples. The other two were also based on the DNN and CNN in Section 3, but they were trained with 75% of the 17,600 collected data points.
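The similarity check between collected and ideal covariance matrices can be sketched as follows. The sketch assumes the ideal covariance is the outer product of the steering vector with itself, and that the matrix inner product real(trace(A^H B)) is normalised by the two Frobenius norms, which is consistent with the stated range [0, 1] and with the value 1 for scaled copies. The steering-vector sign convention and the function names are assumptions for illustration.

```python
import numpy as np

def steering_vector(theta_deg, M):
    """Half-wavelength ULA manifold (assumed sign convention, 90 deg = broadside)."""
    m = np.arange(M)
    return np.exp(-1j * np.pi * m * np.cos(np.deg2rad(theta_deg)))

def ideal_covariance(theta_deg, M):
    """Covariance of a single noiseless LoS path without hardware errors: a(theta) a(theta)^H."""
    a = steering_vector(theta_deg, M)
    return np.outer(a, a.conj())

def inner(A, B):
    """Matrix inner product real(trace(A^H B))."""
    return float(np.real(np.trace(A.conj().T @ B)))

def correlation(R_cols, theta_deg):
    """Mean normalised correlation between collected matrices labelled theta and the ideal one."""
    M = R_cols[0].shape[0]
    R_id = ideal_covariance(theta_deg, M)
    vals = [inner(R, R_id) / (np.linalg.norm(R, "fro") * np.linalg.norm(R_id, "fro"))
            for R in R_cols]
    return float(np.mean(vals))

# A scaled copy of the ideal matrix is perfectly correlated with it, while white
# noise only (identity covariance, M = 4) gives 0.5.
rho_scaled = correlation([3.0 * ideal_covariance(60.0, 4)], 60.0)
rho_white = correlation([np.eye(4, dtype=complex)], 60.0)
```

The white-noise value of 0.5 for M = 4 gives a rough reference point: correlations close to it indicate that the LoS structure is barely visible in the collected matrix.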
When using the collected data for training networks, the remaining 25% of the 17,600 collected data points were used for testing the DoA estimation accuracy. The root mean squared error (RMSE) is defined as

$$\mathrm{RMSE} = \sqrt{E\!\left[\left(\theta_0 - \hat{\theta}_0\right)^2\right]}, \tag{12}$$

where $\theta_0$ and $\hat{\theta}_0$ respectively denote the true DoA and the DoA estimated using the test data. Figure 5 presents two results for the indoor environment, the RMSE of the DoA estimation algorithms and the histogram of estimation results. Both results are derived from the data collected indoors. As expected, Figure 5a,b show that the estimation accuracy is poor in the indoor environment due to the strong NLoS signals. Furthermore, the results also show that the signal model in (2) fails to capture the indoor propagation characteristics, since the estimation accuracy decreases when using synthesized data for training. However, the DNN and the CNN trained with collected data show much better performance than the others. The RMSE of the DNN and the CNN trained with collected data does not exceed 3.5° at any DoA. To be more specific, Figure 5b shows that the estimation results tend to gather around the actual DoA when using collected data. When using synthesized data, however, there is a difference between the mean of the estimation results and the actual DoA. Moreover, the variance of the estimation is high. Figure 6 presents two results for the outdoor environment, the RMSE of the DoA estimation algorithms and the histogram of estimation results. Since the NLoS signals are expected to be much weaker in the outdoor environment, the RMSEs of all algorithms are much lower than those in Figure 5a. The RMSE tended to increase as the DoA moved away from 90°. We think that this is due to the directivity of the antenna elements. The gain of each antenna element was 5 dBi when the DoA was 90°. However, the gain dropped to 2 dBi when the DoA was 40° or 140°.
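The evaluation procedure (a 75%/25% split of the collected data and an RMSE over the test estimates) can be sketched as below. The random permutation used for the split is an assumption, since the text does not say how the test samples were selected, and the function names are hypothetical.

```python
import numpy as np

def split_indices(n, train_frac=0.75, rng=None):
    """Randomly split n sample indices into training and test sets (assumed random split)."""
    rng = np.random.default_rng() if rng is None else rng
    perm = rng.permutation(n)
    n_train = int(round(n * train_frac))
    return perm[:n_train], perm[n_train:]

def rmse(theta_true, theta_est):
    """Root mean squared error between true and estimated DoAs, in degrees."""
    t = np.asarray(theta_true, dtype=float)
    e = np.asarray(theta_est, dtype=float)
    return float(np.sqrt(np.mean((t - e) ** 2)))

def rmse_per_doa(theta_true, theta_est):
    """RMSE computed separately for each labelled DoA."""
    t = np.asarray(theta_true, dtype=float)
    e = np.asarray(theta_est, dtype=float)
    return {float(a): rmse(t[t == a], e[t == a]) for a in np.unique(t)}

train_idx, test_idx = split_indices(17_600, rng=np.random.default_rng(0))
example_total = rmse([90, 90], [93, 87])
example_by_doa = rmse_per_doa([40, 40, 90], [41, 39, 90])
```

With 17,600 collected samples the split yields 13,200 training and 4,400 test samples; reporting the RMSE per labelled DoA corresponds to the curves discussed for Figures 5a and 6a.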
Overall, the DNN trained with synthesized data was more accurate than MUSIC except at a few DoAs, but the RMSE of the CNN trained with synthesized data was unexpectedly high when the DoA was 40°. Meanwhile, the DNN and the CNN trained with collected data were more accurate than the others. To be more specific, when using synthesized data, there was a difference between the mean of the estimation results and the actual DoA. However, this difference was smaller than that in Figure 5b. When using collected data, the estimation results tended to gather around the actual DoA. Table 3 shows the total RMSE of all DoA estimation algorithms. In the indoor environment, the algorithms other than those trained with collected data showed poor performance. Among them, the DNN and CNN trained with synthesized data showed slightly better performance than MUSIC. Although the performance of all algorithms improved in the outdoor environment, the DNN and CNN trained with collected data still showed much better performance than the others. Since the RMSE of the CNN trained with synthesized data soared when the DoA was 40°, its total RMSE was larger than that of MUSIC. Table 4 shows the training time, computation time, and computational complexity of each algorithm. Here, the computational complexity of the CNN is derived using [28]. The training time is proportional to the amount of training data. When training a network with 4,000,000 synthesized data points, it took 3500 and 4400 seconds to train the DNN and CNN, respectively. On the other hand, it took 190 and 230 seconds to train the DNN and CNN, respectively, when using 13,200 collected data points. Although the DNN- and CNN-based DoA estimation takes a long time to train, its computational complexity is much lower than that of MUSIC once the networks are trained. From all results, we conclude that training with collected data enables accurate DoA estimation. However, collecting sufficient data can be difficult in practice.
One solution to this problem is to use synthesized data that capture the characteristics of realistic waves well. Another solution is to use unsupervised learning such as [20]. Unsupervised learning can make collecting data much easier since data labeling is not required.

Conclusions
We present the experimental validation of the deep learning-based DoA estimation using USRP. The DNN and the CNN structures are designed to estimate the DoA of the LoS path with the covariance matrix. In the experiment, two types of data are exploited. One is the data synthesized with the signal model, and the other is the data collected by USRP. The experimental results show that the DoA estimation is most accurate when training DNN and CNN with the collected data. Furthermore, the DoA estimation performance is poor in the indoor environment, which suffers from the strong NLoS signals. However, collecting sufficient data may not be feasible in practice. We expect that this can be resolved by better signal modeling and unsupervised learning.

Conflicts of Interest:
The authors declare no conflict of interest.
