Fast Target Localization Method for FMCW MIMO Radar via VDSR Neural Network

Abstract: The traditional frequency-modulated continuous wave (FMCW) multiple-input multiple-output (MIMO) radar two-dimensional (2D) super-resolution (SR) estimation algorithm for target localization has high computational complexity, which runs counter to the increasing demand for real-time radar imaging. In this paper, a fast joint direction-of-arrival (DOA) and range estimation framework for target localization is proposed; it utilizes a very deep super-resolution (VDSR) neural network (NN) framework to accelerate the imaging process while ensuring estimation accuracy. Firstly, we propose a fast low-resolution imaging algorithm based on the Nystrom method. The approximate signal subspace matrix is obtained from partial data, and low-resolution imaging is performed on a low-density grid. Then, the bicubic interpolation algorithm is used to expand the low-resolution image to the desired dimensions. Next, the deep SR network is used to obtain the high-resolution image, and the final joint DOA and range estimation is achieved based on the reconstructed image. Simulations and experiments were carried out to validate the computational efficiency and effectiveness of the proposed framework.


Introduction
Frequency-modulated continuous wave (FMCW) technology has achieved great success in the field of communications and has broad prospects in applications such as altimeters [1], vehicle radars [2][3][4] and synthetic aperture radars (SARs) [5][6][7][8][9]. The merits of FMCW radars lie in their ranging ability and low power consumption [10,11]. Recently, the FMCW multiple-input multiple-output (MIMO) radar has been investigated; it has an equivalent virtual antenna array in which each pairing of a transmitting and a receiving element acts independently as a virtual element [12][13][14]. This greatly expands the aperture of the array but is accompanied by a sharp increase in data dimensions. Moreover, the received signal of an FMCW MIMO radar contains both direction-of-arrival (DOA) and range information, which can be used for target localization. Since the range resolution is proportional to the number of snapshots, expanding the data dimensions for target localization with traditional algorithms greatly reduces the real-time performance of the system.
In order to achieve high-accuracy target localization, joint DOA and range estimation with high resolution is the key issue, and 2D algorithms have been proposed as solutions [13,15,16]. The 2D fast Fourier transform (2D-FFT), a fast algorithm, can be used to estimate DOA and range. Unfortunately, due to the limitations of the Rayleigh criterion and the bandwidth of FMCW MIMO systems, its resolution is not satisfactory. To improve the resolution, a 2D multiple signal classification (2D-MUSIC) algorithm for joint angle and range estimation was presented, which achieved good performance on experimental data. However, the algorithm in [16] requires a huge covariance matrix and a 2D spectral peak search, which lead to high computational complexity. Thus, it is difficult for these methods to meet real-time requirements.
In recent years, with the rapid development of machine learning, super-resolution methods based on deep learning have become a hot topic [17], and their performance has come to significantly surpass that of conventional methods [18][19][20][21]. In [18], an SR algorithm based on a sparse dictionary and anchored neighborhood regression (ANR) was proposed; it offers superior reconstruction speed and quality. However, the anchored neighborhood projections of ANR are unstable in covering the variety of mapping relationships, so the method is not suitable for practical engineering problems. In [19], an image SR algorithm based on local linear regression and random forests was proposed. Its stability is higher than that of the method in [18], but it cannot handle SR tasks with different magnifications either. In [20], an improved self-similarity-based SR algorithm was proposed, which exploits the statistical prior of the image itself. However, the internal dictionary is not always suitable, which leads to performance loss. In [21], a convolutional neural network (CNN) was first used to implement image SR. Although this method achieves excellent results, its shallow network depth limits the receptive field, and it cannot support multiple magnification factors. As a result, it is difficult to apply to SR tasks involving radar images with multiple sizes and grids. The emergence of very deep super-resolution (VDSR) [17] was a qualitative leap for networks based on a pure CNN architecture. With its very deep network structure and flexible image magnification, VDSR is well suited to the problem of radar image super-resolution. This approach focuses on the brightness channel of the image, reconstructs the brightness residual between the high-resolution and low-resolution images and finally obtains the high-resolution image.
It is worth noting that a radar image can be regarded as a color image with only a brightness channel; thus, the VDSR framework is well suited to reconstructing radar images.
In this paper, we propose a fast joint DOA and range estimation framework based on a VDSR neural network to accelerate the estimation process without precision loss. The proposed framework splits the estimation process into two parts. In the first part, to address the high computational cost that the traditional 2D-MUSIC algorithm incurs during covariance decomposition, the Nystrom method [22] is introduced to use the covariance of partial data and obtain an approximate signal subspace. This procedure avoids the calculation of the original covariance matrix. Then, a low-density grid is used to generate small-size, low-resolution images, avoiding a massive 2D peak search. The second part focuses on improving the estimation accuracy of the whole framework. The VDSR network is used to construct a high-resolution image from the low-resolution image obtained in the first part. Finally, the DOA and range are estimated from the peaks of the reconstructed image. The simulation results show that the proposed algorithm is much faster than traditional high-precision solutions, and the experimental results further verify its performance.
The main contributions of our work are summarized as follows: (1) A fast joint DOA and range estimation framework based on a VDSR neural network is proposed. The framework can estimate the DOA and the range of FMCW MIMO radar in a computationally efficient manner without precision loss.
(2) The proposed framework uses the Nystrom method to reduce the computational complexity of the high-dimensional matrix signal subspace, and VDSR to ensure the accuracy of the estimation.
(3) Simulations and experiments were carried out to validate the proposed framework, and it is demonstrated that running time is greatly reduced without loss of accuracy.
The rest of the paper is organized as follows. In Section 2, the problem is formulated and the data model is presented. A fast imaging algorithm based on the Nystrom method and a VDSR-based super-resolution imaging method for FMCW MIMO radar are presented in Section 3. The training strategies are introduced in Section 4. Simulation and experimental results are then used to demonstrate the superiority of the proposed method over the traditional 2D-MUSIC method. The paper is concluded in Section 5.
The notation related to this paper is shown in Table 1.

Data Model
Consider a Texas Instruments (TI) Cascade FMCW MIMO radar system consisting of an MMWAVCAS-RF-EVM and an MMWAVCAS-DSP-EVM, shown in Figure 1, with 12 transmitting elements and 16 receiving elements. As shown in Figure 2, the transmitting and receiving elements form a large virtual array; i.e., a virtual element is generated at the midpoint of each pair of transmitting and receiving elements. λ denotes the wavelength. We selected a row of M purple virtual array elements to form a uniform linear array (ULA). As shown in Figure 3, where θ and d denote the DOA and the interspacing of the elements, respectively, the FMCW signal transmitted from the transmitting element can be expressed as follows:

s(t) = exp( j2π( f_c t + (k_s/2) t² ) ),    (1)

where f_c and k_s denote the carrier frequency and the slope of the chirps, respectively. For K far-field narrow-band stationary targets, the transmitted signal is reflected by the targets, and the received signal of the m-th receiving element can be represented as

x_m(t) = Σ_{k=1}^{K} γ_k s(t − τ_mk) + n_m(t),    (2)

where γ_k is the complex reflection coefficient of the k-th target, n_m(t) is the additive white Gaussian noise (AWGN) at the m-th receiving element and the time delay τ_mk is the time taken for the signal radiated from the transmitting element to be reflected by the k-th target and received by the m-th receiving element, given as

τ_mk = (R_k + R_mk)/c,    (3)

where c is the speed of light; R_k is the distance between the transmitting element and the k-th target; R_mk is the distance between the m-th receiving element and the k-th target; and θ_k denotes the DOA of the k-th target. Let y_0m denote the relative position of the m-th receiving element with respect to the transmitting element; for the selected virtual ULA, y_0m = d(m − 1). Under the far-field assumption, the time delay τ_mk can then be expressed as

τ_mk ≈ (2R_k + d(m − 1) sin θ_k)/c.    (4)

The received signals x_m(t) are multiplied by the transmitted signal and passed through a low-pass filter (LPF); with a sampling time T_s, x_m(t) can be sampled as

x_m(nT_s) = Σ_{k=1}^{K} γ_k exp( j2π( f_c τ_mk + k_s τ_mk nT_s ) ) + n_m(nT_s),    (5)

where n_m(nT_s) is the sampled noise.
Since 2R_k/c ≫ d(m − 1) sin θ_k/c, and ignoring noise for the moment, the received signal of the m-th element at time n can be expressed as

x_m(n) = Σ_{k=1}^{K} γ_k exp( j2π( d(m − 1) sin θ_k/λ + (2k_s R_k/c) nT_s + 2f_c R_k/c ) ).    (6)

Based on Equation (6), for L snapshots, the matrix form of the received signal with additive white Gaussian noise (AWGN) can be expressed as

X = AS + N,    (7)

where A is the DOA steering matrix, S collects the target terms over the snapshots and N is the noise matrix. In order to achieve super-resolution imaging, a new receiving data model is obtained by transposition and vectorization:

X̄ = ĀS̄ + N̄,    (9)

where X̄ ∈ C^{ML×L}, Ā ∈ C^{ML×K} is the joint DOA-range steering matrix and S̄ and N̄ are the correspondingly reshaped signal and noise terms.
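The sampled beat-signal model above can be illustrated with a minimal NumPy sketch. All parameter values (carrier, chirp slope, spacing, sampling interval, target set) are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# Hypothetical system parameters, chosen only for illustration.
c = 3e8            # speed of light (m/s)
fc = 77e9          # carrier frequency f_c (Hz)
ks = 30e12         # chirp slope k_s (Hz/s)
Ts = 1e-7          # sampling interval T_s (s)
M, L = 16, 64      # virtual elements, snapshots
lam = c / fc
d = lam / 2        # element interspacing

# Targets: (DOA in degrees, range in metres, reflection coefficient)
targets = [(-14.5, 4.5, 1.0), (5.5, 6.5, 0.8)]

rng = np.random.default_rng(0)
X = np.zeros((M, L), dtype=complex)
for theta, R, gamma in targets:
    for m in range(M):
        # Round-trip delay plus the DOA-dependent delay across the virtual ULA
        tau = 2 * R / c + d * m * np.sin(np.deg2rad(theta)) / c
        n = np.arange(L)
        # Beat signal after mixing with the transmit chirp and low-pass filtering
        X[m] += gamma * np.exp(1j * 2 * np.pi * (fc * tau + ks * tau * n * Ts))
# Additive white Gaussian noise
X += 0.01 * (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L)))

print(X.shape)  # (16, 64): M elements by L snapshots
```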

Fast Joint DOA and Range Estimation
Consider the data model of Equation (9). The sampling covariance can be expressed as

R = (1/L) X̄ X̄^H = Ā R_s Ā^H + σ² I,    (10)

where R_s = E[ S̄ S̄^H ] is the signal covariance and σ² is the noise power. The above equation can be decomposed into the signal subspace and the noise subspace using eigenvalue decomposition:

R = U_s Λ_s U_s^H + U_n Λ_n U_n^H,    (11)

where Λ_s is a diagonal matrix composed of the K largest eigenvalues and Λ_n is a diagonal matrix composed of the ML − K smaller eigenvalues. U_s is the signal subspace composed of the eigenvectors corresponding to the K largest eigenvalues, and U_n is the noise subspace composed of the remaining eigenvectors.
Since the signal subspace is orthogonal to the noise subspace, i.e., Ā^H U_n = 0, the spatial spectrum function of the 2D-MUSIC algorithm is

P_0(θ, R) = 1 / ( ā^H(θ, R) U_n U_n^H ā(θ, R) ),    (12)

where ā(θ, R) is the joint DOA-range steering vector. Therefore, the K largest peaks of P_0 yield the DOA and range estimates of the targets. The computation of 2D-MUSIC is dominated by the 2D spatial search and the matrix decomposition, whose computational complexities are O(q²(ML)²) and O((ML)³), respectively, where q is the number of grid points per dimension. As the number of snapshots and the grid density increase, the amount of calculation grows rapidly, which seriously affects the real-time performance of the system.
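The 2D-MUSIC search described above can be sketched in NumPy for a toy vectorized model. The steering function, grid, snapshot count and target below are all illustrative assumptions:

```python
import numpy as np

# Illustrative parameters (not the paper's configuration)
c, fc, ks, Ts = 3e8, 77e9, 30e12, 1e-7
M, L, K = 8, 16, 1
lam = c / fc
d = lam / 2

def steer(theta_deg, R):
    """ML x 1 joint DOA-range steering vector of the vectorized beat signal."""
    tau = 2 * R / c + d * np.arange(M)[:, None] * np.sin(np.deg2rad(theta_deg)) / c
    n = np.arange(L)[None, :]
    return np.exp(1j * 2 * np.pi * (fc * tau + ks * tau * n * Ts)).ravel()

rng = np.random.default_rng(1)
a0 = steer(10.0, 5.0)                        # one target at (10 deg, 5 m)
X = np.stack([a0 * (rng.standard_normal() + 1j * rng.standard_normal())
              + 0.01 * (rng.standard_normal(M * L) + 1j * rng.standard_normal(M * L))
              for _ in range(200)], axis=1)  # ML x 200 snapshots

Rcov = X @ X.conj().T / X.shape[1]           # sampling covariance (Eq. (10))
w, V = np.linalg.eigh(Rcov)                  # eigenvalues in ascending order
Un = V[:, :-K]                               # noise subspace (all but K largest)
UnUn = Un @ Un.conj().T

thetas = np.arange(-20.0, 21.0, 2.0)         # coarse DOA grid (deg)
ranges = np.arange(3.0, 8.1, 0.5)            # coarse range grid (m)
P = np.empty((thetas.size, ranges.size))
for i, th in enumerate(thetas):
    for j, R in enumerate(ranges):
        a = steer(th, R)
        P[i, j] = 1.0 / np.abs(a.conj() @ UnUn @ a)  # MUSIC pseudo-spectrum

i, j = np.unravel_index(P.argmax(), P.shape)
print(thetas[i], ranges[j])                  # peak at the true (DOA, range) cell
```

The nested loop makes the cost of the 2D search explicit: every grid point requires a quadratic form against the ML x ML noise projector, which is exactly what the Nystrom/VDSR framework below seeks to avoid.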
To solve this problem, we propose a framework for fast joint DOA and range estimation via the Nystrom method and VDSR. The structure of the proposed framework is shown in Figure 4, and is divided into four parts: reshaping the received data, using the Nystrom method to estimate the subspace, using 2D-MUSIC for low-resolution imaging and using VDSR to reconstruct a high-resolution image. The reshaping part transforms the original data into the partitioned sampling covariance matrix, as shown in Equations (8) and (10). The signal subspace is obtained using a low-dimensional approximation in the Nystrom part, which reduces the computational load of the eigenvalue decomposition. In the 2D-MUSIC part, a low-resolution image on a low-density grid is obtained using the signal subspace. The joint DOA and range estimation is then obtained from the VDSR part through a 2D peak search on the high-resolution image.

Nystrom-Based Low-Resolution Imaging
In this part, the Nystrom method is used to estimate the signal subspace. Then, the 2D-MUSIC spatial spectrum function is formulated based on the signal subspace and the low-resolution image is obtained.
The covariance matrix R is partitioned as follows:

R = [ R_11  R_12 ; R_21  R_22 ],    (13)

where R_11 ∈ C^{z×z} and R_21 ∈ C^{(ML−z)×z}. The approximate signal subspace of the covariance matrix R is obtained by using the Nystrom method, which only requires the information in R_11 and R_21. Thus, it is not necessary to calculate the full covariance matrix R. This information can be obtained by partitioning the received data X̄ as follows:

X̄ = [ X̄_1 ; X̄_2 ],    (14)

where X̄_1 ∈ C^{z×L} and X̄_2 ∈ C^{(ML−z)×L}, so that R_11 = (1/L) X̄_1 X̄_1^H and R_21 = (1/L) X̄_2 X̄_1^H.
According to Equations (13) and (14), we have

R_11 = Ā_1 R_s Ā_1^H + σ² I_z,    (15)
R_21 = Ā_2 R_s Ā_1^H,    (16)

where Ā_1 and Ā_2 are matrices composed of the first z rows and the last ML − z rows of Ā, respectively. By applying eigenvalue decomposition to R_11, we obtain

R_11 = U_11 Λ_11 U_11^H.    (17)

The approximate eigenmatrix is obtained using the Nystrom extension:

Ũ = [ R_11 ; R_21 ] U_11 Λ_11^{−1}.    (18)

In Equation (18), the matrix Ũ does not satisfy the mutual orthogonality of the column vectors, so the following orthogonalization operation is adopted. Let G = Ũ Λ_11^{1/2} and apply eigenvalue decomposition to G^H G:

G^H G = V Λ_G V^H.    (19)

The approximate eigenmatrix satisfying the orthogonality of columns can then be obtained as

U = G V Λ_G^{−1/2}.    (20)

The approximate signal subspace comprises the K columns of U associated with the K largest approximate eigenvalues.
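The Nystrom subspace estimate can be sketched in NumPy. The synthetic low-rank-plus-noise data below stands in for the vectorized beat signal, and the sizes are illustrative; the comparison at the end checks the approximate subspace against the one from a full eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(2)
ML, L, z, K = 64, 100, 16, 2   # illustrative sizes: full dim, snapshots, block size, targets

# Synthetic data X = A S + noise standing in for the vectorized received signal
A = rng.standard_normal((ML, K)) + 1j * rng.standard_normal((ML, K))
S = rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))
X = A @ S + 0.05 * (rng.standard_normal((ML, L)) + 1j * rng.standard_normal((ML, L)))

X1, X2 = X[:z], X[z:]                  # partition the received data (Eq. (14))
R11 = X1 @ X1.conj().T / L             # z x z covariance block only
R21 = X2 @ X1.conj().T / L             # (ML - z) x z covariance block only

w11, U11 = np.linalg.eigh(R11)         # small eigendecomposition (Eq. (17))
Ut = np.vstack([R11, R21]) @ U11 / w11 # Nystrom extension (Eq. (18))
G = Ut * np.sqrt(w11)                  # G = U~ Lambda11^{1/2}
wg, Vg = np.linalg.eigh(G.conj().T @ G)          # Eq. (19)
U = G @ Vg / np.sqrt(np.abs(wg))       # orthonormalized eigenmatrix (Eq. (20))
Us = U[:, -K:]                         # K dominant columns: approximate signal subspace

# Reference: exact signal subspace from the full ML x ML covariance
wf, Vf = np.linalg.eigh(X @ X.conj().T / L)
Us_exact = Vf[:, -K:]
# Singular values of Us^H Us_exact are cosines of the principal angles
overlap = np.linalg.svd(Us.conj().T @ Us_exact, compute_uv=False)
print(overlap.min())                   # close to 1: the subspaces nearly coincide
```

Only the z x z block is eigendecomposed, which is the source of the computational saving relative to decomposing the full ML x ML covariance.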

Lemma 1 ([22]). We extend the lemma from the array case to FMCW MIMO. In the equivalent virtual array of the FMCW MIMO radar, if there are K targets, we have span{U_s} = span{F_K}, where F_K represents the first K columns of U.
From Equations (17), (19) and (20), it can be deduced that

U = [ R_11 ; R_21 ] D,

where D = U_11 Λ_11^{−1/2} V Λ_G^{−1/2} is nonsingular. From Equations (14) and (15), we have

[ R_11 ; R_21 ] = Ā R_s Ā_1^H + σ² [ I_z ; 0 ].

According to Equations (10) and (11), R_s can be expressed as

R_s = (Ā^H Ā)^{−1} Ā^H ( R − σ² I ) Ā (Ā^H Ā)^{−1}.    (25)

By introducing Equation (25) into H = R_s + σ² (Ā^H Ā)^{−1}, the Nystrom approximate eigenmatrix can be written as

U = Ā H Ā_1^H D = Ā J,    (26)

where J = H Ā_1^H D. As span{U_s} = span{Ā}, there exists a nonsingular matrix T such that U_s = Ā T holds. Substituting this matrix into Equation (26), we have U = U_s T^{−1} J. From the above analysis, the first K columns of Ā_1^H are linearly independent because Ā_1^H has a K × z Vandermonde structure, and H and D are nonsingular matrices. Then, we have

F_K = Ā T̃,

where T̃ represents the first K columns of J, and span{U_s} = span{F_K} holds.
Using the approximate signal subspace U_s and setting low-density grids, the following 2D-MUSIC spatial spectrum function is formulated:

P_Low(θ, R) = 1 / ( ā^H(θ, R) ( I − U_s U_s^H ) ā(θ, R) ).    (29)

A low-resolution gray image can be obtained by normalizing the above spectrum as follows:

P_gray = ( P_Low − min(P_Low) E_Low ) / ( max(P_Low) − min(P_Low) ),    (30)

where E_Low is a matrix of the same dimensions as P_Low in which all elements are 1.
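The normalization step maps the spectrum into a [0, 1] gray image; a minimal sketch (the input values are made up):

```python
import numpy as np

def to_gray(P):
    """Min-max normalize a 2D spectrum into a [0, 1] gray image."""
    P = np.asarray(P, dtype=float)
    return (P - P.min()) / (P.max() - P.min())

img = to_gray([[1.0, 2.0], [3.0, 5.0]])
print(img)   # all values in [0, 1], with the spectrum peak mapped to 1
```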

VDSR-Based High-Resolution Imaging
VDSR is a CNN architecture designed to perform single-image SR [17]. A VDSR network learns the mapping between a low-resolution image and a high-resolution image through a very deep CNN structure. Different from traditional CNNs [21], VDSR aims to reconstruct the residual between the low-resolution and high-resolution images; this residual carries the missing high-frequency information. By using bicubic interpolation to upscale the low-resolution image, the dimensions of the input image and the desired output image can be matched. In addition, we use the bicubic interpolation method to generate the training set. If the interpolation method is changed, the same interpolation algorithm simply needs to be used both when generating the training set and when performing the actual super-resolution.
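The upscale-then-refine pipeline can be sketched in a few lines of NumPy. Here a nearest-neighbour `np.kron` upscale stands in for the bicubic interpolation, and a zero residual is a placeholder for the VDSR regression output; both are illustrative simplifications:

```python
import numpy as np

low = np.random.default_rng(3).random((18, 12))  # low-resolution spectrum image in [0, 1]

# Upscale to the fine grid so input and output dimensions match.
# np.kron gives a nearest-neighbour upscale; the paper uses bicubic interpolation.
up = np.kron(low, np.ones((10, 10)))

residual = np.zeros_like(up)              # placeholder for the network-predicted residual
high = np.clip(up + residual, 0.0, 1.0)   # superimpose the residual onto the upscaled image
print(high.shape)                         # (180, 120): the desired high-resolution grid
```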
The small-size, low-resolution FMCW MIMO radar image obtained using the method of the previous section is gray and can be regarded as an RGB image with only a brightness channel. The VDSR network extracts the residual image from the luminance channel of a color image; thus, the VDSR framework is very suitable for this SR task.
As shown in Figure 5, the VDSR network is a cascade of convolutional and ReLU layer pairs. It takes an interpolated low-resolution image as input and predicts a residual image as the regression output. By superimposing the two images, a high-resolution image is obtained. It should be noted that, to maintain the sizes of all feature maps, zero-padding is applied before each convolution. Some sample feature maps were drawn for visualization; most of their values were zero after applying the ReLU. The detailed structural parameters of the VDSR network are shown in Table 2. The training dataset, which can be found in [23], consists of 20,000 natural images. The experimental platform was a PC with an Intel i9-10920X CPU, an RTX 3090 GPU and 64 GB of RAM. The stochastic gradient descent algorithm with momentum (SGDM, momentum 0.9) and an initial learning rate of 0.1 was used, and the learning rate was reduced every 10 epochs. The maximum number of epochs was set to 100, and a mini-batch of 64 observations was used at each iteration. Training took about 2.1 h. The training procedure was offline, so the training time is not counted in the runtime of the proposed method.
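The quoted training schedule (SGDM with momentum 0.9, initial learning rate 0.1, reduced every 10 epochs) amounts to a step-decay rule, sketched below. The drop factor of 0.1 is an assumption, since the text does not state it:

```python
def learning_rate(epoch, lr0=0.1, drop=0.1, period=10):
    """Step-decay schedule: multiply the rate by `drop` every `period` epochs."""
    return lr0 * drop ** (epoch // period)

# Rate at a few representative epochs of the 100-epoch run
for e in (0, 10, 25, 99):
    print(e, learning_rate(e))
```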

Simulations and Experiments
Several simulations and experiments were carried out to validate the performance of the proposed method. First, the accuracy of the proposed algorithm is compared with that of the original 2D-MUSIC algorithm [15]; then, the computational complexity is verified; finally, the algorithms are applied to experimental data. The TI Cascade FMCW MIMO Radar parameters shared by the simulations and experiments are shown in Table 3.

Simulations
To verify the performance of the overall framework, consider two far-field narrow-band stationary targets at (−14.5°, 4.5 m) and (5.5°, 6.5 m). The localization performance was evaluated using the root mean square error (RMSE) metric. Differently from the single-parameter estimation in [24,25], for the multi-parameter estimation problem we defined the RMSE of the DOA and the RMSE of the range as follows:

RMSE_DOA = sqrt( (1/(KT)) Σ_{t=1}^{T} Σ_{k=1}^{K} ( θ̂_{k,t} − θ_k )² ),
RMSE_Range = sqrt( (1/(KT)) Σ_{t=1}^{T} Σ_{k=1}^{K} ( R̂_{k,t} − R_k )² ),

where θ_k and R_k are the actual DOA and range of the k-th target, θ̂_{k,t} and R̂_{k,t} are the corresponding estimates in the t-th Monte Carlo trial and T is the number of trials. As shown in Figures 6 and 7, the RMSE of the proposed algorithm is better than that of 2D-MUSIC with a low-density grid but worse than that of 2D-MUSIC with a high-density grid. Due to the existence of grid errors, the estimation error of 2D-MUSIC with a grid of [1°, 1 m] cannot be reduced below a certain value by increasing the SNR. Moreover, as shown in Figure 8, the runtime of the proposed algorithm is shorter than that of 2D-MUSIC.
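The RMSE definitions above translate directly to NumPy; the estimates below are made-up values for illustration only:

```python
import numpy as np

def rmse(true, est):
    """RMSE over K targets and T Monte Carlo trials.

    true: length-K vector of actual parameters (DOA or range).
    est:  T x K array, est[t, k] = estimate of target k in trial t.
    """
    true = np.asarray(true, dtype=float)
    est = np.asarray(est, dtype=float)
    return float(np.sqrt(np.mean((est - true[None, :]) ** 2)))

doa_true = [-14.5, 5.5]                               # degrees
doa_est = [[-14.4, 5.6], [-14.6, 5.5], [-14.5, 5.3]]  # three hypothetical trials
print(rmse(doa_true, doa_est))
```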

Experiments
The experimental data were collected with the TI Cascade FMCW MIMO Radar shown in Figure 1. The experimental site was a microwave anechoic chamber with metal reflectors, and the number of snapshots was 75. First, the 2D-MUSIC algorithm is compared with the Nystrom-based 2D-MUSIC algorithm, and then the 2D-MUSIC algorithm is compared with the VDSR-based 2D-MUSIC algorithm. Figure 12a,b shows comparisons between the 2D-MUSIC algorithm and the Nystrom-based 2D-MUSIC algorithm for the localization results of one and two targets with experimental data. It can be seen from the figures that the performance of the Nystrom-based 2D-MUSIC algorithm is similar to that of 2D-MUSIC, which shows that the subspace obtained by the Nystrom method has high accuracy in practical applications. To obtain the low-resolution image, the low-density grid intervals were set to [1°, 1 m]. Figure 15a,b shows the low-resolution imaging results obtained from the experimental data of one and two targets, respectively, using the method proposed in Section 3.1. The peaks in the 2D images do not represent the targets' positions accurately because of the coarse grid division. Figure 16a,b shows the residual images obtained via VDSR from the low-resolution images of the one- and two-target experimental data, with resolutions of 0.1° and 0.1 m, respectively. It can clearly be observed that the missing high-frequency information of the low-resolution images was reconstructed to correct the peaks and edges. Figure 17a,b shows the high-resolution images of the single- and double-target experimental data, respectively, which complement the details of the low-resolution images. As can be seen from the figures, the image peaks are not very sharp, which indicates that as the distance between targets decreases, the grid needs to be further refined to achieve better results.
As the validation on experimental data does not require a large number of Monte Carlo experiments, a fine grid of [0.1°, 0.1 m] was adopted. As shown in Figure 18, the estimation with 2D-MUSIC needed several minutes; in contrast, the proposed algorithm only took 0.45 s, which shows the real-time advantage of the algorithm. Figure 19a,b shows the imaging results of the one- and two-target experimental data obtained using the original 2D-MUSIC. It can be seen that with a complete noise subspace and a fine grid, 2D-MUSIC can achieve very sharp spatial spectrum peaks. However, these results come at the cost of extremely long running times, and the real-time performance is extremely poor. Figure 20a,b shows comparisons between the 2D-MUSIC algorithm and the proposed algorithm for the localization results of one and two targets with experimental data. It can be seen from the figures that the performance of the proposed algorithm is similar to that of 2D-MUSIC, despite the fact that our algorithm is much faster. In addition, as both algorithms exhibit similar offsets, the calibration of the radar needs to be improved in future work.

Conclusions
In this paper, a fast joint direction-of-arrival (DOA) and range estimation framework for target localization based on a VDSR neural network was proposed. With the proposed algorithm, both the estimation error and the running time can be effectively decreased.
Simulations and experiments have shown that the real-time performance of the proposed algorithm is far better than that of the traditional 2D-MUSIC algorithm. Although the proposed method has achieved good results, it is still limited to the x86 platform and has not been implemented on embedded hardware such as FPGAs. In addition, the network structure is not optimal; some excellent compact network structures [26,27] could be used to further improve network efficiency. Therefore, in future work, improvements to this method will be made, and real-time signal processing will be implemented on an embedded hardware platform.