Scalable ESPRIT Processor for Direction-of-Arrival Estimation of Frequency Modulated Continuous Wave Radar

Abstract: The estimation of signal parameters via rotational invariance techniques (ESPRIT) is an algorithm that exploits the shift-invariance property of an array antenna to estimate the direction-of-arrival (DOA) of signals received by the array. Because the ESPRIT algorithm requires high-complexity operations such as covariance matrix calculation and eigenvalue decomposition, a hardware processor must be implemented to estimate the DOA in real time. In addition, because the performance of ESPRIT depends on the number of antennas, an ESPRIT processor should support a scalable number of antennas so that it can serve various applications. Therefore, we propose an ESPRIT processor that supports scalable configurations of two to eight antennas. Because the proposed ESPRIT processor is based on the multiple invariances (MI) algorithm, it achieves much better performance than the existing ESPRIT processor. The execution time is reduced by simplifying the Jacobi method, which accounts for the most significant computational complexity of the eigenvalue decomposition (EVD) in ESPRIT. The ESPRIT processor was designed using a hardware description language (HDL) and verified on an FPGA. It was implemented with 10,088 slice registers, 18,207 LUTs, and 80 DSPs, which are up to 71.45%, 54.5%, and 68.38% fewer slice registers, LUTs, and DSPs, respectively, than the existing structure.


Introduction
Direction-of-arrival (DOA) estimation of a signal using an array antenna system is widely used in applications such as radar and communication [1][2][3][4]. The multiple signal classification (MUSIC) and estimation of signal parameters via rotational invariance techniques (ESPRIT) algorithms are the most widely used DOA estimation algorithms [5,6]. The MUSIC algorithm computes a spatial spectrum by exploiting the principle that the steering vector of the received signal is orthogonal to the noise subspace, and the DOA of the signal is then obtained by finding the peaks of this spectrum [7]. The MUSIC algorithm therefore has high computational complexity, because it requires an exhaustive peak search over the entire spectrum [8].
In contrast, the ESPRIT algorithm estimates the DOA by exploiting the shift invariance between two subarrays of an array antenna [9]. Because the ESPRIT algorithm obtains the DOA without searching for a peak in a spectrum, its computational complexity is lower than that of the MUSIC algorithm [10,11]. Consequently, ESPRIT offers a favorable tradeoff between performance and hardware complexity [12]. However, ESPRIT algorithms still require high-complexity operations such as covariance matrix calculation and eigenvalue decomposition (EVD), so a hardware implementation is necessary to estimate the DOA in real time.
ESPRIT processors used in various applications require high-precision DOA estimation as well as real-time processing. The performance of ESPRIT depends on the number of antennas, and the appropriate number of antennas depends on the application [13,14]. Applications that require high DOA precision need a large number of antennas, whereas applications that require a low-cost implementation need a small number. Therefore, an ESPRIT processor should support a variable number of antennas so that it can serve various applications. Many studies have been conducted to exploit the multiple invariances of antenna arrays [15,16]. Xu proposed the multiple invariances (MI)-ESPRIT algorithm, which improves performance at low computational cost by exploiting the multiple invariances of the array [17]. Because the MI-ESPRIT algorithm performs its calculations in units of subarrays, it is well suited to applications that require a variable number of antennas.
Various studies have addressed the real-time implementation of ESPRIT. Alhamed implemented a low-complexity ESPRIT processor on an FPGA for real-time processing, performing the EVD with the QR decomposition method [18]. Hussain designed an ESPRIT processor using LU decomposition instead of the more frequently used QR decomposition to lower the hardware complexity [19]; however, Hussain's ESPRIT processor supports only a four-antenna configuration. Another way to implement the EVD with low complexity is the cyclic Jacobi method [20]. This method repeatedly applies plane rotations to compute the eigenvalues and eigenvectors, which yields lower complexity than QR and LU decomposition. However, both the number of iterations and the execution time increase rapidly with the number of antennas [21].
In this paper, we propose an ESPRIT processor that supports two to eight scalable antennas. Because it exploits multiple invariances through the MI-ESPRIT structure, the proposed ESPRIT processor achieves much better performance than the existing ESPRIT processor. The hardware complexity is reduced by simplifying the least-squares method, and the execution time is dramatically reduced by decreasing the number of iterations of the cyclic Jacobi method. The remainder of this paper is organized as follows. In Section 2, we review the ESPRIT algorithm for estimating the DOA. The hardware architecture of the proposed ESPRIT processor is described in Section 3. In Section 4, we present the implementation results of the proposed ESPRIT processor. Finally, Section 5 concludes the paper.

Signal Model
The ESPRIT algorithm estimates the DOA using the data received by a uniform linear array (ULA) antenna. As shown in Figure 1, when the signals reflected from K targets are incident at different DOAs θ on a ULA composed of M sensors, the received signal can be modeled as follows:
$$X(t) = A S(t) + N(t) \tag{1}$$
In Equation (1), $X(t) = [x_1(t), x_2(t), \cdots, x_M(t)]^T$ is the M × 1 output vector of the antenna array, and $x_m(t)$ (m = 1, 2, ..., M) is the output of the m-th element at time t. $S(t) = [s_1(t), s_2(t), \cdots, s_K(t)]^T$ is the K × 1 vector of the signals reflected from the K targets, and $s_k(t)$ (k = 1, 2, ..., K) is the k-th signal at time t. $N(t) = [n_1(t), n_2(t), \cdots, n_M(t)]^T$ is the M × 1 vector of additive white Gaussian noise (AWGN), in which the noise components of different array elements are uncorrelated, and $n_m(t)$ (m = 1, 2, ..., M) is the noise of the m-th element at time t. A is an M × K matrix composed of the steering vectors $\bar{a}(\theta_k)$, each of which has a constant phase difference between adjacent elements determined by the incident angle of the corresponding received signal, as shown in Equations (2) and (3).
$$A = [\bar{a}(\theta_1), \bar{a}(\theta_2), \cdots, \bar{a}(\theta_K)] \tag{2}$$
$$\bar{a}(\theta_k) = \left[1, e^{j\frac{2\pi d}{\lambda}\sin\theta_k}, e^{j\frac{2\pi d}{\lambda}2\sin\theta_k}, \cdots, e^{j\frac{2\pi d}{\lambda}(M-1)\sin\theta_k}\right]^T \tag{3}$$
where $\bar{a}(\theta_k)$ is the steering vector of the k-th source, λ is the wavelength of the signal, d is the spacing between adjacent elements of the ULA, and $\theta_k$ is the DOA of the k-th source. To apply the ESPRIT algorithm, the covariance matrix of the received signal X(t) in Equation (1) is expressed as Equation (4):
$$R = E\left[X(t)X^H(t)\right] = A R_S A^H + \sigma^2 I_M \tag{4}$$
where E[·] denotes statistical expectation, the superscript H denotes the conjugate transpose, $R_S = E[S(t)S^H(t)]$ is the covariance matrix of the signals, σ² is the variance of the AWGN, and $I_M$ is the M × M identity matrix. Because the true expectation cannot be obtained, the covariance matrix is estimated by averaging over N snapshots, as shown in Equation (5):
$$\hat{R} = \frac{1}{N}\sum_{n=1}^{N} X(n)X^H(n) \tag{5}$$
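As a rough illustration of Equations (1)–(5), the following pure-Python sketch generates snapshots for a single source (K = 1) on an M-element ULA and forms the sample covariance matrix. The function names, the half-wavelength spacing (d/λ = 0.5), the unit-modulus source samples, the noise scaling, and the e^{+j} phase sign (chosen to match the convention used for Φ and z_k in the ESPRIT derivation) are assumptions made for this example, not details taken from the paper.

```python
import cmath
import math
import random


def steering_vector(theta_deg, m_elems, d_over_lambda=0.5):
    """Steering vector a(theta) for an M-element ULA (e^{+j...} phase convention)."""
    mu = 2 * math.pi * d_over_lambda * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * m * mu) for m in range(m_elems)]


def make_snapshots(theta_deg, m_elems, n_snap, snr_db, seed=0):
    """Draw N snapshots of X(t) = a(theta) s(t) + N(t) for one source (K = 1)."""
    rng = random.Random(seed)
    a = steering_vector(theta_deg, m_elems)
    sigma = 10 ** (-snr_db / 20)  # noise std relative to a unit-power signal
    snaps = []
    for _ in range(n_snap):
        s = cmath.exp(1j * rng.uniform(0, 2 * math.pi))  # unit-modulus source sample
        snaps.append([
            # complex AWGN with total variance sigma^2 (sigma^2 / 2 per component)
            a[m] * s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) / math.sqrt(2)
            for m in range(m_elems)
        ])
    return snaps


def sample_covariance(snaps):
    """Eq. (5): R = (1/N) * sum_n x(n) x(n)^H."""
    n, m_elems = len(snaps), len(snaps[0])
    return [[sum(x[i] * x[j].conjugate() for x in snaps) / n for j in range(m_elems)]
            for i in range(m_elems)]


R = sample_covariance(make_snapshots(theta_deg=20.0, m_elems=4, n_snap=64, snr_db=20))
```

With N = 1, `sample_covariance` reduces to the outer product x x^H, which is the single-snapshot case the processor targets.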
After the covariance matrix is obtained from Equation (5), its eigenvalues and eigenvectors are computed through eigenvalue decomposition. The signal subspace is then formed by the K eigenvectors associated with the K largest eigenvalues.

ESPRIT and MI-ESPRIT Algorithm
The DOA estimation in the ESPRIT algorithm is based on the shift-invariance property between two subarrays of the array antenna. As shown in Figure 2, the ESPRIT algorithm processes the received signal by dividing the antenna array into two subarrays. The outputs of the two subarrays can be expressed as Equations (6) and (7), respectively:
$$X_1(t) = A_1 S(t) + N_1(t) \tag{6}$$
$$X_2(t) = A_2 S(t) + N_2(t) \tag{7}$$
Because the antenna array is a ULA, all adjacent elements are equally spaced, and there is a phase delay between subarray 1 and subarray 2 that corresponds to the element spacing. Therefore, the steering matrices $A_1$ and $A_2$ of the two subarrays are related by Equation (8):
$$A_2 = A_1 \Phi \tag{8}$$
where $\Phi = \mathrm{diag}\left(e^{j\frac{2\pi d}{\lambda}\sin\theta_1}, \cdots, e^{j\frac{2\pi d}{\lambda}\sin\theta_K}\right)$ is a diagonal matrix that represents the phase delay between the elements of each pair, and $A_1$ and $A_2$ are matrices of size (M − 1) × K. The signal subspaces $E_{s1}$ and $E_{s2}$ of the two subarrays can be expressed as Equations (9) and (10), respectively, using a nonsingular matrix F:
$$E_{s1} = A_1 F \tag{9}$$
$$E_{s2} = A_2 F = A_1 \Phi F \tag{10}$$
Here, F is a K × K matrix, and $E_{s1}$ and $E_{s2}$ are matrices of size (M − 1) × K. From Equations (6)–(10), $E_{s1}$ and $E_{s2}$ are related by Equation (11):
$$E_{s2} = E_{s1} F^{-1} \Phi F = E_{s1} \Psi \tag{11}$$
where $\Psi = F^{-1}\Phi F$ is a K × K matrix, so Ψ and Φ have the same eigenvalues. Therefore, the DOA of the received signal can be estimated from the eigenvalues of Ψ, and Ψ can be calculated using the least-squares method as follows:
$$\Psi = \left(E_{s1}^H E_{s1}\right)^{-1} E_{s1}^H E_{s2} \tag{12}$$
From the Ψ estimated in Equation (12), the eigenvalues $z_k = e^{j\frac{2\pi d}{\lambda}\sin\theta_k}$ are calculated, and the DOA $\theta_k$ is obtained as follows:
$$\theta_k = \arcsin\left(\frac{\lambda \arg(z_k)}{2\pi d}\right) \tag{13}$$
Hence, the ESPRIT algorithm estimates the DOA using the shift invariance between two subarrays. However, for antenna arrays with a multiple invariance structure, the DOA can be estimated using the MI-ESPRIT algorithm, which improves on the performance of the existing method by changing the subarray configuration of the antenna array [22,23]. Figure 3 illustrates a multiple invariance structure. A ULA with M antennas can be divided into h subarrays, each consisting of z antennas. Adjacent subarrays overlap by z − 1 antennas, so the number of subarrays and the number of antennas per subarray satisfy
$$M = h + z - 1 \tag{14}$$
The signal subspace of the h subarrays can then be expressed as follows:
$$\begin{bmatrix} E_{s,1} \\ E_{s,2} \\ \vdots \\ E_{s,h} \end{bmatrix} = \begin{bmatrix} A_1 \\ A_1\Phi \\ \vdots \\ A_1\Phi^{h-1} \end{bmatrix} F \tag{15}$$
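where $A_1$ represents the steering matrix of the first subarray, F is a nonsingular matrix, and $\Phi = \mathrm{diag}\left(e^{j\frac{2\pi d}{\lambda}\sin\theta_1}, \cdots, e^{j\frac{2\pi d}{\lambda}\sin\theta_K}\right)$. By stacking the subspaces of the first h − 1 subarrays and those of the last h − 1 subarrays in Equation (15), two new subspaces are formed, as shown in Equations (16) and (17):
$$E_{s1} = \begin{bmatrix} A_1 \\ A_1\Phi \\ \vdots \\ A_1\Phi^{h-2} \end{bmatrix} F \tag{16}$$
$$E_{s2} = \begin{bmatrix} A_1\Phi \\ A_1\Phi^{2} \\ \vdots \\ A_1\Phi^{h-1} \end{bmatrix} F \tag{17}$$
From Equations (16) and (17), $E_{s1}$ and $E_{s2}$ are related by Equation (18):
$$E_{s2} = E_{s1} F^{-1} \Phi F = E_{s1} \Psi \tag{18}$$
The Ψ in Equation (18) can be obtained as the following closed-form least-squares solution:
$$\Psi = \left(E_{s1}^H E_{s1}\right)^{-1} E_{s1}^H E_{s2} \tag{19}$$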
The eigenvalues $z_k = e^{j\frac{2\pi d}{\lambda}\sin\theta_k}$ of Ψ are then calculated in the same manner as in the ESPRIT algorithm, and the DOA $\theta_k$ is obtained from Equation (13).
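For the single-source case (K = 1), Ψ is a scalar and its only eigenvalue is Ψ itself, so Equations (12)–(19) can be sketched in a few lines. The sketch below is a hypothetical helper, not the authors' implementation: it builds the two stacked subspaces from overlapping z-element subarrays, solves the closed-form least squares, and applies Equation (13). Choosing z = M − 1 gives h = 2 and reduces it to standard ESPRIT; d/λ = 0.5 is an assumption.

```python
import cmath
import math


def ls_psi(e_s1, e_s2):
    """Closed-form least squares of Eqs. (12)/(19) for K = 1 (Psi is a scalar)."""
    num = sum(a.conjugate() * b for a, b in zip(e_s1, e_s2))
    den = sum(abs(a) ** 2 for a in e_s1)
    return num / den


def doa_single_source(e_s, z=None, d_over_lambda=0.5):
    """ESPRIT (z = M - 1) or MI-ESPRIT (z < M - 1) DOA estimate for one source.

    e_s is the M x 1 signal subspace; returns (theta in degrees, number of subarrays h).
    """
    m = len(e_s)
    z = m - 1 if z is None else z
    h = m - z + 1                                      # Eq. (14): M = h + z - 1
    if h < 2:
        raise ValueError("need at least z + 1 antennas")
    subarrays = [e_s[i:i + z] for i in range(h)]       # overlapping subarrays, Eq. (15)
    e_s1 = [v for sub in subarrays[:-1] for v in sub]  # first h - 1 subarrays, Eq. (16)
    e_s2 = [v for sub in subarrays[1:] for v in sub]   # last h - 1 subarrays, Eq. (17)
    psi = ls_psi(e_s1, e_s2)
    # With K = 1 the eigenvalue z_k of Psi is Psi itself; Eq. (13) maps it to an angle.
    s = cmath.phase(psi) / (2 * math.pi * d_over_lambda)
    return math.degrees(math.asin(max(-1.0, min(1.0, s)))), h
```

For example, with an 8-element signal subspace, `doa_single_source(e_s, z=3)` uses six overlapping subarrays (MI-ESPRIT), while `doa_single_source(e_s)` uses the two 7-element subarrays of standard ESPRIT. The clamp before `asin` only guards against rounding pushing the argument slightly outside [−1, 1].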

DOA Estimation Technique for Single Target
For subspace-based DOA estimation algorithms, the number of targets must be known to separate the signal subspace from the noise subspace in the eigenvectors. Therefore, target detection should be performed before applying the ESPRIT algorithm. In a frequency-modulated continuous wave (FMCW) radar system, a target is detected by searching for a peak in the range-Doppler map (R-D map), and the DOA of each target is estimated thereafter [24][25][26]. Therefore, the ESPRIT algorithm needs to estimate the DOA of only one target at a time. Moreover, because the FFT operation concentrates the signal power in the R-D map while the noise power remains spread across it, performance does not degrade even if the ESPRIT operation is performed on a single peak of the R-D map, that is, when the number of snapshots is set to one [27]. Figure 4 illustrates the root mean square error (RMSE) of the ESPRIT algorithm versus the signal-to-noise ratio (SNR) for targets detected in the R-D map. When the DOA is estimated for targets detected in the R-D map, the performance is the same regardless of the number of snapshots. Therefore, in this study, we designed a processor that supports the ESPRIT algorithm with a single snapshot.
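The single-snapshot idea can be illustrated with a small synthetic example: compute a range-Doppler map per antenna, locate the peak cell, and take the M complex values at that cell as the snapshot vector passed to ESPRIT. This is a minimal sketch under loudly stated assumptions: the beat signal is noise-free with integer-bin range and Doppler frequencies, a direct O(N²) DFT stands in for the FFT, a plain argmax over the antenna-summed power replaces a real detector such as CFAR, and all function names are hypothetical.

```python
import cmath
import math


def dft(seq):
    """Direct DFT (stand-in for the FFT used in the radar signal chain)."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]


def range_doppler_map(frames):
    """frames[l][n] (chirp l, fast-time sample n) -> rd[doppler_bin][range_bin]."""
    range_fft = [dft(chirp) for chirp in frames]                  # along fast time
    n_chirps, n_bins = len(range_fft), len(range_fft[0])
    rd = [[0j] * n_bins for _ in range(n_chirps)]
    for r in range(n_bins):
        col = dft([range_fft[l][r] for l in range(n_chirps)])     # along slow time
        for d in range(n_chirps):
            rd[d][r] = col[d]
    return rd


def single_snapshot(cube):
    """cube[m] holds the frames of antenna m; returns (peak cell, M x 1 snapshot)."""
    maps = [range_doppler_map(frames) for frames in cube]
    n_d, n_r = len(maps[0]), len(maps[0][0])
    cells = [(d, r) for d in range(n_d) for r in range(n_r)]
    # Non-coherent sum over antennas, then a simple argmax (no CFAR in this sketch).
    d_pk, r_pk = max(cells, key=lambda c: sum(abs(mp[c[0]][c[1]]) ** 2 for mp in maps))
    return (d_pk, r_pk), [mp[d_pk][r_pk] for mp in maps]


def synthetic_cube(m_elems, n_chirps, n_samples, range_bin, doppler_bin, theta_deg):
    """Noise-free beat signal of one target; per-antenna phase step for d = lambda / 2."""
    mu = math.pi * math.sin(math.radians(theta_deg))
    return [[[cmath.exp(2j * math.pi * (range_bin * n / n_samples + doppler_bin * l / n_chirps)
                        + 1j * m * mu)
              for n in range(n_samples)]
             for l in range(n_chirps)]
            for m in range(m_elems)]
```

Because both DFTs act only on time indices, the phase progression across antennas survives intact at the peak cell, which is why one snapshot per detected target carries the DOA information.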

Comparison of RMSE Performance According to the Number of Antennas
The MI-ESPRIT algorithm requires at least four antennas. For a four-antenna system, two subarrays of three antennas each can be configured for the ESPRIT algorithm; alternatively, three subarrays of two antennas each can be configured for the MI-ESPRIT algorithm. Figure 5 shows the RMSE versus SNR according to the number of antennas in each subarray. When the total number of antennas is four, as shown in Figure 5a, the performance does not differ between the subarray configurations. However, as shown in Figure 5b, when the total number of antennas is eight, the best performance is achieved by setting the number of antennas per subarray to three, which yields six subarrays, and applying the MI-ESPRIT algorithm. Therefore, in this study, the number of antennas per subarray is set to three, and we designed an ESPRIT processor that uses the ESPRIT algorithm to estimate the DOA when the total number of antennas is four or fewer and uses the MI-ESPRIT algorithm when the total number of antennas is five or more.

Hardware Architecture of Proposed ESPRIT Processor
Figure 6 shows a block diagram of the proposed ESPRIT processor, which supports a scalable number of antennas. The hardware architecture of the proposed ESPRIT processor consists of a covariance matrix module (CMM), an eigenvalue decomposition module (EDM), a least square module (LSM), and an angle estimation module (AEM). The proposed hardware operates as follows. First, the signals of the target detected in the R-D map are input to the CMM according to the number of antennas, and the covariance matrix is calculated. After the covariance matrix operation is completed, the resulting M × M data matrix is input to the EDM, and the eigenvalue decomposition is performed.
When the eigenvalue decomposition is complete, the EDM outputs the signal subspace, and the LSM performs the least-squares operation according to the number of antennas. The output of the LSM is input to the AEM, which estimates the DOA of the target.
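The configuration selection described above, with three antennas per subarray, can be summarized as a small lookup that tells the LSM how many subarrays to process for each supported antenna count. This is a hypothetical sketch: the function name is illustrative, and the handling of two and three antennas (standard ESPRIT with two (M − 1)-element subarrays) is an assumption that the text does not spell out.

```python
def subarray_config(m, z=3):
    """Return (algorithm, antennas per subarray, number of subarrays) for M antennas."""
    if not 2 <= m <= 8:
        raise ValueError("the processor supports 2 to 8 antennas")
    if m <= z:
        # Assumption: too few antennas for z-element subarrays, so fall back to
        # standard ESPRIT with two (M - 1)-element subarrays.
        return "ESPRIT", m - 1, 2
    h = m - z + 1                       # Eq. (14): M = h + z - 1
    return ("ESPRIT" if h == 2 else "MI-ESPRIT"), z, h


for m in range(2, 9):
    print(m, subarray_config(m))
```

Every returned configuration satisfies M = h + z − 1, and h − 1 is the number of subarray products the LSM has to accumulate.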

Covariance Matrix Module (CMM)
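The hardware architecture of the CMM is illustrated in Figure 7; it is composed of a register file (RF), a multiplexer, a covariance calculator (Cov calculator), and a CMM controller. The target data input according to the number of antennas are stored in the RF, and the multiplexer aligns the data according to the number of antennas and transfers them to the Cov calculator, which performs 2 × 2 multiplication operations to compute the covariance according to Equation (5). With a single snapshot, Equation (5) reduces to $R = XX^H$; Equations (20) and (21) represent this calculation when the number of antennas is four and eight, respectively:
$$R = \begin{bmatrix} L_1L_1^H & L_1L_2^H \\ L_2L_1^H & L_2L_2^H \end{bmatrix} \tag{20}$$
$$R = \begin{bmatrix} L_1L_1^H & L_1L_2^H & L_1L_3^H & L_1L_4^H \\ L_2L_1^H & L_2L_2^H & L_2L_3^H & L_2L_4^H \\ L_3L_1^H & L_3L_2^H & L_3L_3^H & L_3L_4^H \\ L_4L_1^H & L_4L_2^H & L_4L_3^H & L_4L_4^H \end{bmatrix} \tag{21}$$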
where $L_1 = X(1:2, 1)$, $L_2 = X(3:4, 1)$, $L_3 = X(5:6, 1)$, and $L_4 = X(7:8, 1)$. When the number of antennas is four, the covariance matrix is obtained through four 2 × 2 matrix multiplications; when the number of antennas is eight, it is obtained through 16 such multiplications. Hence, a scalable number of antennas can be supported by repeating the matrix multiplication according to the number of antennas.
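The block-wise covariance of Equations (20) and (21) can be checked with a short pure-Python sketch: the snapshot is split into 2 × 1 vectors L_i, each 2 × 2 block of R is the outer product L_i L_k^H, and the number of block products grows as (M/2)². The function name is hypothetical, and zero-padding an odd M to the next even length is an assumption made only so that the sketch covers every antenna count; the text does not describe how odd M is handled.

```python
def blockwise_covariance(x):
    """Single-snapshot R = x x^H built from 2 x 2 blocks L_i L_k^H (Eqs. 20-21).

    Returns (R as an M x M list of lists, number of 2 x 2 block products).
    """
    m = len(x)
    # Assumption: an odd M is zero-padded so the data split evenly into 2 x 1 vectors.
    padded = list(x) + [0j] * (m % 2)
    blocks = [padded[i:i + 2] for i in range(0, len(padded), 2)]   # L_1, L_2, ...
    size = len(padded)
    r = [[0j] * size for _ in range(size)]
    n_mult = 0
    for bi, li in enumerate(blocks):
        for bk, lk in enumerate(blocks):
            # One 2 x 2 operation of the Cov calculator: the outer product L_i L_k^H.
            for p in range(2):
                for q in range(2):
                    r[2 * bi + p][2 * bk + q] = li[p] * lk[q].conjugate()
            n_mult += 1
    return [row[:m] for row in r[:m]], n_mult
```

For four and eight antennas this reproduces the 4 and 16 block products stated above; the result equals the direct outer product x x^H element by element.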
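
Eigenvalue Decomposition Module (EDM)
The EDM computes the EVD of the covariance matrix R with the cyclic Jacobi method, in which R is repeatedly updated by plane rotations:
$$R \leftarrow W_{p,q}^H\, R\, W_{p,q} \tag{22}$$
$W_{p,q}$ denotes the rotation of the (p, q) plane of the matrix R. Applying the rotations sequentially in the order $W_{1,2}, W_{1,3}, W_{1,4}, \cdots, W_{M-1,M}$ is called the cyclic Jacobi method. After one such sweep, R is transformed as shown in Equation (23):
$$D = W_{M-1,M}^H \cdots W_{1,3}^H W_{1,2}^H\, R\, W_{1,2} W_{1,3} \cdots W_{M-1,M} \tag{23}$$
As the sweeps are repeated, D converges to a diagonal matrix whose diagonal entries are the eigenvalues of R, and the columns of V, which accumulates the rotation matrices as in Equation (24) (shown for one sweep), are the corresponding eigenvectors:
$$V = W_{1,2} W_{1,3} \cdots W_{M-1,M} \tag{24}$$
When the element in row i and column j of R is denoted by $r_{ij}$, the rotation parameters α and φ of $W_{p,q}$ are given by Equations (25) and (26), respectively.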
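For reference, one common parameterization of a complex Jacobi rotation for a Hermitian matrix, written in the same α/φ notation, is shown below. It is an assumed form for illustration and may differ from Equations (25) and (26) in sign conventions. All entries of $W_{p,q}$ outside rows and columns p and q are those of the identity matrix; the rotation zeroes $r_{pq}$ and places the larger eigenvalue of the 2 × 2 sub-block at position (p, p).

```latex
\phi = \arg\left(r_{pq}\right), \qquad
\alpha = \tfrac{1}{2}\,\operatorname{atan2}\!\left(2\left|r_{pq}\right|,\; r_{pp} - r_{qq}\right)

\begin{bmatrix} [W_{p,q}]_{pp} & [W_{p,q}]_{pq} \\ [W_{p,q}]_{qp} & [W_{p,q}]_{qq} \end{bmatrix}
= \begin{bmatrix} \cos\alpha & -\sin\alpha \\ e^{-j\phi}\sin\alpha & e^{-j\phi}\cos\alpha \end{bmatrix}

R' = W_{p,q}^H R\, W_{p,q}: \quad r'_{pq} = 0, \qquad
r'_{pp} = \frac{r_{pp} + r_{qq}}{2} + \sqrt{\left(\frac{r_{pp} - r_{qq}}{2}\right)^{2} + \left|r_{pq}\right|^{2}}
```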
Accordingly, the EDM generates the signal subspace by performing the eigenvalue decomposition on the M × M data matrix produced by the covariance matrix operation for the given number of antennas. Each rotation affects only two rows and two columns of the matrix, so it can be carried out as 2 × 2 matrix multiplications that are independent of one another. Shahshahani exploited this independence to design a scalable EVD processor based on the Jacobi method [28]. Therefore, in this study, a scalable EDM was designed using the architecture proposed by Shahshahani, as shown in Figure 8. The EDM consists of a processing element (PE), an eigenvalue multiplexer, and an eigenvector multiplexer. The PE obtains a rotation matrix and then performs the matrix multiplication: a cos/sin calculator obtains the rotation matrix, and a matrix multiplier module performs the multiplication. To obtain the rotation matrix, the angle of the input data is computed first, and the cosine and sine of that angle are then determined. The cos/sin calculator uses a coordinate rotation digital computer (CORDIC) vector module and a CORDIC rotation module for this purpose. The matrix multiplier then multiplies the input matrix by the rotation matrix. The eigenvalue and eigenvector multiplexers rearrange the output of the PE into the order required for the next iteration. When the iterations required for the given number of antennas are complete, the signal subspace is output. The cyclic Jacobi method repeats the process in Equations (22)–(26), and the number of iterations in one sweep is M × (M − 1)/2. Thus, the number of iterations grows quadratically with the number of antennas, and the computational complexity of the cyclic Jacobi method increases rapidly.
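To make the sweep structure concrete, here is a pure-Python sketch of the cyclic Jacobi method for a Hermitian matrix, repeating full sweeps W_{1,2}, ..., W_{M−1,M} until the off-diagonal energy is negligible. It assumes the rotation parameters φ = arg(r_pq) and α = ½·atan2(2|r_pq|, r_pp − r_qq), which may differ from Equations (25) and (26); the sweep limit, tolerance, and function names are arbitrary choices for the example. Each rotation only touches rows and columns p and q, which mirrors the independent 2 × 2 multiplications exploited in hardware.

```python
import cmath
import math


def jacobi_rotate(r, v, p, q):
    """Apply one complex Jacobi rotation W_{p,q}: R <- W^H R W and V <- V W (in place)."""
    b = r[p][q]
    phi = cmath.phase(b)
    alpha = 0.5 * math.atan2(2 * abs(b), (r[p][p] - r[q][q]).real)
    c, s = math.cos(alpha), math.sin(alpha)
    e = cmath.exp(-1j * phi)
    w_pp, w_pq, w_qp, w_qq = c, -s, e * s, e * c
    n = len(r)
    for mat in (r, v):                    # right-multiply by W: mixes columns p and q
        for i in range(n):
            xp, xq = mat[i][p], mat[i][q]
            mat[i][p] = xp * w_pp + xq * w_qp
            mat[i][q] = xp * w_pq + xq * w_qq
    for j in range(n):                    # left-multiply R by W^H: mixes rows p and q
        xp, xq = r[p][j], r[q][j]
        r[p][j] = w_pp * xp + w_qp.conjugate() * xq
        r[q][j] = w_pq * xp + w_qq.conjugate() * xq


def cyclic_jacobi(r, max_sweeps=30, tol=1e-24):
    """Cyclic Jacobi EVD of a Hermitian matrix; returns (eigenvalues, eigenvector matrix V)."""
    n = len(r)
    r = [[complex(val) for val in row] for row in r]
    v = [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(abs(r[i][j]) ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):            # W_{1,2}, W_{1,3}, ..., W_{M-1,M}
            for q in range(p + 1, n):
                jacobi_rotate(r, v, p, q)
    return [r[i][i].real for i in range(n)], v
```

Each full sweep performs M(M − 1)/2 calls to `jacobi_rotate`, and several sweeps are usually needed before the off-diagonal energy falls below the tolerance, which is why the iteration count grows quickly with M.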
However, because the angle is estimated separately for each target detected in the R-D map, the ESPRIT processor only needs to estimate the angle of one target at a time. When the number of targets is one, the signal subspace corresponds to the first column of the eigenvector matrix V. Therefore, only the rotation matrices $W_{1,2}, W_{1,3}, W_{1,4}, \cdots, W_{1,M}$ are needed to compute the first column of the eigenvectors, and the number of iterations is reduced to M − 1. In this way, when there is a single target, the number of iterations of the cyclic Jacobi method, which accounts for a considerable share of the computational complexity of the ESPRIT algorithm, is significantly reduced.
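The reduced sweep can be sketched by applying only the M − 1 rotations W_{1,2}, ..., W_{1,M} and reading the first column of V. In the single-snapshot setting, R = x x^H is exactly rank one, and with the rotation parameterization assumed here (φ = arg(r_pq), α = ½·atan2(2|r_pq|, r_pp − r_qq), which may differ from Equations (25) and (26)) each rotation folds the energy of element q into element 1, so the first column equals x/‖x‖ up to a unit-modulus phase factor. With a covariance averaged over many snapshots, a single pass would only approximate the principal eigenvector. The helper names are hypothetical.

```python
import cmath
import math


def jacobi_rotate(r, v, p, q):
    """One complex Jacobi rotation W_{p,q}: R <- W^H R W, V <- V W (in place)."""
    b = r[p][q]
    phi = cmath.phase(b)
    alpha = 0.5 * math.atan2(2 * abs(b), (r[p][p] - r[q][q]).real)
    c, s = math.cos(alpha), math.sin(alpha)
    e = cmath.exp(-1j * phi)
    w_pp, w_pq, w_qp, w_qq = c, -s, e * s, e * c
    n = len(r)
    for mat in (r, v):                    # columns p and q
        for i in range(n):
            xp, xq = mat[i][p], mat[i][q]
            mat[i][p] = xp * w_pp + xq * w_qp
            mat[i][q] = xp * w_pq + xq * w_qq
    for j in range(n):                    # rows p and q of R
        xp, xq = r[p][j], r[q][j]
        r[p][j] = w_pp * xp + w_qp.conjugate() * xq
        r[q][j] = w_pq * xp + w_qq.conjugate() * xq


def first_row_sweep(r):
    """Apply only W_{1,2}, ..., W_{1,M}; return (first column of V, number of rotations)."""
    n = len(r)
    r = [[complex(val) for val in row] for row in r]
    v = [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    for q in range(1, n):                 # M - 1 rotations instead of M(M - 1)/2 per sweep
        jacobi_rotate(r, v, 0, q)
    return [v[i][0] for i in range(n)], n - 1
```

The returned vector can be passed directly to the least-squares step; its arbitrary common phase cancels in E_s1^H E_s2.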
In addition, the proposed EDM requires 46 cycles for the cos/sin calculator to perform one iteration and two cycles to perform one 2 × 2 matrix multiplication. Thus, one iteration takes 48, 50, 50, 52, 52, 54, and 54 cycles for two, three, four, five, six, seven, and eight antennas, respectively. Consequently, the proposed simplified cyclic Jacobi method significantly reduces the number of cycles required for the EVD compared with the conventional method; as shown in Table 1, the number of cycles is reduced by up to 75% for two to eight antennas.
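The cycle figures above can be reproduced with a small model. The 46-cycle cos/sin latency and the 2 cycles per 2 × 2 multiplication come from the text; the assumption that one rotation needs ⌈M/2⌉ such multiplications is inferred from the listed per-iteration values, and the comparison assumes the conventional method is a single full sweep of M(M − 1)/2 iterations. Constant and function names are hypothetical.

```python
import math

CORDIC_CYCLES = 46   # cos/sin calculator cycles per iteration (stated in the text)
MULT_CYCLES = 2      # cycles per 2 x 2 matrix multiplication (stated in the text)


def cycles_per_iteration(m):
    """Assumed model: ceil(M/2) 2 x 2 multiplications per rotation."""
    return CORDIC_CYCLES + MULT_CYCLES * math.ceil(m / 2)


def evd_cycles(m, reduced=True):
    """Reduced scheme: M - 1 rotations; one full cyclic sweep: M(M - 1)/2 rotations."""
    iterations = m - 1 if reduced else m * (m - 1) // 2
    return iterations * cycles_per_iteration(m)


def cycle_reduction(m):
    return 1 - evd_cycles(m, reduced=True) / evd_cycles(m, reduced=False)


for m in range(2, 9):
    print(m, cycles_per_iteration(m), f"{cycle_reduction(m):.0%}")
```

Under this model the per-iteration cost cancels and the reduction is 1 − 2/M, which gives 75% at M = 8, matching the maximum reduction quoted above.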

Least Square Module (LSM)
When the number of targets is one, Ψ in Equation (12) is a 1 × 1 matrix, so Ψ = Φ holds and the eigenvalue decomposition of Ψ is unnecessary. In addition, because $\left(E_{s1}^H E_{s1}\right)^{-1}$ is a positive real number, it does not affect the $\arg(z_k)$ operation in Equation (13). Thus, when the number of targets is one, Equation (12) can be simplified as shown in Equation (27):
$$\psi = E_{s1}^H E_{s2} \tag{27}$$
The hardware architecture of the LSM is shown in Figure 9; it consists of an RF, a multiplexer, a PSI (ψ) calculator, and an LSM controller. The signal subspace input according to the number of antennas is stored in the RF. After the multiplexer aligns the data according to the subarray configuration of Equation (15), the data are input to the PSI calculator, which performs the operation in Equation (27). Because the PSI calculator consists of a matrix multiplication module and a complex adder, it supports a scalable number of antennas. Equations (28) and (29) represent Equation (27) when the number of antennas is four and eight, respectively:
$$\psi = E_1^H E_2 \tag{28}$$
$$\psi = E_1^H E_2 + E_2^H E_3 + E_3^H E_4 + E_4^H E_5 + E_5^H E_6 \tag{29}$$
where $E_1 = E_s(1:3, 1)$, $E_2 = E_s(2:4, 1)$, $E_3 = E_s(3:5, 1)$, $E_4 = E_s(4:6, 1)$, $E_5 = E_s(5:7, 1)$, and $E_6 = E_s(6:8, 1)$. When the number of antennas is four, ψ is obtained through a single matrix multiplication; when the number of antennas is eight, ψ is obtained by performing five matrix multiplications and summing the results. Thus, a scalable number of antennas is supported by repeating the matrix multiplication according to the number of antennas.
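Equation (27) and its block forms in Equations (28) and (29) can be sketched as an accumulation of h − 1 inner products between adjacent three-element subarrays, without the (E_s1^H E_s1)^{-1} scaling. The subarray size z = 3 follows the text; the function name `lsm_psi` is hypothetical, and this is a floating-point model rather than the fixed-point PSI calculator.

```python
def lsm_psi(e_s, z=3):
    """Single-target least squares of Eq. (27), accumulated subarray by subarray.

    Returns (psi, number of z-element inner products), mirroring Eqs. (28)-(29).
    """
    h = len(e_s) - z + 1
    if h < 2:
        raise ValueError("need at least z + 1 antennas")
    subarrays = [e_s[i:i + z] for i in range(h)]             # E_1, ..., E_h
    psi = 0j
    for e_a, e_b in zip(subarrays[:-1], subarrays[1:]):      # E_i^H E_{i+1}
        psi += sum(a.conjugate() * b for a, b in zip(e_a, e_b))
    return psi, h - 1
```

For eight antennas the loop runs five times, matching Equation (29), and the result equals E_s1^H E_s2 with the stacked subspaces of Equations (16) and (17), so arg(ψ) is unchanged by dropping the real scaling factor.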

Angle Estimation Module (AEM)
When the number of targets is one, the DOA can be estimated using Equation (30) without the eigenvalue decomposition of ψ:
$$\theta = \arcsin\left(\frac{\lambda \arg(\psi)}{2\pi d}\right) \tag{30}$$
Accordingly, the AEM evaluates Equation (30) to estimate the DOA of the target from the ψ obtained by the LSM. As shown in Figure 10, the AEM is composed of a CORDIC vector module and a CORDIC ASIN module: the CORDIC vector module calculates the angle of ψ, and the CORDIC ASIN module then estimates the DOA of the target.
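The role of the CORDIC vector module, computing arg(ψ) using only shifts, adds, and a small arctangent table, can be illustrated with a floating-point model of CORDIC vectoring. This is a sketch, not the AEM design: the iteration count (24), the ±90° pre-rotation used to cover all four quadrants, and the function names are assumptions, and the final arcsine uses `math.asin` in place of the CORDIC ASIN module, whose internals the text does not describe. d/λ = 0.5 is also assumed.

```python
import math

N_ITER = 24                                               # assumed iteration count
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(N_ITER)]


def cordic_vectoring(x, y):
    """Floating-point CORDIC vectoring: drives y to 0 and accumulates the rotation angle.

    Returns (scaled magnitude, angle); the magnitude carries the CORDIC gain (about 1.647).
    """
    angle = 0.0
    if x < 0:                     # pre-rotate by +/-90 degrees into the right half-plane
        if y >= 0:
            x, y, angle = y, -x, math.pi / 2
        else:
            x, y, angle = -y, x, -math.pi / 2
    for i in range(N_ITER):
        t = 2.0 ** -i
        if y > 0:                 # rotate clockwise by atan(2^-i)
            x, y, angle = x + y * t, y - x * t, angle + ATAN_TABLE[i]
        else:                     # rotate counterclockwise by atan(2^-i)
            x, y, angle = x - y * t, y + x * t, angle - ATAN_TABLE[i]
    return x, angle


def doa_from_psi(psi, d_over_lambda=0.5):
    """Eq. (30): theta = asin(lambda * arg(psi) / (2 pi d)), returned in degrees."""
    _, phase = cordic_vectoring(psi.real, psi.imag)
    s = phase / (2 * math.pi * d_over_lambda)
    return math.degrees(math.asin(max(-1.0, min(1.0, s))))   # math.asin replaces CORDIC ASIN
```

After n iterations the residual angle error is bounded by roughly atan(2^−(n−1)), so the iteration count trades latency against angular precision; a hardware version would use fixed-point arithmetic and a fixed iteration count.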

Implementation Results of Proposed ESPRIT Processor
The proposed ESPRIT processor was designed using a hardware description language (HDL) and implemented on a Xilinx Zynq UltraScale+ ZCU104 system-on-chip (SoC) platform [29]. Table 2 shows the implementation results of the proposed ESPRIT processor: it was implemented with 28,978 LUTs, 11,279 FFs, and 374 DSPs, and it was confirmed to operate at frequencies of up to 334 MHz. In addition, as shown in Figure 11, the proposed ESPRIT processor was integrated into the SoC platform through the advanced extensible interface (AXI) bus to perform real-time verification. Figure 12 shows the verification environment on the SoC platform. The target data for hardware verification were initialized in the double data rate (DDR) memory, and the number of antennas was set by a microprocessor unit (MPU). When the MPU issued the start signal to the ESPRIT processor, the initialized data in the DDR memory were transferred to the SRAM of the ESPRIT processor through the master interface channel, and the ESPRIT processor then began operating. The result of the ESPRIT processor was sent to the host PC through the universal asynchronous receiver/transmitter (UART) interface to confirm the DOA estimate. As shown in Table 2, the execution time for two to eight antennas was 0.39–1.86 µs.
Table 3 compares the hardware complexity of the proposed ESPRIT processor with that of the designs in [18,19,30,31]. For a fair comparison, the proposed ESPRIT processor was implemented on a Xilinx Virtex-5 XC5VSX95T FPGA, and the execution time, processing rate, and area efficiency were compared for a four-antenna configuration. In [18], an ESPRIT processor was designed using QR decomposition to achieve low complexity.
In [19], LU decomposition was used to reduce the complexity compared with an ESPRIT processor based on QR decomposition. The Cholesky-based DOA estimation processor presented in [30] does not require an EVD, so its computational complexity is lower than that of EVD-based DOA algorithms. Additionally, the DOA estimation processor in [31] uses the Bartlett algorithm, a spectral-based method whose computational complexity is known to be lower than that of MUSIC or ESPRIT processors. However, the DOA estimation processors in these four references still have high hardware complexity and support only four or eight antennas. In contrast, the proposed ESPRIT processor supports a scalable number of antennas and achieves higher area efficiency (Hz/register and Hz/LUT) than the designs in [18,19,30,31]. Compared with [18], the register efficiency was approximately 45.74 times higher and the LUT efficiency approximately 28.71 times higher. Compared with [19], the register efficiency was approximately 2.21 times higher and the LUT efficiency 1.68 times higher. Compared with [30], the register efficiency was approximately 2.84 times higher and the LUT efficiency 1.99 times higher. Compared with [31], the register efficiency of the proposed design was about 25.59 times higher and the LUT efficiency about 10.38 times higher. In particular, the DOA estimation processor presented in [31] took about 24.78 times longer to execute than the proposed ESPRIT processor.

Conclusions
In this paper, we proposed a scalable ESPRIT processor that supports two to eight antennas. The proposed ESPRIT processor is based on the MI-ESPRIT structure: its performance is improved by exploiting the multiple invariances of the array, and its complexity is reduced by simplifying the least-squares method. Moreover, the execution time is reduced by decreasing the number of iterations of the cyclic Jacobi method, which accounts for the most significant computational complexity of the EVD in ESPRIT. In addition, the proposed ESPRIT processor was implemented on a Xilinx Zynq UltraScale+ SoC platform, and it was confirmed to support real-time processing. It was implemented with 28,978 LUTs, 11,279 FFs, and 374 DSPs, with an operating frequency of up to 334 MHz. Additionally, the proposed ESPRIT processor was implemented on a Xilinx Virtex-5 FPGA for a fair comparison of hardware complexity with existing DOA estimation processors, and it was confirmed to be superior to them in terms of area efficiency. Therefore, the proposed ESPRIT processor is expected to be a good solution for DOA estimation in FMCW radar systems.