An Overview of Direction-of-Arrival Estimation Methods Using Adaptive Directional Time-Frequency Distributions

: Direction of arrival (DOA) is one of the essential topics in array signal processing that has many applications in communications, smart antennas, seismology, acoustics, radars, and many more. As the applications of DOA estimation are broadened, the challenges in implementing a DOA algorithm arise. Different environments require different modiﬁcations to the existing methods. This paper reviews the DOA algorithms in the literature. It evaluates and compares the performance of the three well known algorithms, including MUSIC, ESPRIT, and Eigenvalue Decomposition (EVD), with and without using adaptive directional time–frequency distributions (ADTFD) at the preprocessing stage. We simulated a case with four sources and three receivers. The sources were well separated. Signals were received at each sensor with an SNR value of − 5 dB, 0 dB, 5 dB, and 10 dB. The angles of the sources were 15, 30, 45, and 60 degrees. The simulation results show that the ADTFD algorithm signiﬁcantly improved the performance of MUSIC, while it did not provide similar results for the ESPRIT and EVD methods. As expected, the computation time of the algorithms was increased by implementing the ADTFD algorithm as a preprocessing step.


Introduction
The direction of arrival (DOA) is one of the critical topics in array signal processing. Initially, the DOA was estimated for wireless signals impinging on an array of antennas. Bartlett presented early attempts in 1950 as a periodogram analysis of continuous spectra [1]. Later, Schmidt proposed the Multiple Signal Classification (MUSIC) [2] algorithm in 1986, which is used extensively in DOA estimation. The MUSIC algorithm estimates the frequency content of the received signal using the eigenspace method. In 1989, Roy et al. [3] proposed a new approach known as "the estimation of signal parameters using rotational invariance techniques" (ESPRIT), which exploits the underlying rotational invariance among signal subspaces induced by an array of sensors with a translational invariance structure. It has several advantages over the MUSIC algorithm [3]. Over the years, other researchers proposed many methods and improvements regarding the MUSIC and ESPRIT algorithms and their parameters [4,5]. Advances in machine learning algorithms allowed researchers to develop deep networks for DOA estimation.
The application of DOA is widespread in communication [6], smart antennas, seismology, oceanography [7], acoustics, surveillance, hearing aids, teleconferencing [8], radars, and sonar [9]. As the applications of DOA estimation were broadened, the challenges in implementing the algorithm also arose, such as increased computation time and memory requirements. Different environments require different modifications to the existing methods. In a real-time environment, computational time and cost play an essential role in the application as the military needs the fastest algorithm to estimate the DOA. One of the problems associated with DOA estimation is determining the DOA when there are fewer sensors than the number of sources. This is considered an under-determined case in DOA estimation.

Literature Review
Tho et al., proposed a new method to estimate the DOA in an under-determined scenario, which used a combination of noise floor tracking, onset detection, and coherence tests to identify the dominant source in the time-frequency (TF) bin robustly. The most significant eigenvectors of the covariance matrix corresponding to these bins were clustered next. The DOA sources were estimated based on the cluster centroids [10]. Furthermore, some DOA estimation techniques are specifically designed for the environments being used. Dey et al. [11] proposed an application of smart headphones that enabled the selective passing of speech sounds in the environment. Their algorithm was divided into two parts: the first part was a robust far-field speech detection algorithm for noisy environments. The second part was source localization. The application of this technique was a smart headphone system in which a user could be listening to music over headphones and hear speech from a specified direction.
Array structure plays an essential role in DOA estimation. Shi et al. reported that a coprime array, with a different coarray structure, increased the number of degrees of freedom. The proposed sparse reconstruction-based algorithm estimated DOA. In order to improve the power estimation, they modified the sliding window method and removed the spurious peaks in the reconstructed sparse spatial spectrum. Their work showed promising results in DOA and power estimation with achievable degrees of freedom [12]. Zhou et al. proposed a coprime array incorporating compressive sensing. The received signals were compressed by a random compressive sensing kernel to minimize the dimension; then, high-resolution DOA estimation was performed on the compressed measurements. The study verified the computational effectiveness of the method [13]. However, some DOA estimation operations do not consider the spatial relevance among the partitioned coarray statistics [14]. A recent study proposed a coupled coarray tensor canonical polyadic decomposition (CPD)-based 2D DOA estimation to address this. The work used shifting coarray concatenation to factorize the partitioned fourth-order coarray statistics into multiple coupled coarray tensors for the coprime L-shaped array. The number of degrees of freedom (DOF) was increased [14]. Coprime sensor arrays were used in the far-field DOA estimation of the uncorrelated radar signals [15] in order to increase the DOF. The work used the Cuckoo search, which provided an increased number of DOF with low SNR values [14].
Hioka et al. [16] proposed a DOA algorithm depending on the angular resolution and array structure in human-machine interfaces and speech recognition. The efficiency of the proposed algorithm was superior to that of the classical algorithms. Basikolo et al., used a non-uniform circular array to estimate DOA. They used the Khatri-Rao (KR) subspace approach to eliminate spatial noise covariance and estimate DOA with increased degrees of freedom. Using a non-uniform circular array and KR subspace approach, an increased degree of freedom was achieved in estimating the DOA. Because of this, both overdetermined and underdetermined DOA estimation became possible [17]. Xu et al. [18] explored a rectangular array for DOA estimation. The real-valued propagator method was utilized to estimate two-dimensional DOA in their work. As a result, their algorithm provided better angle estimation performance. Zhai et al. [19] employed an unfolded coprime linear array to suppress the ambiguity problem. The received signals from the two sub-arrays were stacked to derive the complete signal subspace. The authors introduced a reduced dimensional MUSIC algorithm (RD-MUSIC) for noncircular signals impinging on the two sub-arrays, which increased the noncircular signals' accuracy [19].
Feng et al., 2001, proposed a DOA estimation algorithm for wideband signals [20]. Their algorithm used fast chirplet-based adaptive signal decomposition to build a timefrequency covariance matrix. Subspace fitting was conducted similar to that of traditional MUSIC and ESPRIT algorithms. The authors overlapped the narrow and wideband inco-herent subspace and built a general TF matrix in this work. A wideband DOA estimation algorithm using fast Chirplet-based adaptive signal decomposition was projected based on the differences. The advantages of using this algorithm were that there was no restriction for array structure with very low complexity and robust performance.
Time-frequency analysis provides information in both the time domain as well as in the frequency domain. One such method used spatial TF distributions in a wideband scenario. This approach used spatial pseudo-Wigner-Ville distribution to analyze the time and frequency domain signals. The proposed method outperformed methods for FM signals and performed significantly better for wideband signals [21].
Bouri proposed a method using factorizations of a sample cross-spectral matrix for detecting and localizing the sources. This technique did not use eigenvalue decomposition to reduce computational cost and improve performance [22].
Mohan et al., suggested a new method to localize multiple speech sources with small arrays using a coherence test [8]. The authors proposed two methods: (1) narrowband spatial spectrum estimation at each bin followed by summation of directional spectra across time and frequency and (2) clustering low-rank covariance matrices and averaging the covariance matrices within the clusters [8]. However, there are many other approaches used to estimate DOA via different methods. Nishiura et al. [23] designed two other methods apart from the classical methods to estimate the DOA. The first method was DOA estimation based on a cross power spectrum phase and the second method was a statistical sound source identification algorithm based on the Gaussian mixture model. The above methods were used to localize the source signal by enhancing multiple sound signals. A microphone array had to be steered, for which the delay-and-sum beamformer method was employed to localize the source [23]. Sawada et al., proposed an approach for DOA estimation using independent component analysis. They reported that independent component analysis identified source signals from their mixture. The work stated the main advantage of independent component analysis over the MUSIC algorithm was that it could be applied even when the number of sources was equal to the number of sensors [24]. Matsuo et al., implemented a histogram mapping method to estimate the DOA of multiple speech signals.
The significant advantages of the histogram mapping method included low computational complexity and no requirement for the preliminary DOA estimates [25]. The authors introduced a mechanism to delete narrowband components present in the vector analysis. Swartling et al. [9] improved a statistical method known as steered response power with phase transform (SRP-PHAT) for DOA estimation. SRP-PHAT uses second-order statistics through cross power spectra to navigate a beamformer, searching for a maximum power output [9]. A peak in the beamformer was aimed towards the acoustic source with the highest power. Swartling et al. stated that fourth-order statistics provided a route to distinguish speech from noise. The fourth-order statistics provided superior performance compared to the second-order statistics, but the computation complexity doubled.
Wang and Zhang developed an iterative positioning algorithm to solve the link blockage problem in mmWave communication systems. As the first step, they used random beamforming and maximum likelihood estimation to estimate the angle of arrival and the angle of departure. Their proposed iterative algorithm achieved centimeter-level positioning accuracy [26].
Compressed sensing (CS) methods, on-grid, off-grid, and grid-less, use the signal sources' characteristics in the spatial domain. On-grid and off-grid methods have grid mismatch problems resulting in performance loss [27]. However, these two methods have less computational complexity. On the other hand, gridless methods perform better but have higher computational complexity [28].
In recent years, machine learning-based DOA estimation algorithms were proposed, including deep neural networks (DNN) and convolutional neural networks (CNN) [29][30][31][32][33]. Kase et al., developed a DNN-based DOA estimation of two targets [29]. They used a correlation matrix Rxx as an input and tested the proposed DNN for a case with two targets, and both narrowband signals from the targets were uncorrelated and had equal power. DNN-based methods require training. Kase et al. generated the training data by changing the SNR in pre-determined patterns in the range of 0-30 dB. They reported that the designed DNN achieved very high performance for the same case. This verified the well-known fact that the DNN-based solutions' performance highly depends on the training data and is susceptible to overfitting problems.
Liu et al. [30] proposed DOA estimation for underwater acoustic signals with different waves using a CNN architecture that uses the covariance matrix Rxx as the input array. To prevent the neural network from dealing with complex numbers, they divided the covariance matrix into two channels: real number and imaginary number layers. After training, the method was tested under a scenario in which different array elements were simulated under different water environment conditions with SNR of 20 dB, 10 dB, 0 dB, and −10 dB. The paper reported accuracy rates comparable to the MUSIC algorithm and reduced the estimation time of the DOA by 10 times less than the MUSIC algorithm. Liu et al. argued that their proposed CNN-based DOA estimation method was "far better than the traditional MUSIC algorithm" and was especially suitable for the underwater acoustic environment [30]. The environment's complex and changeable characteristics require a shorter calculation time and good accuracy in DOA estimation. However, the authors missed that the neural network-based DOA estimation algorithms' performance exclusively depends on training data, and complex and changeable environment characteristics negatively affect such algorithms.
The technique introduced by Asano et al. [34] constituted two stages corresponding to the different types of noise. In the first stage, ambient noise, which was less directional, was reduced by eliminating the noise-dominant subspaces. In the second stage, the spectrum of the target source was extracted from the multi-directional components. Visser et al. proposed a method for speech enhancement in a noisy environment. A practical application in a car was experimented with [35]. Their approach included combining two techniques, namely blind source separation and speech denoising, using hybrid wavelet independent component analysis. Blind source separation exploited the time correlation of speech signals captured by microphones. Blind source separation was used to locate the point source. Independent component analysis was used for the adaptive denoising of the separated signals. Mitianoudis et al., also used blind source separation for audio source separation. The authors introduced a technique for unmixing audio sources in an auditory scene [36]. Khan et al., reported that ADTFD performed well in analyzing close signal components compared to the other preprocessing methods. The ADTFD optimized the direction of the kernel at each point in the TF domain to obtain a clear representation, which was then exploited for DOA estimation [38].
Postprocessing methods may be employed after DOA estimation. Some of the commonly used postprocessing methods are postfiltering algorithms. Habets et al. [39] proposed a postfiltering algorithm for the spectral enhancement of speech signals. A feature of this technique was reduced interference. Gu et al., suggested a technique that used the QR decomposition-based recursive least square (QRD-RLS) technique as postprocessing. QRD-RLS was used to estimate the DOA from the autoregressive sources estimated employing the Kalman filter. Auto-regressive modeled sources provide excellent temporal information, enabling the QRD-RLS technique to estimate the DOAs. [40].
In the literature, Khan et al. [38] reported that the algorithm was applicable to subspace-based DOA methods. However, they assessed the ADTFD only for MUSIC. One of the goals of this paper is to compare the performances of three well known DOA estimation methods, including MUSIC, ESPRIT, and Eigenvalue Decomposition (EVD), by implementing ADTFD in the preprocessing stage. Another goal is to overview the existing DOA algorithms.
The paper is organized as follows: Sections 2 and 3 overview the studied DOA estimation algorithms. Section 4 explains the implementation of the ADTFD preprocessing method. Case-specific experimental results and discussions are presented in Section 5.

DOA Estimation Algorithms
3.1. The MUSIC Algorithm Schmidt (1986) proposed the MUSIC algorithm [2], which is a subspace-based method. MUSIC stands for multiple signal classification. The MUSIC algorithm provides asymptotically unbiased estimates of the number of signals, directions of arrival (DOA), strengths and cross-correlations between the directional waveforms, polarizations, and strength of noise or interference. The model in Figure 1 states that the waveforms received at the M-array elements are linear combinations of the D incident wavefronts and noise.
Note that (.)* is used to denote the Hermitian conjugate or complex conjugate transpose operation. The MUSIC algorithm assumes that incident signals and noise are uncorrelated. The D × D matrix S may be diagonal and singular. In the case of the number of wavefronts, D is less than the number of array elements M, ASA * is a nonnegative definite, and its rank is less than M. The Equation (7) is satisfied when λ is one of the eigenvalues of R xx in the metric of S o . λ is the minimum eigenvalue λ min ≥ 0.
If the elements of the noise vector W are mean zero, The eigenvectors of the signal subspace and the noise subspace are orthogonal to each other. This is the essential observation of the MUSIC approach. Since the steering vectors corresponding to signal components are orthogonal to the noise subspace, the DOA of the multiple incident signals can be estimated by locating the peaks of the spatial spectrum given by The flowchart of the MUSIC algorithm is summarized in Figure 2. The MUSIC algorithm's performance is different when the received signals are different. The MUSIC algorithm fails to detect correlated input signals as the response of the MUSIC is not sharp at the peaks while it is sharp in the case of the uncorrelated input signal [16].

The ESPRIT Algorithm
Another subspace-based algorithm, which was an improvement over the MUSIC algorithm, was proposed by Roy et al. (1989) [3]. ESPRIT stands for the estimation of signal parameters via rotational invariance techniques. It does not require knowledge of the array geometry and does not involve an exhaustive search through all possible steering vectors to estimate DOA. Hence, it reduces the computational and storage requirements significantly compared to the MUSIC algorithm. ESPRIT exploits an underlying rotational invariance among signal subspaces induced by an array of sensors with a translational invariance structure. This algorithm is more robust for array imperfections than the MUSIC algorithm. Consequently, the computational complexity and storage requirements are lower [6]. It also explores the rotational invariance property in the signal subspace created by two subarrays derived from the original array with a translational invariance structure. Unlike the MUSIC method, ESPRIT simultaneously estimates the number of antenna elements and DOAs. Figure 3 illustrates the ESPRIT algorithm's DOA estimation with multiple sources. Although the ESPRIT algorithm has many advantages, it is not entirely general, as it has restrictions on planar wavefronts and pairwise matched co-directional doubles. ESPRIT describes the array as being comprised of two subarrays, X and Y, to exploit the sensor array's translational invariance property. The subarrays X and Y are identical but physically displaced by a known displacement vector. The received signals are represented as: s k (.) is the kth signal as received at the reference sensor of the X subarray. θ k is the DOA of the kth source relative to the direction of the translational displacement vector. ∆ is defined as the magnitude of the displacement vector between the two arrays, and c is the speed of propagation in the transmission medium. w xi (t) and w yi (t) are the noise signals in the ith doublet for the subarrays, respectively.
The auto-covariance matrix R xx and the cross-covariance matrix R xy are defined as below.
Once Φ is calculated, the DOAs are calculated as:

Eigenvector Clustering Algorithm
Eigenvector clustering is another method used for DOA estimation [10]. Preprocessing is a critical step to eliminate noise vectors in the covariance matrix. The method uses the short-time Fourier transform (STFT), noise floor tracking, onset detection, and coherence test. DOA estimation is performed using the data from the cluster centroids. The array structure is also specified in this method. A triangular array with three microphones at a right angle is employed. The STFT of the multi-component signal is the first step to estimating the time-frequency (TF) bins in each frequency component. A speech enhancement method is used to select TF bins based on the speech enhancement method using a certain threshold value [7,9]. The onset is marked by a sudden rise in the energy of particular frequency bands and is used to detect sudden sound activity. Many onset detection functions detect the changes in one or more signal properties, considering that signals, specifically audio, have constantly changing properties such as amplitude, noise, onsets, offsets, and vibration. The onset of a signal increases the energy in the time domain [41] and in the frequency bands that other properties do not have. Therefore, an increase in energy in some frequency bands can be employed as an indicator of onset [16]. The coherence test proposed by Mohan et al. [8] is applied to select rank-1 TF bins. Rank-TF bins are selected since only one source dominates that particular TF bin. It means that only TF bins with a possibility of the incoming signal are present. The DOAs can be estimated from the cluster centroids after clustering the largest eigenvectors, based on the structure of the steering vectors and the microphone arrangement. Once noise-tracking and onset detection are performed, the method selects rank-1 TF bins, and most of the covariance matrices can be approximated. Eigenvalues and eigenvectors of the covariance matrix are determined. The algorithm clusters the normalized matrix into several clusters equal to the number of sources. Finally, the DOAs are estimated.

Adaptive Directional Time-Frequency Distributions (ADTFD)
Many applications use non-stationary signals that exhibit time-varying frequency spectra. The spatial time-frequency distribution (STFD) is a well-known approach for analyzing non-stationary multi-sensor signals. Since the STFD matrices contain high energy points in the TF domain, they result in a robust DOA estimation against noisy disturbances [42][43][44][45]. Many studies reported improved DOA estimation for the conventional MUSIC algorithm by replacing covariance matrices with the STFD matrices [42,43,46]. The selection of the TF presenters for the sources improves the DOA estimation, where the number of sensors is less than the number of sources, which is called an under-determined case. In such cases, separate STFDs are constructed, each corresponds to one source, and they are used to estimate DOAs [47,48]. The estimated instantaneous frequency (IF) is used to obtain the sources' TF presenters. Spatially averaged time-frequency distribution (TFD) of sensor information is employed to estimate the IF [47,49,50]. The benefits of TFDs can be summarized as follows: (a) In traditional signal representations, time and frequency are mutually exclusive, and each representation is non-localized with respect to the other representation. Only one domain representation may become insufficient for complex problems. In such cases, the distribution of time and frequency may present additional information.
(b) TFDs allow the analysis of the signals representing the signal characteristics such as relative amplitudes, IF, complexity, flatness, and energy distribution in the TF domain [51].
The resolution of the TFD plays an essential role in DOA estimation, mainly when the sources are closely located. Both the STFD and TF filter approaches heavily depend on the TFD's resolution, which has higher computational cost and memory requirements. A DOA approach using the ADTFD, proposed by Khan et al., provided good improvements for non-stationary signals and the MUSIC algorithm [38]. The algorithm, illustrated in Figure 4, consists of several stages, including calculating and averaging TFDs, IF estimation, and TF filtering. Quadratic TFD is used to analyze the signals. The estimated IF components are used to design TF filtering [33,38].

Spatial Averaging of TFDs
Wigner-Ville Distribution (WVD), W z (t, f ), is used to calculate the TFDs of a signal. It is defined as z(t) is the analytic associate of the signal, and it is complex. WVD is used to study non-stationary signals. Considering that DOA deals with non-stationary signals, WVD becomes useful in DOA estimation. Averaging TFDs is performed by dividing ρ z (t, f ) by the number of array elements.
Postprocessing is performed to preserve the energy of weak TFD components while resolving the closely-spaced components. It allows accurate IF extraction. An adaptive smoothing kernel is applied to the ρ avg (t, f ) in order to resolve close components of the signal. Then, the ADTFD is defined using the average TFD and the second derivative directional Gaussian filter.

Multi-Component Analysis
Multi-component analysis consists of IF estimation and TF filtering. The IF of a signal indicates the dominant frequency of the signal at a given time. The peaks of the multicomponent signal in the TF domain are used to estimate the IF. The peaks are calculated by setting the first and second derivatives of the ADTFD to zero. The phase is estimated by TF filtering on the estimated IFs [38].

Special Case: Experimental Results and Discussions
We simulated a case that had four sources with three receivers. The sources are well separated and represented as follows:   Tables 1 and 2 with and without the ADTFD algorithm, respectively. The EVD algorithm performed better than the MUSIC and ESPRIT algorithms in estimating the DOAs without the ADTFD under different SNR values. Corresponding mean square error (MSE) values in dB are depicted in Figure 5. The EVD algorithm's MSE values were around −11 dB, while the ESPRIT algorithm's MSE values were around −3 dB. The MUSIC algorithm produced a steady MSE of about −8.2 dB for different SNR values. The ADTFD algorithm in the preprocessing stage improved the MUSIC algorithm's performance significantly. On the other hand, the ESPRIT and EVD algorithms did not benefit from the ADTFT. The DOAs are given in Table 2. The MSE versus SNR plot for the DOA algorithms with the ADTFD for different SNR values is shown in Figure 6. The MSE values of the MUSIC were calculated below −22 dB with the ADTFD.  T1  T2  T3  T4  T1  T2  T3  T4  T1  T2  T3  T4   −5 dB  35  35  35  35  0  2  74  0  13  32  40  67   0 dB  36  36  36  36  0  9  58  0  18  22  43  58   5 dB  37  37  37  37  0  14  49  0  6  31  39 T1  T2  T3  T4  T1  T2  T3  T4  T1  T2  T3    We can see the effect of the ADTFD preprocessing algorithm on each DOA method more clearly in Figures 7-9. The ESPRIT and EVD methods' MSE values did not improve with the ADTFD. The average unoptimized computation time for ADTFD using the MUSIC algorithm is 1.83 s and for the EVD is 1.66 s on a system using 16 GB RAM. The results are summarized in Table 3.

Conclusions
This work reviewed the DOA estimation algorithms in the literature. Furthermore, it simulated a case that had four well-separated sources with three receivers. Signals were received at each sensor with SNR values of −5 dB, 0 dB, 5 dB, and 10 dB. The angles of the sources were 15, 30, 45, and 60 degrees. The performances of the MUSIC, ESPRIT, and Eigenvalue Decomposition (EVD) algorithms were evaluated and compared with and without using the ADTFD algorithm. The ADTFD algorithm is a preprocessing step before the DOA estimation. It was originally developed for the MUSIC algorithm, but its effects on the other DOA estimation methods are not often studied. Our simulation results showed that the EVD algorithm performed better than the MUSIC and ESPRIT algorithms in estimating the DOAs without the ADTFD under different SNR values. However, the ADTFD algorithm improved the performance of the MUSIC algorithm significantly while not affecting the other DOA estimation methods. As expected, the computation time of the methods increased by using the ADTFD algorithm.
Author Contributions: P.K.E. and B.D.B. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.