An Efficient Framework for Estimating the Direction of Multiple Sound Sources Using Higher-Order Generalized Singular Value Decomposition

This paper presents an efficient framework for estimating the direction-of-arrival (DOA) of wideband sound sources. The proposed framework provides an efficient way to construct a wideband cross-correlation matrix from multiple narrowband cross-correlation matrices for all frequency bins. In addition, the proposed framework is inspired by the coherent signal subspace technique with further improvement of linear transformation procedure, and the new procedure no longer requires any process of DOA preliminary estimation by exploiting unique cross-correlation matrices between the received signal and itself on distinct frequencies, along with the higher-order generalized singular value decomposition of the array of this unique matrix. Wideband DOAs are estimated by employing any subspace-based technique for estimating narrowband DOAs, but using the proposed wideband correlation instead of the narrowband correlation matrix. It implies that the proposed framework enables cutting-edge studies in the recent narrowband subspace methods to estimate DOAs of the wideband sources directly, which result in reducing computational complexity and facilitating the estimation algorithm. Practical examples are presented to showcase its applicability and effectiveness, and the results show that the performance of fusion methods perform better than others over a range of signal-to-noise ratios with just a few sensors, which make it suitable for practical use.


Introduction
The fundamental competence of sound source localization has received much attention during the past decades, and has become an important part of navigation systems [1,2]. Direction-of-arrival (DOA) estimation in particular plays a critical role in navigation systems for the exploration of sources in widespread applications, including in acoustic signal processing [3][4][5][6][7][8]. Several approaches have been proposed as a potential way to estimate DOA. For instance, the time-difference-of-arrival-based DOA estimation is one of the most frequently used approaches, which is widely known as the generalized cross-correlation with phase transform (GCC-PHAT) [9]. In addition to this approach, a low computational requirement makes it attractive for practical applications; however, the major drawback is its low robustness in noisy and multipath environments. Another relevant approach is adopted from the independent component analysis (ICA) in blind source separation [10,11]. ICA searches independent components by measuring deviations from Gaussian distributions, such as maximization of negentropy or kurtosis. DOAs are estimated easily by using the separated components for all frequency bins, but it should be noted that the estimation accuracy of such a method is highly sensitive to the non-Gaussianity measures.
In an alternative approach to estimate narrowband DOAs, the subspace method has been proposed in an effort to improve estimation performance. The most prominent methods observe the signal and noise subspace for achieving more robust results, such as multiple signal classification (MUSIC) [12], estimation of signal parameters via rotational invariance techniques (ESPRIT) [13], and propagator method [14,15], which have been used frequently for one-dimensional (1D) DOA estimation along with the uniform linear array (ULA) of sensors. In case of a two-dimensional (2D) DOA estimation, a new geometrical structure of a sensor array is required, and it was previously found that the structure of an L-shaped array is considerably effective for estimating 2D DOAs [16]. Additionally, the L-shaped array allows for simple implementation, because it consists of two ULAs connected orthogonally at one end of each ULA. For these reasons, the L-shaped array is widely applied to the 2D DOA estimation method [17][18][19][20][21][22][23][24][25][26], and its practical applications can be found in the past researches [27,28]. Although the narrowband subspace method may be unable to directly estimate wideband DOAs, one possible way to solve this problem is to employ the narrowband subspace method in each temporal frequency intensively, and then the wideband DOA results can be estimated by interpolating the narrowband DOA results all frequency bins [29,30]. It should be noted again that intensive computational costs encountered in the above solution may be limited by practical considerations.
Several approaches were proposed to solve the problem of estimating wideband DOAs, for example, the incoherent MUSIC (IMUSIC) is one of the simplest methods for estimating wideband DOA [31]. There are two steps in IMUSIC: Firstly, a noise subspace model each temporal frequency is constructed. Then, wideband DOAs are obtained by minimizing the norm of orthogonal relation between a steering vector and the noise subspace of all frequency bins. Although accuracy performance of IMUSIC was demonstrated to be an effective method for estimating DOAs of multiple wideband signals in the high signal-to-noise ratio (SNR) region, a single small distortion of the noise subspace at any frequency can affect the whole DOA results. Many attempts were made recently to overcome this problem. For instance, the test of orthogonality of frequency subspaces (TOFS) was proposed to overcome this difficulty [32], but performance degradation caused by the small distortion still remains challenging. Another relevant approach is called the test of orthogonality of projected subspaces (TOPS) [33]. TOPS estimate DOA by constructing signal subspace of one reference frequency, and then measuring orthogonality of the previous signal subspace and noise subspace for all frequency bins. The simulations showed that TOPS is able to achieve higher accuracy than IMUSIC in mid SNR range, however, the undesirable false peaks still remain. The revised and greatly improved version of TOPSs were proposed recently to reduce these false peaks [34,35]. Obviously, computational complexities increased dramatically compared to the classical TOPS.
Another notable approach of wideband DOA estimation is the coherent signal subspace method (CSS) [36,37]. CSS specifically focuses a correlation matrix of received signals of each temporal frequency into a single matrix, which is called a universal correlation, associated with one focusing frequency via linear transformation procedure. Wideband DOAs are estimated by applying a single scheme of any narrowband subspace method on the universal correlation matrix. In addition to the transformation procedure of CSS [38][39][40], a process of DOA preliminary estimation is required before the wideband DOAs can be estimated. Therefore, a common shortcoming is clearly recognized as a requirement of DOA preliminary estimation, which means that any inferior initiation can lead to biased estimates. According to the literature [31][32][33]41], CSS demonstrates deficient performance than others such as TOPS; this is because the solutions of transformation procedure in CSS are solely focused on subspace between a temporal frequency and focusing frequency; to the best knowledge of the authors, it means that a fundamental component of the transformation matrix across all frequency bins may exhibit the different core component, which is clearly apparent when a narrowband DOA result at some frequency is not close enough to the true DOA. A single component distortion can definitely affect the whole DOA results. Therefore, the solutions have to exhibit the exact component even though power present in a received signal at that frequency is very weak; in other words, the solution of transformation matrix have to be focused across all frequency bins instead of the pair of different frequencies.
Therefore, the purpose of this paper is to investigate an alternative for estimating wideband 2D DOAs in a more efficient way. We consider wideband sources as sound sources, such as human speeches and musical sounds. In order to estimate the wideband DOAs, we address the issue of transforming multiple narrowband cross-correlation matrices for all frequency bins into a wideband cross-correlation matrix. Additionally, our study is inspired by a computational model of CSS with further improvement of a linear transformation procedure [36][37][38][39][40]. Since the transformation procedures of CSS are only focused on subspace between current and reference frequency as previously mentioned, we propose a new transformation procedure which focus all frequency bins simultaneously and efficiently. The higher-order generalized singular value decomposition (HOGSVD) is firstly used to achieve this important issue [42]. By employing HOGSVD of arrays of the new unique cross-correlation matrix, where elements in the row and column positions are a sample cross-correlation matrix between received signal and itself on two distinct frequencies, the new transformation procedure no longer require any process of DOA preliminary estimation. Finally, the wideband cross-correlation matrix is constructed via the proposed transformation procedure, and the wideband DOAs can be estimated by employing any subspace-based technique for estimating narrowband DOAs, but using this wideband correlation matrix instead of the narrowband correlation matrix. Therefore, the proposed framework enables cutting-edge studies in the recent narrowband subspace methods to estimate DOA of the wideband sources directly, which result in reducing computational complexity and facilitating the estimation algorithm. Practical examples, such as 2D-MUSIC and ESPRIT with an L-shaped array, are presented to showcase its applicability and effectiveness.
The rest of this paper is organized as follows. Section 2 presents the array signal model, basic assumptions and problem formulation for transforming narrowband sample cross-correlation matrices for all frequency bins into a single matrix, which is called wideband cross-correlation matrix. Description of the new transformation procedure is introduced in Section 3.1 and its effective solution via HOGSVD in Section 3.2. Section 3.3 provide a description of the proposed framework for estimating wideband DOAs by combining the proposed transformation procedure along with a scheme of estimating DOAs in a recent narrowband subspace method, and its practical examples are presented in Sections 3.3.1 and 3.3.2. The simulation and experimental results are compared with the several existing methods in Sections 4 and 5. Finally, Section 6 concludes this paper.

Data Model
The proposed method presented in this paper considers far-field sound sources. Received signals are a composition of the multiple sources, each one consisting of an angle in a spherical coordinate system. The received signals are transformed into a time-frequency representation via the short-time Fourier transform (STFT), and are given by where r (t, f ) ∈ C M is the summation of a received signal, s (t, f ) ∈ C K is a source signal, w (t, f ) ∈ C M is an additive noise, the constant M is the number of microphone elements, and K is the number of incident sources. The matrix A (φ, θ, f ) ∈ C M×K stands for the array manifold where φ and θ are phase angle components of the source on x and z axes in the spherical coordinate system. Note that the elements in A (φ, θ, f ) depend on an array geometry.
Consider the L-shaped array structure consisting of two ULAs as illustrated in Figure 1, the received signals are simplified as where From the above definitions, and a subscript z are belonged to z subarray where N is the number of microphone elements each subarray with M = 2N. The variable t is time, f is a source frequency, d is the spacing of microphone elements, λ is a wavelength with respect to λ = c f o where c is the speed of sound in current medium, and f o is a reference frequency.

Basic Assumptions
Based on the recent reviews, the following assumptions are required on the proposed framework: Assumption 1: The number of sources is known or predicted in advance [43,44]. Assumption 2: The spacing between adjacent elements of each subarray and spacing between x 1 and z 1 should be set to d = λ 2 for avoiding the angle ambiguity in array structure radiation [1,2,16]. Assumption 3: The source s (t, f ) is assumed to be Gaussian complex random variable as suggested by the literature [12,16,31]. However, we consider wideband sources as sound sources such as human speech; therefore, s (t, f ) can also be Super-Gaussian complex random variable, and it is not stationary signals for the most general case when giving an appropriate period of time.
Assumption 4: According to acoustic theory of speech, frequency dependence of the sound source, especially a human speech, is existed [45]; it means that a cross-covariance between the source and itself with distinct frequencies is not zero; cov (s Next, suppose that s (t, f ) are uncorrelated, which implies that s k (t, f ) and s k (t, f ) are statistically independent of each other when k = k ; cov (s k (t, f ) , s k (t, f )) = 0. When k = k , the sources can take to be partially dependent by the following literature [45]; therefore, a sample cross-covariance matrix of the incident sources over two different frequencies is given by Remark that c s k { f , f } is equal to σ 2 s k { f } , and σ 2 s k { f } ∈ R ≥0 is a variance at frequency f of the source.
Assumption 5: An additive white Gaussian noise is considered in this paper, which is modeled as Gaussian random variable as well as the past studies. A noise cross-covariance matrix over two different frequencies is given by where c w{ f , f } ∈ C, and I i is a i-by-i identity matrix. Note again that c w{ f , f } = σ 2 w{ f } where σ 2 w{ f } ∈ R ≥0 is a variance of the noise at frequency f . In case of the L-shaped array structure in Equation (2), we have where O i×j is a i-by-j null matrix.

Transformation Problem
Under the data model and assumptions in Sections 2.1 and 2.2, a cross-correlation matrix of the received signals is defined as where R { f , f } ∈ C M×M . In order to transform R { f , f } over the available frequency range into a single smoothed matrix, which is named as a wideband cross-correlation, a transformation procedure is required as mentioned previously [36], which is expressed as where R ∈ C M×M is the wideband cross-correlation matrix, and P is the number of STFT frequency bins. T { f i } ∈ C M×M is a transformation matrix, which was originally designed by using the ordinary beamforming technique [36], or by minimizing the Frobenius norm of array manifold matrices [37]. The objective of T { f } is to transform any given f of the array manifold A (φ, θ, f ) into A (φ, θ, f o ). All previous solutions of T { f } are solely based on subspace between pair of distinct frequencies { f , f o }, as emphasized in the introduction [36][37][38][39][40]. When power of the source at some frequency is weak or less than noise power, the matrix T { f } may not share any common angle of φ, θ because its non-zero eigenvalues are not full rank, which is resulted in a performance degradation for estimating both T { f } and wideband DOAs. If the transformation matrix can be focused by all frequency bins instead of the pair of frequencies, a good estimate of DOAs in Equation (8) might be expected. Based on this hypothesis, a new concept and scheme are presented in next section.

Proposed Method
This section introduces a new procedure for estimating a transformation matrix, its alternative solution by using the higher-order generalized singular value decomposition (HOGSVD), and practical examples of wideband DOA estimation scheme.

Problem for Estimating the Transformation Matrix and Its Solution
We start by introducing the following lemma that will be useful for obtaining a solution of transformation matrix. Lemma 1. Given a set of two distinct frequencies by { f , f o } into Equation (7), and given a transformation matrix T { f } which satisfy the property in Equation (9), assume that K < M, the cross-correlation R { f , f o } can be factorized into the singular value decomposition (SVD) form; where are close to zeros by assuming a noise-free signal and using solely the signal subspace Performing the Eigenvalue decomposition (EVD) to Equation (11), square roots of the non-zero eigenvalues of above matrix is equal to [46,47]. This completes the proof of the lemma.
share the common components on the singular values and right singular vectors, whereas the both left singular vectors may be different. Since A (φ, θ, f ) and A (φ, θ, f o ) are full rank, its remaining components are given by [48]: where are full rank matrices and have invertibility. From Equations (9) and (12), which mean that the right singular vectors of where · F is the Frobenius norm, and ∑ K k=1 σ 2 k (A) is the sum-of-squares K largest singular values of A. If the constraint in Equation (15) is not imposed, then one of the possible choices is obtained by the least squares problem [49,50]; the solution is derived by observing the point where the derivative of cost function with respect to To solve the problem much more practically, an alternative solution is introduced, which is based on the constraint in Equation (15) and Lemma 1: Imposing the constraint in Equation (15) and Lemma 1, along with the modification of orthogonal Procrustes problem (MOP), an alternative solution to Equation (15) is given by where † stands for the pseudo-inverse of a matrix. Defining the square matrix Ω { f } ∈ C K×K as the matrix containing error corrections, the error of transformation remains consistent with the following equation; where Σ ψ{ f } ∈ R K×K and U ε{ f } ∈ C M×M−K are the diagonal matrix of the K largest singular values, and the noise subspace left singular vectors of Theorem 1 provides an efficient way to construct T { f } without any process of DOA preliminary estimation, but the solution are still solely based on subspace between pair of distinct frequencies.
In order to observe the solution across all frequency bins, we will present an alternative for constructing T { f } by using HOGSVD along with Theorem 1, which the next section will address further.

Estimation of the Transformation Matrices by HOGSVD
Suppose we have a set of P complex matrices E { f i } ∈ C M×M and all of them have a full rank; where { f 1 , f 2 , · · · , f P } is a set of frequency intervals, and the cross-correlation matrices (7). The definition of HOGSVD of these P matrices are given by the generalized singular value decomposition (GSVD) of P ≥ 2 datasets and its right singular vectors are identical in all decomposition [42], as follows: where are the diagonal matrix of singular values. Note that subscripts s and n denote subspace of signal and noise, respectively. Unlike the left singular vectors To show that V e s is equal to V ψ{ f } s for all frequency bins, let us start from brief description of HOGSVD benchmark. The matrix V e s is obtained by performing EVD on the following matrix; Let us redefine where  (21) and (22) into Equation (20), we have Since where Preforming EVD in Equation (24), we can obtain V e s , which reveal that V e s is equal to V ψ{ f } for all frequency bins. In addition, it can be seen that the matrix V e s or V ψ{ f } is estimated by focusing all frequency bins simultaneously; when power of the source at some frequency is weak or less than noise power, the matrices V ψ{ f } still share common angle of φ, θ across all frequency bands effectively and identically.
After obtaining the right singulars vectors of E { f i } , we then moved forward to find its left singulars vectors. We start by considering the following equations based in Equations (19) and (25); . .
We remark again that U e{ f i } s , U e{ f i } n have unit 2-norm columns instead of orthonormal columns [42]; where ξ jk ∈ C, ∀j ∈ M, ∀k ∈ M : j = k. Then, the singular values are obtained as follows: , where · 2 is the Euclidean norm, and e j ∈ C M is a j th column of E { f i } V e . Finally, the matrices U e{ f i } s , U e{ f i } n are obtained by solving Equation (26) with Equation (28), which also satisfy the condition in Equation (27). After performing HOGSVD of Equation (18) to obtain the left and right singular vectors of can be assembled as follows: .
Note that since orthonormal columns have not yet been assumed on the matrix U ψ{ f } in Theorem 1, the transformation procedure via HOGSVD is still compatible with Theorem 1 without requiring any modifications (For details, see Equations (A13) and (A14) in Appendix A).
We now consider the computational complexity of HOGSVD. It is not surprising that HOGSVD has a heavy computational burden; that is because matrix inversions are intensively used in Equation (20). To avoid the computational burden caused by the matrix inversions, Equation (20) is reformulated by the following technique [51]. It begins by performing the economy-sized QR decomposition of Equation (19);  .
where R ς i ∈ C M×M is the upper triangular matrix, and Q ς i ∈ C M×M is a one portion of the (M × P)-by-M matrix resulting from the QR decomposition of Equation (19). Next, S is simplified as where Performing EVD of Equation (32), then we have D ς = Z ς Λ ς Z H ς , where Z ς ∈ C M×M and Λ ς ∈ R M×M are the matrix of eigenvectors and matrix of eigenvalues, respectively. Finally, the alternative computation of V e is expressed as R H ς Z ς , where the K smallest eigenvalues of D ς are belonged to signal subspace.
Computational complexity of conventional HOGSVD in Equation (20) and optimized HOGSVD in Equation (32) (20) and (32) from Tables 1 and 2, it is clearly seen that the technique in Equations (31) and (32) simplifies the mathematical model, reduces the matrix operations and improves the speed of V e computation. When P = M i ∀i : i > 0, the optimized HOGSVD has arithmetic complexity of O M i+3 , which exhibits remarkably less computational complexity than the conventional HOGSVD that is presented as O M 2i+3 . Since P in most cases is much greater than M, therefore, the cost of the optimized HOGSVD can logically be less than the conventional HOGSVD.

DOA Estimation Scheme
After the transformation matrices are formed by using HOGSVD, we now proceed to describe a framework for estimating the wideband DOAs. We start by simplifying the wideband cross-correlation matrix in Equation (8) with EVD form and substituting with T { f i } MOP , as follows: where Here, Λ ∈ C K×K and Q ∈ C M×K are the diagonal matrix of eigenvalues and matrix of eigenvectors of Equation (33) in signal subspace, and L ∈ C K×K possess unitary property by the fact that Q, V e s are the matrices with orthonormal columns [46,47] are the eigenvectors and diagonal matrix of eigenvalues in signal subspace, and likewise, Furthermore, considering only the signal subspace by focusing on the K largest singular values Λ, we can expect that Equation (33) is equivalent to Equation (8); which can be proved by employing Lemma 1, Equations (12)- (14), and Equations (A4)-(A6) on Appendix A (We omit the proof since the result is easily obtained by performing straightforward substitution). In this state, T { f } MOP provides an efficient way to transform any given f into f o by observing the solution across frequency bands without loss of generality; it means that the transformation is no longer biased by the pair of distinct frequencies { f , f o }. Furthermore, it is clearly seen that the wideband cross-correlation matrix in Equation (33) is the combination of narrowband sample cross-correlation matrices across all frequency bins, but its array manifolds are focused on the single reference frequency by using T { f } MOP , which is now feasible to estimate the wideband DOAs by employing any recent subspace-based technique for estimating narrowband DOAs [18,[20][21][22][23][24][25][26], but using this wideband correlation matrix instead of the narrowband correlation matrix. Practical examples, such as MUSIC and ESPRIT, will be presented to showcase its applicability and effectiveness in the next section. Remarks: In case of the L-shaped array structure in Equation (2), we can repeat the proposed transformation procedure to find the solution for x subarray in Equation (2) and (3); starting from Equation (7) by replacing r (t, f ) with x (t, f ), the solution for the x subarray can be given by: By performing the same procedure, the solution for z subarray is likewise given by replacing and the subscript x with z in Equations (36)- (38);

DOA Estimation Scheme via MUSIC
MUSIC estimates the DOA of the sources by locating the peaks of MUSIC spectrum along with exploiting the orthogonality of the signal and noise subspaces [12,48]. Let us define the complementary (3). Additionally, the following complementary orthogonal space is also valid; by the fact that QQ H = V e s LL H V H e s = V e s V H e s , which implies that it is possible to reduce a computational complexity of Equation (33) by using only V e s instead of calculating Q. The computationally efficient two-dimensional MUSIC (2D-MUSIC) spectrum is expressed as When the denominator in Equation (44) approaches zero for the true angles of the signals, the 2D-MUSIC spectrum will have peak spikes indicating this angles. In case of the L-shaped array structure, the x and z subarray angles are estimated separately by locating the spectral peaks of the following equations: where a

DOA Estimation Scheme via ESPRIT
We start by recalling the array manifold A x (φ, f o ) and A z (θ, f o ) in Equation (3). ESPRIT takes advantage of the rotational invariance property of ULA [13], as follows: where [20,21,26], the matrices Q x , Q z can be simplified with Equations (3), (36)- (38) and (46), as follows: where C x , C z ∈ C K×K are invertible matrices, Q x 1 , Q z 1 ∈ C N−1×K and Q x 2 , Q z 2 ∈ C N−1×K stand for the first and last (N − 1) rows of Q x , Q z , respectively. Considering Equation (48), we can construct new matrices Γ x , Γ z as follows: The angles φ k , θ k can thus be estimated by the eigenvalues of Γ x , Γ z , as follows: where λ x k , λ z k ∈ C is the k th eigenvalue of Γ x , Γ z , respectively. Furthermore, it is possible to reduce the computational complexity by using only V e s as well as MUSIC; where V x 1 ,e s , V z 1 ,e s ∈ C N−1×K and V x 2 ,e s , V z 2 ,e s ∈ C N−1×K stand for the first and last (N − 1) rows of V x,e s , V z,e s , respectively.

Numerical Simulations
In this section, performances of fusion methods by using the proposed framework are demonstrated in four types of the following scenarios: (1) a performance of selected method and the proposed methods with respect to source types, (2) the performance with respect to the number of microphone elements, (3) the performance with considering automatic pairing of the x and z subarray angles, and (4) the performance under a reverberation environment. Scenarios 1, 2 and 4 have to find DOA of x and z subarray angles separately by using the data model in Equation (2). Whereas Scenario 3 has to find DOA of x and z subarray angles simultaneously with considering automatic pairing, by using the data model in Equation (1). We provided the simulation tests of the proposed methods in comparison to following methods: IMUSIC [31], TOFS [32], TOPS [33], Squared-TOPS [34], WS-TOPS [35]. Note that the CSS-based methods are excluded in these tests; this is because unintended biases, causing by a process of DOA preliminary estimation, should be taken into consideration to other candidate methods as discussed in the literature [31][32][33]41].
To measure the overall performance of estimating the x and z subarray angles for each scenario, the root-mean-square-error (RMSE) and standard division (SD) are defined as the following equations; where K is the source number, J is the number of trials,φ k represent the estimated x and z subarray angles each trial,φ k ,θ k represent an average of the estimated x and z subarray angles, and φ k , θ k represent true x and z subarray angles.
Computer   Figures 2 and 3 showed that the proposed method with ESPRIT can efficiently handle both source types compared to other candidate methods with acceptable SNR ranges. Subsequently, it is interesting to take a close look at 40 dB SNR in Figures 2 and 3 where IMUSIC, TOFS, the proposed method with MUSIC and ESPRIT showed very low RMSE, which could attest to good DOA estimation. When decreasing the SNR to 25 dB, IMUSIC and TOFS begin to demonstrate worse RMSE quality, which is much higher than the proposed methods, and it is clearly seen when decreasing the SNR to 10 dB that all tested methods are significantly dominated, but the proposed method with ESPRIT is still associated with more satisfactory results compared to using other methods. It should be mentioned that IMUSIC and TOFS require the number of sensor elements to be much higher than the number of sources to achieve fairly good results [31][32][33]41]. Hence, the simulation results in Figures 2 and  3 are able to provide evidence that the proposed methods perform better in estimation than other candidate methods when the incident sources are wideband and non-stationary signals. Although the performances of the proposed method with MUSIC is also dominated by the noises, the overall performances is still more effective than other methods.

Scenario 2: Performance with Respect to the Number of Microphone Elements
Figures 4 and 5 illustrates performance comparisons of the selected methods and the proposed methods in term of RMSE and SD over a range of SNR. The three uncorrelated source angles are human speeches, and are placed as previously used. Firstly, let us start by looking at the case of twelve microphones in Figures 4c and 5c. IMUSIC, TOFS and WS-TOPS exhibited remarkably low levels of RMSE in the SNR range from 15 to 30 dB; this is because their performances dramatically depend on the number of sensor elements more than the number of sources [31][32][33]41]. Likewise, the proposed method with MUSIC and ESPRIT also demonstrated very low RMSE, which may imply that the performance of the proposed methods, IMUSIC, TOFS and WS-TOPS are especially effective for a wideband DOA estimation. However, the low number of microphone elements should be considered for providing more practical applications. In the case of eight microphones in each subarray, the performances of the selected methods are dominated by the number of microphone elements as illustrated in Figures 4b and 5b. Furthermore, the performances of selected methods are dramatically degraded when employing four microphones as illustrated in Figures 4a and 5a. The relevant reason is that an undesirable false peak in the spatial spectrum of the selected methods occurred, caused by the perturbation of noise; when power of the noise at some frequency is high or grater than source power, the orthogonality between the noise subspace and search space at that frequency may be not sufficient to prevent the false-alarm peaks [41]. On the contrary, RMSE performance of the proposed methods are also dominated, but less than the other methods, by exhibiting the subspace for all frequency bins simultaneously as shown in Section 3. Therefore, the proposed methods provide substantially better RMSE performance than the other methods, which implies that dependency between the number of microphone elements and sources can be relaxed. This substantial ability is more meaningful for many practical applications.

Scenario 3: Performance with Considering Automatic Pairing
This scenario estimated the DOA of x and z subarray angles simultaneously with considering automatic pairing and following the data model in Equation (1). As the L-shaped array structure consisting of two ULAs as illustrated in Figure 1, some research works estimate the DOA of x and z subarray angles separately by implementing 1D DOA estimation for each ULA [17][18][19][20][21][22][23][24][25][26]. When utilizing more than one source, these algorithms require an additional angle pair matching procedure to map the relationship between the two independent subarray angles. For instance, finding the corresponding angle pairs by rearranging the alignment of a x (φ k , f ) with a fixed right-hand side of the array manifolds of the z-subarray in the sample cross-covariance matrix [52]. It should be noted that a pair-matching procedure may results in a performance degradation caused by pair-matching error. In order to achieve the automatic pairing without the pair-matching procedure, we selected the modified 2D-MUSIC in Equation (44) as the proposed method in this scenario. Furthermore, TOPS, Squared-TOPS, WS-TOPS are excluded in these tests by the fact that the methods have only supported the ULA model. Note that the 2D peak finding algorithm was employed on 2D-IMUSIC, 2D-TOFS and the proposed method. Figures 6 and 7 showed performance comparisons of 2D-IMUSIC, 2D-TOFS and the proposed method in term of RMSE and SD over a range of SNR, where the number of microphone elements including all subarray is eight, the three uncorrelated source angles are human speeches, and are placed as previously used. Figure 6 indicates that the proposed method with 2D-MUSIC exhibits extremely similar overall performances to 2D-IMUSIC and 2D-TOFS when the SNR increases to more than 10 dB; however, computational burden of the proposed method can be significantly lower than those of the other methods, which Section 4.5 will reveal further insight.

Scenario 4: Performance under Reverberation Environment
In this scenario, we compared RMSE and SD performances of the proposed methods to other methods with respect to reverberation time. This scenario estimated DOA of x and z subarray angles separately by using the data model in Equation (2) without considering automatic pairing. The proposed methods in this scenario are the modified MUSIC in Equation (45) and ESPRIT on Equations (50)- (52). The reverberations were simulated by the following procedure [53], and its simulated wall absorption coefficients are shown in Table 3, where the dimensions of enclosure room is 15 × 15 × 5 m, a measurement protocol of reverberation time is RT60, and the reverberation time is from 200 to 1000 ms. The three uncorrelated source angles are employed in the same way as previously used, and the number of microphone elements in each subarray is twelve. Figure 8 illustrated performance comparisons of the selected methods and the proposed methods, where a color of the graph on Figure 8a denotes RMSE, whereas a color of the graph on Figure 8b denotes SD estimation performance. The vertical axis is represented as the reverberation time and horizontal axis is represented as a range of SNR. Simulation results in Figure 8 indicated that reverberation has strong effects on RMSE and SD performances in both of the selected methods and the proposed methods, and the performances decreased more significantly at the high noise levels and the long reverberation times. Since the reverberation time is decreasing, all selected methods begin to demonstrate low RMSE. It means that the trade-off between the robustness of reverberation and SNR should be considered deeply in actual applications, for instance, applying a reverberation cancellation technique or a noise cancellation technique to provide much more reliable estimation performances of both RMSE and SD. The proposed methods, however, largely outperform the other methods with respect to the reverberation time index and SNR level range between 10 and 40 dB without considering the trade-off. This can support that the performance of the proposed methods can be especially effective for a wideband DOA estimation under a reverberant environment. Table 3. Wall absorption coefficients at various reverberation time in Scenario 4 [53].

Reverberation Time based on RT60 (Millisecond)
Axial Wall Plane

Positive Direction
Negative Direction  Table 3.

Computational Complexity
Computational complexity of the proposed methods was evaluated using execution time measurement under a stable environment. We provided a computational complexity in comparison with the following cases: (1) calculating DOAs of x and z subarray angles separately as shown in Figure 9a, and (2) calculating the DOAs of both subarray angles simultaneously as shown in Figure 9b. Note that computational burdens of a peak searching algorithm are relevant in this study, where the number of searching angle in each subarray is 180. It is apparently seen in Figure 9 that computation time of the other methods presented higher growth rates than the proposed methods. This is because the peak searching algorithm execution time is potentially high, and almost all selected methods require intensive computations by testing the orthogonality of subspace and search space of narrowband sample cross-correlation matrices for all frequency bins, which results in high computation costs. On the contrary, the proposed methods transform all narrowband sample cross-correlation matrices across all frequency bins into a single matrix as shown in Equations (33)- (35), and this matrix contains useful information of source cross-correlation matrices across all frequency bins as in other words, the orthogonality testing of subspace and search space can be done by using the wideband cross-correlation matrix in Equations (33)- (35) instead of narrowband sample cross-correlation matrices for all frequency bins. Therefore, the computational complexity of the proposed methods remarkably less than the other methods, which is confirmed by the test results in Figure 9.

Experimental Results
In this section, experiments were carried out to examine the performance of the proposed methods. Experimental parameters were chosen as the previous simulations, except as follows: We used human speakers as sources of the original speech with random sentences. Their speeches were recorded for 20 runs continuously, and each record signal, approximating 1 min long, was cut into 3 s epochs. Structure of the microphone was followed by Figures 1 and 10, and the specifications of the microphone and its recording device were followed on Table 4. The experiment was performed in an indoor meeting room, and its dimensions are shown in Figure 11, where sound pressure level in the meeting room in a normal situation is 46.6 dBA, and the estimated reverberation time is based on RT60 is 219 ms.
Two scenarios are considered: (1) estimating DOA of x and z subarray angles separately, and (2) estimating DOA of x and z subarray angles simultaneously while considering automatic pairing. In case of Experiment 1, the proposed methods are the modified MUSIC in Equation (45) and ESPRIT in Equations (50)- (52), comparing with the following methods: IMUSIC [31], TOFS [32], TOPS [33], Squared-TOPS [34], WS-TOPS [35]. In case of Experiment 2, the proposed method is the modified 2D-MUSIC in Equation (44), comparing with 2D-IMUSIC [31], and 2D-TOFS [32].    Tables 5 and 6 showed performance comparisons of the selected methods and the proposed method in term of RMSE over the range of source number, where Table 5 is for Experiment 1, and Table 6 is for Experiment 2. The boldfaced results highlight the optimal minimum RMSE in each problem. As highlighted in Table 5, the performance of IMUSIC exhibited the lowest RMSE when a single source was used, but the performance of the other methods including the proposed methods also exhibited similarly low RMSE in an acceptable error range. When the two sources are performed, the performance of TOPS, Squared-TOPS and WS-TOPS are directly dominated, whereas IMUSIC, TOFS and the proposed methods are slightly dominated, but still maintained sufficiently good performance. When the incident sources are increasing to three, we clearly see that the performance of IMUSIC, TOFS, TOPS, Squared-TOPS and WS-TOPS are significantly dominated by the number of incident sources, because those methods require the number of sensor elements to be much more higher than the number of sources to achieve reasonably good results, which can be verified by referring to the simulation results in Section 4 and Figures 4 and 5. The proposed methods, however, are able estimate the DOA of three sources effectively and better than the selected methods. The reason is that the proposed methods focus on the subspace across all frequency bins simultaneously instead of focusing each frequency band individually, which is stated in Section 3.2. In case of Experiment 2 in Table 6, the experiment results indicate that the proposed method with 2D-MUSIC exhibit extremely similar overall performances to 2D-IMUSIC and 2D-TOFS. As already stated in Section 4.5, the computational complexity of the proposed method is definitely lower than 2D-IMUSIC and 2D-TOFS by the fact that those methods check the orthogonality of subspace and search space of narrowband sample cross-correlation matrices for all frequency bins, resulting in very high computation requirement. The proposed method tests the orthogonality of subspace and search space by using the wideband sample cross-correlation matrix in Equation (33) instead of using the subspace of narrowband sample cross-correlation matrices for all frequency bins, but it is sufficient to exhibit significant effects as well as using the subspace of narrowband sample cross-correlation matrices for all frequency bins. In the end, the experimental results from Tables 5 and 6 are able to provide evidence that the proposed methods have better estimating performance than other methods with respect to the number of incident sources. Since the sound source directions are static in Tables 5 and 6, it is necessary to consider moving sound sources for more practical use. In future work, we will extend the proposed method for moving sound sources, and further develop the prototype to support more realistic tasks.

Conclusions
An efficient framework for estimating DOA of wideband sound sources was presented. The issue of transforming multiple narrowband cross-correlation matrices for all frequency bins into a wideband cross-correlation matrix has been addressed successfully by focusing on signal subspace for all frequency bins simultaneously instead of the pairing of temporal and reference frequency as done by the CSS-based methods. A new solution to this problem has been given by performing HOGSVD of the array of novel cross-correlation matrices, where elements in the row and column positions are a sample cross-correlation matrix between received signal and itself on two distinct frequencies.
It was shown in the theoretical analysis that the proposed transformation procedure provided the best solution under appropriate constraints, and no longer required any process of DOA preliminary estimation. Subsequently, we provided an alternative to construct the wideband cross-correlation matrix via the proposed transformation procedure, and wideband DOAs were estimated easily using this wideband matrix along with a single scheme of estimating DOAs in any narrowband subspace methods. A major contribution of this paper is that the proposed framework enables cutting-edge studies in the recent narrowband subspace methods to estimate DOA of the wideband sources directly, which results in reducing computational complexity and facilitating the estimation algorithm. We also have performed several examples of using the proposed framework, such as 2D-MUSIC, MUSIC, and ESPRIT method integration with the L-shaped microphone arrays. Furthermore, the simulation and experimental results showed that the fusion methods by using the proposed framework exhibited especially effective performance compared to other wideband DOA estimation methods over a range of SNR with much fewer sensors, high noise and reverberation conditions. We believe that the proposed method represents an efficient way for wideband DOA estimation and would be able to improve wideband DOA estimates not only for acoustic signal processing but also other possible related fields. which is possible to reduce the computation by performing single SVD operation on