Article

An Efficient Framework for Estimating the Direction of Multiple Sound Sources Using Higher-Order Generalized Singular Value Decomposition

by
Bandhit Suksiri
1 and
Masahiro Fukumoto
2,*
1
Department of Engineering, Graduate School of Engineering, Kochi University of Technology, Kami Campus, Kochi 782-0003, Japan
2
School of Information, Kochi University of Technology, Kami Campus, Kochi 782-0003, Japan
*
Author to whom correspondence should be addressed.
Sensors 2019, 19(13), 2977; https://doi.org/10.3390/s19132977
Submission received: 28 May 2019 / Revised: 1 July 2019 / Accepted: 3 July 2019 / Published: 5 July 2019
(This article belongs to the Section Physical Sensors)

Abstract:
This paper presents an efficient framework for estimating the direction-of-arrival (DOA) of wideband sound sources. The proposed framework provides an efficient way to construct a wideband cross-correlation matrix from multiple narrowband cross-correlation matrices over all frequency bins. The framework is inspired by the coherent signal subspace technique, with a further improvement of the linear transformation procedure: the new procedure no longer requires any DOA preliminary estimation, because it exploits unique cross-correlation matrices between the received signal and itself at distinct frequencies, along with the higher-order generalized singular value decomposition of the array of these unique matrices. Wideband DOAs are then estimated by employing any subspace-based technique for estimating narrowband DOAs, but using the proposed wideband cross-correlation matrix instead of the narrowband correlation matrix. This means the proposed framework enables cutting-edge narrowband subspace methods to estimate the DOAs of wideband sources directly, which reduces computational complexity and simplifies the estimation algorithm. Practical examples are presented to showcase its applicability and effectiveness, and the results show that the proposed fusion methods perform better than existing methods over a range of signal-to-noise ratios with just a few sensors, which makes them suitable for practical use.

1. Introduction

The fundamental competence of sound source localization has received much attention during the past decades, and has become an important part of navigation systems [1,2]. Direction-of-arrival (DOA) estimation in particular plays a critical role in navigation systems for the exploration of sources in widespread applications, including acoustic signal processing [3,4,5,6,7,8]. Several approaches have been proposed as potential ways to estimate DOA. For instance, time-difference-of-arrival-based DOA estimation is one of the most frequently used approaches, widely known as the generalized cross-correlation with phase transform (GCC-PHAT) [9]. Its low computational requirement makes it attractive for practical applications; however, its major drawback is low robustness in noisy and multipath environments. Another relevant approach is adopted from independent component analysis (ICA) in blind source separation [10,11]. ICA searches for independent components by measuring deviations from Gaussian distributions, such as maximization of negentropy or kurtosis. DOAs are easily estimated using the separated components for all frequency bins, but it should be noted that the estimation accuracy of such a method is highly sensitive to the non-Gaussianity measures.
In an alternative approach to estimating narrowband DOAs, the subspace method has been proposed in an effort to improve estimation performance. The most prominent methods observe the signal and noise subspaces to achieve more robust results, such as multiple signal classification (MUSIC) [12], estimation of signal parameters via rotational invariance techniques (ESPRIT) [13], and the propagator method [14,15], which have been used frequently for one-dimensional (1D) DOA estimation along with a uniform linear array (ULA) of sensors. In the case of two-dimensional (2D) DOA estimation, a new geometrical structure of the sensor array is required, and it was previously found that the L-shaped array structure is considerably effective for estimating 2D DOAs [16]. Additionally, the L-shaped array allows for simple implementation, because it consists of two ULAs connected orthogonally at one end of each ULA. For these reasons, the L-shaped array is widely applied in 2D DOA estimation methods [17,18,19,20,21,22,23,24,25,26], and its practical applications can be found in past research [27,28]. Although the narrowband subspace method may be unable to directly estimate wideband DOAs, one possible way to solve this problem is to apply the narrowband subspace method at each temporal frequency intensively, and then estimate the wideband DOA results by interpolating the narrowband DOA results across all frequency bins [29,30]. It should be noted again that the intensive computational costs of this solution may limit its practical use.
Several approaches have been proposed to solve the problem of estimating wideband DOAs; for example, incoherent MUSIC (IMUSIC) is one of the simplest methods for estimating wideband DOAs [31]. There are two steps in IMUSIC: firstly, a noise subspace model is constructed at each temporal frequency. Then, wideband DOAs are obtained by minimizing the norm of the orthogonal relation between a steering vector and the noise subspaces of all frequency bins. Although IMUSIC was demonstrated to be an effective method for estimating DOAs of multiple wideband signals in the high signal-to-noise ratio (SNR) region, a single small distortion of the noise subspace at any frequency can affect the whole DOA result. Many attempts have been made recently to overcome this problem. For instance, the test of orthogonality of frequency subspaces (TOFS) was proposed to overcome this difficulty [32], but performance degradation caused by small distortions still remains challenging. Another relevant approach is called the test of orthogonality of projected subspaces (TOPS) [33]. TOPS estimates DOAs by constructing the signal subspace at one reference frequency, and then measuring the orthogonality between that signal subspace and the noise subspaces of all frequency bins. Simulations showed that TOPS achieves higher accuracy than IMUSIC in the mid-SNR range; however, undesirable false peaks still remain. Revised and greatly improved versions of TOPS were proposed recently to reduce these false peaks [34,35], although their computational complexities increased dramatically compared to the classical TOPS.
Another notable approach to wideband DOA estimation is the coherent signal subspace method (CSS) [36,37]. CSS focuses the correlation matrices of the received signals at each temporal frequency into a single matrix, called the universal correlation matrix, associated with one focusing frequency via a linear transformation procedure. Wideband DOAs are estimated by applying a single scheme of any narrowband subspace method to the universal correlation matrix. However, in the transformation procedures of CSS [38,39,40], a DOA preliminary estimation is required before the wideband DOAs can be estimated. This is a clearly recognized shortcoming: any inferior initialization can lead to biased estimates. According to the literature [31,32,33,41], CSS demonstrates poorer performance than other methods such as TOPS. This is because the transformation procedure in CSS is solved solely from the subspace between a temporal frequency and the focusing frequency; to the best of the authors' knowledge, this means that the fundamental components of the transformation matrices across frequency bins may differ, which becomes clearly apparent when a narrowband DOA result at some frequency is not close enough to the true DOA. A single distorted component can affect the whole DOA result. Therefore, the solution has to exhibit the exact component even when the power of the received signal at that frequency is very weak; in other words, the transformation matrices have to be solved across all frequency bins jointly instead of from pairs of distinct frequencies.
Therefore, the purpose of this paper is to investigate a more efficient alternative for estimating wideband 2D DOAs. We consider wideband sources as sound sources, such as human speech and musical sounds. In order to estimate the wideband DOAs, we address the issue of transforming multiple narrowband cross-correlation matrices over all frequency bins into a single wideband cross-correlation matrix. Our study is inspired by the computational model of CSS, with a further improvement of the linear transformation procedure [36,37,38,39,40]. Since the transformation procedures of CSS focus only on the subspace between the current and reference frequencies, as previously mentioned, we propose a new transformation procedure which focuses on all frequency bins simultaneously and efficiently. The higher-order generalized singular value decomposition (HOGSVD) is used for the first time to achieve this [42]. By employing the HOGSVD of an array of new unique cross-correlation matrices, whose elements are sample cross-correlation matrices between the received signal and itself at two distinct frequencies, the new transformation procedure no longer requires any DOA preliminary estimation. Finally, the wideband cross-correlation matrix is constructed via the proposed transformation procedure, and the wideband DOAs can be estimated by employing any subspace-based technique for estimating narrowband DOAs, but using this wideband correlation matrix instead of the narrowband correlation matrix. Therefore, the proposed framework enables cutting-edge narrowband subspace methods to estimate the DOAs of wideband sources directly, which reduces computational complexity and simplifies the estimation algorithm. Practical examples, such as 2D-MUSIC and ESPRIT with an L-shaped array, are presented to showcase its applicability and effectiveness.
The rest of this paper is organized as follows. Section 2 presents the array signal model, basic assumptions, and the problem formulation for transforming narrowband sample cross-correlation matrices over all frequency bins into a single matrix, which is called the wideband cross-correlation matrix. The new transformation procedure is introduced in Section 3.1, and its effective solution via HOGSVD in Section 3.2. Section 3.3 provides a description of the proposed framework for estimating wideband DOAs by combining the proposed transformation procedure with a recent narrowband subspace DOA estimation scheme, and practical examples are presented in Section 3.3.1 and Section 3.3.2. The simulation and experimental results are compared with several existing methods in Section 4 and Section 5. Finally, Section 6 concludes this paper.

2. Preliminaries

2.1. Data Model

The proposed method presented in this paper considers far-field sound sources. The received signals are a composition of the multiple sources, each arriving from an angle in a spherical coordinate system. The received signals are transformed into a time-frequency representation via the short-time Fourier transform (STFT) and are given by
$$\mathbf{r}(t,f) = \mathbf{A}(\phi,\theta,f)\,\mathbf{s}(t,f) + \mathbf{w}(t,f), \tag{1}$$
where $\mathbf{r}(t,f) \in \mathbb{C}^{M}$ is the received signal vector, $\mathbf{s}(t,f) \in \mathbb{C}^{K}$ is the source signal vector, $\mathbf{w}(t,f) \in \mathbb{C}^{M}$ is additive noise, the constant $M$ is the number of microphone elements, and $K$ is the number of incident sources. The matrix $\mathbf{A}(\phi,\theta,f) \in \mathbb{C}^{M \times K}$ stands for the array manifold, where $\phi$ and $\theta$ are the angles of the source with respect to the x and z axes in the spherical coordinate system. Note that the elements of $\mathbf{A}(\phi,\theta,f)$ depend on the array geometry.
Consider the L-shaped array structure consisting of two ULAs as illustrated in Figure 1; the received signals simplify to
$$\begin{bmatrix} \mathbf{x}(t,f) \\ \mathbf{z}(t,f) \end{bmatrix} = \begin{bmatrix} \mathbf{A}_x(\phi,f) \\ \mathbf{A}_z(\theta,f) \end{bmatrix}\mathbf{s}(t,f) + \begin{bmatrix} \mathbf{w}_x(t,f) \\ \mathbf{w}_z(t,f) \end{bmatrix}, \tag{2}$$
where
$$\mathbf{A}_x(\phi,f) = \big[\,\mathbf{a}_x(\phi_1,f)\ \ \mathbf{a}_x(\phi_2,f)\ \ \cdots\ \ \mathbf{a}_x(\phi_K,f)\,\big],\qquad \mathbf{A}_z(\theta,f) = \big[\,\mathbf{a}_z(\theta_1,f)\ \ \mathbf{a}_z(\theta_2,f)\ \ \cdots\ \ \mathbf{a}_z(\theta_K,f)\,\big],$$
$$\mathbf{a}_x(\phi_k,f) = \big[\,e^{j\alpha_x(\phi_k,f)}\ \ e^{j2\alpha_x(\phi_k,f)}\ \ \cdots\ \ e^{jN\alpha_x(\phi_k,f)}\,\big]^{T},\qquad \mathbf{a}_z(\theta_k,f) = \big[\,1\ \ e^{j\alpha_z(\theta_k,f)}\ \ \cdots\ \ e^{j(N-1)\alpha_z(\theta_k,f)}\,\big]^{T},$$
$$\alpha_x(\phi_k,f) = \frac{f}{f_o}\cdot\frac{2\pi d\cos\phi_k}{\lambda},\qquad \alpha_z(\theta_k,f) = \frac{f}{f_o}\cdot\frac{2\pi d\cos\theta_k}{\lambda}. \tag{3}$$
From the above definitions, $\mathbf{x}(t,f), \mathbf{w}_x(t,f) \in \mathbb{C}^{N}$, $\mathbf{A}_x(\phi,f) \in \mathbb{C}^{N \times K}$ and the subscript $x$ belong to the x subarray; likewise, $\mathbf{z}(t,f), \mathbf{w}_z(t,f) \in \mathbb{C}^{N}$, $\mathbf{A}_z(\theta,f) \in \mathbb{C}^{N \times K}$ and the subscript $z$ belong to the z subarray, where $N$ is the number of microphone elements per subarray with $M = 2N$. The variable $t$ is time, $f$ is a source frequency, $d$ is the spacing of the microphone elements, and $\lambda$ is the wavelength given by $\lambda = c/f_o$, where $c$ is the speed of sound in the current medium and $f_o$ is a reference frequency.
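As a concrete illustration of the phase terms in Equation (3), the z-form steering vectors can be sketched in a few lines of numpy. The function name, default speed of sound, and example angles below are our own illustrative choices, not part of the paper; the x-subarray vector differs only in using element indices $1, \ldots, N$ instead of $0, \ldots, N-1$.

```python
import numpy as np

def steering_vectors(angles_deg, f, f_o, N, c=343.0):
    """Steering vectors of Equation (3), z-subarray form (phase indices 0..N-1).

    Illustrative helper: uses half-wavelength spacing d = lambda/2 at the
    reference frequency f_o (Assumption 2). Returns an N-by-K matrix."""
    lam = c / f_o                        # wavelength at the reference frequency
    d = lam / 2.0                        # half-wavelength element spacing
    phi = np.deg2rad(np.atleast_1d(angles_deg))
    alpha = (f / f_o) * 2.0 * np.pi * d * np.cos(phi) / lam   # alpha(phi_k, f)
    n = np.arange(N)[:, None]            # element indices 0 .. N-1
    return np.exp(1j * n * alpha[None, :])   # one unit-modulus column per source

A_z = steering_vectors([40.0, 75.0], f=2000.0, f_o=4000.0, N=8)
print(A_z.shape)   # (8, 2)
```

Each column has unit-modulus entries, and the first row is all ones because the first z-subarray element is taken as the phase reference.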

2.2. Basic Assumptions

Based on recent reviews, the following assumptions are required for the proposed framework:
Assumption 1: The number of sources is known or estimated in advance [43,44].
Assumption 2: The spacing between adjacent elements of each subarray, and the spacing between $x_1$ and $z_1$, should be set to $d = \lambda/2$ to avoid angle ambiguity in the array structure radiation [1,2,16].
Assumption 3: The source $\mathbf{s}(t,f)$ is assumed to be a Gaussian complex random variable, as suggested by the literature [12,16,31]. However, since we consider wideband sources such as human speech, $\mathbf{s}(t,f)$ can also be a super-Gaussian complex random variable, and in the most general case it is not a stationary signal over an appropriate period of time.
Assumption 4: According to the acoustic theory of speech, frequency dependence of the sound source, especially human speech, exists [45]; this means that the cross-covariance between a source and itself at distinct frequencies is not zero: $\mathrm{cov}(s_k(t,f), s_k(t,f')) = c_{s_k}(f,f')$, where $c_{s_k}(f,f') \in \mathbb{C}$. Next, suppose that the sources $\mathbf{s}(t,f)$ are uncorrelated, which implies that $s_k(t,f)$ and $s_{k'}(t,f')$ are statistically independent of each other when $k \neq k'$: $\mathrm{cov}(s_k(t,f), s_{k'}(t,f')) = 0$. When $k = k'$, the sources can be taken to be partially dependent following the literature [45]; therefore, the sample cross-covariance matrix of the incident sources over two different frequencies is given by
$$\mathbf{S}(f,f') = E\big\{\mathbf{s}(t,f)\,\mathbf{s}^{H}(t,f')\big\} = \mathrm{diag}\big(c_{s_1}(f,f'),\ c_{s_2}(f,f'),\ \ldots,\ c_{s_K}(f,f')\big). \tag{4}$$
Remark that $c_{s_k}(f,f)$ is equal to $\sigma_{s_k}^{2}(f)$, where $\sigma_{s_k}^{2}(f) \in \mathbb{R}_{\geq 0}$ is the variance of the $k$th source at frequency $f$.
Assumption 5: An additive white Gaussian noise is considered in this paper, modeled as a Gaussian random variable as in past studies. The noise cross-covariance matrix over two different frequencies is given by
$$\mathbf{W}(f,f') = E\big\{\mathbf{w}(t,f)\,\mathbf{w}^{H}(t,f')\big\} = c_{w}(f,f')\,\mathbf{I}_{M}, \tag{5}$$
where $c_{w}(f,f') \in \mathbb{C}$, and $\mathbf{I}_{i}$ is an $i$-by-$i$ identity matrix. Note again that $c_{w}(f,f) = \sigma_{w}^{2}(f)$, where $\sigma_{w}^{2}(f) \in \mathbb{R}_{\geq 0}$ is the variance of the noise at frequency $f$. In the case of the L-shaped array structure in Equation (2), we have
$$\begin{bmatrix} \mathbf{W}_{xx}(f,f') & \mathbf{W}_{xz}(f,f') \\ \mathbf{W}_{zx}(f,f') & \mathbf{W}_{zz}(f,f') \end{bmatrix} = \begin{bmatrix} c_{w}(f,f')\,\mathbf{I}_{N} & \mathbf{O}_{N \times N} \\ \mathbf{O}_{N \times N} & c_{w}(f,f')\,\mathbf{I}_{N} \end{bmatrix}, \tag{6}$$
where $\mathbf{O}_{i \times j}$ is an $i$-by-$j$ null matrix.

2.3. Transformation Problem

Under the data model and assumptions in Section 2.1 and Section 2.2, the cross-correlation matrix of the received signals is defined as
$$\mathbf{R}(f,f') = E\big\{\mathbf{r}(t,f)\,\mathbf{r}^{H}(t,f')\big\} = \mathbf{A}(\phi,\theta,f)\,\mathbf{S}(f,f')\,\mathbf{A}^{H}(\phi,\theta,f') + \mathbf{W}(f,f'), \tag{7}$$
where $\mathbf{R}(f,f') \in \mathbb{C}^{M \times M}$. In order to transform $\mathbf{R}(f,f)$ over the available frequency range into a single smoothed matrix, named the wideband cross-correlation matrix, a transformation procedure is required as mentioned previously [36], which is expressed as
$$\bar{\mathbf{R}} = \frac{1}{P}\sum_{i=1}^{P}\mathbf{T}(f_i)\,\mathbf{R}(f_i,f_i)\,\mathbf{T}^{H}(f_i) = \mathbf{A}(\phi,\theta,f_o)\Bigg(\frac{1}{P}\sum_{i=1}^{P}\mathbf{S}(f_i,f_i)\Bigg)\mathbf{A}^{H}(\phi,\theta,f_o) + \frac{1}{P}\sum_{i=1}^{P}\mathbf{T}(f_i)\,\mathbf{W}(f_i,f_i)\,\mathbf{T}^{H}(f_i), \tag{8}$$
where
$$\mathbf{A}(\phi,\theta,f_o) = \mathbf{T}(f_i)\,\mathbf{A}(\phi,\theta,f_i), \tag{9}$$
$\bar{\mathbf{R}} \in \mathbb{C}^{M \times M}$ is the wideband cross-correlation matrix, and $P$ is the number of STFT frequency bins. $\mathbf{T}(f_i) \in \mathbb{C}^{M \times M}$ is a transformation matrix, originally designed using the ordinary beamforming technique [36] or by minimizing the Frobenius norm of the array manifold matrices [37]. The objective of $\mathbf{T}(f)$ is to transform the array manifold $\mathbf{A}(\phi,\theta,f)$ at any given $f$ into $\mathbf{A}(\phi,\theta,f_o)$. All previous solutions of $\mathbf{T}(f)$ are based solely on the subspace between a pair of distinct frequencies $(f, f_o)$, as emphasized in the introduction [36,37,38,39,40]. When the power of the source at some frequency is weak or less than the noise power, the matrix $\mathbf{T}(f)$ may not share any common angle $(\phi, \theta)$ because its non-zero eigenvalues are not full rank, which results in performance degradation when estimating both $\mathbf{T}(f)$ and the wideband DOAs. If the transformation matrix can instead be focused using all frequency bins rather than a pair of frequencies, a good estimate of the DOAs in Equation (8) can be expected. Based on this hypothesis, a new concept and scheme are presented in the next section.

3. Proposed Method

This section introduces a new procedure for estimating a transformation matrix, its alternative solution by using the higher-order generalized singular value decomposition (HOGSVD), and practical examples of wideband DOA estimation scheme.

3.1. Problem for Estimating the Transformation Matrix and Its Solution

We start by introducing the following lemma that will be useful for obtaining a solution of transformation matrix.
Lemma 1.
Given a set of two distinct frequencies $(f, f_o)$ in Equation (7), a transformation matrix $\mathbf{T}(f)$ which satisfies the property in Equation (9), and assuming that $K < M$, the cross-correlation $\mathbf{R}(f,f_o)$ can be factorized into the singular value decomposition (SVD) form
$$\mathbf{R}(f,f_o) = \mathbf{U}_{s}(f,f_o)\,\boldsymbol{\Sigma}_{s}(f,f_o)\,\mathbf{V}_{s}^{H}(f,f_o) + \mathbf{U}_{n}(f,f_o)\,\boldsymbol{\Sigma}_{n}(f,f_o)\,\mathbf{V}_{n}^{H}(f,f_o), \tag{10}$$
where $\mathbf{U}_{s}(f,f_o), \mathbf{V}_{s}(f,f_o) \in \mathbb{C}^{M \times K}$ are the matrices of left and right singular vectors and $\boldsymbol{\Sigma}_{s}(f,f_o) \in \mathbb{R}^{K \times K}$ is the diagonal matrix of singular values of the signal subspace; likewise, $\mathbf{U}_{n}(f,f_o), \mathbf{V}_{n}(f,f_o) \in \mathbb{C}^{M \times (M-K)}$ and $\boldsymbol{\Sigma}_{n}(f,f_o) \in \mathbb{R}^{(M-K) \times (M-K)}$ correspond to the noise subspace. If the $K$ largest singular values of $\mathbf{T}(f)\mathbf{R}(f,f_o)$ and $\mathbf{R}(f,f_o)$ are equal, then $\mathbf{T}(f)\mathbf{U}_{s}(f,f_o)$ is a matrix with orthonormal columns.
Proof. 
Since the transformation of $\mathbf{R}(f,f_o)$ is expressed by $\mathbf{T}(f)\mathbf{R}(f,f_o)$ and the array manifolds $\mathbf{A}(\phi,\theta,f)$ and $\mathbf{A}(\phi,\theta,f_o)$ are full-rank matrices [36], Lemma 1 is valid if and only if the $K$ largest singular values of $\mathbf{T}(f)\mathbf{R}(f,f_o)$ and $\mathbf{R}(f,f_o)$ are equal; therefore, $\mathbf{U}_{s}^{H}(f,f_o)\,\mathbf{T}^{H}(f)\,\mathbf{T}(f)\,\mathbf{U}_{s}(f,f_o) = \mathbf{I}_{K}$. Considering that the $M-K$ smallest singular values of $\mathbf{R}(f,f_o)$ are close to zero, assuming a noise-free signal and using solely the signal subspace $\mathbf{U}_{s}(f,f_o)\boldsymbol{\Sigma}_{s}(f,f_o)\mathbf{V}_{s}^{H}(f,f_o)$, we have
$$\big(\mathbf{T}(f)\mathbf{R}(f,f_o) - \mathbf{W}(f,f_o)\big)^{H}\big(\mathbf{T}(f)\mathbf{R}(f,f_o) - \mathbf{W}(f,f_o)\big) = \mathbf{V}_{s}(f,f_o)\,\boldsymbol{\Sigma}_{s}(f,f_o)\boldsymbol{\Sigma}_{s}(f,f_o)\,\mathbf{V}_{s}^{H}(f,f_o). \tag{11}$$
Performing the eigenvalue decomposition (EVD) of Equation (11), the square roots of the non-zero eigenvalues of the above matrix are equal to $\boldsymbol{\Sigma}_{s}(f,f_o)$ [46,47]. This completes the proof of the lemma. □
Lemma 1 shows that $\mathbf{R}(f,f_o)$ and $\mathbf{T}(f)\mathbf{R}(f,f_o)$ share common singular values and right singular vectors, whereas their left singular vectors may differ. Since $\mathbf{A}(\phi,\theta,f)$ and $\mathbf{A}(\phi,\theta,f_o)$ are full rank, the remaining components are given by [48]:
$$\mathbf{A}(\phi,\theta,f) = \mathbf{U}_{s}(f,f_o)\,\mathbf{F}(f,f_o),\qquad \mathbf{T}(f)\mathbf{A}(\phi,\theta,f) = \mathbf{T}(f)\mathbf{U}_{s}(f,f_o)\,\mathbf{F}(f,f_o),\qquad \mathbf{A}(\phi,\theta,f_o) = \mathbf{V}_{s}(f,f_o)\,\mathbf{G}(f,f_o), \tag{12}$$
where
$$\boldsymbol{\Sigma}_{s}(f,f_o) = \mathbf{F}(f,f_o)\,\mathbf{S}(f,f_o)\,\mathbf{G}^{H}(f,f_o), \tag{13}$$
and $\mathbf{F}(f,f_o), \mathbf{G}(f,f_o) \in \mathbb{C}^{K \times K}$ are full-rank, invertible matrices. From Equations (9) and (12), we have
$$\mathbf{T}(f)\,\mathbf{U}_{s}(f,f_o) = \mathbf{V}_{s}(f,f_o)\,\mathbf{G}(f,f_o)\,\mathbf{F}^{-1}(f,f_o), \tag{14}$$
which means that the right singular vectors of $\mathbf{R}(f,f_o)$ and the left singular vectors of $\mathbf{T}(f)\mathbf{R}(f,f_o)$ share a common subspace when $\mathbf{G}(f,f_o)\mathbf{F}^{-1}(f,f_o)$ is unitary.
Since the left singular vectors of $\mathbf{T}(f)\mathbf{R}(f,f_o)$ exist, we continue by introducing a new transformation procedure. The matrix $\mathbf{T}(f)$ can be found as a solution to
$$\underset{\mathbf{T}(f)}{\text{minimize}}\ \big\|\mathbf{R}(f_o,f_o) - \mathbf{T}(f)\,\mathbf{R}(f,f_o)\big\|_{F}^{2}\quad \text{subject to}\quad \sum_{k=1}^{K}\sigma_{k}^{2}\big(\mathbf{T}(f)\mathbf{R}(f,f_o)\big) = \sum_{k=1}^{K}\sigma_{k}^{2}\big(\mathbf{R}(f,f_o)\big), \tag{15}$$
where $\|\cdot\|_{F}$ is the Frobenius norm, and $\sum_{k=1}^{K}\sigma_{k}^{2}(\mathbf{A})$ is the sum of the squared $K$ largest singular values of $\mathbf{A}$. If the constraint in Equation (15) is not imposed, then one possible choice is obtained from the least squares problem [49,50]; the solution is derived by finding the point where the derivative of the cost function with respect to $\mathbf{T}(f)$ is zero, which gives $\mathbf{T}^{\mathrm{LS}}(f) = \mathbf{R}(f_o,f_o)\,\mathbf{R}^{H}(f,f_o)\big(\mathbf{R}(f,f_o)\,\mathbf{R}^{H}(f,f_o)\big)^{-1}$ and requires $\boldsymbol{\Sigma}_{s}(f_o,f_o)\,\boldsymbol{\Sigma}_{s}^{-1}(f,f_o) = \mathbf{I}_{K}$, which is difficult to satisfy in practice. To solve the problem more practically, an alternative solution is introduced, based on the constraint in Equation (15) and Lemma 1:
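The closed form of $\mathbf{T}^{\mathrm{LS}}(f)$ can be checked numerically. The sketch below uses random full-rank stand-ins for the two correlation matrices (illustrative values, not real array data) and compares the closed form against numpy's generic least-squares solver; this is only a sanity check of the unconstrained formula, not the proposed MOP solution.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 6
# Toy full-rank stand-ins for R(f_o,f_o) and R(f,f_o).
R_oo = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_fo = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))

# Unconstrained least-squares solution:
# T_LS = R(f_o,f_o) R^H(f,f_o) (R(f,f_o) R^H(f,f_o))^{-1}.
T_ls = R_oo @ R_fo.conj().T @ np.linalg.inv(R_fo @ R_fo.conj().T)

# Cross-check on the transposed system:
# min_T ||R_oo - T R_fo||_F  <=>  min_X ||R_oo^H - R_fo^H X||_F with T = X^H.
X, *_ = np.linalg.lstsq(R_fo.conj().T, R_oo.conj().T, rcond=None)
print(np.allclose(T_ls, X.conj().T))   # True
```

Because the toy $\mathbf{R}(f,f_o)$ is invertible, the residual of the unconstrained fit is zero here; in the rank-deficient noise-free case of the paper, a pseudo-inverse would be needed instead.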
Theorem 1.
Let $\mathbf{U}_{\psi}(f), \mathbf{V}_{\psi}(f) \in \mathbb{C}^{M \times K}$ be the signal-subspace matrices containing the left and right singular vectors of $\mathbf{R}(f,f_o)\,\mathbf{R}^{H}(f_o,f_o)$. Imposing the constraint in Equation (15) and Lemma 1, along with the modification of the orthogonal Procrustes problem (MOP), an alternative solution to Equation (15) is given by
$$\mathbf{T}^{\mathrm{MOP}}(f) = \mathbf{V}_{\psi}(f)\,\mathbf{U}_{\psi}^{\dagger}(f), \tag{16}$$
where $\dagger$ stands for the pseudo-inverse of a matrix. Defining the square matrix $\boldsymbol{\Omega}(f) \in \mathbb{C}^{K \times K}$ as the matrix containing the error corrections, the transformation error remains consistent with the following equation:
$$\varepsilon^{\mathrm{MOP}}(f) = 2\,\mathrm{tr}\Big(\boldsymbol{\Sigma}_{\psi}(f)\big(\boldsymbol{\Omega}(f) - \mathbf{I}_{K}\big)\Big) + \mathrm{tr}\Big(\boldsymbol{\Sigma}_{n}^{2}(f,f_o)\,\big(\mathbf{U}_{\psi}^{\dagger}(f)\,\mathbf{U}_{\varepsilon}(f)\big)^{H}\big(\mathbf{U}_{\psi}^{\dagger}(f)\,\mathbf{U}_{\varepsilon}(f)\big)\Big), \tag{17}$$
where $\boldsymbol{\Sigma}_{\psi}(f) \in \mathbb{R}^{K \times K}$ is the diagonal matrix of the $K$ largest singular values, and $\mathbf{U}_{\varepsilon}(f) \in \mathbb{C}^{M \times (M-K)}$ contains the noise-subspace left singular vectors, of $\mathbf{R}(f,f_o)\,\mathbf{R}^{H}(f_o,f_o)$, respectively.
Proof. 
See Appendix A. □
Theorem 1 provides an efficient way to construct $\mathbf{T}(f)$ without any DOA preliminary estimation, but the solution is still based solely on the subspace between a pair of distinct frequencies. In order to obtain the solution across all frequency bins, we present an alternative for constructing $\mathbf{T}(f)$ by using HOGSVD along with Theorem 1, which the next section addresses.

3.2. Estimation of the Transformation Matrices by HOGSVD

Suppose we have a set of $P$ complex matrices $\mathbf{E}(f_i) \in \mathbb{C}^{M \times M}$, all of full rank:
$$\mathbf{E}(f_1) = \mathbf{R}(f_1,f_o)\,\mathbf{R}^{H}(f_o,f_o),\quad \mathbf{E}(f_2) = \mathbf{R}(f_2,f_o)\,\mathbf{R}^{H}(f_o,f_o),\quad \ldots,\quad \mathbf{E}(f_P) = \mathbf{R}(f_P,f_o)\,\mathbf{R}^{H}(f_o,f_o), \tag{18}$$
where $f_1, f_2, \ldots, f_P$ is a set of frequency intervals, and the cross-correlation matrices $\mathbf{R}(f_i,f_o)$ and $\mathbf{R}(f_o,f_o)$ are obtained from Equation (7). The HOGSVD of these $P$ matrices is defined as the generalized singular value decomposition (GSVD) of $P \geq 2$ datasets whose right singular vectors are identical in all decompositions [42], as follows:
$$\begin{bmatrix} \mathbf{E}(f_1) \\ \mathbf{E}(f_2) \\ \vdots \\ \mathbf{E}(f_P) \end{bmatrix} = \begin{bmatrix} \mathbf{U}_{e,s}(f_1)\,\boldsymbol{\Sigma}_{e,s}(f_1) \\ \mathbf{U}_{e,s}(f_2)\,\boldsymbol{\Sigma}_{e,s}(f_2) \\ \vdots \\ \mathbf{U}_{e,s}(f_P)\,\boldsymbol{\Sigma}_{e,s}(f_P) \end{bmatrix}\mathbf{V}_{e,s}^{H} + \begin{bmatrix} \mathbf{U}_{e,n}(f_1)\,\boldsymbol{\Sigma}_{e,n}(f_1) \\ \mathbf{U}_{e,n}(f_2)\,\boldsymbol{\Sigma}_{e,n}(f_2) \\ \vdots \\ \mathbf{U}_{e,n}(f_P)\,\boldsymbol{\Sigma}_{e,n}(f_P) \end{bmatrix}\mathbf{V}_{e,n}^{H}, \tag{19}$$
where $\mathbf{U}_{e,s}(f_i) \in \mathbb{C}^{M \times K}$ and $\mathbf{U}_{e,n}(f_i) \in \mathbb{C}^{M \times (M-K)}$ are the matrices of left singular vectors, $\mathbf{V}_{e,s} \in \mathbb{C}^{M \times K}$ and $\mathbf{V}_{e,n} \in \mathbb{C}^{M \times (M-K)}$ are the matrices of right singular vectors, and $\boldsymbol{\Sigma}_{e,s}(f_i) \in \mathbb{R}^{K \times K}$ and $\boldsymbol{\Sigma}_{e,n}(f_i) \in \mathbb{R}^{(M-K) \times (M-K)}$ are the diagonal matrices of singular values. The subscripts $s$ and $n$ denote the signal and noise subspaces, respectively. Unlike the left singular vectors $\mathbf{U}_{s}(f,f_o)$ and $\mathbf{U}_{n}(f,f_o)$, which have orthonormal columns from the SVD, $\mathbf{U}_{e,s}(f_i)$ and $\mathbf{U}_{e,n}(f_i)$ have unit 2-norm columns instead.
To show that $\mathbf{V}_{e,s}$ is equal to $\mathbf{V}_{\psi}(f)$ for all frequency bins, let us start from a brief description of the HOGSVD benchmark. The matrix $\mathbf{V}_{e,s}$ is obtained by performing the EVD of the following matrix:
$$\mathbf{S} = \frac{1}{P(P-1)}\sum_{i=1}^{P}\sum_{j=i+1}^{P}\Big(\mathbf{E}^{H}(f_i)\mathbf{E}(f_i)\big(\mathbf{E}^{H}(f_j)\mathbf{E}(f_j)\big)^{-1} + \mathbf{E}^{H}(f_j)\mathbf{E}(f_j)\big(\mathbf{E}^{H}(f_i)\mathbf{E}(f_i)\big)^{-1}\Big). \tag{20}$$
Let us redefine
$$\mathbf{E}(f_i) = \mathbf{U}_{o}(f_i)\,\boldsymbol{\Sigma}_{o}(f_i)\,\mathbf{V}_{o}^{H}(f_i), \tag{21}$$
where
$$\mathbf{U}_{o}(f_i) = \big[\,\mathbf{U}_{\psi}(f_i)\ \ \mathbf{U}_{\varepsilon}(f_i)\,\big],\qquad \mathbf{V}_{o}(f_i) = \big[\,\mathbf{V}_{\psi}(f_i)\ \ \mathbf{V}_{\varepsilon}(f_i)\,\big],\qquad \boldsymbol{\Sigma}_{o}(f_i) = \begin{bmatrix} \boldsymbol{\Sigma}_{\psi}(f_i) & \mathbf{O}_{K \times (M-K)} \\ \mathbf{O}_{(M-K) \times K} & \boldsymbol{\Sigma}_{\varepsilon}(f_i) \end{bmatrix}, \tag{22}$$
$\boldsymbol{\Sigma}_{\varepsilon}(f_i) \in \mathbb{R}^{(M-K) \times (M-K)}$ is the matrix of the $M-K$ smallest singular values of $\mathbf{R}(f_i,f_o)\,\mathbf{R}^{H}(f_o,f_o)$, and $\mathbf{V}_{\psi}(f_i) = \mathbf{Q}_{s}(f_o)$, $\mathbf{V}_{\varepsilon}(f_i) = \mathbf{Q}_{n}(f_o)$ by employing Theorem 1 (for details, see Appendix A). Substituting Equations (21) and (22) into Equation (20), we have
$$\mathbf{E}^{H}(f_i)\mathbf{E}(f_i)\big(\mathbf{E}^{H}(f_j)\mathbf{E}(f_j)\big)^{-1} = \mathbf{V}_{o}(f_i)\,\boldsymbol{\Sigma}_{o}(f_i)\boldsymbol{\Sigma}_{o}(f_i)\boldsymbol{\Sigma}_{o}^{-1}(f_j)\boldsymbol{\Sigma}_{o}^{-1}(f_j)\,\mathbf{V}_{o}^{-1}(f_j),\qquad \mathbf{E}^{H}(f_j)\mathbf{E}(f_j)\big(\mathbf{E}^{H}(f_i)\mathbf{E}(f_i)\big)^{-1} = \mathbf{V}_{o}(f_j)\,\boldsymbol{\Sigma}_{o}(f_j)\boldsymbol{\Sigma}_{o}(f_j)\boldsymbol{\Sigma}_{o}^{-1}(f_i)\boldsymbol{\Sigma}_{o}^{-1}(f_i)\,\mathbf{V}_{o}^{-1}(f_i). \tag{23}$$
Since $\mathbf{V}_{\psi}(f_i) = \mathbf{Q}_{s}(f_o) = \mathbf{V}_{\psi}(f_j)$ and $\mathbf{V}_{\varepsilon}(f_i) = \mathbf{Q}_{n}(f_o) = \mathbf{V}_{\varepsilon}(f_j)$ for all frequency bins, we therefore have
$$\mathbf{S} = \mathbf{V}_{e}\Bigg(\frac{1}{P(P-1)}\sum_{i=1}^{P}\sum_{j=i+1}^{P}\Big(\boldsymbol{\Sigma}_{o}(f_i)\boldsymbol{\Sigma}_{o}(f_i)\boldsymbol{\Sigma}_{o}^{-1}(f_j)\boldsymbol{\Sigma}_{o}^{-1}(f_j) + \boldsymbol{\Sigma}_{o}(f_j)\boldsymbol{\Sigma}_{o}(f_j)\boldsymbol{\Sigma}_{o}^{-1}(f_i)\boldsymbol{\Sigma}_{o}^{-1}(f_i)\Big)\Bigg)\mathbf{V}_{e}^{-1}, \tag{24}$$
where
$$\mathbf{V}_{e} = \big[\,\mathbf{Q}_{s}(f_o)\ \ \mathbf{Q}_{n}(f_o)\,\big]. \tag{25}$$
Performing the EVD in Equation (24), we obtain $\mathbf{V}_{e,s}$, which reveals that $\mathbf{V}_{e,s}$ is equal to $\mathbf{V}_{\psi}(f)$ for all frequency bins. In addition, it can be seen that the matrix $\mathbf{V}_{e,s}$, or $\mathbf{V}_{\psi}(f)$, is estimated by focusing on all frequency bins simultaneously; when the power of the source at some frequency is weak or less than the noise power, the matrices $\mathbf{V}_{\psi}(f)$ still share a common angle $(\phi, \theta)$ across all frequency bands effectively and identically.
After obtaining the right singular vectors of $\mathbf{E}(f_i)$, we proceed to find its left singular vectors. We start by considering the following equation, based on Equations (19) and (25):
$$\begin{bmatrix} \mathbf{E}(f_1) \\ \mathbf{E}(f_2) \\ \vdots \\ \mathbf{E}(f_P) \end{bmatrix}\mathbf{V}_{e} = \begin{bmatrix} \mathbf{U}_{e,s}(f_1)\,\boldsymbol{\Sigma}_{e,s}(f_1) \\ \vdots \\ \mathbf{U}_{e,s}(f_P)\,\boldsymbol{\Sigma}_{e,s}(f_P) \end{bmatrix}\begin{bmatrix} \mathbf{I}_{K} & \mathbf{O}_{K \times (M-K)} \end{bmatrix} + \begin{bmatrix} \mathbf{U}_{e,n}(f_1)\,\boldsymbol{\Sigma}_{e,n}(f_1) \\ \vdots \\ \mathbf{U}_{e,n}(f_P)\,\boldsymbol{\Sigma}_{e,n}(f_P) \end{bmatrix}\begin{bmatrix} \mathbf{O}_{(M-K) \times K} & \mathbf{I}_{M-K} \end{bmatrix}. \tag{26}$$
We remark again that $\mathbf{U}_{e,s}(f_i)$ and $\mathbf{U}_{e,n}(f_i)$ have unit 2-norm columns instead of orthonormal columns [42]:
$$\begin{bmatrix} \mathbf{U}_{e,s}^{H}(f_i) \\ \mathbf{U}_{e,n}^{H}(f_i) \end{bmatrix}\big[\,\mathbf{U}_{e,s}(f_i)\ \ \mathbf{U}_{e,n}(f_i)\,\big] = \begin{bmatrix} 1 & \xi_{12} & \cdots & \xi_{1M} \\ \xi_{21} & 1 & \cdots & \xi_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ \xi_{M1} & \xi_{M2} & \cdots & 1 \end{bmatrix}, \tag{27}$$
where $\xi_{jk} \in \mathbb{C}$ for $j, k \leq M$, $j \neq k$. Then, the singular values are obtained as follows:
$$\boldsymbol{\Sigma}_{e,s}(f_i) = \mathrm{diag}\big(\|\mathbf{e}_1\|_2,\ \|\mathbf{e}_2\|_2,\ \ldots,\ \|\mathbf{e}_K\|_2\big),\qquad \boldsymbol{\Sigma}_{e,n}(f_i) = \mathrm{diag}\big(\|\mathbf{e}_{K+1}\|_2,\ \|\mathbf{e}_{K+2}\|_2,\ \ldots,\ \|\mathbf{e}_M\|_2\big), \tag{28}$$
where $\|\cdot\|_2$ is the Euclidean norm, and $\mathbf{e}_j \in \mathbb{C}^{M}$ is the $j$th column of $\mathbf{E}(f_i)\mathbf{V}_{e}$. Finally, the matrices $\mathbf{U}_{e,s}(f_i)$ and $\mathbf{U}_{e,n}(f_i)$ are obtained by solving Equation (26) with Equation (28), which also satisfies the condition in Equation (27).
After performing the HOGSVD of Equation (18) to obtain the left and right singular vectors of $\mathbf{R}(f_i,f_o)\,\mathbf{R}^{H}(f_o,f_o)$, the transformation matrices $\mathbf{T}^{\mathrm{MOP}}(f_i)$ can be assembled as follows:
$$\mathbf{T}^{\mathrm{MOP}}(f_i) = \mathbf{V}_{e,s}\,\mathbf{U}_{e,s}^{\dagger}(f_i),\qquad i = 1, 2, \ldots, P. \tag{29}$$
Note that since orthonormal columns have not been assumed for the matrix $\mathbf{U}_{\psi}(f)$ in Theorem 1, the transformation procedure via HOGSVD remains compatible with Theorem 1 without requiring any modifications (for details, see Equations (A13) and (A14) in Appendix A).
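The key property behind this construction, namely that the EVD of Equation (20) recovers the shared right singular vectors, can be verified on synthetic data. In the toy numpy sketch below the matrices are constructed by design to share a common $\mathbf{V}_e$, which is an assumption of the toy setup rather than a property of real correlation data.

```python
import numpy as np

rng = np.random.default_rng(2)
M, P = 5, 4

# Synthetic E(f_i) built to share one set of right singular vectors V_e,
# mimicking the structure of R(f_i,f_o) R^H(f_o,f_o) under Theorem 1.
V_e, _ = np.linalg.qr(rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
E = []
for _ in range(P):
    U, _ = np.linalg.qr(rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)))
    E.append(U @ np.diag(rng.uniform(1.0, 3.0, M)) @ V_e.conj().T)

# Equation (20): pairwise-balanced sum whose eigenvectors recover V_e.
S = np.zeros((M, M), dtype=complex)
for i in range(P):
    for j in range(i + 1, P):
        Gi = E[i].conj().T @ E[i]
        Gj = E[j].conj().T @ E[j]
        S += Gi @ np.linalg.inv(Gj) + Gj @ np.linalg.inv(Gi)
S /= P * (P - 1)

_, Z = np.linalg.eigh(S)
# Each eigenvector of S matches one column of V_e up to a unit phase.
overlap = np.abs(V_e.conj().T @ Z)
print(np.all(overlap.max(axis=0) > 0.99))
```

Here every pairwise term reduces to $\mathbf{V}_e\,\mathbf{D}\,\mathbf{V}_e^H$ with a real diagonal $\mathbf{D}$, so $\mathbf{S}$ is Hermitian and its eigenvectors coincide with the common right singular vectors, matching Equation (24).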
We now consider the computational complexity of HOGSVD. It is not surprising that HOGSVD carries a heavy computational burden, because matrix inversions are used intensively in Equation (20). To avoid the burden caused by the matrix inversions, Equation (20) is reformulated by the following technique [51]. It begins by performing the economy-sized QR decomposition of Equation (19):
$$\begin{bmatrix} \mathbf{E}(f_1) \\ \mathbf{E}(f_2) \\ \vdots \\ \mathbf{E}(f_P) \end{bmatrix} = \begin{bmatrix} \mathbf{Q}_{\varsigma}(1) \\ \mathbf{Q}_{\varsigma}(2) \\ \vdots \\ \mathbf{Q}_{\varsigma}(P) \end{bmatrix}\mathbf{R}_{\varsigma}, \tag{30}$$
where $\mathbf{R}_{\varsigma} \in \mathbb{C}^{M \times M}$ is the upper triangular matrix, and $\mathbf{Q}_{\varsigma}(i) \in \mathbb{C}^{M \times M}$ is one block of the $MP$-by-$M$ matrix resulting from the QR decomposition of Equation (19). Next, $\mathbf{S}$ is simplified as
$$\mathbf{S}_{\varsigma} = \frac{1}{P(P-1)}\big(\mathbf{D}_{\varsigma} - P\,\mathbf{I}_{M}\big), \tag{31}$$
where
$$\mathbf{D}_{\varsigma} = \sum_{i=1}^{P}\big(\mathbf{Q}_{\varsigma}^{H}(i)\,\mathbf{Q}_{\varsigma}(i)\big)^{-1}. \tag{32}$$
Performing the EVD of Equation (32), we have $\mathbf{D}_{\varsigma} = \mathbf{Z}_{\varsigma}\boldsymbol{\Lambda}_{\varsigma}\mathbf{Z}_{\varsigma}^{-1}$, where $\mathbf{Z}_{\varsigma} \in \mathbb{C}^{M \times M}$ and $\boldsymbol{\Lambda}_{\varsigma} \in \mathbb{R}^{M \times M}$ are the matrix of eigenvectors and the diagonal matrix of eigenvalues, respectively. Finally, the alternative computation of $\mathbf{V}_{e}$ is expressed as $\mathbf{R}_{\varsigma}^{H}\mathbf{Z}_{\varsigma}$, where the $K$ smallest eigenvalues of $\mathbf{D}_{\varsigma}$ belong to the signal subspace.
The computational complexities of the conventional HOGSVD in Equation (20) and the optimized HOGSVD in Equation (32) are investigated under the following scenario: $M \times M$ matrix addition, subtraction, multiplication and element-wise multiplication follow the traditional implementations, whereas $M \times M$ matrix inversion and the QR decomposition of the $MP \times M$ matrix are implemented using the Gauss–Jordan elimination algorithm and the Householder transformation, respectively. Comparing the computation costs of Equations (20) and (32) in Table 1 and Table 2, it is clearly seen that the technique in Equations (31) and (32) simplifies the mathematical model, reduces the matrix operations, and improves the speed of computing $\mathbf{V}_{e}$. When $P = M^{i}$ for any $i > 0$, the optimized HOGSVD has an arithmetic complexity of $O(M^{i+3})$, which is remarkably less than the $O(M^{2i+3})$ of the conventional HOGSVD. Since $P$ is in most cases much greater than $M$, the cost of the optimized HOGSVD is logically less than that of the conventional HOGSVD.
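A minimal numpy sketch of the optimized route: it computes $\mathbf{S}$ of Equation (20) directly, then the QR-based quantities of Equations (30)–(32), and checks that the two are related through the triangular factor. The relation $\mathbf{S} = \mathbf{R}_{\varsigma}^{H}\,\mathbf{S}_{\varsigma}\,\mathbf{R}_{\varsigma}^{-H}$ used in the check is our reading of the reformulation, and all matrices are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(3)
M, P = 4, 6
E = [rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M)) for _ in range(P)]

# Direct S of Equation (20), with its P*(P-1) explicit matrix inversions.
S = sum(E[i].conj().T @ E[i] @ np.linalg.inv(E[j].conj().T @ E[j])
        + E[j].conj().T @ E[j] @ np.linalg.inv(E[i].conj().T @ E[i])
        for i in range(P) for j in range(i + 1, P)) / (P * (P - 1))

# Optimized route, Equations (30)-(32): one thin QR of the stacked E's,
# then only P small inversions of Q_varsigma(i)^H Q_varsigma(i).
Q, R = np.linalg.qr(np.vstack(E))               # (M*P)-by-M thin QR
Qb = [Q[i * M:(i + 1) * M] for i in range(P)]   # per-bin blocks Q_varsigma(i)
D = sum(np.linalg.inv(q.conj().T @ q) for q in Qb)
S_var = (D - P * np.eye(M)) / (P * (P - 1))

# S and S_var are related by a similarity transform through R (our reading).
print(np.allclose(S, R.conj().T @ S_var @ np.linalg.inv(R.conj().T)))
```

The saving comes from the stacked $\mathbf{Q}$ having orthonormal columns, so $\sum_i \mathbf{Q}_{\varsigma}^{H}(i)\mathbf{Q}_{\varsigma}(i) = \mathbf{I}_M$ collapses the pairwise double sum into the single sum $\mathbf{D}_{\varsigma}$.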

3.3. DOA Estimation Scheme

After the transformation matrices are formed using HOGSVD, we now proceed to describe a framework for estimating the wideband DOAs. We start by rewriting the wideband cross-correlation matrix in Equation (8) in EVD form and substituting $\mathbf{T}^{\mathrm{MOP}}(f_i)$, as follows:
$$\frac{1}{P}\sum_{i=1}^{P}\mathbf{T}^{\mathrm{MOP}}(f_i)\,\mathbf{R}(f_i,f_i)\,\mathbf{T}^{\mathrm{MOP},H}(f_i) = \frac{1}{P}\sum_{i=1}^{P}\mathbf{V}_{e,s}\mathbf{U}_{e,s}^{\dagger}(f_i)\Big(\mathbf{Q}_{s}(f_i)\boldsymbol{\Lambda}_{s}(f_i)\mathbf{Q}_{s}^{H}(f_i) + \mathbf{Q}_{n}(f_i)\boldsymbol{\Lambda}_{n}(f_i)\mathbf{Q}_{n}^{H}(f_i)\Big)\big(\mathbf{V}_{e,s}\mathbf{U}_{e,s}^{\dagger}(f_i)\big)^{H} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{H} + \boldsymbol{\Pi}, \tag{33}$$
where
$$\boldsymbol{\Lambda} = \mathbf{L}^{H}\Bigg(\frac{1}{P}\sum_{i=1}^{P}\mathbf{U}_{e,s}^{\dagger}(f_i)\,\mathbf{Q}_{s}(f_i)\boldsymbol{\Lambda}_{s}(f_i)\big(\mathbf{U}_{e,s}^{\dagger}(f_i)\,\mathbf{Q}_{s}(f_i)\big)^{H}\Bigg)\mathbf{L},\qquad \boldsymbol{\Pi} = \mathbf{V}_{e,s}\Bigg(\frac{1}{P}\sum_{i=1}^{P}\mathbf{U}_{e,s}^{\dagger}(f_i)\,\mathbf{Q}_{n}(f_i)\boldsymbol{\Lambda}_{n}(f_i)\big(\mathbf{U}_{e,s}^{\dagger}(f_i)\,\mathbf{Q}_{n}(f_i)\big)^{H}\Bigg)\mathbf{V}_{e,s}^{H},\qquad \mathbf{Q} = \mathbf{V}_{e,s}\,\mathbf{L}. \tag{34}$$
Here, $\boldsymbol{\Lambda} \in \mathbb{C}^{K \times K}$ and $\mathbf{Q} \in \mathbb{C}^{M \times K}$ are the diagonal matrix of eigenvalues and the matrix of eigenvectors of Equation (33) in the signal subspace, and $\mathbf{L} \in \mathbb{C}^{K \times K}$ is unitary by the fact that $\mathbf{Q}$ and $\mathbf{V}_{e,s}$ are matrices with orthonormal columns [46,47]. Remark that $\mathbf{R}(f_i,f_i)$ is also decomposed by EVD: the matrices $\mathbf{Q}_{s}(f_i) \in \mathbb{C}^{M \times K}$ and $\boldsymbol{\Lambda}_{s}(f_i) \in \mathbb{R}^{K \times K}$ are the eigenvectors and the diagonal matrix of eigenvalues in the signal subspace, and likewise $\mathbf{Q}_{n}(f_i) \in \mathbb{C}^{M \times (M-K)}$ and $\boldsymbol{\Lambda}_{n}(f_i) \in \mathbb{R}^{(M-K) \times (M-K)}$ in the noise subspace. Furthermore, considering only the signal subspace by focusing on the $K$ largest eigenvalues $\boldsymbol{\Lambda}$, we can expect that Equation (33) is equivalent to Equation (8):
$$\mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{H} \approx \mathbf{A}(\phi,\theta,f_o)\Bigg(\frac{1}{P}\sum_{i=1}^{P}\mathbf{S}(f_i,f_i)\Bigg)\mathbf{A}^{H}(\phi,\theta,f_o), \tag{35}$$
which can be proved by employing Lemma 1, Equations (12)–(14), and Equations (A4)–(A6) in Appendix A (we omit the proof since the result follows by straightforward substitution). In this state, $\mathbf{T}^{\mathrm{MOP}}(f)$ provides an efficient way to transform any given $f$ into $f_o$ by solving across all frequency bands without loss of generality; the transformation is no longer biased by a pair of distinct frequencies $(f, f_o)$. Furthermore, it is clearly seen that the wideband cross-correlation matrix in Equation (33) is a combination of the narrowband sample cross-correlation matrices across all frequency bins, but its array manifolds are focused on the single reference frequency through $\mathbf{T}^{\mathrm{MOP}}(f)$; it is therefore feasible to estimate the wideband DOAs by employing any recent subspace-based technique for estimating narrowband DOAs [18,20,21,22,23,24,25,26], but using this wideband correlation matrix instead of the narrowband correlation matrix. Practical examples, such as MUSIC and ESPRIT, will be presented in the next section to showcase applicability and effectiveness.
Remarks: In case of the L-shaped array structure in Equation (2), we can repeat the proposed transformation procedure to find the solution for x subarray in Equation (2) and (3); starting from Equation (7) by replacing r t , f with x t , f , the solution for the x subarray can be given by:
T x f i MOP = V x , e s U x , e f i s ,
1 P i = 1 P T x f i MOP R x f i , f i T x f i MOP H = Q x Λ x Q x H + Π x ,
Q x Λ x Q x H A x ϕ , f o 1 P i = 1 P S f i , f i A x H ϕ , f o .
By performing the same procedure, the solution for z subarray is likewise given by replacing x t , f , A x ϕ , f o with z t , f , A z θ , f o and the subscript x with z in Equations (36)–(38);
$$ \mathbf{T}_{z,f_i}^{\mathrm{MOP}} = \mathbf{V}_{z,e}^{s} \left( \mathbf{U}_{z,e,f_i}^{s} \right)^{\dagger}, $$
$$ \frac{1}{P} \sum_{i=1}^{P} \mathbf{T}_{z,f_i}^{\mathrm{MOP}} \mathbf{R}_{z,f_i,f_i} \left( \mathbf{T}_{z,f_i}^{\mathrm{MOP}} \right)^H = \mathbf{Q}_z \boldsymbol{\Lambda}_z \mathbf{Q}_z^H + \boldsymbol{\Pi}_z, $$
$$ \mathbf{Q}_z \boldsymbol{\Lambda}_z \mathbf{Q}_z^H \approx \mathbf{A}_z(\theta,f_o) \left( \frac{1}{P} \sum_{i=1}^{P} \mathbf{S}_{f_i,f_i} \right) \mathbf{A}_z^H(\theta,f_o). $$

3.3.1. DOA Estimation Scheme via MUSIC

MUSIC estimates the DOAs of the sources by locating the peaks of the MUSIC spectrum, exploiting the orthogonality of the signal and noise subspaces [12,48]. Let us define the complementary orthogonal space $\mathbf{I}_M - \mathbf{Q}\mathbf{Q}^H$, which is orthogonal to $\mathbf{A}(\phi,\theta,f_o)$;
$$ \mathbf{a}^H(\phi_k,\theta_k,f_o) \left( \mathbf{I}_M - \mathbf{Q}\mathbf{Q}^H \right) \mathbf{a}(\phi_k,\theta_k,f_o) = 0, $$
for all $k \in \{1, 2, \ldots, K\}$, where $\mathbf{a}(\phi_k,\theta_k,f_o) \in \mathbb{C}^M$ is the $k$th column of $\mathbf{A}(\phi,\theta,f_o)$ as shown in Equation (3). Additionally, the following complementary orthogonal space is also valid;
$$ \mathbf{a}^H(\phi_k,\theta_k,f_o) \left( \mathbf{I}_M - \mathbf{V}_e^s \mathbf{V}_e^{sH} \right) \mathbf{a}(\phi_k,\theta_k,f_o) = 0, $$
by the fact that $\mathbf{Q}\mathbf{Q}^H = \mathbf{V}_e^s \mathbf{L}\mathbf{L}^H \mathbf{V}_e^{sH} = \mathbf{V}_e^s \mathbf{V}_e^{sH}$, which implies that the computational complexity of Equation (33) can be reduced by using only $\mathbf{V}_e^s$ instead of calculating $\mathbf{Q}$. The computationally efficient two-dimensional MUSIC (2D-MUSIC) spectrum is expressed as
$$ p_{\mathrm{2D\text{-}MUSIC}}(\phi,\theta) = \frac{1}{\mathbf{a}^H(\phi,\theta,f_o) \left( \mathbf{I}_M - \mathbf{V}_e^s \mathbf{V}_e^{sH} \right) \mathbf{a}(\phi,\theta,f_o)}. $$
When the denominator in Equation (44) approaches zero at the true angles of the signals, the 2D-MUSIC spectrum exhibits sharp peaks indicating these angles. In the case of the L-shaped array structure, the $x$ and $z$ subarray angles are estimated separately by locating the spectral peaks of the following equations:
$$ p_x^{\mathrm{MUSIC}}(\phi) = \frac{1}{\mathbf{a}_x^H(\phi,f_o) \left( \mathbf{I}_N - \mathbf{V}_{x,e}^s \mathbf{V}_{x,e}^{sH} \right) \mathbf{a}_x(\phi,f_o)}, \quad p_z^{\mathrm{MUSIC}}(\theta) = \frac{1}{\mathbf{a}_z^H(\theta,f_o) \left( \mathbf{I}_N - \mathbf{V}_{z,e}^s \mathbf{V}_{z,e}^{sH} \right) \mathbf{a}_z(\theta,f_o)}, $$
where $\mathbf{a}_x(\phi,f_o), \mathbf{a}_z(\theta,f_o) \in \mathbb{C}^N$ are columns of $\mathbf{A}_x(\phi,f_o)$ and $\mathbf{A}_z(\theta,f_o)$, respectively.

3.3.2. DOA Estimation Scheme via ESPRIT

We start by recalling the array manifolds $\mathbf{A}_x(\phi,f_o)$ and $\mathbf{A}_z(\theta,f_o)$ in Equation (3). ESPRIT takes advantage of the rotational invariance property of the ULA [13], as follows:
$$ \mathbf{A}_{x2}(\phi,f_o) = \mathbf{A}_{x1}(\phi,f_o)\boldsymbol{\Phi}_x, \quad \mathbf{A}_{z2}(\theta,f_o) = \mathbf{A}_{z1}(\theta,f_o)\boldsymbol{\Theta}_z, $$
where
$$ \boldsymbol{\Phi}_x = \mathrm{diag}\left( e^{\alpha_x(\phi_1,f_o)j}, e^{\alpha_x(\phi_2,f_o)j}, \ldots, e^{\alpha_x(\phi_K,f_o)j} \right), \quad \boldsymbol{\Theta}_z = \mathrm{diag}\left( e^{\alpha_z(\theta_1,f_o)j}, e^{\alpha_z(\theta_2,f_o)j}, \ldots, e^{\alpha_z(\theta_K,f_o)j} \right), $$
$\mathbf{A}_{x1}(\phi,f_o), \mathbf{A}_{z1}(\theta,f_o) \in \mathbb{C}^{(N-1) \times K}$ and $\mathbf{A}_{x2}(\phi,f_o), \mathbf{A}_{z2}(\theta,f_o) \in \mathbb{C}^{(N-1) \times K}$ stand for the first and last $N-1$ rows of $\mathbf{A}_x(\phi,f_o)$ and $\mathbf{A}_z(\theta,f_o)$, respectively. Similar to [20,21,26], the matrices $\mathbf{Q}_x, \mathbf{Q}_z$ can be simplified with Equations (3), (36)–(38) and (46), as follows:
$$ \mathbf{Q}_{x1} = \mathbf{A}_{x1}(\phi,f_o)\mathbf{C}_x^{-1}, \quad \mathbf{Q}_{x2} = \mathbf{A}_{x2}(\phi,f_o)\mathbf{C}_x^{-1}, \quad \mathbf{Q}_{z1} = \mathbf{A}_{z1}(\theta,f_o)\mathbf{C}_z^{-1}, \quad \mathbf{Q}_{z2} = \mathbf{A}_{z2}(\theta,f_o)\mathbf{C}_z^{-1}, $$
where $\mathbf{C}_x, \mathbf{C}_z \in \mathbb{C}^{K \times K}$ are invertible matrices, and $\mathbf{Q}_{x1}, \mathbf{Q}_{z1} \in \mathbb{C}^{(N-1) \times K}$ and $\mathbf{Q}_{x2}, \mathbf{Q}_{z2} \in \mathbb{C}^{(N-1) \times K}$ stand for the first and last $N-1$ rows of $\mathbf{Q}_x$ and $\mathbf{Q}_z$, respectively. Considering Equation (48), we can construct new matrices $\boldsymbol{\Gamma}_x, \boldsymbol{\Gamma}_z$ as follows:
$$ \boldsymbol{\Gamma}_x = \mathbf{Q}_{x1}^{\dagger}\mathbf{Q}_{x2} = \mathbf{C}_x\boldsymbol{\Phi}_x\mathbf{C}_x^{-1}, \quad \boldsymbol{\Gamma}_z = \mathbf{Q}_{z1}^{\dagger}\mathbf{Q}_{z2} = \mathbf{C}_z\boldsymbol{\Theta}_z\mathbf{C}_z^{-1}. $$
The angles $\phi_k, \theta_k$ can thus be estimated from the eigenvalues of $\boldsymbol{\Gamma}_x, \boldsymbol{\Gamma}_z$, as follows:
$$ \phi_k = \cos^{-1}\left( \frac{\mathrm{angle}(\lambda_{xk})\,\lambda}{2\pi d} \right), \quad \theta_k = \cos^{-1}\left( \frac{\mathrm{angle}(\lambda_{zk})\,\lambda}{2\pi d} \right), $$
where $\lambda_{xk}, \lambda_{zk} \in \mathbb{C}$ are the $k$th eigenvalues of $\boldsymbol{\Gamma}_x$ and $\boldsymbol{\Gamma}_z$, respectively. Furthermore, it is possible to reduce the computational complexity by using only $\mathbf{V}_e^s$, as in MUSIC;
$$ \left( \mathbf{V}_{x1,e}^s \right)^{\dagger} \mathbf{V}_{x2,e}^s = \mathbf{L}_x\boldsymbol{\Gamma}_x\mathbf{L}_x^{-1} = \left( \mathbf{L}_x\mathbf{C}_x \right)\boldsymbol{\Phi}_x\left( \mathbf{L}_x\mathbf{C}_x \right)^{-1}, \quad \left( \mathbf{V}_{z1,e}^s \right)^{\dagger} \mathbf{V}_{z2,e}^s = \mathbf{L}_z\boldsymbol{\Gamma}_z\mathbf{L}_z^{-1} = \left( \mathbf{L}_z\mathbf{C}_z \right)\boldsymbol{\Theta}_z\left( \mathbf{L}_z\mathbf{C}_z \right)^{-1}, $$
where
$$ \mathbf{V}_{x1,e}^s = \mathbf{A}_{x1}(\phi,f_o)\left( \mathbf{L}_x\mathbf{C}_x \right)^{-1}, \quad \mathbf{V}_{x2,e}^s = \mathbf{A}_{x2}(\phi,f_o)\left( \mathbf{L}_x\mathbf{C}_x \right)^{-1}, \quad \mathbf{V}_{z1,e}^s = \mathbf{A}_{z1}(\theta,f_o)\left( \mathbf{L}_z\mathbf{C}_z \right)^{-1}, \quad \mathbf{V}_{z2,e}^s = \mathbf{A}_{z2}(\theta,f_o)\left( \mathbf{L}_z\mathbf{C}_z \right)^{-1}, $$
and $\mathbf{V}_{x1,e}^s, \mathbf{V}_{z1,e}^s \in \mathbb{C}^{(N-1) \times K}$ and $\mathbf{V}_{x2,e}^s, \mathbf{V}_{z2,e}^s \in \mathbb{C}^{(N-1) \times K}$ stand for the first and last $N-1$ rows of $\mathbf{V}_{x,e}^s$ and $\mathbf{V}_{z,e}^s$, respectively.
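The eigenvalue step above can be condensed into a few lines. The sketch below assumes the steering convention $a_n(\theta) = e^{-j 2\pi f_o d\, n \cos\theta / c}$, so the recovered phase carries a minus sign; the function name and interface are this sketch's own, not the paper's.

```python
import numpy as np

def esprit_ula_angles(V_s, d, fo, c=343.0):
    """Estimate ULA arrival angles (degrees) from a signal-subspace basis.

    V_s: (N, K) matrix whose columns span the signal subspace, e.g. the
    K dominant eigenvectors of the wideband correlation matrix.
    """
    V1, V2 = V_s[:-1, :], V_s[1:, :]      # first and last N-1 rows
    Gamma = np.linalg.pinv(V1) @ V2       # rotational operator
    lam = np.linalg.eigvals(Gamma)        # eigenvalues carry the phase factors
    # invert the phase factor exp(-j*2*pi*fo*d*cos(theta)/c)
    cos_theta = -np.angle(lam) * c / (2 * np.pi * fo * d)
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```

Because any basis of the signal subspace differs from the array manifold only by an invertible factor, the eigenvalues of the rotational operator are unchanged, which is exactly why the reduction to $\mathbf{V}_e^s$ works.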

4. Numerical Simulations

In this section, the performances of fusion methods based on the proposed framework are demonstrated in the following four scenarios: (1) performance of the selected methods and the proposed methods with respect to source types; (2) performance with respect to the number of microphone elements; (3) performance when considering automatic pairing of the x and z subarray angles; and (4) performance under a reverberant environment. Scenarios 1, 2 and 4 estimate the DOAs of the x and z subarray angles separately using the data model in Equation (2), whereas Scenario 3 estimates them simultaneously with automatic pairing, using the data model in Equation (1). We compared the proposed methods with the following methods: IMUSIC [31], TOFS [32], TOPS [33], Squared-TOPS [34] and WS-TOPS [35]. Note that the CSS-based methods are excluded from these tests because the unintended biases caused by their process of DOA preliminary estimation would have to be taken into consideration relative to the other candidate methods, as discussed in the literature [31,32,33,41].
To measure the overall performance of estimating the x and z subarray angles in each scenario, the root-mean-square error (RMSE) and standard deviation (SD) are defined as follows;
$$ \mathrm{RMSE} = \sqrt{ \frac{1}{2JK} \sum_{j=1}^{J} \sum_{k=1}^{K} \left[ \left( \hat{\phi}_k^j - \phi_k \right)^2 + \left( \hat{\theta}_k^j - \theta_k \right)^2 \right] }, $$
$$ \mathrm{SD} = \sqrt{ \frac{1}{2JK} \sum_{j=1}^{J} \sum_{k=1}^{K} \left[ \left( \hat{\phi}_k^j - \bar{\phi}_k \right)^2 + \left( \hat{\theta}_k^j - \bar{\theta}_k \right)^2 \right] }, $$
where $K$ is the number of sources, $J$ is the number of trials, $\hat{\phi}_k^j, \hat{\theta}_k^j$ represent the estimated x and z subarray angles in each trial, $\bar{\phi}_k, \bar{\theta}_k$ represent the averages of the estimated x and z subarray angles, and $\phi_k, \theta_k$ represent the true x and z subarray angles.
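The two error measures above can be computed in a few lines; the helper below follows these definitions, with its name and `(J, K)` array layout being this sketch's own choices.

```python
import numpy as np

def rmse_sd(phi_hat, theta_hat, phi_true, theta_true):
    """RMSE and SD over J trials and K sources, per the definitions above.

    phi_hat, theta_hat: (J, K) arrays of estimates;
    phi_true, theta_true: length-K arrays of true angles.
    """
    J, K = phi_hat.shape
    rmse = np.sqrt(((phi_hat - phi_true) ** 2
                    + (theta_hat - theta_true) ** 2).sum() / (2 * J * K))
    # SD replaces the true angles with the per-source means of the estimates
    sd = np.sqrt(((phi_hat - phi_hat.mean(axis=0)) ** 2
                  + (theta_hat - theta_hat.mean(axis=0)) ** 2).sum() / (2 * J * K))
    return rmse, sd
```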
Computer simulations were carried out in Matlab® R2017a on a PC with Debian GNU/Linux 9.4 x86_64, an Intel® Core i5-4590 CPU at 3.30 GHz, 16 GB RAM, and Intel® Math Kernel Library 11.3.1 for BLAS and LAPACK 3.5.0. Each scenario was repeated 100 times, and the simulation parameters were chosen as follows: the sampling frequency is 48 kHz, the output of each microphone is captured for 1 s, the speed of sound c is 343 m/s, the spacing of microphone elements d is 5 cm, the STFT focusing frequency range is from 0.1 to 16 kHz, and the reference frequency $f_o$ is 3.43 kHz. Note that the true angles were perturbed by adding Gaussian random noise.

4.1. Scenario 1: Performance with Respect to Source Types

Figure 2 and Figure 3 show performance comparisons of the selected methods and the proposed methods in terms of RMSE and SD over a range of SNRs. The proposed methods are the modified MUSIC in Equation (45) and ESPRIT in Equations (50)–(52). The number of microphone elements in each subarray is six, and the three uncorrelated source angles $(\phi_k, \theta_k)$ are placed at (41.41°, 60°), (60°, 45°) and (75.52°, 30°). In Figure 2a and Figure 3a, the sources are human speech. The sources in Figure 2b and Figure 3b are recordings of a piano comprising various monochromatic notes, with a sampling frequency range of up to 48 kHz. Note that none of the sources are stationary signals. The results in Figure 2 and Figure 3 show that the proposed method with ESPRIT can efficiently handle both source types compared to the other candidate methods over acceptable SNR ranges. It is also interesting to take a closer look at 40 dB SNR in Figure 2 and Figure 3, where IMUSIC, TOFS, and the proposed methods with MUSIC and ESPRIT show very low RMSE, attesting to good DOA estimation. When the SNR decreases to 25 dB, IMUSIC and TOFS begin to exhibit worse RMSE, much higher than that of the proposed methods; when the SNR decreases to 10 dB, all tested methods degrade significantly, although the proposed method with ESPRIT still yields more satisfactory results than the other methods. It should be mentioned that IMUSIC and TOFS require the number of sensor elements to be much higher than the number of sources to achieve fairly good results [31,32,33,41]. Hence, the simulation results in Figure 2 and Figure 3 provide evidence that the proposed methods estimate better than the other candidate methods when the incident sources are wideband, non-stationary signals. Although the performance of the proposed method with MUSIC is also degraded by noise, its overall performance is still more effective than that of the other methods.

4.2. Scenario 2: Performance with Respect to the Number of Microphone Elements

Figure 4 and Figure 5 illustrate performance comparisons of the selected methods and the proposed methods in terms of RMSE and SD over a range of SNRs. The three uncorrelated sources are human speech, placed as previously described. Let us start with the case of twelve microphones in Figure 4c and Figure 5c. IMUSIC, TOFS and WS-TOPS exhibited remarkably low RMSE in the SNR range from 15 to 30 dB; this is because their performance depends dramatically on the number of sensor elements relative to the number of sources [31,32,33,41]. Likewise, the proposed methods with MUSIC and ESPRIT also demonstrated very low RMSE, which may imply that the proposed methods, IMUSIC, TOFS and WS-TOPS are all especially effective for wideband DOA estimation. However, a low number of microphone elements should be considered for more practical applications. In the case of eight microphones in each subarray, the performance of the selected methods is limited by the number of microphone elements, as illustrated in Figure 4b and Figure 5b. Furthermore, the performance of the selected methods degrades dramatically when employing four microphones, as illustrated in Figure 4a and Figure 5a. The relevant reason is that undesirable false peaks occur in the spatial spectrum of the selected methods, caused by the perturbation of noise; when the noise power at some frequency is high or greater than the source power, the orthogonality between the noise subspace and the search space at that frequency may not be sufficient to prevent false-alarm peaks [41]. On the contrary, the RMSE performance of the proposed methods is also degraded, but less than that of the other methods, by exploiting the subspace for all frequency bins simultaneously as shown in Section 3. Therefore, the proposed methods provide substantially better RMSE performance than the other methods, which implies that the dependency between the number of microphone elements and the number of sources can be relaxed. This ability is meaningful for many practical applications.

4.3. Scenario 3: Performance with Considering Automatic Pairing

This scenario estimates the DOAs of the x and z subarray angles simultaneously, considering automatic pairing and following the data model in Equation (1). As the L-shaped array structure consists of two ULAs, as illustrated in Figure 1, some research works estimate the DOAs of the x and z subarray angles separately by implementing 1D DOA estimation for each ULA [17,18,19,20,21,22,23,24,25,26]. When more than one source is present, these algorithms require an additional angle-pair-matching procedure to map the relationship between the two independent subarray angles, for instance, finding the corresponding angle pairs by rearranging the alignment of $\mathbf{a}_x(\phi_k, f)$ with a fixed right-hand side of the array manifolds of the z subarray in the sample cross-covariance matrix [52]. It should be noted that a pair-matching procedure may result in performance degradation caused by pair-matching errors. In order to achieve automatic pairing without the pair-matching procedure, we selected the modified 2D-MUSIC in Equation (44) as the proposed method in this scenario. Furthermore, TOPS, Squared-TOPS and WS-TOPS are excluded from these tests because those methods only support the ULA model. Note that a 2D peak-finding algorithm was employed for 2D-IMUSIC, 2D-TOFS and the proposed method. Figure 6 and Figure 7 show performance comparisons of 2D-IMUSIC, 2D-TOFS and the proposed method in terms of RMSE and SD over a range of SNRs, where the number of microphone elements over all subarrays is eight and the three uncorrelated sources are human speech, placed as previously described. Figure 6 indicates that the proposed method with 2D-MUSIC exhibits overall performance very similar to 2D-IMUSIC and 2D-TOFS when the SNR increases beyond 10 dB; however, the computational burden of the proposed method can be significantly lower than those of the other methods, as Section 4.5 will reveal in further detail.

4.4. Scenario 4: Performance under Reverberation Environment

In this scenario, we compared the RMSE and SD performance of the proposed methods to the other methods with respect to reverberation time. This scenario estimates the DOAs of the x and z subarray angles separately using the data model in Equation (2), without considering automatic pairing. The proposed methods in this scenario are the modified MUSIC in Equation (45) and ESPRIT in Equations (50)–(52). The reverberation was simulated by the procedure in [53], and the simulated wall absorption coefficients are shown in Table 3, where the dimensions of the enclosed room are 15 × 15 × 5 m, the measurement protocol for reverberation time is RT60, and the reverberation time ranges from 200 to 1000 ms. The three uncorrelated sources are employed in the same way as previously described, and the number of microphone elements in each subarray is twelve. Figure 8 illustrates performance comparisons of the selected methods and the proposed methods, where the color of the graph in Figure 8a denotes RMSE, whereas the color of the graph in Figure 8b denotes SD estimation performance. The vertical axis represents the reverberation time and the horizontal axis represents a range of SNRs. The simulation results in Figure 8 indicate that reverberation has strong effects on the RMSE and SD performance of both the selected methods and the proposed methods, and performance decreases more significantly at high noise levels and long reverberation times. As the reverberation time decreases, all selected methods begin to demonstrate low RMSE. This means that the trade-off between robustness to reverberation and SNR should be considered carefully in actual applications, for instance, by applying a reverberation cancellation technique or a noise cancellation technique to provide much more reliable RMSE and SD estimation performance. The proposed methods, however, largely outperform the other methods with respect to the reverberation time and the SNR range between 10 and 40 dB without considering this trade-off. This supports that the proposed methods can be especially effective for wideband DOA estimation under a reverberant environment.

4.5. Computational Complexity

The computational complexity of the proposed methods was evaluated using execution-time measurements under a stable environment. We provide a comparison for the following cases: (1) calculating the DOAs of the x and z subarray angles separately, as shown in Figure 9a; and (2) calculating the DOAs of both subarray angles simultaneously, as shown in Figure 9b. Note that the computational burden of the peak-searching algorithm is relevant in this study, where the number of search angles in each subarray is 180. It is apparent from Figure 9 that the computation times of the other methods exhibit higher growth rates than those of the proposed methods. This is because the execution time of the peak-searching algorithm is potentially high, and almost all selected methods require intensive computation to test the orthogonality of the subspace and search space of the narrowband sample cross-correlation matrices for all frequency bins, which results in high computation costs. On the contrary, the proposed methods transform all narrowband sample cross-correlation matrices across all frequency bins into a single matrix, as shown in Equations (33)–(35), and this matrix contains the information of the source cross-correlation matrices across all frequency bins as $\frac{1}{P}\sum_{i=1}^{P}\mathbf{S}_{f_i,f_i}$; in other words, the orthogonality test of subspace and search space can be performed using the wideband cross-correlation matrix in Equations (33)–(35) instead of the narrowband sample cross-correlation matrices for all frequency bins. Therefore, the computational complexity of the proposed methods is remarkably lower than that of the other methods, which is confirmed by the test results in Figure 9.

5. Experimental Results

In this section, experiments were carried out to examine the performance of the proposed methods. The experimental parameters were chosen as in the previous simulations, except as follows: we used human speakers as sources, uttering random sentences. Their speech was recorded over 20 continuous runs, and each recorded signal, approximately 1 min long, was cut into 3 s epochs. The microphone structure followed Figure 1 and Figure 10, and the specifications of the microphones and the recording device are given in Table 4. The experiment was performed in an indoor meeting room whose dimensions are shown in Figure 11; the sound pressure level in the meeting room in a normal situation is 46.6 dBA, and the estimated reverberation time based on RT60 is 219 ms.
Two scenarios are considered: (1) estimating the DOAs of the x and z subarray angles separately, and (2) estimating them simultaneously while considering automatic pairing. In Experiment 1, the proposed methods are the modified MUSIC in Equation (45) and ESPRIT in Equations (50)–(52), compared with the following methods: IMUSIC [31], TOFS [32], TOPS [33], Squared-TOPS [34] and WS-TOPS [35]. In Experiment 2, the proposed method is the modified 2D-MUSIC in Equation (44), compared with 2D-IMUSIC [31] and 2D-TOFS [32].
Table 5 and Table 6 show performance comparisons of the selected methods and the proposed methods in terms of RMSE over a range of source numbers, where Table 5 is for Experiment 1 and Table 6 is for Experiment 2. The boldfaced results highlight the minimum RMSE for each problem. As highlighted in Table 5, IMUSIC exhibited the lowest RMSE when a single source was used, but the other methods, including the proposed methods, also exhibited similarly low RMSE within an acceptable error range. With two sources, the performance of TOPS, Squared-TOPS and WS-TOPS degrades directly, whereas IMUSIC, TOFS and the proposed methods degrade only slightly and still maintain sufficiently good performance. When the number of incident sources increases to three, we clearly see that the performance of IMUSIC, TOFS, TOPS, Squared-TOPS and WS-TOPS is significantly limited by the number of incident sources, because those methods require the number of sensor elements to be much higher than the number of sources to achieve reasonably good results, which can be verified by referring to the simulation results in Section 4 and Figures 4 and 5. The proposed methods, however, are able to estimate the DOAs of three sources effectively and better than the selected methods. The reason is that the proposed methods focus on the subspace across all frequency bins simultaneously instead of focusing on each frequency band individually, as stated in Section 3.2. In the case of Experiment 2 in Table 6, the experimental results indicate that the proposed method with 2D-MUSIC exhibits overall performance very similar to 2D-IMUSIC and 2D-TOFS. As already stated in Section 4.5, the computational complexity of the proposed method is definitely lower than that of 2D-IMUSIC and 2D-TOFS, because those methods check the orthogonality of the subspace and search space of the narrowband sample cross-correlation matrices for all frequency bins, resulting in very high computational requirements. The proposed method tests this orthogonality using the wideband sample cross-correlation matrix in Equation (33) instead, which is sufficient to achieve effects comparable to using the subspaces of the narrowband sample cross-correlation matrices for all frequency bins. In the end, the experimental results in Table 5 and Table 6 provide evidence that the proposed methods have better estimation performance than the other methods with respect to the number of incident sources.
Since the sound source directions in Table 5 and Table 6 are static, moving sound sources must be considered for more practical use. In future work, we will extend the proposed method to moving sound sources and further develop the prototype to support more realistic tasks.

6. Conclusions

An efficient framework for estimating the DOAs of wideband sound sources was presented. The issue of transforming multiple narrowband cross-correlation matrices for all frequency bins into a wideband cross-correlation matrix has been addressed successfully by focusing on the signal subspace for all frequency bins simultaneously, instead of pairing a temporal frequency with a reference frequency as done by the CSS-based methods. A new solution to this problem was given by performing HOGSVD on the array of novel cross-correlation matrices, whose elements in the row and column positions are sample cross-correlation matrices between the received signal and itself at two distinct frequencies. The theoretical analysis showed that the proposed transformation procedure provides the best solution under appropriate constraints and no longer requires any process of DOA preliminary estimation. Subsequently, we provided an alternative way to construct the wideband cross-correlation matrix via the proposed transformation procedure, and wideband DOAs were estimated easily using this wideband matrix together with a single scheme of estimating DOAs from any narrowband subspace method. A major contribution of this paper is that the proposed framework enables cutting-edge studies in recent narrowband subspace methods to estimate the DOAs of wideband sources directly, which reduces computational complexity and facilitates the estimation algorithm. We also presented several examples of using the proposed framework, such as integrating the 2D-MUSIC, MUSIC and ESPRIT methods with L-shaped microphone arrays. Furthermore, the simulation and experimental results showed that the fusion methods based on the proposed framework are especially effective compared to other wideband DOA estimation methods over a range of SNRs with far fewer sensors, under high-noise and reverberant conditions. We believe that the proposed method represents an efficient approach to wideband DOA estimation and could improve wideband DOA estimates not only in acoustic signal processing but also in other related fields.

Author Contributions

B.S. conceived of the hypothesis, provided the mathematical proof, designed and performed the experiments and wrote the manuscript as part of a PhD project. M.F. supervises the project and contributed to the development of the ideas.

Funding

This work was supported by JSPS KAKENHI Grant Number JP18K12111 and MEXT Grant Number 91506000972.

Acknowledgments

The authors are grateful to Kochi University of Technology for Monthly Support via a grant of Special Scholarship Program over a period of three years.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

This appendix provides a detailed derivation of Theorem 1. We begin by considering the cross-correlation matrices in Equation (7). $\mathbf{R}_{f_o,f_o}$ can be written in EVD form as
$$ \mathbf{R}_{f_o,f_o} = \mathbf{Q}_{f_o}^s \boldsymbol{\Lambda}_{f_o}^s \mathbf{Q}_{f_o}^{sH} + \mathbf{Q}_{f_o}^n \boldsymbol{\Lambda}_{f_o}^n \mathbf{Q}_{f_o}^{nH}, $$
where $\mathbf{Q}_{f_o}^s \in \mathbb{C}^{M \times K}$ and $\boldsymbol{\Lambda}_{f_o}^s \in \mathbb{R}^{K \times K}$ are the matrix of eigenvectors and the diagonal matrix of eigenvalues in the signal subspace, and likewise, $\mathbf{Q}_{f_o}^n \in \mathbb{C}^{M \times (M-K)}$ and $\boldsymbol{\Lambda}_{f_o}^n \in \mathbb{R}^{(M-K) \times (M-K)}$ correspond to the noise subspace. In the case of $\mathbf{R}_{f,f_o}$, it can be derived by performing SVD, which directly follows from Equation (10). Since $\mathbf{A}(\phi,\theta,f)$ and $\mathbf{A}(\phi,\theta,f_o)$ are full-rank matrices [36], the remaining components are expressed as follows [48]:
$$ \mathbf{U}_{f,f_o}^s = \mathbf{A}(\phi,\theta,f)\mathbf{F}_{f,f_o}^{-1}, \quad \mathbf{V}_{f,f_o}^s = \mathbf{A}(\phi,\theta,f_o)\mathbf{G}_{f,f_o}^{-1}, \quad \mathbf{Q}_{f_o}^s = \mathbf{A}(\phi,\theta,f_o)\mathbf{H}_{f_o}^{-1}, $$
where
$$ \boldsymbol{\Sigma}_{f,f_o}^s = \mathbf{F}_{f,f_o}\mathbf{S}_{f,f_o}\mathbf{G}_{f,f_o}^H, \quad \boldsymbol{\Lambda}_{f_o}^s = \mathbf{H}_{f_o}\mathbf{S}_{f_o,f_o}\mathbf{H}_{f_o}^H, $$
$\mathbf{F}_{f,f_o}, \mathbf{G}_{f,f_o}, \mathbf{H}_{f_o} \in \mathbb{C}^{K \times K}$ are also full rank and invertible. Note again that $\mathbf{U}_{f,f_o}^s, \mathbf{V}_{f,f_o}^s, \mathbf{Q}_{f_o}^s$ have orthonormal columns [46]; hence, it is obvious that
$$ \mathbf{Q}_{f_o}^s = \mathbf{V}_{f,f_o}^s \mathbf{G}_{f,f_o} \mathbf{H}_{f_o}^{-1}, $$
$$ \mathbf{H}_{f_o}^H \mathbf{H}_{f_o} = \mathbf{A}^H(\phi,\theta,f_o)\mathbf{A}(\phi,\theta,f_o) = \mathbf{G}_{f,f_o}^H \mathbf{G}_{f,f_o}. $$
From Equation (A4), we may expect that $\mathbf{G}_{f,f_o}$ and $\mathbf{H}_{f_o}$ have the unitary property, but this is incorrect when considering Equation (A5). Therefore, the product $\mathbf{G}_{f,f_o}\mathbf{H}_{f_o}^{-1}$ has to be the identity;
$$ \mathbf{G}_{f,f_o}\mathbf{H}_{f_o}^{-1} = \mathbf{V}_{f,f_o}^{sH}\mathbf{Q}_{f_o}^s = \mathbf{I}_K. $$
When considering only the signal subspace, it can be seen from Equations (A4)–(A6) that the right singular vectors of $\mathbf{R}_{f,f_o}$ and the eigenvectors of $\mathbf{R}_{f_o,f_o}$ are identical.
Next, we generalize the objective function in Equation (15) by utilizing the Orthogonal Procrustes (OP) problem [54], but with some modification (MOP). The objective function in Equation (15) is rederived as
$$ \left\| \mathbf{R}_{f_o,f_o} - \mathbf{T}_f\mathbf{R}_{f,f_o} \right\|_F^2 = \mathrm{tr}\left( \mathbf{R}_{f_o,f_o}\mathbf{R}_{f_o,f_o}^H \right) + \mathrm{tr}\left( \mathbf{T}_f\mathbf{R}_{f,f_o}\mathbf{R}_{f,f_o}^H\mathbf{T}_f^H \right) - 2\,\Re\,\mathrm{tr}\left( \mathbf{T}_f\mathbf{R}_{f,f_o}\mathbf{R}_{f_o,f_o}^H \right), $$
where $\Re(a)$ returns the real part of the variable $a$, and $\mathrm{tr}(\mathbf{A})$ denotes the trace of the square matrix $\mathbf{A}$. Considering each expression in Equation (A7), and noting that the trace of a product of two square matrices is independent of their order, we have
$$ \mathrm{tr}\left( \mathbf{R}_{f_o,f_o}\mathbf{R}_{f_o,f_o}^H \right) = \mathrm{tr}\left( \boldsymbol{\Lambda}_{f_o}^s\boldsymbol{\Lambda}_{f_o}^s \right) + \mathrm{tr}\left( \boldsymbol{\Lambda}_{f_o}^n\boldsymbol{\Lambda}_{f_o}^n \right). $$
Next, employing Lemma 1, we have
$$ \mathrm{tr}\left( \mathbf{T}_f\mathbf{R}_{f,f_o}\mathbf{R}_{f,f_o}^H\mathbf{T}_f^H \right) = \mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Sigma}_{f,f_o}^s \right) + \mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^n\boldsymbol{\Sigma}_{f,f_o}^n\mathbf{U}_{f,f_o}^{nH}\mathbf{T}_f^H\mathbf{T}_f\mathbf{U}_{f,f_o}^n \right). $$
From Equation (A6), we finally have
$$ \Re\,\mathrm{tr}\left( \mathbf{T}_f\mathbf{R}_{f,f_o}\mathbf{R}_{f_o,f_o}^H \right) = \Re\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Lambda}_{f_o}^s\mathbf{Q}_{f_o}^{sH}\mathbf{T}_f\mathbf{U}_{f,f_o}^s \right) + \Re\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^n\boldsymbol{\Lambda}_{f_o}^n\mathbf{Q}_{f_o}^{nH}\mathbf{T}_f\mathbf{U}_{f,f_o}^n \right). $$
Substituting Equations (A8)–(A10) into Equation (A7), the objective function is simplified as
$$ \left\| \mathbf{R}_{f_o,f_o} - \mathbf{T}_f\mathbf{R}_{f,f_o} \right\|_F^2 = \mathrm{tr}\left( \boldsymbol{\Lambda}_{f_o}^s\boldsymbol{\Lambda}_{f_o}^s \right) + \mathrm{tr}\left( \boldsymbol{\Lambda}_{f_o}^n\boldsymbol{\Lambda}_{f_o}^n \right) + \mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Sigma}_{f,f_o}^s \right) + \mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^n\boldsymbol{\Sigma}_{f,f_o}^n\mathbf{U}_{f,f_o}^{nH}\mathbf{T}_f^H\mathbf{T}_f\mathbf{U}_{f,f_o}^n \right) - 2\,\Re\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Lambda}_{f_o}^s\mathbf{Q}_{f_o}^{sH}\mathbf{T}_f\mathbf{U}_{f,f_o}^s \right) - 2\,\Re\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^n\boldsymbol{\Lambda}_{f_o}^n\mathbf{Q}_{f_o}^{nH}\mathbf{T}_f\mathbf{U}_{f,f_o}^n \right). $$
The three expressions $\mathrm{tr}(\boldsymbol{\Lambda}_{f_o}^s\boldsymbol{\Lambda}_{f_o}^s)$, $\mathrm{tr}(\boldsymbol{\Lambda}_{f_o}^n\boldsymbol{\Lambda}_{f_o}^n)$ and $\mathrm{tr}(\boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Sigma}_{f,f_o}^s)$ are completely independent of $\mathbf{T}_f$. Therefore, the optimization problem is redefined as
$$ \begin{aligned} \underset{\mathbf{T}_f}{\text{minimize}} \quad & \mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^n\boldsymbol{\Sigma}_{f,f_o}^n\mathbf{U}_{f,f_o}^{nH}\mathbf{T}_f^H\mathbf{T}_f\mathbf{U}_{f,f_o}^n \right) - 2\,\Re\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Lambda}_{f_o}^s\mathbf{Q}_{f_o}^{sH}\mathbf{T}_f\mathbf{U}_{f,f_o}^s \right) - 2\,\Re\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^n\boldsymbol{\Lambda}_{f_o}^n\mathbf{Q}_{f_o}^{nH}\mathbf{T}_f\mathbf{U}_{f,f_o}^n \right) \\ \text{subject to} \quad & \sum_{k=1}^{K} \sigma_k^2\left( \mathbf{T}_f\mathbf{R}_{f,f_o} \right) = \sum_{k=1}^{K} \sigma_k^2\left( \mathbf{R}_{f,f_o} \right). \end{aligned} $$
There are now two possible cases to consider. The first case is when the $M-K$ smallest singular values of $\mathbf{R}_{f,f_o}$ are close to zero; the other is when some of the $M-K$ smallest singular values of $\mathbf{R}_{f,f_o}$ are greater than zero.
Case 1: Assuming that all of the $M-K$ smallest singular values of $\mathbf{R}_{f,f_o}$ are close to zero, we have
$$ \begin{aligned} \underset{\mathbf{T}_f}{\text{maximize}} \quad & 2\,\Re\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Lambda}_{f_o}^s\mathbf{Q}_{f_o}^{sH}\mathbf{T}_f\mathbf{U}_{f,f_o}^s \right) \\ \text{subject to} \quad & \sum_{k=1}^{K} \sigma_k^2\left( \mathbf{T}_f\mathbf{R}_{f,f_o} \right) = \sum_{k=1}^{K} \sigma_k^2\left( \mathbf{R}_{f,f_o} \right), \quad \sum_{m=K+1}^{M} \sigma_m^2\left( \mathbf{R}_{f,f_o} \right) = 0. \end{aligned} $$
Using the proposition of Equation (A6) and employing Lemma 1, two possible solutions reaching the maximum of Equation (A13) can be found. The first solution is given by
$$ \mathbf{T}_f^{\mathrm{MOP}} = \mathbf{Q}_{f_o}^s \boldsymbol{\Omega}_f^{\mathrm{MOP}} \mathbf{U}_{f,f_o}^{s\dagger}, $$
where $\mathbf{U}_{f,f_o}^s$ is not required to have orthonormal columns, and the second solution is given by
$$ \mathbf{T}_f^{\mathrm{OP}} = \mathbf{Q}_{f_o}^s \boldsymbol{\Omega}_f^{\mathrm{OP}} \mathbf{U}_{f,f_o}^{sH}, $$
where $\mathbf{U}_{f,f_o}^s$ has orthonormal columns. Note that the superscript $\dagger$ denotes the pseudo-inverse. When the constraints in Equation (A13) are imposed on Equations (A15) and (A14), we obtain $\boldsymbol{\Omega}_f^{\mathrm{MOP}} = \mathbf{I}_K$ and $\boldsymbol{\Omega}_f^{\mathrm{OP}} = \mathbf{I}_K$, and the maximum is achieved;
$$ \bar{\varrho}_f^{\,\text{case 1}} = 2\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Lambda}_{f_o}^s \right). $$
Case 2: Assuming that some of the $M-K$ smallest singular values of $\mathbf{R}_{f,f_o}$ are greater than zero, the best solution of Equation (A12) is again given by Equation (A15), and its minimum corresponds to Equation (A16);
$$ \underline{\varrho}_f^{\,\text{case 2: OP}} = -2\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Lambda}_{f_o}^s \right). $$
On the contrary, when using Equation (A14) in Equation (A12), the minimum of the cost function remains
$$ \underline{\varrho}_f^{\,\text{case 2: MOP}} = \mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^n\boldsymbol{\Sigma}_{f,f_o}^n \left( \mathbf{U}_{f,f_o}^{s\dagger}\mathbf{U}_{f,f_o}^n \right)^H \left( \mathbf{U}_{f,f_o}^{s\dagger}\mathbf{U}_{f,f_o}^n \right) \right) - 2\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Lambda}_{f_o}^s \right). $$
Using the solution of Equation (A14) rather than Equation (A15) allows us to relax the error constraint in the hope of reducing the computation of the HOGSVD (for details, see Section 3.2), but this is still sufficient for estimating $\mathbf{T}_f$ without loss of generality; the squares of the $M-K$ smallest singular values of $\mathbf{R}_{f,f_o}$ are very close to zero, so we can assume that $\boldsymbol{\Sigma}_{f,f_o}^{n2} \approx \mathbf{O}_{(M-K) \times (M-K)}$. Note that the error of the transformation remains consistent with the following equation;
$$ \varepsilon_f^{\mathrm{MOP}} = 2\,\mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Lambda}_{f_o}^s \left( \boldsymbol{\Omega}_f - \mathbf{I}_K \right) \right) + \mathrm{tr}\left( \boldsymbol{\Sigma}_{f,f_o}^{n2} \left( \mathbf{U}_{f,f_o}^{s\dagger}\mathbf{U}_{f,f_o}^n \right)^H \left( \mathbf{U}_{f,f_o}^{s\dagger}\mathbf{U}_{f,f_o}^n \right) \right). $$
To further reduce the computational burden caused by performing the SVD of $\mathbf{R}_{f,f_o}$ and the EVD of $\mathbf{R}_{f_o,f_o}$, we reinitialize the cross-correlation matrix as
$$ \mathbf{R}_{f,f_o}\mathbf{R}_{f_o,f_o}^H = \left( \mathbf{U}_{f,f_o}^s\boldsymbol{\Sigma}_{f,f_o}^s\mathbf{V}_{f,f_o}^{sH} + \mathbf{U}_{f,f_o}^n\boldsymbol{\Sigma}_{f,f_o}^n\mathbf{V}_{f,f_o}^{nH} \right) \left( \mathbf{Q}_{f_o}^s\boldsymbol{\Lambda}_{f_o}^s\mathbf{Q}_{f_o}^{sH} + \mathbf{Q}_{f_o}^n\boldsymbol{\Lambda}_{f_o}^n\mathbf{Q}_{f_o}^{nH} \right)^H = \mathbf{U}_{f,f_o}^s\boldsymbol{\Sigma}_{f,f_o}^s\boldsymbol{\Lambda}_{f_o}^s\mathbf{Q}_{f_o}^{sH} + \mathbf{U}_{f,f_o}^n\boldsymbol{\Sigma}_{f,f_o}^n\boldsymbol{\Lambda}_{f_o}^n\mathbf{Q}_{f_o}^{nH}, $$
which makes it possible to reduce the computation by performing a single SVD operation on $\mathbf{R}_{f,f_o}\mathbf{R}_{f_o,f_o}^H$.
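This shortcut can be sanity-checked numerically: for noise-free rank-$K$ correlation matrices built from random full-rank manifolds, the dominant left singular subspace of $\mathbf{R}_{f,f_o}\mathbf{R}_{f_o,f_o}^H$ coincides with that of $\mathbf{R}_{f,f_o}$. The snippet below is an illustration under these assumptions, not part of the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 6, 2
# random full-rank stand-ins for the manifolds at frequencies f and f_o
A_f = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
A_fo = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
S = np.diag(rng.uniform(1.0, 2.0, K)).astype(complex)  # source powers

R_f_fo = A_f @ S @ A_fo.conj().T        # noise-free cross-correlation
R_fo_fo = A_fo @ S @ A_fo.conj().T      # noise-free auto-correlation

U1 = np.linalg.svd(R_f_fo)[0]
U2 = np.linalg.svd(R_f_fo @ R_fo_fo.conj().T)[0]

# projectors onto the K-dimensional dominant left subspaces agree
P1 = U1[:, :K] @ U1[:, :K].conj().T
P2 = U2[:, :K] @ U2[:, :K].conj().T
assert np.allclose(P1, P2, atol=1e-8)
```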

References

  1. Haykin, S.; Liu, K.R. Handbook on Array Processing and Sensor Networks; Wiley: Hoboken, NJ, USA, 2010.
  2. Zekavat, R.; Buehrer, R.M. Handbook of Position Location: Theory, Practice and Advances, 1st ed.; Wiley: Hoboken, NJ, USA, 2011.
  3. Song, K.; Liu, Q.; Wang, Q. Olfaction and Hearing Based Mobile Robot Navigation for Odor/Sound Source Search. Sensors 2011, 11, 2129–2154.
  4. Velasco, J.; Pizarro, D.; Macias-Guarasa, J. Source Localization with Acoustic Sensor Arrays Using Generative Model Based Fitting with Sparse Constraints. Sensors 2012, 12, 13781–13812.
  5. Tiete, J.; Domínguez, F.; Silva, B.D.; Segers, L.; Steenhaut, K.; Touhafi, A. SoundCompass: A Distributed MEMS Microphone Array-Based Sensor for Sound Source Localization. Sensors 2014, 14, 1918–1949.
  6. Clark, B.; Flint, J.A. Acoustical Direction Finding with Time-Modulated Arrays. Sensors 2016, 16, 2107.
  7. Hoshiba, K.; Washizaki, K.; Wakabayashi, M.; Ishiki, T.; Kumon, M.; Bando, Y.; Gabriel, D.; Nakadai, K.; Okuno, H.G. Design of UAV-Embedded Microphone Array System for Sound Source Localization in Outdoor Environments. Sensors 2017, 17, 2535.
  8. Liu, H.; Li, B.; Yuan, X.; Zhou, Q.; Huang, J. A Robust Real Time Direction-of-Arrival Estimation Method for Sequential Movement Events of Vehicles. Sensors 2018, 18, 992.
  9. Knapp, C.; Carter, G. The generalized correlation method for estimation of time delay. IEEE Trans. Acoust. Speech Signal Process. 1976, 24, 320–327.
  10. Sawada, H.; Mukai, R.; Araki, S.; Makino, S. A robust and precise method for solving the permutation problem of frequency-domain blind source separation. IEEE Trans. Speech Audio Process. 2004, 12, 530–538.
  11. Yokoi, K.; Hamada, N. ICA-Based Separation and DOA Estimation of Analog Modulated Signals in Multipath Environment. IEICE Trans. Commun. 2005, 88-B, 4246–4249.
  12. Schmidt, R. Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propag. 1986, 34, 276–280.
  13. Roy, R.; Kailath, T. ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Trans. Acoust. Speech Signal Process. 1989, 37, 984–995.
  14. Marcos, S.; Marsal, A.; Benidir, M. Performances analysis of the propagator method for source bearing estimation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-94), Adelaide, SA, Australia, 19–22 April 1994; pp. IV/237–IV/240.
  15. Marcos, S.; Marsal, A.; Benidir, M. The propagator method for source bearing estimation. Signal Process. 1995, 42, 121–138.
  16. Hua, Y.; Sarkar, T.K.; Weiner, D.D. An L-shaped array for estimating 2-D directions of wave arrival. IEEE Trans. Antennas Propag. 1991, 39, 143–146.
  17. Porozantzidou, M.G.; Chryssomallis, M.T. Azimuth and elevation angles estimation using 2-D MUSIC algorithm with an L-shape antenna. In Proceedings of the 2010 IEEE Antennas and Propagation Society International Symposium, Toronto, ON, Canada, 11–17 July 2010; pp. 1–4.
  18. Wang, G.; Xin, J.; Zheng, N.; Sano, A. Computationally Efficient Subspace-Based Method for Two-Dimensional Direction Estimation With L-Shaped Array. IEEE Trans. Signal Process. 2011, 59, 3197–3212.
  19. Nie, X.; Wei, P. Array Aperture Extension Algorithm for 2-D DOA Estimation with L-Shaped Array. Progress Electromagn. Res. Lett. 2015, 52, 63–69.
  20. Tayem, N. Azimuth/Elevation Directional Finding with Automatic Pair Matching. Int. J. Antennas Propag. 2016, 2016, 5063450.
  21. Wang, Q.; Yang, H.; Chen, H.; Dong, Y.; Wang, L. A Low-Complexity Method for Two-Dimensional Direction-of-Arrival Estimation Using an L-Shaped Array. Sensors 2017, 17, 190.
  22. Li, J.; Jiang, D. Joint Elevation and Azimuth Angles Estimation for L-Shaped Array. IEEE Antennas Wirel. Propag. Lett. 2017, 16, 453–456.
  23. Dong, Y.Y.; Chang, X. Computationally Efficient 2D DOA Estimation for L-Shaped Array with Unknown Mutual Coupling. Math. Probl. Eng. 2018, 2018, 1–9.
  24. Hsu, K.C.; Kiang, J.F. Joint Estimation of DOA and Frequency of Multiple Sources with Orthogonal Coprime Arrays. Sensors 2019, 19, 335. [Google Scholar] [CrossRef]
  25. Wu, T.; Deng, Z.; Li, Y.; Li, Z.; Huang, Y. Estimation of Two-Dimensional Non-Symmetric Incoherently Distributed Source with L-Shape Arrays. Sensors 2019, 19, 1226. [Google Scholar] [CrossRef] [PubMed]
  26. Gao, X.; Hao, X.; Li, P.; Li, G. An Improved Two-Dimensional Direction-of-Arrival Estimation Algorithm for L-Shaped Nested Arrays with Small Sample Sizes. Sensors 2019, 19, 2176. [Google Scholar] [CrossRef] [PubMed]
  27. Omer, M.; Quadeer, A.A.; Al-Naffouri, T.Y.; Sharawi, M.S. An L-shaped microphone array configuration for impulsive acoustic source localization in 2-D using orthogonal clustering based time delay estimation. In Proceedings of the 1st International Conference on Communications, Signal Processing, and their Applications (ICCSPA), Sharjah, UAE, 12–14 February 2013; pp. 1–6. [Google Scholar] [CrossRef]
  28. Wajid, M.; Kumar, A.; Bahl, R. Direction-of-arrival estimation algorithms using single acoustic vector-sensor. In Proceedings of the International Conference on Multimedia, Signal Processing and Communication Technologies (IMPACT), Aligarh, India, 24–26 November 2017; pp. 84–88. [Google Scholar] [CrossRef]
  29. Sugimoyo, Y.; Miyabe, S.; Yamada, T.; Makino, S.; Juang, B.H. An Extension of MUSIC Exploiting Higher-Order Moments via Nonlinear Mapping. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2016, E99.A, 1152–1162. [Google Scholar] [CrossRef] [Green Version]
  30. Suksiri, B.; Fukumoto, M. Multiple Frequency and Source Angle Estimation by Gaussian Mixture Model with Modified Microphone Array Data Model. J. Signal Process. 2017, 21, 163–166. [Google Scholar] [CrossRef] [Green Version]
  31. Su, G.; Morf, M. The signal subspace approach for multiple wide-band emitter location. IEEE Trans. Acoust. Speech Signal Process. 1983, 31, 1502–1522. [Google Scholar] [CrossRef]
  32. Yu, H.; Liu, J.; Huang, Z.; Zhou, Y.; Xu, X. A New Method for Wideband DOA Estimation. In Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing, Shanghai, China, 21–25 September 2007; pp. 598–601. [Google Scholar] [CrossRef]
  33. Yoon, Y.S.; Kaplan, L.M.; McClellan, J.H. TOPS: New DOA estimator for wideband signals. IEEE Trans. Signal Process. 2006, 54, 1977–1989. [Google Scholar] [CrossRef]
  34. Okane, K.; Ohtsuki, T. Resolution Improvement of Wideband Direction-of-Arrival Estimation “Squared-TOPS”. In Proceedings of the IEEE International Conference on Communications, Cape Town, South Africa, 23–27 May 2010; pp. 1–5. [Google Scholar] [CrossRef]
  35. Hirotaka, H.; Tomoaki, O. DOA estimation for wideband signals based on weighted Squared TOPS. EURASIP J. Wirel. Commun. Netw. 2016, 2016, 243. [Google Scholar] [CrossRef] [Green Version]
  36. Wang, H.; Kaveh, M. Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources. IEEE Trans. Acoust. Speech Signal Process. 1985, 33, 823–831. [Google Scholar] [CrossRef] [Green Version]
  37. Hung, H.; Kaveh, M. Focussing matrices for coherent signal-subspace processing. IEEE Trans. Acoust. Speech Signal Process. 1988, 36, 1272–1281. [Google Scholar] [CrossRef]
  38. Valaee, S.; Kabal, P. Wideband array processing using a two-sided correlation transformation. IEEE Trans. Signal Process. 1995, 43, 160–172. [Google Scholar] [CrossRef] [Green Version]
  39. Valaee, S.; Champagne, B.; Kabal, P. Localization of wideband signals using least-squares and total least-squares approaches. IEEE Trans. Signal Process. 1999, 47, 1213–1222. [Google Scholar] [CrossRef]
  40. Suksiri, B.; Fukumoto, M. A Computationally Efficient Wideband Direction-of-Arrival Estimation Method for L-Shaped Microphone Arrays. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30 May 2018; pp. 1–5. [Google Scholar] [CrossRef]
  41. Abdelbari, A. Direction of Arrival Estimation of Wideband RF Sources. Ph.D. Thesis, Near East University, Nicosia, Cyprus, 2018. [Google Scholar] [CrossRef]
  42. Ponnapalli, S.P.; Saunders, M.A.; Van Loan, C.F.; Alter, O. A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms. PLoS ONE 2011, 6, 1–11. [Google Scholar] [CrossRef] [PubMed]
  43. Xin, J.; Zheng, N.; Sano, A. Simple and Efficient Nonparametric Method for Estimating the Number of Signals Without Eigendecomposition. IEEE Trans. Signal Process. 2007, 55, 1405–1420. [Google Scholar] [CrossRef]
  44. Nadler, B. Nonparametric Detection of Signals by Information Theoretic Criteria: Performance Analysis and an Improved Estimator. IEEE Trans. Signal Process. 2010, 58, 2746–2756. [Google Scholar] [CrossRef] [Green Version]
  45. Diehl, R. Acoustic and auditory phonetics: The adaptive design of speech sound systems. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 2008, 363, 965–978. [Google Scholar] [CrossRef] [PubMed]
  46. Van Der Veen, A.; Deprettere, E.F.; Swindlehurst, A.L. Subspace-based signal analysis using singular value decomposition. Proc. IEEE 1993, 81, 1277–1308. [Google Scholar] [CrossRef] [Green Version]
  47. Hogben, L. Discrete Mathematics and Its Applications. In Handbook of Linear Algebra; CRC Press: Boca Raton, FL, USA, 2006. [Google Scholar]
  48. Naidu, P. Sensor Array Signal Processing; Taylor & Francis: Abingdon-on-Thames, UK, 2000. [Google Scholar]
  49. Meyer, C.D. (Ed.) Matrix Analysis and Applied Linear Algebra; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2000. [Google Scholar]
  50. Horn, R.A.; Johnson, C.R. Matrix Analysis, 2nd ed.; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar] [CrossRef]
  51. Van Loan, C.F. Structured Matrix Computations from Structured Tensors: Lecture 6. In The Higher-Order Generalized Singular Value Decomposition; Cornell University: Ithaca, NY, USA, 2015. [Google Scholar]
  52. Wei, Y.; Guo, X. Pair-Matching Method by Signal Covariance Matrices for 2D-DOA Estimation. IEEE Antennas Wirel. Propag. Lett. 2014, 13, 1199–1202. [Google Scholar] [CrossRef]
  53. Lehmanna, E.A.; Johansson, A.M. Prediction of energy decay in room impulse responses simulated with an image-source model. J. Acoust. Soc. Am. 2008, 124, 269–277. [Google Scholar] [CrossRef] [PubMed]
  54. Gower, J.C.; Dijksterhuis, G.B. Procrustes Problems, 1st ed.; Oxford University Press: Oxford, UK, 2004. [Google Scholar] [CrossRef]
Figure 1. L-shaped microphone array configuration for 2D DOA estimation.
Figure 2. RMSE estimation performance versus SNR in Scenario 1: (a) three different human speeches, and (b) three uncorrelated musical sounds, where six microphones are employed in each subarray.
Figure 3. SD estimation performance versus SNR in Scenario 1: (a) three different human speeches, and (b) three uncorrelated musical sounds, where six microphones are employed in each subarray.
Figure 4. RMSE estimation performance versus SNR in Scenario 2; three human speeches are employed, and the number of microphone elements in each subarray is (a) N = 4 , (b) N = 8 , and (c) N = 12 .
Figure 5. SD estimation performance versus SNR in Scenario 2; three human speeches are employed, and the number of microphone elements in each subarray is (a) N = 4 , (b) N = 8 , and (c) N = 12 .
Figure 6. RMSE estimation performance versus SNR in Scenario 3, where M = 8 .
Figure 7. SD estimation performance versus SNR in Scenario 3, where M = 8 .
Figure 8. Performance evaluations of Scenario 4: (a) RMSE estimation performance versus SNR, and (b) SD estimation performance versus SNR, where three uncorrelated human speeches are employed in a reverberant environment. The reverberation was simulated by the procedure of [53]: the enclosed room measures 15 × 15 × 5 m, the reverberation time follows the RT60 measurement protocol, and the wall absorption coefficients are listed in Table 3.
Figure 9. Computational complexities when changing (a) the number of microphone elements N in each subarray, and (b) the number of microphone elements M over all subarrays, where the number of incident sources is K = 3 .
Figure 10. Photograph of the microphone array system.
Figure 11. Photograph of the experimental environment, floor plan and the room dimensions.
Table 1. Command used in HOGSVD.
Command Name | Command Counts: HOGSVD in Equation (20) | Command Counts: Optimized HOGSVD in Equation (32)
Matrix Addition/Subtraction | P(P − 1) − 1 | P − 1
Element-wise Multiplication | 1 | 0
Matrix Multiplication | 3P(P − 1) | P + {1}
Matrix Inversion | P(P − 1) | P
QR Decomposition | 0 | 1
Eigenvalue Decomposition (EVD) | 1 | 1
Remark: {·} is caused by a matrix multiplication of R_ς Z_ς.
Table 2. Computational complexities.
Command Name | Complex Floating-Point Operations per Command
Matrix Addition/Subtraction | M^2
Element-wise Multiplication | M^2
Matrix Multiplication | 2M^3 − M^2
Matrix Inversion (Gauss–Jordan elimination) | (2/3)M^3 + (3/2)M^2 − (7/6)M
QR Decomposition (Householder transformation) | (2P − 2/3)M^3 − 2PM^2 + (2/3)M
HOGSVD in Equation (20) without counting EVD | (20P(P − 1)/3)M^3 − (P(P − 1)/2)M^2 − (7P(P − 1)/6)M
Optimized HOGSVD in Equation (32) without counting EVD | ((14P + 4)/3)M^3 − ((P + 4)/2)M^2 − ((7P − 4)/6)M
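The flop-count formulas in Table 2 can be evaluated directly to compare the standard and optimized HOGSVD for a given number of frequency bins P and microphones M. The sketch below is only illustrative: the leading coefficients follow the table, while the signs of the lower-order terms were reconstructed from the per-command counts, and the example values of P and M are hypothetical.

```python
def flops_hogsvd(P, M):
    """Flops of the HOGSVD in Equation (20), excluding the final EVD."""
    return (20 * P * (P - 1) / 3) * M**3 \
        - (P * (P - 1) / 2) * M**2 \
        - (7 * P * (P - 1) / 6) * M

def flops_hogsvd_optimized(P, M):
    """Flops of the optimized HOGSVD in Equation (32), excluding the final EVD."""
    return ((14 * P + 4) / 3) * M**3 \
        - ((P + 4) / 2) * M**2 \
        - ((7 * P - 4) / 6) * M

if __name__ == "__main__":
    P, M = 64, 12  # e.g., 64 frequency bins, 12 microphones (hypothetical)
    std = flops_hogsvd(P, M)
    opt = flops_hogsvd_optimized(P, M)
    print(f"standard:  {std:,.0f} flops")
    print(f"optimized: {opt:,.0f} flops")
    print(f"speed-up:  {std / opt:.1f}x")
```

Note that the leading-order ratio is 20P(P − 1)/(14P + 4), so the optimized decomposition reduces the cost from quadratic to linear growth in the number of frequency bins P.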
Table 3. Wall absorption coefficients at various reverberation time in Scenario 4 [53].
Reverberation Time based on RT60 (Millisecond) | Positive yz | Positive xz | Positive xy | Negative yz | Negative xz | Negative xy
200 | 0.7236 | 0.2021 | 0.6844 | 0.0792 | 0.2436 | 0.5586
300 | 0.7142 | 0.1687 | 0.7666 | 0.2650 | 0.2387 | 0.7043
400 | 0.7306 | 0.0555 | 0.7731 | 0.4091 | 0.8493 | 0.8587
500 | 0.5064 | 0.4974 | 0.8248 | 0.4189 | 0.8069 | 0.7572
600 | 0.6074 | 0.6299 | 0.8028 | 0.7599 | 0.6373 | 0.8209
700 | 0.7442 | 0.7624 | 0.8734 | 0.6922 | 0.6480 | 0.7893
800 | 0.6779 | 0.6827 | 0.7865 | 0.8045 | 0.8386 | 0.8430
900 | 0.6992 | 0.7111 | 0.7741 | 0.8752 | 0.8233 | 0.9081
1000 | 0.7622 | 0.7707 | 0.9394 | 0.8248 | 0.8192 | 0.8398
Table 4. System specification.
Hardware Type/Parameter | Specification/Value
Audio Interface | Roland® Octa-capture (UA-1010)
Sampling Frequency | 48,000 Hz
Microphone Name | Behringer® C-2 studio condenser microphone
Number of Microphones | 8
Pickup Pattern | Cardioid (8.9 mV/Pa; 20–20,000 Hz)
Diaphragm Diameter | 16 mm
Equivalent Noise Level | 19.0 dBA (IEC 651)
SNR | 75 dB
Microphone Structure | L-shaped Array
Microphone Spacing | 9 cm
Table 5. Performance evaluation on Experiment 1. The boldfaced results highlight the optimal minimum RMSE.
Number | Position | Angle (Degree) | IMUSIC | TOFS | TOPS | Squared TOPS | WS-TOPS | Proposed with MUSIC | Proposed with ESPRIT
1 | ϕ1 | 96 | 0.3050 | 0.2050 | 1.0950 | 1.3350 | 0.5600 | 0.7750 | 0.7074
  | θ1 | 86 | 0.5400 | 1.2600 | 1.2750 | 2.0150 | 0.6850 | 0.5700 | 0.6915
  | Average | – | 0.4225 | 0.7325 | 1.1850 | 1.6750 | 0.6225 | 0.6725 | 0.6995
2 | ϕ1 | 65 | 1.1857 | 1.7286 | 20.0143 | 28.5857 | 37.8714 | 1.5000 | 2.0284
  | θ1 | 150 | 9.6000 | 6.6857 | 26.3571 | 39.7857 | 88.2000 | 8.8143 | 8.6800
  | ϕ2 | 55 | 1.0714 | 1.6857 | 22.2571 | 19.4000 | 32.2429 | 2.9714 | 3.8695
  | θ2 | 100 | 8.3714 | 8.3857 | 5.0143 | 6.7857 | 60.2286 | 6.6714 | 3.1630
  | Average | – | 5.0571 | 4.6214 | 18.4107 | 23.6393 | 54.6357 | 4.9893 | 4.4353
3 | ϕ1 | 58 | 2.1400 | 2.3900 | 46.5500 | 52.8100 | 40.9500 | 3.6600 | 4.0334
  | θ1 | 55 | 55.0000 | 55.0000 | 55.0000 | 55.0000 | 55.0000 | 9.4300 | 4.1057
  | ϕ2 | 100 | 1.8400 | 2.0000 | 41.5700 | 62.4000 | 70.9100 | 1.8700 | 2.4554
  | θ2 | 95 | 95.0000 | 83.4200 | 52.4500 | 71.4800 | 95.0000 | 9.7700 | 5.8638
  | ϕ3 | 130 | 10.9300 | 11.8900 | 28.8300 | 32.2800 | 95.2400 | 8.2500 | 6.9071
  | θ3 | 120 | 26.9800 | 25.8400 | 16.1200 | 18.0100 | 91.2800 | 5.9400 | 7.3165
  | Average | – | 31.9817 | 30.0900 | 40.0867 | 48.6633 | 74.7300 | 6.4867 | 5.1137
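The RMSE figures in Tables 5 and 6 can be reproduced from raw trial data with the standard root-mean-square-error formula applied per angle. A minimal sketch follows; the trial values shown are hypothetical illustrations, not data from the experiments.

```python
import math

def rmse(estimates, true_angle):
    """Root-mean-square error (degrees) of repeated DOA estimates
    of one angle against its ground truth."""
    return math.sqrt(sum((e - true_angle) ** 2 for e in estimates) / len(estimates))

# Hypothetical trials estimating an azimuth of phi = 96 degrees.
trials = [95.8, 96.3, 96.1, 95.7]
print(f"RMSE = {rmse(trials, 96.0):.4f} deg")
```

The per-source "Average" rows in the tables would then be the mean of the azimuth and elevation RMSEs computed this way.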
Table 6. Performance evaluation on Experiment 2. The boldfaced results highlight the optimal minimum RMSE.
Number | Position | Angle (Degree) | 2D-IMUSIC | 2D-TOFS | Proposed with 2D-MUSIC
1 | ϕ1 | 96 | 0.9000 | 0.9000 | 0.9000
  | θ1 | 86 | 0.4000 | 1.0500 | 0.7500
  | Average | – | 0.6500 | 0.9750 | 0.8250
2 | ϕ1 | 57 | 0.9500 | 1.1500 | 1.1000
  | θ1 | 91 | 1.0500 | 1.8000 | 1.7000
  | ϕ2 | 139 | 4.9500 | 5.2000 | 5.4500
  | θ2 | 96 | 3.1500 | 3.3000 | 2.0500
  | Average | – | 2.5250 | 2.8625 | 2.5750
3 | ϕ1 | 48 | 0.9500 | 1.5500 | 1.9500
  | θ1 | 86 | 1.4500 | 0.8000 | 2.4500
  | ϕ2 | 98 | 0.9000 | 1.8000 | 1.1500
  | θ2 | 95 | 1.4500 | 2.1500 | 2.6000
  | ϕ3 | 152 | 2.7000 | 2.4000 | 5.9000
  | θ3 | 95 | 4.5000 | 3.9000 | 1.4500
  | Average | – | 1.9917 | 2.1000 | 2.5833
4 | ϕ1 | 100 | 5.8095 | 6.5238 | 3.2857
  | θ1 | 94 | 2.4286 | 2.6190 | 1.6667
  | ϕ2 | 51 | 1.2381 | 1.0952 | 2.5714
  | θ2 | 95 | 0.5714 | 0.6667 | 1.3333
  | ϕ3 | 134 | 1.9524 | 1.8571 | 3.9524
  | θ3 | 103 | 10.0952 | 10.2857 | 9.2857
  | ϕ4 | 153 | 7.4762 | 7.8095 | 7.8571
  | θ4 | 89 | 4.7143 | 4.7143 | 5.3810
  | Average | – | 4.2857 | 4.4464 | 4.4167

Share and Cite

Suksiri, B.; Fukumoto, M. An Efficient Framework for Estimating the Direction of Multiple Sound Sources Using Higher-Order Generalized Singular Value Decomposition. Sensors 2019, 19, 2977. https://doi.org/10.3390/s19132977
