DOA Estimation of Indoor Sound Sources Based on Spherical Harmonic Domain Beam-Space MUSIC

: The Multiple Signal Classiﬁcation (MUSIC) algorithm has become one of the most popular algorithms for estimating the direction-of-arrival (DOA) of multiple sources due to its simplicity and ease of implementation. Spherical microphone arrays can capture more sound ﬁeld information than planar arrays. The collected multichannel speech signals can be transformed from the space domain to the spherical harmonic domain (SHD) for processing through spherical modal decomposition. The spherical harmonic domain MUSIC (SHD-MUSIC) algorithm reduces the dimensionality of the covariance matrix and achieves better DOA estimation performance than the conventional MUSIC algorithm. However, the SHD-MUSIC algorithm is prone to failure in low signal-to-noise ratio (SNR), high reverberation time (RT), and other multi-source environments. To address these challenges, we propose a novel joint spherical harmonic domain beam-space MUSIC (SHD-BMUSIC) algorithm in this paper. The advantage of decoupling the signal frequency and angle information in the SHD is exploited to improve the anti-reverberation property of the DOA estimation. In the SHD, the broadband beamforming matrix converts the SHD sound pressure to the beam domain output. Beamforming enhances the incoming signal in the desired direction and reduces the SNR threshold as well as the dimension of the signal covariance matrix. In addition, the 3D beam of the spherical array has rotational symmetry and its beam steering is decoupled from the beam shape. Therefore, the broadband beamforming constructed in this paper allows for the arbitrary adjustment of beam steering without the need to redesign the beam shape. Both simulation experiments and practical tests are conducted to verify that the proposed SHD-BMUSIC algorithm has a more robust adjacent source discrimination capability than the SHD-MUSIC algorithm.


Introduction
The ability to localize acoustic events is a fundamental prerequisite for equipping microphones with awareness of their surrounding environment.Source localization provides estimates of positional information, e.g., Directions-of-Arrival (DOAs) [1].Conventional DOA estimation methods are mainly based on time difference of arrival (TDOA) [2], steered response power-phase transform (SRP-PHAT) [3], beam scanning [4], and spatial covariance matrix (SCM) [5]-and so on.The high-resolution spectral estimation method, such as Multiple Signal Classification (MUSIC), has received much attention because it can break the Rayleigh limit constraint and has a more promising direction resolution.However, while handling coherent sources of broadband signals in a reverberant environment, the high-resolution spectral estimation method suffers from a large spectral peak search operation, sensitivity to array error, and a high-resolution SNR threshold.These shortcomings often have a great negative impact on research, which is worth considering deeply.In recent years, the theory of spherical microphone arrays has been widely used and developed.For DOA estimation of multiple sound sources, the spherical microphone array [6][7][8] has become a key and universal research focus.Compared with a linear and planar array, the spherical array has rotational symmetry and is easy for beamforming [9,10].The spherical microphone array enables sampling of non-confounding sound pressure to reconstruct the 3D sound field accurately.Furthermore, the sound pressure received can be transformed into the spherical harmonic domain (SHD) for processing [11].Using the spherical Fourier transform (SFT), the spherical sound pressure received can be decomposed into several orders of spherical harmonic function combinations [12].Thus, a MUSIC and Root-MUSIC pseudo-spectrum can be constructed in the SHD [13].This spherical wave decomposition can separate the frequency and angular components of the sound source from each other.In [14], the authors used the frequency smoothing method to de-correlate the coherent sources, which improved the estimation performance of the SHD-MUSIC algorithm in the presence of coherent sources.However, it is clear that the signal subspace turns out not absolutely orthogonal to the noise subspace in the sound field environments.As a result, the SHD-MUSIC algorithm is sensitive to the effect of noise.There is a phenomenon called the subspace swap.The data covariance matrix cannot always hold at full-rank, which affects the correctness of the matrix decomposition results.The above reasons eventually reduce the performance of the DOA estimation algorithm in case of low SNRs.In the open spherical arrays [15], the zero point of the spherical Bessel function causes a low value of the spherical mode amplitude response [16].At the Bessel zero points, the low amplitude problem of spherical modal amplitude cannot be responded to perfectly, so the direction of the signal source cannot be estimated.Two ideas can solve the Bessel zero problem: spherical configuration and algorithmic optimization.For example, we can use directional microphone arrays, rigid-sphere arrays [17], and dual-sphere arrays [18] under these circumstances.In [19], the authors proposed the SHD-RMUSIC algorithm using relative sound pressure values that are insensitive to noise to improve the robustness to noise.However, the algorithm could not achieve satisfactory performance at lower SNRs (below 5 dB) and was not applicable to strong reverberation (above 0.6 s) environments.To improve the performance of source direction estimation in strongly reverberant environments, the authors proposed the direct path dominance test (DPD-MUSIC) [20].The DPD algorithm exploits the non-smoothness and sparsity of the speech signal.By selecting time-frequency bins (TF-bins) that contain the direct sound dominance of a target source without the remaining reflected sound contribution, these TF-bins are used to construct the spatial spectrum.Thus, a certain reverberation multi-path distortion problem is solved in some scenarios, and multiple sources can be localized in a 3D strongly reverberant environment.However, its application presupposes that most of the TF-bins candidate sets must contain correct DOA information and that the proportion of TF-bins containing correct DOA information is sufficient to estimate the sound source DOA reliably.To make this application prerequisite guaranteed, the authors proposed an improved DPD test method, called the DPD-EDS test [21].The test identifies the TF-bins dominated by the direct sound by means of enhanced decomposition based on the direct sound.This improves the accuracy of selecting the TF-bins that contain the correct DOA information.Obviously, the DPD test requires a direct acoustic prior [22], which is often difficult to obtain in harsh sound field environments.This shows that the rigid spherical microphone arrays for indoor DOA estimation of multiple adjacent sources to attenuate the effects of low-SNR environments and Bessel zeros on DOA estimation performance are an effective avenue of research.
In this paper, a rigid spherical microphone array is used to perform DOA estimation of indoor sound sources by placing 32 omnidirectional microphones in a uniform sampling distribution on the surface of a rigid baffle.We propose a joint SHD-beamforming and beam-domain MUSIC algorithm solution.It is promising and profound that the proposed method can improve the resolution probability of adjacent sound sources and reduce the SNR resolution thresholds.In a nutshell, our contributions are the following:

•
We propose a novel spherical harmonic domain beam-space MUSIC algorithm.Combined with the advantages of the rigid-sphere configuration, the effects of the low SNR environment and Bessel's zero on the performance of multi-source DOA estimation are better reduced.This results in a more robust multi-source localization capability and adjacent source discrimination;

•
We use a quadratic-constraint quadratic-planning optimization method to generate a multi-objective optimized signal-independent beam weight.We construct a very flexible rotationally symmetric beamformer so that the 3D beam of the spherical array can be arbitrarily redirected without redesigning the beam shape;

•
We verify the superior performance of the proposed algorithm using simulated data as well as field-testing data in real-world situations (anechoic room and reverberant room).
The paper is structured as follows: Section 2 describes the space domain and spherical harmonic domain system models.Section 3 introduces the proposed DOA estimation method.Our experiment findings and discussions are clearly shown in Section 4. Finally, we complete the summary part and conclusions in Section 5.

Space-Domain System Model
We consider a set of spherical microphone arrays consisting of I omnidirectional microphones; r i = (r i cos φ i sin θ i , r i sin φ i sin θ i , r i cos θ i ) T denotes the position of the i-th microphone of the array.The azimuth φ is measured counterclockwise from the x-axis, the elevation angle θ is measured downward from the z-axis, and r i represents the distance of the i-th microphone to the center of the array.The adopted spherical coordinate system is shown in Figure 1.
In this paper, a rigid spherical microphone array is used to perform DOA estimation of indoor sound sources by placing 32 omnidirectional microphones in a uniform sampling distribution on the surface of a rigid baffle.We propose a joint SHD-beamforming and beam-domain MUSIC algorithm solution.It is promising and profound that the proposed method can improve the resolution probability of adjacent sound sources and reduce the SNR resolution thresholds.In a nutshell, our contributions are the following: • We propose a novel spherical harmonic domain beam-space MUSIC algorithm.Combined with the advantages of the rigid-sphere configuration, the effects of the low SNR environment and Bessel's zero on the performance of multi-source DOA estimation are better reduced.This results in a more robust multi-source localization capability and adjacent source discrimination; • We use a quadratic-constraint quadratic-planning optimization method to generate a multi-objective optimized signal-independent beam weight.We construct a very flexible rotationally symmetric beamformer so that the 3D beam of the spherical array can be arbitrarily redirected without redesigning the beam shape; • We verify the superior performance of the proposed algorithm using simulated data as well as field-testing data in real-world situations (anechoic room and reverberant room).
The paper is structured as follows: Section 2 describes the space domain and spherical harmonic domain system models.Section 3 introduces the proposed DOA estimation method.Our experiment findings and discussions are clearly shown in Section 4. Finally, we complete the summary part and conclusions in Section 5.

Space-Domain System Model
We consider a set of spherical microphone arrays consisting of I omnidirectional microphones; ( cos sin , sin sin , cos ) = denotes the position of the i -th microphone of the array.The azimuth  is measured counterclockwise from the x-axis, the elevation angle  is measured downward from the z-axis, and i r represents the dis- tance of the i -th microphone to the center of the array.The adopted spherical coordinate system is shown in Figure 1.There are L far-field sound sources generating plane waves that propagate through space and are picked up by microphones.Ψ l = (θ l , φ l ) denotes the direction of propagation of the l-th sound source, k l = −(k cos φ l sin θ l , k sin φ l sin θ l , k cos θ l ) T denotes the wave number vector of the l-th plane wave.The signal received by the i-th microphone in the frequency domain can be expressed as: where v i (k, Ψ l ) denotes the direction of propagation of the i-th microphone associated with the l-th plane wave; s l (k) is the amplitude vector of the l-th plane wave; n i (k) is the noise received by the i-th microphone.The frequency domain received sound pressure model is expressed in matrix form as follows: where V(k, Ψ) is the I × L dimensional direction matrix; s(k) = [s 1 (k), . . ., s L (k)] T is the L dimensional source signal vector; n(k) = [n 1 (k), . . ., n I (k)] T is the I dimensional zero-mean Gaussian white noise vector; and n(k) is assumed to be uncorrelated with s(k).For an open-sphere array (microphones in a free field), v(k, Ψ l ) is denoted as:

Spherical Harmonic Domain System Model
For the case of rigid spherical array with acoustic scattering [23], the time delay method cannot be used to obtain the sound pressure directly.The sound field modal synthesis method will be introduced to calculate the sound pressure response of the spherical array [24].According to the Fourier acoustic principle, for a spherical array of order N, the sampled sound pressure of the microphone at position (r, Ω), Ω = (θ, φ) on the sphere and its spherical harmonic domain (SHD) are expressed as: The spherical Fourier basis function Y m n (Ω) is called the n-th order spherical harmonic function (SHF) of the degree of freedom, m: where P m n denotes the concatenated Lejeune function; P m n (cos θ) reflects the effect of θ on the operation state of the SHF; and e imφ reflects the effect of φ on the operation state of the SHF.When sampling spherical sound pressure, a spherical sample (θ i , φ i ) and a sampling weight a i are given.
To extract the sound field spherical harmonics, the microphone positions are required to satisfy the weighted orthogonality condition.The three types of aliasing-free sampling for spherical array signal processing [25] are equal-angle sampling, Gaussian sampling, and near-uniform sampling.In order to reduce the amount of multi-channel data operation, we use the uniform sampling method which can achieve the minimum number of sampling points.It has a sampling point count of (N + 1) 2 and the sampling weight a i = 4π I is set as a constant.
In the SHD, the frequency and the angular components of the source direction matrix can be separated from each other as follows [20]: where T is a (N + 1) 2 dimensional vector of SHFs in the sound-source direction, consisting of SHFs of different orders; y(Ω i ) is a (N + 1) 2 dimensional vector of SHFs in the microphone direction, having the same form as y(Ψ l ); B(kr) is the modal intensity matrix, which turns out to be a (N + 1) 2 × (N + 1) 2 dimensional symmetric matrix composed of the spherical modal amplitude response b n (kr).b n,open (kr) = 4πi n j n (kr) b n,rigid (kr) = 4πi n j n (kr) − j n (kr) h n (kr) h n (kr) (7) where j n and h n are the first class n order spherical Bessel function and the second class n order spherical Hankel function, respectively.For the rigid spherical array, the first term of b n (kr) describes the incident sound field, which is the same as the open spherical array, and the second term of b n (kr) describes the scattered sound field.
For a plane wave with amplitude s l (k) and wave number vector (k, Ψ), Ψ = (θ l , φ l ), the sound pressure received at the i-th microphone can be expressed as:

Framework of the Proposed SHD-BMUSIC
It is known from the literature [19] that the SHD-MUSIC algorithm is sensitive to noise.To address these challenges, we introduce beam design in the SHD and transform the SHD-MUSIC algorithm into the beam space for processing; therefore, we combine the spherical harmonic domain (SHD) with the beam domain (BD).The beamforming is processed in the SHD and the MUSIC spatial spectrum is constructed in the BD.The main advantages are as follows: 1.
The SHD axisymmetric beamformer is designed to separate the beam steering from the beam weights so that the beam steering can be adjusted arbitrarily without adjusting the beam weights; 2.
The SHD beam weights can be independent of the signal frequency, without the demands on a special constant beam-width design; 3.
The frequency and angle components of the source can be decoupled in the SHD, and the frequency smoothing can be used to decouple the coherent source without affecting the DOAs of the sources [14].
The framework of the proposed spherical harmonic domain beam-space MUSIC algorithm (SHD-BMUSIC) DOA estimation scheme is shown in Figure 2.

Beam Weight Design
The beam weight design can be categorized as a single-objective and multi-objective optimization problem.The beam directional map metrics include directionality, white noise gain, side flap level, and main flap width.Although multi-objective formulation descriptions usually do not have closed-form solutions, they can be integrated into numerically solved optimization problems.The multi-objective beam weight design formulation is expressed in the form of a quadratic constraint quadratic planning (QCQP) optimization problem, which is a special case of second-order cone programming (SOCP):

Beam Weight Design
The beam weight design can be categorized as a single-objective and multi-objective optimization problem.The beam directional map metrics include directionality, white noise gain, side flap level, and main flap width.Although multi-objective formulation descriptions usually do not have closed-form solutions, they can be integrated into numerically solved optimization problems.The multi-objective beam weight design formulation is expressed in the form of a quadratic constraint quadratic planning (QCQP) optimization problem, which is a special case of second-order cone programming (SOCP): where W NG min is the lower bound on W NG. Matrix S depends on the sampling scheme, for the near-uniform sampling scheme, S = 4π Q Y H .The matrices A and B are conjugate symmetric matrices.For non-zero vectors w nm , w H nm Aw nm , and w H nm Bw nm is positive definite.w H nm Aw nm ≤ 1 W NG min denotes the white noise gain constraint.w H nm v nm = 1 denotes the distortion-free response constraint of the beam direction map in the array view direction.Judging by the satisfaction of distortion-free response constraint and white noise gain constraint, the average response w H nm Bw nm in all directions of the array is minimized.The beamforming weight w nm is finally derived using a numerical solution method.

Beam Direction Chart Indicator
The directionality factor (DF) is defined as the ratio of the observed directional signal power gain to the all-around uniform arrival signal power gain.
where y(θ k , φ k ) is the response output in the observation direction.DF quantifies the improvement in SNR provided by the array directional response.White noise gain (W NG) is defined as the improvement in SNR at the array output compared to the array input and is used as a measure of array robustness: where w nm is the beamforming weight.Uniform sampling satisfies SS H = 4π Q I, where I is the unit matrix and Q is the number of sampling samples.v nm = b n (kr)[Y m n (θ k , φ k )] * denotes the array input generated by the plane wave sound field, and can be applied to other arrays such as rigid-or open-sphere arrays by simply modifying b n (kr).Therefore, the SHD-beamforming system has remarkable flexibility [26].
According to Equation ( 9), set the white noise gain constraint to 1 W NG min = 0.2 and 2.7 kHz frequency point.The beam weight value w nm is derived based on the QCQP algorithm, since w nm is independent of the signal frequency.Referring to w nm , W NG = 22.95 dB and DF = 24.0dB should be set for the entire operating frequency band.

Rotational Symmetric Beamformer
The SHD-beamforming consists of two parts: spherical harmonic decomposition and SHD weighting summation.If the beam steering angle is to be adjusted, the weighting vector needs to be redesigned.In the beamforming block diagram proposed by Meyer and Elko [27], reducing the beam formation weights to a one-dimensional function.The resulting beam direction map is axisymmetric when viewed from the viewpoint of the axis forming the symmetry.This separates the beam steering adjustment from the weighted value design: The rotationally symmetric beamformer contains two parts: beam guidance and beam synthesis.The weighted value w nm adjusts the beam map shape and the spherical harmonics Y m n (θ 0 , φ .0 ) adjust the beam steering angle.The rotationally symmetric beamformer with beam weight w nm of Section 3.3 and different beam points Y m n (θ 0 , φ .0 ) is shown in Figure 3.The rotationally symmetric beamformer contains two parts: beam guidance and beam synthesis.The weighted value nm w adjusts the beam map shape and the spherical harmonics A plane wave decomposition (PWD) of ( , ) nm kr p , i.e., dividing by the modal intensity () kr B , yields the plane wave amplitude density can be expressed as: A plane wave decomposition (PWD) of p nm (k, r), i.e., dividing by the modal intensity B(kr), yields the plane wave amplitude density can be expressed as: The beam response output is obtained using the inner product of axisymmetric beamforming weights w * nm,sym with SHD array data p nm (k, r): The beam response y(kr, ∆Ψ) is affected by the angle ∆Ψ between the plane wave arrival direction (θ k , φ k ) and the array view direction (θ 0 , φ 0 ).

Beam Domain MUSIC
The beam domain MUSIC algorithm forms multiple beams in the spatial sector of interest with the number of beams between the number of sources and the number of microphones [28].Different pointing beams are used for synthesis to cover the target area, which forms a D × (N + 1) 2 dimensional axisymmetric beamforming matrix W nm,sym = [w nm1,sym , . . ., w nmD,sym ].The 2D preformed multi-beam matrix is shown in Figure 4.
The beam response output is obtained using the inner product of axisymmetric beamforming weights The beam response ( , ) kr  y is affected by the angle  between the plane wave arrival direction ( , )  and the array view direction 0 0 ( , )  .

Beam Domain MUSIC
The beam domain MUSIC algorithm forms multiple beams in the spatial sector of interest with the number of beams between the number of sources and the number of microphones [28].Different pointing beams are used for synthesis to cover the target area, which forms a The beam response vector, obtained using the inner product of the beamforming If the beamforming matrix W nm,sym does not satisfy W H nm,sym W nm,sym = I, the standard orthogonalization needs to be performed on it as follows: The beam response vector, obtained using the inner product of the beamforming weighting matrix and SHD array data, is expressed in matrix form: The beam domain data covariance matrix, S y (k) = E y(kr, ∆Ψ)y H (kr, ∆Ψ) , is reduced to D × D dimensions and the operations of the feature decomposition are reduced to o(D 3 ).Therefore, the dimension-reduction beam-space processing eases the operations of the subspace algorithm.
Frequency smoothing (FS) can be applied as a straightforward average by assuming frequency independence of the (mode strength compensated) array manifold [29].We de-correlate the coherent source signal by implementing FS that computes the smoothed covariance matrix as the average of covariance matrices at different frequency sectors.Then, the smoothed beam response covariance matrix Sy = 1 The eigenvalue decomposition of Sy is performed to obtain the noise subspace and the signal subspace in the beam domain.The BD-MUSIC spatial pseudo-spectral function is constructed using the orthogonality of the beam scan vector b(Ψ) = W H nm,sym y(Ψ) and beam domain noise subspace E nm .The equations can be summarized as follows: Finally, the spectral peak search is carried out and the direction corresponding to the peak is the DOA of the sound sources.

Simulation Parameter Settings
We simulated the room impulse response (RIRs) between an omnidirectional source and a microphone in a reverberant environment using the image-source method modified to account for the scattering of a rigid sphere [23].Multiple randomly selected audios from the TIMIT database were used as a simulated source of the original vocalizations.The original audio signal was a pure voice signal of 4 s duration with a sampling rate of 16 kHz.The indoor 3D empty room size was set to 4 m × 6 m × 3 m.The radius of the sphere array r was 0.042 m, placed in the middle of the room, and the distance of the original sound sources from the center of the sphere array was 1 m.The spherical array was configured as a rigid sphere using uniform sampling weights.The order N of the spherical harmonic function was taken as three.
The signals received of the sphere array were obtained by convolving the original audio with the RIRs and by adding a certain SNR of Gaussian white noise.The signals received were transformed to the time-frequency domain using the short-time Fourier transform (STFT).The frame length of STFT was 16 ms, and the selected window function was a Hamming window with a window length of 256 points and a frame overlap of 50%.Then, we carried out a spherical harmonic transform in the used frequency band.A disadvantage of rigid spherical arrays is the low-frequency performance, so excessive noise amplification at lower frequencies due to modal strength compensation needs to be avoided [29].According to the spatial aliasing kr < N, the upper-frequency limit is around 5 kHz.Thirty frequency bins ranging from 2700 Hz to 3600 Hz were used to perform the spherical harmonic transform to obtain the SHD sound pressure data.Finally, DOA estimation algorithms were executed separately to obtain the DOAs of the sound sources.The azimuthal accuracy and elevation angle accuracy of the spectrum peak search were 1 degree.The localization performances of the SHD-MUSIC and SHD-BMUSIC are compared in these different SNR levels and reverberation time (RT) environments.

DOA Estimated Evaluation Metrics
To evaluate the algorithm fairly, the performances of the algorithm using two qualitative metrics were measured.Fifty Monte Carlo simulation trials were performed in each sound field condition.
The first metric was the average number of detected sources, which analyzed whether the DOA estimation algorithm was supposed to detect the target source.For each detection, the target source was detected if the deviation between the estimated angle and the true angle was set within 20 • .The number of sources detected each time was statistically averaged to obtain the average number of detected sources.
When multiple sources can be correctly distinguished, we used a second evaluation metric of the average root mean square error (RMSE): where L is the number of sound sources; M is the number of cases that successfully detect all the L sound sources.; (θ m ori (l), φ m ori (l)) is the true position of the l-th source in the m-th trial; (θ m est (l), φ m est (l)) is the estimated position of the l-th source in the m-th trial.The average of the estimated positions of all trials is taken as the final estimated position (θ est (l), φ est (l)) of the lth source.The average RMSE determines the deviation between the true position and the estimated position and is used to analyze the accuracy of the localization algorithm.Only the RMSE of the positioned angle of the detected signal is considered.

Single-Source Localization Algorithm Verification
First, the estimation results of the algorithms for spherical harmonic domain-minimum variance distortion-free response (SHD-MVDR) [12], SHD-MUSIC, and SHD-BMUSIC are compared for the single-source case.Aiming at minimizing the contribution of noise and any arriving signals from other directions, the MVDR method is supposed to maintain a fixed gain in the look direction.Three groups of single-source experiments with different sound field environments were tested.The sound source was located at (60 • , 300 • ).The sound field conditions were high reverberation (RT = 0.8 s, SNR = 20 dB), low SNR (RT = 0.2 s, SNR = 0 dB), and high-reverberation low-SNR (RT = 0.8 s, SNR = 0 dB).For the SHD-BMUSIC algorithm, five beams were formed to cover the area where the target sound source was located.The estimated spatial spectrum of the single source for the three algorithms is shown in Figure 5.
The spatial spectrum of the single source showed that all three algorithms successfully estimated the location of the single source in different environments.However, the spatial spectral peaks of SHD-BMUSIC were both sharper than SHD-MUSIC and SHD-MVDR and the spatial spectral gain was higher.It was quite convincing that the SHD-BMUSIC algorithm suppressed the noise outside the beam region under these circumstances and the SNR outputs were improved.The spatial spectrum of the single source showed that all three algorithms successfully estimated the location of the single source in different environments.However, the spatial spectral peaks of SHD-BMUSIC were both sharper than SHD-MUSIC and SHD-MVDR and the spatial spectral gain was higher.It was quite convincing that the SHD-BMUSIC algorithm suppressed the noise outside the beam region under these circumstances and the SNR outputs were improved.

Adjacent Sound Sources Localization Algorithm Verification
In this section, we verify that the SHD-BMUSIC was able to distinguish the locations of multiple adjacent sound sources with higher source identification accuracy compared to SHD-MUSIC.Two sets of experiments with the adjacent sound sources were performed.The first set of experiments set up sound source 1 at (60°, 150°) and sound source 2 at (60, 180°).The second group of experiments set up sound source 1 at (60°, 140°), sound source 2 at (60, 180°), and sound source 3 at (60°, 220°).Both experiments were conducted in an environment with RT = 0.3 s and SNR = 20 dB.For the SHD-BMUSIC algorithm, for the first set of experiments, five beams were used to cover the area where two sources were located.For the second set of experiments, seven beams were used to cover the area where the three sources were located.The spatial spectra of the adjacent sound sources for both algorithms are shown in Figure 6.

Adjacent Sound Sources Localization Algorithm Verification
In this section, we verify that the SHD-BMUSIC was able to distinguish the locations of multiple adjacent sound sources with higher source identification accuracy compared to SHD-MUSIC.Two sets of experiments with the adjacent sound sources were performed.The first set of experiments set up sound source 1 at (60 • , 150 • ) and sound source 2 at (60 • , 180 • ).The second group of experiments set up sound source 1 at (60 • , 140 • ), sound source 2 at (60 • , 180 • ), and sound source 3 at (60 • , 220 • ).Both experiments were conducted in an environment with RT = 0.3 s and SNR = 20 dB.For the SHD-BMUSIC algorithm, for the first set of experiments, five beams were used to cover the area where two sources were located.For the second set of experiments, seven beams were used to cover the area where the three sources were located.The spatial spectra of the adjacent sound sources for both algorithms are shown in Figure 6.
It is quite convincing that the algorithm proposed has a better discriminative ability than SHD-MUSIC.Tables 1 and 2 show the experimental statistics for the two cases, respectively.It is quite convincing that the algorithm proposed has a better discriminative ability than SHD-MUSIC.Tables 1 and 2 show the experimental statistics for the two cases, respectively.As shown in Tables 1 and 2, for the first set of conditions, both algorithms succeeded most of the time in distinguishing two adjacent sound sources.However, SHD-BMUSIC had a higher average number of detected sources and a lower average RMSE.For the second set of conditions, SHD-MUSIC was basically unable to completely estimate the three adjacent sound sources.The intermediate source was not successfully detected in most cases, resulting in a small average number of detected sources for SHD-MUSIC.In contrast, SHD-BMUSIC was able to detect all three sound sources in most tests.In the tests where all sound sources were successfully discriminated, SHD-BMUSIC also had a smaller average RMSE.As a result, SHD-BMUSIC improved the accuracy of sound source location estimation and had a higher adjacent source discrimination capability.The increase in position resolution was the gain from the beamforming operation.had a higher average number of detected sources and a lower average RMSE second set of conditions, SHD-MUSIC was basically unable to completely esti three adjacent sound sources.The intermediate source was not successfully de most cases, resulting in a small average number of detected sources for SHD-M contrast, SHD-BMUSIC was able to detect all three sound sources in most tests.In where all sound sources were successfully discriminated, SHD-BMUSIC als smaller average RMSE.As a result, SHD-BMUSIC improved the accuracy of soun location estimation and had a higher adjacent source discrimination capability crease in position resolution was the gain from the beamforming operation.The statistical results of SHD-MVDR, PIVs [29], SSPIVs [29], SHD-MUSIC, and SHD-BMUSIC algorithms were compared for multi-source case testing.Figure 8 shows the average results of the tests with five sources for each combination of SNR and RT.
Symmetry 2023, 15, x FOR PEER REVIEW 14 of 18 The statistical results of SHD-MVDR, PIVs [29], SSPIVs [29], SHD-MUSIC, and SHD-BMUSIC algorithms were compared for multi-source case testing.Figure 8 shows the average results of the tests with five sources for each combination of SNR and RT.As seen in Figure 8, overall, the performance of the localization algorithm showed a certain degree of decline as the SNR decreased and the RT increased.The performance of SHD-MVDR was seriously affected by reverberation and noise.SSPIVs improved the shortcomings of PIVs which were severely affected by reverberation.PIVs and SSPIVs had shorter running times because they were not expected to carry out a spectral peak search.The SHD-MUSIC and SHD-BMUSIC, which use frequency smoothing, had strong adaptability to reverberation.Due to the gain from beamforming, the proposed algorithm had better sound-source detection capability and localization accuracy than the rest of the algorithms, showing better anti-noise and anti-reverberation performance.As seen in Figure 8, overall, the performance of the localization algorithm showed a certain degree of decline as the SNR decreased and the RT increased.The performance of SHD-MVDR was seriously affected by reverberation and noise.SSPIVs improved the shortcomings of PIVs which were severely affected by reverberation.PIVs and SSPIVs had shorter running times because they were not expected to carry out a spectral peak search.The SHD-MUSIC and SHD-BMUSIC, which use frequency smoothing, had strong Then, the case of the adjacent dual-sound sources was tested.The locations of the two sources in the anechoic and reverberant rooms were (90 • , 70 • ) and (90 • , 100 • ).The estimated spatial spectra of the two algorithms are shown in Figure 11.Finally, it is promising to conclude that both algorithms are better at estimating the location of a single source in an anechoic room.The SHD-BMUSIC had sharper peaks and lower estimation errors.In the reverberation room, SHD-MUSIC showed additional excess peaks due to reverberant reflected sound in adjacent dual-source experiments.This made it easy for the algorithm to mistake the pseudo-peaks for the direction of the sound source, leading to an incorrect estimation of the sound source location.However, SHD-BMUSIC did not have the phenomenon of redundant pseudo-peaks.This was because the beamforming suppressed the excess reflected sound outside the source region, thus improving the reverberation resistance of the algorithm.

Conclusions
In this paper, we use a spherical harmonic domain beam-space MUSIC algorithm to solve the multi-source DOA estimation problem in harsh environments where reverberation and noise coexist.The designed spherical harmonic-domain beam is not only easy to implement, with flexible adjustment of steering, but also reduces the matrix dimension to reduce the amount of eigen-decomposition operations.From the experimental results, it can be seen that the SHD-BMUSIC algorithm has some advantages over the SHD-MUSIC algorithm without the beam domain strategy.Its offline construction of the beam does not increase the time consumption of online real-time positioning.Instead, it enables online localization with sharper spatial spectral peaks, more robust multi-source localization capability, and adjacent source discrimination.We will further optimize the design in the direction of beamforming and the MUSIC algorithm to better solve the problem of locat- Finally, it is promising to conclude that both algorithms are better at estimating the location of a single source in an anechoic room.The SHD-BMUSIC had sharper peaks and lower estimation errors.In the reverberation room, SHD-MUSIC showed additional excess peaks due to reverberant reflected sound in adjacent dual-source experiments.This made it easy for the algorithm to mistake the pseudo-peaks for the direction of the sound source, leading to an incorrect estimation of the sound source location.However, SHD-BMUSIC did not have the phenomenon of redundant pseudo-peaks.This was because the beamforming suppressed the excess reflected sound outside the source region, thus improving the reverberation resistance of the algorithm.

Conclusions
In this paper, we use a spherical harmonic domain beam-space MUSIC algorithm to solve the multi-source DOA estimation problem in harsh environments where reverberation and noise coexist.The designed spherical harmonic-domain beam is not only easy to implement, with flexible adjustment of steering, but also reduces the matrix dimension to reduce the amount of eigen-decomposition operations.From the experimental results, it can be seen that the SHD-BMUSIC algorithm has some advantages over the SHD-MUSIC algorithm without the beam domain strategy.Its offline construction of the beam does not increase the time consumption of online real-time positioning.Instead, it enables online localization with sharper spatial spectral peaks, more robust multi-source localization capability, and adjacent source discrimination.We will further optimize the design in the direction of beamforming and the MUSIC algorithm to better solve the problem of locating and tracking multiple adjacent sound sources in high-reverberation and highnoise environments.

Figure 1 .Figure 1 .
Figure 1.Defined spherical coordinate system.There are L far-field sound sources generating plane waves that propagate through space and are picked up by microphones.( , ) l l l  = denotes the direction of propagation of the l -th sound source, ( cos sin , sin sin , cos ) T l l l l l l k b n (kr) is constructed based on the structure of the array as for the open and rigid spherical arrays, respectively:

Figure 2 .
Figure 2. Flow chart of DOAs estimation scheme for indoor sound sources.

Figure 2 .
Figure 2. Flow chart of DOAs estimation scheme for indoor sound sources.

Figure 3 .
Figure 3. Axisymmetric beam diagram with the same beam shape and different beam steering.

Figure 3 .
Figure 3. Axisymmetric beam diagram with the same beam shape and different beam steering.

4 .
preformed multi-beam matrix is shown in Figure (a) Individual beam map (b) Multiple beam maps (c) Synthesized beam map

Figure 4 .
Figure 4. Two-dimensional form of the preformed multi-beam directional map.

Figure 4 .
Figure 4. Two-dimensional form of the preformed multi-beam directional map.

Figure 5 .
Figure 5. Spatial spectra of single sound source in different acoustic environments.

Figure 5 .
Figure 5. Spatial spectra of single sound source in different acoustic environments.

4. 3 . 3 .
Multi-Source Localization Algorithm Verification at Different SNRs and RTs To verify the performance of the SHD-BMUSIC algorithm in the reverberation-noise sound field, we considered the positioning results under different reverberation durations and SNR levels.There were five sources emitting simultaneously, and the source locations were (60 • , 300 • ), (140 • , 90 • ), (60 • , 170 • ), (140 • , 250 • ), and (60 • , 30 • ).There were 25 beams used to form five new separate beams to the source area and to cover the target area.The spatial spectra of SHD-BMUSIC at different SNRs and RTs are shown in Figure 7.

4. 3 . 3 .
Multi-Source Localization Algorithm Verification at Different SNRs and RT To verify the performance of the SHD-BMUSIC algorithm in the reverberati sound field, we considered the positioning results under different reverberation d and SNR levels.There were five sources emitting simultaneously, and the source were (60°, 300°), (140°, 90°), (60°, 170°), (140°, 250°), and (60°, 30°).There were 2 used to form five new separate beams to the source area and to cover the target a spatial spectra of SHD-BMUSIC at different SNRs and RTs are shown in Figure 7 (a) Spatial spectra of SHD−BMUSIC at different SNRs and low RT (b) Spatial spectra of SHD−BMUSIC at different SNRs and high RT

Figure 7 .
Figure 7. Multi-source spatial spectra of SHD−BMUSIC at different SNRs and RTs (five so
(a) Average number of detected sources with different SNRs and RTs of localization algorithms (b) Average RMSE with different SNRs and RTs of localization algorithms

Figure 8 .
Figure 8. Line graph of the statistical trials of the localization algorithm (five sources).

Figure 8 .
Figure 8. Line graph of the statistical trials of the localization algorithm (five sources).

Figure 11 .
Figure 11.Spatial spectra for actual testing of adjacent dual-sound sources.

Figure 11 .
Figure 11.Spatial spectra for actual testing of adjacent dual-sound sources.

Table 1 .
Localization results and errors for two adjacent sound sources at SNR = 20 dB, RT = 0.3 s.

Table 2 .
Localization results and errors for three adjacent sound sources at SNR = 20 dB, RT = 0.3 s.

Table 1 .
Localization results and errors for two adjacent sound sources at SNR = 20 dB, RT = 0.3 s.

Table 2 .
Localization results and errors for three adjacent sound sources at SNR = 20 dB, RT = 0.3 s.