Nested sampling for detection and localization of sound sources using a spherical microphone array

Since its inception in 2004, nested sampling has been used in acoustics applications. This


Introduction
Nested sampling (NS) was introduced by Skilling [1] as a numerical method for efficient Bayesian calculations. Soon afterward, the method was applied to acoustics problems [2], where Jasa and Xiang explored the Lebesgue integral as the mathematical foundation of the NS algorithm. Since then, that effort has resulted in a series of publications on acoustic applications [3][4][5]. This paper showcases a recent application of the NS algorithm to sound source detection and localization within a Bayesian framework. To detect and localize sound sources, the sound environment is sensed by a spherical microphone array whose signals are processed using a spherical Fourier transform. Spherical harmonics are exploited to process the acoustic data and to formulate the signal models. This paper emphasizes that source detection represents a model comparison problem, source enumeration represents a model selection problem, and source localization represents a parameter estimation problem, all of which can be efficiently accomplished within the Bayesian framework using the NS algorithm.
This paper presents a further development of the previous work [6] in that a background model for the no-source scenario needs to be established. The source detection problem is critically based on the model comparison between the no-source and one-source models. The background model requires special attention to the spherical harmonics and is dealt with separately in Section 2.2. In addition to these model improvements, spherical harmonic components up to the fourth order have been achieved owing to the further development of a 32-channel microphone array, as illustrated in Figure 1.

Spherical Microphone Data and Models
This section briefly introduces the data processing and the prediction models used for sound source detection, enumeration, and direction-of-arrival estimation.

Microphone Array Data
When Q microphones are flush-mounted nearly equidistantly on a rigid sphere of radius a, the sound pressure signals P_mic are processed according to Equation (1), with the third sum over q = 1, ..., Q being a spherical Fourier transform of the Q microphone signals P_mic(Θ_q) at angular positions Θ_q, and the symbol * standing for the complex conjugate. Function B(ka) is the modal strength of the rigid sphere of radius a, where j = √−1 and k = ω/c is the propagation coefficient of sound waves. Functions j_n(·) and h_n(·) are the spherical Bessel and Hankel functions, and j′_n(·) and h′_n(·) are their derivatives, respectively. Θ = {θ, ϕ} collectively represents the elevation and azimuth angles, while Θ_q specifies the Q microphone locations on the surface of the rigid sphere. Figure 1 shows the spherical microphone array of Q = 32 channels developed for this research. The array is built upon a rigid sphere of radius a = 3.5 cm. In the following, we denote D = {D(Θ)} as a two-dimensional matrix (vectors) representing the experimental data in the context of Bayesian inferential inversion.
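For orientation, the two ingredients named above are commonly written in the spherical-array literature as follows. This is a sketch of the textbook forms, not necessarily the exact normalization of Equations (1) and (2): the coefficient symbol P_nm, the quadrature weights α_q (roughly 4π/Q for a nearly uniform layout), the explicit order index on B_n, and the factor 4π j^n are assumptions, and the kind of Hankel function depends on the time convention.

```latex
% Discrete spherical Fourier transform of the Q microphone signals
% (alpha_q: assumed quadrature weights, about 4*pi/Q for nearly uniform sampling)
P_{nm} \approx \sum_{q=1}^{Q} \alpha_q \, P_{\mathrm{mic}}(\Theta_q) \, \left[ Y_n^m(\Theta_q) \right]^{*}

% Modal strength of a rigid sphere of radius a, evaluated on its surface
B_n(ka) = 4\pi \, \mathrm{j}^{\,n} \left[ j_n(ka) - \frac{j_n'(ka)}{h_n'(ka)} \, h_n(ka) \right]
```

Dividing each coefficient P_nm by B_n(ka) removes the scattering effect of the sphere before the harmonics are recombined over the look direction Θ.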

Prediction Models
In Equation (1), Y_n^m(Θ) denotes the spherical harmonics of order n and degree m. They are orthonormal and complete in the sense of Equation (3) when N → ∞. Θ_s is the source angle in the form of the direction of arrival (DoA), and δ(·) represents the Kronecker delta function. Using Equation (3), we establish a predictive model of spherical beamforming as Equation (4), where A_s accounts for the different source energy strengths of the individual sound sources. Note that Θ_S = {A_1, A_2, ..., A_S, Θ_1, Θ_2, ..., Θ_S} collectively denotes both the strength vector A_S and the angular directions (vectors) of S sound sources, and each angular vector contains one pair of elevation and azimuth angles Θ_s = {θ_s, ϕ_s}. Variable Θ represents the angular range for possible sound sources to be localized. For S ≥ 1, the kernel function g(Θ_s, Θ) in Equation (3) is processed up to an upper order N satisfying (N + 1)^2 ≤ Q. This means that the integer-valued order N of the spherical harmonics is limited by the number of microphone channels instead of extending to infinity. Figure 2 illustrates a superposition of two simultaneous sources of equal amplitudes predicted by the model kernel in Equation (3) for N = 4, before the squaring operation that builds the source energy. The finite upper order N is responsible for the finite width of the lobes, which would otherwise be needle-shaped. Figure 3a illustrates the spherical microphone data for the presence of two simultaneous sound sources, processed using Equations (1) and (2), while Figure 3b illustrates the predicted map of the two simultaneous sound sources using the model in Equation (4). The angular range is evaluated over

When processing the microphone array data, there is no prior knowledge about whether sound sources are present in or absent from the incoming sound field. It does not make sense to pursue a direction-of-arrival analysis if no sound sources are present in the incoming microphone signals. For model-based Bayesian detection, we need to establish a background model. Special attention has to be given to the spherical harmonics processing in this case. Specifically, M_0 represents the no-source model for S = 0. In this case, the kernel function g(Θ_0, Θ) in Equation (3) is calculated only for N = 0, namely the zeroth order of the spherical harmonics, which gives the background model in Equation (5), where the direction of 'no-source' Θ_0 is irrelevant over the angular range Θ. For notational purposes, we collectively denote M_S = {M_0, M_1, ..., M_S} as the prediction models for the direction-of-arrival analysis, while we denote M_D = {M_0, M_1} for the sound source detection, a small subset of M_S.
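The prediction models can be sketched numerically as follows. The sketch assumes that the kernel in Equation (3) is the truncated completeness sum of spherical harmonics, so that the addition theorem reduces it to a Legendre series in the angle γ between the source direction and the look direction. It further assumes that θ is measured as a polar angle and that Equation (4) sums A_s g^2 over the sources. All function names are hypothetical.

```python
import math


def max_order(n_mics):
    """Largest spherical order N that satisfies (N + 1)**2 <= n_mics."""
    return math.isqrt(n_mics) - 1


def legendre(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] using the Bonnet recurrence."""
    p = [1.0, x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: n_max + 1]


def cos_angle(theta_s, phi_s, theta, phi):
    """Cosine of the angle between two directions (theta assumed polar)."""
    return (math.cos(theta_s) * math.cos(theta)
            + math.sin(theta_s) * math.sin(theta) * math.cos(phi_s - phi))


def kernel(theta_s, phi_s, theta, phi, order):
    """Truncated kernel: sum over n <= order of (2n + 1) / (4 pi) * P_n(cos gamma)."""
    x = max(-1.0, min(1.0, cos_angle(theta_s, phi_s, theta, phi)))
    p = legendre(order, x)
    return sum((2 * n + 1) * p[n] for n in range(order + 1)) / (4.0 * math.pi)


def model_map(sources, theta, phi, order=4):
    """Assumed energy model: sum of A_s * g**2 for sources [(A, theta_s, phi_s), ...]."""
    return sum(a * kernel(ts, ps, theta, phi, order) ** 2 for a, ts, ps in sources)


def background(a0):
    """No-source model: order 0 gives g = 1 / (4 pi), independent of direction."""
    return a0 * (1.0 / (4.0 * math.pi)) ** 2
```

With Q = 32, `max_order(32)` gives 4, because (4 + 1)^2 = 25 ≤ 32 while (5 + 1)^2 = 36 exceeds the channel count. At order 0 the kernel is constant over the sphere, which is why the direction Θ_0 carries no information in the background model.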

Bayesian Calculations
Given the data D as formulated in Section 2.1 and the prediction models M_S(Θ_S) in Section 2.2, this work relies on Bayes' theorem in Equation (6), with Π(Θ_S) = p(Θ_S|M_S) being the prior probability and L(Θ_S) = p(D|Θ_S, M_S) being the likelihood function. Both the prior and the likelihood are prior probabilities in nature and need to be assigned. This work applies the principle of maximum entropy (MaxEnt), which leads to a uniform prior and a Student's t-distribution for the likelihood (see Ref. [6] for details). The evidence Z in Equation (6) plays a central role in the source detection and source enumeration problems and is determined by Equation (7), where µ(L_ϵ) in Equation (8) is the prior mass with L(µ(L_ϵ)) = L_ϵ and 0 ≤ L_ϵ ≤ L_max, as derived by Skilling [7]. The NS algorithm generates a monotonically increasing partition of the likelihood range [0, L_max] (Equation (9)) via constrained sampling such that L_t with t ∈ {0, 1, ..., T} is sampled from the domain {Θ_S : L(Θ_S) > L_{t−1}}. Observe that as L_ϵ increases from 0 to L_max, µ_ϵ decreases from 1 to 0, where µ_ϵ = µ(L_ϵ), and the partition of Equation (9) generates the monotonically decreasing sequence in Equation (10). Using the sequences in Equations (9) and (10), the one-dimensional integration on the far-right-hand side of Equation (7) is well approximated by Equation (11).

Skilling [7] pointed out that the constrained prior mass is a statistical quantity and follows a shrinkage of µ_t ≈ e^{−t/P} (Equation (13)) after t iterations, with P being the integer number of initial random samples. A detailed proof of this result was given using order statistics in Appendix B of Jasa and Xiang [3]. NS was shown by Jasa and Xiang [3] to be a numerical implementation of Lebesgue integration, where Equation (11) represents the sum of weighted integrands of simple functions that are generated by partitioning the range rather than the domain of the function. An early account of this connection of the NS algorithm to Lebesgue integration can also be found in Jasa and Xiang [2], published in the Proceedings of the 25th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2005).
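The evidence accumulation described in this section can be illustrated on a toy problem. The sketch below is not the authors' implementation: it assumes a two-dimensional Gaussian-shaped likelihood with a uniform prior on the unit square, uses the deterministic shrinkage µ_t = e^{−t/P}, sums L_t (µ_{t−1} − µ_t) in the usual rectangle-rule form, and draws constrained samples by plain rejection from the prior, which is only practical for such a small example. All names and settings are hypothetical.

```python
import math
import random


def log_likelihood(x, y, sigma=0.1):
    """Toy likelihood: isotropic Gaussian bump centred in the unit square (peak 1)."""
    return -((x - 0.5) ** 2 + (y - 0.5) ** 2) / (2.0 * sigma ** 2)


def nested_sampling(n_live=200, n_iter=1400, seed=0):
    """Return an estimate of ln Z for the toy problem."""
    rng = random.Random(seed)
    # Initial population of n_live samples drawn from the uniform prior.
    live = []
    for _ in range(n_live):
        x, y = rng.random(), rng.random()
        live.append((log_likelihood(x, y), x, y))

    z = 0.0
    mu_prev = 1.0
    for t in range(1, n_iter + 1):
        # The live point with the lowest likelihood defines the constraint L_t.
        worst = min(range(n_live), key=lambda i: live[i][0])
        log_l_t = live[worst][0]
        mu_t = math.exp(-t / n_live)          # shrinkage of the prior mass
        z += math.exp(log_l_t) * (mu_prev - mu_t)
        mu_prev = mu_t
        # Constrained sampling: redraw from the prior until L exceeds L_t.
        while True:
            x, y = rng.random(), rng.random()
            log_l_new = log_likelihood(x, y)
            if log_l_new > log_l_t:
                live[worst] = (log_l_new, x, y)
                break

    # Contribution of the remaining live points inside the final prior mass.
    z += mu_prev * sum(math.exp(point[0]) for point in live) / n_live
    return math.log(z)
```

For this toy likelihood the evidence is known in closed form, Z ≈ 2πσ^2, so the estimate can be compared against ln(2π · 0.01). The replaced live points collected along the way are the same samples that, once weighted, serve as posterior samples.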

Sound Source Detection, Enumeration, and Localization
The calculation described above is implemented with an initial population of P = 500 [in Equation (13)], drawn from uniformly distributed prior ranges of all pending parameters, including the sound source strengths A_s, the elevation angles θ_s, and the azimuth angles ϕ_s. For up to four potentially simultaneous sound sources (S ≤ 4), the parameter space has up to 3 × S dimensions. The NS is applied to estimate Bayes factors for model orders from 1 to 4 via the evidence. The Bayes factors are used to rank the candidate models accounting for an unknown number of simultaneous sound sources, in a so-called sound source enumeration process. This process has been described previously in Xiang and Landschoot [6]; it is followed by a DoA analysis based on the selected model M_S, carried out by Bayesian parameter estimation. When examining this effort critically, the authors recognize that the source enumeration, even though it represents a higher level of inference via Bayesian model selection, would still be incomplete if the machine sensory modality, such as the spherical microphone array in this application, does not take into account that the absence of sound sources often makes up the predominant portion of the sound environment in practical scenarios. It only makes sense to pursue sound source enumeration and DoA analysis if a sound source has been detected.
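The shrinkage rule of Equation (13) for a population of P = 500 can be checked with a short simulation. The sketch relies on the order-statistics fact that the largest of P independent uniform variates is distributed as U^(1/P), so that each iteration multiplies the prior mass by such a factor and the expected log prior mass after t iterations is −t/P. The function names and the number of repetitions are hypothetical choices.

```python
import math
import random


def simulate_log_prior_mass(n_live, n_iter, rng):
    """Log prior mass after n_iter shrinkage steps with n_live live points."""
    log_mu = 0.0
    for _ in range(n_iter):
        # Shrinkage factor = largest of n_live uniforms, distributed as U**(1/n_live).
        u = 1.0 - rng.random()  # lies in (0, 1], avoids log(0)
        log_mu += math.log(u) / n_live
    return log_mu


def mean_log_prior_mass(n_live=500, n_iter=1000, n_runs=200, seed=0):
    """Average the simulated log prior mass over n_runs independent runs."""
    rng = random.Random(seed)
    runs = [simulate_log_prior_mass(n_live, n_iter, rng) for _ in range(n_runs)]
    return sum(runs) / n_runs
```

With P = 500 and t = 1000, the average should lie close to −t/P = −2, that is, a prior mass of about e^{−2}. The scatter of a single run grows as √t / P, which is one reason why a larger population yields a steadier evidence estimate.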
The sound source detection is carried out in the scope of the current work using Bayesian model comparison. The prediction models M_D = {M_0, M_1} involve only two models, M_0 in Equation (5) and M_1 in Equation (4) for S = 1. Note that Equation (5) is described separately because g^2(Θ_0, Θ) needs special attention: its spherical order is set to N = 0, while for M_S with S ≥ 1, the spherical order is N = 4 owing to the 32-channel spherical microphone array used in this work.
For the source detection, the Bayesian evidence is estimated using the NS for the 'no-source' model M_0 and for the 'one-source' model M_1. Specifically, for M_0-based sampling, there is still one pending parameter A_0 to sample. Figure 4 (left) shows an experimental investigation in which the microphone array data contain no sources but noisy background signals. The evidence estimated for M_0 using the NS differs insignificantly from that of M_1, indicating that the source detection is negative. Figure 4 (right) shows that if the microphone array data contain sound sources, albeit an unknown number of them, the evidence estimates differ significantly from those of the 'no-source' case. The source detection is positive.
Upon a positive detection of sound sources, a further process involves Bayesian model selection. A set of sound source models from M_0 to M_4 is involved in estimating the Bayes factors for i = 1, 2, ..., 4. Figure 5 illustrates one set of Bayes factor estimations. In this work, the evidence and Bayes factors are calculated in units of [decibans], denoting 10 times the logarithm to base 10 [10 lg], in honor of Thomas Bayes [8]. In this case, the source enumeration using Bayesian model selection suggests that two sources are contained in the data. At this stage, the interest in specific DoA parameters is pushed into the background. Upon the selection of a two-source model using the NS, the exploration samples generated during the iterative NS process also provide posterior samples for the model M_2 as a byproduct; they are readily available once the evidence Z_2 for two sound sources is sufficiently explored. The posterior samples provide parameter estimates of the two sound sources in terms of the source strengths A_2 and the angular parameters Θ_2. The data processed using Equation (1) and the model prediction according to the posterior estimates are compared in Figure 3.
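The deciban bookkeeping behind detection and enumeration can be sketched as follows. The sketch assumes that the Bayes factors compare successive models, 10 lg(Z_i / Z_{i−1}), and that a model order is accepted only while its Bayes factor over the next simpler model exceeds a threshold. The 10-deciban threshold, the stopping rule, and the function names are illustrative assumptions, not the authors' stated procedure.

```python
import math


def to_decibans(log_evidence):
    """Convert a natural-log evidence ln Z to decibans, 10 * log10(Z)."""
    return 10.0 * log_evidence / math.log(10.0)


def bayes_factors_db(log_evidences):
    """Bayes factors of successive models, 10 * log10(Z_i / Z_{i-1}), i = 1, 2, ..."""
    db = [to_decibans(value) for value in log_evidences]
    return [db[i] - db[i - 1] for i in range(1, len(db))]


def select_order(log_evidences, threshold_db=10.0):
    """Scan upward from M_0 and stop at the first Bayes factor below the threshold.

    A returned order of 0 corresponds to a negative source detection.
    """
    order = 0
    for i, factor in enumerate(bayes_factors_db(log_evidences), start=1):
        if factor < threshold_db:
            break
        order = i
    return order
```

As a usage example with made-up log evidences for M_0 to M_4, `select_order([-120.0, -80.0, -40.0, -39.5, -39.8])` returns 2, since the gain from two to three sources stays below the threshold. A pair such as `[-50.0, -49.9]` for M_0 and M_1 returns 0, which corresponds to a negative detection.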

Concluding Remarks
From its introduction into Bayesian calculations, Skilling's nested sampling [1] had an immediate impact on room-acoustic research [2], where an early account of the Lebesgue integral view of nested sampling was first presented to the MaxEnt community in 2005. A thorough treatment of its mathematical foundation in the Lebesgue integral was given at a later point [3]: 'Interpreting nested sampling as a statistical approximation of a Lebesgue integral opens the possibility of a large body of existing research to be applied in the analysis and possible extension of the algorithm'. Over the past 15 years, a stream of applications using NS in acoustics science and engineering has emerged. Among others, this paper reports on an acoustic application of nested sampling using a spherical microphone array within the Bayesian framework.

Figure 1. Spherical microphone array of radius a = 3.5 cm. Altogether, 32 microphones are nearly uniformly flush-mounted over the rigid spherical surface.

Figure 2. Beamforming superposition of two sound sources using a spherical order N = 4.

Figure 3. Comparison between the experimental data (a), processed according to Equation (1), and the prediction model (b) in Equation (4) of two sound sources using a spherical order N = 4.

Figure 4. Sound source detection based on Bayesian model comparison. Bayesian evidence is estimated using both the 'no-source' model M_0 and the 'one-source' model M_1. The evidence is expressed in units of [decibans] in honor of Thomas Bayes [8].

Figure 5. The sound source enumeration based on Bayes factor estimation. The Bayes factors are expressed in units of [decibans] in honor of Thomas Bayes [8]. A two-source model is preferred by the Bayesian model selection. The evidence estimated using nested sampling also provides the posterior as a byproduct.