Proceeding Paper

A Weakly Informative Prior for Resonance Frequencies †

Marnix Van Soom * and Bart de Boer
AI Lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
* Author to whom correspondence should be addressed.
† Presented at the 40th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, online, 4–9 July 2021.
Phys. Sci. Forum 2021, 3(1), 2; https://doi.org/10.3390/psf2021003002
Published: 4 November 2021

Abstract

We derive a weakly informative prior for a set of ordered resonance frequencies from Jaynes’ principle of maximum entropy. The prior facilitates model selection problems in which both the number and the values of the resonance frequencies are unknown. It encodes a weakly inductive bias, provides a reasonable density everywhere, is easily parametrizable, and is easy to sample. We hope that this prior can enable the use of robust evidence-based methods for a new class of problems, even in the presence of multiplets of arbitrary order.

1. Introduction

An important problem in the natural sciences is the accurate measurement of resonance frequencies. The problem can be formalized by the following probabilistic model:
$$p(D, \mathbf{x} \mid I) = p(D \mid \mathbf{x})\, p(\mathbf{x} \mid I) \equiv L(\mathbf{x})\, \pi(\mathbf{x}), \tag{1}$$
where D is the data, $\mathbf{x} = \{x_k\}_{k=1}^{K}$ are the K resonance frequencies of interest, and I is the prior information about $\mathbf{x}$. As an example instance of (1), we refer to the vocal tract resonance (VTR) problem discussed in Section 5, for which D is audio recorded from the mouth of a speaker, $\mathbf{x}$ are a set of K VTR frequencies, and the underlying model is a sinusoidal regression model. Any realistic problem will include additional model parameters $\theta$, but these have been silently ignored by formally integrating them out of (1), i.e., $p(D, \mathbf{x} \mid I) = \int d\theta\, p(D, \mathbf{x}, \theta \mid I)$.
In this paper, we assume that the likelihood $L(\mathbf{x}) \equiv p(D \mid \mathbf{x})$ is given, and our task is to choose an uninformative prior $\pi(\mathbf{x}) \equiv p(\mathbf{x} \mid I)$ from limited prior information I. A conflict arises, however:
The uninformative priors $\pi$ most commonly chosen to express limited prior information $I$ are, in practice, often precluded by that same $I$. (2)
The goal of this paper is to describe this conflict (2) and to show how it can be resolved by adopting a specific choice for π . This allows robust inference of the number of resonances K in the important case of such limited prior information I, which in turn enables accurate measurement of the resonance frequencies x with standard methods such as nested sampling [1] or reversible jump MCMC [2].

2. Notation

The symbol π is intended to convey a vague notion of a generally uninformative or weakly informative prior. Definite choices for π are indicated with the subscript i:
$$\pi_i(\mathbf{x}) \equiv p(\mathbf{x} \mid \beta_i, I_i) \qquad (i = 1, 2, 3), \tag{3}$$
where $\beta_i$ is a placeholder for the hyperparameter specific to $\pi_i$. Note that in the plots below and for the experiments in Section 5, the values of the $\beta_i$ are always set according to Table 1.
Each $\pi_i$ uniquely determines a number of important high-level quantities, since the likelihood $L(\mathbf{x})$ and data D are assumed to be given. These quantities are the evidence for the model with K resonances:
$$Z_i(K) = \int d^K x\, L(\mathbf{x})\, \pi_i(\mathbf{x}), \tag{4}$$
the posterior:
$$P_i(\mathbf{x}) = \frac{L(\mathbf{x})\, \pi_i(\mathbf{x})}{Z_i(K)}, \tag{5}$$
and the information:
$$H_i(K) = \int d^K x\, P_i(\mathbf{x}) \log \frac{P_i(\mathbf{x})}{\pi_i(\mathbf{x})}, \tag{6}$$
which measures the amount of information obtained by updating from the prior $\pi_i$ to the posterior $P_i$, i.e., $H_i(K) \equiv D_{\mathrm{KL}}(P_i \,\|\, \pi_i)$, where $D_{\mathrm{KL}}$ is the Kullback–Leibler divergence.

3. Conflict

First, the uninformative priors $\pi$ referenced in (2) are of the independent and identically distributed type:
$$\pi(\mathbf{x}) = \prod_{k=1}^{K} g(x_k \mid \beta), \tag{7}$$
where $g(x \mid \beta)$ is any wide distribution with hyperparameters $\beta$. A typical choice for g is the uniform distribution over the full frequency bandwidth; other examples include diffuse Gaussians or Jeffreys priors [3,4,5,6,7,8,9].
Second, the limited prior information I in (2) about K implies that the problem will involve model selection, since each value of K implicitly corresponds to a different model for the data. It is, thus, necessary to evaluate and compare the evidence $Z(K) = \int d^K x\, L(\mathbf{x})\, \pi(\mathbf{x})$ for each plausible value of K.
The conflict between these two elements is due to the label switching problem, which is a well-known issue in mixture modeling, e.g., [10]. The likelihood functions $L(\mathbf{x})$ used in models parametrized by resonance frequencies are typically invariant to switching the label k; i.e., the index k of the frequency $x_k$ has no distinguishable meaning in the model underlying the data. The posterior $P(\mathbf{x}) \propto L(\mathbf{x})\, \pi(\mathbf{x})$ will inherit this exchange symmetry if the prior is of type (7). Thus, if the model parameters $\mathbf{x}$ are well determined by the data D, the posterior landscape will consist of one primary mode, which is defined as a mode living in the ordered region:
$$R_K(x_0) = \{\, \mathbf{x} \mid x_0 \le x_1 \le x_2 \le \cdots \le x_K \,\} \quad \text{with} \quad x_0 > 0, \tag{8}$$
and $(K! - 1)$ induced modes, which are identical to the primary mode up to a permutation of the labels k and, thus, live outside of the region $R_K(x_0)$. The trouble is that correctly taking into account these induced modes during the evaluation of $Z(K)$ requires a surprising amount of extra work on top of tuning the MCMC method of choice; that is the label switching problem in our setting. In fact, there is currently no widely accepted solution for the label switching problem in the context of mixture models either [11,12]. This is, then, how in (2) uninformative priors $\pi$ are “precluded” by the limited information I: the latter implies model selection, which in turn implies evaluating $Z(K)$, which is hampered by the label switching problem due to the exchange symmetry of the former. It therefore seems better to avoid the label switching problem altogether by encoding our preference for primary modes directly into the prior. This means abandoning the uninformative prior $\pi$ in favor of the weakly informative prior $\pi_3$, which is proposed in Section 4 as a solution to the conflict.
We use the VTR problem to briefly illustrate the label switching problem in Figure 1. The likelihood $L(\mathbf{x})$ is described implicitly in Section 5 and is invariant to switching the labels k because the underlying model function (23) of the regression model is essentially a sum of sinusoids, one for each $x_k$. As frequencies can be profitably thought of as scale variables ([13], Appendix A), the uninformative prior (7) is represented by
$$\pi_1(\mathbf{x}) \equiv p(\mathbf{x} \mid x_0, x_{\max}, I_1) = \prod_{k=1}^{K} h(x_k \mid x_0, x_{\max}), \tag{9}$$
where $\beta_1 \equiv (x_0, x_{\max})$ are a common lower and upper bound, and
$$h(x \mid a, b) = \begin{cases} \dfrac{1}{\log(b/a)}\,\dfrac{1}{x} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases} \qquad \text{with } a > 0,\; b < \infty \tag{10}$$
is the Jeffreys prior, the conventional uninformative prior for a scale variable (although any prior of the form (7) that is sufficiently uninformative would yield essentially the same results). We have visualized the posterior landscape $P_1(\mathbf{x})$ in Figure 1 by using the pairwise marginal posteriors $P_1(x_k, x_\ell)$ plotted in blue. Note the exchange symmetry of $P_1$, which manifests as an (imperfect) reflection symmetry around the dotted diagonal $x_k = x_\ell$ bordering the ordered region $R_3(x_0)$. The primary mode can be identified by the black dot; all other modes are induced modes. Integrating over all $K!$ modes to obtain $Z(K)$ quickly becomes intractable for $K \ge 4$.
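For concreteness, the bounded Jeffreys prior (10) is easy to evaluate and to sample by inverse-transform sampling, since its CDF is $\log(x/a)/\log(b/a)$. Below is a minimal NumPy sketch of both operations; the function names are ours and not taken from the paper's code.

```python
import numpy as np

def log_jeffreys(x, a, b):
    """Log density of the bounded Jeffreys prior h(x | a, b) of Eq. (10)."""
    x = np.asarray(x, dtype=float)
    inside = (a <= x) & (x <= b)
    return np.where(inside, -np.log(np.log(b / a)) - np.log(x), -np.inf)

def sample_jeffreys(a, b, size=1, rng=None):
    """Inverse-CDF sampling: F(x) = log(x/a)/log(b/a), hence x = a * (b/a)**u with u ~ U(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    return a * (b / a) ** rng.uniform(size=size)
```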

A Simple Way Out?

A simple way out of the conflict is to break the exchange symmetry by assuming specialized bounds for each $x_k$:
$$\pi_2(\mathbf{x}) \equiv p(\mathbf{x} \mid \mathbf{a}, \mathbf{b}, I_2) = \prod_{k=1}^{K} h(x_k \mid a_k, b_k), \tag{11}$$
where $\beta_2 \equiv (\mathbf{a}, \mathbf{b})$, with $\mathbf{a} = \{a_k\}_{k=1}^{K}$ and $\mathbf{b} = \{b_k\}_{k=1}^{K}$ being hyperparameters specifying the individual bounds. However, in order to enable the model to detect doublets (a resolved pair of two close frequencies, such as the primary mode in the leftmost panel in Figure 1), it is necessary to assign overlapping bounds in $(\mathbf{a}, \mathbf{b})$, presumably by using some heuristic. The required overlap grows further if higher-order multiplets such as triplets (which can and do occur) are to be detected, but the more overlap in $(\mathbf{a}, \mathbf{b})$, the more the label switching problem returns. Despite this issue, there will be cases where we have sufficient prior information I to set the $(\mathbf{a}, \mathbf{b})$ hyperparameters without too much trouble; the VTR problem is such a case, for which the overlapping values of $(\mathbf{a}, \mathbf{b})$ up to $K = 5$ are given in Table 1.

4. Solution

Our solution to the conflict (2) is a chain of K coupled Pareto distributions:
$$\pi_3(\mathbf{x}) \equiv p(\mathbf{x} \mid \overline{\mathbf{x}_0}, I_3) = \prod_{k=1}^{K} \mathrm{Pareto}(x_k \mid x_{k-1}, \lambda_k) \tag{12}$$
where
$$\mathrm{Pareto}(x \mid x_*, \lambda) = \begin{cases} \dfrac{\lambda\, x_*^{\lambda}}{x^{\lambda + 1}} & \text{if } x \ge x_* \\ 0 & \text{otherwise} \end{cases} \qquad \text{with } x_* > 0,\; \lambda > 0, \tag{13}$$
and the hyperparameter $\beta_3 \equiv \overline{\mathbf{x}_0}$ is defined as
$$\overline{\mathbf{x}_0} \equiv (\overline{x_0}, \overline{\mathbf{x}}), \qquad \overline{x_0} := x_0, \qquad \overline{\mathbf{x}} = \{\overline{x_k}\}_{k=1}^{K}, \qquad \lambda_k = \frac{\overline{x_k}}{\overline{x_k} - \overline{x_{k-1}}}. \tag{14}$$
From Figure 2, it can be seen that $\pi_3$ encodes weakly informative knowledge about K ordered frequencies: (12) and (13) together imply that $\pi_3(\mathbf{x})$ is defined only for $\overline{\mathbf{x}} \in R_K(x_0)$, while nonzero only for $\mathbf{x} \in R_K(x_0)$. In other words, its support is precisely the ordered region $R_K(x_0)$, which automatically solves the label switching problem underlying the conflict, as the exchange symmetry of $\pi$ is broken. This is illustrated in Figure 2, where $P_3$ contracts to a single primary mode, which is just what we would like.
The $K + 1$ hyperparameters $\overline{\mathbf{x}_0}$ in (14) are a common lower bound $x_0$ plus K expected values of the resonance frequencies $\overline{\mathbf{x}}$. While the former is generally easily determined, the latter may seem difficult to set given the premise of this paper that we have only limited prior information I at our disposal. Why do we claim that $\pi_3$ is only weakly informative if it is parametrized by the expected values of the very things it is supposed to be only weakly informative about? The answer is that, for any reasonable amount of data, inference based on $\pi_3$ is completely insensitive to the exact values of $\overline{\mathbf{x}}$. Therefore, any reasonable guess for $\overline{\mathbf{x}_0}$ will suffice in practice. For example, for the VTR problem, we simply applied a heuristic where we take $\overline{x_k} = k \times 500$ Hz (see Table 1). This insensitivity is due to the maximum entropy status of $\pi_3$ and indicates the weak inductive bias it entails. On a more prosaic level, the heavy tails of the Pareto distributions in (12) ensure that the prior will eventually be overwhelmed by the data, no matter how a priori improbable the true value of $\mathbf{x}$ is. More prosaically still, in Section 5.1 below we show quantitatively that, for the VTR problem, $\pi_3$ is about as (un)informative as $\pi_2$.
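To make the construction concrete, the following is a minimal NumPy sketch (ours, not the authors' implementation) that evaluates $\log \pi_3(\mathbf{x})$ from the hyperparameter $\overline{\mathbf{x}_0} = (x_0, \overline{\mathbf{x}})$ using (12)-(14):

```python
import numpy as np

def log_pareto_chain(x, x0, xbar):
    """Log density of the Pareto chain prior pi_3(x | x0, xbar) of Eqs. (12)-(14).
    x: K candidate frequencies; x0: common lower bound; xbar: K prior expected values,
    assumed strictly increasing and larger than x0 so that every lambda_k > 0."""
    x, xbar = np.asarray(x, dtype=float), np.asarray(xbar, dtype=float)
    lower = np.concatenate(([x0], x[:-1]))    # x_{k-1}: the scale of the k-th Pareto factor
    if np.any(x < lower):                     # support is the ordered region R_K(x0)
        return -np.inf
    lam = xbar / (xbar - np.concatenate(([x0], xbar[:-1])))   # lambda_k, Eq. (14)
    # sum over k of log Pareto(x_k | x_{k-1}, lambda_k), with Pareto as in Eq. (13)
    return float(np.sum(np.log(lam) + lam * np.log(lower) - (lam + 1.0) * np.log(x)))
```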

4.1. Derivation of $\pi_3$

Our ansatz consists of interpreting $\mathbf{x}$ as a set of K ordered scale variables that are bounded from below by $x_0$. Starting from (9) and not bothering with the bounds $(a, b)$, we obtain the improper pdf
$$m(\mathbf{x}) \propto \begin{cases} \prod_{k=1}^{K} \dfrac{1}{x_k} & \mathbf{x} \in R_K(x_0) \\ 0 & \text{otherwise}. \end{cases} \tag{15}$$
We can simplify (15) using the one-to-one transformation $\mathbf{x} \leftrightarrow \mathbf{u}$ defined as
$$\begin{aligned} \mathbf{x} \to \mathbf{u}: \quad & u_k = \log \frac{x_k}{x_{k-1}} && (k = 1, 2, \ldots, K) \\ \mathbf{u} \to \mathbf{x}: \quad & x_k = x_0 \exp\Big( \sum_{\kappa=1}^{k} u_\kappa \Big) && (k = 1, 2, \ldots, K), \end{aligned} \tag{16}$$
which yields (with abuse of notation for brevity)
$$m(\mathbf{u}) \propto \begin{cases} 1 & \mathbf{u} \ge 0 \\ 0 & \text{otherwise}. \end{cases} \tag{17}$$
Since model selection requires proper priors, we need to normalize $m(\mathbf{u})$ by adding extra information (i.e., constraints) to it; we propose to simply fix the K first moments $\langle \mathbf{u} \rangle = \{\langle u_k \rangle\}_{k=1}^{K}$. This will yield the Pareto chain prior $\pi_3(\mathbf{u})$ directly, expressed in u space rather than x space. The expression for $\pi_3(\mathbf{u})$ is found by minimizing the Kullback–Leibler divergence [14]
$$D_{\mathrm{KL}}(\pi_3 \,\|\, m) = \int d^K u\, \pi_3(\mathbf{u}) \log \frac{\pi_3(\mathbf{u})}{m(\mathbf{u})}, \quad \text{subject to} \quad \langle \mathbf{u} \rangle \equiv \int d^K u\; \mathbf{u}\, \pi_3(\mathbf{u}) = \overline{\mathbf{u}}, \tag{18}$$
where $\overline{\mathbf{u}} = \{\overline{u_k}\}_{k=1}^{K}$ are the supplied first moments. This variational problem is equivalent to finding $\pi_3(\mathbf{u})$ by means of Jaynes' principle of maximum entropy with $m(\mathbf{u})$ serving as the invariant measure [15]. Since the exponential distribution $\mathrm{Exp}(x \mid \lambda)$ is the maximum entropy distribution for a random variable $x \ge 0$ with a fixed first moment $\langle x \rangle = 1/\lambda$, the solution to (18) is
$$\pi_3(\mathbf{u}) = \prod_{k=1}^{K} \mathrm{Exp}(u_k \mid \lambda_k), \tag{19}$$
where the rate hyperparameters $\lambda_k = 1/\overline{u_k}$ and
$$\mathrm{Exp}(x \mid \lambda) = \begin{cases} \lambda \exp\{-\lambda x\} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases} \qquad \text{with } \lambda > 0. \tag{20}$$
Transforming (19) to x space using (16) finally yields (12), but we still need to express the $\lambda_k$ in terms of $\overline{\mathbf{x}}$: we might find it hard to pick reasonable values of $\overline{u_k} = \overline{\log(x_k / x_{k-1})}$ from limited prior information I. For this, we will need the identity
$$\langle x_k \rangle \equiv \int d^K x\; x_k\, \pi_3(\mathbf{x}) = \frac{\lambda_k}{\lambda_k - 1}\, \langle x_{k-1} \rangle \qquad (k = 1, 2, \ldots, K). \tag{21}$$
Constraining $\langle x_k \rangle = \overline{x_k}$ (with $\langle x_0 \rangle := \overline{x_0} = x_0$) and solving for $\lambda_k$, we obtain $\lambda_k = \overline{x_k} / (\overline{x_k} - \overline{x_{k-1}})$, in agreement with (14). Note that the existence of the first marginal moments $\langle x_k \rangle$ requires that $\lambda_k > 1$.

4.2. Sampling from $\pi_3$

Sampling from $\pi_3$ is trivial because of the independence of the $u_k$ in u space (19). To produce a sample $\mathbf{x} \sim \pi_3(\mathbf{x})$ given the hyperparameter $\overline{\mathbf{x}_0}$, compute the corresponding rate parameters $\{\lambda_k\}_{k=1}^{K}$ from (14), and use them in (19) to obtain a sample $\mathbf{u} \sim \pi_3(\mathbf{u})$. The desired $\mathbf{x}$ is then obtained from $\mathbf{u}$ using the transformation (16).
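A minimal NumPy sketch of this recipe (our own illustration; the code actually used for the experiments is in the repository linked in Section 5):

```python
import numpy as np

def sample_pareto_chain(x0, xbar, size=1, rng=None):
    """Draw samples x ~ pi_3(x | x0, xbar): u_k ~ Exp(lambda_k) independently as in (19),
    then map u -> x with the cumulative-log transformation (16)."""
    rng = np.random.default_rng() if rng is None else rng
    xbar = np.asarray(xbar, dtype=float)
    lam = xbar / (xbar - np.concatenate(([x0], xbar[:-1])))   # rate parameters, Eq. (14)
    u = rng.exponential(scale=1.0 / lam, size=(size, len(xbar)))
    return x0 * np.exp(np.cumsum(u, axis=1))   # each row is ordered: x_1 <= x_2 <= ... <= x_K

# Example: the VTR heuristic of Table 1, x0 = 200 Hz and xbar_k = k * 500 Hz, K = 5
samples = sample_pareto_chain(200.0, [500.0 * k for k in range(1, 6)], size=1000)
```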

5. Application: The VTR Problem

We now present a relatively simple but realistic instance of the problem of measuring resonance frequencies, which will allow us to illustrate the above ideas. The VTR problem consists of measuring human vocal tract resonance (VTR) frequencies $\mathbf{x}$ for each of five representative vowel sounds taken from the CMU ARCTIC database [16]. The VTR frequencies $\mathbf{x}$ describe the vocal tract transfer function $T(x)$ and are fundamental quantities in acoustic phonetics [17]. The five vowel sounds are recorded utterances of the first vowel in the words $W = \{\textit{shore}, \textit{that}, \textit{you}, \textit{little}, \textit{until}\}$. In order to achieve high-quality VTR frequency estimates $\hat{\mathbf{x}}$, only the quasi-periodic steady-state part of the vowel sound is considered for the measurement. The data D, thus, consists of a string of highly correlated pitch periods. See Figure 3 for an illustration of these concepts.
The measurement itself is formalized as inference using the probabilistic model (1). The model assumed to underlie the data is the sinusoidal regression model introduced in [18]; due to limited space, we only describe it implicitly. The sinusoidal regression model assumes that each pitch period $d \in D$ can be modeled as
$$d_t = f(t; \mathbf{A}, \boldsymbol{\alpha}, \mathbf{x}) + \sigma e_t \quad \text{where} \quad e_t \sim N(0, 1) \qquad (t = 1, 2, \ldots, T), \tag{22}$$
where $d = \{d_t\}_{t=1}^{T}$ is a time series consisting of T samples. The model function
$$f(t; \mathbf{A}, \boldsymbol{\alpha}, \mathbf{x}) = \sum_{k=1}^{K} \big[ A_k \cos(x_k t) + A_{K+k} \sin(x_k t) \big] \exp\{\alpha_k t\} + \sum_{\ell=1}^{L} A_{2K+\ell}\, t^{\ell - 1} \tag{23}$$
consists of a sinusoidal part (first $\sum$) and a polynomial trend correction (second $\sum$). Note the additional model parameters $\theta = \{\mathbf{A}, \boldsymbol{\alpha}, \sigma, L\}$. Formally, given the prior $p(\theta)$ ([18], Section 2.2), the marginal likelihood $L(\mathbf{x})$ is then obtained as $L(\mathbf{x}) = \int d\theta\, L(\mathbf{x}, \theta)\, p(\theta)$, where the complete likelihood $L(\mathbf{x}, \theta)$ is implicitly given by (22) and (23). Practically, we just marginalize out $\theta$ from samples obtained from the complete problem $p(D, \mathbf{x}, \theta \mid I)$.
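For reference, a direct transcription of the model function (23) into NumPy (zero-based indexing for $\mathbf{A}$; this is our sketch, not the code used for the experiments):

```python
import numpy as np

def model_function(t, A, alpha, x, L):
    """Model function f(t; A, alpha, x) of Eq. (23): K exponentially weighted sinusoids
    plus a polynomial trend of order L - 1. Expects len(A) == 2*K + L and len(alpha) == len(x) == K."""
    t = np.asarray(t, dtype=float)
    K = len(x)
    f = np.zeros_like(t)
    for k in range(K):
        # sinusoidal part; the sign convention for alpha_k follows (23) as written
        f += (A[k] * np.cos(x[k] * t) + A[K + k] * np.sin(x[k] * t)) * np.exp(alpha[k] * t)
    for l in range(1, L + 1):
        f += A[2 * K + l - 1] * t ** (l - 1)   # polynomial trend term A_{2K + l} t^{l - 1}
    return f
```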
For inference, the computational method of choice is nested sampling [1] using the dynesty library [19,20,21,22,23], which scales roughly as $O(K^2)$ [24]. Since the VTR problem is quite simple ($H_i(K) \lesssim 30$ nats), we only perform single nested sampling runs and take the obtained $\log Z_i(K)$ and $H_i(K)$ as point estimates. Full details on the experiments and data are available at https://github.com/mvsoom/frequency-prior.

5.1. Experiment I: Comparing $\pi_2$ and $\pi_3$

In Experiment I, we perform a high-level comparison between $\pi_2$ and $\pi_3$ in terms of evidence (4) and information (6). The values of the hyperparameters used in the experiment are listed in Table 1. We did not include $\pi_1$ in this comparison, as the label switching problem prevented convergence of nested sampling runs for $K \ge 4$. The $(\mathbf{a}, \mathbf{b})$ bounds for $\pi_2$ were based on loosely interpreting the VTRs as formants and consulting formant tables from standard works [25,26,27,28,29,30]. These allowed us to compile bounds up until the fifth formant, such that $K_{\max} = 5$. For $\pi_3$, we simply applied a heuristic where we take $\overline{x_k} = k \times 500$ Hz. We selected $x_0$ empirically (although a theoretical approach is also possible [31]), and $x_{\max}$ was set to the Nyquist frequency. The role of $x_{\max}$ is to truncate $\pi_3$ in order to avoid aliasing effects, since the support of $\pi_3(x_i)$ is unbounded from above. We implemented this by using the following likelihood function in the nested sampling program:
$$L'(\mathbf{x}) = \begin{cases} L(\mathbf{x}) & \text{if } x_k \le x_{\max} \text{ for all } k = 1, 2, \ldots, K \\ 0 & \text{otherwise}. \end{cases} \tag{24}$$
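In code, (24) amounts to wrapping the log-likelihood passed to the nested sampler; a minimal sketch (our own, written with dynesty-style log-likelihood callables in mind):

```python
import numpy as np

def truncate_loglike(loglike, x_max):
    """Return a log-likelihood implementing Eq. (24): zero likelihood (i.e., -inf in log space)
    whenever any frequency exceeds x_max, and loglike(x) otherwise."""
    def wrapped(x):
        return loglike(x) if np.all(np.asarray(x) <= x_max) else -np.inf
    return wrapped
```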
First, we compare the influence of $\pi_2$ and $\pi_3$ on model selection. Given $D \in W$, the posterior probability of the number of resonances K is given by the following:
$$p_i(K) = \frac{Z_i(K)}{\sum_{K'=1}^{K_{\max}} Z_i(K')} \qquad (K = 1, 2, \ldots, K_{\max}). \tag{25}$$
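Given the nested sampling estimates of $\log Z_i(K)$, (25) is best evaluated in log space; a small sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.special import logsumexp

def posterior_over_K(log_Z):
    """Normalize log-evidence estimates for K = 1, ..., K_max into p(K) as in Eq. (25)."""
    log_Z = np.asarray(log_Z, dtype=float)
    return np.exp(log_Z - logsumexp(log_Z))
```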
The results in the top row of Figure 4a are striking: while $p_2(K)$ shows individual preferences depending on D, $p_3(K)$ unequivocally prefers $K = K_{\max}$.
Second, in Figure 4b, we compare $\pi_2$ and $\pi_3$ directly in terms of differences in evidence [$\log Z_i(K)$] and uninformativeness [$H_i(K)$] for each combination $(D, K)$.
Arrows pointing eastward indicate $Z_3(K) > Z_2(K)$. The $\pi_3$ prior dominates the $\pi_2$ prior in terms of evidence for almost all values of K, indicating that $\pi_3$ places its mass in regions of higher likelihood or, equivalently, that the data were much more probable under $\pi_3$ than under $\pi_2$. This implies that the hint from $\pi_3$ of more structure at $K > K_{\max}$ should be taken seriously; we investigate this in Section 5.2.
Arrows pointing northward indicate $H_3(K) > H_2(K)$, i.e., $\pi_3$ is less informative than $\pi_2$, since more information is gained by updating from $\pi_3$ to $P_3$ than from $\pi_2$ to $P_2$. We observe that $\pi_2$ and $\pi_3$ are roughly comparable in terms of (un)informativeness.

5.2. Experiment II: ‘Free’ Analysis

We now freely look for more structure in the data by letting K vary up until $K_{\max} = 10$. This goes beyond the capacities of $\pi_1$ (because of the label switching problem) and $\pi_2$ (because no data are available to set the $(\mathbf{a}, \mathbf{b})$ bounds). The great advantage of $\pi_3$ is thus that we can use a simple heuristic to set $\overline{\mathbf{x}_0}$ and let the model do the discovering, without worrying about convergence issues or the validity of the obtained evidence values. The bottom row in Figure 4a shows that model selection for the VTR problem is well defined, with the most probable values of $K < 10$, except for $D = \textit{until}$. That case is investigated in Figure 3, where the need for more VTRs (higher K) is apparent from the unmodeled broad peak centered at around 3000 Hz in the FFT power spectrum (right panel). Incidentally, this spectrum also shows that spectral peaks are often resolved into more than one VTR, which underlines the importance of using a prior that enables trouble-free handling of multiplets of arbitrary order. A final observation from the spectrum is that the inferred $\hat{x}_k$ differ substantially from the supplied values in $\overline{\mathbf{x}}$ (Table 1), which hints at the weak inductive bias underlying $\pi_3$.

6. Discussion

It is only when the information in the prior is comparable to the information in the data that the prior probability can make any real difference in parameter estimation problems or in model selection problems.
([32], p. 9)
Although the weakly informative prior for resonance frequencies $\pi_3$ is meant to be overwhelmed, its practical advantage (i.e., solving the label switching problem) will nonetheless persist, making a real difference in model selection problems even when “the information in the prior” is much smaller than “the information in the data”. In this sense, $\pi_3$ is quite unlike the prior referenced in the above quote. Since it will be overwhelmed, all it has to do is provide a reasonable density everywhere (which it does), be easily parametrizable (which it is), and be easy to sample from (which it is).
Thus, we hope that this prior can enable the use of robust evidence-based methods for a new class of problems, even in the presence of multiplets of arbitrary order. The prior is compatible with off-the-shelf exploration algorithms and solves the label switching problem without any special tuning or post-processing. It would be interesting to compare it to other approaches, e.g., [33], especially in terms of exploration efficiency. It is valid for any collection of scale variables that is intrinsically ordered, of which frequencies and wavelengths seem to be the most natural examples. Some examples of recent work where the prior could be applied directly are:
  • Nuclear magnetic resonance (NMR) spectroscopy [34];
  • Resonant ultrasound spectroscopy (a standard method in material science) [35];
  • The analysis of atomic spectra [36], including X-ray diffraction [37];
  • Accurate modeling of instrument noise (in this case LIGO/Virgo noise) [38];
  • Model-based Bayesian analysis in acoustics [39].

Author Contributions

Conceptualization, writing, methodology, and analysis: M.V.S. Supervision: B.d.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Flemish AI plan and by the Research Foundation Flanders (FWO) under grant number G015617N.

Acknowledgments

We would like to thank Roxana Radulescu, Timo Verstraeten, and Yannick Jadoul for helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Skilling, J. Nested Sampling for General Bayesian Computation. Bayesian Anal. 2006, 1, 833–859.
  2. Green, P.J. Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination. Biometrika 1995, 82, 711–732.
  3. Mark, Y.Z.; Hasegawa-Johnson, M. Particle Filtering Approach to Bayesian Formant Tracking. In Proceedings of the IEEE Workshop on Statistical Signal Processing, St. Louis, MO, USA, 28 September–1 October 2003.
  4. Zheng, Y.; Hasegawa-Johnson, M. Formant Tracking by Mixture State Particle Filter. In Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada, 17–21 May 2004; Volume 1, pp. 1–565.
  5. Yan, Q.; Vaseghi, S.; Zavarehei, E.; Milner, B.; Darch, J.; White, P.; Andrianakis, I. Formant Tracking Linear Prediction Model Using HMMs and Kalman Filters for Noisy Speech Processing. Comput. Speech Lang. 2007, 21, 543–561.
  6. Mehta, D.D.; Rudoy, D.; Wolfe, P.J. Kalman-Based Autoregressive Moving Average Modeling and Inference for Formant and Antiformant Tracking. J. Acoust. Soc. Am. 2012, 132, 1732–1746.
  7. Shi, Y.; Chang, E. Spectrogram-Based Formant Tracking via Particle Filters. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), Hong Kong, China, 6–10 April 2003; Volume 1, p. 1.
  8. Deng, L.; Lee, L.J.; Attias, H.; Acero, A. Adaptive Kalman Filtering and Smoothing for Tracking Vocal Tract Resonances Using a Continuous-Valued Hidden Dynamic Model. IEEE Trans. Audio Speech Lang. Process. 2007, 15, 13–23.
  9. Luberadzka, J.; Kayser, H.; Hohmann, V. Glimpsed Periodicity Features and Recursive Bayesian Estimation for Modeling Attentive Voice Tracking. Int. Congr. Acoust. 2019, 9, 8.
  10. Stephens, M. Dealing with Label Switching in Mixture Models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2000, 62, 795–809.
  11. Celeux, G.; Kamary, K.; Malsiner-Walli, G.; Marin, J.M.; Robert, C.P. Computational Solutions for Bayesian Inference in Mixture Models. arXiv 2018, arXiv:1812.07240.
  12. Celeux, G.; Frühwirth-Schnatter, S.; Robert, C.P. Model Selection for Mixture Models - Perspectives and Strategies. arXiv 2018, arXiv:1812.09885.
  13. Bretthorst, G.L. Bayesian Spectrum Analysis and Parameter Estimation; Springer: Berlin/Heidelberg, Germany, 1988.
  14. Knuth, K.H.; Skilling, J. Foundations of Inference. Axioms 2012, 1, 38–73.
  15. Jaynes, E.T. Prior Probabilities. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 227–241.
  16. Kominek, J.; Black, A.W. The CMU Arctic Speech Databases. In Proceedings of the Fifth ISCA Workshop on Speech Synthesis, Pittsburgh, PA, USA, 14–16 June 2004.
  17. Van Soom, M.; de Boer, B. A New Approach to the Formant Measuring Problem. Proceedings 2019, 33, 29.
  18. Van Soom, M.; de Boer, B. Detrending the Waveforms of Steady-State Vowels. Entropy 2020, 22, 331.
  19. Speagle, J.S. Dynesty: A Dynamic Nested Sampling Package for Estimating Bayesian Posteriors and Evidences. arXiv 2019, arXiv:1904.02180.
  20. Feroz, F.; Hobson, M.P.; Bridges, M. MULTINEST: An Efficient and Robust Bayesian Inference Tool for Cosmology and Particle Physics. Mon. Not. R. Astron. Soc. 2009, 398, 1601–1614.
  21. Neal, R.M. Slice Sampling. Ann. Stat. 2003, 31, 705–767.
  22. Handley, W.J.; Hobson, M.P.; Lasenby, A.N. POLYCHORD: Nested Sampling for Cosmology. Mon. Not. R. Astron. Soc. 2015, 450, L61–L65.
  23. Handley, W.J.; Hobson, M.P.; Lasenby, A.N. POLYCHORD: Next-Generation Nested Sampling. Mon. Not. R. Astron. Soc. 2015, 453, 4384–4398.
  24. Buchner, J. Nested Sampling Methods. arXiv 2021, arXiv:2101.09675.
  25. Peterson, G.E.; Barney, H.L. Control Methods Used in a Study of the Vowels. J. Acoust. Soc. Am. 1952, 24, 175–184.
  26. Hillenbrand, J.; Getty, L.A.; Clark, M.J.; Wheeler, K. Acoustic Characteristics of American English Vowels. J. Acoust. Soc. Am. 1995, 97, 3099–3111.
  27. Vallée, N. Systèmes Vocaliques: De La Typologie Aux Prédictions. Ph.D. Thesis, Université Stendhal, Grenoble, France, 1994.
  28. Kent, R.D.; Vorperian, H.K. Static Measurements of Vowel Formant Frequencies and Bandwidths: A Review. J. Commun. Disord. 2018, 74, 74–97.
  29. Vorperian, H.K.; Kent, R.D.; Lee, Y.; Bolt, D.M. Corner Vowels in Males and Females Ages 4 to 20 Years: Fundamental and F1–F4 Formant Frequencies. J. Acoust. Soc. Am. 2019, 146, 3255–3274.
  30. Klatt, D.H. Software for a Cascade/Parallel Formant Synthesizer. J. Acoust. Soc. Am. 1980, 67, 971–995.
  31. de Boer, B. Acoustic Tubes with Maximal and Minimal Resonance Frequencies. J. Acoust. Soc. Am. 2008, 123, 3732.
  32. Bretthorst, G.L. Bayesian Analysis. II. Signal Detection and Model Selection. J. Magn. Reson. 1990, 88, 552–570.
  33. Buscicchio, R.; Roebber, E.; Goldstein, J.M.; Moore, C.J. Label Switching Problem in Bayesian Analysis for Gravitational Wave Astronomy. Phys. Rev. D 2019, 100, 084041.
  34. Wilson, A.G.; Wu, Y.; Holland, D.J.; Nowozin, S.; Mantle, M.D.; Gladden, L.F.; Blake, A. Bayesian Inference for NMR Spectroscopy with Applications to Chemical Quantification. arXiv 2014, arXiv:1402.3580.
  35. Xu, K.; Marrelec, G.; Bernard, S.; Grimal, Q. Lorentzian-Model-Based Bayesian Analysis for Automated Estimation of Attenuated Resonance Spectrum. IEEE Trans. Signal Process. 2019, 67, 4–16.
  36. Trassinelli, M. Bayesian Data Analysis Tools for Atomic Physics. Nucl. Instrum. Methods Phys. Res. Sect. B Beam Interact. Mater. Atoms 2017, 408, 301–312.
  37. Fancher, C.M.; Han, Z.; Levin, I.; Page, K.; Reich, B.J.; Smith, R.C.; Wilson, A.G.; Jones, J.L. Use of Bayesian Inference in Crystallographic Structure Refinement via Full Diffraction Profile Analysis. Sci. Rep. 2016, 6, 31625.
  38. Littenberg, T.B.; Cornish, N.J. Bayesian Inference for Spectral Estimation of Gravitational Wave Detector Noise. Phys. Rev. D 2015, 91, 084034.
  39. Xiang, N. Model-Based Bayesian Analysis in Acoustics—A Tutorial. J. Acoust. Soc. Am. 2020, 148, 1101–1120.
Figure 1. The exchange symmetry of the posterior $P_1(\mathbf{x})$ for a well-determined instance of the VTR problem from Section 5 with $K := 3$. The pairwise marginal posteriors $P_1(x_k, x_\ell)$ are shown using the isocontours of kernel density approximations calculated from posterior samples of $\mathbf{x}$. For each panel, the diagonal $x_k = x_\ell$ is plotted as a dotted line, and the ordered region $R_3(x_0)$ is shaded in grey. The black dot marks the mean of the primary mode for this problem.
Figure 2. Contraction of prior ($\pi_3$) to posterior ($P_3$) for the application of $\pi_3$ to the VTR problem used in Figure 1. The pairwise marginal prior $\pi_3(x_k, x_\ell)$ is obtained by integrating out the third frequency; for example, $\pi_3(x_1, x_2) = \int dx_3\, \pi_3(\mathbf{x})$. Unlike $P_1$ in Figure 1, $P_3$ exhibits only a single mode, which coincides with the primary mode as marked by the black dot.
Figure 3. The VTR problem for the case $(D := \textit{until},\ K := 10)$. Left panel: The data D, i.e., the quasi-periodic steady-state part, consist of 3 highly correlated pitch periods. Right panel: Inferred VTR frequency estimates $\{\hat{x}_k\}_{k=1}^{K}$ for $K := 10$ at 3 sigma. They describe the power spectral density of the vocal tract transfer function $|T(x)|^2$, represented here by 25 posterior samples and compared to the Fast Fourier Transform (FFT) of D. All $\hat{x}_k$ are well resolved, and most have error bars too small to be seen on this scale.
Figure 4. (a) Model selection in Experiment I (top row) and Experiment II (bottom row). (b) In Experiment I, $\pi_2$ and $\pi_3$ are compared in terms of evidence [$\log Z_i(K)$] and uninformativeness [$H_i(K)$] for each $(D, K)$. The arrows point from $\pi_2$ to $\pi_3$ and are color-coded by the value of K. For small values of K, the arrow lengths are too small to be visible on this scale.
Table 1. The values of the hyperparameters $\beta_i$ used throughout the paper. All quantities are given in units of Hz.

| k | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| $\mathbf{a} = \{a_k\}$ | | 200 | 600 | 1400 | 2900 | 3500 | | | | | |
| $\mathbf{b} = \{b_k\}$ | | 1100 | 3500 | 4000 | 4500 | 5500 | | | | | |
| $\overline{\mathbf{x}_0} = \{\overline{x_k}\}$ | 200 | 500 | 1000 | 1500 | 2000 | 2500 | 3000 | 3500 | 4000 | 4500 | 5000 |
| other | $x_0 = 200$, $x_{\max} = 5500$ | | | | | | | | | | |

