An Investigation of Acoustic Back-Coupling in Human Phonation on a Synthetic Larynx Model

Näger, Christoph; Kniesburges, Stefan; Tur, Bogac; Schoder, Stefan; Becker, Stefan

doi:10.3390/bioengineering10121343

by Christoph Näger ¹,*, Stefan Kniesburges ², Bogac Tur ², Stefan Schoder ³ and Stefan Becker ¹

¹ Institute of Fluid Mechanics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 4, 91058 Erlangen, Germany

² Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany

³ Aeroacoustics and Vibroacoustics Group, Institute of Fundamentals and Theory in Electrical Engineering, Graz University of Technology, Inffeldgasse 16, 8010 Graz, Austria

* Author to whom correspondence should be addressed.

Bioengineering 2023, 10(12), 1343; https://doi.org/10.3390/bioengineering10121343

Submission received: 28 September 2023 / Revised: 12 November 2023 / Accepted: 19 November 2023 / Published: 22 November 2023

(This article belongs to the Special Issue Fundamentals and Applications of Fluid Mechanics and Acoustics in Biomedical Engineering)

Abstract

In the human phonation process, acoustic standing waves in the vocal tract can influence the fluid flow through the glottis as well as vocal fold oscillation. To investigate the amount of acoustic back-coupling, the supraglottal flow field has been recorded via high-speed particle image velocimetry (PIV) in a synthetic larynx model for several configurations with different vocal tract lengths. Based on the obtained velocity fields, acoustic source terms were computed. Additionally, the sound radiation into the far field was recorded via microphone measurements and the vocal fold oscillation via high-speed camera recordings. The PIV measurements revealed that near a vocal tract resonance frequency f_R, the vocal fold oscillation frequency f_o (and therefore also the flow field’s fundamental frequency) jumps onto f_R. This is accompanied by a substantial relative increase in aeroacoustic sound generation efficiency. Furthermore, the measurements show that f_o-f_R-coupling increases vocal efficiency, signal-to-noise ratio, harmonics-to-noise ratio and cepstral peak prominence. At the same time, the glottal volume flow needed for stable vocal fold oscillation decreases strongly. All of this results in an improved voice quality and phonation efficiency so that a person phonating with f_o-f_R-coupling can phonate longer and with better voice quality.

Keywords: human phonation; source–filter interaction; particle image velocimetry; synthetic larynx model; transmission line model; aeroacoustic source computation

1. Introduction

The human voice is generated by a complex physiological process that is described by the fluid–structure–acoustic interaction (FSAI) between the tracheal fluid flow, structural vibration of laryngeal tissue (i.e., the vocal folds), and the sound generation and modulation in the larynx and vocal tract [1,2,3]. In this process, the two vocal folds are aerodynamically stimulated to vibrate by the airflow that arises from the lungs. In turn, this vibration leads to a modulation of the airflow, generating a pulsating jet flow in the supraglottal region, which is above the vocal folds [1].

Within this dynamic process of tissue–flow interaction, the basic sound of the human voice is generated by the highly complex 3D field of aeroacoustic sound sources produced by the turbulent jet flow in the larynx [4]. Moreover, vibroacoustic sound generation also occurs through sound radiation from the vocal fold surface [5]. The generated basic sound is further filtered by the vocal tract and radiated through the mouth, exhibiting the typical spectral characteristics of the human voice: tonal harmonic components of the fundamental frequency and additional tonal components, called formants, that originate from resonance effects in the vocal tract [6].

Early on, a linear relationship between the sound source (vocal folds) and the filter (vocal tract) was assumed within the linear source–filter theory, which excluded any influence of the acoustic filter signal back on the source [7]. However, this simplified representation turned out to be invalid, especially when the fundamental oscillation frequency of the vocal folds f_o is close to a resonance frequency f_R of the vocal tract [8]. Acoustic back-coupling has been studied using theoretical modeling (e.g., [9,10]), simulations (e.g., [11,12,13]), in vivo studies (e.g., [14,15]), and ex vivo and in vitro experiments (e.g., [16,17,18]). However, due to the complexity of the problem, most studies were restricted to metrics such as f_o-variation and the change in the subglottal oscillation threshold pressure. The direct acoustic–tissue or acoustic–flow interaction has not yet been investigated.

In this context, Particle Image Velocimetry (PIV) enables us to measure the unsteady flow field in the aeroacoustic source region to gain a deep insight into the FSAI process of phonation. This technique has already been applied successfully to study aerodynamic effects in synthetic as well as ex vivo larynx models that showed typical vocal fold vibrations similar to phonation. Classical planar low-frequency PIV measurements made it possible to study the basic features of the supra- and intraglottal aerodynamics [19,20,21,22]. Although still 2D, high-speed PIV techniques then made it possible to directly analyze aeroacoustic source terms computed from time-resolved PIV data obtained directly in the source region above the vocal folds [5,23]. These data provided the distribution and dynamics of aeroacoustic sources based on state-of-the-art aeroacoustic analogies, such as the Lighthill analogy [24] or the Perturbed Convective Wave Equations [25]. With the rising availability of 3D tomographic PIV measurements, first studies have even investigated volumetric parameters such as the Maximum Flow Declination Rate using ex vivo canine models [26].

However, none of the studies described above have investigated the effects of supraglottal acoustics on the glottal aerodynamics and the aeroacoustic source field yet. Therefore, the present study provides highly resolved data of the entire process to analyze the complete FSAI between vocal tract acoustics and laryngeal aerodynamics. Based on high-speed PIV measurements in combination with aerodynamic and acoustic pressure data, as well as high-speed visualizations of the vocal fold dynamics in a synthetic larynx model [22], the influence of the resonance effects formed in the vocal tract on the supraglottal flow field and the vocal fold motion is studied. Different vocal tract models have been applied with an incremental increase in length. This length change produced acoustic properties of the vocal tract that shifted its resonance frequency down to the fundamental frequency of the vocal folds. This procedure enabled us to systematically study the relationship between laryngeal flow and supraglottal acoustics.

2. Materials and Methods

2.1. Basic Experimental Setup

Synthetic vocal folds were cast from a single layer of Ecoflex 00-30 silicone (Smooth-On, Macungie, PA, USA) with a static Young’s modulus of 4.4 kPa. Their shape was based on the M5 model [27,28], and is displayed in Figure 1. The vocal folds were glued into their mounting, positioned between the subglottal and supraglottal channels. In the prephonatory posturing of the vocal folds, the glottis was completely closed. The subglottal channel had a length of 210 mm and a rectangular cross-section of 18 mm × 15 mm, which is within the dimensional range found in vivo [1]. Furthermore, this length was chosen small enough to prevent the interaction of the vocal folds’ oscillation with the subglottal acoustic resonances (see the description in Lodermeyer et al. [21] based on the results by Zhang et al. [29]). The supraglottal channel had a rectangular section of 18 mm × 15 mm and a length of 80 mm in the region directly downstream of the vocal folds. Attached to it, a circular cross-section tube with a diameter of 32 mm followed. An additional tube with a diameter of 34 mm made it possible to adjust the vocal tract length and thereby resonance frequency continuously via telescoping. This basic setup is shown schematically in Figure 2, and is based on the setup previously described by, e.g., Kniesburges et al. [22]. A mass flow generator with a supercritical valve [30] produced a constant volume flow \dot{V} through the setup. Between the mass flow generator and the subglottal channel, a silencer was placed for conditioning the flow and attenuating sound propagation from the supply hose to the vocal fold position. Several measurements were performed for different vocal tract lengths in the interval L ∈ [200, 800] mm to study the influence of the vocal tract acoustic resonance frequencies on the supraglottal flow field and vocal fold oscillation. The mass flow rate for each length was set to the corresponding minimum required for vocal fold oscillation with complete glottis closure.

2.2. Measurement Setup

Multiple measurement tasks were performed. The transglottal pressure was recorded via two pressure sensors: in the subglottal channel, a Kulite XCQ-093 sealed gauge pressure sensor (Kulite Semiconductor, Leonia, NJ, USA) was flush-mounted into the channel wall 50 mm upstream of the glottis. In the supraglottal channel, a Kulite XCS-093 open gauge pressure sensor (Kulite Semiconductor, Leonia, NJ, USA) was mounted the same way at a distance of 50 mm downstream of the glottis. The sound radiation from the vocal tract end was recorded in our anechoic chamber by a Brüel and Kjaer 4189-L-001 1/2″-microphone (Brüel and Kjaer, Nærum, Denmark) at a distance of 1 m perpendicular to the channel outlet. Microphone and wall pressure signals were sampled by a National Instruments PXIe 6356 multifunctional card (National Instruments, Austin, TX, USA) with a resolution of 16 bit and a sample rate of 44.1 kHz. The vocal fold movement was recorded using a Photron FASTCAM SA-X2 high-speed camera (Photron, Tokyo, Japan) at a frame rate of 10 kHz. From the microphone recordings, additional related parameters like the signal-to-noise ratio (SNR), harmonics-to-noise ratio (HNR), as well as the cepstral peak prominence (CPP), were extracted using the Glottis Analysis Tools [31,32] (GAT; University Hospital Erlangen, Erlangen, Germany).

The planar flow velocity in the supraglottal region in the coronal plane midway along the vocal fold length was measured with a 2D-2C planar PIV setup. This setup is shown in Figure 3. The measurement region of interest was a rectangle with dimensions of 45 mm × 18 mm and chosen similarly to the previous study by Lodermeyer et al. [5]. For seeding purposes, a PIVlight30 seeding generator (PIVtec GmbH, Göttingen, Germany) based on Laskin nozzles for atomization of the seeding fluid was applied. PIVtec PIVfluid, which is a propylene glycol mixture based on double de-ionized water and other components, was used as a seeding fluid, resulting in a mean particle diameter of 1.2 μm. The resulting Stokes number was St = 0.033 < 0.1, yielding an acceptable flow tracing accuracy [33,34]. The particles were illuminated by a laser light sheet with a thickness of approximately 0.5 mm. The laser used in this study was a double-pulse, frequency-doubled Nd:YLF Continuum Terra PIV high-speed laser (Continuum, San Jose, CA, USA) with a wavelength of 527 nm and a repetition rate of 2 × 5 kHz. The offset between the two pulses was set to 4 μs, realized by an ILA synchronizer (ILA, Jülich, Germany). A Vision Research Phantom v2511 high-speed camera (Vision Research Inc., Wayne, NJ, USA) in combination with a Canon Macro Lens EF 180 mm Ultrasonic lens (Canon, Tokyo, Japan) was applied to record the distribution of the illuminated seeding particles.

Two image pre-processing steps were applied to increase the signal-to-noise ratio in the recorded images. In the regions downstream of the vocal folds, a background removal via proper orthogonal decomposition (POD) proposed by Mendez et al. [35] was applied. This method is suited for removing background noise without moving walls, but showed poor background removal in the region close to the vocal folds. Therefore, a different approach was chosen for the region close to the glottis. Adatrao and Sciacchitano [36] proposed a background removal technique based on an anisotropic diffusion equation specifically for moving solid objects in PIV images. The results of both background removal techniques for one exemplary image are shown in Figure 4. It can be seen that the original image contains strong reflections at the vocal folds in the left part of the image. Also, sensor noise is visible in the rightmost quarter of the image. The POD-based background removal enables an almost complete removal of the sensor noise while removing most of the light reflections from the vocal folds. However, some artifacts are created around the vocal folds, masking the particle images, e.g., between the vocal folds. In contrast, the anisotropic diffusion-based background removal shows a sharp boundary around the vocal folds, improving the visibility of the particles between them. However, in this case, the sensor noise close to the right image boundary remains present, albeit with reduced intensity. Therefore, to obtain the best result, the anisotropic diffusion approach is only used in the leftmost part of the images, while the POD-based approach is applied in the remaining part. Looking at Figure 4D, it appears as if fewer particles were visible close to the vocal folds than in the remaining image. This is a result of the light reflections from the vocal folds completely masking some particles in the vicinity of the vocal folds. However, the PIV evaluation algorithm was still able to find enough particles in this region to obtain reliable velocity information.

Velocity vectors were extracted from the image pairs with the help of the commercial software PIVview2C 3.6.23 (PIVtec GmbH, Göttingen, Germany). For this purpose, a grid of 74 × 56 correlation windows with an overlap of 50% was defined in the region of interest, leading to a spatial resolution of Δx × Δy = 0.62 mm × 0.31 mm. Outliers were detected via the universal outlier detection by Westerweel and Scarano [37] and interpolated with the information from the surrounding velocity vectors.
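The outlier detection step can be illustrated with a minimal sketch of the normalized median test that underlies the universal outlier detection [37]. This is not the PIVview implementation: the function name, the 3 × 3 neighborhood, the noise level `eps = 0.1` (in pixel displacement units) and the threshold of 2 are assumptions taken from commonly used defaults, and the test is applied to a single velocity component only.

```python
import numpy as np

def normalized_median_test(u, eps=0.1, threshold=2.0):
    """Flag outliers in one velocity component on a regular grid.

    u is a 2D array of displacements; eps and threshold are assumed
    default values, not settings reported in the article.
    """
    u = np.asarray(u, dtype=float)
    ny, nx = u.shape
    # Pad with NaN so border points simply have fewer valid neighbors.
    padded = np.pad(u, 1, mode="constant", constant_values=np.nan)
    # Collect the eight neighbors of every grid point (center excluded).
    shifts = [(dy, dx) for dy in range(3) for dx in range(3) if (dy, dx) != (1, 1)]
    neighbors = np.stack([padded[dy:dy + ny, dx:dx + nx] for dy, dx in shifts])
    # Median of the neighbors and median of their residuals.
    med = np.nanmedian(neighbors, axis=0)
    res_med = np.nanmedian(np.abs(neighbors - med), axis=0)
    # Normalized residual of the center vector.
    return np.abs(u - med) / (res_med + eps) > threshold
```

Vectors flagged by the returned mask would then be replaced by an interpolation of their neighbors, as described in the text.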

2.3. Aeroacoustic Source Computation

One important aspect of understanding the human voice production process is to evaluate the aeroacoustic sources, e.g., Lighthill’s source term for low Mach number isentropic turbulent flows

T(x, t) \approx \nabla \cdot \nabla \cdot (ρ_0 u u)    (1)

with the velocities u measured via PIV and the ambient density ρ_0 [23]. This distributed source term is aggregated in a summed source strength based on Lighthill’s analogy [24] by neglecting the retarded time effects in this acoustically compact 2D region of interest (ROI)

ϕ(t) = \frac{1}{4 π c_0^2 (x_1 - x_0)(y_1 - y_0)} \int_{(x_0, y_0)}^{(x_1, y_1)} T(x, t) \frac{π y \, dx \, dy}{r} = \frac{1}{4 c_0^2 N_x N_y} \sum_i T(x_i, t) |y_i|.    (2)

The coordinate locations x_0, x_1, y_0, y_1 are the bounding coordinates of the ROI, c_0 the isentropic speed of sound, and r the direction of a virtual observer point at 1 m distance. It is assumed that the jet is rotationally symmetric around the rotation axis in x-direction, pointing in the flow direction and being centered in the middle of the vocal folds. From this equation, the root-mean-squared value is computed

Φ = \sqrt{\frac{1}{t_1 - t_0} \int_{t_0}^{t_1} (ϕ(t))^2 \, dt},    (3)

being a measure of the ability to generate aerodynamic sound. The equation is applied to the measured 2D mid-section, where the velocity’s principal direction is recorded. In addition, the aerodynamic input energy is quantified by

P = Δp + \frac{1}{2} ρ_0 U^2,    (4)

using the subglottal pressure difference to the ambient pressure Δp. With the input energy, the efficiency of the aeroacoustic sound generation yields

η = \frac{Φ c_0^2 (x_1 - x_0)(y_1 - y_0)}{P}.    (5)
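Equations (1) to (5) can be evaluated on gridded PIV data along the following lines. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the values of ρ_0 and c_0 are assumed, the second derivatives are approximated with repeated central differences (`numpy.gradient`), and U in Equation (4) is treated as a bulk velocity because the text does not define it.

```python
import numpy as np

RHO0 = 1.2   # ambient density in kg/m^3 (assumed value)
C0 = 343.0   # speed of sound in m/s (assumed value)

def lighthill_source(u, v, dx, dy):
    """Equation (1) in 2D for one snapshot; arrays are indexed [y, x]."""
    def d_dx(f):
        return np.gradient(f, dx, axis=1)
    def d_dy(f):
        return np.gradient(f, dy, axis=0)
    # rho0 * d^2(u_i u_j) / (dx_i dx_j), written out for two components
    return RHO0 * (d_dx(d_dx(u * u)) + 2.0 * d_dx(d_dy(u * v)) + d_dy(d_dy(v * v)))

def summed_source_strength(u, v, y, dx, dy):
    """Discrete form of Equation (2); y holds the row coordinates."""
    t = lighthill_source(u, v, dx, dy)
    ny, nx = t.shape
    return np.sum(t * np.abs(np.asarray(y))[:, None]) / (4.0 * C0**2 * nx * ny)

def rms_source_strength(phi):
    """Equation (3) for uniformly sampled phi(t)."""
    return np.sqrt(np.mean(np.square(phi)))

def efficiency(phi_rms, roi_area, delta_p, u_bulk):
    """Equations (4) and (5); u_bulk stands in for the undefined U."""
    p_in = delta_p + 0.5 * RHO0 * u_bulk**2
    return phi_rms * C0**2 * roi_area / p_in
```

In use, `summed_source_strength` would be called once per PIV snapshot to build the time series ϕ(t), which is then passed to `rms_source_strength`. Note that the one-sided differences at the ROI border are less accurate than the interior values.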

2.4. Acoustic Characterization of the Vocal Tract

The acoustic properties of the vocal tract were determined using a transmission line model [38]. In this model, the acoustic pressure p_ac and volume velocity u_ac at the vocal tract input, i.e., the glottis, can be related to the same quantities at the output via chain matrix multiplication:

\begin{pmatrix} p_{ac,out} \\ u_{ac,out} \end{pmatrix} = K_{tract} \begin{pmatrix} p_{ac,in} \\ u_{ac,in} \end{pmatrix} = \begin{pmatrix} A_{tract} & B_{tract} \\ C_{tract} & D_{tract} \end{pmatrix} \begin{pmatrix} p_{ac,in} \\ u_{ac,in} \end{pmatrix}    (6)

Here, the 2 × 2 matrix K_tract is built from chain multiplication of a series of 2 × 2 matrices K_i, each representing one part of the vocal tract with constant cross-section. These matrices K_i were computed with the equations derived by Sondhi and Schroeter [38]. In the case of our simplified vocal tract, there are three different cross-sections present: the rectangular section right above the glottis, the circular section of the first tube and the circular section of the second tube. The vocal tract input impedance Z_in = p_{ac,in} / u_{ac,in} can be obtained from Equation (6):

Z_{in} = \frac{D_{tract} Z_{out} - B_{tract}}{A_{tract} - C_{tract} Z_{out}}    (7)

The maxima of the frequency-dependent Z_in thereby correspond to the vocal tract resonance frequencies. The transmission line model was implemented following the description given by Story et al. [39]. As the vocal tract walls were fabricated from aluminum and glass, they were modeled as rigid walls. The radiation impedance at the open end was approximated as a vibrating piston in an infinite baffle [40].
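The chain-matrix procedure of Equations (6) and (7) can be sketched as follows. This is a simplified illustration, not the model of [38,39]: each section is treated as a lossless rigid tube, the sign convention of the section matrix is an assumption chosen so that it is consistent with Equation (7), and the baffled-piston radiation impedance is replaced by its low-frequency approximation. The function names and the values of ρ_0 and c_0 are likewise assumptions.

```python
import numpy as np

RHO0 = 1.2   # air density in kg/m^3 (assumed value)
C0 = 343.0   # speed of sound in m/s (assumed value)

def section_matrix(freq, length, area):
    """Chain matrix K_i of one lossless tube section, mapping input to output."""
    k = 2.0 * np.pi * freq / C0
    z_c = RHO0 * C0 / area  # characteristic impedance of the section
    return np.array([[np.cos(k * length), -1j * z_c * np.sin(k * length)],
                     [-1j * np.sin(k * length) / z_c, np.cos(k * length)]])

def radiation_impedance(freq, radius):
    """Low-frequency approximation of a piston in an infinite baffle."""
    ka = 2.0 * np.pi * freq / C0 * radius
    z_c = RHO0 * C0 / (np.pi * radius**2)
    return z_c * (0.5 * ka**2 + 1j * 8.0 * ka / (3.0 * np.pi))

def input_impedance(freq, sections):
    """Equation (7); sections is a list of (length, area) from glottis to outlet."""
    k_tract = np.eye(2, dtype=complex)
    for length, area in sections:
        k_tract = section_matrix(freq, length, area) @ k_tract
    (a, b), (c, d) = k_tract
    outlet_radius = np.sqrt(sections[-1][1] / np.pi)
    z_out = radiation_impedance(freq, outlet_radius)
    return (d * z_out - b) / (a - c * z_out)

def strongest_resonance(sections, freqs):
    """Frequency of the largest |Z_in| on the given frequency grid."""
    magnitude = np.array([abs(input_impedance(f, sections)) for f in freqs])
    return freqs[np.argmax(magnitude)]
```

For the setup described in Section 2.1, `sections` would contain the rectangular section (0.08 m long, 18 mm × 15 mm) followed by the two circular tubes; the individual tube lengths for a given L are not stated in the text and would have to be supplied.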

3. Results and Discussion

3.1. General Results

As already stated in Section 2.1, the experiment’s volume flow rate \dot{V} was set to the minimal flow rate necessary to induce oscillation with contact between the vocal folds. Table 1 lists the resulting \dot{V} for all nine configurations measured. It can be seen that \dot{V} stays roughly constant for L ≤ 340 mm and decreases monotonically for larger L. The same behavior can be observed in the transglottal pressure P_trans = P_sub - P_supra, where P_sub and P_supra are the mean pressure values measured by the pressure probes in the sub- and supraglottal channel, respectively. The decrease in \dot{V} with increasing L is similar to what Fulcher et al. [41] found, where they used an analytical surface wave model in combination with validation experiments to predict the phonation threshold pressure as a function of the vocal tract length. They related the decreased threshold pressure to an increase in vocal tract inertance due to the increased length.

The oscillation frequency of the vocal folds f_o, as extracted via discrete Fourier transformation from the PIV measurements, also shows a stationary behavior in the range L ≤ 340 mm. It stays within the range 150 Hz < f_o < 153 Hz for these lengths. At L = 400 mm, f_o jumps to 225.7 Hz, which is 1.5 times 150.5 Hz. Increasing the length further leads to a jump back to ∼150 Hz and then a decrease in f_o down to 119.0 Hz. An explanation for this behavior can be found by looking at the relationship between f_o and the first vocal tract resonance frequency f_R1 as computed via the transmission line model. For this purpose, Figure 5 shows the vocal tract input impedance Z_in as a function of the frequency f and L. The frequencies of the maxima in Z_in (indicated by the color yellow) correspond to the vocal tract resonance frequencies f_Ri. As expected, the f_Ri decrease with an increase in L. On top of the contour of Z_in, the values of f_o for the different chosen vocal tract lengths from Table 1 are displayed.
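The extraction of f_o via discrete Fourier transformation can be sketched as follows. This is an illustrative assumption about the procedure, not the authors' script: the input is taken to be a single time series (for example, the spatially averaged velocity magnitude), the sample rate of 5 kHz in the usage lines is inferred from the 2 × 5 kHz laser repetition rate, and the function name is hypothetical. The sketch also assumes that the fundamental is the strongest spectral line, which need not hold when subharmonics dominate.

```python
import numpy as np

def fundamental_frequency(signal, sample_rate):
    """Return the frequency of the largest spectral peak, ignoring the mean."""
    signal = np.asarray(signal, dtype=float)
    # Remove the mean so the zero-frequency bin does not win the search.
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

# Synthetic check: a 150 Hz oscillation with a weaker second harmonic.
time = np.arange(5000) / 5000.0
velocity = 2.0 + np.sin(2 * np.pi * 150.0 * time) + 0.3 * np.sin(2 * np.pi * 300.0 * time)
f_o = fundamental_frequency(velocity, 5000.0)
```

The frequency resolution of this estimate is the inverse of the record length, so a one-second record resolves f_o to 1 Hz.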

Here, it can be seen that for L < 400 mm and L = 500 mm, f_o and f_R1 are not close to each other. Therefore, f_o is approximately constant in this region, with the exception of small variations due to small changes in the experimental conditions, e.g., a slight variation in \dot{V}. For L ≥ 600 mm, f_R1 starts falling below 160 Hz and therefore lies in the vicinity of the “uninfluenced” value of f_o. This leads to a decrease in f_o with further increasing length in this length range. As a consequence, f_o “jumps” onto f_R1 in this range and the vocal fold oscillation is coupled to the acoustic standing wave in the vocal tract. This is in accordance with the experimental data observed by Migimatsu et al. [18] in their experimental study with the M5 model. In their work, a much larger increase in L up to ∼1 m led to an f_o-jump back to the original uninfluenced value because f_R1 was no longer in the vicinity of the uninfluenced f_o, leading to the domination of the vocal folds’ natural mechanical eigenmode. With our experimental setup, this could also be expected; however, this was not investigated. Zhang et al. also observed a locking of f_o onto the supraglottal resonances [16]. Furthermore, Zhang et al. showed that a locking of f_o onto the resonance frequencies of the subglottal channel can also occur when they studied the influence of the subglottal resonances on the vocal fold oscillation [16,29].

A special behavior occurs at L = 400 mm. Here, despite the uninfluenced f_o still lying considerably below f_R1, an f_o jump close to f_R1 takes place. As the oscillation at L = 500 mm falls back to a similar f_o as at the smaller L, this indicates that a different behavior is present at L = 400 mm, which occurs due to a special combination of the eigenmodes of the vocal folds and the acoustic resonance frequency of the vocal tract. To visualize this, high-speed camera videos were recorded for three different L: 200 mm, 400 mm and 700 mm. These lengths were chosen as they are representative of the three different states of vocal fold oscillation we were able to identify: independent f_o and f_R1, a jump of f_o to a higher frequency, and a shift of f_o to lower frequencies. The relationship between f_o and f_R1 for these lengths can be seen in Figure 5.

Snapshots of one oscillation time period T for each of the chosen lengths are shown in Figure 6. The top row here corresponds to the baseline case, where f_o ≪ f_R1. In this case, the vocal folds oscillate with a clearly visible convergent-divergent evolution of the transglottal angle, with a convergent glottal duct shape in the opening phase and a divergent duct shape in the closing phase. Similarly, at L = 700 mm, the same behavior can be seen, albeit with a smaller opening area due to the reduced \dot{V} in this case. In contrast, the oscillation at L = 400 mm looks considerably different compared to the other two cases. Here, the change between convergent and divergent shape of the glottal duct does not directly correlate with the opening and closing motion of the vocal folds as described by Titze [42] for aerodynamically driven vocal folds. Therefore, combined with the f_o jump to the vocal tract resonance frequency and a completely changed oscillatory behavior, this suggests that there is some kind of acoustically coupled motion of the vocal folds present in this case. As this only happens at L = 400 mm, it is reasonable to assume that an eigenmode of the vocal fold model at a frequency of about 225 Hz is present, which is excited by the acoustic standing wave of the vocal tract. In the standard case, this eigenmode is not dominant compared to the 150 Hz mode and is therefore not visible with the other vocal tract lengths.

3.2. Supraglottal Aerodynamics

PIV measurements have been performed for all configurations of Table 1. Again, the three cases of L = 200 mm, L = 400 mm and L = 700 mm are analyzed in more detail as they are representative of the different possible oscillatory behaviors of the vocal folds. Velocity fields of one oscillation cycle for each length are shown in Figure 7, Figure 8 and Figure 9. Additionally, mean velocity fields for all three configurations are displayed in Figure 10. Looking at Figure 7, one can see that the basic characteristic of the flow is an oscillating jet synchronized to the opening and closing of the vocal folds (displayed in gray on top of the velocity contours). The jet is deflected during the closing phase to the top vocal tract wall, forming a large vortex in the supraglottal channel in the closed phase of the vocal folds. Depending on the cycle, the jet can also be deflected downwards to the lower vocal tract wall, leading to a vortex in the closed phase that rotates in the other direction. This deflection and vortex formation is well known in human phonation, and has been studied extensively in the past (e.g., [21,22,43,44,45,46]). If approximately 50% of the cycles show an upwards deflection and 50% a downwards deflection, this leads to a rather symmetric averaged velocity profile, as can be seen in Figure 10 (top).

Looking at Figure 8, it appears that the basic characteristics of the flow are unchanged from L = 200 mm to L = 400 mm. There is still an oscillating jet flow, which is deflected to one of the vocal tract walls, leading to a large supraglottal vortex occurring during the closing and closed phase. Qualitatively, the acoustic driving of the vocal folds therefore does not appear to change the flow field in the middle plane of the vocal tract significantly. From an aerodynamic point of view, however, there are some changes related to the elongation of the vocal tract. In this case, the supraglottal jet is always deflected towards the lower vocal tract wall, leading to an asymmetry in the mean velocity field shown in Figure 10 (middle). Here, the supraglottal vortex is stabilized by the longer vocal tract. With a shorter vocal tract, the vortex is convected out of the vocal tract by the starting jet in the opening phase of the vocal folds. This leads to a new flow situation, where the jet deflection direction can change from one oscillation cycle to the next. With a longer vocal tract, the vortex is just convected downstream inside the channel, thereby interacting with the jet starting from the vocal folds and deflecting it towards its side of positive x-velocities. Therefore, the direction of jet deflection in this case depends on the initial deflection at the beginning of the phonation. Kniesburges et al. observed a similar behavior when changing the supraglottal channel height (y-direction) [22]. Here, an increase in the channel height also led to a stabilized supraglottal vortex that interacted with the glottal jet flow. The jet deflection direction can also change from one phonation process to the next, as can be seen when comparing the mean velocity profiles of L = 400 mm and L = 700 mm in Figure 10.

In the L = 700 mm case, the jet is deflected upwards instead of downwards, which is also visible in the unsteady velocity fields of Figure 9. Comparing the three configurations shown, the differences in the velocity magnitudes are notable, resulting from the different volume flow rates needed for vocal fold oscillation with contact. Generally, the peak flow velocities in this setup are higher than what is found in vivo [1], resulting from the large mean transglottal pressure needed for the single-layer synthetic vocal folds to oscillate with contact.

More quantitative differences between the three cases can be found by looking at the velocity fields in the frequency domain. Figure 11 shows the power spectral density (PSD) of the velocity magnitude averaged over the whole domain. All three spectra show the same qualitative trend of a general noise level decreasing with increasing frequency and strong harmonic peaks at their respective f_o and higher harmonics. The f_o-shift according to the acoustic resonances as shown in Figure 5 and Table 1 is also observable here. Generally, the decreased velocity magnitudes with increasing L lead to a lower harmonic intensity as well as a lower noise level in the spectra. In the case of L = 400 mm, there are also subharmonic peaks visible at 1/3 f_o, 2/3 f_o, 4/3 f_o, 5/3 f_o, and so on (with a fundamental frequency of f_o = 225 Hz). In this case, the mode at 225 Hz is the strongest, while the 150 Hz mode is still visible in the spectrum. As can be seen in the spectra, the subharmonic peak at 2/3 f_o coincides perfectly with the f_o-peak at L = 200 mm. Therefore, this 150 Hz mode, as well as the peak at ∼75 Hz, can be interpreted as subharmonic peaks. Similarly, Titze [8] found the occurrence of subharmonic peaks at crossings of f_o and f_R1 in a computational model studying the interaction of supraglottal acoustics and vocal fold oscillation. Kniesburges et al. [22] also observed the appearance of subharmonic peaks in the supraglottal aerodynamic pressure, as well as in the far field acoustic pressure, in a synthetic larynx model. They attributed the subharmonic peaks to small changes in the supraglottal jet location from one oscillation cycle to the next due to the supraglottal vortex changing direction from cycle to cycle. This, however, is not the mechanism at work in the present study: if the change in rotational direction of the supraglottal vortex were the reason for the subharmonic peaks in our spectra, they would need to occur especially in the case of L = 200 mm, as here we have a symmetric mean velocity field (see Figure 10), indicating a 50:50 distribution of upwards and downwards deflection. In the case of L = 400 mm, the supraglottal vortex is more stable, resulting in a 100% downwards deflection of the jet. This suggests that the subharmonic peaks are not produced by the supraglottal jet location in our case.

3.3. Aeroacoustic Sources

To investigate the efficiency of the phonation process in the different cases, an aeroacoustic source term computation has been performed on the PIV measurements. For this, the Lighthill analogy was chosen. Root mean square (RMS) values Φ and an aeroacoustic efficiency η were computed as described by Equations (3) and (5). They are shown in Figure 12A and Figure 12B, respectively. It can be seen that Φ decreases with increasing length. This can be expected, as the aeroacoustic source intensity is dependent on the volume flow \dot{V}, which generally decreases with increasing length of the supraglottal channel. A similar trend can also be seen in the total subglottal pressure P shown in Figure 12C, which also shows a decrease with increasing length of the channel. Furthermore, the aeroacoustic efficiency η shows a slight decrease with increasing length up to a length of L = 500 mm. For larger L, η is rather constant.

Lighthill [24,47] showed that the efficiency of sound generation in free turbulent flows, without the influence of solid walls in the flow domain, scales with the fifth power of the Mach number. To compare our results to this scaling law, Figure 12B also shows a theoretical aeroacoustic efficiency η_theor based on this fifth-power law. The case with L = 200 mm is chosen as the baseline: the proportionality constant of the power law is set so that η = η_theor at L = 200 mm. For the other cases, η_theor is then scaled with the fifth power of the corresponding bulk Mach number. For the channel lengths L ≤ 340 mm, this law agrees reasonably well with the measurement data. It starts to deviate from the data for larger L and differs strongly for L ≥ 600 mm. From Figure 5 we know that this is also the length region where f_o is close to f_R1. This suggests that the acoustic resonance of the vocal tract increases the aeroacoustic source intensity strongly, by more than one order of magnitude. It also enhances the vocal fold oscillation, as the total subglottal pressure needed for stable oscillation drops sharply, by approximately 900 Pa.

Overall, at approximately 1%, the aeroacoustic efficiency η is higher than what Lighthill reported for free turbulent flows. This might be explained by the assumptions we had to make because of the missing information in the third spatial dimension: the assumed rotational symmetry of the jet flow leads to a strong correlation in the circumferential direction, which increases the aeroacoustic efficiency. Another reason is that Lighthill did not take into account stationary or moving walls in the vicinity of the flow field. Such walls are known to greatly increase the efficiency of sound production [48,49]. Despite these uncertainties, the η-values found are still useful for a relative comparison between the configurations.
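The fifth-power scaling used for η_theor can be illustrated with a minimal sketch. It is not the evaluation code of this study. It assumes that the bulk Mach number is proportional to the volume flow (fixed channel cross-section, constant speed of sound), so that the Mach number ratio equals the ratio of the flow rates listed in Table 1. The names `FLOW_RATE_L_PER_MIN` and `theoretical_efficiency_ratio` are hypothetical.

```python
# Volume flow rates in l/min for each vocal tract length in mm (values from Table 1).
FLOW_RATE_L_PER_MIN = {
    200: 124, 240: 120, 300: 123, 340: 120, 400: 107,
    500: 99, 600: 58, 700: 55, 800: 46,
}


def theoretical_efficiency_ratio(length_mm, baseline_mm=200):
    """Return eta_theor(L) / eta(L_baseline) under the fifth-power law.

    Assumption (not taken from the paper): the bulk Mach number is
    proportional to the volume flow, so the Mach number ratio equals
    the flow rate ratio.
    """
    mach_ratio = FLOW_RATE_L_PER_MIN[length_mm] / FLOW_RATE_L_PER_MIN[baseline_mm]
    return mach_ratio ** 5


# Print the predicted efficiency relative to the L = 200 mm baseline.
for length_mm in sorted(FLOW_RATE_L_PER_MIN):
    ratio = theoretical_efficiency_ratio(length_mm)
    print(f"L = {length_mm} mm: eta_theor / eta_200 = {ratio:.3f}")
```

Under these assumptions, the predicted efficiency for L = 600 mm falls to roughly 2% of the baseline value, whereas the measured η stays roughly constant for large L. This is the kind of gap of more than one order of magnitude discussed above.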

3.4. Acoustic Radiation

Figure 13 shows the acoustic spectra measured by the microphone for the three main vocal tract lengths L. The difference in the f_o values is apparent. Furthermore, for L = 400 mm strong subharmonic peaks are visible. These subharmonic peaks can be directly related to the aerodynamic flow field, as they occur at the same frequencies as in the aerodynamic spectrum of Figure 11. Also notable is the much lower overall noise level for L = 700 mm. In contrast to the PIV-related spectra, however, the peak height at f_o is at a very similar level in all three measurements. This is due to the acoustic pressure transfer function of the vocal tract. As f_o is very close to a resonance frequency of the vocal tract in the L = 400 mm and L = 700 mm cases, the transfer function of the vocal tract is very large at those frequencies, leading to an amplification compared to the other frequencies.

To obtain a better insight into the changes in the acoustic radiation with changing L, several parameters were computed from the microphone data. Figure 14A shows the overall sound pressure level (SPL) as a function of L. For most L, the SPL lies in a range between 81.8 dB and 84 dB. The one outlier is found for L = 400 mm, where the SPL rises to almost 95 dB. In this case, two amplifying characteristics coincide. First, owing to the large P and \dot{V} values, the aeroacoustic source intensity and efficiency are very high (see Figure 12A,B). Second, f_o is close to f_R1, leading to an amplification of the harmonic content in the acoustics. Therefore, a strong increase in the SPL can be expected. This also leads to a high vocal efficiency (VE, calculated following [50]), as seen in Figure 14B. Generally, the cases where f_o is close to f_R1 show a higher vocal efficiency than the rest of the cases, while also showing higher values for the SNR [51] and the HNR [52], as computed by the GAT (Figure 14C). For the case of L = 400 mm, however, the differences in SNR and HNR compared to the other L are much smaller than those in SPL and VE. This could be related to the strong subharmonic content in the acoustic spectrum, which leads to an erroneous noise content estimation. For the same reason, the CPP [53,54] computed by GAT and displayed in Figure 14D is very low for this case. However, the CPP generally increases with increasing L, which indicates a decrease in noise relative to the tonal sound components.

An outlier can be found for L = 600 mm. Here, some high-intensity, low-frequency noise happened to disturb the acoustic signal at acquisition time, leading to a large low-frequency noise content and thus a strong decrease in HNR for this length. As this noise increased the overall SPL, the vocal efficiency also appears increased here. Generally, the SNR values found for the cases without acoustic back-coupling (small L) lie in the range typical for a healthy voice [51]. With increasing back-coupling, the SNR increases even more, leading to an improved voice quality in these cases. The HNR values reported are in the range of values found for ex vivo studies in the literature [55,56]. They are, however, at the lower end of what is normally found in vivo [57,58,59]. In our case, this might be attributed to the high volume flow rates needed for the synthetic larynx model to oscillate, leading to increased turbulent broadband noise generation. The high volume flow rate is also responsible for the rather low VE values for all L compared to in vivo data [50].
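As an illustration of the overall SPL shown in Figure 14A, the following minimal sketch computes a sound pressure level from a sampled pressure signal. The text does not state how the SPL was evaluated, so the standard definition with a reference pressure of 20 µPa is assumed here. The function name `overall_spl` and the test tone are hypothetical.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa (standard value for airborne sound)


def overall_spl(pressure_samples_pa):
    """Overall sound pressure level in dB re 20 µPa from pressure samples in Pa."""
    mean_square = sum(p * p for p in pressure_samples_pa) / len(pressure_samples_pa)
    return 10.0 * math.log10(mean_square / P_REF ** 2)


# Hypothetical test signal: a 150 Hz tone with 1 Pa RMS, sampled at 48 kHz for 1 s.
sample_rate = 48000
amplitude = math.sqrt(2.0)  # peak amplitude of a sine with 1 Pa RMS
tone = [
    amplitude * math.sin(2.0 * math.pi * 150.0 * n / sample_rate)
    for n in range(sample_rate)
]
print(f"{overall_spl(tone):.1f} dB")
```

A tone with 1 Pa RMS corresponds to about 94 dB, which is close to the level reported for L = 400 mm. The other configurations, at 81.8 dB to 84 dB, correspond to RMS pressures that are lower by roughly a factor of three to four.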

4. Conclusions

Acoustic back-coupling in the human phonation process has been investigated using a synthetic larynx model and PIV measurements. The vocal tract length was changed systematically in the range L ∈ [200, 800] mm to vary the relation between the fundamental frequency of vocal fold oscillation f_o and the lowest resonance frequency of the supraglottal channel f_R1. The measurements showed that when the two frequencies are close to each other, f_o is tuned to f_R1. Decreasing f_R1 by increasing L led to a decrease in f_o as well. Increasing f_R1 to a value higher than the uninfluenced f_o generally did not increase f_o. One exception was the case of L = 400 mm. Here, the vocal folds changed their vibration mode, triggered by the acoustic standing waves in the vocal tract having a frequency similar to the eigenfrequency of this mode. The acoustic resonance frequency of the vocal tract did not change the overall characteristics of the supraglottal aerodynamics. However, a changed vocal fold oscillation frequency naturally also changed the dominant frequency in the pulsatile flow field. Looking at the aeroacoustic sources revealed that matching of f_o and f_R1 resulted in a more than tenfold increase in aeroacoustic efficiency. This also led to an overall increased vocal efficiency, as well as increased SNR, HNR and CPP of the acoustic radiation. This indicates that, in this configuration, a person phonates with higher quality and efficiency. For the professional female singing voice, where f_o and f_R1 matching can occur at frequencies of approximately 500 Hz, this also means that the singer can phonate longer when f_o and f_R1 match. This matching is also facilitated by the automatic tuning of f_o to f_R1 that we observed when the two frequencies are close.

The elongation of the vocal tract is a simplified approach to studying the phonation behavior for different f_o-f_R1 configurations. In reality, f_o-f_R1 matching only occurs when f_o is close to the lowest resonance frequency of the vocal tract, which lies in the range of 500 Hz, as stated above. This predominantly occurs in children and in the female singing voice [15]. To increase realism, future work could use more advanced synthetic vocal fold models that show f_o values in this range. Then, more anatomically realistic vocal tract shapes, e.g., from MRI scans [60], could also be applied, leading to an overall more realistic configuration. Furthermore, the application of tomographic PIV or Lagrangian particle tracking methods could enhance the accuracy of the aeroacoustic source term computation. In our case, the overall high aeroacoustic efficiency could be attributed to the rotational symmetry we assumed along the channel axis to obtain some information for the missing third spatial dimension. Tomographic methods would render this assumption unnecessary, improving the accuracy of our evaluation method.

Author Contributions

Conceptualization, C.N. and S.K.; Investigation, C.N. and B.T.; Methodology, C.N. and S.S.; Project Administration, C.N.; Resources, S.K., B.T. and S.B.; Software, C.N. and S.S.; Writing – original draft, C.N., S.K. and S.S.; Writing – review and editing, C.N., S.K. and S.S.; Visualization, C.N.; Supervision, S.B. and S.K.; Funding acquisition, S.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work is funded by the German Research Foundation (DFG) through the project “Tracing the mechanisms that generate tonal content in voiced speech”. Project number: 446965891.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are not publicly available due to ongoing research in this field.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

CPP	Cepstral Peak Prominence
FSAI	Fluid–Structure–Acoustic Interaction
GAT	Glottis Analysis Tools
HNR	Harmonics-to-Noise Ratio
PIV	Particle Image Velocimetry
POD	Proper Orthogonal Decomposition
PSD	Power Spectral Density
RMS	Root Mean Square
ROI	Region Of Interest
SNR	Signal-to-Noise Ratio
SPL	Sound Pressure Level
VE	Vocal Efficiency

References

1. Mittal, R.; Erath, B.D.; Plesniak, M.W. Fluid Dynamics of Human Phonation and Speech. Annu. Rev. Fluid Mech. 2013, 45, 437–467.
2. Bodaghi, D.; Xue, Q.; Zheng, X.; Thomson, S. Effect of Subglottic Stenosis on Vocal Fold Vibration and Voice Production Using Fluid–Structure–Acoustics Interaction Simulation. Appl. Sci. 2021, 11, 1221.
3. Döllinger, M.; Zhang, Z.; Schoder, S.; Šidlof, P.; Tur, B.; Kniesburges, S. Overview on state-of-the-art numerical modeling of the phonation process. Acta Acust. 2023, 7, 25.
4. Schoder, S.; Maurerlehner, P.; Wurzinger, A.; Hauser, A.; Falk, S.; Kniesburges, S.; Döllinger, M.; Kaltenbacher, M. Aeroacoustic Sound Source Characterization of the Human Voice Production-Perturbed Convective Wave Equation. Appl. Sci. 2021, 11, 2614.
5. Lodermeyer, A.; Bagheri, E.; Kniesburges, S.; Näger, C.; Probst, J.; Döllinger, M.; Becker, S. The mechanisms of harmonic sound generation during phonation: A multi-modal measurement-based approach. J. Acoust. Soc. Am. 2021, 150, 3485–3499.
6. Titze, I.R.; Story, B.H. Acoustic interactions of the voice source with the lower vocal tract. J. Acoust. Soc. Am. 1997, 101, 2234–2243.
7. Fant, G. Acoustic Theory of Speech Production; De Gruyter Mouton: The Hague, Netherlands, 1971.
8. Titze, I.R. Nonlinear source–filter coupling in phonation: Theory. J. Acoust. Soc. Am. 2008, 123, 2733–2749.
9. Howe, M.S.; McGowan, R.S. Sound generated by aerodynamic sources near a deformable body, with application to voiced speech. J. Fluid Mech. 2007, 592, 367–392.
10. McGowan, R.S.; Howe, M.S. Source-tract interaction with prescribed vocal fold motion. J. Acoust. Soc. Am. 2012, 131, 2999–3016.
11. Zañartu, M.; Mongeau, L.; Wodicka, G.R. Influence of acoustic loading on an effective single mass model of the vocal folds. J. Acoust. Soc. Am. 2007, 121, 1119–1129.
12. Lucero, J.C.; Lourenço, K.G.; Hermant, N.; Hirtum, A.V.; Pelorson, X. Effect of source–tract acoustical coupling on the oscillation onset of the vocal folds. J. Acoust. Soc. Am. 2012, 132, 403–411.
13. Erath, B.D.; Peterson, S.D.; Weiland, K.S.; Plesniak, M.W.; Zañartu, M. An acoustic source model for asymmetric intraglottal flow with application to reduced-order models of the vocal folds. PLoS ONE 2019, 14, e0219914.
14. Wade, L.; Hanna, N.; Smith, J.; Wolfe, J. The role of vocal tract and subglottal resonances in producing vocal instabilities. J. Acoust. Soc. Am. 2017, 141, 1546–1559.
15. Echternach, M.; Herbst, C.T.; Köberlein, M.; Story, B.; Döllinger, M.; Gellrich, D. Are source-filter interactions detectable in classical singing during vowel glides? J. Acoust. Soc. Am. 2021, 149, 4565–4578.
16. Zhang, Z.; Neubauer, J.; Berry, D.A. Influence of vocal fold stiffness and acoustic loading on flow-induced vibration of a single-layer vocal fold model. J. Sound Vib. 2009, 322, 299–313.
17. Smith, B.L.; Nemcek, S.P.; Swinarski, K.A.; Jiang, J.J. Nonlinear Source-Filter Coupling Due to the Addition of a Simplified Vocal Tract Model for Excised Larynx Experiments. J. Voice 2013, 27, 261–266.
18. Migimatsu, K.; Tokuda, I.T. Experimental study on nonlinear source–filter interaction using synthetic vocal fold models. J. Acoust. Soc. Am. 2019, 146, 983–997.
19. Oren, L.; Khosla, S.; Gutmark, E. Intraglottal geometry and velocity measurements in canine larynges. J. Acoust. Soc. Am. 2014, 135, 380–388.
20. Oren, L.; Khosla, S.; Gutmark, E. Intraglottal pressure distribution computed from empirical velocity data in canine larynx. J. Biomech. 2014, 47, 1287–1293.
21. Lodermeyer, A.; Becker, S.; Döllinger, M.; Kniesburges, S. Phase-locked flow field analysis in a synthetic human larynx model. Exp. Fluids 2015, 56, 77.
22. Kniesburges, S.; Lodermeyer, A.; Becker, S.; Traxdorf, M.; Döllinger, M. The mechanisms of subharmonic tone generation in a synthetic larynx model. J. Acoust. Soc. Am. 2016, 139, 3182–3192.
23. Lodermeyer, A.; Tautz, M.; Becker, S.; Döllinger, M.; Birk, V.; Kniesburges, S. Aeroacoustic analysis of the human phonation process based on a hybrid acoustic PIV approach. Exp. Fluids 2018, 59, 13.
24. Lighthill, M. On sound generated aerodynamically I. General theory. Proc. Roy. Soc. Lond. 1952, 211, 564–587.
25. Kaltenbacher, M.; Hüppe, A.; Reppenhagen, A.; Zenger, F.; Becker, S. Computational Aeroacoustics for Rotating Systems with Application to an Axial Fan. AIAA J. 2017, 55, 3831–3838.
26. de Luzan, C.F.; Oren, L.; Maddox, A.; Gutmark, E.; Khosla, S.M. Volume velocity in a canine larynx model using time-resolved tomographic particle image velocimetry. Exp. Fluids 2020, 61, 63.
27. Scherer, R.C.; Shinwari, D.; Witt, K.J.D.; Zhang, C.; Kucinschi, B.R.; Afjeh, A.A. Intraglottal pressure profiles for a symmetric and oblique glottis with a divergence angle of 10 degrees. J. Acoust. Soc. Am. 2001, 109, 1616–1630.
28. Thomson, S.L.; Mongeau, L.; Frankel, S.H. Aerodynamic transfer of energy to the vocal folds. J. Acoust. Soc. Am. 2005, 118, 1689–1700.
29. Zhang, Z.; Neubauer, J.; Berry, D.A. The influence of subglottal acoustics on laboratory models of phonation. J. Acoust. Soc. Am. 2006, 120, 1558–1569.
30. Durst, F.; Heim, U.; Ünsal, B.; Kullik, G. Mass flow rate control system for time-dependent laminar and turbulent flow investigations. Meas. Sci. Technol. 2003, 14, 893–902.
31. Kist, A.M.; Gómez, P.; Dubrovskiy, D.; Schlegel, P.; Kunduk, M.; Echternach, M.; Patel, R.; Semmler, M.; Bohr, C.; Dürr, S.; et al. A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis. J. Speech Lang. Hear. Res. 2021, 64, 1889–1903.
32. Maryn, Y.; Verguts, M.; Demarsin, H.; van Dinther, J.; Gomez, P.; Schlegel, P.; Döllinger, M. Intersegmenter Variability in High-Speed Laryngoscopy-Based Glottal Area Waveform Measures. Laryngoscope 2020, 130, E654–E661.
33. Raffel, M.; Willert, C.E.; Scarano, F.; Kähler, C.J.; Wereley, S.T.; Kompenhans, J. Particle Image Velocimetry; Springer International Publishing: Berlin/Heidelberg, Germany, 2018.
34. Samimy, M.; Lele, S.K. Motion of particles with inertia in a compressible free shear layer. Phys. Fluids A Fluid Dyn. 1991, 3, 1915–1923.
35. Mendez, M.; Raiola, M.; Masullo, A.; Discetti, S.; Ianiro, A.; Theunissen, R.; Buchlin, J.M. POD-based background removal for particle image velocimetry. Exp. Therm. Fluid Sci. 2017, 80, 181–192.
36. Adatrao, S.; Sciacchitano, A. Elimination of unsteady background reflections in PIV images by anisotropic diffusion. Meas. Sci. Technol. 2019, 30, 035204.
37. Westerweel, J.; Scarano, F. Universal outlier detection for PIV data. Exp. Fluids 2005, 39, 1096–1100.
38. Sondhi, M.; Schroeter, J. A hybrid time-frequency domain articulatory speech synthesizer. IEEE Trans. Acoust. Speech Signal Process. 1987, 35, 955–967.
39. Story, B.H.; Laukkanen, A.M.; Titze, I.R. Acoustic impedance of an artificially lengthened and constricted vocal tract. J. Voice 2000, 14, 455–469.
40. Flanagan, J.L. Speech Analysis, Synthesis and Perception; Springer: Berlin/Heidelberg, Germany, 1972; p. 444.
41. Fulcher, L.; Lodermeyer, A.; Kähler, G.; Becker, S.; Kniesburges, S. Geometry of the Vocal Tract and Properties of Phonation near Threshold: Calculations and Measurements. Appl. Sci. 2019, 9, 2755.
42. Titze, I. Principles of Voice Production; Prentice Hall: Hoboken, NJ, USA, 1994.
43. Neubauer, J.; Zhang, Z.; Miraghaie, R.; Berry, D.A. Coherent structures of the near field flow in a self-oscillating physical model of the vocal folds. J. Acoust. Soc. Am. 2007, 121, 1102–1118.
44. Erath, B.D.; Plesniak, M.W. The occurrence of the Coanda effect in pulsatile flow through static models of the human vocal folds. J. Acoust. Soc. Am. 2006, 120, 1000–1011.
45. Erath, B.D.; Plesniak, M.W. An investigation of asymmetric flow features in a scaled-up driven model of the human vocal folds. Exp. Fluids 2010, 49, 131–146.
46. Erath, B.D.; Plesniak, M.W. Impact of wall rotation on supraglottal jet stability in voiced speech. J. Acoust. Soc. Am. 2011, 129, EL64–EL70.
47. Lighthill, M.J. On sound generated aerodynamically II. Turbulence as a source of sound. Proc. R. Soc. London. Ser. A. Math. Phys. Sci. 1954, 222, 1–32.
48. Howe, M.S. Acoustics of Fluid-Structure Interactions; Cambridge University Press: Cambridge, UK, 1998.
49. Howe, M.; McGowan, R. Aerodynamic sound of a body in arbitrary, deformable motion, with application to phonation. J. Sound Vib. 2013, 332, 3909–3923.
50. Titze, I.R.; Maxfield, L.; Palaparthi, A. An Oral Pressure Conversion Ratio as a Predictor of Vocal Efficiency. J. Voice 2016, 30, 398–406.
51. Qi, Y.; Hillman, R.E.; Milstein, C. The estimation of signal-to-noise ratio in continuous speech for disordered voices. J. Acoust. Soc. Am. 1999, 105, 2532–2535.
52. Yumoto, E.; Sasaki, Y.; Okamura, H. Harmonics-to-Noise Ratio and Psychophysical Measurement of the Degree of Hoarseness. J. Speech Lang. Hear. Res. 1984, 27, 2–6.
53. Hillenbrand, J.; Cleveland, R.A.; Erickson, R.L. Acoustic Correlates of Breathy Vocal Quality. J. Speech Lang. Hear. Res. 1994, 37, 769–778.
54. Hillenbrand, J.; Houde, R.A. Acoustic Correlates of Breathy Vocal Quality: Dysphonic Voices and Continuous Speech. J. Speech Lang. Hear. Res. 1996, 39, 311–321.
55. Semmler, M.; Berry, D.A.; Schützenberger, A.; Döllinger, M. Fluid-structure-acoustic interactions in an ex vivo porcine phonation model. J. Acoust. Soc. Am. 2021, 149, 1657–1673.
56. Peters, G.; Jakubaß, B.; Weidenfeller, K.; Kniesburges, S.; Böhringer, D.; Wendler, O.; Mueller, S.K.; Gostian, A.O.; Berry, D.A.; Döllinger, M.; et al. Synthetic mucus for an ex vivo phonation setup: Creation, application, and effect on excised porcine larynges. J. Acoust. Soc. Am. 2022, 152, 3245–3259.
57. Gorris, C.; Maccarini, A.R.; Vanoni, F.; Poggioli, M.; Vaschetto, R.; Garzaro, M.; Valletti, P.A. Acoustic Analysis of Normal Voice Patterns in Italian Adults by Using Praat. J. Voice 2020, 34, 961.e9–961.e18.
58. Gojayev, E.K.; Büyükatalay, Z.Ç.; Akyüz, T.; Rehan, M.; Dursun, G. The Effect of Masks and Respirators on Acoustic Voice Analysis during the COVID-19 Pandemic. J. Voice 2021.
59. Nguyen, D.D.; Madill, C. Auditory-perceptual Parameters as Predictors of Voice Acoustic Measures. J. Voice 2023.
60. Story, B.H.; Titze, I.R.; Hoffman, E.A. Vocal tract area functions from magnetic resonance imaging. J. Acoust. Soc. Am. 1996, 100, 537–554.

Figure 1. 2D representation of the vocal fold model used, with its dimensions given in mm. The flow direction in the experiment is indicated.

Figure 2. 2D cut through the experimental setup. The vocal fold position is indicated between the vocal tract and the subglottal channel. A silencer is placed upstream to attenuate emerging sound in the inflow hose. The flow direction is from left to right.

Figure 3. The setup for the PIV measurements. The flow was visualized using tracer particles and laser double pulses at a repetition rate of 2 × 5 kHz. The rectangular section of the vocal tract provided optical access through a glass window, allowing the flow to be recorded with a high-speed camera.

Figure 4. The different background removal techniques. The original images (A) are processed by POD (B) and anisotropic diffusion (C). The results are combined into one image (D), where the anisotropic diffusion image is used at the glottis, while the POD image is used in the other regions. The images were inverted to enhance visibility.

Figure 5. The vocal tract input impedance Z_in (computed via transmission line model), shown as a color map as a function of frequency f and vocal tract length L. The location of the vocal tract resonance frequency f_R1 appears as a bright yellow line in the plot. Superimposed are the oscillation frequencies f_o of the individual measurements with different lengths of the supraglottal channel.

Figure 6. High-speed camera recordings of the vocal fold oscillation for the three different vocal tract lengths of L = 200 mm, L = 400 mm and L = 700 mm, respectively. Every row shows the recording for one length over one oscillation period T. Due to the different f_o-values, the actual time steps between two images are different in each row. The values for f_o and f_R1 for all three cases are displayed to the right of their respective image series.

Figure 7. Instantaneous flow fields for 12 different time steps at a vocal tract length of L = 200 mm. The time steps of the snapshots are shown in their respective top right corner.

Figure 8. Instantaneous flow fields for 12 different time steps at a vocal tract length of L = 400 mm. The time steps of the snapshots are shown in their respective top right corner.

Figure 9. Instantaneous flow fields for 12 different time steps at a vocal tract length of L = 700 mm. The time steps of the snapshots are shown in their respective top right corner. Note the changed color map limits compared to Figure 7 and Figure 8 for improved visibility.

Figure 10. Mean velocity profiles for L = 200 mm, L = 400 mm and L = 700 mm.

Figure 11. Averaged power spectral density of the flow field obtained by PIV for the three different vocal tract lengths of L = 200 mm, L = 400 mm and L = 700 mm, respectively.

Figure 12. RMS aeroacoustic source level Φ (A), aeroacoustic efficiency η (B) and the total subglottal pressure P (C) for all vocal tract lengths L investigated.

Figure 13. Acoustic spectra for the three different vocal tract lengths of L = 200 mm, L = 400 mm and L = 700 mm, respectively.

Figure 14. Sound pressure level (A), vocal efficiency (B), signal-to-noise ratio and harmonics-to-noise ratio (both C) and cepstral peak prominence (D) for all vocal tract lengths L investigated.

Table 1. The main measurement quantities of the PIV measurements. L denotes the length of the supraglottal channel, \dot{V} the flow rate, P_trans the transglottal pressure difference, f_o the oscillation frequency of the vocal folds and f_R1 the first resonance frequency of the vocal tract as computed via transmission line model.

L in mm	$\dot{V}$ in l/min	$P_{trans}$ in Pa	$f_{o}$ in Hz	$f_{R 1}$ in Hz
200	124	4208	151.3	527
240	120	4100	152.9	433
300	123	4188	151.3	337
340	120	4122	150.5	293
400	107	3791	225.7	244
500	99	3617	158.6	190
600	58	2659	158.6	156
700	55	2527	136.8	132
800	46	2266	119.0	114
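The proximity of f_o to f_R1 can be quantified directly from the values in Table 1. The following minimal sketch computes the relative detuning |f_o − f_R1| / f_R1 for each length and lists the configurations below a threshold. The 10% threshold and the names `relative_detuning` and `coupled_lengths` are hypothetical choices for illustration. They are not criteria used in this study.

```python
# (L in mm, f_o in Hz, f_R1 in Hz), copied from Table 1.
TABLE_1 = [
    (200, 151.3, 527),
    (240, 152.9, 433),
    (300, 151.3, 337),
    (340, 150.5, 293),
    (400, 225.7, 244),
    (500, 158.6, 190),
    (600, 158.6, 156),
    (700, 136.8, 132),
    (800, 119.0, 114),
]


def relative_detuning(f_o, f_r1):
    """Relative distance between oscillation and resonance frequency."""
    return abs(f_o - f_r1) / f_r1


def coupled_lengths(rows, threshold=0.10):
    """Lengths whose detuning is below the (hypothetical) threshold."""
    return [
        length
        for length, f_o, f_r1 in rows
        if relative_detuning(f_o, f_r1) < threshold
    ]


for length, f_o, f_r1 in TABLE_1:
    print(f"L = {length} mm: detuning = {relative_detuning(f_o, f_r1):.1%}")
print("Close to resonance:", coupled_lengths(TABLE_1))
```

With this threshold, the sketch flags L = 400 mm and the lengths L ≥ 600 mm, which are the configurations discussed in the text as having f_o close to f_R1.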

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Näger, C.; Kniesburges, S.; Tur, B.; Schoder, S.; Becker, S. An Investigation of Acoustic Back-Coupling in Human Phonation on a Synthetic Larynx Model. Bioengineering 2023, 10, 1343. https://doi.org/10.3390/bioengineering10121343

