Impact of the Sub-Grid Scale Turbulence Model in Aeroacoustic Simulation of Human Voice

: In an aeroacoustic simulation of human voice production, the effect of the sub-grid scale (SGS) model on the acoustic spectrum was investigated. In the ﬁrst step, incompressible airﬂow in a 3D model of larynx with vocal folds undergoing prescribed two-degree-of-freedom oscillation was simulated by laminar and Large-Eddy Simulations (LES), using the One-Equation and Wall-Adaptive Local-Eddy (WALE) SGS models. Second, the aeroacoustic sources and the sound propagation in a domain composed of the larynx and vocal tract were computed by the Perturbed Convective Wave Equation (PCWE) for vowels [u:] and [i:]. The results show that the SGS model has a signiﬁcant impact not only on the ﬂow ﬁeld, but also on the spectrum of the sound sampled 1 cm downstream of the lips. With the WALE model, which is known to handle the near-wall and high-shear regions more precisely, the simulations predict signiﬁcantly higher peak volumetric ﬂow rates of air than those of the One-Equation model, only slightly lower than the laminar simulation. The usage of the WALE SGS model also results in higher sound pressure levels of the higher harmonic frequencies.


Introduction
Generation of the human voice is a highly complex biophysical process, where the viscoelastic multi-layered tissues of the vocal folds interact with the airflow expired from the lungs, start to self-oscillate and close the channel periodically. The vocal fold oscillation and glottal closure modulate the mass flux, create complex turbulent structures and pressure disturbances, which form the voice source. This source signal is modulated by the vocal tract, radiated from the mouth, and is perceived as a human voice. The physiological principles are well described in the monograph by Titze [1].
The acoustic resonances of the subglottal spaces (trachea, bronchi, and bronchioles) and supraglottal part (the vocal tract) can enhance the driving pressures of the vocal folds and the glottal flow, thereby modifying the energy level of sound sources. The voice generation is governed by a three-way interaction between the structure, airflow and acoustics. The dominant aeroacoustic sound sources are located within the glottis and the supraglottal region. Three characteristics of flow-induced sound are defined in terms of free-field radiation-first, vocal fold oscillation induces a monopole radiation pattern. Second, the interaction of the vocal folds and the air jet creates surface pressure fluctuations that radiate like a dipole. Third, the turbulent structures located downstream of the glottal constriction have a quadrupolar radiation pattern [2,3]. The sound due to turbulence has a significant influence on the broadband noise at higher frequencies.
The principles of voice generation may be studied by in vivo, excised larynx or synthetic vocal fold measurements [4,5], or by numerical modelling. During recent decades, numerical simulation of human phonation has advanced considerably towards a state where full-scale aeroacoustic simulations on realistic computed tomography (CT)-or magnetic resonance imaging (MRI)-based geometries are possible. It can be anticipated that soon these simulations could be used for subject-specific pre-surgical predictions of vocal fold oscillations and resulting voice quality for people suffering from various vocal fold dysfunctions [6,7] or for the development of vocal fold prostheses. However, though powerful numerical simulation tools are available, the numerical methods still face challenges due to the nonlinear phenomena both in solid and fluid mechanics. This study focuses on the impact of the numerical approach to turbulence modeling on the aeroacoustic simulation of voice generation.
Turbulence is an inherent property of medium and high Reynolds number flows, where the energy of the large flow structures is transferred in a cascade of scales towards the smallest vortices, where the turbulent kinetic energy dissipates into heat. Measurements on excised canine larynges suggest that the airflow in trachea, subglottal and intraglottal space is typically laminar, but encounters transition to turbulence shortly downstream of the glottis [8][9][10]. The vortex dynamics in the supraglottal spaces and vocal tract are governed by turbulent momentum transfer. Turbulence also induces a broadband sound source, which is important especially in the case of breathy phonation, but is also significant in the case of modal voice [11].
In the computational fluid dynamics (CFD) of turbulent flows, three approaches are used. First, the Direct Numerical Simulations (DNS), that is, straightforward discretization of the Navier-Stokes equations on a sufficiently fine computational mesh, where all turbulent scales up to the smallest dissipating vortices are resolved. Even for moderate Reynolds numbers encountered in laryngeal airflow, this type of simulation is prohibitively expensive in terms of computational requirements. Sometimes, the term "laminar simulation" is, somewhat incorrectly, used for a DNS of flow using a coarse mesh. This is correct for purely laminar flow with no turbulence. However, using the "laminar" model for turbulent flow, actually, a DNS on a coarse grid unable to capture the small-scale fluid motions, introduces error since the influence of the sub-grid vortices on the large-scale (resolved) turbulent motions is neglected. The second approach is the current industry standard, unsteady Reynolds-Averaged Navier-Stokes equations (RANS), which completely gives up resolving the turbulent fluctuations and dynamic evolution of vortical structures, and aims to calculate the mean, time-averaged flow. The influence of turbulence on the mean flow is modeled using some of the plethoras of more or less complex turbulence models. Clearly, RANS is unsuitable for aeroacoustic simulations of voice where the unsteady turbulent motions represent a crucial portion of the aeroacoustic sources.
With rigorous DNS being infeasible and RANS inapplicable, numerical modeling of the aeroacoustic principles of voice production can use either the laminar simulations (such as in [12][13][14][15][16][17][18][19][20]) or the third, arguably most promising approach-the Large Eddy Simulation (LES). LES may be regarded as a compromise between RANS, where the entire effect of turbulence is modeled, and DNS, where all the turbulent scales are resolved. The LES concept resolves the large, anisotropic energy-carrying fluid motions and models the effect of sub-grid scale, largely isotropic turbulent structures. Large Eddy Simulations are still computationally expensive, especially if the boundary layer is to be resolved properly. However, current parallel computational resources make this approach viable for low and moderate Reynolds-number flows. In the numerical simulation of human phonation, the LES approach has been used in recent years. One of the first was the work of Suh and Frankel [21], who used compressible LES and Ffowcs Williams-Hawkings acoustic analogy in a static model of human glottis for far-field sound predictions. The work by Mihaescu [22,23] employed the LES capability of ANSYS Fluent (ANSYS, Canonsburg, PA/USA) to study the laryngeal airflow both during phonation and inspiration. The work of Schwarze et al. [24] explores a variant called Implicit LES, where the intrinsic dissipation of the numerical method is assumed to act as a sub-grid scale (SGS) model. Another compressible LES simulation on a static glottis was published by de Luzan et al. [25]. Recently, Sadeghi [6,7,26] simulated the laryngeal flow and effect of ventricular folds using the LES feature of STAR-CCM+ (Siemens PLM Software, Plano, TX/USA) and quantified the computational requirements on parallel architectures. In the study by Schickhofer et al. [27], the same software and similar numerical approach was used to study the influence of the supraglottal coherent structures produced by flow through static glottis on the far-field sound signal on a realistic vocal tract geometry from MRI data.
A systematic comparison of the airflow and sound generated in human larynx, obtained using laminar and LES flow simulations, should help to assess the importance and impact of turbulence modeling in aeroacoustic simulations of voice generation. This study builds on the previous publications [18,28] and compares the acoustic spectra obtained using a laminar simulation (i.e., with no SGS modeling) and two variants of Large Eddy Simulations with different sub-grid scale models-the One-Equation Eddy-Viscosity SGS model [29] and the Wall-Adaptive Local Eddy-Viscosity (WALE) model [30]. The impact of the sub-grid scale model on the spectrum of the generated voice for vowels [u:] and [i:] is analyzed and discussed. This paper is organized into two major sections, describing the fluid dynamic and aeroacoustic models. Section 2.1 explains the Large Eddy Simulation framework, SGS models used and boundary conditions of the mathematical model. This is followed by Sections 2.3 and 2.4 on the geometry, mesh and numerical solution of the CFD problem, and eventually Section 2.5 presents the results of the CFD simulations. Section 3 has a similar structure, explaining first the mathematical model, then the numerical approach and finally the presenting the results of the aeroacoustic simulations.

Mathematical Model
Large Eddy Simulation is a mathematical concept for modeling turbulent flows, which deals with flow structures carrying most kinetic energy, that is, spatially organized large scales. These consist of two main categories: coherent structures and coherent vortices of recognizable shape [31]. In the numerical implementation, the characteristic length ∆, defining a cutoff between resolved scales and modeled sub-grid scales, is usually given by the mesh grid spacing [32].
In the LES concept, any flow variable f (x, t), where x = (x 1 , x 2 , x 3 ) is the spatial coordinate and t time, may be decomposed as dr is the large-scale component, obtained by spatial filtering, and f (x, t) is the small, sub-grid scale contribution. The convolution introduced above contains a filter function G separating spatial scales |r| ≤ ∆/2 from |r| > ∆/2. In current simulations, a top-hat filter is used, which is a common choice in low-order finite volume methods. The effect of sub-grid scale (SGS) contributions on the large flow scales relies on the assumption of isotropic (non-directional) small-scale turbulence and is modeled. The continuity and momentum equations for the incompressible fluid flow, with LES filtering applied, can be written as ∂U i ∂t where U i is the filtered velocity, p represents the filtered static pressure and ν = 1.58 · 10 −5 m 2 s −1 is the kinematic molecular viscosity. The term U i U j is the dyadic product and cannot be expressed directly [33]. Modification of the momentum Equation (3) by The new term on the right-hand side of (4) is the divergence of the sub-grid scale (SGS) stress tensor where the individual tensors are −u i u j the Reynolds-stress-like term, −(U i u j + u i U j ) the Clark term [34] and −(U i U j + U i U j ) the Leonard term [35]. The SGS stress tensor is left to be modeled to close the set of equations. Compared to the Reynolds stresses in RANS, the SGS stresses carry much less of the turbulent energy, and so the accuracy of the model is less important.

Smagorinsky SGS Model
One of the first and simplest SGS models was the Smagorinsky algebraic model [36], based on the eddy-viscosity assumption, which states that the deviatoric part of the SGS stress tensor is proportional to the symmetric part of the velocity gradient tensor The constant of proportionality in this relation, ν t , is called the kinematic sub-grid scale eddy-viscosity (turbulent viscosity), which eventually adds to the kinematic molecular viscosity ν. The Smagorinsky model assumes that the small scales dissipate instantaneously all energy transferred from the resolved scales. From this, Smagorinsky derived that the SGS viscosity may be estimated as where C S ≈ 0.18 is the Smagorinsky constant describing the rate of break-up of isotropic vortices in the viscous subrange of the turbulence energy spectrum, and where the colon operator denotes the double inner product. The main limitation of the Smagorinsky model, which was used in the previous simulations [37], lies in the assumption of local equilibrium between the production and dissipation of turbulent SGS energy. In many real cases, notably, free shear layer flows, separating and reattaching flows, and wall-dominated flows (which is also the case of glottal airflow), this assumption is not fulfilled. This is why more accurate SGS models have been tested in our study, as described in the following sections.

One-Equation SGS Model
The one-equation eddy-viscosity model tries to address the deficiency of the model of Smagorinsky by solving an additional transport equation. Yoshizawa and Horiuti [29] derived the transport equation for the turbulent kinetic SGS energy k SGS in the form where the terms represent the temporal change, convection, diffusion, production and dissipation. Unlike the Smagorinsky model, which disregards the first three of these, the one-equation model takes into account also the history effects for k SGS . The production term, modeling the decay of turbulence from the resolved scales to the SGS scales via the energy cascade, is approximated by The one-equation model again relies on the SGS eddy viscosity concept with the SGS viscosity The model constants are set to C E = 1.048 and C k = 0.094. Both the Smagorinsky and One-Equation models are unable to reproduce the laminar to turbulent transition and tend to over predict the production rate and thus the turbulent viscosity in free shear layers and near the walls. This is caused by the fact that the invariant S ij : S ij is large in the regions of pure shear, because it is only related to the strain rate, not to the rate of rotation [31].

Wall-Adapting Local Eddy-Viscosity SGS Model
The inaccuracy concerning free shear and boundary layer treatment, caused by the previously described SGS models, can be alleviated by using the Wall-Adapting Local Eddy-viscosity (WALE) model [30]. The model considers the term, the traceless symmetric part of the square of the velocity gradient tensor. Firstly, the term s d ij is rewritten with the symmetric part S and the deviatoric part Ω of the velocity gradient Afterwards the Cayley-Hamilton theorem of linear algebra is used on (12), where IV SΩ = S ik S kj Ω jl Ω li . The WALE turbulent viscosity has the following form The term (s d ij s d ij ) 3/2 / (S ij S ij ) 5/2 would not be well conditioned, because the denominator term can be zero for pure shear or rotational strain. The added term (s d ij s d ij ) 5/4 keeps the turbulent viscosity finite. The model constant C w is set to 0.325.

Boundary Conditions
The boundary conditions for the CFD model are summarized in Table 1 (see also Figure 1). The flow is driven by constant pressure difference between the inlet and outlet. The velocity on Γ in and Γ out is computed from the flux. The flow enters at inlet and exits at the outlet or is set to zero in case of backflow. On the fixed channel walls, a no-slip boundary condition is prescribed. On the moving boundaries, the flow velocity is equal to the velocity of the moving vocal fold surface, given by function h(x, t). Details on the vocal fold kinematics can be found in [18]. In the current simulation, the vocal folds oscillate symmetrically with a frequency f o = 100 Hz, amplitudes at the superior and inferior vocal fold margin A 1 = A 2 = 0.3 mm and phase difference ξ 1 = π/2 and ξ 2 = 0.  Boundary

CFD Geometry and Mesh
The computational domain for the CFD simulation represents a simplified model of the human larynx with a rectangular cross-section, consisting of a short subglottal channel, glottal constriction formed by the vocal folds, ventricles, further contraction by the false vocal folds and straight supraglottal channel (see Figure 1). The geometry of the vocal folds is based on the M5 parametric shape by Scherer et al. [38]. The false vocal folds were specified according to data by Agarwal et al. [39]. The dimensions and more details on the geometry can be found in [18].
In wall-bounded flows, the presence of solid walls fundamentally influences the flow dynamics, turbulence generation and transport in the near-wall regions due to large viscous stresses. The accuracy of the numerical simulation is thus closely related to the grid resolution near the fixed walls. According to the classification by Pope [40], LES of wall-bounded flows can be classified as Large-Eddy Simulation with Near-Wall Resolution (LES-NWR) with a grid sufficiently fine to resolve 80% of the turbulent energy in the boundary layer, and Large-Eddy Simulation with Near-Wall Modeling (LES-NWM), which employs a modeling approach similar to RANS in the near-wall region. For these simulations, an important parameter is the wall unit where is the wall shear stress, µ e f f = (µ + µ t ) is the effective dynamic viscosity and y the dimensional distance in normal direction from the wall. The wall unit y + , commonly referred to as "y plus", is used as the dimensionless wall-normal distance. Using the same normalization, x + and and z + denote the dimensionless streamwise and spanwise distances. Wall units are also commonly used to indicate LES adequacy. According to [41,42], in LES-NWR the theoretical limits for the grid spacing adjacent to the wall are 50 ≤ ∆x + ≤ 150, ∆y + < 1 and 15 ≤ ∆z + ≤ 40, with at least 3-5 gridpoints between 0 < y + < 10.
The computational mesh in the current CFD simulation is block-structured in order to capture well the boundary layer and consists of 2.1 M hexahedral elements. The mesh deforms in time due to vocal fold oscillation. The grid resolution in wall units was evaluated in three distinct time instants, corresponding to maximum opening of the vocal folds (t O ), maximum closure during the divergent phase (t N ) and maximum closure during the convergent phase (t C ), see Table 2 and Figure 2. On the boundary Γ bVF at time t C these values were evaluated: y + avg = 1.77, z + = 14 and x + = 8.

Discretization and Numerical Solution
The Navier-Stokes equations were discretized using the collocated cell-centered Finite Volume Method. Fletcher [43] demonstrated that even-ordered derivatives in the truncation error are associated with numerical dissipation, and odd-ordered spatial derivatives are associated with the numerical dispersion in the solution. Ideally, LES simulations should use schemes with low numerical dissipation. The non-dissipative central differencing scheme (CDS), which was applied in this study, allows an accurate representation of the changing flow field [44]. The discretization of the diffusion term is split into an orthogonal and cross-diffusion term, using a procedure described in [45]. Unlike the discretization of the temporal, convective and orthogonal part of the diffusive term, the nonorthogonal correctors are treated explicitly.
The CFD simulations were run in parallel on 20 cores on a computational cluster, composed of nodes with two 10-core Intel Xeon 4114 CPUs with 96 GB RAM. In order to have sufficient resolution in the spectrum of the aeroacoustic signal, a sufficiently long simulation time t = 0.2 s, that is, 20 periods of vocal fold vibration, is needed. For such settings, one CFD simulation required 27-37 days, that is, about 15,000 core-hours of computational time.

CFD Results
The current study reports on the results of three CFD simulations using different turbulence modeling approaches, which are summarized in Table 3. The laminar case "LAM" used no turbulence model. "OE" and "WALE" are LES simulations with the One-Equation and WALE SGS models, respectively. All the simulations were run for 20 periods of vocal fold oscillations, that is, t = 0.02 s. The CFD simulations provide the filtered velocity and pressure fields U, p. For simplicity, the overbars are dropped in the following presentation.  Figure 2 shows the flow rate and glottal opening during the last four simulated cycles of vocal fold oscillation. The predicted peak flow rate in the laminar case is, for the same boundary conditions, higher than in the One-Equation and WALE SGS models by 16.76% and 5.26%, respectively. This is caused by the different values of the SGS viscosity, which add to the molecular viscosity and limit the flow rate through the glottal constriction. The laminar model does not capture the influence of small-scale turbulence, which corresponds to ν t = 0. The WALE SGS model and the One-Equation SGS model compute with non-zero SGS viscosity, with the latter one significantly higher due to the already mentioned deficiency of the One-Equation model, which overestimates the turbulent viscosity near the vocal fold surfaces (see also Figure 6). The flow rate does not reach zero value, since the vocal folds do not fully close, corresponding physiologically to breathy phonation. The velocity and pressure distributions along the laryngeal mid-line are plotted in Figure 3 in three distinctive times (see Table 2). The high-velocity glottal flow corresponds to a low static pressure due to the Bernoulli effect. The three simulations give almost identical results in the subglottal space, where the flow is laminar, but differ in the supraglottal volume where turbulent fluctuations are present. Figure 4, plotting the velocity fields in a sagittal (x-z) plane, shows the structure of the expanding jets during phonation.  The visualizations of the vorticity fields ω = ∇ × U, which are commonly used for characterizing turbulence in cases with no entrainment rotation, are shown in midcoronal (x-y) plane in Figure 5. They reveal the shear layers, where vortices are shed as a consequence of Kelvin-Helmholtz instability. The vortices may undergo successive instabilities, leading to a direct kinetic-energy cascade towards the small scales. The supraglottal jet deflects stochastically towards either of the ventricular folds. This behavior is not a consequence of the SGS model, it is caused by the bistability of the flow in this symmetric geometry (see e.g., [46,47]). The effect of the unresolved turbulent sub-grid scales on the resolved velocity and pressure fields is carried by the turbulent SGS viscosity ν t (see Equations (10) and (15)). Figure 6 clearly shows that the turbulent viscosity predicted by the One-Equation model is very high in regions of pure shear, especially within glottis. At certain phases of the vocal fold motion, ν t reaches values three times higher than the molecular viscosity ν = 1.58 · 10 −5 m 2 s −1 . In contrast to this, the WALE SGS model predicts considerably lower values, not localized in the shear layers.

Computational Aeroacoustic (CAA) Model of Human Phonation
Aeroacoustics deal with flow-induced sound generation and wave propagation. The sound generation is caused by turbulent motion of fluid, periodic varying flow fields or aerodynamic forces acting on solids. The sources in the case of human phonation are commonly denoted as: i.
A monopole source term due to the motion of vocal folds (the term is zero, when the walls are fixed and also the monopole source term at inlet is often omitted due to a non-reflecting boundary condition). ii. A dipole source term due to the unsteady force exerted by the surface of the vocal folds onto the fluid. iii. A quadrupole sound term due to the unsteady flow inside the vocal tract. See [2] for more details.
The numerical simulation of aeroacoustic effects can be realized either by using direct simulations or hybrid methods. Direct simulation is based on the compressible Navier-Stokes equations, which capture both the fluid dynamic and acoustic fluctuations. The limitation of this approach is hidden in the computational effort associated with the disparity of scales between the flow and acoustic variables (the small turbulent scales and the large acoustic wavelength during common speech), which can reach several orders of magnitude. To circumvent this problem, hybrid approaches are commonly used, where the flow field and the acoustic field are computed separately, see for example, [14,48,49]. In the current study, we used a hybrid method based on an incompressible flow computation and utilizing the perturbation ansatz to achieve the equation with which the acoustic field is computed.

Mathematical Model
The mathematical model used for aeroacoustic simulations in this study is the Perturbed Convective Wave Equation (PCWE). For understanding of the context, the idea of separation of flow variables is introduced first. In the following sections, the Acoustic Perturbation Equations (APE) will be described first. The vector-valued APEs can be further reformulated into scalar PCWE, sparing computational resources.
LES simulations provide the filtered flow variables, and so the computational aeroacoustic (CAA) simulations can work only upon the filtered velocity and pressure fields. For simplicity, the overbars from LES filtering were dropped in further presentation.

Acoustic Perturbation Equations (APEs)
The idea is based on using a perturbation ansatz, that is, splitting a fluid variable into an acoustic and a non-acoustic component [50]. In contrast to the Lighthill analogy, the fluctuating component is further decomposed into the incompressible and acoustic components where U 0 is the temporal mean flow, the term U ic represents the incompressible part of the flow velocity and U a is the acoustic part of the flow velocity. The same decomposition is used for the pressure. The term ρ 1 (x, t) is referred by Hardin and Pope [50] as a density correction and ensures constant entropy in the liquid, that is, ρ 1 = (p ic − p ic )/c 2 0 . The following assumptions are made: • The velocity field is purely solenoidal, that is, ∇ · U ic = 0, • The density ρ 0 is constant, that is, ∇ρ 0 = 0 and ∂ρ 0 /∂t = 0, • The acoustic field is irrotational, that is, ∇ × U a = 0.
The system of equations derived from the momentum and mass conservation Equations (4) and (2), using all ansatz variables defined above, arrive at Equations (20) and (21) are called the Acoustic Perturbation Equations (for detailed derivation, we refer to [51]). The non-acoustic parts p ic and U 0 are obtained from the CFD results. The time derivative of hydrodynamic pressure ∂p ic /∂t from the right-hand side is the major source term.
Thereby, it results in the relation for the acoustic pressure Substitution of (24) into (21) Equation (25) is simplified into (26), which is a scalar-valued partial differential equation called Perturbed Convective Wave Equation [52] for the acoustic potential ψ a : Benefits of PCWE are the following: faster computation with a scalar unknown, lower memory requirements (compared to APE), includes convection inside the wave operator and solves for the acoustic quantity (compared to Lighthill's analogy). The RHS in (26) is the acoustic source term.

Geometry, Mesh and Numerical Solution
The aeroacoustic field is numerically solved by the Finite Element Method. Figure 7 illustrates the geometry and structure of the acoustic mesh. Geometries of vocal tracts for vowels [u:] and [i:] were modeled from frustums concatenated one after another. The shape of the frustums was defined according to vocal tract area functions measured by magnetic resonance imaging [53]. The vocal tracts (purple) were conformly attached to the larynx (red). The connection is formed by two hexahedral layers, with minor influence on the wave propagation. The right edge of the vocal tract is attached to the downstream radiation zone. The radiation zone is overlaid by a Perfectly Matched Layer (PML) (green) [54], with an implemented damping function to avoid reflecting waves backward. The second PML layer is inserted upstream from the inlet boundary. Microphone mic1 is located 1 cm downstream from the mouth. The acoustic element length ∆l a and time step ∆t a for acoustics are given by estimations [51] ∆l a = c 20 f max = 3.43 mm, ∆t a = 1 20 f max = 1 · 10 −5 s, (27) assuming that 20 linear finite elements per one acoustic wavelength are sufficient. In our case the spatial discretization is limited by 3.43 mm and time step by 1 · 10 −5 s in order to resolve properly acoustic frequencies up to f max = 5 kHz. If the condition is not satisfied, the acoustic results are affected by high dissipation and dispersion [55]. The computation of acoustic sources and simulation of the wave propagation were realized on the same computational cluster as the CFD simulations using the finite element software OpenCFS [56]. The computational times for CAA simulations are much lower than those for the CFD simulations, about 5 h on a single CPU core compared to 30 days on 20 cores. It should be noted that the conservative interpolation of the acoustic source from the CFD grid to the much coarser acoustic grid was performed also by a finite element software OpenCFS. The work of Schoder et al. [48] contains an overview of the conservative strategies, granting a reduction of simulation time.

CAA Results
Correct modeling of aeroacoustic sources is essential for the success of the hybrid aeroacoustic simulations. Therefore, the investigation of the aeroacoustic sources is presented first. In the further section, the results of six wave propagation simulations are analyzed.
3.3.1. Acoustic Sources Figure 8 presents the aeroacoustic sources computed by PCWE (26) from three CFD simulations. The sources, which are a function varying in space and time, were transformed from the time domain into the frequency domain by the Fourier transform, which provides a useful insight into the spatial distribution of the sources at distinct frequencies. The first row in Figure 8 shows the aeroacoustic sources at the fundamental frequency (frequency of vibration of the vocal folds) f 0 = 100 Hz. The strongest sources are located inside the glottis. The laminar simulation results in slightly higher source intensities than the LES simulations. This correlates with the flow rate amplitude, which is also higher in the laminar case (see Figure 2). At the third harmonic frequency f 2 = 300 Hz, aeroacoustic sources are predicted also downstream of the glottis and in the supraglottal vestibule. Especially in laminar and WALE cases, disturbances induced in the shear layer of the jet are clearly visible. The third and the last row refer to a higher harmonic frequency f 9 = 1000 Hz and a non-harmonic frequency f = 1235 Hz. At these higher frequencies, the dominant aeroacoustic sources do not occur within glottis, but in the places, where the fast glottal jet interacts with the ventricular folds and with the slowly moving and recirculating air in the supraglottal volume.

Wave Propagation
In this section, the sound pressure levels (SPL) in the frequency domain are compared. Six signals (three CFD simulations and two types of vocal tract models) were used for spectral analysis, with the frequency resolution ∆ f = 5 Hz and the Hanning windowing. The spectra of the acoustic pressure evaluated at the acoustic probe mic1 (see Figure 7) are depicted in Figure 9 for two types of vocal tracts. The vowel [u:] is characterized by formant frequencies f u: F1 = 270 Hz, f u: F2 = 1000 Hz and f u: F3 = 2484 Hz [53]. The tolerable bandwidths are defined as B F1 = ±100 Hz and B F2,F3 = ±200 Hz. The usage of SGS models does not modify positions of formant frequencies, but it modifies considerably the SPLs (L u: F and L i: F ), see details in Table 4 Figure 9 also shows that the simulations using the WALE SGS model for both vowels result in significantly enforced higher harmonic frequencies n f o compared to the One-Equation SGS model for both vowels. In the case of vowel [u:], the harmonic frequencies above 1000 Hz for the WALE SGS model are even stronger than the aeroacoustic output based on the laminar CFD simulation.

Discussion and Conclusions
Large Eddy Simulations of subsonic incompressible flow in a model of human larynx were realized. The effect of the SGS turbulence model on the flow field, namely the flow rate, velocity, pressure and vorticity was analyzed in three variants-the laminar (no SGS), One-Equation and WALE SGS model. Compared to the classical Smagorinsky algebraic SGS model, the One-Equation SGS model has two prominent features-it is more suitable for high Reynolds number flows and avoids the inaccurate assumption of local balance between SGS energy production and dissipation [57]. However, similar to the Smagorinsky model, the One-Equation model is known to over predict the turbulent viscosity in regions where shear is dominant, that is, in the boundary layer adjacent to the vocal folds and in the shear layers of the glottal jet. The WALE SGS model, on the contrary, produces zero eddy viscosity in cases of pure shear flow, and recovers the proper y 3 near-wall scaling for the eddy (turbulent) viscosity without requiring a dynamic procedure. Moreover, Nicoud and Ducros [30] showed that it can handle laminar-turbulent transition, which makes it attractive for complex geometries and specifically for glottal flow simulations.
In the results of the CFD simulations, the waveforms of the flow rate show that the peak flow rate simulated by the One-Equation and WALE SGS models is lower than in the laminar case by 16.76% and 5.26%, respectively. It is necessary to note that these values are obtained for airflow driven by constant transglottal pressure. Other studies commonly prescribe a less realistic constant inlet velocity condition, where this effect cannot be observed. In our simulations, the differences in the peak flow rate are clearly induced by the SGS model, which acts through the turbulent SGS viscosity adding to the molecular viscosity of air and helps to block the airflow in the narrowest constriction between the vocal folds. The results prove that the WALE model predicts considerably lower, and presumably more realistic, values, not localized in the shear layers.
Due to the disparity of scales between the flow structures and acoustic wavelengths, a hybrid aeroacoustic approach based on the perturbed compressible wave equations was used. The flow results computed by the Finite Volume method were used to evaluate the aeroacoustic sources. These were interpolated from the fine grid covering the larynx to the coarse grid of the larynx and vocal tract, where the acoustic variables were computed by the finite element method.
The visualization of the aeroacoustic sources in the frequency domain shows that the laminar and WALE models predict slightly stronger aeroacoustic sources compared to the One-Equation model. Sound sources at the fundamental frequency f 0 are located primarily in the glottis. At higher harmonic frequencies, the sources are distributed also in the supraglottal vestibule and further downstream of the glottis, especially in the shear layer of the jet and in places, where the fast glottal jet interacts with the ventricular folds and with the slowly recirculating air in the supraglottal spaces.
The acoustic spectra of the sound evaluated 1 cm from the lips show important differences between the simulations based on the laminar CFD data, and LES simulations with the One-Equation and WALE SGS models. The sound pressure levels from the WALE simulation are significantly higher than those from the One-Equation model, especially at frequencies above 1000 Hz, where the difference reaches up to 15 dB. In the case of the vowel [i:], the SPL predictions based on the WALE model exceed even those from the laminar model. This can be partly explained by the fact that the CFD simulations using the One-Equation model result in lower flow rates and pressure fluctuations. However, at lower frequencies, the trend in the acoustic spectra is more ambiguous: the differences between the three models are less significant and, for example, at the third harmonic, the WALE model gives even lower SPLs than the One-Equation model. Thus, the differences cannot be attributed only to the amplitude of the underlying fluid flow pulsation. It may be concluded that the WALE SGS model performs better not only in the CFD simulation, where we know from theory that it handles more correctly the boundary and shear layers, but that its effect is also beneficial for the subsequent aeroacoustic simulations of the human voice, where it promotes the higher harmonic content in the acoustic spectrum.