Special Issue "Computational Methods and Engineering Solutions to Voice"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Acoustics and Vibrations".

Deadline for manuscript submissions: closed (31 May 2019)

Special Issue Editor

Guest Editor
Prof. Dr. Michael Döllinger

Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
Interests: FSAI in voice production; clinical transfer of new analysis methods and technologies

Special Issue Information

Dear Colleagues,

Today, voice and speech research is far from limited to acoustic, medical, and clinical studies. Approaches from fields such as mathematics, computer science, fluid dynamics, mechatronics, and biology are widely applied to gain new insight into, and a better understanding of, the physiological and pathological laryngeal processes underlying voice and speech production. Thanks to fruitful interdisciplinary research groups, many new approaches have been proposed over the last decade, including, for example, highly advanced numerical models (FEM/FVM) as well as tissue engineering and data analysis approaches. The purpose of this Special Issue is to provide an overview of the newest and most innovative techniques applied in our field. Young colleagues are especially encouraged to submit their work. Authors are invited to submit work related to the following topics, applying mathematical, engineering, computer science, and biological methods within the field of voice and speech production:

  • Computational modeling
  • Experimental modeling 
  • Computational fluid dynamics
  • Fluid–structure–acoustic interaction
  • Image processing
  • Advanced data analysis
  • New technologies
  • Tissue engineering
  • Molecular biology

Prof. Dr. Michael Döllinger
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1500 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Computational modeling 
  • Experimental modeling
  • Computational fluid dynamics 
  • Fluid–structure–acoustic interaction 
  • Image processing 
  • Advanced data analysis 
  • New technologies 
  • Tissue engineering
  • Molecular biology

Published Papers (6 papers)


Research

Open Access | Feature Paper | Article
A Case of Specificity: How Does the Acoustic Voice Quality Index Perform in Normophonic Subjects?
Appl. Sci. 2019, 9(12), 2527; https://doi.org/10.3390/app9122527
Received: 6 May 2019 / Revised: 10 June 2019 / Accepted: 14 June 2019 / Published: 21 June 2019
Abstract
The acoustic voice quality index (AVQI) is a multiparametric tool based on six acoustic measurements that quantifies overall voice quality in an objective manner, with the smoothed version of the cepstral peak prominence (CPPS) as its main contributor. In the last decade, many studies have demonstrated its robust diagnostic accuracy and high sensitivity to voice changes across voice therapy in different languages. The aim of the present study was to provide information on the performance of AVQI and CPPS in normophonic, non-treatment-seeking subjects, since such data are still scarce. Concatenated voice samples, consisting of sustained vowel phonation and continuous speech, from 123 subjects (72 females, 51 males; between 20 and 60 years old) without vocally relevant complaints were evaluated by three raters and run in AVQI v.02.06. According to this auditory-perceptual evaluation, two cohorts were set up (normophonia versus slight perceived dysphonia). First, gender effects were investigated. Secondly, between-cohort differences in AVQI and CPPS were investigated. Thirdly, with the number of judges giving G = 1 to partition three sub-levels of slight hoarseness as an independent factor, differences in AVQI and CPPS across these sub-levels were investigated. For AVQI, no significant gender effect was found, whereas, for CPPS, significant trends were observed. For both AVQI and CPPS, no significant differences were found between normophonic and slightly dysphonic subjects. For AVQI, however, this difference did approach significance. These findings emphasize the need for a normative study with a greater sample size, and thus greater statistical power, to detect possible significant effects and differences.
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice)
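The cepstral peak prominence at the heart of the AVQI can be illustrated with a short sketch. The following is a simplified, unsmoothed variant (the AVQI itself uses the smoothed CPPS computed by dedicated analysis software); the window, quefrency search range, and regression-based trend line here are illustrative assumptions, not the exact AVQI recipe.

```python
import numpy as np

def cepstral_peak_prominence(x, fs, f0_min=60.0, f0_max=330.0):
    """Simplified cepstral peak prominence (CPP) in dB.

    The cepstrum is the spectrum of the log-magnitude spectrum; a strongly
    periodic (i.e., less dysphonic) signal shows a pronounced peak at the
    quefrency 1/F0. CPP is the height of that peak above a linear trend
    line fitted to the cepstrum."""
    n = len(x)
    windowed = x * np.hanning(n)
    log_spec = 20.0 * np.log10(np.abs(np.fft.fft(windowed)) + 1e-12)
    ceps = np.abs(np.fft.ifft(log_spec))   # real cepstrum (dB domain)
    quef = np.arange(n) / fs               # quefrency axis in seconds

    # Search for the cepstral peak in the quefrency range of plausible F0.
    lo, hi = int(fs / f0_max), int(fs / f0_min)
    peak_idx = lo + np.argmax(ceps[lo:hi])

    # Linear regression over the cepstrum gives the trend line.
    sel = slice(1, n // 2)
    slope, intercept = np.polyfit(quef[sel], ceps[sel], 1)
    trend_at_peak = slope * quef[peak_idx] + intercept
    return ceps[peak_idx] - trend_at_peak
```

A harmonic-rich signal yields a clearly higher value than white noise, which is the property the AVQI exploits as its main contributor.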

Open Access | Article
Estimating Vocal Fold Contact Pressure from Raw Laryngeal High-Speed Videoendoscopy Using a Hertz Contact Model
Appl. Sci. 2019, 9(11), 2384; https://doi.org/10.3390/app9112384
Received: 30 April 2019 / Revised: 31 May 2019 / Accepted: 4 June 2019 / Published: 11 June 2019
Abstract
The development of trauma-induced lesions of the vocal folds (VFs) has been linked to a high collision pressure on the VF surface. However, there are no direct methods for the clinical assessment of VF collision, thus limiting the objective assessment of these disorders. In this study, we develop a video processing technique to directly quantify the mechanical impact of the VFs using solely laryngeal kinematic data. The technique is based on an edge tracking framework that estimates the kinematic sequence of each VF edge with a Kalman filter approach and a Hertzian impact model to predict the contact force during the collision. The proposed formulation overcomes several limitations of prior efforts since it uses a more relevant VF contact geometry, it does not require calibrated physical dimensions, it is normalized by the tissue properties, and it applies a correction factor for using a superior view only. The proposed approach is validated against numerical models, silicone vocal fold models, and prior studies. A case study with high-speed videoendoscopy recordings provides initial insights between the sound pressure level and contact pressure. Thus, the proposed method has a high potential in clinical practice and could also be adapted to operate with laryngeal stroboscopic systems.
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice)
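The Hertzian impact model that converts kinematic overlap into a contact force can be sketched in a few lines. The parameter values below (effective radius, tissue modulus, Poisson's ratio) are illustrative assumptions; the paper additionally normalizes by tissue properties and corrects for the superior-only view, which this sketch omits.

```python
import numpy as np

def hertz_contact_force(delta, r_eff, e_mod, nu):
    """Hertzian contact force F = (4/3) * E_star * sqrt(R) * delta^(3/2).

    delta : penetration depth (overlap) of the colliding bodies [m]
    r_eff : effective radius of curvature of the contact geometry [m]
    e_mod : Young's modulus of the tissue [Pa] (assumed identical folds)
    nu    : Poisson's ratio of the tissue
    """
    # Effective contact modulus for two identical elastic bodies:
    # 1 / E_star = 2 * (1 - nu^2) / E
    e_star = e_mod / (2.0 * (1.0 - nu ** 2))
    return (4.0 / 3.0) * e_star * np.sqrt(r_eff) * np.maximum(delta, 0.0) ** 1.5
```

The 3/2-power law means that quadrupling the overlap multiplies the force by eight, so small errors in the tracked penetration depth are amplified in the predicted contact pressure.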

Open Access | Article
Towards a Clinically Applicable Computational Larynx Model
Appl. Sci. 2019, 9(11), 2288; https://doi.org/10.3390/app9112288
Received: 5 April 2019 / Revised: 22 May 2019 / Accepted: 30 May 2019 / Published: 3 June 2019
Abstract
The enormous computational power and time required to simulate the complex phonation process preclude the effective clinical use of computational larynx models. The aim of this study was to evaluate the potential of a numerical larynx model, considering the computational time and resources required. Using Large Eddy Simulations (LES) in a 3D numerical larynx model with prescribed vocal fold motion, the complicated fluid–structure interaction problem in phonation was reduced to a pure flow simulation with moving boundaries. The simulated laryngeal flow field is in good agreement with experimental results obtained from the authors' synthetic larynx model. By systematically decreasing the spatial and temporal resolution of the numerical model and optimizing the computational resources of the simulations, the elapsed simulation time was reduced by 90%, to less than 70 h for 10 oscillation cycles of the vocal folds. The proposed computational larynx model with reduced mesh resolution is still able to capture the essential laryngeal flow characteristics and to produce results of sufficiently good accuracy in a significantly shorter time-to-solution. The achieved reduction in computational time is a promising step towards the clinical application of such computational larynx models in the near future.
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice)

Open Access | Article
Impact of Subharmonic and Aperiodic Laryngeal Dynamics on the Phonatory Process Analyzed in Ex Vivo Rabbit Models
Appl. Sci. 2019, 9(9), 1963; https://doi.org/10.3390/app9091963
Received: 21 March 2019 / Revised: 10 May 2019 / Accepted: 10 May 2019 / Published: 13 May 2019
Abstract
Normal voice is characterized by periodic oscillations of the vocal folds. Disordered voice dynamics (e.g., subharmonic and aperiodic oscillations), on the other hand, are often associated with voice pathologies and dysphonia. Unfortunately, not all investigations may be conducted on human subjects; hence, animal laryngeal studies have been performed for many years to better understand human phonation. The rabbit larynx has been shown to be a potential model of the human larynx. Despite this, only a few studies on the phonatory parameters of rabbit larynges have been performed. Further, to the best of our knowledge, no ex vivo study has systematically investigated phonatory parameters from high-speed, audio, and subglottal pressure data with irregular oscillations. To remedy this, the present study analyzes experiments with sustained phonation in 11 ex vivo rabbit larynges for 51 conditions of disordered vocal fold dynamics. (1) The results of this study support previous findings on non-disordered data that the stronger the glottal closure insufficiency during phonation, the worse the phonatory characteristics; (2) aperiodic oscillations showed worse phonatory results than subharmonic oscillations; (3) in the presence of either type of irregular vibration, the voice quality (i.e., cepstral peak prominence) of the audio and subglottal signals deteriorated greatly compared with normal/periodic vibrations. In summary, our results suggest that the presence of both types of irregular vibration has a major impact on voice quality and should be considered along with glottal closure measures in medical diagnosis and treatment.
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice)
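A subharmonic oscillation reveals itself as spectral energy at integer fractions of the fundamental frequency. The toy detector below, with an assumed frequency tolerance and threshold, illustrates the distinction drawn in the paper between periodic and subharmonic vibration; it is not the authors' analysis pipeline.

```python
import numpy as np

def has_subharmonic(x, fs, f0, tol_hz=2.0, thresh_db=-20.0):
    """Flag a signal as subharmonic if the spectral peak near f0/2
    is within `thresh_db` of the peak near f0."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)

    def band_peak(f_center):
        # Strongest spectral magnitude within +/- tol_hz of f_center.
        band = (freqs > f_center - tol_hz) & (freqs < f_center + tol_hz)
        return spec[band].max()

    ratio_db = 20.0 * np.log10(band_peak(f0 / 2) / band_peak(f0) + 1e-12)
    return ratio_db > thresh_db
```

The same idea extends to f0/3 and other fractions; aperiodic vibration, by contrast, shows no stable peaks at all, which is why it degrades cepstral measures even more strongly.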

Open Access | Article
Discrimination between Modal, Breathy and Pressed Voice for Single Vowels Using Neck-Surface Vibration Signals
Appl. Sci. 2019, 9(7), 1505; https://doi.org/10.3390/app9071505
Received: 8 March 2019 / Revised: 4 April 2019 / Accepted: 6 April 2019 / Published: 11 April 2019
Abstract
The purpose of this study was to investigate the feasibility of using neck-surface acceleration signals to discriminate between modal, breathy, and pressed voice. Voice data for five English single vowels were collected from 31 female native Canadian English speakers using a portable neck-surface accelerometer (NSA) and a condenser microphone. Firstly, auditory-perceptual ratings were conducted by five clinically certified Speech-Language Pathologists (SLPs) to categorize voice type using the audio recordings. Intra- and inter-rater analyses were used to determine the SLPs' reliability for the perceptual categorization task. Mixed-type samples were screened out, and congruent samples were kept for the subsequent classification task. Secondly, features such as spectral harmonics, jitter, shimmer, and spectral entropy were extracted from the NSA data. Supervised learning algorithms were used to map feature vectors to voice type categories. A feature wrapper strategy was used to evaluate the contribution of each feature, or combination of features, to the classification between voice types. The results showed that the highest classification accuracy on the full feature set was 82.5%. The classification accuracy for breathy voice was notably greater (by approximately 12%) than those of the other two voice types. Shimmer and spectral entropy were the metrics best correlated with classification accuracy.
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice)
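Three of the features named in the abstract can be sketched directly. The period-sequence definitions of jitter and shimmer and the normalization of the spectral entropy below are common textbook variants, assumed here for illustration rather than taken from the paper.

```python
import numpy as np

def jitter(periods):
    """Relative jitter: mean absolute difference of consecutive cycle
    periods, normalized by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def shimmer(amplitudes):
    """Relative shimmer: the same measure applied to cycle peak amplitudes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1].
    Noise-like (breathy) signals score high; near-periodic signals score low."""
    p = np.abs(np.fft.rfft(x)) ** 2
    p = p / p.sum()
    h = -np.sum(p * np.log2(p + 1e-12))
    return h / np.log2(len(p))
```

Feature vectors of this kind are what the supervised classifiers in the study map to the modal/breathy/pressed categories.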

Open Access | Feature Paper | Article
Influence of Analyzed Sequence Length on Parameters in Laryngeal High-Speed Videoendoscopy
Appl. Sci. 2018, 8(12), 2666; https://doi.org/10.3390/app8122666
Received: 23 October 2018 / Revised: 11 December 2018 / Accepted: 13 December 2018 / Published: 18 December 2018
Cited by 1
Abstract
Laryngeal high-speed videoendoscopy (HSV) allows objective quantification of vocal fold vibratory characteristics. However, it is unknown how the analyzed sequence length affects some of the computed parameters. To examine whether varying sequence lengths influence parameter calculation, 20 HSV recordings of healthy females during sustained phonation were investigated. The clinically prevalent Photron Fastcam MC2 camera, with a frame rate of 4000 fps and a spatial resolution of 512 × 256 pixels, was used to collect the HSV data. The glottal area waveform (GAW), describing the increase and decrease of the area between the vocal folds during phonation, was extracted. Based on the GAW, 16 perturbation parameters were computed for sequences of 5, 10, 20, 50, and 100 consecutive cycles. Statistical analysis was performed using SPSS Statistics, version 21. Only three parameters (18.8%) were statistically significantly influenced by changing sequence lengths. Of these, one changed until 10 cycles were reached, one until 20 cycles were reached, and one, namely the Amplitude Variability Index (AVI), changed between almost all groups of different sequence lengths. Moreover, visually observable, but not statistically significant, changes within parameters were observed; these changes were often most prominent between shorter sequence lengths. Hence, we suggest using a minimum sequence length of at least 20 cycles and discarding the parameter AVI.
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice)
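The sequence-length question can be reproduced on synthetic data: compute a GAW perturbation measure over an increasing number of cycles and watch the estimate stabilize. The amplitude-variability measure below (coefficient of variation of cycle maxima) is a stand-in chosen for illustration, not one of the 16 parameters from the paper.

```python
import numpy as np

def cycle_amplitude_variability(cycle_maxima, n_cycles):
    """Coefficient of variation of the first n_cycles GAW cycle maxima."""
    a = np.asarray(cycle_maxima[:n_cycles], dtype=float)
    return np.std(a) / np.mean(a)

# Synthetic GAW cycle maxima: a stable mean with mild cycle-to-cycle
# perturbation (illustrative noise level, not measured data).
rng = np.random.default_rng(42)
maxima = 1.0 + 0.02 * rng.standard_normal(100)

# Evaluate the measure over the same cycle counts used in the study.
estimates = {n: cycle_amplitude_variability(maxima, n)
             for n in (5, 10, 20, 50, 100)}
```

On short sequences the estimate depends strongly on which particular cycles happen to be included, which mirrors the paper's observation that changes are most prominent between shorter sequence lengths.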

Appl. Sci. EISSN 2076-3417. Published by MDPI AG, Basel, Switzerland.