Search Results (60)

Search Parameters:
Keywords = vocal tract

17 pages, 559 KiB  
Systematic Review
Acoustic Voice Analysis as a Tool for Assessing Nasal Obstruction: A Systematic Review
by Gamze Yesilli-Puzella, Emilia Degni, Claudia Crescio, Lorenzo Bracciale, Pierpaolo Loreti, Davide Rizzo and Francesco Bussu
Appl. Sci. 2025, 15(15), 8423; https://doi.org/10.3390/app15158423 - 29 Jul 2025
Abstract
Objective: This study aims to critically review and synthesize the existing literature on the use of voice analysis in assessing nasal obstruction, with a particular focus on acoustic parameters. Data sources: PubMed, Scopus, Web of Science, Ovid Medline, and Science Direct. Review methods: A comprehensive literature search was conducted without any restrictions on publication year, employing Boolean search techniques. The selection and review process of the studies followed PRISMA guidelines. The inclusion criteria comprised studies with participants aged 18 years and older who had nasal obstruction evaluated using acoustic voice analysis parameters, along with objective and/or subjective methods for assessing nasal obstruction. Results: Of the 174 abstracts identified, 118 were screened after the removal of duplicates. The full texts of 37 articles were reviewed. Only 10 studies met the inclusion criteria. The majority of these studies found no significant correlations between voice parameters and nasal obstruction. Among the various acoustic parameters examined, shimmer was the most consistently affected, with statistically significant changes identified in three independent studies. A smaller number of studies reported notable findings for fundamental frequency (F0) and noise-related measures such as NHR/HNR. Conclusion: This systematic review critically evaluates existing studies on the use of voice analysis for assessing and monitoring nasal obstruction and hyponasality. The current evidence remains limited, as most investigations predominantly focus on glottic sound and dysphonia, with insufficient attention to the influence of the vocal tract, particularly the nasal cavities, on voice production. A notable gap exists in the integration of advanced analytical approaches, such as machine learning, in this field. Future research should focus on the use of advanced analytical approaches to specifically extrapolate the contribution of nasal resonance to voice, thus defining the specific parameters in the voice spectrogram that can give precise information on nasal obstruction.
(This article belongs to the Special Issue Innovative Digital Health Technologies and Their Applications)
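The review singles out shimmer as the acoustic parameter most consistently affected by nasal obstruction. As a rough illustration of what that parameter measures, here is a minimal numpy sketch of local shimmer on invented per-cycle peak amplitudes (not the tooling used by any of the reviewed studies):

```python
import numpy as np

def local_shimmer(peak_amps):
    """Local shimmer: mean absolute difference between consecutive
    cycle peak amplitudes, divided by the mean amplitude."""
    a = np.asarray(peak_amps, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)

# Synthetic example: per-cycle peak amplitudes of a vowel-like signal.
steady = [1.0, 1.0, 1.0, 1.0]            # perfectly steady voice
perturbed = [1.00, 0.95, 1.03, 0.97]     # slight amplitude perturbation

print(f"steady shimmer:    {local_shimmer(steady):.3f}")
print(f"perturbed shimmer: {local_shimmer(perturbed):.3f}")
```

Clinical analysis packages report several shimmer variants (local, dB, APQ); this is the simplest one.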

12 pages, 445 KiB  
Article
The Effect of Phoniatric and Logopedic Rehabilitation on the Voice of Patients with Puberphonia
by Lidia Nawrocka, Agnieszka Garstecka and Anna Sinkiewicz
J. Clin. Med. 2025, 14(15), 5350; https://doi.org/10.3390/jcm14155350 - 29 Jul 2025
Abstract
Background/Objective: Puberphonia is a voice disorder characterized by the persistence of a high-pitched voice in sexually mature males. In phoniatrics and speech-language pathology, it is also known as post-mutational voice instability, mutational falsetto, persistent fistulous voice, or functional falsetto. The absence of an age-appropriate vocal pitch may adversely affect psychological well-being and hinder personal, social, and occupational functioning. The aim of this study was to evaluate the impact of phoniatric and logopedic rehabilitation on voice quality in patients with puberphonia. Methods: The study included 18 male patients, aged 16 to 34 years, rehabilitated for voice mutation disorders. Phoniatric and logopedic rehabilitation included voice therapy tailored to each subject. A logopedist led exercises aimed at lowering and stabilizing the pitch of the voice and improving its quality. A phoniatrician supervised the therapy, monitoring the condition of the vocal apparatus and providing additional diagnostic and therapeutic recommendations as needed. The duration and intensity of the therapy were adjusted for each patient. Before and after voice rehabilitation, the subjects completed the following questionnaires: the Voice Handicap Index (VHI), the Vocal Tract Discomfort (VTD) scale, and the Voice-Related Quality of Life (V-RQOL). They also underwent an acoustic voice analysis. Results: Statistical analysis of the VHI, VTD, and V-RQOL scores, as well as the voice’s acoustic parameters, showed statistically significant differences before and after rehabilitation (p < 0.005). Conclusions: Phoniatric and logopedic rehabilitation is an effective method of lowering the pitch and maintaining a stable, euphonic male voice in patients with functional puberphonia. Effective voice therapy positively impacts selected aspects of psychosocial functioning reported by patients, improves voice-related quality of life, and reduces physical discomfort in the vocal tract.
(This article belongs to the Section Otolaryngology)
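The before/after questionnaire comparison reported here is a paired design, for which the Wilcoxon signed-rank test is a standard choice. A hedged sketch with scipy on invented VHI totals (not the study's data):

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical pre- and post-rehabilitation VHI totals (not the study's data).
vhi_pre  = np.array([58, 47, 62, 51, 44, 39, 55, 60, 48, 52])
vhi_post = np.array([21, 18, 30, 19, 16, 15, 22, 28, 20, 23])

# Paired non-parametric comparison, appropriate for ordinal questionnaire totals.
stat, p = wilcoxon(vhi_pre, vhi_post)
print(f"Wilcoxon W = {stat}, p = {p:.4f}")
```

With small samples and no ties, scipy uses the exact null distribution of the signed-rank statistic.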

11 pages, 2591 KiB  
Article
Clarification of the Acoustic Characteristics of Velopharyngeal Insufficiency by Acoustic Simulation Using the Boundary Element Method: A Pilot Study
by Mami Shiraishi, Katsuaki Mishima, Masahiro Takekawa, Masaaki Mori and Hirotsugu Umeda
Acoustics 2025, 7(2), 26; https://doi.org/10.3390/acoustics7020026 - 13 May 2025
Abstract
A model of the vocal tract that mimicked velopharyngeal insufficiency was created, and acoustic analysis was performed using the boundary element method to clarify the acoustic characteristics of velopharyngeal insufficiency. The participants were six healthy adults. Computed tomography (CT) images were taken from the frontal sinus to the glottis during phonation of the Japanese vowels /i/ and /u/, and models of the vocal tracts were created from the CT data. To recreate velopharyngeal insufficiency, coupling of the nasopharynx was carried out in vocal tract models with no nasopharyngeal coupling, and the coupling site was enlarged in models with nasopharyngeal coupling. The vocal tract models were extended virtually by 12 cm in a cylindrical shape to represent the region from the lower part of the glottis to the tracheal bifurcation. The Kirchhoff–Helmholtz integral equation was used for the wave equation, and the boundary element method was used for discretization. Frequency response curves from 1 to 3000 Hz were calculated by applying the boundary element method. The curves showed the appearance of a pole–zero pair around 500 Hz, increased intensity around 250 Hz, decreased intensity around 500 Hz, decreased intensities of the first and second formants (F1 and F2), and a lower frequency of F2. Of these findings, the increased intensity around 250 Hz, decreased intensity around 500 Hz, decreased intensities of F1 and F2, and lower frequency of F2 agree with the previously reported acoustic characteristics of hypernasality.
(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)
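For context on the 250–500 Hz features reported above, the classic quarter-wave approximation of a uniform closed-open vocal tract places the first resonance near 500 Hz. A sketch under assumed values (c = 350 m/s, L = 17.5 cm, a typical adult male, not the paper's CT-derived geometry):

```python
# Quarter-wave resonances of a uniform closed-open tube: f_n = (2n - 1) * c / (4 * L).
C = 350.0   # speed of sound in warm, humid air, m/s (assumed)
L = 0.175   # vocal tract length, m (typical adult male, assumed)

def formant(n, c=C, length=L):
    """n-th resonance of the closed-open tube approximation."""
    return (2 * n - 1) * c / (4 * length)

for n in (1, 2, 3):
    print(f"F{n} = {formant(n):.0f} Hz")
```

The uniform-tube values (500, 1500, 2500 Hz) are only a baseline; nasal coupling adds the pole–zero pairs the study observes.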

16 pages, 551 KiB  
Article
Dual-Channel Spoofed Speech Detection Based on Graph Attention Networks
by Yun Tan, Xiaoqian Weng and Jiangzhang Zhu
Symmetry 2025, 17(5), 641; https://doi.org/10.3390/sym17050641 - 24 Apr 2025
Abstract
In the field of voice cryptography, detecting forged speech is crucial for secure communication and identity authentication. While most existing spoof detection methods rely on monaural audio, the characteristics of dual-channel signals remain underexplored. To address this, we propose a symmetrical dual-branch detection framework that integrates Res2Net with coordinate attention (Res2NetCA) and a dual-channel heterogeneous graph fusion module (DHGFM). The proposed architecture encodes left and right vocal tract signals into spectrogram and time-domain graphs, and it models both intra- and inter-channel time–frequency dependencies through graph attention mechanisms and fusion strategies. Experimental results on the ASVspoof2019 and ASVspoof2021 LA datasets demonstrate the superior detection performance of our method. Specifically, it achieved an EER of 1.64% and a Min-tDCF of 0.051 on ASVspoof2019, and an EER of 6.76% with a Min-tDCF of 0.3638 on ASVspoof2021, validating the effectiveness and potential of dual-channel modeling in spoofed speech detection.
(This article belongs to the Special Issue Applications Based on Symmetry in Applied Cryptography)
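The EER figures quoted above are the operating point where false acceptances and false rejections balance. A minimal numpy sketch of one common way to approximate it from detection scores (toy values, not ASVspoof outputs):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER: sweep thresholds and take the point where the
    false-acceptance and false-rejection rates are closest (here, the
    minimum over thresholds of max(FAR, FRR)).
    labels: 1 = bona fide, 0 = spoof; higher score = more bona fide."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best = 1.0
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[labels == 0] >= t)   # spoof accepted
        frr = np.mean(scores[labels == 1] < t)    # bona fide rejected
        best = min(best, max(far, frr))
    return best

# Toy detector scores (illustrative only).
scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   1,    0,   1,   0,   0,   0]
print(f"EER = {equal_error_rate(scores, labels):.3f}")
```

Evaluation toolkits interpolate the ROC for a smoother estimate; this discrete sweep conveys the idea.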

24 pages, 4555 KiB  
Review
Biophysics of Voice Onset: A Comprehensive Overview
by Philippe H. DeJonckere and Jean Lebacq
Bioengineering 2025, 12(2), 155; https://doi.org/10.3390/bioengineering12020155 - 6 Feb 2025
Abstract
Voice onset is the sequence of events between the first detectable movement of the vocal folds (VFs) and the stable vibration of the vocal folds. It is considered a critical phase of phonation, and the different modalities of voice onset and their distinctive characteristics are analysed. Oscillation of the VFs can start from either a closed glottis with no airflow or an open glottis with airflow. The objective of this article is to provide a comprehensive survey of this transient phenomenon, from a biomechanical point of view, in normal modal (i.e., nonpathological) conditions of vocal emission. This synthetic overview mainly relies upon a number of recent experimental studies, all based on in vivo physiological measurements, and using a common, original and consistent methodology which combines high-speed imaging, sound analysis, electro-, photo-, flow- and ultrasound glottography. In this way, the two basic parameters—the instantaneous glottal area and the airflow—can be measured, and the instantaneous intraglottal pressure can be automatically calculated from the combined records, which gives a detailed insight, both qualitative and quantitative, into the onset phenomenon. The similarity of the methodology enables a link to be made with the biomechanics of sustained phonation. The temporal relationship between the glottal area and the intraglottal pressure is essential. The three key findings are: (1) From the initial onset cycles onwards, the intraglottal pressure signal leads that of the opening signal, as in sustained voicing, which is the basic condition for an energy transfer from the lung pressure to the VF tissue. (2) This phase lead is primarily due to the skewing of the airflow curve to the right with respect to the glottal area curve, a consequence of the compressibility of air and the inertance of the vocal tract. (3) In the case of a soft, physiological onset, the glottis shows a spindle-shaped configuration just before the oscillation begins. Using the same parameters (airflow, glottal area, intraglottal pressure), the mechanism of triggering the oscillation can be explained by the intraglottal aerodynamic condition. From the first cycles on, the VFs oscillate on either side of a paramedian axis. The amplitude of these free oscillations increases progressively before the first contact on the midline. Whether the first movement is lateral or medial cannot be defined. Moreover, this comprehensive synthesis of onset biomechanics and the links it creates sheds new light on comparable phenomena at the level of sound attack in wind instruments, as well as phenomena such as the production of intervals in the sung voice.
(This article belongs to the Special Issue The Biophysics of Vocal Onset)
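Finding (1), that the pressure signal must lead the opening signal for energy to flow into the tissue, can be stated compactly in the usual lumped form (standard voice-biomechanics notation, not equations taken from the article):

```latex
% Work done by the intraglottal pressure p on a vocal fold surface element
% with lateral displacement \xi over one oscillation cycle:
W \;=\; \oint p(t)\,\frac{d\xi}{dt}\,dt .
% For sinusoidal motion, p(t) = P\sin(\omega t + \varphi),
% \xi(t) = X\sin(\omega t), the integral evaluates to
W \;=\; \pi P X \sin\varphi ,
% so net energy flows from the airflow into the tissue only when the
% pressure leads the displacement, 0 < \varphi < \pi.
```

This is why the measured phase lead of pressure over glottal area is "the basic condition for an energy transfer" in the abstract.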

11 pages, 1006 KiB  
Article
Cross-Cultural Adaptation and Validation of the Malayalam Version of the Vocal Tract Discomfort Scale
by Sunil Kumar Ravi, Srushti Shabnam, Saraswathi Thupakula, Vijaya Kumar Narne, Krishna Yerraguntla, Abdulaziz Almudhi, Irfana Madathodiyil, Feby Sajan and Kochette Ria Jacob
Diagnostics 2025, 15(3), 259; https://doi.org/10.3390/diagnostics15030259 - 23 Jan 2025
Abstract
Background: Voice disorders significantly impact individuals’ physical, functional, and emotional well-being, necessitating comprehensive assessment tools. The Vocal Tract Discomfort Scale (VTDS) assesses the frequency and severity of vocal discomfort symptoms. Despite its global adaptations, no validated Malayalam version existed. This study aimed to adapt and validate the VTDS for Malayalam speakers (VTDS-M). Method: The study was conducted in two phases: Phase I involved translation and cultural adaptation of the VTDS into Malayalam, followed by content validation by native-speaking speech-language pathologists; Phase II involved validation of the VTDS-M on 150 professional voice users, categorized into normophonic (n = 105) and dysphonic (n = 45) groups based on otolaryngological and perceptual voice evaluations. Participants completed the VTDS-M and the VHI-M (Voice Handicap Index—Malayalam). Results: The results showed strong internal consistency (Cronbach’s α = 0.827 for frequency, 0.813 for severity). Significant differences were observed between groups for the VTDS-M subscales and total scores, confirming its discriminatory capability. ROC analysis established a cut-off score of 11.5, with an AROC of 0.749, 64.4% sensitivity, and 79.0% specificity. The VTDS-M also correlated positively with the VHI-M, especially its physical and emotional subscales. Conclusions: The VTDS-M demonstrated reliable psychometric properties and diagnostic accuracy, making it a valuable tool for assessing vocal discomfort in Malayalam-speaking populations, particularly professional voice users. Future studies should explore its applicability to non-professional voice users with varied severity levels of dysphonia.
(This article belongs to the Special Issue Clinical Diagnosis of Otorhinolaryngology)
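The internal-consistency figures above are Cronbach's α values. A short numpy sketch of the computation on simulated item responses (invented data, not VTDS-M responses):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha. items: 2-D array, rows = respondents, cols = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()       # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)         # variance of total scores
    return k / (k - 1) * (1.0 - item_vars / total_var)

# Simulated 4-item scale: each item = latent trait + noise (illustrative only).
rng = np.random.default_rng(0)
trait = rng.normal(size=200)
responses = trait[:, None] + rng.normal(scale=0.8, size=(200, 4))
print(f"alpha = {cronbach_alpha(responses):.3f}")
```

Values around 0.8, as reported for the VTDS-M subscales, are conventionally read as good internal consistency.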

13 pages, 1772 KiB  
Review
Chemical Conversations
by Jana Michailidu, Olga Maťátková, Alena Čejková and Jan Masák
Molecules 2025, 30(3), 431; https://doi.org/10.3390/molecules30030431 - 21 Jan 2025
Abstract
Among living organisms, higher animals primarily use a combination of vocal and non-verbal cues for communication. In other species, however, chemical signaling holds a central role. The chemical and biological activity of the molecules produced by the organisms themselves, and the existence of receptors/targeting sites that allow recognition of such molecules, lead to various forms of responses by the producer and recipient organisms and are a fundamental principle of such communication. Chemical language can be used to coordinate processes within one species or between species. Chemical signals are thus information for other organisms, potentially inducing modification of their behavior. Additionally, this conversation is influenced by the external environment in which organisms are found. This review presents examples of chemical communication among microorganisms, between microorganisms and plants, and between microorganisms and animals. The mechanisms and physiological importance of this communication are described. Chemical interactions can be both cooperative and antagonistic. Microbial chemical signals usually ensure the formation of the most advantageous population phenotype or the disadvantage of a competitive species in the environment. Between microorganisms and plants, we find symbiotic (e.g., in the root system) and parasitic relationships. Similarly, mutually beneficial relationships are established between microorganisms and animals (e.g., in the gastrointestinal tract), but microorganisms also invade and disrupt the immune and nervous systems of animals.

16 pages, 2926 KiB  
Article
Acoustic and Clinical Data Analysis of Vocal Recordings: Pandemic Insights and Lessons
by Pedro Carreiro-Martins, Paulo Paixão, Iolanda Caires, Pedro Matias, Hugo Gamboa, Filipe Soares, Pedro Gomez, Joana Sousa and Nuno Neuparth
Diagnostics 2024, 14(20), 2273; https://doi.org/10.3390/diagnostics14202273 - 12 Oct 2024
Abstract
Background/Objectives: The interest in processing human speech and other human-generated audio signals as a diagnostic tool has increased due to the COVID-19 pandemic. The project OSCAR (vOice Screening of CoronA viRus) aimed to develop an algorithm to screen for COVID-19 using a dataset of Portuguese participants with voice recordings and clinical data. Methods: This cross-sectional study aimed to characterise the pattern of sounds produced by the vocal apparatus in patients with SARS-CoV-2 infection documented by a positive RT-PCR test, and to develop and validate a screening algorithm. In Phase II, the algorithm developed in Phase I was tested in a real-world setting. Results: In Phase I, after filtering, the training group consisted of 166 subjects who were effectively available to train the classification model (34.3% SARS-CoV-2 positive/65.7% SARS-CoV-2 negative). Phase II enrolled 58 participants (69.0% SARS-CoV-2 positive/31.0% SARS-CoV-2 negative). The final model achieved a sensitivity of 85%, a specificity of 88.9%, and an F1-score of 84.7%, suggesting voice screening algorithms as an attractive strategy for COVID-19 diagnosis. Conclusions: Our findings highlight the potential of a voice-based detection strategy as an alternative method for respiratory tract screening.
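The sensitivity, specificity, and F1 figures above all derive from a confusion matrix. A minimal sketch with counts invented to roughly match the reported sensitivity and specificity (not the study's actual confusion matrix):

```python
def screening_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# Illustrative counts chosen so sensitivity = 85% and specificity ~ 89%;
# the resulting F1 is for these toy counts, not the study's.
sens, spec, f1 = screening_metrics(tp=34, fn=6, fp=2, tn=16)
print(f"sensitivity={sens:.1%}  specificity={spec:.1%}  F1={f1:.1%}")
```

Note that F1 depends on class balance through precision, which is why identical sensitivity/specificity can give different F1 on different cohorts.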

16 pages, 6324 KiB  
Article
Simultaneous High-Speed Video Laryngoscopy and Acoustic Aerodynamic Recordings during Vocal Onset of Variable Sound Pressure Level: A Preliminary Study
by Peak Woo
Bioengineering 2024, 11(4), 334; https://doi.org/10.3390/bioengineering11040334 - 29 Mar 2024
Abstract
Voicing requires frequent starts and stops at various sound pressure levels (SPL) and frequencies. Prior investigations using rigid laryngoscopy with oral endoscopy have shown variations in the duration of the vibration delay between normal and abnormal subjects. However, these studies were not physiological because the larynx was viewed using rigid endoscopes. We adapted a method to perform simultaneous high-speed naso-endoscopic video while acquiring the sound pressure, fundamental frequency, airflow rate, and subglottic pressure. This study aimed to investigate voice onset patterns in normophonic males and females during the onset of variable SPL and correlate them with acoustic and aerodynamic data. Materials and Methods: Three healthy males and three healthy females were studied by simultaneous high-speed video laryngoscopy and recording with the production of the gesture [pa:pa:] at soft, medium, and loud voices. The fiber optic endoscope was threaded through a pneumotachograph mask for the simultaneous recording and analysis of acoustic and aerodynamic data. Results: The average increase in the sound pressure level (SPL) for the group was 15 dB, from 70 to 85 dB. The fundamental frequency increased by an average of 10 Hz. The flow increased in two subjects, was reduced in two subjects, and remained the same in two subjects as the SPL increased. There was a steady increase in the subglottic pressure from soft to loud phonation. Compared with soft-to-medium phonation, a significant increase in glottal resistance was observed with medium-to-loud phonation. Videokymogram analysis showed the onset of vibration for all voiced tokens without the need for full glottis closure. In loud phonation, there was a more rapid onset of a larger amplitude and prolonged closure of the glottal cycle; however, more cycles were required to achieve the intended SPL. There was a prolonged closed phase during loud phonation. Fast Fourier transform (FFT) analysis of the kymography waveform signal showed greater second- and third-harmonic energy above the fundamental frequency with loud phonation. There was an increase in the adjustments in the pharynx, with the base of the tongue tilting, shortening of the vocal folds, and pharyngeal constriction. Conclusion: Voice onset occurs in all modalities, without the need for full glottal closure. There was a more significant increase in glottal resistance with loud phonation than with soft or medium phonation. Vibration analysis of the voice onset showed that more time was required during loud phonation before the oscillation stabilized to a steady state. With increasing SPL, there were significant variations in vocal tract adjustments. The most apparent change was the increase in tongue tension with posterior displacement of the epiglottis. There was an increase in pre-phonation time during loud phonation. Patterns of muscle tension dysphonia with laryngeal squeezing, shortening of the vocal folds, and epiglottis tilting with increasing loudness are features of loud phonation. These observations show that flexible high-speed video laryngoscopy can reveal observations that cannot be observed with rigid video laryngoscopy. An objective analysis of the digital kymography signal can be conducted in selected cases.
(This article belongs to the Special Issue The Biophysics of Vocal Onset)
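The FFT-based harmonic comparison described above can be illustrated on a synthetic waveform with deliberately strong second and third harmonics (assumed signal parameters, not the study's kymograms):

```python
import numpy as np

fs, f0, dur = 8000, 125, 1.0   # sample rate, fundamental (Hz), seconds (assumed)
t = np.arange(int(fs * dur)) / fs
# Synthetic "loud phonation" waveform with strong 2nd and 3rd harmonics.
x = (np.sin(2 * np.pi * f0 * t)
     + 0.6 * np.sin(2 * np.pi * 2 * f0 * t)
     + 0.4 * np.sin(2 * np.pi * 3 * f0 * t))

spectrum = np.abs(np.fft.rfft(x)) / len(x)   # normalized magnitude spectrum
freqs = np.fft.rfftfreq(len(x), d=1 / fs)

def harmonic_level(n):
    """Spectral magnitude at the n-th harmonic (nearest FFT bin)."""
    return spectrum[np.argmin(np.abs(freqs - n * f0))]

for n in (1, 2, 3):
    print(f"H{n} ({n * f0} Hz): {harmonic_level(n):.3f}")
```

With an integer number of cycles in the analysis window, each harmonic lands exactly on an FFT bin, so its normalized magnitude is half its amplitude.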

28 pages, 660 KiB  
Article
Improving End-to-End Models for Children’s Speech Recognition
by Tanvina Patel and Odette Scharenborg
Appl. Sci. 2024, 14(6), 2353; https://doi.org/10.3390/app14062353 - 11 Mar 2024
Abstract
Children’s Speech Recognition (CSR) is a challenging task due to the high variability in children’s speech patterns and the limited amount of available annotated children’s speech data. We aim to improve CSR in the often-occurring scenario that no children’s speech data is available for training the Automatic Speech Recognition (ASR) systems. Traditionally, Vocal Tract Length Normalization (VTLN) has been widely used in hybrid ASR systems to address acoustic mismatch and variability in children’s speech when training models on adults’ speech. Meanwhile, End-to-End (E2E) systems often use data augmentation methods to create child-like speech from adults’ speech. For adult speech-trained ASRs, we investigate the effectiveness of two augmentation methods, speed perturbation and spectral augmentation, along with VTLN, in an E2E framework for the CSR task, comparing these across Dutch, German, and Mandarin. We applied VTLN at different stages (training/test) of the ASR and conducted age and gender analyses. Our experiments showed highly similar patterns across the languages: speed perturbation and spectral augmentation yield significant performance improvements, and VTLN provides further gains while maintaining recognition performance on adults’ speech (depending on when it is applied). Additionally, VTLN showed performance improvements for both male and female speakers and was particularly effective for younger children.
(This article belongs to the Special Issue Advances in Speech and Language Processing)
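Speed perturbation, one of the augmentation methods compared above, is commonly implemented as plain resampling: stretching or compressing the waveform shifts duration, pitch, and (approximately) apparent vocal tract length together. A minimal numpy sketch (the 0.9/1.1 factors are conventional choices, not necessarily the paper's):

```python
import numpy as np

def speed_perturb(signal, factor):
    """Resample a waveform by `factor` via linear interpolation.
    factor > 1 shortens the signal (faster, higher-pitched playback);
    factor < 1 lengthens it."""
    signal = np.asarray(signal, dtype=float)
    n_out = int(round(len(signal) / factor))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)

x = np.sin(2 * np.pi * 5 * np.arange(1600) / 1600)  # toy waveform
slow = speed_perturb(x, 0.9)
fast = speed_perturb(x, 1.1)
print(len(x), len(slow), len(fast))
```

Production toolkits typically resample with a proper low-pass filter; linear interpolation is only a sketch of the idea.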

14 pages, 1597 KiB  
Article
Machine Learning-Assisted Speech Analysis for Early Detection of Parkinson’s Disease: A Study on Speaker Diarization and Classification Techniques
by Michele Giuseppe Di Cesare, David Perpetuini, Daniela Cardone and Arcangelo Merla
Sensors 2024, 24(5), 1499; https://doi.org/10.3390/s24051499 - 26 Feb 2024
Abstract
Parkinson’s disease (PD) is a neurodegenerative disorder characterized by a range of motor and non-motor symptoms. One of the notable non-motor symptoms of PD is the presence of vocal disorders, attributed to the underlying pathophysiological changes in the neural control of the laryngeal and vocal tract musculature. From this perspective, the integration of machine learning (ML) techniques in the analysis of speech signals has significantly contributed to the detection and diagnosis of PD. Particularly, Mel-Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GTCCs) are both feature extraction techniques commonly used in the field of speech and audio signal processing that could exhibit great potential for vocal disorder identification. This study presents a novel approach to the early detection of PD through ML applied to speech analysis, leveraging both MFCCs and GTCCs. The recordings contained in the Mobile Device Voice Recordings at King’s College London (MDVR-KCL) dataset were used. These recordings were collected from healthy individuals and PD patients while they read a passage and during a spontaneous conversation on the phone. In particular, the speech data regarding the spontaneous dialogue task were processed through speaker diarization, a technique that partitions an audio stream into homogeneous segments according to speaker identity. The ML applied to MFCCs and GTCCs allowed us to classify PD patients with a test accuracy of 92.3%. This research further demonstrates the potential to employ mobile phones as a non-invasive, cost-effective tool for the early detection of PD, significantly improving patient prognosis and quality of life.
(This article belongs to the Special Issue Sensors in Health Disease Detection Based on Speech Signals)
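MFCC pipelines rest on the mel scale, which spaces filterbank centres to mimic auditory frequency resolution. A short sketch of the HTK-style conversion (a standard formula, not code from the study):

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel scale used when building MFCC filterbanks."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

# Centre frequencies of a 10-filter mel bank over 0-4 kHz: equally spaced
# in mel, hence increasingly widely spaced in Hz.
centres = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(4000.0), 12)[1:-1])
print(np.round(centres).astype(int))
```

GTCCs replace the triangular mel filters with gammatone filters on the related ERB scale; the cepstral step (log energies followed by a DCT) is the same.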

18 pages, 4569 KiB  
Article
Deep Learning for Neuromuscular Control of Vocal Source for Voice Production
by Anil Palaparthi, Rishi K. Alluri and Ingo R. Titze
Appl. Sci. 2024, 14(2), 769; https://doi.org/10.3390/app14020769 - 16 Jan 2024
Abstract
A computational neuromuscular control system that generates lung pressure and three intrinsic laryngeal muscle activations (cricothyroid, thyroarytenoid, and lateral cricoarytenoid) to control the vocal source was developed. In the current study, LeTalker, a biophysical computational model of the vocal system, was used as the physical plant. In the LeTalker, a three-mass vocal fold model was used to simulate self-sustained vocal fold oscillation. A constant /ə/ vowel was used for the vocal tract shape. The trachea was modeled after MRI measurements. The neuromuscular control system generates control parameters to achieve four acoustic targets (fundamental frequency, sound pressure level, normalized spectral centroid, and signal-to-noise ratio) and four somatosensory targets (vocal fold length and longitudinal fiber stress in the three vocal fold layers). The deep-learning-based control system comprises one acoustic feedforward controller and two feedback (acoustic and somatosensory) controllers. Fifty thousand steady speech signals were generated using the LeTalker for training the control system. The results demonstrated that the control system was able to generate the lung pressure and the three muscle activations such that the four acoustic and four somatosensory targets were reached with high accuracy. After training, the motor command corrections from the feedback controllers were minimal compared to the feedforward controller, except for thyroarytenoid muscle activation.
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
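The feedforward-plus-feedback arrangement described above can be caricatured with a scalar plant: the feedforward term issues most of the command from an imperfect inverse model, and feedback absorbs the residual error. Everything here (plant, gains, targets) is invented for illustration; it is not LeTalker:

```python
def plant(u):
    """The 'true' system, unknown to the controller."""
    return 2.0 * u + 0.3

def feedforward(target):
    """Imperfect learned inverse model of the plant."""
    return (target - 0.25) / 1.9

target = 5.0
u = feedforward(target)            # feedforward supplies the bulk command
for _ in range(20):                # feedback corrects the residual online
    error = target - plant(u)
    u += 0.2 * error               # simple integral-style correction

print(f"command={u:.3f}, output={plant(u):.3f}")
```

The study's observation that feedback corrections were small after training corresponds to the feedforward model being nearly exact, so the loop has little residual to absorb.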

20 pages, 5457 KiB  
Article
Effect of Subglottic Stenosis on Expiratory Sound Using Direct Noise Calculation
by Biao Geng, Qian Xue, Scott Thomson and Xudong Zheng
Appl. Sci. 2023, 13(24), 13197; https://doi.org/10.3390/app132413197 - 12 Dec 2023
Abstract
Subglottic stenosis (SGS) is a rare yet potentially life-threatening condition that requires prompt identification and treatment. One of the primary symptoms of SGS is a respiratory sound that is tonal. To better understand the effect of SGS on expiratory sound, we used direct noise calculation to simulate sound production in a simplified axisymmetric configuration that included the trachea, the vocal folds, the supraglottal tract, and an open environmental space. This study focused on flow-sustained tones and explored the impact of various parameters, such as the SGS severity, the SGS distance, the flowrate, and the glottal opening size. It was found that the sound pressure level (SPL) of the expiratory sound increased with flowrate. SGS had little effect on the sound until its severity approached 75% and SPL increased rapidly as the severity approached 100%. The results also revealed that the tonal components of the sound predominantly came from hole tones and tract harmonics and their coupling. The spectra of the sound were greatly influenced by constricting the glottis, which suggests that respiratory tasks that involve maneuvers to change the glottal opening size could be useful in gathering more information on respiratory sound to aid in the diagnosis of subglottic stenosis.
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)

21 pages, 9934 KiB  
Article
On the Alignment of Acoustic and Coupled Mechanic-Acoustic Eigenmodes in Phonation by Supraglottal Duct Variations
by Florian Kraxberger, Christoph Näger, Marco Laudato, Elias Sundström, Stefan Becker, Mihai Mihaescu, Stefan Kniesburges and Stefan Schoder
Bioengineering 2023, 10(12), 1369; https://doi.org/10.3390/bioengineering10121369 - 28 Nov 2023
Cited by 6 | Viewed by 1502
Abstract
Sound generation in human phonation and the underlying fluid–structure–acoustic interaction that describes the sound production mechanism are not fully understood. A previous experimental study, with a silicone vocal fold model connected to a straight vocal tract pipe of fixed length, showed that vibroacoustic coupling can cause a deviation in the vocal fold vibration frequency. This occurred when the fundamental frequency of the vocal fold motion was close to the lowest acoustic resonance frequency of the pipe. What is not fully understood is how the vibroacoustic coupling is influenced by a varying vocal tract length. Presuming that this effect is purely acoustic coupling, a numerical simulation model is established based on computation of the coupled mechanical-acoustic eigenvalues. By varying the pipe length, the lowest acoustic resonance frequency was adjusted in the experiments and, correspondingly, in the simulation setup. In doing so, the evolution of the vocal folds’ coupled eigenvalues and eigenmodes is investigated, which confirms the experimental findings. Finally, it was shown that for normal phonation conditions, the mechanical mode is the most efficient vibration pattern whenever the acoustic resonance of the pipe (the lowest formant) is far from the vocal folds’ vibration frequency. Whenever the lowest formant is slightly lower than the mechanical vocal fold eigenfrequency, the coupled vocal fold motion pattern at the formant frequency dominates.
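The pipe resonances that the abstract above couples to the vocal fold eigenfrequency can be approximated, for a uniform duct closed at the glottis end and open at the far end, with the textbook quarter-wave formula. A minimal sketch under that idealization (the study itself computes the coupled mechanic-acoustic eigenvalues numerically):

```python
def quarter_wave_resonances(length_m, c=350.0, n_modes=3):
    """Resonance frequencies (Hz) of a uniform tube closed at one end and
    open at the other: f_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_modes + 1)]

# A 17.5 cm duct with c = 350 m/s gives the familiar neutral-tract formants.
print([round(f, 1) for f in quarter_wave_resonances(0.175)])
# → [500.0, 1500.0, 2500.0]
```

Shortening the pipe raises the lowest resonance (formant), which is how the experiments sweep it toward and away from the vocal fold eigenfrequency.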

17 pages, 4265 KiB  
Article
Examining the Quasi-Steady Airflow Assumption in Irregular Vocal Fold Vibration
by Xiaojian Wang, Xudong Zheng, Ingo R. Titze, Anil Palaparthi and Qian Xue
Appl. Sci. 2023, 13(23), 12691; https://doi.org/10.3390/app132312691 - 27 Nov 2023
Cited by 2 | Viewed by 1189
Abstract
The quasi-steady flow assumption (QSFA) is commonly used in the biomechanics of phonation. It approximates time-varying glottal flow with steady flow solutions based on frozen glottal shapes, ignoring unsteady flow behaviors and vocal fold motion. This study examined the limitations of QSFA in human phonation using numerical methods, considering phonation frequency, air inertance in the vocal tract, and irregular glottal shapes. Two sets of irregular glottal shapes were examined through dynamic, pseudo-static, and quasi-steady simulations. The differences between the dynamic and the quasi-steady/pseudo-static simulations were measured for glottal flow rate, glottal wall pressure, and sound spectrum to evaluate the validity of QSFA. The results show that errors in glottal flow rate and wall pressure predicted by QSFA were small at 100 Hz but significant at 500 Hz due to growing flow unsteadiness. Air inertance in the vocal tract worsened the predictions when interacting with unsteady glottal flow. Flow unsteadiness also influenced the harmonic energy ratio, which is perceptually important. The effects of glottal shape and glottal wall motion on the validity of QSFA were found to be insignificant.
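The QSFA evaluated above replaces the dynamic flow solution with a steady solution per frozen glottal shape. In its simplest Bernoulli form (a textbook idealization, not the study's flow solver) the frozen-shape flow depends only on the instantaneous glottal area and transglottal pressure:

```python
import math

RHO_AIR = 1.2  # air density, kg/m^3

def quasi_steady_flow(glottal_area_m2, delta_p_pa, rho=RHO_AIR):
    """Quasi-steady (Bernoulli) glottal volume flow for a frozen glottal
    shape: U = A * sqrt(2 * dp / rho). Unsteady inertial terms, which grow
    with phonation frequency, are ignored by construction."""
    return glottal_area_m2 * math.sqrt(2.0 * delta_p_pa / rho)

# A 0.1 cm^2 glottis driven by 800 Pa of transglottal pressure:
u = quasi_steady_flow(0.1e-4, 800.0)  # m^3/s
print(round(u * 1e6))  # → 365 (cm^3/s, a typical phonation flow)
```

It is precisely the terms dropped here that the dynamic simulations restore, which is why the quasi-steady error stays small at 100 Hz but grows at 500 Hz.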
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
