Acoustics

Editorial

8 pages, 198 KB

Open Access Editorial

Developments in Acoustic Phonetic Research

by Georgios P. Georgiou

Acoustics 2026, 8(1), 19; https://doi.org/10.3390/acoustics8010019 - 16 Mar 2026

Viewed by 715

Abstract

Acoustic phonetics has entered a period of rapid expansion, shaped by new theoretical questions, richer empirical environments, and unprecedented advances in measurement and modeling [...]

(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)

Research

11 pages, 2591 KB

Open Access Article

Clarification of the Acoustic Characteristics of Velopharyngeal Insufficiency by Acoustic Simulation Using the Boundary Element Method: A Pilot Study

by Mami Shiraishi, Katsuaki Mishima, Masahiro Takekawa, Masaaki Mori and Hirotsugu Umeda

Acoustics 2025, 7(2), 26; https://doi.org/10.3390/acoustics7020026 - 13 May 2025

Viewed by 1573

Abstract

A model of the vocal tract that mimicked velopharyngeal insufficiency was created, and acoustic analysis was performed using the boundary element method to clarify the acoustic characteristics of velopharyngeal insufficiency. The participants were six healthy adults. Computed tomography (CT) images were taken from the frontal sinus to the glottis during phonation of the Japanese vowels /i/ and /u/, and models of the vocal tracts were created from the CT data. To recreate velopharyngeal insufficiency, coupling of the nasopharynx was carried out in vocal tract models with no nasopharyngeal coupling, and the coupling site was enlarged in models with nasopharyngeal coupling. The vocal tract models were extended virtually for 12 cm in a cylindrical shape to represent the region from the lower part of the glottis to the tracheal bifurcation. The Kirchhoff–Helmholtz integral equation was used for the wave equation, and the boundary element method was used for discretization. Frequency response curves from 1 to 3000 Hz were calculated by applying the boundary element method. The curves showed the appearance of a pole–zero pair around 500 Hz, increased intensity around 250 Hz, decreased intensity around 500 Hz, decreased intensities of the first and second formants (F1 and F2), and a lower frequency of F2. Of these findings, increased intensity around 250 Hz, decreased intensity around 500 Hz, decreased intensities of F1 and F2, and lower frequency of F2 agree with the previously reported acoustic characteristics of hypernasality.

(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)

18 pages, 976 KB

Open Access Article

A Z-Test-Based Evaluation of a Least Mean Square Filter for Noise Reduction

by Alan Rodríguez Bojorjes, Abel Garcia-Barrientos, Marco Cárdenas-Juárez, Ulises Pineda-Rico, Armando Arce, Sharon Macias Velasquez and Obed Pérez Cortés

Acoustics 2025, 7(2), 20; https://doi.org/10.3390/acoustics7020020 - 14 Apr 2025

Viewed by 2490

Abstract

This paper presents a comprehensive evaluation using a Z-test to assess the effectiveness of an adaptive Least Mean Squares (LMS) filter driven by the Steepest Descent Method (SDM). The study utilizes a male voice recording, captured in a controlled studio environment, to which persistent Gaussian noise was intentionally introduced, simulating real-world interference. All signal processing methods were implemented in MATLAB version 9.13.0 (R2022b; The MathWorks Inc., Natick, MA, USA, 2022). The adaptive filter demonstrated a significant improvement of 20 dB in Signal-to-Noise Ratio (SNR) following the initial optimization of the filter parameter μ. To further assess the LMS filter’s performance, an empirical experiment was conducted with 30 young adults, aged between 20 and 30 years, who were tasked with qualitatively distinguishing between the clean and noise-corrupted signals (blind test). The quantitative analysis and statistical evaluation of the participants’ responses revealed that a significant majority, specifically 80%, were able to reliably identify the noise-affected and filtered signals. This outcome highlights the LMS filter’s potential, despite the slow convergence of the SDM, for enhancing signal clarity in noise-contaminated environments, thus validating its practical application in speech processing and noise reduction.

(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)
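
As a rough illustration of the adaptive filtering described in the abstract above, the following Python sketch runs a textbook LMS noise canceller on a synthetic signal. It is not the authors' MATLAB implementation: the two-input (primary plus reference) configuration, the sine-wave stand-in for speech, the tap count, and the step size `mu` are all assumptions chosen for the example.

```python
import math
import random


def lms_noise_canceller(primary, reference, num_taps=8, mu=0.005):
    """Textbook LMS canceller: the error e[n] = d[n] - w.x[n] is the cleaned output."""
    weights = [0.0] * num_taps
    taps = [0.0] * num_taps  # most recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        taps = [x] + taps[:-1]
        noise_estimate = sum(w * t for w, t in zip(weights, taps))
        error = d - noise_estimate
        # Stochastic steepest-descent step on the instantaneous squared error.
        weights = [w + 2.0 * mu * error * t for w, t in zip(weights, taps)]
        cleaned.append(error)
    return cleaned


def snr_db(clean, estimate):
    """SNR in dB of an estimate relative to the known clean signal."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 4000
clean = [math.sin(2.0 * math.pi * 0.01 * i) for i in range(n)]  # stand-in for speech
reference = [rng.gauss(0.0, 1.0) for _ in range(n)]  # Gaussian noise reference
# Assumed noise path: the noise reaching the primary input is a filtered reference.
noise = [0.6 * reference[i] + 0.3 * (reference[i - 1] if i > 0 else 0.0)
         for i in range(n)]
noisy = [c + v for c, v in zip(clean, noise)]

cleaned = lms_noise_canceller(noisy, reference)

half = n // 2  # score only the second half, after the weights have adapted
snr_before = snr_db(clean[half:], noisy[half:])
snr_after = snr_db(clean[half:], cleaned[half:])
print(f"SNR before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
```

A smaller `mu` slows adaptation but leaves less residual noise once the weights settle, which is the trade-off behind tuning the step size.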

17 pages, 4566 KB

Open Access Article

Vocal Directivity of the Greek Singing Voice on the First Three Formant Frequencies

by Georgios Dedousis, Konstantinos Bakogiannis, Areti Andreopoulou and Anastasia Georgaki

Acoustics 2025, 7(1), 13; https://doi.org/10.3390/acoustics7010013 - 4 Mar 2025

Cited by 2 | Viewed by 2182

Abstract

This study explores the relationship between formant frequencies and the directivity patterns of the Greek singing voice. Recordings were conducted in a controlled acoustic environment with four professional singers, two trained in classical music and two in Byzantine chant. Using microphones placed symmetrically on a hemispherical structure, participants sang the Greek vowels across different registers. Directivity patterns were analyzed in third-octave bands centered on each singer’s first three formant frequencies (F1, F2, F3). The results indicate that directivity patterns vary with register and center frequency, with differences observed across vowels and singers. These findings contribute to vocal production research and the development of simulation, auralization, and virtual reality applications for speech and music.

(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)
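
To make the phrase "third-octave bands centered on each formant" concrete, the sketch below computes band edges for a given center frequency using the base-2 definition, in which each edge lies one sixth of an octave from the center. The formant values are hypothetical and are not taken from the study. Whether the authors used base-2 or base-10 band definitions is not stated in the abstract, so that choice is an assumption here.

```python
import math


def third_octave_band(center_hz):
    """Return (lower, upper) edge frequencies of the third-octave band at center_hz."""
    edge_ratio = 2.0 ** (1.0 / 6.0)  # each edge is one sixth of an octave from center
    return center_hz / edge_ratio, center_hz * edge_ratio


# Hypothetical formant estimates in Hz for one sung vowel (illustrative values only).
formants = {"F1": 700.0, "F2": 1200.0, "F3": 2600.0}
bands = {name: third_octave_band(fc) for name, fc in formants.items()}

for name, (low, high) in bands.items():
    print(f"{name}: {low:.1f} Hz to {high:.1f} Hz")
```

Centering the band on a singer's own formant, rather than on a fixed standard center frequency, keeps the analysis window aligned with where that singer's spectral energy actually peaks.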

11 pages, 468 KB

Open Access Article

Acoustic Analyses of L1 and L2 Vowel Interactions in Mandarin–Cantonese Late Bilinguals

by Yike Yang

Acoustics 2024, 6(2), 568-578; https://doi.org/10.3390/acoustics6020030 - 17 Jun 2024

Cited by 3 | Viewed by 4200

Abstract

While the focus of bilingual research is frequently on simultaneous or early bilingualism, the interactions between late bilinguals’ first language (L1) and second language (L2) have rarely been studied previously. To fill this research gap, the aim of the current study was to investigate the production of vowels in the L1 Mandarin and L2 Cantonese of Mandarin–Cantonese late bilinguals in Hong Kong. A production experiment was conducted with 22 Mandarin–Cantonese bilinguals, as well as with 20 native Mandarin speakers and 21 native Cantonese speakers. Acoustic analyses, including formants of and Euclidean distances between the vowels, were performed. Both vowel category assimilation and dissimilation were noted in the Mandarin–Cantonese bilinguals’ L1 and L2 vowel systems, suggesting interactions between the bilinguals’ L1 and L2 vowel categories. In general, the findings are in line with the hypotheses of the Speech Learning Model and its revised version, which state that L1–L2 phonetic interactions are inevitable, as there is a common phonetic space for storing the L1 and L2 phonetic categories, and that learners always have the ability to adapt their phonetic space. Future studies should refine the data elicitation method, increase the sample size and include more language pairs to better understand L1 and L2 phonetic interactions.

(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)
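
The Euclidean distance between two vowels mentioned in the abstract above can be sketched as follows. This is a minimal example assuming each vowel is represented by its mean (F1, F2) in hertz. The formant values are invented for illustration, and the abstract does not say whether the study used raw hertz or a normalized or auditory scale.

```python
import math


def vowel_distance(vowel_a, vowel_b):
    """Euclidean distance between two vowels given as equal-length formant tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vowel_a, vowel_b)))


# Hypothetical mean (F1, F2) values in Hz for an L1 vowel and a similar L2 vowel.
l1_vowel = (300.0, 2200.0)
l2_vowel = (600.0, 1800.0)

distance = vowel_distance(l1_vowel, l2_vowel)
print(f"Distance in F1-F2 space: {distance:.1f} Hz")
```

Under this kind of measure, a bilingual's L1 and L2 vowels lying closer together than the corresponding monolingual vowels would point toward assimilation, and lying farther apart would point toward dissimilation.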

31 pages, 7454 KB

Open Access Article

Enhancing Speaker Recognition Models with Noise-Resilient Feature Optimization Strategies

by Neha Chauhan, Tsuyoshi Isshiki and Dongju Li

Acoustics 2024, 6(2), 439-469; https://doi.org/10.3390/acoustics6020024 - 14 May 2024

Cited by 14 | Viewed by 6556

Abstract

This paper delves into an in-depth exploration of speaker recognition methodologies, with a primary focus on three pivotal approaches: feature-level fusion, dimension reduction employing principal component analysis (PCA) and independent component analysis (ICA), and feature optimization through a genetic algorithm (GA) and the marine predator algorithm (MPA). This study conducts comprehensive experiments across diverse speech datasets characterized by varying noise levels and speaker counts. Impressively, the research yields exceptional results across different datasets and classifiers. For instance, on the TIMIT babble noise dataset (120 speakers), feature fusion achieves a remarkable speaker identification accuracy of 92.7%, while various feature optimization techniques combined with K nearest neighbor (KNN) and linear discriminant (LD) classifiers result in a speaker verification equal error rate (SV EER) of 0.7%. Notably, this study achieves a speaker identification accuracy of 93.5% and SV EER of 0.13% on the TIMIT babble noise dataset (630 speakers) using a KNN classifier with feature optimization. On the TIMIT white noise dataset (120 and 630 speakers), speaker identification accuracies of 93.3% and 83.5%, along with SV EER values of 0.58% and 0.13%, respectively, were attained utilizing PCA dimension reduction and feature optimization techniques (PCA-MPA) with KNN classifiers. Furthermore, on the voxceleb1 dataset, PCA-MPA feature optimization with KNN classifiers achieves a speaker identification accuracy of 95.2% and an SV EER of 1.8%. These findings underscore the significant enhancement in computational speed and speaker recognition performance facilitated by feature optimization strategies.

(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)
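
The equal error rate (EER) reported in the abstract above is the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below shows one simple way to estimate it from verification scores by sweeping candidate thresholds. The scores are invented, the function name is hypothetical, and the authors' actual evaluation procedure is not described in the abstract.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping thresholds over all observed scores.

    A trial is accepted when its score is >= the threshold. The EER is taken
    at the threshold where FAR and FRR are closest, as the mean of the two.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap = None
    best_eer = None
    for threshold in thresholds:
        # False rejection: a genuine trial scored below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2.0
    return best_eer


# Hypothetical similarity scores (higher means "more likely the same speaker").
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]

eer = equal_error_rate(genuine, impostor)
print(f"EER: {eer:.2%}")
```

With only a handful of trials the estimate is coarse; on real datasets the rates are usually interpolated between neighboring thresholds.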

15 pages, 1311 KB

Open Access Article

Acoustic Characteristics of Greek Vowels Produced by Adult Heritage Speakers of Albanian

by Georgios P. Georgiou and Aretousa Giannakou

Acoustics 2024, 6(1), 257-271; https://doi.org/10.3390/acoustics6010014 - 10 Mar 2024

Cited by 5 | Viewed by 3546

Abstract

Investigating heritage language (HL)-contact effects on the dominant language has received limited attention despite its importance in understanding the dynamic interplay between linguistic systems in situations of bilingualism. This study compares the acoustic characteristics of Greek vowels produced by heritage speakers (HSs) of Albanian and monolingual Greek speakers, aiming to identify potential differences and explain them. The participants were adult second-generation HSs of Albanian with Greek as their dominant language, born and raised in Greece. A control group of age-matched monolingual Greek speakers was included for comparison purposes. All participants engaged in a controlled speech production task, with the data segmented to extract acoustic values pertaining to the first three formants and the duration of Greek vowels. Bayesian regression models were employed for the subsequent statistical analysis. The results demonstrated differences in the first three formants of certain vowels and the duration of all vowels. These differences can be attributed to the crosslinguistic effect of HL on the dominant language, as well as the interplay between the dynamic and internalized language system of the speakers and the complex effect of the sociophonetic context. These outcomes contribute to the hypothesis positing the emergence of deflected phonetic categories among a distinctive group of bilinguals, namely HSs. Furthermore, this study underscores the significance of a comprehensive exploration of the sociophonetic context of HSs for a nuanced understanding of their phonetic patterns.

(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)

11 pages, 2878 KB

Open Access Article

The Effect of the Frequency and Energetic Content of Broadband Noise on the Lombard Effect and Speech Intelligibility

by Pasquale Bottalico and Silvia Murgia

Acoustics 2023, 5(4), 898-908; https://doi.org/10.3390/acoustics5040052 - 10 Oct 2023

Cited by 4 | Viewed by 6532

Abstract

The Lombard effect is an unconscious reflex of speakers to increase vocal effort when disturbed by noise, aiming to enhance speech intelligibility. This study aims to evaluate the effect of noise with different energetic content and levels at various frequencies on the Lombard effect, communication disturbance, vocal comfort, and speech intelligibility. Twenty university students participated in the study, reading a six-sentence excerpt and performing an intelligibility test under 12 randomized noise conditions. These conditions included noises at low (20–500 Hz), medium (500–4000 Hz), and high frequencies (4000–20,000 Hz), at four levels (45 dB, 55 dB, 65 dB, 75 dB). After each condition, participants rated their perceived communication disturbance and vocal discomfort. The results indicated that noise with energetic content at medium frequencies produced the highest Lombard effect, produced the most detrimental effect on communication disturbance and vocal comfort, and caused the strongest decrease in speech intelligibility, whereas it was minimally affected by low- and high-frequency noise. In conclusion, this study highlights that medium-frequency noise has the greatest impact on vocal effort, communication disturbance, and vocal comfort, while low- and high-frequency noise has minimal effect on speech intelligibility.

(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)

Other

10 pages, 585 KB

Open Access Technical Note

Text-Independent Phone-to-Audio Alignment Leveraging SSL (TIPAA-SSL) Pre-Trained Model Latent Representation and Knowledge Transfer

by Noé Tits, Prernna Bhatnagar and Thierry Dutoit

Acoustics 2024, 6(3), 772-781; https://doi.org/10.3390/acoustics6030042 - 29 Aug 2024

Cited by 1 | Viewed by 2954

Abstract

In this paper, we present a novel approach for text-independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (Wav2Vec2) fine-tuned for phoneme recognition using a Connectionist Temporal Classification (CTC) loss, a dimension reduction model and a frame-level phoneme classifier trained using forced-alignment labels (using Montreal Forced Aligner) to produce multi-lingual phonetic representations, thus requiring minimal additional training. We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English, respectively. Our proposed model outperforms the state-of-the-art (charsiu) in statistical metrics and has applications in language learning and speech processing systems. We leave experiments on other languages for future work but the design of the system makes it easily adaptable to other languages.

(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)
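
Once a frame-level phoneme classifier like the one described in the abstract above has labeled every frame, turning those labels into a phone-to-audio alignment amounts to merging runs of identical labels into timed segments. The sketch below shows only that final step, under stated assumptions: the frame labels are invented, the function name is hypothetical, and the 20 ms frame step is an assumed value rather than one given in the abstract.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_ms=20):
    """Merge consecutive identical frame labels into (label, start_ms, end_ms) tuples."""
    segments = []
    start_frame = 0
    for label, run in groupby(frame_labels):
        run_length = sum(1 for _ in run)
        end_frame = start_frame + run_length
        segments.append((label, start_frame * frame_ms, end_frame * frame_ms))
        start_frame = end_frame
    return segments


# Hypothetical per-frame predictions for a short utterance ("sil" marks silence).
frame_labels = ["sil", "sil", "h", "h", "h", "ə", "l", "l", "oʊ", "oʊ", "oʊ", "sil"]
segments = frames_to_segments(frame_labels)

for label, start_ms, end_ms in segments:
    print(f"{label}: {start_ms} ms to {end_ms} ms")
```

Because the boundaries come straight from the predicted labels, no transcript is needed at this stage, which is what makes the alignment text-independent.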
