Developments in Acoustic Phonetic Research

A special issue of Acoustics (ISSN 2624-599X).

Deadline for manuscript submissions: 31 December 2024

Special Issue Editor


Dr. Georgios P. Georgiou
Guest Editor
Director of the University of Nicosia Phonetic Lab, Department of Languages and Literature, University of Nicosia, Nicosia CY-2417, Cyprus
Interests: phonetics; phonology; speech acquisition; speech-language disorders

Special Issue Information

Dear Colleagues,

Acoustic phonetics makes invaluable theoretical and practical contributions to the understanding of human speech. Theoretical advancements in acoustic phonetics provide insights into the physical properties and acoustic cues that underlie speech production and perception, allowing researchers to gain a deeper understanding of the underlying patterns, crosslinguistic variations, and subtle phonetic contrasts. These insights have profound theoretical implications, shedding light on the fundamental mechanisms of speech acquisition. Furthermore, the practical applications of acoustic phonetics are extensive. They inform effective speech therapy interventions, optimize language teaching and pronunciation training, enhance speech technology and human–computer interaction, facilitate forensic analysis and speaker identification, and contribute to the development of innovative assistive technologies. Acoustic phonetics thus unites theoretical exploration with practical advancements, ultimately improving communication, accessibility, and our grasp of the complexity of spoken language.

The thematic areas of the volume include, but are not limited to, the acoustic analysis of first or second language sounds or prosody (with priority given to underresearched varieties); crossdialectal comparison of acoustic features; acoustic analysis of speech in difficult listening conditions (e.g., noisy or reverberant environments, whispered speech, and fast speech); acoustic characteristics of speech in atypical populations; the role of acoustic cues in the perceptual categorization and discrimination of nonnative speech sounds; the use of artificial intelligence for predicting speech perception patterns based on acoustic cues; factors that determine the acoustic characteristics of speech (e.g., linguistic, social, biological, cognitive, and psychological); and developmental changes in the acoustic properties of speech (e.g., from early childhood to adulthood).

Dr. Georgios P. Georgiou
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Acoustics is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • acoustic phonetics
  • acoustic cues
  • speech production
  • speech perception
  • speech sounds and prosody
  • typical and atypical speech

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (5 papers)

Research

11 pages, 468 KiB  
Article
Acoustic Analyses of L1 and L2 Vowel Interactions in Mandarin–Cantonese Late Bilinguals
by Yike Yang
Acoustics 2024, 6(2), 568-578; https://doi.org/10.3390/acoustics6020030 - 17 Jun 2024
Abstract
While the focus of bilingual research is frequently on simultaneous or early bilingualism, the interactions between late bilinguals’ first language (L1) and second language (L2) have rarely been studied previously. To fill this research gap, the aim of the current study was to investigate the production of vowels in the L1 Mandarin and L2 Cantonese of Mandarin–Cantonese late bilinguals in Hong Kong. A production experiment was conducted with 22 Mandarin–Cantonese bilinguals, as well as with 20 native Mandarin speakers and 21 native Cantonese speakers. Acoustic analyses, including formants of and Euclidean distances between the vowels, were performed. Both vowel category assimilation and dissimilation were noted in the Mandarin–Cantonese bilinguals’ L1 and L2 vowel systems, suggesting interactions between the bilinguals’ L1 and L2 vowel categories. In general, the findings are in line with the hypotheses of the Speech Learning Model and its revised version, which state that L1–L2 phonetic interactions are inevitable, as there is a common phonetic space for storing the L1 and L2 phonetic categories, and that learners always have the ability to adapt their phonetic space. Future studies should refine the data elicitation method, increase the sample size and include more language pairs to better understand L1 and L2 phonetic interactions. Full article
(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)

31 pages, 7454 KiB  
Article
Enhancing Speaker Recognition Models with Noise-Resilient Feature Optimization Strategies
by Neha Chauhan, Tsuyoshi Isshiki and Dongju Li
Acoustics 2024, 6(2), 439-469; https://doi.org/10.3390/acoustics6020024 - 14 May 2024
Abstract
This paper delves into an in-depth exploration of speaker recognition methodologies, with a primary focus on three pivotal approaches: feature-level fusion, dimension reduction employing principal component analysis (PCA) and independent component analysis (ICA), and feature optimization through a genetic algorithm (GA) and the marine predator algorithm (MPA). This study conducts comprehensive experiments across diverse speech datasets characterized by varying noise levels and speaker counts. Impressively, the research yields exceptional results across different datasets and classifiers. For instance, on the TIMIT babble noise dataset (120 speakers), feature fusion achieves a remarkable speaker identification accuracy of 92.7%, while various feature optimization techniques combined with K nearest neighbor (KNN) and linear discriminant (LD) classifiers result in a speaker verification equal error rate (SV EER) of 0.7%. Notably, this study achieves a speaker identification accuracy of 93.5% and SV EER of 0.13% on the TIMIT babble noise dataset (630 speakers) using a KNN classifier with feature optimization. On the TIMIT white noise dataset (120 and 630 speakers), speaker identification accuracies of 93.3% and 83.5%, along with SV EER values of 0.58% and 0.13%, respectively, were attained utilizing PCA dimension reduction and feature optimization techniques (PCA-MPA) with KNN classifiers. Furthermore, on the voxceleb1 dataset, PCA-MPA feature optimization with KNN classifiers achieves a speaker identification accuracy of 95.2% and an SV EER of 1.8%. These findings underscore the significant enhancement in computational speed and speaker recognition performance facilitated by feature optimization strategies. Full article
(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)

15 pages, 1311 KiB  
Article
Acoustic Characteristics of Greek Vowels Produced by Adult Heritage Speakers of Albanian
by Georgios P. Georgiou and Aretousa Giannakou
Acoustics 2024, 6(1), 257-271; https://doi.org/10.3390/acoustics6010014 - 10 Mar 2024
Abstract
Investigating heritage language (HL)-contact effects on the dominant language has received limited attention despite its importance in understanding the dynamic interplay between linguistic systems in situations of bilingualism. This study compares the acoustic characteristics of Greek vowels produced by heritage speakers (HSs) of Albanian and monolingual Greek speakers, aiming to identify potential differences and explain them. The participants were adult second-generation HSs of Albanian with Greek as their dominant language, born and raised in Greece. A control group of age-matched monolingual Greek speakers was included for comparison purposes. All participants engaged in a controlled speech production task, with the data segmented to extract acoustic values pertaining to the first three formants and the duration of Greek vowels. Bayesian regression models were employed for the subsequent statistical analysis. The results demonstrated differences in the first three formants of certain vowels and the duration of all vowels. These differences can be attributed to the crosslinguistic effect of HL on the dominant language, as well as the interplay between the dynamic and internalized language system of the speakers and the complex effect of the sociophonetic context. These outcomes contribute to the hypothesis positing the emergence of deflected phonetic categories among a distinctive group of bilinguals, namely HSs. Furthermore, this study underscores the significance of a comprehensive exploration of the sociophonetic context of HSs for a nuanced understanding of their phonetic patterns. Full article
(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)

11 pages, 2878 KiB  
Article
The Effect of the Frequency and Energetic Content of Broadband Noise on the Lombard Effect and Speech Intelligibility
by Pasquale Bottalico and Silvia Murgia
Acoustics 2023, 5(4), 898-908; https://doi.org/10.3390/acoustics5040052 - 10 Oct 2023
Abstract
The Lombard effect is an unconscious reflex of speakers to increase vocal effort when disturbed by noise, aiming to enhance speech intelligibility. This study aims to evaluate the effect of noise with different energetic content and levels at various frequencies on the Lombard effect, communication disturbance, vocal comfort, and speech intelligibility. Twenty university students participated in the study, reading a six-sentence excerpt and performing an intelligibility test under 12 randomized noise conditions. These conditions included noises at low (20–500 Hz), medium (500–4000 Hz), and high frequencies (4000–20,000 Hz), at four levels (45 dB, 55 dB, 65 dB, 75 dB). After each condition, participants rated their perceived communication disturbance and vocal discomfort. The results indicated that noise with energetic content at medium frequencies produced the highest Lombard effect, produced the most detrimental effect on communication disturbance and vocal comfort, and caused the strongest decrease in speech intelligibility, whereas it was minimally affected by low- and high-frequency noise. In conclusion, this study highlights that medium-frequency noise has the greatest impact on vocal effort, communication disturbance, and vocal comfort, while low- and high-frequency noise has minimal effect on speech intelligibility. Full article
(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)

Other

10 pages, 585 KiB  
Technical Note
Text-Independent Phone-to-Audio Alignment Leveraging SSL (TIPAA-SSL) Pre-Trained Model Latent Representation and Knowledge Transfer
by Noé Tits, Prernna Bhatnagar and Thierry Dutoit
Acoustics 2024, 6(3), 772-781; https://doi.org/10.3390/acoustics6030042 - 29 Aug 2024
Abstract
In this paper, we present a novel approach for text-independent phone-to-audio alignment based on phoneme recognition, representation learning and knowledge transfer. Our method leverages a self-supervised model (Wav2Vec2) fine-tuned for phoneme recognition using a Connectionist Temporal Classification (CTC) loss, a dimension reduction model and a frame-level phoneme classifier trained using forced-alignment labels (using Montreal Forced Aligner) to produce multi-lingual phonetic representations, thus requiring minimal additional training. We evaluate our model using synthetic native data from the TIMIT dataset and the SCRIBE dataset for American and British English, respectively. Our proposed model outperforms the state-of-the-art (charsiu) in statistical metrics and has applications in language learning and speech processing systems. We leave experiments on other languages for future work but the design of the system makes it easily adaptable to other languages. Full article
(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer review.

Tentative title: Improving task naturalness using an immersive video game environment: A case with phonetic convergence
Abstract: Phonetic convergence, where interlocutors exhibit increased phonetic similarity during speech, varies according to numerous factors. This variability presents challenges in eliciting naturalistic language within laboratory settings. To tackle this, we employed a game-based methodology, utilizing the engaging environment of Minecraft to create a controlled yet dynamic task for pairs of participants. Tasked with navigating puzzles, including mazes marked by minimal pairs such as 'bear' versus 'pear', participants relied on clear communication to progress. We analyzed voice onset time, vowel length, first and second formant frequencies, and fundamental frequency values across thirty word-initial voicing minimal pairs to assess phonetic convergence. Our results revealed a dichotomy: while some participants exhibited convergence in acoustic features, others diverged. Notably, participants who successfully completed the task and those who demonstrated quicker task completion converged less along the measured acoustic features. These findings suggest that game-based tasks may elicit real-world linguistic interactions more intricately than traditional lab experiments, providing a nuanced insight into phonetic variation.