Editorial

Developments in Acoustic Phonetic Research

by Georgios P. Georgiou 1,2

1 Department of Languages and Literature, University of Nicosia, CY-2417 Nicosia, Cyprus
2 Phonetic Lab, University of Nicosia, CY-2417 Nicosia, Cyprus
Acoustics 2026, 8(1), 19; https://doi.org/10.3390/acoustics8010019
Submission received: 19 December 2025 / Accepted: 14 February 2026 / Published: 16 March 2026
(This article belongs to the Special Issue Developments in Acoustic Phonetic Research)

1. Introduction

Acoustic phonetics has entered a period of rapid expansion, shaped by new theoretical questions, richer empirical environments, and unprecedented advances in measurement and modeling. Where the field once relied primarily on laboratory recordings of isolated segments, researchers now routinely investigate speech as it occurs in daily life: embedded in noise, shaped by bilingual experience, influenced by musical or stylistic conventions, and processed by both human listeners and data-driven computational systems [1,2,3,4]. This broader perspective has deepened our understanding of how speech is produced, transmitted, and perceived under real communicative conditions, and it has revealed how much remains to be explained when speech departs from quiet rooms and carefully controlled tasks.
In addition, methodological developments have widened the empirical and analytical toolkit available to phoneticians. High-resolution imaging allows direct observation of articulatory structures in motion, computational models enable physically interpretable simulations of atypical or perturbed speech, and self-supervised learning approaches offer new ways to extract structure from large amounts of unlabeled audio. These innovations open possibilities for linking articulatory, acoustic, and perceptual levels of analysis in ways that were not previously feasible [5,6,7].
The Special Issue Developments in Acoustic Phonetic Research brings together work that reflects this momentum and illustrates the diversity of new directions in the field. The contributions span four major domains: (i) speech production and perception in noise, including the Lombard effect and intelligibility under adverse conditions; (ii) bilingual and heritage speech, where cross-linguistic influence reshapes vowel systems and challenges conventional models of phonetic categories; (iii) the acoustics of singing voices and their spatial radiation patterns; and (iv) technological and clinical applications, from noise-robust speaker recognition to vocal-tract simulations and self-supervised alignment methods.
Individually, these studies advance our understanding of specific questions. Collectively, they reveal a discipline that is increasingly interdisciplinary, empirically grounded, and oriented toward both theoretical insight and practical impact. They highlight not only what acoustic phonetics has achieved but also the challenges ahead, ranging from accent bias and data scarcity to the need for integrative models that connect physical, computational, and perceptual perspectives.

2. Papers Included in the Special Issue

2.1. Speech in Noise and the Lombard Effect

Bottalico and Murgia evaluate how the spectral content of broadband noise, not just its overall level, modulates the Lombard effect and speech intelligibility. In their experiment, speakers read sentences and performed intelligibility tests under 12 noise conditions: low (20–500 Hz), mid (500–4000 Hz), and high (4000–20,000 Hz) bands, each presented at 45, 55, 65, and 75 dB. They show that mid-frequency noise—overlapping with the bulk of speech energy and critical bands for intelligibility—induces the largest increases in vocal level, the greatest perceived disturbance and vocal discomfort, and the sharpest decline in intelligibility. In contrast, low- and high-frequency noise have relatively minor effects on both production and perception. This result is consistent with earlier psychoacoustic evidence that the Lombard response is frequency-specific and particularly sensitive to interference in the 500–4000 Hz range, which carries many consonantal cues.
The study fills a practical and theoretical gap. Many architectural and ergonomic standards still treat “background noise level” as a scalar. By explicitly manipulating spectral content, Bottalico and Murgia show that mid-band noise is disproportionately harmful for both talkers and listeners. This has direct implications for classroom and open-plan office design, where reducing mid-band noise (e.g., from ventilation, nearby talkers, and equipment) may be more crucial than reducing low-frequency rumbles. It also provides a richer acoustic context for work on Lombard speech intelligibility in hearing-impaired and cochlear-implant listeners, where Lombard-style modifications can sometimes be harnessed for benefit [8,9,10].
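For readers who wish to construct comparable stimuli, the sketch below generates band-limited noise for the three bands crossed with four relative levels, yielding the twelve conditions. It is a minimal illustration assuming white-noise carriers, a fourth-order Butterworth filter, and relative RMS scaling; it is not the authors' stimulus code, and absolute presentation levels (45–75 dB SPL) would require calibration of the playback chain.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_noise(fs, dur, lo, hi, rel_level_db, rng):
    """White noise band-passed to [lo, hi] Hz and scaled to a relative RMS level in dB."""
    x = rng.normal(size=int(fs * dur))
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x)
    return x * 10 ** (rel_level_db / 20) / np.sqrt(np.mean(x ** 2))

fs = 44100
rng = np.random.default_rng(0)
bands = {"low": (20, 500), "mid": (500, 4000), "high": (4000, 20000)}
rel_levels_db = [-30, -20, -10, 0]   # stand-ins for the 45/55/65/75 dB SPL presentation levels
stimuli = {(band, lvl): band_noise(fs, 2.0, lo, hi, lvl, rng)
           for band, (lo, hi) in bands.items() for lvl in rel_levels_db}   # 12 noise conditions
```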

2.2. Acoustic Phonetics in Bilingual and Heritage Contexts

Georgiou and Giannakou investigate how an Albanian heritage language shapes the dominant Greek vowel system in adult second-generation bilinguals. Heritage speakers grew up in Albanian-speaking households in Greece and use Greek as their dominant language. Comparing them to age-matched monolingual Greeks, the authors analyze F1–F3 formants and durations for all Greek vowels using Bayesian regression models. They report systematic differences in the first three formants of several vowels and longer durations across the board for heritage speakers. These patterns are interpreted as deflected phonetic categories emerging from the interaction of (i) cross-linguistic influence from Albanian (which has a somewhat different seven-vowel system), (ii) the internal dynamics of the bilingual phonological system, and (iii) the sociophonetic context of Albanian communities in Greece. Rather than viewing heritage grammars as “incomplete”, the study argues for seeing them as restructured systems that respond to rich social meanings and usage patterns. This work tackles an underexplored area in the acoustic documentation of heritage speakers in Southeastern Europe, complementing prior work on Greek vowels and Albanian vowel systems and offering a valuable reference for future research on cross-language vowel learning, sociophonetic identity construction, and clinical assessment of bilingual speech.
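As a rough illustration of the analytical approach, a Bayesian regression comparing heritage and monolingual speakers on a single formant might look like the PyMC sketch below. The data, priors, and single-predictor structure are assumptions made for exposition; the authors' actual models include more predictors and structure than this.

```python
import numpy as np
import pymc as pm

# Hypothetical F1 values (Hz) for one Greek vowel: group 0 = monolingual, 1 = heritage
rng = np.random.default_rng(0)
group = np.repeat([0, 1], 40)
f1 = np.where(group == 0, 480.0, 505.0) + rng.normal(0, 30, size=group.size)

with pm.Model():
    intercept = pm.Normal("intercept", mu=500, sigma=100)
    beta_group = pm.Normal("beta_group", mu=0, sigma=50)   # heritage-vs-monolingual shift in Hz
    sigma = pm.HalfNormal("sigma", sigma=50)
    pm.Normal("f1_obs", mu=intercept + beta_group * group, sigma=sigma, observed=f1)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)

# The posterior of beta_group summarizes the credible magnitude of the heritage effect on F1.
```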
Yang provides a detailed acoustic analysis of how first language (L1) Mandarin and second language (L2) Cantonese vowels interact in late bilinguals living in Hong Kong. Late Mandarin–Cantonese bilinguals, who acquired Cantonese after puberty and remain Mandarin-dominant, are compared to monolingual Mandarin and Cantonese controls. Using carefully controlled sentence frames in both languages, the study examines the three corner vowels /i a u/, extracting F1–F2 values at vowel midpoints and analyzing them with linear mixed-effects models, supplemented by Euclidean distance measures in the F1–F2 plane. The results show clear L2-induced restructuring of the L1; relative to Mandarin monolinguals, bilinguals produce Mandarin /i/ as backer, /u/ as fronter, and /a/ as higher and fronter, yielding a more crowded and centralized vowel space. At the same time, their Cantonese vowels do not fully converge on Cantonese monolingual norms—/i/ is backer and lower, while /a/ and /u/ are generally higher—despite several years of residence and advanced functional proficiency. These patterns point to substantial but incomplete L2 phonetic learning in adulthood. Critically, Yang interprets these outcomes in the framework of the Speech Learning Model and its revised version (SLM, SLM-r) and the Category Disassociation Hypothesis (CDH). The bilinguals’ productions exhibit both assimilation and dissimilation, since some L2 categories (e.g., Cantonese /a/) are approximated relatively well, suggesting the formation of new L2 categories, while in other cases, L1 and L2 categories remain distinct but are shifted away from each other in opposite directions to maintain contrast. This indicates that L1 and L2 vowel systems are mutually interactive and remain plastic well into adulthood, with both convergence and contrast-enhancing divergence at play.
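The Euclidean-distance comparison is straightforward to reproduce from mean midpoint formants; the sketch below uses invented values purely to show the computation, not the study's data.

```python
import numpy as np

# Hypothetical mean midpoint formants (F1, F2 in Hz) for the Mandarin corner vowels
monolingual = {"i": (300, 2300), "a": (850, 1300), "u": (320, 800)}
bilingual   = {"i": (310, 2150), "a": (780, 1400), "u": (330, 950)}

def f1f2_distance(p, q):
    """Euclidean distance between two vowels in the F1–F2 plane."""
    return float(np.hypot(p[0] - q[0], p[1] - q[1]))

for v in "iau":
    print(f"/{v}/: {f1f2_distance(monolingual[v], bilingual[v]):.0f} Hz from the monolingual mean")
```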

2.3. Beyond Speech: Singing Voice and Spatial Acoustics

Dedousis et al. extend acoustic-phonetic inquiry to the singing voice, focusing on the hemispherical directivity of Greek sung vowels /a e i o u/ across low, mid, and high registers for two classical and two Byzantine-chant singers. Using 29 microphones mounted on a hemispherical frame in a hemi-anechoic room, they compute third-octave-band directivity patterns centered on each singer’s F1–F3. They find that directivity patterns vary systematically with center frequency (F1 vs. F2 vs. F3), register, vowel quality, and singing style. For example, higher registers and front vowels tend to produce more forward-focused radiation at F2 and F3, while differences between classical and Byzantine chant emerge in both projection (energy distribution across bands) and radiation (spatial energy patterns).
The study closes two important gaps. First, it adds formant-referenced directivity data for a non-Anglophone, stylistically distinctive tradition (Byzantine chant), where work on formant tuning and ornamentation is still limited. Second, it provides high-resolution hemispherical data that can inform room-acoustic simulations, auralization, VR/AR applications, and microphone placement strategies for both research and performance.
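A minimal version of the formant-centred, third-octave-band directivity computation described above could be written as follows. The microphone signals, reference channel, sampling rate, and filter settings are placeholders; the published analysis (windowing, averaging, and calibration across the 29 channels) is more involved than this sketch.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def third_octave_level(x, fs, fc):
    """Level (dB) of signal x in the third-octave band centred on fc (Hz)."""
    lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    y = sosfiltfilt(sos, x)
    return 10 * np.log10(np.mean(y ** 2) + 1e-12)

fs = 48000
rng = np.random.default_rng(1)
recordings = rng.normal(size=(29, fs))   # placeholder data for 29 simultaneous microphone channels
f2 = 1200.0                              # F2 of the sung vowel, estimated separately (e.g., via LPC)
levels = np.array([third_octave_level(ch, fs, f2) for ch in recordings])
directivity_db = levels - levels[0]      # radiation pattern relative to a reference (frontal) microphone
```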

2.4. Signal Processing, Alignment, and Clinical/Technological Applications

2.4.1. Noise-Resilient Speaker Recognition

Chauhan et al. examine how speaker recognition systems can be made more robust to noise by combining feature-level fusion, dimensionality reduction (PCA and ICA), and metaheuristic feature optimization (genetic algorithms and marine predator optimization). Using TIMIT in babble and white noise (for 120 and 630 speakers) and VoxCeleb1, they show that optimized feature subsets achieve very high speaker identification accuracies (≈93–95%) and extremely low equal error rates (down to 0.13%) under noisy conditions. The study directly addresses a methodological limitation: many speaker-identification systems rely on large, deep models while under-optimizing the underlying feature spaces, particularly in noisy or computationally constrained settings. By systematically evaluating fusion and optimization strategies and reporting detailed performance across datasets and classifiers, the authors outline a roadmap for building efficient, noise-robust speaker recognition systems that remain accessible to labs and applications without GPU-intensive deep architectures.
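The general recipe (feature-level fusion, dimensionality reduction, then a search over feature subsets) can be sketched with scikit-learn. Here a crude random search stands in for the genetic and marine-predator optimizers, and the synthetic features, SVM classifier, and dimensionalities are assumptions rather than the authors' configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_speakers, n_utts = 12, 10
y = np.repeat(np.arange(n_speakers), n_utts)
X_mfcc = rng.normal(size=(y.size, 13)) + 0.10 * y[:, None]   # placeholder MFCC-like features
X_spec = rng.normal(size=(y.size, 20)) + 0.05 * y[:, None]   # placeholder spectral features

X = np.hstack([X_mfcc, X_spec])               # feature-level fusion
X = StandardScaler().fit_transform(X)
X = PCA(n_components=20).fit_transform(X)     # dimensionality reduction

best_score, best_mask = -np.inf, None         # random search standing in for metaheuristic optimization
for _ in range(100):
    mask = rng.random(X.shape[1]) < 0.5
    if not mask.any():
        continue
    score = cross_val_score(SVC(), X[:, mask], y, cv=5).mean()
    if score > best_score:
        best_score, best_mask = score, mask
print(f"Best CV identification accuracy: {best_score:.2f} using {int(best_mask.sum())} features")
```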

2.4.2. Noise Reduction and Perceptual Validation

Rodríguez Bojorjes et al. evaluate an adaptive Least Mean Squares (LMS) filter, driven by the Steepest Descent Method, for removing Gaussian noise from a male speech recording. Objectively, the LMS filter improves SNR by about 20 dB. Perceptually, in a blind test with 30 young adults (20–30 years), 80% correctly identified which of two signals had undergone noise reduction—well above the 50% chance level, as confirmed by a Z-test on proportions. The authors also note an age-related pattern in which listeners aged 30+ (a small subset of the sample) have more difficulty detecting the improvement, consistent with known declines in sensitivity around 3–5 kHz that are critical for speech intelligibility. The study thus exemplifies good practice in bridging signal-based and human-based evaluation; improvements in SNR are explicitly checked against what listeners actually hear, emphasizing that objective enhancement does not automatically translate into perceived benefit.
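To make both evaluation strands concrete, the sketch below pairs a textbook LMS noise canceller with the corresponding one-proportion z-test. The synthetic signals, filter length, step size, and the assumption of a noise-correlated reference channel are illustrative choices, not the authors' implementation; the listener counts follow from the reported 80% of 30 participants.

```python
import numpy as np
from scipy.stats import norm

def lms_cancel(primary, reference, taps=32, mu=0.005):
    """LMS adaptive noise canceller (steepest descent). The error signal estimates the
    clean speech when `reference` is correlated with the noise in `primary`."""
    w = np.zeros(taps)
    out = np.zeros_like(primary)
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]
        e = primary[n] - np.dot(w, x)      # error = primary minus estimated noise
        w += 2 * mu * e * x                # steepest-descent weight update
        out[n] = e
    return out

def snr_db(clean, estimate):
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((estimate - clean) ** 2))

rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2 * fs) / fs
speech = 0.5 * np.sin(2 * np.pi * 180 * t) + 0.3 * np.sin(2 * np.pi * 700 * t)  # speech stand-in
noise = rng.normal(0, 0.5, size=speech.size)                                     # Gaussian noise
denoised = lms_cancel(speech + noise, noise)
print(f"SNR before: {snr_db(speech, speech + noise):.1f} dB, after: {snr_db(speech, denoised):.1f} dB")

# One-proportion z-test: 24 of 30 listeners (80%) chose the denoised signal, chance = 50%
count, n, p0 = 24, 30, 0.5
z = (count / n - p0) / np.sqrt(p0 * (1 - p0) / n)
print(f"z = {z:.2f}, one-sided p = {norm.sf(z):.4f}")
```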

2.4.3. Simulation of Velopharyngeal Insufficiency (VPI)

Shiraishi et al. develop boundary-element vocal-tract models to simulate the acoustic consequences of velopharyngeal insufficiency during Japanese vowels /i/ and /u/. CT scans from six healthy adults are used to build 3D models from the frontal sinus to the tracheal bifurcation. VPI is modeled by opening or enlarging nasopharyngeal coupling sites (1–3 mesh holes) to emulate leakage from oral to nasal cavities. Frequency response curves (1–3000 Hz) reveal consistent acoustic signatures of the VPI-like state: a pole–zero pair around 500 Hz, increased energy near 250 Hz, reduced levels around 500 Hz, and decreased intensity and/or frequency of F1 and F2. These patterns align with prior descriptions of hypernasal speech (e.g., low-frequency boost, attenuation around 500 Hz, reduced formant intensity, and bandwidth changes), suggesting that the simulation captures clinically relevant acoustic phenomena. The study fills a methodological gap in cleft-palate research by showing how physically interpretable changes in vocal-tract geometry can be linked to acoustic consequences without relying solely on patient data, paving the way for in silico experiments that test hypotheses about morphology, surgery, and acoustic perception.
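The reported signatures can be quantified directly from the simulated frequency-response curves once they are exported; the short sketch below compares band levels near 250 and 500 Hz, using toy curves in place of the actual BEM output.

```python
import numpy as np

def band_level(freqs, response_db, lo, hi):
    """Mean level (dB) of a frequency-response curve between lo and hi (Hz)."""
    sel = (freqs >= lo) & (freqs <= hi)
    return float(np.mean(response_db[sel]))

freqs = np.linspace(1, 3000, 3000)
normal_db = -20 * np.log10(1 + freqs / 800)                     # toy baseline curve, not BEM output
vpi_db = normal_db + 6 * np.exp(-((freqs - 250) / 80) ** 2) \
                   - 8 * np.exp(-((freqs - 500) / 80) ** 2)     # toy VPI-like curve

for lo, hi, label in [(200, 300, "near 250 Hz"), (450, 550, "near 500 Hz")]:
    delta = band_level(freqs, vpi_db, lo, hi) - band_level(freqs, normal_db, lo, hi)
    print(f"VPI-minus-normal level {label}: {delta:+.1f} dB")
```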

2.4.4. Self-Supervised Text-Independent Phone-to-Audio Alignment

Tits et al. propose TIPAA-SSL, a text-independent phone-to-audio alignment system that leverages a wav2vec 2.0 model fine-tuned for phoneme recognition using CTC loss. The pipeline consists of the following: (i) extracting frame-level SSL representations, (ii) applying PCA for dimensionality reduction, and (iii) training a frame-level phoneme classifier on forced-alignment labels obtained with the Montreal Forced Aligner. The resulting system outputs both phoneme labels and boundaries for each audio frame. On TIMIT (American English) and SCRIBE (British English), TIPAA-SSL outperforms the state-of-the-art text-independent character model on precision, recall, and F1, with only a modest trade-off in r-value. Crucially, the authors emphasize that self-supervised pre-training on large unannotated corpora combined with relatively small supervised alignment sets yields a model that is less biased towards American English and more robust to accent variation, a persistent issue in SSL-based systems trained primarily on American data. TIPAA-SSL thus addresses a key gap between powerful but opaque SSL representations and practical tasks such as pronunciation training, computer-assisted language learning, and multilingual corpus annotation. It demonstrates how modern representation learning can be wrapped in task-specific modules that remain data-efficient and adaptable to new languages.
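A stripped-down version of this pipeline can be assembled from off-the-shelf components, as in the sketch below. Note the assumptions: the checkpoint is the generic wav2vec 2.0 base model rather than the authors' CTC-fine-tuned phoneme recognizer, the audio is random, and the random labels merely stand in for Montreal Forced Aligner output.

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# (i) Frame-level SSL representations (roughly one 768-dimensional vector per 20 ms)
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")
ssl = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
waveform = np.random.randn(48000).astype(np.float32)        # 3 s of placeholder audio at 16 kHz
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    frames = ssl(**inputs).last_hidden_state[0].numpy()     # shape (n_frames, 768)

# (ii) PCA for dimensionality reduction
frames_red = PCA(n_components=32).fit_transform(frames)

# (iii) Frame-level phoneme classifier trained on forced-alignment labels (random stand-ins here)
labels = np.random.randint(0, 40, size=len(frames_red))
clf = LogisticRegression(max_iter=1000).fit(frames_red, labels)
pred = clf.predict(frames_red)   # per-frame phoneme labels; boundaries fall where the label changes
```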

3. Cross-Cutting Themes and Remaining Gaps

3.1. Frequency- and Space-Aware Thinking About Speech

The Lombard study emphasizes that mid-frequency noise is uniquely disruptive for intelligibility and comfort. The LMS work explicitly relates noise reduction to frequency-dependent loudness and age-related sensitivity around 4 kHz. The VPI simulations highlight how small geometric changes in velopharyngeal coupling selectively perturb energy around 250 and 500 Hz and reshape F1/F2 patterns. And the singing-voice paper shows that directivity is not only frequency-dependent but also sensitive to vowel-specific and style-specific articulatory settings. Together, these studies push the field away from scalar metrics (“overall level”, “overall SNR”) and toward more nuanced accounts of where in the spectrum and where in space speech energy matters most.

3.2. Plastic, Socially Embedded Phonetic Systems in Bilinguals

The heritage-speaker study illustrates that the dominant language itself (here, Greek) bears the imprint of the heritage language (Albanian) in subtle shifts in formant structure and timing. This pattern resonates with Yang’s findings on Mandarin–Cantonese late bilinguals, where bidirectional interactions between L1 and L2 vowels similarly reshape acoustic space rather than leaving either system intact. Together, these results align with broader research on phonetic plasticity in bilingual and diglossic populations, where both systems are shaped simultaneously by input, identity, and interaction patterns. Yet longitudinal data tracing such changes over time remain scarce, especially outside a few well-studied language pairs.

3.3. Integration of Physical Models, Machine Learning, and Perception

The Special Issue showcases the full stack, from physically interpretable vocal-tract models (VPI simulations) to optimized acoustic features (speaker recognition, LMS filtering) and SSL-based representations (TIPAA-SSL), all evaluated through engineering metrics and, in several cases, listener judgments. However, truly closed-loop studies, in which model choices are systematically constrained by theories of perception and production and then evaluated against both behavioral and neural data, are still rare.

3.4. Data Scarcity, Annotation Cost, and Accent Bias

Several papers grapple with limited or biased datasets, such as heritage vowels in a specific immigrant community, clinical simulations based on healthy speakers, and alignment systems trained primarily on American and British English corpora. This reflects a broader challenge in acoustic phonetics and speech technology, namely, building models that work for underrepresented languages, accents, and clinical populations, where collecting large, carefully annotated corpora is costly or ethically complex.

3.5. The Need for Theory-Driven Yet Application-Oriented Research

Many contributions have clear real-world targets—better classroom acoustics, robust speaker ID, more accurate CALL tools, clinically informative simulations—but they also raise theoretical questions about the nature of phonetic categories, articulatory-acoustic mapping, and perceptual weighting of cues in complex environments. Exploiting this bidirectionality, where theory guides design and applications stress-test theory, remains a key opportunity.

4. Future Directions for Acoustic Phonetic Research

4.1. Comprehensive Models of Intelligibility in Complex, Multilingual Listening Conditions

Research on speech under adverse conditions has long emphasized the joint roles of acoustic, phonetic, and lexical redundancy in supporting perception. Recent work on clear and noise-adapted speech shows that talker adaptations can yield substantial intelligibility benefits for native and non-native listeners, but that these benefits vary with accent and listening task [11]. Future models should integrate detailed spectral-temporal descriptors, talker and listener characteristics (age, hearing status, L1/L2 background, prior experience with accents), and measures of listening effort, not just word-recognition scores.

4.2. Longitudinal, Multimodal Studies of Phonetic Change in Real Communities

Cross-sectional snapshots of heritage, immigrant, and multilingual speech show that acoustic categories are dynamic and socially conditioned, but we rarely see how they evolve across life stages, migration histories, or shifts in language dominance. Longitudinal corpora combining audio, articulatory imaging (e.g., real-time MRI and ultrasound), and rich social metadata would allow researchers to track how vowel spaces, consonant realizations, and prosodic patterns drift over time, and how listeners adapt to these changes in noise and quiet [12].

4.3. MRI-Driven 3D Vocal-Tract Modeling for Both Typical and Disordered Speech

Recent MRI-based studies have begun to link volumetric vocal-tract configurations to acoustic output for sustained vowels and simple utterances, validating synthesized signals against recorded speech [13]. Future work should extend these methods to running speech, a wider variety of languages, and clinically relevant conditions (e.g., structural anomalies and neurogenic disorders), while maintaining tight coupling between geometric manipulations, predicted acoustics, and perceptual consequences. This would complement simulation-based approaches (such as the VPI work) and provide a physically grounded basis for diagnosis and treatment planning.

4.4. Interpretable Self-Supervised Representations for Phonetics

Self-supervised models already provide powerful representations for ASR and phone recognition with limited labeled data. A major challenge for acoustic phonetics is to make these representations interpretable, namely, how do latent dimensions relate to classic phonetic parameters such as formant structure, spectral tilt, periodicity vs. aperiodicity, voice quality, or prosodic contours? Emerging work on probing, representation analysis, and explainable AI in speech offers tools for this task, but more targeted phonetic studies are needed. Interpretable SSL could, in turn, help uncover which acoustic cues actually drive intelligibility, talker identification, and alignment performance across languages and accents.
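One concrete way to operationalize such probing is to fit a simple linear model from frame-level SSL features to a classic phonetic parameter such as F1. Everything in the sketch below (the feature dimensionality, the synthetic F1 track, the ridge probe) is an assumption used only to show the idea; in practice the targets would come from a conventional formant tracker aligned to the SSL frames.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
ssl_frames = rng.normal(size=(500, 768))                       # placeholder SSL frame representations
f1_hz = 500 + 100 * ssl_frames[:, 0] + rng.normal(0, 20, 500)  # toy F1 track tied to one dimension

probe = Ridge(alpha=1.0)
r2 = cross_val_score(probe, ssl_frames, f1_hz, cv=5, scoring="r2").mean()
print(f"Cross-validated R^2 of a linear F1 probe: {r2:.2f}")
# A high R^2 would suggest that formant information is linearly decodable from the representation.
```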

4.5. Human-Centered Evaluation Frameworks for Speech Technologies

As enhancement, recognition, and alignment systems are deployed in hearing devices, educational platforms, and clinical tools, evaluation must go beyond error rates to include intelligibility, listening effort, fatigue, fairness across accents, and user experience. Recent reviews of central representations of speech-in-noise perception and of accented speech processing highlight how individual differences and neural factors interact with acoustic cues [14]. Combining psychoacoustic, neuroimaging, and user-experience measures with classic engineering metrics will be crucial for designing technologies that truly support communication, rather than merely optimizing scores.

4.6. Bridging Laboratory Phonetics and Generative Speech Technologies

Finally, the growth of generative TTS and voice conversion, including systems that synthesize Lombard-style or clear speech, opens an experimental avenue for acoustic phonetics: using controllable synthetic voices to test hypotheses about cue weighting, learning, and adaptation [15]. Collaborations between phoneticians and speech technologists could yield synthetic stimuli that are both ecologically plausible and tightly controlled, enabling experiments that would be difficult or impossible with human talkers alone.

5. Conclusions

The contributions to Developments in Acoustic Phonetic Research demonstrate a field that is increasingly interdisciplinary, methodologically diverse, and deeply connected to real-world communication problems. Across the contributions surveyed here, a common trajectory emerges: acoustic phonetics is expanding beyond controlled laboratory paradigms and embracing the complexity of real communicative environments, which are noisy, multilingual, stylistically diverse, and technologically mediated. The work in this Special Issue shows how advances in measurement, modeling, and representation learning are beginning to capture that complexity, yielding tools and insights that speak simultaneously to theory, engineering, and clinical practice. Yet the field’s most promising opportunities lie in deeper integration, linking physical models with machine learning, aligning acoustic evidence with social and cognitive accounts of bilingualism, and evaluating technologies in ways that reflect real listener experiences. As speech research continues to move toward more ecologically grounded and computationally rich approaches, such cross-disciplinary connections will be essential for understanding how speech is produced, perceived, and shaped across an increasingly diverse and noisy world.

Conflicts of Interest

The author declares no conflicts of interest.

List of Contributions

  • Shiraishi, M.; Mishima, K.; Takekawa, M.; Mori, M.; Umeda, H. Clarification of the Acoustic Characteristics of Velopharyngeal Insufficiency by Acoustic Simulation Using the Boundary Element Method: A Pilot Study. Acoustics 2025, 7, 26. https://doi.org/10.3390/acoustics7020026.
  • Bojorjes, A.R.; Garcia-Barrientos, A.; Cárdenas-Juárez, M.; Pineda-Rico, U.; Arce, A.; Velasquez, S.M.; Cortés, O.P. A Z-Test-Based Evaluation of a Least Mean Square Filter for Noise Reduction. Acoustics 2025, 7, 20. https://doi.org/10.3390/acoustics7020020.
  • Dedousis, G.; Bakogiannis, K.; Andreopoulou, A.; Georgaki, A. Vocal Directivity of the Greek Singing Voice on the First Three Formant Frequencies. Acoustics 2025, 7, 13. https://doi.org/10.3390/acoustics7010013.
  • Yang, Y. Acoustic Analyses of L1 and L2 Vowel Interactions in Mandarin–Cantonese Late Bilinguals. Acoustics 2024, 6, 568–578. https://doi.org/10.3390/acoustics6020030.
  • Chauhan, N.; Isshiki, T.; Li, D. Enhancing Speaker Recognition Models with Noise-Resilient Feature Optimization Strategies. Acoustics 2024, 6, 439–469. https://doi.org/10.3390/acoustics6020024.
  • Georgiou, G.P.; Giannakou, A. Acoustic Characteristics of Greek Vowels Produced by Adult Heritage Speakers of Albanian. Acoustics 2024, 6, 257–271. https://doi.org/10.3390/acoustics6010014.
  • Bottalico, P.; Murgia, S. The Effect of the Frequency and Energetic Content of Broadband Noise on the Lombard Effect and Speech Intelligibility. Acoustics 2023, 5, 898–908. https://doi.org/10.3390/acoustics5040052.
  • Tits, N.; Bhatnagar, P.; Dutoit, T. Text-Independent Phone-to-Audio Alignment Leveraging SSL (TIPAA-SSL) Pre-Trained Model Latent Representation and Knowledge Transfer. Acoustics 2024, 6, 772–781. https://doi.org/10.3390/acoustics6030042.

References

  1. Jorgensen, E.J. Acoustic Complexity in Real-World Noise and Effects on Speech Perception for Listeners with Normal Hearing and Hearing Loss. Ph.D. Thesis, The University of Iowa, Iowa City, IA, USA, 2022.
  2. Georgiou, G.P.; Kaskampa, A. Differences in voice quality measures among monolingual and bilingual speakers. Ampersand 2024, 12, 100175.
  3. Goverts, S.T.; Best, V.; Bouwmeester, J.; Smits, C.; Colburn, H.S. Acoustic Realism of Clinical Speech-in-Noise Testing: Parameter Ranges of Speech-Likeness, Interaural Coherence, and Interaural Differences. Trends Hear. 2025, 29, 23312165251336625.
  4. Singh, A.; Kaur, N.; Kukreja, V.; Kadyan, V.; Kumar, M. Computational intelligence in processing of speech acoustics: A survey. Complex Intell. Syst. 2022, 8, 2623–2661.
  5. Inouye, J.M.; Perry, J.L.; Lin, K.Y.; Blemker, S.S. A computational model quantifies the effect of anatomical variability on velopharyngeal function. J. Speech Lang. Hear. Res. 2015, 58, 1119–1133.
  6. Lim, Y.; Zhu, Y.; Lingala, S.G.; Byrd, D.; Narayanan, S.; Nayak, K.S. 3D dynamic MRI of the vocal tract during natural speech. Magn. Reson. Med. 2019, 81, 1511–1520.
  7. Vaessen, N.; Ordelman, R.; van Leeuwen, D.A. Self-supervised learning of speech representations with Dutch archival data. In Proceedings of Interspeech 2025, Rotterdam, The Netherlands, 17–21 August 2025; ISCA: Copenhagen, Denmark, 2025; pp. 1208–1212.
  8. Hansen, J.H.; Lee, J.; Ali, H.; Saba, J.N. A speech perturbation strategy based on “Lombard effect” for enhanced intelligibility for cochlear implant listeners. J. Acoust. Soc. Am. 2020, 147, 1418–1428.
  9. Meemann, K.; Smiljanić, R. Intelligibility of noise-adapted and clear speech in energetic and informational maskers for native and nonnative listeners. J. Speech Lang. Hear. Res. 2022, 65, 1263–1281.
  10. Zhao, Y.; Ando, A.; Takaki, S.; Yamagishi, J.; Kobashikawa, S. Does the Lombard effect improve emotional communication in noise? Analysis of emotional speech acted in noise. In Proceedings of Interspeech 2019, Graz, Austria, 15–19 September 2019; ISCA: Copenhagen, Denmark, 2019; pp. 3292–3296.
  11. Aoki, N.B.; Zellou, G. Being clear about clear speech: Intelligibility of hard-of-hearing-directed, non-native-directed, and casual speech for L1- and L2-English listeners. J. Phon. 2024, 104, 101328.
  12. Clément, P.; Hans, S.; Hartl, D.M.; Maeda, S.; Vaissière, J.; Brasnu, D. Vocal tract area function for vowels using three-dimensional magnetic resonance imaging. A preliminary study. J. Voice 2007, 21, 522–530.
  13. Murmura, B.; Barbiera, F.; Mecorio, F.; Bortoluzzi, G.; Orefice, I.; Vetrano, E.; Gucciardo, A.G. Vocal tract physiology and its MRI evaluation. Rev. Investig. Innovación Cienc. Salud 2021, 3, 47–56.
  14. Adank, P.; Nuttall, H.E.; Banks, B.; Kennedy-Higgins, D. Neural bases of accented speech perception. Front. Hum. Neurosci. 2015, 9, 558.
  15. Woszczyk, D.; Ribeiro, M.S.; Merritt, T.; Korzekwa, D. Voice Conversion for Lombard Speaking Style with Implicit and Explicit Acoustic Feature Conditioning. arXiv 2025, arXiv:2507.09310.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
