Diagnostic Accuracy of Artificial Intelligence in Laryngeal Disorders: An Integrative Review
Abstract
1. Introduction
2. Methods
3. Results and Discussion
3.1. Physioacoustic Foundations of Automated Detection
3.1.1. Source–Filter Model and Glottic Biomechanics
3.1.2. Acoustic Correlates and Exploitable Features
3.1.3. Acoustic Overlap Between Pathophysiologically Distinct Conditions
3.2. Panorama of AI Approaches
3.2.1. Data Characteristics and Acquisition Protocols
Recording Modalities and Acoustic Conditions
Vocal Tasks: Sustained Vowels or Connected Speech
Databases and Representativeness Bias
Ecological Conditions and Noise Robustness
3.2.2. Classical ML and Feature Engineering
3.2.3. Deep Architectures and Representation Learning
3.2.4. Validation Strategies: Internal Versus External
4. Diagnostic Performance and Methodological Biases
4.1. Synthesis of Reported Performance
4.2. Taxonomy of Methodological Biases
4.2.1. Selection Bias
4.2.2. Measurement Bias
4.2.3. Analysis Bias
4.2.4. Publication Bias
4.3. Reproducibility and Inter-Center Variability
5. Current Clinical Recognition Levels
5.1. Three-Level Framework for Clinical Recognition
5.1.1. Level 1: Binary Detection (Healthy Versus Pathological)
5.1.2. Level 2: Pathophysiological Category Recognition
5.1.3. Level 3: Specific Pathology Identification
5.2. Recognition Level and Clinical Positioning
6. Gaps and Perspectives
6.1. Underrepresented Disorders and Populations
6.2. Methodological Standardization Needs
7. Recommendations for Future Research and Clinical Deployment
7.1. Research Priorities
7.2. Clinical Deployment and Integration into Practice
7.3. Regulatory and Ethical Considerations
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| ML | Machine Learning |
| DL | Deep Learning |
| VQ | Voice Quality |
| F0 | Fundamental frequency |
| HNR | Harmonic-to-noise ratio |
| UVFP | Unilateral Vocal Fold Paralysis |
| MEEI | Massachusetts Eye and Ear Infirmary Database |
| SVD | Saarbrücken Voice Database |
| AVPD | Arabic Voice Pathology Database |
| VOICED | Vox4Health m-health clinical study cohort |
| FEMH | Far Eastern Memorial Hospital voice disorder database |
| CNN | Convolutional Neural Network |
| LSTM | Long Short-term Memory |
| MFCC | Mel-Frequency Cepstral Coefficient |
| U.S. | United States |
References
- Schwartz, S.R.; Cohen, S.M.; Dailey, S.H.; Rosenfeld, R.M.; Deutsch, E.S.; Gillespie, M.B.; Granieri, E.; Hapner, E.R.; Kimball, C.E.; Krouse, H.J.; et al. Clinical practice guideline: Hoarseness (dysphonia). Otolaryngol. Head Neck Surg. 2009, 141, S1–S31.
- Cohen, S.M. Self-reported impact of dysphonia in a primary care population: An epidemiological study. Laryngoscope 2010, 120, 2022–2032.
- Stojanovic, J.; Radovanovic, S.; Jevtic, M.; Krsmanovic, S.; Jovanovic, M.; Jevtovic, A.; Babac, S.; Veselinovic, M.; Bojanovic, M.; Krejovic-Trivic, S.B.; et al. Dysphonia in Occupational Voice Users: Risk Factors, Causes and Socioepidemiological Profiles. Medicina 2026, 62, 381.
- Baudouin, R.; Lechien, J.R.; Carpentier, L.; Gurruchaga, J.M.; Lisan, Q.; Hans, S. Deep Brain Stimulation Impact on Voice and Speech Quality in Parkinson’s Disease: A Systematic Review. Otolaryngol. Head Neck Surg. 2023, 168, 307–318.
- Lechien, J.R.; Khalife, M.; Huet, K.; Finck, C.; Bousard, L.; Delvaux, V.; Piccaluga, M.; Harmegnies, B.; Saussez, S. Perceptual, Aerodynamic, and Acoustic Characteristics of Voice Changes in Patients with Laryngopharyngeal Reflux Disease. Ear Nose Throat J. 2019, 98, E44–E50.
- Akbari, A.; Arjmandi, M.K. Employing linear prediction residual signal of wavelet sub-bands in automatic detection of laryngeal pathology. Biomed. Signal Process. Control 2015, 18, 293–302.
- Al-Nasheri, A.; Muhammad, G.; Alsulaiman, M.; Ali, Z. Investigation of Voice Pathology Detection and Classification on Different Frequency Regions Using Correlation Functions. J. Voice 2016, 31, 3–15.
- Low, D.M.; Rao, V.; Randolph, G.; Song, P.C.; Ghosh, S.S. Identifying bias in models that detect vocal fold paralysis from audio recordings using explainable machine learning and clinician ratings. PLoS Digit. Health 2024, 3, e0000516.
- Kumar, S.P.; Narayanan, N.; Ramachandran, J.; Thangavel, B. Convolutional neural network for voice disorders classification using kymograms. Biomed. Signal Process. Control 2023, 86, 105159.
- Kim, H.; Jeon, J.; Han, Y.J.; Joo, Y.; Lee, J.; Lee, S.; Im, S. Convolutional Neural Network Classifies Pathological Voice Change in Laryngeal Cancer with High Accuracy. J. Clin. Med. 2020, 9, 3415.
- Lu, C.; Zhou, Y.; Zhu, S.; Yuan, Y.; Kong, D.; Wang, Z.; Lv, X.; Lu, R.; Xie, Y.; Niu, X.; et al. Acoustic Biomarkers Derived From Computerized Voice Analysis for Predicting Anterior Commissure Involvement and Survival in Laryngeal Carcinoma. J. Voice 2025, in press.
- Wang, C.T.; Chen, T.M.; Lee, N.T.; Fang, S.H. AI Detection of Glottic Neoplasm Using Voice Signals, Demographics, and Structured Medical Records. Laryngoscope 2024, 134, 4585–4592.
- Rusz, J.; Švihlík, J.; Krejčí, P.; Novotný, M.; Tykalová, T. Reproducibility of Voice Analysis with Machine Learning. Mov. Disord. 2021, 36, 1282–1283.
- Vrba, J.; Steinbach, J.; Jirsa, T.; Verde, L.; De Fazio, R.; Zeng, Y.; Ichiji, K.; Hájek, L.; Sedláková, Z.; Urbániová, Z.; et al. Reproducible Machine Learning-Based Voice Pathology Detection: Introducing the Pitch Difference Feature. J. Voice 2025, in press.
- Van der Woerd, B.; Chen, Z.; Flemotomos, N.; Oljaca, M.; Timmons Sund, L.; Narayanan, S.; Johns, M.M. A Machine-Learning algorithm for the automated perceptual evaluation of dysphonia severity. J. Voice 2023, 39, 1440–1445.
- Kim, Y.E.; Dobko, M.; Li, H.; Shao, T.; Periyakoil, P.; Tipton, C.; Colasacco, C.; Serpedin, A.; Elemento, O.; Sabuncu, M.; et al. A Deep-Learning Model for Multi-class Audio Classification of Vocal Fold Disorders in Office Stroboscopy. Laryngoscope 2025, 135, 2428–2436.
- Schlegel, P.; Kniesburges, S.; Dürr, S.; Schützenberger, A.; Döllinger, M. Machine learning based identification of relevant parameters for functional voice disorders derived from endoscopic high-speed recordings. Sci. Rep. 2020, 10, 10517.
- Tseng, W.H.; Lee, M.S.; Hong, S.C.; Hsiao, T.Y.; Yang, T.L. Application of an AI-Based Model for Non-Invasive Sonographic Assessment for Injection Laryngoplasty. Otolaryngol. Head Neck Surg. 2025, 173, 144–153.
- Aichinger, P.; Schoentgen, J. Detection of Diplophonation in Audio Recordings of German Standard Text Readings. J. Voice 2018, 33, 949.e1–949.e10.
- Lechien, J.R.; Geneid, A.; Bohlender, J.E.; Cantarella, G.; Avellaneda, J.C.; Desuter, G.; Sjogren, E.V.; Finck, C.; Hans, S.; Hess, M.; et al. Consensus for voice quality assessment in clinical practice: Guidelines of the European Laryngological Society and Union of the European Phoniatricians. Eur. Arch. Otorhinolaryngol. 2023, 280, 5459–5473.
- Stachler, R.J.; Francis, D.O.; Schwartz, S.R.; Damask, C.C.; Digoy, G.P.; Krouse, H.J.; McCoy, S.J.; Ouellette, D.R.; Patel, R.R.; Reavis, C.C.W.; et al. Clinical Practice Guideline: Hoarseness (Dysphonia) (Update). Otolaryngol. Head Neck Surg. 2018, 159, S1–S42.
- Morikawa, M.; Spatti, D.H.; Dajer, M.E. Wavelet packet transform and multilayer perceptron to identify voices with a mild degree of vocal deviation. Investig. Innov. Cienc. Salud 2022, 4, 16–25.
- Tirronen, S.; Kadiri, S.R.; Alku, P. The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection. J. Voice 2022, 38, 975–982.
- Geng, L.; Liang, Y.; Shan, H.; Xiao, Z.; Wang, W.; Wei, M. Pathological Voice Detection and Classification Based on Multimodal Transmission Network. J. Voice 2022, 39, 591–601.
- Za’im, N.A.N.; Al-Dhief, F.T.; Azman, M.; Alsemawi, M.R.M.; Abdul Latiff, N.M.; Mat Baki, M. The accuracy of an Online Sequential Extreme Learning Machine in detecting voice pathology using the Malaysian Voice Pathology Database. J. Otolaryngol. Head Neck Surg. 2023, 52, s40463-023.
- Fang, S.H.; Tsao, Y.; Hsiao, M.J.; Chen, J.Y.; Lai, Y.H.; Lin, F.C.; Wang, C.T. Detection of Pathological Voice Using Cepstrum Vectors: A Deep Learning Approach. J. Voice 2018, 33, 634–641.
- Ma, S.; Liao, W.; Zhang, Y.; Zhang, F.; Wang, Y.; Lu, Z.; Zhao, C.; Yu, J.; He, P. Research on automatic assessment of the severity of unilateral vocal cord paralysis based on Mel-spectrogram and convolutional neural networks. Biomed. Eng. Online 2025, 24, 76.
- Gómez, P.; Kist, A.M.; Schlegel, P.; Berry, D.A.; Chhetri, D.K.; Dürr, S.; Echternach, M.; Johnson, A.M.; Kniesburges, S.; Kunduk, M.; et al. BAGLS, a multihospital Benchmark for Automatic Glottis Segmentation. Sci. Data 2020, 7, 186.
- Santana, E.R.; Lopes, L.; de Moraes, R.M. Recognition of the Effect of vocal exercises by Fuzzy Triangular Naive Bayes, a machine learning classifier: A preliminary analysis. J. Voice 2022, 39, 21–30.
- Mahmood, S.A. Multi-Dimensional Features Extraction for Voice Pathology Detection Based on Deep Learning Methods. J. Voice 2024, in press.
- Achtuth Rao, M.V.; Yamini, B.K.; Ketan, J.; Preetie Shetty, A.; Pal, P.K.; Shivashankar, N.; Ghosh, P.K. Automatic Classification of Healthy Subjects and Patients With Essential Vocal Tremor Using Probabilistic Source-Filter Model Based Noise Robust Pitch Estimation. J. Voice 2021, 37, 314–321.
- Islam, R.; Abdel-Raheem, E.; Tarique, M. Voice pathology detection using convolutional neural networks with electroglottographic (EGG) and speech signals. Comput. Methods Programs Biomed. Update 2022, 2, 100074.
- Barlow, J.; Sragi, Z.; Rivera-Rivera, G.; Al-Awady, A.; Daşdöğen, Ü.; Courey, M.S.; Kirke, D.N. The Use of Deep Learning Software in the Detection of Voice Disorders: A Systematic Review. Otolaryngol. Head Neck Surg. 2024, 170, 1531–1543.
- Tessler, I.; Primov-Fever, A.; Soffer, S.; Anteby, R.; Gecel, N.A.; Livneh, N.; Alon, E.E.; Zimlichman, E.; Klang, E. Deep learning in voice analysis for diagnosing vocal cord disorders: A systematic review. Eur. Arch. Oto-Rhino-Laryngol. 2024, 281, 863–871.
- Pakravan, M.; Jahed, M. Significant pathological voice discrimination by computing posterior distribution of balanced accuracy. Biomed. Signal Process. Control 2021, 73, 103410.
- Forero, L.A.; Kohler, M.; Velasco, M.M.B.R.; Cataldo, E. Analysis and Classification of Voice Disorders Using Glottal Signal Parameters. J. Voice 2016, 30, 549–556.
- de Abreu, S.R.; Sousa, E.S.D.S.; de Moraes, R.M.; Lopes, L.W. Performance of Acoustic Measures for the Discrimination Among Healthy, Rough, Breathy, and Strained Voices Using the Feedforward Neural Network. J. Voice 2025, 39, 1–9.
- Suppa, A.; Asci, F.; Saggio, G.; Di Leo, P.; Zarezadeh, Z.; Ferrazzano, G.; Ruoppolo, G.; Berardelli, A.; Costantini, G. Voice Analysis with Machine Learning: One Step Closer to an Objective Diagnosis of Essential Tremor. Mov. Disord. 2021, 36, 1401–1410.
- Leite, D.R.A.; de Moraes, R.M.; Wanderley Lopes, L. Different performances of machine learning models to classify dysphonic and non-dysphonic voices. J. Voice 2025, 39, 577–590.
- Li, Z.; Zhou, L.; Liu, M.; Huang, Z. Exploring novel objective voice assessment parameters: A pilot study. J. Voice 2025, in press.
- Naranjo, L.; Pérez, C.J.; Campos-Roca, Y.; Madruga, M. Replication-based regularization approaches to diagnose Reinke’s edema by using voice recordings. Artif. Intell. Med. 2021, 120, 102162.
- Hu, H.C.; Chang, S.Y.; Wang, C.H.; Li, K.J.; Cho, H.Y.; Chen, Y.T.; Lu, C.J.; Tsai, T.P.; Lee, O.K.S. Deep Learning Application for Vocal Fold Disease Prediction Through Voice Recognition: Preliminary Development Study. J. Med. Internet Res. 2021, 23, e25247.
- Celepli, S.; Bigat, I.; Karakas, B.; Tezcan, H.M.; Yar, M.D.; Celepli, P.; Aksahin, M.F.; Hancerliogullari, O.; Yilmaz, Y.F.; Erogul, O. SHAP-Based Identification of Potential Acoustic Biomarkers in Patients with Post-Thyroidectomy Voice Disorder. Diagnostics 2025, 15, 2065.
- Fujimura, S.; Kojima, T.; Okanoue, Y.; Shoji, K.; Inoue, M.; Omori, K.; Hori, R. Classification of Voice Disorders Using a One-Dimensional Convolutional Neural Network. J. Voice 2022, 36, 15–20.
- Gülşen, P.; Gülşen, A.; Alci, M. Machine Learning Models With Hyperparameter Optimization for Voice Pathology Classification on Saarbrücken Voice Database. J. Voice 2024, in press.
- Kojima, T.; Hasebe, K.; Fujimura, S.; Okanoue, Y.; Kagoshima, H.; Taguchi, A.; Yamamoto, H.; Shoji, K.; Hori, R. A New iPhone Application for Voice Quality Assessment Based on the GRBAS Scale. Laryngoscope 2020, 131, 580–582.
- Compton, E.C.; Cruz, T.; Andreassen, M.; Beveridge, S.; Bosch, D.; Randall, D.R.; Livingstone, D. Developing an Artificial Intelligence Tool to Predict Vocal Cord Pathology in Primary Care Settings. Laryngoscope 2022, 133, 1952–1960.
- Lee, J.B.; Lee, H.G. Quantitative analysis of automatic voice disorder detection studies for hybrid feature and classifier selection. Biomed. Signal Process. Control 2024, 91, 106014.
- Jenkins, P.; Harrison, R.; Bedrick, S.; Karsten, L.; Hersh, W. Voice as a biomarker: Exploratory analysis for benign and malignant vocal fold lesions. Front. Digit. Health 2025, 7, 1609811.
- Maskeliūnas, R.; Damaševičius, R.; Kulikajevas, A.; Pribuišis, K.; Uloza, V. A laryngeal Speech Enhancement for Noisy Environments Using a Pareto Denoising Gated LSTM. J. Voice 2024, in press.
- Medeiros Araujo Lima-Filho, L.; Lopes, L.W.; de Menezes e Silva Filho, T. Integrated Vocal Deviation Index (IVDI): A Machine Learning Model to Classify the General Grade of Vocal Deviation. J. Voice 2024, in press.
- Yao, Y.; Powell, M.; White, J.; Feng, J.; Fu, Q.; Zhang, P.; Schmidt, D.C. A multi-stage transfer learning strategy for diagnosing a class of rare laryngeal movement disorders. Comput. Biol. Med. 2023, 166, 107534.
- Pham, T.D.; Holmes, S.B.; Zou, L.; Patel, M.; Coulthard, P. Diagnosis of pathological speech with streamlined features for long short-term memory learning. Comput. Biol. Med. 2024, 170, 107976.
- Kwon, I.; Wang, S.G.; Shin, S.C.; Cheon, Y.I.; Lee, B.J.; Lee, J.C.; Lim, D.W.; Jo, C.; Cho, Y.; Shin, B.J. Diagnosis of Early Glottic Cancer Using Laryngeal Image and Voice Based on Ensemble Learning of Convolutional Neural Network Classifiers. J. Voice 2025, 39, 245–257.
- Pribuišis, K.; Maskeliūnas, R.; Ulozaitė-Stanienė, N.; Padervinskis, E.; Damaševičius, R.; Blažauskas, T.; Uloza, V. Assessment of the Performance of an AI-Driven Speech Enhancer Algorithm for Speech Enhancement Following Laryngeal Onco-surgery. J. Voice 2025, in press.
- Kojima, T.; Fujimura, S.; Hasebe, K.; Okanoue, Y.; Shuya, O.; Yuki, R.; Shoji, K.; Hori, R.; Kishimoto, Y.; Omori, K. Objective Assessment of Pathological Voice Using Artificial Intelligence Based on the GRBAS Scale. J. Voice 2024, 38, 561–566.
- Zhang, Z. Toward ambulatory monitoring of vocal behavior at the physiological level using deep ensembles and Bayesian neural networks. JASA Express Lett. 2025, 5, 118601.
- Roitman, A.; Edelstain, Y.; Katzir, C.; Ofir, H.; Peleg, N.; Doweck, I.; Yanir, Y. Harnessing machine learning in diagnosing complex hoarseness cases. Am. J. Otolaryngol. 2025, 46, 104533.
- Hung, C.H.; Wang, S.S.; Wang, C.T.; Fang, S.H. Using SincNet for Learning Pathological Voice Disorders. Sensors 2022, 22, 6634.
- Verma, V.; Benjwal, A.; Chhabra, A.; Singh, S.K.; Kumar, S.; Gupta, B.B.; Arya, V.; Chui, K.T. A novel hybrid model integrating MFCC and acoustic parameters for voice disorder detection. Sci. Rep. 2023, 13, 22719.
- Bensoussan, Y.; Vanstrum, E.B.; Johns, M.M.; Rameau, A. Artificial Intelligence and Laryngeal Cancer: From screening to prognosis: A state of the art review. Otolaryngol. Head Neck Surg. 2022, 168, 319–329.
- Bur, A.M.; Zhang, T.; Chen, X.; Kavookjian, H.; Kraft, S.; Karadaghy, O.; Farrokhian, N.; Mussatto, C.; Penn, J.; Wang, G. Interpretable Computer Vision to Detect and Classify Structural Laryngeal Lesions in Digital Flexible Laryngoscopic Images. Otolaryngol. Head Neck Surg. 2023, 169, 1564–1572.
- Kuo, H.C.; Hsieh, Y.P.; Tseng, H.H.; Wang, C.T.; Fang, S.H.; Tsao, Y. Toward Real-World Voice Disorder Classification. IEEE Trans. Biomed. Eng. 2023, 70, 2922–2932.
- Surapaneni, S.; Kutler, R.B.; Setzen, S.A.; Kim, Y.E.; Yao, P.; Siddiqui, S.H.; Pitman, M.J.; Sulica, L.; Elemento, O.; Khosravi, P.; et al. A multimodal approach for deep-learning classification of vocal fold pathologies in stroboscopy. Laryngoscope 2026, 136, 2503–2510.
- Reid, J.; Parmar, P.; Lund, T.; Aalto, D.K.; Jeffery, C.C. Development of a machine-learning based voice disorder screening tool. Am. J. Otolaryngol. 2022, 43, 103327.
- Ghasemzadeh, H.; Khass, M.T.; Arjmandi, M.K.; Pooyan, M. Detection of vocal disorders based on phase space parameters and Lyapunov spectrum. Biomed. Signal Process. Control 2015, 22, 135–145.
- Liu, G.S.; Hodges, J.M.; Yu, J.; Sung, C.K.; Erickson-DiRenzo, E.; Doyle, P.C. End-to-end deep learning classification of vocal pathology using stacked vowels. Laryngoscope Investig. Otolaryngol. 2023, 8, 1312–1318.
- Pan, X.; Feng, T.; Zhang, N. PVGAN: A Pathological Voice Generation Model Incorporating a Progressive Nesting Strategy. J. Voice 2026, 40, 289–302.
- Xie, X.; Cai, H.; Li, C.; Wu, Y.; Ding, F. A voice disease detection method based on MFCCs and Shallow CNN. J. Voice 2026, 40, 524.e1–524.e11.
- Islam, R.; Tarique, M. Escalate prognosis of Parkinson’s disease employing wavelet features and Artificial Intelligence from vowel phonation. BioMedInformatics 2025, 5, 23.
- Cai, J.; Song, Y.; Wu, J.; Chen, X. Voice Disorder Classification Using Wav2vec 2.0 Feature Extraction. J. Voice 2024, in press.
- Kim, H.B.; Song, J.; Park, S.; Lee, Y.O. Classification of laryngeal diseases including laryngeal cancer, benign mucosal disease, and vocal cord paralysis by artificial intelligence using voice analysis. Sci. Rep. 2024, 14, 9297.
- Cordeiro, H.; Fonseca, J.; Guimarães, I.; Meneses, C. Hierarchical Classification and System Combination for Automatically Identifying Physiological and Neuromuscular Laryngeal Disorders. J. Voice 2017, 31, 9–14.
- Lee, J.H.; Seok, J.; Kim, J.Y.; Kim, H.C.; Kwon, T.K. Evaluating the Diagnostic Potential of Connected Speech for Benign Laryngeal Disease Using Deep Learning Analysis. J. Voice 2024, in press.
- Özbay, E.; Altunbey Özbay, F.; Khodadadi, N.; Soleimanian Gharehchopogh, F.; Mirjalili, S. Multifeature Fusion Method with Metaheuristic Optimization for Automated Voice Pathology Detection. J. Voice 2024, in press.
- Vidal, J.; Ribas, D.; Bonomi, C.; Lleida, E.; Ferrer, L.; Ortega, A. Automatic Voice Disorder Detection from a Practical Perspective. J. Voice 2024, in press.
- Cala, F.; Frassineti, L.; Cantarella, G.; Buccichini, G.; Battilocchi, L.; Manfredi, C.; Lanata, A. Towards an explainable Artificial intelligence system for voice pathology identification and post-treatment characterisation. Biomed. Signal Process. Control 2025, 104, 107530.
- Mehta, D.D.; Van Stan, J.H.; Zañartu, M.; Ghassemi, M.; Guttag, J.V.; Espinoza, V.M.; Cortés, J.P.; Cheyne, H.A.; Hillman, R.E. Using ambulatory voice monitoring to investigate common voice disorders: Research update. Front. Bioeng. Biotechnol. 2015, 3, 155.
- Rao, D.; Singh, R.; Devaraja, K.; Kolekar, S. A comprehensive review of diagnostic approaches to vocal fold paralysis using Artificial Intelligence. Indian J. Otolaryngol. Head Neck Surg. 2025, 77, 2775–2783.
- Weissman, G.E. Evaluation and regulation of Artificial Intelligence Medical Devices for clinical decision support. Annu. Rev. Biomed. Sci. 2025, 8, 81–99.
- Ong, A.Y.; Taribagil, P.; Sevgi, M.; Kale, A.U.; Dow, E.R.; Macdonald, T.; Kras, A.; Maniatopouls, G.; Liu, X.; Keane, P.A.; et al. A scoping review of artificial intelligence as a medical device for ophthalmic image analysis in Europe, Australia and America. npj Digit. Med. 2025, 8, 323.
- Yang, S.R.; Chien, J.T.; Lee, C.Y. Advancements in clinical evaluation and regulatory frameworks for AI-driven software as a medical device (SaMD). IEEE Open J. Eng. Med. Biol. 2024, 6, 147–151.
- When Is AI Regulated? Comparing EU, UK & US Approaches to Classifying AI-Enabled Medical Devices. Learnova (n.d.). Available online: https://www.learnova.io/insights/ai-medical-device-regulation-eu-uk-us (accessed on 17 February 2026).
- Byrne, J. AI in Medical Devices: Navigating the Regulation in the US, UK and EU. Cognidox. Available online: https://www.cognidox.com/blog/ai-in-medical-devices-regulation (accessed on 7 February 2026).
- CSDmed. Post-Market Surveillance (PMS) MDR and IVDR: MDCG Guide 2025-10. CSDmed. Available online: https://www.csdmed.mc/en/news/medical-devices-regulation/mdcg-2025-10-pms-medical-devices-168 (accessed on 17 February 2026).
- Laurent, A. FDA SaMD Classification: AI and Machine Learning Guide. IntuitionLabs. Available online: https://intuitionlabs.ai/articles/fda-samd-classification-ai-machine-learning (accessed on 14 February 2026).
- Zhang, S.; Li, Y.; Liu, W.; Chu, Q.; Wang, S.; Li, J.; Chen, Y. A decade of review in global regulation and research of artificial intelligence medical devices (2015–2025). Front. Med. 2025, 12, 1630408.
- Terranova, C.; Cestonaro, C.; Fava, L.; Cinquetti, A. AI and professional liability assessment in healthcare. A revolution in legal medicine? Front. Med. 2024, 10, 1337335.



| Database | N Healthy | N Pathological | Vocal Tasks | Recording Conditions | Language | Documented Biases |
|---|---|---|---|---|---|---|
| SVD | 687 | 1356 | Vowels /a/, /i/, /u/ at normal, high and low pitch. The German sentence “Guten Morgen, wie geht es Ihnen?” | Phonetic lab recordings with studio microphone and electroglottograph, 16-bit, 50 kHz sampling, controlled acoustic environment | German | Highly unbalanced distribution; dominance of sustained vowels and read speech rather than spontaneous speech, limiting ecological validity [48,50,52] |
| MEEI | 53 | 1319 | Sustained phonation /a/. Reading of the first sentence of the Rainbow Passage. | Clinical recordings with Kay Elemetrics equipment (Kay Elemetrics Corp., Boston, MA, USA); high-quality microphone, originally sampled at up to 50 kHz | English | Class imbalance with far fewer healthy than pathological samples; tasks restricted to sustained vowels plus one standard sentence [6,7] |
| AVPD | 188 | 178 | Sustained vowels. Continuous speech. Isolated words. | Recorded with the Computerized Speech Lab (CSL 4500; KayPENTAX, Montvale, NJ, USA) using a studio microphone in a controlled clinical environment with a standardized protocol | Arabic | Mono-ethnic; predominance of sustained vowels compared with spontaneous speech [6] |
| VOICED | 58 | 150 | Sustained vowel /a/. | Recorded through a smartphone in a quiet room, 20 cm mouth-microphone distance, 8 kHz | Italian | Adult Italian speakers only; pathology spectrum limited to three main dysphonia groups [5] |
| FEMH | 0 | 2106 | Continuous speech: seven designed sentences per subject. Sustained vowel /a/. | Clinical recording environment, standard microphone, 16 kHz | Mandarin | No healthy control speakers; only four diagnostic groups; single-center Mandarin cohort [12] |
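
The class imbalance documented in the table above can be made concrete with a short calculation. The following is a minimal sketch, assuming only the healthy and pathological counts listed in the table; the function names `majority_baseline` and `imbalance_ratio` are illustrative and do not come from the reviewed studies. It shows the accuracy that a trivial model would reach by always predicting the larger class, which is a useful floor when reading reported binary accuracies.

```python
# Healthy / pathological sample counts copied from the database table above.
DATABASES = {
    "SVD": (687, 1356),
    "MEEI": (53, 1319),
    "AVPD": (188, 178),
    "VOICED": (58, 150),
    "FEMH": (0, 2106),
}


def majority_baseline(n_healthy, n_pathological):
    """Accuracy of a trivial model that always predicts the larger class."""
    total = n_healthy + n_pathological
    return max(n_healthy, n_pathological) / total


def imbalance_ratio(n_healthy, n_pathological):
    """Larger class size divided by the smaller one; None if a class is empty."""
    small, large = sorted((n_healthy, n_pathological))
    return None if small == 0 else large / small


for name, (n_h, n_p) in DATABASES.items():
    ratio = imbalance_ratio(n_h, n_p)
    ratio_text = "n/a (no healthy controls)" if ratio is None else f"{ratio:.1f}:1"
    print(f"{name:7s} baseline accuracy = {majority_baseline(n_h, n_p):.3f}, "
          f"imbalance = {ratio_text}")
```

Under these counts, a model that labels every MEEI recording as pathological would already score about 96% accuracy, so accuracy alone says little about diagnostic value on that database.
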

| Key References | Bias Type | Typical Magnitude of Effect | Internal Validation | External Validation | Clinical Impact |
|---|---|---|---|---|---|
| [25,48,65,67] | Selection bias | Overestimation of accuracy by approximately 8–15 percentage points; reduced detection of minority classes and rare disorders | Very high level 1 (binary) classification accuracy (88–99%) on sustained vowel datasets with balanced or curated samples | Typical decrease of 10–20 percentage points in accuracy and sensitivity when applied to independent or more diverse cohorts | Unequal diagnostic performance across age groups, languages and pathology subtypes, potentially leading to underdiagnosis in underrepresented populations |
| [8,19,31,42] | Measurement bias | Inflation of area under the curve and accuracy due to non-physiological cues such as recording conditions, signal intensity or duration rather than pathology-related features | Stable and homogeneous recordings (e.g., sustained vowel /a/, clinical-grade microphone) yielding optimistic and highly reproducible performance estimates | Significant performance degradation in noisy, ambulatory or telemedicine environments, reflecting reduced robustness to real-world variability | Models may rely on recording-related characteristics rather than pathophysiological features, increasing the risk of false negatives in real-world clinical settings |
| [13,25,48,65] | Analysis bias | Optimism of approximately 8–15 percentage points in accuracy or unweighted average recall compared to standardized or external validation procedures | k-fold cross-validation conducted on a single dataset frequently yields high accuracy, particularly in the presence of class imbalance or data homogeneity | Mean accuracy decreases of approximately 12 percentage points across studies performing both internal and external validation | Overestimation of true diagnostic performance, particularly for more complex classification levels (e.g., pathophysiological subtypes), with systematic under-detection of rare or subtle disorders |
| [13,14,34,48] | Publication and reproducibility bias | Reported accuracy and unweighted average recall often exceed independently reproduced performance by about 8–15 percentage points | Striking internal results obtained on single datasets are more likely to be submitted and published, while negative or modest findings remain underreported | Limited availability of code, pre-trained models or detailed methodological documentation; independent re-implementations frequently report lower performance and incomplete reproducibility | Risk of premature clinical adoption of insufficiently validated tools, leading to overestimation of real-world reliability and potential patient safety concerns |
| [5,19,23,63,64] | Ecological validity bias | Large but difficult to quantify discrepancy between laboratory performance and real-world clinical effectiveness | Most models are trained and evaluated on sustained vowels or short read sentences recorded under controlled acoustic conditions, resulting in high level 1 classification accuracy | Reduced accuracy and increased variability observed when models are applied to continuous speech, spontaneous communication or smartphone recordings | Tools optimized for standardized speech tasks may fail to detect intermittent or context-dependent symptoms, increasing the risk of false reassurance and underdiagnosis in routine clinical practice |
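
The analysis-bias row above notes that accuracy can look high under class imbalance, while unweighted average recall (UAR) is the stricter metric. The sketch below illustrates the gap under stated assumptions: the labels are synthetic, sized to mirror the MEEI counts (53 healthy, 1319 pathological) from the database table, and the "model" is a hypothetical one that always predicts the pathological class. It is not a reproduction of any reviewed study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in indices)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Synthetic labels (assumption): 0 = healthy, 1 = pathological,
# with class sizes mirroring the MEEI row of the database table.
y_true = [0] * 53 + [1] * 1319

# Hypothetical degenerate model: predicts "pathological" for every recording.
y_pred = [1] * len(y_true)

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")                   # 0.961
print(f"UAR      = {unweighted_average_recall(y_true, y_pred):.3f}")  # 0.500
```

The degenerate model reaches roughly 96% accuracy but only chance-level UAR, which is why reporting UAR or balanced accuracy alongside accuracy makes results on imbalanced voice databases easier to interpret.
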
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Mairesse, S.; Maniaci, A.; Briganti, G.; Lechien, J.R. Diagnostic Accuracy of Artificial Intelligence in Laryngeal Disorders: An Integrative Review. J. Pers. Med. 2026, 16, 301. https://doi.org/10.3390/jpm16060301

