This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Review

Deep Learning in Medical Speech to Text: Methods and Challenges

by Maciej Sztabinski 1 and Pawel Weichbroth 2,*
1 Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, 80-222 Gdansk, Poland
2 Department of Software Engineering, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, 80-222 Gdansk, Poland
* Author to whom correspondence should be addressed.
Symmetry 2026, 18(6), 885; https://doi.org/10.3390/sym18060885 (registering DOI)
Submission received: 24 April 2026 / Revised: 15 May 2026 / Accepted: 19 May 2026 / Published: 23 May 2026

Abstract

Automated clinical documentation based on clinician-patient conversations is an emerging application of deep learning, driven by advances in medical speech recognition and natural language processing. Despite technological progress, real-world adoption remains limited. This review analyzes deep learning–based medical speech-to-text systems, focusing on methodologies, evaluation strategies, and barriers to clinical implementation. A systematic review of 31 studies was conducted, covering automatic speech recognition, clinical dialogue processing, and large language model-based documentation pipelines. Speech recognition accuracy varies considerably in noisy, multi-speaker, and spontaneous clinical environments. Downstream tasks such as entity extraction and summarization are highly sensitive to transcription errors and constrained by limited real-world datasets. Most systems lack external clinical validation and are tested in controlled settings. Key challenges include speaker diarization, domain adaptation, privacy protection, and the need for standardized evaluation frameworks. Although LLMs demonstrate strong potential, concerns remain regarding hallucinations and factual reliability, necessitating improved robustness and clinician oversight.
Keywords: automatic speech recognition; clinical documentation; digital scribe
