Article

Benchmarking the Responsiveness of Open-Source Text-to-Speech Systems

Faculty of Science, Engineering and Built Environment, School of Information Technology, Deakin University, Burwood, VIC 3125, Australia
* Author to whom correspondence should be addressed.
Computers 2025, 14(10), 406; https://doi.org/10.3390/computers14100406
Submission received: 8 August 2025 / Revised: 18 September 2025 / Accepted: 18 September 2025 / Published: 23 September 2025

Abstract

Responsiveness—the speed at which a text-to-speech (TTS) system produces audible output—is critical for real-time voice assistants yet has received far less attention than perceptual quality metrics. Existing evaluations often touch on latency but do not establish reproducible, open-source standards that capture responsiveness as a first-class dimension. This work introduces a baseline benchmark designed to fill that gap. Our framework unifies latency distribution, tail latency, and intelligibility within a transparent and dataset-diverse pipeline, enabling a fair and replicable comparison across 13 widely used open-source TTS models. By grounding evaluation in structured input sets ranging from single words to sentence-length utterances and adopting a methodology inspired by standardized inference benchmarks, we capture both typical and worst-case user experiences. Unlike prior studies that emphasize closed or proprietary systems, our focus is on establishing open, reproducible baselines rather than ranking against commercial references. The results reveal substantial variability across architectures, with some models delivering near-instant responses while others fail to meet interactive thresholds. By centering evaluation on responsiveness and reproducibility, this study provides an infrastructural foundation for benchmarking TTS systems and lays the groundwork for more comprehensive assessments that integrate both fidelity and speed.
Keywords: text-to-speech; voice assistant; responsiveness; benchmark; latency; trade-off; real-time
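The latency-distribution and tail-latency methodology described in the abstract can be illustrated with a minimal Python sketch. It assumes a blocking synthesize(text) callable that returns audio bytes once output is ready; the names benchmark_latency and dummy_tts, and the sample inputs, are illustrative stand-ins rather than components of the paper's actual pipeline.

```python
import time
import statistics
from typing import Callable, Iterable


def benchmark_latency(synthesize: Callable[[str], bytes],
                      texts: Iterable[str],
                      runs: int = 5) -> dict:
    """Measure wall-clock synthesis latency (seconds) per input text."""
    samples = []
    for text in texts:
        for _ in range(runs):
            start = time.perf_counter()
            synthesize(text)  # blocks until audio is produced
            samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        # Tail latency (95th/99th percentile) captures worst-case experience.
        "p95": samples[int(0.95 * (len(samples) - 1))],
        "p99": samples[int(0.99 * (len(samples) - 1))],
    }


if __name__ == "__main__":
    def dummy_tts(text: str) -> bytes:
        # Stand-in for a real TTS engine; replace with an actual synthesis call.
        time.sleep(0.01 * len(text.split()))
        return b""

    # Structured inputs ranging from single words to sentence-length utterances.
    inputs = ["Hello",
              "Turn on the lights",
              "What is the weather forecast for tomorrow morning?"]
    print(benchmark_latency(dummy_tts, inputs))
```

Reporting percentiles alongside the mean matters because two systems with similar average latency can differ sharply in how often they miss an interactive threshold.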

Share and Cite

MDPI and ACS Style

Dinh, H.P.T.; Patamia, R.A.; Liu, M.; Cosgun, A. Benchmarking the Responsiveness of Open-Source Text-to-Speech Systems. Computers 2025, 14, 406. https://doi.org/10.3390/computers14100406

AMA Style

Dinh HPT, Patamia RA, Liu M, Cosgun A. Benchmarking the Responsiveness of Open-Source Text-to-Speech Systems. Computers. 2025; 14(10):406. https://doi.org/10.3390/computers14100406

Chicago/Turabian Style

Dinh, Ha Pham Thien, Rutherford Agbeshi Patamia, Ming Liu, and Akansel Cosgun. 2025. "Benchmarking the Responsiveness of Open-Source Text-to-Speech Systems." Computers 14, no. 10: 406. https://doi.org/10.3390/computers14100406

APA Style

Dinh, H. P. T., Patamia, R. A., Liu, M., & Cosgun, A. (2025). Benchmarking the Responsiveness of Open-Source Text-to-Speech Systems. Computers, 14(10), 406. https://doi.org/10.3390/computers14100406

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
