Article

Investigation of ASR Models for Low-Resource Kazakh Child Speech: Corpus Development, Model Adaptation, and Evaluation

by Diana Rakhimova 1,2,*, Zhansaya Duisenbekkyzy 1 and Eşref Adali 3

1 Department of Information Systems, Al-Farabi Kazakh National University, Almaty 050040, Kazakhstan
2 Institute of Information and Computational Technologies, Almaty 050010, Kazakhstan
3 Department of Computer Engineering, Faculty of Computer and Informatics, Maslak Campus, Istanbul Technical University, Istanbul 34467, Turkey
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(16), 8989; https://doi.org/10.3390/app15168989
Submission received: 29 June 2025 / Revised: 7 August 2025 / Accepted: 8 August 2025 / Published: 14 August 2025
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

This study develops and evaluates automatic speech recognition (ASR) systems for Kazakh child speech, an underexplored domain in both linguistic and computational research. A specialized acoustic corpus was constructed for children aged 2 to 8 years, incorporating age-related vocabulary stratification and gender variation to capture phonetic and prosodic diversity. The data were collected from three sources: a custom-designed Telegram bot, high-quality dictaphone recordings, and naturalistic speech samples recorded in home and preschool environments. Four ASR models were evaluated: Whisper, DeepSpeech, ESPnet, and Vosk. Whisper, ESPnet, and DeepSpeech were fine-tuned on the curated corpus, while Vosk was applied in its standard pretrained configuration. Performance was measured using five evaluation metrics: Word Error Rate (WER), BLEU, Translation Edit Rate (TER), Character Similarity Rate (CSRF2), and Accuracy. The results indicate that ESPnet achieved the highest accuracy (32%) and the lowest WER (0.242) for sentences, while Whisper performed well on semantically rich utterances (Accuracy = 33%; WER = 0.416). Vosk performed best on short words, with the highest Accuracy (68%) and the highest BLEU score (0.600). DeepSpeech showed moderate improvements in accuracy, particularly for short words (Accuracy = 60%), but struggled with longer utterances, reaching an Accuracy of only 25% for sentences. These findings emphasize the critical importance of age-appropriate corpora and domain-specific adaptation when developing ASR systems for low-resource child speech, particularly in educational and therapeutic contexts.
Keywords: child speech recognition; ASR; low-resource languages; Kazakh language; fine-tuning; Whisper; ESPnet; DeepSpeech; Vosk
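For readers unfamiliar with the headline metric, the following is a minimal sketch of how Word Error Rate is conventionally computed: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. It assumes plain whitespace tokenization and no text normalization; the function name `word_error_rate` and the sample strings are illustrative only, and the authors' actual scoring pipeline may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no lowercasing or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word and one missing word against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

A value of 0.242, as reported for ESPnet on sentences, would under this definition correspond to roughly one word-level error for every four reference words. Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.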
