Proceeding Paper

Speech Delay Assistive Device for Speech-to-Text Transcription Based on Machine Learning †

by Maria Kristina C. Rodriguez *, Gheciel Mayce M. Santos, Jennifer C. Dela Cruz and Jmi C. Dela Cruz
School of Electrical, Electronics, and Communications Engineering, Mapua University, Manila 1002, Philippines
* Author to whom correspondence should be addressed.
Presented at the 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering, Yunlin, Taiwan, 15–17 November 2024.
Eng. Proc. 2025, 92(1), 60; https://doi.org/10.3390/engproc2025092060
Published: 8 May 2025
(This article belongs to the Proceedings of 2024 IEEE 6th Eurasia Conference on IoT, Communication and Engineering)

Abstract

Despite advances by major companies, existing technologies often misinterpret speech from individuals with speech delays. To address this challenge, a portable machine learning (ML) speech-to-text assistive device was developed for speech-delayed children. Built on a Raspberry Pi 4 and the Google Web Speech API, the device accurately transcribes challenging speech sounds produced by children aged 6 to 14 years. The device performs noise reduction and digital transcription, and its performance was validated by speech-language pathologists (SLPs). The device achieved 94% word accuracy, 92% sentence accuracy, and a word error rate (WER) of 0 to 14%. The ML-based device is a significant improvement over existing speech therapy tools, offering an accessible solution for speech-delayed children.

1. Introduction

Speech is a natural means of communicating and of systematically expressing an individual’s thoughts in distinct languages. However, it is challenging for speech-impaired children. Although speech-to-text technology has significantly improved and is used by major companies such as Apple and Amazon in Siri and Alexa, respectively, people with speech impairments are often misunderstood by these devices. Speech-language pathologists (SLPs) employ assistive technology to aid these individuals, but there is still a need for more resources and technology for speech-delayed patients.
This research aimed to develop a machine learning (ML) speech-to-text assistive device: a portable device that converts speech to text using ML. Prior studies illustrate the range of ML techniques relevant to this work. A neural network was used to analyze sound signatures and classify the maturity level of coconuts [1], while support vector machine (SVM), random forest, adaptive boosting (AdaBoost), and bootstrapping were employed to classify stress levels. Bootstrapping is a resampling technique that repeatedly draws samples from the source data to estimate a population parameter with high accuracy; its use demonstrates the effectiveness of ML in processing stress patterns [2].
The performance of the experimental device needs to be validated by SLPs. Using a Raspberry Pi, the device transcribes the speech of children with delays, addressing oral, motor, and feeding issues to improve communication. In this study, the developed device was tested on children aged 6 to 14 years, focusing on accurately transcribing challenging speech sounds; diagnosing other impairments or medical conditions was outside its scope. Speech is controlled by the brain, in which Broca’s area plays a significant role; the brain also governs thought, memory, emotion, motor skills, vision, and essential bodily processes [3]. Other technologies can help speech-impaired individuals, such as the electronic communication system (ECS), which is one-finger operated and offers real-time responses with an average assessment score of 9.02; however, the limitations of such devices must be considered when helping speech-delayed children [4].

2. Methodology

We captured the patient’s speech using a microphone and created a recorded audio file. From this file, data gathering, noise filtering, and transcription were performed using Python 3.10 and ML algorithms. A timestamped and dated transcription was obtained and saved as an Excel file on a USB drive, ensuring accurate and organized data management for efficient analysis and review.
ML algorithms transcribed the speech and displayed the text on a graphical user interface (GUI). The GUI prompted a retry until accurate transcription was achieved. The final text was saved as a .csv file on a USB drive connected to the device’s Raspberry Pi and then converted into an Excel file for easy documentation and printing, allowing the child’s progress to be tracked over time (a minimal sketch of this save step is shown below). Figure 1 shows the system’s operational flowchart.
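The paper describes this step only at the block-diagram level; the following is a minimal sketch, assuming pandas and openpyxl are available and a hypothetical USB mount path, of how a timestamped transcription could be appended to the .csv log and exported to Excel. The file path and column headers are illustrative assumptions, not the authors' code.

```python
import csv
from datetime import datetime
from pathlib import Path

import pandas as pd  # pandas + openpyxl needed for the Excel export

CSV_PATH = Path("/media/usb/transcriptions.csv")  # assumed USB mount point

def save_transcription(text: str) -> None:
    """Append one transcription with a date and time stamp to the .csv log."""
    is_new = not CSV_PATH.exists()
    with CSV_PATH.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["date", "time", "transcription"])
        now = datetime.now()
        writer.writerow([now.strftime("%Y-%m-%d"), now.strftime("%H:%M:%S"), text])

def export_to_excel() -> None:
    """Convert the .csv log into an Excel workbook for documentation/printing."""
    pd.read_csv(CSV_PATH).to_excel(CSV_PATH.with_suffix(".xlsx"), index=False)
```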
Figure 2 shows the Python code for recording the patient’s statement; a sketch of such a recording routine is shown below.
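Figure 2 itself is an image and is not reproduced here. As a hedged illustration of what such a recording routine might look like with the SpeechRecognition library (the file name and prompt are assumptions, not the authors' code):

```python
import speech_recognition as sr

def record_statement(path: str = "statement.wav") -> None:
    """Record one utterance from the default microphone and save it as WAV."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Please speak now...")
        audio = recognizer.listen(source)      # blocks until the utterance ends
    with open(path, "wb") as f:
        f.write(audio.get_wav_data())          # AudioData -> WAV bytes

record_statement()
```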
The GUI was calibrated using the Raspberry Pi’s Thonny Python IDE. The file system executed the main GUI code to check for responsiveness, functionality, and intuitive design, and real-time adjustments were made to meet user needs. The Raspberry Pi’s configuration file was set to match the 7-inch LCD touchscreen’s resolution (an illustrative configuration is sketched below), and the microphone input volume was adjusted with noise reduction techniques for clear speech capture.
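The paper does not list the exact configuration values. For a generic 800×480 7-inch HDMI panel, a commonly used /boot/config.txt stanza looks like the following; all values here are illustrative assumptions rather than the authors' settings (the official DSI touchscreen needs no such entries).

```
# /boot/config.txt -- illustrative settings for a generic 800x480 7-inch HDMI LCD
hdmi_group=2                   # DMT (computer monitor) timings
hdmi_mode=87                   # custom mode, defined by hdmi_cvt below
hdmi_cvt=800 480 60 6 0 0 0    # width height framerate aspect margins interlace reduced-blanking
hdmi_force_hotplug=1           # drive HDMI even if the panel is not detected at boot
```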

3. Machine Learning

Speech Recognition Algorithm

The device adopted advanced ML technology by integrating Google’s Web Speech API to convert speech to text with high accuracy. The Google Web Speech API enables developers to integrate voice recognition and transcription functionalities into web applications; it leverages Google’s ML models to convert spoken language into text and supports multiple languages and dialects. The API is designed for ease of use, allowing developers to incorporate speech-to-text capabilities with minimal code [5]. Using the SpeechRecognition library in Python, the device captured audio input and processed it through Google’s sophisticated ML models (Figure 3).
The integration of the recognize_google function enabled the device to utilize these cutting-edge models, ensuring reliable and precise transcription, as sketched below. This seamless integration of ML technology provided an effective and practical solution for individuals with speech difficulties, improving their communication and overall quality of life.
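As a minimal sketch of this integration (not the authors' exact code from Figure 3), transcription with recognize_google on a previously recorded file might look like this; the audio file name is an assumption:

```python
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("statement.wav") as source:    # file recorded earlier
    audio = recognizer.record(source)            # read the whole file

try:
    text = recognizer.recognize_google(audio)    # Google Web Speech API
    print("Transcription:", text)
except sr.UnknownValueError:
    print("Speech was unintelligible; please try again.")
except sr.RequestError as e:
    print(f"API request failed: {e}")
```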
Ten clinically useful polysyllabic words, such as “ambulance” and “helicopter”, were selected to assess children’s speech development and articulation. These words challenge children to articulate complex sound combinations and manage syllable stress, providing diagnostic information. Sentences with complex consonant clusters, such as -th, -sl, -ch, -fr, -bl, -cl, -fl, and -rt, were used to train the device on a wide range of speech sounds to ensure accurate transcription for speech therapy sessions [6]. Noise was reduced on the Raspberry Pi 4, whose quad-core ARM Cortex-A72 CPU within the Broadcom BCM2711 SoC executed the noise reduction algorithms on audio data captured through the microphone. The adjust_for_ambient_noise function from the SpeechRecognition library dynamically adjusted the microphone sensitivity to filter out background noise. The ARM Cortex-A72’s advanced SIMD and floating-point (FP) units enabled efficient mathematical operations for these signal-processing tasks [5] (Figure 4).
The quad-core ARM Cortex-A72 CPU, with advanced SIMD, FP, and cryptography units, efficiently processed instructions and large audio data streams, aided by its L2 cache for fast memory access, enhancing real-time noise reduction. The Python function recognize_speech() used the SpeechRecognition library, initializing a control variable and creating instances of Recognizer() and Microphone() for accurate speech recognition; a sketch is shown below.
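The full recognize_speech() listing is not reproduced in the paper; the sketch below is consistent with the description above (a control variable, Recognizer() and Microphone() instances, and ambient noise adjustment), with the retry loop and the one-second calibration window as assumptions.

```python
import speech_recognition as sr

def recognize_speech() -> str | None:
    """Capture one utterance and return its transcription, or None on failure."""
    recognized = False                 # control variable from the description
    recognizer = sr.Recognizer()
    microphone = sr.Microphone()

    while not recognized:
        with microphone as source:
            # Sample ambient noise for 1 s and adapt the energy threshold
            recognizer.adjust_for_ambient_noise(source, duration=1)
            audio = recognizer.listen(source)
        try:
            text = recognizer.recognize_google(audio)
            recognized = True
            return text
        except sr.UnknownValueError:
            print("Could not understand; trying again...")
        except sr.RequestError:
            print("API unavailable; aborting.")
            return None
```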
WER is a critical metric in speech recognition for evaluating the accuracy of transcribed speech. It is based on the Levenshtein distance, a concept that assesses errors in decoding algorithms and highlights the relationship between error propagation and decoding matrix size [7], and it is widely used to assess the performance of speech recognition systems [8].
WER was calculated using (1):

$$\mathrm{WER} = \frac{S_w + D_w + I_w}{N_w} \tag{1}$$

where $S_w$, $D_w$, and $I_w$ are the numbers of substituted, deleted, and inserted words, respectively, and $N_w$ is the total number of reference words.
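To make the metric concrete, the following self-contained sketch (not the authors' code) computes word-level WER with the standard Levenshtein dynamic program; the example sentence is taken from Table 4.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + sub)   # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("hands" -> "hand") in four words gives WER = 0.25 (25%)
print(word_error_rate("clap your hands together", "clap your hand together"))
```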

4. Results and Discussion

Trials for Speech-Delayed Children

Younger patients showed higher success rates, with occasional device failures in reading specific words. Patients 3 (6 years old) and 4 (7 years old) exhibited similar patterns, with initial issues resolved in subsequent trials, while Patient 5 (8 years old) maintained perfect results throughout all trials. Older patients showed mixed results: Patients 6 (10 years old) and 9 (13 years old) consistently succeeded, but Patients 7 (11 years old) and 10 (13 years old) experienced more frequent issues and greater variability in their trial results (Table 1).
The data from patients with speech delays showed varying levels of device effectiveness across the sentence trials (Table 2). The younger patients, Patients 1 and 2 (6 years old), showed high success rates with occasional device failures, while older patients such as Patient 6 (10 years old) and Patient 9 (13 years old) showed consistent success, and Patients 7 (11 years old) and 10 (13 years old) faced more frequent issues. This variability suggested that several patients encountered challenges in achieving consistent success, but overall the device performed well.
Figure 5 shows the outcomes for average-speaking and speech-delayed patients in the word trial. The “YES” counts were nearly identical: 468 responses for average-speaking patients and 469 for speech-delayed patients. The “NO” counts were also closely matched: 32 and 31, respectively. This close alignment between the two groups demonstrated the device’s high accuracy and reliability in the word trial for both groups.
Figure 6 shows the device’s performance during the sentence trial for average-speaking patients and those with speech delay. There were 294 successful “YES” trials for average speakers and only 6 “NO” trials, indicating high accuracy. Patients with speech delay had 274 successful “YES” trials and 26 unsuccessful “NO” trials, indicating reasonable accuracy with room for improvement. Overall, the device was highly effective for average speakers and reasonably practical for those with speech delay.
The mean is a statistical measure calculated by adding all the values in a data set and dividing the sum by the number of values. It represents the central tendency of the data, is used to summarize extensive data sets with a single number, and helps in understanding the general performance or behavior of the data. It is widely used to make comparisons, identify trends, and inform decision-making [9].
The mean is calculated using (2):

$$\bar{x} = \frac{\sum x}{N} \tag{2}$$

where $\sum x$ is the sum of all values and $N$ is the number of values.
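As a worked check against the word trial totals reported above, and assuming the three age groupings used in the study's summary, the means follow directly from (2):

$$\bar{x}_{\text{average}} = \frac{468}{3} = 156, \qquad \bar{x}_{\text{delayed}} = \frac{469}{3} \approx 156.33$$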
The mean numbers of “YES” responses across all age groups were 156 for average-speaking children and 156.33 for speech-delayed children. This small difference, consistent across age groups, indicates that the device read words accurately regardless of the child’s speech abilities and underscores its reliability. This level of accuracy suggests that the device is effective for both average-speaking and speech-delayed children and highlights its potential for broad application in various speech development contexts.
Table 3 ranks how frequently each word was recognized by the speech-to-text device. “Caterpillar” ranked highest with 49 recognitions, while “Caravan” ranked lowest with 36. These data highlighted the device’s performance, showing which words were easily recognized and which were more challenging, and offered insight into improving transcription accuracy for children with speech delays.
Table 4 ranks sentences based on transcription difficulty, with the first being the easiest and the sixth the hardest. “Clap your hands together” ranked first with 49 recognitions, followed by “The boy is taking a bath” with 48. “Did you hurt your foot?” ranked sixth with 40 recognitions, making it the most challenging. The ranking highlighted the varying transcription challenges and showed how sentence structure impacted the system’s performance.
Dela Cruz, a registered speech-language pathologist (RSLP) from the Philippines, validated the intelligibility ratings for the ten speech-delayed patients aged 6 to 13 years. In Table 5, the ratings range from one (fully intelligible) to five (completely unintelligible). Younger patients generally showed better intelligibility, mostly rated two or three, though one 6-year-old was rated four. Older patients showed more variability, with ratings from one to five, indicating a broader range of speech difficulties. Several patients were intelligible while others were not, highlighting the need for personalized speech therapy. The RSLP’s validation ensures accurate ranking and reinforces the reliability of these ratings.
A low WER indicated better performance, leading to higher user satisfaction and trust in the technology; WER was calculated as in (1).
Table 6 shows the WER for the ten patients with speech delay, detailing the substitution, deletion, and insertion errors made while transcribing 50 words each. WER was calculated by dividing the total number of errors by the total number of words and multiplying by 100; for Patient 7, for example, (2 + 5 + 0)/50 × 100 = 14%. Patients 6 and 9 had a WER of 0%, while Patient 7 showed the highest at 14%, with two substitutions and five deletions. These results highlighted differences in transcription accuracy across patients. A low WER is essential for the accuracy and reliability of a speech recognition system.

5. Conclusions and Recommendations

We developed an ML speech-to-text assistive device to enhance transcription and understanding in children with speech delay. Built on the Raspberry Pi platform, the device converted speech to text with advanced audio processing filters that reduced background noise and improved clarity. High transcription accuracy was obtained, with positive feedback from speech-language pathologists (SLPs) on the device’s usability and therapeutic benefits. The portable device employs Google’s Web Speech API for precise speech analysis, demonstrating effective speech-to-text conversion. An audio processing filter using the Broadcom BCM2711 SoC in the Raspberry Pi reduced external noise effectively; the quad-core ARM Cortex-A72 CPU and software libraries, such as the adjust_for_ambient_noise function, enabled this noise reduction. An Excel file served as a comprehensive data repository and analysis tool. SLP-validated accuracy testing was conducted with specific statements using WER, confirming the device’s reliability and precision. Future work should explore other speech disorders and incorporate additional metrics, such as the percentage of consonants correct (PCC), to enhance data credibility. The device should be made more portable by integrating the microphone directly into it, and the training dataset can be expanded by including more children with various speech disorders. Additional metrics are needed to assess and evaluate the device’s long-term effectiveness, along with adaptability to changing speech patterns.

Author Contributions

Conceptualization, M.K.C.R. and G.M.M.S.; methodology, M.K.C.R. and G.M.M.S.; software, M.K.C.R. and G.M.M.S.; validation, J.C.D.C. (Jennifer C. Dela Cruz); formal analysis, M.K.C.R. and G.M.M.S.; investigation, M.K.C.R. and G.M.M.S.; resources, M.K.C.R. and G.M.M.S.; data curation, M.K.C.R. and G.M.M.S.; writing—original draft preparation, M.K.C.R. and G.M.M.S.; writing—review and editing, M.K.C.R. and G.M.M.S.; visualization, M.K.C.R. and G.M.M.S.; supervision, J.C.D.C. (Jennifer C. Dela Cruz); project administration, J.C.D.C. (Jmi C. Dela Cruz). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The dataset generated during the current study is not publicly available but is available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fadchar, N.A.; Dela Cruz, J.C. Design and Development of a Neural Network-Based Coconut Maturity Detector Using Sound Signatures. In Proceedings of the 2020 IEEE 7th International Conference on Industrial Engineering and Applications (ICIEA), Bangkok, Thailand, 16–21 April 2020; pp. 927–931.
  2. Pauzi, T.M.A.A.T.M.; Samah, A.A.; Dela Cruz, J.C.; Ghaffa, D.; Nordin, R.; Abdullah, N.F. Classification of Stress Using ML Based on Physiological and Psychological Data from Wearables. In Proceedings of the 2023 IEEE 15th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Coron, Palawan, Philippines, 19–23 November 2023; pp. 1–6.
  3. Hammond, N.; Cafasso, J. What Part of the Brain Controls Speech? Healthline Media, 17 May 2019. Available online: https://www.healthline.com/health/what-part-of-the-brain-controls-speech (accessed on 12 April 2022).
  4. Mule, P.; Cheeran, A.N.; Palav, T.; Sasi, S. Low Cost and Easy to Use Electronic Communication System for Speech Impaired People with Wired and Wireless Operability. In Proceedings of the 2016 IEEE International Conference on Engineering and Technology (ICETECH), Coimbatore, India, 17–18 March 2016; pp. 1194–1198.
  5. MDN Web Docs. Web Speech API. 2021. Available online: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API (accessed on 2 August 2024).
  6. Bowen, C. Children’s Speech Sound Disorders, 2nd ed.; Wiley: Hoboken, NJ, USA, 2014. Available online: https://bookshelf.vitalsource.com/books/9781118634011 (accessed on 8 July 2024).
  7. Hickey, R. Standard English and Standards of English. In Standards of English; Cambridge University Press: Cambridge, UK, 2012. Available online: https://www.cambridge.org/core/books/abs/standards-of-english/standard-english-and-standards-of-english/96A3A5BD37C3C294B209BB9B14F141C6#access-block (accessed on 24 June 2024).
  8. Drongowski, P. Raspberry Pi 4 ARM Cortex-A72 Processor. Sand, Software and Sound. 2021. Available online: https://sandsoftwaresound.net/raspberry-pi-4-arm-cortex-a72-processor/ (accessed on 2 August 2024).
  9. Hirsh, I.J. Book Review: Speech Science Primer: Physiology, Acoustics, and Perception of Speech. Ann. Otol. Rhinol. Laryngol. 1981, 90, 412–413.
Figure 1. System’s operational flowchart.
Figure 2. Sample code for the device to record a voice.
Figure 3. ML code.
Figure 4. Block diagram of the quad-core ARM Cortex-A72 CPU.
Figure 5. Comparison of results for words.
Figure 6. Comparison of sentence results.
Table 1. Speech delay patients’ trials for words.

Patient | Age | Gender | Trial No. | W1 | W2 | W3 | W4 | W5 | W6 | W7 | W8 | W9 | W10
1 | 6 | B | 1 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | NO | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | NO | NO | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
2 | 6 | B | 1 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | NO | YES | YES | YES | NO | YES | YES | YES
  |   |   | 3 | YES | YES | NO | YES | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
3 | 6 | B | 1 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | YES | NO | YES | YES | NO | YES | YES
  |   |   | 3 | YES | NO | YES | YES | YES | YES | YES | NO | YES | YES
  |   |   | 4 | YES | YES | YES | YES | NO | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
4 | 7 | B | 1 | NO | YES | YES | YES | NO | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | YES | YES | YES | NO | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
5 | 8 | B | 1 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
6 | 10 | B | 1 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
7 | 11 | B | 1 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | NO | NO | YES | YES | YES | NO | YES | YES | YES
  |   |   | 3 | YES | NO | YES | YES | NO | YES | NO | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | NO | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
8 | 12 | G | 1 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | NO | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | NO | NO | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
9 | 13 | B | 1 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
10 | 13 | G | 1 | YES | YES | YES | NO | YES | YES | NO | YES | YES | YES
  |   |   | 2 | YES | YES | NO | NO | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | NO | YES | NO | YES | YES | YES
  |   |   | 4 | NO | YES | YES | YES | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES | YES | YES | YES | YES
Table 2. Speech delay patients’ trials for sentences.

Patient | Age | Gender | Trial No. | S1 | S4 | S5 | S7 | S8 | S10
1 | 6 | B | 1 | YES | YES | YES | NO | YES | YES
  |   |   | 2 | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES
2 | 6 | B | 1 | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | NO | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | NO
  |   |   | 4 | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES
3 | 6 | B | 1 | YES | YES | YES | YES | YES | YES
  |   |   | 2 | NO | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | NO
  |   |   | 4 | NO | NO | YES | YES | YES | NO
  |   |   | 5 | YES | YES | YES | YES | YES | YES
4 | 7 | B | 1 | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES
5 | 8 | B | 1 | YES | YES | YES | YES | YES | NO
  |   |   | 2 | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | NO
  |   |   | 4 | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES
6 | 10 | B | 1 | YES | YES | YES | YES | YES | YES
  |   |   | 2 | NO | NO | NO | YES | YES | YES
  |   |   | 3 | NO | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | NO | YES | NO
  |   |   | 5 | YES | YES | YES | YES | YES | YES
7 | 11 | B | 1 | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | YES | YES | NO
  |   |   | 3 | YES | YES | NO | YES | YES | NO
  |   |   | 4 | YES | YES | NO | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES
8 | 12 | G | 1 | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES
9 | 13 | B | 1 | YES | YES | YES | YES | YES | YES
  |   |   | 2 | YES | YES | YES | YES | YES | YES
  |   |   | 3 | YES | YES | YES | YES | YES | YES
  |   |   | 4 | YES | YES | YES | YES | YES | YES
  |   |   | 5 | YES | YES | YES | YES | YES | YES
10 | 13 | G | 1 | YES | YES | YES | NO | YES | YES
  |   |   | 2 | YES | YES | YES | YES | YES | NO
  |   |   | 3 | YES | YES | NO | NO | YES | YES
  |   |   | 4 | YES | YES | YES | NO | YES | NO
  |   |   | 5 | YES | YES | YES | YES | YES | YES
Table 3. Ranking of words.

Word | Number of Times the Word Was Recognized | Rank (1 to 10)
Caterpillar | 49 | 1st
Helicopter | 48 | 2nd
Ambulance | 47 | 3rd
Butterfly | 46 | 4th
Vegetables | 45 | 5th
Spaghetti | 43 | 6th
Computer | 42 | 7th
Hippopotamus | 42 | 8th
Animals | 39 | 9th
Caravan | 36 | 10th
Table 4. Ranking of sentences.

Sentence | Number of Times the Sentence Was Recognized | Rank (1 to 6)
Clap your hands together. | 49 | 1st
The boy is taking a bath. | 48 | 2nd
I saw a black cat. | 47 | 3rd
I want to eat lunch in the kitchen. | 46 | 4th
I went to the mall. | 41 | 5th
Did you hurt your foot? | 40 | 6th
Table 5. SLP validation.

Patient | Age | Gender | Intelligibility Rating | SLP Validation
1 | 6 | Boy | 3 | Validated
2 | 6 | Boy | 3 | Validated
3 | 6 | Boy | 4 | Validated
4 | 7 | Boy | 2 | Validated
5 | 8 | Boy | 3 | Validated
6 | 10 | Boy | 3 | Validated
7 | 11 | Boy | 4 | Validated
8 | 12 | Girl | 2 | Validated
9 | 13 | Boy | 1 | Validated
10 | 13 | Girl | 5 | Validated
Table 6. WER results.

Patient | Substitutions | Deletions | Insertions | Total Words | WER (%)
1 | 0 | 3 | 0 | 50 | 6
2 | 2 | 1 | 0 | 50 | 6
3 | 1 | 4 | 0 | 50 | 10
4 | 2 | 1 | 0 | 50 | 6
5 | 1 | 0 | 0 | 50 | 2
6 | 0 | 0 | 0 | 50 | 0
7 | 2 | 5 | 0 | 50 | 14
8 | 0 | 3 | 0 | 50 | 6
9 | 0 | 0 | 0 | 50 | 0
10 | 1 | 6 | 0 | 50 | 14