Touch to Speak: Real-Time Tactile Pronunciation Feedback for Individuals with Speech and Hearing Impairments
Abstract
1. Introduction
Our Contribution
2. Related Work
2.1. Clinical Interventions: Speech–Language Pathologists
2.2. Real-Time Translation and Captioning Technologies
2.3. Haptic Feedback Systems for Language Support
- TActile Phonemic Sleeve (TAPS): A wearable system that encodes English phonemes into tactile patterns on the forearm. Tan et al. [20] demonstrated the acquisition of 500 English words through repeated haptic exposure. However, TAPS is optimized for vocabulary comprehension, not for pronunciation output or real-time correction.
- HAPOVER System: Ilhan and Kaçanoğlu [4] introduced HAPOVER, a wearable pronunciation training device utilizing eight vibration motors to provide phoneme-level feedback. The system captures speech, extracts Mel-Frequency Cepstral Coefficients (MFCC), and compares them with reference templates using Dynamic Time Warping (DTW). If a mismatch is detected, it delivers corrective haptic signals to guide users toward accurate articulation. Their pilot study reported an 80.74% improvement in word pronunciation accuracy among second-language learners after training. Despite its promising results, HAPOVER uses a limited number of actuators and covers only a subset of phonemes, without a scalable design for full language production support.
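HAPOVER's matching step, comparing an incoming MFCC sequence against a reference template via DTW, can be sketched as follows. This is a minimal illustration, assuming MFCC frames have already been extracted by an audio front end; the mismatch threshold is an arbitrary placeholder, not a value from [4].

```python
import numpy as np

def dtw_distance(ref: np.ndarray, test: np.ndarray) -> float:
    """Dynamic Time Warping distance between two feature sequences.

    ref, test: arrays of shape (n_frames, n_features), e.g. MFCC frames.
    Returns the accumulated cost of the optimal alignment path.
    """
    n, m = len(ref), len(test)
    # cost[i, j] = best accumulated cost aligning ref[:i] with test[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(ref[i - 1] - test[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],       # skip a reference frame
                                 cost[i, j - 1],       # skip a test frame
                                 cost[i - 1, j - 1])   # match the two frames
    return float(cost[n, m])

def mismatch_detected(ref, test, threshold: float = 5.0) -> bool:
    """Flag a pronunciation mismatch when the DTW distance exceeds a threshold
    (the threshold here is an illustrative placeholder)."""
    return dtw_distance(np.asarray(ref, float), np.asarray(test, float)) > threshold
```

DTW absorbs differences in speaking rate, which is why such systems compare warped alignments rather than frame-by-frame distances.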
- Skin Reading Interfaces: Luzhnica et al. [31] proposed encoding textual information using six-channel haptic displays. While innovative, the system focuses on passive reception and requires high cognitive training for interpretation.
- Vibrotactile Encoding Capacity: Novich and Eagleman [32] investigated how spatial and temporal patterns of vibration can encode complex information, estimating the throughput capacity of human skin. Their results highlight the potential—but also the limitations—of tactile systems for detailed language feedback.
- Tactile Vocoder with Envelope Expansion: Fletcher et al. [33,34] proposed an audio-to-tactile substitution approach using amplitude envelope expansion across multiple frequency bands. Their system demonstrated enhanced phoneme identification accuracy—especially for vowels and consonants—under noisy conditions, achieving an average improvement of 9.6%. The authors emphasized the clinical potential of such compact, low-power devices for real-time feedback. However, while promising for passive perception, this system does not provide active pronunciation correction or phoneme-level feedback tailored to speech production.
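The envelope-expansion idea can be illustrated with a short sketch: within each frequency band, the signal is rectified and smoothed to obtain its amplitude envelope, and the normalized envelope is raised to a power greater than one before driving the actuator. The band-pass filtering stage is omitted here, and the cutoff and exponent are illustrative choices, not the parameters used by Fletcher et al. [33,34].

```python
import numpy as np

def expanded_envelope(band_signal, fs, cutoff_hz=30.0, expansion=2.0):
    """Amplitude envelope of one frequency band, with expansion applied.

    The envelope is the rectified signal smoothed by a one-pole low-pass
    filter; expansion raises the peak-normalized envelope to a power > 1,
    widening the dynamic range delivered to the vibrotactile actuator.
    cutoff_hz and expansion are illustrative values, not published parameters.
    """
    rectified = np.abs(np.asarray(band_signal, dtype=float))
    alpha = np.exp(-2.0 * np.pi * cutoff_hz / fs)  # one-pole smoothing factor
    env = np.empty_like(rectified)
    acc = 0.0
    for i, x in enumerate(rectified):
        acc = alpha * acc + (1.0 - alpha) * x  # low-pass the rectified signal
        env[i] = acc
    peak = env.max() or 1.0  # avoid dividing by zero on silence
    return peak * (env / peak) ** expansion  # expand relative to the band peak
```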
- Phoneme-Based Tactile Display: Jung et al. [35] developed a 4 × 6 tactile actuator array that maps 39 English phonemes to unique vibration patterns, allowing users to receive words and short messages via touch. In their study, participants achieved an average word recognition accuracy of 87% for trained four-phoneme words and 77% for novel words after 3–5 h of training. Additionally, a longitudinal experiment demonstrated two-way tactile communication with 73% phoneme-level and 82% word-level accuracy, albeit at a limited transmission rate (approximately one message per minute). While this system effectively supports connected phoneme-level reception and real-world communication scenarios, it does not offer active feedback for pronunciation correction or real-time speech training.
2.4. Limitations in Existing Approaches
2.5. Summary and Gap Analysis
- Most systems lack real-time, corrective feedback.
- Phoneme coverage is often partial.
- Few are designed for active pronunciation training or broad scalability.
Our proposed system addresses these gaps by combining:
- Real-time, closed-loop feedback;
- Full phoneme coverage;
- A scalable, therapist-free design aligned with cognitive and perceptual principles.
3. Method
3.1. System Overview
- Raspberry Pi 5 for speech processing and experiment control, running algorithms implemented in Python 3.8.
- Arduino Uno to individually control up to 26 vibration motors.
- A microphone (speech input; not used during core haptic feedback phases).
- A speaker (optional auditory feedback; not used in the main experiments).
- A 3D-printed ergonomic hand rest for consistent contact and user comfort.
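A plausible way to wire the components above is a simple serial link between the Raspberry Pi and the Arduino. The one-byte command format below (a motor index, 0–25, one per phoneme channel) is a hypothetical sketch, not the authors' documented protocol; any writable binary stream can stand in for the serial port, so the Arduino side is not needed to try it.

```python
import io
import time

# Hypothetical one-byte protocol: the Pi writes a motor index (0-25) over
# serial, and the Arduino firmware pulses that vibration motor. With pyserial
# `port` would be serial.Serial("/dev/ttyACM0", 9600); here any file-like
# binary stream works, which keeps the sketch testable without hardware.

def pulse_motor(port, motor_index: int, gap_s: float = 0.0) -> None:
    """Send one motor-pulse command, optionally followed by an inter-cue gap."""
    if not 0 <= motor_index <= 25:
        raise ValueError("motor index must be 0-25 (one per phoneme channel)")
    port.write(bytes([motor_index]))
    if gap_s:
        time.sleep(gap_s)  # gap keeps successive cues discriminable

def play_sequence(port, motor_indices, gap_s: float = 0.0) -> None:
    """Deliver a word as an ordered sequence of motor pulses."""
    for idx in motor_indices:
        pulse_motor(port, idx, gap_s)
```

For example, `play_sequence(io.BytesIO(), [0, 5, 19])` writes the three command bytes in order.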
3.2. Phonemes and the International Phonetic Alphabet (IPA)
3.3. Haptic Feedback Selection
- Compactness and cost-effectiveness;
- Clear, discriminable tactile sensation;
- Simple integration with microcontrollers.
3.4. Phoneme Conversion and Haptic Mapping Procedure
3.5. Mechanical Design and Assembly
- Ergonomic hand fixture for comfortable, repeatable placement;
- Precise motor mounts to avoid vibratory overlap;
- Ventilation for electronics cooling.
3.6. Experiment Protocol and Participants
- Familiarization: Participants learned the phoneme-to-motor mapping via isolated haptic cues.
- Production Task: The experimenter delivered each word’s phoneme sequence as haptic cues; participants pronounced the word aloud, relying solely on touch.
- Correction: Incorrect attempts were met with a “not correct” prompt; repetition continued until success.
- Timing: The researcher measured time from cue onset to successful pronunciation with a stopwatch.
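The four protocol steps above amount to a closed trial loop. The sketch below abstracts the hardware and the experimenter's judgement into two callbacks (both hypothetical names); it mirrors the procedure of cue delivery, repeated attempts on a "not correct" prompt, and timing from cue onset to success.

```python
import time

def run_trial(word, deliver_cues, get_attempt):
    """Run one production trial; return (attempts, elapsed_s).

    deliver_cues(word): plays the word's phoneme sequence on the motors.
    get_attempt(): returns the experimenter's transcription/judgement of the
    participant's spoken attempt.
    Both callbacks are hypothetical stand-ins for hardware and experimenter.
    """
    start = time.monotonic()  # stopwatch starts at cue onset
    deliver_cues(word)
    attempts = 0
    while True:
        attempts += 1
        if get_attempt() == word:  # success: stop the clock
            return attempts, time.monotonic() - start
        # otherwise the "not correct" prompt is given and the loop repeats
```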
4. Results
4.1. Experiment 1—Word Recognition Task
4.2. Experiment 2—Learning Curves with Expanding Phoneme Sets
- Learning effect: Mean times dropped from 41.3 s in Phase 1 to 25.9 s in Phase 2, roughly a 37% reduction, indicating rapid adaptation. Phase 3 times rose slightly to 30.3 s but remained well below initial performance.
- Outliers: One unusually slow participant was excluded; the fastest individual achieved times as low as 10.6 s per word.
4.3. Qualitative Observations
- Native English advantage. Participants who were native in both Hebrew and English—or who had native-level proficiency in English—consistently detected words faster and more accurately. They were already accustomed to the mismatch between English spelling and pronunciation; for example, they could readily segment candle as /ˈkændəl/, and could distinguish minimal pairs such as sheep versus ship, which almost all native Hebrew speakers cannot.
- Phoneme-load effect with learning curve. As predicted, tasks containing a larger inventory of phonemes posed a greater initial challenge; mean detection times fell across successive sessions, indicating a clear learning effect.
- Gender parity. No statistically significant performance differences were observed between male and female participants.
- Zero-detection group. In addition to the 20 participants, four volunteers (two men and two women) failed to detect even a single word. None of these four were native English speakers, implying that some people—particularly those with little to no phonological experience in English—may find the task nearly impossible.
- Usability impressions. Participants provided positive feedback regarding the usability of the system. Even those who did not successfully complete the tasks expressed appreciation for the intuitiveness of the tactile interface. Notably, it became evident that the hand’s placement on the tactile platform played a crucial role in performance. Each participant naturally adjusted their hand to a position that felt most comfortable and effective for detecting the vibrations. This self-guided adjustment suggests that future designs may benefit from incorporating adaptable or user-customizable hand rests to enhance tactile perception and learning outcomes.
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kushalnagar, R. Deafness and hearing loss. In Web Accessibility: A Foundation for Research; Springer: Berlin/Heidelberg, Germany, 2019; pp. 35–47.
- Graydon, K.; Waterworth, C.; Miller, H.; Gunasekera, H. Global burden of hearing impairment and ear disease. J. Laryngol. Otol. 2019, 133, 18–25.
- Guo, R. Advancing real-time close captioning: Blind source separation and transcription for hearing impairments. Appl. Comput. Eng. 2024, 30, 125–130.
- Ilhan, R.; Kaçanoğlu, K. HAPOVER: A Haptic Pronunciation Improver Device. IEEJ Trans. Electr. Electron. Eng. 2024, 19, 985–992.
- Greenwell, T.; Walsh, B. Evidence-based practice in speech-language pathology: Where are we now? Am. J. Speech Lang. Pathol. 2021, 30, 186–198.
- Tambyraja, S.R. Facilitating parental involvement in speech therapy for children with speech sound disorders: A survey of speech-language pathologists’ practices, perspectives, and strategies. Am. J. Speech Lang. Pathol. 2020, 29, 1987–1996.
- Caty, M.È.; Kinsella, E.A.; Doyle, P.C. Reflective practice in speech-language pathology: A scoping review. Int. J. Speech Lang. Pathol. 2015, 17, 411–420.
- Gomez-Risquet, M.; Cáceres-Matos, R.; Magni, E.; Luque-Moreno, C. Effects of Haptic Feedback Interventions in Post-Stroke Gait and Balance Disorders: A Systematic Review and Meta-Analysis. J. Pers. Med. 2024, 14, 974.
- Bogach, N.; Boitsova, E.; Chernonog, S.; Lamtev, A.; Lesnichaya, M.; Lezhenin, I.; Novopashenny, A.; Svechnikov, R.; Tsikach, D.; Vasiliev, K.; et al. Speech processing for language learning: A practical approach to computer-assisted pronunciation teaching. Electronics 2021, 10, 235.
- Pennington, M.C.; Rogerson-Revell, P. Using technology for pronunciation teaching, learning, and assessment. In English Pronunciation Teaching and Research: Contemporary Perspectives; Palgrave Macmillan: London, UK, 2019; pp. 235–286.
- Raghul, J.; Sreya, M.; Maithreye, S.; Durga Devi, K. Mobility Stick with Haptic Feedback for People with Vision Impairments. In Proceedings of the 2024 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT), Karaikal, India, 4–5 July 2024; pp. 1–6.
- Kleinberg, D.; Yozevitch, R.; Abekasis, I.; Israel, Y.; Holdengreber, E. A haptic feedback system for spatial orientation in the visually impaired: A comprehensive approach. IEEE Sens. Lett. 2023, 7, 1–4.
- Emami, M.; Bayat, A.; Tafazolli, R.; Quddus, A. A survey on haptics: Communication, sensing and feedback. IEEE Commun. Surv. Tutor. 2024, 27, 2006–2050.
- Reed, C.M.; Tan, H.Z.; Jones, L.A. Haptic communication of language. IEEE Trans. Haptics 2023, 16, 134–153.
- Xia, J.; Pei, S.; Chen, Z.; Wang, L.; Hu, J.; Wang, J. Effects of Conventional Speech Therapy with Liuzijue Qigong, a Traditional Chinese Method of Breath Training, in 70 Patients with Post-Stroke Spastic Dysarthria. Med. Sci. Monit. Int. Med. J. Exp. Clin. Res. 2023, 29, e939623-1–e939623-10.
- de Groot, A.; Eijsvoogel, N.; van Well, G.; van Hout, R.; de Vries, E. Evidence-based decision-making in the treatment of speech, language, and communication disorders in Down syndrome; a scoping review. J. Intellect. Disabil. 2024, 17446295241283659.
- Martins, G.d.S.; Santos, I.R.D.d.; Brazorotto, J.S. Validation of an educational resource for speech therapists on the use of video feedback in training families of hearing-impaired children. Audiol. Commun. Res. 2024, 29, e2928.
- Holdengreber, E.; Yozevitch, R.; Khavkin, V. Intuitive cognition-based method for generating speech using hand gestures. Sensors 2021, 21, 5291.
- Yozevitch, R.; Frenkel-Toledo, S.; Elion, O.; Levy, L.; Ambaw, A.; Holdengreber, E. Cost-Effective and Efficient Solutions for the Assessment and Practice of Upper Extremity Motor Performance. IEEE Sens. J. 2023, 23, 23494–23499.
- Tan, H.Z.; Reed, C.M.; Jiao, Y.; Perez, Z.D.; Wilson, E.C.; Jung, J.; Martinez, J.S.; Severgnini, F.M. Acquisition of 500 English words through a TActile phonemic sleeve (TAPS). IEEE Trans. Haptics 2020, 13, 745–760.
- Weidner, K.; Lowman, J. Telepractice for adult speech-language pathology services: A systematic review. Perspect. Asha Spec. Interest Groups 2020, 5, 326–338.
- Powell, R.K. Unique contributors to the curriculum: From research to practice for speech-language pathologists in schools. Lang. Speech Hear. Serv. Sch. 2018, 49, 140–147.
- Thomas, S.; Schulz, J.; Ryder, N. Assessment and diagnosis of Developmental Language Disorder: The experiences of speech and language therapists. Autism Dev. Lang. Impair. 2019, 4, 2396941519842812.
- Raina, S. Schizophrenia: Communication disorders and role of the speech-language pathologist. Am. J. Speech Lang. Pathol. 2024, 33, 1099–1112.
- Otoom, M.; Alzubaidi, M.A. Ambient intelligence framework for real-time speech-to-sign translation. Assist. Technol. 2018, 30, 119–132.
- Shull, P.B.; Damian, D.D. Haptic wearables as sensory replacement, sensory augmentation and trainer—A review. J. Neuroeng. Rehabil. 2015, 12, 1–13.
- Giri, G.S.; Maddahi, Y.; Zareinia, K. An application-based review of haptics technology. Robotics 2021, 10, 29.
- Irigoyen, E.; Larrea, M.; Graña, M. A Narrative Review of Haptic Technologies and Their Value for Training, Rehabilitation, and the Education of Persons with Special Needs. Sensors 2024, 24, 6946.
- Levy, L.; Ambaw, A.; Ben-Itzchak, E.; Holdengreber, E. A real-time environmental translator for emotion recognition in autism spectrum disorder. Sci. Rep. 2024, 14, 31527.
- Levy, L.; Blum, Y.; Ambaw, A.; Yozevitch, R.; Holdengreber, E. Harnessing Haptic Technology for Real-Time Emotion Detection. IEEE Sens. Lett. 2025, 9, 5500804.
- Luzhnica, G.; Veas, E.; Pammer, V. Skin reading: Encoding text in a 6-channel haptic display. In Proceedings of the 2016 ACM International Symposium on Wearable Computers, Heidelberg, Germany, 12–16 September 2016; pp. 148–155.
- Novich, S.D.; Eagleman, D.M. Using space and time to encode vibrotactile information: Toward an estimate of the skin’s achievable throughput. Exp. Brain Res. 2015, 233, 2777–2788.
- Fletcher, M.D.; Akis, E.; Verschuur, C.A.; Perry, S.W. Improved tactile speech perception and noise robustness using audio-to-tactile sensory substitution with amplitude envelope expansion. Sci. Rep. 2024, 14, 15029.
- Fletcher, M.D.; Verschuur, C.A.; Perry, S.W. Improving speech perception for hearing-impaired listeners using audio-to-tactile sensory substitution with multiple frequency channels. Sci. Rep. 2023, 13, 13336.
- Jung, J.; Reed, C.M.; Martinez, J.S.; Tan, H.Z. Tactile Speech Communication: Reception of Words and Two-Way Messages through a Phoneme-Based Display. Virtual Worlds 2024, 3, 184–207.
- Anazia, E.K.; Eti, E.F.; Ovili, P.H.; Ogbimi, O.F. Speech-To-Text: A Secured Real-Time Language Translation Platform for Students. FUDMA J. Sci. 2024, 8, 329–338.
- Sanaullah, M.; Ahmad, B.; Kashif, M.; Safdar, T.; Hassan, M.; Hasan, M.H.; Aziz, N. A real-time automatic translation of text to sign language. Comput. Mater. Contin. 2022, 70, 2471–2488.
- Bernard, M.; Titeux, H. Phonemizer: Text to Phones Transcription for Multiple Languages in Python. J. Open Source Softw. 2021, 6, 3958.
- Hayden, R.E. The relative frequency of phonemes in General-American English. Word 1950, 6, 217–223.
Approach | Real-Time Feedback | Active Speech Production | Full Phoneme Coverage | Requires Therapist | Scalability |
---|---|---|---|---|---|
SLP (in-person) [21,22,23,24] | ✓ | ✓ | ✓ | ✓ | × |
Real-Time Translation Tools [25,36,37] | ✓ | × | × | × | ✓ |
TAPS [20] | × | × | Partial | × | Medium |
HAPOVER [4] | ✓ | ✓ | Partial | × | Low–Medium |
Tactile Vocoder [33,34] | ✓ | × | Partial (perceptual only) | × | Medium–High |
Phoneme-Based Tactile Display [35] | × | × | ✓ | × | Medium |
Our Proposed System | ✓ | ✓ | ✓ | × | ✓ |
Word | IPA Phonemes |
---|---|
Phonemes: | /p/, /æ/, /n/, /ɪ/, /s/, /t/ |
sit | /s/ /ɪ/ /t/ |
pan | /p/ /æ/ /n/ |
pin | /p/ /ɪ/ /n/ |
nap | /n/ /æ/ /p/ |
satin | /s/ /æ/ /t/ /ɪ/ /n/ |
Phase | Word | IPA Phonemes |
---|---|---|
Phase 1 | Phonemes: /p/, /æ/, /n/, /ɪ/, /s/, /t/ |
sit | /s/ /ɪ/ /t/ |
pan | /p/ /æ/ /n/ |
pin | /p/ /ɪ/ /n/ |
nap | /n/ /æ/ /p/ |
satin | /s/ /æ/ /t/ /ɪ/ /n/ |
Phase 2 | New phonemes: /m/, /k/; all: /p/, /æ/, /n/, /ɪ/, /s/, /t/, /m/, /k/ |
man | /m/ /æ/ /n/ |
kit | /k/ /ɪ/ /t/ |
mat | /m/ /æ/ /t/ |
panic | /p/ /æ/ /n/ /ɪ/ /k/ |
mantis | /m/ /æ/ /n/ /t/ /ɪ/ /s/ |
Phase 3 | New phonemes: /d/, /l/; all: /p/, /æ/, /n/, /ɪ/, /s/, /t/, /m/, /k/, /d/, /l/ |
land | /l/ /æ/ /n/ /d/ |
sand | /s/ /æ/ /n/ /d/ |
sandal | /s/ /æ/ /n/ /d/ /ə/ /l/ |
candle | /k/ /æ/ /n/ /d/ /ə/ /l/ |
paddle | /p/ /æ/ /d/ /ə/ /l/ |
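The word-to-cue conversion implied by these tables reduces to a lookup from phoneme to motor index. The particular index assignment below is hypothetical: the tables fix only which phonemes appear in each phase, not the device's motor layout.

```python
# Hypothetical motor assignment for the phonemes used across the three phases
# (plus schwa). The actual phoneme-to-motor layout is a design choice of the
# device, not something the word lists determine.
MOTOR_OF = {"p": 0, "æ": 1, "n": 2, "ɪ": 3, "s": 4, "t": 5,
            "m": 6, "k": 7, "d": 8, "l": 9, "ə": 10}

def word_to_motors(phonemes):
    """Map a word's phoneme sequence to the ordered motor indices to pulse."""
    return [MOTOR_OF[p] for p in phonemes]
```

For example, candle (/k/ /æ/ /n/ /d/ /ə/ /l/) becomes the pulse sequence `[7, 1, 2, 8, 10, 9]` under this assignment.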
Metric | Phase 1 | Phase 2 | Phase 3 |
---|---|---|---|
Mean time (s), all | 41.3 | 25.9 | 30.3 |
Fastest (s) | 11.4 | 12.4 | 10.6 |
Slowest (s) | 85.0 | 65.3 | 47.6 |
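From these means, the relative speed-up between phases follows directly; a small sketch using the table's values:

```python
# Mean detection times from the results table (seconds).
MEAN_TIME_S = {1: 41.3, 2: 25.9, 3: 30.3}

def improvement(from_phase: int, to_phase: int) -> float:
    """Relative reduction in mean time between two phases, as a percentage."""
    a, b = MEAN_TIME_S[from_phase], MEAN_TIME_S[to_phase]
    return 100.0 * (a - b) / a

print(round(improvement(1, 2), 1))  # Phase 2 is ~37.3% faster than Phase 1
print(round(improvement(1, 3), 1))  # Phase 3 remains ~26.6% faster than Phase 1
```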
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sharon, A.; Yozevitch, R.; Holdengreber, E. Touch to Speak: Real-Time Tactile Pronunciation Feedback for Individuals with Speech and Hearing Impairments. Technologies 2025, 13, 345. https://doi.org/10.3390/technologies13080345