Automatic Speech Recognition in L2 Learning: A Review Based on PRISMA Methodology
Abstract
1. Introduction
2. Methodology
2.1. Eligibility Criteria
2.2. Exclusion Criteria
2.3. Search Method
2.4. Paper Selection
3. Pronunciation Assessment in L2
4. Assessment of Linguistic Levels in L2 through ASR
4.1. Grammar and Lexical Assessment
4.2. Phonetic Assessment
4.3. Prosodic Assessment
5. Commercial Systems
6. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
1 | http://www.lti.cs.cmu.edu/Research/Fluency/ (accessed on 18 July 2023). |
2 | http://www.rosettastone.com (accessed on 20 July 2023). |
3 | https://elsaspeak.com/en/about-us (accessed on 10 July 2023). |
References
- Adami, André Gustavo. 2007. Modeling Prosodic Differences for Speaker Recognition. Speech Communication 49: 277–91.
- Alharbi, Sadeen, Muna Alrazgan, Alanoud Alrashed, Turkiayh Alnomasi, Raghad Almojel, Rimah Alharbi, Saja Alharbi, Sahar Alturki, Fatimah Alshehri, and Maha Almojil. 2021. Automatic Speech Recognition: Systematic Literature Review. IEEE Access 9: 131858–76.
- Anguera, Xavier, and Vu Van. 2016. English Language Speech Assistant. In Interspeech. San Francisco: International Speech Communication Association. Available online: http://kaldi-asr.org (accessed on 15 June 2023).
- Arkin, Gulnur, Askar Hamdulla, and Mijit Ablimit. 2021. Analysis of Phonemes and Tones Confusion Rules Obtained by ASR. Wireless Networks 27: 3471–81.
- Ateeq, Mohammad, and Abualsoud Hanani. 2019. Speech-Based L2 CALL System for English Foreign Speakers. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Berlin/Heidelberg: Springer, pp. 43–53.
- Bashori, Muzakki, Roeland van Hout, Helmer Strik, and Catia Cucchiarini. 2022. ‘Look, I Can Speak Correctly’: Learning Vocabulary and Pronunciation through Websites Equipped with Automatic Speech Recognition Technology. Computer Assisted Language Learning 2022: 1–29.
- Bataineh, Ahmad, and Nasir Al-Qadi. 2014. The Effect of Using Authentic Videos on English Major Students’ Prosodic Competence. Journal of Education and Practice 5: 157–72.
- Belpaeme, Tony, Paul Vogt, Rianne van den Berghe, Kirsten Bergmann, Tilbe Göksun, Mirjam de Haas, Junko Kanero, James Kennedy, Aylin C. Küntay, Ora Oudgenoeg-Paz, et al. 2018. Guidelines for Designing Social Robots as Second Language Tutors. International Journal of Social Robotics 10: 325–41.
- Besacier, Laurent, Etienne Barnard, Alexey Karpov, and Tanja Schultz. 2014. Automatic Speech Recognition for Under-Resourced Languages: A Survey. Speech Communication 56: 85–100.
- Cucchiarini, Catia, and Helmer Strik. 2017. Automatic Speech Recognition for Second Language Pronunciation Training. In The Routledge Handbook of Contemporary English Pronunciation. Abingdon and New York: Routledge, pp. 556–69.
- Cucchiarini, Catia, and Helmer Strik. 2019. Second Language Learners’ Spoken Discourse: Practice and Corrective Feedback Through Automatic Speech Recognition. In Computer-Assisted Language Learning. Hershey: IGI Global, pp. 787–810.
- Cylwik, N., G. Demenko, O. Jokisch, R. Jäckel, M. Rusko, R. Hoffmann, A. Ronzhin, D. Hirschfeld, U. Koloska, and L. Hanisch. 2008. The Use of CALL in Acquiring Foreign Language Pronunciation and Prosody-General Specifications for Euronounce Project. Speech and Language Technology 11: 123–29.
- Danka, Sandor. 2018. Current Debates in the Theory and Teaching of English L2 Pronunciation. The New English Teacher 12: 59. Available online: http://www.assumptionjournal.au.edu/index.php/newEnglishTeacher/article/view/3093 (accessed on 6 June 2023).
- De Iacovo, Valentina, Marco Palena, and Antonio Romano. 2021. Evaluating Prosodic Cues in Italian: The Use of a Telegram Chatbot as a CALL Tool for Italian L2 Learners. In Speaker Individuality in Phonetics and Speech Sciences. Edited by Camilla Bernardasci, Dalila Dipino, Davide Garassino, Stefano Negrinelli, Elisa Pellegrino and Stephan Schmid. Milan: Officinaventuno, pp. 283–98.
- De Villiers, Jill G., and Peter A. De Villiers. 1978. Language Acquisition. Cambridge: Harvard University Press. Available online: https://www.hup.harvard.edu/catalog.php?isbn=9780674509313 (accessed on 28 June 2023).
- Demenko, Grazyna, Agnieszka Wagner, and Natalia Cylwik. 2010. The Use of Speech Technology in Foreign Language Pronunciation Training. Archives of Acoustics 35: 309–29.
- Escudero, David, Enrique Cámara, Cristian Tejedor, César González, and Valentín Cardeñoso. 2015. Implementation and Test of a Serious Game Based on Minimal Pairs for Pronunciation Training. Paper presented at the Workshop on Speech and Language Technology in Education (SLATE), Leipzig, Germany, September 4–5. Available online: https://uvadoc.uva.es/handle/10324/27533 (accessed on 19 June 2023).
- Escudero-Mancebo, David, Valentín Cardeñoso-Payo, Mario Corrales-Astorgano, César González Ferreras, Valle Flóres-Lucas, Lourdes Aguilar, Yolanda Martín-De-San-Pablo, and Alfonso Rodríguez-De-Rojas. 2021. Incorporation of a Module for Automatic Prediction of Oral Productions Quality in a Learning Video Game. Paper presented at the IberSPEECH, Valladolid, Spain, March 24–25; pp. 123–26.
- Eskenazi, Maxine. 1999. Using Automatic Speech Processing for Foreign Language Pronunciation Tutor: Some Issues and a Prototype. Language Learning & Technology 2: 62–76.
- Frost, Dan, and Francis Picavet. 2014. Putting Prosody First—Some Practical Solutions to a Perennial Problem: The Innovalangues Project. Research in Language 12: 233–43.
- Gómez-Zaragozá, Lucía, Simone Wills, Cristian Tejedor-Garcia, Javier Marín-Morales, Mariano Alcañiz, and Helmer Strik. 2023. Alzheimer Disease Classification through ASR-Based Transcriptions: Exploring the Impact of Punctuation and Pauses. In Proceedings of the Interspeech. Dublin: International Speech Communication Association, pp. 2403–7.
- Guo, Weitong, Hongwu Yang, and Zhenye Gan. 2019. Improving Mandarin Chinese Learning in Tibetan Second-Language Learning by Artificial Intelligent Speech Technology. Paper presented at the International Joint Conference on Information, Media, and Engineering, IJCIME 2019, Osaka, Japan, December 17–19; pp. 368–72.
- Guskaroska, Agata. 2019. ASR as a Tool for Providing Feedback for Vowel Pronunciation Practice. Ames: Iowa State University.
- Hirai, Akiyo, and Angelina Kovalyova. 2023. Using Speech-to-Text Applications for Assessing English Language Learners’ Pronunciation: A Comparison with Human Raters. English Language Education 31: 337–55.
- Hirst, Daniel, and Albert Di Cristo. 1998. Intonation Systems: A Survey of Twenty Languages. Edited by Daniel Hirst and Albert Di Cristo. Cambridge: Cambridge University Press (CUP).
- Hönig, Florian Thomas. 2016. Automatic Assessment of Prosody in Second Language Learning. Erlangen: Friedrich-Alexander-Universität.
- Johnson, David O., Okim Kang, and Romy Ghanem. 2016. Improved Automatic English Proficiency Rating of Unconstrained Speech with Multiple Corpora. International Journal of Speech Technology 19: 755–68.
- Jokisch, Oliver, Uwe Koloska, Diane Hirschfeld, and Rüdiger Hoffmann. 2005. Pronunciation Learning and Foreign Accent Reduction by an Audiovisual Feedback System. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Berlin/Heidelberg: Springer, pp. 419–25.
- Kang, Okim, and David Johnson. 2018. The Roles of Suprasegmental Features in Predicting English Oral Proficiency with an Automated System. Language Assessment Quarterly 15: 150–68.
- Kochem, Tim, Jeanne Beck, and Erik Goodale. 2022. The Use of ASR-Equipped Software in the Teaching of Suprasegmental Features of Pronunciation: A Critical Review. CALICO Journal 39: 306–25.
- Levis, John, Tracey M. Derwing, and Sinem Sonsaat-Hegelheimer. 2022. Second Language Pronunciation: Bridging the Gap between Research and Teaching.
- Liakin, Denis, Walcir Cardoso, and Natallia Liakina. 2015. Learning L2 Pronunciation with a Mobile Speech Recognizer: French /y/. CALICO Journal 32: 1–25.
- Liakin, Denis, Walcir Cardoso, and Natallia Liakina. 2017. Mobilizing Instruction in a Second-Language Context: Learners’ Perceptions of Two Speech Technologies. Languages 2: 11.
- Lima, Edna F. 2020. The Supra Tutor: Improving Speaker Comprehensibility through a Fully Online Pronunciation Course. Journal of Second Language Pronunciation 6: 39–67.
- Ling, Li, and Weiying Chen. 2023. Integrating an ASR-Based Translator into Individualized L2 Vocabulary Learning for Young Children. Education and Information Technologies 28: 1231–49.
- Magaña Redondo, Juan José. 2017. Audio Trainer Play: Design of a Gamified App for the Development of Audio Skills in a Secondary School Context. Madrid: Universidad Nacional de Educación a Distancia. Facultad de Filología.
- Mansour, Eman, Rand Sandouka, Dima Jaber, and Abualsoud Hanani. 2019. Speech-Based Automatic Assessment of Question Making Skill in L2 Language. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Berlin/Heidelberg: Springer, pp. 317–26.
- McCrocklin, Shannon. 2019. Dictation Programs for Second Language Pronunciation Learning: Perceptions of the Transcript, Strategy Use and Improvement. Konińskie Studia Językowe 7: 137–57.
- Mirzaei, Maryam Sadat, Kourosh Meshgi, and Tatsuya Kawahara. 2018. Exploiting Automatic Speech Recognition Errors to Enhance Partial and Synchronized Caption for Facilitating Second Language Listening. Computer Speech & Language 49: 17–36.
- Moher, David, Alessandro Liberati, Jennifer Tetzlaff, Douglas G Altman, and PRISMA Group. 2009. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Medicine 6: e1000097.
- Molenaar, Bo, Cristian Tejedor-Garcia, Catia Cucchiarini, and Helmer Strik. 2023. Automatic Assessment of Oral Reading Accuracy for Reading Diagnostics. Paper presented at the Interspeech, Dublin, Ireland, August 20–24; pp. 5232–36.
- Mrozek, Patryk Mikołaj. 2020. ShadowTalk: A Prosody-Training Mobile App for English as a Second or Foreign Language Students. Long Beach: California State University.
- Munro, Murray J. 2021. On the Difficulty of Defining ‘Difficult’ in Second-Language Vowel Acquisition. Frontiers in Communication 6: 639398.
- Murad, Dania, Riwu Wang, Douglas Turnbull, and Ye Wang. 2018. SLIONS: A Karaoke Application to Enhance Foreign Language Learning. Paper presented at the MM 2018: Proceedings of the 2018 ACM Multimedia Conference, Seoul, Republic of Korea, October 22–26; Volume 18, pp. 1679–87.
- O’Brien, Mary Grantham, Tracey M. Derwing, Catia Cucchiarini, Debra M. Hardison, Hansjörg Mixdorff, Ron I. Thomson, Helmer Strik, John M. Levis, Murray J. Munro, Jennifer A. Foote, et al. 2019. Directions for the Future of Technology in Pronunciation Research and Teaching. Journal of Second Language Pronunciation 4: 182–207.
- O’Brien, Mary Grantham. 2020. Ease and Difficulty in L2 Pronunciation Teaching: A Mini-Review. Frontiers in Communication 5: 626985.
- Pellegrini, Thomas, Rui Correia, Isabel Trancoso, Jorge Baptista, Nuno Mamede, and Maxine Eskenazi. 2013. ASR-Based Exercises for Listening Comprehension Practice in European Portuguese. Computer Speech & Language 27: 1127–42.
- Pennington, Martha C., and Pamela Rogerson-Revell. 2019. Assessing Pronunciation. English Pronunciation Training and Research, 287–342.
- Robertson, Sean, Cosmin Munteanu, and Gerald Penn. 2018. Designing Pronunciation Learning Tools: The Case for Interactivity against Over-Engineering. Paper presented at the Conference on Human Factors in Computing Systems, Montreal, QC, Canada, April 21–26.
- Rosenberg, Andrew. 2018. Speech, Prosody, and Machines: Nine Challenges for Prosody Research. Paper presented at the 9th International Conference on Speech Prosody 2018, Poznan, Poland, June 13–16; pp. 784–93.
- Tejedor García, Cristian. 2020. Design and Evaluation of Mobile Computer-Assisted Pronunciation Training Tools for Second Language Learning. Ph.D. thesis, Universidad de Valladolid, Valladolid, Spain.
- Tejedor-García, Cristian, David Escudero-Mancebo, Valentín Cardeñoso-Payo, and César González-Ferreras. 2020. Using Challenges to Enhance a Learning Game for Pronunciation Training of English as a Second Language. IEEE Access 8: 74250–66.
- Timpe-Laughlin, Veronika, Tetyana Sydorenko, and Phoebe Daurio. 2020. Using Spoken Dialogue Technology for L2 Speaking Practice: What Do Teachers Think? Computer Assisted Language Learning 35: 1194–217.
- van Doremalen, Joost, Lou Boves, Catia Cucchiarini, and Helmer Strik. 2014. Implementation of an ASR-Enabled CALL System for Practicing Pronunciation and Grammar: The BASSIsT System. In Developing Automatic Speech Recognition-Enabled Language Learning Applications: From Theory to Practice. Edited by Joost van Doremalen. Nijmegen: Radboud University Nijmegen, pp. 71–94.
- van Doremalen, Joost. 2014. Developing Automatic Speech Recognition-Enabled Language Learning Applications: From Theory to Practice. Nijmegen: Radboud Universiteit Nijmegen.
- Wang, Yi-Hsuan, and Shelley Shwu-Ching Young. 2014. A Study of the Design and Implementation of the ASR-Based ICASL System with Corrective Feedback to Facilitate English Learning. Educational Technology & Society 17: 219–33.
- Yaneva, Alexandrina. 2021. Speech Technologies Applied to Second Language Learning. A Use Case on Bulgarian. Barcelona: Universitat Pompeu Fabra. Available online: http://repositori.upf.edu/handle/10230/48854 (accessed on 20 July 2023).
- Yeh, Rosa. 2014. Effective Strategies for Using Text-to-Speech, Speech-to-Text, and Machine-Translation Technology for Teaching Chinese: A Multiple-Case Study. Prescott Valley: Northcentral University.
- Yenkimaleki, Mahmood, and Vincent J. van Heuven. 2019. The Relative Contribution of Computer Assisted Prosody Training vs. Instructor Based Prosody Teaching in Developing Speaking Skills by Interpreter Trainees: An Experimental Study. Speech Communication 107: 48–57.
- Zhang, Xinlei, Takashi Miyaki, and Jun Rekimoto. 2020. WithYou: Automated Adaptive Speech Tutoring with Context-Dependent Speech Recognition. Paper presented at the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, April 25–30.
- Zielinski, Beth. 2006. The Intelligibility Cocktail: An Interaction between Speaker and Listener Ingredients. Prospect 21: 22–45. Available online: https://researchers.mq.edu.au/en/publications/the-intelligibility-cocktail-an-interaction-between-speaker-and-l (accessed on 19 July 2023).
Reference | ASR/System | L1 | L2 | Linguistic level |
---|---|---|---|---|
(Mansour et al. 2019) | Kaldi-based | Arabic | English | grammar |
(Ateeq and Hanani 2019) | Google, SLaTE2018 | any | English | grammar |
(Ling and Chen 2023) | “Speak and Translate” app | English | Chinese | lexical |
(Tejedor García 2020) | Kaldi | any | English, Spanish | phonetic |
(Wang and Young 2014) | iCASL | Taiwanese | English | lexical, phonetic |
(Arkin et al. 2021) | Tsinghua University | Uyghur | Chinese | phonetic |
(Guo et al. 2019) | ASR algorithm | Tibetan | Chinese | phonetic |
(Bashori et al. 2022) | ASR-based websites | Indonesian | English | lexical, phonetic |
(Guskaroska 2019) | Gboard, Siri, voice dictation on smartphones | Macedonian | English | phonetic |
(Escudero et al. 2015) | Android ASR | Spanish | English | phonetic |
(Tejedor-García et al. 2020) | Clash of Pronunciations (COP) | Spanish | English | phonetic |
(van Doremalen et al. 2014) | bASSIsT, DISCO | any | Dutch | phonetics, morphology, syntax |
(Pellegrini et al. 2013) | Daily-REAP | any | Portuguese | phonetic |
(Liakin et al. 2015) | mobile ASR | any | French | phonetic |
(Mirzaei et al. 2018) | Julius ASR | any | English | phonetic, lexical |
(Johnson et al. 2016) | not specified | Portuguese | English | prosodic |
(Liakin et al. 2017) | Nuance Dragon Dictation | English, Mandarin, Arabic, Spanish | French | phonetic, prosodic |
(Demenko et al. 2010) | AzAR3.0 | German, Polish, Slovak, Czech, Russian | German, Polish, Slovak, Czech, Russian | phonetic, prosodic |
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Farrús, M. Automatic Speech Recognition in L2 Learning: A Review Based on PRISMA Methodology. Languages 2023, 8, 242. https://doi.org/10.3390/languages8040242