Improving Amharic Speech Recognition System Using Connectionist Temporal Classification with Attention Model and Phoneme-Based Byte-Pair-Encodings
Abstract
1. Introduction
- In addition to the Amharic read speech of the ALFFA (African Languages in the Field: speech Fundamentals and Automation) dataset [22], we prepared additional speech and text corpora covering various data sources. These data provide good coverage of the morphological behavior of the Amharic language.
- A CTC-attention end-to-end ASR architecture with phoneme mapping algorithms is proposed to model subword-level Amharic language units and to resolve the problem of OOV words in Amharic automatic speech recognition (AASR).
- We explored the effects of OOV words by considering the most frequently occurring words at different vocabulary sizes, namely 6.5 k, 10 k, 15 k, and 20 k, in character-based and phoneme-based end-to-end models.
- Speech recognition performance was evaluated and analyzed using various meaningful Amharic language modeling units, such as phoneme-based recurrent neural network language models (RNNLMs), character-based RNNLMs, and word-based RNNLMs. These language models help explore the effects of context-dependent and context-independent RNNLMs in end-to-end speech recognition models.
- The speech recognition results were compared, and better results were obtained with phoneme-based subwords generated by the BPE segmentation algorithm (a minimal sketch of this segmentation step is given below). These phonemes include the Amharic epenthetic vowel እ [ɨ] inserted by the syllabification algorithm during preprocessing (phoneme mapping) of our dataset.
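The paper does not name a specific toolkit for the BPE segmentation of the phoneme-mapped text, so the following minimal sketch assumes the SentencePiece library; the corpus file name, model prefix, and vocabulary size are illustrative placeholders rather than the authors' settings.

```python
# Minimal sketch of phoneme-level BPE subword generation, assuming SentencePiece.
# File names and vocab_size are illustrative, not the paper's actual settings.
import sentencepiece as spm

# Train a BPE model on the phoneme-mapped corpus (one utterance per line).
spm.SentencePieceTrainer.train(
    input="phoneme_corpus.txt",      # hypothetical phoneme-mapped text corpus
    model_prefix="amharic_phone_bpe",
    vocab_size=500,                  # illustrative subword inventory size
    model_type="bpe",
    character_coverage=1.0,          # keep all Ethiopic symbols
)

# Segment a phoneme-mapped utterance into BPE subword units.
sp = spm.SentencePieceProcessor(model_file="amharic_phone_bpe.model")
print(sp.encode("ልኧምኣ ልኧት ንኧው", out_type=str))
```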
2. Related Work
3. Dataset and Methods
3.1. Dataset and Data Pre-Processing
3.1.1. Dataset
3.1.2. Text Corpus Pre-Processing
Algorithm 1: Amharic Grapheme-to-Phoneme (G2P) Conversion Algorithm
Input: Grapheme-based Amharic text corpus
Output: Phoneme-based Amharic text corpus
1: index = 0; repeat
2: For each indexed file f in the text corpora, normalize the text using the unique phonemes
3: If a grapheme is not a unique phoneme
4: Replace the grapheme G by its CV phoneme representation using the G2P conversion list: grapheme G ← phoneme with CV pattern
5: Else
6: Keep its phoneme representation
7: End if
8: End for
9: index++;
10: until the process has been applied to all files
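A minimal Python sketch of Algorithm 1 follows, assuming a hand-built grapheme-to-phoneme mapping along the lines of Appendix A; the table excerpt, directory layout, and function names are illustrative rather than the authors' implementation.

```python
# Sketch of Algorithm 1: replace each grapheme by its CV phoneme representation.
# G2P_TABLE is an illustrative excerpt, not the full conversion list of Appendix A.
from pathlib import Path

G2P_TABLE = {
    "ለ": "ልኧ",   # lə -> l + ə
    "ሉ": "ልኡ",   # lu -> l + u
    "ነ": "ንኧ",   # nə -> n + ə
    "ል": "ል",     # 6th-order graphemes are already bare consonants (unique phonemes)
}

def grapheme_to_phoneme(text: str) -> str:
    """Keep unique phonemes as they are; expand all other graphemes to CV phonemes."""
    return "".join(G2P_TABLE.get(ch, ch) for ch in text)

def convert_corpus(in_dir: str, out_dir: str) -> None:
    """Apply the G2P conversion to every file of the corpus, as in Algorithm 1."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for f in sorted(Path(in_dir).glob("*.txt")):
        phoneme_text = grapheme_to_phoneme(f.read_text(encoding="utf-8"))
        (out / f.name).write_text(phoneme_text, encoding="utf-8")
```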
Consonant Cluster Pattern | Pattern after ɨ-Epenthesis
---|---
# CC | # CɨC
CCC | CCɨC
C1C1C (C:C) | C1C1ɨC (C:ɨC)
CC1C1 (CC:) | CɨC1C1 (CɨC:)
C1C1C2C2 (C:C:) | C1C1ɨC2C2 (C:ɨC:)
CC# | CɨC#
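To show how the epenthesis patterns above might be realized in code, here is a minimal regex-based sketch over an abstract CV skeleton (C = consonant, ':' marks gemination, # marks a word boundary, ɨ is the epenthetic vowel). The rule set and function name are illustrative; they are not the authors' syllabification implementation.

```python
import re

# Illustrative epenthesis rules over a CV skeleton; order roughly mirrors the table above.
EPENTHESIS_RULES = [
    (r"C:C:", "C:ɨC:"),   # geminate + geminate     -> C:ɨC:
    (r"C:C",  "C:ɨC"),    # geminate + consonant    -> C:ɨC
    (r"CC:",  "CɨC:"),    # consonant + geminate    -> CɨC:
    (r"CCC",  "CCɨC"),    # three-consonant cluster -> CCɨC
    (r"^CC",  "CɨC"),     # word-initial CC         -> CɨC
    (r"CC$",  "CɨC"),     # word-final CC           -> CɨC
]

def apply_epenthesis(skeleton: str) -> str:
    """Insert the epenthetic vowel ɨ into consonant clusters of a CV skeleton."""
    for pattern, replacement in EPENTHESIS_RULES:
        skeleton = re.sub(pattern, replacement, skeleton)
    return skeleton

print(apply_epenthesis("CCVC"))    # word-initial cluster -> CɨCVC
print(apply_epenthesis("CVCC"))    # word-final cluster   -> CVCɨC
print(apply_epenthesis("CVC:CV"))  # geminate + consonant -> CVC:ɨCV
```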
3.2. Methods
3.2.1. CTC Model
3.2.2. Attention-Based Model
3.2.3. CTC-Attention Model
3.3. Our Proposed Speech Recognition System
4. Experiment Parameter Setups and Results
4.1. Parameter Setups and Configuration
4.2. Experiment Results
4.2.1. Character-Based Baseline End-to-End Models
4.2.2. Phoneme-Based End-to-End Models
- እውቅና ን ማግኘቴ ለ እኔ ትልቅ ክብር ነው
- ምን ለማ ለት ነው ግልጽ አድርገው
- ከዚያ በ ተጨማሪ የ ስልጠና ውን ሂደት የሚ ያሻሽል ላቸው ይሻሉ
- እውቅንኣ ን ምኣግኝኧትኤ ልኧ እንኤ ትልቅ ክብር ንኧው
- ም ን ልኧምኣ ልኧት ንኧው ግልጽ ኣድርግኧው
- ክኧዝኢይኣ ብኧ ትኧጭኧምኣርኢ ይኧ ስልጥኧንኣ ውን ህኢድኧት ይኧምኢ ይኣሽኣሽል ልኣችኧው ይ ሽኣልኡ
4.2.3. Subword-Based End-to-End Models
5. Discussion
6. Conclusions and Future Works
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Consonants | 1st Order | 2nd Order | 3rd Order | 4th Order | 5th Order | 6th Order | 7th Order |
---|---|---|---|---|---|---|---|
Vowel | ə | u | i | a | e | ɨ | o
h | ሀ | ሁ | ሂ | ሃ | ሄ | ህ | ሆ |
l | ለ | ሉ | ሊ | ላ | ሌ | ል | ሎ |
m | መ | ሙ | ሚ | ማ | ሜ | ም | ሞ |
s | ሠ | ሡ | ሢ | ሣ | ሤ | ሥ | ሦ |
r | ረ | ሩ | ሪ | ራ | ሬ | ር | ሮ |
. | . | . | . | . | . | . | . |
f | ፈ | ፉ | ፊ | ፋ | ፌ | ፍ | ፎ |
p | ፐ | ፑ | ፒ | ፓ | ፔ | ፕ | ፖ |
Appendix B
Manner of Articulation | Voicing | Labial | Dental | Palatal | Velar | Glottal
---|---|---|---|---|---|---
Stops | voiceless | p ፕ | t ት | | k ክ | ʔ ዕ
Stops | voiced | b ብ | d ድ | | g ግ |
Stops | glottalized | p’ ጵ | t’ ጥ | | q ቅ |
Stops | rounded | | | | kw ኰ, gw ጐ, qw ቈ |
Fricatives | voiceless | f ፍ | s ስ | š ሽ | | h ህ
Fricatives | voiced | v ቭ | z ዝ | ž ዥ | |
Fricatives | glottalized | | s’ ጽ | | |
Fricatives | rounded | | | | hw ኈ |
Affricates | voiceless | | | č ች | |
Affricates | voiced | | | ğ ጅ | |
Affricates | glottalized | | | č’ ጭ | |
Nasals | voiced | m ም | n ን | ň ኝ | |
Liquids | voiceless | | r ር | | |
Liquids | voiced | | l ል | | |
Glides | voiced | w ዉ | | y ይ | |
References
- Claire, W.Y.; Roy, S.; Vincent, T.Y. Syllable based DNN-HMM Cantonese speech-to-text system. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia, 23–28 May 2016. [Google Scholar]
- Novoa, J.; Fredes, J.; Poblete, V.; Yoma, N.B. Uncertainty weighting and propagation in DNN–HMM-based speech recognition. Comput. Speech Lang. 2018, 47, 30–46. [Google Scholar] [CrossRef]
- Hori, T.; Cho, J.; Watanabe, S. End-to-end Speech Recognition with Word-Based RNN Language Models. In Proceedings of the 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 18–21 December 2018; pp. 389–396. [Google Scholar]
- Wu, L.; Li, T.; Wang, L.; Yan, Y. Improving Hybrid CTC/Attention Architecture with Time-Restricted Self-Attention CTC for End-to-End Speech Recognition. Appl. Sci. 2019, 9, 4639. [Google Scholar] [CrossRef] [Green Version]
- Yoshimura, T.; Hayashi, T.; Takeda, K.; Watanabe, S. End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection. In Proceedings of the ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 6999–7003. [Google Scholar]
- Qin, C.-X.; Zhang, W.-L.; Qu, D. A new joint CTC-attention-based speech recognition model with multi-level multi-head attention. EURASIP J. Audio Speech Music. Process. 2019, 2019, 1–12. [Google Scholar] [CrossRef]
- Graves, A.; Jaitly, N. Towards end-to-end speech recognition with recurrent neural networks. In Proceedings of the International Conference on Machine Learning; PMLR: Beijing, China, 2014; pp. 1764–1772. [Google Scholar]
- Graves, A. Sequence transduction with recurrent neural networks. arXiv 2012, arXiv:1211.3711. [Google Scholar]
- Chorowski, J.K.; Bahdanau, D.; Serdyuk, D.; Cho, K.; Bengio, Y. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2015; pp. 577–585. [Google Scholar]
- Kim, S.; Hori, T.; Watanabe, S. Joint CTC-attention based end-to-end speech recognition using multi-task learning. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 4835–4839. [Google Scholar]
- Watanabe, S.; Hori, T.; Hershey, J.R. Language independent end-to-end architecture for joint language identification and speech recognition. In Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan, 16–20 December 2017; pp. 265–271. [Google Scholar]
- Boyer, F.; Rouas, J.-L. End-to-End Speech Recognition: A review for the French Language. arXiv 2019, arXiv:1910.08502. [Google Scholar]
- Das, A.; Li, J.; Zhao, R.; Gong, Y. Advancing Connectionist Temporal Classification with Attention Modeling. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 4769–4773. [Google Scholar]
- Fathima, N.; Patel, T.; C, M.; Iyengar, A. TDNN-based Multilingual Speech Recognition System for Low Resource Indian Languages. In Proceedings of the Interspeech 2018, Hyderabad, India, 2–6 September 2018; pp. 3197–3201. [Google Scholar]
- Le, D.; Provost, E.M. Improving Automatic Recognition of Aphasic Speech with AphasiaBank. In Proceedings of the Interspeech 2016, San Francisco, CA, USA, 8–12 September 2016; pp. 2681–2685. [Google Scholar]
- Li, J.; Ye, G.; Zhao, R.; Droppo, J.; Gong, Y. Acoustic-to-word model without OOV. In Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan, 16–20 December 2017; pp. 111–117. [Google Scholar]
- Sikdar, U.K.; Gambäck, B. Named Entity Recognition for Amharic Using Stack-Based Deep Learning. In International Conference on Computational Linguistics and Intelligent Text Processing; Springer: Cham, Switzerland, 2018; pp. 276–287. [Google Scholar]
- Abate, S.T.; Menzel, W.; Tafila, B. An Amharic speech corpus for large vocabulary continuous speech recognition. In Proceedings of the Ninth European Conference on Speech Communication and Technology, Lisbon, Portugal, 4–8 September 2005. [Google Scholar]
- Melese, M.; Besacier, L.; Meshesha, M. Amharic speech recognition for speech translation. In Proceedings of the Atelier Traitement Automatique des Langues Africaines (TALAF), JEP-TALN 2016, Paris, France, 4 July 2016. [Google Scholar]
- Belay, B.H.; Habtegebrial, T.; Meshesha, M.; Liwicki, M.; Belay, G.; Stricker, D. Amharic OCR: An End-to-End Learning. Appl. Sci. 2020, 10, 1117. [Google Scholar] [CrossRef] [Green Version]
- Gamback, B.; Sikdar, U.K. Named entity recognition for Amharic using deep learning. In Proceedings of the 2017 IST-Africa Week Conference (IST-Africa), Windhoek, Namibia, 30 May–2 June 2017; pp. 1–8. [Google Scholar]
- Tachbelie, M.Y.; Abate, S.T.; Besacier, L. Using different acoustic, lexical and language modeling units for ASR of an under-resourced language–Amharic. Speech Commun. 2014, 56, 181–194. [Google Scholar] [CrossRef]
- Dribssa, A.E.; Tachbelie, M.Y. Investigating the use of syllable acoustic units for Amharic speech recognition. In Proceedings of the AFRICON 2015, Addis Ababa, Ethiopia, 14–17 September 2015; pp. 1–5. [Google Scholar]
- Gebremedhin, Y.B.; Duckhorn, F.; Hoffmann, R.; Kraljevski, I.; Hoffmann, R. A new approach to develop a syllable based, continuous Amharic speech recognizer. In Proceedings of the Eurocon 2013, Zagreb, Croatia, 1–4 July 2013; pp. 1684–1689. [Google Scholar]
- Kim, Y.; Jernite, Y.; Sontag, D.; Rush, A.M. Character-aware neural language models. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
- Inaguma, H.; Mimura, M.; Sakai, S.; Kawahara, T. Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR. In Proceedings of the 2018 IEEE Spoken Language Technology Workshop (SLT), Athens, Greece, 18–21 December 2018; pp. 212–218. [Google Scholar]
- Zeyer, A.; Zhou, W.; Ng, T.; Schlüter, R.; Ney, H. Investigations on Phoneme-Based End-To-End Speech Recognition. arXiv 2020, arXiv:2005.09336. [Google Scholar]
- Wang, W.; Zhou, Y.; Xiong, C.; Socher, R. An investigation of phone-based subword units for end-to-end speech recognition. arXiv 2020, arXiv:2004.04290. [Google Scholar]
- Xiao, Z.; Ou, Z.; Chu, W.; Lin, H. Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units. In Proceedings of the 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), Taipei City, Taiwan, 26–29 November 2018; pp. 146–150. [Google Scholar]
- Yuan, Z.; Lyu, Z.; Li, J.; Zhou, X. An improved hybrid CTC-Attention model for speech recognition. arXiv 2018, arXiv:1810.12020. [Google Scholar]
- Schuster, M.; Nakajima, K. Japanese and Korean voice search. In Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 5149–5152. [Google Scholar]
- Huang, M.; Lu, Y.; Wang, L.; Qian, Y.; Yu, K. Exploring model units and training strategies for end-to-end speech recognition. In Proceedings of the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Singapore, 14–18 December 2019; pp. 524–531. [Google Scholar]
- Das, A.; Li, J.; Ye, G.; Zhao, R.; Gong, Y. Advancing Acoustic-to-Word CTC Model With Attention and Mixed-Units. IEEE/ACM Trans. Audio Speech Lang. Process. 2019, 27, 1880–1892. [Google Scholar] [CrossRef] [Green Version]
- Zhang, F.; Wang, Y.; Zhang, X.; Liu, C.; Saraf, Y.; Zweig, G. Fast, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces. arXiv 2020, arXiv:2005.09150. [Google Scholar]
- Gokay, R.; Yalcin, H. Improving Low Resource Turkish Speech Recognition with Data Augmentation and TTS. In Proceedings of the 2019 16th International Multi-Conference on Systems, Signals & Devices (SSD), Istanbul, Turkey, 21–24 March 2019; pp. 357–360. [Google Scholar]
- Liu, C.; Zhang, Q.; Zhang, X.; Singh, K.; Saraf, Y.; Zweig, G. Multilingual Graphemic Hybrid ASR with Massive Data Augmentation. In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), Marseille, France, 11–12 May 2020; pp. 46–52. [Google Scholar]
- Laptev, A.; Korostik, R.; Svischev, A.; Andrusenko, A.; Medennikov, I.; Rybin, S. You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation. arXiv 2020, arXiv:2005.07157. [Google Scholar]
- Park, D.S.; Chan, W.; Zhang, Y.; Chiu, C.-C.; Zoph, B.; Cubuk, E.D.; Le, Q.V. Specaugment: A simple data augmentation method for automatic speech recognition. arXiv 2019, arXiv:1904.08779. [Google Scholar]
- Hailu, N.; Hailemariam, S. Modeling improved syllabification algorithm for Amharic. In Proceedings of the International Conference on Management of Emergent Digital EcoSystems; Association for Computing Machinery: New York, NY, USA, 2012; pp. 16–21. [Google Scholar]
- Mariam, S.H.; Kishore, S.P.; Black, A.W.; Kumar, R.; Sangal, R. Unit selection voice for Amharic using Festvox. In Proceedings of the Fifth ISCA Workshop on Speech Synthesis, Pittsburgh, PA, USA, 14–16 June 2004. [Google Scholar]
- Hori, T.; Watanabe, S.; Hershey, J.; Barzilay, R.; Kan, M.-Y. Joint CTC/attention decoding for end-to-end speech recognition. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 518–529. [Google Scholar]
- Watanabe, S.; Hori, T.; Kim, S.; Hershey, J.R.; Hayashi, T. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition. IEEE J. Sel. Top. Signal Process. 2017, 11, 1240–1253. [Google Scholar] [CrossRef]
- Graves, A.; Fernández, S.; Gomez, F.; Schmidhuber, J. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning; Association for Computing Machinery: New York, NY, USA, 2006; pp. 369–376. [Google Scholar]
- Li, J.; Ye, G.; Das, A.; Zhao, R.; Gong, Y. Advancing Acoustic-to-Word CTC Model. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 5794–5798. [Google Scholar]
- Moritz, N.; Hori, T.; Le Roux, J. Triggered Attention for End-to-end Speech Recognition. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 5666–5670. [Google Scholar]
- Shan, C.; Zhang, J.; Wang, Y.; Xie, L. Attention-Based End-to-End Speech Recognition on Voice Search. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 1–5. [Google Scholar]
- Schuller, B.; Steidl, S.; Batliner, A.; Marschik, P.B.; Baumeister, H.; Dong, F.; Hantke, S.; Pokorny, F.B.; Rathner, E.-M.; Bartl-Pokorny, K.D.; et al. The INTERSPEECH 2018 computational paralinguistics challenge: Atypical & self-assessed affect, crying & heart beats. In Proceedings of the INTERSPEECH, Hyderabad, India, 2–6 September 2018; Volume 5. [Google Scholar]
- Tjandra, A.; Sakti, S.; Nakamura, S. Attention-based Wav2Text with feature transfer learning. In Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan, 16–20 December 2017; pp. 309–315. [Google Scholar]
- Chan, W.; Jaitly, N.; Le, Q.; Vinyals, O. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 4960–4964. [Google Scholar]
- Chen, M.; He, X.; Yang, J.; Zhang, H. 3-D convolutional recurrent neural networks with attention model for speech emotion recognition. IEEE Signal Process. Lett. 2018, 25, 1440–1444. [Google Scholar] [CrossRef]
- Ueno, S.; Inaguma, H.; Mimura, M.; Kawahara, T. Acoustic-to-word attention-based model complemented with character-level CTC-based model. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 5804–5808. [Google Scholar]
- Tachbelie, M.Y.; Abate, S.T.; Menzel, W. Morpheme-based automatic speech recognition for a morphologically rich language–Amharic. In Proceedings of the Spoken Language Technologies for Under-Resourced Languages, Penang, Malaysia, 3–5 May 2010. [Google Scholar]
- Mittal, P.; Singh, N. Subword analysis of small vocabulary and large vocabulary ASR for Punjabi language. Int. J. Speech Technol. 2020, 23, 71–78. [Google Scholar] [CrossRef]
- Shaik, M.A.B.; Mousa, A.E.-D.; Hahn, S.; Schlüter, R.; Ney, H. Improved strategies for a zero OOV rate LVCSR system. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 19–24 April 2015; pp. 5048–5052. [Google Scholar]
- Xu, H.; Ding, S.; Watanabe, S. Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling. In Proceedings of the ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 7110–7114. [Google Scholar]
- Soltau, H.; Liao, H.; Sak, H. Neural speech recognizer: Acoustic-to-word LSTM model for large vocabulary speech recognition. arXiv 2016, arXiv:1610.09975. [Google Scholar]
- Andrusenko, A.; Laptev, A.; Medennikov, I. Exploration of End-to-End ASR for OpenSTT-Russian Open Speech-to-Text Dataset. arXiv 2020, arXiv:2006.08274. [Google Scholar]
- Sennrich, R.; Haddow, B.; Birch, A. Neural machine translation of rare words with subword units. arXiv 2015, arXiv:1508.07909. [Google Scholar]
- Markovnikov, N.; Kipyatkova, I. Investigating Joint CTC-Attention Models for End-to-End Russian Speech Recognition. In International Conference on Speech and Computer; Springer: Cham, Switzerland, 2019; pp. 337–347. [Google Scholar]
Language Unit | Vocabulary Size | LM | Acoustic Model | CER (%) | WER (%) |
---|---|---|---|---|---|
character | 6.5 k | Word-RNNLM | CTC-attention | 28.09 | 39.30 |
character | 10 k | Word-RNNLM | CTC-attention | 26.91 | 37.60 |
character | 15 k | Word-RNNLM | CTC-attention | 25.60 | 37.01 |
character | 20 k | Word-RNNLM | CTC-attention | 25.21 | 36.80 |
Language Unit | LM | Acoustic Model | CER (%) | WER (%) |
---|---|---|---|---|
characters | Character-RNNLM | CTC-attention | 24.90 | 44.02 |
characters | Character-RNNLM | CTC-attention + SpecAugment | 23.80 | 41.00 |
Language Unit | Vocabulary Size | LM | Acoustic Model | PER (%) | WER (%) |
---|---|---|---|---|---|
phoneme | 6.5 k | Word-RNNLM | CTC-attention | 18.68 | 26.13 |
phoneme | 10 k | Word-RNNLM | CTC-attention | 17.36 | 24.26 |
phoneme | 15 k | Word-RNNLM | CTC-attention | 16.8 | 24.29 |
phoneme | 20 k | Word-RNNLM | CTC-attention | 16.1 | 23.50 |
Language Unit | LM | Acoustic Model | PER (%) | WER (%) |
---|---|---|---|---|
Phoneme | Phoneme-RNNLM | CTC-attention | 15.80 | 36.20 |
Phoneme | Phoneme-RNNLM | CTC-attention + SpecAugment | 14.60 | 34.01 |
Language Unit | Subword Unit | Acoustic Model | C/PER (%) | WER (%)
---|---|---|---|---
Subword | character | CTC-attention | 21.60 | 34.70
Subword | character | CTC-attention + SpecAugment | 16.90 | 31.30
Phoneme-based subword | phoneme | CTC-attention | 15.80 | 22.60
Phoneme-based subword | phoneme | CTC-attention + SpecAugment | 14.60 | 21.40
Proposed phoneme-based subword with epenthesis | phoneme | CTC-attention | 12.61 | 20.30
Proposed phoneme-based subword with epenthesis | phoneme | CTC-attention + SpecAugment | 12.80 | 18.42
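As a point of reference for the CER, PER, and WER columns in the tables above, these error rates follow the standard definition: the Levenshtein (edit) distance between hypothesis and reference, computed over characters, phonemes, or words respectively, divided by the reference length. A minimal sketch of that standard computation (not the paper's scoring tooling) is shown below.

```python
# Standard error-rate computation via Levenshtein distance.
# Pass word lists for WER, character lists for CER, phoneme lists for PER.
def edit_distance(ref, hyp):
    """Dynamic-programming Levenshtein distance between two token sequences."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(row[j] + 1,              # deletion
                      row[j - 1] + 1,          # insertion
                      prev_diag + (r != h))    # substitution or match
            prev_diag, row[j] = row[j], cur
    return row[len(hyp)]

def error_rate(reference_tokens, hypothesis_tokens):
    """Edit distance normalized by the reference length."""
    return edit_distance(reference_tokens, hypothesis_tokens) / max(len(reference_tokens), 1)

# Example: one deleted word out of four reference words -> WER = 0.25
print(error_rate("ምን ለማ ለት ነው".split(), "ምን ለማ ነው".split()))
```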