Editorial

Special Issue on Automatic Speech Recognition

School of Electronic and Information Engineering, Beihang University, Beijing 100191, China
Appl. Sci. 2023, 13(9), 5389; https://doi.org/10.3390/app13095389
Submission received: 4 April 2023 / Revised: 25 April 2023 / Accepted: 25 April 2023 / Published: 26 April 2023
(This article belongs to the Special Issue Automatic Speech Recognition)
With the rapid development of artificial intelligence and deep learning, automatic speech recognition (ASR) technology is experiencing renewed vitality. However, technological barriers to flexible solutions and user satisfaction remain in this field, owing to several factors such as sensitivity to the acoustic environment (background noise) and the weak representation of grammatical and semantic knowledge. Many factors also affect speech production itself: regional, sociolinguistic, environmental, and personal. These create a wide range of speech variations that may not be correctly recognized and modeled.
This Special Issue aimed to collect and present breakthrough research on speech segmentation and phoneme detection, speech recognition in noisy conditions, speech translation, classification of emotions in speech, and multimodal speech recognition with video or physiological signals.
A total of nine papers (seven research papers and two review papers) covering different areas of ASR, including speech recognition for different languages, speech enhancement, and speech emotion recognition, are presented in this Special Issue. Bhardwaj et al. [1] reported that children’s speech recognition is a challenging task because of the large variations in the articulatory, acoustic, physical, and linguistic characteristics of children’s speech compared with adult speech. Dhouib et al. [2] systematically reviewed existing studies on Arabic ASR. Ali et al. [3] proposed a precise speech recognition system to overcome the issues of accents and local differences that degrade an ASR system’s performance when analyzing speech signals. Kłosowski [4] presented a rule-based grapheme-to-phoneme conversion method and algorithm for Polish (a simplified sketch of the longest-match idea follows this paragraph). Song et al. [5] proposed a fully convolutional neural network based on recursive recurrent convolution for monaural speech enhancement in the time domain. Chen et al. [6] proposed a model that uses only the fundamental frequency of electroglottograph (EGG) signals and cross-modal emotion distillation (CMED) to train an EGG-based speech emotion recognition model. Hossain et al. [7] created a dataset containing 30 h of Bangla speech covering seven regional Bangla dialects, with the goal of detecting synthesized Bangla speech and categorizing it. Dua et al. [8] developed a speech-to-text recognition system that uses a convolutional neural network (CNN) to recognize the tonal speech signals of Gurbani hymns. Chen et al. [9] proposed a method that extracts features from the EGG of a target speaker and separates that speaker from mixtures of different speakers in a noisy environment without requiring clean speech.
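As a rough illustration of the general idea behind rule-based grapheme-to-phoneme (G2P) conversion mentioned above, the following minimal Python sketch applies greedy longest-match rewrite rules to an orthographic word. The handful of Polish digraph rules shown here is a simplified, hypothetical subset chosen only for demonstration; it is not the rule set or algorithm of Kłosowski [4].

```python
# Toy longest-match, rule-based grapheme-to-phoneme (G2P) converter.
# Illustrative only: the rules below are a tiny, simplified subset of
# Polish orthography, NOT the rule set from Klosowski [4].

RULES = {
    "sz": "ʂ",    # digraph "sz" -> retroflex fricative
    "cz": "tʂ",   # digraph "cz" -> retroflex affricate
    "rz": "ʐ",    # digraph "rz" (context-dependent devoicing omitted)
    "ch": "x",    # digraph "ch" -> velar fricative
    "w": "v",
    "ł": "w",
    "ą": "ɔ̃",     # nasal vowel
    "ę": "ɛ̃",     # nasal vowel
}

def g2p(word: str) -> list[str]:
    """Convert a lowercase word to phonemes by greedy longest-match rules."""
    phonemes, i = [], 0
    max_len = max(len(g) for g in RULES)          # longest grapheme in the table
    while i < len(word):
        for length in range(max_len, 0, -1):      # try longest graphemes first
            grapheme = word[i:i + length]
            if grapheme in RULES:
                phonemes.append(RULES[grapheme])
                i += length
                break
        else:                                      # no rule matched: pass letter through
            phonemes.append(word[i])
            i += 1
    return phonemes

if __name__ == "__main__":
    print(g2p("szczecin"))  # ['ʂ', 'tʂ', 'e', 'c', 'i', 'n']
```

A real rule-based G2P system additionally handles context-sensitive rules (e.g., voicing assimilation) and exception dictionaries, which are omitted in this sketch.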
Although submissions for this Special Issue are now closed, in-depth research in the field of automatic speech recognition will continue to address the challenges we still face today, such as multilingual automatic translation, automatic emotion recognition, and multimodal human–computer interaction.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Bhardwaj, V.; Ben Othman, M.T.; Kukreja, V.; Belkhier, Y.; Bajaj, M.; Goud, B.S.; Rehman, A.U.; Shafiq, M.; Hamam, H. Automatic Speech Recognition (ASR) Systems for Children: A Systematic Literature Review. Appl. Sci. 2022, 12, 4419.
  2. Dhouib, A.; Othman, A.; El Ghoul, O.; Khribi, M.K.; Al Sinani, A. Arabic Automatic Speech Recognition: A Systematic Literature Review. Appl. Sci. 2022, 12, 8898.
  3. Ali, M.H.; Jaber, M.M.; Abd, S.K.; Rehman, A.; Awan, M.J.; Vitkutė-Adžgauskienė, D.; Damaševičius, R.; Bahaj, S.A. Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System. Appl. Sci. 2022, 12, 1091.
  4. Kłosowski, P. A Rule-Based Grapheme-to-Phoneme Conversion System. Appl. Sci. 2022, 12, 2758.
  5. Song, Z.; Ma, Y.; Tan, F.; Feng, X. Hybrid Dilated and Recursive Recurrent Convolution Network for Time-Domain Speech Enhancement. Appl. Sci. 2022, 12, 3461.
  6. Chen, L.; Ren, J.; Mao, X.; Zhao, Q. Electroglottograph-Based Speech Emotion Recognition via Cross-Modal Distillation. Appl. Sci. 2022, 12, 4338.
  7. Hossain, P.S.; Chakrabarty, A.; Kim, K.; Piran, M.J. Multi-Label Extreme Learning Machine (MLELMs) for Bangla Regional Speech Recognition. Appl. Sci. 2022, 12, 5463.
  8. Dua, S.; Kumar, S.S.; Albagory, Y.; Ramalingam, R.; Dumka, A.; Singh, R.; Rashid, M.; Gehlot, A.; Alshamrani, S.S.; AlGhamdi, A.S. Developing a Speech Recognition System for Recognizing Tonal Speech Signals Using a Convolutional Neural Network. Appl. Sci. 2022, 12, 6223.
  9. Chen, L.; Mo, Z.; Ren, J.; Cui, C.; Zhao, Q. An Electroglottograph Auxiliary Neural Network for Target Speaker Extraction. Appl. Sci. 2023, 13, 469.
