
1,305 Results Found

  • Review
  • Open Access
25 Citations
9,110 Views
23 Pages

16 October 2023

Dysarthric speech has several pathological characteristics, such as discontinuous pronunciation, uncontrolled volume, slow speech, explosive pronunciation, improper pauses, excessive nasal sounds, and air-flow noise during pronunciation, which differ...

  • Article
  • Open Access
55 Citations
23,652 Views
17 Pages

KsponSpeech: Korean Spontaneous Speech Corpus for Automatic Speech Recognition

  • Jeong-Uk Bang,
  • Seung Yun,
  • Seung-Hi Kim,
  • Mu-Yeol Choi,
  • Min-Kyu Lee,
  • Yeo-Jeong Kim,
  • Dong-Hyun Kim,
  • Jun Park,
  • Young-Jik Lee and
  • Sang-Hun Kim

3 October 2020

This paper introduces a large-scale spontaneous speech corpus of Korean, named KsponSpeech. This corpus contains 969 h of general open-domain dialog utterances, spoken by about 2000 native Korean speakers in a clean environment. All data were constru...

  • Article
  • Open Access
24 Citations
8,891 Views
18 Pages

Multilingual Speech Recognition for Turkic Languages

  • Saida Mussakhojayeva,
  • Kaisar Dauletbek,
  • Rustem Yeshpanov and
  • Huseyin Atakan Varol

28 January 2023

The primary aim of this study was to contribute to the development of multilingual automatic speech recognition for lower-resourced Turkic languages. Ten languages—Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Uyghur, an...

  • Article
  • Open Access
12 Citations
7,436 Views
16 Pages

Research on Robust Audio-Visual Speech Recognition Algorithms

  • Wenfeng Yang,
  • Pengyi Li,
  • Wei Yang,
  • Yuxing Liu,
  • Yulong He,
  • Ovanes Petrosian and
  • Aleksandr Davydenko

5 April 2023

Automatic speech recognition (ASR) that relies on audio input suffers from significant degradation in noisy conditions and is particularly vulnerable to speech interference. However, video recordings of speech capture both visual and audio signals, p...

  • Communication
  • Open Access
42 Citations
12,452 Views
11 Pages

24 August 2022

The study of understanding sentiment and emotion in speech is a challenging task in human multimodal language. However, in certain cases, such as telephone calls, only audio data can be obtained. In this study, we independently evaluated sentiment an...

  • Proceeding Paper
  • Open Access
1 Citation
1,850 Views
6 Pages

Design of the Speech Emotion Recognition Model

  • Hanping Ke,
  • Feng Luo and
  • Manyin Shi

Existing emotional feature methods represent only limited information about the emotional state and do not mine or exploit the correlations between emotional features. Therefore, a new design scheme is proposed based on the psychological...

  • Article
  • Open Access
3,042 Views
13 Pages

Evaluation of Speech Quality Through Recognition and Classification of Phonemes

  • Svetlana Pekarskikh,
  • Evgeny Kostyuchenko and
  • Lidiya Balatskaya

25 November 2019

This paper discusses an approach for assessing speech quality in patients undergoing speech rehabilitation. One of the main reasons for the decrease in speech quality during the surgical treatment of vocal tract diseases is the loss of the vocal tract's pa...

  • Article
  • Open Access
11 Citations
6,879 Views
28 Pages

11 March 2024

Children’s Speech Recognition (CSR) is a challenging task due to the high variability in children’s speech patterns and limited amount of available annotated children’s speech data. We aim to improve CSR in the often-occurring scena...

  • Article
  • Open Access
34 Citations
5,654 Views
14 Pages

9 August 2022

Data augmentation techniques have recently gained more adoption in speech processing, including speech emotion recognition. Although more data tend to be more effective, there may be a trade-off in which more data will not provide a better model. Thi...

  • Article
  • Open Access
16 Citations
31,238 Views
19 Pages

17 February 2020

To build automatic speech recognition (ASR) systems with a low word error rate (WER), a large speech and text corpus is needed. Corpus preparation is the first step required for developing an ASR system for a language with few argument speech documen...

  • Article
  • Open Access
81 Citations
7,542 Views
13 Pages

Developing a Speech Recognition System for Recognizing Tonal Speech Signals Using a Convolutional Neural Network

  • Sakshi Dua,
  • Sethuraman Sambath Kumar,
  • Yasser Albagory,
  • Rajakumar Ramalingam,
  • Ankur Dumka,
  • Rajesh Singh,
  • Mamoon Rashid,
  • Anita Gehlot,
  • Sultan S. Alshamrani and
  • Ahmed Saeed AlGhamdi

19 June 2022

Deep learning-based machine learning models have shown significant results in speech recognition and numerous vision-related tasks. The performance of the present speech-to-text model relies upon the hyperparameters used in this research work. In thi...

  • Review
  • Open Access
232 Citations
28,886 Views
26 Pages

An Overview of End-to-End Automatic Speech Recognition

  • Dong Wang,
  • Xiaodong Wang and
  • Shaohe Lv

7 August 2019

Automatic speech recognition, especially large vocabulary continuous speech recognition, is an important issue in the field of machine learning. For a long time, the hidden Markov model (HMM)-Gaussian mixture model (GMM) has been the mainstream speech...

  • Review
  • Open Access
13 Citations
5,596 Views
47 Pages

Frontier Research on Low-Resource Speech Recognition Technology

  • Wushour Slam,
  • Yanan Li and
  • Nurmamet Urouvas

10 November 2023

With the development of continuous speech recognition technology, users have put forward higher requirements in terms of speech recognition accuracy. Low-resource speech recognition, as a typical speech recognition technology under restricted conditi...

  • Review
  • Open Access
100 Citations
14,458 Views
26 Pages

Automatic Speech Recognition (ASR) Systems for Children: A Systematic Literature Review

  • Vivek Bhardwaj,
  • Mohamed Tahar Ben Othman,
  • Vinay Kukreja,
  • Youcef Belkhier,
  • Mohit Bajaj,
  • B. Srikanth Goud,
  • Ateeq Ur Rehman,
  • Muhammad Shafiq and
  • Habib Hamam

27 April 2022

Automatic speech recognition (ASR) is one of the ways used to transform acoustic speech signals into text. Over the last few decades, an enormous amount of research work has been done in the research area of speech recognition (SR). However, most stu...

  • Article
  • Open Access
15 Citations
9,817 Views
11 Pages

Robust Cochlear-Model-Based Speech Recognition

  • Mladen Russo,
  • Maja Stella,
  • Marjan Sikora and
  • Vesna Pekić

Accurate speech recognition can provide a natural interface for human–computer interaction. Recognition rates of the modern speech recognition systems are highly dependent on background noise levels and a choice of acoustic feature extraction m...

  • Article
  • Open Access
10 Citations
5,009 Views
27 Pages

12 October 2022

Speech is a commonly used interaction-recognition technique in edutainment-based systems and is a key technology for smooth educational learning and user–system interaction. However, its application to real environments is limited owing to the...

  • Review
  • Open Access
259 Citations
30,904 Views
27 Pages

Deep Learning Techniques for Speech Emotion Recognition, from Databases to Models

  • Babak Joze Abbaschian,
  • Daniel Sierra-Sosa and
  • Adel Elmaghraby

10 February 2021

The advancements in neural networks and the on-demand need for accurate and near real-time Speech Emotion Recognition (SER) in human–computer interactions make it mandatory to compare available methods and databases in SER to achieve feasible solutio...

  • Article
  • Open Access
1 Citation
4,397 Views
29 Pages

Speech Recognition and Synthesis Models and Platforms for the Kazakh Language

  • Aidana Karibayeva,
  • Vladislav Karyukin,
  • Balzhan Abduali and
  • Dina Amirova

10 October 2025

With the rapid development of artificial intelligence and machine learning technologies, automatic speech recognition (ASR) and text-to-speech (TTS) have become key components of the digital transformation of society. The Kazakh language, as a repres...

  • Article
  • Open Access
937 Views
18 Pages

CAs-Net: A Channel-Aware Speech Network for Uyghur Speech Recognition

  • Jiang Zhang,
  • Miaomiao Xu,
  • Lianghui Xu and
  • Yajing Ma

17 June 2025

This paper proposes a Channel-Aware Speech Network (CAs-Net) for low-resource speech recognition tasks, aiming to improve recognition performance for languages such as Uyghur under complex noisy conditions. The proposed model consists of two key comp...

  • Review
  • Open Access
41 Citations
14,779 Views
22 Pages

Arabic Automatic Speech Recognition: A Systematic Literature Review

  • Amira Dhouib,
  • Achraf Othman,
  • Oussama El Ghoul,
  • Mohamed Koutheair Khribi and
  • Aisha Al Sinani

5 September 2022

Automatic Speech Recognition (ASR), also known as Speech-To-Text (STT) or computer speech recognition, has been an active field of research recently. This study aims to chart this field by performing a Systematic Literature Review (SLR) to give insig...

  • Article
  • Open Access
21 Citations
5,656 Views
17 Pages

Emotional Speech Recognition Method Based on Word Transcription

  • Gulmira Bekmanova,
  • Banu Yergesh,
  • Altynbek Sharipbay and
  • Assel Mukanova

2 March 2022

The emotional speech recognition method presented in this article was applied to recognize the emotions of students during online exams in distance learning due to COVID-19. The purpose of this method is to recognize emotions in spoken speech through...

  • Article
  • Open Access
10 Citations
4,925 Views
17 Pages

21 September 2019

This article presents a novel method for emotion recognition from speech based on a committee of classifiers. Different classification methods were juxtaposed in order to compare several alternative approaches for final voting. The research is conduc...

  • Article
  • Open Access
96 Citations
10,734 Views
29 Pages

17 February 2023

Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise. Additional visual information can be used for both automatic lip-reading and gesture recogni...

  • Article
  • Open Access
43 Citations
5,215 Views
20 Pages

26 December 2019

Speech emotion recognition is a challenging and widely examined research topic in the field of speech processing. The accuracy of existing models in speech emotion recognition tasks is not high, and the generalization ability is not strong. Since the...

  • Article
  • Open Access
40 Citations
6,822 Views
19 Pages

7 May 2019

Since conventional Automatic Speech Recognition (ASR) systems often contain many modules and require a variety of expertise, it is hard to build and train such models. Recent research shows that end-to-end ASR systems can significantly simplify the speech recogn...

  • Article
  • Open Access
3,605 Views
17 Pages

Speech recognition approaches typically fall into three categories: audio, visual, and audio–visual. Visual speech recognition, or lip reading, is the most difficult because visual cues are ambiguous and data is scarce. To address these challen...

  • Review
  • Open Access
22 Citations
11,761 Views
18 Pages

Code-Switching in Automatic Speech Recognition: The Issues and Future Directions

  • Mumtaz Begum Mustafa,
  • Mansoor Ali Yusoof,
  • Hasan Kahtan Khalaf,
  • Ahmad Abdel Rahman Mahmoud Abushariah,
  • Miss Laiha Mat Kiah,
  • Hua Nong Ting and
  • Saravanan Muthaiyah

23 September 2022

Code-switching (CS) in spoken language is where the speech has two or more languages within an utterance. It is an unsolved issue in automatic speech recognition (ASR) research as ASR needs to recognise speech in bilingual and multilingual settings,...

  • Review
  • Open Access
17 Citations
6,807 Views
21 Pages

19 February 2023

Superficially, read and spontaneous speech—the two main kinds of training data for automatic speech recognition—appear as complementary, but are equal: pairs of texts and acoustic signals. Yet, spontaneous speech is typically harder for r...

  • Article
  • Open Access
49 Citations
9,671 Views
19 Pages

Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data

  • Ayesha Pervaiz,
  • Fawad Hussain,
  • Huma Israr,
  • Muhammad Ali Tahir,
  • Fawad Riasat Raja,
  • Naveed Khan Baloch,
  • Farruh Ishmanov and
  • Yousaf Bin Zikria

19 April 2020

The advent of new devices, technology, and machine learning techniques, together with the availability of large free speech corpora, has resulted in rapid and accurate speech recognition. In the last two decades, extensive research has been initiated by researchers and...

  • Article
  • Open Access
11 Citations
10,466 Views
32 Pages

Combined Hand Gesture — Speech Model for Human Action Recognition

  • Sheng-Tzong Cheng,
  • Chih-Wei Hsu and
  • Jian-Pan Li

12 December 2013

This study proposes a dynamic hand gesture detection technology to effectively detect dynamic hand gesture areas, and a hand gesture recognition technology to improve the dynamic hand gesture recognition rate. Meanwhile, the corresponding relationshi...

  • Article
  • Open Access
3 Citations
4,875 Views
12 Pages

Adapting Off-the-Shelf Speech Recognition Systems for Novel Words

  • Wiam Fadel,
  • Toumi Bouchentouf,
  • Pierre-André Buvet and
  • Omar Bourja

13 March 2023

Current speech recognition systems with fixed vocabularies have difficulties recognizing Out-of-Vocabulary words (OOVs) such as proper nouns and new words. This leads to misunderstandings or even failures in dialog systems. Ensuring effective speech...

  • Article
  • Open Access
11 Citations
3,208 Views
16 Pages

Speech GAU: A Single Head Attention for Mandarin Speech Recognition for Air Traffic Control

  • Shiyu Zhang,
  • Jianguo Kong,
  • Chao Chen,
  • Yabin Li and
  • Haijun Liang

The rise of end-to-end (E2E) speech recognition technology in recent years has overturned the design pattern of cascading multiple subtasks in classical speech recognition and achieved direct mapping of speech input signals to text labels. In this st...

  • Article
  • Open Access
35 Citations
5,902 Views
12 Pages

19 September 2019

This work presents a new approach to speech recognition, based on the specific coding of time and frequency characteristics of speech. The research proposed the use of convolutional neural networks because, as we know, they show high resistance to cr...

  • Article
  • Open Access
20 Citations
10,090 Views
17 Pages

Hierarchical Phoneme Classification for Improved Speech Recognition

  • Donghoon Oh,
  • Jeong-Sik Park,
  • Ji-Hwan Kim and
  • Gil-Jin Jang

4 January 2021

Speech recognition consists of converting input sound into a sequence of phonemes, then finding text for the input using language models. Therefore, phoneme classification performance is a critical factor for the successful implementation of a speech...

  • Article
  • Open Access
36 Citations
5,057 Views
18 Pages

Harris Hawks Sparse Auto-Encoder Networks for Automatic Speech Recognition System

  • Mohammed Hasan Ali,
  • Mustafa Musa Jaber,
  • Sura Khalil Abd,
  • Amjad Rehman,
  • Mazhar Javed Awan,
  • Daiva Vitkutė-Adžgauskienė,
  • Robertas Damaševičius and
  • Saeed Ali Bahaj

21 January 2022

Automatic speech recognition (ASR) is an effective technique that can convert human speech into text format or computer actions. ASR systems are widely used in smart appliances, smart homes, and biometric systems. Signal processing and machine learni...

  • Article
  • Open Access
4 Citations
4,451 Views
16 Pages

There is a large interest in the annotation of speech addressed to infants. Infant-directed speech (IDS) has acoustic properties that might pose a challenge to automatic speech recognition (ASR) tools developed for adult-directed speech (ADS). While...

  • Article
  • Open Access
5 Citations
4,648 Views
20 Pages

Edge Container for Speech Recognition

  • Lukáš Beňo,
  • Rudolf Pribiš and
  • Peter Drahoš

4 October 2021

Containerization has been mainly used in pure software solutions, but it is gradually finding its way into the industrial systems. This paper introduces the edge container with artificial intelligence for speech recognition, which performs the voice...

  • Article
  • Open Access
23 Citations
4,626 Views
14 Pages

10 September 2021

The performance of automatic speech recognition (ASR) may be degraded when accented speech is recognized because the speech has some linguistic differences from standard speech. Conventional accented speech recognition studies have utilized the accen...

  • Article
  • Open Access
3 Citations
2,022 Views
9 Pages

Speech Audiometry: The Development of Lithuanian Bisyllabic Phonemically Balanced Word Lists for Evaluation of Speech Recognition

  • Vija Vainutienė,
  • Justinas Ivaška,
  • Vytautas Kardelis,
  • Tatjana Ivaškienė and
  • Eugenijus Lesinskas

29 March 2024

Background and Objectives: Speech audiometry employs standardized materials, typically in the language spoken by the target population. Language-specific nuances, including phonological features, influence speech perception and recognition. The mater...

  • Article
  • Open Access
5 Citations
3,303 Views
14 Pages

Grammar-Supervised End-to-End Speech Recognition with Part-of-Speech Tagging and Dependency Parsing

  • Genshun Wan,
  • Tingzhi Mao,
  • Jingxuan Zhang,
  • Hang Chen,
  • Jianqing Gao and
  • Zhongfu Ye

27 March 2023

For most automatic speech recognition systems, many unacceptable hypothesis errors still make the recognition results absurd and difficult to understand. In this paper, we introduce grammar information to improve the performance of the grammatica...

  • Communication
  • Open Access
13 Citations
4,095 Views
14 Pages

A New Network Structure for Speech Emotion Recognition Research

  • Chunsheng Xu,
  • Yunqing Liu,
  • Wenjun Song,
  • Zonglin Liang and
  • Xing Chen

22 February 2024

Deep learning has driven breakthroughs in emotion recognition in many fields, especially speech emotion recognition (SER). As an important part of speech emotion recognition, the extraction of the most relevant acoustic features has always attracted the att...

  • Article
  • Open Access
14 Citations
2,819 Views
11 Pages

20 October 2022

Speech emotion recognition is an important part of human–computer interaction, and the use of computers to analyze emotions and extract speech emotion features that can achieve high recognition rates is an important step. We applied the Fractio...

  • Article
  • Open Access
22 Citations
7,135 Views
17 Pages

Development of Speech Recognition Systems in Emergency Call Centers

  • Alakbar Valizada,
  • Natavan Akhundova and
  • Samir Rustamov

9 April 2021

In this paper, various methodologies of acoustic and language models, as well as labeling methods for automatic speech recognition for spoken dialogues in emergency call centers were investigated and comparatively analyzed. Because of the fact that d...

  • Article
  • Open Access
10 Citations
3,539 Views
13 Pages

12 January 2023

Building a good speech recognition system usually requires a large amount of paired data, which poses a big challenge for low-resource languages such as Kazakh. In recent years, unsupervised pre-training has achieved good performance in low-resource speech...

  • Article
  • Open Access
3 Citations
4,072 Views
9 Pages

An Effective Learning Method for Automatic Speech Recognition in Korean CI Patients’ Speech

  • Jiho Jeong,
  • S. I. M. M. Raton Mondol,
  • Yeon Wook Kim and
  • Sangmin Lee

The automatic speech recognition (ASR) model usually requires a large amount of training data to provide better results compared with the ASR models trained with a small amount of training data. It is difficult to apply the ASR model to non-standard...

  • Article
  • Open Access
2 Citations
1,868 Views
11 Pages

Building a Speech Dataset and Recognition Model for the Minority Tu Language

  • Shasha Kong,
  • Chunmei Li,
  • Chengwu Fang and
  • Peng Yang

4 August 2024

Speech recognition technology has many applications in our daily life. However, for many low-resource languages without written forms, acquiring sufficient training data remains a significant challenge for building accurate ASR models. The Tu languag...

  • Article
  • Open Access
8 Citations
6,062 Views
20 Pages

Whispered Speech Recognition Based on Audio Data Augmentation and Inverse Filtering

  • Jovan Galić,
  • Branko Marković,
  • Đorđe Grozdić,
  • Branislav Popović and
  • Slavko Šajić

12 September 2024

Modern Automatic Speech Recognition (ASR) systems are primarily designed to recognize normal speech. Due to a considerable acoustic mismatch between normal speech and whisper, ASR systems suffer from a significant loss of performance in whisper recog...

  • Article
  • Open Access
2 Citations
1,115 Views
15 Pages

With the development of the marine economy and the increase in marine activities, deep saturation diving has gained significant attention. Helium speech communication is indispensable for saturation diving operations and is a critical technology for...

  • Article
  • Open Access
344 Views
16 Pages

Modern Speech Recognition for Romanian Language

  • Remus-Dan Ungureanu and
  • Mihai Dascalu

14 February 2026

Despite having approximately 24 million native speakers, Romanian remains a low-resource language for automatic speech recognition (ASR), with few accurate and publicly available systems. To address this gap, this study explores the challenges of ada...

  • Article
  • Open Access
15 Citations
2,366 Views
9 Pages

Speech emotion recognition is an emerging research field in the 21st century, which is of great significance to human–computer interaction. In order to enable various smart devices to better recognize and understand the emotions contained in hu...
