15 Results Found

  • Article
  • Open Access
65 Citations
11,370 Views
12 Pages

17 April 2019

With the improvement of computer performance, virtual reality (VR), as a new method of visual operation and interaction, gives automatic lip-reading technology based on visual features broad development prospects. In an immersive VR environment...

  • Article
  • Open Access
4 Citations
3,682 Views
19 Pages

Variation in lighting conditions is a major cause of performance degradation in pattern recognition when using optical imaging. In this study, infrared (IR) and depth images were considered as possible robust alternatives against variations in illumi...

  • Article
  • Open Access
7 Citations
2,366 Views
19 Pages

12 February 2023

Endangered language generally has low-resource characteristics, as an immaterial cultural resource that cannot be renewed. Automatic speech recognition (ASR) is an effective means to protect this language. However, for low-resource language, native s...

  • Article
  • Open Access
3 Citations
3,404 Views
27 Pages

EMOLIPS: Towards Reliable Emotional Speech Lip-Reading

  • Dmitry Ryumin,
  • Elena Ryumina and
  • Denis Ivanko

27 November 2023

In this article, we present a novel approach for emotional speech lip-reading (EMOLIPS). This two-level approach to emotional speech to text recognition based on visual data processing is motivated by human perception and the recent developments in m...

  • Article
  • Open Access
6 Citations
6,407 Views
12 Pages

11 December 2019

Virtual Reality (VR) is a kind of interactive experience technology. Human vision, hearing, expression, voice and even touch can be added to the interaction between humans and machines. Lip reading recognition is a new technology in the field of human...

  • Article
  • Open Access
11 Citations
3,488 Views
12 Pages

Recently, automatic speech recognition (ASR) and visual speech recognition (VSR) have been widely researched owing to the development in deep learning. Most VSR research works focus only on frontal face images. However, assuming real scenes, it is ob...

  • Article
  • Open Access
11 Citations
6,183 Views
21 Pages

12 April 2022

Deep learning technology has encouraged research on noise-robust automatic speech recognition (ASR). The combination of cloud computing technologies and artificial intelligence has significantly improved the performance of open cloud-based speech rec...

  • Article
  • Open Access
3,621 Views
17 Pages

Speech recognition approaches typically fall into three categories: audio, visual, and audio–visual. Visual speech recognition, or lip reading, is the most difficult because visual cues are ambiguous and data is scarce. To address these challen...

  • Article
  • Open Access
5 Citations
3,678 Views
15 Pages

Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language

  • Yorghos Voutos,
  • Georgios Drakopoulos,
  • Georgios Chrysovitsiotis,
  • Zoi Zachou,
  • Dimitris Kikidis,
  • Efthymios Kyrodimos and
  • Themis Exarchos

28 February 2022

Voice loss constitutes a crucial disorder that is highly associated with social isolation. The use of multimodal information sources, such as audiovisual information, is crucial since it can lead to the development of straightforward personalized w...

  • Article
  • Open Access
96 Citations
10,811 Views
29 Pages

17 February 2023

Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise. Additional visual information can be used for both automatic lip-reading and gesture recogni...

  • Article
  • Open Access
12 Citations
7,491 Views
16 Pages

Research on Robust Audio-Visual Speech Recognition Algorithms

  • Wenfeng Yang,
  • Pengyi Li,
  • Wei Yang,
  • Yuxing Liu,
  • Yulong He,
  • Ovanes Petrosian and
  • Aleksandr Davydenko

5 April 2023

Automatic speech recognition (ASR) that relies on audio input suffers from significant degradation in noisy conditions and is particularly vulnerable to speech interference. However, video recordings of speech capture both visual and audio signals, p...

  • Article
  • Open Access
11 Citations
3,901 Views
28 Pages

An Effective Conversion of Visemes to Words for High-Performance Automatic Lipreading

  • Souheil Fenghour,
  • Daqing Chen,
  • Kun Guo,
  • Bo Li and
  • Perry Xiao

26 November 2021

As an alternative approach, viseme-based lipreading systems have demonstrated promising performance results in decoding videos of people uttering entire sentences. However, the overall performance of such systems has been significantly affected by th...

  • Article
  • Open Access
33 Citations
9,103 Views
20 Pages

23 December 2021

In visual speech recognition (VSR), speech is transcribed using only visual information to interpret tongue and teeth movements. Recently, deep learning has shown outstanding performance in VSR, with accuracy exceeding that of lipreaders on benchmark...

  • Article
  • Open Access
16 Citations
6,493 Views
12 Pages

7 February 2023

This paper investigates multimodal sensor architectures with deep learning for audio-visual speech recognition, focusing on in-the-wild scenarios. The term “in the wild” is used to describe AVSR for unconstrained natural-language audio st...

  • Article
  • Open Access
27 Citations
5,987 Views
15 Pages

Improving the Accuracy of Automatic Facial Expression Recognition in Speaking Subjects with Deep Learning

  • Sathya Bursic,
  • Giuseppe Boccignone,
  • Alfio Ferrara,
  • Alessandro D’Amelio and
  • Raffaella Lanzarotti

9 June 2020

When automatic facial expression recognition is applied to video sequences of speaking subjects, the recognition accuracy has been noted to be lower than with video sequences of still subjects. This effect, known as the speaking effect, arises during s...