15 Results Found

Article

65 Citations

11,370 Views

12 Pages

Automatic Lip-Reading System Based on Deep Convolutional Neural Network and Attention-Based Long Short-Term Memory

Yuanyao Lu and Hongbo Li

Appl. Sci. 2019, 9(8), 1599; https://doi.org/10.3390/app9081599

17 April 2019

With the improvement of computer performance, virtual reality (VR) as a new way of visual operation and interaction method gives the automatic lip-reading technology based on visual features broad development prospects. In an immersive VR environment...

Article

4 Citations

3,682 Views

19 Pages

Improving the Performance of Automatic Lip-Reading Using Image Conversion Techniques

Ki-Seung Lee

Electronics 2024, 13(6), 1032; https://doi.org/10.3390/electronics13061032

9 March 2024

Variation in lighting conditions is a major cause of performance degradation in pattern recognition when using optical imaging. In this study, infrared (IR) and depth images were considered as possible robust alternatives against variations in illumi...

Article

7 Citations

2,366 Views

19 Pages

Improvement of Acoustic Models Fused with Lip Visual Information for Low-Resource Speech

Chongchong Yu, Jiaqi Yu, Zhaopeng Qian and Yuchen Tan

Sensors 2023, 23(4), 2071; https://doi.org/10.3390/s23042071

12 February 2023

Endangered language generally has low-resource characteristics, as an immaterial cultural resource that cannot be renewed. Automatic speech recognition (ASR) is an effective means to protect this language. However, for low-resource language, native s...

Article

3 Citations

3,404 Views

27 Pages

EMOLIPS: Towards Reliable Emotional Speech Lip-Reading

Dmitry Ryumin, Elena Ryumina and Denis Ivanko

Mathematics 2023, 11(23), 4787; https://doi.org/10.3390/math11234787

27 November 2023

In this article, we present a novel approach for emotional speech lip-reading (EMOLIPS). This two-level approach to emotional speech to text recognition based on visual data processing is motivated by human perception and the recent developments in m...

Article

6 Citations

6,407 Views

12 Pages

Automatic Lip Reading System Based on a Fusion Lightweight Neural Network with Raspberry Pi

Jing Wen and Yuanyao Lu

Appl. Sci. 2019, 9(24), 5432; https://doi.org/10.3390/app9245432

11 December 2019

Virtual Reality (VR) is a kind of interactive experience technology. Human vision, hearing, expression, voice and even touch can be added to the interaction between humans and machine. Lip reading recognition is a new technology in the field of human...

Article

11 Citations

3,488 Views

12 Pages

Multi-Angle Lipreading with Angle Classification-Based Feature Extraction and Its Application to Audio-Visual Speech Recognition

Shinnosuke Isobe, Satoshi Tamura, Satoru Hayamizu, Yuuto Gotoh and Masaki Nose

Future Internet 2021, 13(7), 182; https://doi.org/10.3390/fi13070182

15 July 2021

Recently, automatic speech recognition (ASR) and visual speech recognition (VSR) have been widely researched owing to the development in deep learning. Most VSR research works focus only on frontal face images. However, assuming real scenes, it is ob...

Article

11 Citations

6,183 Views

21 Pages

End-to-End Lip-Reading Open Cloud-Based Speech Architecture

Sanghun Jeon and Mun Sang Kim

Sensors 2022, 22(8), 2938; https://doi.org/10.3390/s22082938

12 April 2022

Deep learning technology has encouraged research on noise-robust automatic speech recognition (ASR). The combination of cloud computing technologies and artificial intelligence has significantly improved the performance of open cloud-based speech rec...

Article

3,621 Views

17 Pages

MultiAVSR: Robust Speech Recognition via Supervised Multi-Task Audio–Visual Learning

Shad Torrie, Kimi Wright and Dah-Jye Lee

Electronics 2025, 14(12), 2310; https://doi.org/10.3390/electronics14122310

6 June 2025

Speech recognition approaches typically fall into three categories: audio, visual, and audio–visual. Visual speech recognition, or lip reading, is the most difficult because visual cues are ambiguous and data is scarce. To address these challen...

Article

5 Citations

3,678 Views

15 Pages

Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language

Yorghos Voutos, Georgios Drakopoulos, Georgios Chrysovitsiotis, Zoi Zachou, Dimitris Kikidis, Efthymios Kyrodimos and Themis Exarchos

Computers 2022, 11(3), 34; https://doi.org/10.3390/computers11030034

28 February 2022

Voice loss constitutes a crucial disorder which is highly associated with social isolation. The use of multimodal information sources, such as, audiovisual information, is crucial since it can lead to the development of straightforward personalized w...

Article

96 Citations

10,811 Views

29 Pages

Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices

Dmitry Ryumin, Denis Ivanko and Elena Ryumina

Sensors 2023, 23(4), 2284; https://doi.org/10.3390/s23042284

17 February 2023

Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise. Additional visual information can be used for both automatic lip-reading and gesture recogni...

Article

12 Citations

7,491 Views

16 Pages

Research on Robust Audio-Visual Speech Recognition Algorithms

Wenfeng Yang, Pengyi Li, Wei Yang, Yuxing Liu, Yulong He, Ovanes Petrosian and Aleksandr Davydenko

Mathematics 2023, 11(7), 1733; https://doi.org/10.3390/math11071733

5 April 2023

Automatic speech recognition (ASR) that relies on audio input suffers from significant degradation in noisy conditions and is particularly vulnerable to speech interference. However, video recordings of speech capture both visual and audio signals, p...

Article

11 Citations

3,901 Views

28 Pages

An Effective Conversion of Visemes to Words for High-Performance Automatic Lipreading

Souheil Fenghour, Daqing Chen, Kun Guo, Bo Li and Perry Xiao

Sensors 2021, 21(23), 7890; https://doi.org/10.3390/s21237890

26 November 2021

As an alternative approach, viseme-based lipreading systems have demonstrated promising performance results in decoding videos of people uttering entire sentences. However, the overall performance of such systems has been significantly affected by th...

Article

33 Citations

9,103 Views

20 Pages

Lipreading Architecture Based on Multiple Convolutional Neural Networks for Sentence-Level Visual Speech Recognition

Sanghun Jeon, Ahmed Elsharkawy and Mun Sang Kim

Sensors 2022, 22(1), 72; https://doi.org/10.3390/s22010072

23 December 2021

In visual speech recognition (VSR), speech is transcribed using only visual information to interpret tongue and teeth movements. Recently, deep learning has shown outstanding performance in VSR, with accuracy exceeding that of lipreaders on benchmark...

Article

16 Citations

6,493 Views

12 Pages

Multimodal Sensor-Input Architecture with Deep Learning for Audio-Visual Speech Recognition in Wild

Yibo He, Kah Phooi Seng and Li Minn Ang

Sensors 2023, 23(4), 1834; https://doi.org/10.3390/s23041834

7 February 2023

This paper investigates multimodal sensor architectures with deep learning for audio-visual speech recognition, focusing on in-the-wild scenarios. The term “in the wild” is used to describe AVSR for unconstrained natural-language audio st...

Article

27 Citations

5,987 Views

15 Pages

Improving the Accuracy of Automatic Facial Expression Recognition in Speaking Subjects with Deep Learning

Sathya Bursic, Giuseppe Boccignone, Alfio Ferrara, Alessandro D’Amelio and Raffaella Lanzarotti

Appl. Sci. 2020, 10(11), 4002; https://doi.org/10.3390/app10114002

9 June 2020

When automatic facial expression recognition is applied to video sequences of speaking subjects, the recognition accuracy has been noted to be lower than with video sequences of still subjects. This effect known as the speaking effect arises during s...