Open Access Article
Appl. Sci. 2019, 9(8), 1599; (registering DOI)

Automatic Lip-Reading System Based on Deep Convolutional Neural Network and Attention-Based Long Short-Term Memory

School of Information Science and Technology, North China University of Technology, Beijing 100144, China
Author to whom correspondence should be addressed.
Received: 27 March 2019 / Revised: 9 April 2019 / Accepted: 11 April 2019 / Published: 17 April 2019
(This article belongs to the Special Issue Augmented Reality: Current Trends, Challenges and Prospects)


With improving computer performance, virtual reality (VR), as a new mode of visual operation and interaction, gives automatic lip-reading technology based on visual features broad development prospects. In an immersive VR environment, the user's state can be captured through lip movements, which makes it possible to analyze the user's thinking in real time. Because of complex image processing, hard-to-train classifiers, and long recognition times, traditional lip-reading recognition systems struggle to meet the requirements of practical applications. In this paper, a convolutional neural network (CNN) used for image feature extraction is combined with a recurrent neural network (RNN) based on an attention mechanism for automatic lip-reading recognition. Our proposed method can be divided into three steps. First, we extract keyframes from an independent database that we established ourselves (English pronunciation of the numbers zero to nine by three males and three females). Then, we use the Visual Geometry Group (VGG) network to extract the lip image features; the image feature extraction results are found to be fault-tolerant and effective. Finally, we compare two lip-reading models: (1) a fusion model with an attention mechanism and (2) a fusion model of two networks. The results show that the accuracy on the test dataset is 88.2% for the proposed model and 84.9% for the contrastive model. Therefore, our proposed method is superior to traditional lip-reading recognition methods and general neural networks.
Keywords: virtual reality (VR); self-attention; automatic lip-reading; sensory input; deep learning
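
The attention step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the authors' exact formulation, so everything here is an assumption made for illustration: the dot-product scoring function, the fixed `query` vector (which would be learned in a real model), and the hypothetical helper names `softmax` and `attention_pool`. The sketch only shows the general idea of weighting per-keyframe states and summing them into one context vector that a classifier could then use.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frame_states, query):
    """Collapse per-frame vectors into a single context vector.

    frame_states: list of T vectors (e.g. recurrent outputs, one per keyframe).
    query: vector of the same length; learned in a real model, fixed here.
    """
    # Dot-product score per frame (an assumed scoring function, not the paper's).
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in frame_states]
    weights = softmax(scores)
    dim = len(query)
    # Weighted sum of the frame vectors, one output dimension at a time.
    context = [sum(w * h[d] for w, h in zip(weights, frame_states))
               for d in range(dim)]
    return weights, context

# Three hypothetical keyframes with 2-dimensional states.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_pool(states, query=[1.0, 1.0])
```

In this toy input the third frame scores highest against the query, so it receives the largest weight and contributes most to the context vector.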

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

Share & Cite This Article

MDPI and ACS Style

Lu, Y.; Li, H. Automatic Lip-Reading System Based on Deep Convolutional Neural Network and Attention-Based Long Short-Term Memory. Appl. Sci. 2019, 9, 1599.


Appl. Sci. EISSN 2076-3417, published by MDPI AG, Basel, Switzerland