Open Access Article

Automatic Lip Reading System Based on a Fusion Lightweight Neural Network with Raspberry Pi

by Jing Wen and Yuanyao Lu *
School of Information Science and Technology, North China University of Technology, Beijing 100144, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2019, 9(24), 5432
Received: 11 November 2019 / Revised: 10 December 2019 / Accepted: 10 December 2019 / Published: 11 December 2019
(This article belongs to the Special Issue Augmented Reality, Virtual Reality & Semantic 3D Reconstruction)
Virtual Reality (VR) is an interactive experience technology in which human vision, hearing, expression, voice and even touch can be added to the interaction between humans and machines. Lip reading recognition is a new technology in the field of human-computer interaction with broad prospects for development. It is particularly valuable in noisy environments and for the hearing-impaired population, because it uses visual information from video to compensate for missing or degraded voice information. This visual information is a form of visual language that benefits from Augmented Reality (AR). The purpose is to establish an efficient and convenient means of communication. However, traditional lip reading recognition systems have a long recognition process and a large number of parameters, so they place high demands on the speed and performance of the equipment and are difficult to use in practical applications. In this paper, a mobile lip reading recognition system based on Raspberry Pi is implemented for the first time, and the recognition application reflects the current state of our research. Our mobile lip reading recognition system can be divided into three stages. First, we extract key frames from our own independent database and then apply a multi-task cascaded convolutional network (MTCNN) for face correction, which improves the accuracy of lip extraction. In the second stage, we use MobileNets to extract lip image features and long short-term memory (LSTM) to extract sequence information between key frames. Finally, we compare three lip reading models: (1) a fusion model of Bi-LSTM and AlexNet; (2) a fusion model with an attention mechanism; (3) the hybrid LSTM and MobileNets network model that we propose. The results show that our model has fewer parameters and lower complexity, and its accuracy on the test dataset is 86.5%. Our mobile lip reading system is therefore simpler and smaller than systems running on PC platforms, and it saves computing resources and memory space.
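The claim of fewer parameters can be illustrated with a small calculation. MobileNets replace standard convolutions with depthwise separable convolutions (a depthwise k x k convolution followed by a 1 x 1 pointwise convolution). The sketch below compares the weight counts of the two layer types. The function names and the layer sizes (3 x 3 kernel, 256 input and 256 output channels) are hypothetical values chosen for illustration, not figures taken from the paper, and biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a standard kernel x kernel convolution (bias omitted)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weight count of a depthwise kernel x kernel convolution
    followed by a 1 x 1 pointwise convolution (bias omitted)."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, for illustration only.
    k, c_in, c_out = 3, 256, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, depthwise separable: {sep}, ratio: {std / sep:.2f}")
```

For these assumed sizes the separable layer needs roughly one ninth of the weights of the standard layer. This kind of saving is what makes a network of this type a plausible fit for a device with limited memory such as the Raspberry Pi.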
Keywords: mobile lip reading system; lightweight neural network; face correction; virtual reality (VR)
MDPI and ACS Style

Wen, J.; Lu, Y. Automatic Lip Reading System Based on a Fusion Lightweight Neural Network with Raspberry Pi. Appl. Sci. 2019, 9, 5432.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.
