Open Access Article

The Effect of Different Deep Network Architectures upon CNN-Based Gaze Tracking

1 Department of Computer and Communication Engineering, Ming Chuan University, Taoyuan 333, Taiwan
2 Department of Communication Engineering, National Central University, Taoyuan 320, Taiwan
* Author to whom correspondence should be addressed.
Algorithms 2020, 13(5), 127; https://doi.org/10.3390/a13050127
Received: 12 March 2020 / Revised: 17 May 2020 / Accepted: 17 May 2020 / Published: 19 May 2020
In this paper, we explore the effect of using different numbers of convolutional layers, batch normalization and the global average pooling layer on a convolutional neural network (CNN) based gaze tracking system. A novel method is proposed that labels the participants' face images with gaze points retrieved from an eye tracker while they watch videos, building a training dataset that is closer to human visual behavior. The participants can move their heads freely, so realistic and natural images can be obtained without many restrictions. The labeled data are classified according to the gaze coordinates and the area of interest on the screen. Varied network architectures are then applied to estimate and compare the effects of the number of convolutional layers, batch normalization (BN) and the use of a global average pooling (GAP) layer in place of the fully connected layer. Three input schemes, namely the single-eye image, the double-eye image and the facial image, are fed into the neural network with data augmentation to train it and evaluate its efficiency. The input image of the eye or face for an eye tracking system is mostly a small image with relatively few features. The results show that BN and GAP help overcome the difficulty of training models on such inputs and reduce the number of network parameters. Accuracy improves significantly when GAP and BN are used together. Overall, the face scheme achieves the highest accuracy, 0.883, when BN and GAP are used together. Additionally, compared with the case in which the fully connected layer is set to 512, the number of parameters is reduced by less than 50% and the accuracy is improved by about 2%. A detection accuracy comparison with the existing George and Routray methods shows that our proposed method improves prediction accuracy by more than 6%.
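To illustrate why replacing a fully connected layer with GAP reduces the parameter count, the following minimal sketch compares two classifier heads on the same final feature map. It is not the authors' network: the feature map size, the number of gaze classes and all function names are hypothetical values chosen only for illustration, and only the head is counted, so the ratio it prints is not the reduction reported above.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def fc_head_params(h, w, c, hidden, n_classes):
    """Flatten -> Dense(hidden) -> Dense(n_classes)."""
    return dense_params(h * w * c, hidden) + dense_params(hidden, n_classes)


def gap_head_params(c, n_classes):
    """GAP (no trainable parameters) -> Dense(n_classes)."""
    return dense_params(c, n_classes)


def global_average_pool(feature_map):
    """Average each channel of an [h][w][c] nested list to a single value."""
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [
        sum(feature_map[i][j][k] for i in range(h) for j in range(w)) / (h * w)
        for k in range(c)
    ]


# Hypothetical sizes (assumptions, not taken from the paper):
# a 4x4 feature map with 64 channels, a 512-wide hidden layer, 9 gaze regions.
H, W, C, HIDDEN, N_CLASSES = 4, 4, 64, 512, 9

fc_count = fc_head_params(H, W, C, HIDDEN, N_CLASSES)
gap_count = gap_head_params(C, N_CLASSES)

print("FC head parameters: ", fc_count)
print("GAP head parameters:", gap_count)
```

GAP collapses each channel to one number before classification, so the head's size depends only on the channel count and the number of classes, not on the spatial size of the feature map.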
Keywords: gaze tracking; convolution neural network; batch normalization; global average pooling layer

MDPI and ACS Style

Chen, H.-H.; Hwang, B.-J.; Wu, J.-S.; Liu, P.-T. The Effect of Different Deep Network Architectures upon CNN-Based Gaze Tracking. Algorithms 2020, 13, 127.
