Most Facial Expression Recognition (FER) systems rely on machine learning approaches that require large databases for an effective training. As these are not easily available, a good solution is to augment the databases with appropriate data augmentation (DA) techniques, which are typically based on either geometric transformation or oversampling augmentations (e.g., generative adversarial networks (GANs)). However, it is not always easy to understand which DA technique may be more convenient for FER systems because most state-of-the-art experiments use different settings which makes the impact of DA techniques not comparable. To advance in this respect, in this paper, we evaluate and compare the impact of using well-established DA techniques on the emotion recognition accuracy of a FER system based on the well-known VGG16 convolutional neural network (CNN). In particular, we consider both geometric transformations and GAN to increase the amount of training images. We performed cross-database evaluations: training with the "augmented" KDEF database and testing with two different databases (CK+ and ExpW). The best results were obtained combining horizontal reflection, translation and GAN, bringing an accuracy increase of approximately 30%. This outperforms alternative approaches, except for the one technique that could however rely on a quite bigger database.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited