
3 August 2022

Face Identification Using Data Augmentation Based on the Combination of DCGANs and Basic Manipulations

1 Laboratoire MIA, Université de La Rochelle, Avenue M. Crépeau, 17000 La Rochelle, France
2 Laboratoire MIRACL, Université de Sfax, Route de l’Aéroport, Sfax 3029, Tunisia
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue New Challenges of Face Detection Based on Deep Learning

Abstract

Recently, Deep Neural Networks (DNNs) have become a central subject of discussion in computer vision for a broad range of applications, including image classification and face recognition. Compared to conventional machine learning methods, deep learning algorithms have shown prominent performance with high accuracy and speed. However, they require a large amount of data to achieve adequate robustness, and additional samples are time-consuming and expensive to collect. In this paper, we propose an approach that combines generative methods and basic manipulations for image data augmentation with the FaceNet model and a Support Vector Machine (SVM) for face recognition. To do so, the images are first processed by a Deep Convolutional Generative Adversarial Network (DCGAN) to generate samples with realistic properties that are indistinguishable from those of the original datasets. Second, basic manipulations are applied to the images produced by the DCGAN in order to increase the amount of training data. Finally, FaceNet is employed as the face recognition model: faces are detected using MTCNN, a 128-D embedding is computed to quantify each face, and an SVM is trained on top of the embeddings for classification. Experiments carried out on the LFW and VGG image databases and the ChokePoint video database demonstrate that the combination of basic and generative methods for augmentation boosts face recognition performance, leading to better recognition results.

1. Introduction

Despite the exceptional efficiency of 2D and 3D recognition, face recognition based on DNNs faces several challenges, such as the difficulty of collecting enough training images, because DNNs often require a large amount of data for effective learning. Generally, a large volume of data is needed to achieve high recognition accuracy: because DNNs have a powerful learning ability, they need various views of the face for each subject. However, obtaining such a dataset for a single class is not only impractical but also very time-consuming. An insufficient number of samples leads to over-parameterization and over-fitting, resulting in an obvious decline in the effectiveness of the learning outcomes. Moreover, it is often necessary to train on samples of faces under different conditions of illumination, facial expression, pose, and occlusion. To deal with the scarcity or insufficiency of samples, an efficient solution is to use data augmentation techniques. The main purpose of data augmentation is to expand the diversity and size of the training database and expose the model to various aspects of the data, so that the model does not see exactly the same image twice during training, achieving greater robustness, higher accuracy, and stable classification performance. However, there is a gap in the literature concerning methodological and theoretical augmentation techniques for face recognition, and a lack of studies on how to increase the size of small datasets using augmentation. In the current work, we focus on filling this gap. We propose a new data augmentation method and compare the effect of various data augmentation techniques on face recognition accuracy. Specifically, we consider geometric transformations, image brightness changes, and different filter operations, as well as the use of DCGAN to generate new images from the original dataset to augment the amount of training samples.
A number of studies using deep learning methods have reported high performance on a significant number of tasks, including image classification [1], natural language processing [2], and text classification [3]. These models use the Softmax function in the classification layer. However, some studies [4,5] have considered SVM as an alternative to the Softmax function for classification, asserting that the use of SVM in an artificial neural network (ANN) instead of the Softmax function may improve recognition accuracy. In our work, the augmented training data are sent to FaceNet to extract embeddings, which are then classified using an SVM.
The present work is an extension of our preliminary work published at the ISVC conference in 2020 [6]. The main contributions of this paper can be summarized as follows:
  • We propose a novel data augmentation technique in which DCGAN and basic manipulations are combined. We use the Wasserstein loss to replace the standard DCGAN cross-entropy loss to solve the problem of DCGAN training instability. We show that our model improves face recognition performance by considering the LFW dataset [7], VGGFace2 dataset [8], and ChokePoint video database [9].
  • We demonstrate the benefits of the proposed augmentation strategy for face recognition by comparing our approach with approaches using only basic manipulations and only a generative approach.
  • We show that the use of SVM instead of the Softmax function with a FaceNet model may improve face recognition accuracy compared to the other tested techniques.
The rest of the paper is organized as follows: in Section 2, we review the literature in the area of data augmentation and face recognition; Section 3 presents the proposed approach in detail; Section 4 discusses the qualitative and quantitative results of our proposed method; and finally, Section 5 contains our conclusions and future research directions.

3. Proposed Approach

Compared with our previous work [6], the approach proposed here is based on the combination of DCGAN and basic image manipulations for data augmentation in order to tackle the problem of scarce image data; synthetic images are added to the original face data. The face recognition system consists of several components: fast and accurate face detection; face processing and cropping by computing facial landmarks using a Multi-task Cascaded Convolutional Neural Network (MTCNN); data augmentation by combining generative models and basic manipulations; face representation extraction and FaceNet training; and finally, an SVM to classify and recognize faces in images and video streams. The complete pipeline is shown in the block diagram in Figure 1. Our proposed approach is carried out in the following steps: (1) MTCNN is applied for face detection; (2) the result of MTCNN is used as the input of the DCGAN to generate synthetic images; (3) basic manipulations are applied to the images created by the DCGAN; (4) the resulting images are added to the original data; and (5) FaceNet and SVM are applied for feature extraction and face recognition.
Figure 1. Overall architecture of the proposed approach.
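For clarity, the five steps can be sketched as a single pipeline. In the Python sketch below, the helper functions detect_and_align, train_dcgan_and_sample, basic_augment, and facenet_embed are hypothetical placeholders standing in for the components detailed in Sections 3.1–3.3; only the SVM classifier is a concrete library call.

import numpy as np
from sklearn.svm import SVC

def augment_and_train(raw_images, labels, n_synthetic_per_class=500):
    # (1) MTCNN detection and alignment to 112x112 crops (hypothetical helper)
    faces, face_labels = detect_and_align(raw_images, labels)

    # (2) per-class DCGAN training and sampling of synthetic faces (hypothetical helper)
    synthetic, synth_labels = train_dcgan_and_sample(faces, face_labels,
                                                     n_synthetic_per_class)

    # (3) basic manipulations applied to the DCGAN outputs (hypothetical helper)
    manipulated, manip_labels = basic_augment(synthetic, synth_labels)

    # (4) merge original and synthetic data
    X = np.concatenate([faces, synthetic, manipulated])
    y = np.concatenate([face_labels, synth_labels, manip_labels])

    # (5) FaceNet embeddings + SVM classifier (hypothetical embedding helper)
    emb = facenet_embed(X)                               # (N, 128) embeddings
    clf = SVC(kernel="linear", probability=True).fit(emb, y)
    return clf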

3.1. Face Detection Using MTCNN

As a first step, we performed the same preprocessing on all training and testing samples. We detected the location and extracted the canonical coordinates of the face from the input image or video frame using the MTCNN model [58]. MTCNN performs a canonical face alignment by identifying the geometric structure of the face in terms of rotation, translation, and scale. MTCNN carries out several tasks at the same time: (1) bounding box regression; (2) prediction of the probability that the detected sample is a real face; and (3) facial landmark localization (location of the eyes, mouth corners, and nose tip). MTCNN consists of several networks in a cascade: (1) P-Net processes the image at multiple resolutions and quickly produces candidate face bounding boxes; (2) R-Net refines the predictions and works as a filter that selects the high-accuracy candidate boxes; and (3) O-Net further refines the predictions and generates the final bounding boxes. A Non-Maximum Suppression algorithm is applied to remove overlapping bounding boxes. The network outputs the positions of the key features of the face. All input face images are cropped and resized to 112 × 112 pixels according to the five facial points detected by MTCNN. A similarity transformation is applied based on the positions of the located keypoints, ensuring that all faces are cropped to images of fixed dimensions.
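The detection and cropping step can be illustrated with the open-source mtcnn Python package (an assumption for illustration; the implementation used here is not named). The similarity alignment is reduced in this sketch to a simple crop-and-resize to 112 × 112 pixels.

import cv2
from mtcnn import MTCNN

detector = MTCNN()

def detect_and_crop(image_bgr, size=112):
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    crops = []
    for det in detector.detect_faces(rgb):        # each det has box, confidence, keypoints
        x, y, w, h = det["box"]
        x, y = max(x, 0), max(y, 0)               # MTCNN may return negative coordinates
        face = rgb[y:y + h, x:x + w]
        crops.append(cv2.resize(face, (size, size)))
    return crops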

3.2. Data Augmentation Using DCGAN Combined with Basic Manipulations

After applying face alignment and cropping, the extracted faces were passed through a DCGAN to synthesize new images. Basic manipulations are characterized by their simplicity and ease of implementation; however, they cannot produce realistic face variations. The generative approach, by contrast, can generate significant and realistic face variations, although it requires additional resources. To alleviate this issue, we trained the DCGAN [59] on the images of each class of the original dataset in order to learn the features of the faces and then artificially synthesize more facial images. We exploited the capacity of the DCGAN generator to synthesize faces similar to the original faces in the training dataset. The idea is to simultaneously train two adversarial neural networks: the discriminator tries to discern whether a sample comes from the actual data distribution, while the generator aims to trick the discriminator by generating better samples. In the DCGAN training process, the discriminator aims to differentiate real samples x from generated samples G(z), i.e., to maximize log(D(x)) + log(1 − D(G(z))). Here, G denotes the generator, which takes as input a random noise vector z sampled from a uniform distribution, while D(·) denotes the discriminator, which takes as input either an image x from the selected database or the output G(z) of the generator. At the same time, the generator aims to trick the discriminator by minimizing log(1 − D(G(z))). A stable point in training is reached when, after multiple steps, the discriminator is unable to differentiate between x and G(z).
The generator takes as input a random noise vector sampled from a uniform distribution on (−1, 1), followed by a fully connected layer containing 8192 neurons whose output is reshaped to dimensions 4 × 4 × 1024, then four transposed convolutional layers with a stride of 2 and padding, each of which reduces the number of channels and upsamples the features by a factor of two. The final output images have a size of 64 × 64 × 3. The discriminator D is the inverse of the generator: the input image of dimensions 64 × 64 × 3 is passed through four consecutive convolutional layers with final output dimensions of 4 × 4 × 512, and a Softmax function converts the real-valued logits of the last fully connected layer into the final class probabilities. Training uses the Adam optimizer with a learning rate of 0.0002 and a momentum term of β1 = 0.5; the batch size is fixed at 128 and the weights are initialized from a normal distribution with a standard deviation of 0.02. Finally, a Nash equilibrium is achieved when the output of the discriminator is 0.5, that is to say, when the discriminator D can no longer judge whether the input comes from real or generated data. However, DCGANs suffer from several problems, such as unstable training and a non-convergent generator loss. In this work, to solve the problems of mode collapse and vanishing gradients, we propose using the Wasserstein distance [60] to evaluate the distance between generated samples and actual samples. The benefit of the Wasserstein distance is that it provides a meaningful measure even when the two distributions do not overlap, thereby enhancing the stability of training compared with the original DCGAN. The use of the Wasserstein loss allowed the DCGAN to generate high-resolution synthetic faces with a high level of detail.
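As an illustration only (not the training code used in our experiments), a DCGAN-style generator and a Wasserstein objective can be sketched in PyTorch as follows; the layer sizes follow the standard DCGAN configuration (noise vector projected to a 4 × 4 × 1024 feature map, then four stride-2 transposed convolutions up to 64 × 64 × 3), and the exact hyperparameters are assumptions.

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, nz=100):
        super().__init__()
        self.project = nn.Linear(nz, 4 * 4 * 1024)
        self.net = nn.Sequential(
            nn.BatchNorm2d(1024), nn.ReLU(True),
            nn.ConvTranspose2d(1024, 512, 4, stride=2, padding=1),  # 8x8
            nn.BatchNorm2d(512), nn.ReLU(True),
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1),   # 16x16
            nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),   # 32x32
            nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1),     # 64x64
            nn.Tanh(),
        )

    def forward(self, z):
        x = self.project(z).view(-1, 1024, 4, 4)
        return self.net(x)

def critic_loss(critic, real, fake):
    # Wasserstein critic objective: maximize D(real) - D(fake),
    # i.e. minimize its negative.
    return -(critic(real).mean() - critic(fake).mean())

def generator_loss(critic, fake):
    # The generator tries to maximize the critic's score on generated samples.
    return -critic(fake).mean()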
Next, we applied basic manipulations, including geometric transformations (rotation and translation), brightness changes, and filter operations (Gaussian, median, mean, and bilateral filters), to the DCGAN-generated images in order to enlarge the training data in a more efficient way. FaceNet was then used to extract face embeddings, which were classified using an SVM model.
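An illustrative version of these manipulations using OpenCV is given below; the parameter values (rotation angle, shift, kernel sizes) are examples only, not those used in our experiments.

import cv2
import numpy as np

def basic_manipulations(img):
    h, w = img.shape[:2]
    out = []

    # geometric transformations: rotation and translation
    M_rot = cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0)   # 10-degree rotation
    out.append(cv2.warpAffine(img, M_rot, (w, h)))
    M_shift = np.float32([[1, 0, 5], [0, 1, 5]])               # shift 5 px right/down
    out.append(cv2.warpAffine(img, M_shift, (w, h)))

    # brightness change
    out.append(cv2.convertScaleAbs(img, alpha=1.0, beta=30))

    # filter operations: Gaussian, median, mean and bilateral
    out.append(cv2.GaussianBlur(img, (5, 5), 0))
    out.append(cv2.medianBlur(img, 5))
    out.append(cv2.blur(img, (5, 5)))
    out.append(cv2.bilateralFilter(img, 9, 75, 75))
    return out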

3.3. Face Recognition Using FaceNet Combined with SVM

Our face recognition system integrates FaceNet with SVM for facial embedding extraction and classification, respectively. FaceNet [43] was used to obtain facial features; it is trained with a triplet-based loss function [54] that pulls matching faces together and pushes non-matching faces apart. Figure 2 shows the structure of the FaceNet model, in which the face image is fed into a DCNN that learns features directly from the face image pixels, followed by an L2 normalization layer, finally producing a 128-dimensional vector that represents the face image, i.e., the face embedding. Given an input face image, the model extracts a 128-D vector as a face representation, which can then be used to cluster faces efficiently. Unlike other face representation models, this embedding has the property that a larger distance between two face representations means that the faces are probably of different people. Training the network requires a face triplet: an anchor image of the target person, another image of the same person, and an image of a different person. This property facilitates similarity detection, grouping, and classification compared with face recognition methods in which the Euclidean distance between features is not meaningful. The network is trained so that the squared L2 distance between two embeddings corresponds to face similarity: faces of the same person have small distances, and faces of different people have large distances. Once this encoding has been generated, face verification is performed by thresholding the distance between two encodings. To obtain the face embedding, FaceNet [43] generally uses two CNN-based architectures. The first adds 1 × 1 × d convolutional layers between the standard convolutional layers of the Zeiler and Fergus [61] architecture, resulting in the 22-layer NN1 model. The second consists of an Inception model based on GoogLeNet [62]. Figure 3 shows the structure of an Inception module. It contains four branches which, from left to right, employ convolutions with 1 × 1, 3 × 3, and 5 × 5 filters as well as a 3 × 3 max pooling layer; each branch employs a 1 × 1 convolution to decrease the time complexity. Finally, FaceNet is trained with the triplet loss function, which minimizes the distance between an anchor and a positive sample of the same person while maximizing the distance between the anchor and a negative sample.
Figure 2. FaceNet model architecture: FaceNet consists of a batch input layer and a deep CNN (DCNN) followed by L2 normalization, which provides face embedding. Finally, the triplet loss is calculated during training.
Figure 3. Inception module.
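In its minimal form, the triplet loss compares the squared distance of an anchor embedding to a positive (same identity) embedding against its distance to a negative (different identity) embedding; the sketch below assumes L2-normalized 128-D embeddings and is given for illustration only.

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # anchor, positive, negative: (batch, 128) L2-normalized embeddings
    pos_dist = (anchor - positive).pow(2).sum(dim=1)   # squared distance to positive
    neg_dist = (anchor - negative).pow(2).sum(dim=1)   # squared distance to negative
    # hinge: the positive should be closer than the negative by a margin alpha
    return F.relu(pos_dist - neg_dist + alpha).mean()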
In this work, we propose to modify the original FaceNet network by using an SVM instead of the last fully connected layer, followed by L2 normalization; that is, we remove the last fully connected layer of the CNN and replace it with an SVM. As SVM is an efficient supervised learning algorithm for classification, this alteration creates a combined architecture with SVM for the facial recognition task. SVM is a very fast machine learning algorithm for multiclass classification, and it can be used for large datasets. Although the FaceNet model can be used as part of the classifier itself, in our work we used the FaceNet model to preprocess a face in order to create a face embedding that could be employed as input to our classifier model. This latter approach was favoured because the FaceNet model is both large and slow to generate a face embedding.
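A minimal sketch of this classification stage is given below, assuming the 128-D FaceNet embeddings have already been computed as NumPy arrays; the embeddings are L2-normalized and passed to a linear SVM that replaces the Softmax layer. The kernel choice and probability calibration are illustrative assumptions.

from sklearn.preprocessing import Normalizer, LabelEncoder
from sklearn.svm import SVC

def train_svm_on_embeddings(train_emb, train_names):
    norm = Normalizer(norm="l2")
    X = norm.fit_transform(train_emb)                   # (N, 128) embeddings
    enc = LabelEncoder().fit(train_names)
    y = enc.transform(train_names)
    clf = SVC(kernel="linear", probability=True).fit(X, y)
    return clf, norm, enc

def predict_identity(clf, norm, enc, emb):
    probs = clf.predict_proba(norm.transform(emb.reshape(1, -1)))[0]
    best = probs.argmax()
    return enc.inverse_transform([best])[0], probs[best]   # (name, confidence)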

4. Results and Discussions

In this paper, we propose to combine DCGANs and basic image manipulations as data augmentation techniques for our face recognition approach based on FaceNet + SVM. We evaluated the proposed method on two face image datasets, the LFW dataset [7] and the VGGFace2 dataset [8], as well as on a video face dataset, the ChokePoint dataset [9]. We added 100, 250, and 500 generated images per class. We compared our proposed approach with three popular face recognition algorithms: Principal Component Analysis (PCA), Tensor Robust Principal Component Analysis (TRPCA) [38], and Local Binary Pattern Histograms (LBPH). PCA is often employed for dimensionality reduction; it keeps only the components that contribute the most to the variance in order to compress the dataset, decomposing the covariance matrix to obtain the principal components (i.e., the eigenvectors) of the data and their corresponding eigenvalues. TRPCA [38] plays a critical role in handling high-dimensional datasets, aiming to recover the low-rank and sparse components both accurately and efficiently. The LBPH method is based on the LBP, a texture description method: given a face image, histogram features are extracted from the occurrences of the LBP codes, and classification is carried out by measuring the similarity between histograms. Additionally, we compared our approach with the work of Pei et al. [63], which was based on standard data augmentation techniques, including filtering, geometric transformations (rotation, translation, …), and brightness changes, along with a VGG-16 network for face classification. Furthermore, we compared our work with our own previous work [6], which was based on DCGANs for data augmentation and FaceNet + SVM for face recognition. In our experiments, we began by identifying faces in images and then moved on to identifying faces in videos.
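For reference, the LBPH baseline can be reproduced with the OpenCV contrib module (opencv-contrib-python); the sketch below assumes grayscale face crops of identical size with integer identity labels, and is only meant to illustrate the comparison, not the exact baseline configuration used here.

import cv2
import numpy as np

def lbph_baseline(train_faces, train_labels, test_face):
    # train_faces: list of grayscale uint8 face crops of identical size
    recognizer = cv2.face.LBPHFaceRecognizer_create()
    recognizer.train(train_faces, np.array(train_labels))
    label, distance = recognizer.predict(test_face)   # lower distance = closer match
    return label, distance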

4.1. Datasets

4.1.1. Labeled Faces in the Wild (LFW) Dataset

The LFW dataset is the standard dataset for face verification and recognition. This dataset contains 13,233 facial images of 5749 subjects. It includes several challenges, such as varying face poses, expressions, and illumination conditions as well as partial occlusion. In this dataset, only 1680 subjects out of a total of 5749 identities have more than one facial image. A subset of the database containing 3137 pictures obtained from 62 identities was employed in our experiments by choosing only those subjects with 20 or more images [7].

4.1.2. VGGFace2 Dataset

The VGGFace2 dataset contains 9000 identities. The distribution of faces for different subjects is varied, from 87 to 843, with a mean of 362 images for each subject. We did not perform experiments on the entire dataset for time reasons. We selected a subset of the dataset by randomly choosing 20 identities to assess the performance of our method. The selected subset consisted of eight women and twelve men, with a total of 7746 samples [8].

4.1.3. ChokePoint Dataset

The ChokePoint video dataset was designed for the identification of people acquired in real-world surveillance environments. The faces contained in the database have variations in terms of their pose, sharpness, and lighting, as well as misalignment due to the automatic detection/localization of faces. The Chokepoint video dataset consists of 25 identities (nineteen men and six women) in Portal 1 and 29 identities (twenty-three men and six women) in Portal 2. We used Portal 1 for our experiments [9].

4.2. Evaluation

4.2.1. Quality of Generated Images

In this paper, we trained a DCGAN model on two facial datasets, the LFW dataset [7] and the VGGFace2 dataset [8]. Figure 4 and Figure 5 show samples of images generated by the DCGAN. As expected, the images appear realistic, although with occasional artifacts; more realistic images can be generated by increasing the number of epochs. In this experiment, the DCGAN produces images that are difficult to distinguish subjectively from real images, and the augmented samples are very similar to the original images. The quality of the generated images improved over the course of 40 epochs. DCGANs represent a very effective tool for modifying attributes of real faces such as gender, age, skin, and hair color, and allow for surprisingly realistic results. DCGAN-generated samples cannot be distinguished as real or fake by the CNN discriminator. Thus, we assumed that DCGAN-generated samples would have similar CNN features, function as similar images, and help to enlarge a limited dataset. We then applied basic manipulations to the synthetic images produced by the DCGAN. Figure 6 shows an example of the basic image manipulations applied to images generated by the DCGAN. We added 100, 250, and 500 synthetic images per class to the LFW dataset [7] and VGGFace2 dataset [8]. Our proposed method demonstrates the ability to synthesize realistic faces with DCGANs and basic manipulations, which can be used to improve face recognition tasks.
Figure 4. Generated images using DCGAN on LFW dataset [7] after 50 epochs.
Figure 5. Generated images using DCGAN on VGGFace2 dataset [8] after 50 epochs.
Figure 6. Example of basic manipulations applied on images from LFW dataset [7].

4.2.2. Results

As shown in Table 1, the more training samples are added, the higher the accuracy of the model. The results show that our face recognition approach achieves an accuracy of 64% and 61% on the LFW dataset [7] and VGGFace2 dataset [8], respectively. After adding more samples, the accuracy increases to 90% and 88%, respectively. Furthermore, when adding 500 samples per class, we achieve 94.5% accuracy on the LFW dataset [7] and 92.2% on the VGGFace2 dataset [8]. Table 2 shows that face recognition accuracy increases by 1.7% when adding only 100 images per class on the ChokePoint dataset [9]. As expected, face recognition accuracy increases as the number of training samples increases.
Table 1. Face recognition accuracy with data augmentation using the proposed method.
Table 2. Face recognition accuracy with data augmentation using the proposed method.
Table 3 and Table 4 show the results of the experiments using 62 classes from the LFW dataset [7] and 20 classes from the VGGFace2 dataset [8]. These experimental results demonstrate that our approach outperforms the traditional face recognition methods, namely, PCA, LBPH, and TRPCA. The results presented in Table 5, obtained with ChokePoint dataset [9], confirm the effectiveness of our approach. Our proposed method based on DCGAN and basic manipulations for data augmentation and FaceNet + SVM for face recognition has more advantages than the PCA, TRPCA and LBPH methods, which use a smaller number of samples.
Table 3. Recognition performance with different methods using 62 classes from LFW dataset [7].
Table 4. Recognition performance with different methods using 20 classes from VGGFace2 dataset [8].
Table 5. Recognition performance with different methods using portal 1 from ChokePoint dataset [9].
Moreover, we can observe that our proposed approach achieves better results than the work of Pei et al. [63], which was based only on basic data augmentation techniques (geometric transformations, brightness change, and filtering) with a CNN for face recognition; see Table 3 and Table 4. The number of augmented images was kept the same, at 500 images per face. The results shown in Table 5 confirm the effectiveness of our proposed approach on the ChokePoint dataset [9]; again, the number of augmented images was kept the same, at 100 images per face.
Additionally, the experimental evaluation demonstrates that a significant increase in accuracy can be obtained by combining DCGANs and basic manipulations for data augmentation and FaceNet + SVM for face recognition compared to only basic manipulations (geometric transformations, brightness change, filtering …) as a data augmentation technique (see Table 3, Table 4 and Table 5). The obtained results show that our proposed approach using a DCGAN with a filter operation for data augmentation achieves higher accuracy than our previous work [6] based on only DCGANs, with a difference of 1.28% and 0.04%, respectively, for the LFW dataset [7] and VGGFace2 dataset [8] and 0.62% for the ChokePoint dataset [9]. Moreover, the results show improvement with our proposed approach using DCGAN and basic manipulations (geometric transformations, brightness change) as a data augmentation method, with a difference of 2.38% and 0.38% over our previous proposed method [6] for the LFW dataset [7] and VGGFace2 dataset [8], respectively, and 1.22% for the ChokePoint dataset [9]. These results show that augmentations can, in general, considerably improve the quality of face recognition systems, and that the combination of generative and basic manipulations performs better than the other tested techniques.
Figure 7 shows that the face of Andre_Agassi from the LFW dataset [7] is recognized with 72.32% confidence; with data augmentation based on the combination of DCGAN and basic transformations, the confidence is higher, reaching 77.08%. In Figure 8a, we can see that the face prediction has only 50.71% confidence on the ChokePoint dataset [9]; this confidence is higher when applying data augmentation with DCGAN and basic manipulations, reaching 91.63%, as shown in Figure 8b. The same holds in Figure 8c,d, with an increase of 1.8% when adding more samples per class. The results on the LFW database [7], VGG database [8], and ChokePoint database [9] show that the proposed approach can improve face recognition performance and lead to better recognition results.
Figure 7. Face confidence using LFW dataset [7].
Figure 8. Face confidence using ChokePoint dataset [9].

5. Conclusions

Researchers use data augmentation to increase the size of the datasets used to train deep learning models. In this paper, we demonstrate that the combination of DCGAN and basic manipulations can generate data that approximate real face images, thereby both providing a larger data set for the training of large neural networks and improving the generalization ability of recognition models. Based on the augmented human face dataset, facial features were extracted using FaceNet and then classified using SVM. The effectiveness of the proposed method was demonstrated by various experiments and comparisons with frequently used data augmentation and face recognition methods. The proposed data augmentation method generates images realistic enough to boost the performance of face recognition systems.
Although the combination of approaches used here for augmentation demonstrates a good increase in accuracy, the basic approach is not far behind in terms of performance while requiring less time and hardware resources. Improving the quality of DCGAN-generated samples and evaluating their effectiveness on a broad range of datasets is a very important area for future work. DCGANs can be optimized by adjusting parameters such as batch size, momentum, and learning rate in order to generate more realistic and diverse face samples, which could further improve the accuracy of the results. Future work could include the use of Wasserstein loss with a gradient penalty to improve the quality of the generated images.

Author Contributions

Conceptualization, S.A.; writing—review and editing, T.B.; M.N. substantially revised the manuscript and supervised, and all authors commented on previous versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wu, R.; Yan, S.; Shan, Y.; Dang, Q.; Sun, G. Deep image: Scaling up image recognition. arXiv 2015, arXiv:1501.02876. [Google Scholar]
  2. Torfi, A.; Shirvani, R.; Keneshloo, Y.; Fox, E. Natural language processing advancements by deep learning: A survey. arXiv 2020, arXiv:2003.01200. [Google Scholar]
  3. Yang, Z.; Yang, D.; Dyer, C. Hierarchical Attention Networks for Document Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489. [Google Scholar]
  4. Agarap, A.F. An Architecture Combining Convolutional Neural Network (CNN) and Support Vector Machine (SVM) for Image Classification. arXiv 2019, arXiv:1712.03541v2. [Google Scholar]
  5. Suguna, G.C.; Kavitha, H.S.; Sunita, S. Face Recognition System For Realtime Applications Using SVM Combined With FaceNet And MTCNN. Int. J. Electr. Eng. Technol. (IJEET) 2021, 12, 328–335. [Google Scholar]
  6. Ammar, S.; Bouwmans, T.; Zaghden, N.; Neji, M. Towards an Effective Approach for Face Recognition with DCGANs Data Augmentation. Adv. Vis. Comput. 2020, 12509. [Google Scholar]
  7. Huang, G.B.; Mattar, M.; Tamara, B.; Learned-Miller, E. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. In Proceedings of the Workshop on Faces in ’Real-Life’ Images: Detection, Alignment, and Recognition, Tuscany, Italy, 28 July–3 August 2008. [Google Scholar]
  8. Cao, Q.; Shen, L.; Xie, W.; Parkhi, O.M.; Zisserman, A. VGGFace2: A dataset for recognising face across pose and age. In Proceedings of the International Conference on Automatic Face and Gesture Recognition, Xi’an, China, 15–19 May 2018. [Google Scholar]
  9. Wong, Y.; Chen, S.; Mau, S.; Sanderson, C.; Lovell, B.C. Patch-based Probabilistic Image Quality Assessment for Face Selection and Improved Video-based Face Recognition. In Proceedings of the IEEE Biometrics Workshop, Computer Vision and Pattern Recognition (CVPR) Workshops, Colorado Springs, CO, USA, 20–25 June 2011; pp. 81–88. [Google Scholar]
  10. Kwasigroch, A.; Mikołajczyk, A.; Grochowski, M. Deep neural networks approach to skin lesions classification—A comparative analysis. In Proceedings of the International Conference on Methods and Models in Automation and Robotics (MMAR), Miedzyzdroje, Poland, 28–31 August 2017; pp. 1069–1074. [Google Scholar]
  11. Ben Fredj, H.; Bouguezzi, S.; Souani, C. Face recognition in unconstrained environment with CNN. Vis. Comput. 2020, 37, 217–226. [Google Scholar] [CrossRef]
  12. Noh, H.; You, T.; You, M.J.; Han, B. Regularizing deep neural networks by noise: Its interpretation and optimization. Adv. Neural Inf. Process. Syst. 2017, 5109–5118. [Google Scholar]
  13. Francisco, J.M.-B.; Fiammetta, S.; Jose, M.J.; Daniel, U.; Leonardo, F. Forward noise adjustment scheme for data augmentation. In Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 18–21 November 2018. [Google Scholar]
  14. Xu, Y.; Li, X.; Yang, J.; Zhang, D. Integrate the original face image and its mirror image for face recognition. Neurocomputing 2014, 131, 191–199. [Google Scholar] [CrossRef]
  15. Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random erasing data augmentation. Proc. AAAI Conf. Artif. Intell. 2020, 34, 13001–13008. [Google Scholar] [CrossRef]
  16. Mohammadzade, H.; Hatzinakos, D. Projection into expression subspaces for face recognition from single sample per person. IEEE Trans. Affect. Comput. 2013, 4, 69–82. [Google Scholar] [CrossRef]
  17. Kang, G.; Dong, X.; Zheng, L.; Yang, Y. PatchShuffle regularization. arXiv 2017, arXiv:1707.07103. [Google Scholar]
  18. Lv, J.; Shao, X.; Huang, J.; Zhou, X.; Zhou, X. Data augmentation for face recognition. Neurocomputing 2017, 230, 184–196. [Google Scholar] [CrossRef]
  19. Li, B.; Wu, F.; Lim, S.; Weinberger, K. On feature normalization and data augmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 12383–12392. [Google Scholar]
  20. Zheng, X.; Chalasani, T.; Ghosal, K.; Lutz, S. Stada: Style transfer as data augmentation. arXiv 2019, arXiv:1909.01056. [Google Scholar]
  21. Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, Nevada, USA, 27–30 June 2016; pp. 2414–2423. [Google Scholar]
  22. Christopher, B.; Liang, C.; Ricardo, G.P.B.; Roger, G.; Hammers, A.; David, A.D.; Maria, V.H. GAN augmentation: Augmenting training data using generative adversarial networks. arXiv 2018, arXiv:1810.10863. [Google Scholar]
  23. Yi, W.; Sun, Y.; He, S. Data Augmentation Using Conditional GANs for Facial Emotion Recognition. In Proceedings of the Progress in Electromagnetics Research Symposium, Toyama, Japan, 1–4 August 2018. [Google Scholar]
  24. Doersch, C. Tutorial on Variational Autoencoders. arXiv 2016, arXiv:1606.05908. [Google Scholar]
  25. Ammar, S.; Zaghden, N.; Neji, M. A Framework for People Re-Identification in Multi-Camera Surveillance Systems; International Association for Development of the Information Society: Lisbon, Portugal, 2017. [Google Scholar]
  26. Ammar, S.; Bouwmans, T.; Zaghden, N.; Neji, M. From Moving Objects Detection to Classification And Recognition: A Review for Smart Cities. In Handbook on Towards Smart World: Homes to Cities using Internet of Things Publisher; CRC Press, Taylor and Francis Group: Boca Raton, FL, USA, 2017. [Google Scholar]
  27. Anzar, S.M.; Amrutha, T. Efficient wavelet based scale invariant feature transform for partial face recognition. In AIP Conference Proceedings; AIP Publishing LLC: New York, NY, USA, 2020; Volume 2222, p. 030017. [Google Scholar]
  28. Ghorbel, A.; Tajouri, I.; Aydi, W.; Masmoudi, N. A comparative study of GOM, uLBP, VLC and fractional Eigenfaces for face recognition. In Proceedings of the 2016 International Image Processing, Applications and Systems (IPAS), Virtual Event, Italy, 9–11 December 2016; pp. 1–5. [Google Scholar]
  29. Johannes, R.; Armin, S. Face Recognition with Machine Learning in OpenCV Fusion of the results with the Localization Data of an Acoustic Camera for Speaker Identification. arXiv 2017, arXiv:1707.00835. [Google Scholar]
  30. Khoi, P.; Thien, L.H.; Viet, V.H. Face Retrieval Based on Local Binary Pattern and Its Variants: A Comprehensive Study. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 249–258. [Google Scholar] [CrossRef] [Green Version]
  31. Xi, M.; Chen, M.; Polajnar, D.; Tong, W. Local binary pattern network: A deep learning approach for face recognition. IEEE ICIP 2016, 25, 3224–3228. [Google Scholar]
  32. Laure Kambi, I.; Guo, C. Enhancing face identification using local binary patterns and k-nearest neighbors. J. Imaging 2017, 3, 37. [Google Scholar]
  33. Kumar, D.; Garaina, J.; Kisku, D.R.; Sing, J.K.; Gupta, P. Unconstrained and Constrained Face Recognition Using Dense Local Descriptor with Ensemble Framework. Neurocomputing 2020, 408, 273–284. [Google Scholar] [CrossRef]
  34. Karraba, M.; Surinta, O.; Schomaker, L.; Wiering, M. Robust face recognition by computing distances from multiple histograms of oriented gradients. IEEE Symp. Ser. Comput. Intell. 2015, 7, 10. [Google Scholar]
  35. Arigbabu, O.; Ahmad, S.; Adnan, W.A.W.; Yussof, S.; Mahmood, S. Soft biometrics: Gender recognition from unconstrained face images using local feature descriptor. arXiv 2017, arXiv:1702.02537. [Google Scholar]
  36. Napoléon, T.; Alfalou, A. Local binary patterns preprocessing for face identification/verification using the VanderLugt correlator. In Optical Pattern Recognition; SPIE: Bellingham, WA, USA, 2014; pp. 408–909. [Google Scholar]
  37. Lu, C.; Feng, J.; Chen, Y.; Liu, W. Tensor Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Tensors via Convex Optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5249–5257. [Google Scholar]
  38. Shuting, C.; Luo, Q.; Yang, M.; Xiao, M. Tensor Robust Principal Component Analysis via Non-Convex Low Rank Approximation. Appl. Sci. 2019, 9, 7. [Google Scholar]
  39. Liu, Y. Tensors for Data Processing: Theory, Methods and Applications, 1st ed.; Academic Press: Cambridge, MA, USA, 2021. [Google Scholar]
  40. Qian, Y.; Gong, M.; Cheng, L. Stocs: An efficient self-tuning multiclass classification approach. In Proceedings of the Canadian Conference on Artificial Intelligence, Halifax, NS, Canada, 2–5 June 2015; pp. 291–306. [Google Scholar]
  41. Wu, Z.; Peng, M.; Chen, T. Thermal face recognition using convolutional neural network. In Proceedings of the 2016 International Conference on Optoelectronics and Image Processing (ICOIP), Warsaw, Poland, 10–12 June 2016; pp. 6–9. [Google Scholar]
  42. Song, L.; Gong, D.; Li, Z.; Liu, C.; Liu, W. Occlusion Robust Face Recognition Based on Mask Learning with Pairwise Differential Siamese Network. In Proceedings of the 2019 International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019. [Google Scholar]
  43. Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
  44. Weinberger, K.Q.; Blitzer, J.; Saul, L.K. Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. Adv. Neural Inf. Process. Syst. 2009, 10, 207–244. [Google Scholar]
  45. Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 212–220. [Google Scholar]
  46. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 4690–4699. [Google Scholar]
  47. Tornincasa, S.; Vezzetti, E.; Moos, S.; Violante, M.G.; Marcolin, F.; Dagnes, N.; Ulrich, L.; Tregnaghi, G.F. 3D Facial Action Units and Expression Recognition using a Crisp Logic. Comput. Aided Des. Appl. 2019, 16, 256–268. [Google Scholar] [CrossRef]
  48. Dagnes, N.; Marcolin, F.; Vazzetti, E.; Sarhan, F.R.; Dakpé, S.; Marin, F.; Nonis, F.; Mansour, K.B. Optimal marker set assessment for motion capture of 3D mimic facial movements. J. Biomech. 2019, 93, 86–93. [Google Scholar] [CrossRef]
  49. Sun, Y.; Liang, D.; Wang, X.; Tang, X. Deepid3: Face recognition with very deep neural networks. arXiv 2015, arXiv:1502.00873. [Google Scholar]
  50. Zhu, Z.; Luo, P.; Wang, X.; Tang, X. Recover Canonical-View Faces in the Wild with Deep Neural Networks. arXiv 2014, arXiv:1404.3543. [Google Scholar]
  51. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 1701–1708. [Google Scholar]
  52. Simonyan, K.; Zisserman, K. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  53. Sun, Y.; Wang, X.; Tang, X. Deep learning face representation from predicting 10,000 classes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; Volume 23, pp. 1891–1898. [Google Scholar]
  54. Sun, Y.; Chen, Y.; Wang, X.; Tang, X. Deep Learning Face representation by joint identification-verification. In Proceedings of the NIPS’14: Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014. [Google Scholar]
  55. Chen, D.; Cao, X.; Wang, L.; Wen, F.; Sun, J. Bayesian face revisited: A joint formulation. In Proceedings of the Computer Vision ECCV, Florence, Italy, 7–13 October 2012; pp. 566–579. [Google Scholar]
  56. Wang, J.; Song, Y.; Leung, T.; Rosenberg, C.; Wang, J.; Philbin, J.; Chen, B.; Wu, Y. Learning grained image similarity with deep ranking. In Proceedings of the CVPR 2014: 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 24–27 June 2014. [Google Scholar]
  57. Duan, Q.; Zhang, L. Look more into occlusion: Realistic face frontalization and recognition with boostgan. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 214–228. [Google Scholar] [CrossRef]
  58. Zhang, K.; Zhang, Z.; Li, Z.; Qiao, Y. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett. 2016, 23, 1499–1503. [Google Scholar] [CrossRef] [Green Version]
  59. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
  60. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875. [Google Scholar]
  61. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; pp. 818–833. [Google Scholar]
  62. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  63. Pei, Z.; Xu, H.; Zhang, Y.; Guo, M.; Yang, Y. Face recognition via deep learning using data augmentation based on orthogonal experiments. Electronics 2019, 8, 1088. [Google Scholar] [CrossRef] [Green Version]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
