Journal of Imaging
  • Article
  • Open Access

18 January 2023

Analysis of Real-Time Face-Verification Methods for Surveillance Applications

1 Instituto Politecnico Nacional, ESIME Culhuacan, Mexico City 04440, Mexico
2 Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Image Processing and Biometric Facial Analysis

Abstract

In the last decade, face-recognition and -verification methods based on deep learning have increasingly used deeper and more complex architectures to obtain state-of-the-art (SOTA) accuracy. Hence, these architectures are limited to powerful devices that can handle heavy computational resources. Conversely, lightweight and efficient methods have recently been proposed to achieve real-time performance on limited devices and embedded systems. However, real-time face-verification methods struggle with problems usually solved by their heavy counterparts—for example, illumination changes, occlusions, face rotation, and distance to the subject. These challenges are strongly related to surveillance applications that deal with low-resolution face images under unconstrained conditions. Therefore, this paper compares three SOTA real-time face-verification methods for coping with specific problems in surveillance applications. To this end, we created an evaluation subset from two available datasets consisting of 3000 face images presenting face rotation and low-resolution problems. We defined five groups of face rotation with five levels of resolutions that can appear in common surveillance scenarios. With our evaluation subset, we methodically evaluated the face-verification accuracy of MobileFaceNet, EfficientNet-B0, and GhostNet. Furthermore, we also evaluated them with conventional datasets, such as Cross-Pose LFW and QMUL-SurvFace. When examining the experimental results of the three mentioned datasets, we found that EfficientNet-B0 could deal with both surveillance problems, but MobileFaceNet was better at handling extreme face rotation over 80 degrees.

1. Introduction

Biometric recognition has played an important role in different application fields in recent decades. Frequent examples include face, iris, voice, palm, and fingerprint recognition []. One of the most widely used modalities is facial recognition, which has advanced considerably in the last decade thanks to improvements in face processing, detection, and recognition []. Its primary objective is identifying which faces belong to individual identities within a dataset. On the other hand, face verification consists of analyzing the facial features of an image to determine whether it belongs to the person it claims to be. Facial recognition and verification share problems related to illumination changes, occlusions, face rotation, and distance to the subject. These challenges are strongly related to video-surveillance applications; hence, the trending computer vision solution of deep learning can be used to address them. Deep neural networks (DNNs) are composed of several hidden layers with millions of artificial neurons connected and running in parallel to handle large amounts of data []. Among DNNs, convolutional neural networks (CNNs) are the best-fitting option for image classification and object detection [].
Currently, CNNs are more frequently used than traditional feature-extraction methods for face recognition, as they can solve common related issues such as changes in facial expressions, illumination, poses, low resolution, and occlusion []. CNNs are commonly built with complex architectures and high computational costs [], with examples such as DeepFace [], FaceNet [], ArcFace [], and MagFace []. Due to the huge amount of memory that these methods require, they are not designed to work in real time on embedded devices with limited resources [,]. Therefore, lightweight CNN architectures have arisen that cover some of the mentioned requirements []. MobileFaceNet [], EfficientNet-B0 [], and GhostNet [] are some of the lightweight architectures employed for face recognition and verification. Nonetheless, these methods struggle with problems usually solved by their heavier and more complex counterparts, such as face rotation and low-resolution face inputs.
In this paper, we present an analysis of current SOTA methods for face verification based on lightweight architectures. The analysis specifically focuses on the problems of different facial rotations and low resolution present in video-surveillance camera applications. The SOTA methods used in the analysis are the aforementioned MobileFaceNet [], EfficientNet-B0 [], and GhostNet [], as they are lightweight architectures that can run in real time on limited embedded devices. The datasets used to test the methods were Cross-Pose LFW (CPLFW) [] and QMUL-SurvFace [], as they include facial images in different poses and low-resolution images, which are the problems analyzed in the present work. Furthermore, to methodically analyze the effect of face rotation and low-resolution problems, we propose an evaluation subset of 3000 facial images drawn from the combination of the CPLFW [] and Celebrities in Frontal-Profile in the Wild (CFPW) [] datasets. We specifically define five groups of face rotation degrees with five levels of resolution that appear in common surveillance scenarios. With our complete analysis and based on the three datasets employed, we found that EfficientNet-B0 can deal with rotation and resolution problems, while MobileFaceNet is better at handling extreme face rotation over 80 degrees. The main contributions of this paper are three-fold:
  • An evaluation subset with 3000 facial images obtained from CPLFW [] and CFPW [] was divided into five intervals of rotation degree and five resolution levels to evaluate rotation and resolution variations methodically.
  • An analysis of three SOTA lightweight architectures (MobileFaceNet [], EfficientNet-B0 [], and GhostNet []) was carried out to deal with face-verification problems on conventional datasets (CPLFW [] and QMUL-SurvFace []).
  • A methodical analysis of the effect of facial rotation and low resolution was conducted for the face verification of the three aforementioned architectures.

3. Face Recognition in Real-Time

We considered the number of parameters and multiply–accumulate operations (MACs) to choose the real-time face-recognition methods for our analysis. Specifically, we limited our search to architectures with about 30 M parameters and about 200 M MACs, ensuring that they can be deployed on limited devices and embedded systems. The three methods chosen are detailed below.
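As a rough illustration of how such a parameter budget can be checked, the following PyTorch sketch counts the learnable parameters of a network; the torchvision MobileNetV2 loaded here is only a stand-in for the three face-verification networks, and the exact profiling tool used for MACs is not stated in this paper.

```python
import torch
import torchvision

# Minimal sketch for checking a model's parameter budget in PyTorch. The
# torchvision MobileNetV2 below is only a stand-in; the three analyzed
# networks would be loaded from their own repositories instead.
model = torchvision.models.mobilenet_v2()
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.2f} M")

# PyTorch does not report MACs directly; profilers such as ptflops or
# fvcore can estimate them for a given input size (e.g., 112 x 112).
```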

3.1. MobileFaceNet

In 2018, Chen et al. [] proposed MobileFaceNet (1.2 M params. and 228 M MACs), which is based on the inverted residual bottlenecks introduced by MobileNetV2 [], with small expansion factors as its main building blocks. The residual bottleneck block contains a three-layer convolution with direct access to the bottleneck connection, as shown in Figure 1. The depthwise separable convolutions of MobileNetV1 [] are used to reduce the size and complexity of the network []. In addition, the architecture uses the nonlinear activation function PReLU, which improves face-verification performance. One of the main contributions of MobileFaceNet is the replacement of the global average pooling (GAPool) layer with a global depthwise convolutional layer (GDConv), which obtains a more discriminative face representation. The GDConv layer assigns different levels of importance to different positions of the output feature maps and generates a 512-dimensional facial feature vector. GDConv is represented by:
$$G_m = \sum_{i,j} K_{i,j,m} \cdot F_{i,j,m}$$
where $K$ is a depthwise convolutional kernel of size $W \times H \times M$, $F$ is the input feature map of size $W \times H \times M$, $(i, j)$ denotes the spatial position in $K$ and $F$, and $m$ is the channel index. $G$ is the output of size $1 \times 1 \times M$, and $G_m$ is its $m$-th channel. Here, $W$ is the spatial width, $H$ the spatial height, and $M$ the number of input channels of the feature map. The GDConv layer has a computational cost of
$$W \cdot H \cdot M.$$
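As a minimal sketch (assuming the 7 × 7 × 512 feature map that MobileFaceNet feeds into this layer), GDConv can be written in PyTorch as a depthwise convolution whose kernel covers the whole spatial extent of the input; the class and argument names below are illustrative, not taken from the official implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of a global depthwise convolution (GDConv) layer, assuming a
# 7x7x512 input feature map as in MobileFaceNet.
class GDConv(nn.Module):
    def __init__(self, channels: int = 512, spatial: int = 7):
        super().__init__()
        # One WxH kernel per channel (groups=channels) and no padding, so the
        # spatial dimensions collapse to 1x1 and each output channel G_m is
        # the weighted sum over (i, j) of K[i, j, m] * F[i, j, m].
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=spatial,
                                groups=channels, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dwconv(x)  # (N, 512, 7, 7) -> (N, 512, 1, 1)

feat = torch.randn(1, 512, 7, 7)
vec = GDConv()(feat).flatten(1)   # 512-dimensional face descriptor
print(vec.shape)                  # torch.Size([1, 512])
```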
Figure 1. Bottleneck residual blocks in MobileFaceNet [].
The MobileFaceNet architecture is shown in Table 1, where $t$ denotes the expansion factor, $c$ the number of output channels, $n$ the number of times each block is repeated, and $s$ the stride []. It is worth noting that MobileFaceNet has been tested and employed in different face-recognition applications, such as in [,,].
Table 1. MobileFaceNet architecture [].

3.2. EfficientNet

In 2019, Tan and Le [] introduced EfficientNet (33 M params. and 78 M MACs), which combines a neural architecture search (NAS) with a compound scaling method to jointly optimize the training speed and efficiency. The idea of EfficientNet is to expand the width, depth, and resolution of the network through the compound scaling method, as shown in Figure 2e, where a single coefficient is used to uniformly scale all three dimensions []. The compound scaling method is expressed by the following equations:
$$\mathrm{depth:}\ d = \alpha^{\phi}, \qquad \mathrm{width:}\ w = \beta^{\phi}, \qquad \mathrm{resolution:}\ r = \gamma^{\phi}$$
$$\mathrm{s.t.}\quad \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \qquad \alpha \geq 1,\ \beta \geq 1,\ \gamma \geq 1$$
where $\alpha$, $\beta$, and $\gamma$ are the scaling coefficients of the network depth, width, and resolution, respectively (the baseline network they scale is built from MBConv blocks found by the NAS). For a given compound coefficient $\phi$, the values of $\alpha$, $\beta$, and $\gamma$ are chosen to maximize the recognition accuracy, while $\phi$ itself is adjusted according to the available computational resources [].
Figure 2. Architectural representation of standard and compound scaling methods: (a) a generic ConvNet, (b) width scaling, (c) depth scaling, (d) resolution scaling, and (e) compound scaling.
The reference network EfficientNet-B0 is obtained by finding the coefficients $\alpha$, $\beta$, and $\gamma$ with a small grid search when $\phi = 1$. More complex versions of EfficientNet are obtained by scaling this reference network with larger values of $\phi$ (EfficientNet-B1 to B7) [].
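The following sketch illustrates the scaling rule using the coefficients reported for the EfficientNet family ($\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$, found with the grid search at $\phi = 1$); the function is only illustrative of how depth, width, and input resolution grow with $\phi$.

```python
# Minimal sketch of EfficientNet's compound scaling rule with the coefficients
# reported in the original paper (alpha=1.2, beta=1.1, gamma=1.15), which
# approximately satisfy alpha * beta^2 * gamma^2 ~= 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: float) -> tuple[float, float, float]:
    depth = ALPHA ** phi        # multiplier for the number of layers
    width = BETA ** phi         # multiplier for the number of channels
    resolution = GAMMA ** phi   # multiplier for the input image size
    return depth, width, resolution

for phi in range(4):  # B0-like baseline (phi=0/1) up to larger variants
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```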
The EfficientNet-B0 architecture is shown in Table 2. The number of output feature channels and convolutional layers of each stage are shown as channels and layers, respectively. EfficientNet mainly comprises mobile inverted bottleneck convolution (MBConv1, MBConv6), standard convolutional layers, pooling layers, and one fully connected layer [].
Table 2. EfficientNet-B0 architecture [].

3.3. GhostNet

In 2020, Han et al. [] presented GhostNet (27 M params. and 194 M MACs), mainly constituted by the proposed Ghost modules. The main contribution of these modules is to substitute a significant part of the convolutional filters with a series of linear transformations. The Ghost feature maps are generated by cheap operations, saving computation compared with standard convolutions. A Ghost module is shown in Figure 3, and its primary convolution can be expressed by
$$Y = X \ast F$$
where $Y$ contains the $m$ intrinsic feature maps generated by the primary convolution, $X$ is the input feature map, $\ast$ denotes the convolution operation, and $F$ denotes the convolutional filters. The Ghost feature maps are then given by
$$y_{ij} = \Phi_{i,j}(y_i), \qquad i = 1, \ldots, m, \quad j = 1, \ldots, s,$$
Figure 3. Representation of a GhostNet module [].
where $\Phi_{i,j}$ is the $j$-th linear operation used to generate the $j$-th Ghost feature map $y_{ij}$ from $y_i$, the $i$-th intrinsic feature map in $Y$; the last operation $\Phi_{i,s}$ is the identity mapping, which preserves the intrinsic feature maps. The Ghost module is plug-and-play and can be used to upgrade existing CNNs [].
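As a minimal sketch of this idea (assuming a ratio of $s = 2$ between total and intrinsic feature maps, and using depthwise convolutions as the linear transformations, as in the original paper), a Ghost module could be written in PyTorch as follows; the class and parameter names are illustrative rather than the official implementation.

```python
import torch
import torch.nn as nn

# Minimal sketch of a Ghost module: a primary convolution produces m intrinsic
# feature maps, and cheap depthwise (linear) operations generate the remaining
# "ghost" maps, which are concatenated with the intrinsic ones.
class GhostModule(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2,
                 kernel: int = 1, cheap_kernel: int = 3):
        super().__init__()
        intrinsic = out_ch // ratio          # m intrinsic maps
        ghost = out_ch - intrinsic           # maps from cheap operations
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, intrinsic, kernel, bias=False),
            nn.BatchNorm2d(intrinsic), nn.ReLU(inplace=True))
        # Depthwise convolutions act as the linear transformations Phi_{i,j}.
        self.cheap = nn.Sequential(
            nn.Conv2d(intrinsic, ghost, cheap_kernel,
                      padding=cheap_kernel // 2, groups=intrinsic, bias=False),
            nn.BatchNorm2d(ghost), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        # Concatenation keeps the intrinsic maps unchanged, playing the role
        # of the identity mapping Phi_{i,s}.
        return torch.cat([y, self.cheap(y)], dim=1)

print(GhostModule(16, 32)(torch.randn(1, 16, 56, 56)).shape)  # (1, 32, 56, 56)
```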
The GhostNet architecture is shown in Table 3, where t denotes the expansion size, c is the number of output channels, SE indicates whether the squeeze-and-excitation (SE) module is used, and stride is the number of steps that the neural network filter moves in the image []. Bottlenecks are gathered according to the sizes of the input feature maps []. The average pooling and a convolutional layer are used to transform the feature maps into a 1280-dimensional vector for the classification [].
Table 3. GhostNet architecture [].

4. Experiment Setup

This section presents the implementation details used for evaluating the MobileFaceNet, EfficientNet-B0, and GhostNet architectures. We specifically compared their face-verification performance, first using the conventional CPLFW and QMUL-SurvFace datasets to analyze scenarios where face rotation and low-resolution images appear in video-surveillance cameras (Experiment 1). In addition, the proposed evaluation subset was used to methodically analyze the impact of face rotation (using particular rotation-degree groups) and of low resolution (using specific image sizes). The main goal of our analysis was to understand how images with rotation or low resolution affect the face-verification performance of the SOTA lightweight architectures (Experiment 2).

4.1. Implementation Details

All experiments were run on a computer with a 7th-generation Intel Core i7 processor, 32 GB of RAM, and a single NVIDIA GTX 1060 GPU. We used Python 3.10, Torch 1.12.0, and Torchvision 0.13.0 with CUDA 11.3. To obtain the verification accuracy, we employed the pre-trained models (MobileFaceNet [], EfficientNet-B0 [], and GhostNet []) shared by the FaceX-Zoo repository []. These models were trained on the MS-Celeb1M-v1c [] dataset with a stochastic gradient descent (SGD) optimizer, a momentum of 0.9, and the MV-Softmax [] loss function. The training batch size was 512, with a total of 18 epochs and a learning rate initialized at 0.1 and divided by 10 at Epochs 10, 13, and 16. To perform the tests with the CPLFW dataset, QMUL-SurvFace, and the proposed evaluation subset, the images were normalized to 112 × 112 pixels using the same parameters as in [].
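For reference, the sketch below shows how pairwise verification accuracy is typically computed from such pre-trained models: embeddings are L2-normalized, compared with cosine similarity, and thresholded. The helper names and the threshold value are illustrative assumptions, not taken from the FaceX-Zoo evaluation code.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of pairwise face verification, assuming `model` maps an
# aligned 112x112 face tensor to an embedding. The threshold is illustrative;
# in practice it is selected on held-out folds, as in LFW-style protocols.
@torch.no_grad()
def verify(model, img_a: torch.Tensor, img_b: torch.Tensor,
           threshold: float = 0.3) -> bool:
    emb_a = F.normalize(model(img_a.unsqueeze(0)), dim=1)
    emb_b = F.normalize(model(img_b.unsqueeze(0)), dim=1)
    similarity = (emb_a * emb_b).sum().item()   # cosine similarity
    return similarity > threshold               # same identity if above

def accuracy(model, pairs, labels, threshold: float = 0.3) -> float:
    # `pairs` is a list of (img_a, img_b) tensors, `labels` 1 for same identity.
    correct = sum(verify(model, a, b, threshold) == bool(y)
                  for (a, b), y in zip(pairs, labels))
    return correct / len(labels)
```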

4.2. Datasets

The CPLFW [] dataset contains 11,652 images of 3930 identities at a resolution of 250 × 250 pixels with different facial pose variations. We used 6000 pairs in total (3000 positive and 3000 negative pairs) for the evaluation. The QMUL-SurvFace [] dataset comprises 463,507 video-surveillance images of 15,573 identities, of which 10,638 identities have two or more images, with resolutions between 6 × 5 and 124 × 106 pixels. The average resolution is 24 × 20 pixels, and the dataset can be used for facial verification and identification []. A total of 10,640 pairs (5320 positive and 5320 negative) were used in our evaluation.
To methodically analyze face-verification performance in scenarios where variations such as face rotation and low resolution are present, we designed an evaluation subset using the CPLFW [] and CFPW [] datasets. The CFPW [] dataset has 7000 images of 500 identities, with 10 frontal and 4 profile pictures each. For the construction of our evaluation subset, we used a facial-pose-estimation method (6DRepNet []) to determine the rotation degree and thus divide the images into 5 angle intervals ([0°; 20°], [20°; 40°], [40°; 60°], [60°; 80°], and [80°; 180°]).
The facial-pose estimation method 6DRepNet [] is based on a CNN and uses a 6D continuous rotation matrix for compressed regression. Thus, it can learn the entire facial rotation appearance using a geodesic loss to penalize the network with respect to the special orthogonal group SO(3) geometry. The publicly available code of 6DRepNet [] was used to obtain the rotation angle from all faces.
It is worth noting that, for each pair of images in our evaluation subset, we specifically selected one image in frontal view and another with a rotation angle. In this way, we emulated security-video-surveillance applications. Table 4 summarizes our evaluation subset, with 200 pairs for each of the intervals [0°; 20°], [20°; 40°], [40°; 60°], and [60°; 80°] and 700 pairs for [80°; 180°]. Figure 4 shows some examples of the pairs included. Furthermore, to reproduce the challenge of distance to the subject in video-surveillance cameras, we reduced the resolution of our evaluation subset. Figure 5 shows an example of the five resolution levels, their equivalent at the standard resolution, and the resized input to the three methods.
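A minimal sketch of how this subset can be organized is shown below, assuming that the yaw angle of each face has already been obtained with the public 6DRepNet code; the binning function and the resolution-resampling helper are illustrative assumptions, not the scripts used for the paper.

```python
from PIL import Image

# Angle intervals and resolution levels used in the proposed evaluation subset.
INTERVALS = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 180)]
RESOLUTIONS = [14, 28, 42, 84, 112]

def angle_interval(yaw_deg: float):
    """Assign a face to one of the five rotation intervals given its
    yaw angle in degrees (e.g., estimated with 6DRepNet)."""
    yaw = abs(yaw_deg)
    for low, high in INTERVALS:
        if low <= yaw < high or (high == 180 and yaw <= 180):
            return (low, high)
    return None

def resolution_variants(path: str) -> dict:
    """Down-sample an image to each resolution level to simulate distance to
    the subject, then re-scale it to the 112 x 112 network input size."""
    img = Image.open(path).convert("RGB")
    variants = {}
    for res in RESOLUTIONS:
        low = img.resize((res, res), Image.BILINEAR)
        variants[res] = low.resize((112, 112), Image.BILINEAR)
    return variants
```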
Table 4. Number of images and pairs per interval of each dataset used to build the proposed evaluation subset.
Figure 4. Examples of the pairs chosen for the proposed evaluation subset.
Figure 5. Example of the five levels of resolution in the proposed evaluation subset.

5. Experimental Results

5.1. Evaluation with Conventional Datasets

In the first experiment, we analyzed the performance of lightweight architectures with 6000 pairs from CPLFW []. Table 5 shows the facial-verification performance for the three pre-trained models.
Table 5. Verification performance with the CPLFW [] dataset.
Table 5 shows that, for the CPLFW [] dataset, the EfficientNet-B0 [] model has the best verification performance compared to the other two models. To analyze the facial-verification performance with respect to angle rotation, we also used the 6DRepNet [] method. Unfortunately, we could only obtain 5864 pairs. The pairs not included correspond to misdetections caused by heavy occlusions generated by rotations greater than 90°, soccer helmets, cropped images, etc. Figure 6 shows examples of the occlusions found in the faces not included. We grouped the detected pairs by the angle difference between each image pair. Hence, we again defined five intervals: [0°; 20°], [20°; 40°], [40°; 60°], [60°; 80°], and [80°; 180°]. Table 6 shows the verification performance for each angle interval.
Figure 6. Examples of the occlusions found in the faces not included.
Table 6. Verification performance over the five intervals using the CPLFW [] dataset.
As we can see in Table 6, EfficientNet-B0 [] has the best verification performance for all intervals. It is important to note that the accuracy of the [0°; 20°] interval is lower than that of [20°; 40°]. This inconsistency in the results can be attributed to angle-detection problems. Figure 7 shows examples of image pairs that are supposed to belong to the [0°; 20°] interval, where we can see the apparent misdetection problems. However, with this test, we can see that, in general, if the rotation angle increases, the verification accuracy decreases.
Figure 7. Example of image pairs with facial pose estimation error (attributed to the [0°; 20°] interval).
Figure 8 shows examples of image pairs incorrectly classified by EfficientNet-B0. In these two intervals, the images present occlusions (images with missing pixels in the face, glasses, and cap) and extreme rotations, making facial verification difficult.
Figure 8. Example of CPLFW image pairs incorrectly classified by EfficientNet-B0.
Next, we also analyzed the performance of the three methods using 10,640 image pairs from the challenging QMUL-SurvFace dataset []. Table 7 shows the verification performance, where EfficientNet-B0 achieved the best results again. It is important to note that the results of all methods are low due to the image quality, capture distance, occlusions, and extreme rotations. Figure 9 shows examples of image pairs incorrectly classified by EfficientNet-B0.
Table 7. Verification performance with the QMUL-SurvFace [] dataset.
Figure 9. Example of QMUL-SurvFace image pairs incorrectly classified by EfficientNet-B0.

5.2. Evaluation with the Proposed Evaluation Subset

We started this test by analyzing the performance of the three methods with 1500 pairs from the proposed evaluation subset. Table 8 shows the face-verification results, where, surprisingly, MobileFaceNet [] had the best performance. We also analyzed the performance of all methods with facial rotations divided into five angle intervals. Table 9 shows the verification performance for each interval.
Table 8. Verification performance with the proposed evaluation subset.
Table 9. Verification performance over the five intervals with the proposed evaluation subset.
We can see from Table 9 that MobileFaceNet [] has the best verification performance in the intervals [0°; 20°], [20°; 40°], and [80°; 180°]. Meanwhile, EfficientNet-B0 [] has the best verification performance for [20°; 40°], [40°; 60°], and [60°; 80°]. Thus, MobileFaceNet [] has the best overall accuracy, and it is the best method for handling extreme facial rotation at angles greater than 80°. We also found that the verification accuracy decreased as the rotation angle increased, since the more a face is rotated, the less identity information its feature vector can provide. Figure 10 shows examples of image pairs misclassified by MobileFaceNet.
Figure 10. Example of evaluation subset pairs incorrectly classified by MobileFaceNet.
Furthermore, we analyzed the performance of the three methods with resolutions of 14 × 14, 28 × 28, 42 × 42, 84 × 84, and 112 × 112 pixels in our evaluation subset. Table 10 shows the obtained verification accuracy for the different resolution levels. MobileFaceNet [] achieved the best results from 28 × 28 to 112 × 112 pixels. This may be attributed to the richness of the feature vector generated with the GDConv layer of the architecture. On the other hand, EfficientNet-B0 [] had the best verification performance at 14 × 14 pixels, which can be attributed to the specific filter sizes found by the NAS procedure.
Table 10. Verification performance of the five resolution levels with the proposed evaluation subset.
We also analyzed facial rotation together with different resolutions. Figure 11 shows plots for each angle interval with different resolution levels. In Figure 11a, it can be seen that MobileFaceNet [] had the best performance when working with images equal to or greater than 84 pixels, EfficientNet-B0 [] was the best for images of 14 and 42 pixels, and GhostNet [] was the best for images of 28 to 42 pixels. In Figure 11b, MobileFaceNet [] had the best performance for images equal to or larger than 84 pixels, while EfficientNet-B0 [] was the best for images from 14 to 42 pixels and at 112 pixels. Figure 11c shows that MobileFaceNet [] had the best performance for images of 28 pixels; EfficientNet-B0 [] was the best at 14, 42, and 112 pixels, and GhostNet [] was the best at 84 pixels. Figure 11d shows that EfficientNet-B0 [] achieved the best results for 14- to 112-pixel images. Figure 11e indicates that MobileFaceNet [] had the best performance when working with 28- to 112-pixel images, while GhostNet [] was the best for 14-pixel images.
Figure 11. Plot results for each angle range with different resolution levels, where the following intervals are shown: (a) [0°; 20°], (b) [20°; 40°], (c) [40°; 60°], (d) [60°; 80°], and (e) [80°; 180°]. Pixel resolutions of 14 × 14, 28 × 28, 42 × 42, 84 × 84, and 112 × 112 are used in all intervals.
In summary, EfficientNet-B0 [] is the best method for working with 14 × 14-pixel images in all of the intervals except [80°; 180°]. MobileFaceNet [], with image resolutions from 28 × 28 to 112 × 112 pixels, proved to be the best method in the [80°; 180°] interval, where extreme rotations are found. Figure 12 shows examples of image pairs misclassified by MobileFaceNet, where we can qualitatively corroborate the challenges for each angle and resolution interval.
Figure 12. Examples of evaluation subset pairs per resolution interval incorrectly classified by MobileFaceNet.
Finally, in Table 11, we present the inference time of each method running on a single GPU (NVIDIA GTX 1060) and CPU (Intel Core i7). The time was averaged over 500 single passes of 112 × 112-pixel images. In this table, we can see that MobileFaceNet is the only approach that can surpass the real-time barrier of 30 FPS. However, all methods can run over 15 FPS, which is considered efficient on a CPU and low-cost GPU such as the GTX 1060.
Table 11. Inference time.
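A minimal sketch of how such averaged timings can be obtained is given below, assuming `model` is one of the three networks; the warm-up passes and explicit CUDA synchronization are standard measurement practice rather than details stated in the paper.

```python
import time
import torch

# Minimal sketch of measuring the mean per-image inference time, averaged over
# 500 single passes of 112x112 inputs. CUDA kernels are asynchronous, so the
# device must be synchronized before reading the clock.
@torch.no_grad()
def mean_inference_time(model, device: str = "cuda", runs: int = 500) -> float:
    model = model.to(device).eval()
    dummy = torch.randn(1, 3, 112, 112, device=device)
    for _ in range(10):                  # warm-up passes
        model(dummy)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        model(dummy)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs   # seconds per image
```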

6. Discussion

Based on the analysis using the two conventional datasets, EfficientNet-B0 demonstrated that it could handle different facial rotations, prominent occlusions, illumination changes, and low resolutions. This is because the mobile inverted bottleneck convolution first expands the channels and then compresses them, and skip connections between the layers with fewer channels help obtain discriminative feature maps that generalize facial features. Therefore, such features (facial contour, nose, eyes, eyebrows, mouth, etc.) can be enriched between each pair of images during training.
An evaluation subset was designed to analyze only the variations in rotation and resolution, in order to understand how the methods behave with images that can be obtained from video-surveillance cameras. This evaluation subset has well-defined image pairs for each angle range and five resolution levels. EfficientNet-B0 proved to be the best method for resolutions of 14 × 14 pixels and rotations of less than 80°. On the other hand, MobileFaceNet proved to be the best with extreme rotations (greater than 80°) at resolutions from 28 × 28 to 112 × 112 pixels. This might relate to the global depthwise convolutional module, which is responsible for obtaining rich feature maps from specific regions of the face. GhostNet, on average, did not perform well because the Ghost modules produce less discriminative facial features, which is attributed to the "cheap" feature maps calculated by linear transformations instead of standard convolutional operations.

7. Conclusions

In this paper, we analyzed the real-time face-verification methods of MobileFaceNet, EfficientNet-B0, and GhostNet using datasets explicitly focusing on problems present in video-surveillance applications. We tested their performance on conventional datasets (CPLFW and QMUL-SurvFace) that also have different illuminations, occlusions, and facial expressions. In addition, we proposed an evaluation subset that focused only on the problems of facial rotation and low resolutions, divided into five angle intervals and five levels of resolution. The experimental results showed that, for resolutions of 14 × 14 pixels with angles less than 80°, EfficientNet-B0 was the best method. MobileFaceNet, at angles greater than 80° and with resolutions of 28 × 28 up to 112 × 112 pixels, proved to be the best method compared to the other two. Therefore, we can conclude that using the three mentioned datasets, EfficientNet-B0 can cope with facial rotation variations and low resolutions in general, while MobileFaceNet can cope with extreme rotations. Nonetheless, all analyzed methods can run on limited devices and embedded systems in real-time.

Author Contributions

Conceptualization, J.O.-M. and G.B.-G.; funding acquisition, O.L.-G.; investigation, F.P.-M.; methodology, G.B.-G.; project administration, G.S.-P.; software, F.P.-M.; supervision, G.S.-P.; validation, L.P.-T.; writing—original draft, F.P.-M.; writing—review and editing, J.O.-M., G.S.-P., G.B.-G., L.P.-T. and O.L.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. de Freitas Pereira, T.; Schmidli, D.; Linghu, Y.; Zhang, X.; Marcel, S.; Günther, M. Eight Years of Face Recognition Research: Reproducibility, Achievements and Open Issues. arXiv 2022, arXiv:2208.04040. [Google Scholar]
  2. Sundaram, M.; Mani, A. Face Recognition: Demystification of Multifarious Aspect in Evaluation Metrics; Intech: London, UK, 2016. [Google Scholar]
  3. Zhang, A.; Lipton, Z.C.; Li, M.; Smola, A.J. Dive into deep learning. arXiv 2021, arXiv:2106.11342. [Google Scholar]
  4. Boutros, F.; Damer, N.; Kuijper, A. QuantFace: Towards lightweight face recognition by synthetic data low-bit quantization. arXiv 2022, arXiv:2206.10526. [Google Scholar]
  5. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. Deepface: Closing the gap to human-level performance in face verification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708. [Google Scholar]
  6. Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
  7. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4690–4699. [Google Scholar]
  8. Meng, Q.; Zhao, S.; Huang, Z.; Zhou, F. Magface: A universal representation for face recognition and quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 14225–14234. [Google Scholar]
  9. Boutros, F.; Siebke, P.; Klemt, M.; Damer, N.; Kirchbuchner, F.; Kuijper, A. PocketNet: Extreme lightweight face recognition network using neural architecture search and multistep knowledge distillation. IEEE Access 2022, 10, 46823–46833. [Google Scholar] [CrossRef]
  10. Chen, S.; Liu, Y.; Gao, X.; Han, Z. Mobilefacenets: Efficient cnns for accurate real-time face verification on mobile devices. In Proceedings of the Chinese Conference on Biometric Recognition, Urumqi, China, 11–12 August 2018; pp. 428–438. [Google Scholar]
  11. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  12. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1580–1589. [Google Scholar]
  13. Zheng, T.; Deng, W. Cross-pose lfw: A database for studying cross-pose face recognition in unconstrained environments. Beijing Univ. Posts Telecommun. Tech. Rep. 2018, 5, 7. [Google Scholar]
  14. Cheng, Z.; Zhu, X.; Gong, S. Surveillance face recognition challenge. arXiv 2018, arXiv:1804.09691. [Google Scholar]
  15. Sengupta, S.; Chen, J.C.; Castillo, C.; Patel, V.M.; Chellappa, R.; Jacobs, D.W. Frontal to profile face verification in the wild. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–9. [Google Scholar]
  16. Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Learning face representation from scratch. arXiv 2014, arXiv:1411.7923. [Google Scholar]
  17. Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep Face Recognition; BMVC Press: Swansea, UK, 2015. [Google Scholar]
  18. Guo, Y.; Zhang, L.; Hu, Y.; He, X.; Gao, J. Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 87–102. [Google Scholar]
  19. Gecer, B.; Bhattarai, B.; Kittler, J.; Kim, T.K. Semi-supervised adversarial learning to generate photorealistic face images of new identities from 3d morphable model. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 217–234. [Google Scholar]
  20. Zhu, Z.; Huang, G.; Deng, J.; Ye, Y.; Huang, J.; Chen, X.; Zhu, J.; Yang, T.; Guo, J.; Lu, J.; et al. Masked face recognition challenge: The webface260m track report. arXiv 2021, arXiv:2108.07189. [Google Scholar]
  21. DeepGlint. Trillion Pairs Testing Faceset; DeepGlint: Beijing, China, 2019. [Google Scholar]
  22. Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled faces in the wild: A database forstudying face recognition in unconstrained environments. In Proceedings of the Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, Marseille, France, 17–20 October 2008. [Google Scholar]
  23. Gross, R.; Matthews, I.; Cohn, J.; Kanade, T.; Baker, S. Multi-pie. Image Vis. Comput. 2010, 28, 807–813. [Google Scholar] [CrossRef]
  24. Grgic, M.; Delac, K.; Grgic, S. SCface–surveillance cameras face database. Multimed. Tools Appl. 2011, 51, 863–879. [Google Scholar] [CrossRef]
  25. Wang, M.; Deng, W. Deep face recognition: A survey. Neurocomputing 2021, 429, 215–244. [Google Scholar] [CrossRef]
  26. Chen, J.; Guo, Z.; Hu, J. Ring-regularized cosine similarity learning for fine-grained face verification. Pattern Recognit. Lett. 2021, 148, 68–74. [Google Scholar] [CrossRef]
  27. Chen, J.C.; Patel, V.M.; Chellappa, R. Unconstrained face verification using deep cnn features. In Proceedings of the 2016 IEEE winter conference on applications of computer vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; pp. 1–9. [Google Scholar]
  28. Guo, G.; Zhang, N. A survey on deep learning based face recognition. Comput. Vis. Image Underst. 2019, 189, 102805. [Google Scholar] [CrossRef]
  29. Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 212–220. [Google Scholar]
  30. Wang, H.; Wang, Y.; Zhou, Z.; Ji, X.; Gong, D.; Zhou, J.; Li, Z.; Liu, W. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5265–5274. [Google Scholar]
  31. Zhang, X.; Zhao, R.; Qiao, Y.; Wang, X.; Li, H. Adacos: Adaptively scaling cosine logits for effectively learning deep face representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10823–10832. [Google Scholar]
  32. Wang, X.; Zhang, S.; Wang, S.; Fu, T.; Shi, H.; Mei, T. Mis-Classified Vector Guided Softmax Loss for Face Recognition. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12241–12248. [Google Scholar] [CrossRef]
  33. Boutros, F.; Damer, N.; Kirchbuchner, F.; Kuijper, A. Elasticface: Elastic margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1578–1587. [Google Scholar]
  34. Zhao, J.; Cheng, Y.; Xu, Y.; Xiong, L.; Li, J.; Zhao, F.; Jayashree, K.; Pranata, S.; Shen, S.; Xing, J.; et al. Towards pose invariant face recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2207–2216. [Google Scholar]
  35. Ju, Y.J.; Lee, G.H.; Hong, J.H.; Lee, S.W. Complete face recovery gan: Unsupervised joint face rotation and de-occlusion from a single-view image. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2022; pp. 3711–3721. [Google Scholar]
  36. Nam, G.P.; Choi, H.; Cho, J.; Kim, I.J. PSI-CNN: A pyramid-based scale-invariant CNN architecture for face recognition robust to various image resolutions. Appl. Sci. 2018, 8, 1561. [Google Scholar] [CrossRef]
  37. Shahbakhsh, M.B.; Hassanpour, H. Empowering Face Recognition Methods Using a GAN-based Single Image Super-Resolution Network. Int. J. Eng. 2022, 35, 1858–1866. [Google Scholar] [CrossRef]
  38. Maity, S.; Abdel-Mottaleb, M.; Asfour, S.S. Multimodal low resolution face and frontal gait recognition from surveillance video. Electronics 2021, 10, 1013. [Google Scholar] [CrossRef]
  39. Nadeem, A.; Ashraf, M.; Rizwan, K.; Qadeer, N.; AlZahrani, A.; Mehmood, A.; Abbasi, Q.H. A Novel Integration of Face-Recognition Algorithms with a Soft Voting Scheme for Efficiently Tracking Missing Person in Challenging Large-Gathering Scenarios. Sensors 2022, 22, 1153. [Google Scholar] [CrossRef]
  40. Mishra, N.K.; Dutta, M.; Singh, S.K. Multiscale parallel deep CNN (mpdCNN) architecture for the real low-resolution face recognition for surveillance. Image Vis. Comput. 2021, 115, 104290. [Google Scholar] [CrossRef]
  41. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar]
  42. Martínez-Díaz, Y.; Méndez-Vázquez, H.; Luevano, L.S.; Chang, L.; Gonzalez-Mendoza, M. Lightweight low-resolution face recognition for surveillance applications. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 5421–5428. [Google Scholar]
  43. Oo, S.L.M.; Oo, A.N. Child Face Recognition System Using Mobilefacenet. Ph.D. Thesis, University of Information Technology, Mandalay, Myanmar, 2019. [Google Scholar]
  44. Xiao, J.; Jiang, G.; Liu, H. A Lightweight Face Recognition Model based on MobileFaceNet for Limited Computation Environment. EAI Endorsed Trans. Internet Things 2021, 7, 1–9. [Google Scholar] [CrossRef]
  45. Wang, J.; Liu, Y.; Hu, Y.; Shi, H.; Mei, T. Facex-zoo: A pytorch toolbox for face recognition. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, 20–24 October 2021; pp. 3779–3782. [Google Scholar]
  46. Hempel, T.; Abdelrahman, A.A.; Al-Hamadi, A. 6D Rotation Representation For Unconstrained Head Pose Estimation. arXiv 2022, arXiv:2202.12555. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
