Entropy-Based Face Recognition and Spoof Detection for Security Applications

Nowadays, cyber attacks are becoming an extremely serious issue, and preventing them is particularly important in a smart city context. Among cyber attacks, spoofing is increasingly common in many areas, such as emails, geolocation services or social networks. Identity spoofing is defined as the action by which a person impersonates a third party to carry out illegal activities such as fraud, cyberbullying or sextortion. In this work, a face recognition system is proposed, with an application to spoofing prevention. The method is based on the Histogram of Oriented Gradients (HOG) descriptor. Since different face regions do not carry the same information for the recognition process, entropy is introduced to quantify the importance of each face region in the descriptor, which increases the robustness of the algorithm. Regarding face recognition, our approach has been tested on three well-known databases (ORL, FERET and LFW), and the experiments show that adding entropy information improves the recognition rate significantly, with an increase of over 40% in some of the considered databases. Spoofing tests have been carried out on the CASIA FASD and MIFS databases, again obtaining better results than approaches based on similar texture descriptors.


Introduction
Biometrics relies on measuring different human characteristics and matching them against measurements previously collected in a database. Biometric features are "built into" the body, so they cannot easily be shared or stolen. Even though these systems seem extremely reliable, it is always possible to capture a legitimate user's biometric trait, copy it and have someone else replicate it later. The act of using an artifact to fool a biometric system, where someone pretends to be another person, is known as a spoof attack [1].
Nowadays, fingerprint hardware systems represent more than 92% of the total biometrics market [2]. With the rise of facial identification in mobile systems, experts forecast that the annual market for facial recognition devices and licenses will increase from $28.5 mn in 2015 to more than $122.8 mn worldwide by 2024. During the same period, annual revenue from facial biometrics, including both visible-light facial recognition and infrared-based facial thermography, is expected to increase from $149.5 mn to $882.5 mn, a compound annual growth rate (CAGR) of 22% [3].
In the last few years, image capture technology has evolved, allowing consumers to buy high-resolution cameras at a very low cost, especially through the billions of smartphones that give users a digital camera constantly at hand. Taking advantage of this situation, it seems straightforward to prevent spoofing by authenticating users with biometric sensors for fingerprints, iris or facial features. The annual revenue from mobile biometric systems and applications is expected to grow from $6.5 bn in 2016 to $50.6 bn in 2022, a compound annual growth rate of 41% [4].
Biometric systems can be compromised and are vulnerable to a wide range of attacks. Among all these potential attacks, the one with the greatest practical relevance is the spoof attack. As mentioned before, it consists of submitting a stolen or copied biometric trait to the sensor in order to defeat the biometric system and gain unauthorized access. These attacks do not require any knowledge about the security system itself: since the authorized user is able to access it, the attacker just needs to reproduce that user's biometric trait. For this reason, the protection mechanisms provided by most security systems, such as hashing, digital signatures or encryption, are ineffective against spoof attacks [5]. In the last few years there has been intensive research on reliable anti-spoofing systems for biometric traits, including fingerprints [6,7], face [8][9][10], and other biometric features [11][12][13].
Spoofing attacks have grown exponentially in the last few years [14][15][16]. Among other areas, social networks have recently reported serious privacy and security issues [17][18][19]. Therefore, to ensure a higher security level in social networks, it would be advisable to implement some kind of spoofing detector. However, manually supervising all the information on a social network is unfeasible due to the huge amount of data produced at any given time. One of the most common ways in which cyberbullying develops is through identity spoofing. In this case, false user profiles attributed to the victim can be created. An attacker may also gain access to the victim's profile or personal account on different social networks and spoof their identity by contacting others or posting comments on their behalf.
Consequently, the main objective of this work is to propose and develop a novel entropy-based face recognition and spoof detection method that can be applied to social media. Entropy has been used in face recognition in recent years [20,21]. Since different areas of a face image contribute differently to the global recognition, the entropy of each area is used here to construct a new version of the Histogram of Oriented Gradients (HOG) descriptor. The main contribution of this paper is therefore this new HOG-based descriptor, which uses entropy to weight each area of a face image. After the Entropy-Based Histogram of Oriented Gradients (EBHOG) descriptor is computed, Support Vector Machines are used for classification. Our system has been tested with three face recognition databases (Olivetti Research Laboratory: ORL, FERET and Labeled Faces in the Wild: LFW) and two face spoof detection datasets (CASIA Face Anti-Spoofing Database: FASD and Makeup Induced Face Spoofing: MIFS), obtaining reliable results and outperforming other recent works that use texture descriptors on the same databases. This paper is organized as follows: Section 2 summarizes related work; Section 3 describes our proposed approach to face detection, recognition and spoofing detection, introducing the EBHOG descriptor; Section 4 describes the experimental setup and the experiments completed with different databases; and finally, conclusions and future work are discussed in Section 5.

Related Work
On social networks, a large number of pictures and videos are uploaded and shared every day, and users can post an image in which someone else appears without his/her consent. A face detection and recognition system could act before the image is published, identifying the people appearing in it and notifying those users so that they can give consent, thus protecting their privacy and increasing security by reporting potential cases of spoofing.
Some face recognition systems can be spoofed by showing a photograph, a video or even a face model of the authorized user. Spoofing attacks can be detected using several methods. When detection and recognition are required to work in real time, they must be computationally inexpensive, yet most recognition methods are not fast enough or rely on non-conventional images [22]. Moreover, due to social image sharing and social networking websites, personal facial photographs of many users are usually accessible to the public. For instance, an impostor can obtain photographs of genuine users from a social network and submit them to a biometric authentication system to fool it [23].
Over the last few years, a wide variety of feature representation methods have been proposed to describe scenes, objects and biometric features in images. Each of these methods describes different aspects of visual features and is best suited to particular conditions. Some focus on local information, while others are holistic descriptors. Among local feature descriptors, the most commonly used are SIFT (Scale-Invariant Feature Transform) [24,25], HOG (Histogram of Oriented Gradients) [26,27], SURF (Speeded-up Robust Features) [28,29] and LBP (Local Binary Patterns) [30,31], which are used to address variability in the image caused by changes in perspective, occlusions and variations in brightness.
As in many other machine learning applications, deep learning methods have proven to be an effective way to detect spoofing attacks. Many related works have treated face spoofing as a binary classification problem, where the system classifies a face as belonging to either a legitimate user or a fake user [32,33]. For instance, in [34] the authors use the CaffeNet and GoogLeNet convolutional neural network (CNN) models and perform a texture analysis. Alotaibi and Mahmood [35] presented a nonlinear diffusion method to distinguish a fake image from a real one, which is then followed by a CNN for face liveness detection. Finally, an LBP network for face spoofing detection, in which LBP features are combined with deep learning, is proposed in [36]. In spite of their immense potential, deep learning methods are computationally expensive, need extremely large training datasets, and their internal complexity makes it difficult in some applications to interpret the results or understand how the algorithm works [37][38][39].
The HOG descriptor is one of the most popular approaches for object detection. It is robust to illumination changes and to small geometric transformations, and it has been successfully applied to many security applications, such as privacy-preserving image feature extraction using homomorphic encryption [40], phishing detection [41], classification of sensitive information embedded within uploaded photos [42], handwritten digit recognition [43], facial expression recognition with CNNs [44] and, particularly, face spoofing detection [11][45][46][47][48]. Due to its popularity in anti-spoofing detection, a variant of the HOG descriptor is presented and experimentally validated in this work.

Materials and Methods
The proposed system detects and extracts faces in a set of images, recognizes the extracted faces by matching them against those stored in a database, and is experimentally validated as a means of enhancing security. In order to validate the system, only images containing a single face are considered. An overview of the developed system is shown in Figure 1.
The whole process consists of the following stages:
• Image acquisition: image retrieval from a still photo.
• Face detection: detection of patterns in the image in order to locate a face.
• Image pre-processing: cropping the image to remove irrelevant parts and, if needed, applying image processing (color space conversion, filtering, etc.) to enhance the parameters to be measured in the next stage.

Detection Framework
Regarding the detection framework, we have used the well-known object detection algorithm developed by Paul Viola and Michael Jones [49]. It is a robust algorithm with a very high detection rate that is suitable for real-time face detection. The algorithm relies on four stages to achieve fast and accurate detection: Haar feature selection, the integral image for fast feature computation, AdaBoost training to select the most discriminative features, and an attentional cascade of classifiers for efficient allocation of computational resources [50].
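The integral image is what makes Haar-like features cheap to evaluate: once it has been built, the sum over any rectangle costs four lookups, regardless of the rectangle's size. The following NumPy sketch illustrates only that step, not the full Viola-Jones detector (there is no AdaBoost training or cascade); the function names and the two-rectangle feature layout are illustrative assumptions.

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Integral image padded with a leading row and column of zeros.

    ii[r, c] holds the sum of img[:r, :c], so rectangle sums need no bounds checks.
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii: np.ndarray, top: int, left: int, height: int, width: int) -> int:
    """Sum of the pixels inside a rectangle using four integral-image lookups."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])


def two_rect_feature(ii: np.ndarray, top: int, left: int, height: int, width: int) -> int:
    """Hypothetical two-rectangle Haar-like feature: left half minus right half.

    `width` is assumed to be even so that both halves have the same size.
    """
    half = width // 2
    return rect_sum(ii, top, left, height, half) - rect_sum(ii, top, left + half, height, half)


if __name__ == "__main__":
    img = np.arange(16).reshape(4, 4)
    ii = integral_image(img)
    print(rect_sum(ii, 1, 1, 2, 2))          # 5 + 6 + 9 + 10 = 30
    print(two_rect_feature(ii, 0, 0, 4, 4))  # 52 - 68 = -16
```

In practice a trained cascade is used rather than hand-built features; OpenCV, for instance, ships pre-trained Haar cascades that can be loaded through its `CascadeClassifier` class.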
After this process, original images are cropped so that only face images are taken into account in the following stages of our proposal. Then images are normalized in size and color information is removed, since our system will work with grayscale images.
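As an illustration of this pre-processing step, the sketch below crops a face bounding box, removes color using ITU-R BT.601 luma weights and resizes the result with nearest-neighbour sampling to the size used later in the experiments (130 × 150 pixels). The bounding-box format (x, y, w, h), the luma weights, the interpolation method and the reading of 130 × 150 as 130 columns by 150 rows are assumptions, not details given in the text.

```python
import numpy as np

# Assumption: 130 x 150 is read as 130 columns (width) by 150 rows (height).
OUT_W, OUT_H = 130, 150


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB array to grayscale with ITU-R BT.601 luma weights."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])


def crop_and_normalize(img: np.ndarray, box, out_w: int = OUT_W, out_h: int = OUT_H) -> np.ndarray:
    """Crop the face box (x, y, w, h), remove color and resize to out_h x out_w.

    Nearest-neighbour sampling keeps the sketch dependency-free; a real pipeline
    would usually interpolate (e.g. bilinear or bicubic).
    """
    x, y, w, h = box
    face = img[y:y + h, x:x + w]
    gray = to_grayscale(face) if face.ndim == 3 else face.astype(np.float64)
    rows = (np.arange(out_h) * gray.shape[0] / out_h).astype(int)
    cols = (np.arange(out_w) * gray.shape[1] / out_w).astype(int)
    return gray[np.ix_(rows, cols)]


if __name__ == "__main__":
    photo = np.random.default_rng(0).integers(0, 256, size=(480, 640, 3)).astype(np.uint8)
    face = crop_and_normalize(photo, (200, 100, 180, 220))
    print(face.shape)  # (150, 130)
```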

Recognition Framework
For the recognition process, the Histogram of Oriented Gradients (HOG) has been considered. HOG is a well-known image descriptor based on the distribution of the image's gradient orientations. HOG descriptors are robust to small local geometric and photometric changes, although they are not invariant to global rotations, and they have been used in optimization problems as well as in computer vision. The HOG method has proven to be an effective descriptor for object recognition in general and for face recognition in particular [27].
The method is based on evaluating local histograms of image gradient orientations in a dense grid. The basic idea is that local object appearance and shape can often be characterized by the distribution of local intensity gradients or edge directions, even without precise knowledge of the corresponding gradient or edge positions [51]. HOG counts occurrences of edge orientations in a neighborhood of an image. In practice, this is implemented by dividing the image window into small spatial regions ("cells"), and each cell accumulates a local 1-D histogram of gradient directions (or edge orientations) over the pixels of the cell. The combined histogram entries form the representation [26].
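To make the cell and block structure concrete, here is a minimal NumPy sketch of HOG computation with centered [-1, 0, 1] gradient masks, unsigned orientations (0 to 180 degrees), magnitude-weighted cell histograms and L2-normalized overlapping blocks. It is a simplified illustration under these assumptions (single-bin voting, zero gradients at image borders, a block stride of half the block size), not the implementation used in this work; the default cell, block and bin values are also illustrative.

```python
import numpy as np


def gradients(img: np.ndarray):
    """Centered differences with the 1-D masks [-1, 0, 1] and [-1, 0, 1]^T.

    Border rows/columns are left with zero gradient for simplicity.
    """
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return gx, gy


def cell_histograms(img: np.ndarray, cell: int = 4, bins: int = 9) -> np.ndarray:
    """Magnitude-weighted histograms of unsigned orientations (0-180 degrees) per cell.

    Each pixel votes for a single bin; Dalal and Triggs additionally interpolate
    votes between neighbouring bins and cells, which is omitted here.
    """
    gx, gy = gradients(img)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            hist[cy, cx] = np.bincount(bin_idx[rows, cols].ravel(),
                                       weights=mag[rows, cols].ravel(),
                                       minlength=bins)
    return hist


def block_histograms(hist: np.ndarray, block: int = 2, eps: float = 1e-6) -> np.ndarray:
    """Group block x block cells with a stride of half a block and L2-normalize each block.

    Returns an array of shape (N, block * block * bins), one row per block.
    """
    stride = max(block // 2, 1)
    n_cy, n_cx, _ = hist.shape
    out = []
    for by in range(0, n_cy - block + 1, stride):
        for bx in range(0, n_cx - block + 1, stride):
            v = hist[by:by + block, bx:bx + block].ravel()
            out.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.array(out)
```

With a 2 × 2 block and a one-cell stride, adjacent blocks share half their cells, so each cell histogram contributes to several normalized blocks.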
Let G_x and G_y be the horizontal and vertical gradients of the image I. They can be computed for each pixel (x, y) using the simple 1-D centered masks [-1, 0, 1] and [-1, 0, 1]^T as follows:

$$G_x(x, y) = I(x+1, y) - I(x-1, y)$$
$$G_y(x, y) = I(x, y+1) - I(x, y-1)$$

Then, the magnitude and orientation of the gradient are calculated as:

$$|G(x, y)| = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}$$
$$\theta(x, y) = \arctan\left(\frac{G_y(x, y)}{G_x(x, y)}\right)$$

Histograms are then constructed from the magnitude and orientation of each pixel, so that each cell has one histogram; these histograms are concatenated to obtain the feature descriptor. The procedure for the implementation of the HOG descriptor algorithm is shown in Figure 2 and can be summarized as follows:
• Divide the input image into small connected regions (cells).
• For each cell, compute a histogram of edge orientations over all the pixels in the cell.
• Discretize every cell into angular bins according to the gradient orientation.
• Calculate histograms of oriented gradients over spatial cells.
• Group adjacent cells into overlapping blocks and normalize the histograms.

HOGs give the same importance (or weight) to each block in the image. However, some of these blocks contain the most remarkable features for face recognition, such as the eyes, the nose or the mouth [52,53], while many other blocks do not provide significant features for recognition. In other words, not all blocks provide the same information to a face recognition scheme. Shannon introduced entropy as a quantitative measure of the amount of information produced by a process [54]. Therefore, we use entropy to quantify, and weight, the importance of each block in the calculated HOGs, so that different face regions receive different weights. Consequently, we introduce the Entropy-Based Histogram of Oriented Gradients (EBHOG) descriptor as follows:
• Divide the input image into small connected regions (cells).
• For each cell c, compute a histogram of edge orientations over all the pixels in the cell.
• Discretize every cell into b angular bins according to the gradient orientation.
• Calculate histograms of oriented gradients over spatial cells.
• Group adjacent cells into overlapping blocks and normalize the histograms.
• Calculate Shannon's entropy for each computed block HOG. This gives a weight w_k for each block, for k = 1, 2, ..., N.
• Normalize the weighted histograms.
• The entropy-weighted HOGs constitute the descriptor.
• Train an SVM classifier.
• Classify using the trained SVM.
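The weighting steps above can be sketched as follows, starting from the N block histograms (one row per block). The entropy follows Shannon's definition applied to each normalized block histogram, with a base-2 logarithm chosen here as an assumption. The paper's exact weight formula is not reproduced in this sketch, so `entropy_weights` is a hypothetical stand-in (w_k = H_k / max_j H_j) that only preserves the idea that higher-entropy blocks receive larger weights; the real weights can be passed in instead.

```python
import numpy as np


def block_entropy(block_hogs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Shannon entropy H_k (in bits) of each block histogram; rows are blocks.

    Each histogram is divided by its sum to obtain P_j^k; empty bins contribute
    0, following the convention 0 * log 0 = 0.
    """
    h = np.asarray(block_hogs, dtype=np.float64)
    p = h / np.maximum(h.sum(axis=1, keepdims=True), eps)
    logp = np.log2(np.where(p > 0, p, 1.0))  # log2(1) = 0 masks empty bins
    return -(p * logp).sum(axis=1)


def entropy_weights(entropies: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL stand-in for the paper's weight formula: w_k = H_k / max_j H_j."""
    entropies = np.asarray(entropies, dtype=np.float64)
    return entropies / max(float(entropies.max()), 1e-12)


def ebhog(block_hogs, weights=None, per_block: bool = False, eps: float = 1e-12) -> np.ndarray:
    """Weight each block HOG by w_k, min-max normalize and concatenate.

    per_block=False scales with the global min/max of the weighted descriptor
    (an assumption); per_block=True uses min_k and max_k of each block.
    """
    h = np.asarray(block_hogs, dtype=np.float64)
    if weights is None:
        weights = entropy_weights(block_entropy(h))
    weighted = h * np.asarray(weights, dtype=np.float64)[:, None]  # EBHOG_k = w_k * HOG_k
    axis = 1 if per_block else None
    lo = weighted.min(axis=axis, keepdims=True)
    hi = weighted.max(axis=axis, keepdims=True)
    return ((weighted - lo) / np.maximum(hi - lo, eps)).ravel()
```

One detail worth checking in any implementation: if min_k and max_k are taken separately for each block, min-max scaling divides out any positive weight, since (w h - w min)/(w max - w min) = (h - min)/(max - min). The sketch therefore defaults to scaling over the whole weighted descriptor (an assumption), while `per_block=True` reproduces the per-block variant.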
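The entropy for block k, H_k, is defined as:

$$H_k = -\sum_{j} P_j^k \log P_j^k, \qquad k = 1, 2, \ldots, N$$

where N is the number of blocks in the image and P_j^k is:

$$P_j^k = \frac{HOG_k(j)}{\sum_{i} HOG_k(i)}$$

Here, HOG_k indicates the HOG obtained for block k in the input image, and HOG_k(j) is its j-th bin.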
The entropies of different regions in an image are shown in Figure 3. In this figure, it can be noticed that blocks containing key features for recognition, such as the eyes, nose or mouth, have significantly higher entropy values than blocks with information irrelevant to face recognition. The weight w_k of each block is then calculated from its entropy H_k. Let W = {w_1, w_2, ..., w_N} be the set of weights calculated from Equation (7) and HOG = {HOG_1, HOG_2, ..., HOG_N} the histograms of oriented gradients for each block. The entropy-based HOG is then calculated by multiplying each HOG_k by its corresponding weight w_k:

$$EBHOG = \{w_1 \cdot HOG_1, w_2 \cdot HOG_2, \ldots, w_N \cdot HOG_N\}$$

After weighting, the histograms must be normalized again, since the values of each entropy-weighted histogram no longer sum to 1. Thus, if EBHOG_k = w_k × HOG_k is the k-th weighted histogram, and min_k and max_k are the minimum and maximum values in EBHOG_k, the normalized histogram EBHOG_k^Norm is:

$$EBHOG_k^{Norm} = \frac{EBHOG_k - \min_k}{\max_k - \min_k}$$

The whole training and testing process is represented in Figures 4 and 5.

For classification, the Support Vector Machine (SVM) classifier has been chosen. The basic SVM separates two classes by selecting the hyperplane that is equidistant from the closest examples of each class, which yields the so-called maximum margin on each side of the hyperplane; problems with several classes, such as face identification, are usually handled by combining several binary SVMs. To define this hyperplane, only the training examples of each class lying on these margins, called support vectors, are taken into account. When the classes are not linearly separable, the data are mapped into a transformed feature space, usually of very high dimension, and the search for the separating hyperplane in that space is done implicitly using so-called kernel functions. A kernel function K(x_i, x_j) assigns to each pair of elements x_i, x_j ∈ X of an input space X a real value corresponding to the scalar product of the transformed versions of those elements in the new space. Among the most popular kernel functions are:
• Linear kernel, whose expression is:

$$K(x_i, x_j) = \langle x_i, x_j \rangle$$

where ⟨·, ·⟩ refers to the scalar product.
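• Gaussian kernel, expressed as:

$$K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$

where σ is the standard deviation, which controls the kernel width.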
The selection of the kernel function depends on the data to be classified and will be validated in Section 4.
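Both kernels can be evaluated for all pairs of feature vectors at once. The NumPy sketch below computes the corresponding Gram matrices; the function names and the default σ are illustrative assumptions.

```python
import numpy as np


def linear_kernel(X, Y) -> np.ndarray:
    """K(x_i, x_j) = <x_i, x_j> for every pair of rows of X and Y."""
    return np.asarray(X, dtype=float) @ np.asarray(Y, dtype=float).T


def gaussian_kernel(X, Y, sigma: float = 1.0) -> np.ndarray:
    """K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2)) for every pair of rows."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>, clipped at 0 against rounding errors
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
```

A precomputed Gram matrix of this kind can be passed to SVM implementations that accept precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`; this is mentioned only as one possible toolchain, since the experiments in this work were run in MATLAB.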

Detecting Spoofing Using EBHOG
Face spoofing attacks can generally be performed using still images, videos or real faces. In order to apply the EBHOG method, our proposal trains models on different textures to distinguish real faces from fake ones. Fake faces typically show reflections that depend on the surface on which the facial image is displayed, and these reflections do not appear when the face is real [55].
Our proposal is based on the results of [9], whose authors showed that introducing color information achieves reliable results in preventing face spoofing attacks. In particular, they use the YCbCr color system, since the texture information of the chrominance components shows visible differences between real and fake faces. They validated their proposal using LBP, among other texture descriptors. Here, we propose to use EBHOG instead of LBP as the anti-spoofing descriptor, as shown in Figure 6.
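A minimal sketch of the color side of this scheme follows: an RGB image is converted to YCbCr and a texture descriptor is applied to each channel, with the three feature vectors concatenated. The full-range BT.601 (JPEG/JFIF) conversion and the concatenation scheme are assumptions, since the text specifies neither; `describe` stands for any per-channel descriptor, such as an EBHOG implementation.

```python
import numpy as np


def rgb_to_ycbcr(rgb) -> np.ndarray:
    """Full-range (JPEG/JFIF) ITU-R BT.601 RGB -> YCbCr for 8-bit images.

    Assumption: other variants exist; MATLAB's rgb2ycbcr, for instance, produces
    the studio-range variant (Y in 16-235, Cb/Cr in 16-240).
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    m = np.array([[0.299, 0.587, 0.114],
                  [-0.168736, -0.331264, 0.5],
                  [0.5, -0.418688, -0.081312]])
    ycc = rgb @ m.T
    ycc[..., 1:] += 128.0  # offset the chrominance channels
    return ycc


def color_descriptor(rgb, describe) -> np.ndarray:
    """Apply a per-channel texture descriptor to Y, Cb and Cr and concatenate.

    `describe` maps a 2-D array to a 1-D feature vector (hypothetical interface).
    """
    ycc = rgb_to_ycbcr(rgb)
    return np.concatenate([describe(ycc[..., c]) for c in range(3)])
```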

Results
In this section, the datasets used to evaluate our model are first introduced. Then, the parameters needed to achieve reliable face recognition results are determined. After that, our approach is compared with state-of-the-art methods on the selected databases. Finally, the results of several experiments completed to verify the suitability of our anti-spoofing model are shown.
The face recognition tests were performed using three face databases:
• The Olivetti Research Laboratory (ORL) database [56], which has 400 grayscale images of 40 persons. The images were taken at different times, with changing illumination conditions and different facial expressions. There are 10 images per person: 3 images were used in the training process to estimate the necessary EBHOG parameters, 2 images were used for enrollment and, finally, the remaining 5 were used for the recognition stage.
• The Color FERET database [57], which contains 11,338 pictures of 994 different individuals. The gallery set fa, with a subset of 200 users, was used for the training process. The tests were then completed using the probe sets fb, fc, dup1 and dup2 of the FERET database. Images in fb differ in facial expression from those in fa, while images in fc differ mainly in illumination. The dup1 and dup2 subsets are the most challenging, since their images were taken on different dates from those in fa.
• The Labeled Faces in the Wild (LFW) dataset [58], composed of color images collected from the Internet. It contains more than 13,000 photos of 5749 individuals, but only individuals with at least two images have been considered for the recognition process, which reduces the set to 1680 individuals. A subset of 200 users is again used for the training process.

Figure 7 shows examples of images from these three databases. The datasets used for face spoof detection, on the other hand, are:
• The CASIA Face Anti-Spoofing Database (FASD) [59], which contains videos of 50 subjects together with their corresponding fake faces. The videos have different image qualities, and the attacks are warped photo, cut photo and digital display device attacks. Twenty subjects were used for training and the remaining 30 for testing.
• The MIFS (Makeup Induced Face Spoofing) dataset [60], composed of face images of 107 subjects obtained from YouTube makeup tutorial videos and face images of the associated target subjects from the Internet. There are 4 photos per subject (2 before makeup, 2 after makeup) and 2 photos per target subject, for a total of 642 still images. This dataset focuses specifically on impersonating a target person by means of makeup.

All the experiments have been completed on a computer running MATLAB® on Windows 10, with an Intel® Core™ i7-7500U processor @ 2.70 GHz and 8 GB of RAM.

Proposed System Settings and Recognition Results
As mentioned before (see Section 3.1 and Figure 4), the well-known Viola and Jones detector has been used to detect faces, and then, color information is removed from images. Finally, all the images have been normalized to 130 × 150 pixels.
Next, the parameters for the calculation of the Entropy-Based Histogram of Oriented Gradients (EBHOG) descriptor must be set. In order to choose the best cell size and number of cells per block for our application, several tests varying the EBHOG cell and block sizes have been performed, using 3 images per user from the ORL database. The number of orientation histogram bins (9 bins) is the same as in [26], as is the overlap between adjacent blocks: half the block size. Finally, a Support Vector Machine (SVM) with a linear kernel has been chosen to classify faces [61], since linear kernels are usually recommended when the number of features is much larger than the number of samples, as is the case here. The results, in the form of Receiver Operating Characteristic (ROC) curves, are shown in Figure 9, where CS stands for Cell Size and BS stands for Block Size. To analyze these curves, the Area Under the ROC Curve (AUC) is computed: the larger the area, the better the classifier performs. The AUC obtained for each cell size/block size combination is presented in Table 1. From these results, the largest area in Figure 9 is clearly the one obtained with an EBHOG cell size of 4 × 4 pixels and a block size of 2 × 2 cells, so these are the parameters chosen to compute the histograms. To sum up, the parameters used to obtain the EBHOG descriptor are:
• Size of EBHOG cell: 4 × 4 pixels.
• Size of EBHOG block: 2 × 2 cells.
• Number of orientation bins: 9.
• Overlap between adjacent blocks: half the block size.
• Classifier: SVM with a linear kernel.

All the experiments considered the same number of samples for the true and false null hypotheses. With this, the False Rejection Rate (FRR) is defined as the probability that the system will not recognize the identity of an already enrolled person, and the False Acceptance Rate (FAR) is the probability that the system will not reject an impostor (a person who is not in the database).
False rejections occur when genuine users are denied access, and false acceptances occur when impostors are accepted by the system. The experiments completed with the testing set of the ORL database give a False Rejection Rate (number of false rejections/number of enrollee recognition attempts) of 7% and a False Acceptance Rate (number of false acceptances/number of impostor recognition attempts) of just 2%.
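The FAR and FRR definitions above, together with the AUC used to compare cell and block sizes, can be computed directly from lists of genuine and impostor scores. The sketch below assumes that higher scores indicate a more likely genuine user and that a score equal to the threshold is accepted; the function names are illustrative.

```python
import numpy as np


def far_frr(genuine, impostor, threshold: float):
    """FAR and FRR at a score threshold (higher score = more likely genuine).

    FRR = false rejections / genuine attempts;
    FAR = false acceptances / impostor attempts.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    frr = np.mean(genuine < threshold)
    far = np.mean(impostor >= threshold)
    return far, frr


def roc_auc(genuine, impostor) -> float:
    """Area under the ROC curve, computed as the probability that a random
    genuine score exceeds a random impostor score (ties count 1/2).

    This equals the Mann-Whitney U statistic divided by the number of pairs.
    """
    g = np.asarray(genuine, dtype=float)[:, None]
    i = np.asarray(impostor, dtype=float)[None, :]
    return float(np.mean((g > i) + 0.5 * (g == i)))
```

The pairwise comparison is quadratic in the number of scores, which is acceptable for an illustration; sorting-based implementations scale better for large score sets.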
Finally, the True Positive Rate (TPR) describes the identification performance of our system, since it is the ratio between the number of true positives (correct identifications) and the total number of identification attempts. With the parameters above, we obtained a TPR of 94.4% for the ORL database. Table 2 shows the recognition rates for the three databases considered in the experiments. Note that the LFW database is commonly used for benchmarking face verification; however, in this work we follow the closed-set identification protocol defined in [62][63][64]. The TPR is measured as the rank-1 identification accuracy, i.e., the fraction of probe images whose best match in the gallery has the correct identity.
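Rank-1 identification accuracy can be computed from a probe-by-gallery similarity matrix, as in the following sketch; the score-matrix layout and the function name are assumptions made for illustration.

```python
import numpy as np


def rank1_accuracy(scores, probe_labels, gallery_labels) -> float:
    """Closed-set rank-1 identification accuracy.

    scores[p, g] is the similarity between probe p and gallery entry g (higher
    is more similar). A probe counts as correctly identified when its
    best-scoring gallery entry has the same identity.
    """
    scores = np.asarray(scores, dtype=float)
    best = np.asarray(gallery_labels)[np.argmax(scores, axis=1)]
    return float(np.mean(best == np.asarray(probe_labels)))
```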
With the parameters set above, extracting the EBHOG descriptor took 12 ms per subject on average, compared with 8 ms for the standard HOG.

Comparison with Other Methods and Discussion
Let us now compare the performance of our EBHOG descriptor with some other state-of-the-art descriptors for face recognition. In particular, the original Histogram of Oriented Gradients (HOG) approach [26] has been considered, as well as other texture descriptors such as Local Binary Patterns (LBP) [30], Patterns of Oriented Edge Magnitudes (POEM) [65], Scale-Invariant Feature Transform (SIFT) [51], Local Directional Patterns (LDP) [66], Weber Local Descriptors (WLD) [67] and the recent Local Diagonal Extrema Number Patterns (LDENP) [68].
On the other hand, given their popularity and accuracy in recognition tasks, CNNs and deep learning have proven very successful in many applications, particularly in computer vision. One of the strategies that can be followed to apply deep learning is transfer learning, which consists of taking a pre-trained network and using it as a starting point to learn a new task. The advantage of this approach is that the pre-trained network has already learned a broad set of features that can be reused for similar purposes. For this purpose, AlexNet [69] has been selected. The AlexNet architecture has eight layers with learnable parameters: five convolutional layers and three fully connected layers. AlexNet was originally designed to classify 1000 classes; in our classification problem, however, the number of classes equals the number of different users in each considered database. Thus, AlexNet was adapted to a smaller number of outputs by modifying its last layer.
The results of the comparison between all these methods are shown in Table 3. These results make clear that adding entropy information to the original HOG descriptor improves the recognition rate significantly, with an increase of over 40% in some of the databases considered in the experiments.
When working with the ORL database and the fb and fc sets of the FERET database, our proposal does not obtain the best results, although its performance is similar to that of other state-of-the-art descriptors (our recognition rate is less than 2% lower than that of the best method in Table 3). In particular, the Local Diagonal Extrema Number Patterns (LDENP) method and AlexNet with transfer learning achieve the highest recognition rates for these datasets, which are characterized by different lighting conditions (ORL and subset fc in FERET) and different expressions (ORL and subset fb in FERET). On the other hand, the EBHOG descriptor achieves the best recognition rates for the dup1 and dup2 sets of the FERET database and for the challenging LFW database. These datasets include photos with temporal/age changes (subsets dup1 and dup2 in FERET) and large variations in pose, expression and illumination (LFW). Our method achieves high recognition rates on these difficult datasets, which suggests that entropy plays a major role in achieving reliable recognition in demanding situations.

Experiments on Detecting Spoofing and Discussion
Let us now present the results of the tests carried out to detect face spoofing attacks. Experiments on the CASIA FASD database strictly follow the original protocol defined by its authors. Thus, 30 face images are randomly selected from each of the training videos, and 30 face images are also randomly selected from each of the testing videos. Each video is then classified as 'real' or 'fake' by averaging the scores of its 30 images. In order to compare the classification results, the Equal Error Rate (EER) is used, as suggested by the database authors. The results are shown in Table 4, which compares our method with the results from [9] and with the use of HOG instead of EBHOG.

Table 4. Performance using Equal Error Rate (EER) (%) in the CASIA FASD database and YCbCr color system. The best results are highlighted in bold font.

Method EER
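The video-level decision and the EER used in Table 4 can be sketched as follows. The sketch assumes higher scores for real faces, uses the observed scores as candidate thresholds and approximates the EER as the mean of FAR and FRR where the two are closest; these choices, like the function names, are illustrative rather than the protocol's reference implementation.

```python
import numpy as np


def video_score(frame_scores) -> float:
    """Video-level score: mean of the per-frame scores (30 frames in the protocol)."""
    return float(np.mean(frame_scores))


def equal_error_rate(genuine, attack) -> float:
    """Approximate EER, the error rate where FAR and FRR are (closest to) equal.

    Scores are assumed to be higher for real faces; every observed score is
    tried as a threshold.
    """
    genuine = np.asarray(genuine, dtype=float)
    attack = np.asarray(attack, dtype=float)
    best = None
    for t in np.unique(np.concatenate([genuine, attack])):
        frr = np.mean(genuine < t)   # real faces rejected
        far = np.mean(attack >= t)   # fake faces accepted
        if best is None or abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2.0)
    return float(best[1])
```

Multiplying the returned value by 100 gives the EER as a percentage, as reported in Table 4.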
Here φ(x, y) is the similarity match score between images x and y. Since our descriptor is based on histograms, we propose to calculate the similarity between two images x and y using the histogram intersection kernel:

$$\varphi(x, y) = \sum_{i=1}^{n} \min(x_i, y_i)$$

where x_i and y_i are the i-th bins of the descriptors of images x and y, and n is the descriptor length. φ(x, y) is finally normalized to the [0, 1] interval. Input images are then classified as 'real' or 'fake' and, as with the CASIA FASD database, the Equal Error Rate (EER) is calculated to compare the results. The results are shown in Table 5, where the same methods as in Table 4 are compared. Table 5 also reports the similarity score threshold used to decide whether an image is genuine; the histogram intersection threshold that achieves the best results is 0.85. Again, the best performance corresponds to the proposed method. In conclusion, introducing entropy improves face spoofing detection compared with similar approaches.
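The histogram intersection similarity and the 0.85 decision threshold can be sketched as below. The text only states that φ(x, y) is normalized to [0, 1]; dividing by the smaller of the two histogram sums is an assumption, and it requires non-negative descriptors (which min-max normalized histograms are).

```python
import numpy as np


def histogram_intersection(x, y) -> float:
    """Histogram intersection sum_i min(x_i, y_i), scaled to [0, 1].

    Assumed normalization: divide by min(sum x, sum y), so the score is 1 when
    the smaller histogram is entirely contained in the other.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = min(x.sum(), y.sum())
    return float(np.minimum(x, y).sum() / denom) if denom > 0 else 0.0


def is_genuine(x, y, threshold: float = 0.85) -> bool:
    """Accept the probe as genuine when its similarity reaches the threshold."""
    return histogram_intersection(x, y) >= threshold
```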
To sum up, the experimental results show that our proposal is consistently among the best local descriptors for face recognition, outperforming most of the recent approaches in Tables 3-5. The large number of tests carried out on several face databases demonstrates the potential of the EBHOG approach.

Conclusions
In the last few years, with the increasing popularity of mobile technologies, almost all mobile phone applications have access to private data in some way, which is a particularly serious vulnerability in a smart city context. Cyberattacks aimed at taking over social network profiles have become common, and attackers often use these profiles to steal personal data or even to discredit their real owners. One way to prevent spoofing is by authenticating users with biometric traits such as fingerprints, iris or facial features.
In this work, a new face recognition and spoofing detection approach using an entropy-based HOG descriptor has been presented. The results show that our method provides a reliable descriptor for different databases; as a result, we consider that our proposal may be applied to detect possible face spoofing attacks using pictures uploaded to social media. Future work aims at applying the proposed algorithm to real situations in social networks. We are currently adapting the method to work with GPUs and parallelizing the most time-consuming steps of the algorithm.

Acknowledgments: This work was partially supported by the Ministerio de Economía y Competitividad (Spain), project TIN2013-40982-R, the FEDER funds, and the "Red de Investigación en el uso del aprendizaje colaborativo para la adquisición de competencias básicas. El caso Erasmus+ EUROBOTIQUE", Red ICE 3701 curso 2016-2017.

Conflicts of Interest:
The authors declare no conflict of interest.