Face liveness detection is important for ensuring security. However, because a face can also be presented in a photograph or on a display, it is difficult to detect a real face from facial shape features alone. In this paper, we propose a thermal face-convolutional neural network (Thermal Face-CNN) that incorporates the external knowledge that the face temperature of a real person is 36 to 37 degrees Celsius on average. First, we compared red, green, and blue (RGB) images with thermal images to identify the data better suited to face liveness detection, using a multi-layer neural network (MLP), a convolutional neural network (CNN), and a C-support vector machine (C-SVM). Next, we compared the performance of these algorithms with that of the newly proposed Thermal Face-CNN on a thermal image dataset. The experimental results show that the thermal image is more suitable than the RGB image for face liveness detection. Measured by the F-measure, Thermal Face-CNN also performs better than CNN, MLP, and C-SVM when precision is slightly more important than recall.
1. Introduction
Face liveness detection in indoor residential environments is an important security technique, for example when a mobile device is unlocked with a face recognition system. To restrict access to one specific person, that person’s unique information, such as their face, can be used to unlock security measures. However, because a printed photograph of the face or a face shown on a display can reproduce this unique information well enough, the reliability of the security is reduced. More reliable security therefore requires face liveness detection. Thermal images make this possible because the heat distribution present in the face of a real person distinguishes the real face from a fake one.
In this paper, we first quantitatively identify which image type is more suitable for face liveness detection, the RGB image or the thermal image. For the comparison, the same algorithms were applied to the RGB and thermal image datasets: a multi-layer neural network (MLP), a convolutional neural network (CNN), and a C-support vector machine (C-SVM) with a soft margin hyperplane. In addition, we compared the performance of these existing algorithms with that of the thermal face-convolutional neural network (Thermal Face-CNN) proposed in this paper. Thermal Face-CNN is an algorithm that incorporates external knowledge about the temperature values found in a real face.
We collected our own thermal images because, although many RGB image datasets exist for face liveness detection, few or no thermal image datasets are available. We captured RGB and thermal images of the same scenes in order to evaluate how much thermal images improve performance over RGB images. Accuracy, recall, and precision were the main measures obtained on both the RGB and thermal image datasets.
The experimental results show that the best-performing CNN has an accuracy of 0.6898, a recall of 0.5752, and a precision of 0.7342 on the RGB image dataset, whereas it has an accuracy of 0.8367, a recall of 0.7876, and a precision of 0.8476 on the thermal image dataset. The thermal image is therefore more effective than the RGB image for face liveness detection. In addition, on the thermal image dataset, the Thermal Face-CNN proposed in this paper improves the average recall value by 13.72% over CNN. Measured by the F-measure, Thermal Face-CNN also performs better than CNN, MLP, and C-SVM when precision is slightly more important than recall.
2. Background and Related Work
Face detection is the task of detecting a face in an image: its algorithms judge whether or not an object in the picture is a face. Face liveness detection, by contrast, judges whether the face presented is a real face, a fake face, or no face at all. The two are therefore very different fields, and for this reason papers on face detection cannot be compared directly with papers on face liveness detection. In the field of face liveness detection, there are three ways to imitate a real face: using a picture of the face, replaying a video of the face, and using a 3D face mask. The picture-based method involves printing the face on paper or showing it on a display. To counter it, studies have explored ways to detect the real face using photo-based datasets [6,7,8,9]. Other studies have used video-based datasets to distinguish the real face from the fake face [7,10], and further studies have addressed distinguishing the real face from a 3D face mask [11,12].
Many datasets can be used for face liveness detection: NUAA, ZJU Eyeblink, Idiap Print-attack, Idiap Replay-attack, CASIA FASD, MSU-MFSD, MSU RAFS, UVAD [18,19], MSU USSA, and so on. However, these datasets consist of RGB images, and there are not enough datasets composed of thermal images. Research on face liveness detection with thermal images has therefore been insufficient to date. Thermal images have already been used in research on face detection and pedestrian detection [20,21,22,23]. They are formed from the distribution of infrared radiation and so can be obtained even at night, when there is no visible light. Because RGB images are affected by the intensity of visible light, whereas thermal images remain usable where there is no visible light, thermal images have been applied successfully in various fields. It is therefore necessary to compare the RGB image and the thermal image to determine how much the thermal image improves performance in face liveness detection. Using an existing dataset for this comparison would be ideal, but none of them contains temperature information. Thus, a new dataset is needed.
Face liveness detection involves detecting the real face by analyzing the information obtained from the image. Therefore, previous studies on face liveness detection have been carried out using image processing methods. The support vector machine (SVM) is a classification algorithm that has been used to distinguish between real and fake faces in face liveness detection [7,11]. As these studies show, SVM performs well at classification. Among the SVM algorithms, the linear SVM finds the linear hyperplane with the largest margin, which assumes that the classes can be separated by a line. However, some data cannot be separated by a line, and nonlinear SVMs using kernel functions were developed to solve this problem. In one study, an SVM classified abstracted information combining static and dynamic features for face liveness detection. In another, an SVM learned the multispectral reflectance distribution that distinguishes real human skin from images or objects made to look like skin. The SVMs previously used in face liveness detection were trained to classify the training data perfectly, without error. An alternative is to find a soft margin hyperplane, which has the largest margin while tolerating the misclassification of a small amount of the training data. A soft margin hyperplane generalizes better because it avoids overfitting the training data. Therefore, C-SVM, a nonlinear SVM with a soft margin hyperplane that generalizes better than the SVMs used in previous studies, was used in Section 4 to evaluate the performance of algorithms on the thermal image dataset.
The artificial neural network imitates human neurons. In particular, MLP is one of the artificial neural networks used in image processing: pixel information is fed into the input layer, and for binary classification an output layer with a single node outputs 0 or 1. CNN, which is designed for effective image processing, modifies MLP by reducing the number of weights and sharing weights. Several studies have performed face liveness detection effectively using CNN on RGB images [7,26,27]. In addition, CNN is known to be a more powerful algorithm than SVM for face liveness detection on RGB images. Furthermore, CNN can achieve 98.99% accuracy on the relatively easy RGB image dataset NUAA, which means that CNN is superior to previous methods and is state-of-the-art. An accuracy of 98.99% does not mean, however, that the field is entirely conquered. More difficult face liveness detection needs to be studied, with multiple objects appearing simultaneously in an image and with more pixels per image, which greatly increases the amount of computation. The thermal image can be used for this because studies have shown that CNN works successfully on thermal images [20,21,22]. For these reasons, and because the thermal image used for face liveness detection needs to be processed properly, we used CNN in Section 4. Nevertheless, it is necessary to investigate an algorithm superior to CNN for face liveness detection based on the thermal image. The CNN algorithm and Thermal Face-CNN for face liveness detection are described concretely in Section 3 of this paper.
Beyond the support vector machine and the artificial neural network, diverse algorithms have been used for face liveness detection. A logistic regression model [8,28] was used to classify the real face and the fake face. In addition, the local binary pattern [9,29] and the Lambertian model were used to identify image features for face liveness detection. The local binary pattern extracts image features from the difference between the value of each pixel and the values of its neighboring pixels, yielding a feature vector that represents the image. Similarly, the Lambertian model has been studied as a way to extract information about the difference between the real face and the fake face. The related studies thus contain a great deal of research on how to extract image feature information.
3. The Proposed Method
The proposed Thermal Face-CNN is a CNN-based algorithm for face liveness detection. External knowledge for face liveness detection is inserted first, and the result is then processed by a CNN. The artificial neural network part of the proposed method is the same as an existing CNN, which combines convolutional layers, pooling layers, and fully connected layers. The number of convolutional, pooling, and fully connected layers varies depending on the number and type of pixels in the image. For visual convenience, Figure 1 shows an example of Thermal Face-CNN with two convolutional layers, two pooling layers, and one hidden layer. The numbers of layers actually used are explained in Section 4.
First, knowledge for face liveness detection is inserted. The data carrying this external knowledge is then processed in the convolutional layer and transferred to the pooling layer; this step can be repeated several times in order to process a complex image. Next, CNN passes the resulting information to the fully connected layer. Finally, CNN classifies the image in the output layer. The insertion of external knowledge, the convolutional layer, the pooling layer, and the fully connected layer are explained in turn in the rest of this section. External knowledge for face liveness detection is inserted by encoding knowledge about the temperature that a human face can have. This can be represented as Equation (1).
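Equation (1) itself is not reproduced in this text. Based on its description in this section (values between down limit and up limit are multiplied by knowledge value), one plausible form is sketched here, with g the measured temperature and h the input to CNN. Whether the limits are inclusive, and that values outside the band pass through unchanged, are assumptions of this sketch rather than statements from the source.

```latex
h =
\begin{cases}
\text{knowledge value} \times g, & \text{down limit} \le g \le \text{up limit}, \\
g, & \text{otherwise}.
\end{cases}
```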
In Equation (1), g is the measured temperature value, and h is the input value to CNN. Equation (1) multiplies a measured value that lies between down limit and up limit by knowledge value, so as to make use of the physiological knowledge that the mean body temperature of a person is between 36 and 37 degrees Celsius. A pixel measuring a part of a real face must have a temperature value in this vicinity. That a pixel with a value close to 36 or 37 degrees in a measured thermal image is likely to represent a part of a real face can be learned only from external knowledge, not from the data. To insert this knowledge into the artificial neural network, Equation (1) produces a value remarkably different from the measured one, so that the network perceives the temperature of this pixel as very different from the temperatures measured at other pixels. If the knowledge value is 10, the resulting value is about ten times larger than the values of other pixels. Figure 2 shows an example in which 34 and 39 are selected as limits around the human body temperature of 36 to 37 degrees, allowing for errors that may occur during measurement. In Section 4, we conducted experiments with various settings of knowledge value, up limit, and down limit.
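The knowledge-insertion step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name insert_knowledge, the inclusive limits, and the pass-through of values outside the band are assumptions, and the sample patch is hypothetical. The defaults mirror the Figure 2 example (limits 34 and 39, knowledge value 10).

```python
def insert_knowledge(temps, knowledge_value=10.0, down_limit=34.0, up_limit=39.0):
    """Return the CNN inputs h for a 2D list of measured temperatures g.

    Values inside [down_limit, up_limit] are multiplied by knowledge_value.
    Inclusive limits and unchanged out-of-band values are assumptions.
    """
    return [
        [g * knowledge_value if down_limit <= g <= up_limit else g for g in row]
        for row in temps
    ]


# Hypothetical 2 x 3 thermal patch in degrees Celsius.
patch = [[22.5, 36.4, 33.9],
         [39.0, 41.2, 36.0]]
scaled = insert_knowledge(patch)
```

Only the pixels whose temperature is plausible for a face are moved far away from the rest, while their relative differences are preserved, scaled by the knowledge value.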
In the upper left graph of Figure 2, the vertical axis represents the temperature values. The upper right graph expresses the external knowledge about whether the part of an object measured by each pixel is likely to be a part of a real face or not; note that its vertical axis carries no quantitative values. The horizontal axes of all the graphs in Figure 2 represent the pixel index. In the upper left graph, pixels 2 and 3 have different meanings according to the upper right graph, yet there is almost no quantitative difference between them. To emphasize this distinction, the input data must be re-expressed so that the two kinds of data, one that might measure a part of a real face and one that might not, differ distinctly. The knowledge value in Equation (1) is used for this purpose. As the lower graph of Figure 2 shows, the values of interest are forced into a specific region that lies far from the other values, while the thermal information about the measured temperature of each pixel is still expressed as a minute difference. These differences in measured temperature can be seen by comparing pixel 1 with pixel 3 and pixel 2 with pixel 4. The optimal knowledge value can be found empirically through experimentation.
The convolutional layer serves to extract the complex features of the two-dimensional image. Its parameters are kernel_size, filters, and stride. kernel_size indicates the width and height of a kernel composed of learnable weights, filters is the number of kernels, and stride is the interval at which the kernel is applied across the image. The convolutional layer extracts spatial information while sharing weights. Formal equations for the convolutional layer are given in the literature. The information calculated in the convolutional layer is transferred to the pooling layer.
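To make the roles of kernel_size, filters, and stride concrete, the following pure-Python sketch applies several kernels to a single-channel image without padding. It is an illustration under simplifying assumptions (no bias, no activation, one input channel, fixed rather than learned weights); the function names conv2d and conv_layer and the sample data are hypothetical.

```python
def conv2d(image, kernel, stride=1):
    """Slide one kernel over a 2D image (no padding) and sum the products."""
    k_h, k_w = len(kernel), len(kernel[0])          # kernel_size
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for a in range(k_h):
                for b in range(k_w):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


def conv_layer(image, kernels, stride=1):
    """Apply every kernel to the image; len(kernels) plays the role of filters."""
    return [conv2d(image, kernel, stride) for kernel in kernels]


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernels = [[[1, 0], [0, 1]],   # sums the main diagonal of each 2 x 2 window
           [[0, 1], [1, 0]]]   # sums the anti-diagonal of each 2 x 2 window
feature_maps = conv_layer(image, kernels, stride=2)
```

The same kernel weights are reused at every window position, which is the weight sharing mentioned above; each kernel yields one feature map.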
Among the layers that make up CNN, the pooling layer induces spatial invariance by reducing the size of the feature map. The parameters of the pooling layer are pooling_size and stride. pooling_size is the size of the region to be examined, analogous to kernel_size in the convolutional layer discussed above, and stride serves the same purpose as the stride parameter of the convolutional layer. The max pooling layer finds the maximum value in each region and transfers it to the next layer. Finally, after passing through the convolutional and pooling layers, the information is transferred to the fully connected layer.
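A max pooling layer with the two parameters named above can be sketched as follows. This is a minimal pure-Python illustration assuming a square pooling region and no padding; the function name max_pool2d and the sample feature map are hypothetical.

```python
def max_pool2d(feature_map, pooling_size=2, stride=2):
    """Keep the maximum of each pooling_size x pooling_size region."""
    out_h = (len(feature_map) - pooling_size) // stride + 1
    out_w = (len(feature_map[0]) - pooling_size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(pooling_size)
                for b in range(pooling_size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 2, 3, 4]]
pooled = max_pool2d(feature_map)
```

With pooling_size and stride both set to 2, the 4 x 4 feature map shrinks to 2 x 2, which is the size reduction the text describes.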
The fully connected layer is the type of layer used in MLP, consisting of nodes that are completely connected to the nodes in the previous and subsequent layers.
4.1. Data Collection and Experimental Environment Construction
The Flir C3 was used as the camera for collecting data. The camera has two lenses on the front: an RGB lens that produces RGB images of 640 × 480 pixels and an infrared lens that produces thermal images of 80 × 60 pixels. Information on the Flir C3 can be found at a website listed in the Supplementary Materials at the end of this paper. We collected one RGB image and one thermal image of each scene in order to find suitable data for face liveness detection, using the function of the Flir C3 that operates the two lenses simultaneously. Because a thermal image has an inherent advantage over an RGB image at night, we took the images in indoor residential environments with visible light so that the performance comparison would be fair. No constraint was placed on the distance to the object. The faces in the dataset appear with and without a variety of accessories, such as glasses; an object may cover any part of the face except the eyes, nose, and mouth. A total of 844 scenes were taken. The data actually used were 844 Excel files with temperature information collected from the infrared lens and 2532 Excel files with R, G, and B information collected from the RGB lens. In Figure 3, the images in the top row are RGB images, while the images in the bottom row are thermal images.
Figure 3a,d are RGB and thermal images with a real face present, respectively. Figure 3b,e are RGB and thermal images with a face on a display, respectively. Figure 3c,f show images of a ceiling air conditioner with no face. In the thermal images, the color is assigned by the software in the thermal camera itself so that the measured temperature can be grasped intuitively. In Figure 3a,b,d,e, it can be seen that the outline of the heat distribution and the heat of the face on the display differ from those of the real face. The RGB face liveness detection dataset jongwoo (RFLDDJ) and the thermal face liveness detection dataset jongwoo (TFLDDJ) that we created are available on the internet. In NUAA, the whole picture is completely filled with the face. In the RGB dataset we created, by contrast, people and objects were shot in indoor living environments in order to increase the level of difficulty. In other words, multiple objects coexist in a single image in our datasets, which makes the data more difficult because a more general situation is assumed. Information on the datasets can be found at websites listed in the Supplementary Materials at the end of this paper.
The numbers of pixels differ between the two lenses. The RGB lens has 640 pixels horizontally and 480 pixels vertically, for a total of 307,200 pixels per image. By contrast, the infrared lens has 80 pixels horizontally and 60 pixels vertically, for a total of 4800 pixels per image. The images obtained by the two lenses therefore differ in pixel count by a factor of 64. However, the extent of the scene actually captured is not much different, as the example in Figure 4 shows.
As shown in Figure 4, the pixel counts differ by a factor of 64, but the area captured differs little. In addition, because the RGB lens and the infrared lens have different pixel sizes and sit at slightly different positions on the camera, it is not clear how many pixels should be cropped from each side, horizontally and vertically, to obtain the same extent of the scene. The two images therefore cannot be made to cover exactly the same extent of the scene. To keep the experiment valid, any scene in which the real face lies outside the area captured by the infrared lens was removed from the experiment.
We use Adam, Dropout, and ReLU to improve learning when training CNN and Thermal Face-CNN. The Adam algorithm reduces the error by updating the weights of the artificial neural network from gradients obtained through back-propagation; it is straightforward to implement, computationally efficient, and requires little memory. Dropout prevents overfitting by randomly excluding nodes from the calculation during the learning process. Sigmoid was used as the activation function in the output layer of all artificial neural networks used in the experiments (C-SVM is not a neural network and is excluded), and ReLU was used as the activation function of the hidden layer. The max pooling layer is used as the pooling layer. In addition, the probability of dropping each node is 10%. An Intel Core i7-7820X CPU with 32 GB of DDR4 memory was used as the hardware in the experiment. The experiment was carried out using the TensorFlow library, which provides artificial neural network code. For C-SVM, the SVC class in sklearn.svm was used. Information on the library can be found at a website listed in the Supplementary Materials at the end of this paper.
Accuracy, recall, and precision were the main evaluation indices in the experiment. In this study, accuracy is the proportion of images for which the predicted value matches the actual value, regardless of the presence or absence of a real face. Recall is the proportion of images containing the real face that are judged to contain the real face. Precision is the proportion of images predicted to contain the real face that actually contain it.
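The three indices follow directly from the counts of correct and incorrect predictions. The sketch below uses label 1 for an image with the real face and 0 otherwise; the function name evaluate, the labels, and the convention of returning 0.0 when a denominator is zero are assumptions of this illustration.

```python
def evaluate(actual, predicted):
    """Return (accuracy, recall, precision) for binary labels, 1 = real face."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # real face, detected
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # no real face, rejected
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarm
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed real face
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0         # assumed convention
    precision = tp / (tp + fp) if tp + fp else 0.0      # assumed convention
    return accuracy, recall, precision


# Hypothetical labels for eight images.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1, 0, 1]
accuracy, recall, precision = evaluate(actual, predicted)
```

A classifier that never predicts a real face has zero recall and an undefined precision, which is why a zero-denominator convention is needed.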
4.2. The Comparison of Face Liveness Detection between the RGB Image and Thermal Image
Before examining the performance of the proposed Thermal Face-CNN, we obtained accuracy, recall, and precision for each RGB image and thermal image dataset in order to identify the appropriate dataset for face liveness detection. For the comparison, we used CNN, MLP, and C-SVM. The left side of Table 1 shows the parameters of CNN applied to the RGB image dataset, and the right side of Table 1 shows the parameters of CNN applied to the thermal image dataset. We empirically sought the values of the parameters that would make the error of the artificial neural network converge to zero.
In Table 1, nodes refers to the number of nodes in the corresponding layer. Further, con_ means convolutional layer and pool_ means pooling layer. input_, hidden_, and output_ mean input layer, hidden layer, and output layer, respectively. The rest of the parameters are the same as those described in Section 3. In Table 1, the values in parentheses represent two values for the width and length of the kernel and pooling sequentially.
The parameter values for C-SVM used in the thermal image dataset are shown in Table 2.
In Table 2, c is the error penalty parameter, which we varied across experiments. The kernel is either RBF or polynomial (POLY), and gamma is the kernel coefficient. In addition, n_features is the number of features, tolerance is the stopping criterion, and degree is the degree of the polynomial kernel function.
The parameters of the MLP used to learn the thermal images are shown in Table 3.
Images 1 to 599 of the RGB image dataset and of the thermal image dataset were used as training data, and the remaining 245 images were used as test data. Of the 844 images, 338 contain the real face and 506 do not. The training set contains 225 images with the real face and 374 without; the test set contains 113 images with the real face and 132 without. Table 4 shows the experimental results of CNN on the RGB image dataset and the thermal image dataset. Table 5 and Table 6 show the experimental results of MLP and C-SVM on the thermal image dataset. The figures in the following tables, including Table 4, Table 5 and Table 6, were rounded to the fourth decimal place. Figures expressed as percentages in the following tables were rounded to the second decimal place.
In Table 4 and Table 5, “The best” refers to the highest values and “Average” to the average values. To obtain the results in Table 4, five CNNs on the RGB image dataset and 20 CNNs on the thermal image dataset were trained with the same parameters. Because a neural network trained with the same parameters ends up with a different combination of weights each time and therefore performs differently, we repeated the experiment 20 times to obtain average accuracy, recall, and precision values. However, in the RGB image dataset, the number of pixels contained in each image was 907,200, which required a substantial amount of computation. Therefore, 20 CNNs were trained on the thermal image dataset, but only five on the RGB image dataset. For Table 5, five MLPs were trained because MLP requires a large amount of computation. To evaluate the performance of C-SVM in Table 6, we trained one C-SVM for each parameter setting. The accuracy, recall, and precision values in Table 4 obtained on the thermal image dataset are higher than those obtained on the RGB image dataset. With CNN, the thermal image is thus more suitable than the RGB image.
In the case of MLP, since each RGB image carries 907,200 pixel values, the input layer would also need 907,200 nodes. We tried to implement an MLP with about 900,000 nodes in the input layer, but hardware limitations made the computation impossible. Further, a C-SVM trained on the RGB images with the parameters shown in Table 2 did not learn properly and judged every test image to contain no real face. However, as shown in Table 5 and Table 6, MLP and C-SVM can be trained on thermal images because of their small number of pixels. Comparing Table 4, Table 5 and Table 6 shows that good performance can be obtained with thermal image data.
4.3. Performance Comparison of CNN, C-SVM, and Thermal Face-CNN
Section 4.2 showed that the thermal image is better suited to face liveness detection than the RGB image. In Section 4.3, we therefore applied the Thermal Face-CNN proposed in this paper to the thermal image dataset and compared its performance with those of the other algorithms. For Thermal Face-CNN, we used the same parameters as for the CNN on the thermal image dataset, and we again constructed 20 Thermal Face-CNNs, matching the 20 CNNs of the experiment shown in Table 4. The accuracy, recall, and precision values of Thermal Face-CNNs are shown in Table 7, Table 8, Table 9, Table 10, Table 11 and Table 12. Parenthetical values in these tables indicate the knowledge value, up limit, and down limit, in that order.
Table 7, Table 8, and the left side of Table 9 show how accuracy, recall, and precision change with the up limit and down limit when the knowledge value is fixed at 10. When the up limit and down limit are 39 and 33, respectively, the average recall value has the greatest increase, 12.39%. When the up limit and down limit are 39 and 34, respectively, the average recall value is increased by 10.44%. When the up limit and down limit are 40 and 34, respectively, the average recall value is increased by 7.97%, and the average precision value is decreased slightly, by 1.53%. In addition, when the up limit and down limit are 41 and 34, respectively, the average recall is increased by 6.61%, and the precision is decreased by 2.18%. When the up limit and down limit are 39 and 35, respectively, the increase in recall is the smallest.
The left side of Table 7, the right side of Table 9, and Table 10, Table 11 and Table 12 show how performance changes with the knowledge value when the up limit and down limit are fixed at 39 and 34, respectively. Table 12 shows that Thermal Face-CNN can also perform much worse than CNN. The Thermal Face-CNN used to obtain the data in Table 12 has the same parameters as the Thermal Face-CNNs used to obtain the data in the left side of Table 7, except that the knowledge value is 1000. This shows that an excessively large knowledge value can instead reduce performance. The best performance was obtained when the knowledge value was −5, which increased the average recall value by 13.72%, and the second-best when the knowledge value was −10, which increased it by 11.47%. The third-best performance was obtained when the knowledge value was 10, with an increase of 10.44%, and the fourth-best when the knowledge value was −100, with an increase of 10.43%.
Except in Table 12, the Thermal Face-CNNs with external knowledge about the temperature of the real face (Table 7, Table 8, Table 9, Table 10 and Table 11) achieve better average and best recall values than the CNN shown in the right side of Table 4. An increase in recall means that Thermal Face-CNN detected more of the images containing the real face than CNN did. Comparing the right side of Table 4 with Table 7, Table 8, Table 9, Table 10 and Table 11 shows that CNN and Thermal Face-CNN do not differ significantly in accuracy and precision. For the best-performing Thermal Face-CNN, in the left side of Table 10, no index was reduced at all relative to CNN. With this setting, Thermal Face-CNN is therefore superior to CNN in all indices.
The performance of Thermal Face-CNN must also be compared quantitatively with the accuracy, recall, and precision values recorded in Table 5 and Table 6. Table 10 shows that Thermal Face-CNN achieves the highest accuracy, 0.8367. The results in Table 6 show that C-SVM achieves the highest recall, and Table 5 shows that MLP achieves the highest precision. However, MLP is a relatively poor way to detect the real face because its recall is too small. Thermal Face-CNN has the best accuracy and a better balance between recall and precision than MLP and C-SVM. For a more precise performance evaluation, the F-measure is used. The F-measure is a widely used index that evaluates performance quantitatively by considering recall and precision simultaneously. It is shown in Equation (2).
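Equation (2) is not reproduced in this text. The standard definition of the F-measure with weighting parameter β, which is presumably what Equation (2) states, is:

```latex
F\text{-measure} = \frac{(1 + \beta^{2}) \cdot \mathit{precision} \cdot \mathit{recall}}{\beta^{2} \cdot \mathit{precision} + \mathit{recall}}
```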
Here, β is a non-negative real number, and precision, recall, and F-measure denote the corresponding values. A larger F-measure value means a better algorithm. When β is one, Equation (2) reduces to the most frequently used form of the F-measure, given in Equation (3).
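For reference, the weighted F-measure and its β = 1 special case are commonly written in the standard form below. This form is assumed here to correspond to Equations (2) and (3); it is consistent with the definitions of β, precision, and recall given above.

```latex
\begin{align}
\text{F-measure} &= \frac{(1+\beta^{2})\cdot \text{precision}\cdot \text{recall}}
                        {\beta^{2}\cdot \text{precision} + \text{recall}} \tag{2}\\[4pt]
\text{F-measure}_{1} &= \frac{2\cdot \text{precision}\cdot \text{recall}}
                            {\text{precision} + \text{recall}} \tag{3}
\end{align}
```

With β = 0 the expression reduces to precision alone, and as β grows it approaches recall, which is why β controls the relative weight of the two quantities.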
F-measure_1 in Equation (3) denotes the value of the F-measure when β is one. In Equation (4), difference denotes the difference between the F-measures of Thermal Face-CNN and C-SVM. Here, Thermal Face-CNN obtained an accuracy of 0.8327, a recall of 0.8407, and a precision of 0.8051, and C-SVM obtained an accuracy of 0.8245, a recall of 0.9381, and a precision of 0.7465, corresponding to Table 6.
When the difference is zero, β is 0.8885, meaning that the two F-measure values are equal. When β is greater than or equal to 0 and less than 0.8885, Thermal Face-CNN is better. By contrast, when β is greater than 0.8885, C-SVM is better. The corresponding conditions for other Thermal Face-CNNs can be found by forming the equations in the same way, and finding the β that makes the difference zero is straightforward when the parameters are different. Nevertheless, it is important to show where Thermal Face-CNN is superior by listing the F-measures obtained at the commonly used β values of 0.5 and 2. These are given in Table 13.
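The crossover value of β can be checked numerically. The sketch below is not the authors' code: it assumes the standard weighted F-measure formula, and the function names `f_measure` and `crossover_beta` are hypothetical. It uses only the recall and precision values quoted above for Thermal Face-CNN and C-SVM, which are single-run values rather than the averages behind Table 13.

```python
import math


def f_measure(precision: float, recall: float, beta: float) -> float:
    """Weighted F-measure in its standard form (assumed to match Equation (2))."""
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0.0:
        return 0.0  # both precision and recall are zero
    return (1.0 + b2) * precision * recall / denom


def crossover_beta(p1: float, r1: float, p2: float, r2: float) -> float:
    """Return the beta at which two methods have the same F-measure.

    Setting F(p1, r1, beta) = F(p2, r2, beta) and solving for beta^2 gives
        beta^2 = r1 * r2 * (p2 - p1) / (p1 * p2 * (r1 - r2)).
    """
    if r1 == r2:
        raise ValueError("equal recalls: no finite crossover")
    b2 = (r1 * r2 * (p2 - p1)) / (p1 * p2 * (r1 - r2))
    if b2 < 0.0:
        raise ValueError("one method dominates: no real crossover")
    return math.sqrt(b2)


# Values quoted in the text (precision, recall).
THERMAL_FACE_CNN = (0.8051, 0.8407)
C_SVM = (0.7465, 0.9381)

beta0 = crossover_beta(*THERMAL_FACE_CNN, *C_SVM)
print(round(beta0, 4))  # prints 0.8885

# Compare the two methods at the commonly used beta values.
for beta in (0.5, 2.0):
    f_cnn = f_measure(*THERMAL_FACE_CNN, beta)
    f_svm = f_measure(*C_SVM, beta)
    better = "Thermal Face-CNN" if f_cnn > f_svm else "C-SVM"
    print(f"beta={beta}: {better} has the higher F-measure")
```

Under these assumptions the computed crossover agrees with the reported value of 0.8885, with Thermal Face-CNN ahead below it and C-SVM ahead above it.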
In Table 13, “Average F-measure” means the F-measure computed from the average recall and average precision in the left side of Table 10. When β is 2, the F-measure weighs recall higher than precision. When β is 0.5, it weighs recall lower than precision. Therefore, we can see that Thermal Face-CNN is best when precision has more weight than recall. Precision is more important than recall when the reliability of the algorithm is important, so Thermal Face-CNN is well suited to that situation.
In addition to the comparison based on accuracy, recall, precision, and F-measure, the receiver operating characteristic (ROC) graph in Figure 5 shows that the proposed CNN-based algorithm is superior to CNN and performs similarly to the other methods. The values in parentheses in Figure 5 indicate the knowledge value, up limit, and down limit, in that order.
In an ROC graph, line ‘A’ is better than line ‘B’ if ‘A’ lies closer to the northwest (top-left) corner than ‘B’. In Figure 5, the blue line shows the performance of C-SVM, the green and black lines show the performance of Thermal Face-CNN, the red line shows the performance of MLP, and the orange line shows the performance of CNN. To obtain Figure 5, we used the best-performing parameters for each method: the MLP with an accuracy of 0.7837, a recall of 0.5664, and a precision of 0.9412; the CNN with an accuracy of 0.8367, a recall of 0.7876, and a precision of 0.8476; the best Thermal Face-CNN among those with an up limit of 39 and a down limit of 34, which has an accuracy of 0.8327, a recall of 0.8407, and a precision of 0.8051 at a knowledge value of −5; the best Thermal Face-CNN among those with a knowledge value of 10, which has an accuracy of 0.8245, a recall of 0.8496, and a precision of 0.7869 with an up limit of 39 and a down limit of 33; and the C-SVM with a c value of 1. As shown in Figure 5, Thermal Face-CNN improves dramatically on CNN, and its performance is close to that of MLP and C-SVM. In this paper, we argue that Thermal Face-CNN is better when precision is more important than recall. The ROC graph, however, does not directly consider precision, because it is built from the true positive rate and the false positive rate. Nonetheless, the ROC graph shows that Thermal Face-CNN is superior to CNN.
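The remark that the ROC graph does not directly consider precision can be illustrated with a small sketch. The confusion-matrix counts below are hypothetical and are not taken from the experiments. The class `Confusion` and the helper `distance_to_northwest` are likewise illustrative names, and measuring the distance of a single operating point to the corner (0, 1) is only a simplification of comparing whole ROC curves.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary real-face / fake-face decision."""
    tp: int  # real faces classified as real
    fp: int  # fake faces classified as real
    tn: int  # fake faces classified as fake
    fn: int  # real faces classified as fake

    def tpr(self) -> float:
        """True positive rate (identical to recall): y-axis of the ROC graph."""
        return self.tp / (self.tp + self.fn)

    def fpr(self) -> float:
        """False positive rate: x-axis of the ROC graph."""
        return self.fp / (self.fp + self.tn)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)


def distance_to_northwest(c: Confusion) -> float:
    """Euclidean distance from the ROC point (FPR, TPR) to the ideal corner (0, 1)."""
    return math.hypot(c.fpr(), 1.0 - c.tpr())


# Hypothetical counts: same TPR and FPR, but very different class balance.
balanced = Confusion(tp=80, fp=10, tn=90, fn=20)
skewed = Confusion(tp=80, fp=100, tn=900, fn=20)

for name, c in (("balanced", balanced), ("skewed", skewed)):
    print(name, round(c.fpr(), 3), round(c.tpr(), 3), round(c.precision(), 3))
```

Both hypothetical classifiers occupy the same point on the ROC graph, yet their precision differs considerably, which is why the ROC comparison is complemented by the F-measure comparison when precision matters.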
5. Conclusions and Future Works
Face liveness detection is important for security because it confirms that the face presented to a system belongs to a real person. In this paper, face liveness detection was performed in an indoor residential environment using the fact that the thermal patterns of a face on a display or in a photograph differ from those of a real face. First, we quantitatively compared the performance obtained with thermal images and with RGB images. The thermal image was shown to be more suitable for face liveness detection: CNN, the best-performing method, achieved an accuracy of 0.6898, a recall of 0.5752, and a precision of 0.7342 on the RGB image dataset, and an accuracy of 0.8367, a recall of 0.7876, and a precision of 0.8476 on the thermal image dataset. We also proposed Thermal Face-CNN, which inserts external knowledge about the real face temperature into the existing CNN algorithm, and compared it with CNN. The performance of the best-performing Thermal Face-CNN is equal to or better than that of CNN. Furthermore, we used the F-measure to identify the condition under which Thermal Face-CNN performs better than C-SVM.
Based on the results in this paper, we hope that Thermal Face-CNN with thermal images will be used to detect malicious attempts to imitate a face. This paper shows that external knowledge can be inserted by adjusting the values that fall within a particular real-number range. We therefore expect that applied algorithms incorporating knowledge from various fields will emerge.
In this study, the experiment was conducted using 844 scenes. As the amount of data increases, it becomes more feasible to use face liveness detection in more general situations, so more thermal images need to be collected in the future. Moreover, because the RGB lens and the infrared lens differ, the images they capture differ in pixel size, number of pixels, and the range of the scene. Datasets with fewer differences between the RGB and thermal images therefore need to be constructed. Finally, because not all possible combinations of the parameters in the algorithms were tested, the comparisons are not conclusive. Additional experiments are needed to identify the parameter combinations that yield the highest accuracy, recall, precision, and F-measure values.
Example of the process of inserting external knowledge.
Data examples: (a) a real face taken by RGB lens; (b) a face on a display taken by RGB lens; (c) a ceiling air conditioner taken by RGB lens; (d) a real face taken by infrared lens; (e) a face on a display taken by infrared lens; (f) a ceiling air conditioner taken by infrared lens.