A 3D Mask Presentation Attack Detection Method Based on Polarization Medium Wave Infrared Imaging

Facial recognition systems are often spoofed by presentation attack instruments (PAI), especially by the use of three-dimensional (3D) face masks. However, nonuniform illumination conditions and significant differences in facial appearance will lead to the performance degradation of existing presentation attack detection (PAD) methods. Based on conventional thermal infrared imaging, a PAD method based on the medium wave infrared (MWIR) polarization characteristics of the surface material is proposed in this paper for countering a flexible 3D silicone mask presentation attack. A polarization MWIR imaging system for face spoofing detection is designed and built, taking advantage of the fact that polarization-based MWIR imaging is not restricted by external light sources (including visible light and near-infrared light sources) in spite of facial appearance. A sample database of real face images and 3D face mask images is constructed, and the gradient amplitude feature extraction method, based on MWIR polarization facial images, is designed to better distinguish the skin of a real face from the material used to make a 3D mask. Experimental results show that, compared with conventional thermal infrared imaging, polarization-based MWIR imaging is more suitable for the PAD method of 3D silicone masks and shows a certain robustness in the change of facial temperature.


Introduction
Biometric techniques have become a part of daily life, and the most widely used technique is facial recognition. However, the vulnerability of the data capture subsystem, and even the whole system in general, greatly reduces the security of facial recognition applications [1]. Face presentation attack [2,3] creates this problem. Biometric features or objects used in a face presentation attack are called presentation attack instruments (PAI) [ISO/IEC JTC1 SC37 Biometrics 2016] [3][4][5]. Facial presentation attacks mainly originate from three types of PAI: photos of a whole face, replaying videos of a face, and three-dimensional (3D) masks [3].
Many researchers are now working on presentation attack detection (PAD) methods, which are also referred to as countermeasures or anti-spoofing techniques in some of the literature [2,6,7,8]. Most PAD methods, however, have only been developed to detect 2D presentation attacks. As 3D printing technology has matured in recent years, a large number of cheap and realistic 3D masks have appeared, which makes the PAD of 3D masks a new challenge [9].
The existing 3D mask PAD methods are mainly divided into two categories: visible-based methods and infrared-based methods. Of all the different types of visible-based methods, texture [10][11][12] and motion features [13] are the most commonly used. For example, by combining different local binary pattern (LBP) descriptors, the texture differences between a real face and a 3D face mask can be effectively obtained [9]. Recently, a method based on remote photoplethysmography has conducted the classification using various heartbeat signals [14]. Moreover, Azim et al. have used image statistics to classify real faces and their facial photos using visible light polarization imaging and achieved an accuracy of 87.84%, a true positive rate of 90%, and a false positive rate of less than 10% [15]. Due to the fact that the polarization degree of visible light reflected from a real face with black skin has similar statistical characteristics (such as mean) to those of printed photos, Azim et al. have proposed a method to distinguish the real faces with black skin from facial photos by using the Mean_BC algorithm, which improves the accuracy rate to 93.24% [16]. However, one major potential shortcoming of most existing visual spectrum spoofing detection methods is that the observed texture of a face is quite sensitive to the environment, such as illumination and expression.
In addition to the visible spectrum, the infrared spectrum has also been considered, especially the near-infrared (NIR) spectrum [17]. Wang Y. et al. combined visible and NIR spectrum bands to model gradient features to detect PVC faces, silicone face masks, and photographs of faces, which produced good results [18]. Their attempt shows that difference in reflectivity can be a powerful clue in the detection of a real or fake face. Jun Liu et al. conducted spoofing detection to differentiate between a real face and a 3D face mask by means of deep learning and multi-spectral imaging that included both the visible and NIR spectrum [19]. Three convolutional neural networks (CNN) were selected for statistical analysis, and the results showed that the lowest average classification error rate was 0.05% [20]. The significance of this experiment lies in the fact that NIR imaging has a better performance than visible light imaging in the detection of 3D masks. However, the amount of training data is too small, which may result in the over-fitting of the network, and the accuracy of the results and generalization ability of the algorithm may be reduced. Several multispectral-based methods try to overcome the problem of face presentation attack. However, the zeroth-order and first-order statistics of mask images in both visible and NIR domains are quite similar to those of bona fide presentations [21]. Some NIR-based methods are also reported as being susceptible to nonuniform illumination conditions [15,18,22].
Recently, researchers have attempted to explore PAD methods using the thermal infrared imaging technique. However, most previous studies on 3D mask PAD have considered rigid masks, not flexible silicone masks. Firstly, Marcel et al. conducted a systematic study on the vulnerability of face recognition systems to impersonation attacks based on custom-made silicone masks [23]. Furthermore, they found that real human faces and 3D silicone masks showed significantly different low-order statistics in the thermal domain, which means that thermal imaging can be used to detect 3D mask presentation attacks [24]. However, since the temperature information is easily changed by attackers, resulting in a similar temperature reading for spoof faces and real faces, the expression of the thermal infrared radiation information for real and fake faces will become similar, which may lead to the degradation of the detection performance.
Based on the thermal infrared PAD method, a PAD method based on the medium wave infrared (MWIR) polarization characteristics of material surfaces is proposed in this paper for countering 3D silicone mask presentation attacks. Different targets have different polarization characteristics. Even if these targets have similar thermal infrared radiation intensity characteristics, their polarization characteristics are still quite different [25]. This work chooses the spectral band of 3.7-4.8 µm and collects facial images by means of a polarization imaging system so that the images are not restricted by external light sources. Based on the collected polarization MWIR face images, this paper constructs local maximum gradient amplitude feature vectors before being trained by a Support Vector Machine (SVM) classifier to distinguish the skin of a real human face from the silicone material used to make a 3D mask, regardless of the appearance differences. Through the comparative experiments, it is found that the polarization MWIR imaging is more suitable for 3D silicone masks than conventional MWIR imaging. This method becomes a reference for exploring other detection methods of 3D silicone mask spoofing based on infrared imaging.
The remainder of this paper is organized as follows: Section 2 explains, in detail, the PAD method proposed in this paper based on the MWIR polarization characteristics of the material surface; Section 3 introduces the experimental results and analysis; Section 4 is the summary. Figure 1 is a flow chart of the PAD method developed in this paper. A polarization MWIR imaging system is used to capture a group of facial images for feature extraction. Based on the polarization degree of each region, a feature extraction method with a local maximum gradient value is proposed. Subsequently, an SVM classifier is used for training and classification.

Methods
Symmetry 2020, 12, x FOR PEER REVIEW

Imaging System
A time-dependent polarization imaging system is selected in this paper. The infrared intensity images obtained from four polarization angles are registered to reduce the deviation caused by the slight shaking of the experiment table and displacement of the subjects that may occur when the polarizer is rotated. Figure 2 shows the configuration of the experimental imaging system. It is worth noting that, in addition to the normal scene radiation signal, the detector will also superimpose nonfiltered AC noise signals coming from the weak reflection of the polarizer in the cold environment of the imaging system, which forms a black spot in the center of the image's field of view. This phenomenon is called the cold reflection phenomenon. In order to eliminate the cold reflection phenomenon, the polarizer can be tilted to defocus the cold reflection, thus removing the black spot in the center of the image's field of view [26].
The target's spontaneous emission in the MWIR band has a specific polarization characteristic. Polarized light in nature is mainly linearly polarized, and the degree of linear polarization (DoLP) is used to measure its strength. In this paper, a Stokes vector is used to calculate the DoLP of infrared radiation, which is expressed in terms of the radiation intensities $I_x$ as:
$$S = \begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix} = \begin{bmatrix} I_0 + I_{90} \\ I_0 - I_{90} \\ I_{45} - I_{135} \\ I_R - I_L \end{bmatrix} \quad (1)$$
where $I_0$, $I_{45}$, $I_{90}$, and $I_{135}$ represent the linearly polarized infrared radiation intensity images taken at polarization angles (relative to the horizontal direction) of 0°, 45°, 90°, and 135°, respectively. $I_R$ and $I_L$ are the right and left circularly polarized infrared radiation images. $S_0$ represents the total conventional radiation intensity image. $S_1$ captures horizontal and vertical polarization information, while $S_2$ captures diagonal polarization information. In other words, $S_1$ and $S_2$ capture orthogonal, but complementary, polarization information, providing additional texture and geometric details, which enhances the recognition ability. Generally, very little circularly polarized light exists in nature (i.e., the $S_3$ component is very small), so it is generally taken to be zero [5].
The DoLP of infrared radiation can be directly calculated using the Stokes parameters as:
$$\mathrm{DoLP} = \frac{\sqrt{S_1^2 + S_2^2}}{S_0} \quad (2)$$
The surface radiation from an object has different emittance in different polarization directions, which results in the polarization effect of spontaneous radiation [27]. According to the object's infrared radiation characteristic, the numerical relationship between emissivity and reflectivity is:
$$\varepsilon_{surf} = 1 - r_{surf} \quad (3)$$
where $\varepsilon_{surf}$ is the emissivity of the object's surface, and $r_{surf}$ is the reflectivity. Therefore, the Stokes expression of the polarization radiation transmission model (Equation (4)) can be deduced by means of a polarized bidirectional reflection distribution function (pBRDF) model based on the micro-plane element theory [28]. In this model, $\sigma$ represents the roughness of the object's surface. The smaller the value of $\sigma$, the smoother the object's surface. $\theta$ is the angle between the normal $z_\mu$ of a micro-plane element and the surface normal $z$. $\theta_i$ and $\theta_r$ are the incident and reflecting zenith angles, respectively. $\phi_r$ is the reflection azimuth angle. $\eta_i$ is the angle between the incident light and the normal of the material's surface. $I_{bg}$ and $I_{obj}$ are the infrared radiation intensities of the background and target, respectively. $R_s$ and $R_p$ are the polarized Fresnel reflectivities from a rough surface. Based on Equation (4) and the physical definition of polarization degree, the DoLP of infrared radiation including multiple influencing factors can be obtained (Equation (5)). It can be seen from Equation (5) that DoLP is a function of parameters such as the roughness, reflectivity, incidence angle, and intensity contrast between the background and target.
Due to the complexity of the mathematical model, reasonable assumptions about the detection conditions can be made. (i) Assume that the incident light and reflected light are in the same plane, so that the rotation angles $\eta_i$ and $\eta_r$ among the reference planes in the micro-plane element can be ignored. (ii) Assume that the surface smoothness of the measured object is high, so that $\theta$ can be ignored. Under these assumptions, the relationship between DoLP and factors including the incident angle, material reflectivity, and roughness simplifies to Equation (6), where $a$ is the ratio of the intensity difference between the background and the target to the target's radiation intensity. Under the experimental conditions of this paper, further reasonable assumptions can be made. (i) Before and after wearing a 3D mask, the subject faces towards the imaging system, so the incident angle $\theta_i$ can be regarded as a constant. (ii) The background is fixed, so the difference in coefficient $a$ can be seen as being caused by the difference in radiation intensity between a positive and a negative sample. As described in Section 1, attackers can easily make the conventional radiation intensity of real and fake faces very similar by changing the mask's temperature or by other means. Thus, $a$ can be regarded as a constant in this paper. If the energy loss from absorption and scattering is not considered, when the incident light wave is reflected at the interface of two different media, the light energy is redistributed between the reflected light and refracted light according to a certain law, and the total energy remains constant. Therefore, the reflectivity can be related to the refractive index (Equation (7)), and the refractive index can in turn be expressed in terms of the reflectivity (Equation (8)). In summary, according to Equations (5) to (8), it is deduced that the DoLP of infrared radiation is a function of the surface roughness $\sigma$ and the surface refractive index $N$, which is not affected by illumination conditions, and it is denoted as $\mathrm{DoLP} = F_1(\sigma, N)$.
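As a hedged illustration of how a reflectivity and a refractive index can be converted into each other, the sketch below uses the textbook Fresnel relation at normal incidence for a non-absorbing medium in air, r = ((N - 1)/(N + 1))^2. This specific form is an assumption made for illustration and may differ from the relation used in this paper; the function names are hypothetical.

```python
import math


def reflectivity_from_index(n):
    """Normal-incidence Fresnel reflectivity for a non-absorbing medium in air.

    ASSUMPTION: textbook relation r = ((n - 1) / (n + 1))**2, not taken from the paper.
    """
    return ((n - 1.0) / (n + 1.0)) ** 2


def index_from_reflectivity(r):
    """Invert the relation above for n >= 1: n = (1 + sqrt(r)) / (1 - sqrt(r))."""
    if not 0.0 <= r < 1.0:
        raise ValueError("reflectivity must lie in [0, 1)")
    root = math.sqrt(r)
    return (1.0 + root) / (1.0 - root)


# Round trip for an illustrative (hypothetical) refractive index.
r_example = reflectivity_from_index(1.5)
n_example = index_from_reflectivity(r_example)
```

Under this assumed relation a larger refractive index gives a larger reflectivity, which matches the qualitative trend stated in the text.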
Furthermore, the DoLP decreases with the increase in σ, while it increases with the increase in N. This means that, under the same condition: (i) the rougher the target's surface, the lower the polarization degree of infrared radiation; (ii) the higher the target's surface refractive index, the greater the DoLP of infrared radiation [29].
Note that the calibration process of a polarization infrared imaging system is not studied in this paper, and the functional relationship between the image's pixel value and the imaging system's response is not derived in detail. When the system is stable, it is assumed that the corresponding calibration relationship remains unchanged. Therefore, the pixel value $I$ of a polarization MWIR face image has a certain functional relationship $F_2(\cdot)$ with the refractive index $N$ and surface roughness $\sigma$:
$$I = F_2(\mathrm{DoLP}) = F_2(F_1(\sigma, N)) \quad (9)$$

Feature Design
In view of different DoLP values of infrared radiation in different regions, a feature extraction method based on the local maximum gradient amplitude is designed in this section. In the process of feature extraction, real face images are taken as positive samples and 3D face mask images as negative samples.
Firstly, the symmetric gradient amplitude $g(x, y)$, centered on the target pixel, is calculated pixel by pixel for each polarization MWIR face image from the image's pixel values $I$ (Equation (10)). Using the relationship between $I$ and the surface parameters, $g(x, y)$ can be further expressed in terms of $N$ and $\sigma$ (Equation (11)). It can be seen from Equation (11) that the gradient amplitude of a polarization MWIR face image is determined by the differences in refractive index and surface roughness in each region.
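A minimal sketch of one way to compute a symmetric gradient amplitude is given below. It assumes central differences between the two neighbors on either side of each pixel and edge replication at the image border. Both choices, and the function name `gradient_amplitude`, are assumptions for illustration and not necessarily the exact definition used in this paper.

```python
import numpy as np


def gradient_amplitude(img):
    """Gradient amplitude from symmetric (central) differences around each pixel.

    ASSUMPTION: g = sqrt(gx**2 + gy**2) with gx = I(x+1, y) - I(x-1, y) and
    gy = I(x, y+1) - I(x, y-1); border pixels reuse the nearest edge value.
    """
    img = np.asarray(img, dtype=np.float64)
    padded = np.pad(img, 1, mode="edge")
    gx = padded[1:-1, 2:] - padded[1:-1, :-2]   # horizontal difference
    gy = padded[2:, 1:-1] - padded[:-2, 1:-1]   # vertical difference
    return np.sqrt(gx ** 2 + gy ** 2)


# Synthetic horizontal ramp: the pixel value equals the column index.
ramp = np.tile(np.arange(4.0), (4, 1))
g_ramp = gradient_amplitude(ramp)
```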
Secondly, the gradient amplitude image is scaled pixel by pixel as:
$$h(x, y) = \begin{cases} c_1 \, g(x, y), & g(x, y) < T \\ c_2 \, g(x, y), & g(x, y) \ge T \end{cases} \quad (12)$$
where $T$ is the threshold. When the pixel value is less than $T$, it is multiplied by $c_1$, and when the pixel value is larger than or equal to $T$, it is multiplied by $c_2$. After scaling, the face masks show distinct contours around the eyes, nostrils, and even the mouths, while the real faces do not. Next, the scaled gradient amplitude image $h(x, y)$ is equally divided into blocks for feature extraction. In order to facilitate the extraction of the image features, the facial images were uniformly resized to 196 × 196 pixels, and the size of each block was set to 14 × 14 pixels. Different settings for the block size will be tried in future work.
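The piecewise scaling described above (values below T multiplied by c1, all other values multiplied by c2) can be sketched as follows. The source does not state the values of T, c1, and c2, so the numbers in the example are hypothetical placeholders.

```python
import numpy as np


def scale_gradient(g, threshold, c1, c2):
    """Multiply values below the threshold by c1 and all other values by c2."""
    g = np.asarray(g, dtype=np.float64)
    return np.where(g < threshold, c1 * g, c2 * g)


# HYPOTHETICAL parameter values, chosen only to make the behavior visible.
g_example = np.array([0.5, 1.0, 2.0])
h_example = scale_gradient(g_example, threshold=1.0, c1=0.5, c2=2.0)
```

Choosing c1 below 1 and c2 above 1, as in this example, suppresses weak gradients and amplifies strong ones, which is one way contours such as those around the eye and mouth openings of a mask could be emphasized.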
Finally, the maximum gradient amplitude of all pixels in each block is selected, and these maxima are concatenated to form a feature vector. Since statistics such as the mean and variance are sensitive to the complexity of the pixel value distribution in each block, the maximum value is used in this paper instead. The feature vector of image $k$ is the sequence $(m_1, m_2, \ldots, m_M)$ (Equation (13)), where $k$ represents the image number and $m_x$ is the maximum gradient amplitude of the $x$-th block (Equation (14)). $M$ is the dimension of the feature vector, calculated as:
$$M = \left( \frac{P - B}{S} + 1 \right)^2 \quad (15)$$
where $P$ is the size of the sample image, $B$ is the size of the block, and $S$ is the step size. The construction process of the feature vector is shown in Figure 3. The gradient feature is designed based on the difference in DoLP of infrared radiation in different regions of facial images. According to the modeling process described in Section 2.1, this feature is only related to the refractive index $N$ and surface roughness $\sigma$ of the face, and it is independent of the facial appearance.
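The block-wise maximum feature can be sketched as below. The dimension formula M = ((P - B)/S + 1)^2 is an assumption chosen to be consistent with the stated settings (a 196 × 196 image and 14 × 14 blocks); the function names and the default step size of 14 are likewise illustrative.

```python
import numpy as np


def feature_dimension(p, b, s):
    """Number of blocks for a p x p image, b x b blocks, and step s (assumed formula)."""
    return ((p - b) // s + 1) ** 2


def block_max_features(h, block=14, step=14):
    """Concatenate the maximum of every block, scanning row by row."""
    h = np.asarray(h, dtype=np.float64)
    rows, cols = h.shape
    feats = [
        h[y:y + block, x:x + block].max()
        for y in range(0, rows - block + 1, step)
        for x in range(0, cols - block + 1, step)
    ]
    return np.array(feats)


# Synthetic 196 x 196 image whose pixel values increase in raster order.
demo = np.arange(196 * 196, dtype=np.float64).reshape(196, 196)
features = block_max_features(demo)
```

With non-overlapping 14 × 14 blocks, each image yields one value per block, so the feature vector stays short regardless of the texture inside each block.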
After obtaining the feature vector based on the gradient, an SVM classifier is used to learn the gradient features of the polarization MWIR image of the real face and the 3D face mask, then the classification is completed to obtain the evaluation results.

Figure 3. Construction process of the feature vector. The block size is 14 × 14, and the feature vector dimension is 196.

Data Collection System and Material
The existing public databases for PAD research are as follows. Most were built to study PAD methods for photos and replayed videos, including nine visible light databases, the most typical of which is the NUAA (Nanjing University of Aeronautics and Astronautics) Imposter Database. Additionally, there is one multi-spectral (visible light and NIR or short-wave infrared) database, the MS-Face Database, and one visible light database for 3D mask PAD research, the 3D MAD Database [15]. A database covering several attack types, including 3D masks, was published this summer: the Wide Multi-Channel Presentation Attack (WMCA) Database [30]. However, there is no database based on polarization infrared imaging for the study of 3D silicone mask PAD methods. In order to verify the effectiveness of the method proposed in this paper, the time-dependent polarization MWIR imaging system described in Section 2.1 is used for data acquisition, as shown in Figure 4. The data collection system shown in Figure 4 consists of an MWIR camera with a resolution of 320 × 256, made by the Guide Infrared Company (pixel size: 30 μm; detection band: 3.7-4.8 μm), image acquisition software, a metal wire grid polarizer made by the Edmund Optics Company (applicable band: 2-12 μm), an optical experiment platform, and several other polarization device accessories. The polarizer is fixed on the optical experiment platform and placed in front of the camera lens.
To prevent cold reflection, the polarizer is rotated horizontally so that its main axis is about 11° from the main axis of the camera lens.
The 3D masks used in this research are made of silicone, as shown in Figure 5. These masks are manufactured with holes at the eye and mouth locations, and the facial region visually resembles real human facial skin. We tested them with an iPhone XS, as well as some other electronic devices with facial recognition capabilities, and found that these masks can pass the verification of these systems.

Data Collection and Composition of Dataset
The temperature of the laboratory is approximately 25 °C, and the facial temperature is about 35 °C. Subjects are asked to sit around 220 cm away from the imaging system and to face the camera. During data collection, none of the subjects wear eyeglasses. By rotating the polarizer, the experimenter captures the MWIR intensity images at four polarization angles (i.e., $I_0$, $I_{45}$, $I_{90}$, and $I_{135}$) via the image acquisition software. Figure 6 shows an example of the $I_0$, $I_{45}$, $I_{90}$, and $I_{135}$ intensity images from one subject. The Stokes parameters $S_0$, $S_1$, and $S_2$ are then calculated, and the MWIR polarization images are obtained. Figure 7 shows the MWIR images of two subjects before and after wearing 3D masks.
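The step from the four intensity images to a polarization image can be sketched as follows. This is a minimal sketch that assumes the images are already registered, uses the common definitions S0 = I0 + I90, S1 = I0 - I90, S2 = I45 - I135, and DoLP = sqrt(S1^2 + S2^2)/S0, and adds a small `eps` guard against division by zero. The function name and the guard are assumptions, and system calibration is not modeled.

```python
import numpy as np


def stokes_and_dolp(i0, i45, i90, i135, eps=1e-12):
    """Return (S0, S1, S2, DoLP) from four registered intensity images."""
    i0, i45, i90, i135 = (
        np.asarray(a, dtype=np.float64) for a in (i0, i45, i90, i135)
    )
    s0 = i0 + i90        # total intensity
    s1 = i0 - i90        # horizontal versus vertical component
    s2 = i45 - i135      # diagonal component
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, eps)
    return s0, s1, s2, dolp


# Synthetic 2 x 2 example with constant intensities.
i0 = np.full((2, 2), 3.0)
i45 = np.full((2, 2), 2.0)
i90 = np.full((2, 2), 1.0)
i135 = np.full((2, 2), 2.0)
s0, s1, s2, dolp = stokes_and_dolp(i0, i45, i90, i135)
```

In this synthetic case S0 = 4, S1 = 2, and S2 = 0 at every pixel, so the DoLP is 0.5 everywhere.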
Generally, the surface roughness of a 3D silicone mask is smaller than that of real facial skin, and its surface refractive index is larger [31][32][33]. Thus, the MWIR DoLP of a 3D silicone mask is higher than that of a real human face. In addition, in conventional MWIR intensity images, presentations with masks appear darker than real faces, but their darkness varies with the facial temperature, which may lead to small differences between the two kinds of presentations. In contrast, in the polarization images, the differences are more obvious. Moreover, polarization MWIR face images have richer texture and geometric information, which helps improve the stability of the detection results.
A total of 352 effective samples are collected as the sample dataset for the experiment, including 183 conventional MWIR intensity images and 169 polarization MWIR images. Table 1 shows the composition of the dataset. All data in the dataset are images taken by a camera with a resolution of 320 × 256 and saved in PNG format. For the convenience of feature extraction, the image size is adjusted to 196 × 196 pixels.

Difference before and after Wearing Masks
For face presentation attack detection, we believe that the larger the difference in the low-order features of real and fake face images, the better the detection ability will be. In other words, the larger the difference value D, the better the results of feature extraction for the PAD method. The difference value D between real facial images and masked facial images is defined from the image variances (Equation (16)), where $I_{Fake}$ and $I_{Real}$ represent 3D face mask images and real face images, respectively, and var(·) denotes the variance.
In this research, we calculated the D values of the conventional MWIR images and the corresponding polarization images of each subject before and after wearing the 3D silicone masks (a total of 58 sets of data). The statistical results are shown in Figure 8.
As can be seen from Figure 8, for each subject, the D values of polarization MWIR images before and after wearing 3D silicone masks are greater than the differences in conventional MWIR images. This result indicates that, compared with conventional MWIR imaging, polarization-based MWIR imaging may be more suitable for solving the PAD problem of 3D silicone masks.
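A hedged sketch of a variance-based difference value is shown below. It assumes D is the absolute difference between the pixel variances of the masked and real face images. The exact form of D, including whether an absolute value is taken, is an assumption, and the function name is hypothetical.

```python
import numpy as np


def difference_value(fake_img, real_img):
    """ASSUMED form: D = |var(I_Fake) - var(I_Real)| over all pixels."""
    fake_var = float(np.var(np.asarray(fake_img, dtype=np.float64)))
    real_var = float(np.var(np.asarray(real_img, dtype=np.float64)))
    return abs(fake_var - real_var)


# Synthetic images: the "fake" image has variance 1, the "real" one variance 0.
fake = np.array([[0.0, 2.0], [0.0, 2.0]])
real = np.ones((2, 2))
d_value = difference_value(fake, real)
```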
Figure 8. The D-value distribution of 58 subjects' face images. The red line represents the differences in subjects' conventional MWIR images before and after wearing masks and the blue line represents those of the polarization images.

PAD Results
In this paper, the feature vector representing each facial image is fed into an SVM classifier, and the classification results are obtained after training and testing. Seven-fold cross-validation is used for the classification: all data are divided evenly into seven parts, one of which serves as the test set while the other six serve as the training set. In this experiment, the test set contains 24 samples and the training set contains 145. The procedure is repeated seven times so that each part serves as the test set exactly once, and the seven results are averaged to give a single estimate. The advantage of this method is that relatively stable and reliable detection results can be obtained.
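The splitting scheme can be sketched as follows. This is a plain shuffled k-fold split written with the standard library only; the function names, the fixed random seed, and the absence of class stratification are assumptions, since the paper does not describe how samples were assigned to folds.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and cut them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def train_test_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = k_fold_indices(n_samples, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Calling `train_test_splits(169, 7)` gives six splits with 145 training and 24 test samples, as quoted in the text. Because 169 is not divisible by 7, the remaining split has 144 training and 25 test samples.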
To evaluate the PAD performance, this paper uses three conventional evaluation metrics (accuracy, recall, and precision) together with two metrics defined by the ISO/IEC 30107-3 standard, namely the attack presentation classification error rate (APCER) and the bona fide presentation classification error rate (BPCER). A further derived metric, the average classification error rate (ACER), defined as (APCER + BPCER)/2, summarizes the overall performance of the PAD method as a single number. The lower the ACER value, the better the performance [34].
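The metrics named above can be computed from predicted and true labels as in the following sketch. It assumes that label 1 denotes an attack presentation (treated as the positive class for recall and precision) and label 0 a bona fide presentation, and that only one PAI species is present. The paper does not state its labeling convention, so these choices and the function names are illustrative.

```python
def _ratio(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def pad_metrics(y_true, y_pred):
    """Accuracy, recall, precision, APCER, BPCER and ACER (1 = attack, 0 = bona fide)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks accepted as bona fide
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # bona fide accepted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # bona fide rejected as attacks
    apcer = _ratio(fn, tp + fn)
    bpcer = _ratio(fp, tn + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "recall": _ratio(tp, tp + fn),
        "precision": _ratio(tp, tp + fp),
        "APCER": apcer,
        "BPCER": bpcer,
        "ACER": (apcer + bpcer) / 2,
    }
```

With this convention, APCER equals one minus the recall of the attack class, so it measures how often a mask is accepted as a real face.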
As mentioned in Section 2.2, the selection of the threshold T, the reduction coefficient c1, and the amplification coefficient c2 affects the detection results. These heuristic parameters were determined by running the experiment under different coefficient values, comparing the results, and taking the combination that gave the best result. After a large number of experiments, it was found that the detection performance of the PAD method in this paper is best when T = 77, c1 = 0.07, and c2 = 2.5. The averaged cross-validation results are shown in Table 2. As Table 2 shows, with the same feature extraction and classification scheme, all measures except APCER are better for polarization MWIR images than for conventional MWIR images. The error in the test results may come from the SIFT-based registration algorithm applied before the degree of polarization is computed.
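The parameter selection described above amounts to an exhaustive search over candidate values. A hedged sketch is shown below. The `evaluate` callback is hypothetical: it stands for a routine that runs feature extraction and cross-validation for one (T, c1, c2) combination and returns an error score to minimize, such as ACER. Neither the callback nor the choice of objective is specified in the paper.

```python
from itertools import product


def grid_search(evaluate, t_values, c1_values, c2_values):
    """Return the (T, c1, c2) combination with the lowest score, and that score.

    `evaluate` is a hypothetical callable taking (T, c1, c2) and returning
    an error value where lower is better.
    """
    best_params, best_score = None, float("inf")
    for t, c1, c2 in product(t_values, c1_values, c2_values):
        score = evaluate(t, c1, c2)
        if score < best_score:
            best_params, best_score = (t, c1, c2), score
    return best_params, best_score
```

The cost grows with the product of the three candidate list lengths, which is why coarse grids are typically refined around the best point rather than searched densely from the start.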
Since accuracy, recall, precision, and ACER directly represent the detection performance of the PAD method, the standard deviations of these four metrics over the seven cross-validation experiments are calculated for the conventional MWIR data and the polarized MWIR data to measure the stability of the PAD performance for the two data types, as shown in Table 3 (conventional MWIR data) and Table 4 (polarization MWIR data). As can be seen from Tables 3 and 4, the standard deviations of the PAD results for polarization MWIR images are all significantly lower than those for conventional MWIR images. That is, with polarized MWIR imaging the PAD results fluctuate within a smaller range across folds than with conventional MWIR imaging. This shows that a polarization MWIR imaging system provides relatively more stable performance for silicone mask presentation attack detection.
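The stability measure can be sketched as the per-metric standard deviation across folds. The dictionary layout and function name below are assumptions, and the population standard deviation (`statistics.pstdev`) is used only for illustration, because the paper does not say whether the population or the sample form was applied.

```python
import statistics


def fold_stability(per_fold_scores):
    """Map each metric name to (mean, standard deviation) over the folds.

    `per_fold_scores` maps a metric name to the list of its values,
    one value per cross-validation fold.
    """
    return {
        name: (statistics.mean(values), statistics.pstdev(values))
        for name, values in per_fold_scores.items()
    }
```

A smaller standard deviation for a metric means that the seven test folds gave more similar values for it.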
To summarize, polarized MWIR imaging is more suitable than conventional MWIR for studying PAD methods for 3D silicone masks.
In addition, to reflect the stability and reliability of the classifier used in this paper, the receiver operating characteristic (ROC) curve and the precision-recall curve are drawn from the average results of all cross-validations, as shown in Figure 9. Furthermore, the area under the curve (AUC) of the ROC and the average precision (AP) of the precision-recall curve (that is, the area under the precision-recall curve) are calculated to reflect the performance of the classifier numerically. As can be seen from the annotation in the figure, AUC = 0.96 and AP = 0.92. Combining the trend of the two curves with the AUC and AP values, it can be inferred that the classifier used in this paper is stable and has good performance.
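Both summary values can be computed directly from classifier scores, as in the sketch below. It assumes that label 1 is the positive class, that higher scores indicate the positive class, and that a single set of scores is evaluated rather than an average over folds. The AUC is obtained from its rank interpretation (the fraction of positive-negative pairs ordered correctly), and the AP uses the common step-wise summation over the ranked list, which assumes distinct scores. The paper does not state which estimators were used.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # A correctly ordered pair counts 1, a tied pair counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (assumes distinct scores)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(1 for t in y_true if t == 1)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recall step
    return total / n_pos
```

Both values equal 1.0 when every positive sample is scored above every negative sample.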

Effect of Facial Temperature
To explore the influence of facial temperature on the performance of the detection method in this paper, conventional MWIR images and the corresponding MWIR polarization images of the real and fake faces of 10 subjects are selected from the collected database. The facial temperature of these 10 subjects is maintained at the normal level, and they are numbered from No. 1 to No. 10. Another 10 subjects are asked to increase their facial temperature through exercise, and conventional infrared intensity images and the corresponding polarization infrared images of their real and fake faces are then taken in the same way. The 10 subjects with an increased facial temperature are numbered from No. 11 to No. 20.
The D values of the above 20 subjects' 3D face mask images and real face images are calculated, respectively, and they are shown in Figure 10. For convenience, the data from (a) and (b) in Figure 10 are combined, as shown in Figure 11.

Figure 11. Joint D-value distribution for real and fake face images of 20 subjects. Horizontal axis: number of sample. Legend: intensity differences (before), polarization differences (before), intensity differences (after), polarization differences (after).

Figure 11 shows that:
1. Whether or not the facial temperature is changed, the polarization infrared images of real faces and 3D face masks can maintain the large differences between them compared with the conventional MWIR intensity images.
2. After the increase in facial temperature, the difference in conventional MWIR images between the real and fake faces tends to decrease, while the differences in their polarization images remain at a high level.

It is easy for an attacker to make the infrared radiation intensity of a 3D mask similar to that of a real face by changing the facial temperature, so as to reduce the detection performance of the PAD method based on conventional MWIR images. However, the results of this experiment show that changes in the facial temperature cannot reduce the detection performance of the PAD method based on the MWIR polarization characteristics of the material surface and gradient amplitude features.

Conclusions
This paper proposes a method to counter 3D silicone mask presentation attacks by employing polarization MWIR imaging. The method uses a polarization MWIR imaging system to capture data without the need for visible or NIR light sources. The feature extraction process is designed based on the difference in the degree of linear polarization (DoLP) of the infrared radiation in different regions of a facial image, which is related only to the refractive index N and the surface roughness σ and is independent of the appearance of the face. The quantitative experiments in this paper show that polarization-based MWIR imaging is more suitable for 3D silicone face mask PAD than conventional MWIR imaging. Furthermore, the PAD method in this paper shows a certain robustness to changes in facial temperature. However, because of the cost of the masks, the amount of data collected in this research is not large, so deep learning methods cannot be used, as the network would over-fit. In future work, the dataset will be expanded to develop a more advanced deep learning method based on polarized MWIR imaging for 3D silicone masks.