1. Introduction
Despite their widespread use and acceptance in an ever-increasing range of applications, biometric person recognition systems remain vulnerable to sophisticated spoofing attacks that can undermine trust in them [1]. This type of spoofing is a direct attack on the sensor and is also known as a presentation attack. Such presentation attacks use artefacts such as photos or masks that may be created from previously captured data of genuine users and then presented at the system sensor(s). In this way, with little or no prior knowledge of the internal operation of the biometric system, an impostor can present a fake biometric sample of a genuine user to gain unauthorised access. To detect such sensor-level attacks, it is necessary to recognise automatically whether any artefacts are being used and to establish that a genuine user is present at the sensor providing the “live” sample.
This paper presents novel features, based on stimulated pupillary movements, for presentation attack detection (PAD), extending our work originally reported in [2]. The user's gaze is directed along a random path presented via a display device. The accuracy with which the user's gaze follows this path is then used as a means of detecting presentation attacks, the underlying assumption being that the use of an attack artefact such as a photo or mask by an impostor makes it more difficult to follow the path of the stimulus accurately. Different stimulus trajectories, durations, planar geometries and attack artefacts are evaluated using data captured from 80 volunteers. The results are discussed and compared with other approaches for presentation attack detection. While our previous work [3] was able to deal with video attacks, the scope of this work is limited to photo and mask attack detection only; the proposed approach does not protect against video replay attacks. The focus of this work is the reduction in the user interaction time required for the mask and photo attack instruments while considering a more restrictive device geometry to simulate a mobile device application.
The paper is organised as follows. Section 2 provides an overview of the state of the art in presentation attack detection. Section 3 describes the proposed techniques, including two different types of challenge trajectory and the feature extraction technique. Section 4 presents the evaluation protocol and the experimental results and, finally, Section 5 provides conclusions and suggestions for further work.
4. Experimental Evaluation
Three types of attack artefact were used to evaluate the proposed techniques. The attack scenarios assume an impostor attempting to subvert the biometric system by displaying a high-resolution image of a genuine user on a tablet screen (photo attack), by holding a high-quality printed colour photo with holes in place of the pupils in front of the impostor's face as a mask (2D mask attack), or by presenting a three-dimensional mask constructed using the genuine user's data (3D mask attack) [17].
Eighty adult male and female participants from a range of ethnic backgrounds were recruited to evaluate the proposed system, acting as both genuine users and impostors. The number of participants is similar to that used in other published work on presentation attack detection and should be sufficient to illustrate the potential of the proposed approach.
Figure 4 illustrates the hardware setup for data acquisition as well as snapshots of user attempts (both genuine and impostor attacks).
Figure 4a is an example of a genuine attempt, Figure 4b shows a projected photo attack, and Figure 4c,d show 2D mask and 3D mask attacks, respectively.
Two different device geometries were simulated on a desktop display for system evaluation. Active screen areas of 6.45 × 11.30 cm² and 15.87 × 21.18 cm² were used, corresponding to typical handheld mobile phone and tablet devices, respectively. These formats were envisaged as the most likely to be used when accessing services through mobile devices. Acquired data were partitioned in a 60:40 ratio for training and testing, and k-NN classifiers were used for PAD classification.
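The partitioning and classification step described above can be illustrated with a minimal sketch. It makes several assumptions that are not taken from this paper: the feature vectors are synthetic two-dimensional points standing in for the gaze features of Section 3, the split is random per sample (whether the actual split was per participant or per attempt is not stated here), and the value of k, the Euclidean distance and the function names `split_60_40` and `knn_predict` are hypothetical choices.

```python
import math
import random
from collections import Counter


def split_60_40(samples, seed=0):
    """Shuffle a copy of the samples and split it 60:40 into train and test."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.6 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples closest to the query."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Synthetic stand-in data (assumption): two well-separated clusters.
rng = random.Random(42)
genuine = [((rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)), "genuine")
           for _ in range(50)]
attack = [((rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)), "attack")
          for _ in range(50)]

train, test = split_60_40(genuine + attack)
accuracy = sum(knn_predict(train, x, k=3) == y for x, y in test) / len(test)
print(len(train), len(test), accuracy)
```

With real gaze features the classes would overlap far more than these synthetic clusters, so the accuracy printed here says nothing about the performance of the proposed system.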
The Receiver Operating Characteristic (ROC) curves for photo, 2D mask and 3D mask attacks for the tablet format with the Lines challenge and attempt durations of 5 s are given in Figure 5a. Here, the True Positive Rate (TPR) is the proportion of genuine user attempts that are correctly identified, whereas the False Positive Rate (FPR) is the proportion of presentation attacks that are not detected by the system. While the 2D and 3D mask attacks are relatively easy to detect using the Lines challenge trajectory type, photo attacks were significantly more difficult to detect. This may be because the images captured from the photo attacks are smaller and of lower quality, making gaze feature extraction more susceptible to noise.
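To make the TPR and FPR conventions used here concrete, the following sketch computes the TPR at a fixed FPR from two lists of scores. It assumes, purely for illustration, that the classifier yields a scalar score where higher means "more likely genuine" and that a sample is accepted when its score reaches a threshold; the function name `tpr_at_fpr` and the example scores are hypothetical and are not data from this study.

```python
def tpr_at_fpr(genuine_scores, attack_scores, target_fpr):
    """Return (threshold, tpr, fpr) for the lowest threshold whose FPR
    does not exceed target_fpr.

    A sample is accepted as genuine when score >= threshold, so
    TPR = accepted genuine attempts / all genuine attempts and
    FPR = accepted (undetected) attacks / all attacks.
    """
    # Candidate thresholds: every observed score, plus +inf so that a
    # threshold rejecting everything (FPR = 0) always exists.
    candidates = sorted(set(genuine_scores) | set(attack_scores))
    candidates.append(float("inf"))
    for thr in candidates:
        fpr = sum(s >= thr for s in attack_scores) / len(attack_scores)
        if fpr <= target_fpr:
            tpr = sum(s >= thr for s in genuine_scores) / len(genuine_scores)
            return thr, tpr, fpr


# Made-up scores for illustration only.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
attack = [0.7, 0.5, 0.3, 0.2, 0.1]

print(tpr_at_fpr(genuine, attack, 0.2))  # (0.6, 0.8, 0.2)
print(tpr_at_fpr(genuine, attack, 0.0))  # (0.75, 0.6, 0.0)
```

Because the FPR can only fall as the threshold rises, the first threshold that satisfies the FPR constraint also gives the highest attainable TPR. Under this convention the false negative rate is simply 1 − TPR.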
The ROC curves for the three attack scenarios in the phone format are given in Figure 5b. The photo attack again appears to be harder to detect than the 2D and 3D mask attacks, which are easy to detect. 3D mask attacks are slightly harder to detect than 2D mask attacks.
Table 1 provides a summary of the performance figures for the photo, 2D mask and 3D mask attacks in both the tablet and phone formats at various FPR settings. The system performance values are reasonably high at 10% FPR. However, as the FPR is lowered, the performance drops significantly, especially for photo attack detection. For the phone format in particular, the TPR dropped to as low as 16% at 1% FPR. Nevertheless, for 2D and 3D mask attacks, the reduction in system performance was much smaller than for photo attacks in both the tablet and phone formats.
Table 2 provides a comparison of the performance of the three PAD cases in both the tablet and phone formats for various challenge durations. In most cases, the performance remains almost unchanged for 3, 5 and 10 s challenge durations. This suggests that short-duration challenges may be acceptable, thus enhancing the usability of the proposed approach. Performance in photo attack detection (especially for the phone format) increased with longer durations, indicating that for some difficult cases, longer challenges can improve the robustness of the system.
The next set of results explores the impact of a challenge scenario comprising smooth pursuits only and reports on a set of experiments with data captured using the Curves challenge. The purpose of these experiments was to check the effect of the challenge trajectory design on the performance of the system. It was envisaged that the abrupt directional changes present in the Lines stimuli might have had a detrimental effect (such as triggering large head movements) and that the new Curves challenge would encourage a smooth pursuit of gaze. Once again, all three attack artefact types (photo, 2D mask and 3D mask) were used, and the tablet and phone challenge geometries were investigated.
Figure 6 presents the ROC plots for the Curves challenge trajectory for the three attack artefact types. The proposed method again performed very effectively in distinguishing 2D mask attacks from genuine presentations. The performance for the 3D mask attacks, while not as good as that for 2D mask detection, was reasonably close. However, markedly lower performance was observed for photo attack detection in both form factors.
Table 3 provides a summary of the results for various FPR settings with the Curves challenge trajectory. When compared with the Lines challenge figures in Table 2, the TPR values for the Curves stimulus were lower for most of the attack scenarios, except that the photo attack detection rate for the phone format improved. This indicates that the two challenge trajectories are to some extent complementary and that a hybrid of the two may prove optimal.
The performance of the proposed PAD system using the Curves challenge trajectory at 10% FPR is summarised in Table 4 for various challenge durations. Unlike with the Lines challenge, TPR values improved with increased challenge duration. In particular, the performance for photo attacks, while lower than that for the other attack types, improved noticeably with increased challenge duration. The relatively low performance for photo attacks may be due to the small size of the photos used in the simulated attack, which makes feature extraction less precise. Even for a 5 s challenge duration, the RPD-based system was able to achieve a TPR of 90% or more for the two types of mask attack.
In the following set of experiments, we explored the feasibility of a composite scheme simulating a hybrid scenario where the user was presented with both visual challenges in succession and the final decision was based on the fusion of the outcomes of the two components.
Figure 7 shows the results for this composite challenge for the two device formats and three attack types. Each of the challenges was presented to the user for 3 s and analysed independently before the outcomes were fused using the product rule. The ROC curves in Figure 7 also include the 3 s- and 5 s-long pure Lines and pure Curves challenge outcomes for comparison. Logarithmic axes were used to highlight the differences at low FPR settings. The composite scheme clearly outperformed the individual challenge types, especially for the 2D mask and 3D mask attack types. The response to the photo attacks was somewhat mixed. For the tablet format, the detection rates were clearly higher than those for the pure Lines and pure Curves challenges. However, for the phone format, the detection rates were similar to, albeit a little lower than, those for the Curves challenge.
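As a small illustration of product-rule fusion in general, the sketch below multiplies the per-class scores from two independently analysed challenges and renormalises them. It assumes each challenge yields posterior-like scores for the "genuine" and "attack" classes (for a k-NN classifier these could be, for example, the fraction of neighbours in each class); the score values, the renormalisation step and the function name `product_rule` are illustrative assumptions, not details taken from this study.

```python
def product_rule(scores_a, scores_b):
    """Fuse two per-class score dictionaries by multiplying the scores
    class by class and renormalising so that they sum to one."""
    fused = {c: scores_a[c] * scores_b[c] for c in scores_a}
    total = sum(fused.values())
    if total == 0:
        raise ValueError("all fused scores are zero; cannot normalise")
    return {c: v / total for c, v in fused.items()}


# Hypothetical per-challenge outputs for a single attempt.
lines_scores = {"genuine": 0.6, "attack": 0.4}
curves_scores = {"genuine": 0.8, "attack": 0.2}

fused = product_rule(lines_scores, curves_scores)
print(fused)  # "genuine" is about 0.857, "attack" about 0.143
```

When both components lean the same way, the fused score is more decisive than either alone, which is one reason a product-rule composite can help at low FPR settings. Conversely, a single confident but wrong component can dominate the product.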
Table 5 summarises these performance figures for the photo, 2D mask and 3D mask attacks in both the tablet and phone formats at various FPR settings. The TPR values from the composite challenge scenario are noticeably higher than those from the Lines or Curves challenges alone, except for photo attack detection in the phone format. When compared with the best TPR values obtained from either the Lines or the Curves challenge, improvements of 3–5% can be achieved for the tablet format and 0.5–1.3% for the phone format at low FPR settings (≤0.03). Even when compared with 10 s pure challenges, the 6 s composite challenge performed better in most cases.
Degraded performance (by 5–6%) was observed only for photo attacks in the phone format. This is most likely due to the low photo attack detection success of the Lines challenge in the phone format. A careful adjustment of the contributions of the Lines and Curves elements in the composite scenario may overcome this anomaly; however, this optimisation has not been explored in this study.
In a real-life scenario, an impostor may use any of the face artefacts (photo or 2D/3D masks) in their presentation attack. To simulate this, in the following experiment, we combined all three types of attack under a single category and assessed the detection success of the proposed system.
Figure 8 shows ROC plots for the three challenge scenarios (Lines, Curves and composite). The aim here was to detect whether or not any of the attack artefacts was used; no attempt was made to determine the type of artefact. Due to the increased diversity of attacks, the TPR values (see Table 6) are lower than those obtained when the specific attack type was known. However, especially for the composite challenge type, more than 80% of the attacks were detected at FPR settings of ≥0.02.
Table 7 compares the performance of PAD techniques reported in the literature with that of the proposed technique. For this comparison, the results of the proposed system for all three attack types were combined to obtain an overall estimate of the False Negative Rate (FNR) at different FPR settings. As different databases and evaluation protocols were used in the evaluations reported in the literature, it is difficult to make a direct comparison between these results. However, as a general indication of the potential of the proposed eye-movement features for presentation attack detection, the comparison is very promising.