Iris Liveness Detection Using Multiple Deep Convolution Networks

Abstract: Over the past decade, extensive research has been carried out on promising biometric modalities that use human physical features for person recognition. This work focuses on iris characteristics and traits for person identification and iris liveness detection. This study used five pre-trained networks, namely VGG-16, Inceptionv3, Resnet50, Densenet121, and EfficientNetB7, to recognize iris liveness using transfer learning techniques. These models are compared using three state-of-the-art biometric databases: the LivDet-Iris 2015 dataset, the IIITD contact dataset, and the ND Iris3D 2020 dataset. Validation accuracy, loss, precision, recall, f1-score, APCER (attack presentation classification error rate), NPCER (normal presentation classification error rate), and ACER (average classification error rate) were used to evaluate the performance of all pre-trained models. According to the observational data, these models have a considerable ability to transfer their experience to the field of iris recognition and to recognize the nanostructures within the iris region. Using the ND Iris3D 2020 dataset, the EfficientNetB7 model achieved 99.97% identification accuracy. Experiments show that pre-trained models outperform other current iris biometrics variants.


Introduction
Iris identification systems have proven to be dependable over time and are inexpensive, non-invasive, and contactless; these attributes will help them to expand in the market over the coming years [1]. Presentation attack instruments (PAIs) have been proven to be a significant threat to iris recognition systems [2]. Here, a PAI is the biometric trait or object employed in a presentation attack (PA). Presentation attack detection (PAD) refers to a biometric system's ability to identify PAIs, which would otherwise deceive the system into mistaking an unauthorized user for a legitimate one by presenting an artificial, forged version of the original biometric attribute to the image capture equipment.
The biometric community, including researchers and manufacturers, has taken on the difficult challenge of designing and creating effective security measures against this issue [3], with PAD approaches being recommended as a possible solution. Threats are no longer limited to theoretical or scientific research; they are already being carried out against real-world businesses. One example is using a regular printer and a contact lens to attack Samsung Galaxy S8 devices with the iris unlock feature. Hacking groups aiming to gain notoriety for genuine criminal cases have disclosed this instance to the public via live biometric presentations during conferences [4]. All of these threats, as well as any new or unfamiliar PAI forms that might be developed in the future, should be detectable using an ideal PAD approach [4]. As early editions of the LivDet competition have demonstrated, PAD for iris recognition systems is a diverse problem, with many unresolved issues in developing practical iris PAD algorithms [5]. This article uses five pre-trained networks to recognize iris liveness: VGG-16 [6], Inceptionv3 [7], Resnet50 [8], Densenet121 [9], and EfficientNetB7 [10]. We compared the models in this study using the same data and factors to find the best model for distinguishing between real and fake iris images. To eliminate any biases, the models were trained and evaluated on real and fake iris images from several datasets. The models were then evaluated using performance measures, along with the time taken to compute them. The findings were thoroughly examined, and the best model for binary classification was selected.
To our knowledge, the most commonly employed transfer learning models for identifying iris liveness in the existing literature are VGG-16 [11], Inceptionv3 [7], Resnet50 [8], Densenet121 [8], and EfficientNetB7. These models are validated using one or two iris datasets. To date, no one has carried out a comparative analysis among these models, based on different state-of-the-art iris biometric databases. Therefore, there is a need for these comparative analyses to identify which pre-trained model gives the best iris liveness detection among the different standard iris benchmark datasets.
In this paper, several transfer learning models are used for iris liveness detection. This work's primary contributions can be summarized in the following points:
• To identify iris liveness through five pre-trained networks, namely, VGG-16, Inceptionv3, Resnet50, Densenet121, and EfficientNetB7;
• To conduct a performance comparison across all five models to decide which pre-trained model is better for Iris-PAD;
• To fine-tune all these models to achieve better performance.
To assess these models, twelve performance indicators are used: validation accuracy, training accuracy, validation loss, training loss, precision, recall, f1-score, APCER, BPCER, ACER, training time, and testing time.
The rest of the paper is structured as follows. Section 2 discusses related work in the published literature. The background, architecture, and working process of the proposed system are all described in Section 3. Section 4 explains the experimental setup, along with a description of the datasets used for experimentation and the performance metrics used for evaluation. Section 5 describes the experimental results. A comparison of the model's performance with other models and a discussion of the results is offered in Section 6. Lastly, Section 7 offers our conclusions.

Related Work
Due to the increasing deployment of these systems for various secure processes, which raises the possibility of criminal assaults on these sensitive systems, numerous PAD approaches for iris identification systems have been presented in the research literature [12]. The majority of iris PAD research has been focused on deep learning algorithms since 2018, but a few traditional computer-vision-based methods have been proposed [13].
The following are some of the most prevalent methods previously used for detecting iris liveness. Czajka [14] used pupil dynamics to create a liveness detection system. The pupil reaction was tested in this technique, using unexpected changes in light intensity. Fathy and Ali [15] did not take into account the segmentation and normalization steps that are commonly employed in fake iris identification systems. The original image is broken down into wavelets using wavelet packets (WPs). For false iris identification, Agarwal et al. [16] employed a feature descriptor called a local binary hexagonal extreme pattern. The proposed descriptor takes advantage of the hexagonal neighborhood relationship between the center pixel and its neighbors. Thavalengal et al. [17] created a smartphone device for capturing RGB and NIR images of the eye and iris. For detection, distance measurements and pupil localization algorithms are often applied. One of the most recent and promising classification techniques uses deep learning. In the field of iris images, there are many works that apply this approach. Some of these works are described below. Kuehlkamp [18] suggested integrating two iris PAD techniques: ensemble learning and CNNs. Extensive testing of this technique was carried out with the most challenging publicly available datasets. Cross-sensor and cross-dataset analyses were part of their experiments.
Their results revealed that different BSIF + CNN representations have differing abilities to capture distinct elements of the input images. This technique outperforms the LivDet-Iris 2017 competition results. Hoffman also reported good cross-dataset and cross-attack performance. A CNN had previously been used in [9] to perform classification tests on patches of an iris region. The findings revealed that the most challenging presentation attack to identify is that of textured contact lenses. This classification test method was eventually expanded to include the ocular region [19]. Three CNNs were combined to generate classification judgments in that study. Additional information that assists classification and excellent cross-dataset performance can be obtained by studying the ocular region in conjunction with the iris.
Transfer learning [20] is a process in which a model trained for one purpose on any large dataset can be reused for training and testing for a related purpose on small datasets. This approach is still used and is applied to iris images in several previous studies, some of which are described here.
Spoof nets [7] comprise four convolutional layers and one inception module and were inspired by GoogleNet. The inception module is made up of parallel layers of convolutional filters with dimensions of 1 × 1, 3 × 3, and 5 × 5. The module reduces the architecture's complexity and increases its efficiency, as the 1 × 1 filters reduce the number of features before the convolution layers with higher-dimensional filters are applied. The ResNet50 framework was used by Boyd [8] to see if iris-specific feature extractors could outperform a network trained for non-iris applications. They used five distinct sets of weights to demonstrate "three types of networks: off-the-shelf networks, fine-tuned networks, and networks trained from scratch for iris identification. They found that fine-tuning a current network to the specific iris domain outperformed training from scratch".
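The benefit of placing 1 × 1 filters ahead of larger filters can be checked with a simple weight count. The sketch below is only an illustration: the channel sizes are hypothetical values chosen for the arithmetic and are not taken from the Spoof net design.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights (biases ignored) in a kernel x kernel convolution."""
    return kernel * kernel * in_channels * out_channels


# Hypothetical channel sizes, chosen only to illustrate the effect.
IN_CH, REDUCED_CH, OUT_CH = 192, 16, 32

# A 5 x 5 convolution applied directly to all input channels.
direct = conv_weights(5, IN_CH, OUT_CH)

# A 1 x 1 convolution that first reduces the channels, followed by the 5 x 5.
bottleneck = conv_weights(1, IN_CH, REDUCED_CH) + conv_weights(5, REDUCED_CH, OUT_CH)

print(f"direct 5x5: {direct} weights, 1x1 then 5x5: {bottleneck} weights")
```

With these assumed sizes, the reduced path needs roughly a tenth of the weights of the direct 5 × 5 convolution.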
For iris PAD, Yadav et al. [11] integrated handcrafted and deep-learning-based features. The VGG16 features were acquired from the last fully connected layer, which had a size of 4096; then, PCA (principal component analysis) was used to reduce it to a lower-dimensional vector. Trokielewicz et al. [21] offered a method of iris PAD to detect post-mortem samples, using a fine-tuned VGG-16 architecture. By providing class activation maps, this approach also examines those features and regions that the network finds most relevant to PAD classification. The results demonstrated a significant ability to detect post-mortem iris samples; however, there was no discussion of the cross-attack analysis. Yadav et al. [9] offered DensePAD, a novel PAD design based on the well-known DenseNet CNN architecture. This architecture receives 120 × 160 normalized iris images as input and outputs a judgment on whether the sample is genuine or not. Their study looked at textured contact lenses in an uncontrolled, cross-sensor environment and presented promising findings on previously unseen varieties of textured contacts.
Unfortunately, because different researchers use separate deep learning approaches, it is questionable which model is superior. As a result, this paper examines multiple deep neural networks that have previously produced excellent results in identifying iris liveness. Based on our classification needs, the existing models were fine-tuned (VGG-16, Inceptionv3, Resnet50, Densenet121, and EfficientNetB7) in this research. As a result, the models were evaluated throughout this study using the same datasets and parameters to find the best model for distinguishing between authentic and false iris images. To prevent biases, the models were first trained and evaluated on real and fake iris images from diverse datasets. The models were then evaluated, based on their evaluation criteria and the amount of time they took to compute the results. The findings were thoroughly examined, and the best model for binary classification was chosen.

Proposed Iris Liveness Detection
To reduce training time, a transfer learning strategy was applied and pre-trained weights from ImageNet were used, which helped to speed up the process. Because the data set is small, transfer learning also helped the models to avoid overfitting. The schematic design of the proposed model is illustrated in Figure 1. In this investigation, three standard iris benchmark datasets were used. To prevent any bias toward a particular dataset, images from several different databases were fed to the models. We fine-tuned the last layer of five state-of-the-art deep learning models: VGG-16, Inceptionv3, Resnet50, Densenet121, and EfficientNetB7. For binary classification, the last set of layers of each model was replaced with a flatten layer, which turns the output of the preceding layer into a long one-dimensional vector, followed by fully connected layers and a SoftMax activation function. During the training of these networks, the data augmentation (DA) technique was used: augmentation approaches such as flipping and rotation were applied to the input image matrix to generate supplementary training images. For regularization, a dropout of 0.5 was added. Finally, a dense layer was added that applied SoftMax activation to the output of the earlier layers and produced two probability outputs for the "Live Iris Image" and "Fake Iris Image" classes. We also offer model selection criteria based on performance and time complexity. We have made all trained models public so that they may be utilized for iris liveness detection transfer learning. The design of these models and how they have been employed for two-class classification are briefly described in the following sections.
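The flipping and rotation augmentations mentioned above can be sketched on a small image matrix. This is a minimal illustration in plain Python, not the authors' pipeline: the text does not state the rotation angles used, so the 90-degree rotation and the helper names are assumptions.

```python
def flip_horizontal(image):
    """Mirror a 2-D image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror a 2-D image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate a 2-D image by 90 degrees clockwise (assumed angle)."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the supplementary images generated from one input image."""
    return [flip_horizontal(image), flip_vertical(image), rotate_90_clockwise(image)]


sample = [[1, 2, 3],
          [4, 5, 6]]
for variant in augment(sample):
    print(variant)
```

Each input image yields three extra training images here; a real pipeline would typically apply such transforms randomly at training time.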

VGG-16
VGG16 takes a fixed-size 224 × 224 RGB image as input. It has 16 weight layers, comprising 13 convolutional layers and three fully connected layers, and uses max pooling to reduce the spatial size of the feature maps, with a SoftMax classifier just after the last fully connected layer. For this learning process, the last fully connected layer and SoftMax activation are replaced with our designed classifier, as shown in Figure 2 [6].
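The layer count and the effect of max pooling can be traced with a short calculation. The sketch below assumes the standard VGG16 layout (five convolution blocks, each followed by a 2 × 2 max pool with stride 2, and padded 3 × 3 convolutions that keep the spatial size); the block sizes come from the original VGG design rather than from this article.

```python
INPUT_SIZE = 224

# Convolution layers per block and output channels per block in standard VGG16.
CONVS_PER_BLOCK = [2, 2, 3, 3, 3]
CHANNELS_PER_BLOCK = [64, 128, 256, 512, 512]
FULLY_CONNECTED_LAYERS = 3

size = INPUT_SIZE
for _ in CONVS_PER_BLOCK:
    # Padded 3x3 convolutions keep the size; each 2x2 max pool halves it.
    size //= 2

total_conv_layers = sum(CONVS_PER_BLOCK)
total_weight_layers = total_conv_layers + FULLY_CONNECTED_LAYERS
flattened_features = size * size * CHANNELS_PER_BLOCK[-1]

print(total_conv_layers, total_weight_layers, size, flattened_features)
```

The flattened feature count is the length of the vector that a replacement classifier receives when it is attached after the last pooling layer.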

InceptionV3
The InceptionV1 architecture is also known as GoogleNet. InceptionV3 is 48 layers deep and contains 11 inception modules. It has a 299 × 299 image input size. Convolution filters, pooling layers, and the ReLU activation function are included in each module. InceptionV3 reduces the number of parameters without compromising network efficiency by factorizing convolutions. To reduce the size of the feature maps, InceptionV3 also introduced a new downsizing method. Figure 3 depicts our fine-tuned InceptionV3 model for detecting iris liveness.
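The saving from factorized convolutions can be illustrated by counting weights per input/output channel pair. The sketch below covers the two factorizations described in the original Inception-v3 design (a 5 × 5 filter replaced by two stacked 3 × 3 filters, and a 3 × 3 filter replaced by a 1 × 3 followed by a 3 × 1); it assumes equal channel counts before and after, so it is a simplification.

```python
def kernel_weights(height, width):
    """Weights in one height x width filter for a single channel pair."""
    return height * width


five_by_five = kernel_weights(5, 5)
stacked_three_by_three = 2 * kernel_weights(3, 3)

three_by_three = kernel_weights(3, 3)
asymmetric = kernel_weights(1, 3) + kernel_weights(3, 1)

# Fraction of weights saved by each factorization.
saving_stacked = 1 - stacked_three_by_three / five_by_five
saving_asymmetric = 1 - asymmetric / three_by_three

print(f"5x5 -> two 3x3: {saving_stacked:.0%} fewer weights")
print(f"3x3 -> 1x3 + 3x1: {saving_asymmetric:.0%} fewer weights")
```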

ResNet50
ResNet50 is a ResNet (residual network) variant. There are 48 convolutional layers, one max pool layer, and one average pool layer in this model. Each convolution block has three convolution layers, and each identity block also has three convolution layers. ResNet50 has about 23 million trainable parameters. Figure 4 depicts the fine-tuned ResNet50 model for detecting iris liveness.
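One common way to arrive at these layer counts is to multiply the number of residual blocks by the convolutions in each block. The sketch below assumes the standard ResNet50 stage layout of 3, 4, 6, and 3 bottleneck blocks, which is taken from the original ResNet design rather than from this article; the 1 × 1 projection convolutions on the shortcut paths are not counted.

```python
# Bottleneck blocks in each of the four stages of standard ResNet50.
BLOCKS_PER_STAGE = [3, 4, 6, 3]
CONVS_PER_BLOCK = 3  # 1x1, 3x3, 1x1 in every convolution or identity block

block_conv_layers = sum(BLOCKS_PER_STAGE) * CONVS_PER_BLOCK

# Adding the stem convolution and the final fully connected layer gives
# the 50 weighted layers that the model name refers to.
weighted_layers = 1 + block_conv_layers + 1

print(block_conv_layers, weighted_layers)
```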

DenseNet121
DenseNet121 takes a fixed-size 224 × 224 RGB image as input. DenseNet121 is made up of 121 layers, with about 8 million parameters. It is organized into dense blocks, with the same feature map size but a varying number of filters within each block. Transition layers are the layers that reside between the blocks; they apply batch normalization, a 1 × 1 convolution, and average pooling for down-sampling. The last fully connected layer and SoftMax activation are replaced with a classifier in this experiment, as shown in Figure 5.
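The figure of 121 layers and the growth of the feature maps across dense blocks can be reproduced with a short calculation. The sketch below assumes the standard DenseNet121 configuration (dense blocks of 6, 12, 24, and 16 layers, a growth rate of 32, 64 stem channels, and transition layers that halve the channel count); these values come from the original DenseNet design, not from this article.

```python
LAYERS_PER_BLOCK = [6, 12, 24, 16]
GROWTH_RATE = 32
STEM_CHANNELS = 64

# Every dense layer holds a 1x1 and a 3x3 convolution.
dense_convs = 2 * sum(LAYERS_PER_BLOCK)
# One 1x1 convolution in each transition layer between consecutive blocks.
transition_convs = len(LAYERS_PER_BLOCK) - 1
# Stem convolution + dense convolutions + transition convolutions + classifier.
total_layers = 1 + dense_convs + transition_convs + 1

channels = STEM_CHANNELS
for index, num_layers in enumerate(LAYERS_PER_BLOCK):
    # Each dense layer concatenates GROWTH_RATE new feature maps.
    channels += num_layers * GROWTH_RATE
    if index < len(LAYERS_PER_BLOCK) - 1:
        # The transition layer halves the number of channels.
        channels //= 2

print(total_layers, channels)
```

The final channel count is the size of the feature vector that reaches the classifier after global average pooling.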

EfficientNetB7
EfficientNet [10], one of the most advanced models, introduced a scaling strategy that uses a compound coefficient to uniformly scale a network's depth, width, and resolution. The EfficientNetB0-B7 designs are a family of architectures that have been scaled up from the baseline network and represent a good blend of accuracy and efficiency. Figure 6 depicts our fine-tuned EfficientNetB7 model for detecting iris liveness.
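Compound scaling can be sketched as follows. The base coefficients below (1.2 for depth, 1.1 for width, 1.15 for resolution) are the values reported for the EfficientNet baseline search and are not stated in this article; the function name is hypothetical, and the released B1-B7 models use their own published multipliers, so this shows the principle only.

```python
# Base coefficients from the EfficientNet baseline search (assumed here).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Multipliers for depth, width and resolution at compound coefficient phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


# The coefficients are chosen so that alpha * beta^2 * gamma^2 is close to 2,
# meaning each unit increase of phi roughly doubles the computation.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2

for phi in range(4):
    print(phi, {name: round(value, 3) for name, value in compound_scale(phi).items()})
print(round(flops_factor, 3))
```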

Experimental Set-Up
This section has three subsections. The first discusses the three datasets used for experimental validation. The second describes how the deep learning models were trained. The third covers the evaluation criteria used to assess the results of the suggested approach.

Description of the Dataset
The efficacy of the suggested models against various types of iris spoofing attacks is assessed using several databases. A description of each dataset and the total number of images used for experimentation is given below. From the total samples, 50% of the images were randomly selected for testing. The remaining 50% of the images were divided into two groups, with 80% used for training and 20% used for validation.
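The split described above can be sketched in a few lines. This is a minimal illustration assuming a seeded random shuffle and rounding down of fractional counts; the function name, the seed, and the rounding rule are assumptions, so the exact per-set counts in the article's tables may differ.

```python
import random


def split_dataset(items, test_fraction=0.5, val_fraction=0.2, seed=0):
    """Hold out test_fraction for testing, then split the rest into train/val."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    remaining = shuffled[n_test:]

    # val_fraction applies to the remaining half, not to the full dataset.
    n_val = int(len(remaining) * val_fraction)
    val = remaining[:n_val]
    train = remaining[n_val:]
    return train, val, test


# Image indices stand in for file paths; 3588 is the LivDet-Iris 2015 total.
train, val, test = split_dataset(range(3588))
print(len(train), len(val), len(test))
```

Note that 80% and 20% of the remaining half correspond to 40% and 10% of the full dataset.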

LivDet-Iris 2015: Clarkson Dataset
The Clarkson dataset has different training and testing images. The chosen classifiers were trained on training samples and tested on testing samples that are present in the dataset. In total, 3588 images were used for experimentation. Dalsa and LG sensors were utilized to acquire the images on this dataset. For both training and testing, three types of iris images were provided: live, patterned (contact lenses), and printed photographs [22]. Table 1 shows the number of images used for training, testing, and validation, along with samples of live and fake images from the LivDet-Iris 2015 dataset.

Table 1. LivDet-Iris 2015 dataset features. Total instances: 3588.

IIITD Contact Lens Iris (CLI) Dataset
The IIIT-D CLI database is provided by the image analysis and biometrics laboratory of the IIIT in Delhi [23,24]. It consists of 6570 iris images from 101 separate people. A total of 202 iris classes were created by photographing each subject's left and right iris. Images were captured using the Cogent CIS 202 dual iris sensor and the VistaFA2E single iris sensor [25]. The dataset provides three types of iris images: live (original images), colored contact lenses, and clear contact lenses. A total of 2000 images were selected randomly for experimentation. Table 2 shows the number of images used for training, testing, and validation, along with a sample of live and fake images from the CLI dataset.

ND Iris3D 2020 Dataset
Images of irises with and without contact lenses were taken of 88 subjects (176 irises), using three distinct brands of contact lenses: Johnson & Johnson, Ciba Vision, and Bausch & Lomb [5,26]. Under varying near-infrared illumination, images were acquired using the LG4000 and AD100 iris sensors, allowing optical stereo-based 3D reconstruction techniques to be designed and tested. The dataset contains 6838 images, with the LG4000 sensor acquiring 3488 images and the AD100 sensor acquiring 3362 images. Table 3 shows the number of images used for training, testing, and validation, along with a sample of live and fake images from the ND_Iris3D_2020 dataset.

Model Training
For this study, a transfer learning approach was adopted, and pre-trained weights from ImageNet were used as the starting point for training. The pre-trained model weights were treated as the initial values for the new training process, and they were updated and adjusted during training. In this way, the weights were fine-tuned from generic feature maps to the specific features associated with the new dataset. The goal of fine-tuning is to adapt generic features to a given task, rather than overwrite the generic learning.
The VGG-16, Inceptionv3, Resnet50, Densenet121, and EfficientNetB7 models were trained on a computer with an Intel(R) Core(TM) i3-6006U CPU @ 2.00 GHz and 12.0 GB of RAM, running a 64-bit operating system. The deep learning library TensorFlow 2.7 with the Keras API was used to create and implement all models. The categorical cross-entropy loss function, which measures how far the predicted probabilities are from the ground truth labels, was used to train the models. We then used an Adam optimizer with a learning rate of 0.001 to minimize the loss function. To avoid overfitting or underfitting, we used an early stopping approach based on validation performance. During the training of these networks, data augmentation (DA) techniques, such as flipping and rotation, were applied to the input image matrix to generate supplementary training images. These augmentation techniques reduce the risk of overfitting, thereby improving the accuracy on unseen data. To reduce the bias toward a dataset, the system was presented with images from several databases. For regularization, a dropout of 0.5 was added. Finally, a dense layer was added that applied SoftMax activation to the output of the earlier layers and produced two probability outputs for the "Live Iris Image" and "Fake Iris Image" classes.
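The SoftMax output, the categorical cross-entropy loss, and the early stopping rule can be sketched in plain Python. This is an illustration of the techniques named above, not the authors' TensorFlow code: the class order (live first), the patience value, and the sample numbers are assumptions made for the example.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def categorical_cross_entropy(one_hot_target, probabilities, eps=1e-12):
    """Loss for one sample: negative log-probability of the true class."""
    return -sum(t * math.log(max(p, eps))
                for t, p in zip(one_hot_target, probabilities))


def early_stopping_epoch(val_losses, patience=2):
    """Index of the epoch at which training stops (patience is assumed)."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical scores for one image; index 0 = live, index 1 = fake.
probs = softmax([2.0, 0.0])
loss_live = categorical_cross_entropy([1, 0], probs)
stop_at = early_stopping_epoch([0.9, 0.5, 0.4, 0.45, 0.41, 0.6])
print(probs, loss_live, stop_at)
```

Here training stops at the second consecutive epoch without an improvement in validation loss.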

Performance Measures
Accuracy, a common machine learning performance metric, was used to compare all the tested variations of the suggested approach. In addition, other critical biometric measures were considered: loss, validation accuracy, precision, recall, F1 score, APCER, NPCER, and ACER. The formulae for all performance measures are given in Equations (1)-(8).
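As a rough illustration of how these measures relate to one another, the following sketch computes them from the four cells of a two-class confusion matrix. It uses the commonly adopted definitions (APCER as the fraction of attack presentations accepted as live, NPCER as the fraction of normal presentations rejected as fake, and ACER as their mean) and treats the fake class as the positive class; both choices are assumptions, as is the function name `liveness_metrics`, and the counts are made-up values rather than results from this study.

```python
def liveness_metrics(tp, tn, fp, fn):
    """Compute classification and presentation-attack metrics.

    Assumed convention: "fake" (attack) is the positive class, so
      tp = fake classified as fake,  fn = fake classified as live,
      tn = live classified as live,  fp = live classified as fake.
    All denominators are assumed to be non-zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    apcer = fn / (tp + fn)      # attack presentations accepted as live
    npcer = fp / (tn + fp)      # normal presentations rejected as fake
    acer = (apcer + npcer) / 2  # average of the two error rates
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "apcer": apcer,
        "npcer": npcer,
        "acer": acer,
    }

# Hypothetical counts, for illustration only.
m = liveness_metrics(tp=95, tn=90, fp=10, fn=5)
```

With this convention, APCER equals one minus the recall of the fake class, which is why a model can show high accuracy yet a non-trivial ACER when one class is misclassified more often than the other.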

Results
This section gives the results of several experiments on the three datasets with five transfer learning networks, i.e., VGG-16, Inceptionv3, Resnet50, Densenet121, and EfficientNetB7. This section is organized into five sub-sections. Section 5.1 presents the results and graphs for the VGG-16 approach. Section 5.2 presents the results of the InceptionV3 network tests. The ResNet50 approach is discussed in Section 5.3. Section 5.4 presents the results of the DenseNet121 network tests. Section 5.5 presents the results of the EfficientNetB7 network tests. Tables 4-8 show the validation results for the respective networks.

VGG-16
This section reports the best results for each dataset using the VGG-16 model. Adam optimization achieved better performance than SGD. Table 4 summarizes the outcomes of the VGG-16 model across all three datasets. It can be observed from Table 4 that the Clarkson 2015 dataset gave the best validation accuracy of 99.72%, while the ND Iris3D_2020 dataset gave the lowest ACER of 0.1%, with the lowest testing time of 417 s. Figure 7 shows the training and validation analysis over five epochs of the pre-trained VGG-16 model. From Figure 7, we can infer that, overall, the best results were observed using Adam optimization, with the Clarkson 2015 dataset for fine-tuning.

InceptionV3
The InceptionV1 architecture is also known as GoogLeNet. This section reports the best results for each dataset using the InceptionV3 model. Adam optimization achieved better performance than SGD. Table 5 summarizes the outcomes of the InceptionV3 model across all datasets. It can be observed from Table 5 that the Clarkson 2015 dataset gave the best validation accuracy of 99.44% and the lowest ACER of 1.4%. The IIITD_contact dataset yielded the lowest testing time of 194 s. Figure 8 shows the training and validation analyses over five epochs of the pre-trained InceptionV3 model. From Figure 8, we can infer that, overall, the best results were observed using Adam optimization, with the Clarkson 2015 dataset for fine-tuning.

ResNet50
This section reports the best results for each dataset using the ResNet50 model. Table 6 shows an overview of the results for the ResNet50 model across all datasets. It can be observed from Table 6 that Clarkson 2015 gave the best validation accuracy of 99.72%, while the ND Iris3D_2020 dataset gave the lowest ACER of 0.1%, with the lowest testing time of 121 s. Figure 9 shows the training and validation analyses over five epochs of the pre-trained ResNet50 model. From Figure 9, we can infer that, overall, the best results were observed using Adam optimization, with the Clarkson 2015 dataset for fine-tuning.

DenseNet121
DenseNet121's input is an RGB image with a predefined size of 224 × 224. The best results for each dataset using the DenseNet121 model are reported in this section. Table 7 shows an overview of the results for the DenseNet121 model across all datasets. It can be observed from Table 7 that Clarkson 2015 gave the best validation accuracy of 99.72%, while the ND Iris3D_2020 dataset gave the lowest ACER of 0.1%, with the lowest testing time of 121 s. Figure 10 shows the training and validation analysis over five epochs of the pre-trained DenseNet121 model. From Figure 10, one can infer that, overall, the best results were observed using Adam optimization, with the Clarkson 2015 dataset for fine-tuning.

EfficientNetB7
EfficientNetB7, one of the most advanced models, introduced a scaling strategy that uses a compound coefficient to uniformly scale a network's depth, width, and resolution. The best results for each dataset using the EfficientNetB7 model are reported in this section. Table 8 shows an overview of the results for the EfficientNetB7 model across all datasets. It can be observed from Table 8 that ND Iris3D_2020 gave the best validation accuracy of 99.97% and the lowest ACER of 0%, with the lowest testing time of 319 s. Figure 11 shows the training and validation analysis over five epochs of the pre-trained EfficientNetB7 model. From Figure 11, we can infer that, overall, the best results were observed using Adam optimization, with the ND Iris3D_2020 dataset for fine-tuning.
The confusion matrix shows how many images the model classified correctly and incorrectly. A confusion matrix was created for all datasets and models. The confusion matrices for all five models are given in Table 9. Although multiple models performed well during validation, EfficientNetB7 showed the lowest false positive and false negative counts, implying that the EfficientNetB7 model made the fewest errors when predicting whether the image was real or not. Table 10 shows that EfficientNetB7 is an excellent choice, offering the highest accuracy. EfficientNetB7 also showed promising results in terms of ACER. For faster execution, DenseNet121 can be used with reasonable accuracy. If processing time is not an issue, then EfficientNetB7 should be utilized for the best level of accuracy.
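To make the construction of such a confusion matrix concrete, the sketch below tallies (true label, predicted label) pairs for a two-class live/fake problem. The helper `confusion_counts`, the label strings, and the six example predictions are hypothetical and serve only to show the bookkeeping; they are not taken from Table 9.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, labels=("live", "fake")):
    """Count (true label, predicted label) pairs for every label combination."""
    counts = Counter(zip(y_true, y_pred))
    # Counter returns 0 for pairs that never occur, so every cell is filled.
    return {(t, p): counts[(t, p)] for t in labels for p in labels}

# Hypothetical ground truth and model predictions, for illustration only.
y_true = ["live", "live", "fake", "fake", "fake", "live"]
y_pred = ["live", "fake", "fake", "fake", "live", "live"]

cm = confusion_counts(y_true, y_pred)
for (true_label, predicted_label), n in cm.items():
    print(f"true={true_label:4s} predicted={predicted_label:4s} count={n}")
```

The off-diagonal cells, where the true and predicted labels differ, are the false positives and false negatives referred to above; a model with fewer entries in those cells makes fewer errors.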

Discussions
According to our extensive literature review, VGG-16, Inceptionv3, Resnet50, Densenet121, and EfficientNetB7 are the transfer learning models most frequently used for detecting iris liveness. These models have typically been validated using one or two iris datasets. To date, no one has carried out a comparative analysis among these models based on different state-of-the-art iris biometric databases. To compensate for the limited size of the training data set, a transfer learning strategy was applied, and ImageNet pre-trained weights were utilized, which helped to speed up the process. Despite the small size of the data set, transfer learning helped the models to avoid overfitting.
The loss and the accuracy values during the validation and training procedures for each fine-tuned model are listed in Tables 4-8 and are presented in Figures 7-11. When comparing accuracy and ACER, it can be observed that the EfficientNetB7 model gives the maximum accuracy and minimum ACER values, followed by the VGG16 model. Both can attain a validation accuracy of 99 percent or more within only a few epochs. This suggests that these models are capable of rapidly learning the differences between live and fake iris images. When the loss and accuracy for the validation set are considered, it is clear that EfficientNetB7 and VGG16 have the highest training accuracy, while ResNet50 has the lowest training loss. VGG16, EfficientNetB7, and ResNet50 have the lowest training loss in the training set [4]. From these data, it can be concluded that the EfficientNetB7 model outperforms the other four models in terms of training and validation.
ACER gives the average classification error rate. Table 10 summarizes the ACER for all five models. Though numerous models did well during validation, EfficientNetB7 had the lowest ACER, meaning that the EfficientNetB7 model produced the fewest faults when identifying whether the image was real or fake. The accuracy, precision, recall, and F1 score of these pre-trained models were compared in this study, as shown in Tables 4-8. EfficientNetB7, with a validation accuracy of 99.97%, was the most accurate, closely followed by VGG16, with a validation accuracy of 99.75%. Tables 4-8 compare each model's training and testing computational times. The VGG16 model took the longest time to train (2983 s); DenseNet121, on the other hand, required 300 s for the learning procedure and was the fastest during the testing step (87 s).
The initial experimental results demonstrate that transfer learning models have a great deal of potential for iris liveness detection. Table 10 shows that EfficientNetB7 is an excellent choice, with the highest accuracy. EfficientNetB7 also showed promising results in terms of ACER. For faster execution, DenseNet121 can be used with reasonable accuracy. Statistical analysis was performed using a Wilcoxon signed-rank test to compare the performances of the two top models (Wilcoxon p = 0.059). The analysis suggests that, if processing time is not an issue, EfficientNetB7 can be utilized for the best level of accuracy.
Based on the results mentioned earlier (99.97% accuracy, 100% precision, 100% recall, and 100% F1 score), we recommend EfficientNetB7 for live and artificial iris image classification. We further compared our fine-tuned EfficientNetB7 against recently published classification models for iris images. As demonstrated in Table 11, our analysis achieved the highest binary classification accuracy among the compared works on iris liveness detection. The related studies were selected for comparison based on the models used for ILD. Only a few studies have worked on the same datasets as those used in our analysis. Arora et al. [23] employed VGGNet, a pre-trained network, and the IIITD dataset; however, the accuracy attained was lower, possibly because of the smaller number of training images. Umer et al. [27] achieved the second-highest accuracy with their suggested network, VGG16.

Conclusions
Deep learning models can help to identify iris liveness with minimal preprocessing of iris images. Several two-class datasets, containing genuine and fake iris images from standard benchmark datasets, were employed in this investigation. The transfer learning technique was used to evaluate several state-of-the-art pre-trained neural networks, including VGG-16, Inceptionv3, Resnet50, Densenet121, and EfficientNetB7. EfficientNetB7, with a classification accuracy of 99.97 percent, was found to be the best model, followed by the VGG16 model, which achieved a 99.75 percent classification accuracy. The results of this work show that recognition models created using transfer learning and CNNs can perform well in binary classification tasks using iris images. Both natural and synthetic iris images have similar characteristics that humans can decipher. However, the CNN model can quickly learn the salient features and adequately categorize the images after only a few training epochs. The high accuracy obtained shows that the deep learning models were able to detect distinguishing characteristics in the counterfeit iris images, allowing the deep networks to accurately differentiate the images. These trained models can improve the confidentiality and security of biometric systems and the accuracy and efficiency of biometric authentication. Our analysis can be extended to other biometric traits, such as fingerprints and facial recognition, and could present a promising framework for robust biometric identification.
Funding: This research is funded by the Symbiosis Institute of Technology, Symbiosis International (Deemed University) and Symbiosis Centre for Applied Artificial Intelligence Pune, India.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data that support the findings of this study are available on request from the respective corresponding author, [Mention in references]. The data are not publicly available due to the privacy concerns of the research participants.