Deep Learning-Based Iris Segmentation for Iris Recognition in Visible Light Environment

Abstract: Existing iris recognition systems are heavily dependent on specific conditions, such as the distance of image acquisition and the stop-and-stare environment, which require significant user cooperation. In environments where user cooperation is not guaranteed, prevailing segmentation schemes of the iris region are confronted with many problems, such as heavy occlusion of eyelashes, invalid off-axis rotations, motion blurs

Most existing iris recognition algorithms are designed for highly controlled cooperative environments, which is the cause of their failure in non-cooperative environments, i.e., those that include noise, off-angles, motion blurs, glasses, hair, specular reflection (SR), eyelid and eyelash occlusion, and partially open eyes. Furthermore, the iris is usually assumed to be a circular object, and common methods segment it as a circle. However, in extreme cases of side view and partially open eyes, the iris boundary deviates from being circular and may include skin, eyelid, and eyelash areas. Iris recognition systems are based on the specific texture of the iris area, which is used as the base entity for recognition and authentication. Therefore, accurate segmentation of the iris boundary is important even in such difficult environments.
Algorithms for iris segmentation should be designed to require less user cooperation while improving the overall iris recognition performance. Many studies in the past two decades have attempted to reduce the error caused by the lack of user cooperation [11][12][13], but the detection of the true iris boundary is still a challenge. To address this issue, we propose a two-stage iris segmentation method based on convolutional neural networks (CNN), which is capable of robustly finding the true iris boundary in the above-mentioned difficult cases with limited user cooperation. Our proposed iris segmentation scheme can be used with inferior quality noisy images even in a visible light environment.
Symmetry 2017, 9, 263

Related Works
The existing schemes for iris segmentation can be broadly divided into four types based on implementation. The first and most common type consists of boundary-based methods. These methods first locate the pupil as the inner boundary of the iris and then find the other parameters, such as the eyelid and limbic areas, to separate them from the iris. This type includes the Hough transform (HT) and Daugman's integro-differential operator. HT finds circular shapes by edge-map voting within a given range of radii, which is known as the Wildes approach [14]. Other HT-based detection methods have also been used [15,16]. Daugman's integro-differential operator finds the boundary using an integral derivative, and an advanced version was developed in [17]. An effective technique to reduce the error rate in a non-cooperative environment was proposed by Jeong et al. [18]. They used two circular edge detectors in combination with AdaBoost for pupil and iris boundary detection, and their method approximated the real boundary through eyelash and eyelid detection. Other methods reduce noise prior to the detection of the iris boundary to increase the segmentation accuracy [19]. All methods of the first type require eye images of good quality and an ideal imaging environment; therefore, they are less accurate and produce higher error rates in non-ideal situations.
The second type is composed of pixel-based segmentation methods. These methods identify the iris boundary using specific color texture and illumination gradient information to discriminate between an iris pixel and another pixel in the neighborhood. Based on the discriminating features, a classifier is created for iris and non-iris pixel classification. A novel method for iris and pupil segmentation was proposed by Khan et al. [20]. They used 2-D profile lines between the iris and sclera boundary and calculated the gradient pixel by pixel, where the maximum change represents the iris boundary. Parikh et al. [21] first approximated the iris boundary by color-based clustering. Then, for off-angle eye images, two circular boundaries of the iris were detected, and the intersection area of these two boundaries was defined as the iris boundary. However, these methods are affected by eyelashes, hair, or dark skin. The true boundary of the iris is also not identified if it includes the area of eyelids or skin, which reduces the overall iris authentication performance.
The third type is composed of active contour and circle fitting-based methods [22,23]. A similar approach is used in the local Chan-Vese algorithm, where a mask is created according to the size of the iris, and an iterative process then determines the true iris boundary with the help of a localized region-based formulation [24]. However, this approach shares the drawback of other active contour-based models: the iteration is often halted by the iris texture, so the iris pattern is mistaken for the boundary, which results in erroneous segmentation. On the other hand, active contour-based methods are more reliable in detecting the pupillary boundary because of the significant difference in visibility.
To eliminate the drawbacks of the current segmentation methods and reduce the complexity of intensive pre- and post-processing, a fourth type of segmentation methods evolved, which consists of learning-based methods [25]. Among all learning-based methods, deep learning via deep CNN is the most popular in current computer vision applications because of its accuracy and performance. It has been applied to damaged road marking detection [26], human gender recognition from human body images [27], and human detection in night environments using a visible light camera [28]. For segmentation, CNN can provide a powerful platform that simplifies an intensive task while retaining accuracy and reliability, as in CNN-based brain tumor segmentation [29]. Iris-related applications are sensitive because of the very dense and complex iris texture. Therefore, few researchers have focused on CNN for iris segmentation. Liu et al. [30] used DeepIris to handle the intra-class variation of heterogeneous iris images, where CNNs learn relational features to measure the similarity between two candidate irises for verification purposes. Gangwar et al. [31] used DeepIrisNet for iris visual representation and cross-sensor iris recognition. However, both studies focus on iris recognition instead of iris segmentation.
Considering iris segmentation using CNNs, Liu et al. [32] identified accurate iris boundaries in non-cooperative environments using fully convolutional networks (FCN). In their study, hierarchical CNNs (HCNNs) and multi-scale FCNs (MFCNs) were used to locate the iris boundary automatically. However, because the full input image is fed into the CNN without defining a region of interest (ROI), the eyelids, hair, eyebrows, and glasses frames, which look similar to the iris, can be classified as iris points by the CNN model. This scheme performs better than previous methods, but the error of iris segmentation can potentially be further reduced.
To address these issues with the existing approaches, we propose a two-stage iris segmentation method based on CNN that robustly finds the true boundary in less-constrained environments. This study is novel in the following three ways compared to previous works.

- The proposed method accurately identifies the true boundary even in difficult scenarios, such as glasses, off-angle eyes, rotated eyes, side view, and partially opened eyes.

- The first stage includes bottom-hat filtering, noise removal, Canny edge detector, contrast enhancement, and modified HT to segment the approximate iris boundary. In the second stage, deep CNN with an image input of 21 × 21 pixels is used to fit the true iris boundary. By applying the second-stage segmentation only within the ROI defined by the approximate iris boundary detected in the first stage, we can reduce the processing time and error of iris segmentation.

- To reduce the effect of bright SR on iris segmentation performance, the SR regions within the image input to CNN are normalized by the average RGB value of the iris region. Furthermore, our trained CNN models have been made publicly available through [33] to allow fair comparisons by other researchers.
Table 1 shows the comparison between existing methods and the proposed method.
Table 1. Comparisons between the proposed and previous methods on iris segmentation.

| Type | Methods | Strength | Weakness |
|---|---|---|---|
| Boundary-based methods | Integro-differential operator [17]. | Fast processing speed using a simple technique. | These methods are less accurate for iris segmentation in non-ideal situations or visible light. |
| Boundary-based methods | Iris localization by circular HT. Upper and lower eyelids detection by parabolic curves [14]. | As the first practical scheme, it can produce a good approximation of the iris region with image smoothness. | These methods are less accurate for iris segmentation in non-ideal situations or visible light. |
| Boundary-based methods | Using two-circular edge detector along with AdaBoost eye detector. In addition, eyelash and eyelid detection is performed [18]. | This method provides satisfactory results in non-ideal situations, and a closed eye can be detected as well. | This method fails in iris segmentation when the RGB values of the eyelids are similar to those of the iris, or when pupil/eyelid detection errors occur. |
| Pixel-based methods | Drawing 2D profile line on the iris-sclera area and calculating the gradient to find the boundary points [20]. | A new way to estimate the iris boundary from the gradient on both sides. | The calculated gradient is affected by the eyelashes, and the true boundary of the iris is not found. |
| Pixel-based methods | Initially approximating the iris boundary by color clustering, and then fitting the boundary by two side curves [21]. | To reduce the error of iris segmentation, the upper and lower eyelids and the eyelashes are removed by average intensity analysis. | Some empirical threshold is set for eyelid and eyelash detection, but the limitations in detecting the true boundary of the iris still exist. |
| Active contour and circle fitting-based methods | Iterative method by starting with a mask, checking the gradient, and reaching the boundary [24]. | For non-ideal cases, it can segment the true iris boundary more accurately than boundary- and pixel-based methods. | These methods are better suited to detecting the pupil boundary because they can be stopped by the iris texture, which is mistaken for the boundary. |
| CNN-based methods | Using HCNNs and MFCNs based deep learning method [32]. | This approach has better accuracy relative to existing segmentation methods for non-ideal conditions. | As the full image is fed into the network, the eyelids, hair, eyebrows, and glasses frames, which look similar to the iris, can be considered as iris points by the CNN model. |
| CNN-based methods | Two-stage iris segmentation method (Proposed method). | This approach simply finds the rough boundary of the iris and applies CNN just within the ROI defined based on the rough boundary. | A large amount of data is needed for CNN training. |

Overview of the Proposed System
Figure 1 shows an overall flowchart of the proposed two-stage iris segmentation method. In the first stage, the rough iris boundary is obtained from the input image to define the ROI for the next stage. The resultant image from Stage 1 includes parts of the upper and lower eyelids and other areas, such as skin, eyelashes, and sclera. Consequently, the true iris boundary still needs to be identified. In the second stage, CNN is applied within this ROI (defined by Stage 1) and provides the real iris boundary with the help of learned features. Finally, pupil approximation is performed using the standard information on the ratio between pupil contraction and dilation, and the actual iris area is obtained.

Stage 1. Detection of Rough Iris Boundary by Modified Circular HT
An approximate localization of the iris boundary is the prerequisite of this study, and it is obtained by a modified circular HT-based method. As shown in Figure 1, Stage 1 consists of two phases, namely, pre-processing and circular HT-based detection.

Phase 1. Pre-Processing
In Phase 1, as shown in Figures 2 and 3c, the RGB input image is converted into grayscale for further processing, and a morphological operation is applied through bottom-hat filtering with a symmetrical disk-shaped structuring element of size 5 for contrast enhancement. Finally, the gray image and the resultant image of bottom-hat filtering are added to obtain an enhanced image as shown in Figure 3d.
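The pre-processing described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the luma weights used for the grayscale conversion, the interpretation of "size 5" as a disk radius, and the border handling (out-of-image pixels are ignored) are all assumptions made for illustration, and the function names are hypothetical.

```python
def disk(radius):
    """Offsets (dy, dx) of a disk-shaped structuring element (assumed radius)."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]


def to_gray(rgb):
    """Convert an RGB image (rows of (r, g, b) tuples) to grayscale.

    The BT.601 luma weights are an assumption; the text only says 'grayscale'.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]


def _morph(img, offsets, pick):
    """Take the min (erosion) or max (dilation) over the structuring element."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[y + dy][x + dx] for dy, dx in offsets
                    if 0 <= y + dy < h and 0 <= x + dx < w]
            row.append(pick(vals))
        out.append(row)
    return out


def bottom_hat(img, radius=5):
    """Bottom-hat = closing(img) - img, which highlights small dark details."""
    se = disk(radius)
    closed = _morph(_morph(img, se, max), se, min)  # dilation, then erosion
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(closed, img)]


def enhance(rgb, radius=5):
    """Add the bottom-hat result to the gray image, clipped to 8 bits."""
    gray = to_gray(rgb)
    bh = bottom_hat(gray, radius)
    return [[min(255, g + b) for g, b in zip(grow, brow)]
            for grow, brow in zip(gray, bh)]
```

Because the bottom-hat image is the closing minus the original, adding it back to the gray image fills in small dark details (for example thin eyelashes) with the brightness of their surroundings.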

Phase 2. ROI Detection by Rough Iris Boundary
The overall process of Phase 2 is presented by the flowchart in Figure 4. In Phase 2, the filtered image from Phase 1 is redefined as an image of 280 × 220 pixels to reduce the effect of eyebrows in detecting the iris boundary. Then, a 17 × 17 median filter is applied to remove salt-and-pepper noise and reduce the skin tone and texture illumination as shown in Figure 5c. Next, a 13 × 13 symmetrical Gaussian smoothing filter with σ of 2 is applied to the filtered image to increase pixel uniformity as shown in Figure 5d. Then, a Canny edge detector with the same σ value is used to detect the edges of the iris boundary as shown in Figure 5e. However, the edges are not very clear, so gamma adjustment with γ = 1.90 is applied to enhance the contrast of the image as shown in Figure 5f. From the gamma-enhanced image, the binarized edge image is obtained with eight-neighbor connectivity as shown in Figure 5g. In this edge image, there are other circular edges in addition to the iris boundary edges, and circular HT can find all possible circles in the image. However, the incorrect circle-type edges shown in Figure 5h can be removed by filtering out the edges whose radius is outside the range between the minimum and maximum human iris radius. Then, the most-connected edges are selected as iris edges, and the rough iris boundary is detected as shown in Figure 5i. Considering the possibility of detection error of the iris boundary, the ROI is defined to be slightly larger than the detected boundary as shown in Figure 7a.
For fair comparisons, all the optimal parameters for the operations in ROI detection, including the median filter, Gaussian smoothing filter, Canny edge detector, gamma adjustment, and binarization, were empirically selected using only training data, without testing data.
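The radius-constrained circular HT and the enlarged ROI can be sketched as follows. This is a simplified illustration under stated assumptions, not the modified HT of this study: it omits the filtering steps and the selection of the most-connected edges, and the function names, the `margin` parameter, and the voting scheme (each edge point votes for every candidate center at its rounded distance) are hypothetical choices.

```python
import math
from collections import Counter


def hough_circle(edge_points, width, height, r_min, r_max):
    """Return the (cx, cy, r) that receives the most votes.

    Each edge point votes for every candidate center, at the radius equal
    to its rounded distance from that center. Radii outside [r_min, r_max]
    are ignored, mimicking the rejection of circles that are too small or
    too large to be a human iris.
    """
    votes = Counter()
    for x, y in edge_points:
        for cy in range(height):
            for cx in range(width):
                r = round(math.hypot(x - cx, y - cy))
                if r_min <= r <= r_max:
                    votes[(cx, cy, r)] += 1
    return max(votes, key=votes.get)


def expand_roi(cx, cy, r, margin, width, height):
    """Bounding box slightly larger than the circle, clipped to the image."""
    left, top = max(0, cx - r - margin), max(0, cy - r - margin)
    right = min(width - 1, cx + r + margin)
    bottom = min(height - 1, cy + r + margin)
    return left, top, right, bottom
```

The exhaustive voting loop is chosen for readability; a practical implementation would restrict candidate centers using the edge gradient direction.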

Iris Rough Boundary Analysis
The rough iris boundary detected in Stage 1 is not the real iris boundary, but an approximation for the next stage. In ideal cases in which user cooperation is available, the output of the HT can show the accurate iris boundary. However, for non-ideal cases, such as off-angle eyes, rotated eyes, eyelash occlusions, and semi-open eyes, the HT can sometimes produce inaccurate iris boundaries as shown in Figure 6. Moreover, the detected iris circle includes the eyelid, eyelash, pupil, and SR, which should be discriminated from the true iris area for iris recognition. Therefore, we propose a CNN-based segmentation method of the iris region based on the ROI defined by the rough iris boundary in Stage 1.

Extracting the Mask for CNN Input
To detect the iris region accurately, a square mask of 21 × 21 pixels is extracted from the ROI of Figure 7a and is used as input to CNN. The mask is extracted within the ROI to reduce the number of objects to be classified. Specifically, in many cases, the iris color can be similar to that of the eyebrows and eyelids. Furthermore, in non-ideal cases, the skin can have a similar color to the iris. Therefore, by extracting the mask only within the ROI, we can reduce the iris segmentation error by CNN. This mask is scanned in both horizontal and vertical directions as shown in Figure 7b. Based on the output of CNN, the center position of the mask is determined as an iris or non-iris pixel. Figure 7c shows examples of the collected masks of 21 × 21 pixels for CNN training or testing. As shown in Figure 7c, in a mask from the iris region most pixels come from the iris texture, whereas in a mask from the non-iris region most pixels come from the skin, eyelid, eyelash, or sclera.
However, if the mask is extracted from a bright SR region, the characteristics of the pixels of the mask can change as shown in Figure 8a, which can increase the error of iris segmentation by CNN. To solve this problem, if bright pixels whose gray level is higher than 245 exist inside the mask, they are replaced by the average RGB value of the iris area (the red dotted box in Figure 8a) as shown in Figure 8b. The threshold of 245 was experimentally found with training data.
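The mask scanning and SR normalization described above might be sketched as follows. This is an illustrative sketch only: the gray-level conversion, the skipping of windows that do not fit inside the image, and the names `extract_patches` and `normalize_sr` are assumptions, and the average iris color is taken as a given input rather than computed from the red dotted box of Figure 8a.

```python
PATCH = 21           # mask size stated in the text
HALF = PATCH // 2
SR_THRESHOLD = 245   # gray-level threshold stated in the text


def gray_level(pixel):
    """Luma of an (r, g, b) pixel; the exact gray conversion is an assumption."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def normalize_sr(patch, iris_mean_rgb, threshold=SR_THRESHOLD):
    """Replace bright specular-reflection pixels with the mean iris color."""
    return [[iris_mean_rgb if gray_level(p) > threshold else p for p in row]
            for row in patch]


def extract_patches(image, roi):
    """Yield ((x, y), patch) for each ROI pixel whose 21 x 21 window fits.

    `image` is a list of rows of (r, g, b) tuples and `roi` is an inclusive
    (left, top, right, bottom) box. Skipping windows that cross the image
    border is an assumption, not something stated in the text.
    """
    left, top, right, bottom = roi
    h, w = len(image), len(image[0])
    for y in range(max(top, HALF), min(bottom, h - 1 - HALF) + 1):
        for x in range(max(left, HALF), min(right, w - 1 - HALF) + 1):
            patch = [row[x - HALF:x + HALF + 1]
                     for row in image[y - HALF:y + HALF + 1]]
            yield (x, y), patch
```

Each normalized patch would then be passed to the CNN, and the predicted label assigned to the center pixel `(x, y)`.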

Iris Segmentation by CNN
The mask of 21 × 21 pixels is used as input to CNN, and the output of CNN is either iris or non-iris area. In this study, a pre-trained VGG-face model [34] is fine-tuned with our training images. The VGG-face model was pre-trained with about 2.6 million face images of 2,622 different people. The configuration of VGG-face is detailed in Table 2 and Figure 9. To obtain an accurate boundary and distinguish it from other objects, the ROI is selected to be slightly larger than the rough iris boundary detected by HT.
The VGG-face model consists of 13 convolutional layers and 5 pooling layers in combination with 3 fully connected layers. The filter size, rectified linear unit (Relu), padding, pooling, and stride are explained in Table 2 and Figure 9. A total of 64 filters of size 3 × 3 are adopted in the 1st convolutional layer. Therefore, the size of the feature map is 224 × 224 × 64 in the 1st convolutional layer. Here, 224 and 224 denote the height and width of the feature map, respectively. They are calculated as follows: output height (or width) = (input height (or width) − filter height (or width) + 2 × the number of padding)/the number of stride + 1 [35]. For example, in the image input layer and Conv-1 in Table 2, the input height, filter height, number of padding, and number of strides are 224, 3, 1, and 1, respectively. As a result, the height of the output feature map becomes 224 (= (224 − 3 + 2 × 1)/1 + 1).
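The output-size formula above can be checked with a short helper. The function name is hypothetical, and integer division is used on the assumption that the layer sizes divide evenly, as they do for the values quoted in the text.

```python
def conv_output_size(input_size, filter_size, padding, stride):
    """Output height (or width) of a convolution or pooling layer."""
    return (input_size - filter_size + 2 * padding) // stride + 1


# Conv-1 of Table 2: input 224, filter 3, padding 1, stride 1
print(conv_output_size(224, 3, 1, 1))  # 224
```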
There are three common activation functions: sigmoid, tanh, and Relu [35]. The sigmoid activation function forces the input value into the range between 0 and 1 as shown in Equation (1), which means that the output approaches 0 for large negative inputs and 1 for large positive inputs.

$$ \sigma(x) = \frac{1}{1 + e^{-x}} \qquad (1) $$

The tanh function is slightly different from the sigmoid activation function because it maps the input value into the range between −1 and 1 as shown in Equation (2).

$$ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (2) $$
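As a small numerical illustration of the two output ranges, the following sketch evaluates both functions in their standard textbook form. The branch on the sign of `x` and the expression of tanh through the sigmoid are implementation choices made here, not part of the original method.

```python
import math


def sigmoid(x):
    """Logistic sigmoid with outputs in (0, 1), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh_from_sigmoid(x):
    """tanh expressed through the sigmoid, with outputs in (-1, 1)."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-5.0, 0.0, 5.0):
    print(x, sigmoid(x), tanh_from_sigmoid(x))
```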
Relu is faster than these two nonlinear activation functions, and it helps to alleviate the vanishing gradient problem in back propagation during training [36,37]. It was also shown in [38] that training a four-layered CNN on the CIFAR-10 dataset with Relu is six times faster than with the tanh function on the same dataset and network. Therefore, we use Relu for faster and simpler training and to avoid the gradient issue in CNN. Relu was initially used in Boltzmann machines, and it is formulated as follows.

$$ y_i = \max(0, x_i) \qquad (3) $$

where y_i and x_i are the corresponding outputs and inputs of the unit, respectively. A Relu layer follows each convolutional layer, and it maintains the size of each feature map. Max-pooling layers provide a kind of subsampling. Consider pool-1, which performs max pooling after convolutional layer-2 and Relu-2: the input feature map size is 224 × 224 × 64, the filter size is 2 × 2, and the stride is 2 × 2. Here, a stride of 2 × 2 means that the 2 × 2 max-pooling filter moves by two pixels in both the horizontal and vertical directions. Because the filter positions do not overlap, the feature map size is reduced to 1/4 (1/2 horizontally and 1/2 vertically). Consequently, the feature map size after pool-1 becomes 112 × 112 × 64. A pooling layer is used after Relu-2, Relu-4, Relu-7, Relu-10, and Relu-13 as shown in Table 2. In all cases, a filter of 2 × 2 and a stride of 2 × 2 are used, and the feature map size diminishes to 1/4 (1/2 horizontally and 1/2 vertically).
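The Relu and 2 × 2 max-pooling operations described above can be illustrated on a single-channel feature map. This is a toy sketch on nested Python lists with hypothetical function names; a real network applies the same operations per channel on tensors.

```python
def relu(x):
    """Relu: keep positive inputs and set negative inputs to zero."""
    return max(0.0, x)


def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2 on a 2-D feature map (list of rows)."""
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[y][x], fmap[y][x + 1],
                 fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, w - 1, 2)]
            for y in range(0, h - 1, 2)]


fmap = [[-1, 2, 0, -3],
        [4, -5, 1, 1],
        [0, 0, -2, -1],
        [-7, 3, -4, -6]]
activated = [[relu(v) for v in row] for row in fmap]  # same size as fmap
pooled = max_pool_2x2(activated)                      # half the height and width
```

Relu leaves the 4 × 4 size unchanged, while pooling halves each dimension, which is the same 1/4 reduction that takes 224 × 224 to 112 × 112 in pool-1.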
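$$ \sigma(q)_j = \frac{e^{q_j}}{\sum_{n=1}^{K} e^{q_n}} \qquad (4) $$

In Equation (4), given that the array of the output neurons is q and K is the number of classes, the probability of the jth class is obtained through the softmax function by dividing the exponentiated value of the jth component by the summation of the exponentiated values of all the components.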
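A minimal sketch of this normalization for the two-class case is given below, assuming the standard softmax form. Subtracting the maximum before exponentiation is a common numerical-stability choice added here and does not change the result.

```python
import math


def softmax(q):
    """Convert output-neuron values q into class probabilities."""
    m = max(q)                            # subtract the max for stability
    exps = [math.exp(v - m) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical output neurons: index 0 = iris, index 1 = non-iris
probabilities = softmax([2.0, -1.0])
```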
CNN usually has an over-fitting issue, which can reduce the testing accuracy. To address this issue, the dropout method and data augmentation have been considered. Dropout is important during training to prevent multiple neurons from repeatedly representing the same feature, which can cause overfitting and waste network capacity and computational resources. The solution is to drop out neurons at random according to the dropout ratio, so that the dropped-out neurons do not contribute to forward or back propagation. Consequently, dropout reduces co-adaptation by decreasing the dependencies among neurons, and it forces the network to learn strong features with different neurons [38,40]. In this study, we use a dropout ratio of 0.5. The dropout layer was used twice, that is, after the 1st fully connected layer (FCL) with Relu-6 and after the 2nd FCL with Relu-7, as shown in Table 2.
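The effect of a dropout ratio of 0.5 can be illustrated with the following sketch. It assumes the common "inverted dropout" convention, in which surviving activations are rescaled by 1/(1 − ratio) during training so that nothing changes at test time; whether the framework used in this study applies exactly this scaling is an assumption, and the function name is hypothetical.

```python
import random


def dropout(activations, ratio=0.5, training=True, rng=random):
    """Randomly zero a fraction `ratio` of the activations during training."""
    if not training:
        return list(activations)   # dropout is disabled at test time
    keep = 1.0 - ratio
    return [a / keep if rng.random() >= ratio else 0.0 for a in activations]


rng = random.Random(0)             # fixed seed so the sketch is repeatable
train_out = dropout([1.0] * 1000, ratio=0.5, training=True, rng=rng)
test_out = dropout([1.0, 2.0], training=False)
```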
Compared to the original VGG-face model, three parameters, namely the initial learning rate, the momentum value, and the mini-batch size, were modified, and their optimal values were experimentally found with training data. Detailed explanations of these values are included in Section 4.2. In addition, the number of CNN outputs was changed to 2 as shown in Figure 9, because our task has two classes: iris and non-iris pixels.

Pupil Approximation by the Information of the Ratio of Pupil Contraction and Dilation
When applying CNN, false positive errors (a non-iris pixel incorrectly classified as an iris pixel) exist in the pupil area. However, the noisy iris challenge evaluation part II (NICE-II) database used in our experiment includes inferior quality images, in which it is very difficult to segment the pupillary boundary accurately. Therefore, we used the anthropometric information provided by Wyatt [41].

$$ d_p = \frac{P_d}{I_d} \qquad (5) $$

where P_d represents the pupil diameter and I_d is the iris diameter. This anthropometric information [41] indicates that the pupil dilation ratio (d_p) in Equation (5) varies from 12% to 60%, and we use the minimum value in our experiment. Specifically, a pixel classified as iris by CNN that belongs to the region whose d_p is less than 12% is determined to be a non-iris pixel (pupil).
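The pupil approximation might be sketched as follows. The sketch assumes that the pupil is concentric with the iris circle and that the center and radius are taken from the Stage 1 circle; neither assumption is stated in the text above, and the function name is hypothetical.

```python
import math

MIN_DILATION_RATIO = 0.12  # minimum pupil/iris diameter ratio from the text


def remove_pupil(iris_mask, cx, cy, iris_radius, ratio=MIN_DILATION_RATIO):
    """Relabel CNN 'iris' pixels inside the approximated pupil as non-iris.

    `iris_mask` is a list of rows of booleans (True = iris). The pupil is
    assumed to be a circle centered at (cx, cy) whose radius is `ratio`
    times the iris radius.
    """
    pupil_radius = ratio * iris_radius
    return [[bool(v) and math.hypot(x - cx, y - cy) >= pupil_radius
             for x, v in enumerate(row)]
            for y, row in enumerate(iris_mask)]
```

For an iris radius of 50 pixels, for example, pixels closer than 6 pixels to the center are relabeled as pupil.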

Experimental Data and Environment
In this study, we used the NICE-II training database, which was used for the NICE-II competition [42]. The database includes extremely noisy iris data from UBIRIS.v2. It contains 1000 severely noisy images of 171 classes. The size of each image is 400 × 300 pixels. The iris images were acquired from people walking 4-8 m away from a high-resolution visible light camera with visible light illumination [43]. Therefore, this database includes difficulties such as poorly focused, off-angle, rotated, motion-blurred, eyelash-obstructed, eyelid-occluded, glasses-obstructed, irregular-SR, non-uniformly lit, and partially captured iris images, as shown in Figure 10.
In this study, among the total 1000 iris images, 500 are used for training and the other 500 for testing. The average of the two accuracies was measured by two-fold cross validation. CNN training and testing are performed on a system using an Intel® Core™ i7-3770K CPU @ 3.50 GHz (4 cores) with 28 GB of RAM and an NVIDIA GeForce GTX 1070 (1920 CUDA cores) with 8 GB of graphics memory (NVIDIA, Santa Clara, CA, USA) [44]. Training and testing are done with Windows Caffe (version 1) [45].

CNN Training
In our proposed method, we fine-tuned the VGG-face model using 500 iris images for classification into two classes (iris and non-iris). For two-fold cross validation, we used the first 500 images for training and the other 500 images for testing, and then exchanged the two sets. From the first 500 iris images, we obtained 9.6 million images of 21 × 21 pixels from the iris ROI for training, and in the second training for cross validation, we obtained 8.9 million images of 21 × 21 pixels from the iris ROI. During CNN training, stochastic gradient descent (SGD) is used to minimize the difference between the calculated and desired outputs using the gradient [46]. The number of training samples divided by the mini-batch size is defined as the number of iterations. One complete pass of training through all these iterations is defined as 1 epoch, and training is repeated for a predetermined number of epochs. In this study, the CNN was trained for 10 epochs. For the fine-tuning of VGG-face, the best model was found experimentally with an initial learning rate of 0.00005, a momentum of 0.9, and a mini-batch size of 20. Detailed explanations of these parameters can be found in [47].
Figure 11a,b show the curves of average loss and accuracy with the training data for the two folds of the cross validation. The epoch count is shown on the X-axis, the training loss on the left Y-axis, and the training accuracy on the right Y-axis. The loss varies depending on the learning rate and mini-batch size. During training, it is important to reach the minimum training loss (maximum training accuracy); therefore, the learning rate should be chosen carefully. With a higher learning rate the loss decreases rapidly, but it can then fail to reach a minimum. In our proposed method, the best model, in which the training loss curve converges to 0% (training accuracy of 100%), is used for testing, as shown in Figure 11. To allow fair comparisons by other researchers, we have made our trained CNN models publicly available through [33]. Figure 12 shows examples of the trained filters of the 1st convolutional layer of Table 2 for the two folds of the cross validation.
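To make the iteration and epoch bookkeeping concrete, the sketch below computes the number of iterations per epoch and applies one step of the standard SGD-with-momentum update. It is a generic NumPy illustration using the learning rate, momentum, and mini-batch size reported above, not the Caffe solver used in the experiments; the function names are hypothetical, and the patch counts of 9.6 and 8.9 million are the rounded figures from the text.

```python
import numpy as np

def iterations_per_epoch(num_samples, batch_size):
    """Number of mini-batches needed to see every sample once (ceiling)."""
    return -(-num_samples // batch_size)

def sgd_momentum_step(weights, grad, velocity, lr=0.00005, momentum=0.9):
    """One standard momentum update: v <- mu * v - lr * grad, w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return weights + velocity, velocity

# Rounded patch counts from the text with a mini-batch size of 20.
fold1_iters = iterations_per_epoch(9_600_000, 20)
fold2_iters = iterations_per_epoch(8_900_000, 20)
```

Under these rounded counts, one epoch corresponds to 480,000 iterations for the first fold and 445,000 for the second, so 10 epochs amount to several million weight updates per fold.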


Testing of the Proposed CNN-Based Iris Segmentation
The performance of the proposed method for iris segmentation is evaluated with the metrics of the NICE-I competition, so that its accuracy can be compared with that of the teams that participated in the NICE-I competition [48]. As shown in Equation (6), the classification error (E_i) is measured by comparing the resultant image (I_i(m', n')) of our proposed method and the ground truth image (G(m', n')) using the exclusive-OR (XOR) operation.
$$E_i = \frac{1}{m \times n} \sum_{(m', n')} I_i(m', n') \otimes G(m', n') \quad (6)$$
where m and n are the width and height of the image, respectively. To evaluate the proposed method, the average segmentation error (E1) is calculated by averaging the classification error rate (E_i) over all the images, as shown in Equation (7).
$$E1 = \frac{1}{k} \sum_{i} E_i \quad (7)$$
where k represents the total number of testing images. E1 lies in [0, 1], where 0 represents the least error and 1 the largest error.
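The two error measures can be computed directly from binary masks. The following is a minimal NumPy sketch, assuming that the segmentation result and the ground truth are given as equally sized arrays in which non-zero means iris; the function names are illustrative and not from the paper.

```python
import numpy as np

def classification_error(pred_mask, gt_mask):
    """E_i: fraction of pixels where prediction and ground truth disagree."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    # XOR is True exactly where the two masks differ; the mean divides by m * n.
    return float(np.logical_xor(pred, gt).mean())

def average_segmentation_error(pred_masks, gt_masks):
    """E1: mean of E_i over all k test images."""
    errors = [classification_error(p, g) for p, g in zip(pred_masks, gt_masks)]
    return sum(errors) / len(errors)
```

Because XOR counts false positives and false negatives alike, E_i does not distinguish between the two error types; it only measures the share of misclassified pixels in the image.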

Iris Segmentation Results by the Proposed Method
Figure 13 shows examples of good segmentation results by our proposed method. In our experiment, we consider two types of error: false positive and false negative errors. The former denotes that a non-iris pixel is incorrectly classified as iris, whereas the latter denotes that an iris pixel is incorrectly classified as non-iris. In Figure 13, the false positive and false negative errors are shown in green and red, respectively. The true positive case (an iris pixel correctly classified as iris) is shown in black. As shown in Figure 13, our proposed method can correctly segment the iris region irrespective of the various noises in the eye image. Figure 14 shows examples of incorrect segmentation of the iris region by our proposed method. False positive errors occur in the eyelash area, whose pixel values are similar to those of the iris region, whereas false negative errors occur in cases of reflection noise from the surface of glasses or a severely dark iris area.
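The colour coding used in the result figures can be reproduced with a short sketch. It assumes binary masks as input and renders true negatives in white, which is an assumption because the text specifies only the colours for true positives, false positives, and false negatives; the function name `error_map` is hypothetical.

```python
import numpy as np

def error_map(pred_mask, gt_mask):
    """RGB image: TP black, FP green, FN red, TN white (white is assumed)."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    out = np.full(pred.shape + (3,), 255, dtype=np.uint8)  # start all white
    out[pred & gt] = (0, 0, 0)        # true positive: iris labelled iris
    out[pred & ~gt] = (0, 255, 0)     # false positive: non-iris labelled iris
    out[~pred & gt] = (255, 0, 0)     # false negative: iris labelled non-iris
    return out
```

Such a map makes it easy to see where errors concentrate, for example green pixels along the eyelashes or red pixels over reflections.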

Comparison of the Proposed Method with Previous Methods
Table 3 compares the proposed method with previous methods in terms of E1 of Equation (7). Our method yields a lower iris segmentation error than the previous methods.
In addition, we compared the results of a support vector machine (SVM) with those of our CNN. For a fair comparison, we used the same training and testing data (obtained from Stage 1 of Figure 1) of the two-fold cross validation for both SVM and CNN. From the two-fold cross validation, we obtained the average value of E1 of Equation (7). As shown in Table 3, E1 of the proposed method (using CNN for Stage 2 of Figure 1) is lower than that of the SVM-based method (using SVM for Stage 2 of Figure 1), which confirms that our proposed CNN-based method is better than the traditional machine learning method of SVM. The reason is that the 13 convolutional layers (Table 2) of our CNN can extract optimal features through trained filters, unlike the SVM.
The goal of our research is to detect the accurate positions of iris pixels based on pixel-level labels. For this purpose, fast R-CNN [49] or faster R-CNN [50] could be considered, but these methods detect only box-shaped regions (based on box-level labels) [49,50]. Therefore, to detect the accurate positions of all iris pixels (at pixel level rather than box level), a different type of CNN such as a semantic segmentation network (SSN) [51,52] should be considered. With an SSN, the positions of all iris pixels can be obtained. In detail, for the SSN, the whole image (instead of the 21 × 21 pixel masks extracted from Stage 1 of Figure 1) was used as input, without Stage 1 of Figure 1. For a fair comparison, we used for the SSN the same training and testing images from which the 21 × 21 pixel masks were extracted for our method. From the two-fold cross validation, we obtained the average value of E1 of Equation (7). As shown in Table 3, E1 of the proposed method (using CNN for Stage 2 of Figure 1) is lower than that of the SSN-based method (using SSN in place of Stages 1 and 2 of Figure 1), which confirms that our proposed method is better than the SSN-based method. The reason is that applying the SSN to the whole image, instead of the ROI detected by Stage 1 of Figure 1, increases the complexity of classifying iris and non-iris pixels.

Table 3 (rows recoverable from the extracted text; the remaining rows are missing).

| Method | E1 |
| --- | --- |
| ... [16] | 0.0121 |
| SSN-based method [51,52] | 0.02816 |
| Proposed method (using SVM for Stage 2 of Figure 1) | 0.03852 |
| Proposed method (using CNN for Stage 2 of Figure 1) | 0.0082 |

Iris Segmentation Error with Another Open Database
We performed additional experiments with another open database, the mobile iris challenge evaluation (MICHE) data [63,64]. Various other databases exist, such as the CASIA datasets and the iris challenge evaluation (ICE) datasets. The goal of this study, however, is correct iris segmentation of iris images captured in a visible light environment. Very few open iris databases captured in visible light exist, and we used the MICHE datasets for this reason. They were collected with three mobile devices, namely iPhone 5, Galaxy Samsung IV, and Galaxy Tablet II, in both indoor and outdoor environments. Ground truth images are not provided; therefore, among all the images, we used those for which the ground truth iris regions could be obtained by the provided algorithm [56,58] according to the instructions of MICHE. The same two-fold cross validation procedure was used for the MICHE data as for the NICE-II dataset.
Figures 15 and 16 show examples of good and incorrect segmentation by our proposed method, respectively. As in Figures 13 and 14, the false positive and false negative errors are shown in green and red, respectively. The true positive case (an iris pixel correctly classified as iris) is shown in black. As shown in Figure 15, our method can correctly segment the iris region in images captured by various cameras and in various environments. As shown in Figure 16, the false positive errors are caused by the eyelid region, whose pixel values are similar to those of the iris region. On the other hand, false negative errors occur in cases of reflection noise from the surface of glasses or from environmental sunlight.

... databases show that the proposed method achieved higher iris segmentation accuracy than the state-of-the-art methods.
Although our method shows high iris segmentation accuracy, traditional image processing algorithms must still be used in Stage 1 of Figure 1. In addition, it is necessary to reduce the processing time of the CNN-based classification of the window masks extracted from Stage 1. To solve these problems, a different type of CNN can be considered, such as a semantic segmentation network (SSN), which takes the whole image as input (without Stages 1 and 2 of Figure 1). However, as shown in Table 3, its performance is lower than that of the proposed method. As future work, we will research a method of using the SSN with appropriate post-processing to obtain both high accuracy and fast processing speed. In addition, we will apply our proposed method to various iris datasets in NIR light environments and to other biometrics, such as vein segmentation for finger-vein recognition or human body segmentation in thermal images.

Figure 1. Flowchart of the proposed method.

3.2. Stage 1. Detection of Rough Iris Boundary by Modified Circular HT
An approximate localization of the iris boundary is the prerequisite of this study, and it is obtained by a modified circular HT-based method. As shown in Figure 1, Stage 1 consists of two phases, namely, pre-processing and circular HT-based detection.

Figure 3. Resultant images by Phase 1 of Figure 1. (a) Original input image of 400 × 300 pixels; (b) grayscale converted image; (c) resultant image by bottom-hat filtering; (d) resultant image by adding the two images of (b,c).

Figure 4. Overall region of interest (ROI) detection procedure of Phase 2.

Figure 5. Resultant images of rough iris boundary detected by Phase 2 of Figure 1. (a) Pre-processed image of 400 × 300 pixels from Phase 1; (b) redefined image of 280 × 220 pixels; (c) median filtered image; (d) image by Gaussian smoothing; (e) image after Canny edge detector; (f) image after gamma contrast adjustment; (g) binarized edge image; (h) resultant image by removing incorrect circles by radius; (i) final image with rough iris boundary.

Figure 6. Examples of rough iris boundaries detected by Stage 1 for non-ideal environments.

Figure 7. Extracting the mask of 21 × 21 pixels for training and testing of CNN. (a) ROI defined from Stage 1 of Figure 1; (b) extracting the mask of 21 × 21 pixels within the ROI; (c) examples of the extracted masks (the 1st (4th) row image shows the masks from the boundary between the upper (lower) eyelid and iris, whereas the others represent those from the left boundary between iris and sclera).

Figure 8. Example of replacing specular reflection (SR) pixels by the average RGB value of the iris region. (a) The mask including the SR region; (b) the mask where SR pixels are replaced.

Figure 9. CNN architecture used in the proposed method.

Figure 10. Examples of noisy iris images of NICE-II database.

Figure 11. Average loss and accuracy curves for training. Average loss and accuracy curve from (a) 1st fold cross validation; and (b) 2nd fold cross validation.

Figure 12. Examples of trained filters of the 1st convolutional layer of Table 2. (a) 1st fold cross validation; (b) 2nd fold cross validation.

Figure 13. Examples of good segmentation results by our proposed method. (a,c) Segmentation results with corresponding E_i; (b,d) ground truth images (the false positive and false negative errors are shown in green and red, respectively. The true positive case (iris pixel correctly classified as iris) is shown in black).

Figure 14. Examples of incorrect segmentation of the iris region by our proposed method. (a) Original input images; (b) segmentation results with corresponding E_i; (c) ground truth images (the false positive and false negative errors are shown in green and red, respectively. The true positive case (iris pixel correctly classified as iris) is shown in black).

Figure 15. Examples of good segmentation results by our method. (a,c) Segmentation results with corresponding E_i; (b,d) ground truth images (the false positive and false negative errors are shown in green and red, respectively. The true positive case (iris pixel correctly classified as iris) is shown in black).

Table 2. Configuration of the VGG-face model used in the proposed method.

Table 3. Comparison of the proposed method with previous methods using NICE-II dataset.