A Secure Biometric Key Generation Mechanism via Deep Learning and Its Application

Abstract: Biometric keys are widely used in digital identity systems due to the inherent uniqueness of biometrics. However, existing biometric key generation methods may expose biometric data, which would render users' biometric traits permanently unavailable in secure authentication systems. To enhance security and privacy, we propose a secure biometric key generation method based on deep learning in this paper. Firstly, to prevent information leakage of biometric data, we utilize random binary codes to represent biometric data and adopt a deep learning model to establish the relationship between biometric data and the random binary code of each user. Secondly, to protect the privacy and guarantee the revocability of the biometric key, we add a random permutation operation to shuffle the elements of the binary code and update a new biometric key. Thirdly, to further enhance the reliability and security of the biometric key, we construct a fuzzy commitment module to generate the helper data without revealing any biometric information during enrollment. Three benchmark datasets, including ORL, Extended YaleB, and CMU-PIE, are used for evaluation. The experimental results show that our scheme achieves a genuine accept rate (GAR) higher than state-of-the-art methods at a 1% false accept rate (FAR), and meanwhile satisfies the revocability and randomness properties of biometric keys. The security analyses show that our model can effectively resist information leakage, cross-matching, and other attacks. Moreover, the proposed model is applied to a data encryption scenario on our local computer, where the whole encryption and decryption takes less than 0.5 s at different key lengths.


Introduction
With the rapid development of biometrics-based recognition technology, biometric images (e.g., face, iris, fingerprint, retina) can be adopted to generate a biometric key (bio-key), which is used as a user's physical identity in the fields of IoT, blockchain, and cloud computing [1][2][3]. In recent years, people have paid more attention to the privacy and security of biometric data. Once a bio-key generation system exposes biometric data, attackers can utilize this data to access the server and steal the user's private information, which leads to sensitive information leakage and financial loss. In addition, biometric data is permanently associated with the user's natural identity, so revocation of the user's biometric trait is impossible [4]. Therefore, a secure and reliable bio-key generation approach must solve three main issues. As shown in Figure 1, these issues are as follows:

1. Accuracy issue. The generated bio-key is affected by variations of the biometric image such as illumination, blur, and pose.
2. Security issue. An attacker can utilize the stored helper data to reconstruct the biometric data.

3. Revocability issue. Biometric data is permanently associated with the user, so a compromised bio-key cannot simply be revoked and reissued.

Many researchers have explored different approaches to solve these issues. Traditional bio-key generation schemes are divided into three categories [5,6]: key binding, key generation, and secure sketch and fuzzy extractor, which face the following challenges:

1. For the key binding scheme, biometric data and a cryptographic key are bound to generate helper data that hides the biometric information. There are two typical instances of this scheme: fuzzy commitment and fuzzy vault. On the one hand, Ignatenko et al. [7] demonstrate that the fuzzy commitment approach leaks biometric information. On the other hand, Kholmatov et al. [8] show that, given multiple helper data of the fuzzy vault, chaff points can be filtered out to retrieve the bio-key via a correlation attack. Thus, both instances face the information leakage challenge.

2. For the key generation scheme, biometric data is used to directly generate bio-keys without external auxiliary information. However, the accuracy of the generated bio-key is sensitive to intra-user variations. In addition, since the input biometric data is continuous, generating a high-entropy bio-key is difficult [9]. Therefore, there is still room for improvement in accuracy and security.

3. For the secure sketch and fuzzy extractor schemes, both use auxiliary information to restore the bio-key. Nevertheless, Smith et al. [10] and Dodis et al. [11] demonstrate that these two schemes carry an information leakage risk. Furthermore, multiple uses of the helper data cause a privacy risk [12].

With the rapid development of deep learning in the field of biometric recognition [13,14], Pandey et al. [15] use a deep neural network (DNN) to learn maximum entropy binary (MEB) codes from biometric images. Roh et al. [16] design a bio-key generation method based on a convolutional neural network (CNN) and a recurrent neural network (RNN). Roy et al. [17] propose a DNN framework to learn robust biometric features for improving authentication accuracy. However, these DNN- and CNN-based methods do not consider the aforementioned security and privacy challenges.
To overcome the above challenges, we propose a secure bio-key generation method based on deep learning. The proposed approach improves security and privacy while maintaining accuracy in the biometric authentication system. Specifically, it consists of three parts: (1) a biometrics mapping network; (2) a random permutation module; and (3) a fuzzy commitment module. Firstly, a binary code generated by the random number generator (RNG) represents the biometric data of each user. Subsequently, we adopt the biometrics mapping network to learn the mapping relationship between the biometric data and the binary code during enrollment, which preserves recognition accuracy and prevents information leakage of biometric data. Then, a random permutation module is designed to shuffle the elements of the binary code for generating distinctive bio-keys without retraining the biometrics mapping network, which keeps the generated bio-key revocable. Next, we construct the fuzzy commitment module to encode the random binary code for generating the auxiliary data without revealing any biometric information. The bio-key is decoded from query biometric data with the help of the auxiliary data, which enhances its stability and security. Finally, the proposed scheme is applied to an AES encryption scenario to verify its availability and practicality on our local computer. In this work, we use face images as the biometric trait to demonstrate our proposed approach. In summary, the contributions of this paper are as follows:

1. We design a biometrics mapping network based on the DNN framework to obtain the random binary code from biometric data, which prevents information leakage and maintains accuracy under intra-user variations.

2. We propose a revocable bio-key protection approach by utilizing a random permutation module, which guarantees the revocability and protects the privacy of the bio-key.

3. We construct a fuzzy commitment architecture through an error-correcting technique, which can generate stable bio-keys with the help of auxiliary data and avoids exposing the bio-key and biometric data during enrollment.

4. We conduct extensive experiments on three benchmark datasets, and the results show that our model not only effectively improves accuracy but also enhances the security and privacy of the biometric authentication system.

5. Furthermore, we validate our bio-key generation model in an AES encryption application, which can reliably generate bio-keys of different lengths to meet practical encryption requirements on our local computer.
The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the proposed approach of bio-key generation in detail. Section 4 discusses our experimental results. Finally, we conclude in Section 5.

Related Work
Bio-key generation schemes can be classified into key binding, key generation, secure sketch and fuzzy extractor, and machine learning. We briefly review these schemes in this section.

Key Binding Scheme Based on Biometrics
This scheme generates a bio-key by binding biometric data with a secret key. Specifically, the biometric data and the key are bound to generate helper data during the enrollment stage. If the query biometric data differs from the registered biometrics within a limited error, the bio-key can be retrieved from the helper data. This scheme has two typical instances: fuzzy commitment [18] and fuzzy vault [19]. Hao et al. [20] proposed a fuzzy commitment approach based on a coding scheme that used Hadamard and Reed-Solomon codes. Veen et al. [21] presented a renewable fuzzy commitment method that integrated helper data in a biometric recognition system. Chauhan et al. [22] proposed a fuzzy commitment approach based on the Reed-Solomon code that removed the errors of the biometric template. However, the above fuzzy commitment methods do not guarantee that the input biometric data has high entropy. Ignatenko et al. [7] and Zhou et al. [23] demonstrated that the fuzzy commitment scheme leaks information when the input biometric data has low entropy. Moreover, Rathgeb et al. [24,25] proposed a statistical attack that can target different fuzzy commitment schemes. Clancy et al. [26] improved the fuzzy vault scheme with an optimized algorithm that exploits the best vault parameters. Uludag et al. [27] combined the fuzzy vault with helper data to protect biometric data. Nandakumar et al. [28] utilized the helper data to align the registered and query biometrics for improving authentication accuracy. Li et al. [29] designed a fuzzy vault scheme utilizing a pair-polar structure to increase the reliability of the cryptosystem. Nevertheless, an attacker can compare multiple vaults to filter out chaff points and obtain a candidate set of genuine points by using the attack via record multiplicity (ARM) in the fuzzy vault scheme [30][31][32]. Therefore, the above methods cannot ensure security and privacy in the key binding scheme.
In this paper, we propose a deep learning framework to generate random binary code, and utilize random binary codes to represent biometric data, which can effectively prevent information leakage.

Key Generation Scheme Based on Biometrics
The task of the key generation scheme is to directly generate a bio-key from biometric traits. Zhang et al. [33] proposed a generalized thresholding approach for improving the authentication accuracy and the security of the bio-key. Hoque et al. [34] presented a key generation method based on several feature partitioning schemes. Rathgeb et al. [35] designed an interval-mapping approach that mapped the features into intervals for generating the bio-key. Lalithamani et al. [36] described a non-invertible bio-key generation approach from biometric templates. The main idea of this approach is to divide the template into two vectors, then shuffle the divided vectors and convert them into a matrix to ensure irreversibility. Wu et al. [37] proposed a key generation approach based on face images that combined binary quantization and Reed-Solomon techniques. Ranjan et al. [38] introduced a distance-based key generation approach to reduce complex operations when generating the bio-key. Sarkar et al. [39] presented a cancelable key generation approach for asymmetric cryptography; specifically, they adopted a shuffling-based transformation method to generate a revocable bio-key. Anees et al. [40] presented a bio-key generation method based on binary feature extraction and quantization. However, these methods do not consider intra-user variations, which makes it difficult to generate stable bio-keys. Moreover, maintaining high entropy of the key is the main challenge when the bio-key is derived directly from the biometric data.

Secure Sketch and Fuzzy Extractor Scheme Based on Biometrics
Dodis et al. [41] first proposed the secure sketch and fuzzy extractor notions. On the one hand, the secure sketch can generate helper data that does not reveal biometric data and yet recovers the bio-key when the query data is close to the enrolled biometric data. Therefore, this scheme has error correction capability and can correct error-prone biometric data. On the other hand, the fuzzy extractor can obtain biometrics to produce a uniform bio-key for various cryptographic applications. Chang et al. [42] designed a hiding secret points approach based on the secure sketch scheme. Sutcu et al. [43] presented a secure sketch fusing face and fingerprint features for enhancing security. Li et al. [44] proposed a two-level quantization approach for constructing a robust and effective secure sketch. Specifically, they used the first quantizer to calculate the difference between the codeword and noisy data, and further utilized the second quantizer to quantize the difference for correcting the noise. Lee et al. [45] added random noise into the minutiae measurements to construct a fuzzy extractor. Yang et al. [46] improved the fuzzy extractor scheme through registration-free Delaunay triangulation for improving authentication performance. Chi et al. [47] proposed a multi-biometric cryptosystem that combined secret sharing and fuzzy extractor approaches. Alexandr et al. [48] designed a new fuzzy extractor without non-secret helper data for improving its security. Nevertheless, these methods did not take information leakage into consideration. Smith et al. [10] and Dodis et al. [11] demonstrated that the secure sketch and fuzzy extractor schemes leak information about the input biometric data. Moreover, Linnartz et al. [12] showed that they suffer from privacy risks in the case of multiple uses. Hence, the above methods still have weaknesses in security and privacy.

Machine Learning Scheme
With the rapid development of machine learning and deep learning in biometric recognition, there are many meaningful works on these topics [49,50]. Wu et al. [51] studied a novel bio-key generation algorithm based on machine learning, which was used to directly generate stable bio-keys for improving accuracy. Panchal et al. [52] proposed a support vector machine (SVM)-based ranking scheme without threshold selection to increase the accuracy. Pandey et al. [15] presented a DNN model to generate bio-keys with randomness. Roh et al. [16] combined a CNN framework and an RNN framework to produce bio-keys without helper data. Wang et al. [53] used a DNN architecture to learn biometric features for enhancing the stability of bio-keys. Roy et al. [17] used a CNN model to extract robust features for improving the accuracy. However, the above methods only focus on accuracy and ignore the security and privacy issues of the bio-key generation. Iurii et al. [54] designed an efficient approach for securing identification documents using deep learning, which can demonstrate high-accuracy performance while resisting biometric impostor attacks.

Methodology
In this section, we illustrate the proposed bio-key generation scheme. First, we give an overview of the proposed bio-key generation mechanism in Section 3.1. Then, we introduce two components of our biometrics mapping network: feature vector extraction and binary code mapping networks in Section 3.2. Next, we present the implementation of random permutation and fuzzy commitment in Section 3.3. Finally, we describe the enrollment and reconstruction processes of whole bio-key generation in Section 3.4.

Overview
The overall framework of the proposed bio-key generation mechanism via deep learning is shown in Figure 2. It mainly consists of an enrollment stage and a reconstruction stage. (1) In the enrollment stage, we use a random binary code generator comprised of an RNG to produce the binary code K, and then train a biometrics mapping network to learn the mapping between the original biometric data and the random binary code. Specifically, this network includes two components: feature extraction and binary code mapping. Next, the elements of the binary code are shuffled by a random permutation module to yield a permuted code K_R as the bio-key; meanwhile, the generated permutation vector (PV) is stored in the database. Finally, K and K_R are encoded to generate auxiliary data (AD) via a fuzzy commitment encoder. Therefore, only the PV and AD are stored in the database during the enrollment process. (2) In the reconstruction stage, a query image is input to the trained network model to generate the corresponding binary code K*. Subsequently, we obtain the stored PV and AD from the database. Next, the query permuted code K_R* is generated from the predicted binary code by utilizing the random permutation module with the PV. Finally, the bio-key K_R is decoded with the help of the AD when the query image is close to the registered biometric image; otherwise, the bio-key cannot be restored. In the next section, we describe the biometrics mapping network in detail.
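The two stages above can be sketched end to end with a toy XOR commitment standing in for the BCH-based fuzzy commitment encoder; the code length, seed, and error-free genuine query are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

l = 16                                   # toy code length
rng = np.random.default_rng(42)

# --- Enrollment ---
K = rng.integers(0, 2, l)                # random binary code produced by the RNG
PV = rng.permutation(l)                  # permutation vector, stored in the database
K_R = K[PV]                              # permuted code used as the bio-key
AD = np.bitwise_xor(K, K_R)              # toy auxiliary data (XOR commitment), stored

# --- Reconstruction ---
K_star = K.copy()                        # code predicted from a genuine query (error-free here)
K_R_star = K_star[PV]                    # re-permute with the stored PV
recovered = np.bitwise_xor(K_star, AD)   # decode the bio-key with the help of AD
```

Only PV and AD are persisted; given a genuine query code, XOR-ing it with AD cancels K and returns exactly K_R.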

Biometrics Mapping Network Based on DNN Architecture
As DNNs [13,50] have made great progress in the field of image recognition, Taigman et al. [55] and Deng et al. [56] proposed biometric recognition models based on DNN frameworks which can effectively learn intermediate feature representations from biometric images. Although these methods have satisfactory performance, there are still two challenges when applying them in the real world. The first challenge is that these models with large weight parameters require greater computing power of the device. The second challenge is that directly learning random binary code from biometric images needs a robust feature extractor. To overcome these challenges, we propose a biometrics mapping network based on DNN architecture which contains two components: a feature extraction network and a binary code mapping network. The former is used to extract an intermediate feature vector, and the latter maps the extracted feature vector into binary code. This architecture is shown in Figure 3. In the next section, we introduce the two components of the biometrics mapping network.


Feature Extraction Network
To solve the first challenge, we adopt pointwise (PW) and depthwise (DW) convolutions instead of standard convolution to build a lightweight feature extraction network, which reduces memory storage and computational cost while preserving accuracy [57]. On this basis, we improve the bottleneck architecture for a better intermediate feature representation. The architecture of the network is shown in Figure 4. Specifically, on the one hand, we first use PW to expand input features into a higher-dimensional feature space for extracting rich feature maps, and then utilize DW to reduce computation redundancy. On the other hand, we add an attention module named the squeeze-and-excitation network (SENet) [58] between two nodes of the bottleneck, which can selectively strengthen useful features and suppress useless or less useful features to improve the ability of feature representation. Therefore, these key components complement each other, resulting in an effective and robust biometric feature vector.
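To see why the DW + PW factorization is lightweight, one can count the weights of a standard convolution against a depthwise-plus-pointwise pair; the channel sizes below are illustrative, and biases and batch-norm parameters are omitted:

```python
def conv_params(k, c_in, c_out):
    # standard k x k convolution: every output channel sees every input channel
    return k * k * c_in * c_out

def dw_pw_params(k, c_in, c_out):
    # depthwise k x k conv (one spatial filter per input channel)
    # followed by a 1 x 1 pointwise conv that mixes channels
    return k * k * c_in + c_in * c_out

std = conv_params(3, 128, 256)    # 294,912 weights
sep = dw_pw_params(3, 128, 256)   # 33,920 weights, roughly 8.7x fewer
```

The saving grows with the kernel size and channel counts, which is what makes the bottleneck practical on devices with limited computing power.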


Binary Code Mapping Network
To effectively learn the mapping between the face image and the random binary code, we design a robust binary code mapping network. In fact, the mapping network learns a unique binary code that follows a uniform distribution; in other words, each bit of the binary code has a 50% chance of being 0 or 1. Since the extracted feature vector can represent the uniqueness of each face image, our method only requires a nonlinear projection matrix to map the feature vector into the binary code. Assuming the extracted feature vector is denoted as V and the nonlinear projection matrix as M, the mapped binary code K can be denoted as

K = f(M · V), (1)

where f(·) is a nonlinear activation function. Therefore, we can combine a sequence of fully connected (FC) layers with a nonlinear activation function to establish the nonlinear mapping of Equation (1). The mapping network contains three FC layers (namely FC_1 with 512 dimensions, FC_2 with 2048 dimensions, and FC_3 with 512 dimensions) and one tanh layer. For different bio-key lengths, we slightly modify the dimension of the FC_3 layer. Moreover, a dropout strategy [59] with probability 0.35 is applied to these FC layers to avoid overfitting. The tanh layer is used as the last activation function for generating an approximately uniform binary code, because tanh is differentiable in backpropagation learning and close to the signum function.

Note that each element of the real-valued network output Y, where Y ∈ R^l, may be close to 0 or 1. In this case, we adopt binary quantization to generate the binary code from Y. To obtain a uniform distribution of the binary code, we set a dynamic threshold T = (1/l) Σ_{r=1}^{l} Y_r, where Y_r denotes the r-th element of Y and l represents the length of Y. Therefore, the final element K_r of the binary code K can be defined as

K_r = q(Y_r), r = 1, …, l. (2)

Here, the quantization function q(Y_r) is defined as

q(Y_r) = 1 if Y_r ≥ T, and q(Y_r) = 0 otherwise. (3)
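The dynamic-threshold quantization described above can be sketched as follows; the values of Y are made up for illustration:

```python
import numpy as np

def binarize(Y):
    # dynamic threshold T = mean of Y, so roughly half the bits fall on each side
    T = Y.mean()
    # q(Y_r) = 1 if Y_r >= T, else 0
    return (Y >= T).astype(int)

Y = np.array([0.91, 0.12, 0.78, 0.05, 0.66, 0.31, 0.88, 0.20])
K = binarize(Y)   # a roughly balanced binary code
```

Using the mean rather than a fixed 0.5 threshold keeps the code balanced even when the network's outputs drift toward one end of the range.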

Training Network
As mentioned above, our proposed DNN model includes two components: feature extraction and binary code mapping. To effectively learn the mapping between the biometric image and the random binary code, we combine three objective functions to implement an end-to-end training network.
First, for the feature extraction component, we use the ArcFace loss [56] as the classification loss, which produces a discriminative feature vector for the user's face image. Hence, the first objective function J_1 is the ArcFace loss. Second, for the binary code mapping component, the output of the network is an l-dimensional binary code, so this is effectively a regression task. To reduce the quantization loss between the mapped real values Y and the binary codes K, the second objective function is defined as

J_2 = (1/N) Σ_{i=1}^{N} ||Y_i − K_i||², (4)

where N denotes the batch size and i indexes the i-th sample. Moreover, the binary code should have high entropy, that is, 0 and 1 should occur with equal probability. To maximize the entropy of the binary code, the third objective function is selected as

J_3 = (1/N) Σ_{i=1}^{N} (mean(Y_i) − 0.5)², (5)

where mean(Y_i) denotes the average of the elements of Y_i. Therefore, the final loss L can be defined as

L = α J_1 + β J_2 + γ J_3, (6)

where α, β, and γ are the scale factors of each term. We give the implementation process in Algorithm 1. In our training process, binary codes K are first assigned by the random binary code generator module according to different users. Then, we set up the mapping relationship between biometric images X and binary codes K. Next, we initialize the weight parameters W and bias parameters b, and adopt the full objective function of Equation (6) to train our network. Subsequently, multiple pairs of X and K are fed into the DNN model to update the parameters W and b by using stochastic gradient descent (SGD). Finally, the trained model parameters are obtained. All steps are presented in Algorithm 1. To improve security and prevent information leakage, a random binary code is assigned to each user and used as the label data to train the biometrics mapping model based on the DNN framework.
Therefore, during every new enrollment, we assign a new random binary code to the new subject and then retrain the network to learn the mapping between the new biometric image and the binary code, which provides better accuracy and security.

Algorithm 1 Process of training the network in our DNN model
Parameters: learning rate η, epoch size N, weight parameters W and bias parameters b
Input: biometric images X as input data, the assigned binary codes K as label data
Output: the trained DNN model with W and b
1. Generate binary codes K by the random binary code generator according to different users; then establish a mapping relationship between input X and output K.
2. Initialize W and b.
3. Compute the loss function L according to Equation (6).
4. Update W and b by SGD: W ← W − η ∂L/∂W, b ← b − η ∂L/∂b.
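As a sketch of step 3, the quantization and entropy terms can be computed on a toy batch. The exact forms of J_2 and J_3 and the scale factors are assumptions based on the descriptions above, and the ArcFace term J_1 is omitted:

```python
import numpy as np

def quantization_loss(Y, K):
    # J2: squared error between real-valued outputs Y and target binary codes K,
    # averaged over the batch
    return np.mean(np.sum((Y - K) ** 2, axis=1))

def entropy_loss(Y):
    # J3: push each sample's mean activation toward 0.5 so 0s and 1s stay balanced
    return np.mean((Y.mean(axis=1) - 0.5) ** 2)

# toy batch: N = 2 samples, l = 4 bits
Y = np.array([[0.9, 0.1, 0.8, 0.2],
              [0.7, 0.3, 0.6, 0.4]])
K = np.array([[1, 0, 1, 0],
              [1, 0, 1, 0]])

beta, gamma = 1.0, 0.1          # assumed scale factors
total = beta * quantization_loss(Y, K) + gamma * entropy_loss(Y)
```

In a full implementation, `total` would be combined with the ArcFace term and backpropagated through the FC layers by SGD as in step 4.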

Random Permutation and Fuzzy Commitment
In this section, we introduce two functional modules: random permutation and fuzzy commitment. To achieve revocability, we utilize the random permutation module to shuffle the elements of the binary code and generate distinctive bio-keys. Furthermore, a fuzzy commitment is constructed to generate a stable bio-key while avoiding information leakage of the bio-key and biometric data.

Random Permutation
Random permutation shuffles the elements of the binary code, which yields different bio-keys. On the one hand, once the generated bio-key is compromised, a new bio-key can easily be regenerated by only modifying the random permutation seed. On the other hand, it reduces the time cost of bio-key generation compared with retraining the DNN model. The PV is first produced by the RNG with a random permutation seed for each user. Let the produced PV = {P_1, P_2, P_3, …, P_r, …, P_{l−1}, P_l}, which satisfies that for every t ∈ {1, …, l} there exists exactly one r with P_r = t. Given a binary code K with length l, where K = {K_1, K_2, K_3, …, K_r, …, K_{l−1}, K_l}, a permuted binary code K_R = {K_{P_1}, K_{P_2}, K_{P_3}, …, K_{P_r}, …, K_{P_{l−1}}, K_{P_l}} is obtained by randomly shuffling the elements of K according to the PV. In addition, the PV is stored in the database during the enrollment stage. It can be observed that the PV does not reveal any information about the binary code or the bio-key, because the PV is determined only by the random permutation seed and has no relationship with the binary code. In the bio-key reconstruction stage, the permuted binary code is generated by shuffling the elements of the query binary code with the PV stored in the database. As shown in Figure 5, we give an example of random permutation when l = 256.
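A minimal sketch of this module follows; the seed values and the 8-bit code are illustrative:

```python
import numpy as np

def make_pv(seed, l):
    # derive the permutation vector PV from a per-user random seed
    return np.random.default_rng(seed).permutation(l)

def permute(K, PV):
    # K_R[r] = K[PV[r]]: shuffle the code bits according to PV
    return K[PV]

K = np.array([1, 0, 1, 1, 0, 0, 1, 0])
PV = make_pv(seed=7, l=8)
K_R = permute(K, PV)                       # bio-key for this user

# Revocation: issuing a new bio-key only needs a new seed,
# not a retrained mapping network.
K_R_new = permute(K, make_pv(seed=8, l=8))
```

Since PV is reproducible from the seed, the same shuffle can be reapplied to the query binary code at reconstruction time, while PV itself carries no information about the code values.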

Figure 5. Illustration of the random permutation where the length of binary code is 256.

Fuzzy Commitment
Considering the distance error between the predicted binary code and the corresponding random binary code during the reconstruction stage, we can correct the predicted binary code with an error correction code (ECC) technique to obtain a stable bio-key. Inspired by the fuzzy commitment, we construct a new fuzzy commitment to extract a robust bio-key from the binary code, adopting a BCH code as the ECC encoder and decoder, as shown in Figure 6. Our fuzzy commitment has two advantages. First, the error correction code is used to reconstruct a stable bio-key from the predicted binary code during the decoder stage. Second, the generated helper data does not reveal the biometric data or bio-key during the encoder stage.


Figure 6. The implementation process of the fuzzy commitment. During the encoder stage, binary codes K_R and K are encoded to auxiliary data AD through the fuzzy commitment encoder. During the decoder stage, the bio-key is decoded with the help of AD and the query permuted code K*_R through the fuzzy commitment decoder.
In the next part, we describe how the encoder of the fuzzy commitment is constructed from the random binary code K and the permuted binary code K_R during the encoder stage. Then, we introduce how the decoder of the fuzzy commitment is designed to recover the bio-key from the query permuted code K*_R during the decoder stage.

Encoder stage: The random binary code K is encoded by BCH to generate a binary codeword Z:

Z = ENC(K),

where ENC(·) denotes the BCH encoder of the fuzzy commitment. Then, the generated Z is XORed with the permuted binary code K_R to generate AD:

AD = Z ⊕ K_R,

where ⊕ denotes the XOR operation and K_R is generated from K through the random permutation module. Since K_R and K are random, AD does not expose any biometric data or bio-key even though it is public.

Decoder stage: The query permuted code K*_R is XORed with AD to yield Z*. Then, Z* is decoded to generate the corresponding binary code K* through the BCH decoder:

K* = DEC(Z*) = DEC(AD ⊕ K*_R),

where DEC(·) denotes the BCH decoder of the fuzzy commitment and K*_R is obtained from the query biometric image through the DNN and random permutation modules. Finally, the bio-key K̂_R is reconstructed by XORing the re-encoded codeword ENC(K*) with AD:

K̂_R = ENC(K*) ⊕ AD.

If the distance error |ε| between the permuted binary codes K_R and K*_R is less than the error tolerance threshold τ of the BCH code, the reconstructed K̂_R is equal to K_R. This can be deduced as follows: when |ε| ≤ τ, DEC(Z ⊕ ε) = K, so K̂_R = ENC(K) ⊕ AD = Z ⊕ Z ⊕ K_R = K_R.

The constructed fuzzy commitment has two characteristics: reliability and security. In terms of reliability, the bio-key can be correctly recovered when |ε| of a genuine query is less than τ. On the contrary, the bio-key cannot be correctly reconstructed when |ε| of an imposter query is larger than τ; this provides ECC capability and increases the stability of the bio-key. In terms of security, the generated AD does not reveal the biometric data or the bio-key because the input binary codes K_R and K are uniformly distributed during the encoder stage, which enhances the security of bio-key generation.
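The encoder/decoder flow can be sketched as follows; for self-containment, a 5x repetition code stands in for the BCH code, and K_R is tiled to codeword length (a simplifying assumption, since the paper does not specify how lengths are matched):

```python
import numpy as np

def enc(k, r=5):
    """Repetition-code encoder (stand-in for the paper's BCH encoder)."""
    return np.repeat(k, r)

def dec(z, r=5):
    """Majority-vote decoder; corrects up to floor(r/2) bit flips per symbol."""
    return (z.reshape(-1, r).sum(axis=1) > r // 2).astype(int)

rng = np.random.default_rng(0)
K = rng.integers(0, 2, 32)                      # random binary code
K_R = rng.permutation(K)                        # permuted binary code (bio-key)

# Encoder stage: Z = ENC(K); AD = Z XOR K_R (K_R tiled to |Z| here).
Z = enc(K)
K_R_pad = np.resize(K_R, Z.size)                # length matching is an assumption
AD = Z ^ K_R_pad

# Decoder stage with a noisy query K_R*: flip a few bits of K_R.
K_R_star = K_R_pad.copy()
K_R_star[[3, 40, 77]] ^= 1                      # small intra-user error
Z_star = AD ^ K_R_star
K_star = dec(Z_star)                            # recovers K if error within tolerance
K_R_hat = enc(K_star) ^ AD                      # reconstructed bio-key
```

With the three flipped bits falling in different repetition symbols, the majority vote corrects them all, so K_star equals K and K_R_hat equals the enrolled (tiled) K_R; a heavier error pattern would exceed the code's tolerance and the reconstruction would fail, as the reliability argument requires.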

Enrollment and Reconstruction Procedure
In this section, we present the process of enrollment and reconstruction in Figure 2. In the enrollment stage, the generated random binary code K is firstly assigned to each user. At the same time, the biometric images X as input data and random binary code K as label data are used to train the DNN model. Then, PV is obtained by RNG, and the binary code K is shuffled to produce the permuted binary code K R according to PV. Next, the binary codes K R and K are encoded into AD by using the fuzzy commitment encoder module. Finally, the generated PV and AD are stored in the database. In our method, only AD and PV are registered to the database instead of biometric data, which can effectively prevent information leakage about biometrics from AD and PV.
In the bio-key reconstruction stage, we adopt the trained DNN model to generate binary code K * from the query image. Then, the permuted binary code K * R is obtained from K * by random permutation according to PV. Next, the bio-key can be correctly decoded from K * R and AD by utilizing the fuzzy commitment decoder module when K * R is close to K R for genuine queries. Therefore, the final bio-key K R is recovered.
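Putting the modules together, a toy enrollment/reconstruction round trip might look like this; the trained DNN is stubbed by a seeded generator and the BCH code by a 3x repetition code, so only the data flow (store {PV, AD} only, then recover the bio-key) mirrors the scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
l = 32

def dnn(image_id):
    """Stub for the trained DNN: deterministically maps an image to its code."""
    return np.random.default_rng(image_id).integers(0, 2, l)

# Enrollment: only PV and AD are stored -- no biometric data.
K = dnn(image_id=1)                             # code the DNN was trained to output
PV = rng.permutation(l)
K_R = K[PV]
Z = np.repeat(K, 3)                             # toy repetition ECC instead of BCH
AD = Z ^ np.resize(K_R, Z.size)
database = {"user1": {"PV": PV, "AD": AD}}

# Reconstruction from a query of the same user (stub DNN returns the same code).
rec = database["user1"]
K_q = dnn(image_id=1)
K_R_q = np.resize(K_q[rec["PV"]], rec["AD"].size)
Z_q = rec["AD"] ^ K_R_q
K_dec = (Z_q.reshape(-1, 3).sum(axis=1) > 1).astype(int)
bio_key = np.repeat(K_dec, 3) ^ rec["AD"]       # equals the enrolled permuted code
```

Note the database record contains nothing derived directly from the biometric image, which is the privacy property the enrollment design aims for.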

Experimental Results
In this section, we introduce datasets and experimental setup in Sections 4.1 and 4.2, respectively. Then, we conduct our experiments to evaluate the accuracy performance in Section 4.3. Subsequently, we analyze revocability and randomness properties in Section 4.4. Next, we discuss the security of our proposed scheme in Section 4.5. Furthermore, we compare our approach with related works in Section 4.6. Finally, our proposed method is applied to the data encryption scenario for validating its effectiveness and practicality in Section 4.7.

Dataset
We adopt multi-shot enrolment including more than one image to evaluate our method on the following three benchmark datasets.
(1) ORL dataset [60]: this dataset comes from the Olivetti Research Laboratory, formerly part of the American Telephone and Telegraph Company. It is composed of 10 different face images for each of 40 subjects, covering different illuminations, expressions, and poses. We randomly select five face images of each subject for enrollment, and the remaining face images are used to test the performance of bio-key generation during the reconstruction stage.
(2) Extended YaleB dataset [61]: this dataset includes 2332 face images of 38 subjects, captured under 64 different lighting conditions; hence, each subject has face images under 64 different illuminations. We randomly choose 10 face images of each subject for the enrollment stage, and the remaining images are used for testing.
(3) CMU-PIE dataset [62]: the CMU-PIE dataset contains 41,368 face images of 68 subjects, with larger variations in illumination, pose, and expression. In this experiment, we use five poses (p05, p07, p09, p27, p29) and the illumination variations to validate our scheme. We follow the same partition strategy as the Extended YaleB dataset for training and testing images.

Experiment Setup
In this experiment, we train the DNN model during the enrollment stage on the ORL, Extended YaleB, and CMU-PIE datasets, respectively. Before training our DNN model, we adopt the MTCNN [63] model to perform image alignment, and then apply a center crop to produce the final 112 × 96 face image from the aligned image so that the input size to the network is consistent. After the alignment and crop operations, we train our DNN model to generate bio-keys from the pre-processed face images. In the training process, α, β, and γ are set to 0.25, 0.25, and 0.5, the batch size is 64, and the learning rate is 0.0001. We use an SGD optimizer to train our network for 80,000 epochs. Note that five different trained DNN models are produced by modifying only the output dimension of the FC_3 layer. These cropped images are then fed into the trained DNN models to generate 128-, 256-, 512-, 1024-, and 2048-bit binary codes, from which PV and AD are produced and registered in the database. In the reconstruction stage, test images are submitted as inputs to the bio-key generation model for testing. Our experiment environment is Windows 10 (64-bit) with an Intel(R) Core(TM) i7-9750H CPU, and all verifications of our proposed scheme are implemented in the PyCharm IDE.
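The preprocessing step can be illustrated with a simple center crop to the 112 × 96 network input (the MTCNN alignment itself is omitted; the input shape is a placeholder):

```python
import numpy as np

def center_crop(img, out_h=112, out_w=96):
    """Center-crop an aligned face image to the network's 112x96 input size."""
    h, w = img.shape[:2]
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return img[top:top + out_h, left:left + out_w]

aligned = np.zeros((160, 128, 3))               # stand-in for an MTCNN-aligned image
crop = center_crop(aligned)
```

Cropping every aligned image to the same size keeps the DNN input dimensions consistent across datasets.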

Accuracy Performance
In this section, we discuss the accuracy performance on the ORL, Extended YaleB, and CMU-PIE datasets. The accuracy is evaluated by GAR (Genuine Accept Rate) and EER (Equal Error Rate) at a given 1% FAR (False Acceptance Rate). The results on these three datasets are listed in Table 1. It can be seen that all GARs are larger than 97% at a fixed 1% FAR, and all EERs are less than 2% as the key length varies from 128 to 2048 bits. Generally, a longer key length means the generated binary code contains more noise, which may reduce the authentication accuracy of the bio-key. However, as the key length increases, our proposed scheme still achieves better accuracy; for example, we obtain a GAR of 99.62% and an EER of 0.51% at a fixed 1% FAR for a key length of 1024 bits on the ORL dataset. This is because our DNN-based method can effectively learn the mapping relationship between binary code and biometric image. In general, accuracy is well preserved under different key lengths and intra-user variations. To further illustrate the accuracy performance of our method, we compare our approach with other state-of-the-art methods [15,64-66] on the Extended YaleB and CMU-PIE datasets. Tables 2 and 3 show the comparison results on the Extended YaleB and CMU-PIE datasets, respectively. On the one hand, our method achieves a GAR of 99.40% for a 128-bit binary code, which outperforms the Genetic-ECOC method [64] in GAR at a fixed 1% FAR on the Extended YaleB dataset. On the other hand, although references [64,66] perform extremely well and are close to our performance, their codeword lengths are both less than 90, and as such they offer lower security against brute force attacks. In Table 1, we can find that the GAR and EER of our method are 99.97% and 1.49%, respectively, for a key length of 128 on the CMU-PIE dataset.
Thus, our algorithm outperforms reference [65] even though our codeword length is less than that of [65]. In summary, our method achieves a higher GAR and a lower EER than Hybrid [65], BDA [66], MEB coding [15], and Genetic-ECOC [64] on the CMU-PIE dataset. The reason is that our proposed scheme adopts a DNN model based on a feature extraction network and a binary code mapping network to generate robust binary codes, which enhances intra-class compactness and inter-class discrepancy. Our proposed method performs more accurately than the state-of-the-art methods under intra-user variations such as illumination, pose, and expression. Therefore, our approach enhances stability under multi-shot enrollment.

Basic Property Analysis
In this section, we discuss two basic attributes, randomness and revocability, which are the basic security requirements of a bio-key generation method. In the next part, we analyze these security properties in detail.

Randomness Analysis
For a key to be secure, it must be random, which increases the computational complexity of brute force attacks. To validate this point, we test the randomness of the generated bio-keys with the standard NIST statistical test suite. Firstly, we randomly select 1000 face images to generate the corresponding bio-keys; these bio-keys are concatenated into a source file of 1000 × 512 binary bits, which is larger than 10,000 bits. Next, the source file is fed into the NIST statistical test suite to evaluate randomness, and 16 randomness tests are performed according to the test benchmark. Following the NIST recommendation, if the p-values of all 16 tests are larger than 0.01, the bits of the bio-key are random. In our scheme, the computed p-values of all 16 tests are greater than 0.01, because the generated bio-key is derived from a random binary code. Overall, our generated bio-keys pass the randomness tests and satisfy the randomness property.
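To illustrate the kind of check the NIST suite performs, here is the frequency (monobit) test from NIST SP 800-22, the simplest of the 16 tests:

```python
import math
import numpy as np

def monobit_pvalue(bits):
    """NIST SP 800-22 frequency (monobit) test: p-value via the complementary
    error function; p >= 0.01 is consistent with randomness."""
    n = len(bits)
    s = abs(2 * int(np.sum(bits)) - n)          # |#ones - #zeros|
    return math.erfc(s / math.sqrt(2.0 * n))

# A 1000 x 512-bit stream standing in for the concatenated bio-keys.
bits = np.random.default_rng(0).integers(0, 2, 1000 * 512)
p = monobit_pvalue(bits)
```

A strongly biased stream (e.g., all ones) yields a p-value far below 0.01 and fails the test, while a uniform random stream typically passes.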

Revocability Analysis
Revocability means that a new bio-key can be regenerated when the original bio-key is leaked, and that the new bio-key has no relation to the leaked one, which prevents an attacker from accessing other authentication systems through the leaked bio-key or biometric data. We therefore analyze the revocability of our proposed approach. As illustrated in Section 3.3.1, we adopt the random permutation module to generate a different PV by modifying the random permutation seed for each user, which shuffles the assigned binary code to generate a new bio-key. Therefore, an old bio-key can be replaced by a new one by setting a different random permutation seed. In addition, there is no relationship between the new bio-key and the old bio-key, because the generated bio-key is determined only by the random PV.
To further demonstrate the effectiveness of the revocability in our scheme, we conduct experiments on the ORL, Extended YaleB, and CMU-PIE datasets. The mated distances are computed by Hamming Distance (HD) between two bio-keys generated for the same user with different random permutation seeds. In contrast, the genuine distances are calculated by HD matching within a user under the same permutation seed. If the mated HDs are all close to l/2, where l denotes the key length, the new bio-key is fully dissimilar to the old bio-key. We therefore collect genuine and mated HDs from the ORL, Extended YaleB, and CMU-PIE datasets, respectively. As shown in Figure 7, the distributions of mated and genuine HD are clearly distinguishable on all three datasets at l = 1024, 2048, and the mated HDs are mainly distributed around half of the key length. Specifically, the HD between the new bio-key and the old bio-key of the same user is larger than the error-correcting range of the BCH decoder. From these observations, we can conclude two points. First, the new bio-key and the old bio-key of the same user are fully independent and dissimilar from each other. Second, our scheme achieves approximately 100% revocability of the bio-key; for example, if l = 2048, there are 2^2048 − 1 possible revocations of the bio-key. Hence, our proposed scheme satisfies the revocability property.
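The mated/genuine HD experiment can be reproduced in miniature (a random code stands in for a DNN output):

```python
import numpy as np

l = 1024
K = np.random.default_rng(0).integers(0, 2, l)  # the user's binary code

def bio_key(seed):
    """Permute the user's code with a seed-specific PV (the revocation knob)."""
    return K[np.random.default_rng(seed).permutation(l)]

old_key = bio_key(seed=1)
genuine_hd = int(np.sum(bio_key(seed=1) != old_key))  # same seed: HD = 0
mated_hd = int(np.sum(bio_key(seed=2) != old_key))    # different seed: HD near l/2
```

As expected, regenerating with the same seed reproduces the key exactly, while a fresh seed yields a key whose HD concentrates around half the key length, i.e., statistically unrelated to the old one.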

Security Analysis
In practical applications, we should consider the security of the generated bio-key in different attack scenarios. In this part, we analyze attacks involving information leakage of biometric data and bio-keys in our system. In addition, we discuss other attacks, including the cross-matching attack, correlation attack, and guessing mapped binary code attack, in detail.

Resisting Information Leakage Attacks
In the worst-case scenario, we assume that attackers can obtain intermediate information in our proposed system. Moreover, our algorithm is public to attackers. There are two points where information is leaked as follows: (1) trained network parameters, (2) PV and AD stored in the database. We will analyze the security according to these two points.
(1) Trained network parameters: The trained DNN model contains a large number of weight and bias parameters, which are used to map the biometric image to the binary code. Since the network parameters are only combined with the input biometric image to forward-predict the binary code, no information about biometric data or the bio-key is revealed by the network parameters. If the algorithm and network parameters are known, attackers can feed a large number of imposter samples as input to attempt a false acceptance by brute force. In effect, this attack exploits the false acceptance vulnerability of a biometric system: if the system has low distinguishability between genuine and imposter samples, the attacker can easily access it through a false acceptance. Thus, the FAR of the system under this attack scenario is a suitable evaluation metric.
To verify this point, we use the trained DNN model to generate binary codes under the aforementioned attack. All samples from users other than the genuine user are treated as imposters, and we examine the distributions of genuine and imposter matching distances. As shown in Figure 8, the HD distribution of inter-subjects is close to half of the key length, while the HD distribution of intra-subjects is about 15% of the key length. Thus, our model can recognize individuals well when the error-correcting range is set to approximately 15% of the key length. According to the generation rule of BCH codewords, we can make the error-correcting capacity slightly larger than 15% of the key length, which maintains the recognition accuracy at FAR = 0. In summary, the error-correcting range of the BCH code suffices to recognize individuals in our scheme. In other words, the distributions of imposter and genuine matching distances are fully distinguishable on the ORL, Extended YaleB, and CMU-PIE datasets, which indicates the FAR is close to zero under this attack.

Hence, this proves that the false acceptance attack is difficult and biometric information is not leaked in this situation.
(2) PV and AD stored in the database: In this case, it is assumed that the PV and AD of each user in the database are obtained by attackers. Firstly, the PV is generated by the RNG in the random permutation module, and the binary code K is randomly shuffled by PV to generate the bio-key. It follows that PV is independent of the binary code and the bio-key; even if the PV is compromised, no sensitive biometric information is leaked. Secondly, the AD is produced from the binary codes K_R and K by the fuzzy commitment encoder. Note that the elements of K_R and K follow a uniform distribution, so an attacker cannot infer the elements of the random binary code from element correlations in AD. References [24,25] proposed statistical attacks that retrieve the biometric data and bio-key by analyzing the correlation of multiple helper data for the same user. However, in our approach the extracted binary codes of the same user are independent of each other: they are randomly assigned by the RNG with different random seeds, so there is no correlation between multiple ADs, and it is difficult to retrieve the binary code by statistical attacks even when multiple ADs are public. Therefore, it is practically impossible to retrieve biometric data and bio-keys when PV and AD are compromised.


Resisting Other Attacks
In this section, we will discuss brute force attack, cross-matching attack and guessing mapped binary code attack.
(1) Brute force attack: In this attack scenario, an attacker directly tries to guess a bio-key. It is assumed that no a priori information about the bio-key is available to the attacker, so the attacker has no choice but to guess the bio-key by brute force. If the length l of the bio-key is known to the attacker, it takes up to 2^l trials to obtain the correct bio-key when the bio-keys are random and dissimilar; for instance, the attacker must conduct 2^1024 attempts when l = 1024. It is therefore infeasible to guess the bio-key using an electronic computer. Note that the randomness of bio-keys is important against the brute force attack. In our proposed approach, the bio-key of each user is derived from a random binary code, and the randomness of the bio-keys has been verified in Section 4.4.1. Therefore, the bio-keys of different users are random and independent of each other, and it is impractical to guess the bio-key.
(2) Cross-matching attack: In this attack scenario, we assume that the attacker can obtain a bio-key of a user and use it to perform cross-matching across multiple biometric databases. In our proposed scheme, to generate multiple discriminative bio-keys for the same user, we only modify the permutation seed to produce random PVs in the enrollment stage, which can be deployed to multiple databases. Hence, the generated bio-keys of the same user are random and dissimilar from each other, and a compromised bio-key cannot be linked to other biometric systems. In addition, we can easily regenerate the PV to obtain a new bio-key by setting a different random permutation seed in the compromised system. In other words, our bio-key generation method has revocability, which ensures the distinguishability and renewability of a user's bio-keys, as analyzed in Section 4.4.2. Hence, our proposed approach can effectively resist the cross-matching attack.
(3) Guessing mapped binary code attack: In our bio-key reconstruction process, the bio-key is generated from the mapped binary code. We assume that the PV and AD stored in the database are known to the attacker, and we test whether the attacker can correctly guess the mapped binary code; under this assumption, the bio-key could be restored from the guessed binary code together with the user's PV and AD. To analyze the resistance of our approach against this attack, we measure the degrees of freedom N (i.e., the uncertainty) of the binary code between a sample of one user and the inter-samples of other users. Firstly, a binary code is generated from a sample of the user, and other binary codes are generated from inter-samples of other users. The normalized Hamming distances between the generated binary codes are computed on the ORL, Extended YaleB, and CMU-PIE datasets. Then, we calculate the mean µ and the standard deviation σ of these normalized Hamming distances and obtain the degrees of freedom of the mapped binary code as N = µ(1 − µ)/σ^2. Moreover, our ECC coding can correct approximately 10 percent of bit errors. A conservative theoretical bound on the number of brute force (BF) trials is therefore BF = 2^(N − C), where N is the degrees of freedom and C is the number of error-correction bits. For example, if N = 512 and C = 0.1 × N ≈ 51, the attacker faces 2^(N − C) = 2^461 brute force trials. The values of N computed for the mapped binary codes of the three datasets are listed in Table 4, yielding 2^417, 2^387, and 2^366 trials, respectively. Thus, it is difficult to correctly guess the mapped binary code and restore a user's bio-key.
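The degrees-of-freedom estimate and the resulting brute force bound can be reproduced with a short script; the distance values below are toy numbers for illustration, not the measurements behind Table 4:

```python
from statistics import mean, stdev

def degrees_of_freedom(norm_hamming):
    """N = mu * (1 - mu) / sigma^2, estimated from normalized
    Hamming distances between inter-user binary codes."""
    mu, sigma = mean(norm_hamming), stdev(norm_hamming)
    return mu * (1 - mu) / sigma ** 2

def brute_force_exponent(n, correctable=0.1):
    """Exponent of the conservative bound BF = 2^(N - C),
    where the ECC corrects C = 0.1 * N bits."""
    return n - round(correctable * n)

# Toy inter-user distances clustered around 0.5, as expected
# for independent random codes:
dists = [0.48, 0.51, 0.49, 0.52, 0.50, 0.47]
n = degrees_of_freedom(dists)
# With N = 512 the attacker faces 2^(512 - 51) = 2^461 trials:
assert brute_force_exponent(512) == 461
```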

Comparison with Related Works
In this section, we compare our approach with related works in terms of security. Table 5 summarizes the comparison with the related works [17,29,65,66]; we now analyze these methods in detail. Li et al. [29] proposed an alignment-free fingerprint cryptosystem based on the fuzzy vault. Since the stored data contains the biometric feature, an attacker can retrieve the biometric data. Chauhan et al. [67] improved the fuzzy commitment scheme; however, if the parameters S and P are compromised, the biometric template and bio-key can be retrieved from the data stored in the database. Roy et al. [17] put forward a DNN model to generate a bio-key from the retinal image, but if the extracted biometric template is compromised, a new template cannot be regenerated. Asthana et al. [68] designed a new key binding for biometric data via an objective function; nevertheless, its security actually depends on a predefined threshold value, and if this value is obtained, the stored helper data may reveal biometric data. Therefore, these methods are vulnerable to the information leakage attack.
To avoid information leakage of the biometric data and bio-key, we combine the DNN architecture and fuzzy commitment to generate the bio-key. During enrollment, we assign a random binary code to each user, which is shuffled via the PV to generate the permuted code. Next, the permuted code is encoded by the fuzzy commitment module to yield the AD. Finally, the PV and AD are stored in the database. During bio-key reconstruction, we utilize the DNN to regenerate the binary code, and the final bio-key is restored through our scheme. Note that the input binary codes of the fuzzy commitment encoder are uniformly distributed and unrelated to the biometric data during the enrollment stage. Hence, neither the biometric data nor the bio-key is exposed when the PV and AD are public. Additionally, we can modify the permutation seed to issue a new pair of PV and AD for the same user, which also resists the cross-matching attack.
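The enrollment and reconstruction flow above can be sketched as a standard XOR-based fuzzy commitment; the 3x repetition code is a toy stand-in for the paper's ECC, and the 8-bit key and 24-bit code lengths are illustrative:

```python
import secrets

REP = 3  # 3x repetition code: a toy stand-in for the paper's ECC

def ecc_encode(bits):
    return [b for b in bits for _ in range(REP)]

def ecc_decode(bits):
    # Majority vote per repetition group corrects one flip per group.
    return [int(sum(bits[i:i + REP]) > REP // 2)
            for i in range(0, len(bits), REP)]

def enroll(permuted_code, key_bits):
    """Helper data AD = ECC(key) XOR permuted code; AD alone reveals
    neither the key nor the (biometric-independent) binary code."""
    return [c ^ p for c, p in zip(ecc_encode(key_bits), permuted_code)]

def reconstruct(regenerated_code, ad):
    """Recover the bio-key from a possibly noisy regenerated code."""
    return ecc_decode([a ^ r for a, r in zip(ad, regenerated_code)])

key = [secrets.randbelow(2) for _ in range(8)]
code = [secrets.randbelow(2) for _ in range(24)]  # permuted binary code
ad = enroll(code, key)
noisy = list(code)
noisy[0] ^= 1                        # one bit error in the regenerated code
assert reconstruct(noisy, ad) == key  # the repetition code absorbs it
```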
Overall, our proposed approach is more robust against the information leakage attack and the cross-matching attack. Moreover, to meet the different key sizes required by the encryption application, our model can be easily adapted by setting only the output dimension of the DNN architecture, which means that accuracy is well preserved across different key lengths.

Experiment Platform
To validate its flexibility and practicality, we apply our model to a real-world data encryption scenario. In this application, we use a computer system to test the time cost and accuracy of data encryption. The test system contains a 64-bit Intel(R) Core(TM) i7-9750H CPU and a USB camera with a resolution of 640 × 480. A face image is captured by the USB camera; then, the captured face data and the plaintext are fed into the test model to verify its performance. The test result can be observed on the PC display.

Experiment Dataset
In this application scenario, we use face images as the input biometric data and alice29.txt from the Canterbury corpus as the plaintext to test performance on the user's computer. For the biometric images, we adopt the USB camera to collect face images of different users in a real environment, comprising 10 face images at 640 × 480 resolution for each of 40 subjects. These images, covering different illuminations, expressions, and poses, are aligned and cropped to a resolution of 112 × 96 using MTCNN [63]. We randomly select five face images of each subject for training our key generation model, and the remaining images are used for testing. The plaintext file alice29.txt consists of various characters and is 152,089 bytes in size.

Experiment Process
To meet the different key lengths of standard AES in cipher block chaining (CBC) mode, our key generation model is trained to generate 128-, 192-, and 256-bit bio-keys from the collected dataset during the training stage. For the different bio-key sizes, we only slightly modify the output dimension of the DNN architecture. Subsequently, our key generation model with the trained DNN parameters is deployed on the user's computer.
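For illustration, a fixed-length reconstructed code could also be stretched or truncated to the three AES key sizes with an extendable-output hash; note this is a stand-in sketch, since the paper instead retrains the DNN with a matching output dimension:

```python
import hashlib

def derive_key(binary_code, n_bits):
    """Derive an n_bits AES key (128/192/256) from a reconstructed
    binary code using SHAKE-256, an extendable-output function."""
    return hashlib.shake_256(bytes(binary_code)).digest(n_bits // 8)

code = [1, 0, 1, 1, 0, 0, 1, 0] * 16      # toy 128-bit reconstructed code
keys = {n: derive_key(code, n) for n in (128, 192, 256)}
assert all(len(k) * 8 == n for n, k in keys.items())
```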
As shown in Figure 9, the test model covers the encryption and decryption processes and consists of face detection, key generation, AES encryption, and AES decryption modules. In the encryption process, we adopt the USB camera to obtain a face image of the user; this face image is then detected by the face detection module using MTCNN. Next, a bio-key is generated by our trained key generation model, while the plaintext is submitted as input to the AES encryption module. Finally, we perform data encryption with the bio-key and plaintext to yield the ciphertext. In the decryption process, we obtain different face images of the same user to regenerate the bio-keys, which are used to decrypt the ciphertext back into plaintext. We then check the data consistency.
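The encrypt/decrypt round trip and the data consistency check can be sketched with AES-CBC; this assumes the third-party `cryptography` package, and the random key below merely stands in for a reconstructed 128-bit bio-key:

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt(key, plaintext):
    """AES-CBC with PKCS7 padding; a fresh random IV is prepended."""
    iv = os.urandom(16)
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return iv + enc.update(padded) + enc.finalize()

def decrypt(key, blob):
    iv, ct = blob[:16], blob[16:]
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = dec.update(ct) + dec.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()

key = os.urandom(16)            # stands in for a regenerated 128-bit bio-key
ciphertext = encrypt(key, b"plaintext from alice29.txt")
assert decrypt(key, ciphertext) == b"plaintext from alice29.txt"
```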

Results Analysis
In this section, we analyze the accuracy and time consumption of the data encryption application. In terms of accuracy, there are 200 × 199 = 39,800 comparisons at each key length in the data encryption and decryption processes; among them, 3800 are genuine comparisons and 36,000 are imposter comparisons. The verification results are listed in Table 6. Our GAR@1%FAR is close to 100% under the different key lengths. A few of the captured face images include large pose changes and partial occlusion, which slightly reduce the accuracy; in practical applications, face images are generally captured while the user maintains a normal posture, so this accuracy is acceptable. In terms of time consumption, the cost of data encryption and decryption is shown in Figure 10. The face detection and key generation modules account for nearly half of the total time for the key lengths of 128, 192, and 256 bits, because both modules adopt the DNN architecture and are thus more time-consuming than the AES encryption module. Overall, one encryption and decryption pass takes less than 250 ms in this application, and key generation itself takes less than 130 ms for the 128-, 192-, and 256-bit key lengths. For the data encryption application, this cost is worthwhile, and our scheme is effective and practical for real-world bio-key applications.

Conclusions
In this paper, we propose a secure bio-key generation scheme based on deep learning. Firstly, to improve security and prevent information leakage, a random binary code is assigned to each user, and a biometrics mapping model based on the DNN framework is designed to map biometric images to distinct binary codes for different users. Secondly, random permutation is adopted to shuffle the random binary code, and the permutation seed can be modified to protect privacy and guarantee revocability. Thirdly, to generate a stable and secure bio-key, we construct a new fuzzy commitment module. Furthermore, our scheme was applied to a data encryption scenario to test its practicality and effectiveness. The experimental results show that, on the one hand, our scheme can effectively enhance security and privacy while maintaining accuracy; on the other hand, the security analysis illustrates that our scheme not only satisfies the revocability and randomness properties of bio-keys but also resists various attacks, such as the information leakage, brute force, cross-matching, and guessing mapped binary code attacks. However, our method requires retraining the network to enroll new users; in other words, it is not suitable for zero-shot enrollment. Since the generated bio-key needs to be unique, reliable, and random, it is difficult to ensure that the trained DNN model meets these three properties without retraining. In future work, we will focus on improving stability and security under zero-shot enrollment.


Conflicts of Interest:
The authors declare no conflict of interest.