An Improved Fuzzy Vector Signature with Reusability

Abstract: Fuzzy vector signature (FVS) is a new primitive in which fuzzy (biometric) data w is used to generate a verification key (VK_w) and, later, a distinct fuzzy (biometric) sample w′ (together with a message) is used to generate a signature (σ_{w′}). The primary feature of FVS is that the signature σ_{w′} can be verified under the verification key VK_w only if w is close to w′ within a certain predefined distance. Recently, Seo et al. proposed an FVS scheme that was constructed (loosely) using a subset-based sampling method to reduce the size of helper data. However, their construction fails to provide the reusability property, which requires that no adversary learn information about the fuzzy (biometric) data even if multiple verification keys and related signatures of a single user, all generated from correlated fuzzy (biometric) data, are exposed to the adversary. In this paper, we propose an improved FVS scheme that is proven reusable with respect to arbitrarily correlated fuzzy (biometric) inputs. Our efficiency improvement is achieved by strictly applying the subset-based sampling method previously used by Canetti et al. to build a fuzzy extractor, and by slightly modifying the structure of the verification key. Our FVS scheme can still tolerate sub-linear error rates of input sources and also reduces the signing cost of a user to about half that of the original FVS scheme. Finally, we present authentication protocols based on the fuzzy extractor and the FVS scheme and compare their performance in terms of computation and transmission costs.


Introduction
Biometric information (e.g., fingerprint, iris, face, vein) has been used for user authentication [1][2][3][4][5] because of its uniqueness and immutability. Due to these properties, such biometric information can be used in place of a user secret key in an authentication system. When using biometric information as a security key, the user is not required to memorize or securely store anything to authenticate, which makes the process much more convenient and user-friendly. However, biometric information is noisy and non-uniformly distributed, and thus differs greatly from a typical cryptographic secret key, which is generally a uniformly random string of fixed length. A large body of research has therefore been conducted to bridge this gap and enable biometric information to be used as a secret key in a cryptographic way.
To overcome the problem of noisy secret keys, researchers proposed the fuzzy extractor and the fuzzy signature as two types of biometric cryptosystems. Comprising two algorithms, Gen and Rep, a fuzzy extractor [6] generates a uniformly random string of fixed length (i.e., a secret key) from fuzzy (biometric) data. The generation algorithm Gen takes as input a sample of fuzzy (biometric) data w and generates a secret key r together with helper data p. The reproduction algorithm Rep takes as input another sample of fuzzy (biometric) data w′ close to w, together with p, to reproduce r. If the difference between w and w′ is less than a predefined threshold, Rep generates the same secret key r. On the other hand, consisting of three algorithms, KG, Sign, and Vrfy, the fuzzy signature [7] generates a signature by using fuzzy (biometric) data itself as a signing key. The key generation algorithm KG takes as input a sample of fuzzy (biometric) data w and generates a verification key vk. The signing algorithm Sign takes as input another sample of biometric information w′ and generates a signature σ. The verification algorithm Vrfy takes as input vk and σ and succeeds only if w and w′ are within a fixed threshold distance.
Biometric cryptosystems are generally considered secure when each sample of fuzzy (biometric) data is used only once. However, in reality, a user may use the same biometric source (e.g., right index fingerprint) to authenticate their accounts for several applications as a matter of expediency. Since similar biometric information is used multiple times, a new security notion is required to guarantee both the privacy of the fuzzy (biometric) data at hand and the reusable security of biometric cryptosystems in this situation. In 2004, Boyen [8] introduced the reusability of a fuzzy extractor, which ensures no entropy loss to the secret key or biometric source even when relevant pairs of helper data or keys from similar forms of biometric information are revealed. Since then, many researchers have focused on studying this fuzzy extractor with reusability (a.k.a. reusable fuzzy extractor).
In order for a reusable fuzzy extractor to be widely used in practice, it should be able to tolerate more than a certain level of errors inherent in fuzzy (biometric) data. For example, iris readings have an average error rate of 20-30% [9][10][11]. A number of studies have proposed constructions that tolerate a linear fraction of errors [8,[12][13][14][15]], but these schemes impose a strong requirement on the distribution of fuzzy (biometric) data, namely: (1) the distribution must have sufficiently high min-entropy ("Y" in High Min-entropy in Table 1), or (2) any difference between two distinct biometric readings must not significantly decrease the min-entropy of the fuzzy (biometric) data ("H∞[w_i | w_i − w_j] > m" in Source Distribution in Table 1). Unfortunately, both of these expectations are somewhat unrealistic.

Table 1. Comparison of reusable biometric cryptosystems.

Scheme  | Secure Sketch | High Min-entropy | Security Assumption | Error Rate | Reusability | Source Distribution
[8]     | O             | Y                | +                   | linear     | weak        | w_i = w + δ_i
[16]    | X             | N                | LWE                 | log        | Strong      | w_i = w + δ_i
[13]    | O             | Y                | LWE                 | linear     | Strong      | w_i = w + δ_i
[15]    | O             | Y                | DDH                 | linear     | Strong      |
[17]    | X             | N                | X                   | sub-lin    | Strong      | (w, w_i)
[18]    | X             | N                | X                   | sub-lin    | Strong      | (w, w_i)
[12]    | X             | Y                | X                   | linear     | Strong      | (w, w_i)
FS [19] | X             | N                | LWE                 | log        | Strong      |

• In Secure Sketch, "O" indicates that the scheme uses a secure sketch, and "X" that it does not.
• In Reusability, "weak" means that the scheme is proven in the weak reusability model; an empty entry means that the scheme does not provide a formal proof.
• In Source Distribution, "w_i = w + δ_i" means that, for a fuzzy (biometric) source w, the error δ_i is controlled by an adversary; "(w, w_i)" means that the biometric readings w and w_i are arbitrarily correlated; "(T, k)" means the (T, k)-block source in Reference [21].
• In High Min-entropy, "Y" indicates that the scheme requires sufficiently high min-entropy of the input (biometric) data, i.e., min-entropy high enough to be secure against brute-force attack, and "N" otherwise.
• In Security Assumption, "+" means that the scheme is information-theoretically secure.
Canetti et al. [17] relaxed these conditions, requiring only that subsets of samples have sufficient average min-entropy given the subset indices, and proposed a reusable fuzzy extractor that tolerates a sub-linear fraction of errors. The construction is contingent on a powerful tool called a digital locker, which relies on a non-standard assumption. Other reusable biometric cryptosystems [16,19] neither impose unrealistic requirements on the biometric distribution nor rely on non-standard assumptions, but they tolerate only a logarithmic fraction of errors.
Recently, a new primitive called fuzzy vector signature (FVS) [20] was proposed based on bilinear maps (i.e., pairings), which improved the error tolerance rate without any additional requirements on the distribution of biometric information. This scheme tolerates a sub-linear fraction of errors and is also based on standard assumptions, such as the external Diffie-Hellman (XDH) assumption. It is also claimed to be reusable, but no formal proof of reusability was provided in Reference [20]. In this paper, we introduce a formal security model for the reusability of fuzzy vector signatures and prove that our proposed scheme is reusable in this model. By applying the subset-based sampling method [17] more strictly, our scheme is also more efficient than that of Reference [20] from the perspective of both the user and the authentication server. Specifically, it reduces not only the size of the signature and the verification key but also the number of pairing operations required for verification. Section 5 gives a detailed performance comparison.

Related Work
Reusable Fuzzy Extractor. The concepts of a fuzzy extractor and a secure sketch were first proposed by Dodis et al. [6]. Following this, Boyen [8] introduced the notion of reusability, meaning that additional keys can be securely generated even if helper data (which is required to regenerate the key) or key pairs are exposed. Wen et al. [14] subsequently proposed a reusable fuzzy extractor based on the Decisional Diffie-Hellman (DDH) problem, which can tolerate a linear fraction of errors. However, their scheme requires that, for any two distinct readings of the same source, the difference between them should not leak significant information about the source, a requirement that is too strict, as each component of fuzzy (biometric) data is non-uniformly distributed. In response, Wen et al. [15] proposed a DDH-based reusable fuzzy extractor that removes the requirement of Reference [14] by changing the source distribution. Apart from these studies, Wen et al. [13] also proposed a reusable fuzzy extractor that tolerates a linear fraction of errors based on the Learning with Errors (LWE) assumption [22]. However, these schemes [13][14][15] use a secure sketch to control the noise of the data. A secure sketch recovers w from w′ if w and w′ are within a fixed distance, but it nevertheless causes a small leakage of biometric information. Therefore, in reusability scenarios, these schemes require the input source to have sufficiently high min-entropy so as to remain secure against brute-force attack even after this entropy loss.
On the other side of the spectrum are reusable fuzzy extractors that do not use secure sketches. Canetti et al. [17] proposed a reusable fuzzy extractor that uses a subset-based sampling technique instead of a secure sketch to handle a noisy data source. This scheme relies on a digital locker to generate helper data. However, even if the digital locker is instantiated simply with a hash function, the size of the helper data increases significantly (e.g., about 0.8 GB to achieve an error tolerance rate of 20%). Cheon et al. [18] modified this scheme [17] to reduce the size of the helper data, but in turn the computational cost of Rep increased significantly. Alamélou et al. [12] also improved Reference [17] and suggested a fuzzy extractor that achieves a linear fraction of errors. However, they also used a digital locker, and their scheme has the unrealistic requirement that each component of a fuzzy (biometric) source must have significant min-entropy. Apon et al. [16] modified a non-reusable fuzzy extractor [23] based on the LWE assumption into a reusable one. However, their scheme can tolerate only a logarithmic fraction of errors due to the time-consuming process of reproducing a key even with a small number of errors.
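To make the subset-based "sample-then-lock" idea concrete, the following toy Python sketch instantiates the digital locker with SHA-256, in the spirit of the hash-function instantiation mentioned above. All parameters (d = 200 subsets of ℓ = 8 positions, a 16-byte key) and function names are illustrative choices of ours, not those of Reference [17], and the sketch is for intuition only, not production use.

```python
import hashlib
import os
import random

# Toy "digital locker": lock(v, r) hides r unless the exact value v is
# supplied; 16 trailing zero bytes act as a success check on unlock.
def lock(value: bytes, secret: bytes):
    nonce = os.urandom(16)
    pad = hashlib.sha256(nonce + value).digest()
    return nonce, bytes(a ^ b for a, b in zip(pad, secret + bytes(16)))

def unlock(value: bytes, locker):
    nonce, ct = locker
    pad = hashlib.sha256(nonce + value).digest()
    pt = bytes(a ^ b for a, b in zip(pad, ct))
    return pt[:16] if pt[16:] == bytes(16) else None

# Gen: lock one key r under d random l-element subsets of the reading w.
def gen(w, d=200, l=8, seed=0):
    rng = random.Random(seed)
    r = os.urandom(16)
    helpers = [(idx, lock(bytes(w[i] for i in idx), r))
               for idx in (rng.sample(range(len(w)), l) for _ in range(d))]
    return r, helpers

# Rep: succeeds if any stored subset happens to avoid all errors in w2.
def rep(w2, helpers):
    for idx, locker in helpers:
        out = unlock(bytes(w2[i] for i in idx), locker)
        if out is not None:
            return out
    return None
```

With n = 512 bits and 10% of positions flipped, each subset survives with probability about 0.9^8 ≈ 0.43, so all 200 subsets fail only with probability about 0.57^200, which is negligible; this is the error-tolerance calculation formalized in Section "Setting the Number of Subsets".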
Fuzzy Signature. The concept of a fuzzy signature was proposed by Takahashi et al. [7]. Unlike the fuzzy extractor, the fuzzy signature does not need helper data, because it only requires a valid correlation between the two fuzzy (biometric) inputs used for the verification key and the signature, which can be achieved using a linear sketch. However, Reference [7] required the strong assumption that input fuzzy (biometric) data be uniformly distributed, which was later relaxed in Reference [24]. Afterwards, Yasuda et al. showed that the linear sketch of the fuzzy signatures [7,24] is vulnerable to recovery attacks [25]. In Reference [26], which merges References [7,24], Takahashi et al. proposed two instantiations of a fuzzy signature secure against recovery attacks. In 2019, Tian et al. first introduced the notion of reusability for a fuzzy signature [19]. To construct a reusable fuzzy signature, they adopted the reusable fuzzy extractor of Reference [16]. Consequently, the reusability is limited to the generation of verification keys, ignoring the privacy of signatures.
Fuzzy Vector Signature. Seo et al. first proposed the fuzzy vector signature [20], following the subset-based sampling method of Reference [17]. To be reusable, the fuzzy vector signature requires a signing parameter, analogous to the helper data of a fuzzy extractor, but the signing parameter is much smaller than the helper data in Reference [17]. In addition, a fuzzy vector signature can tolerate sub-linear errors, while the reusable fuzzy signature [19] can only tolerate logarithmic errors. However, the verification key in Reference [20] is still huge, which results in high computational costs for verification. In addition, the security models for reusability are as incomplete as those in Reference [19], and, as a result, no formal proof of reusability is provided.

Source Distributions
A source distribution of reusable biometric cryptosystems can be categorized into four types according to the correlation between repeated readings, as in Table 1.
"w i = w + δ i " implies that the hamming distance between w ∈ W and w + δ i is less than a threshold value t, where W is an input source. Since δ i is randomly chosen by an adversary, w + δ i may not be included in W, which is a little far from reality. Especially in Reference [16,19], additional assumptions were made that W should be a small error distribution χ of Learning with Errors (LWE) problem and both w and w + δ i should be in W to tolerate logarithmic errors.
"H ∞ [w i |w i − w j ] > m" implies that, for any two distinct readings w i and w j in W, H ∞ [w i |w i − w j ] > m holds where m is a minimum level of security. In other words, the difference between w i and w j (i.e., w i − w j ) should not leak too much information of w i even if w i − w j is correlated to w i , which is a strong requirement for the input source.
"(w, w i )" implies that any two distinct readings are arbitrarily correlated, which would be the most realistic assumption. However, as a trade-off, schemes based on this distribution require additional assumptions on the input source. For example, in Reference [17,18], any subset of W = (W[1], . . . , W[n]) should have high min-entropy even if indices are exposed, and, in Reference [12], each component W[j] should have a high min-entropy even if other components are exposed.
Unlike the above three types, the previous fuzzy vector signature [20] used the (T, k)-block source of Reference [21], although it follows the subset concept of Reference [17] in its construction. A (T, k)-block source means that, for input sources W_1, . . . , W_T, the i-th source has high min-entropy even when the previous i − 1 readings are fixed to w_1, . . . , w_{i−1}, where k is a minimum level of security. Namely, W_1, . . . , W_T are not correlated. However, since fuzzy (biometric) readings from the same source must be correlated, it is inappropriate to use the (T, k)-block source when considering reusability.

Contribution
In this paper, we propose a new fuzzy vector signature (FVS) scheme based on the subset concept of Reference [17] to deal with noise in fuzzy (biometric) data. As a result, our scheme is reusable and can tolerate sub-linear errors without any additional requirements, such as sufficiently high min-entropy of the input source. Compared to the previous fuzzy vector signature [20], we eliminate redundant parts of both the verification key and the signature by taking a new approach to the security proofs, which in turn improves the efficiency of the scheme. For instance, for 80-bit security with a 20% error tolerance rate, we reduce the size of the signing parameter by 33%, from 48 KB to 32 KB, the signature by 50%, from 32 KB to 16 KB, and the verification key by 22%, from 1.61 GB to 1.25 GB, where the length of the fuzzy (biometric) data is n = 512 bits. Additionally, we reduce the number of pairing operations in verification by up to 33%, from 18.5 × 10^6 to 12.3 × 10^6.
We also provide formal security models reflecting the reusability of the privacy of the verifier and the signer (i.e., VK-privacy and SIG-privacy, respectively) and the unforgeability of the signature (i.e., reusability). We prove these properties under the assumption that repeated readings of fuzzy (biometric) data are arbitrarily correlated, which is more realistic than the (T, k)-block source used in Reference [20]. In addition, we analyze the performance of our FVS scheme in terms of transmission and computational costs by comparing it with previous reusable biometric cryptosystems [17,18,20]. In particular, a signature can be generated in 155 ms on the signer's side, which is almost twice as fast as in Reference [20] (for more details, see Section 5).

Notation
Let λ be the security parameter and poly(λ) denote a polynomial in the variable λ. We use the acronym "PPT" for probabilistic polynomial-time. Let W be a fuzzy (biometric) space with metric space M, and let Z be an arbitrary alphabet set. We write w ← W for the process of sampling w according to a random variable W such that W ∈ W. Let w = (w[1], . . . , w[n]) be a fuzzy vector of length n. We denote string concatenation by the symbol "||", and U represents the uniform distribution.

Hamming Distance Metric
A metric space is a set M with a non-negative integer distance function dis : M × M → Z≥0. The elements of M are vectors in Z^n for some alphabet set Z. For any w, w′ ∈ M, the Hamming distance dis(w, w′) = |{i | w[i] ≠ w′[i]}| is defined as the number of components in which w and w′ differ.
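A minimal implementation of this metric (the function name is ours):

```python
def hamming_distance(w1, w2):
    """Number of positions in which two equal-length vectors differ."""
    assert len(w1) == len(w2)
    return sum(1 for a, b in zip(w1, w2) if a != b)
```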

Min-Entropy
Let X and Y be random variables. The min-entropy of X is defined as H∞(X) = −log(max_x Pr[X = x]). For a given source X = (X[1], . . . , X[n]), we say that X is a source with α-entropy ℓ-samples if H∞(X[i_1], . . . , X[i_ℓ] | i_1, . . . , i_ℓ) ≥ α for randomly chosen distinct indices i_1, . . . , i_ℓ ∈ [1, n], where α and ℓ are determined by the security parameter λ.
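As a quick illustration, the min-entropy of an explicit discrete distribution follows directly from the definition above:

```python
import math

def min_entropy(probs):
    """H_inf(X) = -log2(max_x Pr[X = x]) for an explicit distribution."""
    return -math.log2(max(probs))
```

The uniform distribution over 8 values has 3 bits of min-entropy, while a skewed distribution such as (1/2, 1/4, 1/4) has only 1 bit, reflecting the best single guess of an adversary.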

Statistical Distance
The statistical distance between random variables X and Y with the same domain is defined by SD(X, Y) = (1/2) Σ_z |Pr[X = z] − Pr[Y = z]|.
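The definition translates directly to code for explicit distributions (represented here, as an illustrative choice, by dictionaries mapping outcomes to probabilities):

```python
def statistical_distance(px, py):
    """SD(X, Y) = 1/2 * sum_z |Pr[X=z] - Pr[Y=z]| over the joint domain."""
    domain = set(px) | set(py)
    return 0.5 * sum(abs(px.get(z, 0.0) - py.get(z, 0.0)) for z in domain)
```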

Universal Hash Function
A collection H of hash functions H : X → Y is said to be ρ-universal if, for any two distinct inputs x ≠ x′, Pr_{H←H}[H(x) = H(x′)] ≤ ρ. In particular, we use the following lemma. Lemma 2 (Reference [27]). Let (X, Z) be any two jointly distributed random variables such that Z has at most 2^v possible values. Then, for any ε > 0, it holds that H∞(X | Z = z) ≥ H∞(X) − v − log(1/ε) with probability at least 1 − ε over the choice of z.
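The 1/p-universality used later for the family f_SP can be seen in a much simpler multiplicative family over Z_p (an analogy of ours, not the scheme itself): for h_a(x) = a·x mod p, two distinct inputs collide only when a = 0, so the collision probability over a random a is exactly 1/p.

```python
# 1/p-universal family h_a(x) = a*x mod p over Z_p: for distinct inputs
# x != x', a*x = a*x' (mod p) forces a = 0, since p is prime.
def h(a, x, p):
    return (a * x) % p

p = 101  # small prime, illustration only
collisions = sum(1 for a in range(p) if h(a, 5, p) == h(a, 9, p))
```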

Discrete Logarithm Assumption
Let G be a group of prime order q, and let g be a generator of G. For any PPT algorithm A, we define the advantage of A in solving the discrete logarithm (DL) problem as Adv_A^DL(λ) = Pr[A(g, g^a) = a : a ← Z_q]. We say that the DL assumption holds in G if, for all PPT algorithms A and any security parameter λ, Adv_A^DL(λ) < ν(λ) for some negligible function ν.
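For intuition, the DL problem is easy when q is small: the textbook baby-step giant-step algorithm recovers the exponent in O(√q) time and memory, which is why q must be exponentially large in λ. The sketch below (our illustration, not part of the scheme) works in a small multiplicative group; note that `pow(g, -m, p)` requires Python 3.8+.

```python
from math import isqrt

def bsgs(g, y, p, q):
    """Baby-step giant-step: find x < q with g^x = y (mod p) in O(sqrt(q))."""
    m = isqrt(q) + 1
    baby = {pow(g, j, p): j for j in range(m)}  # baby steps: g^j for j < m
    step = pow(g, -m, p)                        # giant step: g^{-m}
    gamma = y % p
    for i in range(m):
        if gamma in baby:                       # y * g^{-im} = g^j
            return i * m + baby[gamma]          # so x = i*m + j
        gamma = gamma * step % p
    return None
```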

Bilinear Maps
Let G_1, G_2, G_T be groups of prime order q. A bilinear map (or pairing) e : G_1 × G_2 → G_T over (G_1, G_2) is admissible if it satisfies the following properties: (1) bilinearity: e(g^a, h^b) = e(g, h)^{ab} for all g ∈ G_1, h ∈ G_2, and a, b ∈ Z_q; (2) non-degeneracy: e(g, h) ≠ 1 for generators g ∈ G_1 and h ∈ G_2; and (3) computability: e can be computed efficiently. Our fuzzy vector signature is constructed using a Type-3 pairing, where G_1 ≠ G_2 and there is no known efficiently computable isomorphism between G_1 and G_2.

External Diffie-Hellman (XDH) Assumption
Let G_1, G_2 be groups of prime order q, and let g and h be generators of G_1 and G_2, respectively. The XDH problem in G_1 is defined as follows: given D = (g, g^a, g^b, h) ∈ G_1^3 × G_2 and T ∈ G_1 with a Type-3 pairing, the goal of an adversary A is to distinguish whether T = g^{ab} or T is a random element R ∈ G_1. For any PPT algorithm A, the advantage in solving the XDH problem in G_1 is defined as Adv_A^XDH(λ) = |Pr[A(D, g^{ab}) = 1] − Pr[A(D, R) = 1]|. We say that the XDH assumption holds in G_1 if, for any PPT algorithm A, the advantage Adv_A^XDH(λ) is negligible in λ.

Syntax of Fuzzy Vector Signature
Let W be a fuzzy (biometric) space with the Hamming distance metric M, and let w be a sample of a random variable W ∈ W. A fuzzy vector signature (FVS) scheme [20] is defined by three algorithms (Setup, Sign, Verify) as follows: • Setup(1^λ, w, n, d, ℓ, t): The setup algorithm takes as input the security parameter 1^λ, a sample of fuzzy (biometric) data w ← W, the length n of w, the number d of subsets, the number ℓ of elements included in each subset, and the maximum number t of errors that can be tolerated. It generates a signing parameter SP and a verification key VK_w corresponding to w. Here, t/n is said to be the error tolerance rate. • Sign(SP, w′, m): The signature generation algorithm takes as input a signing parameter SP, a sample of fuzzy (biometric) data w′ ← W of length n, and a message m. It generates a signature σ_{w′} corresponding to w′.
• Verify(VK_w, σ_{w′}, m): The verification algorithm takes as input a verification key VK_w, a signature σ_{w′}, and a message m. If the signature σ_{w′} is valid under the condition that dis(w, w′) ≤ t, it outputs 1; otherwise, it outputs 0.
Correctness. Let δ be the probability that the Verify algorithm outputs 0, and let dis(w, w′) ≤ t for two samples of fuzzy (biometric) data w, w′ ∈ M. For the signing parameter SP and the verification key VK_w generated by Setup, it is required that Verify(VK_w, Sign(SP, w′, m), m) = 1 with probability at least 1 − δ.

Security Models
We consider three security notions for FVS: VK-privacy, SIG-privacy, and reusability.

VK-Privacy
VK-privacy prevents an adversary from obtaining any information about the fuzzy input data w from a verification key VK_w. In other words, the adversary cannot distinguish between a real VK_w and a random R. We say that an FVS scheme is VK-private if the advantage with which any PPT adversary A wins against the challenger C in the following game is negligible in λ: • Setup: A selects target correlated random variables W = (W_1, . . . , W_q) ∈ W^q and gives these to C.
• Challenge: For each j = 1, . . . , q, C samples w_j ← W_j, runs Setup(1^λ, w_j, n, d, ℓ, t) to obtain (VK_{w_j}, SP_j), and chooses a random R_j. C chooses one of two modes, real or random: in the real mode, C gives {(VK_{w_j}, SP_j)}_{j=1}^q to A, and in the random mode, C gives {(R_j, SP_j)}_{j=1}^q to A. • Guess: A outputs its guess of the chosen mode and wins the game if the guess is correct.

SIG-Privacy
SIG-privacy means that an adversary cannot ascertain any information about the fuzzy input data w′ from a signature σ_{w′}. The adversary cannot distinguish between a valid signature σ_{w*} and a signature corresponding to a uniformly random fuzzy input u, provided the corresponding verification key is withheld. We say that an FVS scheme is SIG-private if the advantage with which any PPT adversary A wins against the challenger C in the following game is negligible in λ: • Setup: A selects target correlated random variables W = (W*, W_1, . . . , W_q) ∈ W^{q+1} and gives them to C. C samples w* ← W* and w_j ← W_j for j = 1, . . . , q, runs Setup on each sample, and gives {(VK_{w_j}, SP_j)}_{j=1}^q and SP* to A. • Query: A issues a random variable W′ correlated with (W*, W_1, . . . , W_q), a message m_k, and an index j ∈ [1, q] of the signing parameter SP_j. C chooses w′ ← W′ correlated with (w*, w_1, . . . , w_q), runs Sign(SP_j, w′, m_k), and gives the resulting signature to A. • Challenge: A issues a message m*. C selects a bit b ∈ {0, 1} at random. If b = 0, C obtains σ* ← Sign(SP*, w*, m*) and sends σ* to A. Otherwise, C selects a random input u from the uniform distribution U, obtains σ* ← Sign(SP*, u, m*), and gives σ* to A. • Guess: A outputs its guess b′ ∈ {0, 1}. If b′ = b, A wins the game. Definition 2 (SIG-privacy). An FVS scheme Π_f is SIG-private if, for any PPT adversary A against SIG-privacy, there exists a negligible function ν(λ) such that Adv_A^SIG(λ) = |Pr[b′ = b] − 1/2| ≤ ν(λ).

Reusability
Reusability means that an adversary cannot generate a valid signature without knowing the target input source, even if the adversary is given verification keys and signing parameters correlated with the target input data. In addition, the adversary can obtain valid signatures that verify under the target verification key or other (correlated) verification keys via signing oracles. We say that an FVS scheme is reusable if the advantage with which any PPT adversary A wins against the challenger C in the following game is negligible in λ: • Setup: A selects correlated random variables (W_1, . . . , W_q) ∈ W^q and gives them to C. C samples w_j ← W_j and runs Setup(1^λ, w_j, n, d, ℓ, t) for j = 1, . . . , q. Then, C gives {(VK_{w_j}, SP_j)}_{j=1}^q to A. • Signing query: A issues a random variable W′ correlated with (W_1, . . . , W_q), a message m_k, and an index j ∈ [1, q] of the signing parameter SP_j. C chooses w′ ← W′ correlated with (w_1, . . . , w_q) and runs Sign(SP_j, w′, m_k). C sends the resulting signature to A. • Output: A outputs (m*, σ*) such that σ* was not obtained from a signing query on m*. If Verify(VK_{w_j}, σ*, m*) = 1 for some j ∈ {1, . . . , q}, A wins the game.

Definition 3 (Reusability).
An FVS scheme Π_f is reusable under chosen message attacks if, for any PPT adversary A making at most q_s signing queries, there exists a negligible function ν(λ) such that Adv_A^Reuse(λ) = Pr[A wins] ≤ ν(λ).

Construction
Let G = (p, g, h, G_1, G_2, G_T, e) be a bilinear group, where g ∈ G_1 and h ∈ G_2 are generators. Let W be a fuzzy (biometric) space with the Hamming distance metric, and let W ∈ W be a random variable that represents the user's source. Given W, we consider two fuzzy (biometric) samples w, w′ ← W. In this section, we present our FVS scheme, which consists of the following three algorithms: Setup, Sign, and Verify.
• Setup(1^λ, w, n, d, ℓ, t): For a security parameter λ, the setup algorithm generates a bilinear group G and picks a hash function H : {0, 1}* → Z_p. Here, n is the length of the fuzzy (biometric) data w, d is the number of subsets, ℓ is the number of elements included in each subset, and t is the maximum number of errors among the n elements, so that t/n is the error tolerance rate. The setup algorithm generates a signing parameter SP, and, given fuzzy (biometric) data w = (w[1], . . . , w[n]) ← W, it generates a verification key VK_w as follows: 3. Select random elements r_j ∈ Z_p for j ∈ [1, d].

Setting the Number of Subsets
Let δ be the probability that the Verify algorithm outputs 0, i.e., that it fails to verify a signature σ_{w′} using VK_w. Thus, if δ = 1/2 is set, a signer who produces a signature twice generates at least one valid signature with overwhelming probability. Following Reference [17], we show how the number d of subsets is determined by the value δ. Given (n, ℓ), assume that our FVS scheme should tolerate at most t errors among the n elements. During verification, the probability that e(σ_2, vk_{1,j}) = e(A, vk_{2,j}) holds is at least (1 − t/n)^ℓ for each j = 1, . . . , d. Thus, the probability that the Verify algorithm outputs 0, meaning that no subset vector v_j matches the vector derived from w′, is at most (1 − (1 − t/n)^ℓ)^d. If we set this failure probability to δ, the approximation e^x ≈ 1 + x gives (1 − t/n)^ℓ ≈ e^{−tℓ/n} and hence δ ≈ (1 − e^{−tℓ/n})^d ≈ e^{−d·e^{−tℓ/n}}. Consequently, we obtain the relation d ≈ e^{tℓ/n} · ln(1/δ), as required to determine the number d of subsets.
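The derivation above can be checked numerically. In the sketch below (names are ours), d_exact inverts the failure bound δ = (1 − (1 − t/n)^ℓ)^d directly, while d_approx is the closed-form approximation e^{tℓ/n}·ln(1/δ); the two diverge as t/n grows, since ln(1 − x) ≈ −x only holds for small x.

```python
import math

def num_subsets(n, l, t, delta):
    """d from the failure bound delta = (1 - (1 - t/n)^l)^d, plus the
    closed-form approximation d ~ e^(t*l/n) * ln(1/delta)."""
    p_match = (1.0 - t / n) ** l  # prob. a single subset avoids all errors
    d_exact = math.ceil(math.log(delta) / math.log(1.0 - p_match))
    d_approx = math.exp(t * l / n) * math.log(1.0 / delta)
    return d_exact, d_approx
```

For a 20% error rate with subsets of around 80 elements, both formulas put d in the millions, which is why the verification key in this line of work is so large.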

Security
Theorem 1. If W is a family of sources over Z^n with α-entropy ℓ-samples, then the FVS scheme is VK-private for such a W.
Proof. Before proving the VK-privacy, we first show that the family H of functions {f_SP : Z^ℓ → G_2} generating the partial verification keys is a 1/p-universal hash family for a fixed SP. In our FVS scheme, the function f_SP takes as input a vector v_j = {w[i] | i ∈ I_j} and is defined using a randomly chosen exponent r_j ∈ Z_p. Since a fresh r_j is raised for each input, it is easy to see that two outputs of the function are equal with probability at most 1/p; that is, Pr[f_SP(v) = f_SP(v′)] ≤ 1/p for two distinct inputs v ≠ v′. Next, we prove that a verification key VK_w corresponding to w ∈ W is statistically close to the uniform distribution, based on Lemma 3. After that, we extend the result to the case of a polynomial number of verification keys. Let W = (W[1], . . . , W[n]) be a random variable of a source with α-entropy ℓ-samples. If V_j = {W[i] | i ∈ I_j}, we see that (V_1, . . . , V_d) is a joint distribution of d subsets of W ∈ W such that H∞(V_j | I_j) ≥ α for random sets of indices (I_1, . . . , I_d). In addition, as shown above, each function f_SP is a 1/p-universal hash function with respect to each distinct j ∈ [1, d].
In reality, if verification keys and signatures are revealed to an adversary, we must also prove that it is infeasible for the adversary to glean any information about the fuzzy (biometric) data from the signatures. As shown above, Theorem 1 establishes that VK-privacy holds, meaning that it is difficult for the adversary to obtain information about the fuzzy data from the verification keys. In addition, the setup algorithm chooses a new signing parameter SP each time a verification key is generated for each fuzzy data sample. Thus, it is sufficient to show that a signature generated under a fixed SP does not reveal any information about the challenge fuzzy data.
To do this, a simulator chooses a challenge sample w* along with other correlated samples. For the length n of w*, we create the following sequence of games in which w* is used for generating a challenge signature: in Game 0, the challenge signature is generated from the original w* as in the real game, whereas in Game n the challenge signature is generated from a random vector and thus carries no information about the original w*. For proving SIG-privacy, it is sufficient to show that it is infeasible for the adversary to distinguish between Game (α − 1) and Game α under the XDH assumption.

Lemma 4.
Under the XDH assumption in G 1 , it is infeasible to distinguish between Game (α-1) and Game α.
Proof. Given an XDH instance (g, g^a, g^b, h) ∈ G_1^3 × G_2 together with T ∈ G_1, a challenger C interacts with an adversary A who tries to break the SIG-privacy of our FVS scheme.
• Signing queries. A issues a random variable W′ correlated with (W*, W_1, . . . , W_q), an index j ∈ [1, q] of the signing parameter SP_j, and a message m. Then, C chooses a sample w′ = (w′[1], . . . , w′[n]) ← W′ correlated with (w*, w_1, . . . , w_q) and performs the ordinary signature generation algorithm on input (SP_j, w′, m).
• Challenge. A sends a message m* to C. C generates a challenge signature in which the XDH instance element T is embedded into the component σ*_{(1,α)}; in particular, a non-interactive zero-knowledge (NIZK) simulator is used to generate the two elements (σ*_4, σ*_5) without knowing the witness. A outputs a guess b′ ∈ {0, 1} in response to the challenge signature. If T = g^{ab}, the challenge signature is distributed as in Game (α − 1). Otherwise, σ*_{(1,α)} is a random element, in which case A is in Game α. Therefore, depending on the ability of A to distinguish the two games, C is able to solve the given XDH problem.

Theorem 3.
If the FVS scheme is VK-private and SIG-private and the DL assumption holds in G_1, then the FVS scheme is reusable in the random oracle model.
The proof of Theorem 3 is almost the same as the proof of unforgeability in Reference [20]; the difference is that the adversary in our reusability proof is given verification keys and signatures that correspond to correlated fuzzy (biometric) data. In other words, even if such correlated fuzzy data is reused, our proof shows that it is difficult for the adversary to generate a valid signature with the (unknown) target fuzzy data. To prove reusability, we need VK-privacy and SIG-privacy to guarantee that the verification keys and signatures exposed to the adversary do not reveal any information about the fuzzy data. At this point, there are two strategies we anticipate the adversary might take in its forgery. The first is to guess w′ from a certain distribution W′ and then generate a signature on the input (SP, w′, m). However, as long as W′ is assumed to be a distribution with α-entropy ℓ-samples and α is sufficiently large with respect to the security parameter, this strategy succeeds only with negligible probability.
The other strategy is to reuse a previous signature without changing the fuzzy data w′ embedded in it. More specifically, there are two prongs to this strategy. One is to re-randomize the previous signature by raising a new random exponent s′ into the elements {σ_{1,i}}_{i=1}^n, σ_2, and σ_3. The discrete logarithm of these elements then becomes s · s′, where s was chosen by the signer and s′ is selected by the adversary. The important point is that the adversary still cannot know the exact discrete logarithm s · s′, which is the witness necessary for generating the remaining signature elements (σ_4, σ_5) as an NIZK proof with respect to s · s′. However, generating such an NIZK proof is equivalent to breaking the statistical soundness of the proof of equality of discrete logarithms [28] with respect to the unknown witness. Thus, the probability that the adversary succeeds is at most q_h/p, where q_h is the number of H-oracle queries, which is also negligible. The remaining case is to reuse the previous signature as it is and simply reconstruct a new proof (σ_4, σ_5). Fortunately, this case can be reduced to the forgery of a one-time multi-user Schnorr signature [29], which is provably secure under the DL assumption. In our proof, a slight variant of the one-time multi-user Schnorr signature proves the equality of discrete logarithms rather than proving knowledge of one. This variant is also provably unforgeable [30,31] against chosen-message attacks in a multi-user setting (MU-SUF-CMA). Based on this variant, we show that, under the DL assumption, it is difficult for the adversary to succeed in the remaining case.
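The proof of equality of discrete logarithms referenced above is, in essence, a Chaum-Pedersen protocol made non-interactive via Fiat-Shamir. The toy Python sketch below (our illustration, over a small Schnorr group rather than the paper's pairing groups; all sizes and names are for demonstration only) proves that log_g(u) = log_h(v) without revealing the witness s.

```python
import hashlib
import random

# Toy Schnorr group: p = 2q + 1 with p, q prime; g = 2^2 and h = 3^2 are
# quadratic residues, hence generators of the order-q subgroup.
p, q = 2039, 1019
g, h = 4, 9

def challenge(u, v, a, b):
    """Fiat-Shamir challenge binding the statement (u, v) and commitments."""
    data = f"{g},{h},{u},{v},{a},{b}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove_eq(s, u, v):
    """Prove log_g(u) = log_h(v) = s without revealing s."""
    k = random.randrange(1, q)
    a, b = pow(g, k, p), pow(h, k, p)   # commitments with the same nonce k
    c = challenge(u, v, a, b)
    return a, b, (k + c * s) % q        # response z = k + c*s mod q

def verify_eq(u, v, proof):
    a, b, z = proof
    c = challenge(u, v, a, b)
    return (pow(g, z, p) == a * pow(u, c, p) % p and
            pow(h, z, p) == b * pow(v, c, p) % p)
```

Verification works because g^z = g^{k+cs} = a·u^c (and likewise in base h); soundness rests on the same equality holding with both bases under a single challenge c.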

Proof.
A simulator B uses an adversary A (which breaks the reusability of the FVS scheme) as a subroutine to forge a signature in the one-time multi-user Schnorr signature scheme. Let q_s ≤ ρ, where q_s is the number of signing queries. Given ρ independent public keys g, g_1, {(g^{s_1}, g_1^{s_1}), ..., (g^{s_ρ}, g_1^{s_ρ})} of the one-time multi-user Schnorr signature scheme, B interacts with A as follows:
• Setup. A gives correlated random variables (W_1, ..., W_q) ∈ W^q, each of which is over Z^n. B samples (w_1, ..., w_q) ← (W_1, ..., W_q) and runs Setup(1^λ, n, d, ℓ, w_j, t) to obtain a signing parameter SP_j and a verification key VK_{w_j} for j = 1, ..., q. In the setup algorithm, (x_{j1}, ..., x_{jn}) and (y_{j1}, ..., y_{jn}) are selected uniformly at random in Z_p. B gives {(SP_j, VK_{w_j})}_{j=1}^{q} to A.
• Signing queries. For j ∈ [1, q], A issues signing queries with input a random variable W′ correlated with (W_1, ..., W_q), an index of signing parameter SP_j, and a message m_k. B responds to the query as follows:
– Choose a sample w′ = (w′[1], ..., w′[n]) correlated with (w_1, ..., w_q) from W′.
– Generate σ_{1,i} = (g^{s_k})^{x_{ji} + y_{ji}·w′[i]} for i ∈ [1, n], and set M_k = (g^{s_k}, g_1^{s_k}, {σ_{1,i}}_{i∈[1,n]}, m_k).
– Query (j, M_k) to the signing oracle of the one-time multi-user Schnorr signature scheme (i.e., a signing query on the message M_k under the j-th public key), and receive (h_k, c_k).
– Set σ_k = ({σ_{1,i}}_{i∈[1,n]}, g^{s_k}, g_1^{s_k}, h_k, c_k) and give σ_k to A.
• Output. A outputs (m*, σ*) = (m*, {σ*_{1,i}}_{i∈[1,n]}, σ*_2, σ*_3, σ*_4, σ*_5). B checks that (m*, σ*) ≠ (m_k, σ_k) for all k ∈ [1, q_s] and finds k* such that σ*_2 = g^{s_{k*}} and σ*_3 = g_1^{s_{k*}}. After finding the index j corresponding to the k*-th query, B checks whether Verify(VK_{w_j}, σ*, m*) outputs 1. If it does, B outputs the message (σ*_2, σ*_3, {σ*_{1,i}}_{i∈[1,n]}, m*) together with (σ*_4, σ*_5) as a forgery of the one-time multi-user Schnorr signature scheme. It follows that, as long as A breaks the reusability of the FVS scheme, B can break the MU-SUF-CMA security of the Schnorr signature scheme. Indeed, if Verify(VK_{w_j}, σ*, m*) = 1, then (σ*_4, σ*_5) is a valid signature on the message (σ*_2, σ*_3, {σ*_{1,i}}_{i∈[1,n]}, m*) under the k*-th public key (σ*_2, σ*_3). Since the variant of the one-time multi-user Schnorr signature scheme is provably MU-SUF-CMA secure under the DL assumption in the random oracle model [29], the reusability of our FVS scheme is likewise proven in the random oracle model under the DL assumption.

Performance Analysis
Our FVS scheme is constructed using the subset-based sampling method of the reusable fuzzy extractor [17], which does not require the fuzzy (biometric) input data to have sufficiently high min-entropy and can still tolerate a sub-linear fraction of errors. Generally, biometric data sources are non-uniformly distributed and unique to each person; thus, such biometric data cannot be expected to have high min-entropy. Nevertheless, most reusable fuzzy extractors [8,13-15] based on secure sketches require source data with high min-entropy because a secure sketch is known to cause entropy loss in the biometric input. For comparison, we focus on fuzzy extractors [17,18] that do not rely on secure sketches and therefore do not require a high min-entropy source. The reusable fuzzy extractor suggested by Reference [12] is not based on a secure sketch and tolerates a linear fraction of errors, but it uses a so-called pseudoentropic isometry that requires each component of the biometric input to have high min-entropy. This requirement is also far from realistic biometric information. The reusable fuzzy extractor suggested by Apon et al. [16] is constructed based on the hardness of Learning with Errors (LWE) problems, where the biometric data is injected into the LWE error part. In that case, the biometric data must follow a certain error distribution (e.g., Gaussian) to ensure the security of the LWE problem, which limits its real-world applicability. Furthermore, the LWE-based fuzzy extractor [16] requires a time-consuming reproducing algorithm: another sample of the biometric data is subtracted from the LWE instance previously created for each component, and a randomly chosen linear system is then solved. If any error component in the chosen linear system is nonzero, a new linear system must be randomly reselected until the procedure succeeds.
The same problem is found in the reusable fuzzy signature [19] that follows the LWE-based fuzzy extractor technique. To mitigate the reproducing problem, the schemes of References [16,19] must be limited to handling only a logarithmic number of errors.
We therefore compare our FVS scheme with the previous fuzzy extractors [17,18] and the original FVS scheme [20], all of which follow the subset-based sampling technique [17]. We specifically consider authentication protocols where a fuzzy extractor or an FVS scheme is instantiated to authenticate a user using biometric data. During protocol execution, we compare our proposed scheme with the existing schemes in terms of the storage or transmission costs and the computational costs on the part of the user. For the fuzzy extractor, we assume that a digital signature scheme S = (KeyGen, Sign, Verify) is additionally provided. As usual, an authentication protocol consists of two phases: enrollment and authentication. In the enrollment phase, a user is registered with an authentication server by sending an identity ID, a verification key vk_ID, and helper data P_ID. In each authentication phase, the user receives the helper data P_ID from the server, recovers a secret key for signature generation using their biometric data, and returns a signature σ_ID in response to a challenge message R sent by the server. Figures 1 and 2 present the two authentication protocols in more detail.
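The enrollment and challenge-response flow described above can be sketched as follows. This is only an illustration of the message pattern: an HMAC stands in for the actual signature or FVS algorithms, and all identifiers (enroll, begin_auth, finish_auth) are hypothetical names of ours.

```python
import hashlib, hmac, os

class Server:
    """Toy authentication server following the two-phase protocol above."""
    def __init__(self):
        self.db = {}                                  # ID -> (vk_ID, P_ID)

    def enroll(self, user_id, vk, helper):            # enrollment phase
        self.db[user_id] = (vk, helper)

    def begin_auth(self, user_id):                    # send P_ID and challenge R
        _, helper = self.db[user_id]
        return helper, os.urandom(16)

    def finish_auth(self, user_id, challenge, sig):   # verify sigma_ID
        vk, _ = self.db[user_id]
        expected = hmac.new(vk, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, sig)

# User side: in the real protocol the signing key would be rederived from
# noisy biometric data and P_ID; a fixed key stands in for that step here.
key = os.urandom(32)
server = Server()
server.enroll("alice", vk=key, helper=b"P_ID")

helper, R = server.begin_auth("alice")                 # server -> user
sigma = hmac.new(key, R, hashlib.sha256).digest()      # user signs challenge R
assert server.finish_auth("alice", R, sigma)           # server verifies
```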

Storage or Transmission Costs
In the authentication phase, helper data P_ID is needed to generate a signing key from the user's biometric data. There are two ways for the user to obtain P_ID. The first is to store it on a personal device; the other is to receive it from the server at each authentication. The first method reduces the amount of transmission at the cost of carrying a personal storage device. Conversely, the second requires no personal device but incurs a large transmission from the server; it has the advantage that user authentication can work on secure devices shared by multiple users. For comparison purposes, we present the size of P_ID as the storage or transmission cost in Table 2. Let n be the dimension of the biometric data, d the number of subsets, ℓ the number of elements in each subset, and t the maximum number of errors among the n elements. With the fuzzy extractors [17,18], the helper data P_ID consists of an information set, a nonce, and an output of a hash function per subset. The number d of subsets is obtained via d ≈ ln(1/δ) · e^{ℓt/n} in Reference [17], whereas d is computed as d ≈ m/2q in Reference [18] for the (τ, m)-threshold scheme [32]. The information set per subset contains ℓ indices, each represented by log n bits. When using SHA-256 as the hash function, a nonce of 176 (= 256 − 80) bits suffices. As a result, the whole size of P_ID is about d · (ℓ log n + 256 + 176) bits for the n = 512, 1024, and 2048 cases, which, as shown in Table 2, becomes huge when setting t/n = 0.20 as the error tolerance rate and ℓ = 80. On the other hand, with the FVS schemes, the number of subsets is likewise determined by d ≈ ln(1/δ) · e^{ℓt/n} following [17], but the helper data P_ID consists of only a signing parameter SP, regardless of the number of subsets. Indeed, as shown in Figure 3, SP in Reference [20] consists of 3n + 1 elements in G_1, whereas SP in our scheme consists of 2n + 2 elements in G_1.
When taking a Type-3 pairing [33] at the 100-bit security level, the size of an element in G_1 or Z_p is 256 bits. For the n = 512, 1024, and 2048 cases, Table 2 shows that the P_ID size of the FVS schemes is overwhelmingly smaller than that of the fuzzy extractors. Compared to Reference [20], our FVS scheme obtains a slightly smaller SP under the same parametrization. Regarding the signature size, σ_ID in our scheme consists of n + 2 elements in G_1 plus 2 elements in Z_p, which are transmitted to the server. Taking the Type-3 pairing and n = 512 again, the amount of σ_ID transmission is about (512 + 4) × 256/2^13 ≈ 16 KB. Since δ = 1/2 is the probability that verification fails, we expect the user to run step [A2] of the authentication protocols twice, so that the transmission cost of σ_ID becomes about 32 KB.
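The storage and transmission figures quoted above can be reproduced with a short calculation; the parameter values (n = 512, ℓ = 80, t/n = 0.20, δ = 1/2, 256-bit G_1 elements) are taken directly from the text.

```python
import math

# Parameters from the text: n = 512, ell = 80, t/n = 0.20, delta = 1/2.
n, ell, err_rate, delta = 512, 80, 0.20, 0.5
G1_BITS = 256                      # Type-3 pairing, ~100-bit security [33]

# Number of subsets for the subset-based sampling method [17].
d = math.log(1 / delta) * math.exp(ell * err_rate)           # ~ 61.6e5

# Fuzzy-extractor helper data: per subset, ell indices of log2(n) bits,
# a 256-bit hash output, and a 176-bit nonce.
p_id_fe_mb = d * (ell * math.log2(n) + 256 + 176) / 8 / 2**20

# FVS helper data: only the signing parameter SP.
sp_old_kb = (3 * n + 1) * G1_BITS / 2**13    # original FVS [20]
sp_new_kb = (2 * n + 2) * G1_BITS / 2**13    # our scheme

# Signature: (n + 2) G1 elements plus 2 Zp elements, 256 bits each.
sigma_kb = (n + 4) * G1_BITS / 2**13

print(f"d ~ {d:.3e}, P_ID(FE) ~ {p_id_fe_mb:.0f} MB, "
      f"SP: {sp_old_kb:.1f} KB -> {sp_new_kb:.1f} KB, sigma ~ {sigma_kb:.1f} KB")
```

The computed d matches the 61.6 × 10^5 used later in the computation-cost analysis, and σ_ID comes out to about 16.1 KB, doubled to roughly 32 KB when step [A2] runs twice.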

Computation Cost
For the fuzzy extractor, we considered the computational cost of obtaining a signing key K_ID by running the FE.Rep algorithm, which takes as input the helper data P_ID and a biometric reading bio′. This is step [A2] in Figure 1. We assume that the value reproduced by FE.Rep is used directly as the signing key corresponding to the verification key vk_ID. If a hash function H, such as SHA-256, is used as a digital locker [17], K_ID is locked as (nonce_i, H(nonce_i, {bio}_i) ⊕ (K_ID || 0^s)) for a positive integer s, where {bio}_i is the set of biometric components corresponding to subset i among the n components. Therefore, with a new reading bio′, the unlocking algorithm performs one hash computation |H| plus one XOR operation |X| per subset i until K_ID is obtained. Consequently, the FE.Rep algorithm in Reference [17] must perform d · (|H| + |X|) operations in the worst case. Since Reference [18] also requires solving a (τ, m)-secret sharing scheme, its FE.Rep algorithm chooses a set of τ shares among m unlocked values and then solves the secret-sharing scheme, incurring an additional (d/m) · C(m, τ) · τm(m − 1) |X| operations. In contrast, the FVS scheme needs to run the FVS.Sign algorithm to generate a signature σ_ID on input (bio′, SP, R). This is step [A2] in Figure 2. In our FVS scheme, the FVS.Sign algorithm needs (n + 4) exponentiations in G_1 for the dimension n of the biometric data. Compared to Reference [20], the signing cost is reduced by about half, as shown in Table 3. Table 3. Comparison of computational costs necessary for signature generation.
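The digital-locker construction described above (one hash plus one XOR per subset) can be sketched as follows. The helper names are ours, and the number of zero check bytes is an illustrative choice standing in for 0^s.

```python
import hashlib, os

S = 16  # number of zero check bytes, standing in for the 0^s padding

def _pad(seed: bytes, length: int) -> bytes:
    # Expand a SHA-256 output to the needed length (counter mode).
    out = b""
    for ctr in range((length + 31) // 32):
        out += hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
    return out[:length]

def lock(subset_bits: bytes, K: bytes):
    """Lock K under a biometric subset: (nonce, H(nonce, bits) XOR (K || 0^s))."""
    nonce = os.urandom(22)                 # 176-bit nonce, as in the text
    seed = hashlib.sha256(nonce + subset_bits).digest()
    pad = _pad(seed, len(K) + S)
    ct = bytes(a ^ b for a, b in zip(pad, K + b"\x00" * S))
    return nonce, ct

def unlock(subset_bits: bytes, nonce: bytes, ct: bytes):
    """Return K if subset_bits matches; None when the check bytes are nonzero."""
    seed = hashlib.sha256(nonce + subset_bits).digest()
    pt = bytes(a ^ b for a, b in zip(_pad(seed, len(ct)), ct))
    return pt[:-S] if pt[-S:] == b"\x00" * S else None
```

Unlocking with the wrong subset yields random-looking plaintext, so its trailing check bytes are nonzero except with negligible probability; FE.Rep simply tries subsets until one unlocks.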
To measure the actual amount of computation, we substituted the numbers from Table 2 directly. For instance, let n = 512 and ℓ = 80. The fuzzy extractor by Canetti et al. [17] must complete (61.6 × 10^5)(|H| + |X|) operations, and, when setting (τ, m) = (5, 32), the fuzzy extractor by Cheon et al. [18] needs approximately (53.6 × 10^3)|H| + (16.7 × 10^11)|X| operations, which is still a burdensome amount of computation for a personal device. In comparison, assuming 0.3 ms per exponentiation in G_1 [34], the FVS.Sign algorithm in our scheme takes about (512 + 4) · 0.3 ≈ 155 ms. Since δ = 1/2, the user is expected to run step [A2] of the authentication protocols twice, so the FVS.Sign computation performed by the user takes about 310 ms.
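These operation counts can be checked numerically. The value of d for Reference [18] is taken as the 53.6 × 10^3 hash-operation count quoted above (we treat it as a given, since its derivation is not reproduced here).

```python
import math

n, ell, err_rate, delta = 512, 80, 0.20, 0.5

# [17]: one hash and one XOR per subset, with d = ln(1/delta) * e^(ell*t/n).
d17 = math.log(1 / delta) * math.exp(ell * err_rate)
assert abs(d17 - 61.6e5) / 61.6e5 < 0.01          # ~ 61.6e5 of |H| and |X| each

# [18]: (tau, m) = (5, 32); extra XORs: (d/m) * C(m, tau) * tau*m*(m-1).
tau, m, d18 = 5, 32, 53.6e3
xor18 = (d18 / m) * math.comb(m, tau) * tau * m * (m - 1)
assert abs(xor18 - 16.7e11) / 16.7e11 < 0.01      # ~ 16.7e11 XOR operations

# Our FVS.Sign: (n + 4) exponentiations in G1 at ~0.3 ms each [34];
# run twice on average since delta = 1/2.
sign_ms = (n + 4) * 0.3
assert round(sign_ms) == 155 and round(2 * sign_ms) == 310
```

Both formulas reproduce the quoted figures to within one percent, confirming that the dominant cost for the fuzzy extractors grows with the number of subsets, while FVS.Sign depends only on the dimension n.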

Implementation
We implemented our fuzzy vector signature as a C program to measure actual time consumption. All experiments were performed on an Intel Core i7-8700K with 8 GB RAM running Ubuntu 18.04 LTS, and GNU GCC version 7.5.0 was used for compilation. We selected the BLS12-381 curve, which offers around a 128-bit security level, and SHA-256 as the hash function. On the BLS12-381 curve, the sizes of an element of G_1 and G_2 are 192 bits and 384 bits, respectively. Our implementation code is available at https://github.com/Ilhwan123/FVS.
For error rates of 5% and 10%, we measured the time required to run our scheme several times. Table 4 presents the parameter settings, storage sizes, and time required for each algorithm, as measured in our implementation. Compared with Table 2, the size of a G_1 group element is 192 bits rather than 256 bits, so the signing parameter is smaller. However, this depends on the curve type, so it can differ depending on which curve is selected.
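Under the assumption that SP still consists of 2n + 2 elements of G_1, the smaller BLS12-381 element size shrinks the signing parameter proportionally:

```python
n = 512
sp_bls_kb = (2 * n + 2) * 192 / 2**13    # BLS12-381: 192-bit G1 elements
sp_t2_kb  = (2 * n + 2) * 256 / 2**13    # 256-bit G1 elements of Table 2
print(sp_bls_kb, sp_t2_kb)               # ratio 192/256 = 3/4 of the Table 2 size
```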
Note that, with the parameter setting in Table 4, if signature verification fails, the total verification time takes about 56 s.
• Signing Parameter, Verification Key, and Signature denote their respective sizes for each fuzzy (biometric) data length n.
• Setup, Sign, and Verify denote the average time required for each algorithm.
• Error rate of bio′ denotes the percentage of components that differ between the Setup input bio and the Sign input bio′.

Conclusions
In this paper, we presented an FVS scheme that improves upon the original in every aspect of efficiency and security by applying the subset-based sampling method [17] more strictly. Compared to the original FVS scheme [20], we reduced the sizes of the signing parameter and the verification key to approximately two-thirds of their original sizes and cut the signature size by about half. In addition, we reduced the number of pairings necessary for signature verification to about two-thirds of the original number.
We proved that our FVS scheme is VK-private and SIG-private, meaning that verification keys and signatures generated using a user's correlated fuzzy (biometric) data reveal no information about the fuzzy input data. Additionally, instead of the unforgeability notion of Reference [20], we defined the reusability property, which guarantees that a user can reuse their correlated fuzzy (biometric) data to generate polynomially many verification keys while still making it infeasible for an adversary to forge a signature without the fuzzy (biometric) data. Under this reusability notion, we proved that our FVS scheme is reusable, assuming that it is {VK, SIG}-private and that the DL assumption holds.
In the remote authentication protocol based on our FVS scheme, a user must receive the signing parameter and transmit a signature in response to a random challenge message. The primary advantage of FVS-based (biometric) authentication is that the transmission cost, including the signing parameter and the signature, is determined only by the dimension of the fuzzy (biometric) data, not by the total number of subsets. Thus, unlike the authentication protocol built on a fuzzy extractor, the transmission cost between the user and the authentication server is remarkably smaller. However, the disadvantage of our FVS-based authentication is that, in the worst case, the server must perform a number of pairing operations proportional to the total number of subsets. This burden may be somewhat alleviated by exploiting the server's parallel computing power, but it would be more desirable to build a new FVS scheme that supports efficient batch verification in the future.

Conflicts of Interest:
The authors declare no conflict of interest.
For i = 1, Lemma 2 shows that, with probability at least 1 − ε over the sample of (y_2, ..., y_q) ← (H_2(X_2), ..., H_q(X_q)), the conditional min-entropy H̃_∞[X_1 | H_2, ..., H_q, {H_i(X_i) = y_i}_{i=2}^{q}] remains sufficiently high. In this case, the leftover hash lemma [35] implies that the two distributions D and D_1 are 2ε-close. Next, assuming that the above lemma holds for i − 1 < q, we show that it also holds for i. Lemma 2 shows that, with probability at least 1 − ε over the sample of (y_{i+1}, ..., y_q) ← (H_{i+1}(X_{i+1}), ..., H_q(X_q)), the corresponding conditional min-entropy remains sufficiently high. Similarly, the leftover hash lemma [35] shows that the two distributions D_{i−1} and D_i are 2ε-close. It follows that ∆(D, D_i) ≤ ∆(D, D_{i−1}) + ∆(D_{i−1}, D_i) ≤ 2ε(i − 1) + 2ε = 2εi, which concludes the proof of Lemma 3.