Biometric Identification Systems With Noisy Enrollment for Gaussian Source

In this paper, we investigate the fundamental trade-off among identification, secrecy, storage, and privacy-leakage rates in biometric identification systems for hidden or remote Gaussian sources. We introduce a technique for deriving the capacity region of these rates by converting the system into one where the data flows in a single direction. We also provide numerical calculations for three different examples of the generated-secret model. The numerical results suggest that it is hard to achieve both a high secrecy rate and a small privacy-leakage rate simultaneously. In addition, as special cases, our characterization recovers several known results from previous studies.


I. INTRODUCTION
The identification capacity of biometric identification systems (BIS) was clarified in [1] for both discrete memoryless and Gaussian sources. For a discrete memoryless source (DMS), the fundamental performance of the BIS has been extensively analyzed in [2], [3] for the visible source model (VSM) and in [4], [5] for the remote source model (RSM). However, studies under Gaussian settings remain scarce. For example, the optimal trade-off between secrecy and privacy-leakage was clarified in [6], and hierarchical identification was considered in [7] to reduce the search complexity. A common assumption in [6], [7] is the VSM.
In this study, we extend the BIS with the RSM in [5] to Gaussian sources. This is motivated by the fact that biometric data (bio-data) are basically represented by vectors with continuous elements in real applications, and most communication links can be modeled as Gaussian channels. Moreover, when the model is switched from the VSM to the RSM, the evaluation becomes more challenging [4], [5], and many existing techniques for deriving the results of the VSM are not directly applicable. Thus, the extension is of both theoretical and practical interest. Our goal is to find the optimal trade-off of identification and secrecy rates in the BIS under privacy and storage constraints. We demonstrate an idea of converting the system into another one in which the data of each user flows in the same direction, which enables us to characterize the capacity region. More specifically, in establishing the outer bound of the region, the converted system allows us to apply the well-known entropy power inequality (EPI) [8] twice in two opposite directions, and its properties also facilitate the derivation of the inner bound. In [4], Mrs. Gerber's lemma was likewise applied twice to simplify the rate region of the RSM for binary sources without converting the BIS. That was possible because the source is uniform and the backward channel of the enrollment channel is again a binary symmetric channel with the same crossover probability. However, this is no longer true in the Gaussian case, so it is necessary to characterize the general behavior of the backward channel. We also provide numerical calculations of three different examples. From them, we may conclude that it is difficult to achieve high secrecy and small privacy-leakage rates at the same time: to attain a small privacy-leakage rate, the secrecy rate must be sacrificed to some extent.
Furthermore, as a by-product of our result, the capacity regions of the BIS analyzed in [4] (the BIS with a single user) are obtained, and as special cases, it can be checked that our characterization reduces to the results given in [1], [6].

A. Notation and System Model
Upper-case A and lower-case a ∈ A denote a random variable (RV) and its realization, respectively. A^n = (A_1, ..., A_n) represents a string of RVs, and subscripts denote the position of an RV in the string. f_A denotes the probability density function (pdf) of RV A. For integers k and t such that k < t, [k : t] denotes the set {k, k+1, ..., t}. log x stands for the natural logarithm of x > 0. A_ε^(n)(·) denotes the weakly ε-typical set [9], and B_ε^(n)(·) is a modified ε-typical set, defined as follows.
where ε is a small enough positive number, and X^n is drawn i.i.d. from the transition probability ∏_{k=1}^n f_{X|Y}(x_k|y_k). In addition, define B_ε^(n)(U|y^n) = {u^n : (u^n, y^n) ∈ B_ε^(n)(YU)} for all y^n, and let B_ε^(n)(U|y^n)^c denote the complement of B_ε^(n)(U|y^n).

The generated-secret BIS model and the chosen-secret BIS model considered in this study are depicted in Fig. 1. Arrows (g) and (c) indicate the directions of the secret key in the generated- and chosen-secret models, respectively. Let I = [1 : M_I], S = [1 : M_S], and J = [1 : M_J] be the sets of user indexes, secret keys, and helper data, respectively. These sets are assumed to be finite. X^n_i, Y^n_i, and Z^n denote the bio-data sequence of user i generated from source P_X, the output of X^n_i via the enrollment channel P_{Y|X}, and the output of X^n_i via the identification channel P_{Z|X}, respectively. For i ∈ I and k ∈ [1 : n], we assume X_{ik} ∼ N(0, 1). Note that an RV with unit variance can be obtained by applying a scaling technique. P_{Y|X} and P_{Z|X} are modeled as

  Y = ρ_1 X + N_1,  Z = ρ_2 X + N_2,  (2)

where |ρ_1| < 1, |ρ_2| < 1 are Pearson's correlation coefficients, and N_1 ∼ N(0, 1 − ρ_1^2) and N_2 ∼ N(0, 1 − ρ_2^2) are Gaussian RVs, independent of each other and of the bio-data sequences. From (2), Y and Z are Gaussian with zero mean and unit variance, and the Markov chain Y − X − Z holds. Then, the pdf corresponding to the tuple (X^n_i, Y^n_i, Z^n) is given by f(x^n, y^n, z^n) = ∏_{k=1}^n f_{XYZ}(x_k, y_k, z_k), where for x, y, z ∈ R, f_{XYZ}(x, y, z) = f_X(x) f_{Y|X}(y|x) f_{Z|X}(z|x).
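As a quick sanity check of the channel model (2), the following Python sketch (ours, not from the paper) samples the triple (X, Y, Z) and confirms the stated second-order statistics: unit-variance marginals with correlations ρ_1, ρ_2, and, by the Markov chain Y − X − Z, correlation ρ_1ρ_2 between Y and Z.

```python
import math
import random

def simulate_source(rho1, rho2, n, seed=0):
    """Draw n i.i.d. triples (X, Y, Z) from the forward model (2):
    Y = rho1*X + N1, Z = rho2*X + N2, with all marginals N(0, 1)."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    s1 = math.sqrt(1.0 - rho1 ** 2)
    s2 = math.sqrt(1.0 - rho2 ** 2)
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho1 * x + rng.gauss(0.0, s1))
        zs.append(rho2 * x + rng.gauss(0.0, s2))
    return xs, ys, zs

def corr(a, b):
    """Empirical Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / math.sqrt(va * vb)

# parameters of Ex. 1a) in Section III-B: rho1^2 = 3/4, rho2^2 = 2/3
rho1, rho2 = math.sqrt(3 / 4), math.sqrt(2 / 3)
X, Y, Z = simulate_source(rho1, rho2, 200_000)
```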
In the generated-secret BIS model, upon observing Y^n_i, the encoder e generates the secret key S(i) ∈ S and helper data J(i) ∈ J as (S(i), J(i)) = e(Y^n_i). J(i) is stored at position i in a public database (helper DB), and S(i) is saved in a key DB, which is installed in a secure location. Seeing Z^n, the decoder d estimates (Ŵ, Ŝ(Ŵ)) from Z^n and all helper data in the DB, J ≡ {J(1), ..., J(M_I)}, i.e., (Ŵ, Ŝ(Ŵ)) = d(Z^n, J). In the chosen-secret BIS model, S(i) is chosen uniformly from S, independently of the other RVs. The encoder forms the helper data as J(i) = e(Y^n_i, S(i)) for every individual. The decoder d has the same functionality as in the generated-secret model.
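The enrollment/identification data flow just described can be mimicked with a toy script. The sign-quantizer encoder below is purely hypothetical (the paper's actual scheme uses random Gaussian codebooks, Section IV); it only illustrates how the public helper DB and the secure key DB interact.

```python
# Illustrative data flow of the generated-secret BIS model only: a naive
# sign quantizer stands in for the encoder e so that the enrollment and
# identification round trip can be exercised end to end.
import math
import random

def toy_encoder(v):
    """Stand-in for e(Y^n): secret key = sign bits of the first half,
    helper data = sign bits of the second half (hypothetical scheme)."""
    bits = tuple(1 if u >= 0 else 0 for u in v)
    half = len(bits) // 2
    return bits[:half], bits[half:]          # (S(i), J(i))

def enroll(bio, helper_db, key_db):
    for i, y in bio.items():
        s, j = toy_encoder(y)
        helper_db[i] = j                     # public helper DB
        key_db[i] = s                        # secure key DB

def identify(z, helper_db):
    """Stand-in for d(Z^n, J): score the probe against every template."""
    _, j_hat = toy_encoder(z)
    def score(i):
        return sum(a == b for a, b in zip(j_hat, helper_db[i]))
    return max(helper_db, key=score)

rng = random.Random(1)
rho1 = rho2 = math.sqrt(0.9)
n, num_users = 256, 4
x = {i: [rng.gauss(0, 1) for _ in range(n)] for i in range(num_users)}
sig1 = math.sqrt(1 - rho1 ** 2)
y = {i: [rho1 * v + rng.gauss(0, sig1) for v in x[i]] for i in x}
helper_db, key_db = {}, {}
enroll(y, helper_db, key_db)
w = 2                                        # true identity of the observed user
sig2 = math.sqrt(1 - rho2 ** 2)
z = [rho2 * v + rng.gauss(0, sig2) for v in x[w]]
w_hat = identify(z, helper_db)
```

With ρ_1^2 = ρ_2^2 = 0.9 and n = 256, the genuine template matches roughly 86% of the probe's sign bits while impostor templates match about half, so identification succeeds; the toy keys, by contrast, disagree on a fraction of bits, which hints at why the real scheme needs coding.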

B. Converted System
The original system, having X as the input source and Y, Z as outputs, is shown in the top figure of Fig. 2. There are two main obstacles to characterizing the capacity regions directly from this system. (I) In establishing the converse proof, an upper bound on an entropy of RV Y under a fixed condition on RV X is needed, but this bound is laborious to obtain since applying the EPI to the first relation in (2) only produces a lower bound. (II) It seems difficult to prove the achievability part by generating auxiliary sequences from edge X, e.g., in the rate settings. To overcome these bottlenecks, we introduce the idea of converting the original system into a new one in which the data of each user flows one way from Y to Z, without losing its general properties. This idea is illustrated in the bottom figure of Fig. 2, where Y virtually becomes the input. To achieve this objective, knowing the property of the backward channel P_{X|Y}, namely, how X correlates with the virtual input Y, is crucial, and we explore that in the rest of this section.
Due to the Markov chain Y − X − Z, (4) can also be expanded in the following form:

  f_{XYZ}(x, y, z) = f_Y(y) f_{X|Y}(x|y) f_{Z|X}(z|x).  (5)

Observe that the exponential part in (5) can, without loss of generality, be rearranged so that f_{X|Y} is identified as a Gaussian channel with input Y. From (6) and (8), we may conclude that the following equations hold:

  X = ρ_1 Y + N̄_1,  (9)
  Z = ρ_2 X + N_2 = ρ_1ρ_2 Y + ρ_2 N̄_1 + N_2,  (10)

with some RV N̄_1 ∼ N(0, 1 − ρ_1^2), independent of Y. Equation (9) describes the output of the backward channel with Y as input. The above relations play key roles in solving the problem of the RSM, and indeed we use them in many steps of the analysis in this study. In [6] and [7], this transformation does not appear because, under the VSM assumption, there is no enrollment channel, as mentioned before.
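A numerical check of the converted system (our sketch, not from the paper): sampling from the backward representation with Y as input, i.e., X = ρ_1 Y + N̄_1 followed by Z = ρ_2 X + N_2, must reproduce the same second-order statistics as the original forward model.

```python
import math
import random

def backward_samples(rho1, rho2, n, seed=0):
    """Sample (X, Y, Z) from the converted system: Y ~ N(0,1) is the
    (virtual) input, X = rho1*Y + Nbar1 as in (9), and then
    Z = rho2*X + N2 = rho1*rho2*Y + rho2*Nbar1 + N2 as in (10)."""
    rng = random.Random(0 if seed is None else seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        y = rng.gauss(0, 1)
        nbar1 = rng.gauss(0, math.sqrt(1 - rho1 ** 2))
        n2 = rng.gauss(0, math.sqrt(1 - rho2 ** 2))
        x = rho1 * y + nbar1
        ys.append(y)
        xs.append(x)
        zs.append(rho2 * x + n2)
    return xs, ys, zs

def moments(a, b):
    """Return (Var(a), E[a*b]) for zero-mean samples."""
    n = len(a)
    return sum(u * u for u in a) / n, sum(u * v for u, v in zip(a, b)) / n

rho1, rho2 = math.sqrt(3 / 4), math.sqrt(2 / 3)
X, Y, Z = backward_samples(rho1, rho2, 200_000)
var_x, exy = moments(X, Y)
var_z, eyz = moments(Z, Y)
```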

Remark 1.
In case no scaling is applied, equations (9) and (10) take the following form. Suppose that X_{ik} ∼ N(0, σ_x^2), Y_{ik} = X_{ik} + D_1, and Z_k = X_{ik} + D_2, where D_1 ∼ N(0, σ_1^2) and D_2 ∼ N(0, σ_2^2) are Gaussian RVs independent of the other RVs. By applying the arguments around (6)-(8), we obtain

  X_{ik} = (σ_x^2/(σ_x^2 + σ_1^2)) Y_{ik} + D̄_1,  (11)
  Z_k = (σ_x^2/(σ_x^2 + σ_1^2)) Y_{ik} + D̄_1 + D_2,  (12)

where D̄_1 ∼ N(0, σ_x^2σ_1^2/(σ_x^2 + σ_1^2)) is Gaussian and independent of the other RVs. The capacity regions of the models considered in this study can also be characterized from (11) and (12). However, the resulting expressions require more space and do not look as neat. Herein, we pursue our results based on the standardized RVs X, Y, and Z.
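The unscaled backward channel of Remark 1 can be checked numerically; the coefficient σ_x^2/(σ_x^2 + σ_1^2) and the residual variance σ_x^2σ_1^2/(σ_x^2 + σ_1^2) below are our reading of (11), i.e., the linear-MMSE estimate of X from Y.

```python
import math
import random

def backward_unscaled(sx2, s12, n, seed=0):
    """Remark 1 check: for X ~ N(0, sx2) and Y = X + D1, D1 ~ N(0, s12),
    the backward channel is X = a*Y + Dbar1 with a = sx2/(sx2+s12) and
    Var(Dbar1) = sx2*s12/(sx2+s12) (our reading of (11))."""
    rng = random.Random(seed)
    a = sx2 / (sx2 + s12)
    dbar_var = sx2 * s12 / (sx2 + s12)
    xs = [rng.gauss(0, math.sqrt(sx2)) for _ in range(n)]
    ys = [v + rng.gauss(0, math.sqrt(s12)) for v in xs]
    # empirical linear-MMSE coefficient of X on Y and residual variance
    myy = sum(v * v for v in ys) / n
    mxy = sum(u * v for u, v in zip(xs, ys)) / n
    a_hat = mxy / myy
    rvar_hat = sum((u - a_hat * v) ** 2 for u, v in zip(xs, ys)) / n
    return a, a_hat, dbar_var, rvar_hat

a, a_hat, dv, dv_hat = backward_unscaled(2.0, 0.5, 200_000)
```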

Now, from (9) and (10), it is not difficult to calculate that

  I(Y; Z) = (1/2) log(1/(1 − ρ_1^2ρ_2^2)),  (14)

where (14) is attained because the variance of the noise term ρ_2N̄_1 + N_2 in (10) is ρ_2^2(1 − ρ_1^2) + (1 − ρ_2^2) = 1 − ρ_1^2ρ_2^2.
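A small cross-check of (14) (our sketch): the closed form (1/2) log(1/(1 − ρ_1^2ρ_2^2)) against a Monte-Carlo estimate using the Gaussian mutual-information formula −(1/2) log(1 − r^2) with r = Corr(Y, Z) = ρ_1ρ_2.

```python
import math
import random

def identification_capacity(r1sq, r2sq):
    """I(Y;Z) = (1/2) log(1 / (1 - rho1^2 rho2^2)) in nats, cf. (14)."""
    return 0.5 * math.log(1.0 / (1.0 - r1sq * r2sq))

# Monte-Carlo cross-check: (Y, Z) are jointly Gaussian with correlation
# rho1*rho2, so I(Y;Z) = -(1/2) log(1 - r^2) with r estimated from samples.
rng = random.Random(0)
rho1, rho2 = math.sqrt(3 / 4), math.sqrt(2 / 3)
n = 200_000
r_sum = 0.0
for _ in range(n):
    x = rng.gauss(0, 1)
    yv = rho1 * x + rng.gauss(0, math.sqrt(1 - rho1 ** 2))
    zv = rho2 * x + rng.gauss(0, math.sqrt(1 - rho2 ** 2))
    r_sum += yv * zv
r_hat = r_sum / n
I_mc = -0.5 * math.log(1.0 - r_hat ** 2)
I_exact = identification_capacity(3 / 4, 2 / 3)
```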

III. STATEMENT OF RESULTS
In this section, we provide the formal definitions of both the generated-and chosen-secret BIS models, and state the main results.

A. Problem Formulation and Main Results
The achievability definition for the generated-secret BIS model is given below.

Definition 2.
A tuple of identification, secrecy, storage, and privacy-leakage rates (R_I, R_S, R_J, R_L) is said to be achievable for a Gaussian source if, for any δ > 0 and sufficiently large n, there exist pairs of encoders and decoders satisfying conditions (15)-(20). Moreover, R_G is defined as the set of all achievable rate tuples for the generated-secret BIS model, called the capacity region.
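The displayed conditions (15)-(20) did not survive extraction. Judging from how they are cited later ((15) as the error criterion in Section IV-A, (17) in the note below Definition 2, (19) and (20) in the achievability overview) and from the standard formulation in [4], [5], they presumably read, for every i ∈ I (our reconstruction, not a quotation):

```latex
% plausible reconstruction of (15)--(20); numbering inferred from later citations
\begin{align}
\max_{i\in\mathcal{I}}\ \Pr\{(\hat{W},\hat{S}(\hat{W}))\neq(i,S(i))\mid W=i\} &\le \delta, \tag{15}\\
\tfrac{1}{n}\log M_I &\ge R_I-\delta, \tag{16}\\
\tfrac{1}{n}H(S(i)) &\ge R_S-\delta, \tag{17}\\
\tfrac{1}{n}\log M_J &\le R_J+\delta, \tag{18}\\
\tfrac{1}{n}I(S(i);J(i)) &\le \delta, \tag{19}\\
\tfrac{1}{n}I(X_i^n;J(i)) &\le R_L+\delta. \tag{20}
\end{align}
```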
The achievability definition for the chosen-secret BIS model is as follows: a rate tuple (R_I, R_S, R_J, R_L) is said to be achievable for a Gaussian source if there exist pairs of encoders and decoders satisfying all the requirements in Definition 2 for any δ > 0 and sufficiently large n. In addition, R_C is defined as the capacity region of the chosen-secret BIS model.
Note that the left-hand side of (17) reduces to (1/n) log M_S because S(i) is chosen uniformly from S.

Remark 2.
It is worth mentioning that it is not entirely suitable to call the rate of the helper data the storage rate. In [5], it was called the template rate instead; the reason is that there are two databases in the BIS, namely, databases of secret keys and of helper data (templates). The storage space of the database storing the templates is minimized, while that for the secret keys is maximized, so only a part of the entire storage space of the BIS is being minimized. In this paper, however, we use the term storage rate because it is widely used in many previous works, e.g., [3], [4].
Now we are ready to present our main results.

Theorem 1. The capacity regions for the generated- and chosen-secret BIS models are given by (21) and (22), respectively.

Similar to a conclusion in [5], the lower bound on R_J in R_C is greater than the one in R_G; that is, the chosen-secret BIS model consumes more storage space. This is because information related to the secret key chosen at the encoder must be saved together with the helper data in the DB so as to aid the estimation of the key at the decoder. Unlike the bound on R_J, the bound on R_L remains unchanged in both models, and it rises with increasing R_I.
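The displayed regions (21) and (22) were also lost in extraction. From the α-parametrization that reappears in the converse and the achievability rate settings of Section IV, together with the α ↓ 0 special cases discussed below, R_G presumably takes the single-letter form sketched here; per the comparison after the theorem, R_C should differ only in an enlarged lower bound on R_J. Treat this as our reconstruction, not a quotation:

```latex
% plausible reconstruction of (21); union over the parameter 0 < alpha <= 1
\begin{align*}
\mathcal{R}_G = \bigcup_{0<\alpha\le 1}\Bigl\{(R_I,R_S,R_J,R_L):\;
& R_I + R_S \le \tfrac{1}{2}\log\tfrac{1}{1-\rho_1^2\rho_2^2(1-\alpha)},\\
& R_J \ge \tfrac{1}{2}\log\tfrac{1-\rho_1^2\rho_2^2(1-\alpha)}{\alpha} + R_I,\\
& R_L \ge \tfrac{1}{2}\log\tfrac{1-\rho_1^2\rho_2^2(1-\alpha)}{1-\rho_1^2(1-\alpha)} + R_I
\Bigr\}.
\end{align*}
```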
As a by-product of Theorem 1, the following remark is obtained.
Remark 3. The capacity regions of the generated- and chosen-secret BIS models with a single user (the models considered in [4]) for Gaussian sources are given by substituting R_I = 0 into R_G and R_C, respectively.
The proof of Remark 3 proceeds similarly to the arguments used in proving Theorem 1.
As special cases, when R_S = 0 and R_J and R_L are large enough (R_J, R_L → ∞), the maximum value of R_I is (1/2) log(1/(1 − ρ_1^2ρ_2^2)). This value is exactly the identification capacity I(Y;Z) (cf. (14)) derived in [1], and it is achieved as α ↓ 0. Moreover, when R_I = 0, R_J → ∞, and the enrollment channel is noiseless (ρ_1 = 1), one can see that Theorem 1 naturally reduces to the characterizations of [6].

Remark 4.
If there is no scaling, as in (11) and (12) of Remark 1, the capacity regions of the generated- and chosen-secret BIS models can be characterized analogously. It can be verified that the resulting regions are equivalent to R_G and R_C, respectively, if we set ρ_1^2 = σ_x^2/(σ_x^2 + σ_1^2) and ρ_2^2 = σ_x^2/(σ_x^2 + σ_2^2).

B. Examples
For the sake of a succinct discussion, we concentrate only on the generated-secret BIS model with R_I = 0. We first look at some special points of the secrecy and privacy-leakage rates when the storage rate becomes extremely small or large. We first define two rate functions R_S(R_J) and R_L(R_J), where (25) and (26) are the maximum secrecy rate and the minimum privacy-leakage rate, respectively, for a given R_J. Moreover, for 0 < α ≤ 1 we define R_J^α = (1/2) log((1 − ρ_1^2ρ_2^2(1 − α))/α), at which the two rate functions evaluate to R_S(R_J^α) = (1/2) log(1/(1 − ρ_1^2ρ_2^2(1 − α))) (27) and R_L(R_J^α) = (1/2) log((1 − ρ_1^2ρ_2^2(1 − α))/(1 − ρ_1^2(1 − α))). (28) As R_J^α → ∞ (α ↓ 0), the optimal asymptotic secrecy rate and the quantity of privacy-leakage approach R_S → (1/2) log(1/(1 − ρ_1^2ρ_2^2)) = I(Y;Z) and R_L → (1/2) log((1 − ρ_1^2ρ_2^2)/(1 − ρ_1^2)) = I(X;Y) − I(Z;Y). (29) The result (29) corresponds to the optimal asymptotic secrecy rate [6, Sect. III-B]; to achieve this rate, it is required to take the storage rate to infinity and to leak the user's privacy at a rate up to I(X;Y) − I(Z;Y).
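Under the α-parametrization used later in Section IV (auxiliary U with variance 1 − α), the special points above admit the closed forms below; these are our reading of (27)-(29), not quotations, and the check confirms the α ↓ 0 limits stated in (29).

```python
import math

# Parametric rates implied by the alpha-parametrization of Section IV
# (auxiliary U Gaussian with variance 1-alpha, Y = U + Phi); these closed
# forms are our reading of (27)-(28), not quoted from the paper.
def R_S_alpha(alpha, r1sq, r2sq):          # I(Z;U)
    return 0.5 * math.log(1.0 / (1.0 - r1sq * r2sq * (1.0 - alpha)))

def R_L_alpha(alpha, r1sq, r2sq):          # I(X;U) - I(Z;U)
    return 0.5 * math.log((1.0 - r1sq * r2sq * (1.0 - alpha))
                          / (1.0 - r1sq * (1.0 - alpha)))

def I_YZ(r1sq, r2sq):                      # identification capacity, cf. (14)
    return 0.5 * math.log(1.0 / (1.0 - r1sq * r2sq))

def I_XY(r1sq):                            # I(X;Y) of the enrollment channel
    return 0.5 * math.log(1.0 / (1.0 - r1sq))

r1sq, r2sq, alpha = 3 / 4, 2 / 3, 1e-9     # Ex. 1a) with alpha near 0
```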
In contrast, when R_J ↓ 0, it is evident that R_S and R_L become zero as well, which by itself does not carry much information. However, to investigate BIS operating points with high secrecy and small privacy-leakage in the low storage-rate regime, the zero-rate slopes of the secrecy and privacy-leakage rates, namely, how fast they converge to zero, are important indicators. In view of (27) and (28), a few steps of calculation determine the slopes at R_J ↓ 0 as

  dR_S/dR_J = ρ_1^2ρ_2^2/(1 − ρ_1^2ρ_2^2),  (31)
  dR_L/dR_J = (ρ_1^2ρ_2^2/(1 − ρ_1^2ρ_2^2)) · ((1 − ρ_2^2)/ρ_2^2),  (32)

where (31) is equal to the signal-to-noise ratio (SNR) of the compound channel from Y to Z. This value multiplied by the inverse of the SNR of the channel P_{Z|X} appears as the slope of the privacy-leakage rate in (32). Next, we give numerical computations of three different examples and look into the behaviors of these special points.
Ex. 1: a) ρ_1^2 = 3/4, ρ_2^2 = 2/3; b) ρ_1^2 = 7/8, ρ_2^2 = 2/3; c) ρ_1^2 = 15/16, ρ_2^2 = 2/3.
Ex. 2: a) ρ_1^2 = 3/4, ρ_2^2 = 2/3; b) ρ_1^2 = 9/10, ρ_2^2 = 7/8; c) ρ_1^2 = 15/16, ρ_2^2 = 11/12.
Ex. 3: a) ρ_1^2 = 3/4, ρ_2^2 = 2/3; b) ρ_1^2 = 3/4, ρ_2^2 = 8/9; c) ρ_1^2 = 3/4, ρ_2^2 = 14/15.
Note that as ρ_1^2 and ρ_2^2 grow, the noises added to the bio-data sequences at the encoder and decoder become small. Example 1 is the case where the noise at the encoder gradually decreases from a) to c) while the noise at the decoder stays constant. Example 2 is the case in which the noises at both the encoder and the decoder improve gradually from a) to c). Example 3 is the opposite of Example 1. The calculated secrecy and privacy-leakage rates for these cases are summarized in Tables I and II and Figs. 3-6.
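The zero-rate slopes (31)-(32) can be tabulated for the examples; the two slope functions below are our reading of the formulas described in the text (the compound-channel SNR and its product with the inverse SNR of P_{Z|X}).

```python
import math

def slope_secrecy(r1sq, r2sq):
    """Zero-storage-rate slope of the secrecy rate, our reading of (31):
    the SNR of the compound channel Y -> Z (gain rho1^2 rho2^2, noise
    variance 1 - rho1^2 rho2^2)."""
    return r1sq * r2sq / (1.0 - r1sq * r2sq)

def slope_leakage(r1sq, r2sq):
    """Zero-storage-rate slope of the privacy-leakage rate, our reading
    of (32): (31) times the inverse SNR of P_{Z|X}."""
    return slope_secrecy(r1sq, r2sq) * (1.0 - r2sq) / r2sq

examples = {                      # (rho1^2, rho2^2) from Ex. 1-3
    "1a": (3 / 4, 2 / 3), "1b": (7 / 8, 2 / 3), "1c": (15 / 16, 2 / 3),
    "2b": (9 / 10, 7 / 8), "2c": (15 / 16, 11 / 12),
    "3b": (3 / 4, 8 / 9), "3c": (3 / 4, 14 / 15),
}
slopes = {k: (slope_secrecy(*v), slope_leakage(*v)) for k, v in examples.items()}
```

Cases 2a) and 3a) coincide with 1a) and are omitted from the table; the computed slopes reflect the qualitative behavior discussed next (in Ex. 1 both slopes rise, while in Ex. 3 the leakage slope falls).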
Ideally, one keeps the privacy-leakage rate small while producing a high secrecy rate, but Example 1 works out in the opposite way (cf. the rows of Ex. 1 in Tables I and II), so it is not a preferable choice. Example 2 realizes a high secrecy rate, but the amount of privacy-leakage also remains high at some level (cf. the rows of Ex. 2 in Tables I and II, and Figs. 3 and 4). On the other hand, in Example 3, the privacy-leakage rate declines, but the secrecy rate becomes small compared to Example 2 (cf. the rows of Ex. 3 in Tables I and II, and Figs. 5 and 6). From these behaviors, we may conclude that it is hard to achieve both a high secrecy rate and small privacy-leakage at the same time. If one aims at a high secrecy rate, it is important to diminish the noises at both the encoder and the decoder, e.g., by deploying high-quality quantizers, but this could result in leaking more of the user's privacy.
Conversely, to achieve a small privacy-leakage rate, it is preferable to maintain a certain level of noise at the encoder and to pay sufficient attention to processing the noise at the decoder. In this way, however, the gain in the secrecy rate may drop.

IV. PROOF OF THE REGION R G
In this section, we give the proof of the capacity region of the generated-secret BIS model.

A. Converse Part
We consider a more relaxed case where W is uniformly distributed on I, and (15) is replaced with the average error criterion Pr{(Ŵ, Ŝ(Ŵ)) ≠ (W, S(W))} ≤ δ. We shall show that the capacity region for this case, which contains R_G, is contained in (21). We assume that a rate tuple (R_I, R_S, R_J, R_L) is achievable.
where (f) holds as (S(W), J(W)) is a function of Y^n_W, (g) follows since W is independent of the other RVs and S(W) is a function of Y^n_W, (h) follows because conditioning reduces entropy and W is uniformly distributed on I, and (i) follows because h(Y^n_W) = h(Z^n) = (n/2) log(2πe), and (16) and (33) are applied.
Analysis of Privacy-Leakage Rate: where (j) follows as conditioning reduces entropy and W is uniformly distributed on I. Next, set (1/n) h(X^n_W | J(W), S(W)) = (1/2) log(2πe(1 − ρ_1^2(1 − α))) with some 0 < α ≤ 1. Indeed, this is a reasonable setting because (1/2) log(2πe) ≥ (1/n) h(X^n_W | J(W), S(W)) ≥ (1/2) log(2πe(1 − ρ_1^2)). The lower bound is obtained from (1/n) h(X^n_W | J(W), S(W)) ≥ (1/n) h(X^n_W | Y^n_W, J(W), S(W)) = (1/n) h(X^n_W | Y^n_W), due to the fact that (J(W), S(W)) is a function of Y^n_W. In the direction from X to Z, applying the conditional EPI [10, Lemma II] to the first equality in (10) yields e^{(2/n) h(Z^n | J(W), S(W))} ≥ ρ_2^2 e^{(2/n) h(X^n_W | J(W), S(W))} + 2πe(1 − ρ_2^2), where (l) holds as N^n_2 is independent of (J(W), S(W)); as a deduction, e^{(2/n) h(Z^n | J(W), S(W))} ≥ 2πe(1 − ρ_1^2ρ_2^2(1 − α)). In the opposite direction (from X to Y), by again applying the conditional EPI [10, Lemma II] to (9), we have e^{(2/n) h(X^n_W | J(W), S(W))} ≥ ρ_1^2 e^{(2/n) h(Y^n_W | J(W), S(W))} + 2πe(1 − ρ_1^2), meaning that e^{(2/n) h(Y^n_W | J(W), S(W))} ≤ 2πeα.
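The algebra behind the two EPI steps can be sanity-checked on the jointly Gaussian case where both bounds are tight (our sketch; the Gaussian side information V plays the role of the conditioning pair (J(W), S(W)) with Var(Y|V) = α).

```python
import math

def conditional_variances(r1sq, r2sq, alpha):
    """Jointly Gaussian test case with Var(Y|V) = alpha: then Var(X|V)
    and Var(Z|V) meet the two conditional-EPI bounds of the converse
    with equality, since e^{2h/n} is proportional to the conditional
    variance for Gaussian RVs."""
    var_y = alpha                                  # target: Var(Y|V) = alpha
    var_x = r1sq * var_y + (1.0 - r1sq)            # X = rho1*Y + Nbar1, cf. (9)
    var_z = r2sq * var_x + (1.0 - r2sq)            # Z = rho2*X + N2, cf. (10)
    return var_x, var_z

r1sq, r2sq, alpha = 3 / 4, 2 / 3, 0.2
var_x, var_z = conditional_variances(r1sq, r2sq, alpha)
```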
Hence, it follows that (1/n) h(Y^n_W | J(W), S(W)) ≤ (1/2) log(2πeα), which is not derivable from the first equation in (2) of the original system. Now, plugging (38), (40), and (44) into (35)-(37), we obtain (45)-(47). Eventually, by letting n → ∞ and δ ↓ 0 in (45)-(47), we can see that the capacity region is contained in the right-hand side of (21).

B. Achievability Part
Overview: The modified typical set (cf. Definition 1), which gives a so-called Markov lemma for weak typicality, and Gaussian typicality [9, Section 8.2] help us show that the error probability of the BIS vanishes for large enough n. Although a more general version of the Markov lemma for Gaussian sources, including lossy reconstruction, is shown in [11], the two properties of the modified typical set are handy tools for checking all conditions in Definition 2, and thus we build our achievability proof on this set. For evaluating the uniformity of the secret keys (17), the secrecy-leakage (19), and the privacy-leakage (20), we extend [12, Lemma 4] to continuous RVs so that it can be used to derive upper bounds on the conditional differential entropies of jointly typical sequences appearing in these evaluations.
Let 0 < α ≤ 1 and fix δ > 0 (small enough), the block length n, and the joint pdf of (U, Y, X, Z) such that the Markov chain U − Y − X − Z holds, where U is Gaussian with mean zero and variance 1 − α. Now consider Y = U + Φ, where Φ, independent of U, is Gaussian with mean zero and variance α. From (9) and (10) of the converted system, it follows that X = ρ_1(U + Φ) + N̄_1 and Z = ρ_1ρ_2(U + Φ) + ρ_2N̄_1 + N_2. Hence, we readily see that I(Y;U) = (1/2) log(1/α) and I(Z;U) = (1/2) log(1/(1 − ρ_1^2ρ_2^2(1 − α))). Now set 0 < R_I < I(Z;U), and choose M_I, M_S, and M_J such that (1/n) log M_I + (1/n) log M_S = I(Z;U) − 2δ and (1/n) log M_S + (1/n) log M_J = I(Y;U) + 4δ. Next, we generate 2^{n(I(Y;U)+δ)} sequences u^n(s, j), where each symbol of these sequences is i.i.d. Gaussian with mean zero and variance 1 − α, and s ∈ S and j ∈ J.
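With this parametrization, the mutual informations that fix the codebook sizes have simple closed forms via the conditional variances implied by (9) and (10) (our sketch, in nats):

```python
import math

def I_YU(alpha):
    """I(Y;U) = (1/2) log(1/alpha) for U ~ N(0, 1-alpha), Y = U + Phi."""
    return 0.5 * math.log(1.0 / alpha)

def I_ZU(alpha, r1sq, r2sq):
    """I(Z;U): Var(Z|U) = rho1^2 rho2^2 alpha + 1 - rho1^2 rho2^2."""
    return 0.5 * math.log(1.0 / (1.0 - r1sq * r2sq * (1.0 - alpha)))

def I_XU(alpha, r1sq):
    """I(X;U): Var(X|U) = rho1^2 alpha + 1 - rho1^2."""
    return 0.5 * math.log(1.0 / (1.0 - r1sq * (1.0 - alpha)))

alpha, r1sq, r2sq = 0.25, 3 / 4, 2 / 3
```

The ordering I(Z;U) < I(X;U) < I(Y;U) reflects data processing along the chain U − Y − X − Z.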
Seeing y^n_i (i ∈ I), the encoder finds u^n(s, j) such that (y^n_i, u^n(s, j)) ∈ B_δ^(n)(YU). If there are multiple such pairs (s, j), the encoder picks one at random; otherwise, it declares an error. We denote the chosen pair by (s(i), j(i)), which are functions of the index i. Template j(i) is stored in the public DB, and secret key s(i) is saved in the key DB.
Observing z^n, the noisy sequence of the identified user's x^n_w, the decoder looks for u^n(s, j(i)) such that (z^n, u^n(s, j(i))) ∈ A_δ^(n)(ZU) for some i ∈ I and s ∈ S. If a unique pair (i, s) is found, it outputs (ŵ, ŝ(ŵ)) = (i, s); otherwise, it declares an error. Finally, it compares ŝ(ŵ) with s(ŵ) in the key DB, and authentication succeeds if they match.
Let (J(i), S(i)) denote the index pair chosen at the encoder based on Y^n_i, i.e., (Y^n_i, U^n(S(i), J(i))) ∈ B_δ^(n)(YU). Furthermore, we write U^n(S(i), J(i)) as U^n_i for simplicity. Next, we check that all conditions in Definition 2 hold for a random codebook C_n = {U^n(s, j) : s ∈ S, j ∈ J}.
Analysis of Error Probability: For W = i, the error event possibly happening at the encoder is E_1 = {(Y^n_i, U^n(s, j)) ∉ B_δ^(n)(YU) for all s ∈ S and j ∈ J}, and those at the decoder include the event that (Z^n, U^n(s, j(i'))) ∈ A_δ^(n)(ZU) for some i' ≠ i (i' ∈ I) and s ∈ S.
Note that the authentication process is guaranteed to succeed if the genuine index and secret key of the identified user are correctly estimated at the decoder, so it suffices to assess the probability of incorrectly estimating this pair at the decoder. Then the error probability can be bounded as Pr{(Ŵ, Ŝ(Ŵ)) ≠ (W, S(W)) | W = i} ≤ Pr{E_1} + Pr{E_2} + Pr{E_3 ∪ E_4}. By arguments similar to [2, Appendix A-B], it can be shown that the entire error probability vanishes; nonetheless, we provide the details for completeness of the proof. The first term Pr{E_1} can be bounded for large enough n via a chain of steps in which (a) is due to the fact that Y^n_i and U^n(s, j) are mutually independent, (b) is obtained by applying Property 1 of the modified δ-typical set [2], which says that if (y^n, u^n) ∈ B_δ^(n)(YU), then (y^n, u^n) is also a member of A_δ^(n)(YU), (c) follows because the inequality (1 − αβ)^m ≤ 1 − α + e^{−mβ} [9] is applied, (d) holds since (1/n) log|S| + (1/n) log|J| = I(Y;U) + 4δ, and (e) follows by applying Property 2 of the modified δ-typical set [2].
For the second term, it follows that (f) holds by the definition of the modified δ-typical set together with the Markov chain Z − Y − U.
Finally, the last term Pr{E_3 ∪ E_4} can be bounded similarly, where (g) follows as (1/n) log M_I + (1/n) log M_S = I(Z;U) − 2δ. Consequently, the error probability vanishes for large enough n.
Before proceeding further, we introduce a lemma that is used often in the sequel. Again recall that the index pair (J(i), S(i)) directly determines the chosen sequence U^n_i; thus, the following lemma can be thought of as an extended version of [12, Lemma 4] incorporating continuous RVs.

Proof:
The tie between the modified δ-typical set B_δ^(n)(·) and the weakly δ-typical set A_δ^(n)(·) is helpful in proving the above lemma. We first prove (62).
Define an RV T as follows: T = 1 if (Y^n_i, U^n_i) ∈ B_δ^(n)(YU) and T = 0 otherwise. In the analysis of the error probability, we have already demonstrated that P_T(0) ≤ 2δ, that is, (Y^n_i, U^n_i) ∈ B_δ^(n)(YU) with high probability. From the left-hand side of (62), (h) follows as (J(i), S(i)) determines U^n_i, (i) follows because conditioning reduces entropy, and (j) follows as h(Y^n_i | U^n_i, T = 0) ≤ h(Y^n_i) = (n/2) log(2πe), where we define ε_n = (1/n + δ) log(2πe). Since Pr{(Y^n_i, U^n_i) ∈ B_δ^(n)(YU)} ≤ 1, and from Property 1 of the modified δ-typical set [2], we can bound the conditional density of Y^n_i given U^n_i on B_δ^(n)(YU). Therefore, from (65), we obtain (62) with δ_n = 2δ + ε_n, where δ_n ↓ 0 as n → ∞ and δ ↓ 0. Next, we briefly summarize how to show (63). The left-hand side of (63) can be developed as h(Y^n_i | X^n_i, J(i), S(i), C_n) = h(Y^n_i | X^n_i, U^n_i, J(i), S(i), C_n) ≤ h(Y^n_i | X^n_i, U^n_i, C_n), where the first equality and the second inequality follow for the same reasons as (h) and (i) in (65), respectively. By the definition of the modified δ-typical set [2], it can be concluded that Pr{Y^n_i ∈ A_δ^(n)(Y | x^n, u^n) | (X^n_i, U^n_i) = (x^n, u^n)} → 1 as n → ∞ as well. Based on this observation, the rest of the proof for (63) proceeds similarly to the arguments in [12, Appendix C], and the details are therefore omitted.

Analysis of Identification and Storage Rates:
Equations (16) and (18) obviously hold from the parameter settings.
for sufficiently large n. Finally, by the selection lemma [13, Lemma 2.2], there exists at least one good codebook satisfying all the conditions in Definition 2 for large enough n.

V. PROOF SKETCH OF THE REGION R C
In this section, we highlight the proof for the chosen-secret BIS model. Some parts follow from the arguments in Section IV, so similar steps are omitted.

A. Converse Part
As in the converse proof of the generated-secret BIS model, we consider the case in which W is uniformly distributed on I. Suppose that a rate tuple (R_I, R_S, R_J, R_L) is achievable.
For the analyses of the identification, secrecy, and privacy-leakage rates, the reader should refer to the discussions around (35)-(37). We argue only the bound on R_J, which is different from the one seen in the generated-secret BIS model.
where (a) follows since J(W) is a function of (Y^n_W, S(W)), (b) follows as S(W) is chosen independently of Y^n_W, and (c) follows because conditioning reduces entropy and (44) is applied. Then we obtain the desired bound on R_J. By letting n → ∞ and δ ↓ 0, the capacity region of the chosen-secret BIS model is contained in the right-hand side of (22).

B. Achievability Part
In order to avoid confusion in the subsequent arguments, we define some new notation used only in this part. The pairs (J_C(i), S_C(i)) and (J_G(i), S_G(i)) denote the helper data and secret key of individual i for the chosen- and generated-secret BIS models, respectively. Moreover, M_{J_C} and M_{J_G} denote the numbers of templates, and R_{J_G} and R_{J_C} denote the storage rates in the generated- and chosen-secret BIS models, respectively.