Robust Biometric Authentication from an Information Theoretic Perspective

Robust biometric authentication is studied from an information theoretic perspective. Compound sources are used to account for uncertainty in the knowledge of the source statistics and are further used to model certain attack classes. It is shown that authentication is robust against source uncertainty and a special class of attacks under the strong secrecy condition. A single-letter characterization of the privacy secrecy capacity region is derived for the generated and the chosen secret key model. Furthermore, it is studied whether small variations of the compound source can lead to large losses of the privacy secrecy capacity region. It is shown that biometric authentication is robust in the sense that its privacy secrecy capacity region depends continuously on the compound source.


Introduction
Biometric identifiers, such as fingerprints, iris and retina scans, are becoming increasingly attractive for use in security systems, for example in authentication and identification systems, because of their uniqueness and time-invariant characteristics. Conventional personal authentication systems usually use secret passwords or physical tokens to guarantee the legitimacy of a person. Biometric authentication systems, on the other hand, use the physical characteristics of a person to guarantee the legitimacy of the person to be authenticated.
Biometric authentication systems are decomposed into two phases: the enrollment and the authentication phase. A simple authentication approach is to gather biometric measurements in the enrollment phase, apply a one-way function and then store the result in a public database. In the authentication phase, new biometric measurements are gathered, the same one-way function is applied, and the outcome is compared to the one stored in the database. Unfortunately, biometric measurements might be affected by noise, and even a small difference between the two measurements changes the output of the one-way function. To deal with noisy data, error correction is needed. Therefore, helper data is also generated from the biometric measurements during the enrollment phase and stored in the public database; in the authentication phase, it is used to correct the noisy imperfections of the measurements.
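The failure of the naive scheme under noise can be illustrated with a minimal Python sketch. It assumes SHA-256 as the one-way function and models measurement noise as a single flipped bit; both are illustrative choices and not part of the model studied here.

```python
import hashlib


def one_way(measurement: bytes) -> str:
    """One-way function applied to a biometric measurement (SHA-256, assumed for illustration)."""
    return hashlib.sha256(measurement).hexdigest()


def flip_bit(data: bytes, position: int) -> bytes:
    """Model measurement noise as a single flipped bit at the given position."""
    noisy = bytearray(data)
    noisy[position // 8] ^= 1 << (position % 8)
    return bytes(noisy)


# Enrollment: store only the one-way image of a hypothetical 32-bit measurement.
enrollment_measurement = bytes([0b10110010, 0b01101100, 0b11100001, 0b00011110])
database_entry = one_way(enrollment_measurement)

# Authentication: compare the one-way image of a fresh measurement with the stored one.
noise_free_match = one_way(enrollment_measurement) == database_entry
noisy_match = one_way(flip_bit(enrollment_measurement, 5)) == database_entry
print(noise_free_match, noisy_match)  # True False
```

A single bit error is enough to make the comparison fail, which is why the helper data for error correction is introduced.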
Since the database containing the helper data is public, an eavesdropper can access it. How can we prevent an eavesdropper from gaining information about the biometric data from the publicly stored helper data? One is interested in encoding the biometric data into helper data and a secret key such that the helper data does not reveal any information about the secret key. Cryptographic techniques are one approach to keeping the key secret. However, security on higher layers is usually based on the assumption that eavesdroppers have insufficient computational capabilities. Information theoretic security, on the contrary, uses the physical properties of the source to guarantee security independently of the computational capabilities of the adversary. This line of research was initiated by Shannon in [1] and has recently attracted considerable interest; cf., for example, the textbooks [2][3][4] and references therein. In particular, Ahlswede and Csiszár in [5] and Maurer in [6] introduced a secret key sharing model. It consists of two terminals that observe correlated sequences of a joint source. Both terminals generate a common key based on their observations and on public communication. The message transmitted over the public channel should not leak any information about the common key.
Both works mentioned above use the weak secrecy condition as a measure of secrecy. Given a code of a certain blocklength, the weak secrecy condition is fulfilled if the mutual information between the key and the information available at the eavesdropper, normalized by the code blocklength, is arbitrarily small for large blocklengths. The strong secrecy condition, on the other hand, is fulfilled if the un-normalized mutual information between the key and the information available at the eavesdropper is arbitrarily small for large blocklengths, i.e., the total amount of information leaked to the eavesdropper is negligible. The secret key sharing model under the strong secrecy condition was studied in [7].
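The gap between the two notions can be seen with a hypothetical leakage profile in which the total leakage grows like the square root of the blocklength: such a code satisfies the weak secrecy condition but violates the strong one.

```latex
% Hypothetical leakage profile (for illustration only): I(K;M) = \sqrt{n} bits
\frac{1}{n}\, I(K;M) = \frac{\sqrt{n}}{n} = \frac{1}{\sqrt{n}} \xrightarrow{\;n\to\infty\;} 0
\quad\text{(weak secrecy holds)},
\qquad
I(K;M) = \sqrt{n} \xrightarrow{\;n\to\infty\;} \infty
\quad\text{(strong secrecy fails)}.
```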
One could model biometric authentication similarly to this secret key generation source model; however, this model does not take into account the amount of information that the public data (the helper data in the biometric scenario) leaks about the biometric measurement. The goal of biometric authentication is to perform a secret and successful authentication procedure without compromising information about the user (privacy leakage). Biometric information is unique and cannot be replaced, so once it is compromised, it is compromised forever, which might lead to identity theft (see [8][9][10] for more information on privacy concerns). Since the helper data used to deal with noisy data is a function of the biometric measurements, it contains information about them. Thus, an attacker who breaks into the database where the helper data is stored might be able to extract information about the biometric measurements. Hence, we also aim to control the privacy leakage. An information theoretic approach to secure biometric authentication that controls the privacy leakage was studied in [11,12] under ideal conditions, i.e., with perfect source state information (SSI) and without active attackers.
In both references [11,12], the capacity results were derived under the weak secrecy condition. In [13], the capacity result for sequential key distillation with rate-limited one-way public communication was shown under the strong secrecy condition.
For reliable authentication, SSI is needed; in practical systems, however, it is never perfectly available. Compound sources model a simple and realistic SSI scenario in which the legitimate users do not know the actual source realization. They know, however, that it belongs to a known uncertainty set of sources and that it remains constant during the entire observation. This model was first introduced and studied in [14,15] in a channel coding context. Compound sources can also model the presence of an active attacker who is able to control the state of the source. We are interested in an authentication process that is robust against such uncertainties and attacks. Secret key generation under source uncertainty was studied in [16][17][18][19]. In [16], secret key generation using compound joint sources was studied and the key capacity was established.
In [20], an achievability result for the privacy secrecy capacity region for generated secret keys and compound sources was derived under the weak secrecy condition. In this work, we study robust biometric authentication in detail and extend this result in several directions. First, we consider a model in which the legitimate users suffer from source uncertainty and/or attacks, and we derive achievability results under the strong secrecy condition for both generated and chosen secret key authentication. We then provide matching converses to obtain single-letter characterizations of the corresponding privacy secrecy capacity regions.
We further address the following question: can small changes of the compound source cause large changes in the privacy secrecy capacity region? This question was first studied in [21] for arbitrarily varying quantum channels (AVQCs), showing that the deterministic capacity has discontinuity points, while the randomness-assisted capacity is a continuous function of the AVQC. This line of research was continued in [22,23], where the classical compound wiretap channel, the arbitrarily varying wiretap channel (AVWC), and the compound broadcast channel with confidential messages (BCC) were studied. We study this question for the biometric authentication problem at hand and show that the corresponding privacy secrecy capacity regions are continuous functions of the underlying uncertainty sets. Thus, small changes in the compound set lead only to small changes in the capacity region.
The rest of this paper is organized as follows. In Section 2, we introduce the biometric authentication model for perfect SSI and present the corresponding capacity results. In Section 3, we introduce the biometric authentication model for compound sources and show that authentication that is secure under the strong secrecy condition and reliable under source uncertainty is possible at positive rates; to this end, we derive single-letter characterizations of the privacy secrecy capacity regions for the chosen and the generated secret key model. In Section 4, we show that the privacy secrecy capacity region for compound sources is a continuous function of the uncertainty set. Finally, the paper ends with a conclusion in Section 5.
Notation: Discrete random variables are denoted by capital letters, and their realizations and ranges by lower case and script letters, respectively. $\mathcal P(\mathcal X)$ denotes the set of all probability distributions on $\mathcal X$; $\mathbb E(\cdot)$ denotes the expectation of a random variable; $\Pr\{\cdot\}$, $H(\cdot)$ and $I(\cdot;\cdot)$ denote the probability, the entropy of a random variable, and the mutual information between two random variables; $D(\cdot\|\cdot)$ is the information divergence; $\|p-q\|_{TV}$ is the total variation distance between $p$ and $q$ on $\mathcal X$, defined as $\|p-q\|_{TV} := \sum_{x\in\mathcal X}|p(x)-q(x)|$. The set $\mathcal T^n_{p,\delta}$ denotes the set of $\delta$-typical sequences of length $n$ with respect to the distribution $p$; the set $\mathcal T^n_{W,\delta}(x^n)$ denotes the set of $\delta$-conditionally typical sequences with respect to the conditional distribution $W:\mathcal X\to\mathcal P(\mathcal Y)$ and the sequence $x^n\in\mathcal X^n$; $p_{x^n}$ denotes the empirical distribution of the sequence $x^n$.
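The notation can be made concrete with a short Python sketch using exact fractions. The total variation distance follows the definition above (a plain sum, without a factor 1/2). The δ-typicality test is an assumption, since the text does not spell it out: it uses the common convention that every letter frequency lies within δ of $p$ and that no letter of zero probability occurs.

```python
from collections import Counter
from fractions import Fraction


def tv_distance(p, q):
    """||p - q||_TV = sum_x |p(x) - q(x)| (no factor 1/2, as in the Notation)."""
    support = set(p) | set(q)
    return sum((abs(Fraction(p.get(a, 0)) - Fraction(q.get(a, 0))) for a in support), Fraction(0))


def empirical_distribution(xn):
    """p_{x^n}(a) = N(a | x^n) / n."""
    n = len(xn)
    return {a: Fraction(c, n) for a, c in Counter(xn).items()}


def is_typical(xn, p, delta):
    """Assumed convention: |p_{x^n}(a) - p(a)| <= delta for every letter a,
    and letters with p(a) = 0 never occur in x^n."""
    emp = empirical_distribution(xn)
    if any(Fraction(p.get(a, 0)) == 0 for a in emp):
        return False
    return all(abs(emp.get(a, Fraction(0)) - Fraction(pa)) <= delta for a, pa in p.items())


p = {0: Fraction(1, 2), 1: Fraction(1, 2)}
q = {0: Fraction(3, 4), 1: Fraction(1, 4)}
xn = [0, 1, 1, 0, 0, 1, 0, 0]

print(tv_distance(p, q))                  # 1/2
print(empirical_distribution(xn))         # {0: Fraction(5, 8), 1: Fraction(3, 8)}
print(is_typical(xn, p, Fraction(1, 8)))  # True
```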

Information Theoretic Model for Biometric Authentication
Let $\mathcal X$ and $\mathcal Y$ be two finite alphabets. Let $(x^n,y^n)\in\mathcal X^n\times\mathcal Y^n$ be a pair of biometric sequences of length $n\in\mathbb N$; then, the discrete memoryless joint-source is given by the joint probability distribution $Q^n(x^n,y^n)=\prod_{i=1}^n Q(x_i,y_i)$ with $Q\in\mathcal P(\mathcal X\times\mathcal Y)$. This models perfect SSI, i.e., all possible measurements are generated by the discrete memoryless joint-source $Q$, which is perfectly known at both the enrollment and the authentication terminal.

Generated Secret Key Model
The information theoretic authentication model consists of a discrete memoryless joint-source $Q$, which represents the biometric measurement source, and two terminals, the enrollment terminal and the authentication terminal, as shown in Figure 1. At the enrollment terminal, the enrollment sequence $X^n$ is observed, and the secret key $K$ and the helper data $M$ are generated. At the authentication terminal, the authentication sequence $Y^n$ is observed, and an estimate $\hat K$ of the secret key is computed from $Y^n$ and the helper data $M$. Since the helper data is stored in a public database, it should not reveal anything about the secret key $K$ and as little as possible about the enrollment measurement $X^n$. Moreover, the distribution of the key must be close to uniform.

Figure 1. The biometric measurements $X^n$ and $Y^n$ are observed at the enrollment and the authentication terminal, respectively. At the enrollment terminal, the key $K$ and the helper data $M$ are generated. The helper data is public; hence, the eavesdropper also has access to it. At the authentication terminal, an estimate $\hat K$ of the key is computed from the observed biometric measurement $Y^n$ and the helper data $M$.
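We consider block processing of arbitrary but fixed length $n$. Let $\mathcal M:=\{1,\dots,M_n\}$ be the helper data set and $\mathcal K:=\{1,\dots,K_n\}$ the secret key set.
Definition 1. An $(n,M_n,K_n)$-code for generated secret key authentication for the joint-source $Q\in\mathcal P(\mathcal X\times\mathcal Y)$ consists of an encoder $f$ at the enrollment terminal with $f:\mathcal X^n\to\mathcal K\times\mathcal M$ and a decoder $\varphi$ at the authentication terminal with $\varphi:\mathcal Y^n\times\mathcal M\to\mathcal K$.
Definition 2. A privacy secrecy rate pair $(R_{PL},R_K)\in\mathbb R^2_+$ is called achievable for generated secret key authentication for a joint-source $Q$ if, for any $\delta>0$, there exist an $n(\delta)\in\mathbb N$ and a sequence of $(n,M_n,K_n)$-codes such that, for all $n\ge n(\delta)$, we have
$$\Pr\{\hat K\neq K\}\le\delta, \tag{1a}$$
$$\frac{1}{n}H(K)+\delta\ \ge\ \frac{1}{n}\log K_n\ \ge\ R_K-\delta, \tag{1b}$$
$$\frac{1}{n}I(K;M)\le\delta, \tag{1c}$$
$$\frac{1}{n}I(X^n;M)\le R_{PL}+\delta. \tag{1d}$$
Remark 2. Condition (1b) requires the key distribution $p_K$ to be close to the uniform distribution $p_{\tilde K}$, where $\tilde K$ is a random variable uniformly distributed over the key set $\mathcal K$. By (1b), we have $\frac{1}{n}D(p_K\|p_{\tilde K})=\frac{1}{n}\big(\log K_n-H(K)\big)\le\delta$. Hence, for small $\delta$, both distributions are close to each other.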
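The identity $D(p_K\|p_{\tilde K})=\log K_n-H(K)$, which links the uniformity of the key to its entropy in Remark 2, can be checked numerically. The sketch below uses a hypothetical key distribution on $K_n=4$ values, a hypothetical blocklength, and base-2 logarithms.

```python
import math


def entropy_bits(pk):
    """H(K) in bits."""
    return -sum(p * math.log2(p) for p in pk if p > 0)


def divergence_to_uniform_bits(pk):
    """D(p_K || uniform on K_n values) in bits, computed directly from its definition."""
    kn = len(pk)
    return sum(p * math.log2(p * kn) for p in pk if p > 0)


n = 4                        # hypothetical blocklength
pk = [0.4, 0.2, 0.2, 0.2]    # hypothetical key distribution, K_n = 4
kn = len(pk)

d = divergence_to_uniform_bits(pk)
gap = math.log2(kn) - entropy_bits(pk)
print(round(d, 6), round(gap, 6))  # the two quantities coincide
print(d / n)                       # normalized gap (1/n) D(p_K || p_uniform)
```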
Remark 3. Condition (1a) stands for reliable authentication; by (1c), the information about the key leaked by the helper data is negligible; and by (1d), the information about the biometric measurements leaked by the helper data is bounded by the privacy leakage rate $R_{PL}$.

Definition 3. The set of all achievable privacy secrecy rate pairs for generated key authentication is called the privacy secrecy capacity region and is denoted by $C_G(Q)$.
We next present the privacy secrecy capacity region for the generated key authentication for the joint-source Q, which was first established in [11,12].
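To do so, for an auxiliary random variable $U$ with alphabet size $|\mathcal U|\le|\mathcal X|+1$ and a channel $V:\mathcal X\to\mathcal P(\mathcal U)$, such that $U-X-Y$ forms a Markov chain with joint distribution $V(u|x)Q(x,y)$, we define the region $\mathcal R(V)$ as the set of all $(R_{PL},R_K)\in\mathbb R^2_+$ that satisfy
$$R_K\le I(U;Y),\qquad R_{PL}\ge I(U;X)-I(U;Y).$$
Theorem 1 ([11,12]). The privacy secrecy capacity region for generated key authentication is given by
$$C_G(Q)=\bigcup_{V:\mathcal X\to\mathcal P(\mathcal U)}\mathcal R(V).$$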
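For a concrete instance, the sketch below evaluates the corner point $(I(U;X)-I(U;Y),\ I(U;Y))$ of the region of [11,12], i.e., $R_K\le I(U;Y)$ and $R_{PL}\ge I(U;X)-I(U;Y)$ for a Markov chain $U-X-Y$. The source (uniform $X$, $Y$ obtained through a binary symmetric channel with crossover 0.1) and the test channel $V$ (a binary symmetric channel with crossover 0.2) are hypothetical choices; for them the closed forms $1-h(0.26)$ and $h(0.26)-h(0.2)$ are known and serve as a check.

```python
import math


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def mutual_information(joint):
    """I(A;B) in bits from a joint distribution given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0)


def corner_point(p_x, W, V):
    """(R_PL, R_K) = (I(U;X) - I(U;Y), I(U;Y)) with p(u, x, y) = p_x(x) V(u|x) W(y|x)."""
    joint_ux, joint_uy = {}, {}
    for x, px in p_x.items():
        for u, v in V[x].items():
            joint_ux[(u, x)] = joint_ux.get((u, x), 0.0) + px * v
            for y, w in W[x].items():
                joint_uy[(u, y)] = joint_uy.get((u, y), 0.0) + px * v * w
    i_uy = mutual_information(joint_uy)
    return mutual_information(joint_ux) - i_uy, i_uy


def bsc(eps):
    """Binary symmetric channel as {input: {output: probability}}."""
    return {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}


p_x = {0: 0.5, 1: 0.5}
r_pl, r_k = corner_point(p_x, W=bsc(0.1), V=bsc(0.2))
```

Sweeping the crossover of $V$ traces the boundary of the union in Theorem 1 for this source, within the class of binary symmetric test channels.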

Chosen Secret Key Model
In this section, we study the authentication model for systems in which the secret key is chosen beforehand. At the enrollment terminal, a secret key $K$ is chosen uniformly at random and independently of the biometric measurements. The secret key $K$ is bound to the biometric measurement $X^n$, and, based on this, the helper data $M$ is generated, as shown in Figure 2. At the authentication terminal, the authentication measurement $Y^n$ is observed, and an estimate $\hat K$ of the secret key is computed from $Y^n$ and the helper data $M$. Since the helper data is stored in a public database, it should not reveal anything about the secret key, and the information it leaks about the enrollment sequence $X^n$ should be minimized. At the same time, the authentication terminal must be able to reconstruct $K$. To achieve this, a masking layer based on the one-time pad principle is used.
Figure 2. The biometric sequences $X^n$ and $Y^n$ are observed at the enrollment and the authentication terminal, respectively. At the enrollment terminal, the helper data $M$ is generated for a given secret key $K$. The helper data is public; hence, the eavesdropper also has access to it. At the authentication terminal, an estimate $\hat K$ of the key is computed from the observed biometric authentication sequence $Y^n$ and the helper data $M$.
The masking layer, which uses the uniformly distributed chosen secret key $K$, is added on top of generated secret key authentication. At the enrollment terminal, a secret key $K_g$ and helper data $M$ are generated. The chosen key $K$ is added modulo $|\mathcal K|$ to the generated key $K_g$, and the result is stored together with the helper data as additional helper data, i.e., $M'=(M,K\oplus K_g)$. At the authentication terminal, an estimate $\hat K_g$ of the generated secret key is computed from $Y^n$ and $M$, and the chosen key is estimated as $\hat K=(K\oplus K_g)\ominus\hat K_g$, where $\ominus$ denotes subtraction modulo $|\mathcal K|$.
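The masking layer can be sketched in a few lines of Python, assuming keys are indexed by $\{0,\dots,|\mathcal K|-1\}$ instead of $\{1,\dots,K_n\}$ and that $\oplus$ is addition modulo $|\mathcal K|$. The generated key is drawn at random here only as a stand-in for $K_g$; in the model it is extracted from $X^n$.

```python
import secrets


def mask(chosen_key: int, generated_key: int, key_set_size: int) -> int:
    """Public value K (+) K_g, with (+) read as addition modulo |K|."""
    return (chosen_key + generated_key) % key_set_size


def unmask(masked: int, generated_key_estimate: int, key_set_size: int) -> int:
    """Recover the chosen key by subtracting the estimate of K_g modulo |K|."""
    return (masked - generated_key_estimate) % key_set_size


key_set_size = 16                              # hypothetical |K|
chosen = secrets.randbelow(key_set_size)       # chosen key K, uniform
generated = secrets.randbelow(key_set_size)    # stand-in for the generated key K_g

public = mask(chosen, generated, key_set_size)
assert unmask(public, generated, key_set_size) == chosen  # correct estimate of K_g recovers K

# One-time pad property: for every fixed K, a uniform K_g makes K (+) K_g uniform.
for k in range(key_set_size):
    assert sorted(mask(k, g, key_set_size) for g in range(key_set_size)) == list(range(key_set_size))
```

The loop illustrates why the masked key stored in $M'$ reveals nothing about $K$ when $K_g$ is uniform and independent of $K$; any residual leakage then comes from the dependence between $K_g$ and $M$.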
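We consider block processing of arbitrary but fixed length $n$. Let $\mathcal M:=\{1,\dots,M_n\}$ be the helper data set and $\mathcal K:=\{1,\dots,K_n\}$ the secret key set.
Definition 4. An $(n,M_n,K_n)$-code for chosen secret key authentication for the joint-source $Q\in\mathcal P(\mathcal X\times\mathcal Y)$ consists of an encoder $f$ at the enrollment terminal with $f:\mathcal K\times\mathcal X^n\to\mathcal M$ and a decoder $\varphi$ at the authentication terminal with $\varphi:\mathcal Y^n\times\mathcal M\to\mathcal K$.
Definition 5. A privacy secrecy rate pair $(R_{PL},R_K)\in\mathbb R^2_+$ for chosen secret key authentication is called achievable for a joint-source $Q$ if, for any $\delta>0$, there exist an $n(\delta)\in\mathbb N$ and a sequence of $(n,M_n,K_n)$-codes such that, for all $n\ge n(\delta)$, we have
$$\Pr\{\hat K\neq K\}\le\delta,\qquad \frac{1}{n}\log K_n\ge R_K-\delta,\qquad \frac{1}{n}I(K;M)\le\delta,\qquad \frac{1}{n}I(X^n;M)\le R_{PL}+\delta.$$
Remark 4. The difference between Definitions 5 and 2 is that, here, the uniformity of the key is already guaranteed, since the key is chosen uniformly at random.
Definition 6. The set of all achievable privacy secrecy rate pairs for chosen secret key authentication for the joint-source $Q\in\mathcal P(\mathcal X\times\mathcal Y)$ is called the privacy secrecy capacity region and is denoted by $C_C(Q)$.
We next present the privacy secrecy capacity region for chosen secret key authentication for the joint-source $Q$, as shown in [11].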
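Theorem 2 ([11]). The privacy secrecy capacity region for chosen secret key authentication is given by
$$C_C(Q)=\bigcup_{V:\mathcal X\to\mathcal P(\mathcal U)}\Big\{(R_{PL},R_K)\in\mathbb R^2_+:\ R_K\le I(U;Y),\ R_{PL}\ge I(U;X)-I(U;Y)\Big\},$$
where $U-X-Y$ forms a Markov chain with joint distribution $V(u|x)Q(x,y)$ and $|\mathcal U|\le|\mathcal X|+1$.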

Authentication for Compound Sources
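Let $\mathcal X$ and $\mathcal Y$ be two finite sets and $\mathcal S$ a finite state set. Let $(x^n,y^n)\in\mathcal X^n\times\mathcal Y^n$ be a sequence pair of length $n\in\mathbb N$. For every $s\in\mathcal S$, the discrete memoryless joint-source is given by the joint probability distribution
$$Q_s^n(x^n,y^n)=\prod_{i=1}^n p_s(x_i)W_s(y_i|x_i),$$
with $p_s\in\mathcal P(\mathcal X)$ a marginal distribution on $\mathcal X$ and $W_s:\mathcal X\to\mathcal P(\mathcal Y)$ a stochastic matrix.
Definition 7. The discrete memoryless compound joint-source $\mathcal Q_{XY}$ is given by the family of joint probability distributions on $\mathcal X\times\mathcal Y$
$$\mathcal Q_{XY}:=\{Q_s=p_sW_s:\ s\in\mathcal S\}.$$
We define the finite set of marginal distributions $\mathcal Q_X$ over the alphabet $\mathcal X$ from the compound joint-source $\mathcal Q_{XY}$ as $\mathcal Q_X:=\{p_s:\ s\in\mathcal S\}=\{p_\ell:\ \ell\in\mathcal L\}$, where $\mathcal L$ indexes the distinct marginal distributions. For every $\ell\in\mathcal L$, we define the subset of the compound joint-source $\mathcal Q_{XY}$ with the same marginal distribution $p_\ell$ as $\mathcal Q_{XY,\ell}:=\{Q_s\in\mathcal Q_{XY}:\ p_s=p_\ell\}$. For every $\ell\in\mathcal L$, we define the index set $\mathcal S_\ell$ of $\mathcal Q_{XY,\ell}$ as $\mathcal S_\ell:=\{s\in\mathcal S:\ p_s=p_\ell\}$.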
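The grouping behind $\mathcal Q_X$, the subsets $\mathcal Q_{XY,\ell}$ and the index sets $\mathcal S_\ell$ can be sketched in Python: compute the $\mathcal X$-marginal of every state and group states with equal marginals. The three-state compound source below is hypothetical; states s1 and s2 share a uniform marginal, so $|\mathcal L|=2$.

```python
from fractions import Fraction as F


def marginal(joint):
    """p_s(x) = sum_y Q_s(x, y), returned as a hashable sorted tuple of (x, p_s(x))."""
    p_x = {}
    for (x, _y), prob in joint.items():
        p_x[x] = p_x.get(x, 0) + prob
    return tuple(sorted(p_x.items()))


def index_sets(compound):
    """Group states by their marginal: returns {p_l: S_l} with S_l = {s : p_s = p_l}."""
    groups = {}
    for s, joint in compound.items():
        groups.setdefault(marginal(joint), []).append(s)
    return groups


compound = {
    "s1": {(0, 0): F(3, 8), (0, 1): F(1, 8), (1, 0): F(1, 8), (1, 1): F(3, 8)},
    "s2": {(0, 0): F(7, 16), (0, 1): F(1, 16), (1, 0): F(1, 16), (1, 1): F(7, 16)},
    "s3": {(0, 0): F(3, 4), (1, 1): F(1, 4)},
}
groups = index_sets(compound)  # keys form Q_X, values are the index sets S_l
```

Exact fractions avoid spurious splits of a set $\mathcal S_\ell$ caused by floating-point rounding when comparing marginals.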

Compound Generated Secret Key Model
In this section, we study generated secret key authentication for finite compound joint-sources, a special class of sources that model limited SSI, as shown in Figure 3.

Figure 3. The attacker controls the state $s\in\mathcal S$ of the source. The biometric sequences $X^n$ and $Y^n$ are observed at the enrollment and the authentication terminal, respectively. At the enrollment terminal, the key $K$ and the helper data $M$ are generated. The helper data is public; hence, the attacker also has access to it. At the authentication terminal, an estimate $\hat K$ of the key is computed from the observed authentication sequence $Y^n$ and the helper data $M$.
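We consider block processing of arbitrary but fixed length $n$. Let $\mathcal M:=\{1,\dots,M_n\}$ be the helper data set and $\mathcal K:=\{1,\dots,K_n\}$ the secret key set.
Definition 8. An $(n,M_n,K_n)$-code for generated secret key authentication for the compound joint-source $\mathcal Q_{XY}\subset\mathcal P(\mathcal X\times\mathcal Y)$ consists of an encoder $f$ at the enrollment terminal with $f:\mathcal X^n\to\mathcal K\times\mathcal M$ and a decoder $\varphi$ at the authentication terminal with $\varphi:\mathcal Y^n\times\mathcal M\to\mathcal K$.
Definition 9. A privacy secrecy rate pair $(R_{PL},R_K)\in\mathbb R^2_+$ is called achievable for generated secret key authentication for the compound joint-source $\mathcal Q_{XY}$ if, for any $\delta>0$, there exist an $n(\delta)\in\mathbb N$ and a sequence of $(n,M_n,K_n)$-codes such that, for all $n\ge n(\delta)$ and for every $s\in\mathcal S$, we have
$$\Pr_s\{\hat K\neq K\}\le\delta,\qquad \frac{1}{n}H_s(K)+\delta\ge\frac{1}{n}\log K_n\ge R_K-\delta,\qquad I_s(K;M)\le\delta,\qquad \frac{1}{n}I_s(X^n;M)\le R_{PL}+\delta,$$
where the subscript $s$ indicates that the quantity is evaluated under $Q_s$.
Consider the compound joint-source $\mathcal Q_{XY}$. For a fixed $\ell\in\mathcal L$, $V:\mathcal X\to\mathcal P(\mathcal U)$ and for every $s\in\mathcal S_\ell$, we define the region $\mathcal R(V,\ell,s)$ as the set of all $(R_{PL},R_K)\in\mathbb R^2_+$ that satisfy
$$R_K\le I(U_\ell;Y_s),\qquad R_{PL}\ge I(U_\ell;X_\ell)-I(U_\ell;Y_s),$$
where $(U_\ell,X_\ell,Y_s)$ is distributed according to $V(u|x)p_\ell(x)W_s(y|x)$.
Theorem 3. The privacy secrecy capacity region for generated secret key authentication for the compound joint-source $\mathcal Q_{XY}$ is given by
$$C_G(\mathcal Q_{XY})=\bigcap_{\ell\in\mathcal L}\ \bigcup_{V:\mathcal X\to\mathcal P(\mathcal U)}\ \bigcap_{s\in\mathcal S_\ell}\mathcal R(V,\ell,s).$$
Proof. The proof of Theorem 3 consists of two parts: achievability and converse. The achievability scheme uses the following protocol:
• Estimate the marginal distribution $p_{\hat\ell}\in\mathcal Q_X$ from the observed sequence $X^n$ at the enrollment terminal via hypothesis testing.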

• Compute the key $K$ and the helper data $M$ from $X^n$, a sequence $T=U^n$ shared by the enrollment and the authentication terminal, and an extractor function $g:\{0,1\}^N\times\{0,1\}^d\to\{0,1\}^k$ with $N,d,k\in\mathbb N$, whose inputs are the shared sequence $T$ and a sequence $U^d$ of $d$ uniformly distributed bits. The helper data $M$ is constructed as in the case with perfect SSI. The extended helper data additionally contains the index $\hat\ell$ of the estimated marginal distribution and the uniformly distributed bit sequence, i.e., $M'=(M,\hat\ell,U^d)$.

• Store the extended helper data $M'$ in the public database.

• Estimate the key $K$ at the authentication terminal from the observations $M'$ and $Y^n$, where $Y^n$ can be seen as the output of one of the channels in $\mathcal W_{\hat\ell}:=\{W_s:\mathcal X\to\mathcal P(\mathcal Y):\ s\in\mathcal S_{\hat\ell}\}$.
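The protocol leaves the extractor $g$ abstract. One standard way to build a seeded extractor whose seed can be made public (as $U^d$ is, being part of the helper data) is universal hashing in the spirit of the leftover hash lemma. The sketch below assumes a random binary Toeplitz matrix for this purpose; this is an illustrative choice, not necessarily the construction used in Appendix A, and it also assumes that the shared sequence $T$ has already been mapped to $N$ bits.

```python
import secrets


def toeplitz_extract(t_bits, seed_bits, k):
    """Hypothetical extractor g(T, U^d): multiply the N-bit input T by a k x N
    binary Toeplitz matrix over GF(2) built from d = N + k - 1 seed bits,
    with matrix entry (i, j) = seed_bits[i - j + N - 1]."""
    n_bits = len(t_bits)
    if len(seed_bits) != n_bits + k - 1:
        raise ValueError("the seed must contain N + k - 1 bits")
    key = []
    for i in range(k):
        bit = 0
        for j in range(n_bits):
            bit ^= seed_bits[i - j + n_bits - 1] & t_bits[j]  # GF(2) inner product
        key.append(bit)
    return key


N, k = 16, 4                                              # hypothetical sizes
t = [secrets.randbelow(2) for _ in range(N)]              # stand-in for the shared sequence T
seed = [secrets.randbelow(2) for _ in range(N + k - 1)]   # public uniform bits U^d
key_bits = toeplitz_extract(t, seed, k)
```

A Toeplitz matrix is fixed by its first row and column, so only $d=N+k-1$ seed bits are needed instead of $kN$ for a fully random matrix.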
A detailed proof can be found in Appendix A.
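For fixed $\ell$ and $V$, the inner intersection over $s\in\mathcal S_\ell$ in Theorem 3 is governed by the worst state: $R_K\le\min_{s}I(U_\ell;Y_s)$ and $R_{PL}\ge I(U_\ell;X_\ell)-\min_{s}I(U_\ell;Y_s)$. The sketch below evaluates this corner point for a hypothetical example with a uniform binary marginal $p_\ell$, test channel $V$ a binary symmetric channel with crossover $q$, and states $W_s$ that are binary symmetric channels; for this example the closed forms $I(U_\ell;X_\ell)=1-h(q)$ and $I(U_\ell;Y_s)=1-h(q\ast p_s)$ hold.

```python
import math


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def cascade(a, b):
    """Crossover probability of two cascaded BSCs: a(1-b) + b(1-a)."""
    return a * (1 - b) + b * (1 - a)


def compound_corner(q, crossovers):
    """Corner point (R_PL, R_K) of the intersection over s of R(V, l, s) for a uniform
    binary marginal p_l, V = BSC(q) and W_s = BSC(p_s) with p_s in crossovers."""
    i_ux = 1 - h2(q)                                             # I(U_l; X_l)
    i_uy_worst = min(1 - h2(cascade(q, p)) for p in crossovers)  # min over s of I(U_l; Y_s)
    return i_ux - i_uy_worst, i_uy_worst


r_pl, r_k = compound_corner(0.2, [0.1, 0.2])          # two hypothetical states in S_l
r_pl_single, r_k_single = compound_corner(0.2, [0.1])  # only the better state
```

Adding the noisier state lowers the achievable key rate and raises the required privacy leakage rate, which is the price of not knowing $s$ at either terminal.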
Remark 6. Note that authentication for compound sources generalizes the models studied in [11,12], which correspond to $|\mathcal S|=1$. Furthermore, for $|\mathcal S|=1$, the capacity region under the strong secrecy condition equals the capacity region under the weak secrecy condition shown in [11,12].

Remark 7. As already mentioned, we aim for strong secrecy, i.e., in contrast to the weak secrecy constraint in (1c), we now require the un-normalized mutual information between the key and the helper data to be negligibly small. Ideally, one would show perfect secrecy and a perfectly uniform key, i.e., $I(K;M)=0$ and $H(K)=\log K_n$. It would be interesting to see how these constraints affect the achievable rate region. We suspect that the achievable rate region under perfect secrecy and a perfectly uniform key remains the same as in Theorem 3.

Remark 8. From the protocol, note that once the marginal distribution $p_{\hat\ell}\in\mathcal Q_X$ has been estimated, we deal with a compound channel model without channel state information (CSI) at the transmitter (see [24]).
Remark 9. The order of the set operations in the capacity region reflects the fact that the marginal distribution is estimated first. This can be seen as partial state information, where the marginal distribution over $\mathcal X$ is known.

Compound Chosen Secret Key Model
In this section, we study chosen secret key authentication for finite compound joint-sources (see Figure 4).

Figure 4. The attacker controls the state $s\in\mathcal S$ of the source. The biometric sequences $X^n$ and $Y^n$ are observed at the enrollment and the authentication terminal, respectively. At the enrollment terminal, the key $K$ is predefined and the helper data $M$ is generated. The helper data is public; hence, the attacker also has access to it. At the authentication terminal, an estimate $\hat K$ of the key is computed from the observed authentication sequence $Y^n$ and the helper data $M$.
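We consider an $(n,M_n,K_n)$-code for chosen secret key authentication of arbitrary but fixed length $n$.
Definition 10. A privacy secrecy rate pair $(R_{PL},R_K)\in\mathbb R^2_+$ is called achievable for chosen secret key authentication for the compound joint-source $\mathcal Q_{XY}$ if, for any $\delta>0$, there exist an $n(\delta)\in\mathbb N$ and a sequence of $(n,M_n,K_n)$-codes such that, for all $n\ge n(\delta)$ and for every $s\in\mathcal S$, we have
$$\Pr_s\{\hat K\neq K\}\le\delta,\qquad \frac{1}{n}\log K_n\ge R_K-\delta,\qquad I_s(K;M)\le\delta,\qquad \frac{1}{n}I_s(X^n;M)\le R_{PL}+\delta.$$
Consider the compound joint-source $\mathcal Q_{XY}$. For a fixed $\ell\in\mathcal L$, $V:\mathcal X\to\mathcal P(\mathcal U)$ and for every $s\in\mathcal S_\ell$, we define the region $\mathcal R(V,\ell,s)$ as the set of all $(R_{PL},R_K)\in\mathbb R^2_+$ that satisfy
$$R_K\le I(U_\ell;Y_s),\qquad R_{PL}\ge I(U_\ell;X_\ell)-I(U_\ell;Y_s),$$
where $(U_\ell,X_\ell,Y_s)$ is distributed according to $V(u|x)p_\ell(x)W_s(y|x)$.
Theorem 4. The privacy secrecy capacity region for chosen secret key authentication for the compound joint-source $\mathcal Q_{XY}$ is given by
$$C_C(\mathcal Q_{XY})=\bigcap_{\ell\in\mathcal L}\ \bigcup_{V:\mathcal X\to\mathcal P(\mathcal U)}\ \bigcap_{s\in\mathcal S_\ell}\mathcal R(V,\ell,s).$$
Proof. The proof can be found in Appendix B.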
Remark 10. Note that, as for generated secret key authentication, chosen secret key authentication for compound sources generalizes the models studied in [11]. Furthermore, for perfect SSI, the capacity region under the strong secrecy condition equals the capacity region under the weak secrecy condition shown in [11].
Remark 11. Note that the privacy secrecy capacity region for generated secret key authentication equals the privacy secrecy capacity region for chosen secret key authentication, i.e., $C_G(\mathcal Q_{XY})=C_C(\mathcal Q_{XY})$.

Continuity of the Privacy Secrecy Capacity Region for Compound Sources
We are interested in studying how small variations in the compound source affect the privacy secrecy capacity region. Whether the capacity or the capacity region is a continuous function of the source or channel is not always easy to answer, especially if the source or channel model is complicated. In [22], one can find an example of an AVWC whose uncertainty set consists of only two channels and whose unassisted secrecy capacity already has discontinuity points. For a detailed discussion, see [25]. In this section, we study the continuity of the privacy secrecy capacity region for compound sources. For this purpose, we introduce distances between compound sources and between capacity regions.

Distance between Compound Sources
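Definition 11. Let $\mathcal Q_{XY,1}=\{Q_{s_1}:\ s_1\in\mathcal S_1\}$ and $\mathcal Q_{XY,2}=\{Q_{s_2}:\ s_2\in\mathcal S_2\}$ be two compound sources. We define their distance as
$$d(\mathcal Q_{XY,1},\mathcal Q_{XY,2}):=\max\Big\{\max_{s_1\in\mathcal S_1}\min_{s_2\in\mathcal S_2}\|Q_{s_1}-Q_{s_2}\|_{TV},\ \max_{s_2\in\mathcal S_2}\min_{s_1\in\mathcal S_1}\|Q_{s_1}-Q_{s_2}\|_{TV}\Big\}.$$
Definition 12. Let $\mathcal R_1$ and $\mathcal R_2$ be two non-empty subsets of the metric space $(\mathbb R^2,d)$, with $d$ the Euclidean distance. We define the distance between the two sets as
$$d_H(\mathcal R_1,\mathcal R_2):=\max\Big\{\sup_{r_1\in\mathcal R_1}\inf_{r_2\in\mathcal R_2}d(r_1,r_2),\ \sup_{r_2\in\mathcal R_2}\inf_{r_1\in\mathcal R_1}d(r_1,r_2)\Big\}.$$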
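One natural way to measure the distance between finite compound sources, assumed here for illustration, is the Hausdorff distance induced by the total variation distance of the Notation paragraph; the same construction with the Euclidean metric compares sets of rate pairs. The Python sketch below handles finite sets only; capacity regions are infinite sets and would have to be discretized first. The two compound sources are hypothetical.

```python
import math


def tv(p, q):
    """||p - q||_TV = sum over the support of |p(a) - q(a)| (no factor 1/2)."""
    support = set(p) | set(q)
    return sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def hausdorff(set1, set2, dist):
    """Hausdorff distance between two finite, non-empty sets under the metric dist."""
    d12 = max(min(dist(a, b) for b in set2) for a in set1)
    d21 = max(min(dist(a, b) for a in set1) for b in set2)
    return max(d12, d21)


# Two hypothetical compound sources: joint distributions on {0,1} x {0,1}
q_a = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
q_b = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.40}
compound_1, compound_2 = [q_a, q_b], [q_a]
d_sources = hausdorff(compound_1, compound_2, tv)

# The same function on finite sets of rate pairs in R^2 with the Euclidean metric
d_regions = hausdorff([(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0)], math.dist)
```

Both directions of the max-min are needed: here every state of `compound_2` is matched exactly, but the extra state `q_b` in `compound_1` is at distance 0.2 from its nearest counterpart.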

Continuity of the Privacy Secrecy Capacity Region
Theorem 5. Let $\epsilon\in(0,1)$ and $n\in\mathbb N$. Let $\mathcal Q_{XY,1}$ and $\mathcal Q_{XY,2}$ be two compound sources. If
Remark 12. Note that, since the privacy secrecy capacity region for the generated secret key equals the privacy secrecy capacity region for the chosen secret key, the continuity behaviour also holds for the chosen secret key privacy secrecy capacity region.
Remark 13.This theorem shows that the privacy secrecy capacity region is a continuous function of the uncertainty set.In other words, small variations of the uncertainty set lead to small variations in the capacity region.
Proof.A detailed proof can be found in Appendix C.

Remark 14. A complete characterisation of the discontinuity behaviour of the AVC capacity under list decoding can be found in [26]. By Theorem 5, such behaviour cannot occur for the privacy secrecy capacity region of compound sources.

Conclusions
In this paper, we considered a biometric authentication model in the presence of source uncertainty. In particular, we studied a model in which the actual source realization is not known but belongs to a known source set: the finite compound source model. We have shown that biometric authentication is robust against source uncertainty and certain classes of attacks; in other words, reliable and secure authentication is possible at positive key rates. We further characterized the minimum privacy leakage rate under source uncertainty. For future work, perfect secrecy for the biometric authentication model and compound sources with infinite uncertainty sets are of great interest.
The test function is the indicator function $\mathbb 1[x^n\in\mathcal T^n_{p_\ell,\delta}]$, i.e., after observing $x^n$, the test looks for the hypothesis $p_{\hat\ell}=p_\ell$ for which $\mathbb 1[x^n\in\mathcal T^n_{p_\ell,\delta}]=1$. An error occurs if the sequence $x^n$ was generated by the source $p_\ell$ for some $\ell\in\mathcal L$ but $x^n\notin\mathcal T^n_{p_\ell,\delta}$. This implies that either $x^n\notin\bigcup_{\ell'\in\mathcal L}\mathcal T^n_{p_{\ell'},\delta}$ or $x^n\in\mathcal T^n_{p_{\ell'},\delta}$ with $\ell'\neq\ell$. Using Lemma 2.12 in [27], we upper bound the probability of this error event by (A1). Letting $n\to\infty$, the right-hand side of (A1) tends to zero.
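The typicality-based hypothesis test can be sketched in Python. Two assumptions are made: δ-typicality means that every letter frequency lies within δ of $p_\ell$ (and zero-probability letters never occur), and δ is smaller than half the largest per-letter gap between any two candidate marginals, so the typical sets are disjoint and at most one hypothesis can pass. The candidate set $\mathcal Q_X$ below is hypothetical.

```python
import random
from collections import Counter


def is_typical(xn, p, delta):
    """Assumed convention: |p_{x^n}(a) - p(a)| <= delta for all a; zero-probability letters never occur."""
    n = len(xn)
    counts = Counter(xn)
    if any(p.get(a, 0.0) == 0.0 for a in counts):
        return False
    return all(abs(counts.get(a, 0) / n - pa) <= delta for a, pa in p.items())


def estimate_marginal(xn, marginals, delta):
    """Return the index l_hat of the candidate p_l whose typical set contains x^n, or None."""
    for ell, p in enumerate(marginals):
        if is_typical(xn, p, delta):
            return ell
    return None


marginals = [{0: 0.5, 1: 0.5}, {0: 0.8, 1: 0.2}]  # hypothetical Q_X with |L| = 2
rng = random.Random(1)
n, delta, true_ell = 2000, 0.05, 1
p_true = marginals[true_ell]
xn = rng.choices(list(p_true), weights=list(p_true.values()), k=n)
ell_hat = estimate_marginal(xn, marginals, delta)  # equals true_ell with high probability
```

Returning `None` corresponds to the error event in which $x^n$ lies in none of the typical sets.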

Code Construction
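For each $\ell\in\mathcal L$, we consider the auxiliary random variable $U$ and the channel $V$ and construct a code for which we analyze the decoding error, secrecy and privacy conditions. Generate codewords $U^n_{k,m}$ with $k\in\mathcal K:=\{1,\dots,2^{nR_K}\}$ and $m\in\mathcal M:=\{1,\dots,2^{nR_M}\}$ by choosing each symbol $U^i_{k,m}$ of the codebook independently at random according to $p_{u,\ell}\in\mathcal P(\mathcal U)$, computed as $p_{u,\ell}(u)=\sum_{x\in\mathcal X}p_\ell(x)V(u|x)$. We denote the codebook by $\tilde{\mathcal U}=\{U^n_{k,m}\}_{(k,m)\in\mathcal K\times\mathcal M}$. For every $\ell\in\mathcal L$ and every $s\in\mathcal S_\ell$, we define the channels $\Sigma^X_\ell:\mathcal U\to\mathcal P(\mathcal X)$, $\Sigma^Y_s:\mathcal U\to\mathcal P(\mathcal Y)$ and $\Sigma^{XY}_s:\mathcal U\to\mathcal P(\mathcal X\times\mathcal Y)$ that satisfy
$$\Sigma^X_\ell(x|u)=\frac{p_\ell(x)V(u|x)}{p_{u,\ell}(u)},\qquad \Sigma^{XY}_s(x,y|u)=\Sigma^X_\ell(x|u)W_s(y|x),\qquad \Sigma^Y_s(y|u)=\sum_{x\in\mathcal X}\Sigma^{XY}_s(x,y|u)$$
for every $(u,x,y)\in\mathcal U\times\mathcal X\times\mathcal Y$.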
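The random codebook construction can be sketched in Python for small, hypothetical parameters: first compute the distribution $p_{u,\ell}(u)=\sum_x p_\ell(x)V(u|x)$, then draw every symbol of every codeword $U^n_{k,m}$ independently from it. The sketch assumes that $nR_K$ and $nR_M$ are integers (it rounds them), and the marginal and test channel are illustrative choices.

```python
import random


def output_marginal(p_x, V):
    """p_u(u) = sum_x p_l(x) V(u|x)."""
    p_u = {}
    for x, px in p_x.items():
        for u, v in V[x].items():
            p_u[u] = p_u.get(u, 0.0) + px * v
    return p_u


def random_codebook(n, rate_k, rate_m, p_u, rng):
    """Random codebook {U^n_{k,m}}: every symbol is drawn i.i.d. according to p_u.
    Assumes n * rate_k and n * rate_m are (close to) integers."""
    num_k = 2 ** round(n * rate_k)
    num_m = 2 ** round(n * rate_m)
    symbols, weights = list(p_u), list(p_u.values())
    return {
        (k, m): tuple(rng.choices(symbols, weights=weights, k=n))
        for k in range(1, num_k + 1)
        for m in range(1, num_m + 1)
    }


p_l = {0: 0.5, 1: 0.5}                          # hypothetical marginal p_l
V = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}  # hypothetical test channel V
p_u = output_marginal(p_l, V)
codebook = random_codebook(8, 0.25, 0.25, p_u, random.Random(0))  # 4 x 4 codewords of length 8
```

Indexing codewords by the pair $(k,m)$ mirrors the binning structure: $k$ becomes the key and $m$ the helper data.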

Encoding Sets
For every $(k,m,\ell)\in\mathcal K\times\mathcal M\times\mathcal L$, we define the encoding sets $E_{k,m,\ell}(\tilde{\mathcal U})\subset\mathcal X^n$ as follows:
Remark 15. Note that, by the definition of $\delta$ and Lemma 2.10 in [27]

Decoding Sets
For every $(k,m,\ell)\in\mathcal K\times\mathcal M\times\mathcal L$, we define the decoding sets $D_k(m(\tilde{\mathcal U}),\ell)\subset\mathcal Y^n$ as follows:
Remark 16. One could consider sending some bits of the sequence $X^n$ over the public channel so that the authentication terminal could estimate the actual source realization and thus avoid the complicated decoding strategy. However, this approach would violate the strong secrecy condition.
Appendix A.1.5.Encoder-Decoder Pair Sets For every (k, m) ∈ K × M, we define the encoder-decoder pair set C k,m, ( Ũ) ∈ X n × Y n as follows: Appendix A.1.6.Error Analysis For every ∈ L, assume that the marginal distribution was estimated correctly, i.e., ˆ = .We analyze the probability of each error event separately.We denote the error at the enrollment terminal given the codebook Ũ as E,n ( Ũ).An error occurs at the enrollment terminal if, for every (k, m, ) ∈ K × M × L, the observed sequence x n does not belong to E k,m, ( Ũ), i.e., Averaging over all codebooks, from the independence of the random variables involved and from Lemma 2.13 in [27], we have The inequality (A2) follows from (1 − x) r ≤ exp(−rx), which holds for every x, r > 0.
Letting $n\to\infty$ and choosing the parameters as in (A3), the right-hand side of (A2) goes doubly exponentially fast to zero. An error at the authentication terminal occurs when $(k,m)$ was encoded at the enrollment terminal, but $\hat k\neq k$ was decoded at the authentication terminal. The set of joint observations describing this event is given by
$$C^{E}_{k,m,\ell}(\tilde{\mathcal U})^{c}=C_{k,m,\ell}(\tilde{\mathcal U})^{c}\cap\Big(E_{k,m,\ell}(\tilde{\mathcal U})\times D_k(m(\tilde{\mathcal U}),\ell)^{c}\Big)=\Big(E_{k,m,\ell}(\tilde{\mathcal U})\times D_k(m(\tilde{\mathcal U}),\ell)^{c}\Big)\cup\Big(\bigcup_{s\in\mathcal S}\mathcal T^n_{\Sigma^{XY}_s,\delta}(U^n_{k,m})\Big)^{c}.$$
We denote the error probability of this event given the codebook $\tilde{\mathcal U}$ for each correlated source $Q_t$ with $t\in\mathcal S$ by $\epsilon^t_{n,k}(\tilde{\mathcal U})$.