Cryptography 2018, 2(2), 8; doi:10.3390/cryptography2020008
Robust Secure Authentication and Data Storage with Perfect Secrecy
Institute of Theoretical Information Technology, Technical University of München, 80333 München, Germany
Author to whom correspondence should be addressed.
Received: 29 January 2018 / Accepted: 6 April 2018 / Published: 10 April 2018
We consider an authentication process that makes use of biometric data or the output of a physical unclonable function (PUF), respectively, from an information theoretical point of view. We analyse different definitions of achievability for the authentication model. For the secrecy of the key generated for authentication, these definitions differ in their requirements. In the first work on PUF based authentication, weak secrecy has been used and the corresponding capacity regions have been characterized. The disadvantages of weak secrecy are well known. The ultimate performance criteria for the key are perfect secrecy together with uniform distribution of the key. We derive the corresponding capacity region. We show that, for perfect secrecy and uniform distribution of the key, we can achieve the same rates as for weak secrecy together with a weaker requirement on the distribution of the key. In the classical works on PUF based authentication, it is assumed that the source statistics are known perfectly. This requirement is rarely met in applications. That is why the model is generalized to a compound model, taking into account source uncertainty. We also derive the capacity region for the compound model requiring perfect secrecy. Additionally, we consider results for secure storage using a biometric or PUF source that follow directly from the results for authentication. We also generalize known results for this problem by weakening the assumption concerning the distribution of the data that shall be stored. This allows us to combine source compression and secure storage.
Keywords: authentication; secure storage; perfect secrecy; privacy leakage
The present work addresses two essential practical problems concerning secrecy in information systems. The first problem is authentication in order to manage access to the system. The second problem is secure storage in public databases. Both problems are of essential importance for further development of future communication systems. The goal of this work is to derive a fundamental characterization of the possible performance of such communication systems that meets very strict secrecy requirements. We show that these strict requirements can be met without loss in performance compared to known results with weaker secrecy requirements.
Information theoretic security has become a very active field of research in information theory in the past ten years, with a large number of promising approaches. For a current presentation, see . In , the paper first introducing information theoretic security, the authors suggest requiring perfect secrecy  to guarantee security in communication. This means the data available to an attacker should be stochastically independent of the message that should be kept secret (the data and the message are modeled using random variables (RVs)). Thus, an attacker does not benefit from learning these data. In , this notion of security is weakened. The authors use weak secrecy  instead of perfect secrecy to guarantee secure communication. In many of the works on information theoretic security following , one considers weak secrecy or strong secrecy , which is yet another security requirement that is also weaker than perfect secrecy. As the name suggests, perfect secrecy is the desired ideal situation in cryptographic applications where an attacker does not get any information about the secret. Considering the roots of information theoretic security and its intuitive motivation, it is natural to require perfect secrecy for secure communications. Additionally, in , the recommendation is not to use weak secrecy as a secrecy measure. In , there is an example of a protocol that is obviously not secure, but meets the weak secrecy requirement.
The authors of the landmark paper  derive the capacity for secret key generation requiring perfect secrecy. A different model in information theoretic security has as an essential feature a biometric source or a PUF source. The outputs of biometric sources and the outputs of PUF sources both uniquely characterize a person , or a device, respectively . This property qualifies them for being used for authentication as well as for secure storage. In [7,9], the authors consider a model for authentication using the output of a biometric source. They also consider a model that can be interpreted as a model for secure storage using a biometric source. Both of these models are very similar to the model for secret key generation and for both of the models the authors require weak secrecy to hold when defining achievability.
In [6,7,9], the authors assume that the statistics of the (PUF) source are perfectly known. A simple analysis of [6,7,9] shows that the protocols for authentication constructed there heavily depend on the knowledge of the source statistics. Particularly, it is possible that small variations of the source statistics influence the reliability and secrecy of the protocols for authentication or storage, respectively. The assumption that the source statistics are perfectly known is too optimistic in applications. That is why we are interested in considering the uncertainty of the source or PUF source. We assume that we do not know the statistics of the source, but that we know a set of source statistics that contains the actual source statistic. Thus, we consider a compound version of the source model. We want to develop robust protocols that work for all source statistics in a given set. The compound model also allows us to describe an attack scenario where the attacker is able to alter the source statistics. There are relatively few results concerning compound sources. The compound version of the source model from  is considered in .
One of our contributions in the present work is the generalization of the model for authentication from , by considering authentication using a compound PUF source (or equivalently a biometric source). Additionally, our work differs from the state of the art as we consider protocols for authentication that achieve perfect secrecy.
We also consider secure data storage making use of a PUF source (or equivalently a biometric source). The corresponding information theoretic model is very similar to the second model presented in , but, in contrast to , we define achievability requiring perfect secrecy and we consider source uncertainty of the PUF source. Our considerations concerning perfect secrecy in this work answer the question posed in the conclusion of .
Some of the results for secure authentication described in this work have already been published in . Here, we additionally present the proofs that have been omitted in , i.e., the proofs of Theorem 4 and Theorem 5 and some more discussion. The results concerning secure storage have been presented in [13,14]. As these results heavily depend on , we briefly state them here (as well as the corresponding definitions).
In Section 2, we describe the authentication process and define the corresponding information theoretic model. We discuss different definitions of achievability for the model in Section 3. In this context, protocols that achieve perfect secrecy are of special interest. We develop the corresponding definition of achievability in this section. In Section 4, we prove capacity results for the model with respect to the various definitions of achievability. The main result in this section is Theorem 2. In Section 5, we generalize the model for authentication to the case with source uncertainty and define achievability for this model in Section 6. In Section 7, we derive the capacity region for the compound authentication model. In Section 8, we consider some results for secure storage that follow from our results for authentication. The key result from authentication that we use for secure storage with perfect secrecy is Theorem 2. In Section 9, we further discuss our results.
For the most part, we use the notation introduced in .
2. Authentication Model
At first, we consider authentication using biometric or PUF data. This means we consider a scenario where a user enrolls in a system by giving a certain amount of biometric or PUF data to the system. Later, when the user wants to be authenticated, he again gives biometric or PUF data to the system. The system then decides if the user is accepted, i.e., if it is the same user that is enrolled in the system. In our considerations, we assume that the system can store some data in a public database.
Figure 1 depicts the authentication process as described in . The process consists of two phases. In the first phase, the enrollment phase, the authentication system receives from the PUF source and the of a user. It generates a helper message M and a secret-key K from . It then uses a one-way function f on K and stores the result and M in a public database together with the user’s . The second phase is the authentication phase. In this phase, the system receives from the PUF source and the of a user. It reads the corresponding helper message M and from the database. From M and , it generates a secret-key . Then, the system compares and . If they are equal, the user is accepted; otherwise, the user is rejected.
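The two phases described above can be sketched in a few lines of Python. This is only a toy illustration under loudly stated assumptions, not the construction analysed in this work: the PUF readings are modeled as bit lists, the helper message is built with a code-offset step over a repetition code, and SHA-256 stands in for the one-way function f. The names `REP`, `encode`, `decode`, `one_way`, `enroll` and `authenticate` are hypothetical, and the sketch makes no claim about secrecy or privacy-leakage guarantees.

```python
import hashlib
import secrets

REP = 5  # repetition factor of the toy error-correcting code (an assumption)

def encode(key_bits):
    """Repetition code: repeat every key bit REP times."""
    return [b for b in key_bits for _ in range(REP)]

def decode(code_bits):
    """Majority vote over each block of REP bits."""
    return [int(sum(code_bits[i:i + REP]) > REP // 2)
            for i in range(0, len(code_bits), REP)]

def one_way(key_bits):
    """Stand-in for the one-way function f: SHA-256 of the key bits."""
    return hashlib.sha256(bytes(key_bits)).hexdigest()

def enroll(x):
    """Enrollment: derive a helper message M and a key K from the reading x."""
    k = [secrets.randbelow(2) for _ in range(len(x) // REP)]  # randomized encoder
    m = [xi ^ ci for xi, ci in zip(x, encode(k))]             # code-offset helper data
    return m, one_way(k)                                      # both are stored publicly

def authenticate(y, m, stored):
    """Authentication: rebuild a key estimate from the noisy reading y and M."""
    k_hat = decode([yi ^ mi for yi, mi in zip(y, m)])
    return one_way(k_hat) == stored

# Usage: the second reading differs from the first in two positions.
x = [secrets.randbelow(2) for _ in range(40)]
y = list(x)
y[3] ^= 1
y[17] ^= 1
m, stored = enroll(x)
accepted = authenticate(y, m, stored)
```

Here the two flipped positions fall into different repetition blocks, so majority voting recovers the key and the legitimate user is accepted. A reading that is the bitwise complement of x decodes to the complemented key and is rejected.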
Now, we define an information theoretic model of the authentication process. We use random variables (RVs) to model the data. In the first chapters of this work, we assume that the distribution of the RVs is perfectly known. We drop this assumption in Section 5.
Let . The authentication model consists of a discrete memoryless multiple source (DMMS) with generic variables , the (possibly randomized) encoders  , and the deterministic decoder . Let and be the output of the DMMS. The RVs M and K are generated from using Φ and Θ. The RV is generated from and M using ψ. We use the term authentication protocol for .
It is possible to define the authentication protocol in a more general way by permitting randomized decoders Ψ, but one can argue that in our definition of achievability a randomized Ψ does not improve the performance of the protocols (, Problem 17.11). For convenience, we use the less general definition.
We model the PUF source as a DMMS. Due to physically induced distortions, we model the biometric/PUF data read in the two phases as jointly distributed RVs.
The distribution of is assumed to be known and can be used for the generation of the RVs. Thus, the encoders and the decoder are allowed to depend on the distribution.
3. Various Definitions of Achievability
For the authentication model, we define achievable secret-key rate versus privacy-leakage rate pairs. Intuitively, we want the probability that a legitimate user is rejected in the authentication phase to be small. Thus, should be large to fulfill this reliability condition. Additionally, the probability that an attacker is accepted in the authentication phase should be as small as possible. Thus, we consider the maximum false acceptance probability (mFAP) , which is the probability that an attacker using the best possible attack strategy is accepted in the authentication phase averaged over all public messages . As we want the mFAP to be as small as possible, we are interested in the largest possible set of secret keys . This reasoning is explained below. The system uses the output of a PUF source as input so it should leak as little information about as possible . This motivates the following definition of achievable rate pairs.
A tuple , , is an achievable secret-key rate versus privacy-leakage rate pair for the authentication model if for every there is an such that for all there exists an authentication protocol such that
We denote the corresponding authentication protocols by FAP-Protocols (False-Acceptance-Probability-Protocols).
In , a very similar definition of achievability is used. Instead of considering the relation between the mFAP and the set of secret-keys (1), the authors define the false-acceptance exponent that describes the exponential decrease of the mFAP in n. A rate pair that is achievable using FAP-protocols is also achievable according to the definition in , R playing the role of the false-acceptance exponent.
We now clarify the bound on the mFAP in Inequality (1) and our interest in large secret-key rates. For this purpose, we consider the following observation.
For a communication protocol fulfilling the reliability condition, it holds that
Introduce the RV E, setting for and , otherwise. Thus,
Here, (a) follows as if there is no such that and (b) follows from the -recoverability of K from . ☐
Thus, Lemma 1 shows that requiring Inequality (1) is in fact equivalent to requiring the mFAP to be as small as possible. It also justifies our interest in a large set .
There is another way to define achievable secret-key rate versus privacy-leakage rate pairs for the authentication model. Here, we want to keep the key secret from the attacker. The conditional entropy $H(K|M)$ can be interpreted as the average information required to specify k when m is known (, Chapter 2). Thus, we want $H(K|M)$ to be as large as possible instead of requiring a small mFAP. This means we require $H(K|M) = \log|\mathcal{K}|$. Since $H(K|M) \leq H(K) \leq \log|\mathcal{K}|$, this condition is equivalent to the combination of the perfect secrecy condition $I(K;M) = 0$ and the uniform distribution of the key, i.e., $H(K) = \log|\mathcal{K}|$. Thus, we define achievability as follows.
A tuple , , is an achievable secret-key rate versus privacy-leakage rate pair for the authentication model if for every there is an such that for all there exists an authentication protocol such that
We denote the corresponding authentication protocols by PSA-Protocols (Perfect-Secrecy-Authentication-Protocols).
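The requirement that the conditional entropy of the key given the helper message equals the logarithm of the key-set size can be checked numerically on small joint distributions. The following sketch is illustrative only; the function names and the three example distributions are assumptions, not taken from this work. It computes H(K), H(K|M) and I(K;M) in bits and shows that H(K|M) reaches log|K| only if the key is both uniform and independent of M.

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(q * log2(q) for q in p if q > 0)

def key_measures(joint):
    """joint[k][m] = P(K=k, M=m). Returns (H(K), H(K|M), I(K;M)) in bits."""
    p_k = [sum(row) for row in joint]
    p_m = [sum(col) for col in zip(*joint)]
    h_joint = entropy([q for row in joint for q in row])
    h_k = entropy(p_k)
    h_k_given_m = h_joint - entropy(p_m)
    return h_k, h_k_given_m, h_k - h_k_given_m

# Three hypothetical keys on a binary key set (log|K| = 1 bit).
ideal = [[0.25, 0.25], [0.25, 0.25]]   # uniform key, independent of M
leaky = [[0.40, 0.10], [0.10, 0.40]]   # uniform key, but correlated with M
biased = [[0.35, 0.35], [0.15, 0.15]]  # independent of M, but not uniform

for name, joint in [("ideal", ideal), ("leaky", leaky), ("biased", biased)]:
    h_k, h_k_m, i_km = key_measures(joint)
    print(f"{name}: H(K)={h_k:.4f} H(K|M)={h_k_m:.4f} I(K;M)={i_km:.4f}")
```

Only the first distribution attains H(K|M) = 1 bit. The second violates perfect secrecy and the third violates uniformity, and in both cases H(K|M) falls below log|K|.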
In , the authors derive the secret-key capacity for the source model. They define achievability requiring perfect secrecy and uniform distribution of the key. They do not consider the privacy-leakage in contrast to our definition of achievability.
It is interesting to compare the rate pairs achievable with respect to the restrictive Definition 3 with commonly used weaker requirements. In (, Definition 3.1), the authors give a different definition of achievable secret-key rate versus privacy-leakage rate pairs. Instead of Equation (2), they require $\frac{1}{n} H(K) + \delta \geq \frac{1}{n} \log|\mathcal{K}|$ and, instead of Equation (3), they require $\frac{1}{n} I(K;M) \leq \delta$, which is called the weak secrecy condition . Thus, we get a third definition of achievability.
(). A tuple , , is an achievable secret-key rate versus privacy-leakage rate pair for the authentication model if for every there is an such that for all there exists an authentication protocol such that
We denote the corresponding authentication protocols by WSA-Protocols (Weak-Secrecy-Authentication-Protocols).
The set of achievable rate pairs that are achievable using PSA-Protocols is called the capacity region . The set of achievable rate pairs that are achievable using WSA-Protocols is called the capacity region and the set of achievable rate pairs that are achievable using FAP-Protocols is called the capacity region .
Now, we look at some straightforward relations between these capacity regions. We can directly see that Definition 3 is more restrictive than Definition 4 so a PSA-Protocol is also a WSA-Protocol and thus
We now show that a PSA-Protocol is also a FAP-Protocol.
It holds that
As Equations (2) and (3) imply, for all , we have☐
4. Capacity Regions for the Authentication Model
In (, Theorem 3.1), the authors derive the capacity region .
(). It holds that
The union is over all RVs U such that . We only have to consider RVs U with .
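To make the shape of such a region concrete, the sketch below evaluates one boundary point for a small example. It assumes the region has the form known from the literature on biometric secret-key generation, namely all pairs $(R, L)$ with $0 \leq R \leq I(U;Y)$ and $L \geq I(U;X) - I(U;Y)$ for auxiliary RVs $U$ forming the Markov chain $U - X - Y$. This form, the function names and the binary example are assumptions made for illustration; they are not quoted from the text above.

```python
from math import log2

def mutual_information(joint):
    """I(A;B) in bits for joint[a][b] = P(A=a, B=b)."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (p_a[a] * p_b[b]))
               for a, row in enumerate(joint)
               for b, p in enumerate(row) if p > 0)

def rate_pair(p_xy, p_u_given_x):
    """Point (I(U;Y), I(U;X) - I(U;Y)) for a test channel P(u|x) with U - X - Y."""
    n_x, n_y, n_u = len(p_xy), len(p_xy[0]), len(p_u_given_x[0])
    p_x = [sum(row) for row in p_xy]
    # Joint distributions of (U, X) and (U, Y) induced by the Markov chain.
    p_ux = [[p_x[x] * p_u_given_x[x][u] for x in range(n_x)] for u in range(n_u)]
    p_uy = [[sum(p_xy[x][y] * p_u_given_x[x][u] for x in range(n_x))
             for y in range(n_y)] for u in range(n_u)]
    i_ux, i_uy = mutual_information(p_ux), mutual_information(p_uy)
    return i_uy, i_ux - i_uy

# Hypothetical source: X uniform on {0, 1}, Y equals X flipped with probability 0.1.
p_xy = [[0.45, 0.05], [0.05, 0.45]]
key_rate, leakage = rate_pair(p_xy, [[1.0, 0.0], [0.0, 1.0]])  # U = X
```

With U = X the secret-key rate is I(X;Y) and the privacy-leakage rate is H(X|Y). Choosing a noisier test channel P(u|x) lowers both, and a U that is independent of X yields the trivial point (0, 0).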
The authors of  do not consider randomized encoders. In contrast, we permit randomization of the encoders in the enrollment phase. Using the strategy described in (, Problem 17.15), one can use the converse for deterministic encoders to prove the converse for randomized encoders with the same bounds on the secret-key rate and the privacy-leakage rate. Thus, the converse in  also holds true when randomization is permitted.
The following theorem is one of our main results.
It holds that
We do not prove Theorem 2 here but prove a more general result in the remainder of the text. This result is Theorem 5. It is more general as it is concerned with a compound version of the authentication model. The authentication model is a special case of the compound authentication model where the compound set consists of a single DMMS. ☐
We now strengthen Lemma 2.
It holds that
The achievability result is implied by Lemma 2. For the converse, we use a result of . As discussed in Remark 4, a rate pair which is achievable according to Definition 2 is also achievable according to the definition of achievability used in , where R plays the role of the false acceptance exponent E. Thus, we use (, Theorem 4), which says that a rate pair is not achievable. This implies our converse. ☐
5. Compound Authentication Model
We now consider authentication when the data source is not perfectly known. Figure 2 shows the corresponding authentication process. The only difference to the authentication process in Section 2 is the source uncertainty. As one can see in Figure 2, we even assume that an attacker can influence the source in the sense that the state of the source is altered, i.e., it generates another statistic. If the protocol for authentication is not robust, then authentication will not work.
We define the following information theoretic model for this authentication process with source uncertainty.
Let . The compound authentication model consists of a set of DMMSs with generic variables , , (all on the same alphabets and ), the (possibly randomized) encoders , and the (possibly randomized) decoder . Let and be the output of one of the DMMSs in , i.e., for an , but s is not known. The RVs M and K are generated from using Φ and Θ. The RV is generated from and M using Ψ. We use the term compound authentication protocol for .
The uncertainty of the data source is modeled making use of a compound DMMS, that is, the DMMS modeling the PUF source is not known, but we know a set of DMMSs to which the actual DMMS belongs.
is assumed to be known and can be used for the generation of the RVs, that is, the encoder and the decoder can depend on these distributions.
Given , we define the set for . The sets , , form a partition of , as they form the equivalence classes for the corresponding equivalence relation. We denote a set of representatives by .
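The partition into equivalence classes can be illustrated with a short sketch. It assumes, as an interpretation that the text above does not spell out, that two states are equivalent when their DMMSs have the same marginal distribution on the first component, which is the component the encoder observes. The function names and the three example sources are hypothetical.

```python
from collections import defaultdict

def x_marginal(p_xy, digits=12):
    """Marginal of X for joint[x][y], rounded so it can serve as a dictionary key."""
    return tuple(round(sum(row), digits) for row in p_xy)

def partition_by_x_marginal(sources):
    """Group state indices s whose joint distributions share the same X-marginal."""
    classes = defaultdict(list)
    for s, p_xy in sources.items():
        classes[x_marginal(p_xy)].append(s)
    return list(classes.values())

# Hypothetical compound set with three states.
sources = {
    0: [[0.45, 0.05], [0.05, 0.45]],
    1: [[0.40, 0.10], [0.10, 0.40]],
    2: [[0.60, 0.10], [0.10, 0.20]],
}
classes = partition_by_x_marginal(sources)
representatives = [cls[0] for cls in classes]
```

States 0 and 1 differ in the noise between the two readings but share the uniform marginal, so they fall into one class; state 2 forms its own class. Picking the first element of each class gives one possible set of representatives.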
6. Achievability for the Compound Model
For the compound authentication model, we define achievable secret-key rate versus privacy-leakage rate pairs.
A tuple , , is an achievable secret-key rate versus privacy-leakage rate pair for the compound authentication model if for every there is an such that, for all , there exists a compound authentication protocol such that, for all where . We denote the corresponding authentication protocols by PSCA-Protocols (Perfect-Secrecy-Compound-Authentication-Protocols).
The set of achievable secret-key versus privacy-leakage rate pairs that are achievable using PSCA-Protocols is called the compound capacity region .
7. Capacity Regions for the Compound Authentication Model
We now derive the compound capacity region for the compound authentication model. We only consider compound sets such that . For the proof, we need the following theorem, which is a generalization of (, Theorem 6.10).
Given a (possibly infinite) set of channels , a set with , , and . Then, for every and all n large enough, there is a pair of mappings , , , such that is an -code for all with codewords in A and We call this pair of mappings a compound -code for .
Even though the proof of Theorem 4 is very similar to the proof of (, Theorem 6.10), the proof of (, Theorem 4.3) and the proof of the results in , we prove Theorem 4 for the sake of completeness. The proof can be found in Appendix A.
It holds that where, for (, we define appropriately. For all the union is over all RVs such that, for all we have . For , we only have to consider RVs with .
For all and all , let , and be RVs where are the output of the DMMS in with index s and and are connected by the channel . Thus, we have the Markov chains for all . Let . We now show that, given , for n large enough we can choose a set that consists of disjoint subsets with the following properties.
- We consider a partition of the set of all sets in subsets. Thus, we denote the sets by , , indicating to which subset they belong. We denote the set of indices m corresponding to by . For each , we have
- Each consists of sequences of the same type.
- It holds that
- For each , one can define pairs of mappings that are compound -codes, , for the channels , for all in the following way. Define an (arbitrary) bijective mapping and an appropriate mapping . Then, is such a code. This means
Let . We denote the elements of by . We consider , , which are disjoint subsets of . We show that they are in fact disjoint subsets of for small enough. This can be seen as follows. For , , it holds that for at least one . Thus, there is a with for some .
Now, assume that there is a . Denote the type of by . Thus, there is a with where the last inequality follows from the assumption that . Thus, for , this is a contradiction and we know and are disjoint.
We start the construction of by choosing a set with with . According to Theorem 4, there is a compound -code for the channels , with at least codewords for n large enough. We denote the set of these codewords by . As there are fewer than types, we know that there is a set of at least codewords in with the same type. We only pick these codewords. There are at least of them for n large enough. We now pick exactly of these codewords and we denote this set by . Now, we choose a set with . We construct the set in the same way as . Thus, is a set of codewords of the same type corresponding to an -code. We continue this process until we cannot find a set with
We repeat this process for all , . Thus, we have for all
Thus, we have Inequality (8) for n large enough.
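The pigeonhole step in the construction above relies on the standard fact that the number of types of sequences of length n grows only polynomially in n, so a fixed fraction (up to a polynomial factor) of any codebook shares one type. The sketch below checks this on a tiny example; the function names and the choice of a binary alphabet with n = 4 are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import comb

def number_of_types(n, alphabet_size):
    """Exact number of types (empirical distributions) of length-n sequences."""
    return comb(n + alphabet_size - 1, alphabet_size - 1)

def type_of(seq, alphabet):
    """Type of a sequence, given as the tuple of symbol counts."""
    counts = Counter(seq)
    return tuple(counts[a] for a in alphabet)

def largest_same_type_subset(codewords, alphabet):
    """Largest subset of the codewords in which all sequences share one type."""
    groups = {}
    for c in codewords:
        groups.setdefault(type_of(c, alphabet), []).append(c)
    return max(groups.values(), key=len)

# Hypothetical codebook: all binary sequences of length 4.
n, alphabet = 4, (0, 1)
codebook = list(product(alphabet, repeat=n))
subset = largest_same_type_subset(codebook, alphabet)
```

In this example there are 5 types, which is below the polynomial bound (n + 1)^2 = 25, and the largest same-type subset contains the 6 sequences with two zeros and two ones. Its size is at least the codebook size divided by the number of types, which is the counting argument used when the codewords of one type are selected.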
We now can define the encoders/decoders , and .
- We define and as follows. The system gets a sequence . It checks if , , for an (We can choose small enough and n large enough such that the are disjoint). If this is true for , the channel is used n times to generate from . For , the system looks in for . If the system chooses for m the index of the subset containing . If it chooses an arbitrary . In addition, if , it chooses an arbitrary . For , the system looks in for . If , it considers the compound -code corresponding to the subset containing . If
- We define as follows. The system gets a sequence and m. It decodes using the code corresponding to . Then, is used on the result. The result is if it differs from . Otherwise, an arbitrary is chosen.
Using the properties of the communication protocol, we analyse the achievability conditions. We denote the outputs of the DMMS by and and the output of the channel used on by . Assume the index of the DMMS is , . Thus, .
- We define the following events:
- We define
- For the secret-key rate, we have
- Finally, we analyse the privacy-leakage rate. We have
Using these results, we conclude from Inequalities (10) and (13) that
Using the distributive law for sets, we can see that this is equivalent to (see Appendix B). We now consider the converse. Assume are distributed i.i.d. according to for an arbitrary . The following calculations hold for all . Similarly to the converse part of the proof of (, Theorem 3.1), we have where we use Equation (6) for (a), Fano’s inequality with and the data processing inequality in combination with , which follows from the definition of the compound authentication protocol for (b) and Equation (7) for (c). From the definition of the compound authentication protocol, we also know that . Using the definition of Markov chains, this implies for all (see Appendix C). (From we get using Implications (A11) and (A13). Then, we use Implication (A12) to get and from this we get the desired result using Implication (A13).)
The equation is equivalent to (, Definition 3.9). This is equivalent to
Thus, . Thus, we have so
Now, we define for all . This implies for all , which can again be seen using the results from Appendix C. Let Q be a time-sharing RV independent of all others and uniformly distributed on and let , and . Then, for all , where ( follows from and the independence of Q. We have for an arbitrary and , where follows as for all as the RVs are generated i.i.d. We also have for all
Thus, which means . We also have
Thus, using the definition of F, we get which implies for and n large enough. We also consider
From the definition of the compound authentication model, we know . Using the data processing inequality, we get which means , where the last inequality follows from Fano’s inequality. Thus, where (a) follows as and are i.i.d. and (b) follows from Inequality (14). With our definition of U, X and Y and the same argumentation as before, we get for n large enough, where, for (, we use the definition of F and Inequality (16). We have for all where (a) follows from , which follows from the definition of the compound authentication protocol. As is the same for all , , this result implies that is the same for all , . We get the bounds (16) and (17) for each . We denote the corresponding RVs by for all . The joint distribution of is as we see from Equation (15). Thus, Equation (18) and the Inequalities (16) and (17) for all imply
We again use the distributive law for sets to get our result. The bounds on the cardinality of the alphabet of the auxiliary random variables can be derived as in . ☐
This result implies Theorem 2 as we use a deterministic decoder for the achievability proof.
In , the authors also derive the compound capacity region for , but, in contrast to this work, they consider deterministic protocols and require strong secrecy instead of perfect secrecy when defining achievability. This compound capacity region equals .
8. Secure Storage
We now discuss some other applications of the already proven results apart from authentication. For this purpose, we take a look at some results for secure storage from [13,14], which follow directly from our results for authentication. Here, we again consider compound sets with .
In , we consider the following model for secure storage with source uncertainty, where the corresponding scenario is depicted in Figure 3.
Let . The compound storage model consists of a set of DMMSs with generic variables , , (all on the same alphabets and ), a source that puts out a RV , the (possibly randomized) encoder and the (possibly randomized) decoder . Let and be the output of one of the DMMSs in , i.e., for an , but s is not known. is independent of . The RV M is generated from and using . The RV is generated from and M using . We use the term compound storage protocol for . Additionally, it holds that, for all , there is an such that for all
We define achievability for this model.
A tuple , , is an achievable storage rate versus privacy-leakage rate pair for the compound storage model if for every there is an such that for all there exists a compound storage protocol such that for all where . We denote the corresponding storage protocols by PSCS-Protocols (Perfect-Secrecy-Compound-Storage-Protocols).
The set of achievable rate pairs that are achievable using PSCS-Protocols is called the compound capacity region .
We then can prove the following result.
It holds that
We combine source compression and secure storage in  by considering the following model, which models the scenario depicted in Figure 4.
Let . The compound source storage model consists of a set of DMMSs with generic variables , , (all on the same alphabets and ), a general source  that fulfills the strong converse property, the (possibly randomized) encoder and the (possibly randomized) decoder . Let and be the output of one of the DMMSs in , i.e., for an , but s is not known. The RV M is generated from and using . The RV is generated from and M using . We use the term compound source storage protocol for .
For this model, we define achievability where we consider the output of the PUF source as a resource.
A tuple , , is an achievable performance pair for the compound source storage model if, for every , there is a such that, for all there exists a compound source storage protocol such that, for all where . We denote the corresponding compound source storage protocols by PSCSS-Protocols (Perfect-Secrecy-Compound-Source-Storage-Protocols).
The set of achievable performance pairs that are achievable using PSCSS-Protocols is called the optimal performance region .
We then can prove the following results.
It holds that where for we define appropriately. For all , the union is over all RVs such that, for all we have .
For stationary ergodic sources , it holds that For all , the union is over all RVs such that, for all , we have . For , we only have to consider RVs with .
We derived the capacity region for the (compound) authentication model requiring perfect secrecy and uniform distribution of the key generated for authentication and compared the result to existing results where only strong secrecy and a weaker condition on the key distribution are required. The two capacity regions are the same. We could prove this result by allowing for randomized encoders, which are not necessarily used when deriving the capacity region corresponding to the weaker definition of achievability. We saw that we can use the results for authentication to prove corresponding results for secure storage.
As already mentioned, compound sources do not only model source uncertainty but also model attacks where an attacker can influence parameters of the source while the legitimate parties do not know which parameters the attacker chose. It is essential that in this scenario the parameter is constant for all symbols read from the source. An attack where the parameter can be varied while the source is used is fundamentally stronger. A characterization of achievable rates for this attack scenario is not known, except for the source model for secret key generation, which has been derived in . For an overview of these types of attacks, see . Recently, the corresponding problem for wiretap channels was solved [23,24]. For the source model, the attacker can choose his strategy depending on the public data, which is a difficulty that does not appear for wiretap channels. Nevertheless, the authors hope that, using techniques from the works concerning the wiretap channel, the open problem for the source model can be solved.
Funding is acknowledged from the German Research Foundation (DFG) via grant BO 1734/20-1 and from the Federal Ministry of Education and Research (BMBF) via grant 16KIS0118K. Holger Boche would like to thank Rainer Plaga, Federal Office for Information Security (BSI), for the discussion on PUFs and issues concerning different secrecy measures.
Sebastian Baur and Holger Boche conceived this study and derived the results. Sebastian Baur wrote the manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proof of Theorem 4
We prove the result for compound codes with the additional constraint on the decoding sets that, for it holds that for all messages . Additionally, for we require for all . First, consider the case that is a finite set. Let be such a code that cannot be extended. Thus, for all , there is a such that where . It also holds that for n large enough. We now consider the set
We know , as for all there is at least one with Inequality (A3). Thus,
Thus, there is a such that for all and