Cryptographic Foundations of Pseudonymisation for Personal Data Protection
Abstract
1. Introduction
2. Background
2.1. Privacy and Personal Data Protection
Everyone has the right to respect for his or her private and family life, home and communications.
In relation to personal data protection, the Charter further provides:
Everyone has the right to the protection of personal data concerning him or her. Such data must be processed fairly for specified purposes and on the basis of the consent of the person concerned or some other legitimate basis laid down by law. Everyone has the right of access to data which has been collected concerning him or her, and the right to have it rectified (…).
- Purpose limitation: Personal data should be collected for specified, explicit, and legitimate purposes and not further processed in a way that is incompatible with those purposes.
- Data minimisation: Personal data must be adequate, relevant, and limited to what is necessary in relation to the purposes for which it is processed.
- Integrity and confidentiality: Personal data must be processed in a manner that ensures appropriate security, including protection against unauthorised or unlawful processing and against accidental loss, destruction, or damage, using suitable technical or organisational measures.
2.2. The Notion of Pseudonymisation: Technical and Legal Perspectives
- Deterministic pseudonymisation. The same pseudonym is always assigned to the same individual, whether within a single database (if more than one entry corresponds to this individual) or across different databases. This is also known as consistent pseudonymisation.
- Randomised pseudonymisation. A different pseudonym is assigned to the same individual each time, whether within a single database (if more than one entry corresponds to this individual) or across different databases. This provides unlinkability [14]: from the perspective of an attacker who aims to gain more information than they are authorised to have, any two pseudonymous entries corresponding to the same individual are no more and no less related after the attacker’s observation than they are according to the attacker’s a priori knowledge.
2.3. Contribution of This Work
3. Taxonomy of Cryptographic Techniques for Pseudonymisation
- Reversible vs. Irreversible: A pseudonymisation is considered reversible if the entity that determines the purpose and the means of the pseudonymisation (hereinafter, the pseudonymisation entity) can directly reverse it; that is, given the pseudonym and any auxiliary information used during pseudonymisation, mapping the pseudonym back to the original identifier is straightforward. Irreversible pseudonymisation, on the other hand, refers to a one-way process for which no efficient method exists to reconstruct the original identifier from the pseudonym. It should be noted, though, that irreversibility does not preclude verifiability, that is, the ability to efficiently determine whether a given pseudonym corresponds to a specific identifier (verifiability is evidently also present in any reversible pseudonymisation).
- Deterministic vs. Probabilistic: A pseudonymisation is considered deterministic if the same pseudonym is generated for the same individual within the same context; for example, if a database has more than one entry corresponding to the same individual, then in each of them the individual’s direct identifier(s) is/are replaced by the same pseudonym. In a probabilistic pseudonymisation setting, on the other hand, different pseudonyms are always generated for the same individual; this ensures, e.g., that different entries corresponding to the same individual within the same database are unlinkable. It should be pointed out that unlinkability is a broad concept that refers to the property that different pieces of information relating to the same individual cannot be connected [14]. As such, unlinkability also encompasses situations where correlations between different datasets (e.g., separate databases) must be prevented. While probabilistic pseudonymisation suffices to ensure this property, it is not the only approach; each dataset could also be pseudonymised deterministically, provided that distinct pseudonyms are used for the same individual across different datasets to prevent linking (although, in such a case, linkage within the same database remains feasible).
- Blind vs. Non-blind: A pseudonymisation process is considered blind if the entity that performs pseudonymisation does not obtain access to the original identifiers at any stage of the procedure. Any other case is referred to as non-blind pseudonymisation.
- Centralised vs. Distributed: A pseudonymisation is considered centralised if the relevant process is performed by a single entity, i.e., the so-called pseudonymisation entity. By contrast, some techniques require the active collaboration of more than one entity to derive pseudonyms; in such a case, we refer to distributed pseudonymisation.
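The deterministic/probabilistic and reversible/irreversible distinctions can be illustrated with a minimal Python sketch based on a lookup table of random pseudonyms. The class `TablePseudonymiser` and its method names are hypothetical and serve only as an illustration; the sketch is centralised and non-blind, since a single entity sees every identifier.

```python
import secrets


class TablePseudonymiser:
    """Toy, centralised, non-blind pseudonymiser backed by a lookup table."""

    def __init__(self, deterministic: bool):
        self.deterministic = deterministic
        self._forward = {}  # identifier -> pseudonym (used in deterministic mode)
        self._reverse = {}  # pseudonym -> identifier (enables re-identification)

    def pseudonymise(self, identifier: str) -> str:
        # Deterministic mode: reuse the pseudonym already assigned, if any.
        if self.deterministic and identifier in self._forward:
            return self._forward[identifier]
        # Otherwise draw a fresh random pseudonym (always, in probabilistic mode).
        pseudonym = secrets.token_hex(16)
        if self.deterministic:
            self._forward[identifier] = pseudonym
        self._reverse[pseudonym] = identifier
        return pseudonym

    def reidentify(self, pseudonym: str) -> str:
        # Reversal is a table lookup; destroying the table would remove it.
        return self._reverse[pseudonym]
```

Under these assumptions, one deterministic instance per dataset gives consistent pseudonyms within a dataset and unrelated pseudonyms across datasets, while a probabilistic instance issues a fresh pseudonym for every entry.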
- Irreversibility is closely linked to both security guarantees and the GDPR principle of data minimisation. In scenarios where data controllers, acting as pseudonymisation entities, do not require the ability to recover original identifiers, irreversible pseudonymisation represents an appropriate design choice. Conversely, in cases where the controller must be able to re-establish the association between pseudonyms and data subjects (for example, to enable the exercise of data subject rights), a reversible pseudonymisation technique is necessary.
- Deterministic pseudonymisation is required in use cases where consistent pseudonyms are needed, that is, where the same data subject must be mapped to the same pseudonym across records or processing operations. When such consistency is not required, probabilistic pseudonymisation provides stronger protection by limiting linkability and reducing the risk of tracking data subjects across pseudonymised datasets, in line with the GDPR’s objective of mitigating re-identification risks.
- Blind pseudonymisation is particularly relevant in settings where the data controller must never obtain access to the original identifiers, thereby enforcing strict separation of knowledge. This aligns with the GDPR’s emphasis on reducing unnecessary access to personal data and limiting the exposure of identifiers to entities that do not require them for their processing purposes. However, blind pseudonymisation typically involves increased implementation and operational complexity, as it relies on either interactive protocols or advanced cryptographic mechanisms to guarantee that identifiers remain hidden throughout the process.
- The distinction between centralised and distributed approaches is significant from both a technical and an organisational standpoint. Distributed pseudonymisation limits the amount of personal data accessible to any single entity, thereby reinforcing the GDPR principle of data minimisation and reducing the risk of undue re-identification. Since no single party can perform all required operations or reconstruct full identifiers, distributed approaches inherently constrain data exposure. At the same time, they typically entail higher implementation and operational complexity, as they require coordination among multiple parties and the deployment of more sophisticated cryptographic or protocol mechanisms. Nevertheless, they mitigate risks associated with single points of failure and contribute to the principles of integrity and confidentiality.
4. Pseudonymisation Based on Symmetric Cryptography
4.1. Pseudonymisation Based on Symmetric Encryption
- It is reversible, since knowledge of the secret key allows decryption and, thus, re-identification.
- It is mainly deterministic, provided that the same key K is used and the relevant Initialisation Vector (IV) is also fixed (recall that the IV is a non-secret, random or pseudorandom value used in conjunction with the symmetric key to initialise the encryption process). However, if the IV changes within the same pseudonymisation application, the scheme becomes probabilistic.
- It is a non-blind process, since the pseudonymisation entity has direct access to the original identifiers.
- It is considered a centralised approach, since the pseudonyms are generated in one place by the pseudonymisation entity.
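The properties listed above can be illustrated with a small Python sketch. Since the Python standard library ships no block cipher, the sketch swaps in a toy Feistel permutation whose round function is HMAC-SHA256; this is an assumption made for self-containment, not the AES-based approach of the cited works, and it is not meant for production use. The block layout (a 16-byte IV field followed by a length-prefixed identifier field), the round count, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

HALF = 16    # bytes per Feistel half; one block is 32 bytes
ROUNDS = 8   # illustrative choice, not a vetted parameter


def _round(key: bytes, i: int, data: bytes) -> bytes:
    # Round function: HMAC-SHA256 under K, domain-separated by the round index.
    return hmac.new(key, bytes([i]) + data, hashlib.sha256).digest()[:HALF]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _permute(key: bytes, block: bytes, inverse: bool = False) -> bytes:
    left, right = block[:HALF], block[HALF:]
    order = range(ROUNDS - 1, -1, -1) if inverse else range(ROUNDS)
    for i in order:
        if inverse:
            left, right = _xor(right, _round(key, i, left)), left
        else:
            left, right = right, _xor(left, _round(key, i, right))
    return left + right


def pseudonymise(key: bytes, identifier: str, randomised: bool = False) -> str:
    raw = identifier.encode("utf-8")
    if len(raw) >= HALF:
        raise ValueError("identifier too long for this toy block size")
    # Fixed (all-zero) IV -> deterministic; fresh random IV -> probabilistic.
    iv = secrets.token_bytes(HALF) if randomised else bytes(HALF)
    field = bytes([len(raw)]) + raw.ljust(HALF - 1, b"\0")
    return _permute(key, iv + field).hex()


def reidentify(key: bytes, pseudonym: str) -> str:
    # Knowledge of the key allows decryption and hence re-identification.
    block = _permute(key, bytes.fromhex(pseudonym), inverse=True)
    field = block[HALF:]
    return field[1:1 + field[0]].decode("utf-8")
```

With `randomised=False` the same key maps an identifier to the same pseudonym every time; with `randomised=True` each call yields a different pseudonym, and `reidentify` still recovers the identifier in both cases.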
- The pseudonymisation of clinical data for research applications is discussed in [25], in which patient identifiers are encrypted using a symmetric cryptographic algorithm to produce reversible pseudonyms. This approach allows an authorised entity to subsequently decrypt the pseudonym and re-establish the link between the research data and the corresponding patient, while ensuring that all other parties have access only to the pseudonymised identifiers.
- An e-health architecture is described in [26], in which patient identifiers are replaced with pseudonyms generated via symmetric encryption and maintained independently from the corresponding clinical data. Patient re-identification for authorized primary health care purposes is feasible due to the symmetric keys, while the original identifiers remain inaccessible to researchers analysing the pseudonymised dataset.
- A translational research scenario involving the pseudonymisation of clinical data for secondary use in a research database is discussed in [27]. This scenario requires a single, distinct, deterministic pseudonym per patient; to this end, a block cipher is proposed to unambiguously transform the unique identifiers into pseudonyms.
- The authors of [28] use the Advanced Encryption Standard (AES) to create pseudonyms that allow health data exchange for secondary, cross-institutional clinical research.
Format-Preserving Encryption
4.2. Pseudonymisation Based on Hash Functions
4.2.1. Use of Unkeyed Hash Function
4.2.2. Use of Keyed Hash Functions
- It is irreversible (i.e., a one-way transformation).
- It is mainly deterministic, provided that the same key K is used in the same pseudonymisation process.
- It is a non-blind process, since the pseudonymisation entity has direct access to the original identifiers.
- It is considered a centralised approach, since the pseudonyms are generated in one place by the pseudonymisation entity.
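A keyed hash with these properties can be sketched in a few lines of Python using HMAC-SHA256 from the standard library. The choice of HMAC-SHA256, the identifier normalisation step, and the function names are assumptions made for illustration rather than details fixed by the cited works.

```python
import hashlib
import hmac


def keyed_hash_pseudonym(key: bytes, identifier: str) -> str:
    # Normalising the identifier (an assumption) keeps trivially different
    # spellings of the same identifier from producing different pseudonyms.
    normalised = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()


def matches(key: bytes, identifier: str, pseudonym: str) -> bool:
    # Irreversible does not mean unverifiable: whoever holds the key can
    # recompute the pseudonym of a candidate identifier and compare.
    candidate = keyed_hash_pseudonym(key, identifier)
    return hmac.compare_digest(candidate, pseudonym)
```

The same key always yields the same pseudonym for a given identifier, whereas a different key (for instance, one per dataset) yields pseudonyms that cannot be linked without knowledge of both keys.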
- To derive statistics on smart TV customers’ use [31].
- To propose a generic approach for implementing pseudonymisation so as to generate stable pseudonyms that preserve linkability across records while preventing unauthorized re-identification [32].
- For log pseudonymisation [33].
- In the OpenPseudonymiser tool, created by the University of Nottingham [34].
5. Pseudonymisation Based on Asymmetric Cryptography
5.1. Use of Classical Public Key Encryption
5.2. Identity-Based Encryption
- Pseudonymisation can be either reversible or irreversible, depending on how exactly the strings used as pseudonyms are generated (for instance, in the aforementioned example that is based on a hash function, we have an irreversible pseudonymisation).
- The pseudonymisation is deterministic, since the same master key is used to derive pseudonyms.
- The pseudonymisation is not blind, since the pseudonymisation entity obtains access to the original identifiers.
- The pseudonymisation is centralised (one single entity suffices to derive the pseudonyms).
5.3. Polymorphic Encryption and Pseudonymisation
- With regard to encryption, personal data can be encrypted and stored at a central point in such a way that there is no need to fix a priori who can decrypt the data; this can be decided later, via a transformation of the ciphertext that makes it decryptable locally under locally different cryptographic keys. This transformation can be performed blindly, i.e., without the party performing it (called the transcryptor in [38]) being able to see the original plaintext.
- With regard to pseudonymisation, PEP proceeds similarly to encryption; again, the role of the transcryptor is crucial, since it generates pseudonyms via cryptographic transformations in a “blind” way. That is, the transcryptor “changes” the content of the ciphertext so that the original user’s identifier in the corresponding plaintext is transformed into a meaningless, irreversible pseudonym before the transcryptor “allows” the ciphertext to be locally decryptable by a legitimate user. Different recipients receive different pseudonyms for the same individual, thus promoting unlinkability.
1. Re-randomisation (RR): Adds new randomness s to an existing ciphertext without changing the underlying message.
2. Re-keying (RK): Changes the effective public key under which a ciphertext can be decrypted. Specifically, if the original ciphertext has been produced based on a public key X, re-keying with a factor k yields a ciphertext decryptable under the private key matching the public key $X^k$ (written $k \cdot X$ in additive notation).
3. Re-shuffling (RS): Scales the ciphertext and embedded message, used in pseudonym derivation.
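The three operations can be sketched over a toy ElGamal group in Python. The definitions below follow the usual ElGamal-based formulation of these operations, with a ciphertext written as a triple (b, c, y) of blinding value, masked message, and public key; the tiny group parameters, the identifier encoding, and all function names are assumptions chosen for illustration and offer no security.

```python
import hashlib
import secrets

P = 2039   # toy safe prime, P = 2*Q + 1 (far too small for real use)
Q = 1019   # prime order of the subgroup generated by G
G = 4      # generator of the order-Q subgroup


def rand_exp() -> int:
    return secrets.randbelow(Q - 1) + 1  # uniform in [1, Q-1]


def encode(identifier: str) -> int:
    # Hypothetical, non-invertible mapping of an identifier to a group element.
    h = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big")
    return pow(G, h % (Q - 1) + 1, P)


def encrypt(m: int, y: int):
    r = rand_exp()
    return (pow(G, r, P), m * pow(y, r, P) % P, y)


def decrypt(ct, x: int) -> int:
    b, c, _ = ct
    return c * pow(b, -x, P) % P


def rerandomise(ct, s: int):
    # RR: fresh randomness s, same plaintext, same key.
    b, c, y = ct
    return (b * pow(G, s, P) % P, c * pow(y, s, P) % P, y)


def rekey(ct, k: int):
    # RK: afterwards decryptable with private key k*x instead of x.
    b, c, y = ct
    return (pow(b, pow(k, -1, Q), P), c, pow(y, k, P))


def reshuffle(ct, n: int):
    # RS: the embedded message m becomes m**n (the local pseudonym).
    b, c, y = ct
    return (pow(b, n, P), pow(c, n, P), y)


def transcrypt(ct, key_factor: int, pseudonym_factor: int):
    # Blind transformation: no decryption happens at any point.
    fresh = rerandomise(ct, rand_exp())
    return rekey(reshuffle(fresh, pseudonym_factor), key_factor)
```

In this sketch, a recipient holding the private key `key_factor * x % Q` decrypts the output of `transcrypt` to `pow(m, pseudonym_factor, P)`, so recipients with different pseudonym factors obtain different pseudonyms for the same individual.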
- A key factor, which controls how the ciphertext’s decryption key is adapted for the recipient. This factor is provided by another entity, called the Access Manager, which in turn never learns the aforementioned pseudonym (i.e., the ciphertext). In effect, the key factor controls who can decrypt.
- A pseudonym factor, specific to the destination, which determines the local pseudonym for that domain. Again, this factor comes from the Access Manager.
- It is mathematically reversible, because the Registration Manager maintains the associations needed for re-identification (unless, of course, the Manager destroys any such associations). Moreover, there always exists a corresponding private key allowing for decryption (i.e., the master private key), even though the system is designed in such a way that this private key is not known to any party and is not used. As described in [38], this key is securely stored by a trusted Key Server, in secure hardware. From a cryptographic point of view, however, this yields a reversible pseudonymisation.
- It is a probabilistic process: for the same individual, a different pseudonym is always created for each different recipient. Even for the same recipient B, who decrypts to the same pseudonym for a given individual A, the corresponding encrypted value of the pseudonym is always different, due to the probabilistic nature of the cipher.
- The process is non-blind in terms of the reversibility described above. However, the transcryptor, being the main part of the pseudonymisation entity that actually produces the final pseudonyms, does not obtain access to the original identifiers (i.e., it operates in a blind way).
- The process is centralised, since there is in fact a main central point performing pseudonymisation.
5.4. Oblivious Pseudorandom Functions (OPRFs)
- Irreversible, in the sense that the pseudonymisation entity which holds the secret key does not even obtain access to the derived pseudonym (of course, if this entity somehow learns a pseudonym, the original identifier can be recovered; however, OPRFs are used precisely in cases where the pseudonymisation entity does not need to know the original identifier).
- Deterministic, under the assumption that, for a specific pseudonymisation process, the same key is used by the pseudonymisation entity to derive pseudonyms.
- Blind, since the pseudonymisation entity never learns the original identifier.
- Distributed rather than centralised: since OPRFs are in fact two-party computation protocols (for a specific purpose), the pseudonymisation entity cannot perform any pseudonymisation without the active participation of the entity whose identifier is to be pseudonymised.
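One well-known way to obtain these properties is a Diffie-Hellman-style blinded exponentiation, sketched below in Python. The source does not prescribe this particular construction; the toy group parameters, the simplistic hash-to-group mapping, and the function names are assumptions for illustration and are not secure as written.

```python
import hashlib
import secrets

P, Q, G = 2039, 1019, 4  # toy group: P = 2*Q + 1, G has prime order Q


def hash_to_group(identifier: str) -> int:
    # Toy mapping into the subgroup; real schemes use a proper hash-to-group.
    h = int.from_bytes(hashlib.sha256(identifier.encode()).digest(), "big")
    return pow(G, h % (Q - 1) + 1, P)


def blind(identifier: str, r: int) -> int:
    # Identifier holder: hide the hashed identifier under a random exponent r.
    return pow(hash_to_group(identifier), r, P)


def evaluate(blinded: int, k: int) -> int:
    # Key holder: apply the secret key k to a value it cannot interpret.
    return pow(blinded, k, P)


def unblind(evaluated: int, r: int) -> int:
    # Identifier holder: strip r again, leaving hash_to_group(identifier)**k.
    return pow(evaluated, pow(r, -1, Q), P)


def finalise(identifier: str, element: int) -> str:
    return hashlib.sha256(f"{identifier}|{element}".encode()).hexdigest()


def oprf_pseudonym(identifier: str, k: int) -> str:
    # Runs both roles in one process; in practice only the blinded value and
    # the evaluated reply would travel between the two parties.
    r = secrets.randbelow(Q - 1) + 1
    reply = evaluate(blind(identifier, r), k)
    return finalise(identifier, unblind(reply, r))
```

The key holder sees only the blinded value and never the final pseudonym, while the result depends only on the identifier and the key, which makes the pseudonym deterministic despite the fresh blinding exponent in every run.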
- Pseudonymisation is performed by a central yet fully oblivious intermediary. This pseudonymisation entity learns neither the original identifiers nor the pseudonyms it generates, which are computed as values of a pseudorandom function (PRF). The entity additionally serves as a central storage point, acting as an intermediary for subsequent data-sharing operations.
- When data are stored at rest at this central point, they are pseudonymised in a fully unlinkable and irreversible manner. Concretely, if the original dataset of a single data provider contains, besides identifiers, N attributes, the intermediary stores N separate tables, each consisting of two columns: one holding a pseudonym and the other a single attribute. The pseudonyms used across these N tables are pairwise distinct, even when the corresponding entries relate to the same individual. As a result, neither re-identification nor cross-attribute linkage is possible at the level of the pseudonymisation entity.
- When specific subsets of data are requested by an authorised data recipient, linkage is established only at that point through a controlled, non-transitive join operation. Non-transitivity means that when data is joined for a specific query, the resulting pseudonymous identifiers are fresh and cannot be linked across different join results; even if two join operations involve overlapping data, the linkage from one join does not propagate to another, preventing unintended correlation across joins. Consequently, only the requesting data recipient obtains the correlated pseudonymous data, while different data recipients accessing different subsets are unable to correlate their respective views.
- A notable property of this approach is that the join operation enables the derivation, for each individual, of a consistent yet irreversible pseudonym, thereby yielding a deterministic pseudonymisation outcome. This holds despite the fact that the intermediary stores multiple, mutually unlinkable pseudonyms corresponding to the same individual. Importantly, this derivation is carried out blindly by the pseudonymisation entity using an oblivious PRF, such that the entity does not learn the resulting deterministic pseudonyms.
OPRFs in Secure Multiparty Computation (MPC) Protocols
- A chooses a secret key K for a pseudorandom function F.
- A and B execute m Oblivious Pseudorandom Function evaluations such that, in the i-th execution, $1 \le i \le m$, the entity A inputs K and the entity B inputs its i-th identifier $y_i$. Due to the property of the OPRF, at the end of these evaluations, B learns the outputs $F_K(y_1), \ldots, F_K(y_m)$ without learning K, whilst A does not learn anything about the list $y_1, \ldots, y_m$.
- A computes the outputs $F_K(x_1), \ldots, F_K(x_n)$ for its own identifiers $x_1, \ldots, x_n$, and sends these values to B.
- B computes the intersection between the lists $F_K(y_1), \ldots, F_K(y_m)$ and $F_K(x_1), \ldots, F_K(x_n)$, and sends to A all values $y_i$ from its private list for which there exists $j$ such that $F_K(y_i) = F_K(x_j)$.
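The message flow of the four steps above can be sketched in Python as follows. To keep the sketch short, the oblivious evaluation in the second step is replaced by a direct call to a keyed function (HMAC-SHA256); this stand-in is an assumption that shows only the data flow, and it gives none of the hiding guarantees of a real OPRF, in which B would never hold K. The names `prf` and `private_set_intersection` are hypothetical.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, identifier: str) -> str:
    # Stand-in for the pseudorandom function F under key K.
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def private_set_intersection(a_ids, b_ids):
    # Step 1: A chooses a secret key K.
    key = secrets.token_bytes(32)
    # Step 2: B obtains F(K, y) for each of its identifiers. In the real
    # protocol this is one OPRF run per identifier, so B never sees K and
    # A never sees y; here it is collapsed into a direct call.
    b_tags = {prf(key, y): y for y in b_ids}
    # Step 3: A computes F(K, x) for its own identifiers and sends them to B.
    a_tags = {prf(key, x) for x in a_ids}
    # Step 4: B keeps its own identifiers whose tags also appear in A's list.
    return sorted(y for tag, y in b_tags.items() if tag in a_tags)
```

For instance, calling the function with A's list `["alice", "bob", "carol"]` and B's list `["bob", "dave", "carol"]` yields the common identifiers, while the remaining identifiers are compared only through their keyed tags.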
- Irreversible in practice: although the OPRF values are mathematically reversible, an OPRF inherently ensures that no entity has the information needed to reverse its output.
- Deterministic, provided that the same key K is used (which is indeed the case for a specific pseudonymisation context).
- Blind, since the underlying idea is to hide the original identifiers.
- Distributed, since we have a two-party protocol.
5.5. Secret Sharing
- They are reversible, since collaboration of a well-determined number of entities allows recovering the original identifier—and such collaboration is not forbidden in principle.
- The pseudonymisation process is probabilistic due to the randomness of share generation.
- They are in principle non-blind, in the sense that, typically, there is an original entity having access to the original identifier that splits it into n shares.
- It is inherently distributed, since we have n entities that in fact contribute to the whole process.
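These properties can be illustrated with a short Python sketch of a threshold scheme. The sketch assumes Shamir's (t, n) scheme over a prime field, which the text above does not name explicitly; the field modulus, the identifier encoding, and the function names are likewise choices made for illustration.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime used as the field modulus


def to_int(identifier: str) -> int:
    value = int.from_bytes(identifier.encode("utf-8"), "big")
    if value >= PRIME:
        raise ValueError("identifier too long for this field")
    return value


def split(secret: int, n: int, t: int):
    # Random polynomial of degree t-1 whose constant term is the secret;
    # the random coefficients make the shares differ on every run.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(shares) -> int:
    # Lagrange interpolation at x = 0 recovers the constant term.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Each of the n entities holds one share as its pseudonymous value; any t of them can jointly recover the identifier, whereas fewer than t shares reveal nothing about it in this scheme.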
5.6. User-Generated Pseudonyms
Anonymous Credentials
- An issuance function, performed by each issuer (i.e., an entity that signs and issues credentials for provers), through which the issuer uses its secret key to issue a credential for the user u. The user’s attribute set is the one that the prover wants to have certified. Typically, the credential is bound to a secret that only the user holds, since this function is in fact a joint, interactive issuance protocol between issuer and user; moreover, credential issuance is typically blind (or partially blind), preventing the issuer from linking the issued credential to future presentations.
- A presentation function, executed by the user (prover), which takes as inputs a credential, the set of attributes to be disclosed, and the prover’s private (secret) key, and generates a proof token. When verified with respect to the issuer’s public key, this token attests that the disclosed attribute information is correctly embedded in the credential, and it suffices to convince a verifier of this fact.
- A verification function, executed by the verifier, which checks whether the proof token is valid with respect to the issuer’s public key.
- Idemix [55], which implements the Camenisch–Lysyanskaya paradigm; a user generates a presentation token by deriving a zero-knowledge proof from a credential bound to a user-held secret and a random value. The token attests to possession of a valid credential and, where required, to selected certified attributes, while remaining unlinkable across multiple showings for ordinary verifiers. However, the token is cryptographically derived from persistent secret information and issuer-certified data.
- U-Prove [56], which is based on [57]; in this system, a presentation token is generated from a credential obtained via a blind issuance protocol and is cryptographically bound to a user-held secret. Unlike in Idemix, though, tokens are typically designed to be intentionally linkable, so that repeated presentations of the same token yield a stable pseudonym for the user (prover). Verification confirms the validity of the certified attributes with respect to the issuer’s public key, but does not by itself reveal the user’s identity.
- PRIMA [58], which decouples credential issuance by the Identity Provider from subsequent service authentication by enabling the user to mediate the use of issued credentials. Authentication tokens are generated locally by the user and disclose only the information strictly required by the relying service, without involving the Identity Provider at authentication time. As a result, the Identity Provider is prevented from directly observing service usage and its ability to track or profile users across services is significantly reduced.
- They are irreversible, since the derived pseudonyms (tokens) do not allow the users’ original identifiers to be computed directly. Of course, as also discussed above, they are verifiable, meaning that having the user’s identifier and the secret information allows verifying whether a given token corresponds to this user.
- They are inherently probabilistic, since tokens are generated using random values within the underlying cryptographic computations.
- They are blind, as is the case for all user-generated pseudonyms (i.e., the data controller does not learn the original identifiers).
- They are decentralised, as users actively participate in the whole pseudonymisation process. However, typically, there is a central issuer—especially in anonymous credential systems.
6. Summary and Discussion
6.1. Trust Assumptions for Data Controllers
- Honest controller: the data controller is assumed to properly follow the specified pseudonymisation process, protect the cryptographic keys, re-identify only when strictly required, and generally fulfil the purpose limitation and data minimisation principles.
- Semi-honest controller: the data controller is assumed to properly follow the specified pseudonymisation process, but it may attempt to infer or derive additional information on data subjects based on legitimately accessible data (e.g., pseudonymised data, other auxiliary information, etc.).
- Adversarial controller: the controller is assumed to possibly deviate even from the intended pseudonymisation process, intentionally attempting to re-identify or single out individuals so as to violate data minimisation and/or purpose limitation. Such a controller may also collude with other parties.
6.2. Implementation Aspects of Pseudonymisation Approaches
6.2.1. Symmetric Encryption
6.2.2. Keyed Hash Function
6.2.3. Asymmetric Encryption
6.2.4. Identity-Based Encryption (IBE)
6.2.5. Polymorphic Encryption and Pseudonymisation (PEP)
6.2.6. Oblivious Pseudorandom Functions (OPRFs)
6.2.7. Secret Sharing
6.2.8. User-Generated Pseudonyms
6.3. How to Protect the Secret Keys for Pseudonymisation
7. (Legally) Revisiting the Notion of Pseudonymisation
8. Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
| --- | --- |
| AES | Advanced Encryption Standard |
| CJEU | Court of Justice of the European Union |
| EDPB | European Data Protection Board |
| EU | European Union |
| FPE | Format-Preserving Encryption |
| GDPR | General Data Protection Regulation |
| HIPAA | Health Insurance Portability and Accountability Act |
| IBE | Identity-Based Encryption |
| ID | Identifier |
| IoT | Internet of Things |
| IP | Internet Protocol |
| IV | Initialisation Vector |
| MAC | Message Authentication Code |
| OPRF | Oblivious Pseudorandom Function |
| PEP | Polymorphic Encryption and Pseudonymisation |
| PKG | Private Key Generator |
| PSI | Private Set Intersection |
| SMPC | Secure Multiparty Computation |
| VOPRF | Verifiable Oblivious Pseudorandom Function |
References
- Dhirani, L.L.; Mukhtiar, N.; Chowdhry, B.S.; Newe, T. Ethical Dilemmas and Privacy Issues in Emerging Technologies: A Review. Sensors 2023, 23, 1151.
- Hintze, M.; Emam, K.E. Comparing the Benefits of Pseudonymisation and Anonymisation under the GDPR. J. Data Prot. Priv. 2018, 2, 145–158.
- European Union Agency for Cybersecurity (ENISA). Recommendations on Shaping Technology According to GDPR Provisions: An Overview on Data Pseudonymisation; Technical Report; European Union Agency for Cybersecurity (ENISA): Athens, Greece, 2019.
- European Union Agency for Cybersecurity (ENISA). Pseudonymisation Techniques and Best Practices; Technical Report, ENISA Report; European Union Agency for Cybersecurity (ENISA): Athens, Greece, 2019.
- Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation). Official Journal of the European Union, L119, 4 May 2016, pp. 1–88. 2016. Available online: https://eur-lex.europa.eu/eli/reg/2016/679/oj/eng (accessed on 5 March 2026).
- European Parliament and Council of the European Union. Regulation (EU) 2025/327 of the European Parliament and of the Council of 11 February 2025 on the European Health Data Space and Amending Directive 2011/24/EU and Regulation (EU) 2024/2847. Official Journal of the European Union, L 327, 2025. Regulation (EU) 2025/327, 5.3.2025. Available online: https://eur-lex.europa.eu/eli/reg/2025/327/oj/eng (accessed on 5 March 2026).
- European Parliament and Council of the European Union. Regulation (EU) 2022/868 of the European Parliament and of the Council of 30 May 2022 on European Data Governance and Amending Regulation (EU) 2018/1724 (Data Governance Act). Official Journal of the European Union, L 152, 2022. Regulation (EU) 2022/868. 30 May 2022. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32022R0868 (accessed on 5 March 2026).
- European Parliament and Council of the European Union. Regulation (EU) 2023/2854 of the European Parliament and of the Council of 13 December 2023 on Harmonised Rules on Fair Access to and Use of Data and Amending Regulation (EU) 2017/2394 and Directive (EU) 2020/1828 (Data Act). Official Journal of the European Union, L 327, 2023. Regulation (EU) 2023/2854, Published 22 December 2023, Commonly Referred to as the Data Act. Available online: https://eur-lex.europa.eu/eli/reg/2023/2854/oj/eng (accessed on 5 March 2026).
- European Union Agency for Cybersecurity (ENISA). Data Pseudonymisation: Advanced Techniques and Use Cases; Technical Report TP-01-21-024-EN-N; European Union Agency for Cybersecurity (ENISA): Athens, Greece, 2021.
- Limniotis, K. Cryptography as the Means to Protect Fundamental Human Rights. Cryptography 2021, 5, 34.
- Charter of Fundamental Rights of the European Union. Official Journal of the European Communities C 364/01, 18 December 2000, 2000. Proclaimed by the European Parliament, the Council and the Commission, Consolidated Versions Commonly Cited (e.g., 2012) Available. Available online: https://www.europarl.europa.eu/charter/pdf/text_en.pdf (accessed on 5 March 2026).
- Chatzistefanou, V.; Limniotis, K. On the (Non-)anonymity of Anonymous Social Networks. In Communications in Computer and Information Science, E-Democracy—Privacy-Preserving, Secure, Intelligent E-Government Services—7th International Conference, E-Democracy 2017, Athens, Greece, 14–15 December 2017; Katsikas, S.K., Zorkadis, V., Eds.; Springer: Berlin/Heidelberg, Germany, 2017; Volume 792, pp. 153–168.
- Finck, M.; Pallas, F. They who must not be identified—Distinguishing personal from non-personal data under the GDPR. Int. Data Priv. Law 2020, 10, 11–36.
- Pfitzmann, A.; Hansen, M. Anonymity, Unlinkability, Unobservability, Pseudonymity, and Identity Management—A Consolidated Proposal for Terminology (Version v0.28). Technical Terminology Draft, TU Dresden, Faculty of Computer Science/ULD Kiel, 2006. Available online: https://dud.inf.tu-dresden.de/literatur/Anon_Terminology_v0.28.pdf (accessed on 5 March 2026).
- Jajodia, S.; Samarati, P.; Yung, M. (Eds.) Encyclopedia of Cryptography, Security and Privacy; Springer Nature: Berlin/Heidelberg, Germany, 2025. [Google Scholar] [CrossRef]
- European Data Protection Board. Guidelines 01/2025 on Pseudonymisation; Technical Report; European Data Protection Board: Brussels, Belgium, 2025. [Google Scholar]
- Akil, M.; Islami, L.; Fischer-Hübner, S.; Martucci, L.A.; Zuccato, A. Privacy-Preserving Identifiers for IoT: A Systematic Literature Review. IEEE Access 2020, 8, 168470–168485. [Google Scholar] [CrossRef]
- Abu Attieh, H.; Müller, A.; Wirth, F.N.; Prasser, F. Pseudonymization tools for medical research: A systematic review. BMC Med. Inform. Decis. Mak. 2025, 25, 128. [Google Scholar] [CrossRef] [PubMed]
- Dijkhuizen, N.V.; Ham, J.V.D. A Survey of Network Traffic Anonymisation Techniques and Implementations. ACM Comput. Surv. 2018, 51, 52:1–52:27. [Google Scholar] [CrossRef]
- Asad, M.; Shaukat, S.; Javanmardi, E.; Nakazato, J.; Tsukada, M. A Comprehensive Survey on Privacy-Preserving Techniques in Federated Recommendation Systems. Appl. Sci. 2023, 13, 6201. [Google Scholar] [CrossRef]
- Au nón, J.M.; Hurtado-Ramírez, D.; Porras-Díaz, L.; Irigoyen-Pe na, B.; Rahmian, S.; Al-Khazraji, Y.; Soler-Garrido, J.; Kotsev, A. Evaluation and utilisation of privacy enhancing technologies—A data spaces perspective. Data Brief 2024, 55, 110560. [Google Scholar] [CrossRef] [PubMed]
- Garrido, G.M.; Sedlmeir, J.; Uludağ, Ö.; Alaoui, I.S.; Luckow, A.; Matthes, F. Revealing the landscape of privacy-enhancing technologies in the context of data markets for the IoT: A systematic literature review. J. Netw. Comput. Appl. 2022, 207, 103465. [Google Scholar] [CrossRef]
- Sweeney, L. k -Anonymity: A Model for Protecting Privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2002, 10, 557–570. [Google Scholar] [CrossRef]
- National Institute of Standards and Technology (NIST). Federal Information Processing Standards Publication 197 (FIPS 197): Advanced Encryption Standard (AES); Technical Report FIPS 197-upd1; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2023.
- Noumeir, R.; Lemay, A.; Lina, J.M. Pseudonymization of Radiology Data for Research Purposes. J. Digit. Imaging 2007, 20, 284–295.
- Heurix, J.; Neubauer, T. Privacy-Preserving Storage and Access of Medical Data through Pseudonymization and Encryption. In Proceedings of the 8th International Conference on Trust, Privacy and Security in Digital Business (TrustBus 2011); Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6863, pp. 186–197.
- Aamot, H.; Kohl, C.D.; Richter, D.; Knaup-Gregori, P. Pseudonymization of patient identifiers for translational research. BMC Med. Inform. Decis. Mak. 2013, 13, 75.
- Elger, B.S.; Iavindrasana, J.; Iacono, L.L.; Müller, H.; Roduit, N.; Summers, P.E.; Wright, J. Strategies for health data exchange for secondary, cross-institutional clinical research. Comput. Methods Programs Biomed. 2010, 99, 230–251.
- Dworkin, M. Recommendation for Block Cipher Modes of Operation: Methods for Format-Preserving Encryption; NIST Special Publication 800-38G; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2016.
- Demir, L.; Kumar, A.; Cunche, M.; Lauradoux, C. The Pitfalls of Hashing for Privacy. IEEE Commun. Surv. Tutor. 2018, 20, 551–565.
- Schwartmann, R.; Weiß, S.; Group, D.P.F. White Paper on Pseudonymization: Guidelines for the Legally Secure Deployment of Pseudonymization Solutions in Compliance with the General Data Protection Regulation; White Paper/Technical Report; Digital Summit Data Protection Focus Group/ePrivacy.eu: Madrid, Spain, 2017.
- Zimmer, E.; Burkert, C.; Petersen, T.; Federrath, H. PEEPLL: Privacy-Enhanced Event Pseudonymisation with Limited Linkability. In Proceedings of the 35th ACM/SIGAPP Symposium on Applied Computing (SAC ’20), Brno, Czech Republic, 30 March–3 April 2020; pp. 1308–1311.
- Varanda, A.; Santos, L.; Costa, R.L.d.C.; Oliveira, A.; Rabadão, C. Log pseudonymization: Privacy maintenance in practice. J. Inf. Secur. Appl. 2021, 63, 103021.
- University of Nottingham and Julia Hippisley-Cox. OpenPseudonymiser. Open Source Pseudonymisation Software for Dataset Digest Generation. 2011. Available online: https://www.openpseudonymiser.org/ (accessed on 5 March 2026).
- Shamir, A. Identity-Based Cryptosystems and Signature Schemes. In Advances in Cryptology—CRYPTO 84; Chaum, D., Blakley, G.R., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1985; Volume 196, pp. 47–53.
- Boneh, D.; Franklin, M. Identity Based Encryption from the Weil Pairing. SIAM J. Comput. 2003, 32, 586–615.
- Boussada, R.; Elhdhili, M.E.; Saidane, L.A. A Lightweight Privacy-Preserving Solution for IoT: The Case of E-Health. In Proceedings of the IEEE 20th International Conference on High Performance Computing and Communications, IEEE 16th International Conference on Smart City, IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Exeter, UK, 28–30 June 2018; pp. 555–562.
- Verheul, E.; Jacobs, B.; Meijer, C.; Hildebrandt, M.; de Ruiter, J. Polymorphic Encryption and Pseudonymisation for Personalised Healthcare. Cryptology ePrint Archive, Paper 2016/411. 2016. Available online: https://eprint.iacr.org/2016/411 (accessed on 5 March 2026).
- van Gastel, B.E.; Jacobs, B.; Popma, J. Data Protection Using Polymorphic Pseudonymisation in a Large-Scale Parkinson’s Disease Study. J. Park. Dis. 2021, 11, S19–S25.
- Casacuberta, S.; Hesse, J.; Lehmann, A. SoK: Oblivious Pseudorandom Functions. In Proceedings of the 7th IEEE European Symposium on Security and Privacy, Genoa, Italy, 6–10 June 2022; pp. 625–646.
- Lehmann, A. ScrambleDB: Oblivious (Chameleon) Pseudonymization-as-a-Service. Proc. Priv. Enhancing Technol. 2019, 2019, 289–309.
- Davidson, A.; Goldberg, I.; Sullivan, N.; Tankersley, G.; Valsorda, F. Privacy Pass: Bypassing Internet Challenges Anonymously. Proc. Priv. Enhancing Technol. 2018, 2018, 164–180.
- Kolesnikov, V.; Kumaresan, R.; Rosulek, M.; Trieu, N. Efficient Batched Oblivious PRF with Applications to Private Set Intersection. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS ’16), Vienna, Austria, 24–28 October 2016; pp. 818–829.
- Lindell, Y. Secure Multiparty Computation (MPC). Commun. ACM 2020, 64, 86–96.
- Shamir, A. How to Share a Secret. Commun. ACM 1979, 22, 612–613.
- Li, H.; Pei, L.; Liao, D.; Sun, G.; Xu, D. Blockchain Meets VANET: An Architecture for Identity and Location Privacy Protection in VANET. Peer-to-Peer Netw. Appl. 2019, 12, 1178–1193.
- Biskup, J.; Flegel, U. On Pseudonymization of Audit Data for Intrusion Detection. In Designing Privacy Enhancing Technologies; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2001; Volume 2009, pp. 161–180.
- Lehnhardt, J.; Spalka, A. Decentralized Generation of Multiple, Uncorrelatable Pseudonyms without Trusted Third Parties. In Trust, Privacy and Security in Digital Business (TrustBus 2011); Furnell, S., Lambrinoudakis, C., Pernul, G., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6863, pp. 113–124.
- Schartner, P.; Schaffer, M. Unique User-Generated Digital Pseudonyms. In Computer Network Security (MMM-ACNS 2005); Gorodetsky, V., Kotenko, I., Skormin, V., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3685, pp. 194–205.
- Kermezis, G.; Limniotis, K.; Kolokotronis, N. User-Generated Pseudonyms Through Merkle Trees. In Privacy Technologies and Policy (APF 2021); Gruschka, N., Antunes, L.F.C., Rannenberg, K., Drogkaris, P., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; Volume 12703, pp. 89–105.
- Camenisch, J.; Hansen, M. (Eds.) Privacy and Identity Management for Life; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6794.
- Chaum, D. Blind Signatures for Untraceable Payments. In Advances in Cryptology: Proceedings of CRYPTO ’82; Plenum Press: New York, NY, USA, 1983; pp. 199–203.
- Camenisch, J.; Lysyanskaya, A. An Efficient System for Non-transferable Anonymous Credentials with Optional Anonymity Revocation. In Advances in Cryptology—EUROCRYPT 2001; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2001; Volume 2045, pp. 93–118.
- Kakvi, S.A.; Martin, K.M.; Putman, C.; Quaglia, E.A. SoK: Anonymous Credentials. In Security Standardisation Research; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2023; Volume 13895, pp. 129–151.
- Camenisch, J.; Herreweghen, E.V. Design and Implementation of the Idemix Anonymous Credential System. In Proceedings of the 9th ACM Conference on Computer and Communications Security, Washington, DC, USA, 18–22 November 2002; ACM: New York, NY, USA, 2002; pp. 21–30.
- Paquin, C. U-Prove Cryptographic Specification, version 1.1. Technical Report. Microsoft: Redmond, WA, USA, 2011.
- Brands, S. Rethinking Public Key Infrastructures and Digital Certificates: Building in Privacy; MIT Press: Cambridge, MA, USA, 2000.
- Asghar, R.; Backes, M.; Simeonovski, M. PRIMA: Privacy-Preserving Identity and Access Management at Internet-Scale. In Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018; IEEE: Kansas City, MO, USA, 2018; pp. 1–6.
- Court of Justice of the European Union. Judgment of the Court (First Chamber) of 4 September 2025, Case C-413/23 P: European Data Protection Supervisor v Single Resolution Board (Concept of Personal Data/Pseudonymisation). 2025. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:62023CJ0413 (accessed on 5 March 2026).
- European Data Protection Board. Report on Stakeholder Event on Anonymisation and Pseudonymisation of 12 December 2025; Technical Report; European Data Protection Board: Brussels, Belgium, 2026.
- Khutsaeva, A.; Leevik, A.; Bezzateev, S. A Survey of Post-Quantum Oblivious Protocols. Cryptography 2025, 9, 62.






| Cryptographic Technique | Reversible/Irreversible | Deterministic/Probabilistic | Blind/Non-Blind | Centralised/Distributed |
|---|---|---|---|---|
| Symmetric encryption | Reversible | Deterministic # | Non-blind | Centralised |
| Keyed hash function | Irreversible | Deterministic # | Non-blind | Centralised |
| Asymmetric encryption | Reversible | Probabilistic | Non-blind | Centralised |
| Identity-Based Encryption | Both options | Deterministic | Non-blind | Centralised |
| Polymorphic * | Reversible | Probabilistic | Non-blind | Centralised |
| Oblivious PRFs | Irreversible † | Deterministic | Blind | Distributed |
| Secret sharing | Reversible | Probabilistic | Non-blind | Distributed |
| User-generated pseudonyms | Irreversible † | Both options | Blind | Hybrid ‡ |
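
As a rough illustration of the "Keyed hash function" row above (irreversible, deterministic), the following Python sketch derives pseudonyms with HMAC-SHA256. The choice of HMAC-SHA256, the function name `keyed_hash_pseudonym`, and the sample identifier are assumptions made for this example only; the table does not prescribe any particular construction.

```python
import hashlib
import hmac
import secrets


def keyed_hash_pseudonym(identifier: str, key: bytes) -> str:
    """Return a hex pseudonym: HMAC-SHA256 of the identifier under a secret key.

    Hypothetical helper for illustration; not taken from the article.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Two independent secret keys, e.g. held by two different pseudonymising entities.
key_a = secrets.token_bytes(32)
key_b = secrets.token_bytes(32)

identifier = "alice@example.org"  # made-up sample identifier

p1 = keyed_hash_pseudonym(identifier, key_a)
p2 = keyed_hash_pseudonym(identifier, key_a)
p3 = keyed_hash_pseudonym(identifier, key_b)

print(p1 == p2)  # same key and identifier: same pseudonym (deterministic)
print(p1 == p3)  # a different key yields an unrelated pseudonym
```

In this sketch, records pseudonymised under the same key stay linkable to each other, while the identifier cannot be recomputed or checked by guessing without the key. That matches the centralised, non-blind setting in the table, where the key holder sees the identifier.
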

| Technique | Required Trust Model |
|---|---|
| Symmetric encryption | Honest |
| Keyed hash function | Honest |
| Asymmetric encryption | Honest |
| Identity-Based Encryption | Honest |
| Polymorphic | Semi-honest |
| Oblivious PRFs | Semi-honest or Adversarial (depending on the implementation) |
| Secret sharing | Adversarial (up to the relevant threshold) |
| User-generated pseudonyms | Semi-honest |
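
The threshold behaviour behind the "Secret sharing" rows (reversible, distributed, tolerant of adversarial parties up to a threshold) can be sketched with Shamir's scheme [Shamir, How to Share a Secret]. This is a minimal sketch under stated assumptions: the field prime `PRIME`, the helper names `split_secret` and `recover_secret`, and the encoding of an identifier as a small integer are all choices made for this example, not details given in the article.

```python
import secrets

# Assumed field modulus for this sketch: the Mersenne prime 2^127 - 1.
PRIME = 2**127 - 1


def split_secret(secret: int, n: int, t: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares such that any t of them reconstruct it."""
    if not 1 <= t <= n:
        raise ValueError("threshold t must satisfy 1 <= t <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be a field element")
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Made-up identifier, encoded as an integer, shared among 5 parties with threshold 3.
identifier = 123456789
shares = split_secret(identifier, n=5, t=3)
recovered = recover_secret(shares[:3])
print(recovered == identifier)
```

Any three of the five shares recover the identifier, whereas fewer than three reveal nothing about it. The sharing is probabilistic because fresh random coefficients are drawn on every call. The modular inverse via `pow(den, -1, PRIME)` requires Python 3.8 or later.
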
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Limniotis, K. Cryptographic Foundations of Pseudonymisation for Personal Data Protection. Cryptography 2026, 10, 18. https://doi.org/10.3390/cryptography10020018

