A Framework for the Design of Privacy-Preserving Record Linkage Systems
Abstract
1. Introduction
2. What Is PPRL?
3. System Design Framework for PPRL
3.1. Key Roles in a PPRL System
3.2. Architecture of PPRL Systems
- Data controllers send the PII they use for linkage through a tokenizer to create linkage tokens and controller-specific tokens.
- Tokens are then sent to the Honest Broker.
- Data controllers provide the data platform with de-identified data that carries the controller-specific tokens generated by the tokenizer.
- The Honest Broker facilitates linkage either by providing mapping tables between the linkage tokens and controller-specific tokens or by providing the encryption keys that can decrypt the controller-specific tokens back into linkage tokens. The platform can then perform the linkage automatically using the information provided by the data controllers and the Honest Broker.
- A qualified expert determiner examines the system and the processes performed by the data platform to ensure that no PII is shared at any point in this process and, where necessary, may recommend ways to mitigate privacy gaps. These recommendations could include modifying the data flow, adding encryption steps, strengthening the security context of the data platform, adding access controls, limiting the number or type of data queries, or perturbing the linked data to prevent data controllers from linking it back to their original data, among others.
- Data users can submit queries to the platform and receive the linked de-identified data they need for their analyses. They cannot see the process the data platform performs to create the linked dataset.
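The flow in the list above can be illustrated with a minimal sketch. It is not the method of any particular tokenizer product. It assumes that linkage tokens are keyed SHA-256 hashes (HMAC) of normalized PII, that controller-specific tokens are a second keyed hash of the linkage token, and that the Honest Broker uses the mapping-table option rather than key escrow. All names, keys, and field choices (`SHARED_KEY`, `CONTROLLER_KEYS`, `HonestBroker`, `tokenize`, the sample rows) are hypothetical.

```python
import hashlib
import hmac

# Hypothetical keys for illustration only; real keys would be managed securely.
SHARED_KEY = b"demo-shared-secret"
CONTROLLER_KEYS = {"hospital": b"demo-key-h", "agency": b"demo-key-a"}


def normalize(first: str, last: str, dob: str) -> bytes:
    """Canonicalize PII fields so the same person yields the same bytes."""
    return "|".join(p.strip().lower() for p in (first, last, dob)).encode("utf-8")


def linkage_token(pii: bytes, shared_key: bytes) -> str:
    """Keyed SHA-256 over normalized PII; identical across controllers."""
    return hmac.new(shared_key, pii, hashlib.sha256).hexdigest()


def controller_token(link_tok: str, controller_key: bytes) -> str:
    """Second keyed hash so each controller's tokens differ from the others'."""
    return hmac.new(controller_key, link_tok.encode("ascii"), hashlib.sha256).hexdigest()


def tokenize(controller: str, first: str, last: str, dob: str):
    """Runs inside the controller's environment; PII never leaves it."""
    link = linkage_token(normalize(first, last, dob), SHARED_KEY)
    return link, controller_token(link, CONTROLLER_KEYS[controller])


class HonestBroker:
    """Holds only tokens (assumed mapping-table design), never PII or data rows."""

    def __init__(self):
        self._by_linkage = {}

    def register(self, controller: str, link_tok: str, ctrl_tok: str) -> None:
        self._by_linkage.setdefault(link_tok, {})[controller] = ctrl_tok

    def crosswalk(self, a: str, b: str):
        """Pairs of controller-specific tokens that refer to the same person."""
        return sorted((m[a], m[b]) for m in self._by_linkage.values() if a in m and b in m)


def run_demo():
    broker = HonestBroker()
    # Each controller tokenizes locally and sends only tokens to the broker.
    link_h, tok_h = tokenize("hospital", "Jane", "Doe", "1980-01-31")
    link_a, tok_a = tokenize("agency", " jane ", "DOE", "1980-01-31")
    broker.register("hospital", link_h, tok_h)
    broker.register("agency", link_a, tok_a)
    # De-identified rows sent to the platform carry only controller tokens.
    hospital_rows = {tok_h: {"diagnosis": "J45"}}
    agency_rows = {tok_a: {"benefit": "SNAP"}}
    # The platform joins rows using the crosswalk supplied by the broker.
    return [{**hospital_rows[h], **agency_rows[a]}
            for h, a in broker.crosswalk("hospital", "agency")]
```

In this sketch the platform never sees linkage tokens or PII, and the two controllers cannot match each other's records directly because their controller-specific tokens differ.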
3.3. System Controls for Privacy Protection
- Data sharing agreements between controllers specifying the data they plan to share, who will have access and for what purpose, allowable data usages, term limits, and liabilities;
- Data use agreements for end users setting out the terms of use, including prohibitions against sharing data further without approval and against re-identifying data subjects;
- Honest Broker agreements between controllers and Honest Brokers within the PPRL system, specifying the data Honest Brokers will receive from controllers, the allowed usages for said data, and whom the Honest Brokers can share data with;
- Business agreements with Tokenizer providers should Tokenization software need to be purchased or licensed;
- Policies at organizations receiving data (such as the Honest Broker, the host of the data platform, and the data users) that establish role-based access to data, data protection, secure data storage, data retention and destruction, identity and authentication, regular monitoring of data access and data use, and oversight of the PPRL system;
- Sanctions for people who violate terms of agreements and privacy and security policies;
- Personnel granted authority and responsibility for oversight of the system, including IT security staff, data privacy and protection officers, data governance and ethics councils, and data stewardship committees;
- Training on privacy, security, and confidentiality for staff who have access to the system and users of the data;
- Establishment of protocols for privacy breaches, including playbooks for containment, response, and informing those who are affected;
- IT technical controls that enable identity and authentication, role-based access, secure data storage, data retention and destruction, and monitoring of data access and data use;
- Secure transmission of data with encryption in transit to ensure sensitive information is not leaked when data is transferred between parties in the system;
- Encryption and hardening of systems that hold sensitive information, such as the Honest Broker’s environments that hold lookup tables between different sets of tokens;
- Implementation of anti-virus and anti-malware programs on all systems where data is stored;
- Physical and technical security for all systems where data is stored; and
- Regular review and security audits of the system.
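Two of the technical controls listed above, role-based access and monitoring of data access, can be sketched together in a few lines. This is an illustrative sketch only: the role names, the permitted actions, and the `AccessGate` class are hypothetical, and a production system would rely on its platform's identity and logging services rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical mapping of roles to the actions they may perform.
ROLE_PERMISSIONS = {
    "data_user": {"query_linked"},
    "honest_broker": {"read_token_map"},
    "platform_admin": {"query_linked", "export_audit_log"},
}


@dataclass
class AccessGate:
    """Checks each request against the role map and records every attempt."""

    audit_log: list = field(default_factory=list)

    def authorize(self, user: str, role: str, action: str) -> bool:
        allowed = action in ROLE_PERMISSIONS.get(role, set())
        # Denied attempts are logged too, so reviewers can spot misuse.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "action": action,
            "allowed": allowed,
        })
        return allowed
```

For example, `AccessGate().authorize("analyst1", "data_user", "read_token_map")` returns `False` in this sketch, because only the hypothetical `honest_broker` role may read the token mapping table.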
3.4. Qualified Expert Evaluations
- Whether any PII is transferred out of a controller’s environment during the tokenization process;
- Whether the tokens generated through the tokenization process could still be considered PII;
- Whether the information transferred by the Honest Broker back to data controllers or the data platform could be considered PII;
- Whether the data linkage adds re-identification risk to the de-identified datasets; and
- Whether the controls implemented on the system ensure privacy protection throughout the PPRL process.
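One way to illustrate the question of added re-identification risk from linkage is to compare the smallest group of records that share the same quasi-identifier values before and after linkage. The source does not prescribe a metric; the k-anonymity-style group size used here, the field names, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest set of rows sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())


# Hypothetical de-identified rows from one controller.
clinical = [
    {"token": "t1", "age_band": "30-39", "zip3": "275"},
    {"token": "t2", "age_band": "30-39", "zip3": "275"},
    {"token": "t3", "age_band": "40-49", "zip3": "275"},
    {"token": "t4", "age_band": "40-49", "zip3": "275"},
]
# Hypothetical attribute contributed by a second controller, keyed by token.
benefits = {"t1": "yes", "t2": "no", "t3": "yes", "t4": "yes"}

# Linkage adds a column, which can split previously indistinguishable groups.
linked = [{**row, "snap": benefits[row["token"]]} for row in clinical]

before = smallest_group(clinical, ["age_band", "zip3"])        # 2
after = smallest_group(linked, ["age_band", "zip3", "snap"])   # 1
```

Here every record shared its quasi-identifiers with at least one other record before linkage, but after linkage some records are unique, which is the kind of change an expert determiner might flag for mitigation.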
4. Discussion
4.1. Elucidation of Existing Systems
4.2. Design of New PPRL Systems
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
MDPI | Multidisciplinary Digital Publishing Institute |
DOAJ | Directory of open access journals |
PPRL | Privacy-preserving record linkage |
PII | Personal identifying information |
EHR | Electronic health record |
SDOH | Social determinants of health |
N3C | National Clinical Cohorts Collaborative |
NCATS | National Center for Advancing Translational Sciences |
SHA-2 | Secure Hash Algorithm 2 |
AES | Advanced Encryption Standard |
SNAP | Supplemental Nutrition Assistance Program |
NIST | National Institute of Standards and Technology |
FISMA | Federal Information Security Modernization Act |
HITRUST | Health Information Trust Alliance |
SOC 2 | System and Organization Controls 2 |
ISO | International Organization for Standardization |
IEC | International Electrotechnical Commission |
Party | Role | Responsibility |
---|---|---|
Data Controllers | Custodians of data subjects’ information. | Has the legal obligation to protect the privacy rights of data subjects whose data they control. Must enact physical, administrative, and technical controls to protect data. Must ensure that any data processors who access and use data they control are able to meet the same standards they follow for privacy and security controls. |
Tokenizer | Provides a technical solution for the creation of non-identifiable tokens to represent data subjects and facilitate linkages between datasets. | Provides a secure solution for the creation of data subject tokens. Said solution often involves deploying tokenization software on-site that does not require connection to the internet, cloud computing, or any external system to operate, requiring the solution to be modular and containerizable. |
Honest Broker | Facilitates data linkages in a privacy-preserving manner. This can include performing de-identification and data encryption or just be limited to acting as an escrow service for encryption keys and patient token mappings. | Is a neutral third party that provides protections for data subject privacy within a data use and sharing system. Other responsibilities may be defined in an Honest Broker Agreement with contractual controls in place that specify their role in the system and what they can and cannot do with the data. |
Expert Determiner | Evaluates and provides evidence (i.e., a report) that shows that what is being planned with the data does not violate the privacy rights of data subjects. | Provides an unbiased evaluation of the privacy risks to data subjects from the anticipated usage and data linkages and, if there are risks, provides recommendations for mitigation strategies to address those risks. The expert determiner may evaluate data subject identifiability before and after linkage, privacy and security controls on a system, end data user motives and capacity for privacy violations (including data access and use limitations), and likelihood of data breach. The determination may recommend additional transformations be applied to the data or additional controls be put in place. |
Privacy Officer | Knowledgeable expert in privacy-related matters who can help identify and address privacy-related issues when handling PII. | Provides guidance on privacy-related matters on projects where PII is being handled to ensure compliance with regulations and policies. Assists with system design to ensure privacy principles are followed throughout the data lifecycle. |
Data Platform | Provides the hosting environment where individual unlinked datasets and linked datasets are stored and may also provide a computational platform for data users to access, process, and analyze said data. | Implement the necessary privacy and security controls to protect the data hosted on the platform and provide analytic tools to end users for data processing and analysis. Platforms may be the enablers of PPRL, as the actual linkage process may take place on the platform using analytic tools provided within the platform. Platforms may also be the best place to implement technical privacy and security controls, such as limits on downloading data, access controls, and logging and monitoring data use. |
Data Governance | Provides the rules within which the system operates and oversight to ensure rules are followed. | Development of the rules that govern the data system, including the privacy and security controls that must be followed, access control procedures, roles and responsibilities of different parties in the system, contractual terms and agreements, data retention and destruction policies, quality rules, data and metadata standards, and regular monitoring and auditing. Provides oversight over data governance rules to ensure that they are followed. |
Data Users | End users of the data product. | Process and analyze the data according to the requirements and limitations set down within the data access and use agreements they sign. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Nie, Z.; Tyndall, B.; Brannock, D.; Gentles, E.; Parish, E.; Banger, A. A Framework for the Design of Privacy-Preserving Record Linkage Systems. J. Cybersecur. Priv. 2025, 5, 44. https://doi.org/10.3390/jcp5030044