Privacy-by-Design Environments for Large-Scale Health Research and Federated Learning from Data
Abstract
1. Introduction
1.1. Challenges in Health and Healthcare Data Sharing
1.2. Overcoming Data Sharing Challenges Using TREs and PHTs
2. Key Aspects and Components of TREs
2.1. Key Stakeholders
- Citizens and patients are not primary users of TREs, but they collectively contribute a wealth of data that helps inform and improve public health, and so they are also its beneficiaries. Whenever a new system is proposed, concerns about data privacy naturally arise, so advocates of TREs are expected to earn patients’ trust through continual and transparent communication with them.
- Researchers and analysts are the intended users of TREs who will request access to data that meet their research requirements. It is also important to satisfy users’ need for analytical tools that can securely support their data processing and data science tasks.
- Data custodians are managers, key providers, and curators of patient data who remain in control of their data even when others are granted access to these data. The provision and hosting of data may be outsourced to third-party TRE providers, but only data custodians maintain ownership of their data.
- Funding agencies provide financial support for the technical prototyping and development of TREs, with an interest in advancing innovative and effective healthcare research.
2.2. Five Safes Framework
- Safe people: Individuals who are trained, verified, and authorised, and who comply with ethical data usage and legal agreements, are considered safe people. Only safe people are permitted to access requested data and execute code within a TRE, and their account information is not to be shared with any other individual. For example, researchers who are approved by data custodians to work on projects that benefit the public would fall into this category.
- Safe projects: Proposed projects must demonstrate appropriate use of data and justify the potential public benefit. Proposals will be reviewed by data custodians, and only approved projects can be conducted within a TRE.
- Safe setting: Systems should store data securely and prevent unauthorised imports and exports of individual records, track researcher activity within the TRE for compliance monitoring, and provide tooling for running data analysis within the TRE. Two alternatives were presented in the original UKHDRA proposal to grant researchers remote access to a safe setting: (a) via a Virtual Desktop Interface (VDI) that requires users to authenticate before accessing view-only data and performing analysis, and (b) via an interface that permits only code execution, where code is run and tested against artificial (e.g., synthetic or transformed) datasets while the actual data remain hidden from the user. Ultimately, any packages and tools needed to carry out a safe project should be deployable upon request.
- Safe computing (an extension of safe setting): Safe settings imply on-premises hardware resources that data custodians are expected to provide. Given the demand for scalability, intense computation and robust security, cloud computing infrastructures are preferable today. Safe computing should provide a safe setting such that any outsourced private or public sector computing infrastructure safeguards individual health records from unauthorised third-party access, including access by cloud provider administrators. Data in a safe setting or safe computing environment should also be encrypted and be decryptable only by the appropriate safe people.
- Safe data: All identifying information should be removed from data within a TRE in a way that minimises risks of re-identification. Data should be standardised, and meta information about existing datasets within TREs, such as standards implemented, data descriptions, data origin, etc., should be made publicly available for greater discoverability and usability.
- Safe outputs: Any data analysis outputs produced via a TRE should not be exported without proper authorisation, and, if authorised, only necessary results that support reporting or publishing should be exported. However, researchers may apply for the release of data or code from the TRE.
- Safe return (an optional extension): Study subjects may consent to individual analysis results being returned to them. In this scenario, the outputs produced by research users of TREs may be returned to the organisation or team that collected the subject data. The identifiers, if known, should be available, and the subjects re-identifiable, only at the original setting [11].
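To make the interplay between safe people, safe projects and safe outputs more concrete, the following is a minimal Python sketch of how a TRE might gate the export of an aggregate table. It is an illustration under stated assumptions, not a description of any existing TRE: the class names, the `release_output` function, the `export_authorised` flag and the small-count threshold of 10 are all hypothetical, and small-count suppression is only one common disclosure-control technique that a real output-checking process might combine with manual review.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

# Hypothetical threshold: counts below this are treated as potentially disclosive.
MIN_CELL_COUNT = 10


@dataclass(frozen=True)
class Researcher:
    """Safe people: trained, authorised and bound by legal agreements."""
    user_id: str
    trained: bool
    authorised: bool
    agreements_signed: bool

    def is_safe(self) -> bool:
        return self.trained and self.authorised and self.agreements_signed


@dataclass(frozen=True)
class Project:
    """Safe projects: approved by the data custodian, with named members."""
    project_id: str
    approved_by_custodian: bool
    members: FrozenSet[str]


def may_access(researcher: Researcher, project: Project) -> bool:
    """Allow access only to a safe person working on an approved project."""
    return (
        researcher.is_safe()
        and project.approved_by_custodian
        and researcher.user_id in project.members
    )


def suppress_small_counts(
    table: Dict[str, int], threshold: int = MIN_CELL_COUNT
) -> Dict[str, Optional[int]]:
    """Replace counts below the threshold with None before release."""
    return {
        key: (count if count >= threshold else None)
        for key, count in table.items()
    }


def release_output(
    researcher: Researcher,
    project: Project,
    table: Dict[str, int],
    export_authorised: bool,
) -> Dict[str, Optional[int]]:
    """Safe outputs: export only with authorisation, after disclosure control."""
    if not may_access(researcher, project):
        raise PermissionError("researcher is not cleared for this project")
    if not export_authorised:
        raise PermissionError("export has not been authorised")
    return suppress_small_counts(table)
```

In this sketch, a researcher who is a project member but lacks authorisation is refused at the access check, and an authorised export of `{"A": 120, "B": 3}` would leave the TRE with the count for `"B"` suppressed.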
2.3. Workflow of Federated TREs
2.4. TREs with a Stronger Community Involvement
3. Benefits of TREs
3.1. Data Accessibility
3.2. Data Quality
3.3. Data Privacy
3.4. Expedited Data Sharing
3.5. Promoting Collaboration and Advancement in Healthcare Research and Practice
4. Select Examples of Successful TRE Implementations in the United Kingdom
5. PHTs: Key Concepts, Benefits and Examples
6. Future Directions
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kamel Boulos, M.N.; Kwan, M.-P.; El Emam, K.; Chung, A.L.; Gao, S.; Richardson, D.B. Reconciling public health common good and individual privacy: New methods and issues in geoprivacy. Int. J. Health Geogr. 2022, 21, 1.
- Quinn, M.; Forman, J.; Harrod, M.; Winter, S.; Fowler, K.E.; Krein, S.L.; Gupta, A.; Saint, S.; Singh, H.; Chopra, V. Electronic health records, communication, and data sharing: Challenges and opportunities for improving the diagnostic process. Diagnosis 2019, 6, 241–248.
- Nair, S.; Hsu, D.; Celi, L.A. Chapter 3: Challenges and opportunities in secondary analyses of electronic health record data. In Secondary Analysis of Electronic Health Records; Springer: Berlin/Heidelberg, Germany, 2016; pp. 17–26. Available online: https://www.ncbi.nlm.nih.gov/books/NBK543649 (accessed on 7 September 2022).
- Downey, A.S.; Olson, S.; Rapporteurs. Sharing Clinical Research Data: Workshop Summary; National Academies Press: Washington, DC, USA, 2013. Available online: https://www.ncbi.nlm.nih.gov/books/NBK131772/pdf/Bookshelf_NBK131772.pdf (accessed on 7 September 2022).
- Seh, A.H.; Zarour, M.; Alenezi, M.; Sarkar, A.K.; Agrawal, A.; Kumar, R.; Ahmad Khan, R. Healthcare data breaches: Insights and implications. Healthcare 2020, 8, 133.
- DHI News Team. TREs in the NHS—How Health Data Sharing Is Saving Lives. 2022. Available online: https://www.digitalhealth.net/2022/05/tres-in-the-nhs-how-health-data-sharing-is-saving-lives/ (accessed on 12 August 2022).
- Goldacre, B.; Morley, J. Better, Broader, Safer: Using Health Data for Research and Analysis. A Review Commissioned by the Secretary of State for Health and Social Care. 2022. Available online: https://www.gov.uk/government/publications/better-broader-safer-using-health-data-for-research-and-analysis (accessed on 12 August 2022).
- Health-RI. Personal Health Train. Available online: https://www.health-ri.nl/initiatives/personal-health-train (accessed on 12 August 2022).
- UK Health Data Research Alliance and NHSX. Building Trusted Research Environments—Principles and Best Practices; Towards TRE Ecosystems (1.0); Zenodo: Geneva, Switzerland, 2021.
- Desai, T.; Ritchie, F.; Welpton, R. Five Safes: Designing Data Access for Research; University of the West of England: Bristol, UK, 2016; Available online: https://www2.uwe.ac.uk/faculties/bbs/Documents/1601.pdf (accessed on 7 September 2022).
- Arbuckle, L.; Ritchie, F. The five safes of risk-based anonymisation. IEEE Secur. Priv. 2019, 17, 84–89.
- Health Data Research UK. Discover Data on the Gateway. Available online: https://www.hdruk.org/access-to-health-data/health-data-research-innovation-gateway/ (accessed on 12 August 2022).
- OpenID Connect. Available online: https://openid.net/connect/ (accessed on 12 August 2022).
- OAuth 2.0. Available online: https://oauth.net/2/ (accessed on 12 August 2022).
- OHDSI (Observational Health Data Sciences and Informatics). OMOP Common Data Model. Available online: https://www.ohdsi.org/data-standardization/the-common-data-model/ (accessed on 12 August 2022).
- Boniface, M.; Carmichael, L.; Hall, W.; Pickering, B.; Stalla-Bourdillon, S.; Taylor, S. The Social Data Foundation model: Facilitating health and social care transformation through datatrust services. Data Policy 2022, 4, e6.
- Mohanta, B.K.; Panda, S.S.; Jena, D. An overview of smart contract and use cases in blockchain technology. In Proceedings of the 2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Bengaluru, India, 10–12 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–4.
- Lovato, L.C.; Hill, K.; Hertert, S.; Hunninghake, D.B.; Probstfield, J.L. Recruitment for controlled clinical trials: Literature summary and annotated bibliography. Control. Clin. Trials 1997, 18, 328–352.
- Safran, C.; Bloomrosen, M.; Hammond, W.E.; Labkoff, S.; Markel-Fox, S.; Tang, P.C.; Detmer, D.E. Toward a national framework for the secondary use of health data: An American Medical Informatics Association White Paper. J. Am. Med. Inform. Assoc. 2007, 14, 1–9.
- Cai, L.; Zhu, Y. The challenges of data quality and data quality assessment in the big data era. Data Sci. J. 2015, 14, 2.
- Hulsen, T. Sharing Is Caring—Data Sharing Initiatives in Healthcare. Int. J. Environ. Res. Public Health 2020, 17, 3046.
- van Panhuis, W.G.; Paul, P.; Emerson, C.; Grefenstette, J.; Wilder, R.; Herbst, A.J.; Heymann, D.; Burke, D.S. A systematic review of barriers to data sharing in public health. BMC Public Health 2014, 14, 1144.
- Madden, S.; Pollard, C. Joining Up the Dots: Driving Innovation, Research and Planning through Trusted Research Environments. 2021. Available online: https://transform.england.nhs.uk/blogs/joining-up-the-dots-driving-innovation-research-and-planning-through-trusted-research-environments/ (accessed on 12 August 2022).
- Hoepman, J.-H. Privacy Design Strategies. In Proceedings of the 29th IFIP TC11 International Information Security Conference (IFIP SEC 2014), Marrakech, Morocco, 2–4 June 2014; Springer: Berlin/Heidelberg, Germany; pp. 446–459.
- Health Data Research UK. Innovation Gateway—Trusted Research Environments. Available online: https://www.healthdatagateway.org/collectioncategories/trusted-research-environment (accessed on 12 August 2022).
- NHS Digital. Trusted Research Environment Service for England. Available online: https://digital.nhs.uk/coronavirus/coronavirus-data-services-updates/trusted-research-environment-service-for-england (accessed on 12 August 2022).
- Wood, A.; Denholm, R.; Hollings, S.; Cooper, J.; Ip, S.; Walker, V.; Denaxas, S.; Akbari, A.; Banerjee, A.; Whiteley, W.; et al. Linked electronic health records for research on a nationwide cohort of more than 54 million people in England: Data resource. BMJ 2021, 373, n826.
- Whiteley, W.N.; Ip, S.; Cooper, J.A.; Bolton, T.; Keene, S.; Walker, V.; Denholm, R.; Akbari, A.; Omigie, E.; Hollings, S.; et al. Association of COVID-19 vaccines ChAdOx1 and BNT162b2 with major venous, arterial, or thrombocytopenic events: A population-based cohort study of 46 million adults in England. PLoS Med. 2022, 19, e1003926.
- CLOSER (UCL Social Research Institute). A Step-By-Step Guide to Applying and Accessing Linked Data in the UK LLC Trusted Research Environment (Webinar Videos and Slides). 2022. Available online: https://www.closer.ac.uk/event/webinar-ukllc-trusted-research-environment/ (accessed on 12 August 2022).
- UK Longitudinal Linkage Collaboration (UK LLC). Available online: https://ukllc.ac.uk/ (accessed on 12 August 2022).
- Genomics England. Research Environment. Available online: https://www.genomicsengland.co.uk/research/research-environment (accessed on 12 August 2022).
- Genomics England. Publications. Available online: https://www.genomicsengland.co.uk/research/publications (accessed on 12 August 2022).
- Genomics England. Genomics England Launches Next-Generation Research Platform Central to UK COVID-19 Response. 2020. Available online: https://www.genomicsengland.co.uk/news/research-environment-covid-19-lifebit-aws (accessed on 12 August 2022).
- SeRP (Secure eResearch Platform, Swansea University). SAIL Databank: A World-Class Trusted Research Environment (TRE). Available online: https://serp.ac.uk/2021/07/26/saildatabank/ (accessed on 12 August 2022).
- Jones, K.H.; Ford, D.V.; Thompson, S.; Lyons, R.A. A profile of the SAIL databank on the UK secure research platform. Int. J. Popul. Data Sci. 2019, 4, 1134.
- Research Data Scotland. Safe Haven Services. Available online: https://www.researchdata.scot/safe-haven-services (accessed on 12 August 2022).
- Health and Social Care (HSC) Northern Ireland Regional Business Services Organisation (RBSO/BSO). Honest Broker Service. Available online: https://hscbusiness.hscni.net/services/2454.htm (accessed on 12 August 2022).
- UK Office for National Statistics. Secure Research Service. Available online: https://www.ons.gov.uk/aboutus/whatwedo/statistics/requestingstatistics/secureresearchservice (accessed on 12 August 2022).
- UK Office for National Statistics. Explorable Datasets—Search the ONS Secure Research Service Metadata Catalogue. Available online: https://ons.metadata.works/domain/index.html (accessed on 12 August 2022).
- UK Statistics Authority. List of Digital Economy Act Accredited Processing Environments. Available online: https://uksa.statisticsauthority.gov.uk/digitaleconomyact-research-statistics/better-access-to-data-for-research-information-for-processors/list-of-digital-economy-act-accredited-processing-environments/ (accessed on 12 August 2022).
- Personal Health Train Implementation Network Manifesto. Available online: https://www.go-fair.org/wp-content/uploads/2019/05/Personal-Health-Train-Implementation-Network-Manifesto.pdf (accessed on 12 August 2022).
- Health-RI. Frequently Asked Questions—The Personal Health Train. Available online: https://pht.health-ri.nl/faq (accessed on 12 August 2022).
- Van Soest, J.; Sun, C.; Mussmann, O.; Puts, M.; van den Berg, B.; Malic, A.; van Oppen, C.; Towend, D.; Dekker, A.; Dumontier, M. Using the Personal Health Train for Automated and Privacy-Preserving Analytics on Vertically Partitioned Data. Stud. Health Technol. Inform. 2018, 247, 581–585.
- Welten, S.; Mou, Y.; Neumann, L.; Jaberansary, M.; Yediel Ucer, Y.; Kirsten, T.; Decker, S.; Beyan, O. A Privacy-Preserving Distributed Analytics Platform for Health Care Data. Methods Inf. Med. 2022, 61, e1–e11.
- Kamel Boulos, M.N.; Cai, Q.; Padget, J.A.; Rushton, G. Using software agents to preserve individual health data confidentiality in micro-scale geographical analyses. J. Biomed. Inform. 2006, 39, 160–170.
- DIFUTURE Tübingen PHT-meDIC. Software Architecture—PHT. Available online: https://personalhealthtrain.de/software-architecture/ (accessed on 12 August 2022).
- DIFUTURE Tübingen PHT-meDIC. PHT-meDIC GitHub Repository. Available online: https://github.com/PHT-Medic (accessed on 12 August 2022).
- PADME—Platform for Analytics and Distributed Machine Learning for Enterprises. Available online: https://websites.fraunhofer.de/PersonalHealthTrain/ (accessed on 12 August 2022).
- Vantage6—Privacy Preserving Federated Learning Infrastructure for Secure Insight Exchange. Available online: https://www.distributedlearning.ai/ (accessed on 12 August 2022).
- Shi, Z.; Zhovannik, I.; Traverso, A.; Dankers, F.J.W.M.; Deist, T.M.; Kalendralis, P.; Monshouwer, R.; Bussink, J.; Fijten, R.; Aerts, H.J.W.L.; et al. Distributed radiomics as a signature validation study using the Personal Health Train infrastructure. Sci. Data 2019, 6, 218.
- Deist, T.M.; Dankers, F.J.W.M.; Ojha, P.; Scott Marshall, M.; Janssen, T.; Faivre-Finn, C.; Masciocchi, C.; Valentini, V.; Wang, J.; Chen, J.; et al. Distributed learning on 20 000+ lung cancer patients—The Personal Health Train. Radiother. Oncol. 2020, 144, 189–200.
- Health-RI. Use Cases—Healthy Living—The Personal Health Train. Available online: https://pht.health-ri.nl/use-cases/healthy-living (accessed on 12 August 2022).
- Health-RI. Use Cases—Health Care—The Personal Health Train. Available online: https://pht.health-ri.nl/use-cases/health-care (accessed on 12 August 2022).
- Health-RI. Use Cases—Health Research—The Personal Health Train. Available online: https://pht.health-ri.nl/use-cases/health-research (accessed on 12 August 2022).
- Zorginstituut Nederland/Health-RI. The Personal Health Train in Health Care—Stories from the Work Field. Available online: https://pht.health-ri.nl/sites/healthtrain/files/2020-07/PHT%20in%20health%20care.pdf (accessed on 12 August 2022).
- DHI News Team. Ming Tang Joins Networks Debate on Federated Data Platform. 2022. Available online: https://www.digitalhealth.net/2022/08/ming-tang-networks-debate-federated-data-platform/ (accessed on 7 September 2022).
- Gartner. What Is Web3? 2022. Available online: https://www.gartner.com/en/articles/what-is-web3 (accessed on 7 September 2022).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, P.; Kamel Boulos, M.N. Privacy-by-Design Environments for Large-Scale Health Research and Federated Learning from Data. Int. J. Environ. Res. Public Health 2022, 19, 11876. https://doi.org/10.3390/ijerph191911876