A Markov Chain Replacement Strategy for Surrogate Identifiers: Minimizing Re-Identification Risk While Preserving Text Reuse
Abstract
1. Introduction
1.1. Replacement Text in Existing Software
1.2. Single Occurrence Re-Identification Vulnerability in Existing HIPS Replacement Methods
1.3. Need for Improved HIPS Strategies
1.4. Contribution: Evaluation of Novel HIPS Strategies
2. Materials and Methods
2.1. Software Implementation
2.2. Evaluation Corpora
2.2.1. Resynthesis Elements
2.2.2. Surrogate Substitution Strategies
2.2.3. Maximum Surrogate Repeat Size (MSRS)
2.3. PHI Leakage Evaluation on Real Corpora
2.4. PHI Leakage Evaluation on Simulated Corpora
2.5. Assessment of HIPS Strategy on Information Extraction Efficacy
2.6. BRATsynthetic Runtime Experiment
3. Results
3.1. Surrogate Replacement Strategy: PHI Leakage Assessment
3.2. Surrogate Replacement Strategy: Effect on Information Extraction Tasks
4. Discussion
4.1. PHI Leakage Replacement Strategy Evaluation
4.2. Larger Corpus Size
4.3. HIPS Strategy vs NER for PHI Protection
4.4. Impact on Information Extraction Tasks
4.5. Comparison to LLM-Based De-Identification Methods
4.6. Limitations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Li, J.; Zhou, Y.; Jiang, X.; Natarajan, K.; Pakhomov, S.V.; Liu, H.; Xu, H. Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition. J. Am. Med. Inform. Assoc. 2021, 28, 2193–2201.
- Kumichev, G.; Blinov, P.; Kuzkina, Y.; Goncharov, V.; Zubkova, G.; Zenovkin, N.; Goncharov, A.; Savchenko, A. MedSyn: LLM-based synthetic medical text generation framework. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2024; pp. 215–230.
- Hiebel, N.; Ferret, O.; Fort, K.; Névéol, A. Clinical Text Generation: Are We There Yet? Annu. Rev. Biomed. Data Sci. 2025, 8, 173–198.
- Dernoncourt, F.; Lee, J.Y.; Uzuner, O.; Szolovits, P. De-identification of patient notes with recurrent neural networks. J. Am. Med. Inform. Assoc. 2016, 24, 596–606.
- Schneble, C.O.; Elger, B.S.; Shaw, D.M. Google’s Project Nightingale highlights the necessity of data science ethics review. EMBO Mol. Med. 2020, 12, e12053.
- El Emam, K.; Jonker, E.; Arbuckle, L.; Malin, B. A Systematic Review of Re-Identification Attacks on Health Data. PLoS ONE 2011, 6, e28071.
- Heider, P.M.; Obeid, J.S.; Meystre, S.M. A Comparative Analysis of Speed and Accuracy for Three Off-the-Shelf De-Identification Tools. AMIA Summits Transl. Sci. Proc. 2020, 2020, 241–250.
- Steinkamp, J.M.; Pomeranz, T.; Adleberg, J.; Kahn, C.E.; Cook, T.S. Evaluation of Automated Public De-Identification Tools on a Corpus of Radiology Reports. Radiol. Artif. Intell. 2020, 2, e190137.
- Guzman, B.; Metzger, I.; Aphinyanaphongs, Y.; Grover, H. Assessment of Amazon Comprehend Medical: Medication Information Extraction. arXiv 2020, arXiv:2002.00481.
- CliniDeID. 2022. Available online: https://github.com/Clinacuity/CliniDeID (accessed on 1 January 2025).
- Meystre, S.M.; Underwood, G.; Heider, P. CliniDeID: An Open Source Solution for Accurate Clinical Text De-Identification. Technical Report, ResearchGate. 2023. Available online: https://www.researchgate.net/profile/Stephane-Meystre/publication/371303503_CliniDeID_an_Open_Source_Solution_for_Accurate_Clinical_Text_De-Identification/links/647def6279a722376513593b/CliniDeID-an-Open-Source-Solution-for-Accurate-Clinical-Text-De-Identification.pdf (accessed on 1 June 2025).
- Kayaalp, M.; Browne, A.C.; Dodd, Z.A.; Sagan, P.; McDonald, C.J. An Easy-to-Use Clinical Text De-identification Tool for Clinical Scientists: NLM Scrubber. 2015. Available online: https://www.researchgate.net/publication/319914511_An_Easy-to-Use_Clinical_Text_De-identification_Tool_for_Clinical_Scientists_NLM_Scrubber?channel=doi&linkId=59c193170f7e9b21a8265f57&showFulltext=true (accessed on 1 June 2025).
- Aberdeen, J.; Bayer, S.; Yeniterzi, R.; Wellner, B.; Clark, C.; Hanauer, D.; Malin, B.; Hirschman, L. The MITRE Identification Scrubber Toolkit: Design, training, and assessment. Int. J. Med. Inform. 2010, 79, 849–859.
- MIST v2.04. Updated Version of MITRE MIST Tool. 2019. Available online: https://mist-deid.sourceforge.net/ (accessed on 1 June 2025).
- Douglass, M.; Clifford, G.; Reisner, A.; Moody, G.; Mark, R.G. Computer-assisted de-identification of free text in the MIMIC II database. In Proceedings of the Computers in Cardiology, Chicago, IL, USA, 19–22 September 2004; pp. 341–344.
- Gardner, J.; Xiong, L. HIDE: An Integrated System for Health Information DE-identification. In Proceedings of the 2008 21st IEEE International Symposium on Computer-Based Medical Systems, Jyvaskyla, Finland, 17–19 June 2008; pp. 254–259, ISSN 1063-7125.
- Chambon, P.J.; Wu, C.; Steinkamp, J.M.; Adleberg, J.; Cook, T.S.; Langlotz, C.P. Automated deidentification of radiology reports combining transformer and “hide in plain sight” rule-based methods. J. Am. Med. Inform. Assoc. 2022, 30, 318–328.
- Murugadoss, K.; Rajasekharan, A.; Malin, B.; Agarwal, V.; Bade, S.; Anderson, J.R.; Ross, J.L.; Faubion, W.A.; Halamka, J.D.; Soundararajan, V.; et al. Building a best-in-class automated de-identification tool for electronic health records through ensemble learning. Patterns 2021, 2, 100255.
- Microsoft Presidio. Presidio: Data Protection and De-Identification SDK. Available online: https://microsoft.github.io/presidio/ (accessed on 1 June 2025).
- Neamatullah, I.; Douglass, M.; Lehman, L.H.; Goldberger, A. De-Identification Software Package v1.1 PhysioNet. 2007. Available online: https://doi.org/10.13026/C20M3F (accessed on 1 June 2025).
- Trotter, A.; O’Leary, T.; Cochran, M.D.; Coffee, C.; Nadimpalli, A.; Osborne, J.D. BRATsynthetic. 2025. Available online: https://github.com/uabnlp/BRATsynthetic (accessed on 20 September 2025).
- Kovačević, A.; Bašaragin, B.; Milošević, N.; Nenadić, G. De-identification of clinical free text using natural language processing: A systematic review of current approaches. Artif. Intell. Med. 2024, 102845.
- Carrell, D.; Malin, B.; Aberdeen, J.; Bayer, S.; Clark, C.; Wellner, B.; Hirschman, L. Hiding in plain sight: Use of realistic surrogates to reduce exposure of protected health information in clinical text. J. Am. Med. Inform. Assoc. 2013, 20, 342–348.
- Carrell, D.S.; Malin, B.A.; Cronkite, D.J.; Aberdeen, J.S.; Clark, C.; Li, M.R.; Bastakoty, D.; Nyemba, S.; Hirschman, L. Resilience of clinical text de-identified with “hiding in plain sight” to hostile reidentification attacks by human readers. J. Am. Med. Inform. Assoc. 2020, 27, 1374–1382.
- Carrell, D.S.; Cronkite, D.J.; Li, M.R.; Nyemba, S.; Malin, B.A.; Aberdeen, J.S.; Hirschman, L. The machine giveth and the machine taketh away: A parrot attack on clinical text deidentified with hiding in plain sight. J. Am. Med. Inform. Assoc. 2019, 26, 1536–1544.
- Patsakis, C.; Lykousas, N. Man vs the machine in the struggle for effective text anonymisation in the age of large language models. Sci. Rep. 2023, 13, 16026.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
- Jonnagaddala, J.; Wong, Z.S.Y. Privacy preserving strategies for electronic health records in the era of large language models. npj Digit. Med. 2025, 8, 34.
- Stenetorp, P.; Pyysalo, S.; Topić, G.; Ohta, T.; Ananiadou, S.; Tsujii, J. brat: A Web-based Tool for NLP-Assisted Text Annotation. In Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics; Segond, F., Ed.; Association for Computational Linguistics: Avignon, France, 2012; pp. 102–107. Available online: https://aclanthology.org/E12-2021/ (accessed on 1 September 2025).
- Osborne, J.D.; Booth, J.S.; O’Leary, T.; Mudano, A.; Rosas, G.; Foster, P.J.; Saag, K.G.; Danila, M.I. Identification of Gout Flares in Chief Complaint Text Using Natural Language Processing. AMIA Annu. Symp. Proc. 2020, 2020, 973–982.
- Pradhan, S.; Elhadad, N.; Chapman, W.; Manandhar, S.; Savova, G. SemEval-2014 Task 7: Analysis of Clinical Text. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014; pp. 54–62.
- Johnson, A.; Bulgarelli, L.; Pollard, T.; Horng, S.; Celi, L.A.; Mark, R. *MIMIC-IV* (version 0.4). PhysioNet, 2020. RRID:SCR_007345. Available online: https://physionet.org/content/mimiciv/0.4/ (accessed on 1 September 2025).
- Almudaifer, A.I.; Feldman, S.S.; O’Leary, T.; Covington, W.L.; Hairston, J.; Deitch, Z.; Crisan, E.; Riggs, K.; Walters, L.; Osborne, J.D. Annotation of Opioid Use Disorder Entity Modifiers in Clinical Text. In MEDINFO 2023—The Future Is Accessible; IOS Press: Amsterdam, The Netherlands, 2024; pp. 1458–1459.
- Stubbs, A.; Uzuner, O. Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus. J. Biomed. Inform. 2015, 58, S20–S29.
- Curella, F.; Faraglia, D. Faker Project. Available online: https://faker.readthedocs.io/en/master/ (accessed on 1 June 2025).
- Sigman, K. Expected number of Visits of a Finite State Markov Chain to a Transient State. 2016. Available online: http://www.columbia.edu/~ks20/4106-18-Fall/Notes-Transient.pdf (accessed on 1 January 2022).
- Eyre, H.; Chapman, A.B.; Peterson, K.S.; Shi, J.; Alba, P.R.; Jones, M.M.; Box, T.L.; DuVall, S.L.; Patterson, O.V. Launching into clinical space with medspaCy: A new clinical text processing toolkit in Python. AMIA Annu. Symp. Proc. 2022, 2021, 438.
- Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240.
- Almudaifer, A.I.; Covington, W.; Hairston, J.; Deitch, Z.; Anand, A.; Carroll, C.M.; Crisan, E.; Bradford, W.; Walter, L.A.; Eaton, E.F.; et al. Multi-task transfer learning for the prediction of entity modifiers in clinical text: Application to opioid use disorder case detection. J. Biomed. Semant. 2024, 15, 11.
- Shi, J.; Mowery, D.L.; Doing-Harris, K.M.; Hurdle, J.F. RuSH: A Rule-based Segmentation Tool Using Hashing for Extremely Accurate Sentence Segmentation of Clinical Text. In Proceedings of the AMIA Annual Symposium; American Medical Informatics Association: Bethesda, MD, USA, 2016; pp. 1587–1596. Available online: https://api.semanticscholar.org/CorpusID:35505279 (accessed on 1 July 2025).
- Endres, D.M.; Schindelin, J.E. A new metric for probability distributions. IEEE Trans. Inf. Theory 2003, 49, 1858–1860.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 4171–4186.
- Osborne, J.D.; O’Leary, T.; Mudano, A.; Booth, J.; Rosas, G.; Peramsetty, G.S.; Knighton, A.; Foster, J.; Saag, K.; Danila, M.I. *Gout Emergency Department Chief Complaint Corpora* (Version 1.0). PhysioNet, 2020. RRID:SCR_007345. Available online: https://physionet.org/content/emer-complaint-gout/1.0/ (accessed on 1 September 2025).
- Petrov, S.; Das, D.; McDonald, R. A universal part-of-speech tagset. arXiv 2011, arXiv:1104.2086.
- O’Leary, T.; Osborne, J.D. *uabnlp/BRATsynthetic: Version 0.3*. Zenodo. 2022. Available online: https://zenodo.org/records/7250621 (accessed on 1 January 2024).
- Osborne, J.D.; O’Leary, T.; Nadimpalli, A.; Kennedy, R.E. BRATsynthetic: Text de-identification using a Markov chain replacement strategy for surrogate personal identifying information. arXiv 2022, arXiv:2210.16125.
- Simancek, D.; Vydiswaran, V.V. Handling Name Errors of a BERT-Based De-Identification System: Insights from Stratified Sampling and Markov-based Pseudonymization. In Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024), St. Julian’s, Malta, 21 March 2024; pp. 1–7.
Tool or Method | Simple Surrogate | Label Surrogate | Consistent Surrogate | Markov Surrogate | Open Source |
---|---|---|---|---|---|
Presidio [19] | ✓ | ✗ | ✗ | ✗ | ✓ |
Deid [20] | ✗ | ✓ | ✗ | ✗ | ✓ |
MIST v2 [14] | ✗ | ✓ | ✗ | ✗ | ✓ |
CliniDeID [10] | ✓ | ✗ | ✓ | ✗ | ✓ |
Stanford HIPS System [17] | ✓ | ✗ | ✓ | ✗ | ✓ |
nference De-id [18] | ✓ | ✗ | ✓ | ✗ | ✗ |
BRATsynthetic [21] | ✓ | ✗ | ✓ | ✓ | ✓ |
HIPAA Safe Harbor Category | Type | Description | Critical PHI | BRATsynthetic Replacement | MIMIC Identifier Table |
---|---|---|---|---|---|
NAMES | DOCTOR | Health care provider name | No | DOCTOR | DOCTOR |
NAMES | PATIENT | Patient name | Yes | PATIENT | PATIENT |
NAMES | USERNAME | User IDs of provider | No | USERNAME | - |
GEO LOCATION | LOC-OTHER | Identifiable locations and landmarks | No | LOCATION-OTHER | LOC-OTHER |
GEO LOCATION | HOSPITAL | Hospital or clinic name | No | HOSPITAL | HOSPITAL |
GEO LOCATION | WARD | Ward or unit name | No | - | WARD |
GEO LOCATION | ZIP | Zip code | No | ZIP | ZIP |
GEO LOCATION | ORGANIZATION | Employers | No | ORGANIZATION | ORGANIZATION |
GEO LOCATION | COUNTRY | Country | No | COUNTRY | COUNTRY |
GEO LOCATION | STATE | State or province name | No | STATE | STATE |
GEO LOCATION | CITY | Name of city | No | CITY | LOC-OTHER |
GEO LOCATION | STREET | Street address | No | STREET | STREET |
DATES | DATE | Year | No | DATE (regex) | DATE |
DATES | DATE | Month/Day | No | DATE (regex) | DATE |
DATES | DATE | Day of the week | No | DATE (regex) | - |
DATES | HOLIDAY | Holidays | No | DATE (regex) | HOLIDAY |
DATES | AGE | AGE>=90 | No | AGE | AGE_90_ANDUP |
DATES | AGE | AGE<90 | No | AGE | - |
PHONE | PHONE | Telephone numbers | Yes | PHONE | PHONE |
VEHICLE IDS | VEHICLE_ID | Vehicle identification number or license | Yes | IDNUM (regex) | IDNUM |
FAX | FAX | Fax numbers | Yes | PHONE | PHONE |
DEVICE IDS | DEVICE IDS | Device identifiers and serial numbers | Yes | DEVICE (regex alphanumeric) | IDNUM |
IDNUM | IDNUM | License and health plan numbers | Yes | IDNUM (regex) | - |
MEDICAL RECORD | MEDICAL RECORD | Medical record number | Yes | IDNUM (alphanumeric) | - |
SSN | SSN | Social security number | Yes | IDNUM (regex) | SSN |
ACCOUNT ID | ACCOUNT ID | Account numbers | Yes | ACCOUNT ID | |
 | | Email address | Yes | - | |
URL | URL | URL | No | URL | - |
BIOMETRIC ID | BIOID | Biometric identifiers, including finger and voice prints | NA | BIOID (alphanumeric) | - |
IP ADDRESS | IP ADDRESS | Internet Protocol Address | No | URL | - |
Safe Harbor Element | BRATsynthetic Category |
---|---|
IMAGE | Images are not de-identified |
UNIQUE | Unique identifying phrases must be manually redacted |
Not covered under safe harbor | PROFESSION (also a category in i2b2) |
Not covered under safe harbor | TIME (exclusive to BRATsynthetic) |
Substitution Strategy | Simple | Consistent | Random | Markov |
---|---|---|---|---|
Original Name | Sandy | Sandy | Sandy | Sandy |
1st Surrogate Replacement | ENTITY_NAME | Sara | Kim | Sara |
2nd Surrogate Replacement | ENTITY_NAME | Sara | Nisha | Sara |
3rd Surrogate Replacement | ENTITY_NAME | Sara | Cathy | Ann |
4th Surrogate Replacement | ENTITY_NAME | Sara | Maria | Maria |
5th Surrogate Replacement | ENTITY_NAME | Sara | Hannah | Maria |
6th Surrogate Replacement | ENTITY_NAME | Sara | Lin | Maria |
PHI Similarity to Original | Lowest | High | Low | Intermediate |
MSRS | NA | NA | 1 | 3 |
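The four strategies compared in the table above can be illustrated with a minimal Python sketch. This is not the BRATsynthetic implementation: the name pool, the function names, and the keep probability `p_keep` are hypothetical, and the Markov behavior shown here (reuse the current surrogate with some probability, but force a switch once it has been repeated MSRS times) is an assumption inferred from the table's example sequence and its MSRS row. Treating Random as the MSRS = 1 case with no reuse is likewise an assumption.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical surrogate pool; a real system would draw from a far larger source.
NAME_POOL = ["Sara", "Kim", "Nisha", "Cathy", "Maria", "Hannah", "Lin", "Ann"]


def fresh_surrogate(rng: random.Random, exclude: Optional[str] = None) -> str:
    """Draw a surrogate from the pool, never returning `exclude`."""
    return rng.choice([name for name in NAME_POOL if name != exclude])


def simple(mentions: List[str], label: str = "ENTITY_NAME") -> List[str]:
    """Simple: every mention becomes the same category label."""
    return [label for _ in mentions]


def consistent(mentions: List[str], rng: random.Random) -> List[str]:
    """Consistent: each original value maps to one surrogate throughout."""
    mapping: Dict[str, str] = {}
    out = []
    for mention in mentions:
        if mention not in mapping:
            mapping[mention] = fresh_surrogate(rng)
        out.append(mapping[mention])
    return out


def markov(mentions: List[str], rng: random.Random,
           p_keep: float = 0.6, msrs: int = 3) -> List[str]:
    """Markov (assumed form): keep the current surrogate with probability
    p_keep, but always switch once it has been used msrs times in a row."""
    state: Dict[str, Tuple[Optional[str], int]] = {}
    out = []
    for mention in mentions:
        current, run = state.get(mention, (None, 0))
        keep = current is not None and run < msrs and rng.random() < p_keep
        if not keep:
            # Start a new run with a surrogate that differs from the last one.
            current, run = fresh_surrogate(rng, exclude=current), 0
        state[mention] = (current, run + 1)
        out.append(current)
    return out


def random_strategy(mentions: List[str], rng: random.Random) -> List[str]:
    """Random (assumed form): a new surrogate at every mention (MSRS = 1)."""
    return markov(mentions, rng, p_keep=0.0, msrs=1)


def longest_run(surrogates: List[str]) -> int:
    """Length of the longest run of identical consecutive surrogates."""
    best = run = 0
    prev = object()
    for s in surrogates:
        run = run + 1 if s == prev else 1
        best = max(best, run)
        prev = s
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    mentions = ["Sandy"] * 6
    print("Simple:    ", simple(mentions))
    print("Consistent:", consistent(mentions, rng))
    print("Random:    ", random_strategy(mentions, rng))
    print("Markov:    ", markov(mentions, rng))
```

In this sketch, `msrs` bounds how many consecutive mentions of one original value can share a surrogate, which is the trade-off the table summarizes: Consistent keeps surrogates maximally similar to the original pattern of repetition, Random keeps none of it, and Markov sits in between.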
Corpus | UAB Total | UAB Discharge | MIMIC Discharge | UAB Total |
---|---|---|---|---|
Type | Document | Document | Document | Patient |
Mean | 388.5 | 355.6 | 6.8 | 8123.0 |
Median | 224 | 199 | 5 | 985 |
Range | 2–2545 | 10–2414 | 2–76 | 7–321,945 |
Machine | Documents | Words | PHI Entities | Runtime * |
---|---|---|---|---|
3.4 GHz Quad-Core Intel i7 CPU with 32 GB 1600 MHz DDR 3 | 28,547 | 32,432,577 | 1,710,386 | 20.9 s system time |
Task | Traditional | HIPS | ||||
---|---|---|---|---|---|---|
 | None | Simple | Consistent | Random | Markov | p-Value |
NER | 0.723 | 0.730 | 0.723 | 0.720 | 0.722 | 0.704 |
Subject | 0.914 | 0.936 | 0.918 | 0.926 | 0.927 | 0.868 |
DocTime | 0.932 | 0.926 | 0.925 | 0.934 | 0.925 | 0.105 |
Negation | 0.972 | 0.969 | 0.975 | 0.975 | 0.972 | 0.469 |
MedSpacy Task | Consistent | Random | Markov |
---|---|---|---|
Context | |||
Dependency Parse | |||
Tokenization | |||
Entity Extraction | |||
POS Tagging | |||
Spans |
Document Type | None | Consistent | Random | Markov | Explanation |
---|---|---|---|---|---|
Psychiatry Note | 0 | 1 | 0 | 0 | ER: UAB ER, -> ER: ACH ER |
MR Breast Diagnostic Bil wo+w contrast | 2 | 2 | 1 | 2 | Estrogen receptor positive status [ER+] |
Emergency Department Note | 0 | 0 | 0 | 2 | recently seen in ER on 4/4 |
Emergency Department Note | 0 | 1 | 2 | 2 | Pt was recently seen in ER on 02/08 |
Consult Note | 15 | 15 | 15 | 16 | MRN: 1324616 Progesterone receptor |
Type | Example Mentions | | | | |
---|---|---|---|---|---|
Consistent | 36 Years | Ethanol level | 07/17/19 08:48 | Ampheta | 07/17/19 |
Markov | 41 Years | Ethanol level | 07/14/19 23:59 | Ampheta | 07/12/19 |
Random | 37 Years | Ethanol level | 07/13/19 22:17 | Ampheta | 07/17/19 |
Simple | [AGE][AGE] | Ethanol level | [DATE][TIME] | Ampheta | [DATE] |
Type | Example Mentions | | | |
---|---|---|---|---|
Consistent | Date & | Time 07/13/2019 18:32 | Ethanol | level 07/17/19 08:48 |
Markov | Date & | Time 07/13/2019 22:35 | Ethanol | level 07/14/19 07:54 |
Random | Date & | Time 07/17/2019 06:44 | Ethanol | level 07/13/19 22:17 |
Simple | Date & | Time [DATE][TIME] | Ethanol | level [DATE] [TIME] |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Osborne, J.D.; Trotter, A.; O’Leary, T.; Coffee, C.; Cochran, M.D.; Mansilla-Gonzalez, L.; Nadimpalli, A.; McAnnally, A.; Almudaifer, A.I.; Curtis, J.R.; et al. A Markov Chain Replacement Strategy for Surrogate Identifiers: Minimizing Re-Identification Risk While Preserving Text Reuse. Electronics 2025, 14, 3945. https://doi.org/10.3390/electronics14193945