1. Introduction
Electronic health records (EHRs) are invaluable data sources for healthcare research and administration, yet they contain sensitive patient information that must be protected [1]. Regulatory frameworks such as the USA’s Health Insurance Portability and Accountability Act (HIPAA) and the European Union’s General Data Protection Regulation (GDPR) mandate the protection of personally identifiable information in healthcare data [2,3]. During the COVID-19 pandemic, the importance of preserving privacy became even more apparent, as extensive data collection via contact tracing highlighted the necessity for robust deidentification techniques [4]. Deidentification techniques play a crucial role in preserving patient privacy by removing or anonymizing sensitive information.
Figure 1 illustrates the deidentification of a piece of Dutch medical text. The original text contains Protected Health Information (PHI) instances (Patient, Persoon, Locatie, Instelling, Datum, Leeftijd, Patientnummer, Telefoonnummer, Url; for translations to English, see Table 1). The text passes through the deidentification process, in which different NER techniques are applied until all PHI instances have been removed.
Despite advancements in deidentification techniques, no system is infallible. For example, the leading named-entity-recognition (NER) system [5] on the CoNLL 2003 (English) benchmark (https://www.clips.uantwerpen.be/conll2003/ner/ (accessed on 12 April 2025)) has an $F_1$ score of 94.6% (https://paperswithcode.com/paper/automated-concatenation-of-embeddings-for-1 (accessed on 12 April 2025)). Deidentification errors can lead to data leakage. To mitigate this risk, a conservative approach could involve aggressively removing any information suspected of being PHI. However, an overly aggressive strategy may result in excessive data loss, rendering the deidentified texts unusable due to the removal of essential non-PHI elements, such as drug names, medical codes, or other relevant clinical information. For example, Berg et al. [6] evaluated the impact of the quality of deidentification on downstream clinical NER used to extract body parts, disorders, drugs, or findings from texts, and found that poor deidentification led to poorer downstream clinical NER performance.
Deidentification is by no means a new task, and indeed impressive performance on English-language datasets has been achieved. Neamatullah et al. [7] reported a recall score of 0.967 for their pattern-matching method, which was trained and evaluated on a test corpus comprising 1836 nursing notes. However, these systems do not generalize out-of-the-box to the Dutch language. Two reasons are mentioned here as examples. First, Dutch names often have tussenvoegsels, i.e., words that come between the first name and the last name (such as van der or de). Such a format is not common in English, which can lead to names not being recognized and therefore leaked. Second, Dutch institutions often have names such as Mondriaan (https://www.mondriaan.eu/ (accessed on 12 April 2025)), which give no indication of their nature. Again, this can lead to data leakage.
This paper presents a comparative study of two prominent deidentification approaches, rule-based and neural network methods, focusing on Dutch medical texts. The Rule-Based Approach uses pre-defined rules, language knowledge, and domain-specific dictionaries to identify or replace PHI in text. Douglass et al. [8] used lexical lookup tables, regular expressions, and heuristic methods to deidentify medical care text data in US hospitals. Menger et al. [9] applied these techniques to a test corpus of Dutch medical care notes and treatment plans, creating a pattern-matching method called Deduce to automatically deidentify sensitive information. The Machine-Learning approach [10] consists of training a machine-learning system on labelled data to learn complex textual patterns and contextual structures, a process that, unlike rule-based methods, does not require explicit programming. Trienes et al. [11] conducted a comparative study of rule-based and machine-learning methods for Dutch medical record deidentification. Their labeling process, however, was different from that of Deduce. Therefore, their results do not adequately demonstrate the superiority of state-of-the-art machine-learning architectures over rule-based methods in the deidentification of Dutch medical texts. We aim to answer the following research question:
How do the rule-based and neural network approaches to deidentification compare in terms of effectiveness, adaptability, and generalization when applied to Dutch medical data, and how can the strengths of these methodologies be used to improve the deidentification process and enhance patient privacy protection?
To address this question, we conducted a replication-extension study aimed at comparing deidentification methodologies in Dutch medical data, building upon the research carried out by Trienes et al. [11], which compared the rule-based approach of Menger et al. [9] with neural network approaches.
Specifically, we conduct comparative experiments using correctly labeled datasets and explore the feasibility of both rule-based methods and deep-learning algorithms for deidentification in Dutch medical texts. We provide clear, accurate results and insights into effective deidentification techniques by employing an effective assessment methodology. Ultimately, this research aims to contribute to the body of knowledge on privacy protection in the medical records of the Netherlands. By exploring established and emerging deidentification methods, we hope to offer a comprehensive view of the field as it stands today and provide insights for future improvement and development.
2. Materials and Methods
2.1. Data
This experiment relied on two dataset sources, each generated by a different method, to study deidentification methods in Dutch medical texts.
The first dataset used for this experiment is a collection of files in JSON Lines text format, exported from the UMC Utrecht EHR system. This format is a convenient choice for working with large datasets, since each row is a complete and independent data object.
The data consists of Dutch text, metadata, and span tags, reflecting various key information about individuals. Each row in the dataset corresponds to a unique record, formatted as a JSON object. The main fields in the JSON object are “text”, “meta”, and “spans”.
The “text” field contains a string of text in Dutch, usually representing a medical context. It includes patient information such as name, email address, phone number, patient number, facility location, appointment date, and name of the treating physician.
The “meta” field contains additional metadata about the record, such as the individual’s first name, last name, initials, data source, year, and unique identifier (uid).
The “spans” field is an array of objects, each object containing information about a particular tag present in the text. Each object consists of the start and end indices of the tag in the text and the tag itself. The tags represent various types of identifiable information, such as “patient”, “person”, “location”, “institution”, “date”, “age”, “patient number”, “phone number”, and “url”.
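To make this layout concrete, the following minimal sketch reads such a JSON Lines file and prints the annotated PHI spans of each record. The top-level field names follow the description above; the file name and the span keys (“start”, “end”, “label”, in the style of Prodigy output) are assumptions about the exact export format.

    import json

    # Hypothetical file name; the exact export name depends on the local EHR dump.
    path = "records.jsonl"

    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)      # one complete, independent JSON object per row
            text = record["text"]          # Dutch medical text
            meta = record["meta"]          # e.g., first name, last name, initials, uid
            for span in record["spans"]:   # annotated PHI instances
                start, end = span["start"], span["end"]
                print(meta.get("uid"), span["label"], text[start:end])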
In this study, two deidentification methods, Deduce and Deidentify, are compared. Deduce and Deidentify were chosen because they are the leading out-of-the-box deidentification systems for Dutch texts [12,13], which makes them the tools most likely to be used in practice. Deduce is currently used at the UMC Utrecht. Deduce’s rule-based method uses a clear anonymization strategy. Deidentify offers an alternative strategy based on machine learning, which should increase its power, but this does not prove its superiority over Deduce. To compare the two fairly, we must use Deduce’s labeling strategy for data annotation and subsequently train a Deidentify model with the annotated data.
2.1.1. Annotated Dataset
This dataset was created at the UMC Utrecht, using an annotation strategy inspired by Deduce [9]. Prodigy (version 1.10.8, in token mode) was used for document annotation. The annotation categories align with those specified by Deduce. The strategy was honed over multiple rounds of document annotation; a particular challenge was the decision whether to annotate complete or partial words. Prodigy’s design makes whole-word tagging easier, whereas partial-word tagging is more in line with Deduce’s design. We settled for whole-word tagging for simplicity.
2.1.2. Synthetic Dataset
The second dataset source was synthetically generated using OpenAI’s GPT-4 model [14] and manual annotations. This two-step approach aims to create a comprehensive, consistent, and specific patient medical dataset in Dutch. First, GPT-4 generated English medical data containing various key elements such as patient name, contact information, location, appointment details, and age. Previous research has demonstrated the effectiveness of using generative language models, such as LSTM and GPT-2, to create synthetic EHR datasets annotated for named-entity recognition, highlighting their utility for downstream NLP tasks like deidentification [15]. The prompt given to GPT-4 was as follows:
I want you to help me to generate medical text. Within this medical text, it should include eight categories. Here is the example data:
Intake interview with Jan Jansen (e: j.g.jsnen_1966@email.com, t: 0612345678, patnr: 1243567). It concerns a 51-year-old man who will be treated at the outpatient clinic of the UMCU from 14 March to 31 July due to a heart attack. gloom complaints. The patient lives at Voorstraat 45b in Utrecht and will be treated here by Peter de Visser. Please generate 10 sets of such data for me.
After generating the English dataset, the text was translated into Dutch using Google Translate. The translated data was then fed back into the GPT-4 model to generate corresponding JSON files containing patient information, metadata, and indexed tags. A custom Python script was used to accurately identify and annotate the start and end index of each label, creating a synthetic Dutch healthcare dataset containing information on 100 patients. This method provides a useful tool for generating medical datasets in Dutch while protecting real individuals’ privacy. All our code is available at https://github.com/PabloMosUU/DeidCompare (commit 9f6a8a1) (accessed on 12 April 2025).
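To illustrate the offset-annotation step, the sketch below shows one way such a script could locate each PHI surface form in a generated text and emit start and end character indices. The helper function, the example sentence, and the label names are hypothetical simplifications, not the exact script in the repository.

    def find_spans(text, phi_values):
        """Locate each (surface form, label) pair in the text and return
        spans with start/end character offsets, in the format described above."""
        spans = []
        for surface, label in phi_values:
            start = text.find(surface)  # first occurrence only; a simplification
            if start != -1:
                spans.append({"start": start, "end": start + len(surface), "label": label})
        return spans

    # Hypothetical example in the style of the generated records.
    text = "Intakegesprek met Jan Jansen (patnr: 1243567), behandeld in het UMCU."
    phi = [("Jan Jansen", "PATIENT"), ("1243567", "PATIENTNUMMER"), ("UMCU", "INSTELLING")]
    print(find_spans(text, phi))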
2.2. Methodology
2.2.1. Deduce
Deduce [9] is an automated method for deidentifying Dutch medical texts by masking private information. This process protects patient privacy and complies with local laws and regulations, such as the General Data Protection Regulation (GDPR). Deduce focuses on several Protected Health Information (PHI) categories, including names, addresses, and social security numbers. The method uses a combination of lookup tables, decision rules, and fuzzy string matching to deidentify sensitive information. Deduce has been shown to be effective in test environments, and it is used at the UMC Utrecht.
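The sketch below is not Deduce’s implementation; it only illustrates, under simplified assumptions, the three rule-based ingredients mentioned above: a lookup table for institutions, a regular expression for dates, and fuzzy string matching for slightly misspelled names. The lists, patterns, and threshold are made up for the example.

    import re
    from difflib import SequenceMatcher

    # Toy lookup table and date pattern; Deduce's real resources are far larger.
    INSTITUTIONS = {"UMC Utrecht", "Mondriaan"}
    DATE_RE = re.compile(r"\b\d{1,2} (januari|februari|maart|april|mei|juni|juli|"
                         r"augustus|september|oktober|november|december)\b")

    def fuzzy_match(token, known_names, threshold=0.85):
        """Return True if the token approximately matches any known name."""
        return any(SequenceMatcher(None, token.lower(), name.lower()).ratio() >= threshold
                   for name in known_names)

    text = "Patient is op 14 maart gezien in het UMC Utrecht door P. de Viser."
    masked = DATE_RE.sub("<DATUM>", text)
    for institution in INSTITUTIONS:
        masked = masked.replace(institution, "<INSTELLING>")
    print(masked)                                      # dates and institutions masked
    print(fuzzy_match("Viser", {"Visser", "Jansen"}))  # True: near-match to a known surname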
2.2.2. Deidentify
Deidentify [11] represents a significant advancement in the field of automatic deidentification of health records, offering a practical solution to balance patient confidentiality with the exploitation of information in unstructured data. We use the pre-trained Deidentify model bilstmcrf_fast-v0.2.0.
Deidentify uses a neural approach termed BiLSTM-CRF (Bidirectional Long Short-Term Memory–Conditional Random Field). This approach minimizes the reliance on hand-crafted features intrinsic to traditional CRF-based deidentification, making it a more versatile and generalizable solution. The BiLSTM-CRF architecture, paired with contextual string embeddings, delivers state-of-the-art results for sequence labeling tasks.
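Deidentify is built on the Flair sequence-labeling framework. The sketch below shows generic Flair inference with such a tagger; it is not Deidentify’s own wrapper API, and the model path and label type are placeholders for the files shipped with the pre-trained model.

    from flair.data import Sentence
    from flair.models import SequenceTagger

    # Placeholder path: in practice this points to the downloaded pre-trained
    # BiLSTM-CRF model files (Deidentify provides its own loader for these).
    tagger = SequenceTagger.load("path/to/bilstmcrf/final-model.pt")

    sentence = Sentence("Jan Jansen werd op 14 maart gezien in het UMCU.")
    tagger.predict(sentence)                 # runs the BiLSTM-CRF over the tokens

    for span in sentence.get_spans("ner"):   # label type name is an assumption
        print(span)                          # span text plus the predicted PHI tag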
2.3. Evaluation Metrics
We employed precision, recall, and their harmonic mean, the $F_1$ score, defined as follows:
\[ \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \]
where $TP$ (true positives) is the number of correctly identified sensitive entities, $FP$ (false positives) is the number of non-sensitive entities incorrectly identified as sensitive, and $FN$ (false negatives) is the number of sensitive entities that were not identified. We do not consider partial entity matches.
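Because only exact matches count, the entity-level scores can be computed as in the minimal sketch below, where gold and predicted entities are represented as (start, end, label) tuples; this is an illustration of the metric, not our full evaluation harness.

    def entity_scores(gold, pred):
        """Exact-match entity-level precision, recall, and F1.
        gold and pred are sets of (start, end, label) tuples, so partially
        overlapping predictions count as errors, as in the evaluation above."""
        tp = len(gold & pred)
        fp = len(pred - gold)
        fn = len(gold - pred)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    # Toy example: one correct entity, one missed entity, one spurious prediction.
    gold = {(18, 28, "PERSOON"), (45, 53, "DATUM")}
    pred = {(18, 28, "PERSOON"), (60, 64, "URL")}
    print(entity_scores(gold, pred))   # (0.5, 0.5, 0.5)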
4. Discussion
4.1. Deduce
On the synthetic dataset, Deduce demonstrates a high score in tagging elements such as DATUM, LEEFTIJD, PATIENTNUMMER, PERSOON, TELEFOONNUMMER, and URL. The limited detection of Dutch institution tags (“INSTELLING”) may be due to the fact that Deduce’s institution names come mainly from a list of psychiatric care institutions, and names made up by GPT-4 might not be in that list. In addition, there are various ways to express the name of a medical institution, such as different abbreviation formats or a reversed word order. This variability can pose challenges for Deduce to identify them correctly. To overcome this problem, we propose enriching Deduce’s data source of institution names. By integrating more diverse names of medical institutions across the Netherlands and employing flexible matching algorithms or natural language processing techniques, it should be possible to account for the different representation formats. Similarly, for the “LOCATIE” category, Deduce has predefined records for 2647 places of residence. In our synthetic dataset, geographical locations are complex and randomized, which means they might not always appear in the whitelist provided by Deduce. In some instances, the data might only contain the postcode and city without including the street and house number, further complicating the task. This complexity could lead to the observed lower recall rates.
On the real-world annotation-based dataset, Deduce performs well on DATUM, INSTELLING, LEEFTIJD, LOCATIE, PERSOON, and TELEFOONNUMMER, but PATIENTNUMMER has a surprisingly low performance, which means that Deduce does not recognize patient numbers well in the real-world dataset. In some instances, phone numbers are written with the prefix separate from the rest of the number, for example: 06 12345678. Deduce often makes an error in these cases, tagging “1234567” as a patient number instead of recognizing it as part of a phone number.
4.2. Deidentify
Deidentify shows mixed performance on the synthetic dataset. Interestingly, the precision of Deidentify was relatively high, except for the INSTELLING, LEEFTIJD, PERSOON, and TELEFOONNUMMER classes. However, Deduce outperformed Deidentify in recognizing INSTELLING, LEEFTIJD, LOCATIE, PERSOON, TELEFOONNUMMER, and URL. The recall of Deidentify was lower than that of Deduce for the DATUM, LEEFTIJD, PATIENTNUMMER, TELEFOONNUMMER, and URL classes. Despite Deidentify being a pre-trained neural network model, its performance on the same synthetic dataset used to evaluate Deduce was not outstanding: only for INSTELLING and LOCATIE were its generalization abilities superior to those of Deduce. Thus, the overall performance of Deidentify was lower than that of Deduce on the synthetic dataset, although Deidentify demonstrated a better generalization ability for complex patterns such as INSTELLING and LEEFTIJD in unknown data.
Importantly, we note that we are using a pre-trained Deidentify model that was trained on data annotated using a different strategy from the one used by Deduce. Thus, just as previous work unfairly inflated Deidentify’s performance with respect to Deduce [11], we acknowledge that we inflate the performance of Deduce with respect to Deidentify. This underscores the conclusion drawn in previous work that Dutch deidentification systems do not generalize well out of the box, and that further work on the data is required before using them [9,11]. This work can consist of additional annotations, as well as the modification of rules or the re-training of machine-learning models. Future work should re-train the Deidentify model using data annotated in the Deduce format.
4.3. Generalizability
The observations regarding the performance of Deduce and Deidentify on our datasets encompass two phenomena. On the one hand, there are limitations to both systems regarding how well they can deidentify the texts they were designed to deidentify. On the other hand, there is the challenge of generalizing to a new dataset. Comparing our results in Table 1 and Table 2 to those of previous work [9,11], we note that Deduce performs much worse on institutions on our datasets than in the original paper. This is likely due to the fact that Deduce uses curated lists of institutions for deidentification, and our new and synthetic datasets include institutions that are probably not in Deduce’s list. This highlights the importance of tailoring a rule-based system such as Deduce before use, which can be a time-consuming process.
Interestingly, Deidentify showed a performance of 0.74 and 0.71 on locations on the synthetic and annotated datasets, respectively, while the original study reported a performance of 0.84 on addresses. Although our observed performance is lower, this can be explained by the fact that we did not fine-tune the model on our dataset, as pointed out in Section 4.2. It would be interesting to repeat this study on data from outside the Netherlands, where addresses have a different format, to see whether the performance would drop dramatically in that case.
5. Conclusions
Our experiments revealed that both Deduce and Deidentify have strengths and limitations, demonstrating variable performance with Dutch medical data. In fixed-format synthetic data, Deduce surpasses Deidentify in terms of effectiveness, achieving a higher score for several categories like DATUM, LEEFTIJD, PATIENTNUMMER, PERSOON, TELEFOONNUMMER, and URL, and on average. Its rule-based approach, while requiring continuous rule modification and updating, provides a more precise and effective method for deidentifying structured data and numeric categories. On the other hand, Deidentify’s neural network approach demonstrates better adaptability and generalization capability when handling complex patterns like DATUM and LOCATIE on unknown or real-world data.
When facing real-world data, after verifying the true labels of the dataset with accurate format checks and ensuring that they contain no errors, the Deduce method should be employed to deidentify entity categories such as INSTELLING, PATIENTNUMMER, PERSOON, TELEFOONNUMMER, and URL. However, before utilizing Deduce, the lookup list, trie-based hashing, and regular expressions should be appropriately modified to enhance Deduce’s performance. For the other entity categories, the Deidentify method can be considered for deidentification. Importantly, before using Deidentify, it would be beneficial to fine-tune the pre-trained Deidentify model on the required dataset. By combining Deduce’s performance in deidentifying structured and numeric categories with Deidentify’s adaptability and generalization ability for complex and unstructured data, the effectiveness, adaptability, and generalization of deidentification techniques on Dutch medical text can be improved. Thus, we show a summary of the recommended deidentification methods for different PHI categories in Table 3. In essence, this study sheds light on the distinct strengths of Deduce and Deidentify. However, the full spectrum of their comparative efficacies can only be deeply understood with broader access to real-world datasets. Consequently, while the findings guide current best practices in Dutch medical text deidentification, there remains a window of opportunity for further research, especially with access to richer and more diverse real-world datasets.
6. Limitations and Future Work
As mentioned elsewhere in the paper, the current study does not fine-tune Deidentify on our datasets, nor does it curate the Deduce lists specifically for our purposes. Thus, both models are likely to underperform to some degree on our datasets. This is especially the case for Deidentify, which used a different annotation strategy at training time. In future work, we plan to fine-tune Deidentify and carefully evaluate the white lists used by Deduce before running the comparison.
The performance of deidentification methods is not only important for legal reasons, but it might also have implications for downstream tasks [6]. For this reason, in future work, we plan to use our data for a downstream task such as violence risk assessment [16], applying both deidentification systems, to see how the performance on the downstream task varies.
Finally, as mentioned in Section 4.3, our study focused on Dutch clinical text. Some of the labels would perform significantly differently on non-clinical or non-Dutch text. As an example, consider deidentifying addresses in countries with a different address system. In future work, we will attempt to generate synthetic data from other countries to see how these deidentification systems perform.