Integrating Unstructured EHR Data Using an FHIR-Based System: A Case Study with Problem List Data and an FHIR IPS Model
Abstract
1. Introduction
2. Related Work
3. Research Methodology
3.1. Problem Statement
3.2. Method
- Identification of the section and resource tags—executed only once (Stage 1);
- Data processing—executed for every clinical note file (Stage 2).
- Stage 1: Tag Identification
- The identification of sections related to the patient problem list and diagnosis: The list of section tags in [21], with 6773 items, was used as the starting point to identify and extract candidate sections that may contain information related to the patient problem list and diagnosis. The resulting list was then optimized by eliminating duplicates and by semantically matching variants to a pivot term. For example, principal diagnosis and secondary diagnosis were replaced by diagnosis (Figure 2).
- Classification into categories: The final French tag list includes three main categories: tags related to allergies (50 elements), diagnosis (106 elements), and other conditions (156 elements) (see Supplementary Materials).
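The consolidation of the tag list described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the `PIVOT_TERMS` table, the `normalize` and `consolidate` helpers, and the sample tags are hypothetical, and only the principal/secondary diagnosis example comes from the text.

```python
import unicodedata

# Hypothetical pivot table: each variant heading maps to one pivot term.
# Only the "diagnosis" pair is taken from the text.
PIVOT_TERMS = {
    "principal diagnosis": "diagnosis",
    "secondary diagnosis": "diagnosis",
}


def normalize(tag: str) -> str:
    """Lowercase a tag, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", tag)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def consolidate(tags: list[str]) -> list[str]:
    """Replace variants by their pivot term and drop duplicates, keeping order."""
    seen: set[str] = set()
    result: list[str] = []
    for tag in tags:
        key = normalize(tag)
        pivot = PIVOT_TERMS.get(key, key)
        if pivot not in seen:
            seen.add(pivot)
            result.append(pivot)
    return result


# Illustrative input only.
raw_tags = ["Principal Diagnosis", "secondary diagnosis", "Diagnosis", "Allergies", "allergies "]
consolidated = consolidate(raw_tags)
print(consolidated)
```

Normalizing before the lookup means that case, accent, and spacing variants collapse to the same key, so the pivot table only needs one entry per true synonym.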
- Stage 2: Data processing
- Step 1: Data preparation
- Step 2: NLP pipelines
- Step 3: Mapping to SNOMED
- Step 4: FHIR model creation
- Four rules related to context modifiers (experiencer, negation, temporality, and certainty);
- Two additional rules related to resource tag identification (allergy, diagnosis).
- Experiencer = non-patient: The experiencer context indicates whether the identified problem concerns the patient or a family member. In this scenario, the FHIR resource used was FamilyMemberHistory. This study, however, focused on the patient’s own clinical problems.
- Experiencer = patient and allergy tag presence: When an allergy tag is detected, the concept is mainly related to an allergy. Therefore, the FHIR resource to be used is AllergyIntolerance. The negation context is then used to confirm whether the patient had an allergy.
- Experiencer = patient and no allergy tag detected: The current patient problem is not related to an allergy, so the FHIR resource to be used is Condition. Next, the negation context is used to confirm the decision rule following these alternatives:
- Negation = Yes (Concept is negated): If it is the only concept for a condition, then the patient has no known conditions, and the corresponding SNOMED CT code will be added. Otherwise, the process continues with the next concept.
- Negation = No. The next step is to add the corresponding SNOMED CT code and mapping to the corresponding FHIR Condition element following these three rules:
- Temporality confirms whether the identified problem is still active. The value of the ClinicalStatus element of the Condition resource is active or inactive if the temporality is Recent or Historical, respectively.
- Certainty enables us to determine whether an identified problem is confirmed or only a hypothesis. In the first case, the VerificationStatus element value is confirmed; otherwise, it is unconfirmed.
- The diagnosis section tag enables the identification of the condition category element. The possible values are Encounter-diagnosis or Problem-List-Item.
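The decision rules above can be sketched as a single mapping function. This is a minimal illustration, not the authors' implementation. The `Concept` fields, the flattened output dictionaries (real FHIR resources use CodeableConcept structures), the `present` key, the placeholder standing in for the "no known conditions" code, and the fallback to problem-list-item when no diagnosis tag is present are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder only: the actual SNOMED CT code is not given in the text.
NO_KNOWN_CONDITION = "<no-known-condition-code>"


@dataclass
class Concept:
    """One extracted concept with its context modifiers (hypothetical field names)."""
    code: str                     # SNOMED CT code of the concept
    experiencer: str = "patient"  # "patient" or anything else
    negated: bool = False
    temporality: str = "recent"   # "recent" or "historical"
    certain: bool = True
    allergy_tag: bool = False     # found under an allergy section tag
    diagnosis_tag: bool = False   # found under a diagnosis section tag


def map_concept(c: Concept, only_concept: bool = False) -> Optional[dict]:
    """Return a simplified resource dict, or None when the concept is skipped."""
    if c.experiencer != "patient":
        return {"resourceType": "FamilyMemberHistory", "code": c.code}
    if c.allergy_tag:
        # Negation tells whether the patient actually has the allergy.
        return {"resourceType": "AllergyIntolerance", "code": c.code, "present": not c.negated}
    if c.negated:
        if only_concept:
            return {"resourceType": "Condition", "code": NO_KNOWN_CONDITION}
        return None  # skip and continue with the next concept
    return {
        "resourceType": "Condition",
        "code": c.code,
        "clinicalStatus": "active" if c.temporality == "recent" else "inactive",
        "verificationStatus": "confirmed" if c.certain else "unconfirmed",
        # Assumption: concepts outside a diagnosis section are problem-list items.
        "category": "encounter-diagnosis" if c.diagnosis_tag else "problem-list-item",
    }


example = map_concept(Concept(code="<snomed-code>", diagnosis_tag=True))
print(example)
```

Keeping the rules in one ordered function mirrors the order in which the modifiers are checked: experiencer first, then the allergy tag, then negation, and only then the three Condition elements.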
3.3. Evaluation
- The rule-based approach results were manually validated by testing all possible cases because the dataset did not cover all scenarios of the context modifiers.
- The result of the overall process was then viewed in the IPS viewer.
4. Results
- (a) CANTEMIST-FRASIMED: The patient summary is organized into sections for medical history, physical examination, diagnosis, treatment, etc.
- (b) DISTEMIST-FRASIMED: The summary is a text with no headers.
- Step 1: Data preparation:
- Conversion into text format: The corpus files are already in text format.
- Section extraction:
- (a) CANTEMIST-FRASIMED corpus: Based on the resource and section tag list, the sections related to the patient problem list were extracted (see Table 2 for the list of section titles available in this corpus and their corresponding categories);
- (b) DISTEMIST-FRASIMED corpus: This step is not applicable to this corpus because there are no headers.
- Step 2: NLP pipelines
- Language detection: The purpose of this step is to select either the French or English model to be used based on the text file language. This study focuses on French text cases.
- Concepts and context extraction: Figure 4 shows an example of the SIFR output for a clinical text, including the context modifier values for each detected concept.
- Step 3: Mapping to SNOMED
- Step 4: FHIR model creation
- Most of the tools used in this study implement version 4 of FHIR [37].
- The HAPI FHIR server, an implementation of the FHIR specifications in Java, was used to test the proposed FHIR model [38].
- The details of the specification profile describing the FHIR resources and their format are based on the IPS implementation guide [39].
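As an illustration of how a generated resource could be submitted to a FHIR R4 server for testing, the sketch below builds (but does not send) a standard FHIR create request. It is not taken from the study: the base URL, the `build_create_request` helper, and the sample Condition content are assumptions, and the resource is simplified (for example, it carries only a text label instead of a SNOMED CT coding).

```python
import json
import urllib.request

# Assumption: base URL of a locally running FHIR R4 endpoint; replace as needed.
FHIR_BASE = "http://localhost:8080/fhir"


def build_create_request(resource: dict, base_url: str = FHIR_BASE) -> urllib.request.Request:
    """Build a FHIR 'create' interaction: POST [base]/[resourceType]."""
    url = f"{base_url.rstrip('/')}/{resource['resourceType']}"
    return urllib.request.Request(
        url,
        data=json.dumps(resource).encode("utf-8"),
        headers={
            "Content-Type": "application/fhir+json",
            "Accept": "application/fhir+json",
        },
        method="POST",
    )


# Simplified, illustrative Condition; patient reference and label are hypothetical.
condition = {
    "resourceType": "Condition",
    "clinicalStatus": {"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/condition-clinical",
        "code": "active"}]},
    "verificationStatus": {"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/condition-ver-status",
        "code": "confirmed"}]},
    "category": [{"coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/condition-category",
        "code": "encounter-diagnosis"}]}],
    "code": {"text": "illustrative problem"},
    "subject": {"reference": "Patient/example"},
}

request = build_create_request(condition)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running server.
```

Separating request construction from sending keeps the example runnable without a server and makes the payload easy to inspect before submission.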
5. Limitations
6. Discussion, Contributions, and Future Work
- The French language is rich and can be finely tuned in a specific context (regulatory, administrative, etc.).
- The development of models in French is crucial to ensure data, digital, and technology sovereignty [43,44,45]. There is increasingly a need for different forms of independence, control, and autonomy over digital infrastructure, technologies, and data. For example, translating into English for processing potentially exposes data to third parties (US servers). Canadian public health prefers locally hosted solutions due to security, confidentiality, and other regulatory constraints.
- If no one develops NLP models in French, the language will be hidden in AI systems; furthermore, translation risks reinforcing linguistic and cultural bias.
- Finally, a well-trained French-language model may outperform an English model combined with a translation system.
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| API | Application Programming Interface |
| BERT | Bidirectional Encoder Representations from Transformers |
| BioBERT | Bidirectional Encoder Representations from Transformers for Biomedical Text Mining |
| CEN | European Committee for Standardization |
| CDA | Clinical Document Architecture |
| CLAMP | Clinical Language Annotation, Modeling, and Processing |
| cTAKES | Clinical Text Analysis and Knowledge Extraction System |
| eHN | European eHealth Network |
| EHR | Electronic Health Record |
| FHIR | Fast Healthcare Interoperability Resources |
| G7 | Group of Seven Summits: Annual meeting of leaders from seven of the world’s largest advanced economies |
| GDHP | Global Digital Health Partnership |
| HL7 | Health Level Seven |
| IHE | Integrating the Healthcare Enterprise |
| IPS | International Patient Summary |
| ISO | International Organization for Standardization |
| PS-CA | Canadian Patient Summary |
| MedXN | Medication eXtraction and Normalization |
| MedSpacy | SpaCy-based library of core components targeting medical text |
| ML | Machine Learning |
| NLM | National Library of Medicine |
| NLP | Natural Language Processing |
| ONC | Office of the National Coordinator |
| SIFR | Ontology-based annotation web service to process biomedical text in French |
| SNOMED CT | Systematized Nomenclature of Medicine—Clinical Terms |
| UIMA | Unstructured Information Management Architecture |
| UMLS | Unified Medical Language System |
References
- International Patient Summary. 2025. Available online: https://international-patient-summary.net/ips-links-to-standards-and-specifications/ (accessed on 1 April 2025).
- Amar, F.; April, A.; Abran, A. Electronic Health Record and Semantic Issues Using Fast Healthcare Interoperability Resources: Systematic Mapping Review. J. Med. Internet Res. 2024, 26, e45209.
- Health Level Seven International IPS. 2025. Available online: https://hl7.org/fhir/uv/ips/ (accessed on 1 April 2025).
- HL7, IPS-Condition Resource. 2025. Available online: https://build.fhir.org/ig/HL7/fhir-ips/StructureDefinition-Condition-uv-ips.html (accessed on 1 April 2025).
- Zaghir, J.; Bjelogrlic, M.; Goldman, J.-P.; Aananou, S.; Gaudet-Blavignac, C.; Lovis, C. FRASIMED: A Clinical French Annotated Resource Produced through Crosslingual BERT-Based Annotation Projection. arXiv 2023, arXiv:2309.10770.
- Durango, M.C.; Torres-Silva, E.A.; Orozco-Duque, A. Named Entity Recognition in Electronic Health Records: A Methodological Review. Healthc. Inform. Res. 2023, 29, 286–300.
- Gaudet-Blavignac, C.; Foufi, V.; Bjelogrlic, M.; Lovis, C. Use of the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) for Processing Free Text in Health Care: Systematic Scoping Review. J. Med. Internet Res. 2021, 23, e24594.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240.
- Hong, N.; Wen, A.; Stone, D.J.; Tsuji, S.; Kingsbury, P.R.; Rasmussen, L.V.; Pacheco, J.A.; Adekkanattu, P.; Wang, F.; Luo, Y.; et al. Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries. J. Biomed. Inform. 2019, 99, 103310.
- Hong, N.; Wen, A.; Shen, F.; Sohn, S.; Liu, S.; Liu, H.; Jiang, G. Integrating Structured and Unstructured EHR Data Using an FHIR-based Type System: A Case Study with Medication Data. Proc. AMIA Jt. Summits Transl. Sci. 2018, 2017, 74–83.
- Hong, N.; Wen, A.; Shen, F.; Sohn, S.; Wang, C.; Liu, H.; Jiang, G. Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data. JAMIA Open 2019, 2, 570–579.
- Liu, S.; Luo, Y.; Stone, D.; Zong, N.; Wen, A.; Yu, Y.; Rasmussen, L.V.; Wang, F.; Pathak, J.; Liu, H.; et al. Integration of NLP2FHIR Representation with Deep Learning Models for EHR Phenotyping: A Pilot Study on Obesity Datasets. AMIA Jt. Summits Transl. Sci. Proc. 2021, 2021, 410–419.
- Soysal, E.; Wang, J.; Jiang, M.; Wu, Y.; Pakhomov, S.; Liu, H.; Xu, H. CLAMP—A toolkit for efficiently building customized clinical natural language processing pipelines. J. Am. Med. Inform. Assoc. 2018, 25, 331–336.
- Wang, J.; Mathews, W.C.; Pham, H.A.; Xu, H.; Zhang, Y. Opioid2FHIR: A system for extracting FHIR-compatible opioid prescriptions from clinical text. In Proceedings of the 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Seoul, Republic of Korea, 16–19 December 2020; pp. 1748–1751.
- Peterson, K.J.; Jiang, G.; Liu, H. A corpus-driven standardization framework for encoding clinical problems with HL7 FHIR. J. Biomed. Inform. 2020, 110, 103541.
- Peterson, K.J.; Liu, H. Automating the Transformation of Free-Text Clinical Problems into SNOMED CT Expressions. AMIA Jt. Summits Transl. Sci. Proc. 2020, 2020, 497–506.
- Wu, H.; Toti, G.; Morley, K.I.; Ibrahim, Z.M.; Folarin, A.; Jackson, R.; Kartoglu, I.; Agrawal, A.; Stringer, C.; Gale, D.; et al. SemEHR: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research. J. Am. Med. Inform. Assoc. 2018, 25, 530–537.
- SNOMED CT Tooling. 2025. Available online: https://www.snomed.org/software-tools (accessed on 1 March 2025).
- Shaitarova, A.; Zaghir, J.; Lavelli, A.; Krauthammer, M.; Rinaldi, F. Exploring the Latest Highlights in Medical Natural Language Processing across Multiple Languages: A Survey. Yearb. Med. Inform. 2023, 32, 230–243.
- Denny, J.C.; Spickard, A.; Johnson, K.B.; Peterson, N.B.; Peterson, J.F.; Miller, R.A. Evaluation of a Method to Identify and Categorize Section Headers in Clinical Documents. J. Am. Med. Inform. Assoc. 2009, 16, 806–815.
- Pomares-Quimbaya, A.; Kreuzthaler, M.; Schulz, S. Current approaches to identify sections within clinical narratives from electronic health records: A systematic review. BMC Med. Res. Methodol. 2019, 19, 155.
- Deepl Translator. 2025. Available online: https://www.deepl.com/en/translator (accessed on 1 March 2025).
- French English Medical Dictionary. 2025. Available online: https://dictionary.reverso.net/medical-french-english/ (accessed on 1 March 2025).
- ChatGPT. 2025. Available online: https://chatgpt.com/ (accessed on 1 March 2025).
- Feng, S.Y.; Gangal, V.; Wei, J.; Chandar, S.; Vosoughi, S.; Mitamura, T.; Hovy, E. A Survey of Data Augmentation Approaches for NLP. arXiv 2021, arXiv:2105.03075.
- Bayer, M.; Kaufhold, M.-A.; Reuter, C. A Survey on Data Augmentation for Text Classification. ACM Comput. Surv. 2023, 55, 1–39.
- Li, B.; Hou, Y.; Che, W. Data augmentation approaches in natural language processing: A survey. AI Open 2022, 3, 71–90.
- File Convertor PDF to Text. 2025. Available online: https://www.freeconvert.com/pdf-to-text (accessed on 1 November 2024).
- MedspaCy. 2025. Available online: https://github.com/medspacy/medspacy/blob/master/README.md (accessed on 1 November 2024).
- Eyre, H.; Chapman, A.B.; Peterson, K.S.; Shi, J.; Alba, P.R.; Jones, M.M.; Box, T.L.; DuVall, S.L.; Patterson, O.V. Launching into clinical space with medspaCy: A new clinical text processing toolkit in Python. arXiv 2021, arXiv:2106.07799.
- Spacy. 2025. Available online: https://spacy.io/ (accessed on 1 November 2024).
- SIFR, Clinical French Annotator. 2025. Available online: https://bioportal.lirmm.fr/annotator (accessed on 1 April 2025).
- Tchechmedjiev, A.; Abdaoui, A.; Emonet, V.; Zevio, S.; Jonquet, C. SIFR annotator: Ontology-based semantic annotation of French biomedical text and clinical notes. BMC Bioinform. 2018, 19, 405.
- Mirzapour, M.; Abdaoui, A.; Tchechmedjiev, A.; Digan, W.; Bringay, S.; Jonquet, C. French FastContext: A publicly accessible system for detecting negation, temporality and experiencer in French clinical notes. J. Biomed. Inform. 2021, 117, 103733.
- Canada Health Infoway Terminology Server. 2025. Available online: https://infocentral.infoway-inforoute.ca/en/tools/standards-tools/terminology-server (accessed on 1 April 2025).
- Shrimp Tool. 2025. Available online: https://ontoserver.csiro.au/shrimp/ (accessed on 1 April 2025).
- HL7 FHIR V4. 2025. Available online: https://hl7.org/fhir/R4/resourcelist.html (accessed on 1 November 2024).
- HAPI FHIR. 2025. Available online: https://hapi.fhir.org/ (accessed on 1 November 2024).
- HL7, IPS Implementation Guide. 2025. Available online: https://build.fhir.org/ig/HL7/fhir-ips/OperationDefinition-summary.html (accessed on 1 April 2025).
- Osornio, A.L.; Kaminker, D.; Campos, F.; D’Amore, J. IPS Viewer. 2025. Available online: https://www.ipsviewer.com/classic (accessed on 1 April 2025).
- Kong, M.; Fernandez, A.; Bains, J.; Milisavljevic, A.; Brooks, K.C.; Shanmugam, A.; Avilez, L.; Li, J.; Honcharov, V.; Yang, A.; et al. Evaluation of the accuracy and safety of machine translation of patient-specific discharge instructions: A comparative analysis. BMJ Qual. Saf. 2025. online ahead of print.
- Brandenberger, J.; Stedman, I.; Stancati, N.; Sappleton, K.; Kanathasan, S.; Fayyaz, J.; Singh, D. Using artificial intelligence based language interpretation in non-urgent paediatric emergency consultations: A clinical performance test and legal evaluation. BMC Health Serv. Res. 2025, 25, 138.
- Pizzul, D.; Veneziano, M. Digital sovereignty or sovereignism? Investigating the political discourse on digital contact tracing apps in France. Inf. Commun. Soc. 2024, 27, 1008–1024.
- Tan, K.L.; Chi, C.-H.; Lam, K.-Y. Survey on Digital Sovereignty and Identity: From Digitization to Digitalization. ACM Comput. Surv. 2024, 56, 1–36.
- Couture, S.; Toupin, S. What does the notion of “sovereignty” mean when referring to the digital? New Media Soc. 2019, 21, 2305–2322.






| ID | Category | Challenge Description |
|---|---|---|
| 1. | Data format | |
| 1.1 | | Information related to the patient problem list is mainly in unstructured format. |
| 1.2 | | Most reports are in PDF file format. |
| 2. | Language | |
| 2.1 | | Clinical notes in Canada, and in other French-speaking countries, are either in English or French. |
| 2.2 | | NLP models and techniques are language-dependent. Selecting the appropriate NLP pipeline requires prior identification of the language used. |
| 2.3 | | Most NLP tools are for English text. There is a major need in other languages, including French, which is widely used in Quebec, for the interoperability of the patient problem list. |
| 3. | Context and modifiers | |
| 3.1 | | The patient problem list may be related to an allergy/intolerance, a diagnosis, or other types of related clinical conditions. It is important to distinguish between these items to ensure correct mapping to the FHIR elements. |
| 3.2 | | The proposed framework needs to consider that the extracted condition may be in a negation context. |
| 3.3 | | The extracted condition may be related to the patient or their family members. |
| 3.4 | | The extracted condition may be confirmed or only a hypothesis. |
| 3.5 | | The extracted condition may be active or resolved (historical). |
| 4. | Standard/guidelines | |
| 4.1 | | A standard (e.g., SNOMED CT) must be used to ensure semantic interoperability or common understanding and interpretability. |
| 5. | Condition type | |
| 5.1 | | The patient problem list may be related to an allergy or to other types of health conditions. Allergies need to be distinguished from other condition types. |

| Section Title (Original List) | Selected/Unselected | Tag Category |
|---|---|---|
| Anamnèse | Selected | Other condition |
| Examen physique | Unselected | - |
| Examens complémentaires | Unselected | - |
| Tests complémentaires | Unselected | - |
| Diagnostic | Selected | Diagnosis |
| DIAGNOSTIC PRINCIPAL | Selected | Diagnosis |
| HISTOIRE DE LA FAMILLE | Selected | Other condition |
| MALADIE ACTUELLE | Selected | Diagnosis |
| CONTEXTE PERSONNEL | Selected | Other condition |
| Antécédents | Selected | Other condition |
| Antécédents oncologiques | Selected | Other condition |
| Traitement | Unselected | - |
| Évolution | Unselected | - |
| L’évolution | Unselected | - |
| Développements | Unselected | - |
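The section extraction for a corpus with headers can be illustrated with the titles listed in the table above. The sketch below is an assumption-laden illustration rather than the study's code: the header-matching rule (a line whose text, ignoring case and a trailing colon, equals a known title), the `header_key` and `extract_sections` helpers, and the sample note are all hypothetical.

```python
# Selected section titles and their tag categories, from the table above.
SECTION_CATEGORIES = {
    "anamnèse": "Other condition",
    "diagnostic": "Diagnosis",
    "diagnostic principal": "Diagnosis",
    "histoire de la famille": "Other condition",
    "maladie actuelle": "Diagnosis",
    "contexte personnel": "Other condition",
    "antécédents": "Other condition",
    "antécédents oncologiques": "Other condition",
}

# Unselected titles still delimit sections, so they must be recognized too.
UNSELECTED_TITLES = {
    "examen physique", "examens complémentaires", "tests complémentaires",
    "traitement", "évolution", "l’évolution", "développements",
}


def header_key(line: str) -> str:
    """Normalize a line for header lookup (assumed matching rule)."""
    return line.strip().rstrip(":").strip().lower()


def extract_sections(note: str) -> list[tuple[str, str, str]]:
    """Return (title, category, body) for each selected section of a note."""
    selected = []
    title, category, body = None, None, []
    for line in note.splitlines():
        key = header_key(line)
        if key in SECTION_CATEGORIES or key in UNSELECTED_TITLES:
            if category is not None:  # close the previous selected section
                selected.append((title, category, "\n".join(body).strip()))
            title = line.strip().rstrip(":").strip()
            category = SECTION_CATEGORIES.get(key)  # None for unselected titles
            body = []
        elif category is not None:
            body.append(line)
    if category is not None:
        selected.append((title, category, "\n".join(body).strip()))
    return selected


# Invented sample note, for illustration only.
note = """Anamnèse
Patient de 60 ans, hypertension artérielle.
Examen physique
Abdomen souple.
Diagnostic principal:
Adénocarcinome du côlon.
Traitement
Chimiothérapie."""

sections = extract_sections(note)
for section in sections:
    print(section)
```

Text under an unselected title is dropped, while each selected section keeps its category so that later steps can tell diagnosis sections from other conditions.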

| FRASIMED Dataset Files | Language | Accuracy | Recall | Precision | F1 Score |
|---|---|---|---|---|---|
| CANTEMIST-FRASIMED | French | 1 | 1 | 1 | 1 |
| DISTEMIST-FRASIMED | French | 0.947 | 1 | 0.8888 | 0.9411 |
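As a consistency check on the table above, the F1 score can be recomputed from the reported precision and recall as their harmonic mean. The short sketch below uses only the values from the table; the `f1_score` helper name is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) as reported in the table above.
reported = {
    "CANTEMIST-FRASIMED": (1.0, 1.0),
    "DISTEMIST-FRASIMED": (0.8888, 1.0),
}

for corpus, (precision, recall) in reported.items():
    print(corpus, round(f1_score(precision, recall), 4))
```

For DISTEMIST-FRASIMED, this gives 0.9411 after rounding to four decimals, which matches the reported F1 score.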
| Step | Input | Decision Logic | Output |
|---|---|---|---|
| 1 | Experiencer = patient and no associated allergy tag | The concept is related to the patient and no flag that it is an allergy | Use the Condition resource |
| 2 | Negation | “Affirmed” means there is no negation | Three contexts to verify |
| 3 | Temporality | Apply rule for recent | ClinicalStatus = Active |
| 4 | Certainty | Apply rule for value = certain | VerificationStatus = Confirmed |
| 5 | Diagnostic section tag | Concept included in Diagnosis section | Category = Encounter-diagnosis |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Amar, F.; April, A.; Abran, A. Integrating Unstructured EHR Data Using an FHIR-Based System: A Case Study with Problem List Data and an FHIR IPS Model. Electronics 2025, 14, 4134. https://doi.org/10.3390/electronics14214134

