Experts have developed multiple semantic standards, ranging from databases (e.g., protein, organism, etc.) to controlled terminologies (e.g., International Classification of Diseases (ICD), Systematized Nomenclature of Medicine (SNOMED), etc.) to ontologies (e.g., Gene Ontology, Disease Ontology [1], etc.), with varying complexity, to capture biomedical and healthcare semantics. In general, these standards express the same term in diverse ways. For example, the term atrial fibrillation is referred to as Atrial Fibrillations in MeSH (Medical Subject Headings), Auricular Fibrillation in PSY (Psychological Index Terms), Auricular Fibrillations in MeSH, Fibrillation - atrial in SNOMED, AFib in NCI (National Cancer Institute), etc. Furthermore, the position of the term in the hierarchical structure varies across the standards.
The National Library of Medicine (NLM) has undertaken the Unified Medical Language System (UMLS) [2] to bring different semantic standards together by integrating similar medical terms expressed in diverse ways into a single entity, called a concept; it also captures the relationships between the terms defined in the integrated sources. The current version of UMLS has integrated 211 semantic standards, generating 4.26 million concepts from 15.2 million medical terms, and more than 10 million relationships. The UMLS has three modules: Lexical Tools, Metathesaurus, and Semantic Network [5]. Metathesaurus holds the concepts, the mappings between concepts and terms from the source standards, and concept attributes, synonyms, relationships, etc. Semantic Network is an ontological resource that categorizes the Metathesaurus concepts into higher-level categories. For the rest of the article, UMLS knowledge refers to both UMLS Metathesaurus and Semantic Network.
Currently, users have two ways to access UMLS knowledge. First, users obtain UMLS knowledge as raw files (.nlm) and execute MetamorphoSys, an installation wizard, to generate a set of RRF (Rich Release Format) files (e.g., MRCONSO.RRF, MRSAB.RRF, MRREL.RRF, MRSTY.RRF, etc.) and SQL scripts. Executing the SQL scripts creates SQL tables (with the same names as the RRF files) and copies the data from the RRF files into the respective tables. Each RRF file has a corresponding SQL table, and the table structure reflects the RRF file structure. Individual users write SQL queries to obtain the desired UMLS knowledge, as well as custom software to parse the results for use in an application. Second, users query the UMLS API endpoints [6] developed by NLM, and the results are returned in JSON format. With the former approach, users must first understand the complex UMLS knowledge structure to query and use it, which is a complex and slow process. The structural complexity of UMLS knowledge is due to its distribution across multiple SQL tables (or RRF files), each holding attributes, relationships, and other characteristics related to a concept. Users must tackle the perplexing task of understanding what each table (or RRF file) holds, the columns of the table, and how to join the tables (or parse RRF files) to obtain the desired UMLS knowledge. Users must also have good SQL knowledge to write efficient SQL queries, and installing UMLS from scratch or updating it is another dreaded process. This approach is convoluted and daunting for non-technical users, who are a majority in the domain of biomedical and healthcare informatics.
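To make the parsing burden of the first approach concrete, the following is a minimal sketch of reading MRCONSO.RRF rows in Python. The column order follows the UMLS Reference Manual; the two sample rows are made up for illustration, and every identifier in them other than the CUI C0004238 is a placeholder rather than real UMLS data. The helper names are hypothetical.

```python
# Column order of MRCONSO.RRF as documented in the UMLS Reference Manual.
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_mrconso(lines):
    """Yield one dict per row; RRF rows are pipe-delimited with a trailing pipe."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # The trailing pipe yields one extra empty field; drop it.
        if len(fields) == len(MRCONSO_COLUMNS) + 1 and fields[-1] == "":
            fields = fields[:-1]
        if len(fields) != len(MRCONSO_COLUMNS):
            raise ValueError(f"unexpected field count: {len(fields)}")
        yield dict(zip(MRCONSO_COLUMNS, fields))


def terms_for_cui(rows, cui):
    """Return (source abbreviation, term string) pairs for one concept."""
    return [(row["SAB"], row["STR"]) for row in rows if row["CUI"] == cui]


# Illustrative rows only: LUI, SUI, AUI and CODE values are placeholders.
SAMPLE_ROWS = [
    "C0004238|ENG|P|L0000001|PF|S0000001|Y|A0000001||||MSH|MH|PLACEHOLDER1|Atrial Fibrillation|0|N||",
    "C0004238|ENG|S|L0000002|PF|S0000002|Y|A0000002||||NCI|SY|PLACEHOLDER2|AFib|0|N||",
]

if __name__ == "__main__":
    for sab, term in terms_for_cui(parse_mrconso(SAMPLE_ROWS), "C0004238"):
        print(sab, term)
```

Even this small reader requires knowing the column order, the trailing delimiter, and which column holds the source abbreviation, and resolving that abbreviation to a full source name would need a second file (MRSAB.RRF).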
Figure 1 presents a query output, obtained using the latter approach, for medical terms equivalent to atrial fibrillation in JSON format. There are 26 terms from multiple source standards equivalent to the query term atrial fibrillation. The figure displays the JSON structure of only one term due to space constraints. To use this knowledge, users need to refer to the documentation of one or more RRF files (or SQL tables) to understand the keys (ui, obsolete, rootSource, etc.) and associated values in the output JSON structure. For example, the key “obsolete” indicates whether the term is outdated in the source. The user needs to refer to the MRCONSO RRF file (or SQL table) documentation to get this information. Similarly, “rootSource” indicates the original semantic source of the term, given as an abbreviation. The user must refer to the MRSAB RRF file (or SQL table) documentation to obtain information on the abbreviation. For keys (e.g., “relation”) with a link (another endpoint) as the value, users need to query the endpoint and understand the output’s JSON structure by referring to one or more RRF files. For some keys (e.g., parents, ancestors, etc.), the user needs to refer to the UMLS API documentation.
In either approach, users face the same challenge: the need to understand the UMLS’s native knowledge structure in order to use UMLS knowledge. The complex UMLS knowledge structure, which does not align with a healthcare standard or any standard in general, and the need for users to write complex SQL queries and custom software to parse the query results make it difficult to interact with the UMLS and use its knowledge. However, when a widely accepted standard is used to represent the UMLS knowledge, the standard normalizes the UMLS knowledge structure, hiding the discussed issues. Further, the user can write a reusable software program or use an existing library to parse the standard’s structure to use the UMLS knowledge. As UMLS is recognized internationally and used in a wide range of applications, there is a need for a terminology service that is lightweight compared to the current UMLS installation process and that expresses UMLS knowledge in an interoperable format using mature standard(s).
The emerging interoperability standard that can represent healthcare data and can also express a semantic standard’s metadata and content is Fast Healthcare Interoperability Resources (FHIR) [7]. FHIR is a new member of the HL7 family, developed to meet healthcare data interoperability requirements, and FHIR-compliant applications are relatively easy to implement and quick to deploy when compared to other existing standards. Some of the salient features of FHIR are as follows: data are captured and shared as resources, which are modular data units; integrated support for REST with well-defined guidelines [10] allows developers to manage and manipulate the resources using an API; and FHIR extensions allow experts to extend the FHIR resource to meet new requirements. Furthermore, when compared with existing standards (e.g., HL7 Clinical Document Architecture (CDA) [11], HL7 V2 messaging [11], openEHR [12], etc.), FHIR by design has the ability to represent semantic knowledge using FHIR terminology resources; other standards do not have any foundational support. FHIR’s ability to represent semantic knowledge, along with its other features and the availability of an open-source implementation of the FHIR standard, the HAPI library [13], makes the FHIR standard a prime candidate to be evaluated and used to represent UMLS knowledge. Other healthcare standards lack an open-source implementation of the standard’s specification.
The overall objective of this research is to implement a terminology service that allows users to query and easily access UMLS knowledge structured using the FHIR standard. An interoperable representation of UMLS knowledge allows any FHIR-compliant application to consume UMLS knowledge without the need to understand its native knowledge structure and inner workings. Towards this objective, this research presents the following: (1) a detailed analysis of the similarities and differences between UMLS’s knowledge structure and FHIR ConceptMap, an FHIR terminology resource; (2) a design of FHIR extensions to profile ConceptMap to align it with UMLS’s knowledge structure, which allows UMLS knowledge to be expressed as ConceptMap, together with an implementation of the extensions using the HAPI library to demonstrate the validity of the research; and (3) the design and implementation of the proposed REST-based terminology service, named UMLS FHIR, using the HAPI library, which allows users to query the desired UMLS knowledge and expresses it in an interoperable format using the profiled FHIR ConceptMap. The UMLS FHIR terminology service is available at http://umls.it.ilstu.edu/umlsfhir/, whereas the API that drives the service is available at http://umls.it.ilstu.edu/umlsfhir/fhir. Like any other API, the UMLS FHIR API endpoints can be called from any FHIR-compliant application to consume the FHIR-formatted knowledge. The terminology service presents a simple web interface, on top of the UMLS FHIR API, to query UMLS, returning the results in the FHIR format. The service acts as a simple interoperable facade hiding the discussed UMLS complexities.
2. Literature Review
Structural and semantic standards have played a crucial role in digitalizing healthcare data. A structural standard such as HL7 FHIR provides a structure to represent healthcare data, while a semantic standard such as LOINC complements the data with semantics. Separately, UMLS, a semantic terminology, and FHIR, an interoperability standard, are well-known and widely used in various healthcare research and industrial applications. For example, UMLS is extensively used in clinical NLP [14], where UMLS serves as a reference terminology to identify medical terms in clinical notes. UMLS is also used to improve itself: it serves as a reference terminology to identify new medical terms or relationships in medical corpora (e.g., journals) using existing medical terms, and the new knowledge is then added to UMLS. As new knowledge is frequently added to UMLS [19], there are multiple audits and quality checks to ensure the quality of UMLS knowledge [23]. Clinical decision systems [15] have used UMLS in the past to extract similar medical cases for clinical reference, etc. FHIR is primarily used to achieve data interoperability [7] across diverse healthcare systems, design customized Electronic Health Records (EHRs) and Personal Health Records (PHRs) [26] to meet physician and patient requirements, enable easy access to and adoption of medical vocabulary [27], create computable case forms to identify patients suitable for cancer trials [30], capture genomic data [31], etc. A detailed discussion of the potential uses of UMLS or FHIR, separately, is beyond the scope of this research article.
Currently, there are a few terminology services and servers that allow access to semantic standards to satisfy diverse healthcare needs, and only one service that uses FHIR to serve the semantic content. The NLM UMLS API, a REST-based service, serves UMLS knowledge to a registered user via multiple endpoints. As discussed (Figure 1), the barrier is that the structure of the output is tied directly to the UMLS native knowledge structure, a non-interoperable format. Before the UMLS API, NLM developed the UMLS Knowledge Source Server [32]. The authors used the innovative technology available at that time to serve UMLS knowledge via the web. The server was implemented as a collection of Java servlets. The servlets queried UMLS using the RMI API. The query results, in XML format, were returned to the servlets, which then applied XSLT stylesheets to transform those results into HTML for display purposes. The server put UMLS on the web. Metke-Jimenez et al. developed OntoServer [28]. OntoServer is a terminology server that uses FHIR terminology resources to express the semantic knowledge encapsulated by SNOMED CT terminology, LOINC codes, and select OWL ontologies, such as the Human Phenotype Ontology (HPO). Similarly, the LOINC FHIR Terminology Server [29] serves LOINC terminology codes using FHIR terminology resources. This research is developing a terminology service using FHIR to express UMLS knowledge, which encompasses SNOMED CT and LOINC along with 209 other internationally known terminologies. Jiang et al. proposed to develop a vocabulary mapping service that maps Observational Health Data Sciences and Informatics (OHDSI) vocabularies to FHIR resources [21]. However, the authors did not take the proposal beyond the conceptual idea. Currently, to the authors’ knowledge, there are no efforts that leverage the FHIR standard and its terminology resources for representing UMLS knowledge.
For the user search string “malaria”, the service returns all concepts whose name starts with “malaria” structured as an FHIR CodeSystem and wrapped in an FHIR Bundle. Figure 9 shows the output Bundle with CodeSystem in JSON format. The web application parses this JSON response to display the search results. If there are no results, the service will still return a Bundle resource, where the concept attribute within the CodeSystem resource will be empty.
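As a rough illustration of how a client might consume such a search response, the sketch below extracts concept codes and display names from a Bundle that wraps a CodeSystem. It relies only on standard FHIR element names (`entry`, `resource`, `resourceType`, `concept`, `code`, `display`); the sample payloads are hypothetical, their codes are placeholders, and the actual response of the service may carry additional elements.

```python
import json


def concepts_from_bundle(bundle):
    """Return (code, display) pairs from every CodeSystem inside a FHIR Bundle."""
    results = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "CodeSystem":
            continue
        # An absent or empty "concept" list means the search matched nothing.
        for concept in resource.get("concept", []):
            results.append((concept.get("code"), concept.get("display")))
    return results


# Hypothetical response bodies; the codes are placeholders, not real CUIs.
SAMPLE_RESPONSE = json.loads("""
{
  "resourceType": "Bundle",
  "entry": [
    {
      "resource": {
        "resourceType": "CodeSystem",
        "concept": [
          {"code": "C0000001", "display": "Malaria"},
          {"code": "C0000002", "display": "Malaria, Cerebral"}
        ]
      }
    }
  ]
}
""")

EMPTY_RESPONSE = {
    "resourceType": "Bundle",
    "entry": [{"resource": {"resourceType": "CodeSystem", "concept": []}}],
}

if __name__ == "__main__":
    print(concepts_from_bundle(SAMPLE_RESPONSE))
    print(concepts_from_bundle(EMPTY_RESPONSE))
```

Because an empty result is still a well-formed Bundle, the same code path handles both cases without special error handling.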
In the above example link, the CUI C0004238 identifies the UMLS concept, Atrial Fibrillation, and the knowledge on this concept is returned as a ConceptMap, the profiled ConceptMap (Figure 7). For explanation purposes, the output ConceptMap, in the JSON format, is divided into two figures. Figure 10 shows all the atoms from diverse semantic sources equivalent to the CUI C0004238, and Figure 11 shows its relationships. The table (left) in Figure 10 shows all atoms (63 of them) equivalent to Atrial Fibrillation and the respective ConceptMap representation (right). All the atoms from the same source and version are represented as a group (Group class, Figure 7). Each group has one source (SourceElement class, Figure 7) representing the UMLS concept and multiple targets (TargetElement class, Figure 7), each target representing an atom. For example, a group captures the mapping of six atoms from CHV (Consumer Health Vocabulary, 2011_02) to the UMLS concept with CUI C0004238. The source standards without an official URL are encoded as a URN using their abbreviation and version information within the context of UMLS. As discussed in the gap analysis, an “equal” value for the equivalence attribute says that the atoms from diverse sources are equal to the UMLS concept. Figure 11 shows the relationships of the concept Atrial Fibrillation. Like Figure 10, a group aggregates the relationships from the same source and version (not shown in the table due to space constraints). For each group, the source is Atrial Fibrillation with CUI C0004238, and the target(s) are other UMLS concepts. The attributes mappingType (code from UMLSRelationTypeCode valueset) and mappingLabel (a string) capture the information about the relationship (Relationship class, Figure 3
When any other resources are accessed using the base URL, the FHIR server will generate a 404 error, which means the FHIR server does not support the REST operation on that resource [10]. The implemented valueset and extensions are available at the respective URLs, as shown in the figures.
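The group, source, and target layout described above for the ConceptMap output can be walked with a few lines of code. The sketch below assumes only the base FHIR ConceptMap elements (`group`, `target`, `targetVersion`, `element`, `equivalence`) and ignores the UMLS-specific extensions; the URN strings, atom codes, and atom names in the sample are hypothetical placeholders, since the exact encoding used by the service is not reproduced here.

```python
def atoms_by_source(concept_map):
    """Group atom display names by (target system, target version)."""
    grouped = {}
    for group in concept_map.get("group", []):
        key = (group.get("target"), group.get("targetVersion"))
        for element in group.get("element", []):
            for target in element.get("target", []):
                # Keep only targets declared equal to the UMLS concept.
                if target.get("equivalence") == "equal":
                    grouped.setdefault(key, []).append(target.get("display"))
    return grouped


# Hypothetical fragment: the URNs and atom codes are placeholders.
SAMPLE_CONCEPT_MAP = {
    "resourceType": "ConceptMap",
    "group": [
        {
            "source": "urn:example:umls",
            "target": "urn:example:chv",
            "targetVersion": "2011_02",
            "element": [
                {
                    "code": "C0004238",
                    "display": "Atrial Fibrillation",
                    "target": [
                        {"code": "placeholder-1", "display": "atrial fibrillation",
                         "equivalence": "equal"},
                        {"code": "placeholder-2", "display": "afib",
                         "equivalence": "equal"},
                    ],
                }
            ],
        }
    ],
}

if __name__ == "__main__":
    for (system, version), atoms in atoms_by_source(SAMPLE_CONCEPT_MAP).items():
        print(system, version, atoms)
```

A client written this way needs no knowledge of the RRF files behind the data; it depends only on the ConceptMap structure.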
Applications employ semantic standards to annotate data with semantics, which gives users (humans and machines) a uniform understanding of the data. There exist many semantic standards in the domain of biomedical and healthcare informatics that capture knowledge related to various health specializations. For example, ICD, a classification system, classifies diseases and a variety of signs, symptoms, abnormal findings, complaints, and external causes. SNOMED, a computer-processable collection of medical terms (in both human and veterinary medicine), provides codes, terms, synonyms, and definitions for diseases, findings, procedures, substances, etc. LOINC provides codes for identifying laboratory and clinical observations, and Disease Ontology is a standard ontology that represents human diseases. Generally, there is an overlap in the knowledge captured by these standards. The UMLS has successfully brought current and legacy semantic standards under one roof and integrated the collective knowledge to build a single semantic reference system. Integrating the standards into UMLS allows biomedical and healthcare knowledge to be distributed in a unified format.
However, users are required to understand UMLS’s complex knowledge structure and its inner workings, and also need strong technical skills to interact with UMLS. UMLS knowledge represented in an interoperable format would be easier to consume and understand for users, both humans and machines. Our research was motivated by the need to represent UMLS knowledge in an interoperable format using widely accepted standards, allowing easy access to the knowledge without the need to understand its inner workings. As such, the research developed a terminology server that allows easy access to UMLS knowledge structured using the FHIR standard. The UMLS FHIR terminology server, along with its API, acts as a semantic facade that hides the intricacies of UMLS. As the UMLS knowledge is structured using the FHIR standard, a widely adopted standard in the industry, users are insulated from UMLS’s native knowledge structure and changes to it. Users need to know the FHIR standard, its resources, and its features to use the terminology server. Because FHIR is popular, can represent a variety of healthcare data in an interoperable format, and is applicable across the healthcare domain, users in healthcare gain much more from learning FHIR than from learning the UMLS native knowledge structure. Further, any FHIR-compliant application can easily consume UMLS knowledge structured as FHIR resources, as the application is already capable of handling FHIR-structured data. Consequently, application developers need not write custom software modules to query UMLS and parse its knowledge for use in the application. Even with the availability of many different healthcare standards (HL7 CDA, openEHR, etc.), FHIR is the interoperability standard that has gained incredible support from researchers, healthcare organizations, governments, and healthcare application (EHR, PHR, etc.) vendors. This supports our choice of the standard and the goal of representing UMLS knowledge using FHIR.
There exist different terminology servers and services, such as the VOSER vocabulary server [38], the UMLS Knowledge Source Server [32], the UMLS API [6], the GALEN terminology server [39], and OntoServer [28], to name a few. The UMLS API and OntoServer are REST APIs, while for the others, the user needs to install and configure the software on a server. As the UMLS FHIR API is a REST-based service, applications query the endpoints programmatically, as required, and use the knowledge. As the results are formatted using FHIR resources, users can use open-source FHIR libraries, such as HAPI, to extract the knowledge from the structure. With other services, except for OntoServer, users need to understand the output structure and develop custom software modules to parse it. The FHIR standard integrates the REST specification into the standard. The FHIR specification presents detailed guidelines on how to manage FHIR resources via REST operations (read, update, delete, etc.), operation responses, naming style, operation return types, etc. Thus, any FHIR implementation (e.g., the HAPI library) must follow the guidelines, and all users share a common understanding of how to access the resources. In effect, the FHIR specification has standardized the REST API design. No other healthcare standard has this feature, so it is the responsibility of the developers to design the API specification for each such terminology server or service, which is a serious drawback. Most of the existing terminology services cause vendor lock-in. If a healthcare application currently using, for example, the UMLS API wants to move to another service, say OntoServer, then the developers must reconfigure the application to work with the new service. This can be a time-consuming and expensive process. Using a terminology service compliant with FHIR, such as the UMLS FHIR API, prevents vendor lock-in, as any new FHIR-compliant terminology service can replace the existing one. This is a great benefit to the application and its stakeholders.
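The lock-in argument can be made concrete with the FHIR read interaction, whose URL has the fixed form `[base]/[type]/[id]`. In the minimal sketch below, the first base URL is the UMLS FHIR API address given earlier; the second base URL is a hypothetical replacement server, and the use of a CUI as the resource id is an assumption made for illustration.

```python
from urllib.parse import quote


def fhir_read_url(base_url, resource_type, resource_id):
    """Build the URL of a FHIR 'read' interaction: [base]/[type]/[id]."""
    base = base_url.rstrip("/")
    return f"{base}/{quote(resource_type)}/{quote(resource_id, safe='')}"


# Base URL of the UMLS FHIR API as given in the article.
UMLS_FHIR_BASE = "http://umls.it.ilstu.edu/umlsfhir/fhir"
# Hypothetical alternative FHIR-compliant terminology server.
OTHER_FHIR_BASE = "https://terminology.example.org/fhir/"

if __name__ == "__main__":
    # Assumption: the CUI serves as the logical id of the ConceptMap.
    for base in (UMLS_FHIR_BASE, OTHER_FHIR_BASE):
        print(fhir_read_url(base, "ConceptMap", "C0004238"))
```

Only the base URL differs between the two calls; the resource type, the id placement, and the shape of the response are fixed by the FHIR specification, which is what allows one compliant service to replace another.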
Another hurdle to using a terminology is the value it provides as compared to the cost, as terminologies come with various technical challenges. One of them is terminology updates. An updated terminology allows users to access the newest semantic knowledge. As UMLS FHIR is a service, the users need not worry about UMLS terminology updates.