Adaption BERT for Medical Information Processing with ChatGPT and Contrastive Learning

School of Electronic Information, Northwestern Polytechnical University, Xi’an 710072, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(13), 2431; https://doi.org/10.3390/electronics13132431
Submission received: 27 May 2024 / Revised: 17 June 2024 / Accepted: 19 June 2024 / Published: 21 June 2024

Abstract

Calculating semantic similarity is paramount in medical information processing; it aims to assess the similarity of medical professional terminologies within medical databases. Natural language models based on Bidirectional Encoder Representations from Transformers (BERT) offer a novel approach to semantic representation for semantic similarity calculations. However, due to the specificity of medical terminologies, these models often struggle to represent semantically similar medical terms accurately, leading to inaccuracies in term representation and consequently affecting the accuracy of similarity calculations. To address this challenge, this study employs Chat Generative Pre-trained Transformer (ChatGPT) and a contrastive loss during the training phase to adapt BERT, enhancing its semantic representation capabilities and improving the accuracy of similarity calculations. Specifically, we leverage ChatGPT-3.5 to generate semantically similar texts for medical professional terminologies and incorporate them as pseudo-labels into the model training process. Subsequently, a contrastive loss is utilized to minimize the distance between relevant samples and maximize the distance between irrelevant samples, thereby enhancing the performance of medical similarity models, especially with limited training samples. Experimental validation is conducted on the open Electronic Health Record (OpenEHR) dataset, randomly divided into four groups, to verify the effectiveness of the proposed methodology.

1. Introduction

Semantic similarity computation of medical information, as one of the cores in medical information processing, aims to calculate the similarity of medical professional terms within massive datasets [1,2,3,4]. Semantic similarity computation of medical information finds essential applications in various areas, including medical information modeling [5], semantic retrieval [6], intelligent decision support [7], and medical knowledge graph construction [8].
In the modern healthcare sector, electronic health (e-health) refers to using information and communication technologies to support and enhance the delivery and management of healthcare services. e-Health encompasses various forms, including electronic health records (EHR), telemedicine, and mobile health applications. EHR is a digital system for managing and storing patient medical information. Accurate understanding and processing of medical terminology within EHR systems are essential. Methods for calculating medical semantic similarity help EHR systems analyze and compare the semantic similarity between different medical terms more precisely, thereby improving the efficiency and accuracy of medical information processing. Additionally, Digital Decision Support Systems (DDSSs) [9] highly depend on accurate and efficient medical information processing to provide reliable recommendations and insights for clinical decision-making. Medical semantic similarity calculations can enhance DDSSs by providing precise and efficient medical information processing, offering reliable advice and insights for clinical decision-making. However, due to differences in expertise and backgrounds, the representation of medical terms exhibits complexity, diversity, and dynamics, posing challenges to understanding and utilizing medical information [10,11]. For instance, in the open Electronic Health Record (OpenEHR) dataset [12], the term “care plan” has several semantically similar words, such as “Treatment plan”, “Care regimen”, and “Health management plan”. Manual comparison methods struggle to handle massive data, thus affecting the ability to identify and understand different expressions of medical professional terms. Therefore, utilizing semantic similarity calculations in medical information processing to address the complexity and diversity of terms in the medical field is crucial for improving the efficiency and accuracy of medical information processing.
In contrast to the inefficient manual comparison methods, natural language models like Bidirectional Encoder Representations from Transformers (BERT) provide new modeling approaches for semantic similarity calculations in medical information [13,14]. However, due to the specificity of medical terminology, natural language models have limitations in handling medical terms that are semantically similar, leading to inaccurate expressions of medical terms and consequently affecting the accuracy of similarity calculations [15,16,17,18,19]. Specifically, medical terms in the medical field may involve rich medical knowledge and specific contexts.
In this work, we adapt BERT during the training phase using Chat Generative Pre-trained Transformer (ChatGPT) and contrastive loss, as illustrated in Figure 1, to enhance BERT’s semantic representation capability and thereby improve the accuracy of similarity calculation. Specifically, we utilize ChatGPT-3.5 to generate semantically similar texts for each target text and incorporate them into the model training process as pseudo-labels. Subsequently, we adapt the semantic representation parameters of BERT. Finally, we employ contrastive loss to minimize the distance between relevant samples and maximize the distance between irrelevant samples, enhancing the performance of medical similarity models under limited training samples. In the experimental section, we simulate massive datasets, conduct multiple sets of experiments, and validate and evaluate the model, confirming the effectiveness of this approach in improving the accuracy of semantic similarity calculation in medical information.

2. Related Work

Semantic representation modeling is one of the main methods used to enhance the intelligence of machine language, and it is widely applied in information retrieval and natural language processing fields [1,5,6]. After pre-training on a large-scale unlabeled corpus, the BERT semantic representation model based on Transformer [21] dramatically improves the performance of information retrieval and natural language processing tasks [22,23,24].
In the medical field, due to the specificity of medical information, it is often necessary to adapt BERT to different tasks [25,26,27,28,29,30]. Suneetha et al. [31] proposed a cardiovascular disease diagnosis method based on fine-tuning BERT, providing an effective and accurate diagnosis for cardiovascular disease patients. Kim et al. [32] proposed a medical specialty prediction model for patient-side medical problem texts based on BERT, assisting doctors in making decisions. Su et al. [33] proposed pre-training and fine-tuning BERT, improving the accuracy of automatic extraction of biomedical relationships. Ding et al. [34] proposed a crop disease diagnosis model based on a bidirectional encoder representations transformer and an RCNN applicable to crop diseases, assisting plant doctors in diagnosing crop diseases. Babu et al. [35] developed a medical chatbot based on BERT, significantly enhancing healthcare communication and accessibility using advanced deep-learning techniques. Meanwhile, Chen et al. [36] utilized coronary heart disease as an example to construct a pre-trained diagnostic model for traditional Chinese medicine texts based on the BERT model, completing text classification tasks for different types of coronary heart disease medical cases. Faris et al. [37] designed a method for symptom identification and diagnosis based on BERT to assist doctors in handling consultations in multiple languages from users. BERT-based and CNN-based medical application methods are summarized in Table 1. In this paper, we utilize ChatGPT-3.5 to create a small-sample dataset and adapt the language expression parameters of BERT to enhance its ability to express semantic information in information retrieval and medical contexts.

3. Methods

3.1. Preliminary Knowledge

BERT is a widely used semantic information representation model consisting of a BertTokenizer and multiple layers of transformer encoders. Accurate representation of semantic information is crucial for diagnosis, treatment, and research in the medical field. Therefore, we adopt BERT as the base model to utilize its powerful semantic learning ability to enhance the quality of medical text representation. Firstly, we preprocess medical text using the BertTokenizer to convert it into three types of embeddings: token embedding, segment embedding, and position embedding. Token embedding captures the semantic information of each word, segment embedding distinguishes between different sentences or text segments, and position embedding encodes the positional information of words in sentences. Combining these three embedding forms provides rich semantic and positional information to the model, improving the expression capability of textual information. Next, we sum these three embedding forms to obtain the vectors E_j^1, E_j^2, …, E_j^N, which serve as the input to the transformer encoder. In the multi-layer encoder, textual information undergoes multiple self-attention and feed-forward operations, gradually transforming into high-level semantic representations. Specifically, we employ a multi-head attention mechanism to simultaneously focus on different positions and semantic features of the input sequence, thus better capturing the contextual relationships and semantic information in medical text. Through this approach, the model can comprehensively understand the text content, thereby enhancing the quality of semantic information representation. In the multi-head attention module, as shown in Figure 2, the vectors E_j^1, E_j^2, …, E_j^N undergo linear transformations to generate three matrices: Q (query), K (key), and V (value), which are then used to compute attention weights and obtain the final output:
Attention(Q, K, V) = softmax(QK^T / √d_k) V        (1)

where d_k denotes the dimension of the key vectors K.
This attention mechanism enables the model to be more flexible in handling medical text, adjusting weights based on the correlation between different words to express semantic information better. However, despite BERT being a powerful semantic information representation model, there are still limitations when dealing with medical terminologies. Due to the complex diversity of medical terms and common phenomena such as abbreviations and acronyms in medical texts, BERT may not accurately capture subtle differences between terms. Therefore, adapting BERT enhances its ability to represent medical terminologies. Precisely, we adjust the parameters of token embedding, segment embedding, and position embedding in semantic modeling to better adapt the model to the characteristics of the medical domain.
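As an illustration of the scaled dot-product attention in Equation (1), the following minimal NumPy sketch computes softmax(QK^T/√d_k)V for toy matrices; the shapes and random values are placeholders for illustration, not quantities taken from the paper.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise compatibility scores
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy Q, K, V: 5 tokens, 8-dimensional projections
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

In a multi-head module, this computation runs in parallel over several independently projected (Q, K, V) triples whose outputs are concatenated.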

3.2. Overview

To address the challenge posed by the diversity of medical terminology and contextual differences in processing medical information, we adapt BERT using ChatGPT and contrastive loss. This adjustment aims to enhance the accuracy of semantic similarity recognition of medical information under similar semantic bases. As illustrated in Figure 1, we first utilize ChatGPT-3.5 to generate pseudo-labels for the OpenEHR dataset, creating the OpenEHR-S dataset with similar semantic expressions. Subsequently, the BERT and BERT-S models separately process the OpenEHR and OpenEHR-S datasets to obtain output vectors T_j^1, T_j^2, …, T_j^N and T′_j^1, T′_j^2, …, T′_j^N, respectively, where j = 1, 2, …, M and M denotes the batch size. Then, the vectors T_j^1, T_j^2, …, T_j^N and T′_j^1, T′_j^2, …, T′_j^N are averaged to obtain text vector representations O_j and O′_j for the OpenEHR and OpenEHR-S datasets, respectively. Finally, we compute the text similarity s_j and utilize contrastive loss to minimize the distance between relevant samples in the OpenEHR and OpenEHR-S datasets while maximizing the distance between irrelevant samples.
Specifically, we create a dataset with semantics similar to medical professional terminologies as pseudo-labels. We utilize the large-scale language model ChatGPT-3.5 to generate a dataset of medical professional terminologies with semantics similar to the OpenEHR dataset, serving as pseudo-labels GT_j for both the training and testing phases. The true label is designated as 1, while the erroneous label is set as 0. Subsequently, to process the medical professional terminology texts from OpenEHR, we employ the BertTokenizer for tokenization. The sentences are segmented into appropriate words or subword units, and special tokens (such as [CLS] and [SEP]) are added. The text is then encoded into a numerical sequence E_j^1, E_j^2, …, E_j^N, serving as the input for BERT. Next, the numerical sequence E_j^1, E_j^2, …, E_j^N is processed through the encoder of the transformer to compute the output T_j^1, T_j^2, …, T_j^N of BERT. Finally, the medical text information O_j is obtained by averaging the output T_j^1, T_j^2, …, T_j^N:
O_j = (1/N) Σ_{i=1}^{N} T_j^i        (2)
It is worth noting that, to learn similar medical knowledge information during training, the embeddings in BERT are involved in training, while all other parameters are frozen.
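The freezing strategy described above (update only the embeddings, freeze every other parameter) can be sketched in PyTorch as follows. `TinyBert` is a stand-in module whose submodule names mimic the top-level layout of a Hugging Face `BertModel` (an `embeddings` block plus an `encoder` block); it is an assumption for illustration, not the actual model used in the paper.

```python
import torch.nn as nn

class TinyBert(nn.Module):
    """Minimal stand-in with BertModel-like top-level submodule names."""
    def __init__(self):
        super().__init__()
        self.embeddings = nn.Embedding(100, 16)  # trainable part
        self.encoder = nn.Linear(16, 16)         # frozen part

def freeze_all_but_embeddings(model: nn.Module):
    # Only parameters under the `embeddings` submodule keep gradients.
    for name, p in model.named_parameters():
        p.requires_grad = name.startswith("embeddings")

model = TinyBert()
freeze_all_but_embeddings(model)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

With a real BERT checkpoint, the same loop would leave only the token, segment, and position embedding tables trainable.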
Processing pseudo-labels for the OpenEHR-S dataset follows a procedure similar to that of the OpenEHR dataset. Firstly, the pseudo-labels are tokenized using the BertTokenizer to obtain E′_j^1, E′_j^2, …, E′_j^N, which serve as the input for BERT-S. Subsequently, E′_j^1, E′_j^2, …, E′_j^N is passed through the encoder of the Transformer architecture to compute the output of BERT-S, denoted as T′_j^1, T′_j^2, …, T′_j^N. Finally, the medical text information O′_j is obtained using Formula (2). It is worth noting that during the computation process, all parameters of BERT-S are frozen and do not participate in training. The similarity s_j can be calculated using the following formula:
s_j = (O_j · O′_j) / (‖O_j‖ ‖O′_j‖)        (3)
Furthermore, we present the algorithm for calculating the similarity score between two individual medical terms, as shown in Algorithm 1.
Algorithm 1 Similarity score s calculation process
Input: a medical term text1, a medical term text2
Output: similarity score s between text1 and text2
1: Tokenize text1 using BertTokenizer: E_1, E_2, …, E_N = BertTokenizer(text1)
2: Tokenize text2 using BertTokenizer: E′_1, E′_2, …, E′_N = BertTokenizer(text2)
3: Compute BERT embeddings for text1: T_1, T_2, …, T_N = BERT(E_1, E_2, …, E_N)
4: Compute BERT-S embeddings for text2: T′_1, T′_2, …, T′_N = BERT-S(E′_1, E′_2, …, E′_N)
5: Compute the text vector representations using Equation (2): O ← Equation (2) with input T_1, T_2, …, T_N; O′ ← Equation (2) with input T′_1, T′_2, …, T′_N
6: Compute the similarity score using Equation (3): s ← Equation (3) with input O, O′
7: return s
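Steps 5 and 6 of Algorithm 1, mean pooling (Equation (2)) and cosine similarity (Equation (3)), reduce to a few lines of NumPy. In this sketch, the random arrays stand in for the final-layer token outputs of BERT and BERT-S; in the real pipeline they come from the two encoders.

```python
import numpy as np

def mean_pool(T):
    """Equation (2): average the final-layer token vectors T_1..T_N into one text vector O."""
    return T.mean(axis=0)

def cosine_similarity(O, O_prime):
    """Equation (3): cosine similarity between the two pooled text vectors."""
    return float(O @ O_prime / (np.linalg.norm(O) * np.linalg.norm(O_prime)))

# Stand-ins for BERT / BERT-S outputs: N = 7 tokens, hidden dimension 768
rng = np.random.default_rng(0)
T1 = rng.normal(size=(7, 768))
T2 = rng.normal(size=(7, 768))
s = cosine_similarity(mean_pool(T1), mean_pool(T2))
```

The score s is bounded in [-1, 1], with identical pooled vectors scoring 1.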

3.3. Pseudo-Label Generation with ChatGPT

ChatGPT [41,42,43,44] learns a language model by pre-training on large-scale textual data and can adapt to different scenarios through prompts. To enhance the semantic expression ability of BERT, we utilize ChatGPT-3.5 to generate pseudo-label texts with similar semantic expressions, as illustrated in Figure 3. Specifically, we use the prompt “Generate words with similar semantics based on the following medical phrases” for each medical professional phrase to generate candidate phrases and manually select appropriate ones as pseudo-labels. Additionally, we prioritize selecting phrases with fewer duplicate words as pseudo-labels. For example, taking “self-test result” as the input for ChatGPT-3.5, we obtain the following phrases in sequence: “Self-assessment outcome”, “Self-diagnostic findings”, “Personal evaluation outcome”, “Individual screening result”, and “Self-examination outcome”. We select “Personal evaluation outcome” as the pseudo-label.
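The selection step can be approximated programmatically. The sketch below builds the paper's prompt string and applies a hypothetical "fewest duplicate words" heuristic to the returned candidates; the real pipeline queries ChatGPT-3.5 and the final choice is made manually, so both the heuristic and the hard-coded candidate list are illustrative assumptions.

```python
def build_prompt(term: str) -> str:
    """The paper's prompt template for ChatGPT-3.5."""
    return ("Generate words with similar semantics based on the "
            f"following medical phrases: {term}")

def pick_pseudo_label(term: str, candidates: list[str]) -> str:
    """Hypothetical heuristic mirroring the manual rule: prefer the
    candidate that duplicates the fewest words of the original phrase
    (hyphens treated as word separators)."""
    src = set(term.replace("-", " ").lower().split())
    overlap = lambda c: len(src & set(c.replace("-", " ").lower().split()))
    return min(candidates, key=overlap)

# Candidates as returned for "self-test result" in the paper's example
candidates = ["Self-assessment outcome", "Self-diagnostic findings",
              "Personal evaluation outcome", "Individual screening result",
              "Self-examination outcome"]
label = pick_pseudo_label("self-test result", candidates)
```

On this example the heuristic picks “Personal evaluation outcome”, the only candidate sharing no word with the input, matching the paper's manual choice.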

3.4. Adaption BERT with Contrastive Loss

During training, we employ contrastive loss to minimize the distance between relevant samples and maximize the distance between irrelevant samples:
L(s, GT, δ) = (1/M) Σ_{j=1}^{M} [ (1 − GT_j) · s_j² + GT_j · max(0, δ − s_j)² ]        (4)
where G T j represents the true label of the jth sample, with positive samples denoted as 1 and negative samples denoted as 0. δ is the margin of the loss function, which defaults to 1 and controls the gap between the similarity score and the loss function. In summary, our proposed medical information similarity model leverages natural language models such as BERT to enhance the recognition of medical information similarity under similar semantic contexts. By adapting these models and utilizing contrastive loss to optimize similarity calculations, we aim to bridge the gap in understanding between diverse medical terminologies and contextual differences in medical information processing. This approach contributes to more accurate medical information retrieval and analysis and lays the foundation for advancing natural language processing techniques in healthcare.
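A minimal PyTorch sketch of this contrastive loss, under the assumption that the margin term penalizes positive pairs whose similarity s_j falls below δ while negative pairs are pushed toward similarity 0:

```python
import torch

def contrastive_loss(s: torch.Tensor, gt: torch.Tensor, delta: float = 1.0) -> torch.Tensor:
    """s: cosine similarities; gt: 1 for positive (relevant) pairs, 0 for negatives.
    Negatives are penalized for any similarity; positives for falling short of delta."""
    neg = (1 - gt) * s.pow(2)
    pos = gt * torch.clamp(delta - s, min=0).pow(2)
    return (neg + pos).mean()

# Toy batch: two positive pairs and two negative pairs
s = torch.tensor([0.9, 0.2, 0.8, 0.1])
gt = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = contrastive_loss(s, gt)
```

With δ = 1 (the default margin), the batch above yields ((1-0.9)² + 0.2² + (1-0.8)² + 0.1²)/4 = 0.025.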

4. Experiments

4.1. Experiment Setup

OpenEHR is an international standard and open-source specification designed for creating, storing, sharing, and exchanging electronic health records. It provides a robust framework for managing and utilizing medical terminology. Using standardized and structured data models ensures the interoperability and consistency of medical information. In this study, we leverage the strengths of OpenEHR to extract medical-specific terminology. Specifically, we collect 708 medical text information from the OpenEHR website as the dataset for this study. From these, we randomly select 40 samples for training and testing, maintaining a 1:1 ratio between training and testing samples. As shown in Table 2, we present some target text and their corresponding pseudo-labels. In addition to evaluating the testing samples, we conduct more challenging experiments. To assess the search capability for semantically similar texts, we randomly divided the remaining 668 into groups labeled as Group1, Group2, Group3, and Group4, each containing 167 negative samples. These four groups are combined with pseudo-labeled samples from the testing set, forming 187 query samples. We calculate the similarity between 20 test texts and 187 query samples, with the top five similarity scores used as the prediction results for each test text, and evaluate the model performance.
To assess the model’s search capability for similar semantic expressions, we employ Top-1 Accuracy, Mean Reciprocal Rank (MRR), Precision, Recall, F1, Area Under the Receiver Operating Characteristic Curve (AUC), and Matthews Correlation Coefficient (MCC) as evaluation metrics. Top-1 Accuracy refers to the proportion of predicted results matching the true labels among the highest similarity scores. MRR represents the average of the reciprocals of the ranks at which the correct answer first appears among the top five predictions given by the model:

MRR = (1/M) Σ_{j=1}^{M} 1/rank_j        (5)

where M represents the number of real texts and rank_j represents the position at which the correct answer first appears for the jth sample; it is used to evaluate ranking task performance. The F1 score is an evaluation metric that comprehensively considers Precision and Recall:

F1 = 2 · Precision · Recall / (Precision + Recall)        (6)
In addition, we use the AUC and MCC to evaluate our model further.
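The MRR and F1 definitions above reduce to a few lines of Python; `ranks` is assumed to hold the 1-based position at which the correct answer first appears for each test text.

```python
def mean_reciprocal_rank(ranks: list[int]) -> float:
    """MRR over 1-based ranks of the first correct answer per test text."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Example: correct answers found at ranks 1, 2, and 4 of the top-5 lists
mrr = mean_reciprocal_rank([1, 2, 4])   # (1 + 1/2 + 1/4) / 3
```

A correct answer missing from the top five would conventionally contribute 0 to the sum; that convention is an assumption, as the paper does not state it.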
We conduct our experiments in PyTorch 1.13.0 with Python 3.10.0. Both training and evaluation are performed on an NVIDIA RTX 4090 GPU. The batch size for training is set to 4, and we utilize the Adam optimizer with a learning rate of 1 × 10⁻⁴.

4.2. Method Comparison

Results on the Test Dataset: Table 3 shows the comparative results of TF-IDF [45,46], Count Vector [47], Levenshtein Distance [48], Damerau–Levenshtein Distance [49], BERT, and our model. It can be observed that BERT performs better on this test dataset compared to TF-IDF, Count Vector, Levenshtein Distance, and Damerau–Levenshtein Distance. Top 1, MRR, and F1 reach 85%, 86.7%, and 84.1%, respectively, demonstrating BERT’s excellent semantic representation ability in small-scale similarity calculation. In contrast, our model improves by 0.6% and 2.5% on MRR and F1, respectively, validating the effectiveness of our model in the semantic representation of medical vocabulary.
Results on Simulated Massive Dataset: Table 4 presents the comparative experiments of TF-IDF, Count Vector, Levenshtein Distance, Damerau–Levenshtein Distance, BERT, and our model on the four groups. It can be observed that BERT performs poorly when faced with a large number of negative samples, with Top-1 percentages of 25%, 35%, 35%, and 30% in the four groups, respectively. This indicates that BERT’s similarity calculation is insufficiently accurate under the interference of many negative samples, resulting in many erroneous detections. In contrast, our model achieves Top-1 percentages of 30%, 50%, 40%, and 30% in the four groups, respectively. Additionally, regarding the MRR metric representing retrieval capability, BERT achieves scores of 35.6%, 47.2%, 46.8%, and 39.9% on the four datasets, respectively. Compared to BERT, our model shows improvements of 4.6%, 9.5%, 1.1%, and 0.3% on the MRR metric across the four groups. This indicates that our model improves the accuracy of similarity calculation compared to BERT, validating the effectiveness of our model.
Results on International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) [50] and International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) [51] Datasets: Additionally, to further validate the effectiveness of our model, we conduct experiments on the ICD-9-CM and ICD-10-CM datasets, randomly selecting 25 nouns from each dataset as the validation dataset. Moreover, to assess the algorithmic efficiency of the improved model, we calculate the time required to process each text. It is worth noting that these datasets are not involved in the training process. The experimental results, as shown in Table 5, indicate that compared to BERT, our model achieves a 3.8% and 7.7% improvement in MRR and F1 scores, respectively, on the ICD-9-CM dataset and a 0.6% and 0.2% enhancement in MRR and AUC metrics, respectively, on the ICD-10-CM dataset. The consistency in parameters and FLOPs is attributed to our use of fine-tuning without introducing additional parameters. In conclusion, our model exhibits improvement on both the ICD-9-CM and ICD-10-CM datasets, thus affirming its efficacy.

4.3. Qualitative Analysis

Table 6 presents qualitative analysis experiments on the target texts, where the top 5 similarity scores are considered the predicted results for each target text. We observe that the BERT model predicts the highest scores for the “self-test result” and “classification of glaucoma” texts as “Personal assessment outcome” and “Glaucoma categorization”, respectively, consistent with the pseudo-labels. For the “inspection of the rectum” and “imaging examination of a placenta” texts, the highest predicted scores are “Examination of the thyroid” and “Imaging examination of a body structure”, respectively, inconsistent with the pseudo-labels. This indicates that BERT focuses more on common vocabulary (inspection) and overlooks the specificity of medical terms (placenta and rectum), especially when dealing with large datasets. In contrast, our model’s highest predicted scores for the “self-test result”, “classification of glaucoma”, and “imaging examination of a placenta” texts are consistent with the pseudo-labels. Although the highest predicted score for the “inspection of the rectum” text is inconsistent with the pseudo-label, our model improves the ranking of the pseudo-label. This validates that our approach enhances BERT’s expressive capability in terms of medical professional terms.

4.4. Ablation Studies

Contrastive Loss: Table 7 presents the results of our ablation study on four datasets. It can be observed that when employing the strategy of minimizing positive sample distance alone, there are improvements of 5% and 3.6% in the Top-1 and MRR metrics for Group 1, respectively; an increase of 2.6% and 3.3% in the MRR and F1 metrics for Group 2, respectively; no improvement in the metrics for Group 3 and a decrease in the metrics for Group 4. This suggests that the effectiveness of solely minimizing positive sample distance is insignificant, likely due to the lack of consideration for negative sample influence during positive sample learning. When solely employing the strategy of maximizing negative sample distance, there are improvements of 5% and 3.7% in the Top-1 and MRR metrics for Group 1, respectively; an increase of 15% and 9.5% in the Top-1 and MRR metrics for Group 2, respectively; a decrease in the metrics for Group 3 and no improvement in the metrics for Group 4. This indicates that the effectiveness of solely maximizing negative sample distance is similarly insignificant, likely because the distance of positive samples is not reduced during negative sample learning. When simultaneously employing both the strategies of minimizing positive sample distance and maximizing negative sample distance, there are improvements of 5%, 4.6%, and 4% in the Top-1, MRR, and F1 metrics for Group 1, respectively; an increase of 15% and 9.5% in the Top-1 and MRR metrics for Group 2, respectively; enhancements of 5% and 1.1% in the Top-1 and MRR metrics for Group 3, respectively, and an improvement of 0.3% in the MRR metric for Group 4. This validates the effectiveness of strategies for both minimizing positive sample distance and maximizing negative sample distance.
Different Loss: Table 8 illustrates the results of adjusting BERT using different loss functions. When employing the cross-entropy loss, we observe a decrease of 5 and 2.9 in the Top-1 and MRR metrics, respectively. When using the smooth cross-entropy loss, there is a decrease of 10 and 5.4 in the Top-1 and MRR metrics, respectively. This indicates that utilizing cross-entropy loss and smooth cross-entropy loss leads to the adapted BERT model being less effective in learning information from medical texts, performing worse than the original BERT model. In contrast, the contrastive loss strategy of maximizing negative and minimizing positive samples better facilitates learning semantic representation parameters. Furthermore, we compare the Contrastive Loss* without using the square term. It is observed that there is an improvement compared to BERT, although the improvement is not significant. The square function is smooth and continuous, possessing good mathematical properties. Moreover, using the square term can magnify larger errors, making the model pay more attention to larger errors during training.

4.5. Limitations

The experimental results indicate that while our model shows some improvement compared to BERT, there is still significant room for enhancement. On the one hand, this might be attributed to our relatively small training dataset, comprising only 20 pairs of similar medical domain terms. This limitation may hinder the full exploration of the specificity of medical terminologies, thus requiring further improvement in semantic expression capabilities. On the other hand, generating pseudo-labels using ChatGPT entails considerable time and effort, leading to high costs. Future research will focus on enhancing the similarity of medical domain terms.

5. Conclusions

This paper examines the challenges faced by the medical information field in the context of big data and artificial intelligence technology development, focusing on the diversity of medical professional terms, contextual differences, and the challenges posed by massive datasets to traditional manual comparison methods and the BERT model. Addressing these challenges, we propose a method for semantic similarity computation of medical information by adapting the BERT model to enhance its understanding and processing capabilities of medical professional terms. We utilize ChatGPT-3.5 to generate semantically similar texts for medical professional terms, which are then incorporated into the model training process as pseudo-labels. Subsequently, by adjusting the semantic representation parameters of the BERT model, we enhance its adaptability to semantic features specific to the medical domain. We employ contrastive loss during training to minimize the distance between relevant samples and maximize the distance between irrelevant samples, thereby improving the model’s performance under limited training samples. Through validation on test sets and simulation of massive datasets, we find that our model outperforms the baseline BERT model regarding the expressive capability of medical professional terms and the accuracy of semantic similarity computation in medical information. In future research, we plan to explore fine-tuning BERT and other Transformer-based models for applications in medical information processing, aiming to enhance decision-making processes in healthcare settings. Additionally, we aim to investigate the performance of these models in traditional Chinese medicine and their adaptability in handling medical terminologies in minority languages.

Author Contributions

Conceptualization, L.M. and Z.F.; methodology, L.M. and Z.F.; software, L.M. and Z.F.; validation, L.M., Z.F. and Q.L.; formal analysis, L.M. and Z.F.; investigation, L.M. and Z.F.; resources, L.M. and Z.F.; data curation, L.M. and Z.F.; writing—original draft preparation, L.M. and Z.F.; writing—review and editing, F.D., J.S. and C.L.; visualization, L.M. and Z.F.; supervision, L.M. and Z.F.; project administration, L.M. and Z.F.; funding acquisition, L.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China (Grant 62206221) and the Fundamental Research Funds for the Central Universities.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Abbreviations:
BERT: Bidirectional Encoder Representations from Transformers
ChatGPT: Chat Generative Pre-trained Transformer
OpenEHR: open Electronic Health Record
MRR: Mean Reciprocal Rank
AUC: Area Under the Receiver Operating Characteristic Curve
MCC: Matthews Correlation Coefficient
EHR: Electronic Health Records
ICD-9-CM: International Classification of Diseases, 9th Revision, Clinical Modification
ICD-10-CM: International Classification of Diseases, 10th Revision, Clinical Modification
DDSSs: Digital Decision Support Systems
e-health: electronic health

Notations:
OpenEHR-S: pseudo-labeled OpenEHR dataset
BERT-S: the model for processing the pseudo-labeled dataset
E: word embedding
Q: query
K: key
V: value
T: the final-layer vectors obtained after BERT processing
s: similarity score
GT: ground truth
O: the average of the final-layer vectors obtained after BERT processing
δ: the margin of the contrastive loss function

References

  1. Min, L.; Atalag, K.; Tian, Q.; Chen, Y.; Lu, X. Verifying the feasibility of implementing semantic interoperability in different countries based on the openEHR approach: Comparative study of acute coronary syndrome registries. JMIR Med. Inform. 2021, 9, e31288.
  2. Kryszyn, J.; Cywoniuk, K.; Smolik, W.T.; Wanta, D.; Wróblewski, P.; Midura, M. Performance of an openEHR based hospital information system. Int. J. Med. Inform. 2022, 162, 104757.
  3. Ferreira, D.E.; de Souza, J.M. Methodology for developing OpenEHR archetypes: A narrative literature review. J. Health Inform. 2023, 15, 53–59.
  4. Talebi, S.; Tong, E.; Li, A.; Yamin, G.; Zaharchuk, G.; Mofrad, M.R. Exploring the performance and explainability of fine-tuned BERT models for neuroradiology protocol assignment. BMC Med. Inform. Decis. Mak. 2024, 24, 40.
  5. Min, L.; Tian, Q.; Lu, X.; Duan, H. Modeling EHR with the openEHR approach: An exploratory study in China. BMC Med. Inform. Decis. Mak. 2018, 18, 1–15.
  6. Min, L.; Tian, Q.; Lu, X.; An, J.; Duan, H. An openEHR based approach to improve the semantic interoperability of clinical data registry. BMC Med. Inform. Decis. Mak. 2018, 18, 49–56.
  7. Johnson, D.; Goodman, R.; Patrinely, J.; Stone, C.; Zimmerman, E.; Donald, R.; Chang, S.; Berkowitz, S.; Finn, A.; Jahangir, E.; et al. Assessing the accuracy and reliability of AI-generated medical responses: An evaluation of the Chat-GPT model. Res. Sq. 2023, 28, rs.3.rs-2566942.
  8. Murali, L.; Gopakumar, G.; Viswanathan, D.M.; Nedungadi, P. Towards electronic health record-based medical knowledge graph construction, completion, and applications: A literature study. J. Biomed. Inform. 2023, 143, 104403.
  9. Bertl, M.; Ross, P.; Draheim, D. Systematic AI support for decision-making in the healthcare sector: Obstacles and success factors. Health Policy Technol. 2023, 12, 100748.
  10. Rossander, A.; Karlsson, D. Structure of Health Information With Different Information Models: Evaluation Study with Competency Questions. JMIR Med. Inform. 2023, 11, e46477.
  11. Pedrera-Jiménez, M.; Kalra, D.; Beale, T.; Muñoz-Carrero, A.; Serrano-Balazote, P. Can OpenEHR, ISO 13606 and HL7 FHIR work together? An agnostic perspective for the selection and application of EHR standards from Spain. Authorea Prepr. 2023. [Google Scholar]
  12. openEHR Website. Available online: https://www.openehr.org/ (accessed on 19 April 2024).
  13. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  14. Mutinda, F.W.; Yada, S.; Wakamiya, S.; Aramaki, E. Semantic textual similarity in Japanese clinical domain texts using BERT. Methods Inf. Med. 2021, 60, e56–e64. [Google Scholar] [CrossRef] [PubMed]
  15. Reese, J.T.; Danis, D.; Caufield, J.H.; Groza, T.; Casiraghi, E.; Valentini, G.; Mungall, C.J.; Robinson, P.N. On the limitations of large language models in clinical diagnosis. medRxiv 2023. [Google Scholar] [CrossRef] [PubMed]
  16. Vachatimanont, S.; Kingpetch, K. Exploring the capabilities and limitations of large language models in nuclear medicine knowledge with primary focus on GPT-3.5, GPT-4 and Google Bard. J. Med. Artif. Intell. 2024, 7. [Google Scholar] [CrossRef]
  17. Chakraborty, C.; Bhattacharya, M.; Lee, S.S. Need an AI-enabled, next-generation, advanced ChatGPT or large language models (LLMs) for error-free and accurate medical information. Ann. Biomed. Eng. 2024, 52, 134–135. [Google Scholar] [CrossRef] [PubMed]
  18. Walker, H.L.; Ghani, S.; Kuemmerli, C.; Nebiker, C.A.; Müller, B.P.; Raptis, D.A.; Staubli, S.M. Reliability of medical information provided by ChatGPT: Assessment against clinical guidelines and patient information quality instrument. J. Med. Internet Res. 2023, 25, e47479. [Google Scholar] [CrossRef] [PubMed]
  19. Cox, A.; Seth, I.; Xie, Y.; Hunter-Smith, D.J.; Rozen, W.M. Utilizing ChatGPT-4 for providing medical information on blepharoplasties to patients. Aesthetic Surg. J. 2023, 43, NP658–NP662. [Google Scholar] [CrossRef] [PubMed]
  20. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Huggingface’s transformers: State-of-the-art natural language processing. arXiv 2019, arXiv:1910.03771. [Google Scholar]
  21. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  22. Wang, J.; Huang, J.X.; Tu, X.; Wang, J.; Huang, A.J.; Laskar, M.T.R.; Bhuiyan, A. Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges. ACM Comput. Surv. 2024, 56, 1–33. [Google Scholar] [CrossRef]
  23. Rasmy, L.; Xiang, Y.; Xie, Z.; Tao, C.; Zhi, D. Med-BERT: Pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ Digit. Med. 2021, 4, 86. [Google Scholar] [CrossRef] [PubMed]
  24. Liu, N.; Hu, Q.; Xu, H.; Xu, X.; Chen, M. Med-BERT: A pretraining framework for medical records named entity recognition. IEEE Trans. Ind. Inform. 2021, 18, 5600–5608. [Google Scholar] [CrossRef]
  25. Luo, L.; Ning, J.; Zhao, Y.; Wang, Z.; Ding, Z.; Chen, P.; Fu, W.; Han, Q.; Xu, G.; Qiu, Y.; et al. Taiyi: A bilingual fine-tuned large language model for diverse biomedical tasks. J. Am. Med. Inform. Assoc. 2024, ocae037. [Google Scholar] [CrossRef] [PubMed]
  26. He, J.; Li, P.; Liu, G.; Zhao, Z.; Zhong, S. PeFoMed: Parameter Efficient Fine-tuning on Multimodal Large Language Models for Medical Visual Question Answering. arXiv 2024, arXiv:2401.02797. [Google Scholar]
  27. Shi, W.; Xu, R.; Zhuang, Y.; Yu, Y.; Wu, H.; Yang, C.; Wang, M.D. MedAdapter: Efficient Test-Time Adaptation of Large Language Models towards Medical Reasoning. arXiv 2024, arXiv:2405.03000. [Google Scholar]
  28. Muizelaar, H.; Haas, M.; van Dortmont, K.; van der Putten, P.; Spruit, M. Extracting Patient Lifestyle Characteristics from Dutch Clinical Text with BERT Models. BMC Med. Inform. Decis. Mak. 2024, 24, 151. [Google Scholar] [CrossRef] [PubMed]
  29. Kulkarni, D.; Ghosh, A.; Girdhari, A.; Liu, S.; Vance, L.A.; Unruh, M.; Sarkar, J. Enhancing pre-trained contextual embeddings with triplet loss as an effective fine-tuning method for extracting clinical features from electronic health record derived mental health clinical notes. Nat. Lang. Process. J. 2024, 6, 100045. [Google Scholar] [CrossRef]
  30. Kumar, P.S. Bridging the Knowledge Gap: Improving BERT models for answering MCQs by using Ontology-generated synthetic MCQA Dataset. In Proceedings of the International FLAIRS Conference Proceedings, Sandestin Beach, FL, USA, 19–21 May 2024; Volume 37. [Google Scholar]
  31. Suneetha, A.R.N.; Mahalngam, T. Fine tuning bert based approach for cardiovascular disease diagnosis. Int. J. Intell. Syst. Appl. Eng. 2023, 11, 59–66. [Google Scholar]
  32. Kim, Y.; Kim, J.H.; Kim, Y.M.; Song, S.; Joo, H.J. Predicting medical specialty from text based on a domain-specific pre-trained BERT. Int. J. Med. Inform. 2023, 170, 104956. [Google Scholar] [CrossRef] [PubMed]
  33. Su, P.; Vijay-Shanker, K. Investigation of improving the pre-training and fine-tuning of BERT model for biomedical relation extraction. BMC Bioinform. 2022, 23, 120. [Google Scholar] [CrossRef] [PubMed]
  34. Ding, J.; Li, B.; Xu, C.; Qiao, Y.; Zhang, L. Diagnosing crop diseases based on domain-adaptive pre-training BERT of electronic medical records. Appl. Intell. 2023, 53, 15979–15992. [Google Scholar] [CrossRef]
  35. Babu, A.; Boddu, S.B. BERT-Based Medical Chatbot: Enhancing Healthcare Communication through Natural Language Understanding. Explor. Res. Clin. Soc. Pharm. 2024, 13, 100419. [Google Scholar] [CrossRef] [PubMed]
  36. Chen, H.; Qin, D.; Zhang, X.; Zhang, H.; Liang, X. A Study on the Classification of Chinese Medicine Records Using BERT, Chest Impediment as an Example. In Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing, Guilin, China, 24–25 September 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 29–37. [Google Scholar]
  37. Faris, H.; Faris, M.; Habib, M.; Alomari, A. Automatic symptoms identification from a massive volume of unstructured medical consultations using deep neural and BERT models. Heliyon 2022, 8, e09683. [Google Scholar] [CrossRef] [PubMed]
  38. Zheng, T.; Gao, Y.; Wang, F.; Fan, C.; Fu, X.; Li, M.; Zhang, Y.; Zhang, S.; Ma, H. Detection of medical text semantic similarity based on convolutional neural network. BMC Med. Inform. Decis. Mak. 2019, 19, 1–11. [Google Scholar] [CrossRef] [PubMed]
  39. Liang, H.; Lin, K.; Zhu, S. Short text similarity hybrid algorithm for a Chinese medical intelligent question answering system. In Proceedings of the Technology-Inspired Smart Learning for Future Education: 29th National Conference on Computer Science Technology and Education, NCCSTE 2019, Kaifeng, China, 9–11 October 2019; Revised Selected Papers 29. Springer: Berlin/Heidelberg, Germany, 2020; pp. 129–142. [Google Scholar]
  40. Li, Q.; He, S. Similarity matching of medical question based on Siamese network. BMC Med. Inform. Decis. Mak. 2023, 23, 55. [Google Scholar] [CrossRef] [PubMed]
  41. Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
  42. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
  43. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://www.mikecaptain.com/resources/pdf/GPT-1.pdf (accessed on 20 May 2024).
  44. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. Openai Blog 2019, 1, 9. [Google Scholar]
  45. Dey, R.K.; Das, A.K. Modified term frequency-inverse document frequency based deep hybrid framework for sentiment analysis. Multimed. Tools Appl. 2023, 82, 32967–32990. [Google Scholar] [CrossRef]
  46. Wan, Q.; Xu, X.; Han, J. A dimensionality reduction method for large-scale group decision-making using TF-IDF feature similarity and information loss entropy. Appl. Soft Comput. 2024, 150, 111039. [Google Scholar] [CrossRef]
  47. Selva Birunda, S.; Kanniga Devi, R. A review on word embedding techniques for text classification. In Innovative Data Communication Technologies and Application: Proceedings of ICIDCA 2020; Springer: Berlin/Heidelberg, Germany, 2021; Volume 59, pp. 267–281. [Google Scholar]
  48. Berger, B.; Waterman, M.S.; Yu, Y.W. Levenshtein distance, sequence comparison and biological database search. IEEE Trans. Inf. Theory 2020, 67, 3287–3294. [Google Scholar] [CrossRef] [PubMed]
  49. Zhao, C.; Sahni, S. String correction using the Damerau–Levenshtein distance. BMC Bioinform. 2019, 20, 1–28. [Google Scholar] [CrossRef] [PubMed]
  50. ICD-9-CM Website. Available online: https://www.cms.gov/medicare/coding-billing/icd-10-codes/icd-9-cm-diagnosis-procedure-codes-abbreviated-and-full-code-titles/ (accessed on 10 June 2024).
  51. ICD-10-CM Website. Available online: https://www.cms.gov/medicare/coding-billing/icd-10-codes/2024-icd-10-cm/ (accessed on 10 June 2024).
Figure 1. Illustration of the proposed method. During training, pseudo-label texts with similar semantic expressions are generated using ChatGPT-3.5 for a given target text. Subsequently, the target text and pseudo-label texts are separately processed by BERT and BERT-S to obtain semantic representation vectors T_j1, T_j2, …, T_jN and T′_j1, T′_j2, …, T′_jN, respectively, with initialization weights provided by Hugging Face [20]. Then, the semantic representation vectors are averaged to obtain the text vector representations O_j and O′_j. Finally, the similarity score is computed using contrastive loss, which minimizes the distance between relevant samples and maximizes the distance between irrelevant samples.
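The pooling and scoring steps in Figure 1 can be sketched as follows. This is a minimal NumPy illustration with toy arrays standing in for real BERT final-layer outputs; cosine similarity is assumed as the scoring function, which is one common choice rather than necessarily the paper's exact definition of s.

```python
import numpy as np

def mean_pool(token_vectors: np.ndarray) -> np.ndarray:
    """Average the final-layer token vectors T_j1..T_jN into one text vector O_j."""
    return token_vectors.mean(axis=0)

def similarity(o_a: np.ndarray, o_b: np.ndarray) -> float:
    """Cosine similarity between two pooled text vectors."""
    return float(np.dot(o_a, o_b) / (np.linalg.norm(o_a) * np.linalg.norm(o_b)))

# Toy stand-ins for BERT final-layer outputs (N tokens x hidden size).
t_target = np.array([[1.0, 0.0], [0.0, 1.0]])   # pools to [0.5, 0.5]
t_pseudo = np.array([[2.0, 0.0], [0.0, 2.0]])   # pools to [1.0, 1.0]

o_target, o_pseudo = mean_pool(t_target), mean_pool(t_pseudo)
print(round(similarity(o_target, o_pseudo), 4))  # 1.0 (same direction)
```

In the actual pipeline, `t_target` and `t_pseudo` would come from the BERT and BERT-S encoders, and the resulting score feeds the contrastive loss during training.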
Figure 2. Encoder (a) and Multi-Head Attention (b).
Figure 3. Flowchart of Pseudo-label Generation by ChatGPT.
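The pseudo-label generation in Figure 3 boils down to prompting ChatGPT-3.5 with a target term and collecting the rephrasings it returns. The prompt template below is illustrative only; the paper does not publish its exact wording, and the function name is a hypothetical helper:

```python
def build_pseudo_label_prompt(target_text: str, n: int = 1) -> str:
    """Assemble an instruction asking a chat model for semantically similar
    rephrasings of a medical term. The wording here is an assumption, not
    the prompt actually used in the paper."""
    return (
        f"Rewrite the medical term '{target_text}' as {n} short phrase(s) "
        "with the same clinical meaning but different wording."
    )

prompt = build_pseudo_label_prompt("exclusion of pregnancy")
# The prompt would then be sent to the ChatGPT-3.5 chat-completion endpoint,
# and each returned phrase stored as a pseudo-label
# (e.g., "Negative pregnancy test", as in Table 2).
print(prompt)
```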
Table 1. Summary of BERT-Based and CNN-based Methods in Medical Applications.

| Methods | Authors | Problem | Solution |
|---|---|---|---|
| CNN-based | Zheng et al. [38] | Manually retrieving and comparing imaging and pathology reports with overlapping exam body sites is time-consuming. | A convolutional neural network model was used to calculate similarities among report pairs. |
| CNN-based | Liang et al. [39] | Calculating the semantic similarity of noisy short medical question texts for an intelligent QA system. | A shared layer-based CNN combined with TF-IDF for feature extraction and noise reduction. |
| CNN-based | Li et al. [40] | Enhancing the efficiency of online medical QA by accurately matching user questions with professional medical answers. | A bidirectional gated recurrent unit network with CNN for feature extraction to improve matching accuracy. |
| BERT-based | Suneetha et al. [31] | Improving the diagnosis of cardiovascular disease. | Fine-tuning BERT to provide an effective and accurate diagnosis. |
| BERT-based | Kim et al. [32] | Using natural language processing technology to improve outpatient diagnosis and treatment initiation efficiency and accuracy. | A medical specialty prediction model based on BERT and patient-side medical question text is used to triage outpatients. |
| BERT-based | Su et al. [33] | Improving the accuracy of automatic extraction of biomedical relationships. | Pre-training and fine-tuning BERT to improve extraction accuracy. |
| BERT-based | Ding et al. [34] | Crop disease diagnosis. | A BERT- and RCNN-based model to assist plant doctors in diagnosing crop diseases. |
| BERT-based | Babu et al. [35] | Healthcare communication and accessibility. | A BERT-based medical chatbot using advanced deep-learning techniques. |
| BERT-based | Chen et al. [36] | Text classification for coronary heart disease. | A BERT-based pre-trained diagnostic model for traditional Chinese medicine texts. |
| BERT-based | Faris et al. [37] | Symptom identification and diagnosis in multiple languages. | A BERT-based method to assist doctors in handling multilingual consultations. |
Table 2. Target Text and Pseudo-labels. The first column represents the target text of OpenEHR, and the second column represents the pseudo-labels generated using ChatGPT-3.5.

| Target Text | Pseudo-Label |
|---|---|
| exclusion of pregnancy | Negative pregnancy test |
| iciq-ui short form | Urinary Incontinence Questionnaire - Short Version |
| financial record | Financial documentation |
| visual acuity test result | Vision clarity assessment outcome |
| palpation of the uterus | Manual examination of the uterus |
| organisation | institution |
| breast-feeding summary | Lactation overview |
| oocyte and embryo assessment | Examination of oocyte and embryo development |
| examination of the cornea | Corneal examination |
| informed consent | Informed agreement |
| clinical synopsis | Clinical summary |
| housing summary | Living situation summary |
| inspection of the vagina | Vaginal examination |
| self-test result | Personal assessment outcome |
| classification of glaucoma | Glaucoma categorization |
| imaging examination of a pregnant uterus | Antenatal imaging of the uterus |
| inspection of the rectum | Rectal examination |
| imaging examination of a placenta | Placental imaging study |
| occupation record | Career profile |
| imaging examination of the scrotum | Scrotal imaging |
imaging examination of the scrotumScrotal imaging
Table 3. OpenEHR dataset comparison results.

| Models | Top 1 | MRR | Precision | Recall | F1 | AUC | MCC |
|---|---|---|---|---|---|---|---|
| TF-IDF | 0.0 | 22.8 | 75.0 | 50.0 | 60.0 | 67.5 | 13.2 |
| Count Vector | 25.0 | 77.0 | 75.0 | 50.0 | 60.0 | 66.7 | 13.2 |
| Levenshtein Distance | 35.0 | 85.8 | 75.5 | 55.0 | 63.6 | 76.1 | 15.9 |
| Damerau–Levenshtein Distance | 35.0 | 85.8 | 75.5 | 55.0 | 63.6 | 73.4 | 14.7 |
| BERT | 85.0 | 86.7 | 79.0 | 90.0 | 84.1 | 95.3 | 34.4 |
| Ours | 85.0 | 87.3 | 79.5 | 95.0 | 86.6 | 95.5 | 37.1 |
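The Levenshtein Distance baseline in Table 3 [48] can be reproduced with a short dynamic-programming routine; normalizing by the longer string's length (one common convention, assumed here) turns the edit distance into a similarity score:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize the distance into a [0, 1] similarity score."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

print(levenshtein("kitten", "sitting"))  # 3
```

The Damerau–Levenshtein variant in the same table additionally counts adjacent-character transpositions as a single edit [49].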
Table 4. Results of OpenEHR simulated massive dataset.

| Group | Models | Top 1 | MRR | Precision | Recall | F1 | AUC | MCC |
|---|---|---|---|---|---|---|---|---|
| Group1 | TF-IDF | 0.0 | 10.8 | 97.0 | 20.0 | 33.2 | 66.1 | 7.9 |
| Group1 | Count Vector | 5.0 | 15.8 | 96.8 | 15.0 | 29.8 | 66.3 | 5.6 |
| Group1 | Levenshtein Distance | 10.0 | 37.0 | 97.2 | 35.0 | 51.5 | 73.5 | 14.7 |
| Group1 | Damerau–Levenshtein Distance | 5.0 | 32.0 | 97.2 | 35.0 | 51.5 | 73.4 | 14.7 |
| Group1 | BERT | 25.0 | 35.6 | 97.4 | 55.0 | 70.3 | 94.1 | 23.8 |
| Group1 | Ours | 30.0 | 40.2 | 97.4 | 60.0 | 74.3 | 94.7 | 26.1 |
| Group2 | TF-IDF | 10.0 | 20.0 | 97.1 | 25.0 | 40.0 | 66.6 | 10.4 |
| Group2 | Count Vector | 10.0 | 22.0 | 97.0 | 20.0 | 33.2 | 66.5 | 7.9 |
| Group2 | Levenshtein Distance | 15.0 | 37.8 | 97.1 | 30.0 | 45.8 | 76.6 | 12.4 |
| Group2 | Damerau–Levenshtein Distance | 15.0 | 37.8 | 97.1 | 30.0 | 45.8 | 73.4 | 14.7 |
| Group2 | BERT | 35.0 | 47.2 | 97.5 | 70.0 | 81.5 | 94.8 | 30.6 |
| Group2 | Ours | 50.0 | 56.7 | 97.5 | 70.0 | 81.5 | 95.4 | 30.6 |
| Group3 | TF-IDF | 5.0 | 17.5 | 97.0 | 20.0 | 33.2 | 66.1 | 7.0 |
| Group3 | Count Vector | 10.0 | 20.0 | 97.0 | 20.0 | 33.2 | 66.2 | 7.9 |
| Group3 | Levenshtein Distance | 15.0 | 40.0 | 97.1 | 25.0 | 40.0 | 72.7 | 10.1 |
| Group3 | Damerau–Levenshtein Distance | 15.0 | 40.0 | 97.1 | 25.0 | 40.0 | 73.4 | 14.7 |
| Group3 | BERT | 35.0 | 46.8 | 97.6 | 70.0 | 81.5 | 95.1 | 32.9 |
| Group3 | Ours | 40.0 | 47.9 | 97.4 | 60.0 | 74.3 | 95.1 | 26.1 |
| Group4 | TF-IDF | 0.0 | 9.0 | 97.0 | 20.0 | 33.2 | 65.2 | 7.9 |
| Group4 | Count Vector | 5.0 | 14.5 | 97.1 | 15.0 | 40.0 | 65.4 | 10.1 |
| Group4 | Levenshtein Distance | 10.0 | 33.3 | 97.2 | 35.0 | 51.5 | 73.4 | 14.7 |
| Group4 | Damerau–Levenshtein Distance | 10.0 | 33.3 | 97.2 | 35.0 | 51.5 | 73.4 | 14.7 |
| Group4 | BERT | 30.0 | 39.9 | 97.4 | 60.0 | 74.3 | 94.7 | 26.1 |
| Group4 | Ours | 30.0 | 40.2 | 97.4 | 60.0 | 74.3 | 94.9 | 26.1 |
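The MRR column in Tables 3 and 4 is the Mean Reciprocal Rank: for each query term, take the reciprocal of the (1-based) rank at which the correct match appears, then average over all queries:

```python
def mean_reciprocal_rank(ranks):
    """MRR: average of 1/rank of the first correct match per query (1-based ranks)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# If the correct term is ranked 1st, 2nd and 4th for three queries:
print(round(mean_reciprocal_rank([1, 2, 4]), 4))  # 0.5833
```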
Table 5. ICD-9-CM and ICD-10-CM dataset comparison results.

| Dataset | Models | Top 1 | MRR | Precision | Recall | F1 | AUC | MCC | FLOPs | Parameters |
|---|---|---|---|---|---|---|---|---|---|---|
| ICD-10-CM | BERT | 80.0 | 86.7 | 83.7 | 96.0 | 89.4 | 98.0 | 38.8 | 1.19 G | 104.4 M |
| ICD-10-CM | Ours | 80.0 | 87.3 | 83.7 | 96.0 | 89.4 | 98.2 | 38.8 | 1.19 G | 104.4 M |
| ICD-9-CM | BERT | 40.0 | 46.1 | 81.1 | 64.0 | 68.9 | 79.2 | 22.5 | 1.19 G | 104.4 M |
| ICD-9-CM | Ours | 36.0 | 49.9 | 81.8 | 72.0 | 76.6 | 82.5 | 26.5 | 1.19 G | 104.4 M |
Table 6. Qualitative Analysis Experiment on Target Texts, Considering the Top Five Similarity Scores as Predicted Results. The first column represents the target text, the second column represents the pseudo label, the third column represents BERT’s predicted results, and the fourth column represents our model’s predicted results.

| Target Text | Pseudo Label | BERT Prediction (Top 5) | Our Prediction (Top 5) |
|---|---|---|---|
| self-test result | Personal assessment outcome | Personal assessment outcome; Test sample; Pregnancy test result; Specimen measurements; Examination findings-lens | Personal assessment outcome; Health assessment questionnaire; Neonatal assessment score; Exposure screening questionnaire; Acquisition details on visual field test |
| classification of glaucoma | Glaucoma categorization | Glaucoma categorization; Examination of a tympanic membrane; Examination of the respiratory system; Fundoscopic examination of eyes; Scrotal imaging study | Glaucoma categorization; Examination of the respiratory system; Jugular venous pressure; Scrotal imaging study; Fundoscopic examination of eyes |
| inspection of the rectum | Rectal examination | Examination of the thyroid; Examination of a breast; Palpation of the prostate; Examination of the respiratory system; Rectal examination | Examination of the thyroid; Examination of a breast; Rectal examination; Examination of the respiratory system; Palpation of the prostate |
| imaging examination of a placenta | Placental imaging study | Imaging examination of a body structure; Placental imaging study; Obstetric ultrasound scan; Scrotal imaging study; Ophthalmic tomography examination | Placental imaging study; Imaging examination of a body structure; Scrotal imaging study; Obstetric ultrasound scan; Ophthalmic tomography examination |
Table 7. Ablation experiments on the contrastive loss strategies of minimizing positive samples and maximizing negative samples. Min Positive denotes the strategy of minimizing positive samples, and Max Negative denotes the strategy of maximizing negative samples.

| Group | Min Positive | Max Negative | Top 1 | MRR | Precision | Recall | F1 | AUC | MCC |
|---|---|---|---|---|---|---|---|---|---|
| Group1 | | | 25.0 | 35.6 | 97.4 | 55.0 | 70.3 | 94.1 | 23.8 |
| Group1 | ✓ | | 30.0 | 39.2 | 97.4 | 55.0 | 70.3 | 94.6 | 23.8 |
| Group1 | | ✓ | 30.0 | 39.3 | 97.4 | 55.0 | 70.3 | 94.5 | 23.8 |
| Group1 | ✓ | ✓ | 30.0 | 40.2 | 97.4 | 60.0 | 74.3 | 94.7 | 26.1 |
| Group2 | | | 35.0 | 47.2 | 97.5 | 70.0 | 81.5 | 94.8 | 30.6 |
| Group2 | ✓ | | 35.0 | 49.8 | 97.6 | 75.0 | 84.8 | 95.4 | 32.9 |
| Group2 | | ✓ | 50.0 | 53.7 | 97.5 | 70.0 | 81.5 | 95.1 | 30.6 |
| Group2 | ✓ | ✓ | 50.0 | 56.7 | 97.5 | 70.0 | 81.5 | 95.4 | 30.6 |
| Group3 | | | 35.0 | 46.8 | 97.6 | 70.0 | 81.5 | 95.1 | 32.9 |
| Group3 | ✓ | | 35.0 | 45.6 | 97.5 | 65.0 | 78.0 | 95.3 | 28.3 |
| Group3 | | ✓ | 35.0 | 45.6 | 97.4 | 60.0 | 74.3 | 94.8 | 26.1 |
| Group3 | ✓ | ✓ | 40.0 | 47.9 | 97.4 | 60.0 | 74.3 | 95.1 | 26.1 |
| Group4 | | | 30.0 | 39.9 | 97.4 | 60.0 | 74.3 | 94.7 | 26.1 |
| Group4 | ✓ | | 25.0 | 35.6 | 97.5 | 65.0 | 70.3 | 94.9 | 26.1 |
| Group4 | | ✓ | 30.0 | 39.9 | 97.4 | 60.0 | 74.3 | 94.9 | 26.1 |
| Group4 | ✓ | ✓ | 30.0 | 40.2 | 97.4 | 60.0 | 74.3 | 94.9 | 26.1 |
Table 8. Adaption BERT with different Loss. Contrastive Loss* indicates that the square is not used.

| Models | Top 1 | MRR | Precision | Recall | F1 | AUC | MCC |
|---|---|---|---|---|---|---|---|
| Baseline | 85.0 | 86.7 | 79.0 | 90.0 | 84.1 | 95.3 | 34.4 |
| Cross Entropy Loss | 80.0 | 83.8 | 79.0 | 90.0 | 84.1 | 94.4 | 34.4 |
| Smooth Cross Entropy Loss | 75.0 | 81.3 | 79.0 | 90.0 | 84.1 | 94.2 | 34.4 |
| Contrastive Loss* | 85.0 | 86.7 | 79.0 | 90.0 | 84.1 | 95.3 | 34.4 |
| Contrastive Loss | 85.0 | 87.3 | 79.5 | 95.0 | 86.6 | 95.5 | 37.1 |
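The squared and unsquared contrastive losses compared in Table 8 can be sketched with a standard margin-based formulation (assumed here; the paper's exact variant is defined in the methods section). Relevant pairs are pulled together (the Min Positive term of Table 7) and irrelevant pairs are pushed at least a margin δ apart (the Max Negative term):

```python
import numpy as np

def contrastive_loss(o_a, o_b, label, margin=1.0, squared=True):
    """Pairwise contrastive loss on pooled text vectors.
    label=1: relevant pair, penalize distance (Min Positive).
    label=0: irrelevant pair, penalize being closer than `margin` (Max Negative).
    squared=False corresponds to the 'Contrastive Loss*' variant without the square."""
    d = np.linalg.norm(np.asarray(o_a) - np.asarray(o_b))
    pos = d ** 2 if squared else d
    hinge = max(0.0, margin - d)
    neg = hinge ** 2 if squared else hinge
    return label * pos + (1 - label) * neg

# A relevant pair and an irrelevant pair, both at distance 0.5 with margin 1.0:
print(contrastive_loss([0.0], [0.5], label=1))  # 0.25
print(contrastive_loss([0.0], [0.5], label=0))  # 0.25 (hinge 0.5, squared)
```

Note how an irrelevant pair already farther apart than the margin contributes zero loss, so training effort concentrates on confusable negatives.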
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Min, L.; Fan, Z.; Dou, F.; Sun, J.; Luo, C.; Lv, Q. Adaption BERT for Medical Information Processing with ChatGPT and Contrastive Learning. Electronics 2024, 13, 2431. https://doi.org/10.3390/electronics13132431
