Journal of Personalized Medicine
  • Article
  • Open Access

10 May 2022

A Novel Patient Similarity Network (PSN) Framework Based on Multi-Model Deep Learning for Precision Medicine

1. Department of Information Systems and Security, College of Information Technology, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
2. Department of Computer Science and Software Engineering, Concordia University, Montreal, QC H3G 1M8, Canada
3. Department of Epidemiology and Public Health, College of Medicine and Health Sciences, Khalifa University, Abu Dhabi P.O. Box 17666, United Arab Emirates
4. Institute of Public Health, College of Medicine and Health Sciences, UAE University, Al Ain P.O. Box 15551, United Arab Emirates
This article belongs to the Topic Big Data in Healthcare, Bioinformatics and Precision Medicine

Abstract

Precision medicine involves comparing a new patient with existing patients that have similar characteristics, an approach referred to as patient similarity. Several deep learning models have been used to build and apply patient similarity networks (PSNs). However, the challenges related to data heterogeneity and dimensionality make it difficult for a single model to reduce data dimensionality and capture the features of diverse data types. In this paper, we propose a multi-model PSN that considers heterogeneous static and dynamic data. The combination of deep learning models and PSNs yields ample clinical evidence and information extraction against which similar patients can be compared. We use bidirectional encoder representations from transformers (BERT) to analyze the contextual data and generate word embeddings, where semantic features are captured using a convolutional neural network (CNN). Dynamic data are analyzed using a long short-term memory (LSTM)-based autoencoder, which reduces data dimensionality and preserves the temporal features of the data. We propose a data fusion approach combining temporal and clinical narrative data to estimate patient similarity. Our experiments demonstrate that the model provides higher classification accuracy in determining various patient health outcomes than other traditional classification algorithms.

1. Introduction

A “one-size-fits-all” approach to medicine is unreliable, since some therapies work better in some individuals than in others. Precision medicine, a recent and innovative approach, considers individual differences in people’s genes, environmental contexts, and lifestyles. The precision medicine initiative, launched by President Obama in 2015 [1], empowers people to invest in and manage their health through tailored healthcare. Often, individuals seek examples from others in similar situations to make decisions regarding various life-related matters. For instance, students make academic and career plans by seeking guidance from seniors who have made similar choices and have experienced the same path previously. Physicians take inputs, learn, and adapt based on their previous experience in handling various cases [2]. Similarly, patients seek guidance, recommendations, and medical treatments informed by patients suffering from similar health conditions. Patient-friendly social websites, such as PatientsLikeMe [3], are platforms on which people with every type of condition share their health experiences, find similar patients, learn how to take control of their health, and participate in their health management. These websites enable information sharing between patients and the provision of advice from healthcare workers. As a result, patient care is improved, and realistic medical research is accelerated.
Patient similarity analysis [4] aims to classify patients into medically relevant clusters to gain insight into underlying disease mechanisms. Common disease trajectories leading to specific outcomes can be established by clustering patient journeys, which involve the entire timeline of medical services and events from admission to discharge or death. This is based on the premise that insights gained using prediction models trained on similar patients’ data are more dependable than those obtained using all available data. The patient similarity network (PSN) model makes it possible for classifiers to be accurate and generalizable; furthermore, it provides classifiers with the ability to incorporate heterogeneous data and manage missing information naturally [5]. PSNs handle heterogeneous data by converting each data type to a similarity network and then integrating/aggregating these networks into one similarity network using, for instance, a fusion algorithm. Moreover, a PSN surpasses other classification and clustering algorithms in handling missing data because, if patient data are missing for one network, the existing data can still be used in another network. Additionally, techniques for deep network embedding, graph neural networks, and ordinary neural differential equation models can be implemented using graph analytics algorithms [6]. These approaches are predominantly used with the multimodal patient data associated with the predictive modeling of health hazards and the subtyping of diseases. In precision medicine, patient similarity analysis can be used to improve patient outcome prediction and is likely to contribute to clinical decision making.
The PSN is a new trend under the umbrella of precision medicine, in which patients are clustered or classified based on their similarities across various features. The theory behind patient case similarity can be explained with the following example: if two patients are similar across several aspects, their medical case progressions are also likely to be similar. Therefore, identifying past patients similar to the current patient could provide insights for disease investigations and potential treatments. Thus, the objective of PSNs is to recommend the appropriate therapy, medicine, and lifestyle changes to the current patient based on relevant data extracted from similar patients, thereby determining the possible clinical outcomes [5].
In this system, each input patient data feature is represented as a patient similarity network (PSN) [7]. Each PSN node is an individual patient, and an edge between two patients corresponds to their pairwise similarity. Using a similarity measure, PSNs can be generated from any available data. In supervised deep learning (DL)-based patient similarity [8], patient pairs are represented by embedding matrices (Ea and Eb) that pass through convolutional filters and are mapped onto feature maps to train the neural network (Figure 1). Deep embedded patient representations (Pa and Pb) are created by pooling the patient feature maps into intermediate vectors. A symmetrical similarity matrix M with feature vectors is learned to calculate the similarity between patients a and b.
Figure 1. Supervised patient similarity matching framework.
The remainder of this paper is organized as follows. First, we review the existing PSN literature and identify open challenges. Next, we propose a hybrid model for PSNs, present its formulation, and establish the model using the presented algorithms. Subsequently, we detail our experimental scenarios and discuss the results. Finally, we discuss directions for future work and conclude the paper.

3. A Multidimensional Data Fusion Model Based on Deep Learning and PSN

In this section, we describe the proposed system architecture in which a DL-based approach was adopted for building patient similarity. We emphasize the main processes involved in implementing our solution, including the data collection phase, DL model development, training, testing, model accuracy evaluation, and diagnostic prediction and clinical recommendations.

3.1. Data Collection, Preparation, and Preprocessing

The data of each patient were characterized by demographic and clinical variables, including recorded vital signs (e.g., blood pressure and heart rate), physical exam findings, symptoms, laboratory tests, and prior medical history. Health data streams were managed using various stream preprocessing approaches, such as principal component analysis (PCA) and other data reduction techniques, and the processed stream data were then stored in databases. Various data features can be selected based on the diseases to be predicted; the stored data were queried accordingly and processed to eliminate inconsistent and redundant data. The data were then represented in a form accepted by DL algorithms (e.g., a vector or matrix).

3.2. Architecture: Component Description

Figure 2 depicts the main components of our system and the key processes involved in data collection, model construction, training, and evaluation. Proactive recommendations, such as laboratory test suggestions, medication options, and treatment propositions, will be drawn from the prediction results.
Figure 2. System architecture.

3.2.1. Deep Learning Algorithm Selection

This process involved exploring different DL algorithms and selecting the most appropriate one based on various criteria, including the type of machine learning task (supervised, unsupervised, semi-supervised, or reinforcement learning), the type of disease to be predicted, the nature of the selected features, the data size and type (discrete or time-series), and the complexity of the model. This selection can be based on previously conducted studies and a thorough comparison and benchmarking of the different DL models.

3.2.2. Model Development, Training, Prediction, and Evaluation

The model highlights the similarity-network-fusion-based aggregation referred to as the hybrid model (Figure 2). The dynamic data from the stream processing module feed the DL model, whereas the clinical static data are processed using contextualized word embeddings. The similarity distances were calculated for each patient and combined into a patient similarity score that serves to find similar patients when a new patient arrives. For prediction model evaluation, we used several performance metrics, including the root-mean-square error (RMSE) and mean absolute error (MAE).

3.2.3. Prediction and Visualization

In this module, a dashboard was designed to visualize the forecast outcomes and a collection of guidelines and clinical advice, including diagnosis, potential laboratory examinations, and drug prescriptions. A prototype of the mobile app visualization dashboard, which provides a physician’s view listing similar patients when a specific patient is selected (in this case, patient ID 5), is depicted in Figure 3. It also indicates the common symptoms experienced by similar patients with respect to cardiovascular disease (CVD) events and brain seizures.
Figure 3. Visualization dashboard—A physician’s perspective.

3.3. Architecture: Technologies, DL Platforms, and Tools

Traditionally in NLP, feature engineering techniques require considerable domain awareness and commitment in deriving meaningful features. The situation is more challenging in the healthcare domain, where clinical machine learning models must operate daily on the unstructured, high-dimensional, and fragmented data generated during hospital stays, such as clinical notes, including laboratory reports, radiology reports, and nursing, pharmacy, and physician notes. Reading numerous clinical notes is a tedious task for a physician; however, clinical notes have considerable scientific value, and tools that can automate accurate clinical forecasts are invaluable in medical practice. BERT preprocessing and training are computationally intensive processes. The authors in [46] proposed a pre-trained, fine-tuned BERT model to support researchers’ applications in different domains. Clinical BERT [46,47] is a tool for modeling clinical notes that can help medical professionals discover and forecast clinical insights. Similarly, BioBERT [48] is a pre-trained language representation model for the biomedical domain from which biomedical NLP studies may benefit. Alsentzer et al. further pre-trained BioBERT on all MIMIC III discharge summaries (DischargeBERT) [46]. BioBERT is the most similar to PubMedBERT [49], since it also pre-trains on PubMed content; however, by completing domain-specific pretraining from scratch, including the use of the PubMed vocabulary, PubMedBERT outperforms BioBERT in most tasks. BlueBERT [50] is a BERT-based model pre-trained on PubMed abstracts and MIMIC III clinical notes. RoBERTa [51] is an improved procedure for training BERT models that trains the model for longer, with bigger batches, and over more data. Biomedical ALBERT (BioALBERT) [52] is a context-dependent, fast, and effective language model trained on huge biomedical corpora to overcome the problem of limited training data. BoneBert [53] is a BERT-based labeling system trained on a dataset of 6048 X-ray radiology reports and then fine-tuned using a small collection of 4890 expert annotations. Thus, by employing a pre-trained BERT model, features can be mapped into an embedding matrix that serves as input to other classifiers. Accordingly, we propose BERT as the apt model for static data.
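As an illustration of this step, the following is a minimal Python sketch of mapping a clinical note to a fixed-size embedding with a pre-trained BERT model. It assumes the Hugging Face transformers package; the checkpoint name and the mean-pooling strategy are illustrative choices, not the exact pipeline used in this work.

# Minimal sketch: mapping a clinical note to a fixed-size embedding with a
# pre-trained BERT model. Checkpoint name and mean pooling are assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # assumed public clinical checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

note = "Patient reports fever, dry cough, and shortness of breath; history of diabetes."
inputs = tokenizer(note, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings (masking padding) into one 768-d vector that
# can be fed to a downstream classifier or similarity measure.
mask = inputs["attention_mask"].unsqueeze(-1).float()
embedding = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embedding.shape)  # torch.Size([1, 768])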
The architecture proposed in this paper (Figure 2) reveals the possibilities of big eHealth data processing technologies represented by stream ingestion platforms as well as stream and batch processing modules. This will respond to the need of handling timely inputs and provide more personalized treatment. Concerning dynamic data, healthcare professionals can utilize a data-driven approach using platforms such as Apache Kafka, a prominent stream ingestion platform, to enable them to ingest real-time health data sources from patients, such as sensors and medical devices.
Data stream processing engines, such as Spark Streaming [54], support native in-memory storage, whereas others typically do not provide their own data storage mechanisms but offer data source and sink connectors to ingestion mechanisms, such as Kinesis, Kafka, HDFS, and Cassandra. Spark Streaming can collect data streams from live sources and split the data into batches, which are further processed by the Spark engine to produce the final stream of results in batches. The resulting batches of data from stream processing, together with the output of the batch processing module using Spark MLlib or similar batch processing tools, are stored in databases and used to train the model. DL networks can use technologies such as TensorFlow, Keras, PyTorch, and other DL platforms and libraries for developing the model to calculate patient similarity scores and provide prediction and visualization regarding diagnoses, treatments, and lifestyle recommendations.
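As a minimal sketch of this ingestion path, the following PySpark Structured Streaming snippet reads a vitals stream from Kafka and writes micro-batches to storage for later batch training. The topic name, schema, and paths are assumptions for illustration only (running it also requires the Spark Kafka connector package).

# Minimal PySpark sketch: ingest a vitals stream from Kafka and persist
# micro-batches for the batch module. Topic, schema, and paths are assumed.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("vitals-stream").getOrCreate()

schema = StructType([
    StructField("patient_id", StringType()),
    StructField("sbp", DoubleType()),
    StructField("dbp", DoubleType()),
    StructField("heart_rate", DoubleType()),
    StructField("ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "vitals")        # assumed topic name
       .load())

# Parse the Kafka value payload (JSON) into typed columns.
vitals = raw.select(from_json(col("value").cast("string"), schema).alias("v")).select("v.*")

# Write micro-batches to Parquet so the batch module (e.g., Spark MLlib) can
# later train on the accumulated data.
query = (vitals.writeStream
         .format("parquet")
         .option("path", "/data/vitals")
         .option("checkpointLocation", "/data/checkpoints/vitals")
         .start())
query.awaitTermination()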
The proposed patient similarity model is a multifaceted combination, or ensemble, model. Our multidimensional model is obtained via algorithm aggregation based on SNF, in which a DL network and the contextual word embeddings of a PSN are combined. Specifically, the patient similarities in clinical diagnosis, imaging, genomics, and time-series data are considered when finding the most similar patient. Hence, the proposed model can efficiently identify similar patients with comorbidities, that is, multiple coexisting medical conditions.

4. Model Formulation

We propose a model formulation to represent patients and derive a similarity measure based on the vectors generated from medical events. We extracted a dense and lower-dimensional representation for patients from EHR data, while conserving temporal information.
To model these data, we denoted the patient set as $S = \{s_1, s_2, \ldots, s_n\}$, where $s_i$ is the vector of the $i$th patient and $n$ is the number of patients. This vector comprises a tuple of two main parts, namely, the static part $st$ and the dynamic part $d$, so that $s_i = (st_i, d_i)$. In this section, we describe static and dynamic data modeling, the similarity network, and the PSN construction algorithms.

4.1. Static Data Modeling

The static data part $St$ represents the patient’s profile information, containing age, gender, multiple laboratory test items, and multiple disease diagnoses. Further, the similarity of a few selected features, such as age, gender, and diabetes, was modeled.

4.1.1. Feature Similarity for Age

We denoted $age_i$ and $age_j$ as the ages of patients $i$ and $j$, respectively. The feature similarity $fs^1$ for age is the ratio of the smaller age to the larger age [55]:
$$fs_{i,j}^{1} = \frac{\min(age_i, age_j)}{\max(age_i, age_j)}$$

4.1.2. Feature Similarity for Gender

For the gender feature, we defined the similarity feature $fs^2$ between patients $i$ and $j$ as 1 if they have the same gender and 0 otherwise:
$$fs_{i,j}^{2} = \begin{cases} 1, & \text{if } gen_i = gen_j \\ 0, & \text{otherwise} \end{cases}$$

4.1.3. Feature Similarity for Other Static Features

Other static features included events such as a patient having a chronic disease, represented as a Boolean value. For example, for diabetes, we defined the similarity feature $fs^3$ between patients $i$ and $j$ as 1 if both patients had the same condition (either both diabetic or both nondiabetic) and 0 otherwise:
$$fs_{i,j}^{3} = \begin{cases} 1, & \text{if } diab_i = diab_j \\ 0, & \text{otherwise} \end{cases}$$

4.1.4. Global Static Patient Similarity

We calculated the global patient similarity for static features as the following weighted sum of all static feature similarities, yielding a single measure of static patient similarity (STPS) for patients $i$ and $j$. We used a weight vector $WV = \{w_1, w_2, \ldots, w_{nf}\}$, where $nf$ is the number of static features used to evaluate patient similarity, $w_k$ is the weight given to each static similarity feature $fs^k$, $w_k \in WV$, and $\sum_{k=1}^{nf} w_k = 1$:
$$STPS_{i,j} = \sum_{k=1}^{nf} w_k \, fs_{i,j}^{k}, \quad k = 1, 2, \ldots, nf$$
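The following minimal Python sketch computes the STPS defined above for the three example features (age ratio, gender match, and diabetes match); the feature values and weights are illustrative.

# Minimal sketch of the static patient similarity (STPS) defined above:
# age ratio, gender match, and a Boolean chronic-disease match, combined by
# weights that sum to 1. Feature names and example values are illustrative.
def stps(p1, p2, weights=(0.4, 0.3, 0.3)):
    fs_age = min(p1["age"], p2["age"]) / max(p1["age"], p2["age"])
    fs_gender = 1.0 if p1["gender"] == p2["gender"] else 0.0
    fs_diab = 1.0 if p1["diabetic"] == p2["diabetic"] else 0.0
    assert abs(sum(weights) - 1.0) < 1e-9  # the model requires weights summing to 1
    return weights[0] * fs_age + weights[1] * fs_gender + weights[2] * fs_diab

a = {"age": 54, "gender": "F", "diabetic": True}
b = {"age": 60, "gender": "F", "diabetic": False}
print(stps(a, b))  # 0.4*(54/60) + 0.3*1 + 0.3*0 = 0.66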

4.2. Dynamic Data Modeling

The dynamic data part $D$ was extracted from the EHR data as a time series over a patient’s $m$ visits, denoted by the sequence $D = \{PV_{d_1}, PV_{d_2}, \ldots, PV_{d_m}\}$. Each visit $PV_{d_i}$ is a high-dimensional vector whose elements are real-valued medical events; for example, a patient $p$ having a visit $PV_{d_i}$ has a vector containing all the medical events measured during this visit, such as BMI = 20.80, smoke = 0, diabetic = 0, sbp = 116, dbp = 81, and chol = 214. Arranged as a matrix, each row $i$ represents a visit $PV_{d_i}$, and each column $j$ represents a medical event $x_j \in X$, where $X$ is the set of medical events, that is, the features measured during the visits. The $(i,j)$th value was observed at time $t_i$ of visit $PV_{d_i}$ for a given patient. Because the number of visits varies across patients, the row dimension of this matrix is defined by the maximum number of visits over all patients, $dim = \max_i(m_i)$. This variable-sized data can be managed using an autoencoder based on long short-term memory (LSTM), which is detailed in the following section.

4.2.1. Long Short-Term Memory (LSTM)

LSTM [56] is a variation of deep RNNs that has been widely adopted in diverse domains, such as language modeling and speech recognition. A typical LSTM network comprises memory blocks called cells; two states, the cell state and the hidden state, are transferred to the next cell. The memory blocks are responsible for remembering information, and manipulations of this memory are achieved through three mechanisms called gates (the forget, input, and output gates). LSTM quickly learns to distinguish between two or more widely spaced instances of a given element in a series of inputs and is robust to parameters such as the learning rate and the input and output gate biases. RNNs are designed for sequential data, such as time-series, text, audio, and video data. Contrary to a standard feedforward neural network, an RNN considers both the input data at the current time step and the output of the previous time step [57]. In addition, an RNN involves cycles in which network activations from a previous time step serve as inputs to the network, affecting the predictions at the current time step and incorporating the memory of previous events. Nevertheless, standard RNNs exhibit issues such as vanishing and exploding gradients, which affect long-term dependencies [58]. LSTM overcomes the vanishing gradient problem using a forget gate that allows the error to be backpropagated through time and across layers, letting gradients flow unaffected through many time steps [59].
Choosing LSTM in our autoencoder facilitated the feature reduction process by learning from the temporal relationships among time-series features, instead of implementing a feature reduction process that flattens all the time-series features and loses the temporal information they contain. We first trained our dataset using a reconstruction autoencoder model to reduce the size from 20,680 to 4046 rows with 5D embeddings each; choosing 5D embeddings produced good accuracy results during training. The proposed model took a batch of series of patient exam records as input and output a (1 × 5) vector as the final hidden state. We used the rectified linear activation function (ReLU) in our LSTM model, and the loss values were calculated based on the mean square error (MSE). In our model, LSTM was a gated RNN whose input is the dynamic part vector $d_i \in R^S$ of the patient set $PV$.

4.2.2. Patient Visit Matrix Embedding (Data Dimension Reduction)

The dynamic data part $D$ was fed into one layer of the time-series LSTM model encoder to preserve the temporal features of the patients’ data. This layer reduced the data dimension to produce an output vector $D'$, which included embeddings of a smaller dimension $d'$ as the final hidden state. This was performed to reduce data dimensions and learn relationships among features; thus, each column was embedded in the vector space. Consequently, each visit $PV_{d_i}$ was mapped into an embedding matrix $EB_i \in R^{dim}$, where $dim < |X|$ is the embedding dimension. Using the rectified linear activation function (ReLU), the summed weighted input was transformed into an output using a formula similar to that in a previous study [12], where $W_v \in R^{dim \times |X|}$ and $b_v \in R^{dim}$ are the weight matrix and bias vector to be learned, respectively:
$$e_i = ReLU(W_v V_i + b_v)$$
$$ReLU(x) = \max(0, x)$$
This operation produced an embedding matrix $EB_i$ for each patient with a lower feature dimension than that of the original dataset.
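A minimal PyTorch sketch of such an LSTM autoencoder is shown below: the encoder’s final hidden state is the patient embedding, and a decoder reconstructs the visit sequence from it. The layer sizes and the linear output projection are illustrative assumptions rather than the exact architecture trained in this work.

# Minimal PyTorch sketch of the LSTM autoencoder used for visit-matrix
# embedding. Sizes and the output projection are illustrative assumptions.
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features=9, embed_dim=5):
        super().__init__()
        self.encoder = nn.LSTM(n_features, embed_dim, num_layers=1, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, embed_dim, num_layers=1, batch_first=True)
        self.out = nn.Linear(embed_dim, n_features)   # project back to feature space

    def forward(self, x):                    # x: (batch, n_visits, n_features)
        _, (h, _) = self.encoder(x)          # h: (1, batch, embed_dim)
        emb = h.squeeze(0)                   # the (1 x embed_dim) patient embedding
        rep = emb.unsqueeze(1).repeat(1, x.size(1), 1)  # repeat across time steps
        dec, _ = self.decoder(rep)
        return emb, self.out(dec)            # embedding and reconstruction

model = LSTMAutoencoder()
x = torch.randn(32, 7, 9)                    # 32 patients, 7 visits, 9 events each
emb, recon = model(x)
loss = nn.MSELoss()(recon, x)                # reconstruction loss, as in the paper
print(emb.shape, loss.item())                # torch.Size([32, 5])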

4.3. Similarity Network Fusion

SNF is a nonlinear computational approach for integrating and fusing different PSNs built from different datasets [18]. In our study, the static and dynamic similarity matrices were aggregated for a given dataset of patients, achieving good performance. The approach begins with the construction of a sample similarity network for each data matrix; in this work, we used the static data matrix $STM$ and the dynamic data matrix $DM$, which were formed using Algorithms 1 and 2, respectively, depicted in Section 5. We then iteratively integrated these networks using a network fusion method described as follows. First, we normalized each matrix by dividing each row element by the row sum, so that the elements of each row sum to 1:
$$w_{i,j} = \frac{m_{i,j}}{\sum_{j=1}^{n} m_{i,j}},$$
where $w_{i,j}$ is the normalized value of each element $m_{i,j}$ of the similarity matrix. Then, the normalized matrix $W$ can be symmetrized as
$$W_{Sym} = (W + W^{T})/2,$$
where $W^{T}$ denotes the transpose of $W$. The resulting matrices were defined as $STM$ and $DM$, representing the static and dynamic data similarity matrices, respectively.
Next, we used the K-nearest neighbor method to calculate the local similarity for each matrix [18]:
$$w_{i,j} = \begin{cases} \dfrac{w_{i,j}}{\sum_{y \in N} w_{i,y}}, & \text{if } j \in N \\ 0, & \text{otherwise,} \end{cases}$$
where $N$ is the set of nearest neighbors of patient $i$, of size $K$ determined by the user. Thus, the strongest links with the highest weights were selected, and the weak links in the network were eliminated to reduce noise interference. Finally, the two updated matrices, formed by calculating the local similarity using the above equation, were fed to the SNF algorithm, which iterated for a given number of iterations $T$, starting at $MP_{t=0}^{1} = STM$ and $MP_{t=0}^{2} = DM$. In general, SNF fuses the similarity networks obtained from different data types separately by aggregating their data; the resultant fused network captures the integrated information obtained from the different data sources, that is, by fusing the similarity between all patients rather than a pair of patients. However, in this paper, we used SNF to combine patient similarity matrices rather than raw data. Therefore, we modified the algorithm to aggregate the similarity values between each pair of patients into a single value in accordance with the following aggregation function based on the weighted average [60]:
$$MP_{t+1}^{1} = \left(w_{ts}\, STM + (1 - w_{ts})\, MP_{t}^{2}\right)/2,$$
$$MP_{t+1}^{2} = \left(w_{td}\, DM + (1 - w_{td})\, MP_{t}^{1}\right)/2,$$
where $w_{ts}$ and $w_{td}$ denote the weights reflecting the significance of each matrix, estimated by the user. Here, $MP_{t+1}^{1}$ is the state matrix transformed based on the $STM$ similarity matrix after $t$ iterations, and $MP_{t+1}^{2}$ is the state matrix transformed based on the $DM$ similarity matrix after $t$ iterations. In each iteration, the information of each similarity network was updated to produce two final state matrices that were integrated into the fusion similarity matrix $FM$:
$$FM = (MP_{t}^{1} + MP_{t}^{2})/2, \quad \text{where } t = T$$
This modification distinctly indicates the strength of similarity between each pair of patients and reduces the noise and interference that can be attributed to the similarity of other patients. The integrated matrix obtained from these sequential operations produced a PSN defined as a graph $G = (V, E)$, where the vertex set $V$ represents the patient set $S$ and the edges $E$ are weighted by the similarity level between patients. The edge weights were given by the $N \times N$ similarity matrix $FM$ resulting from the final iteration of the SNF algorithm, as explained earlier, where each element $w_{i,j}$ indicates the similarity level between patients $s_i$ and $s_j$. Figure 4 shows the key processes associated with building a hybrid PSN, including the static, dynamic, and fused similarity matrix constructions, as per the formal description.
Figure 4. Key processes in building a PSN.
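The following NumPy sketch illustrates the modified SNF procedure formalized above: row normalization and symmetrization, K-nearest-neighbor sparsification, and the iterative pairwise weighted-average updates. The weights, K, and T values are illustrative.

# Minimal NumPy sketch of the modified SNF step described above. Weights, K,
# and T are illustrative assumptions, not tuned values from the paper.
import numpy as np

def normalize_sym(m):
    w = m / m.sum(axis=1, keepdims=True)     # rows sum to 1
    return (w + w.T) / 2                     # symmetrize

def knn_sparsify(w, k):
    out = np.zeros_like(w)
    for i in range(w.shape[0]):
        nbrs = np.argsort(w[i])[::-1][:k]    # strongest K links for patient i
        out[i, nbrs] = w[i, nbrs] / w[i, nbrs].sum()
    return out

def fuse(stm, dm, w_ts=0.5, w_td=0.5, k=3, T=10):
    stm = knn_sparsify(normalize_sym(stm), k)
    dm = knn_sparsify(normalize_sym(dm), k)
    mp1, mp2 = stm.copy(), dm.copy()
    for _ in range(T):                       # pairwise weighted-average updates
        mp1, mp2 = ((w_ts * stm + (1 - w_ts) * mp2) / 2,
                    (w_td * dm + (1 - w_td) * mp1) / 2)
    return (mp1 + mp2) / 2                   # final fused similarity matrix FM

rng = np.random.default_rng(0)
s, d = rng.random((6, 6)), rng.random((6, 6))
print(fuse((s + s.T) / 2, (d + d.T) / 2).shape)  # (6, 6)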

5. PSN Construction Algorithms

In this section, we describe our algorithms for constructing the proposed hybrid PSN. We developed three algorithms. The first algorithm implements the procedure to generate the static similarity matrix, the second algorithm implements dynamic similarity matrix generation, and the third algorithm implements similarity matrix fusion.
Algorithm 1 outputs the STPS matrix based on the model explained in this study. The input to this algorithm is the static data part of the patient dataset, the list of selected features to be evaluated for similarity, the list of similarity utility for each selected feature, and the given weights for each similarity feature.
Algorithm 1. Static data similarity evaluation algorithm
Input:
PList, List of Patients 
SFList, List of selected features 
SUList, List of similarity utility for each feature 
weights List of weight for each feature 
Output:
SSM Static similarity matrix for all patients 
 1: procedure STATICSIMILARITYMATRIX(PList, SFList, SUList, weights)
 2:   SSM ← initializeToEmpty()
 3:   for si ← 1, N do              ▷ each patient i
 4:     for sj ← si + 1, N do       ▷ each patient j
 5:       for fk ← 1, K do          ▷ each selected feature (column)
 6:         FSscore[si, sj] ← getSimilarityScore(si, sj, SUList[fk])
 7:         SSM[si, sj] ← SSM[si, sj] + FSscore[si, sj] ∗ weights[fk]
 8:       end for
 9:     end for
 10:  end for
 11:  return SSM
 12: end procedure
Algorithm 2 applies the DL autoencoder model to generate the dynamic similarity matrix. It takes as input the dynamic data part of the patient list, denoted as $PV$; the activation function, e.g., ReLU; the number of dynamic features to be evaluated; and the output embedding dimension.
Algorithm 2. Dynamic data similarity evaluation algorithm
Input:
DPList, List of Patients with dynamic data 
ACTF, Activation function 
NF, Number of features 
NEMB Embedding dimension
Output:
DSM Dynamic similarity matrix for all patients 
 1: procedure DYNAMICSIMILARITYMATRIX(DPList, ACTF, NF, NEMB)
 2:   preprocess(DPList)
 3:   EB ← deepLearningAutoencoder(DPList, ACTF, NF, NEMB)
 4:   for si ← 1, N do              ▷ each patient i
 5:     for sj ← si + 1, N do       ▷ each patient j
 6:       DSM[si, sj] ← getSimilarityScore(EB[si], EB[sj])   ▷ e.g., Euclidean
 7:     end for
 8:   end for
 9:   return DSM
 10: end procedure
Algorithm 3 finalizes the fusion process. It takes as input the two similarity matrices, their weights wts and wtd, the number of nearest neighbors $K$, and the number of iterations $T$ required for executing the iterative SNF process. The final output is the fused patient similarity matrix described in this study.
Algorithm 3. Similarity network fusion algorithm
Input:
STM, Static similarity matrix 
DM, Dynamic similarity matrix 
T, Number of iterations to complete fusion 
K, Number of nearest neighbors 
wts, Weight for Static similarity matrix 
wtd Weight for Dynamic similarity matrix 
Output:
FPSM Fused patient similarity matrix 
 1: procedure SIMILARITYNETWORKFUSION(STM, DM, T, K, wts, wtd)
 2:   M1prev ← STM
 3:   M2prev ← DM
 4:   normalize(STM, DM)
 5:   symmetrize(STM, DM)
 6:   for si ∈ STM do               ▷ calculate local similarity for STM
 7:     neighborList ← nearestKNeighbors(si, K, STM, DM)
 8:     for sj ∈ neighborList do
 9:       STM[si, sj] ← STM[si, sj] / Σ_{i=1..K} neighborList[i]
 10:    end for
 11:  end for
 12:  for si ∈ DM do                ▷ calculate local similarity for DM
 13:    neighborList ← nearestKNeighbors(si, K, STM, DM)
 14:    for sj ∈ neighborList do
 15:      DM[si, sj] ← DM[si, sj] / Σ_{i=1..K} neighborList[i]
 16:    end for
 17:  end for
 18:  for ti ← 1, T do
 19:    M1 ← (wts × STM + (1 − wts) × M2prev) / 2
 20:    M2 ← (wtd × DM + (1 − wtd) × M1prev) / 2
 21:    M1prev ← M1
 22:    M2prev ← M2
 23:  end for
 24:  FPSM ← (M1 + M2) / 2
 25:  return FPSM
 26: end procedure

6. Experimentation and Result Discussion

In this section, we describe the experimental setup and tools, dataset, and details of the experiments, after which the obtained results will be discussed.

6.1. Experimental Setup

For our experiments, we used Google Colab notebooks with the TensorFlow DL framework; machine learning packages from Scikit-learn and SciPy; and BERT, a transformer-based machine learning technique for NLP pretraining, for our batch processing, with the configuration {“attention_probs_dropout_prob”: 0.1, “hidden_act”: “gelu”, “hidden_dropout_prob”: 0.1, “hidden_size”: 768, “initializer_range”: 0.02, “intermediate_size”: 3072, “max_position_embeddings”: 512, “num_attention_heads”: 12, “num_hidden_layers”: 12, “type_vocab_size”: 2, “vocab_size”: 28996}. We also developed an autoencoder-based DL module and performed the PSN distance computation (Figure 4). Further, we implemented the PSN construction, including the static, dynamic, and fusion matrix construction algorithms explained earlier in this study, and performed a matrix performance evaluation in Java using Apache NetBeans IDE version 12.2 from the Apache Software Foundation.

6.2. Dataset

We used two data sources throughout our experiments. (1) Dataset-1 was the epidemiological COVID-19 data [61], compiled and assembled from state, regional, and local health reports. The data are geocoded and contain symptoms, primary dates (date of onset, admission, and confirmation), chronic diseases, travel history, and admission status for multiple COVID-19 patients. We used the data collected until 30 August 2020, comprising 155 complete records after preprocessing and cleaning, each representing an individual patient case. The dataset has 33 columns with four class outcomes (death, discharged, stable, and recovered). This dataset was selected for experimenting with clinical text data; it primarily includes symptoms, chronic disease, and additional information to which NLP can be applied. (2) Dataset-2 was the Framingham offspring heart study [62], a long-term cardiovascular cohort study of the adult offspring of the original Framingham study that began in 1949 (Framingham, MA, USA). A total of 5124 individuals were recruited from 1971 to 1975 and followed up for many years to examine secular trends in cardiovascular disease and its risk factors and to investigate the association between risk factors and the incidence of cardiovascular disease, including stroke, myocardial infarction, and CVD death. Details about the Framingham offspring cohort (https://biolincc.nhlbi.gov/studies/framoffspring/ (accessed on 1 March 2022)) utilized in the research and information about all Framingham cohort studies (https://biolincc.nhlbi.nih.gov/studies/fhs/ (accessed on 1 March 2022)) are available.
We adopted this dataset in our experiment because it considers the dynamicity of patient data characteristics. Further, multiple visiting records and static features were considered for each patient to evaluate our proposed fusion algorithm.
Table 2 summarizes the principal features of the two datasets used in our experiments.
Table 2. Summary of the datasets used in our experiments.

6.3. Evaluation Criteria

In our experiments, we compared the different distance algorithms used to generate our proposed similarity matrices and selected the optimal similarity matrix. Furthermore, we compared the performance of the fused matrix with the performance of the static and dynamic data similarity matrices independently. We evaluated the similarity matrices by adopting different evaluation criteria, such as accuracy, precision, recall, and F1-score [63]. We summarized our similarity matrix model evaluation using a 2 × 2 confusion matrix that depicted all four possible outcomes: true positive (TP), false positive (FP), false negative (FN), and true negative (TN).
TP: accurate prediction of similar patients (predicted that two patients are similar and both died).
TN: accurate prediction of non-similar patients (predicted that two patients are not similar, and both have different outcomes, e.g., P1 died and P2 survived).
FN: similar patients inaccurately predicted as non-similar (predicted two patients as non-similar, but they both have similar outcomes).
FP: non-similar patients inaccurately predicted as similar patients (predicted two patients as similar, but they have different outcomes).
We adopted the following measurements to validate and compare the performance of our similarity matrices.
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} = \frac{\text{correctly predicted similar and non-similar patients}}{\text{total number of predictions}}$$
$$Recall = \frac{TP}{TP + FN} = \frac{\text{correctly predicted similar patients}}{\text{correctly predicted similar patients} + \text{similar patients incorrectly predicted as non-similar}}$$
$$Precision = \frac{TP}{TP + FP} = \frac{\text{correctly predicted similar patients}}{\text{correctly predicted similar patients} + \text{non-similar patients incorrectly predicted as similar}}$$
$$F1\ Score = \frac{2\,(Recall \times Precision)}{Recall + Precision}$$
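As a worked example of these measurements, the following Python sketch computes all four metrics from illustrative confusion-matrix counts.

# Minimal sketch computing the four reported metrics from confusion-matrix
# counts over patient-pair similarity predictions. Counts are illustrative.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    return accuracy, recall, precision, f1

print(metrics(tp=40, tn=35, fp=10, fn=15))  # (0.75, ~0.727, 0.8, ~0.762)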

6.4. Experimental Scenarios

We conducted a series of experiments to evaluate our proposed multidimensional PSN using two different datasets and adopted two principal experimental scenarios. In the first scenario, we focused on evaluating the similarity matrices generated from a mixture of numerical and textual clinical data. In the second scenario, we focused on the performance of the SNF model that aggregates the static and dynamic features of patient data. Throughout all our experiments, we compared the performance of different geometrical distance algorithms, including Euclidean, Manhattan, cosine, Chebyshev, and weighted Manhattan, for patient similarity calculations. Because the goal of any machine learning project is to construct a generic model that performs well on unseen data, we chose k-fold cross-validation [64], one of the most popular strategies extensively utilized by data scientists. We used fivefold cross-validation in our experiments, dividing the training dataset into five parts, each of which was chosen in turn as the validation dataset for testing (a minimal sketch of this scheme follows the scenario list below). The accuracy of the experiments was evaluated based on the equations in Section 6.3. Our experimental scenarios were aligned to validate the following objectives.
  • Scenario 1 evaluated the PSN model, where the data exhibited static features with a mixture of numerical and textual data.
    • ICU admission prediction for COVID-19 patients based on Dataset-1.
    • Evaluate the accuracy of the patient similarity matrix using the NLP models BERT and one-hot encoding. These models were adopted to better capture the semantics of the clinical textual data and find the most similar patient.
    • Identify the best similarity distance measurement approach among the Euclidean, Manhattan, cosine, Chebyshev, and weighted Manhattan approaches.
    • Determine the optimal weight distribution among features when using the weighted distance evaluation approach. This approach improves accuracy when giving more significance to certain features than others.
    • Evaluate the PSN model performance when applying the local similarity approach to the similarity matrix, which can limit data conflicts and improve accuracy.
  • Scenario 2 evaluated the overall performance of our proposed multidimensional model, where the dataset involved a combination of dynamic and static features.
    • Predict a CVD event in the future based on Dataset-2.
    • Build a static PSN matrix for the static portion of the data and evaluate the performance of the STPS matrix according to the evaluation criteria mentioned in this study.
    • Evaluate the performance of the autoencoder used for the dynamic portion of the patient data for data reduction, thereby compacting the input patient information into a lower-dimensional space.
    • Build and evaluate the performance of the dynamic similarity matrix.
    • Evaluate the performance of the fused similarity matrix based on our proposed SNF algorithm and confirm that our model can represent the large, heterogeneous, and dynamic contents of a dataset.
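As referenced above, the following scikit-learn sketch illustrates the fivefold cross-validation scheme used in both scenarios; the placeholder data and classifier are assumptions for illustration.

# Minimal sketch of fivefold cross-validation: the dataset is split into five
# folds, each serving once as the validation set. X, y, and the classifier
# are illustrative placeholders.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X, y = rng.random((155, 10)), rng.integers(0, 2, 155)   # placeholder data

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=1).split(X):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[val_idx], clf.predict(X[val_idx])))
print(np.mean(scores))   # mean validation accuracy across the five folds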

6.4.1. Scenario 1. PSN Evaluation on Static Data with Numerical and Textual Data

Dataset-1 was used for this scenario, wherein both numerical and clinical textual data were available. The effectiveness of the static algorithm solution and distance estimation were evaluated. Further, the classification performance was analyzed using a fivefold cross-validation method. The accuracy, recall, precision, and F1-score measures were calculated, as explained in the evaluation criteria of this study, to compare the performances of different similarity distance calculation algorithms.
1. Accuracy Measure of Patient Similarity
In this scenario, we generated numerical representations from the contextual embedding of textual clinical data via one-hot encoding and BERT. Next, we evaluated the accuracy of the resulting patient similarity matrix using different distance calculation techniques, including the Euclidean, Manhattan, cosine, Chebyshev, and weighted Manhattan approaches (Figure 5). The graphs illustrate that the Euclidean and weighted distance calculations performed better in accuracy for one-hot encoding, whereas cosine excelled when using BERT.
Figure 5. Accuracy with various distance measures (one-hot encoding and BERT).
Table 3 presents the results obtained based on the performance evaluation parameters of various distance measures used in one-hot encoding and BERT. The overall performance of BERT is slightly better than that of one-hot encoding.
Table 3. Evaluation of the PSN distance measures with one-hot encoding and BERT.
2. Weighted-Distance Accuracy Measure against Similar Patients
In this experiment, we evaluated the patient similarity matrix generated using the weighted Manhattan distance algorithm after BERT contextual encoding. We defined different weights for each feature to give certain features more significance than others; the weights were validated based on medical expertise.
We employed a weighted scoring approach [65], a prioritization framework, to prioritize the features and determine the weights for the current scenario. The sets of weights were given to the six features, namely, age, gender, symptoms, additional_information, chronic_disease_binary, and chronic_disease, as shown in Table 4.
Table 4. Weighted scoring table.
Then, we assigned scores for each feature option ranging from 1 to 4. The default weight was 1. We used the following guidelines to assign weight scores:
  • To boost the score contribution, we set the weight to higher than 1.
  • To maintain the score contribution, we set the weight to 1.
Wt1 = [1,1,3,3,3,1], Wt2 = [1,1,3,2,3,3], Wt3 = [1,1,4,3,2,2], Wt4 = [1,1,3,3,3,3], and Wt5 = [1,1,3,1,1,1] represent the sets of weights assigned to age, gender, symptoms, additional_information, chronic_disease_binary, and chronic_disease, respectively. Optimal results were obtained when the features (symptoms, additional_information, chronic_disease_binary, and chronic_disease) were assigned the higher weights of Wt4 (Figure 6).
Figure 6. Weighted accuracy based on weighted features.
3. Accuracy Measure against the Selected Percentage of Similar Patients
Our next step in the experiment was based on the strategy of using the K nearest neighbors of similar patients to calculate the local similarity for each matrix and thereby increase the prediction accuracy. The details of this approach are described in Section 4.3.
The results presented (Figure 7) show that improved outcome prediction results can be obtained by considering only similar patients. The highest accuracy of 89% could be obtained for the Manhattan approach when selecting 5% of the related patients in our training, whereas selecting the full data (100%) resulted in a mere 75% accuracy. Thus, selecting the optimum number of similar patients was crucial to improve the predictive performance and decrease the training time (a key factor when big health data are considered).
Figure 7. Accuracy with varying training data involving similar patients.

6.4.2. Scenario 2. Hybrid PSN Model Evaluation Data with Static and Dynamic Features

In this scenario, we adopted Dataset-2, a combination of static demographic data and dynamic longitudinal data spanning multiple patient visits, which is ideal for evaluating our proposed fusion model. The class attribute in this dataset was the development of CVD.
1. Static PSN Evaluation
In this experiment, we evaluated the accuracy of the STPS matrix based on different distance calculation algorithms. Table 2 presents the static features used for similarity. We evaluated the accuracy based on the different K-nearest neighbor values of similar patients. Accuracy increased when closely similar patients were selected for training the model, that is, the K-value decreased, as depicted in Figure 8. The weighted distance measurement resulted in the highest accuracy, at 83–84%, among all trials, followed by the cosine distance measure with an accuracy of 83% when considering 5% similar patients and 75% when using the full dataset. All the remaining distance measures resulted in improved results when the training data included the most similar patients.
Figure 8. Static data: accuracy in the case of similar patients.
2. Dynamic PSN Evaluation (Autoencoder)
In Dataset-2 (the CVD dataset), each patient has a different number of records representing the health measurements associated with each visit, which dictates reducing the data dimensionality to facilitate the construction of the dynamic PSN. We first trained our dataset utilizing a reconstruction autoencoder model to reduce the size from 20,680 to 4046 rows with 5D, 32D, and 64D embeddings each. Subsequently, we converted the autoencoder-generated output into a similarity matrix using one of the different distance measurement approaches. First, we split our dataset into static profile data and dynamic time-series patient visit records. Figure 9 presents the balanced distribution of the dynamic data; for example, approximately 400 patients have 2 records each and 800 patients have 7 records each.
Figure 9. Dataset 2: dynamic data distribution.
We performed a random search to fine-tune the hyperparameters of our autoencoder. We followed a simple procedure: train the model with hyperparameters chosen by intuition and experience, then try different combinations of hyperparameter values using cross-validation and measure the MSE to decide on the optimal combination.
For the embedding dimensions, we compared the accuracy using values ranging from 5 to 64. This experiment showed the best accuracy for dimension 32. Accordingly, we used an embedding dimension of 32, which increased the accuracy of the fused matrix and gave better overall results. Similarly, we compared the MSE when using different numbers of layers, ranging from 1 to 3. The results show that one hidden layer worked well for our problem; although a slight improvement was achieved with more layers, it did not justify the extra training time. In other words, more layers can perform better but are also harder to train, so we chose one layer for faster training.
In summary, the dynamic part of our data was fed into an LSTM layer. The proposed model took a batch of series of patient exam records as input and output a (1 × 32) vector as the final hidden state; the decoder took the (1 × 32) vector and passed it to an LSTM layer, which reproduced the dynamic time-series part. Figure 10 describes the architecture of the LSTM-based encoder–decoder neural network developed for data reduction. The parameters used for $LSTM(input\_size, hidden\_size, num\_layers)$ were as follows: $input\_size$, the number of expected features, was x = 9; $hidden\_size$, the number of features in the hidden state, was h = 32; and $num\_layers$, the number of recurrent layers, was 1. Additionally, the set of inputs was $(input, (h_0, c_0))$, where $input$ is a tensor of shape $(batch\_size, sequence\_length, input\_size)$ with a $batch\_size$ of 32, a $sequence\_length$ that varies with the number of rows (visits) for an individual patient, and an $input\_size$ of 9, i.e., the number of features. In our experiment, $h_0$ was a tensor of shape $(num\_layers, batch\_size, H_{out})$, where $num\_layers$ = 1, $batch\_size$ = 32, and $H_{out}$ = 32; $c_0$ was a tensor of shape $(num\_layers, batch\_size, hidden\_size)$, where $num\_layers$ = 1, $batch\_size$ = 32, and $hidden\_size$ = 32. Figure 11 illustrates the autoencoder reconstruction loss values obtained based on the MSE while generating the (1 × 32) vector embedding; the reconstruction loss decreased gradually and stabilized after approximately 3000 iterations.
Figure 10. The architecture of the data reduction autoencoder.
Figure 11. Reconstruction loss associated with an autoencoder.
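As a shape check for the configuration just described, the following PyTorch sketch instantiates the encoder LSTM with the stated parameters (padding the variable-length visit sequences to a single length here for simplicity of the sketch).

# Shape check for the described encoder: 9 input features, hidden size 32,
# one recurrent layer, batch of 32 visit sequences (padded for this sketch).
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=9, hidden_size=32, num_layers=1, batch_first=True)
x = torch.randn(32, 7, 9)                    # (batch_size, sequence_length, input_size)
h0 = torch.zeros(1, 32, 32)                  # (num_layers, batch_size, H_out)
c0 = torch.zeros(1, 32, 32)                  # (num_layers, batch_size, hidden_size)
out, (h, c) = lstm(x, (h0, c0))
print(h.shape)                               # torch.Size([1, 32, 32]): a (1 x 32) embedding per patient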
3. Fusion PSN Evaluation
In this experiment, we evaluated the performance of the resultant fused patient similarity matrix against the outcome class with respect to the different distance measurements explained in this study. Figure 12 depicts the performance of the final PSN matrix when compared with the static and dynamic similarity matrices while adopting different distance measurements. Our proposed SNF approach improved the accuracy of the final fusion patient similarity matrix when compared with the accuracies of the static and dynamic similarity matrices.
Figure 12. Accuracy of the fusion PSN.
Our experimental evaluation (Figure 12) also shows that the static PSN data provided higher accuracy than the dynamic PSN data. Here, the static data, such as gender, age, and diabetic status, featured categorical values with little variance, whereas the dynamic features included frequently changing time-variant fields, such as BMI, Chol, and LDL, and each patient had a varying number of hospital visits (Figure 9). In our view, the variance in the static and dynamic data components, as well as the differences in PSN calculation methods, such as data reduction using autoencoders in the dynamic PSN calculation, resulted in the considerable difference in accuracy. Similar studies have reported that autoencoders may reduce accuracy [66]. Another study on the performance of an autoencoder with bidirectional LSTM [67] reported that the accuracy and F1-score of the model with an autoencoder dropped by around 4% and 9%, respectively, indicating that some information is lost because the encoding process does not retain all of the information in the original data. Moreover, as per Chen [68], even with a high epoch count, the accuracy remains below the initial accuracy because encoding and decoding cause some data loss. We believe this holds true in our experiment using an autoencoder for data reduction as well, where the accuracy variation between the static and dynamic PSN data is around 1–5%.

6.4.3. Scenario 3. Benchmark to Other Classification Algorithms

Our multi-model PSN can be used for unsupervised or supervised data with high accuracy. To validate this, we selected one of the features as a labeled outcome to convert the unsupervised learning task into a supervised one. Further, we evaluated the similarity network matrices with respect to this outcome. The experimental results show that the fused similarity matrix achieves higher accuracy than either the static or the dynamic data similarity matrix evaluated independently.
Furthermore, we benchmarked our PSN model against other widely adopted classification algorithms using the CVD and COVID-19 datasets. Classification using our multi-model PSN improves accuracy compared with other baseline supervised classification models, such as Logistic Regression, Naïve Bayes, ZeroR, Decision Tree, and Random Forest. The parameters used in the chosen classification models were:
  • Naïve Bayes: {var_smoothing = 1e-09}
  • SVM: {‘SVMType’: C-SVC, ‘KernelType’: 2, ‘Degree’: 3, ‘nu’: 0.5, ‘cachesize’: 40, ‘cost’: 1, ‘eps’: 0.001, ‘loss’:0.1}
  • ZeroR: {‘batchsize’: 100, ‘useKernelEstimator’: False, ‘useSupervisedDiscretization’: False}
  • CNN: {‘layer’: 5, ‘Out’: 2, ‘gradNormThreshold’: 1.0-minimize, ‘algorithm’: STOCHASTIC_GRADIENT_DESCENT, ‘updater’: Adam, ‘biasUpdater’: Sgd, ‘weightInit’: XAVIER, ‘learningRate’: 0.001, ‘numEpochs’: 10}
  • Logistic Regression: {‘C’: 1.0, ‘dual’: False, ‘fit_intercept’: True, ‘intercept_scaling’: 1, ‘max_iter’: 100, ‘multi_class’: ‘auto’, ‘penalty’: ‘l2’, ‘solver’: ‘lbfgs’, ‘tol’: 0.0001, ‘warm_start’: False}
  • RandomTree: {‘KValue’:0, ‘minNum’: 1, ‘minVarianceProp’:0.001, ‘seed’: 1}
  • Decision Tree: {‘ccp_alpha’: 0.0, ‘criterion’: ‘gini’, ‘min_samples_leaf’: 1, ‘min_samples_split’: 2, ‘splitter’: ‘best’}
Table 5 presents the accuracy results of the different classification algorithms. On the CVD dataset, accuracy improved by 20% compared with Naïve Bayes, and a minimum improvement of 10% was observed compared with ZeroR and Decision Tree. On the COVID-19 dataset, our model achieved a 7% higher accuracy than ZeroR and Logistic Regression and an improvement of around 1–3% compared with the other models. We included a CNN model, which was the second best in accuracy for the CVD dataset at 91.2%, indicating that our proposed PSN model outperforms the neural network models as well.
Table 5. Benchmark PSN model compared to other classification algorithms.

7. Conclusions

Although data-driven prediction in personalized medicine is a developing field, the data analytics paradigm has been successfully applied in other research fields, such as personalized product recommendation in e-commerce. The PSN is a new model for integrating data to cluster patients, and it has exciting potential for personalizing and improving healthcare. Although several data mining and DL models have been used to build and apply PSNs, a single model cannot cope with the heterogeneity and large dimensionality of the data while maintaining high accuracy and preserving the veracity of the data. Therefore, in this study, we proposed a multidimensional model that captures both contextual and longitudinal data and addresses the data dimensionality problem. In this model, DL models were combined with PSNs to provide richer clinical evidence and extract relevant information based on which similar patients can be compared and explored. BERT was used for contextual data analysis and the generation of embeddings, whereas a CNN was used to capture the semantic features. In addition, an LSTM-based autoencoder was developed for data dimensionality reduction while preserving temporal features. A fusion model was developed to aggregate the results obtained from the two models and to propose more precise diagnoses and recommendations for a new patient. A set of experiments was conducted to evaluate the accuracy of our DL-based PSN fusion model. The results showed that the model provides higher classification accuracy in determining various patient health outcomes than other traditional classification algorithms.
Five potential directions are available for further improvement: (1) establish how PSNs can be applied in survival analysis and implement a cardiovascular risk calculator; (2) address scalability issues as the similarity matrices increase in size; (3) enhance the model to support values other than classes of nominal outcomes; (4) improve the model with thorough experiments, because the methodology is new; and (5) experiment with some of the BERT model variations described in Section 3.3, such as BioBERT, DischargeBERT, PubMedBERT, BlueBERT, RoBERTa, and BioALBERT.
The PSN paradigm can be used, for example, to improve patient outcomes, provide treatment or drug recommendations to new patients, predict clinical outcomes, and support clinical decisions. The trust associated with the recommendations can be considerably improved using new and continuously added data. Network-based patient similarity approaches have conceptual and technical features that are crucial for enabling precision medicine.

Author Contributions

A.N.N. conceived the main conceptual ideas related to PSN, architecture, literature, and overall implementation/execution of experimentation. H.T.E-K. contributed to the formal modeling, design, fusion algorithm, implementation of the PSN fusion components, and to the analysis of the results. M.A.S. contributed to the overall architecture of the proposed model, supervised the study, and oversaw the overall direction and planning. A.O. provided the CVD dataset and contributed to the design of the experimentation scenarios and analysis. K.K. was involved in the design, deployment, and evaluation of the dynamic PSN model. All authors contributed to the writing of the manuscript, and revision and proofreading of the final version of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Zayed Health Center at UAE University, grant number 12R005.

Institutional Review Board Statement

The study did not require ethical approval.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
