A Multimodal Deep Learning Approach to Predicting Systemic Diseases from Oral Conditions

Background: It is known that oral diseases such as periodontal (gum) disease are closely linked to various systemic diseases and disorders. Advances in deep learning have the potential to make major contributions to healthcare, particularly in domains that rely on medical imaging. Incorporating non-imaging information based on clinical and laboratory data may allow clinicians to make more comprehensive and accurate decisions. Methods: Here, we developed a multimodal deep learning method to predict systemic diseases and disorders from oral health conditions. A dual-loss autoencoder was used in the first phase to extract periodontal disease-related features from 1188 panoramic radiographs. Then, in the second phase, we fused the image features with demographic data and clinical information taken from electronic health records (EHR) to predict systemic diseases. We used receiver operating characteristic (ROC) curves and accuracy to evaluate our model. The model was further validated on an unseen test dataset. Findings: According to our findings, the top three most accurately predicted chapters, in order, are Chapters III, VI and IX. The results indicated that the proposed model could predict systemic diseases belonging to Chapters III, VI and IX, with AUC values of 0.92 (95% CI, 0.90–0.94), 0.87 (95% CI, 0.84–0.89) and 0.78 (95% CI, 0.75–0.81), respectively. To assess the robustness of the models, we performed the evaluation on the unseen test dataset for these chapters, and the results showed an accuracy of 0.88, 0.82 and 0.72 for Chapters III, VI and IX, respectively. Interpretation: The present study shows that the combination of panoramic radiographs and clinical oral features could be used to train a fusion deep learning model for predicting systemic diseases and disorders.


Introduction
Oral health, as an integral part of overall wellbeing, is an important indicator of general health and quality of life [1,2]. Periodontal (gum) disease, including gingivitis and periodontitis, is one of the major oral disease burdens worldwide, and it is characterized by dysregulated host immuno-inflammatory responses to oral dysbiotic biofilms, thereby resulting in destruction of tooth-supporting periodontal tissues [3]. Of note, a systematic review of the research activity in periodontal medicine reveals that 129 Medical Subject Headings (MeSH) terms representing 57 systemic diseases/disorders or comorbidities have been associated with periodontal disease [4]. Currently, to the best of our knowledge, there are no machine learning models attempting to identify systemic diseases on the basis of oral conditions. Here, we developed a multimodal deep learning method by fusing various features from panoramic radiographs (also known as orthopantomograms, OPG) and other oral clinical information taken from the electronic health record (EHR) to predict systemic diseases.
In recent years, there has been sharply rising interest in the advancement of artificial intelligence (AI) across all scientific disciplines. AI is a broad term that refers to a machine's ability to learn and react in ways similar to humans. AI-based systems are trained to perform a wide range of rational tasks without the need for specialized programming. These intelligent platforms are therefore commonly employed in real-world applications such as image processing, text mining and speech recognition [5,6]. Furthermore, by incorporating AI into healthcare services, we could to some extent improve clinical diagnosis, treatment planning and the development of new approaches, protocols or drugs [7]. Conceivably, AI-assisted diagnosis and treatment planning may become less expensive and more widely available in the near future. Indeed, machines can learn from large amounts of patient data to determine the fundamental characteristics of an individual patient, allowing clinicians to diagnose various diseases and disorders at an early stage for effective healthcare [8]. Notably, detection of brain tumors via MRI, retinopathy detection via eye images, cardiac health assessment via electrocardiograms and COVID-19 epidemic management and analysis are some eye-catching examples of AI in healthcare [9][10][11].
Deep learning models have shown enormous promise in medical image processing for providing clinical decision-making support for a wide range of diseases and disorders such as diabetic retinopathy, cancer, and Alzheimer's disease [12]. However, relying only on medical images is inadequate in clinical practice, and a more robust clinical decision can be made by incorporating various sources of data, such as structured laboratory results and medical history information from the EHR [13]. It has recently been demonstrated that using data fusion to train a multimodal deep learning model can enhance the accuracy and robustness of the final prediction [14]. In these models, features from medical images are fused with other characteristics to improve the performance of the model.
Medical images are usually high-dimensional data with a large number of trainable features [15]. For instance, pixels in a CT image of the chest have sub-mm resolution, resulting in an imaging dataset containing a million or more voxels. Therefore, extracting the relevant features from raw medical images is a challenging step in building an efficient multimodal model. Using manual annotation to extract features is usually costly and time-consuming. This is especially true in the field of medical imaging, where the data are sensitive and annotation requires substantial specialized domain knowledge.
Here, we used a two-phase approach to predict systemic diseases from oral conditions. In the first phase, we used a dual-loss autoencoder to extract periodontal disease-related features from 1188 oral radiographs of subjects with different periodontal conditions. The autoencoder does not require pixel-level annotation to extract the related features from the OPG radiographs. In the second phase, we fused these image features with clinical oral information taken from the EHR to train a deep neural network model for identifying potential systemic comorbidities. Notably, the multimodal model trained with the fused features outperformed those trained with only one type of feature. Our findings show that the proposed model could be used to predict systemic diseases based on oral conditions.

Dataset
The data were retrieved from the EHRs of Prince Philip Dental Hospital (PPDH) and Queen Mary Hospital (QMH). We collected the age, gender, income, teeth number, periodontal stage, and bone loss (Table 1) of the 1188 subjects (mean age of 56 years; 61% females). Mean imputation was used to replace missing values in 13 subjects whose alveolar bone level was not measurable. The Digital Imaging and Communications in Medicine (DICOM) files of individual OPGs were collected from the PPDH database. All OPG images were assessed by a single examiner (DZ), and the subjects were then assigned to four groups based on the new classification of periodontal diseases and conditions [16]. We used OPG images along with the eight demographic and clinical features to train our model. In addition, the medical records of patients were obtained from the EHR of QMH. The diseases/conditions were coded with the International Classification of Diseases (ICD)-10, and the disease profiles were then categorized into 14 Chapters (excluding K00-K14, "Diseases of oral cavity, salivary glands and jaws") (Table S1) [17]. Next, all subjects were classified into two groups according to the presence or absence of diseases/conditions in each Chapter. This study was approved by the Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster (IRB UW 16-434).
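As an illustration, the mean-imputation step for the non-measurable alveolar bone level values can be sketched as follows (a minimal sketch; the `bone_levels` list and the use of `None` for missing entries are assumptions for illustration, not the study's actual data pipeline):

```python
def mean_impute(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

# Hypothetical alveolar bone level measurements; None marks a non-measurable case.
bone_levels = [3.1, 4.2, None, 2.9]
imputed = mean_impute(bone_levels)
```

The same pattern would be applied per feature column before model training.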

Dual-Loss Autoencoder to Extract Periodontal-Related Features from OPG
In the first phase, we extracted periodontal disease-related features from OPG images using a dual-loss autoencoder (Figure 1). To train the model, we used two losses: the mean squared error (MSE) loss [18] for input image reconstruction and the cross-entropy loss [19] for predicting patient periodontal stages. We labeled the patients into two groups: 544 (45.8%) in the severe disease group (Stage III generalized and Stage IV periodontitis) and 644 (54.2%) in the counterpart group (periodontal health, gingivitis, and Stages I–III periodontitis, localized).
For a given set of input image samples x^(1), x^(2), . . . , x^(N), where x^(i) ∈ R^(W×L), the inputs are mapped by the encoder Q_φ(x) to the latent representation LR. The LR data propagate along two paths, one for reconstruction of the input image and another for classification of images based on their periodontal stages. For the reconstruction path, the latent space is fed into a decoder P_θ(LR) to reconstruct the input images using the MSE loss (Equation (1)):

L_1 = (1/N) Σ_(i=1)^N ||x^(i) − P_θ(Q_φ(x^(i)))||^2 (1)

Simultaneously, the LR array goes through a dense-layer network D_ψ(LR) and is then classified using the cross-entropy loss (Equation (2)). For the OPG images we have a label set {y_1, y_2, . . . , y_N}, where y_i ∈ {0, 1}:

L_2 = −(1/N) Σ_(i=1)^N [y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)], with ŷ_i = D_ψ(Q_φ(x^(i))) (2)

The final loss used to train the autoencoder is the sum of these two losses:

L = L_(1,P) + L_(2,D) (3)

where L_(1,P) is the MSE loss backpropagated through the decoder (P_θ) and L_(2,D) is the loss of the dense layers (D_ψ).
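The two loss terms and their sum can be sketched in plain Python (a minimal scalar-list sketch; the actual model computes these over image tensors and network outputs):

```python
import math

def mse_loss(x, x_hat):
    """Mean squared error between an input image and its reconstruction (Equation (1))."""
    n = len(x)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / n

def cross_entropy_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy between periodontal-stage labels and predictions (Equation (2))."""
    n = len(y)
    return -sum(
        yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
        for yi, p in zip(y, y_hat)
    ) / n

def dual_loss(x, x_hat, y, y_hat):
    """Total training loss: reconstruction term plus classification term (Equation (3))."""
    return mse_loss(x, x_hat) + cross_entropy_loss(y, y_hat)
```

In practice both terms are backpropagated jointly, so the encoder receives gradients from the decoder and from the classification head at the same time.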
Figure 1. The overall schema of the model. Phase 1: We used the raw OPG images as input to an autoencoder to extract periodontal disease-related features. The autoencoder has a dual-loss architecture, with mean squared error (MSE) and cross-entropy (CE) losses. After training the autoencoder, the latent space of the model is used for the next step. Phase 2: The periodontal disease-related features (the latent space of the autoencoder) are used as inputs to a deep neural network model. The patient demographic data extracted from the EHR are combined with the image features to train the model for prediction of systemic diseases using the fusion model.
Using two losses helps to avoid bias and makes the autoencoder robust against noise. The MSE loss contributes to avoiding possible bias in feature extraction by ignoring irrelevant information not essential to reconstructing the OPG image. In addition, applying the cross-entropy loss helps to produce a sparser latent space and boosts the robustness of the final fusion model against input noise, as the periodontal disease-related features can be distinguished in the LR space. This is helpful in OPG image processing, as movement artifacts in imaging can be suppressed by the sparsification of LR.
We used a modified version of BCDU-Net [20] for our image reconstruction model. For patient periodontal stage prediction, we used a fully connected network with three layers and the Leaky-ReLU activation function. The architecture of the model can be found in Figure S1. The training process was completed using the Adam optimizer with a batch size of 32 for 10 epochs at a learning rate of 1e-4, followed by 10 extra epochs at 1e-5 for finer convergence.
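The two-stage learning-rate schedule can be sketched as follows (a schematic loop only; `train_one_epoch` is a hypothetical stand-in for the actual Adam update over batches of 32):

```python
def train_with_schedule(train_one_epoch):
    """Run the schedule described above: 10 epochs at 1e-4, then 10 more at 1e-5."""
    history = []
    for lr in (1e-4, 1e-5):        # coarse stage, then fine-tuning stage
        for _ in range(10):        # 10 epochs per stage
            history.append(train_one_epoch(lr))
    return history
```

The lowered learning rate in the second stage lets the optimizer settle into a finer minimum after the initial coarse training.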
In the second phase, we fused the features extracted from the OPG images with the demographic and clinical oral features extracted from the EHR to train a deep neural network (DNN) (Figure 1). The DNN model is a binary classifier that uses dense and dropout layers to predict chronic diseases based on the input features (Figure S2). As the number of features from the OPG images and the EHR differs substantially (8192 and 8, respectively), we passed these two feature sets through the N1 and N2 networks prior to feature concatenation, making the numbers of features more equal and avoiding the potential bias in model training caused by the large number of OPG image features. Finally, we concatenated the N1 and N2 outputs, and the result was fed into the N3 network for the final prediction (Figure 1).
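The projection-then-concatenation step can be sketched with NumPy at the shape level (the hidden size of 64 and the fixed random matrices are assumptions for illustration; the real N1/N2/N3 are trained dense networks):

```python
import numpy as np

rng = np.random.default_rng(0)

img_features = rng.normal(size=8192)   # latent features from the OPG autoencoder
ehr_features = rng.normal(size=8)      # demographic and clinical features from the EHR

# W1 and W2 stand in for the N1 and N2 projection networks; 64 is an assumed hidden size.
W1 = rng.normal(size=(64, 8192))
W2 = rng.normal(size=(64, 8))

n1_out = W1 @ img_features             # project 8192 image features down to 64
n2_out = W2 @ ehr_features             # project 8 EHR features up to 64

# The concatenated representation is what the final prediction network N3 consumes.
fused = np.concatenate([n1_out, n2_out])
```

Balancing the two branches to comparable widths before concatenation is what keeps the 8192 image features from dominating the 8 EHR features during training.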
For each ICD-10 chapter in our dataset, we trained a separate binary classifier model to predict whether the patient belongs to that chapter or not. We applied a downsampling approach to create a balanced dataset for chapters in which the number of samples in the majority class was more than double the size of the minority class (extremely unbalanced). For each layer, the ReLU activation function was applied:

u_i = ReLU((WX + b)_i) = max(0, Σ_j W_ij X_j + b_i)

where W denotes the weight matrix, X is the input, b represents the bias vector, and i indexes the output array u. Additionally, a dropout layer was utilized between each pair of dense layers to minimize overfitting of the model. Of note, we trained the model with a learning rate of 1e-4 and a batch size of 50 for 300 epochs. The architecture of the model can be found in Figure S2.
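The downsampling step can be sketched as follows (a minimal sketch; `random.sample` with a fixed seed stands in for whatever sampling procedure the study actually used):

```python
import random

def downsample(majority, minority, seed=42):
    """Randomly downsample the majority class to the size of the minority class."""
    random.seed(seed)
    return random.sample(majority, len(minority)), minority

# Hypothetical patient indices for one ICD-10 chapter.
majority = list(range(100))  # e.g., patients without diseases in this chapter
minority = list(range(30))   # patients with diseases in this chapter

balanced_major, balanced_minor = downsample(majority, minority)
```

After downsampling, each per-chapter binary classifier sees the same number of positive and negative examples.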

Extract Periodontal-Related Features from OPG
To assess the performance of the autoencoder model, a 2D projection of the latent space was employed. A principal component analysis (PCA) dimensionality reduction method was applied to the latent space of the OPG images. The plot shows two separate clusters formed by the healthy samples and the OPG images with periodontitis (Figure 2).
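The 2D PCA projection of the latent space can be sketched with NumPy via an SVD on the centered data (the latent dimensionality of 8192 follows the model description; the random latent vectors are stand-ins):

```python
import numpy as np

def pca_2d(latent):
    """Project latent vectors (n_samples x n_features) onto the first two principal components."""
    centered = latent - latent.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 8192))  # stand-in for the autoencoder latent space
proj = pca_2d(latent)                 # one 2D point per OPG image, ready to scatter-plot
```

Plotting `proj` colored by periodontal status reproduces the kind of cluster view shown in Figure 2.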
Next, the discriminative regions were visualized via Gradient-weighted Class Activation Mapping (Grad-CAM) [21] to better understand the regions on which the network concentrates while building the latent space. The activation maps help us determine which parts of the image contribute most to the model's final output; the model gave more attention to the brighter parts of the maps. These results revealed that the autoencoder focused more on the periodontal area of the OPG images, which was key for identifying periodontal disease. Figure 3 presents the Grad-CAM results for a group of OPG images.
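The core Grad-CAM computation can be sketched with NumPy (the activation and gradient arrays here are random stand-ins; in practice they come from a forward and backward pass through the trained network's last convolutional layer):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap: ReLU over a gradient-weighted sum of activation maps.

    activations, gradients: arrays of shape (channels, height, width) for one image.
    """
    # Channel weights are the global-average-pooled gradients (the alpha_k coefficients).
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over the channel axis yields one coarse spatial map.
    cam = np.tensordot(weights, activations, axes=([0], [0]))
    # ReLU keeps only regions with a positive influence on the target output.
    return np.maximum(cam, 0.0)

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 16, 16))    # stand-in activation maps
grads = rng.normal(size=(8, 16, 16))   # stand-in gradients w.r.t. those maps
heatmap = grad_cam(acts, grads)
```

The heatmap is then upsampled to the OPG resolution and overlaid on the radiograph, which is how the bright periodontal regions in Figure 3 arise.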

Predict Systemic Disease Using a Fusion Model
In our dataset, we had systemic disease information for 14 ICD-10 chapters. Using the fused features, we trained 14 separate binary classifiers, one per disease chapter, to predict whether or not each patient belonged to that chapter. To evaluate the DNN models, the area under the receiver operating characteristic (ROC) curve was used. The data were divided into training (70%) and unseen test (30%) sets for each disease chapter. For the training dataset, ten-fold cross-validation was undertaken, and the average model performance over the ten folds is presented. The results indicated that the models trained for Chapters III, VI and IX had the best performance among all 14 models, with AUCs of 0.92 (95% CI, 0.90–0.94), 0.87 (95% CI, 0.84–0.89) and 0.78 (95% CI, 0.75–0.81), respectively (Figure 4).
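For reference, the AUC can be computed directly from its rank interpretation, the probability that a randomly chosen positive scores above a randomly chosen negative; a minimal pure-Python sketch:

```python
def roc_auc(labels, scores):
    """AUC via the rank statistic: P(score_pos > score_neg), with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This O(n^2) form is fine for illustration; library implementations use the equivalent sorted-rank formula for efficiency.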


To further determine whether combining OPG and EHR data features could improve the model performance, we trained models using OPG features alone or EHR data features alone, with our fusion model as the reference. These single-modality models were trained to predict the top three disease chapters. The AUC results demonstrated that the model trained with the combined features outperformed those trained using only one type of feature. We also compared ROC curves using DeLong's paired AUC comparison test (two-sided, null hypothesis = no difference in AUC) [22]. The p-values show that models trained with both OPG and EHR features have better performance (Figure 5).
Finally, to assess the robustness of the models, we performed the evaluation on the 30% unseen dataset for Chapters III, VI and IX. The accuracy, sensitivity, specificity, precision, F1 score and confusion matrix were used to evaluate our models on the unseen data. The results revealed the robustness of the model performance, with an accuracy of 0.88, 0.82 and 0.72 for Chapters III, VI and IX, respectively (Table 2, Figure 6). In the present study, anaemia is the major disease identified in Chapter III. Chronic inflammation and persistent activation of the immune system may lead to anaemia [23], likely due to insufficient production of erythropoietin and the decreased response of erythroid progenitors [24].
Consequently, the inhibition of hepcidin by erythroferrone is reduced, and the increased level of hepcidin alters the status of iron [25]. In addition, the release of iron stored in the body is also hindered by the persistent inflammation [26]. A recent systematic review indicated that periodontal disease, especially severe periodontitis, could reduce hemoglobin concentration and contribute to iron metabolism disorder [27]. Furthermore, as the leading cause of severe tooth loss in adults [28], periodontitis impacts nutritional intake, which may also account for anaemia.
Among the subjects with conditions in Chapter VI, about 25% had sleep disorders (G47) and mononeuropathies of the upper limb (G56). According to a summary of 13 studies, a linkage exists between sleep disorders and periodontal disease, which may be induced by an elevated level of systemic inflammation in these patients [29]. Mononeuropathies of the upper limb could result in pain, weakness, and loss of upper extremity function, consequently limiting oral hygiene behaviors and increasing the risk of periodontal disease. In addition, Alzheimer's disease (G30) is also coded in this Chapter. Recently, the keystone periodontopathogen Porphyromonas gingivalis and its gingipains have been detected in brain specimens from patients with Alzheimer's disease, which implies that periodontal disease may be involved in the pathogenesis of Alzheimer's disease [30].
Furthermore, the diseases coded in Chapter IX have been widely documented, both epidemiologically and mechanistically. In the present study, hypertension, ischaemic heart diseases, and stroke are the three most prevalent diseases from this Chapter, with 349, 92 and 48 subjects, respectively. Indeed, periodontitis was proven to be an independent risk factor for atherosclerotic CVD in a large-scale cohort study with over 60,000 participants [31]. Moreover, a landmark randomized controlled trial in periodontitis patients demonstrated that intensive periodontal treatment could improve endothelial function, which would be potentially beneficial to healthcare for CVD patients [32]. Our recent studies further show that periodontal therapy can favorably modulate the gene expression of inflammatory mediators in endothelial progenitor cells and contribute to improving heart function in diabetic patients [33,34], as these promising treatment outcomes are connected to the underlying mechanisms of infection and inflammation. For example, the DNA of periodontal pathogens is detectable in various endarterectomy specimens [35][36][37]. Indeed, these noxious pathogens and their virulence factors can crucially account for endothelial dysfunction and the progression of CVD [38].

Discussion
As one of the most prevalent inflammatory diseases worldwide, periodontal disease is ranked 7th among all 369 diseases, injuries and impairments investigated in the extended Global Burden of Disease Study 2019 (GBD) [39]. Moreover, severe periodontitis is the most common cause of tooth loss and edentulism in the adult population, and it substantially affects oral functions, nutritional intake and quality of life [28], with considerable socioeconomic impacts [40]. Notably, it is evident that periodontitis could crucially account for systemic dissemination of infections such as bacteremia, increased inflammation levels of the body and dysbiotic microbiomes in various niches (e.g., gut) [41]. Consequently, this serious oral inflammation is closely linked to various life-threatening noncommunicable diseases (NCDs), such as cardiovascular diseases (CVD), diabetes mellitus (DM), inflammatory bowel disease, respiratory diseases, Alzheimer's disease (AD), chronic kidney disease, rheumatoid arthritis, pancreatic and colorectal cancers, various systemic comorbidities, and lately COVID-19 [38,[42][43][44][45][46][47][48]. In addition to these specific linking profiles of one systemic condition to periodontal disease, our 18-year follow-up study identified for the first time that periodontitis experience may represent an increased risk for the onset of multiple common NCDs [49], and our latest findings further indicate that periodontitis is significantly linked to a cluster of systemic comorbidities [45].
In this study, we established a multimodal deep learning model to predict the presence of systemic diseases and disorders according to ICD-10 codes, based on periodontal conditions assessed by alveolar bone levels and basic demographic characteristics. The benefit of our model is that it is trained using features from both medical images and patient clinical information to perform prediction. The dual-loss autoencoder used in our model helps to identify the most informative features from OPG radiographs without any pixel-level annotation. Additionally, we demonstrated that combining image and clinical features improves the model's performance compared to using only one type of feature. According to our findings, the top three most accurately predicted chapters, in order, are Chapters III, VI and IX.
While we demonstrated that deep learning models can be used to predict systemic diseases from oral conditions, there are some limitations to this approach that should be addressed. First, we were unable to find an appropriate external dataset on which to evaluate our model, due to the complexity of the combined dental and medical datasets. Second, our model could be expanded by including more clinical image data and parameters to further improve its performance for those systemic disease chapters where the current model performed poorly. Further investigation would be highly warranted for screening and potentially identifying concurrent systemic diseases and conditions on the basis of oral/periodontal status, via applying the refined deep learning model. As such, it could be anticipated that the close collaboration and good teamwork of dental and medical professionals and computer science experts can better promote proactive disease prevention and deliver cost-effective personalized healthcare in the near future.

Conclusions
Within the limitations of the present study, various systemic diseases and disorders could be predicted according to oral conditions via the combination of panoramic radiographic findings and clinical oral/periodontal features by a fusion deep learning model. Oral/periodontal conditions may therefore be used conveniently for reflecting general health status and hopefully identifying concurrent systemic comorbidities in the future. It could be anticipated that medical professionals may well collaborate with dental and computer science experts for better promoting proactive disease prevention and thereby delivering precise and effective healthcare. Further evaluation and modification of this proposed model are highly warranted using the combined oral and medical datasets in different ethnic groups.