Deep Learning Algorithms for Screening and Diagnosis of Systemic Diseases Based on Ophthalmic Manifestations: A Systematic Review

Deep learning (DL) is the new high-profile technology in medical artificial intelligence (AI) for building screening and diagnosing algorithms for various diseases. The eye provides a window for observing neurovascular pathophysiological changes. Previous studies have proposed that ocular manifestations indicate systemic conditions, revealing a new route in disease screening and management. Multiple DL models have been developed for identifying systemic diseases based on ocular data. However, the methods and results varied immensely across studies. This systematic review aims to summarize the existing studies and provide an overview of the present and future aspects of DL-based algorithms for screening systemic diseases based on ophthalmic examinations. We performed a thorough search in PubMed®, Embase, and Web of Science for English-language articles published until August 2022. Among the 2873 articles collected, 62 were included for analysis and quality assessment. The selected studies mainly utilized eye appearance, retinal data, and eye movements as model input and covered a wide range of systemic diseases such as cardiovascular diseases, neurodegenerative diseases, and systemic health features. Despite the decent performance reported, most models lack disease specificity and public generalizability for real-world application. This review summarizes the pros and cons and discusses the prospect of implementing AI based on ocular data in real-world clinical scenarios.


Introduction
Deep learning (DL) is a state-of-the-art subset of machine learning that allows computers to automatically learn the features from raw data. It comprises multiple processing layers that transform the data into stratified abstract levels to achieve specific tasks [1]. In recent years, DL has significantly advanced in various fields, such as visual recognition and natural language processing. With its ability to unveil intrinsic characteristics from high-dimensional data, DL has also been widely applied in medical AI to develop disease screening and diagnosis algorithms.
Ophthalmology is a pioneer in the field of medical artificial intelligence (AI). The eye is informative and accessible due to its inherent anatomical features. Because it is located on the body surface, most examinations can be done non-invasively. Furthermore, the complex anatomy comprising various cells and tissues allows the generation of multimodal the quality and the risk of bias of the included studies.

Study Selection
The flowchart of the selection process is shown in Figure 1. In total, 2873 articles were identified from the three target databases, with 817 from PubMed, 822 from Embase, and 1234 from Web of Science. After removing 812 duplicates, reports were excluded by screening the title and abstract. Of the 85 studies entering the full-text screening stage, one was unavailable and 26 others were eliminated for other reasons. With the addition of the four studies identified from references, 62 studies were included in this systematic review. The included articles are summarized in tabular form.
The results of the risk of bias evaluation using QUADAS-2 are shown in Figure 2. Most of the included studies had low risk in all categories, while larger proportions of high risk were found in patient selection and index test. The detailed results are shown in Supplementary Table S2.
1 Only the best performance is presented when there was more than one model. Metadata-based models and hybrid models are not presented in this table. EyePACS, eye picture archive communication system; HbA1c, glycated hemoglobin; AUC, area under curve; HC, healthy control; N/A, not applicable; T2DM, type 2 diabetes mellitus; CCM, corneal confocal microscopy; T1DM, type 1 diabetes mellitus; DPN, diabetic peripheral neuropathy; CNN, convolutional neural network; N/M, not mentioned; PCOS, polycystic ovary syndrome.
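As a quick consistency check, the counts reported for the selection process can be tallied in a few lines of Python. This is only an arithmetic sketch: the variable names are illustrative, and the label "PubMed" for the 817-record source is an assumption based on the databases named in the abstract.

```python
# Records identified per database, as reported in the study selection text.
# The "PubMed" label is an assumption taken from the databases listed in the abstract.
identified = {"PubMed": 817, "Embase": 822, "Web of Science": 1234}
total_identified = sum(identified.values())

# Full-text screening stage, using the counts quoted in the text.
full_text_screened = 85
unavailable = 1
excluded_full_text = 26
added_from_references = 4

included = (
    full_text_screened - unavailable - excluded_full_text + added_from_references
)

print(total_identified)  # 2873
print(included)          # 62
```

Both totals agree with the figures stated in the text (2873 records identified and 62 studies included).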

Algorithms Based on the Anterior Segment of the Eye
Most abnormalities of the external eye can be observed directly. These manifestations could provide easy access to several systemic diseases and have been shown to be detectable with deep learning models. Babenko et al. [13] developed DL algorithms based on external eye images taken with fundus cameras to predict hemoglobin A1c (HbA1c) ≥ 9% and lipid levels. The former achieved AUCs of 0.67 to 0.73, though the latter lacked significance. The models mainly focused on the nasal and temporal scleral areas, indicating that the clue for diagnosis may be conjunctival vessels. The work of Li et al. [14] showed that diabetes could be identified from conjunctival images with an accuracy of 75.15%.
Jaundice is also a distinct symptom often observed from the external eye. Using slit lamp photos as input, Xiao et al. [19] achieved AUCs over 0.90 in diagnosing liver cirrhosis and liver cancer. Another study focusing on neonatal jaundice [17] also attained an accuracy of 79.03% based on smartphone-captured images. On the other hand, the model built by Lv et al. [18] detected polycystic ovary syndrome (PCOS) based on sectioned scleral images. The model attained an accuracy and AUC over 0.90 by focusing on thick and foggy blood vessels in the sclera, which could be caused by sex steroid changes in the patients.
The cornea is densely innervated by the ophthalmic branch of the trigeminal nerve. Corneal confocal microscopy (CCM) allows for non-invasive quantification of the small corneal nerve fibers, providing a rapid evaluation method for various diseases. With CCM images as input, two DL algorithms were developed for the early diagnosis of diabetic neuropathy, one achieving an accuracy of 96% [16] and the other having an F1-score of 0.91 [15]. The Grad-CAM highlighted the absence of nerve fibers in the CCM images, showing that the models are explainable despite the relatively small datasets.
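Because the two CCM studies report different metrics (accuracy in [16], F1-score in [15]), it may help to recall how each is derived from a confusion matrix. The sketch below uses made-up counts, not data from either study, and the function names are illustrative.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (assumes non-zero denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts, for illustration only.
tp, fp, fn, tn = 40, 5, 10, 45

acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy={acc:.3f}, F1={f1:.3f}")
```

Unlike accuracy, the F1-score ignores true negatives, so the two figures are not directly comparable across studies with different class balances.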

Algorithms Based on the Posterior Segment of the Eye
The retina provides a window for directly observing neurovascular structures in vivo based on its natural anatomical features. The development of retinal imaging technologies such as color fundus photographs and ultra-widefield fundus (UWF) imaging enabled intuitive en-face records of retinal pathologies. Additionally, optical coherence tomography (OCT) with the interferometry technique allows cross-sectional imaging of the multiple layers of the retina. The multimodal retinal data generated from various imaging methods creates an ideal platform for building DL algorithms for diagnosing systemic diseases.

Systemic Health Features
Systemic health features, such as age, gender, smoking status, blood pressure, and glucose level, are indicative and predictive of various disorders. The pioneering work of Poplin et al. [30] unveiled the possibility of using deep learning algorithms based on fundus photographs to predict systemic risk factors, giving rise to a series of works with similar goals. These models successfully predicted age with the mean absolute error (MAE) ranging from 2.43 to 6.328 years [22,24,25,28,30,31]. As for identifying gender, the models also achieved satisfactory AUCs ranging from 0.85 to 0.97 [20,22,24–26,28,30,31,52], with a few studies highlighting the optic disc and the macula as regions of interest [20,26,30]. Ethnicities could also be categorized from fundus images with AUC and accuracy surpassing 0.90 [24,34].
Other than fundus images, OCT scans were also shown to be suggestive of patients' age and gender. The result from Munk et al. [28] indicated that prediction performance with OCT C-Scans or B-Scans of the macular region outperformed fundus images, achieving AUCs of 0.90 and 0.84 in detecting gender and obtaining MAEs of 5.625 and 4.541 years in predicting age. Mendoza et al. [27] proposed that circle and radial scans of the optic nerve head have the potential to predict age, gender, and race, among which race prediction achieved the best performance with an AUC of 0.96. The authors also illustrated that circle scans have better predictive value in DL algorithms.
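The two metrics quoted throughout this subsection can be sketched in plain Python. MAE averages the absolute difference between predicted and true values, and AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The numbers below are toy values chosen for illustration, not data from the cited studies, and the function names are hypothetical.

```python
from itertools import product


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(labels, scores):
    """Pairwise (rank-based) AUC for binary labels coded as 0 and 1."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy age regression example (years).
true_age = [50, 62, 71, 45]
pred_age = [53, 60, 66, 47]
mae = mean_absolute_error(true_age, pred_age)

# Toy binary classification example.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
auc_value = auc(labels, scores)

print(f"MAE={mae:.2f} years, AUC={auc_value:.3f}")
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; library implementations use sorting for efficiency.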
As a notorious risk factor for various systemic diseases, smoking is deleterious since it impacts systemic vascular structure and function [65]. Recent studies have demonstrated that DL models based on retinal images could identify smoking status with AUCs between 0.71 and 0.78 by capturing the pathological changes, as retinal circulation characteristics were marked in the attention maps [22,24,30]. Contrast-enhanced photographs emphasizing the vessel structure could significantly boost the model's performance and reach an accuracy of 88.88% [33].
Based on the parallel aging of the body and the retina, several researchers proposed the idea of "retinal age" as a novel feature in disease monitoring. Chang et al. [38] suggested that an algorithm-predicted age higher than the chronological age translates into higher all-cause mortality. Nusinovici et al. [29] took a different approach by defining "RetiAGE" as the probability of age being ≥65 years, and their study obtained similar results. The work of Hu et al. [23] extended the application of this method, showing that the model based on retinal age is predictive of the future risk of Parkinson's disease with an AUC of 0.71.
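The "retinal age" idea can be illustrated with a short sketch that computes the difference between the algorithm-predicted age and the chronological age. The function name, the participant records, and the sign convention are assumptions made for illustration; they are not taken from [23], [29], or [38].

```python
def retinal_age_difference(predicted_age: float, chronological_age: float) -> float:
    """Positive values mean the retina is predicted to be older than the person."""
    return predicted_age - chronological_age


# Hypothetical participants, for illustration only.
participants = [
    {"id": "A", "chronological_age": 60.0, "predicted_age": 66.5},
    {"id": "B", "chronological_age": 55.0, "predicted_age": 52.0},
    {"id": "C", "chronological_age": 70.0, "predicted_age": 70.0},
]

differences = {
    p["id"]: retinal_age_difference(p["predicted_age"], p["chronological_age"])
    for p in participants
}

# Participants whose predicted age exceeds their chronological age.
older_looking = [pid for pid, diff in differences.items() if diff > 0]
print(differences, older_looking)
```

In a study setting, such a difference would then be related to outcomes such as mortality using survival models, which this sketch does not attempt.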

CVD
Cardiovascular diseases (CVD) cause the most significant proportion of deaths among non-communicable diseases. Studies have proven that the presence and severity of CVD are associated with retinal vascular morphology [66], providing the theoretical basis for building AI diagnosing models with retinal images.
The coronary artery calcium score (CACS) is a non-invasive assessment system that quantifies the prognostic risks of CVD [67]. Two studies have applied DL to predict CAC scores based on retinal images. One of the algorithms attained the highest accuracy of 0.72 in predicting CACS >0 despite a small sample size [42], and both studies suggested that the AUC improved as the diagnosing threshold increased [44].
Alternatively, several researchers have developed unique retinal scoring systems to use as CVD indicators. The RetiCAC score, the probability score of the DL binary classification task, could predict the presence of coronary artery calcium with an AUC over 0.70 [43] and was comparable with the traditional CAC risk stratification in predicting disease prognosis. Another scoring system based on retinal images, namely the DL fundoscopic atherosclerosis score (DL-FAS), achieved similar results in predicting carotid artery atherosclerosis and all-cause mortality [41].
For direct classification of CVD, Al-Absi et al. [36] achieved an accuracy of 75.6% using only retinal images, and the region of interest of the model was mainly the central retinal area. Another study recruiting type 1 diabetes mellitus patients achieved an AUC of 0.77 in diagnosing CVD [37]. Peripheral artery disease (PAD), also attributed to atherosclerosis, was proven to be detectable from fundus images with an AUC reaching 0.89 [40]. Furthermore, there was evidence of applying retinal image-based AI in predicting perioperative parameters of congenital heart diseases [39], with the AUC of detecting cardiopulmonary bypass time reaching 0.80 and that of oxygen saturation, arterial blocking time, length of ICU stay, and perioperative complications surpassing 0.70.

Hypertension
Hypertension causes microvascular dysfunction. Morphological retinal vascular changes, such as narrower arteries and wider venules, could be observed in hypertensive patients [68]. In algorithms predicting biomarkers, the MAE ranged from 8.96 to 11.35 mmHg for systolic blood pressure (BP) and from 6.42 to 7.20 mmHg for diastolic BP [22,30,31]. Interestingly, the studies applying DL to diagnose hypertension all preprocessed the input photographs to enhance the vessel structures and remove background noise. The models based on processed images achieved AUC values of 0.65 and 0.77, respectively [35,45], and the work predicting mild hypertension reached an accuracy of 93.75% [46] based on only 400 photographs.

Diabetes Mellitus
Diabetic retinopathy, with its rapidly rising prevalence and distinct fundus pathologies, has become the pilot field of ophthalmic AI. Aside from diagnosing typical retinopathy, there have been multiple attempts at employing DL to predict diabetes mellitus (DM) as a disease. Kang et al. reached the highest AUC of 0.92 [59], and the AUCs of other approaches were no lower than 0.73 [35,47,50–52]. When evaluated for accuracy, the models reached from 83.7% to 85.0% [48,49]. One study that applied Xception and a dense neural network (DNN) achieved a training accuracy of 96.68% but a validation accuracy of only 66.82%, with only 220 images used for model training.
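The difference between the training and validation accuracy of the last study mentioned above can be expressed as a simple subtraction. A large gap of this kind is a common sign of overfitting, which becomes more likely with small training sets. The sketch uses only the two accuracies quoted in the text; the function name is illustrative.

```python
def generalization_gap(train_acc: float, val_acc: float) -> float:
    """Training accuracy minus validation accuracy, in percentage points."""
    return train_acc - val_acc


# Accuracies (in %) quoted in the text for the Xception + DNN study.
gap = generalization_gap(96.68, 66.82)
print(f"gap = {gap:.2f} percentage points")
```

A gap of this size suggests that the training accuracy should not be read as an estimate of real-world performance.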
Hemoglobin A1c (HbA1c) is an essential biomarker for long-term glucose monitoring [69]. Tham et al. have proven that retinal images contain information indicating HbA1c level by achieving an MAE of 0.87% with the DL algorithm [32]. Notably, it was suggested that diabetic neuropathy could also be detected from fundus photographs, with the AUC reaching 0.71 [53].

Anemia
Anemia is a common disease and a symptom of various systemic disorders. DL based on fundus images was shown to be capable of predicting hemoglobin concentration and diagnosing anemia [54,56], and thus could be considered a novel non-invasive method for disease management. Explanation methods showed that the models focused on the optic disc and the retinal vessels, which is consistent with the typical ocular symptoms such as pale discs and narrower arteries in anemic patients.
Wei et al. [55] tackled the problem from a different perspective by using OCT images that displayed the cross-section of retinal vessels as the model input. Although the algorithm achieved excellent results, the dataset was small and external validation was not applied.

Hepatobiliary Diseases and Kidney Diseases
The liver and the kidney share multiple essential physiological functions, including metabolism and maintaining homeostasis. Recent studies have suggested that diseases of both organs can be detected with deep learning algorithms based on fundus photos. Xiao et al. [19] showed that hepatobiliary diseases, especially liver cancer and liver cirrhosis, could be diagnosed with an AUC over 0.80 from fundus images. In the case of chronic kidney disease (CKD), the algorithms obtained excellent performance in predicting early CKD and CKD [57–59]. Color fundus images could provide intuitive observation of the systemic microvasculature, enabling the detection of vascular defects in CKD patients.

Neurological Disorders
A variety of neurological diseases can be detected from the morphological changes of the retina. White matter hyperintensity, referring to the lesions caused by cerebral small vessel diseases, was predicted from fundus photos with an AUC of approximately 0.70 [60]. As for cognitive impairment, previous studies indicated that DL with retinal images alone was limited in predicting cognition status [21]; however, UWF combined with optical coherence tomography angiography (OCTA) and fundus autofluorescence (FAF) could achieve an AUC of 0.74 in detecting Alzheimer's disease (AD) [63]. Likewise, autoimmune diseases such as axial spondyloarthritis [64] could also be distinguished with a fair AUC of 0.74. In contrast, studies focusing on autism spectrum disorder (ASD) [62] and schizophrenia [61] obtained an AUC of over 0.97, possibly attributable to the fact that both models applied cross-validation methods for performance evaluation. These results suggest that several categories of neurological disorders demonstrate retinal changes, although the DL models based on fundus images are not yet sufficiently developed to perform diagnostic tasks individually.

Algorithms Based on the Movements of the Eye
Eye movements are coordinated actions governed by cognitive processes and behavior mechanisms [70]. Previous studies have shown that the specific gaze patterns captured by eye-tracking devices could be predictive of neurodegenerative diseases, such as Parkinson's disease (PD), dementia, and autism spectrum disorders (ASD). With the advancements in hardware and algorithms, the current eye-tracking methods have achieved high temporal resolution and could provide additional information unattainable by traditional imaging techniques.

Dementia and Parkinson's Disease
Dementia is a global health issue in the aging society. It was suggested that eye-tracking tests could provide a rapid and objective method for assessing patients' cognitive functions, such as memory and attention [11]. Mengoudi et al. [71] designed a test to trace the participants' gaze while presenting images with different stimuli, and the model achieved an accuracy of 78.3% in classifying dementia. Alternatively, Biondi et al. [72] developed a solution based on eye movement during reading tasks. Their model performed well, with an accuracy of 89.8%, and the severity scaled by model output was comparable with psychiatrists' scoring.
PD is another neurodegenerative disease affecting a large population worldwide. As previous studies implied fixational defects in PD patients, Archila et al. [73] developed an algorithm based on fixational performances to distinguish and stage PD. Their model achieved relatively good specificities, and the performance improved after combining gait data.

Autism Spectrum Disorders
Visual attention characteristics are among the most specific traits obtained from eye movement data. Such hallmarks could be applied in screening for ASD, which distinctively presents changes in attention patterns towards certain visual elements. Jiang et al. [78] discovered that ASD patients mainly focused on non-social subjects when presented with a variety of pictures, and their model achieved an AUC of 0.92. Xie et al. [77] further distinguished several categories of image features, such as outdoor objects and food and drinks, that are important in identifying ASD. The model based on the top features also performed excellently with an AUC of 0.92.
Li et al. adopted a different method by displaying the mother's image and tracking the children's gaze patterns [74,75]. By applying appearance-based gaze estimation, their models achieved high accuracies of over 90%. Besides the reaction to still images, Varma et al. [76] captured the gaze fixation and visual scanning patterns in socially motivated gameplay. The developed algorithm showed modest predictive power in identifying ASD in children.

Other Disorders
Vestibular disorders could cause significant ocular presentations, namely abnormal nystagmus and saccades. In clinical settings, evaluating these signs to help diagnose vestibular diseases usually requires an experienced specialist. With the advancement of DL, a few studies utilized eye movement data for discrimination between systemic diseases. Ahmadi et al. [80] identified vestibular strokes and peripheral acute vestibular syndrome with an AUC of 0.96. Mao et al. [79], on the other hand, obtained eye motion during gazing tasks and achieved an AUC of 0.94 in differentiating controls, brain injury, and vertigo patients.

Discussion
This systematic review summarizes the performance of deep learning algorithms based on ocular data in evaluating systemic health conditions. Overall, most systemic diseases shown to be detectable from static ocular manifestations impact neurovascular structures, which project changes to the eye in areas such as retinal vessels and corneal nerves. Most studies used color photographs as input; however, depth-resolved OCT images were also applicable. Alternatively, neurodegenerative disorders mainly present as defects in eye movements, and eye-tracking data in specific tasks or spontaneous abnormalities were obtained as model inputs. The reported algorithms achieved fair results, with AUCs and accuracies exceeding 0.7 in most studies despite small datasets. The saliency maps and heatmaps also showed that the models were built on rational reasoning despite the "black box" process of deep learning. Despite the outstanding performances, which were obtained mostly in retrospective datasets with handpicked participants, several aspects should be improved before putting the models into real-world application.

Present and Prospects
Systemic health features could have significant latent effects on baseline ocular characteristics. Features such as age, sex, and ethnicity were shown to be reliably identified from ophthalmic data. While predicting age, the algorithms mainly focused on retinal vessels and the optic nerve head (ONH) areas [25,30], which is concordant with the aging of the retina [81,82]. Sex, on the other hand, was identified based on the ONH and the macular area, where innate gender differences in ONH blood supply [83] and foveal avascular zone (FAZ) area [84] exist. Besides being the baseline characteristics of the patients, these features could concurrently be risk factors for many systemic diseases. Former reports [7,8] showed that age and sex are interrelated with cardiometabolic risk factors and conditions in retinal image-based DL algorithms, possibly due to their mutual effect on fundus vessels. Therefore, studies targeting diseases with sex or age differences should control for these confounders to prevent overestimating the model's performance.
Ethnicity was another critical factor proven to be distinguishable from ophthalmic presentations. Aside from affecting the retinal structure [85], ethnicity is also a determinant of the ocular disease spectrum. However, most algorithms were trained on datasets with little to no diversity, affecting the generalizability in real-world scenarios. We suggest that researchers consider data with racial diversity as external validation, and more multi-ethnic datasets should be established to produce generalizable DL models.
Regarding algorithms for diagnosing neurodegenerative diseases based on gazing patterns and eye movements, the common issues are the limited datasets and the lack of external validation. With video data as input in most cases, these algorithms must be robust against significantly greater interference to be applied in different real-world scenarios. A large-scale validation in the general population would be much preferred for further approval of the DL algorithms.
DL is known for its representation-learning nature. The ocular vasculature, including the conjunctival and the retinal vessels, is among the most conspicuous and vulnerable structures and was often identified as the focused feature in saliency maps. For instance, metabolic syndrome [86], which presents as hypertension, hyperglycemia, and dyslipidemia, was found to cause retinal arteriolar narrowing [87]. Arterial defects in these conditions were reported to be caused by several shared pathophysiological factors, such as oxidative stress, glutathione peroxidase, and impaired acetylcholine-mediated vasodilatation. As a result, algorithms predicting hypertension, diabetes, and CVD all highlighted the retinal vessels as the area of interest. On the other hand, CKD causes systemic atherosclerosis and vascular calcification [88], which could also present in the retina as arteriolar thinning and sparse capillaries. Since current studies mainly focused on discriminating the target disease from healthy controls, the algorithms were likely to identify universal pathologies instead of exclusive characteristics of each condition. Therefore, these DL models could lack specificity if applied in real-world scenarios where multiple systemic diseases coexist. Future studies aiming to distinguish between diseases with similar pathological characteristics would greatly favor the implementation of DL algorithms in real-world screening and diagnosis.

Advantages and Drawbacks of AI in Clinical Settings
The application of AI algorithms in clinical settings has been a controversial topic. AI models benefit disease screening, diagnosing, and management in several aspects: (a) improve efficiency compared with human graders and enable large-scale screening programs; (b) allow advanced medical technology to reach remote areas with algorithms deployed in portable devices; (c) reduce health-care expenses by saving human resources; and (d) discover preclinical changes for early disease screening. Implementing ophthalmic examinations in disease screening algorithms further provides several advantages. Ophthalmic examinations are non-invasive and rapid compared with other traditional tests; therefore, the screening procedure can be simplified to a great extent. Moreover, the neurovascular structures could be observed intuitively from the ocular anatomy, offering a window for analyzing the underlying morphological and pathological features.
Nonetheless, there are major disputes about launching AI algorithms in clinical settings. First and foremost is the debate on AI ethics. Models should be thoroughly investigated before being approved for real-world tasks. Secondly, the robustness of AI models is often questioned in actual practice. Besides data with varied baseline characteristics, the algorithms could also encounter a variety of low-quality inputs. Researchers should ensure the algorithm can adapt to widely varied datasets to offer a generalizable and reliable program.

Strengths and Limitations
This systematic review is the first to summarize deep learning algorithms for systemic disease screening and diagnosis based on ocular data. It provides a comprehensive view of the current trend and methodology in observing various systemic conditions from eye manifestations. We believe this work could be a valuable reference for subsequent studies.
There are some limitations in the current study. First, according to our selection criteria, studies utilizing DL for feature extraction and statistical methods for condition diagnosis were excluded. This may lead to information loss, as several studies achieving decent results were eliminated. Secondly, our study included meeting abstracts to involve up-to-date research works that have not yet been published. However, the lack of detailed information in the study design translates into unknown risks of bias. Lastly, this review did not inspect the development of deep learning algorithms in detail. Future reviews focusing on AI techniques are preferred to provide further information for computer scientists and program developers.

Conclusions
Deep learning has been shown to be beneficial in identifying systemic diseases from ocular presentations. Despite presenting decent performance in the articles, the algorithms still have several shortcomings for clinical application. Future studies should aim at improving the disease specificity and generalizability of the DL models for implementation in real-world screening tasks.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/diagnostics13050900/s1; Table S1: The search strategy used for obtaining research articles in the three selected databases;