Optimized Prediction Models from Fundus Imaging and Genetics for Late Age-Related Macular Degeneration

Age-related macular degeneration (AMD) is a leading cause of blindness in the developed world. In this study, we compare the performance of machine learning models based on retinal fundus images and on genetic information for the prediction of late AMD. Using data from the Age-Related Eye Disease Study (AREDS), we built machine learning models with various combinations of genetic, socio-demographic/clinical, and retinal image data (AMD severity and category graded from a single visit) to predict late AMD within 2, 5, and 10 years. We compared their performance in terms of sensitivity, specificity, accuracy, and unweighted kappa. The 2-year model based on retinal image and socio-demographic (S-D) parameters achieved a sensitivity of 91.34% and a specificity of 84.49%, whereas the model based on genetic and S-D parameters achieved 79.79% and 66.84%, respectively. For 5-year prediction, the retinal image and S-D-parameters-based model also outperformed the genetic and S-D-parameters-based model. The two 10-year models achieved similar sensitivities of 74.24% and 75.79%, respectively, but the retinal image and S-D-parameters-based model was superior on the other measures. Adding genetic data did not further improve the retinal-image-based models. Retinal imaging and S-D data can build an excellent machine learning predictor of developing late AMD over 2–5 years; the retinal imaging model appears to be the preferred prognostic tool for efficient patient management.


Introduction
Age-related macular degeneration (AMD) is the leading cause of visual disability in the developed world and a leading cause globally [1]. Approximately 11 million individuals are affected with late AMD in the United States of America (USA) alone, with a global prevalence of 170 million in 2015 [2,3]. Aging is the most significant risk factor. The prevalence of late AMD in the USA is expected to increase to 22 million by the year 2050, while the global prevalence is expected to increase to 288 million by the year 2040 [2].
AMD has been associated with many genetic and environmental risk factors and their interactions [4]. Early detection and referral to an ophthalmologist could enable management options such as photobiomodulation [5], laser intervention [6], or other strategies that may arise in the future.
Our literature review found three publications on the prediction of late AMD progression. The color fundus image and socio-demographic (S-D)-data-based prediction models proposed by Bhuiyan et al. [7] for 2-year, 5-year, and 10-year incidence achieved up to 86.4% accuracy. Yan et al. [8] reported 85% accuracy for their 7-year incidence prediction model. In one of our previous papers [9], we proposed a screening method with over 92.5% accuracy. There are also AMD screening approaches that quantify drusen using traditional, non-deep-learning methods [10]. Wu et al. [11] reported that a prediction model based on color fundus and optical coherence tomography (OCT) images achieved an area under the curve (AUC) of 0.88 for 3-year incidence. The genetic loci associated with late AMD progression have been summarized in [12], together with the odds ratio (OR) of each individual locus (maximum OR of 8.59, for ARMS2). Our review did not find any model based on genetic information only. A review of various methods using traditional approaches was given by Kanagasingam et al. [13]. The models we test and optimize are machine learning (ML)-based models built on retinal images, genetic and S-D parameters, and their combinations. We also compare these models with our recently developed late AMD prediction model using retinal fundus photos and S-D parameters [7], which predicts whether an individual is at risk of developing late AMD within 2, 5, and 10 years. We used combinations of the input variables to identify the best parameters for the prediction of late AMD, and to validate the best method for predicting the disease at an early stage and helping to prevent blindness.

Materials and Methods
The methods section is organized as follows. First, the dataset and its acquisition are briefly described. Second, the different types of data used in the study are described, namely, genetic data, socio-demographic and health data, and retinal image data. Third, we describe the various models built for comparison, i.e., the various combinations of data types used to build the models. Lastly, we describe the statistical measures used in this comparative study.
Briefly, the main points of the methods are as follows:
1. For building AMD prediction models, we used the retrospective dataset made available by the AREDS [14]. It consists of data from 4146 participants who enrolled in the study and were monitored for over 13 years.
2. To identify the best and most useful predictors of AMD, we built and compared statistical and machine learning models with a variety of combinations of data types (retinal, genetic, and medical data).
3. In total, 566 participants were chosen from the AREDS based on the availability of all the retinal, medical, and genetic data.
4. For retinal image data, we already had deep-learning-based classifiers built using over 100,000 images from the AREDS dataset, excluding the data of the 566 participants used in this comparative study. The image dataset was split into separate training, validation, and testing sets in the ratio of 60:20:20, respectively. The automated grades produced by these classifiers were used as input variables for building further machine learning models, along with the genetic and medical (and socio-demographic) parameters.
5. Over the course of the AREDS, conversions to late AMD were recorded at yearly visits. To build prediction models, we prepared datasets that included participants whose eyes converted to late AMD within 2, 5, and 10 years.
6. For each of these durations, we separately built models using combinations of retinal, socio-demographic, and genetic data.
7. Lastly, we analyzed the best predictors for each duration (2, 5, and 10 years) and proposed the best models.

The Age-Related Eye Disease Study (AREDS)
The AREDS is a study on late AMD sponsored by the National Eye Institute [14]. The AREDS participants were 55 to 80 years old at enrollment and had to be free of any illness or condition that would make long-term follow-up or compliance with study medications unlikely. Based on fundus photographs graded by a central reading center, best-corrected visual acuity, and ophthalmologic evaluations, 4753 participants were enrolled in one of several AMD categories, including persons with no AMD. Broadly, the information collected at the follow-up visits comprised socio-demographic and clinical data (e.g., blood pressure or diabetes), genetic data, and retinal fundus image data. In our study, the number of subjects was determined by the availability of genetic data in the AREDS dataset, as explained in the following subsections.

Genotype Data
The AREDS dataset contains genetic data (SNPs) for 568 subjects: 381 cases of advanced AMD (exudative, 246; atrophic, 184; both, 51) and 187 normal cases [15]. We restricted our study to these AREDS subjects for the 2-year models. For the 5- and 10-year models, data were available for 566 participants: 183 with atrophic AMD, 246 with exudative AMD, 50 with both forms, and 187 normal cases.
The current literature suggests that risk alleles of the ARMS2 and CFH genes, and the SNPs C2/CFB rs641153, C3 rs2230199, and C2/CFB rs4151667, are linked with late AMD (Tables A1 and A2 in Appendix A), and that high risk is attributable to one SNP of ARMS2 and five SNPs of CFH (Table A2) [15][16][17][18][19]; these six SNPs are the ones considered in this study. They are the CFH SNPs rs380390, rs572515, rs800292, rs1329428, and rs10801575, and the ARMS2 SNP rs10490924. Genotyping was performed by the AREDS using the Illumina HumanOmni2.5 platform [20]. The risk ratios of these SNPs for late AMD outcomes, with associated p values, are shown in Table A1.

Socio-Demographic Data
The AREDS subjects were randomly assigned to vitamin and mineral supplement groups or a placebo group [14]. Socio-demographic and physiological data were collected from the participants every six months, and altogether 13 years of follow-up data are available in the dataset. For this study, we considered socio-demographic, physiological, and clinical data collected periodically during that time, including gender, age, smoking status, diabetes status, body mass index (BMI), blood pressure, sunlight exposure, and visual acuity. From these longitudinal data, for subjects with incident late AMD, data from one visit within approximately 2 years before the diagnosis were taken to build the 2-year risk prediction model; data for the 5-year and 10-year models were taken similarly. For subjects without incident late AMD within 2 years (and, correspondingly, within 5 and 10 years for the 5-year and 10-year models), data from one randomly selected visit were taken. Therefore, every subject, with or without incident late AMD, contributed exactly one row of data to the final dataset used for analysis and model building.
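The one-row-per-subject rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the `Visit` and `Subject` structures are hypothetical, the choice of the earliest visit inside the pre-diagnosis window (so that the lead time is as close to the horizon as possible) is an assumption, and converters with no visit inside the window are simply skipped.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Visit:
    years_from_enrollment: float
    features: Dict[str, float] = field(default_factory=dict)  # S-D/clinical values at this visit


@dataclass
class Subject:
    subject_id: str
    visits: List[Visit]
    conversion_year: Optional[float] = None  # None: no incident late AMD recorded


def select_row(subject: Subject, horizon: float,
               rng: random.Random) -> Optional[Tuple[Visit, int]]:
    """Pick exactly one visit and a 0/1 label for one prediction horizon (hypothetical rule)."""
    if subject.conversion_year is not None:
        # Visits that precede the diagnosis by at most `horizon` years.
        window = [v for v in subject.visits
                  if 0 < subject.conversion_year - v.years_from_enrollment <= horizon]
        if not window:
            return None  # assumption: no usable visit for this horizon
        # Assumption: earliest visit in the window, i.e. lead time closest to the horizon.
        return min(window, key=lambda v: v.years_from_enrollment), 1
    # Subjects without incident late AMD: one randomly chosen visit.
    return rng.choice(subject.visits), 0


def build_dataset(subjects: List[Subject], horizon: float,
                  seed: int = 0) -> Dict[str, Tuple[Visit, int]]:
    """Return at most one (visit, label) row per subject, keyed by subject ID."""
    rng = random.Random(seed)
    rows: Dict[str, Tuple[Visit, int]] = {}
    for s in subjects:
        picked = select_row(s, horizon, rng)
        if picked is not None:
            rows[s.subject_id] = picked
    return rows
```

For example, a subject diagnosed at year 4 with visits at years 0 to 3 contributes the year-2 visit as a positive row for `horizon=2`. Running `build_dataset` once per horizon (2, 5, 10) gives the three modeling datasets.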

Retinal Image Data
AREDS has defined a 9-point scale, based on the retinal image, for the risk of progression to late AMD [21]. The risk of converting to late AMD ranges from about 2% at level 1 to about 50% at level 9 of that scale. To include images that fall outside the 9-point scale, i.e., images that already show progression, the scale was extended by 3 more levels for advanced AMD so that all images could be graded: levels 10, 11, and 12 indicate late dry AMD only, late wet AMD only, and both dry and wet late AMD, respectively. In total, we used twelve levels to develop the AMD prediction models. Along with AMD severity, the AREDS data contain graded information about the AMD category [14]. Four categories, numbered 1 to 4, are defined based on the presence and extent of drusen and other AMD pathologies: category 1 is referred to as normal, 2 as early, 3 as intermediate, and 4 as advanced or late AMD. A deep-learning-based classifier automatically pre-classified the images into these AREDS-defined categories. This retinal-image-based grading algorithm can be summarized as follows.
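The extended severity scale and the four AREDS categories described above amount to a small lookup structure. The sketch below encodes them as stated in the text; the helper names `describe_level` and `late_amd_flags` are hypothetical, and deriving separate dry and wet flags from levels 10 to 12 is an illustrative assumption about how such targets could be formed.

```python
# Extended 12-level AMD severity scale described in the text:
# levels 1-9 are the AREDS 9-point scale; levels 10-12 encode late AMD.
LATE_AMD_LEVELS = {
    10: "late dry AMD only",
    11: "late wet AMD only",
    12: "both dry and wet late AMD",
}

# AREDS AMD categories 1-4.
AMD_CATEGORIES = {1: "normal", 2: "early", 3: "intermediate", 4: "advanced (late)"}


def describe_level(level: int) -> str:
    """Human-readable label for a level on the extended 12-level scale."""
    if not 1 <= level <= 12:
        raise ValueError(f"severity level must be between 1 and 12, got {level}")
    return LATE_AMD_LEVELS.get(level, f"AREDS severity step {level}")


def late_amd_flags(level: int) -> tuple:
    """Return (has_late_dry, has_late_wet) for a level on the extended scale."""
    describe_level(level)  # validates the range
    return level in (10, 12), level in (11, 12)
```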
The deep-learning-based automated AMD category and severity classifiers were built using over 80,000 images from the AREDS dataset, which had been graded for AMD severity and category by the AREDS grading center. Extra care was taken to ensure that no participant whose images were used to build these automated classifiers also appears in the AMD prediction models, which were built solely on the participants with genetic data, as explained in the previous sections.
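The participant-level separation described above, together with the 60:20:20 training/validation/testing ratio given in the methods overview, could be implemented roughly as follows. This is a sketch under assumptions: the function name and the `image_owner` mapping are hypothetical, and splitting by participant (so that all images of one person land in the same subset) is our reading of the text, not a documented detail of the authors' pipeline.

```python
import random
from typing import Dict, List, Set, Tuple


def split_images_by_participant(image_owner: Dict[str, str],
                                excluded_participants: Set[str],
                                ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),
                                seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Split image IDs into train/val/test at the participant level.

    image_owner maps image ID -> participant ID. Participants listed in
    excluded_participants (e.g. the genotyped prediction cohort) are dropped
    entirely, so none of their images reach the deep learning classifiers.
    """
    participants = sorted({p for p in image_owner.values() if p not in excluded_participants})
    random.Random(seed).shuffle(participants)
    n = len(participants)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    subset = {p: "train" for p in participants[:n_train]}
    subset.update({p: "val" for p in participants[n_train:n_train + n_val]})
    subset.update({p: "test" for p in participants[n_train + n_val:]})
    splits: Dict[str, List[str]] = {"train": [], "val": [], "test": []}
    for image_id, p in sorted(image_owner.items()):
        if p in subset:  # images of excluded participants are skipped
            splits[subset[p]].append(image_id)
    return splits["train"], splits["val"], splits["test"]
```

Splitting participants rather than images prevents near-duplicate fundus photos of the same eye from appearing in both training and evaluation subsets.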

Data Categories
The input variables are a combination of continuous, binary, and categorical variables. The clinical and socio-demographic data are continuous or binary, the image parameters (AMD category and severity) are categorical, and the SNPs are encoded as binary variables. Keeping the reference genotype as the base, the two remaining genotypes are taken as two separate binary input variables for every SNP.
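One plausible reading of the SNP encoding above is standard dummy coding: the homozygous-reference genotype is the baseline (both indicators 0), and heterozygous and homozygous non-reference genotypes each get their own binary indicator. The sketch below assumes that reading; the function names and the feature-name suffixes `_het` and `_hom_alt` are hypothetical, and any allele letters used with it are placeholders rather than the actual reference alleles of the studied SNPs.

```python
from typing import Dict, Mapping


def encode_snp(snp_id: str, genotype: str, ref_allele: str) -> Dict[str, int]:
    """Dummy-code one SNP genotype, with the homozygous-reference genotype as baseline."""
    if len(genotype) != 2:
        raise ValueError(f"expected two alleles for {snp_id}, got {genotype!r}")
    n_non_ref = sum(allele != ref_allele for allele in genotype)
    return {
        f"{snp_id}_het": int(n_non_ref == 1),      # one non-reference allele
        f"{snp_id}_hom_alt": int(n_non_ref == 2),  # two non-reference alleles
    }


def encode_genotypes(genotypes: Mapping[str, str],
                     ref_alleles: Mapping[str, str]) -> Dict[str, int]:
    """Encode several SNPs into one flat dictionary of binary features."""
    features: Dict[str, int] = {}
    for snp_id, genotype in genotypes.items():
        features.update(encode_snp(snp_id, genotype, ref_alleles[snp_id]))
    return features
```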

Late AMD Prediction
Socio-demographic data, clinical data, fundus image data, and genetic data were used to build the models for late AMD prediction. Six different machine learning models were implemented and compared in this study with the following sets of input variables (Table 2):
3. Automated AMD grades from retinal images.
4. Socio-demographic/clinical data and genetic data.
5. Socio-demographic/clinical data and retinal image data.
6. All the input variables (1-5).
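The model variants above differ only in which groups of input columns they receive. The sketch below expresses that as data; all column names are hypothetical placeholders (the SNP IDs are the six listed in the Genotype Data subsection, each with two binary indicators), and only models 3 to 6 are included because the input sets of models 1 and 2 are not listed in the text.

```python
from typing import Dict, List, Tuple

# Hypothetical column names for each data type.
SNP_IDS = ["rs380390", "rs572515", "rs800292", "rs1329428", "rs10801575", "rs10490924"]

FEATURE_GROUPS: Dict[str, List[str]] = {
    "sd_clinical": ["gender", "age", "smoking", "diabetes", "bmi",
                    "blood_pressure", "sunlight_exposure", "visual_acuity"],
    # Two binary indicators per SNP.
    "genetic": [f"{snp}_{code}" for snp in SNP_IDS for code in ("het", "hom_alt")],
    "retinal": ["amd_category", "amd_severity"],
}

# Input groups for models 3-6 as listed in the text.
MODEL_INPUTS: Dict[int, Tuple[str, ...]] = {
    3: ("retinal",),
    4: ("sd_clinical", "genetic"),
    5: ("sd_clinical", "retinal"),
    6: ("sd_clinical", "genetic", "retinal"),
}


def columns_for(model_id: int) -> List[str]:
    """Flatten the feature groups of one model variant into an ordered column list."""
    return [col for group in MODEL_INPUTS[model_id] for col in FEATURE_GROUPS[group]]
```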

Ensemble Approach to Model Building
Our deep learning (DL)-based classifier [22] is an ensemble of previously developed DL AMD classifiers that determine, from a retinal image, the stage of AMD present (no AMD, early AMD, intermediate AMD, or late AMD) [23]. The ensemble consists of five networks with different image input sizes, based on the "Inception-V3" architecture proposed by Szegedy et al. [24], the "Inception-ResNet-V2" architecture proposed by Szegedy et al. [25], and the "Xception" architecture proposed by Chollet [26]. Experimentally, the ensemble performed better than any individual classifier.
Another ensemble deep learning classifier [27] was used to assign probabilities that an image falls within each of the 12 AREDS classes; specifically, for images without late AMD, the probabilities of falling within the first 9 classes. This system consists of an ensemble of six neural networks, each differing from the others in its combination of input image size and network architecture. The six networks are: Xception with input size 499 × 499, Inception-ResNet-V2 with input size 399 × 399, Xception with input size 299 × 299, Inception-V3 with input size 599 × 599, Inception-V3 with input size 399 × 399, and NASNet (proposed by Zoph et al. [28]) with input size 399 × 399. Each network is trained to output an array of 12 probabilities, one for each class, and these values are then combined to give the most probable AMD severity level. The AMD stage and severity level are then the retinal image inputs to the 6 machine learning (ML) AMD prediction models listed above. The complete set of input variables (Figure 1, left column) is normalized and scaled.
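The text states that the 12-class probability arrays from the six networks are combined but does not say how. A common choice, assumed here, is to average the arrays class by class and take the class with the highest mean probability. The function name `combine_ensemble` is hypothetical, and the sketch works on plain lists rather than on the Keras models themselves.

```python
from typing import List, Sequence, Tuple


def combine_ensemble(prob_vectors: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Average per-network class probabilities; return (1-based level, mean probabilities)."""
    if not prob_vectors:
        raise ValueError("need at least one network output")
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all networks must output the same number of classes")
    mean_probs = [sum(p[i] for p in prob_vectors) / len(prob_vectors)
                  for i in range(n_classes)]
    best = max(range(n_classes), key=lambda i: mean_probs[i])
    return best + 1, mean_probs  # levels are numbered 1..12 on the extended scale
```

Averaging (soft voting) lets a network that is confident outweigh one that is uncertain, which majority voting over per-network argmax labels would not.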

Model Optimization
Five separate machine learning (ML) and statistical algorithms (Bayesian modeling [29], logistic regression modeling [30], decision trees [31], random forest [32], and logistic model tree [33]) were ensembled in each of the 6 models to provide optimum prediction probabilities. The settings were as follows: random forest (100 iterations, batch size of 50, and unlimited maximum depth), naïve Bayes (batch size of 100), logistic model tree (15 instances, 1 boosting iteration, and batch size of 1), simple logistic (maximum boosting iterations of 500, heuristic stop at 50, and batch size of 100), and multilayer perceptron (500 iterations, learning rate of 0.3, and momentum of 0.2). Because the dataset is relatively small, we used 10-fold cross-validation to assess each model's performance. Each prediction model had two separate subsystems, for predicting late dry AMD and late wet AMD, respectively. Both subsystems followed the same procedure and used the same input variables as the parent system, each with its own target outcome. The deep learning models were implemented using the Keras framework with the TensorFlow backend. The pandas and lifelines libraries were used for the ML/statistical models. The training data are available in the public domain upon request from dbGaP, which holds the AREDS image and genotype datasets. The code and trained models are available upon request.
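The evaluation scheme above (several base learners ensembled, assessed with 10-fold cross-validation) can be sketched with the standard library only. This is an illustration under assumptions: the text does not say how the base learners' probabilities are combined, so simple averaging (soft voting) is assumed; stratifying the folds by outcome is also an assumption; and the two toy base models are hypothetical stand-ins, not the algorithms named in the text. The same routine could be run once with late dry AMD labels and once with late wet AMD labels for the two subsystems.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], float]                      # row -> P(late AMD)
Factory = Callable[[List[Row], List[int]], Predictor]   # fits on training data


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Assign row indices to k folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def soft_vote(factories: Sequence[Factory], X: List[Row], y: List[int]) -> Predictor:
    """Fit every base model on the same data and average their probabilities."""
    members = [fit(X, y) for fit in factories]
    return lambda row: mean(m(row) for m in members)


def cross_validated_probs(factories: Sequence[Factory], X: List[Row], y: List[int],
                          k: int = 10, seed: int = 0) -> List[float]:
    """Out-of-fold ensemble probabilities from stratified k-fold cross-validation."""
    probs = [0.0] * len(y)
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = soft_vote(factories, [X[i] for i in train], [y[i] for i in train])
        for i in fold:
            probs[i] = model(X[i])
    return probs


# Hypothetical stand-in base models, for illustration only.
def prevalence_model(X: List[Row], y: List[int]) -> Predictor:
    rate = sum(y) / len(y)
    return lambda row: rate


def nearest_centroid_model(X: List[Row], y: List[int]) -> Predictor:
    pos = mean(r[0] for r, t in zip(X, y) if t == 1)
    neg = mean(r[0] for r, t in zip(X, y) if t == 0)
    return lambda row: 1.0 if abs(row[0] - pos) < abs(row[0] - neg) else 0.0


# Toy usage: one feature, 10 negatives followed by 10 positives.
X = [[0.1 * i] for i in range(20)]
y = [0] * 10 + [1] * 10
probs = cross_validated_probs([prevalence_model, nearest_centroid_model], X, y)
```

Every row is scored exactly once by a model that never saw it during fitting, so the out-of-fold probabilities can be thresholded and fed into the usual sensitivity/specificity calculations.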

Statistical Measures
The final AMD prediction system contains several sub-methods and algorithms, which were evaluated both separately and as a whole. The retinal-image-based classifiers that classify images into 4 categories and 12 levels were evaluated using categorical accuracy, loss, and quadratic weighted kappa; details are given in our research publication on automated AMD stratification [23]. The individual machine learning models were evaluated using accuracy measures based on two classes, AMD or no AMD. The final classifier, an ensemble model, was evaluated using several clinically relevant measures, namely sensitivity, specificity, and accuracy, with AMD as the positive class and no AMD as the negative class. The model was also evaluated by plotting the receiver operating characteristic (ROC) curve, and the area under the curve (AUC) was calculated for the final models. Kappa scores were calculated for each of the models and are used for comparison. For all measures, 95% confidence intervals (CIs) were calculated and are presented for comparison. Table 1 shows the detailed sensitivity, specificity, and accuracy results of the 2-year prediction model, along with the results of the two subsystems on the same measures. When all the input variables are used, the 2-year AMD prediction system achieves an accuracy of 89.61% (95% CI, 86.81% to 92.00%), a sensitivity of 92.13% (88.95% to 94.62%), and a specificity of 84.49% (78.49% to 89.36%). When only socio-demographic and clinical variables are used, the model achieves an accuracy of 72.01% (68.12% to 75.66%). Table 2 details the results of the 5-year AMD prediction system, which, when all the input variables are used, achieves an accuracy of 87.21% (84.34% to 91.29%), a sensitivity of 90.11% (87.05% to 93.32%), and a specificity of 83.45% (76.49% to 88.26%). When only socio-demographic and clinical variables are used, the model achieves an accuracy of 68.07% (63.12% to 74.06%).
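The clinically relevant measures above all derive from the 2 × 2 confusion matrix. The sketch below computes sensitivity, specificity, accuracy, and unweighted Cohen's kappa, with AMD as the positive class. The text does not state which method was used for the 95% confidence intervals; the Wilson score interval used here is a swapped-in choice, so its bounds may differ slightly from those reported. Function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for the default z)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, and unweighted Cohen's kappa (AMD = positive)."""
    n = tp + fn + tn + fp
    accuracy = (tp + tn) / n
    # Chance agreement from the predicted and true marginals of the 2x2 table.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": accuracy,
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }


def metric_cis(tp: int, fn: int, tn: int, fp: int) -> Dict[str, Tuple[float, float]]:
    """95% Wilson intervals for sensitivity, specificity, and accuracy."""
    return {
        "sensitivity": wilson_interval(tp, tp + fn),
        "specificity": wilson_interval(tn, tn + fp),
        "accuracy": wilson_interval(tp + tn, tp + fn + tn + fp),
    }
```

Kappa corrects accuracy for the agreement expected by chance, which matters here because the AMD and no-AMD classes are unbalanced.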
Figure 2 shows the ROC curves from the 2-year and 5-year late AMD prediction models.
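The ROC curves and their AUC values can be obtained directly from predicted probabilities and true labels. The sketch below is a minimal, dependency-free illustration, not the authors' plotting code: `roc_points` sweeps every distinct score as a threshold, and `roc_auc` uses the rank-based (Mann-Whitney) formulation, which equals the trapezoidal area under those points when ties are counted as one half.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs, one per distinct threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative cases")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative cases")
    total = 0.0
    for p in pos:
        for q in neg:
            total += 1.0 if p > q else 0.5 if p == q else 0.0
    return total / (len(pos) * len(neg))
```

The pairwise loop is quadratic, which is fine for a few hundred participants; larger datasets would use a sort-based rank computation instead.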

Results
Table 2. Performance comparison of the models with different inputs for predicting the 5-year risk of developing "any AMD" (dry or wet AMD), dry AMD, and wet AMD. Sensitivity, specificity, accuracy, and kappa are given along with their 95% confidence intervals.

Table 3. Performance comparison of the models with different inputs for predicting the 10-year risk of developing "any AMD" (dry or wet AMD), dry AMD, and wet AMD. Sensitivity, specificity, accuracy, and kappa are given along with their 95% confidence intervals.

Discussion and Conclusions
In this paper, we have provided a novel comparative study of image- and genetic-data-based late AMD prediction models. The results clearly show that the best models of AMD incidence within two years are based on retinal images. Adding socio-demographic, clinical, and genetic data improved their overall performance, but these data by themselves were inferior predictors of late AMD incidence within two and five years. One possible reason is that early signs of late AMD, such as drusen or reticular pseudodrusen, are visible in retinal images. Predictive models could improve the management of higher-risk patients in several ways: greater attention to modifiable risk factors such as body mass index (BMI), smoking, diet, and blood lipid levels [34]; increased motivation to reduce these risks; and increased motivation for both physician and patient to schedule more frequent examinations. Factors that cannot currently be modified include genotype at a given risk locus, sex, ethnicity, and age [34]. However, there is promising research into the early treatment of AMD in addition to oral AREDS supplements [35].
Although genotype and phenotype have previously been studied for their association with retinal disease, this paper addresses the challenge of predicting late AMD and compares genetic and image-based late AMD prediction. Yan et al. [8] proposed a similar AI model based on genetic and image data for predicting late AMD. Our analysis, however, showed that genetic information is most useful, in addition to traditional risk factors, in long-term scenarios where baseline imaging does not capture the pathologic precursors of late AMD. For 2-year and 5-year prediction of late AMD, imaging is markedly superior, sufficient, and highly accurate.
For this study, we used many different types of variables, which may be hard to obtain in another study for external validation. However, we have tested a retinal-image-based model on an AMD study dataset called "NAT-2" [36], which contains images from 300 participants monitored for 3 years. Expert graders assessed the referability of the images, and their grades were compared with the output of our AMD screening system. The results were promising: the system correctly classified 82 of 89 images as non-referable for AMD (specificity 92.13%; 95% CI, 84.46% to 96.78%) and 175 of 199 images as referable for AMD (sensitivity 87.94%; 95% CI, 82.59% to 92.12%). The remaining images were deemed ungradable.
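As a worked check of the NAT-2 point estimates above, with referable AMD as the positive class (so FP = 89 − 82 = 7 and FN = 199 − 175 = 24); the confidence-interval method is not stated in the text and is not reproduced here:

```latex
\text{Specificity} = \frac{TN}{TN + FP} = \frac{82}{82 + 7} = \frac{82}{89} \approx 92.13\%,
\qquad
\text{Sensitivity} = \frac{TP}{TP + FN} = \frac{175}{175 + 24} = \frac{175}{199} \approx 87.94\%
```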
For short-term late AMD prediction, genetic information alone was inferior to the high predictive value of baseline AMD severity on imaging. For 10-year late AMD prediction, however, the results were similar: the genetics-based model achieved 68.2% accuracy with 75% sensitivity and 51.6% specificity, while the retinal-image-based model achieved 72.9% accuracy with 73.8% sensitivity and 72.7% specificity. Thus, image-based models are superior for shorter terms (2 and 5 years) but not for longer ones.
The models performed very well on clinically relevant measures, indicating that good late AMD prediction systems are within reach and ready for testing in real-world, prospective scenarios. We also found that these systems can predict the type of advanced AMD (dry or wet) with acceptable accuracy, although they could likely be improved by including spectral-domain optical coherence tomography (SD-OCT).
Our comparative study contrasts with another AI model that uses fundus images primarily to automate the standard AMD severity scores in general use by ophthalmologists. The authors of that model did not include other risk factors such as age, smoking status, or AMD risk genetic variants. We believe that predicting late AMD progression will be more useful for patient management than a severity score.
We anticipate that an image-based prediction system can help detect AMD early and screen for it quickly and cost-effectively, enabling large-scale screening in primary care settings. A further prospective study in primary care settings could confirm the suitability of such screening. Screening in primary care is essential because most patients present late, when the only remaining option is to halt the degeneration. Screening in primary care settings, which most people visit regularly, would therefore overcome the problem of patients not attending ophthalmic clinics in a timely manner. Individuals at risk of late AMD would be referred to an ophthalmologist, who can make a further diagnosis and start preventive measures (e.g., AREDS supplements) or necessary treatment. Such early intervention could ultimately help prevent late AMD and unnecessary blindness.
We conclude that late AMD prediction by imaging only, including the dry and wet forms, is possible without genome sequencing for shorter time periods (2 to 5 years), but not for significantly longer periods.

Data Availability Statement: The training data are available in the public domain upon request from dbGaP, which holds the AREDS image and genotype datasets. The code and trained models are available upon request.

Conflicts of Interest:
The authors declare no conflict of interest.

Table A2. Short list of significant genes associated with the incidence of late AMD, selected for this study based on existing research. ARMS2 and CFH were consistently identified in multiple research publications. Other genes found to be associated with different types of AMD, at varying levels of significance, are also summarized in the table.