Genome-Wide Polygenic Risk Score for Predicting High Risk Glaucoma Individuals of Han Chinese Ancestry

Glaucoma is a progressive and irreversible blindness-causing disease. However, the underlying genetic factors and molecular mechanisms remain poorly understood. Previous genome-wide association studies (GWAS) have made tremendous progress on the SNP-based disease association and characterization. However, most of them were conducted for Europeans. Since differential genetic characteristics among ethnic groups were evident in glaucoma, it is worthwhile to complete its genetic landscape from the larger cohorts of Asian individuals. Here, we present a GWAS based on the Taiwan Biobank. Among 1013 glaucoma patients and 36,562 controls, we identified a total of 138 independent glaucoma-associated SNPs at the significance level of p < 1 × 10−5. After clumping genetically linked SNPs (LD clumping), 134 independent SNPs with p < 10−4 were recruited to construct a Polygenic Risk Score (PRS). The model achieved an area under the receiver operating characteristic curve (AUC) of 0.8387 (95% CI = [0.8269–0.8506]), and those within the top PRS quantile had a 45.48-fold increased risk of glaucoma compared with those within the lowest quantile. The PRS model was validated with an independent cohort that achieved an AUC of 0.7283, thereby showing the effectiveness of our polygenic risk score in predicting individuals in the Han Chinese population with higher glaucoma risks.


Introduction
Glaucoma is the second leading cause of blindness globally [1], but the irreversibility of glaucoma, as opposed to cataract, highlights the importance of diagnosing and treating it as early as possible. Glaucoma is a chronic progressive disease that is commonly characterized by changes in the optic nerve head as well as the corresponding visual field defects [2]. Current treatment methods of glaucoma are limited to lowering the intraocular pressure (IOP) to slow the disease progression at early disease stages, but a majority of glaucoma cases remain undiagnosed until irreversible optic nerve damage occurs [3]. A follow-up study on patients in the United States with treated primary open angle glaucoma (POAG) found a 9% probability of bilateral blindness and 26% probability of unilateral blindness at 20-years' follow-up [4]. This could be because the initial stages of visual field abnormalities in glaucoma frequently occur in the periphery nasal field rather than the central field, and functional loss mostly only happens as a late symptom in centrally deteriorating visual field cases. Therefore, given that the prognosis for glaucoma depends on the stage at which it is detected, early detection and treatment are the best ways to preserve the retinal ganglion cells and nerve fiber layers. The understanding of genes associated with glaucoma therefore opens new and important avenues for glaucoma treatment.
Genome-wide association studies (GWAS) are observational studies of the genomewide genetic variation of different individuals aimed to identify any associations with the traits. Such research usually focuses on the association between single nucleotide polymorphisms (SNPs) and major diseases. GWAS and other genetic and genomic studies have accelerated the discovery of genes associated with glaucoma over the past decade. So far, several GWAS studies of glaucoma have identified over 100 risk loci, including mutations in the CYP1B1 [5][6][7], OPTN [8,9], and PLEKHA7 [10,11] genes. Despite this, neither the normal functions of MYOC nor how MYOC mutations result in IOP elevation and subsequently glaucoma have been defined. Further genomic studies will help to extend our understanding of the corresponding biological pathways contributing to the pathogenesis of glaucoma. However, many studies have shown that the incidence of glaucoma is geographically and ethnically variable. The incidence of primary congenital glaucoma (PCG) varies from 1:10,000 in the Western population to 1:1250 in inbred populations such as the Gypsy subpopulation of Slovakia [12], whereas the prevalence of POAG is 2.8-8.8% among the African descendants, much higher than among the Caucasians (1.1-2.1%) and Asians (2.6%) [13][14][15]. The discovery of novel genes and their mutations associated with glaucoma with considerations over ethnic differences is hence essential [16,17]. However, presently, most loci were discovered in the populations of the European ancestry [8,12,18,19]. Hence, a larger GWAS cohort focused on people of East Asian ancestry is needed to identify additional risk loci and complete the glaucoma genetic landscape. With increasing genetic testing, early interventions to highlight glaucoma risks at their earliest onset can allow the best precision management strategies to prevent glaucoma-related vision impairment [20,21]. It can also serve as a genetic basis for identifying new therapeutic targets for the associated pathological pathways and molecular events.
Hence, we conducted a large-scale GWAS for glaucoma using the Taiwan Biobank (TWB) 1.0 and TWB 2.0 databases. The TWB database includes whole-genome sequencing data of the Taiwanese population [22]. TWB (https://www.twbiobank.org.tw/new_web/, (accessed on 28 September 2021) is a prospective cohort study of the Taiwanese population, mainly of Han Chinese ancestry, with genomic data divided into either hospital-based or community-based. The community-based biobank covering all the participants of this study includes repeated measurements of a wide range of phenotypes collected from 148,567 individuals (as of August 2021). TWB recruits 30-to 70-year-old participants with no cancer history across 29 recruitment centers in Taiwan [22]. In this study, a total of 96,715 Han Chinese ancestry subjects were involved, and 16,222,535 loci were analyzed. Glaucoma risk loci that have not been previously reported at genome-wide significance levels were identified. Finally, we constructed a glaucoma polygenic risk score (PRS) for risk stratification of glaucoma to predict glaucoma risk in the Taiwanese population retrospectively. The PRS was modeled based on the TWB2.0 and verified with an independent TWB1.0 cohort. We aimed to provide further evidence for the differential genetic architecture of glaucoma between different ethnic populations and provide a unique PRS model for risk stratification in individuals of Han Chinese ancestry.

Participant Characteristics in TWB 2.0 and TWB 1.0
There were 68,978 TWB2.0 participants and 27,737 TWB1.0 participants, with TWB2.0 being used as a discovery set for glaucoma risk alleles and TWB1.0 as a validation set. Among the discovery set (Table 1), 1013 self-reported glaucoma individuals (defined as cases), constituted 1.47% of the 68,978 TWB2.0 participants. A total of 36,562 individuals without self-reported glaucoma or related comorbidities were considered as controls (for the list of comorbidities, please refer to the Methods section). The total sample size for the discovery set was 37,575, with the case-control ratio of 1:36. On the other hand, in the validation set (TWB1.0), there was a total number of 7082 individuals, consisting of 450 self-reported glaucoma cases and 6632 controls who had neither glaucoma nor glaucoma-related comorbidities. The basic characteristics of both discovery (TWB2.0) and validation (TWB1.0) sets are shown in Table 1. In the discovery set, gender distributions were similar between the cases and controls, with females predominating.

Glaucoma Risk Loci
After performing adjustments of the Scalable and Accurate Implementation of Generalized mixed model (SAIGE) as described in the Methods section, we conducted a GWAS, and a Manhattan plot for this TWB2.0 study group was generated ( Figure 1). Three SNPs had a genome-wide significance level <1 × 10 −6 while 138 SNPs had p < 1 × 10 −5 .

Polygenic Risk Score (PRS) and Glaucoma Risk Prediction
To predict glaucoma risk, we constructed a PRS model based on 134 associated SNPs discovered in the TWB2.0 study population. In Table 3, different models based on various combinations of linkage disequilibrium (LD) clumping threshold (r 2 ) and genome-wide significance level threshold (p) are listed. The mean PRS was significantly higher in glaucoma cases compared to controls across all models (Table 3 and Figure 2A). After weighing the clinical significance of r 2 , p, and AUC, we constructed the model with 134 selected independent SNPs, r 2 < 0.2, and p < 10 −4 (referred to as PRS_TWB2.0 below) (Table S2).   Table S2 shows the complete list of 134 SNPs.
Regarding the PRS performance, the PRS_TWB2.0 was effective in distinguishing individuals with high glaucoma risks from those with low risks (Figure 2A). The association came up with a dose-response relationship ( Figure 2B,C and Table 4). Individuals in the highest quantile of PRS_TWB2.0 (Q3-Q4) had 45.48-fold increased risk compared to the lowest risk quantile (min-Q1), and those in the third (Q2-Q3) and second (Q1-Q2) highest quantile had 9.77 and 3.80-fold risks, respectively. Table 4 shows the case-control distribution among quantiles. Furthermore, in the high-risk group (top 5% to 25% in PRS distribution), Table 5 showed significantly elevated risks of glaucoma: the top 25% of the PRS had a 9.41-fold risk, the top 10% had a 9.72-fold risk, and the top 5% had a 13.30-fold risk of developing glaucoma compared to the remaining population.

Discussion
The Taiwan Biobank contains high-coverage, whole-genome sequencing data of the Taiwanese population who are mostly of Han Chinese ancestry, a less studied population in glaucoma-related research. In this study, we included 37,575 individuals (1013 cases, 36,562 controls) from TWB2.0 to build a PRS glaucoma prediction model. A total of 138 independent glaucoma-associated SNPs at the significance level of p < 1 × 10 −5 were identified, including the most significant three loci at p ≈ 5 × 10 −7 . All except 4 SNPs had an odds ratio larger than 1, indicating greater odds of association with the SNP (exposure) and glaucoma (outcome). Three of these SNPs even displayed odds ratios of >1.9. After LD clumping, 134 selected SNPs were used to construct a PRS that retrospectively predicts glaucoma risk in the Taiwanese population, with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.8387 (95% CI = [0.8269-0.8506]) and 0.8894 after covariates adjustment. Those within the top PRS quantile had a 45.48-fold increased risk of glaucoma compared with those within the lowest quantile. To our knowledge, this is the largest Taiwanese-based PRS glaucoma prediction model to date.
While finding the most supported genes for our 138 newly identified loci is a known challenge, it is nonetheless highly important and relevant for further functional followups. The nearest genes to glaucoma-associated SNPs identified in previous studies using other biobanks include the CYP1B1 [5][6][7], OPTN [8,9], LTBP2 [23,24], COL11A1 [25,26], PLEKHA7 [10,11], CDKN2B-AS1 [18,27,28], LOXL1-AS1 [29,30], and CARD10 [31,32] genes. Among them, only the PLEKHA7 gene was also found as a supporting gene in our 138 newly identified loci from this Taiwan Biobank population of the Han Chinese ancestry, suggesting marked differences between different ethnicities. By closely analyzing all the nearest genes underlying our 138 newly identified loci, we highlighted the LINC00475, RIPOR2, and PLEKHA7 genes to be the most supported genes for the most significant SNPs. Importantly, PLEKHA7 (Pleckstrin Homology Domain Containing A7) has been associated with Primary Angle Closure Glaucoma. PLEKHA7 is a gene located at the locus 11p15.2-p15.1 (NCBI Gene ID: 144100) related to the pathway of stabilizing and expanding the E-cadherin adherens junctions. It plays a role of supporting and maintaining adherens junctions through interaction with proteins such as the delta-catenin [33]. PLEKHA7 interacts with CAMSAP3, thereby anchoring microtubules at their minus-ends to zonula adherens and recruiting KIFC3 kinesin to the junctional site [33]. Additionally, PLEKHA7-PDZD11 complex mediates the clustering of ADAM10 at cell-cell junctions with the ADAM10-binding protein TSPAN33 [34]. This mechanism is tightly relevant to the agerelated changes in glaucoma. As for the other two genes nearest to the most significant SNPs, LINC00475 and RIPOR2, there has been no reported association to glaucoma and other eye diseases so far. The LINC00475 is a long intergenic non-protein-coding RNA whereas the RIPOR2 (RHO Family Interacting Cell Polarization Regulator 2) gene encodes a protein that mediates the polarization of T cells and neutrophils and is a component of hair cell stereocilia essential for hearing.
The list of 138 newly identified SNPs corresponds to 20 unique nearest genes. Three of them, ROR2, SDCCAG8, and FXYD6, are involved in molecular pathways that might have the slightest relation to glaucoma and are of potential interest for further investigations into their potential roles in glaucoma-related mechanisms. ROR2 (Receptor Tyrosine Kinase Like Orphan Receptor 2) is located on chromosome 9 at location 9q22.31 (NCBI Gene ID: 4920). It encodes the cell surface receptor protein tyrosine kinase and type I transmembrane protein involved in the early formation of chondrocytes, cartilage, and growth plate development. Mutations in this gene can cause brachydactyly type B, the Robinow syndrome, among other skeletal disorders. Besides pluripotency and the GPCR Pathway, ROR2 is also involved in the Wnt signaling pathway where it acts as a receptor for the non-canonical Wnt ligand, WNT5A. A recent study has found that WNT5A was upregulated by dexamethasone (DEX), which in turn regulates aqueous humor outflow by inducing a re-organization of the cytoskeleton in the trabecular meshwork (TM) cells [10]. The knockdown of ROR2 receptor abolished the effects of DEX in TM cells. Given the simi-larities between DEX-induced glaucoma and primary open angle glaucoma, these results shed light on a potential mechanism to treat primary open-angle glaucoma. SDCCAG8 (SHH Signaling and Ciliogenesis Regulator) is a gene located on chromosome 1 in the location 1q43-44 (NCBI Gene ID 10806), encoding a protein that organizes the centrosome during interphase and mitosis. It is also involved in epithelial lumen formation, ciliogenesis, and organelle biogenesis. Several studies have also associated the mutations in this gene with retinal-renal ciliopathy. The ciliary localization of SDCCAG8 in photoreceptors is determined by RPGRIP1α, and its absence suppresses their ciliary targeting. However, its association with glaucoma and other ocular diseases remains to be investigated. FXYD6 (FXYD Domain-Containing Ion Transport Regulator 6) is a gene located on chromosome 11 at the location 11q23.3. It encodes phosphohippolin, a transmembrane protein which is likely involved in the regulation of the activity of Na/K-ATPase. Notably, evidence exists for the therapeutic effects of targeting Na+/K+ ATPases to treat ocular hypertension [12]. The long-lasting effects of decreasing IOP by siRNAs targeting Na+/K+ ATPases is similar to that produced by commercial drugs. Since glaucoma is treated by lowering intraocular pressure, investigation of the involvement of the FXYD6 gene in glaucoma development may be worthwhile.
The strength of this study lies in its large-scale multi-center approach. Previous studies have reported the differential presentation as well as ethnic and geographic disparities in glaucoma genetics [35][36][37]. We have provided further evidence for the differential genetic architecture of glaucoma between the East Asian and European populations. Further meta-analysis of the UK Biobank and the NHGRI-EBI GWAS Catalog did not improve the genome-wide significance and associations despite after multiple testing corrections. Nonetheless, we replicated and validated our analysis results with the TWB2.0 population with an independent but ethnically similar TWB1.0 population. In general, the SNP effects found in our Taiwanese population are not comparable to genome-wide variants reported in previous studies of other ethnicities and ethnicity remains an important criterion in the genetic basis of glaucoma.
Notably, many of the risk genes we identified yielded a relatively large odds ratio of >1.5. This might be due to the imbalanced case-control data ratio we had, causing statistically significant genes with an odds ratio that has a wide confidence interval to be more easily identified. Therefore, SAIGE association tests were used to further evaluate those genes to account for imbalanced case-control data ratio. However, adopting SAIGE to avoid inflation of type I errors also meant reducing the power of the study [38]. A further limitation of this work is that all the cases of disease status were self-reported, and disease subtypes were not documented. The number of patients with glaucoma was underestimated, and only 1.47% of TWB2.0 participants had glaucoma. According to the Shipai Eye Study in Taiwan, the estimated prevalence of glaucoma, primary open angle glaucoma (POAG), and primary angle-closure glaucoma (PACG) in the elderly population (≥72-year-old) was 8.7%, 3.7%, and 4.8%, respectively [39]. However, the TWB2.0 glaucoma participants are relatively younger, and the prevalence of glaucoma should have been lower. In addition, the proportion of glaucoma types can be different. It may be worthwhile to investigate the association of each of these SNPs with different subtypes of glaucoma.
Furthermore, we also expanded this analysis to derive a PRS that was tested across a wide spectrum of clinically relevant glaucoma outcomes. The significant effect of PRS prediction supports the polygenic nature of the glaucoma-associated risk alleles, i.e., there are many genetic variants with small effects that collectively exert a significant impact on the risk of glaucoma in carriers of the mutations. We also observed similar results for the unweighted PRSs. The inclusion of PRSs resulted in significantly improved prediction accuracy for these homogenous groups compared with the traditional risk factors, showing the clinical utility of biomarkers-based GWAS. Glaucoma is an increasingly significant public health issue given the increasing number of individuals that will be affected by the disease over the next several decades. At present, since glaucoma is irreversible and there is no definitive cure, early detection is crucial for early treatment, and identifying individuals at a greater risk of the disease can further reduce its impact. To estimate the cumulative measure of genetic risk identified through genetic association studies, PRSs aggregate individual variants to better measure of the genetic predisposition for a polygenic health outcome like glaucoma. As a result, early detection of glaucoma can provide high risk asymptomatic individuals (with a higher PRS) with the opportunity for earlier intervention.

Study Population and Genome-Wide Association Study (GWAS)
The participants and their data were obtained exclusively from the TWB (https://www. twbiobank.org.tw/test_en, (accessed on 28 September 2021). Up to 15 April 2021, more than 144,000 participants have been recruited. The demographic and health-related survey data for 105,388 study subjects were released in December 2019. Detailed genotyping and imputation procedures are described by Wei et al. [22]. In brief, 105,388 demographic and health-related survey data points were released in December 2019. There were 95,252 participants who had been genotyped with custom TWB1.0 array (TWB1.0 = 27,737) or TWB2.0 array (TWB2.0 = 68,978).
After performing quality control for the samples, 11,785,052 variants from 7082 (450 cases, 6632 controls) TWB1.0 and 11,110,260 variants from 37,575 (1013 cases, 36,562 controls) TWB2.0 participants were used in the subsequent analysis. Adjusting sex and age as a covariate, logistic regression case/control analysis was performed using the PLINK software. We performed GWAS in two stages. Firstly, PLINK was used to screen out significant signals for all traits. The genome-wide significance threshold was set at a loose level of p = 5.0 × 10 −3 . Secondly, significant signals from the first stage were further selected for association tests using R's SAIGE package (version 0.35) in order to adjust for the imbalanced case-control ratio.

Polygenic Risk Score (PRS) Analyses
To build the PRS prediction models, we used the standard clumping and thresholding (C + T) method. The hyperparameters for this method were the cut-off of correlation r2 and p-value threshold p. The parameter spaces for r 2 and p were {0.2, 0.4, 0.6, 0.8} and {10 −4 , 10 −5 } respectively. For each combination of (r 2 , p), we used PLINK with window size 10 Mb to select SNPs. For model selection, we considered TWB2.0 as the training sample report the prediction performance (AUC) and TWB1.0 as the testing sample to evaluate AUC of the prediction model. For the SNPs whose minor alleles showed protective effects on glaucoma, we converted their minor alleles to major alleles as risk alleles, which results in positive weight values for all variants. The PRS analyses were performed using PLINK 1.9. In this study, the predictive abilities of TWB1.0 and TWB2.0 PRS were compared using the area under the receiver operating characteristic (ROC) curve (AUC) [41]. Improvement in AUC between ROC curves were tested using Delong's method [42]. The analyses were performed using the R package "pROC".

Statistical Analyses
Continuous data are presented as means with standard deviation, and categorical data are presented as proportions. Our study used Student's T-test to compare the mean values of continuous variables and chi-squared tests to compare the frequencies of categorical variables between two groups. All statistical analyses were performed using R version 4.1.1.

Conclusions
In our study, we analyzed data from TWB1.0 and TWB2.0. A total of 138 independent glaucoma-associated SNPs were identified from the TWB2.0 population at the significance level of p < 1 × 10 −5 . Among these SNPs, the SNPs rs2282199, rs4078356, and rs4757474 had odds ratios of 1.266, 1.77, and 1.238 and were nearest to the LINC00475, RIPOR2, and PLEKHA7 genes, respectively. Further analyses are warranted to survey the risk loci differences between different glaucoma subtypes. Furthermore, we built a novel PRS model to identify patients susceptible to glaucoma, which was validated in an independent cohort from the TWB1.0 database. Further meta-analysis of the UK Biobank and NHGRI-EBI GWAS Catalog did not improve the genome-wide significance and associations, indicating the differential genetic architecture of glaucoma between different ethnic populations. Together with the newly identified experimentally significant loci, the PRS model we have constructed highlights the potential for far-reaching clinical utility in reducing the global burden of glaucoma.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Taipei Veterans General Hospital (ID No: 2020-07-008A and 2020-12-009AC).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest:
The authors declare no conflict of interest.