Comprehensive Genetic Analysis of Druze Provides Insights into Carrier Screening

Background: Druze individuals, like many genetically homogeneous and isolated populations, harbor recurring pathogenic variants (PVs) in genes underlying autosomal recessive (AR) disorders. Methods: Variant calling was performed on whole-genome sequencing (WGS) data of 40 Druze from the Human Genome Diversity Project (HGDP) (HGDP-cohort). Additionally, we performed whole exome sequencing (WES) of 118 Druze individuals: 38 trios and 2 couples, representing geographically distinct clans (WES-cohort). Rates of validated PVs were compared with rates in worldwide and Middle Eastern populations, from the gnomAD and dbSNP datasets. Results: Overall, 34 PVs were identified: 30 PVs in genes underlying AR disorders, 3 PVs associated with autosomal dominant (AD) disorders, and 1 PV associated with an X-linked dominant disorder in the WES cohort. Conclusions: The newly identified PVs associated with AR conditions should be considered for incorporation into prenatal-screening options offered to Druze individuals after an extension and validation of the results in a larger study.


Introduction
Druze individuals constitute a Middle Eastern minority population. Traditionally, the Druze religion is believed to have formed as an Islamic reform movement, under the rule of the sixth caliph of the Fatimid Dynasty of Egypt, ElHakim (AD 966-1020) [1]. In Israel, there are ~150,000 Druze (of an estimated ~1,000,000 worldwide), overwhelmingly residing in the Northern part of the country [2]. For centuries, Druze have strictly prohibited marriage to non-Druze and limited conversion into the religion. These practices, combined with a high rate (47%) of consanguineous marriages [3] and residence in isolated, mountainous regions, have made the Druze a unique population for genetic research.
Given the founder population attributes of Druze, drifted variants resulting in a high prevalence of monogenic disorders are expected. Indeed, previously reported recurring pathogenic variants (PVs) amongst Druze include two PVs in the ATM gene (the gene that underlies Ataxia Telangiectasia; OMIM #208900) in Druze communities in Jordan, Lebanon, and Syria [4]; a PV in the β globin gene [5]; and a nonsense variant in the LDL receptor (LDLR) gene, causing familial hypercholesterolemia [6]. In the most comprehensive account of prevalent germline PVs causing autosomal recessive (AR) disorders in the non-Jewish Israeli population, of 103 PVs in 81 genes, 32 PVs were founder mutations in Druze individuals [7].
Behar et al. [8] demonstrated close genetic relations between Druze and other Middle Eastern populations, such as Bedouins, Palestinians, Syrians, Lebanese, and Jews. A previous study published by some of us [9] confirmed the Middle Eastern origins of the Druze and suggested a ≈15-fold reduction in population size that took place ≈22-47 generations ago.
In the current study, we performed whole exome sequencing (WES) in 118 samples collected from Druze trios SNP-genotyped in our previous study [9] to further define the genetic makeup of Druze individuals and characterize novel, clinically relevant coding variants in this population. We also analyzed HGDP-available Druze whole-genome sequence (WGS) data from 40 distinct Druze samples [10] (Figure 1).

Figure 1.
Methodology flow diagram: HGDP-derived data was filtered based on Druze ethnicity to create a Druze cohort of 40 individuals. Additionally, exome sequencing was performed on 118 Druze individuals from different clans in Israel, creating the WES cohort. Simultaneously, all the variants from ClinVar were filtered based on interpretation labeled as "pathogenic" or "likely pathogenic". Then, the Druze-cohort variants and the WES-cohort variants were cross referenced with the catalogue of the pathogenic variants from ClinVar creating the Druze pathogenic-variants list. Only variants that were classified as "pathogenic" or "likely pathogenic" according to the ACMG-AMP guidelines were included in the list. We compared the allele frequency of each variant in our cohort and the allele frequency of the variants in worldwide populations based on the data from gnomAD and dbSNP. Using Fisher's test, we identified the variants that were significantly different in Druze. After a literature review, we narrowed down the list to obtain a curated set of pathogenic variants that are enriched in the Druze population in comparison to other populations.

Recruitment of Druze Participants for WES
Druze trios: The study population consisted of individuals who were recruited for and participated in our previously described study [9]. Briefly, in the original study, 40 trios of Druze origin (n = 120) representing the different clans (Hamullas) were recruited. These healthy participants were recruited from the Druze communities in Beit Jan, located in the Northern Galilee in Israel (20 trios), and in the Golan Heights (20 trios), primarily the village of Majdal Shams. Clan ancestral roots were based on family names and represented the origins of major locales of Druze residing in the Middle East. Only 118/120 individuals recruited in the original study were included herein, based on DNA quality and availability.
HGDP cohort: The HGDP contains 929 DNA samples and WGS data from ethnically diverse individuals, including 40 Druze samples [10]. HGDP DNA samples underwent Illumina whole-genome sequencing to an average coverage of 35× (minimum 25×), and reads were mapped to the GRCh38 reference assembly as reported [10]. HGDP Druze study individuals resided in Druze villages in the Carmel and Galilee regions of Israel and not in the Golan Heights.
Whole exome sequencing: WES was carried out at the Regeneron Genetics Center following previously published protocols [11]. In brief, genomic DNA was sheared and used to prepare 75 bp paired-end libraries for exome sequencing. Samples were captured using the IDT XGen exome capture reagent and sequenced on an Illumina NovaSeq instrument. Captured fragments were sequenced to achieve a minimum of 85% of the target bases covered at 20× or greater. Following sequencing, data were processed using a DNAnexus-implemented cloud-based pipeline that runs standard tools for sample-level data production and analysis. Sequence reads were mapped and aligned to the GRCh38/hg38 human genome reference assembly using BWA-mem, and SNP and InDel variants and genotypes were called using GATK's HaplotypeCaller in accordance with the best practices for germline short-variant discovery. Samtools 1.12 was used for coverage and depth calculations.
Variant filtering: In this study we focused on variants that were labeled as either "Pathogenic" or "Likely pathogenic" (PV) according to ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/ (accessed on 23 January 2023)). Additionally, the pathogenicity of each PV was classified according to the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) guidelines [12]. Since the focus of this research is on disease-associated variants previously unreported in the Druze population, variants that were previously reported in the Druze population, either in relevant studies or in the Israeli Medical Genetic Database (http://INGD.huji.ac.il), are listed separately in Table S1 (WES analysis) and Table S2 (HGDP analysis).
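The filtering step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the record layout, the `VariantKey` class, and both function names are hypothetical, and the coordinates in the usage note are invented for illustration. The sketch keeps ClinVar records whose interpretation is "Pathogenic" or "Likely pathogenic" and then retains only the cohort variants that appear in that catalogue. ACMG-AMP classification and the literature review are manual steps and are not modeled.

```python
from dataclasses import dataclass

# Interpretation labels retained by the filter (assumed to be matched
# case-insensitively; combined or conflicting labels are not handled here).
PATHOGENIC_LABELS = {"pathogenic", "likely pathogenic"}


@dataclass(frozen=True)
class VariantKey:
    """Hypothetical variant identifier: chromosome, position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def clinvar_pathogenic(clinvar_records):
    """Return {VariantKey: label} for records labeled pathogenic or likely pathogenic.

    `clinvar_records` is an iterable of (VariantKey, label) pairs.
    """
    kept = {}
    for key, label in clinvar_records:
        if label.strip().lower() in PATHOGENIC_LABELS:
            kept[key] = label
    return kept


def cross_reference(cohort_variants, pathogenic):
    """Return the cohort variants present in the pathogenic catalogue, sorted by locus."""
    hits = [key for key in cohort_variants if key in pathogenic]
    return sorted(hits, key=lambda k: (k.chrom, k.pos))
```

For example, given a catalogue built from `[(VariantKey("1", 100, "A", "G"), "Pathogenic"), (VariantKey("2", 200, "C", "T"), "Benign")]`, a cohort containing both variants would yield only the first one after `cross_reference`.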
Sources of comparison populations and datasets: For each PV, the general population allele count (AC) and allele number (AN) were retrieved from gnomAD (https://gnomad.broadinstitute.org/ (accessed on 1 November 2022)), as indicated by the total row in the population frequencies table. If AC and AN were missing, those values were extracted from dbSNP (https://www.ncbi.nlm.nih.gov/snp/ (accessed on 1 November 2022)), as indicated by the total column in the ALFA allele-frequency table. Additionally, the corresponding AC and AN of the Middle Eastern population were extracted from gnomAD, as indicated by the Middle East row in the population frequencies table. Allele frequency (AF) was calculated by dividing AC by AN.
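The lookup order and the AF calculation can be sketched as follows. This is an illustrative fragment only: the function names and the `(AC, AN)` tuple representation are assumptions, and retrieving the counts from gnomAD or dbSNP is left outside the sketch.

```python
from typing import Optional, Tuple

# (AC, AN) pair, or None when the source has no entry for the variant.
Counts = Optional[Tuple[int, int]]


def pick_counts(gnomad: Counts, dbsnp: Counts) -> Counts:
    """Prefer the gnomAD totals; fall back to dbSNP (ALFA) when gnomAD is missing."""
    return gnomad if gnomad is not None else dbsnp


def allele_frequency(counts: Counts) -> Optional[float]:
    """AF = AC / AN; returns None when counts are unavailable or AN is not positive."""
    if counts is None:
        return None
    ac, an = counts
    if an <= 0:
        return None
    return ac / an
```

With hypothetical counts, `allele_frequency(pick_counts(None, (5, 1000)))` uses the dbSNP fallback and evaluates 5 / 1000.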
Statistical analyses: For each SNP, a two-sided Fisher's exact test was performed to compare the AF of the WES cohort with the AF of the general population, and the AF of the HGDP cohort with the AF of the general population. A p value of 0.05 was set as the cutoff for statistical significance.
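As a worked illustration of the test, the sketch below computes a two-sided Fisher's exact p value for a 2×2 table of allele counts using only the Python standard library. It is not the software used in the study; the function names and the table layout (cohort alternate and reference alleles in the first row, comparison population in the second) are assumptions. The two-sided p value is taken as the sum of the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed table.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)

    def weight(x: int) -> int:
        # Unnormalized hypergeometric probability of a table with x in the
        # top-left cell. Every table shares the denominator comb(total, col1),
        # so the integer numerators can be compared exactly.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    tail = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return tail / comb(total, col1)


def compare_allele_counts(cohort_ac, cohort_an, pop_ac, pop_an, alpha=0.05):
    """Return (p value, significant?) for cohort versus population allele counts."""
    p = fisher_exact_two_sided(
        cohort_ac, cohort_an - cohort_ac, pop_ac, pop_an - pop_ac
    )
    return p, p < alpha
```

In this layout, a cohort of 118 individuals contributes an AN of 236 for an autosomal variant, so a call would look like `compare_allele_counts(cohort_ac, 236, pop_ac, pop_an)`, with the population counts taken from gnomAD or dbSNP.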
One PV (rs777172978) was significantly enriched in both the WES and the HGDP analysis.

Discussion
In the current study, 34 PVs in genes associated with AR and AD disorders not previously described in Druze individuals were identified. The most updated list of genes and PVs prevalent in the Druze population in Israel encompasses 79 AR diseases, 81 genes, and 103 variants [7]. The findings of PVs in the isolated populations reported herein are in line with previous reports [12,13]. Specifically, Khayat et al. [13] reported 48 PVs in AR genes (24 novel PVs) in an isolated community of Muslim Arabs in Israel (n = 50) based on the results of WES in that population [14]. The Israeli population genetic carrier screening program is included in the health basket and hence is covered by the health maintenance organizations (HMOs) [15]. The data presented herein suggest that expanding the list of testable AR disease genes genotyped in the context of the Israeli population genetic carrier screening program should be considered. Such a list should be based on more comprehensive data collected from all ethnicities, with a specific emphasis on genotyping adequate numbers of individuals from isolated populations to address their unique needs. Notably, rates of carrier screening use among Druze and other non-Jewish ethnic groups in Israel are substantially lower than rates among their Jewish Israeli counterparts [16]. Given the cost effectiveness of prenatal screens in guiding prenatal diagnostic procedures, awareness of the availability of effective testing should be increased in the Druze population.
PVs in two genes that are associated with AD chronic pancreatitis, PRSS1 (OMIM #276000; PV-rs111033565) and CTRC (OMIM #601405; PV-rs202058123), were detected. The incidence of chronic pancreatitis ranges from 4 to 14 per 100,000 per year, and the prevalence from 13 to 52 per 100,000 population [17]. There are no reported studies suggesting that Druze individuals are at an increased risk for developing chronic pancreatitis compared with other ethnically diverse populations. Since clinical manifestations may be subtle, the implication of this finding needs to be investigated in a larger population of Druze cases, perhaps those who are referred for a clinical workup of undefined abdominal pain or nonspecific symptoms that may herald chronic pancreatitis. Other possibilities to account for these genetic findings, as well as for other seemingly prevalent PVs in AD disorders reported herein, should also be entertained: incomplete penetrance, or even misclassification of pathogenicity by ClinVar.
Notably, the high rates of the PV in the PRRT2 gene (3%) and of the PV in the COL6A2 gene (3%) in the current study would be expected to be associated with high rates of Episodic Kinesigenic Dyskinesia, Type 1 and Ullrich congenital muscular dystrophy, Type 1, respectively, amongst Druze individuals. Underreporting, incomplete penetrance, or variable expressivity of these disorders in Druze individuals, as indeed is the case in other populations for Episodic Kinesigenic Dyskinesia, Type 1 [18], may account for the lack of reported overrepresentation of clinically relevant diseases.
In this study, we identified two PVs in two ACMG actionable genes [19]: MUTYH (OMIM #604933; PV-rs587778541) and MEFV (OMIM #608107; PV-rs28940580). Homozygous MUTYH PVs are associated with colorectal cancer and adenomatous polyposis, while homozygous PVs in MEFV cause Familial Mediterranean Fever (FMF), a relatively prevalent disease in people who live around the Mediterranean region, including the Druze population [20,21]. Notably, homozygous PVs in both genes are associated with clinically significant disease, whereas heterozygous PVs, as is the case here, are not.
The p.I1307K APC (OMIM #611731) increased risk allele was detected in two cancer-free Druze individuals in the current study (AF = 0.03, AC = 2). This variant is very prevalent in Ashkenazi Jews (AJ): it is found in ~6% [22] of the general, average-risk population, with rates of up to 20% in AJ colorectal cancer (CRC) cases with a family history of CRC [23]. Since its original description in AJ, this variant has been reported in ethnically diverse populations of Jewish non-Ashkenazim [24] and Muslim Arabs residing in Israel [25]. Detecting this variant in Druze individuals, given the unique and almost exclusive intrafaith marriage patterns, may suggest that this variant arose in the Middle East prior to the separation of the Druze from the Muslims. The clinical implication of harboring the p.I1307K APC variant and the associated cancer risk is still unsettled. In most studies, this variant marginally increases the risk for developing CRC, with a pooled odds ratio in one meta-analysis of 2.17 (95% confidence interval: 1.64, 2.86) [26] and with the median age not younger in variant carriers compared to the general population [27]. The risk for developing CRC in Israel is significantly lower for non-Jewish individuals compared with ethnically diverse Jews (https://www.health.gov.il/UnitsOffice/HD/ICDC/ICR/CancerIncidence/Pages/default.aspx (accessed on 1 November 2022)). Yet the carrier rate of the p.I1307K APC variant in non-AJ is estimated to be 1.6% [27], similar to what has been observed in the current study. Taken together, these facts may be indirect evidence for a minimal role of this specific APC variant in conferring CRC risk during population screens.
Behçet disease (BD) is a multisystem inflammatory disorder pathologically hallmarked by vasculitis affecting the small and large veins and arteries [28]. Ethnic groups living along the historical silk road are at an increased risk of developing BD [29]. Specifically, in Israel, the rate of BD amongst Druze is reportedly among the highest of all ethnic groups, with rates of up to 150/100,000 [8]. As in most adult-onset diseases, genetic factors play a role in BD predisposition. Notably, human leukocyte antigen (HLA)-B51 has been reported as the strongest genetically associated factor for BD. Other HLA alleles, as well as other loci containing genes involved in host defense, immunity, and inflammation pathways (detected predominantly via GWAS), have been shown to contribute to BD susceptibility [30]. Among these additional BD-associated genes, members of the interleukin pathway family, including IL10, IL23R-IL12RB2, IL12A, and IL23R, have been reported [30]. The possible contribution of the IL18R1 gene to BD, however, has not been thoroughly investigated. IL18R1 encodes the α chain, a subunit of the IL18 receptor [31]. IL18, the IL18 receptor ligand, is a member of the IL1 family of cytokines [31], proteins that play a key role in BD ocular or mucocutaneous manifestations, and it was found to be elevated in the synovial fluid of BD patients [32,33]. Tan and coworkers [34] reported that three SNPs in the genomic region encompassing the IL18R1 gene were associated with ocular manifestations of BD in the Han Chinese population. In the current study, these three SNPs were in perfect linkage disequilibrium, creating a 10 Kb haplotype enriched in the Druze population. Yet, the high rate of these SNPs in the general population, the lack of any bona fide PVs in the WES cohort, and the paucity of supporting data in other populations may indicate that the contribution of PVs in the IL18R1 gene to the burden of BD is minimal at best.
The limitations of the current study should be acknowledged. This study generated data on a limited number of Druze families residing in Israel, where only a small subset of the world Druze population resides, and it may not reflect the entire population spectrum of this ethnic community. The lack of precise clinical knowledge on the genotyped individuals, and the reliance on self-reported health status at a single time point, is another limitation. Given the current study design, the penetrance of the autosomal dominant alleles reported herein cannot be assessed, thus limiting the ability to provide more insightful and evidence-based genetic counseling. Additionally, conclusions on which AR genes' PVs (or a subset of them) should be incorporated into Druze prenatal screening should await a validation study encompassing more Druze cases.

Conclusions
Novel PVs in genes associated with severe AR disorders prevalent in Druze individuals should be considered for inclusion in the next version of the national prenatal screening program offered to the relevant population in Israel, after validation in a larger study.