Exploring the Genetic Landscape of Retinal Diseases in North-Western Pakistan Reveals a High Degree of Autozygosity and a Prevalent Founder Mutation in ABCA4

Variants in more than 271 different genes have been linked to hereditary retinal diseases, making comprehensive genomic approaches mandatory for accurate diagnosis. We explored the genetic landscape of retinal disorders in consanguineous families from North-Western Pakistan, a region harboring a population of approximately 35 million inhabitants that remains relatively isolated and highly inbred (~50% consanguinity). We leveraged the high degree of consanguinity by applying genome-wide high-density single-nucleotide polymorphism (SNP) genotyping followed by targeted Sanger sequencing of candidate gene(s) lying inside autozygous intervals. In addition, we performed whole-exome sequencing (WES) on at least one proband per family. We identified 7 known and 4 novel variants in a total of 10 genes (ABCA4, BBS2, CNGA1, CNGA3, CNGB3, MKKS, NMNAT1, PDE6B, RPE65, and TULP1) previously known to cause inherited retinal diseases. Although all families were consanguineous, compound heterozygosity was detected in one family. All homozygous pathogenic variants resided in autozygous intervals ≥2.0 Mb in size. Putative founder variants were observed in the ABCA4 (NM_000350.2:c.214G>A; p.Gly72Arg; ten families) and NMNAT1 genes (NM_022787.3:c.25G>A; p.Val9Met; two families). We conclude that geographic isolation and the sociocultural tradition of intrafamilial mating in North-Western Pakistan favor both the clinical manifestation of rare “generic” variants and the prevalence of founder mutations.


Introduction
Inherited retinal dystrophies (IRDs) constitute a genetically heterogeneous group of rare conditions of the eye. They are mainly characterized by the progressive loss of rod and/or cone photoreceptors, ultimately resulting in complete or nearly complete blindness [1]. Globally, IRDs affect approximately one million people, with a frequency of 1 in 3000 births. Clinically, they may range from mild and non-progressive night blindness to more severe and degenerative phenotypes, including retinitis pigmentosa (RP) and cone or cone-rod dystrophies [2]. To date, mutations in over 271 genes have been linked to various forms of IRDs (RetNet; https://sph.uth.edu/RETNET/; accessed on 12 December 2019), and the sequencing of their coding parts has allowed the detection of pathogenic mutations in more than 60% of the patients [3]. IRDs are inherited as an autosomal recessive, autosomal dominant, X-linked, or mitochondrial trait, with autosomal recessive being the most prominent type [1,4]. Recent technological advancements, such as next-generation sequencing (NGS), have significantly increased gene discovery rates in a wide range of inherited ocular conditions [5], with a sensitivity of ~75% when applied to a clinically focused IRD group [6]. Since consanguinity unmasks the adverse effects of recessive mutations through bi-parental inheritance of the same allele, it is possible to reveal the presence of disease-causing variants in consanguineous pedigrees by simply flagging large segments of consecutive homozygous genotypes surrounding the mutations, using a technique called "autozygosity mapping" [7][8][9][10]. For a more rapid and robust analysis, autozygosity mapping is usually combined with NGS to maximize the acquisition of relevant genetic information [10]. The combination of such information with targeted functional studies has provided significant insights into the molecular mechanisms of rare Mendelian diseases, including IRDs.
Since children of consanguineous couples are more likely than children of non-consanguineous parents to be affected by recessive genetic anomalies [11], the incidence of rare Mendelian diseases is higher in populations having a high degree of endogamy [12]. For example, Pakistan has one of the highest rates of inherited genetic diseases in the world, likely due to the fact that consanguinity is present in more than 50% of the population and marriages among first cousins are highly favored by the society [13,14]. According to a recent estimate, approximately 1.12 million people in Pakistan are blind, and the vision loss burden has continued to rise in the country since 1990 [15].
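The effect of consanguinity on recessive disease risk can be illustrated with the standard population-genetics expression for the probability of homozygosity under inbreeding, q²(1 − F) + qF, where q is the allele frequency and F is the inbreeding coefficient (1/16 for the offspring of first cousins). The following is a minimal sketch only; the allele frequency used is a hypothetical value chosen for illustration and is not taken from our data.

```python
def recessive_risk(q: float, f: float = 0.0) -> float:
    """Probability of being homozygous for an allele of frequency q,
    given an inbreeding coefficient f (f = 0 for random mating)."""
    return q * q * (1.0 - f) + q * f


# Inbreeding coefficient for the offspring of first cousins.
F_FIRST_COUSINS = 1.0 / 16.0

# Hypothetical frequency of a rare recessive allele (illustrative only).
q = 0.001

risk_outbred = recessive_risk(q)
risk_cousins = recessive_risk(q, F_FIRST_COUSINS)
relative_risk = risk_cousins / risk_outbred

print(f"outbred risk:        {risk_outbred:.3e}")
print(f"first-cousin risk:   {risk_cousins:.3e}")
print(f"relative risk:       {relative_risk:.1f}")
```

Under these assumptions, the rarer the allele, the larger the relative increase in risk, which is consistent with the observation that rare recessive conditions are enriched in populations with a high degree of endogamy.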
Although some information exists on blindness caused by cataracts or refractive errors, the prevalence of IRDs is not well documented in Pakistan at the level of the whole population. A hospital-based study in Karachi, a metropolitan city, revealed that 1 in 800 patients who visited the ophthalmic outpatient department had retinal dystrophies, with RP being the most frequent type (64%), followed by Stargardt disease (14.7%) and cone dystrophies (6.7%). Unsurprisingly, more than half of the patients from this study were born to consanguineous parents [16]. Recently, a few studies on IRDs in Pakistan have been published [17][18][19][20]. However, the majority of these reports were based on pedigrees from the Punjab and Sindh provinces, thus leaving North-Western Pakistan largely unexplored. Administratively known as Khyber Pakhtunkhwa (KP), this part of the country is predominantly a Pashtun territory and includes a heterogeneous population of approximately 35 million inhabitants. Consanguinity in KP ranges between 22% and 66%, and the rate of consanguinity was found to have increased over time, possibly due to the growing violence and geo-political conflicts in the region (consanguineous marriages are believed to strengthen pre-existing intrafamilial relationships and thus be advantageous in the context of civil unrest) [21][22][23][24][25]. To our knowledge, no comprehensive study on the genetic spectrum of IRDs has ever been undertaken in North-Western Pakistan, possibly because of socio-economic and cultural limitations, a lack of infrastructure, difficult terrain, and escalating conflicts in the region.

Enrollment of Families and Collection of Samples
Our study conforms to the standards of the Declaration of Helsinki and was approved by the Institutional Review Boards of Hazara University, Mansehra, Pakistan (approval code: F.No:185/HU/Zool/2018/583) and of all our respective institutions. Informed consent was provided in written form by all families prior to their participation and was signed by all members who were enrolled. Families with at least two affected persons, a history of consanguinity, and a clear autosomal recessive inheritance pattern of the disease were selected for molecular analysis. All families participating in our study were ethnically Pashtun and were geographically located in the KP province in North-Western Pakistan. Clinical and demographic information was obtained via a pre-designed questionnaire, and pedigrees were drawn and cross-checked through face-to-face interviews with patients and/or elder members of the family, in their native language. Clinical data were obtained either directly from the patients' medical reports (if available) or through consultation with the local clinicians/ophthalmologists who examined them at our request. Due to the limited medical infrastructure in the region, detailed clinical investigations, such as electroretinography (ERG) or fundus images, were not available. Furthermore, most of the clinical information obtained was derived from self-reported data, including, for example, difficulties in day/night vision, photophobia, disease onset and progression, response to medication, outcome of the Ishihara test, dark/light adaptation, ability to see/focus on near and distant objects, loss of central or peripheral vision, and nystagmus. Patients were also assessed for other parameters, such as their ability to perform routine activities, e.g., reading, doing physical activities, and socializing with people. Electronic versions of pedigrees were created using the Pedigree Chart Designer (CeGaT, Tübingen, Germany).
Saliva samples were collected by using the Oragene saliva kit (OG-500, DNA Genotek, Ottawa, ON, Canada) from patients as well as their clinically unaffected relatives, following the manufacturer's guidelines. DNA was extracted from these samples following standard protocols, e.g., by following the prepIT-L2P manual (DNA Genotek, Canada). Quantitative and qualitative assessments of DNA were made using the NanoDrop™ 1000 Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and electrophoresis on 1% agarose gels.

Genotyping and Homozygosity Mapping
Initially, nine families were subjected to genetic analysis using genome-wide high-density single-nucleotide polymorphism (SNP) arrays. For this purpose, genomic DNA of two or more individuals per family was genotyped by using the InfiniumCoreExome-24v1-1 array (Illumina, San Diego, CA, USA), which encompasses ~550,000 genome-wide SNP markers, at the iGE3 Genomics Platform of the University of Geneva, Switzerland. Arrays were processed using an iScan, according to the manufacturer's protocol. Genotype calls were generated using the GenomeStudio software by Illumina. PLINK was used to analyze the genotype data [26]. Following the identification of shared autozygous intervals among two or more patients from the same family, all exons and exon-intron boundaries of candidate gene(s) inside these intervals were sequenced using the Sanger method. Additionally, families that belonged to the same geographic area and had clinically overlapping phenotypes were also investigated for the presence of shared autozygous intervals. Using this method, putative disease-causing variants were identified in four consanguineous pedigrees segregating autosomal recessive IRDs, while the remaining unsolved families were subsequently characterized by whole-exome sequencing (WES).
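The principle behind flagging autozygous intervals from SNP-array data can be sketched as a scan for long runs of consecutive homozygous genotypes. The code below is a deliberately simplified illustration and is not the algorithm implemented in PLINK, which uses a sliding window and tolerates a limited number of heterozygous or missing calls. The genotype encoding, the function name, the thresholds, and the toy data are all assumptions made for the example.

```python
from typing import List, Tuple

# (chromosome, position in bp, genotype such as "AA", "AB", "BB"; "--" = missing)
Call = Tuple[str, int, str]
Run = Tuple[str, int, int, int]  # (chromosome, start, end, number of SNPs)

MISSING = "--"


def _keep(run: list, min_bp: int, min_snps: int) -> bool:
    """A run is reported only if it is long enough in bp and in SNP count."""
    return run[3] >= min_snps and (run[2] - run[1]) >= min_bp


def homozygous_runs(calls: List[Call], min_bp: int = 2_000_000,
                    min_snps: int = 50) -> List[Run]:
    """Return runs of consecutive homozygous calls.

    `calls` must be sorted by chromosome and position. Missing calls are
    skipped; any heterozygous call or change of chromosome ends the run.
    """
    runs: List[Run] = []
    current = None  # [chrom, start, end, n_snps] of the run being extended
    for chrom, pos, gt in calls:
        if gt == MISSING:
            continue
        is_hom = gt[0] == gt[1]
        if is_hom and current is not None and current[0] == chrom:
            current[2] = pos
            current[3] += 1
            continue
        # The current run (if any) ends here.
        if current is not None and _keep(current, min_bp, min_snps):
            runs.append(tuple(current))
        current = [chrom, pos, pos, 1] if is_hom else None
    if current is not None and _keep(current, min_bp, min_snps):
        runs.append(tuple(current))
    return runs


# Toy data (invented): 30 homozygous SNPs spaced 100 kb apart, one
# heterozygous call, then a short homozygous stretch that is too small.
calls = [("1", 1_000_000 + i * 100_000, "AA") for i in range(30)]
calls.append(("1", 4_000_000, "AB"))
calls += [("1", 4_100_000 + i * 100_000, "BB") for i in range(5)]

runs = homozygous_runs(calls, min_bp=2_000_000, min_snps=20)
print(runs)
```

In this toy example only the first stretch passes both thresholds. With real array data, a strict rule such as this one would fragment true autozygous segments at every genotyping error, which is why tolerant window-based approaches are generally preferred.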

Whole-Exome Sequencing
Overall, WES was performed for 10 pedigrees. For WES analysis, 2.0 μg of genomic DNA from index patients was initially processed by Novogene Co., Ltd. (Hong Kong, China). Sequencing libraries were generated using the Agilent SureSelect Human All Exon V6 kit (Agilent Technologies, Santa Clara, CA, USA), while fragmentation was carried out by hydrodynamic shearing (Covaris, Woburn, MA, USA). Following adapter ligation, DNA fragments were selectively enriched in a PCR reaction. Products were purified using the AMPure XP system (Beckman Coulter, Beverly, CA, USA) and quantified using an Agilent high-sensitivity DNA assay on the Agilent Bioanalyzer 2100 system. Captured DNA libraries underwent paired-end sequencing on an Illumina NovaSeq 6000 S4 platform, resulting in sequences of 150 bases (PE150 sequencing strategy). WES data were analyzed using our in-house computational pipeline [27], and autozygosity mapping was performed using AutoMap (unpublished). Finally, Sanger sequencing was performed to validate the potentially pathogenic variants detected and to confirm their causality, via genotype-phenotype cosegregation within the families.
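The way in which autozygosity information can narrow down exome variants may be sketched as a simple filter that retains rare homozygous variants lying inside autozygous intervals. This is an illustration of the general idea only and does not reproduce our in-house pipeline [27] or AutoMap; the `Variant` record, the frequency threshold, the gene labels, and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), inclusive


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    gene: str
    homozygous: bool
    pop_freq: float  # allele frequency in a reference population


def in_any_interval(v: Variant, intervals: List[Interval]) -> bool:
    """True if the variant lies inside at least one interval."""
    return any(c == v.chrom and s <= v.pos <= e for c, s, e in intervals)


def candidate_variants(variants: List[Variant], intervals: List[Interval],
                       max_freq: float = 0.01) -> List[Variant]:
    """Keep rare, homozygous variants located in autozygous intervals."""
    return [
        v for v in variants
        if v.homozygous and v.pop_freq <= max_freq
        and in_any_interval(v, intervals)
    ]


# Toy input (invented coordinates and gene labels).
autozygous = [("1", 90_000_000, 95_000_000)]
variants = [
    Variant("1", 94_000_000, "GENE_A", True, 0.0),    # kept
    Variant("1", 94_500_000, "GENE_B", False, 0.0),   # heterozygous
    Variant("2", 10_000_000, "GENE_C", True, 0.0),    # outside intervals
    Variant("1", 91_000_000, "GENE_D", True, 0.20),   # too common
]

candidates = candidate_variants(variants, autozygous)
print([v.gene for v in candidates])
```

A filter of this kind would, by design, miss compound heterozygous genotypes, so heterozygous variants in known IRD genes need to be reviewed separately when no homozygous candidate is found.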

Clinical Synopsis
As mentioned earlier, detailed clinical investigation could not be achieved for all patients. However, a summary of the clinical information of one family is shown in Figure S1. On the basis of the limited data available, mostly based on patients' symptoms, we could identify five major clinical IRD classes. Briefly, patients with severe early-onset blindness were tentatively categorized as individuals with Leber congenital amaurosis (LCA, two families), while patients presenting with IRD and extraocular symptoms such as obesity, hypogonadism, learning/developmental disabilities, post-axial polydactyly of hands and/or feet, and renal abnormalities were examined by a local clinician who classified them as suffering from Bardet-Biedl syndrome (BBS) (two families). Patients with progressive loss of central vision were categorized as having macular dystrophy (eleven families), while those presenting with initial night blindness and progressive loss of peripheral vision were classified as suffering from RP (four families). Patients with RP were clinically evaluated with the help of a local ophthalmologist who reported the presence of bilateral bone spicules and peripheral retinal vascular attenuation on fundus examination. Lastly, patients with a complete inability to discriminate between colors were classified as having achromatopsia (one family).

Molecular Findings
Collectively, we identified 11 disease-causing variants in 10 IRD-associated genes, in a total of 20 consanguineous IRD pedigrees, all from North-Western Pakistan (Table 1). These variants were detected using a genome-wide SNP array followed by Sanger sequencing (four families), WES (ten families), and targeted Sanger sequencing alone (six families). Of these variants, seven were previously known IRD mutations, while four variants had never been identified before. Newly detected changes comprised three protein-truncating mutations and one nonsynonymous single-nucleotide variant (SNV). While the pathogenicity of protein-truncating variants can be easily postulated, the causality of the nonsynonymous SNV was inferred through in silico analysis and segregation studies. Although all families were consanguineous, compound heterozygosity was detected in one family (PK-E). All homozygous pathogenic variants were detected inside tractable autozygous intervals (≥2.0 Mb in size) and co-segregated with the disease in homozygosis in members from the remaining 19 families, including those who were not pre-ascertained by means of homozygosity mapping (Figure 1). Putative founder mutations were observed in the ABCA4 (NM_000350.2:c.214G>A; p.Gly72Arg; 10 families) and in the NMNAT1 genes (NM_022787.3:c.25G>A; p.Val9Met; 2 families), which together accounted for more than half of the IRD pedigrees analyzed in this study.
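As a reading aid for the variant nomenclature used here, the relationship between a coding-DNA position (c.) and the affected codon (p.) can be checked with simple integer arithmetic, assuming the usual convention that c.1 is the A of the ATG start codon. The sketch below applies this to the coding positions reported in this study (for the CNGB3 deletion, the first deleted base is used); the function name is ours and serves only as an illustration.

```python
from typing import Tuple


def codon_of(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding position to (codon number, position in codon).

    Position in codon is 1, 2, or 3. Assumes c.1 is the A of the ATG
    start codon and that c_pos lies in the coding sequence.
    """
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


# (gene, coding position, amino-acid position given in the text)
CHECKS = [
    ("ABCA4", 214, 72),      # c.214G>A, p.Gly72Arg
    ("NMNAT1", 25, 9),       # c.25G>A, p.Val9Met
    ("ABCA4", 3081, 1027),   # c.3081T>G, p.Tyr1027Ter
    ("CNGB3", 1574, 525),    # c.1574_1575del, p.Phe525Ter
    ("CNGA3", 847, 283),     # c.847C>T, p.Arg283Trp
]

for gene, c_pos, reported in CHECKS:
    codon, offset = codon_of(c_pos)
    status = "consistent" if codon == reported else "MISMATCH"
    print(f"{gene} c.{c_pos}: codon {codon}, base {offset} of 3 ({status})")

all_consistent = all(codon_of(c)[0] == aa for _, c, aa in CHECKS)
```

A check of this kind only confirms that the c. and p. positions agree with each other; it says nothing about the pathogenicity of a variant or about the reference transcript itself.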

Macular Dystrophy (Possibly Including Stargardt Disease and Cone-Rod Degeneration)
Following SNP-based autozygosity mapping in three apparently unrelated families (PK-B, PK-D, and PK-F), we initially identified a ~2.0 Mb autozygous interval on chromosome 1, which was shared by the three probands belonging to these families (Figure 1). Interestingly, the ABCA4 gene resided inside this interval, and an approximately 100 kb haplotype flanking this gene was identical in all three patients. WES analysis revealed a homozygous missense variant (NM_000350.2:c.214G>A:p.Gly72Arg) in ABCA4. The same variant (p.Gly72Arg) was also present in a compound heterozygous state with a nonsense mutation (NM_000350.2:c.3081T>G:p.Tyr1027Ter) in an additional family (PK-E) belonging to the same geographic location. Both of these variants have previously been identified as causes of Stargardt disease [32,33]. Next, we performed targeted Sanger sequencing for p.Gly72Arg in a cohort of 18 previously uncharacterized consanguineous pedigrees from the region and identified the p.Gly72Arg mutation in six of them, in homozygosis. In total, p.Gly72Arg was found to cause disease in at least 10 independent pedigrees from a small town in North-Western Pakistan, collectively accounting for 37 patients (Figure 2). Geographically, these families belonged to Darra Adam Khel in North-Western Pakistan, an area which is mainly inhabited by the Afridi clan of Pashtun ethnicity. Since these families were from the same geographic region and had a common ethnic affiliation, and an identical haplotype around ABCA4 was detected in the three patients who were investigated for it, we believe that p.Gly72Arg constitutes a founder mutation.
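The search for an autozygous interval shared by several probands amounts to intersecting the per-patient interval lists. The following is a minimal sketch of that step under simplifying assumptions; the function names and all coordinates are invented for illustration and do not correspond to the actual intervals observed in families PK-B, PK-D, and PK-F.

```python
from functools import reduce
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping parts of two interval lists."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:
                out.append((chrom_a, start, end))
    return sorted(out)


def shared_intervals(per_patient: List[List[Interval]]) -> List[Interval]:
    """Intervals that are autozygous in every patient."""
    if not per_patient:
        return []
    return reduce(intersect_two, per_patient)


# Toy autozygous intervals for three probands (invented coordinates).
proband_1 = [("1", 90_000_000, 99_000_000), ("5", 10_000_000, 14_000_000)]
proband_2 = [("1", 93_000_000, 96_000_000)]
proband_3 = [("1", 94_000_000, 101_000_000), ("5", 11_000_000, 12_000_000)]

shared = shared_intervals([proband_1, proband_2, proband_3])
for chrom, start, end in shared:
    print(f"chr{chrom}:{start}-{end} ({(end - start) / 1e6:.1f} Mb)")
```

Overlap of autozygous intervals alone does not demonstrate common ancestry: the patients must also carry the same alleles within the shared region, which is why the haplotype flanking the candidate gene was compared across probands.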
Furthermore, WES analysis in family PK009 revealed a nonsense variant (NM_019098.4:c.1574_1575del:p.Phe525Ter) in the CNGB3 gene, which co-segregated with the disease in homozygosis (Figure 2). This variant has never been reported in any public databases and constitutes a loss-of-function allele, therefore likely representing the molecular cause of disease in this family.

Leber Congenital Amaurosis (Early-Onset Retinal Blindness)
Using SNP-based autozygosity mapping in two families with early-onset visual problems (PK-L, PK-M), we identified an autozygous interval on chromosome 1 that was shared by both probands from these families (Figure 1). Since NMNAT1, residing in this region, was a suitable candidate gene for LCA, we screened all exons and exon-intron boundaries of this gene, using Sanger sequencing. We found a missense variant (NM_022787.3:c.25G>A:p.Val9Met) in exon 2 that co-segregated with the disease in both families (Figure 2). The same variant (p.Val9Met) was previously reported to cause LCA in a pedigree of Pakistani descent [30]. However, we could not establish whether this previously identified family had any relationship with the pedigrees analyzed in our study. Considering the geographic proximity of these families, we suggest that p.Val9Met in NMNAT1 constitutes another example of a founder mutation in North-Western Pakistan.

Achromatopsia
Through exome sequencing in a consanguineous pedigree (PK004) with three affected children suffering from putative complete achromatopsia, we identified a homozygous nonsynonymous single-nucleotide variant (NM_001298.2:c.847C>T:p.Arg283Trp) in the CNGA3 gene. Autozygosity mapping revealed the CNGA3 gene to lie within a 5 Mb autozygous interval on chromosome 2 (Figure 1). The mutation is a known cause of achromatopsia [28].

Discussion
Pakistan has one of the highest prevalences of inherited genetic diseases in the world [13], likely due to the high consanguinity rate of its population, generally exceeding 50% [11,12,14]. In this country, marriages of first cousins are highly favored, and families from Pakistan are considered a valuable resource for medical genetics research, which has led to significant scientific findings in the recent past [13,43]. Several studies on IRDs have been conducted in Pakistan during the last few years, but the majority of them were based on pedigrees from the Punjab and Sindh provinces [17][18][19][20]. Khan et al. [20] showed that 90% of the mutations for non-syndromic IRDs and 100% of the mutations for syndromic IRDs were specific to families of Pakistani origin and that mutations in 35 different genes were found to cause non-syndromic IRDs specifically in families of Pakistani descent. In our study, we observed a different trend: out of the 11 mutations identified, only 4 were novel, whereas the remainder had previously been reported in patients from China, the UK, and Germany, and only two of them were found in Pakistani residents [28,29,31,32,33]. This probably reflects the fact that we focused our analysis on pedigrees from the North-Western part of the country, which remains largely isolated from populations inhabiting central Pakistan for cultural, linguistic, and geographic reasons. Further, North-Western Pakistan is predominantly a Pashtun territory and, in contrast to central Pakistan, these populations trace their lineage to Pashtuns living in Western Afghanistan, with whom they share, even today, linguistic and sociocultural affiliations.
Our data therefore provide a first insight into the genetic landscape of IRDs in this peculiar part of the world, slightly extending the known mutational spectrum of IRDs. In particular, we have reported two founder mutations (ABCA4:c.214G>A:p.Gly72Arg and NMNAT1:c.25G>A:p.Val9Met) that, together, were responsible for disease in more than half of the patients in our study. At least 37 patients from 10 different families were found to be affected by the mutation in ABCA4, while the NMNAT1 mutation affected at least 6 people in two unrelated families. Owing to the high rates of traditional intra-familial marriages, strong socio-cultural and ethnic divides, as well as geographic barriers, we predict that the number of patients/families affected by these two founder mutations may be even higher. Furthermore, a considerable number of variants identified in our study were detected inside large autozygous intervals (≥10 Mb), thus reflecting recent endogamy in the population. Similarly, our data support the previous notion that the likelihood for a homozygous pathogenic variant to be found within an autozygous interval is much higher in consanguineous pedigrees [10,44,45,46].
In summary, our study explored the genetic landscape of inherited retinal diseases in North-Western Pakistan, a large but relatively ignored part of the country. In addition to detecting a high degree of autozygosity and relevant founder mutations in the region, our data further expand the mutational spectrum of IRD-associated genes by adding four new variants to it. These findings will help future researchers/clinicians in their rapid screenings of patients from this region and will assist families seeking genetic counselling. We also predict that these insights on eye disorders might apply to other rare conditions that affect individuals from this region, such as deafness, intellectual disabilities, and developmental defects.