Catalogue for Transmission Genetics in Arabs (CTGA) Database: Analysing Lebanese Data on Genetic Disorders

Lebanon has a high annual incidence of birth defects at 63 per 1000 live births, most of which are due to genetic factors. The Catalogue for Transmission Genetics in Arabs (CTGA) database currently holds data on 642 genetic diseases and 676 related genes described in Lebanese subjects. A subset of disorders (14/642) has been described exclusively in the Lebanese population, while 24 have only been reported in CTGA and not on OMIM. An analysis of all disorders highlights a preponderance of congenital malformations, deformations and chromosomal abnormalities and demonstrates that 65% of reported disorders follow an autosomal recessive inheritance pattern. In addition, our analysis reveals that at least 58 known genetic disorders were first mapped in Lebanese families. CTGA also hosts 1316 variant records described in Lebanese subjects, 150 of which were not reported on ClinVar or dbSNP. Most variants involved substitutions, followed by deletions, duplications, as well as in-del and insertion variants. This review of genetic data from the CTGA database highlights the need for screening programs, and is, to the best of our knowledge, the most comprehensive report on the status of genetic disorders in Lebanon to date.


Introduction
Lebanon is a Levantine country located on the Mediterranean Sea. Although it is a small country, with a total area of 10,452 km², its location at the crossroads of both land and maritime routes has given it great historical and cultural significance since antiquity. The total population of Lebanon is currently estimated at 6.8 million people [1]. However, non-Lebanese residents account for almost 30% of this figure, owing to the massive influx of refugees, mostly from Syria, that the country has witnessed in the past decade [2]. On the other hand, during and following the Lebanese civil war and up until the present, a significant proportion of citizens left the country, contributing to a large Lebanese diaspora (an estimated 4-13 million individuals) across the globe, especially concentrated in South America, Canada, and Australia [3].
Genetic disorders are highly prevalent in the Arab World [4][5][6]. Factors contributing to this high prevalence include high fertility and birth rates and high rates of consanguineous unions [7]. Although its rate is lower than those of countries in the Arabian Gulf and North Africa, Lebanon still has a high annual incidence of birth defects at 63 per 1000 live births; most of these congenital defects are due to genetic factors [4]. Genetic disorders in Lebanon have been overviewed by Nakouzi et al., Khneisser et al., and earlier by Der Kaloustian [8][9][10].
The Catalogue for Transmission Genetics in Arabs (CTGA) database, hosted online at https://cags.org.ae/ (accessed on 29 July 2021), is a compendium of bibliographic data on genetic disorders in Arabs, compiled and curated through searches on PubMed and Index Medicus [11]. To date, CTGA is the only database documenting Arab genomic variation which is entirely manually curated. In this review, we present a report of genetic disorders and associated gene variants in the Lebanese population pulled from the CTGA database. This is, to the best of our knowledge, the most comprehensive review on the status of genetic disorders in this country.

Materials and Methods
We conducted a comprehensive literature search on PubMed for biomedical literature either originating from Lebanon or referring to Lebanese subjects, using the search string "Lebanon* OR Lebanese*", up to the end of 2020. The 22,159 articles thus obtained were manually screened according to the following inclusion criteria: (a) article describing a genetic disease in a Lebanese individual(s) or subject(s) of Lebanese origin, (b) article describing gene variant(s) in a Lebanese individual or in the Lebanese population, and (c) article reporting a disorder, not commonly known to be genetic, in multiple members of a Lebanese family. The following categories of articles were excluded: (a) articles carrying information only on non-Lebanese subjects, and (b) articles carrying redundant data. The 814 articles retained after screening were then carefully analyzed, and information on the genetic disorder, relevant HPO terms, and gene variants reported in anonymous Lebanese subjects was manually extracted and added to the CTGA database (Figure 1A). CTGA is an SQL database containing five related categories of records: disease, gene, variant, subject, and reference article. Disease and gene records are linked to their corresponding OMIM records, whenever available. Published variants for which HGVS reference sequence positions were identifiable were entered into CTGA as distinct records and linked to relevant subjects. These variants are linked to their dbSNP and ClinVar records, if available. Data on other variants were incorporated in text descriptions in relevant gene and disease records, but not as separate variant records. Anonymous subject records contain HPO terms and are linked to the relevant published article reference. Data can be accessed through a simple or advanced search.
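The five linked record categories described above can be pictured with a small relational sketch. This is only an illustration under stated assumptions: the actual CTGA schema is not described in this report, so every table name, column name, and link table below is hypothetical, and the single demonstration record reuses only identifiers quoted elsewhere in this text (the LDLR p.Cys681X variant, rs121908031, and MIM # 143890).

```python
import sqlite3

# Hypothetical schema: the report names only the five record categories.
# All table and column names below are illustrative assumptions.
SCHEMA = """
CREATE TABLE reference (id INTEGER PRIMARY KEY, citation TEXT);
CREATE TABLE disease   (id INTEGER PRIMARY KEY, name TEXT NOT NULL, omim_id TEXT);
CREATE TABLE gene      (id INTEGER PRIMARY KEY, symbol TEXT NOT NULL, omim_id TEXT);
CREATE TABLE subject   (id INTEGER PRIMARY KEY,
                        disease_id INTEGER REFERENCES disease(id),
                        reference_id INTEGER REFERENCES reference(id),
                        hpo_terms TEXT);
CREATE TABLE variant   (id INTEGER PRIMARY KEY,
                        gene_id INTEGER REFERENCES gene(id),
                        hgvs TEXT NOT NULL,
                        dbsnp_id TEXT,      -- NULL when no dbSNP record exists
                        clinvar_id TEXT);   -- NULL when no ClinVar record exists
CREATE TABLE subject_variant (subject_id INTEGER REFERENCES subject(id),
                              variant_id INTEGER REFERENCES variant(id));
"""


def build_demo_db():
    """Create an in-memory database holding one record of each category."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO reference VALUES (1, 'placeholder citation')")
    conn.execute(
        "INSERT INTO disease VALUES (1, 'Familial Hypercholesterolemia 1', '143890')"
    )
    conn.execute("INSERT INTO gene VALUES (1, 'LDLR', NULL)")
    conn.execute("INSERT INTO subject VALUES (1, 1, 1, 'placeholder HPO term')")
    conn.execute(
        "INSERT INTO variant VALUES (1, 1, 'p.Cys681X', 'rs121908031', NULL)"
    )
    conn.execute("INSERT INTO subject_variant VALUES (1, 1)")
    return conn


def variants_for_disease(conn, omim_id):
    """Return (gene symbol, HGVS, dbSNP id) for variants seen in subjects with a disease."""
    return conn.execute(
        """SELECT g.symbol, v.hgvs, v.dbsnp_id
           FROM disease d
           JOIN subject s ON s.disease_id = d.id
           JOIN subject_variant sv ON sv.subject_id = s.id
           JOIN variant v ON v.id = sv.variant_id
           JOIN gene g ON g.id = v.gene_id
           WHERE d.omim_id = ?""",
        (omim_id,),
    ).fetchall()
```

In this sketch, variants reach a disease through the subjects in whom they were reported, mirroring the statement that variant records are linked to relevant subjects; a real implementation could equally link variants to diseases directly.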

Results and Discussion
Although genetic publications on Lebanese subjects started as early as 1950 [12], it is only after the late 1990s, following the end of the Lebanese civil war, that we see a significant amount of genetic literature being published. An earlier study analyzing the biomedical bibliometric output from Lebanon until 2007 showed an increasing trend for publications [13]. We see a similar trend in publications over the years in our analysis of year-wise distribution of the selected articles, signifying an increase in genetic research.
We were also interested in monitoring the evolution of medical genetic studies, especially with the advent of newer molecular technologies. The percentage of research articles with available molecular data shows a clear increasing trend, especially over the last decade (Figure 1B), corresponding to the time period when NGS techniques were adopted by clinicians in Lebanon for the diagnosis of genetic diseases. Interestingly, the number of such publications declined in the past couple of years, likely due to the economic crisis that Lebanon has since been facing.
The CTGA database currently holds data on 642 genetic diseases, 676 related genes, and 1316 variants described in the Lebanese population. Of all diseases, 24 are genetic and/or congenital disorders that are not available on OMIM (Table 1). Almost all of these are syndromic conditions, with a combination of clinical features not described elsewhere. They include variants of MCA/MR (multiple congenital anomalies/mental retardation) conditions with new constellations of features [14,15], novel phenotypes associated with genes known to cause other related genetic diseases [16,17], and other syndromic congenital disorders [18,19]. The absence of these disorders from other genetic databases like OMIM points to the rarity of these conditions. Our review also revealed 14 rare genetic disorders that have been described exclusively in the Lebanese population (Table 1). These include the Lebanese type of Mannose 6-Phosphate Receptor Recognition Defect (MIM # 154570), a form of autosomal recessive deafness (MIM # 603678), as well as a form of autosomal recessive dystonia (MIM # 612406). Two factors in combination could be responsible for the relatively large number of genetic conditions described in this population. The first is the availability of and access to trained geneticists capable of recognizing and diagnosing these conditions, despite a relatively low-grade medical and genetic infrastructure in the country [8]. The second factor is the persistent, relatively high prevalence of consanguineous marriages, resulting in the manifestation of rare disorders, many of which follow a recessive mode of inheritance [20]. In fact, around 65% of the diseases we report in Lebanese subjects on CTGA follow an autosomal recessive inheritance pattern (Figure 2). This is in line with the numbers reported in 2015 [8]. Disorders following an autosomal dominant inheritance pattern make up about 26% of all reports, followed by X-linked and mitochondrial disorders.
We categorized the genetic disorders in the Lebanese population based on the WHO ICD-10 classification criteria (Figure 3). The most common category of disorders is congenital malformations, deformations and chromosomal abnormalities, followed by endocrine, nutritional and metabolic diseases and diseases of the nervous system. This pattern is comparable to data from other Arab countries in the CTGA database [21]. The overwhelming predominance of congenital malformations points to the large number of monogenic syndromic disorders in the database. Most of these disorders are relatively rare, with prevalence rates of less than 1 in 100,000. In fact, 314/394 (79.6%) Lebanese genetic disorders with known prevalence rates are rare, with rates less than 10 in 100,000. On the other hand, based on limited studies, certain disorders, including Fanconi anemia (MIM # 227650), alpha/beta thalassemia (MIM # 604131; MIM # 613985), and familial hypercholesterolemia (MIM # 143890), have been reported to have high prevalence rates [22][23][24][25]. Analyzing CTGA entries also allows us to note several rare disorders with a remarkably high occurrence among Lebanese subjects. For instance, odontoonychodermal dysplasia (MIM # 257980) and Dyggve-Melchior-Clausen syndrome (MIM # 223800) have each been identified in eight different families to date, while 15 unrelated families have been reported with Berardinelli-Seip congenital lipodystrophy type 2 (MIM # 269700). In its various forms, predominantly type 1A (MIM # 220290), recessive deafness has been reported in subjects from over 30 Lebanese families. The presence of such rare genetic disorders, especially in large consanguineous kindreds, is very useful in the mapping of the causative loci and the identification of the causal gene [26][27][28]. In fact, our survey on the Lebanese population identified 58 separate genetic disorders that were first mapped in Lebanese families (Table 1). 
Despite this, 62 of the disorders reported in Lebanon in the CTGA Database remain unmapped. We collated the clinical features of all Lebanese subjects in the CTGA Database (Table 2). Reports of individual patients and familial studies frequently described intellectual disability, developmental delay, short stature, hearing impairment, and muscular hypotonia, among other features, highlighting the role of congenital malformations, in accordance with our ICD-10 classification. In contrast, an analysis of clinical manifestations in studies involving large groups of patients, mostly comprising association studies, brings out the impact of multifactorial and polygenic disorders that are common in the population (Table 2). For instance, diabetes, hypercholesterolemia, coronary artery disease, and neoplasms feature prominently. The most recent review of genetic disorders in Lebanon, which surveyed CTGA, OMIM, and the literature, found a total of 378 diseases reported in individuals of Lebanese origin [8]. In a preliminary 2017 CTGA analysis, only about half of these reported diseases had been molecularly diagnosed. In the current study, 78% of the 642 diseases we report have a molecular diagnosis. This rise in the number of reported diseases, as well as in the percentage of molecularly diagnosed cases, is likely due to the increased adoption of NGS by Lebanese clinics and diagnostic centers. In addition, from our own experience, revisiting NGS data has helped in identifying causal variants in previously undiagnosed cases [29]. However, there remain 131 diseases out of the 642 total diseases we report here from CTGA for which the Lebanese subjects lack a molecular diagnosis. The vast majority of these 131 disorders (82%) have purely clinical descriptions or reports with no molecular study attempted.
To date, CTGA hosts a total of 1316 variant records described in Lebanese subjects. As expected, the majority of variants entered into CTGA involved substitutions, followed by deletions, duplications, as well as in-del and insertion variants. Less frequently reported variants included haplotypes, involving up to three variants on one allele, and microsatellites (Figure 4A). Variant records were added to CTGA and screened against the dbSNP and ClinVar variant databases, revealing 150 CTGA-exclusive variants (Supplementary Table S2), 840 with both dbSNP and ClinVar records, 274 with dbSNP records only, and 28 with ClinVar records only (Figure 4B). Additionally, 222 variants were described in text summaries in disease and gene records, with over half of these involving copy number variations (54.8%), and the remaining entries reporting linkage studies, unspecified/whole gene deletions, gene rearrangements, inversions, as well as karyotypes. HLA and KIR alleles, as well as Gm and MHC class III allele variants, were excluded from this count.
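The screening of variant records against dbSNP and ClinVar amounts to sorting each record into one of four groups according to which external identifiers it carries. The following is a minimal sketch of that bookkeeping, not the curation procedure actually used for CTGA: the record layout (a dictionary with `dbsnp` and `clinvar` keys) and the function names are assumptions, and the records in the usage example are made-up placeholders rather than CTGA data.

```python
from collections import Counter


def classify(dbsnp_id, clinvar_id):
    """Name the group a variant falls into, given its external identifiers.

    A missing identifier is represented by None or an empty string.
    """
    if dbsnp_id and clinvar_id:
        return "both"
    if dbsnp_id:
        return "dbSNP only"
    if clinvar_id:
        return "ClinVar only"
    return "CTGA-exclusive"


def tally(variants):
    """Count variants per group; each variant is a dict with optional
    'dbsnp' and 'clinvar' keys (a hypothetical record layout)."""
    return Counter(classify(v.get("dbsnp"), v.get("clinvar")) for v in variants)


def exclusive_share(counts):
    """Fraction of tallied variants that have no dbSNP or ClinVar record."""
    total = sum(counts.values())
    return counts["CTGA-exclusive"] / total if total else 0.0
```

For example, calling `tally` on five placeholder records, of which two carry neither identifier, yields a `CTGA-exclusive` count of 2, and `exclusive_share` then returns 0.4.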
A more detailed look at the individual variants reported reveals several interesting results. The first of these is the presence of high prevalence variants reported in subjects of Lebanese origin. One such example is the p.Met1Ile mutation (rs587777839) in PET100. This variant has been identified in over 31 subjects exclusively from 12 different Lebanese families to date [30][31][32][33]. Another variant to note is the well-known p.Cys681X variant in the LDLR gene (rs121908031). Although it has been identified in various Arab and non-Arab nationalities, this variant, associated with Familial Hypercholesterolemia 1 (MIM # 143890), has been termed the Lebanese allele because of its high frequency in the Lebanese population [34,35] and even the Lebanese diaspora [36][37][38]. Evidence points towards founder mutation events driving the increased prevalence of these two variants. Another category comprises variants that, although reported in non-Arab subjects, have so far been reported among Arabs in CTGA only in Lebanese subjects. For instance, a mutation in SLC52A2 (rs398124641) leading to Brown-Vialetto-Van Laere Syndrome 2 (MIM # 614707) has been identified in two large unrelated Lebanese consanguineous families [39,40]. Another in BSCL2 (rs587777608) has been reported in five unrelated Lebanese families with Congenital Generalized Lipodystrophy 2 (MIM # 269700) [41,42]. A p.Met1Val mutation in DMP1 (rs104893834), associated with Autosomal Recessive Hypophosphatemic Rickets 1 (MIM # 241520), has been described in 14 subjects from at least three different Lebanese families [43][44][45].
Repositories of population-specific genetic variants are crucial in providing clinical interpretations of these variants for both rare and common genetic disorders [46]. This is especially true for Arab nations, which are burdened with a high incidence of rare genetic disorders and occurrence of founder mutations within their populations. Unfortunately, the Middle Eastern population is represented poorly in global variation databases, such as the Genome Aggregation Database [47]. Through continuous updates since its first release in 2005, CTGA is now the largest compendium of clinical genomic variants in Arab populations. Despite only hosting bibliographic data, 13% of the Lebanese variants within it are not found in either dbSNP or ClinVar, indicating its value for clinicians and researchers who deal with Arab patients (Supplementary Table S2).
CTGA is freely accessible online and researchers are encouraged to make use of the data available within it. As an example, here we show the type of data on ciliopathies in Lebanon that can be accessed from the database. CTGA contains information on 31 different ciliopathies that have been diagnosed in Lebanese subjects. These include several subtypes of Bardet-Biedl Syndrome, Leber Congenital Amaurosis and Usher Syndrome (Supplementary Table S3). Some of these ciliopathies are disorders that were first mapped in Lebanese families, such as Orofaciodigital Syndrome XIV (MIM # 615948), Short-Rib Thoracic Dysplasia 14 with Polydactyly (MIM # 616546), and Bardet-Biedl Syndrome 10 (MIM # 615987) [48][49][50]. On the other hand, the database also contains reports of rare ciliopathies in Lebanese subjects that are yet to be mapped, such as Ciliary Discoordination due to Random Ciliary Orientation (MIM # 215518) and Rhizomelic Dysplasia, Scoliosis, and Retinitis Pigmentosa (MIM # 610319) [51,52]. Of the 676 genes studied in Lebanese subjects, 28 (4.1%), carrying a total of 45 variants, are genes related to ciliopathies (Supplementary Table S3). Notably, seven of these variants, associated with subtypes of LCA, BBS, and Usher Syndrome, were absent from dbSNP and ClinVar.

Conclusions
In this report, we have attempted to provide an overview of the status of genetic disorders in Lebanon. High levels of consanguinity have been shown time and again to correlate with the spread of rare recessive disorders through inbred kindreds, especially in Arab populations [27][53][54][55]. In Lebanon, the relatively high level of consanguinity, coupled with a severe lack of genetic infrastructure within the country [8], has serious implications for the diagnosis and treatment of families affected with genetic disorders. There is thus an immediate need both to increase awareness among the general population of the consequences of inbreeding and familial genetic disorders, and to build advanced molecular diagnostic facilities. Simultaneously, despite improvement over the past 20 years, active effort needs to be made towards building an environment that further provides support for clinical research.
Population level screening programs are still in their infancy in Lebanon. Although privately operated neonatal screening programs exist, they are estimated to cover less than half of the newborn population [56]. At the same time, rare disease registries that can offer valuable insights to clinicians, pharmaceutical companies and families of affected patients are non-existent. Efforts towards initiating a comprehensive public neonatal screening program and establishing rare disease registries could go a long way towards reducing the burden of genetic disorders in the country.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/genes12101518/s1, Table S1: List of all disorders reported in Lebanese subjects on CTGA, with their OMIM numbers and associated genes where applicable, as well as mode of inheritance and WHO ICD-10 classification, Table S2: List of variants reported in Lebanese subjects in CTGA, but not in ClinVar or dbSNP, Table S3: List of ciliopathies, related genes and gene variants described in Lebanese subjects in CTGA.
Institutional Review Board Statement: Ethical review and approval were waived for this study, due to the bibliographic nature of the data analyzed and described in this report.

Informed Consent Statement: Not applicable.
Data Availability Statement: All data reported and analyzed in this study can be found at www.cags.org.ae/ctga-search.