Determination of Carrier Frequency of Actionable Pathogenic Variants in Autosomal Recessive Genetic Diseases in the Turkish Cypriot Population

Whole-exome DNA sequencing is a rich source of clinically useful information for specialists, patients, and their families, and it helps elucidate the genetic basis of monogenic and complex diseases in clinical diagnosis. However, interpreting and reporting the variants that emerge from exome and genome sequence analysis is one of the greatest challenges of the genomic era. In this study, we aimed to investigate the frequency and allele frequency spectrum of single nucleotide variants that confer recessive disease carrier status in Turkish Cypriot exomes. The same sequencing platform and data processing pipeline were used to analyze 100 Turkish Cypriot whole exomes. Identified variants were classified according to ACMG guidelines, and pathogenic variants were confirmed in other databases such as ClinVar, HGMD, and Varsome. Pathogenic variants were detected in 68 genes across the 100 whole-exome datasets. The carriage rate was highest for the CYP21A2 gene, which causes 21-hydroxylase deficiency (14.70%), followed by 11.76% for the HBB gene (β-thalassemia), 10.29% for the BTD gene (biotinidase deficiency), 8.82% for the CFTR gene (cystic fibrosis), 8.82% for the RBM8A gene (thrombocytopenia-absent radius syndrome, an ultra-rare disease), and 5.88% for the GAA gene (glycogen storage disease II). The carriage of pathogenic variants in other disease-causing genes (GJB2, PAH, GALC, CYP11B2, COL4A3, HBA1, etc.) was below 5.00%. The variants identified in each of these genes within the examined population are also reported. The most prevalent variant in North Cyprus was a missense variant in the CYP21A2 gene (c.1360C>T, p.Pro454Ser; rs6445), and the most frequently seen variant in the HBB gene was c.93-21G>A (rs35004220).
We investigated reported pathogenic variants by estimating the lower and upper limits of carrier and population frequencies for autosomal recessive diseases, for which exome sequencing may reveal additional medically relevant information. Determining the lower and upper limits of these frequencies will shed light on preventive medicine practices and governmental actions.


Introduction
Next-generation sequencing (NGS) technology, especially targeted sequencing, was introduced as a cost-effective, high-throughput method for human genome studies and clinical practice [1]. It is one of the most widely used methods because it can identify variants in both hotspot and non-hotspot regions of genes [2]. Its use has therefore raised the identification rate of disease-causing variants, paving the way for molecular diagnoses and more effective treatments, even for patients with rare diseases [3]. In addition, it enables expanded carrier screening in populations without omitting rare diseases; such screening tests individuals who have no apparent symptoms of a genetic disease but may carry a single variant allele in a gene or genes linked to a particular condition, mainly Mendelian disorders [4]. Carrier screening is important for preventing pregnancies at heightened risk of being affected by hereditary genetic conditions and for identifying individuals who have a genetic disorder with late or variable onset [5]. Numerous Mendelian disorders show autosomal recessive inheritance, and approximately 1875 identified protein-coding genes are associated with recessive diseases; however, this count may represent only around 20% of the estimated total, indicating that the majority of recessive diseases remain uncharted [6]. Because carriers of recessive disorders mostly have no clinical manifestations and no suggestive family history, most couples are unaware of their risk of having an affected child [7]. However, the risk of having an affected fetus rises if both partners carry the same pathogenic variant or two different pathogenic variants in the same gene, or if the female partner carries a disease-causing variant on her X chromosome [8]. Epidemiological data on numerous hereditary conditions, together with carrier testing, therefore help to prevent disease in future generations by enabling the genetic risk score to be estimated in particular ethnic groups even in the absence of affected cases [9]. Moreover, preconception carrier screening is cost-effective because of its role in preventing Mendelian diseases, especially rare ones [7]. The fundamental goal of carrier screening is to detect carriers, offer them genetic counseling and information about reproductive risks to support their reproductive decisions, and present options for assisted reproduction and prenatal testing, which are becoming more advanced, accurate, and rapid as genetic technologies progress [10]. Recently, advanced genetic tests based on NGS have facilitated expanded carrier screening; these tests can screen for hundreds of genetic disorders simultaneously, either as a panel (a selected set of genes associated with particular diseases) or by whole-exome sequencing (WES), which covers the exons of all genes rather than a limited group [6]. WES, as a major NGS technology, has gained increasing prominence in clinical applications and scientific investigations [5]. It has also been widely used for carrier screening because it provides a more comprehensive evaluation than targeted carrier screening tests: it examines a large portion of the identified protein-coding genes, identifies causative variants of genetic disorders effectively, and can find previously unknown pathogenic genes for monogenic diseases [10].
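The reproductive risk described above for couples in which both partners carry a pathogenic variant in the same autosomal recessive gene can be made concrete with a short calculation. The following is a minimal sketch, not a method taken from this study: it assumes random mating, Hardy-Weinberg proportions, and full penetrance, and the function name `couple_risk` and the 4% carrier frequency are purely illustrative.

```python
import math


def couple_risk(carrier_freq: float) -> dict:
    """Derive couple-level risks from a population carrier frequency.

    Assumptions (not from the source): random mating, Hardy-Weinberg
    proportions, full penetrance, one autosomal recessive gene.
    """
    if not 0.0 <= carrier_freq <= 0.5:
        raise ValueError("carrier frequency must lie in [0, 0.5]")
    # Carrier frequency is 2q(1 - q); take the smaller root for q.
    q = (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0
    # Probability that two unrelated partners are both heterozygous.
    both_carriers = carrier_freq ** 2
    # Each carrier parent transmits the variant with probability 1/2.
    affected_per_pregnancy = both_carriers / 4.0
    return {
        "allele_frequency": q,
        "both_partners_carriers": both_carriers,
        "affected_per_pregnancy": affected_per_pregnancy,
    }


# Illustrative input only: a 4% carrier frequency.
for name, value in couple_risk(0.04).items():
    print(f"{name}: {value:.6f}")
```

The per-pregnancy figure counts only carrier-by-carrier couples; consanguinity or population substructure would raise the real risk above this estimate.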
NGS is a multiplex technology in which different segments of DNA are sequenced simultaneously, producing billions of reads whose mapping yields massive amounts of data [3]. Various bioinformatics tools have been developed to manage these data accurately and adequately [2]. Despite the many guidelines and tools, interpreting variants and establishing a molecular diagnosis from sequencing data remain challenging [8]. One of the most useful factors in assessing a variant's potential pathogenicity is its allele frequency in the general population [4]. According to the ACMG-AMP guidelines, a high allele frequency (BA1 and BS1) and the presence of a variant in controls (BS2) shift the classification toward benign, whereas a higher frequency among affected individuals (PS4) and absence from controls (PM2) shift it toward pathogenic [11]. To determine allele frequencies in different populations, reference datasets such as 1000 Genomes, the Exome Variant Server (ESP), the Exome Aggregation Consortium (ExAC), and the Genome Aggregation Database (gnomAD) were created [12]. Despite the abundant samples in these projects, allele frequencies and disease prevalence remain unclear in many regions. Hence, many race-matched control studies have been conducted around the globe to fill this gap [1]. In this study, we aimed to estimate the prevalence of monogenic autosomal recessive diseases from the frequency of pathogenic alleles in the northern part of Cyprus using whole-exome sequencing.
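The frequency-based ACMG-AMP criteria mentioned above can be sketched as a simple rule set. This is an illustrative simplification rather than the classification procedure used in this study: the 5% stand-alone cut-off for BA1 follows the 2015 ACMG-AMP guideline, whereas BS1 needs a disease-specific maximum expected allele frequency that the caller must supply, and PS4 is omitted because it requires case-control data. The function and parameter names are hypothetical.

```python
from typing import List, Optional

BA1_THRESHOLD = 0.05  # stand-alone benign cut-off (allele frequency > 5%)


def frequency_criteria(
    population_af: Optional[float],
    max_expected_af: float,
    observed_in_healthy_controls: bool = False,
) -> List[str]:
    """Return the frequency-based ACMG-AMP codes triggered by a variant.

    population_af: highest allele frequency in reference datasets,
        or None if the variant is absent from them.
    max_expected_af: hypothetical disease-specific ceiling used for BS1.
    """
    if population_af is None or population_af == 0.0:
        # Absent from controls: moderate evidence toward pathogenicity.
        return ["PM2"]
    codes: List[str] = []
    if population_af > BA1_THRESHOLD:
        codes.append("BA1")
    elif population_af > max_expected_af:
        codes.append("BS1")
    if observed_in_healthy_controls:
        codes.append("BS2")
    return codes


print(frequency_criteria(None, max_expected_af=0.01))
print(frequency_criteria(0.08, max_expected_af=0.01))
print(frequency_criteria(0.02, max_expected_af=0.01,
                         observed_in_healthy_controls=True))
```

In practice these codes are combined with the non-frequency criteria before a final classification is assigned; the sketch covers only the frequency evidence.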

Demographics of Studied Subjects
The study group comprised 100 individuals (58 males and 42 females) who presented to Near East Hospital because of a myocardial bridge. Patients with known genetic syndromes such as Down syndrome, Turner syndrome, Andersen-Tawil syndrome, LEOPARD syndrome, and 22q11 deletion syndrome were excluded from this study. Informed consent was obtained from all participants, and the study protocol was in accordance with the Helsinki Declaration and approved by the institutional Ethics Committee (Approval number: YDU/2020/85-1210).

Genomic Analysis Workflow
Venous blood samples were obtained from the 100 participants, and genomic DNA was isolated (EZ1 Advanced XL Blood, QIAGEN, Hilden, Germany) from dry blood spots on filter cards (CentoCard) containing ethylenediaminetetraacetic acid (EDTA) for whole-exome sequencing analysis, as well as copy number and mitochondrial DNA analysis, following the manufacturer's instructions (QIAamp DNA Blood Mini QIAcube Kit, Qiagen, Valencia, CA, USA). Extracted DNA samples were stored at −20 °C. Prior to sequencing, DNA quality and concentration were determined for each individual using a photometric spectrometer (OD260/OD280 ratio of 1.8-2.0).
Whole-exome sequencing was procured as a service from CENTOGENE® (Rostock, Germany). Genomic DNA was enzymatically digested, and target sites were enriched using DNA capture probes. The target region (~<98% of GRCh37/hg19) covered approximately 41 Mb of the human coding region, ±20 flanking intronic nucleotides of genes, and the mitochondrial genome. The generated library was sequenced on an Illumina platform to achieve a depth of at least 20×. The company used its own in-house bioinformatics pipeline, which comprised read alignment to the GRCh37/hg19 genome assembly and the revised Cambridge Reference Sequence (rCRS) of human mitochondrial DNA (NC_012920), variant calling, annotation, and extensive variant filtering. Any variant with a minor allele frequency (MAF) below 1% that was registered as disease-causing in other databases was reported. Variants were categorized according to ACMG guidelines (pathogenic; likely pathogenic; variant of unknown significance (VUS); likely benign; benign) [12]. Whole-exome sequencing (WES) analysis does not cover larger deletions/duplications involving intron-exon boundaries, reamplification, or methylation abnormalities. The .vcf files are available upon request from mahmutcerkez.ergoren@neu.edu.tr.
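The reporting rule stated above (MAF below 1% and registered as disease-causing) and the five ACMG categories can be expressed as a small data model. This is a hedged sketch only: the vendor's in-house pipeline is not described in the source, so the `Variant` fields, the function names, and the demonstration records (`GENE_A` and so on, with made-up frequencies) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class AcmgClass(Enum):
    PATHOGENIC = "pathogenic"
    LIKELY_PATHOGENIC = "likely pathogenic"
    VUS = "variant of unknown significance"
    LIKELY_BENIGN = "likely benign"
    BENIGN = "benign"


@dataclass(frozen=True)
class Variant:
    gene: str
    hgvs: str
    maf: float            # minor allele frequency in reference data
    in_disease_db: bool   # registered as disease-causing elsewhere
    acmg: AcmgClass


MAF_CUTOFF = 0.01  # "MAF below 1%" as stated in the text


def is_reportable(v: Variant) -> bool:
    """Apply the reporting rule: rare and registered as disease-causing."""
    return v.maf < MAF_CUTOFF and v.in_disease_db


def carrier_findings(variants: Iterable[Variant]) -> List[Variant]:
    """Keep reportable variants classified pathogenic or likely pathogenic."""
    keep = {AcmgClass.PATHOGENIC, AcmgClass.LIKELY_PATHOGENIC}
    return [v for v in variants if is_reportable(v) and v.acmg in keep]


# Hypothetical records for illustration; not data from this study.
demo = [
    Variant("GENE_A", "c.100A>G", 0.002, True, AcmgClass.PATHOGENIC),
    Variant("GENE_B", "c.200C>T", 0.030, True, AcmgClass.PATHOGENIC),  # too common
    Variant("GENE_C", "c.300G>A", 0.001, True, AcmgClass.VUS),         # not P/LP
]
print([v.gene for v in carrier_findings(demo)])  # prints ['GENE_A']
```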

Results
WES, including next-generation sequencing (NGS)-based copy number variation (CNV) analysis, was performed on 100 participants. The targeted nucleotide coverage was ≥20×, spanning ~99.35% of the regions of interest. The patient group consisted of 42% females and 58% males. The ethnic origin distribution was 45.5% Turkish, 53.5% Turkish Cypriot, and 1% Turkoman (Table 1). First, variants with high allele frequencies in databases such as the 1000 Genomes Project (1KGP) (2500 samples; http://www.1000genomes.org, accessed on 13 September 2022), the Exome Variant Server (ESP) (6500 WES samples; https://evs.gs.washington.edu/EVS/), and the Exome Aggregation Consortium (ExAC) database (61,468 multiethnic individuals) were filtered out. We then focused on variant types that affect protein structure (nonsense, frameshift, conserved splice site, and missense), with supporting evidence on zygosity, segregation, and the functional importance of the gene. Genes were selected based on OMIM® phenotypes and variant databases; variants associated with severe, early-onset disease and reported as "pathogenic" or "likely pathogenic" were identified and listed. Variants in genes associated with late-onset diseases of unclear penetrance and/or in cancer-related genes with adult onset were not included.
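The first two filtering steps described above, removing variants that are common in any reference database and retaining protein-altering consequences, might look like the following sketch. The source does not state the frequency cut-off used at this step, so the 1% default is an assumption borrowed from the reporting rule in the methods; the database keys, consequence labels, and function names are likewise hypothetical.

```python
from typing import Dict, Optional

POPULATION_DATABASES = ("1KGP", "ESP", "ExAC")
PROTEIN_ALTERING = {"nonsense", "frameshift", "splice_site", "missense"}


def highest_population_af(af_by_db: Dict[str, Optional[float]]) -> float:
    """Return the largest allele frequency seen across the databases.

    A missing key or a None value means the variant is not listed there.
    """
    observed = [
        af_by_db[db]
        for db in POPULATION_DATABASES
        if af_by_db.get(db) is not None
    ]
    return max(observed, default=0.0)


def passes_filters(
    consequence: str,
    af_by_db: Dict[str, Optional[float]],
    max_af: float = 0.01,  # assumed cut-off, not stated for this step
) -> bool:
    """Keep protein-altering variants that are rare in every database."""
    return (
        consequence in PROTEIN_ALTERING
        and highest_population_af(af_by_db) < max_af
    )


print(passes_filters("missense", {"1KGP": 0.001, "ESP": None, "ExAC": 0.004}))
print(passes_filters("missense", {"1KGP": 0.001, "ExAC": 0.02}))
```

Taking the maximum across databases is the conservative choice: a variant that is common in any one reference population is filtered out even if it is rare in the others.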
Table 2 shows the carriage rates of pathogenic and/or likely pathogenic variants relevant to Mendelian diseases among the 100 participants. In total, 14.7% of the studied population were carriers of CYP21A2 gene variants, which are associated with autosomal recessive congenital adrenal hyperplasia due to 21-hydroxylase deficiency (OMIM®: 201910); 11.76% carried HBB gene variants relevant to autosomal recessive β-thalassemia (OMIM®: 613985), which is the most common carrier pattern in Cyprus; 10.29% carried BTD gene variants that cause biotinidase deficiency (OMIM®: 253260); 8.82% carried CFTR gene variants that cause cystic fibrosis (OMIM®: 602421); 8.82% carried RBM8A gene variants that cause thrombocytopenia-absent radius syndrome (OMIM®: 605313); 5.88% carried GAA gene variants that cause glycogen storage disease II (OMIM®: 232300); and 4.41% each carried GJB2 and PAH gene variants, which cause autosomal recessive deafness 1A (OMIM®: 220290) and phenylketonuria (OMIM®: 261600), respectively. The carrier rate was 2.94% for the ATP7B, GALC, PYGM, COL4A3, CYP11B2, ECHS1, HBA1, LAMA2, and POLR3A genes, which cause Wilson disease (OMIM®: 277900), Krabbe disease (OMIM®: 245200), McArdle disease (OMIM®: 232600), Alport syndrome 2 (OMIM®: 203780), hypoaldosteronism (OMIM®: 203400 and 610600), mitochondrial short-chain enoyl-CoA hydratase 1 deficiency (OMIM®: 616277), α-thalassemia (OMIM®: 604131), muscular dystrophy (OMIM®: 607855 and 618138), and leukodystrophy or Wiedemann-Rautenstrauch syndrome (OMIM®: 607694 and 264090), respectively. The carrier rate for the OPHN1 gene, which is responsible for X-linked syndromic intellectual developmental disorder (OMIM®: 300486), was also estimated as 2.94%. X-linked G6PD-deficient (favism) hemolytic anemia (OMIM®: 611162), which is caused by the G6PD gene, and many other conditions listed in Table 2 were found at 1.47% in the studied population.
Table 3 presents the mutations observed in each gene in the study population, with details such as nucleotide change, amino acid alteration, and SNP IDs. Among all 84 pathogenic and/or likely pathogenic variants encountered in the 64 genes identified in this study, missense mutations were the most frequent. Frameshift, nonsense, splice-site, and other mutations were also noted. For some genes only one variant was found in carriers, whereas for others there was more than one. These mutations were as follows: five mutations in the CYP21A2 gene (c.850A>G/p.Met284Val, c.844G>T/p.Val282Leu, c.293-13C>G, c.1174 […]
[…] prevention of more than 100 genetic conditions with recessive inheritance patterns [5,10]. In the current study, we aimed to investigate the frequency and allele frequency spectrum of single nucleotide variants that confer recessive disease carrier status in Northern Cyprus using whole-exome sequencing. The use of WES will enhance the detection rate for numerous disorders and gene variants. WES raw data were analyzed, and all pathogenic and likely pathogenic variants associated with severe, early-onset disease were classified. Participants who consent to be informed, and their families, will receive information and genetic counseling about identified variants that indicate additional genetic risks or diagnoses. The highest carrier rate (14.7%) was observed for variants in the CYP21A2 gene, which is associated with autosomal recessive congenital adrenal hyperplasia due to 21-hydroxylase deficiency. According to Phedonos et al. (2013), who reported on the carrier frequency of CYP21A2 mutations in Cyprus, between 1 in 25 and 1 in 10 newborns carry a mutation in this gene. Baumgartner-Parzer and colleagues, who screened newborns in their 2005 study, documented a CYP21A2 carrier frequency of 9.5% in the middle European population. Furthermore, Gialluisi et al.
(2017) conducted a study in two distinct regions of Italy, in which they calculated the prevalence of a mutation in the CYP21A2 gene and found it to be high. Thus, our findings align with prior research indicating a heightened prevalence of CYP21A2 gene variants in regions such as Cyprus, Italy, middle Europe, and Turkey [13-16]. Typically, the severity of congenital adrenal hyperplasia corresponds to the combination of more or less severe mutations (homozygous or compound heterozygous); the disease therefore encompasses a broad spectrum of manifestations, which could have significant implications for genetic and prenatal counseling [15,16]. Screening individuals in this population for CYP21A2 gene variants should therefore be recommended in the future. On the other hand, even though individuals who are heterozygous for CYP21A2 gene mutations do not exhibit clinical symptoms, they still possess a well-defined phenotype. Nordenström et al. (2019) compared mortality rates and causes of death between CYP21A2 mutation carriers and population controls and suggested that carriers might experience lower mortality when facing severe infections, possibly pneumonia in particular. One theory proposes that an enhanced ability to produce cortisol in acute circumstances could explain the potential evolutionary benefit of carrying a CYP21A2 mutation; this may have conferred a survival advantage and could explain the widespread prevalence of CYP21A2 carriers across the globe [17]. Because an inactive pseudogene (CYP21A1P), with 98% similarity to the functional CYP21A2 gene in exonic sequences and 96% in introns, lies within the major human histocompatibility complex (HLA) approximately 30 kb away from the CYP21A2 gene, correct genotyping and distinguishing these genes using PCR-based methods are challenging. Owing to the high level of homology, most primers amplify both genes at the same time [18]. Although genotype-phenotype correlation analysis can help to recognize false-positive cases and reduce misdiagnosis, it was impractical given that our research concerned carriers. Segregation analysis and the study of flanking microsatellites in family members can therefore help to reduce this problem [16].
Not surprisingly, β-thalassemia carriers accounted for the second-highest carrier rate (11.76%). The findings show a lower frequency of HBB mutations than CYP21A2 mutations in the Turkish Cypriot population residing in Northern Cyprus, which may be the result of the premarital screening program for thalassemia that started in 1984. According to Bozkurt (2007), this program significantly reduced the incidence of thalassemia between 1991 and 2001 and between 2002 and 2007, and no babies with thalassemia have been born in Northern Cyprus since then [19]. Accordingly, guidelines for carrier screening of congenital adrenal hyperplasia, together with preconception and prenatal screening programs for CYP21A2 mutations, appear to be needed in Northern Cyprus in view of the preventive medicine strategies on the island. It also seems reasonable to offer CYP21A2 mutation screening to families before gamete donation or adoption. Despite the high carrier frequency of familial Mediterranean fever in the Turkish population [20], no variants in the MEFV gene were observed in our study in Northern Cyprus. This may be the result of a founder effect in the Turkish Cypriot population, but more data are required to confirm it.
Furthermore, in this study, the prevalent mutations of each gene were detected in carriers which can offer several benefits and are essential for personalized medicine, improving the overall health and wellbeing of individuals and communities.Gathering data on diverse gene mutations, single nucleotide polymorphisms, or copy number variations has expedited research by identifying target areas for genetic research, leading to the development of new therapies, drugs, and interventions.Likewise, it allows for better and cost-effective healthcare planning and the management of genetic conditions by focusing resources on the most prevalent genetic conditions and reducing unnecessary testing and treatments.This information supports the training and education of healthcare professionals and helps them to be more prepared to diagnose, treat, and manage individuals with these mutations, which can lead to the early detection of genetic disorders, timely interventions and treatments, suggesting proper prenatal tests, or considering alternative reproductive options.In addition, these data assist public health organizations in implementing preventive measures, such as screening programs or genetic counseling services, to reduce the burden of genetic diseases.Additionally, the outcomes of this study can aid individuals and families to make informed decisions about their health and genetic risks and reduce the stigma associated with genetic disorders, as it becomes a more widely understood aspect of a population's genetic makeup.
Some limitations of this study may lead to either underestimation or overestimation of our results. Carrier numbers are probably underestimated because of the small size of the examined population, which may not accurately represent the entire Turkish Cypriot population, and because of the absence of whole-genome sequencing (WGS) data and the resulting lack of information on other mutations, such as promoter and intronic mutations. Another issue is that our estimates are restricted to recognized pathogenic or likely pathogenic variants: unidentified variants are missed, which underestimates the carrier frequency, whereas incorrect classification of some variants as pathogenic or likely pathogenic would overestimate it, even though we adhered to the ACMG criteria for classifying all variants. Additionally, the classification of variants of uncertain significance (VUS) is a challenge that may persist until there is sufficient evidence linking these variants to disease; carrier frequency may be underestimated when certain potentially pathogenic variants are classified as VUS because of their high frequency or the absence of supportive clinical data. Moreover, existing estimates may capture only a fraction of the total occurrence of recessive diseases, as statistical projections suggest that some of these conditions remain unidentified or undescribed.
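The effect of the small sample size noted above can be illustrated by attaching a confidence interval to a carrier proportion. The sketch below uses the Wilson score interval, which is one common choice and is not stated to be the method of this study. The counts in the example (10 carriers among 68) are an assumption: they reproduce the reported 14.7% rate (10/68 ≈ 0.147), but the source does not give the underlying counts.

```python
import math
from typing import Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion k/n (z = 1.96 for ~95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Assumed counts for illustration only (see the lead-in).
low, high = wilson_interval(10, 68)
print(f"carrier rate {10 / 68:.3f}, 95% interval {low:.3f} to {high:.3f}")
```

Under these assumed counts the interval spans roughly 8% to 25%, which shows how wide the plausible range around a single-gene carrier rate remains at this sample size.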
Most of the carrier statuses were relevant to autosomal recessive monogenic disorders; therefore, genetic counseling would be beneficial for the carriers' families. However, the COL4A3 gene variants are also associated with autosomal dominant disorders; thus, these patients should be re-examined for a possible clinical phenotype, taking into account the degree of penetrance in autosomal dominant disorders. These variants may help to close a potential diagnostic gap regarding the current clinical picture, but this ethnicity-based information should be used with caution because of challenges such as individuals of mixed ethnicity, adoptive backgrounds, or unknown ancestral heritage. The information may be used in further differential diagnosis and in orthogonal validation of relevant variants; it can also ease the distress experienced by families during the initial phase of diagnosis and inform relatives about their elevated genetic risks. Couples facing genetic risks can explore alternative ways of starting a family that align with their religious and ethical beliefs. This study is also the first WES study performed in the Turkish Cypriot population; therefore, the results, especially for autosomal recessive diseases and their carrier status, have public health value in shedding light on preventive medicine practices and aiding in

Table 1. General demographics of the studied group.

Table 2. Carriage rate of pathogenic and/or likely pathogenic variants found in genes relevant to Mendelian diseases in 100 participants.

Table 3. Pathogenic and/or likely pathogenic variants found in each gene in the study population.