Insight into the Natural History of Pathogenic Variant c.919-2A>G in the SLC26A4 Gene Involved in Hearing Loss: The Evidence for Its Common Origin in Southern Siberia (Russia)

Pathogenic variants in the SLC26A4 gene leading to nonsyndromic recessive deafness (DFNB4), or Pendred syndrome, are some of the most common causes of hearing loss worldwide. Earlier, we found a high proportion of SLC26A4-related hearing loss with prevailing pathogenic variant c.919-2A>G (69.3% among all mutated SLC26A4 alleles that have been identified) in Tuvinian patients belonging to the indigenous Turkic-speaking Siberian people living in the Tyva Republic (Southern Siberia, Russia), which implies a founder effect in the accumulation of c.919-2A>G in Tuvinians. To evaluate a possible common origin of c.919-2A>G, we genotyped polymorphic STR and SNP markers, intragenic and flanking SLC26A4, in patients homozygous for c.919-2A>G and in healthy controls. The common STR and SNP haplotypes carrying c.919-2A>G were revealed, which convincingly indicates the origin of c.919-2A>G from a single ancestor, supporting a crucial role of the founder effect in the c.919-2A>G prevalence in Tuvinians. Comparison analysis with previously published data revealed the identity of the small SNP haplotype (~4.5 kb) in Tuvinian and Han Chinese carriers of c.919-2A>G, which suggests their common origin from founder chromosomes. We assume that c.919-2A>G could have originated in the geographically close territories of China or Tuva and subsequently spread to other regions of Asia. In addition, the time intervals of the c.919-2A>G occurrence in Tuvinians were roughly estimated.


Introduction
Pathogenic variants in the SLC26A4 gene (solute carrier family 26, member 4/pendrin, 7q22.3, OMIM 605646) are one of the most common causes of hearing loss worldwide. The SLC26A4 gene (21 exons) encodes the protein pendrin, which is involved in the transport of various anions [1]. High levels of SLC26A4 expression are observed in the inner ear, thyroid, and kidneys [2]. Several hundred pathogenic SLC26A4 variants (the Deafness Variation Database: https://deafnessvariationdatabase.org/gene/SLC26A4, accessed on 13 February 2023) are currently known to be associated with varying phenotypes. They can lead to nonsyndromic recessive deafness (DFNB4) or Pendred syndrome. The DFNB4 (OMIM 600791) is characterized by the prelingual or perilingual onset of sensorineural or mixed hearing loss, which may be fluctuating or progressive. Pendred syndrome (PDS, OMIM 274600) is an autosomal recessive disorder associated with sensorineural hearing loss and goiter. In the inner ear, deficiency or dysfunction of pendrin presumably leads to the development of endolymphatic hydrops due to defects in anion and fluid transport. As a result, SLC26A4-related hearing loss is most commonly accompanied by the enlarged vestibular aqueduct (EVA) and/or other malformations of the inner ear structures [3]. which was represented by unrelated healthy individuals without c.919-2A>G (n = 63). Both samples were formed after our recent analysis of the SLC26A4 gene in Tuvinians belonging to indigenous population of the Tyva Republic (Southern Siberia, Russia) [27].

Ethics Statement
Written informed consent was obtained from all individuals or their legal guardians before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Bioethics Commission at the Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia (Protocol No. 9, 24 April 2012).
The location of all analyzed genetic markers (STRs and SNPs) on chromosome 7 is presented in Figure 1.
The STRs genotyping was performed by fragment analysis. To amplify fragments containing STRs, the primer sequences were taken from the Ensembl genome browser (http://www.ensembl.org, accessed on 15 September 2022). One (forward) from each primer pair was marked on the 5 end by different fluorescent dyes (Applied Biosystems 5 Labeled/Unlabeled Primer Pairs, Thermo Fisher Scientific, Waltham, MA, USA). The SNP genotyping was performed by Sanger sequencing. To amplify fragments containing SNPs, the primer pairs were designed using Primer Premier 5 tools (https://www. bioprocessonline.com/doc/primer-premier-5-design-program-0001). All used primers are summarized in Table S2 The STRs genotyping was performed by fragment analysis. To amplify fragments containing STRs, the primer sequences were taken from the Ensembl genome browser (http://www.ensembl.org, accessed on 15 September 2022). One (forward) from each primer pair was marked on the 5′ end by different fluorescent dyes (Applied Biosystems 5´ Labeled/Unlabeled Primer Pairs, Thermo Fisher Scientific, Waltham, MA, USA). The SNP genotyping was performed by Sanger sequencing. To amplify fragments containing SNPs, the primer pairs were designed using Primer Premier 5 tools (https://www.bioprocessonline.com/doc/primer-premier-5-design-program-0001). All used primers are summarized in Table S2. Fragment analysis and Sanger sequencing were performed in the SB RAS Genomics Core Facility (Institute of Chemical Biology and Fundamental Medicine SB RAS, Novosibirsk, Russia).

Reconstruction of STR and SNP Haplotypes
The STR and SNP genotyping data were used for the reconstruction and calculation of haplotype frequencies performed by the Expectation-Maximization (EM) algorithm of the Arlequin 3.5.1.2 software [34]. Linkage disequilibrium between the STR and SNP alleles and the c.919-2A>G variant was calculated using δ = (Pd − Pn)/(1 − Pn), where δ is the measure of linkage disequilibrium; Pd is the marker allele frequency among mutant chromosomes carrying c.919-2A>G (the sample of patients homozygous for c.919-2A>G); and Pn is the frequency of the same allele among normal chromosomes (control sample) [35].

Estimation of c.919-2A>G Age
The estimation of the c.919-2A>G age was performed by the DMLE+ v2.3 software method (Disequilibrium Mapping and Likelihood Estimation, http://dmle.org/) [36] and by the single-marker method using the algorithm [37] where g is the number of generations passed from the moment of the mutation appear- Figure 1. Schematic structure of the SLC26A4 gene and the location of genetic markers (five STRs and nine SNPs) that were used to reconstruct the c.919-2A>G haplotypes. Location of SLC26A4 gene is shown by red square. The c.919-2A>G variant is marked by red color. Four of SNP markers from the study by Wu et al. [12] are marked by blue color.

Reconstruction of STR and SNP Haplotypes
The STR and SNP genotyping data were used for the reconstruction and calculation of haplotype frequencies performed by the Expectation-Maximization (EM) algorithm of the Arlequin 3.5.1.2 software [34]. Linkage disequilibrium between the STR and SNP alleles and the c.919-2A>G variant was calculated using δ = (Pd − Pn)/(1 − Pn), where δ is the measure of linkage disequilibrium; Pd is the marker allele frequency among mutant chromosomes carrying c.919-2A>G (the sample of patients homozygous for c.919-2A>G); and Pn is the frequency of the same allele among normal chromosomes (control sample) [35].

Estimation of c.919-2A>G Age
The estimation of the c.919-2A>G age was performed by the DMLE+ v2.3 software method (Disequilibrium Mapping and Likelihood Estimation, http://dmle.org/) [36] and by the single-marker method using the algorithm [37] where g is the number of generations passed from the moment of the mutation appearance to the present; Q is the share of mutant chromosomes unlinked with the founder haplotype; Pn is the population frequency of allele included in the founder haplotype; and Ѳ is the recombinant fraction calculated from physical distance between marker and mutation (under the assumption that 1 cM = 1000 kb). (See details in Supplementary Materials File S1).

Statistical Analysis
Fisher's exact test with a significance level of p < 0.05 was applied to compare the allele and haplotype frequencies between the examined samples of patients and controls.

Results
We assumed that the high prevalence of c.919-2A>G in the SLC26A4 gene in Tuvinians is a consequence of the founder effect. To test whether c.919-2A>G shares a common haplotype, we performed genotyping of polymorphic genetic markers: five STRs (four of them are flanking the SLC26A4 gene and one is intragenic) and nine intragenic SNPs closely linked to c.919-2A>G in 23 unrelated Tuvinian patients homozygous for c.919-2A>G. We also genotyped the same genetic markers in 63 healthy unrelated Tuvinians without c.919-2A>G. The results of the STR and SNP genotyping in the sample of the c.919-2A>G homozygotes and in the control sample are summarized in Tables S3 and S4.

STR and SNP Haplotypes
STR haplotypes. Data on genotyping of five STRs (D7S2420, D7S496, D7S2459, D7S2456, and D7S525) were used to reconstruct STR haplotypes by the Expectation-Maximization (EM) algorithm of the Arlequin 3.5.1.2 software [34] both in Tuvinian deaf patients homozygous for c.919-2A>G and in the ethnically matched control sample. The boundaries of the shared STR haplotypes were determined by observed linkage disequilibrium between certain alleles of distal markers (D7S2420 and D7S525) and c.919-2A>G (Table S3). The total length of the region flanked by D7S2420 and D7S525 is~2.8 Mb. Four different haplotypes formed by specific alleles of all five STRs were found to be associated with c.919-2A>G in homozygotes for c.919-2A>G, while none of these haplotypes were detected in the control sample (Table 1). Among these haplotypes, the 278-120-147-244-227 haplotype was the most common (91.3%) among mutant chromosomes bearing c.919-2A>G.  Designations of the STR alleles included in haplotypes correspond to the size of the PCR products (in nucleotides). The most common haplotypes are shown in bold. *-rs2712212, rs2395911, rs2712211, and rs3801940 correspond to SNPs analyzed in the study by Wu et al. [12]. The haplotype A-G-G-C (rs2712212*-rs2395911*-rs2712211*-rs3801940*) is underlined. Its allelic composition corresponds to the core haplotype T-C-C-G in the study by Wu et al. [12]. Statistically significant (p < 0.05) differences in haplotype frequencies are in bold.
Four of the SNPs (rs2712212, rs2395911, rs2712211, and rs3801940) were early analyzed in the c.919-2A>G carriers from Taiwan in the study by Wu et al. [12], where the core haplotype T-C-C-G composed of certain alleles of these SNPs (designated in their study as JST160568, JST089508, JST160566, and JST160565, respectively) was revealed in a majority of chromosomes of the c.919-2A>G homozygotes, favoring the origin of c.919-2A>G from a common ancestor. In our study, when considering a haplotype constituted by these SNPs, a single haplotype (corresponding to T-C-C-G in the study by Wu et al. [12]) was found in

Estimation of c.919-2A>G Age
Common STR and SNP haplotypes found for pathogenic SLC26A4 variant c.919-2A>G, which is predominant in Tuvinians, suggest that c.919-2A>G originated from a single ancestor. We tried to evaluate the approximate "age" of c.919-2A>G by estimation of the numbers of generations (g) and years (with the assumption that g = 25 years) passed from the ancestral mutation event by the single-marker method (when appropriate) and by the DMLE+ v.2.3 program (Supplementary Materials File S1). Allele 227 of the distal STR marker D7S525 (~2.32 Mb from c.919-2A>G), which was found in strong linkage disequilibrium with c.919-2A>G (Table S3), was used when applying the single-marker method. We were not able to apply the single-marker method for SNP markers due to the lack of recombination in all SNPs analyzed (Table S4). The results of the c.919-2A>G age evaluation are summarized in Table 2. The c.919-2A>G age estimations gave three time intervals depending on different population growth rates (d = 0.05, 0.1, and 0.2) that we applied for calculations (Table 2), thus demonstrating the sensitivity of the methods used from demographic parameters (Supplementary Materials File S1). In addition, we also calculated (using the DMLE+ v.2.3 program) the age of c.919-2A>G based on the SNP internal haplotype A-G-G-C (rs2712212-rs2395911-rs2712211-rs3801940) ( Table 1). The variations in the c.919-2A>G age in that case, being 106-182 generations (2650-4550 years), 112-192 (2800-4800 years), and 105-188 generations (2625-4700 years) with d = 0.05, 0.1, and 0.2, respectively, indirectly indicate a more ancient age of this haplotype ( Table 2).

Discussion
Understanding the regional or ethnospecific landscape of different pathogenic SLC26A4 variants is still far from clear due to the heterogeneity in size and phenotypic characteristics of the examined cohorts of patients and the variable sensitivity of the SLC26A4 molecular diagnostics in different studies. In particular, the proportion of c.919-2A>G, a well-known pathogenic variant, among other mutant alleles of the SLC26A4 gene found in different cohorts of patients with SLC26A4-related hearing loss, remains often uncertain. To assess such data worldwide, we reviewed the literature and selected relevant papers according to the following main criteria: the methodology of the SLC26A4 gene analysis, implying sequencing of all coding exons of SLC26A4 with flanking regions, which allowed us to conclude the presence or absence of c.919-2A>G in the studied patients (at that, more than two unrelated families were studied) and a mandatory indication of the territorial affiliation and/or the ethnicity of patients. In addition, if the required information was not available, we calculated ourselves the proportion of alleles carrying c.919-2A>G among all mutant SLC26A4 alleles identified in patients. Based on the obtained data, we came to the conclusion that the spatial distribution of c.919-2A>G can be limited to the territory of Eurasia, since c.919-2A>G was not found on other world continents, as follows from the relevant studies [38][39][40][41][42][43][44][45][46]. Figure 2 represents a hot map demonstrating the c.919-2A>G prevalence in patients with SLC26A4-related hearing loss in Eurasia.
vant papers according to the following main criteria: the methodology of the SLC26A4 gene analysis, implying sequencing of all coding exons of SLC26A4 with flanking regions, which allowed us to conclude the presence or absence of c.919-2A>G in the studied patients (at that, more than two unrelated families were studied) and a mandatory indication of the territorial affiliation and/or the ethnicity of patients. In addition, if the required information was not available, we calculated ourselves the proportion of alleles carrying c.919-2A>G among all mutant SLC26A4 alleles identified in patients. Based on the obtained data, we came to the conclusion that the spatial distribution of c.919-2A>G can be limited to the territory of Eurasia, since c.919-2A>G was not found on other world continents, as follows from the relevant studies [38][39][40][41][42][43][44][45][46]. Figure 2 represents a hot map demonstrating the c.919-2A>G prevalence in patients with SLC26A4-related hearing loss in Eurasia.   [4,12,[47][48][49][50][51][52][53][54][55]. The c.919-2A>G variant is observed with frequency in the range of 60-70% in Mongolian patients from Mongolia and Mongolians living in the northwest of China [22,25]. In Korea and Japan, the c.919-2A>G appears to be the second-most common, by frequency, pathogenic SLC26A4 variant (after c.2168A>G (p.His723Arg)) in patients with hearing loss, and its frequency falls within 20-40% in Korea and 5-10% in Japan, respectively [5,11,19,56,57]. In Thailand, c.919-2A>G was found in one third of all mutated SLC26A4 alleles in a small sample of patients with Pendred syndrome [58]. The c.919-2A>G has also been found in several Iranian families [9], as well as in Turkish families; thus, the "area" of c.919-2A>G apparently extends to Turkey as a result of historical migration of Turks from Central Asia to Anatolia [59].
The detection of c.919-2A>G in multiple patients from different Asian populations suggests that it might have arisen on a common ancestral founder chromosome. To our knowledge, there are only a few studies aimed at confirming this hypothesis by analyzing the genetic background (haplotypes) of c.919-2A>G [11,12,18,60]. The study by Park et al. [11] was the first study where haplotypes bearing c.919-2A>G were analyzed: three STRs (D7S496, D7S2459, and D7S2456) were used for haplotype analysis in several probands of different ethnicities (Korean, Chinese, and Japanese) who were homozygous or heterozygous for c.919-2A>G. The authors did not reveal a strong association of certain STR alleles with c.919-2A>G on different chromosomes and suggested that c.919-2A>G may be an older founder mutation that has undergone ancestral recombination events with the flanking STR markers. Nevertheless, they did not rule out that c.919-2A>G is a hot spot for recurrent mutational events, despite this allele not being observed in western populations [11]. Subsequently, Yang et al. (2005) analyzed the c.919-2A>G associated haplotypes by the genotyping of five STRs (D7S2549, D7S2420, D7S496, D7S2459, and D7S2456) in four Taiwanese families [18]. Haplotype analysis showed a significant haplotype between markers D7S2420 and D7S2456 common to the family members carrying c.919-2A>G, suggesting that they may be derived from a common ancestor [18]. In the study by Reiisi et al. (2014), different STR haplotypes (defined by the specific alleles of D7S2420, D7S496, D7S2459, and D7S2456) were revealed in two Iranian families carrying SLC26A4 variants c.919-2A>G or c.416G>T (p.Gly139Val) in each of them: 2-2-1-2 for the c.919-2A>G-associated haplotype in one family and 1-1-1-1 for the c.416G>T (p.Gly139Val)associated haplotype in another family [60]. In the study by Wu et al. (2005), the evidence of a common ancestral origin for c.919-2A>G was also obtained, since on the majority of chromosomes with c.919-2A>G in patients homozygous or heterozygous for c.919-2A>G from Taiwan (Han Chinese), the core haplotype consisting of four SNPs closely flanking c.919-2A>G (JST160568, JST089508, JST160566, and JST160565) was revealed [12].
In our recent study [27], we revealed a high rate of the SLC26A4-related hearing loss in Tuvinian patients belonging to indigenous Siberian people living in Southern Siberia (Russia). At that, we found that the frequency of c.919-2A>G reaches 69.3% among all SLC26A4 mutant alleles identified in Tuvinian patients, which allowed us to suggest a role of the founder effect in the accumulation of c.919-2A>G in these indigenous Siberian people.
Thus, based on the common STR and SNP haplotypes bearing c.919-2A>G found in Tuvinians, we obtained convincing evidence supporting the origin of c.919-2A>G from a single ancestor, and the observed accumulation of c.919-2A>G in this indigenous Siberian people may be explained by the founder effect.
In addition, we roughly estimated the potential time intervals of the c.919-2A>G occurrence in the Tuva. As far as we know, there are no age estimations for any pathogenic variants of the SLC26A4 gene yet, and the age of c.919-2A>G was evaluated by us for the first time. It is worth noting that the methods applied for the estimation of the age of mutation are sensitive to demographic parameters (Supplementary Materials File S1) [36,37,[65][66][67]. In view of the lack of reliable data on the variation of the population size of Tuvinians throughout their history, the time of the c.919-2A>G occurrence in the Tuva territory should be considered only as an approximate value. Nevertheless, the partly overlapping time intervals obtained at different population growth rates (d = 0.05, d = 0.1, and d = 0.5) are almost coincided for STR markers (2575-4950 years, 1575-2675 years, and 875-1475 years) and SNP markers (2275-4775 years, 1325-2575 years, and 725-1350 years) ( Table 2). Now, Tuvinians live mainly in the Tyva Republic (Tuva) located in Southern Siberia (Russia), which is bordered by Mongolia in the south and the east. Besides the Tyva Republic, relatively small groups of Tuvinians also live in the northern part of Mongolia and in the Xinjiang Uygur Autonomous Region of China [68,69]. Tuva is located in the geographical center of the Asian continent, and the ancient population of Tuva experienced different gene flows from neighboring regions. At different times, Tuva was at the periphery of a powerful state of Huns (the 2nd century BC-the 1st century AD) or was incorporated in the Ancient Turkic Khaganate (the 6th-8th centuries), the Uyghur Khaganate (the 8th-9th centuries), the Yenisei Kyrgyz Khaganate (the 9th-12th centuries), and also in the Mongol Empire (the 13th-14th centuries). These historical events had a certain impact on the formation of the Tuvinian ethnic group [70,71]. We believe that c.919-2A>G could have appeared in the ancestors of the modern Tuvinian population as a result of different gene flows before the final formation of the Tuvinian ethnos, which was completed by the end of the 13th-14th centuries [70,71].
A very interesting finding of our study was the identity of the "internal" haplotype A-G-G-C (rs2712212-/c.919-2A>G/-rs2395911-rs2712211-rs3801940), encompassing~4.5 kb, found in the c.919-2A>G homozygotes from Tuva (Tuvinians) and the core haplotype (formed by the same SNPs) in the c.919-2A>G carriers from Taiwan (Han Chinese) [12]. This finding indicates the common ancestor for "Tuvinian" and "Chinese" founder chromosomes with c.919-2A>G. Thus, we speculate that c.919-2A>G could have arisen in the geographically close territories of China or Tuva and subsequently spread to other regions of Asia.

Conclusions
The common STR and SNP haplotypes carrying c.919-2A>G, found in Tuvinian patients, convincingly indicate the origin of this SLC26A4 pathogenic variant from a common ancestor that supports a crucial role of the founder effect in the accumulation of c.919-2A>G in the indigenous Siberian people living in Southern Siberia. The identity of small haplotype (~4.5 kb) bearing c.919-2A>G found in Tuvinian and Han Chinese carriers of c.919-2A>G indicates their common founder chromosomes with c.919-2A>G. The SLC26A4 pathogenic variant c.919-2A>G could have arisen in the geographically close territories of China or Tuva and subsequently spread to other regions of Asia.
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/genes14040928/s1: Table S1: Allelic frequencies of analyzed SNPs according to data from the Genome Aggregation Database (gnomAD v3.1.2, https: //gnomad.broadinstitute.org/); Table S2: Primer sequences for STR and SNP genotyping; Table  S3: The allelic frequencies of STRs (D7S2420, D7S496, D7S2459, D7S2456, and D7S525) in patients homozygous for c.919-2A>G and in the control sample; Table S4: The genotypes and the allelic frequencies of analyzed SNPs in patients homozygous for c.919-2A>G and in the control sample; Supplementary Materials File S1: Estimation of the c.919-2A>G age. Informed Consent Statement: Written informed consent was obtained from all individuals or their legal guardians before they participated in the study.

Data Availability Statement:
The data presented in this study are available in this article and Supplementary Materials.