Haplotype Analysis of GJB2 Mutations: Founder Effect or Mutational Hot Spot?

The GJB2 gene is the most frequent cause of congenital or early onset hearing loss worldwide. In this study, we investigated the haplotypes of six GJB2 mutations frequently observed in Japanese hearing loss patients (i.e., c.235delC, p.V37I, p.[G45E; Y136X], p.R143W, c.176_191del, and c.299_300delAT) and analyzed whether the recurring mechanisms for each mutation are due to founder effects or mutational hot spots. Furthermore, regarding the mutations considered to be caused by founder effects, we also calculated the age at which each mutation occurred using the principle of genetic clock analysis. As a result, all six mutations were observed in a specific haplotype and were estimated to derive from founder effects. Our haplotype data together with their distribution patterns indicated that p.R143W and p.V37I may have occurred as multiple events, and suggested that both a founder effect and hot spot may be involved in some mutations. With regard to the founders’ age of frequent GJB2 mutations, each mutation may have occurred at a different time, with the oldest, p.V37I, considered to have occurred around 14,500 years ago, and the most recent, c.176_191del, considered to have occurred around 4000 years ago.


Introduction
Congenital hearing loss affects approximately one in 500-1000 infants in developed countries, and genetic causes account for at least 50% of all childhood onset non-syndromic sensorineural hearing loss [1]. Currently, it is estimated that there are more than 100 causative genes related to non-syndromic hereditary hearing loss [2], with the most frequent deafness-associated gene worldwide being the GJB2 gene. Hearing loss caused by GJB2 gene mutations is divided into autosomal recessive inheritance (DFNB1A) and autosomal dominant inheritance (DFNA3A), but most cases of GJB2-associated hearing loss are autosomal recessive inheritance. The allele frequency of GJB2 gene mutations in the normal Japanese population is approximately 2% [3]. Regarding GJB2 mutations, recurrent mutations are known to differ among ethnic groups. For example, the c.35delG mutation is commonly observed in European, American, North African, and Middle Eastern populations, but this mutation is rarely observed in the Japanese population, whereas the c.235delC mutation is commonly observed in the Japanese population, but this mutation is relatively rare in European and American populations [4]. Therefore, it is important to clarify the mutation spectrum in each population. In particular, the identification of recurrent mutations is crucial for molecular diagnosis to allow decision-making with regard to the appropriate intervention. Generally, recurrent genetic mutations occur via two mechanisms: one is a founder effect and the other is a mutational hot spot. Interestingly, there are great variations in the prevalence of patients with the GJB2 mutation in each population, suggesting that the allele frequency in the population, which reflects a founder effect, strongly affects the status of the GJB2 gene in the deafness population.
Indeed, the c.35delG mutation in the GJB2 gene has been proven to be due to a founder effect by haplotype analysis using single nucleotide polymorphisms (SNPs) [5]. Recently, not only GJB2, but mutations in various other genes have been extensively studied and the establishment of these recurrent mutations due to a founder effect or a mutational hot spot clarified [6][7][8][9].
In this study, six mutations in the GJB2 gene commonly observed in the Japanese population were analyzed by SNP-based haplotype analysis to estimate whether the recurring mechanisms for each mutation were due to a founder effect or a mutational hot spot. In addition, because founder effects have received special interest in terms of human migration, and to address questions about the origin of each founder effect, we calculated the age at which each mutation considered in this study to be established by a founder effect occurred, using the principle of genetic clock analysis [10].

Subjects
We enrolled 7408 sensorineural hearing loss patients, and extracted about 20 patients with each homozygous GJB2 mutation frequently identified in the Japanese population (i.e., c.235delC, p.V37I (c.109G>A), p.[G45E; Y136X] (c.[134G>A; 408C>A]), p.R143W (c.427C>T), c.176_191del, and c.299_300delAT) (Figure 1) [4]. For the mutations with fewer patients, we also included patients with compound heterozygous mutations, including c.235delC. By using these patients, it was possible to estimate the haplotype for each mutation by eliminating the c.235delC haplotype. This study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics …
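The elimination of the c.235delC haplotype in compound heterozygous patients can be sketched as follows. This is a minimal illustration only, assuming unphased genotypes are available at each SNP and that the c.235delC haplotype is already known from the homozygous patients; the function name `subtract_haplotype` and the example alleles are hypothetical and are not data from this study.

```python
def subtract_haplotype(genotypes, known_haplotype):
    """Infer the second haplotype of a compound heterozygote.

    genotypes       -- list of 2-tuples with the two unphased alleles at each SNP
    known_haplotype -- list of the alleles carried by the known haplotype
                       (here assumed to be the c.235delC haplotype)
    Returns one allele per SNP, or None where the genotype does not contain
    the allele expected from the known haplotype.
    """
    inferred = []
    for (a, b), known in zip(genotypes, known_haplotype):
        if a == known:
            inferred.append(b)      # the remaining allele belongs to the other haplotype
        elif b == known:
            inferred.append(a)
        else:
            inferred.append(None)   # inconsistent with the known haplotype
    return inferred


# Hypothetical genotypes at three SNPs for one compound heterozygous patient.
genotypes = [("C", "T"), ("G", "G"), ("A", "G")]
c235delc_haplotype = ["C", "G", "G"]
other_haplotype = subtract_haplotype(genotypes, c235delc_haplotype)
```

In this toy case the haplotype left over for the second mutation is T, G, A; a `None` entry would flag a SNP where the assumed c.235delC haplotype does not fit the observed genotype.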

Mutation Analysis
Amplicon libraries were prepared using an Ion AmpliSeq™ Custom Panel (ThermoFisher Scientific, MA, USA), in accordance with the manufacturer's instructions, for 68 genes reported to cause non-syndromic hereditary hearing loss. After preparation, emulsion PCR and sequencing were performed according to the manufacturer's instructions. The detailed protocol has been described elsewhere [11]. Massively parallel sequencing (MPS) was performed with an Ion Torrent Personal Genome Machine (PGM) system using an Ion PGM™ 200 Sequencing Kit (ThermoFisher Scientific) and an Ion 318™ Chip (Life Technologies). The sequence data were mapped against the human genome sequence (build GRCh37/hg19) with a Torrent Mapping Alignment Program. After sequence mapping, the DNA variant regions were piled up with Torrent Variant Caller plug-in software (ThermoFisher Scientific). After variant detection, the effects of the variants were analyzed using ANNOVAR software [12]. After annotation, we selected the patients with biallelic pathogenic GJB2 mutations that had been reported previously. Direct sequencing was used to confirm the mutations in the selected patients.

SNP Analysis
Haplotypes within the 2 Mbp region surrounding the position of the most frequent mutation (c.235delC) were characterized using a set of 23 SNPs (11 sites upstream and 12 sites downstream). The most representative tag SNPs were selected at approximately 100,000 bp intervals. For selecting each SNP, we referred to the allelic frequencies in the Integrative Japanese Genome Variation Database (cf. https://ijgvd.megabank.tohoku.ac.jp/). For the tag SNPs with extremely biased allelic frequencies (e.g., C: 97%, T: 3%), we chose other tag SNPs near this interval (Figure 2). Haplotype analysis was performed using the direct sequencing method.
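The SNP selection described above can be sketched as follows. This is a simplified illustration, not the procedure actually used: positions are given relative to c.235delC (set to 0, as in Figure 2), the minor allele frequency cutoff `min_maf` is an assumed value, and the function name `pick_tag_snps` and the candidate list are hypothetical.

```python
def pick_tag_snps(candidates, center, n_up, n_down, spacing=100_000, min_maf=0.05):
    """Pick one SNP near each target position spaced `spacing` bp apart.

    candidates -- list of (position, minor_allele_frequency) tuples
    center     -- position of the reference mutation (0 if positions are relative)
    n_up       -- number of SNPs wanted upstream of the mutation
    n_down     -- number of SNPs wanted downstream of the mutation
    min_maf    -- assumed cutoff; SNPs with a more biased frequency are skipped
    """
    usable = [(pos, maf) for pos, maf in candidates if maf >= min_maf]
    targets = [center - k * spacing for k in range(n_up, 0, -1)]
    targets += [center + k * spacing for k in range(1, n_down + 1)]

    chosen = []
    for target in targets:
        if not usable:
            break
        # Take the usable SNP closest to the target position.
        best = min(usable, key=lambda snp: abs(snp[0] - target))
        chosen.append(best[0])
        usable.remove(best)  # never pick the same SNP twice
    return chosen


# Hypothetical candidates: the SNP at -100,000 bp is too biased (3%) and is
# replaced by its neighbor at -90,000 bp.
candidates = [(-205_000, 0.30), (-100_000, 0.03), (-90_000, 0.40),
              (98_000, 0.25), (210_000, 0.45)]
selected = pick_tag_snps(candidates, center=0, n_up=2, n_down=2)
```

With `n_up=11` and `n_down=12` the same routine would return a 23-SNP panel, provided enough usable candidates exist in the database.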
Genes 2019, 10

Figure 2.
The location of single nucleotide polymorphisms (SNPs). Haplotypes within the 2 Mbp region surrounding the position of the most frequent mutation (c.235delC) were characterized using a set of 23 SNPs (11 sites upstream and 12 sites downstream). Tag SNPs were selected at approximately 100,000 bp intervals. For the positions with extremely biased allelic frequencies (e.g., C: 97%, T:3%), SNPs were inevitably set according to the interval. The blue rectangle in the middle indicates the GJB2 gene. The yellow line in the rectangle indicates c.235delC. Other white rectangles indicate genes around GJB2. The numbers above the line indicate the relative distance of each SNP when c.235delC is set to 0. The numbers of the SNPs below the line correspond to the SNP numbering used in this paper.

Statistical Analysis
The linkage disequilibrium range was examined by comparing the allele frequency of each SNP in the hearing loss patients analyzed in this study with the allele frequency in the 3.5KJPN population in the Integrative Japanese Genome Variation Database. Briefly, the allele frequency obtained in this study and the allele frequency in the 3.5KJPN population were compared using the χ² test, and SNPs with a significant difference were regarded as SNPs in linkage disequilibrium. To estimate the linkage disequilibrium region, we used the following criteria: (1) where two consecutive SNPs showed p > 0.05, the region was not considered to show linkage disequilibrium, and (2) SNPs with allele frequencies ranging from 0.45 to 0.55 were not included in the linkage disequilibrium.
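The test and the two criteria above can be sketched as follows. This is one possible reading, under stated assumptions: the χ² test is taken to be a Pearson test on a 2x2 table of allele counts, SNPs are walked outward from the mutation on one side, and SNPs in the 0.45 to 0.55 range are simply skipped. The function names and the example values are hypothetical.

```python
import math


def chi_square_p(case_alt, case_total, ref_alt, ref_total):
    """P value of a Pearson chi-square test (1 d.f.) on a 2x2 allele-count table."""
    table = [[case_alt, case_total - case_alt],
             [ref_alt, ref_total - ref_alt]]
    rows = [case_total, ref_total]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = case_total + ref_total
    if 0 in cols or 0 in rows:
        return 1.0  # no variation to test
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def ld_extent(snps, alpha=0.05):
    """Distance of the farthest SNP still counted as in linkage disequilibrium.

    snps -- list of (distance_from_mutation, p_value, patient_allele_freq),
            ordered outward from the mutation on one side.
    """
    extent = 0
    misses = 0
    for distance, p_value, freq in snps:
        if 0.45 <= freq <= 0.55:
            continue  # criterion 2: skipped, neither extends nor ends the region
        if p_value < alpha:
            extent = distance
            misses = 0
        else:
            misses += 1
            if misses == 2:  # criterion 1: two consecutive non-significant SNPs
                break
    return extent


# Hypothetical counts: 40 of 40 patient alleles vs. 3500 of 7000 reference alleles.
p_significant = chi_square_p(40, 40, 3500, 7000)
snps = [(50_000, 0.001, 0.9), (150_000, 0.2, 0.5), (250_000, 0.01, 0.8),
        (350_000, 0.3, 0.7), (450_000, 0.6, 0.6), (550_000, 0.001, 0.9)]
extent = ld_extent(snps)
```

Applying `ld_extent` separately to the 5' and 3' sides and adding the two distances would give the total length of the linkage disequilibrium region.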

Estimation of the Occurrence of Each Recurrent Mutation
The age at which each mutation occurred was estimated using the genetic clock equation [10], where $P_{mO}$ is the frequency of the marker allele O on all chromosomes bearing the mutation M, $P_O$ is the frequency of the marker allele O on all chromosomes in the normal population, $c$ is the recombination rate per generation (we assumed 1 cM, i.e., a recombination probability of 0.01, per 1,000,000 bp per generation [13]), and $t$ is the number of generations.
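The symbols defined above fit the standard relation for the decay of linkage disequilibrium over time. The form below is given as an assumption for orientation and is not a reproduction of the equation in [10]; it further assumes that the founder chromosome carried allele O, so that $P_{mO} = 1$ at $t = 0$.

```latex
P_{mO}(t) = P_O + (1 - P_O)\,(1 - c)^{t}
\quad\Longrightarrow\quad
t = \frac{\ln\!\left(\dfrac{P_{mO} - P_O}{1 - P_O}\right)}{\ln(1 - c)}
```

Under this form, $t$ is defined only when $P_O < P_{mO} \le 1$ and $0 < c < 1$, and a marker that is still fully associated with the mutation ($P_{mO} = 1$) gives $t = 0$.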

Results
A total of 263 patients with homozygous GJB2 mutations were identified among the 7408 patients (c.235delC: 192 …
On the other hand, in patients with p.V37I, no linkage disequilibrium was observed in our SNP analysis, and a SNP very close to the p.V37I mutation (5'SNP1) also differed among the patients. However, when the patients were divided by the 5'SNP1 residue (i.e., the C group or the T group), common haplotypes could be confirmed in the 3'SNPs. The C residue group showed a G residue in 3'SNP2, G in 3'SNP4, and G in 3'SNP6, while the T residue group showed a different haplotype with A in 3'SNP2, A in 3'SNP4, and A in 3'SNP6. The linkage disequilibrium range for the C residue group was 80,923 bp, whereas that for the T residue group was 301,883 bp.
Thus, we concluded that all six mutations occurred due to founder effects, and we next estimated the time at which each mutation occurred by using the length of the linkage disequilibrium and the equation described in the Methods section. If we assume that one generation is 25 years, it was predicted that c.235delC occurred around 6500 years ago, p.[G45E; Y136X] occurred around 6000 years ago, p.R143W occurred around 6500 years ago, c.176_191del occurred around 4000 years ago, and c.299_300delAT occurred around 7700 years ago. Further, it was predicted that the founder of the C residue group in 5'SNP1 for p.V37I occurred around 14,500 years ago and the founder of the T residue group in 5'SNP1 for p.V37I occurred around 5000 years ago.

Figure legend. The blue lines show the linkage disequilibrium range. The yellow boxes show that there is a significant difference between the allele frequency obtained in this study and the allele frequency in the Integrative Japanese Genome Variation Database as assessed by the χ² test. The gray boxes show that there is no significant difference between the allele frequency obtained in this study and the allele frequency in the Integrative Japanese Genome Variation Database as assessed by the χ² test. The green boxes show that there is no significant difference due to the originally biased allele frequency in the Integrative Japanese Genome Variation Database. The red boxes show that the allele frequency obtained in this study is from 0.45 to 0.55.

Discussion
The present results indicated that all six mutations frequently observed in the Japanese population seem to be founder mutations. The geographic regional distribution of each GJB2 mutation can also be indicative of whether the mutation is a founder mutation or hot spot mutation. If the mutations are clearly found in only a limited ethnic population, it is possible to predict those mutations are due to a founder effect; conversely, if the mutations appear uniformly all over the world or in many ethnic populations, then the mutations can be considered to be due to a mutational hot spot.
Regarding the c.235delC mutation, a series of previous studies based on haplotype analysis concluded that this mutation was caused by a founder effect [14][15][16], with the same result obtained in this study. Our previous haplotype analysis using six SNPs in 16 homozygous and 92 heterozygous c.235delC patients and in 90 controls without the 235delC mutation, indicated that the c.235delC mutation is derived from a common ancestor because we found common alleles on two SNPs near the c.235delC [10]. Similarly, Yan et al. performed haplotype analysis using seven SNPs near the c.235delC for 45 unrelated patients carrying the c.235delC mutation and found common alleles on their SNPs, so they also concluded that the c.235delC mutation is caused by a founder effect [16].
Based on our review of the GJB2 mutation spectrum, each population around the world has a specific spectrum [4]. According to the results, the c.235delC mutation is frequently observed in countries in East and Central Asia, such as Japan, Korea, China, Mongolia, and Thailand, whereas it is rarely observed in other countries. The uneven distribution of this mutation also suggests that it was caused by a founder effect.
Based on the present data, the c.235delC mutation is estimated to have occurred around 6500 years ago. Yan et al. reported that the c.235delC mutation may have occurred about 11,500 years ago [16]. The difference in the estimated time between the two studies may be due to the different SNPs used and the different numbers of subjects.
The c.299_300delAT mutation is reported to be found in the Japanese, Chinese, Korean, Mongolian, Australian, Turkish, Romanian, and American populations and is especially frequently observed in East Asian countries [4]. Current haplotype analysis indicated that this mutation is estimated to have occurred around 7700 years ago. The estimated founders' age of the c.299_300delAT mutation is compatible with the distribution of the mutation; i.e., the distribution of the c.299_300delAT mutation is relatively wider than those of the c.176_191del mutation and p.[G45E; Y136X] mutation mentioned below.
The c.176_191del mutation is observed in the Japanese, Chinese, American, and Brazilian populations, but is mainly distributed in East Asian countries [4]. We estimated that this mutation occurred around 4000 years ago, a relatively recent event, which is consistent with the limited distribution of this mutation.
The p.[G45E; Y136X] mutation is only observed in the Japanese population [4]. This distribution implies that the mutation arose in a Japanese ancestor, yet it is estimated to have occurred around 6000 years ago based on our results. However, the putative linkage disequilibrium length for this mutation was longer than those of the other mutations, supporting the notion that this mutation occurred more recently. Therefore, the true age at which p.[G45E; Y136X] occurred may be younger than that suggested by our analysis. Further analysis using a larger number of patients may clarify this estimation.
Regarding the p.R143W mutation, Tsukada et al. summarized this mutation as occurring in the Ghanaian, Japanese, Korean, and Argentinean populations at moderate frequencies and also in the Mongolian, Australian, Iranian, Turkish, Estonian, Dutch, Spanish, Swedish, and American populations at low frequencies [4]. Except for the Ghanaian population, this mutation accounts for less than 10% of all GJB2 mutations observed in each population. In contrast, p.R143W accounts for 90.9% of all GJB2 mutations observed in the Ghanaian population. The wide distribution of this mutation suggests that it arose at a mutational hot spot or in very old common ancestors. However, the finding that the region surrounding the p.R143W mutation is conserved in the Japanese population suggests that the mutation in this population was caused by a founder effect, with an estimated age of occurrence of around 6500 years. These findings suggest that the ancestor of this mutation in Ghana may differ from that in the other countries, and that this mutation may have occurred as multiple events. Haplotype analysis of p.R143W patients in Ghana could resolve this question.
Lastly, regarding the p.V37I mutation, according to Dahl et al. [17], haplotype analysis showed that the p.V37I mutation is derived from a founder effect. In their report, the sample size was relatively small with only four subjects examined, so the possibility of selection bias cannot be ruled out. Based on our present results, it would be natural to hypothesize that there were two founders for the p.V37I mutation, although we did not distinguish which haplotype group is the founder for the Australian population reported by Dahl et al.
Tsukada et al. summarized the p.V37I mutation as frequently occurring in the East Asian, Southeast Asian, and Australian populations [4]. At the same time, this mutation is also observed in the United States, Argentina, and North African countries at low frequencies. This biased distribution of the p.V37I mutation also supports the hypothesis that this mutation occurred due to a founder effect. The wide distribution suggests that this mutation occurred at an older age and probably as multiple events. Indeed, from our results, it was estimated that the founder of the C residue group in 5'SNP1 for p.V37I occurred around 14,500 years ago. In addition, the founder of the T residue group in 5'SNP1 for p.V37I is considered to be relatively new, and may be the type most commonly found in East Asian countries. Haplotype analysis of Taiwanese and Chinese patients with p.V37I could clarify this problem.
There are two limitations to the method used to estimate the time at which each founder mutation occurred. The first is that we do not know the true recombination rate of this region. For example, for the distance from 5'SNP6 to c.235delC (265,063 bp), we assumed a linear relationship of 1 cM per 1,000,000 bp and calculated the recombination rate as 0.00265063 per generation. However, as we do not know the true recombination frequency for this region, this calculation is only a rough estimate. The second limitation is SNP selection bias. Ideally, SNPs with an allele frequency of 0.5:0.5 are favorable for detecting statistically significant differences. However, for example, the interval of about 250,000 bp between 5'SNP6 and 5'SNP7 is left open because there were no appropriate SNPs. Since the prediction of the founders' ages depends on the linkage disequilibrium length, more detailed data on the true recombination rate would help to resolve this problem.
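The arithmetic behind this example can be sketched as follows. The conversion of distance to recombination rate follows the 1 cM per 1,000,000 bp assumption and the 25-year generation time stated in the text; the age formula is the standard linkage disequilibrium decay relation, which is assumed here rather than taken from [10], and the allele frequencies in the example are hypothetical.

```python
import math

CM_PER_MBP = 1.0           # assumption in the text: 1 cM per 1,000,000 bp
YEARS_PER_GENERATION = 25  # generation time assumed in the Results


def recombination_rate(distance_bp, cm_per_mbp=CM_PER_MBP):
    """Recombination probability per generation over `distance_bp` (linear map)."""
    return distance_bp / 1_000_000 * cm_per_mbp / 100


def generations_since_founder(p_mo, p_o, c):
    """Solve P_mO = P_O + (1 - P_O) * (1 - c)**t for t (assumed decay model)."""
    if not (0.0 < c < 1.0):
        raise ValueError("c must lie strictly between 0 and 1")
    if not (p_o < p_mo <= 1.0):
        raise ValueError("requires P_O < P_mO <= 1")
    return math.log((p_mo - p_o) / (1.0 - p_o)) / math.log(1.0 - c)


# Distance from 5'SNP6 to c.235delC, as given in the text.
c = recombination_rate(265_063)

# Hypothetical marker frequencies: 80% on mutation-bearing chromosomes,
# 50% in the general population.
t = generations_since_founder(p_mo=0.8, p_o=0.5, c=c)
age_years = t * YEARS_PER_GENERATION
```

Because `c` sits in the denominator through ln(1 - c), doubling the assumed recombination rate roughly halves the estimated age, which is why the uncertainty in the regional recombination rate dominates the uncertainty of the age estimates.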

Conclusions
This study has shown that frequent mutations in GJB2 are derived from founder effects rather than hot spots. Also, in some of the frequent mutations (p.R143W and p.V37I) there are potentially multiple origins, indicating that both a founder effect and hot spot may be involved. When considering the fact that there are many ethnically specific mutations, in spite of methodological limitations, this study has shed light on the relative founders' age of frequent GJB2 mutations and shown one example of the occurrence of mutational events during human migration.