Analysis of Worldwide Carrier Frequency and Predicted Genetic Prevalence of Autosomal Recessive Congenital Hypothyroidism Based on a General Population Database

To assess how genomic information from the general population reflects the probability of developing disease, and how that probability differs among ethnic groups, a general population database was analyzed using congenital hypothyroidism as an example. Twelve candidate genes that follow an autosomal recessive inheritance pattern in congenital hypothyroidism (SLC5A5, TPO, TG, IYD, DUOXA2, DUOX2, TSHR, SLC26A7, GLIS3, FOXE1, TSHB, TRHR) were analyzed in the gnomAD database (v2.1.1). The carrier frequency (CF) and predicted genetic prevalence (pGP) were estimated. The total CF in the overall population was 3.6%. DUOX2 showed the highest CF (1.8%), followed by TG (0.46%), TPO (0.44%), TSHR (0.31%), SLC26A7 (0.144%), DUOXA2 (0.141%), IYD (0.08%), SLC5A5 (0.06%), TRHR (0.059%), GLIS3 (0.059%), TSHB (0.04%), and FOXE1 (0%). The pGP in the overall population was 10.01 individuals per 100,000 births (1:9992). The highest pGP was in the East Asian population at 52.48 per 100,000 births (1:1905), followed by the Finnish (35.96), Non-Finnish European (9.56), African/African American (4.0), Latino/Admixed American (3.89), South Asian (3.56), and Ashkenazi Jewish (1.81) groups. When the pGP was compared with the real incidence of congenital hypothyroidism, the pGP in East Asian populations was highly consistent with the real incidence.


Introduction
Genetic screening is a type of genetic testing designed to identify a specified population at higher risk of having or developing a disease, with the aim of prevention or early treatment [1]. Generally, genetic screening is performed as targeted testing for known hotspot variations. The development of next-generation sequencing (NGS) techniques has introduced a new genomic era by producing massive genomic data and reducing costs. Recently, the Genome Aggregation Database (gnomAD, https://gnomad.broadinstitute.org/, accessed on 31 March 2021) has been constructed as a very large database that contains genomic information of the general population worldwide [2,3]. Several companies have launched proactive genetic testing for generally healthy individuals without a personal or family history of disease, using NGS to analyze particular genes or to perform whole exome/genome sequencing that is not confined to hotspot variations.
At present, the major questions are how genomic information from a generally healthy population reflects the probability of developing disease and how that probability differs among population groups. A further question is whether genetic testing can provide useful and crucial information and eventually prevent disease in healthy individuals. To answer these questions, genomic data associated with congenital hypothyroidism, whose screening is one of the major achievements of preventive medicine [4], were analyzed based on the general population database, and the carrier frequency and genetic prevalence were estimated by population.
All genetic variants in the 12 candidate genes reported in the gnomAD database (v2.1.1) were classified following the 2015 American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) standards and guidelines [6] and the Sequence Variant Interpretation (SVI) general recommendations by ClinGen (https://clinicalgenome.org/working-groups/sequence-variant-interpretation/, accessed on 1 April 2021). Loss-of-function variants of these candidate genes were presumed to be responsible for a congenital hypothyroidism mechanism. Therefore, the PVS1 (pathogenic criterion for predicted loss-of-function variants) decision tree was applied for the PVS1 ACMG/AMP variant criterion [7]. If the genetic variants were not known pathogenic or likely pathogenic variants (PLPVs), the null variants (stop-gain, splice site disrupting, or frameshift variants) flagged as low-confidence predicted loss-of-function (pLoF), or carrying a pLoF flag, by the loss-of-function transcript effect estimator (LOFTEE, https://github.com/konradjk/loftee, accessed on 31 March 2021) were filtered. For the PM2 code, the method used by the ClinGen inborn errors of metabolism (IEM) working group to determine the PM2 threshold was adopted [8]: the most frequent pathogenic variant in the 12 candidate genes in gnomAD is c.2895_2898delGTTC (p.Phe966SerfsTer29) in DUOX2, which has a minor allele frequency of 0.0029 (allele frequency of the heterozygous pathogenic variant in the global population in gnomAD); therefore, the PM2 (absence/rarity) threshold was set an order of magnitude lower, at an allele frequency of 0.0003. The PM3 (in trans criterion) code was applied following the recommendation of the SVI working group (https://clinicalgenome.org/working-groups/sequence-variant-interpretation/, accessed on 1 April 2021); each proband was awarded a points value, and the strength level for PM3 was then determined from the total. In addition, PM3 was applied with care to avoid circular logic.
For the PP1 (co-segregation) code, the evidence strength was determined following the specification of the ClinGen IEM working group [8]. The PP4 and PP5 codes were not applied in this study. For the prediction of variant pathogenicity (PP3), multiple in silico tools were used: REVEL (>0.75 for missense variants, https://sites.google.com/site/revelgenomics/downloads?authuser=0, accessed on 31 March 2021) [9,10], Mutation Taster (http://www.mutationtaster.org/, accessed on 31 March 2021) [11], PROVEAN [12] (for in-frame insertion or deletion variants, http://provean.jcvi.org/index.php, accessed on 31 March 2021), and SpliceAI (for predicted impact on splicing, https://spliceailookup.broadinstitute.org/, accessed on 31 March 2021) [13]. In addition, to check critical functional domains when applying the PVS1 decision tree [7] or the PM1 code, Pfam (https://pfam.xfam.org/, accessed on 1 April 2021), InterPro (https://www.ebi.ac.uk/interpro/, accessed on 1 April 2021), and UniProt (https://www.uniprot.org/, accessed on 1 April 2021) were used.
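The PM2 threshold arithmetic described above can be illustrated with a minimal Python sketch. The function names `pm2_threshold` and `meets_pm2` are hypothetical, and rounding to one significant figure is an assumption made here to reproduce the stated value of 0.0003; the text only says the threshold was set an order of magnitude below 0.0029.

```python
from math import floor, log10


def pm2_threshold(max_pathogenic_af: float) -> float:
    """Return a rarity threshold one order of magnitude below the most
    frequent pathogenic allele, rounded to one significant figure
    (the rounding rule is an assumption, not stated in the text)."""
    raw = max_pathogenic_af / 10
    decimals = -int(floor(log10(raw)))  # decimal places for 1 significant figure
    return round(raw, decimals)


def meets_pm2(allele_frequency: float, threshold: float) -> bool:
    """Hypothetical helper: a variant is treated as rare if its allele
    frequency is below the threshold."""
    return allele_frequency < threshold


threshold = pm2_threshold(0.0029)  # DUOX2 c.2895_2898delGTTC frequency from the text
print(threshold, meets_pm2(0.0001, threshold))
```

Whether the comparison should be strict or inclusive at the threshold is not specified in the text; the strict form is used here only for illustration.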

Carrier Frequency (CF) and Predicted Genetic Prevalence (pGP) Analysis
For CF and pGP analysis, only heterozygous PLPVs (not homozygous PLPVs) were considered [14-16]. Therefore, the allele frequency of heterozygous PLPV (AF_V) and the carrier frequency (CF_V) for a variant V were calculated from the allele count (number of variant alleles), the allele number (number of genotyped alleles = 2 * number of individuals), and the homozygous count (number of homozygous individuals) provided by gnomAD for that variant.
For the CF and pGP at the gene level (CF_G and pGP_G, respectively), two methods were applied. The first method (method 1) followed the CF_G and pGP_G calculations described previously [15].
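The equations for the variant-level quantities are not reproduced in this text. As a rough illustration only, the following Python sketch shows one plausible way to derive them from the three gnomAD fields named above. It assumes that alleles carried by homozygous individuals are subtracted from the allele count and that the carrier frequency follows the Hardy-Weinberg heterozygote term 2pq; both are assumptions, and the function names are hypothetical.

```python
def heterozygous_allele_frequency(allele_count: int,
                                  allele_number: int,
                                  homozygous_count: int) -> float:
    """Allele frequency counting only alleles seen in heterozygotes.

    Assumption: each homozygous individual contributes two variant
    alleles, which are removed from the allele count.
    """
    het_alleles = allele_count - 2 * homozygous_count
    return het_alleles / allele_number


def variant_carrier_frequency(af_v: float) -> float:
    """Carrier (heterozygote) frequency under Hardy-Weinberg, 2pq."""
    return 2 * af_v * (1 - af_v)


# Made-up counts for illustration, not taken from gnomAD.
af_v = heterozygous_allele_frequency(allele_count=10,
                                     allele_number=1000,
                                     homozygous_count=2)
cf_v = variant_carrier_frequency(af_v)
print(af_v, cf_v)
```

For rare variants, 2pq is close to twice the allele frequency, so the choice between the two forms changes the result very little.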
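The gene-level equations are likewise not reproduced in this text. The Python sketch below shows a commonly used Hardy-Weinberg aggregation, offered as an assumption rather than as the calculation of method 1 in [15]: the gene-level carrier frequency is taken as the probability of carrying at least one variant, and the gene-level prevalence as the square of the summed allele frequencies. All function names and input values are hypothetical.

```python
from math import prod
from typing import Sequence


def gene_carrier_frequency(variant_cfs: Sequence[float]) -> float:
    """Probability of carrying at least one PLPV in the gene,
    assuming variants occur independently."""
    return 1 - prod(1 - cf for cf in variant_cfs)


def gene_predicted_prevalence(variant_afs: Sequence[float]) -> float:
    """Probability of inheriting two PLPVs in the same gene
    (homozygous or compound heterozygous) under Hardy-Weinberg."""
    return sum(variant_afs) ** 2


def per_100k(probability: float) -> float:
    """Express a probability as individuals per 100,000 births."""
    return probability * 100_000


# Made-up variant-level values for illustration only.
cf_g = gene_carrier_frequency([0.01, 0.02])
pgp_g = per_100k(gene_predicted_prevalence([0.001, 0.002]))
print(cf_g, pgp_g)
```

Summing the per-gene prevalences across all candidate genes would then give an overall pGP, again under the simplifying assumption that disease arises only from two variants within a single gene.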

Discussion
Congenital hypothyroidism is the most common neonatal disorder [17]. Prompt diagnosis and treatment may help prevent intellectual disability in affected patients [18]. Newborn screening programs for congenital hypothyroidism, based on detection of thyroid stimulating hormone (TSH) or thyroxine (T4) in blood spots, were implemented worldwide between 1970 and 1980, especially in developed countries. This public health program has nearly eradicated the profound physical and cognitive impairments caused by severe congenital hypothyroidism. Recent studies have raised the concern that current screening criteria miss borderline or subclinical congenital hypothyroidism [19,20].
Primary congenital hypothyroidism is broadly caused by thyroid dysgenesis (including agenesis, hypoplasia, or abnormal location) or dyshormonogenesis (when a normal thyroid gland produces abnormal amounts of thyroid hormone). Historically, the most common cause (approximately 85%) of primary hypothyroidism is thyroid dysgenesis [18,21-23], with an incidence of about 1:4000 births. However, thyroid dysgenesis occurs sporadically, and fewer than 5% of thyroid dysgenesis cases are attributable to genetic variations in the known genes. Dyshormonogenesis accounts for approximately 15% of primary hypothyroidism and is mainly caused by a genetic defect. The proportion of dyshormonogenesis cases within congenital hypothyroidism has risen to over 30% [18,21].
Differences in the prevalence of congenital hypothyroidism by population have been reported [17,26,27] (Table S12). Compared with the incidence of congenital hypothyroidism in the European group, the Asian and Latino (Hispanic) groups showed higher rates, while the African population had a lower rate. In this study, the pGP of congenital hypothyroidism in EAS (1:1905) was notably higher than in other populations and consistent with the incidence based on newborn screening programs in EAS (1:1443-1:2380) (Table S12). However, in contrast to the previous studies (1:1404-1:4149, Table S12), the AMR group in this study showed a much lower pGP for congenital hypothyroidism (1:25,728). In addition, the pGP differed from the real incidence of congenital hypothyroidism in all populations other than the EAS group.
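The comparison above relies on converting a rate per 100,000 births into a "1 in N" ratio and checking it against a reported incidence range. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the values are the ones quoted in the text.

```python
def per_100k_to_one_in_n(rate_per_100k: float) -> int:
    """Convert a rate per 100,000 births into N for a '1:N' ratio."""
    return round(100_000 / rate_per_100k)


def within_reported_range(one_in_n: int, most_frequent_n: int, least_frequent_n: int) -> bool:
    """True if 1:N falls inside a reported incidence range,
    given as the N values of its two bounds."""
    return most_frequent_n <= one_in_n <= least_frequent_n


# East Asian pGP of 52.48 per 100,000 births against 1:1443-1:2380.
eas_n = per_100k_to_one_in_n(52.48)
print(eas_n, within_reported_range(eas_n, 1443, 2380))

# Latino/Admixed American pGP of 1:25,728 against 1:1404-1:4149.
print(within_reported_range(25_728, 1404, 4149))
```

Because the ratios are derived from rounded per-100,000 values, small discrepancies against the ratios quoted in the text are expected.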
The difference between the pGP based on the population database and the real incidence might depend on how much of the disease in each population group is attributable to genes that follow an autosomal recessive inheritance pattern, because the pGP in this study was calculated without considering autosomal dominant or X-linked inheritance; the larger the proportion of autosomal recessive genes within the entire genetic contribution, the narrower the gap between the pGP and the real prevalence. The proportions of thyroid dysgenesis and dyshormonogenesis differ between ethnic groups [26]. In particular, the proportion of dyshormonogenesis in congenital hypothyroidism is higher in Asian populations than in Caucasian populations [26]. Since most pathogenic variations associated with dyshormonogenesis are inherited in an autosomal recessive manner, the pGP would be more consistent with the real incidence in a population with a higher proportion of thyroid dyshormonogenesis. This may explain why the pGP of the EAS group in this study was more consistent with the real incidence. Recent studies using NGS have reported that, among the causes of congenital hypothyroidism in East Asians, thyroid dyshormonogenesis is more frequent than thyroid dysgenesis or other causes [28-30]. In contrast, if the proportion of dyshormonogenesis in a population is lower, the difference between the pGP and the real incidence would be larger because the genetic contribution of thyroid dysgenesis would be underestimated; many of the thyroid dysgenesis genes are inherited in an autosomal dominant manner.
Another important reason for the difference between the pGP and the real incidence is that this study simplified the pGP to the gene level and assumed that congenital hypothyroidism occurs only when two (likely) pathogenic variants are present in one gene, in accordance with autosomal recessive inheritance. However, congenital hypothyroidism is genetically heterogeneous, and not all of the genetic factors associated with it have been identified. Recently, loss-of-function variants in SLC26A7 were identified as another genetic cause of dyshormonogenesis [31]. When already known genes are tested in cohorts of patients with congenital hypothyroidism using NGS, often only one causative variant is identified in genes related to autosomal recessive inheritance [32-35]. This suggests either that a second causative variant was missed because of limitations in genetic testing (e.g., variants in deep intronic regions) or in the interpretation of genetic variants (e.g., variants of uncertain significance, VUS), or that the variant found in the patient is inherited in a different pattern (e.g., dominant negative). In addition, congenital hypothyroidism might have a digenic cause; digenic DUOX1 and DUOX2 causative variants in cases with congenital hypothyroidism have been reported [36].
Generally, if a specific variant is submitted to ClinVar as a PLPV, it means that patients carrying that PLPV have been observed clinically and that the PLPV is the main cause of their disease. In this study, only 14.5% of the variants had been submitted to ClinVar. In addition, genetic studies on congenital hypothyroidism have been performed in specific populations. Therefore, the CF and pGP might be underestimated in the populations that showed the largest difference between the pGP and the real prevalence, because many variants would be classified as VUS rather than PLPVs due to insufficient genetic and clinical information. In particular, under the ClinGen recommendations, even if a variant is detected repeatedly in patients, evidence with higher weight cannot be applied without functional studies or family analysis of the variant. For the classification of variants associated with congenital hypothyroidism, the threshold weight of each functional study with respect to the PS3 code needs to be established. Additionally, epidemiologic or environmental factors [27,37] may also contribute to the difference between the pGP and the prevalence.

Conclusions
In conclusion, this is the first study to assess congenital hypothyroidism based on general population data and to estimate the CF and pGP by population. In particular, when the pGP was compared with the real incidence of congenital hypothyroidism, the pGP in East Asian populations was highly consistent with the real incidence. Using genomic information from the general population could offer an additional and helpful direction for preventive medicine. However, when such information is used, the pathogenesis of the particular disease should be considered by population group.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/genes12060863/s1, Table S1: Pathogenic or likely pathogenic variants in SLC5A5 gene, Table S2: Pathogenic or likely pathogenic variants in TPO gene, Table S3: Pathogenic or likely pathogenic variants in TG gene, Table S4: Pathogenic or likely pathogenic variants in IYD gene, Table S5: Pathogenic or likely pathogenic variants in DUOXA2 gene, Table S6: Pathogenic or likely pathogenic variants in DUOX2 gene, Table S7: Pathogenic or likely pathogenic variants in TSHR gene, Table S8: Pathogenic or likely pathogenic variants in TSHB gene, Table S9: Pathogenic or likely pathogenic variants in TRHR gene, Table S10: Pathogenic or likely pathogenic variants in SLC26A7 gene, Table S11: Pathogenic or likely pathogenic variants in GLIS3 gene, Table S12: Incidence of congenital hypothyroidism.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable (This study is a public database analysis study).
Informed Consent Statement: Not applicable (This study is a public database analysis study).

Data Availability Statement: All data analyzed in this study are included in this article and its supplementary information files.