Two Approaches for a Genetic Analysis of Pompe Disease: A Literature Review of Patients with Pompe Disease and Analysis Based on Genomic Data from the General Population

In this study, two different approaches were applied to the analysis of the GAA gene. One was based on patients with Pompe disease, and the other was based on GAA genomic data from unaffected carriers in general population genetic databases. For this purpose, GAA variants in Korean and Japanese patients reported in previous studies and in patients reported in the Pompe disease GAA variant database were analyzed as a model. In addition, GAA variants in the Korean Reference Genome Database (KRGDB), the Japanese Multi Omics Reference Panel (jMorp), and the Genome Aggregation Database (gnomAD) were analyzed. Overall, approximately 50% of the pathogenic or likely pathogenic variants (PLPVs) found in unaffected carriers were also found in patients with Pompe disease (Koreans, 57.1%; Japanese, 46.2%). In addition, there was a moderate positive correlation (Spearman’s correlation coefficient of 0.45–0.69) between the proportion of certain PLPVs in patients and the minor allele frequency of those variants in a general population database. Based on the analysis of general population databases, the total carrier frequency for Pompe disease in Koreans and Japanese was estimated to be 1.7% and 0.7%, respectively, and the predicted genetic prevalence was 1:13,657 and 1:78,013, respectively.


Introduction
Pompe disease, or glycogen storage disease type II (MIM #232300), is a monogenic autosomal recessive disorder caused by deficiency of lysosomal alpha-glucosidase (GAA). This deficiency results in the accumulation of lysosomal glycogen in various body tissues, especially in cardiac and skeletal muscles [1][2][3]. Patients who develop hypertrophic cardiomyopathy and general muscle weakness within the first year of life are classified as having classic infantile Pompe disease. Without enzyme replacement therapy (ERT), classic infantile Pompe disease is typically fatal within the first year of life. Nonclassic Pompe disease (also called late-onset Pompe disease (LOPD) or childhood- or adult-onset Pompe disease) [3] is associated with slowly progressive weakness of proximal muscles and respiratory dysfunction. Patients with nonclassic Pompe disease either develop symptoms without cardiac involvement before 1 year of age or develop symptoms after the first year of life. Given the benefits of early diagnosis and treatment with ERT, Pompe disease was included in the recommended uniform screening panel (the newborn screening program, NBS) in the USA [4].
In general, research on rare diseases is conducted using the clinical and genetic information of patients. However, a huge amount of genetic information has been released to public databases, which allows new approaches to genetic diseases. Theoretically, the prevalence of a specific Mendelian disease can be estimated by analyzing the proportion of unaffected carriers (carrier frequency) using genomic information from the general population. In this study, two approaches were applied to the analysis of Pompe disease. One was based on a literature review of reported patients with Pompe disease, and the other was based on genomic information from the general population. For both, the GAA gene in patients and in the general population was analyzed.

Analysis Workflow
The entire analysis workflow for the two approaches is presented in Figure 1. A literature search for Korean and Japanese patients with Pompe disease was conducted, and the causative GAA variants in these patients were analyzed. Newborn cases without specific symptoms or signs were excluded from the analysis. For the GAA analysis in unaffected carriers, the GAA gene was analyzed in both Korean and Japanese general population databases. The recently released Korean Reference Genome Database (KRGDB, http://coda.nih.go.kr/coda/KRGDB/index.jsp, accessed on 8 February 2021) contains genomic data from 1722 Korean individuals [5]. In the present study, GAA genetic variants found in KRGDB (30× coverage group, 1465 individuals) were analyzed. In addition, the Japanese Multi Omics Reference Panel (jMorp, https://jmorp.megabank.tohoku.ac.jp/202102/variants, accessed on 16 March 2021) was used to analyze GAA variants in the Japanese general population [6,7]. To date, the jMorp database contains whole-genome sequencing data of 8380 Japanese individuals. To compare GAA variants between general population databases while excluding common variants, GAA variants with a minor allele frequency (MAF) < 1% in East Asians in the Genome Aggregation Database (gnomAD, https://gnomad.broadinstitute.org/, accessed on 17 March 2021, search by genomic region: chr17:78,075,380-78,093,680 (GRCh37/hg19)) [8] were compared with those in KRGDB and jMorp. For a comparison between patients with Pompe disease and the general population, GAA variants in Korean or Japanese patients were compared with those found in KRGDB or jMorp, respectively. In addition, GAA variants in the Pompe disease GAA variant database [9] (http://www.pompevariantdatabase.nl/, accessed on 16 March 2021) were compared with those in the general population (global) in gnomAD (https://gnomad.broadinstitute.org/, accessed on 17 March 2021).
Venn diagrams for the comparative analysis were drawn with InteractiVenn [10] (Figure 2). For each PLPV, its proportion among all PLPVs found in the patients, weighted by how often it was detected (hereafter, the proportion of the PLPV), was compared with the MAF of that variant in a general population database using Spearman's rank correlation analysis. To determine the clinical severity of Pompe disease associated with a specific GAA variant, information provided by the Pompe disease GAA variant database was used [9] (http://www.pompevariantdatabase.nl/, accessed on 16 March 2021).
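As an illustration of the correlation step, the following minimal sketch computes Spearman's rank correlation by ranking both series (tied values receive the mean of their ranks) and then taking the Pearson correlation of the ranks. The function names and the example proportions and MAF values are hypothetical placeholders, not data from this study, and the software actually used for the analysis is not stated here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical values for five variants (placeholders only).
patient_proportion = [0.30, 0.20, 0.15, 0.10, 0.05]
population_maf = [0.0020, 0.0005, 0.0010, 0.0003, 0.0001]
rho = spearman_rho(patient_proportion, population_maf)
print(f"Spearman's rho = {rho:.2f}")
```

With real data, each position in the two lists would correspond to the same PLPV, so variants absent from the general population database need an explicit decision (for example, an MAF of 0) before ranking.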

GAA Variant Classification
All GAA variants were analyzed based on NM_000152.5 (NP_000143.2) and described following the Human Genome Variation Society (HGVS) variant nomenclature standards (http://varnomen.hgvs.org/, accessed on 17 March 2021). GAA variants reported in the previous literature with incorrect nomenclature were not included in this study. The GAA variants in KRGDB, jMorp, and the previous literature on Korean or Japanese patients with Pompe disease were classified or reclassified according to the 2015 American College of Medical Genetics and Genomics and the Association for Molecular Pathology standards and guidelines (2015 ACMG/AMP guidelines) [11] and the specifications of the ClinGen lysosomal storage disorders expert panel (https://clinicalgenome.org/affiliation/50009/, accessed on 20 March 2021). Briefly, the PVS1, PS1, PS3, PM2, PM5, and PP4 ACMG/AMP variant criteria were applied as specified by this expert panel. The PM3 criterion was applied following a general recommendation by the Sequence Variant Interpretation Working Group (https://clinicalgenome.org/working-groups/sequence-variant-interpretation/, accessed on 20 March 2021); that is, each proband was assigned a point value, taking care to avoid circular logic, and the combined point value was then used to determine the strength level for PM3. For the PP3 criterion, REVEL (>0.75 for missense variants) [12,13], MutationTaster [14], MaxEntScan (for the predicted impact on splicing) [15], and SpliceAI (for the predicted impact on splicing) [16] were used.

Analysis of Carrier Frequency and Predicted Genetic Prevalence
The carrier frequency (CF) and predicted genetic prevalence (pGP) were analyzed based on the heterozygous PLPVs. Neither the KRGDB nor the jMorp database provides information about homozygous variants. Thus, pathogenic or likely pathogenic variants found in these databases were considered heterozygous, because individuals in general population databases are assumed not to have rare diseases. The CF and pGP were calculated as previously described [8,17].
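The exact procedure is the one given in the cited references [8,17]. As a rough sketch of how such an estimate is commonly made, the code below assumes Hardy-Weinberg equilibrium: the combined pathogenic allele frequency q is the sum of the individual PLPV allele frequencies, the carrier frequency is 2q(1 − q), and the predicted genetic prevalence is q². The function name and the allele frequencies are hypothetical and are not taken from KRGDB or jMorp.

```python
def carrier_frequency_and_prevalence(plpv_allele_frequencies):
    """Estimate CF and pGP under Hardy-Weinberg equilibrium (an assumption).

    plpv_allele_frequencies: allele frequency of each PLPV in the population.
    Returns (carrier_frequency, predicted_genetic_prevalence) as fractions.
    """
    q = sum(plpv_allele_frequencies)  # combined pathogenic allele frequency
    p = 1.0 - q                       # frequency of non-pathogenic alleles
    carrier_frequency = 2 * p * q     # heterozygous carriers
    genetic_prevalence = q ** 2       # individuals with two pathogenic alleles
    return carrier_frequency, genetic_prevalence


# Hypothetical allele frequencies for four PLPVs (placeholders only).
example_afs = [0.003, 0.002, 0.001, 0.0005]
cf, pgp = carrier_frequency_and_prevalence(example_afs)
print(f"CF = {cf:.2%}, pGP = 1:{round(1 / pgp):,}")
```

Summing allele frequencies treats the PLPVs as lying on different chromosomes; variants that travel together on one haplotype would be double-counted by this simple approach.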

Correlation between Patients with Pompe Disease and Unaffected Carriers
The overall distribution of clinical severity associated with GAA PLPVs differed between those detected in patients with Pompe disease and those detected in unaffected carriers (Figure 3). In particular, more GAA PLPVs associated with classic infantile Pompe disease were found in patients with Pompe disease than in unaffected carriers.

Carrier Frequency and Predicted Genetic Prevalence Based on General Population Databases
The total CF for Pompe disease in Koreans was estimated to be 1.7%, and the pGP was 1:13,657 (7.32 per 100,000 births) based on KRGDB (Figure 4b). In addition, the CF for Pompe disease in Japanese was predicted to be 0.7%, and the pGP was 1:78,013 (1.28 per 100,000 births) based on jMorp (Figure 4b).
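The reported figures can be cross-checked against one another. Assuming Hardy-Weinberg equilibrium (an assumption of this check, since the calculation itself is only cited to [8,17]), the combined pathogenic allele frequency q implied by each pGP reproduces the reported CF, and the per-100,000 figures are the reciprocal of the pGP ratio:

```latex
\begin{align*}
q_{\mathrm{KR}} &= \sqrt{1/13{,}657} \approx 0.00856, &
2\,q_{\mathrm{KR}}\,(1-q_{\mathrm{KR}}) &\approx 0.0170 \;(1.7\%), &
100{,}000/13{,}657 &\approx 7.32,\\
q_{\mathrm{JP}} &= \sqrt{1/78{,}013} \approx 0.00358, &
2\,q_{\mathrm{JP}}\,(1-q_{\mathrm{JP}}) &\approx 0.0071 \;(0.7\%), &
100{,}000/78{,}013 &\approx 1.28.
\end{align*}
```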

Discussion
The main questions in this study are how GAA variants detected in patients with Pompe disease are related to those in unaffected carriers and, conversely, how genomic information from the healthy population reflects the likelihood of developing Pompe disease. Two aspects can be considered when analyzing how much the GAA PLPVs found in patients and in unaffected carriers have in common. One is the qualitative aspect, namely, how many of the GAA PLPVs are identical between the two groups. The other is the quantitative aspect, namely, whether GAA PLPVs frequently found in patients with Pompe disease are also found at a high frequency in unaffected carriers. In this study, Koreans, Japanese, and a wider range of ethnic groups were analyzed independently to address these questions.
Of the 17 different PLPVs detected in Korean patients with Pompe disease, 23.5% (4 PLPVs) were found in unaffected Korean carriers in KRGDB. In addition, 20.7% (6/29) of the PLPVs detected in Japanese patients were found in unaffected Japanese carriers in jMorp (Figure 2b and Table 1). Among the PLPVs detected in Korean or Japanese patients with Pompe disease, certain PLPVs were not found in any of the general population databases (KRGDB, jMorp, and gnomAD), e.g., c.796C>T (p.Pro266Ser), c.2171C>A (p.Ala724Asp), c.1585_1586delinsGT (p.Ser529Val), and c.1696T>C (p.Ser566Pro). This means that there are GAA variants that are enriched specifically in patients and contribute to the development of Pompe disease. In contrast, about 50% of the PLPVs in unaffected carriers were also found in patients with Pompe disease (Koreans, 57.1%; Japanese, 46.2%) (Figure 2b). The PLPVs found in both patients and unaffected carriers together accounted for 50%–60% of all PLPVs detected in patients (52.5% in Korean patients and 59.2% in Japanese patients).
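The qualitative comparison amounts to a set intersection between the distinct PLPVs seen in patients and those seen in carriers. The sketch below illustrates this with placeholder identifiers (the function name and the `v1`, `v2`, ... labels are hypothetical, not real variant lists) and then recomputes the two percentages from the counts stated above.

```python
def shared_fraction(patient_variants, carrier_variants):
    """Count distinct patient variants also seen in carriers, and their share."""
    patient_set = set(patient_variants)
    carrier_set = set(carrier_variants)
    shared = patient_set & carrier_set
    return len(shared), len(shared) / len(patient_set)


# Placeholder identifiers standing in for HGVS variant names.
patients = {"v1", "v2", "v3", "v4", "v5"}
carriers = {"v2", "v5", "v9"}
n_shared, fraction = shared_fraction(patients, carriers)
print(f"{n_shared} shared variants ({fraction:.1%} of patient variants)")

# Percentages recomputed from the counts reported in the text.
korean_pct = round(100 * 4 / 17, 1)    # 4 of 17 Korean patient PLPVs
japanese_pct = round(100 * 6 / 29, 1)  # 6 of 29 Japanese patient PLPVs
print(korean_pct, japanese_pct)
```

Comparing variants by name only works if both sources use the same transcript and nomenclature, which is why a single reference sequence and HGVS description are needed before intersecting the lists.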
In addition, there was a moderate positive correlation (Spearman's correlation coefficient of 0.45–0.69) between the proportion of certain PLPVs in patients and the MAF of those variants in a general population database in each of the three independent analyses. However, not all cases in which PLPVs were detected in patients with Pompe disease are reported in the literature, so the accuracy of the proportion of certain PLPVs is limited. In this study, Koreans were predicted to have a higher CF and pGP than Japanese; interestingly, Spearman's correlation coefficient was also higher in Koreans (0.69) than in Japanese (0.45).
The incidence of Pompe disease has been estimated to be 1 in 40,000, but it varies depending on the geographic region or population [1]. However, the incidence of Pompe disease reported by NBS programs is much higher than this estimate [4,8]. Pompe disease has not yet been included in the Korean NBS program. It is important to estimate the incidence or prevalence of a disease when considering its inclusion in a newborn screening program. To date, the prevalence or incidence of Pompe disease in Koreans has not been studied. The pGP for Pompe disease in Koreans in this study (1:13,657, 7.32 per 100,000 births) is comparable to the incidence of 1:16,919 from an NBS program involving 473,738 newborn samples in Taiwan [45]. In this study, the pGP for Pompe disease in Japanese was 1:78,013 (1.28 per 100,000 births). According to a recent study of 103,204 newborns in Japan, the incidence of Pompe disease in Japanese is 1:34,401 (three patients with potential LOPD were identified) [46]. In all three of these newborns, the [c.752C>T; c.761C>T] ([p.Ser251Leu; p.Ser254Leu]) variant was detected, and c.317G>A (p.Arg106His), c.2003A>G (p.Tyr668Cys), and c.1244C>T (p.Thr415Met) were additionally detected, one in each newborn. According to the 2015 ACMG/AMP guidelines [11] and the specifications of the ClinGen lysosomal storage disorders expert panel (https://clinicalgenome.org/affiliation/50009/, accessed on 20 March 2021), these three additional GAA variants are classified as variants of uncertain significance (VUS). Therefore, if GAA variants are classified according to the current guidelines and specifications, the incidence of Pompe disease in Japanese might be lower than 1:34,401.
Interestingly, there were differences in the distribution of PLPVs detected in East Asia. The c.1316T>A (p.Met439Lys) variant was the most frequently detected variant in Korean patients and the second most frequent in the Korean general population, but it was not found in other populations in gnomAD. This variant is presumed to be a founder pathogenic variant for Pompe disease in Koreans. In addition, c.546G>T (p.Thr182=) was the most frequently reported variant in Japanese patients and was not found in any general population other than the Japanese. Furthermore, c.1316T>A (p.Met439Lys) and c.546G>T (p.Thr182=) were reported only in Korean or Japanese patients in the Pompe disease GAA variant database [9] (http://www.pompevariantdatabase.nl/, last accessed on 27 April 2021). The [c.752C>T; c.761C>T] ([p.Ser251Leu; p.Ser254Leu]) variant has the highest allele frequency in both KRGDB and jMorp. However, this variant was not identified in Korean patients with Pompe disease and was identified in only one Japanese patient, in a homozygous state [28]. This variant has been reported as a common causative variant in Asia, but it is mostly identified through NBS (http://www.pompevariantdatabase.nl/, accessed on 27 April 2021). The clinical severity associated with this variant is presumed to be very mild, so Pompe disease caused by this variant may go unidentified. Additionally, haplotypes that include this variant and lead to Pompe disease might be extremely rare. The c.-32-13T>G variant is the most common pathogenic variant in European patients with Pompe disease [1]. However, among the databases analyzed, this variant was found only in KRGDB, and it has not been reported in any Korean or Japanese patients with Pompe disease. The c.2238G>C (p.Trp746Cys) variant was reported as a common pathogenic variant for Pompe disease in mainland China [2]; however, this variant has not been reported in Japanese patients.

Conclusions
In this study, two different approaches were taken to study Pompe disease. One was to analyze GAA variants in patients in the traditional way, and the other was to analyze how likely the disease is in the general population. For this analysis, the GAA variants found in patients and in the general population were interpreted with the same criteria, according to the standards, guidelines, and specifications for the interpretation of genetic variants, and Pompe disease in Koreans and Japanese was analyzed as a model. In addition, GAA PLPVs (with a review status of ≥2 gold stars in ClinVar) in the Pompe disease GAA variant database and gnomAD were compared.
This analysis has limitations: some true PLPVs may have been classified as VUS because of currently insufficient evidence, and the GAA variants in patients with Pompe disease were limited to those reported in previous studies. Nevertheless, the relationship between GAA variants found in patients with Pompe disease and those found in the general population is predicted to be at least a moderate correlation.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/children8070601/s1: Table S1: Presumed pathogenic or likely pathogenic variants in the GAA gene found in Korean or Japanese patients or in general population databases.