Phenylketonuria Diagnosis by Massive Parallel Sequencing and Genotype-Phenotype Association in Brazilian Patients

Phenylketonuria (PKU) is a common inborn error of amino acid metabolism in which the enzyme phenylalanine hydroxylase, which converts phenylalanine to tyrosine, is functionally impaired due to pathogenic variants in the PAH gene. Thirty-four Brazilian patients with a biochemical diagnosis of PKU, from 33 unrelated families, were analyzed through next-generation sequencing in the Ion Torrent PGM™ platform. Phenotype–genotype correlations were made based on the BioPKU database. Three patients required additional Sanger sequencing analyses. Twenty-six different pathogenic variants were identified. The most frequent variants were c.1315+1G>A (n = 8/66), c.473G>A (n = 6/66), and c.1162G>A (n = 6/66). One novel variant, c.524C>G (p.Pro175Arg), was found in one allele and was predicted as likely pathogenic by the American College of Medical Genetics and Genomics (ACMG) criteria. The molecular modeling of p.Pro175Arg indicated that this substitution can affect monomers binding in the PAH tetramer, which could lead to a change in the stability and activity of this enzyme. Next-generation sequencing was a fast and effective method for diagnosing PKU and is useful for patient phenotype prediction and genetic counseling.


Introduction
Phenylketonuria (PKU, OMIM #261600) is an autosomal recessive inborn error of metabolism in which the conversion of phenylalanine (Phe) to tyrosine by phenylalanine hydroxylase (EC 1.14.16.1) is partially or totally impaired owing to biallelic variants in the PAH gene [1]. Untreated Phe accumulation leads to irreversible neurological effects, such as impaired cognitive development in children and seizures [2].
The treatment for PKU consists of Phe-free dietary management and supplementation with the Phe-free metabolic formula [3,4]. The use of sapropterin dihydrochloride may also be recommended for tetrahydrobiopterin (BH4)-responsive patients [5].
In Brazil, the public health system neonatal screening program performs biochemical screening for PKU by the detection of Phe in dried blood spots (DBS). If the results are abnormal, an additional blood sample is requested to confirm the diagnosis and begin treatment. The confirmatory test includes the measurement of blood Phe and tyrosine concentrations [6].
The PAH gene comprises 13 exons and 12 introns, resulting in a 452-residue protein. Worldwide, about 1188 variants in the PAH gene have been described in the PAHvdb (http://www.biopku.org) and about 1013 variants in the Human Gene Mutation Database (HGMD, http://www.hgmd.cf.ac.uk) [7]. The molecular investigation is sometimes the key to concluding the diagnosis of PKU and, consequently, assists in improving the treatment. The gold standard for gene variant detection in PKU patients is Sanger sequencing, which is costly and time-consuming [8]. Next-generation sequencing allows massive parallel deep-level sequencing, i.e., analyzing the entire exome or a targeted gene panel, which results in increased diagnostic sensitivity, faster sequencing and an inexpensive process [9]. PAH genotype data can be used for the prediction of BH4 responsiveness [9].
This study aimed to perform a molecular diagnosis of Brazilian PKU patients through massive parallel sequencing to confirm the diagnosis and obtain new data that can improve the choice of treatment for some patients.

Subjects
A total of 34 (33 nonrelated) nonconsanguineous PKU patients were recruited (female = 18, classic PKU = 22, mild PKU = 10, and undefined PKU type = 2), of whom 7 had complete previous genotyping, and 7 had incomplete previous genotyping. Of the total cohort, 23 patients were seen at the HCPA Medical Genetics Service (Porto Alegre, Rio Grande do Sul-RS, Brazil), and 11 were seen at the Hospital de Apoio de Brasília Neonatal Service on Newborn Screening, Genetics Unit (Distrito Federal-DF, Brazil).
For the patients from RS, a BH4 deficiency was previously excluded by the measurement of 6,7-dihydropteridine reductase (DHPR) activity in the blood or DBS and of biopterins and neopterins in urine or DBS. Information such as the Phe level at diagnosis, age at diagnosis, age at treatment initiation, BH4 responsiveness [10,11], and previous genotyping diagnosis of the patients was obtained retrospectively from the medical records.

DNA Extraction and Sequencing
Total blood samples were collected, and DNA extraction was performed with an Easy-DNA gDNA Purification Kit (Thermo Fisher Scientific, Waltham, MA, USA), according to the manufacturer's instructions. The DNA samples were quantified in Qubit (Thermo Fisher Scientific).
A targeted gene panel was designed using the Ion AmpliSeq Designer (Thermo Fisher Scientific) to include all the exonic regions and intron-exon boundaries of the PAH gene and of the genes causing BH4 deficiencies (GCH1, GCHFR, PTS, PCBD1, QDPR, and SPR). Genomic DNA libraries were prepared using an Ion AmpliSeq™ Library Kit 2.0 (Thermo Fisher Scientific), followed by purification with magnetic beads (AMPure beads). The samples were sequenced in an Ion Torrent PGM Platform (Thermo Fisher Scientific, server version 5.0, Waltham, MA, USA), with a minimal coverage of 250X at the Unidade de Pesquisa Laboratorial (Centro de Pesquisa Experimental, Hospital de Clínicas de Porto Alegre).
Massive parallel sequencing data were analyzed using Torrent Suite 5.0.5 (Thermo Fisher Scientific) to perform the base-calling procedure. IGV 2.8.2 [12] was used for detection of the depth of sequencing and coverage failures that could suggest deletions. Variants were filtered by Enlis Genome Research (Enlis Genomics, Berkeley, CA, USA) and Ion Reporter software (Thermo Fisher Scientific), as well as the following databases: ClinVar, Phenylalanine Hydroxylase Gene Locus-Specific (PAHvdb) [9], and Human Gene Mutation Database.
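The coverage review described above can be illustrated with a minimal sketch. It assumes that per-amplicon mean depths have already been exported from the sequencing software; the amplicon names and depth values are hypothetical placeholders, not data from this study, and only the 250X threshold comes from the text. It is not the procedure performed in IGV, only an illustration of the underlying check.

```python
MIN_DEPTH = 250  # minimal coverage stated in the methods (250X)


def flag_low_coverage(mean_depth_by_amplicon, min_depth=MIN_DEPTH):
    """Return the names of amplicons whose mean depth is below min_depth."""
    return sorted(
        name
        for name, depth in mean_depth_by_amplicon.items()
        if depth < min_depth
    )


# Hypothetical per-amplicon mean depths (illustrative values only).
example_depths = {
    "PAH_exon6": 812.0,
    "PAH_exon7": 640.5,
    "PAH_exon12": 97.0,  # below threshold: candidate for manual review
}

low_coverage = flag_low_coverage(example_depths)
print(low_coverage)
```

Amplicons flagged in this way would still need visual inspection, since a coverage drop may reflect a deletion or simply a technical failure of the amplicon.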
Novel, conflicting and phase undetermined variants were confirmed by automated Sanger sequencing in an ABI 3500 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). The results were analyzed in Chromas 2.6.1 (Technelysium, South Brisbane, Australia), and NM_000277.3 and NP_000268.1 were used as the reference sequences.
Previous genotypes were identified through the Sanger sequencing method.

Pathogenicity Determination and Prediction
To determine the pathogenicity of the novel variant, the following variables were considered: allele frequency < 1% in gnomAD [13] and ABraOM [14]. The American College of Medical Genetics and Genomics (ACMG) guidelines for interpreting variants were used [15].

Molecular Modeling
The three-dimensional structure of wild-type phenylalanine hydroxylase was taken from the Protein Data Bank (PDB; ID 6HYC) [18], which also served as a template for tetramer reconstruction. The point mutations were modeled with DeepView [19], while the frameshift and early stop codon variants were modeled with I-TASSER [20]. FoldX 5.0 (AnalyseComplex command) was used to inspect the possible differences in binding affinity between monomers in the PAH tetramer. The differences between the energies of the mutant and wild-type proteins (∆∆G = ∆Gmut − ∆Gwt) were considered significant above 1.6 kcal/mol. This value corresponds to twice the intrinsic standard deviation of FoldX [21] and should significantly affect the stability of the variant [22].
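The ∆∆G criterion above can be sketched as follows. This is a minimal illustration, not the FoldX workflow itself: it assumes the interaction energies (in kcal/mol) have already been extracted from the FoldX output, the function names are hypothetical, and applying the 1.6 kcal/mol cutoff symmetrically (so that a strongly negative ∆∆G is reported as higher affinity) is an assumption made here for illustration.

```python
THRESHOLD_KCAL_MOL = 1.6  # twice the intrinsic FoldX standard deviation, per the text


def delta_delta_g(dg_mut, dg_wt):
    """Return ddG = dG_mut - dG_wt for one monomer-monomer interface."""
    return dg_mut - dg_wt


def classify_interface(dg_mut, dg_wt, threshold=THRESHOLD_KCAL_MOL):
    """Label an interface by comparing ddG with the significance threshold.

    Assumes that a more negative interaction energy means tighter binding,
    so a positive ddG indicates lower affinity in the mutant complex.
    """
    ddg = delta_delta_g(dg_mut, dg_wt)
    if ddg > threshold:
        return "lower affinity"
    if ddg < -threshold:
        return "higher affinity"
    return "not significant"


# Hypothetical energies, chosen only to exercise each branch.
print(classify_interface(-10.0, -12.5))
print(classify_interface(-14.5, -12.5))
print(classify_interface(-12.0, -12.5))
```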

Results
A total of 26 different pathogenic variants were found in the PAH gene (Table 2). One was a novel variant, c.524C>G (p.Pro175Arg), five were located at the intron-exon boundaries, and twenty were found in exonic regions (Figure 1). The largest number of pathogenic variants (n = 6) was found in exon 7. For the other BH4 metabolism-related genes, no pathogenic variants were found.
Of the 14 patients without a BH4 responsiveness test, the results of nine were predicted through the BioPKU database: two were responsive, and seven were nonresponsive. Of the total cohort, the results of ten agreed with the BioPKU data. Two RS patients (patients 2 and 7) presented a genotype not described in the BioPKU database [46] and were nonresponsive to BH4, according to the biochemical test [10,11]. In addition, three DF patients (patients 26, 27 and 31) presented a genotype not described in the BioPKU database; two were responsive and one was nonresponsive.
The novel variant c.524C>G was found in patient 20, located on exon 6 of the PAH. The ACMG criteria fulfilled by the variant were PM2, PM5, PP2, and PP3, resulting in a likely pathogenic classification. In addition, the patient's clinical information was consistent with classic PKU.
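The way the four reported criteria combine into a likely pathogenic classification can be sketched in a few lines. This is a simplified illustration of the pathogenic-evidence combining rules of the 2015 ACMG guidelines [15], not the procedure used by the authors: benign criteria and strength modifications are deliberately ignored, and the function names are hypothetical.

```python
from collections import Counter

# Evidence strength implied by the prefix of each pathogenic criterion code.
STRENGTH_BY_PREFIX = {
    "PVS": "very_strong",
    "PS": "strong",
    "PM": "moderate",
    "PP": "supporting",
}


def strength(code):
    """Map a criterion code such as 'PM2' to its evidence strength."""
    prefix = code.rstrip("0123456789")
    if prefix not in STRENGTH_BY_PREFIX:
        raise ValueError(f"unsupported criterion: {code}")
    return STRENGTH_BY_PREFIX[prefix]


def classify_pathogenic_evidence(criteria):
    """Combine pathogenic criteria only (benign evidence is out of scope here)."""
    counts = Counter(strength(code) for code in criteria)
    vs, s = counts["very_strong"], counts["strong"]
    m, p = counts["moderate"], counts["supporting"]

    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s == 1 and (m >= 3 or (m == 2 and p >= 2) or (m == 1 and p >= 4)))
    )
    likely_pathogenic = (
        (vs == 1 and m == 1)
        or (s == 1 and 1 <= m <= 2)
        or (s == 1 and p >= 2)
        or m >= 3
        or (m == 2 and p >= 2)
        or (m == 1 and p >= 4)
    )
    if pathogenic:
        return "pathogenic"
    if likely_pathogenic:
        return "likely pathogenic"
    return "uncertain significance"


# Criteria reported in the text for c.524C>G (p.Pro175Arg).
print(classify_pathogenic_evidence(["PM2", "PM5", "PP2", "PP3"]))
```

With two moderate (PM2, PM5) and two supporting (PP2, PP3) criteria, the variant meets the "two moderate and at least two supporting" rule, which is consistent with the likely pathogenic classification reported above.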
As shown in Figure 2, the novel variant c.524C>G resulted in a proline being substituted with an arginine in position 175, which is located in the catalytic domain of the PAH protein. This variant does not promote structural alterations in the protein. In the combination of variants p.(Pro175Arg) and p.Arg252Trp, found in the genotype of patient 2, a small portion of monomers showed higher affinity between the subunits than the wild-type complex. The molecular modeling analysis of PAH variants p.Thr238Pro and p.Gly272Ter, found in patient 14, showed differences in the interaction energy between monomers in the PAH tetramer, and most of the different tetramers showed significantly lower affinity than the wild-type (Table S1).

Discussion
PKU is the most common inborn error of metabolism (IEM), and its incidence ranges from 1:850 to 1:112,000 in Europe (Karachay-Cherkessia Republic [Russia] and Finland, respectively) and is 1:10,000 live births in the USA [47]. In Brazil, its incidence is 1:25,000 live births [48], while in Southern Brazil it is 1:12,000-16,000 [49]. PKU has been included in Brazil's newborn screening program since 2001 [50]. Despite this screening program, our sample's median age at diagnosis was higher than the Brazilian Ministry of Health recommendation of up to 28 days of age [6]. One reason for this high median age at diagnosis is the difficulty in executing the program, which was implemented only in 2001. Some of our patients were born before that, when each Brazilian state performed a different screening and not all states included PKU in their newborn screening program. This explains the exceptionally late diagnosis of patients 22.1 and 22.2, who were diagnosed only after the development of severe symptoms. This family is also notable because the oldest brother (22.2), who was diagnosed after, and because of, the youngest brother, presented a milder neurological phenotype.
The PAH gene analysis by massive parallel sequencing is a fast, cost-effective, and accurate alternative for the genetic diagnosis of PKU [8,51]. Due to its large size and heterogeneity, similar symptoms are caused by alterations in more than one gene, as in the differential diagnosis of BH4 deficiency and DNAJC12 gene variants. In PKU, especially, a less time-consuming diagnosis can help to avoid the development of neurological symptoms, to predict BH4 responsiveness, and to facilitate a differential diagnosis [52].
In this study, the patients' molecular diagnosis agreed in every case with the diagnosis based on biochemical and clinical observations, which confirms the effectiveness of this methodology. We identified variants that were not covered in the previous genotyping analysis. In patient 9, for example, the error in previous genotyping could have been the result of a lack of specificity or coverage of the implemented technique or a lack of analysis of the parents' genotypes.
The most frequent variant found in the patients, c.1315+1G>A, was described as a common pathogenic variant in different Northern European populations, especially in Germany [53]. The second-most frequent variant, p.Arg158Gln, is also common in European populations, including Southern Italy and Eastern Europe [53]. The third-most prevalent variant found, p.Val388Met, is described as common in the Iberian Peninsula (5.7% in Spain and 19% in Portugal) and has a high frequency in Brazil (9%) and Chile (13%) [54].
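For comparison with the population frequencies cited above, the cohort allele frequencies can be derived from the counts reported in the abstract (8/66, 6/66, and 6/66). The sketch below is a simple worked calculation; the denominator of 66 alleles is taken from the abstract, and the function name is hypothetical.

```python
TOTAL_ALLELES = 66  # denominator reported in the abstract

# Allele counts reported in the abstract for the three most frequent variants.
allele_counts = {
    "c.1315+1G>A": 8,
    "c.473G>A": 6,
    "c.1162G>A": 6,
}


def allele_frequencies(counts, total):
    """Return each variant's allele frequency as a percentage, to one decimal."""
    return {variant: round(100 * n / total, 1) for variant, n in counts.items()}


for variant, pct in allele_frequencies(allele_counts, TOTAL_ALLELES).items():
    print(f"{variant}: {pct}%")
```

The value obtained for c.1162G>A (p.Val388Met), about 9.1%, is close to the 9% frequency cited for Brazil [54].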
The novel variant p.Pro175Arg involves the substitution of proline by arginine. The hydrophobic amino acid proline has particular properties: its side chain is bonded to the protein backbone, which restricts main-chain conformation. In contrast, arginine, a charged amino acid, is usually found in protein active sites or protein-binding sites [64]. The variant is located in the catalytic domain, although not in a hotspot region with highly destabilizing pathogenic variants between residues 238-330 [18]. The molecular modeling analysis indicates that this substitution can affect the binding between monomers in the PAH tetramer, which could lead to a change in the stability and activity of this enzyme. Another variant, p.Arg252Trp, retains 1% of PAH activity [65] and is associated with classic PKU.

Conclusions
The correlation between many genotype variations and their resulting phenotypes is already available in public databases. Thus, a fast genotype diagnosis of PKU patients can help with treatment outcomes. Genotyping is a helpful way to understand how phenylalanine hydroxylase is altered in a patient, the impact of this specific alteration on the enzyme, and the enzyme's level of residual activity with these variants. Additionally, genotyping can help patients whose genotypes have known BH4 responsiveness; when these patients are responsive, supplementation with BH4 enhances residual PAH activity through a chaperone-like effect on a misfolded enzyme subunit [66].
This study presents a summary of the clinical and genetic data of 33 unrelated patients from two different regions of Brazil, which confirmed the diagnosis of PAH deficiency in every case. A novel variant was found in the PAH gene.
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-4425/12/1/20/s1, Table S1. Differences in binding affinity between monomers in the PAH tetramer in patients 2 and 14 as predicted by the program FoldX 5.0. Interaction energy (∆G) between monomers A to D calculated using combinations of alleles 1 and 2 found in each patient. Differences between the energies of mutant and wild-type proteins (∆∆G = ∆Gmut − ∆Gwt) above 1.6 kcal/mol should significantly affect the stability of the tetramer.
Institutional Review Board Statement: This observational, cross-sectional study, with a convenience sampling strategy, was approved by the Hospital de Clínicas de Porto Alegre (HCPA) Research Ethics Committee (no. 2015-0556). All patients recruited for this study or their legal guardians provided written informed consent.
Informed Consent Statement: Written informed consent was obtained from all subjects involved in the study or from their legal guardians.