E-Learning for Rare Diseases: An Example Using Fabry Disease

Background: Rare diseases represent a challenge for physicians because patients are rarely seen, and they can manifest with symptoms similar to those of common diseases. In this work, genetic confirmation of diagnosis is derived from DNA sequencing. We present a tutorial for the molecular analysis of a rare disease using Fabry disease as an example. Methods: An exonic sequence derived from a hypothetical male patient was matched against human reference data using a genome browser. The missense mutation was identified by running BlastX, and information on the affected protein was retrieved from the database UniProt. The pathogenic nature of the mutation was assessed with PolyPhen-2. Disease-specific databases were used to assess whether the missense mutation led to a severe phenotype, and whether pharmacological therapy was an option. Results: An inexpensive bioinformatics approach is presented to get the reader acquainted with the diagnosis of Fabry disease. The reader is introduced to the field of pharmacological chaperones, a therapeutic approach that can be applied only to certain Fabry genotypes. Conclusion: The principle underlying the analysis of exome sequencing can be explained in simple terms using web applications and databases which facilitate diagnosis and therapeutic choices.


Introduction
A rare disease is any disease that affects only a small percentage of the population, occurring at a frequency of at most 1 in 2000 according to the most recent guidelines [1]. There are more than 6000 of these diseases, of which 5081 have a known phenotype description and molecular basis, while 1597 have a known phenotype description or locus with the molecular basis unknown, according to OMIM (Online Mendelian Inheritance in Man) [2]. Although these diseases are individually rare, together they affect a great number of people. Eighty per cent of these diseases are of genetic origin, and they are often chronic and life-threatening.
Each rare disease can have different genotypes and a large phenotypic spectrum. Nonsense mutations, deletions, and insertions abolish the function of the affected protein, but missense mutations have variable effects that range from complete inactivation to mild reduction of activity. On average there are 10-12 missense mutations per disease, but in some cases there are hundreds. These figures give a sense of the great challenge represented by rare diseases in terms of both diagnosis and therapy. Understanding rare diseases at the genetic level is essential in the search for personalized therapies.

Results
The case is that of an adolescent male patient who is affected by angiokeratoma and mild proteinuria. His parents are apparently healthy. There is a suspicion of Fabry disease and, if the diagnosis is confirmed, the clinician should decide whether it is a classic (severe) or an atypical (mild) form and whether a therapy should be started before other symptoms appear. In this case, a choice would need to be made between two types of intervention: enzymatic replacement therapy (ERT) or pharmacological chaperones.
Confirmation of the diagnosis can be carried out through sequencing the exons of the GLA gene or those of a restricted panel of genes that are associated with the symptoms of the patient. The appropriate assays can be found by searching by condition/phenotypes in the genetic testing registry (https://www.ncbi.nlm.nih.gov/gtr/tests/). This tutorial starts from a short sequence of DNA that could be derived from next-generation sequencing.
Data obtained following the tutorial are presented in the methods section. In brief, a variant is found by mapping the exonic sequences derived from the patient on the reference human genome using the program BLAT. Most variants are not associated with disease. It is usually assumed that a nucleotide change resulting in a synonymous codon is benign. Therefore, it is necessary to understand whether the mutation affects the protein product. Indeed, the transition observed in the GLA gene of the patient, G>A (Figure 1), generates a missense mutation in AGAL, p.V269M (Figure 2). This mutation is not reported in UniProt ("Pathology & Biotech" section) or ExAC (a database of alternative allele frequencies [26]), nor is it reported in OMIM® [2] or ClinVar [27]. It is predicted to be pathogenic by PolyPhen-2 [28] (Figure 3). p.V269M might represent a novel case.
Although purposely simplified, the example presented so far illustrates the generic pipeline that is followed for the analysis of a disease mutation. To go into the diagnosis in more depth and personalize the therapy, more must be learnt about the affected protein and the specific disease.
UniProt is a manually curated and annotated protein database that can be searched with the accession code provided by BlastX, P06280.1. Scrolling the UniProt page, one can learn that:
(1) Alpha-galactosidase A (AGAL) is encoded by the gene GLA (in the header of the file).
(2) AGAL has catalytic activity: "Hydrolysis of terminal, non-reducing α-D-galactose residues in α-D-galactosides, including galactose oligosaccharides, galactomannans and galactolipids." (in the "Function" section).
Fabrazyme® (from Genzyme). Used as a long-term enzyme replacement therapy in patients with a confirmed diagnosis of Fabry disease. The differences between Replagal® (also known as agalsidase alpha) and Fabrazyme® (also known as agalsidase beta) lie in the glycosylation patterns. Agalsidase beta is produced in the hamster CHO cell line while agalsidase alpha is produced in human cell lines." ("Pathology & Biotech: Pharmaceutical use" section).
(5) Another therapy is available for FD. A link to DrugBank ("Pathology & Biotech: Chemistry databases" section) [29] (DB05018) provides some details about the drug and summarizes its mechanism of action: "migalastat hydrochloride is an experimental, oral therapy for the treatment of Fabry disease and belongs to a class of molecules known as pharmacological chaperones".
Indeed, migalastat hydrochloride, or 1-deoxygalactonojirimycin (DGJ), is a pharmacological chaperone for FD; it stabilizes wild-type AGAL as well as some mutant forms. Mutations affecting the active site or the cysteines involved in disulphide bridge formation do not respond. Sparing these residues is necessary, but not sufficient, for a mutation to respond to migalastat [30]. In general, each mutation must be experimentally tested; the techniques needed for the analysis have been extensively described elsewhere [31][32][33] but are outside the scope of this tutorial.
Once the reader has been introduced to the disease and has become acquainted with its main genetic and biochemical aspects, they can move on to disease-specific databases by looking for them in PubMed. Two references point to online user-friendly databases. On searching fabry-database.org (Figure 4), the reader learns that the missense mutation p.V269M in AGAL has already been reported in the literature [34][35][36] and is associated with the classic phenotype, thus confirming the prediction of PolyPhen-2. The mutation is also annotated [37] in HGMD (The Human Gene Mutation Database), a manually curated database of disease-associated variants [11]. It should be remembered that when a new variant is found that is not included in any database of clinical phenotypes, it is advisable to perform a biopsy to obtain a definitive diagnosis [38].
Fabry_CEP is a specialized database that reports data found in the literature concerning the residual activity in cells of each possible AGAL mutant with or without the pharmacological chaperone DGJ [39]. References are provided and, when no experimental data are available, the probability of being responsive is given instead. When Fabry_CEP is queried with the mutation p.V269M, it returns experimental results (Figure 5), numerical values of activity with standard deviation obtained by three independent groups [32,40,41], and information provided by the group that commercializes migalastat under the registered name of Galafold®. Experimental conditions are slightly different, but in all cases the mutation is responsive to the drug. The estimate of the residual activity of the mutant enzyme in cells is low and confirms that the phenotype can be classic. Besides these data obtained from the literature, the reader will learn that the mutation does not occur in the active site (this result was obtained by running the program DrosteP [42] on the X-ray structure of AGAL).

Discussion
The tutorial we presented shows how a variant found in a patient can be critically evaluated to grade the diagnosis and personalize the therapy. We chose FD as an example, but the approach is not limited to this disease. Some questions emerged. In the first place, the clinician could encounter a variant that is not (yet) among the disease mutations in the most frequently consulted databases. This can occur either because the variant is new or because it has been described in the medical literature but has not yet been included. This problem can be solved by predicting the association with disease and/or looking for disease-specific databases. In addition, the clinician should check whether the mutation is associated with a severe phenotype and whether a mutation-specific therapy exists. FD represents a successful example of the use of pharmacological chaperones. This approach, which applies only to a subset of missense mutations, is not limited to FD, and is being assessed with respect to other lysosomal [43,44] and metabolic disorders [45][46][47][48] as well.
The diagnosis of rare diseases takes advantage of the sequence of the DNA of the patient alone or of the so-called trios in which data from parents are obtained too. The analysis can be extended to the whole genome or exome, or limited to a panel of genes or to the exons of a single gene. The huge amount of data, particularly in the case of genome or exome sequencing, requires the help of experts who can run pipelines of specific dedicated software. Yet, in the end, when the number of candidate variants is restricted, it is up to the clinician to make a diagnosis critically and choose the therapy. We have shown that this is possible because user-friendly web applications and databases can be used without specific bioinformatics training.

Aims
The reader will get familiar with databases and programs that are used during exome sequencing analysis and with disease-specific tools. Only a basic knowledge of genetics and biochemistry is required. The tutorial will start from the results of an analysis of the DNA of a hypothetical male patient. It will proceed with variant calling, identification of the type of mutation, and prediction of its pathogenic nature. Information about the affected protein and potential therapies will be gained.

Requirements
This is an in silico exercise, and only a computer with an internet connection is required. A list is provided of bioinformatics tools that do not require registration and that enable fast, low-cost, and reliable analysis of biological data through user-friendly interfaces.

Input Exonic Sequence
We assume that the sequence has been obtained from a male patient: ttaatgattggcaactttggcctcagctggaatcagcaagtaactcagatggccctctgggctatcatggctgctcc
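Before pasting, the read can be checked and wrapped in FASTA format, which sequence search tools commonly accept. The following is a minimal Python sketch; the record name `patient_read` and the helper `to_fasta` are hypothetical names chosen for illustration and are not part of the original tutorial.

```python
# Input read from the tutorial (hypothetical male patient).
PATIENT_READ = (
    "ttaatgattggcaactttggcctcagctggaatcagcaagtaactcag"
    "atggccctctgggctatcatggctgctcc"
)


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record after removing whitespace and checking the alphabet."""
    seq = "".join(sequence.split()).upper()
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Wrap the sequence at a fixed width, as is conventional for FASTA files.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_fasta("patient_read", PATIENT_READ), end="")
```

The alphabet check is there only to catch copy-and-paste damage, such as stray digits or punctuation, before the sequence is submitted.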

Protocol
Step 1. Variant Calling
The nucleotide sequence will be mapped on the human reference genome.
(1) Open the UCSC genome browser and choose BLAT among the tools (Figure 6A, point 1). The latest assembly of the human genome is chosen by default and does not need to be changed. An overview of how the program BLAT works is offered on the search page.
(2) Paste the given sequence into the Query Sequence box (Figure 6B, point 2).
(3) Submit (Figure 6B, point 3).
The output is a list of significant hits. The highest score is obtained by mapping the sequence on the X chromosome. Clicking on "details" (Figure 6C, point 4) opens a side-by-side alignment of the patient's sequence with the reference genome. A G>A transition is observed (Figure 1).
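What the side-by-side alignment shows can be mimicked offline by comparing the read with the matching reference fragment base by base. The sketch below is only illustrative: the reference fragment is an assumption, rebuilt by reverting the single G>A transition described in the text (in practice it would be copied from the BLAT alignment), and the function names are hypothetical. Bases are written on the strand of the read.

```python
PATIENT = (
    "ttaatgattggcaactttggcctcagctggaatcagcaagtaactcag"
    "atggccctctgggctatcatggctgctcc"
)

# ASSUMPTION: the reference fragment is reconstructed from the patient's read by
# reverting the one substitution reported in the text (reference g, patient a).
VARIANT_INDEX = 3  # 0-based position of the substituted base within the read
REFERENCE = PATIENT[:VARIANT_INDEX] + "g" + PATIENT[VARIANT_INDEX + 1:]

PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}


def find_substitutions(reference: str, query: str):
    """List (0-based index, reference base, query base) for every mismatch."""
    if len(reference) != len(query):
        raise ValueError("this sketch handles substitutions only, not indels")
    pairs = zip(reference.lower(), query.lower())
    return [(i, r, q) for i, (r, q) in enumerate(pairs) if r != q]


def substitution_kind(ref_base: str, alt_base: str) -> str:
    """A transition stays within purines or within pyrimidines."""
    bases = {ref_base.lower(), alt_base.lower()}
    same_class = bases <= PURINES or bases <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for index, ref_base, alt_base in find_substitutions(REFERENCE, PATIENT):
        kind = substitution_kind(ref_base, alt_base)
        print(f"read position {index + 1}: {ref_base.upper()}>{alt_base.upper()} ({kind})")
```

Real variant callers also handle insertions, deletions, and base quality; this comparison covers only the single-substitution case discussed here.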
Step 2. Is it a missense, nonsense, or a synonymous mutation?
The sequence will be translated to check in the protein database UniProt whether the mutation has an effect on the gene product.
(1) Go to BLAST and choose BlastX. The program searches protein databases using a translated nucleotide query.
Be patient! When you get results, scroll the page. The best alignment is obtained with α-galactosidase A, UniProt Sequence ID: P06280.1 (Figure 2). One amino acid (V269) in the subject sequence found in UniProt is substituted by M in the query (i.e., the patient's) sequence.
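The question asked in Step 2 can also be checked by hand, by translating the read and the reference fragment with the standard genetic code and comparing codons. The sketch below rests on stated assumptions: the reading frame starts at the first base of the read and the first codon corresponds to residue 268, both inferred from the text's statement that the substituted residue is V269; the reference fragment is again rebuilt by reverting the G>A change. The function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PATIENT = (
    "ttaatgattggcaactttggcctcagctggaatcagcaagtaactcag"
    "atggccctctgggctatcatggctgctcc"
)
# ASSUMPTION: reference fragment obtained by reverting the reported G>A change.
REFERENCE = PATIENT[:3] + "g" + PATIENT[4:]


def translate(dna: str, frame: int = 0) -> str:
    """Translate complete codons only; trailing bases are ignored."""
    seq = dna.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def classify_codon_changes(ref_dna: str, alt_dna: str, first_residue: int = 1):
    """Compare two in-frame sequences codon by codon and label each difference."""
    ref, alt = ref_dna.upper(), alt_dna.upper()
    calls = []
    for i in range(0, min(len(ref), len(alt)) - 2, 3):
        ref_codon, alt_codon = ref[i:i + 3], alt[i:i + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        if ref_aa == alt_aa:
            kind = "synonymous"
        elif alt_aa == "*":
            kind = "nonsense"
        elif ref_aa == "*":
            kind = "stop-loss"
        else:
            kind = "missense"
        calls.append((kind, f"p.{ref_aa}{first_residue + i // 3}{alt_aa}"))
    return calls


if __name__ == "__main__":
    print(translate(REFERENCE))
    print(translate(PATIENT))
    print(classify_codon_changes(REFERENCE, PATIENT, first_residue=268))
```

BlastX does more than this, since it translates the query in all six frames and searches a protein database; the sketch only reproduces the final comparison for a frame that is already known.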
Step 3. Obtaining information about the protein affected by the mutation
(1) Query UniProt using the ID of the target protein found with BlastX: P06280.1 (Figure 8A, point 1).
You will get the entry name AGAL_HUMAN, and you should click on the link beside it (Figure 8A, point 2).
Many details on the protein will appear: Function, Names & Taxonomy, Subcellular location, and so on. A long list of natural variants is reported in the Pathology & Biotech section, most of which are implicated in FD (Figure 8B). The substitution of V by M at position 269 (p.V269M) is not among them. Links to OMIM (300644, gene; 301500, phenotype) can be followed to read about the disease and its pathological variants. Another popular site is ClinVar, https://www.ncbi.nlm.nih.gov/clinvar/. It can be searched with "GLA AND V269M", but the variant p.V269M cannot be found in this database either.
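The same lookups can in principle be scripted. The sketch below only builds the request URLs; it assumes the public UniProt REST endpoint (`rest.uniprot.org/uniprotkb`) and the NCBI E-utilities `esearch` endpoint, neither of which is mentioned in the original tutorial, and the helper names are hypothetical. Database content changes over time, so results obtained today may differ from those described here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def strip_version(accession: str) -> str:
    """BlastX reports 'P06280.1'; the entry is addressed by the bare accession."""
    return accession.split(".", 1)[0]


def uniprot_entry_url(accession: str, fmt: str = "json") -> str:
    # ASSUMPTION: UniProt REST API address pattern for a single entry.
    return f"https://rest.uniprot.org/uniprotkb/{strip_version(accession)}.{fmt}"


def clinvar_search_url(term: str) -> str:
    # ASSUMPTION: NCBI E-utilities esearch endpoint, queried against ClinVar.
    query = urlencode({"db": "clinvar", "term": term, "retmode": "json"})
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?" + query


def fetch_json(url: str, timeout: float = 30.0):
    """Download and decode a JSON document (requires an internet connection)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Printing the URLs keeps the example usable offline.
    print(uniprot_entry_url("P06280.1"))
    print(clinvar_search_url("GLA AND V269M"))
```

Passing either URL to `fetch_json` retrieves the record for inspection; the web interfaces described in the protocol remain the simpler route for a reader without programming experience.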
Step 4. Is the variant pathogenic?
The variant carried by the patient, p.V269M, is not in the lists reported by UniProt, OMIM or ClinVar. It might be a new disease mutation. You can run predictive programs such as PolyPhen-2.
(1) Launch the program by inserting the entry name of the protein, "AGAL_HUMAN" (Figure 9A, point 1), the site of the mutation, 269, the wild-type amino acid, V, and the mutated one, M (Figure 9A, point 2).
Be patient! Then you can check the result by clicking on View (Figure 3).
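The values entered in the PolyPhen-2 form can be derived mechanically from the protein-level notation. The following is a small sketch that splits a one-letter change such as p.V269M into its parts; the function names and the dictionary keys are hypothetical labels for the form fields described above, not an official PolyPhen-2 interface.

```python
import re

# One-letter protein change, e.g. "p.V269M": wild-type residue, position, mutant residue.
_PROTEIN_CHANGE = re.compile(r"^p\.([A-Z])(\d+)([A-Z])$")


def parse_protein_change(change: str):
    """Split 'p.V269M' into ('V', 269, 'M'); reject anything else."""
    match = _PROTEIN_CHANGE.match(change.strip())
    if match is None:
        raise ValueError(f"not a one-letter missense notation: {change!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


def polyphen_form_values(protein_id: str, change: str) -> dict:
    """Collect the four values the tutorial enters in the query form."""
    wild_type, position, mutant = parse_protein_change(change)
    return {
        "protein": protein_id,   # entry name, e.g. AGAL_HUMAN
        "position": position,    # site of the mutation
        "wild_type": wild_type,  # amino acid in the reference protein
        "mutant": mutant,        # amino acid in the patient
    }


if __name__ == "__main__":
    print(polyphen_form_values("AGAL_HUMAN", "p.V269M"))
```

Parsing the notation rather than typing the values separately reduces the risk of swapping the wild-type and mutant residues.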
Step 5. Do specific databases exist? Does the mutation cause severe inactivation of AGAL? Does it respond to DGJ?
PubMed can be searched with the keywords "Fabry AND Database AND User friendly"; fabry-database and FABRY_CEP are disease specific and require only the missense mutation as input.

designed the study and wrote the manuscript. Chiara Cimmaruta and Ludovica Liguori carried out the experiments. Maria Monticelli prepared the figures and tested the tutorial. All authors read and approved the final manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
FD: Fabry disease
Gb3: Globotriaosylceramide
ERT: Enzymatic replacement therapy
DGJ: 1-Deoxygalactonojirimycin