Annotating Cancer Variants and Anti-Cancer Therapeutics in Reactome

Reactome describes biological pathways as chemical reactions that closely mirror the actual physical interactions that occur in the cell. Recent extensions of our data model accommodate the annotation of cancer and other disease processes. First, we have extended our class of protein modifications to accommodate annotation of changes in amino acid sequence and the formation of fusion proteins to describe the proteins involved in disease processes. Second, we have added a disease attribute to reaction, pathway, and physical entity classes that uses disease ontology terms. To support the graphical representation of “cancer” pathways, we have adapted our Pathway Browser to display disease variants and events in a way that allows comparison with the wild type pathway, and shows connections between perturbations in cancer and other biological pathways. The curation of pathways associated with cancer, coupled with our efforts to create other disease-specific pathways, will interoperate with our existing pathway and network analysis tools. Using the Epidermal Growth Factor Receptor (EGFR) signaling pathway as an example, we show how Reactome annotates and presents the altered biological behavior of EGFR variants due to their altered kinase and ligand-binding properties, and the mode of action and specificity of anti-cancer therapeutics.

Currently, the pathways in Reactome cover about 25% of the gene products encoded in the human genome, and contain the normal versions of many pathways that can be abnormally activated in cancer, such as "Signaling by EGFR" [21], "Signaling by FGFR" [22], "Signaling by NOTCH" [23], "PIP3 Activates AKT Signaling" [24], and "RAF/MAP Kinase Cascade" [25]. We have also annotated a number of pathways that can be inactivated in cancer, such as pathways involving TP53, "Apoptosis" [26] and "Cell Cycle Checkpoints" [27], as well as pathways involving the RB1 protein family, such as "Mitotic G1-G1/S phases" [28].
Here, we use the epidermal growth factor receptor (EGFR), fibroblast growth factor receptor (FGFR) and PI3K/AKT signaling pathways to illustrate Reactome annotation of cancer pathways. EGFR and FGFR are transmembrane receptor tyrosine kinases. EGFR is activated by several growth factors, including the epidermal growth factor (EGF) [29]. FGFR family members (FGFR1, FGFR2, FGFR3 and FGFR4) are activated by 18 of 22 existing human fibroblast growth factors (FGFs), with each FGFR showing different affinity for individual FGFs [30]. Growth factor binding induces a conformational change that enables dimerization and trans-autophosphorylation on C-tail tyrosine residues of EGFR [31] and FGFRs [32][33][34]. Phosphorylated tyrosines in the C-tails of EGFR and FGFR serve as docking sites for downstream effectors that, upon binding to phosphorylated receptors, activate intracellular signaling cascades that regulate cellular proliferation, differentiation, and survival [30,35,36]. One of the intracellular signaling cascades downstream of EGFR and FGFRs is PI3K/AKT signaling [37,38]. PI3K class IA enzymes are heterodimers composed of a regulatory subunit p85 (encoded by PIK3R1, PIK3R2 or PIK3R3) and a catalytic subunit p110 (encoded by PIK3CA, PIK3CB or PIK3CD) [39]. The catalytic p110 subunit of PI3K becomes activated when inhibitory contacts with the p85 regulatory subunit are relieved by p85 binding to phosphorylated adaptor proteins recruited to activated EGFR or FGFRs [40,41]. Active PI3K class I enzymes phosphorylate PIP2 (phosphatidylinositol-4,5-bisphosphate), converting it into PIP3 (phosphatidylinositol-3,4,5-trisphosphate), a reaction negatively regulated by PTEN phosphatase [42]. PIP3 serves as a second messenger that activates AKT (AKT1, AKT2 or AKT3) [43]. AKT family members are cytosolic and nuclear serine/threonine protein kinases involved in phosphorylation-mediated regulation of numerous proteins involved in cell survival and growth [39].
Small molecule therapeutics and recombinant antibodies are being developed as potential treatments for cancers driven by increased activity of EGFR, FGFR and/or PI3K/AKT. Gefitinib and erlotinib, small tyrosine kinase inhibitors, are approved for the clinical treatment of cancers harboring specific EGFR mutations. A recombinant antibody, cetuximab, is approved for the clinical treatment of cancers that overexpress wild-type EGFR [59]. Small molecules that inhibit the catalytic activity of FGFRs [60], PI3K and AKT [61] are currently undergoing clinical trials or are in pre-clinical development.
We have extended the Reactome data model and enhanced the web tools to permit the annotation and visualization of the altered biological behavior of protein variants. These enhancements can be applied to any molecular abnormality due to germline or somatic mutation, as well as to abnormalities due to expression of foreign proteins encoded by genomes of infectious agents like viruses or intracellular parasites.

Annotation of Cancer-Perturbed Pathways
Pathways that stimulate cell growth, cell division and survival, and maintenance of undifferentiated state are activated in cancer through gain-of-function mutations in participating proto-oncogenes and/or their overexpression. On the other hand, pathways that negatively regulate cell division, growth and survival, or promote cellular differentiation are inactivated through loss-of-function mutations in tumor suppressor genes and/or their downregulation. To capture these two groups of cancer effectors, we have added new classes of data to the Reactome database.

Extension of Protein Modifications to Accommodate Annotation of Changes in Amino Acid Sequence
The protein modification class in the Reactome data model was constructed to support annotation of covalent co- and post-translational modifications of proteins, such as the phosphorylation of serine residues. To allow for annotation of mutant proteins, two new subclasses of modifications were introduced: Replaced Residue and Fragment Modification (Figure 1a). The Replaced Residue class is used to annotate amino acid substitutions in a protein sequence. A Replaced Residue instance associates a specific coordinate of a protein sequence with two PSI-MOD ontology [62] attributes: the first identifies the amino acid found at that position in the normal protein and the second identifies the amino acid that replaces it in the mutant protein. For example, the most frequently found mutation in EGFR is the substitution of the leucine residue at position 858, in the catalytic domain, with an arginine residue. This mutation disrupts autoinhibitory interactions, facilitating adoption of an active conformation [63]. The Reactome record for EGFR L858R (Figure 1b) indicates this amino acid substitution.
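As an illustration only, the Replaced Residue idea described above (one sequence coordinate plus the normal and the replacing amino acid) can be sketched as a small Python structure. The class name, field names, and the toy sequence are assumptions made for this sketch and are not taken from the Reactome schema; one-letter residue codes stand in for the PSI-MOD terms that Reactome actually stores.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for a Replaced Residue instance.
# One-letter codes are used here in place of PSI-MOD ontology terms.
@dataclass(frozen=True)
class ReplacedResidue:
    coordinate: int       # 1-based position in the reference sequence
    normal_residue: str   # amino acid found in the normal protein
    mutant_residue: str   # amino acid that replaces it in the mutant

    def label(self) -> str:
        """Short variant label, e.g. 'L858R'."""
        return f"{self.normal_residue}{self.coordinate}{self.mutant_residue}"

    def apply(self, sequence: str) -> str:
        """Return the mutant sequence after checking the normal residue."""
        index = self.coordinate - 1
        if not 0 <= index < len(sequence):
            raise ValueError("coordinate lies outside the sequence")
        if sequence[index] != self.normal_residue:
            raise ValueError(
                f"expected {self.normal_residue} at {self.coordinate}, "
                f"found {sequence[index]}"
            )
        return sequence[:index] + self.mutant_residue + sequence[index + 1:]


egfr_l858r = ReplacedResidue(coordinate=858, normal_residue="L", mutant_residue="R")

# Toy sequence, NOT the real EGFR sequence: placeholders with L at position 858.
toy_sequence = "A" * 857 + "L" + "A" * 10
variant = egfr_l858r.apply(toy_sequence)
print(egfr_l858r.label(), variant[857])
```

Checking the normal residue before substituting guards against applying a variant to the wrong reference sequence or coordinate system.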
The FragmentModification subclass includes two subclasses, FragmentInsertionModification and FragmentDeletionModification. FragmentInsertionModification is used to annotate insertions of amino acid residues in a protein sequence. FragmentDeletionModification is used to annotate removal of amino acid residues. PIK3R1 Y463_L466del is a variant of the PI3K regulatory subunit p85alpha found in endometrial cancer (Figure 1c). This PIK3R1 mutant lacks four amino acid residues in the inter-SRC homology 2 (iSH2) domain. The mutant is able to bind the catalytic subunit of PI3K, PIK3CA (p110alpha), but does not inhibit it, resulting in constitutive activity of PI3K in the absence of growth factors [64]. The deletion coordinates are indicated in the Reactome record for the PIK3R1 Y463_L466del mutant.
Figure 1 (partial caption). Future changes to the website will allow chemical modifications to be distinguished from effects of mutations. (b) Reactome record for EGFR L858R caused by a missense mutation that replaces the leucine residue at position 858 with arginine. (c) Reactome record for PIK3R1 Y463_L466del caused by an in-frame intragenic deletion in PIK3R1 that removes amino acid residues from position 463 to position 466, as captured by the FragmentDeletionModification instance. (d) Reactome record for the BCR-FGFR1 fusion protein. Truncation of the wild-type BCR protein sequence is shown by the altered end coordinate. The FGFR1 fragment fused to BCR is annotated as an insertion using the FragmentInsertionModification class.
The FragmentModification class can also be used to annotate fusion proteins. For example, the translocation t(8;22)(p11;q11) in chronic myeloid leukemia produces a BCR-FGFR1 fusion that consists of the first four exons of BCR and exons 9-18 of FGFR1 [65]. The BCR-FGFR1 fusion protein is annotated as an Entity with Accessioned Sequence (Figure 1d) that consists of a truncated BCR protein, starting at position 1 and ending at position 584 of the reference UniProt sequence P11274 (human BCR). Then, a FragmentInsertionModification instance defines insertion of amino acids 429-822 of the UniProt reference sequence P11362 (human FGFR1) at position 585 of BCR (Figure 1d).
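The coordinate arithmetic behind the fragment modifications described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class and function names are hypothetical, the sequences are placeholder strings rather than the real UniProt sequences, and only the coordinates (463-466 for the PIK3R1 deletion; BCR 1-584 and FGFR1 429-822 for the fusion) come from the text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentDeletion:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    def apply(self, sequence: str) -> str:
        return sequence[: self.start - 1] + sequence[self.end:]


@dataclass(frozen=True)
class FragmentInsertion:
    position: int     # 1-based host position at which the fragment starts
    donor_start: int  # 1-based, inclusive
    donor_end: int    # 1-based, inclusive

    def apply(self, host: str, donor: str) -> str:
        fragment = donor[self.donor_start - 1 : self.donor_end]
        return host[: self.position - 1] + fragment + host[self.position - 1 :]


def parse_deletion(label: str) -> FragmentDeletion:
    """Parse a label such as 'Y463_L466del' into deletion coordinates."""
    match = re.fullmatch(r"[A-Z](\d+)_[A-Z](\d+)del", label)
    if match is None:
        raise ValueError(f"unrecognised deletion label: {label}")
    return FragmentDeletion(int(match.group(1)), int(match.group(2)))


# Toy p85alpha-like sequence (placeholder residues, length 500).
toy_pik3r1 = "A" * 462 + "YGGL" + "A" * 34
deleted = parse_deletion("Y463_L466del").apply(toy_pik3r1)

# Toy fusion: truncated BCR (positions 1-584) plus FGFR1 residues 429-822.
toy_bcr_truncated = "B" * 584
toy_fgfr1 = "F" * 822
fusion = FragmentInsertion(position=585, donor_start=429, donor_end=822).apply(
    toy_bcr_truncated, toy_fgfr1
)
print(len(deleted), len(fusion))
```

With these coordinates the toy fusion has 584 + (822 - 429 + 1) residues, which is why the insertion is anchored at position 585, immediately after the truncated BCR portion.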
On the Reactome website, selecting a physical entity or an event node by clicking on a pathway diagram brings up a record for that particular instance in the details pane, which appears by clicking on the yellow triangle at the bottom of the Pathway Browser page. Selecting EGFRvIII in the diagram (Figure 2a) brings up Reactome information on this mutant protein, as well as interactive cross-references that direct users to other Reactome website pages or other databases of interest (Figure 2b). Each cancer-related disease variant record cross-references available records in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (Table 1) [66]. The EGFRvIII record displayed on the Reactome website links to COSMIC record 21351, which provides information on nucleotide sequence changes and tumor samples in which this mutation was reported.

Associating Disease Attributes with Physical Entities and Events
All physical entities related to disease variants, such as proteins, sets of proteins, and protein complexes are tagged with disease attributes (Table 1), using a term from the Disease Ontology (DO) [67]. This DO record provides, when possible, a link to the synonymous disease record in the National Cancer Institute Thesaurus (NCIt) [68]. The disease attribute of the physical entity is assigned to all reactions and pathways in which it participates.
Besides providing information on disease involvement of specific proteins and directing users to more detailed disease descriptions, a disease attribute annotation enables users to search the Reactome database for proteins and events associated with a specific disease. For example, in Figure 2b, a DO instance "adult glioblastoma multiforme" is associated with EGFRvIII. Clicking on the "adult glioblastoma multiforme" link displayed on the Reactome website (Figure 2b) provides a DO identifier for this disease instance (3075) and also lists all other proteins in the Reactome database whose mutant forms are associated with adult glioblastoma multiforme (Figure 2c). Thus, Reactome provides cancer researchers with quick access to cancer type-specific disease variants and information on the mechanism of action for each variant annotated.
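The two behaviours described in this section, assigning an entity's disease attribute to the reactions and pathways it participates in and searching entities by disease term, can be sketched with plain dictionaries. Everything here is an assumption for illustration: the function names, the reaction and pathway identifiers, and the data layout are hypothetical and do not reflect how Reactome stores or queries these attributes. Only the EGFRvIII association with "adult glioblastoma multiforme" is taken from the text.

```python
def propagate_disease(entity_diseases, reaction_participants, pathway_reactions):
    """Assign each entity's disease terms to its reactions, then to pathways."""
    reaction_diseases = {
        reaction: set().union(*(entity_diseases.get(e, set()) for e in entities))
        for reaction, entities in reaction_participants.items()
    }
    pathway_diseases = {
        pathway: set().union(*(reaction_diseases.get(r, set()) for r in reactions))
        for pathway, reactions in pathway_reactions.items()
    }
    return reaction_diseases, pathway_diseases


def entities_for_disease(entity_diseases, term):
    """List every entity tagged with the given disease term."""
    return sorted(name for name, terms in entity_diseases.items() if term in terms)


# Toy data; reaction and pathway names are hypothetical placeholders.
entity_diseases = {
    "EGFRvIII": {"adult glioblastoma multiforme"},
    "EGFR": set(),
    "EGF": set(),
}
reaction_participants = {
    "reaction_1": ["EGF", "EGFR"],
    "reaction_2": ["EGFRvIII"],
}
pathway_reactions = {
    "pathway_normal": ["reaction_1"],
    "pathway_disease": ["reaction_1", "reaction_2"],
}

reaction_diseases, pathway_diseases = propagate_disease(
    entity_diseases, reaction_participants, pathway_reactions
)
print(pathway_diseases["pathway_disease"])
print(entities_for_disease(entity_diseases, "adult glioblastoma multiforme"))
```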

Mode of Action and Specificity of Anti-Cancer Therapeutics
The Reactome data model allows for annotation of small molecules and antibodies used as anti-cancer therapeutics, as well as the annotation of their specific mode of action. We have annotated nine small tyrosine kinase inhibitors (TKIs) used to inhibit EGFR kinase activity in cancer [59,69], as well as the recombinant antibody cetuximab [70] (Figure 3a). In addition, we annotated five benzoquinoid ansamycins that inhibit the HSP90 chaperone protein that stabilizes EGFR mutant proteins [71], twelve anti-FGFR TKIs [60], one anti-FGFR recombinant antibody [72], ten small molecules that inhibit the catalytic subunit of PI3K [61], and three small molecules that inhibit AKT [61] (Table 2). For each anti-EGFR TKI, we specify whether it associates with the EGFR catalytic domain through formation of a covalent (irreversible) bond or through a non-covalent (reversible) interaction. We also specify whether a TKI is EGFR-specific or whether it can inhibit other receptor tyrosine kinases besides EGFR (EGFRplus). Each small molecule instance we annotate is associated with a Chemical Entities of Biological Interest (ChEBI) database identifier [19]. On the Reactome website, a link to the corresponding ChEBI record is displayed after the name of each small molecule. Clicking on the ChEBI link associated with gefitinib (Figure 3b) directs the user to the gefitinib information in ChEBI, displaying its molecular structure and additional information not directly captured by Reactome. EGFR cancer mutants in Reactome are classified into sets based on their sensitivity to various TKIs (Figure 3a). Ligand-responsive EGFR mutants sensitive to non-covalent TKIs can be inhibited by low concentrations of non-covalent (reversible) TKIs that do not significantly affect the function of wild-type EGFR and therefore produce minimal side effects. Ligand-responsive EGFR mutants resistant to non-covalent TKIs can be inhibited by covalent (irreversible) TKIs.
As can be seen from the diagram (Figure 3a), concentrations of irreversible TKIs that inhibit EGFR mutants also inhibit the function of the wild-type protein, causing more severe side effects, as described in event summations. Cetuximab is used for treatment of cancers that overexpress wild-type EGFR protein, usually due to amplification of the EGFR locus [59,70].
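A rough sketch of the TKI annotations described above (binding mode, EGFR specificity) and of the pairing between EGFR mutant sets and TKI classes might look like this. The type names, field names, and the inhibitor names "TKI-1" to "TKI-3" are placeholders invented for the example; they are not Reactome identifiers or real drugs, and the selection rule encodes only the two statements made in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Binding(Enum):
    COVALENT = "covalent (irreversible)"
    NON_COVALENT = "non-covalent (reversible)"


@dataclass(frozen=True)
class TKIAnnotation:
    name: str            # placeholder names below, not real drugs
    binding: Binding
    egfr_specific: bool  # False corresponds to the "EGFRplus" label


def candidate_tkis(resistant_to_non_covalent, tkis):
    """Pick the TKI class that the text pairs with each EGFR mutant set."""
    wanted = Binding.COVALENT if resistant_to_non_covalent else Binding.NON_COVALENT
    return [t.name for t in tkis if t.binding is wanted]


tkis = [
    TKIAnnotation("TKI-1", Binding.NON_COVALENT, egfr_specific=True),
    TKIAnnotation("TKI-2", Binding.NON_COVALENT, egfr_specific=False),
    TKIAnnotation("TKI-3", Binding.COVALENT, egfr_specific=False),
]

print(candidate_tkis(False, tkis))  # mutants sensitive to non-covalent TKIs
print(candidate_tkis(True, tkis))   # mutants resistant to non-covalent TKIs
```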

Other Disease Pathways in Reactome
In addition to cancer, Reactome also collects and provides information on communicable diseases. Currently featured infection-related Reactome pathways are "HIV Infection", "Influenza Infection", "Botulinum Neurotoxicity", and "Latent Infection with Mycobacterium tuberculosis". The pathway "Signaling by FGFR in Disease" contains, besides information on FGFR in cancer, the information on FGFR mutations and their functional implication in various developmental disorders, such as Pfeiffer syndrome and Crouzon syndrome. Reactome has recently published "Abnormal Metabolism in Phenylketonuria" and "Mucopolysaccharidoses" pathways, thereby introducing metabolic genetic diseases.

Enhancing the Reactome Pathway Browser for Display of Disease Variants
The Reactome Pathway Browser, based upon the Systems Biology Graphical Notation (SBGN) [73], permits the navigation and analysis of Reactome data, in a similar manner to Google Maps. SBGN is a standard graphical representation of biological pathway and network models. The Pathway Browser was adapted to enable display of disease variants and disease-related events involving proteins. A pathway diagram is shared between a wild-type pathway, for example "Signaling by EGFR", and the corresponding disease pathway, "Signaling by EGFR in Cancer". A disease attribute, attached to events involving cancer, instructs the browser to hide disease events when a user selects a wild-type pathway view (Figure 4a). When a user selects a disease pathway view, disease events appear in the diagram while all normal events are shaded gray. All disease events and physical entities with disease tags are outlined in red for easier visualization (Figure 4b).
Figure 4. Display of wild-type and disease pathway diagrams. (a) A cancer disease attribute, assigned to events involved in cancer, instructs the browser to hide disease events when a user selects to view a wild-type pathway. (b) When a user selects to view a disease pathway, disease events appear in the wild-type diagram, while all normal events are shaded. All disease events and physical entities with disease tags are outlined in red for easier visualization.
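The view-dependent display rules described in this section reduce to a small decision table, sketched below. The enum and function names are hypothetical and chosen for this example; this is not the Pathway Browser's actual code, only a restatement of the stated behaviour.

```python
from enum import Enum


class Style(Enum):
    HIDDEN = "hidden"
    NORMAL = "normal"
    GRAYED = "shaded gray"
    RED_OUTLINE = "outlined in red"


def event_style(is_disease_event: bool, disease_view: bool) -> Style:
    """Return how an event is drawn in the shared pathway diagram."""
    if disease_view:
        # Disease view: disease events are highlighted, normal events shaded.
        return Style.RED_OUTLINE if is_disease_event else Style.GRAYED
    # Wild-type view: disease events are hidden, normal events drawn as usual.
    return Style.HIDDEN if is_disease_event else Style.NORMAL


for disease_view in (False, True):
    for is_disease_event in (False, True):
        style = event_style(is_disease_event, disease_view)
        print(disease_view, is_disease_event, style.value)
```

Because both views are derived from one diagram and one flag per event, the wild-type and disease layouts stay aligned, which is what makes side-by-side comparison possible.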
Physical entity and reaction nodes within the pathway diagrams are interactive. Clicking on either feature displays specific information and additional links out to external databases in the "Details" Panel, which opens by clicking on the yellow triangle at the bottom of the Pathway Browser page (Figure 2). Context-sensitive menus, accessible by right-clicking on a selected entity, provide additional information about the physical entity in the pathway: a catalogue of other pathways in Reactome in which the selected entity participates; a list of the entities that contribute to the macromolecular complex; a catalogue of interactors of the selected entity; and the option to export a list of interactors of the selected entity. The latter two features of the context-sensitive menu increase protein coverage and associated variant annotations. The Molecular Interaction Overlay (MI Overlay), accessible through the "Analyze, Annotate & Upload" button of the Pathway Browser, displays proteins interacting with the manually annotated protein components of a Reactome pathway. This network overlay tool employs PSICQUIC (Proteomics Standard Initiative Common QUery InterfaCe) to overlay an interactive display of interaction data from an external database, such as IntAct [74], onto Reactome pathway diagrams. Other sources of interaction data include protein-protein and protein-drug/small molecule interactions; a user-supplied list can also be displayed. By displaying interaction data from ChEMBL, a database of bioactive drug-like molecules (Figure 5) [75], the MI Overlay feature provides an opportunity to identify protein variant-drug interactions, novel cancer targets or off-target effects, and pharmaceuticals that can moderate perturbed reactions or pathways experimentally.
Figure 5. AKT1 E17K mutant-small molecule interactions. When ChEMBL is selected as the interaction database, the MI Overlay displays small molecules from ChEMBL as interactors of the AKT1 E17K variant protein of the PI3K/AKT Signaling in Cancer pathway. The nodes of the mini network are interactive; clicking the node to the left of the green arrow will link out to the Staurosporine protein kinase inhibitor record at ChEMBL.

Reactome Cancer-Perturbed Pathways Support Pathway Visualization and Analysis
The Pathway Browser provides an intuitive and interactive pathway visualization system, promoting a variety of web-based data analyses of user-supplied experimental data. The Pathway Analysis tool provides two alternate functions to analyze lists of genes. First, in the identifier (ID) mapping mode, a user-supplied set of gene or protein identifiers can be mapped to Reactome events. Second, in the overrepresentation analysis mode, users can determine which pathways are statistically overrepresented in a gene/protein list. The Expression Analysis tool will aid with the biological interpretation of large-scale cancer genome sequencing, genomics and proteomics experiments. For example, this tool allows users to visualize expression data (or any other numeric value, e.g., differential expression) superimposed on the Reactome pathway diagram. Reactome applies an orthology-based computational algorithm to curated human data to infer pathways in 22 diverse model organisms. The Species Comparison tool allows users to visually compare and contrast human pathways with these predicted model organism pathways. As additional cancer-perturbed pathways are added to Reactome, this method of "inferred" curation will provide a platform from which to study molecular disease mechanisms across the evolutionary spectrum. Reactome data is available for downloading and manipulation by third party visualization and analysis tools, including Cytoscape, Vanted and CellDesigner [76][77][78].
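To make the overrepresentation analysis mode concrete, the sketch below uses a one-sided hypergeometric test, a common choice for this kind of analysis. This is an assumption: the text does not state which statistical test the Reactome Pathway Analysis tool applies, and the function name and all numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population: int, pathway_size: int,
                     sample_size: int, hits: int) -> float:
    """P(X >= hits) when sample_size genes are drawn without replacement
    from a population in which pathway_size genes belong to the pathway."""
    upper = min(pathway_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 5000 annotated genes, a 40-gene pathway,
# and a 100-gene user list that contains 6 pathway members.
p_value = hypergeom_pvalue(population=5000, pathway_size=40,
                           sample_size=100, hits=6)
print(f"p = {p_value:.3g}")
```

In practice the test would be repeated for every pathway, so the resulting p-values would normally need a multiple-testing correction before interpretation.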

Experimental Section
We used the previously curated human EGFR pathway, which included a number of annotations for EGFR and downstream signaling by SHC1, GRB2, PLCG1 and CBL, as a template from which to extend the EGFR pathway, and imported this dataset into the Reactome Curator Tool [11]. Briefly, the Curator Tool provides Reactome curators with all the necessary tools to access the Reactome database and annotate data in agreement with the Reactome data model. Curators identified research articles and reviews in PubMed that were relevant to the annotation of the cancer-perturbed EGFR, FGFR and PI3K/AKT pathways. Once publications had been reviewed, a list of cancer-related proteins, small molecules and macromolecular complexes was prepared. Additional queries were performed in UniProt and ChEBI to identify the reference entity proteins and small molecules, respectively, that would be used to construct the reactions of the cancer-perturbed EGFR pathway. Additional attributes of each reaction were also captured, for example details of the input and output entities, the catalytic or regulatory proteins, the cellular locations of the reactants, a textual summation describing the reaction, and the supporting literature references. The Disease Ontology terms that match literature references and COSMIC records for annotated cancer variants were assigned as disease attributes to physical entities and events involving these mutant proteins. Oncogenic overexpression of proteins as a consequence of gene amplification is usually not explicitly shown in pathway diagrams, but is captured in text summations that accompany cancer pathways.

Conclusions
Reactome is a highly reliable, curated database of biological pathways. Through our website, we provide access to pathway and network data analysis tools for visualizing pathway data and interpreting experimental data sets. All Reactome data and software are openly available with no licensing required.
In view of the potential applicability of pathway and network analyses to identify and characterize novel cancer targets, Reactome has integrated and expanded the pathway gene product-function annotation and pathway curation to promote comprehensive and effective characterization of cancer targets, their related relationships and pathways. Our curation efforts thus far have focused on the EGFR pathway (including the EGFR, ERBB2, ERBB3, ERBB4 receptors), FGFR and PI3K-AKT signaling and their downstream effector genes. Reactome curators will enhance our curation of other cancer-perturbed pathways, such as apoptosis, cell cycle checkpoints, and other signaling pathways, including BMP, PDGF, NOTCH, VEGF, WNT, Rho-GTPase, and TGF-beta. Furthermore, as the Ontario Institute for Cancer Research and its partners in the International Cancer Genome Consortium (ICGC) [79,80] sequence various tumor genomes, new cancer-related candidate pathways will be identified and curated into Reactome. Existing Reactome pathways are updated on a regular basis, and additional cancer variants and anti-cancer drugs implicated in EGFR, FGFR and PI3K/AKT pathways will be included as information on their function becomes available.
Reactome is not the only pathway database to curate pathway data relevant to cancer and disease. Cancer-perturbed signaling pathways can be found in KEGG, Panther, MetaCyc, and NCI-PID [81][82][83][84]. The Reactome data model, however, provides a more detailed framework for the curation of the knowledge relevant to cancer-related pathways, a visualization environment to display pathway data, and a suite of analysis tools for the interpretation of experimental cancer data sets.
A number of other bioinformatics databases such as Mouse Genome Informatics (MGI) [85] and Comparative Toxicogenomics Database (CTD) [86] have established disease curation pipelines, employing OMIM. OMIM is a detail-orientated database of disease annotation, widely used by the clinical community, but it lacks the structure and features of an ontology that would otherwise make it a perfect data source to systematically reference disease. Curation of human disease requires an establishment of a widely accessible and structured vocabulary (or ontology) that consists of knowledge that is familiar to Reactome's end user, flexible to future Reactome annotation updates, and open to semantic reasoning. One such ontology is the Disease Ontology. Reactome will continue to work with the research community to support the development and continuous improvement of human disease ontologies and will link out to the relevant cancer and disease-related databases, to advance our own annotation consistency. In future versions of Reactome, we may also cross-reference NCIt [68] directly for cancer-related physical entities and events. The Disease Ontology does provide NCIt identifiers when possible, but disease terms captured by the Disease Ontology and NCIt do not completely overlap. Cross-referencing different ontologies will make our disease annotations more comprehensive and stable. Since some amount of overlap exists between disease terms in any disease ontology, the overlap is reflected in our current annotation of disease attributes. This is not ideal and we are developing guidelines to standardize the use of disease terms in Reactome. As far as anti-cancer therapeutics are concerned, we do not capture their approval for clinical use other than in text summations, as this is outside the scope of Reactome project. 
However, cross-referencing a drug database such as PharmGKB [87] would provide Reactome users with easy access to clinically relevant drug information, and is currently under consideration.
We are working on further improvements to the Reactome pathway browser to produce more compact images and to be able to share one diagram between the wild-type pathway and several disease pathways with different etiologies. Furthermore, we are making additions to the Molecular Interaction Overlay to promote visual linkages between pathway entities and disease annotations, such as OMIM. Network-based methods have been used extensively in genomic and proteomic studies to analyze challenging and complex datasets. Reactome provides the Functional Interaction (FI) network plug-in for Cytoscape, which can identify network patterns related to diseases, including cancer [88]. Future expansion of the FI network with interactions based upon Reactome cancer-related pathways should significantly improve coverage, enhance the functionality of the analysis, and enrich the functional annotations supported by the FI network plug-in. Reactome will continue to develop novel and useful technologies for the querying, visualization and analysis of experimental datasets, in the context of not only normal but also disease pathways.