Predicting Deleterious Non-Synonymous Single Nucleotide Polymorphisms (nsSNPs) of HRAS Gene and In Silico Evaluation of Their Structural and Functional Consequences towards Diagnosis and Prognosis of Cancer

Simple Summary The HRAS gene has been reported to cause cancer, and identifying alleles that could potentially predispose one to cancer could lead to early diagnosis and better prognosis. Here for the first time, we conducted a machine-learning approach to identify high-risk predictive alleles of the HRAS gene. Our study reported alleles that may serve as potential targets for different proteomic studies, diagnoses, and therapeutic interventions focusing on cancer. Abstract The Harvey rat sarcoma (HRAS) proto-oncogene belongs to the RAS family and is one of the pathogenic genes that cause cancer. Deleterious nsSNPs might have adverse consequences at the protein level. This study aimed to investigate deleterious nsSNPs in the HRAS gene in predicting structural alterations associated with mutants that disrupt normal protein–protein interactions. Functional and structural analysis was employed in analyzing the HRAS nsSNPs. Putative post-translational modification sites and the changes in protein–protein interactions, which included a variety of signal cascades, were also investigated. Five different bioinformatics tools predicted 33 nsSNPs as “pathogenic” or “harmful”. Stability analysis predicted rs1554885139, rs770492627, rs1589792804, rs730880460, rs104894227, rs104894227, and rs121917759 as unstable. Protein–protein interaction analysis revealed that HRAS has a hub connecting three clusters consisting of 11 proteins, and changes in HRAS might cause signal cascades to dissociate. Furthermore, Kaplan–Meier bioinformatics analyses indicated that the HRAS gene deregulation affected the overall survival rate of patients with breast cancer, leading to prognostic significance. Thus, based on these analyses, our study suggests that the reported nsSNPs of HRAS may serve as potential targets for different proteomic studies, diagnoses, and therapeutic interventions focusing on cancer.


Introduction
The HRas proto-oncogene GTPase (HRAS) is one of the pathogenic genes that cause cancer. The human HRAS gene is located on chromosome 11p15.5 and is responsible for cell division by regulating the cellular signaling pathway using a molecular switch mechanism [1]. HRAS is encoded by the RAS protein, which is commonly deregulated in human cancer [2]. RAS is considered 'undruggable' because the RAS protein lacks a druggable binding pocket [3]. Mutations in HRAS have been reported in rhabdomyosarcoma, salivary gland carcinoma, thyroid carcinoma, and bladder and mouth carcinoma [4][5][6][7].
Single nucleotide polymorphisms (SNPs) account for more than 90% of all nucleic acid sequence variations in humans. Non-synonymous single nucleotide polymorphisms (nsSNPs) cause amino acid substitution, which may affect protein function and lead to pathogenic phenotypes [8]. nsSNPs have also been reported to affect protein stability, which impacts protein function and further alters a protein's allosteric sites and its stability [9,10]. nsSNPs in genes are responsible for cell growth and could potentially lead to downregulation of the gene causing continuous proliferation, thus leading towards the formation of cancer cells and has been associated with various human diseases [11][12][13][14]. Furthermore, analyses of putative functional SNP in cancer genes have been reported to exhibit prominent improvement in human health in terms of personalized therapeutics [15].
Mutations in the HRAS gene have been reported to be common among different types of cancer; however, the structural and functional effect of the reported mutations remains vague. Understanding how the HRAS nsSNPs lead toward cancer could provide better insights into understanding the influencing pathways, which hold implications for cancer diagnosis and prognosis. Therefore, this study aims to investigate the nsSNPs of HRAS in understanding its pathogenesis for elucidating the diagnosis and prognosis of cancer using in silico analysis. The nsSNPs of the HRAS gene were extracted from the NCBI database and screened for high-risk pathogenicity using multiple bioinformatics software tools based on their high accuracy and frequency of use [16,17]. The various bioinformatics steps involved in this study are shown in Figure 1.

Retrieving nsSNPs
The HRAS nsSNPs were obtained from the NCBI dbSNP 2.0 (gene ID: 3265) database (National Center for Biological Information) (http://www.ncbi.nlm.nih.gov/ (accessed on 12 October 2020)). The amino acid and DNA sequences, SNP IDs, wild-type amino acids, amino acid positions, missense amino acids, and minor allele frequency (MAF) were also retrieved. A total of 180 nsSNPs were extracted from this database. The human HRAS protein structure was accessed from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) (ID: 4Q21).

Identifying the Damaging nsSNPs
Five different bioinformatics tools were used to predict the functional effects of nsS-NPs.

Verifying the High-Risk nsSNPs
PMut [https://mmb.irbbarcelona.org/PMut/ (accessed on 20 August 2021)] was used to verify the nsSNPs identified by the previous five tools. The prediction has a range of 0-1, where the range of 0-0.5 is considered neutral while that of 0.5-1 is considered disease [18].

Analyzing Protein Stability
I-Mutant3.0 [http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi (accessed on 28 October 2020)] was used to determine protein stability. Its predictions are based on two support vector machines (SVM), and it uses ProTherm, which is now the most comprehensive database of thermodynamic experimental data on protein stability when mutated (calculated as free energy change value DDG) [19]. It also predicts the reliability index (RI) of the results ranging from 0-10, where 10 is the highest reliability. Other than inputting the protein sequence and the mutation sites, the temperature and pH setting remained the same (25 • C, pH 7).

Analyzing Protein Evolutionary Conservation
ConSurf [https://consurf.tau.ac.il/ (accessed on 28 October 2020)] was used to estimate the evolutionary conservation of amino acid residues in a protein based on phylogenetic relations between homologous sequences [20]. In this study, 50 homologous sequences were used to estimate the conservation score of each residue of HRAS protein.

3D Protein Modeling
The 3D models for wild-type HRAS protein and its mutants were generated using Phyre2 [http://www.sbg.bio.ic.ac.uk/~phyre2/html/page.cgi?id=index (accessed on 28 October 2020)] and I-TASSER (Iterative Threading ASSEmbly Refinement) [https: //zhanglab.ccmb.med.umich.edu/I-TASSER/ (accessed on 5 November 2020)]. PDB formats for wild-type HRAS and its mutants were obtained through Phyre2 by substituting each nsSNP into the HRAS protein sequence [21]. Then, TM-align [https://zhanglab.ccmb. med.umich.edu/TM-align/ (accessed on 28 October 2020)] was employed to compare wild-type with mutant protein structures. This algorithm computes a template modeling score (TM-score) and the root mean square deviation (RMSD) along with the superposition of the structures. The TM-score gives values between 0 and 1, where 0.00 < TM score < 0.30 means random structural similarity, 0.50 < TM score < 1.00 means in about the same fold. While higher RMSD indicates greater variation between wild-type and mutant structures [22,23]. Finally, the proteins were subjected to I-TASSER, which predicts the top five highest possible protein models for the sequence. All the models generated were verified with ERRAT [https://servicesn.mbi.ucla.edu/ERRAT/ (accessed on 5 November 2020)], and the resulting structures were viewed using Chimera 1.15.

Predicting Post-Translational Modification (PTM) Sites
The putative methylation sites at lysine and arginine residues in the HRAS protein sequence were predicted using MusiteDeep [https://www.musite.net/ (accessed on 20 August 2021)] and GPS-MSP 1.0 [http://msp.biocuckoo.org/online.php (accessed on 20 August 2021)]. A local database from UnitProtKB/Swiss-Prot was always provided and updated in MusiteDeep. The known PTM sites were shown on the sequence according to the confidence threshold [8]. GPS-MSP 1.0 was used to predict the potential of methylation sites, and it only showed sites that were higher than the cutoff point [24]. Phosphorylation sites in the HRAS protein at serine, threonine, and tyrosine residues were predicted using NetPhos 3.1 [https://services.healthtech.dtu.dk/service.php?NetPhos-3.1 (accessed on 20 August 2021)] and GPS 5.0 [http://gps.biocuckoo.org/online.php (accessed on 20 August 2021)]. NetPhos 3.1 uses ensembles of neural networks to complete this task, and residues with scores >0.5 the residue was considered phosphorylated [25]. Likewise, in GPS 5.0, the positions with scores higher than the cutoff point were listed by the program [26]. Putative protein ubiquitylation sites at lysine residues were predicted by BDM-PUB [http://bdmpub.biocuckoo.org/prediction.php (accessed on 20 August 2021)] and iUbiq-Lys [http://www.jci-bioinfo.cn/iUbiq-Lys (accessed on 20 August 2021)]. In BDM-PUB, the Bayesian Discrimination Method (BDM) was used to determine the probability score [27]. In iUbiq-Lys, SVM was used as a vector for calculation [28].

Predicting Protein-Protein Interactions by Search Tool for the Retrieval of Interacting Proteins (STRING)
STRING (https://string-db.org/ (accessed on 3 December 2020)]) is a database that calculates protein-protein interactions [29]. The database contains data from empirical evidence, computational prediction tools, and collections of universal text. The interaction of HRAs proteins with other proteins was determined using STRING.

PolymiRTS Database 3.0
PolymiRTS (https://compbio.uthsc.edu/miRSNP/ (accessed on 3 December 2020)]) analyses SNPs and INDELs in microRNA, which may affect the miRNA-mRNA interactions resulting in altered expression of protein [30]. The effects of the variants are classified as "D" (the derived allele disrupts a conserved miRNA site), "N" (the derived allele disrupts a nonconserved miRNA site), "C" (the derived allele generates a new miRNA site), and "O" (the derived allele creates a new miRNA site). The ancestral allele cannot be determined with "D" and "C" groups indicating functional impacts.

Kaplan-Meier Plotter Analysis (KM Plotter)
The Kaplan-Meier plotter evaluates the impact of 54,000 genes on survival in 21 types of cancer using meta-analysis-based detection and validation of biomarkers for cancer [31]. The overall survival (OS) is the period of time from the start of changes in gene expression level until the time a patient diagnosed with it is still alive. A p-value less than 0.05 will be considered statistically significant.

Molecular Dynamics Simulation
The structural stability of the mutants was simulated in a temporal manner using molecular dynamics (MD) simulation for wild-type and each mutant structure using GROMACS version 2020 [32,33]. The MD simulation was powered by the CHARMM36 force field and CHARMM-modified TIP3P water model [34]. The systems were solvated with water contained in a dodecahedron box with borders a minimum of 1.0 Å from the protein structure. The system was deionized with default cations and anions to achieve electrostatic neutralization. To prepare the system for the MD environment, the system was minimized using the steepest descent algorithm until energy convergence. Then, NVT and NPT simulations for 100 picoseconds each to equilibrate the system at the proper starting temperature, pressure, and density were carried out. Finally, the system proceeded with the production of MD simulations for 10 nanoseconds (ns) to observe protein structure dynamics over time.
For analysis, the first and last frame of the protein throughout the production of the MD simulation, as well as the computed stability metrics such as root mean square deviation (RMSD), root mean square fluctuation (RMSF) per residue, and radius of gyration (Rg) were extracted.

nsSNPs Retrieved from dbSNP Database
nsSNPs were retrieved from the dbSNP database as it is the more extensive SNP database [23]. A total of 2299 HRAS SNPs were extracted, of which 180 were nsSNPs located at the intronic region (28 nsSNPs) and exonic region (152 nsSNPs).

Deleterious nsSNPs Identified in HRAS Gene
PROVEAN, SIFT, PolyPhen-2, SNPs&GO, and PhD-SNP were used to predict and identify the most deleterious nsSNPs. A total of 33 nsSNPs were predicted as "pathogenic" or "harmful" by all five tools, hence they were high-risk nsSNPs (Table 1).

Verification of 33 HRAS High-Risk nsSNPs by PMut and I-Mutant
The 33 high-risk nsSNPs were further verified using PMut. Table 2 shows the prediction of the 33 nsSNPs by PMut for verification, where all of them were predicted to be "disease", indicating that it is damaging. I-Mutant 3.0 was used to predict protein stability after each mutation. The effect on protein stability, reliability index (RI), and free energy change value (DDG) were predicted ( Table 2). A decrease in stability was observed for 30 nsSNPs, of which V14G, I36T, F90S, L113P, and L133P showed a DDG value lower than −0.5, indicating a larger impact on the protein.
A total of eight: 38H, T58P, G60S, G60V, R73C, K117R, E143Q, and A146V nsSNPs were predicted to be highly conserved and exposed with decreased protein stability. These were confirmed as the most damaging and selected for comparative modeling.

Comparative Modeling of Wild-Type HRAS and Its Mutants
To determine whether the eight high-risk nsSNPs alter the wild-type structure of HRAS protein, Phyre2 was used to generate 3D structures of the wild-type protein and its eight mutants. The c5c2kA template was used for predicting the 3D models. I-TASSER was used to generate the 3D models for the wild-type HRAS protein and mutants. The common templates used were 6cuoA, 5wdrA, 1c1y, 6zioA, 5tarA, and 1ctqA. For each input, five models were generated by I-TASSER, which was verified by ERRAT, and the model with the highest possible C-score and highest possible ERRAT value was selected ( Table 4). The models were visualized with Chimera 1.15 ( Figure 2).

Post-Translational Modifications
Post-translational modifications (PTMs) are biochemical modifications of amino acids that extend the structures and change the properties and functions of the protein, regulating the structural confirmation of proteins, protein-protein interactions, and cellular signaling processes [35]. Listing a few of many PTMs, methylation (N-methylation) occurs mainly on lysine and arginine residues; phosphorylation on serine, threonine, and tyrosine residues; and ubiquitylation on lysine residues only. The eight high-risk nsSNPs identified in this study were further investigated to see if they have any effect on PTMs in HRAS protein.
The methylation predictors used were MusiteDeep and GPS-MSP 1.0. Only one residue (K5) was predicted by MusiteDeep (cutoff point = 0.5), while GPS-MSP predicted (at a very low cutoff point = 0) 11 lysine residues that can be methylated. The residue K5 was the only one in common between both tools ( Figure 2). The phosphorylation sites in HRAS protein were predicted by NetPhos 3.1 and GPS 5.0. NetPhos 3.1 predicted a total of 19 residues (Ser:7, Thr:7, Tyr:5) that had the potential of being phosphorylated, while GPS 5.0 predicted 31 residues (Ser:11, Thr:11, Tyr:9). The common sites predicted by both tools are shown in Figure 3.  predicted (at a very low cutoff point = 0) 11 lysine residues that can be methylated. The residue K5 was the only one in common between both tools (Figure 2). The phosphorylation sites in HRAS protein were predicted by NetPhos 3.1 and GPS 5.0. NetPhos 3.1 predicted a total of 19 residues (Ser:7, Thr:7, Tyr:5) that had the potential of being phosphorylated, while GPS 5.0 predicted 31 residues (Ser:11, Thr:11, Tyr:9). The common sites predicted by both tools are shown in Figure 3.

Protein-Protein Interaction Analysis
Protein-protein interactions analysis was carried out to determine if the nsSNPs interact with other proteins, thus altering phenotypic effects. The interaction analysis showed that HRAS is related to RAS1, PIK3CA, RASA1, BRAF, NF1, RALGDS, SOS1, ARAF, RIN1, and PIK3CG gene (Figure 4).

Protein-Protein Interaction Analysis
Protein-protein interactions analysis was carried out to determine if the nsSNPs interact with other proteins, thus altering phenotypic effects. The interaction analysis showed that HRAS is related to RAS1, PIK3CA, RASA1, BRAF, NF1, RALGDS, SOS1, ARAF, RIN1, and PIK3CG gene (Figure 4).

Prediction of nsSNPs within 3′ UTR
PolymiRTS database was used for the prediction of nsSNPs within 3′ UTR of the HRAS gene. nsSNP within the 3′ UTR region may disrupt and/or create miRNA target sites. A total of three: rs142218590, rs151229168, and rs140060409 functional SNPs were

Prediction of nsSNPs within 3 UTR
PolymiRTS database was used for the prediction of nsSNPs within 3 UTR of the HRAS gene. nsSNP within the 3 UTR region may disrupt and/or create miRNA target sites. A total of three: rs142218590, rs151229168, and rs140060409 functional SNPs were predicted to affect the miRNA target sites, further creating new miRNA binding sites (Table 4).

Expression Levels of HRAS on Overall Survival (OS) in Patients with Cancers
Kaplan-Meier plotter was used to determine the prognostic value of HRAS gene expression for breast, ovarian, lung, and gastric cancers by combining gene expression and cancer patient survival. Hazard ratio (HR) with 95% confidence intervals (CI) and log-rank p-value were calculated. HRAS gene showed a hazard ratio (HR) = 9.53 (95% CI: 1.3-69.88) and log-rank p-value = 0.006 for breast cancer; indicating that the result was statistically significant (the relation between the high expression of HLA-G gene and greater survival rate). However, no significant differences were observed for bladder carcinoma (p = 0.6591), cervical squamous cell carcinoma (p = 1), colon adenocarcinoma (p = 0.4552), cutaneous melanoma (p = 0.3256), head-neck squamous cell carcinoma (p = 0.5036), kidney renal clear cell carcinoma (p = 1), kidney renal papillary cell carcinoma (p = 1), lung adenocarcinoma (p = 9474), and ovarian cancer (p = 1). This indicates that HRAS deregulation can serve as a prognostic marker for patients' breast cancer.

Visualization and Analysis of MD Simulation
RMSD, Rg, and RMSF per residue distributions using a Python library and Matplotlib were visualized and plotted [36]. The RMSF of each residue for each model via the ChimeraX defattr method was also visualized [37].

Discussion
The HRAS gene has been annotated in the cell proliferation of several carcinomas; however, the in silico analysis understanding of the structural and functional effect of deleterious nsSNPs has remained uncharacterized. The HRAS gene is reported to contribute to germline mutations that activate RAS/MAPK signaling to lead toward "RASopathies" [38]. HRAS is also reported as a frequently mutated gene in cancers, subsequently reported as an effective RAS inhibitor [39]. Therefore, any changes in the HRAS protein at functional and structural levels may alter its bio-molecular interactions. This study determined the impact of deleterious HRAS nsSNPs at molecular, functional, and structural levels using computational analysis to identify the most harmful nsSNPs and their impact on RASSF5 protein.
Damaging nsSNPs were predicted using five different tools (PROVEAN, SIFT, PolyPhen-2, SNPs&GO, and PhD-SNP), resulting in 33 nsSNPs predicted as "highly damaging". In order to further narrow down the number of possible pathogenic nsSNPs, PMut, I-Mutant, and ConSurf tools were used to predict protein stability, the evolutionary conservation of amino acids, the physical and chemical properties, and the changes in protein structure after mutations. This resulted in eight "high-risk" nsSNPs: rs750680771, rs770492627, rs1589792804, rs730880460, rs749674880, rs104894227, rs909222512, and rs121917759, which were predicted (i) pathogenic by all five predicting tools; (ii) reduced protein stability; and (iii) evolutionary conservation showed these nsSNPs as highly conserved. This indicated that these nsSNPs with altered protein stability could cause misfolding, degradation, or aberrant conglomeration of proteins [40]. Moreover, we also found that these highly deleterious nsSNPs with high conservation scores could increase the risk of tumorigenesis by inactivating HRAS.
PTMs are essential in regulating the structures and functions of proteins and are involved in protein-protein interactions and cell signaling [41,42]. The eight "high-risk" nsSNPs were further investigated to determine their effect on PTM in the HRAS gene. The phosphorylation of HRAS at Y4C, T58I, and T58P was predicted to cause functional impairment leading to destabilization of the protein, eventually enhancing the harms of PTM impairment ( Figure 3). As PTMs can modify the functions and regulate the expression of a protein, mutations in PTM regions can cause the regulation mechanism of the protein to malfunction, which will lead to the formation of cancer cells. Compared with ActiveDriverDB (https://www.activedriverdb.org/ (accessed on 3 December 2020)), the phosphorylation sites in Figure 4 matched the information. In a study by Ting and colleagues (2015), the authors found that phosphorylation of Y137 could enhance the signaling capacity of the HRAS protein by changing the protein conformation and effector binding [43]. While no methylation sites were shown on the website, it did show a few ubiquitination sites. However, only one (K170) predicted by BDM-PUB matched it. We followed the results using ubiquitination predictors and considered that no ubiquitination sites were predicted. The polarity and hydrophobicity of these three nsSNPs were also determined as it contributes to the protein's structure and functionality. The prediction revealed that a polarity change was observed for T58I and T58P, indicating a change from neutral to hydrophobic and hydrophilic.
The structural consequences of these deleterious nsSNPs were predicted using the Phyre2 homology modeling tool. Six templates: 6cuoA, 5wdrA, 1c1y, 6zioA, 5tarA, and 1ctqA, were utilized to generate the wild-type and mutant protein models of the HRAS protein. Moreover, I-TASSER was used to calculate the C-score, which was verified by ERRAT. The model with the highest possible C-score and highest possible ERRAT value was selected. Based on these criteria, the eight nsSNPs were selected.
The protein interaction network of HRAS determined using the STRING tool showed a strong network (RAF1, P1K3CA, RASA1, BRAF, NF1, RALGDS, SOS1), and a negative regulation of the fibroblast apoptotic process. HRAS, an upstream activated protein, binds to RAF and MEK kinases and transduces intra and extracellular signals tyrosine receptor kinases (Trk) in the MAPK/ERK pathway. Therefore, nsSNPs in HRAS may inhibit the MAPK/ERK pathway causing the restoration of tumor cells to a non-transformed state and increasing the activation of the ERK/MAPK signaling pathway leading to the occurrence and development of tumors [44].
microRNAs have been associated with cancer-associated biological processes, such as proliferation, differentiation, apoptosis, metabolism, invasion, metastasis, and drug resistance. Furthermore, the pathological origin of cancer has also been proven to be directly related to the dysregulation of miRNAs [45]. The polymiRTS analysis in this study showed that the rs142218590, rs151229168, and rs140060409 functional SNPs were predicted to affect the miRNA target sites, further creating new miRNA binding sites. Therefore, this might influence the regulation of HRAS, leading to pathological conditions [46]. This study also evaluated the HRAS nsSNPs against different types of cancer using the Kaplan-Meier bioinformatics analyses. The results indicated that the HRAS gene deregulation might affect the overall survival rate of patients with breast cancer and thus affect prognosis significance. This finding is in agreement with a study by Bieche and colleagues (2021) that reported that HRAS-mutated AMEs could potentially be treated with MEK inhibitors [47]. Another study by Zhumakayeva and colleagues (2019) reported the clinical applicability of HRAS as a prognostic factor or to serve as a therapeutic target for breast cancer treatment [48].
Lastly, we performed 10 ns MD simulations for the wild-type HRAS and each mutant structure to simulate the structural stability of the mutant over time. All models were relatively stabilized without big structural fluctuations, as illustrated by the RMSD and RMSF scores of less than 1 nanometer (nm) ( Figure 5). As expected, we observed relatively larger fluctuations at the C-terminus coiled structure unanimously for all models. Moreover, all models' compactness remained stabilized per their invariant Rg values. Notably, T58P and K117R mutants consistently exhibited higher RMSD values, indicating more structural instability than others; however, most of their structural components, except for the Cterminal coils, fluctuated negligibly with RMSF of less than 0.4 nm.
RMSF scores of less than 1 nanometer (nm) ( Figure 5). As expected, we observed relatively larger fluctuations at the C-terminus coiled structure unanimously for all models. Moreover, all models' compactness remained stabilized per their invariant Rg values. Notably, T58P and K117R mutants consistently exhibited higher RMSD values, indicating more structural instability than others; however, most of their structural components, except for the C-terminal coils, fluctuated negligibly with RMSF of less than 0.4 nm.

Conclusions
The HRAS protein plays a vital role as a tumor suppressor. This study reports three major nsSNPs in HRAS: Tyrosine to Cysteine at position 4 (rs764622691), Threonine to Isoleucine at position 58 (rs121917758), and Threonine to Proline at position 58 (rs770492627) as high-risk. Furthermore, nsSNPs deregulation was also reported to affect the prognosis of breast cancer. These nsSNPs can be strongly considered as key molecular biomarkers for the diagnosis and prognosis of cancer. Nonetheless, in vitro studies are needed to explore the effects of these polymorphisms on the structure and function of the protein.