ProTstab2 for Prediction of Protein Thermal Stabilities
Abstract
1. Introduction
2. Results
2.1. Choice of Algorithm
2.2. Feature Selection and Method Training
2.3. Performance of ProTstab2 Algorithm
2.4. Application of ProTstab2 to Proteome-Wide Predictions and Comparison of Stabilities of Human, Mouse, and Zebrafish Proteins
2.5. ProTstab2 Web Application
3. Discussion
4. Materials and Methods
4.1. Data Sets
4.2. Features
4.3. Algorithms
4.4. Feature Selection
4.5. Performance Assessment
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Species | OGT (°C) | ProTstab | Meltome Atlas | Total |
---|---|---|---|---|
Oleispira antarctica | 15 | 0 | 1352 | 1352 |
Caenorhabditis elegans | 20 | 0 | 3327 | 3327 |
Arabidopsis thaliana | 25 | 0 | 2489 | 2489 |
Danio rerio | 28 | 0 | 3362 | 3362 |
Drosophila melanogaster | 28 | 0 | 1681 | 1681 |
Bacillus subtilis | 30 | 0 | 1563 | 1563 |
Saccharomyces cerevisiae | 30 | 706 | 1949 | 2655 |
Homo sapiens | 37 | 984 | 5472 | 6456 |
Escherichia coli | 37 | 729 | 1830 | 2559 |
Mus musculus | 37 | 0 | 5800 | 5800 |
Geobacillus stearothermophilus | 55 | 0 | 776 | 776 |
Picrophilus torridus | 60 | 0 | 908 | 908 |
Thermus thermophilus | 70 | 1081 | 904 | 1985 |
Total | | 3500 | 31,413 | 34,913 |
Data set | ProTstab | Meltome Atlas | ProTstab2 |
---|---|---|---|
Blind test set | 299 | 3144 | 3443 |
Training set | 3201 | 28,269 | 31,470 |
Total | 3500 | 31,413 | 34,913 |
Measure | DT | RF | SVR | GBRT | XGBoost | LightGBM | MLP |
---|---|---|---|---|---|---|---|
PCC | 0.55 | 0.71 | 0.59 | 0.72 | 0.73 | 0.75 | 0.74 |
RMSE (°C) | 10.21 | 7.58 | 8.88 | 7.43 | 7.42 | 7.11 | 7.26 |
R2 | 0.09 | 0.50 | 0.31 | 0.52 | 0.52 | 0.56 | 0.54 |
MSE (°C²) | 104.22 | 57.46 | 78.88 | 55.15 | 55.07 | 50.50 | 52.87 |
MAE (°C) | 7.45 | 5.60 | 6.68 | 5.51 | 5.49 | 5.27 | 5.35 |
Running time (s) | 4076 | 23777 | 5258 | 27299 | 7977 | 673 | 7323 |
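
The five measures reported in the table above are standard regression scores. As a minimal sketch (not the authors' code), they can be computed from observed and predicted melting temperatures as shown below. The function name `regression_measures` and the toy temperatures are hypothetical, and R2 is assumed to be the coefficient of determination, 1 − SS_res/SS_tot. RMSE is the square root of MSE, so MSE carries squared units.

```python
import math


def regression_measures(observed, predicted):
    """Return PCC, RMSE, R2, MSE and MAE for two equal-length sequences.

    Assumes both inputs have non-zero variance; otherwise PCC and R2
    are undefined and a ZeroDivisionError is raised.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n

    # Sums of squares and cross-products around the means.
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))

    # Residual sum of squares between observation and prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mse = ss_res / n

    return {
        "PCC": cov / math.sqrt(ss_o * ss_p),
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - ss_res / ss_o,
        "MSE": mse,
        "MAE": sum(abs(o - p) for o, p in zip(observed, predicted)) / n,
    }


# Hypothetical melting temperatures in °C, for illustration only.
observed = [45.0, 52.0, 60.0, 71.0, 88.0]
predicted = [48.0, 50.0, 63.0, 69.0, 80.0]
measures = regression_measures(observed, predicted)
```

For these toy values the residuals are −3, 2, −3, 2 and 8 °C, which gives an MSE of 18 °C² and an MAE of 3.6 °C.
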
Selection method | RFE | RFE | RFE | RFE | RFE | RFE | RFE | RFE | RFE | RFECV |
---|---|---|---|---|---|---|---|---|---|---|
Number of features | 50 | 100 | 200 | 300 | 500 | 1000 | 2000 | 3000 | 6935 (all) | 1214 |
PCC | 0.750 | 0.757 | 0.758 | 0.758 | 0.757 | 0.755 | 0.748 | 0.752 | 0.749 | 0.755 |
RMSE (°C) | 7.083 | 7.021 | 6.991 | 6.992 | 7.001 | 7.027 | 7.114 | 7.062 | 7.104 | 7.032 |
R2 | 0.563 | 0.570 | 0.574 | 0.574 | 0.573 | 0.570 | 0.559 | 0.565 | 0.560 | 0.569 |
MSE (°C²) | 50.189 | 49.304 | 48.887 | 48.906 | 49.028 | 49.396 | 50.614 | 49.878 | 50.485 | 49.469 |
MAE (°C) | 5.282 | 5.228 | 5.197 | 5.196 | 5.204 | 5.217 | 5.277 | 5.240 | 5.271 | 5.221 |
Measure | ProTstab | ProTstab2 |
---|---|---|
PCC | 0.736 | 0.803 |
RMSE (°C) | 9.636 | 9.097 |
MSE (°C²) | 93.581 | 82.752 |
MAE (°C) | 8.158 | 6.934 |
R2 | −0.850 | 0.580 |
Measure | SCooP | ProTstab2 |
---|---|---|
PCC | 0.443 | 0.715 |
RMSE (°C) | 16.926 | 7.605 |
MSE (°C²) | 286.480 | 57.837 |
MAE (°C) | 13.867 | 5.682 |
R2 | −1.594 | 0.476 |
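
In the two comparison tables, ProTstab and SCooP have a positive PCC but a negative R2. These are compatible: PCC measures only linear association, whereas R2, if computed as 1 − SS_res/SS_tot (an assumption, since the tables do not state the formula), also penalizes systematic offsets and becomes negative when predictions are worse than always predicting the mean. The sketch below illustrates this with made-up temperatures. The helper names `pcc` and `r2` are hypothetical, and the example is not a reconstruction of either predictor's behavior.

```python
import math


def pcc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r2(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_o = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical melting temperatures in °C, for illustration only.
observed = [40.0, 50.0, 60.0, 70.0, 80.0]

# Predictions that rank every protein correctly but are all 20 °C too high.
biased = [o + 20.0 for o in observed]

pcc_value = pcc(observed, biased)  # perfect linear association: 1.0
r2_value = r2(observed, biased)    # SS_res = 2000, SS_tot = 1000: -1.0
```

A constant 20 °C offset leaves the correlation perfect but doubles the residual sum of squares relative to the total sum of squares, so R2 drops to −1.
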
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, Y.; Zhao, J.; Zeng, L.; Vihinen, M. ProTstab2 for Prediction of Protein Thermal Stabilities. Int. J. Mol. Sci. 2022, 23, 10798. https://doi.org/10.3390/ijms231810798