AoP-LSE: Antioxidant Proteins Classification Using Deep Latent Space Encoding of Sequence Features
Abstract
1. Introduction
2. Materials and Methods
2.1. Evaluation Parameters
2.2. Dataset
2.3. Features and Latent Space Encoding
2.4. Neural Network Architecture
2.5. Training Configurations
3. Results and Discussion
3.1. Ablation Study
3.1.1. Finding Best Latent-Space Encoding (LSE) Scheme
3.1.2. Finding Best-Configuration for DeepLSE Architecture
3.2. Comparison with the Contemporary Methods
3.3. Verification on Independent-Dataset of Antioxidant Proteins
4. Analysis of Deep Latent-Space Encoding
4.1. Comparison of Feature and Latent-Space Discrimination Capability
4.2. Comparison of Proposed DeepLSE and Conventional Auto-Encoder-Based Encoding Schemes
4.3. Analysis of Decoder and Residual Error
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Chauvin, J.P.R.; Griesser, M.; Pratt, D.A. The antioxidant activity of polysulfides: It’s radical! Chem. Sci. 2019, 10, 4999–5010.
- Sannasimuthu, A.; Arockiaraj, J. Intracellular free radical scavenging activity and protective role of mammalian cells by antioxidant peptide from thioredoxin disulfide reductase of Arthrospira platensis. J. Funct. Foods 2019, 61, 103513.
- Tang, J.; Fu, J.; Wang, Y.; Luo, Y.; Yang, Q.; Li, B.; Tu, G.; Hong, J.; Cui, X.; Chen, Y.; et al. Simultaneous improvement in the precision, accuracy, and robustness of label-free proteome quantification by optimizing data manipulation chains. Mol. Cell. Proteom. 2019, 18, 1683–1699.
- Grzesik, M.; Bartosz, G.; Stefaniuk, I.; Pichla, M.; Namieśnik, J.; Sadowska-Bartosz, I. Dietary antioxidants as a source of hydrogen peroxide. Food Chem. 2019, 278, 692–699.
- Feng, P.; Ding, H.; Lin, H.; Chen, W. AOD: The antioxidant protein database. Sci. Rep. 2017, 7, 1–4.
- Feng, P.M.; Lin, H.; Chen, W. Identification of antioxidants from sequence information using naive Bayes. Comput. Math. Methods Med. 2013, 2013, 567529.
- Feng, P.; Chen, W.; Lin, H. Identifying antioxidant proteins by using optimal dipeptide compositions. Interdiscip. Sci. Comput. Life Sci. 2016, 8, 186–191.
- Ståhle, L.; Wold, S. Analysis of variance (ANOVA). Chemom. Intell. Lab. Syst. 1989, 6, 259–272.
- Li, H.; Tian, S.; Li, Y.; Fang, Q.; Tan, R.; Pan, Y.; Huang, C.; Xu, Y.; Gao, X. Modern deep learning in bioinformatics. J. Mol. Cell Biol. 2020, 12, 823–827.
- Senior, A.W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Qin, C.; Žídek, A.; Nelson, A.W.; Bridgland, A.; et al. Improved protein structure prediction using potentials from deep learning. Nature 2020, 577, 706–710.
- Torrisi, M.; Pollastri, G.; Le, Q. Deep learning methods in protein structure prediction. Comput. Struct. Biotechnol. J. 2020, 18, 1301–1310.
- Park, S.; Khan, S.; Wahab, A. E3-targetPred: Prediction of E3-Target Proteins Using Deep Latent Space Encoding. arXiv 2020, arXiv:2007.12073.
- Usman, M.; Khan, S.; Lee, J.A. AFP-LSE: Antifreeze Proteins Prediction Using Latent Space Encoding of Composition of k-Spaced Amino Acid Pairs. Sci. Rep. 2020, 10, 1–13.
- Al-Saggaf, U.M.; Usman, M.; Naseem, I.; Moinuddin, M.; Jiman, A.A.; Alsaggaf, M.U.; Alshoubaki, H.K.; Khan, S. ECM-LSE: Prediction of Extracellular Matrix Proteins using Deep Latent Space Encoding of k-Spaced Amino Acid Pairs. Front. Bioeng. Biotechnol. 2021.
- Khan, S.; Naseem, I.; Togneri, R.; Bennamoun, M. Rafp-pred: Robust prediction of antifreeze proteins using localized analysis of n-peptide compositions. IEEE/ACM Trans. Comput. Biol. Bioinform. 2016, 15, 244–250.
- Naseem, I.; Khan, S.; Togneri, R.; Bennamoun, M. ECMSRC: A sparse learning approach for the prediction of extracellular matrix proteins. Curr. Bioinform. 2017, 12, 361–368.
- Usman, M.; Khan, S.; Park, S.; Wahab, A. AFP-SRC: Identification of Antifreeze Proteins Using Sparse Representation Classifier. Neural Comput. Appl. 2021.
- Mosharaf, M.P.; Hassan, M.M.; Ahmed, F.F.; Khatun, M.S.; Moni, M.A.; Mollah, M.N.H. Computational prediction of protein ubiquitination sites mapping on Arabidopsis thaliana. Comput. Biol. Chem. 2020, 85, 107238.
- Usman, M.; Lee, J.A. Afp-cksaap: Prediction of antifreeze proteins using composition of k-spaced amino acid pairs with deep neural network. In Proceedings of the 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE), Athens, Greece, 28–30 October 2019; pp. 38–43.
- Ju, Z.; Wang, S.Y. Prediction of lysine formylation sites using the composition of k-spaced amino acid pairs via Chou’s 5-steps rule and general pseudo components. Genomics 2020, 112, 859–866.
- Zhao, H.; Zheng, J.; Xu, J.; Deng, W. Fault diagnosis method based on principal component analysis and broad learning system. IEEE Access 2019, 7, 99263–99272.
- Yoon, Y.H.; Khan, S.; Huh, J.; Ye, J.C. Efficient b-mode ultrasound image reconstruction from sub-sampled rf data using deep learning. IEEE Trans. Med. Imaging 2018, 38, 325–336.
- Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 29 September 2021).
- The UniProt Consortium. UniProt: A worldwide hub of protein knowledge. Nucleic Acids Res. 2019, 47, D506–D515.
- Li, X.; Tang, Q.; Tang, H.; Chen, W. Identifying antioxidant proteins by combining multiple methods. Front. Bioeng. Biotechnol. 2020, 8, 858.
- Jolliffe, I.T. Principal components in regression analysis. In Principal Component Analysis; Springer: Berlin/Heidelberg, Germany, 1986; pp. 129–155.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2011, 2, 1–27.
- Fan, R.E.; Chang, K.W.; Hsieh, C.J.; Wang, X.R.; Lin, C.J. LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res. 2008, 9, 1871–1874.
- Khan, S.; Naseem, I.; Togneri, R.; Bennamoun, M. A novel adaptive kernel for the rbf neural networks. Circuits Syst. Signal Process. 2017, 36, 1639–1653.
- Rennie, J.D.; Shih, L.; Teevan, J.; Karger, D.R. Tackling the poor assumptions of naive bayes text classifiers. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 616–623.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Park, S.; Khan, S.; Moinuddin, M.; Al-Saggaf, U.M. GSSMD: A new standardized effect size measure to improve robustness and interpretability in biological applications. In Proceedings of the 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Seoul, Korea, 16–19 December 2020; pp. 1096–1099.
- Rodriguez-Molares, A.; Rindal, O.M.H.; D’hooge, J.; Måsøy, S.E.; Austeng, A.; Bell, M.A.L.; Torp, H. The generalized contrast-to-noise ratio: A formal definition for lesion detectability. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2019, 67, 745–759.
- Peyré, G.; Cuturi, M. Computational optimal transport: With applications to data science. Found. Trends Mach. Learn. 2019, 11, 355–607.
- Khan, S.; Huh, J.; Ye, J.C. Variational Formulation of Unsupervised Deep Learning for Ultrasound Image Artifact Removal. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2021, 68, 2086–2100.
Gap/LV | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|
Configuration (N) | 1 | 5 | 10 |
---|---|---|---|
Encode | 10-5-2 | 50-25-10 | 100-50-20 |
Decode | 2-5-10 | 10-25-50 | 20-50-100 |
Classifier | 2-2 | 10-10 | 20-20 |
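
In the configuration table above, every listed layer width is a fixed multiple of N. The sketch below reproduces the three columns from that pattern. The scaling rule (encoder 10N-5N-2N, decoder mirrored, classifier 2N-2N) is inferred from the three listed configurations only, and the helper names `deeplse_layer_widths` and `as_label` are hypothetical, not taken from the authors' code.

```python
def deeplse_layer_widths(n):
    """Layer widths for configuration multiplier n.

    Assumption: widths scale linearly with n, as the three listed
    configurations (N = 1, 5, 10) suggest.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    encoder = [10 * n, 5 * n, 2 * n]   # e.g. 10-5-2 for n = 1
    decoder = encoder[::-1]            # mirror of the encoder
    classifier = [2 * n, 2 * n]        # two layers at the latent width
    return {"encode": encoder, "decode": decoder, "classifier": classifier}


def as_label(widths):
    """Format a list of widths in the table's dash-separated style."""
    return "-".join(str(w) for w in widths)


# Print the three configurations in the table's notation.
for n in (1, 5, 10):
    cfg = deeplse_layer_widths(n)
    print(n, {name: as_label(w) for name, w in cfg.items()})
```

The returned lists could be passed to whichever framework builds the dense layers; that step is left out here because the activation functions and input dimension are not given in this table.
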
Metric/Configuration (N) | N = 1 (Train) | N = 1 (Test) | N = 5 (Train) | N = 5 (Test) | N = 10 (Train) | N = 10 (Test) |
---|---|---|---|---|---|---|
Youden’s Index | | | | | | |
MCC | | | | | | |
ROC-AUC | | | | | | |
PR-AUC | | | | | | |
MSE () | | | | | | |
Method | Accuracy | Sensitivity | Specificity | Precision | YI | BACC | MCC | F1 | |
---|---|---|---|---|---|---|---|---|---|
Naive Bayes [6] | 0.668 | 0.720 | 0.660 | 0.26 | 0.38 | 0.690 | 0.27 | 0.38 | 0.22 |
AODPred(SVM) [7] | 0.747 | 0.750 | 0.744 | 0.33 | 0.49 | 0.747 | 0.36 | 0.46 | 0.32 |
AoP-LSE (DL) | 0.824 | 0.674 | 0.849 | 0.43 | 0.52 | 0.762 | 0.43 | 0.52 | 0.42 |
UniProtKB ACC | NCBI Definition | AODPred | Vote9 | AoP-LSE |
---|---|---|---|---|
P9WQB7 | Alkyl hydroperoxide reductase C | ✓ | ✗ | ✓ |
P9WHH9 | Dihydrolipoyl dehydrogenase | ✗ | ✗ | ✓ |
P9WIS7 | Dihydrolipoyllysine-residue | ✗ | ✓ | ✓ |
P9WG35 | Thiol peroxidase | ✓ | ✗ | ✓ |
P9WGE9 | Superoxide dismutase | ✓ | ✗ | ✓ |
P9WQB5 | Alkyl hydroperoxide reductase | ✓ | ✗ | ✓ |
P9WIE3 | Alkyl hydroperoxide reductase | ✓ | ✗ | ✓ |
P0CU34 | Peroxiredoxin TSA1 | ✓ | ✗ | ✓ |
Q5ACV9 | Cell surface superoxide dismutase | ✗ | ✗ | ✓ |
P9WHH8 | Dihydrolipoyl dehydrogenase | ✗ | ✓ | ✓ |
P9WIE1 | Putative peroxiredoxin Rv2521 | ✗ | ✓ | ✓ |
P9WIS6 | Dihydrolipoyllysine-residue | ✗ | ✗ | ✓ |
P9WQB6 | Alkyl hydroperoxide reductase | ✓ | ✗ | ✓ |
P9WID9 | Putative peroxiredoxin Rv1608c | ✓ | ✗ | ✓ |
O17433 | Cys peroxiredoxin | ✓ | ✗ | ✗ |
P9WIE0 | Putative peroxiredoxin MT2597 | ✗ | ✗ | ✓ |
P9WID8 | Putative peroxiredoxin MT1643 | ✓ | ✗ | ✓ |
P9WGE8 | Superoxide dismutase [Cu-Zn] | ✓ | ✗ | ✓ |
C0HK70 | Superoxide dismutase | ✓ | ✗ | ✓ |
P9WQB4 | Alkyl hydroperoxide reductase AhpD | ✓ | ✗ | ✓ |
P9WG34 | Thiol peroxidase | ✓ | ✗ | ✓ |
P9WIE2 | Alkyl hydroperoxide reductase E | ✓ | ✗ | ✓ |
Method | Sensitivity | Specificity | Accuracy | BACC | MCC | F1 Score | YI |
---|---|---|---|---|---|---|---|
AE + MLP | 0.65 | 0.57 | 0.58 | 0.61 | 0.16 | 0.31 | 0.23 |
AE + SVM | 0.64 | 0.56 | 0.57 | 0.50 | 0.15 | 0.30 | 0.21 |
AE + NB | 0.78 | 0.36 | 0.42 | 0.57 | 0.11 | 0.28 | 0.15 |
Proposed DeepLSE | 0.67 | 0.84 | 0.82 | 0.76 | 0.43 | 0.52 | 0.52 |
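
The metrics in the table above follow standard definitions over a binary confusion matrix. The sketch below shows those definitions in one place; the function name `binary_metrics` and the example counts are hypothetical and are not the paper's data. It assumes each class has at least one sample and that at least one positive prediction is made, so no ratio divides by zero.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # true positive rate (recall)
    specificity = tn / (tn + fp)               # true negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    bacc = (sensitivity + specificity) / 2     # balanced accuracy
    yi = sensitivity + specificity - 1         # Youden's index
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "bacc": bacc,
        "yi": yi,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
example = binary_metrics(tp=40, fn=10, tn=80, fp=20)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because BACC and YI depend only on sensitivity and specificity, a row of the table can be cross-checked from those two columns alone, up to the rounding of the published values.
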
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Usman, M.; Khan, S.; Park, S.; Lee, J.-A. AoP-LSE: Antioxidant Proteins Classification Using Deep Latent Space Encoding of Sequence Features. Curr. Issues Mol. Biol. 2021, 43, 1489-1501. https://doi.org/10.3390/cimb43030105