
Improvement of Epitope Prediction Using Peptide Sequence Descriptors and Machine Learning

RNASA-IMEDIR, Computer Science Faculty, University of A Coruña, 15071 A Coruña, Spain
Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain
Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n, 15071 A Coruña, Spain
Unidad Profesional Interdisciplinaria de Biotecnología, National Polytechnic Institute (IPN), Ticoman, 07340 Mexico City, Mexico
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2019, 20(18), 4362;
Received: 31 July 2019 / Revised: 26 August 2019 / Accepted: 30 August 2019 / Published: 5 September 2019
(This article belongs to the Special Issue QSAR and Chemoinformatics Tools for Modeling)
In this work, we improved a previous model for predicting new B-cell epitopes in proteomes for vaccine design. The predicted epitope activity of a queried peptide is based on its own sequence and on a known reference epitope sequence measured under specific experimental conditions. The peptide sequences were transformed into molecular descriptors of sequence recurrence networks, and these descriptors were combined with the experimental conditions. The new models were generated using 709,100 instances of paired descriptors for query and reference peptide sequences. Using perturbations of the initial descriptors under sequence or assay conditions, 10 transformed features were used as inputs for seven machine learning methods. The best model was obtained with random forest classifiers, with an area under the receiver operating characteristic curve (AUROC) of 0.981 ± 0.0005 for the external validation series (five-fold cross-validation). The database included information about 83,683 peptide sequences, 1448 epitope organisms, 323 host organisms, 15 types of in vivo processes, 28 experimental techniques, and 505 adjuvant additives. The current model could improve the in silico prediction of epitopes for vaccine design. The script and results are available as a free repository.
Keywords: epitopes; machine learning; protein sequences; qualitative structure–activity relationships
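
The AUROC quoted in the abstract can be read as the probability that a randomly chosen epitope receives a higher classifier score than a randomly chosen non-epitope. The sketch below illustrates that definition only; it is not the authors' script. The function name `auroc` and the toy labels and scores are assumptions made up for this example and are unrelated to the published data set.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the AUROC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of positive/negative pairs in which the
    positive example has the higher score; tied pairs count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three positives and three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

toy_auroc = auroc(labels, scores)
print(round(toy_auroc, 3))  # 8 of 9 pairs are ranked correctly -> 0.889
```

In a five-fold cross-validation such as the one described in the abstract, a value like this would be computed once per held-out fold and then summarized as a mean and a spread.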

MDPI and ACS Style

Munteanu, C.R.; Gestal, M.; Martínez-Acevedo, Y.G.; Pedreira, N.; Pazos, A.; Dorado, J. Improvement of Epitope Prediction Using Peptide Sequence Descriptors and Machine Learning. Int. J. Mol. Sci. 2019, 20, 4362.

