International Journal of Molecular Sciences (IJMS)
  • Article
  • Open Access

21 November 2025

TruMPET: A New Method for Protein Secondary Structure Prediction Using Neural Networks Trained on Multiple Pre-Selected Physicochemical and Structural Features

Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Recent Research of Protein Structure Prediction and Design

Abstract

Protein structure prediction continues to pose multiple challenges, despite the progress made by machine learning (ML). While recent deep learning models have achieved strong performance using embeddings from protein language models, they often ignore non-canonical amino acids and rely heavily on sequence alignments or evolutionary profiles. Here, we present an improved approach for predicting protein secondary structure in terms of DSSP classes solely from amino acid sequences. We suggest that ML feature sets should be generated from statistically significant, mutually uncorrelated descriptors. The selection of statistically assessed descriptors, including predicted physicochemical parameters of non-canonical amino acids, is a key component of the proposed method. The statistical significance and influence of each suggested feature were assessed using a two-step Linear Discriminant Analysis, which evaluated both the statistical significance of each descriptor and its impact on model accuracy. We applied the set of the 109 most influential statistically significant descriptors as the input feature set for a two-layer Bi-LSTM network combined with ESM2 embeddings. Our method, TruMPET (Training upon Multiple Pre-selected Elements Technique), outperformed all other methods reported in the literature on non-redundant datasets (CB513: DSSP Q3 = 91.36% and Q8 = 85.41%; TEST2018: DSSP Q3 = 90.64% and Q8 = 84.17%).

1. Introduction

Proteins are biological macromolecules that constitute approximately 57% of a cell’s dry mass [1] and serve as the fundamental structural basis of life. Determining the precise structure of a protein typically involves crystallization followed by complex experimental procedures [2]. However, crystallization is not always feasible [3], and even when crystallization is successful, resolving the structure of the crystallized protein remains a non-trivial and often ambiguous process—particularly for large proteins [4]. Consequently, the prediction of a protein’s structure from its amino acid sequence has long been, and continues to be, one of the central challenges in modern biological science.
Given that this has been a central challenge in biophysics and structural biology for over six decades, and that it remains an area of active research, it is not feasible to provide a comprehensive review of all existing methods within the scope of this work. Interested readers are instead referred to reviews such as [5,6], which offer structured overviews of the methodologies that have been historically employed and continue to be developed for protein structure prediction. A detailed summary of the current state of the field can be found in [7].
At present, it may appear that protein secondary structure prediction (PSSP) has lost its relevance considering recent advances in tertiary structure prediction achieved by state-of-the-art machine learning approaches such as those in [8,9,10]. Nevertheless, accurate PSSP remains a cornerstone challenge in structural bioinformatics, with wide-ranging implications for protein function annotation, fold recognition, and structure-based design:
  • PSSP continues to be the foundation for understanding the tertiary structure of proteins [11].
  • PSSP and the subsequent study of the protein’s predicted secondary structure can improve the accuracy of tertiary structure predictions [12,13,14,15].
  • PSSP partitions amino acid sequence data into clearly recognizable patterns—helices, strands, and coils—which often correlate with functional domains [16,17].
  • PSSP remains important in cases when structural confidence is lower or for interpreting dynamic and disordered regions erroneously captured by tertiary structure prediction methods [18].
  • PSSP continues to play a crucial role in resolving protein functions and properties, as this structure is the basis for the formation of the tertiary structure [19].
  • PSSP can be a low-cost and efficient alternative to wet experiments, making it particularly valuable for large-scale proteomic studies and drug discovery applications where experimental protein structure recovery can be prohibitively expensive and time-consuming [11]. For the same applications, if homologous structural data are unavailable, PSSP modeling is the only way to enable large-scale screening in silico [20].
Recent advances in protein language models (e.g., ESM2 [8], ProtTrans [21]) and deep learning architectures have led to state-of-the-art results in secondary structure prediction, often bypassing manual feature engineering through the direct processing of amino acid sequences [22].
A possible approach to PSSP is to first predict the protein’s tertiary structure using AlphaFold [9] and subsequently derive its secondary structure with tools such as mkdssp [23]. It should be noted, however, that although AlphaFold demonstrates remarkable overall performance, it exhibits reduced accuracy for certain CASP14 targets, particularly in flexible protein regions and subunits of multiprotein complexes. According to the authors of the method, high prediction confidence (pLDDT > 90) was achieved in only 35.7% of cases, while confident predictions (pLDDT > 70) accounted for 58.7% of cases [24]. The limitations of AlphaFold2 and potential strategies to address them are discussed in detail in [25,26].
The considerations outlined above underscore the potential value of methods capable of predicting protein structures without relying on multiple sequence alignments, particularly for synthetic proteins or those lacking homologs with experimentally determined structures. Consequently, the development of diverse approaches for PSSP remains an active and relevant area of research, even in the post-AlphaFold era [27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42].
Unfortunately, the substantial majority of protein structure prediction methods do not include special treatment of non-canonical amino acids, which are either mapped to canonical amino acids or discarded altogether. However, the incorporation of non-canonical amino acids in proteins is not rare—the proteins in the Protein Data Bank [43] include more than 1000 distinct non-canonical amino acids. If a learning model relies on the physicochemical properties of amino acids and the proportion of non-canonical residues in a given protein is substantial (as with hydroxyproline, which is more prevalent than proline in collagen-like protein chains [44] and stabilizes their structure [45]), then the structural prediction will inevitably be incorrect, as these properties differ significantly from those of the canonical counterparts. To address this limitation, we integrated physicochemical properties from AAindexNC [46], a recent framework for estimating the physicochemical properties of non-canonical amino acids. This approach preserves the structural discrepancies between canonical and non-canonical amino acids that are otherwise lost.
Although recent advances in deep learning and protein language models (e.g., ESM2 [8], ProtTrans [21]) have minimized the reliance on manual feature engineering, physicochemical properties continue to be an important way to improve prediction accuracy and biological interpretability. Several studies have demonstrated that incorporating additional features based on physicochemical and/or structural properties such as hydrophobicity, charge, polarity, or flexibility can enhance neural network results [37,47]—particularly in challenging cases involving low-homology regions or structurally ambiguous fragments [48,49]. More recent works have confirmed that hybrid models combining pre-computed embeddings with biophysically grounded features can outperform purely sequence-based approaches, especially in downstream structural tasks [50,51,52,53]. Such descriptors can encode biochemical constraints that are not always captured by self-supervised training alone, and thus could be a complementary source of inductive bias for secondary and tertiary structure predictions [54].
The objective of this study was to generate a feature set that integrates all well-founded approaches to descriptor system design, as previously detailed in [55]. This involved the development of a set of statistically significant, mutually uncorrelated descriptors derived from the physicochemical and structural properties of amino acids, including non-canonical ones. To achieve the highest possible accuracy in secondary structure prediction, we utilized pre-computed ESM2 embeddings, followed by fine-tuning using a two-layer bidirectional LSTM neural network. Notably, our method does not rely on sequence alignment or homologous templates. In contrast to traditional profile-based pipelines, we employ a fully alignment-free strategy which implies that predictions are generated solely from structural element statistics, intrinsic physicochemical features, and contextualized amino acid embeddings.

2. Results

In this study, we achieved the highest protein secondary structure prediction accuracy reported in the literature for alignment-free methods. We obtained this result in a manner that is interpretable and scientifically explainable, contributing to a deeper understanding of diverse protein structures. This was accomplished through the development of an objective, scientifically grounded system comprising stringent pre-selection and subsequent statistical analysis of the descriptors that constitute the machine learning model.

2.1. Descriptor Pre-Selection Can Substantially Improve Prediction Quality

Predictors were selected in two stages: statistically significant and mutually non-correlated predictors were first identified using Stepwise Discriminant Analysis (SDA) [56,57] and then refined by Linear Discriminant Analysis (LDA) [58,59]. The stepwise approach to predictor selection is described in detail in [55], and its application in the context of protein structure prediction using neural networks is presented in [47].
Descriptor pre-selection yielded biologically meaningful results, with the top 109 descriptors (listed in Supplementary Table S1) being readily interpretable. These 109 descriptors, selected by the ‘greedy’ LDA [60] procedure, provided a substantially more precise model than the SDA [56,57] procedure: for the eight DSSP [61] classes, the Q8 accuracy reached 64.6%, compared to 50.5% for the best SDA model tested. This performance gap clearly justified the two-stage descriptor pre-selection strategy. We retained the top 109 LDA-substantiated descriptors because they capture virtually all (99.99%) of the predictive capacity of the feature set within the ‘greedy’ selection algorithm (including the 110th descriptor improved model accuracy by less than 0.005%). Figure 1 illustrates the improvement in PSSP accuracy as a function of the number of descriptors selected by LDA, as well as the individual contribution of each descriptor to the accuracy achieved by the complete feature set.
Figure 1. (A) Improvement in PSSP accuracy as a function of the number of descriptors selected by LDA; (B) individual contribution of each descriptor to the accuracy achieved by the complete feature set. In both graphs, the red and purple dots mark the top 21 and 66 descriptors, contributing ~95% and ~99% of the model’s predictive accuracy, respectively.
Analysis of the descriptors retained by LDA for subsequent modeling revealed that they cluster into three groups, with the top 21 descriptors contributing approximately 95% of the model’s predictive accuracy and the top 66 descriptors providing 99% (see Supplementary Table S1). These groups can be characterized as follows:
  • RMSD-based structural descriptors;
  • Physicochemical-based descriptors, describing specific periodicities of the protein secondary structure;
  • Physicochemical- and structure-based descriptors, capturing non-periodic properties that influence the formation of the protein backbone configuration.

2.1.1. RMSD-Based Structural Descriptors

To generate descriptors based on the statistics of conformational occurrence, we employed protein blocks (PBs) as defined by de Brevern [62,63,64]. We assessed the distribution of RMSD distances between fragments with identical sequences and each PB. From these distributions, probability-based descriptors can be generated through nonlinear transformations. For example, for the pentapeptide “PPPPP”, the RMSD distances to PB “m” (which exhibits a conformation close to the idealized α-helix) are, for all fragments from the training dataset, significantly higher than the dataset average, with relatively low variance. Based on this result, a straightforward descriptor can be proposed for α-helix classification: the pentapeptide “PPPPP” is definitely not an α-helix.
However, even in this case, a problem arises with the quantity of identical sequences available for statistical analysis. It is evident that for a pentapeptide (an amino acid sequence with a length of five residues), there are 20⁵ = 3.2 × 10⁶ possible combinations. Consequently, some sequence combinations are significantly underrepresented in the training dataset. This sparsity prevents us from acquiring reliable structural statistics for many sequence patterns.
To address this issue, we developed so-called reduced alphabets, in which certain amino acids are treated as equivalent. Using this definition, sequence fragments can be described as regular expressions, e.g., amino acids belonging to classes such as aliphatic amino acids (GAVLI), sulfur-containing amino acids (CM), aromatic amino acids (YWF), and charged amino acids can be considered identical within their class. These equivalence rules may vary depending on the position of the residue within the fragment, as central residues, for instance, often exert a stronger impact on the predicted structure. We systematically evaluated a wide range of reduced alphabets according to their influence on prediction performance, ultimately selecting a limited set. These reduced alphabets also differ in terms of the length of the fragment considered. In some cases, we successfully applied reduced alphabets for fragment lengths as large as 11 residues. All reduced alphabets employed in this study are provided in Supplement S2. References [65,66] are cited in the Supplementary Materials.
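To make the construction concrete, the sketch below applies one hypothetical position-independent reduced alphabet to a one-letter sequence. The groupings are the illustrative classes named above (aliphatic, sulfur-containing, aromatic, charged), not the exact alphabets of Supplement S2, and the single-character class labels are arbitrary.

```python
# Minimal sketch of a position-independent reduced alphabet.
# The groupings below are the illustrative classes named in the text;
# the alphabets actually used in the study are listed in Supplement S2.
REDUCED_CLASSES = {
    "a": "GAVLI",  # aliphatic
    "s": "CM",     # sulfur-containing
    "r": "YWF",    # aromatic
    "c": "DEKRH",  # charged (assumed grouping)
}

AA_TO_CLASS = {aa: label for label, members in REDUCED_CLASSES.items()
               for aa in members}

def reduce_sequence(seq: str) -> str:
    """Map a one-letter amino acid sequence onto the reduced alphabet;
    residues not covered by any class are kept unchanged."""
    return "".join(AA_TO_CLASS.get(aa, aa) for aa in seq.upper())

# Fragments that collapse to the same reduced string are pooled when
# collecting RMSD statistics, mitigating the 20^5 sparsity problem.
print(reduce_sequence("GAVMY"))  # -> "aaasr"
```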
Let us consider how a descriptor based on a reduced alphabet is generated, e.g., a descriptor derived from t-statistics (i.e., based on the comparison of two sample means). In this approach, one must evaluate the probability that the mean value of one sample exceeds that of another while taking into account the respective sample sizes. For this case, the first sample consists solely of fragments of a given sequence seq, whereas the second sample comprises all remaining fragments.
To illustrate this point, let us consider one of the sixteen protein blocks $PB_j$, $j = 1 \ldots 16$, and seq, a given sequence of length 5. Let us introduce the following notation:
  • $N_{occ}(seq)$—the number of occurrences of seq among the sequences with known structures (i.e., the training dataset);
  • $\bar{\mu}_j(seq)$—the mean distance between the structures with sequence seq and $PB_j$;
  • $\bar{\mu}_j$—the average distance between $PB_j$ and all pentapeptides in the training dataset;
  • $\sigma_j^2$—the sampling variance of distances to $PB_j$ over the training dataset;
  • $\sigma_j^2(seq)$—the variance for structures with sequence seq;
  • $N$—the size of the training dataset.
Then, the t-statistics-based descriptor can be written as

$$t_j(seq) = \frac{\bar{\mu}_j - \bar{\mu}_j(seq)}{s_j(seq)},\tag{1}$$

where

$$s_j(seq) = \sqrt{\frac{\sigma_j^2(seq)}{N_{occ}(seq)} + \frac{\sigma_j^2}{N}}.\tag{2}$$

Note that when $N/N_{occ}(seq) \gg 1$,

$$s_j(seq) \approx \sqrt{\frac{\sigma_j^2(seq)}{N_{occ}(seq)}} = \frac{\sigma_j(seq)}{\sqrt{N_{occ}(seq)}},$$

so (1) folds to

$$t_j(seq) \approx \frac{\bar{\mu}_j - \bar{\mu}_j(seq)}{\sigma_j(seq)}\sqrt{N_{occ}(seq)}.\tag{3}$$
Based on the value of the t-criterion $t_j(seq)$, we can assess the probability that, for the pentapeptide seq, protein block $PB_j$ is closer than the sample average.
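A minimal sketch of the computation of this descriptor from pre-collected statistics follows (Equations (1) and (2)); the variable names mirror the notation above, and the example values are invented for illustration.

```python
import math

def t_descriptor(mu_j: float, mu_j_seq: float,
                 var_j: float, var_j_seq: float,
                 n_total: int, n_occ: int) -> float:
    """t-statistics-based descriptor, Equations (1)-(2):
    compares the mean RMSD of fragments with a given sequence to PB_j
    against the dataset-wide mean RMSD to PB_j."""
    s = math.sqrt(var_j_seq / n_occ + var_j / n_total)
    return (mu_j - mu_j_seq) / s

# Invented example: fragments "PPPPP" lie much farther from PB "m"
# than average, so the descriptor is strongly negative ("not a helix").
print(t_descriptor(mu_j=2.1, mu_j_seq=3.4, var_j=0.6,
                   var_j_seq=0.1, n_total=10**6, n_occ=400))
```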
In addition to this type of descriptor, we also used more complex descriptors that assess the probability that a given amino acid sequence folds into a specific PB sequence. For example, using the notation [a, b, …, p] for the PBs, we successfully designed a descriptor that estimates the probability of occurrence of PB sequences such as ‘ddfmm’ or ‘ddddddd’ for a certain reduced alphabet. A complete description of all RMSD-based descriptors is provided in Supplement S2. The input data for this and all other RMSD-based descriptors are likewise the obtained values of $\bar{\mu}_j(seq)$, $\bar{\mu}_j$, $\sigma_j^2(seq)$, $\sigma_j^2$, and $N_{occ}(seq)$.
Table 1 presents the most statistically significant RMSD-based descriptors.
Table 1. Top 15 most statistically significant RMSD-based descriptors.

2.1.2. Descriptors Describing Periodicities of PSSs

Among the descriptors capturing periodicity in amino acid properties, we identified periods close to the canonical values of 3.6 and 3.0, consistent with the well-established periodic patterns of α-helices and 3₁₀-helices. As described in Section 4.5 (Descriptor Pre-Selection), a wide range of candidate periods and numbers of periods was scanned, yet only these values were retained from a very large pool of candidate features, with the sole exception of a period of 10.0. Importantly, the selected descriptors also possess clear physical interpretations. Notably, the exhaustive search over periodicity values (1.2–15.0 with a step of 0.1) did not yield any statistically justified descriptors corresponding to periodicities other than 3.6, 3.0, and 10.0, suggesting that other periodicities do not significantly influence the formation of protein secondary structure.
We should mention KARS160108 (average weighted degree) [67]—a physicochemical amino acid property from the AAindex [68] database—which reflects the average connectivity of amino acids in a residue–residue interaction network weighted by the strength of each interaction. This measure is linked to the compactness and local packing density of protein structures, properties that are known to correlate with secondary structure elements such as α-helices and β-strands. The relatively high rank of KARS160108 among the selected descriptors suggests that network-based connectivity metrics capture structural constraints that cannot be represented explicitly by simpler physicochemical scales. The occurrence of properties reflecting free energy in the most frequently observed elements of protein secondary structure (α-helices and β-strands) confirms that free energy exerts a prominent influence on secondary structure formation.
Table 2 summarizes the periodic sequence properties associated with secondary structure that were identified using the descriptor pre-selection procedure (Section 4.5).
Table 2. Statistically significant physicochemical properties from the AAindex database associated with specific periodicities of protein secondary structures.

2.1.3. Descriptors Capturing Non-Periodic PSS Properties

For non-periodic descriptors, we observed a similar prevalence of physicochemical properties among the most important features. Table 3 lists all such descriptors among the top 109 identified via the descriptor pre-selection procedure (Section 4.5). Notably, some descriptors capture the impact on the structure at a given position while being computed over fragments that do not include that position itself.
Table 3. Statistically significant non-periodic physicochemical and structural properties contributing to the protein backbone conformation.

2.1.4. Results Obtained by the Combined Feature Set

The combined feature set results confirm that the proposed descriptor selection approach not only preserves biologically relevant periodic signals but also effectively reduces the dimensionality of the descriptor space and enhances prediction accuracy without interpretability loss.
In particular, the descriptor selection procedure retained a descriptor derived from IUPred2A [72], which estimates the intrinsic structural disorder directly from amino acid sequences. This alignment-free metric reflects the probability of a residue being located in a disordered or flexible region, often associated with domain boundaries or linker segments. The inclusion of IUPred2A output among the top-ranked features indicates that disorder-related information provides complementary structural signals that are highly relevant for protein secondary structure recognition, even in the absence of homologous sequences.
Unlike features derived from multiple sequence alignments (MSAs), both KARS160108 and the IUPred2A-derived descriptors encode structural organization solely from primary amino acid sequence information. Their presence among the top-ranked features supports the conclusion that alignment-free approaches can capture structural constraints that significantly improve prediction accuracy for proteins lacking close homologs.
We evaluated the performance of PSSP using the following ESM2 [8] models: esm2_t33_650M_UR50D (1280-dimensional embeddings, 33 layers, 650 million parameters) and esm2_t36_3B_UR50D (2560-dimensional embeddings, 36 layers, 3 billion parameters). These embeddings were employed as inputs to two- and four-layer bidirectional LSTM networks. As expected, the prediction accuracy with ESM2 embeddings alone was notably lower than that achieved by combining ESM2 embeddings with the selected descriptors described above. Specifically, the highest Q8 accuracy reached 79.71% for the ESM2-only configuration, compared to 84.46% when the selected descriptors were incorporated. This performance gain demonstrates that the selected physicochemical, periodicity-based, and alignment-free structural descriptors provide information complementary to the pre-computed ESM2 embeddings, resulting in a substantial improvement in the accuracy of protein secondary structure predictions. Comprehensive benchmarking results for all tested configurations are provided in Supplemental Table S8 (two-layer bi-LSTM network) and Supplemental Table S9 (four-layer bi-LSTM network).
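For reproducibility, a minimal sketch of the embedding extraction and feature concatenation, using the public fair-esm package and the 650M model, is shown below; `descriptor_matrix` stands in for the per-residue matrix of selected descriptors and is a placeholder, not part of the released pipeline.

```python
import torch
import esm  # pip install fair-esm

# Load the 650M-parameter ESM2 model (1280-dimensional embeddings).
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()

data = [("chain_A", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")]
labels, seqs, tokens = batch_converter(data)

with torch.no_grad():
    out = model(tokens, repr_layers=[33])
# Strip the BOS/EOS tokens -> one 1280-dim vector per residue.
embeddings = out["representations"][33][0, 1:len(seqs[0]) + 1]

# Placeholder for the selected per-residue descriptors (e.g., 109 columns).
descriptor_matrix = torch.zeros(embeddings.shape[0], 109)
features = torch.cat([embeddings, descriptor_matrix], dim=-1)
print(features.shape)  # (L, 1280 + 109)
```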

2.2. Prediction Results

We report a significant performance advantage of our proposed method, TruMPET, over all protein secondary structure prediction approaches published at the time of submission—specifically, those that operate without evolutionary information or sequence alignments. For eight-class DSSP classification, TruMPET demonstrates a 10% improvement in accuracy over the nearest competing method, approaching the theoretical limit of eight-class PSSP [74]. While TruMPET also outperforms existing methods in three-class DSSP prediction, the margin of improvement is more modest.
The proposed TruMPET method comprises two models. The first, designated ‘LDA’, was defined by a single LDA analysis performed on a comprehensive non-redundant dataset (see Section 4.2 for details) and includes 109 descriptors (see Section 2.1, Section 2.1.1 and Section 2.1.2). The second, improved model, referred to as ‘mix’, contains 583 descriptors and was obtained by combining the results of several independent LDA analyses; the resulting feature set was subsequently verified for mutual non-correlation using SDA. A detailed compilation of the ‘LDA’ and ‘mix’ datasets is provided in Section 4.1.
To benchmark our approach and compare it with the other existing state-of-the-art methods for predicting protein secondary structures, we selected the following widely applied non-redundant datasets: CB513 [75], TS115 [76], TEST2018 [77], TEST2020-HQ [78], CASP 13-FM [79] and CASP14-FM [80]. A detailed description of each dataset can be found in Section 4.2. A comprehensive discussion of benchmarking datasets and their impact on the assessment of evaluated methods can be found in [81]. The results of benchmarking the TruMPET method on the selected test datasets are presented in Table 4.
Table 4. Results of benchmarking the TruMPET method on various test datasets.
To compare the performance of our method, we selected the following renowned and most recent state-of-the-art protein secondary structure prediction methods from the literature: SPIDER3-Single [82], MUFold-SS [83], ProteinUnet2 [84], SPOT-1D-Single [85], SPOT-1D-Profile and SPOT-1D-LM [78], MHTAPred-SS [27], ProtTrans [21], DML_SSembed [86], and our feature set MilchStruct, consisting mostly of structural features, previously presented in [47]. All of these methods, except for SPOT-1D-Profile, do not utilize sequence profiles or multiple sequence alignments (MSAs). To ensure a fair evaluation of TruMPET’s predictive performance against these methods, all protein chains from the benchmarking datasets were excluded from TruMPET’s training data (see Section 4.1). The results of the comparison are presented in Table 5.
Table 5. Results of benchmarking of various methods on eight DSSP classes.
As can be seen, for the Q8 classification tasks, our method significantly outperformed both renowned state-of-the-art and recent secondary structure prediction methods reported in the literature, including our previous method and feature set (the values shown in Table 5).
Despite the substantial improvement in prediction accuracy for the eight-class DSSP, numerous applications still require secondary structure prediction within the conventional three-class representation. Therefore, a comparison of the accuracy of different PSSP methods under the three-class scheme is appropriate; the results of this comparison are presented in Table 6.
Table 6. Results of benchmarking various methods on three DSSP classes.

3. Discussion

In this study, we present an approach that combines ESM2 [8] embeddings with a two-step descriptor selection and statistical testing procedure, followed by a two-layer bidirectional LSTM neural network.

3.1. Confusion Matrices and Their Analysis

A detailed examination of the normalized confusion matrices (Figure 2) highlights the strengths and weaknesses of the proposed alignment-free model. The complete confusion matrices both for ‘LDA’ and ‘mix’ models can be found in Supplemental Table S3.
Figure 2. Normalized confusion matrices for eight-class (A) and three-class (B) secondary structure prediction obtained by ‘LDA’ model. Each row is normalized by the total number of true labels in that class. High precision is achieved for α-helix (H) and extended strands (E), while β-bridges (B) show noticeable confusion with other categories.
The highest predictive performance is observed for the α-helix (H), a conformation largely determined by short-range interactions. Structurally related conformations such as the π-helix (I) and 3₁₀-helix (G) are frequently mislabeled as α-helix. Even hydrogen-bonded turns (T) are often confused with α- and 3₁₀-helices, as reflected in the off-diagonal elements of the matrix. Bend (S) is frequently misclassified as either a turn (T) or a coil (C), consistent with the flexible and poorly defined nature of these conformations. The coil (C) class itself encompasses a broad range of structurally ambiguous or irregular states, making it a frequent fallback prediction. Prediction quality decreases notably for extended strands (E) and particularly for β-bridges (B). Both of these β-structure elements are strongly influenced by long-range interactions—either within a single polypeptide chain or between separate chains. Since the model is fully alignment-free and does not incorporate structural context beyond local residue features, it cannot take such interactions into account.
Taken together, these observations suggest that while our approach is appropriate for predicting secondary structure elements driven by local interactions, it remains challenged by β-structures that require modeling of long-range residue contacts. This limitation has important implications for the design of future high-accuracy prediction approaches. The accuracy of β-structure prediction may be enhanced either by incorporating alignment-based information or by employing protein inter-residue contact prediction strategies, such as those applied in ESM2 [8].

3.2. Importance of Descriptor Selection for PSSP

As we previously noted in [47], the choice of descriptors plays a decisive role in achieving high prediction accuracy, whereas the neural network architecture primarily serves to unlock the potential embedded in the selected feature set. This conclusion is supported by the benchmarking results presented in this study; our method was compared to others that employ highly sophisticated neural architectures, such as those proposed in [27,31]. These methods utilize advanced designs, including ESM2 embeddings followed by Bi-LSTM layers—similar to the architecture employed in our approach. Nevertheless, their performance in eight-class DSSP prediction is significantly lower than that of our method. These results suggest that future efforts in the field should prioritize the development of new, statistically grounded, mutually uncorrelated descriptor sets, rather than focusing solely on refining already complex neural network architectures.
In particular, we confirmed that careful selection of descriptors is the primary determinant of model quality in the performed study. In our experiments, changes to the hyperparameters of the bi-LSTM architecture had only a minor effect on predictive performance, whereas the removal of any single high-rank descriptor category caused a substantial accuracy loss.
This study is the first to integrate non-canonical amino acids (ncAAs) into a deep learning-based secondary structure prediction framework. We employed previously determined physicochemical properties for both canonical and non-canonical amino acids; for frequently occurring non-canonical residues, we applied a one-hot encoding approach alongside physicochemical and contextual embedding features. Although physicochemical properties cannot be predicted for the most common [46] ncAA, selenomethionine (MSE, (2S)-2-amino-4-methylselanyl-butanoic acid)—i.e., methionine in which the sulfur atom is replaced by selenium—we included MSE when compiling the list of potential descriptors by one-hot encoding: the 21 most frequently occurring non-canonical amino acids were added to the list of canonical amino acids.
The potential performance gain attributable to the integration of ncAAs into PSSP depends on the specific ncAA (some ncAAs are merely isomers of their canonical counterparts), the proportion of ncAAs within the protein (e.g., collagen and elastin contain a substantial fraction of hydroxyproline), and the extent to which the physicochemical and structural properties of the ncAA differ from its canonical counterpart. Accordingly, the incorporation of ncAAs in the prediction method can exert a significant impact on the PSSP accuracy for certain proteins. The PSSP results for a selection of proteins are presented in Table 7. The data demonstrate a notable enhancement in prediction accuracy when ncAAs are considered for these particular proteins.
Table 7. Accuracy gain for specific protein chains obtained by incorporating ncAA into TruMPET.
An important distinction of our method is its alignment-free design: predictions are based solely on intrinsic physicochemical properties of amino acids and statistical patterns of local backbone conformation captured by the protein block (PB) formalism. While PBs can successfully encode local geometry, they do not capture long-range spatial interactions. The confusion matrix patterns observed in our results reflect this limitation, with a strong performance for α-helices and related helix types, but a reduced accuracy for β-structures.
These results are consistent with the known limitations of alignment-free approaches, which cannot exploit the evolutionary context captured by multiple sequence alignment (MSA). MSA-based methods such as NetSurfP-2.0 [87], SPOT-1D [77], and DeepCNF [88] partially overcome this challenge by implicitly capturing long-range interaction patterns through evolutionary profiles.
Among the selected features, KARS160108 (average weighted degree) [67] and the IUPred2A-derived [72] intrinsic disorder score can be considered interpretable, alignment-free descriptors that capture structural constraints implicitly represented in ESM2 embeddings. KARS160108 reflects network-based residue connectivity and local packing density, whereas IUPred2A provides information on structural flexibility and domain boundaries. Their presence among the top-ranked descriptors, as well as the observed accuracy gain when combined with ESM2 embeddings, suggests that such interpretable descriptors offer complementary information to large protein language models. This synergy indicates that integrating alignment-free structural and physicochemical descriptors with pre-computed sequence-based language model embeddings is a promising strategy for enhancing PSSP, particularly for proteins with limited or no detectable homologs in structural databases.
When assessing three-class prediction accuracy, it is evident that our method accurately discriminates the H and E classes. This is expected, since in an RMSD-based approach the H and E classes show a distinct RMSD separation (≈3.5 Å for pentapeptides, see [89]). In other respects, the properties of the three-class confusion matrix can be regarded as a simplified version of the corresponding eight-class matrix. The observed misclassification between H and C arises from the fact that the C class includes structural motifs, such as turns, that are close to the H class by RMSD. A similar effect explains the leakage from E to C: the C class also comprises conformations close to the E class by RMSD (e.g., the polyproline-II helix).

3.3. Comparative Evaluation of PSSP Accuracy with AlphaFold 2

Given that the known limitations of AlphaFold [9,24], discussed in [26], may have only a limited impact on the protein secondary structure derived from AlphaFold’s predicted 3D structures, we performed a comparative assessment of AlphaFold 2 and TruMPET on Free Modeling targets from CASP14 [80], as well as on proteins from [26] for which consistent data from the PDB and AlphaFold are available. The results of this assessment are presented in Table 8.
Table 8. Comparative assessment of PSSP accuracy for selected protein chains predicted by AlphaFold2 and TruMPET ‘mix’ model.
As demonstrated in Table 8, advanced methods that leverage evolutionary information and MSAs (notably, AlphaFold 2) achieve higher Q8 accuracy when the entire target protein has a homolog with an experimentally resolved structure. The highest Q8 scores obtained by AlphaFold 2 correspond to cases in which it outperforms methods that do not use MSAs or evolutionary information, though the improvement is modest (typically one to two percentage points). In the intermediate Q8 range (0.6–0.8), alternative approaches frequently yield higher accuracy, whereas for Q8 values below 0.6, methods that do not rely on MSAs or evolutionary information generally perform better.
A thorough investigation of the predictions in Table 8 revealed that AlphaFold 2 often exhibits a substantial decline in prediction accuracy when the entire target protein sequence is homologous to a fragment of another protein that has a distinctly divergent experimentally resolved structure. This effect is particularly pronounced for small proteins, when the entire sequence of the small peptide is aligned to, and structurally interpreted as, a segment of a much larger homologous protein with a known structure. In such cases, AlphaFold 2 incorrectly predicts the structure of the small peptide as if it were part of the larger template. This limitation can be attributed to the fundamental design of AlphaFold, which relies heavily on MSAs and evolutionary relationships.
In contrast, alignment-free methods—specifically those that employ statistically validated, mutually uncorrelated structural and physicochemical descriptors (such as TruMPET)—are inherently less susceptible to this artefact. This comparative accuracy evaluation supports the necessity and relevance of the continued development of diverse alignment-free protein structure prediction approaches in the post-AlphaFold era.
Additionally, accurate alignment-free secondary structure predictions provide explicit and interpretable local structural and physicochemical descriptors that can be used as input features in alignment-free 3D structure prediction pipelines, as well as in other downstream modeling tasks.

3.4. A Promising Avenue for Improving PSSP Accuracy

The discussion in Section 3.2 suggests that further improvements in the PSSP accuracy may be constrained by the archaic categorization of local conformations. As already discussed, elements assigned to class C are structurally close to the E class in Q3 representation, while the I (π-helix) and B (β-bridge) classes in Q8 are rare. Despite the fact that DSSP has been the de facto standard for secondary structure annotation for decades, its capacity is inherently limited by the original hydrogen-bond-based definition. The underrepresentation of classes B and I in protein structures gives rise to a severe class imbalance, which complicates both learning model generation and training. Concurrently, DSSP aggregates a wide array of heterogeneous conformations into the single class C (coil). Consequently, DSSP-based annotations (Q3/Q8) provide only a coarse-grained approximation of the local backbone geometry.
To achieve more detailed and physically meaningful representations, extended annotation schemes are necessary. One such approach is Protein Blocks [64,65], which discretize the local conformations of pentapeptides into 16 structural states derived through clustering. Other structural classifications have also been proposed, differing not only in the number of discrete states but also in the fragment length used for discretization. Importantly, the choice of structural classification may depend on the specific context of the study, and its applicability to a given study (or even to a specific subtask) should be determined through numerical modeling.

3.5. Discussion of CASP FM Target Performance

Many methods exhibit a reduced predictive performance on FM categories in CASP. One plausible explanation is that this category includes proteins whose structures have been determined by NMR spectroscopy. It is well established that the structure of the same protein can differ substantially between NMR spectroscopy and X-ray crystallography [90]. Since our model, like many others, was trained primarily on X-ray-derived datasets, such discrepancies may contribute to performance degradation. Additional ambiguity arises from the necessity of selecting a single NMR model from the ensemble typically deposited in PDB/mmCIF files. Furthermore, in the case of X-ray data, some FM category targets in CASP correspond to structures resolved at a low resolution, which introduces additional imprecision into model evaluation.

3.6. Performance Gain Relative to the Previous Method

A comparison with our previous method [47] demonstrates a substantial improvement in the predictive accuracy achieved by the current method, TruMPET, with an increase of approximately 10% on the test datasets (Table 5 and Table 6). This enhancement can be attributed both to the implementation of a more sophisticated bi-LSTM neural network architecture and to the generation of the feature set through a substantially refined descriptor pre-selection procedure based on LDA.

4. Materials and Methods

4.1. Training Dataset Compilation

We retrieved a non-redundant set of protein chains from the PISCES server [91] using the following filtering criteria: sequence identity ≤ 40%, resolution ≤ 3.0 Å, sequence length between 40 and 10,000 residues, R-factor ≤ 0.3, and X-ray structures only. The dataset was generated on 7 July 2025, and initially contained 26,622 protein chains.
To prevent data leakage, we removed all chains present in the benchmarking datasets CB513 [75], TS115 [76], TEST2018 [77], and TEST2020-HQ [78], as well as targets from the free modeling category of the CASP13 [79], CASP14 [80], and CASP15 [92] contests used in the assessment.
We randomly split this dataset into training and internal test subsets in a 4:1 ratio; 21,297 chains were employed for training and 5325 for internal validation. The internal test set was employed for hyperparameter tuning. Both datasets are available in Supplemental Table S4.
Structures for these datasets were downloaded from PDB [43] in mmCIF format.

4.2. Description of Benchmarking Datasets

To evaluate the classification performance of our new method, we also benchmarked it on the following widely used datasets:
  • The CB513 [75] dataset, which remains a widely used benchmark designed specifically to evaluate the accuracy of secondary structure prediction methods. This dataset consists of 513 nonhomologous protein domains, accounting for 435 protein chains in total (some chains contain two or more domains). In this dataset, many protein chains are split into domains that are treated as separate targets. Our prediction method, however, accounts for the impact of the entire chain on every position. Indeed, the ESM2 embeddings we employ yield different representations for a fragment depending on the length of its parent chain. Therefore, we perform predictions on complete chains—that is why the “Number of Chains” in Table 4 is less than 513, although all CB513 segments are included in our performance evaluation.
  • The TS115 [76] dataset, which contains proteins released after 1 January 2016 whose structures were determined by X-ray crystallography at a resolution ≤ 3.0 Å. Additionally, sequences with an identity > 30% to those released before 2016 were removed. This dataset consists of 115 proteins.
  • The TEST2018 [77] dataset, which consists of 250 proteins deposited between January and July 2018 with a resolution < 2.5 Å and R-free < 0.25 that have sequence similarities of less than 25% to all pre-2018 proteins.
  • The TEST2020-HQ [78] dataset, which includes all proteins released between May 2018 and April 2020, with the removal of homologues to all proteins released before 2018 on PDB [43]. Proteins with lengths greater than 1024 were also removed. Further constraints of <2.5 Å and R-free < 0.25 resulted in 124 proteins.
  • The CASP13 [79], CASP14 [80], and CASP15 [92] datasets, which represent targets from the free modeling category—proteins for which no known homologues existed at the time of the contests. In all CASP datasets, targets from both the free modeling (FM) category and the mixed free modeling/template-based modeling (FM/TBM) category were included.

4.3. Secondary Structures and Sequences Extraction

Secondary structure assignment was performed with mkdssp v.4.3.1 [23], available at https://github.com/PDB-REDO/dssp (accessed on 19 November 2025), which implements the DSSP algorithm [61] and is distributed as part of the PDB-related databanks [93].
Since we employed physicochemical properties from AAindexNC [46] for non-canonical amino acids as descriptors, we needed to extract both one-letter and three-letter amino acid sequences from the mmCIF files. In particular, the one-letter sequences were required to generate ESM2 embeddings. One- and three-letter sequences were extracted using an in-house Python 3.10 script, although at the time of publication this could be achieved more easily with ProDy [94] v. 2.6.1, which supports convenient one- and three-letter sequence extraction from PDB/mmCIF files.

4.4. Feature Set Compilation

The construction of a comprehensive set of descriptors is a crucial and challenging step in the data preparation process.
As the primary sequence representation, we employed embeddings derived from the ESM-2 protein language model (650 M parameters, trained on the UniRef50 dataset) [8]. For each amino acid residue, a 1280-dimensional vector was extracted from the final hidden layer of the model. ESM-2 embeddings have been shown to capture both the local sequence context and long-range dependencies, thereby providing an informative and alignment-free representation suitable for secondary structure prediction.
To improve the predictions made by ESM-2, we developed two main types of additional descriptors, which were appended to the descriptors originating from ESM-2. The first type is based on the statistical occurrence of local structural fragments—more specifically, protein blocks—within the protein chain. The second type relies on the physicochemical properties of amino acids originating from the AAindex database [66] and its extension, the AAindexNC database [46]. The approach for generating both types of descriptors has been described in detail in our previous works [47,55]. Briefly, instead of using raw physicochemical values, we applied a set of complex transformations intended to formulate our hypotheses regarding the determinants of protein secondary structure as accurately as possible. One of the most illustrative examples of this transformation is the generation of descriptors that can distinguish between α-helices and 3₁₀-helices. These structural elements are known to be characterized by periodicities of approximately 3.6 and 3.0 residues, respectively. Such helices tend to align on the surface of a protein globule so that the hydrophobic side of the chain is directed inward, while the hydrophilic side is directed outward. This leads to a complex challenge involving the following:
  • Selecting an appropriate physicochemical (e.g., hydrophobicity scale) or structural property across the many available properties;
  • Quantitatively encoding the structural periodicity via a procedure that attenuates the impact of residues depending on their distance from the target position.
We addressed this using a specialized heuristic procedure, which is described in Section 4.5 Descriptor Pre-Selection. Specifically, we iteratively searched for the following:
  • The optimal hydrophobicity scale (selected from all AAindex + AAindexNC properties);
  • T—the periodicity described by this descriptor;
  • The relevant weighting function that quantifies the attenuation of the contribution to the descriptor’s value with increasing sequence distance (in residues).
The periodic descriptor that formalizes our approach to encoding the periodic properties of amino acid sequences can be written as

$$F(H_{-n},\ldots,H_0,\ldots,H_n) = \left(\sum_{k=-n}^{n} H_k \cos\!\left(\frac{2\pi k}{T}\right) f(k)\right)^{2} + \left(\sum_{k=-n}^{n} H_k \sin\!\left(\frac{2\pi k}{T}\right) f(k)\right)^{2},\tag{4}$$

where $H_k$ is the value of a given physicochemical property from the AAindex database at position $k$ relative to the target position; $T$ is the period of the expected structural repeat; and $f(k)$ is the Gaussian-like decay function that captures the attenuation of a residue’s impact on the descriptor’s value as its distance from the current position increases:

$$f(k) = \exp\!\left(-A\left(\frac{k}{n_T\,T}\right)^{2}\right),\tag{5}$$

with $2n + 1 = T \cdot n_T$: 2n + 1 is the size of the analyzed window within the protein chain, i.e., the number of amino acid residues considered, equal to the product of the period length $T$ and the number of periods $n_T$. Thus, to identify descriptors that reflect the periodicities inherent in the backbone structure, we systematically tested all reasonable combinations of the parameters $T$, $n_T$, and $A$ across the full set of 566 physicochemical properties available in the AAindex database. For example, to capture both short 3₁₀-helices and longer α-helices, as well as other potential implicit periodic patterns, we tested $T$ in the range of 1.2 to 15.0 with a step size of 0.1, $n_T$ from 2 to 9 with a step of 1, and $A$ over the values 0.5, 1, 2, and 3.
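A short sketch of Equations (4) and (5), as reconstructed above, is given below; the symmetric summation limits and the exact scaling of the decay function are our reading of the garbled formula and should be checked against the released scripts.

```python
import numpy as np

def periodic_descriptor(H: np.ndarray, T: float, A: float) -> float:
    """Periodic descriptor of Eqs. (4)-(5) as reconstructed here.
    H holds property values in a window of 2n+1 residues centered on
    the target position, indexed k = -n..n."""
    n = (len(H) - 1) // 2
    k = np.arange(-n, n + 1)
    n_T = len(H) / T                        # number of periods in the window
    f = np.exp(-A * (k / (n_T * T)) ** 2)   # Gaussian-like attenuation, Eq. (5)
    phase = 2.0 * np.pi * k / T
    cos_term = np.sum(H * np.cos(phase) * f)
    sin_term = np.sum(H * np.sin(phase) * f)
    return cos_term ** 2 + sin_term ** 2    # Eq. (4)

# A perfectly 3.6-periodic property pattern scores higher at T = 3.6
# than at the 3(10)-helix period of 3.0:
H = np.cos(2 * np.pi * np.arange(-9, 10) / 3.6)
print(periodic_descriptor(H, T=3.6, A=1.0) >
      periodic_descriptor(H, T=3.0, A=1.0))  # True
```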
To design descriptors that capture non-periodic physicochemical properties, we employed simpler features reflecting the aggregated physicochemical characteristics of sequence fragments. These descriptors were generated by summing the values of a given physicochemical property from the AAindex database across a predefined window of amino acid residues, using a Gaussian-like decay function similar to the one described earlier. We evaluated all properties from the AAindex database, and for each physicochemical property, we varied the start and end positions of the fragment relative to the target residue (position 0). For example, we tested whether the cumulative value of a property over positions −10 to −3 correlates with the observed conformation at position 0. A detailed description of all feature types is provided in Supplement S2.
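A corresponding sketch for the non-periodic descriptors follows; the window bounds are the example from the text (−10 to −3), and the attenuation scaling is an assumption.

```python
import numpy as np

def window_descriptor(H: np.ndarray, pos: int,
                      start: int = -10, end: int = -3,
                      A: float = 1.0) -> float:
    """Sum a per-residue property over the fragment [pos+start, pos+end],
    attenuated by a Gaussian-like decay with distance from `pos`
    (assumed scaling); out-of-chain positions are skipped."""
    scale = max(abs(start), abs(end))
    total = 0.0
    for off in range(start, end + 1):
        i = pos + off
        if 0 <= i < len(H):
            total += H[i] * np.exp(-A * (off / scale) ** 2)
    return total
```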
A similar approach was applied for the generation of descriptors based on the statistical occurrence of local structural fragments. In this context, local fragments refer to the set of 16 protein blocks defined by de Brevern et al. [62]. These 16 five-residue fragments form the basis of a generalized structural alphabet, providing a more flexible description of the local backbone conformation than traditional secondary structure classification. As shown in our earlier work [89], there exists a mutually unambiguous correspondence between protein blocks and Cartesian coordinates, enabling a compact and informative encoding of the local backbone geometry. The conformation of any pentapeptide can be represented by a vector of 16 RMSD values, each corresponding to a deviation from one of the 16 reference protein blocks. Feature generation is then based on statistical assessment of the hypothesized relationships between these values and the underlying amino acid sequence. As in the case of physicochemical properties, this process involves the combinatorial tuning of multiple parameters that define the characteristics/specificity of these relationships. The number of possible candidate descriptors is extremely large—especially in the present study, where we employed a much broader and more diverse set of parameterizations than in our previous works. To manage this complexity, we developed a dedicated optimization procedure, which is described in detail in Section 4.5 Descriptor Pre-Selection.

4.5. Descriptor Pre-Selection Procedure

As we previously described [55], a complete set of features must consist of statistically significant, mutually uncorrelated descriptors that reflect the fundamental principles that define the protein structure that will be predicted.
To optimize the set of input features, we employed a two-step selection approach. In the first step, we applied a fast Stepwise Discriminant Analysis (SDA) [56] to reduce the initial descriptor space while retaining a subset of mutually uncorrelated descriptors. In particular, we tuned the parameters of the descriptors defined by Equations (4) and (5), as well as other types of descriptors, including those derived from the AAindex database. This pre-selection step efficiently reduced the vast pool of potential descriptors by more than an order of magnitude (from ~10,000 to ~500) before more advanced modeling techniques were applied. The procedure not only reduced the computational resource requirements but, more importantly, also retained only statistically significant and mutually uncorrelated descriptors in the final feature set.
In the second step, we applied a more computationally intensive ‘greedy’ algorithm for Linear Discriminant Analysis (LDA) [60], implemented with parallel processing to enhance computational efficiency. This step was essential for boosting the classification accuracy from 50.5% to 64.6% (see Section 2). An additional motivation for introducing SDA prior to LDA was the substantial reduction in descriptors—from ~10,000 to ~500—which made the subsequent LDA calculations considerably more tractable. Together, this computational and accuracy gap clearly justified the two-stage pre-selection strategy for feature set generation.
At each step of the ‘greedy’ procedure, the descriptor that yielded the greatest improvement in accuracy was added to the feature set. Although not exhaustive, this strategy represents a ‘greedy’ approximation to best-subset selection [95], as it evaluates all remaining descriptors in the context of those already selected. This process continued until either no further improvement above a minimal threshold was observed or a predefined accuracy target was achieved. The result of this analysis is the ‘LDA’ model reported in Section 2.
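The sketch below illustrates the ‘greedy’ forward-selection loop with scikit-learn’s LDA; it is a simplified, serial stand-in for the parallel implementation released on GitHub, and scoring on the training set is a simplification.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def greedy_lda_selection(X: np.ndarray, y: np.ndarray,
                         max_features: int = 109,
                         min_gain: float = 5e-5):
    """Greedy forward selection: at each step add the descriptor that
    most improves LDA accuracy; stop when the gain drops below
    min_gain (cf. the <0.005% gain of the 110th descriptor)."""
    selected, best_acc = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        gains = []
        for j in remaining:
            cols = selected + [j]
            lda = LinearDiscriminantAnalysis().fit(X[:, cols], y)
            gains.append((lda.score(X[:, cols], y), j))
        acc, j_best = max(gains)
        if acc - best_acc < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_acc = acc
    return selected, best_acc
```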
The scripts that implement SDA and LDA, together with all descriptors (both initial and processed), are freely available on GitHub: https://github.com/Milchevskiy/TruMPET.2025 (accessed on 19 November 2025). A description of the software implementation and usage of the descriptor pre-selection procedure is provided in Supplement S4.
For further improvement, this procedure can be iteratively repeated, guided by various hypotheses regarding factors influencing protein secondary structure formation. The resulting descriptor sets are then merged, with exact duplicates removed, and subjected to an additional SDA analysis. This process yields the combined ‘mix’ model reported in Section 2. The full list of descriptors constituting the ‘mix‘ model is provided in Supplementary Table S6.

4.6. Neural Network Architecture

For the ‘LDA’ model, we employed a two-layer bidirectional LSTM (hidden unit size 512 per direction, with dropout initially set to 0.7 and gradually reduced to 0.0 once the validation accuracy plateaued), followed by a three-layer feed-forward classification head (2048→1024→512→9 with the ReLU activation function). The output layer comprises the standard eight DSSP classes and an additional ninth technical class (Ø), specifically introduced to handle regions of unresolved structure in protein chains. In such cases, no structural information is available at certain positions, yet periodic or long-range descriptors may still exert influence across these regions. This technical class does not represent a biological category of secondary structure and was not involved in model fitting: all such positions were labeled as ‘ignore’ in the loss function (CrossEntropyLoss with ignore_index) and were excluded from the calculation of all evaluation metrics. Thus, training and evaluation were performed strictly on the eight valid DSSP classes, while the ninth class served solely as a mask for missing data. To efficiently handle variable-length protein chains within a mini-batch, sequences were processed as PyTorch (version 2.9.1) PackedSequences, and padding was ignored in the loss and metrics via ignore_index. Training used the Adam optimizer (initial lr = 1 × 10⁻⁴, weight decay = 1 × 10⁻⁴), with a stepwise reduction in the learning rate down to 1 × 10⁻⁵ after dropout was annealed to 0, and early stopping (patience = 13) was applied. The residue-level accuracy and macro-F1, along with a confusion matrix, were evaluated. The best checkpoints were saved for both GPU and CPU, and all training logs and curves were automatically recorded to ensure reproducibility.
In addition, we performed a systematic exploration of hyperparameters, including the number of LSTM layers (2–4), hidden sizes (256–1024), dropout schedules, and learning rate decay strategies. We also tested different batch sizes and optimizer configurations. The experiments consistently demonstrated that deeper bidirectional architectures with adaptive dropout and learning rate schedules led to the best trade-off between predictive accuracy and training stability, justifying their use in the final model.
For the ‘mix’ model, the hyperparameters are identical to those of the ‘LDA’ model, except for the number of bi-LSTM layers (four) and the hidden unit size per direction (1024). Figure 3 presents a schematic representation of the neural network used in this study.
Figure 3. Neural network diagram: four-layer bidirectional LSTM (hidden size 1024 per direction, dropout initially set to 0.7 and gradually reduced to 0.0 as the validation accuracy plateaued) followed by a three-layer feed-forward classification head (2048→1024→512→9 with the ReLU activation function). The output layer includes the eight DSSP classes and an additional ninth technical class (Ø) introduced to handle regions of unresolved structure in protein chains.
The scripts that implement the training of both models are freely available on the following GitHub page: https://github.com/Milchevskiy/TruMPET.2025 (accessed on 19 November 2025). The description of the scripts for neural network training is provided in Supplement S7.

4.7. Prediction Quality Evaluation Metrics

In this work, we employed the most widely used measure for evaluating PSSP performance—the Q measure, defined as the percentage of correctly predicted residues. Originally formulated for three DSSP classes in [96], it can be written as
$$ Q_3 = \frac{\sum_{S \in \{\mathrm{C},\,\mathrm{H},\,\mathrm{E}\}} O^{+}(S)}{N} $$
where O+(S) is the number of correctly predicted residues in class S and N is the total number of residues in the query protein. Importantly, this measure is independent of the number of prediction classes and can therefore be applied consistently to three-class, eight-class, or even sixteen-class (protein block) prediction tasks.
The traditional F-measure, or balanced F1 score [97], is an additional ML evaluation metric that assesses a model’s predictive performance on a per-class basis, rather than providing an overall accuracy measure. It combines precision and recall through their harmonic mean, such that maximizing the F1 score requires simultaneously maximizing both precision and recall. The F1 score ranges from 0 (worst) to 1 (best) and can be expressed as:
$$ F_1 = \frac{2 \times \mathrm{TP}}{2 \times \mathrm{TP} + \mathrm{FP} + \mathrm{FN}} $$
where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives.
In addition, confusion matrices [98] were used to provide extended information about the interrelationships among the predicted secondary structure classes. In this study, each row corresponds to the true class, whereas each column corresponds to the predicted class. This allows for a detailed analysis of misclassifications, i.e., cases where the model confuses one structural class with another.
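For completeness, the sketch below computes all three quantities on a toy example; the label alphabet and the predictions are illustrative.

```python
# A minimal sketch of the evaluation metrics used here: residue-level Q
# accuracy, per-class F1, and a confusion matrix (rows: true classes,
# columns: predicted classes, matching the convention stated above).
import numpy as np
from sklearn.metrics import f1_score, confusion_matrix

def q_accuracy(y_true, y_pred):
    """Percentage of correctly predicted residues (Q3, Q8, ...)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(y_true == y_pred)

y_true = ["H", "H", "E", "C", "C", "H", "E", "E"]   # toy ground truth
y_pred = ["H", "H", "E", "C", "H", "H", "C", "E"]   # toy predictions

print(f"Q3 = {q_accuracy(y_true, y_pred):.2f}%")    # Q3 = 75.00%
print("per-class F1:", f1_score(y_true, y_pred, average=None,
                                labels=["C", "H", "E"]))
print(confusion_matrix(y_true, y_pred, labels=["C", "H", "E"]))
```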

5. Conclusions

The presented protein secondary structure prediction method, TruMPET, operates without any reliance on evolutionary information or structural data from homologous proteins and supports the processing of non-canonical amino acids. This framework is particularly advantageous for predicting the structure of proteins that lack homologs with experimentally determined structures or that contain substantial proportions of non-canonical amino acid residues affecting their structural or physicochemical properties.
The problem of predicting, rather than merely recognizing, protein secondary structure cannot be regarded as solved, despite the near-theoretical performance achieved by the most advanced methods. As demonstrated in this manuscript, the results obtained even by state-of-the-art language models such as ESM2 can be further improved, primarily by extending their standard embedding feature sets with physicochemical and structural descriptors that capture the fundamental principles underlying protein secondary structure formation. This enhancement is considerably more effective than increasing the architectural complexity of the neural network, which by itself yields little benefit.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms262311284/s1.

Author Contributions

Conceptualization, Y.V.M., Y.V.K. and G.I.K.; methodology, Y.V.M., G.I.K. and Y.V.K.; software, Y.V.M. and Y.V.K.; validation, Y.V.M.; statistical assessments, Y.V.M.; neural network development and testing, Y.V.M. and Y.V.K.; writing—original draft preparation, Y.V.M., G.I.K. and Y.V.K.; writing—review and editing, Y.V.M., G.I.K. and Y.V.K.; visualization, Y.V.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a grant from the Russian Science Foundation (No. 24-24-00493).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The source code for all scripts used in this work, the final learning model, and detailed descriptions of how to run the training process are available on GitHub: https://github.com/Milchevskiy/TruMPET.2025 (accessed on 19 November 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DSSP: Dictionary of Secondary Structure in Proteins
PSSP: Protein Secondary Structure Prediction
PSS: Protein Secondary Structure
SDA: Stepwise Discriminant Analysis
LDA: Linear Discriminant Analysis
ML: Machine Learning
ncAA: Non-canonical Amino Acid

References

  1. Freitas, R.A. Nanomedicine, Volume I: Basic Capabilities; CRC Press: Boca Raton, FL, USA, 1999. [Google Scholar]
  2. Stollar, E.J.; Smith, D.P. Uncovering protein structure. Essays Biochem. 2020, 64, 649–680, Correction in Essays Biochem. 2021, 65, 407. [Google Scholar] [CrossRef] [PubMed]
  3. Price, W.N., 2nd; Chen, Y.; Handelman, S.K.; Neely, H.; Manor, P.; Karlin, R.; Nair, R.; Liu, J.; Baran, M.; Everett, J.; et al. Understanding the physical properties that control protein crystallization by analysis of large-scale experimental data. Nat. Biotechnol. 2009, 27, 51–57. [Google Scholar] [CrossRef] [PubMed]
  4. Slabinski, L.; Jaroszewski, L.; Rodrigues, A.P.; Rychlewski, L.; Wilson, I.A.; Lesley, S.A.; Godzik, A. The challenge of protein structure determination: Lessons from structural genomics. Protein Sci. 2007, 16, 2472–2482. [Google Scholar] [CrossRef] [PubMed]
  5. Ismi, D.P.; Pulungan, R.; Afiahayati. Deep learning for protein secondary structure prediction: Pre and post-AlphaFold. Comput. Struct. Biotechnol. J. 2022, 20, 6271–6286. [Google Scholar] [CrossRef]
  6. Rennie, M.L.; Oliver, M.R. Emerging frontiers in protein structure prediction following the AlphaFold revolution. J. R. Soc. Interface 2025, 22, 20240886. [Google Scholar] [CrossRef]
  7. Huang, B.; Kong, L.; Wang, C.; Ju, F.; Zhang, Q.; Zhu, J.; Gong, T.; Zhang, H.; Yu, C.; Zheng, W.M.; et al. Protein structure prediction: Challenges, advances, and the shift of research paradigms. Genom. Proteom. Bioinform. 2023, 21, 913–925. [Google Scholar] [CrossRef]
  8. Lin, Z.; Akin, H.; Rao, R.; Hie, B.; Zhu, Z.; Lu, W.; Smetanin, N.; Verkuil, R.; Kabeli, O.; Shmueli, Y.; et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023, 379, 1123–1130. [Google Scholar] [CrossRef]
  9. Jumper, J.; Evans, R.; Pritzel, A.; Green, T.; Figurnov, M.; Ronneberger, O.; Tunyasuvunakool, K.; Bates, R.; Zidek, A.; Potapenko, A.; et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021, 596, 583–589. [Google Scholar] [CrossRef]
  10. Abramson, J.; Adler, J.; Dunger, J.; Evans, R.; Green, T.; Pritzel, A.; Ronneberger, O.; Willmore, L.; Ballard, A.J.; Bambrick, J.; et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 2024, 630, 493–500. [Google Scholar] [CrossRef]
  11. Jiang, Q.; Jin, X.; Lee, S.J.; Yao, S. Protein secondary structure prediction: A survey of the state of the art. J. Mol. Graph. Model. 2017, 76, 379–402. [Google Scholar] [CrossRef]
  12. Fischer, D.; Eisenberg, D. Protein fold recognition using sequence-derived predictions. Protein Sci. 1996, 5, 947–955. [Google Scholar] [CrossRef] [PubMed]
  13. Zhou, Y.; Karplus, M. Interpreting the folding kinetics of helical proteins. Nature 1999, 401, 400–403. [Google Scholar] [CrossRef] [PubMed]
  14. Ozkan, S.B.; Wu, G.A.; Chodera, J.D.; Dill, K.A. Protein folding by zipping and assembly. Proc. Natl. Acad. Sci. USA 2007, 104, 11987–11992. [Google Scholar] [CrossRef] [PubMed]
  15. Zhou, J.; Wang, H.; Zhao, Z.; Xu, R.; Lu, Q. CNNH_PSS: Protein 8-class secondary structure prediction by convolutional neural network with highway. BMC Bioinform. 2018, 19, 60. [Google Scholar] [CrossRef]
  16. Sitbon, E.; Pietrokovski, S. Occurrence of protein structure elements in conserved sequence regions. BMC Struct. Biol. 2007, 7, 3. [Google Scholar] [CrossRef]
  17. Watkins, A.M.; Wuo, M.G.; Arora, P.S. Protein-protein interactions mediated by helical tertiary structure motifs. J. Am. Chem. Soc. 2015, 137, 11622–11630. [Google Scholar] [CrossRef]
  18. Wuyun, Q.; Chen, Y.; Shen, Y.; Cao, Y.; Hu, G.; Cui, W.; Gao, J.; Zheng, W. Recent progress of protein tertiary structure prediction. Molecules 2024, 29, 832. [Google Scholar] [CrossRef]
  19. Dong, B.; Liu, Z.; Xu, D.; Hou, C.; Niu, N.; Wang, G. Impact of multi-factor features on protein secondary structure prediction. Biomolecules 2024, 14, 1155. [Google Scholar] [CrossRef]
  20. Du, H.; Brender, J.R.; Zhang, J.; Zhang, Y. Protein structure prediction provides comparable performance to crystallographic structures in docking-based virtual screening. Methods 2015, 71, 77–84. [Google Scholar] [CrossRef]
  21. Elnaggar, A.; Heinzinger, M.; Dallago, C.; Rehawi, G.; Wang, Y.; Jones, L.; Gibbs, T.; Feher, T.; Angerer, C.; Steinegger, M.; et al. ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7112–7127. [Google Scholar] [CrossRef]
  22. Rao, R.; Liu, J.; Verkuil, R.; Meier, J.; Canny, J.; Abbeel, P.; Sercu, T.; Rives, A. MSA transformer. bioRxiv 2021. [Google Scholar] [CrossRef]
  23. Hekkelman, M.L.; Salmoral, D.A.; Perrakis, A.; Joosten, R.P. DSSP 4: Fair annotation of protein secondary structure. Protein Sci. 2025, 34, e70208. [Google Scholar] [CrossRef] [PubMed]
  24. Tunyasuvunakool, K.; Adler, J.; Wu, Z.; Green, T.; Zielinski, M.; Zidek, A.; Bridgland, A.; Cowie, A.; Meyer, C.; Laydon, A.; et al. Highly accurate protein structure prediction for the human proteome. Nature 2021, 596, 590–596. [Google Scholar] [CrossRef] [PubMed]
  25. Bertoline, L.M.F.; Lima, A.N.; Krieger, J.E.; Teixeira, S.K. Before and after AlphaFold2: An overview of protein structure prediction. Front. Bioinform. 2023, 3, 1120370. [Google Scholar] [CrossRef] [PubMed]
  26. Agarwal, V.; McShan, A.C. The power and pitfalls of AlphaFold2 for structure prediction beyond rigid globular proteins. Nat. Chem. Biol. 2024, 20, 950–959. [Google Scholar] [CrossRef]
  27. Feng, R.; Wang, X.; Xia, Z.; Han, T.; Wang, H.; Yu, W. MHTAPred-SS: A highly targeted autoencoder-driven deep multi-task learning framework for accurate protein secondary structure prediction. Int. J. Mol. Sci. 2024, 25, 13444. [Google Scholar] [CrossRef]
  28. Alanazi, W.; Meng, D.; Pollastri, G. Porter 6: Protein secondary structure prediction by leveraging pre-trained language models (PLMs). Int. J. Mol. Sci. 2024, 26, 130. [Google Scholar] [CrossRef]
  29. Zakharov, O.S.; Rudik, A.V.; Filimonov, D.A.; Lagunin, A.A. Prediction of protein secondary structures based on substructural descriptors of molecular fragments. Int. J. Mol. Sci. 2024, 25, 12525. [Google Scholar] [CrossRef]
  30. Dong, B.; Su, H.; Xu, D.; Hou, C.; Liu, Z.; Niu, N.; Wang, G. ILMCnet: A deep neural network model that uses PLM to process features and employs CRF to predict protein secondary structure. Genes 2024, 15, 1350. [Google Scholar] [CrossRef]
  31. Cheng, L.; Lu, W.; Xia, Y.; Lu, Y.; Shen, J.; Hui, Z.; Xu, Y.; Wu, H.; Chen, J.; Fu, Q.; et al. ProAttUnet: Advancing protein secondary structure prediction with deep learning via U-net dual-pathway feature fusion and ESM2 pretrained protein language model. Comput. Biol. Chem. 2025, 118, 108429. [Google Scholar] [CrossRef]
  32. Pinto Corujo, M.; Michal, P.; Ang, D.; Vivian, L.; Chmel, N.; Rodger, A. Prediction of secondary structure content of proteins using raman spectroscopy and self-organizing maps. Appl. Spectrosc. 2025, 79, 1497–1507. [Google Scholar] [CrossRef]
  33. Zhao, L.; Li, J.; Zhang, B.; Jiang, X. Combining knowledge distillation and neural networks to predict protein secondary structure. Sci. Rep. 2025, 15, 32031. [Google Scholar] [CrossRef]
  34. Alanazi, W.; Meng, D.; Pollastri, G. DeepPredict: A state-of-the-art web server for protein secondary structure and relative solvent accessibility prediction. Front. Bioinform. 2025, 5, 1607402. [Google Scholar] [CrossRef] [PubMed]
  35. Das, S.; Ghosh, S.; Jana, N.D. TransConv: Convolution-infused transformer for protein secondary structure prediction. J. Mol. Model. 2025, 31, 37. [Google Scholar] [CrossRef] [PubMed]
  36. Wu, T.; Cheng, W.; Cheng, J. Improving protein secondary structure prediction by deep language models and transformer networks. Methods Mol. Biol. 2025, 2867, 43–53. [Google Scholar] [PubMed]
  37. Dong, B.; Liu, Z.; Xu, D.; Hou, C.; Dong, G.; Zhang, T.; Wang, G. SERT-StructNet: Protein secondary structure prediction method based on multi-factor hybrid deep model. Comput. Struct. Biotechnol. J. 2024, 23, 1364–1375. [Google Scholar] [CrossRef]
  38. Sanjeevi, M.; Mohan, A.; Ramachandran, D.; Jeyaraman, J.; Sekar, K. CSSP-2.0: A refined consensus method for accurate protein secondary structure prediction. Comput. Biol. Chem. 2024, 112, 108158. [Google Scholar] [CrossRef]
  39. Chen, Y.; Chen, G.; Chen, C.Y. MFTrans: A multi-feature transformer network for protein secondary structure prediction. Int. J. Biol. Macromol. 2024, 267, 131311. [Google Scholar] [CrossRef]
  40. Sonsare, P.M.; Gunavathi, C. A novel approach for protein secondary structure prediction using encoder-decoder with attention mechanism model. Biomol. Concepts 2024, 15, 20220043. [Google Scholar] [CrossRef]
  41. Chen, Y.; Chen, G.; Chen, C.Y. PSSP-MFFNet: A multifeature fusion network for protein secondary structure prediction. ACS Omega 2024, 9, 5985–5994. [Google Scholar] [CrossRef]
  42. Peracha, O. PS4: A next-generation dataset for protein single-sequence secondary structure prediction. Biotechniques 2024, 76, 63–70. [Google Scholar] [CrossRef] [PubMed]
  43. Berman, H.M.; Kleywegt, G.J.; Nakamura, H.; Markley, J.L. The protein data bank at 40: Reflecting on the past to prepare for the future. Structure 2012, 20, 391–396. [Google Scholar] [CrossRef] [PubMed]
  44. Wu, Z.; Hou, Y.; Dai, Z.; Hu, C.A.; Wu, G. Metabolism, nutrition, and redox signaling of hydroxyproline. Antioxid. Redox Signal 2019, 30, 674–682. [Google Scholar] [CrossRef]
  45. Bella, J.; Eaton, M.; Brodsky, B.; Berman, H.M. Crystal and molecular structure of a collagen-like peptide at 1.9 Å resolution. Science 1994, 266, 75–81. [Google Scholar] [CrossRef]
  46. Milchevskiy, Y.V.; Kravatskaya, G.I.; Kravatsky, Y.V. AAindexNC: Estimating the physicochemical properties of non-canonical amino acids, including those derived from the PDB and PDBeChem databank. Int. J. Mol. Sci. 2024, 25, 12555. [Google Scholar] [CrossRef]
  47. Milchevskiy, Y.V.; Milchevskaya, V.Y.; Nikitin, A.M.; Kravatsky, Y.V. Effective local and secondary protein structure prediction by combining a neural network-based approach with extensive feature design and selection without reliance on evolutionary information. Int. J. Mol. Sci. 2023, 24, 15656. [Google Scholar] [CrossRef]
  48. Yang, J.Y.; Peng, Z.L.; Chen, X. Prediction of protein structural classes for low-homology sequences based on predicted secondary structure. BMC Bioinform. 2010, 11 (Suppl. S1), S9. [Google Scholar] [CrossRef]
  49. DeBartolo, J.; Colubri, A.; Jha, A.K.; Fitzgerald, J.E.; Freed, K.F.; Sosnick, T.R. Mimicking the folding pathway to improve homology-free protein structure prediction. Proc. Natl. Acad. Sci. USA 2009, 106, 3734–3739. [Google Scholar] [CrossRef]
  50. Schmirler, R.; Heinzinger, M.; Rost, B. Fine-tuning protein language models boosts predictions across diverse tasks. Nat. Commun. 2024, 15, 7407. [Google Scholar] [CrossRef]
  51. Sun, X.; Wu, Z.; Su, J.; Li, C. GraphPBSP: Protein binding site prediction based on graph attention network and pre-trained model ProstT5. Int. J. Biol. Macromol. 2024, 282, 136933. [Google Scholar] [CrossRef]
  52. Fang, Y.; Jiang, Y.; Wei, L.; Ma, Q.; Ren, Z.; Yuan, Q.; Wei, D.Q. DeepProSite: Structure-aware protein binding site prediction using ESMfold and pretrained language model. Bioinformatics 2023, 39, btad718. [Google Scholar] [CrossRef]
  53. Jiao, S.; Ye, X.; Sakurai, T.; Zou, Q.; Han, W.; Zhan, C. Integration of pre-trained protein language models with equivariant graph neural networks for peptide toxicity prediction. BMC Biol. 2025, 23, 229. [Google Scholar] [CrossRef]
  54. Capela, J.; Zimmermann-Kogadeeva, M.; Dijk, A.; de Ridder, D.; Dias, O.; Rocha, M. Comparative assessment of protein large language models for enzyme commission number prediction. BMC Bioinform. 2025, 26, 68. [Google Scholar] [CrossRef] [PubMed]
  55. Milchevskiy, Y.V.; Milchevskaya, V.Y.; Kravatsky, Y.V. Method to generate complex predictive features for machine learning-based prediction of the local structure and functions of proteins. Mol. Biol. 2023, 57, 136–145. [Google Scholar] [CrossRef]
  56. Huberty, C.J. Applied Discriminant Analysis; Wiley-Interscience: New York, NY, USA, 1994. [Google Scholar]
  57. Thompson, B. Stepwise regression and stepwise discriminant analysis need not apply here: A guidelines editorial. Educ. Psychol. Meas. 1995, 55, 525–534. [Google Scholar] [CrossRef]
  58. Tharwat, A.; Gaber, T.; Ibrahim, A.; Hassanien, A. Linear discriminant analysis: A detailed tutorial. AI Commun. 2017, 30, 169–190. [Google Scholar] [CrossRef]
  59. Zhao, S.; Zhang, B.; Yang, J.; Zhou, J.; Xu, Y. Linear discriminant analysis. Nat. Rev. Methods Primers 2024, 4, 70. [Google Scholar] [CrossRef]
  60. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
  61. Kabsch, W.; Sander, C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983, 22, 2577–2637. [Google Scholar] [CrossRef]
  62. De Brevern, A.G. New assessment of a structural alphabet. In Silico Biol. 2005, 5, 283–289. [Google Scholar] [CrossRef]
  63. Etchebest, C.; Benros, C.; Hazout, S.; de Brevern, A.G. A structural alphabet for local protein structures: Improved prediction methods. Proteins 2005, 59, 810–827. [Google Scholar] [CrossRef]
  64. De Brevern, A.G.; Etchebest, C.; Benros, C.; Hazout, S. “Pinning strategy”: A novel approach for predicting the backbone structure in terms of protein blocks from sequence. J. Biosci. 2007, 32, 51–70. [Google Scholar] [CrossRef] [PubMed]
  65. Chou, P.Y.; Fasman, G.D. Prediction of the secondary structure of proteins from their amino acid sequence. Adv. Enzymol. Relat. Areas Mol. Biol. 1978, 47, 45–148. [Google Scholar] [PubMed]
  66. Wertz, D.H.; Scheraga, H.A. Influence of water on protein structure. An analysis of the preferences of amino acid residues for the inside or outside and for specific conformations in a protein molecule. Macromolecules 1978, 11, 9–15. [Google Scholar] [CrossRef]
  67. Kakraba, S.; Knisley, D. A graph theoretic model of single point mutations in the cystic fibrosis transmembrane conductance regulator. J. Adv. Biotechnol. 2016, 6, 780–786. [Google Scholar] [CrossRef]
  68. Kawashima, S.; Pokarowski, P.; Pokarowska, M.; Kolinski, A.; Katayama, T.; Kanehisa, M. AAindex: Amino acid index database, progress report 2008. Nucleic Acids Res. 2008, 36, D202–D205. [Google Scholar] [CrossRef]
  69. Munoz, V.; Serrano, L. Intrinsic secondary structure propensities of the amino acids, using statistical phi-psi matrices: Comparison with experimental scales. Proteins 1994, 20, 301–311. [Google Scholar] [CrossRef]
  70. Miyazawa, S.; Jernigan, R.L. Self-consistent estimation of inter-residue protein contact energies based on an equilibrium mixture approximation of residues. Proteins 1999, 34, 49–68. [Google Scholar] [CrossRef]
  71. Ptitsyn, O.B.; Finkelstein, A.V. Theory of protein secondary structure and algorithm of its prediction. Biopolymers 1983, 22, 15–25. [Google Scholar] [CrossRef]
  72. Mészáros, B.; Erdős, G.; Dosztányi, Z. IUPred2A: Context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 2018, 46, W329–W337. [Google Scholar] [CrossRef]
  73. Ponnuswamy, P.K.; Prabhakaran, M.; Manavalan, P. Hydrophobic packing and spatial arrangement of amino acid residues in globular proteins. Biochim. Biophys. Acta 1980, 623, 301–316. [Google Scholar] [CrossRef] [PubMed]
  74. Ho, C.T.; Huang, Y.W.; Chen, T.R.; Lo, C.H.; Lo, W.C. Discovering the ultimate limits of protein secondary structure prediction. Biomolecules 2021, 11, 1627. [Google Scholar] [CrossRef] [PubMed]
  75. Cuff, J.A.; Barton, G.J. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 1999, 34, 508–519. [Google Scholar] [CrossRef]
  76. Yang, Y.; Gao, J.; Wang, J.; Heffernan, R.; Hanson, J.; Paliwal, K.; Zhou, Y. Sixty-five years of the long march in protein secondary structure prediction: The final stretch? Brief. Bioinform. 2018, 19, 482–494. [Google Scholar] [CrossRef]
  77. Hanson, J.; Paliwal, K.; Litfin, T.; Yang, Y.; Zhou, Y. Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks. Bioinformatics 2019, 35, 2403–2410. [Google Scholar] [CrossRef]
  78. Singh, J.; Paliwal, K.; Litfin, T.; Singh, J.; Zhou, Y. Reaching alignment-profile-based accuracy in predicting protein secondary and tertiary structural properties without alignment. Sci. Rep. 2022, 12, 7607. [Google Scholar] [CrossRef]
  79. Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)-round XIII. Proteins 2019, 87, 1011–1020. [Google Scholar] [CrossRef]
  80. Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)-round XIV. Proteins 2021, 89, 1607–1617. [Google Scholar] [CrossRef]
  81. Shapovalov, M.; Dunbrack, R.L., Jr.; Vucetic, S. Multifaceted analysis of training and testing convolutional neural networks for protein secondary structure prediction. PLoS ONE 2020, 15, e0232528. [Google Scholar] [CrossRef]
  82. Heffernan, R.; Paliwal, K.; Lyons, J.; Singh, J.; Yang, Y.; Zhou, Y. Single-sequence-based prediction of protein secondary structures and solvent accessibility by deep whole-sequence learning. J. Comput. Chem. 2018, 39, 2210–2216. [Google Scholar] [CrossRef]
  83. Fang, C.; Shang, Y.; Xu, D. MUFold-SS: New deep inception-inside-inception networks for protein secondary structure prediction. Proteins 2018, 86, 592–598. [Google Scholar] [CrossRef]
  84. Stapor, K.; Kotowski, K.; Smolarczyk, T.; Roterman, I. Lightweight ProteinUnet2 network for protein secondary structure prediction: A step towards proper evaluation. BMC Bioinform. 2022, 23, 100. [Google Scholar] [CrossRef]
  85. Singh, J.; Litfin, T.; Paliwal, K.; Singh, J.; Hanumanthappa, A.K.; Zhou, Y. SPOT-1D-Single: Improving the single-sequence-based prediction of protein secondary structure, backbone angles, solvent accessibility and half-sphere exposures using a large training set and ensembled deep learning. Bioinformatics 2021, 37, 3464–3472. [Google Scholar] [CrossRef]
  86. Yang, W.; Liu, Y.; Xiao, C. Deep metric learning for accurate protein secondary structure prediction. Knowl.-Based Syst. 2022, 242, 108356. [Google Scholar] [CrossRef]
  87. Klausen, M.S.; Jespersen, M.C.; Nielsen, H.; Jensen, K.K.; Jurtz, V.I.; Sonderby, C.K.; Sommer, M.O.A.; Winther, O.; Nielsen, M.; Petersen, B.; et al. NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning. Proteins 2019, 87, 520–527. [Google Scholar] [CrossRef] [PubMed]
  88. Wang, S.; Peng, J.; Ma, J.; Xu, J. Protein secondary structure prediction using deep convolutional neural fields. Sci. Rep. 2016, 6, 18962. [Google Scholar] [CrossRef] [PubMed]
  89. Milchevskaya, V.; Nikitin, A.M.; Lukshin, S.A.; Filatov, I.V.; Kravatsky, Y.V.; Tumanyan, V.G.; Esipova, N.G.; Milchevskiy, Y.V. Structural coordinates: A novel approach to predict protein backbone conformation. PLoS ONE 2021, 16, e0239793. [Google Scholar] [CrossRef]
  90. Garbuzynskiy, S.O.; Melnik, B.S.; Lobanov, M.Y.; Finkelstein, A.V.; Galzitskaya, O.V. Comparison of X-ray and NMR structures: Is there a systematic difference in residue contacts between X-ray- and NMR-resolved protein structures? Proteins 2005, 60, 139–147. [Google Scholar] [CrossRef]
  91. Wang, G.; Dunbrack, R.L., Jr. PISCES: Recent improvements to a PDB sequence culling server. Nucleic Acids Res. 2005, 33, W94–W98. [Google Scholar] [CrossRef]
  92. Kryshtafovych, A.; Schwede, T.; Topf, M.; Fidelis, K.; Moult, J. Critical assessment of methods of protein structure prediction (CASP)-round XV. Proteins 2023, 91, 1539–1549. [Google Scholar] [CrossRef]
  93. Touw, W.G.; Baakman, C.; Black, J.; te Beek, T.A.; Krieger, E.; Joosten, R.P.; Vriend, G. A series of PDB-related databanks for everyday needs. Nucleic Acids Res. 2015, 43, D364–D368. [Google Scholar] [CrossRef]
  94. Zhang, S.; Krieger, J.M.; Zhang, Y.; Kaya, C.; Kaynak, B.; Mikulska-Ruminska, K.; Doruker, P.; Li, H.; Bahar, I. ProDy 2.0: Increased scale and scope after 10 years of protein dynamics modelling with Python. Bioinformatics 2021, 37, 3657–3659. [Google Scholar] [CrossRef]
  95. Miller, A. Subset Selection in Regression; Chapman and Hall/CRC: New York, NY, USA, 2002. [Google Scholar]
  96. Lee, J. Measures for the assessment of fuzzy predictions of protein secondary structure. Proteins 2006, 65, 453–462. [Google Scholar] [CrossRef] [PubMed]
  97. Taha, A.A.; Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med. Imaging 2015, 15, 29. [Google Scholar] [CrossRef]
  98. Ting, K.M. Confusion matrix. In Encyclopedia of Machine Learning; Sammut, C., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2011; p. 209. [Google Scholar]
