In Silico Models of Biological Activities of Peptides Using the Coefficient of Conformism of a Correlative Prediction and the Las Vegas Algorithm

Toropova, Alla P.; Toropov, Andrey A.; Roncaglioni, Alessandra; Benfenati, Emilio

doi:10.3390/macromol5020027

Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy

* Author to whom correspondence should be addressed.

Macromol 2025, 5(2), 27; https://doi.org/10.3390/macromol5020027

Submission received: 25 February 2025 / Revised: 20 May 2025 / Accepted: 4 June 2025 / Published: 13 June 2025

Abstract

Peptides are substances with numerous applications in chemistry, biology, medicine, and agriculture. Systematizing knowledge about peptides may have economic as well as scientific consequences. This study examines the antioxidant activity and the ACE-inhibitory capacity of peptides containing three or four amino acids. Instead of treating peptides as traditional molecules described by atoms and covalent bonds, an attempt is made here to represent the corresponding endpoints as mathematical functions of lists of amino acids. New techniques that may be useful in theory and in practice for developing quantitative structure–property/activity relationships (QSPRs/QSARs) for certain types of biological activity of peptides are proposed and discussed.

Keywords: antioxidant activity; ACE-inhibitory activity; peptides; quasi-SMILES; QSAR; Monte Carlo method; CORAL software

1. Introduction

Peptides are sequences of amino acids linked together [1,2]. A non-rigorous but useful statement is that traditional substances are associations of atoms into molecules, whereas peptides are associations of amino acids. Peptides have many roles (or functions). Neuropeptides are produced in the brain and regulate the functioning of the central nervous system [3,4]. Other peptides regulate immunity and increase protective functions [5,6], act as hormones [7] that affect the course of the main physiological functions of the body, or are antibiotics with an antibacterial effect [8]. Peptides are therefore an important object of study in medicine, biology, and the natural sciences, and modelling their biological activity is an important task in both theoretical and practical terms.

In principle, models of the physicochemical and biochemical behavior of peptides can be developed with traditional approaches for constructing quantitative structure–property/activity relationships (QSPRs/QSARs), which are oriented towards any type of molecule [9]. However, building models of peptides from their molecular structure, described as a system of atoms linked by covalent bonds, is not always convenient; moreover, the bioactivity of peptides depends on amino acid composition [10]. There are approaches that combine molecular representation learning of 1D sequential tokens, 2D topology graphs, and 3D conformers [11,12]. Tripeptides, tetrapeptides, pentapeptides, hexapeptides, and so on can also be considered separately [13]. A popular trend is to use artificial intelligence to model the biochemical behavior of peptides [14]. Research based on three-dimensional representations of peptides is another common strategy for building QSARs for peptides [15]. Individual types of biological activity of peptides have also been studied, including mutagenicity [16], antidiabetic and antihypertensive potential [17], and others [18]. Classical molecular descriptors are also applied to modelling peptide activity [19,20]. Molecular docking has been used to simulate the activity of peptides [13,21]. Furthermore, more sophisticated approaches that involve machine learning are applied [22].

Information about amino acid sequences has been used in some cases as a basis for QSPR/QSAR analysis [23,24,25,26,27,28]. This approach resembles the quasi-SMILES technique [29]. Traditional SMILES is a special language for describing molecular structure, whereas quasi-SMILES extends traditional SMILES with data not related to molecular structure, such as codes conveying experimental conditions. Amino acid sequences are a special case of quasi-SMILES.

In this paper, an attempt is made to build on experience with the mentioned approach, in which the endpoint model is a mathematical function of the amino acid sequence represented by single-symbol designations. Unlike previous works, this study uses a recently proposed criterion of predictive potential, the so-called coefficient of conformism of a correlative prediction (CCCP), as well as a Las Vegas algorithm that generates “lucky” distributions of the available data into training and validation sets [29].

The essence of the Las Vegas algorithm for the stated purposes can be formulated as testing several splits into an active training set, a passive training set, a calibration set, and a validation set. The split where the best results were achieved for the calibration set is remembered, in the hope that good results can be expected also for the external validation set, not used in the process of building the model.

The aim of this study is (i) to check whether CCCP is useful to improve the predictive potential of the model; and (ii) to check whether the Las Vegas algorithm can provide satisfactory splits of available data in training and validation sets. These computational experiments are carried out with CORAL software-2024 (http://www.insilico.eu/coral accessed on 2 June 2025).

2. Materials and Methods

2.1. Data

In this study, two databases on the biological activity of peptides are considered. Database 1: Data on the antioxidant activity of 214 tripeptides taken from [2]. The ACE-inhibitory activity (IC50) of these peptides was measured in vitro as the peptide concentration that reduced ACE activity by 50% [2]. Here, the negative logarithm of IC50 (pIC50) is considered. Database 2: Data on the inhibitory activity of 268 peptides in goat milk taken from [1]. In this case, two duplicates (identical sequences of amino acids) were found and removed from consideration [30]. All peptide activities were expressed as IC50 values, which represent the peptide concentration (in μM) required to block ACE activity by 50%. The negative logarithm of these values (pIC50) was used as the endpoint.

The data were divided into these sets: active (≈25%) and passive training (≈25%), calibration (≈25%), and validation (≈25%). This distribution was performed using the Las Vegas algorithm, which boils down to the following steps. Ten random divisions were made into the specified sets to construct models for the active and passive training sets and the calibration set. The model with the best statistics for the calibration set was selected. The validation set was not used in the modelling phase.
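The split-selection steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CORAL implementation: the names `random_split`, `las_vegas_split`, and the callback `build_and_score` are hypothetical, the four sets are cut into equal quarters, and "best statistics for the calibration set" is reduced to a single score returned by the callback.

```python
import random

SETS = ("active", "passive", "calibration", "validation")

def random_split(items, rng):
    """Shuffle the items and cut them into four sets of about 25% each."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    n = len(shuffled)
    bounds = [round(i * n / 4) for i in range(5)]
    return {name: shuffled[bounds[i]:bounds[i + 1]] for i, name in enumerate(SETS)}

def las_vegas_split(items, build_and_score, n_trials=10, seed=0):
    """Try n_trials random splits and keep the one with the best calibration score.

    build_and_score is a hypothetical callback: it receives the active,
    passive, and calibration sets, builds a model, and returns a quality
    score measured on the calibration set. The validation set is withheld.
    """
    rng = random.Random(seed)
    best_split, best_score = None, float("-inf")
    for _ in range(n_trials):
        split = random_split(items, rng)
        modelling_sets = {k: v for k, v in split.items() if k != "validation"}
        score = build_and_score(modelling_sets)
        if score > best_score:
            best_split, best_score = split, score
    return best_split, best_score
```

Because the callback never receives the validation set, the selected split can still be evaluated externally afterwards, which mirrors the requirement that the validation set is not used in the modelling phase.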

2.2. Scheme for Model Construction

The formulas for the optimal descriptors, calculated from a list of amino acids (more precisely, from a list of so-called correlation weights of the corresponding amino acids), are as follows:

\mathrm{D0}(T, N) = \sum \mathrm{CW}(A_{k}) + \sum \mathrm{CW}(AA_{k})

(1)

\mathrm{DS}(T, N) = \sum \mathrm{CW}(A_{k}) + \sum \mathrm{CW}(AA_{k}) + \mathrm{CW}(\mathrm{Symmetry})

(2)

A_k denotes one amino acid, and AA_k denotes a pair of amino acids that are neighbors in the amino acid sequence of a peptide. T and N are parameters of the Monte Carlo optimization. T is the threshold that defines rare amino acids, which are not involved in the optimization and have correlation weights equal to zero; N is the number of epochs of the optimization. An epoch is one cycle of modifications of all non-rare amino acids. The numerical values (T = 3 and N = 15) were selected in preliminary computational experiments.

Symmetry is the number of fragments of the form XYX in the amino acid sequence, where X and Y are one-symbol abbreviations of any amino acid. For instance, the peptide LWE has symmetry equal to zero (denoted as ‘xyx0’), and LWL has symmetry equal to one (denoted as ‘xyx1’).
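To illustrate how Equations (1) and (2) and the symmetry code could be evaluated for a given sequence, the following Python sketch extracts the single amino acids, the neighboring pairs, and the ‘xyx’ code, and sums their correlation weights. The function names and the toy weights are hypothetical, and the sketch assumes that X and Y in an XYX fragment may be the same amino acid; features missing from the weight table are treated as rare and contribute zero.

```python
def count_xyx(sequence):
    """Count XYX fragments: positions i where residue i equals residue i + 2."""
    return sum(1 for i in range(len(sequence) - 2) if sequence[i] == sequence[i + 2])

def features(sequence, with_symmetry=True):
    """List the features of a peptide: A_k, AA_k, and optionally the symmetry code."""
    singles = list(sequence)
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    feats = singles + pairs
    if with_symmetry:
        feats.append(f"xyx{count_xyx(sequence)}")
    return feats

def descriptor(sequence, cw, with_symmetry=True):
    """Sum correlation weights (DS if with_symmetry, else D0).

    Features absent from cw are treated as rare, i.e. weight zero.
    """
    return sum(cw.get(f, 0.0) for f in features(sequence, with_symmetry))

# Toy correlation weights, invented for illustration only.
toy_cw = {"L": 0.5, "W": 1.0, "LW": 0.25, "xyx1": -0.5}
ds_lwl = descriptor("LWL", toy_cw)                       # DS for LWL
d0_lwl = descriptor("LWL", toy_cw, with_symmetry=False)  # D0 for LWL
```

With these toy weights, the two values differ only by the weight assigned to ‘xyx1’, which is the single extra term that distinguishes Equation (2) from Equation (1).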

The values of the correlation weights were found by Monte Carlo optimization using different mathematical functions. Naturally, the numerical values of the correlation weights for different functions differed significantly, which in turn led to different statistical quality of the corresponding models.
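The role of the threshold T and the number of epochs N can be sketched with a simple greedy stochastic search. This is only an assumed stand-in for the Monte Carlo optimization, whose exact update rule is not given here: the names `non_rare_features` and `monte_carlo_optimize`, the random initial weights, the step size, the accept-if-better rule, and the reading of "rare" as "present in fewer than T peptides" are all assumptions. The `score` callback stands for any target function to be maximized.

```python
import random
from collections import Counter

def non_rare_features(feature_lists, threshold=3):
    """Features present in at least `threshold` peptides; the rest keep weight zero."""
    counts = Counter(f for feats in feature_lists for f in set(feats))
    return sorted(f for f, c in counts.items() if c >= threshold)

def monte_carlo_optimize(feature_names, score, epochs=15, step=0.5, seed=0):
    """Greedy stochastic search for correlation weights that maximize score(weights).

    One epoch is one pass in which every non-rare feature is modified once.
    A modification is kept only if it increases the score (an assumption).
    """
    rng = random.Random(seed)
    weights = {f: rng.uniform(-1.0, 1.0) for f in feature_names}
    best = score(weights)
    for _ in range(epochs):
        for f in feature_names:
            old = weights[f]
            weights[f] = old + rng.uniform(-step, step)
            trial = score(weights)
            if trial > best:
                best = trial
            else:
                weights[f] = old  # reject the modification
    return weights, best
```

Different `score` callbacks generally lead to different final weights, which is consistent with the observation above that the correlation weights differ from one target function to another.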

Equations (1) and (2) need numerical values of the above correlation weights. The Monte Carlo optimization is a tool to calculate those correlation weights. Here, three target functions for the Monte Carlo optimization are examined (TF₁, TF₂, and TF₃), each of which extends a base function TF₀:

\mathrm{TF}_{0} = r_{AT} + r_{PT} - |r_{AT} - r_{PT}| \times 0.1

(3)

\mathrm{TF}_{1} = \mathrm{TF}_{0} + \mathrm{IIC}_{C} \times 0.3

(4)

\mathrm{TF}_{2} = \mathrm{TF}_{0} + \mathrm{CII}_{C} \times 0.3

(5)

\mathrm{TF}_{3} = \mathrm{TF}_{0} + \mathrm{CCCP} \times 0.3

(6)

Here, r_{AT} and r_{PT} are the correlation coefficients between the observed and predicted endpoint for the active training and passive training sets, respectively. The statistical criteria applied in Equations (4)–(6) are described in the literature [31]. These are the index of ideality of correlation (IIC), the correlation intensity index (CII), and the conformism coefficient of correlative prediction (CCCP).
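As a small worked illustration of Equations (3)–(6), the Python sketch below evaluates the base function and one extended target function. The function names and the input values are hypothetical; only the coefficients 0.1 and 0.3 come from the equations.

```python
def tf0(r_at, r_pt):
    """Base target function, Equation (3)."""
    return r_at + r_pt - abs(r_at - r_pt) * 0.1

def target_function(r_at, r_pt, criterion_c, weight=0.3):
    """Equations (4)-(6): TF0 plus a calibration-set criterion (IIC, CII, or CCCP)."""
    return tf0(r_at, r_pt) + criterion_c * weight

# Invented inputs: r_AT = 0.65, r_PT = 0.67, and a criterion value of 0.85.
base = tf0(0.65, 0.67)                       # 0.65 + 0.67 - 0.02 * 0.1
extended = target_function(0.65, 0.67, 0.85)  # base + 0.85 * 0.3
```

The penalty term in Equation (3) lowers the target function when the active and passive training correlations diverge, so the optimization is steered towards weights that fit both sets similarly.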

The IIC_C is calculated with data on the calibration set as follows:

\mathrm{IIC}_{C} = r_{C} \frac{\min({}^{-}\mathrm{MAE}_{C}, {}^{+}\mathrm{MAE}_{C})}{\max({}^{-}\mathrm{MAE}_{C}, {}^{+}\mathrm{MAE}_{C})}

(7)

\min(x, y) = \begin{cases} x, & \text{if } x < y \\ y, & \text{otherwise} \end{cases}

(8)

\max(x, y) = \begin{cases} x, & \text{if } x > y \\ y, & \text{otherwise} \end{cases}

(9)

{}^{-}\mathrm{MAE}_{C} = \frac{1}{{}^{-}N} \sum |\Delta_{k}|, \quad {}^{-}N \text{ is the number of } \Delta_{k} < 0

(10)

{}^{+}\mathrm{MAE}_{C} = \frac{1}{{}^{+}N} \sum |\Delta_{k}|, \quad {}^{+}N \text{ is the number of } \Delta_{k} \geq 0

(11)

\Delta_{k} = \mathrm{observed}_{k} - \mathrm{calculated}_{k}

(12)

Here, observed_k and calculated_k are the observed and calculated values of the endpoint for the k-th substance.
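Equations (7)–(12) can be sketched in Python as follows. The function names are hypothetical, and raising an error when all residuals have the same sign is an assumption made here, since the equations leave that case undefined.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def iic(observed, calculated):
    """Index of ideality of correlation, Equation (7)."""
    deltas = [o - c for o, c in zip(observed, calculated)]  # Equation (12)
    negative = [abs(d) for d in deltas if d < 0]
    non_negative = [abs(d) for d in deltas if d >= 0]
    if not negative or not non_negative:
        # Assumption: with one-sided residuals a MAE term is undefined.
        raise ValueError("IIC needs residuals of both signs")
    mae_neg = sum(negative) / len(negative)          # Equation (10)
    mae_pos = sum(non_negative) / len(non_negative)  # Equation (11)
    ratio = min(mae_neg, mae_pos) / max(mae_neg, mae_pos)
    return pearson_r(observed, calculated) * ratio
```

The ratio equals one when the mean absolute errors of over-predicted and under-predicted substances are balanced, so the IIC then coincides with the correlation coefficient; any imbalance lowers it.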

The CII was developed as a tool to improve the quality of the Monte Carlo optimization aimed to build up QSPR/QSAR models.

The CII is calculated as follows:

\mathrm{CII}_{C} = 1 - \sum \mathrm{Protest}_{k}

(13)

\mathrm{Protest}_{k} = \begin{cases} R_{k}^{2} - R^{2}, & \text{if } R_{k}^{2} - R^{2} > 0 \\ 0, & \text{otherwise} \end{cases}

(14)

R² is the determination coefficient for a set that contains n substances. R²_k is the determination coefficient for the n − 1 substances that remain after removing the k-th substance. Hence, if ∆ = (R²_k − R²) is positive, the k-th substance is an “oppositionist” of the correlation between experimental and predicted values of the set. A small sum of “protests” means a better correlation.
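The leave-one-out logic of Equations (13) and (14) can be sketched in Python as follows. The function names are hypothetical, and R² is taken here as the squared Pearson correlation between observed and calculated values, which is an assumption about how the determination coefficient is computed.

```python
def r_squared(x, y):
    """Squared Pearson correlation; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def cii(observed, calculated):
    """Correlation intensity index, Equation (13)."""
    observed, calculated = list(observed), list(calculated)
    r2_all = r_squared(observed, calculated)
    protest_sum = 0.0
    for k in range(len(observed)):
        # R^2 of the set after removing the k-th substance
        r2_k = r_squared(observed[:k] + observed[k + 1:],
                         calculated[:k] + calculated[k + 1:])
        diff = r2_k - r2_all
        if diff > 0:  # Equation (14): only "oppositionists" protest
            protest_sum += diff
    return 1.0 - protest_sum
```

A substance whose removal raises R² contributes to the sum of protests, so a set with one strong outlier yields a markedly lower CII than a set in which every substance follows the same trend.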

However, in addition to the above-mentioned “oppositionists” of the correlation, there are also its “supporters”; in this case the ∆ = (R²_k − R²) is negative. The comparison of correlation coefficients separately for “supporters” and “oppositionists” of the correlation is an informative criterion for the Monte Carlo optimization, similarly to IIC and CII.

\mathrm{CCCP} = \frac{\sum \Delta R(\mathrm{oppositionist})}{\sum \Delta R(\mathrm{supporter})}

(15)

The suggested ratio is able to show the measure of conformism of the “oppositionists” and “supporters”. Thus, one can name this criterion the conformism coefficient of correlative prediction (CCCP).

2.3. Applicability Domain

The applicability domain is determined from the so-called statistical defects of the amino acids present in a given sequence. Thus, the basis for assessing the suitability of a model for each peptide is the prevalence of its constituent amino acids. The rarer a sequence of amino acids is in the training sets, the greater its statistical defect and, as a result, the less likely the model is to give a reliable prediction. If the statistical defect of a peptide (sequence of amino acids) is greater than twice the average statistical defect in the training sets, the peptide is considered a potential outlier. The statistical defect of an amino acid is calculated as

d_{k} = \frac{|P(A_{k}) - P'(A_{k})|}{N(A_{k}) + N'(A_{k})} + \frac{|P(A_{k}) - P''(A_{k})|}{N(A_{k}) + N''(A_{k})} + \frac{|P'(A_{k}) - P''(A_{k})|}{N'(A_{k}) + N''(A_{k})}

(16)

where P(A_k), P′(A_k), and P″(A_k) are the probabilities of A_k in the active training, passive training, and calibration sets, respectively; N(A_k), N′(A_k), and N″(A_k) are the frequencies of A_k in the active training, passive training, and calibration sets, respectively. The statistical defects of peptides (D_j) are calculated as:

D_{j} = \sum_{k=1}^{\mathrm{NA}} d_{k}

(17)

where NA is the number of non-blocked amino acids in the peptide.

The j-th peptide falls within the applicability domain if

D_{j} < 2 \times \bar{D}

(18)
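Equations (16)–(18) can be sketched in Python as follows. Several details are assumptions made for this example rather than statements about CORAL: only single amino acids are used as features, N is read as the number of peptides in a set that contain the amino acid, P is N divided by the set size, a term whose denominator is zero is skipped, and the average defect is taken over whatever list of training peptides the caller supplies. All function names are hypothetical.

```python
from collections import Counter

def residue_stats(peptides):
    """Return (N, P): peptide counts and fractions for each amino acid in a set."""
    counts = Counter()
    for seq in peptides:
        counts.update(set(seq))
    n = len(peptides)
    return counts, {a: c / n for a, c in counts.items()}

def _term(p1, p2, n1, n2):
    """One fraction of Equation (16); skipped when the amino acid is absent from both sets."""
    return abs(p1 - p2) / (n1 + n2) if (n1 + n2) > 0 else 0.0

def residue_defects(active, passive, calibration):
    """Statistical defect d_k of every amino acid seen in the three sets."""
    stats = [residue_stats(s) for s in (active, passive, calibration)]
    residues = set().union(*(counts.keys() for counts, _ in stats))
    defects = {}
    for a in residues:
        n = [counts.get(a, 0) for counts, _ in stats]
        p = [probs.get(a, 0.0) for _, probs in stats]
        defects[a] = (_term(p[0], p[1], n[0], n[1])
                      + _term(p[0], p[2], n[0], n[2])
                      + _term(p[1], p[2], n[1], n[2]))
    return defects

def peptide_defect(seq, defects):
    """Equation (17): sum of d_k over the amino acids of the peptide."""
    return sum(defects.get(a, 0.0) for a in seq)

def in_domain(seq, defects, training_peptides):
    """Equation (18): D_j must be below twice the average training defect."""
    mean_defect = (sum(peptide_defect(s, defects) for s in training_peptides)
                   / len(training_peptides))
    return peptide_defect(seq, defects) < 2 * mean_defect
```

An amino acid that is well represented and evenly distributed across the three sets receives a small defect, whereas one that occurs in only one set receives a large defect, so a peptide built mostly from such amino acids is flagged as a potential outlier.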

Compared with other methods to evaluate similarity and distance (e.g., the Euclidean distance), our approach focuses on specific molecular parts, and if a rare sequence of amino acids is found, this is enough to label the substance; instead, the Euclidean distance makes an overall balance with all the components of the molecules.

2.4. Mechanistic Interpretation

A mechanistic interpretation of the model can be obtained by carrying out several Monte Carlo optimization runs under the same conditions, that is, using the same CORAL method (http://www.insilico.eu/coral accessed on 2 June 2025). In this case, the following types of amino acids will be obtained. Type one: all correlation weights are positive; type two: all correlation weights are negative. The first type can be interpreted as amino acids whose presence leads to an increase in the value of the endpoint under study. Conversely, the second type can be interpreted as amino acids whose presence leads to a decrease in the endpoint under study. It should be noted that the computational experiments will also yield amino acids whose correlation weights change sign from run to run. These cannot be used as indicators of an increase or decrease in the value of the endpoint under study, and thus they are not useful for interpretation purposes.
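The classification by sign described above can be sketched in Python as follows. The function name, the labels, and the toy weights are hypothetical; a feature missing from a run is treated as having zero weight in that run, which is an assumption.

```python
def classify_features(runs):
    """Label each feature by the sign of its correlation weight across runs.

    runs is a list of dicts (feature -> correlation weight), one per
    Monte Carlo optimization run.
    """
    labels = {}
    for feature in set().union(*runs):
        weights = [run.get(feature, 0.0) for run in runs]
        if all(w > 0 for w in weights):
            labels[feature] = "increase"   # type one: always positive
        elif all(w < 0 for w in weights):
            labels[feature] = "decrease"   # type two: always negative
        else:
            labels[feature] = "undefined"  # sign alternates between runs
    return labels

# Toy weights from two hypothetical runs.
toy_runs = [
    {"R": 0.4, "G": -0.2, "K": 0.1},
    {"R": 0.7, "G": -0.5, "K": -0.3},
]
toy_labels = classify_features(toy_runs)
```

In this toy case R would be read as increasing the endpoint, G as decreasing it, and K as uninformative because its weight changes sign.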

2.5. Model

Models built with the CORAL software-2024 (http://www.insilico.eu/coral accessed on 2 June 2025) have the following form:

\mathrm{Model} = C_{0} + C_{1} \times D(T, N)

(19)

where C₀ and C₁ are regression coefficients and D(T,N) is the descriptor D0(T,N) or DS(T,N), calculated with Equation (1) or (2).
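As an illustration of Equation (19), the Python sketch below fits C₀ and C₁ by ordinary least squares and applies the resulting one-variable model. The use of ordinary least squares, the function names, and the toy data are assumptions; the source states only that C₀ and C₁ are regression coefficients.

```python
def fit_model(descriptors, endpoints):
    """Least-squares estimate of (C0, C1) for endpoint = C0 + C1 * descriptor."""
    n = len(descriptors)
    mean_d = sum(descriptors) / n
    mean_y = sum(endpoints) / n
    sxx = sum((d - mean_d) ** 2 for d in descriptors)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(descriptors, endpoints))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_d
    return c0, c1

def predict(c0, c1, descriptor):
    """Equation (19) for a single descriptor value."""
    return c0 + c1 * descriptor

# Toy data lying exactly on the line y = 1 + 2 * d.
c0, c1 = fit_model([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
toy_prediction = predict(c0, c1, 1.5)
```

Once the correlation weights are fixed, the descriptor is a single number per peptide, so the final model reduces to a straight line in that descriptor.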

3. Results

The CORAL model provides different kinds of results, depending on the splits and the target functions. First, CORAL gives predicted values, so that the best correlative prediction for the validation set can be identified; second, the model provides the average value and dispersion of the determination coefficient for the validation set over five splits into active training, passive training, calibration, and validation sets. At the same time, the statistical characteristics are recorded for the five splits obtained by the Las Vegas algorithm with the target function TF₀, when D0(3,15) or DS(3,15) is used with the target functions TF₁, TF₂, and TF₃. The models differ in whether the descriptor includes information on symmetry (DS) or not (D0), while the different target functions introduce mathematical components that may improve the statistics. Thus, CORAL provides different results for different algorithms and, for each algorithm, for different splits. The most interesting results are those for the validation set, which contains substances not used to build the model. These statistics are representative of the expected performance when the model is applied to “new” substances. The other important statistics are those observed for the calibration set, which describe the final model after optimization. The statistical values for the calibration set should be close to those for the validation set. The remaining statistical values, for the active and passive training sets, refer to preliminary steps in the modelling process and are therefore useful mainly for internal monitoring of that process.

3.1. Database 1

Tables 1–6 contain the statistical characteristics of the antioxidant activity models for the peptide collection presented in the Supplementary Materials (Table S1). The best model is observed for split #1 using TF₃:

pIC50 = 1.2142 + 1.5047 × DS(3,15)

(20)

According to the computational experiments, the stable promoters of the growth of antioxidant activity are arginine (R), tyrosine (Y), leucine (L), and proline (P), as well as the absence of fragments of local symmetry (xyx0). Amino acids with low prevalence can be characterized by correlation weights whose sign alternates across several runs of the optimization.

3.2. Database 2

Tables 7–12 contain the statistical characteristics of the ACE-inhibitory activity models for the peptide collection presented in the Supplementary Materials (Table S2). The best model is observed for split #3:

pIC50 = 3.0784 + 0.4286 × DCW(3,15)

(21)

According to the computational experiments, the stable promoters of the growth of the inhibitory activity of peptides are tyrosine (Y), valine (V), isoleucine (I), and lysine (K), as well as the absence of fragments of local symmetry (xyx0). Amino acids with low prevalence can be characterized by correlation weights whose sign alternates across several runs of the optimization.

4. Discussion

Table 13 contains the statistical quality of the models for dataset 1 on the validation set. The best results are observed for descriptors that take the xyx symmetry into account. Furthermore, the target function TF₃, which uses the CCCP, turned out to be the most promising for achieving the best predictive potential, whether or not the symmetry descriptors are used. TF₂ also shows a larger standard deviation, which is an indication of a noisy model.

Table 14 contains the statistical quality of the models for dataset 2 on the validation set. Again, the best results are observed for descriptors that take the symmetry (xyx) into account, and the target function TF₃, which uses the CCCP, turned out to be the most promising for achieving the best predictive potential.

Despite the apparent simplicity of the considered approach, it has noticeable theoretical and practical merits. First, the information capacity of the proposed system for accounting for the symmetry of peptides, in which equivalent positions are considered in terms of the presence of different amino acids, can clearly be expanded in the future. Second, once the most appropriate descriptors are identified, including the symmetry, the proposed criteria, such as IIC, CII, and CCCP, can further improve the modelling process as tools for controlling the Monte Carlo optimization. Finally, databases in which peptides are represented by sequences of amino acids can prove competitive with traditional databases in both economic and heuristic respects. Thus, the possibility of applying stochastic methods to develop models of endpoints related to peptides is demonstrated.

Table 15 compares the models with some analogous models for peptide endpoints from the literature. The statistical quality of our models is better than that of the results reported in the literature.

Nevertheless, we underline that peptides, as well as their properties and effects, may be very different; thus, these studies are only partial. With increasing length (number of amino acids), new effects of the impact of amino acid collectives on the properties under study may arise. It is natural to expect a broader range of effects with increasing complexity of the peptides. However, it is difficult to say when the transition from quantity to quality will occur. To properly address these aspects, new data and new studies with new approaches are necessary. The characterization of the peptides should also be improved; one possibility is to use information on their size and overall polarity.

Based on the available results, the approach under consideration has satisfactory predictive potential, at least for a preliminary assessment of various aspects of peptide behavior. The ‘xyx’ symmetry type can be supplemented by additional symmetry types, such as ‘xyyx’, ‘xyzyx’, and possibly more complex ones. In addition, combinations of the considered criteria of predictive potential (for example, IIC and CII or IIC and CCCP) can be effective.

IIC and CII improve the predictive potential of the model for the calibration set, but to the detriment of the active and passive training sets [29]. Apparently, CCCP has the same effect. This is because the initial steps of the modelling, as represented by the results for the active and passive training sets, may be driven by some uncommon peptides; thus, the initial model is quite local, while in the following phase, performed with the calibration set, the general features present in the larger population prevail. Thus, the statistics of the calibration set are much more informative.

5. Conclusions

The antioxidant activity and ACE-inhibitory activity of peptides can be simulated using the Monte Carlo technique, achieving good statistical performance. The concept of xyx symmetry in peptides improves the predictive potential of the QSAR model. Comparison of the criteria of the predictive potential, such as IIC, CII, and CCCP, shows some advantages of CCCP. This study demonstrates that it is possible to use a quite simplified format for the description of the peptides, without using time-consuming and more sophisticated approaches, to address the activity of peptides. The CORAL software can successfully handle both the chemical format and the related algorithms. These studies are useful both from a methodological point of view, indicating the most favourable conditions and algorithms, and from a practical point of view, introducing models that can be used to explore properties of peptides. The obtained models compare favourably with other models previously published, showing clear improvements. The approach presented here can be useful in the search for new drugs and the selection of peptides for various practical applications.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/macromol5020027/s1. Table S1 contains the best model for database 1. Table S2 contains the best model for database 2.

Author Contributions

Conceptualization, A.P.T., A.A.T., A.R. and E.B.; data curation, A.P.T., A.A.T., A.R. and E.B.; writing—original draft preparation, A.P.T., A.A.T., A.R. and E.B.; writing—review and editing, A.P.T., A.A.T., A.R. and E.B.; supervision, A.R. and E.B.; project administration, E.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by EFSA within the project sOFT-ERA, OC/EFSA/IDATA/2022/02.

Data Availability Statement

Data are available in this article or its Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations and symbols are used in this manuscript:

Abbreviation	Meaning
QSPR/QSAR	Quantitative structure–property/activity relationships
A	Active training set
P	Passive training set
C	Calibration set
V	Validation set
CCCP	Coefficient of conformism of a correlative prediction
IIC	Index of ideality of correlation
CII	Correlation intensity index
FLS	Fragments of local symmetry
CCC	Concordance correlation coefficient
R²	Determination coefficient
Q²	Cross-validated R²
MAE	Mean absolute error
F	Fisher F-ratio
Nact	The number of features under optimization

References

1. Du, A.; Jia, W. Bioaccessibility of novel antihypertensive short-chain peptides in goat milk using the INFOGEST static digestion model by effect-directed assays. Food Chem. 2023, 427, 136735.
2. Wang, J.-H.; Liu, Y.-L.; Ning, J.-H.; Yu, J.; Li, X.-H.; Wang, F.-X. Is the structural diversity of tripeptides sufficient for developing functional food additives with satisfactory multiple bioactivities? J. Mol. Struct. 2013, 1040, 164–170.
3. Schüß, C.; Vu, O.; Mishra, N.M.; Tough, I.R.; Du, Y.; Stichel, J.; Cox, H.M.; Weaver, C.D.; Meiler, J.; Emmitte, K.A.; et al. Structure-Activity Relationship Study of the High-Affinity Neuropeptide Y4 Receptor Positive Allosteric Modulator VU0506013. J. Med. Chem. 2023, 66, 8745–8766.
4. Wang, M.; Li, X.; Chen, M.; Wu, X.; Mi, Y.; Kai, Z.; Yang, X. 3D-QSAR based optimization of insect neuropeptide allatostatin analogs. Bioorg. Med. Chem. Lett. 2019, 29, 890–895.
5. Granstein, R.D.; Wagner, J.A.; Stohl, L.L.; Ding, W. Calcitonin gene-related peptide: Key regulator of cutaneous immunity. Acta Physiol. 2015, 213, 586–594.
6. Ren, M.; Wang, Y.; Zheng, X.; Yang, W.; Liu, M.; Xie, S.; Yao, Y.; Yan, J.; He, W. Hydrogelation of peptides and carnosic acid as regulators of adaptive immunity against postoperative recurrence of cutaneous melanoma. J. Control. Release 2024, 375, 654–666.
7. Besman, M.; Zambrowicz, A.; Matwiejczyk, M. Review of Thymic Peptides and Hormones: From Their Properties to Clinical Application. Int. J. Pept. Res. Ther. 2025, 31, 10.
8. He, Y.; He, X. Molecular design and genetic optimization of antimicrobial peptides containing unnatural amino acids against antibiotic-resistant bacterial infections. Biopolymers 2016, 106, 746–756.
9. Zhou, P.; Liu, Q.; Wu, T.; Miao, Q.; Shang, S.; Wang, H.; Chen, Z.; Wang, S.; Wang, H. Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling. J. Chem. Inf. Model. 2021, 61, 1718–1731.
10. Kashung, P.; Karuthapandian, D. Milk-derived bioactive peptides. Food Prod. Process. Nutr. 2025, 7, 6.
11. Li, J.; Zong, K.; Wei, C.; Zhong, Q.; Yan, H.; Wang, J.; Li, X. Design, synthesis, and biological activity of human glutaminyl cyclase inhibitors against Alzheimer’s disease. Bioorg. Med. Chem. 2025, 120, 118105.
12. Yin, K.; Li, R.; Zhang, S.; Sun, Y.; Huang, L.; Jiang, M.; Xu, D.; Xu, W. Deep learning combined with quantitative structure-activity relationship accelerates de novo design of antifungal peptides. Adv. Sci. 2025, 12, 2412488.
13. Chen, Q.; Ge, Y.; He, X.; Li, S.; Fang, Z.; Li, C.; Chen, H. Virtual-screening of xanthine oxidase inhibitory peptides: Inhibition mechanisms and prediction of activity using machine-learning. Food Chem. 2024, 460, 140741.
14. Khalaf, W.S.; Morgan, R.N.; Elkhatib, W.F. Clinical microbiology and artificial intelligence: Different applications, challenges, and future prospects. J. Microbiol. Methods 2025, 232–234, 107125.
15. Tran, T.T.N.; Nguyen, H.T.D.; Nguyen, V.C. A Machine Learning-Driven 3D-QSAR Approach for developing antioxidant preservatives from bovine hemoglobin and tryptophyllin l for meat products. Pept. Sci. 2025, 117, e70004.
16. Rane, R.; Satpute, B.; Kumar, D.; Suryawanshi, M.; Prabhune, A.G.; Gawade, B.; Mahajan, A.; Pawar, A.; Sakat, S. Mutagenic and genotoxic in silico QSAR prediction of dimer impurity of gliflozins; canagliflozin, dapaglifozin, and emphagliflozin and in vitro evaluation by Ames and micronucleus test. Drug Chem. Toxicol. 2025, 48, 416–425.
17. Ye, X.; Yang, R.; Yang, Z.; Huang, B.; Riaz, T.; Zhao, C.; Chen, J. Novel angiotensin-I-converting enzyme (ACE) inhibitory peptides from Porphyra haitanensis: Screening, digestion stability, and mechanistic insights. Food Biosci. 2025, 68, 106460.
18. Cournoyer, A.; Bernier, M.-È.; Aboubacar, H.; de Toro-Martín, J.; Vohl, M.-C.; Ravallec, R.; Cudennec, B.; Bazinet, L. Machine learning-driven discovery of bioactive peptides from duckweed (Lemnaceae) protein hydrolysates: Identification and experimental validation of 20 novel antihypertensive, antidiabetic, and/or antioxidant peptides. Food Chem. 2025, 482, 144029.
19. Garro, L.A.; Andrada, M.F.; Vega-Hissi, E.G.; Barberis, S.; Garro Martinez, J.C. Development of QSARs for cysteine-containing di- and tripeptides with antioxidant activity: influence of the cysteine position. J. Comput. Aided Mol. Des. 2024, 38, 27.
20. van der Walt, M.; Möller, D.S.; van Wyk, R.J.; Ferguson, P.M.; Hind, C.K.; Clifford, M.; Do Carmo Silva, P.; Sutton, J.M.; Mason, A.J.; Bester, M.J.; et al. QSAR reveals decreased lipophilicity of polar residues determines the selectivity of antimicrobial peptide activity. ACS Omega 2024, 9, 26030–26049.
21. Wang, B.; Zhang, H.; Wen, Y.; Yuan, W.; Chen, H.; Lin, L.; Guo, F.; Zheng, Z.-P.; Zhao, C. The novel angiotensin-I-converting enzyme inhibitory peptides from Scomber japonicus muscle protein hydrolysates: QSAR-based screening, molecular docking, kinetic and stability studies. Food Chem. 2024, 447, 138873.
22. Martínez-Mauricio, K.L.; García-Jacas, C.R.; Cordoves-Delgado, G. Examining evolutionary scale modeling-derived different-dimensional embeddings in the antimicrobial peptide classification through a KNIME workflow 2. Protein Sci. 2024, 33, e4928.
23. Mahmoodi-Reihani, M.; Abbasitabar, F.; Zare-Shahabadi, V. In Silico Rational Design and Virtual Screening of Bioactive Peptides Based on QSAR Modeling. ACS Omega 2020, 5, 5951–5958.
24. Toropova, A.P.; Raškova, M.; Raška, I., Jr.; Toropov, A.A. The sequence of amino acids as the basis for the model of biological activity of peptides. Theor. Chem. Acc. 2021, 140, 15.
25. Moinul, M.; Khatun, S.; Abdul Amin, S.; Jha, T.; Gayen, S. Quasi-SMILES as a tool for peptide QSAR modelling. In QSPR/QSAR Analysis Using SMILES and Quasi-SMILES. Challenges and Advances in Computational Chemistry and Physics; Toropova, A.P., Toropov, A.A., Eds.; Springer: Cham, Switzerland, 2023; Volume 33, pp. 269–294.
26. Toropova, A.P.; Toropov, A.A.; Kumar, P.; Kumar, A.; Achary, P.G.R. Fragments of local symmetry in a sequence of amino acids: Does one can use for QSPR/QSAR of peptides? J. Mol. Struct. 2023, 1293, 136300.
27. Toropova, M.A.; Veselinović, A.M.; Veselinović, J.B.; Stojanović, D.B.; Toropov, A.A. QSAR modeling of the antimicrobial activity of peptides as a mathematical function of a sequence of amino acids. Comput. Biol. Chem. 2015, 59, 126–130.
28. Toropov, A.A.; Toropova, A.P.; Raska, I., Jr.; Benfenati, E.; Gini, G. QSAR modeling of endpoints for peptides which is based on representation of the molecular structure by a sequence of amino acids. Struct. Chem. 2012, 23, 1891–1904.
29. Toropova, A.P.; Toropov, A.A. The coefficient of conformism of a correlative prediction (CCCP): Building up reliable nano-QSPRs/QSARs for endpoints of nanoparticles in different experimental conditions encoded via quasi-SMILES. Sci. Total Environ. 2024, 927, 172119.
30. Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E. Quantitative structure–activity relationship models for the angiotensin-converting enzyme inhibitory activities of short-chain peptides of goat milk using quasi-SMILES. Macromol 2024, 4, 387–400.
31. Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E. In Silico Simulation of Daphnia magna Immobilization Exposed to Mixtures of TiO₂ Nanoparticles with Inorganic Compounds. J. Compos. Sci. 2025, 9, 16.
32. Guendouzi, A.; Belkhiri, L.; Guendouzi, A.; Derouiche, T.M.T.; Djekoun, A. A combined in silico approaches of 2D-QSAR, molecular docking, molecular dynamics and ADMET prediction of anti-cancer inhibitor activity for actinonin derivatives. J. Biomol. Struct. Dyn. 2024, 42, 119–133.
33. Wang, F.; Wen, M.; Zhou, B. Exploring details about structure requirements based on antioxidant tripeptide derived from β-Lactoglobulin by in silico approaches. Amino Acids 2023, 55, 1909–1922.

Table 1. The statistical quality of models for dataset 1 applying descriptor D0(3,15) and target function TF₁. The average determination coefficient on the validation set is 0.673 ± 0.064; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	53	0.4186	0.5902	0.5777	0.7461	0.3780	0.1623	1.72	37
	P	53	0.4442	0.5730	0.4105	0.7869	0.3897	0.3276	1.92	41
	C	54	0.7315	0.8492	0.8518	0.8648	0.7128	0.5358	0.693	142
	V	54	0.7763	-	-	-	-	-	0.79	-
2	A	54	0.4898	0.6575	0.4812	0.7625	0.4548	0.0508	1.74	50
	P	53	0.4412	0.6126	0.4035	0.7716	0.3974	0.2435	1.74	40
	C	54	0.7196	0.8441	0.8482	0.8129	0.6993	0.3364	0.737	133
	V	53	0.6082	-	-	-	-	-	0.95	-
3	A	54	0.4616	0.6317	0.4324	0.7739	0.4223	0.2087	1.90	45
	P	54	0.4364	0.6185	0.5263	0.7452	0.3852	−0.2171	1.77	40
	C	53	0.7799	0.8196	0.8821	0.8488	0.7628	0.2735	0.886	181
	V	53	0.6400	-	-	-	-	-	0.94	-
4	A	54	0.4315	0.6029	0.4180	0.7942	0.3853	0.3020	1.87	39
	P	54	0.4122	0.5763	0.6063	0.7196	0.3669	−0.1321	1.71	36
	C	53	0.5735	0.6745	0.7558	0.7556	0.5471	0.1111	1.07	69
	V	53	0.6230	-	-	-	-	-	1.07	-
5	A	54	0.4341	0.6054	0.5680	0.7471	0.3884	0.1746	1.84	40
	P	53	0.4817	0.6049	0.3838	0.7440	0.4447	0.1200	1.72	47
	C	54	0.6904	0.8284	0.8305	0.8220	0.6695	0.4316	0.790	116
	V	53	0.7162	-	-	-	-	-	0.77	-
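
As a consistency check on Table 1, the F column appears to agree with the Fisher ratio of a one-variable linear regression, F = R²(n − 2)/(1 − R²). The following Python sketch recomputes F for split 1 under that assumption. The function name f_ratio and the choice of one predictor (k = 1) are illustrative assumptions inferred from the tabulated values, not a description of how the modeling software computes F.

```python
def f_ratio(r2: float, n: int, k: int = 1) -> float:
    """Fisher F-ratio for a linear regression with k predictors and n samples.

    Assumption: the models behind Table 1 use a single descriptor (k = 1).
    """
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# (set label, n, R², tabulated F) copied from split 1 of Table 1.
TABLE_1_SPLIT_1 = [
    ("A", 53, 0.4186, 37),
    ("P", 53, 0.4442, 41),
    ("C", 54, 0.7315, 142),
]

# Recompute F and round to the nearest integer, as in the table.
RECOMPUTED = {label: round(f_ratio(r2, n)) for label, n, r2, _ in TABLE_1_SPLIT_1}

for label, n, r2, tabulated in TABLE_1_SPLIT_1:
    print(f"set {label}: n={n}, R2={r2:.4f}, F recomputed={RECOMPUTED[label]}, F tabulated={tabulated}")
```

For these three rows, the recomputed values round to the tabulated ones (37, 41, and 142), which supports the one-predictor reading of the F column.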

Table 2. The statistical quality of models for dataset 1 applying descriptor D0(3,15) and target function TF₂. The average determination coefficient on the validation set is 0.556 ± 0.144; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	53	0.5003	0.6670	0.6316	0.7785	0.4580	0.3387	1.59	51
	P	53	0.5515	0.6486	0.5520	0.7748	0.5064	0.3229	1.72	63
	C	54	0.4318	0.5940	0.5550	0.7841	0.3817	0.1768	1.34	40
	V	54	0.7462	-	-	-	-	-	0.87	-
2	A	54	0.5504	0.7100	0.5505	0.7545	0.5181	−0.0525	1.63	64
	P	53	0.5508	0.7292	0.6894	0.8061	0.5062	0.2358	1.58	63
	C	54	0.3083	0.5027	0.5104	0.7380	0.2292	−0.0362	1.70	23
	V	53	0.3066	-	-	-	-	-	1.83	-
3	A	54	0.5267	0.6900	0.5385	0.7851	0.4872	0.2728	1.78	58
	P	54	0.5380	0.6922	0.4572	0.7556	0.4943	0.0738	1.58	61
	C	53	0.6450	0.7509	0.7245	0.7816	0.6016	0.3030	1.07	93
	V	53	0.5316	-	-	-	-	-	1.17	-
4	A	54	0.5285	0.6915	0.5394	0.7903	0.4851	0.3570	1.70	58
	P	54	0.4467	0.6109	0.5755	0.7217	0.4034	−0.0215	1.66	42
	C	53	0.6245	0.7036	0.5481	0.8039	0.5986	0.3578	1.10	85
	V	53	0.6251	-	-	-	-	-	1.15	-
5	A	54	0.5598	0.7178	0.5986	0.7712	0.5259	0.2167	1.62	66
	P	53	0.5854	0.6533	0.5561	0.7814	0.5521	0.3261	1.59	72
	C	54	0.4628	0.6550	0.6001	0.7497	0.4248	−0.0275	1.16	45
	V	53	0.5697	-	-	-	-	-	1.02	-

Table 3. The statistical quality of models for dataset 1 applying descriptor D0(3,15) and target function TF₃. The average determination coefficient on the validation set is 0.719 ± 0.071; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	53	0.4467	0.6176	0.6436	0.7586	0.4033	0.2435	1.68	41
	P	53	0.5025	0.6064	0.4925	0.7690	0.4575	0.2569	1.80	52
	C	54	0.6116	0.7390	0.7245	0.7919	0.5795	0.4200	0.992	82
	V	54	0.7398	-	-	-	-	-	0.82	-
2	A	54	0.4829	0.6513	0.6453	0.7587	0.4390	0.0653	1.75	49
	P	53	0.4699	0.6544	0.5045	0.7992	0.4282	0.3661	1.70	45
	C	54	0.7329	0.8355	0.8356	0.8545	0.7116	0.6257	0.815	143
	V	53	0.5865	-	-	-	-	-	1.14	-
3	A	54	0.4564	0.6267	0.5824	0.7592	0.4175	0.1189	1.91	44
	P	54	0.4351	0.6151	0.5814	0.7763	0.3910	0.2176	1.71	40
	C	53	0.8272	0.8494	0.4508	0.8917	0.8167	0.7148	0.779	244
	V	53	0.7282	-	-	-	-	-	0.81	-
4	A	54	0.4874	0.6554	0.4800	0.7828	0.4441	0.2803	1.77	49
	P	54	0.4154	0.5828	0.6122	0.7286	0.3667	0.0135	1.71	37
	C	53	0.8065	0.8519	0.5969	0.8894	0.7936	0.7060	0.751	213
	V	53	0.8017	-	-	-	-	-	0.79	-
5	A	54	0.4940	0.6613	0.5623	0.7724	0.4545	0.3352	1.74	51
	P	53	0.5015	0.6045	0.3406	0.7347	0.4681	0.2146	1.71	51
	C	54	0.7668	0.8721	0.8225	0.8697	0.7484	0.6937	0.699	171
	V	53	0.7389	-	-	-	-	-	0.73	-

Table 4. The statistical quality of models for dataset 1 applying descriptor DS(3,15) and target function TF₁. The average determination coefficient on the validation set is 0.718 ± 0.065; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	53	0.4085	0.5800	0.6155	0.7756	0.3648	0.1055	1.73	35
	P	53	0.4099	0.5397	0.3697	0.7687	0.3576	0.2399	1.96	35
	C	54	0.6723	0.8077	0.8190	0.8249	0.6506	0.4542	0.781	107
	V	54	0.7905	-	-	-	-	-	0.76	-
2	A	54	0.5085	0.6742	0.7131	0.7426	0.4744	−0.0974	1.71	54
	P	53	0.4866	0.6473	0.5123	0.7591	0.4500	0.1895	1.67	48
	C	54	0.5195	0.7016	0.7204	0.7691	0.4743	0.0744	1.11	56
	V	53	0.6382	-	-	-	-	-	0.96	-
3	A	54	0.3719	0.5422	0.4193	0.7384	0.3361	−0.0119	2.05	31
	P	54	0.3691	0.5544	0.5412	0.7373	0.3182	0.0333	1.80	30
	C	53	0.6810	0.7711	0.8238	0.7990	0.6573	0.1410	0.885	109
	V	53	0.6446	-	-	-	-	-	0.90	-
4	A	54	0.4669	0.6366	0.6345	0.8064	0.4278	0.4151	1.81	46
	P	54	0.4715	0.6042	0.5903	0.7656	0.4317	0.2127	1.63	46
	C	53	0.7145	0.8255	0.8445	0.8276	0.6937	0.3913	0.778	128
	V	53	0.7770	-	-	-	-	-	0.75	-
5	A	54	0.4786	0.6474	0.5964	0.7662	0.4409	0.2791	1.76	48
	P	53	0.5387	0.6010	0.3847	0.7573	0.5027	0.2696	1.69	60
	C	54	0.6738	0.8128	0.8206	0.8095	0.6507	0.3382	0.807	107
	V	53	0.7402	-	-	-	-	-	0.74	-

Table 5. The statistical quality of models for dataset 1 applying descriptor DS(3,15) and target function TF₂. The average determination coefficient on the validation set is 0.550 ± 0.131; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	53	0.5118	0.6771	0.6889	0.7668	0.4719	0.2756	1.57	53
	P	53	0.5254	0.6446	0.4894	0.7726	0.4779	0.2716	1.76	56
	C	54	0.5309	0.6811	0.6230	0.7913	0.4940	0.1979	1.21	59
	V	54	0.7415	-	-	-	-	-	0.85	-
2	A	54	0.6002	0.7501	0.6198	0.7938	0.5673	0.2997	1.54	78
	P	53	0.5760	0.7528	0.6341	0.7948	0.5354	0.1572	1.54	69
	C	54	0.2279	0.4092	0.4109	0.7441	0.1304	−0.0253	2.03	15
	V	53	0.3301	-	-	-	-	-	1.90	-
3	A	54	0.5588	0.7169	0.5139	0.7947	0.5236	0.2960	1.72	66
	P	54	0.5614	0.7094	0.5205	0.7764	0.5208	0.1697	1.52	67
	C	53	0.5841	0.6915	0.6708	0.7856	0.5304	0.3957	1.30	72
	V	53	0.5503	-	-	-	-	-	1.22	-
4	A	54	0.4892	0.6570	0.5189	0.7776	0.4462	0.2829	1.77	50
	P	54	0.4898	0.6275	0.5827	0.7558	0.4525	0.2073	1.61	50
	C	53	0.5716	0.6163	0.3655	0.7894	0.5430	0.1404	1.27	68
	V	53	0.5452	-	-	-	-	-	1.23	-
5	A	54	0.5756	0.7307	0.6070	0.7833	0.5397	0.1936	1.59	71
	P	53	0.5800	0.7066	0.5849	0.7735	0.5495	0.2993	1.54	70
	C	54	0.5861	0.7370	0.7116	0.7858	0.5589	0.2611	1.08	74
	V	53	0.5829	-	-	-	-	-	1.14	-

Table 6. The statistical quality of models for dataset 1 applying descriptor DS(3,15) and target function TF₃. The average determination coefficient on the validation set is 0.764 ± 0.068; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	53	0.3940	0.5652	0.6044	0.7437	0.3506	0.1639	1.75	33
	P	53	0.4276	0.5583	0.4450	0.7746	0.3773	0.2540	1.93	38
	C	54	0.7795	0.8724	0.7921	0.8762	0.7656	0.6769	0.659	184
	V	54	0.8645	-	-	-	-	-	0.69	-
2	A	54	0.4264	0.5979	0.5224	0.7434	0.3801	0.1072	1.84	39
	P	53	0.4234	0.5990	0.4421	0.7665	0.3846	0.2149	1.77	37
	C	54	0.7231	0.8255	0.6353	0.8496	0.6992	0.6389	0.796	136
	V	53	0.7076	-	-	-	-	-	0.79	-
3	A	54	0.4353	0.6066	0.4895	0.7440	0.3979	0.1082	1.94	40
	P	54	0.4108	0.6050	0.6292	0.7478	0.3587	−0.2145	1.73	36
	C	53	0.8000	0.8305	0.5113	0.8755	0.7837	0.6745	0.848	204
	V	53	0.7049	-	-	-	-	-	0.88	-
4	A	54	0.4198	0.5914	0.5586	0.7896	0.3745	0.2578	1.88	38
	P	54	0.4209	0.5588	0.6292	0.7432	0.3763	0.0544	1.71	38
	C	53	0.8150	0.8776	0.8115	0.9004	0.8002	0.7697	0.608	225
	V	53	0.8282	-	-	-	-	-	0.62	-
5	A	54	0.5029	0.6692	0.6113	0.7737	0.4651	0.2869	1.72	53
	P	53	0.5021	0.6405	0.4509	0.7327	0.4693	0.1690	1.67	51
	C	54	0.7861	0.8718	0.7648	0.8870	0.7705	0.7127	0.736	191
	V	53	0.7149	-	-	-	-	-	0.82	-

Table 7. The statistical quality of models for dataset 2 applying descriptor D0(3,15) and target function TF₁. The average determination coefficient on the validation set is 0.525 ± 0.044; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	67	0.5155	0.6803	0.6183	0.7658	0.4920	0.0840	0.911	69
	P	66	0.3678	0.5213	0.4903	0.6934	0.3308	−0.3774	0.946	37
	C	67	0.3858	0.6178	0.6208	0.7076	0.3421	−0.2355	0.766	41
	V	66	0.5930	-	-	-	-	-	0.66	-
2	A	67	0.4668	0.6365	0.6631	0.7510	0.4338	−0.1719	0.957	57
	P	67	0.2059	0.4280	0.4162	0.7275	0.1475	−0.4468	1.12	17
	C	66	0.3124	0.5529	0.5569	0.6899	0.2435	−0.2488	0.759	29
	V	66	0.5025	-	-	-	-	-	0.77	-
3	A	67	0.4031	0.5746	0.5148	0.7392	0.3736	−0.1289	0.912	44
	P	66	0.4394	0.4974	0.6214	0.7483	0.4083	−0.0126	1.03	50
	C	67	0.4752	0.6886	0.6889	0.7352	0.4429	0.0010	0.720	59
	V	66	0.5421	-	-	-	-	-	0.67	-
4	A	66	0.2066	0.3425	0.4546	0.7404	0.1625	−0.4292	1.13	17
	P	66	0.2029	0.2730	0.3764	0.7276	0.1615	−0.6328	1.03	16
	C	67	0.4376	0.5641	0.5323	0.7292	0.3963	−0.1733	0.661	51
	V	67	0.4589	-	-	-	-	-	0.79	-
5	A	67	0.5872	0.7400	0.7006	0.7865	0.5647	0.0027	0.866	92
	P	66	0.4448	0.6107	0.4936	0.7539	0.4164	−0.0304	0.935	51
	C	66	0.4951	0.6817	0.7036	0.7503	0.4589	−0.0205	0.747	63
	V	67	0.5306	-	-	-	-	-	0.84	-

Table 8. The statistical quality of models for dataset 2 applying descriptor D0(3,15) and target function TF₂. The average determination coefficient on the validation set is 0.458 ± 0.067; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	67	0.5535	0.7126	0.6802	0.7718	0.5312	0.1148	0.874	81
	P	66	0.4136	0.5689	0.5260	0.7047	0.3776	−0.4531	0.906	45
	C	67	0.1656	0.4055	0.3465	0.7516	0.0819	−0.5802	0.945	13
	V	66	0.4695	-	-	-	-	-	0.79	-
2	A	67	0.4294	0.6008	0.5991	0.7269	0.3923	−0.0821	0.990	49
	P	67	0.2858	0.4999	0.5095	0.7189	0.2420	−0.2515	1.05	26
	C	66	0.0725	0.2594	0.1964	0.8198	0.0000	−0.4639	1.07	5
	V	66	0.4318	-	-	-	-	-	0.84	-
3	A	67	0.4688	0.6384	0.6260	0.7287	0.4425	−0.0381	0.861	57
	P	66	0.4683	0.6264	0.5587	0.7550	0.4374	0.0264	0.963	56
	C	67	0.4041	0.6135	0.4929	0.7330	0.3651	−0.0690	0.925	44
	V	66	0.5190	-	-	-	-	-	0.79	-
4	A	66	0.3085	0.4715	0.5227	0.7212	0.2710	−0.3468	1.06	29
	P	66	0.2500	0.3513	0.3068	0.7132	0.2077	−0.3624	1.01	21
	C	67	0.2592	0.4627	0.2994	0.7631	0.2109	−0.3577	0.831	23
	V	67	0.3431	-	-	-	-	-	0.91	-
5	A	67	0.5575	0.7159	0.6429	0.7754	0.5317	0.1568	0.897	82
	P	66	0.5528	0.6442	0.5107	0.7535	0.5297	−0.0981	0.927	79
	C	66	0.4001	0.5650	0.5668	0.7136	0.3378	−0.1096	0.981	43
	V	67	0.5286	-	-	-	-	-	0.95	-

Table 9. The statistical quality of models for dataset 2 applying descriptor D0(3,15) and target function TF₃. The average determination coefficient on the validation set is 0.526 ± 0.028; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	67	0.4596	0.6298	0.5497	0.7402	0.4325	−0.1192	0.962	55
	P	66	0.3353	0.4846	0.4509	0.6947	0.2989	−0.3925	0.965	32
	C	67	0.4511	0.6702	0.6598	0.7596	0.4141	0.2419	0.679	53
	V	66	0.5430	-	-	-	-	-	0.68	-
2	A	67	0.3705	0.5406	0.5908	0.7274	0.3345	−0.2820	1.04	38
	P	67	0.2444	0.4128	0.4831	0.7252	0.1971	−0.6011	1.05	21
	C	66	0.3725	0.6103	0.5246	0.7522	0.3210	0.0950	0.666	38
	V	66	0.5385	-	-	-	-	-	0.69	-
3	A	67	0.3762	0.5467	0.5953	0.7323	0.3472	−0.2253	0.933	39
	P	66	0.5037	0.5975	0.6135	0.7623	0.4745	0.1579	0.950	65
	C	67	0.4930	0.7015	0.5590	0.7513	0.4606	0.1828	0.713	63
	V	66	0.5465	-	-	-	-	-	0.69	-
4	A	66	0.2489	0.3986	0.4695	0.7096	0.2076	−0.5025	1.10	21
	P	66	0.1949	0.2997	0.2903	0.7325	0.1499	−0.6218	1.04	15
	C	67	0.4518	0.5918	0.3463	0.8025	0.4167	0.3853	0.699	54
	V	67	0.4713	-	-	-	-	-	0.80	-
5	A	67	0.5545	0.7134	0.7227	0.7675	0.5239	−0.0080	0.900	81
	P	66	0.3329	0.5330	0.3591	0.7409	0.2955	−0.1086	1.07	32
	C	66	0.4773	0.6659	0.6393	0.7738	0.4358	0.2862	0.744	58
	V	67	0.5329	-	-	-	-	-	0.90	-

Table 10. The statistical quality of models for dataset 2 applying descriptor DS(3,15) and target function TF₁. The average determination coefficient on the validation set is 0.711 ± 0.045; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	67	0.5430	0.7038	0.6345	0.7586	0.5190	0.1508	0.885	77
	P	66	0.4224	0.5744	0.6110	0.7084	0.3909	−0.2587	0.894	47
	C	67	0.5644	0.7497	0.7512	0.7799	0.5353	−0.0017	0.611	84
	V	66	0.6833	-	-	-	-	-	0.62	-
2	A	67	0.4364	0.6077	0.6412	0.7088	0.4043	−0.2366	0.984	50
	P	67	0.3732	0.4999	0.5345	0.6830	0.3404	−0.4046	0.961	39
	C	66	0.7101	0.8420	0.8426	0.8300	0.6932	0.4950	0.424	157
	V	66	0.7877	-	-	-	-	-	0.46	-
3	A	67	0.4621	0.6321	0.6598	0.7226	0.4336	−0.2341	0.866	56
	P	66	0.4426	0.6202	0.6375	0.7519	0.4126	0.0041	0.983	51
	C	67	0.6435	0.7984	0.8006	0.7968	0.6227	−0.0491	0.601	117
	V	66	0.7067	-	-	-	-	-	0.54	-
4	A	66	0.4453	0.6162	0.5911	0.7221	0.4187	−0.5249	0.945	51
	P	66	0.3214	0.5133	0.4233	0.7167	0.2825	−0.2437	0.951	30
	C	67	0.6268	0.7763	0.7901	0.8344	0.6016	0.5212	0.602	109
	V	67	0.6535	-	-	-	-	-	0.62	-
5	A	67	0.5692	0.7255	0.7323	0.7592	0.5468	−0.1441	0.885	86
	P	66	0.4215	0.5878	0.6413	0.7188	0.3907	−0.2753	0.946	47
	C	66	0.6686	0.8113	0.8174	0.8112	0.6493	0.1802	0.535	129
	V	67	0.7241	-	-	-	-	-	0.57	-

Table 11. The statistical quality of models for dataset 2 applying descriptor DS(3,15) and target function TF₂. The average determination coefficient on the validation set is 0.711 ± 0.034; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	67	0.6152	0.7618	0.6754	0.7861	0.5967	0.3185	0.812	104
	P	66	0.4360	0.6458	0.6359	0.7089	0.4015	−0.1545	0.898	49
	C	67	0.5035	0.6887	0.6943	0.7676	0.4684	−0.1207	0.774	66
	V	66	0.7338	-	-	-	-	-	0.75	-
2	A	67	0.5085	0.6742	0.6921	0.7277	0.4812	−0.0749	0.919	67
	P	67	0.4444	0.5770	0.5553	0.7067	0.4126	−0.0500	0.895	52
	C	66	0.5879	0.7622	0.7597	0.7653	0.5603	0.0988	0.541	91
	V	66	0.7486	-	-	-	-	-	0.52	-
3	A	67	0.5492	0.7090	0.7193	0.7404	0.5275	−0.2055	0.793	79
	P	66	0.5238	0.6980	0.6418	0.7572	0.4936	0.1345	0.925	70
	C	67	0.6190	0.7524	0.7493	0.8018	0.5961	0.2755	0.758	106
	V	66	0.7169	-	-	-	-	-	0.65	-
4	A	66	0.4800	0.6486	0.6521	0.7417	0.4547	−0.1824	0.915	59
	P	66	0.3509	0.5810	0.4455	0.7057	0.3099	−0.2432	0.971	35
	C	67	0.6021	0.7294	0.7334	0.8442	0.5766	0.5658	0.746	98
	V	67	0.6484	-	-	-	-	-	0.74	-
5	A	67	0.5723	0.7280	0.7343	0.7772	0.5480	0.1446	0.882	87
	P	66	0.5737	0.6713	0.7350	0.7657	0.5493	0.0978	0.863	86
	C	66	0.5202	0.7016	0.5059	0.7593	0.4789	0.1184	0.699	69
	V	67	0.7092	-	-	-	-	-	0.68	-

Table 12. The statistical quality of models for dataset 2 applying descriptor DS(3,15) and target function TF₃. The average determination coefficient on the validation set is 0.750 ± 0.033; bold indicates the best model.

Split	Set	n	R²	CCC	IIC	CII	Q²	CCCP	RMSE	F
1	A	67	0.5438	0.7045	0.5979	0.7593	0.5204	0.1170	0.884	77
	P	66	0.3943	0.5999	0.6094	0.7036	0.3621	−0.1837	0.921	42
	C	67	0.7085	0.8351	0.8059	0.8480	0.6894	0.5784	0.522	158
	V	66	0.7706	-	-	-	-	-	0.59	-
2	A	67	0.4710	0.6404	0.6274	0.7269	0.4416	−0.0220	0.953	58
	P	67	0.4006	0.5149	0.5772	0.7042	0.3685	−0.1156	0.939	43
	C	66	0.7218	0.8474	0.7631	0.8545	0.7011	0.6390	0.417	166
	V	66	0.7683	-	-	-	-	-	0.47	-
3	A	67	0.4356	0.6069	0.5683	0.6982	0.4014	−0.3446	0.887	50
	P	66	0.4271	0.6144	0.6362	0.7362	0.3962	−0.1572	1.00	48
	C	67	0.7418	0.8561	0.7376	0.8580	0.7259	0.5858	0.514	187
	V	66	0.7875	-	-	-	-	-	0.50	-
4	A	66	0.4806	0.6492	0.6140	0.7290	0.4561	−0.3088	0.915	59
	P	66	0.3203	0.5485	0.3616	0.6975	0.2773	−0.3392	0.981	30
	C	67	0.7164	0.7947	0.5741	0.8660	0.6959	0.6680	0.647	164
	V	67	0.7251	-	-	-	-	-	0.68	-
5	A	67	0.5178	0.6823	0.6984	0.7536	0.4906	−0.0139	0.936	70
	P	66	0.4228	0.5667	0.4983	0.7358	0.3914	0.0413	0.971	47
	C	66	0.6928	0.8174	0.6477	0.8443	0.6732	0.5595	0.531	144
	V	67	0.6991	-	-	-	-	-	0.58	-

Table 13. Average values and dispersions of the determination coefficient on the validation set for models of the antioxidant activity of peptides (dataset 1).

Descriptor	TF₁	TF₂	TF₃
D0(3,15)	0.673 ± 0.064	0.556 ± 0.144	0.719 ± 0.071
DS(3,15)	0.718 ± 0.065	0.550 ± 0.131	0.764 ± 0.068

Table 14. Average values and dispersions of the determination coefficient on the validation set for models of the ACE-inhibitory activity of peptides (dataset 2).

Descriptor	TF₁	TF₂	TF₃
D0(3,15)	0.525 ± 0.044	0.458 ± 0.067	0.526 ± 0.028
DS(3,15)	0.711 ± 0.045	0.711 ± 0.034	0.750 ± 0.033
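
The averages and dispersions in Tables 13 and 14 can be recomputed from the validation-set (V) rows of Tables 1–12. The Python sketch below does this under the assumption that the dispersion is the population standard deviation over the five splits; that reading is inferred from the tabulated numbers and is not stated in the table captions. The names VALIDATION_R2, summarize, and SUMMARY are illustrative.

```python
from statistics import fmean, pstdev

# Validation-set R² values for splits 1-5, copied from the V rows of Tables 1-12.
# Keys are (dataset, descriptor, target function).
VALIDATION_R2 = {
    (1, "D0(3,15)", "TF1"): [0.7763, 0.6082, 0.6400, 0.6230, 0.7162],
    (1, "D0(3,15)", "TF2"): [0.7462, 0.3066, 0.5316, 0.6251, 0.5697],
    (1, "D0(3,15)", "TF3"): [0.7398, 0.5865, 0.7282, 0.8017, 0.7389],
    (1, "DS(3,15)", "TF1"): [0.7905, 0.6382, 0.6446, 0.7770, 0.7402],
    (1, "DS(3,15)", "TF2"): [0.7415, 0.3301, 0.5503, 0.5452, 0.5829],
    (1, "DS(3,15)", "TF3"): [0.8645, 0.7076, 0.7049, 0.8282, 0.7149],
    (2, "D0(3,15)", "TF1"): [0.5930, 0.5025, 0.5421, 0.4589, 0.5306],
    (2, "D0(3,15)", "TF2"): [0.4695, 0.4318, 0.5190, 0.3431, 0.5286],
    (2, "D0(3,15)", "TF3"): [0.5430, 0.5385, 0.5465, 0.4713, 0.5329],
    (2, "DS(3,15)", "TF1"): [0.6833, 0.7877, 0.7067, 0.6535, 0.7241],
    (2, "DS(3,15)", "TF2"): [0.7338, 0.7486, 0.7169, 0.6484, 0.7092],
    (2, "DS(3,15)", "TF3"): [0.7706, 0.7683, 0.7875, 0.7251, 0.6991],
}


def summarize(values):
    """Return (mean, population standard deviation), each rounded to 3 decimals."""
    return round(fmean(values), 3), round(pstdev(values), 3)


SUMMARY = {key: summarize(values) for key, values in VALIDATION_R2.items()}

for (dataset, descriptor, tf), (mean, spread) in SUMMARY.items():
    print(f"dataset {dataset}  {descriptor}  {tf}: {mean:.3f} ± {spread:.3f}")
```

Using the sample standard deviation (statistics.stdev) instead would give larger dispersions, for example about 0.071 rather than 0.064 for D0(3,15) with TF₁ on dataset 1, so the population form is the one that matches the tables.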

Table 15. Comparison of models of the biological activity of peptides.

n	R²	Reference	Comment
32	0.746	[32]	Anti-cancer activity (pIC50)
-	0.708	[33]	Antioxidant activity of tripeptides
54	0.865	This work	Antioxidant activity of tripeptides (best model)
66	0.787	This work	ACE-inhibitory activity (best model)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
