1. Introduction
Peptides are chains of amino acids linked together [1,2]. A non-rigorous but useful analogy is the following: traditional substances are associations of atoms into molecules, whereas peptides are associations of amino acids. Peptides have many biological roles. Neuropeptides are produced in the brain and regulate the functioning of the central nervous system [3,4]. Other peptides regulate immunity and enhance protective functions [5,6], act as hormones that affect the main physiological functions of the body [7], or are antibiotics with an antibacterial effect [8]. Peptides are therefore an important object of study in medicine, biology, and the natural sciences, and modeling their biological activity is an important task in both theoretical and practical terms.
In principle, models of the physicochemical and biochemical behavior of peptides can be developed with the traditional approaches used to construct quantitative structure–property/activity relationships (QSPRs/QSARs) for any type of molecule [9]. However, building models of peptides from their molecular structure, described as a system of atoms linked by covalent bonds, is not always convenient; moreover, the bioactivity of peptides depends on their amino acid composition [10]. Some approaches combine molecular representation learning of 1D sequential tokens, 2D topology graphs, and 3D conformers [11,12]. Tripeptides, tetrapeptides, pentapeptides, hexapeptides, etc., can be considered separately [13]. A popular trend is to use artificial intelligence to model the biochemical behavior of peptides [14]. Research based on three-dimensional representations of peptides is another common strategy for building QSARs for peptides [15]. Other studies address individual types of biological activity of peptides, such as mutagenicity [16], antidiabetic and antihypertensive potential [17], and others [18]. Classical molecular descriptors are also applied to model peptide activity [19,20]. Molecular docking has been used to simulate the activity of peptides [13,21]. Furthermore, more sophisticated approaches that involve machine learning have been applied [22].
Information about amino acid sequences has been used in some cases as a basis for QSPR/QSAR analysis [23,24,25,26,27,28]. This approach resembles the quasi-SMILES technique [29]. Traditional SMILES is a special language for describing molecular structure, whereas quasi-SMILES extends traditional SMILES with data not related to molecular structure, such as codes conveying experimental conditions. Amino acid sequences are a special case of quasi-SMILES.
In this paper, the experience gained with the above approach (the endpoint model as a mathematical function of the amino acid sequence, represented by single-symbol amino acid codes) is applied. Unlike previous works, this study uses a recently proposed criterion of predictive potential, the so-called coefficient of conformism of a correlative prediction (CCCP), as well as a Las Vegas algorithm that generates “lucky” distributions of the available data into training and validation sets [29].
The essence of the Las Vegas algorithm for the stated purposes can be formulated as testing several splits into an active training set, a passive training set, a calibration set, and a validation set. The split where the best results were achieved for the calibration set is remembered, in the hope that good results can be expected also for the external validation set, not used in the process of building the model.
The aims of this study are (i) to check whether CCCP improves the predictive potential of the model; and (ii) to check whether the Las Vegas algorithm can provide satisfactory splits of the available data into training and validation sets. These computational experiments were carried out with the CORAL software-2024 (http://www.insilico.eu/coral, accessed on 2 June 2025).
2. Materials and Methods
2.1. Data
In this study, two databases on the biological activity of peptides are considered. Database 1: data on the antioxidant activity of 214 tripeptides taken from [2]. The ACE-inhibitory activity of these peptides was measured in vitro as IC50, the inhibitory concentration of the peptide that reduces ACE activity by 50% [2]. Here, the negative logarithm of IC50 (pIC50) is considered. Database 2: data on the inhibitory activity of 268 peptides in goat milk taken from [1]. In this case, two duplicates (identical sequences of amino acids) were found and removed from consideration [30]. All peptide activities were expressed as IC50 values, which represent the peptide concentration (in μM) required to block ACE activity by 50%. The negative logarithm of these values was used as the endpoint (pIC50).
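The endpoint transformation can be illustrated with a short sketch. It assumes a base-10 logarithm applied to the IC50 value in the unit in which it is reported; the function name `pic50` is hypothetical. Converting μM to mol/L beforehand would only shift every value by the constant 6.

```python
import math

def pic50(ic50):
    """Negative base-10 logarithm of an IC50 value (assumed convention).

    The value is used in the unit in which it is reported; no unit
    conversion is performed here.
    """
    if ic50 <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50)

print(pic50(100.0))
```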
The data were divided into these sets: active (≈25%) and passive training (≈25%), calibration (≈25%), and validation (≈25%). This distribution was performed using the Las Vegas algorithm, which boils down to the following steps. Ten random divisions were made into the specified sets to construct models for the active and passive training sets and the calibration set. The model with the best statistics for the calibration set was selected. The validation set was not used in the modelling phase.
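The splitting procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the CORAL implementation: the callback `build_and_score` is a hypothetical stand-in for building a model on the active training, passive training, and calibration sets and returning a calibration-set statistic, and equal quarters are used as an approximation of the ≈25% shares.

```python
import random

def split_four(items, rng):
    """Shuffle the items and cut them into four roughly equal parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    q = len(shuffled) // 4
    return {
        "active": shuffled[:q],
        "passive": shuffled[q:2 * q],
        "calibration": shuffled[2 * q:3 * q],
        "validation": shuffled[3 * q:],  # never passed to the scorer
    }

def las_vegas_split(items, build_and_score, n_trials=10, seed=0):
    """Try several random splits and keep the one with the best calibration score."""
    rng = random.Random(seed)
    best_split, best_score = None, float("-inf")
    for _ in range(n_trials):
        split = split_four(items, rng)
        score = build_and_score(
            split["active"], split["passive"], split["calibration"]
        )
        if score > best_score:
            best_split, best_score = split, score
    return best_split, best_score
```

The validation set is only stored in the returned split, so it remains external to model building, as required by the procedure described above.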
2.2. Scheme for Model Construction
The formulas for calculating the optimal descriptors from a list of amino acids (more precisely, from the list of so-called correlation weights of the corresponding amino acids) are as follows:
Ak denotes one amino acid; AAk denotes a pair of amino acids that are neighbors in the amino acid sequence of a peptide. T and N are parameters of the Monte Carlo optimization: T is the threshold that defines rare amino acids, which are not involved in the optimization and have correlation weights equal to zero; N is the number of epochs of the optimization. An epoch is one cycle of modifications of all non-rare amino acids. The numerical values (T = 3 and N = 15) were selected in preliminary computational experiments.
Symmetry is the number of fragments of the form XYX in the amino acid sequence, where X and Y are the one-symbol abbreviations of any amino acids. For instance, the peptide LWE has a symmetry equal to zero (denoted as ‘xyx0’), whereas LWL has a symmetry equal to one (denoted as ‘xyx1’).
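The definitions above can be made concrete with a small sketch. It assumes that a descriptor is the sum of the correlation weights of the single amino acids (Ak) and of the neighboring pairs (AAk), with one extra weight for the symmetry code in the DS variant; this additive form, the function names, and the rule "a feature is rare if it occurs in fewer than T training peptides" are assumptions made for illustration, not a transcription of Equations (1) and (2).

```python
from collections import Counter

def count_xyx(seq):
    """Number of positions i where seq[i] == seq[i + 2] (fragments of type XYX)."""
    return sum(1 for i in range(len(seq) - 2) if seq[i] == seq[i + 2])

def features(seq, use_symmetry=False):
    """Single amino acids, neighboring pairs, and optionally the symmetry code."""
    feats = list(seq)                                      # Ak
    feats += [seq[i:i + 2] for i in range(len(seq) - 1)]   # AAk
    if use_symmetry:
        feats.append(f"xyx{count_xyx(seq)}")               # e.g. 'xyx0', 'xyx1'
    return feats

def rare_features(training_seqs, threshold, use_symmetry=False):
    """Features found in fewer than `threshold` training peptides (assumed rule)."""
    freq = Counter()
    for seq in training_seqs:
        freq.update(set(features(seq, use_symmetry)))
    return {f for f, n in freq.items() if n < threshold}

def descriptor(seq, weights, rare=frozenset(), use_symmetry=False):
    """Sum of correlation weights; rare or unknown features contribute zero."""
    return sum(
        0.0 if f in rare else weights.get(f, 0.0)
        for f in features(seq, use_symmetry)
    )

weights = {"L": 0.5, "W": 1.0, "LW": 0.25, "xyx1": -0.5}  # illustrative values only
print(count_xyx("LWE"), count_xyx("LWL"))
print(descriptor("LWL", weights), descriptor("LWL", weights, use_symmetry=True))
```

With these illustrative weights the two descriptor variants of LWL differ only by the weight attached to ‘xyx1’.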
The values of the correlation weights were found by Monte Carlo optimization using different mathematical functions. Naturally, the numerical values of the correlation weights for different functions differed significantly, which in turn led to different statistical quality of the corresponding models.
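A minimal sketch of such an optimization is given below. It is a generic random-perturbation scheme under stated assumptions, not the procedure implemented in CORAL: the starting weight of 1.0, the step size, and the acceptance rule (keep a change only if the target function increases) are illustrative choices, and `target` is a hypothetical callable that maps a dictionary of weights to the value of the chosen target function.

```python
import random

def monte_carlo_optimize(feature_names, target, n_epochs=15, step=0.5, seed=0):
    """Perturb one correlation weight at a time and keep only improvements."""
    rng = random.Random(seed)
    weights = {name: 1.0 for name in feature_names}  # assumed starting point
    best = target(weights)
    for _ in range(n_epochs):          # N epochs
        for name in feature_names:     # one epoch visits every non-rare feature once
            old = weights[name]
            weights[name] = old + rng.uniform(-step, step)
            score = target(weights)
            if score > best:
                best = score           # accept the modification
            else:
                weights[name] = old    # reject and restore the previous weight
    return weights, best
```

Because different target functions reward different properties of the model, the same loop generally ends at different weights for each of them.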
Equations (1) and (2) require numerical values of the above correlation weights, which are calculated by the Monte Carlo optimization. Here, three target functions for the Monte Carlo optimization are examined (TF1, TF2, and TF3):
The two correlation coefficients entering these functions are those between the observed and predicted endpoint for the active training and passive training sets, respectively. The statistical criteria applied in Equations (4)–(6) are described in the literature [31]. These are the index of ideality of correlation (IIC), the correlation intensity index (CII), and the conformism coefficient of correlative prediction (CCCP).
The IIC for the calibration set (IIC_C) is calculated as follows:
Here, “observed” and “calculated” are the corresponding values of the endpoint.
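For orientation, the IIC can be sketched in code. The sketch assumes the form commonly given in the literature, namely the correlation coefficient multiplied by the ratio of the smaller to the larger of two mean absolute errors, computed separately for negative and non-negative differences between observed and calculated values; this form and the function names are assumptions and should be checked against the cited source [31].

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def iic(observed, calculated):
    """Index of ideality of correlation (assumed form, see lead-in)."""
    deltas = [o - c for o, c in zip(observed, calculated)]
    neg = [abs(d) for d in deltas if d < 0]
    pos = [abs(d) for d in deltas if d >= 0]
    mae_neg = sum(neg) / len(neg) if neg else 0.0
    mae_pos = sum(pos) / len(pos) if pos else 0.0
    largest = max(mae_neg, mae_pos)
    r = pearson_r(observed, calculated)
    if largest == 0.0:
        return r  # all errors are zero, so there is no asymmetry to penalize
    return r * min(mae_neg, mae_pos) / largest
```

Under this form, errors that are balanced between over- and under-prediction leave the correlation coefficient unchanged, whereas a systematic shift in one direction drives the index towards zero.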
The CII was developed as a tool to improve the quality of the Monte Carlo optimization aimed to build up QSPR/QSAR models.
The CII is calculated as follows:
R2 is the correlation coefficient for a set that contains n substances. R2k is the correlation coefficient for n − 1 substances of a set after removing the k-th substance. Hence, if the ∆ = (R2k − R2) is positive, the k-th substance is an “oppositionist” for the correlation between experimental and predicted values of the set. A small sum of “protests” means a better correlation.
However, in addition to the above-mentioned “oppositionists” of the correlation, there are also its “supporters”; in this case, ∆ = (R2k − R2) is negative. The comparison of correlation coefficients separately for “supporters” and “oppositionists” of the correlation is an informative criterion for the Monte Carlo optimization, similarly to IIC and CII.
The suggested ratio is able to show the measure of conformism of the “oppositionists” and “supporters”. Thus, one can name this criterion the conformism coefficient of correlative prediction (CCCP).
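The leave-one-out logic behind “oppositionists” and “supporters” can be sketched as follows. The sketch assumes that CII equals one minus the sum of the positive differences ∆ ("protests"), which is consistent with the statement that a small sum of protests means a better correlation; the CCCP ratio itself is not computed because its formula is not reproduced in the text. Function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def loo_deltas(observed, predicted):
    """Delta_k = R2 without the k-th substance minus R2 of the full set."""
    r2_all = r_squared(observed, predicted)
    deltas = []
    for k in range(len(observed)):
        obs_k = observed[:k] + observed[k + 1:]
        pred_k = predicted[:k] + predicted[k + 1:]
        deltas.append(r_squared(obs_k, pred_k) - r2_all)
    return deltas

def classify(observed, predicted):
    """Indices of oppositionists (delta > 0) and supporters (delta < 0)."""
    deltas = loo_deltas(observed, predicted)
    oppositionists = [k for k, d in enumerate(deltas) if d > 0]
    supporters = [k for k, d in enumerate(deltas) if d < 0]
    return oppositionists, supporters

def cii(observed, predicted):
    """Correlation intensity index as 1 - sum of protests (assumed form)."""
    return 1.0 - sum(d for d in loo_deltas(observed, predicted) if d > 0)

observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 2.0, 2.9, 4.2, 2.0]   # the last prediction is a clear outlier
print(classify(observed, predicted), cii(observed, predicted))
```

In this toy example, removing the outlying last substance raises the correlation, so it is an oppositionist, while removing the first substance lowers it, so it is a supporter.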
2.3. Applicability Domain
The applicability domain is determined based on the so-called statistical defects of the amino acids present in a given sequence of amino acids. Thus, the basis for assessing the suitability of a model for each peptide as a sequence of amino acids is the prevalence of its constituent amino acids extracted from the sequence. The rarer a sequence of amino acids is in the training sets, the greater its statistical defect; as a result, it is less likely to obtain a reliable model prediction. If the statistical defect of peptide (sequence of amino acids) is greater than twice the average statistical defect in the training sets, this peptide is considered a potential outlier. The statistical defect of amino acid is calculated as
where P(Ak), P′(Ak), and P″(Ak) are the probabilities of Ak in the active training, passive training, and calibration sets, respectively; N(Ak), N′(Ak), and N″(Ak) are the frequencies of Ak in the active training, passive training, and calibration sets, respectively. The statistical defects of peptides (Dj) are calculated as:
where NA is the number of non-blocked amino acids in peptide.
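The j-th peptide falls within the applicability domain if its statistical defect Dj does not exceed twice the average statistical defect over the training sets.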
Compared with other methods of evaluating similarity and distance (e.g., the Euclidean distance), this approach focuses on specific molecular parts: finding a single rare sequence of amino acids is enough to flag the substance, whereas the Euclidean distance balances the contributions of all components of the molecule.
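The applicability-domain check can be sketched as follows. Several details are assumptions made for illustration: the defect of an amino acid is taken as the sum, over the three pairs of sets, of the absolute difference of its probabilities divided by the sum of its frequencies; the frequency is the number of peptides in a set that contain the amino acid; the probability is that frequency divided by the set size; and pairs of sets in which the amino acid never occurs are skipped. All function names are hypothetical.

```python
from collections import Counter

def feature_counts(peptides):
    """Number of peptides in a set that contain each amino acid."""
    counts = Counter()
    for seq in peptides:
        counts.update(set(seq))
    return counts

def amino_acid_defects(active, passive, calibration):
    """Statistical defect of each amino acid (assumed pairwise form)."""
    sets = [active, passive, calibration]
    counts = [feature_counts(s) for s in sets]
    probs = [{a: n / len(s) for a, n in c.items()} for s, c in zip(sets, counts)]
    defects = {}
    for a in set().union(*counts):
        d = 0.0
        for i, j in ((0, 1), (0, 2), (1, 2)):
            n_sum = counts[i][a] + counts[j][a]
            if n_sum > 0:  # skip pairs of sets where the amino acid is absent
                d += abs(probs[i].get(a, 0.0) - probs[j].get(a, 0.0)) / n_sum
        defects[a] = d
    return defects

def peptide_defect(seq, defects):
    """Sum of amino-acid defects; amino acids never seen contribute zero here."""
    return sum(defects.get(a, 0.0) for a in seq)

def in_domain(seq, defects, training_peptides):
    """True if the defect does not exceed twice the average training defect."""
    mean_defect = sum(
        peptide_defect(s, defects) for s in training_peptides
    ) / len(training_peptides)
    return peptide_defect(seq, defects) <= 2.0 * mean_defect
```

A practical implementation would also need a rule for amino acids that never occur in the training data, which this sketch silently treats as defect-free.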
2.4. Mechanistic Interpretation
A mechanistic interpretation of the model can be obtained by carrying out several Monte Carlo optimization runs under the same conditions (or using the same CORAL method; http://www.insilico.eu/coral, accessed on 2 June 2025). In this case, the following types of amino acids are obtained. Type one: all correlation weights are positive; type two: all correlation weights are negative. The first type can be interpreted as amino acids whose presence leads to an increase in the value of the endpoint under study. Conversely, the second type can be interpreted as amino acids whose presence leads to a decrease in the endpoint under study. In such computational experiments, there will also be amino acids whose correlation weights change sign from run to run. These cannot be used as indicators of an increase or decrease in the endpoint under study, and thus they are not useful for modelling purposes.
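The classification described in this section can be expressed as a short sketch. The input format (one dictionary of correlation weights per optimization run) and the labels are hypothetical choices made for illustration.

```python
def interpret(weights_by_run):
    """Label each feature by the sign pattern of its weights over several runs."""
    labels = {}
    shared = set.intersection(*(set(run) for run in weights_by_run))
    for feature in sorted(shared):
        values = [run[feature] for run in weights_by_run]
        if all(v > 0 for v in values):
            labels[feature] = "increase"    # type one: promoter of the endpoint
        elif all(v < 0 for v in values):
            labels[feature] = "decrease"    # type two: reduces the endpoint
        else:
            labels[feature] = "undefined"   # sign alternates (or weight is zero)
    return labels

runs = [
    {"Y": 0.5, "G": -0.2, "A": 0.1},   # illustrative weights, not model output
    {"Y": 0.8, "G": -0.4, "A": -0.3},
    {"Y": 0.3, "G": -0.1, "A": 0.2},
]
print(interpret(runs))
```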
2.5. Model
The models built with the CORAL software-2024 (http://www.insilico.eu/coral, accessed on 2 June 2025) have the following form:
pIC50 = C0 + C1 × D(T,N)
where C0 and C1 are regression coefficients and D(T,N) is the descriptor D0(T,N) or DS(T,N), calculated with Equation (1) or (2).
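Since the model has two regression coefficients and a single descriptor, it can be illustrated as a one-variable least-squares line. The sketch below assumes ordinary least squares for C0 and C1; the function names and the toy numbers are hypothetical.

```python
def fit_line(descriptors, endpoints):
    """Ordinary least-squares estimates of C0 (intercept) and C1 (slope)."""
    n = len(descriptors)
    mean_d = sum(descriptors) / n
    mean_y = sum(endpoints) / n
    sxx = sum((d - mean_d) ** 2 for d in descriptors)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in zip(descriptors, endpoints))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_d
    return c0, c1

def predict(descriptor_value, c0, c1):
    """Endpoint predicted from a descriptor value D(T,N)."""
    return c0 + c1 * descriptor_value

c0, c1 = fit_line([0.0, 1.0, 2.0, 3.0], [2.0, 5.0, 8.0, 11.0])  # toy data
print(c0, c1, predict(4.0, c0, c1))
```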
3. Results
The CORAL model provides different kinds of results, depending on the splits and the target functions. First, CORAL gives predicted values, so that the split with the best correlation for the validation set can be identified. Second, it provides the average value and dispersion of the determination coefficient for the validation set over five splits into active training, passive training, calibration, and validation sets. The statistical characteristics are recorded for the five splits obtained by the Las Vegas algorithm with the target function TF0, for descriptors D0(3,15) and DS(3,15) combined with the target functions TF1, TF2, and TF3. The models differ in whether the descriptor uses the information on symmetry (DS) or not (D0), while the different target functions introduce mathematical components that may improve the statistics. Thus, CORAL provides results that depend on the algorithm and, for each algorithm, on the split.
The most interesting results are those for the validation set, which contains substances not used to build the model. These statistics are representative of the expected performance when the model is applied to “new” substances. The other important statistics are those observed for the calibration set, which characterize the final model once it is optimized. The statistical values for the calibration set should be close to those for the validation set. The statistics for the active and passive training sets refer to preliminary steps in the modelling process and are therefore useful mainly for monitoring its progress.
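The average and dispersion over five splits can be checked with a few lines of code. The sketch uses the validation-set determination coefficients listed in Table 1 and assumes that the reported dispersion is the population standard deviation, an assumption that reproduces the value in the caption of Table 1.

```python
from statistics import mean, pstdev

# Validation-set R2 for splits 1-5, copied from Table 1
r2_validation = [0.7763, 0.6082, 0.6400, 0.6230, 0.7162]

average = mean(r2_validation)
dispersion = pstdev(r2_validation)  # population standard deviation (assumed)

print(f"{average:.3f} ± {dispersion:.3f}")  # prints 0.673 ± 0.064
```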
3.1. Database 1
According to the computational experiments, the stable promoters of increased antioxidant activity are arginine (R), tyrosine (Y), leucine (L), and proline (P), as well as the absence of fragments of local symmetry (xyx0). Amino acids with low prevalence can be characterized by correlation weights that change sign across several runs of the optimization.
3.2. Database 2
According to the computational experiments, the stable promoters of increased inhibitory activity of peptides are tyrosine (Y), valine (V), isoleucine (I), and lysine (K), as well as the absence of fragments of local symmetry (xyx0). Amino acids with low prevalence can be characterized by correlation weights that change sign across several runs of the optimization.
4. Discussion
Table 13 reports the statistical quality of the models for dataset 1 on the validation set. The best results are observed for the descriptors that take the xyx symmetry into account. Furthermore, the target function TF3, which uses the CCCP, turned out to be the most promising for achieving the best predictive potential, and this is observed both with and without the symmetry descriptors. TF2 also shows a larger standard deviation, which indicates a noisy model.
Table 14 reports the statistical quality of the models for dataset 2 on the validation set. Again, the best results are observed for the descriptors that take the symmetry (xyx) into account, and the target function TF3, which uses the CCCP, turned out to be the most promising for achieving the best predictive potential.
Despite the apparent simplicity of the considered approach, it has noticeable theoretical and practical merits. Firstly, the information capacity of the proposed system for accounting for the symmetry of peptides, where equivalent positions are considered in terms of the presence of different amino acids, can clearly be expanded in the future. Secondly, once the most appropriate descriptors are identified, including the symmetry, the proposed criteria, such as IIC, CII, and CCCP, can subsequently improve the modelling process as tools for controlling the Monte Carlo optimization. Finally, databases in which peptides are represented by sequences of amino acids can prove competitive with traditional databases in both economic and heuristic respects. Thus, the possibility of applying stochastic methods to develop models of endpoints related to peptides is demonstrated.
Table 15 compares the present models with analogous models for peptide endpoints from the literature. Our models provide better results than those reported in the literature.
Nevertheless, we underline that peptides, as well as their properties and effects, may be very different; thus, these studies are only partial. With increasing length (number of amino acids), new effects of amino acid collectives on the properties under study may arise. It is natural to expect a broader range of effects with increasing complexity of the peptides. However, it is difficult to say when the transition from quantity to quality will occur. To properly address these aspects, new data and new studies with new approaches are necessary. The characterization of the peptides should be improved; one possibility is to use information on their size and overall polarity.
Based on the available results, the approach under consideration is characterized by satisfactory predictive potential, at least for a preliminary assessment of various aspects of peptide behavior. The symmetry ‘xyx’-type can be supplemented by additional symmetry types, such as ‘xyyx’, ‘xyzyx’, and possibly more complex ones. In addition, combinations of the considered criteria of predictive potential (for example, IIC and CII or IIC and CCCP) can be effective.
IIC and CII improve the predictive potential of the model for the calibration set, but to the detriment of the active and passive training sets [29]. Apparently, CCCP has the same effect. This is because the initial steps of the model, as represented by the results for the active and passive training sets, may be driven by some uncommon peptides; thus, the initial model is quite local, whereas in the following phase, performed with the calibration set, the general features present in the larger population prevail. Thus, the statistics of the calibration set are much more informative.
5. Conclusions
The antioxidant activity and ACE-inhibitory activity of peptides can be simulated using the Monte Carlo technique, achieving good statistical performance. The conception of xyx symmetry in peptides improves the predictive potential of the QSAR model. Comparison of the criteria of the predictive potential, such as IIC, CII, and CCCP, shows some advantages of CCCP. This study demonstrates that it is possible to use a quite simplified format for the description of the peptides, without using time-consuming and more sophisticated approaches, to address the activity of peptides. The CORAL software can successfully handle both the chemical format and the related algorithms. These studies are useful both from a methodological point of view, indicating the most favourable conditions and algorithms, and from a practical point of view, introducing models that can be used to explore properties of peptides. The obtained models compare favourably with other models previously published, showing clear improvements. The approach presented here can be useful in the search for new drugs and the selection of peptides for various practical applications.
Author Contributions
Conceptualization, A.P.T., A.A.T., A.R. and E.B.; data curation, A.P.T., A.A.T., A.R. and E.B.; writing—original draft preparation, A.P.T., A.A.T., A.R. and E.B.; writing—review and editing, A.P.T., A.A.T., A.R. and E.B.; supervision, A.R. and E.B.; project administration, E.B. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by EFSA within the project sOFT-ERA, OC/EFSA/IDATA/2022/02.
Data Availability Statement
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations and symbols are used in this manuscript:
Abbreviation | Meaning |
QSPR/QSAR | Quantitative structure–property/activity relationships |
A | Active training set |
P | Passive training set |
C | Calibration set |
V | Validation set |
CCCP | Coefficient of conformism of a correlative prediction |
IIC | Index of ideality of correlation |
CII | Correlation intensity index |
FLS | Fragments of local symmetry |
CCC | Concordance correlation coefficient |
R2 | Determination coefficient |
Q2 | Cross-validated R2 |
MAE | Mean absolute error |
F | Fisher F-ratio |
Nact | The number of features under optimization. |
References
1. Du, A.; Jia, W. Bioaccessibility of novel antihypertensive short-chain peptides in goat milk using the INFOGEST static digestion model by effect-directed assays. Food Chem. 2023, 427, 136735.
2. Wang, J.-H.; Liu, Y.-L.; Ning, J.-H.; Yu, J.; Li, X.-H.; Wang, F.-X. Is the structural diversity of tripeptides sufficient for developing functional food additives with satisfactory multiple bioactivities? J. Mol. Struct. 2013, 1040, 164–170.
3. Schüß, C.; Vu, O.; Mishra, N.M.; Tough, I.R.; Du, Y.; Stichel, J.; Cox, H.M.; Weaver, C.D.; Meiler, J.; Emmitte, K.A.; et al. Structure-Activity Relationship Study of the High-Affinity Neuropeptide Y4 Receptor Positive Allosteric Modulator VU0506013. J. Med. Chem. 2023, 66, 8745–8766.
4. Wang, M.; Li, X.; Chen, M.; Wu, X.; Mi, Y.; Kai, Z.; Yang, X. 3D-QSAR based optimization of insect neuropeptide allatostatin analogs. Bioorg. Med. Chem. Lett. 2019, 29, 890–895.
5. Granstein, R.D.; Wagner, J.A.; Stohl, L.L.; Ding, W. Calcitonin gene-related peptide: Key regulator of cutaneous immunity. Acta Physiol. 2015, 213, 586–594.
6. Ren, M.; Wang, Y.; Zheng, X.; Yang, W.; Liu, M.; Xie, S.; Yao, Y.; Yan, J.; He, W. Hydrogelation of peptides and carnosic acid as regulators of adaptive immunity against postoperative recurrence of cutaneous melanoma. J. Control. Release 2024, 375, 654–666.
7. Besman, M.; Zambrowicz, A.; Matwiejczyk, M. Review of Thymic Peptides and Hormones: From Their Properties to Clinical Application. Int. J. Pept. Res. Ther. 2025, 31, 10.
8. He, Y.; He, X. Molecular design and genetic optimization of antimicrobial peptides containing unnatural amino acids against antibiotic-resistant bacterial infections. Biopolymers 2016, 106, 746–756.
9. Zhou, P.; Liu, Q.; Wu, T.; Miao, Q.; Shang, S.; Wang, H.; Chen, Z.; Wang, S.; Wang, H. Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling. J. Chem. Inf. Model. 2021, 61, 1718–1731.
10. Kashung, P.; Karuthapandian, D. Milk-derived bioactive peptides. Food Prod. Process. Nutr. 2025, 7, 6.
11. Li, J.; Zong, K.; Wei, C.; Zhong, Q.; Yan, H.; Wang, J.; Li, X. Design, synthesis, and biological activity of human glutaminyl cyclase inhibitors against Alzheimer’s disease. Bioorg. Med. Chem. 2025, 120, 118105.
12. Yin, K.; Li, R.; Zhang, S.; Sun, Y.; Huang, L.; Jiang, M.; Xu, D.; Xu, W. Deep learning combined with quantitative structure-activity relationship accelerates de novo design of antifungal peptides. Adv. Sci. 2025, 12, 2412488.
13. Chen, Q.; Ge, Y.; He, X.; Li, S.; Fang, Z.; Li, C.; Chen, H. Virtual-screening of xanthine oxidase inhibitory peptides: Inhibition mechanisms and prediction of activity using machine-learning. Food Chem. 2024, 460, 140741.
14. Khalaf, W.S.; Morgan, R.N.; Elkhatib, W.F. Clinical microbiology and artificial intelligence: Different applications, challenges, and future prospects. J. Microbiol. Methods 2025, 232–234, 107125.
15. Tran, T.T.N.; Nguyen, H.T.D.; Nguyen, V.C. A Machine Learning-Driven 3D-QSAR Approach for developing antioxidant preservatives from bovine hemoglobin and tryptophyllin l for meat products. Pept. Sci. 2025, 117, e70004.
16. Rane, R.; Satpute, B.; Kumar, D.; Suryawanshi, M.; Prabhune, A.G.; Gawade, B.; Mahajan, A.; Pawar, A.; Sakat, S. Mutagenic and genotoxic in silico QSAR prediction of dimer impurity of gliflozins; canagliflozin, dapaglifozin, and emphagliflozin and in vitro evaluation by Ames and micronucleus test. Drug Chem. Toxicol. 2025, 48, 416–425.
17. Ye, X.; Yang, R.; Yang, Z.; Huang, B.; Riaz, T.; Zhao, C.; Chen, J. Novel angiotensin-I-converting enzyme (ACE) inhibitory peptides from Porphyra haitanensis: Screening, digestion stability, and mechanistic insights. Food Biosci. 2025, 68, 106460.
18. Cournoyer, A.; Bernier, M.-È.; Aboubacar, H.; de Toro-Martín, J.; Vohl, M.-C.; Ravallec, R.; Cudennec, B.; Bazinet, L. Machine learning-driven discovery of bioactive peptides from duckweed (Lemnaceae) protein hydrolysates: Identification and experimental validation of 20 novel antihypertensive, antidiabetic, and/or antioxidant peptides. Food Chem. 2025, 482, 144029.
19. Garro, L.A.; Andrada, M.F.; Vega-Hissi, E.G.; Barberis, S.; Garro Martinez, J.C. Development of QSARs for cysteine-containing di- and tripeptides with antioxidant activity: influence of the cysteine position. J. Comput. Aided Mol. Des. 2024, 38, 27.
20. van der Walt, M.; Möller, D.S.; van Wyk, R.J.; Ferguson, P.M.; Hind, C.K.; Clifford, M.; Do Carmo Silva, P.; Sutton, J.M.; Mason, A.J.; Bester, M.J.; et al. QSAR reveals decreased lipophilicity of polar residues determines the selectivity of antimicrobial peptide activity. ACS Omega 2024, 9, 26030–26049.
21. Wang, B.; Zhang, H.; Wen, Y.; Yuan, W.; Chen, H.; Lin, L.; Guo, F.; Zheng, Z.-P.; Zhao, C. The novel angiotensin-I-converting enzyme inhibitory peptides from Scomber japonicus muscle protein hydrolysates: QSAR-based screening, molecular docking, kinetic and stability studies. Food Chem. 2024, 447, 138873.
22. Martínez-Mauricio, K.L.; García-Jacas, C.R.; Cordoves-Delgado, G. Examining evolutionary scale modeling-derived different-dimensional embeddings in the antimicrobial peptide classification through a KNIME workflow 2. Protein Sci. 2024, 33, e4928.
23. Mahmoodi-Reihani, M.; Abbasitabar, F.; Zare-Shahabadi, V. In Silico Rational Design and Virtual Screening of Bioactive Peptides Based on QSAR Modeling. ACS Omega 2020, 5, 5951–5958.
24. Toropova, A.P.; Raškova, M.; Raška, I., Jr.; Toropov, A.A. The sequence of amino acids as the basis for the model of biological activity of peptides. Theor. Chem. Acc. 2021, 140, 15.
25. Moinul, M.; Khatun, S.; Abdul Amin, S.; Jha, T.; Gayen, S. Quasi-SMILES as a tool for peptide QSAR modelling. In QSPR/QSAR Analysis Using SMILES and Quasi-SMILES. Challenges and Advances in Computational Chemistry and Physics; Toropova, A.P., Toropov, A.A., Eds.; Springer: Cham, Switzerland, 2023; Volume 33, pp. 269–294.
26. Toropova, A.P.; Toropov, A.A.; Kumar, P.; Kumar, A.; Achary, P.G.R. Fragments of local symmetry in a sequence of amino acids: Does one can use for QSPR/QSAR of peptides? J. Mol. Struct. 2023, 1293, 136300.
27. Toropova, M.A.; Veselinović, A.M.; Veselinović, J.B.; Stojanović, D.B.; Toropov, A.A. QSAR modeling of the antimicrobial activity of peptides as a mathematical function of a sequence of amino acids. Comput. Biol. Chem. 2015, 59, 126–130.
28. Toropov, A.A.; Toropova, A.P.; Raska, I., Jr.; Benfenati, E.; Gini, G. QSAR modeling of endpoints for peptides which is based on representation of the molecular structure by a sequence of amino acids. Struct. Chem. 2012, 23, 1891–1904.
29. Toropova, A.P.; Toropov, A.A. The coefficient of conformism of a correlative prediction (CCCP): Building up reliable nano-QSPRs/QSARs for endpoints of nanoparticles in different experimental conditions encoded via quasi-SMILES. Sci. Total Environ. 2024, 927, 172119.
30. Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E. Quantitative structure–activity relationship models for the angiotensin-converting enzyme inhibitory activities of short-chain peptides of goat milk using quasi-SMILES. Macromol 2024, 4, 387–400.
31. Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E. In Silico Simulation of Daphnia magna Immobilization Exposed to Mixtures of TiO2 Nanoparticles with Inorganic Compounds. J. Compos. Sci. 2025, 9, 16.
32. Guendouzi, A.; Belkhiri, L.; Guendouzi, A.; Derouiche, T.M.T.; Djekoun, A. A combined in silico approaches of 2D-QSAR, molecular docking, molecular dynamics and ADMET prediction of anti-cancer inhibitor activity for actinonin derivatives. J. Biomol. Struct. Dyn. 2024, 42, 119–133.
33. Wang, F.; Wen, M.; Zhou, B. Exploring details about structure requirements based on antioxidant tripeptide derived from β-Lactoglobulin by in silico approaches. Amino Acids 2023, 55, 1909–1922.
Table 1.
The statistical quality of models applying descriptor D0(3,15) and target function TF1. The average determination coefficient on the validation set is 0.673 ± 0.064; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|
1 | A | 53 | 0.4186 | 0.5902 | 0.5777 | 0.7461 | 0.3780 | 0.1623 | 1.72 | 37 |
| P | 53 | 0.4442 | 0.5730 | 0.4105 | 0.7869 | 0.3897 | 0.3276 | 1.92 | 41 |
| C | 54 | 0.7315 | 0.8492 | 0.8518 | 0.8648 | 0.7128 | 0.5358 | 0.693 | 142 |
| V | 54 | 0.7763 | - | - | - | - | - | 0.79 | - |
2 | A | 54 | 0.4898 | 0.6575 | 0.4812 | 0.7625 | 0.4548 | 0.0508 | 1.74 | 50 |
| P | 53 | 0.4412 | 0.6126 | 0.4035 | 0.7716 | 0.3974 | 0.2435 | 1.74 | 40 |
| C | 54 | 0.7196 | 0.8441 | 0.8482 | 0.8129 | 0.6993 | 0.3364 | 0.737 | 133 |
| V | 53 | 0.6082 | - | - | - | - | - | 0.95 | - |
3 | A | 54 | 0.4616 | 0.6317 | 0.4324 | 0.7739 | 0.4223 | 0.2087 | 1.90 | 45 |
| P | 54 | 0.4364 | 0.6185 | 0.5263 | 0.7452 | 0.3852 | −0.2171 | 1.77 | 40 |
| C | 53 | 0.7799 | 0.8196 | 0.8821 | 0.8488 | 0.7628 | 0.2735 | 0.886 | 181 |
| V | 53 | 0.6400 | - | - | - | - | - | 0.94 | - |
4 | A | 54 | 0.4315 | 0.6029 | 0.4180 | 0.7942 | 0.3853 | 0.3020 | 1.87 | 39 |
| P | 54 | 0.4122 | 0.5763 | 0.6063 | 0.7196 | 0.3669 | −0.1321 | 1.71 | 36 |
| C | 53 | 0.5735 | 0.6745 | 0.7558 | 0.7556 | 0.5471 | 0.1111 | 1.07 | 69 |
| V | 53 | 0.6230 | - | - | - | - | - | 1.07 | - |
5 | A | 54 | 0.4341 | 0.6054 | 0.5680 | 0.7471 | 0.3884 | 0.1746 | 1.84 | 40 |
| P | 53 | 0.4817 | 0.6049 | 0.3838 | 0.7440 | 0.4447 | 0.1200 | 1.72 | 47 |
| C | 54 | 0.6904 | 0.8284 | 0.8305 | 0.8220 | 0.6695 | 0.4316 | 0.790 | 116 |
| V | 53 | 0.7162 | - | - | - | - | - | 0.77 | - |
Table 2.
The statistical quality of models applying descriptor D0(3,15) and target function TF2. The average determination coefficient on the validation set is 0.555 ± 0.144; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|
1 | A | 53 | 0.5003 | 0.6670 | 0.6316 | 0.7785 | 0.4580 | 0.3387 | 1.59 | 51 |
| P | 53 | 0.5515 | 0.6486 | 0.5520 | 0.7748 | 0.5064 | 0.3229 | 1.72 | 63 |
| C | 54 | 0.4318 | 0.5940 | 0.5550 | 0.7841 | 0.3817 | 0.1768 | 1.34 | 40 |
| V | 54 | 0.7462 | - | - | - | - | - | 0.87 | - |
2 | A | 54 | 0.5504 | 0.7100 | 0.5505 | 0.7545 | 0.5181 | −0.0525 | 1.63 | 64 |
| P | 53 | 0.5508 | 0.7292 | 0.6894 | 0.8061 | 0.5062 | 0.2358 | 1.58 | 63 |
| C | 54 | 0.3083 | 0.5027 | 0.5104 | 0.7380 | 0.2292 | −0.0362 | 1.70 | 23 |
| V | 53 | 0.3066 | - | - | - | - | - | 1.83 | - |
3 | A | 54 | 0.5267 | 0.6900 | 0.5385 | 0.7851 | 0.4872 | 0.2728 | 1.78 | 58 |
| P | 54 | 0.5380 | 0.6922 | 0.4572 | 0.7556 | 0.4943 | 0.0738 | 1.58 | 61 |
| C | 53 | 0.6450 | 0.7509 | 0.7245 | 0.7816 | 0.6016 | 0.3030 | 1.07 | 93 |
| V | 53 | 0.5316 | - | - | - | - | - | 1.17 | - |
4 | A | 54 | 0.5285 | 0.6915 | 0.5394 | 0.7903 | 0.4851 | 0.3570 | 1.70 | 58 |
| P | 54 | 0.4467 | 0.6109 | 0.5755 | 0.7217 | 0.4034 | −0.0215 | 1.66 | 42 |
| C | 53 | 0.6245 | 0.7036 | 0.5481 | 0.8039 | 0.5986 | 0.3578 | 1.10 | 85 |
| V | 53 | 0.6251 | - | - | - | - | - | 1.15 | - |
5 | A | 54 | 0.5598 | 0.7178 | 0.5986 | 0.7712 | 0.5259 | 0.2167 | 1.62 | 66 |
| P | 53 | 0.5854 | 0.6533 | 0.5561 | 0.7814 | 0.5521 | 0.3261 | 1.59 | 72 |
| C | 54 | 0.4628 | 0.6550 | 0.6001 | 0.7497 | 0.4248 | −0.0275 | 1.16 | 45 |
| V | 53 | 0.5697 | - | - | - | - | - | 1.02 | - |
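
The averages quoted for each table can be checked directly from the validation-set (V) rows. As a minimal sketch, the five validation R2 values of Table 2 are averaged below. Treating the "±" term as the population standard deviation is an assumption inferred from the reported figures; it is not stated in the tables.

```python
import statistics

# Validation-set (V) R2 values for splits 1-5, copied from Table 2.
r2_validation = [0.7462, 0.3066, 0.5316, 0.6251, 0.5697]

mean_r2 = statistics.mean(r2_validation)
# Assumption: the "±" term is the population standard deviation (divide by n).
sd_r2 = statistics.pstdev(r2_validation)

print(f"{mean_r2:.3f} ± {sd_r2:.3f}")
```

Under that assumption the result rounds to 0.556 ± 0.144, which is the value listed for D0(3,15) with TF2 in Table 13.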
Table 3.
The statistical quality of models applying descriptor D0(3,15) and target function TF3. The average determination coefficient on the validation set is 0.719 ± 0.071; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | 53 | 0.4467 | 0.6176 | 0.6436 | 0.7586 | 0.4033 | 0.2435 | 1.68 | 41 |
| P | 53 | 0.5025 | 0.6064 | 0.4925 | 0.7690 | 0.4575 | 0.2569 | 1.80 | 52 |
| C | 54 | 0.6116 | 0.7390 | 0.7245 | 0.7919 | 0.5795 | 0.4200 | 0.992 | 82 |
| V | 54 | 0.7398 | - | - | - | - | - | 0.82 | - |
2 | A | 54 | 0.4829 | 0.6513 | 0.6453 | 0.7587 | 0.4390 | 0.0653 | 1.75 | 49 |
| P | 53 | 0.4699 | 0.6544 | 0.5045 | 0.7992 | 0.4282 | 0.3661 | 1.70 | 45 |
| C | 54 | 0.7329 | 0.8355 | 0.8356 | 0.8545 | 0.7116 | 0.6257 | 0.815 | 143 |
| V | 53 | 0.5865 | - | - | - | - | - | 1.14 | - |
3 | A | 54 | 0.4564 | 0.6267 | 0.5824 | 0.7592 | 0.4175 | 0.1189 | 1.91 | 44 |
| P | 54 | 0.4351 | 0.6151 | 0.5814 | 0.7763 | 0.3910 | 0.2176 | 1.71 | 40 |
| C | 53 | 0.8272 | 0.8494 | 0.4508 | 0.8917 | 0.8167 | 0.7148 | 0.779 | 244 |
| V | 53 | 0.7282 | - | - | - | - | - | 0.81 | - |
4 | A | 54 | 0.4874 | 0.6554 | 0.4800 | 0.7828 | 0.4441 | 0.2803 | 1.77 | 49 |
| P | 54 | 0.4154 | 0.5828 | 0.6122 | 0.7286 | 0.3667 | 0.0135 | 1.71 | 37 |
| C | 53 | 0.8065 | 0.8519 | 0.5969 | 0.8894 | 0.7936 | 0.7060 | 0.751 | 213 |
| V | 53 | 0.8017 | - | - | - | - | - | 0.79 | - |
5 | A | 54 | 0.4940 | 0.6613 | 0.5623 | 0.7724 | 0.4545 | 0.3352 | 1.74 | 51 |
| P | 53 | 0.5015 | 0.6045 | 0.3406 | 0.7347 | 0.4681 | 0.2146 | 1.71 | 51 |
| C | 54 | 0.7668 | 0.8721 | 0.8225 | 0.8697 | 0.7484 | 0.6937 | 0.699 | 171 |
| V | 53 | 0.7389 | - | - | - | - | - | 0.73 | - |
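
The F column appears consistent with the Fisher ratio for a one-variable regression, F = R2(n − 2)/(1 − R2). This relation is inferred from the tabulated numbers rather than stated in this section, so the sketch below should be read as a consistency check under that assumption. The function name `f_ratio` is hypothetical.

```python
def f_ratio(r2: float, n: int) -> float:
    """Fisher F for a one-variable linear regression (assumed form)."""
    return r2 * (n - 2) / (1.0 - r2)

# (set, n, R2, tabulated F) for split 1 of Table 3.
rows = [("A", 53, 0.4467, 41), ("P", 53, 0.5025, 52), ("C", 54, 0.6116, 82)]

for name, n, r2, f_table in rows:
    f_calc = f_ratio(r2, n)
    print(name, round(f_calc), f_table)
```

For these three rows the computed values round to the tabulated F values.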
Table 4.
The statistical quality of models applying descriptor DS(3,15) and target function TF1. The average determination coefficient on the validation set is 0.718 ± 0.065; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | 53 | 0.4085 | 0.5800 | 0.6155 | 0.7756 | 0.3648 | 0.1055 | 1.73 | 35 |
| P | 53 | 0.4099 | 0.5397 | 0.3697 | 0.7687 | 0.3576 | 0.2399 | 1.96 | 35 |
| C | 54 | 0.6723 | 0.8077 | 0.8190 | 0.8249 | 0.6506 | 0.4542 | 0.781 | 107 |
| V | 54 | 0.7905 | - | - | - | - | - | 0.76 | - |
2 | A | 54 | 0.5085 | 0.6742 | 0.7131 | 0.7426 | 0.4744 | −0.0974 | 1.71 | 54 |
| P | 53 | 0.4866 | 0.6473 | 0.5123 | 0.7591 | 0.4500 | 0.1895 | 1.67 | 48 |
| C | 54 | 0.5195 | 0.7016 | 0.7204 | 0.7691 | 0.4743 | 0.0744 | 1.11 | 56 |
| V | 53 | 0.6382 | - | - | - | - | - | 0.96 | - |
3 | A | 54 | 0.3719 | 0.5422 | 0.4193 | 0.7384 | 0.3361 | −0.0119 | 2.05 | 31 |
| P | 54 | 0.3691 | 0.5544 | 0.5412 | 0.7373 | 0.3182 | 0.0333 | 1.80 | 30 |
| C | 53 | 0.6810 | 0.7711 | 0.8238 | 0.7990 | 0.6573 | 0.1410 | 0.885 | 109 |
| V | 53 | 0.6446 | - | - | - | - | - | 0.90 | - |
4 | A | 54 | 0.4669 | 0.6366 | 0.6345 | 0.8064 | 0.4278 | 0.4151 | 1.81 | 46 |
| P | 54 | 0.4715 | 0.6042 | 0.5903 | 0.7656 | 0.4317 | 0.2127 | 1.63 | 46 |
| C | 53 | 0.7145 | 0.8255 | 0.8445 | 0.8276 | 0.6937 | 0.3913 | 0.778 | 128 |
| V | 53 | 0.7770 | - | - | - | - | - | 0.75 | - |
5 | A | 54 | 0.4786 | 0.6474 | 0.5964 | 0.7662 | 0.4409 | 0.2791 | 1.76 | 48 |
| P | 53 | 0.5387 | 0.6010 | 0.3847 | 0.7573 | 0.5027 | 0.2696 | 1.69 | 60 |
| C | 54 | 0.6738 | 0.8128 | 0.8206 | 0.8095 | 0.6507 | 0.3382 | 0.807 | 107 |
| V | 53 | 0.7402 | - | - | - | - | - | 0.74 | - |
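
For readers who want to reproduce the R2, CCC, and RMSE columns from their own observed and predicted values, the following is a minimal sketch. It assumes that R2 is the squared Pearson correlation and that CCC is Lin's concordance correlation coefficient; the function name and the toy data are hypothetical and are not taken from the tables.

```python
import math

def regression_metrics(obs, pred):
    """Return (R2, CCC, RMSE) for paired observed and predicted values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    # Population (divide-by-n) variances and covariance.
    var_o = sum((x - mean_o) ** 2 for x in obs) / n
    var_p = sum((y - mean_p) ** 2 for y in pred) / n
    cov = sum((x - mean_o) * (y - mean_p) for x, y in zip(obs, pred)) / n
    r2 = cov ** 2 / (var_o * var_p)                       # squared Pearson r
    ccc = 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)  # Lin's CCC
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(obs, pred)) / n)
    return r2, ccc, rmse

# Toy data: predictions are perfectly correlated but shifted by +1.
r2, ccc, rmse = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(r2, ccc, rmse)
```

The toy case illustrates why the two columns can differ: a constant offset leaves R2 unchanged but lowers CCC, which also penalizes systematic bias.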
Table 5.
The statistical quality of models applying descriptor DS(3,15) and target function TF2. The average determination coefficient on the validation set is 0.550 ± 0.131; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | 53 | 0.5118 | 0.6771 | 0.6889 | 0.7668 | 0.4719 | 0.2756 | 1.57 | 53 |
| P | 53 | 0.5254 | 0.6446 | 0.4894 | 0.7726 | 0.4779 | 0.2716 | 1.76 | 56 |
| C | 54 | 0.5309 | 0.6811 | 0.6230 | 0.7913 | 0.4940 | 0.1979 | 1.21 | 59 |
| V | 54 | 0.7415 | - | - | - | - | - | 0.85 | - |
2 | A | 54 | 0.6002 | 0.7501 | 0.6198 | 0.7938 | 0.5673 | 0.2997 | 1.54 | 78 |
| P | 53 | 0.5760 | 0.7528 | 0.6341 | 0.7948 | 0.5354 | 0.1572 | 1.54 | 69 |
| C | 54 | 0.2279 | 0.4092 | 0.4109 | 0.7441 | 0.1304 | −0.0253 | 2.03 | 15 |
| V | 53 | 0.3301 | - | - | - | - | - | 1.90 | - |
3 | A | 54 | 0.5588 | 0.7169 | 0.5139 | 0.7947 | 0.5236 | 0.2960 | 1.72 | 66 |
| P | 54 | 0.5614 | 0.7094 | 0.5205 | 0.7764 | 0.5208 | 0.1697 | 1.52 | 67 |
| C | 53 | 0.5841 | 0.6915 | 0.6708 | 0.7856 | 0.5304 | 0.3957 | 1.30 | 72 |
| V | 53 | 0.5503 | - | - | - | - | - | 1.22 | - |
4 | A | 54 | 0.4892 | 0.6570 | 0.5189 | 0.7776 | 0.4462 | 0.2829 | 1.77 | 50 |
| P | 54 | 0.4898 | 0.6275 | 0.5827 | 0.7558 | 0.4525 | 0.2073 | 1.61 | 50 |
| C | 53 | 0.5716 | 0.6163 | 0.3655 | 0.7894 | 0.5430 | 0.1404 | 1.27 | 68 |
| V | 53 | 0.5452 | - | - | - | - | - | 1.23 | - |
5 | A | 54 | 0.5756 | 0.7307 | 0.6070 | 0.7833 | 0.5397 | 0.1936 | 1.59 | 71 |
| P | 53 | 0.5800 | 0.7066 | 0.5849 | 0.7735 | 0.5495 | 0.2993 | 1.54 | 70 |
| C | 54 | 0.5861 | 0.7370 | 0.7116 | 0.7858 | 0.5589 | 0.2611 | 1.08 | 74 |
| V | 53 | 0.5829 | - | - | - | - | - | 1.14 | - |
Table 6.
The statistical quality of models applying descriptor DS(3,15) and target function TF3. The average determination coefficient on the validation set is 0.764 ± 0.068; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | 53 | 0.3940 | 0.5652 | 0.6044 | 0.7437 | 0.3506 | 0.1639 | 1.75 | 33 |
| P | 53 | 0.4276 | 0.5583 | 0.4450 | 0.7746 | 0.3773 | 0.2540 | 1.93 | 38 |
| C | 54 | 0.7795 | 0.8724 | 0.7921 | 0.8762 | 0.7656 | 0.6769 | 0.659 | 184 |
| V | 54 | 0.8645 | - | - | - | - | - | 0.69 | - |
2 | A | 54 | 0.4264 | 0.5979 | 0.5224 | 0.7434 | 0.3801 | 0.1072 | 1.84 | 39 |
| P | 53 | 0.4234 | 0.5990 | 0.4421 | 0.7665 | 0.3846 | 0.2149 | 1.77 | 37 |
| C | 54 | 0.7231 | 0.8255 | 0.6353 | 0.8496 | 0.6992 | 0.6389 | 0.796 | 136 |
| V | 53 | 0.7076 | - | - | - | - | - | 0.79 | - |
3 | A | 54 | 0.4353 | 0.6066 | 0.4895 | 0.7440 | 0.3979 | 0.1082 | 1.94 | 40 |
| P | 54 | 0.4108 | 0.6050 | 0.6292 | 0.7478 | 0.3587 | −0.2145 | 1.73 | 36 |
| C | 53 | 0.8000 | 0.8305 | 0.5113 | 0.8755 | 0.7837 | 0.6745 | 0.848 | 204 |
| V | 53 | 0.7049 | - | - | - | - | - | 0.88 | - |
4 | A | 54 | 0.4198 | 0.5914 | 0.5586 | 0.7896 | 0.3745 | 0.2578 | 1.88 | 38 |
| P | 54 | 0.4209 | 0.5588 | 0.6292 | 0.7432 | 0.3763 | 0.0544 | 1.71 | 38 |
| C | 53 | 0.8150 | 0.8776 | 0.8115 | 0.9004 | 0.8002 | 0.7697 | 0.608 | 225 |
| V | 53 | 0.8282 | - | - | - | - | - | 0.62 | - |
5 | A | 54 | 0.5029 | 0.6692 | 0.6113 | 0.7737 | 0.4651 | 0.2869 | 1.72 | 53 |
| P | 53 | 0.5021 | 0.6405 | 0.4509 | 0.7327 | 0.4693 | 0.1690 | 1.67 | 51 |
| C | 54 | 0.7861 | 0.8718 | 0.7648 | 0.8870 | 0.7705 | 0.7127 | 0.736 | 191 |
| V | 53 | 0.7149 | - | - | - | - | - | 0.82 | - |
Table 7.
The statistical quality of models for dataset 2 applying descriptor D0(3,15) and target function TF1. The average determination coefficient on the validation set is 0.525 ± 0.044; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | 67 | 0.5155 | 0.6803 | 0.6183 | 0.7658 | 0.4920 | 0.0840 | 0.911 | 69 |
| P | 66 | 0.3678 | 0.5213 | 0.4903 | 0.6934 | 0.3308 | −0.3774 | 0.946 | 37 |
| C | 67 | 0.3858 | 0.6178 | 0.6208 | 0.7076 | 0.3421 | −0.2355 | 0.766 | 41 |
| V | 66 | 0.5930 | - | - | - | - | - | 0.66 | - |
2 | A | 67 | 0.4668 | 0.6365 | 0.6631 | 0.7510 | 0.4338 | −0.1719 | 0.957 | 57 |
| P | 67 | 0.2059 | 0.4280 | 0.4162 | 0.7275 | 0.1475 | −0.4468 | 1.12 | 17 |
| C | 66 | 0.3124 | 0.5529 | 0.5569 | 0.6899 | 0.2435 | −0.2488 | 0.759 | 29 |
| V | 66 | 0.5025 | - | - | - | - | - | 0.77 | - |
3 | A | 67 | 0.4031 | 0.5746 | 0.5148 | 0.7392 | 0.3736 | −0.1289 | 0.912 | 44 |
| P | 66 | 0.4394 | 0.4974 | 0.6214 | 0.7483 | 0.4083 | −0.0126 | 1.03 | 50 |
| C | 67 | 0.4752 | 0.6886 | 0.6889 | 0.7352 | 0.4429 | 0.0010 | 0.720 | 59 |
| V | 66 | 0.5421 | - | - | - | - | - | 0.67 | - |
4 | A | 66 | 0.2066 | 0.3425 | 0.4546 | 0.7404 | 0.1625 | −0.4292 | 1.13 | 17 |
| P | 66 | 0.2029 | 0.2730 | 0.3764 | 0.7276 | 0.1615 | −0.6328 | 1.03 | 16 |
| C | 67 | 0.4376 | 0.5641 | 0.5323 | 0.7292 | 0.3963 | −0.1733 | 0.661 | 51 |
| V | 67 | 0.4589 | - | - | - | - | - | 0.79 | - |
5 | A | 67 | 0.5872 | 0.7400 | 0.7006 | 0.7865 | 0.5647 | 0.0027 | 0.866 | 92 |
| P | 66 | 0.4448 | 0.6107 | 0.4936 | 0.7539 | 0.4164 | −0.0304 | 0.935 | 51 |
| C | 66 | 0.4951 | 0.6817 | 0.7036 | 0.7503 | 0.4589 | −0.0205 | 0.747 | 63 |
| V | 67 | 0.5306 | - | - | - | - | - | 0.84 | - |
Table 8.
The statistical quality of models for dataset 2 applying descriptor D0(3,15) and target function TF2. The average determination coefficient on the validation set is 0.458 ± 0.067; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | 67 | 0.5535 | 0.7126 | 0.6802 | 0.7718 | 0.5312 | 0.1148 | 0.874 | 81 |
| P | 66 | 0.4136 | 0.5689 | 0.5260 | 0.7047 | 0.3776 | −0.4531 | 0.906 | 45 |
| C | 67 | 0.1656 | 0.4055 | 0.3465 | 0.7516 | 0.0819 | −0.5802 | 0.945 | 13 |
| V | 66 | 0.4695 | - | - | - | - | - | 0.79 | - |
2 | A | 67 | 0.4294 | 0.6008 | 0.5991 | 0.7269 | 0.3923 | −0.0821 | 0.990 | 49 |
| P | 67 | 0.2858 | 0.4999 | 0.5095 | 0.7189 | 0.2420 | −0.2515 | 1.05 | 26 |
| C | 66 | 0.0725 | 0.2594 | 0.1964 | 0.8198 | 0.0000 | −0.4639 | 1.07 | 5 |
| V | 66 | 0.4318 | - | - | - | - | - | 0.84 | - |
3 | A | 67 | 0.4688 | 0.6384 | 0.6260 | 0.7287 | 0.4425 | −0.0381 | 0.861 | 57 |
| P | 66 | 0.4683 | 0.6264 | 0.5587 | 0.7550 | 0.4374 | 0.0264 | 0.963 | 56 |
| C | 67 | 0.4041 | 0.6135 | 0.4929 | 0.7330 | 0.3651 | −0.0690 | 0.925 | 44 |
| V | 66 | 0.5190 | - | - | - | - | - | 0.79 | - |
4 | A | 66 | 0.3085 | 0.4715 | 0.5227 | 0.7212 | 0.2710 | −0.3468 | 1.06 | 29 |
| P | 66 | 0.2500 | 0.3513 | 0.3068 | 0.7132 | 0.2077 | −0.3624 | 1.01 | 21 |
| C | 67 | 0.2592 | 0.4627 | 0.2994 | 0.7631 | 0.2109 | −0.3577 | 0.831 | 23 |
| V | 67 | 0.3431 | - | - | - | - | - | 0.91 | - |
5 | A | 67 | 0.5575 | 0.7159 | 0.6429 | 0.7754 | 0.5317 | 0.1568 | 0.897 | 82 |
| P | 66 | 0.5528 | 0.6442 | 0.5107 | 0.7535 | 0.5297 | −0.0981 | 0.927 | 79 |
| C | 66 | 0.4001 | 0.5650 | 0.5668 | 0.7136 | 0.3378 | −0.1096 | 0.981 | 43 |
| V | 67 | 0.5286 | - | - | - | - | - | 0.95 | - |
Table 9.
The statistical quality of models for dataset 2 applying descriptor D0(3,15) and target function TF3. The average determination coefficient on the validation set is 0.526 ± 0.028; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | 67 | 0.4596 | 0.6298 | 0.5497 | 0.7402 | 0.4325 | −0.1192 | 0.962 | 55 |
| P | 66 | 0.3353 | 0.4846 | 0.4509 | 0.6947 | 0.2989 | −0.3925 | 0.965 | 32 |
| C | 67 | 0.4511 | 0.6702 | 0.6598 | 0.7596 | 0.4141 | 0.2419 | 0.679 | 53 |
| V | 66 | 0.5430 | - | - | - | - | - | 0.68 | - |
2 | A | 67 | 0.3705 | 0.5406 | 0.5908 | 0.7274 | 0.3345 | −0.2820 | 1.04 | 38 |
| P | 67 | 0.2444 | 0.4128 | 0.4831 | 0.7252 | 0.1971 | −0.6011 | 1.05 | 21 |
| C | 66 | 0.3725 | 0.6103 | 0.5246 | 0.7522 | 0.3210 | 0.0950 | 0.666 | 38 |
| V | 66 | 0.5385 | - | - | - | - | - | 0.69 | - |
3 | A | 67 | 0.3762 | 0.5467 | 0.5953 | 0.7323 | 0.3472 | −0.2253 | 0.933 | 39 |
| P | 66 | 0.5037 | 0.5975 | 0.6135 | 0.7623 | 0.4745 | 0.1579 | 0.950 | 65 |
| C | 67 | 0.4930 | 0.7015 | 0.5590 | 0.7513 | 0.4606 | 0.1828 | 0.713 | 63 |
| V | 66 | 0.5465 | - | - | - | - | - | 0.69 | - |
4 | A | 66 | 0.2489 | 0.3986 | 0.4695 | 0.7096 | 0.2076 | −0.5025 | 1.10 | 21 |
| P | 66 | 0.1949 | 0.2997 | 0.2903 | 0.7325 | 0.1499 | −0.6218 | 1.04 | 15 |
| C | 67 | 0.4518 | 0.5918 | 0.3463 | 0.8025 | 0.4167 | 0.3853 | 0.699 | 54 |
| V | 67 | 0.4713 | - | - | - | - | - | 0.80 | - |
5 | A | 67 | 0.5545 | 0.7134 | 0.7227 | 0.7675 | 0.5239 | −0.0080 | 0.900 | 81 |
| P | 66 | 0.3329 | 0.5330 | 0.3591 | 0.7409 | 0.2955 | −0.1086 | 1.07 | 32 |
| C | 66 | 0.4773 | 0.6659 | 0.6393 | 0.7738 | 0.4358 | 0.2862 | 0.744 | 58 |
| V | 67 | 0.5329 | - | - | - | - | - | 0.90 | - |
Table 10.
The statistical quality of models for dataset 2 applying descriptor DS(3,15) and target function TF1. The average determination coefficient on the validation set is 0.711 ± 0.045; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | 67 | 0.5430 | 0.7038 | 0.6345 | 0.7586 | 0.5190 | 0.1508 | 0.885 | 77 |
| P | 66 | 0.4224 | 0.5744 | 0.6110 | 0.7084 | 0.3909 | −0.2587 | 0.894 | 47 |
| C | 67 | 0.5644 | 0.7497 | 0.7512 | 0.7799 | 0.5353 | −0.0017 | 0.611 | 84 |
| V | 66 | 0.6833 | - | - | - | - | - | 0.62 | - |
2 | A | 67 | 0.4364 | 0.6077 | 0.6412 | 0.7088 | 0.4043 | −0.2366 | 0.984 | 50 |
| P | 67 | 0.3732 | 0.4999 | 0.5345 | 0.6830 | 0.3404 | −0.4046 | 0.961 | 39 |
| C | 66 | 0.7101 | 0.8420 | 0.8426 | 0.8300 | 0.6932 | 0.4950 | 0.424 | 157 |
| V | 66 | 0.7877 | - | - | - | - | - | 0.46 | - |
3 | A | 67 | 0.4621 | 0.6321 | 0.6598 | 0.7226 | 0.4336 | −0.2341 | 0.866 | 56 |
| P | 66 | 0.4426 | 0.6202 | 0.6375 | 0.7519 | 0.4126 | 0.0041 | 0.983 | 51 |
| C | 67 | 0.6435 | 0.7984 | 0.8006 | 0.7968 | 0.6227 | −0.0491 | 0.601 | 117 |
| V | 66 | 0.7067 | - | - | - | - | - | 0.54 | - |
4 | A | 66 | 0.4453 | 0.6162 | 0.5911 | 0.7221 | 0.4187 | −0.5249 | 0.945 | 51 |
| P | 66 | 0.3214 | 0.5133 | 0.4233 | 0.7167 | 0.2825 | −0.2437 | 0.951 | 30 |
| C | 67 | 0.6268 | 0.7763 | 0.7901 | 0.8344 | 0.6016 | 0.5212 | 0.602 | 109 |
| V | 67 | 0.6535 | - | - | - | - | - | 0.62 | - |
5 | A | 67 | 0.5692 | 0.7255 | 0.7323 | 0.7592 | 0.5468 | −0.1441 | 0.885 | 86 |
| P | 66 | 0.4215 | 0.5878 | 0.6413 | 0.7188 | 0.3907 | −0.2753 | 0.946 | 47 |
| C | 66 | 0.6686 | 0.8113 | 0.8174 | 0.8112 | 0.6493 | 0.1802 | 0.535 | 129 |
| V | 67 | 0.7241 | - | - | - | - | - | 0.57 | - |
Table 11.
The statistical quality of models for dataset 2 applying descriptor DS(3,15) and target function TF2. The average determination coefficient on the validation set is 0.711 ± 0.034; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | 67 | 0.6152 | 0.7618 | 0.6754 | 0.7861 | 0.5967 | 0.3185 | 0.812 | 104 |
| P | 66 | 0.4360 | 0.6458 | 0.6359 | 0.7089 | 0.4015 | −0.1545 | 0.898 | 49 |
| C | 67 | 0.5035 | 0.6887 | 0.6943 | 0.7676 | 0.4684 | −0.1207 | 0.774 | 66 |
| V | 66 | 0.7338 | - | - | - | - | - | 0.75 | - |
2 | A | 67 | 0.5085 | 0.6742 | 0.6921 | 0.7277 | 0.4812 | −0.0749 | 0.919 | 67 |
| P | 67 | 0.4444 | 0.5770 | 0.5553 | 0.7067 | 0.4126 | −0.0500 | 0.895 | 52 |
| C | 66 | 0.5879 | 0.7622 | 0.7597 | 0.7653 | 0.5603 | 0.0988 | 0.541 | 91 |
| V | 66 | 0.7486 | - | - | - | - | - | 0.52 | - |
3 | A | 67 | 0.5492 | 0.7090 | 0.7193 | 0.7404 | 0.5275 | −0.2055 | 0.793 | 79 |
| P | 66 | 0.5238 | 0.6980 | 0.6418 | 0.7572 | 0.4936 | 0.1345 | 0.925 | 70 |
| C | 67 | 0.6190 | 0.7524 | 0.7493 | 0.8018 | 0.5961 | 0.2755 | 0.758 | 106 |
| V | 66 | 0.7169 | - | - | - | - | - | 0.65 | - |
4 | A | 66 | 0.4800 | 0.6486 | 0.6521 | 0.7417 | 0.4547 | −0.1824 | 0.915 | 59 |
| P | 66 | 0.3509 | 0.5810 | 0.4455 | 0.7057 | 0.3099 | −0.2432 | 0.971 | 35 |
| C | 67 | 0.6021 | 0.7294 | 0.7334 | 0.8442 | 0.5766 | 0.5658 | 0.746 | 98 |
| V | 67 | 0.6484 | - | - | - | - | - | 0.74 | - |
5 | A | 67 | 0.5723 | 0.7280 | 0.7343 | 0.7772 | 0.5480 | 0.1446 | 0.882 | 87 |
| P | 66 | 0.5737 | 0.6713 | 0.7350 | 0.7657 | 0.5493 | 0.0978 | 0.863 | 86 |
| C | 66 | 0.5202 | 0.7016 | 0.5059 | 0.7593 | 0.4789 | 0.1184 | 0.699 | 69 |
| V | 67 | 0.7092 | - | - | - | - | - | 0.68 | - |
Table 12.
The statistical quality of models for dataset 2 applying descriptor DS(3,15) and target function TF3. The average determination coefficient on the validation set is 0.750 ± 0.033; bold indicates the best model.
Split | Set | n | R2 | CCC | IIC | CII | Q2 | CCCP | RMSE | F |
---|---|---|---|---|---|---|---|---|---|---|
1 | A | 67 | 0.5438 | 0.7045 | 0.5979 | 0.7593 | 0.5204 | 0.1170 | 0.884 | 77 |
| P | 66 | 0.3943 | 0.5999 | 0.6094 | 0.7036 | 0.3621 | −0.1837 | 0.921 | 42 |
| C | 67 | 0.7085 | 0.8351 | 0.8059 | 0.8480 | 0.6894 | 0.5784 | 0.522 | 158 |
| V | 66 | 0.7706 | - | - | - | - | - | 0.59 | - |
2 | A | 67 | 0.4710 | 0.6404 | 0.6274 | 0.7269 | 0.4416 | −0.0220 | 0.953 | 58 |
| P | 67 | 0.4006 | 0.5149 | 0.5772 | 0.7042 | 0.3685 | −0.1156 | 0.939 | 43 |
| C | 66 | 0.7218 | 0.8474 | 0.7631 | 0.8545 | 0.7011 | 0.6390 | 0.417 | 166 |
| V | 66 | 0.7683 | - | - | - | - | - | 0.47 | - |
3 | A | 67 | 0.4356 | 0.6069 | 0.5683 | 0.6982 | 0.4014 | −0.3446 | 0.887 | 50 |
| P | 66 | 0.4271 | 0.6144 | 0.6362 | 0.7362 | 0.3962 | −0.1572 | 1.00 | 48 |
| C | 67 | 0.7418 | 0.8561 | 0.7376 | 0.8580 | 0.7259 | 0.5858 | 0.514 | 187 |
| V | 66 | 0.7875 | - | - | - | - | - | 0.50 | - |
4 | A | 66 | 0.4806 | 0.6492 | 0.6140 | 0.7290 | 0.4561 | −0.3088 | 0.915 | 59 |
| P | 66 | 0.3203 | 0.5485 | 0.3616 | 0.6975 | 0.2773 | −0.3392 | 0.981 | 30 |
| C | 67 | 0.7164 | 0.7947 | 0.5741 | 0.8660 | 0.6959 | 0.6680 | 0.647 | 164 |
| V | 67 | 0.7251 | - | - | - | - | - | 0.68 | - |
5 | A | 67 | 0.5178 | 0.6823 | 0.6984 | 0.7536 | 0.4906 | −0.0139 | 0.936 | 70 |
| P | 66 | 0.4228 | 0.5667 | 0.4983 | 0.7358 | 0.3914 | 0.0413 | 0.971 | 47 |
| C | 66 | 0.6928 | 0.8174 | 0.6477 | 0.8443 | 0.6732 | 0.5595 | 0.531 | 144 |
| V | 67 | 0.6991 | - | - | - | - | - | 0.58 | - |
Table 13.
The average values and dispersions of the determination coefficients on the validation set for models of the antioxidant activity of peptides (dataset 1).
Descriptor | TF1 | TF2 | TF3 |
---|---|---|---|
D0(3,15) | 0.673 ± 0.064 | 0.556 ± 0.144 | 0.719 ± 0.071 |
DS(3,15) | 0.718 ± 0.065 | 0.550 ± 0.131 | 0.764 ± 0.068 |
Table 14.
The average values and dispersions of the determination coefficients on the validation set for models of the antioxidant activity of peptides (dataset 2).
Descriptor | TF1 | TF2 | TF3 |
---|---|---|---|
D0(3,15) | 0.525 ± 0.044 | 0.458 ± 0.067 | 0.526 ± 0.028 |
DS(3,15) | 0.711 ± 0.045 | 0.711 ± 0.034 | 0.750 ± 0.033 |
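
The effect of switching from D0(3,15) to DS(3,15) can be read off Tables 13 and 14 by subtracting the mean validation R2 values. The short sketch below does only that arithmetic on the tabulated means; the dictionary layout and variable names are hypothetical, and no significance test is implied.

```python
# Mean validation R2 per target function, copied from Tables 13 and 14.
means = {
    "dataset 1": {"D0": {"TF1": 0.673, "TF2": 0.556, "TF3": 0.719},
                  "DS": {"TF1": 0.718, "TF2": 0.550, "TF3": 0.764}},
    "dataset 2": {"D0": {"TF1": 0.525, "TF2": 0.458, "TF3": 0.526},
                  "DS": {"TF1": 0.711, "TF2": 0.711, "TF3": 0.750}},
}

# Gain of DS(3,15) over D0(3,15) for each dataset and target function.
gain = {
    ds: {tf: round(vals["DS"][tf] - vals["D0"][tf], 3) for tf in vals["D0"]}
    for ds, vals in means.items()
}

for ds, per_tf in gain.items():
    print(ds, per_tf)
```

On these numbers the differences for dataset 1 are small (at most 0.045), while for dataset 2 DS(3,15) gives a higher mean for every target function, with the largest difference for TF2.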
Table 15.
Comparison of models of peptide activity.
n | R2 | References | Comment |
---|---|---|---|
32 | 0.746 | [32] | pIC50 anti-cancer |
- | 0.708 | [33] | Antioxidant activity of tripeptides |
54 | 0.865 | This work | Antioxidant activity of tripeptides (best model) |
66 | 0.787 | This work | ACE inhibitory activity (best model) |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).