Article

Quantitative Structure–Activity Relationship Models for the Angiotensin-Converting Enzyme Inhibitory Activities of Short-Chain Peptides of Goat Milk Using Quasi-SMILES

by Alla P. Toropova *, Andrey A. Toropov, Alessandra Roncaglioni and Emilio Benfenati
Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Science, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy
* Author to whom correspondence should be addressed.
Macromol 2024, 4(2), 387-400; https://doi.org/10.3390/macromol4020022
Submission received: 27 March 2024 / Revised: 17 May 2024 / Accepted: 31 May 2024 / Published: 4 June 2024

Abstract

The inhibitory activity of peptides on angiotensin-converting enzyme (ACE) is a measure of their antihypertensive potential. Quantitative structure–activity relationship (QSAR) models obtained based on the analysis of sequences of amino acids are suggested. The average determination coefficient for the active training sets is 0.36 ± 0.07. The average determination coefficient for validation sets is 0.79 ± 0.02. The paradoxical situation is caused by applying the vector of ideality of correlation, which improves the statistical quality of a model for the calibration and validation sets but is detrimental to the statistical quality of models for the training sets.

1. Introduction

Currently, computer methodologies are often used to evaluate the physicochemical and biochemical behavior of various substances [1,2,3,4,5,6,7,8,9,10]. The possibility of creating and distributing databases and software largely determines the semantic framework of modern scientific research activities. In other words, one of the main drivers of the research is the development of mathematical models, which can support the theoretical framework in a field and explore new scenarios. An unexpected obstacle on this path is the contradiction between the mathematical striving for accuracy or ideality, on the one hand, and the instability and dispersion inherent in many natural phenomena, on the other, including the values of endpoints used for the characterization and evaluation of substances for various purposes. Monte Carlo methods [11,12,13,14,15,16,17,18,19,20], in principle, can be a compromise in resolving the aforementioned contradiction between nature and mathematics. However, the cost of this compromise is uncertainty; consequently, intervals can be a more useful indicator than pseudo-reliable and pseudo-accurate theoretical values for physicochemical and biochemical endpoints.
To think that one can construct an ideal, reliable model is quite a naive conception. Again, there is a poisonous contradiction between the desire to build a “super-excellent” model in a certain area of space called the applicability domain and the inevitability of obtaining a group of outliers even for a very carefully planned applicability domain. Any software designed to estimate the experimental value of any endpoint is only a simulacrum (surrogate) of a real experiment to determine this endpoint.
The classic idea of endpoint modeling is to base the model on the molecular structure of the substance under study. For these purposes, the molecular structure is represented by a mathematical graph in which the vertices represent atoms and the edges represent covalent bonds [21,22].
It is possible to radically change the physical meaning of the graph if its vertices represent not atoms but atomic orbitals (1s², 2s², 3p⁶, etc.). The edges in such a graph will be a fragmented representation of the classical covalent bond between atoms split into orbitals. Such graphs are called graphs of atomic orbitals (GAOs). GAOs were used to develop quantitative structure-property/activity relationships (QSPRs/QSARs) [23,24,25].
In the case of studying peptides and proteins, representing these substances for modeling in the form of graphs becomes inconvenient for the following reasons: Firstly, the corresponding graphs are doomed to be of enormous size, that is, to have a very large number of atoms (vertices of the graph). If the graph is represented through an adjacency matrix, then for the case of n atoms, a square n × n matrix is required. Even the rapid growth of memory and computer speed cannot provide the ability to process databases with such representations of peptide and protein molecules in an acceptable time if we are talking about hundreds or even thousands of such molecules/molecular graphs. Secondly, the repetition of molecular fragments that are slightly different and carry similar physicochemical and biomedical functions is inevitable. This makes it a very tempting alternative to consider the structures containing data about these similar fragments in a more compact form. In other words, it is tempting to build models of proteins and peptides using amino acid identities as a basis [26,27].
In this way, three levels of representation of substances for QSPR/QSAR can be considered and/or compared. The most detailed level is the GAO. The intermediate level is the “classical” molecular graph (hydrogen-filled graph or hydrogen-suppressed graph). Moreover, an option that is somewhat out of the logical chain is the presentation of the structure of the peptide as a sequence of amino acids. It should be noted that the representation of peptide structure via a sequence of amino acids is quite similar to the representation of the molecular structure of organic compounds via SMILES [28].
The famous saying “knowledge is power” makes one think about the following: It is quite natural to accept that knowledge is information. If one solves a certain problem, one needs information (knowledge). If there is not enough information, solving the problem will require a lot of “extra” effort. If there is exactly as much information as needed, solving the problem will require a minimum of effort. The key point is as follows: if there is more information than is needed to solve a problem, then solving the problem will require additional effort to filter out and ignore unnecessary information. Thus, excess information is nothing more than a variant of disinformation. From this point of view, it is very likely that for constructing models of the biochemical behavior of peptides, the traditional graph and GAO are rather misinformation, despite the fact that simulation of the biochemical behavior of peptides using graphs has been described in the literature [4].
In addition to SMILES, so-called quasi-SMILES can be used for QSPR/QSAR research. Quasi-SMILES have been used to construct models of the physicochemical and biochemical behavior of nanomaterials under various experimental conditions [29,30]. It should be noted that the sequence of amino acids that make up a peptide can be considered as quasi-SMILES [31,32,33]. Quasi-SMILES can also serve as a tool to establish QSPR/QSAR for “usual” organic molecules if the experimental conditions are taken into account [34,35]. Most likely, however, the main application of quasi-SMILES is the development of models for the physicochemical and biochemical behavior of diverse nanomaterials [36,37,38,39,40].
Pesticides are one of the major sources of contamination of dairy products due to the presence of their residues in animal feedstuffs. Other contributory factors in this regard may include the application of pesticides to farm animals, environmental contamination, and accidental spills. Milk contamination may be avoided by hindering the entry of pesticide residues into dairy animals through contaminated feedstuffs/food chains [41].
This work attempts to use the Monte Carlo method to construct a model of the inhibitory activity of short-chain peptides in goat milk.
Angiotensin-converting enzyme (ACE) inhibitory peptides are a significant component of food technology. The peptide activities are expressed by IC50 values, which represent the peptide concentration (in μM) required to block ACE activity by 50%. The above-mentioned activity is valuable information; therefore, the quantitative structure–activity relationships (QSARs) for the above endpoint are developed here.
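The relation between the measured IC50 and the pIC50 endpoint modeled below can be sketched as follows. This is only an illustration: it assumes the conventional definition pIC50 = −log10(IC50 in mol/L), which the text does not state explicitly, and the function name is hypothetical.

```python
import math


def pic50_from_ic50_um(ic50_um: float) -> float:
    """Convert an IC50 given in micromolar (uM) to pIC50.

    Assumption: pIC50 = -log10(IC50 expressed in mol/L).
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    # 1 uM = 1e-6 M, hence -log10(ic50_um * 1e-6) = 6 - log10(ic50_um)
    return 6.0 - math.log10(ic50_um)
```

Under this convention, a lower IC50 (a more potent inhibitor) gives a higher pIC50.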
The study aims to build up QSAR models for the pIC50 and estimate the stability of the predictive potential of the model observed for different distributions in the training and validation sets. However, unlike previous works devoted to developing models using the Monte Carlo technique, here we consider fragments of local symmetry in the amino acid sequences that make up the peptides [7]. Local symmetry is the coincidence of one-character abbreviations of amino acids in a three-character fragment included in the peptide. This can be considered a composition of three symbols, i.e., xyx, where x and y are arbitrary, but x is not equal to y.
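The definition of a local symmetry fragment given above (three symbols xyx with x not equal to y) can be illustrated with a short sketch. The function name is hypothetical, and this is not the CORAL implementation; it only scans a one-letter amino acid sequence for windows of that form.

```python
def find_xyx(sequence: str) -> list[tuple[int, str]]:
    """Return (start index, fragment) for every three-letter window of the form xyx."""
    hits = []
    for i in range(len(sequence) - 2):
        first, middle, last = sequence[i:i + 3]
        # Outer symbols must coincide and differ from the middle one.
        if first == last and first != middle:
            hits.append((i, sequence[i:i + 3]))
    return hits
```

Overlapping windows are reported separately, so a sequence such as YPYP contains two fragments (YPY and PYP), whereas AAA contains none because x must differ from y.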

2. Materials and Methods

2.1. Data

Here, models of the inhibitory activity of peptides in goat milk are developed and evaluated. The peptides are represented by sequences of amino acids. Experimental data on inhibitory activity are taken from [42]. The work [42] presents data on 268 peptides, but our software identified two duplicates, which were eliminated from further consideration. In this way, 266 peptides are considered. These sequences are applied as the quasi-SMILES [43].

2.2. Splitting Available Data into Training and Validation Sets

The CORAL-2023 (http://www.insilico.eu/coral, accessed on 20 May 2024) software provides the user with several options for constructing models. One can build a model by dividing the available data into a training set and a validation set. However, in this case, a number of doubts remain unexamined. First, the division may be too successful. Then, the model can inspire the user with false hopes, which will be dispelled when trying to apply the model to new data that was not available at the time of its construction. Secondly, the split into training and validation sets may be too unfortunate. Then, the model again misinforms the user about the real state of affairs, suggesting that a model for the endpoint and the set of compounds under consideration is impossible (although this may not be the case).
These doubts led to the development of the concept of simulation, based on the following principles: 1. The statistical quality of the model depends on the chosen division of data into a training and validation set. 2. In the process of building a model based on correlation weighting, there must be feedback between the currently observed statistical quality of the model for the objects under consideration in the training set and the statistical quality of the model for similar objects that are not visible to the simulation. 3. Overtraining (when the model becomes excellent for objects involved in building the model and useless for external objects not involved in building the model) should be blocked.
To follow the above principles in practice, the peptides considered were distributed into the following four subsets: (i) the active training set; (ii) the passive training set; (iii) the calibration set; and (iv) the validation set. Thus, instead of a traditional monolithic training set, a structured training set is used, which includes the active and passive training sets together with a calibration set. The active training set is a list of those objects (peptides) that are involved in the development of the model (their structure is expressed by the sequence of amino acids, for which correlation weights are optimized using the Monte Carlo method). The passive training set is a list of peptides whose data are not used for the specified optimization; however, the quality of the model for them is taken into account. Finally, the calibration set is a list of peptides used to detect the moment when optimization begins to make the statistical quality of the model for the active training set significantly better than that observed on the calibration set.
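A random distribution of peptides into the four subsets can be sketched as follows. The subset sizes are an assumption (roughly equal quarters), since the proportions are not stated here, and the function and key names are hypothetical.

```python
import random


def split_four(peptides, seed=0):
    """Randomly distribute peptides into four roughly equal subsets.

    Assumption: equal proportions; the actual proportions used in the
    study are not given in the text.
    """
    rng = random.Random(seed)          # seeded so that a split is reproducible
    shuffled = list(peptides)
    rng.shuffle(shuffled)
    names = ("active_training", "passive_training", "calibration", "validation")
    # Every fourth element goes to the same subset.
    return {name: shuffled[k::4] for k, name in enumerate(names)}
```

Calling the function with different seeds gives different splits, which mirrors the idea of checking the model on several random distributions rather than on a single, possibly lucky or unlucky, one.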

2.3. Optimal Descriptor

By the Monte Carlo method, using active and passive training sets together with a calibration set, the so-called correlation weights of amino acids were calculated, which give the maximum value of the objective function, defined as follows:
$$TF = r_{AT} + r_{PT} - \left| r_{AT} - r_{PT} \right| \times \alpha + (IIC + CII) \times \beta \qquad (1)$$
where rAT and rPT are determination coefficients for active and passive training sets; IIC is the index of ideality of correlation [24]; CII is the correlation intensity index [24]; and α = 0.1 and β = 0.3. Alpha and beta coefficients are selected empirically. The indicated values for other QSPR/QSAR tasks may vary depending on the results of the corresponding computational experiments.
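The objective function can be sketched in code as below. The form used is an assumption: the sum of the two determination coefficients, a penalty proportional to the absolute difference between them, and a weighted sum of IIC and CII, with α = 0.1 and β = 0.3 as defaults. The function name is hypothetical, and IIC and CII are taken as precomputed inputs.

```python
def target_function(r_at, r_pt, iic, cii, alpha=0.1, beta=0.3):
    """Assumed form: TF = r_AT + r_PT - |r_AT - r_PT| * alpha + (IIC + CII) * beta."""
    return r_at + r_pt - abs(r_at - r_pt) * alpha + (iic + cii) * beta
```

The penalty term discourages solutions in which the active training set is fitted much better than the passive one, while the β term rewards high IIC and CII values.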
Calculations of IIC and CII are described in the literature [44]. The main idea of the IIC is an attempt to combine the statistical quality of the model, transmitted through the values of the coefficients of determination, and the statistical quality of the model, transmitted through the values of the average absolute error. Instead of the latter, one can use the root mean square error, but the use of the absolute error turned out to be more effective in terms of the results of the stochastic process of the optimization by the Monte Carlo method. The basic idea of CII is to evaluate how individual peptides influence the overall correlation in the set. In the case of CII, the overall total contribution of all opponents of the correlation is examined, that is, those peptides whose removal from consideration leads to an improvement in the correlation.
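The idea behind CII described above can be sketched as a leave-one-out scan. This is an illustrative reading of the verbal description, not the reference implementation from [44]: each peptide whose removal increases the determination coefficient is treated as an opponent of the correlation, and the sum of these increases is subtracted from 1. The function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def correlation_intensity_index(observed, predicted):
    """Assumed form: CII = 1 - sum of positive gains in r^2 on removing one point."""
    base = r_squared(observed, predicted)
    protest = 0.0
    for k in range(len(observed)):
        obs = observed[:k] + observed[k + 1:]
        pred = predicted[:k] + predicted[k + 1:]
        gain = r_squared(obs, pred) - base
        if gain > 0:  # point k is an opponent of the correlation
            protest += gain
    return 1.0 - protest
```

In this sketch, a set without opponents gives a value of 1, and a single strong outlier lowers the value sharply. Inputs are expected as Python lists, and constant inputs are not handled.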
The optimal descriptor calculated with quasi-SMILES is calculated as follows:
$$DCW(T, N) = \sum CW(A) + \sum CW(A_x A_y) + \sum CW(FLS) \qquad (2)$$
where A is one symbol abbreviation of an amino acid and AxAy is a pair of neighboring amino acids. FLS is a fragment of local symmetry suggested in [45]. Here T = 3 and N = 15 are used.
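A descriptor of this kind can be sketched as a sum of correlation weights over single amino acids, neighboring pairs, and a code for the fragments of local symmetry. Everything concrete here is an assumption made for illustration: the weights are toy numbers, the FLS code is taken to be "xyx" followed by the number of such fragments, and features missing from the weight table contribute zero.

```python
def count_xyx(sequence):
    """Number of three-letter windows of the form xyx (x differs from y)."""
    return sum(
        1
        for i in range(len(sequence) - 2)
        if sequence[i] == sequence[i + 2] and sequence[i] != sequence[i + 1]
    )


def dcw(sequence, weights):
    """Sum of correlation weights of residues, adjacent pairs, and the FLS code."""
    singles = sum(weights.get(a, 0.0) for a in sequence)
    pairs = sum(weights.get(sequence[i:i + 2], 0.0) for i in range(len(sequence) - 1))
    fls = weights.get(f"xyx{count_xyx(sequence)}", 0.0)  # hypothetical code format
    return singles + pairs + fls


# Toy correlation weights, not taken from the study.
toy_weights = {"Y": 0.5, "P": 0.25, "YP": 1.0, "PY": -0.5, "xyx2": -1.0}
```

With these toy weights, `dcw("YPYP", toy_weights)` sums 1.5 from the residues, 1.5 from the pairs, and −1.0 from the FLS code.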

3. Results

Models for five random splits of available data into the active training, passive training, calibration, and validation sets are as follows:
pIC50 = 1.491 (±0.045) + 0.5435 (±0.0082) × DCW(3, 15)    (3)
pIC50 = 2.395 (±0.035) + 0.3710 (±0.0063) × DCW(3, 15)    (4)
pIC50 = 2.646 (±0.040) + 0.2532 (±0.0062) × DCW(3, 15)    (5)
pIC50 = 2.366 (±0.049) + 0.3063 (±0.0070) × DCW(3, 15)    (6)
pIC50 = 1.314 (±0.051) + 0.4341 (±0.0075) × DCW(3, 15)    (7)
Table 1 contains the statistical characteristics of models for pIC50 related to considered peptides calculated with Equations (3)–(7).
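Applying one of the five linear models to a descriptor value is straightforward, as the sketch below shows. The intercepts and slopes are those listed above for splits 1 to 5; the names are hypothetical. Note that each split has its own set of correlation weights, so a DCW value is only meaningful with the model of the split for which it was computed.

```python
# (intercept, slope) for splits 1..5, as listed in the text.
MODELS = [
    (1.491, 0.5435),
    (2.395, 0.3710),
    (2.646, 0.2532),
    (2.366, 0.3063),
    (1.314, 0.4341),
]


def predict_pic50(dcw_value: float, split: int) -> float:
    """pIC50 = C0 + C1 * DCW(3, 15) for the chosen split (1-based index)."""
    if not 1 <= split <= len(MODELS):
        raise ValueError("split must be between 1 and 5")
    intercept, slope = MODELS[split - 1]
    return intercept + slope * dcw_value
```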
One can see that for all five splits into training and validation sets, the predictive potential (statistics on the validation set) is quite good. At the same time, the determination coefficients for the active and passive training sets are very poor. As noted above, this is due to the use of the vector of ideality of correlation in the corresponding Monte Carlo calculations. Indeed, at the stage of the active and passive training sets, the model is not yet complete; it becomes complete with the results shown for the calibration set, which is still part of the modeling phase. The results for the calibration set thus represent those of the finished model. The results for the validation set are obtained using peptides never used by the software. Thus, it is appropriate to compare the results of the validation set with those of the calibration set. Graphically, the situation under discussion can be illustrated by the splitting of correlation clusters (Figure 1). In fact, instead of the traditional single correlations, the sets are divided into pairs of correlations. These clusters are defined by the sign of the difference between the observed and model values. If this difference is positive (including zero), then the corresponding point is indicated in green; if it is negative, the corresponding point is indicated in red. Figure 1 shows that such clustering occurs for all four sets considered (active training, passive training, calibration, and validation sets). However, these clusters for the training sets are located quite far from each other, while for the calibration and validation sets, these clusters are much closer.
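The clustering rule described above (green for a non-negative difference between observed and model values, red for a negative one) can be sketched as follows. The function name is hypothetical and the sketch only reproduces the colour assignment, not the plotting.

```python
def split_by_residual_sign(observed, predicted):
    """Split (observed, predicted) pairs into 'green' (obs - pred >= 0) and 'red' (< 0)."""
    green, red = [], []
    for obs, pred in zip(observed, predicted):
        if obs - pred >= 0:
            green.append((obs, pred))
        else:
            red.append((obs, pred))
    return green, red
```

Fitting a separate regression line through each of the two groups would then show how far apart the two clusters lie for a given set.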
Table 2 contains an example of the technical details of the model obtained with split 1 for the validation set (Equation (3)).
Peptides are quite a popular object of research, both in medical aspects [46,47] and in food chemistry [48]. The study discussed here is useful in both of these areas. The presented approach can also be applied to modeling arbitrary biological phenomena based on the behavior of peptides. Thus, the considered approach can be used for practical applications [33]. Table 3 contains a comparison of the statistical quality of the suggested models with the statistical quality of analogous models from the literature. It shows that the predictive potential of the models obtained by the described approach is better than that of the models described in the literature [42].
Carrying out several runs of the described optimization procedure makes it possible to determine the probabilistic mechanistic interpretation of the considered models. Table 4 presents the results of five runs of the Monte Carlo optimization procedure. It can be seen that the absence of fragments of local symmetry is favorable for increasing the studied endpoint. At the same time, the presence of [xyx2] is favorable for reducing the endpoint value. Some examples of the individual effects of amino acids and their combinations are also presented in Table 4. It is obvious, however, that for a constructive comparison of the features of the peptides under consideration, it is necessary to take into account the frequencies of their presence in the active and passive training sets, as well as in the calibration set. From this point of view, the case of proline (P) is the most reliable, while the case of arginine (R) is the least reliable.
Based on consideration of these data and assuming that they are true, one can attempt to design the activity of peptides that are not in the peptide inhibitory activity database used here. The following peptides (combinations of amino acids) are not in the database used: EGGY, YPYP, ALLG, PLLP, PLPG, LYLP, FYYF, YFYF, PGPF, and LFFL. Figure 2 shows the relative values of the inhibitory activity of these peptides.
Supplementary Materials (Tables S1–S5) contain the technical details of the described models for all splits considered.

4. Discussion

This study was planned as a means of testing the ability of local symmetry fragments to improve the predictive potential of models obtained based on optimal descriptors, which are the sum of the correlation weights of the codes forming quasi-SMILES. These quasi-SMILES are unusual in that they are built on the use of amino acids as independent, indivisible parts of the peptide structure. The fragments of local symmetry proposed here are very ambiguous since, on the one hand, they certainly exist as a special concrete phenomenon in the structure of peptides, and on the other hand, they are “almost” not related to traditional mathematical symmetry. The only more or less mathematical term that can be associated with fragments of local symmetry is equivalent positions. For example, for ‘xyx’, both ‘x’ are in equivalent positions, but only for a selected fragment of the peptide (the sequence of amino acids).
At the same time, triplets of symbols are well-known carriers of information in biology: a codon in DNA or RNA is a triplet of nucleotides, e.g., AAA, AGA, and GGG are codons [49]. By analogy, considering triplets of amino acids as carriers of some topological symmetry may be a useful marker in the study of peptides, at least in the QSPR/QSAR aspects [49].
Regarding the mechanistic interpretation of the model, it should be noted that there are some logical inconsistencies between the data presented in Table 4 and the results of the virtual analysis presented in Figure 2. Firstly, according to Table 4, FLS of the type ‘xyyx0’ (but not ‘xyyx1’) should contribute to an increase in inhibitory activity. However, from Figure 2, it is clear that the highest values of the endpoint are observed for the case ‘xyyx1’ (PLLP, FYYF, and LFFL). Secondly, ‘xyx2’, according to Table 4, should help reduce inhibitory activity. However, according to Figure 2, this feature accompanies medium activity levels but not low ones. These inconsistencies can be resolved by using a larger database. In particular, it is quite promising to carry out similar computational experiments with peptides represented by chains of amino acids of various lengths.
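The pattern assignments discussed above can be checked with a short sketch that counts 'xyx' and 'xyyx' fragments in the ten designed peptides. Reading the digit in codes such as 'xyx2' or 'xyyx1' as the number of fragments of that type is an assumption, and the helper names are hypothetical.

```python
import re

DESIGNED = ["EGGY", "YPYP", "ALLG", "PLLP", "PLPG",
            "LYLP", "FYYF", "YFYF", "PGPF", "LFFL"]

# Lookahead patterns so that overlapping fragments are all counted;
# (?!\1) enforces that y differs from x.
PATTERNS = {
    "xyx": re.compile(r"(?=(.)(?!\1)(.)\1)"),
    "xyyx": re.compile(r"(?=(.)(?!\1)(.)\2\1)"),
}


def count_fragments(sequence: str, kind: str) -> int:
    """Count fragments of the given kind ('xyx' or 'xyyx') in a sequence."""
    return len(PATTERNS[kind].findall(sequence))


def fls_table(peptides):
    """Map each peptide to its (xyx count, xyyx count)."""
    return {p: (count_fragments(p, "xyx"), count_fragments(p, "xyyx")) for p in peptides}
```

Under this reading, PLLP, FYYF, and LFFL each contain one 'xyyx' fragment, YPYP and YFYF each contain two 'xyx' fragments, and EGGY and ALLG contain neither.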
However, the main goal is to assess the predictive potential of models based on Monte Carlo calculations. The main advantage of the Monte Carlo method is the use of randomness to solve problems that, in principle, may be more or less deterministic. Nowadays, computers easily supply randomness to the interested user through random variable generators [50,51,52,53,54,55,56,57,58,59,60]. The models considered are certainly largely random. The splitting of available peptides into training sets and validation sets is random. Modifications of correlation weights are random. The sequence of calls to correlation weights to modify them in the stochastic Monte Carlo process is random. However, an important result of this and other works is that there is reproducibility of the results, which consists of the obvious similarity of the determination coefficients for the calibration set and the validation set for all five random partitions considered. Last but not least, quasi-SMILES is a user-friendly language for formulating problems solved by a computer [61].
The approach used here obeys the OECD principles (defined endpoint; unambiguous algorithm; defined applicability domain; measure of predictive potential; mechanistic interpretation). Possible ways to improve the represented approach are as follows: 1. Extending the work set of peptides by taking into account the ones of different lengths. 2. Considering more diversity of fragments of local symmetry (‘xxyxx’, ‘xyyyx’, etc.). 3. Improving the algorithm of Monte Carlo optimization via the use of new statistical criteria [62]. In addition, quasi-SMILES can be a suitable representation for polymers [37] and nanomaterials [63].

5. Conclusions

The optimal descriptor calculated using quasi-SMILES (sequence of amino acids) by the Monte Carlo method using the Internet-available software “CORAL-2023” (http://www.insilico.eu/coral, accessed on 20 May 2024) can be a satisfactory basis for predicting the angiotensin-converting enzyme (ACE) inhibitory activity of peptides, expressed through pIC50 values. Recently proposed local symmetry fragments can be used to improve the predictive potential of the model by assigning them correlation weights through optimization, also carried out by the Monte Carlo method. The index of ideality of correlation and the correlation intensity index have once more confirmed their suitability as tools to improve the predictive potential of the model. The quasi-SMILES concept may have applications not only for structure-property/activity studies of peptides but also for other substances (such as copolymers and nanomaterials).

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/macromol4020022/s1. Tables S1–S5 contain technical details of the models obtained for split 1 to split 5, respectively.

Author Contributions

Conceptualization, A.P.T., A.A.T., A.R. and E.B.; data curation, A.P.T., A.A.T., A.R. and E.B.; writing—original draft preparation, A.P.T., A.A.T., A.R. and E.B.; writing—review and editing, A.P.T., A.A.T., A.R. and E.B.; supervision, A.R. and E.B.; project administration, E.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the data are available within the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Speck-Planche, A.; Cordeiro, M.N.D.S. Computer-aided drug design methodologies toward the design of anti-hepatitis C agents. Curr. Top. Med. Chem. 2012, 12, 802–813.
  2. Chen, C.Y.-C. A novel integrated framework and improved methodology of computer-aided drug design. Curr. Top. Med. Chem. 2013, 13, 965–988.
  3. Raevsky, O.A.; Mukhametov, A.; Grigorev, V.Y.; Ustyugov, A.; Tsay, S.-C.; Hwu, R.J.-R.; Yarla, N.S.; Tarasov, V.V.; Aliev, G.; Bachurin, S.O. Applications of multi-target computer-aided methodologies in molecular design of CNS drugs. Curr. Med. Chem. 2018, 25, 5293–5314.
  4. Klopman, G.; Ptchelintsev, D. Antifungal triazole alcohols: A comparative analysis of structure-activity, structure-teratogenicity and structure-therapeutic index relationships using the Multiple Computer-Automated Structure Evaluation (Multi-CASE) methodology. J. Comput.-Aided Mol. Des. 1993, 7, 349–362.
  5. Klopman, G.; Ptchelintsev, D. Application of the Computer Automated Structure Evaluation Methodology to a QSAR Study of Chemoreception. Aromatic Musky Odorants. J. Agric. Food Chem. 1992, 40, 2244–2251.
  6. Gordeeva, E.V.; Molchanova, M.S.; Zefirov, N.S. General methodology and computer program for the exhaustive restoring of chemical structures by molecular connectivity indexes. Solution of the inverse problem in QSAR/QSPR. Tetrahedron Comput. Methodol. 1990, 3 Pt B, 389–415.
  7. Speck-Planche, A.; Luan, F.; Cordeiro, M. Abelson tyrosine-protein kinase 1 as principal target for drug discovery against leukemias role of the current computer-aided drug design methodologies. Curr. Top. Med. Chem. 2012, 12, 2745–2762.
  8. Bordás, B.; Kömíves, T.; Lopata, A. Ligand-based computer-aided pesticide design. A review of applications of the CoMFA and CoMSIA methodologies. Pest Manag. Sci. 2003, 59, 393–400.
  9. Scotti, L.; Scotti, M.T.; De Oliveira Lima, E.; Da Silva, M.S.; Do Carmo Alves De Lima, M.; Da Rocha Pitta, I.; De Moura, R.O.; De Oliveira, J.G.B.; Da Cruz, R.M.D.; Mendonça, F.J.B., Jr. Experimental methodologies and evaluations of computer-aided drug design methodologies applied to a series of 2-aminothiophene derivatives with antifungal activities. Molecules 2012, 17, 2298–2315.
  10. de Sousa, N.F.; Scotti, L.; de Moura, É.P.; Dos Santos Maia, M.; Rodrigues, G.C.S.; de Medeiros, H.I.R.; Lopes, S.M.; Scotti, M.T. Computer Aided Drug Design Methodologies with Natural Products in the Drug Research Against Alzheimer’s Disease. Curr. Neuropharmacol. 2022, 20, 857–885.
  11. Kumar, A.; Sindhu, J.; Kumar, P. In-silico identification of fingerprint of pyrazolyl sulfonamide responsible for inhibition of N-myristoyltransferase using Monte Carlo method with index of ideality of correlation. J. Biomol. Struct. Dyn. 2021, 39, 5014–5025.
  12. Chen, J.; Intes, X. Comparison of Monte Carlo methods for fluorescence molecular tomography-computational efficiency. Med. Phys. 2011, 38, 5788–5798.
  13. Harvey, J.-P.; Gheribi, A.E.; Chartrand, P. Accurate determination of the Gibbs energy of Cu-Zr melts using the thermodynamic integration method in Monte Carlo simulations. J. Chem. Phys. 2011, 135, 084502.
  14. Chen, H.-C.; Lin, L.-C. Computing Mixture Adsorption in Porous Materials through Flat Histogram Monte Carlo Methods. Langmuir 2023, 39, 15380–15390.
  15. Golubović, M.; Lazarević, M.; Zlatanović, D.; Krtinić, D.; Stoičkov, V.; Mladenović, B.; Milić, D.J.; Sokolović, D.; Veselinović, A.M. The anesthetic action of some polyhalogenated ethers—Monte Carlo method based QSAR study. Comput. Biol. Chem. 2018, 75, 32–38.
  16. Kumar, P.; Kumar, A. Nucleobase sequence based building up of reliable QSAR models with the index of ideality correlation using Monte Carlo method. J. Biomol. Struct. Dyn. 2020, 38, 3296–3306.
  17. Geoghegan, T.J.; Nelson, N.P.; Flynn, R.T.; Hill, P.M.; Rana, S.; Hyer, D.E. Design of a focused collimator for proton therapy spot scanning using Monte Carlo methods. Med. Phys. 2020, 47, 2725–2734.
  18. Chopdar, K.S.; Dash, G.C.; Mohapatra, P.K.; Nayak, B.; Raval, M.K. Monte-Carlo method-based QSAR model to discover phytochemical urease inhibitors using SMILES and GRAPH descriptors. J. Biomol. Struct. Dyn. 2022, 40, 5090–5099.
  19. Zhang, X.; Chong, K.H.; Zhu, L.; Zheng, J. A Monte Carlo method for in silico modeling and visualization of Waddington’s epigenetic landscape with intermediate details. BioSystems 2020, 198, 104275.
  20. Kumar, P.; Kumar, A. In silico enhancement of azo dye adsorption affinity for cellulose fibre through mechanistic interpretation under guidance of QSPR models using Monte Carlo method with index of ideality correlation. SAR QSAR Environ. Res. 2020, 31, 697–715.
  21. Estrada, E.; Guevara, N.; Gutman, I. Extension of edge connectivity index. Relationships to line graph indices and QSPR applications. J. Chem. Inf. Comput. Sci. 1998, 38, 428–431.
  22. Estrada, E.; González, H. What are the limits of applicability for graph theoretic descriptors in QSPR/QSAR? Modeling dipole moments of aromatic compounds with TOPS-MODE descriptors. J. Chem. Inf. Comput. Sci. 2003, 43, 75–84.
  23. Ahmadi, S.; Mehrabi, M.; Rezaei, S.; Mardafkan, N. Structure-activity relationship of the radical scavenging activities of some natural antioxidants based on the graph of atomic orbitals. J. Mol. Struct. 2019, 1191, 165–174.
  24. Toropova, A.P.; Toropov, A.A.; Rasulev, B.F.; Benfenati, E.; Gini, G.; Leszczynska, D.; Leszczynski, J. QSAR models for ACE-inhibitor activity of tripeptides based on representation of the molecular structure by graph of atomic orbitals and SMILES. Struct. Chem. 2012, 23, 1873–1878.
  25. Toropov, A.A.; Toropova, A.P. QSPR modeling of alkanes properties based on graph of atomic orbitals. J. Mol. Struct. THEOCHEM 2003, 637, 1–10.
  26. Toropov, A.A.; Toropova, A.P.; Raska, I., Jr.; Benfenati, E.; Gini, G. QSAR modeling of endpoints for peptides which is based on representation of the molecular structure by a sequence of amino acids. Struct. Chem. 2012, 23, 1891–1904.
  27. Toropova, A.P.; Toropov, A.A.; Kumar, P.; Kumar, A.; Achary, P.G.R. Fragments of local symmetry in a sequence of amino acids: Does one can use for QSPR/QSAR of peptides? J. Mol. Struct. 2023, 1293, 136300.
  28. Weininger, D. SMILES, a Chemical Language and Information System: 1: Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
  29. Toropova, A.P.; Toropov, A.A.; Rallo, R.; Leszczynska, D.; Leszczynski, J. Optimal descriptor as a translator of eclectic data into prediction of cytotoxicity for metal oxide nanoparticles under different conditions. Ecotoxicol. Environ. Saf. 2015, 112, 39–45.
  30. Trinh, T.X.; Choi, J.-S.; Jeon, H.; Byun, H.-G.; Yoon, T.-H.; Kim, J. Quasi-SMILES-based Nano-Quantitative Structure-Activity Relationship model to predict the cytotoxicity of multiwalled carbon nanotubes to human lung cells. Chem. Res. Toxicol. 2018, 31, 183–190.
  31. Toropov, A.A.; Toropova, A.P.; Leszczynska, D.; Leszczynski, J. “Ideal correlations” for biological activity of peptides. BioSystems 2019, 181, 51–57.
  32. Toropova, A.P.; Toropov, A.A.; Benfenati, E.; Leszczynska, D.; Leszczynski, J. Prediction of antimicrobial activity of large pool of peptides using quasi-SMILES. BioSystems 2018, 169–170, 5–12.
  33. Moinul, M.; Khatun, S.; Abdul Amin, S.; Jha, T.; Gayen, S. Quasi-SMILES as a tool for peptide QSAR modelling. In QSPR/QSAR Analysis Using SMILES and Quasi-SMILES; Toropova, A.P., Toropov, A.A., Eds.; Challenges and Advances in Computational Chemistry and Physics; Springer: Cham, Switzerland, 2023; Volume 33, pp. 269–294.
  34. Kumar, P.; Kumar, A.; Sindhu, J.; Lal, S. Quasi-SMILES as a basis for the development of QSPR models to predict the CO2 capture capacity of deep eutectic solvents using correlation intensity index and consensus modelling. Fuel 2023, 345, 128237.
  35. Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E.; Leszczynska, D.; Leszczynski, J. CORAL: Model of Ecological Impact of Heavy Metals on Soils via the Study of Modification of Concentration of Biomolecules in Earthworms (Eisenia fetida). Arch. Environ. Contam. Toxicol. 2023, 84, 504–515.
  36. Manganelli, S.; Benfenati, E. Nano-QSAR model for predicting cell viability of human embryonic kidney cells. Methods Mol. Biol. 2017, 1601, 275–290.
  37. Toropova, A.P.; Toropov, A.A. Nanomaterials: Quasi-SMILES as a flexible basis for regulation and environmental risk assessment. Sci. Total Environ. 2022, 823, 153747.
  38. Toropova, A.P.; Toropov, A.A. Quasi-SMILES as a basis to build up models of endpoints for nanomaterials. Environ. Technol. 2023, 44, 4460–4467.
  39. Toropov, A.A.; Kjeldsen, F.; Toropova, A.P. Use of quasi-SMILES to build models based on quantitative results from experiments with nanomaterials. Chemosphere 2022, 303, 135086.
  40. Toropova, A.P.; Meneses, J.; Alfaro-Moreno, E.; Toropov, A.A. The system of self-consistent models based on quasi-SMILES as a tool to predict the potential of nano-inhibitors of human lung carcinoma cell line A549 for different experimental conditions. Drug Chem. Toxicol. 2023, 47, 306–313. [Google Scholar] [CrossRef]
  41. Muhammad, F.; Awais, M.M.; Akhtar, M.; Anwar, M.I. Quantitative structure activity relationship and risk analysis of some pesticides in the goat milk. Iran. J. Environ. Health Sci. Eng. 2013, 10, 4. [Google Scholar] [CrossRef]
  42. Du, A.; Jia, W. Bioaccessibility of novel antihypertensive short-chain peptides in goat milk using the INFOGEST static digestion model by effect-directed assays. Food Chem. 2023, 427, 136735. [Google Scholar] [CrossRef]
  43. Toropov, A.A.; Di Nicola, M.R.; Toropova, A.P.; Roncaglioni, A.; Dorne, J.L.C.M.; Benfenati, E. Quasi-SMILES: Self-consistent models for toxicity of organic chemicals to tadpoles. Chemosphere 2023, 312, 137224. [Google Scholar] [CrossRef] [PubMed]
  44. Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E. Monte Carlo technique to study the adsorption affinity of azo dyes by applying new statistical criteria of the predictive potential. SAR QSAR Environ. Res. 2022, 33, 621–630. [Google Scholar] [CrossRef]
  45. Toropov, A.A.; Toropova, A.P.; Roncaglioni, A.; Benfenati, E. In silico prediction of the mutagenicity of nitroaromatic compounds using correlation weights of fragments of local symmetry. Mutat. Res. Genet. Toxicol. Environ. Mutagen. 2023, 891, 503684. [Google Scholar] [CrossRef] [PubMed]
  46. Sigala-Robles, R.; Santiago-López, L.; Hernández-Mendoza, A.; Vallejo-Cordoba, B.; Mata-Haro, V.; Wall-Medrano, A.; González-Córdova, A.F. Peptides, exopolysaccharides, and short-chain fatty acids from fermented milk and perspectives on inflammatory bowel diseases. Dig. Dis. Sci. 2022, 67, 4654–4665. [Google Scholar] [CrossRef] [PubMed]
  47. Lewandowski, B.; Wennemers, H. Asymmetric catalysis with short-chain peptides. Curr. Opin. Chem. Biol. 2014, 22, 40–46. [Google Scholar] [CrossRef] [PubMed]
  48. Liu, T.; Sun, Z.; Yang, Z.; Qiao, X. Microbiota-derived short-chain fatty acids and modulation of host-derived peptides formation: Focused on host defense peptides. Biomed. Pharmacother. 2023, 162, 114586. [Google Scholar] [CrossRef] [PubMed]
  49. Lenstra, R. The graph, geometry and symmetries of the genetic code with hamming metric. Symmetry 2015, 7, 1211–1260. [Google Scholar] [CrossRef]
  50. Rehm, L.; Morshed, M.G.; Misra, S.; Shukla, A.; Rakheja, S.; Pinarbasi, M.; Ghosh, A.W.; Kent, A.D. Temperature-resilient random number generation with stochastic actuated magnetic tunnel junction devices. Appl. Phys. Lett. 2024, 124, 052401. [Google Scholar] [CrossRef]
  51. Liman, W.; Oubahmane, M.; Hdoufane, I.; Bjij, I.; Villemin, D.; Daoud, R.; Cherqaoui, D.; Allali, A.E. Monte Carlo method and GA-MLR-based QSAR modeling of NS5A inhibitors against the hepatitis C virus. Molecules 2022, 27, 2729. [Google Scholar] [CrossRef]
  52. Gálvez-Llompart, M.; Sastre, G. Machine Learning Search for Suitable Structure Directing Agents for the Synthesis of Beta (BEA) Zeolite Using Molecular Topology and Monte Carlo Techniques. In AI-Guided Design and Property Prediction for Zeolites and Nanoporous Materials; Sastre, G., Daeyaert, F., Eds.; Wiley: Hoboken, NJ, USA, 2023; pp. 61–80. [Google Scholar] [CrossRef]
  53. Ahmadi, S.; Lotfi, S.; Afshari, S.; Kumar, P.; Ghasemi, E. CORAL: Monte Carlo based global QSAR modelling of Bruton tyrosine kinase inhibitors using hybrid descriptors. SAR QSAR Environ. Res. 2021, 32, 1013–1031. [Google Scholar] [CrossRef]
  54. Antović, A.R.; Karadžić, R.; Veselinović, A.M. Monte Carlo optimization method based QSAR modeling of postmortem redistribution of structurally diverse drugs. New J. Chem. 2022, 46, 14731–14737. [Google Scholar] [CrossRef]
  55. Ouabane, M.; Tabti, K.; Hajji, H.; Elbouhi, M.; Khaldan, A.; Elkamel, K.; Sbai, A.; Ajana, M.A.; Sekkate, C.; Bouachrine, M.; et al. Structure-odor relationship in pyrazines and derivatives: A physicochemical study using 3D-QSPR, HQSPR, Monte Carlo, molecular docking, ADME-Tox and molecular dynamics. Arab. J. Chem. 2023, 16, 105207. [Google Scholar] [CrossRef]
  56. Antović, A.; Karadžić, R.; Živković, J.V.; Veselinović, A.M. Development of QSAR Model Based on Monte Carlo optimization for predicting GABAA receptor binding of newly emerging benzodiazepines. Acta Chim. Slov. 2023, 70, 634–641. [Google Scholar] [CrossRef] [PubMed]
  57. Tabti, K.; Abdessadak, O.; Sbai, A.; Maghat, H.; Bouachrine, M.; Lakhlifi, T. Design and development of novel spiro-oxindoles as potent antiproliferative agents using quantitative structure activity based Monte Carlo method, docking molecular, molecular dynamics, free energy calculations, and pharmacokinetics/toxicity studies. J. Mol. Struct. 2023, 1284, 135404. [Google Scholar] [CrossRef]
  58. Nikolić, N.; Kostić, T.; Golubović, M.; Nikolić, T.; Marinković, M.; Perić, V.; Mladenović, S.; Veselinović, A.M. Monte Carlo optimization based QSAR modeling of angiotensin II receptor antagonists. Acta Chim. Slov. 2023, 70, 318–326. [Google Scholar] [CrossRef]
  59. Lotfi, S.; Ahmadi, S.; Kumar, P. Ecotoxicological prediction of organic chemicals toward Pseudokirchneriella subcapitata by Monte Carlo approach. RSC Adv. 2022, 12, 24988–24997. [Google Scholar] [CrossRef] [PubMed]
  60. Ahmadi, S.; Ghanbari, H.; Lotfi, S.; Azimi, N. Predictive QSAR modeling for the antioxidant activity of natural compounds derivatives based on Monte Carlo method. Mol. Divers. 2021, 25, 87–97. [Google Scholar] [CrossRef] [PubMed]
  61. Drefahl, A. CurlySMILES: A chemical language to customize and annotate encodings of molecular and nanodevice structures. J. Cheminf. 2011, 3, 1. [Google Scholar] [CrossRef] [PubMed]
  62. Toropova, A.P.; Toropov, A.A. The coefficient of conformism of a correlative prediction (CCCP): Building up reliable nano-QSPRs/QSARs for endpoints of nanoparticles in different experimental conditions encoded via quasi-SMILES. Sci. Total Environ. 2024, 927, 172119. [Google Scholar] [CrossRef]
  63. Toropova, A.P.; Toropov, A.A.; Manganelli, S.; Leone, C.; Baderna, D.; Benfenati, E.; Fanelli, R. Quasi-SMILES as a tool to utilize eclectic data for predicting the behavior of nanomaterials. NanoImpact 2016, 1, 60–64. [Google Scholar] [CrossRef]
Figure 1. The graphical representation of the model for pIC50 of short-chain peptides from goat milk for split #1.
Figure 2. The relative levels of inhibitory activity predicted by the model obtained for split #1 for tetrapeptides that are absent from the database considered.
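Table 1. The statistical characteristics of models of inhibitory activity observed for five random splits into the training and validation sub-systems.

| Split/Eq. | Set * | n | R2 | IIC | CII | Q2 | Q2F1 | Q2F2 | Q2F3 | <Rm2> | MAE | F | Nact |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1/3 | A | 63 | 0.4807 | 0.6303 | 0.7322 | 0.4501 | | | | | 0.783 | 56 | |
| | P | 62 | 0.2088 | 0.4045 | 0.7011 | 0.1430 | | | | | 0.967 | 16 | |
| | C | 70 | 0.7786 | 0.8822 | 0.8910 | 0.7638 | 0.7844 | 0.7583 | 0.8475 | 0.6895 | 0.379 | 239 | |
| | V | 71 | 0.7649 | - | - | - | - | - | - | - | 0.38 | - | 28 |
| 2/4 | A | 69 | 0.3732 | 0.5600 | 0.7187 | 0.3399 | | | | | 0.892 | 40 | |
| | P | 66 | 0.3837 | 0.6073 | 0.7125 | 0.3486 | | | | | 0.840 | 40 | |
| | C | 65 | 0.7876 | 0.8869 | 0.8767 | 0.7745 | 0.8024 | 0.7875 | 0.8681 | 0.6886 | 0.375 | 234 | |
| | V | 66 | 0.7867 | - | - | - | - | - | - | - | 0.33 | - | 25 |
| 3/5 | A | 71 | 0.2836 | 0.5178 | 0.6979 | 0.2391 | | | | | 0.939 | 27 | |
| | P | 64 | 0.2507 | 0.3968 | 0.7318 | 0.2000 | | | | | 0.945 | 21 | |
| | C | 65 | 0.7654 | 0.8748 | 0.8793 | 0.7493 | 0.7869 | 0.7580 | 0.8763 | 0.5959 | 0.342 | 206 | |
| | V | 66 | 0.8339 | - | - | - | - | - | - | - | 0.30 | - | 25 |
| 4/6 | A | 71 | 0.2832 | 0.4622 | 0.6927 | 0.2418 | | | | | 0.938 | 27 | |
| | P | 69 | 0.4217 | 0.5024 | 0.7158 | 0.3913 | | | | | 0.904 | 49 | |
| | C | 62 | 0.8326 | 0.9120 | 0.8942 | 0.8219 | 0.8106 | 0.8051 | 0.8913 | 0.5924 | 0.341 | 299 | |
| | V | 64 | 0.8083 | - | - | - | - | - | - | - | 0.32 | - | 22 |
| 5/7 | A | 65 | 0.3926 | 0.6076 | 0.7115 | 0.3582 | | | | | 0.859 | 41 | |
| | P | 67 | 0.5001 | 0.6846 | 0.7267 | 0.4753 | | | | | 0.908 | 65 | |
| | C | 66 | 0.7889 | 0.8880 | 0.8561 | 0.7760 | 0.7930 | 0.7827 | 0.9122 | 0.7011 | 0.299 | 239 | |
| | V | 68 | 0.7748 | - | - | - | - | - | - | - | 0.34 | - | 26 |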
(*) A = active training set; P = passive training set; C = calibration set; V = validation set; R2 = determination coefficient; IIC = index of ideality of correlation; CII = correlation intensity index; Q2, Q2F1, Q2F2, and Q2F3 are statistical criteria suggested in the literature [4]; <Rm2> is a metric suggested in [5]; MAE = mean absolute error; F = Fisher F-ratio; Nact is the number of parameters under optimization.
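The averages quoted in the abstract (0.36 ± 0.07 for the active training sets and 0.79 ± 0.02 for the validation sets) can be cross-checked against the R2 values in Table 1. The Python sketch below is only an illustration. The helper names are hypothetical, and two points are assumptions rather than statements from the paper: the spread is taken as the population standard deviation (divisor n), and F is computed with the usual one-descriptor formula R2(n − 2)/(1 − R2). Both choices happen to agree with the tabulated figures.

```python
from math import sqrt

# R2 values copied from Table 1: active training (A) and validation (V) rows.
r2_active = [0.4807, 0.3732, 0.2836, 0.2832, 0.3926]
r2_validation = [0.7649, 0.7867, 0.8339, 0.8083, 0.7748]


def mean_and_spread(values):
    """Return the mean and the population standard deviation (divisor n).

    The divisor n is an assumption; the section does not say which
    estimator lies behind the "+/-" values in the abstract.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, sqrt(variance)


def f_ratio(r2, n):
    """F-ratio of a one-descriptor linear model (assumed formula)."""
    return r2 / (1.0 - r2) * (n - 2)


mean_a, sd_a = mean_and_spread(r2_active)
mean_v, sd_v = mean_and_spread(r2_validation)

print(f"active training: {mean_a:.2f} +/- {sd_a:.2f}")
print(f"validation:      {mean_v:.2f} +/- {sd_v:.2f}")
# Split 1, calibration set: R2 = 0.7786, n = 70 (Table 1 lists F = 239).
print(f"F (split 1, calibration): {f_ratio(0.7786, 70):.0f}")
```

The same `f_ratio` call with the A and P rows (for example R2 = 0.4807, n = 63) gives values close to the F column, which is a quick consistency check on the reconstructed table.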
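Table 2. The technical details of the model observed in the case of split 1 for the validation set.

| Quasi-SMILES | DCW(3, 15) | pIC50 (Expr) | pIC50 (Calc) | Defect of Quasi-SMILES | Applicability Domain * |
|---|---|---|---|---|---|
| ADDA | 4.9121 | 3.8300 | 4.1611 | 5.0143 | YES |
| AEEL | 5.5666 | 4.2400 | 4.5168 | 5.0078 | YES |
| AFFL | 4.6788 | 4.2000 | 4.0344 | 3.0151 | YES |
| AGAG | 0.5240 | 2.6000 | 1.7763 | 3.0179 | YES |
| AKKK | 7.6277 | 5.4900 | 5.6370 | 2.0211 | YES |
| AYAY | 4.2358 | 4.0600 | 3.7936 | 3.0133 | YES |
| DGDG | 1.5879 | 2.1500 | 2.3545 | 2.0099 | YES |
| FAAL | 5.1709 | 4.5800 | 4.3018 | 2.0226 | YES |
| FFFP | 4.5058 | 4.9200 | 3.9403 | 3.0170 | YES |
| FGGK | 3.8671 | 3.8000 | 3.5932 | 2.0221 | YES |
| GSGS | 2.0150 | 2.4200 | 2.5866 | 5.0099 | YES |
| GYGY | 2.6355 | 3.6300 | 2.9238 | 3.0110 | YES |
| IAAE | 5.5930 | 4.4600 | 4.5312 | 3.0191 | YES |
| IAAQ | 5.5433 | 4.4600 | 4.5042 | 3.0191 | YES |
| IGIG | 2.2514 | 2.9200 | 2.7151 | 3.0142 | YES |
| IKKP | 7.4584 | 5.6800 | 5.5450 | 3.0184 | YES |
| IPIP | 3.3166 | 3.8900 | 3.2940 | 3.0171 | YES |
| IRRA | 5.2918 | 5.0100 | 4.3675 | 2.0370 | YES |
| ITTF | 5.9388 | 4.3100 | 4.7191 | 4.0076 | YES |
| LWLW | 5.6253 | 4.4500 | 4.5488 | 3.0262 | YES |
| LYLY | 4.5575 | 4.4100 | 3.9684 | 3.0093 | YES |
| MYMY | 5.8750 | 3.7100 | 4.6845 | 5.0054 | YES |
| PLPL | 4.4238 | 3.4700 | 3.8957 | 0.0292 | YES |
| RARA | 2.1150 | 3.3400 | 2.6409 | 0.0413 | YES |
| RFRF | 2.8395 | 3.7900 | 3.0347 | 3.0406 | YES |
| RGGP | 4.2545 | 4.2700 | 3.8037 | 1.0406 | YES |
| YEEY | 6.6675 | 5.4000 | 5.1152 | 3.0074 | YES |
| YGGY | 5.2962 | 4.8300 | 4.3699 | 2.0189 | YES |
| YLYL | 4.5575 | 3.9900 | 3.9684 | 3.0093 | YES |
| YNYN | 4.9150 | 4.2900 | 4.1627 | 5.0054 | YES |
| YPPR | 5.7054 | 4.7800 | 4.5923 | 2.0278 | YES |
| YPYY | 4.8573 | 4.0500 | 4.1314 | 2.0182 | YES |
| FPFP | 2.7609 | 3.5000 | 2.9920 | 3.0201 | YES |
| FPPF | 4.9269 | 4.6800 | 4.1692 | 2.0245 | YES |
| FQQP | 5.4828 | 4.9200 | 4.4713 | 4.0098 | YES |
| FVAP | 6.6277 | 5.0000 | 5.0935 | 1.0501 | YES |
| FYFY | 4.7808 | 4.6300 | 4.0898 | 3.0126 | YES |
| GDGD | 1.5879 | 2.0400 | 2.3545 | 2.0099 | YES |
| GEEG | 4.4691 | 3.7200 | 3.9204 | 5.0120 | YES |
| GLGL | 1.0321 | 2.6000 | 2.0524 | 0.0631 | YES |
| GNGN | 1.7778 | 2.8900 | 2.4577 | 2.0099 | YES |
| GQGQ | 2.1003 | 2.1500 | 2.6330 | 5.0099 | YES |
| GRRP | 4.8240 | 4.7000 | 4.1133 | 3.0380 | YES |
| KAKA | 2.1131 | 3.4200 | 2.6399 | 0.0224 | YES |
| KGKG | 1.9467 | 2.4900 | 2.5495 | 3.0201 | YES |
| KPPF | 5.2113 | 4.4900 | 4.3238 | 2.0214 | YES |
| LDDP | 5.5989 | 4.3700 | 4.5344 | 4.0081 | YES |
| LEEE | 6.4487 | 4.0000 | 4.9963 | 6.0039 | No |
| LEEL | 6.9531 | 4.8100 | 5.2704 | 5.0103 | YES |
| LFLF | 4.1444 | 3.4600 | 3.7439 | 3.0155 | YES |
| LGGI | 4.1162 | 4.5400 | 3.7286 | 1.0338 | YES |
| LGGL | 4.0173 | 4.4800 | 3.6748 | 0.0546 | YES |
| LIYP | 6.4530 | 5.0000 | 4.9986 | 3.0108 | YES |
| LKKA | 6.3239 | 5.0700 | 4.9284 | 2.0180 | YES |
| LLLF | 5.0405 | 4.1000 | 4.2309 | 1.0239 | YES |
| LLLP | 5.2143 | 4.8000 | 4.3254 | 0.0287 | YES |
| LNNP | 5.7607 | 4.2400 | 4.6223 | 5.0081 | YES |
| LQQW | 7.0176 | 5.4200 | 5.3054 | 4.0128 | YES |
| RPPP | 5.3047 | 4.2200 | 4.3745 | 1.0338 | YES |
| RPRP | 3.3487 | 3.7400 | 3.3115 | 3.0419 | YES |
| RRRR | 6.8579 | 4.2300 | 5.2187 | 3.0600 | YES |
| RWRW | 5.2161 | 4.8000 | 4.3264 | 0.0513 | YES |
| SGSG | 2.0150 | 2.0700 | 2.5866 | 5.0099 | YES |
| SYSY | 4.9354 | 4.1800 | 4.1738 | 5.0054 | YES |
| VAAA | 5.8424 | 4.8900 | 4.6668 | 0.0573 | YES |
| VAAF | 5.2781 | 4.4500 | 4.3601 | 1.0498 | YES |
| VGGP | 4.7551 | 4.5800 | 4.0758 | 1.0298 | YES |
| VIIY | 6.9351 | 5.1200 | 5.2606 | 3.0104 | YES |
| VLLY | 6.1322 | 4.5100 | 4.8242 | 2.0164 | YES |
| VVVF | 5.9261 | 4.4500 | 4.7122 | 1.0386 | YES |
| VYVY | 5.5250 | 4.9200 | 4.4943 | 3.0128 | YES |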
(*) Applicability domain is defined by the statistical defect [40].
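In Table 2 the calculated pIC50 is a linear function of the descriptor DCW(3, 15), so the intercept and slope of the split 1 model can be recovered from a handful of rows. The Python sketch below is a minimal illustration under stated assumptions: the five rows are copied from Table 2, the coefficients are estimated here by ordinary least squares rather than quoted from the paper's equation, and the names `fit_line` and `predict` are hypothetical.

```python
# (DCW(3, 15), calculated pIC50) pairs copied from five rows of Table 2.
rows = {
    "AGAG": (0.5240, 1.7763),
    "GLGL": (1.0321, 2.0524),
    "ADDA": (4.9121, 4.1611),
    "RRRR": (6.8579, 5.2187),
    "AKKK": (7.6277, 5.6370),
}


def fit_line(points):
    """Ordinary least-squares fit of y = c0 + c1 * x; returns (c0, c1)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    c1 = sxy / sxx
    return mean_y - c1 * mean_x, c1


# Coefficients estimated from the table rows (not taken from the paper).
c0, c1 = fit_line(list(rows.values()))


def predict(dcw):
    """Calculated pIC50 for a given DCW(3, 15) under the fitted line."""
    return c0 + c1 * dcw


print(f"pIC50 ~ {c0:.3f} + {c1:.3f} * DCW(3, 15)")
# LEEE is the only row flagged outside the applicability domain;
# Table 2 lists DCW = 6.4487 and a calculated pIC50 of 4.9963 for it.
print(f"LEEE: {predict(6.4487):.2f}")
```

The fitted line reproduces the calculated column for rows that were not used in the fit, which suggests the reconstructed columns of Table 2 are split correctly. The applicability-domain flag depends on the statistical defect, not on this line, and its threshold is not given in this section.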
Table 3. Comparison of the statistical quality of models for the ACE inhibitory activities of short-chain peptides.

| Method | Determination Coefficient for Training or Calibration Set | Determination Coefficient for the Validation Set | Reference |
|---|---|---|---|
| Partial least-squares | 0.61 for training set | 0.40 | [42] |
| Support vector machine | 0.93 for training set | 0.65 | [42] |
| Monte Carlo method | | | |
| Split #1 | 0.78 * for calibration set | 0.76 | This work |
| Split #2 | 0.79 for calibration set | 0.79 | - |
| Split #3 | 0.76 for calibration set | 0.83 | - |
| Split #4 | 0.83 for calibration set | 0.81 | - |
| Split #5 | 0.79 for calibration set | 0.77 | - |
(*) Average value over training and calibration sets (in the case of models obtained with the Monte Carlo method).
Table 4. The lists of promoters for increase and decrease in ACE inhibitory activities of short-chain peptides.

| A or FLS | CWs Probe 1 | CWs Probe 2 | CWs Probe 3 | CWs Probe 4 | CWs Probe 5 | NA | NP | NC | Statistical Defect |
|---|---|---|---|---|---|---|---|---|---|
| Increase | | | | | | | | | |
| [xyyx0]..... | 0.5362 | 0.7174 | 0.0494 | 0.1663 | 0.2945 | 59 | 58 | 63 | 0.0004 |
| [xyx0]...... | 2.0153 | 1.9219 | 2.4935 | 2.1093 | 1.3410 | 44 | 49 | 48 | 0.0015 |
| P........... | 0.2332 | 0.2868 | 0.5592 | 0.3310 | 0.2543 | 23 | 27 | 20 | 0.0043 |
| L........... | 0.5570 | 0.3970 | 0.7452 | 0.3197 | 0.6662 | 15 | 18 | 20 | 0.0020 |
| Y........... | 0.8356 | 0.7732 | 1.0343 | 0.9938 | 0.7029 | 15 | 14 | 16 | 0.0005 |
| V........... | 0.7098 | 1.0258 | 1.2453 | 0.6531 | 0.2719 | 13 | 9 | 15 | 0.0037 |
| I........... | 0.8481 | 0.5378 | 0.9922 | 0.6733 | 0.5455 | 11 | 9 | 10 | 0.0021 |
| F........... | 0.1322 | 0.2310 | 0.2963 | 0.2405 | 0.1434 | 10 | 14 | 13 | 0.0036 |
| K........... | 0.3180 | 0.7672 | 1.0451 | 0.8752 | 0.1288 | 6 | 5 | 4 | 0.0051 |
| G...G....... | 0.6542 | 1.2557 | 0.7237 | 0.7341 | 0.6380 | 5 | 5 | 3 | 0.0058 |
| W........... | 0.9591 | 1.4843 | 1.7561 | 1.5675 | 1.1600 | 5 | 10 | 5 | 0.0090 |
| A...A....... | 1.1490 | 0.7535 | 0.6467 | 0.9423 | 0.8890 | 4 | 4 | 2 | 0.0072 |
| P...A....... | 1.0247 | 1.4088 | 1.6942 | 1.8574 | 1.1433 | 4 | 4 | 2 | 0.0072 |
| R........... | 0.6049 | 0.5535 | 0.6594 | 0.5883 | 0.2712 | 4 | 13 | 4 | 0.0145 |
| Decrease | | | | | | | | | |
| G........... | −0.6792 | −0.7316 | −0.4849 | −0.6757 | −0.3397 | 20 | 16 | 24 | 0.0028 |
| [xyx2]...... | −0.3288 | −0.5771 | −0.4390 | −0.9101 | −0.5755 | 19 | 13 | 22 | 0.0039 |
| P...G....... | −0.0676 | −0.4900 | −0.1707 | −0.1083 | −0.3139 | 5 | 8 | 4 | 0.0085 |
| L...G....... | −0.1497 | −0.3795 | −0.1461 | −0.6752 | −0.4425 | 4 | 1 | 1 | 0.0164 |
| Y...Y....... | −0.7225 | −0.4130 | −0.6068 | −0.6308 | −0.6269 | 4 | 1 | 3 | 0.0118 |
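Table 4 groups attributes by the sign of their correlation weights (CWs) across five optimization probes. The Python sketch below illustrates one way such a grouping can be reproduced. The rule used here (positive in every probe means promoter of increase, negative in every probe means promoter of decrease, anything else left unclassified) is an assumption consistent with the table, not a definition quoted from this section. The function name `classify` and the mixed-sign example are hypothetical.

```python
# Correlation weights for five probes, copied from four rows of Table 4.
cws = {
    "[xyx0]......": [2.0153, 1.9219, 2.4935, 2.1093, 1.3410],
    "W...........": [0.9591, 1.4843, 1.7561, 1.5675, 1.1600],
    "G...........": [-0.6792, -0.7316, -0.4849, -0.6757, -0.3397],
    "Y...Y.......": [-0.7225, -0.4130, -0.6068, -0.6308, -0.6269],
}


def classify(weights):
    """Label an attribute by the sign of its weights across all probes.

    Assumed rule: a stable sign in every probe marks a promoter;
    a mixed sign leaves the attribute unclassified.
    """
    if all(w > 0 for w in weights):
        return "increase"
    if all(w < 0 for w in weights):
        return "decrease"
    return "unclassified"


for attribute, weights in cws.items():
    print(f"{attribute} -> {classify(weights)}")

# Hypothetical mixed-sign attribute, included only to show the third branch.
print(classify([0.12, -0.05, 0.30, 0.08, -0.01]))
```

Requiring the same sign in every probe is a conservative choice: an attribute whose weight flips sign between Monte Carlo runs carries no stable interpretation and is better left out of either list.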
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E. Quantitative Structure–Activity Relationship Models for the Angiotensin-Converting Enzyme Inhibitory Activities of Short-Chain Peptides of Goat Milk Using Quasi-SMILES. Macromol 2024, 4, 387-400. https://doi.org/10.3390/macromol4020022
