Article

Development of Self-Consistency Models of Anticancer Activity of Nanoparticles under Different Experimental Conditions Using Quasi-SMILES Approach

by Andrey A. Toropov 1,*, Alla P. Toropova 1, Danuta Leszczynska 2 and Jerzy Leszczynski 3

1 Laboratory of Environmental Chemistry and Toxicology, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy
2 Interdisciplinary Nanotoxicity Center, Department of Civil and Environmental Engineering, Jackson State University, 1325 Lynch Street, Jackson, MS 39217-0510, USA
3 Interdisciplinary Nanotoxicity Center, Department of Chemistry, Physics and Atmospheric Sciences, Jackson, MS 39217-0510, USA
* Author to whom correspondence should be addressed.
Nanomaterials 2023, 13(12), 1852; https://doi.org/10.3390/nano13121852
Submission received: 3 May 2023 / Revised: 30 May 2023 / Accepted: 8 June 2023 / Published: 13 June 2023

Abstract

Algorithms for simulating the anticancer activity of nanoparticles under different experimental conditions toward the cell lines A549 (lung cancer), THP-1 (leukemia), MCF-7 (breast cancer), Caco2 (colorectal cancer), and HepG2 (hepatoma) have been developed using the quasi-SMILES approach. This approach is suggested as an efficient tool for the analysis of quantitative structure–property–activity relationships (QSPRs/QSARs) of the above nanoparticles. The studied model is built up using the so-called vector of ideality of correlation. The components of this vector include the index of ideality of correlation (IIC) and the correlation intensity index (CII). The epistemological component of this study is the development of methods for registering, storing, and effectively using experimental situations in a form that is convenient for the experimentalist, so that the physicochemical and biochemical consequences of using nanomaterials can be controlled. The proposed approach differs from traditional QSPR/QSAR models in the following respects: (i) not molecules but experimental situations available in a database are considered; in other words, an answer is offered to the question of how to change the design of the experiment in order to achieve the desired values of the endpoint being studied; and (ii) the user can select, from the controlled conditions available in the database, those that may affect the endpoint and evaluate how significant the influence of the selected conditions is.

1. Introduction

Knowledge is the basis of all actions aimed at improving people’s lives and the evolution of civilization as a whole. However, knowledge has internal contradictions. For example, in order to manage or even observe complex processes, it is necessary first to study the available information, which is the embodiment of the corresponding knowledge. If the knowledge is not structured, learning to use these disordered facts or skills becomes expensive and difficult in several respects, such as the long period needed to learn and the need for expensive equipment and software. The main function of the quasi-SMILES conception examined here is to search for reasonably simple ways to study complex phenomena. The simulation of the physicochemical and biochemical behavior of nanomaterials is one such complex phenomenon.
Strangeness is one of the manifestations of reality about which it is difficult to speak clearly. Nevertheless, strangeness is often a property of things that are new, unexpected, or important. Any activity requires an economy of thinking. In practice, such savings can be achieved in various ways. Sorting is perhaps the simplest element of this economy of thought. Sorting consists of selecting the most informative data obtained under experimental conditions in the laboratory, in production, and in interaction with environmental circumstances (climate, epidemics, economic crises). However, sorting does not provide any guidance for decision making at the stage when priorities are chosen. Models are needed at this stage. Having a model of a process can help one to manage the process. The choice is complicated because usefulness and harm can change places: harm can become a benefit, and a benefit can turn out to be harmful. For instance, the toxicity of nanoparticles can be considered a useful quality because it can be used for good aims. However, left unattended, this toxicity can harm or even kill humans and animals. The list of nanoparticles is expanding exponentially. The number of types of toxicity is by no means small. Obviously, under such circumstances, it is impossible to quickly evaluate experimentally all nanoparticles that are used or can be used. However, assessing the physicochemical and biochemical behavior of a significant number of new nanoparticles using databases on already studied nanoparticles is quite feasible. The key point in the development of such models is the need to reduce the memory and logic requirements placed on the users of the models. In other words, developers of models should provide user-friendly means of evaluating new nanoparticles.
At the same time, it is highly desirable and important that such models consider the effect of possible changes in the corresponding directions for the experimental use of nanoparticles [1].
Quantitative structure–property–activity relationships (QSPRs/QSARs) are a well-known approach to establishing models of different endpoints considered as a mathematical function of molecular structure. A successful QSAR analysis is possible if and only if: (i) there is a large enough number of compounds with a clear definition of the congeneric features corresponding to the molecules; (ii) there is a hypothesis on how and which molecular features affect the endpoint (topological architecture, 3D configurations, quantum mechanics interactions, etc.); and (iii) the predictive potential of the model can be checked [2,3]. The QSPR/QSAR paradigm is widely applied to relatively traditional substances, such as organic, inorganic, and metal-organic chemicals and polymers. Attempts to use the paradigm for nanomaterials, on the other hand, face quite a complex situation. First, there are only small databases of experimentally measured basic endpoints, such as thermodynamic parameters and/or biochemical effects. In other words, selecting a series of nanomaterials with experimental data is itself a problem. Secondly, the huge number of atoms in the majority of nanomaterials lessens the usefulness of traditional molecular descriptors: their values become insensitive to small molecular modifications.
There is an urgent need to clarify the approaches and methodology for measuring the biochemical potential of engineered nanomaterials. Factually, this is a problem of tuning computational and experimental approaches oriented to “traditional” substances for application to nanomaterials. The possibility of employing computational approaches like nano-QSAR or nano-read-across to predict nanomaterial hazards based on some “standard” databases is an attractive possibility from a financial point of view. The attractiveness from an ethical point of view is also clear (minimal animal tests). Many research studies have endeavored to investigate the eco-toxicological hazards of engineered nanomaterials. However, little is known regarding nanomaterials’ actual environmental risks, combining hazard and exposure data on a planetary scale [1].
It has been assumed that strangeness and research activity rarely intersect. However, when they meet, they either reinforce or disregard each other. For example, modelling, one of the most important and complex areas of research, can be summed up in the short aphorism “All models are wrong, but some are useful” [4].
Systematization of knowledge related to nanomaterials has become necessary due to the fast growth of applications of these “unusual” substances. Systematization involves various aspects of research activity. The development of approaches that allow for the simulation of different characteristics of nanomaterials, including their interactions with other species, is one of them. There are many methods to perform such simulations. One possible approach is to carry out the simulations using the so-called quasi-SMILES [5,6,7,8,9,10,11,12,13] approach. The traditional simplified molecular input line entry system (SMILES) [14] allows the molecular architecture to be represented via a sequence of symbol-codes. The quasi-SMILES approach, in turn, makes it possible to represent the experimental conditions, or even arbitrary eclectic data related to the behavior of nanomaterials, via symbol-codes. Figure 1 displays the general scheme for the simulation of the biological effects of nanoparticles. This scheme was used to build up the models described below.

2. Materials and Methods

2.1. Data

The dataset used in this study includes measurements of the half-maximal effective (EC50), inhibitory (IC50), and lethal (LC50) concentrations as toxicity endpoints toward the cell lines A549 (lung cancer), THP-1 (leukemia), MCF-7 (breast cancer), Caco2 (colorectal cancer), and HepG2 (hepatoma) under different experimental conditions (various nanoparticles, sizes, and exposure times) for human cells. The indicated conditions and circumstances were represented by the special codes listed in Table 1. These codes are used to construct the quasi-SMILES that represent the above measurements of the toxicity of the studied nanoparticles [15].
The listed codes for quasi-SMILES make it possible to constructively describe the available experimental situations for developing models in order to predict the results of varying codes (i.e., varying of an experiment). The system described can assess the statistical significance of individual experimental conditions (i.e., the above codes for quasi-SMILES). In other words, concentration values, exposure times, impacted objects, nanoparticle sizes, and others are under consideration to simulate the behavior of nanoparticles.
After removing duplicates, the source [15] contains 935 measurements, representing data related only to human cells. The total set studied here includes 102 measurements. These data were randomly split into an active training set (≈25%), a passive training set (≈25%), a calibration set (≈25%), and a validation set (≈25%). The advantages of considering a structured training set (divided into an active training set, passive training set, and calibration set) are described in the literature [16]. Five such splits, each distributing the data differently among the four sets, are examined to assess the reproducibility of the model-building approach considered here [17].
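The splitting scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_four_ways`, the use of integer record indices as placeholders for the 102 measurements, and the seeds are all hypothetical, and the CORAL software may distribute the data differently.

```python
import random

def split_four_ways(records, seed):
    """Shuffle the records and cut them into four roughly equal sets."""
    rng = random.Random(seed)          # seeded so that a split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n = len(shuffled)
    # Cut points at about 25%, 50%, and 75% of the data.
    q1, q2, q3 = n // 4, n // 2, (3 * n) // 4
    return {
        "active_training": shuffled[:q1],
        "passive_training": shuffled[q1:q2],
        "calibration": shuffled[q2:q3],
        "validation": shuffled[q3:],
    }

# Five independent random splits of 102 placeholder record indices.
splits = [split_four_ways(range(102), seed) for seed in range(5)]
for number, split in enumerate(splits, start=1):
    print(number, {name: len(part) for name, part in split.items()})
```

Keeping the seed explicit means that each of the five splits can be regenerated later, which is convenient when the statistical quality of the models is compared across splits.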

2.2. Optimal Descriptor

The optimal descriptor is the sum of the correlation weights of the quasi-SMILES codes obtained by the Monte Carlo method using the CORAL software (http://www.insilico.eu/coral, accessed on 29 May 2023). The values of the optimal descriptor serve as the basis for the model of half-maximal concentration (HMC) (i.e., EC50, IC50, or LC50) calculated by the formula:
$$HMC_k = C_0 + C_1 \times DCW(T, N) \tag{1}$$
The optimal descriptor depends on the selected method of Monte Carlo optimization of the correlation weights for the codes of quasi-SMILES (Table 1). T and N are the parameters of the optimization procedure. T is a threshold applied to define rare codes. If T = 1, codes that are absent from the active training set are considered rare. Rare codes are not involved in modelling (their correlation weights are zero). N is the number of epochs of the Monte Carlo optimization.
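A minimal Python sketch of this calculation is given below. The correlation weights are those read from Table 1 for four codes; the example quasi-SMILES string (its choice and order of codes), the helper names, and the coefficients `c0` and `c1` are hypothetical placeholders and are not taken from the fitted models.

```python
import re

# Correlation weights for a few codes, as read from Table 1.
CORRELATION_WEIGHTS = {
    "[Ag]": -0.4207,
    "[24h]": -0.2873,
    "[nm20]": 0.4612,
    "[CuO]": 0.4879,
}

def parse_codes(quasi_smiles):
    """Return the bracketed codes of a quasi-SMILES, ignoring the dot padding."""
    return re.findall(r"\[[^\]]+\]", quasi_smiles)

def dcw(quasi_smiles, weights):
    """Sum the correlation weights; rare or unknown codes contribute zero."""
    return sum(weights.get(code, 0.0) for code in parse_codes(quasi_smiles))

def predict_hmc(quasi_smiles, weights, c0, c1):
    """Linear model HMC = C0 + C1 * DCW with caller-supplied coefficients."""
    return c0 + c1 * dcw(quasi_smiles, weights)

# Hypothetical experimental situation: silver, 24 h exposure, 20 nm particles.
example = "[Ag]........[24h].......[nm20]......"
print(round(dcw(example, CORRELATION_WEIGHTS), 4))  # about -0.2468
# c0 and c1 below are placeholders, not fitted regression coefficients.
print(round(predict_hmc(example, CORRELATION_WEIGHTS, c0=1.0, c1=2.0), 4))
```

Treating a missing code as a zero weight mirrors the statement that rare codes are not involved in modelling.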

2.3. Optimization of Correlation Weights

The optimal descriptors are calculated using the correlation weights obtained by the Monte Carlo optimization [16,17]. Two target functions of the optimization are compared here:
$$TF_1 = r_{AT} + r_{PT} - \left| r_{AT} - r_{PT} \right| \times 0.1 \tag{2}$$
$$TF_2 = r_{AT} + r_{PT} - \left| r_{AT} - r_{PT} \right| \times 0.1 + (IIC + CII) \times 0.3 \tag{3}$$
rAT and rPT are correlation coefficients between the experimental and predicted values for the active and passive training sets, respectively. The IIC represents the index of ideality of correlation [15,16,17]. The CII is the correlation intensity index [15,16,17].
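The two target functions can be written as small Python functions. The sketch assumes the form TF1 = rAT + rPT − |rAT − rPT| × 0.1 and TF2 = TF1 + (IIC + CII) × 0.3; the function names and the numerical inputs are hypothetical, and IIC and CII are taken as precomputed values.

```python
def target_function_1(r_at, r_pt):
    """TF1: reward high correlations, penalize a gap between the two sets."""
    return r_at + r_pt - abs(r_at - r_pt) * 0.1

def target_function_2(r_at, r_pt, iic, cii):
    """TF2: TF1 plus the weighted sum of IIC and CII."""
    return target_function_1(r_at, r_pt) + (iic + cii) * 0.3

# Hypothetical values for illustration only.
print(round(target_function_1(0.80, 0.70), 4))
print(round(target_function_2(0.80, 0.70, iic=0.75, cii=0.85), 4))
```

The absolute-difference term lowers the target value when the active and passive training sets are fitted unequally, which discourages a solution tuned to only one of them.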
Figure 2 contains examples of the optimization history with target functions TF1 and TF2. The figure demonstrates the advantage of the target function TF2 graphically.

2.4. Mechanistic Interpretation

If the Monte Carlo optimization is carried out several times, some of the optimized correlation weights will have positive values in all optimization trials. Such correlation weights indicate fragments of quasi-SMILES that are promoters of an increase in the studied endpoint. At the same time, some of the correlation weights will have only negative values. These correlation weights indicate fragments of quasi-SMILES that are promoters of a decrease in the simulated endpoint. Correlation weights with alternating values (positive and negative in different runs of the Monte Carlo optimization) have no mechanistic interpretation for the models under consideration.
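The sign-stability rule described above can be sketched as follows. The function name, the labels, and the weights in the three runs are hypothetical; the sketch also assumes that a code with a zero weight in any run (for example, a rare code) is left without interpretation.

```python
def classify_codes(runs):
    """Label each code by the sign stability of its weight across runs.

    runs: list of dicts mapping code -> correlation weight, one dict per
    Monte Carlo optimization run.
    """
    codes = set().union(*runs)
    labels = {}
    for code in sorted(codes):
        values = [run.get(code, 0.0) for run in runs]
        if all(v > 0 for v in values):
            labels[code] = "promoter of increase"
        elif all(v < 0 for v in values):
            labels[code] = "promoter of decrease"
        else:
            # Mixed signs or zero weights: no mechanistic interpretation.
            labels[code] = "no clear interpretation"
    return labels

# Hypothetical weights from three optimization runs.
runs = [
    {"[CuO]": 0.49, "[Ag]": -0.42, "[nm20]": 0.46},
    {"[CuO]": 0.31, "[Ag]": -0.18, "[nm20]": -0.05},
    {"[CuO]": 0.52, "[Ag]": -0.37, "[nm20]": 0.12},
]
for code, label in classify_codes(runs).items():
    print(code, label)
```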

2.5. Applicability Domain

The applicability domain for the described model is defined via the so-called statistical defects of the codes used in quasi-SMILES. These defects can be calculated as follows:
$$d_k = \frac{\left| P(S_k) - P'(S_k) \right|}{N(S_k) + N'(S_k)} + \frac{\left| P(S_k) - P''(S_k) \right|}{N(S_k) + N''(S_k)} + \frac{\left| P'(S_k) - P''(S_k) \right|}{N'(S_k) + N''(S_k)} \tag{4}$$
where P(Sk), P′(Sk), and P″(Sk) are the probabilities of Sk in the active training set, passive training set, and calibration set, respectively; N(Sk), N′(Sk), and N″(Sk) are the frequencies of Sk in the active training set, passive training set, and calibration set, respectively. The statistical defects of quasi-SMILES (Dj) are calculated as follows:
$$D_j = \sum_{k=1}^{N_A} d_k \tag{5}$$
where NA is the number of non-blocked codes in quasi-SMILES.
A quasi-SMILES falls in the applicability domain if
$$D_j < 2 \times \bar{D} \tag{6}$$
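The applicability-domain check can be illustrated with a short Python sketch. Several points are assumptions made only for this illustration: the probability of a code is taken as its frequency divided by the number of quasi-SMILES in the set; a code missing from any of the three sets is given a defect of 1.0 (suggested by the 1.0000 entries in Table 1, but not stated in the text); all codes of a quasi-SMILES are summed rather than only the non-blocked ones; and the average defect is taken over the active training set. The toy data and all function names are hypothetical.

```python
import re
from collections import Counter

CODE_PATTERN = re.compile(r"\[[^\]]+\]")

def code_frequencies(quasi_smiles_list):
    """Count how many quasi-SMILES in a set contain each code."""
    counts = Counter()
    for qs in quasi_smiles_list:
        counts.update(set(CODE_PATTERN.findall(qs)))
    return counts

def code_defects(active, passive, calibration):
    """Statistical defect d_k of every code seen in the three sets."""
    sets = (active, passive, calibration)
    freqs = [code_frequencies(s) for s in sets]
    defects = {}
    for code in set().union(*freqs):
        n = [f[code] for f in freqs]
        if min(n) == 0:
            # Assumption: a code missing from any set gets the maximal defect.
            defects[code] = 1.0
            continue
        p = [n_i / len(s) for n_i, s in zip(n, sets)]
        defects[code] = sum(
            abs(p[a] - p[b]) / (n[a] + n[b])
            for a, b in ((0, 1), (0, 2), (1, 2))
        )
    return defects

def quasi_smiles_defect(qs, defects):
    """D_j: sum of code defects; unseen codes count as 1.0 (assumption)."""
    return sum(defects.get(code, 1.0) for code in CODE_PATTERN.findall(qs))

def in_domain(qs, defects, mean_defect):
    """A quasi-SMILES is in the domain if D_j < 2 * average defect."""
    return quasi_smiles_defect(qs, defects) < 2 * mean_defect

# Hypothetical toy sets (dot padding omitted for brevity).
active = ["[Ag][24h]", "[Ag][48h]", "[CuO][24h]", "[Ag][24h]"]
passive = ["[Ag][24h]", "[CuO][48h]"]
calibration = ["[Ag][48h]", "[CuO][24h]"]

defects = code_defects(active, passive, calibration)
mean_defect = sum(quasi_smiles_defect(q, defects) for q in active) / len(active)
print(in_domain("[Ag][24h]", defects, mean_defect))   # True
print(in_domain("[ZnO][24h]", defects, mean_defect))  # False
```

In this toy example the quasi-SMILES containing the unseen code `[ZnO]` falls outside the domain because its defect is dominated by the value of 1.0 assigned to that code.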

3. Results

Table 2 contains an example of the model of biological activity related to different experimental situations represented via quasi-SMILES (split 1, target function TF2). However, since the statistical characteristics of a model can vary for different splits into the training and validation set, it is necessary to consider a system of several different splits.
Two CORAL methods are applied here for five random splits.
The first CORAL method is the Monte Carlo optimization with target function without the vector of ideality of correlation and the correlation weights of fragments of local symmetry (Equation (2)). This method gives the models represented in Table 3.
The second CORAL method is the Monte Carlo optimization with target function calculated by Equation (3) with the use of the vector of ideality of correlation together with the correlation weights of fragments of local symmetry. This method gives the models represented in Table 4.
The statistical characteristics of the models obtained with the second method are better than those obtained with the first method. This is evidenced by the average determination coefficient for the validation set, which in the case of the first method amounts to $\bar{R}_V^2 = 0.605$ ($\Delta R_V^2 = 0.073$). The second method gave $\bar{R}_V^2 = 0.751$ ($\Delta R_V^2 = 0.097$).

4. Discussion

The most popular traditional QSAR modelling approach can be formulated as follows: (i) selection of a group of available and convenient descriptors; (ii) defining a model using training-set substances; and (iii) validating the model using external validation-set substances. One can formulate several questions related to the optimization of this approach. For example, how will the model’s statistical quality change in the next division into training and testing samples? How to avoid overfitting (i.e., how to avoid a situation where a good model for the training set becomes a bad model for external substances)? How can one estimate the probability of obtaining a satisfactory and reliable model? In fact, the approach under consideration attempts to solve these problems using original idealizations, assumptions, and limitations.
Much excellent research is dedicated to nano-topics; nevertheless, even a simple question, e.g., whether a nanomaterial can be assessed using software, is quite ambiguous. The results of different estimations can vary depending on the personal experience of the expert conducting the study, and one cannot guarantee the reproducibility of these assessments.
Perhaps the main and convenient (from the user point of view) idealization of the considered approach is that instead of searching for sources of numerous descriptors, it is supposed to use “artificial” optimal descriptors, which can be tuned to correlate with the endpoint of interest. This assumption may not be correct. In this case, the approach under consideration is unsuitable for such a task, and a useful alternative approach to solve the task becomes necessary. However, there are cases where the approach discussed here has been useful [6,7,8,9,10,11,12].
The approach considered here has various advantages. First, one can apply it to arbitrary data. There is no a priori knowledge about whether such data can improve the model. Instability of the values of the correlation weights is a reliable indicator that the tested factor is useless. Conversely, stability is a significant indicator of the influence of the factor on the predictive potential of the model. Secondly, this approach makes it easy to change the set of correlation-weighted factors, thus radically changing the model. This facilitates fast evaluation of the benefits of various hypotheses related to the optimal list of factors involved in the model development process.
The universality of the approach provides the user with ample opportunities to choose a set of basic factors for developing a model. However, overextension of such a set can lead to useless models that are excellent for the training set of samples but are completely unsuitable for external sets of similar samples. Given this circumstance, it is difficult to formulate rules for dividing the available data into the active training, passive training, calibration, and external validation sets. It seems reasonable to assume that each of the four mentioned sets has the same significance. Therefore, the distribution of available samples should be approximately the same, i.e., about 25% of the data for each set (Table 2 and Table 3).
The presented approach is similar to incremental methods based on the selection of suitable contributions from individual parts of molecules to describe or model the physicochemical property or biological activity of interest [18,19]. The main common feature of the described approach with the additive scheme is that in both cases, the modeled endpoint is considered as the sum of the contributions of some participants in the model-building process. The difference between the mentioned approaches lies in the fact that for the traditional additive scheme, the set of participants is constant. At the same time, for the quasi-SMILES approach, it is possible to vary the number and quality of participants in the model-building process. For example, theoretically, the user of the quasi-SMILES method can eliminate the correlation weights reflecting particle size by reducing the number of Monte Carlo optimization parameters.
On the other hand, the user can expand the empirical (gross) formulas by representing the corresponding metal oxides with traditional SMILES (e.g., instead of Al2O3 using [O-2].[O-2].[O-2].[Al+3].[Al+3]), thereby increasing the number of optimized parameters. Of course, such changes do not guarantee an improvement in the predictive potential of the model, but they do provide the user with extended opportunities in the search for a model of the phenomenon and perhaps even stimulate the user’s creative activity. Another important, although hidden, point is that the considered approach allows the user to identify and discard those quasi-SMILES fragments that are non-informative due to their low prevalence in training samples and/or in the general array of available data. This is done automatically through the appropriate selection of the threshold described above (i.e., parameter T in Equation (1)). Since QSAR is actually a random event [20] associated with and determined by the distribution of available data in training and control samples, this option is very useful because it allows one to go from so-called “naive cross-validation” to “two-step cross-validation” [21]. The difference between naive and two-step cross-validation is as follows. Naive cross-validation is the result of a single distribution into the training and validation sets. In contrast, two-step cross-validation is the result of considering and analyzing multiple random distributions into the training and validation sets.
A very significant component of models built on optimal descriptors using quasi-SMILES codes is the optimization procedure based on the Monte Carlo method. The choice of the target function is the key to the success of such Monte Carlo calculations. The index of ideality of correlation (IIC) [22] turned out to be a very useful finding for improving the objective functions of the Monte Carlo optimizations used to construct optimal descriptors calculated using both SMILES and quasi-SMILES codes. The majority of the phenomena involved in the natural sciences are complex. Idealization (or simplification) is one of the most common approaches to studying complex phenomena in the natural sciences, such as ideal gas, ideal solution, ideal crystals, and ideal symmetry [22]. Ideal correlation is also a very attractive variant of correlations in general. The main idea of ideal correlation expressed through the IIC is a correlation with forced minimization of the mean absolute error (MAE). It is to be noted, however, that the application of the IIC improves the statistical characteristics for the calibration and validation sets while reducing the correlation coefficient for the training sets. This is a paradoxical situation. Nevertheless, from a practical point of view, this situation is preferable to overtraining (i.e., the situation in which excellent statistical quality for the training set is accompanied by poor statistical quality for the validation set). An analysis of the graphical representations of this paradox, observed with various geometric configurations on ‘experiment vs calculation’ plots, shows that such idealization is not always possible; fortunately, however, it is possible for the majority of arrangements of points on the ‘experiment vs calculation’ plot [23].
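The text does not reproduce the formula of the IIC. The Python sketch below assumes the form commonly given in the cited IIC literature: the correlation coefficient multiplied by the ratio of the smaller to the larger of two mean absolute errors, one computed over items with negative residuals (observed minus calculated) and one over items with non-negative residuals. The function names, the handling of the edge cases, and the numerical data are assumptions made for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def iic(observed, calculated):
    """Index of ideality of correlation under the assumed definition."""
    residuals = [o - c for o, c in zip(observed, calculated)]
    negative = [abs(d) for d in residuals if d < 0]
    positive = [abs(d) for d in residuals if d >= 0]
    # Assumption: an empty group is treated as having an MAE of zero.
    mae_neg = sum(negative) / len(negative) if negative else 0.0
    mae_pos = sum(positive) / len(positive) if positive else 0.0
    r = pearson_r(observed, calculated)
    largest = max(mae_neg, mae_pos)
    if largest == 0.0:
        return r  # perfect agreement: nothing to penalize
    return r * min(mae_neg, mae_pos) / largest

# Hypothetical data: balanced errors versus errors skewed to one side.
observed = [1.0, 2.0, 3.0, 4.0]
print(round(iic(observed, [1.1, 1.9, 3.1, 3.9]), 3))
print(round(iic(observed, [0.9, 1.9, 2.9, 4.3]), 3))
```

Under this definition, a model whose over- and under-predictions are of similar size keeps an IIC close to its correlation coefficient, whereas a model with errors skewed to one side is penalized even if the correlation coefficient is high.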
Another useful invention for improving the predictive potential of models based on quasi-SMILES codes is the Correlation Intensity Index (CII) [17]. Data on a group of quasi-SMILES (e.g., calibration set or validation set) with experimental and predicted values of an endpoint gives the possibility to estimate the contribution of each quasi-SMILES to the correlation between experiments vs calculated endpoint value. The negative effect of removing quasi-SMILES means it is a ‘supporter’ of the correlation; the positive effect of removing quasi-SMILES means it is an ‘oppositionist’ of the correlation. The sum of these effects is the CII.
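The leave-one-out idea behind the CII can be sketched in Python as follows. The aggregation used here, CII = 1 − (sum of the positive changes in the determination coefficient caused by removing single items), follows the form usually given in the cited CII literature and should be read as an assumption; the function names and the data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation; returns 0.0 if either variance is zero."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def removal_effects(observed, calculated):
    """Change in R^2 when each item is removed in turn."""
    observed, calculated = list(observed), list(calculated)
    r2_all = r_squared(observed, calculated)
    effects = []
    for k in range(len(observed)):
        obs_k = observed[:k] + observed[k + 1:]
        calc_k = calculated[:k] + calculated[k + 1:]
        effects.append(r_squared(obs_k, calc_k) - r2_all)
    return effects

def cii(observed, calculated):
    """Assumed form: 1 minus the sum of positive removal effects."""
    protest = sum(e for e in removal_effects(observed, calculated) if e > 0)
    return 1.0 - protest

# Hypothetical data: the last item disturbs an otherwise perfect correlation.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
calculated = [1.0, 2.0, 3.0, 4.0, 3.0]
for k, effect in enumerate(removal_effects(observed, calculated)):
    role = "oppositionist" if effect > 0 else "supporter"
    print(k, round(effect, 3), role)
print(round(cii(observed, calculated), 3))
```

Items whose removal raises the determination coefficient act against the correlation, so a set without such items keeps a CII close to 1 under this definition.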

5. Conclusions

The present study demonstrated that the quasi-SMILES technique gives statistically robust models of the half-maximal concentrations for the five cell lines. We showed that the statistical quality is well reproduced for five random splits of the available data into a structured training set (i.e., the active training, passive training, and calibration sets) and an external validation set. Such an approach has been tested and is recommended for various applications of quasi-SMILES. Paradoxically, the vector of ideality of correlation, which combines the described IIC and CII, improves the predictive potential of the studied models, but to the detriment of the statistical quality of the models on the training set. The described approach can be easily adapted to simulate other experimental situations and endpoints for nanomaterials and other substances (mixtures, polymers, peptides, proteins).
A quasi-SMILES approach describing experimental situations can be modified both by feedback (i.e., depending on the results obtained) and purely heuristically in accordance with spontaneous ideas for which statistical expertise is possible. Thus, quasi-SMILES are a simple and versatile approach for modelling experimental situations not yet implemented in practice. The indices IIC and CII can not only improve the Monte Carlo optimization but can also serve as indicators of the predictive potential of various models.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/nano13121852/s1. The Supplementary Materials contain technical details (Table S1: quasi-SMILES for splits 1–5 with experimental and calculated values of the half-maximal concentrations for the five cell lines; Table S2: numerical data on the corresponding correlation weights for elements of quasi-SMILES).

Author Contributions

Conceptualization, A.A.T., A.P.T., D.L. and J.L.; methodology, A.A.T., A.P.T., D.L. and J.L.; software, A.A.T.; validation, A.P.T. and A.A.T.; formal analysis, A.A.T., A.P.T., D.L. and J.L.; data curation, A.A.T. and A.P.T.; writing—original draft preparation, A.A.T., A.P.T., D.L. and J.L.; writing—review and editing, A.A.T., A.P.T., D.L. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

A.A.T. and A.P.T. would like to thank the EC project LIFE-CONCERT contract (LIFE17 GIE/IT/000461) for financial support; D.L. and J.L. would like to thank the National Science Foundation (NSF-CREST HRD-1547754) for financial support.

Data Availability Statement

Data are available within the article or in the Supplementary Materials (Tables S1 and S2).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Toropov, A.A.; Toropova, A.P. The index of ideality of correlation: A criterion of predictive potential of QSPR/QSAR models? Mutat. Res. Genet. Toxicol. Environ. Mutagen. 2017, 819, 31–37.
  2. OECD. Organisation for Economic Co-Operation and Development. Ecotoxicology and Environmental Fate of Manufactured Nanomaterials, Series on the Safety of Manufactured Nanomaterials, ENV/JM/MONO(2014)1; Test No. 40; OECD: Paris, France, 2014.
  3. OECD. Organisation for Economic Co-Operation and Development. Guidance Document for the Testing of Dissolution and Dispersion Stability of Nanomaterials and the Use of the Data for Further Environmental Testing and Assessment Strategies. OECD Guidelines for the Testing of Chemicals, ENV/JM/MONO(2020)9; Test No. 318; OECD: Paris, France, 2020.
  4. Box, G.E.P. Science and statistics. J. Am. Stat. Assoc. 1976, 71, 791–799.
  5. Toropova, A.P.; Meneses, J.; Alfaro-Moreno, E.; Toropov, A.A. The system of self-consistent models based on quasi-SMILES as a tool to predict the potential of nano-inhibitors of human lung carcinoma cell line A549 for different experimental conditions. Drug Chem. Toxicol. 2023, in press.
  6. Toropov, A.A.; Kjeldsen, F.; Toropova, A.P. Use of quasi-SMILES to build models based on quantitative results from experiments with nanomaterials. Chemosphere 2022, 303, 135086.
  7. Ahmadi, S.; Ketabi, S.; Qomi, M. CO2 uptake prediction of metal-organic frameworks using quasi-SMILES and Monte Carlo optimization. New J. Chem. 2022, 46, 8827–8837.
  8. Ahmadi, S.; Aghabeygi, S.; Farahmandjou, M.; Azimi, N. The predictive model for band gap prediction of metal oxide nanoparticles based on quasi-SMILES. Struct. Chem. 2021, 32, 1893–1905.
  9. Jafari, K.; Fatemi, M.H. Application of nano-quantitative structure–property relationship paradigm to develop predictive models for thermal conductivity of metal oxide-based ethylene glycol nanofluids. J. Therm. Anal. Calorim. 2020, 142, 1335–1344.
  10. Ahmadi, S. Mathematical modeling of cytotoxicity of metal oxide nanoparticles using the index of ideality correlation criteria. Chemosphere 2020, 242, 125192.
  11. Choi, J.-S.; Trinh, T.X.; Yoon, T.-H.; Kim, J.; Byun, H.-G. Quasi-QSAR for predicting the cell viability of human lung and skin cells exposed to different metal oxide nanomaterials. Chemosphere 2019, 217, 243–249.
  12. Trinh, T.X.; Choi, J.-S.; Jeon, H.; Byun, H.-G.; Yoon, T.-H.; Kim, J. Quasi-SMILES-Based Nano-Quantitative Structure-Activity Relationship Model to Predict the Cytotoxicity of Multiwalled Carbon Nanotubes to Human Lung Cells. Chem. Res. Toxicol. 2018, 31, 183–190.
  13. Weininger, D. SMILES, a Chemical Language and Information System: 1: Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36.
  14. Gakis, G.P.; Aviziotis, I.G.; Charitidis, C.A. Metal and metal oxide nanoparticle toxicity: Moving towards a more holistic structure-Activity approach. Environ. Sci. Nano 2023, 10, 761–780.
  15. Toropov, A.A.; Raška, I., Jr.; Toropova, A.P.; Raškova, M.; Veselinović, A.M.; Veselinović, J.B. The study of the index of ideality of correlation as a new criterion of predictive potential of QSPR/QSAR-models. Sci. Total Environ. 2019, 659, 1387–1394.
  16. Toropov, A.A.; Toropova, A.P. The system of self-consistent models for the uptake of nanoparticles in PaCa2 cancer cells. Nanotoxicology 2021, 15, 995–1004.
  17. Toropov, A.A.; Toropova, A.P. The unreliability of the reliability criteria in the estimation of QSAR for skin sensitivity: A pun or a reliable law? Toxicol. Lett. 2021, 340, 133–140.
  18. Yalkowsky, S.H.; Alantary, D. Estimation of Melting Points of Organics. J. Pharm. Sci. 2018, 107, 1211–1227.
  19. He, W.; Sun, P.; Zhao, Y.; Pu, Q.; Yang, H.; Hao, N.; Li, Y. Source toxicity characteristics of short- and medium-chain chlorinated paraffin in multi-environmental media: Product source toxicity, molecular source toxicity and food chain migration control through silica methods. Sci. Total Environ. 2023, 876, 162861.
  20. Toropov, A.A.; Toropova, A.P.; Puzyn, T.; Benfenati, E.; Gini, G.; Leszczynska, D.; Leszczynski, J. QSAR as a random event: Modeling of nanoparticles uptake in PaCa2 cancer cells. Chemosphere 2013, 92, 31–37.
  21. Majumdar, S.; Basak, S.C. Beware of naïve q2, use true q2: Some comments on QSAR model building and cross validation. Curr. Comput. Aided Drug Des. 2018, 14, 5–6.
  22. Toropova, A.P.; Toropov, A.A. Does the Index of Ideality of Correlation Detect the Better Model Correctly? Mol. Inf. 2019, 38, 1800157.
  23. Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E. Monte Carlo technique to study the adsorption affinity of azo dyes by applying new statistical criteria of the predictive potential. SAR QSAR Environ. Res. 2022, 33, 621–630.
Figure 1. General scheme for building up the models examined here.
Figure 2. The history of the Monte Carlo optimization carried out using the target functions TF1 or TF2.
Table 1. Codes that are applied to construct quasi-SMILES.
Code | Comments | CW(Code) | d
[108m] | Exposure time (minutes or hours) | −0.4525 | 0.0343
[12h] | - | −0.3037 | 0.0513
[216] | - | −0.1135 | 0.0251
[18h] | - | 0.0 | 1.0000
[24h] | - | −0.2873 | 0.0063
[36h] | - | 0.0 | 1.0000
[48h] | - | −0.3258 | 0.0160
[60h] | - | 0.0 | 1.0000
[72m] | - | 0.0 | 1.0000
[Ag] | Types of nanoparticles | −0.4207 | 0.0178
[Al2O3] | - | 0.0 | 1.0000
[Bi2O3] | - | 0.0 | 1.0000
[CeO2] | - | 0.0 | 1.0000
[Co3O4] | - | −0.3980 | 1.0000
[Co] | - | 0.0 | 1.0000
[Cu2O] | - | 0.0 | 1.0000
[CuO] | - | 0.4879 | 0.0333
[Cu] | - | 0.0 | 1.0000
[Fe2O3] | - | 0.0 | 1.0000
[MgO] | - | 0.0 | 1.0000
[Mn3O4] | - | 0.0 | 1.0000
[Mn3O4] | - | 0.0299 | 1.0000
[MoO3] | - | 0.0 | 1.0000
[NiO] | - | 0.0 | 1.0000
[Ni] | - | 0.0 | 1.0000
[Sb2O3] | - | 0.3814 | 0.0185
[SnO2] | - | 0.0 | 1.0000
[Y2O3] | - | 0.0 | 1.0000
[TiO2] | - | 0.4736 | 1.0000
[WO3] | - | 0.0 | 1.0000
[ZnO] | - | 0.0698 | 0.0338
[ZrO2] | - | 0.0 | 1.0000
[nm100] | Size of nanoparticle in nm | 0.0 | 1.0000
[nm11,9] | - | 0.0 | 1.0000
[nm12,2] | - | 0.0 | 1.0000
[nm14,9] | - | 0.0 | 1.0000
[nm−] | - | −0.3009 | 0.0069
[nm149] | - | 0.0 | 1.0000
[nm14] | - | 0.0 | 1.0000
[nm19,7] | - | 0.0 | 1.0000
[nm20,8] | - | 0.0 | 1.0000
[nm20–30] | - | 0.0 | 1.0000
[nm22,9] | - | 0.0 | 1.0000
[nm20] | - | 0.4612 | 0.0160
[nm21] | - | 0.0 | 1.0000
[nm30–50] | - | 0.0 | 1.0000
[nm312] | - | 0.0 | 1.0000
[nm33,4] | - | 0.0 | 1.0000
[nm30] | - | 0.0 | 1.0000
[nm31] | - | 0.0 | 1.0000
[nm32] | - | 0.0 | 1.0000
[nm33] | - | 0.0 | 1.0000
[nm40–68] | - | 0.0 | 1.0000
[nm45,4] | - | 0.0 | 1.0000
[nm42] | - | 0.0 | 1.0000
[nm5–15] | - | 0.0 | 1.0000
[nm48,9] | - | 0.0 | 1.0000
[nm50] | - | 0.0 | 1.0000
[nm64–69] | - | 0.0 | 1.0000
[nm83–94] | - | 0.0 | 1.0000
[nm9,2] | - | 0.0 | 1.0000
[nm90] | - | 0.0 | 1.0000
[nm5] | - | 0.0 | 1.0000
[MCF-7] | Cell line MCF-7 (breast cancer) | 0.1888 | 1.0000
[A549] | Cell line A549 (lung cancer) | 0.1817 | 0.0068
[THP-1] | Cell line THP-1 (leukemia) | 0.3930 | 0.0185
[HepG2] | Cell line hepG2 (hepatoma) | 0.0 | 1.0000
[Caco2] | Cell line Caco2 (cervical cancer) | −0.3912 | 0.0288
[EC50] | Concentration that gives 50% of the maximal response | 0.3444 | 0.0164
[IC50] | Concentration that gives 50% inhibition of a biological process | 0.3023 | 0.0128
[LC50] | Concentration that kills 50% test animals | 0.0 | 1.0000
[LD50] | Dose that kills 50% test animals | −0.3424 | 0.0010
Table 2. Quasi-SMILES, optimal descriptor DCW(1,15), experimental and calculated biological activity, statistical defects (D) of quasi-SMILES, and applicability domain (AD).
Set | ID | Quasi-SMILES | DCW(1,15) | Expr | Calc | D | AD
P | 5 | [Ag][nm312][24h][IC50][THP-1] | 2.2559 | −3.5930 | −3.6191 | 1.0553 | YES
A | 6 | [Ag][nm5][24h][EC50][MCF-7] | −1.5770 | −5.3340 | −5.3221 | 2.0405 | YES
C | 7 | [Ag][nm5][24h][EC50][HepG2] | −0.8541 | −5.2630 | −5.0009 | 2.0405 | YES
P | 8 | [Ag][nm5][24h][EC50][A549] | −0.2091 | −5.0330 | −4.7143 | 1.0472 | YES
C | 9 | [Ag][nm20][24h][EC50][A549] | 2.7579 | −4.0350 | −3.3961 | 0.0632 | YES
V | 10 | [Ag][nm50][24h][EC50][A549] | 3.5230 | −3.9100 | −3.0562 | 1.0472 | YES
A | 11 | [Ag][nm20][24h][EC50][MCF-7] | 1.3899 | −3.8780 | −4.0039 | 1.0565 | YES
P | 12 | [Ag][nm20][24h][EC50][HepG2] | 2.1129 | −3.6270 | −3.6827 | 1.0565 | YES
A | 13 | [Ag][nm50][24h][EC50][HepG2] | 2.8780 | −3.5070 | −3.3427 | 2.0405 | YES
P | 14 | [Ag][nm50][24h][EC50][MCF-7] | 2.1550 | −3.3560 | −3.6640 | 2.0405 | YES
P | 15 | [Al2O3][nm31][24h][ec50][A549] | 5.0528 | −2.0090 | −2.3765 | 2.0171 | YES
A | 18 | [Al2O3][nm40–68][24h][IC50][THP-1] | 5.1653 | −2.1810 | −2.3265 | 2.0375 | YES
A | 21 | [Bi2O3][nm149][24h][LC50][A549] | 1.8978 | −3.7930 | −3.7782 | 3.0131 | No
V | 22 | [Bi2O3][nm149][24h][LC50][HepG2] | 1.2528 | −3.6680 | −4.0648 | 4.0063 | No
C | 23 | [CeO2][nm14][24h][ec50][A549] | 3.5063 | −2.2360 | −3.0636 | 2.0171 | YES
P | 25 | [CeO2][nm33,4][24h][IC50][THP-1] | 3.2467 | −2.5570 | −3.1789 | 2.0375 | YES
P | 26 | [CeO2][nm-][24h][LD50][A549] | 4.8429 | −2.2360 | −2.4697 | 1.0209 | YES
C | 27 | [CeO2][nm-][48h][LD50][A549] | 3.7565 | −2.2360 | −2.9524 | 1.0307 | YES
P | 30 | [Co][nm20][24h][IC50][THP-1] | 3.8171 | −2.5540 | −2.9255 | 1.0535 | YES
A | 31 | [Co3O4][nm9,2][24h][IC50][Caco2] | 3.3840 | −3.2920 | −3.1179 | 2.0478 | YES
V | 32 | [Co3O4][nm9,2][24h][IC50][A549] | 4.2995 | −3.2650 | −2.7112 | 2.0258 | YES
A | 33 | [Co3O4][nm-][12h][ec50][A549] | 3.0885 | −3.3590 | −3.2492 | 1.0689 | YES
V | 34 | [Co3O4][nm-][108][ec50][A549] | 2.9873 | −3.3510 | −3.2942 | 1.0519 | YES
P | 35 | [Co3O4][nm-][36h][ec50][A549] | 3.0404 | −3.3480 | −3.2706 | 2.0176 | YES
V | 36 | [Co3O4][nm-][60h][ec50][A549] | 3.1083 | −3.3410 | −3.2404 | 2.0176 | YES
C | 42 | [Cu][nm90][24h][IC50][THP-1] | 2.8851 | −4.0500 | −3.3396 | 2.0375 | YES
C | 44 | [Cu][nm22,9][24h][IC50][THP-1] | 3.0525 | −2.7470 | −3.2652 | 2.0375 | YES
A | 48 | [Cu2O][nm83–94][24h][IC50][THP-1] | 2.0570 | −4.2440 | −3.7075 | 2.0375 | YES
V | 49 | [CuO][nm48][24h][ec50][A549] | 3.4861 | −2.9010 | −3.0726 | 0.0504 | YES
P | 50 | [CuO][nm11,9][24h][IC50][Caco2] | 2.3263 | −3.8000 | −3.5879 | 1.0812 | YES
C | 51 | [CuO][nm11,9][24h][IC50][A549] | 3.2418 | −3.6240 | −3.1811 | 1.0592 | YES
A | 52 | [CuO][nm42][18h][EC50][A549] | 0.4290 | −3.6000 | −4.4308 | 2.0565 | YES
C | 58 | [CuO][nm45,4][24h][IC50][THP-1] | 3.0629 | −3.8080 | −3.2606 | 1.0709 | YES
V | 59 | [CuO][nm30][24h][IC50][THP-1] | 2.9429 | −3.6480 | −3.3139 | 1.0709 | YES
V | 64 | [CuO][nm > 50][24h][IC50][A549] | 3.1342 | −3.4230 | −3.2289 | 0.0592 | YES
C | 65 | [CuO][nm-][60h][ec50][Caco2] | 2.3452 | −3.4190 | −3.5795 | 1.0730 | YES
C | 66 | [CuO][nm-][108][ec50][Caco2] | 2.2242 | −3.4020 | −3.6332 | 0.1073 | YES
V | 67 | [CuO][nm-][36h][ec50][A549] | 3.1928 | −3.3940 | −3.2029 | 1.0510 | YES
A | 68 | [CuO][nm-][108][ec50][A549] | 3.1397 | −3.3420 | −3.2265 | 0.0852 | YES
V | 69 | [CuO][nm-][216][ec50][Caco2] | 1.8852 | −3.3320 | −3.7838 | 0.0981 | YES
A | 70 | [CuO][nm-][216][ec50][A549] | 2.8007 | −3.3300 | −3.3771 | 0.0761 | YES
A | 71 | [CuO][nm-][60h][ec50][A549] | 3.2607 | −3.3280 | −3.1727 | 1.0510 | YES
C | 72 | [CuO][nm-][12h][ec50][A549] | 3.2409 | −3.3190 | −3.1815 | 0.1022 | YES
V | 73 | [CuO][nm-][36h][ec50][Caco2] | 2.2773 | −3.3140 | −3.6096 | 1.0730 | YES
A | 74 | [CuO][nm-][24h][ec50][A549] | 4.0369 | −3.2590 | −2.8278 | 0.0573 | YES
C | 75 | [CuO][nm-][24h][ec50][Caco2] | 3.1214 | −2.8320 | −3.2346 | 0.0793 | YES
V | 76 | [Fe2O3][nm39][24h][ec50][A549] | 4.6853 | −2.2040 | −2.5397 | 1.0171 | YES
V | 77 | [Fe2O3][nm-][24h][LD50][A549] | 6.0495 | −2.2040 | −1.9337 | 1.0209 | YES
A | 78 | [Fe2O3][nm-][48h][LD50][A549] | 4.9631 | −2.2040 | −2.4164 | 1.0307 | YES
A | 79 | [MgO][nm20][24h][ec50][A549] | 6.4904 | −1.6020 | −1.7378 | 1.0331 | YES
V | 80 | [Mn3O4][nm14,9][24h][IC50][Caco2] | 2.6535 | −3.5360 | −3.4425 | 2.0478 | YES
C | 81 | [Mn3O4][nm14,9][24h][IC50][A549] | 3.5690 | −3.2260 | −3.0357 | 2.0258 | YES
A | 82 | [Mn3O4][nm-][108][ec50][Caco2] | 1.3911 | −4.0440 | −4.0034 | 1.0739 | YES
V | 83 | [Mn3O4][nm-][216][ec50][Caco2] | 1.0522 | −3.8990 | −4.1539 | 1.0648 | YES
P | 84 | [Mn3O4][nm-][108][ec50][A549] | 2.3066 | −3.8570 | −3.5966 | 1.0519 | YES
V | 85 | [Mn3O4][nm-][60h][ec50][Caco2] | 1.5121 | −3.8390 | −3.9496 | 2.0396 | YES
P | 86 | [Mn3O4][nm-][60h][ec50][A549] | 2.4276 | −3.7670 | −3.5428 | 2.0176 | YES
A | 87 | [Mn3O4][nm-][216][ec50][A549] | 1.9677 | −3.6870 | −3.7472 | 1.0428 | YES
V | 88 | [Mn3O4][nm-][36h][ec50][A549] | 2.3598 | −3.4030 | −3.5730 | 2.0176 | YES
A | 89 | [MoO3][nm100][24h][ec50][A549] | 5.4219 | −2.1580 | −2.2125 | 2.0171 | YES
A | 91 | [Ni][nm64–69][24h][IC50][THP-1] | 4.5318 | −2.6220 | −2.6080 | 2.0375 | YES
P | 94 | [NiO][nm48,9][24h][IC50][THP-1] | 2.7830 | −4.5020 | −3.3850 | 2.0375 | YES
V | 95 | [Sb2O3][nm20,8][24h][IC50][Caco2] | 1.7044 | −3.7080 | −3.8642 | 1.0663 | YES
C | 96 | [Sb2O3][nm20,8][24h][IC50][A549] | 2.6199 | −3.5630 | −3.4574 | 1.0443 | YES
P | 97 | [Sb2O3][nm-][24h][ec50][Caco2] | 2.2649 | −4.4650 | −3.6151 | 0.0644 | YES
P | 98 | [Sb2O3][nm-][48h][ec50][Caco2] | 1.1785 | −4.4650 | −4.0978 | 0.0741 | YES
A | 99 | [Sb2O3][nm-][72][ec50][Caco2] | 0.3668 | −4.4650 | −4.4585 | 1.0581 | YES
A | 100 | [Sb2O3][nm-][216][ec50][A549] | 1.9442 | −4.1230 | −3.7576 | 0.0612 | YES
P | 101 | [Sb2O3][nm-][60h][ec50][Caco2] | 1.4887 | −3.8860 | −3.9600 | 1.0581 | YES
V | 102 | [Sb2O3][nm-][108][ec50][A549] | 2.2832 | −3.8800 | −3.6070 | 0.0704 | YES
C | 103 | [Sb2O3][nm-][108][ec50][Caco2] | 1.3677 | −3.7900 | −4.0138 | 0.0924 | YES
C | 104 | [Sb2O3][nm-][216][ec50][Caco2] | 1.0287 | −3.7820 | −4.1644 | 0.0832 | YES
V | 105 | [Sb2O3][nm-][60h][ec50][A549] | 2.4042 | −3.6940 | −3.5533 | 1.0361 | YES
P | 106 | [Sb2O3][nm-][36h][ec50][Caco2] | 1.4208 | −3.6410 | −3.9901 | 1.0581 | YES
V | 107 | [Sb2O3][nm-][36h][ec50][A549] | 2.3363 | −3.5380 | −3.5834 | 1.0361 | YES
P | 108 | [SnO2][nm21][24h][ec50][A549] | 4.4986 | −2.1790 | −2.6227 | 2.0171 | YES
P | 111 | [SnO2][nm33][24h][IC50][THP-1] | 2.7228 | −2.4290 | −3.4117 | 2.0375 | YES
A | 112 | [TiO2][nm30–50][24h][ec50][A549] | 5.4703 | −1.9030 | −2.1910 | 2.0171 | YES
P | 114 | [TiO2][nm5–15][24h][ec50][A549] | 5.2689 | −1.9030 | −2.2805 | 2.0171 | YES
P | 120 | [TiO2][nm12,2][24h][IC50][THP-1] | 4.7545 | −1.8770 | −2.5090 | 2.0375 | YES
V | 121 | [TiO2][nm-][24h][LD50][A549] | 6.5859 | −1.9030 | −1.6953 | 1.0209 | YES
A | 122 | [TiO2][nm-][48h][LD50][A549] | 5.4994 | −1.9030 | −2.1780 | 1.0307 | YES
C | 123 | [WO3][nm30][24h][ec50][A549] | 3.4752 | −2.3650 | −3.0774 | 2.0171 | YES
P | 124 | [Y2O3][nm33][24h][ec50][A549] | 3.8498 | −2.3540 | −2.9110 | 2.0171 | YES
A | 126 | [ZnO][nm21][24h][ec50][A549] | 4.6905 | −2.9080 | −2.5375 | 1.0509 | YES
P | 127 | [ZnO][nm19,7][24h][IC50][A549] | 3.1916 | −3.5120 | −3.2034 | 1.0597 | YES
C | 128 | [ZnO][nm19,7][24h][IC50][Caco2] | 2.2761 | −3.4280 | −3.6102 | 1.0817 | YES
V | 132 | [ZnO][nm53,6][24h][IC50][THP-1] | 2.6149 | −2.9990 | −3.4596 | 0.0714 | YES
P | 133 | [ZnO][nm-][48h][LD50][A549] | 3.5956 | −3.0330 | −3.0239 | 0.0645 | YES
C | 134 | [ZnO][nm-][24h][LD50][A549] | 4.6821 | −2.5110 | −2.5412 | 0.0548 | YES
V | 139 | [ZnO][nm > 50][24h][IC50][A549] | 2.9661 | −3.1300 | −3.3036 | 0.0597 | YES
P | 140 | [ZnO][nm-][216][ec50][A549] | 2.6326 | −3.2950 | −3.4518 | 0.0766 | YES
C | 141 | [ZnO][nm-][36h][ec50][A549] | 3.0247 | −3.2770 | −3.2776 | 1.0515 | YES
V | 142 | [ZnO][nm-][60h][ec50][A549] | 3.0925 | −3.2460 | −3.2474 | 1.0515 | YES
C | 143 | [ZnO][nm-][108][ec50][A549] | 2.9715 | −3.2380 | −3.3012 | 0.0858 | YES
V | 144 | [ZnO][nm-][60h][ec50][Caco2] | 2.1770 | −3.1890 | −3.6542 | 1.0735 | YES
C | 145 | [ZnO][nm-][216][ec50][Caco2] | 1.7171 | −3.1220 | −3.8585 | 0.0986 | YES
C | 146 | [ZnO][nm-][108][ec50][Caco2] | 2.0560 | −3.1130 | −3.7079 | 0.1078 | YES
C | 147 | [ZnO][nm-][36h][ec50][Caco2] | 2.1092 | −3.0900 | −3.6843 | 1.0735 | YES
A | 148 | [ZnO][nm-][12h][ec50][A549] | 3.0727 | −2.8380 | −3.2562 | 0.1028 | YES
C | 149 | [ZrO2][nm20–30][24 h][ec50][A549] | 5.5291 | −2.0900 | −2.1649 | 2.0171 | YES
A | 150 | [ZrO2][nm32][24 h][IC50][THP-1] | 5.3352 | −2.3340 | −2.2510 | 2.0375 | YES
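The Calc column of Table 2 appears to be a one-variable linear function of DCW(1,15). The coefficients are not stated in this part of the article, so the following minimal Python sketch only recovers them by ordinary least squares from five (DCW, Calc) pairs copied from the table. The names `ROWS`, `fit_line`, and `predict` are illustrative and are not part of the CORAL software.

```python
# Five (DCW(1,15), Calc) pairs copied from Table 2 (IDs 5, 6, 8, 10, 121).
ROWS = [
    (2.2559, -3.6191),
    (-1.5770, -5.3221),
    (-0.2091, -4.7143),
    (3.5230, -3.0562),
    (6.5859, -1.6953),
]


def fit_line(pairs):
    """Ordinary least-squares fit of y = c0 + c1 * x; returns (c0, c1)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


def predict(dcw, c0, c1):
    """Calculated endpoint for a given descriptor value (assumed linear model)."""
    return c0 + c1 * dcw


c0, c1 = fit_line(ROWS)
print(f"intercept = {c0:.4f}, slope = {c1:.4f}")
# Compare with the Calc value reported for ID 9 (DCW = 2.7579).
print(f"predicted Calc for ID 9 = {predict(2.7579, c0, c1):.4f}")
```

If the assumption of a linear model holds, the fitted line should reproduce the Calc values of rows that were not used in the fit to within rounding error.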
Table 3. The statistical characteristics of models observed in the case of the first CORAL method.
Split | Set * | n | R2 | CCC | IIC | CII | Q2 | RMSE | F
1 | A | 26 | 0.9953 | 0.9976 | 0.7316 | 0.9959 | 0.9946 | 0.062 | 5075
1 | P | 25 | 0.8116 | 0.8577 | 0.7559 | 0.8444 | 0.7912 | 0.436 | 99
1 | C | 25 | 0.5839 | 0.7129 | 0.5545 | 0.7389 | 0.5034 | 0.518 | 32
1 | V | 26 | 0.6669 | - | - | - | - | 0.421 | -
2 | A | 25 | 0.9674 | 0.9834 | 0.9079 | 0.9751 | 0.9625 | 0.162 | 682
2 | P | 26 | 0.8741 | 0.8936 | 0.9141 | 0.9062 | 0.8592 | 0.452 | 167
2 | C | 26 | 0.7119 | 0.8223 | 0.5094 | 0.7885 | 0.6510 | 0.287 | 59
2 | V | 25 | 0.6664 | - | - | - | - | 0.440 | -
3 | A | 26 | 0.9942 | 0.9971 | 0.9971 | 0.9947 | 0.9935 | 0.071 | 4145
3 | P | 25 | 0.8097 | 0.8489 | 0.6615 | 0.8729 | 0.7840 | 0.488 | 98
3 | C | 25 | 0.0250 | 0.1500 | 0.1577 | 0.8450 | 0.2846 | 0.654 | 1
3 | V | 26 | 0.4691 | - | - | - | - | 0.459 | -
4 | A | 26 | 0.9676 | 0.9836 | 0.7214 | 0.9690 | 0.9637 | 0.175 | 718
4 | P | 25 | 0.9292 | 0.8513 | 0.2425 | 0.9471 | 0.9194 | 0.455 | 302
4 | C | 25 | 0.4948 | 0.6813 | 0.5881 | 0.7106 | 0.3317 | 0.340 | 23
4 | V | 26 | 0.6266 | - | - | - | - | 0.544 | -
5 | A | 25 | 0.9958 | 0.9979 | 0.6653 | 0.9971 | 0.9948 | 0.061 | 5496
5 | P | 25 | 0.8422 | 0.9095 | 0.8821 | 0.8796 | 0.8229 | 0.423 | 123
5 | C | 26 | 0.5414 | 0.7188 | 0.4751 | 0.7746 | 0.4348 | 0.320 | 28
5 | V | 26 | 0.5943 | - | - | - | - | 0.466 | -
* A = active training set; P = passive training set; C = calibration set; V = validation set; n = number of samples in a set; R2 = determination coefficient; CCC = concordance correlation coefficient; IIC = index of ideality of correlation; CII = correlation intensity index; Q2 = leave-one-out cross-validated R2; RMSE = root mean squared error; F = Fisher F-ratio.
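Three of the statistics listed in the table footnote have standard textbook definitions, and the following Python sketch shows how they can be computed from paired experimental and calculated values. It is a sketch under stated assumptions: R2 is taken to be the squared Pearson correlation, CCC is taken to be Lin's concordance correlation coefficient with population (1/n) variances, and the function names are illustrative. IIC, CII, and Q2 are not reproduced because their definitions are not given in this section. The sample values are the Expr and Calc entries of the first five rows of Table 2.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson_r2(observed, predicted):
    """Squared Pearson correlation (assumed meaning of R2 in the tables)."""
    mo, mp = _mean(observed), _mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def ccc(observed, predicted):
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(observed)
    mo, mp = _mean(observed), _mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted)) / n
    var_o = sum((o - mo) ** 2 for o in observed) / n
    var_p = sum((p - mp) ** 2 for p in predicted) / n
    return 2 * cov / (var_o + var_p + (mo - mp) ** 2)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Expr and Calc values of Table 2, IDs 5 to 9 (illustration only; the
# published statistics are computed per set, not on this mixed sample).
expr = [-3.5930, -5.3340, -5.2630, -5.0330, -4.0350]
calc = [-3.6191, -5.3221, -5.0009, -4.7143, -3.3961]

print("R2   =", round(pearson_r2(expr, calc), 4))
print("CCC  =", round(ccc(expr, calc), 4))
print("RMSE =", round(rmse(expr, calc), 4))
```

Unlike the Pearson correlation, CCC penalizes a systematic offset between the two series, which is why it can be lower than R2 for the same data.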
Table 4. The statistical characteristics of models observed in the case of the second CORAL method.
Split | Set * | n | R2 | CCC | IIC | CII | Q2 | RMSE | F
1 | A | 26 | 0.9029 | 0.9490 | 0.6968 | 0.9203 | 0.8850 | 0.282 | 223
1 | P | 25 | 0.7780 | 0.8048 | 0.8727 | 0.8391 | 0.7534 | 0.477 | 81
1 | C | 25 | 0.5926 | 0.7273 | 0.7696 | 0.7677 | 0.5227 | 0.468 | 33
1 | V | 26 | 0.6895 | - | - | - | - | 0.325 | -
2 | A | 25 | 0.7646 | 0.8666 | 0.8072 | 0.8717 | 0.7154 | 0.434 | 75
2 | P | 26 | 0.7538 | 0.7461 | 0.8669 | 0.8612 | 0.7185 | 0.622 | 73
2 | C | 26 | 0.8353 | 0.8981 | 0.9124 | 0.8679 | 0.7857 | 0.221 | 122
2 | V | 25 | 0.9132 | - | - | - | - | 0.239 | -
3 | A | 26 | 0.8588 | 0.9240 | 0.5792 | 0.9002 | 0.8386 | 0.353 | 146
3 | P | 25 | 0.7794 | 0.6971 | 0.5627 | 0.8848 | 0.7481 | 0.661 | 81
3 | C | 25 | 0.6332 | 0.7749 | 0.7947 | 0.7882 | 0.5415 | 0.286 | 40
3 | V | 26 | 0.6278 | - | - | - | - | 0.318 | -
4 | A | 26 | 0.8868 | 0.9400 | 0.6906 | 0.9275 | 0.8671 | 0.327 | 188
4 | P | 25 | 0.8327 | 0.8529 | 0.3663 | 0.8719 | 0.8021 | 0.486 | 114
4 | C | 25 | 0.3795 | 0.5803 | 0.6157 | 0.8039 | 0.2497 | 0.439 | 14
4 | V | 26 | 0.7355 | - | - | - | - | 0.413 | -
5 | A | 25 | 0.8341 | 0.9095 | 0.8430 | 0.9063 | 0.8062 | 0.387 | 116
5 | P | 25 | 0.7593 | 0.8271 | 0.4998 | 0.8316 | 0.7258 | 0.557 | 73
5 | C | 26 | 0.8146 | 0.8062 | 0.9016 | 0.8982 | 0.7673 | 0.317 | 105
5 | V | 26 | 0.7878 | - | - | - | - | 0.273 | -
* A = active training set; P = passive training set; C = calibration set; V = validation set; n = number of samples in a set; R2 = determination coefficient; CCC = concordance correlation coefficient; IIC = index of ideality of correlation; CII = correlation intensity index; Q2 = leave-one-out cross-validated R2; RMSE = root mean squared error; F = Fisher F-ratio.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
