Article

Semi-Correlations for the Simulation of Dermal Toxicity

by Andrey A. Toropov, Alla P. Toropova *, Alessandra Roncaglioni and Emilio Benfenati
Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milan, Italy
* Author to whom correspondence should be addressed.
Toxics 2025, 13(4), 235; https://doi.org/10.3390/toxics13040235
Submission received: 18 February 2025 / Revised: 17 March 2025 / Accepted: 20 March 2025 / Published: 23 March 2025
(This article belongs to the Section Novel Methods in Toxicology Research)

Abstract

The skin is the primary pathway for harmful substances to enter the body and a susceptible target organ, making compound-induced acute dermal toxicity a significant health risk. In this work, the possibility of modelling dermal toxicity using so-called semi-correlations is studied. Semi-correlations are a specific case of correlations, where one variable takes only two values. For example, 0 denotes the absence of activity (e.g., dermal toxicity), and 1 denotes the presence of activity. The described computational experiments can be carried out by interested readers using the freely available software CORAL.

1. Introduction

The skin is the primary pathway for harmful substances to enter the body and a susceptible target organ, making compound-induced acute dermal toxicity a significant health risk [1]. The research on acute dermal toxicity has consistently been a crucial component in assessing the potential risks of human exposure to active ingredients in cosmetics in particular, pharmaceuticals mainly for topical use, most substances of common use for consumers, and all substances for occupational exposure. However, it is difficult to directly identify the acute dermal toxicity of potential compounds through animal experiments alone [2].
The growing number of studies on dermal toxicity is explained by the increasing use of medicinal agents that are regarded as safe and effective because of their natural origin and long history of use, but which are sometimes dangerous. A natural origin does not guarantee safety, as evidenced by the risks associated with the use of these agents [3]. It is therefore important to test and confirm their safety and efficacy. According to the Organisation for Economic Co-operation and Development (OECD) guidelines, acute dermal toxicity refers to adverse reactions observed shortly after application to the skin of a single dose of the test substance [4]. Dermal toxicity may cover several endpoints, such as irritation and sensitization, but also related effects due to systemic exposure. The era of big data and high-throughput screening technology has led to the generation and collection of vast amounts of experimental dermal toxicity data in publicly available databases. These data have stimulated the development and validation of computational simulations that are further used to predict dermal toxicity in a variety of fields and organisms [5,6,7,8].
One can expect new, more complex technologies and treatments that will require an expansion of the list of substances of interest, generally speaking, for all endpoints, and in particular for acute dermal toxicity [9]. This process can progressively result in updated versions of the database, but the new data may be introduced or not, and the frequency of the updated versions varies [10]. In all cases, the number of substances we are exposed to is much higher and increasing more rapidly than the updating procedure of the databases. Thus, there is a need to adopt further approaches to cope with dermal toxicity, different from experimental assays, while waiting for the experimental results. In silico models may be a valuable solution.
There are two options for constructing in silico models, depending on how the corresponding data are presented. The first is a regression model, which simulates a continuous range of values from a certain minimum to a certain maximum. The second is a classification model, which predicts one of two states: the substance is active or inactive [11].
For linear regression models, the least squares method is often used: from the series of experimental and calculated values of the endpoint under study, it yields the regression coefficients C0 and C1, whose meaning is presented in Figure 1. For classification models, the least squares method is generally not considered appropriate. However, as it turned out, this method can be applied in this case if one accepts the use of so-called semi-correlations.
The idea of semi-correlation is an attempt to use algorithms designed to handle “ordinary” regressions for the case when the simulated quantity takes two values, interpreted as activity (which is denoted, for example, by 1) and lack of activity (which is denoted, for example, by 0). Here, the semi-correlations are used to develop a categorical model of acute dermal toxicity. The CORAL-2024 software (http://www.insilico.eu/coral, Accessed on 21 March 2025) is a tool for building semi-correlations. Simplified molecular input-line entry systems (SMILES) [12] are used to represent the molecular structure and calculate the corresponding structural descriptors [13,14].
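As an illustration of the idea, the following minimal Python sketch fits an ordinary least squares line to an endpoint that takes only the values 0 and 1 and then cuts the fitted line at 0.5. It is not the CORAL implementation; the function name `fit_semi_correlation` and the descriptor values are hypothetical and serve only to show the principle.

```python
def fit_semi_correlation(descriptor, activity):
    """Least squares fit of a 0/1 activity against a single descriptor.

    Returns the intercept C0 and the slope C1 of activity ~ C0 + C1 * descriptor.
    """
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


# Hypothetical descriptor values and activity labels (not data from the study)
descriptor = [-4.0, -2.5, -1.0, 0.5, 2.0, 3.5, 5.0, 6.5]
activity = [0, 0, 0, 0, 1, 1, 1, 1]

c0, c1 = fit_semi_correlation(descriptor, activity)
# The fitted line is turned into a category by cutting it at 0.5
predicted = [1 if c0 + c1 * x >= 0.5 else 0 for x in descriptor]
print(c0, c1, predicted)
```

The regression itself is unchanged; only the interpretation of the calculated value differs, since it is read as a category rather than as a potency.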

2. Materials and Methods

2.1. Data

Data on the acute dermal toxicity of 2616 compounds were taken from the literature [15]. That source summarises results for several endpoints, both topical (skin sensitization, skin irritation and corrosion, and eye irritation and corrosion) and systemic (acute oral toxicity, acute inhalation toxicity, and acute dermal toxicity). The total number of active compounds (acute dermal toxicity observed) is 382, so the remaining 2234 compounds are inactive and the data are imbalanced. For constructing binary classification models in general, and for constructing such models using the semi-correlation method in particular, it is preferable to use balanced data in which the numbers of active and inactive compounds are at least approximately equal. Considering this, ten lists were prepared, each containing the same 382 active compounds and 382 inactive compounds selected at random; a non-identical list of inactive compounds was used each time. For the splits, a similar approach has been used.
The approach under consideration includes the following steps. First, the available data (382 active and 382 inactive substances) are divided into four sets: active training (≈25%), passive training (≈25%), calibration (≈25%), and validation (≈25%). The validation set remains invisible until the end of model development, which allows it to be used for the final assessment of the predictive potential of the resulting model. The active training set is the basis for model development: it is used to calculate the so-called correlation weights for the molecular features involved in model building, and the selected correlation weights are its outcome. During these calculations, compounds assigned to the passive training set act as permanent checkers of whether the model is good for compounds outside the active training set. The calibration set is designed to catch the moment of overfitting, when an improvement in statistical quality on the training sets is accompanied by a deterioration in the statistics on the calibration set. The split into the specified subsets is carried out by means of the Las Vegas algorithm, aimed at obtaining the division most favourable for the calibration set, with the hope that such a division will also be favourable for the external validation set. The selection of the best split for the calibration set is repeated in ten random attempts, and the differences between the splits are assessed. While the active and passive training sets serve the initial steps of the model's development, all parameters are optimised only with the results from the calibration set. Thus, the results from the calibration set, rather than those of the active and passive training sets (which are also shown below), are the ones relevant for describing the statistical values of the model.
Moreover, the use of the calibration set is very useful to avoid overfitting, focussing attention on the results from substances not yet used in the model’s development.
However, the statistical evaluation of the calibration set provides information on the robustness of the model, but not on its predictivity when new substances are evaluated. For this purpose, the validation set is used. Thus, the validation set provides the check considering substances that have not been used in the model development. Obviously, different partitions lead to different statistical results, and the average values should be considered as the quality indicator.
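The preparation of balanced data and the four-way division can be sketched as follows. This is only a simple random partition under stated assumptions: it does not reproduce the Las Vegas selection of the split most favourable for the calibration set, and the compound identifiers and the function name `balanced_split` are hypothetical.

```python
import random


def balanced_split(active_ids, inactive_ids, seed):
    """Draw as many inactives as there are actives, then cut into four equal parts."""
    rng = random.Random(seed)
    # All actives are kept; the inactives are a fresh random sample each time
    inactive_sample = rng.sample(inactive_ids, len(active_ids))
    compounds = [(cid, 1) for cid in active_ids] + [(cid, 0) for cid in inactive_sample]
    rng.shuffle(compounds)

    names = ["active_training", "passive_training", "calibration", "validation"]
    quarter = len(compounds) // 4
    sets = {}
    for i, name in enumerate(names):
        start = i * quarter
        # The last set takes any remainder left by integer division
        end = (i + 1) * quarter if i < 3 else len(compounds)
        sets[name] = compounds[start:end]
    return sets


# Placeholder identifiers standing in for the 382 active and 2234 inactive compounds
active_ids = [f"A{i}" for i in range(382)]
inactive_ids = [f"I{i}" for i in range(2234)]

splits = [balanced_split(active_ids, inactive_ids, seed) for seed in range(10)]
for name, part in splits[0].items():
    print(name, len(part))
```

With 764 compounds per list, each of the four sets receives 191 compounds, which matches the approximate 25% shares given above.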

2.2. Model

The model of dermal toxicity is defined as
$$\mathrm{MODEL} = C_0 + C_1 \times DCW(T, N) \tag{1}$$
$$DCW(T, N) = \sum_k CW(S_k) + \sum_k CW(SS_k) + \sum_k CW(SSS_k) \tag{2}$$
The descriptor DCW(T, N) is calculated with correlation weights (CW) of SMILES attributes. Here, three types of SMILES attributes are considered: Sk is the SMILES-atom, i.e., one symbol (‘C’, ‘N’, ‘S’, etc.) or a group of symbols (‘Cl’, ‘%11’, ‘@@’, etc.) that loses its meaning if the symbols are separated; SSk is a pair of SMILES-atoms that are neighbours in the SMILES notation; and SSSk is a sequence of three SMILES-atoms that are neighbours in the SMILES notation. T and N are parameters of the Monte Carlo optimization: T is the threshold that separates rare SMILES attributes, which are removed from the simulation process, from active (non-rare) SMILES attributes, which are involved in it; N is the number of epochs of the Monte Carlo optimization that defines the correlation weights used in Equation (2).
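The calculation of the descriptor can be sketched in a few lines of Python. The tokenizer below is a deliberate simplification (it recognises only ‘Cl’, ‘Br’, ‘@@’, and two-digit ring closures such as ‘%11’ as multi-symbol SMILES-atoms), the correlation weights are invented for the example, and treating attributes without a weight as contributing zero is an assumption standing in for the removal of rare attributes by the threshold T.

```python
import re

# Simplified SMILES-atom pattern (assumption): multi-symbol groups first, then any single symbol
TOKEN = re.compile(r"Cl|Br|@@|%\d{2}|.")


def smiles_attributes(smiles):
    """Return the S_k, SS_k and SSS_k attributes of a SMILES string as tuples."""
    atoms = TOKEN.findall(smiles)
    s1 = [(a,) for a in atoms]
    s2 = [tuple(atoms[i:i + 2]) for i in range(len(atoms) - 1)]
    s3 = [tuple(atoms[i:i + 3]) for i in range(len(atoms) - 2)]
    return s1 + s2 + s3


def dcw(smiles, weights):
    """Sum the correlation weights of all attributes found in the SMILES.

    Attributes missing from `weights` contribute zero (assumption: this is how
    rare, blocked attributes are handled in this sketch).
    """
    return sum(weights.get(attr, 0.0) for attr in smiles_attributes(smiles))


# Hypothetical correlation weights, not values from the study
weights = {("C",): 0.5, ("Cl",): 1.25, ("C", "Cl"): -0.25}

print(smiles_attributes("CCCl"))
print(dcw("CCCl", weights))
```

In the sketch, the SMILES "CCCl" is read as three SMILES-atoms (C, C, Cl), two neighbouring pairs, and one triple, so six attributes enter the sum.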
However, when using such a model, it is not a numerical value that is needed, but a categorical definition in the format of “active or inactive” [13]. The formation of these two categories is numerically expressed by Equation (3). Graphically, these categories are presented in Figure 2. The threshold 0.5 is selected as the middle value between inactive (i.e., 0) and active (i.e., 1). Indeed, if we imagine that activity ranges from 0 to 1, we may have all possible values between these two cases, and this situation is addressed by regression models. Instead, in our case, to address a categorical output, we simply define that all substances with values below 0.5 are inactive, and the others are active. This is the easiest solution, which has been applied in other cases, for instance, by the US EPA predictive platform T.E.S.T. (https://www.epa.gov/comptox-tools/toxicity-estimation-software-tool-test, Accessed on 21 March 2025), to define whether a predicted substance is mutagenic or not. Thus, this value should not be associated with a specific toxicological potency, but it is simply used to introduce a mathematical parameter functional in our semi-correlation algorithm.
$$\mathrm{CATEGORY}(\mathrm{SMILES}) = \begin{cases} 1\ (\text{active}), & \text{if } \mathrm{MODEL} \geq 0.5 \\ 0\ (\text{inactive}), & \text{if } \mathrm{MODEL} < 0.5 \end{cases} \tag{3}$$
The effectiveness of binary classification was assessed according to the following statistical characteristics:
$$\text{Sensitivity (Sens)} = \frac{TP}{TP + FN}$$
$$\text{Specificity (Spec)} = \frac{TN}{TN + FP}$$
$$\text{Accuracy (Acc)} = \frac{TP + TN}{TP + FP + FN + TN}$$
$$\text{Matthews correlation coefficient (MCC)} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
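The categorisation at the 0.5 threshold and the four statistical characteristics can be computed as in the following sketch. The function names and the example values are hypothetical, and the code assumes that both classes are present among the observed labels, so that no denominator in sensitivity or specificity is zero.

```python
from math import sqrt


def classify(model_value, threshold=0.5):
    """Active (1) if the calculated value reaches the threshold, inactive (0) otherwise."""
    return 1 if model_value >= threshold else 0


def classification_metrics(observed, predicted):
    """Sensitivity, specificity, accuracy and MCC from observed and predicted 0/1 labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
        "acc": (tp + tn) / (tp + tn + fp + fn),
        # MCC is set to 0 when its denominator vanishes (a common convention, assumed here)
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical observed labels and calculated MODEL values
observed = [1, 1, 1, 1, 0, 0, 0, 0]
model_values = [0.9, 0.7, 0.6, 0.4, 0.1, 0.3, 0.55, 0.2]

predicted = [classify(v) for v in model_values]
metrics = classification_metrics(observed, predicted)
print(predicted, metrics)
```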

2.3. Monte Carlo Optimization

The Monte Carlo method was used to calculate the correlation weights. Two objective functions are considered in this study: TF0 and TF1.
$$TF_0 = r_{AT} + r_{PT} - \left| r_{AT} - r_{PT} \right| \times 0.1$$
$$TF_1 = TF_0 + IIC_C \times 0.3$$
Here, $r_{AT}$ and $r_{PT}$ are determination coefficients between the experimental and calculated endpoint values for the active and passive training sets, respectively.
$IIC_C$ is the index of ideality of correlation calculated with data on the calibration set as follows [14]:
$$IIC_C = r_C \times \frac{\min(MAE_C^-, MAE_C^+)}{\max(MAE_C^-, MAE_C^+)}$$
$$\min(x, y) = \begin{cases} x, & \text{if } x < y \\ y, & \text{otherwise} \end{cases}$$
$$\max(x, y) = \begin{cases} x, & \text{if } x > y \\ y, & \text{otherwise} \end{cases}$$
$$MAE_C^- = \frac{1}{N^-} \sum \left| \Delta_k \right|, \quad N^- \text{ is the number of } \Delta_k < 0$$
$$MAE_C^+ = \frac{1}{N^+} \sum \left| \Delta_k \right|, \quad N^+ \text{ is the number of } \Delta_k \geq 0$$
$$\Delta_k = \mathrm{observed}_k - \mathrm{calculated}_k$$
$r_C$ is the correlation coefficient between the observed and calculated values of the endpoint on the calibration set. Observed and calculated are the corresponding values of ‘y’ applied to define the corresponding categories (active/inactive).
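The two objective functions and the index of ideality of correlation can be written down directly from the definitions above. The sketch below is illustrative only: the function names are hypothetical, the example values are invented, and returning 0 when all residuals have the same sign is an assumption made to avoid an undefined ratio.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def iic(observed, calculated):
    """Index of ideality of correlation for one set (here, the calibration set)."""
    r = pearson_r(observed, calculated)
    deltas = [o - c for o, c in zip(observed, calculated)]
    negative = [abs(d) for d in deltas if d < 0]
    non_negative = [abs(d) for d in deltas if d >= 0]
    if not negative or not non_negative:
        return 0.0  # assumption: undefined ratio is treated as zero
    mae_minus = sum(negative) / len(negative)
    mae_plus = sum(non_negative) / len(non_negative)
    return r * min(mae_minus, mae_plus) / max(mae_minus, mae_plus)


def tf0(r_at, r_pt):
    """Target function without the index of ideality of correlation."""
    return r_at + r_pt - abs(r_at - r_pt) * 0.1


def tf1(r_at, r_pt, iic_c):
    """Target function with the calibration-set index of ideality of correlation."""
    return tf0(r_at, r_pt) + iic_c * 0.3


# Hypothetical calibration-set values
observed = [0, 0, 1, 1]
calculated = [0.2, 0.4, 0.6, 0.9]
iic_c = iic(observed, calculated)
print(iic_c, tf0(0.8, 0.6), tf1(0.8, 0.6, iic_c))
```

Because the ratio of the two mean absolute errors never exceeds 1, the index is at most equal to the correlation coefficient itself and decreases when positive and negative residuals are unevenly sized.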
The use of the index of ideality of correlation has a rather strong effect on the Monte Carlo process in the case of constructing conventional regression models. The essence of this effect is improving the statistical quality of correlations on the calibration set, even though it may be accompanied by some decrease in statistical quality on the training sets (usually on both: active and passive). As we clarified above, the model in the stage of the active and passive training sets is not mature, and only when all parameters are optimized is the model built up. This occurs using the calibration set. The use of the index of ideality of correlation, which is still within the process of model building, privileges the role of the calibration set versus the results of the initial steps of model building. The calibration set has a broader assessment of all the factors, which may affect the model, compared to the active and passive sets, which only use the initial correlation weights applied to part of the chemicals. As we explained, relying on the active and passive sets introduces the risk of overfitting, since apparently good results on these sets may not be replicated using new substances. Conversely, the index of ideality of correlation forces the system towards the best results obtained on the calibration set.

2.4. Applicability Domain

The applicability domain, calculated with Equation (3), defines the so-called statistical defects of SMILES attributes. These defects can be calculated as follows:
$$d_k = \frac{\left| P(A_k) - P'(A_k) \right|}{N(A_k) + N'(A_k)} + \frac{\left| P(A_k) - P''(A_k) \right|}{N(A_k) + N''(A_k)} + \frac{\left| P'(A_k) - P''(A_k) \right|}{N'(A_k) + N''(A_k)}$$
where P(Ak), P′(Ak), P″(Ak) are the probabilities of Ak in the active training, passive training, and calibration sets, respectively; N(Ak), N′(Ak), and N″(Ak) are frequencies of Ak in the active training, passive training, and calibration sets, respectively. The statistical SMILES-defects (Dj) are calculated as follows:
$$D_j = \sum_{k=1}^{N_A} d_k$$
where NA is the number of non-blocked SMILES attributes in the SMILES.
A SMILES falls in the applicability domain if
$$D_j < 2 \bar{D}$$
where $\bar{D}$ is the average value of Dj on the active training set.
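A minimal sketch of the applicability domain check is given below. It rests on assumptions that are not stated in the text: the probability of an attribute in a set is taken as its frequency divided by the number of SMILES in that set, and a pair of sets in which the attribute never occurs is skipped to avoid division by zero. The function names and counts are hypothetical.

```python
def statistical_defect(attr, counts, set_sizes):
    """Defect d_k of one attribute.

    counts: three dicts (active training, passive training, calibration)
            mapping attribute -> frequency in that set.
    set_sizes: number of SMILES in each of the three sets.
    """
    n = [c.get(attr, 0) for c in counts]
    # Assumption: probability = frequency / number of SMILES in the set
    p = [freq / size for freq, size in zip(n, set_sizes)]
    d = 0.0
    for i, j in ((0, 1), (0, 2), (1, 2)):
        denom = n[i] + n[j]
        if denom:  # assumption: skip a pair where the attribute never occurs
            d += abs(p[i] - p[j]) / denom
    return d


def smiles_defect(attributes, counts, set_sizes):
    """Defect D_j of one SMILES: sum of d_k over its non-blocked attributes."""
    return sum(statistical_defect(a, counts, set_sizes) for a in attributes)


def in_domain(d_j, mean_defect_active_training):
    """A SMILES is inside the domain if D_j is below twice the average defect."""
    return d_j < 2 * mean_defect_active_training


# Hypothetical attribute frequencies in three sets of 100 SMILES each
counts = [{"C": 90, "Cl": 10}, {"C": 80, "Cl": 20}, {"C": 85, "Cl": 5}]
set_sizes = [100, 100, 100]

d_cl = statistical_defect("Cl", counts, set_sizes)
d_j = smiles_defect(["C", "Cl"], counts, set_sizes)
print(d_cl, d_j, in_domain(d_j, mean_defect_active_training=0.02))
```

An attribute that is distributed evenly over the three sets has a small defect, whereas an attribute that is frequent in one set and rare in another increases the defect of every SMILES containing it.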

2.5. Mechanistic Interpretation

Having the numerical data on the correlation weights of codes applied in quasi-SMILES, which were observed in several runs of the Monte Carlo optimization, one can extract three categories of these codes:
(i) Codes that have a positive value for the correlation weight in all runs. These are promoters of endpoint increase;
(ii) Codes that have a negative value for the correlation weight in all runs. These are promoters of endpoint decrease;
(iii) Codes that have both negative and positive values for the correlation weight in different runs of the optimization. These are codes with unclear roles (one cannot classify these features as promoters of increase or decrease for the endpoint).
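The three categories listed above can be assigned automatically once the correlation weights from several runs are collected. In the sketch below, the function name, the attribute codes, and the weights are hypothetical; a weight of exactly zero in any run is treated as giving an unclear role, which is an assumption.

```python
def attribute_role(weights_across_runs):
    """Assign a role to an attribute from its correlation weights in several runs."""
    if not weights_across_runs:
        return "unclear"
    if all(w > 0 for w in weights_across_runs):
        return "promoter of increase"
    if all(w < 0 for w in weights_across_runs):
        return "promoter of decrease"
    # Mixed signs (or a zero weight, by assumption) give no stable role
    return "unclear"


# Hypothetical correlation weights observed in five optimization runs
runs = {
    "Cl": [0.8, 1.1, 0.4, 0.9, 0.6],
    "#": [-0.5, -0.2, -0.9, -0.3, -0.7],
    "c": [0.2, -0.1, 0.3, -0.4, 0.1],
}

roles = {attr: attribute_role(ws) for attr, ws in runs.items()}
print(roles)
```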

3. Results

The statistical results obtained from the described computational experiments are reported (i) for the case of applying the target function TF0 in Table 1; and (ii) for the case of applying target function TF1 in Table 2; as well as in Figure 3.
Figure 3 shows that the course of the optimization differs significantly between TF0 and TF1. For the optimization without the correlation ideality index (TF0), the threshold T = 3 and the number of training epochs N = 3 were adopted, i.e., the descriptor DCW(3, 3) was used. For the optimization with the target function TF1, the descriptor DCW(3, 15) was used. One can see from Figure 3 that, for semi-correlations obtained with the target function TF1, the statistical quality is distributed in favour of the calibration set.
Table 1 presents the statistical characteristics of the models obtained using Monte Carlo optimization with the target function TF0. The Matthews correlation coefficient (MCC) is the most informative measure of the predictive potential of classification models. An MCC value above 0.5 indicates that the model is good, while a value close to 0 means that the model is poor. It can be seen that the MCC values for the active training set and passive training set are generally significantly higher than the MCC values for the calibration and validation sets. The values change depending on the split, and there are low values in some cases, indicating problems.
Table 2 presents the statistical characteristics of the models obtained using Monte Carlo optimization with the target function TF1. It can be seen that when using TF1 as the objective function, the situation changes: For the training samples, the MCC values are quite low, but at the same time, the MCC values for the calibration set and the validation set are quite high. The reason for this is that the simulation process, due to the use of the correlation ideality index, turns towards increasing the statistical quality of the calibration set, without forcing the improvement of the statistics on the training sets. When the process of optimisation occurs on the active and passive training sets, overtraining occurs, and thus the model is not able to extract rules of general value. The final development of the model occurs during the calibration set step. Thus, the statistics for the calibration set are those to be used to characterise the results of the model internally, while the results on the validation set represent the situation expected when the model is used externally. Fortunately, as can be seen from Table 2, the increase in MCC for the calibration set is accompanied by an increase in MCC for the validation set, which confirms that the model is mature and valid, and thus can be used for predictions.
The models obtained using TF1 are as follows:
MODEL = 0.3787 + 0.04072 × DCW(3, 15)
MODEL = 0.3512 + 0.04326 × DCW(3, 15)
MODEL = 0.3868 + 0.03540 × DCW(3, 15)
MODEL = 0.5030 + 0.05521 × DCW(3, 15)
MODEL = 0.5911 + 0.07688 × DCW(3, 15)
MODEL = 0.5841 + 0.04142 × DCW(3, 15)
MODEL = 0.5362 + 0.02138 × DCW(3, 15)
MODEL = 0.5474 + 0.04579 × DCW(3, 15)
MODEL = 0.5758 + 0.05128 × DCW(3, 15)
MODEL = 0.5357 + 0.02854 × DCW(3, 15)
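Since the slope C1 is positive in all ten equations, the condition MODEL ≥ 0.5 can be rewritten as a condition on the descriptor, DCW(3, 15) ≥ (0.5 − C0)/C1. The short sketch below computes this boundary for each equation; the coefficients are copied from the list above, while the function name `dcw_threshold` is introduced only for this illustration.

```python
# (C0, C1) pairs copied from the ten TF1 model equations listed above, in order
models = [
    (0.3787, 0.04072),
    (0.3512, 0.04326),
    (0.3868, 0.03540),
    (0.5030, 0.05521),
    (0.5911, 0.07688),
    (0.5841, 0.04142),
    (0.5362, 0.02138),
    (0.5474, 0.04579),
    (0.5758, 0.05128),
    (0.5357, 0.02854),
]


def dcw_threshold(c0, c1, cutoff=0.5):
    """Smallest DCW for which C0 + C1 * DCW reaches the cutoff (valid for C1 > 0)."""
    if c1 <= 0:
        raise ValueError("the rearrangement assumes a positive slope")
    return (cutoff - c0) / c1


thresholds = [dcw_threshold(c0, c1) for c0, c1 in models]
for number, value in enumerate(thresholds, start=1):
    print(f"equation {number}: active if DCW(3, 15) >= {value:.3f}")
```

Where the intercept C0 already exceeds 0.5, the boundary is negative, so a substance is categorised as inactive only when the sum of its correlation weights is sufficiently negative.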
The applicability domain check, based on the statistical defect values, has shown that, on average, more than 80% of the compounds in the external validation sets fall within the applicability domain.
Table 3 presents the results of five probes of the Monte Carlo optimization using target function TF1. Table 3 presents the observed promoters of increase and decrease in the probability of dermal toxicity in the five optimization probes along with their prevalence in the active and passive training sets and the calibration set, depending on the sign of the correlation weight, i.e., positive or negative. Dermal toxicity is a complex phenomenon. Therefore, to discern truly reliable bases for the mechanistic interpretation of models, it is necessary to process data on the behaviour of larger molecular fragments than those collected in Table 3. Nevertheless, it should be noted that there are fragments for which a stable role in impacting the probability of dermal toxicity is observed. No less important, among these, there are fragments characterized by significant prevalence in the training and the calibration sets.

4. Discussion

The approach under consideration requires a large amount of data (100 or more substances). It is impossible to use the described methodology for 10 or 15 substances. In addition, a balance between the numbers of active and inactive substances is necessary. As Box said [16], “all models are wrong, but some are useful”. Can the approach under consideration be assessed as useful?
The reliability and reproducibility of the statistical quality of the forecast obtained by using semi-correlations are confirmed across ten significantly different distributions of available data in the training samples and the validation set. It should be noted that the inactive molecules for each of these splits were drawn from a total pool of 2234 substances. In other words, accidental success of the described approach should be considered very unlikely.
The use of the correlation ideality index deserves special attention. The fundamental feature of the stochastic process implemented using the above-mentioned index is its focus on the forecast description for the calibration set. As noted above, this is done regardless of the statistical values for the training samples (active and passive). The success of this technology is due to the fact that the above-mentioned sets are nevertheless taken into account. However, in the case of Monte Carlo optimization with TF1, the role of the training sets in generating a model through the correlation weights is exploited and optimised, giving a strong role to the results on the calibration set. It is assumed that good statistical quality on the calibration set should be accompanied by good statistical quality for the external set, which uses substances unknown in the model development process.
The traditionally accepted concept of QSPR/QSAR analysis requires the same or at least similar statistical quality of models for all samples used (training and validation). In particular, models should not be overfitted [17]. In the case of simulating endpoints using the described stochastic processes, at the first stage of optimization, the main components (affecting the correlation weights) are molecular fragments and molecules exhibiting average behaviour. Using these data, the optimization algorithm arrives at a model consistent between the training and calibration samples. However, there is a risk of overly optimizing the weights for molecular fragments exhibiting “non-standard” and “non-average” behaviour observed in the training sets. Their optimization leads to an improvement in statistical quality only for the training set. At the same time, the statistical quality of the model for the calibration set decreases [17]. To address this issue, the application of the correlation ideality index blocks the growing influence of non-standard and atypical components in the optimization process. In some cases, this may result in an apparently paradoxical model, in which the statistical quality for the training samples is significantly lower than that for the calibration and validation sets. This is a semantic aspect because what we refer to as training sets in our modelling approach are only the initial sets, representing the preliminary phases of the modelling. Instead, the results from the calibration set are close to the results for the training set in other models, which do not split the initial set into several subsets. This situation has been shown and discussed in several works where the CORAL-2024 software (https://www.insilico.eu/coral, Accessed on 21 March 2025) is used [18,19,20,21,22,23,24,25].
The analysis of Table 3 shows that the increase in the probability of acute dermal toxicity is caused by the branching of the carbon skeleton, as well as the presence of chlorine, oxygen and nitrogen atoms, and double bonds. A decrease in the probability of acute dermal toxicity is caused by the presence of nitrogen atoms with a triple bond, as well as sulphur atoms with a double bond. The complexity of the mechanisms of dermal toxicity has been discussed many times [26,27,28,29,30,31,32]. Some agreement with the assumptions about the mechanisms of dermal toxicity and the structural alerts presented in Table 3 is observed. The presence of chlorine atoms can also implicate dermal toxicity [28], and oxygen (and other) atoms may indicate the possibility of oxidative stress, also in relation to the branching of the carbon skeleton and polyaromatic rings [29,30,31,32].
The approach discussed has been used to model other endpoints, but this is the first time that the semi-correlation method has been used to model dermal toxicity.
The approach considered is convenient for the practical construction of the models in that its implementation requires only SMILES and experimental data without involving additional descriptors. For interpreting the results, this approach provides a convenient opportunity to assess the relevance of various molecular features according to their distribution in training and the validation sets. Finally, it provides an opportunity for a convenient statistical version of the mechanistic interpretation of the results obtained.
A comparison of the models obtained here with the models proposed in the literature is presented in Table 4. The statistical quality of these models is quite comparable.
We note that the results in Ref. [15] are particularly interesting because in that study authors used exactly the same substances that we used, since we derived the set of substances from that paper. This offers the opportunity to compare the results obtained using the random forest algorithm, as in the original work, with those from our new algorithm. Random forest is a complex machine learning approach, which requires building many parallel models, which are then used collectively. Our algorithm is much simpler and easier, relying on a single model. Another important difference is that the CORAL software does not require the calculation of molecular descriptors. In the original paper, several descriptors were used jointly: two-dimensional Morgan fingerprints, MACCS keys, and the Mordred descriptor. CORAL simply needs the SMILES, and the characters used in the SMILES are used to construct the model, as described above. The results shown in Table 4 for the model as in Ref. [15] were obtained using five-fold external cross-validation. These values may be compared with the results that we obtained using ten splits on the external validation set. The average sensitivity that we obtained is 0.88, which is higher than the 0.74 obtained in Ref. [15]. The average specificity that we obtained is 0.87, which is higher than the 0.78 obtained in Ref. [15]. Thus, in the specific case of the use of exactly the same substances, we obtained better results.

5. Conclusions

Semi-correlations may provide a basis for developing classification models for acute dermal toxicity. The reliability of the suggested approach has been checked with ten random splits, and the results support its consistency and robustness. The applicability domain and mechanistic interpretation of these models have been suggested and discussed. The resulting models showed good statistical performance for acute dermal toxicity based only on the molecular structure of the substance represented as SMILES. One can repeat the computational experiments described using freeware and instructions available on the Internet (http://www.insilico.eu/coral, Accessed on 21 March 2025).

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/toxics13040235/s1, Tables S1–S10 contain the statistical characteristics of models observed for 10 splits in the case of optimization with target function TF1. The supplementary materials section contains the compositions of all ten sets considered. The active compounds are the same (382), but the inactive ones were selected from the entire set (2234 inactive substances) randomly (also 382 substances each time). For each split, the compounds included in the active training set (A), passive training set (P), calibration set (K), and validation set (V) are indicated.

Author Contributions

Conceptualization, A.A.T., A.P.T., A.R. and E.B.; data curation, A.A.T., A.P.T., A.R. and E.B.; writing—original draft preparation, A.A.T., A.P.T., A.R. and E.B.; writing—review and editing, A.A.T., A.P.T., A.R. and E.B.; supervision, A.R. and E.B.; project administration, E.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available in the article or its Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ma, X.; Fu, X.; Wang, T.; Zhuo, L.; Zou, Q. GraphADT: Empowering interpretable predictions of acute dermal toxicity with multi-view graph pooling and structure remapping. Bioinformatics 2024, 40, btae438.
  2. Lou, S.; Yu, Z.; Huang, Z.; Wang, H.; Pan, F.; Li, W.; Liu, G.; Tang, Y. In silico prediction of chemical acute dermal toxicity using explainable machine learning methods. Chem. Res. Toxicol. 2024, 37, 513–524.
  3. Belsito, D.; Bickers, D.; Bruze, M.; Calow, P.; Greim, H.; Hanifin, J.M.; Rogers, A.E.; Saurat, J.H.; Sipes, I.G.; Tagami, H. A toxicologic and dermatologic assessment of cyclic and non-cyclic terpene alcohols when used as fragrance ingredients. Food Chem. Toxicol. 2008, 46 (Suppl. S11), S1–S71.
  4. Kumar, S.; Nikam, Y.P.; Baruah, S.; Kushari, S.; Ghose, S.; Prasad, S.K.; Das, A.; Banu, Z.W.; Kalita, J.; Laloo, D. Safety profile assessment of standardized root extract of Potentilla fulgens in Wistar rats: Acute and sub-acute dermal toxicity study. J. Appl. Pharm. Sci. 2024, 14, 138–147.
  5. Dattaray, D.; Roy, P.; Chakraborty, J.; Mandal, T.K. Evaluation of acute and subacute dermal toxicity of antibacterial bioactive glass-infused surgical cotton gauze in Wistar rats. Drug Chem. Toxicol. 2024, 47, 1–12.
  6. Aswathy, A.A.; Shibin, F.P.; Maya, M.; Sasidharan, S.; Radha, R.K. Acute dermal toxicity study of flower essential oil from Etlingera fenzlii (Kurz.) K. Schum. in Wistar albino rats. Res. J. Biotechnol. 2023, 18, 126–130.
  7. Pengiran, H.; Kamaldin, J.B.; Leo, B.F.; Sabri, N. Evaluating the acute dermal toxicity and skin irritation of the temephos impregnated cellulose nanofiber intended for mosquito larvicide. J. Health Transl. Med. 2023, 1, 194–201.
  8. van der Kamp, S.; Elliott, C. Increasing confidence in waiving dermal toxicity studies: A comparison of oral and dermal acute data with alternative approaches for agrochemicals and products. Regul. Toxicol. Pharmacol. 2021, 121, 104865.
  9. Friday, A.; Abalaka, M.E.; Halimat, A.; Builders, P.F. Gas chromatography-mass spectrometry analysis, druggability and in-silico dermatopharmacokinetics screening of mitracarpus scaber extract. J. Phytomedicine Ther. 2024, 23, 1607–1618.
  10. Christian Fischer, B.; Harrison Foil, D.; Kadic, A.; Kneuer, C.; König, J.; Herrmann, K. Pobody’s Nerfect: (Q)SAR works well for predicting bacterial mutagenicity of pesticides and their metabolites, but predictions for clastogenicity in vitro have room for improvement. Comput. Toxicol. 2024, 30, 100318.
  11. Pawar, G.; Madden, J.C.; Ebbrell, D.; Firman, J.W.; Cronin, M.T.D. In silico toxicology data resources to support read-across and (Q)SAR. Front. Pharmacol. 2019, 10, 561.
  12. Weininger, D. Smiles. 3. Depict. Graphical depiction of chemical structures. J. Chem. Inf. Comput. Sci. 1990, 30, 237–243.
  13. Toropova, A.P.; Toropov, A.A. CORAL: Binary classifications (active/inactive) for drug-induced liver injury. Toxicol. Lett. 2017, 268, 51–57.
  14. Iovine, N.; Toropova, A.P.; Toropov, A.A.; Roncaglioni, A.; Benfenati, E. Models for the No-Observed-Effect Concentration (NOEC) and maximal half-effective concentration (EC50). Toxics 2024, 12, 425.
  15. Borba, J.V.B.; Alves, V.M.; Braga, R.C.; Korn, D.R.; Overdahl, K.; Silva, A.C.; Hall, S.U.S.; Overdahl, E.; Kleinstreuer, N.; Strickland, J.; et al. STopTox: An in silico alternative to animal testing for acute systemic and topical toxicity. Environ. Health Perspect. 2022, 130, 027012–1.
  16. Box, G.E.P. Science and statistics. J. Am. Stat. Assoc. 1976, 71, 791–799.
  17. Toropova, A.P.; Toropov, A.A.; Benfenati, E.; Gini, G. Co-evolutions of correlations for QSAR of toxicity of organometallic and inorganic substances: An unexpected good prediction based on a model that seems untrustworthy. Chemom. Intell. Lab. Syst. 2011, 105, 215–219.
  18. Ahmadi, S.; Lotfi, S.; Hamzehali, H.; Kumar, P. A simple and reliable QSPR model for prediction of chromatography retention indices of volatile organic compounds in peppers. RSC Adv. 2024, 14, 3186–3201.
  19. Soleymani, N.; Ahmadi, S.; Shiri, F.; Almasirad, A. QSAR and molecular docking studies of isatin and indole derivatives as SARS 3CLpro inhibitors. BMC Chem. 2023, 17, 32.
  20. Rani, P.; Chahal, S.; Priyanka; Kumar, P.; Singh, D.; Sindhu, J. Structural attributes driving λmax towards NIR region: A QSPR approach. Chemom. Intell. Lab. Syst. 2024, 252, 105199.
  21. Kumar, P.; Kumar, A.; Sindhu, J.; Lal, S. Quasi-SMILES as a basis for the development of QSPR models to predict the CO2 capture capacity of deep eutectic solvents using correlation intensity index and consensus modelling. Fuel 2023, 345, 128237. [Google Scholar] [CrossRef]
  22. Tajiani, F.; Ahmadi, S.; Lotfi, S.; Kumar, P.; Almasirad, A. In-silico activity prediction and docking studies of some flavonol derivatives as anti-prostate cancer agents based on Monte Carlo optimization. BMC Chem. 2023, 17, 87. [Google Scholar] [CrossRef]
  23. Goyal, S.; Rani, P.; Chahar, M.; Hussain, K.; Kumar, P.; Sindhu, J. Analysis of good and bad fingerprint for identification of NIR based optical frameworks using Monte Carlo method. Microchem. J. 2024, 196, 109549. [Google Scholar] [CrossRef]
  24. Vukomanović, P.; Stefanović, M.; Stevanović, J.M.; Petrić, A.; Trenkić, M.; Andrejević, L.; Lazarević, M.; Sokolović, D.; Veselinović, A.M. Monte Carlo optimization method based QSAR modeling of placental barrier permeability. Pharm. Res. 2024, 41, 493–500. [Google Scholar] [CrossRef]
  25. Bagri, K.; Kapoor, A.; Kumar, P.; Kumar, A. Hybrid descriptors–conjoint indices: A case study on imidazole-thiourea containing glutaminyl cyclase inhibitors for design of novel anti-Alzheimer’s candidates. SAR QSAR Environ. Res. 2023, 34, 361–381. [Google Scholar] [CrossRef]
  26. Fuadah, Y.N.; Pramudito, M.A.; Firdaus, L.; Vanheusden, F.J.; Lim, K.M. QSAR Classification modeling using machine learning with a consensus-based approach for multivariate chemical hazard end points. ACS Omega 2024, 9, 50796–50808. [Google Scholar] [CrossRef]
  27. Luechtefeld, T.; Marsh, D.; Rowlands, C.; Hartung, T. Machine learning of toxicological big data enables read-across structure activity relationships (RASAR) outperforming animal test reproducibility. Toxicol. Sci. 2018, 165, 198–212. [Google Scholar] [CrossRef]
  28. Bernard, A. Chlorination products: Emerging links with allergic diseases. Curr. Med. Chem. 2007, 14, 1771–1782. [Google Scholar] [CrossRef]
  29. Fan, J.; Song, W.; Wang, Y.; Li, S.; Zhang, C.; Wang, X.; Yang, X. An in-depth review of the dermal toxicity of T-2 toxin: Clinical symptoms, injury mechanisms, and treatment approach. Food Chem. Toxicol. 2024, 193, 114986. [Google Scholar] [CrossRef]
  30. Larnac, E.; Montoni, A.; Haydont, V.; Marrot, L.; Rochette, P.J. Lipid peroxidation as the mechanism underlying polycyclic aromatic hydrocarbons and sunlight synergistic toxicity in dermal fibroblasts. Int. J. Mol. Sci. 2024, 25, 1905. [Google Scholar] [CrossRef]
  31. Wu, Z.; Li, J.; Ma, P.; Li, B.; Xu, Y. Long-term dermal exposure to diisononyl phthalate exacerbates atopic dermatitis through oxidative stress in an FITC-induced mouse model. Front. Biol. 2015, 10, 537–545. [Google Scholar] [CrossRef]
  32. Liang, Y.; Zhang, H.; Cai, Z. New insights into the cellular mechanism of triclosan-induced dermal toxicity from a combined metabolomic and lipidomic approach. Sci. Total Environ. 2021, 757, 143976. [Google Scholar] [CrossRef]
Figure 1. Comparison of the traditional correlation and the semi-correlation.
Figure 2. Graphical representation of applying semi-correlation in the classification model.
Figure 3. The comparison of histories of the Monte Carlo optimizations for the case of target functions TF0 and TF1.
Table 1. The statistical characteristics of models observed for ten splits in the case of the target function TF0.
Split  Set *  Sens    Spec    Acc     MCC     TN   TP   FP   FN   All
1      A      0.6526  0.6875  0.6702  0.3404  62   66   30   33   191
1      P      0.6744  0.7944  0.7409  0.4730  58   85   22   28   193
1      C      0.6630  0.8061  0.7368  0.4749  61   79   19   31   190
1      V      0.6239  0.7407  0.6737  0.3613  68   60   21   41   190
2      A      0.8229  0.7604  0.7917  0.5845  79   73   23   17   192
2      P      0.7755  0.7957  0.7853  0.5710  76   74   19   22   191
2      C      0.7727  0.7864  0.7801  0.5583  68   81   22   20   191
2      V      0.7900  0.7333  0.7632  0.5245  79   66   24   21   190
3      A      0.7579  0.8144  0.7865  0.5734  72   79   18   23   192
3      P      0.7364  0.7317  0.7344  0.4643  81   60   22   29   192
3      C      0.7340  0.6875  0.7105  0.4219  69   66   30   25   190
3      V      0.7711  0.8224  0.8000  0.5935  64   88   19   19   190
4      A      0.7708  0.8105  0.7906  0.5817  74   77   18   22   191
4      P      0.6346  0.8023  0.7105  0.4385  66   69   17   38   190
4      C      0.6907  0.6915  0.6911  0.3822  67   65   29   30   191
4      V      0.6941  0.7103  0.7031  0.4025  59   76   31   26   192
5      A      0.8125  0.7660  0.7895  0.5792  78   72   22   18   190
5      P      0.7157  0.8778  0.7917  0.5970  73   79   11   29   192
5      C      0.5275  0.7374  0.6368  0.2713  48   73   26   43   190
5      V      0.6452  0.8889  0.7708  0.5529  60   88   11   33   192
6      A      0.8000  0.7684  0.7842  0.5687  76   73   22   19   190
6      P      0.7976  0.7593  0.7760  0.5528  67   82   26   17   192
6      C      0.7667  0.7200  0.7421  0.4861  69   72   28   21   190
6      V      0.8053  0.6835  0.7552  0.4919  91   54   25   22   192
7      A      0.7604  0.7708  0.7656  0.5313  73   74   22   23   192
7      P      0.8023  0.6442  0.7158  0.4476  69   67   37   17   190
7      C      0.6374  0.4400  0.5340  0.0788  58   44   56   33   191
7      V      0.6055  0.5854  0.5969  0.1892  66   48   34   43   191
8      A      0.7474  0.7732  0.7604  0.5208  71   75   22   24   192
8      P      0.7473  0.7778  0.7632  0.5253  68   77   22   23   190
8      C      0.5729  0.7684  0.6702  0.3479  55   73   22   41   191
8      V      0.6800  0.7582  0.7173  0.4385  68   69   22   32   191
9      A      0.7128  0.7188  0.7158  0.4315  67   69   27   27   190
9      P      0.7723  0.7419  0.7577  0.5145  78   69   24   23   194
9      C      0.5612  0.5217  0.5421  0.0830  55   48   44   43   190
9      V      0.5281  0.6238  0.5789  0.1524  47   63   38   42   190
10     A      0.8105  0.8387  0.8245  0.6493  77   78   15   18   188
10     P      0.8481  0.8716  0.8617  0.7173  67   95   14   12   188
10     C      0.6800  0.7553  0.7165  0.4359  68   71   23   32   194
10     V      0.7130  0.7558  0.7320  0.4658  77   65   21   31   194
* A = active training set; P = passive training set; C = calibration set; V = validation set; TN = true negative; TP = true positive; FN = false negative; FP = false positive; All = the total number of compounds in a set.
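The statistical characteristics in Table 1 follow from the four confusion-matrix counts in each row. The sketch below is a minimal Python illustration using the standard definitions of sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC); the function name `classification_metrics` is hypothetical and not part of CORAL. The tabulated Sens and Spec for split 1, set A (0.6526 and 0.6875) are reproduced when the first count (62) is read as the true positives and the second (66) as the true negatives, so that assignment is assumed here.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy, MCC) from confusion-matrix counts."""
    sens = tp / (tp + fn)                      # fraction of actives recognised
    spec = tn / (tn + fp)                      # fraction of inactives recognised
    acc = (tp + tn) / (tp + tn + fp + fn)      # overall fraction of correct calls
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, acc, mcc

# Split 1, active training set (A) of Table 1.
# Assumption: the count 62 is taken as TP and 66 as TN (see the text above).
sens, spec, acc, mcc = classification_metrics(tp=62, tn=66, fp=30, fn=33)
print(f"{sens:.4f} {spec:.4f} {acc:.4f} {mcc:.4f}")
```

MCC is unchanged when the roles of the two classes are swapped, so only the sensitivity and specificity depend on the assumed assignment of the counts.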
Table 2. The statistical characteristics of models observed for ten splits in the case of optimization with target function TF1.
Split  Set *  Sens    Spec    Acc     MCC     TN   TP   FP   FN   All
1      A      0.6421  0.6458  0.6440  0.2879  61   62   34   34   191
1      P      0.6279  0.6542  0.6425  0.2809  54   70   37   32   193
1      C      0.9239  0.8776  0.9000  0.8012  85   86   12   7    190
1      V      0.8349  0.8765  0.8526  0.7050  91   71   10   18   190
2      A      0.6875  0.6875  0.6875  0.3750  66   66   30   30   192
2      P      0.7245  0.7634  0.7435  0.4879  71   71   22   27   191
2      C      0.9205  0.8738  0.8953  0.7919  81   90   13   7    191
2      V      0.9000  0.8000  0.8526  0.7057  90   72   18   10   190
3      A      0.6211  0.7320  0.6771  0.3553  59   71   26   36   192
3      P      0.6273  0.7073  0.6615  0.3312  69   58   24   41   192
3      C      0.9149  0.8646  0.8895  0.7801  86   83   13   8    190
3      V      0.9277  0.8411  0.8789  0.7627  77   90   17   6    190
4      A      0.6771  0.7263  0.7016  0.4038  65   69   26   31   191
4      P      0.5769  0.7907  0.6737  0.3720  60   68   18   44   190
4      C      0.9072  0.8511  0.8796  0.7599  88   80   14   9    191
4      V      0.8706  0.9065  0.8906  0.7781  74   97   10   11   192
5      A      0.7292  0.7447  0.7368  0.4738  70   70   24   26   190
5      P      0.6569  0.7778  0.7135  0.4357  67   70   20   35   192
5      C      0.7692  0.7677  0.7684  0.5366  70   76   23   21   190
5      V      0.8495  0.8889  0.8698  0.7394  79   88   11   14   192
6      A      0.7053  0.7368  0.7211  0.4423  67   70   25   28   190
6      P      0.6905  0.7222  0.7083  0.4109  58   78   30   26   192
6      C      0.8889  0.8900  0.8895  0.7785  80   89   11   10   190
6      V      0.9115  0.8734  0.8958  0.7849  103  69   10   10   192
7      A      0.6354  0.6354  0.6354  0.2708  61   61   35   35   192
7      P      0.6395  0.6731  0.6579  0.3118  55   70   34   31   190
7      C      0.9011  0.8900  0.8953  0.7905  82   89   11   9    191
7      V      0.8624  0.9024  0.8796  0.7589  94   74   8    15   191
8      A      0.6000  0.7320  0.6667  0.3350  57   71   26   38   192
8      P      0.7473  0.7071  0.7263  0.4540  68   70   29   23   190
8      C      0.9063  0.8842  0.8953  0.7907  87   84   11   9    191
8      V      0.9500  0.8791  0.9162  0.8333  95   80   11   5    191
9      A      0.7021  0.6771  0.6895  0.3793  66   65   31   28   190
9      P      0.7426  0.6882  0.7165  0.4315  75   64   29   26   194
9      C      0.9388  0.7935  0.8684  0.7425  92   73   19   6    190
9      V      0.8427  0.8614  0.8526  0.7041  75   87   14   14   190
10     A      0.6632  0.7097  0.6862  0.3732  63   66   27   32   188
10     P      0.7468  0.6789  0.7074  0.4203  59   74   35   20   188
10     C      0.8600  0.7979  0.8299  0.6598  86   75   19   14   194
10     V      0.8796  0.8953  0.8866  0.7720  95   77   9    13   194
* A = active training set; P = passive training set; C = calibration set; V = validation set; TN = true negative; TP = true positive; FN = false negative; FP = false positive; All = the total number of compounds in a set.
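To compare the two target functions on the same footing, the validation-set MCC values of Tables 1 and 2 can be set side by side. The following is a minimal sketch, not a procedure taken from the paper: the variable names are illustrative, and the values are copied from the V rows of the two tables for splits 1 to 10.

```python
from statistics import mean

# Validation-set (V) MCC per split, copied from Table 1 (TF0) and Table 2 (TF1)
mcc_tf0 = [0.3613, 0.5245, 0.5935, 0.4025, 0.5529, 0.4919, 0.1892, 0.4385, 0.1524, 0.4658]
mcc_tf1 = [0.7050, 0.7057, 0.7627, 0.7781, 0.7394, 0.7849, 0.7589, 0.8333, 0.7041, 0.7720]

# Number of splits in which TF1 gives the higher validation MCC
wins_tf1 = sum(b > a for a, b in zip(mcc_tf0, mcc_tf1))

print(f"Mean validation MCC, TF0: {mean(mcc_tf0):.2f}")
print(f"Mean validation MCC, TF1: {mean(mcc_tf1):.2f}")
print(f"Splits where TF1 is higher: {wins_tf1} of {len(mcc_tf0)}")
```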
Table 3. A collection of promoters for the increase or decrease of dermal toxicity observed in five processes of the Monte Carlo optimization for split 1.
SMILES Attribute  CWs Probe 1  CWs Probe 2  CWs Probe 3  CWs Probe 4  CWs Probe 5  NA *  NP   NC   dk (Equation (16))
C...C...(...      0.4282       1.1484       0.6594       0.7111       0.8434       89    88   95   0.0003
c...1...c...      0.3818       1.6982       0.0302       0.3192       1.1050       87    81   89   0.0004
C...(...=...      0.3031       0.2886       0.9226       0.1488       0.1215       74    78   74   0.0001
1...c...(...      0.6079       0.2321       0.0176       0.1939       0.4596       50    55   45   0.0006
Cl..........      0.4025       1.4493       1.2160       0.3778       0.4965       44    34   44   0.0009
O...C...C...      0.8311       0.1321       0.3925       0.5969       0.5581       43    37   63   0.0020
c...(...O...      0.8902       0.7246       0.9960       1.0199       0.6417       43    36   35   0.0007
N...C.......      1.6906       2.3710       1.4314       1.4542       0.9820       41    62   47   0.0014
(...Cl..(...      0.5734       0.1407       1.4143       0.4497       0.2051       35    25   36   0.0012
C...(...(...      0.6726       1.1731       1.2948       0.8587       1.4233       33    28   22   0.0014
N...(...C...      1.5790       0.6892       0.4159       1.6733       1.1905       31    35   27   0.0008
c...O.......      0.9687       0.8699       0.9720       1.4256       1.0048       27    20   34   0.0019
=...C...(...      0.9270       0.6279       0.6237       0.7499       1.2820       21    39   23   0.0022
C...#.......      0.9110       0.7571       0.8295       0.8373       0.6587       19    15   13   0.0013
C...C...=...      1.7071       1.8631       0.7322       0.8255       1.1005       19    24   16   0.0014
C...........      −0.2116      −0.5459      −0.4764      −0.4084      −0.1354      183   186  184  0.0000
(...........      −0.1254      −0.7692      −0.3606      −0.2885      −0.3416      164   165  173  0.0002
=...........      −0.6579      −0.1147      −0.3085      −0.2914      −0.2637      111   129  139  0.0008
N...(.......      −0.9757      −0.4264      −0.3162      −0.6883      −0.5884      51    57   52   0.0004
Cl..(.......      −0.3343      −0.2207      −0.8490      −0.5057      −0.1271      36    30   41   0.0011
S...........      −0.8896      −0.6718      −1.0932      −1.1870      −0.0093      28    32   52   0.0023
(...c...(...      −0.7161      −0.3445      −0.7610      −0.8026      −0.9100      18    17   25   0.0014
N...C...(...      −1.4065      −1.3928      −0.4576      −0.8499      −0.6631      18    25   17   0.0013
n...........      −0.5006      −0.4051      −0.5518      −0.2254      −0.2566      18    21   18   0.0005
N...#.......      −1.6835      −1.7643      −1.1689      −1.7475      −1.4276      15    12   10   0.0014
n...c.......      −0.9213      −0.1947      −0.6151      −1.1517      −0.2367      15    16   15   0.0002
c...1...(...      −1.0012      −2.0611      −2.2902      −1.1616      −1.6373      13    21   14   0.0017
=...(...(...      −0.2949      −0.2031      −0.5799      −0.8272      −0.2128      10    17   23   0.0027
S...=.......      −1.0649      −0.1770      −0.5255      −1.5191      −2.6799      10    11   26   0.0036
n...1.......      −1.2993      −0.6785      −0.5283      −0.8285      −0.8261      9     14   9    0.0016
* NA, NP, and NC are the frequencies of SMILES-attributes in the active training, passive training, and calibration sets, respectively; dk is the statistical defect of SMILES-attribute calculated using Equation (16).
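Every attribute in Table 3 keeps the same sign of its correlation weight (CW) in all five optimization runs. The sketch below illustrates that sign rule under a stated assumption: a CW that is positive in every run marks a promoter of increase, and a CW that is negative in every run marks a promoter of decrease. The function name `promoter_type` and the label `"undefined"` for mixed signs are illustrative and not taken from CORAL.

```python
def promoter_type(cws):
    """Classify a SMILES attribute from its correlation weights over several runs."""
    if all(cw > 0 for cw in cws):
        return "increase"   # positive in every run
    if all(cw < 0 for cw in cws):
        return "decrease"   # negative in every run
    return "undefined"      # mixed signs: no stable role (assumed label)

# Two rows copied from Table 3 (CWs for probes 1 to 5)
table3_rows = {
    "N...C.......": [1.6906, 2.3710, 1.4314, 1.4542, 0.9820],
    "N...#.......": [-1.6835, -1.7643, -1.1689, -1.7475, -1.4276],
}
for attribute, cws in table3_rows.items():
    print(attribute, promoter_type(cws))
```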
Table 4. The statistical quality of models for acute dermal toxicity.
Sensitivity   Specificity   Reference
0.74          0.78          [15]
0.71          0.81          [26]
0.89          0.94          [27]
0.88 ± 0.04   0.87 ± 0.03   Average and dispersion over the validation sets of the models suggested here
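The last row of Table 4 can be checked against the validation-set (V) rows of Table 2. The sketch below is a minimal Python check, assuming that "dispersion" means the standard deviation over the ten splits; the sample standard deviation is used here, and the population version rounds to the same two decimals for these data.

```python
from statistics import mean, stdev

# Validation-set (V) sensitivity and specificity for splits 1 to 10, copied from Table 2
sens_v = [0.8349, 0.9000, 0.9277, 0.8706, 0.8495, 0.9115, 0.8624, 0.9500, 0.8427, 0.8796]
spec_v = [0.8765, 0.8000, 0.8411, 0.9065, 0.8889, 0.8734, 0.9024, 0.8791, 0.8614, 0.8953]

# Assumption: "dispersion" is read as the sample standard deviation
print(f"Sensitivity: {mean(sens_v):.2f} ± {stdev(sens_v):.2f}")
print(f"Specificity: {mean(spec_v):.2f} ± {stdev(spec_v):.2f}")
```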
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

