Article

Classification of Congeneric and QSAR of Homologous Antileukemic S–Alkylcysteine Ketones

1 Centro de Investigación Traslacional San Alberto Magno (CITSAM), Universidad Católica de Valencia San Vicente Mártir, Guillem de Castro-94, E-46001 València, Spain
2 Escuela de Doctorado, Universidad Católica de Valencia San Vicente Mártir, E-46008 València, Spain
3 Institut Universitari de Ciència Molecular, Universitat de València, Edifici d’Instituts de Paterna, P. O. Box 22085, E-46071 València, Spain
* Authors to whom correspondence should be addressed.
Academic Editor: Alla P. Toropova
Molecules 2021, 26(1), 235; https://doi.org/10.3390/molecules26010235
Received: 4 November 2020 / Revised: 30 December 2020 / Accepted: 31 December 2020 / Published: 5 January 2021
(This article belongs to the Special Issue QSAR and QSPR: Recent Developments and Applications II)

Abstract

Based on a set of six vector properties, the partial correlation diagram is calculated for a set of 28 S-alkylcysteine diazomethyl- and chloromethyl-ketone derivatives. Derivatives with the greatest antileukemic activity in the same class correspond to high partial correlations. A periodic classification is performed based on information entropy: the first four features denote the group, and the last two indicate the period. Compounds in the same period and, especially, in the same group present similar properties. The most active substances are situated at the bottom right of the table. Nine classes are distinguished. The principal component analysis of the homologous compounds shows five subclasses included in the periodic classification. Linear fits of both antileukemic activities and stability are good, and they agree with the principal component analysis: the variables that appear in the models are those that show positive loading in the principal component analysis. The most important properties to explain the antileukemic activities (the 50% inhibitory concentration against Molt-3 T-lineage acute lymphoblastic leukemia, the negative logarithm of the 50% inhibitory concentration against Nalm-6 B-lineage acute lymphoblastic leukemia, and stability k) are ACD logD, surface tension and the number of violations of Lipinski’s rule of five. After leave-m-out cross-validation, the most predictive model for cysteine diazomethyl- and chloromethyl-ketone derivatives is identified.
Keywords: partial correlation diagram; periodic classification; information entropy; principal component analysis

1. Introduction

Nowadays, cancer is one of the most widespread diseases. It appears in different tissues and cells, and its causes include a wide variety of carcinogens, both endogenous and exogenous. Breast, lung and colon cancers are the most common in developed countries. The global burden of cancer continues to increase, largely because of the aging and growth of the world population, together with habits and behaviors that continuously expose us to carcinogens. Governments invest in preventive and informative public health campaigns. The best known is the campaign against smoking, but there are many others, such as those for the prevention of breast and colon cancers [1]. Owing to the rise in cancer, the search for anticancer drugs remains a subject of study for many researchers. Most S-alkylcysteine diazomethyl- and chloromethyl-ketone derivatives have been shown to have anticancer action against acute lymphoblastic leukemia (ALL), and they have been tested successfully [2,3,4,5,6,7]. The structures of these compounds are closely related to that of the amino acid cysteine (Cys).
Treatment of N-methoxycarbonyl C-carboxylate ester derivatives of S-methyl-L-cysteine with chloroperoxidase/hydrogen peroxide oxidized the sulfur to give the (RS) sulfoxide in moderate to high diastereomeric excess [8]. The (SS) natural product sulfoxide chondrine was obtained via biotransformation of the N-tert-butyloxycarbonyl (Boc) derivative of L-4-S-morpholine-2-carboxylic acid using Beauveria bassiana or B. caledonica. The nucleophilic amino acid residues most widely employed for the chemical modification of peptides are lysine and cysteine. Cysteine is modified via its thiol side chain. This side chain is more strongly nucleophilic than a primary amine such as that of lysine, which is protonated at pH values below 9.0. Therefore, cysteine can react faster than lysine, so a key amino acid can be modified selectively over other residues. One possible synthetic route is the S-alkylation reaction. In this regard, post-translational modifications of this amino acid are essential for the biological function of many proteins. In particular, numerous signaling proteins are post-translationally lipidated on a cysteine residue. Since this lipidation is essential for the correct localization and function of these proteins, the enzymes responsible for the covalent introduction/removal of lipid moieties have been considered interesting targets for blocking aberrant signaling processes [9].
In earlier publications, our research group reported quantitative structure–activity relationships (QSAR) of sesquiterpene lactones (STLs) with potential antileukemic activity. The aim was to predict inhibitors of Myb-induced gene expression and their mechanisms of action [10,11]. Moreover, molecular classifications of some series of phenolic compounds [12,13,14,15], triterpenoids and steroids [16] by information entropy were reported and related to their antioxidant activity. In the present report, 28 S-alkylcysteine diazomethyl- and chloromethyl-ketone derivatives were classified using this information entropy-based algorithm. The rationale for the classification is threefold:
- the dodecyl derivative (12) is an exceptionally active compound against leukemia cells;
- the length of the alkyl chain has a profound effect on the antileukemic potency of the homologous series;
- the congeneric series may be useful for treating patients with therapy-refractory or relapsed leukemia.

Thus, we want to test whether different moieties in the congeneric series correspond to the same potency. The objective of this study was to predict the antileukemic activity of these compounds from their molecular structures. In addition, a QSAR study and a principal component analysis (PCA) related the antileukemic activity of a homologous series of S-alkylcysteine chloromethyl-ketone derivatives to their physical and chemical properties.

2. Results and Discussion

Figure 1 shows the basic structure of cysteine diazomethyl- and chloromethyl-ketone derivatives.
Table 1 lists the vector of properties of cysteine diazomethyl- and chloromethyl-ketone derivatives and experimental data of antileukemic activity (IC50) and stability k.

2.1. GraphCor Partial Correlation Diagram

The matrix of Pearson correlation coefficients was calculated between each pair of the 28 cysteine diazomethyl- and chloromethyl-ketone derivatives, using their vector properties <i1,i2,i3,i4,i5,i6>. These intercorrelations are used to build the partial correlation diagram. The diagram distinguishes high (|r| ≥ 0.75), medium (0.50 ≤ |r| < 0.75), low (0.25 ≤ |r| < 0.50) and zero (|r| < 0.25) partial correlations. Pairs of compounds with high partial correlation have similar vector properties. With the Equipartition Conjecture of Entropy Production, the partial correlation matrix (cf. Figure 2) contains 187 high, 44 medium, 116 low and 31 zero partial correlations. In other words, nearly half of the 378 (28 × 27/2) compound pairs are highly correlated. Red lines, representing high partial correlations, link the cysteine derivatives with the greatest antileukemic activity. This is because the most active compounds (11 and 12) are taken as reference molecules with vector properties <111111>. The antileukemic activities are expressed as IC50.
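The banding of compound pairs described above can be illustrated with a minimal Python sketch. It assumes that each compound is represented only by its binary vector <i1,…,i6>. The sketch computes the Pearson coefficient for every pair and assigns it to the high/medium/low/zero bands, using |r| as in the GraphCor color scheme of Section 3.2. The vectors in the example are hypothetical, not the data of Table 1. A constant vector such as <111111> has zero variance, so its Pearson coefficient is mathematically undefined. How GraphCor handles this case is not stated here, so the sketch simply reports such pairs as "undefined".

```python
import itertools
import math


def pearson(u, v):
    """Pearson correlation of two equal-length vectors.

    Returns None when either vector is constant (e.g. <111111>),
    because the coefficient is then undefined (zero variance).
    """
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0 or svv == 0:
        return None
    return suv / math.sqrt(suu * svv)


def band(r):
    """Map a correlation to the bands of the partial correlation diagram."""
    a = abs(r)
    if a >= 0.75:
        return "high"    # drawn in red
    if a >= 0.50:
        return "medium"  # orange
    if a >= 0.25:
        return "low"     # yellow
    return "zero"        # black


def count_bands(vectors):
    """Count compound pairs per band; vectors maps compound id -> property vector."""
    counts = {"high": 0, "medium": 0, "low": 0, "zero": 0, "undefined": 0}
    for (_, u), (_, v) in itertools.combinations(vectors.items(), 2):
        r = pearson(u, v)
        counts["undefined" if r is None else band(r)] += 1
    return counts


# Hypothetical vectors, for illustration only (not the data of Table 1).
example = {
    "A": (1, 1, 0, 1, 0, 1),
    "B": (1, 1, 0, 1, 0, 0),
    "C": (0, 0, 1, 1, 0, 1),
    "D": (1, 1, 0, 1, 0, 1),
    "E": (1, 1, 1, 1, 1, 1),  # constant vector: r undefined with any partner
}
print(count_bands(example))  # {'high': 1, 'medium': 2, 'low': 1, 'zero': 2, 'undefined': 4}
print(math.comb(28, 2))      # 378 pairs for 28 derivatives
```

For the 28 derivatives, the four reported counts (187 + 44 + 116 + 31) add up to exactly these 378 pairs.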

2.2. MolClas Molecular Classification Based on the Equipartition Conjecture of Entropy Production

The grouping rule is as follows: two molecules are assigned to the same class if rij ≥ b, where b is the classification level. A comparative analysis of the molecular dataset was performed with the information entropy method for antileukemic activity [17]. It runs from 28 classes (each compound in its own class) to one class (containing all compounds), using the vector <i1,i2,i3,i4,i5,i6> and classification at level b (Cb). The results are summarized in Table 2.
With equal weights ak = 0.5 and classification level 0.94 ≤ b ≤ 0.96, the grouping rule yields nine classes (Classes 1–9, cf. Table 3).
This partition has an associated entropy h(Rb) = 38.32. It is the classification closest to the cut-off point of the entropy vs. classification level curve with its trend line (cf. Figure 3).
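A minimal Python sketch of the grouping rule at a given level b follows. It assumes equal weights ak = 0.5 in the similarity index defined in Section 3.1. As a further assumption, it closes the relation rij ≥ b transitively (single linkage), so that the classes form a partition. The text does not state whether the classes of Table 3 were obtained with exactly this closure. The four vectors are hypothetical.

```python
def similarity(u, v, a=0.5):
    """r_ij = sum of a**k over the features k where u and v agree (equal weights a_k = a)."""
    return sum(a ** k for k, (x, y) in enumerate(zip(u, v), start=1) if x == y)


def classes_at_level(vectors, b):
    """Group compounds with r_ij >= b, closing the relation transitively
    (single linkage, implemented with a small union-find)."""
    names = list(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, p in enumerate(names):
        for q in names[i + 1:]:
            if similarity(vectors[p], vectors[q]) >= b:
                parent[find(q)] = find(p)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


# Hypothetical property vectors (illustration only)
example = {
    "A": (1, 1, 1, 1, 1, 1),
    "B": (1, 1, 1, 1, 1, 1),
    "C": (1, 1, 1, 1, 0, 1),
    "D": (0, 1, 1, 0, 0, 1),
}
print(classes_at_level(example, 0.94))  # [['A', 'B', 'C'], ['D']]
print(classes_at_level(example, 0.96))  # [['A', 'B'], ['C'], ['D']]
```

A and C agree on the first four features and on i6, which gives rij = 0.953125. They are therefore merged at b = 0.94 but not at b = 0.96.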
Table 2 shows a periodic classification obtained with a procedure based on information entropy theory (artificial intelligence). The first four features were taken to denote the group or column, and the last two features were used to indicate the period or row of the periodic table. Cysteine derivatives in the same group present similar properties, and compounds that also share the same period show maximum resemblance. In this report, the cysteine diazomethyl- and chloromethyl-ketone derivatives in the table are related to experimental antileukemic bioactivity data taken from the literature [2,3,4,5,6,7]. The antileukemic activity increases from left to right across a period and from top to bottom down a group. The chloromethyl-ketone derivatives with the greatest activity (Class 1, compounds 11, 12 and 24) are grouped into the same class. They are acetyl amides with a linear chain of either 11 or 12 carbons in R2. Chloromethyl-ketone derivatives with high activity (Classes 2–5) are clustered into other groups. Finally, the groups with the least antileukemic activity are the cysteine diazomethyl derivatives, located at the left side of the table (Classes 6–9). These results agree with Figure 2: pairs of compounds in the same class with similar vector properties <i1,i2,i3,i4,i5,i6> are joined by red lines representing high partial correlations. An example is the pair (11, 12), both with vector properties <111111> in Class 1.

2.3. Principal Component Analysis for Classification of the Most Antileukemic Bioactive Compounds

After obtaining the classification of the cysteine chloromethyl- and diazomethyl-ketone derivatives, a PCA PC1–PC2 scores plot was made (cf. Figure 4). It uses the properties of the highly active compounds that form a homologous series of chloromethyl-ketone derivatives with an acetyl group at R1 (compounds 3–12 and 16–18). Compounds 1 and 2 are inactive and are not included, because their stability k values have not been published. The following 18 properties were taken from the ChEMBL database and used for statistical assessment:
- full molecular weight (Full_mw, V1);
- ACD logP (V2);
- number of rotatable bonds (rtb, V3);
- heavy atoms (V4);
- number of carbons in R2 (V5);
- a_logP (V6);
- boiling point (V7);
- enthalpy of vaporization (V8);
- a different estimation of ACD/logP (V9);
- molar volume (V10);
- polarizability (V11);
- ACD logD (pH 7.4, V12);
- ACD/KOC (pH 7.4, V13);
- ACD/BCF (pH 7.4, V14);
- Qed_weighted (V15);
- number of violations of Lipinski’s rule of five (Ro5, V16);
- surface tension (V17);
- density (V18).

In addition, three experimental variables were taken from Uckun and coworkers [2,3,4,5,6,7]: IC50 (µM) against Nalm-6 B-lineage ALL (V19), IC50 (µM) against Molt-3 T-lineage ALL (V20) and stability k (h−1) in 0.01 M phosphate buffer at pH 7.5 (V21). Only one compound, 15 (hydrochloride), has a logD value different from its logP. The remaining compounds have no ionizable form, so logP ≈ logD for them.
PCA was applied to reduce the initial variables to a small number of principal components (PCs). The aim was to obtain an overview of the variation among the compounds and to identify behavioral patterns. Figure 4 shows the two-dimensional representation of the homologous series on the first two PCs, computed from all the variables considered. PC1 and PC2 together explain 95.9% of the variance.
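The following minimal, dependency-free Python sketch shows how an explained-variance figure is obtained. It runs PCA on standardized variables (i.e., on the correlation matrix) and uses power iteration with deflation to extract the leading eigenvalues. Standardizing is an assumption here: the text does not say whether the SPSS/Minitab analysis used the correlation or the covariance matrix. The toy data have three variables, two of them perfectly correlated, so PC1 and PC2 carry 2/3 and 1/3 of the variance.

```python
import math


def standardize(rows):
    """Centre each column and scale it to unit sample standard deviation.
    Columns with zero variance are not handled (division by zero)."""
    n = len(rows)
    scaled = []
    for col in zip(*rows):
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        scaled.append([(v - m) / sd for v in col])
    return [list(r) for r in zip(*scaled)]


def correlation_matrix(z):
    n, p = len(z), len(z[0])
    return [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]


def leading_eigenvalues(c, k, iters=500):
    """Power iteration with deflation for a symmetric positive semidefinite matrix."""
    p = len(c)
    c = [row[:] for row in c]
    values = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(p)]  # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(c[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(c[a][b] * v[b] for b in range(p)) for a in range(p))
        values.append(lam)
        # Remove the component just found before looking for the next one
        c = [[c[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return values


def explained_variance(rows, k=2):
    """Fraction of the total variance carried by the first k PCs (correlation-matrix PCA)."""
    c = correlation_matrix(standardize(rows))
    total = sum(c[i][i] for i in range(len(c)))
    return [lam / total for lam in leading_eigenvalues(c, k)]


toy = [[1, 2, 1], [2, 4, -1], [3, 6, -1], [4, 8, 1]]  # synthetic data
print([round(f, 4) for f in explained_variance(toy)])  # [0.6667, 0.3333]
```

A library routine such as an SVD-based PCA would normally be used instead. Power iteration is shown only because it keeps the sketch self-contained.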
The homologous series of cysteine chloromethyl-ketone derivatives with an acetyl group at R1 falls into five subclasses (Figure 4). This agrees with the clustering by information entropy and with the experimental data:
- Class 1 (compounds 11 and 12; PC1 > 0, PC2 < 0; bottom) includes the compounds with the greatest antileukemic activity and is characterized by 11 or 12 carbons in R2;
- Class 2A (compounds 6–10; PC1 < 0 in general, PC2 < 0; middle) is characterized by 6–10 carbons in R2;
- Class 2B (compounds 3–5; PC1 < 0, PC2 > 0; left) is characterized by 3–5 carbons in R2;
- Class 3A (compounds 16 and 17; PC1 > 0, PC2 > 0; right) is characterized by 14 or 15 carbons in R2;
- Class 3B (compound 18; PC1 > 0, PC2 >> 0; top) is characterized by 16 carbons in R2.

This scheme can be generalized to a larger Class 3 merging Classes 3A and 3B.
Figure 5 describes the behavior of the variables. The properties farthest from the origin (0.0, 0.0) are the most important for describing the PCs, and those closest to the origin are the least important.
On the one hand, PC1 (87.6% of the total variance) shows positive loading mainly with acd_logp, rtb, full_mwt, alogp, num_lipinski_ro5_violations, ACD/KOC and ACD/BCF. It shows negative loading with surface_tension, Qed_weighted, density and stability k. On the other hand, PC2 (8.3% of the total variance) shows positive loading mainly with ACD/BCF, ACD/KOC, num_lipinski_ro5_violations and k. The remaining variables lie near the origin and are less important for PC2.
The two compounds with important antileukemic activity and stability (11 and 12) are associated with positive loadings of the number of violations of Lipinski’s Ro5, ACD/KOC and ACD/BCF. They are also associated with negative loadings of surface tension, density and stability k. The remaining variables lie near the origin and are less important for antileukemic activity.
A multiple linear regression approach was adopted to determine the quantitative importance of the combined presence of some of the 18 properties taken from the ChEMBL database (cf. Supplementary Material Table S1). These properties were used to explain the following antileukemic activities:
- IC50 Molt-3 T-lineage ALL (a higher value means lower antileukemic activity);
- pIC50 Nalm-6 B-lineage ALL (a higher value means higher antileukemic activity);
- stability k (a higher value means higher antileukemic activity).

The fits were assessed with the correlation coefficient r, the standard deviation s and Fisher’s ratio F. The models for the homologous series are as follows:
$$\mathrm{IC}_{50}\,\text{Molt-3 T-lineage ALL} = (653 \pm 126) + (7.55 \pm 1.29)\,\text{ACD log}D + (16.61 \pm 3.19)\,\text{surface\_tension} \tag{1}$$
N = 13, r = 0.895, s = 2.096, F = 20.1, q = 0.764
For Nalm-6 B-lineage ALL, pIC50 values were used, because the transformation p = −log smooths the data and provides a better correlation:
$$\mathrm{pIC}_{50}\,\text{Nalm-6 B-lineage ALL} = (187.3 \pm 100.5) - (0.934 \pm 0.427)\,\text{Ro5} - (3.51 \pm 1.79)\,\text{surface\_tension} \tag{2}$$
N = 13, r = 0.821, s = 0.270, F = 2.9, q = 0.424
$$k = (0.0532 \pm 0.0063) - (0.00279 \pm 0.00120)\,\text{ACD log}D \tag{3}$$
N = 13, r = 0.573, s = 0.009, F = 5.4, q = 0.286
In Equation (1), replacing the dependent variable IC50_Molt-3_T-lineage_ALL by pIC50 does not improve the correlation. The same holds in Equation (3) for replacing k by log k. All the results agree with the PCA (Figure 5), because both IC50 variables show positive loading with, among others, ACD logD, surface tension and the number of violations of Lipinski’s Ro5.
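The statistics reported with Equations (1)–(3) can be reproduced for any descriptor set with an ordinary least-squares fit. The following minimal Python sketch uses only the standard library, and its function names are hypothetical. It returns the coefficients, r, s (with N − p − 1 degrees of freedom) and F = (r²/p)/((1 − r²)/(N − p − 1)). It does not compute the ± standard errors of the coefficients.

```python
import math


def solve(a, b):
    """Solve a x = b (small square system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row_i: abs(m[row_i][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def f_from_r(r, n, p):
    """Fisher ratio F = (r^2 / p) / ((1 - r^2) / (n - p - 1))."""
    r2 = r * r
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def ols_stats(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp.

    Returns (coefficients, r, s, F).
    """
    n, p = len(y), len(X[0])
    A = [[1.0] + list(row) for row in X]  # intercept column first
    AtA = [[sum(A[i][j] * A[i][k] for i in range(n)) for k in range(p + 1)]
           for j in range(p + 1)]
    Aty = [sum(A[i][j] * y[i] for i in range(n)) for j in range(p + 1)]
    coef = solve(AtA, Aty)  # normal equations
    fitted = [sum(c * a for c, a in zip(coef, row)) for row in A]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    s = math.sqrt(sse / (n - p - 1))  # residual standard deviation
    F = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return coef, math.sqrt(r2), s, F


print(round(f_from_r(0.895, 13, 2), 1))  # 20.1
print(round(f_from_r(0.573, 13, 1), 1))  # 5.4
```

As a consistency check, f_from_r(0.895, 13, 2) and f_from_r(0.573, 13, 1) give the F values reported for Equations (1) and (3).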
Quantitative structure–activity/property relationship (QSAR/QSPR) studies seek equations that correlate the physicochemical parameters of molecules with their activities/properties. Examples of such parameters are molar refractivity, refractive index and electronic parameters, which have been used extensively. The first study to correlate surface tension with dissociation constants was Thakur’s work [18]. It showed that surface tension could be successfully used to model the dissociation constants of sulfonamide drugs. The dissociation constant pKa depends on polarity and intermolecular forces. For maximum activity, sulfonamides should have a suitable pKa for penetrating membranes in vivo and for binding optimally to their target enzyme. Both abilities depend on the dissociation equilibrium between their protonated and unprotonated forms, expressed as pKa. Thakur’s results could explain why surface tension appears in Equations (1) and (2), where it reduces the bioactivity of our molecules.
The applicability domain of models (1)–(3) is analyzed with the Williams plot (cf. Figure 6), which charts cross-validated standardized residuals vs. leverage (hat diagonal) values h.
- In Equation (1), the response outlier (cross-validated standardized residual > 3σ) is compound 16, and the structurally influential chemical (h > h*) is compound 18.
- In Equation (2), there is neither a response outlier nor a structurally influential chemical.
- In Equation (3), there is no response outlier, and the structurally influential chemical is compound 18.
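For a one-descriptor model such as Equation (3), the quantities on the axes of a Williams plot can be computed in closed form. The Python sketch below assumes the following common conventions, since the exact standardization used for Figure 6 is not specified in the text:
- the leverage is h_i = 1/n + (x_i − x̄)²/Σ(x_j − x̄)²;
- the warning leverage is h* = 3p′/n, where p′ is the number of fitted parameters;
- the cross-validated residual e_i/(1 − h_i) is divided by its standard error s/√(1 − h_i).

With n = 13 and p′ = 2, h* = 6/13 ≈ 0.46.

```python
import math


def williams_points(x, y):
    """Leverages, standardized cross-validated residuals and warning leverage h*
    for a one-descriptor least-squares model y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # residual std. dev.
    h = [1.0 / n + (xi - mx) ** 2 / sxx for xi in x]    # hat-matrix diagonal
    # Leave-one-out residual e_i / (1 - h_i) divided by its standard error s / sqrt(1 - h_i)
    std_cv = [e / (s * math.sqrt(1.0 - hi)) for e, hi in zip(resid, h)]
    h_star = 3.0 * 2 / n  # 3 p' / n with p' = 2 fitted parameters (assumed convention)
    return h, std_cv, h_star


def flag_outliers(h, std_cv, h_star, limit=3.0):
    """Indices of response outliers (|residual| > limit) and influential points (h > h*)."""
    response = [i for i, r in enumerate(std_cv) if abs(r) > limit]
    structural = [i for i, hi in enumerate(h) if hi > h_star]
    return response, structural


# Synthetic data (illustration only); the last point has a high leverage
h, cv, h_star = williams_points([1, 2, 3, 4, 10], [1.0, 2.1, 2.9, 4.2, 9.8])
print(flag_outliers(h, cv, h_star))
```

The leverages always sum to the number of fitted parameters, which is a quick check on any implementation.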
Leave-m-out (1 ≤ m ≤ 10) cross-validated correlation coefficients rcv were calculated for the Cys diazomethyl- and chloromethyl-ketone derivatives (q = rcv (m = 1), cf. Table 4). They show that rcv decays with m, except for IC50 Molt-3 T-lineage and pIC50 Nalm-6 B-lineage (Equations (1) and (2)), which indicates possible outliers. In Equation (2), cross-validation can be calculated only for m ≤ 2, because the Ro5 values are not very discriminating (cf. Table S1). In particular, the Molt-3 T-lineage activity inhibition model IC50 vs. {ACD logD, surface_tension} (Equation (1)) gives the greatest rcv. Therefore, Equation (1) is more predictive than Equations (2) and (3).
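A minimal sketch of leave-m-out cross-validation for a one-descriptor linear model is shown below. It enumerates every subset of m left-out compounds, refits on the rest and accumulates the squared prediction errors. As an assumption, the cross-validated coefficient is then taken as rcv = √max(0, 1 − PRESS/SS), with PRESS rescaled so that each compound counts once. The full-linear LMO procedure of ref. [24] may define rcv differently, and this brute-force enumeration is not its fast algorithm.

```python
import itertools
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def lmo_q(x, y, m):
    """Leave-m-out cross-validated coefficient for a one-descriptor model.

    Every subset of m compounds is left out in turn (brute force). Each
    compound is predicted equally often, so PRESS is rescaled per compound.
    Assumed definition: r_cv = sqrt(max(0, 1 - PRESS / SS)).
    """
    n = len(x)
    press, predictions = 0.0, 0
    for left_out in itertools.combinations(range(n), m):
        kept = [i for i in range(n) if i not in left_out]
        a, b = fit_line([x[i] for i in kept], [y[i] for i in kept])
        for i in left_out:
            press += (y[i] - (a + b * x[i])) ** 2
            predictions += 1
    press_per_compound = press / predictions * n
    mean_y = sum(y) / n
    ss = sum((yi - mean_y) ** 2 for yi in y)
    q2 = 1.0 - press_per_compound / ss
    return math.sqrt(q2) if q2 > 0 else 0.0


# Synthetic data (illustration only)
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
print([round(lmo_q(xs, ys, m), 4) for m in (1, 2, 3)])
```

For m = 1 this reduces to the usual leave-one-out q. Larger m leaves fewer compounds for each refit and makes the test more demanding.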
The linear regressions suggest that the number of carbons is an important individual factor. Figure 7a,b shows IC50 Nalm-6 B-lineage ALL and IC50 Molt-3 T-lineage ALL, as well as stability k, vs. the number of carbons. IC50 Nalm-6 and IC50 Molt-3 behave similarly, with a minimum at 11–12 carbon atoms (Figure 7a). All properties are fitted to second-degree polynomial curves. The most active compounds (11 and 12) present the minimum values of the fitted curves. They match Class 1 in Table 2 of periodic properties, obtained by the procedure based on information entropy theory (artificial intelligence). These compounds are in the last (right-side) group and the last (bottom) period.
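The location of the minimum of a second-degree polynomial fit follows directly from its coefficients: for y = ax² + bx + c with a > 0, the minimum lies at x = −b/(2a). A minimal Python sketch of the least-squares quadratic fit (normal equations solved by Cramer's rule) is given below. The data in the usage example are synthetic, built to have their minimum at 11.5 carbons, and are not the measured IC50 values.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_fit(x, y):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c)."""
    s = [sum(xi ** k for xi in x) for k in range(5)]  # power sums of x
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Normal equations for the unknowns (c, b, a)
    A = [[s[0], s[1], s[2]],
         [s[1], s[2], s[3]],
         [s[2], s[3], s[4]]]
    d = det3(A)
    sol = []
    for j in range(3):  # Cramer's rule: replace column j by the right-hand side
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = t[i]
        sol.append(det3(Aj) / d)
    c, b, a = sol
    return a, b, c


def vertex(a, b):
    """Abscissa of the extremum of a*x**2 + b*x + c (a minimum when a > 0)."""
    return -b / (2 * a)


# Synthetic data with a minimum at 11.5 carbons (illustration only)
carbons = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16]
values = [(n - 11.5) ** 2 + 2 for n in carbons]
a, b, c = quadratic_fit(carbons, values)
print(round(vertex(a, b), 6))  # 11.5
```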
Figure 8 displays surface tension data vs. the number of carbons for the homologous series. The surface tension decays monotonically with the number of carbons.

3. Materials and Methods

3.1. MolClas Program for Molecular Classification Based on the Equipartition Conjecture of Entropy Production

The computational method is the same as the one that we successfully applied to the classification of polyphenolic compounds. The first step in quantifying the concept of similarity for the cysteine diazomethyl- and chloromethyl-ketone derivatives is to list the moieties most important for the antileukemic activity of these compounds. A vector of properties ī = <i1,i2,…,ik,…> is then associated with each molecule, with one component ik per feature. Its components correspond to characteristic functional groups in the molecule, in a hierarchical order according to their expected importance for antileukemic activity. The components ik are either “1” or “0”, according to the experimental conclusions on the antileukemic power of structural variations in the cysteine derivatives.
The six indices are defined as follows (Table 1):
- i1 = 1 denotes a chloromethyl group at R3;
- i2 = 1 signifies either an acetyl or a tert-butyloxycarbonyl (Boc) substituent at R1;
- i3 = 1 indicates that the R1 substituent is specifically an acetyl group;
- i4 = 1 means that R2 is a chain with between 3 and 12 carbons in line, with or without ramifications and with or without double bonds;
- i5 = 1 indicates that R2 is a chain with either 11 or 12 carbons in line, with or without ramifications and with or without double bonds;
- i6 = 1 denotes the absence of ramifications and double bonds in the R2 chain.
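The encoding rules above can be written as a small function. In this Python sketch, the Derivative record and its field names are hypothetical stand-ins for the structural information of Table 1, and "between 3 and 12 carbons" is assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Derivative:
    """Hypothetical record of the substituents of one cysteine derivative."""
    r1: str                  # e.g. "acetyl", "Boc" or another N-substituent
    r3: str                  # "chloromethyl" or "diazomethyl"
    r2_carbons_in_line: int  # carbons in line in the S-alkyl chain R2
    r2_branched: bool        # ramifications in R2
    r2_unsaturated: bool     # double bonds in R2


def property_vector(d):
    """Binary vector <i1,...,i6> following the rules stated in the text."""
    i1 = int(d.r3 == "chloromethyl")
    i2 = int(d.r1 in ("acetyl", "Boc"))
    i3 = int(d.r1 == "acetyl")
    i4 = int(3 <= d.r2_carbons_in_line <= 12)  # inclusive range assumed
    i5 = int(d.r2_carbons_in_line in (11, 12))
    i6 = int(not d.r2_branched and not d.r2_unsaturated)
    return (i1, i2, i3, i4, i5, i6)


# Hypothetical linear dodecyl chloromethyl ketone with an acetyl group at R1
print(property_vector(Derivative("acetyl", "chloromethyl", 12, False, False)))  # (1, 1, 1, 1, 1, 1)
```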
Let rij (0 ≤ rij ≤ 1) denote the similarity index of two cysteine derivatives associated with the ī and j̄ vectors, respectively. A similarity matrix R = [rij] characterizes the similarity relation. The similarity index between two cysteine derivatives ī = <i1,i2,…,ik,…> and j̄ = <j1,j2,…,jk,…> is defined as:
$$r_{ij} = \sum_{k} t_k \,(a_k)^k \qquad (k = 1, 2, \ldots)$$
where 0 ≤ ak ≤ 1, tk = 1 if ik = jk, and tk = 0 if ik ≠ jk. This definition assigns a weight (ak)^k to each property involved in the description of molecule i or j, so the hierarchical order of the six structural features is expressed by their corresponding weights. For instance, with all ak = 0.5, as used in this work, the weights are 0.5, 0.25, 0.125, 0.0625, 0.03125 and 0.015625.
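The weights and the similarity index can be checked numerically with the minimal Python sketch below, which assumes the six-component vectors described above. With all ak = 0.5, identical vectors reach the maximum rij = 0.984375. Any disagreement on i1 caps rij at 0.484375, the sum of the remaining five weights. This is how the hierarchical order makes the first feature dominant.

```python
def feature_weights(a):
    """Weights (a_k)**k for k = 1, 2, ...; a lists one a_k per feature."""
    return [ak ** k for k, ak in enumerate(a, start=1)]


def similarity(u, v, a):
    """r_ij: sum of (a_k)**k over the features k where the two vectors agree (t_k = 1)."""
    w = feature_weights(a)
    return sum(wk for wk, x, y in zip(w, u, v) if x == y)


a = [0.5] * 6
print(feature_weights(a))                                   # [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]
print(similarity((1, 1, 1, 1, 1, 1), (1, 1, 1, 1, 1, 0), a))  # 0.96875
```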
Learning procedures similar to those encountered in stochastic methods are implemented as follows [19]. Consider a partition into classes that is regarded as good or ideal on practical or empirical grounds. It corresponds to a reference similarity matrix S = [sij] obtained for an arbitrary number of fictitious properties. Next, consider the same set of species as in the good classification, now described by the actual properties.
The similarity degrees rij, which form the matrix R, are then computed for these actual properties. The number of properties used for R and S may differ. The learning procedure consists of finding classification results for R as close as possible to the good classification. The distance between the partitions into classes characterized by R and S is given by:
$$D = \sum_{i,j} (1 - r_{ij}) \ln \frac{1 - r_{ij}}{1 - s_{ij}} + \sum_{i,j} r_{ij} \ln \frac{r_{ij}}{s_{ij}}, \qquad 0 \le r_{ij}, s_{ij} \le 1$$
This definition was suggested by that introduced in information theory by Kullback to measure the distance between two probability distributions [20]. Such a procedure has been applied to the synthesis of complex dendrograms using information entropy [21,22].
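The distance D can be evaluated directly from two similarity matrices with the Python sketch below, which follows the Kullback-type expression above. It uses two edge-case conventions, which are assumptions not taken from the text:
- 0·ln(0/q) = 0;
- a nonzero r meeting s = 0 (or 1 − r > 0 meeting 1 − s = 0) gives an infinite contribution.

Each pair of terms is a binary relative entropy, so D ≥ 0, with D = 0 when R = S.

```python
import math


def _term(p, q):
    """p * ln(p / q), with 0 * ln(0 / q) = 0 and p * ln(p / 0) = inf for p > 0 (assumed conventions)."""
    if p == 0.0:
        return 0.0
    if q == 0.0:
        return math.inf
    return p * math.log(p / q)


def partition_distance(R, S):
    """Kullback-type distance D between similarity matrices R and S (entries in [0, 1])."""
    D = 0.0
    for r_row, s_row in zip(R, S):
        for r, s in zip(r_row, s_row):
            D += _term(1.0 - r, 1.0 - s) + _term(r, s)
    return D


R = [[1.0, 0.7], [0.7, 1.0]]
S = [[1.0, 0.2], [0.2, 1.0]]
print(partition_distance(R, R))  # 0.0
print(partition_distance(R, S) > 0)  # True
```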
We have written the MolClas program for molecular classification based on the Equipartition Conjecture of Entropy Production. It outputs the similarity and difference matrices, the latter also in NEXUS (.NEX) format for the programs PAUP, MacClade and SplitsTree. MolClas performs single- and complete-linkage hierarchical cluster analyses (CAs) of the compounds using the IMSL subroutine CLINK [23].
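The following Python sketch writes a lower-triangular distance matrix as a NEXUS file with a TAXA and a DISTANCES block, for readers who want to pass their own matrices to PAUP or SplitsTree. It is a hypothetical illustration, not MolClas output: the difference matrix is assumed to be 1 − rij, and the taxon labels are invented. Check the FORMAT options each program accepts against its documentation.

```python
def difference_matrix(R):
    """Difference matrix, assumed here to be 1 - r_ij."""
    return [[1.0 - r for r in row] for row in R]


def to_nexus(labels, dist):
    """Write a lower-triangular distance matrix (diagonal included) as NEXUS text."""
    n = len(labels)
    lines = [
        "#NEXUS",
        "",
        "BEGIN TAXA;",
        f"  DIMENSIONS NTAX={n};",
        "  TAXLABELS " + " ".join(labels) + ";",
        "END;",
        "",
        "BEGIN DISTANCES;",
        "  FORMAT TRIANGLE=LOWER DIAGONAL;",
        "  MATRIX",
    ]
    for i, label in enumerate(labels):
        row = " ".join(f"{dist[i][j]:.6f}" for j in range(i + 1))
        lines.append(f"    {label} {row}")
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


# Hypothetical similarity matrix and labels (illustration only)
R = [[1.0, 0.9, 0.8], [0.9, 1.0, 0.7], [0.8, 0.7, 1.0]]
print(to_nexus(["c11", "c12", "c24"], difference_matrix(R)))
```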

3.2. GraphCor Program for Partial Correlation Diagram

The partial correlation diagram presents high partial correlations (|r| ≥ 0.75) in red, medium partial correlations (0.50 ≤ |r| < 0.75) in orange, low partial correlations (0.25 ≤ |r| < 0.50) in yellow and zero partial correlations (|r| < 0.25) in black. The MolClas and GraphCor codes are available from the authors by e-mail and are free for academics.

3.3. Statistical Analysis

Principal component analysis (PCA) and linear and multiple linear regression models were performed using SPSS (version 21.0, IBM Corp., USA), Minitab (version 17.1.0), Knowledge Miner and Microsoft Excel for Office (2020) on Windows 10. The calculated statistics are the number of data points N, the correlation coefficient r, the standard deviation s and Fisher’s ratio F. The cross-validated correlation coefficients rcv (q = rcv (m = 1), etc.) were calculated with the leave-m-out (LMO) procedure [24]. This procedure also provides a new method for selecting the best set of descriptors: LMO selects the set that maximizes rcv. Cross-validation was used to determine the predictive ability of the models, which were compared and validated on the basis of rcv (q).

4. Conclusions

From the discussion of the present results, the following conclusions can be drawn.
  • Based on a set of six vector properties, the partial correlation diagram was calculated for a set of 28 S-alkylcysteine diazomethyl- and chloromethyl-ketone derivatives. Derivatives with the greatest antileukemic activity in the same class correspond to high partial correlations.
  • A table of periodic classification is made based on information entropy. The first four characteristics denote the group, and the last two indicate the period. Nine classes are clearly distinguished. The most active compounds (11, 12 and 24), all with 11 or 12 carbons in line in R2, are situated at the right side, bottom and, especially, bottom right of this periodic table.
  • The principal component analysis scores plot of the homologous series of S-alkyl chloromethyl ketones, for 18 properties, shows five subclasses corresponding to the periodic classification of the congeneric series into nine classes.
  • Linear fits of both antileukemic activities and stability are good (correlation coefficients of 0.57 or greater). They are in agreement with the principal component analysis. The variables that appear in the models are those that show positive loading in the principal component analysis.
  • The most important properties to explain the antileukemic activities are ACD logD, surface tension and the number of violations of Lipinski’s rule of five. The activities are the 50% inhibitory concentration against Molt-3 T-lineage acute lymphoblastic leukemia, the negative logarithm of the 50% inhibitory concentration against Nalm-6 B-lineage acute lymphoblastic leukemia, and stability k.
  • After leave-m-out cross-validation, Equation (1) is the most predictive for cysteine diazomethyl- and chloromethyl-ketone derivatives (cross-validated correlation coefficient of 0.764).
  • The results for the antileukemic activities of the cysteine diazomethyl- and chloromethyl-ketone derivatives show that surface tension has an unfavorable influence, which could be related to the results obtained by Thakur.
  • The 50% inhibitory concentrations against Nalm-6 B-lineage and Molt-3 T-lineage acute lymphoblastic leukemia cells, as well as stability k, plotted vs. the number of carbons are fitted to second-degree polynomial curves. The most active compounds (11 and 12) present minimum values and coincide with Class 1 obtained by information entropy theory.

Supplementary Materials

Supplementary materials can be found online.

Author Contributions

All authors have contributed equally to the work reported. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by an internal grant from Universidad Católica de Valencia San Vicente Mártir.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article or supplementary material.

Acknowledgments

The authors acknowledge E. Besalú for providing his full linear leave-many-out program before publication and Y. Marrero-Ponce for the Williams plots.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Sample Availability

The samples of compounds are not available from authors.

References

  1. Jemal, A.; Bray, F.; Center, M.M.; Ferlay, J.; Ward, E.; Forman, D. Global cancer statistics. CA Cancer J. Clin. 2011, 61, 69–90.
  2. Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394–424.
  3. Uckun, F.M.; Narla, R.M.; Perry, D.A. Parker Hughes Institute. Alkyl Ketones as Potent Anti-Cancer Agents. Patent US6251882B1, 26 June 2001.
  4. Uckun, F.M.; Narla, R.M.; Perry, D.A. Parker Hughes Institute. Alkyl Ketones as Potent Anti-Cancer Agents. Patent CA2336108A1, 6 January 2001.
  5. Perrey, D.A.; Narla, R.K.; Uckun, F.M. Cysteine chloromethyl and diazomethyl ketone derivatives with potent anti-leukemic activity. Bioorg. Med. Chem. Lett. 2000, 10, 547–549.
  6. Perrey, D.A.; Scannell, M.P.; Narla, R.K.; Uckun, F.M. The S-alkyl chain length as a determinant of the anti-leukemic activity of cysteine chloromethyl ketone compounds. Bioorg. Med. Chem. Lett. 2000, 10, 551–552.
  7. Kotchevar, A.T.; Perrey, D.A.; Uckun, F.M. A degradation study of a series of chloromethyl and diazomethyl ketone anti-leukemic agents. Drug Develop. Ind. Pharm. 2002, 28, 143–149.
  8. Holland, H.L.; Brown, F.M.; Johnson, D.V.; Kerridge, A.; Mayne, B.; Turner, C.D.; van Vliet, A.J. Biocatalytic oxidation of S-alkylcysteine derivatives by chloroperoxidase and Beauveria species. J. Mol. Catal. B Enzym. 2002, 17, 249–256.
  9. Calce, E.; De Luca, S. The cysteine S-alkylation reaction as a synthetic method to covalently modify peptide sequences. Chem. Eur. J. 2017, 23, 224–233.
  10. Castellano, G.; Redondo, L.; Torrens, F. QSAR of natural sesquiterpene lactones as inhibitors of Myb-dependent gene expression. Curr. Top. Med. Chem. 2017, 17, 3256–3268.
  11. Torrens, F.; Castellano, G. Structure–activity relationships of cytotoxic lactones as inhibitors and mechanisms of action. Curr. Drug Discov. Technol. 2020, 17, 166–182.
  12. Castellano, G.; Tena, J.; Torrens, F. Structural indicators and its relation to antioxidant properties of Posidonia oceanica (L.) Delile. MATCH Commun. Math. Comput. Chem. 2012, 67, 231–250.
  13. Castellano, G.; González-Santander, J.L.; Lara, A.; Torrens, F. Classification of flavonoid compounds by using entropy of information theory. Phytochemistry 2013, 93, 182–191.
  14. Castellano, G.; Lara, A.; Torrens, F. Classification of stilbenoid compounds by entropy of artificial intelligence. Phytochemistry 2014, 97, 62–69.
  15. Castellano, G.; Torrens, F. Quantitative structure–antioxidant activity models of isoflavonoids: A theoretical study. Int. J. Mol. Sci. 2015, 16, 12891–12906.
  16. Castellano, G.; Torrens, F. Information entropy-based classification of triterpenoids and steroids from Ganoderma. Phytochemistry 2015, 116, 305–313.
  17. Shaw, P.J.A. Multivariate Statistics for the Environmental Sciences; Hodder-Arnold: New York, NY, USA, 2003.
  18. Thakur, A. QSAR study on benzenesulfonamide ionization constant: Physicochemical approach using surface tension. Arch. Org. Chem. 2005, 14, 49–58.
  19. White, H. Neural network learning and statistics. AI Expert 1989, 4, 48–52.
  20. Kullback, S. Information Theory and Statistics; Wiley: New York, NY, USA, 1959.
  21. Iordache, O. Modeling Multi-Level Systems; Springer: Berlin/Heidelberg, Germany, 2011.
  22. Iordache, O. Self-Evolvable Systems: Machine Learning in Social Media; Springer: Berlin/Heidelberg, Germany, 2012.
  23. IMSL. Integrated Mathematical Statistical Library (IMSL); IMSL: Houston, TX, USA, 1989.
  24. Besalú, E. Fast computation of cross-validated properties in full linear leave-many-out procedures. J. Math. Chem. 2001, 29, 191–203.
Figure 1. Basic structure of cysteine diazomethyl- and chloromethyl-ketone derivatives.
Figure 2. Partial correlation diagram: high (red), medium (orange) and low (yellow) correlations.
Figure 3. Entropy h vs. classification level b for different numbers of classes.
Figure 4. Scores plot for the homologous series of chloromethyl ketones with an acetyl group at R1.
Figure 5. Loading plot of variables for the homologous series of chloromethyl ketones with an acetyl group at R1.
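Figures 4 and 5 show the scores and loadings of a principal component analysis of the acetyl-substituted chloromethyl-ketone homologues. As a minimal sketch of how such plots are commonly produced, the Python function below autoscales a descriptor matrix and extracts scores and loadings by singular value decomposition. The autoscaling step, the function name pca_scores_loadings and the use of unit-length eigenvectors as loadings are assumptions for illustration; the software and scaling used in the article are not stated in this section.

```python
import numpy as np


def pca_scores_loadings(X, n_components=2):
    """Autoscaled PCA via SVD (illustrative sketch, not the article's code).

    X: (n_compounds, n_variables) array of descriptors.
    Returns (scores, loadings, explained_variance_ratio) for the first
    n_components principal components.
    """
    X = np.asarray(X, dtype=float)
    # Autoscaling (zero mean, unit variance per variable) is an assumed choice.
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    if np.any(std == 0):
        raise ValueError("a constant variable cannot be autoscaled")
    Z = (X - mean) / std
    # Thin SVD: Z = U S Vt; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T          # (n_variables, k), unit-length columns
    scores = Z @ loadings                   # (n_compounds, k), equals U[:, :k] * S[:k]
    explained = S**2 / np.sum(S**2)         # fraction of total variance per component
    return scores, loadings, explained[:n_components]
```

The signs of components obtained from any SVD are arbitrary, so a scores or loading plot may appear mirrored relative to another program's output without any change in interpretation. Some packages also scale loadings by the square root of each eigenvalue.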
Figure 6. Williams plots for the graphical detection of outliers in the response (Y-axis: standardized residuals > 3σ) or in the structure (X-axis: hat values above the h* cut-off line) for the regression models: (a) Equation (1); (b) Equation (2); (c) Equation (3). Numbers 1–13 correspond to compounds 3–12 and 16–18.
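The Williams plot combines two diagnostics of an ordinary least-squares model: the leverage (hat value) h_ii of each compound and its standardized residual. The sketch below computes both, assuming a model with an intercept, residuals standardized by the residual standard error s, and the common warning leverage h* = 3p′/n, where p′ is the number of fitted parameters and n the number of compounds. The cut-off actually used in the article is not given in this section, and the helper names williams_plot_data and flag_outliers are hypothetical.

```python
import numpy as np


def williams_plot_data(X, y):
    """Leverages, standardized residuals and warning leverage h* for an OLS fit.

    X: (n, p) descriptor matrix WITHOUT an intercept column; y: (n,) responses.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])        # prepend the intercept column
    p_prime = A.shape[1]                        # number of fitted parameters
    # Hat-matrix diagonal: h_ii = diag(A (A^T A)^-1 A^T).
    H = A @ np.linalg.pinv(A.T @ A) @ A.T
    leverage = np.diag(H)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    # Residual standard error with n - p' degrees of freedom.
    s = np.sqrt(residuals @ residuals / (n - p_prime))
    std_resid = residuals / s
    h_star = 3.0 * p_prime / n                  # assumed warning-leverage convention
    return leverage, std_resid, h_star


def flag_outliers(leverage, std_resid, h_star, resid_limit=3.0):
    """Masks for response outliers (|residual| > limit) and high-leverage compounds."""
    return np.abs(std_resid) > resid_limit, leverage > h_star
```

Plotting std_resid against leverage, with horizontal lines at ±3 and a vertical line at h*, gives the layout described in the caption. Some authors instead divide each residual by s·√(1 − h_ii), which yields internally studentized residuals.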
Figure 7. Experimental data: (a) IC50 antileukemic activity and (b) stability k vs. the number of carbons.
Figure 8. Surface tension data vs. the number of carbons.
Table 1. Vector of properties of Cys diazo- and chloromethyl-ketone derivatives and experimental data of antileukemic activity (IC50) and stability k.
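| Compound | R1 | R2 | R3 | <i1,i2,i3,i4,i5,i6> a | IC50 (µM) Nalm-6 B-lineage ALL | IC50 (µM) Molt-3 T-lineage ALL | k [h−1] (0.01 M phosphate buffer, pH = 8.0, ionic strength = 0.3 M) |
|---|---|---|---|---|---|---|---|
| 1 | CH3CO | CH3 | CH2Cl | 111001 | 30.3 | 80.8 | |
| 2 | CH3CO | CH2CH3 | CH2Cl | 111001 | 52.8 | 99.9 | |
| 3 | CH3CO | (CH2)2CH3 | CH2Cl | 111101 | 6.9 | 8.0 | 0.0658 |
| 4 | CH3CO | (CH2)3CH3 | CH2Cl | 111101 | 41.4 | 5.6 | 0.0523 |
| 5 | CH3CO | (CH2)4CH3 | CH2Cl | 111101 | 5.8 | 5.4 | 0.0498 |
| 6 | CH3CO | (CH2)5CH3 | CH2Cl | 111101 | 3.3 | 0.7 | 0.0336 |
| 7 | CH3CO | (CH2)6CH3 | CH2Cl | 111101 | 4.8 | 2.5 | 0.0319 |
| 8 | CH3CO | (CH2)7CH3 | CH2Cl | 111101 | 5.6 | 4.1 | 0.0388 |
| 9 | CH3CO | (CH2)8CH3 | CH2Cl | 111101 | 7.3 | 6.7 | 0.0373 |
| 10 | CH3CO | (CH2)9CH3 | CH2Cl | 111101 | 4.7 | 3.4 | 0.0352 |
| 11 | CH3CO | (CH2)10CH3 | CH2Cl | 111111 | 1.7 | 3.0 | 0.0345 |
| 12 | CH3CO | (CH2)11CH3 | CH2Cl | 111111 | 2.0 | 2.3 | 0.0242 |
| 13 | CH3CO | (CH2)11CH3 | CH=N2 | 011111 | 15.4 | 22.9 | |
| 14 | Boc b | (CH2)11CH3 | CH2Cl | 110111 | 15.1 | 15.5 | |
| 15 | H c | (CH2)11CH3 | CH2Cl | 100111 | 17.7 | 12.5 | |
| 16 | CH3CO | (CH2)13CH3 | CH2Cl | 111001 | 8.7 | 8.8 | 0.0417 |
| 17 | CH3CO | (CH2)14CH3 | CH2Cl | 111001 | 8.9 | 8.6 | 0.0374 |
| 18 | CH3CO | (CH2)15CH3 | CH2Cl | 111001 | 16.0 | 17.3 | 0.0363 |
| 19 | Boc-Gly | trans,trans-Farnesyl | CH=N2 | 000110 | 51.3 | 84.5 | |
| 20 | Boc-Gly | trans,trans-Farnesyl | CH2Cl | 100110 | 12.9 | 17.5 | |
| 21 | Boc | trans,trans-Farnesyl | CH=N2 | 010110 | 49.8 | 50.1 | |
| 22 | Boc | trans,trans-Farnesyl | CH2Cl | 110110 | 10.7 | 7.7 | |
| 23 | CH3CO | trans,trans-Farnesyl | CH=N2 | 011110 | 30.3 | 32.2 | |
| 24 | CH3CO | trans,trans-Farnesyl | CH2Cl | 111110 | 3.0 | 1.4 | |
| 25 | CH3CO | trans-Geranyl | CH=N2 | 011000 | >100 | >100 | |
| 26 | Boc | trans-Geranyl | CH=N2 | 010000 | >100 | >100 | |
| 27 | CH3CO | 3-Methyl-2-butenyl | CH=N2 | 011000 | >100 | >100 | |
| 28 | CH3CO | 3-Methyl-2-butenyl | CH2Cl | 111000 | 12.6 | 7.9 | |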
a i1 = 1: chloromethyl group at R3; i2 = 1: acetyl or Boc substituent at R1; i3 = 1: acetyl group at R1 (Boc does not count); i4 = 1: R2 chain with 3 to 12 carbons in line, with or without ramifications or double bonds; i5 = 1: R2 chain with 11 or 12 carbons in line, with or without ramifications or double bonds; i6 = 1: no ramifications or double bonds in the R2 chain. b Boc: tert-butyloxycarbonyl. c The molecule is a hydrochloride (the acid salt formed by reaction with hydrochloric acid).
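The Nalm-6 activity enters the model of Equation (2) as pIC50, minus the logarithm of IC50. A minimal conversion sketch follows. It assumes the usual convention that the logarithm is taken of IC50 in mol/L; because Table 1 lists IC50 in µM, this gives pIC50 = 6 − log10 IC50[µM]. Censored entries such as >100 µM give only a lower bound on IC50, so they are returned as None rather than guessed. The function name pic50_from_um is hypothetical.

```python
import math


def pic50_from_um(ic50_um):
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in micromolar.

    Accepts a number or a string as printed in Table 1; censored values
    such as '>100' carry only a lower bound and return None.
    """
    if isinstance(ic50_um, str):
        text = ic50_um.strip()
        if text.startswith(">"):
            return None  # only a lower bound on IC50 is known
        ic50_um = float(text)
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)  # convert µM to mol/L before taking the log
```

For compound 12 (Nalm-6 IC50 = 2.0 µM), this gives pIC50 = 6 − log10 2.0 ≈ 5.70.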
Table 2. Classification of cysteine diazomethyl- and chloromethyl-ketone derivatives by information entropy method.
| P a | 0001 b | 0100/0101/0110 | 0111 | 1001 | 1101 | 1110 | 1111 |
|---|---|---|---|---|---|---|---|
| 0X c | | Class 9: 25 (R1: CH3CO; R2: trans-Geranyl); 26 (R1: Boc; R2: trans-Geranyl); 27 (R1: CH3CO; R2: 3-Methyl-2-butenyl) | | | | Class 3: 1 (R2: -CH3); 2 (R2: -CH2CH3); 16 (R2: -(CH2)13CH3); 17 (R2: -(CH2)14CH3); 18 (R2: -(CH2)15CH3); 28 (R2: 3-Methyl-2-butenyl) | Class 2: 3 (R2: -(CH2)2CH3); 4 (R2: -(CH2)3CH3); 5 (R2: -(CH2)4CH3); 6 (R2: -(CH2)5CH3); 7 (R2: -(CH2)6CH3); 8 (R2: -(CH2)7CH3); 9 (R2: -(CH2)8CH3); 10 (R2: -(CH2)9CH3) |
| 1X | Class 8: 19 (R1: Boc-Gly; R2: trans,trans-Farnesyl) | Class 7: 21 (R2: trans,trans-Farnesyl) | Class 6: 13 (R2: -(CH2)11CH3); 23 (R2: trans,trans-Farnesyl) | Class 5: 15 (R1: H·HCl; R2: -(CH2)11CH3); 20 (R1: Boc-Gly; R2: trans,trans-Farnesyl) | Class 4: 14 (R2: -(CH2)11CH3); 22 (R2: trans,trans-Farnesyl) | | Class 1: 11 (R2: -(CH2)10CH3); 12 (R2: -(CH2)11CH3); 24 (R2: trans,trans-Farnesyl) |
a P: period <i5,i6>. b Column headings give the group <i1,i2,i3,i4>. c X = 0 or 1.
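Table 2 places each compound by splitting its Table 1 property vector into a group <i1,i2,i3,i4> (columns) and a period <i5,i6> (rows, with i6 left free as X). As a consistency check, the short Python sketch below performs only that bookkeeping on the vectors transcribed from Table 1. It does not implement the information-entropy clustering, which is what merges groups 0100, 0101 and 0110 into a single column. The names VECTORS, group_and_period and tabulate are hypothetical.

```python
from collections import defaultdict

# Property vectors <i1,...,i6> transcribed from Table 1 (compound number -> bits).
VECTORS = {
    1: "111001", 2: "111001", 3: "111101", 4: "111101", 5: "111101",
    6: "111101", 7: "111101", 8: "111101", 9: "111101", 10: "111101",
    11: "111111", 12: "111111", 13: "011111", 14: "110111", 15: "100111",
    16: "111001", 17: "111001", 18: "111001", 19: "000110", 20: "100110",
    21: "010110", 22: "110110", 23: "011110", 24: "111110", 25: "011000",
    26: "010000", 27: "011000", 28: "111000",
}


def group_and_period(vector: str) -> tuple[str, str]:
    """Split a six-bit property vector into group <i1..i4> and period <i5,i6>."""
    if len(vector) != 6 or set(vector) - {"0", "1"}:
        raise ValueError(f"expected six binary digits, got {vector!r}")
    return vector[:4], vector[4:]


def tabulate(vectors):
    """Bucket compounds by (group, i5 + 'X'), i.e. the cells of Table 2.

    i6 is ignored, matching the 0X/1X rows; no groups are merged here.
    """
    cells = defaultdict(list)
    for compound, vec in sorted(vectors.items()):
        group, period = group_and_period(vec)
        cells[(group, period[0] + "X")].append(compound)
    return dict(cells)


if __name__ == "__main__":
    for (group, row), members in sorted(tabulate(VECTORS).items()):
        print(f"period {row}, group {group}: {members}")
```

Run on the Table 1 vectors, each of Classes 1 to 8 in Table 2 falls in a single cell. Class 9 appears split over groups 0100 (compound 26) and 0110 (compounds 25 and 27), the column that the entropy method merges.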
Table 3. Entropy h and classification level b for different numbers of classes.
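| b | h | No. of Classes |
|---|---|---|
| 1.0000 | 320.8858 | 28 |
| 0.9799 | 93.3938 | 14 |
| 0.9599 | 38.3400 | 9 |
| 0.9499 | 38.3178 | 9 |
| 0.9299 | 30.4859 | 8 |
| 0.9199 | 30.5388 | 8 |
| 0.8899 | 30.5166 | 8 |
| 0.8699 | 17.4259 | 6 |
| 0.8399 | 11.8925 | 5 |
| 0.7599 | 11.5383 | 5 |
| 0.7499 | 7.5860 | 4 |
| 0.5899 | 4.1698 | 3 |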
Table 4. Cross-validated correlation coefficient from leave-m-out cross-validation for Cys diazomethyl- and chloromethyl-ketones.
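| m | IC50 Molt-3 T-lineage ALL, Equation (1) | pIC50 Nalm-6 B-lineage ALL, Equation (2) | k, Equation (3) |
|---|---|---|---|
| 1 | 0.764 | 0.424 | 0.286 |
| 2 | 0.767 | 0.428 | 0.285 |
| 3 | 0.770 | | 0.283 |
| 4 | 0.772 | | 0.281 |
| 5 | 0.775 | | 0.280 |
| 6 | 0.776 | | 0.280 |
| 7 | 0.775 | | 0.283 |
| 8 | 0.769 | | 0.290 |
| 9 | 0.738 | | 0.306 |
| 10 | | | 0.340 |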
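Table 4 reports cross-validated statistics from leave-m-out validation. The sketch below is a brute-force version of full leave-m-out for a linear model. Every subset of m compounds is held out in turn, the model is refitted on the remainder, and the squared prediction errors are pooled into PRESS, giving q² = 1 − PRESS/SS. Each compound is predicted C(n−1, m−1) times, so PRESS is divided by that count. The sketch assumes ordinary least squares with an intercept. Whether Table 4 lists q or q², and how the article pools predictions, is not stated in this section, and the function name leave_m_out_q2 is hypothetical.

```python
from itertools import combinations
from math import comb

import numpy as np


def leave_m_out_q2(X, y, m):
    """Exhaustive leave-m-out q^2 = 1 - PRESS/SS for an OLS model with intercept.

    Each of the C(n, m) subsets is held out once, so every compound is
    predicted C(n-1, m-1) times; PRESS is normalized by that count.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    if not 1 <= m < n:
        raise ValueError("m must satisfy 1 <= m < n")
    A = np.column_stack([np.ones(n), X])        # intercept + descriptors
    if n - m < A.shape[1]:
        raise ValueError("too few training compounds left to fit the model")
    all_idx = np.arange(n)
    press = 0.0
    for out in combinations(range(n), m):
        test = np.array(out)
        train = np.setdiff1d(all_idx, test)
        beta, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        press += np.sum((y[test] - A[test] @ beta) ** 2)
    press /= comb(n - 1, m - 1)                 # average over repeated predictions
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot
```

For m = 1 the result coincides with classical leave-one-out, where PRESS = Σ[e_i/(1 − h_ii)]². Enumerating all subsets is cheap for the 13 compounds plotted in Figure 6 (at most C(13, 6) = 1716 fits). Closed-form shortcuts become worthwhile for larger sets.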
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.