# Classification of Congeneric and QSAR of Homologous Antileukemic S–Alkylcysteine Ketones


## Abstract


## 1. Introduction

_{S}) sulfoxide in moderate to high diastereomeric excess [8]. The (S_{S}) natural product sulfoxide chondrine was obtained via biotransformation of the N-tert-butyloxycarbonyl (Boc) derivative of l-4-S-morpholine-2-carboxylic acid using Beauveria bassiana or B. caledonica. The nucleophilic amino acids most widely used for the chemical modification of peptides are lysine and cysteine. Cysteine is modified via its thiol side chain, which is strongly nucleophilic, more so than the primary amine of lysine, which is protonated at pH values below 9.0. Cysteine can therefore react faster than lysine, which allows the selective modification of a key amino acid over other residues. One possible synthetic route is the S-alkylation reaction; in this regard, post-translational modifications of cysteine are essential for the biological function of many proteins. In particular, numerous signaling proteins are post-translationally lipidated on a cysteine residue. Since this lipidation is essential for the correct localization and function of these proteins, the enzymes responsible for the covalent introduction or removal of lipid moieties have been considered interesting targets for blocking aberrant signaling processes [9].

**12**) is an exceptionally active compound against leukemia cells, the length of the alkyl chain has a profound effect on the antileukemic potency of the homologous series, and the congeneric series may be useful for treating patients with therapy-refractory or relapsed leukemia. We therefore wanted to test whether different moieties in the congeneric series correspond to the same potency. The objective of this study was to predict the antileukemic activity of these compounds from their molecular structures. In addition, a QSAR study and a principal component analysis (PCA) related the antileukemic activity of a homologous series of S-alkylcysteine chloromethyl-ketone derivatives to the physical and chemical properties of these compounds.

## 2. Results and Discussion

_{50}) and stability k.

#### 2.1. GraphCor Partial Correlation Diagram

<i_{1},i_{2},i_{3},i_{4},i_{5},i_{6}> for the 28 cysteine diazomethyl- and chloromethyl-ketone derivatives. The Pearson intercorrelations are computed for the partial correlation diagram, which contains high partial correlations (r ≥ 0.75), medium partial correlations (0.50 ≤ r < 0.75), low partial correlations (0.25 ≤ r < 0.50) and zero partial correlations (r < 0.25). Pairs of compounds with high partial correlation show a similar vector property. With the Equipartition Conjecture of Entropy Production, the partial correlations matrix (cf. Figure 2) contains 187 high, 44 medium, 116 low and 31 zero partial correlations. Many partial correlations are high. Red lines, representing high partial correlations, link cysteine derivatives with the greatest antileukemic activity because the most active compounds (**11** and **12**) are taken as reference molecules with vector properties <111111>. The antileukemic activities are expressed as IC_{50}.
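The banding of partial correlations described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the GraphCor implementation; the function names are hypothetical. The sketch also checks that the reported counts add up to the number of distinct pairs among 28 compounds.

```python
from collections import Counter


def correlation_band(r: float) -> str:
    """Assign a partial correlation to one of the four bands used in the text."""
    if r >= 0.75:
        return "high"
    if r >= 0.50:
        return "medium"
    if r >= 0.25:
        return "low"
    return "zero"


def count_bands(values):
    """Count how many correlations fall into each band."""
    return Counter(correlation_band(r) for r in values)


# Number of distinct compound pairs among the 28 derivatives.
n_compounds = 28
n_pairs = n_compounds * (n_compounds - 1) // 2

# Counts reported in the text for the partial correlation diagram.
reported = {"high": 187, "medium": 44, "low": 116, "zero": 31}
reported_total = sum(reported.values())
```

Here `reported_total` equals `n_pairs`, so the four bands account for every pair of compounds exactly once.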

#### 2.2. MolClas Molecular Classification Based on the Equipartition Conjecture of Entropy Production

**b** is the classification level. A comparative analysis of the molecular dataset, from 28 classes (each compound in its own class) to one class (containing all compounds), by the method of information entropy theory, matching <i_{1},i_{2},i_{3},i_{4},i_{5},i_{6}> and classification at level b (C_{b}), is calculated for antileukemic activity [17] and summarized in Table 2.

_{k} = 0.5, for the classification level 0.94 ≤ b ≤ 0.96, allows nine classes (grouped from Class 1 to Class 9, cf. Table 3).

**R**_{b}) = 38.32, which is the classification closest to the cut-off point of the entropy vs. classification level with its trend line (cf. Figure 3).

**11**, **12** and **24**) are grouped into the same class, corresponding to acetyl amides with a linear chain containing either 11 or 12 carbons in R_{2}. Moreover, chloromethyl-ketone derivatives with great activity (Classes 2–5) are clustered into other groupings. Finally, the groups with the least antileukemic activity are cysteine diazomethyl derivatives and are located at the left side of the table (Classes 6–9). The results are in agreement with Figure 2 because pairs of compounds in the same class with similar vector properties <i_{1},i_{2},i_{3},i_{4},i_{5},i_{6}> show red lines, representing high partial correlations, e.g., the pair (**11**, **12**) and both compounds with vector properties <111111> in Class 1.

#### 2.3. Principal Component Analysis for Classification of the Most Antileukemic Bioactive Compounds

_{1}–PC_{2} scores plot was made (cf. Figure 4) with the properties for the highly active compounds, forming a homologous series of chloromethyl-ketone derivatives with an acetyl group at R_{1} (compounds **3**–**12** and **16**–**18**). Compounds **1** and **2** are inactive, and neither is included because the value of stability k is not published for them. The following 18 properties were taken from the ChEMBL database and were used for statistical assessment: full molecular weight (Full_mw, V_{1}), ACD logP (V_{2}), number of rotatable bonds (rtb, V_{3}), heavy atoms (V_{4}), number of carbons in R_{2} (V_{5}), a_logP (V_{6}), boiling point (V_{7}), enthalpy of vaporization (V_{8}), a different estimation of ACD/logP (V_{9}), molar volume (V_{10}), polarizability (V_{11}), ACD logD (pH 7.4, V_{12}), ACD/KOC (pH 7.4, V_{13}), ACD/BCF (pH 7.4, V_{14}), Qed_weighted (V_{15}), number of violations of Lipinski’s rule of five (Ro5, V_{16}), surface tension (V_{17}) and density (V_{18}). In addition, the variables IC_{50} (µM) Nalm-6 B-lineage ALL (V_{19}), IC_{50} (µM) Molt-3 T-lineage ALL (V_{20}) and stability k [hr^{-1}] in 0.01 M phosphate buffer, pH 7.5 (V_{21}), were taken from the experimental data published by Uckun and coworkers [2,3,4,5,6,7]. Notice that only one compound, **15** (hydrochloride), has a logD value different from logP; the rest have no ionizable form, hence logP ~ logD for most of the compounds.

_{1} and PC_{2} is 95.9%.

_{1}, is distributed into five subclasses (Figure 4), in agreement with the clustering by entropy information and experimental data: Class 1 (compounds **11** and **12**, PC_{1} > 0, PC_{2} < 0, bottom), which includes the compounds with the greatest antileukemic activity, characterized by the presence of 11 or 12 carbons in R_{2}; Class 2A (compounds **6**–**10**, PC_{1} < 0 in general, PC_{2} < 0, middle), characterized by the presence of 6–10 carbons in R_{2}; Class 2B (compounds **3**–**5**, PC_{1} < 0, PC_{2} > 0, left), characterized by the presence of 3–5 carbons in R_{2}; Class 3A (compounds **16** and **17**, PC_{1} > 0, PC_{2} > 0, right), characterized by the presence of 14 and 15 carbons in R_{2}; and Class 3B (compound **18**, PC_{1} > 0, PC_{2} >> 0, top), characterized by the presence of 16 carbons in R_{2}. This scheme can be generalized to adopt a larger Class 3 merging Classes 3A and 3B.

_{1} (87.6% of the total variance) shows positive loading mainly with acd_logp, rtb, full_mwt, alogp, num_lipinski_ro5_violations, ACD/KOC and ACD/BCF, as well as negative loading with surface_tension, Qed_weighted, density and stability k. On the other hand, PC_{2} (8.3% of the total variance) shows positive loading, mainly with ACD/BCF, ACD/KOC, num_lipinski_ro5_violations and k. The rest of the variables are near the origin and are less important for PC_{2}.

**11** and **12**) are characterized by positive loading with the number of violations of Lipinski’s Ro5, ACD/KOC and ACD/BCF, as well as negative loading with surface tension, density and stability k. The rest of the variables are near the origin and are less important for antileukemic activity.

_{50} Molt-3 T-lineage ALL (higher value means lower antileukemic activity), pIC_{50} Nalm-6 B-lineage ALL (higher value means higher antileukemic activity) and stability k (higher value means higher antileukemic activity). The fits were checked with the correlation coefficient r, the standard deviation s and Fisher’s ratio F. The equations of the models between the homologous series of compounds and the properties follow:

_{50} values because the p = −log function smooths the data and provides a better correlation:

_{50} variables show positive loading, among others, with ACD logD, surface tension and number of violations of Lipinski’s Ro5.
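The p = −log transformation mentioned above can be illustrated with a minimal Python sketch. It assumes the usual convention that IC_{50} is converted from µM to mol/L before taking the decimal logarithm; the text does not state the concentration unit used for its pIC_{50} values, so this unit choice and the function name are assumptions.

```python
import math


def pic50_from_micromolar(ic50_um: float) -> float:
    """Return pIC50 = -log10(IC50 in mol/L) for an IC50 given in µM.

    Assumption: the µM value is converted to mol/L first.
    """
    if ic50_um <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_um * 1e-6)


# IC50 (µM) against Nalm-6 for compounds 11 and 12, taken from Table 1.
nalm6_ic50_um = {"11": 1.7, "12": 2.0}
nalm6_pic50 = {cid: pic50_from_micromolar(v) for cid, v in nalm6_ic50_um.items()}
```

Because the logarithm compresses large values, a more potent compound (lower IC_{50}) always receives a higher pIC_{50}, whatever unit convention is chosen.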

_{a} depends on the polarity and the intermolecular forces. For maximum activity, the sulfonamides should present a proper pK_{a} for penetrating in vivo membranes and best binding abilities to their target enzyme. The abilities depend on their protonated/unprotonated form dissociation constants, expressed as pK_{a}. Thakur’s results could explain the interest of surface tension appearing in Equations (1) and (2) because it reduces the bioactivity of our molecules.

**16** and the structurally influential chemical (h > h*) is compound **18**. In Equation (2), there is neither a response outlier nor a structurally influential chemical. In Equation (3), there is no response outlier and the structurally influential chemical is compound **18**.
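The leverage criterion h > h* used above can be sketched as follows. The cut-off formula is not given in this section; the sketch assumes the common convention h* = 3(p + 1)/n, with p descriptors and n compounds, and a single-descriptor linear model. The descriptor used here (number of carbons in R_{2}, from Table 1) is only illustrative and is not one of the descriptors of Equations (1)–(3); the function names are hypothetical.

```python
def leverages(x):
    """Hat values h_i for a simple linear regression y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def leverage_cutoff(n_samples: int, n_descriptors: int) -> float:
    """Warning leverage h* = 3(p + 1)/n (an assumed, commonly used convention)."""
    return 3.0 * (n_descriptors + 1) / n_samples


# Number of carbons in R2 for compounds 3-12 and 16-18 (Table 1),
# used only as an illustrative single descriptor.
carbons = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16]

h = leverages(carbons)
h_star = leverage_cutoff(len(carbons), n_descriptors=1)

# Compounds whose leverage exceeds the cut-off would be flagged as
# structurally influential on the X-axis of a Williams plot.
influential = [c for c, hi in zip(carbons, h) if hi > h_star]
```

A useful sanity check is that the hat values of a model with an intercept and p descriptors sum to p + 1, so here `sum(h)` is 2 up to rounding.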

_{cv} calculated for Cys diazomethyl- and chloromethyl-ketone derivatives (q = r_{cv}(m = 1), cf. Table 4) show that r_{cv} decays with m except for IC_{50} Molt-3 T-lineage and pIC_{50} Nalm-6 B-lineage (Equations (1) and (2)), which indicates possible outliers. In Equation (2), cross-validation can be calculated only for m ≤ 2 because Ro5 values are not very discriminating (cf. Table S1). In particular, the Molt-3 T-lineage activity inhibition model IC_{50} vs. {ACDlogD, surface_tension} (Equation (1)) gives the greatest r_{cv}. Therefore, Equation (1) is more predictive than Equations (2) and (3).

_{50} Nalm-6 B-lineage ALL and IC_{50} Molt-3 T-lineage ALL, as well as stability k vs. the number of carbons. Both IC_{50} Nalm-6 and IC_{50} Molt-3 are similar, with the minimum at 11–12 carbon atoms (Figure 7a). All properties are fitted to second-degree polynomial curves. The most active compounds (**11** and **12**), which present minimum values in the fitted models in the graphics, match Class 1 in Table 2 of periodic properties, obtained by the procedure based on information entropy theory (artificial intelligence). These compounds are in the last (right side) group and last (bottom) period.

## 3. Materials and Methods

#### 3.1. MolClas Program for Molecular Classification Based on the Equipartition Conjecture of Entropy Production

<i_{1},i_{2},…i_{k},…> should be associated with each feature i_{k}, whose components correspond to a number of characteristic functional groups in the molecule, in a hierarchical order, according to the expected importance of their antileukemic activity. The components i_{k} are either “1” or “0,” according to the experimental conclusions of antileukemic power for structural variations in the cysteine derivative compounds.

i_{1} = 1 denotes a chloromethyl group at R_{3}; i_{2} = 1 signifies either an acetyl or tert-butyloxycarbonyl (Boc)-substituent at R_{1}; i_{3} = 1 indicates the only presence of an acetyl group at R_{1}; i_{4} = 1 means a chain that has between 3 and 12 carbons in line either with or without ramifications, either with or without double bonds at R_{2}; i_{5} = 1 represents that, at R_{2}, the structure presents a chain with either 11 or 12 carbons in line, either with or without ramifications and either with or without double bonds; and i_{6} = 1 shows the absence of ramifications and double bonds in the R_{2} chain (Table 1).

_{ij} (0 ≤ r_{ij} ≤ 1) the similarity index of two cysteine derivatives, associated with the $\overline{i}$ and $\overline{j}$ vectors, respectively. A similarity matrix R = [r_{ij}] characterizes the relation of similitude. The similarity index between two cysteine derivatives $\overline{i}$ = <i_{1},i_{2},…i_{k},…> and $\overline{j}$ = <j_{1},j_{2},…j_{k},…> is defined as:

_{k} ≤ 1 and t_{k} = 1 if i_{k} = j_{k}, but t_{k} = 0 if i_{k} ≠ j_{k}. This definition assigns a weight (a_{k})^{k} to each property involved in the description of molecule i or j. The hierarchical order of the six structural features is expressed by their corresponding weights. For instance, for all a_{k} = 0.5, these weights are 0.5, 0.25, 0.125, 0.0625, 0.03125 and 0.015625, which have been used in this work.
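The weighting scheme can be illustrated with a small Python sketch. It assumes that the similarity index is the weighted sum r_{ij} = Σ_{k} t_{k} (a_{k})^{k}, which is what the definitions of t_{k} and of the weights (a_{k})^{k} suggest; the function and variable names are illustrative and are not taken from MolClas. The vectors are those listed in Table 1.

```python
def similarity_index(vec_i: str, vec_j: str, a: float = 0.5) -> float:
    """Sum of a**k over the positions k (1-based) where both vectors agree.

    Assumed form: r_ij = sum_k t_k * a**k, with t_k = 1 if i_k == j_k else 0.
    """
    if len(vec_i) != len(vec_j):
        raise ValueError("vectors must have the same length")
    return sum(
        a ** k
        for k, (bit_i, bit_j) in enumerate(zip(vec_i, vec_j), start=1)
        if bit_i == bit_j
    )


# Vectors <i1,...,i6> taken from Table 1.
vectors = {"11": "111111", "12": "111111", "24": "111110", "13": "011111"}

r_11_12 = similarity_index(vectors["11"], vectors["12"])  # identical vectors
r_12_24 = similarity_index(vectors["12"], vectors["24"])  # differ in i6 only
r_12_13 = similarity_index(vectors["12"], vectors["13"])  # differ in i1 only
```

Under this assumption a mismatch in i_{1} costs 0.5, whereas a mismatch in i_{6} costs only 0.015625, which is how the hierarchical order of the six features enters the similarity.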

**S** = [s_{ij}] obtained for an arbitrary number of fictitious properties. Next, consider the same set of species as in the good classification and the actual properties.

_{ij} is then computed from the **R** correlation matrix. The number of properties for **R** and **S** may differ. The learning procedure consists of trying to find classification results for **R** as close as possible to the good classification. The distance between the partitions in classes characterized by **R** and **S** is given by:

#### 3.2. GraphCor Program for Partial Correlation Diagram

#### 3.3. Statistical Analysis

_{cv} (q = r_{cv}(m = 1), etc.) were calculated with the leave-m-out (LMO) procedure [24]. The process furnishes a new method for selecting the best set of descriptors: LMO selects the best set of descriptors according to the criterion of maximization of the value of r_{cv}. The cross-validation was used to determine the predictability of the models, which were compared and validated taking into account r_{cv} (q).
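A leave-m-out calculation can be sketched in plain Python as follows. This is not the fast algorithm of [24]: it enumerates every subset of m left-out compounds, refits a one-descriptor linear model and collects the predictions. It also assumes that r_{cv} is the Pearson correlation between observed and cross-validated predicted values; other definitions exist and the one used in the original procedure may differ. The descriptor (number of carbons in R_{2}) is only illustrative; the k values are those of Table 1.

```python
from itertools import combinations
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def r_cv_leave_m_out(x, y, m=1):
    """Correlate observed values with predictions made while m points are left out."""
    observed, predicted = [], []
    for left_out in combinations(range(len(x)), m):
        train = [i for i in range(len(x)) if i not in left_out]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in left_out:
            observed.append(y[i])
            predicted.append(b0 + b1 * x[i])
    return pearson_r(observed, predicted)


# Compounds 3-12 and 16-18 (Table 1): carbons in R2 and stability k [hr^-1].
carbons = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16]
k = [0.0658, 0.0523, 0.0498, 0.0336, 0.0319, 0.0388, 0.0373,
     0.0352, 0.0345, 0.0242, 0.0417, 0.0374, 0.0363]

q = r_cv_leave_m_out(carbons, k, m=1)  # q = r_cv(m = 1)
```

Exhaustive enumeration grows combinatorially with m, which is the practical motivation for a fast procedure such as the one cited in [24].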

## 4. Conclusions

- Based on a set of six vector properties, the partial correlation diagram was calculated for a set of 28 S-alkylcysteine diazomethyl- and chloromethyl-ketone derivatives. Derivatives with the greatest antileukemic activity in the same class correspond to high partial correlations.
- A table of periodic classification is made based on information entropy. The first four characteristics denote the group, and the last two indicate the period. Nine classes are clearly distinguished. The most active compounds (**11**, **12** and **24**), all with 11 or 12 carbons in line in R_{2}, are situated at the right side, bottom and, especially, bottom right of this periodic table.
- The principal component analysis scores plot of the homologous series of S-alkyl chloromethyl ketones, for 18 properties, shows five subclasses corresponding to the periodic classification of the congeneric series into nine classes.
- Linear fits of both antileukemic activities and stability are good (correlation coefficients of 0.57 or greater). They are in agreement with the principal component analysis. The variables that appear in the models are those that show positive loading in the principal component analysis.
- The most important properties to explain the antileukemic activities (50% inhibitory concentration Molt-3 T-lineage acute lymphoblastic leukemia, minus the logarithm of 50% inhibitory concentration Nalm-6 B-lineage acute lymphoblastic leukemia and stability k) are ACD logD, surface tension and number of violations of Lipinski’s rule of five.
- After leave-m-out cross-validation, Equation (1) is the most predictive for cysteine diazomethyl- and chloromethyl-ketone derivatives (cross-validated correlation coefficient of 0.764).
- The results of the antileukemic activities for the cysteine diazomethyl- and chloromethyl-ketone derivatives show that the surface tension has an unfavorable influence and this could be related to the results obtained by Thakur.
- The representations of 50% inhibitory concentration Nalm-6 B-lineage and 50% inhibitory concentration Molt-3 T-lineage acute lymphoblastic leukemias, as well as stability k vs. the number of carbons, are fitted to second-degree polynomial curves. The most active compounds (**11** and **12**) present minimum values and coincide with Class 1 obtained by information entropy theory.


## References

1. Jemal, A.; Bray, F.; Center, M.M.; Ferlay, J.; Ward, E.; Forman, D. Global cancer statistics. CA Cancer J. Clin. **2011**, 61, 69–90.
2. Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R.L.; Torre, L.A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. **2018**, 68, 394–424.
3. Uckun, F.M.; Narla, R.M.; Perry, D.A. Parker Hughes Institute. Alkyl Ketones as Potent Anti-Cancer Agents. Patent US6251882B1, 26 June 2001.
4. Uckun, F.M.; Narla, R.M.; Perry, D.A. Parker Hughes Institute. Alkyl Ketones as Potent Anti-Cancer Agents. Patent CA2336108A1, 6 January 2001.
5. Perrey, D.A.; Narla, R.K.; Uckun, F.M. Cysteine chloromethyl and diazomethyl ketone derivatives with potent anti-leukemic activity. Bioorg. Med. Chem. Lett. **2000**, 10, 547–549.
6. Perrey, D.A.; Scannell, M.P.; Narla, R.K.; Uckun, F.M. The S-alkyl chain length as a determinant of the anti-leukemic activity of cysteine chloromethyl ketone compounds. Bioorg. Med. Chem. Lett. **2000**, 10, 551–552.
7. Kotchevar, A.T.; Perrey, D.A.; Uckun, F.M. A degradation study of a series of chloromethyl and diazomethyl ketone anti-leukemic agents. Drug Develop. Ind. Pharm. **2002**, 28, 143–149.
8. Holland, H.L.; Brown, F.M.; Johnson, D.V.; Kerridge, A.; Mayne, B.; Turner, C.D.; van Vliet, A.J. Biocatalytic oxidation of S-alkylcysteine derivatives by chloroperoxidase and Beauveria species. J. Mol. Catal. B Enzym. **2002**, 17, 249–256.
9. Calce, E.; De Luca, S. The cysteine S-alkylation reaction as a synthetic method to covalently modify peptide sequences. Chem. Eur. J. **2017**, 23, 224–233.
10. Castellano, G.; Redondo, L.; Torrens, F. QSAR of natural sesquiterpene lactones as inhibitors of Myb-dependent gene expression. Curr. Top. Med. Chem. **2017**, 17, 3256–3268.
11. Torrens, F.; Castellano, G. Structure–activity relationships of cytotoxic lactones as inhibitors and mechanisms of action. Curr. Drug Discov. Technol. **2020**, 17, 166–182.
12. Castellano, G.; Tena, J.; Torrens, F. Structural indicators and its relation to antioxidant properties of Posidonia oceanica (L.) Delile. MATCH Commun. Math. Comput. Chem. **2012**, 67, 231–250.
13. Castellano, G.; González-Santander, J.L.; Lara, A.; Torrens, F. Classification of flavonoid compounds by using entropy of information theory. Phytochemistry **2013**, 93, 182–191.
14. Castellano, G.; Lara, A.; Torrens, F. Classification of stilbenoid compounds by entropy of artificial intelligence. Phytochemistry **2014**, 97, 62–69.
15. Castellano, G.; Torrens, F. Quantitative structure–antioxidant activity models of isoflavonoids: A theoretical study. Int. J. Mol. Sci. **2015**, 16, 12891–12906.
16. Castellano, G.; Torrens, F. Information entropy-based classification of triterpenoids and steroids from Ganoderma. Phytochemistry **2015**, 116, 305–313.
17. Shaw, P.J.A. Multivariate Statistics for the Environmental Sciences; Hodder-Arnold: New York, NY, USA, 2003.
18. Thakur, A. QSAR study on benzenesulfonamide ionization constant: Physicochemical approach using surface tension. Arch. Org. Chem. **2005**, 14, 49–58.
19. White, H. Neural network learning and statistics. AI Expert **1989**, 4, 48–52.
20. Kullback, S. Information Theory and Statistics; Wiley: New York, NY, USA, 1959.
21. Iordache, O. Modeling Multi-Level Systems; Springer: Berlin/Heidelberg, Germany, 2011.
22. Iordache, O. Self-Evolvable Systems: Machine Learning in Social Media; Springer: Berlin/Heidelberg, Germany, 2012.
23. IMSL. Integrated Mathematical Statistical Library (IMSL); IMSL: Houston, TX, USA, 1989.
24. Besalú, E. Fast computation of cross-validated properties in full linear leave-many-out procedures. J. Math. Chem. **2001**, 29, 191–203.

**Figure 4.** Scores plot for the homologous series of chloromethyl ketones with an acetyl group at R_{1}.

**Figure 5.** Loading plot of variables for homologous series of chloromethyl ketones with acetyl at R_{1}.

**Figure 6.** The Williams plot for the graphical visualization of outliers for the response (on the Y-axis: standardized residuals >3σ) or for the structure (on the X-axis: highest Hat value >h* cut-off line) in the regression models: (**a**) Equation (1); (**b**) Equation (2); (**c**) Equation (3). Numbers 1–13 correspond to compounds **3**–**12**, **16**–**18**.

**Figure 7.** Experimental data: (**a**) IC_{50} antileukemic activity and (**b**) stability k vs. the number of carbons.

**Table 1.** Vector of properties of Cys diazo- and chloromethyl-ketone derivatives and experimental data of antileukemic activity (IC_{50}) and stability k.

Compound | R_{1} | R_{2} | R_{3} | <i_{1},i_{2},i_{3},i_{4},i_{5},i_{6}> ^{a} | IC_{50} (µM) Nalm-6 B-lineage ALL | IC_{50} (µM) Molt-3 T-lineage ALL | k [hr^{−1}] (0.01 M phosphate buffer, pH = 8.0, ionic strength = 0.3 M) |
---|---|---|---|---|---|---|---|
1 | CH_{3}CO | CH_{3} | CH_{2}Cl | 111001 | 30.3 | 80.8 | – |
2 | CH_{3}CO | CH_{2}CH_{3} | CH_{2}Cl | 111001 | 52.8 | 99.9 | – |
3 | CH_{3}CO | (CH_{2})_{2}CH_{3} | CH_{2}Cl | 111101 | 6.9 | 8.0 | 0.0658 |
4 | CH_{3}CO | (CH_{2})_{3}CH_{3} | CH_{2}Cl | 111101 | 41.4 | 5.6 | 0.0523 |
5 | CH_{3}CO | (CH_{2})_{4}CH_{3} | CH_{2}Cl | 111101 | 5.8 | 5.4 | 0.0498 |
6 | CH_{3}CO | (CH_{2})_{5}CH_{3} | CH_{2}Cl | 111101 | 3.3 | 0.7 | 0.0336 |
7 | CH_{3}CO | (CH_{2})_{6}CH_{3} | CH_{2}Cl | 111101 | 4.8 | 2.5 | 0.0319 |
8 | CH_{3}CO | (CH_{2})_{7}CH_{3} | CH_{2}Cl | 111101 | 5.6 | 4.1 | 0.0388 |
9 | CH_{3}CO | (CH_{2})_{8}CH_{3} | CH_{2}Cl | 111101 | 7.3 | 6.7 | 0.0373 |
10 | CH_{3}CO | (CH_{2})_{9}CH_{3} | CH_{2}Cl | 111101 | 4.7 | 3.4 | 0.0352 |
11 | CH_{3}CO | (CH_{2})_{10}CH_{3} | CH_{2}Cl | 111111 | 1.7 | 3.0 | 0.0345 |
12 | CH_{3}CO | (CH_{2})_{11}CH_{3} | CH_{2}Cl | 111111 | 2.0 | 2.3 | 0.0242 |
13 | CH_{3}CO | (CH_{2})_{11}CH_{3} | CH=N_{2} | 011111 | 15.4 | 22.9 | – |
14 | Boc ^{b} | (CH_{2})_{11}CH_{3} | CH_{2}Cl | 110111 | 15.1 | 15.5 | – |
15 | H ^{c} | (CH_{2})_{11}CH_{3} | CH_{2}Cl | 100111 | 17.7 | 12.5 | – |
16 | CH_{3}CO | (CH_{2})_{13}CH_{3} | CH_{2}Cl | 111001 | 8.7 | 8.8 | 0.0417 |
17 | CH_{3}CO | (CH_{2})_{14}CH_{3} | CH_{2}Cl | 111001 | 8.9 | 8.6 | 0.0374 |
18 | CH_{3}CO | (CH_{2})_{15}CH_{3} | CH_{2}Cl | 111001 | 16.0 | 17.3 | 0.0363 |
19 | Boc-Gly | trans,trans-Farnesyl | CH=N_{2} | 000110 | 51.3 | 84.5 | – |
20 | Boc-Gly | trans,trans-Farnesyl | CH_{2}Cl | 100110 | 12.9 | 17.5 | – |
21 | Boc | trans,trans-Farnesyl | CH=N_{2} | 010110 | 49.8 | 50.1 | – |
22 | Boc | trans,trans-Farnesyl | CH_{2}Cl | 110110 | 10.7 | 7.7 | – |
23 | CH_{3}CO | trans,trans-Farnesyl | CH=N_{2} | 011110 | 30.3 | 32.2 | – |
24 | CH_{3}CO | trans,trans-Farnesyl | CH_{2}Cl | 111110 | 3.0 | 1.4 | – |
25 | CH_{3}CO | trans-Geranyl | CH=N_{2} | 011000 | >100 | >100 | – |
26 | Boc | trans-Geranyl | CH=N_{2} | 010000 | >100 | >100 | – |
27 | CH_{3}CO | 3-Methyl-2-butenyl | CH=N_{2} | 011000 | >100 | >100 | – |
28 | CH_{3}CO | 3-Methyl-2-butenyl | CH_{2}Cl | 111000 | 12.6 | 7.9 | – |

^{a} i_{1} = 1, a chloromethyl group at R_{3}; i_{2} = 1, either an acetyl or Boc-substituent at R_{1}; i_{3} = 1, the only presence of an acetyl group at R_{1}; i_{4} = 1, a chain with between 3 and 12 carbons in line either with or without ramifications, either with or without double bonds at R_{2}; i_{5} = 1, at R_{2}, a chain with either 11 or 12 carbons in line, either with or without ramifications, either with or without double bonds; i_{6} = 1, absence of ramifications and double bonds in the R_{2} chain.

^{b} Boc: tert-butyloxycarbonyl.

^{c} The molecule is a hydrochloride (acid salt resulting from its reaction with hydrochloric acid).

**Table 2.**Classification of cysteine diazomethyl- and chloromethyl-ketone derivatives by information entropy method.

P ^{a} | 0001 ^{b} | 0100/0101/0110 | 0111 | 1001 | 1101 | 1110 | 1111 |
---|---|---|---|---|---|---|---|
0X ^{c} | | Class 9 | | | | Class 3 | Class 2 |
 | | **25** (R_{1}: CH_{3}CO; R_{2}: trans-Geranyl), **26** (R_{1}: Boc; R_{2}: trans-Geranyl), **27** (R_{1}: CH_{3}CO; R_{2}: 3-Methyl-2-butenyl) | | | | **1** (R_{2}: -CH_{3}), **2** (R_{2}: -CH_{2}CH_{3}), **16** (R_{2}: -(CH_{2})_{13}CH_{3}), **17** (R_{2}: -(CH_{2})_{14}CH_{3}), **18** (R_{2}: -(CH_{2})_{15}CH_{3}), **28** (R_{2}: 3-Methyl-2-butenyl) | **3** (R_{2}: -(CH_{2})_{2}CH_{3}), **4** (R_{2}: -(CH_{2})_{3}CH_{3}), **5** (R_{2}: -(CH_{2})_{4}CH_{3}), **6** (R_{2}: -(CH_{2})_{5}CH_{3}), **7** (R_{2}: -(CH_{2})_{6}CH_{3}), **8** (R_{2}: -(CH_{2})_{7}CH_{3}), **9** (R_{2}: -(CH_{2})_{8}CH_{3}), **10** (R_{2}: -(CH_{2})_{9}CH_{3}) |
1X | Class 8 | Class 7 | Class 6 | Class 5 | Class 4 | | Class 1 |
 | **19** (R_{1}: Boc-Gly; R_{2}: trans,trans-Farnesyl) | **21** (R_{2}: trans,trans-Farnesyl) | **13** (R_{2}: -(CH_{2})_{11}CH_{3}), **23** (R_{2}: trans,trans-Farnesyl) | **15** (R_{1}: H.HCl; R_{2}: -(CH_{2})_{11}CH_{3}), **20** (R_{1}: Boc-Gly; R_{2}: trans,trans-Farnesyl) | **14** (R_{2}: -(CH_{2})_{11}CH_{3}), **22** (R_{2}: trans,trans-Farnesyl) | | **11** (R_{2}: -(CH_{2})_{10}CH_{3}), **12** (R_{2}: -(CH_{2})_{11}CH_{3}), **24** (R_{2}: trans,trans-Farnesyl) |

^{a} P: period <i_{5},i_{6}>.

^{b} 0001: group <i_{1},i_{2},i_{3},i_{4}>.

^{c} X = either 0 or 1.

b | h | No. of Classes |
---|---|---|
1.0000 | 320.8858 | 28 |
0.9799 | 93.3938 | 14 |
0.9599 | 38.3400 | 9 |
0.9499 | 38.3178 | 9 |
0.9299 | 30.4859 | 8 |
0.9199 | 30.5388 | 8 |
0.8899 | 30.5166 | 8 |
0.8699 | 17.4259 | 6 |
0.8399 | 11.8925 | 5 |
0.7599 | 11.5383 | 5 |
0.7499 | 7.5860 | 4 |
0.5899 | 4.1698 | 3 |

**Table 4.**Cross-validated correlation coefficient in leave-m-out for Cys diazomethyl- and chloromethylketones.

m | IC_{50} Molt-3 T-Lineage ALL Equation (1) | pIC_{50} Nalm-6 B-Lineage ALL Equation (2) | k Equation (3) |
---|---|---|---|
1 | 0.764 | 0.424 | 0.286 |
2 | 0.767 | 0.428 | 0.285 |
3 | 0.770 | – | 0.283 |
4 | 0.772 | – | 0.281 |
5 | 0.775 | – | 0.280 |
6 | 0.776 | – | 0.280 |
7 | 0.775 | – | 0.283 |
8 | 0.769 | – | 0.290 |
9 | 0.738 | – | 0.306 |
10 | – | – | 0.340 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Castellano, G.; León, A.; Torrens, F.
Classification of Congeneric and QSAR of Homologous Antileukemic *S*–Alkylcysteine Ketones. *Molecules* **2021**, *26*, 235.
https://doi.org/10.3390/molecules26010235
