Classification of Congeneric and QSAR of Homologous Antileukemic S–Alkylcysteine Ketones

Based on a set of six vector properties, the partial correlation diagram is calculated for a set of 28 S-alkylcysteine diazomethyl- and chloromethyl-ketone derivatives. Those with the greatest antileukemic activity in the same class correspond to high partial correlations. A periodic classification is performed based on information entropy. The first four characteristics denote the group, and the last two indicate the period. Compounds in the same period and, especially, group present similar properties. The most active substances are situated at the bottom right. Nine classes are distinguished. The principal component analysis of the homologous compounds shows five subclasses included in the periodic classification. Linear fits of both antileukemic activities and stability are good. They are in agreement with the principal component analysis. The variables that appear in the models are those that show positive loading in the principal component analysis. The most important properties to explain the antileukemic activities (50% inhibitory concentration Molt-3 T-lineage acute lymphoblastic leukemia, minus the logarithm of 50% inhibitory concentration Nalm-6 B-lineage acute lymphoblastic leukemia and stability k) are ACD logD, surface tension and number of violations of Lipinski’s rule of five. After leave-m-out cross-validation, the most predictive model for cysteine diazomethyl- and chloromethyl-ketone derivatives is provided.


Introduction
Nowadays, cancer is one of the most widespread diseases. It appears in different tissues and cells. Regarding its causes, there is a wide variety of carcinogens, both endogenous and exogenous. Breast, lung and colon cancers are the most common in developed countries. The global burden of cancer continues to increase, largely because of the aging and growth of the world population alongside habits or behaviors that continuously expose us to carcinogens. Governments invest in preventive and informative public health campaigns. The most popular is against smoking, but there are many others, such as the well-known campaigns for the prevention of breast and colon cancers [1]. Owing to the rise in cancer, the search for anticancer drugs remains a target of study for many researchers. Most S-alkylcysteine diazomethyl- and chloromethyl-ketone derivatives have been shown to have anticancer action against acute lymphoblastic leukemia (ALL). They have been tested successfully [2][3][4][5][6][7]. The structures of these compounds are close to that of the amino acid cysteine (Cys).
The treatment of N-methoxycarbonyl C-carboxylate ester derivatives of S-methyl-L-cysteine by chloroperoxidase/hydrogen peroxide resulted in the oxidation of sulfur to produce the (R_S) sulfoxide in moderate to high diastereomeric excess [8]. The (S_S) natural product sulfoxide chondrine was obtained via biotransformation of the N-tert-butyloxycarbonyl (Boc) [...]
Figure 1 shows the basic structure of cysteine diazomethyl- and chloromethyl-ketone derivatives.

Results and Discussion
[...] modification is performed via its thiol side chain, which is characterized by a [...] nucleophilicity, higher than that of a primary amine such as that of the amino acid lysine, which [...] pH values below 9.0. Therefore, a cysteine can react faster than lysine, [...] selective modification of a key amino acid over other residues. A possible [...] is the S-alkylation reaction; in this regard, post-translational modifications [...] this amino acid are essential for the biological function of many proteins [...] numerous signaling proteins are post-translationally lipidated on a cysteine [...]. Since this lipidation is essential for the correct localization and function of [...] the enzymes responsible for the covalent introduction/removal of lipids [...] been considered interesting targets for blocking aberrant signaling processes [...].
In earlier publications, our research group showed a quantitative structure-activity relationship (QSAR) of sesquiterpene lactones (STLs) with potential antileukemic activity, with the aim of predicting inhibitors of Myb-induced gene expression [...] mechanisms of action [10,11]. Moreover, molecular classifications of some [...] polyphenolic compounds [12][13][14][15], triterpenoids and steroids [16] by information [...] reported and related to their antioxidant activity. In the present report, 28 [...] diazomethyl- and chloromethyl-ketone derivatives were classified using the [...] entropy-based algorithm. The scientific rationale behind the classification [...] dodecyl derivative (12) is an exceptionally active compound against leukemia [...] length of the alkyl chain has a profound effect on the antileukemic potency [...] homologous series and the congeneric series may be useful for treating patients [...] therapy-refractory or relapsed leukemia. Thus, we want to validate if different [...] congeneric series correspond to the same potency. The objective of this [...] predict the antileukemic activity of these compounds based on their molecular [...] moreover, a study of QSAR and a principal component analysis (PCA) [...] antileukemic activity of a homologous series of S-alkylcysteine chloromethyl-ketone derivatives to the physical and chemical properties of these compounds.
Table 1 lists the vector of properties of cysteine diazomethyl- and chloromethyl-ketone derivatives and experimental data of antileukemic activity (IC50) and stability k.
a i1 = 1, a chloromethyl group at R3; i2 = 1, either an acetyl or Boc-substituent at R1; i3 = 1, the only presence of an acetyl group at R1; i4 = 1, a chain with between 3 and 12 carbons in line either with or without ramifications, either with or without double bonds at R2; i5 = 1, at R2, a chain with either 11 or 12 carbons in line, either with or without ramifications, either with or without double bonds; i6 = 1, absence of ramifications and double bonds in the R2 chain.
b Boc: tert-butyloxycarbonyl.
c The molecule is a hydrochloride (acid salt resulting from its reaction with hydrochloric acid).

GraphCor Partial Correlation Diagram
The matrix of Pearson correlation coefficients has been calculated between each pair of vector properties <i1,i2,i3,i4,i5,i6> for the 28 cysteine diazomethyl- and chloromethyl-ketone derivatives. The Pearson intercorrelations are computed for the partial correlation diagram, which contains high partial correlations (r ≥ 0.75), medium partial correlations (0.50 ≤ r < 0.75), low partial correlations (0.25 ≤ r < 0.50) and zero partial correlations (r < 0.25). Pairs of compounds with high partial correlation show a similar vector property. With the Equipartition Conjecture of Entropy Production, the partial correlations matrix (cf. Figure 2) contains 187 high, 44 medium, 116 low and 31 zero partial correlations. Many partial correlations are high. Red lines, representing high partial correlations, link cysteine derivatives with the greatest antileukemic activity because the most active compounds (11 and 12) are taken as reference molecules with vector properties <111111>. The antileukemic activities are expressed as IC50.
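The banding of pairwise correlations described above can be sketched as follows. This is only a minimal illustration, not the GraphCor code: the function names are hypothetical, and taking the absolute value of r is an assumption borrowed from the thresholds given for the GraphCor diagram. The last lines merely check that the four reported counts add up to the number of distinct pairs among 28 compounds.

```python
from math import comb


def correlation_band(r):
    """Return the band of a correlation r (absolute value assumed)."""
    r = abs(r)
    if r >= 0.75:
        return "high"
    if r >= 0.50:
        return "medium"
    if r >= 0.25:
        return "low"
    return "zero"


def count_bands(values):
    """Count how many correlations fall into each of the four bands."""
    counts = {"high": 0, "medium": 0, "low": 0, "zero": 0}
    for r in values:
        counts[correlation_band(r)] += 1
    return counts


# Counts reported in the text for the 28 compounds.
reported = {"high": 187, "medium": 44, "low": 116, "zero": 31}
n_pairs = comb(28, 2)  # number of distinct compound pairs
```

The reported counts sum to 378, which equals the number of pairs of 28 compounds, so every pair is assigned to exactly one band.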

MolClas Molecular Classification Based on the Equipartition Conjecture of Entropy Production
The grouping rule is the following: two molecules are assigned to the same class if r ≥ b, where b is the classification level. A comparative analysis of the molecular dataset, from 28 classes (each compound in its own class) to one class (containing all compounds), by the method of information entropy theory, matching <i1,i2,i3,i4,i5,i6> and classification at level b (Cb), is calculated for antileukemic activity [17] and summarized in Table 2.
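The grouping rule can be illustrated with a short sketch. It assumes that class membership is transitive (if A groups with B and B with C, all three share a class), which corresponds to a single-linkage reading of the rule; the function name and the toy similarity matrix are hypothetical and are not taken from MolClas or from Table 1.

```python
def classify(sim, b):
    """Group items so that any pair with sim[i][j] >= b ends up in one class."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the class representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= b:
                parent[find(i)] = find(j)  # merge the two classes

    classes = {}
    for i in range(n):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())


# Hypothetical 3x3 similarity matrix, for illustration only.
toy = [
    [1.00, 0.97, 0.50],
    [0.97, 1.00, 0.50],
    [0.50, 0.50, 1.00],
]
```

Raising b splits the set into more classes, up to one class per compound; lowering it merges them, down to a single class.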

The grouping rule in the case with equal weights ak = 0.5, for the classification level 0.94 ≤ b ≤ 0.96, allows nine classes (grouped from Class 1 to Class 9, cf. Table 3). The classes above are obtained with the associated entropy h(Rb) = 38.32, which is the classification closest to the cut-off point of the entropy vs. classification level with its trend line (cf. Figure 3).
Table 2 shows a classification of periodic properties by using a procedure based on the information entropy theory (artificial intelligence). The first four features were taken to denote the group or column, and the last two features were used to indicate the period or row in the table of periodic classification. Cysteine derivatives in the same group present similar properties. Furthermore, compounds also in the same period show maximum resemblance. In this report, the cysteine diazomethyl- and chloromethyl-ketone derivatives, in the table, are related to the experimental data of antileukemic bioactivity properties, taken from the technical literature [2][3][4][5][6][7].
The antileukemic activity increases on going right through a period and augments when descending in a group. The chloromethyl-ketone derivatives with the greatest activity (Class 1, compounds 11, 12 and 24) are grouped into the same class, corresponding to acetyl amides with a linear chain containing either 11 or 12 carbons in R2. Moreover, chloromethyl-ketone derivatives with great activity (Classes 2-5) are clustered into other groupings. Finally, the groups with the least antileukemic activity are cysteine diazomethyl derivatives and are located at the left side of the table (Classes 6-9). The results are in agreement with Figure 2 because pairs of compounds in the same class with similar vector properties <i1,i2,i3,i4,i5,i6> show red lines, representing high partial correlations, e.g., the pair (11,12) and both compounds with vector properties <111111> in Class 1.

Principal Component Analysis for Classification of the Most Antileukemic Bioactive Compounds
After obtaining the classification of the cysteine chloromethyl- and diazomethyl-ketone derivatives, a PCA PC1-PC2 scores plot was made (cf. Figure 4) with the properties for the highly active compounds, forming a homologous series of chloromethyl-ketone derivatives with an acetyl group at R1 (compounds 3-12 and 16-18). Compounds 1 and 2 are inactive, and neither is included because the value of stability k is not published for them. The following 18 properties were taken from the ChEMBL database and were used for statistical assessment: full molecular weight (Full_mw, V1), ACD logP (V2), number of rotatable bonds (rtb, [...] (Ro5, V16), surface tension (V17) and density (V18). In addition, the variables of both IC50 (µM) Nalm-6 B-lineage ALL (V19) and IC50 (µM) Molt-3 T-lineage ALL (V20), and stability k [hr⁻¹] in 0.01 M phosphate buffer, pH 7.5 (V21), were taken from the bibliographic experimental data of Uckun and coworkers [2][3][4][5][6][7]. Notice that there is only one entry for the logD value, compound 15 (hydrochloride), different from logP; for the rest of them, there is no ionizable form, hence logP ≈ logD for most of the compounds.
PCA was applied to reduce the initial variables to a small number of principal components (PCs) in order to obtain an overview of the variation of the compounds and identify behavioral patterns. Figure 4 shows the two-dimensional representation of the homologous series for all the variables taken into account for the first two PCs. The variance explained by PC1 and PC2 is 95.9%.
The homologous series of cysteine chloromethyl-ketone derivatives, with an acetyl group in R1, is distributed into five subclasses (Figure 4), in agreement with the clustering by entropy information and experimental data: Class 1 (compounds 11 and 12, PC1 > 0, PC2 < 0, bottom), which includes the compounds with the greatest antileukemic activity, characterized by the presence of 11 or 12 carbons in R2; Class 2A (compounds 6-10, PC1 < 0 in general, PC2 < 0, middle), characterized by the presence of 6-10 carbons in R2; Class 2B (compounds 3-5, PC1 < 0, PC2 > 0, left), characterized by the presence of 3-5 carbons in R2; Class 3A (compounds 16 and 17, PC1 > 0, PC2 > 0, right), characterized by the presence of 14 and 15 carbons in R2; and Class 3B (compound 18, PC1 > 0, PC2 >> 0, top), characterized by the presence of 16 carbons in R2. This scheme can be generalized to adopt a larger Class 3 merging Classes 3A and 3B.
Figure 5 describes the behavior of the variables. The properties most remote from the origin (0.0, 0.0) are the most important for describing PCs, and those closest to the origin are the least important. On the one hand, PC1 (87.6% of the total variance) shows positive loading mainly with acd_logp, rtb, full_mwt, alogp, num_lipinski_ro5_violations, ACD/KOC and ACD/BCF, as well as negative loading with surface_tension, QedWeighted, density and stability k. On the other hand, PC2 (8.3% of the total variance) shows positive loading mainly with ACD/BCF, ACD/KOC, num_lipinski_ro5_violations and k. The rest of the variables are near the origin and are less important for PC2.
Both compounds with important antileukemic activity and stability (11 and 12) are characterized by positive loading with the number of violations of Lipinski's Ro5, ACD/KOC and ACD/BCF, as well as negative loading with surface tension, density and stability k. The rest of the variables are near the origin and are less important for antileukemic activity.
A multiple linear regression model approach was adopted to determine the quantitative importance of the combined presence of some of the 18 properties, taken from the ChEMBL database (cf. Supplementary Material Table S1), to explain such antileukemic activities: IC50 Molt-3 T-lineage ALL (higher value means lower antileukemic activity), pIC50 Nalm-6 B-lineage ALL (higher value means higher antileukemic activity) and stability k (higher value means higher antileukemic activity). The fits were checked with the correlation coefficient r, the standard deviation s and Fisher's ratio F. The equations of the models between the homologous series of compounds and the properties follow: [...]
Quantitative structure-activity/property relationship (QSAR/QSPR) researchers try to establish equations that correlate the physicochemical parameters of the molecules with their activities/properties; e.g., molar refractivity, refractive index and electronic parameters have been used extensively. The first study that correlated the surface tension with dissociation constants was Thakur's work [18]. He showed that the surface tension could be successfully used to model the dissociation constant of sulfonamide drugs. The dissociation constant pKa depends on the polarity and the intermolecular forces. For maximum activity, the sulfonamides should present a proper pKa for penetrating in vivo membranes and the best binding abilities to their target enzyme. The abilities depend on the dissociation constants of their protonated/unprotonated forms, expressed as pKa. Thakur's results could explain the interest of surface tension appearing in Equations (1) and (2) because it reduces the bioactivity of our molecules.
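The three statistics used to check the fits (r, s and F) can be computed from observed and fitted values as in the sketch below. It assumes the usual least-squares definitions with an intercept term and N - p - 1 degrees of freedom for p predictors; the function name and the numbers in the usage note are illustrative and are not data from Table S1.

```python
from math import sqrt


def fit_statistics(observed, fitted, n_predictors):
    """Return N, correlation coefficient r, standard deviation s and Fisher's F."""
    n = len(observed)
    mean = sum(observed) / n
    ss_tot = sum((y - mean) ** 2 for y in observed)  # total sum of squares
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # residual
    dof = n - n_predictors - 1  # residual degrees of freedom
    r2 = 1.0 - ss_res / ss_tot
    return {
        "N": n,
        "r": sqrt(r2),
        "s": sqrt(ss_res / dof),
        "F": (r2 / n_predictors) / ((1.0 - r2) / dof),
    }
```

For example, `fit_statistics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], 1)` gives r² = 0.99 and hence F = 297 with three residual degrees of freedom.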
The applicability domain of the proposed models (1)-(3) is analyzed by Williams plot (cf. Figure 6), which is the chart of cross-validated standardized residuals vs. leverage (Hat diagonal) values (h). In Equation (1), the response outlier (cross-validated standardized residual >3σ) is compound 16 and the structurally influential chemical (h > h*) is compound 18. In Equation (2), there is neither response outlier nor structurally influential chemical. In Equation (3), there is no response outlier and the structurally influential chemical is compound 18.
Leave-m-out (1 ≤ m ≤ 10) cross-validated correlation coefficients rcv calculated for Cys diazomethyl- and chloromethyl-ketone derivatives (q = rcv (m = 1), cf. Table 4) show that rcv decays with m except for IC50 Molt-3 T-lineage and pIC50 Nalm-6 B-lineage (Equations (1) and (2)), which indicates possible outliers. In Equation (2), cross-validation can be calculated for only m ≤ 2 because Ro5 values are not very discriminating (cf. Table S1). In particular, the Molt-3 T-lineage activity inhibition model IC50 vs. {ACDlogD, surface_tension} (Equation (1)) gives the greatest rcv. Therefore, Equation (1) is more predictive than Equations (2) and (3).
The linear regressions suggest that the number of carbons is an important individual factor.
Figure 7a,b shows the representations of both IC50 Nalm-6 B-lineage ALL and IC50 Molt-3 T-lineage ALL, as well as stability k vs. the number of carbons. Both IC50 Nalm-6 and IC50 Molt-3 are similar, with the minimum in 11-12 carbon atoms (Figure 7a). All properties are fitted to second-degree polynomial curves. The most active compounds (11 and 12), which present minimum values in the fitted models in the graphics, match Class 1 in Table 2 of periodic properties, obtained by the procedure based on information entropy theory (artificial intelligence). These compounds are in the last (right side) group and last (bottom) period.
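How a second-degree polynomial fit locates such a minimum can be sketched as below. This is an illustrative least-squares fit through the normal equations, not the procedure used to produce Figure 7; the function names are hypothetical and the data in the test are synthetic, chosen only so that the minimum falls between 11 and 12 carbons.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients [c0, c1, c2] of y = c0 + c1*x + c2*x**2."""
    # Normal equations: sum(x**(r+c)) * coef = sum(y * x**r), r, c in 0..2.
    power_sums = [sum(x ** p for x in xs) for p in range(5)]
    rhs = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    a = [[power_sums[r + c] for c in range(3)] + [rhs[r]] for r in range(3)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]

    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, 3))
        coef[r] = (a[r][3] - tail) / a[r][r]
    return coef


def parabola_vertex(coef):
    """Abscissa of the extremum of the fitted parabola (a minimum if c2 > 0)."""
    _, c1, c2 = coef
    return -c1 / (2.0 * c2)
```

With IC50 on the vertical axis, a positive quadratic coefficient means the vertex is the chain length of greatest fitted activity.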

MolClas Program for Molecular Classification Based on the Equipartition Conjecture of Entropy Production
The computational method is the same as the one that we successfully applied to the classification of polyphenolic compounds. The first step in quantifying the concept of similarity, for molecules of cysteine diazomethyl- and chloromethyl-ketone derivatives, is to list the most important moieties with respect to the antileukemic activity of such compounds. Furthermore, a vector of properties i = <i1,i2,...ik,...> should be associated with each compound; its components ik correspond to characteristic functional groups in the molecule, in a hierarchical order, according to their expected importance for the antileukemic activity. The components ik are either "1" or "0," according to the experimental conclusions of antileukemic power for structural variations in the cysteine derivative compounds.
In this way, index i1 = 1 denotes a chloromethyl group at R3; i2 = 1 signifies either an acetyl or tert-butyloxycarbonyl (Boc)-substituent at R1; i3 = 1 indicates the only presence of an acetyl group at R1; i4 = 1 means a chain that has between 3 and 12 carbons in line, either with or without ramifications, either with or without double bonds at R2; i5 = 1 represents that, at R2, the structure presents a chain with either 11 or 12 carbons in line, either with or without ramifications and either with or without double bonds; and i6 = 1 shows the absence of ramifications and double bonds in the R2 chain (Table 1).
Let us denote by rij (0 ≤ rij ≤ 1) the similarity index of two cysteine derivatives, associated with the i and j vectors, respectively. A similarity matrix R = [rij] characterizes the relation of similitude. The similarity index between two cysteine derivatives i = <i1,i2,...ik,...> and j = <j1,j2,...jk,...> is defined as:

$$r_{ij} = \sum_{k} t_k \,(a_k)^k \qquad (k = 1, 2, \ldots)$$

where 0 ≤ ak ≤ 1 and tk = 1 if ik = jk, but tk = 0 if ik ≠ jk. This definition assigns a weight (ak)^k to each property involved in the description of molecule i or j. The hierarchical order of the six structural features is expressed by their corresponding weights. For instance, for all ak = 0.5, these weights are 0.5, 0.25, 0.125, 0.0625, 0.03125 and 0.015625, which have been used in this work.
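The similarity index can be illustrated with a few lines of code. The sketch assumes that the index is the weighted sum of matching components, with weight ak raised to the power k for the k-th property (k starting at 1), which is the form implied by the weights quoted above; the function name is hypothetical and the code is not part of MolClas.

```python
def similarity(i_vec, j_vec, a=0.5):
    """Similarity index: sum of a**k over the positions k where the vectors agree."""
    if len(i_vec) != len(j_vec):
        raise ValueError("property vectors must have the same length")
    return sum(
        a ** k
        for k, (x, y) in enumerate(zip(i_vec, j_vec), start=1)
        if x == y
    )


# Weights of the six properties for a = 0.5, as listed in the text.
weights = [0.5 ** k for k in range(1, 7)]

reference = (1, 1, 1, 1, 1, 1)  # vector <111111> of the most active compounds
```

With six properties and ak = 0.5, two identical vectors reach the maximum value 0.984375; a mismatch in the first property alone lowers the index to 0.484375, whereas a mismatch in the sixth alone lowers it only to 0.96875, which is how the hierarchy of the features enters the classification.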
Learning procedures similar to those encountered in stochastic methods are implemented as follows [19]. Consider a given partition into classes as good or ideal from practical or empirical observations. This corresponds to a reference similarity matrix S = [sij] obtained for an arbitrary number of fictitious properties. Next, consider the same set of species as in the good classification and the actual properties.
The similarity degree rij is then computed from the R correlation matrix. The number of properties for R and S may differ. The learning procedure consists of trying to find classification results for R as close as possible to the good classification. The distance between the partitions in classes characterized by R and S is given by: [...] This definition was suggested by that introduced in information theory by Kullback to measure the distance between two probability distributions [20]. Such a procedure has been applied to the synthesis of complex dendrograms using information entropy [21,22].
We have written the MolClas program for molecular classification based on the Equipartition Conjecture of Entropy Production. It writes out the similarity and difference matrices, the latter also in NEXUS format (.NEX) for the programs PAUP, MacClade and SplitsTree. MolClas performs single- and complete-linkage hierarchical cluster analyses (CAs) of the compounds by using the IMSL subroutine CLINK [23].

GraphCor Program for Partial Correlation Diagram
The partial correlation diagram presents high partial correlations (|r| ≥ 0.75) in red, medium partial correlations (0.50 ≤ |r| < 0.75) in orange, low partial correlations (0.25 ≤ |r| < 0.50) in yellow and zero partial correlations (|r| < 0.25) in black. The codes MolClas and GraphCor are available from the authors by e-mail (torrens@uv.es) and are free for academics.

Statistical Analysis
Principal component analysis (PCA) and linear and multiple linear regression models were performed using SPSS (version 21.0, IBM Corp., USA), Minitab (version 17.1.0), Knowledge Miner and Microsoft Excel for Office (2020) for Windows 10 OS. The calculated statistics are the number of data points N, the correlation coefficient r, the standard deviation s and Fisher's ratio F. The cross-validated correlation coefficients rcv (q = rcv (m = 1), etc.) were calculated with the leave-m-out (LMO) procedure [24]. The process furnishes a new method for selecting the best set of descriptors: LMO selects the best set of descriptors according to the criterion of maximization of the value of rcv. The cross-validation was used to determine the predictability of the models, which were compared and validated taking into account rcv (q).
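A leave-m-out cross-validated correlation coefficient can be sketched for a one-descriptor linear model as follows. The exact definition of rcv used in [24] is not given here, so the sketch makes an assumption: every subset of m points is held out in turn, the line is refitted on the remaining points, and rcv is the Pearson correlation between all held-out observations and their predictions. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def pearson(us, vs):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    return suv / sqrt(suu * svv)


def leave_m_out_r(xs, ys, m):
    """Correlation between held-out observations and their predictions."""
    observed, predicted = [], []
    indices = range(len(xs))
    for held_out in combinations(indices, m):
        train = [i for i in indices if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in held_out:
            observed.append(ys[i])
            predicted.append(slope * xs[i] + intercept)
    return pearson(observed, predicted)
```

Calling `leave_m_out_r(xs, ys, 1)` corresponds to q = rcv (m = 1); repeating the call for increasing m shows how quickly the predictive correlation decays.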

Conclusions
From the discussion of the present results, the following conclusions can be drawn.

1.
Based on a set of six vector properties, the partial correlation diagram was calculated for a set of 28 S-alkylcysteine diazomethyl- and chloromethyl-ketone derivatives. Derivatives with the greatest antileukemic activity in the same class correspond to high partial correlations.

2.
A table of periodic classification is made based on information entropy. The first four characteristics denote the group, and the last two indicate the period. Nine classes are clearly distinguished. The most active compounds (11, 12 and 24), all with 11 or 12 carbons in line in R2, are situated at the right side, bottom and, especially, bottom right of this periodic table.

3.
The principal component analysis scores plot of the homologous series of S-alkyl chloromethyl ketones, for 18 properties, shows five subclasses corresponding to the periodic classification of the congeneric series into nine classes.

4.
Linear fits of both antileukemic activities and stability are good (correlation coefficients of 0.57 or greater). They are in agreement with the principal component analysis. The variables that appear in the models are those that show positive loading in the principal component analysis.

5.
The most important properties to explain the antileukemic activities (50% inhibitory concentration Molt-3 T-lineage acute lymphoblastic leukemia, minus the logarithm of 50% inhibitory concentration Nalm-6 B-lineage acute lymphoblastic leukemia and stability k) are ACD logD, surface tension and number of violations of Lipinski's rule of five.
The results of the antileukemic activities for the cysteine diazomethyl- and chloromethyl-ketone derivatives show that the surface tension has an unfavorable influence, which could be related to the results obtained by Thakur.

8.
The representations of 50% inhibitory concentration Nalm-6 B-lineage and 50% inhibitory concentration Molt-3 T-lineage acute lymphoblastic leukemias, as well as stability k vs. the number of carbons, are fitted to second-degree polynomial curves. The most active compounds (11 and 12) present minimum values and coincide with Class 1 obtained by information entropy theory.