Article

Application of Multivariate Adaptive Regression Splines (MARSplines) for Predicting Antitumor Activity of Anthrapyrazole Derivatives

by Marcin Gackowski 1,*, Karolina Szewczyk-Golec 2, Robert Pluskota 1, Marcin Koba 1, Katarzyna Mądra-Gackowska 3 and Alina Woźniak 2

1 Department of Toxicology and Bromatology, Faculty of Pharmacy, L. Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, A. Jurasza 2 Street, PL-85089 Bydgoszcz, Poland
2 Department of Medical Biology and Biochemistry, Faculty of Medicine, L. Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, Karłowicza 24 Street, PL-85092 Bydgoszcz, Poland
3 Department of Geriatrics, Faculty of Health Sciences, L. Rydygier Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Torun, Skłodowskiej Curie 9 Street, PL-85094 Bydgoszcz, Poland
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2022, 23(9), 5132; https://doi.org/10.3390/ijms23095132
Submission received: 2 March 2022 / Revised: 26 April 2022 / Accepted: 29 April 2022 / Published: 4 May 2022
(This article belongs to the Special Issue Advances in Molecular Activity of Potential Drugs)

Abstract

An approach using multivariate adaptive regression splines (MARSplines) was applied in quantitative structure–activity relationship studies of the antitumor activity of anthrapyrazoles. In the first stage, the structures of anthrapyrazole derivatives were geometrically optimized by the AM1 method using the Polak–Ribiere algorithm. In the next step, a data set of 73 compounds was encoded with over 2500 calculated molecular descriptors. The fourteen independent variables appearing in the statistically significant MARS model (i.e., descriptors belonging to the 3D-MoRSE, 2D autocorrelation, GETAWAY, Burden eigenvalue, and RDF blocks) were shown to significantly affect the antitumor activity of anthrapyrazole compounds. The study confirmed the benefit of using a modern machine learning algorithm, since the high predictive power of the obtained model proved useful for predicting antitumor activity against murine leukemia L1210. The approach could also be considered as a tool for predicting activity against other cancer cell lines.

1. Introduction

Anthrapyrazoles are synthetic anticancer drugs developed to retain the broad spectrum of antitumor activity of anthracyclines (e.g., doxorubicin) while diminishing cardiotoxicity by reducing the potential to generate semiquinone free radicals in cardiac cells [1,2]. Although they showed a broad range of antitumor activity in model tumors [1,3], their activity in doxorubicin-resistant cells varied [4]. The mechanism of action of these planar compounds is based on DNA intercalation, topoisomerase II inhibition, inhibition of DNA synthesis, and DNA strand breaks [2]. Structurally, anthrapyrazoles are similar to mitoxantrone, but their structure was modified to reduce the abovementioned side effect. Attempts to reduce the toxicity of anthracyclines have led to the development of various anthrapyrazole derivatives, including teloxantrone (CI-937, DUP-937; molecule a-60 in this work), piroxantrone (CI-942, DUP-942; molecule a-58 in this work), and losoxantrone (CI-941, DUP-941), with reduced side effects and increased efficacy in patients with breast cancer. All three anthrapyrazoles entered clinical trials, and in phase II trials they showed significant response rates in women with metastatic breast cancer [3]. Losoxantrone showed impressive cytotoxic activity against a wide range of tumor cell lines (virtually the same spectrum of antitumor activity as mitoxantrone) and was predicted to be able to replace anthracyclines owing to a more favorable therapeutic index [1]. Moreover, a response rate of 63% in women with metastatic breast cancer was observed in the study by Talbot et al. [5].
Multivariate adaptive regression splines (MARSplines) were introduced by Friedman as a method for flexible regression modeling of high-dimensional data [6]. This modern machine learning algorithm has been successfully applied in quantitative structure–activity relationship (QSAR) and quantitative structure–retention relationship (QSRR) modeling, including studies of drug activity prediction. The MARSplines procedure was used to develop predictive QSAR models of compounds with diverse pharmacological activities, such as antitrypanosomal 4-thiazolidinones [7], antimalarial artemisinin compounds [8], pyridine N-oxide derivatives active against human severe acute respiratory syndrome [9], and anticancer acridone derivatives [10]. The advantages of the MARS technique were shown, among others, for artemisinin compounds: QSAR models determined by the MARS procedure were the most satisfactory predictive models in comparison with other methods such as multiple linear regression [8]. For these reasons, the MARSplines algorithm was chosen as a promising tool for predicting the antitumor activity of anthrapyrazoles in the present study.
A large set of anthrapyrazole compounds (about 119 derivatives, 73 of which were studied in the present work) was tested against L1210 murine leukemia in vitro and P388 leukemia in vivo by Hollis Showalter et al. [11]. In subsequent studies, some of these compounds were tested in eight different mouse tumor systems [1]. Moreover, another study found that 12 different anthrapyrazole derivatives inhibited the growth of K562 and K/VP.5 cells [12]. In light of the constant need to develop new anticancer drugs and the high potential of the large group of anthrapyrazole derivatives studied in the present work, structure–activity studies using modern machine learning algorithms may achieve better predictivity and thus help identify candidates for further research. The goal of the present work is to create a model predicting the antitumor activity of 73 anthrapyrazole derivatives and to evaluate the usefulness of the MARSplines procedure for QSAR studies.

2. Results

More than 2500 molecular descriptors were obtained using HyperChem and Dragon software and were used as independent variables to create a model predicting the antitumor activity of 73 anthrapyrazole derivatives.

2.1. Geometry Optimization

Molecular modeling was performed for 73 derivatives, which were first geometrically optimized. Examples of optimized three-dimensional molecular structures are shown in Figure 1.

2.2. Statistical Analysis

Building the optimal model describing the structure–activity relationship allowed the selection of the relevant variables (Mor05s, Mor19m, MATS8e, H1e, ATSC7v, ATSC1e, SpMax8_Bh(s), Mor21e, Mor13s, R5p, ATSC1s, ATSC8s, RDF135e, and HATS5s), presented in Table 1.

2.2.1. Model Construction and Prediction of pIC50 Values

The MARS model, with a large set of descriptors as candidate predictors, was developed on a training set to describe the antitumor activity expressed as the negative logarithm of the half maximal inhibitory concentration (pIC50). The degree of interaction was set to 3, which allowed linear terms as well as second- and third-order interaction terms to be incorporated into the model, and the maximum number of basis functions was set to 40. The optimal MARS model was then selected on the basis of three validation parameters (R2, Q2, and MAE). All fourteen descriptors incorporated into the model are characterized in Table 2.
The MARS model captures several interactions between molecular properties. The fourteen molecular descriptors used as predictor variables appear in 38 basis functions, which form 23 splines (high-order basis functions, Bm). The model starts with the constant function B1; in subsequent steps, the functions that give the best fit to the current residuals are added, according to Equation (1):
$$\mathrm{pIC}_{50} = \sum_{m=1}^{23} a_m B_m \qquad (1)$$
The optimal model contains eight single (linear) basis functions (B2, B3, B8, B9, B10, B21, B22, B23), twelve splines that are second-order interactions of two molecular properties (B4, B5, B6, B7, B11, B12, B13, B14, B17, B18, B19, B20), and two splines that are third-order interactions of three molecular properties (B15, B16). All basis functions (B1, B2, …, B23) and their coefficients a_m are shown in Table 3.
As an example of a linear basis function, B9 can be considered:
$$B_9 = (0.42100 - \mathrm{Mor19m})_+ = \begin{cases} 0.42100 - \mathrm{Mor19m}, & \text{if } \mathrm{Mor19m} < 0.42100 \\ 0, & \text{otherwise} \end{cases}$$
This means that the ninth term of Equation (1) is −5.67335 (0.42100 − Mor19m) when Mor19m is lower than 0.42100, and zero otherwise. As an example of a second-order interaction between molecular properties, consider B14:
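$$B_{14} = (\mathrm{MATS8e} - 0.07400)_+ (\mathrm{RDF135e} - 7.04700)_+ = \begin{cases} (\mathrm{MATS8e} - 0.07400)(\mathrm{RDF135e} - 7.04700), & \text{if } \mathrm{MATS8e} > 0.07400 \text{ and } \mathrm{RDF135e} > 7.04700 \\ 0, & \text{otherwise} \end{cases}$$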
This means that the fourteenth term of Equation (1) is −8.31766 (MATS8e − 0.07400)(RDF135e − 7.04700) when MATS8e is higher than 0.07400 and RDF135e is higher than 7.04700, and zero otherwise.
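The two worked terms can be checked numerically. The sketch below is a minimal illustration, not the authors' Statistica implementation: it evaluates the ninth and fourteenth terms of Equation (1) with the knots and coefficients quoted above (Table 3 holds the full list). The helper names are hypothetical, and the descriptor values in the demonstration are arbitrary rather than taken from Table S1.

```python
def hinge(z: float) -> float:
    """Positive part (z)_+ = max(z, 0), the building block of MARS basis functions."""
    return max(z, 0.0)


# Knots and coefficients as quoted in the text for B9 and B14.
A9, KNOT_MOR19M = -5.67335, 0.42100
A14, KNOT_MATS8E, KNOT_RDF135E = -8.31766, 0.07400, 7.04700


def term_b9(mor19m: float) -> float:
    """Ninth term of Equation (1): a9 * (0.42100 - Mor19m)_+ (a linear basis function)."""
    basis = hinge(KNOT_MOR19M - mor19m)
    return A9 * basis if basis > 0.0 else 0.0


def term_b14(mats8e: float, rdf135e: float) -> float:
    """Fourteenth term: a14 * (MATS8e - 0.07400)_+ * (RDF135e - 7.04700)_+ (second-order)."""
    basis = hinge(mats8e - KNOT_MATS8E) * hinge(rdf135e - KNOT_RDF135E)
    return A14 * basis if basis > 0.0 else 0.0


if __name__ == "__main__":
    # Arbitrary illustrative descriptor values (not from the data set).
    print(term_b9(0.30))        # Mor19m below the knot: term is active
    print(term_b9(0.50))        # Mor19m above the knot: term contributes nothing
    print(term_b14(0.10, 8.0))  # both factors positive: interaction is active
    print(term_b14(0.05, 8.0))  # MATS8e below its knot: product vanishes
```

Each hinge switches its term on only on one side of the knot, and a product of hinges is nonzero only where all of its factors are, which is exactly the piecewise behavior described for B9 and B14.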
As mentioned above, 14 of the more than 2500 descriptors were incorporated into the MARS model. Their relevance to the model, expressed as the number of basis functions in which each appears, together with their definitions, blocks, and dimensionality, is presented in Table 2. Descriptors describing the 3D geometrical properties of the molecule (3D-MoRSE, GETAWAY, and RDF descriptors) predominate in the present model. The remaining descriptors are two-dimensional: a Burden eigenvalue and 2D autocorrelations (MATS and ATSC descriptors), which describe how a property is distributed along the topological structure. Among all descriptors in the model, Mor05s and Mor19m, which belong to the 3D-MoRSE class, contribute the most, appearing in nine and six basis functions, respectively. The next is MATS8e, a 2D autocorrelation that appears four times, followed by H1e, a GETAWAY descriptor that appears three times. Descriptors of minor importance for the model (i.e., occurring twice) include ATSC7v, ATSC1e, SpMax8_Bh(s), and R5p. The contributions of ATSC1s, ATSC8s, RDF135e, and HATS5s are much less significant.

2.2.2. Validation of Models and Selection of the Optimal One for Prediction

Using the nonparametric multivariate adaptive regression splines procedure, 11 QSAR models were created with different degrees of interaction and different maximum numbers of basis functions. The coefficients included in the models were determined on the basis of the training set (see Table 1). Based on the validation parameters calculated for all models, the optimal model describing the structure–activity relationship was selected (degree of interaction 3, 38 basis functions): it showed the highest determination coefficient (R2, for which a value of 1 indicates a perfect correlation), a cross-validated R2 (Q2, used for internal validation) above the threshold of 0.5, and the lowest mean absolute error (MAE). The values of these parameters are presented in Table 4.
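The selection rule described above can be written as a small function. The ordering used here (discard models with Q2 ≤ 0.5, then prefer the highest R2 and break ties by the lowest MAE) is one reading of the text, not a documented Statistica feature, and the three candidate models in the demonstration are hypothetical; the actual values are in Table 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateModel:
    label: str
    r2: float   # determination coefficient (training set)
    q2: float   # cross-validated R2 (internal validation)
    mae: float  # mean absolute error


def select_optimal(models, q2_threshold=0.5):
    """Drop models with Q2 <= threshold, then pick the highest R2, breaking ties by lowest MAE."""
    eligible = [m for m in models if m.q2 > q2_threshold]
    if not eligible:
        raise ValueError("no candidate model passes the Q2 threshold")
    return max(eligible, key=lambda m: (m.r2, -m.mae))


if __name__ == "__main__":
    # Hypothetical candidates, not the values reported in Table 4.
    candidates = [
        CandidateModel("model A", r2=0.81, q2=0.62, mae=0.21),
        CandidateModel("model B", r2=0.93, q2=0.48, mae=0.12),
        CandidateModel("model C", r2=0.97, q2=0.55, mae=0.08),
    ]
    print(select_optimal(candidates).label)
```

In this toy setup model B is excluded despite its higher R2 than model A, because its Q2 fails the internal-validation threshold.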
Moreover, the extended validation procedure typical for QSAR models, as described by Roy et al. [13], was applied to the optimal MARS model (see Table 5). Considering these characteristics, the established MARS model shows reasonably high predictive power.

2.3. Values of Predicted Data

The pIC50 values calculated with the constructed model ($\mathrm{pIC}_{50}^{\mathrm{calc}}$) were compared with the experimental data ($\mathrm{pIC}_{50}^{\mathrm{exp}}$) in Table S1 and in a scatter plot, which shows a strong positive relationship (see Figure 2). Moreover, residual analysis showed that the residuals follow a normal distribution (see Figure 3). The developed MARS model was also used to predict the antitumor activity against murine leukemia L1210 of seven other anthrapyrazole derivatives. This external set was adopted from the literature [1]. It should be noted that the activity of these compounds against murine leukemia L1210 has not been reported so far. For more details, see Table S2.

3. Discussion

On the basis of the abovementioned validation parameters, namely R2, Q2, and MAE [13], the optimal model for prediction was selected from the eleven MARS models developed in this study, which differ in the independent variables included, the degree of interaction, and the maximum number of basis functions. The interpretation of the results begins with the number and nature of the molecular descriptors present in the model. The fourteen selected descriptors appear in 38 basis functions, which form 23 splines. The predictive descriptors can be divided into the following groups: 3D-MoRSE descriptors, 2D autocorrelations, GETAWAY descriptors, Burden eigenvalues, and RDF descriptors. Descriptors derived from the three-dimensional structure of the anthrapyrazole compounds occur most frequently and therefore have the largest share in the model (over 68%). This class of geometrical descriptors, calculated from the molecular geometry optimized by computational chemistry methods in the current study, comprises 3D-MoRSE, GETAWAY, and RDF descriptors. The remaining 32% of the descriptor occurrences are calculated from the 2D structure of a molecule (molecular topology).
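The percentages quoted in this section follow from counting descriptor occurrences over the 38 basis functions. In the sketch below, the block totals (19, 10, 6, 2, and 1) are back-calculated from the reported 50%, 26.30%, and 15.80% shares, the two occurrences of SpMax8_Bh(s), and the single remaining occurrence assigned to RDF; they are an inference, not values read from Table 2.

```python
# Block totals inferred from the reported percentages of 38 descriptor occurrences
# (an assumption; per-descriptor counts for several descriptors are not stated).
BLOCK_COUNTS = {
    "3D-MoRSE": 19,
    "2D autocorrelations": 10,
    "GETAWAY": 6,
    "Burden eigenvalues": 2,
    "RDF": 1,
}
THREE_D_BLOCKS = {"3D-MoRSE", "GETAWAY", "RDF"}


def block_shares(counts):
    """Percentage of all descriptor occurrences contributed by each block."""
    total = sum(counts.values())
    return {block: 100.0 * n / total for block, n in counts.items()}


def geometry_share(counts, three_d_blocks):
    """Percentage of occurrences contributed by descriptors computed from 3D geometry."""
    total = sum(counts.values())
    return 100.0 * sum(counts[b] for b in three_d_blocks) / total


if __name__ == "__main__":
    for block, pct in block_shares(BLOCK_COUNTS).items():
        print(f"{block:22s} {pct:5.1f}%")
    print(f"3D-derived share: {geometry_share(BLOCK_COUNTS, THREE_D_BLOCKS):.1f}%")
```

With these counts the 3D-derived blocks account for 26 of 38 occurrences (about 68.4%), matching the "over 68%" figure, and the two 2D blocks account for the remaining 12 (about 31.6%).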
The 3D-MoRSE (3D Molecule Representation of Structures based on Electron diffraction) descriptors contributed the most to the model (50%) and comprise the most prominent descriptor block in the present study. The 3D-MoRSE approach was introduced in 1996 by J.H. Schuur, P. Selzer, and J. Gasteiger in order to encode the 3D structure of a molecule by a fixed number of variables. Each descriptor of this block combines information about the whole molecular structure, although its final value is derived mostly from short-distance atom pairs [14]. The 3D-MoRSE descriptors encode the 3D structure of a molecule weighted by atomic properties such as atomic mass, van der Waals volume, electronegativity, and polarizability. In this study, 3D-MoRSE descriptors weighted by I-state, by mass, and by Sanderson electronegativity are distinguished. The 3D-MoRSE descriptors cannot describe complex atomic groups, regions of high or low electron density, or some quantum-chemical properties, but they yield good model performance when variation in activity coincides with variation in interatomic distances caused by changes in bond order and the introduction of new atoms [14].
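The 3D-MoRSE signal is commonly written as I(s) = Σ_{i<j} w_i w_j sin(s·r_ij)/(s·r_ij), where r_ij are interatomic distances, w_i are atomic weights, and s is the scattering parameter. The sketch below evaluates this sum for given coordinates. It assumes that Dragon's MorNN descriptors sample s at 0, 1, 2, … Å⁻¹ (so that Mor05 corresponds to s = 4 Å⁻¹) and it does not reproduce Dragon's scaling of the atomic weights, so its values are illustrative only; `morse_profile` is a hypothetical helper.

```python
import math
from itertools import combinations


def morse_signal(coords, weights, s):
    """3D-MoRSE value I(s) = sum over atom pairs i<j of w_i*w_j*sin(s*r_ij)/(s*r_ij).

    coords  -- list of (x, y, z) tuples in angstrom
    weights -- one atomic weight per atom (e.g., mass or electronegativity, caller-scaled)
    s       -- scattering parameter in 1/angstrom
    """
    total = 0.0
    for (ci, wi), (cj, wj) in combinations(zip(coords, weights), 2):
        x = s * math.dist(ci, cj)
        # sin(x)/x tends to 1 as x -> 0, which also covers the s = 0 descriptor.
        total += wi * wj * (1.0 if x == 0.0 else math.sin(x) / x)
    return total


def morse_profile(coords, weights, n_values=32):
    """Hypothetical helper: I(s) at s = 0, 1, ..., n_values - 1 (assumed Mor01..Mor32 grid)."""
    return [morse_signal(coords, weights, float(s)) for s in range(n_values)]


if __name__ == "__main__":
    # Toy three-atom geometry with unit weights (not an anthrapyrazole).
    toy = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (0.0, 1.4, 0.0)]
    print(morse_profile(toy, [1.0, 1.0, 1.0], n_values=6))
```

Because every pair term depends on r_ij through sin(s·r_ij), small changes in interatomic distances shift the signal, which is why these descriptors respond to conformational and substitution changes.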
Other important factors for the predicted antitumor activity are the 2D autocorrelation descriptors (MATS8e, ATSC7v, ATSC1e, ATSC1s, ATSC8s). In general, they describe how the considered property is distributed along the topological structure. An autocorrelation descriptor is a topological descriptor encoding both the molecular structure and the physicochemical properties of a molecule [15,16]. The 2D autocorrelations account for 26.30% of the descriptor occurrences in the optimal model.
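The Moran coefficient behind MATS-type descriptors such as MATS8e can be sketched as follows: for a lag k, it averages the products of centered atomic properties over all atom pairs separated by exactly k bonds and divides by the property variance. Topological distances are obtained by breadth-first search on a hydrogen-depleted molecular graph given as an adjacency list. The graph and weights in the example are hypothetical, the "e" suffix is assumed to denote Sanderson electronegativity weighting, and Dragon's exact weight scaling is not reproduced.

```python
from collections import deque
from itertools import combinations


def topological_distances(adjacency):
    """All-pairs shortest-path lengths (in bonds) of an unweighted molecular graph, via BFS."""
    n_atoms = len(adjacency)
    dist = [[None] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def moran_autocorrelation(adjacency, weights, lag):
    """Moran coefficient at topological lag k (the quantity behind MATSk descriptors)."""
    n_atoms = len(weights)
    mean_w = sum(weights) / n_atoms
    dev = [w - mean_w for w in weights]
    variance = sum(d * d for d in dev) / n_atoms
    dist = topological_distances(adjacency)
    pairs = [(i, j) for i, j in combinations(range(n_atoms), 2) if dist[i][j] == lag]
    if not pairs or variance == 0.0:
        return 0.0  # convention used here: no pair at this lag, or constant weights
    covariance = sum(dev[i] * dev[j] for i, j in pairs) / len(pairs)
    return covariance / variance


if __name__ == "__main__":
    # Hypothetical 5-atom chain C-C-N-C-C with electronegativity-like weights.
    chain = [[1], [0, 2], [1, 3], [2, 4], [3]]
    weights = [2.75, 2.75, 3.19, 2.75, 2.75]
    print([round(moran_autocorrelation(chain, weights, k), 3) for k in range(1, 5)])
```

Positive values indicate that atoms k bonds apart tend to have similar property values, and negative values indicate alternation; this is the sense in which the descriptor tracks how a property is distributed along the topological structure.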
The next important variables belong to the GETAWAY (Geometry, Topology, and Atom–Weights Assembly) descriptors (H1e, R5p, and HATS5s), a block that contributes 15.80%. GETAWAY descriptors combine the 3D molecular geometry, provided by the molecular influence matrix, with atom relatedness derived from molecular topology and with chemical information expressed by various atomic weighting schemes [15].
Another important variable, SpMax8_Bh(s), represents the Burden eigenvalues and occurs twice in the MARS model. This block of molecular descriptors is based on the assumption that the lowest eigenvalues contain contributions from all atoms and thus reflect the topology of the molecule [15].
The last descriptor used in the model, which has the lowest frequency, belongs to the RDF (Radial Distribution Function) descriptors. The radial distribution function can be interpreted as the probability distribution of finding an atom in a spherical volume of radius r [15,17].
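The RDF descriptor is commonly computed as g(r) = f · Σ_{i<j} w_i w_j exp(−β (r − r_ij)²), where β is a smoothing parameter and f a scaling factor. The sketch below assumes β = 100 Å⁻² and f = 1, and that a descriptor such as RDF135e samples g(r) at r = 13.5 Å with electronegativity weights; these settings are assumptions about Dragon's defaults, not statements from the text.

```python
import math
from itertools import combinations


def rdf_value(coords, weights, r, beta=100.0, scale=1.0):
    """RDF descriptor g(r) = scale * sum_{i<j} w_i*w_j*exp(-beta*(r - r_ij)**2).

    coords  -- list of (x, y, z) tuples in angstrom
    weights -- one atomic weight per atom
    r       -- sampling radius in angstrom
    beta    -- smoothing parameter in 1/angstrom**2 (assumed default)
    """
    total = 0.0
    for (ci, wi), (cj, wj) in combinations(zip(coords, weights), 2):
        r_ij = math.dist(ci, cj)
        total += wi * wj * math.exp(-beta * (r - r_ij) ** 2)
    return scale * total


if __name__ == "__main__":
    # Toy geometry with atom pairs near 1.5 A and 13.5 A (hypothetical, for illustration).
    toy = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 13.5, 0.0)]
    for radius in (1.5, 13.5, 7.0):
        print(radius, rdf_value(toy, [1.0, 1.0, 1.0], radius))
```

Each atom pair contributes a narrow Gaussian peak centered at its interatomic distance, so g(r) at a given radius effectively counts weighted atom pairs separated by roughly that distance.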
Efforts to establish mathematical equations for predicting the antitumor activity of anthrapyrazoles also prompt a closer examination of the mechanism of action of these compounds. First, owing to their planar structure, anthrapyrazoles can intercalate into DNA. Compounds that intercalate into DNA are well known to stabilize the DNA double helix and increase the temperature at which DNA is denatured. It is worth noting that, in a study of a small set of anthrapyrazoles, some compounds bound to DNA even more strongly than doxorubicin, the drug that, as mentioned above, anthrapyrazoles were intended to replace because of its cardiotoxicity. Anthrapyrazoles not only target DNA but also interfere with one of the enzymes that process DNA. More specifically, anthrapyrazoles inhibit the decatenation activity of human topoisomerase IIα. This enzyme alters DNA topology by catalyzing the passage of an intact DNA double helix through a transient double-stranded break made in a second helix. Topoisomerase IIα activity is critical for relieving the torsional stress that occurs during replication and transcription, and for daughter-strand separation during mitosis. Not only anthrapyrazoles but also many currently used anticancer agents, such as anthracyclines (e.g., doxorubicin), mitoxantrone, and etoposide, act as topoisomerase II inhibitors, and their cytotoxicity results from the stabilization of a covalent topoisomerase II–DNA intermediate (the cleavable complex). Finally, docking studies on several compounds revealed that the inhibitory activity of anthrapyrazoles is due, in part, to their ability to bind to DNA, and that structurally similar anthrapyrazoles can be docked into the doxorubicin-binding pocket on DNA. Moreover, increased binding is associated with increased anthrapyrazole–DNA van der Waals interactions [12].
In the study by Showalter et al. [11], which provided the activity data used in the current study, over one hundred anthrapyrazole derivatives were tested for antitumor activity against murine L1210 leukemia in vitro and against P388 leukemia in vivo. The findings of that study indicate that basic side chains at N-2 and C-5, a spacer of two to three carbons between the proximal and distal nitrogen atoms of the side chain, and A-ring hydroxylation, especially at C-7, contribute to the activity against P388 leukemia [11]. These findings were confirmed by Hartley et al. [18], but the results were not always consistent. On the one hand, the side chains had a greater effect on DNA binding; on the other hand, intercalation was affected more by hydroxylation of the A-ring. DNA binding was increased by hydroxylation at C-7 and decreased by hydroxyl groups at any position on the A-ring [18]. Interestingly, in the study by Begleiter et al. [3], anthrapyrazole derivatives showed a broad range of activity in inhibiting topoisomerase II decatenation; however, no significant correlation with cytotoxic activity was observed. All of the anthrapyrazole analogues examined in that study inhibited the growth of the four cell lines, with IC50 values ranging from 0.1 to 45.2 μM, and losoxantrone was the most potent. Structure–activity studies revealed that, for the majority of the examined derivatives, a tertiary amine in the basic side chain at N-2 increased cytotoxic activity compared with a secondary amine in the same position, but only when there was no basic side chain at C-5. Other structural alterations, such as a chlorine substituent on the basic side chain at N-2, moving a chlorine substituent from C-5 to C-7, or introducing a basic side chain at C-5, did not have a consistent effect on cytotoxic activity.
The authors of that study suggested that the ability of the analogues to bind to DNA by alkylation does not contribute significantly to the antitumor activity of the anthrapyrazoles [3]. A study by Liang et al. [12] supported these results: cell growth inhibition by anthrapyrazoles was not well correlated with the inhibition of topoisomerase IIα catalytic activity, which suggests that the anthrapyrazole derivatives examined in that study did not act solely by inhibiting the catalytic activity of topoisomerase II. Moreover, the authors showed that hydrogen-bond donor interactions and electrostatic interactions with the protonated amino side chains of the anthrapyrazoles led to high cell growth inhibitory activity [12].
The abovementioned studies demonstrated that structural changes in the basic side chain at N-2 and at positions C-5 and C-7 can have a considerable impact on the cytotoxic activity of anthrapyrazoles as well as on topoisomerase II inhibition. Although these results are still inconsistent, they may, to some extent, help to understand the role of the descriptors incorporated into the optimal MARS model. In the present study, descriptors derived from the three-dimensional structure of the anthrapyrazole compounds have the largest share in the model, led by the 3D-MoRSE descriptors, whose values are very sensitive to any conformational change in the molecule, and the GETAWAY descriptors, which encode information about the influence of each atom on the overall shape of the molecule. In this light, the differences in antitumor activity among the studied anthrapyrazoles presumably result from changes in interatomic distances or the introduction of new atoms at N-2, C-5, and C-7. The obtained data indicate that parameters based on molecular geometry and physicochemical properties, on molecular topology, and on the interatomic distance distribution of the compounds are of the greatest importance for the antitumor activity of the anthrapyrazole derivatives.
It should be emphasized that the MARS model developed in the present study to predict the antitumor activity of 73 anthrapyrazole compounds describes more than 96% of the variance in the experimental activity. The good predictive properties of the model were confirmed by an extensive validation procedure characteristic of QSAR models. Several validation parameters were calculated: among others, the cross-validated R2 was checked for internal validation, the mean absolute error was calculated, and predictability, precision, and accuracy were assessed. All tested parameters met the acceptance criteria [13] listed in Table 5. Therefore, the MARS model may be employed to predict the antitumor activity of anthrapyrazole compounds. Its applicability was illustrated with an external set of seven molecules, whose predicted pIC50 values are listed in Table S2.
A literature search shows that QSAR analyses based on multiple linear regression are still more common than those based on multivariate adaptive regression splines. Nevertheless, the MARSplines procedure is a modern machine learning algorithm with numerous advantages, as emphasized in this work. Its usefulness in QSAR studies has been confirmed for predicting the antimalarial activity of dihydroartemisinin derivatives by Nguyen-Cong et al. [8], the antitumor activity of acridone derivatives by Koba and Bączek [10], and the anti-HIV activity of thiazolylthiourea derivatives by Alamdari et al. [19]. The present study strongly supports promoting the MARSplines technique in QSAR analysis; however, given the multitude of possible datasets and descriptors, other MARSplines settings, other modern machine learning algorithms, or other regression procedures may perform better in a particular case. This is illustrated by the study by Kryshchyshyn et al. [7], in which four machine learning algorithms, namely random forest regression, stochastic gradient boosting, multivariate adaptive regression splines, and Gaussian process regression, were compared to achieve better predictivity. In that case, for predicting the antitrypanosomal activity of 4-thiazolidinones, only the models developed with the random forest and Gaussian process regression algorithms had good predictive ability. In light of this, there is no universal regression method, but various studies show that modern machine learning algorithms are worth exploring.
In summary, the obtained model can be used in in silico studies to find new compounds with promising antitumor activity. On the one hand, the presented approach has some limitations, such as restriction to the chemical domain of the training set (especially regarding A-ring hydroxylation at C-7 and C-10) and the need to follow the whole procedure of geometry optimization and descriptor calculation; on the other hand, there is a multitude of possible combinations of substituents at N-2 and C-5 of the anthrapyrazole ring (some of which have been tested so far and some of which were considered promising). Moreover, geometry optimization with the semi-empirical AM1 method and the overall process of descriptor generation are fast, which favors routine application of the presented MARSplines approach in QSAR studies. For these reasons, the MARSplines procedure may become part of the drug design process, particularly in selecting new anthrapyrazole anticancer compounds for synthesis and in vitro testing on various cancer cell lines.

4. Materials and Methods

4.1. Anthrapyrazole Derivatives

The analyses were based on 73 anthrapyrazole derivatives (anthra[1,9-cd]pyrazol-6(2H)-one), differing in both chemical structure and antitumor activity, as shown in Table 6. Data on the antitumor activity of the anthrapyrazoles against the L1210 murine leukemia cell line, tested in vitro and expressed as IC50, were obtained from the literature [11].

4.2. Geometry Optimization and Structural Descriptors

The initial optimization of the geometric structures of the analyzed molecules was performed with HyperChem Release 8.0 (Hypercube Inc., Gainesville, FL, USA) using the built-in MM+ molecular mechanics force field. In the next step, the final optimization was achieved with the semi-empirical AM1 method using the Polak–Ribiere algorithm. The gradient norm limit applied for the calculations was 0.01 kcal/(Å·mol), and the maximum number of cycles was set to 32,000. Finally, HyperChem and Dragon 7 (Talete, Milano, Italy) were used to calculate molecular descriptors for all studied structures from the previously optimized molecules. In total, 2554 descriptors were calculated, mainly with Dragon. In the next stage of the study, the obtained descriptors were subjected to MARSplines analysis. The descriptors calculated by Dragon include 29 logical descriptor blocks: constitutional indices, ring descriptors, topological indices, walk and path counts, connectivity indices, information indices, 2D matrix-based descriptors, 2D autocorrelations, Burden eigenvalues, P_VSA-like descriptors, ETA indices, edge adjacency indices, geometrical descriptors, 3D matrix-based descriptors, 3D autocorrelations, RDF descriptors, 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors, Randić molecular profiles, functional group counts, atom-centered fragments, atom-type E-state indices, CATS 2D, 2D atom pairs, 3D atom pairs, charge descriptors, molecular properties, and drug-like indices [20].
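The Polak–Ribiere variant of the conjugate gradient method, used here for the AM1 optimization, can be sketched on a toy objective. The example below is a generic PR+ conjugate gradient with a backtracking (Armijo) line search that stops when the gradient norm falls below 0.01 or after 32,000 cycles, mirroring the settings reported above. The quadratic test function stands in for the AM1 energy surface, and HyperChem's actual line search and convergence test are not reproduced.

```python
import math


def polak_ribiere_cg(f, grad, x0, grad_tol=0.01, max_cycles=32000):
    """Minimize f by nonlinear conjugate gradients with the Polak-Ribiere (PR+) update.

    Returns (x, cycles); stops when the gradient norm drops below grad_tol.
    """
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]
    for cycle in range(max_cycles):
        g_sq = sum(gi * gi for gi in g)
        if math.sqrt(g_sq) < grad_tol:
            return x, cycle
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:
            # Not a descent direction: restart along steepest descent.
            d = [-gi for gi in g]
            slope = -g_sq
        # Backtracking line search with the Armijo sufficient-decrease condition.
        step, fx = 1.0, f(x)
        while step > 1e-12 and (
            f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope
        ):
            step *= 0.5
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        # Polak-Ribiere coefficient, clipped at zero (PR+) so the method can self-restart.
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / g_sq)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x, max_cycles


if __name__ == "__main__":
    # Toy anisotropic quadratic with its minimum at (1, -2), standing in for an energy surface.
    def energy(p):
        return (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2

    def gradient(p):
        return [2.0 * (p[0] - 1.0), 20.0 * (p[1] + 2.0)]

    x_min, cycles = polak_ribiere_cg(energy, gradient, [0.0, 0.0])
    print(x_min, cycles)
```

The gradient-norm test plays the role of the 0.01 kcal/(Å·mol) limit, and the cycle cap plays the role of the 32,000-cycle limit; in a real AM1 run, f and grad would be the semi-empirical energy and its Cartesian gradient.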

4.3. Statistical Analysis

Statistical analysis was carried out with Statistica 13.3 (StatSoft, Cracow, Poland) using the data obtained from the molecular modeling. The following variables were used: descriptors describing the molecular properties of each molecule, and the negative decimal logarithm of IC50 describing the biological activity against the L1210 murine leukemia cell line tested in vitro, obtained from the literature. The whole group of compounds was randomly divided into training and test sets using STATISTICA 13.3 Data Miner (StatSoft, Cracow, Poland). The raw data, consisting of 2554 descriptors (independent variables) and pIC50 values (negative decimal logarithm of IC50, dependent variable), were subjected to standardization and pre-selection. Pre-selection consisted of removing variables that showed no variability. The analyses were performed at the 5% significance level (α = 0.05). The multivariate adaptive regression splines procedure was used to build eleven different models. Pearson's correlation coefficient was used to analyze the correlation between variables, and J. Guilford's classification was used to interpret the results. The analysis of three validation parameters (R2, Q2, and MAE), which provide minimal but sufficient information about model performance [13,21] and are explained in Section 4.5, allowed the selection of an optimal theoretical model for predicting the pIC50 value of each of the considered derivatives.
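The pre-processing steps described above (removal of descriptors without variability, standardization, and random division into training and test sets) can be sketched in plain Python. This is not the Statistica Data Miner workflow: the 20% test fraction, the fixed random seed, and the use of whole-data statistics for z-scoring are assumptions made for illustration, since the text does not state them.

```python
import random
import statistics


def drop_constant_columns(X, names):
    """Pre-selection: keep only descriptors whose values vary across compounds."""
    keep = [j for j in range(len(names)) if len({row[j] for row in X}) > 1]
    return [[row[j] for j in keep] for row in X], [names[j] for j in keep]


def standardize(X):
    """Z-score each descriptor column: (value - mean) / sample standard deviation."""
    columns = list(zip(*X))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def random_split(n_compounds, test_fraction=0.2, seed=0):
    """Randomly assign compound indices to training and test sets (fraction is assumed)."""
    indices = list(range(n_compounds))
    random.Random(seed).shuffle(indices)
    n_test = round(test_fraction * n_compounds)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


if __name__ == "__main__":
    # Hypothetical 3-compound, 3-descriptor matrix; "D1" is constant and gets removed.
    X = [[1.0, 5.0, 2.0], [1.0, 6.0, 4.0], [1.0, 7.0, 6.0]]
    X_sel, names = drop_constant_columns(X, ["D1", "D2", "D3"])
    print(names, standardize(X_sel))
    train, test = random_split(73)
    print(len(train), len(test))
```

Removing constant columns before standardizing avoids division by a zero standard deviation, which is one practical reason to pre-select before scaling.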

4.4. MARSplines Analysis

Multivariate adaptive regression splines (MARSplines), performed here with Statistica 13.3, is an adaptive regression procedure. The specification of the MARSplines analysis is shown in Table 7. The method is particularly useful for high-dimensional problems, i.e., problems with a large number of inputs. Moreover, it can be used for both regression and classification problems and does not require assumptions about the functional relationship between the independent (input) and dependent (output) variables. This relationship is modeled with basis functions and a set of coefficients determined solely from the data [6,22].
The basis functions in a MARS model are single truncated spline functions or products (interactions) of a few such functions, and they consist of a left-sided and a right-sided segment (a reflected pair) (Equation (4)). The subscript “+” denotes the positive part, thus:
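$$(x - t)_+ = \begin{cases} x - t, & \text{if } x > t \\ 0, & \text{otherwise} \end{cases}, \qquad (t - x)_+ = \begin{cases} t - x, & \text{if } x < t \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$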
Each function is piecewise linear, with a knot at the value t (a so-called linear spline). Such reflected pairs are formed for each input $X_j$, with knots at each observed value $x_{ij}$ of that input. Therefore, the collection of candidate basis functions is:
$$C = \left\{ (X_j - t)_+,\ (t - X_j)_+ \right\}_{t \in \{x_{1j}, x_{2j}, \ldots, x_{Nj}\},\ j = 1, 2, \ldots, p} \tag{5}$$
If all of the input values are distinct, there are 2Np basis functions altogether. Although each basis function depends only on a single Xj, it is still considered a function over the entire input space $\mathbb{R}^p$.
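The collection C can be enumerated directly. The sketch below, assuming NumPy and using hypothetical helper names, builds one column per candidate hinge function and confirms the count of 2Np when all input values are distinct. It only generates the candidates; it does not perform the forward selection that MARS applies to them.

```python
import numpy as np

def hinge(x, t, sign):
    """Truncated linear function: (x - t)+ for sign=+1, (t - x)+ for sign=-1."""
    return np.maximum(0.0, sign * (np.asarray(x, dtype=float) - t))

def candidate_basis(X):
    """Build every reflected pair (X_j - t)+, (t - X_j)+ with a knot t at each
    observed value x_ij, i.e. the collection C.

    Duplicate knots are not removed, so the column count is always 2Np;
    with tied input values some columns simply repeat.
    Returns the N x 2Np matrix and a list of (j, t, sign) labels.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    columns, labels = [], []
    for j in range(p):
        for t in X[:, j]:              # knots at the observed values of input j
            for sign in (+1, -1):      # the two members of the reflected pair
                columns.append(hinge(X[:, j], t, sign))
                labels.append((j, float(t), sign))
    return np.column_stack(columns), labels
```

With N = 4 compounds and p = 3 descriptors, `candidate_basis` returns 24 columns, i.e., 2Np.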
The model-building strategy resembles forward stepwise linear regression, but instead of the original inputs it uses functions from the set C and their products, so the model takes the form:
$$f(X) = \beta_0 + \sum_{m=1}^{M} \beta_m h_m(X) \tag{6}$$
where each hm(X) is a function in C or a product of two or more such functions. In each iteration, all candidate predictors and their corresponding knot locations are evaluated, and the reflected pair that most improves the fit is added; interactions (products with terms already in the model) are introduced when they improve the model. The forward pass stops when a user-defined maximum number of basis functions is reached. Because the model obtained at this stage usually overfits the training data, a backward pruning procedure based on generalized cross-validation (GCV) is then applied, which removes the basis functions that contribute least to the model. GCV is a residual sum of squares adjusted by a penalty for model complexity, and it prevents an excessive number of spline functions from entering the final model.
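The pruning step can be illustrated with a short sketch. The article does not give the exact GCV expression used by Statistica, so the form below follows Hastie et al. [22], GCV = RSS / (1 − M_eff/N)², with M_eff = r + cK for r terms, K knots, and a penalty c per knot. Treating the Penalty option of Table 7 as c, the knot bookkeeping, and the names `gcv` and `prune` are assumptions; the backward deletion is a generic illustration (NumPy assumed), not Statistica's algorithm.

```python
import numpy as np

def gcv(y, y_fit, n_terms, n_knots, penalty=2.0):
    """GCV = RSS / (1 - M_eff / N)^2 with M_eff = n_terms + penalty * n_knots.

    n_terms counts the intercept; returns infinity when M_eff >= N.
    """
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    n = y.size
    m_eff = n_terms + penalty * n_knots
    if m_eff >= n:
        return float("inf")
    rss = float(np.sum((y - y_fit) ** 2))
    return rss / (1.0 - m_eff / n) ** 2

def prune(B, y, knots_per_term, penalty=2.0):
    """Backward deletion: repeatedly drop the term whose removal yields the
    lowest GCV and return the best subset seen. Column 0 is the intercept
    and is never dropped. knots_per_term[c] is the number of knots
    attributed to column c (a bookkeeping assumption of this sketch).
    """
    y = np.asarray(y, dtype=float)

    def score(cols):
        coef, *_ = np.linalg.lstsq(B[:, cols], y, rcond=None)  # refit the subset
        k = sum(knots_per_term[c] for c in cols)
        return gcv(y, B[:, cols] @ coef, len(cols), k, penalty)

    active = list(range(B.shape[1]))
    best_cols, best_score = list(active), score(active)
    while len(active) > 1:
        # Try removing each non-intercept term and keep the cheapest removal.
        s, drop = min((score([c for c in active if c != d]), d) for d in active[1:])
        active.remove(drop)
        if s < best_score:
            best_cols, best_score = list(active), s
    return best_cols, best_score
```

Because every deletion is scored with the complexity penalty, a smaller subset can win even though its residual sum of squares is larger than that of the full forward-pass model.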

4.5. Model Validation

The developed models were validated in terms of the coefficient of determination (R2), the cross-validated coefficient of determination (Q2), and the mean absolute error (MAE) in order to select the optimal MARS model for predicting the antitumor activity of the anthrapyrazoles studied [13].
$$R^2 = 1 - \frac{\sum (Y_{obs} - Y_{calc})^2}{\sum (Y_{obs} - \bar{Y}_{training})^2} \tag{7}$$
The coefficient of determination R2 (Equation (7)) measures how well the calculated values reproduce the variation in the observed data; a perfect fit corresponds to the maximum possible value, R2 = 1. Here, Yobs denotes the observed response values for the training set, Ycalc the calculated response values for the training set compounds, and Ȳtraining the mean observed response of the training set compounds [13].
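Equation (7) translates directly into code. This is a minimal sketch assuming NumPy; `r_squared` is a hypothetical helper name.

```python
import numpy as np

def r_squared(y_obs, y_calc):
    """Coefficient of determination, Equation (7), for training-set compounds."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    ss_res = np.sum((y_obs - y_calc) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)     # spread around the training mean
    return 1.0 - ss_res / ss_tot
```

For observed values 1, 2, 3, 4 and calculated values 1.1, 1.9, 3.2, 3.8, it returns 0.98; predicting the training mean for every compound gives exactly 0.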
$$Q^2 \ (\text{or } Q^2_{LOO}) = 1 - \frac{\sum (Y_{obs(training)} - Y_{pred(training)})^2}{\sum (Y_{obs(training)} - \bar{Y}_{training})^2} \tag{8}$$
The cross-validated R2 (Q2), given in Equation (8), is used for internal validation. Yobs(training) is the observed response and Ypred(training) is the predicted response of the training set molecules obtained with the leave-one-out (LOO) technique. The generally accepted threshold for Q2 is 0.5 [13].
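The leave-one-out procedure behind Equation (8) refits the model N times, each time predicting the single compound that was held out. In the sketch below (NumPy assumed), ordinary least squares stands in for the MARS model purely to keep the example short; `q2_loo` and `ols_fit_predict` are hypothetical names, and any refitting routine with the same call pattern could be passed instead.

```python
import numpy as np

def q2_loo(X, y, fit_predict):
    """Leave-one-out Q2, Equation (8).

    fit_predict(X_train, y_train, X_new) must refit the model from scratch
    and return predictions for the rows of X_new.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    y_pred = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                     # leave compound i out
        y_pred[i] = fit_predict(X[mask], y[mask], X[i:i + 1])[0]
    press = np.sum((y - y_pred) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ols_fit_predict(X_train, y_train, X_new):
    """Ordinary least squares with an intercept (a stand-in for MARS)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef
```

For ordinary least squares, the leave-one-out residuals are never smaller in total than the fitted residuals, so Q2 from this sketch cannot exceed R2 on the same data.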
$$MAE = \frac{\sum |Y_{obs} - Y_{pred}|}{n} \tag{9}$$
The mean absolute error (MAE) (Equation (9)), also known as the average absolute error (AAE), is generally regarded as a superior error index in predictive modeling studies. Because the root mean square error (RMSE) squares each prediction error, larger errors carry more weight than smaller ones, so RMSE is sensitive to how the errors are distributed within a given data set. MAE, in contrast, gives equal weight to all errors and is therefore considered a simpler and more straightforward measure of prediction error [13].
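The different weighting of errors in MAE and RMSE is easy to see numerically. In this sketch (NumPy assumed; helper names hypothetical), two prediction sets have the same MAE, but the one that concentrates all of its error in a single compound has twice the RMSE.

```python
import numpy as np

def mae(y_obs, y_pred):
    """Mean absolute error, Equation (9): every error carries equal weight."""
    err = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))

def rmse(y_obs, y_pred):
    """Root mean square error: squaring gives large errors extra weight."""
    err = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

y_obs = [5.0, 6.0, 7.0, 8.0]
even = [5.1, 6.1, 7.1, 8.1]      # four errors of 0.1
outlier = [5.0, 6.0, 7.0, 8.4]   # a single error of 0.4
print(mae(y_obs, even), mae(y_obs, outlier))    # both about 0.1
print(rmse(y_obs, even), rmse(y_obs, outlier))  # about 0.1 and 0.2
```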
For the optimal MARS model, the following validation parameters were calculated according to Roy et al. [13]: R2, Q2, Q2F1, Q2F2, Q2F3, CCC, Δr2m, the average r2m ($\overline{r_m^2}$), PRESS, SDEP, and MAE.

5. Conclusions

A quantitative structure–activity relationship study was carried out on a large set of anthrapyrazole compounds with antitumor activity against murine leukemia L1210. The MARSplines approach was used for prediction and described more than 96% of the variance in the experimental activity. The study showed that the fourteen descriptors appearing in the statistically significant and extensively validated MARS model (belonging to the 3D-MoRSE, 2D autocorrelation, GETAWAY, Burden eigenvalue, and RDF blocks) significantly affect the antitumor activity of anthrapyrazole compounds. Moreover, the study confirmed the benefit of using a modern machine learning algorithm, the MARSplines procedure, because the resulting flexible model was also used to predict the antitumor activity against murine leukemia L1210 of an external set of seven anthrapyrazole compounds. Finally, given the potential of such a large set of anthrapyrazole compounds, which may still be tested on various cell lines, and the high predictive power of the MARS model, the MARSplines procedure may be useful in selecting anthrapyrazole anticancer candidates for future clinical studies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms23095132/s1.

Author Contributions

Conceptualization, M.G. and M.K.; methodology, M.G.; validation, R.P. and M.G.; formal analysis, M.G., R.P. and K.S.-G.; investigation, M.G. and K.M.-G.; resources, M.G. and M.K.; data curation, R.P. and M.G.; writing (original draft preparation), M.G.; writing (review and editing), M.G., K.M.-G., K.S.-G. and A.W.; visualization, R.P. and K.M.-G.; supervision, M.K. and A.W.; project administration, M.G.; funding acquisition, K.S.-G. and A.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Nelson, J.M.; Plowman, J.; Jackson, R.C.; Leopold, W.R. Anthrapyrazoles, a new class of intercalating agents with high-level, broad spectrum activity against murine tumors. Cancer Res. 1985, 45, 5532–5539.
2. Fry, D.W.; Boritzki, T.J.; Besserer, J.A.; Jackson, R.C. In vitro DNA strand scission and inhibition of nucleic acid synthesis in L1210 leukemia cells by a new class of DNA complexers, the anthra[1,9-cd]pyrazol-6(2H)-ones (anthrapyrazoles). Biochem. Pharmacol. 1985, 34, 3499–3508.
3. Begleiter, A.; Lin, D.; Larson, K.K.; Lang, J.; Wu, X.; Cabral, T.; Taylor, H.; Guziec, L.J.; Kerr, P.D.; Hasinoff, B.B.; et al. Structure-activity studies with cytotoxic anthrapyrazoles. Oncol. Rep. 2006, 15, 1575–1580.
4. Klohs, W.D.; Steinkampf, R.W.; Havlick, M.J.; Jackson, R.C. Resistance to anthrapyrazoles and anthracyclines in multidrug-resistant P388 murine leukemia cells: Reversal by calcium blockers and calmodulin antagonists. Cancer Res. 1986, 46, 4352–4356.
5. Talbot, D.C.; Smith, I.E.; Mansi, J.L.; Judson, I.; Calvert, A.H.; Ashley, S.E. Anthrapyrazole CI941: A highly active new agent in the treatment of advanced breast cancer. J. Clin. Oncol. 1991, 9, 2141–2147.
6. Friedman, J.H. Multivariate adaptive regression splines. Ann. Stat. 1991, 19, 1–141.
7. Kryshchyshyn, A.; Devinyak, O.; Kaminskyy, D.; Grellier, P.; Lesyk, R. Development of predictive QSAR models of 4-thiazolidinones antitrypanosomal activity using modern machine learning algorithms. Mol. Inform. 2018, 37, 1700078.
8. Nguyen-Cong, V.; Van Dang, G.; Rode, B.M. Using multivariate adaptive regression splines to QSAR studies of dihydroartemisinin derivatives. Eur. J. Med. Chem. 1996, 31, 797–803.
9. Jalali-Heravi, M.; Asadollahi-Baboli, M.; Mani-Varnosfaderani, A. Shuffling multivariate adaptive regression splines and adaptive neuro-fuzzy inference system as tools for QSAR study of SARS inhibitors. J. Pharm. Biomed. Anal. 2009, 50, 853–860.
10. Koba, M.; Bączek, T. The evaluation of multivariate adaptive regression splines for the prediction of antitumor activity of acridinone derivatives. Med. Chem. 2013, 9, 1041–1050.
11. Hollis Showalter, H.D.; Johnson, J.L.; Hoftiezer, J.M.; Turner, W.R.; Werbel, L.M.; Leopold, W.R.; Shillis, J.L.; Jackson, R.C.; Elslager, E.F. Anthrapyrazole anticancer agents. synthesis and structure-activity relationships against murine leukemias. J. Med. Chem. 1987, 30, 121–131.
12. Liang, H.; Wu, X.; Guziec, L.J.; Guziec, F.S.; Larson, K.K.; Lang, J.; Yalowich, J.C.; Hasinoff, B.B. A structure-based 3D-QSAR study of anthrapyrazole analogues of the anticancer agents losoxantrone and piroxantrone. J. Chem. Inf. Model. 2006, 46, 1827–1835.
13. Roy, K.; Ambure, P.; Kar, S.; Ojha, P.K. Is it possible to improve the quality of predictions from an "intelligent" use of multiple QSAR/QSPR/QSTR models? J. Chemom. 2018, 32, e2992.
14. Devinyak, O.; Havrylyuk, D.; Lesyk, R. 3D-MoRSE descriptors explained. J. Mol. Graph. Model. 2014, 54, 194–203.
15. Todeschini, R.; Consonni, V. Molecular Descriptors for Chemoinformatics; Wiley-VCH: Weinheim, Germany, 2009.
16. Hollas, B. An analysis of the autocorrelation descriptor for molecules. J. Math. Chem. 2003, 33, 91–101.
17. Wong, K.Y.; Mercader, A.G.; Saavedra, L.M.; Honarparvar, B.; Romanelli, G.P.; Duchowicz, P.R. QSAR analysis on tacrine-related acetylcholinesterase inhibitors. J. Biomed. Sci. 2014, 21, 84.
18. Hartley, J.A.; Reszka, K.; Zuo, E.T.; Wilson, W.D.; Morgan, A.R.; Lown, J.W. Characteristics of the interaction of anthrapyrazole anticancer agents with deoxyribonucleic acids: Structural requirements for DNA binding, intercalation, and photosensitization. Mol. Pharmacol. 1988, 33, 265–271.
19. Alamdari, R.F.; Mani-Varnosfaderani, A.; Asadollahi-Baboli, M.; Khalafi-Nezhad, A. Monte Carlo sampling and multivariate adaptive regression splines as tools for QSAR modelling of HIV-1 reverse transcriptase inhibitors. SAR QSAR Environ. Res. 2012, 23, 665–682.
20. Talete SRL List of Molecular Descriptors Calculated by Dragon. Available online: http://www.talete.mi.it/products/dragon_molecular_descriptor_list.pdf (accessed on 25 February 2022).
21. Tropsha, A. Best practices for QSAR model development, validation, and exploitation. Mol. Inform. 2010, 29, 476–488.
22. Hastie, T.; Tibshirani, R.; Friedman, J. Additive models, trees, and related methods. In The Elements of Statistical Learning Data Mining, Inference, and Prediction; Springer Series in Statistics; Springer: Stanford, CA, USA, 2001.
Figure 1. Geometrically optimized structures of selected anthrapyrazole derivatives: (a) a-01; (b) a-08; (c) a-18; (d) a-30; (e) a-50; (f) a-60.
Figure 2. Correlation between the calculated and experimental antitumor data of anthrapyrazoles for the training and test data sets.
Figure 3. Residual normality plot for the optimal model.
Table 1. Values of significant molecular descriptors for the tested anthrapyrazole derivatives.
CompoundSetDescriptors
Mor05sMor19mMATS8eH1eATSC7vATSC1eSpMax8_Bh(s)Mor21eMor13sR5pATSC1sATSC8sRDF135eHATS5s
a-01training−23.2480.140.0711.8688.7470.0893.701−1.038−3.1060.35510.91524.8433.7160.885
a-02test−24.1510.350.0862.44910.3760.0633.729−1.668−2.0110.4317.56217.7082.3120.888
a-03test−25.0440.3120.0881.88710.2290.0723.805−1.273−3.2320.3529.63626.8174.8330.696
a-04training−26.0130.5270.0932.41511.9090.0623.826−1.909−1.9830.427.48619.9472.330.731
a-07training−28.3550.3740.1362.26612.3340.1083.868−1.422−3.9350.3414.86170.1528.0440.59
a-08training−29.4270.5560.1732.49414.0420.0673.892−1.992−2.4910.3939.46263.1826.0660.623
a-14test−27.1080.380.0812.35213.0210.093.868−1.49−4.1660.35612.92853.7067.1090.56
a-15training−24.5850.4050.0872.49910.6640.0723.731−0.942−2.7110.4629.6934.52301.068
a-16test−27.470.410.1042.48512.5540.1083.93−1.01−5.5160.4314.86150.7692.7140.979
a-17training−26.3320.3990.0452.65113.2460.093.91−0.998−4.0390.46312.92848.5292.5810.748
a-18training−27.160.3540.0512.48614.2060.1174.11−1.333−4.6560.37215.02549.7239.6280.857
a-19training−27.1820.5930.0842.45315.1320.0713.913−1.856−4.490.39710.35143.9615.7760.78
a-20training−30.2640.7110.0742.68516.8960.0843.943−2.188−2.6890.51310.25641.2578.7670.675
a-21training−30.3360.887−0.0152.45418.9160.0933.892−2.927−1.6020.4399.08937.6714.1270.535
a-23training−28.0630.4730.0522.39515.1410.0793.892−1.916−3.3630.4739.55542.1763.3590.664
a-24training−28.5660.5470.0752.39716.1230.0793.892−2.428−2.1170.4568.74939.3866.1710.6
a-25training−28.0390.7720.0622.43117.0930.0843.892−1.927−3.760.44710.25640.3575.9360.665
a-26test−28.8730.7790.0862.4418.8870.0933.892−2.847−1.140.5269.58135.76310.1980.721
a-27training−32.5260.8860.0542.4620.0270.0993.892−3.152−2.3120.49610.07341.72115.5630.608
a-28training−31.5130.772−0.0342.5921.690.1053.892−3.454−0.6940.57510.55143.57819.8450.613
a-29training−35.1891.1080.0222.50425.2180.1223.941−4.4940.0730.5411.90144.57123.8510.586
a-30test−29.8950.9840.0742.71118.2110.0693.892−2.825−3.7980.5717.22134.7294.6230.685
a-31test−32.6481.1820.0882.69218.6290.093.892−2.855−1.780.5668.36834.6044.6180.639
a-32training−37.6790.6030.072.8423.5090.0634.302−3.52−2.1930.55510.11861.6972.0660.602
a-33training−31.7560.295−0.151.8547.9640.153.907−0.585−6.7580.32519.42268.77800.778
a-34training−32.9520.272−0.0841.9629.9440.1734.222−0.896−7.170.35521.39665.8414.3120.899
a-35training−34.1320.496−0.0822.48611.7450.093.942−1.561−5.9430.42912.55450.1075.3620.949
a-36training−36.2920.493−0.0472.61314.2610.0894.279−1.313−5.8090.42313.37857.1533.5030.845
a-38training−35.8840.3270.032.4913.450.1864.35−1.215−7.4430.37521.796.6087.6120.876
a-40training−35.2470.336−0.0052.39610.1190.1944.342−0.67−7.8230.36427.496130.41200.885
a-41training−36.5660.2930.0192.37912.0940.2184.352−1.021−7.7910.35429.098127.3867.5550.807
a-42training−36.2990.3730.0522.44512.0080.1524.281−1.172−7.0050.32419.476115.9441.1620.692
a-43training−37.4660.4850.0522.5813.90.1284.279−1.574−6.8360.41918.037111.5124.3110.833
a-44training−38.0670.7110.0362.82313.1230.1784.346−1.597−7.370.53520.604115.41101.067
a-46training−40.9830.4220.0342.46616.8380.1914.376−1.732−7.5720.36926.156131.34811.2880.83
a-47test−34.6140.389−0.1012.46610.8720.1664.339−0.836−7.4930.3824.127106.02700.873
a-48training−34.5950.45−0.092.48412.0980.1554.343−1.003−7.7720.3723.038115.09400.866
a-49test−34.2590.372−0.0642.43811.8050.1474.281−1.068−7.0850.36818.73896.6460.7120.82
a-50training−37.0760.338−0.0562.38312.8270.1894.351−0.811−7.8660.35525.724102.9098.2250.782
a-51test−38.930.529−0.0852.59314.2130.1774.35−0.971−7.420.45124.541110.4012.9281.43
a-52test−39.9410.441−0.0642.40415.7760.1264.331−1.487−6.0970.37216.89197.49710.9510.628
a-53test−36.7870.433−0.1332.48114.4050.1774.354−1.006−8.2230.36324.54184.4237.2870.858
a-54test−39.8710.503−0.1512.41917.3280.1164.336−1.735−6.2130.35816.21474.59813.2410.756
a-55training−36.810.361−0.0432.514.1850.1694.308−1.119−7.4620.37620.40983.8026.6190.863
a-56training−34.2170.218−0.082.73513.0430.1894.354−0.752−9.0590.47625.72493.7112.5940.994
a-57training−36.6240.562−0.0742.58614.2980.1774.356−0.99−10.1280.41424.541102.6293.7051.362
a-60training−33.7560.421−0.0542.57214.0110.1694.302−0.952−8.7420.41620.40984.4684.291.368
a-62training−37.5030.552−0.0482.77215.0190.2124.359−1.275−8.560.4327.2190.6633.0060.789
a-63training−41.7910.386−0.0722.81316.3850.1994.358−1.074−9.1410.525.9797.9752.9330.898
a-64test−41.4080.429−0.0242.6717.5420.2344.447−1.311−9.1270.43632.04592.1088.3371.114
a-65training−42.3940.647−0.0692.73718.9360.224.445−1.996−8.8130.48430.676102.4439.9660.994
a-66training−35.1710.471−0.0312.57114.9980.1494.297−1.291−8.1450.40518.9179.8567.2641.18
a-67training−37.340.443−0.0332.74616.8380.1264.296−2.132−8.0910.50717.38273.9715.8590.842
a-68training−37.2470.848−0.0332.84916.0920.1724.356−1.65−8.6840.51219.23377.962.2250.842
a-69training−35.640.589−0.0892.61815.7420.144.267−1.564−7.4460.42617.56978.6367.2160.946
a-70test−36.9070.519−0.0482.68215.4570.1864.358−1.247−8.2270.39524.21185.13310.2850.744
a-71training−39.5990.429−0.0552.55817.9350.1464.348−1.844−6.6180.38118.31284.95213.5480.808
a-73training−39.4460.471−0.0622.60518.4740.1294.297−1.592−7.4970.41517.76382.3524.1010.769
a-74training−38.2840.554−0.1312.50515.5580.1654.353−1.035−8.240.40123.364100.73311.1980.861
a-76training−38.0740.387−0.0382.59716.4060.1914.357−1.39−6.3310.41125.14883.8978.1330.984
a-77training−35.230.447−0.0642.55813.5770.1284.285−0.952−6.5330.46417.28979.75500.918
a-78training−36.9410.529−0.062.47514.840.1174.289−1.303−7.3970.41616.65988.642.3111.377
a-79test−32.380.413−0.0332.72715.5630.1494.303−1.113−7.9310.46318.9177.1747.0470.88
a-80test−35.1720.604−0.182.48713.8320.1174.312−1.267−7.5580.46316.65974.5512.7891.292
a-81test−38.7190.681−0.1692.51115.1080.1084.315−1.611−7.8980.41515.99182.9314.8081.146
a-82test−35.2430.672−0.1272.76615.8260.1374.323−1.786−8.2110.48818.16472.0040.0530.897
a-83training−36.320.495−0.0622.54615.0140.1084.286−1.608−8.1390.53215.97473.760.4190.896
a-84test−36.3820.473−0.0392.46516.0240.0894.26−1.908−6.3120.46512.75567.0735.6920.813
a-86training−38.160.486−0.0342.49917.020.1264.301−1.772−7.5010.44717.38271.6016.7250.797
a-87test−38.5230.577−0.0212.53518.8770.0894.257−2.65−5.6730.48113.3461.7818.6080.73
a-88test−39.8250.429−0.0482.64417.9270.1464.353−2.036−7.3980.40618.31282.66318.6980.771
a-90training−39.4060.7840.022.67518.5460.1664.352−1.824−6.0020.48624.308118.3317.8940.78
a-91training−40.7921.0030.0482.64520.30.0954.302−2.484−5.4350.52917.575106.76618.0720.75
Table 2. Selected descriptors and the number of times they appeared in the basis functions of the MARS model.
Symbol | Definition | Block | Dimensionality | Number in the basis functions
Mor05s | signal 05/weighted by I-state | 3D-MoRSE descriptors | 3D | 9
Mor19m | signal 19/weighted by mass | 3D-MoRSE descriptors | 3D | 6
MATS8e | Moran autocorrelation of lag 8 weighted by Sanderson electronegativity | 2D autocorrelations | 2D | 4
H1e | H autocorrelation of lag 1/weighted by Sanderson electronegativity | GETAWAY descriptors | 3D | 3
ATSC7v | Centred Broto–Moreau autocorrelation of lag 7 weighted by van der Waals volume | 2D autocorrelations | 2D | 2
ATSC1e | Centred Broto–Moreau autocorrelation of lag 1 weighted by Sanderson electronegativity | 2D autocorrelations | 2D | 2
SpMax8_Bh(s) | largest eigenvalue n. 8 of Burden matrix weighted by I-state | Burden eigenvalues | 2D | 2
Mor21e | signal 21/weighted by Sanderson electronegativity | 3D-MoRSE descriptors | 3D | 2
Mor13s | signal 13/weighted by I-state | 3D-MoRSE descriptors | 3D | 2
R5p | R autocorrelation of lag 5/weighted by polarizability | GETAWAY descriptors | 3D | 2
ATSC1s | Centred Broto–Moreau autocorrelation of lag 1 weighted by I-state | 2D autocorrelations | 2D | 1
ATSC8s | Centred Broto–Moreau autocorrelation of lag 8 weighted by I-state | 2D autocorrelations | 2D | 1
RDF135e | Radial Distribution Function - 135/weighted by Sanderson electronegativity | RDF descriptors | 3D | 1
HATS5s | leverage-weighted autocorrelation of lag 5/weighted by I-state | GETAWAY descriptors | 3D | 1
Table 3. The functions of the basis splines.
Bm | Definition | am
B1 | 1 | 7.00228
B2 | (Mor05s + 28.56600)+ | −0.41345
B3 | (−28.56600 − Mor05s)+ | −0.10460
B4 | (ATSC7v − 12.33400)+ (Mor05s + 28.56600)+ | 0.29808
B5 | (12.33400 − ATSC7v)+ (Mor05s + 28.56600)+ | 0.11583
B6 | (−28.56600 − Mor05s)+ (R5p − 0.37500)+ | 0.98096
B7 | (−28.56600 − Mor05s)+ (0.37500 − R5p)+ | 3.57380
B8 | (Mor19m − 0.42100)+ | −1.63111
B9 | (0.42100 − Mor19m)+ | −5.67335
B10 | (MATS8e − 0.07400)+ | −14.65355
B11 | (15.99100 − ATSC1s)+ (0.07400 − MATS8e)+ | −6.03111
B12 | (70.15200 − ATSC8s)+ (0.07400 − MATS8e)+ | 0.92668
B13 | (−28.56600 − Mor05s)+ (H1e − 2.54600)+ | −0.53694
B14 | (MATS8e − 0.07400)+ (RDF135e − 7.04700)+ | −8.31766
B15 | (SpMax8_Bh(s) − 4.32300)+ (−28.56600 − Mor05s)+ (2.54600 − H1e)+ | −16.59500
B16 | (4.32300 − SpMax8_Bh(s))+ (−28.56600 − Mor05s)+ (2.54600 − H1e)+ | −0.64411
B17 | (Mor19m − 0.42100)+ (Mor21e + 1.26700)+ | −19.90208
B18 | (Mor19m − 0.42100)+ (−1.26700 − Mor21e)+ | −0.88179
B19 | (Mor19m − 0.42100)+ (Mor13s + 6.31200)+ | 0.33453
B20 | (Mor19m − 0.42100)+ (−6.31200 − Mor13s)+ | 0.65372
B21 | (0.85700 − HATS5s)+ | 1.71725
B22 | (ATSC1e − 0.11600)+ | 6.68741
B23 | (0.11600 − ATSC1e)+ | 6.15634
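Because the final model is an explicit sum of hinge products, it can be evaluated in a few lines of code. The sketch below (plain Python, no dependencies) encodes the Table 3 terms for one compound's descriptor values. Two readings are assumptions: the first row, which extracts as "B117.00228", is taken as the constant basis function B1 = 1 with coefficient 7.00228, and the garbled factor "(0; 2.54600 − H1e)+" in B15 is read as max(0; 2.54600 − H1e). With these readings the sketch gives about 5.62 for compound a-01, close to its observed pIC50 of −log10(2.2 × 10⁻⁶) ≈ 5.66. The names `pos` and `mars_pic50` are hypothetical.

```python
def pos(v):
    """Positive part (v)+ = max(0, v)."""
    return max(0.0, v)

def mars_pic50(d):
    """Evaluate 7.00228 (B1) plus a_m * B_m for B2..B23 of Table 3.

    d maps each descriptor symbol of Table 2 to its value for one compound.
    Assumption: B15's garbled factor is read as (2.546 - H1e)+.
    """
    m05, m19, mats = d["Mor05s"], d["Mor19m"], d["MATS8e"]
    up05, dn05 = pos(m05 + 28.566), pos(-28.566 - m05)   # reflected pair on Mor05s
    up19 = pos(m19 - 0.421)
    sp = d["SpMax8_Bh(s)"]
    terms = [
        (-0.41345, up05),                                             # B2
        (-0.10460, dn05),                                             # B3
        (0.29808, pos(d["ATSC7v"] - 12.334) * up05),                  # B4
        (0.11583, pos(12.334 - d["ATSC7v"]) * up05),                  # B5
        (0.98096, dn05 * pos(d["R5p"] - 0.375)),                      # B6
        (3.57380, dn05 * pos(0.375 - d["R5p"])),                      # B7
        (-1.63111, up19),                                             # B8
        (-5.67335, pos(0.421 - m19)),                                 # B9
        (-14.65355, pos(mats - 0.074)),                               # B10
        (-6.03111, pos(15.991 - d["ATSC1s"]) * pos(0.074 - mats)),    # B11
        (0.92668, pos(70.152 - d["ATSC8s"]) * pos(0.074 - mats)),     # B12
        (-0.53694, dn05 * pos(d["H1e"] - 2.546)),                     # B13
        (-8.31766, pos(mats - 0.074) * pos(d["RDF135e"] - 7.047)),    # B14
        (-16.59500, pos(sp - 4.323) * dn05 * pos(2.546 - d["H1e"])),  # B15
        (-0.64411, pos(4.323 - sp) * dn05 * pos(2.546 - d["H1e"])),   # B16
        (-19.90208, up19 * pos(d["Mor21e"] + 1.267)),                 # B17
        (-0.88179, up19 * pos(-1.267 - d["Mor21e"])),                 # B18
        (0.33453, up19 * pos(d["Mor13s"] + 6.312)),                   # B19
        (0.65372, up19 * pos(-6.312 - d["Mor13s"])),                  # B20
        (1.71725, pos(0.857 - d["HATS5s"])),                          # B21
        (6.68741, pos(d["ATSC1e"] - 0.116)),                          # B22
        (6.15634, pos(0.116 - d["ATSC1e"])),                          # B23
    ]
    return 7.00228 + sum(a * b for a, b in terms)

# Descriptor values of compound a-01 (Table 1).
a01 = {"Mor05s": -23.248, "Mor19m": 0.140, "MATS8e": 0.071, "H1e": 1.868,
       "ATSC7v": 8.747, "ATSC1e": 0.089, "SpMax8_Bh(s)": 3.701, "Mor21e": -1.038,
       "Mor13s": -3.106, "R5p": 0.355, "ATSC1s": 10.915, "ATSC8s": 24.843,
       "RDF135e": 3.716, "HATS5s": 0.885}
print(round(mars_pic50(a01), 3))  # 5.619
```

Most terms vanish for any given compound because at least one hinge factor is zero; for a-01 only B2, B5, B9, B11, B12, and B23 contribute.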
Table 4. Values of validation parameters of models obtained with the MARSplines procedure (the optimal model marked in yellow).
Degree of InteractionNumber of Basis FunctionsR2Q2MAE
160.5291−0.15250.2622
160.82880.57870.1709
210.92770.87060.1133
210.91850.88070.1230
260.46910.13430.2819
160.86490.74800.1616
330.93280.93110.1096
360.46910.13430.2819
260.86490.74800.1616
380.96170.90160.0772
400.95320.90330.0897
Table 5. Values of validation parameters of the optimal MARS model.
Parameter [13] | Value | Threshold [13] | Meaning [13]
$R^2 = 1 - \frac{\sum (Y_{obs} - Y_{calc})^2}{\sum (Y_{obs} - \bar{Y}_{training})^2}$ | 0.9617 | ~1 (1 means perfect correlation) | It measures the variation of observed data with the predicted ones.
$Q^2 \ (\text{or } Q^2_{LOO}) = 1 - \frac{\sum (Y_{obs(training)} - Y_{pred(training)})^2}{\sum (Y_{obs(training)} - \bar{Y}_{training})^2}$ | 0.9016 | ≥0.5 | Cross-validated R2 (Q2) checked for internal validation.
$Q^2_{F1} = 1 - \frac{\sum (Y_{obs(test)} - Y_{pred(test)})^2}{\sum (Y_{obs(test)} - \bar{Y}_{training})^2}$ | 0.9119 | ≥0.5 | A measure of correlation between the observed and predicted data of the test set.
$Q^2_{F2} = 1 - \frac{\sum (Y_{obs(test)} - Y_{pred(test)})^2}{\sum (Y_{obs(test)} - \bar{Y}_{test})^2}$ | 0.90163 | ≥0.5 | Almost equal or closer values of Q2(F2) and Q2(F1) infer that the training set mean lies in the close propinquity to that of the test set.
$Q^2_{F3} = 1 - \frac{\left[\sum (Y_{obs(test)} - Y_{pred(test)})^2\right] / n_{test}}{\left[\sum (Y_{obs(train)} - \bar{Y}_{train})^2\right] / n_{train}}$ | 0.7959 | ≥0.5 | It measures the model predictability.
$CCC = \frac{2\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}(y_i - \bar{y})^2 + n(\bar{x} - \bar{y})^2}$ | 0.9496 | ~1 | Concordance correlation coefficient (CCC) measures both precision and accuracy, detecting the distance of the observations from the fitting line and the degree of deviation of the regression line from that passing through the origin, respectively.
$\overline{r_m^2} = \frac{r_m^2 + r_m'^2}{2}$ and $\Delta r_m^2 = |r_m^2 - r_m'^2|$, where $r_m^2 = r^2 \times \left(1 - \sqrt{r^2 - r_0^2}\right)$ and $r_m'^2 = r^2 \times \left(1 - \sqrt{r^2 - r_0'^2}\right)$, with $r_0^2 = 1 - \frac{\sum (Y_{obs} - k \times Y_{pred})^2}{\sum (Y_{obs} - \bar{Y}_{obs})^2}$ and $r_0'^2 = 1 - \frac{\sum (Y_{pred} - k' \times Y_{obs})^2}{\sum (Y_{pred} - \bar{Y}_{pred})^2}$, and the terms k and k' given by $k = \frac{\sum (Y_{obs} \times Y_{pred})}{\sum Y_{pred}^2}$ and $k' = \frac{\sum (Y_{obs} \times Y_{pred})}{\sum Y_{obs}^2}$ | $\Delta r_m^2$ = 0.0173; $\overline{r_m^2}$ = 0.9181 | $\Delta r_m^2 < 0.2$ provided that $\overline{r_m^2} > 0.5$ | They reflect the overall predictability of the model for the entire data set.
$PRESS = \sum (Y_{obs} - Y_{pred})^2$ | 0.3446 |  | It evaluates the model using the predicted residual sum of squares.
$SDEP = \sqrt{\frac{PRESS}{n}}$ | 0.1252 |  | Standard deviation of error of prediction (SDEP) is calculated from PRESS.
$MAE = \frac{\sum |Y_{obs} - Y_{pred}|}{n}$ | 0.0772 |  | Index of errors in the context of predictive modeling studies.
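The external-validation and r_m^2 statistics of Table 5 can be computed with a short routine. This is a sketch assuming NumPy; the helper names are hypothetical, r2 in the r_m^2 formulas is taken to be the squared Pearson correlation between observed and predicted values (an assumption, since the table does not define it), and a tiny negative r2 − r02 caused by floating-point rounding is clipped to zero before the square root.

```python
import numpy as np

def external_metrics(y_train, y_test, y_test_pred):
    """Q2F1, Q2F2, Q2F3, CCC, PRESS and SDEP following the Table 5 formulas."""
    y_train = np.asarray(y_train, dtype=float)
    x = np.asarray(y_test, dtype=float)          # observed test values
    y = np.asarray(y_test_pred, dtype=float)     # predicted test values
    n_test, n_train = x.size, y_train.size
    press = float(np.sum((x - y) ** 2))
    q2f1 = 1 - press / np.sum((x - y_train.mean()) ** 2)
    q2f2 = 1 - press / np.sum((x - x.mean()) ** 2)
    q2f3 = 1 - (press / n_test) / (np.sum((y_train - y_train.mean()) ** 2) / n_train)
    ccc = 2 * np.sum((x - x.mean()) * (y - y.mean())) / (
        np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)
        + n_test * (x.mean() - y.mean()) ** 2)
    return {"Q2F1": q2f1, "Q2F2": q2f2, "Q2F3": q2f3, "CCC": ccc,
            "PRESS": press, "SDEP": np.sqrt(press / n_test)}

def rm2_metrics(y_obs, y_pred):
    """Return (average r_m^2, delta r_m^2) following Roy et al."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y_obs, y_pred)[0, 1] ** 2   # assumption: squared Pearson r
    k = np.sum(y_obs * y_pred) / np.sum(y_pred ** 2)
    k_prime = np.sum(y_obs * y_pred) / np.sum(y_obs ** 2)
    r02 = 1 - np.sum((y_obs - k * y_pred) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    r02_prime = 1 - np.sum((y_pred - k_prime * y_obs) ** 2) / np.sum((y_pred - y_pred.mean()) ** 2)
    rm2 = r2 * (1 - np.sqrt(max(r2 - r02, 0.0)))          # clip rounding noise
    rm2_prime = r2 * (1 - np.sqrt(max(r2 - r02_prime, 0.0)))
    return (rm2 + rm2_prime) / 2, abs(rm2 - rm2_prime)
```

For a test set whose predictions are all shifted by +0.5, the Pearson correlation stays perfect, but CCC drops (to 4/4.75 ≈ 0.84 for observed values 5.5, 6.5, 7.5) because it also penalizes the offset from the line of identity.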
Table 6. Chemical structures and antitumor activity of the anthrapyrazoles studied.
CompoundSetXR1NR2R3L1210
Leukemia
In Vitro:
IC50,M
a-01trainingHHNHCH2CH2NHCH2CH2OH2.2 × 10−6
a-02testHHNHCH2CH2NEt21.5 × 10−6
a-03testHCH3NHCH2CH2NHCH2CH2OH7.1 × 10−7
a-04trainingHCH3NHCH2CH2NEt26.7 × 10−7
a-07trainingHCH2CH2OHNHCH2CH2NHCH2CH2OH1.8 × 10−6
a-08trainingHCH2CH2OHNHCH2CH2NEt28.8 × 10−6
a-14testHCH2CH2NH2NHCH2CH2NHCH2CH2OH8.0 × 10−8
a-15trainingHCH2CH2NHCH2CH2OHNHCH37.4 × 10−7
a-16testHCH2CH2NHCH2CH2OHNHCH2CH2OH7.5 × 10−7
a-17trainingHCH2CH2NHCH2CH2OHNHCH2CH2NH26.9 × 10−8
a-18trainingHCH2CH2NHCH2OHNHCH2CH2NHCH2CH2OH7.4 × 10−8
a-19trainingHCH2CH2NHCH2CH2OHNHCH2CH2NMe23.2 × 10−8
a-20trainingHCH2CH2NHCH2CH2OHNHCH2CH2NEt26.0 × 10−8
a-21trainingHCH2CH2NEt2NH(CH2)5CH32.0 × 10−6
a-23trainingHCH2CH2NEt2NHCH2CH2NH24.6 × 10−8
a-24trainingHCH2CH2NEt2NHCH2CH2NHMe2.7 × 10−8
a-25trainingHCH2CH2NEt2NHCH2CH2NHCH2CH2OH3.2 × 10−8
a-26testHCH2CH2NEt2NHCH2CH2NEt23.9 × 10−7
a-27trainingHCH2CH2NEt2NH(CH2)3NEt25.2 × 10−7
a-28trainingHCH2CH2NEt2NH(CH2)4NEt26.2 × 10−7
a-29trainingHCH2CH2NEt2NH(CH2)7NEt26.3 × 10−7
a-30testHCH2CH2NEt2NHCH2CH2N(CH2CH2)2O4.8 × 10−7
a-31testHCH2CH2NEt2NHCH2CH2N(CH2CH2)2NH5.0 × 10−7
a-32trainingHCH2CH2NEt2NHCH2CH2N(CH2CH2)2NCOOCH2Ph3.9 × 10−7
a-33training7,10-(OH)2CH3NHCH2CH2NH22.4 × 10−7
a-34training7,10-(OH)2CH3NHCH2CH2NHCH2CH2OH1.5 × 10−7
a-35training7,10-(OH)2CH3NHCH2CH2NEt24.5 × 10−7
a-36training7,10-(OH)2CH2PhNHCH2CH2NMe28.6 × 10−7
a-38training7,10-(OH)2CH2CH2OMeNHCH2CH2NHCH2CH2OH1.6 × 10−6
a-40training7,10-(OH)2CH2CH2OHNHCH2CH2NH24.8 × 10−7
a-41training7,10-(OH)2CH2CH2OHNHCH2CH2NHCH2CH2OH7.8 × 10−7
a-42training7,10-(OH)2CH2CH2OHNHCH2CH2NMe21.5 × 10−8
a-43training7,10-(OH)2CH2CH2OHNHCH2CH2NEt27.3 × 10−7
a-44training7,10-(OH)2CH2CH2OHNHCH2CH2N(CH2CH2)2O1.1 × 10−6
a-46training7,10-(OH)2CH2CH(OH)CH2OHNHCH2CH2NHCH2CH2NMe22.2 × 10−6
a-47test7,10-(OH)2CH2CH2NH2NHCH2CH2NH24.8 × 10−7
a-48training7,10-(OH)2CH2CH2NH2NH(CH2)3NH23.1 × 10−7
a-49test7,10-(OH)2CH2CH2NH2NHCH2CH2NHMe7.0 × 10−7
a-50training7,10-(OH)2CH2CH2NH2NHCH2CH2NHCH2CH2OH5.8 × 10−7
a-51test7,10-(OH)2CH2CH2NH2NH(CH2)3NHCH2CH2OH8.7 × 10−7
a-52test7,10-(OH)2CH2CH2NH2NHCH2CH2NHCH2CH2NMe29.3 × 10−7
a-53test7,10-(OH)2(CH2)3NH2NHCH2CH2NHCH2CH2OH1.6 × 10−7
a-54test7,10-(OH)2(CH2)3NH2NHCH2CH2NHCH2CH2NMe26.4 × 10−7
a-55training7,10-(OH)2CH2CH2NHMeNHCH2CH2NHCH2CH2OH4.4 × 10−7
a-56training7,10-(OH)2CH2CH2NHCH2CH2OHNHCH2CH2NH21.6 × 10−6
a-57training7,10-(OH)2CH2CH2NHCH2CH2OHNH(CH2)3NH29.6 × 10−7
a-60training7,10-(OH)2CH2CH2NHCH2CH2OHNHCH2CH2NHMe1.4 × 10−7
a-62training7,10-(OH)2CH2CH2NHCH2CH2OHNHCH2CH2NHCH2CH2OH7.4 × 10−7
a-63training7,10-(OH)2CH2CH2NHCH2CH2OHNH(CH2)3NHCH2CH2OH1.8 × 10−6
a-64test7,10-(OH)2CH2CH2NHCH2CH2OHNHCH2CH2N(CH2CH2OH)24.3 × 10−7
a-65training7,10-(OH)2CH2CH2NHCH2CH2OHNH(CH2)3N(CH2CH2OH)29.2 × 10−7
a-66training7,10-(OH)2CH2CH2NHCH2CH2OHNHCH2CH2NMe22.3 × 10−7
a-67training7,10-(OH)2CH2CH2NHCH2CH2OHNHCH2CH2NEt25.1 × 10−7
a-68training7,10-(OH)2CH2CH2NHCH2CH2OHNHCH2CH2N(CH2CH2)2O6.5 × 10−7
a-69training7,10-(OH)2CH2CH2NHCH2CH2OHN(CH2CH2)2NMe4.3 × 10−7
a-70test7,10-(OH)2CH2CH2NHCH2CH2OHNHCH2CH2NHCH2CH2NH23.3 × 10−7
a-71training7,10-(OH)2CH2CH2NHCH2CH2OHNHCH2CH2NHCH2CH2NMe27.6 × 10−7
a-73training7,10-(OH)2CH2CH2NHCH2CH2OHN(Me)CH2CH2NMe26.3 × 10−7
a-74training7,10-(OH)2CH2CH2NHCH2CH2OHNH(CH2)3NH21.8 × 10−6
a-76training7,10-(OH)2CH2CH2NMeCH2CH2OHNHCH2CH2NHCH2CH2OH3.3 × 10−7
a-77training7,10-(OH)2CH2CH2NMe2NHCH2CH2NH22.2 × 10−7
a-78training7,10-(OH)2CH2CH2NMe2NH(CH2)3NH25.4 × 10−7
a-79test7,10-(OH)2CH2CH2NMe2NHCH2CH2NHCH2CH2OH1.2 × 10−7
a-80test7,10-(OH)2(CH2)3NMe2NHCH2CH2NH22.2 × 10−6
a-81test7,10-(OH)2(CH2)3NMe2NH(CH2)3NH28.0 × 10−7
a-82test7,10-(OH)2(CH2)3NMe2NHCH2CH2NHCH2CH2OH5.9 × 10−7
a-83training7,10-(OH)2CH2CH2NEt2NHCH2CH2NH24.6 × 10−8
a-84test7,10-(OH)2CH2CH2NEt2NHCH2CH2NHMe7.4 × 10−6
a-86training7,10-(OH)2CH2CH2NEt2NHCH2CH2NHCH2CH2OH1.3 × 10−7
a-87test7,10-(OH)2CH2CH2NEt2NHCH2CH2NEt25.5 × 10−7
a-88test7,10-(OH)2CH2CH2NHCH2CH2NMe2NHCH2CH2NHCH2CH2OH1.4 × 10−6
a-90training7,10-(OH)2CH2CH(OH)CH2NEt2NHCH2CH2NHCH2CH2OH8.4 × 10−7
a-91training7,10-(OH)2CH2CH(OH)CH2NEt2NHCH2CH2NEt21.3 × 10−6
Table 7. Specification of MARSplines analysis.
Option | Value
Maximum number of basis functions | 40
Degree of interactions | 3
Penalty | 2
Threshold | 0.0005
Apply pruning | YES
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Gackowski, M.; Szewczyk-Golec, K.; Pluskota, R.; Koba, M.; Mądra-Gackowska, K.; Woźniak, A. Application of Multivariate Adaptive Regression Splines (MARSplines) for Predicting Antitumor Activity of Anthrapyrazole Derivatives. Int. J. Mol. Sci. 2022, 23, 5132. https://doi.org/10.3390/ijms23095132


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
