2.1. Lipophilicity—Experimental and Theoretical Studies
Lipophilicity is a physicochemical property that describes how a compound behaves in a two-phase system consisting of a nonpolar organic phase and a polar phase, most often aqueous. Alongside solubility, stability and acid–base characteristics, it is one of the key parameters needed to assess absorption, distribution and transport in biological systems. Moreover, lipophilicity is one of the factors determining the affinity of a compound for its target proteins, which underlies the final biological effect [24,25,26].
We used reversed-phase thin-layer chromatography (RP-TLC) to determine the experimental logP_TLC values of previously synthesized 3- and/or 28-indole-betulin derivatives (Table 1) [27,28].
The first step in obtaining the logP_TLC values of the tested compounds was to establish the standard curve. For this purpose, lipophilicity values were determined experimentally (R_M0) and taken from the literature (logP_lit) [29,30] for the following standard substances: acetanilide, prednisone, 4-bromoacetophenone, benzophenone, anthracene, dibenzyl and DDT (dichlorodiphenyltrichloroethane). The R_M0 values of the reference compounds were determined under the same chromatographic conditions as those of the indole derivatives EB355A, EB365, EB366 and EB367. The standard curve relating logP_lit to R_M0 is described by the following equation:
$$\log P_{\mathrm{TLC}} = 1.1332\,R_{M0} + 0.6683$$
Table 2 presents the literature (logP_lit) and experimental (R_M0 and logP_TLC) lipophilicity values for the reference substances.
The logP_TLC values of the indole derivatives of betulin were calculated from the R_M values determined during the experiment, using the equations R_M = R_M0 + bC and logP_TLC = 1.1332 R_M0 + 0.6683. These values are presented in Table 3.
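The two-step calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' own procedure: it assumes that R_M values were measured at several concentrations C of the organic modifier in the mobile phase, that R_M0 is the intercept of a least-squares line through those points, and that the calibration coefficients are the ones quoted in the text. The function names are hypothetical.

```python
# Calibration coefficients quoted in the text: logP_TLC = 1.1332 * R_M0 + 0.6683
CAL_SLOPE = 1.1332
CAL_INTERCEPT = 0.6683


def extrapolate_rm0(concentrations, rm_values):
    """Fit R_M = R_M0 + b*C by ordinary least squares and return (R_M0, b).

    Assumption: C is the organic modifier concentration in the mobile phase,
    so R_M0 is the value extrapolated to C = 0.
    """
    n = len(concentrations)
    if n != len(rm_values) or n < 2:
        raise ValueError("need at least two (C, R_M) pairs of equal length")
    mean_c = sum(concentrations) / n
    mean_rm = sum(rm_values) / n
    sxx = sum((c - mean_c) ** 2 for c in concentrations)
    if sxx == 0:
        raise ValueError("concentrations must not all be identical")
    sxy = sum((c - mean_c) * (rm - mean_rm)
              for c, rm in zip(concentrations, rm_values))
    b = sxy / sxx                # slope of the R_M vs. C line
    rm0 = mean_rm - b * mean_c   # intercept, i.e. R_M at C = 0
    return rm0, b


def logp_tlc(rm0):
    """Convert an extrapolated R_M0 into logP_TLC with the standard curve."""
    return CAL_SLOPE * rm0 + CAL_INTERCEPT
```

Read in reverse, the same calibration implies that the betulin value reported in the text (logP_TLC = 6.11) corresponds to an R_M0 of about 4.80.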
The experimentally determined logP_TLC values for the indole derivatives of betulin ranged from 7.52 to 8.28. The diester derivative EB365, which bears an acetyl substituent at the C-3 position, had the highest lipophilicity, and the monoester derivative EB355A, which contains a free hydroxyl group at the C-3 carbon atom, had the lowest. Both of these derivatives have an indolyl moiety at the C-28 position. The derivatives EB355A and EB367, which contain a free hydroxyl group at positions C-3 and C-28, respectively, were less lipophilic than the disubstituted derivatives EB365 and EB366. Betulin was used as a reference compound (logP_TLC = 6.11).
Our results are consistent with previous studies on the lipophilicity parameters of betulin’s mono- and diester derivatives. The lower lipophilicity of the monoester derivatives results from the presence of a hydrophilic hydroxyl group attached to the hydrophobic triterpene scaffold [31,32].
In addition to the values obtained from the thin-layer chromatography experiment, lipophilicity values obtained using computer programs or Internet databases (iLOGP, XLOGP2, XLOGP3, WLOGP, MLOGP, SILICOS-IT, milogP, ACD/logP, KOWWIN and ALOGPs) were also considered. These theoretical lipophilicity values for the analyzed triterpenoids are presented in Table 4.
The theoretical logP values of the indole derivatives of betulin ranged from 4.89 to 12.52. Figure 1 compares the experimentally determined lipophilicity values with the theoretical values obtained using various commercially available computer programs.
From Figure 1, it can be concluded that betulin had the lowest lipophilicity values (logP = 4.31–8.28). Modification of the betulin molecule gave derivatives with higher lipophilicity in every case. iLOGP predicted the lowest values for all compounds, and ACD/logP predicted the highest. The experimental logP_TLC values of betulin’s indole derivatives were closest to the theoretical values obtained using ALOGPs.
2.2. Correlation Analysis
Table 5 presents the results of the correlation analysis between all lipophilicity values considered in this study: the experimentally determined value (logP_TLC) and 10 values determined theoretically using generally available online calculators. The correlations between logP_TLC and the other values were of particular interest. The high correlation coefficients (all above 0.9) allowed us to derive correlation equations for calculating logP_TLC from a theoretical value alone.
A correlation analysis was also performed between logP_TLC and the physicochemical properties (M, molar mass; TPSA, topological polar surface area; nROTB, number of rotatable bonds; nHBD, number of hydrogen bond donors; nHBA, number of hydrogen bond acceptors; MR, molar refractivity) and ADME properties (logPapp, Caco-2 permeability; logKp, skin permeability) (Table 6).
In this case as well (Table 6), all correlation coefficients except that for nHBD were high. For nHBD, no correlation equation describing a linear relationship with logP_TLC could be found: the correlation coefficient was 0.608 and the significance level was p = 0.28. As can be seen in Table 6, betulin and the derivatives EB355A and EB367 have the same number of donors but differ in lipophilicity (logP_TLC value). This results from introducing a lipophilic indole system and replacing one of the donors (OH at position C-3 or C-28) with an NH group of similar acidity. When betulin was excluded from the calculations, a higher correlation coefficient was obtained for the dependence of logP_TLC on nHBD. The other hydrogen bond parameter included in the calculations was the number of hydrogen bond acceptors (nHBA), which differs between betulin and its indole derivatives and for which a high correlation was obtained.
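The link between the correlation coefficient and the significance level quoted for nHBD can be checked with a short sketch. The text does not say which test was used. The code below assumes the usual two-sided t-test for a Pearson coefficient and assumes that five compounds (betulin and the four indole derivatives) entered the calculation. It relies only on the Python standard library, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def pearson_p_value(r, n, steps=2000):
    """Two-sided p-value for a Pearson coefficient r computed from n points.

    Uses t = |r| * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom and
    integrates the t density from 0 to t with Simpson's rule (steps must be even).
    """
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3   # probability mass between 0 and t
    return 1 - 2 * central_area    # mass in both tails beyond |t|
```

Under these assumptions, `pearson_p_value(0.608, 5)` gives approximately 0.28, which agrees with the significance level reported above.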
Table 7 presents all statistically significant correlation equations and the parameters describing them. Due to the high values of the correlation coefficients, all equations presented here could be of use for determining the lipophilicity of the tested compounds without the need to conduct an experiment.
2.3. Cluster Analysis (CA)
Cluster analysis was performed on the results obtained, including the experimental lipophilicity values (R_M0 and logP_TLC) and those calculated using computer programs. The analysis is presented in Figure 2.
Figure 2 shows three clusters. The first included R_M0, MLOGP and ALOGPs; the second included logP_TLC, WLOGP, SILICOS-IT and milogP; and the third consisted of XLOGP2, XLOGP3 and KOWWIN. The grouping reflects the similarity of the individual lipophilicity values. The values from WLOGP, SILICOS-IT and milogP were the most similar to the experimental logP_TLC values. ACD/logP and iLOGP deviated the most from the rest because ACD/logP gave the highest values and iLOGP the lowest.
To compare the analyzed compounds, a second similarity analysis was performed. This analysis (Figure 3) showed that the compounds formed two pairs, EB355A with EB367 and EB365 with EB366, which follows directly from their structures. The first two have a free OH group at the C-3 and C-28 positions, respectively, and the other two have an acetyl group at the analogous positions. Betulin, whose structure differs the most from those of the newly synthesized compounds, did not belong to either group.
Similar conclusions were reached when all the physicochemical and ADME values were analyzed (Figure 4). The compounds were grouped in the same way as in the analysis based on the lipophilicity values. Because the analyzed values were calculated from the SMILES code, which follows directly from the structure of the compound, compounds with similar structures formed one cluster.
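The kind of grouping described in this section can be illustrated with a small agglomerative clustering sketch. The text does not state the distance measure or the linkage rule. Euclidean distance on standardized values and single linkage are assumptions made here for simplicity, and the function names are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column of a list of equal-length numeric rows."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    # Constant columns carry no information and are mapped to zero
    return [[(v - m) / s if s else 0.0 for v, m, s in zip(row, means, sds)]
            for row in rows]


def single_linkage(labels, rows):
    """Agglomerative clustering with single linkage and Euclidean distance.

    Returns the merge history as (members_a, members_b, distance) tuples,
    in the order in which the clusters were joined.
    """
    clusters = [((label,), [row]) for label, row in zip(labels, rows)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members
                d = min(math.dist(p, q)
                        for p in clusters[i][1] for q in clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        names_i, points_i = clusters[i]
        names_j, points_j = clusters[j]
        merges.append((names_i, names_j, d))
        clusters[i] = (names_i + names_j, points_i + points_j)
        del clusters[j]  # j > i, so index i is unaffected
    return merges
```

To group compounds, each row would hold the descriptor values of one compound. To group lipophilicity scales, as in the comparison of calculation methods, each row would hold the values that one method assigns to all compounds. Objects that merge early in the returned history are the most similar ones.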
2.4. Principal Component Analysis (PCA)
PCA was performed for the logP values (experimental and theoretically calculated), physicochemical properties and ADME properties. The analyzed data were standardized. Four eigenvalues, which together described 100% of the variability of the system, were obtained. A scree plot was used to select the number of eigenvalues (Figure 5). The first eigenvalue accounted for a very high percentage of the total variance (94.21%), and the remaining eigenvalues accounted for the rest. The contributions of the individual variables to the principal components showed that almost all variables contributed most to the first component; only nHBD made its highest contribution to the second. A plot of the cases projected onto the factor plane was prepared using the obtained eigenvalues (Figure 6). The analyzed compounds were grouped in pairs, EB355A with EB367 and EB365 with EB366. The same pairs formed clusters when the lipophilicity values were analyzed using CA. The compounds that formed pairs on the plot have very similar structures, whereas betulin, which differs from the others, was located in a completely different part of the plot. The relationships discussed here are presented in the figures below.
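A PCA on standardized data of this shape can be sketched with NumPy as shown below. The sketch assumes that the cases are the five compounds (betulin and the four derivatives) and the columns are the descriptors. It is not the software used by the authors, and the function name is hypothetical. It also shows why exactly four eigenvalues can describe 100% of the variability: after centering, five cases span at most four dimensions.

```python
import numpy as np


def pca_standardized(data):
    """PCA of a (n_cases, n_variables) array after z-scoring each column.

    Returns (eigenvalues, explained_percent, scores). Every column must vary
    across cases, because standardization divides by the standard deviation.
    """
    x = np.asarray(data, dtype=float)
    n_cases = x.shape[0]
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # Singular values of z give the eigenvalues of the correlation matrix
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigenvalues = s ** 2 / (n_cases - 1)
    explained_percent = 100 * eigenvalues / eigenvalues.sum()
    scores = u * s  # coordinates of the cases on the principal components
    return eigenvalues, explained_percent, scores
```

The first two columns of `scores` are the coordinates that a projection of the cases onto the factor plane would use. With five rows, the fifth eigenvalue is zero up to rounding error whatever the descriptor values are.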
2.5. Correlation with IC50 Values
The cytotoxic activity of indole-functionalized betulin derivatives was tested in vitro against amelanotic melanoma (A375, C32), triple-negative breast cancer (MDA-MB-231), estrogen receptor-positive breast cancer (MCF-7), lung cancer (A549) and colon adenocarcinoma (DLD-1, HT-29) cells [27,28].
A correlation analysis was also performed between the IC50 values, which describe the cytotoxic activity of the indole derivatives of betulin, and all the lipophilicity, ADME and physicochemical values. Because only three of the tested compounds (EB355A, EB366 and EB367) showed cytotoxic activity, the correlation was performed only for them. A further limitation was the set of cell lines against which these compounds were cytotoxic; the correlation therefore applied only to the MCF-7 cell line (estrogen receptor-positive breast cancer). The IC50 values of EB355A, EB366 and EB367 against MCF-7 cells were 67, 156 and 35 µM, respectively. The correlation analysis between the IC50 values for MCF-7 and the other analyzed values for these three compounds is presented in Table 8.
As Table 8 shows, the correlation coefficients were low. The highest value was obtained for the relationship between the IC50 for MCF-7 and milogP. Most of the analyzed values were calculated from the structure; hence, we could conclude that the structure of the compound is not so significant for the IC50 value. Likewise, the lack of a correlation with the experimental lipophilicity values (R_M0 and logP_TLC) indicated no dependence between IC50 and logP for these compounds.
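As a rough guide to how strong a correlation based on three compounds must be before it reaches conventional significance, the usual t-test for a Pearson coefficient can be worked through. This is a standard textbook relation, not a calculation reported by the authors, and the two-sided 0.05 threshold is an assumption.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad n = 3 \;\Rightarrow\; \nu = n - 2 = 1

t_{0.05,\;\nu=1} = 12.706 \;\Rightarrow\;
|r|_{\mathrm{crit}} = \frac{t}{\sqrt{t^{2}+\nu}}
                    = \frac{12.706}{\sqrt{12.706^{2}+1}} \approx 0.997
```

With only three points, a coefficient would therefore need to exceed roughly 0.997 in absolute value to be significant at this level.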
2.6. In Silico Analysis of the Toxicity of 3- and 28-Indolobetulin Derivatives
The development of new drugs is a costly and lengthy process that often fails because of the toxicity of the bioactive substance. Toxic effects are most often observed at late stages of drug development. Therefore, a rapid assessment of the toxicity of a chemical compound is needed in the early stages of research. Various in silico predictions are used for this purpose; among other things, they allow toxicity to be determined at the organ level (hepatotoxicity, nephrotoxicity, neurotoxicity, cardiotoxicity and skin toxicity) [33,34]. Toxicity predictions for the indole derivatives of betulin were performed using the web tool admetSAR 3.0 [35,36] (Table 9).
In the data obtained, a compound is classified as nontoxic when its label is 0 and as toxic when its label is 1. None of the compounds was predicted to have hepatotoxic, nephrotoxic, neurotoxic or skin-sensitizing effects. The in silico prediction indicated that only the 28-indolosubstituted derivative EB355A may exhibit cardiotoxic activity.