Modeling of Anticancer Sulfonamide Derivatives Lipophilicity by Chemometric and Quantitative Structure-Retention Relationships Approaches

Sulfonamides are a classic group of chemotherapeutic drugs with a broad spectrum of pharmacological action, including anticancer activity. In this work, reversed-phase high-performance liquid chromatography and biomimetic chromatography were applied to characterize the lipophilicity of sulfonamide derivatives with proven anticancer activities against human colon cancer. Chromatographically determined lipophilicity parameters were compared with logP values obtained using various computational approaches. Similarities and dissimilarities between experimental and computational logP were studied using principal component analysis, cluster analysis, and the sum of ranking differences. Furthermore, quantitative structure–retention relationship modeling was applied to understand the influence of the sulfonamides' molecular properties on lipophilicity and affinity to phospholipids.


Introduction
Due to inadequate pharmacokinetic properties, many drug candidates are rejected during clinical trials. Under those circumstances, besides biological activity, the physicochemical properties of putative drug molecules must be optimized at the early stage of drug development. These actions are carried out to achieve the desired in vivo drug metabolism and pharmacokinetics (DMPK) profile [1]. Among available screening methods, chromatography has a well-established position [2,3], especially in lipophilicity profiling.
Lipophilicity is a crucial physicochemical parameter of every molecule, affecting toxicity, absorption, distribution, metabolism, and drug elimination [4]. Retention in reversed-phase high-performance liquid chromatography (RP-HPLC) is governed by lipophilicity; consequently, retention data can be considered a surrogate of logP. The International Union of Pure and Applied Chemistry (IUPAC) and the Organization for Economic Co-operation and Development (OECD) consider RP-HPLC an equivalent method to the traditional shake-flask approach for logP estimation. Generally, the chromatographic technique is low-cost and fast, and it requires only a small amount of the target substances, which do not need to be pure because their impurities are readily separated during the chromatographic process.

Results and Discussion
It is well established that computational approaches for lipophilicity estimation have several advantages over experimental methods, including short calculation time and savings in chemical reagents. Furthermore, they allow lipophilicity to be predicted before synthesis; therefore, they can be applied to the design of potential drug candidates. These advantages make computational approaches desirable from economic and environmental points of view. Nevertheless, it should be emphasized that a crucial limitation of the computational methods is the significant difference between the logP values calculated for the same molecule using various theoretical approaches [33][34][35][36].
The same holds for the calculated logP values of the investigated sulfonamide derivatives. As can be observed in Table 1, the calculated logP values vary widely for a single substance. On average, the calculated logP values for one molecule differ by four logP units. The spread is most noticeable for molecule no. 21, where the difference between WLogP and iLogP reaches 7.3 logP units. These differences can be explained by the varied nature of the algorithms employed in the applied software programs [37]. The lowest calculated logP values are given by the iLogP descriptor (in 16 cases), followed by the MLogP descriptor (in 10 cases). The iLogP descriptor is a physics-based algorithm built on free energies of solvation, whereas the MLogP descriptor is based on a hybrid algorithm of topological and molecular properties. For both of these algorithms, every compound has a logP below five and therefore complies with the Lipinski rule of five [38]. On the other hand, in most cases, the highest calculated logP is given by the WLogP descriptor, which is based on a purely atomistic method. Among the tested derivatives, the most hydrophilic substance according to each algorithm is compound no. 1. Nevertheless, selecting the most lipophilic substance based on calculated logP is difficult. According to the descriptors WLogP (atomistic method), Silicos-IT LogP (hybrid fragmental/topological method), and Consensus LogP (average of all five predictions), the most lipophilic compound is no. 21. However, compounds 19, 27, and 11 are the most lipophilic according to the descriptors iLogP, XLogP3 (atomistic and knowledge-based method), and KOWWIN LogP (atom-based approach and fragmental contribution), respectively. Compound no. 27 is also the most lipophilic substance according to the MLogP descriptor.
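The disagreement between algorithms described above can be quantified per compound as the spread (maximum minus minimum) of the calculated values, next to their arithmetic mean, which is how the Consensus LogP is described in the text. The following is only a minimal sketch: the function name `summarize_logp` and the numerical values are illustrative assumptions and are not taken from Table 1.

```python
from statistics import mean


def summarize_logp(predictions):
    """Return (consensus, spread) for one compound's calculated logP values."""
    values = list(predictions.values())
    consensus = mean(values)            # arithmetic mean of the predictions
    spread = max(values) - min(values)  # disagreement between the algorithms
    return consensus, spread


# Hypothetical values for a single compound (NOT data from Table 1).
example = {
    "iLogP": 1.0,
    "XLogP3": 3.5,
    "WLogP": 5.0,
    "MLogP": 2.0,
    "Silicos-IT LogP": 3.5,
}

consensus, spread = summarize_logp(example)
print(f"consensus = {consensus:.2f}, spread = {spread:.2f} logP units")
```

A large spread for a given compound signals that the choice of algorithm matters for that structure, which is the situation reported for molecule no. 21.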
These contradictions suggest that it is still worth using experimental methods to assess the lipophilicity of drug candidates. For this reason, the next step of our study concerned the characterization of the investigated sulfonamide derivatives using a chromatographic approach. Generally, in RP-HPLC, the interactions between solutes and the stationary phase surface are mainly governed by lipophilicity [36,39,40]. Consequently, RP-HPLC is the primary tool for assessing the lipophilicity of xenobiotics [41]. Nevertheless, a single protocol for chromatographic lipophilicity measurement that can be applied to every chemical group does not exist. Generally, the chemical character of the target solutes determines which interactions between the molecule and the stationary phase may occur. Considering the large variety of RP-HPLC beds, which differ in specific surface area, chemical structure, and the energy of intermolecular interactions between the sorbent and solutes, four different reversed-phase stationary phases and an IAM column were investigated in this study. The lipophilicity index was expressed as logkw, i.e., the retention factor extrapolated to pure water, calculated using a protocol based on two gradient measurements and DryLab software [42,43]. In the case of IAM and C18 chromatography, we applied the protocols proposed by Valko and co-workers [44,45] and subsequently linearly transformed the data into logkIAM [46] and CHI logD [47]; for logkIAM, the following formula was used:
logkIAM = 0.045 × CHI IAM + 0.42 (1)
Since biomimetic chromatography experiments are carried out at physiological pH, all chromatographic experiments were carried out at pH 7.4 so that the results could be compared. The calculated pKa values indicate that the sulfonamide derivatives are in a charged state under these conditions. Consequently, the chromatographic lipophilicity indices refer to the distribution coefficient (logD), which better reflects the lipophilicity of ionized substances under physiological conditions. The chromatographically obtained lipophilicity indices of the sulfonamide derivatives are presented in Table 2. As might be expected, the obtained chromatographic indices are generally highly correlated (Figure 1). A high correlation was noticed between the IAM, C18, and C8 columns (r > 0.91). In contrast, significantly lower correlation coefficients were obtained between the Ph and CN columns and the above-mentioned stationary phases (0.76 < r < 0.89), indicating a significant difference in the sulfonamide retention mechanism on these beds.
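As an illustration of the two calculations mentioned above, the sketch below applies Equation (1) to a CHI IAM value and computes a Pearson correlation coefficient between two sets of chromatographic indices. The helper names `chi_iam_to_logk_iam` and `pearson_r` and all numerical inputs are assumptions made for this example; they are not measured data from Table 2.

```python
from math import sqrt


def chi_iam_to_logk_iam(chi_iam):
    """Equation (1): logkIAM = 0.045 * CHI IAM + 0.42."""
    return 0.045 * chi_iam + 0.42


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical CHI IAM values and hypothetical CHI logD (C18) values.
chi_iam = [10.0, 25.0, 40.0]
logk_iam = [chi_iam_to_logk_iam(v) for v in chi_iam]
chi_logd_c18 = [1.0, 2.0, 2.5]

r = pearson_r(logk_iam, chi_logd_c18)
print("logkIAM:", [round(v, 3) for v in logk_iam])
print("r(logkIAM, CHI logD) =", round(r, 3))
```

Repeating `pearson_r` for every pair of columns would give a correlation matrix of the kind summarized in Figure 1.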
Based on the experimental data, compound no. 1 turned out to be the least lipophilic substance. Thus, the calculation methods and the experimental methods agree in selecting the least lipophilic drug candidate among the tested sulfonamide derivatives. In contrast, according to the chromatographic experiments alone, the most lipophilic compound is no. 4. However, none of the calculation algorithms identified this molecule as highly lipophilic. Surprisingly, according to the calculation methods, it is one of the least lipophilic derivatives. These data indicate that experimental methods are still necessary in the drug development pipeline. Considering the balance between lipophilicity and anticancer activity, and assuming that a compound should have the lowest possible lipophilicity while maintaining its pharmacological action, molecules 1-4 and 26 can be recommended for further testing. Overall, the lipophilicity of the tested compounds is reasonably high. Compounds containing the 1-naphthyl substituent are particularly lipophilic. To obtain a lower, more desirable lipophilicity, the phenyl substituent should be preferred, because the analyzed compounds containing it had, as a rule, significantly lower lipophilicity.
Next, we aimed to analyze similarities and dissimilarities between chromatographic and computational lipophilicity indexes of the target molecules; therefore, the principal component analysis (PCA), cluster analysis (CA), and the sum of ranking differences (SRD) were performed.
The CA results are presented in Figure 2 as a heatmap. Two main groups of lipophilicity indices, computational and chromatographic, are clearly distinguished. One obvious explanation for this grouping may be that the calculation methods do not account for the ionization of the compounds that took place during the chromatographic experiments. Additionally, the heatmap provides information about the grouping of the compounds under study. Here, two groups can easily be separated: one includes only the pyrrole derivatives (molecules no. 1-7), and the second contains the other derivatives. Our strategy also allows us to compare anticancer activity, expressed as pIC50, with the computational and experimental lipophilicity indices. Interestingly, pIC50 is grouped close to the chromatographically determined lipophilicity indices. Therefore, we cautiously conclude that chromatographic indices can be more effective for predicting anticancer activity than computational lipophilicity indices.
Similar results are observed in the PCA (Figure 3), where the first two PCs distinguish the computational and experimental lipophilicity parameters. The normalized loadings plot (Figure 3) indicates that the computational indices load mainly on PC1, whereas the chromatographic parameters, except for IAM chromatography, load mainly on PC2. Interestingly, the IAM stationary phase has a significant impact on both PCs. The surface of IAM is mostly zwitterionic at pH 7.4, and positively charged choline moieties are located in the outer part of the IAM layer. In contrast, the phosphate groups are negatively charged at the same pH and are present in the inner part of the phase. This distinguishes the IAM phase from the other tested RP-HPLC beds.
The next step of our investigation concerned the application of SRD analysis to select the best and the worst approaches for lipophilicity measurement. A limitation of unsupervised chemometric tools such as PCA and CA is that they do not provide any statistical figures of merit for the performed analysis. Consequently, the SRD analysis was used to complete the data analysis and support the selection of the best lipophilicity indices.
The results of the SRD analysis are presented in Figure 4. The scaled SRD values are plotted on the x-axis and left y-axis, while the right y-axis gives the cumulated relative frequencies for random ranking (black curve). The graph indicates the following: (i) the Consensus LogP descriptor is placed closest to the reference ranking and can be considered the right choice for lipophilicity estimation of the studied series of compounds; (ii) the parameters logkwCN, CHI logDC18, and logkwPh are placed furthest from the reference ranking and are therefore the least suitable lipophilicity descriptors of the sulfonamide derivatives; (iii) specific groupings of the parameters can be observed: WLogP, Silicos-IT LogP, and XLogP3 have very close SRD values, and the same holds for logkIAM and LogP KOWWIN as well as for logkwCN, CHI logDC18, and logkwPh; (iv) overall, there are two clearly visible gaps (gray zones in Figure 4) dividing the lipophilicity parameters into three main groups according to their SRD values; (v) the group of in silico descriptors comprising Consensus LogP, WLogP, Silicos-IT LogP, XLogP3, and MLogP is closest to the reference ranking, while all the experimentally determined lipophilicity measures are placed far from it.
The SRD procedure was validated by 7-fold cross-validation. The results are presented in two ways in Figures 5 and 6. The clustering of the experimental and in silico lipophilicity descriptors of the sulfonamide derivatives in the space of the normalized SRD values (SRD%) obtained by 7-fold cross-validation is presented in the form of a dendrogram in Figure 7. Two main clusters are observable: cluster #1 contains two sub-clusters, 1a (CHI logDC18, logkwPh, and logkwCN) and 1b (logkwC8, logkIAM, LogP KOWWIN, and iLogP); cluster #2 contains the MLogP, XLogP3, Silicos-IT LogP, WLogP, and Consensus LogP parameters. The groupings of the parameters suggested by the SRD graph and the dendrogram agree with each other.
Additionally, the results of the 7-fold cross-validation of the SRD procedure are presented in Figure 6 as a box and whisker plot; the parameters are arranged in the same order as in the SRD graph. The Consensus LogP parameter has the lowest median SRD and is clearly the best lipophilicity parameter among the calculated and experimentally determined parameters. According to the Wilcoxon matched-pair test, the lipophilicity parameters separated by vertical dotted lines differ significantly at the 5% level, which agrees with the separation of the parameters suggested in the SRD graph (Figure 5) and on the dendrogram (Figure 6). The highest medians are observed for CHI logDC18, logkwPh, and logkwCN, the parameters with the highest SRD values. To conclude, the chromatographically determined lipophilicity measures do not outperform the computationally estimated lipophilicity parameters. Nevertheless, among the chromatographic lipophilicity parameters of the analyzed sulfonamide derivatives, logkIAM can be considered the best choice.
The next step of our study focused on QSRR modeling. This methodology, proposed by Kaliszan, relates retention to analyte structure [48]. On the one hand, QSRR models give insight into the molecular mechanism of retention and therefore help us understand which molecular properties govern the chromatographically determined lipophilicity. On the other hand, an established QSRR model supports lipophilicity prediction for similar structures. The theoretical descriptors that influence the retention factors were selected using GA-PLS. Briefly, GA is a stochastic approach that helps solve the variable selection problem. Therefore, the integration of GA with PLS may be helpful for the development of highly predictive and precise QSPR models. We chose partial least squares (PLS) as the regression method since it can handle highly correlated data, which are frequently observed in the case of molecular descriptors.
Finally, five GA-PLS QSRR models, each describing retention in one of the studied chromatographic systems, were calculated. Statistical figures of the obtained models are summarized in Table 3, whereas the contribution of the descriptors to the individual LVs is presented in Figure 7. The values of the theoretical descriptors and their descriptions are listed in Tables S7 and S8, respectively. The interpretation of the established models can be challenging since many of the applied descriptors are based on a complex matrix and weighted by different functions. Nevertheless, such holistic descriptors cover the molecular structure space more efficiently than a limited set of only a few mechanical descriptors. Holistic descriptors consider not only the presence of some chemical groups and pharmacophore fragments but also their relative position [49]. Several descriptors of well-recognized molecular properties that govern retention in reversed-phase chromatography can be found. Good examples are descriptors weighted by polarizability, such as R5p+, R6p+, and SpMin2_Bh(p), or the descriptor coding lipophilic pharmacophore points (SHED_LL).
Next, we aimed to analyze which type of lipophilicity index, computational or chromatographic, provides a better prediction of anticancer activity expressed as pIC50. The obtained PLS model confirmed the conclusion drawn from the CA, since logkCN is one of the most important descriptors. This finding suggests that experimentally determined lipophilicity outperforms computationally obtained lipophilicity in the screening of the biological activity of the tested sulfonamides. As expected, the obtained models also included other types of molecular descriptors. Generally, lipophilicity gives information related to drug membrane permeability, whereas steric and electrostatic properties are fundamental to the interaction between molecules and receptors. This statement is also supported by the proposed model, in which descriptors related to molecular charge (RPCG), polarizability (G2p), and geometrical properties (WHALES00_IR) significantly affect the anticancer activity of this class of chemicals. Nevertheless, comprehensive research on a larger group of compounds is needed to generalize this statement.
The statistical figures of the training and testing sets indicated that the obtained models are well fitted (R2, Q2, and RMSECV) and show suitable predictive performance (RMSEP). Additionally, the applicability domain (AD) assessment and y-randomization tests were performed for each model.
The applicability domain of the predictive models was assessed using a leverage approach (Figure S1 in Supplementary Materials), where the leverages (x-axis) are plotted against standardized residuals (y-axis) on a so-called Williams plot. The leverages are calculated from the descriptor matrix and then compared to a critical value h*, represented by a vertical dashed line. Additionally, standardized residuals of ±3σ (horizontal dashed lines) define the cut-off for acceptable predictions. The study revealed that in three models (B, C, D in Figure S1), one compound from each validation set (6, 26, and 4, respectively) had a leverage higher than the critical value (h*), indicating that its structure differs significantly from those of the training compounds. However, these compounds did not exceed the cut-off for acceptable predictions (±3σ) and showed very low residuals. Therefore, the developed models can be expected to provide reliable predictions, even when extrapolating to compounds that differ significantly in structure.
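The leverage check behind a Williams plot can be sketched as follows. To keep the example free of matrix algebra, it assumes a model with a single descriptor and an intercept, for which the leverage has the closed form h = 1/n + (x - mean)^2 / Sxx; the models in this work use several latent variables, so this is a simplification. The critical value h* = 3(p + 1)/n is a commonly used convention and is assumed here, since the text does not state how h* was computed. All names and numbers are hypothetical.

```python
def leverages(x_train, x_query):
    """Leverage of each query compound for a one-descriptor model with intercept."""
    n = len(x_train)
    x_mean = sum(x_train) / n
    sxx = sum((x - x_mean) ** 2 for x in x_train)
    return [1.0 / n + (x - x_mean) ** 2 / sxx for x in x_query]


def critical_leverage(n_train, n_params=1):
    """Assumed convention: h* = 3 (p + 1) / n."""
    return 3.0 * (n_params + 1) / n_train


def in_domain(h, std_residual, h_star, cutoff=3.0):
    """True if the point lies left of h* and within +/- cutoff residual units."""
    return h <= h_star and abs(std_residual) <= cutoff


# Hypothetical descriptor values for 10 training compounds and 2 test compounds.
x_train = [float(v) for v in range(1, 11)]
x_test = [5.5, 13.0]

h_star = critical_leverage(len(x_train))
h_test = leverages(x_train, x_test)
for x, h in zip(x_test, h_test):
    status = "above h*" if h > h_star else "within h*"
    print(f"x = {x}: leverage = {h:.3f} ({status}, h* = {h_star:.2f})")
```

A compound such as the second test point exceeds h* but can still be predicted acceptably if its standardized residual stays within ±3σ, which is the situation described above for compounds 6, 26, and 4.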
Each developed model was additionally subjected to a y-randomization test to confirm its robustness and to verify that the linear relationships did not arise by chance (Figure S1 in Supplementary Materials). In this test, the response variable (y) is permuted and the model is then rebuilt on the original descriptor matrix (X). We performed 200 permutations (random models) for each developed model. All tests confirmed that in every case the original model was robust and not derived by chance (the randomly generated models had significantly lower R2 and Q2 values).
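The logic of the y-randomization test can be sketched as below. For brevity, a one-descriptor least-squares line stands in for the GA-PLS model, and only R2 is computed (Q2 would additionally require cross-validation); both are simplifications. The function names, the seed, and the data are hypothetical and serve only to show the permutation step.

```python
import random


def r_squared(x, y):
    """R2 of a one-descriptor least-squares line (stand-in for the PLS model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_randomization(x, y, n_permutations=200, seed=0):
    """Return R2 of the original model and R2 of models built on permuted y."""
    rng = random.Random(seed)
    permuted_scores = []
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)          # break the link between X and y
        permuted_scores.append(r_squared(x, y_perm))
    return r_squared(x, y), permuted_scores


# Hypothetical descriptor (x) and retention (y) values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

original_r2, random_r2 = y_randomization(x, y)
print(f"original R2 = {original_r2:.3f}")
print(f"mean R2 of {len(random_r2)} permuted models = {sum(random_r2) / len(random_r2):.3f}")
```

If the original R2 lies far above the distribution of the permuted values, the relationship is unlikely to be a chance correlation.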

Sulfonamides Derivatives
The chemical names and SMILES notation of the target sulfonamide derivatives are presented in Table S1, whereas the 2D structures are shown in Figure 8. Their synthesis and characterization were described in the literature [27][28][29][30]. The 1H NMR and 13C NMR spectra of the target molecules are displayed in the Supplementary Materials. All samples were dissolved in DMSO (1 mg/mL). Each stock solution of analytes was stored at 2-8 °C between analyses. During this study, four chemical classes of sulfonamide derivatives were tested, including 1H-pyrrole derivatives, 5-oxo-1,2,4-triazine derivatives, 1,2,4-triazine derivatives, and N-acylbenzenesulfonamides. All sulfonamide derivatives have confirmed in vitro anticancer activities against human colon cancer (HCT-116), expressed as pIC50 and shown in Table 2.

Chromatographic Systems
All HPLC experiments were carried out using a Prominence-i LC-2030C 3D HPLC system (Shimadzu, Japan) equipped with a DAD detector and controlled by the LabSolutions system (version 5.90, Shimadzu, Japan). The stock solutions of solutes were diluted to obtain 100 µg/mL concentrations, and the injected volume was 10 µL. During this study, six different columns in terms of chemical modification of stationary phases were used. Retention times (tR) of the investigated sulfonamides were collected, and their detection in all systems was performed at the wavelength characteristic for each compound, summarized in Table S1. For all chromatographic systems, analysis was carried out at 40 °C, except for IAM chromatography, in which the oven temperature was set to 30 °C. The CHI IAM and CHI C18 indices of the target sulfonamide derivatives were obtained using the protocol proposed by Valko and co-workers [50]. C18 chromatography was performed on a Waters C18 column (150 mm × 3.9 mm; 5.0 µm; Symmetry; USA) with a 1.5 mL/min flow rate. The mobile phase was ammonium acetate buffer (50 mM) at pH 7.4 with acetonitrile (ACN) as the organic modifier. A linear gradient from 2 to 98% ACN was applied from 0 to 30 min. IAM chromatography was performed on an IAM.PC.DD2 column (100 mm × 4.6 mm; 10.0 µm; Regis Technologies, Morton Grove, IL, USA) additionally equipped with an IAM guard column, with the same flow rate of 1.5 mL/min. The mobile phase was sodium phosphate buffer (10 mM) at pH 7.4 with acetonitrile as the organic modifier. A linear gradient from 0 to 85% ACN was applied within 5.25 min.
For C8, cyanopropyl, and phenyl chromatography, logkw values (i.e., the retention factor logk extrapolated to 0% organic modifier, as an alternative to logP) were obtained according to the approach proposed by Snyder and co-workers [42,43]. For each system, two retention times were collected in two different gradients (short and long), and these data were used as input for DryLab 6.0 software (Molnar Institute, Berlin, Germany). The dwell volume of these HPLC systems was measured as 0.780 mL, whereas the dead times of the HPLC columns were 1.530 min, 1.252 min, and 2.335 min for C8, SB-CN, and UK-Phenyl, respectively. In each system, the mobile phase was ammonium acetate buffer (50 mM) at pH 7.4 with acetonitrile as the organic modifier. C8 chromatography was performed on a Unison UK-C8 column (150 × 2 mm; 3.0 µm; Imtakt; USA) with a 0.3 mL/min flow rate. Analyses were carried out in a linear gradient from 30 to 100% ACN within 15 min (short gradient) and 30 min (long gradient). The Agilent SB-CN column (150 × 4.6 mm; 3.5 µm; Zorbax; USA) was used for cyanopropyl chromatography. The CN analyses were carried out in a linear gradient from 20 to 100% ACN, applied from 0 to 20 min (short gradient) and from 0 to 40 min (long gradient). The flow rate was 1.5 mL/min. The Unison UK-Phenyl column (150 × 2 mm; 3.0 µm; Imtakt; USA) was used for phenyl chromatography. The flow rate was set to 0.2 mL/min. The phenyl analyses were carried out in a linear gradient from 40 to 100% ACN, applied from 0 to 20 min (short gradient) and from 0 to 40 min (long gradient). Each HPLC analysis was run in triplicate; the obtained retention times are listed in Tables S2-S6, whereas representative chromatograms are shown in Figure 9.

Theoretical Descriptors
The theoretical descriptors were calculated using alvaDesc software [51], based on geometries optimized with the universal force field (UFF) in OpenBabel software [52]. Before QSRR analysis, constant and near-constant descriptors were removed. MolGpka was applied for the calculation of pKa (https://xundrug.cn/molgpka, accessed on 1 March 2022). Finally, 3352 descriptors belonging to 31 classes were calculated. CA and PCA were performed on datasets that included chromatographic data and in silico calculated lipophilicity indices. To eliminate the impact of the different lipophilicity scales, data were standardized before analysis. CA was performed using Ward's agglomeration rule and the Euclidean distance measure, and the results were presented as clustered heat maps. PCA and CA, including visualization, were performed using Python scripts.
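The preprocessing described above, standardization followed by Euclidean distances between lipophilicity scales, can be sketched with the Python standard library alone. The authors' scripts are not shown in the text, so this is not their implementation; the use of the sample standard deviation, the variable names, and the data are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def standardize(column):
    """Z-score one lipophilicity scale (sample standard deviation assumed)."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


def euclidean(a, b):
    """Euclidean distance between two standardized scales."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Hypothetical values of three lipophilicity scales for four compounds.
scales = {
    "logkIAM": [0.5, 1.0, 1.5, 2.0],
    "CHI logD": [10.0, 20.0, 30.0, 40.0],
    "WLogP": [4.0, 1.0, 3.0, 2.0],
}

z = {name: standardize(values) for name, values in scales.items()}
names = list(z)
dist = {
    (a, b): euclidean(z[a], z[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}
for pair, d in dist.items():
    print(pair, round(d, 3))
```

After standardization, scales that differ only by a linear transformation coincide (distance of zero), which is why scaling removes the effect of the different units. The resulting distances could then be passed to a hierarchical clustering routine with Ward linkage, for example `scipy.cluster.hierarchy.linkage` with `method="ward"`.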

Sum of Ranking Differences (SRD) Analysis
The SRD analysis, introduced by Héberger [53–55], was carried out on the standardized lipophilicity data to rank, group, and select the most suitable lipophilicity measures of the studied series of sulfonamide derivatives obtained by in silico and experimental (chromatographic) approaches. The analysis was performed using a data matrix in which the lipophilicity measures are organized in the columns and the compounds are listed in the rows. The last column contained the row averages, which were used as the reference ranking. This so-called "consensus approach" measures the differences from the center as a non-parametric measure of similarities and dissimilarities [55]. The results of the ranking are interpreted based on the SRD values. The best objects (models, descriptors, molecules, etc.) are those with SRD values equal or closest to zero, i.e., the objects closest to the reference ranking ("gold standard") [55]. The SRD procedure was validated by comparison of ranks with random numbers (CRRN) and by 7-fold cross-validation, in which about 1/7 of the objects were omitted and the ranking was carried out on the remaining objects [54,55]. The normalized SRD values (SRD%) were used to compare results from different SRD analyses.
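The core SRD calculation described above can be sketched as follows. This is a minimal illustration and not the software used in the study: the function names are hypothetical, ties are assumed to share their mean rank, and SRD% is assumed to be normalized by the maximum possible SRD for n compounds (n²/2 for even n, (n² − 1)/2 for odd n). The CRRN and cross-validation steps are not included.

```python
def average_ranks(values):
    """Rank values in ascending order (1-based); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value equals the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def srd_percent(matrix):
    """Normalized SRD (in %) for every column of the data matrix.

    matrix: list of rows; rows = compounds, columns = standardized
    lipophilicity measures. The row average is the reference ranking.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    reference = [sum(row) / n_cols for row in matrix]
    ref_ranks = average_ranks(reference)
    # Integer division covers both the even and the odd case.
    srd_max = (n_rows ** 2) // 2
    result = []
    for j in range(n_cols):
        col_ranks = average_ranks([row[j] for row in matrix])
        srd = sum(abs(a - b) for a, b in zip(col_ranks, ref_ranks))
        result.append(100.0 * srd / srd_max)
    return result
```

A column that orders the compounds exactly like the row averages receives SRD% = 0, whereas a column with the fully reversed order receives SRD% = 100.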

QSRR Analysis
Descriptor selection was supported by a genetic algorithm (GA), whereas partial least squares (PLS) regression was employed as the regression method. The QSRR models were built using the retention data and the calculated descriptors as dependent and independent variables, respectively. The GA was controlled by the population size (500) and the mutation rate (0.1). The models were built using three latent variables (LVs) and five structural descriptors in each variable. Model optimization was based on R2. Before the GA-PLS calculation for each modeled endpoint, the target solutes were randomly divided into a training set (n = 18) and a testing set (n = 9). The training set always contained the molecules with the highest and lowest values of the modeled endpoint. The assignment of each compound to the training or testing set is given in Figure S1.
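The random division into training and testing sets, with the extreme endpoint values forced into the training set, can be sketched as follows. This is an illustration only: the function name `split_with_extremes`, the `seed` argument, and the use of Python's `random` module are assumptions, since the text does not describe how the split was implemented.

```python
import random


def split_with_extremes(y, n_train, seed=0):
    """Randomly split compound indices into training and testing sets.

    The compounds with the lowest and highest endpoint value are always
    placed in the training set; the rest are assigned at random.
    y: endpoint values (e.g., retention data), one per compound.
    Returns (train_indices, test_indices), both sorted.
    """
    if not 2 <= n_train <= len(y):
        raise ValueError("n_train must be between 2 and len(y)")
    indices = list(range(len(y)))
    forced = {min(indices, key=lambda i: y[i]),
              max(indices, key=lambda i: y[i])}
    rest = [i for i in indices if i not in forced]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(rest)
    n_extra = n_train - len(forced)
    train = sorted(forced | set(rest[:n_extra]))
    test = sorted(rest[n_extra:])
    return train, test
```

With 27 compounds and `n_train=18`, the function returns 18 training and 9 testing indices, matching the set sizes reported in the text.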
Model fit and predictive ability were assessed with the following statistics, each calculated from its defining formula: the coefficient of determination (R2), the external validation coefficients (Q2F1, Q2F2, Q2F3), the root-mean-square error of cross-validation (RMSECV), the root-mean-square error of prediction (RMSEP), and the concordance correlation coefficient (CCC). The same procedures were used to build the quantitative structure-activity relationship model, where pIC50 was used as the dependent variable, and the independent variables were the lipophilicity parameters, both computational and chromatographic, and the remaining structural descriptors.
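The external validation statistics listed above can be sketched as follows. The definitions used here are the ones commonly found in the QSAR/QSRR literature; whether the study used exactly these forms is an assumption, and all function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _press(y_obs, y_pred):
    """Sum of squared prediction errors."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))


def q2_f1(y_train, y_test, y_pred):
    """Q2F1: test-set deviations taken from the training-set mean."""
    mean_train = _mean(y_train)
    ss = sum((o - mean_train) ** 2 for o in y_test)
    return 1.0 - _press(y_test, y_pred) / ss


def q2_f2(y_test, y_pred):
    """Q2F2: test-set deviations taken from the test-set mean."""
    mean_test = _mean(y_test)
    ss = sum((o - mean_test) ** 2 for o in y_test)
    return 1.0 - _press(y_test, y_pred) / ss


def q2_f3(y_train, y_test, y_pred):
    """Q2F3: mean squared test error relative to training-set variance."""
    mean_train = _mean(y_train)
    ss_train = sum((o - mean_train) ** 2 for o in y_train) / len(y_train)
    return 1.0 - (_press(y_test, y_pred) / len(y_test)) / ss_train


def rmse(y_obs, y_pred):
    """Root-mean-square error (RMSECV or RMSEP, depending on the input)."""
    return math.sqrt(_press(y_obs, y_pred) / len(y_obs))


def ccc(y_obs, y_pred):
    """Concordance correlation coefficient."""
    mean_obs, mean_pred = _mean(y_obs), _mean(y_pred)
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(y_obs, y_pred))
    denom = (sum((o - mean_obs) ** 2 for o in y_obs)
             + sum((p - mean_pred) ** 2 for p in y_pred)
             + len(y_obs) * (mean_obs - mean_pred) ** 2)
    return 2.0 * cov / denom
```

The three Q2 variants differ only in the reference variance, so they can disagree noticeably when the testing set is small or is distributed differently from the training set.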

Conclusions
Both PCA and CA showed significant differences between the chromatographically determined and the calculated lipophilicity values. Although the grouping can be explained by ionization, which the calculation methods do not always take into account, the significant differences between the values calculated by different algorithms undermine the credibility of results obtained through the computational approach. SRD indicated that, among the chromatographic lipophilicity parameters of the analyzed sulfonamide derivatives, log kIAM could be considered the best choice. Consequently, IAM-HPLC can be recommended as the most sustainable method for the lipophilicity characterization of this class of chemical compounds. From a practical point of view, assuming that a compound should have the lowest possible lipophilicity while maintaining pharmacological action, compounds 1-4 and 26 can be recommended on the basis of the ratio between lipophilicity and anticancer activity.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/molecules27133965/s1. Table S1. The SMILES of the studied sulfonamide derivatives, the chemical class of their substituents, and their IC50 values; Table S2. The obtained retention times together with CHI IAM for the target sulfonamide derivatives in IAM chromatography; Table S3. The obtained retention times and CHI C18 for the target sulfonamide derivatives in C18 chromatography; Table S4. The obtained retention times and log kw for the target sulfonamide derivatives in C8 chromatography; Table S5. The obtained retention times and log kw for the target sulfonamide derivatives in cyanopropyl chromatography; Table S6. The obtained retention times and log kw for the target sulfonamide derivatives in phenyl chromatography; Table S7. List of molecular descriptors used to build QSRR models; Table S8. The values of theoretical descriptors for each compound; Figure S1. Williams plots and y-randomization tests (Q2 vs. R2) generated for each model. The letters correspond to the following stationary phases and endpoints: (A) IAM; (B) C18; (C) C8; (D) CN; (E) Ph; and (F) pIC50.

Conflicts of Interest:
The authors declare no conflict of interest.