Lipophilicity Determination of Antifungal Isoxazolo[3,4-b]pyridin-3(1H)-ones and Their N1-Substituted Derivatives with Chromatographic and Computational Methods

The lipophilicity of a molecule is well recognized as a crucial physicochemical factor that conditions the biological activity of a drug candidate. This study aimed to evaluate the lipophilicity of isoxazolo[3,4-b]pyridin-3(1H)-ones and their N1-substituted derivatives, which demonstrated pronounced antifungal activities. Several methods, including reversed-phase thin layer chromatography (RP-TLC), reversed-phase high-performance liquid chromatography (RP-HPLC), and micellar electrokinetic chromatography (MEKC), were employed. Furthermore, logP values were calculated using various freely and commercially available software packages and online platforms, as well as density functional theory (DFT) computations. Similarities and dissimilarities between the determined lipophilicity indices were assessed using several chemometric approaches. Principal component analysis (PCA) indicated that features other than lipophilicity affect the antifungal activities of the investigated derivatives. Quantitative structure-retention relationship (QSRR) analysis by means of genetic algorithm-partial least squares (GA-PLS) was implemented to rationalize the link between the physicochemical descriptors and lipophilicity. Among the studied compounds, structure 16 should be considered the best starting structure for further studies, since it demonstrated the lowest lipophilic character within the series while retaining biological activity. Sum of ranking differences (SRD) analysis indicated that the chromatographic approach, regardless of the technique employed, should be considered the best approach for lipophilicity assessment of isoxazolones.


Introduction
Lipophilicity is one of the essential factors that determine the biological activity of drug candidates. It determines not only the transport of molecules through biological membranes but also their ability to bind to blood proteins and receptors [1]. Knowledge of lipophilicity helps us understand pharmacokinetic properties, including absorption, distribution, metabolism, and excretion (ADME) processes, as well as toxicity [2,3].
Although lipophilicity has been used since the 1960s as one of the main parameters in medicinal chemistry, a standardized, high-throughput, and reliable analytical procedure for its assessment is still needed [4]. According to the International Union of Pure and Applied Chemistry (IUPAC), the operational definition of lipophilicity is the affinity of a molecule or a moiety for a lipophilic environment. It is commonly measured by its distribution behavior in a biphasic system, either liquid-liquid or solid-liquid [5].
The direct "shake-flask" method proposed by Hansch and co-workers, which employs n-octanol-water partitioning, is laborious and time-consuming and requires large amounts of organic solvent and very pure substances. For these reasons, chromatographic methods based on solid-liquid partitioning, such as reversed-phase liquid chromatography (RP-LC), and electromigration techniques, including micellar electrokinetic chromatography (MEKC) and microemulsion electrokinetic chromatography (MEEKC), are currently gaining popularity [6][7][8][9][10]. Simultaneously, in silico approaches to calculating the partition coefficient (logP) are being intensively developed.
Computational methods, which provide quick information on lipophilicity, are commonly used to screen chemical libraries. In silico approaches have considerable advantages over experimental methods: they are significantly faster and cheaper, since they require neither laboratory experiments nor specialized equipment and chemical reagents. Another advantage of the computational approach is the possibility of estimating lipophilicity during the design of drug candidates, prior to their synthesis. Several programs are dedicated to logP calculation; most of them are freely available online and generate results nearly instantaneously.
In addition, recent reports have demonstrated the usefulness of ab initio methods and density functional theory (DFT) for lipophilicity assessment [11,12]. The latter approach is based on the calculated differences in the Gibbs free energy of the solvated molecule between the two phases, i.e., n-octanol and water.
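The working equation is not printed here; the expression below is the standard thermodynamic relation that such DFT schemes typically rely on, shown as an assumption rather than as the authors' exact formulation. ΔG°solv,w and ΔG°solv,o denote the solvation free energies of the solute in water and in n-octanol, R is the gas constant, and T is the absolute temperature:

$$
\log P_{o/w} = \frac{\Delta G^{\circ}_{\mathrm{solv,w}} - \Delta G^{\circ}_{\mathrm{solv,o}}}{2.303\,RT}
$$

A solute that is better solvated by n-octanol (more negative ΔG°solv,o) therefore receives a positive logP.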
The investigated isoxazolo[3,4-b]pyridin-3(1H)-one derivatives (Figure 1) exhibit moderate antibacterial properties and pronounced antifungal activity (MIC < 6.2 µg/mL for Candida parapsilosis) [15,21,22]. The search for new antifungal drugs has become a vital topic because of the development of resistance to currently used antifungal drugs, as well as their high toxicity, which together make the treatment of fungal infections a major challenge for modern medicine.
This work is part of an extensive study aimed at assessing the physicochemical properties of pyrido-isoxazolone derivatives with proven antifungal activity. For lipophilicity evaluation, two modes of RP-LC were used, namely reversed-phase thin layer chromatography (RP-TLC) and reversed-phase high-performance liquid chromatography (RP-HPLC). MEKC was also employed for lipophilicity assessment. The calculated logP values were obtained with various freely and commercially available software packages and online platforms that implement theoretical models based on purely atomic and/or fragment contribution approaches and property-dependent methods, as well as with DFT computations. Chemometric methodology was applied in order to select the optimal tools for lipophilicity assessment of the studied isoxazolone derivatives. Finally, quantitative structure-retention relationship (QSRR) analysis was performed in order to rationalize the link between the physicochemical properties of the target compounds and their lipophilicity.

Results and Discussion
Taking into account the importance of lipophilicity from a medicinal chemistry perspective, the present study compares the performance of methods for lipophilicity determination for the aforementioned group of drug candidates. Table 1 lists all the computational logP parameters obtained for the tested compounds using particular software programs. Chemical names of the target compounds are given in Table S1, whereas 2D structures are presented in Figure 1.

Lipophilicity Estimation by Computational Methods
Significant differences in the minimum, maximum, and mean logP values were found depending on the theoretical approach applied (Table S2). Furthermore, not all of the obtained lipophilicity parameters correlate with each other, as indicated by the correlation matrix (supplementary data Excel file, worksheet "correlation matrix"). These differences can be explained by the diverse nature of the algorithms employed in the software programs used. Generally, three main groups of theoretical methodologies can be distinguished: the atomic approach, the fragment contribution technique, and property-dependent methods. The classification of the investigated software by algorithm type is presented in Table 2.
The lowest mean logP value among the investigated software platforms was obtained using the Villar algorithm implemented in Spartan (LogPV Spartan). This method employs semi-empirical wave functions and a calculated overlap matrix, which includes the type and number of lone pairs, along with the surface area of selected atoms (H, C, N, O, F, S, and Cl). Surprisingly, this program gave the maximum logP value of 4.72 for compound 12 (the pyrido-isoxazolone substituted with a methylsulfonyl group). This result contradicts the logP value of 0.69 calculated for the same structure with the alternative Ghose-Crippen algorithm available in Spartan (LogPC Spartan). In fact, according to this atom-based method, which is parameterized for 110 atom types and includes correction factors, the same compound is the most hydrophilic in the series. Another striking example is compound 5, for which the logP obtained with fragment contribution-based software is 3.51 (MlogP), whereas the value calculated by the hybrid KOWWIN algorithm, which combines an atom-based approach with a fragment contribution framework, is significantly lower (1.88).
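The two discrepancies quoted above (compounds 12 and 5) can be screened for systematically by computing, for each compound, the spread of logP estimates across programs. The sketch below is illustrative only: it uses just the four values quoted in this paragraph, and the helper names and the 1 log-unit threshold are hypothetical choices, not part of the study's workflow.

```python
# logP values quoted in the text: compound number -> {method: logP}
quoted_logp = {
    12: {"LogPV (Spartan, Villar)": 4.72, "LogPC (Spartan, Ghose-Crippen)": 0.69},
    5: {"MlogP": 3.51, "KOWWIN": 1.88},
}


def logp_spread(values):
    """Range (max - min) of the logP estimates for one compound."""
    return max(values) - min(values)


def flag_discordant(data, threshold=1.0):
    """Return {compound: spread} for compounds whose estimates differ by more
    than `threshold` log units (the threshold is a hypothetical choice)."""
    flagged = {}
    for compound, methods in data.items():
        spread = logp_spread(list(methods.values()))
        if spread > threshold:
            flagged[compound] = round(spread, 2)
    return flagged


if __name__ == "__main__":
    print(flag_discordant(quoted_logp))
```

Applied to the full Table 1, the same check would highlight every structure for which the choice of algorithm changes the lipophilicity estimate by more than the chosen tolerance.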
These results indicate that computational methods can be highly unreliable for certain functional groups; consequently, the same chemical structure can be assigned completely different logP values depending on the algorithm applied.
Such discrepancies between logP values calculated for the same structure are one of the major drawbacks of computational methods. This observation explains why experimental methods are still preferred over computational approaches and underscores the importance of proper method selection in lipophilicity assessment.
Alternatively, ab initio methods based on DFT wave functions and solvation models can be used to calculate the change in solvation free energy upon transfer of the analyte between the n-octanol and water phases. This approach is considerably less popular than the other theoretical techniques, and only a few reports have demonstrated the usefulness of DFT calculations for estimating partition coefficients. These publications concern lipophilicity assessment of various groups of chemical species, such as alcohols [12], ruthenium(II)-arene complexes [23], and organophosphate-type pesticides [11]. Table 3 summarizes the logP parameters calculated with the investigated DFT methods. Among the tested functionals, a good overall agreement between the DFT-calculated logP values and the chromatographically determined lipophilicity indices (logkw) was found for PBE0 6-311G++(2df,2dp) (R = 0.680, p = 0.005). Compounds 11, 9, and 12 were identified as outliers by the 2.5σ rule. After their exclusion from the model, the correlation coefficient (R) increased considerably (R = 0.895, p = 0.005). These results suggest that further development of ab initio methods may substantially improve this lipophilicity assessment procedure. The major problem in this approach is the selection of a functional suitable for molecules bearing particular chemical or functional groups. A better understanding of this subject, as well as the use of self-learning algorithms such as artificial neural networks to select an appropriate functional, may contribute to the wider adoption and improvement of the ab initio lipophilicity assessment approach.
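The outlier step described above can be sketched as follows. This minimal illustration assumes that the 2.5σ rule is applied to the residuals of an ordinary least-squares fit of logkw against DFT-calculated logP; the exact residual definition used in the study is not stated, and the function name is hypothetical.

```python
import numpy as np


def fit_with_sigma_outliers(x, y, k=2.5):
    """Fit y = a + b*x by least squares, flag points whose absolute residual
    exceeds k standard deviations of the residuals, and report Pearson R
    before and after excluding them.

    Returns (r_all, r_clean, outlier_indices)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)                 # slope, intercept
    resid = y - (a + b * x)
    sd = resid.std(ddof=2)                     # two fitted parameters (assumption)
    outliers = np.flatnonzero(np.abs(resid) > k * sd)
    keep = np.setdiff1d(np.arange(x.size), outliers)
    r_all = np.corrcoef(x, y)[0, 1]
    r_clean = np.corrcoef(x[keep], y[keep])[0, 1]
    return r_all, r_clean, outliers


if __name__ == "__main__":
    # Hypothetical data: a perfect line with one strongly deviating point
    x = np.arange(20, dtype=float)
    y = x.copy()
    y[10] += 10.0
    print(fit_with_sigma_outliers(x, y))
```

In practice x would hold the PBE0 logP values from Table 3 and y the logkw values from Table 4; those data are not reproduced here.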

Estimation of Lipophilicity by Chromatographic Methods
In parallel with computational methodologies, indirect (mostly chromatographic) approaches have been used for lipophilicity assessment [9,10]. This is because lipophilicity governs the retention of molecules in RP-LC, which is one of the most popular modes of LC separation in pharmaceutical chemistry [24].
The pyrido-isoxazolone derivatives were analyzed by RP-LC and micellar electrokinetic chromatography (MEKC). The experimentally determined chromatographic lipophilicity indices are summarized in Table 4. Two reversed-phase liquid chromatography setups were used in the present study, namely RP-TLC and RP-HPLC. RP-TLC analyses were performed on two reversed-phase stationary phases, i.e., C8- and C18-modified silica. Methanol-water mixtures were used as mobile phases, following Komsta's recommendation [25]. However, in order to attain reasonable retention of the investigated compounds on C18-bonded silica gel, higher methanol contents in the mobile phase were required than on C8-modified silica. In general, regular chromatographic behavior of the target structures was observed: the retardation factor (RF) increased systematically with increasing methanol fraction in the mobile phase. As expected, the behavior of the analytes on both types of plates is well described by the Snyder-Soczewiński equation, as confirmed by high values of the correlation coefficient (R) and determination coefficient (R2), significant Fisher-Snedecor F-test values (F), and small standard errors of estimate (s). The statistics for the linearity of the Snyder-Soczewiński equation are summarized in Table S3.
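For readers less familiar with RP-TLC data treatment, the sketch below shows how RM0 and m are usually obtained: RF values are converted to RM = log(1/RF - 1) and regressed linearly on the methanol volume fraction φ, RM = RM0 + mφ. The sign convention (m negative, so that RM0 is the extrapolation to pure water) and the helper names are assumptions made for illustration.

```python
import numpy as np


def rm_from_rf(rf):
    """Convert retardation factors to R_M = log10(1/R_F - 1)."""
    rf = np.asarray(rf, dtype=float)
    return np.log10(1.0 / rf - 1.0)


def fit_snyder_soczewinski(phi, rf):
    """Fit R_M = R_M0 + m * phi for one compound.

    phi: methanol volume fractions; rf: R_F values measured at those fractions.
    Returns (R_M0, m, r), where r is the Pearson correlation of R_M with phi."""
    phi = np.asarray(phi, dtype=float)
    rm = rm_from_rf(rf)
    m, rm0 = np.polyfit(phi, rm, 1)            # slope first, then intercept
    r = np.corrcoef(phi, rm)[0, 1]
    return rm0, m, r


if __name__ == "__main__":
    # Hypothetical R_F values for one compound at four mobile-phase compositions
    phi = [0.5, 0.6, 0.7, 0.8]
    rf = [0.24, 0.39, 0.56, 0.72]
    print(fit_snyder_soczewinski(phi, rf))
```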
Subsequently, the intercorrelation between m and RM0 was verified in order to evaluate the hypothesis that, from a chromatographic point of view, the analytes can be regarded as a group of structurally similar compounds. Linear correlations between the slope and intercept of the Soczewiński-Wachtmeister equation were established. The results clearly indicate that the investigated isoxazolone derivatives constitute a series of chromatographically related congeners.
RP-TLC lipophilicity measurements are routinely expressed as RM0 values. However, other RP-TLC lipophilicity indices, such as the mean RM value, C0, and PC1, can be calculated from the retention data. Parameter m characterizes the hydrophobic character of the studied compounds. All experimentally determined RP-TLC lipophilicity and hydrophobicity constants are listed in Table 4. The correlation matrix of all the assessed parameters for the two investigated stationary phases is presented in the supplementary data sheet file. In general, all the obtained lipophilicity parameters are highly correlated, indicating that they carry similar information.
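The alternative indices mentioned above can all be derived from one matrix of RM values (compounds × mobile-phase compositions). The sketch below assumes the usual definitions: C0 = -RM0/m (the φ at which RM = 0), the mean RM over all compositions, and PC1 as the score on the first principal component of the column-centered RM matrix. It also returns the m versus RM0 correlation used to check for chromatographic congenericity. Function and key names are hypothetical, and the sign of PC1 is arbitrary.

```python
import numpy as np


def tlc_indices(phi, rm_matrix):
    """Derive RP-TLC lipophilicity indices from an R_M matrix.

    phi: (k,) methanol volume fractions.
    rm_matrix: (n_compounds, k) R_M values, one row per compound."""
    phi = np.asarray(phi, dtype=float)
    rm = np.asarray(rm_matrix, dtype=float)
    # np.polyfit fits every column of y at once, so pass the (k, n) transpose
    m, rm0 = np.polyfit(phi, rm.T, 1)
    c0 = -rm0 / m                        # phi at which R_M = 0 (R_F = 0.5)
    mean_rm = rm.mean(axis=1)
    centered = rm - rm.mean(axis=0)      # column-centered, not scaled (assumption)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1 = u[:, 0] * s[0]                 # scores on the first principal component
    r_m_rm0 = np.corrcoef(m, rm0)[0, 1]  # congenericity check
    return {"RM0": rm0, "m": m, "C0": c0, "mean_RM": mean_rm,
            "PC1": pc1, "r(m, RM0)": r_m_rm0}


if __name__ == "__main__":
    # Hypothetical R_M values for three compounds at four compositions
    phi = [0.5, 0.6, 0.7, 0.8]
    rm = [[0.0, -0.2, -0.4, -0.6],
          [0.5, 0.2, -0.1, -0.4],
          [1.0, 0.6, 0.2, -0.2]]
    print(tlc_indices(phi, rm))
```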
Only the hydrophobicity parameter m shows significantly lower correlations with the mean RM, C0, and PC1. Furthermore, the RP-TLC lipophilicity indices were found to correlate with the HPLC-determined logkw. This result suggests that, for this class of chemical compounds, RP-TLC can be used as a pilot method for optimizing their RP-HPLC separations. Surprisingly, the RP-LC chromatographic indices and logkMEKC were uncorrelated, although a correlation was expected on the theoretical assumption that retention in both techniques depends on lipophilicity [26]. Although the MEKC background electrolyte (BGE) contained TRIS/HEPES buffer at pH 7.4, the investigated structures should not undergo ionization, except for compounds 1, 14, and 23. Since the pKa of compound 14 is 6.9 [15], at alkaline pH these isoxazolones undergo acid dissociation, forming anionic species. In addition, owing to the presence of two ambident nucleophilic nitrogen atoms (N1 and N7 for 14, or N1 and N9 for 1 and 23), these compounds can exist as mixtures of the prototropic 1H-oxo and 7H/9H-oxo tautomers. DFT calculations, however, reveal that in solvents of high relative permittivity (e.g., water, DMF), the 7H/9H-oxo forms are thermodynamically favored over the 1H-oxo tautomers [15,21].
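A quick Henderson-Hasselbalch estimate shows why ionization matters for compounds 1, 14, and 23. Assuming compound 14 behaves as a simple monoprotic acid with the pKa of 6.9 quoted above (ignoring tautomerism, ionic strength, and any micelle-induced pKa shift), roughly three quarters of it would be present as the anion in the pH 7.4 BGE:

```python
def fraction_ionized_acid(pka, ph):
    """Fraction of a monoprotic acid present as the anion (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


if __name__ == "__main__":
    # pKa of compound 14 from the text; pH of the MEKC background electrolyte
    print(f"{fraction_ionized_acid(6.9, 7.4):.2f}")  # 0.76
```

Whether the same fraction applies to compounds 1 and 23 depends on their own pKa values, which are not given here.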
The differences between the retention of the studied isoxazolone derivatives in MEKC and RP-LC can be explained by the chemical character of the sodium dodecyl sulfate (SDS) micelles. SDS is an anionic surfactant that forms negatively charged micelles. Consequently, for the investigated structures, interactions between solutes and micelles other than lipophilic/hydrophobic ones can occur. This phenomenon is examined in detail in the section dedicated to QSRR.
In general, the chromatographic lipophilicity parameters are roughly correlated with the logP values calculated using both DFT and software (correlation matrix in the supplementary data Excel file). The highest correlation coefficients (R) were observed between RM0 and m (C8 plates) and ClogP (ChemDraw). At the same time, the lack of correlation between logkMEKC and the calculated logP values should be emphasized. The results indicate that the most lipophilic compound within the studied series is compound 2, i.e., the N1-benzyl-substituted quinolino-isoxazolone, which exhibited the highest logkw and RM0 values in the HPLC and C18 TLC measurements. Likewise, compounds 6, 11, 13, and 18, bearing large alkyl and benzyl substituents, displayed pronounced lipophilic properties. The least lipophilic character was shown by derivative 14, i.e., the pyridino-isoxazolone without any substituent on the ring nitrogen atom. The correlation between the lipophilic properties and antifungal activities of the studied compounds was also analyzed. The previously reported MIC data for five Candida species (C. albicans ATCC 10231, C. glabrata ATCC 66032, C. lusitaniae ATCC 34499, C. parapsilosis ATCC 22019, and C. tropicalis ATCC 750), available for all compounds except structure 13 [15,21,22], are presented in Table S4. Notably, none of the lipophilicity indices, either calculated or chromatographically determined, correlated with antifungal activity. The most active compounds, 14, 15, and 16, show similar antifungal properties despite their diverse lipophilicity (logkw ranging from 2.81 to 3.70). These results indicate that molecular properties other than lipophilicity affect the antifungal activities of the studied isoxazolo[3,4-b]pyridin-3(1H)-ones and suggest a non-specific mode of antifungal action.

Multivariate Analysis of Antifungal Activity, Chromatographic Lipophilicity Indices, and Computationally Estimated logP Values
In order to investigate the differences between chromatographic and computational lipophilicity measures and the antifungal activities of the studied compounds, principal component analysis (PCA) was performed. PCA is a basic multivariate technique that provides insight into the data structure, the similarities and dissimilarities of variables, the disposition of objects, their tendencies to group, and outlying effects; it is therefore usually carried out during the data exploration step. PCA transforms a large number of variables into a much smaller set of new orthogonal variables called principal components (PCs). The results of PCA are usually illustrated as two-dimensional plots of objects (score plots) or of variables (loading plots). Figure 2 presents the loading plot of PC1 versus PC2, in which the investigated lipophilicity indices, both chromatographically determined and calculated, are projected together with antifungal activity. Compound 13 was excluded from this analysis because microbiological data for this derivative were not available.
The first two PCs accounted for 65.46% of the overall data variability. Along PC1, the greatest differences are observed between the RP-TLC parameters m and PC1 (for both stationary phases) and the other investigated parameters. The remaining RP-TLC parameters, as well as logkw determined by HPLC, are located together with the computationally estimated logP values. PC2, however, sets apart the logP values calculated by means of DFT. Only the experimentally obtained logkMEKC is located centrally, distant from the other lipophilicity parameters. Furthermore, all microbiological data, expressed as 1/log MIC, are grouped closely together, albeit at a considerable distance from the other lipophilicity parameters, both experimental and computational.
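A loading plot such as Figure 2 can be reproduced, in outline, from an SVD of the autoscaled data matrix (compounds as rows; lipophilicity indices and 1/log MIC values as columns). The sketch assumes autoscaling and reports loadings as variable-component correlations; the scaling and software actually used are not stated here, so treat this as illustrative.

```python
import numpy as np


def pca_loadings(X, n_components=2):
    """PCA of an objects x variables matrix after autoscaling.

    Returns (loadings, explained_variance_ratio) for the first n_components.
    Constant columns are not allowed (division by zero in autoscaling)."""
    X = np.asarray(X, dtype=float)
    # Autoscaling: zero mean and unit variance for every variable
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Loadings scaled so that they equal correlations between variables and PC scores
    loadings = vt[:n_components].T * s[:n_components] / np.sqrt(X.shape[0] - 1)
    return loadings, explained[:n_components]


if __name__ == "__main__":
    # Hypothetical data: 25 compounds, three variables (two nearly collinear)
    rng = np.random.default_rng(1)
    base = rng.normal(size=25)
    X = np.column_stack([base, 2 * base + 0.01 * rng.normal(size=25),
                         rng.normal(size=25)])
    loadings, explained = pca_loadings(X)
    print(np.round(loadings, 3), np.round(explained, 3))
```

Variables that carry the same information, like the two collinear columns in the usage example, end up with nearly identical loadings and therefore plot next to each other.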

Comparison of Computationally and Chromatographically Derived Lipophilicity Indices by the Sum of Ranking Differences (SRD)
Computational and chromatographic methods were analyzed by the sum of ranking differences in order to rank, group, compare, and select the best lipophilicity measures. Although exploratory, unsupervised chemometric tools such as PCA and cluster analysis may reveal similarities among lipophilicity indices [27], they do not allow selection of the best and worst methods. Moreover, PCA deals with only a fraction of the data variability (that captured by the retained components), whereas SRD uses the entire information content. The superiority of SRD for comparing lipophilicity measures has been demonstrated on several occasions [28][29][30].
For the SRD-CRRN ranking of standardized lipophilicity measures (Figure 3a), the methods with the smallest SRD values, i.e., those closest to the consensus, are XlogP3, m, and RM0 obtained on C8-modified silica. These are closely followed by the remaining chromatographic and computational estimates, most of which are located on the left side of the plot, far from the random distribution curve, meaning that their rankings are significantly better than random. However, the retention data obtained under MEKC conditions are the worst measures, grouped together with the logP values estimated by ChemDraw.
They fall within the random distribution curve and rank the studied compounds according to their lipophilic character no better than chance. A similar ranking was observed for interval-scaled and rank-transformed data. Sevenfold cross-validation, followed by ascending ordering of the SRD medians and non-parametric pairwise comparisons of methods (sign test and matched-pairs test), reveals two large groups of lipophilicity indices separated at the predefined significance level of p = 0.05 (Figure 3b). The first, much smaller group contains only three chromatographic indices and most of the common computational methods. The second group contains the remaining chromatographic indices, all of the DFT methods, and the remaining computational approaches. Only four indices (VlogP, AClogP, logP ChemDraw, and logkMEKC) are located at the very end; they are separated from each other, and from the rest of the methods, at the predefined significance level of p = 0.05.
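The core of the SRD procedure can be sketched in a few lines: autoscale each method's column, take the row-wise mean as the consensus reference, rank the compounds by the reference and by each method, and sum the absolute rank differences. Results are expressed as a percentage of the maximum attainable SRD, floor(n²/2) for n objects, and can be compared with SRD values of random permutations (the CRRN step). This simplified illustration assumes untied data; it does not reproduce the validated SRD implementation used in the study, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def srd_scores(X, reference=None):
    """Sum of ranking differences (% of the maximum) for each column of X.

    X: objects x methods matrix. Columns are autoscaled; unless a reference
    vector is supplied, the row-wise mean (consensus) is used as reference."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ref = Z.mean(axis=1) if reference is None else np.asarray(reference, float)
    n = X.shape[0]
    ref_rank = rankdata(ref)
    srd = np.array([np.abs(rankdata(Z[:, j]) - ref_rank).sum()
                    for j in range(Z.shape[1])])
    max_srd = (n * n) // 2               # floor(n^2 / 2) for untied ranks
    return 100.0 * srd / max_srd


def random_srd_distribution(n, n_perm=10000, seed=0):
    """SRD (%) of random rankings of n objects, used to judge whether a
    method ranks the objects better than chance."""
    rng = np.random.default_rng(seed)
    ref = np.arange(1, n + 1)
    max_srd = (n * n) // 2
    vals = np.array([np.abs(rng.permutation(ref) - ref).sum()
                     for _ in range(n_perm)])
    return 100.0 * vals / max_srd


if __name__ == "__main__":
    # Hypothetical data: two concordant methods and one reversed method
    X = np.column_stack([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1]])
    print(srd_scores(X))
    print(np.percentile(random_srd_distribution(5), 5))
```

A method whose SRD lies below roughly the 5th percentile of the random distribution would be regarded as ranking the compounds significantly better than chance.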
Since sevenfold cross-validation introduces variability in the SRD values, this variability can be decomposed into the factors that may affect the SRD ranking, and their significance can be tested by ANOVA. In this case, 651 SRD values were collected (31 lipophilicity measures × 3 data pretreatment methods × 7 repetitions) and subjected to ANOVA. A full interaction model without quadratic terms (Equation (3)) was defined for the two factors of particular interest.
The statistical significance of the factors is summarized in Table 5. Only the type of lipophilicity descriptor significantly affects the SRD scores. As expected, data pretreatment, as well as the interaction term, is statistically insignificant at p = 0.05.
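The variance decomposition described above corresponds to a balanced two-way ANOVA with interaction (31 levels of the method factor, 3 levels of data pretreatment, 7 cross-validation repetitions). The minimal sketch below computes the sums of squares, F statistics, and p-values directly; it assumes a balanced design, omits the Fisher and Tukey post hoc tests, and is not the statistical package used in the study.

```python
import numpy as np
from scipy.stats import f as f_dist


def two_way_anova(y):
    """Balanced two-way ANOVA with interaction.

    y: array of shape (a, b, n): levels of factor A x levels of factor B x replicates.
    Returns {term: (SS, df, F, p)} for A, B, AxB and error."""
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    mean_a = y.mean(axis=(1, 2))         # factor A level means
    mean_b = y.mean(axis=(0, 2))         # factor B level means
    mean_ab = y.mean(axis=2)             # cell means
    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    ss_ab = n * np.sum((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
    ss_e = np.sum((y - mean_ab[:, :, None]) ** 2)
    df = {"A": a - 1, "B": b - 1, "AxB": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms_e = ss_e / df["error"]
    out = {}
    for term, ss in (("A", ss_a), ("B", ss_b), ("AxB", ss_ab)):
        f_val = (ss / df[term]) / ms_e
        out[term] = (ss, df[term], f_val, f_dist.sf(f_val, df[term], df["error"]))
    out["error"] = (ss_e, df["error"], np.nan, np.nan)
    return out


if __name__ == "__main__":
    # Hypothetical SRD-like scores: strong method effect, no pretreatment effect
    rng = np.random.default_rng(0)
    y = np.linspace(0, 30, 31)[:, None, None] + rng.normal(size=(31, 3, 7))
    for term, (ss, dof, f_val, p) in two_way_anova(y).items():
        print(term, dof, round(ss, 1), f_val, p)
```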
Plotting the means of the factor levels with 95% confidence intervals (Figure 4) reveals that the SRD scores are lowest for the chromatographic descriptors obtained on C8- and C18-modified silica, regardless of the chromatographic technique employed (TLC or HPLC). Since lower SRD scores indicate better methods, these chromatographic methods should be considered the best for lipophilicity estimation. MEKC gives the highest SRD values, i.e., it is the worst method for lipophilicity determination. All computational methods, including those based on DFT, show comparable performance. Fisher's post hoc test differentiates between all three groups at the predefined significance level of p = 0.05, whereas Tukey's honestly significant difference (HSD) test only separates MEKC from the rest.

Figure 4. Factor effects presented as level arithmetic means and 95% confidence intervals (vertical bars). SRD scores are plotted on the y-axis; F1 is depicted as lines of different colors, while F2 is plotted on the x-axis.

Quantitative Structure Retention Relationships Analysis of Chromatographically Derived Lipophilicity Indices
To predict the retention factors and gain insight into the retention mechanism for the four techniques used to estimate lipophilicity indices (RP-TLC on C8 and C18 plates, HPLC, and MEKC), QSRR analysis was employed. First, applying the stringent criteria detailed in the experimental section reduced the initial matrix of 2848 descriptors to 44, 43, 43, and 38 descriptors for the TLC C8 RM0, C18 RM0, HPLC logkw, and logkMEKC QSRR models, respectively. After 1000 iterations of GA-PLS, each of the first three consensus models comprised 17 molecular descriptors, whereas the logkMEKC QSRR model comprised 12. Figure 5A-D depicts the percentage of selection of each molecular descriptor (%Selection) over 1000 iterations of GA-PLS for each modeled end point. The number of PLS latent variables (LVs) was comprehensively optimized. The optimal number of LVs for the first three GA-PLS QSRR models was four (RMSECV of 0.311, 0.353, and 0.197, respectively), whereas for the logkMEKC model it was five, with an RMSECV of 0.104 (Figure S1). All the models were strongly statistically significant (CV-ANOVA, p < 0.0001; Tables S5-S8) and exhibited strong predictive ability, as evident from Figure 6A. The LOO-CV predictive performance for the training sets is shown in Figure S2A-D. For all the models except the consensus MEKC GA-PLS model (Figure S2D), the trend of LOO-CV (training set) predictive performance matches the predictive performance trends shown in Figure 5. The inconsistency between the training RMSE and the LOO-CV RMSECV (training set) for the MEKC GA-PLS model points to a potential overestimation of the training error by LOO-CV. These results indicate that the established models can be successfully used to predict the chromatographic indices of pyrido-isoxazolone derivatives from computational descriptors; nevertheless, the MEKC GA-PLS model should be applied with care.
Another benefit of QSRR analysis is the possibility of gaining insight into the molecular mechanism of retention. For this purpose, the molecular descriptors that affect the chromatographic lipophilicity indices were determined. Table S9 summarizes the full names and classes of the descriptors used to build the GA-PLS QSRR models.
In the case of RP-LC, the importance of two groups of molecular descriptors should be highlighted: geometry, topology, and atom-weights assembly (GETAWAY) descriptors and weighted holistic invariant molecular (WHIM) descriptors. GETAWAY descriptors encode 3D molecular geometry, provided by the molecular influence matrix (MIM), and atom relatedness given by molecular topology, with chemical information introduced through various atomic weightings (atomic mass, polarizability, van der Waals volume, and electronegativity, together with unit weights) [31]. These descriptors are built on the MIM, a matrix representation of the molecule (hydrogen atoms included) constructed from the centered Cartesian coordinates x, y, z [32]. WHIM descriptors also belong to the group of 3D molecular descriptors; they capture information on the whole 3D structure in terms of size, shape, symmetry, and atom distribution [33]. Their calculation is based on PCA of the centered molecular coordinates. Although these two groups of descriptors are similar, WHIM descriptors provide a holistic representation, whereas GETAWAY descriptors better represent information on portions of the molecular structure [34]. It should also be emphasized that descriptors weighted by polarizability, such as E2p, GATS5p, R3p, R4p, and R7p+, contributed significantly to retention in RP-LC.
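To make the two descriptor families more concrete, the sketch below computes the molecular influence matrix H = M(MᵀM)⁻¹Mᵀ from centered coordinates (its diagonal holds the atomic leverages used by GETAWAY descriptors) and the eigenvalues of the weighted covariance matrix of the coordinates, which is the PCA step underlying WHIM descriptors. It is a simplified illustration that assumes 3D coordinates are already available; dedicated descriptor software derives many further indices from these quantities, and the function names are hypothetical.

```python
import numpy as np


def molecular_influence_matrix(coords):
    """GETAWAY-style molecular influence matrix H = M (M^T M)^+ M^T.

    coords: (n_atoms, 3) Cartesian coordinates, hydrogens included.
    The pseudo-inverse keeps the function usable for planar or linear molecules."""
    M = np.asarray(coords, dtype=float)
    M = M - M.mean(axis=0)               # centered coordinates
    return M @ np.linalg.pinv(M.T @ M) @ M.T


def whim_eigenvalues(coords, weights=None):
    """Eigenvalues (descending) of the weighted covariance matrix of the
    centered coordinates, i.e. the PCA step underlying WHIM descriptors."""
    X = np.asarray(coords, dtype=float)
    w = np.ones(len(X)) if weights is None else np.asarray(weights, dtype=float)
    centre = (w[:, None] * X).sum(axis=0) / w.sum()
    Xc = X - centre
    cov = (w[:, None] * Xc).T @ Xc / w.sum()
    return np.sort(np.linalg.eigvalsh(cov))[::-1]


if __name__ == "__main__":
    # Hypothetical coordinates of a 12-atom molecule
    rng = np.random.default_rng(0)
    coords = rng.normal(size=(12, 3))
    print(np.round(np.diag(molecular_influence_matrix(coords)), 3))  # leverages
    print(np.round(whim_eigenvalues(coords), 3))
```

Passing atomic polarizabilities or electronegativities as `weights` mimics, in simplified form, the property-weighted variants (suffixes p and e) mentioned in the text.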
The obtained GA-PLS QSRR models also indicate differences between the RP-LC and MEKC retention of the studied isoxazolone derivatives. In the case of MEKC, the crucial descriptors that affect retention are HATS1e and CATS3D_08_AA. HATS1e belongs to the GETAWAY class and is weighted by Sanderson electronegativity; hence, these descriptors may explain the differences between the retention of the studied compounds in RP-LC and MEKC. For example, variability in HATS1e may reflect interactions between the negatively charged SDS micelles and the electronegative groups of the solutes. The other descriptor that significantly affects retention under MEKC conditions belongs to the CATS3D class. The chemically advanced template search (CATS) descriptors were introduced as a pharmacophore/biophore model based on the cross-correlation of generalized atom types [35]. The CATS3D descriptors used in the GA-PLS models include information on hydrogen-bond acceptor interactions (CATS3D_08_AA) and on the lipophilic character of a molecule (CATS3D_03_LL).
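The CATS3D idea of counting pairs of pharmacophoric atom types within interatomic distance bins can be sketched as follows. The type letters follow the usual CATS convention (D donor, A acceptor, P positive, N negative, L lipophilic), but the type assignment, the 1 Å bin width, and the rule that bin k collects distances in [k, k+1) Å are assumptions made for illustration and may not match Dragon's exact implementation; the counts are left unscaled.

```python
import itertools

import numpy as np

TYPE_ORDER = "DAPNL"  # donor, acceptor, positive, negative, lipophilic


def cats3d_like(coords, atom_types, max_bin=19):
    """Count atom-type pairs per distance bin (assumed 1 A bins).

    coords: (n_atoms, 3) array; atom_types: list of sets such as {"A"} or {"D", "A"}.
    Returns a dict keyed like 'CATS3D_08_AA'.
    """
    coords = np.asarray(coords, float)
    counts = {}
    for i, j in itertools.combinations(range(len(coords)), 2):
        d = float(np.linalg.norm(coords[i] - coords[j]))
        k = int(d)  # assumed binning: bin k collects distances in [k, k+1) A
        if k > max_bin:
            continue
        for ti in atom_types[i]:
            for tj in atom_types[j]:
                pair = "".join(sorted(ti + tj, key=TYPE_ORDER.index))
                key = f"CATS3D_{k:02d}_{pair}"
                counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical example: two acceptors 8.2 A apart and one lipophilic atom
xyz = [[0.0, 0.0, 0.0], [8.2, 0.0, 0.0], [3.0, 0.0, 0.0]]
types = [{"A"}, {"A"}, {"L"}]
counts = cats3d_like(xyz, types)
```

In this toy case the acceptor pair contributes one count to the bin-8 acceptor-acceptor entry, which is the kind of information a descriptor such as CATS3D_08_AA summarizes for a real molecule.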

Analytes
The synthesis of the investigated compounds has been reported previously [15,21,22]. Briefly, compound 14 was obtained via condensation of N-hydroxy-3-(hydroxyamino)-3-iminopropanamide with acetylacetone in the presence of piperidine [15], while compounds 1 and 23 were synthesized through a multi-step procedure from aniline and 2,5-dimethoxyaniline, respectively [21,36]. Derivatives 2-5 and 6-13, 15-18 were synthesized by acylation, alkylation, or sulfonation of compounds 1 [15,22] and 14 [21], respectively, while compounds 19-22 and 24-26 were obtained through methylation of the corresponding isoxazolones [21]. The chemical names and structures of the investigated compounds are presented in Table S1, whereas their SMILES notations are listed in the Supplementary Data Excel file (worksheet "SMILES notation"). The target compounds were dissolved in DMSO at a concentration of 1 mg mL−1. The stock solutions of the analytes were stored at 2-8 °C prior to analysis.

RP-TLC Analysis
RP-TLC experiments were performed on ready-to-use C18 and C8 plates (20 cm × 10 cm) with F254 fluorescence indicator, manufactured by Merck (Darmstadt, Germany). The chromatographic chambers (Twin Trough Chambers; CAMAG, Philadelphia, PA, USA) were saturated with mobile phase vapors for 30 min. Aliquots (5 µL) of the stock solutions of the analytes were spotted manually onto the plates using a micropipette from Brand (Wertheim, Germany). The mobile phases were prepared by mixing appropriate volumes of methanol and water, with the methanol content ranging from 40 to 90% (v/v) for the C8 bonded silica and from 60 to 100% (v/v) for the C18 bonded silica stationary phase, in steps of 10% (v/v). Chromatograms were developed in ascending mode at room temperature (20 ± 2 °C) to a solvent front distance of 8 cm. The plates were then dried in a stream of warm air for 5 min. Spots were visualized under UV light at λ = 254 nm using a CAMAG UV Lamp 4 and Viewing Box 4 (Philadelphia, PA, USA).
Subsequently, the Soczewiński-Wachtmeister equation [37], which describes the linear relationship between the concentration of organic solvent in the mobile phase (C) and the R_M value, was used to determine the basic RP-TLC lipophilicity parameter R_M^0 and the hydrophobic constant m:
$$R_M = R_M^0 + mC$$
Another RP-TLC lipophilicity parameter, C_0, introduced by Bieganowska [38], was calculated according to the formula
$$C_0 = -\frac{R_M^0}{m}$$
This metric corresponds to the parameter φ_0 (the isocratic chromatographic lipophilicity index) originally introduced for HPLC. C_0 is the concentration of the organic component in the mobile phase at which the analyzed substance is distributed equally (1:1) between the mobile and stationary phases [38].
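The Soczewiński-Wachtmeister regression and the derived C_0 index (C_0 = -R_M^0/m) can be computed with an ordinary least-squares line fit. The sketch below assumes that R_M is obtained from the measured R_F values as R_M = log10(1/R_F - 1) and that C is expressed as a volume fraction of methanol; the data are synthetic, not measured values from this study.

```python
import numpy as np


def rm_from_rf(rf):
    """R_M = log10(1/R_F - 1) for retardation factors 0 < R_F < 1."""
    rf = np.asarray(rf, float)
    return np.log10(1.0 / rf - 1.0)


def soczewinski_wachtmeister(c, rm):
    """Fit R_M = R_M0 + m*C; return (R_M0, m, C0) with C0 = -R_M0/m."""
    m, rm0 = np.polyfit(np.asarray(c, float), np.asarray(rm, float), 1)  # slope, intercept
    return rm0, m, -rm0 / m


# Synthetic data following R_M = 3.0 - 4.0*C exactly
c = np.array([0.6, 0.7, 0.8, 0.9])
rm = 3.0 - 4.0 * c
rm0, m, c0 = soczewinski_wachtmeister(c, rm)   # approx. 3.0, -4.0, 0.75
```

With real plates, `rm` would come from `rm_from_rf` applied to the R_F values measured at each methanol concentration; R_F = 0.5 gives R_M = 0, which is why C_0 marks the 1:1 distribution point.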
Moreover, PCA was performed according to the protocol proposed by Sarbu and co-workers [39] to obtain the scores on the first principal component (PC1) as another estimate of TLC lipophilicity. The data matrices comprised the R_M values of the solutes (rows) at the different modifier concentrations in the mobile phase (columns).
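The PCA-based TLC lipophilicity index can be sketched with a singular value decomposition of the solutes × modifier-concentration R_M matrix. Column-centering without scaling and the sign convention (PC1 oriented to increase with the mean R_M of each solute) are assumptions of this sketch and may differ in detail from the protocol of Sarbu and co-workers; the matrix is synthetic.

```python
import numpy as np


def first_pc_scores(rm_matrix):
    """PC1 scores and explained-variance fraction for a solutes x concentrations R_M matrix."""
    X = np.asarray(rm_matrix, float)
    Xc = X - X.mean(axis=0)                 # column-centering (assumed preprocessing)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, 0] * s[0]
    # Orient PC1 so that it increases with the mean R_M of each solute
    if np.corrcoef(scores, X.mean(axis=1))[0, 1] < 0:
        scores = -scores
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, explained


# Synthetic matrix: four solutes, R_M = base - 4*C at four methanol fractions
base = np.array([0.5, 1.0, 2.0, 3.5])
c = np.array([0.6, 0.7, 0.8, 0.9])
rm_matrix = base[:, None] - 4.0 * c[None, :]
scores, explained = first_pc_scores(rm_matrix)
```

Because every solute in this toy matrix shares the same slope, the centered matrix has rank one and PC1 carries all of the variance; with real data the explained fraction indicates how well a single lipophilicity axis summarizes the plates.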

HPLC Analysis
In this study, we used a Prominence-i LC-2030C 3D HPLC system (Shimadzu, Japan) equipped with a diode-array detector (DAD) and controlled by LabSolutions software (version 5.90; Shimadzu, Japan). The analyte concentrations were approximately 100 µg/mL, and the injection volume was 20 µL. RP-HPLC experiments were performed on a Knauer C18 column (100 mm × 4.6 mm, 5 µm) with a linear gradient of 20-98% phase B (phase A, water; phase B, methanol) at a flow rate of 1 mL/min. The column temperature was maintained at 30.0 °C. Two gradient runs differing in gradient time (tG = 20 and 40 min) were performed, and the retention times (tR) of the investigated isoxazolone derivatives were recorded. Solutes were detected at 290 nm. These data were used as input, and the corresponding logkw values (i.e., the retention factor logk extrapolated to 0% organic modifier, used as an alternative to logP) were calculated with the DryLab 6.0 software (Molnar Institute, Berlin, Germany) based on the approach proposed by Snyder and co-workers [40,41]. Each HPLC run was performed in duplicate. The dwell volume of the HPLC system was 0.780 mL, and the dead time of the HPLC column used was 1.401 min.
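The conversion of two gradient retention times into logkw was done in DryLab; the underlying idea, Snyder's linear solvent strength (LSS) model log k = log kw - S·φ, can be sketched in plain Python. The sketch uses the simplified gradient retention equation tR = (t0/b)·log10(2.3·k0·b + 1) + t0 + tD with b = t0·Δφ·S/tG, assumes the solute elutes during the gradient and does not migrate during the dwell time, and is not DryLab's algorithm. The conditions in the example (t0 = 1.401 min, tD = 0.780 mL / 1 mL/min, φ0 = 0.20, Δφ = 0.78) are taken from the text, whereas logkw = 3.0 and S = 4.0 are hypothetical.

```python
import math

LN10 = math.log(10.0)


def gradient_tr(logkw, S, tG, t0, tD, phi0, dphi):
    """Forward LSS model: retention time (min) in a linear gradient."""
    b = t0 * dphi * S / tG                    # gradient steepness
    k0 = 10.0 ** (logkw - S * phi0)           # retention factor at the initial composition
    return (t0 / b) * math.log10(2.3 * k0 * b + 1.0) + t0 + tD


def _log10_k0(tr, S, tG, t0, tD, dphi):
    """Invert the gradient equation for log10 k0 at a trial value of S."""
    b = t0 * dphi * S / tG
    x = (tr - t0 - tD) * b / t0               # equals log10(2.3*k0*b + 1)
    return math.log10(math.expm1(x * LN10) / (2.3 * b))


def lss_from_two_gradients(tr1, tG1, tr2, tG2, t0, tD, phi0, dphi,
                           s_lo=1e-3, s_hi=50.0, n_iter=200):
    """Find (logkw, S) such that both runs imply the same k0 (bisection on S)."""
    def mismatch(S):
        return (_log10_k0(tr1, S, tG1, t0, tD, dphi)
                - _log10_k0(tr2, S, tG2, t0, tD, dphi))

    lo, hi = s_lo, s_hi
    if mismatch(lo) * mismatch(hi) > 0:
        raise ValueError("S is not bracketed by [s_lo, s_hi]")
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mismatch(lo) * mismatch(mid) <= 0:
            hi = mid
        else:
            lo = mid
    S = 0.5 * (lo + hi)
    logkw = _log10_k0(tr1, S, tG1, t0, tD, dphi) + S * phi0
    return logkw, S


# Conditions from the HPLC section; logkw = 3.0 and S = 4.0 are hypothetical
cond = dict(t0=1.401, tD=0.780 / 1.0, phi0=0.20, dphi=0.78)
tr20 = gradient_tr(3.0, 4.0, tG=20.0, **cond)
tr40 = gradient_tr(3.0, 4.0, tG=40.0, **cond)
logkw, S = lss_from_two_gradients(tr20, 20.0, tr40, 40.0, **cond)
```

Two runs are the minimum needed because each run gives one equation in the two unknowns (logkw and S); the round trip in the example simulates retention times from known parameters and recovers them.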

MEKC Analysis
All MEKC experiments were carried out with a P/ACE MDQ Plus system (Sciex, Framingham, MA, USA). Electropherograms were recorded and analyzed with the 32 Karat software (version 10.2). Uncoated fused-silica capillaries (50 µm i.d., total length 60 cm; Polymicro Technologies, West Yorkshire, UK) were used throughout the study. At the start of every working day, the capillary was rinsed with 0.1 M NaOH for 30 min, then with ultrapure water for 10 min, and finally with BGE for 30 min. Between analyses, the capillary was conditioned with BGE for 2 min. All rinsing steps were performed at 345 kPa. The investigated isoxazolone derivatives were dissolved in BGE at 100 µg/mL with the addition of quinine (micelle marker) and DMSO (EOF marker). Samples were introduced into the capillary by hydrodynamic injection (35 kPa for 5 s). Separations were carried out in duplicate at 20 kV (positive polarity) and a constant temperature of 25 ± 0.1 °C. The BGE consisted of an aqueous solution of 50 mM SDS and 120 mM HEPES/100 mM Tris buffer, pH 7.4. Detection was carried out at 200 and 250 nm with a data acquisition rate of 8 Hz. The logarithm of the retention factor, logk_MEKC, was calculated from the equation proposed by Terabe and co-workers:
$$k = \frac{t_R - t_0}{t_0\left(1 - t_R/t_{mc}\right)}$$
where t_R, t_0, and t_mc are the migration times of the analyte, the EOF marker, and the micelle marker, respectively.
Lipophilicity parameters logPC and logPV (AM1) implemented in the Spartan'08 package (www.wavefun.com) were calculated according to the models of Ghose, Pritchett, and Crippen [42] and Villar [43,44], respectively. Additional logP and ClogP indices were obtained from the ChemDraw suite. The VlogP parameter was obtained using Bernard Testa's virtual logP calculator available online: https://nova.disfarm.unimi.it/vlogp.htm (accessed on 21 September 2019). All the calculated logP values are summarized in Table 1.
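The MEKC retention factor described for the MEKC experiments follows the Terabe relation k = (tR - t0) / [t0·(1 - tR/tmc)], where t0 is the migration time of the EOF marker (DMSO) and tmc that of the micelle marker (quinine). A minimal sketch, with hypothetical migration times:

```python
import math


def log_k_mekc(t_r, t_0, t_mc):
    """log10 of the MEKC retention factor (Terabe); times in any common unit."""
    if not (t_0 < t_r < t_mc):
        raise ValueError("expected t_0 < t_r < t_mc (analyte between EOF and micelle markers)")
    k = (t_r - t_0) / (t_0 * (1.0 - t_r / t_mc))
    return math.log10(k)


# Hypothetical migration times (min): EOF marker 2.0, analyte 4.0, micelle marker 10.0
value = log_k_mekc(4.0, 2.0, 10.0)   # log10(5/3)
```

The range check reflects the finite elution window of MEKC: a neutral solute must migrate between the EOF marker and the micelle marker, and k grows without bound as tR approaches tmc.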

DFT Calculations
All calculations were performed with the Gaussian 16 package using default thresholds and algorithms. The standard hybrid Becke-3-Lee-Yang-Parr (B3LYP) functional [45], the parameter-free Perdew-Burke-Ernzerhof hybrid functional (abbreviated as PBE0 or PBE1PBE) [46], and the long-range-corrected hybrid functionals CAM-B3LYP [47] and ωB97XD [48] were used. Bulk solvent effects were taken into account by means of the Solvation Model based on Density (SMD) [49]. A selection of standard basis sets was used in the course of this study, including 6-31+G, 6-311+G(d,p), and 6-311++G(2df,2pd). The ground-state geometries of all molecules were optimized with the inclusion of solvent effects. Vibrational analysis was used to verify that the optimized structures correspond to local minima on the potential energy surface (no imaginary frequencies). Gibbs free energies, including zero-point, thermal, and vibrational corrections, were computed for standard conditions (T = 298.15 K, P = 1.0 atm) within the harmonic oscillator approximation. The theoretical logarithm of the partition coefficient (logP_DFT) was calculated from the Gibbs free energies of each solute in water (G_w) and n-octanol (G_o) according to the following equation:
$$\log P_{DFT} = \frac{G_{w} - G_{o}}{2.303RT}$$
where R is the gas constant and T is the temperature. The calculated logP_DFT values are presented in Table 3.
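Assuming logP_DFT is obtained from the difference between the solute's Gibbs free energies computed with SMD in water and in n-octanol, the conversion from Gaussian's hartree output can be sketched as follows. The conversion factor and gas constant are standard values, ln(10) replaces the rounded 2.303, and the energies in the example are hypothetical.

```python
import math

HARTREE_TO_KCAL = 627.5095   # kcal/mol per hartree
R_KCAL = 1.987204e-3         # gas constant, kcal/(mol K)


def logp_dft(g_water, g_octanol, temperature=298.15):
    """logP = (G_water - G_octanol) / (ln(10) R T), with G values in hartree."""
    dg = (g_water - g_octanol) * HARTREE_TO_KCAL  # > 0 when the solute prefers octanol
    return dg / (math.log(10.0) * R_KCAL * temperature)


# Hypothetical SMD Gibbs free energies (hartree) of one solute in the two solvents
example = logp_dft(-663.512345, -663.514567)
```

At 298.15 K, one logP unit corresponds to about 1.36 kcal/mol, i.e., roughly 0.0022 hartree, so small absolute errors in the solvation energies translate into noticeable logP shifts.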

Principal Component Analysis (PCA)
Principal component analysis (PCA) was performed on two data sets using the Statistica 10 software package (StatSoft, Tulsa, OK, USA). The first data set comprised the data matrices (solutes × modifier concentrations) obtained from the TLC separations. The second PCA matrix included the lipophilicity indices obtained with a variety of indirect and theoretical methods, together with the antifungal activities. Because these data are expressed in different units, they were autoscaled (zero mean, unit variance) to eliminate the impact of divergent scales.

Sum of Ranking Differences (SRD) Analysis
Sum of ranking differences (SRD) analysis was performed on the lipophilicity data using Microsoft Excel Visual Basic macros freely available from http://aki.ttk.mta.hu/srd/. Although the SRD methodology is described in detail by Héberger and Kollár-Hunek in their source papers [50][51][52], a short summary is provided here. In SRD analysis, the studied compounds and the lipophilicity estimation methods are arranged in the rows and columns of a data matrix, respectively. Because the lipophilicity measures were expressed on different scales, they were transformed to a common scale before the SRD analysis. As there are many ways to rescale the data, we tested the three most frequently employed strategies: standardization, i.e., scaling to unit standard deviation (STD); interval scaling between 0 and 1 (IS); and rank transformation (RNK). To rank and compare the lipophilicity methods, SRD requires an additional column in the data matrix, the so-called reference vector. This vector can be a series of row-wise minima, maxima, or arithmetic means computed across all studied methods, or a series of gold-standard reference values. Here, the row-wise arithmetic means were used as the benchmark (consensus-based comparison). Consensus-based comparison has several advantages. First, every lipophilicity estimation method suffers from systematic and random errors, which are at least partially cancelled out by averaging. Second, according to the maximum likelihood principle, the arithmetic mean is the value with the highest probability of being the true value. After the reference column is added, all values in the data matrix are ranked column-wise, and for each method the absolute differences between its ranks and the reference ranks are summed to give its SRD value. The SRD values are then normalized according to Equation (8), i.e., expressed as a percentage of the maximum attainable SRD value:
$$SRD_{norm} = \frac{SRD}{SRD_{max}} \times 100\%$$
The smaller the SRD value, the closer the method is to the reference and, hence, the better the lipophilicity estimation method.
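The core SRD computation with the row-mean reference can be sketched with NumPy. This simplified version assumes the columns are already rescaled (for example by STD, IS, or RNK), breaks ties by order of appearance instead of averaging them, and normalizes by the maximum attainable SRD for n objects, floor(n²/2); it is not the published Excel macro.

```python
import numpy as np


def ranks(v):
    """Ranks 1..n (ties broken by order of appearance; a simplification)."""
    r = np.empty(len(v), dtype=int)
    r[np.argsort(v, kind="stable")] = np.arange(1, len(v) + 1)
    return r


def srd(data):
    """Normalized SRD (%) of each column against the row-wise mean reference.

    data: (n_compounds, n_methods) array of already rescaled lipophilicity values.
    """
    X = np.asarray(data, float)
    n = X.shape[0]
    ref_rank = ranks(X.mean(axis=1))             # consensus (row-mean) reference
    col_ranks = np.column_stack([ranks(X[:, j]) for j in range(X.shape[1])])
    raw = np.abs(col_ranks - ref_rank[:, None]).sum(axis=0)
    srd_max = n * n // 2                         # largest possible SRD for n objects
    return 100.0 * raw / srd_max


# Toy matrix: two methods agree with the consensus order, the third reverses it
X = np.array([[1.0, 2.0, 4.0],
              [2.0, 3.0, 3.0],
              [3.0, 4.0, 2.0],
              [4.0, 5.0, 1.0]])
scores = srd(X)
```

A method that reproduces the consensus order scores 0%, whereas one that reverses it reaches the maximum of 100%.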
To validate the ranking, two approaches were implemented. The first is based on comparison with a random distribution of SRD values, i.e., comparison with random numbers (SRD-CRRN), which tests the null hypothesis that the ranking of objects by a given method is due to chance. If the SRD value of a method falls within the random distribution, the null hypothesis cannot be rejected, and the ranking of compounds by that lipophilicity method is not statistically significant at the 95% confidence level; methods lying outside the random distribution are statistically significant. The second approach is a sevenfold jackknife-like cross-validation procedure, in which one-seventh of the objects is removed from the data matrix and the SRD analysis is repeated; repeating this seven times yields seven SRD values for each method. The methods are then arranged in ascending order of their SRD medians, depicted as box plots, and grouped into sections according to non-parametric pairwise comparison tests. STATISTICA 9.1 (StatSoft, Tulsa, OK, USA) was used for the non-parametric significance testing and for the analysis of variance (ANOVA) of the SRD values obtained by sevenfold cross-validation.
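The CRRN validation can be approximated by simulation: random rankings are compared with a fixed reference ranking, and the resulting normalized SRD values form the random distribution against which each method is judged. This Monte Carlo version is an assumption of the sketch; the published macro may compute the distribution differently (for example, exactly for small n). The 5% quantile of the distribution (often denoted XX1) serves as the significance threshold.

```python
import numpy as np


def random_srd_distribution(n, n_samples=20000, seed=0):
    """Normalized SRD (%) of random permutations against a fixed reference ranking."""
    rng = np.random.default_rng(seed)
    reference = np.arange(n)
    srd_max = n * n // 2
    values = np.array([np.abs(rng.permutation(n) - reference).sum()
                       for _ in range(n_samples)])
    return 100.0 * values / srd_max


dist = random_srd_distribution(26)   # 26 compounds, as in this study
xx1 = np.percentile(dist, 5)         # 5% quantile of the random distribution
```

A method whose normalized SRD lies below `xx1` ranks the compounds better than chance at roughly the 5% level; for random rankings, the expected raw SRD is (n² - 1)/3, about two-thirds of the maximum.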

QSRR Analysis
QSRR analysis was performed to gain insight into the molecular mechanisms governing retention and to predict the four end points in the studied chromatographic systems (R_M^0 for the C8 and C18 TLC systems, logkw for HPLC, and logk for MEKC). The QSRR approach was proposed by Kaliszan at the end of the 1970s [3,53]; it relates the molecular structure of analytes to their retention factors. Molecular descriptors were calculated with Dragon 7.0 (Talete, Milan, Italy) for structures optimized in water by DFT at the PBE0/6-311++G(2df,2pd) level [46] with the SMD solvation model [49] (files available in mol format). Full names, symbols, and definitions of the descriptors can be found in the handbook by Todeschini et al. [54]. In total, 2848 descriptors were calculated for the 26 molecular structures.
For construction of the QSRR models, the initial matrix of molecular descriptors was substantially reduced using stringent pre-selection criteria, i.e., by removing all molecular descriptors (i) with missing values, (ii) with a relative standard deviation < 5%, and (iii) found to be redundant (pairwise |R| > 0.5; of each such pair, the descriptor less correlated with the dependent variable was removed) [55]. The data set was split into a training set and an external validation set (70/30%) using the Kennard-Stone algorithm [56]. A genetic algorithm in its binary formulation coupled with partial least squares (GA-PLS) [55,57,58] was used for simultaneous variable selection and retention modeling. Each unit of the GA population consisted of molecular descriptors encoded in binary format (1, selected; 0, not selected). The GA hyperparameters and functions were set as follows: population size of 20, crossover fraction of 0.8 with single-point crossover, mutation rate of 0.2 with uniform mutation, and tournament selection. Leave-one-out cross-validation (LOO-CV) was used to optimize the number of latent variables within each unit and for the final models. The objective function of the GA was the root mean square error of cross-validation:
$$RMSECV = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i^{exp} - y_i^{cv}\right)^2}$$
where y^exp and y^cv represent the experimental values of the dependent variable and those estimated by LOO-CV, respectively, and n is the number of compounds in the training set. The GA-PLS algorithm was run for 1000 cycles for each chromatographic setup. Final consensus GA-PLS models were built from the molecular descriptors whose percentage of selection (% selection) was higher than the mean % selection of all molecular descriptors. Cross-validated analysis of variance (CV-ANOVA) [59] was employed to test the statistical significance of all final consensus GA-PLS models. All calculations pertaining to the QSRR analysis were performed in MATLAB 2019b (MathWorks, Sherborn, MA, USA).
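Two building blocks of this workflow, the Kennard-Stone split and the LOO-CV RMSECV of a PLS model, can be sketched in NumPy. The PLS1 model here is a textbook NIPALS implementation written for illustration (not the MATLAB code used in the study), the Kennard-Stone selection uses Euclidean distances in descriptor space, the GA layer that searches over descriptor subsets is omitted, and the data are synthetic.

```python
import numpy as np


def kennard_stone(X, n_train):
    """Return (train_idx, test_idx) chosen by the Kennard-Stone max-min rule."""
    X = np.asarray(X, float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.unravel_index(np.argmax(d), d.shape)   # two most distant samples
    train = [int(i), int(j)]
    rest = [k for k in range(len(X)) if k not in train]
    while len(train) < n_train:
        nearest = d[np.ix_(rest, train)].min(axis=1)  # distance to closest selected sample
        pick = rest[int(np.argmax(nearest))]
        train.append(pick)
        rest.remove(pick)
    return train, rest


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 with n_lv latent variables; returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xd, yd = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)
        t = Xd @ w
        tt = t @ t
        p = Xd.T @ t / tt
        qa = (yd @ t) / tt
        Xd = Xd - np.outer(t, p)   # deflate X
        yd = yd - qa * t           # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef


def rmsecv_loo(X, y, n_lv):
    """Root mean square error of leave-one-out cross-validation."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    y_cv = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef, b0 = pls1_fit(X[keep], y[keep], n_lv)
        y_cv[i] = X[i] @ coef + b0
    return float(np.sqrt(np.mean((y - y_cv) ** 2)))


# Synthetic example: 26 compounds, 3 descriptors, noise-free linear response
rng = np.random.default_rng(1)
X = rng.normal(size=(26, 3))
y = X @ np.array([0.8, -0.5, 0.3]) + 1.2
train, test = kennard_stone(X, 18)             # roughly a 70/30 split
err = rmsecv_loo(X[train], y[train], n_lv=3)   # close to 0 for noise-free data
```

Scanning `n_lv` and keeping the value with the lowest RMSECV mirrors how the number of latent variables is chosen; in a GA-PLS run, the same RMSECV would be evaluated for each candidate descriptor subset.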

Conclusions
Lipophilicity is an essential parameter for the selection of compounds that may constitute a starting point for the development of novel antifungal drug candidates. Drug discovery aims to achieve strong potency with a minimal increase in molecular weight or lipophilicity. Among the studied compounds, structure 16 can be considered the best starting structure for further studies, since it demonstrates the lowest lipophilic character among the compounds with pronounced antifungal activity.
The comparison of methods applied for lipophilicity assessment showed that three computational indices, i.e., VlogP, AClogP, and logP (ChemDraw), along with MEKC, are the least applicable methods for lipophilicity determination of the investigated pyrido-isoxazolones. Based on the SRD analysis, no clear recommendation can be given for any particular computational program; however, the lowest SRD scores indicate that the chromatographic lipophilicity parameters obtained using C8- and C18-modified silica, regardless of the chromatographic technique employed (TLC or HPLC), should be considered the best means of estimating the lipophilicity of the studied pyrido-isoxazolones.
Supplementary Materials: Figure S1. Optimization of the number of latent variables (LVs) for the consensus GA-PLS models for (A) TLC with the C8 plate, (B) TLC with the C18 plate, (C) HPLC logkw, and (D) MEKC logk parameters. The optimal number of LVs is denoted in pink; Figure S2. Leave-one-out cross-validation (LOO-CV) predictive ability (on the training set) for (A) TLC with the C8 plate, (B) TLC with the C18 plate, (C) HPLC logkw, and (D) MEKC logk parameters; Table S1. Chemical names and structural formulas of the studied pyrido- and quinolino-isoxazolones; Table S2. Minimum, maximum, and mean values of logP calculated by different software; Table S3. Retention data for the investigated pyrido- and quinolino-isoxazolones obtained from the Soczewiński-Wachtmeister method with statistical parameters; Table S4. Antifungal activity toward reference Candida species [1][2][3]; Table S5. Cross-validation analysis of variance summary for the TLC C8 QSRR model; Table S6. Cross-validation analysis of variance summary for the TLC C18 QSRR model; Table S7. Cross-validation analysis of variance summary for the HPLC logkw QSRR model; Table S8. Cross-validation analysis of variance summary for the MEKC logk QSRR model; Table S9. List of molecular descriptors with coefficient values in PLS models built for RP-LC and MEKC.
Author Contributions: K.C. designed the research study, performed MEKC experiments, analyzed the data, performed QSRR interpretation, and prepared the manuscript. J.F. and J.S. synthesized the compounds, analyzed the data, provided DFT calculations, and advised on the conception of the study. P.Ž. performed the GA-PLS QSRR model calculations and mechanistic interpretation and prepared the manuscript. F.A. performed SRD analysis and prepared the manuscript. K.E.G. carried out HPLC analyses. P.B. calculated logP values with software and online platforms. J.N. carried out TLC analyses. P.K. and T.B. performed calculations in the Dragon program. J.S. corrected the manuscript.