Abstract
We present a systematic machine learning study of the solubility of diverse pharmaceutical acids in deep eutectic solvents (DESs). Using an automated Dual-Objective Optimization with Iterative feature pruning (DOO-IT) framework, we analyze a solubility dataset compiled from the literature for ten pharmaceutically important carboxylic acids and augmented with new measurements for mefenamic and niflumic acids in choline chloride- and menthol-based DESs, yielding N = 1020 data points. A data-driven multi-criterion measure is then applied for final model selection among all collected accurate and parsimonious models. This three-step procedure enables extensive exploration of the model hyperspace and effective selection of models that combine accuracy, simplicity, and persistence of the descriptors selected during model development. The dual-solution landscape clarifies the trade-off between complexity and cost in quantitative structure-property relationship (QSPR) modeling of DES systems and shows that physically meaningful energetic descriptors can replace or enhance explicit COSMO-RS predictions, depending on the application.