Article

Data-Driven Classification of Solubility Space in Deep Eutectic Solvents: Deciphering Driving Forces Using PCA and K-Means Clustering

Piotr Cysewski, Maciej Przybyłek and Tomasz Jeliński

Department of Physical Chemistry, Faculty of Pharmacy, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-950 Bydgoszcz, Poland
* Author to whom correspondence should be addressed.
Molecules 2025, 30(23), 4563; https://doi.org/10.3390/molecules30234563
Submission received: 13 October 2025 / Revised: 5 November 2025 / Accepted: 23 November 2025 / Published: 26 November 2025

Abstract

This study presents a robust, data-driven framework for classifying and predicting drug solubility in deep eutectic solvents (DESs), moving beyond empirical approaches to enable rational formulation design. Using 2010 solubility measurements of 21 diverse pharmaceutical compounds in numerous choline chloride-, betaine-, and menthol-based DESs, we applied principal component analysis (PCA) to reduce 16 COSMO-RS-derived descriptors to four chemically interpretable dimensions explaining 86.7% of the total variance. Persistence analysis confirmed component stability, revealing two key factors: PC1 (global solvation propensity, i.e., the overall capacity of the solvent to stabilize solutes through all interaction types) and PC2 (specific interaction complementarity, i.e., the degree of matching between solute and solvent hydrogen-bonding/polarity features). K-means clustering identified four distinct solubility regimes: high-solubility DES-optimized systems (Cluster 1), reliable moderate performers (Cluster 0), intermediate candidates for optimization (Cluster 3), and fundamentally challenging combinations (Cluster 2). Comparative analysis demonstrated choline chloride’s broad utility while revealing specialized roles for menthol and betaine in specific chemical spaces. Case studies of sulfasalazine and caffeine illustrated how multi-cluster distributions guide formulation strategies, distinguishing compounds that require precise solvent selection from forgiving ones. This taxonomy provides formulation scientists with a rational framework for DES selection, emphasizing aqueous modification, diversity of hydrogen bond donors (HBDs) and acceptors (HBAs), and balanced optimization of solvation and specific interactions. The integrated PCA-clustering approach transforms DES development from trial-and-error screening to targeted design, offering fundamental insights into solubility mechanisms while accelerating sustainable pharmaceutical formulation.
Keywords: deep eutectic solvents; solubility prediction; principal component analysis; K-means clustering; pharmaceutical formulation; data-driven design; COSMO-RS
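
The workflow summarized in the abstract (standardize the descriptors, project them onto a few principal components, then cluster the scores with K-means) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the data are synthetic random numbers standing in for a 16-descriptor table, the helper names `standardize`, `pca`, and `kmeans` are hypothetical, and the K-means routine is a plain Lloyd's algorithm with random initialization. The choice of four components and four clusters mirrors the numbers quoted in the abstract; everything else is an assumption.

```python
import numpy as np


def standardize(X):
    """Scale each descriptor column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca(X, n_components):
    """PCA via SVD of an already centered data matrix.

    Returns the component scores and the explained-variance ratio
    of each retained component.
    """
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, explained[:n_components]


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with randomly chosen initial centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in data (200 rows x 16 columns), NOT the paper's dataset:
# four hidden factors mixed into 16 noisy "descriptors".
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 4))
mixing = rng.normal(size=(4, 16))
X = latent @ mixing + 0.3 * rng.normal(size=(200, 16))

Z = standardize(X)
scores, ratio = pca(Z, n_components=4)
labels, centers = kmeans(scores, k=4)

print("explained variance ratio:", np.round(ratio, 3))
print("cumulative explained variance:", round(float(ratio.sum()), 3))
print("cluster sizes:", np.bincount(labels, minlength=4))
```

With real data, the cumulative explained variance printed here would be the quantity compared against a figure such as the 86.7% reported above, and the cluster labels would be interpreted by inspecting where each group sits along the leading components.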

Share and Cite

MDPI and ACS Style

Cysewski, P.; Przybyłek, M.; Jeliński, T. Data-Driven Classification of Solubility Space in Deep Eutectic Solvents: Deciphering Driving Forces Using PCA and K-Means Clustering. Molecules 2025, 30, 4563. https://doi.org/10.3390/molecules30234563
