Data-Driven Classification of Solubility Space in Deep Eutectic Solvents: Deciphering Driving Forces Using PCA and K-Means Clustering

Cysewski, Piotr; Przybyłek, Maciej; Jeliński, Tomasz

doi:10.3390/molecules30234563



by Piotr Cysewski, Maciej Przybyłek and Tomasz Jeliński ^*

Department of Physical Chemistry, Faculty of Pharmacy, Collegium Medicum in Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-950 Bydgoszcz, Poland

^* Author to whom correspondence should be addressed.

Molecules 2025, 30(23), 4563; https://doi.org/10.3390/molecules30234563

Submission received: 13 October 2025 / Revised: 5 November 2025 / Accepted: 23 November 2025 / Published: 26 November 2025

(This article belongs to the Special Issue New Horizons in Deep Eutectic Solvents (DESs): Synthesis, Characterization and Applications)


Abstract

This study presents a robust, data-driven framework for classifying and predicting drug solubility in deep eutectic solvents (DESs), moving beyond empirical approaches to enable rational formulation design. Analyzing 2010 solubility measurements of 21 diverse pharmaceutical compounds across numerous choline chloride-, betaine-, and menthol-based DESs, we used principal component analysis (PCA) to reduce 16 COSMO-RS-derived descriptors to four chemically interpretable dimensions explaining 86.7% of the total variance. Persistence analysis confirmed component stability and revealed two key factors: PC1 (global solvation propensity, i.e., the overall capacity of the solvent to stabilize solutes through all interaction types) and PC2 (specific interaction complementarity, i.e., the degree of matching between solute and solvent hydrogen-bonding/polarity features). K-means clustering identified four distinct solubility regimes: high-solubility DES-optimized systems (Cluster 1), reliable moderate performers (Cluster 0), intermediate candidates for optimization (Cluster 3), and fundamentally challenging combinations (Cluster 2). Comparative analysis demonstrated choline chloride’s broad utility while revealing specialized roles for menthol and betaine in specific chemical spaces. Case studies of sulfasalazine and caffeine illustrated how multi-cluster distributions guide formulation strategies, distinguishing compounds that require precise solvent selection from those that are forgiving. This taxonomy provides formulation scientists with a rational framework for DES selection, emphasizing aqueous modification, diversity of hydrogen-bond donors (HBDs) and hydrogen-bond acceptors (HBAs), and balanced solvation-interaction optimization. The integrated PCA-clustering approach transforms DES development from trial-and-error screening to targeted design, offering fundamental insights into solubility mechanisms while accelerating sustainable pharmaceutical formulation.

Keywords: deep eutectic solvents; solubility prediction; principal component analysis; K-means clustering; pharmaceutical formulation; data-driven design; COSMO-RS
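
The workflow summarized in the abstract (standardize the descriptors, project them onto four principal components, then partition the scores into four clusters) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the data are synthetic random numbers standing in for the 2010 × 16 descriptor matrix, the helper names `pca` and `kmeans` are hypothetical, and the plain Lloyd iteration with random initialization is an assumption about how K-means might be run. The numbers it prints therefore have no relation to the reported 86.7% explained variance or to the cluster assignments in the paper.

```python
import numpy as np


def pca(X, n_components):
    """PCA via SVD on z-scored columns.

    Returns the component scores and the explained-variance ratio of each
    retained component. Assumes no column of X is constant.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Z @ Vt[:n_components].T
    return scores, explained[:n_components]


def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centers (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        # Recompute each center; keep the old one if a cluster became empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers


# Synthetic stand-in for the descriptor table: 2010 rows, 16 columns built
# from 4 hidden factors plus noise. These are NOT real COSMO-RS descriptors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(2010, 4))
mixing = rng.normal(size=(4, 16))
X = latent @ mixing + 0.5 * rng.normal(size=(2010, 16))

scores, explained = pca(X, n_components=4)
labels, centers = kmeans(scores, k=4)

print("explained variance per component:", np.round(explained, 3))
print("cumulative explained variance:", round(float(explained.sum()), 3))
print("cluster sizes:", np.bincount(labels, minlength=4))
```

In practice, a library implementation with multiple restarts and a smarter initialization would likely be preferable to this hand-rolled loop; the sketch is only meant to show how the dimensionality reduction feeds the clustering step.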


