Feature-Guided Machine Learning for Studying Passive Blood–Brain Barrier Permeability to Aid Drug Discovery

Zhu, Baining; Liu, Suwei

doi:10.3390/ijms262211228



by Baining Zhu ¹ and Suwei Liu ²,*

¹ Phillips Exeter Academy, Exeter, NH 03833, USA
² Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA
* Author to whom correspondence should be addressed.

Int. J. Mol. Sci. 2025, 26(22), 11228; https://doi.org/10.3390/ijms262211228

Submission received: 19 October 2025 / Revised: 13 November 2025 / Accepted: 18 November 2025 / Published: 20 November 2025

(This article belongs to the Section Molecular Pharmacology)


Abstract

Effective prediction of blood–brain barrier (BBB) permeability remains essential for central nervous system (CNS) drug development. This study evaluates multiple supervised machine learning models on a public dataset of permeable and non-permeable compounds. Random Forest models offer the best balance between accuracy and generalizability, outperforming more complex gradient boosting methods, which are more prone to overfitting. Feature analysis identifies NH/OH and N/O atom counts as key determinants of passive diffusion, with fewer hydrogen bond donors and heteroatoms favoring permeability. In addition, model performance deteriorates at an NH/OH count of 3, marking a decision boundary where hydrogen-bonding complexity disrupts reliable prediction. This study reveals non-linear structure–permeability relationships that challenge traditional descriptor-based approaches and demonstrates that machine learning can provide both accurate predictions and actionable insights for drug discovery.

Keywords:

blood–brain barrier; machine learning; drug discovery; permeability prediction

1. Introduction

The blood–brain barrier (BBB) plays a vital role in the central nervous system (CNS) by regulating the exchange of molecules between the bloodstream and neural tissue [1,2,3,4]. However, this protective function also hinders the development of therapeutics for neurological disorders such as Alzheimer’s disease, Parkinson’s disease, epilepsy, and brain cancers [5,6,7,8]. Failure to accurately predict BBB permeability has led to significant costs throughout the drug discovery process, as promising drug candidates are frequently abandoned late in clinical development. The ability to determine whether a compound can cross the BBB is therefore central to efficient drug discovery [9,10,11,12,13].

A range of experimental strategies has been developed to study BBB transport. In vivo animal models, in vitro cell culture systems, and receptor-mediated assays offer high biological fidelity and provide detailed insights into molecular pathways [14,15,16,17,18]. These approaches produce physiologically relevant data and proof-of-concept validation, but they remain slow, expensive, and difficult to scale. Moreover, extrapolating results from controlled laboratory conditions to the complex human BBB is often uncertain, which limits their use in large-scale or early-stage drug discovery [19,20,21].

To increase throughput, theoretical models have been developed to predict permeability from chemical descriptors. These include quantitative structure–activity relationship (QSAR) analyses, which use features such as molecular weight, lipophilicity, and polar surface area to estimate transport potential [22,23,24,25,26,27,28,29,30,31]. These methods are computationally efficient and easy to interpret, allowing large compound libraries to be screened at low cost. However, traditional QSAR and linear regression approaches rely on linear additivity, assuming that molecular features contribute independently to permeability without synergistic or antagonistic interactions. This reliance on simplified feature–permeability relationships restricts their ability to capture the complex, non-linear interactions that govern BBB transport, such as threshold effects or compensation between lipophilicity and hydrogen-bonding capacity [19,32,33,34].

At the molecular scale, simulation methods such as molecular dynamics (MD) provide atomistic views of drug–membrane interactions. MD can describe conformational changes, energetics, and transient membrane phenomena that are difficult to observe experimentally [35,36,37,38]. These studies offer valuable molecular-level insight, but their heavy computational cost restricts them to a small number of molecules, so MD is not yet a practical tool for permeability prediction across large compound libraries [35,39]. Typical MD investigations are limited to small or moderately sized molecules (usually below 400 Da) with well-parameterized chemical groups, because they require accurate force-field parameterization and extensive sampling (hundreds of nanoseconds) within explicit lipid bilayers [40,41]. Scaling such atomistic simulations to thousands of compounds remains computationally prohibitive, as discussed in previous BBB simulation studies [42,43].

Machine learning (ML) addresses these bottlenecks by combining scalability with the ability to learn non-linear patterns in molecular features [44,45,46]. By training on experimental data, ML models can identify structural and physicochemical properties that correlate with BBB penetration and apply those relationships to new compounds [47,48]. In this study, the publicly available Blood–Brain Barrier Penetration (BBBP) dataset from MoleculeNet [49] is used to evaluate the performance of different ML approaches. The dataset contains 1955 compounds annotated as permeable (BBB+) or non-permeable (BBB−), represented by 2048-bit Morgan fingerprints and 208 RDKit descriptors. The data are imbalanced, with 76% of compounds labeled BBB+ and 24% BBB−. The BBBP dataset includes a diverse collection of marketed and experimental molecules, including both CNS-targeted drugs and non-CNS agents with reported central side effects. Because permeability labels were derived from experimentally measured log BB values rather than from pharmacological indication, the dataset reflects general physicochemical diversity rather than a therapeutic-class bias.

The goal of this study is to provide a comprehensive comparison of machine learning algorithms for predicting BBB permeability, with a focus on interpreting what the resulting models learn. A variety of supervised ML algorithms are examined, ranging from simple linear classifiers such as logistic regression to ensemble-based approaches, including random forest and gradient boosting, as well as neural-network-based models. Multiple resampling methods, including the Synthetic Minority Oversampling Technique (SMOTE) [50], Borderline SMOTE [51], and combined undersampling, are applied to address the class imbalance of the dataset. This work demonstrates how accessible machine learning strategies can inform the rational design of CNS-active compounds and provide an efficient complement to experimental, theoretical, and simulation-based approaches.

2. Results and Discussion

2.1. Base Model Comparison

The dataset used in this study comprises 1955 chemical compounds from the MoleculeNet BBBP dataset. After preprocessing to remove low-variance features (variance threshold of 0.14) and features with missing or infinite values, the feature space is reduced to 743 dimensions. This filter retains features that vary sufficiently across compounds to be potentially informative, although variance alone does not measure predictive relevance.
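The low-variance filter can be sketched in a few lines of standard-library Python. This is a minimal illustration on a hypothetical toy matrix, not the authors’ pipeline; it assumes the filter is applied to unscaled features and, like scikit-learn’s `VarianceThreshold`, keeps only features whose variance is strictly greater than the threshold. For a binary fingerprint bit set in a fraction p of molecules, the variance is p(1 − p), so under that assumption a threshold of 0.14 retains only bits set in roughly 17% to 83% of compounds.

```python
import math

def variance_filter(rows, threshold=0.14):
    """Return indices of columns that contain only finite values and whose
    population variance is strictly greater than `threshold`."""
    kept = []
    for j in range(len(rows[0])):
        col = [row[j] for row in rows]
        # Skip features with missing (None/NaN) or infinite entries.
        if any(v is None or not math.isfinite(v) for v in col):
            continue
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        if var > threshold:
            kept.append(j)
    return kept

def binary_bit_frequency_bounds(threshold):
    """Bit frequencies p for which p * (1 - p) equals `threshold`; a 0/1 bit
    passes the filter only if its frequency lies strictly between them."""
    half_width = math.sqrt(1 - 4 * threshold) / 2
    return 0.5 - half_width, 0.5 + half_width

# Hypothetical toy matrix: a constant column, a 0/1 fingerprint bit,
# a continuous descriptor, and a descriptor with a missing value.
X = [
    [1.0, 0, 0.5, float("nan")],
    [1.0, 1, 2.5, 1.0],
    [1.0, 0, 4.0, 2.0],
    [1.0, 1, 3.0, 3.0],
    [1.0, 0, 1.0, 4.0],
]
kept = variance_filter(X)                      # constant and NaN columns dropped
low, high = binary_bit_frequency_bounds(0.14)
```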

A major characteristic of the dataset is its class imbalance: 1492 compounds (76.3%) are labeled BBB+, while 463 (23.7%) are labeled BBB−. This imbalance ratio of approximately 3:1 risks biasing classifiers toward the majority class, inflating accuracy at the expense of minority-class prediction. In drug discovery applications, such misclassification is problematic, particularly when false negatives lead to premature exclusion of promising candidates.
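The arithmetic behind this concern can be made explicit. The short sketch below uses only the class counts reported above; the always-BBB+ rule is a hypothetical majority-class reference point, not necessarily the dummy model reported in Table 1.

```python
n_pos, n_neg = 1492, 463          # BBB+ and BBB- counts in BBBP
n_total = n_pos + n_neg

frac_pos = n_pos / n_total
frac_neg = n_neg / n_total
imbalance_ratio = n_pos / n_neg

# A trivial rule that labels every compound BBB+ (the positive class).
tp, fp, tn, fn = n_pos, n_neg, 0, 0
accuracy = (tp + tn) / n_total    # equals the BBB+ fraction
precision = tp / (tp + fp)
recall = tp / (tp + fn)           # every permeable compound is "found"
f1 = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)      # no BBB- compound is ever identified

print(f"BBB+ {frac_pos:.1%}, BBB- {frac_neg:.1%}, ratio {imbalance_ratio:.2f}:1")
print(f"always-BBB+: accuracy {accuracy:.3f}, F1 {f1:.3f}, specificity {specificity:.1f}")
```

Such a rule scores about 76% accuracy and an F1-score near 0.87 on the BBB+ class while never recognizing a single BBB− compound, which is why class-specific counts and metrics are reported alongside accuracy.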

To establish baseline predictive performance, 11 machine learning models are trained and evaluated using stratified ten-fold cross-validation. Performance metrics include accuracy, precision, recall, F1-score, and runtime, as summarized in Table 1. The models represent diverse families, including linear models, ensemble methods, boosting frameworks, neural networks, decision trees, probabilistic classifiers, and simple baselines.
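Stratified k-fold splitting can be illustrated without any library. The function below is a hypothetical helper, not the authors’ code; in practice a library routine such as scikit-learn’s `StratifiedKFold` would normally be used. Each class is shuffled and dealt round-robin across folds, so every test fold inherits roughly the 76:24 BBB+:BBB− ratio of the full dataset.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs whose test folds
    preserve the overall class proportions as closely as possible."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        # Deal each class round-robin so every fold gets ~len(idx)/k of it;
        # the offset continues where the previous class stopped, keeping
        # fold sizes within one sample of each other.
        for pos, i in enumerate(idx):
            folds[(pos + offset) % k].append(i)
        offset += len(idx)

    everything = set(range(len(labels)))
    return [(sorted(everything - set(f)), sorted(f)) for f in folds]

# Class counts from BBBP: 1492 BBB+ (1) and 463 BBB- (0).
labels = [1] * 1492 + [0] * 463
splits = stratified_kfold(labels, k=10)
for train_idx, test_idx in splits[:3]:
    share = sum(labels[i] for i in test_idx) / len(test_idx)
    print(len(train_idx), len(test_idx), f"BBB+ share {share:.3f}")
```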

Across models, Logistic Regression and Random Forest achieve the best overall balance of precision and recall, with F1-scores of 0.925 and 0.924, respectively. Logistic Regression offers the highest precision (0.891), while Random Forest delivers the highest recall (0.978) and the lowest runtime (0.04 s), making both suitable candidates for large-scale screening applications. Gradient boosting frameworks such as XGBoost, Gradient Boosting, and LightGBM also perform competitively but do not exceed the simpler models in this task. Neural networks and boosting methods achieve strong recall but require greater computational cost and tuning effort. Simpler models such as Decision Tree, k-Nearest Neighbors, Gaussian Naive Bayes, and the dummy baseline show weaker performance, underscoring the value of ensemble and linear approaches in this context.

The advantage of Random Forest over Logistic Regression in recall, although subtle in the cross-validated results shown in Table 1, becomes more pronounced on single validation splits (Table 2 and Table 3). This difference reflects fundamental algorithmic characteristics. Random Forest’s ensemble of decision trees can model complex, non-linear decision boundaries that adapt to local data structure, which helps it recognize permeable (BBB+) compounds lying close to the BBB− region and thereby reduces false negatives. In contrast, Logistic Regression relies on a single global linear boundary, so compounds near that boundary are more easily misclassified. On single-split evaluations, Random Forest achieves substantially higher recall (0.989, with only 5 false negatives) than Logistic Regression (0.938, with 28 false negatives), while both models maintain comparable false positive rates (around 10%). This indicates that Random Forest improves sensitivity without sacrificing specificity, a critical advantage in drug discovery applications where missing potentially permeable compounds can be costly. Based on this comparison, Logistic Regression and Random Forest are selected for further evaluation under resampling strategies that mitigate class imbalance and enhance predictive reliability.

2.2. Impact of Resampling Techniques on Model Performance

The inherent class imbalance in the BBBP dataset presents a fundamental challenge for machine learning classifiers, as models tend to be biased toward the majority class (BBB+), potentially compromising their ability to identify non-permeable (BBB−) compounds. High recall values across most models indicate strong sensitivity to the positive class, but this often comes at the expense of precision, resulting in more false positives. This trade-off is especially pronounced in ensemble methods such as AdaBoost and neural network architectures such as MLP, which achieve strong recall but comparatively lower precision [52,53]. In drug discovery, where false positives and false negatives carry different costs, this imbalance in predictive behavior must be carefully managed.

To address class imbalance, multiple resampling techniques are applied to the two best-performing classifiers, Logistic Regression and Random Forest, which belong to different algorithmic families and therefore allow resampling effects to be compared across model types. The techniques are standard SMOTE, Borderline SMOTE, and a combined undersampling method, each designed to rebalance the training data in a distinct way.

SMOTE generates synthetic minority-class instances by interpolating between existing minority samples and their nearest minority-class neighbors, expanding the region of feature space occupied by the minority class. Borderline SMOTE focuses on minority samples near the decision boundary, where misclassification risk is highest. The combined undersampling method reduces the number of majority-class instances while augmenting the minority class, achieving balance through simultaneous contraction and expansion.
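The three strategies can be sketched in pure Python on hypothetical two-dimensional data. This is a simplified illustration (Euclidean distance, no special handling of binary fingerprint bits), not the implementation used in the study; in particular, the combined method is assumed here to mean SMOTE oversampling of the minority class paired with random undersampling of the majority class. The Borderline SMOTE helper follows the "danger" rule of the original method: a minority sample is used as a seed when at least half, but not all, of its m nearest neighbours belong to the majority class.

```python
import random

def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, pool, k, exclude=None):
    """Indices of the k rows of `pool` closest to `point`, skipping `exclude`."""
    candidates = [i for i in range(len(pool)) if i != exclude]
    return sorted(candidates, key=lambda i: _sq_dist(point, pool[i]))[:k]

def smote(minority, n_new, k=5, seeds=None, seed=0):
    """SMOTE: place each synthetic sample at a random point on the segment
    between a seed minority sample and one of its k nearest minority
    neighbours. `seeds` restricts which minority samples may be used."""
    rng = random.Random(seed)
    seeds = list(range(len(minority))) if seeds is None else list(seeds)
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        j = rng.choice(nearest(minority[i], minority, k, exclude=i))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic

def borderline_danger(minority, majority, m=5):
    """Borderline-SMOTE 'danger' set: minority samples whose m nearest
    neighbours (drawn from both classes) are at least half, but not all,
    majority samples. All-majority neighbourhoods are treated as noise."""
    pool = minority + majority
    danger = []
    for i, p in enumerate(minority):
        n_major = sum(1 for j in nearest(p, pool, m, exclude=i) if j >= len(minority))
        if m / 2 <= n_major < m:
            danger.append(i)
    return danger

def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of n_keep majority samples."""
    return random.Random(seed).sample(majority, n_keep)

# Hypothetical, overlapping toy classes (not BBBP data).
rng = random.Random(1)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(60)]
minority = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]

smote_pts = smote(minority, n_new=40)
danger = borderline_danger(minority, majority)
border_pts = smote(minority, n_new=40, seeds=danger) if danger else []
# Assumed "combined" scheme: shrink the majority and grow the minority to 45 each.
combined_major = random_undersample(majority, 45)
combined_minor = minority + smote(minority, n_new=25)
```

Because every synthetic point lies between two real minority samples, SMOTE fills in the minority region rather than duplicating points; restricting the seeds to the danger set concentrates that filling near the class boundary.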

As shown in Table 2, Logistic Regression combined with SMOTE produces the largest improvements, raising ROC AUC from 0.764 to 0.791 (+0.027) and average precision from 0.873 to 0.887, while increasing the number of true negatives from 82 to 93. This gain in precision comes at the cost of a slight drop in recall (0.938 to 0.913). Borderline SMOTE yields smaller improvements, while the combined method pushes precision slightly higher but reduces recall more substantially, indicating a shift toward conservative predictions.

As documented in Table 3, Random Forest shows a stronger baseline performance and smaller changes under resampling. SMOTE slightly improves ROC AUC (+0.010) and precision, with minimal recall loss. The combined method produces the best precision (0.911) and the largest ROC AUC gain (+0.020) but lowers recall from 0.989 to 0.955, indicating a trade-off between sensitivity and specificity. Borderline SMOTE produces negligible changes.

Overall, resampling generally improves predictive performance, with ROC AUC increasing for most method–model combinations. Logistic Regression is more sensitive to resampling, showing larger precision–recall trade-offs, whereas Random Forest remains stable with incremental gains. These results suggest that resampling is most useful for improving specificity and reducing false positives, which is particularly valuable in drug discovery pipelines, where filtering out non-permeable compounds early can save development cost and time. Subsequent analyses are based on the Random Forest model trained with the combined undersampling method.

2.3. Feature Ranking and Interpretation of Key Molecular Features

To enhance the interpretability of the machine learning models, feature importance scores are extracted from the Random Forest classifier. This analysis highlights a combination of physicochemical descriptors and Morgan fingerprint bits as the most influential factors for BBB permeability, as shown in Figure 1.

Among all features, NHOHCount emerges as the most important. This descriptor counts the hydrogen atoms bonded to nitrogen or oxygen, i.e., the N–H and O–H bonds of amines, amides, and hydroxyl groups, so that a primary amine (–NH2) contributes two. Each N–H or O–H acts as a hydrogen bond donor, increasing polarity and strengthening interactions with water and with polar residues in the BBB endothelium [54,55]. However, these same interactions reduce the molecule’s lipophilicity and hinder passive diffusion through the lipid-rich membrane. Molecules with fewer donors are therefore more likely to partition into and cross the BBB.

The influence of –NH/OH groups is illustrated in Figure 2. As shown in Figure 2a, molecules with 0–1 donors achieve permeability rates above 80%, while those with four or more drop below 20%. This sharp decrease highlights the negative effect of excessive hydrogen bonding on passive diffusion. Figure 2b further shows that permeable compounds concentrate at low NHOH counts (median ≈ 1), while non-permeable molecules exhibit a broader spread extending into higher counts. Figure 2c quantifies the prediction difficulty introduced by donor content: accuracy decreases systematically from 94.0% (0 donors) to 67.9% (3 donors), indicating that the permeability decision boundary becomes increasingly complex as the NH/OH count approaches three. Prediction accuracy recovers for compounds with 4+ donors, however, because these molecules are predominantly non-permeable, which simplifies the classification task despite their chemical complexity. This trend is consistent with well-known drugs such as propranolol and ampicillin (Figure 3): propranolol has 2 donors and is BBB permeable [56,57], whereas ampicillin has 4 donors and is poorly permeable [58,59]. These examples illustrate that the number of hydrogen bond donors is a critical determinant of BBB permeability and should be strategically optimized in CNS drug design.
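Per-bin summaries of the kind plotted in Figure 2 can be computed with a simple grouping step. The sketch below is illustrative only: the function name and the toy labels and predictions are hypothetical, chosen to mimic the qualitative pattern described above (an accuracy dip at three donors and a recovery for 4+), not to reproduce the paper’s values.

```python
from collections import defaultdict

def stats_by_count(counts, y_true, y_pred, cap=4):
    """Group compounds by an integer descriptor (values >= cap pooled into
    a 'cap+' bin) and report size, BBB+ fraction, and accuracy per bin."""
    bins = defaultdict(lambda: [0, 0, 0])  # [n, n_permeable, n_correct]
    for c, t, p in zip(counts, y_true, y_pred):
        key = f"{cap}+" if c >= cap else str(c)
        bins[key][0] += 1
        bins[key][1] += t
        bins[key][2] += int(t == p)
    ordered = sorted(bins.items(), key=lambda kv: int(kv[0].rstrip("+")))
    return {
        key: {"n": n, "perm_rate": n_perm / n, "accuracy": n_ok / n}
        for key, (n, n_perm, n_ok) in ordered
    }

# Hypothetical NHOHCount values, true labels (1 = BBB+), and predictions.
nhoh = [0, 0, 1, 1, 2, 2, 3, 3, 3, 4, 5, 6]
y_true = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

table = stats_by_count(nhoh, y_true, y_pred)
for key, row in table.items():
    print(key, row["n"], f"{row['perm_rate']:.2f}", f"{row['accuracy']:.2f}")
```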

The second most important feature is NOCount, the total number of nitrogen and oxygen atoms in a molecule. These atoms often act as hydrogen bond acceptors and increase polarity. While they are necessary for solubility and target binding, an excess of heteroatoms typically reduces membrane permeability [60,61]. This finding is consistent with Lipinski’s Rule of Five [62], which states that poor absorption or permeation is more likely when a compound has more than 10 hydrogen bond acceptors (expressed as the sum of N and O atoms).
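The two counts can be illustrated on hand-written heteroatom inventories. The sketch assumes the definitions described above (NOCount counts N and O atoms; NHOHCount counts hydrogens attached to N or O) and lists only each molecule’s N and O atoms, since carbon and sulfur atoms do not contribute; the `Atom` class and the inventories are hypothetical helpers, not RDKit objects. In practice both descriptors would be computed directly from SMILES with RDKit. The final check applies only the two hydrogen-bonding criteria of Lipinski’s Rule of Five.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    element: str  # element symbol, e.g. "N" or "O"
    n_h: int = 0  # hydrogens attached to this atom

def no_count(atoms):
    """Number of nitrogen and oxygen atoms (NOCount)."""
    return sum(1 for a in atoms if a.element in ("N", "O"))

def nhoh_count(atoms):
    """Hydrogens bonded to N or O (NHOHCount); an -NH2 group counts twice."""
    return sum(a.n_h for a in atoms if a.element in ("N", "O"))

def lipinski_hbond_violations(atoms):
    """The two hydrogen-bonding criteria of the Rule of Five."""
    return {"HBD > 5": nhoh_count(atoms) > 5, "HBA > 10": no_count(atoms) > 10}

# N and O atoms only; carbons (and ampicillin's sulfur) do not contribute.
propranolol = [Atom("O"), Atom("O", 1), Atom("N", 1)]           # ether O, OH, NH
ampicillin = [Atom("N", 2), Atom("N", 1), Atom("N")]            # NH2, amide NH, lactam N
ampicillin += [Atom("O"), Atom("O"), Atom("O"), Atom("O", 1)]   # three C=O, acid OH
etoposide = [Atom("O")] * 10 + [Atom("O", 1)] * 3               # 13 O, three of them OH

for name, mol in [("propranolol", propranolol), ("ampicillin", ampicillin),
                  ("etoposide", etoposide)]:
    print(name, nhoh_count(mol), no_count(mol), lipinski_hbond_violations(mol))
```

Of the three, only etoposide exceeds the acceptor limit, whereas ampicillin stays within both hydrogen-bonding limits yet is still poorly permeable, one illustration of why fixed rule-based cut-offs discriminate less sharply than learned, non-linear boundaries.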

Figure 4a shows that molecules with 0–2 N/O atoms achieve nearly 100% permeability, whereas those with 4 or more fall below 85%. Figure 4b further shows that permeable compounds cluster at lower N/O counts, while non-permeable molecules extend across a much wider range, including very high counts. Finally, model accuracy is highest (96%) for compounds with low N/O counts and declines progressively as the heteroatom count increases (Figure 4c). Clinically, dexamfetamine, with a single N atom (Figure 5a), crosses the BBB efficiently [63], whereas etoposide, with 13 O atoms (Figure 5b), penetrates poorly [64]. These contrasting cases highlight how balancing polarity and lipophilicity is essential in designing CNS-active drugs.

Beyond these dominant descriptors, additional features also contribute to BBB permeability prediction. HeavyAtomMolWt, a measure of molecular size, shows high importance, consistent with the principle that smaller compounds diffuse more easily through the BBB. Several Morgan fingerprint bits (e.g., MFP_808, MFP_390) rank highly as well, suggesting that particular structural motifs help distinguish permeable from non-permeable compounds. Because these bits are hashed, they are not directly interpretable, but they may correspond to chemical scaffolds that influence penetration, such as small lipophilic rings or heterocycles.
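The loss of interpretability caused by hashing can be illustrated with a toy folding step. The sketch assumes that substructure identifiers are folded into a 2048-bit vector by taking them modulo the vector length, a common folding scheme for hashed fingerprints; the identifier values are hypothetical and are not the atom environments behind MFP_808 or MFP_390.

```python
def fold(identifiers, n_bits=2048):
    """Fold integer substructure identifiers into the set of 'on' bit
    positions of an n_bits-long fingerprint (modulo folding)."""
    return {ident % n_bits for ident in identifiers}

def collisions(identifiers, n_bits=2048):
    """Map each bit position to the distinct identifiers that share it,
    keeping only positions hit by more than one identifier."""
    hits = {}
    for ident in sorted(set(identifiers)):
        hits.setdefault(ident % n_bits, []).append(ident)
    return {bit: ids for bit, ids in hits.items() if len(ids) > 1}

# Hypothetical identifiers for four different atom environments.
env_ids = [808, 808 + 2048, 390 + 5 * 2048, 12345]
on_bits = fold(env_ids)
shared = collisions(env_ids)
print(sorted(on_bits), shared)
```

Two unrelated environments can switch on the same bit, so a high-importance bit may reflect several substructures at once; tracing it back to specific atoms requires recording the bit-to-environment mapping when the fingerprint is generated.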

Overall, the prominence of hydrogen bond donor count (NHOHCount) and heteroatom count (NOCount) indicates that polarity is a primary barrier to BBB penetration. By linking model-derived features to established pharmacological heuristics and real drug examples, these findings suggest that the machine learning models capture biologically meaningful determinants of CNS drug delivery rather than operating as black-box predictors.

3. Methods

3.1. Dataset Description and Molecular Representations

The BBBP dataset consists of 1955 unique molecules, each represented by its SMILES string, compound name, and corresponding $\log BB$ value. A threshold of $\log BB = -1$ is applied to categorize molecules into BBB+ (permeable) and BBB− (impermeable), consistent with prior pharmacokinetic studies [24,26]. After this classification, the dataset contains 1492 BBB+ and 463 BBB− compounds, confirming a substantial (about 3:1) class imbalance. To mitigate this imbalance, SMOTE is applied exclusively within the training folds to avoid data leakage. Additional preprocessing includes removal of duplicate entries, standardization of molecular representations, and normalization of descriptor values. Compounds in the BBBP dataset are annotated with experimental BBB penetration data and are not restricted to CNS-active therapeutics, ensuring representative physicochemical diversity.
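The labeling rule and the leakage safeguard can be sketched together. In this minimal example, `label_bbb`, `random_oversample`, and `resample_training_fold` are hypothetical helpers; random duplication of minority samples stands in for SMOTE to keep the code short, and assigning a compound with log BB exactly equal to −1 to BBB+ is an assumption. The point illustrated is that oversampling touches only the training indices of a fold, while the held-out fold keeps its natural class balance.

```python
import random

def label_bbb(log_bb_values, threshold=-1.0):
    """1 (BBB+) if log BB >= threshold, else 0 (BBB-); boundary handling assumed."""
    return [1 if v >= threshold else 0 for v in log_bb_values]

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until both classes are
    equally represented (a simple stand-in for SMOTE)."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (neg, pos) if len(neg) < len(pos) else (pos, neg)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    order = pos + neg + extra
    return [X[i] for i in order], [y[i] for i in order]

def resample_training_fold(X, y, train_idx, test_idx):
    """Balance the training fold only; the test fold is left untouched."""
    X_train, y_train = random_oversample([X[i] for i in train_idx],
                                         [y[i] for i in train_idx])
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, y_train, X_test, y_test

# Hypothetical log BB values and placeholder features.
log_bb = [0.5, -0.2, -1.0, -1.3, 0.1, -2.0, 0.8, -0.4]
y = label_bbb(log_bb)
X = [[float(i)] for i in range(len(log_bb))]
X_tr, y_tr, X_te, y_te = resample_training_fold(
    X, y, train_idx=[0, 1, 2, 3, 4, 5], test_idx=[6, 7])
```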

In addition to permeability information, molecular structures are encoded into numerical features using multiple complementary representations designed to capture both substructural and physicochemical properties. MACCS keys serve as 166-bit binary fingerprints indicating the presence or absence of predefined structural fragments such as hydroxyl groups and aromatic rings. To capture more flexible substructural information, Morgan fingerprints, which encode circular neighborhoods around each atom, are generated with the RDKit cheminformatics toolkit [65]. In addition to these fragment-based encodings, 208 physicochemical descriptors are calculated with RDKit, including molecular weight, lipophilicity (logP), topological polar surface area (TPSA), and counts of hydrogen bond donors and acceptors. Topological descriptors are also included to provide higher-order information on connectivity indices, ring systems, and molecular shape. All continuous-valued descriptors are standardized to zero mean and unit variance prior to model training to ensure comparability across features. Together, these representations provide a comprehensive feature set for evaluating machine learning models in BBB permeability prediction. Note that the descriptors are calculated for the dominant protonation state at physiological pH (≈7.4) [66], so the current models do not explicitly consider multiple ionization states. Nevertheless, because the dataset spans a wide range of physicochemical properties, models trained on it can still provide general guidance on BBB permeability.
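The standardization step can be written out explicitly. This sketch uses hypothetical helper names and toy values (a molecular-weight-like column and a donor-count-like column); it fits the mean and standard deviation on training rows only and reuses them for held-out rows, which is the usual way to keep test information out of preprocessing. Whether the study followed exactly this order is an assumption.

```python
import math

def fit_standardizer(rows):
    """Per-column mean and population standard deviation of training rows."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(columns, means)]
    return means, stds

def standardize(rows, means, stds):
    """z = (x - mean) / std; constant training columns (std == 0) are only centred."""
    return [[(v - m) / s if s > 0 else v - m for v, m, s in zip(row, means, stds)]
            for row in rows]

# Hypothetical descriptor rows: [molecular weight, NHOHCount].
train = [[250.0, 1.0], [300.0, 3.0], [350.0, 2.0]]
test = [[400.0, 0.0]]

means, stds = fit_standardizer(train)      # statistics from training rows only
z_train = standardize(train, means, stds)  # each column: mean 0, variance 1
z_test = standardize(test, means, stds)    # may fall outside the training range
```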

3.2. Feature Engineering, Preprocessing, Model Training and Evaluation

The overall workflow of the study is illustrated in Figure 6. Starting from the MoleculeNet BBBP dataset, raw molecular structures are transformed into numerical representations through feature engineering. This step includes encoding compounds as molecular fingerprints and physicochemical descriptors, hence capturing both substructural patterns and global molecular properties. The resulting feature matrix is then split into training and test sets, with stratification to maintain the proportion of BBB+ and BBB− classes.

To reduce redundancy and improve interpretability, feature selection is applied using feature importance scores derived from the 11 classifiers discussed in Section 2.1. Features with variance below a threshold of 0.14 are excluded, which mitigates noise and lowers the risk of overfitting. Preprocessing of the training set involves normalization of continuous-valued features to zero mean and unit variance, as well as resampling to address class imbalance. In particular, oversampling methods such as SMOTE and its variants are used to synthetically generate minority-class samples, while undersampling is used to further balance class representation.

Model training is conducted with a diverse range of supervised learning algorithms, each chosen to represent a distinct modeling approach. Logistic Regression serves as a simple, interpretable linear baseline. Random Forest, a tree-based ensemble method, is included for its robustness to overfitting and its ability to quantify feature importance. Support Vector Machines (SVM) with linear and radial basis function (RBF) kernels are tested for their capacity to model complex, high-dimensional decision boundaries. Gradient boosting frameworks, including XGBoost and LightGBM, are selected for their strong track record of predictive accuracy on structured data. In addition, k-Nearest Neighbors (k-NN) is evaluated as a non-parametric method that relies on local molecular similarity. Model performance is evaluated through stratified ten-fold cross-validation, which keeps the class distribution consistent across folds.

Reliable evaluation metrics are essential for BBB permeability prediction because the consequences of false positives and false negatives differ substantially. The metrics used in this study are defined as follows, with BBB+ treated as the positive class. Accuracy provides an overall measure of correctness but can be misleading under class imbalance, as is the case for the dataset used here. Precision quantifies the fraction of predicted BBB+ molecules that are truly permeable, helping to minimize resources wasted on false positives. Recall (sensitivity) measures the fraction of permeable molecules that are correctly identified, reducing the risk of prematurely discarding potential drug candidates. The F1-score is the harmonic mean of precision and recall and provides a balanced assessment of both error types.
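These definitions translate directly into code. The sketch below (standard-library Python, hypothetical confusion-matrix counts) computes the metrics above, plus specificity for the BBB− class, from true/false positive and negative counts, with BBB+ as the positive class.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, specificity and F1 from confusion-matrix
    counts, treating BBB+ as the positive class. Undefined ratios return 0."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)   # share of predicted BBB+ that are BBB+
    recall = ratio(tp, tp + fn)      # share of BBB+ compounds recovered
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),  # share of BBB- compounds recovered
        "f1": ratio(2 * precision * recall, precision + recall),
    }

# Hypothetical counts for a small test fold.
metrics = classification_metrics(tp=90, fp=10, tn=15, fn=10)
print({name: round(value, 3) for name, value in metrics.items()})
```

In this example accuracy (0.84) looks reasonable, yet only 60% of BBB− compounds are recognized, which is the kind of gap that class-specific metrics expose.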

3.3. Model Evaluation and Interpretation

Model performance is evaluated to capture both overall accuracy and class-specific trade-offs. In particular, accuracy, precision, recall, and F1-score are computed to assess predictive balance across BBB+ and BBB− compounds. Receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC) values quantify discriminative power, while average precision scores provide an additional summary of performance under class imbalance. To obtain robust estimates, all metrics are reported as averages across stratified ten-fold cross-validation.
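ROC AUC and average precision can both be computed from ranked scores. The sketch below uses the rank-based (Mann–Whitney) form of ROC AUC, i.e., the probability that a randomly chosen BBB+ compound scores higher than a randomly chosen BBB− compound, and the step-wise definition of average precision, AP = Σ_k (R_k − R_{k−1}) P_k. It is a quadratic-time illustration on hypothetical scores; library implementations such as scikit-learn’s `roc_auc_score` and `average_precision_score` are faster and group tied scores when computing AP, which this sketch does not.

```python
def roc_auc(y_true, scores):
    """Fraction of (BBB+, BBB-) pairs ranked correctly; ties count one half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Sum of precision at each rank where a BBB+ compound appears, weighted by
    the recall increment 1 / n_pos (tied scores are not grouped here)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(y_true)
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += (hits / rank) / n_pos
    return ap

# Hypothetical predicted BBB+ probabilities for five compounds.
y_true = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
auc = roc_auc(y_true, scores)
ap = average_precision(y_true, scores)
```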

To address the class imbalance present in the dataset, resampling techniques such as SMOTE, Borderline-SMOTE, and undersampling are applied prior to model training. Their impact on predictive performance is evaluated by comparing classification outcomes, including true positives, true negatives, false positives, and false negatives. This enables quantitative assessment of the trade-offs between precision and recall under different sampling strategies.

Beyond these performance metrics, interpretability and visualization are used to provide chemical and biological insight into model predictions. Feature importance scores derived from the selected model (Random Forest with the combined undersampling method) are used to identify the most influential descriptors. These evaluation and interpretation procedures not only benchmark predictive power but also connect statistical performance to biologically meaningful features, offering a practical bridge between cheminformatics and drug discovery applications.

4. Conclusions

In this study, supervised ML models are evaluated for predicting BBB permeability using ten-fold cross-validation and resampling strategies to address class imbalance. Unlike prior work that emphasizes predictive accuracy alone, this study establishes a feature-guided interpretability framework that links model performance directly to actionable design principles for CNS drug discovery. Random Forest models achieve high F1-scores with lower computational cost than more complex gradient boosting approaches, demonstrating that algorithmic simplicity combined with transparent descriptor analysis can offer superior practical utility.

The feature analysis identifies hydrogen bond donor count (NHOHCount) and heteroatom count (NOCount) as dominant predictors, revealing clear non-linear permeability trends. Molecules with 0–1 NH/OH donors show >80% permeability, while those with four or more donors show markedly reduced penetration. A critical decision boundary at NH/OH = 3 is identified, corresponding to a pharmacokinetic “gray zone” where predictive performance drops sharply and suggesting a structural threshold at which compensatory physicochemical mechanisms become dominant. These findings provide experimentally relevant guidelines, supported by clinically established examples such as propranolol (BBB+) and ampicillin (BBB−).

From a cheminformatics perspective, NHOHCount and NOCount capture polarity, solubility, and hydrogen bonding capacity, properties that directly shape passive transcellular diffusion across the BBB. The models quantify the relative importance of these features and recapitulate established pharmacological heuristics such as Lipinski’s Rule of Five, indicating that model predictions remain consistent with known biochemical principles rather than reflecting purely statistical correlations.

An important practical consideration is the applicability domain (AD) of these models. While stratified cross-validation indicates robust internal performance, generalization to novel chemotypes may vary. Future work could incorporate molecular similarity metrics (e.g., Tanimoto coefficient) and ensemble-based uncertainty estimates to identify predictions made outside the chemical space represented in training data. External validation on scaffold-split datasets or newly synthesized compounds will further strengthen confidence in model generalizability. Moreover, this study focuses on physicochemical descriptors associated with passive diffusion, and active transporter mechanisms such as P-glycoprotein and BCRP are not modeled. As a result, the predictions are most applicable to compounds that permeate primarily via passive mechanisms. Additionally, ionization states are not explicitly represented; incorporating pH-dependent features such as logD may further refine future predictions. Additional work includes expanding the training dataset to incorporate broader chemical diversity, exploring advanced molecular representations such as graph neural networks, and integrating predictive modeling with generative design workflows. These developments are expected to enhance both the predictive accuracy and real-world applicability of computational pipelines for BBB permeability and support more effective CNS drug discovery.
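The similarity-based applicability-domain check suggested above could start from a nearest-neighbour Tanimoto score. In this sketch, fingerprints are represented as sets of on-bit indices, and the 0.4 cut-off, the function names, and the toy fingerprints are illustrative assumptions rather than validated settings.

```python
def tanimoto(a, b):
    """Tanimoto similarity |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def applicability_check(query, training_fps, min_similarity=0.4):
    """Return (inside_domain, best_similarity) using the nearest training neighbour."""
    best = max(tanimoto(query, fp) for fp in training_fps)
    return best >= min_similarity, best

# Hypothetical folded fingerprints (sets of on-bit positions).
training_fps = [{1, 2, 3, 4}, {2, 3, 5, 8}, {10, 11, 12}]
close_query = {1, 2, 3, 5}      # shares most bits with the first two entries
novel_query = {10, 20, 21, 22}  # overlaps only one bit with the third entry

print(applicability_check(close_query, training_fps))
print(applicability_check(novel_query, training_fps))
```

Predictions for queries like the second one could then be flagged as extrapolations, and disagreement within an ensemble (for example, the spread of Random Forest tree votes) could serve as a complementary uncertainty signal.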

Author Contributions

Conceptualization, B.Z. and S.L.; methodology, B.Z. and S.L.; analysis, B.Z. and S.L.; writing—original draft preparation, B.Z. and S.L.; writing—review and editing, S.L.; visualization, B.Z. and S.L.; supervision, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Processed results will be available for at least 2 years after publication upon written request to the corresponding author.

Acknowledgments

Baining Zhu would like to thank Lizette Li for her valuable guidance and insights throughout the development of this project.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AUC	Area Under the Curve
BBB	Blood-Brain Barrier
BBBP	Blood-Brain Barrier Penetration
CNS	Central Nervous System
GaussianNB	Gaussian Naive Bayes
k-NN	k-Nearest Neighbors
LightGBM	Light Gradient Boosting Machine
logP	Logarithm of the octanol–water partition coefficient (lipophilicity)
MACCS	Molecular ACCess System
MD	Molecular Dynamics
MFP	Morgan FingerPrint
MLP	Multi-Layer Perceptron
RBF	Radial Basis Function
ROC	Receiver Operating Characteristic
SMILES	Simplified Molecular Input Line Entry System
SMOTE	Synthetic Minority Oversampling Technique
SVM	Support Vector Machines
TPSA	Topological Polar Surface Area
XGBoost	eXtreme Gradient Boosting

References

1. Jeffrey, P.; Summerfield, S. Assessment of the blood–brain barrier in CNS drug discovery. Neurobiol. Dis. 2010, 37, 33–37.
2. Alavijeh, M.S.; Chishty, M.; Qaiser, M.Z.; Palmer, A.M. Drug metabolism and pharmacokinetics, the blood–brain barrier, and central nervous system drug discovery. NeuroRx 2005, 2, 554–571.
3. Pardridge, W.M. CNS drug design based on principles of blood–brain barrier transport. J. Neurochem. 1998, 70, 1781–1792.
4. Pardridge, W.M. Blood–brain barrier endogenous transporters as therapeutic targets: A new model for small molecule CNS drug discovery. Expert Opin. Ther. Targets 2015, 19, 1059–1072.
5. Pardridge, W.M. The blood–brain barrier: Bottleneck in brain drug development. NeuroRx 2005, 2, 3–14.
6. Cecchelli, R.; Berezowski, V.; Lundquist, S.; Culot, M.; Renftel, M.; Dehouck, M.P.; Fenart, L. Modelling of the blood–brain barrier in drug discovery and development. Nat. Rev. Drug Discov. 2007, 6, 650–661.
7. Pardridge, W.M. Alzheimer’s disease drug development and the problem of the blood–brain barrier. Alzheimer’s Dement. 2009, 5, 427–432.
8. Markou, A.; Chiamulera, C.; Geyer, M.A.; Tricklebank, M.; Steckler, T. Removing obstacles in neuroscience drug discovery: The future path for animal models. Neuropsychopharmacology 2009, 34, 74–89.
9. Spencer, B.J.; Verma, I.M. Targeted delivery of proteins across the blood–brain barrier. Proc. Natl. Acad. Sci. USA 2007, 104, 7594–7599.
10. Lim, S.; Kim, W.J.; Kim, Y.H.; Lee, S.; Koo, J.H.; Lee, J.A.; Yoon, H.; Kim, D.H.; Park, H.J.; Kim, H.M.; et al. dNP2 is a blood–brain barrier-permeable peptide enabling ctCTLA-4 protein delivery to ameliorate experimental autoimmune encephalomyelitis. Nat. Commun. 2015, 6, 8244.
11. Pardridge, W.M. Blood-brain barrier drug targeting: The future of brain drug development. Mol. Interv. 2003, 3, 90.
12. Banks, W.A. Characteristics of compounds that cross the blood–brain barrier. BMC Neurol. 2009, 9, S3.
13. Ohtsuki, S.; Terasaki, T. Contribution of carrier-mediated transport systems to the blood–brain barrier as a supporting and protecting interface for the brain; importance for CNS drug discovery and development. Pharm. Res. 2007, 24, 1745–1758.
14. Kaisar, M.A.; Sajja, R.K.; Prasad, S.; Abhyankar, V.V.; Liles, T.; Cucullo, L. New experimental models of the blood–brain barrier for CNS drug discovery. Expert Opin. Drug Discov. 2017, 12, 89–103.
15. Garberg, P.; Ball, M.; Borg, N.; Cecchelli, R.; Fenart, L.; Hurst, R.; Lindmark, T.; Mabondzo, A.; Nilsson, J.; Raub, T.; et al. In vitro models for the blood–brain barrier. Toxicol. Vitr. 2005, 19, 299–334.
16. Wilhelm, I.; Krizbai, I.A. In vitro models of the blood–brain barrier for the study of drug delivery to the brain. Mol. Pharm. 2014, 11, 1949–1963.
17. Dehouck, M.P.; Jolliet-Riant, P.; Brée, F.; Fruchart, J.C.; Cecchelli, R.; Tillement, J.P. Drug transfer across the blood–brain barrier: Correlation between in vitro and in vivo models. J. Neurochem. 1992, 58, 1790–1797.
18. Kafa, H.; Wang, J.T.W.; Rubio, N.; Venner, K.; Anderson, G.; Pach, E.; Ballesteros, B.; Preston, J.E.; Abbott, N.J.; Al-Jamal, K.T. The interaction of carbon nanotubes with an in vitro blood–brain barrier model and mouse brain in vivo. Biomaterials 2015, 53, 437–452.
19. Ruck, T.; Bittner, S.; Meuth, S.G. Blood-brain barrier modeling: Challenges and perspectives. Neural Regen. Res. 2015, 10, 889–891.
20. Gidwani, M.; Singh, A.V. Nanoparticle enabled drug delivery across the blood brain barrier: In vivo and in vitro models, opportunities and challenges. Curr. Pharm. Biotechnol. 2013, 14, 1201–1212.
21. Shah, B.; Dong, X. Current status of in vitro models of the blood–brain barrier. Curr. Drug Deliv. 2022, 19, 1034–1046.
22. Bujak, R.; Struck-Lewicka, W.; Kaliszan, M.; Kaliszan, R.; Markuszewski, M.J. Blood–brain barrier permeability mechanisms in view of quantitative structure–activity relationships (QSAR). J. Pharm. Biomed. Anal. 2015, 108, 29–37.
23. Vucicevic, J.; Nikolic, K.; Dobričić, V.; Agbaba, D. Prediction of blood–brain barrier permeation of α-adrenergic and imidazoline receptor ligands using PAMPA technique and quantitative-structure permeability relationship analysis. Eur. J. Pharm. Sci. 2015, 68, 94–105.
24. Liu, R.; Sun, H.; So, S.S. Development of quantitative structure–property relationship models for early ADME evaluation in drug discovery. 2. Blood-brain barrier penetration. J. Chem. Inf. Comput. Sci. 2001, 41, 1623–1632.
25. Golmohammadi, H.; Dashtbozorgi, Z.; Acree, W.E., Jr. Quantitative structure–activity relationship prediction of blood-to-brain partitioning behavior using support vector machine. Eur. J. Pharm. Sci. 2012, 47, 421–429.
26. Wang, T.; Wu, M.B.; Lin, J.P.; Yang, L.R. Quantitative structure–activity relationship: Promising advances in drug discovery platforms. Expert Opin. Drug Discov. 2015, 10, 1283–1300.
27. Kortagere, S.; Chekmarev, D.; Welsh, W.J.; Ekins, S. New predictive models for blood–brain barrier permeability of drug-like molecules. Pharm. Res. 2008, 25, 1836–1845.
28. Vilar, S.; Chakrabarti, M.; Costanzi, S. Prediction of passive blood–brain partitioning: Straightforward and effective classification models based on in silico derived physicochemical descriptors. J. Mol. Graph. Model. 2010, 28, 899–903.
29. Mensch, J.; Jaroskova, L.; Sanderson, W.; Melis, A.; Mackie, C.; Verreck, G.; Brewster, M.E.; Augustijns, P. Application of PAMPA-models to predict BBB permeability including efflux ratio, plasma protein binding and physicochemical parameters. Int. J. Pharm. 2010, 395, 182–197.
30. Shityakov, S.; Neuhaus, W.; Dandekar, T.; Förster, C. Analysing molecular polar surface descriptors to predict blood–brain barrier permeation. Int. J. Comput. Biol. Drug Des. 2013, 6, 146–156.
31. Bickel, U. How to measure drug transport across the blood–brain barrier. NeuroRx 2005, 2, 15–26.
32. Jackson, S.; Meeks, C.; Vezina, A.; Robey, R.W.; Tanner, K.; Gottesman, M.M. Model systems for studying the blood–brain barrier: Applications and challenges. Biomaterials 2019, 214, 119217.
33. Abbott, N.J. Blood–brain barrier structure and function and the challenges for CNS drug delivery. J. Inherit. Metab. Dis. 2013, 36, 437–449.
34. Hajal, C.; Le Roi, B.; Kamm, R.D.; Maoz, B.M. Biology and models of the blood–brain barrier. Annu. Rev. Biomed. Eng. 2021, 23, 359–384.
35. Carpenter, T.S.; Kirshner, D.A.; Lau, E.Y.; Wong, S.E.; Nilmeier, J.P.; Lightstone, F.C. A method to predict blood–brain barrier permeability of drug-like compounds using molecular dynamics simulations. Biophys. J. 2014, 107, 630–641.
36. Shamloo, A.; Pedram, M.Z.; Heidari, H.; Alasty, A. Computing the blood brain barrier (BBB) diffusion coefficient: A molecular dynamics approach. J. Magn. Magn. Mater. 2016, 410, 187–197.
37. Goliaei, A.; Adhikari, U.; Berkowitz, M.L. Opening of the blood–brain barrier tight junction due to shock wave induced bubble collapse: A molecular dynamics simulation study. ACS Chem. Neurosci. 2015, 6, 1296–1301.
38. Man, V.H.; Li, M.S.; Derreumaux, P.; Wang, J.; Nguyen, T.T.; Nangia, S.; Nguyen, P.H. Molecular mechanism of ultrasound interaction with a blood brain barrier model. J. Chem. Phys. 2020, 153, 045104.
39. Rajagopal, N.; Irudayanathan, F.J.; Nangia, S. Computational nanoscopy of tight junctions at the blood–brain barrier interface. Int. J. Mol. Sci. 2019, 20, 5583.
40. Salo-Ahen, O.M.; Alanko, I.; Bhadane, R.; Bonvin, A.M.; Honorato, R.V.; Hossain, S.; Juffer, A.H.; Kabedev, A.; Lahtela-Kakkonen, M.; Larsen, A.S.; et al. Molecular dynamics simulations in drug discovery and pharmaceutical development. Processes 2020, 9, 71.
41. Borhani, D.W.; Shaw, D.E. The future of molecular dynamics simulations in drug discovery. J. Comput.-Aided Mol. Des. 2012, 26, 15–26.
42. Durrant, J.D.; McCammon, J.A. Molecular dynamics simulations and drug discovery. BMC Biol. 2011, 9, 71.
43. Saikia, S.; Bordoloi, M. Molecular docking: Challenges, advances and its use in drug discovery perspective. Curr. Drug Targets 2019, 20, 501–521.
44. Miao, R.; Xia, L.Y.; Chen, H.H.; Huang, H.H.; Liang, Y. Improved classification of blood–brain-barrier drugs using deep learning. Sci. Rep. 2019, 9, 8802.
45. Ansari, M.Y.; Chandrasekar, V.; Singh, A.V.; Dakua, S.P. Re-routing drugs to blood brain barrier: A comprehensive analysis of machine learning approaches with fingerprint amalgamation and data balancing. IEEE Access 2022, 11, 9890–9906.
46. Wang, Z.; Yang, H.; Wu, Z.; Wang, T.; Li, W.; Tang, Y.; Liu, G. In silico prediction of blood–brain barrier permeability of compounds by machine learning and resampling methods. ChemMedChem 2018, 13, 2189–2201.
47. Yang, Q.; Fan, L.; Hao, E.; Hou, X.; Deng, J.; Xia, Z.; Du, Z. Machine Learning Exploration of the Relationship Between Drugs and the Blood–Brain Barrier: Guiding Molecular Modification. Pharm. Res. 2024, 41, 863–875.
48. Mazumdar, B.; Sarma, P.K.D.; Mahanta, H.J.; Sastry, G.N. Machine learning based dynamic consensus model for predicting blood–brain barrier permeability. Comput. Biol. Med. 2023, 160, 106984.
49. Wu, Z.; Ramsundar, B.; Feinberg, E.N.; Gomes, J.; Geniesse, C.; Pappu, A.S.; Leswing, K.; Pande, V. MoleculeNet: A benchmark for molecular machine learning. Chem. Sci. 2018, 9, 513–530.
50. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
51. Han, H.; Wang, W.Y.; Mao, B.H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In Proceedings of the International Conference on Intelligent Computing, Hefei, China, 23–26 August 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 878–887.
52. Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data 2019, 6, 27.
53. Xue, J.; Ma, J. Extreme Sample Imbalance Classification Model Based on Sample Skewness Self-Adaptation. Symmetry 2023, 15, 1082.
54. Poduslo, J.F.; Curran, G.L. Polyamine modification increases the permeability of proteins at the blood-nerve and blood–brain barriers. J. Neurochem. 1996, 66, 1599–1609.
55. Fong, C.W. Permeability of the blood–brain barrier: Molecular mechanism of transport of drugs and physiologically important compounds. J. Membr. Biol. 2015, 248, 651–669.
56. Pardridge, W.M.; Sakiyama, R.; Fierer, G. Transport of propranolol and lidocaine through the rat blood–brain barrier. Primary role of globulin-bound drug. J. Clin. Investig. 1983, 71, 900–908.
57. Olesen, J.; Hougård, K.; Hertz, M. Isoproterenol and propranolol: Ability to cross the blood–brain barrier and effects on cerebral circulation in man. Stroke 1978, 9, 344–349.
58. Medeiros, A.; O’Brien, T. Ampicillin-resistant Haemophilus influenzae type B possessing a TEM-type β-lactamase but little permeability barrier to ampicillin. Lancet 1975, 305, 716–719.
59. Nau, R.; Sorgel, F.; Eiffert, H. Penetration of drugs through the blood-cerebrospinal fluid/blood–brain barrier for treatment of central nervous system infections. Clin. Microbiol. Rev. 2010, 23, 858–883.
60. Fu, X.C.; Wang, G.P.; Shan, H.L.; Liang, W.Q.; Gao, J.Q. Predicting blood–brain barrier penetration from molecular weight and number of polar atoms. Eur. J. Pharm. Biopharm. 2008, 70, 462–466.
61. Dichiara, M.; Amata, B.; Turnaturi, R.; Marrazzo, A.; Amata, E. Tuning properties for blood–brain barrier permeation: A statistics-based analysis. ACS Chem. Neurosci. 2019, 11, 34–44.
62. Lipinski, C.A.; Lombardo, F.; Dominy, B.W.; Feeney, P.J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 1997, 23, 3–25.
63. Berman, S.M.; Kuczenski, R.; McCracken, J.T.; London, E.D. Potential adverse effects of amphetamine treatment on brain and behavior: A review. Mol. Psychiatry 2009, 14, 123–142.
64. Spigelman, M.K.; Zappulla, R.A.; Johnson, J.; Goldsmith, S.J.; Malis, L.I.; Holland, J.F. Etoposide-induced blood–brain barrier disruption: Effect of drug compared with that of solvents. J. Neurosurg. 1984, 61, 674–678.
65. Landrum, G. RDKit Documentation. Release 2013, 1, 4.
66. Charman, W.N.; Porter, C.J.; Mithani, S.; Dressman, J.B. Physicochemical and physiological mechanisms for the effects of food on drug absorption: The role of lipids and pH. J. Pharm. Sci. 1997, 86, 269–282.

Figure 1. Top-ranked feature importance based on the Random Forest model.
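The caption does not say which importance measure Figure 1 uses. If, as is common, it is the impurity-based importance that scikit-learn exposes as `feature_importances_` on a `RandomForestClassifier` (Gini criterion by default), each feature's score is the sample-weighted impurity decrease from every split on that feature, summed over a tree, averaged over trees and normalised to sum to 1. The sketch below shows the building block, the Gini decrease of a single threshold split; the function names and the toy NH/OH counts and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(values, labels, threshold):
    """Gini decrease from splitting the samples on `value <= threshold`.

    Child impurities are weighted by the fraction of samples they receive,
    as in the node-level term of mean-decrease-in-impurity importance.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    child = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - child


# Hypothetical NH/OH counts with labels 1 = permeable, 0 = non-permeable.
counts = [0, 1, 2, 3, 4, 5]
labels = [1, 1, 1, 0, 0, 0]
print(impurity_decrease(counts, labels, 2))  # perfect split: removes all impurity
print(impurity_decrease(counts, labels, 0))  # weaker split: smaller decrease
```

Impurity-based scores can favour features with many distinct values, so permutation importance (`sklearn.inspection.permutation_importance`) is a common cross-check for rankings such as Figure 1.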

Figure 2. Impact of –NH/OH group count on BBB permeability: (a) permeability rate, (b) distribution across permeable and non-permeable compounds, (c) model accuracy.
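Panels (a) and (c) are per-bin aggregates: compounds are grouped by descriptor value, and within each group one computes the fraction labelled permeable and the fraction the model classifies correctly. A minimal sketch, assuming binary labels (1 = permeable) and predictions taken from held-out data; the function name and the counts, labels and predictions in the example are hypothetical, not values from the BBBP dataset.

```python
from collections import defaultdict


def rate_and_accuracy_by_count(counts, y_true, y_pred):
    """Group compounds by a descriptor count (e.g. NH/OH count).

    Returns {count: (permeable_fraction, accuracy, n_compounds)}, sorted by count.
    """
    groups = defaultdict(list)
    for c, t, p in zip(counts, y_true, y_pred):
        groups[c].append((t, p))
    summary = {}
    for c in sorted(groups):
        pairs = groups[c]
        n = len(pairs)
        permeable = sum(t for t, _ in pairs) / n       # panel (a): permeability rate
        accuracy = sum(t == p for t, p in pairs) / n   # panel (c): model accuracy
        summary[c] = (permeable, accuracy, n)
    return summary


# Hypothetical example: eight compounds with NH/OH counts 0, 1 or 3.
counts = [0, 0, 1, 1, 3, 3, 3, 3]
y_true = [1, 1, 1, 1, 1, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 1]
print(rate_and_accuracy_by_count(counts, y_true, y_pred))
```

Returning the bin size alongside each rate matters because bins at high counts can be small, and accuracy estimated from a handful of compounds is noisy.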

Figure 3. Skeletal formula of (a) propranolol (BBB permeable) and (b) ampicillin (BBB non-permeable) with –OH groups and –NH groups highlighted in yellow.

Figure 4. Impact of NO count (number of N and O atoms) on BBB permeability: (a) permeability rate, (b) distribution across permeable and non-permeable compounds, (c) model accuracy.

Figure 5. Skeletal formula of (a) dexamfetamine (BBB permeable) and (b) etoposide (BBB non-permeable) with O atoms and N atoms highlighted in yellow.
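The NO count used in Figure 4 is the number of nitrogen plus oxygen atoms, the quantity RDKit exposes as `rdkit.Chem.Lipinski.NOCount`. Because it depends only on atomic composition, it can be read off a molecular formula. The sketch below does this for the four compounds in Figures 3 and 5, using their standard formulas (dexamfetamine C9H13N, etoposide C29H32O13, propranolol C16H21NO2, ampicillin C16H19N3O4S). The parser is an illustrative helper, not part of the authors' pipeline, and handles only flat formulas without parentheses, charges or hydrates. The NH/OH count, by contrast, depends on where hydrogens sit and needs the molecular graph (e.g. from SMILES).

```python
import re


def element_counts(formula):
    """Parse a flat molecular formula such as 'C16H21NO2' into element counts.

    Matches one- or two-letter element symbols followed by an optional count.
    """
    counts = {}
    for symbol, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(num) if num else 1)
    return counts


def no_count(formula):
    """Number of nitrogen plus oxygen atoms in the formula."""
    counts = element_counts(formula)
    return counts.get("N", 0) + counts.get("O", 0)


compounds = {
    "dexamfetamine (permeable)": "C9H13N",
    "propranolol (permeable)": "C16H21NO2",
    "ampicillin (non-permeable)": "C16H19N3O4S",
    "etoposide (non-permeable)": "C29H32O13",
}
for name, formula in compounds.items():
    print(name, no_count(formula))
```

The counts come out as 1, 3, 7 and 13, so the two non-permeable examples carry far more N and O atoms than the two permeable ones.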

Figure 6. Workflow of the machine learning pipeline for BBB permeability prediction.
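According to the notes to Tables 2 and 3, Table 1 reports cross-validated scores. Two details such a workflow has to get right on this imbalanced dataset are that each fold keeps the overall class ratio and that any resampling (SMOTE, Borderline-SMOTE or undersampling) is fitted on the training folds only, so synthetic compounds never reach the evaluation fold. The pure-Python sketch below shows stratified k-fold splitting; k = 5, the seed and the function names are illustrative assumptions rather than the authors' settings, and `sklearn.model_selection.StratifiedKFold` provides comparable splitting in practice.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=0):
    """Assign sample indices to k folds while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        # deal each class round-robin so every fold gets its share
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def train_test_splits(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, one per fold.

    Resampling such as SMOTE would be applied to the training indices only.
    """
    folds = stratified_kfold_indices(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


# Hypothetical 80/20 imbalanced labels split into two folds.
labels = [1] * 8 + [0] * 2
for train, test in train_test_splits(labels, k=2):
    print(len(train), len(test), [labels[i] for i in test])
```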

Table 1. Baseline performance of selected classifiers on the imbalanced BBBP dataset.

Model	Run Time (s)	Accuracy	Precision	Recall	F1-Score
LogisticRegression	0.32	0.881	0.891	0.962	0.925
RandomForestClassifier	0.04	0.877	0.876	0.978	0.924
XGBClassifier	0.06	0.876	0.888	0.958	0.922
GradientBoostingClassifier	0.13	0.876	0.882	0.966	0.922
LGBMClassifier	0.05	0.872	0.891	0.949	0.919
MLPClassifier	0.37	0.860	0.882	0.944	0.912
AdaBoostClassifier	0.04	0.855	0.863	0.964	0.910
DecisionTreeClassifier	0.01	0.823	0.887	0.882	0.884
KNeighborsClassifier	0.00	0.814	0.844	0.928	0.884
DummyClassifier_most_frequent	0.00	0.763	0.763	1.000	0.866
GaussianNB	0.00	0.658	0.824	0.701	0.758
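The DummyClassifier_most_frequent row is the floor against which the other rows should be read. Its recall of 1.000 shows that the most frequent class is the permeable one, so the dummy labels every compound permeable. If a fraction p of compounds is permeable, its accuracy and precision both equal p and its F1 score is 2p/(1 + p); with p = 0.763 (consistent with 448 of the 587 compounds in the single-run evaluations of Tables 2 and 3 being permeable) this gives F1 ≈ 0.866, matching the table. The best models therefore improve accuracy by roughly 0.12 and F1 by roughly 0.06 over a classifier that learns nothing, and GaussianNB falls below this floor on both metrics. The helper below is a hypothetical illustration of that arithmetic.

```python
def majority_baseline_metrics(p):
    """Metrics of a classifier that always predicts the positive (majority) class.

    p is the fraction of positive (BBB-permeable) compounds in the evaluation data.
    """
    accuracy = p   # positives are all correct, negatives are all wrong
    precision = p  # every prediction is positive and a fraction p of them is true
    recall = 1.0   # no positive compound is ever missed
    f1 = 2 * precision * recall / (precision + recall)  # simplifies to 2p / (1 + p)
    return accuracy, precision, recall, f1


print(majority_baseline_metrics(0.763))
print(majority_baseline_metrics(448 / 587))
```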

Table 2. Performance comparison of Logistic Regression with different resampling techniques. The metrics shown represent single-run evaluations and therefore differ slightly from the cross-validated results presented in Table 1.

Metric	Without SMOTE	SMOTE	Borderline SMOTE	Undersampling
True Positives (TP)	420	409 (−11)	403 (−17)	389 (−31)
True Negatives (TN)	82	93 (+11)	92 (+10)	96 (+14)
False Positives (FP)	57	46 (−11)	47 (−10)	43 (−14)
False Negatives (FN)	28	39 (+11)	45 (+17)	59 (+31)
Correct Predictions	502	502 (0)	495 (−7)	485 (−17)
Incorrect Predictions	85	85 (0)	92 (+7)	102 (+17)
Accuracy	0.855	0.855 (+0.000)	0.843 (−0.012)	0.826 (−0.029)
Precision	0.881	0.899 (+0.018)	0.896 (+0.015)	0.900 (+0.020)
Recall	0.938	0.913 (−0.025)	0.900 (−0.038)	0.868 (−0.070)
F1 Score	0.908	0.906 (−0.002)	0.898 (−0.011)	0.884 (−0.024)
ROC AUC	0.764	0.791 (+0.027)	0.781 (+0.017)	0.779 (+0.016)
Average Precision	0.873	0.887 (+0.014)	0.882 (+0.009)	0.882 (+0.009)
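Every threshold-based rate in Table 2 follows from the four confusion-matrix counts above it, with BBB permeable as the positive class: accuracy = (TP + TN)/N, precision = TP/(TP + FP), recall = TP/(TP + FN), and F1 = 2·precision·recall/(precision + recall), equivalently 2TP/(2TP + FP + FN). The helper below, whose name is illustrative, reproduces the first two columns. ROC AUC and average precision cannot be recovered this way, because they depend on the model's ranked scores rather than on one set of hard predictions.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts from Table 2 (Logistic Regression).
without_smote = confusion_metrics(tp=420, tn=82, fp=57, fn=28)
with_smote = confusion_metrics(tp=409, tn=93, fp=46, fn=39)
print({k: round(v, 3) for k, v in without_smote.items()})
print({k: round(v, 3) for k, v in with_smote.items()})
```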

Table 3. Performance comparison of Random Forest with different resampling techniques. The metrics shown represent single-run evaluations and therefore differ slightly from the cross-validated results presented in Table 1.

Metric	Without SMOTE	SMOTE	Borderline SMOTE	Undersampling
True Positives (TP)	443	442 (−1)	440 (−3)	428 (−15)
True Negatives (TN)	87	90 (+3)	88 (+1)	97 (+10)
False Positives (FP)	52	49 (−3)	51 (−1)	42 (−10)
False Negatives (FN)	5	6 (+1)	8 (+3)	20 (+15)
Correct Predictions	530	532 (+2)	528 (−2)	525 (−5)
Incorrect Predictions	57	55 (−2)	59 (+2)	62 (+5)
Accuracy	0.903	0.906 (+0.003)	0.900 (−0.004)	0.894 (−0.009)
Precision	0.895	0.900 (+0.005)	0.896 (+0.001)	0.911 (+0.016)
Recall	0.989	0.987 (−0.002)	0.982 (−0.007)	0.955 (−0.034)
F1 Score	0.940	0.941 (+0.001)	0.937 (−0.003)	0.933 (−0.008)
ROC AUC	0.807	0.817 (+0.010)	0.808 (+0.001)	0.827 (+0.020)
Average Precision	0.893	0.898 (+0.005)	0.894 (+0.001)	0.904 (+0.011)
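Table 3 reports recall for the permeable class but not its counterpart for the non-permeable class. Specificity, TN/(TN + FP), follows directly from the counts and shows where the Random Forest's errors sit: most are non-permeable compounds predicted permeable. The column totals are also identical (448 permeable and 139 non-permeable compounds in every column), consistent with resampling having been applied to the training data only. Specificity here is a derived quantity, not one reported by the authors, and the helper names are illustrative.

```python
# Confusion-matrix counts (TP, TN, FP, FN) copied from Table 3 (Random Forest).
TABLE3 = {
    "Without SMOTE": (443, 87, 52, 5),
    "SMOTE": (442, 90, 49, 6),
    "Borderline SMOTE": (440, 88, 51, 8),
    "Undersampling": (428, 97, 42, 20),
}


def recall_and_specificity(counts):
    """Return {column: (recall, specificity)} from (TP, TN, FP, FN) tuples."""
    result = {}
    for name, (tp, tn, fp, fn) in counts.items():
        recall = tp / (tp + fn)       # permeable compounds correctly accepted
        specificity = tn / (tn + fp)  # non-permeable compounds correctly rejected
        result[name] = (recall, specificity)
    return result


def class_totals(counts):
    """Positives (TP + FN) and negatives (TN + FP) in each column's evaluation set."""
    return {name: (tp + fn, tn + fp) for name, (tp, tn, fp, fn) in counts.items()}


for name, (rec, spec) in recall_and_specificity(TABLE3).items():
    print(f"{name:17s} recall={rec:.3f} specificity={spec:.3f}")
```

Specificity rises from 0.626 without resampling to 0.698 with undersampling while recall falls from 0.989 to 0.955, which is the trade-off behind the small accuracy changes in Table 3.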

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhu, B.; Liu, S. Feature-Guided Machine Learning for Studying Passive Blood–Brain Barrier Permeability to Aid Drug Discovery. Int. J. Mol. Sci. 2025, 26, 11228. https://doi.org/10.3390/ijms262211228


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
