Article

Towards Deep Neural Network Models for the Prediction of the Blood–Brain Barrier Permeability for Diverse Organic Compounds

by Eugene V. Radchenko, Alina S. Dyabina and Vladimir A. Palyulin *
Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, Russia
* Author to whom correspondence should be addressed.
Molecules 2020, 25(24), 5901; https://doi.org/10.3390/molecules25245901
Submission received: 23 November 2020 / Revised: 6 December 2020 / Accepted: 10 December 2020 / Published: 13 December 2020
(This article belongs to the Section Medicinal Chemistry)

Abstract

Permeation through the blood–brain barrier (BBB) is among the most important processes controlling the pharmacokinetic properties of drugs and other bioactive compounds. Using the fragmental (substructural) descriptors representing the occurrence number of various substructures, as well as the artificial neural network approach and the double cross-validation procedure, we have developed a predictive in silico LogBB model based on an extensive and verified dataset (529 compounds), which is applicable to diverse drugs and drug-like compounds. The model has good predictivity parameters ($Q^2 = 0.815$, $RMSE_{cv} = 0.318$) that are similar to or better than those of the most reliable models available in the literature. Larger datasets, and perhaps more sophisticated network architectures, are required to realize the full potential of deep neural networks. The analysis of fragment contributions reveals patterns of influence consistent with the known concepts of structural characteristics that affect the BBB permeability of organic compounds. The external validation of the model confirms good agreement between the predicted and experimental LogBB values for most of the compounds. The model enables the evaluation and optimization of the BBB permeability of potential neuroactive agents and other drug compounds.

1. Introduction

Permeation through the blood–brain barrier (BBB) is among the most important processes controlling the pharmacokinetic properties of drugs and other bioactive compounds in humans and animals. For compounds targeting the central nervous system (CNS), such a penetration should be enhanced during drug development while, for peripherally acting drugs, it should be avoided to prevent central side effects [1,2]. In addition to the passive diffusion, this process may involve active efflux and uptake transport, as well as the binding of a drug to plasma proteins and brain tissue. In recent years, substantial progress has been made in the development of direct in vivo measurements of the blood-to-brain transport (e.g., microdialysis [3] and cerebral open-flow microperfusion [4]), as well as non-mammalian whole-organism models suitable for the high-throughput screening [5]. Increasingly relevant and accurate in vitro models are also being developed [6,7,8], including cell-free methods such as the widely used parallel artificial membrane permeability assay (PAMPA) or immobilized artificial membrane (IAM) chromatography, brain slices, isolated capillaries, and various cell culture models. Nevertheless, all these approaches obviously require significant amounts of a physical substance, while achieving physiological relevance in a model may be challenging. Thus, the need for robust in silico techniques for the prediction of the BBB permeability of diverse compounds with different transport mechanisms is still quite valid, especially in virtual screening, multiparameter assessment [9], and lead optimization contexts.
Traditionally, the most commonly used quantitative measure of BBB penetration and distribution has been the ratio of total concentrations of a compound in the brain and in plasma or whole blood, usually expressed as a logarithm
$$\mathrm{LogBB} = \log K_p = \log\frac{C_{\mathrm{brain}}}{C_{\mathrm{plasma}}}$$
Although it is now recognized that this quantity is not the best endpoint for drug optimization because actual bioavailability in the brain can be significantly distorted by non-specific binding on both sides of the barrier [1,3,10], it strongly dominates the body of the available experimental data and is used in the majority of modeling studies. Additional variability limiting the model quality is caused by the time-dependent nature of the distribution process. The reported brain-to-plasma ratios may be based on the concentration or area under the curve (AUC) values at different timepoints and under different experimental conditions [11]. In many publications, the more easily accessible and abundant classification data (BBB+ for penetrating and BBB− for non-penetrating compounds), often estimated by the presence or absence of the CNS activity, are used.
Starting with the pioneering works of Levin [12] and Young et al. [13], dozens of papers aiming to predict the BBB permeability were published. A review of earlier results can be found in Garg et al. [14], while the reviews by Raevsky et al. [15], Lanevskij et al. [16], Morales et al. [10], and Liu et al. [17] focus on recent publications. Most of the models are based on some combinations of physico-chemical descriptors, primarily lipophilicity, ionization, molecule size, surface area, polarity, polarizability, and hydrogen bonding ability [18,19,20,21,22,23,24,25,26]. Simple rules of thumb or scores to predict BBB-penetrating compounds, similar to the drug-likeness filters, have been formulated [27,28]. As additional descriptors, orbital and solvation energies calculated by quantum chemistry methods [29], membrane permeation parameters derived from molecular dynamics simulations [30,31,32], and even experimental parameters from chromatography [33,34,35] and ion mobility spectroscopy [36], can be used. External estimates of probability that a compound will undergo active efflux mediated by P‑glycoprotein (P-gp) can also be included [21,24,37]. In other approaches, large pools of various 1D, 2D (including molecular fingerprints), and 3D molecular descriptors calculated by different methods [26,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52] are analyzed by various statistical learning techniques, e.g., multiple linear regression, linear discriminant analysis, partial least squares regression, support vector machines, artificial neural networks, random forests, etc., often in combination with some descriptor selection protocols [23,24,26,43,44,45,46,47,48,49,50,51,52]. However, the limited size of the training sets, use of unverified data, and too-small modeling errors for such an inherently noisy endpoint often give rise to the concerns of possible model overfitting [16].
In view of these issues, the goal of the present work was to develop a predictive in silico blood–brain barrier permeability model, applicable to diverse drugs and drug-like compounds. We decided to focus on the fragmental (substructural) descriptors representing the occurrence number of various substructures. Although rarely employed in the literature for the blood–brain permeability modeling, in combination with artificial neural networks they provide efficient tools for various quantitative structure–property relationship (QSPR) and quantitative structure–activity relationship (QSAR) problems [53,54,55]. Previously, this approach was successfully used by us to model the effects of structure on a number of physico-chemical, pharmacokinetic, and toxicity endpoints such as lipophilicity [56], blood–brain barrier permeability (preliminary model [57]), human intestinal absorption [58], hERG-mediated cardiac toxicity [59], etc. Some of these models are available online at our ADMET Prediction Service page (http://qsar.chem.msu.ru/admet/ accessed 01 November 2020) and have been successfully used to evaluate the key absorption, distribution, metabolism, excretion, toxicity (ADMET) properties of potential drug compounds in the virtual screening and molecular design studies [60,61,62,63]. A secondary goal was to refine the modeling procedures suitable for deep neural networks as well as to evaluate their applicability.

2. Results and Discussion

2.1. Blood–Brain Barrier Permeability Dataset

Both our experience and the literature data [64,65,66,67] show that the completeness and accuracy of the training sets play a critical role in developing predictive and widely applicable QSAR/QSPR models. We have compiled a dataset based on the open quantitative (LogBB) data that was significantly extended and more complete compared to the largest sets published at the time. More than 100 source publications were included. Unfortunately, the quality of the available literature data has not significantly improved in the almost two decades since the analysis [68] was published. During the preparation of the dataset, the data were verified and the errors in structures and endpoint values corrected against the original publications. On the other hand, inorganic molecules irrelevant to medicinal chemistry were excluded. The final dataset used in the modeling contains 529 diverse organic compounds, with LogBB values ranging from −2.15 to 1.70 (the full dataset with literature references is provided in the Supplementary Materials). The plot of the LogBB value distribution (Figure 1) confirms representative coverage of the entire endpoint range.
Although a large number of papers on LogBB modeling have been published, most of them are based on significantly smaller datasets that contain data completely overlapping with our dataset. One of the exceptions is the extensive dataset compiled by Brito-Sánchez et al. [26] which we decided to use as an external validation set. Out of 581 compounds, 13 were excluded because of the unrealistic LogBB values (<−2.5). Among the remaining 568 compounds, 216 compounds with LogBB values ranging from −2.15 to 1.60 were not present in our dataset. Taking into account the significant number of non-overlapping compounds, instead of merging the data and rebuilding the models, we used this dataset (without any further curation) for additional external “stress-test” validation of the model.

2.2. Molecular Descriptors

The fragmental (substructural) descriptors [53,54,55] representing the occurrence number of various substructures were calculated in the framework of the NASAWIN 2.0 [69] software. Linear paths, cycles, and branches were generated using multi-level classification that takes into account atom types, valence states, bonding patterns, and number of attached hydrogens, as well as bond types. The rare fragments that are present in four or fewer compounds, and thus cannot be used to detect general predictive relationships, were removed. The fragments containing up to 10 non-hydrogen atoms were considered in order to provide a sufficiently detailed description of the structures without an excessive increase in the number of descriptors. In total, several thousands of descriptors (depending on the fragment size) were generated.
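The descriptors themselves were generated with NASAWIN, so the snippet below is only a rough, hypothetical illustration of the general idea of counting fragment occurrences, here limited to linear paths and implemented with RDKit; it does not reproduce NASAWIN's multi-level atom typing or its cycle and branch fragments.

```python
# Rough illustration only: linear-path fragment counts via RDKit, standing in
# for the NASAWIN 2.0 fragment generation (which also covers cycles, branches,
# and a richer multi-level atom typing).
from collections import Counter
from rdkit import Chem

def path_fragment_counts(smiles: str, max_bonds: int = 7) -> Counter:
    """Count canonical SMILES of all linear paths with 1..max_bonds bonds."""
    mol = Chem.MolFromSmiles(smiles)
    counts: Counter = Counter()
    for n_bonds in range(1, max_bonds + 1):
        for bond_path in Chem.FindAllPathsOfLengthN(mol, n_bonds, useBonds=True):
            frag = Chem.PathToSubmol(mol, list(bond_path))
            counts[Chem.MolToSmiles(frag)] += 1
    return counts

# Example: the most frequent path fragments of a small drug-like molecule
print(path_fragment_counts("CCOC(=O)c1ccccc1N").most_common(5))
```

In such a scheme, each distinct fragment becomes a descriptor column, and its occurrence count in a molecule becomes the descriptor value.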
As sufficiently complete and accurate data on the role of various passive and active transport mechanisms (especially for P-gp substrates) in the transport through the BBB and on the binding to blood plasma proteins and brain tissues are not available for the majority of compounds [1,10], these parameters were not explicitly considered in the modeling. Instead, the related patterns in the effect of structure on the BBB penetration were expected to be modeled implicitly by the neural network-based fragmental model.

2.3. Neural Network Modeling Procedure

2.3.1. General Modeling Approach

The high-level modeling workflow shown in Figure 2 integrates the classical feed-forward back-propagation neural network (BPNN) architecture and the repeated double cross-validation [70] approach. The double cross-validation procedure involves two loops, and in each loop a fraction of the dataset is randomly selected as a test subset. During each iteration of the inner loop, a neural network model is built using the training subset, and the prediction error on the inner test subset is monitored to provide early termination, while the outer-loop test subset is used to validate the resulting model. Usually the 5 × 4-fold double cross-validation scheme is employed, corresponding to $N_O = 5$ and $N_I = 4$ in Figure 2. That is, in the outer loop the dataset is split into five subsets of approximately equal sizes, and each of them is used to validate four models built in the inner loop by splitting the remaining data into four subsets of approximately the same size and using three of them for training the model and one for early termination. The procedure can be repeated several times ($N_R$) to enhance the stability and reliability of the results. The validation subset errors are then consolidated and normalized into the usual cross-validation statistics such as the $Q^2$ parameter and the root mean squared error $RMSE_{cv}$
$$Q^2 = 1 - \frac{PRESS}{SS}$$
$$RMSE_{cv} = \sqrt{\frac{PRESS}{N}}$$
where $PRESS$ is the sum of squared prediction errors, $SS$ is the sum of squared deviations from the mean, and $N$ is the total number of samples in the dataset. To reduce the risk of overfitting and chance correlations, the inner and outer splits are randomly shuffled at each step. This approach not only provides quite reliable estimates of the model predictivity, but also generates an ensemble of neural network models based on different subsets of data that can be used to improve prediction quality and evaluate the model applicability (see Section 2.3.4). The neural network models were built using a Python script based on the TensorFlow 1.14 and Keras 2.2.4 frameworks on a high-performance NVIDIA GTX1080 GPU.
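As a rough illustration of this workflow, the sketch below wires the nested splits together and consolidates the validation-subset errors into the $Q^2$ and $RMSE_{cv}$ statistics defined above; the build_and_train callable is a placeholder for network construction and training, and averaging the inner-loop models over each outer validation subset is an assumption of the sketch rather than a detail stated in the text.

```python
# Sketch of the repeated N_R x (N_O x N_I)-fold double cross-validation;
# build_and_train() stands for network construction/training with early
# stopping on the inner test subset (see the architecture sketch below).
import numpy as np
from sklearn.model_selection import KFold

def repeated_double_cv(X, y, build_and_train, n_repeats=1, n_outer=5, n_inner=4, seed=0):
    rng = np.random.RandomState(seed)
    y_true, y_pred = [], []
    for _ in range(n_repeats):
        outer = KFold(n_outer, shuffle=True, random_state=rng.randint(1 << 30))
        for fit_idx, val_idx in outer.split(X):
            inner = KFold(n_inner, shuffle=True, random_state=rng.randint(1 << 30))
            fold_preds = []
            for tr, stop in inner.split(fit_idx):
                model = build_and_train(X[fit_idx[tr]], y[fit_idx[tr]],
                                        X[fit_idx[stop]], y[fit_idx[stop]])
                fold_preds.append(model.predict(X[val_idx]).ravel())
            y_true.append(y[val_idx])
            y_pred.append(np.mean(fold_preds, axis=0))  # consolidate inner models
    y_true, y_pred = np.concatenate(y_true), np.concatenate(y_pred)
    press = np.sum((y_true - y_pred) ** 2)              # sum of squared errors
    ss = np.sum((y_true - y_true.mean()) ** 2)          # sum of squared deviations
    return 1.0 - press / ss, np.sqrt(press / y_true.size)   # Q^2, RMSE_cv
```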
The neural network architecture may include one or more fully connected (Dense) layers. Based on our preliminary testing, the scaled exponential linear unit (SELU) activation function [71] was found to provide the best results in terms of model quality and training efficiency. Optionally, the fully connected layers can be interleaved with the AlphaDropout [71] regularization layers in order to prevent overfitting. The mean squared error was used as a loss function for model training.
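A minimal Keras sketch of such an individual network is shown below. The SELU activation, the optional AlphaDropout layer, and the MSE loss follow the description above, while the hidden-layer fraction, dropout rate, Adam optimizer, and early-stopping settings are illustrative assumptions.

```python
# Minimal sketch of an individual ensemble member: one SELU hidden layer with
# optional AlphaDropout and an MSE loss; the hidden-layer fraction, dropout
# rate, and Adam optimizer are illustrative assumptions.
from tensorflow import keras

def build_network(n_descriptors, hidden_frac=0.4, dropout=0.1):
    model = keras.Sequential([
        keras.layers.Dense(max(1, int(hidden_frac * n_descriptors)),
                           activation="selu", kernel_initializer="lecun_normal",
                           input_shape=(n_descriptors,)),
        keras.layers.AlphaDropout(dropout),   # self-normalizing dropout variant
        keras.layers.Dense(1),                # scaled LogBB output
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Early termination on the inner test subset can be attached via a callback:
# model.fit(X_tr, y_tr, validation_data=(X_stop, y_stop),
#           callbacks=[keras.callbacks.EarlyStopping(patience=20,
#                                                    restore_best_weights=True)])
```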

2.3.2. Data Preprocessing and Descriptor Selection

The preprocessing of raw descriptor and endpoint values should involve some kind of scaling to transform them into finite, small, and consistent intervals suitable for neural network modeling. Several scaler algorithms available in the scikit-learn 0.21 framework [72] can be used, including MinMaxScaler (transform features by linear scaling to the [0, 1] range), StandardScaler (standardize features by removing the mean and scaling to unit variance), RobustScaler (outlier-robust standardization by removing the median and scaling to interquartile range), and QuantileTransformer (outlier-robust standardization approximating uniform or normal distribution). Each descriptor column is scaled independently.
The descriptors represent integer fragment counts that can vary from zero to several dozens for the types of structures and fragments considered in the modeling. Their distribution is far from normal, and small changes, especially at the lower end of their range, may be significant. Thus, somewhat predictably, in preliminary tests, the MinMaxScaler descriptor scaling was found to be superior to the other scalers. Rather unexpectedly, similar results were also obtained for the continuous LogBB endpoint values, possibly because the dataset distribution is to some extent skewed. Because of this, in the derivation of final models, the MinMaxScaler scaling was used for both the descriptors and the endpoint values.
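A brief sketch of this preprocessing step with scikit-learn is given below; the array names hold synthetic placeholder data, and fitting the scalers on the training split only is a convention assumed here rather than a detail spelled out above.

```python
# Sketch of the MinMaxScaler preprocessing; the arrays are synthetic placeholders.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.random.randint(0, 5, size=(400, 200)).astype(float)  # fragment counts
y_train = np.random.uniform(-2.15, 1.70, size=400)                # LogBB values
X_test = np.random.randint(0, 5, size=(100, 200)).astype(float)

x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()
X_train_s = x_scaler.fit_transform(X_train)                 # each column -> [0, 1]
y_train_s = y_scaler.fit_transform(y_train.reshape(-1, 1))
X_test_s = x_scaler.transform(X_test)                       # reuse training ranges

# After prediction, outputs are mapped back to the LogBB scale:
# y_pred = y_scaler.inverse_transform(model.predict(X_test_s))
```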
Descriptor selection is performed globally (for the entire modeling dataset after scaling) as well as locally (for the training sets selected in the outer and inner loops of the double cross-validation procedure). It aims to remove low-variability descriptors (defined as those with variance below $10^{-6}$) and to identify the most relevant descriptor subset. For the latter task, three general approaches were implemented on the basis of the scikit-learn framework:
  • Selection of a specified number of descriptors with the highest F-values in the univariate linear regression (f_regression) or non-parametric mutual information scores [73] (mutual_info_regression) between the descriptor and the endpoint;
  • Recursive feature elimination (RFE) [74] based on the descriptor importance scores from the Partial Least Squares (PLSR), Random Forest, linear Support Vector Machine, ElasticNet or Lasso regression models;
  • Stepwise descriptor selection procedure, wherein a multiple linear or Partial Least Squares regression is iteratively refined by adding descriptors with the highest F-value or mutual information scores with the residual endpoint.
Based on preliminary testing, the optimal balance of modeling quality and efficiency is achieved for the PLSR-based stepwise selection procedure using F-value or mutual information scores. Since these models are sufficiently different from the resulting neural network models, we can be reasonably confident that the descriptor selection procedure does not lead to overfitting or chance correlations.
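The sketch below illustrates one possible reading of the stepwise PLSR-based selection with F-value scoring against the current residual; the greedy one-descriptor-per-step scheme and the number of PLS components are assumptions of the sketch, not parameters stated above.

```python
# One possible reading of the stepwise PLSR-based selection with F-value
# scoring against the current residual; greedy scheme and component count
# are assumptions of this sketch.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.feature_selection import f_regression

def stepwise_pls_selection(X, y, n_select=200, max_components=5):
    selected, remaining = [], list(range(X.shape[1]))
    residual = y.astype(float).copy()
    while len(selected) < n_select and remaining:
        f_scores, _ = f_regression(X[:, remaining], residual)  # score vs residual
        best = remaining[int(np.nanargmax(f_scores))]
        selected.append(best)
        remaining.remove(best)
        pls = PLSRegression(n_components=min(max_components, len(selected)))
        pls.fit(X[:, selected], y)
        residual = y - pls.predict(X[:, selected]).ravel()      # update residual
    return selected
```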

2.3.3. Hyperparameter Optimization

Every machine learning modeling workflow involves a number of hyperparameters that can significantly affect its quality and efficiency. These include model architecture (number and size of the hidden layers) and training parameters, as well as the descriptor set (in particular, fragment size, selection algorithm and the number of selected descriptors) and the prediction and applicability control parameters (Section 2.3.4). In the present study, hyperparameter optimization and model selection were performed using the Hyperopt 0.1.2 [75] library that implements sequential model-based (Bayesian) optimization in the hyperparameter space. The loss function for the minimization was defined as $Loss = -Q^2 + \log(\mathrm{Time})/100$ (with time in seconds), aiming to achieve the best model predictivity, preferably in the shortest time. The modeling runs that failed to provide trained models of reasonable quality within specified time limits were discarded, while good models were saved for further analysis. For some of the hyperparameters, the optimal values determined in the preliminary tests were kept fixed during the final modeling.
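A hedged sketch of such a Hyperopt search is shown below; the search space mirrors the ranges listed in Section 2.4.1, and run_double_cv is a hypothetical placeholder standing in for the full double cross-validation run.

```python
# Sketch of the Hyperopt-based search; run_double_cv() is a hypothetical stub
# that in the real workflow would select descriptors, build the networks, and
# return the cross-validated Q^2 for a given hyperparameter set.
import math, time
from hyperopt import fmin, tpe, hp, STATUS_OK

space = {
    "max_atoms": hp.choice("max_atoms", [6, 8, 10]),           # fragment size
    "n_descriptors": hp.quniform("n_descriptors", 100, 1000, 50),
    "hidden_frac": hp.uniform("hidden_frac", 0.2, 0.6),
    "dropout": hp.uniform("dropout", 0.0, 0.5),
}

def run_double_cv(params):
    # Placeholder surrogate so the sketch runs end to end.
    return 0.8 - abs(params["hidden_frac"] - 0.4)

def objective(params):
    start = time.time()
    q2 = run_double_cv(params)
    elapsed = max(time.time() - start, 1e-3)   # guard against ~0 time for the stub
    # Loss = -Q^2 + log(Time)/100: best predictivity, preferably in least time
    return {"loss": -q2 + math.log(elapsed) / 100.0, "status": STATUS_OK}

best = fmin(objective, space, algo=tpe.suggest, max_evals=100)
```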

2.3.4. Prediction and Applicability Control

As mentioned above, the double cross-validation procedure generates an ensemble of neural network models based on different subsets of data that can be used to improve prediction quality and evaluate the model applicability. The prediction procedure involves the following steps (assembled into a code sketch after the list):
  • The predicted values are calculated for each individual model in the ensemble and transformed back to the original endpoint scale;
  • For each predicted value, a sanity check is performed to ensure that it lies within a reasonable range ([−2.5, 2.5] for LogBB). Values outside of this range (extended compared to the training dataset) most probably indicate that the compound is beyond the model applicability domain limits and the individual predicted value cannot be trusted;
  • If such failed predictions are obtained from more than a specified fraction of the ensemble models (usually 50%), a prediction failure is reported;
  • The individual predicted values are clipped to a specified acceptable range ([−2, 2] for LogBB);
  • Mean and standard deviation of the individual predicted values are computed;
  • If the standard deviation is greater than a specified fraction of the acceptable range (usually 30%), a prediction failure is reported;
  • Otherwise, the mean and standard deviation values are reported.
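The sketch below assembles these steps into a single function; the trained models and the endpoint scaler are placeholders, and the exact handling of out-of-range individual values before averaging reflects our reading of the procedure rather than a stated detail.

```python
# Sketch of the ensemble prediction with the sanity and applicability checks
# listed above; models and y_scaler are placeholders for the trained ensemble
# and the endpoint MinMaxScaler.
import numpy as np

def predict_logbb(models, x, y_scaler, sane=(-2.5, 2.5), clip=(-2.0, 2.0),
                  max_fail_frac=0.5, max_sd_frac=0.3):
    # Individual model predictions, transformed back to the LogBB scale
    preds = np.array([y_scaler.inverse_transform(m.predict(x[None, :])).item()
                      for m in models])
    failed = (preds < sane[0]) | (preds > sane[1])      # outside the sanity range
    if failed.mean() > max_fail_frac:
        return None                                     # prediction failure
    clipped = np.clip(preds, *clip)                     # acceptable range [-2, 2]
    mean, sd = clipped.mean(), clipped.std()
    if sd > max_sd_frac * (clip[1] - clip[0]):
        return None                                     # models disagree too much
    return mean, sd                                     # reported prediction
```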
For the analysis and interpretation purposes, the model sensitivities to the descriptors for a particular compound can be evaluated [76] in a local linear approximation as the gradient values (partial derivatives of the scaled network output with respect to scaled inputs) calculated using TensorFlow facilities and averaged over the ensemble models. In order to analyze general trends in the influence of descriptors, these values should be multiplied by the respective scaled inputs and averaged over the prediction set.
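A sketch of this sensitivity calculation is given below; it uses the TensorFlow 2.x GradientTape API for brevity, whereas the original analysis relied on TensorFlow 1.14 facilities.

```python
# TF 2.x-style sketch of the gradient-based sensitivity analysis: gradients of
# the scaled output w.r.t. the scaled inputs, averaged over the ensemble, then
# weighted by the scaled inputs and averaged over the prediction set.
import numpy as np
import tensorflow as tf

def fragment_contributions(models, X_scaled):
    X = tf.constant(X_scaled, dtype=tf.float32)
    grads = []
    for model in models:
        with tf.GradientTape() as tape:
            tape.watch(X)
            y = model(X)                              # scaled network output
        grads.append(tape.gradient(y, X).numpy())     # d(output)/d(inputs)
    mean_grad = np.mean(grads, axis=0)                # average over the ensemble
    contributions = mean_grad * X_scaled              # sensitivity * scaled input
    return contributions.mean(axis=0)                 # average over prediction set
```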

2.4. Predictive LogBB Model

2.4.1. Optimal Architecture and Model Quality

The preliminary studies of various modeling approaches have shown that the deep neural network architectures containing two or three fully connected hidden layers do not provide significant improvements in model quality for this relatively small training set compared to the shallow (one-layer) networks. On the other hand, they require more training time and increase the risk of overfitting due to greater model complexity. Apparently, larger datasets, and perhaps more sophisticated network architectures, are required to realize the full potential of deep networks.
For this reason, the one-layer network architectures were considered for the final predictive model. During the hyperparameter optimization, three sets of fragmental descriptors were considered, containing up to 6, 8 or 10 non-hydrogen atoms. Descriptor subsets of varying size (from 100 to 1000 descriptors) were selected. The size of a hidden layer relative to the number of descriptors was varied between 0.2 and 0.6, and the dropout layers with probability between 0 and 0.5 were used.
The optimal model is based on 200 fragmental descriptors containing up to eight non-hydrogen atoms. Its predictivity parameters ($Q^2 = 0.815$, $RMSE_{cv} = 0.318$) are similar to or better than those of the most reliable models available in the literature [16,23,26,43,57], and the average prediction error is close to the error of experimental determination of LogBB (0.3 log units [27]). The training of all individual neural network models was completed in less than 250 epochs (about 100 epochs for most of them), indicating a low risk of overfitting. The comparison between the experimental LogBB values and values predicted during double cross-validation (Figure 3) also confirms high prediction accuracy for the vast majority of compounds. It should be noted that this model is based on a significantly larger and more representative training set, ensuring a broader applicability domain of the model covering more diverse compounds. In addition, the model is indeed able to implicitly handle the peculiarities of the compounds undergoing active influx or efflux: the prediction accuracy for the majority of compounds is quite high, and the significant outliers are not correlated with the known actively transported compounds. The model can provide useful guidance and improve the efficiency of the virtual screening, multiparameter assessment, and lead optimization efforts; however, like any in silico model, its predictions should eventually be validated in vitro and/or in vivo, since a specific compound of interest might be outside of the model applicability domain or could interact with the BBB components (such as transporters and receptors) in some unexpected ways.

2.4.2. Model Interpretation

For the analysis and interpretation purposes, the average fragment contributions to the predicted blood–brain permeability over the entire training set were calculated from the model gradient values, as explained in Section 2.3.4. The fragments with the most significant negative and positive influence on the LogBB values are shown in Figure 4. Many of them afford a simple interpretation, consistent with the known concepts of structural characteristics that affect the BBB permeability of organic compounds [27,77]. For example, the permeability tends to be higher for carbon-rich aliphatic and aromatic compounds, aliphatic amines, ethers, fluoro-derivatives, and aromatic chloro-derivatives. On the other hand, the presence of oxygen atoms (especially in carboxylic groups), unsaturated groups, amides, polyamines, guanidine derivatives, aliphatic chloro-derivatives, aromatic sulfoxides, sulfones, and sulfonamides tends to decrease the permeability.
Nevertheless, it should be noted that both “positive” and “negative” fragments are usually present and may even overlap in real structures. Thus, their effects may partially compensate each other in subtle non-linear ways. The model also includes a large number of other fragmental descriptors affecting the predicted BBB permeability. Moreover, in contrast to the individual gradient values, the total fragment contributions reveal significant variability between the individual compounds that reflects the different numbers of their occurrences in a structure. Thus, in the optimization of the pharmacokinetic properties of a drug, a more detailed visualization approach based on the permeability heatmaps would be helpful, coupled with full-model predictions and virtual screening of proposed structures.

2.4.3. External Validation

As explained in Section 2.1, external validation of the model was performed using the extensive dataset compiled by Brito-Sánchez et al. [26]. Among the 568 compounds with reasonable LogBB values, 216 compounds were not present in our training dataset. Their distribution is very similar, with LogBB values ranging from −2.15 to 1.60. The prediction using our model was successful for 564 compounds (213 non-overlapping compounds). The prediction results are shown in Figure 5 in terms of the experimental (dataset) and predicted LogBB values. It can be seen that the agreement between them is generally good for most of the compounds both in the overlapping and non-overlapping subsets (the statistical parameters are listed in Table 1).
However, the number of outlier compounds with significant errors is greater than desired. Surprisingly, four compounds with absolute errors greater than 1.0 log units were found even among the compounds present in our training set (overlapping subset) while, in our tests, the predictions for the training set using the full ensemble model yield $RMSE = 0.21$ and no errors greater than 0.89. Although the full curation and reconciliation of data was beyond the scope of this study, the errors for these compounds strongly exceeded the expected level and we decided to analyze the possible reasons for this discrepancy. The results are summarized in Table 2. For compounds YG15 and YG16, the LogBB values in the validation set, for no obvious reason, do not match the values in the provided reference [78] and the other literature [38] while the training set data seem correct. For tacrine, the training set and most of the literature sources provide LogBB values close to −0.12, in agreement with the classical experimental data [79], while the $K_p$ value in the referenced source [80] corresponds to $\mathrm{LogBB} = 0.98$, still different from the validation set (the values based on the unbound concentrations are in fact close to the commonly accepted value). Finally, for warfarin, the value in the referenced source [81] was calculated from Abraham descriptors rather than determined experimentally. We expect that a more detailed data curation would reveal better concordance between the experimental and predicted values. Nevertheless, this analysis strongly highlights the need for better curation procedures as well as more extensive and representative training data. It should be noted that in the external validation, significant outliers are also not correlated with known actively transported compounds, confirming that the model is able to implicitly handle the peculiarities of the compounds undergoing active influx or efflux.
Using this validation dataset, we also attempted to evaluate to what extent the standard deviation values from the ensemble prediction procedure (Section 2.3.4) could be used as a predictor of resulting prediction errors, and thus as a measure of model applicability. The plot in Figure 6 reveals a loose correlation between these quantities ($R = 0.40$) and indicates that very high prediction errors are indeed much more likely to occur for the compounds with greater ensemble standard deviations (reflecting substantial differences between individual neural network models based on different subsets of training data). However, the accuracy of this test is not sufficient for it to be used as a strict prediction acceptability filter, providing instead just a warning of potential problems.

3. Materials and Methods

3.1. Blood–Brain Barrier Permeability Datasets

The dataset was compiled from the open quantitative (LogBB) data using more than 100 source publications. The data were verified and the errors in structures and endpoint values corrected against the original publications. On the other hand, inorganic molecules irrelevant to medicinal chemistry were excluded. The final dataset used in the modeling contains 529 diverse organic compounds with LogBB values ranging from −2.15 to 1.70 (the full dataset with the literature references is provided in the Supplementary Materials).
The external validation dataset was obtained from the publication [26]. Out of 581 compounds, 13 were excluded because of the unrealistic LogBB values (<−2.5). No additional data curation was performed.
Instant JChem 20.17 software (ChemAxon Kft., Budapest, Hungary, https://chemaxon.com/) was used for structure database management, search, and analysis.

3.2. Modeling Workflow

The fragmental (substructural) descriptors representing the occurrence number of various substructures were calculated in the framework of the NASAWIN 2.0 [69] software. Linear paths, cycles, and branches were generated using multi-level classification that takes into account atom types, valence states, bonding patterns, and number of attached hydrogens, as well as bond types. The rare fragments that are present in four or fewer compounds, and thus cannot be used to detect general predictive relationships, were removed. The fragments containing up to 10 non-hydrogen atoms were considered.
Predictive neural network models were built using the Python script based on the TensorFlow 1.14 and Keras 2.2.4 frameworks on a high-performance NVIDIA GTX1080 GPU. In addition to standard libraries, the scikit-learn 0.21 machine learning framework [72] and the Hyperopt 0.1.2 [75] hyperparameter optimization library were used.

4. Conclusions

Thus, we have developed a predictive in silico blood–brain barrier permeability (LogBB) model based on an extensive and verified dataset (529 compounds) and applicable to diverse drugs and drug-like compounds. Using the fragmental (substructural) descriptors representing the occurrence number of various substructures, we have refined the modeling workflow suitable for deep neural networks and evaluated the performance of different options. The double cross-validation procedure plays a key role: it generates an ensemble of neural network models based on different subsets of data that can be used to improve prediction quality and to evaluate the model applicability for a particular compound. It was shown that larger datasets, and perhaps more sophisticated network architectures, are required to realize the full potential of deep neural networks.
Nevertheless, our optimal model has quite good predictivity parameters ($Q^2 = 0.815$, $RMSE_{cv} = 0.318$) that are similar to or better than those of the most reliable models available in the literature. In addition, it is based on a significantly larger and more representative training set, ensuring a broader applicability domain of the model covering more diverse compounds. The analysis of the average fragment contributions to the predicted blood–brain permeability reveals influence patterns consistent with the known concepts of structural characteristics that affect the BBB permeability of organic compounds. The external validation of the model on the independent dataset confirms good agreement between the predicted and experimental LogBB values for most of the compounds. It was shown that high ensemble standard deviations could provide a warning of potential model applicability problems. The model can provide useful guidance and improve the efficiency of the virtual screening, multiparameter assessment, and lead optimization efforts; however, like any in silico model, its predictions should eventually be validated in vitro and/or in vivo, since a specific compound of interest might be outside of the model applicability domain or could interact with the BBB components in some unexpected ways.
In the future, we plan to extend the blood–brain barrier permeability dataset and make the model available online at our ADMET Prediction Service page (http://qsar.chem.msu.ru/admet/), enabling the evaluation and optimization of BBB permeability and other key ADMET properties of potential neuroactive agents and other drug compounds.

Supplementary Materials

The following are available online: Modeling dataset and references.

Author Contributions

Conceptualization, E.V.R. and V.A.P.; methodology, E.V.R. and V.A.P.; software, E.V.R.; investigation, E.V.R., A.S.D., V.A.P.; data curation, E.V.R., A.S.D., V.A.P.; writing—original draft preparation, E.V.R. and V.A.P.; writing—review and editing, E.V.R. and V.A.P.; supervision, V.A.P.; funding acquisition, V.A.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Russian Science Foundation, grant number 17-15-01455.

Acknowledgments

We are grateful to the ChemAxon Kft. company for providing the academic licenses for the structural database management, search, and analysis software.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Di, L.; Rong, H.; Feng, B. Demystifying brain penetration in central nervous system drug discovery. J. Med. Chem. 2013, 56, 2–12. [Google Scholar] [CrossRef]
  2. Wager, T.T.; Liras, J.L.; Mente, S.; Trapa, P. Strategies to minimize CNS toxicity: In vitro high-throughput assays and computational modeling. Expert Opin. Drug Metab. Toxicol. 2012, 8, 531–542. [Google Scholar] [CrossRef]
  3. Summerfield, S.G.; Zhang, Y.; Liu, H. Examining the uptake of central nervous system drugs and candidates across the blood-brain barrier. J. Pharmacol. Exp. Ther. 2016, 358, 294–305. [Google Scholar] [CrossRef] [PubMed]
  4. Birngruber, T.; Ghosh, A.; Perez-Yarza, V.; Kroath, T.; Ratzer, M.; Pieber, T.R.; Sinner, F. Cerebral open flow microperfusion: A new in vivo technique for continuous measurement of substance transport across the intact blood-brain barrier. Clin. Exp. Pharmacol. Physiol. 2013, 40, 864–871. [Google Scholar] [CrossRef] [PubMed]
  5. Geldenhuys, W.J.; Allen, D.D.; Bloomquist, J.R. Novel models for assessing blood-brain barrier drug permeation. Expert Opin. Drug Metab. Toxicol. 2012, 8, 647–653. [Google Scholar] [CrossRef] [PubMed]
  6. Palmer, A.M.; Alavijeh, M.S. Overview of experimental models of the blood-brain barrier in CNS drug discovery. Curr. Protoc. Pharmacol. 2013, 62, 7.15.1–7.15.30. [Google Scholar] [CrossRef]
  7. Neuhaus, W. In vitro models of the blood-brain barrier. In Handbook of Experimental Pharmacology; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar] [CrossRef]
  8. Katt, M.E.; Shusta, E.V. In vitro models of the blood-brain barrier: Building in physiological complexity. Curr. Opin. Chem. Eng. 2020, 30, 42–52. [Google Scholar] [CrossRef]
  9. Sosnina, E.A.; Osolodkin, D.I.; Radchenko, E.V.; Sosnin, S.; Palyulin, V.A. Influence of descriptor implementation on compound ranking based on multiparameter assessment. J. Chem. Inf. Model. 2018, 58, 1083–1093. [Google Scholar] [CrossRef]
  10. Morales, J.F.; Montoto, S.S.; Fagiolino, P.; Ruiz, M.E. Current state and future perspectives in QSAR models to predict blood-brain barrier penetration in central nervous system drug R&D. Mini Rev. Med. Chem. 2017, 17, 247–257. [Google Scholar] [CrossRef]
  11. Kerns, E.H.; Di, L. Blood-brain barrier. In Drug-Like Properties: Concepts, Structure Design and Methods; Kerns, E.H., Di, L., Eds.; Academic Press: San Diego, CA, USA, 2008; pp. 122–136. ISBN 978-0-12-369520-8. [Google Scholar] [CrossRef]
  12. Levin, V.A. Relationship of octanol/water partition coefficient and molecular weight to rat brain capillary permeability. J. Med. Chem. 1980, 23, 682–684. [Google Scholar] [CrossRef]
  13. Young, R.C.; Mitchell, R.C.; Brown, T.H.; Ganellin, C.R.; Griffiths, R.; Jones, M.; Rana, K.K.; Saunders, D.; Smith, I.R.; Sore, N.E.; et al. Development of a new physicochemical model for brain penetration and its application to the design of centrally acting H2 receptor histamine antagonists. J. Med. Chem. 1988, 31, 656–671. [Google Scholar] [CrossRef] [PubMed]
  14. Garg, P.; Verma, J.; Roy, N. In silico modeling for blood-brain barrier permeability predictions. In Drug Absorption Studies: In Situ, In Vitro and In Silico Models; Ehrhardt, C., Kim, K.-J., Eds.; Biotechnology: Pharmaceutical Aspects; Springer: Boston, MA, USA, 2008; pp. 510–556. ISBN 978-0-387-74901-3. [Google Scholar] [CrossRef]
  15. Raevsky, O.A.; Solodova, S.L.; Lagunin, A.A.; Poroikov, V.V. Computer modeling of blood brain barrier permeability for physiologically active compounds. Biochem. Mosc. Suppl. Ser. B 2013, 7, 95–107. [Google Scholar] [CrossRef]
  16. Lanevskij, K.; Japertas, P.; Didziapetris, R. Improving the prediction of drug disposition in the brain. Expert Opin. Drug Metab. Toxicol. 2013, 9, 473–486. [Google Scholar] [CrossRef] [PubMed]
  17. Liu, H.; Dong, K.; Zhang, W.; Summerfield, S.G.; Terstappen, G.C. Prediction of brain: Blood unbound concentration ratios in CNS drug discovery employing in silico and in vitro model systems. Drug Discov. Today 2018, 23, 1357–1372. [Google Scholar] [CrossRef] [PubMed]
  18. Lanevskij, K.; Dapkunas, J.; Juska, L.; Japertas, P.; Didziapetris, R. QSAR analysis of blood-brain distribution: The influence of plasma and brain tissue binding. J. Pharm. Sci. 2011, 100, 2147–2160. [Google Scholar] [CrossRef] [PubMed]
  19. Fu, X.-C.; Wang, G.-P.; Shan, H.-L.; Liang, W.-Q.; Gao, J.-Q. Predicting blood-brain barrier penetration from molecular weight and number of polar atoms. Eur. J. Pharm. Biopharm. 2008, 70, 462–466. [Google Scholar] [CrossRef]
  20. Shayanfar, A.; Soltani, S.; Jouyban, A. Prediction of blood-brain distribution: Effect of ionization. Biol. Pharm. Bull. 2011, 34, 266–271. [Google Scholar] [CrossRef] [Green Version]
  21. Garg, P.; Verma, J. In silico prediction of blood brain barrier permeability: An Artificial Neural Network model. J. Chem. Inf. Model. 2006, 46, 289–297. [Google Scholar] [CrossRef]
  22. Kortagere, S.; Chekmarev, D.; Welsh, W.J.; Ekins, S. New predictive models for blood-brain barrier permeability of drug-like molecules. Pharm. Res. 2008, 25, 1836–1845. [Google Scholar] [CrossRef] [Green Version]
  23. Abraham, M.H.; Ibrahim, A.; Zhao, Y.; Acree, W.E. A data base for partition of volatile organic compounds and drugs from blood/plasma/serum to brain, and an LFER analysis of the data. J. Pharm. Sci. 2006, 95, 2091–2100. [Google Scholar] [CrossRef]
  24. Chen, Y.; Zhu, Q.-J.; Pan, J.; Yang, Y.; Wu, X.-P. A prediction model for blood-brain barrier permeation and analysis on its parameter biologically. Comput. Methods Programs Biomed. 2009, 95, 280–287. [Google Scholar] [CrossRef] [PubMed]
  25. Raevsky, O.A.; Grigorev, V.Y.; Polianczyk, D.E.; Raevskaja, O.E.; Dearden, J.C. Contribution assessment of multiparameter optimization descriptors in CNS penetration. SAR QSAR Environ. Res. 2018, 29, 785–800. [Google Scholar] [CrossRef] [PubMed]
  26. Brito-Sánchez, Y.; Marrero-Ponce, Y.; Barigye, S.J.; Yaber-Goenaga, I.; Morell Pérez, C.; Le-Thi-Thu, H.; Cherkasov, A. Towards better BBB passage prediction using an extensive and curated data set. Mol. Inform. 2015, 34, 308–330. [Google Scholar] [CrossRef] [PubMed]
  27. Clark, D.E. In silico prediction of blood-brain barrier permeation. Drug Discov. Today 2003, 8, 927–933. [Google Scholar] [CrossRef]
  28. Gupta, M.; Lee, H.J.; Barden, C.J.; Weaver, D.F. The Blood-Brain Barrier (BBB) score. J. Med. Chem. 2019, 62, 9824–9836. [Google Scholar] [CrossRef]
  29. Roy, D.; Hinge, V.K.; Kovalenko, A. To pass or not to pass: Predicting the blood-brain barrier permeability with the 3D-RISM-KH molecular solvation theory. ACS Omega 2019, 4, 16774–16780. [Google Scholar] [CrossRef]
  30. Carpenter, T.S.; Kirshner, D.A.; Lau, E.Y.; Wong, S.E.; Nilmeier, J.P.; Lightstone, F.C. A method to predict blood-brain barrier permeability of drug-like compounds using molecular dynamics simulations. Biophys. J. 2014, 107, 630–641. [Google Scholar] [CrossRef] [Green Version]
  31. Wang, Y.; Gallagher, E.; Jorgensen, C.; Troendle, E.P.; Hu, D.; Searson, P.C.; Ulmschneider, M.B. An experimentally validated approach to calculate the blood-brain barrier permeability of small molecules. Sci. Rep. 2019, 9, 6117. [Google Scholar] [CrossRef]
  32. Thai, N.Q.; Theodorakis, P.E.; Li, M.S. Fast estimation of the blood-brain barrier permeability by pulling a ligand through a lipid membrane. J. Chem. Inf. Model. 2020, 60, 3057–3067. [Google Scholar] [CrossRef]
  33. Kouskoura, M.G.; Piteni, A.I.; Markopoulou, C.K. A new descriptor via bio-mimetic chromatography and modeling for the blood brain barrier (Part II). J. Pharm. Biomed. Anal. 2019, 164, 808–817. [Google Scholar] [CrossRef]
  34. Sobańska, A.W.; Wanat, K.; Brzezińska, E. Prediction of the blood-brain barrier permeability using RP-18 thin layer chromatography. Open Chem. 2019, 17, 43–56. [Google Scholar] [CrossRef]
  35. Janicka, M.; Sztanke, M.; Sztanke, K. Predicting the blood-brain barrier permeability of new drug-like compounds via HPLC with various stationary phases. Molecules 2020, 25, 487. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  36. Guntner, A.S.; Thalhamer, B.; Klampfl, C.; Buchberger, W. Collision cross sections obtained with ion mobility mass spectrometry as new descriptor to predict blood-brain barrier permeation by drugs. Sci. Rep. 2019, 9, 19182. [Google Scholar] [CrossRef] [Green Version]
  37. Lingineni, K.; Belekar, V.; Tangadpalliwar, S.R.; Garg, P. The role of multidrug resistance protein (MRP-1) as an active efflux transporter on blood-brain barrier (BBB) permeability. Mol. Divers. 2017, 21, 355–365. [Google Scholar] [CrossRef] [PubMed]
  38. Katritzky, A.R.; Kuanar, M.; Slavov, S.; Dobchev, D.A.; Fara, D.C.; Karelson, M.; Acree, W.E.; Solov’ev, V.P.; Varnek, A. Correlation of blood-brain penetration using structural descriptors. Bioorg. Med. Chem. 2006, 14, 4888–4917. [Google Scholar] [CrossRef] [PubMed]
  39. Hemmateenejad, B.; Miri, R.; Safarpour, M.A.; Mehdipour, A.R. Accurate prediction of the blood-brain partitioning of a large set of solutes using ab initio calculations and genetic neural network modeling. J. Comput. Chem. 2006, 27, 1125–1135. [Google Scholar] [CrossRef]
  40. Zhang, L.; Zhu, H.; Oprea, T.I.; Golbraikh, A.; Tropsha, A. QSAR modeling of the blood-brain barrier permeability for diverse organic compounds. Pharm. Res. 2008, 25, 1902–1914. [Google Scholar] [CrossRef]
  41. Fan, Y.; Unwalla, R.; Denny, R.A.; Di, L.; Kerns, E.H.; Diller, D.J.; Humblet, C. Insights for predicting blood-brain barrier penetration of CNS targeted molecules using QSPR approaches. J. Chem. Inf. Model. 2010, 50, 1123–1133. [Google Scholar] [CrossRef]
  42. Zhang, Y.-H.; Xia, Z.-N.; Qin, L.-T.; Liu, S.-S. Prediction of blood-brain partitioning: A model based on molecular electronegativity distance vector descriptors. J. Mol. Graph. Model. 2010, 29, 214–220. [Google Scholar] [CrossRef]
  43. Muehlbacher, M.; Spitzer, G.M.; Liedl, K.R.; Kornhuber, J. Qualitative prediction of blood-brain barrier permeability on a large and refined dataset. J. Comput.-Aided Mol. Des. 2011, 25, 1095–1106. [Google Scholar] [CrossRef] [Green Version]
  44. Chen, H.; Winiwarter, S.; Fridén, M.; Antonsson, M.; Engkvist, O. In silico prediction of unbound brain-to-plasma concentration ratio using machine learning algorithms. J. Mol. Graph. Model. 2011, 29, 985–995. [Google Scholar] [CrossRef] [PubMed]
  45. Nikolic, K.; Filipic, S.; Smoliński, A.; Kaliszan, R.; Agbaba, D. Partial least square and hierarchical clustering in ADMET modeling: Prediction of blood-brain barrier permeation of α-adrenergic and imidazoline receptor ligands. J. Pharm. Pharm. Sci. 2013, 16, 622–647. [Google Scholar] [CrossRef] [PubMed]
  46. Yuan, Y.; Zheng, F.; Zhan, C.-G. Improved prediction of blood-brain barrier permeability through machine learning with combined use of molecular property-based descriptors and fingerprints. AAPS J. 2018, 20, 54. [Google Scholar] [CrossRef] [PubMed]
  47. Zhu, L.; Zhao, J.; Zhang, Y.; Zhou, W.; Yin, L.; Wang, Y.; Fan, Y.; Chen, Y.; Liu, H. ADME properties evaluation in drug discovery: In silico prediction of blood-brain partitioning. Mol. Divers. 2018, 22, 979–990. [Google Scholar] [CrossRef] [PubMed]
  48. Wang, Z.; Yang, H.; Wu, Z.; Wang, T.; Li, W.; Tang, Y.; Liu, G. In silico prediction of blood-brain barrier permeability of compounds by machine learning and resampling methods. ChemMedChem 2018, 13, 2189–2201. [Google Scholar] [CrossRef]
  49. Majumdar, S.; Basak, S.C.; Lungu, C.N.; Diudea, M.V.; Grunwald, G.D. Finding needles in a haystack: Determining key molecular descriptors associated with the blood-brain barrier entry of chemical compounds using machine learning. Mol. Inform. 2019, 38, e1800164. [Google Scholar] [CrossRef] [Green Version]
  50. Singh, M.; Divakaran, R.; Konda, L.S.K.; Kristam, R. A classification model for blood brain barrier penetration. J. Mol. Graph. Model. 2020, 96, 107516. [Google Scholar] [CrossRef]
  51. Alsenan, S.; Al-Turaiki, I.; Hafez, A. A Recurrent Neural Network model to predict blood-brain barrier permeability. Comput. Biol. Chem. 2020, 89, 107377. [Google Scholar] [CrossRef]
  52. Shaker, B.; Yu, M.-S.; Song, J.S.; Ahn, S.; Ryu, J.Y.; Oh, K.-S.; Na, D. LightBBB: Computational prediction model of blood-brain-barrier penetration based on LightGBM. Bioinformatics 2020. [Google Scholar] [CrossRef]
  53. Zefirov, N.S.; Palyulin, V.A. Fragmental approach in QSPR. J. Chem. Inf. Comput. Sci. 2002, 42, 1112–1122. [Google Scholar] [CrossRef]
  54. Artemenko, N.V.; Baskin, I.I.; Palyulin, V.A.; Zefirov, N.S. Artificial neural network and fragmental approach in prediction of physicochemical properties of organic compounds. Russ. Chem. Bull. 2003, 52, 20–29. [Google Scholar] [CrossRef]
  55. Artemenko, N.V.; Baskin, I.I.; Palyulin, V.A.; Zefirov, N.S. Prediction of physical properties of organic compounds using artificial neural networks within the substructure approach. Dokl. Chem. 2001, 381, 317–320. [Google Scholar] [CrossRef]
  56. Artemenko, N.V.; Palyulin, V.A.; Zefirov, N.S. Neural-network model of the lipophilicity of organic compounds based on fragment descriptors. Dokl. Chem. 2002, 383, 114–116. [Google Scholar] [CrossRef]
  57. Dyabina, A.S.; Radchenko, E.V.; Palyulin, V.A.; Zefirov, N.S. Prediction of blood-brain barrier permeability of organic compounds. Dokl. Biochem. Biophys. 2016, 470, 371–374. [Google Scholar] [CrossRef] [PubMed]
  58. Radchenko, E.V.; Dyabina, A.S.; Palyulin, V.A.; Zefirov, N.S. Prediction of human intestinal absorption of drug compounds. Russ. Chem. Bull. 2016, 65, 576–580. [Google Scholar] [CrossRef]
  59. Radchenko, E.V.; Rulev, Y.A.; Safanyaev, A.Y.; Palyulin, V.A.; Zefirov, N.S. Computer-aided estimation of the hERG-mediated cardiotoxicity risk of potential drug components. Dokl. Biochem. Biophys. 2017, 473, 128–131. [Google Scholar] [CrossRef]
  60. Berishvili, V.P.; Kuimov, A.N.; Voronkov, A.E.; Radchenko, E.V.; Kumar, P.; Choonara, Y.E.; Pillay, V.; Kamal, A.; Palyulin, V.A. Discovery of novel tankyrase inhibitors through molecular docking-based virtual screening and molecular dynamics simulation studies. Molecules 2020, 25, 3171. [Google Scholar] [CrossRef]
  61. Karlov, D.S.; Radchenko, E.V.; Palyulin, V.A.; Zefirov, N.S. Molecular design of proneurogenic and neuroprotective compounds-allosteric NMDA receptor modulators. Dokl. Biochem. Biophys. 2017, 473, 132–136. [Google Scholar] [CrossRef]
  62. Makhaeva, G.F.; Kovaleva, N.V.; Boltneva, N.P.; Lushchekina, S.V.; Astakhova, T.Y.; Rudakova, E.V.; Proshin, A.N.; Serkov, I.V.; Radchenko, E.V.; Palyulin, V.A.; et al. New hybrids of 4-amino-2,3-polymethylene-quinoline and p-tolylsulfonamide as dual inhibitors of acetyl- and butyrylcholinesterase and potential multifunctional agents for Alzheimer’s disease treatment. Molecules 2020, 25, 3915. [Google Scholar] [CrossRef]
  63. Makhaeva, G.F.; Kovaleva, N.V.; Boltneva, N.P.; Lushchekina, S.V.; Rudakova, E.V.; Stupina, T.S.; Terentiev, A.A.; Serkov, I.V.; Proshin, A.N.; Radchenko, E.V.; et al. Conjugates of tacrine and 1,2,4-thiadiazole derivatives as new potential multifunctional agents for Alzheimer’s disease treatment: Synthesis, quantum-chemical characterization, molecular docking, and biological evaluation. Bioorg. Chem. 2020, 94, 103387. [Google Scholar] [CrossRef]
  64. Tropsha, A. Best practices for QSAR model development, validation, and exploitation. Mol. Inform. 2010, 29, 476–488. [Google Scholar] [CrossRef] [PubMed]
  65. Ekins, S.; Tropsha, A. A turning point for blood-brain barrier modeling. Pharm. Res. 2009, 26, 1283–1284. [Google Scholar] [CrossRef] [PubMed]
  66. Fourches, D.; Muratov, E.; Tropsha, A. Trust, but verify: On the importance of chemical structure curation in cheminformatics and QSAR modeling research. J. Chem. Inf. Model. 2010, 50, 1189–1204. [Google Scholar] [CrossRef] [PubMed]
  67. Fourches, D.; Muratov, E.; Tropsha, A. Trust, but verify II: A practical guide to chemogenomics data curation. J. Chem. Inf. Model. 2016, 56, 1243–1252. [Google Scholar] [CrossRef] [Green Version]
  68. Oprea, T.I.; Olah, M.; Ostopovici, L.; Rad, R.; Mracec, M. On the propagation of errors in the QSAR literature. In EuroQSAR 2002 Designing Drugs and Crop Protectants: Processes, Problems and Solutions; Ford, M., Livingstone, D., Dearden, J., van de Waterbeemd, H., Eds.; Blackwell Science Inc.: New York, NY, USA, 2003; pp. 314–315. ISBN 978-1-4051-2516-1. [Google Scholar]
  69. Baskin, I.I.; Halberstam, N.M.; Artemenko, N.V.; Palyulin, V.A.; Zefirov, N.S. NASAWIN—A universal software for QSPR/QSAR studies. In EuroQSAR 2002 Designing Drugs and Crop Protectants: Processes, Problems and Solutions; Ford, M., Livingstone, D., Dearden, J., van de Waterbeemd, H., Eds.; Blackwell Science Inc.: New York, NY, USA, 2003; pp. 260–263. ISBN 978-1-4051-2516-1. [Google Scholar]
  70. Filzmoser, P.; Liebmann, B.; Varmuza, K. Repeated double cross validation. J. Chemom. 2009, 23, 160–171. [Google Scholar] [CrossRef]
  71. Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-normalizing neural networks. In Proceedings of the 31st International Conference on Neural Information Processing Systems; NIPS’17. Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 972–981, ISBN 978-1-5108-6096-4. [Google Scholar] [CrossRef]
  72. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  73. Ross, B.C. Mutual information between discrete and continuous data sets. PLoS ONE 2014, 9, e87357. [Google Scholar] [CrossRef]
  74. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 2002, 46, 389–422. [Google Scholar] [CrossRef]
  75. Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; Cox, D.D. Hyperopt: A Python library for model selection and hyperparameter optimization. Comput. Sci. Discov. 2015, 8, 014008. [Google Scholar] [CrossRef]
  76. Baskin, I.I.; Ait, A.O.; Halberstam, N.M.; Palyulin, V.A.; Zefirov, N.S. An approach to the interpretation of backpropagation neural network models in QSAR studies. SAR QSAR Environ. Res. 2002, 13, 35–41. [Google Scholar] [CrossRef]
  77. Geldenhuys, W.J.; Mohammad, A.S.; Adkins, C.E.; Lockman, P.R. Molecular determinants of blood-brain barrier permeation. Ther. Deliv. 2015, 6, 961–971. [Google Scholar] [CrossRef] [PubMed]
  78. Wichmann, K.; Diedenhofen, M.; Klamt, A. Prediction of blood-brain partitioning and human serum albumin binding based on COSMO-RS σ-moments. J. Chem. Inf. Model. 2007, 47, 228–233. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  79. Telting-Diaz, M.; Lunte, C.E. Distribution of tacrine across the blood-brain barrier in awake, freely moving rats using in vivo microdialysis sampling. Pharm. Res. 1993, 10, 44–48. [Google Scholar] [CrossRef] [PubMed]
  80. Fridén, M.; Winiwarter, S.; Jerndal, G.; Bengtsson, O.; Wan, H.; Bredberg, U.; Hammarlund-Udenaes, M.; Antonsson, M. Structure-brain exposure relationships in rat and human using a novel data set of unbound drug concentrations in brain interstitial and cerebrospinal fluids. J. Med. Chem. 2009, 52, 6233–6243. [Google Scholar] [CrossRef]
  81. Tsinman, O.; Tsinman, K.; Sun, N.; Avdeef, A. Physicochemical selectivity of the BBB microenvironment governing passive diffusion—Matching with a porcine brain lipid extract artificial membrane permeability model. Pharm. Res. 2011, 28, 337–363. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Distribution of the LogBB values in the modeling dataset.
Figure 2. General modeling workflow.
Figure 3. Comparison of the experimental LogBB values and the values predicted during double cross-validation.
Figure 4. Fragments having the strongest negative (a) and positive (b) effect on the predicted value of BBB permeability of compounds. Fragments are highlighted in blue for negative and red for positive. Asterisk denotes any atom type; standalone atom symbol means any atom subtype compatible with the specified bond pattern. For more complex fragments, the examples of their occurrence in a structure are shown.
Figure 5. Comparison of the experimental and predicted LogBB values for the external validation dataset. The compounds overlapping with our training set are shown as blue diamonds and the non-overlapping compounds are shown as red circles.
Figure 6. Correlation between the ensemble standard deviations of predicted LogBB values and the resulting absolute prediction errors for the external validation dataset compounds.
Table 1. Statistical parameters for the comparison of experimental and predicted LogBB values for the external validation dataset.

Parameter | Full Set | Non-Overlapping Subset
Number of compounds N | 564 | 213
Correlation coefficient R | 0.78 | 0.58
Root mean squared error RMSE | 0.47 | 0.68
Compounds with absolute error > 1.0 | 33 (6%) | 29 (14%)
Compounds with absolute error > 1.5 | 11 (2%) | 11 (5%)
Table 2. Additional analysis of some outlier compounds.

Compound | LogBB Val 1 | LogBB Train 2 | LogBB Pred 3 | Notes
2-(2-Aminoethyl)thiazole (YG16) | −1.40 (78) | −0.42 | −0.37 | Incorrect validation value
2-(2-Dimethylaminoethyl)pyridine (YG15) | −1.30 (131) | −0.06 | −0.03 | Incorrect validation value
Tacrine | 1.16 (146) | −0.13 | −0.00 | Literature discrepancy
Warfarin | 0.00 (520) | −1.30 | −1.07 | Calculated value in source
1 Value in the validation dataset [26], compound number in parentheses. 2 Value in our training set. 3 Value predicted by our model.
Sample Availability: The samples of compounds are not available from authors.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
