Next Article in Journal
Effects of Single Nucleotide Polymorphism Ala270Ser (rs316019) on the Function and Regulation of hOCT2
Next Article in Special Issue
Dynamical Rearrangement of Human Epidermal Growth Factor Receptor 2 upon Antibody Binding: Effects on the Dimerization
Previous Article in Journal
Development of a Simple Pretreatment Immunoassay Based on an Organic Solvent-Tolerant Nanobody for the Detection of Carbofuran in Vegetable and Fruit Samples
Open AccessArticle

A Computational Toxicology Approach to Screen the Hepatotoxic Ingredients in Traditional Chinese Medicines: Polygonum multiflorum Thunb as a Case Study

by 1,2,3,4,5, 1,2,3,4,5, 1,2,3,4,5, 1,2,3,4,5, 1,2,3,4,5,* and 1,2,3,4,5,*
1
Beijing Key Laboratory of Innovative Drug Discovery of Traditional Chinese Medicine (Natural Medicine) and Translational Medicine, Institute of Medicinal Plant Development, Peking Union Medical College and Chinese Academy of Medical Sciences, Beijing 100193, China
2
Key Laboratory of Bioactive Substances and Resource Utilization of Chinese Herbal Medicine, Ministry of Education, Beijing 100193, China
3
Key Laboratory of Efficacy Evaluation of Chinese Medicine against Glycolipid Metabolic Disorders, State Administration of Traditional Chinese Medicine, Beijing 100193, China
4
Key Laboratory of new drug discovery based on Classic Chinese medicine prescription, Chinese Academy of Medical Sciences, Beijing 100193, China
5
Zhongguancun Open Laboratory of the Research and Development of Natural Medicine and Health Products, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100193, China
*
Authors to whom correspondence should be addressed.
Biomolecules 2019, 9(10), 577; https://doi.org/10.3390/biom9100577
Received: 8 September 2019 / Revised: 29 September 2019 / Accepted: 5 October 2019 / Published: 7 October 2019

Abstract

In recent years, liver injury induced by Traditional Chinese Medicines (TCMs) has gained increasing attention worldwide. Assessing the hepatotoxicity of compounds in TCMs is essential and inevitable for both doctors and regulatory agencies. However, there has been no effective method to screen the hepatotoxic ingredients in TCMs available until now. In the present study, we initially built a large scale dataset of drug-induced liver injuries (DILIs). Then, 13 types of molecular fingerprints/descriptors and eight machine learning algorithms were utilized to develop single classifiers for DILI, which resulted in 5416 single classifiers. Next, the NaiveBayes algorithm was adopted to integrate the best single classifier of each machine learning algorithm, by which we attempted to build a combined classifier. The accuracy, sensitivity, specificity, and area under the curve of the combined classifier were 72.798, 0.732, 0.724, and 0.793, respectively. Compared to several prior studies, the combined classifier provided better performance both in cross validation and external validation. In our prior study, we developed a herb-hepatotoxic ingredient network and a herb-induced liver injury (HILI) dataset based on pre-clinical evidence published in the scientific literature. Herein, by combining that and the combined classifier developed in this work, we proposed the first instance of a computational toxicology to screen the hepatotoxic ingredients in TCMs. Then Polygonum multiflorum Thunb (PmT) was used as a case to investigate the reliability of the approach proposed. Consequently, a total of 25 ingredients in PmT were identified as hepatotoxicants. The results were highly consistent with records in the literature, indicating that our computational toxicology approach is reliable and effective for the screening of hepatotoxic ingredients in Pmt. The combined classifier developed in this work can be used to assess the hepatotoxic risk of both natural compounds and synthetic drugs. The computational toxicology approach presented in this work will assist with screening the hepatotoxic ingredients in TCMs, which will further lay the foundation for exploring the hepatotoxic mechanisms of TCMs. In addition, the method proposed in this work can be applied to research focused on other adverse effects of TCMs/synthetic drugs.
Keywords: Traditional Chinese Medicines; TCMs; drug-induced liver injury; DILI; computational toxicology; hepatotoxicity; Polygonum multiflorum Thunb Traditional Chinese Medicines; TCMs; drug-induced liver injury; DILI; computational toxicology; hepatotoxicity; Polygonum multiflorum Thunb

1. Introduction

For thousands of years, Traditional Chinese Medicines (TCMs) have been widely consumed in many Asian countries. As natural products, TCMs were generally considered to be green and harmless [1]. However, actually, as all drugs do, TCMs also possess a series of side effects. The hepatotoxicity [2], nephrotoxicity [3], cardiotoxicity [4], neurotoxicity [5], carcinogenicity [6], and some other toxicities of TCMs have been reported in the literature. As one of the major concerns of TCMs, hepatotoxicity has gained more and more attention recently. Currently, hundreds of TCMs or their extracts have been reported to possess potential hepatotoxicity [7,8,9]. Several special databases focused on TCM-induced liver injuries also have been developed, such as Hepatox (http://www.hepatox.org) and HDS hepatotoxicity databases [10]. Additionally, in recent studies focused on the etiology of drug-induced liver injury (DILI) in several Asian countries, TCMs were found to be the leading induction factors of DILI [11,12]. TCMs with hepatotoxicity always resulted in severe clinical adverse events, such as liver fibrosis, hepatitis, liver failure, and even death [13]. Therefore, studies focused on the hepatotoxicity of TCMs are urgent and imperative.
The identification of hepatotoxic ingredients is always the first step to exploring the hepatotoxicity of TCMs. Unlike the synthetic drugs, which generally produce their efficacy based on the theory of one drug and one target, the efficacy/toxicity of TCMs relies on the comprehensive effects of multiple ingredients, multiple targets, and multiple pathways [14]. The molecular basis of a TCM is not a single chemical entity but the combination of many chemical ingredients. Therefore, it is difficult to unveil the mechanisms of TCMs solely based on the reductionism research strategies of Western medicine [15]. Network pharmacology, a burgeoning new field, analyzes the mechanisms of drugs by integrating the complex interactions among drugs, genes, diseases, and any other relevant entities in biological systems comprehensively. The basic theory of network pharmacology is in accordance with the holistic idea of TCMs [16].
During the last decade, network pharmacology has been applied to identify the active/toxic ingredients of many TCMs [17,18,19]. Recently, taking Polygonum multiflorum Thunb (PmT) as a case, Wang et al. proposed a pathway-based systems toxicology approach to understand TCM-induced liver injury [20]. As a result, a total of 54 compounds were found to be associated with the hepatotoxicity of PmT. The top seven compounds consisted of luteolin, kaempferol, gallic acid, resveratrol, apigenin, quercetin, and emodin. The hepatotoxicity of emodin has been well documented in many studies [21,22,23,24,25]. Luteolin was reported to cause cytotoxicity in primary rat hepatocytes at dosages of 50 μM or lower levels of concentration [26]. Apigenin was found to can significantly increase the accumulation of lipid droplets and cause fatty liver disease [27]. However, for the other four compounds, no direct evidence focused on their hepatotoxicity was retrieved. Inversely, all of those four compounds were reported to be promising hepatoprotectors [28,29,30,31,32]. In another study, a similar phenomenon was observed. Based on the network pharmacology framework, the authors attempted to identify the potential hepatotoxic ingredients in Xiao-Chai-Hu-Tang. As a result, kaempferol and thymol exhibited the largest number of hepatotoxic targets connections [33]. However, neither clinical cases nor scientific reports about the hepatotoxicity of those two compounds were available. In fact, both kaempferol and thymol could significantly attenuate the liver injury induced by several hepatotoxicants [32,34]. In the two representative cases mentioned above, one can easily find that those potential hepatotoxic compounds identified solely based on network pharmacology were the mixtures of beneficial and toxic ingredients. Therefore, to uncover the real hepatotoxic ingredients in TCMs, research focused on differentiating hepatotoxicants and non-hepatotoxicants is required.
Quantitative structure–activity relationship (QSAR) is aimed at correlating structure with activity [35]. In recent years, it has been widely applied to assess the hepatotoxic risks of the synthetic drugs [36,37,38]. However, QSAR research focused on evaluating the hepatotoxicity of TCMs is very rare. Huang et al. [39], Shi et al. [40], Liu et al. [41], and Wu et al. [42] have attempted to develop QSAR models to evaluate the hepatotoxic risks of ingredients from TCMs. However, all of those models were built solely based on the synthetic drugs. As we all know, the chemical environment of natural products is quite different from that of the synthetic drugs [43,44]. Therefore, in silico models developed solely based on the synthetic drugs are always not applicable to natural products. Based on the Liver Toxicity Knowledge Base, Zhao et al. [45] and Ye et al. [46] have attempted to develop QSAR models to evaluate the hepatotoxic risks of ingredients from TCMs. Although the training sets of those two studies were relatively small (n ≤ 350), they still indicated that QSAR models developed based on both TCMs and the synthetic drugs outperformed those only relying on the synthetic drugs.
In our previous studies, we collected two hepatotoxic datasets, among which, one was specially focused on herb-induced liver injury (HILI) [47,48]. In the current study, by integrating those two datasets and Comparative Toxicogenomics Database (CTD), we built a novel hepatotoxic dataset. Then, we attempted to develop QSAR models to predict the hepatotoxic risks of compounds by incorporating the use of 13 types of molecular fingerprints/descriptors and eight machine learning algorithms (NaiveBayes, LibSVM, IBK, KStar, AdaboostM1, Bagging, J48, and RandomForest). The recursive feature elimination (RFE) method was utilized to identify the optimal feature subset of each machine learning algorithm. As a result, the best model of each machine learning algorithm was attained. We then used the NaiveBayes algorithm to develop a combined classifier based on the eight best single classifiers. By integrating the three external validation sets collected by Ai et al. [37], Zhang et al. [49], and Kotsampasakou et al. [38], an integrated external validation set was acquired and utilized to test the reliability of the combined classifier. In addition, comparisons between the combined classifier and prior studies were conducted against the three external validation sets separately. Finally, taking PmT as a case, we proposed a computational toxicology approach to screen the hepatotoxic ingredients in TCMs by combining the combined classifier constructed in this work and the herb-hepatotoxic ingredient network and HILI dataset published in our prior studies [47,48].

2. Materials and Methods

2.1. Data Set

As a robust, publicly available database, CTD is focused on understanding the interaction between the environmental exposures and human health [50]. We searched CTD with the term, “Chemical and Drug Induced Liver Injury.” Only those hepatotoxic compounds curated by experts were preserved. As a result, a total of 1009 hepatotoxic compounds were collected. Then, the data preparation method used in our prior study was adopted to unify the structures of compounds collected from CTD [47]. In our prior studies, two hepatotoxic datasets were compiled by retrieving scientific publications comprehensively [47,48]. Of note, one of the datasets is specially focused on HILI, including 584 non-hepatotoxic ingredients and 296 hepatotoxic ingredients. We integrated those two datasets and the CTD dataset, deleting duplicates and compounds that appeared in the external validation sets. As a result, a novel dataset of DILI, including 1049 hepatotoxic compounds and 1142 non-hepatotoxic compounds, was built. This dataset was used to develop QSAR models for predicting DILI. It should be noted that a total of 287 unique hepatotoxic compounds from CTD were retained. For external validation, test sets compiled by Ai et al. [37], Zhang et al. [49], and Kotsampasakou et al. [38] were adopted. Both the training set and external test sets used in this work were listed in Supplementary File 1.

2.2. Descriptors

As a freely available software package for calculating molecular descriptors and fingerprints, PaDEL-Descriptor [51] has been successfully used in many QSAR studies. In the current study, to characterize the molecular structural property of compounds, 12 types of fingerprints and a series of 2D molecular descriptors were generated by PaDEL-Descriptor (version 2.21) and detailed in Table 1. A detailed description focused on those molecular fingerprints/descriptors is provided in Supplementary File 1 (Fingerprints-descriptors).

2.3. Feature Selection

Feature selection is an effective method to eliminate redundant and irrelevant features. In the present work, we selected the features through two steps, as follows: Firstly, Boruta, an all-relevant feature selection wrapper algorithm, was adopted to select all features related to the hepatotoxic label [52]. As detailed in Table 1, for the 13 types of molecular fingerprints/descriptors, the number of molecular features identified by Boruta ranged from 13 to 138. Then, Pearson’s correlation analysis was conducted, focusing on those features. To remove the highly correlated features (Pearson’s correlation coefficients > 0.90), the findCorrelation function from the R package (version 3.5.3) caret was used. Finally, RFE from sklearn. feature_selection was implemented to rank the features. The parameter configuration of RFE was listed as follows: n_features_to_select = 1, step = 1. To perform RFE, a machine learning algorithm is required to assess the importance of each feature. Herein, we adopted the RandomForestClassifier from the sklearn. ensemble with the default parameter settings.

2.4. Machine Learning Models

In this work, all of the data mining tasks were implemented against Waikato Environment for Knowledge Analysis (WEKA, version 3.8.3) [53] within 10-fold cross validation. As an open and freely available machine learning tool, WEKA contains several different categories of supervised classifiers, such as bayes, functions, lazy, meta, and trees. To integrate the advantages of different categories of classifiers, we selected one or two representative algorithms in each category of classifier. As a result, a total of eight machine learning algorithms were selected, including NaiveBayes (bayes), LibSVM (function), IBK (lazy), KStar (lazy), AdaboostM1-J48 (meta), Bagging-IBK (meta), J48 (trees), and RandomForest (tree). Detailed descriptions of the basic theory of these machine learning algorithms are available in our previous paper [54].
The detailed model-building workflow was displayed in Figure 1.
(1)
Single classifiers: For each type of molecular fingerprint/descriptor, the non-redundant features listed in Table 1 were used as input to implement the eight machine learning algorithms selected.
Then, RFE was performed to identify the optimal feature subset for each machine learning algorithm. We updated the feature subset of each type of molecular fingerprint/descriptor according to the feature rankings provided by the RFE method. We removed the lowest-ranked feature, and the remaining features were used as input to train the eight machine learning algorithms again. This step was repeated until the number of features was reduced to 1. As a result, a total of 677 models were developed for each machine learning algorithm. Then comparisons among those models were conducted, by which we attempted to identify the optimal feature subset for each machine learning algorithm. For each machine learning algorithm, the feature subset generating the highest average value of area under curve (AUC) and accuracy (ACC) was selected as the optimal feature subset. Of note, herein, all of the machine learning algorithms were implemented with the default parameter settings provided by WEKA. The optimal feature subsets for all of the eight machine learning algorithms are available in Table 3 and are detailed in Supplementary File 1 (Optimal feature subsets).
(2)
Parameter optimization: In the development of the single classifiers, CVParameterSelection and GridSearch, two commonly used parameter optimization strategies, were adopted to determine the optimal parameter for each machine learning algorithm. The detailed parameter optimization information is provided in Table 2. All of the parameter optimization methods were implemented against WEKA within 10-fold cross validation.
(3)
Combined classifiers: It has been proven that the combined classifiers strategy outperformed many classic classifier-fusion techniques (such as mean, maximum, and majority vote) [55]. In this work, the eight best single classifiers were utilized to predict the training sets. For each compound in the training set, eight predicted results (positive/negative) were provided by the eight best single classifiers. Then, those eight predicted results were defined as new descriptors and used as input to train NaiveBayes algorithm. As a result, a combined classifier focused on predicting DILI was developed.
(4)
External Validation: By integrating the three external validation sets collected by Ai et al. [37], Zhang et al. [49], and Kotsampasakou et al. [38], an integrated external validation set was acquired and utilized to test the generalization ability of our model. In addition, to investigate the model’s robustness and statistical significance, Y-randomization analysis was performed [56].

2.5. Performance Metrics

To assess the predictive ability of the QSAR models, four parameters were calculated and defined as follows:
Accuracy = ( True   positives + True   negatives ) ( N )
Specificity = True   negatives ( True   negatives + False   positives )
Sensitivity = True   positives ( True   positives + False   negatives )
Balanced   accuracy = ( Sensitivity + Specificity ) 2
where N represents the size of the training set, and positives and negatives represent hepatotoxicants and non-hepatotoxicants, respectively.
In addition, AUC, another important indicator, was also calculated through the receiver operating characteristic analysis [57].

2.6. Herb-Hepatotoxic Ingredient Network

In our previous study, we collected hundreds of hepatotoxic ingredients and hepatotoxic herbs. Then, based on the ”herb-ingredient” pair data included in several typical TCM databases, we constructed a herb–hepatotoxic ingredient network. A total of 187 hepatotoxic herbs and 223 hepatotoxic ingredients were included in the network [48]. In the present study, this herb-hepatotoxic ingredient network was used to identify the hepatotoxic ingredients in TCMs.

3. Results

3.1. The Construction of In Silico Models for Predicting Drug-Induced Liver Injury

3.1.1. Single Classifiers

The performance of the eight best single classifiers is provided in Table 3. The sizes of the optimal feature subsets ranged from 14 to 80. Four of the eight machine learning algorithms attained their best performance based on 2D Descriptor. In addition, single classifiers developed based on 2D Descriptor, including RandomForest, Bagging, AdaBoostM1, and IBk, showed greater AUC values than those developed based on molecular fingerprints, indicating the advantage of 2D Descriptor in the development of DILI prediction models. The highest AUC (0.794) was generated by RandomForest and 2D Descriptor when the feature number was reduced to 30. For that single classifier, ACC (accuracy), SE (sensitivity), SP (specificity), and BACC (balanced accuracy) were 72.250, 0.725, 0.720, and 0.723, respectively. For the other seven single classifiers, ACC varied between 64.993 and 70.196; AUC ranged from 0.699 to 0.762; SE, SP, and BACC were generally greater than 0.651, 0.625, and 0.651, respectively. Figure 2 provides the receiver-operating characteristic curves of the eight best single classifiers, displaying the classifiers’ performances more intuitively. In summary, most of the single classifiers did not provide a satisfactory performance, with their ACC values being less than 70%.

3.1.2. Combined Classifier

By using the predicted results (positive/negative), of the eight best single classifiers developed in Section 3.1.1 as input, a combined classifier was built based on the NaiveBayes algorithm. The ACC, AUC, SE, SP, and BACC values of this combined classifier were 72.798, 0.793, 0.732, 0.724, and 0.728, respectively.
The combined classifier outperformed all the eight best single classifiers. For example, the ACC of the combined classifier was 72.798, which was higher than that of the eight best single classifiers by 0.548% to 7.805%. Among the eight best single classifiers, RandomForest classifier generated the highest ACC (72.250). Compared to RandomForest classifier, ACC, SE, SP, and BACC of the combined classifier improved by 0.548%, 0.700%, 0.400%, and 0.550%, respectively. For the external validation, by integrating the three external validation sets and eliminating the duplicates, we attained an integrated external validation set consisting of 125 positives and 79 negatives. Table 4 showed the performance of the combined classifier and the eight best single classifiers on the integrated external validation set. The combined classifier achieved the highest ACC (78.922), AUC (0.826), SP (0.750), and BACC (0.782), which further indicated that the combined classifier outperformed the eight best single classifiers. The predicted results of the integrated external validation set provided by the combined classifier are detailed in Supplementary File 1 (Integrated validation set). Summing up the above, the combined classifier outperformed the single classifiers both in cross validation and external validation, indicating the advantage and effectiveness of the combined classifiers strategy.
For further validation, 100 runs of Y-randomization were conducted. As a result, the ACC (50.8507 ± 0.9103) of the Y-randomization model was significantly worse than that of the combined classifier (ACC = 72.250), indicating the robustness and statistical significance of the combined classifier.

3.1.3. Comparison to Prior Studies

Generally, performance comparisons to prior studies are indispensable. We compared our combined classifier to three, recently reported DILI prediction models from three aspects: sample size, cross validation, and external validation. Of note, for the external validation, the combined classifier was tested against the external validation sets collected by Ai et al. [36], Zhang et al. [42], and Kotsampasakou et al. [43] separately.
For the sample size, our combined classifier and the other three models contained 2191, 1241, 978, and 996 compounds, respectively. The sample size of our combined classifier was 1.77 to 2.24 times of the other three models (Figure 3). Within cross validation, the SE of Ai’s model was 0.799, which was higher than that of our combined classifier by 6.7% (Figure 3). However, for the external validation, our combined classifier attained SE of 0.909, which is equal to that of Ai’s model (Figure 4A). In addition, both in cross validation (Figure 3) and external validation (Figure 4A), ACC, AUC, SP, and BACC of our combined classifier were significantly higher than that of Ai’s model. Within cross validation, Zhang’s model [48] attained the highest ACC, AUC, SE, and SP (Figure 3). However, SP of Zhang’s model was only 0.585, which reduced its application value significantly. Against Zhang’s external validation set, the ACC, SP, and BACC of our combined classifier were 0.753, 0.815, and 0.770, which are higher than those of Zhang’s model by 7.1%, 47%, and 17.3%, respectively (Figure 4B). The SE (0.848) of Zhang’s model was quite satisfactory. However, the SP was only 0.345. Zhang’s model provided very low SP value both in cross validation and external validation, which indicates that the results provided by Zhang’s model may contain a very number of false positives. In addition, all of the indexes provided by our combined classifier were significantly higher than those of Kotsampasakou’s model [38] both in cross validation (Figure 3) and external validation (Figure 4C).
Actually, compared to in silico models developed solely based on the synthetic drugs, it is more difficult for our combined classifier to attain a high accuracy. Both Zhao et al. [44] and Ye et al. [45] have demonstrated that models built solely based on synthetic drugs make it easier to attain a higher accuracy than those developed based on both TCMs and the synthetic drugs. The reason may be that the structures of natural products are more complex and diverse than those of synthetic drugs, increasing the difficulties of prediction.
In summary, the performance of our combined classifier was satisfactory. Both the cross validation and external validation indicated that our combined classifier was better than the prior studies. The detailed prediction results of the external validation sets collected by Ai et al. [36], Zhang et al. [42], and Kotsampasakou et al. [43] provided by the combined classifier and prior studies are available in Supplementary File 1 (Ai et al, Zhang et al, and Kotsampasakou et al).

3.2. A Computational Toxicology Approach to Screening the Hepatotoxic Ingredients in TCMs: Polygonum multiflorum Thunb as a Case

3.2.1. The Computational Toxicology Approach

In this section, we describe the computational toxicology approach proposed to screen the hepatotoxic ingredients in TCMs. The entire workflow of this computational toxicology approach is illustrated in Figure 5. The hepatotoxic ingredients of TCMs consisted of three hepatotoxic ingredient subgroups.
Hepatotoxic ingredient subgroup 1: The chemical ingredients of TCMs were extracted from TCM databases or collected from the literature. Then, the combined classifier developed in Section 3.1.2 was utilized to assess the hepatotoxic risk of each ingredient. The ingredients with higher hepatotoxic risks were marked as hepatotoxic ingredient subgroup 1.
Hepatotoxic ingredient subgroup 2: In our previous study, we developed a HILI dataset, which includes 296 hepatotoxic ingredients [48]. Herein, the chemical ingredients of TCMs were used as input to search this HILI dataset by name matching. The chemical ingredients of TCMs that appeared in the HILI dataset formed the hepatotoxic ingredients of subgroup 2.
Hepatotoxic ingredient subgroup 3: In our prior study, we constructed a herb-hepatotoxic ingredient network which includes 223 hepatotoxic ingredients and 187 hepatotoxic herbs [48]. Taking the herb name as input, one can obtain a hepatotoxic ingredient list of the corresponding herb. This hepatotoxic ingredient list was recorded as hepatotoxic ingredient subgroup 3.
Finally, the hepatotoxic ingredients were confirmed by integrating the three hepatotoxic ingredient subgroups and removing duplicates.

3.2.2. Polygonum multiflorum Thunb as a Case

PmT, a commonly used herb in China, is known for blackening hair, tonifying the liver and kidney, and slowing the aging process. However, during recent decades, it has been reported to cause variable degrees of liver injury [58]. Taking PmT as an example, we attempted to investigate the reliability of the computational toxicity approach.
By retrieving three typical TCM databases, TCMSP, TCMID, and TCM [email protected], Wang et al. collected 98 chemical components of PmT [20]. Those chemical components were used as input to identify the hepatotoxic ingredients of PmT. All of those 98 ingredients were provided in Supplementary File 1 (PmT) where the hepatotoxic risk of each compound provided by our combined classifier was also available.
Hepatotoxic ingredient subgroup 1: Among the 98 chemical ingredients of PmT, 21 ingredients were predicted as hepatotoxic by our combined classifier. Those 21 ingredients formed the hepatotoxic ingredient subgroup 1 of PmT.
Hierarchical cluster analysis based on Euclidean distance was performed to investigate the distribution of the 21 potential hepatotoxic compounds in the ingredient spectrum of PmT (Figure 6). As a result, the 21 hepatotoxic ingredients were significantly clustered into four major groups (Clusters 1–4). Ingredients in each cluster may have similar chemical characteristics. Such a cluster pattern was in line with the basic theory of QSAR, showing that our predicted results were reasonable to certain extent. In addition, consistency between the predicted results and reports in the literature for each compound was investigated and detailed in Supplementary File 1 (PmT). According to records in the literature, seven (Emodin, chrysophanol, rhein, danthron, aloe emodin, physcion, and Apigenin) out of the 21 hepatotoxic ingredients could cause variable degrees of liver injury. For the other 14 (Table 5 12–25) hepatotoxic ingredients, although direct evidence focused on their hepatotoxicity was not available, no one of them was reported to be potential hepatoprotector. A total of 77 chemical ingredients of PmT were predicted as non-hepatotoxicity, among which only three (Emodin 8-glucoside, physcion-8-O-D-glucopyranoside, and luteolin) chemical ingredients were reported to possess potential hepatotoxicity. For the other 74 chemical ingredients, no direct evidence about their hepatotoxicity was retrieved. Two (2,3,5,4’-tetrahydroxystilbene-2-O-β-D-glucopyranoside and resveratrol) of those 74 ingredients were reported to be non-hepatotoxic. In addition, a total of 17 ingredients among the 74 ingredients were reported to be potential hepatoprotectors. In summary, the predicted results of our combined classifier were highly consistent with reports in the literature, which further verified the reliability of the combined classifier.
Hepatotoxic ingredient subgroup 2: Taking the 98 chemical ingredients of PmT as input, we searched the HILI dataset by name matching. Resultantly, six ingredients (Table 5) were retrieved in the HILI dataset. Those six ingredients formed the hepatotoxic ingredient subgroup 2 of PmT.
Hepatotoxic ingredient subgroup 3: He Shou Wu, the pinyin name of PmT, was used as input to search the herb-hepatotoxic ingredient network. As a result, a total of seven hepatotoxic (Table 5) ingredients were searched, which formed the hepatotoxic ingredient subgroup 3 of PmT.
Finally, we acquired the hepatotoxic ingredients of PmT by integrating the three hepatotoxic ingredient subgroups of PmT and removing the duplicates. Table 5 shows the hepatotoxic ingredients of PmT, which consist of 25 compounds. The unique hepatotoxic ingredients from subgroup 1, subgroup 2, and subgroup 3 were 15, one, and three in number, respectively. The three hepatotoxic ingredient subgroups complemented each other, which showed the advantage of the computational toxicology approach.

4. Discussion

During the past few decades, a series of adverse effects caused by TCMs have been reported, among which hepatotoxicity is one of the major concerns. Identification of toxic ingredients is always considered to be the first step to illustrating the hepatotoxicity of TCMs. However, until now, an effective method to predict the hepatotoxicity of TCMs was unavailable. QSAR aims at finding the relationship between structure and activity/toxicity. In recent years, it has been widely used to predict the hepatotoxicity of the synthetic drugs. However, relevant research focused on TCMs is very rare. The major obstacle for such a situation is that dataset on the hepatotoxicity of TCMs is very limited. In our prior study, we collected a dataset on HILI, which laid the foundation for assessing the hepatotoxicity of TCMs based on the QSAR methods [48].
In this work, by utilizing eight machine learning algorithms and 13 types of molecular fingerprints/descriptors, a combined classifiers strategy was utilized to develop QSAR models for predicting the hepatotoxicity of TCMs. A total of 5416 single classifiers and one combined classifier were developed. For validation, an integrated external validation set was utilized to test the combined classifier. As a result, our combined classifier resulted in better performance than several prior studies both in cross validation and external validation. Within 10-fold cross validation, ACC of the combined classifier was 72.798, which was significantly higher than that of the Y-randomization model (ACC = 50.8507 ± 0.9103). The results mentioned above indicated that our combined classifier was stable and reliable. Another advantage of our combined classifier is the large scale of its training set. Our training set included 1049 positives and 1142 negatives. To the best of our knowledge, so large a DILI dataset is very rare. In addition, when predicting the hepatotoxicity of a compound, our combined classifier outputs the probability that the compound belongs to hepatotoxicants simultaneously. In this work, we set the threshold of the probability to 0.500. In other words, only those compounds with probabilities of hepatotoxicity greater than 0.500 would be categorized as hepatotoxicants, and all of the other compounds would be classified as non-hepatotoxicants. In practical applications, researchers can adjust the threshold of the probability according to their requirements. In our prior study, based on a training set consisting of 1254 compounds, we developed an in-silico model for predicting DILI by utilizing a voting method. Compared to our prior study, the advantages of this work include, but are not limited to, several aspects, as follows. Firstly, we built the combined classifier using 2191 unique compounds, 1.75 times more compounds than that used in our prior study. Secondly, a complex feature selection strategy was utilized to identify the optimal feature set for each machine learning algorithm. Thirdly, parameter optimization was performed in this work. Fourthly, the ACC, SE, SP, and BACC values of the combined classifier against the integrated external validation set were 78.922, 0.813, 0.750, and 0.782, which are higher than those of our prior work by 5.9%, 4.0%, 9.2%, and 6.6%, respectively.
Relying on a pathway-based systems toxicology approach, Wang et al. attempted to explore the hepatotoxicity of Pmt [20]. Firstly, the authors assessed the intestinal absorption properties of 98 compounds in PmT. Compounds with very poor intestinal absorption were discarded, and the remaining compounds were used for further analyses. Finally, a total of 44 compounds (Figure 7, Wang “−”) with very poor intestinal absorption were filtered off, and the other 54 compounds (Figure 7, Wang “+”) with good intestinal absorption were identified to be associated with the liver toxicity of PmT. In this work, taking PmT as a case, we proposed a computational toxicology approach to screen the heapotoxic ingredients in TCMs. The heapotoxic ingredients consisted of three heapotoxic ingredient subgroups which originated from the combined classifier, HILI dataset, and herb-hepatotoxic ingredient network, respectively. As a result, a total of 25 compounds in Pmt were identified as hepatotoxicants, among which 20 ingredients (Figure 7, module 2) exhibited good intestinal absorption. For the 44 compounds with very poor intestinal absorption, only 2 (Module 1 in Figure 7) of them were classified as hepatotoxicants. The most popular administration route of TCMs is oral administration. Therefore, chemical ingredients with good oral absorption are more likely to distribute in the liver and lead to liver injury. Therefore, we may speculate that screening focused on the oral absorption properties may help to narrow the range of screening hepatotoxic ingredients in TCMs. In introduction section, we highlighted that the potential hepatotoxicants identified by Wang et al. were the mixtures of toxic and beneficial ingredients. Thus, studies focused on differentiating hepatotoxicants and non-hepatotoxicants are required. In this work, for the 54 potential hepatotoxic compounds identified by Wang et al, twenty (Figure 7, module 2) of them were identified as hepatotoxicants by our computational toxicology approach. According to records in the literature, eight (Emodin, chrysophanol, rhein, danthron, aloe emodin, luteolin, physcion, and apigenin) out of those 20 hepatotoxic ingredients could cause varying degrees of liver injury. For the other 12 hepatotoxic ingredients, although direct evidence focused on their hepatotoxicity was not available, no one of them was reported to be a potential hepatoprotector. For the 34 non-hepatotoxic compounds (Figure 7, module 3), no direct evidence about their hepatotoxicity was retrieved. In addition, totals of 11 and one ingredients among these 34 non-hepatotoxic compounds were reported to be potential hepatoprotectors and a non-hepatotoxicant, respectively. Results provided by our integrated toxicity approach were highly consistent with reports in the literature. Summing up the above, we can conclude that the computational toxicity approach proposed in this work can differentiate non-hepatotoxic and hepatotoxic ingredients effectively. In addition, our computational toxicology approach also provided three (Chrysarobin, polygonumnolide C2, and emodin dianthrone) additional hepatotoxic ingredients in PmT which were not included in the 98 ingredients collected by Wang et al. Of note, among the 25 potential hepatotoxic ingredients in PmT, a total of 11 ingredients (Table 5, 1–11) have been reported to possess potential liver toxicity in the literature. For the other 14 ingredients, their hepatotoxicity has not been investigated. Therefore, experimental verification focused on the hepatotoxicity of these potential hepatotoxic ingredients is urgently needed in the future.
The major limitation of the study is that the majority of the HILI data used in this work was derived from animal/cell experiments rather than clinical case reports. The reason for such a situation is that studies on most ingredients from TCMs are still at the stage of animal-testing. Therefore, clinical data for these ingredients are generally unavailable. Nevertheless, we cannot deny the value of these data. After all, animal/cell experiments are often necessary in the discovery and development phases of novel drugs. Generally, researchers test new therapies through animal studies, and drug candidates which exhibit the most significant efficacy are moved on to clinical trials. Therefore, we must acknowledge the value of animal/cell experiments. Of course, if the HILI data used in this work were clinical, the value of this work would be greatly enhanced.
In summary, the advantages of this work can be summarized as, but not limited to, the following several aspects: Firstly, a large scale and high quality dataset for DILI was built, which will be a valuable resource for modeling/data mining in the future. Secondly, QSAR models developed solely based on the synthetic drugs often failed to assess the hepatotoxicity of natural products. Hundreds of hepatotoxic/non-hepatotoxic ingredients from medicinal plants were added into the modeling dataset, which led to our combined classifier, applicable to both natural products and synthetic drugs. Thirdly, the computational toxicology approach proposed in this work will assist in the screening of the hepatotoxic ingredients in TCMs, which will further lay the foundation for exploring the hepatotoxic mechanisms of TCMs. In addition, the method proposed in this work can be applied to research focused on other adverse effects of TCMs/synthetic drugs.

5. Conclusions

In the current study, by combining 13 types of molecular fingerprints/descriptors and eight machine learning algorithms, 5416 single classifiers were developed for predicting DILI. Then the NaiveBayes algorithm was utilized to construct a combined classifier by integrating the best single classifier of each machine learning algorithm. The combined classifier outperformed the eight best single classifiers and several prior studies in both cross validation and external validation. Subsequently, taking PmT as an example, a computational toxicology approach was proposed for the first time to uncover the hepatotoxic ingredients in TCMs. As a result, a total of 25 ingredients from PmT were identified as hepatotoxicants, among which 11 ingredients had been reported to be hepatotoxicants in the literature. The results were highly consistent with reports in the literature, indicating that our computational toxicology approach is reliable and effective in the screening of hepatotoxic ingredients in Pmt. In summary, the combined classifier can be used to assess the hepatotoxic risk of both natural compounds and synthetic drugs. The computational toxicity approach proposed in this work will be a powerful in silico methods for screening the hepatotoxic ingredients in TCMs.

Supplementary Materials

Supplementary materials can be found at https://www.mdpi.com/2218-273X/9/10/577/s1. Supplementary File 1.

Author Contributions

X.S. and G.S. conceived the research; S.H. carried out the majority of the computational analyses, and completed the task of manuscript preparation; X.Z. collected the dataset; S.L. and T.Z. contributed to revising and editing the manuscript. All authors have reviewed and approved the manuscript.

Funding

This research was funded by National Science and Technology Major Project (grant number 2015ZX09501004-001-003), the Special Research Project for TCM (grant number 201507004), and Major Program of the National Natural Science Foundation of China (grant number 81891012).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

TCMsTraditional Chinese Medicines
DILIDrug-induced liver injury
PmTPolygonum multiflorum Thunb
QSARQuantitative structure-activity relationship
CTDComparative Toxicogenomics Database
AUCArea under curve
ACCAccuracy
SESensitivity
SPSpecificity
BACCBalanced accuracy
HILIHerb-induced liver injury
RFERecursive feature elimination
WEKAWaikato Environment for Knowledge Analysis

References

  1. Shad, J.A.; Chinn, C.G.; Brann, O.S. Acute hepatitis after ingestion of herbs. South Med. J. 1999, 92, 1095–1097. [Google Scholar] [CrossRef] [PubMed]
  2. Teschke, R.; Andrade, R.J. Drug, Herb, and Dietary Supplement Hepatotoxicity. Int. J. Mol. Sci. 2016, 17, 1488. [Google Scholar] [CrossRef] [PubMed]
  3. Allard, T.; Wenner, T.; Greten, H.J.; Efferth, T. Mechanisms of herb-induced nephrotoxicity. Curr. Med. Chem. 2013, 20, 2812–2819. [Google Scholar] [CrossRef] [PubMed]
  4. Tai, Y.T.; But, P.P.; Young, K.; Lau, C.P. Cardiotoxicity after accidental herb-induced aconite poisoning. Lancet 1992, 340, 1254–1256. [Google Scholar] [CrossRef]
  5. Zhang, S.N.; Li, X.Z.; Wang, Y.; Zhang, N.; Yang, Z.M.; Liu, S.M.; Lu, F. Neuroprotection or neurotoxicity? new insights into the effects of Acanthopanax senticosus harms on nervous system through cerebral metabolomics analysis. J. Ethnopharmacol. 2014, 156, 290–300. [Google Scholar] [CrossRef] [PubMed]
  6. Lai, M.Y.; Yang, W.C. Herb-associated carcinogenicity and chronic renal failure in Asian patients with kidney cancer and hypertension. Kidney Int. 2005, 68, 412. [Google Scholar] [CrossRef]
  7. Abdualmjid, R.J.; Sergi, C. Hepatotoxic botanicals - an evidence-based systematic review. J. Pharm. Pharm. Sci. 2013, 16, 376–404. [Google Scholar] [CrossRef]
  8. Stickel, F.; Shouval, D. Hepatotoxicity of herbal and dietary supplements: An update. Arch. Toxicol. 2015, 89, 851–865. [Google Scholar] [CrossRef]
  9. Teschke, R.; Larrey, D.; Melchart, D.; Danan, G. Traditional Chinese Medicine (TCM) and Herbal Hepatotoxicity: RUCAM and the Role of Novel Diagnostic Biomarkers Such as MicroRNAs. Medicines 2016, 3, 18. [Google Scholar] [CrossRef]
  10. Zhu, J.; Seo, J.E.; Wang, S.; Ashby, K.; Ballard, R.; Yu, D.; Ning, B.; Agarwal, R.; Borlak, J.; Tong, W.; et al. The Development of a Database for Herbal and Dietary Supplement Induced Liver Toxicity. Int. J. Mol. Sci. 2018, 19, 2955. [Google Scholar] [CrossRef]
  11. Byeon, J.H.; Kil, J.H.; Ahn, Y.C.; Son, C.G. Systematic review of published data on herb induced liver injury. J. Ethnopharmacol. 2019, 233, 190–196. [Google Scholar] [CrossRef] [PubMed]
  12. Shen, T.; Liu, Y.; Shang, J.; Xie, Q.; Li, J.; Yan, M.; Xu, J.; Niu, J.; Liu, J.; Watkins, P.B.; et al. Incidence and Etiology of Drug-Induced Liver Injury in Mainland China. Gastroenterology 2019, 156, 2230–2241. [Google Scholar] [CrossRef] [PubMed]
  13. Hebels, D.G.; Jetten, M.J.; Aerts, H.J.; Herwig, R.; Theunissen, D.H.; Gaj, S.; van Delft, J.H.; Kleinjans, J.C. Evaluation of database-derived pathway development for enabling biomarker discovery for hepatotoxicity. Biomark. Med. 2014, 8, 185–200. [Google Scholar] [CrossRef] [PubMed]
  14. Luo, J.; Ren, Y.; Gu, H.; Wu, Y.; Wang, Y. dTGS: Method for Effective Components Identification from Traditional Chinese Medicine Formula and Mechanism Analysis. Evid. Based Complement. Altern. Med. 2013, 2013, 840427. [Google Scholar] [CrossRef] [PubMed]
  15. Li, S. Exploring traditional chinese medicine by a novel therapeutic concept of network target. Chin. J. Integr. Med. 2016, 22, 647–652. [Google Scholar] [CrossRef] [PubMed]
  16. Huang, C.; Zheng, C.; Li, Y.; Wang, Y.; Lu, A.; Yang, L. Systems pharmacology in drug discovery and therapeutic insight for herbal medicines. Brief Bioinform. 2014, 15, 710–733. [Google Scholar] [CrossRef]
  17. Tang, H.; He, S.; Zhang, X.; Luo, S.; Zhang, B.; Duan, X.; Zhang, Z.; Wang, W.; Wang, Y.; Sun, Y. A Network Pharmacology Approach to Uncover the Pharmacological Mechanism of XuanHuSuo Powder on Osteoarthritis. Evid. Based Complement. Altern. Med. 2016, 2016, 3246946. [Google Scholar] [CrossRef]
  18. Zhang, Y.; Bai, M.; Zhang, B.; Liu, C.; Guo, Q.; Sun, Y.; Wang, D.; Wang, C.; Jiang, Y.; Lin, N.; et al. Uncovering pharmacological mechanisms of Wu-tou decoction acting on rheumatoid arthritis through systems approaches: Drug-target prediction, network analysis and experimental validation. Sci. Rep. 2015, 5, 9463. [Google Scholar] [CrossRef]
  19. Liang, X.; Li, H.; Li, S. A novel network pharmacology approach to analyse traditional herbal formulae: The Liu-Wei-Di-Huang pill as a case study. Mol. Biosyst. 2014, 10, 1014–1022. [Google Scholar] [CrossRef]
  20. Wang, Y.Y.; Li, J.; Wu, Z.R.; Zhang, B.; Yang, H.B.; Wang, Q.; Cai, Y.C.; Liu, G.X.; Li, W.H.; Tang, Y. Insights into the molecular mechanisms of Polygonum multiflorum Thunb-induced liver injury: A computational systems toxicology approach. Acta Pharmacol. Sin. 2017, 38, 719–732. [Google Scholar] [CrossRef]
  21. Yang, X.; Zhang, Y.; Liu, Y.; Chen, C.; Xu, W.; Xiao, H. Emodin induces liver injury by inhibiting the key enzymes of FADH/NADPH transport in rat liver. Toxicol. Res. 2018, 7, 888–896. [Google Scholar] [CrossRef] [PubMed]
  22. Wu, L.; Chen, Y.; Liu, H.; Zhan, Z.; Liang, Z.; Zhang, T.; Cai, Z.; Ye, L.; Liu, M.; Zhao, J.; et al. Emodin-induced hepatotoxicity was exacerbated by probenecid through inhibiting UGTs and MRP2. Toxicol. Appl. Pharmacol. 2018, 359, 91–101. [Google Scholar] [CrossRef] [PubMed]
  23. Jiang, L.L.; Jiang, Y.; Zhao, D.S.; Fan, Y.X.; Yu, Q.; Li, P.; Li, H.J. CYP3A Activation and Glutathione Depletion Aggravate Emodin-Induced Liver Injury. Chem. Res. Toxicol. 2018, 31, 1052–1060. [Google Scholar] [CrossRef] [PubMed]
  24. Dong, X.; Ni, B.; Fu, J.; Yin, X.; You, L.; Leng, X.; Liang, X.; Ni, J. Emodin induces apoptosis in human hepatocellular carcinoma HepaRG cells via the mitochondrial caspasedependent pathway. Oncol. Rep. 2018, 40, 1985–1993. [Google Scholar]
  25. Dong, X.; Fu, J.; Yin, X.; Cao, S.; Li, X.; Lin, L.; Ni, J. Emodin: A Review of its Pharmacology, Toxicity and Pharmacokinetics. Phytother. Res. 2016, 30, 1207–1218. [Google Scholar] [CrossRef] [PubMed]
  26. Shi, F.; Zhao, P.; Li, X.; Pan, H.; Ma, S.; Ding, L. Cytotoxicity of luteolin in primary rat hepatocytes: The role of CYP3A-mediated ortho-benzoquinone metabolite formation and glutathione depletion. J. Appl. Toxicol. 2015, 35, 1372–1380. [Google Scholar] [CrossRef]
  27. Choi, Y.J.; Yoon, Y.; Choi, H.S.; Park, S.; Oh, S.; Jeong, S.M.; Suh, H.R.; Lee, B.H. Effects of Medicinal herb Extracts and their Components on Steatogenic Hepatotoxicity in Sk-hep1 Cells. Toxicol. Res. 2011, 27, 211–216. [Google Scholar] [CrossRef]
  28. Miltonprabu, S.; Tomczyk, M.; Skalicka-Wozniak, K.; Rastrelli, L.; Daglia, M.; Nabavi, S.F.; Alavian, S.M.; Nabavi, S.M. Hepatoprotective effect of quercetin: From chemistry to medicine. Food Chem. Toxicol. 2017, 108, 365–374. [Google Scholar] [CrossRef]
  29. Zhou, R.J.; Ye, H.; Wang, F.; Wang, J.L.; Xie, M.L. Apigenin inhibits d-galactosamine/LPS-induced liver injury through upregulation of hepatic Nrf-2 and PPARgamma expressions in mice. Biochem. Biophys. Res. Commun. 2017, 493, 625–630. [Google Scholar] [CrossRef]
  30. Faghihzadeh, F.; Hekmatdoost, A.; Adibi, P. Resveratrol and liver: A systematic review. J. Res. Med. Sci. 2015, 20, 797–810. [Google Scholar]
  31. Feng, R.B.; Wang, Y.; He, C.; Yang, Y.; Wan, J.B. Gallic acid, a natural polyphenol, protects against tert-butyl hydroperoxide- induced hepatotoxicity by activating ERK-Nrf2-Keap1-mediated antioxidative response. Food Chem. Toxicol. 2018, 119, 479–488. [Google Scholar] [CrossRef] [PubMed]
  32. Tsai, M.S.; Wang, Y.H.; Lai, Y.Y.; Tsou, H.K.; Liou, G.G.; Ko, J.L.; Wang, S.H. Kaempferol protects against propacetamol-induced acute liver injury through CYP2E1 inactivation, UGT1A1 activation, and attenuation of oxidative stress, inflammation and apoptosis in mice. Toxicol. Lett. 2018, 290, 97–109. [Google Scholar] [CrossRef] [PubMed]
  33. Hong, M.; Li, S.; Tan, H.Y.; Cheung, F.; Wang, N.; Huang, J.; Feng, Y. A Network-Based Pharmacology Study of the Herb-Induced Liver Injury Potential of Traditional Hepatoprotective Chinese Herbal Medicines. Molecules 2017, 22, 632. [Google Scholar] [CrossRef] [PubMed]
  34. Geyikoglu, F.; Yilmaz, E.G.; Erol, H.S.; Koc, K.; Cerig, S.; Ozek, N.S.; Aysin, F. Hepatoprotective Role of Thymol in Drug-Induced Gastric Ulcer Model. Ann. Hepatol. 2018, 17, 980–991. [Google Scholar] [CrossRef] [PubMed]
  35. Wang, T.; Wu, M.B.; Lin, J.P.; Yang, L.R. Quantitative structure-activity relationship: Promising advances in drug discovery platforms. Expert. Opin. Drug Discov. 2015, 10, 1283–1300. [Google Scholar] [CrossRef] [PubMed]
  36. Yang, H.; Lou, C.; Sun, L.; Li, J.; Cai, Y.; Wang, Z.; Li, W.; Liu, G.; Tang, Y. admetSAR 2.0: Web-service for prediction and optimization of chemical ADMET properties. Bioinformatics 2019, 35, 1067–1069. [Google Scholar] [CrossRef] [PubMed]
  37. Ai, H.; Chen, W.; Zhang, L.; Huang, L.; Yin, Z.; Hu, H.; Zhao, Q.; Zhao, J.; Liu, H. Predicting Drug-Induced Liver Injury Using Ensemble Learning Methods and Molecular Fingerprints. Toxicol. Sci. 2018, 165, 100–107. [Google Scholar] [CrossRef] [PubMed]
  38. Kotsampasakou, E.; Montanari, F.; Ecker, G.F. Predicting drug-induced liver injury: The importance of data curation. Toxicology 2017, 389, 139–145. [Google Scholar] [CrossRef]
  39. Huang, S.H.; Tung, C.W.; Fulop, F.; Li, J.H. Developing a QSAR model for hepatotoxicity screening of the active compounds in traditional Chinese medicines. Food Chem. Toxicol. 2015, 78, 71–77. [Google Scholar] [CrossRef]
  40. Shi, S.Z.; Wang, Q. Validation of two predictive softwares for the toxicity prediction of chemical ingredients in traditional Chinese medicine. Chin. J. New Drugs 2016, 25, 2647–2652. [Google Scholar]
  41. Liu, Y. Incorporation of absorption and metabolism into liver toxicity prediction for phytochemicals: A tiered in silico QSAR approach. Food Chem. Toxicol. 2018, 118, 409–415. [Google Scholar] [CrossRef] [PubMed]
  42. Wu, Q.; Cai, C.; Guo, P.; Chen, M.; Wu, X.; Zhou, J.; Luo, Y.; Zou, Y.; Liu, A.L.; Wang, Q.; et al. In silico Identification and Mechanism Exploration of Hepatotoxic Ingredients in Traditional Chinese Medicine. Front. Pharmacol. 2019, 10, 458. [Google Scholar] [CrossRef] [PubMed]
  43. Wetzel, S.; Schuffenhauer, A.; Roggo, S.; Ertl, P.; Waldmann, H. Cheminformatic Analysis of Natural Products and their Chemical Space. CHIMIA Int. J. Chem. 2007, 61, 355–360. [Google Scholar] [CrossRef]
  44. Stratton, C.F.; Newman, D.J.; Tan, D.S. Cheminformatic comparison of approved drugs from natural product versus synthetic origins. Bioorg. Med. Chem. Lett. 2015, 25, 4802–4807. [Google Scholar] [CrossRef] [PubMed]
  45. Zhao, P.; Liu, B.; Wang, C. Hepatotoxicity evaluation of traditional Chinese medicines using a computational molecular model. Clin. Toxicol. 2017, 55, 996–1000. [Google Scholar] [CrossRef] [PubMed]
  46. Li, Y.E.; Wang, X.Z.; Zhu, Y.L.; Jin, R.M.; Zu-Guang, Y.E.; Yao, G.T.; Liu, J.G.; Qian, X.P. Predicting Hepatotoxicity of Compounds from Traditional Chinese Medicines Using Tree Models. Chin. Pharm. J. 2014, 49, 1583–1588. [Google Scholar]
  47. He, S.; Ye, T.; Wang, R.; Zhang, C.; Zhang, X.; Sun, G.; Sun, X. An In Silico Model for Predicting Drug-Induced Hepatotoxicity. Int. J. Mol. Sci. 2019, 20, 1897. [Google Scholar] [CrossRef]
  48. He, S.; Zhang, C.; Zhou, P.; Zhang, X.; Ye, T.; Wang, R.; Sun, G.; Sun, X. Herb-Induced Liver Injury: Phylogenetic Relationship, Structure-Toxicity Relationship, and Herb-Ingredient Network Analysis. Int. J. Mol. Sci. 2019, 20, 3633. [Google Scholar] [CrossRef]
  49. Zhang, C.; Cheng, F.; Li, W.; Liu, G.; Lee, P.W.; Tang, Y. In silico Prediction of Drug Induced Liver Toxicity Using Substructure Pattern Recognition Method. Mol. Inform. 2016, 35, 136–144. [Google Scholar] [CrossRef]
  50. Davis, A.P.; Grondin, C.J.; Johnson, R.J.; Sciaky, D.; McMorran, R.; Wiegers, J.; Wiegers, T.C.; Mattingly, C.J. The Comparative Toxicogenomics Database: Update 2019. Nucleic Acids Res. 2019, 47, D948–D954. [Google Scholar] [CrossRef]
  51. Yap, C.W. PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 2011, 32, 1466–1474. [Google Scholar] [CrossRef] [PubMed]
  52. Kursa, M.B.; Rudnicki, W.R. Feature Selection with the Boruta Package. J. Stat. Softw. 2010, 36, 1–13. [Google Scholar] [CrossRef]
  53. Frank, E.; Hall, M.; Trigg, L.; Holmes, G.; Witten, I.H. Data mining in bioinformatics using Weka. Bioinformatics 2004, 20, 2479–2481. [Google Scholar] [CrossRef] [PubMed]
  54. He, S.; Li, M.; Ye, X.; Wang, H.; Yu, W.; He, W.; Wang, Y.; Qiao, Y. Site of metabolism prediction for oxidation reactions mediated by oxidoreductases based on chemical bond. Bioinformatics 2017, 33, 363–372. [Google Scholar] [CrossRef] [PubMed]
  55. Cai, C.; Fang, J.; Guo, P.; Wang, Q.; Hong, H.; Moslehi, J.; Cheng, F. In Silico Pharmacoepidemiologic Evaluation of Drug-Induced Cardiovascular Complications Using Combined Classifiers. J. Chem. Inf. Model. 2018, 58, 943–956. [Google Scholar] [CrossRef] [PubMed]
  56. Rucker, C.; Rucker, G.; Meringer, M. y-Randomization and its variants in QSPR/QSAR. J. Chem. Inf. Model. 2007, 47, 2345–2357. [Google Scholar] [CrossRef]
  57. Linden, A. Measuring diagnostic and predictive accuracy in disease management: An introduction to receiver operating characteristic (ROC) analysis. J. Eval. Clin. Pract. 2006, 12, 132–139. [Google Scholar] [CrossRef]
  58. Lin, L.; Ni, B.; Lin, H.; Zhang, M.; Li, X.; Yin, X.; Qu, C.; Ni, J. Traditional usages, botany, phytochemistry, pharmacology and toxicology of Polygonum multiflorum Thunb.: A review. J. Ethnopharmacol. 2015, 159, 158–183. [Google Scholar] [CrossRef]
Figure 1. Workflow illustrating the combined classifier framework for predicting drug induced liver injury.
Figure 1. Workflow illustrating the combined classifier framework for predicting drug induced liver injury.
Biomolecules 09 00577 g001
Figure 2. Receiver operating characteristic curves of the eight best single classifiers and the combined classifier.
Figure 2. Receiver operating characteristic curves of the eight best single classifiers and the combined classifier.
Biomolecules 09 00577 g002
Figure 3. Comparisons between the combined classifier and prior studies within cross validation.
Figure 3. Comparisons between the combined classifier and prior studies within cross validation.
Biomolecules 09 00577 g003
Figure 4. Comparisons between the combined classifier and prior studies on external validation sets. (A) The combined classifier versus Ai’s model; (B) the combined classifier versus Zhang’s model; (C) the combined classifier versus Kotsampasakou’s model.
Figure 4. Comparisons between the combined classifier and prior studies on external validation sets. (A) The combined classifier versus Ai’s model; (B) the combined classifier versus Zhang’s model; (C) the combined classifier versus Kotsampasakou’s model.
Biomolecules 09 00577 g004
Figure 5. Diagram of the computational toxicology approach to identification the hepatotoxic ingredients in Traditional Chinese Medicines (TCMs).
Figure 5. Diagram of the computational toxicology approach to identification the hepatotoxic ingredients in Traditional Chinese Medicines (TCMs).
Biomolecules 09 00577 g005
Figure 6. Hierarchical cluster analysis of the 98 ingredients in Polygonum multiflorum Thunb (PmT). The compounds predicted as hepatotoxicity by the combined classifier were highlighted with red solid circles. Molecule ID corresponds to ID in Supplementary File 1 (PmT).
Figure 6. Hierarchical cluster analysis of the 98 ingredients in Polygonum multiflorum Thunb (PmT). The compounds predicted as hepatotoxicity by the combined classifier were highlighted with red solid circles. Molecule ID corresponds to ID in Supplementary File 1 (PmT).
Biomolecules 09 00577 g006
Figure 7. Venn diagram to show a comparison between the computational toxicology approach and the prior study. Prsent “+” and Prsent “−” indicate that the compound was predicted as hepatotoxic or non-hepatotoxic by our computational toxicology approach, respectively. Wang “+” and Wang “−” represent that the compound was identified as hepatotoxic or non-hepatotoxic by Wang et al, respectively. Ingredients included in each module are available in Supplementary File 1 (PmT).
Figure 7. Venn diagram to show a comparison between the computational toxicology approach and the prior study. Prsent “+” and Prsent “−” indicate that the compound was predicted as hepatotoxic or non-hepatotoxic by our computational toxicology approach, respectively. Wang “+” and Wang “−” represent that the compound was identified as hepatotoxic or non-hepatotoxic by Wang et al, respectively. Ingredients included in each module are available in Supplementary File 1 (PmT).
Biomolecules 09 00577 g007
Table 1. Feature selection results of 13 types of molecular fingerprints/descriptors.
Table 1. Feature selection results of 13 types of molecular fingerprints/descriptors.
IDTypeNumber of Features
PaDEL-DescriptorBorutaFind Correlation
1FP1024117117
2ExtFP1024111111
3EStateFP791312
4GraphFP10248373
5MACCSFP1665952
6PubchemFP8817758
7SubFP3072522
8SubFPC3071815
9KRFP48606149
10KRFPC48603826
11AP2D7803834
12APC2D7803317
132D Descriptor144413891
Table 2. Parameter optimization.
Table 2. Parameter optimization.
AlgorithmParameterParameter Optimization Method
RandomForestDepth: The maximum depth of the treeCVParameterSelection
BaggingK: The number of neighbors to useCVParameterSelection
AdaBoostM1C: The confidence factor used for pruningCVParameterSelection
IBkK: The number of neighbors to useCVParameterSelection
LibSVMC: Penalty parameter, γ: The radial basis function kernel parameterGridSearch
KStarB: The parameter for global blendingCVParameterSelection
J48C: The confidence factor used for pruningCVParameterSelection
Table 3. Performance of the eight best single classifiers.
Table 3. Performance of the eight best single classifiers.
ClassifiersType (Feature Number)ParameterACCAUCSESPBACC
RandomForest2D Descriptor (30)Depth = 072.2500.7940.7250.7200.723
Bagging2D Descriptor (70)K = 569.5570.7620.6630.7260.695
AdaBoostM12D Descriptor (69)C = 0.2568.7360.7430.6510.7210.686
IBk2D Descriptor (76)K = 570.1960.7580.6730.7290.701
LibSVMExtFP (80)C = 0.5, γ = 0.12569.7860.6990.7180.6800.699
KStarAP2D (14)B = 166.8190.7040.6910.6470.669
J48SubFP (19)C = 0.2568.1880.7120.6940.6710.683
NaiveBayesExtFP (42)Default set64.9930.7040.6770.6250.651
Table 4. Performance of the eight best single classifiers and the combined classifier on the integrated external validation set.
Table 4. Performance of the eight best single classifiers and the combined classifier on the integrated external validation set.
ClassifierACCAUCSESPBACC
RandomForest76.9610.8100.7890.7370.763
Bagging69.1180.7640.7110.6580.685
AdaBoostM167.6470.7130.6720.6840.678
IBk68.1370.7520.7110.7110.711
LibSVM78.4310.7560.8670.6450.756
KStar65.6860.6730.7030.5790.641
J4864.2160.6320.7110.5260.619
NaiveBayes61.2750.6030.7190.4340.577
Combined classifier78.9220.8260.8130.7500.782
The maximum value of each index was highlighted with bold.
Table 5. Hepatotoxic ingredients in PmT identified by the computational toxicology approach.
Table 5. Hepatotoxic ingredients in PmT identified by the computational toxicology approach.
IDIngredientLiver ToxicitySource
Subgroup 1Subgroup 2Subgroup 3
1Emodin[21]+++
2Chrysophanol[43]+++
3Chrysarobin[41]nonenone+
4Rhein[44]+++
5Danthron[45]+none+
6Polygonumnolide C2[41]nonenone+
7Emodin dianthrone[41]nonenone+
8Aloe emodin[46]++none
9Luteolin[26]-+none
10Physcion[43]++none
11Apigenin[27]+nonenone
12Emodin-8-methyl etherNo report+nonenone
13CitreoroseinNo report+nonenone
14Emodin-3-methyl etherNo report+nonenone
15FallacinolNo report+nonenone
162-AcetylemodinNo report+nonenone
17Hexadecanoic acid methyl esterNo report+nonenone
18Octadecanoic acid methyl esterNo report+nonenone
19Docosanoic acid methyl esterNo report+nonenone
204-HydroxybenzaldehydeNo report+nonenone
212,5-dimethyl-7-hydroxychromoneNo report+nonenone
22HydroxymaltolNo report+nonenone
23Butanedioic acidNo report+nonenone
24Emodin-6,8-dimethyletherNo report+nonenone
25Hexanoic acidNo report+nonenone
“+” and “−” indicate hepatotoxic and non-hepatotoxic, respectively. The unique hepatotoxic ingredients from each subgroup are highlighted with red; “none” in column subgroup 1 means that the ingredient was not included in the 98 compounds of Pmt; “none” in subgroup 2 or subgroup 3 represents that the ingredient was not found in the HILI dataset or herb-hepatotoxic ingredient network, respectively.
Back to TopTop