An Algorithm Framework for Drug-Induced Liver Injury Prediction Based on Genetic Algorithm and Ensemble Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Datasets
- (1) We used the RDKit toolkit [44] under Python 3.8 to convert the SMILES strings reported in the literature to canonical SMILES and to match each compound with its corresponding PubChem Compound ID (CID). We then combined the DILI-labeled data from the different authors and removed duplicates, metals, and compounds containing rare elements.
- (2) We binarized the labels of the different datasets according to the rules shown in Table 1. We adopted cautious binarization rules and kept only compounds in high-reliability DILI classes. First, the data came from trusted sources, such as scientific literature, medical monographs, clinical data, and data approved by the FDA. Second, we set the label to “1” for compounds with definite DILI in the original source and to “0” for compounds without DILI in the original source. This is reflected in the processing of the Greene, DILIrank, LiverTox, and LTKB data, where “HH” and “Most concern” represent “Evidence of human hepatotoxicity” and DILI-positive, respectively, while “NE” and “no concern” indicate “no evidence of hepatotoxicity in any species” and DILI-negative, respectively [45,46,47,48,49,50]. “Category A” and “Category B” in LiverTox are the classes of compounds that have been “frequently reported” and “reported” to cause DILI; “Category E” means “no evidence that the drug has caused liver injury” [47,51]. He et al. and Mulliner et al. applied the same strict binarization rules that we adopted, so we also considered their data credible [7,8]. The combined dataset of Xu et al. also came from highly trusted sources, including NCTR, Greene et al., and Xu et al., and compounds with inconsistent DILI labels had been removed from it, so we considered it equally reliable [6]. Finally, to expand the dataset, we took a small portion of compounds from Greene’s “WE” class, which represents “Weak evidence of (<10 case reports) human hepatotoxicity”; these compounds were also treated as DILI compounds in the literature [47,50].
- (3)
- (4) We voted on the remaining data to determine the label of each compound. The voting rule was as follows: if the label of a compound is consistent in all datasets, or in at least 80% of them, we take that label as the toxicity label of the compound; otherwise, we delete the compound.
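The voting rule in step (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `consensus_label`, the compound identifiers, and the reading of "80% of the datasets" as "at least 80% of the datasets that contain the compound" are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def consensus_label(labels: Sequence[int], threshold: float = 0.8) -> Optional[int]:
    """Return the majority label if its share reaches `threshold`, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= threshold else None


# Hypothetical compound IDs mapped to the binary labels they received
# in the datasets that contain them.
votes: Dict[str, List[int]] = {
    "cid_A": [1, 1, 1],        # unanimous: keep with label 1
    "cid_B": [0, 0, 0, 0, 1],  # 80% agreement: keep with label 0
    "cid_C": [1, 0],           # 50% agreement: delete
}

# Keep only compounds whose label passes the vote.
merged = {
    cid: label
    for cid, labels in votes.items()
    if (label := consensus_label(labels)) is not None
}
```

Returning `None` for a failed vote keeps the deletion decision explicit, so the caller can log which compounds were dropped.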
ID | Source | Type of Data | No. of Compounds (Positive/Negative) | DILI Categories |
---|---|---|---|---|
1 | Greene et al., 2010 [45] | Literature reviews and medical monographs | 487 (331/156) | HH and WE represented positives; NE represented negatives |
2 | Xu et al., 2015 [6] | Medical monographs and FDA-approved drug labeling | 475 (236/239) | Authors’ definition |
3 | Mulliner et al., 2016 [7] | Clinical data and drug labeling | 1370 (932/438) | Authors’ definition |
4 | He et al., 2019 [8] | Drug labeling and comprehensive data | 1458 (761/697) | Authors’ definition |
5 | DILIrank [46] | Drug labeling and clinical data | 504 (192/312) | Most concern as 1; no concern as 0 |
6 | LiverTox [47] | Scientific literature and public database | 343 (119/224) | Categories A and B were combined into positives; Category E was considered negative |
7 | LTKB [48] | FDA-approved drug labeling | 195 (113/82) | Most concern as 1; no concern as 0 |
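The category-to-label rules of Table 1 can be written as a lookup table. The sketch below is only an illustration: the dictionary keys, the function name `binarize`, and the choice to return `None` for any category that Table 1 does not list are assumptions, not part of the original pipeline.

```python
from typing import Dict, Optional

# Category -> binary label, transcribed from Table 1 (keys lower-cased).
# Sources labeled "Authors' definition" are already binary and are omitted.
BINARIZATION_RULES: Dict[str, Dict[str, int]] = {
    "greene": {"hh": 1, "we": 1, "ne": 0},
    "dilirank": {"most concern": 1, "no concern": 0},
    "livertox": {"category a": 1, "category b": 1, "category e": 0},
    "ltkb": {"most concern": 1, "no concern": 0},
}


def binarize(source: str, category: str) -> Optional[int]:
    """Map a source-specific DILI category to 1, 0, or None (not used)."""
    rules = BINARIZATION_RULES.get(source.strip().lower(), {})
    return rules.get(category.strip().lower())


print(binarize("Greene", "HH"))            # 1
print(binarize("LiverTox", "Category E"))  # 0
print(binarize("DILIrank", "unlisted"))    # None
```

Categories that map to `None` are simply not carried into the merged dataset, which mirrors the cautious rule of keeping only high-reliability classes.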
2.2. Molecular Representations
2.3. GA Algorithm
- Individual: A solution to a problem, and a unit of evolution.
- Bit: A code that constitutes the solution to the problem.
- Fitness: The degree of individual adaptation to the environment.
- Encoding: The mapping from the solution of the problem to the individual.
- Decoding: The conversion of the individual to the problem solution.
- Initialization: Set the maximum number of generations T, the population size, the crossover probability, and the mutation probability, and randomly generate individuals as the initial population.
- Fitness: The fitness function measures the quality of an individual, that is, of the solution it encodes.
- Genetic Operator: There are three types: the selection operator, the crossover operator, and the mutation operator. Each population is manipulated by the genetic operators to obtain the next generation.
- Termination: When the number of generations reaches the maximum T, the evolution is terminated.
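The four stages listed above (initialization, fitness, genetic operators, termination) fit into one short loop. The following is a generic sketch on bit strings, not the R-E-GA implementation: binary tournament selection, per-bit flip mutation, the default parameter values, and the toy fitness (`sum`, i.e., counting ones) are assumptions chosen for brevity.

```python
import random
from typing import Callable, List

Individual = List[int]


def run_ga(fitness: Callable[[Individual], float], n_bits: int,
           pop_size: int = 20, max_gen: int = 30,
           p_cross: float = 0.9, p_mut: float = 0.05,
           seed: int = 0) -> Individual:
    rng = random.Random(seed)
    # Initialization: a random population of bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_gen):  # Termination: stop after max_gen generations.
        nxt: List[Individual] = []
        while len(nxt) < pop_size:
            # Selection: two binary tournaments pick the parents.
            a = max(rng.sample(pop, 2), key=fitness)
            b = max(rng.sample(pop, 2), key=fitness)
            child = a[:]
            # Crossover: single cut point, applied with probability p_cross.
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_bits)
                child = a[:cut] + b[cut:]
            # Mutation: flip each bit independently with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
        # Remember the best individual seen in any generation.
        best = max(pop + [best], key=fitness)
    return best


best = run_ga(fitness=sum, n_bits=16)
```

Because the best individual is carried across generations, running longer with the same seed can never return a worse solution than stopping earlier.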
2.4. Framework of Rotation-Ensemble-GA Algorithm
2.5. Details about R-E-GA
2.5.1. Initialization
Algorithm 1 Initialization |
|
2.5.2. Crossover and Mutation
Algorithm 2 Crossover and Mutation |
Input: P_i: the ith population. P_M: the probability of mutation. The number of individuals in a population. The number of feature subsets in an individual. S_f: the feature size.
Output: C: the new population generated by the crossover and mutation operations.
1: C ← []
2: for each crossover operation do
3: select two parent individuals from P_i at random
4: select a crossover point at random
5: divide each parent at the crossover point into a left part and a right part
6–7: recombine the parts of the two parents into two children, C_i and C_j
8: add C_i and C_j to C
9: for each individual I in C do
10: select a set of mutation bits MBS with probability P_M
11: for b in MBS do
12: I[b] ← random number in S_f
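A minimal Python sketch of the crossover and mutation operators in Algorithm 2 follows. Several details are assumptions because they are not recoverable from the listing: individuals are represented as flat lists of feature indices, the children are formed by swapping the right-hand parts of the parents (ordinary single-point crossover), and the loop produces as many children as there are parents. The function names are hypothetical.

```python
import random
from typing import List, Tuple

Individual = List[int]  # each position holds a feature index in [0, feature_size)


def crossover(p1: Individual, p2: Individual,
              rng: random.Random) -> Tuple[Individual, Individual]:
    """Single-point crossover: cut both parents and swap the right parts."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(ind: Individual, p_mut: float, feature_size: int,
           rng: random.Random) -> Individual:
    """Replace each position, with probability p_mut, by a random feature index."""
    return [rng.randrange(feature_size) if rng.random() < p_mut else g
            for g in ind]


def next_generation(population: List[Individual], p_mut: float,
                    feature_size: int, rng: random.Random) -> List[Individual]:
    children: List[Individual] = []
    while len(children) < len(population):
        p1, p2 = rng.sample(population, 2)  # two distinct parents at random
        children.extend(crossover(p1, p2, rng))
    # Trim in case of an odd population size, then mutate every child.
    return [mutate(c, p_mut, feature_size, rng)
            for c in children[:len(population)]]


rng = random.Random(1)
population = [[rng.randrange(100) for _ in range(6)] for _ in range(4)]
offspring = next_generation(population, p_mut=0.1, feature_size=100, rng=rng)
```

Mutation draws a fresh index from the whole feature range, so a mutated position may by chance keep its old value; this matches the "random number in S_f" wording of step 12.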
2.5.3. Fitness Function
Algorithm 3 Fitness Function |
Input: The number of samples in the training set. The number of feature subsets in an individual. The number of weak classifiers to train. An individual. The training set.
Output: The fitness value of the individual.
1: for each feature subset in the individual do
2: Divide the features into binary features and continuous features.
3: Rotation: transform the binary part and the continuous part separately, and then merge the two parts.
4: Initialize the weights.
5: for each weak classifier to train do
6: Take a sample from the training set using the current weight distribution.
7: Train a classifier using the sample as the training set.
8: Calculate the weighted ensemble error at the current step (the loss term is 1 if the classifier misclassifies a sample and 0 otherwise).
9: If the error condition is met, ignore the classifier, reinitialize the weights, and continue.
10: Else, calculate the update factor for this step from the error.
11: Update the weights.
12: Calculate the support for each class in the validation set.
13: The class with the maximum support is chosen as the label.
14: The fitness value is calculated from the n validation data.
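Steps 4 to 13 of Algorithm 3 have the shape of boosting with resampling. The sketch below uses the standard AdaBoost.M1 formulas as a stand-in, which is an assumption: the rejection rule (error equal to 0 or at least 0.5), the factor β = ε/(1 − ε), the vote ln(1/β), and uniform reinitialization to 1/N are the textbook choices, not formulas confirmed by the listing. The rotation step is omitted, the weak learner is a hypothetical one-feature decision stump, and the fitness is taken to be validation accuracy.

```python
import math
import random
from typing import Callable, List, Tuple

Sample = Tuple[float, int]  # (feature value, label in {0, 1})
Classifier = Callable[[float], int]


def train_stump(data: List[Sample]) -> Classifier:
    """Weak learner: the best single-threshold rule on a 1-D feature."""
    best_err, best_thr, best_pol = len(data) + 1, 0.0, 1
    for thr in sorted({x for x, _ in data}):
        for pol in (0, 1):  # pol is the label predicted when x >= thr
            err = sum((pol if x >= thr else 1 - pol) != y for x, y in data)
            if err < best_err:
                best_err, best_thr, best_pol = err, thr, pol
    return lambda x, t=best_thr, p=best_pol: p if x >= t else 1 - p


def adaboost_m1(train: List[Sample], n_rounds: int,
                seed: int = 0) -> List[Tuple[Classifier, float]]:
    rng = random.Random(seed)
    n = len(train)
    w = [1.0 / n] * n  # initialize the weights uniformly
    ensemble: List[Tuple[Classifier, float]] = []
    for _ in range(n_rounds):
        sample = rng.choices(train, weights=w, k=n)  # resample by weight
        clf = train_stump(sample)
        miss = [int(clf(x) != y) for x, y in train]
        eps = sum(wj * m for wj, m in zip(w, miss))  # weighted error
        if eps == 0 or eps >= 0.5:  # assumed rejection rule
            w = [1.0 / n] * n
            continue
        beta = eps / (1 - eps)
        # Shrink the weights of correctly classified samples, then normalize.
        w = [wj * beta ** (1 - m) for wj, m in zip(w, miss)]
        total = sum(w)
        w = [wj / total for wj in w]
        ensemble.append((clf, math.log(1 / beta)))
    return ensemble


def predict(ensemble: List[Tuple[Classifier, float]], x: float) -> int:
    """Return the class with the largest summed vote (class 0 on ties)."""
    support = [0.0, 0.0]
    for clf, vote in ensemble:
        support[clf(x)] += vote
    return max((0, 1), key=lambda c: support[c])


def fitness(ensemble: List[Tuple[Classifier, float]],
            valid: List[Sample]) -> float:
    """Assumed fitness: accuracy on the validation set."""
    return sum(predict(ensemble, x) == y for x, y in valid) / len(valid)


train = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0),
         (4.0, 1), (5.0, 1), (6.0, 0), (7.0, 1)]
ensemble = adaboost_m1(train, n_rounds=10)
score = fitness(ensemble, train)
```

Under the assumed rejection rule a stump that fits the training set perfectly is discarded, so the toy data are deliberately not separable by a single threshold.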
2.6. Experiment Settings
- Base Classifier:
- Iteration Number: 50
- Population Size: 50
- Probability of Mutation: 0.1
- Ensemble Size: 10
- Feature Subset Size: 300
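For reproducibility, the settings above can be collected in one immutable configuration object. The class and field names below are hypothetical; the base classifier is left out because its value is not given in the list. The helper assumes that every individual is evaluated once per iteration, which gives a rough count of fitness evaluations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class REGASettings:
    """Experiment settings from Section 2.6 (field names are illustrative)."""
    iterations: int = 50
    population_size: int = 50
    mutation_probability: float = 0.1
    ensemble_size: int = 10
    feature_subset_size: int = 300

    def fitness_evaluations(self) -> int:
        # Assumes one fitness evaluation per individual per iteration.
        return self.iterations * self.population_size


settings = REGASettings()
print(settings.fitness_evaluations())  # 2500
```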
2.7. Details of the Comparison Algorithm
2.7.1. Voting Ensemble
2.7.2. Graph Embedding Neural Networks
3. Results and Discussion
3.1. Comparison of the Results of Each Algorithm
3.2. External Validation
3.3. Evolutionary Curve
3.4. Ablation Experiment
3.5. The Proportion of Important Features
4. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
References
- Blomme, E.A.; Chadwick, A.; Copple, I.M.; Gerets, H.H.J.; Goldring, C.E.; Guillouzo, A.; Hewitt, P.G.; Ingelman-Sundberg, M.; Jensen, K.G.; Juhila, S.; et al. Managing the challenge of drug-induced liver injury: A roadmap for the development and deployment of preclinical predictive models. Nat. Rev. Drug Discov. 2020, 19, 131–148.
- Su, R.; Wu, H.; Liu, X.; Wei, L. Predicting drug-induced hepatotoxicity based on biological feature maps and diverse classification strategies. Brief. Bioinform. 2021, 22, 428–437.
- Lee, G.; Kim, H.; Park, J.Y.; Kim, G.; Han, J.; Chung, S.; Yang, J.H.; Jeon, J.S.; Woo, D.-H.; Han, C.; et al. Generation of uniform liver spheroids from human pluripotent stem cells for imaging-based drug toxicity analysis. Biomaterials 2021, 269, 120529.
- Cheng, F.; Li, W.; Liu, G.; Tang, Y. In silico ADMET prediction: Recent advances, current challenges and future trends. Curr. Top. Med. Chem. 2013, 13, 1273–1289.
- Fraser, K.; Bruckner, D.M.; Dordick, J.S. Advancing Predictive Hepatotoxicity at the Intersection of Experimental, in Silico, and Artificial Intelligence Technologies. Chem. Res. Toxicol. 2018, 31, 412–430.
- Xu, Y.; Dai, Z.; Chen, F.; Gao, S.; Pei, J.; Lai, L. Deep Learning for Drug-Induced Liver Injury. J. Chem. Inf. Model. 2015, 55, 2085–2093.
- Mulliner, D.; Schmidt, F.; Stolte, M.; Spirkl, H.P.; Czich, A.; Amberg, A. Computational Models for Human and Animal Hepatotoxicity with a Global Application Scope. Chem. Res. Toxicol. 2016, 29, 757–767.
- He, S.; Ye, T.; Wang, R.; Zhang, C.; Zhang, X.; Sun, G.; Sun, X. An In Silico Model for Predicting Drug-Induced Hepatotoxicity. Int. J. Mol. Sci. 2019, 20, 1897.
- Wang, Y.; Xiao, Q.; Chen, P.; Wang, B. In Silico Prediction of Drug-Induced Liver Injury Based on Ensemble Classifier Method. Int. J. Mol. Sci. 2019, 20, 4106.
- Ai, H.X.; Chen, W.; Zhang, L.; Huang, L.C.; Yin, Z.M.; Hu, H.; Zhao, Q.; Zhao, J.; Liu, H.S. Predicting Drug-Induced Liver Injury Using Ensemble Learning Methods and Molecular Fingerprints. Toxicol. Sci. 2018, 165, 100–107.
- Vall, A.; Sabnis, Y.; Shi, J.; Class, R.; Hochreiter, S.; Klambauer, G. The Promise of AI for DILI Prediction. Front. Artif. Intell. 2021, 4, 638410.
- Atz, K.; Grisoni, F.; Schneider, G. Geometric deep learning on molecular representations. Nat. Mach. Intell. 2021, 3, 1023–1032.
- Xiong, Z.; Wang, D.; Liu, X.; Zhong, F.; Wan, X.; Li, X.; Li, Z.; Luo, X.; Chen, K.; Jiang, H.; et al. Pushing the Boundaries of Molecular Representation for Drug Discovery with the Graph Attention Mechanism. J. Med. Chem. 2020, 63, 8749–8760.
- Jiang, D.J.; Wu, Z.X.; Hsieh, C.Y.; Chen, G.Y.; Liao, B.; Wang, Z.; Shen, C.; Cao, D.S.; Wu, J.A.; Hou, T.J. Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models. J. Cheminform. 2021, 13, 1–23.
- Ma, H.H.; An, W.Z.; Wang, Y.H.; Sun, H.M.; Huang, R.L.; Huang, J.Z. Deep Graph Learning with Property Augmentation for Predicting Drug-Induced Liver Injury. Chem. Res. Toxicol. 2021, 34, 495–506.
- Wu, Z.; Jiang, D.; Wang, J.; Hsieh, C.Y.; Cao, D.; Hou, T. Mining Toxicity Information from Large Amounts of Toxicity Data. J. Med. Chem. 2021, 64, 6924–6936.
- Aguirre-Plans, J.; Pinero, J.; Souza, T.; Callegaro, G.; Kunnen, S.J.; Sanz, F.; Fernandez-Fuentes, N.; Furlong, L.I.; Guney, E.; Oliva, B. An ensemble learning approach for modeling the systems biology of drug-induced injury. Biol. Direct 2021, 16, 5.
- Holland, J.H. Adaptation in Natural And Artificial Systems; University of Michigan Press: Ann Arbor, MI, USA, 1975.
- Rechenberg, I. Cybernetic solution path of an experimental problem. In Royal Aircraft Establishment Translation 1122; IEEE Press: Farnborough, UK, 1965.
- Thomas, B. Evolutionary computation: Comments on the history and current state. IEEE Trans. Evol. Comput. 1997, 1, 3–17.
- Davis, L.D. Handbook of genetic algorithms. In Handbook of Genetic Algorithms; Davis, L.D., Ed.; Van Nostrand Reinhold: New York, NY, USA, 1991.
- Zhan, W.P.; Min, J.; Yao, J.F.; Liu, K.H.; Wu, Q.Q. The Design of Evolutionary Feature Selection Operator for the Micro-expression Recognition. Memetic Comput. 2022, 14, 61–76.
- Liu, K.-H.; Li, B.; Zhang, J.; Du, J.-X. Ensemble component selection for improving ICA based microarray data prediction models. Pattern Recognit. 2009, 42, 1274–1283.
- Liu, K.H.; Xu, C.G. A genetic programming-based approach to the classification of multiclass microarray datasets. Bioinformatics 2009, 25, 331–337.
- Li, K.S.; Wang, H.R.; Liu, K.H. A novel Error-Correcting Output Codes algorithm based on genetic programming. Swarm Evol. Comput. 2019, 50, 100564.
- Liang, Y.-F.; Chang, L.; Wang, H.-R.; Liu, K.-H.; Yao, J.F.; She, Y.-Y.; Dai, G.-M.; Okina, Y. A Novel Error-Correcting Output Codes Based on Genetic Programming and Ternary Digit Operators. Pattern Recognit. 2021, 110, 107642.
- Ye, X.-N.; Liu, K.-H.; Liong, S.-T. A Ternary Bitwise Calculator Based Genetic Algorithm for Improving Error Correcting Output Codes. Inf. Sci. 2020, 537, 485–510.
- Zhang, Y.-P.; Ye, X.-N.; Liu, K.-H.; Yao, J.-F. A Novel Multi-Objective Genetic Algorithm Based Error Correcting Output Codes. Swarm Evol. Comput. 2020, 57, 100709.
- Li, X.; Chen, L.; Tang, Y. HARD: Bit-Split String Matching Using a Heuristic Algorithm to Reduce Memory Demand. Rom. J. Inf. Sci. Technol. 2020, 23, T94–T105.
- Precup, R.-E.; David, R.-C.; Petriu, E.M.; Preitl, S.; Paul, A.S. Gravitational Search Algorithm-Based Tuning of Fuzzy Control Systems with a Reduced Parametric Sensitivity. Soft Comput. Ind. Appl. 2011, 96, 141–150.
- Zamfirache, I.A.; Precup, R.-E.; Roman, R.-C.; Petriu, E.M. Policy Iteration Reinforcement Learning-based control using a Grey Wolf Optimizer algorithm. Inf. Sci. 2022, 585, 162–175.
- Martarelli, N.J.; Nagano, M.S. A constructive evolutionary approach for feature selection in unsupervised learning. Swarm Evol. Comput. 2018, 42, 125–137.
- Tong, M.; Liu, K.H.; Xu, C.; Ju, W. An ensemble of SVM classifiers based on gene pairs. Comput. Biol. Med. 2013, 43, 729–737.
- Dutta, D.; Sil, J.; Dutta, P. Automatic Clustering by Multi-Objective Genetic Algorithm with Numeric and Categorical Features. Expert Syst. Appl. 2019, 137, 357–379.
- Soumaya, Z.; Taoufiq, B.D.; Benayad, N.; Yunus, K.; Abdelkrim, A. The detection of Parkinson disease using the genetic algorithm and SVM classifier. Appl. Acoust. 2021, 171, 107528.
- Uyar, K.; İlhan, A. Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks. Procedia Comput. Sci. 2017, 120, 588–593.
- Sharma, M. Cervical cancer prognosis using genetic algorithm and adaptive boosting approach. Health Technol. 2019, 9, 877–886.
- Ghaheri, A.; Shoar, S.; Naderan, M.; Hoseini, S.S. The Applications of Genetic Algorithms in Medicine. Oman Med. J. 2015, 30, 406–416.
- Mansour, A.M. Decision tree-based expert system for adverse drug reaction detection using fuzzy logic and genetic algorithm. Int. J. Adv. Comput. Res. 2018, 8, 110–128.
- Spiegel, J.O.; Durrant, J.D. AutoGrow4: An open-source genetic algorithm for de novo drug design and lead optimization. J. Cheminform. 2020, 12, 1–16.
- Devi, R.V.; Sathya, S.S.; Coumar, M.S. Evolutionary algorithms for de novo drug design – A survey. Appl. Soft Comput. 2015, 27, 543–552.
- Devi, R.V.; Sathya, S.S.; Coumar, M.S. Multi-objective Genetic Algorithm for De Novo Drug Design (MoGADdrug). Curr. Comput.-Aided Drug Des. 2021, 17, 445–457.
- Liu, K.H.; Huang, D.S. Cancer classification using Rotation Forest. Comput. Biol. Med. 2008, 38, 601–610.
- Landrum, G.; et al. RDKit: Open-Source Cheminformatics. Available online: http://www.rdkit.org/ (accessed on 1 April 2022).
- Greene, N.; Fisk, L.; Naven, R.T.; Note, R.R.; Patel, M.L.; Pelletier, D.J. Developing Structure-Activity Relationships for the Prediction of Hepatotoxicity. Chem. Res. Toxicol. 2010, 23, 1215–1222.
- Chen, M.J.; Suzuki, A.; Thakkar, S.; Yu, K.; Hu, C.C.; Tong, W.D. DILIrank: The largest reference drug list ranked by the risk for developing drug-induced liver injury in humans. Drug Discov. Today 2016, 21, 648–653.
- Hoofnagle, J.H.; Serrano, J.; Knoben, J.E.; Navarro, V.J. LiverTox: A website on drug-induced liver injury. Hepatology 2013, 57, 873–874.
- Chen, M.; Zhang, J.; Wang, Y.; Liu, Z.; Kelly, R.; Zhou, G.; Fang, H.; Borlak, J.; Tong, W. The Liver Toxicity Knowledge Base: A Systems Approach to a Complex End Point. Clin. Pharmacol. Ther. 2013, 93, 409–412.
- Bajzelj, B.; Drgan, V. Hepatotoxicity Modeling Using Counter-Propagation Artificial Neural Networks: Handling an Imbalanced Classification Problem. Molecules 2020, 25, 481.
- Zhao, L.L.; Russo, D.P.; Wang, W.Y.; Aleksunes, L.M.; Zhu, H. Mechanism-Driven Read-Across of Chemical Hepatotoxicants Based on Chemical Structures and Biological Data. Toxicol. Sci. 2020, 174, 178–188.
- Liu, R.F.; Yu, X.P.; Wallqvist, A. Data-driven identification of structural alerts for mitigating the risk of drug-induced human liver injuries. J. Cheminform. 2015, 7, 1–8.
- Svetnik, V.; Liaw, A.; Tong, C.; Culberson, J.C.; Sheridan, R.P.; Feuston, B.P. Random forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 2003, 43, 1947–1958.
- Jayaraman, V.K.; Sundararajan, V. Applications of Support Vector Machines In Chemo And Bioinformatics. AIP Conf. Proc. 2010, 1298, 18–23.
- Sheridan, R.P.; Wang, W.M.; Liaw, A.; Ma, J.S.; Gifford, E.M. Extreme Gradient Boosting as a Method for Quantitative Structure-Activity Relationships. J. Chem. Inf. Model. 2016, 56, 2353–2360.
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
- Cronin, M.T.D.; Aptula, A.O.; Dearden, J.C.; Duffy, J.C.; Netzeva, T.I.; Patel, H.; Rowe, P.H.; Schultz, T.W.; Worth, A.P.; Voutzoulidis, K.; et al. Structure-based classification of antibacterial activity. J. Chem. Inf. Comput. Sci. 2002, 42, 869–878.
- Shamsara, J.; Schuurmann, G. A machine learning approach to discriminate MR1 binders: The importance of the phenol and carbonyl fragments. J. Mol. Struct. 2020, 1217, 128459.
- Gao, K.F.; Nguyen, D.D.; Sresht, V.; Mathiowetz, A.M.; Tu, M.H.; Wei, G.W. Are 2D fingerprints still valuable for drug discovery? Phys. Chem. Chem. Phys. 2020, 22, 8373–8390.
- Huang, K.X.; Fu, T.F.; Glass, L.M.; Zitnik, M.; Xiao, C.; Sun, J.M. DeepPurpose: A deep learning library for drug-target interaction prediction. Bioinformatics 2020, 36, 5545–5547.
- Li, M.F.; Zhou, J.J.; Hu, J.J.; Fan, W.X.; Zhang, Y.K.; Gu, Y.X.; Karypis, G. DGL-LifeSci: An Open-Source Toolkit for Deep Learning on Graphs in Life Science. ACS Omega 2021, 6, 27233–27238.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Wu, Z.X.; Zhu, M.F.; Kang, Y.; Leung, E.L.H.; Lei, T.L.; Shen, C.; Jiang, D.J.; Wang, Z.; Cao, D.S.; Hou, T.J. Do we need different machine learning algorithms for QSAR modeling? A comprehensive assessment of 16 machine learning algorithms on 14 QSAR data sets. Brief. Bioinform. 2021, 22, bbaa321.
- Lusci, A.; Pollastri, G.; Baldi, P. Deep Architectures and Deep Learning in Chemoinformatics: The Prediction of Aqueous Solubility for Drug-Like Molecules. J. Chem. Inf. Model. 2013, 53, 1563–1575.
- Hong, H.X.; Xie, Q.; Ge, W.G.; Qian, F.; Fang, H.; Shi, L.M.; Su, Z.Q.; Perkins, R.; Tong, W.D. Mold(2), molecular descriptors from 2D structures for chemoinformatics and toxicoinformatics. J. Chem. Inf. Model. 2008, 48, 1337–1344.
- Yap, C.W. PaDEL-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J. Comput. Chem. 2011, 32, 1466–1474.
- Chen, M.J.; Hong, H.X.; Fang, H.; Kelly, R.; Zhou, G.X.; Borlak, J.; Tong, W.D. Quantitative Structure-Activity Relationship Models for Predicting Drug-Induced Liver Injury Based on FDA-Approved Drug Labeling Annotation and Using a Large Collection of Drugs. Toxicol. Sci. 2013, 136, 242–249.
- Liew, C.Y.; Lim, Y.C.; Yap, C.W. Mixed learning algorithms and features ensemble in hepatotoxicity prediction. J. Comput. Aid. Mol. Des. 2011, 25, 855–871.
- Xu, J.H.J.; Henstock, P.V.; Dunn, M.C.; Smith, A.R.; Chabot, J.R.; de Graaf, D. Cellular imaging predictions of clinical drug-induced liver injury. Toxicol. Sci. 2008, 105, 97–105.
- Warszycki, D.; Struski, L.; Smieja, M.; Kafel, R.; Kurczab, R. Pharmacoprint: A Combination of a Pharmacophore Fingerprint and Artificial Intelligence as a Tool for Computer-Aided Drug Design. J. Chem. Inf. Model. 2021, 61, 5054–5065.
Models | ACC | F1-Score | AUC |
---|---|---|---|
SVC | 0.747 | 0.685 | 0.766 |
Random Forest | 0.760 | 0.700 | 0.782 |
XGBoost | 0.723 | 0.669 | 0.740 |
AttentiveFP | 0.729 | 0.716 | 0.750 |
GCN | 0.725 | 0.698 | 0.759 |
GIN_AttrMasking | 0.732 | 0.697 | 0.785 |
GIN_ContextPred | 0.754 | 0.703 | 0.790 |
Voting Ensemble | 0.753 | 0.752 | 0.826 |
R-E-GA | 0.770 | 0.769 | 0.842 |
Datasets | | DILI-Positive | DILI-Negative | Total Number |
---|---|---|---|---|
Training | Combined training dataset [6] | 236 | 239 | 475 |
External validation | Combined validation dataset | 114 | 84 | 198 |
Molecular Descriptors | Index | Neural Network [6], 10-Fold Test | Neural Network [6], Validation | Xu et al. Model [6], 10-Fold Test | Xu et al. Model [6], Validation | R-E-GA, 10-Fold Test | R-E-GA, Validation |
---|---|---|---|---|---|---|---|
Mold2 descriptors | ACC | 0.825 | 0.823 | 0.832 | 0.833 | 0.852 | 0.851 |
Mold2 descriptors | AUC | - | 0.916 | - | 0.931 | 0.912 | 0.949 |
Mold2 descriptors | SEN | 0.784 | 0.711 | 0.831 | 0.790 | 0.855 | 0.794 |
Mold2 descriptors | SPE | 0.866 | 0.976 | 0.833 | 0.893 | 0.850 | 0.898 |
PaDEL descriptors | ACC | 0.816 | 0.791 | 0.823 | 0.811 | 0.840 | 0.821 |
PaDEL descriptors | AUC | - | 0.869 | - | 0.895 | 0.906 | 0.904 |
PaDEL descriptors | SEN | 0.758 | 0.723 | 0.852 | 0.821 | 0.831 | 0.797 |
PaDEL descriptors | SPE | 0.875 | 0.881 | 0.794 | 0.798 | 0.853 | 0.858 |
Models | Internal 10-Fold CV, ACC | Internal 10-Fold CV, AUC | Internal 10-Fold CV, SEN | Internal 10-Fold CV, SPE | External Validation, ACC | External Validation, AUC | External Validation, SEN | External Validation, SPE |
---|---|---|---|---|---|---|---|---|
Xu et al. [6] | 0.884 | - | 0.899 | 0.87 | 0.869 | 0.955 | 0.825 | 0.929 |
R-E-GA | 0.702 | 0.721 | 0.803 | 0.690 | 0.711 | 0.725 | 0.780 | 0.663 |
Number of drugs (positive/negative) | 236/239 | | | | 114/84 | | | |
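The indices reported in the result tables (ACC, SEN, SPE, F1-score, and AUC) follow the usual definitions for binary classification. The sketch below states them explicitly; it is an illustration with made-up labels and scores, not the evaluation code used for the tables, and the function names are hypothetical. AUC is computed here by pairwise ranking of positive against negative scores.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """ACC, SEN (recall on positives), SPE (recall on negatives), and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "ACC": (tp + tn) / len(pairs),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "F1": 2 * tp / (2 * tp + fp + fn),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: three positives and three negatives.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

metrics = binary_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

In this example two of three positives and two of three negatives are classified correctly, so ACC, SEN, SPE, and F1 all equal 2/3, and eight of the nine positive-negative pairs are ranked correctly, giving an AUC of 8/9.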
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yan, B.; Ye, X.; Wang, J.; Han, J.; Wu, L.; He, S.; Liu, K.; Bo, X. An Algorithm Framework for Drug-Induced Liver Injury Prediction Based on Genetic Algorithm and Ensemble Learning. Molecules 2022, 27, 3112. https://doi.org/10.3390/molecules27103112