Chem2Side: A Deep Learning Model with Ensemble Augmentation (Conventional + Pix2Pix) for COVID-19 Drug Side-Effects Prediction from Chemical Images
Abstract
1. Introduction
- We compiled a comprehensive dataset by gathering information from reputable sources such as SIDER and PubChem;
- We converted the problem from a multi-label to a multi-class format by assigning each combination of side effects its own class;
- We used a diffusion-based image generation method together with conventional augmentation techniques to produce synthetic data;
- Our study introduces a novel model designed for predicting multiple drug side effects (DSEs) by leveraging 2D chemical structures of drugs;
- We incorporated transfer learning to reduce the required training time;
- Our proposed model avoids the intricate transformation pipeline common in NLP-style approaches, which convert SMILES strings into fingerprints and then extract features.
2. Proposed Methodology
2.1. Dataset
2.2. Transformation of Multi-Label Problem to Multi-Class
2.3. Data Augmentation with Diffusion and Conventional Augmentation Techniques
2.4. Proposed Model Based on Transfer Learning and Fine-Tuning
3. Experiment Results and Discussion
3.1. Evaluation Metrics
- Accuracy: Accuracy is a metric that assesses the overall correctness of a model’s predictions. It calculates the proportion of correctly classified samples out of the total samples. While accuracy is a crucial evaluation measure, it may not be sufficient in certain scenarios, such as imbalanced datasets or cases where different types of errors have varying consequences. In such situations, additional evaluation metrics may be necessary to provide a more comprehensive understanding of the model’s performance and capabilities. In Equations (2)–(4), TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative, respectively.
- Precision: Precision is a metric that evaluates a model’s capability to correctly identify positive samples among the predicted positive samples. It calculates the proportion of true-positive predictions to the total number of positive predictions (which includes both true positives and false positives). Precision provides valuable insights into how accurately the model detects and classifies positive instances, making it an essential measure in many classification tasks.
- Recall: Recall, also known as sensitivity or the true-positive rate, measures the model’s capacity to correctly identify positive samples among all actual positive samples. It calculates the ratio of true positives to the sum of true positives and false negatives. Recall reflects the model’s ability to be comprehensive in capturing positive instances, making it a critical evaluation metric in classification tasks.
- F1 Score: The F1 score is computed as the harmonic mean of precision and recall, providing a single statistic that balances the two metrics. This makes it particularly useful when dealing with imbalanced class distributions or scenarios where equal emphasis is placed on both types of errors. The F1 score ranges from 0 to 1, with 1 representing the best possible performance of the model. By incorporating both precision and recall, the F1 score offers a comprehensive evaluation of the model’s overall effectiveness in classification tasks.
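For reference, the four metrics described above have the following standard definitions in terms of the TP, TN, FP, and FN counts. This is a reconstruction of the textbook forms and is not copied from the article's own numbered equations, so the layout and ordering are assumptions.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
F_1              &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```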
3.2. Stratified K-Fold (Train, Validation, Test)
Algorithm 1: Stratified K-Fold Approach for the Proposed Study
Input: Drug 2D Chemical Structures Dataset
Step 1: Split the dataset into 3 folds.
Repeat
For fold i = 1 to 3 do
    Step 2: Select fold i as the validation set and the remaining folds as the training set.
    Step 3: Split the validation set 70:30, keeping 70% for validation and using 30% for testing.
    Step 4: Fit the model on the training set.
    Step 5: Evaluate on the validation set during training.
    Step 6: At the end of fold i, evaluate the model on the test set.
    Step 7: Store the evaluation scores in list S.
End-for
Step 8: Compute the average performance from S.
Output: Average Performance of the Model
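The steps of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`stratified_folds`, `split_validation_test`, `run_stratified_kfold`), the round-robin fold assignment, the per-class 70:30 split, and the toy majority-class "model" are all hypothetical stand-ins.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=3, seed=0):
    """Step 1: assign sample indices to k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives a share of it.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def split_validation_test(fold, labels, val_ratio=0.7, seed=0):
    """Step 3: split the held-out fold per class into validation and test parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in fold:
        by_class[labels[idx]].append(idx)
    val, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * val_ratio)
        val.extend(indices[:cut])
        test.extend(indices[cut:])
    return val, test


def run_stratified_kfold(labels, fit, evaluate, k=3):
    """Steps 2-8: train per fold, score on the test part, and average."""
    folds = stratified_folds(labels, k)
    scores = []  # the list S in Algorithm 1
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        val, test = split_validation_test(folds[i], labels)
        model = fit(train, val)               # Steps 4-5
        scores.append(evaluate(model, test))  # Steps 6-7
    return sum(scores) / len(scores)          # Step 8


# Toy data: four balanced classes and a hypothetical majority-class predictor.
labels = [0, 1, 2, 3] * 30


def fit(train, val):
    return Counter(labels[i] for i in train).most_common(1)[0][0]


def evaluate(model, test):
    return sum(labels[i] == model for i in test) / len(test)


average_score = run_stratified_kfold(labels, fit, evaluate)
```

In practice `fit` would train the image classifier while monitoring the validation indices, and `evaluate` would return a metric such as test accuracy; a library routine for stratified splitting could replace `stratified_folds`.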
3.3. Results of Proposed CHEM2SIDE Model and Discussion
3.4. Comparison with the Literature Contributions
3.5. Robustness of Proposed CHEM2SIDE
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
TP | True Positive |
FP | False Positive |
TN | True Negative |
FN | False Negative |
LR | Learning Rate |
GAN | Generative Adversarial Networks |
ML | Machine Learning |
DL | Deep Learning |
CNN | Convolutional Neural Network |
AI | Artificial Intelligence |
1D | One-Dimensional |
2D | Two-Dimensional |
ROC | Receiver Operating Characteristics |
FDA | Food and Drug Administration |
DSEs | Drug Side Effects |
KNN | K-Nearest Neighbors |
Drug Name | Drug Structure | Fever | Vomiting |
---|---|---|---|
18F-FDG | [structure image] | 0 | 0 |
4-PBA | [structure image] | 0 | 1 |
abiraterone | [structure image] | 1 | 0 |
8-MOP | [structure image] | 1 | 0 |
Drug | Fever | Vomiting | Class |
---|---|---|---|
[structure image] | 0 | 0 | 0 |
[structure image] | 0 | 1 | 1 |
[structure image] | 1 | 0 | 2 |
[structure image] | 1 | 1 | 3 |
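The mapping in the table above, from two binary side-effect labels to a single class index, can be written compactly. The sketch below is one way to express it; the function names are hypothetical, and the binary-encoding formula is inferred from the four rows of the table rather than stated in the source.

```python
def labels_to_class(fever: int, vomiting: int) -> int:
    """Encode the two binary labels as one class index in 0..3."""
    return 2 * fever + vomiting


def class_to_labels(cls: int) -> tuple:
    """Invert the encoding: return (fever, vomiting) for a class index."""
    return divmod(cls, 2)


# The four label combinations, in the order they appear in the table.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
classes = [labels_to_class(fever, vomiting) for fever, vomiting in rows]
```

With this encoding, a drug labeled with vomiting only falls in class 1 and a drug labeled with fever only falls in class 2, matching the table.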
Hyperparameter | Value |
---|---|
Image Guidance Scale | 2.0 |
Number of Inference Steps | 20 |
Mode | RGB |
Input Image Size | 300 × 300 |
Generated Image Size | 1024 × 1024 |
Save Size | 300 × 300 |
Hyperparameter | Value |
---|---|
Rotation Range | 10–30 |
Shear Range | 0.1–0.2 |
Zoom Range | 0.1–0.2 |
Brightness Range | 0.8–1.2 |
Horizontal Flip | True |
Fill Mode | Nearest |
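The conventional augmentation settings in the table above can be collected into a configuration dictionary. The source does not name the augmentation library, so the argument names below are an assumption: they follow the keyword arguments of Keras' `ImageDataGenerator`. The helper `augmentation_kwargs` and the choice of single values from each tabulated range are likewise hypothetical.

```python
def augmentation_kwargs(rotation=10, shear=0.1, zoom=0.1):
    """Build keyword arguments for a Keras-style image augmenter.

    Assumption: names mirror Keras' ImageDataGenerator; the source table
    gives ranges, so one value per range is picked by the caller.
    """
    if not 10 <= rotation <= 30:
        raise ValueError("rotation must lie in the tabulated 10-30 range")
    if not 0.1 <= shear <= 0.2:
        raise ValueError("shear must lie in the tabulated 0.1-0.2 range")
    if not 0.1 <= zoom <= 0.2:
        raise ValueError("zoom must lie in the tabulated 0.1-0.2 range")
    return {
        "rotation_range": rotation,
        "shear_range": shear,
        "zoom_range": zoom,
        "brightness_range": (0.8, 1.2),
        "horizontal_flip": True,
        "fill_mode": "nearest",
    }


# Two example settings at the ends of the tabulated ranges.
mild = augmentation_kwargs()
strong = augmentation_kwargs(rotation=30, shear=0.2, zoom=0.2)
```

If TensorFlow is available, such a dictionary could be unpacked into `ImageDataGenerator(**mild)`; note that this class is deprecated in recent Keras releases in favor of preprocessing layers.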
Parameters | Values |
---|---|
Batch Size | 32 |
Epochs | 100 |
Learning Rate | 0.0001 |
Reduced Learning Rate | Yes |
Patience for Reduced Learning Rate | 3 |
Early-Stopping Patience | 5 |
Stratified K-Fold K Value | 3 |
Optimizer | Adam |
FC-Layer Activation Function | ReLU |
FC-Layer Neurons | 512, 256, 128 |
Output-Layer Neurons | 4 |
Output Activation Function | SoftMax |
Dropout between FC Layers | 0.2, 0.2 |
Compile Loss | Categorical Cross-Entropy |
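To illustrate how the two patience values in the table above interact, the following sketch simulates a reduce-on-plateau rule (patience 3) alongside early stopping (patience 5) on a list of validation losses. It is a simplified model of the usual callback behavior, not the authors' training code: the reduction factor of 0.1 is an assumption (the table does not state it), the function name is hypothetical, and the loss values are invented solely for the demonstration.

```python
def simulate_schedule(val_losses, lr=1e-4, lr_patience=3, stop_patience=5, factor=0.1):
    """Return (epoch, learning_rate) pairs until early stopping triggers."""
    best = float("inf")
    lr_wait = 0    # epochs without improvement since the last LR change
    stop_wait = 0  # epochs without improvement since the best loss
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            lr_wait = stop_wait = 0
        else:
            lr_wait += 1
            stop_wait += 1
            if lr_wait >= lr_patience:
                lr *= factor  # assumed reduction factor
                lr_wait = 0
        history.append((epoch, lr))
        if stop_wait >= stop_patience:
            break  # early stopping
    return history


# Invented validation losses: improvement stalls after epoch 3.
toy_losses = [0.9, 0.8, 0.7, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75]
history = simulate_schedule(toy_losses)
```

Because the learning-rate patience is shorter than the early-stopping patience, the rate is lowered at least once before training is allowed to stop.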
Test Set Fold | Per-Class Samples | Accuracy | Weighted Precision | Weighted Recall | Weighted F1 | Support |
---|---|---|---|---|---|---|
Fold 1 | Class 0: 97, Class 1: 90, Class 2: 106, Class 3: 108 | 0.72 | 0.73 | 0.72 | 0.72 | 401 |
Fold 2 | Class 0: 95, Class 1: 90, Class 2: 106, Class 3: 109 | 0.73 | 0.73 | 0.73 | 0.73 | 400 |
Fold 3 | Class 0: 95, Class 1: 91, Class 2: 106, Class 3: 108 | 0.74 | 0.74 | 0.74 | 0.74 | 400 |
Average | | 0.73 | 0.73 | 0.73 | 0.73 | 400 |
Average scores (stratified K-fold, K = 3; each held-out fold split into 70% validation and 30% testing)
Model | Train Accuracy | Validation Accuracy | Test Accuracy | Weighted Precision | Weighted Recall | Weighted F1 |
---|---|---|---|---|---|---|
Proposed CHEM2SIDE | 0.78 | 0.73 | 0.73 | 0.73 | 0.73 | 0.73 |
MobileNetV2 | 0.68 | 0.66 | 0.66 | 0.67 | 0.66 | 0.65 |
VGG16 | 0.51 | 0.51 | 0.51 | 0.51 | 0.51 | 0.50 |
DenseNet121 | 0.62 | 0.62 | 0.61 | 0.62 | 0.61 | 0.61 |
KNN | 0.38 | 0.33 | 0.32 | 0.32 | 0.33 | 0.25 |
COVID-19 Drugs | ||
---|---|---|
bromhexine | ivermectin | budesonide |
chloroquine | losartan | celecoxib |
colchicine | montelukast | chlorpromazine |
dipyridamole | nitazoxanide | darunavir |
methylprednisolone | quetiapine | dexamethasone |
rivaroxaban | ribavirin | famotidine |
tranexamic acid | ritonavir | fondaparinux |
argatroban | ruxolitinib | heparin |
azithromycin | simvastatin | hydroxychloroquine |
bicalutamide | sofosbuvir | ibuprofen |
Class Label | Class Samples |
---|---|
0 | 2 |
1 | 5 |
2 | 1 |
3 | 22 |
Class Label | Class Samples |
---|---|
0 | 22 |
1 | 22 |
2 | 22 |
3 | 22 |
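Assuming the two class-count tables above describe the COVID-19 drug set before and after balancing, the number of synthetic images needed per class follows from the size of the largest class. The sketch below only does this arithmetic; the variable names are hypothetical and the before/after interpretation is an assumption, since the tables carry no captions here.

```python
# Per-class sample counts from the first table (class label -> samples).
original_counts = {0: 2, 1: 5, 2: 1, 3: 22}

# Balance every class up to the size of the largest one.
target = max(original_counts.values())
synthetic_needed = {cls: target - n for cls, n in original_counts.items()}

total_before = sum(original_counts.values())
total_after = target * len(original_counts)
```

The balanced total of 88 samples agrees with the support reported in the per-class results for the COVID-19 set, and the original total of 30 agrees with the number of drugs listed.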
Class | Precision | Recall | F1 | Support |
---|---|---|---|---|
0 | 1.00 | 0.68 | 0.81 | 22 |
1 | 0.55 | 0.82 | 0.65 | 22 |
2 | 1.00 | 0.68 | 0.81 | 22 |
3 | 0.60 | 0.68 | 0.64 | 22 |
Accuracy | | | 0.72 | 88 |
Macro Average | 0.79 | 0.72 | 0.73 | 88 |
Weighted Average | 0.79 | 0.72 | 0.73 | 88 |
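The macro and weighted averages in the table above can be checked from the per-class rows. The sketch below recomputes them from the rounded values printed in the table, so small rounding differences against the reported figures are expected; the helper names are hypothetical.

```python
# Per-class (precision, recall, f1, support) as printed in the table.
report = {
    0: (1.00, 0.68, 0.81, 22),
    1: (0.55, 0.82, 0.65, 22),
    2: (1.00, 0.68, 0.81, 22),
    3: (0.60, 0.68, 0.64, 22),
}


def macro_average(column):
    """Unweighted mean of one metric column across classes."""
    return sum(row[column] for row in report.values()) / len(report)


def weighted_average(column):
    """Mean of one metric column weighted by class support."""
    total = sum(row[3] for row in report.values())
    return sum(row[column] * row[3] for row in report.values()) / total


macro = [macro_average(c) for c in range(3)]        # precision, recall, F1
weighted = [weighted_average(c) for c in range(3)]  # precision, recall, F1
```

Because every class has the same support (22), the weighted average coincides with the macro average, which is why the last two rows of the table are identical.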
Model | COVID-19 Accuracy | COVID-19 Weighted Precision | COVID-19 Weighted Recall | COVID-19 Weighted F1 |
---|---|---|---|---|
Proposed CHEM2SIDE | 0.72 | 0.79 | 0.72 | 0.73 |
MobileNetV2 | 0.45 | 0.47 | 0.45 | 0.39 |
VGG16 | 0.43 | 0.51 | 0.43 | 0.33 |
DenseNet121 | 0.34 | 0.16 | 0.34 | 0.22 |
KNN | 0.35 | 0.17 | 0.35 | 0.23 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Arshed, M.A.; Ibrahim, M.; Mumtaz, S.; Tanveer, M.; Ahmed, S. Chem2Side: A Deep Learning Model with Ensemble Augmentation (Conventional + Pix2Pix) for COVID-19 Drug Side-Effects Prediction from Chemical Images. Information 2023, 14, 663. https://doi.org/10.3390/info14120663