# Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data


## Abstract


## 1. Introduction

- We propose a novel framework for drug response prediction that uses a GCN model to learn genomic features of cell lines with a graph structure; to our knowledge, this is the first such approach.
- DrugGCN generates a gene graph suitable for drug response prediction by integrating a PPI network with gene expression data and by selecting genes with high predictive power.
- DrugGCN uses localized filters that can detect local features in a biological network, such as subnetworks of genes that jointly contribute to the drug response, and its learning complexity scales to biological networks with a huge number of vertices and edges.
- The performance of the proposed approach is demonstrated on a GDSC cell line dataset, where DrugGCN achieves high prediction accuracy compared with the competing methods.
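
The gene-graph construction described in the second contribution can be illustrated with a minimal sketch. It assumes expression variance as the gene-selection score (suggested by the Var1000 dataset name, but an assumption here) and an unweighted PPI edge list such as one exported from STRING [17]; the function names and toy data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def variance(values: List[float]) -> float:
    """Population variance of one gene's expression across cell lines."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def select_top_variance_genes(expr: Dict[str, List[float]], k: int) -> List[str]:
    """Return the k genes with the largest expression variance (assumed criterion)."""
    return sorted(expr, key=lambda g: variance(expr[g]), reverse=True)[:k]


def build_gene_graph(
    ppi_edges: Iterable[Tuple[str, str]], genes: List[str]
) -> Dict[str, List[str]]:
    """Keep only PPI edges whose two endpoints are both selected genes.

    Returns an undirected adjacency list without self-loops; selected genes
    with no surviving PPI edge remain as isolated vertices.
    """
    keep = set(genes)
    adj: Dict[str, set] = {g: set() for g in genes}
    for a, b in ppi_edges:
        if a in keep and b in keep and a != b:
            adj[a].add(b)
            adj[b].add(a)
    return {g: sorted(nbrs) for g, nbrs in adj.items()}


if __name__ == "__main__":
    # Toy expression matrix: gene -> values across three cell lines.
    expr = {
        "TP53": [1.0, 5.0, 9.0],
        "EGFR": [2.0, 2.0, 2.0],
        "KRAS": [0.0, 10.0, 0.0],
        "BRAF": [3.0, 4.0, 5.0],
    }
    genes = select_top_variance_genes(expr, k=2)
    edges = [("KRAS", "TP53"), ("TP53", "EGFR"), ("KRAS", "KRAS")]
    print(genes)                           # ['KRAS', 'TP53']
    print(build_gene_graph(edges, genes))  # {'KRAS': ['TP53'], 'TP53': ['KRAS']}
```

Restricting the PPI network to the selected genes keeps the graph size tied to the number of features rather than to the full interactome.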

## 2. Materials and Methods

#### 2.1. Graph Construction for Drug Response Prediction

#### 2.1.1. Feature Selection on Gene Expression Data

#### 2.1.2. Graph Construction with Gene Expression Data

#### 2.2. Localized Filters for the Graph Convolutional Network
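
The reference list cites the Chebyshev polynomial filters of Defferrard et al. [31] and Hammond et al. [41]. Assuming filters of that form, a filter computes $y = \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})\,x$ with $\tilde{L} = 2L/\lambda_{\max} - I$ and the recurrence $T_k(\tilde{L})x = 2\tilde{L}\,T_{k-1}(\tilde{L})x - T_{k-2}(\tilde{L})x$. The minimal sketch below uses the normalized Laplacian, the common approximation $\lambda_{\max} = 2$, and fixed illustrative coefficients `theta` where a trained model would learn them; all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Adjacency: vertex index -> list of (neighbor index, edge weight); undirected.
Graph = Dict[int, List[Tuple[int, float]]]


def scaled_laplacian_matvec(adj: Graph, x: Sequence[float], lmax: float = 2.0) -> List[float]:
    """Return L_tilde @ x, where L_tilde = 2 L / lmax - I and
    L = I - D^{-1/2} A D^{-1/2} is the normalized graph Laplacian."""
    n = len(x)
    deg = [sum(w for _, w in adj.get(i, [])) for i in range(n)]
    out = []
    for i in range(n):
        ax = 0.0  # (D^{-1/2} A D^{-1/2} x)_i
        for j, w in adj.get(i, []):
            if deg[i] > 0 and deg[j] > 0:
                ax += w * x[j] / math.sqrt(deg[i] * deg[j])
        lx = x[i] - ax  # (L x)_i
        out.append(2.0 * lx / lmax - x[i])
    return out


def chebyshev_filter(adj: Graph, x: Sequence[float], theta: Sequence[float],
                     lmax: float = 2.0) -> List[float]:
    """y = sum_k theta[k] * T_k(L_tilde) x via the Chebyshev recurrence
    T_0 x = x, T_1 x = L_tilde x, T_k x = 2 L_tilde T_{k-1} x - T_{k-2} x."""
    t_prev = list(x)
    y = [theta[0] * v for v in t_prev]
    if len(theta) == 1:
        return y
    t_curr = scaled_laplacian_matvec(adj, t_prev, lmax)
    y = [a + theta[1] * b for a, b in zip(y, t_curr)]
    for k in range(2, len(theta)):
        lt = scaled_laplacian_matvec(adj, t_curr, lmax)
        t_next = [2.0 * a - b for a, b in zip(lt, t_prev)]
        y = [a + theta[k] * b for a, b in zip(y, t_next)]
        t_prev, t_curr = t_curr, t_next
    return y


def path_graph(n: int) -> Graph:
    """Unweighted path 0 - 1 - ... - (n-1), used as a toy gene graph."""
    adj: Graph = {i: [] for i in range(n)}
    for i in range(n - 1):
        adj[i].append((i + 1, 1.0))
        adj[i + 1].append((i, 1.0))
    return adj


if __name__ == "__main__":
    signal = [1.0, 0.0, 0.0, 0.0, 0.0]  # expression spike on vertex 0
    # Three coefficients -> degree-2 polynomial -> nonzero only within 2 hops.
    y = chebyshev_filter(path_graph(5), signal, theta=[1.0, 1.0, 1.0])
    print([round(v, 4) for v in y])
```

Each recurrence step is one sparse matrix-vector product, so a filter with K coefficients costs O(K|E|) and mixes information only within K-1 hops; this is the locality and scalability property referred to in the introduction.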

#### 2.3. Evaluation Criteria
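
The results report RMSE, PCC, SCC, and NDCG. A minimal sketch of these four measures in plain Python follows. The NDCG form shown (linear gain, $\log_2(i+1)$ position discount, observed values used directly as relevance with larger meaning more relevant) is one common variant of Järvelin and Kekäläinen's measure [43] and an assumption, not necessarily the exact definition behind the figures; for IC50, where lower means more sensitive, the values would first need a transformation that is likewise assumed.

```python
import math
from typing import List, Optional, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted responses."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def scc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation coefficient: Pearson correlation of the ranks."""
    return pcc(average_ranks(x), average_ranks(y))


def ndcg(relevance: Sequence[float], y_pred: Sequence[float], k: Optional[int] = None) -> float:
    """NDCG@k: sort items by predicted score (descending), discount each true
    relevance by log2(position + 1), and divide by the ideal ordering's DCG."""
    k = len(relevance) if k is None else k
    order = sorted(range(len(y_pred)), key=lambda i: y_pred[i], reverse=True)
    dcg = sum(relevance[i] / math.log2(pos + 2) for pos, i in enumerate(order[:k]))
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(r / math.log2(pos + 2) for pos, r in enumerate(ideal))
    return dcg / idcg


if __name__ == "__main__":
    observed = [0.2, 0.5, 0.9, 0.4]     # hypothetical responses of one drug
    predicted = [0.25, 0.45, 0.8, 0.5]  # hypothetical model predictions
    print(rmse(observed, predicted), pcc(observed, predicted),
          scc(observed, predicted), ndcg(observed, predicted))
```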

## 3. Results and Discussion

#### 3.1. Performance Evaluation on the GDSC Dataset

`scikit-learn` library [44]. The hyperparameters of the methods were set to their default values, except for the regularization parameter $\lambda$ in KRL, which was tuned by a line search over $\lambda \in \{1.0, 0.1, 0.01, \dots, 1\times 10^{-6}\}$ as described in the original paper and was finally set to 1.0. The hyperparameters of DrugGCN are listed in Appendix A, Table A1. In the learning process, three-fold cross-validation was carried out with a 50% training set, a 25% validation set, and a 25% test set.
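
The line search over $\lambda$ amounts to fitting one model per grid value and keeping the value with the lowest validation error. The sketch below is illustrative only: it uses a one-feature ridge regression as a stand-in for KRL (which is not implemented here) and toy data, and `line_search`, `fit_ridge_1d`, and `lambda_grid` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple


def lambda_grid(num: int = 7) -> List[float]:
    """Decade-spaced grid 1.0, 0.1, ..., 1e-6 (num values)."""
    return [10.0 ** -k for k in range(num)]


def fit_ridge_1d(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    """Closed-form weight of y ~ w * x with an L2 penalty lam * w**2 (no intercept).
    Stand-in model only; KRL itself is a kernelized ranking method."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def mse(w: float, x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error of the prediction w * x."""
    return sum((w * a - b) ** 2 for a, b in zip(x, y)) / len(x)


def line_search(grid: Sequence[float],
                val_error: Callable[[float], float]) -> Tuple[float, float]:
    """Return (best lambda, its validation error); ties keep the earlier value."""
    best_lam, best_err = grid[0], val_error(grid[0])
    for lam in grid[1:]:
        err = val_error(lam)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err


if __name__ == "__main__":
    # Hypothetical training and validation splits.
    x_train, y_train = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
    x_val, y_val = [5.0, 6.0], [10.1, 11.8]
    best, err = line_search(
        lambda_grid(),
        lambda lam: mse(fit_ridge_1d(x_train, y_train, lam), x_val, y_val),
    )
    print(f"selected lambda = {best:g}, validation MSE = {err:.4f}")
```

Selection uses only the validation split, so the test split stays untouched until the final evaluation.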

#### 3.2. Case Study: ERK MAPK Signaling-Related Drugs

## 4. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Appendix A

**Table A1.** Hyperparameters of DrugGCN.

Hyperparameter | Value |
---|---|
GCN layers | 2 |
Number of kernels | [20, 10] |
Kernel size | [40, 20] |
Pooling size | [4, 4] |
FC size | [128, 1] |
Regularization (Reg) | 0 |
Dropout | 1 |
Learning rate (l_rate) | 0.02 |
Momentum | 0.9 |
Decay rate | 0.95 |
Batch size | 4 |

**Figure A1.** Performance evaluation results of the six methods on the L1000-AUC dataset using (**a**) RMSE, (**b**) PCC, (**c**) SCC, and (**d**) NDCG.

**Table A2.** p-values of the one-sided Wilcoxon signed-rank test on the L1000-AUC dataset, comparing the ranks of DrugGCN with those of the other five methods.

Measure | BR | KRR | KRL | MLP | RWEN |
---|---|---|---|---|---|
RMSE | $0.0038$ | $9.91 \times 10^{-35}$ | - | $4.55 \times 10^{-46}$ | $1.99 \times 10^{-36}$ |
PCC | $0.0082$ | $1.43 \times 10^{-8}$ | $4.09 \times 10^{-9}$ | $1.01 \times 10^{-32}$ | $3.73 \times 10^{-23}$ |
SCC | $0.0025$ | $1.71 \times 10^{-8}$ | $0.5151$ | $2.75 \times 10^{-30}$ | $3.28 \times 10^{-26}$ |

**Figure A2.** Performance evaluation results of the six methods on the Var1000-IC50 dataset using (**a**) RMSE, (**b**) PCC, (**c**) SCC, and (**d**) NDCG.

**Table A3.** p-values of the one-sided Wilcoxon signed-rank test on the Var1000-IC50 dataset, comparing the ranks of DrugGCN with those of the other five methods.

Measure | BR | KRR | KRL | MLP | RWEN |
---|---|---|---|---|---|
RMSE | $0.9914$ | $4.87 \times 10^{-37}$ | - | $4.59 \times 10^{-36}$ | $2.34 \times 10^{-37}$ |
PCC | $3.66 \times 10^{-14}$ | $2.39 \times 10^{-29}$ | $4.10 \times 10^{-26}$ | $1.31 \times 10^{-35}$ | $1.48 \times 10^{-35}$ |
SCC | $6.01 \times 10^{-13}$ | $1.18 \times 10^{-27}$ | $1.03 \times 10^{-25}$ | $1.31 \times 10^{-35}$ | $9.06 \times 10^{-36}$ |

**Figure A3.** Performance evaluation results of the six methods on the Var1000-AUC dataset using (**a**) RMSE, (**b**) PCC, (**c**) SCC, and (**d**) NDCG.

**Table A4.** p-values of the one-sided Wilcoxon signed-rank test on the Var1000-AUC dataset, comparing the ranks of DrugGCN with those of the other five methods.

Measure | BR | KRR | KRL | MLP | RWEN |
---|---|---|---|---|---|
RMSE | $0.0957$ | $1.12 \times 10^{-32}$ | - | $5.76 \times 10^{-36}$ | $1.00 \times 10^{-36}$ |
PCC | $0.0001$ | $3.56 \times 10^{-11}$ | $1.08 \times 10^{-10}$ | $2.93 \times 10^{-33}$ | $9.47 \times 10^{-27}$ |
SCC | $4.52 \times 10^{-6}$ | $8.06 \times 10^{-16}$ | $0.0002$ | $5.81 \times 10^{-35}$ | $4.88 \times 10^{-33}$ |

## References

1. Liang, P.; Pardee, A.B. Analysing differential gene expression in cancer. Nat. Rev. Cancer **2003**, 3, 869–876.
2. Hanahan, D.; Weinberg, R.A. Hallmarks of cancer: The next generation. Cell **2011**, 144, 646–674.
3. Bao, R.; Connolly, D.C.; Murphy, M.; Green, J.; Weinstein, J.K.; Pisarcik, D.A.; Hamilton, T.C. Activation of cancer-specific gene expression by the survivin promoter. J. Natl. Cancer Inst. **2002**, 94, 522–528.
4. Hutchinson, L.; DeVita, V.T. The era of personalized medicine: Back to basics. Nat. Clin. Pract. Oncol. **2008**, 5, 623.
5. Castiblanco, J.; Anaya, J.M. Genetics and vaccines in the era of personalized medicine. Curr. Genom. **2015**, 16, 47–59.
6. Marquart, J.; Chen, E.Y.; Prasad, V. Estimation of the percentage of US patients with cancer who benefit from genome-driven oncology. JAMA Oncol. **2018**, 4, 1093–1098.
7. Weinstein, J.N.; Collisson, E.A.; Mills, G.B.; Shaw, K.R.M.; Ozenberger, B.A.; Ellrott, K.; Shmulevich, I.; Sander, C.; Stuart, J.M. The cancer genome atlas pan-cancer analysis project. Nat. Genet. **2013**, 45, 1113–1120.
8. Zhang, J.; Baran, J.; Cros, A.; Guberman, J.M.; Haider, S.; Hsu, J.; Liang, Y.; Rivkin, E.; Wang, J.; Whitty, B.; et al. International Cancer Genome Consortium Data Portal—A one-stop shop for cancer genomics data. Database **2011**, 2011, bar026.
9. Garnett, M.J.; Edelman, E.J.; Heidorn, S.J.; Greenman, C.D.; Dastur, A.; Lau, K.W.; Greninger, P.; Thompson, I.R.; Luo, X.; Soares, J.; et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature **2012**, 483, 570–575.
10. Chen, J.; Zhang, L. A survey and systematic assessment of computational methods for drug response prediction. Briefings Bioinform. **2021**, 22, 232–246.
11. Baptista, D.; Ferreira, P.G.; Rocha, M. Deep learning for drug response prediction in cancer. Briefings Bioinform. **2021**, 22, 360–379.
12. Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
13. He, X.; Folkman, L.; Borgwardt, K. Kernelized rank learning for personalized drug recommendation. Bioinformatics **2018**, 34, 2808–2816.
14. Basu, A.; Mitra, R.; Liu, H.; Schreiber, S.L.; Clemons, P.A. RWEN: Response-weighted elastic net for prediction of chemosensitivity of cancer cell lines. Bioinformatics **2018**, 34, 3332–3339.
15. Sharma, H.K.; Kumari, K.; Kar, S. A rough set approach for forecasting models. Decis. Mak. Appl. Manag. Eng. **2020**, 3, 1–21.
16. Ghosh, I.; Chaudhuri, T.D. FEB-Stacking and FEB-DNN Models for Stock Trend Prediction: A Performance Analysis for Pre and Post Covid-19 Periods. Decis. Mak. Appl. Manag. Eng. **2021**, 4, 51–84.
17. Szklarczyk, D.; Gable, A.L.; Lyon, D.; Junge, A.; Wyder, S.; Huerta-Cepas, J.; Simonovic, M.; Doncheva, N.T.; Morris, J.H.; Bork, P.; et al. STRING v11: Protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. **2019**, 47, D607–D613.
18. Zhang, T.; Leng, J.; Liu, Y. Deep learning for drug–drug interaction extraction from the literature: A review. Briefings Bioinform. **2020**, 21, 1609–1627.
19. Karimi, M.; Wu, D.; Wang, Z.; Shen, Y. DeepAffinity: Interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics **2019**, 35, 3329–3338.
20. Öztürk, H.; Özgür, A.; Ozkirimli, E. DeepDTA: Deep drug–target binding affinity prediction. Bioinformatics **2018**, 34, i821–i829.
21. Lee, C.Y.; Chen, Y.P.P. Prediction of drug adverse events using deep learning in pharmaceutical discovery. Briefings Bioinform. **2020**, 22, 1884–1901.
22. Rifaioglu, A.S.; Atas, H.; Martin, M.J.; Cetin-Atalay, R.; Atalay, V.; Doğan, T. Recent applications of deep learning and machine intelligence on in silico drug discovery: Methods, tools and databases. Briefings Bioinform. **2019**, 20, 1878–1912.
23. Sharifi-Noghabi, H.; Zolotareva, O.; Collins, C.C.; Ester, M. MOLI: Multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics **2019**, 35, i501–i509.
24. Ding, M.Q.; Chen, L.; Cooper, G.F.; Young, J.D.; Lu, X. Precision oncology beyond targeted therapy: Combining omics data with machine learning matches the majority of cancer cells to effective therapeutics. Mol. Cancer Res. **2018**, 16, 269–278.
25. Aloysius, N.; Geetha, M. A review on deep convolutional neural networks. In Proceedings of the 2017 International Conference on Communication and Signal Processing (ICCSP), Chennai, India, 6–8 April 2017; pp. 588–592.
26. Liu, P.; Li, H.; Li, S.; Leung, K.S. Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network. BMC Bioinform. **2019**, 20, 408.
27. Chang, Y.; Park, H.; Yang, H.J.; Lee, S.; Lee, K.Y.; Kim, T.S.; Jung, J.; Shin, J.M. Cancer drug response profile scan (CDRscan): A deep learning model that predicts drug effectiveness from cancer genomic signature. Sci. Rep. **2018**, 8, 1–11.
28. Yap, C.W. PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. **2011**, 32, 1466–1474.
29. Cortés-Ciriano, I.; Bender, A. KekuleScope: Prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images. J. ChemInform. **2019**, 11, 1–16.
30. Niepert, M.; Ahmed, M.; Kutzkov, K. Learning convolutional neural networks for graphs. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2014–2023.
31. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 3844–3852.
32. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
33. Nguyen, T.T.; Nguyen, G.T.T.; Nguyen, T.; Le, D.H. Graph convolutional networks for drug response prediction. IEEE/ACM Trans. Comput. Biol. Bioinform. **2021**.
34. Sun, M.; Zhao, S.; Gilvary, C.; Elemento, O.; Zhou, J.; Wang, F. Graph convolutional networks for computational drug development and discovery. Briefings Bioinform. **2020**, 21, 919–935.
35. Costello, J.C.; Heiser, L.M.; Georgii, E.; Gönen, M.; Menden, M.P.; Wang, N.J.; Bansal, M.; Hintsanen, P.; Khan, S.A.; Mpindi, J.P.; et al. A community effort to assess and improve drug sensitivity prediction algorithms. Nat. Biotechnol. **2014**, 32, 1202–1212.
36. Iorio, F.; Knijnenburg, T.A.; Vis, D.J.; Bignell, G.R.; Menden, M.P.; Schubert, M.; Aben, N.; Gonçalves, E.; Barthorpe, S.; Lightfoot, H.; et al. A landscape of pharmacogenomic interactions in cancer. Cell **2016**, 166, 740–754.
37. Rhee, S.; Seo, S.; Kim, S. Hybrid approach of relation network and localized graph convolutional filtering for breast cancer subtype classification. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 3527–3534.
38. Lee, S.; Lim, S.; Lee, T.; Sung, I.; Kim, S. Cancer subtype classification and modeling by pathway attention and propagation. Bioinformatics **2020**, 36, 3818–3824.
39. Subramanian, A.; Narayan, R.; Corsello, S.M.; Peck, D.D.; Natoli, T.E.; Lu, X.; Gould, J.; Davis, J.F.; Tubelli, A.A.; Asiedu, J.K.; et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell **2017**, 171, 1437–1452.
40. Shuman, D.I.; Narang, S.K.; Frossard, P.; Ortega, A.; Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process. Mag. **2013**, 30, 83–98.
41. Hammond, D.K.; Vandergheynst, P.; Gribonval, R. Wavelets on graphs via spectral graph theory. Appl. Comput. Harmon. Anal. **2011**, 30, 129–150.
42. Dhillon, I.S.; Guan, Y.; Kulis, B. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Trans. Pattern Anal. Mach. Intell. **2007**, 29, 1944–1957.
43. Järvelin, K.; Kekäläinen, J. Cumulated gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. (TOIS) **2002**, 20, 422–446.
44. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
45. Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in Statistics; Springer: Berlin, Germany, 1992; pp. 196–202.
46. Sherman, B.T.; Lempicki, R.A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. **2009**, 4, 44.
47. Khatri, P.; Sirota, M.; Butte, A.J. Ten years of pathway analysis: Current approaches and outstanding challenges. PLoS Comput. Biol. **2012**, 8, e1002375.
48. Kanehisa, M.; Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. **2000**, 28, 27–30.

**Figure 2.** Performance evaluation results of the six methods on the L1000-IC50 dataset using (**a**) root-mean-square error (RMSE), (**b**) Pearson correlation coefficient (PCC), (**c**) Spearman correlation coefficient (SCC), and (**d**) Normalized Discounted Cumulative Gain (NDCG). KRR, Kernelized Ridge Regression; BR, Bagging Regressor; KRL, Kernelized Rank Learning; MLP, Multilayer Perceptron; RWEN, Response-Weighted Elastic Net.

**Figure 3.** Box plot of ranks for each drug group predicted by DrugGCN on the L1000-IC50 dataset. Results from RMSE, PCC, and SCC are combined.

**Figure 4.** Box plot of ranks for each drug group predicted by DrugGCN on the L1000-IC50 dataset. Ranks of drug groups are calculated from (**a**) RMSE, (**b**) PCC, and (**c**) SCC.

**Table 1.** p-values of the one-sided Wilcoxon signed-rank test on the L1000-IC50 dataset, comparing the ranks of DrugGCN with those of the other five methods.

Measure | BR | KRR | KRL | MLP | RWEN |
---|---|---|---|---|---|
RMSE | $1.48 \times 10^{-10}$ | $9.44 \times 10^{-38}$ | - | $6.99 \times 10^{-37}$ | $2.77 \times 10^{-37}$ |
PCC | $9.57 \times 10^{-15}$ | $5.03 \times 10^{-28}$ | $2.93 \times 10^{-31}$ | $1.38 \times 10^{-35}$ | $3.89 \times 10^{-35}$ |
SCC | $5.77 \times 10^{-12}$ | $1.07 \times 10^{-25}$ | $1.31 \times 10^{-28}$ | $2.11 \times 10^{-35}$ | $9.28 \times 10^{-35}$ |

**Table 2.** Pathway enrichment test results with 50 genes for refametinib. QueryG is the number of query genes found in the pathway, PathwayG is the total number of genes in the pathway, and % is QueryG as a percentage of the query gene list.

Pathway | QueryG | PathwayG | % | p-Value | FDR |
---|---|---|---|---|---|
hsa05212:Pancreatic cancer | 6 | 65 | 12.2449 | $7.65 \times 10^{-6}$ | $9.11 \times 10^{-4}$ |
hsa05210:Colorectal cancer | 5 | 62 | 10.20408 | $1.38 \times 10^{-4}$ | 0.008185 |
hsa05220:Chronic myeloid leukemia | 5 | 72 | 10.20408 | $2.46 \times 10^{-4}$ | 0.009764 |
hsa05200:Pathways in cancer | 8 | 393 | 16.32653 | 0.001218 | 0.036235 |
hsa04380:Osteoclast differentiation | 5 | 131 | 10.20408 | 0.002345 | 0.054046 |
hsa05214:Glioma | 4 | 65 | 8.163265 | 0.002725 | 0.054046 |
hsa05166:HTLV-I infection | 6 | 254 | 12.2449 | 0.004409 | 0.074948 |
hsa05205:Proteoglycans in cancer | 5 | 200 | 10.20408 | 0.010509 | 0.156321 |
hsa04068:FoxO signaling pathway | 4 | 134 | 8.163265 | 0.019992 | 0.26434 |
hsa04010:MAPK signaling pathway | 5 | 253 | 10.20408 | 0.023081 | 0.264767 |
hsa05223:Non-small cell lung cancer | 3 | 56 | 6.122449 | 0.024474 | 0.264767 |
hsa05206:MicroRNAs in cancer | 5 | 286 | 10.20408 | 0.034222 | 0.32247 |
hsa04917:Prolactin signaling pathway | 3 | 71 | 6.122449 | 0.037938 | 0.32247 |
hsa05218:Melanoma | 3 | 71 | 6.122449 | 0.037938 | 0.32247 |
hsa04062:Chemokine signaling pathway | 4 | 186 | 8.163265 | 0.046265 | 0.367033 |
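
The FDR column of Table 2 is consistent with the Benjamini-Hochberg step-up adjustment applied over 119 pathways; that count is inferred from the listed values rather than reported in the text. A minimal sketch of the adjustment, assuming it is the procedure behind the column (`benjamini_hochberg` is a hypothetical helper):

```python
from typing import List, Optional, Sequence


def benjamini_hochberg(pvalues: Sequence[float], m: Optional[int] = None) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    m is the total number of tests and defaults to len(pvalues). If only the
    smallest p-values are passed with a larger m, the omitted tests' step-up
    terms are ignored, so the result is an upper bound on the full adjustment.
    """
    m = len(pvalues) if m is None else m
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Raw p-values of Table 2 (refametinib), in the table's order.
    p = [7.65e-6, 1.38e-4, 2.46e-4, 0.001218, 0.002345, 0.002725, 0.004409,
         0.010509, 0.019992, 0.023081, 0.024474, 0.034222, 0.037938,
         0.037938, 0.046265]
    for raw, fdr in zip(p, benjamini_hochberg(p, m=119)):
        print(f"{raw:.3g} -> {fdr:.4g}")
```

Because adjusted values are capped at 1, a pathway list with only weakly significant raw p-values, such as Table 3, can end up with an FDR of 1 throughout.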

**Table 3.** Pathway enrichment test results with 50 genes for PLX-4720. QueryG is the number of query genes found in the pathway, PathwayG is the total number of genes in the pathway, and % is QueryG as a percentage of the query gene list.

Pathway | QueryG | PathwayG | % | p-Value | FDR |
---|---|---|---|---|---|
hsa05010:Alzheimer’s disease | 4 | 168 | 8 | 0.027233 | 1 |
hsa05120:Epithelial cell signaling in Helicobacter pylori infection | 3 | 67 | 6 | 0.028047 | 1 |
hsa04912:GnRH signaling pathway | 3 | 91 | 6 | 0.049052 | 1 |


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kim, S.; Bae, S.; Piao, Y.; Jo, K.
Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data. *Mathematics* **2021**, *9*, 772.
https://doi.org/10.3390/math9070772

**AMA Style**

Kim S, Bae S, Piao Y, Jo K.
Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data. *Mathematics*. 2021; 9(7):772.
https://doi.org/10.3390/math9070772

**Chicago/Turabian Style**

Kim, Seonghun, Seockhun Bae, Yinhua Piao, and Kyuri Jo.
2021. "Graph Convolutional Network for Drug Response Prediction Using Gene Expression Data" *Mathematics* 9, no. 7: 772.
https://doi.org/10.3390/math9070772