A Pathway-Based Kernel Boosting Method for Sample Classification Using Genomic Data
Abstract
1. Introduction
2. Materials and Methods
2.1. PKB Model
2.1.1. L1 Penalized Boosting
2.1.2. L2 Penalized Boosting
3. Results
3.1. Simulation Studies
- Model 1:
- Model 2:
- Model 3:
3.2. Real Data Applications
3.2.1. Breast Cancer
3.2.2. Lower Grade Glioma
3.2.3. Melanoma
4. Discussion
Supplementary Materials
Author Contributions
Acknowledgments
Conflicts of Interest
Abbreviations
KEGG | Kyoto Encyclopedia of Genes and Genomes
TCGA | The Cancer Genome Atlas
1. Initialize the target function as an optimal constant.
For t from 0 to T-1 (T is the maximum number of iterations) do:
  2. Calculate the first and second derivatives of the loss.
  3. Optimize the regularized loss function in the base learner space.
  4. Find the step length using steepest descent.
  5. Update the target function.
End For
Return the final target function.
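The boosting loop outlined above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a logistic loss with labels in {-1, +1}, a Gaussian (RBF) kernel per pathway, an L2-type penalty on the kernel coefficients, and a simple grid search for the step length. The function names (`rbf_kernel`, `kernel_boost_sketch`), the parameters `lam` and `nu`, and the toy data are all hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma=None):
    """Gaussian kernel matrix on the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    gamma = 1.0 / X.shape[1] if gamma is None else gamma
    return np.exp(-gamma * d2)

def log_loss(y, F):
    """Logistic loss summed over samples, with y in {-1, +1}."""
    return float(np.sum(np.logaddexp(0.0, -y * F)))

def kernel_boost_sketch(X, y, groups, T=30, lam=0.1, nu=0.1):
    """groups: list of column-index arrays, one per pathway (hypothetical input)."""
    n = len(y)
    kernels = [rbf_kernel(X[:, g]) for g in groups]
    # Step 1: the optimal constant for the logistic loss is the log-odds of the classes.
    F = np.full(n, np.log(np.sum(y == 1) / np.sum(y == -1)))
    step_grid = np.linspace(0.0, 5.0, 101)  # includes 0, so the loss cannot increase
    selected, losses = [], [log_loss(y, F)]
    for _ in range(T):
        # Step 2: first (h) and second (q) derivatives of the loss w.r.t. F.
        s = 0.5 * (1.0 - np.tanh(0.5 * y * F))  # sigmoid(-y * F), numerically stable
        h = -y * s
        q = s * (1.0 - s)
        # Step 3: for each pathway, minimize the second-order approximation
        #   sum_i (h_i f_i + 0.5 q_i f_i^2) + lam * beta' K beta,  with f = K beta,
        # which is solved by (diag(q) K + 2 lam I) beta = -h; keep the best pathway.
        best = None
        for m, K in enumerate(kernels):
            beta = np.linalg.solve(q[:, None] * K + 2.0 * lam * np.eye(n), -h)
            f = K @ beta
            obj = h @ f + 0.5 * q @ f ** 2 + lam * beta @ K @ beta
            if best is None or obj < best[0]:
                best = (obj, m, f)
        _, m, f = best
        # Step 4: step length by grid search on the exact loss along direction f.
        d = step_grid[np.argmin([log_loss(y, F + c * f) for c in step_grid])]
        # Step 5: shrunken update of the target function.
        F = F + nu * d * f
        selected.append(m)
        losses.append(log_loss(y, F))
    return F, selected, losses

# Toy data: three "pathways" of four features each; only the first is informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 12))
y = np.where(X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=80) > 0, 1, -1)
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
F, selected, losses = kernel_boost_sketch(X, y, groups)
```

Because the logistic loss is convex and the step grid contains zero, the training loss recorded in `losses` is non-increasing for any shrinkage `nu` in (0, 1]. The list `selected` records which pathway was chosen at each iteration, which is one simple way to summarize pathway importance in a sketch like this.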
Method | Model 1 (50) | Model 1 (150) | Model 2 (50) | Model 2 (150) | Model 3 (50) | Model 3 (150)
---|---|---|---|---|---|---
PKB-L1 | 0.151 | 0.196 | 0.198 | 0.189 | 0.179 | 0.21
PKB-L2 | 0.158 | 0.185 | 0.201 | 0.183 | 0.157 | 0.173
Random Forest | 0.305 | 0.331 | 0.290 | 0.328 | 0.341 | 0.400
SVM | 0.353 | 0.431 | 0.412 | 0.476 | 0.431 | 0.492
NPR | 0.271 | 0.321 | 0.299 | 0.317 | 0.479 | 0.440
EasyMKL | 0.253 | 0.284 | 0.268 | 0.330 | 0.212 | 0.300
Method | Metabric (Grade) | Glioma (Grade) | Glioma (Site) | Melanoma (Stage) | Melanoma (Met)
---|---|---|---|---|---
PKB-L1 | 0.274 | 0.283 | 0.168 | 0.304 | 0.081
PKB-L2 | 0.304 | 0.283 | 0.154 | 0.307 | 0.083
Random Forest | 0.306 | 0.302 | 0.306 | 0.320 | 0.136
SVM | 0.285 | 0.292 | 0.185 | 0.314 | 0.083
NPR | 0.306 | 0.298 | 0.197 | 0.282 | 0.110
EasyMKL | 0.297 | 0.302 | 0.291 | 0.314 | 0.100
Rank | Metabric (Grade) | Glioma (Grade) | Melanoma (Met)
---|---|---|---
1 | Cell aggregation | Homophilic cell adhesion via plasma membrane adhesion molecules | Lectin induced complement pathway
2 | Sequestering of metal ion | Neuropeptide signaling pathway | Classical complement pathway
3 | Glutathione derivative metabolic process | Multicellular organismal macromolecule metabolic process | Phospholipase C delta in phospholipid associated cell signaling
4 | Antigen processing and presentation of exogenous peptide antigen via MHC class I | Peripheral nervous system neuron differentiation | Fc epsilon receptor I signaling in mast cells
5 | Sterol biosynthetic process | Positive regulation of hair cycle | Inhibition of matrix metalloproteinases
6 | Pyrimidine containing compound salvage | Peptide hormone processing | Regulation of MAP kinase pathways through dual specificity phosphatases
7 | Protein dephosphorylation | Hyaluronan metabolic process | Estrogen responsive protein Efp controls cell cycle and breast tumors growth
8 | Homophilic cell adhesion via plasma membrane adhesion molecules | Positive regulation of synapse maturation | Chaperones modulate interferon signaling pathway
9 | Cyclooxygenase pathway | Stabilization of membrane potential | IL-10 anti-inflammatory signaling pathway
10 | Establishment of protein localization to endoplasmic reticulum | Lymphocyte chemotaxis | Reversal of insulin resistance by leptin
11 | Negative regulation of dephosphorylation | Insulin secretion | Bone remodeling
12 | Xenophagy | Positive regulation of osteoblast proliferation | Cycling of Ran in nucleocytoplasmic transport
13 | Attachment of spindle microtubules to kinetochore | Negative regulation of dephosphorylation | Alternative complement pathway
14 | Fatty acyl CoA metabolic process | Trophoblast giant cell differentiation | Cell cycle: G2/M checkpoint
15 | Apical junction assembly | Synaptonemal complex organization | Hop pathway in cardiac development
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Zeng, L.; Yu, Z.; Zhao, H. A Pathway-Based Kernel Boosting Method for Sample Classification Using Genomic Data. Genes 2019, 10, 670. https://doi.org/10.3390/genes10090670