EpiMOGA: An Epistasis Detection Method Based on a Multi-Objective Genetic Algorithm
Abstract
1. Introduction
2. Materials and Methods
2.1. Simulated Datasets
2.2. Real GWAS Dataset
2.3. Problem Description
2.4. Bayesian Network Scoring and Gini Index
2.5. Pareto Optimal Approach
2.6. EpiMOGA
Algorithm 1. EpiMOGA pseudocode
Input:
- Num: the number of the initial population, a positive integer greater than 1
- X: m × n matrix consisting of 0 and 1, representing the states of m samples at n SNP sites
- Y: 1 × m vector consisting of 0 and 1, representing the state of m samples
- Maxtimes: maximum number of iterations
Output:
- Best: the set of optimal SNP combinations

begin
  for i = 1:Num
    Initialize population_i
    Evaluate population: fitvalue ← Twoobjection(population_i, X, Y)
    while (generation < Maxtimes)
      Selection
      Crossover
      Mutation
      Evaluate population
      Output candidate chromosome
    end
  end
  Merge candidate set
  Evaluate candidate set
  Output Best ← best SNP combinations
end
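The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`epimoga_sketch`, `crossover`, `mutate`), the roulette-wheel selection, the union-based crossover, the single-SNP mutation, and the `pop_size` and `mutation_rate` parameters are all hypothetical choices, since the pseudocode names the steps without specifying them. The fitness is passed in as a callable returning one positive value per chromosome, standing in for Twoobjection().

```python
import random
from typing import Callable, List, Sequence, Tuple

Chromosome = Tuple[int, ...]  # sorted indices of K distinct SNPs
FitnessFn = Callable[[Sequence[Chromosome]], List[float]]


def random_chromosome(n_snps: int, k: int, rng: random.Random) -> Chromosome:
    """Draw K distinct SNP indices uniformly at random."""
    return tuple(sorted(rng.sample(range(n_snps), k)))


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    # Hypothetical operator: draw K SNPs from the union of both parents.
    pool = sorted(set(a) | set(b))
    return tuple(sorted(rng.sample(pool, len(a))))


def mutate(c: Chromosome, n_snps: int, rate: float, rng: random.Random) -> Chromosome:
    # Hypothetical operator: with probability `rate`, swap one SNP for an unused one.
    unused = [s for s in range(n_snps) if s not in c]
    if not unused or rng.random() >= rate:
        return c
    genes = list(c)
    genes[rng.randrange(len(genes))] = rng.choice(unused)
    return tuple(sorted(genes))


def epimoga_sketch(n_snps: int, k: int, num: int, pop_size: int, max_iters: int,
                   fitness_fn: FitnessFn, mutation_rate: float = 0.1,
                   seed: int = 0) -> List[Chromosome]:
    """Run `num` independent populations and return the best merged candidates."""
    rng = random.Random(seed)
    candidates = set()
    for _ in range(num):                                   # for i = 1:Num
        population = [random_chromosome(n_snps, k, rng) for _ in range(pop_size)]
        fitness = fitness_fn(population)                   # evaluate population
        for _ in range(max_iters):                         # while generation < Maxtimes
            # Selection (assumed roulette wheel; fitness values must be positive).
            parents = rng.choices(population, weights=fitness, k=2 * pop_size)
            # Crossover followed by mutation yields the next generation.
            population = [
                mutate(crossover(parents[2 * i], parents[2 * i + 1], rng),
                       n_snps, mutation_rate, rng)
                for i in range(pop_size)
            ]
            fitness = fitness_fn(population)               # evaluate population
            best = max(fitness)
            # Output candidate chromosome(s) of this generation.
            candidates.update(c for c, f in zip(population, fitness) if f == best)
    merged = sorted(candidates)                            # merge candidate set
    if not merged:
        return []
    scores = fitness_fn(merged)                            # evaluate candidate set
    top = max(scores)
    return [c for c, f in zip(merged, scores) if f == top]
```

Calling `epimoga_sketch(n_snps=20, k=2, num=3, pop_size=30, max_iters=20, fitness_fn=f)` with any function `f` that maps a list of chromosomes to positive scores returns the top-scoring SNP combinations found across the three runs. Sorting the merged candidates keeps the result reproducible for a fixed seed.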
2.6.1. Encoding Schemes, Initializing the Population
2.6.2. Genetic Operations
Selection Operation
Crossover Operation
Mutation Operation
2.6.3. Fitness Function
Algorithm 2. Twoobjection() pseudocode
Input:
- X: m × n matrix consisting of 0 and 1, representing the states of m samples at n SNP sites
- Y: 1 × m vector consisting of 0 and 1, representing the state of m samples
- Pop: t × K matrix representing t K-SNP combinations
Output:
- objvalue: 1 × t vector representing the fitness values of the t K-SNP combinations, each a positive integer greater than or equal to 1

begin
  Initialization: objvalue(1:t) = 2
  for i = 1:t
    [objvalue1(i), objvalue2(i)] = TwoScore01(X(:, Pop(i,:)), Y);
  end
  for each i, j = 1:t
    if ((objvalue1(j) < objvalue1(i)) && (objvalue2(j) < objvalue2(i)))
       || ((objvalue1(j) < objvalue1(i)) && (objvalue2(j) == objvalue2(i)))
       || ((objvalue1(j) == objvalue1(i)) && (objvalue2(j) < objvalue2(i)))
      objvalue(j) = objvalue(j) + objvalue(i);
      objvalue(i) = 1;
      break;
    end
  end
end
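The dominance-based fitness assignment in Algorithm 2 can be sketched in Python as below. The sketch takes the two objective scores as given (the scoring function TwoScore01 is not reproduced here) and assumes, as the comparison in the pseudocode implies, that smaller scores are better. It also assumes that "for each i, j" means an outer loop over i and an inner loop over j, with `break` leaving the inner loop. The names `dominates` and `pareto_fitness` are hypothetical.

```python
from typing import List, Sequence, Tuple

Score = Tuple[float, float]  # (objective 1, objective 2); smaller is better


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_fitness(scores: Sequence[Score]) -> List[int]:
    """Assign integer fitness values following the transfer rule of Algorithm 2."""
    fitness = [2] * len(scores)          # every combination starts at 2
    for i in range(len(scores)):
        for j in range(len(scores)):
            if dominates(scores[j], scores[i]):
                fitness[j] += fitness[i]  # the dominating combination absorbs the value
                fitness[i] = 1            # the dominated combination drops to 1
                break                     # assumed: stop at the first dominator found
    return fitness


# Example: (1, 1) dominates both (2, 2) and (2, 1); (0.5, 3) is non-dominated.
print(pareto_fitness([(1.0, 1.0), (2.0, 2.0), (0.5, 3.0), (2.0, 1.0)]))
# prints [6, 1, 2, 1]
```

Because each dominated combination hands its value to the first dominator encountered, the result depends on the order of the rows in Pop. Non-dominated combinations keep a fitness of at least 2, and dominated ones end at 1, which matches the stated output range of integers greater than or equal to 1.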
2.6.4. Screening Candidate Sets
2.7. Evaluation Criteria
3. Results
3.1. Simulation Experiments and Results
3.1.1. Parameter Setting
3.1.2. Simulation Experiment Case 1
3.1.3. Simulation Experiment Case 2
3.1.4. Simulation Experiment Case 3
3.2. Real Experiment and Results on the Alzheimer’s Disease Dataset
4. Discussion
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Method | Power | F-Measure |
---|---|---|
EpiMOGA | 0.90 | 0.9072 |
Epi-GTBN | 0.99 | 0.3004 |

Method | Power | F-Measure |
---|---|---|
EpiMOGA | 0.88 | 0.7273 |
FDHE-IW | 0.39 | 0.7529 |
BOOST | 0.34 | 0.6887 |
Epi-GTBN | 0.98 | 0.3925 |

Order | SNP1 | P1 | SNP2 | P2 | P | SVM |
---|---|---|---|---|---|---|
1 | rs17021105 | 1.52934 × 10⁻⁷ | rs8083566 | 3.48624 × 10⁻⁵ | 1.30237 × 10⁻⁹ | 0.720265781 |
2 | rs7003370 | 1.93263 × 10⁻⁶ | rs264173 | 0.00032777 | 2.73286 × 10⁻⁸ | 0.694629014 |
3 | rs2832810 | 6.48301 × 10⁻⁵ | rs7285350 | 0.040037803 | 2.91614 × 10⁻⁸ | 0.694518272 |
4 | rs852549 | 0.001864809 | rs7358822 | 0.000190734 | 2.06066 × 10⁻⁷ | 0.706256921 |
5 | rs2010668 | 0.00234868 | rs7358822 | 0.000190734 | 2.77707 × 10⁻⁷ | 0.706256921 |
6 | rs2832727 | 0.000632929 | rs7285350 | 0.040037803 | 3.38987 × 10⁻⁷ | 0.701605759 |
7 | rs8083566 | 3.48624 × 10⁻⁵ | rs723259 | 0.01579838 | 3.40313 × 10⁻⁷ | 0.73654485 |
8 | rs10853690 | 3.48624 × 10⁻⁵ | rs723259 | 0.01579838 | 3.40313 × 10⁻⁷ | 0.73654485 |
9 | rs42733 | 0.025710022 | rs7358822 | 0.000190734 | 5.87876 × 10⁻⁷ | 0.706256921 |
10 | rs2832841 | 5.07116 × 10⁻⁵ | rs5933762 | 0.000330143 | 7.53923 × 10⁻⁷ | 0.727131783 |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Chen, Y.; Xu, F.; Pian, C.; Xu, M.; Kong, L.; Fang, J.; Li, Z.; Zhang, L. EpiMOGA: An Epistasis Detection Method Based on a Multi-Objective Genetic Algorithm. Genes 2021, 12, 191. https://doi.org/10.3390/genes12020191