Genetic Risk Assessment of Nonsyndromic Cleft Lip with or without Cleft Palate by Linking Genetic Networks and Deep Learning Models
Abstract
1. Introduction
2. Results
2.1. Genetic Association Analysis for NSCL/P
2.2. Genetic Risk Prediction
2.3. In Silico Functional Analysis
3. Discussion
4. Materials and Methods
4.1. Study Subjects
4.2. SNP Genotyping
4.3. Genetic Association Analysis
4.4. Genetic Risk Prediction
4.4.1. SNP Subset Selection
4.4.2. Polygenic Risk Score
4.4.3. Traditional Machine Learning Algorithms
4.4.4. Artificial Neural Network
4.4.5. Genetic-Algorithm-Optimized Neural Networks Ensemble
4.5. Model Evaluation and Validation
4.6. In Silico Functional Analysis
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
| Gene | Chr. | SNP | NR/R | RAF (%), Case/Control | HWE (p) | NN/NR/RR, Case | NN/NR/RR, Control | OR | 95% CI | p-Value |
|---|---|---|---|---|---|---|---|---|---|---|
| IRF6 | 1q32.3-q41 | rs2235373 a | A/G | 66.8/51.3 | 0.58 | 19/57/67 | 30/56/33 | 1.91 | 1.34–2.72 | 3.5 × 10⁻⁴ c |
| | | rs2235371 a | T/C | 72.4/57.6 | 0.85 | 15/49/79 | 22/57/40 | 1.93 | 1.34–2.78 | 4.4 × 10⁻⁴ c |
| | | rs2013162 b | A/C | 52.8/38.7 | 0.57 | 35/65/43 | 43/60/16 | 1.78 | 1.25–2.52 | 1.5 × 10⁻³ |
| | | rs2235375 b | C/G | 52.8/39.1 | 0.70 | 35/65/43 | 43/59/17 | 1.74 | 1.23–2.47 | 2.1 × 10⁻³ |
| | | rs1044516 | A/C | 55.6/42.0 | 0.46 | 31/65/47 | 42/54/23 | 1.73 | 1.22–2.45 | 2.1 × 10⁻³ |
| | | rs595918 | G/A | 21.5/13.4 | 1.00 | 87/49/6 | 89/28/2 | 1.76 | 1.10–2.81 | 0.02 |
| RUNX2 | 6p21 | rs16873348 | T/C | 35.0/25.6 | 0.81 | 59/68/16 | 65/47/7 | 1.56 | 1.07–2.28 | 0.02 |
| ARNT | 1q21 | rs11204737 | C/T | 50.7/39.9 | 0.45 | 39/63/41 | 45/53/21 | 1.55 | 1.09–2.19 | 0.01 |
| TGFB3 | 14q24 | rs3917192 | G/A | 49.0/39.5 | 0.85 | 35/76/32 | 44/56/19 | 1.47 | 1.04–2.08 | 0.03 |
| | | rs2284791 | G/C | 45.1/35.7 | 0.69 | 44/69/30 | 48/57/14 | 1.48 | 1.04–2.11 | 0.03 |
| MTHFR | 1p36.3 | rs3753582 | G/T | 93.7/88.2 | 0.20 | 2/14/127 | 3/22/94 | 1.99 | 1.07–3.69 | 0.03 |
| TCOF1 | 5q32-q33.1 | rs7715100 | A/G | 9.1/4.2 | 1.00 | 119/22/2 | 109/10/0 | 2.28 | 1.08–4.83 | 0.04 |
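
The RAF, OR, and 95% CI columns of the association table can be related to the genotype counts with a short worked example. The sketch below reads NN/NR/RR as counts of non-risk homozygotes, heterozygotes, and risk homozygotes, and assumes an allelic 2×2 odds ratio with a Woolf (log-scale) confidence interval. That choice of test is an assumption, not a statement of the authors' procedure; the function names are illustrative. Under it, the rs2235373 row yields values that round to the tabulated 66.8/51.3, 1.91, and 1.34–2.72.

```python
import math

def allele_counts(nn, nr, rr):
    """Return (non-risk allele count, risk allele count) from NN/NR/RR genotype counts."""
    return 2 * nn + nr, 2 * rr + nr

def raf_percent(genotypes):
    """Risk allele frequency in percent."""
    n, r = allele_counts(*genotypes)
    return 100.0 * r / (n + r)

def allelic_odds_ratio(case_gt, control_gt, z=1.96):
    """Allelic OR with a Woolf confidence interval (assumed method, see lead-in)."""
    case_n, case_r = allele_counts(*case_gt)
    ctrl_n, ctrl_r = allele_counts(*control_gt)
    odds_ratio = (case_r * ctrl_n) / (case_n * ctrl_r)
    # Standard error of log(OR) from the four cells of the 2x2 allele table
    se = math.sqrt(1 / case_r + 1 / case_n + 1 / ctrl_r + 1 / ctrl_n)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)

# Genotype counts (NN, NR, RR) for rs2235373, taken from the table
case_gt = (19, 57, 67)
control_gt = (30, 56, 33)

odds_ratio, ci_low, ci_high = allelic_odds_ratio(case_gt, control_gt)
print(f"RAF case/control: {raf_percent(case_gt):.1f}/{raf_percent(control_gt):.1f}")
print(f"OR = {odds_ratio:.2f}, 95% CI = {ci_low:.2f}-{ci_high:.2f}")
```

The same two functions can be applied to any other row by substituting its genotype counts; agreement with the tabulated values for other rows is not guaranteed if a different test or adjustment was used.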
| SNP (p-Value) | Model | Train_Acc | Test_Acc | F1 Score | Test_AUC | Model | Train_Acc | Test_Acc | F1 Score | Test_AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 SNPs (<0.01) | PRS | - | 0.584 | - | 0.625 | ANN | 0.580 | 0.597 | 0.615 | 0.690 |
| | RF | 0.585 | 0.500 | 0.500 | 0.502 | I a | 0.588 | 0.537 | - | 0.686 |
| | SVM | 0.585 | 0.597 | 0.570 | 0.554 | 95% CI | 0.570–0.590 | 0.355–0.629 | - | 0.625–0.724 |
| | XGBoost | 0.585 | 0.581 | 0.581 | 0.574 | GANNE b | 0.725 | 0.694 | 0.709 | 0.741 |
| | LR | 0.580 | 0.597 | 0.573 | 0.567 | I a | 0.707 | 0.597 | - | 0.720 |
| | LGBM | 0.585 | 0.581 | 0.581 | 0.574 | 95% CI | 0.650–0.745 | 0.500–0.677 | - | 0.677–0.754 |
| | ADA | 0.585 | 0.500 | 0.502 | 0.502 | - | - | - | - | - |
| 10 SNPs (<0.05) | PRS | - | 0.607 | - | 0.657 | ANN | 0.945 | 0.629 | 0.649 | 0.714 |
| | RF | 0.955 | 0.597 | 0.598 | 0.597 | I a | 0.880 | 0.580 | - | 0.626 |
| | SVM | 0.955 | 0.677 | 0.678 | 0.685 | 95% CI | 0.775–0.910 | 0.484–0.710 | - | 0.495–0.742 |
| | XGBoost | 0.950 | 0.613 | 0.613 | 0.623 | GANNE b | 1.000 | 0.742 | 0.756 | 0.882 |
| | LR | 0.625 | 0.613 | 0.614 | 0.619 | I a | 0.921 | 0.654 | - | 0.752 |
| | LGBM | 0.935 | 0.565 | 0.566 | 0.568 | 95% CI | 0.870–0.955 | 0.500–0.823 | - | 0.667–0.895 |
| | ADA | 0.630 | 0.630 | 0.630 | 0.638 | - | - | - | - | - |
| 16 SNPs (<0.1) | PRS | - | 0.603 | - | 0.679 | ANN | 0.990 | 0.645 | 0.659 | 0.650 |
| | RF | 0.995 | 0.645 | 0.640 | 0.635 | I a | 0.935 | 0.540 | - | 0.570 |
| | SVM | 0.950 | 0.677 | 0.678 | 0.68 | 95% CI | 0.910–0.955 | 0.403–0.661 | - | 0.433–0.719 |
| | XGBoost | 0.995 | 0.629 | 0.630 | 0.625 | GANNE b | 1.000 | 0.694 | 0.709 | 0.744 |
| | LR | 0.695 | 0.565 | 0.565 | 0.559 | I a | 0.931 | 0.652 | - | 0.675 |
| | LGBM | 0.995 | 0.548 | 0.550 | 0.549 | 95% CI | 0.895–0.965 | 0.548–0.774 | - | 0.561–0.759 |
| | ADA | 0.690 | 0.581 | 0.582 | 0.582 | - | - | - | - | - |
| 92 SNPs (All) | PRS | - | 0.676 | - | 0.711 | ANN | 0.980 | 0.613 | 0.633 | 0.595 |
| | RF | 1.000 | 0.565 | 0.566 | 0.592 | I a | 0.865 | 0.449 | - | 0.454 |
| | SVM | 0.975 | 0.613 | 0.615 | 0.615 | 95% CI | 0.409–0.900 | 0.274–0.726 | - | 0.293–0.610 |
| | XGBoost | 1.000 | 0.581 | 0.578 | 0.569 | - | - | - | - | - |
| | LR | 0.855 | 0.516 | 0.518 | 0.512 | - | - | - | - | - |
| | LGBM | 1.000 | 0.565 | 0.563 | 0.555 | - | - | - | - | - |
| | ADA | 0.880 | 0.548 | 0.550 | 0.549 | - | - | - | - | - |
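
For readers unfamiliar with the accuracy, F1, and AUC columns in the model comparison table, the following is a minimal pure-Python sketch of how such metrics are commonly defined for a binary case/control classifier. The labels and scores are hypothetical toy values, not study data. The table does not state which F1 averaging was used, so the positive-class F1 shown here is an assumption, as is the 0.5 decision threshold.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_positive(y_true, y_pred):
    """F1 score for the positive (case) class: harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def roc_auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = case, 0 = control; scores are predicted risk values
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
f1 = f1_positive(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f}, F1={f1:.3f}, AUC={auc:.3f}")
```

Accuracy and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of scores alone, which is one reason the two can disagree for the same model.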
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kang, G.; Baek, S.-H.; Kim, Y.H.; Kim, D.-H.; Park, J.W. Genetic Risk Assessment of Nonsyndromic Cleft Lip with or without Cleft Palate by Linking Genetic Networks and Deep Learning Models. Int. J. Mol. Sci. 2023, 24, 4557. https://doi.org/10.3390/ijms24054557