Machine Learning Methods for Classifying Multiple Sclerosis and Alzheimer’s Disease Using Genomic Data
Abstract
1. Introduction
2. Results
2.1. Performance of the Models
2.2. Comparison of Machine Learning Methods with Polygenic Risk Score
2.3. Implementation of Feature Selection Techniques
2.4. Prioritized Genomic Variants in Multiple Sclerosis
3. Discussion
4. Materials and Methods
4.1. Inclusion and Exclusion Criteria
4.2. Pre-Processing of Genomic Data
4.3. Machine Learning Methods
4.4. Explainability Methods Applied to Machine Learning Models
4.5. Polygenic Risk Score
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Model | Accuracy Mean | Accuracy SD | Specificity Mean | Specificity SD | Sensitivity Mean | Sensitivity SD |
---|---|---|---|---|---|---|
GB | 0.628 | 0.007 | 0.635 | 0.005 | 0.622 | 0.017 |
ET | 0.625 | 0.006 | 0.660 | 0.014 | 0.590 | 0.022 |
RF | 0.612 | 0.008 | 0.657 | 0.011 | 0.567 | 0.022 |
LR | 0.635 | 0.005 | 0.635 | 0.008 | 0.634 | 0.010 |
FFN | 0.629 | 0.014 | 0.599 | 0.059 | 0.660 | 0.075 |
CNN | 0.619 | 0.011 | 0.652 | 0.058 | 0.587 | 0.067 |
Model | Accuracy Mean | Accuracy SD | Specificity Mean | Specificity SD | Sensitivity Mean | Sensitivity SD |
---|---|---|---|---|---|---|
GB | 0.637 | 0.021 | 0.651 | 0.022 | 0.623 | 0.034 |
ET | 0.675 | 0.013 | 0.723 | 0.006 | 0.627 | 0.031 |
RF | 0.681 | 0.011 | 0.723 | 0.007 | 0.639 | 0.026 |
LR | 0.674 | 0.010 | 0.693 | 0.005 | 0.655 | 0.024 |
FFN | 0.645 | 0.018 | 0.693 | 0.068 | 0.598 | 0.072 |
CNN | 0.629 | 0.024 | 0.665 | 0.043 | 0.594 | 0.034 |
Model | RR Mean 99th Percentile | RR SD 99th Percentile | OR Mean 99th Percentile | OR SD 99th Percentile | RR Mean ML Classification | RR SD ML Classification | OR Mean ML Classification | OR SD ML Classification |
---|---|---|---|---|---|---|---|---|
GB | 3.598 | 1.575 | 3.901 | 1.822 | 2.790 | 0.162 | 2.868 | 0.170 |
ET | 3.886 | 1.653 | 4.247 | 2.004 | 2.719 | 0.119 | 2.795 | 0.125 |
RF | 3.341 | 1.687 | 3.610 | 1.915 | 2.456 | 0.159 | 2.517 | 0.168 |
LR | 5.509 | 1.390 | 6.232 | 1.782 | 2.935 | 0.119 | 3.021 | 0.126 |
FFN | 4.107 | 0.630 | 4.455 | 0.752 | 2.908 | 0.426 | 2.988 | 0.444 |
CNN | 2.827 | 1.060 | 2.984 | 1.219 | 2.641 | 0.269 | 2.712 | 0.284 |
PRS | 4.002 | 0.729 | 4.333 | 0.864 | --- | --- | --- | --- |
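
As a rough cross-check of the "OR Mean ML Classification" column, the sketch below computes a diagnostic odds ratio from the mean sensitivity and specificity reported in the first performance table. It assumes that both tables describe the same models on the same classification task, which the extracted text does not state, and the function name `diagnostic_odds_ratio` is illustrative only. Because an odds ratio of averaged rates is not the same as an average of per-fold odds ratios, only approximate agreement should be expected.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive call among cases divided by the odds among controls."""
    return (sensitivity * specificity) / ((1.0 - sensitivity) * (1.0 - specificity))


# model -> (mean sensitivity, mean specificity, reported "OR Mean ML Classification")
# Values are copied from the tables; pairing the two tables is an assumption.
rows = {
    "GB": (0.622, 0.635, 2.868),
    "ET": (0.590, 0.660, 2.795),
    "RF": (0.567, 0.657, 2.517),
    "LR": (0.634, 0.635, 3.021),
}

estimates = {
    model: diagnostic_odds_ratio(sens, spec)
    for model, (sens, spec, _reported) in rows.items()
}

for model, (_sens, _spec, reported) in rows.items():
    print(f"{model}: from mean rates {estimates[model]:.3f}, reported {reported:.3f}")
```

For these four models the two figures differ by roughly 0.01 or less. The relative risk columns cannot be recovered the same way, since a relative risk also depends on the case fraction used, which is not given here.
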
Model | RR Mean 99th Percentile | RR SD 99th Percentile | OR Mean 99th Percentile | OR SD 99th Percentile | RR Mean ML Classification | RR SD ML Classification | OR Mean ML Classification | OR SD ML Classification |
---|---|---|---|---|---|---|---|---|
GB | 4.313 | 0.823 | 4.795 | 1.037 | 3.013 | 0.564 | 3.128 | 0.609 |
ET | 6.643 | 1.315 | 8.031 | 1.914 | 4.183 | 0.426 | 4.408 | 0.462 |
RF | 6.837 | 0.719 | 8.273 | 1.044 | 4.391 | 0.415 | 4.636 | 0.453 |
LR | 6.882 | 0.651 | 8.336 | 0.969 | 4.094 | 0.368 | 4.300 | 0.398 |
FFN | 5.906 | 1.696 | 7.011 | 2.528 | 3.350 | 0.611 | 3.502 | 0.673 |
CNN | 4.788 | 1.798 | 5.473 | 2.487 | 2.862 | 0.608 | 2.971 | 0.661 |
PRS | 5.683 | 1.098 | 6.644 | 1.523 | --- | --- | --- | --- |
Model | % Samples with the Same Prediction in Original, RFE and RFECV | Method | Features Fold 1 | Features Fold 2 | Features Fold 3 | Features Fold 4 | Features Fold 5 | n Features 1 Time | n Features 2 Times | n Features 3 Times | n Features 4 Times | n Features 5 Times |
---|---|---|---|---|---|---|---|---|---|---|---|---|
GB | 78.31 | RFE | 150 | 250 | 250 | 250 | 200 | 54 | 33 | 40 | 65 | 120 |
GB | 78.31 | RFECV | 314 | 138 | 252 | 321 | 341 | 19 | 29 | 56 | 124 | 125 |
ET | 75.76 | RFE | 200 | 200 | 150 | 50 | 200 | 42 | 40 | 52 | 88 | 34 |
ET | 75.76 | RFECV | 86 | 316 | 274 | 175 | 277 | 33 | 29 | 87 | 109 | 68 |
RF | 74.10 | RFE | 100 | 250 | 100 | 100 | 200 | 62 | 66 | 38 | 28 | 66 |
RF | 74.10 | RFECV | 225 | 103 | 216 | 79 | 251 | 30 | 49 | 85 | 44 | 63 |
LR | 88.35 | RFE | 200 | 200 | 20 | 250 | 50 | 89 | 127 | 76 | 21 | 13 |
LR | 88.35 | RFECV | 205 | 140 | 25 | 231 | 39 | 102 | 128 | 50 | 18 | 12 |
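
The "n Features k Times" columns are consistent with counting, for each feature, the number of cross-validation folds in which it was selected. The sketch below shows one way such a tally could be computed. The function name `selection_frequency` and the variant names in the toy example are hypothetical, and this is not presented as the authors' code. The last few lines check the GB / RFE row of the table above: every selection is counted once per fold, so the per-fold feature counts and the weighted "k Times" counts must add up to the same total.

```python
from collections import Counter


def selection_frequency(fold_features):
    """Return a Counter mapping k to the number of features chosen in exactly k folds."""
    folds_per_feature = Counter()
    for selected in fold_features:
        # set() guards against a feature being listed twice within one fold
        folds_per_feature.update(set(selected))
    return Counter(folds_per_feature.values())


# Toy example with hypothetical variant names (not taken from the study)
toy_folds = [
    {"snp_a", "snp_b", "snp_c"},
    {"snp_a", "snp_c"},
    {"snp_a", "snp_d"},
]
toy_freq = selection_frequency(toy_folds)
print(dict(toy_freq))

# Consistency check on the GB / RFE row of the table above
fold_sizes = [150, 250, 250, 250, 200]              # Features Fold 1..5
n_times = {1: 54, 2: 33, 3: 40, 4: 65, 5: 120}      # n Features k Times
total_selections = sum(k * n for k, n in n_times.items())
unique_features = sum(n_times.values())
print(total_selections, sum(fold_sizes), unique_features)
```

The same identity holds for the other rows of this table, which supports reading the columns as "selected in exactly k of the five folds".
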
Model | % Samples with the Same Prediction in Original, RFE and RFECV | Method | Features Fold 1 | Features Fold 2 | Features Fold 3 | Features Fold 4 | Features Fold 5 | n Features 1 Time | n Features 2 Times | n Features 3 Times | n Features 4 Times | n Features 5 Times |
---|---|---|---|---|---|---|---|---|---|---|---|---|
GB | 74.68 | RFE | 5 | 150 | 5 | 5 | 5 | 140 | 9 | 1 | 1 | 1 |
GB | 74.68 | RFECV | 1 | 29 | 1 | 1 | 1 | 28 | 0 | 0 | 0 | 1 |
ET | 93.02 | RFE | 150 | 5 | 100 | 100 | 5 | 34 | 35 | 75 | 4 | 3 |
ET | 93.02 | RFECV | 32 | 1 | 1 | 2 | 1 | 30 | 1 | 0 | 0 | 1 |
RF | 94.38 | RFE | 150 | 5 | 100 | 100 | 100 | 32 | 16 | 22 | 75 | 5 |
RF | 94.38 | RFECV | 139 | 3 | 130 | 124 | 99 | 7 | 11 | 29 | 91 | 3 |
LR | 92.23 | RFE | 20 | 5 | 20 | 50 | 20 | 29 | 21 | 10 | 1 | 2 |
LR | 92.23 | RFECV | 7 | 1 | 7 | 51 | 3 | 44 | 4 | 4 | 0 | 1 |
dbSNP ID | Gene | Chromosome | LR Rank | GB Rank | ET Rank | RF Rank | FFN Rank | CNN Rank | Rank by Sum of Ranks |
---|---|---|---|---|---|---|---|---|---|
HLA-A*02:01 | HLA-A | chr6 | 1 | 9 | 8 | 8 | 12 | 46 | 1 |
rs2255214 | ILDR1 | chr3 | 10 | 17 | 15 | 13 | 1 | 56 | 2 |
rs7665090 | MANBA | chr4 | 16 | 5 | 112 | 18 | 32 | 54 | 3 |
rs180515 | RPS6KB1 | chr17 | 27 | 12 | 29 | 58 | 29 | 92 | 4 |
rs1800693 | TNFRSF1A | chr12 | 35 | 93 | 36 | 29 | 3 | 68 | 5 |
rs2248359 | CYP24A1 | chr20 | 18 | 6 | 126 | 64 | 4 | 49 | 6 |
rs11586238 | | chr1 | 4 | 82 | 35 | 54 | 48 | 127 | 7 |
rs2283792 | MAPK1 | chr22 | 78 | 48 | 113 | 99 | 23 | 9 | 8 |
rs7200786 | CLEC16A | chr16 | 17 | 27 | 17 | 30 | 42 | 239 | 9 |
rs4285028 | SLC15A2 | chr3 | 22 | 13 | 144 | 34 | 6 | 156 | 10 |
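
The final column of the table above runs from 1 to 10, which is consistent with it giving each variant's position after sorting by the sum of its six per-model ranks. The sketch below recomputes those sums from the table rows (columns in the order LR, GB, ET, RF, FFN, CNN). It assumes a plain unweighted sum with lower totals ranked first; whether the authors aggregated in exactly this way is not stated in the extracted text.

```python
# Per-model ranks copied from the table, in the order LR, GB, ET, RF, FFN, CNN
ranks = {
    "HLA-A*02:01": [1, 9, 8, 8, 12, 46],
    "rs2255214": [10, 17, 15, 13, 1, 56],
    "rs7665090": [16, 5, 112, 18, 32, 54],
    "rs180515": [27, 12, 29, 58, 29, 92],
    "rs1800693": [35, 93, 36, 29, 3, 68],
    "rs2248359": [18, 6, 126, 64, 4, 49],
    "rs11586238": [4, 82, 35, 54, 48, 127],
    "rs2283792": [78, 48, 113, 99, 23, 9],
    "rs7200786": [17, 27, 17, 30, 42, 239],
    "rs4285028": [22, 13, 144, 34, 6, 156],
}

# Unweighted sum of ranks per variant (assumed aggregation rule)
rank_sums = {variant: sum(model_ranks) for variant, model_ranks in ranks.items()}

# Lower sum means the variant was ranked highly by more models
ordered = sorted(rank_sums, key=rank_sums.get)

for position, variant in enumerate(ordered, start=1):
    print(position, variant, rank_sums[variant])
```

Under this assumption the recomputed ordering matches the row order of the table, with HLA-A*02:01 first.
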
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Arnal Segura, M.; Bini, G.; Krithara, A.; Paliouras, G.; Tartaglia, G.G. Machine Learning Methods for Classifying Multiple Sclerosis and Alzheimer’s Disease Using Genomic Data. Int. J. Mol. Sci. 2025, 26, 2085. https://doi.org/10.3390/ijms26052085