A Novel Approach to Dual Feature Selection of Atrial Fibrillation Based on HC-MFS
Abstract
1. Introduction
2. Data Sources and Methods
2.1. Data Sources
2.2. Statistical Analysis
2.3. Dual Feature-Selection Method Based on Hierarchical Clustering and Mean Fisher Score (HC-MFS)
3. Empirical Analysis
3.1. Parameter Settings
3.2. Clustering Results
3.3. Model Evaluation and Comparison
4. Discussion and Suggestions
4.1. Feature Analysis
4.1.1. Season Correlation Analysis
4.1.2. Serum Total Cholesterol (TC) and Low-Density Lipoprotein Correlation Analysis
4.1.3. Platelet Count Correlation Analysis
4.1.4. C-Reactive Protein (CRP) Correlation Analysis
4.2. Suggestions
4.2.1. Suggestions to Patients
4.2.2. Suggestions for Hospitals
5. Conclusions
6. Limitations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Methods | Model | Datasets | Accuracy | Precision | Recall | F1 | AUC |
---|---|---|---|---|---|---|---|
HC-MFS | KNN | IMV | 0.5941 | 0.5840 | 0.6481 | 0.6138 | 0.6095 |
HC-MFS | KNN | IMV+SS | 0.6938 | 0.6647 | 0.7815 | 0.7171 | 0.7316 |
HC-MFS | KNN | IMV+OR | 0.5941 | 0.5843 | 0.6481 | 0.6134 | 0.6060 |
HC-MFS | KNN | IMV+OR+SS | 0.6974 | 0.6822 | 0.7407 | 0.7076 | 0.7323 |
HC-MFS | RF | IMV | 0.7619 | 0.7344 | 0.8180 | 0.7731 | 0.8298 |
HC-MFS | RF | IMV+SS | 0.7619 | 0.7344 | 0.8180 | 0.7731 | 0.8299 |
HC-MFS | RF | IMV+OR | 0.7270 | 0.7163 | 0.7474 | 0.7310 | 0.8055 |
HC-MFS | RF | IMV+OR+SS | 0.7270 | 0.7163 | 0.7474 | 0.7310 | 0.8052 |
HC-MFS | SVM | IMV | 0.7103 | 0.7015 | 0.7182 | 0.7072 | 0.7699 |
HC-MFS | SVM | IMV+SS | 0.6973 | 0.6858 | 0.7333 | 0.7051 | 0.7687 |
HC-MFS | SVM | IMV+OR | 0.6899 | 0.6799 | 0.7298 | 0.7035 | 0.7248 |
HC-MFS | SVM | IMV+OR+SS | 0.6938 | 0.6823 | 0.7382 | 0.7079 | 0.7444 |
HC-MFS | NB | IMV | 0.7011 | 0.6938 | 0.7229 | 0.7065 | 0.7534 |
HC-MFS | NB | IMV+SS | 0.7252 | 0.7016 | 0.7905 | 0.7430 | 0.7516 |
HC-MFS | NB | IMV+OR | 0.6863 | 0.6837 | 0.6934 | 0.6875 | 0.7367 |
HC-MFS | NB | IMV+OR+SS | 0.6900 | 0.6795 | 0.7111 | 0.6946 | 0.7427 |
HC-MFS | LR | IMV | 0.6808 | 0.6730 | 0.7000 | 0.6852 | 0.7369 |
HC-MFS | LR | IMV+SS | 0.6716 | 0.6687 | 0.6778 | 0.6711 | 0.7288 |
HC-MFS | LR | IMV+OR | 0.6679 | 0.6695 | 0.6667 | 0.6653 | 0.7191 |
HC-MFS | LR | IMV+OR+SS | 0.6679 | 0.6599 | 0.6926 | 0.6747 | 0.7320 |
FS | KNN | IMV | 0.5720 | 0.5765 | 0.5259 | 0.5496 | 0.5729 |
FS | KNN | IMV+SS | 0.6771 | 0.6616 | 0.7222 | 0.6897 | 0.7214 |
FS | KNN | IMV+OR | 0.5721 | 0.5673 | 0.5963 | 0.5795 | 0.5934 |
FS | KNN | IMV+OR+SS | 0.6697 | 0.6528 | 0.7259 | 0.6870 | 0.7034 |
FS | RF | IMV | 0.7454 | 0.7405 | 0.7512 | 0.7455 | 0.8198 |
FS | RF | IMV+SS | 0.7454 | 0.7405 | 0.7512 | 0.7455 | 0.8198 |
FS | RF | IMV+OR | 0.7437 | 0.7310 | 0.7697 | 0.7491 | 0.8103 |
FS | RF | IMV+OR+SS | 0.7437 | 0.7310 | 0.7697 | 0.7491 | 0.8103 |
FS | SVM | IMV | 0.6624 | 0.6527 | 0.6914 | 0.6702 | 0.7264 |
FS | SVM | IMV+SS | 0.6806 | 0.6710 | 0.7325 | 0.7001 | 0.7392 |
FS | SVM | IMV+OR | 0.6659 | 0.6602 | 0.6628 | 0.6608 | 0.7207 |
FS | SVM | IMV+OR+SS | 0.6605 | 0.6505 | 0.6704 | 0.6585 | 0.7394 |
FS | NB | IMV | 0.6882 | 0.6699 | 0.7481 | 0.7057 | 0.7433 |
FS | NB | IMV+SS | 0.6754 | 0.6587 | 0.7411 | 0.6967 | 0.7424 |
FS | NB | IMV+OR | 0.6660 | 0.6693 | 0.6850 | 0.6760 | 0.7348 |
FS | NB | IMV+OR+SS | 0.6403 | 0.6444 | 0.6742 | 0.6560 | 0.7185 |
FS | LR | IMV | 0.6753 | 0.6706 | 0.6852 | 0.6770 | 0.7348 |
FS | LR | IMV+SS | 0.6753 | 0.6690 | 0.6889 | 0.6778 | 0.7358 |
FS | LR | IMV+OR | 0.6605 | 0.6600 | 0.6593 | 0.6591 | 0.7204 |
FS | LR | IMV+OR+SS | 0.6513 | 0.6516 | 0.6481 | 0.6489 | 0.7230 |
R_F | KNN | IMV | 0.5869 | 0.5792 | 0.6296 | 0.6030 | 0.6066 |
R_F | KNN | IMV+SS | 0.6715 | 0.6451 | 0.7741 | 0.7022 | 0.7032 |
R_F | KNN | IMV+OR | 0.5848 | 0.5728 | 0.6481 | 0.6077 | 0.6047 |
R_F | KNN | IMV+OR+SS | 0.6531 | 0.6235 | 0.7741 | 0.6888 | 0.6888 |
R_F | RF | IMV | 0.7030 | 0.6832 | 0.7510 | 0.7149 | 0.7849 |
R_F | RF | IMV+SS | 0.7030 | 0.6785 | 0.7658 | 0.7192 | 0.7920 |
R_F | RF | IMV+OR | 0.6975 | 0.6724 | 0.7658 | 0.7156 | 0.7847 |
R_F | RF | IMV+OR+SS | 0.6975 | 0.6724 | 0.7658 | 0.7156 | 0.7839 |
R_F | SVM | IMV | 0.6733 | 0.6691 | 0.6852 | 0.6767 | 0.7209 |
R_F | SVM | IMV+SS | 0.6644 | 0.6493 | 0.6640 | 0.6542 | 0.7171 |
R_F | SVM | IMV+OR | 0.6604 | 0.6733 | 0.6741 | 0.6726 | 0.7081 |
R_F | SVM | IMV+OR+SS | 0.6771 | 0.6739 | 0.6962 | 0.6841 | 0.7209 |
R_F | NB | IMV | 0.6973 | 0.6775 | 0.7621 | 0.7162 | 0.7354 |
R_F | NB | IMV+SS | 0.6974 | 0.6667 | 0.7645 | 0.7112 | 0.7413 |
R_F | NB | IMV+OR | 0.6882 | 0.6782 | 0.7054 | 0.6908 | 0.7426 |
R_F | NB | IMV+OR+SS | 0.6898 | 0.6856 | 0.7148 | 0.6983 | 0.7413 |
R_F | LR | IMV | 0.6808 | 0.6817 | 0.6852 | 0.6818 | 0.7319 |
R_F | LR | IMV+SS | 0.6790 | 0.6812 | 0.6815 | 0.6795 | 0.7330 |
R_F | LR | IMV+OR | 0.6624 | 0.6620 | 0.6667 | 0.6621 | 0.7248 |
R_F | LR | IMV+OR+SS | 0.6641 | 0.6616 | 0.6778 | 0.6688 | 0.7154 |
MI | KNN | IMV | 0.5665 | 0.5572 | 0.6444 | 0.5972 | 0.5948 |
MI | KNN | IMV+SS | 0.6918 | 0.6717 | 0.7481 | 0.7069 | 0.7291 |
MI | KNN | IMV+OR | 0.5849 | 0.5797 | 0.6074 | 0.5929 | 0.5926 |
MI | KNN | IMV+OR+SS | 0.6715 | 0.6563 | 0.7222 | 0.6866 | 0.7174 |
MI | RF | IMV | 0.7436 | 0.7295 | 0.7697 | 0.7484 | 0.8226 |
MI | RF | IMV+SS | 0.7436 | 0.7295 | 0.7697 | 0.7484 | 0.8228 |
MI | RF | IMV+OR | 0.7288 | 0.7331 | 0.7141 | 0.7226 | 0.8138 |
MI | RF | IMV+OR+SS | 0.7306 | 0.6984 | 0.8068 | 0.7484 | 0.8203 |
MI | SVM | IMV | 0.7048 | 0.6844 | 0.7260 | 0.7043 | 0.7608 |
MI | SVM | IMV+SS | 0.6476 | 0.6365 | 0.6763 | 0.6555 | 0.7131 |
MI | SVM | IMV+OR | 0.6827 | 0.6777 | 0.7212 | 0.6987 | 0.7444 |
MI | SVM | IMV+OR+SS | 0.6475 | 0.6397 | 0.6778 | 0.6566 | 0.7086 |
MI | NB | IMV | 0.6935 | 0.6797 | 0.7508 | 0.7122 | 0.7360 |
MI | NB | IMV+SS | 0.6734 | 0.6406 | 0.7355 | 0.6847 | 0.7320 |
MI | NB | IMV+OR | 0.6752 | 0.6976 | 0.6618 | 0.6750 | 0.7423 |
MI | NB | IMV+OR+SS | 0.6624 | 0.6606 | 0.6594 | 0.6584 | 0.7219 |
MI | LR | IMV | 0.6587 | 0.6565 | 0.6741 | 0.6639 | 0.7048 |
MI | LR | IMV+SS | 0.6661 | 0.6688 | 0.6630 | 0.6644 | 0.7296 |
MI | LR | IMV+OR | 0.6439 | 0.6431 | 0.6481 | 0.6443 | 0.6991 |
MI | LR | IMV+OR+SS | 0.6457 | 0.6423 | 0.6556 | 0.6481 | 0.6981 |
DCFS | KNN | IMV | 0.5721 | 0.5610 | 0.6444 | 0.5991 | 0.6100 |
DCFS | KNN | IMV+SS | 0.6863 | 0.6630 | 0.7593 | 0.7068 | 0.7374 |
DCFS | KNN | IMV+OR | 0.5720 | 0.5655 | 0.6074 | 0.5851 | 0.5749 |
DCFS | KNN | IMV+OR+SS | 0.6771 | 0.6621 | 0.7222 | 0.6904 | 0.7247 |
DCFS | RF | IMV | 0.7233 | 0.7166 | 0.7326 | 0.7238 | 0.8093 |
DCFS | RF | IMV+SS | 0.7196 | 0.7129 | 0.7289 | 0.7199 | 0.8074 |
DCFS | RF | IMV+OR | 0.7288 | 0.7183 | 0.7474 | 0.7319 | 0.8085 |
DCFS | RF | IMV+OR+SS | 0.7288 | 0.7183 | 0.7474 | 0.7319 | 0.8087 |
DCFS | SVM | IMV | 0.6643 | 0.6579 | 0.6963 | 0.6760 | 0.7286 |
DCFS | SVM | IMV+SS | 0.6716 | 0.6573 | 0.7017 | 0.6769 | 0.7455 |
DCFS | SVM | IMV+OR | 0.6679 | 0.6502 | 0.6707 | 0.6594 | 0.7246 |
DCFS | SVM | IMV+OR+SS | 0.6753 | 0.6699 | 0.6827 | 0.6757 | 0.7172 |
DCFS | NB | IMV | 0.6936 | 0.6738 | 0.7665 | 0.7165 | 0.7473 |
DCFS | NB | IMV+SS | 0.7048 | 0.6830 | 0.7636 | 0.7205 | 0.7484 |
DCFS | NB | IMV+OR | 0.6882 | 0.6935 | 0.6909 | 0.6917 | 0.7490 |
DCFS | NB | IMV+OR+SS | 0.6623 | 0.6644 | 0.6665 | 0.6652 | 0.7403 |
DCFS | LR | IMV | 0.6790 | 0.6772 | 0.6852 | 0.6796 | 0.7373 |
DCFS | LR | IMV+SS | 0.6808 | 0.6768 | 0.6889 | 0.6817 | 0.7376 |
DCFS | LR | IMV+OR | 0.6550 | 0.6577 | 0.6481 | 0.6511 | 0.7088 |
DCFS | LR | IMV+OR+SS | 0.6568 | 0.6541 | 0.6630 | 0.6578 | 0.7226 |
Features | Non-AF Group (n = 339) | AF Group (n = 339) | t/χ² | p-Value |
---|---|---|---|---|
Age/years | 76.53 ± 11.16 | 79.48 ± 9.45 | −3.717 | <0.001 |
CO/ppm | 14.94 ± 4.44 | 13.82 ± 4.76 | 3.187 | 0.002 |
Minimum Temperature/°C | 13.87 ± 9.45 | 11.78 ± 9.49 | 2.874 | 0.004 |
Seasons | | | 36.091 | <0.001 |
Spring/n | 68 (20.1%) | 112 (33.0%) | | |
Summer/n | 65 (19.2%) | 99 (29.2%) | | |
Autumn/n | 95 (28.0%) | 61 (18.0%) | | |
Winter/n | 111 (32.7%) | 67 (19.8%) | | |
CRP/(mg·L⁻¹) | 27.74 ± 55.00 | 9.52 ± 12.03 | 2.950 | 0.003 |
Platelets/(L⁻¹) | 194.91 ± 76.04 | 169.49 ± 56.80 | 4.168 | <0.001 |
Platelet Distribution Width/(%) | 12.43 ± 2.37 | 13.31 ± 2.88 | −4.328 | <0.001 |
Large Platelet Ratio/(%) | 29.90 ± 8.33 | 31.86 ± 8.11 | −3.709 | <0.001 |
Platelet Crit/(%) | 0.21 ± 0.08 | 0.19 ± 0.69 | 3.121 | 0.0019 |
Mean Platelet Volume/(fL) | 10.63 ± 1.09 | 10.91 ± 1.11 | −4.009 | <0.001 |
LDL/(mmol·L⁻¹) | 2.38 ± 0.99 | 2.04 ± 0.79 | 4.600 | <0.001 |
Uric Acid/(μmol·L⁻¹) | 0.34 ± 0.13 | 0.38 ± 0.13 | −5.352 | <0.001 |
TC/(mmol·L⁻¹) | 4.38 ± 1.26 | 3.90 ± 1.03 | 4.534 | <0.001 |
PM10/(μg/m³) | 31.97 ± 16.75 | 34.99 ± 17.35 | −2.306 | 0.021 |
NO₂/(ppbv) | 22.14 ± 12.54 | 24.24 ± 14.69 | −2.002 | 0.046 |
Diabetes | | | 2.446 | 0.015 |
Diabetes/n | 129 (38.1%) | 99 (29.2%) | | |
No Diabetes/n | 210 (61.9%) | 240 (70.8%) | | |
Hypertension | | | −2.664 | 0.024 |
Hypertension/n | 236 (69.6%) | 262 (77.3%) | | |
No Hypertension/n | 103 (30.4%) | 77 (22.7%) | | |
Erythrocyte Distribution Width/(fL) | 45.13 ± 6.12 | 46.14 ± 6.46 | −2.072 | 0.039 |
Erythrocyte Pressure/(L/L) | 37.38 ± 6.92 | 38.37 ± 6.30 | −1.959 | 0.049 |
High-Density Lipoprotein (HDL) | 1.61 ± 0.65 | 1.51 ± 0.46 | 2.168 | 0.031 |
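The χ² value reported for Seasons can be reproduced from the season counts in the table. The following is a minimal sketch of a Pearson chi-square computation in plain Python; the function name `chi_square_statistic` is illustrative and not taken from the article, and the article does not state which software was used for its tests.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and season.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Season counts (spring, summer, autumn, winter) copied from the table above.
non_af = [68, 65, 95, 111]
af = [112, 99, 61, 67]

chi2 = chi_square_statistic([non_af, af])
print(round(chi2, 3))  # 36.091, matching the reported value
```

With 2 groups and 4 seasons the statistic has (2 − 1) × (4 − 1) = 3 degrees of freedom.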
Abbreviation | Explanation |
---|---|
HC-MFS | Dual feature-selection method combining hierarchical clustering with the mean Fisher score |
FS | Fisher score feature-selection method |
R_F | Relief_F feature-selection method |
MI | Mutual information feature-selection method |
DCFS | Distance Correlation Factor feature-selection method |
KNN | Classification model: K-Nearest Neighbors |
RF | Classification model: Random Forest |
SVM | Classification model: Support Vector Machine |
NB | Classification model: Naive Bayes |
LR | Classification model: Logistic Regression |
IMV | The dataset is processed for missing values only. |
IMV+SS | The dataset is processed for missing values and then standardized. |
IMV+OR | The dataset is processed for missing values and outliers. |
IMV+OR+SS | The dataset is processed for missing values and outliers and then standardized. |
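The four dataset variants differ only in which preprocessing steps are chained. The sketch below illustrates that composition on a single numeric feature. The concrete rules are assumptions, since this section does not specify them: missing values are filled with the mean, outliers are clipped to the 1.5 × IQR fences, and standardization is a z-score. All function names are hypothetical.

```python
import statistics


def impute_mean(values):
    """Replace None entries with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Z-score standardization: zero mean, unit population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Each variant is an ordered list of steps, mirroring the abbreviations above.
VARIANTS = {
    "IMV": [impute_mean],
    "IMV+SS": [impute_mean, standardize],
    "IMV+OR": [impute_mean, clip_iqr],
    "IMV+OR+SS": [impute_mean, clip_iqr, standardize],
}


def preprocess(values, variant):
    """Apply the steps of the named variant in order."""
    for step in VARIANTS[variant]:
        values = step(values)
    return values


raw = [1.0, 2.0, None, 4.0, 100.0]  # toy feature column, not study data
for name in VARIANTS:
    print(name, [round(v, 3) for v in preprocess(raw, name)])
```

In a real pipeline the imputation, clipping, and scaling statistics would be estimated on the training split only and then applied to the test split, to avoid leakage.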
Methods | Model | Parameter | Parameter Value | Datasets |
---|---|---|---|---|
HC-MFS | KNN | n_neighbors | 19 | IMV+OR+SS |
HC-MFS | KNN | p | 1 | IMV+OR+SS |
HC-MFS | KNN | weights | distance | IMV+OR+SS |
HC-MFS | RF | max_depth | 16 | IMV+SS |
HC-MFS | RF | random_state | 2 | IMV+SS |
HC-MFS | RF | n_estimators | 300 | IMV+SS |
HC-MFS | SVM | C | 1 | IMV |
HC-MFS | SVM | kernel | linear | IMV |
HC-MFS | NB | / | GaussianNB | IMV+SS |
HC-MFS | LR | solver | liblinear | IMV |
HC-MFS | LR | penalty | L1 | IMV |
HC-MFS | LR | C | 10 | IMV |
FS | KNN | n_neighbors | 30 | IMV+SS |
FS | KNN | p | / | IMV+SS |
FS | KNN | weights | uniform | IMV+SS |
FS | RF | max_depth | 5 | IMV+SS |
FS | RF | random_state | 2 | IMV+SS |
FS | RF | n_estimators | 300 | IMV+SS |
FS | SVM | C | 1 | IMV+SS |
FS | SVM | kernel | linear | IMV+SS |
FS | NB | / | GaussianNB | IMV |
FS | LR | solver | liblinear | IMV+SS |
FS | LR | penalty | L2 | IMV+SS |
FS | LR | C | 1 | IMV+SS |
R_F | KNN | n_neighbors | 18 | IMV+SS |
R_F | KNN | p | 1 | IMV+SS |
R_F | KNN | weights | distance | IMV+SS |
R_F | RF | max_depth | 3 | IMV+SS |
R_F | RF | random_state | 1 | IMV+SS |
R_F | RF | n_estimators | 100 | IMV+SS |
R_F | SVM | C | 1 | IMV+OR+SS |
R_F | SVM | kernel | linear | IMV+OR+SS |
R_F | NB | / | GaussianNB | IMV+OR+SS |
R_F | LR | solver | liblinear | IMV |
R_F | LR | penalty | L1 | IMV |
R_F | LR | C | 10 | IMV |
MI | KNN | n_neighbors | 20 | IMV+SS |
MI | KNN | p | 1 | IMV+SS |
MI | KNN | weights | distance | IMV+SS |
MI | RF | max_depth | 19 | IMV+SS |
MI | RF | random_state | 1 | IMV+SS |
MI | RF | n_estimators | 200 | IMV+SS |
MI | SVM | C | 1 | IMV |
MI | SVM | kernel | linear | IMV |
MI | NB | / | GaussianNB | IMV |
MI | LR | solver | liblinear | IMV+SS |
MI | LR | penalty | L2 | IMV+SS |
MI | LR | C | 0.1 | IMV+SS |
DCFS | KNN | n_neighbors | 28 | IMV+SS |
DCFS | KNN | p | / | IMV+SS |
DCFS | KNN | weights | uniform | IMV+SS |
DCFS | RF | max_depth | 4 | IMV+OR+SS |
DCFS | RF | random_state | 2 | IMV+OR+SS |
DCFS | RF | n_estimators | 100 | IMV+OR+SS |
DCFS | SVM | C | 1 | IMV+SS |
DCFS | SVM | kernel | linear | IMV+SS |
DCFS | NB | / | GaussianNB | IMV+SS |
DCFS | LR | solver | liblinear | IMV+SS |
DCFS | LR | penalty | L2 | IMV+SS |
DCFS | LR | C | 1 | IMV+SS |
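The parameter names in this table (`n_neighbors`, `weights`, `liblinear`, `GaussianNB`) match scikit-learn's estimator arguments, so the HC-MFS settings can plausibly be written as the configuration below. This is a sketch under the assumption that scikit-learn was the library used, which this section does not state. The names `HC_MFS_PARAMS`, `HC_MFS_DATASETS`, and `build_hc_mfs_models` are hypothetical, and the table's `L1` penalty is written in scikit-learn's lowercase form `"l1"`.

```python
# Hyperparameters transcribed from the HC-MFS rows of the table above.
HC_MFS_PARAMS = {
    "KNN": {"n_neighbors": 19, "p": 1, "weights": "distance"},
    "RF": {"max_depth": 16, "random_state": 2, "n_estimators": 300},
    "SVM": {"C": 1, "kernel": "linear"},
    "NB": {},  # GaussianNB with default settings
    "LR": {"solver": "liblinear", "penalty": "l1", "C": 10},
}

# Dataset variant listed next to each model in the same rows.
HC_MFS_DATASETS = {
    "KNN": "IMV+OR+SS",
    "RF": "IMV+SS",
    "SVM": "IMV",
    "NB": "IMV+SS",
    "LR": "IMV",
}


def build_hc_mfs_models():
    """Instantiate the five classifiers (requires scikit-learn to be installed)."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    constructors = {
        "KNN": KNeighborsClassifier,
        "RF": RandomForestClassifier,
        "SVM": SVC,
        "NB": GaussianNB,
        "LR": LogisticRegression,
    }
    return {name: constructors[name](**params) for name, params in HC_MFS_PARAMS.items()}
```

Calling `build_hc_mfs_models()` returns unfitted estimators; each would then be fitted on the dataset variant named in `HC_MFS_DATASETS`.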
Method | Classification | Mean | Initial Feature Subsets |
---|---|---|---|
HC-MFS | 1 | 0.0260 | 8 |
HC-MFS | 2 | 0.0111 | 8 |
HC-MFS | 3 | 0.0257 | 8 |
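To make the per-cluster means above concrete, the sketch below computes a Fisher score for each feature and averages the scores within each feature cluster. It uses the standard Fisher score (between-class scatter divided by within-class scatter) and takes the cluster assignment as given; the exact HC-MFS formulation is not reproduced in this section, so this is an illustration under those assumptions, with toy data and hypothetical names throughout.

```python
from collections import defaultdict


def fisher_score(values, labels):
    """Fisher score of one feature: between-class scatter over within-class scatter."""
    overall_mean = sum(values) / len(values)
    by_class = defaultdict(list)
    for value, label in zip(values, labels):
        by_class[label].append(value)
    between = within = 0.0
    for group in by_class.values():
        n = len(group)
        mean = sum(group) / n
        variance = sum((v - mean) ** 2 for v in group) / n  # population variance
        between += n * (mean - overall_mean) ** 2
        within += n * variance
    return between / within  # undefined if every class has zero variance


def mean_score_per_cluster(scores, clusters):
    """Average the per-feature scores inside each feature cluster."""
    grouped = defaultdict(list)
    for feature, score in scores.items():
        grouped[clusters[feature]].append(score)
    return {cluster: sum(s) / len(s) for cluster, s in grouped.items()}


# Toy data (hypothetical, not from the study).
labels = [0, 0, 1, 1]
features = {
    "f1": [1.0, 2.0, 3.0, 4.0],
    "f2": [1.0, 3.0, 2.0, 4.0],
    "f3": [2.0, 2.5, 2.0, 3.5],
}
clusters = {"f1": 1, "f2": 1, "f3": 2}  # assumed output of hierarchical clustering

scores = {name: fisher_score(vals, labels) for name, vals in features.items()}
cluster_means = mean_score_per_cluster(scores, clusters)
print(scores)
print(cluster_means)
```

A cluster mean of this kind gives one plausible threshold for keeping or discarding features inside that cluster, though how HC-MFS applies it is not shown here.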
Method | Feature Subsets |
---|---|
HC-MFS | {Season, uric acid, LDL, total cholesterol, platelet distribution width, platelets, mean platelet volume, age, large platelet ratio, C-reactive protein, HDL, erythrocyte distribution width} |
FS | {Season, uric acid, LDL, total cholesterol, platelet distribution width, platelets, mean platelet volume, age, large platelet ratio, CO, platelet pressure, C-reactive protein} |
R_F | {Season, hypertension, large platelet ratio, diabetes mellitus, HDL, erythrocyte pressure product, C-reactive protein, total cholesterol, LDL, platelet pressure product, platelets, age} |
MI | {C-reactive protein, total cholesterol, LDL, large platelet ratio, platelets, HDL, erythrocyte pressure volume, erythrocyte distribution width, season, platelet distribution width, uric acid, PM10} |
DCFS | {Season, uric acid, total cholesterol, LDL, platelet distribution width, age, platelets, mean platelet volume, large platelet ratio, CO, HDL, C-reactive protein} |
Method | Accuracy | Precision | Recall | F1 | AUC | Datasets |
---|---|---|---|---|---|---|
HC-MFS | 0.7868 | 0.7887 | 0.8000 | 0.7943 | 0.7964 | IMV+SS |
FS | 0.7574 | 0.7403 | 0.8183 | 0.7755 | 0.7556 | IMV+SS |
R_F | 0.7206 | 0.7162 | 0.7571 | 0.7361 | 0.7195 | IMV+SS |
MI | 0.7574 | 0.7467 | 0.8000 | 0.7724 | 0.7561 | IMV+SS |
DCFS | 0.7647 | 0.7639 | 0.7857 | 0.7746 | 0.7641 | IMV+OR+SS |
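As a consistency check on the table above, F1 can be recomputed from the listed precision and recall as their harmonic mean. The sketch below does this in plain Python; `f1_from` is an illustrative name. Four of the five rows reproduce the reported F1 to four decimals. The FS row recomputes to 0.7773 rather than the reported 0.7755, which may reflect rounding of the inputs or a transcription difference; the source does not say.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "HC-MFS": (0.7887, 0.8000, 0.7943),
    "FS": (0.7403, 0.8183, 0.7755),
    "R_F": (0.7162, 0.7571, 0.7361),
    "MI": (0.7467, 0.8000, 0.7724),
    "DCFS": (0.7639, 0.7857, 0.7746),
}

for method, (precision, recall, f1) in reported.items():
    recomputed = round(f1_from(precision, recall), 4)
    status = "matches" if recomputed == f1 else "differs from"
    print(method, recomputed, status, f1)
```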
Method | RMSE | R | std |
---|---|---|---|
HC-MFS | 0.4774 | 0.5437 | 0.4769 |
FS | 0.4926 | 0.5156 | 0.4899 |
R_F | 0.5286 | 0.4405 | 0.5278 |
MI | 0.4926 | 0.5146 | 0.4912 |
DCFS | 0.4851 | 0.5288 | 0.4848 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, H.; Lu, L.; Xiong, H.; Fan, C.; Fan, L.; Lin, Z.; Zhang, H. A Novel Approach to Dual Feature Selection of Atrial Fibrillation Based on HC-MFS. Diagnostics 2024, 14, 1145. https://doi.org/10.3390/diagnostics14111145