Compact Data Learning for Machine Learning Classifications
Abstract
1. Introduction
2. Compact Data Learning
2.1. Reducing Input Features of Machine Learning Systems
2.2. Reducing Samples of Machine Learning Systems
3. Case Study: Compact Arrhythmia Detection
3.1. Experiment Setups
3.2. Reduced Input Features
3.3. Reduced Sample Size
3.4. Overall Performance
4. Challenges and Future Research
4.1. The Complexity of Correlation Calculation
4.2. The Correlation Threshold Optimization
5. Conclusions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Parameter | Setup Value | Description |
|---|---|---|
| m | 222 | Total number of initial input features |
| | 1019 | Total number of initial training samples |
| | 2 | Number of output classes |
| | 0.8 | Correlation threshold |
| | 0.8416 | Tolerance range |
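
To make the role of the correlation threshold in the setup table concrete, the following is a minimal sketch of threshold-based feature reduction. It rests on several assumptions that this excerpt does not confirm: it uses the absolute Pearson correlation (the paper may use a different dependence measure), it applies a simple greedy keep-or-drop rule, and it runs on synthetic data rather than the arrhythmia dataset. The function name `drop_correlated_features` is hypothetical.

```python
import numpy as np


def drop_correlated_features(X, threshold=0.8):
    """Return indices of columns kept by a greedy correlation filter.

    Assumption: a column is kept only if its absolute Pearson correlation
    with every previously kept column is at most `threshold`.
    """
    # (m, m) matrix of absolute pairwise correlations between columns.
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


# Synthetic stand-in data (not the paper's dataset): 1019 samples, 6 features.
rng = np.random.default_rng(0)
base = rng.normal(size=(1019, 3))
noise = 0.01 * rng.normal(size=(1019, 3))
X = np.hstack([base, base + noise])  # columns 3-5 nearly duplicate columns 0-2

kept = drop_correlated_features(X, threshold=0.8)
print(kept)  # indices of the retained features
```

Computing the full correlation matrix costs on the order of m² pairwise comparisons, which is the kind of cost the outline's Section 4.1 heading refers to; with m = 222 this is small, but it grows quickly for wider datasets.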
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kim, S.-K. Compact Data Learning for Machine Learning Classifications. Axioms 2024, 13, 137. https://doi.org/10.3390/axioms13030137