Enhancing Self-Care Prediction in Children with Impairments: A Novel Framework for Addressing Imbalance and High Dimensionality
Abstract
1. Introduction
1.1. Problem Statement
1.2. Related Works
1.3. Objectives
2. Materials and Methods
2.1. Data
2.2. Resampling Method
2.3. Machine Learning (ML)
2.3.1. Support Vector Machine (SVM)
2.3.2. Random Forest (RF)
2.3.3. Decision Tree (DT)
2.3.4. Bagging
2.4. Feature Selection
2.4.1. Mutual Information (MI)
- Step 1: Input the original features as X_Original.
- Step 2: Measure how strongly each input feature is related to the output (target) by computing its mutual information (information gain) score.
- Step 3: A higher score indicates a stronger dependence between the feature and the target attribute. To build the model, SelectKBest() was used to retain only the features with the highest scores.
- Step 4: Finally, the selected features, which are the ones used by the classification model, are passed on as X_MI.
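
Steps 1 to 4 can be sketched with scikit-learn's `SelectKBest` and `mutual_info_classif`. This is a minimal sketch, not the authors' code: the data come from `make_classification` as a stand-in for SCADI (205 features as in the comparison tables; 70 samples is an assumption), and the cutoff `K = 30` is hypothetical.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Step 1: X_Original. Synthetic stand-in for SCADI-like data; sizes are assumed.
X_Original, y = make_classification(
    n_samples=70, n_features=205, n_informative=15, n_redundant=0,
    n_classes=2, random_state=0,
)

# Step 2: score each feature by its mutual information with the target.
# random_state fixes the small noise added by the nearest-neighbour MI estimator.
mi_score = partial(mutual_info_classif, random_state=0)

# Step 3: keep only the K highest-scoring features (K is hypothetical).
K = 30
selector = SelectKBest(score_func=mi_score, k=K)

# Step 4: the reduced matrix that is passed on to the classifiers.
X_MI = selector.fit_transform(X_Original, y)

selected = selector.get_support(indices=True)  # indices of the kept columns
print(X_MI.shape, selected[:5])
```

`selector.scores_` holds one MI score per original feature, which is convenient when inspecting how close the discarded features came to the cutoff.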
2.4.2. Feature Selection Using Wrapper Methods
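The wrapper stage of the MI, RF-RFE hyper-framework named in Section 2.6 can be sketched with scikit-learn's `RFE` wrapped around a `RandomForestClassifier`. This is an illustrative sketch under assumptions: the input is a synthetic stand-in for X_MI, the forest size and `step=1` are arbitrary choices, and the target of 14 features is taken from the binary-class result in the comparison table. It is not the authors' implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Hypothetical stand-in for X_MI, the matrix produced by the MI filter step.
X_MI, y = make_classification(
    n_samples=70, n_features=30, n_informative=10, n_redundant=0,
    n_classes=2, random_state=0,
)

# Wrapper: the random forest ranks features by impurity-based importance;
# RFE drops the least important feature and refits until 14 remain.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rfe = RFE(estimator=rf, n_features_to_select=14, step=1)
X_MI_RFE = rfe.fit_transform(X_MI, y)

print(X_MI_RFE.shape)      # reduced matrix used for classification
print(rfe.ranking_[:10])   # 1 = kept; larger values were eliminated earlier
```

Because the forest is refit after every elimination, the importances are recomputed on the surviving features, which is what distinguishes this wrapper from a single importance-based cut.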
2.5. Hyperparameter
2.6. Comparative Methodology Analysis
- Ensemble Approach: Unlike previous works, our study employs a combination of Random Forest, Decision Tree, SVM, and Bagging classifiers, providing a more robust and diverse approach.
- Resampling and Data Shuffling: In contrast to previous works, our approach addresses both class imbalance and generalization, providing a more comprehensive solution.
- Feature Selection Strategy (Hyper-framework Feature Selection: MI, RF-RFE): Unlike the feature selection used in previous works, we use a hyper-framework that combines mutual-information statistics with random forest recursive feature elimination (RF-RFE), improving model interpretability and efficiency.
- SHAP Analysis: In contrast to previous works' reliance on traditional metrics, we employ SHapley Additive exPlanations (SHAP) for a more nuanced understanding of feature significance.
- Overall Accuracy (99%): Our model achieves an accuracy of 99.10% on the binary-class and 98.60% on the multi-class SCADI dataset, exceeding the accuracy reported in the contemporary literature.
- Fewest Unique Features: Our model achieves higher accuracy while using the fewest features among the compared methods (14 for binary-class and 15 for multi-class classification).
- Applicability to Medical Industry: With hyperparameter tuning, our model shows potential utility for diagnosing self-care problems in the medical domain, suggesting broader applicability.
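
The resampling and data-shuffling step can be illustrated with a short sketch. This section does not name the resampling technique, so simple random oversampling (drawing minority-class rows with replacement via `sklearn.utils.resample` until every class matches the majority count) is used as an assumed stand-in; the helper `oversample_and_shuffle` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.utils import resample, shuffle


def oversample_and_shuffle(X, y, random_state=0):
    """Hypothetical helper: random oversampling (assumed method), then shuffling."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        X_c = X[y == c]
        if len(X_c) < n_max:
            # Minority class: draw rows with replacement up to the majority size.
            X_c = resample(X_c, replace=True, n_samples=n_max,
                           random_state=random_state)
        X_parts.append(X_c)
        y_parts.append(np.full(len(X_c), c))
    X_bal = np.vstack(X_parts)
    y_bal = np.concatenate(y_parts)
    # Shuffle so that rows of the same class are not stored as one block.
    return shuffle(X_bal, y_bal, random_state=random_state)


# Toy imbalanced data (hypothetical): 8 rows of class 0, 2 rows of class 1.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(10, 5))
y = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = oversample_and_shuffle(X, y)
print(np.bincount(y_bal))  # both classes now have 8 rows
```

As a design note, applying such a helper only to the training portion of each cross-validation fold keeps duplicated rows out of the evaluation data.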
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Le, T.; Son, L.H.; Vo, M.T.; Lee, M.Y.; Baik, S.W. A Cluster-Based Boosting Algorithm for Bankruptcy Prediction in a Highly Imbalanced Dataset. Symmetry 2018, 10, 250.
2. Lan, K.; Wang, D.-T.; Fong, S.; Liu, L.-S.; Wong, K.K.L.; Dey, N. A Survey of Data Mining and Deep Learning in Bioinformatics. J. Med. Syst. 2018, 42, 139.
3. Goshvarpour, A. A Novel Feature Level Fusion for Heart Rate Variability Classification Using Correntropy and Cauchy-Schwarz Divergence. J. Med. Syst. 2018, 42, 109.
4. Rao, H.; Shi, X.; Rodrigue, A.K.; Feng, J.; Xia, Y.; Elhoseny, M.; Yuan, X.; Gu, L. Feature selection based on artificial bee colony and gradient boosting decision tree. Appl. Soft Comput. 2019, 74, 634–642.
5. El Houby, E.M. A survey on applying machine learning techniques for management of diseases. J. Appl. Biomed. 2018, 16, 165–174.
6. Chen, G.; Chen, J. A novel wrapper method for feature selection and its applications. Neurocomputing 2015, 159, 219–226.
7. Sharifai, G.A.; Zainol, Z. Feature Selection for High-Dimensional and Imbalanced Biomedical Data Based on Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm. Genes 2020, 11, 717.
8. de Sá, A.G.; Pereira, A.C.; Pappa, G.L. A customized classification algorithm for credit card fraud detection. Eng. Appl. Artif. Intell. 2018, 72, 21–29.
9. Chen, H.; Li, T.; Fan, X.; Luo, C. Feature selection for imbalanced data based on neighborhood rough sets. Inf. Sci. 2019, 483, 1–20.
10. Elhoseny, M.; Mohammed, M.A.; Mostafa, S.A.; Abdulkareem, K.H.; Maashi, M.S.; Garcia-Zapirain, B.; Mutlag, A.A.; Maashi, M.S. A new multi-agent feature wrapper machine learning approach for heart disease diagnosis. Comput. Mater. Contin. 2021, 67, 51–71.
11. Albashish, D.; Hammouri, A.I.; Braik, M.; Atwan, J.; Sahran, S. Binary biogeography-based optimization based SVM-RFE for feature selection. Appl. Soft Comput. 2021, 101, 107026.
12. Abdel-Basset, M.; El-Shahat, D.; El-Henawy, I.; de Albuquerque, V.H.C.; Mirjalili, S. A new fusion of grey wolf optimizer algorithm with a two-phase mutation for feature selection. Expert Syst. Appl. 2020, 139, 112824.
13. Elavarasan, D.; Vincent P M, D.R.; Srinivasan, K.; Chang, C.-Y. A Hybrid CFS Filter and RF-RFE Wrapper-Based Feature Extraction for Enhanced Agricultural Crop Yield Prediction Modeling. Agriculture 2020, 10, 400.
14. Amini, F.; Hu, G. A two-layer feature selection method using Genetic Algorithm and Elastic Net. Expert Syst. Appl. 2021, 166, 114072.
15. Zarchi, M.; Bushehri, S.F.; Dehghanizadeh, M. SCADI: A standard dataset for self-care problems classification of children with physical and motor disability. Int. J. Med. Inform. 2018, 114, 81–87.
16. Islam, B.; Ashafuddula, N.I.M.; Mahmud, F. A Machine Learning Approach to Detect Self-Care Problems of Children with Physical and Motor Disability. In Proceedings of the 2018 21st International Conference of Computer and Information Technology, ICCIT 2018, Dhaka, Bangladesh, 21–23 December 2018.
17. Liu, L.; Zhang, B.; Wang, S.; Li, S.; Zhang, K.; Wang, S. Feature selection based on feature curve of subclass problem. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019.
18. Souza, P.V.C.; dos Reis, A.G.; Marques, G.R.R.; Guimaraes, A.J.; Araujo, V.J.S.; Araujo, V.S.; Rezende, T.S.; Batista, L.O.; da Silva, G.A. Using hybrid systems in the construction of expert systems in the identification of cognitive and motor problems in children and young people. In Proceedings of the 2019 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), New Orleans, LA, USA, 23–26 June 2019.
19. Akyol, K. Comparing of deep neural networks and extreme learning machines based on growing and pruning approach. Expert Syst. Appl. 2020, 140, 112875.
20. Putatunda, S. Care2Vec: A hybrid autoencoder-based approach for the classification of self-care problems in physically disabled children. Neural Comput. Appl. 2020, 32, 17669–17680.
21. Prasetiyowati, M.I.; Maulidevi, N.U.; Surendro, K. Determining threshold value on information gain feature selection to increase speed and prediction accuracy of random forest. J. Big Data 2021, 8, 84.
22. Sevinç, E. An empowered AdaBoost algorithm implementation: A COVID-19 dataset study. Comput. Ind. Eng. 2022, 165, 107912.
23. Qasim, H.M.; Ata, O.; Ansari, M.A.; Alomary, M.N.; Alghamdi, S.; Almehmadi, M. Hybrid Feature Selection Framework for the Parkinson Imbalanced Dataset Prediction Problem. Medicina 2021, 57, 1217.
24. Elyan, E.; Moreno-Garcia, C.F.; Jayne, C. CDSMOTE: Class decomposition and synthetic minority class oversampling technique for imbalanced-data classification. Neural Comput. Appl. 2021, 33, 2839–2851.
25. Ayon, S.I.; Islam, M.; Hossain, R. Coronary Artery Heart Disease Prediction: A Comparative Study of Computational Intelligence Techniques. IETE J. Res. 2020, 68, 2488–2507.
26. Senan, E.M.; Al-Adhaileh, M.H.; Alsaade, F.W.; Aldhyani, T.H.H.; Alqarni, A.A.; Alsharif, N.; Uddin, M.I.; Alahmadi, A.H.; E Jadhav, M.; Alzahrani, M.Y. Diagnosis of Chronic Kidney Disease Using Effective Classification Algorithms and Recursive Feature Elimination Techniques. J. Healthc. Eng. 2021, 2021, 1004767.
27. Speiser, J.L. A random forest method with feature selection for developing medical prediction models with clustered and longitudinal data. J. Biomed. Inform. 2021, 117, 103763.
28. Solorio-Fernández, S.; Carrasco-Ochoa, J.A.; Martínez-Trinidad, J.F. A review of unsupervised feature selection methods. Artif. Intell. Rev. 2020, 53, 907–948.
29. Mohammedqasem, R.; Mohammedqasim, H.; Ata, O. Real-time data of COVID-19 detection with IoT sensor tracking using artificial neural network. Comput. Electr. Eng. 2022, 100, 107971.
30. Mohammedqasim, H.; Mohammedqasem, R.; Ata, O.; Alyasin, E.I. Diagnosing Coronary Artery Disease on the Basis of Hard Ensemble Voting Optimization. Medicina 2022, 58, 1745.
31. Kadam, V.J.; Jadhav, S.M. Performance analysis of hyperparameter optimization methods for ensemble learning with small and medium sized medical datasets. J. Discret. Math. Sci. Cryptogr. 2020, 23, 115–123.
32. Zhang, R.; Wu, X.; Chen, Y.; Xiang, Y.; Liu, D.; Bian, X. Grey Wolf Optimizer for Variable Selection in Quantification of Quaternary Edible Blend Oil by Ultraviolet-Visible Spectroscopy. Molecules 2022, 27, 5141.
33. Bian, X.; Zhao, Z.; Liu, J.; Liu, P.; Shi, H.; Tan, X. Discretized butterfly optimization algorithm for variable selection in the rapid determination of cholesterol by near-infrared spectroscopy. Anal. Methods 2023, 15, 5190–5198.
34. Piri, J.; Mohapatra, P.; Singh, H.K.R.; Acharya, B.; Patra, T.K. An Enhanced Binary Multiobjective Hybrid Filter-Wrapper Chimp Optimization Based Feature Selection Method for COVID-19 Patient Health Prediction. IEEE Access 2022, 10, 100376–100396.


| Method | Methodology | Key Techniques | Best Model | Accuracy |
|---|---|---|---|---|
| Zarchi et al. [15] | ANN, DT | 205 neurons in the input layer, 40 neurons in the hidden layer | ANN | 83% |
| Islam et al. [16] | ELM, KNN, SVM, ANN, RF, GB | Principal Component Analysis (PCA) | KNN | 84% |
| Souza et al. [18] | FNN, C4.5, MLP, SVM | Artificial neural networks and fuzzy systems | FNN | 85% |
| Putatunda [20] | DT, DNN | Autoencoders and deep neural networks | DNN | 81% |
| Prasetiyowati et al. [21] | RF | Correlation-based feature selection, fast Fourier transform and inverse fast Fourier transform | RF | 84% |
| Sevinç [22] | AdaBoost, DT, Bagging, Extra Tree, GB, RF, Hist.GB | Adaptive Boosting algorithm with decision trees | E-ADAD | 85% |
| Proposed Model | MI-RFE | Resampling, data shuffling, hyper-framework feature selection (MI, RF-RFE), SHAP | RF | 99% |

| Model | Processing | Accuracy | Precision | Recall | F1-Score | Kappa |
|---|---|---|---|---|---|---|
| RF | Original | 91.42 | 81.25 | 81.26 | 81.37 | 75.69 |
|  | MI | 97.10 | 94.70 | 99.10 | 97.29 | 94.34 |
|  | MI-RFE | 97.10 | 94.77 | 99.12 | 97.29 | 94.44 |
| SVC | Original | 88.57 | 78.57 | 68.75 | 73.33 | 66.10 |
|  | MI | 86.41 | 88.52 | 99.12 | 93.71 | 87.0 |
|  | MI-RFE | 93.80 | 91.22 | 99.14 | 93.99 | 88.0 |
| Bagging | Original | 87.14 | 73.33 | 68.75 | 70.96 | 62.72 |
|  | MI | 95.47 | 91.52 | 99.14 | 95.57 | 90.74 |
|  | MI-RFE | 96.20 | 93.10 | 99.22 | 96.42 | 92.80 |
| DT | Original | 87.14 | 76.92 | 62.5 | 68.96 | 60.96 |
|  | MI | 95.30 | 91.52 | 99.20 | 95.55 | 90.60 |
|  | MI-RFE | 97.21 | 94.60 | 99.30 | 95.80 | 90.88 |

| Model | Processing | Accuracy | Precision | Recall | F1-Score | Kappa |
|---|---|---|---|---|---|---|
| RF | Original | 78.43 | 78.83 | 84.28 | 81.37 | 78.16 |
|  | MI | 97.10 | 98.0 | 98.0 | 98.0 | 97.60 |
|  | MI-RFE | 97.30 | 98.51 | 98.11 | 98.30 | 98.20 |
| SVC | Original | 78.35 | 77.41 | 84.28 | 80.50 | 77.90 |
|  | MI | 89.33 | 91.70 | 90.60 | 90.20 | 89.0 |
|  | MI-RFE | 96.50 | 97.20 | 97.0 | 97.0 | 96.50 |
| Bagging | Original | 84.51 | 81.99 | 88.57 | 84.91 | 84.16 |
|  | MI | 96.37 | 97.12 | 97.0 | 97.0 | 96.50 |
|  | MI-RFE | 97.10 | 97.40 | 97.50 | 97.52 | 97.20 |
| DT | Original | 84.31 | 83.90 | 88.57 | 85.84 | 84.0 |
|  | MI | 96.0 | 96.62 | 96.50 | 96.49 | 95.97 |
|  | MI-RFE | 97.12 | 97.50 | 97.40 | 97.30 | 97.18 |
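
The five metrics reported in these tables can be computed with scikit-learn, as in the sketch below. The averaging scheme for precision, recall, and F1 is not stated here, so `average="macro"` is an assumption, and values are multiplied by 100 to match the percentage-style figures in the tables. The function name `report` and the labels are hypothetical.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, recall_score)


def report(y_true, y_pred, average="macro"):
    """Hypothetical helper: the tables' five metrics, scaled to percentages."""
    return {
        "Accuracy": 100 * accuracy_score(y_true, y_pred),
        "Precision": 100 * precision_score(y_true, y_pred, average=average,
                                           zero_division=0),
        "Recall": 100 * recall_score(y_true, y_pred, average=average,
                                     zero_division=0),
        "F1-Score": 100 * f1_score(y_true, y_pred, average=average,
                                   zero_division=0),
        # Cohen's kappa corrects observed agreement for chance agreement.
        "Kappa": 100 * cohen_kappa_score(y_true, y_pred),
    }


# Illustrative labels only, not results from the paper.
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 0, 1, 2, 2, 2, 2, 1]
for name, value in report(y_true, y_pred).items():
    print(f"{name}: {value:.2f}")
```

For a two-class problem, `average="binary"` (scoring only the positive class) is the usual alternative to macro averaging and can give noticeably different precision and recall on imbalanced data.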

| Class | Model | Hyperparameters |
|---|---|---|
| Binary | RF | Number of trees = 90, Criterion = gini, Max depth = 6 |
|  | SVC | C = 8, kernel = linear, gamma = 2 |
|  | DT | Criterion = gini, max depth = 12, max features = 21 |
|  | Bagging | Estimator = RF, number of estimators = 50, max samples = 0.9 |
| Multi | RF | Number of trees = 150, Criterion = gini, Max depth = 10 |
|  | SVC | C = 11, kernel = linear, gamma = 22 |
|  | DT | Criterion = gini, max depth = 18, max features = 12 |
|  | Bagging | Estimator = RF, number of estimators = 40, max samples = 0.7 |
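
The settings in the hyperparameter table can be expressed as scikit-learn estimators, as sketched below. The mapping of the table's labels to scikit-learn arguments (for example, "Number of trees" to `n_estimators`) is an assumption, as are the `SETTINGS` keys, the `build_models` helper, and the default settings of the random forest inside the Bagging classifier, which the table does not specify. Two scikit-learn behaviours are worth noting: `gamma` has no effect with `kernel="linear"`, and an integer `max_features` larger than the number of input columns raises an error at fit time.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Values copied from the hyperparameter table; the key names are hypothetical.
SETTINGS = {
    "binary": {"rf_trees": 90, "rf_depth": 6, "svc_c": 8, "svc_gamma": 2,
               "dt_depth": 12, "dt_max_features": 21,
               "bag_n": 50, "bag_samples": 0.9},
    "multi": {"rf_trees": 150, "rf_depth": 10, "svc_c": 11, "svc_gamma": 22,
              "dt_depth": 18, "dt_max_features": 12,
              "bag_n": 40, "bag_samples": 0.7},
}


def build_models(task, random_state=0):
    """Return untrained classifiers for task 'binary' or 'multi'."""
    s = SETTINGS[task]
    return {
        "RF": RandomForestClassifier(n_estimators=s["rf_trees"], criterion="gini",
                                     max_depth=s["rf_depth"],
                                     random_state=random_state),
        # scikit-learn ignores gamma when kernel="linear".
        "SVC": SVC(C=s["svc_c"], kernel="linear", gamma=s["svc_gamma"]),
        # An integer max_features must not exceed the number of input columns.
        "DT": DecisionTreeClassifier(criterion="gini", max_depth=s["dt_depth"],
                                     max_features=s["dt_max_features"],
                                     random_state=random_state),
        # Base estimator passed positionally (named `estimator` in recent
        # scikit-learn, `base_estimator` in older releases); inner RF uses
        # default settings, which is an assumption.
        "Bagging": BaggingClassifier(
            RandomForestClassifier(random_state=random_state),
            n_estimators=s["bag_n"], max_samples=s["bag_samples"],
            random_state=random_state),
    }


models = build_models("binary")
print(models["RF"])
```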

| Class | Model | Accuracy | Precision | Recall | F1-Score | Kappa |
|---|---|---|---|---|---|---|
| Binary | RF | 99.10 | 98.50 | 99.80 | 99.17 | 98.60 |
|  | SVC | 95.55 | 91.52 | 99.90 | 95.57 | 90.77 |
|  | Bagging | 98.0 | 96.43 | 99.20 | 98.10 | 96.30 |
|  | DT | 98.20 | 96.45 | 99.81 | 98.20 | 98.30 |
| Multi | RF | 98.60 | 98.70 | 99.80 | 98.59 | 98.60 |
|  | SVC | 97.55 | 98.21 | 99.90 | 95.57 | 90.77 |
|  | Bagging | 98.30 | 98.55 | 98.20 | 98.60 | 98.27 |
|  | DT | 98.12 | 98.05 | 98.01 | 98.10 | 97.70 |

| Method | Feature Selection (FS) | No. of Features (NF) | Validation Method | Classes | Accuracy (ACC) | Year |
|---|---|---|---|---|---|---|
| Zarchi et al. [15] | - | 205 | 10-fold CV | Multi-class | 83% | 2018 |
| Islam et al. [16] | PCA | 53 | 5-fold CV | Multi-class | 84% | 2018 |
| Souza et al. [18] | - | 205 | k-fold CV | Binary-class | 85% | 2019 |
| Putatunda [20] | - | 205 | 10-fold CV | Binary-class | 84% | 2020 |
|  | - | 205 | 10-fold CV | Multi-class | 81% | 2020 |
| Prasetiyowati et al. [21] | CBF | 19 | 10-fold CV | Binary-class | 84.14% | 2021 |
| Sevinç [22] | - | 205 | 5-fold CV | Multi-class | 85% | 2022 |
| Proposed Model | MI-RFE | 14 | 10-fold CV | Binary-class | 99.10% | - |
|  | MI-RFE | 15 | 10-fold CV | Multi-class | 98.60% | - |
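
The 10-fold cross-validation protocol listed for the proposed model can be sketched as a scikit-learn `Pipeline` that chains the MI filter, RF-RFE, and a random forest, so that both feature-selection stages are refit inside every training fold. This is a sketch under assumptions, not the authors' setup: the data are synthetic, and `k=30`, the RFE forest size, `step=2`, and `StratifiedKFold` with shuffling are illustrative choices; the final forest reuses the binary-class RF settings from the hyperparameter table. The printed accuracy says nothing about SCADI.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (hypothetical sizes).
X, y = make_classification(n_samples=200, n_features=205, n_informative=15,
                           n_redundant=0, n_classes=2, random_state=0)

pipeline = Pipeline([
    # Filter stage: keep the 30 features with the highest mutual information.
    ("mi", SelectKBest(partial(mutual_info_classif, random_state=0), k=30)),
    # Wrapper stage: RF-RFE down to 14 features (binary-class count in the table).
    ("rfe", RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                n_features_to_select=14, step=2)),
    # Final classifier with the binary-class RF settings.
    ("rf", RandomForestClassifier(n_estimators=90, max_depth=6, random_state=0)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Wrapping the selectors in the pipeline, rather than selecting features once on the full dataset, prevents the held-out fold from influencing which features are kept.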

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alyasin, E.I.; Ata, O.; Mohammedqasim, H.; Mohammedqasem, R. Enhancing Self-Care Prediction in Children with Impairments: A Novel Framework for Addressing Imbalance and High Dimensionality. Appl. Sci. 2024, 14, 356. https://doi.org/10.3390/app14010356
Alyasin EI, Ata O, Mohammedqasim H, Mohammedqasem R. Enhancing Self-Care Prediction in Children with Impairments: A Novel Framework for Addressing Imbalance and High Dimensionality. Applied Sciences. 2024; 14(1):356. https://doi.org/10.3390/app14010356
Chicago/Turabian Style: Alyasin, Eman Ibrahim, Oguz Ata, Hayder Mohammedqasim, and Roa’a Mohammedqasem. 2024. "Enhancing Self-Care Prediction in Children with Impairments: A Novel Framework for Addressing Imbalance and High Dimensionality." Applied Sciences 14, no. 1: 356. https://doi.org/10.3390/app14010356
APA Style: Alyasin, E. I., Ata, O., Mohammedqasim, H., & Mohammedqasem, R. (2024). Enhancing Self-Care Prediction in Children with Impairments: A Novel Framework for Addressing Imbalance and High Dimensionality. Applied Sciences, 14(1), 356. https://doi.org/10.3390/app14010356