Article

A Robust Framework for Self-Care Problem Identification for Children with Disability

Digital Contents Research Institute, Sejong University, Seoul 05006, Korea
* Author to whom correspondence should be addressed.
Symmetry 2019, 11(1), 89; https://doi.org/10.3390/sym11010089
Submission received: 29 November 2018 / Revised: 7 January 2019 / Accepted: 11 January 2019 / Published: 15 January 2019

Abstract

Recently, a standard dataset named SCADI (Self-Care Activities Dataset) was introduced for identifying the self-care problems of children with physical and motor disabilities. It is based on the International Classification of Functioning, Disability, and Health for Children and Youth framework. This topic is important and challenging because of its usefulness in medical diagnosis. This study proposes a robust framework using a sampling technique and extreme gradient boosting (FSX) to improve prediction performance on the SCADI dataset. The proposed framework first converts the original dataset into a new dataset with fewer dimensions. It then balances this new dataset using oversampling techniques with different ratios. Next, extreme gradient boosting is used to diagnose the problems. Experiments on prediction performance and feature importance were conducted to show the effectiveness of FSX and to analyse the results. The experimental results show that FSX with the Synthetic Minority Over-sampling Technique (SMOTE) as its oversampling module outperforms the ANN (Artificial Neural Network)-based approach, Support Vector Machine (SVM) and Random Forest on the SCADI dataset. The overall accuracy of the proposed framework reaches 85.4%, indicating that it can be used for self-care problem classification in medical diagnosis.

1. Introduction

With its strong recent growth, machine learning has been applied in many domains such as economics [1,2,3], medical diagnosis [4,5,6,7,8,9,10] and biomedical applications [11,12]. In the medical field, many intelligent systems have been proposed to improve medical services. For disease diagnosis, Malmir, Amini & Chang [13] proposed a decision support system for medical services with uncertain input values. Its main purpose is to help physicians make faster and more accurate diagnoses of several specific diseases: physicians only enter the values of the associated symptoms and wait for the system's suggested conclusions, and their task is then to check the accuracy of the suggestion and make the diagnosis. Next, Eshtay, Faris & Obeid [14] proposed an improved extreme learning machine based on competitive swarm optimization to classify a medical dataset with 15 medical classes. Their experiments demonstrate that the model obtains better generalization performance with fewer hidden neurons and higher stability. Moreover, to support hospital services, Turgeman, May & Sciulli [15] proposed a machine learning model based on the Cubist tree to predict the hospital length of stay (LOS) of patients at the time of admission. The model was constructed and validated on a de-identified administrative dataset collected from Veterans Health Administration (VHA) hospitals, consisting of 20,321 inpatient admissions of 4,840 Congestive Heart Failure (CHF) patients. In another application, Liu, Chen & Tzeng [16] proposed a model to identify the main factors in consumers’ adoption behavior of intelligent medical terminals in order to improve the quality of these systems.
Detecting the key influential factors (and their interrelationships) of consumer adoption behavior is useful for improving and promoting intelligent medical terminals toward a set aspiration level in each dimension and criterion. Later, Mustaqeem et al. [17] introduced a dataset of heart disease patients from a local hospital and proposed a model for heart disease diagnosis, which they expect to be a useful contribution to e-health and medical informatics. Next, Lucini et al. [18] proposed a text mining approach that predicts future hospitalizations and discharges from early emergency department (ED) patient records using the SOAP framework. The average F1-score of Nu-SVC reported in that study reached 77.70% with a standard deviation of 0.66%.
Disabilities, including physical and motor disabilities, are disorders that restrict individual activities [19]. In practice, disability diagnosis is a complex process that requires expert occupational therapists. The International Classification of Functioning, Disability and Health for Children and Youth (ICF-CY) was therefore proposed to collect the characteristics of children, and machine learning algorithms can learn from data described in this framework to predict disability. ICF-CY is the most widely used framework for disability diagnosis [20,21,22]; it is a multipurpose conceptual classification framework derived from the International Classification of Functioning, Disability and Health (ICF) [23]. ICF and ICF-CY are frequently used as conceptual frameworks for disability evaluation and assessment, as well as for classification in machine learning. The ICF-CY framework considers a number of self-care activities, including drinking, eating, caring for body parts and washing oneself. Self-care activities are defined as a subset of the activities and participation component of ICF-CY and are usually expanded from birth to 7 years of age [23]. For instance, children with cerebral palsy usually have problems with self-care skills because of their limitations in thinking [24]. Detecting children's self-care problems is important because it supports the choice of treatment approach. Given the diversity and complexity of self-care problems and the shortage of professional therapists, an expert system for self-care problem classification can help therapists choose the best treatment approach for specific cases.
In practical datasets, data are often unevenly distributed across classes; this is called the class imbalance problem (CIP). It makes prediction difficult and reduces the performance of traditional classifiers. Many approaches have been proposed for dealing with class imbalance, among which sampling techniques are widely used. Le et al. [2] used several oversampling techniques to improve bankruptcy prediction. Ijaz et al. [25] combined DBSCAN-based outlier detection, the Synthetic Minority Over-sampling Technique (SMOTE) and Random Forest to identify Type 2 diabetes and hypertension. Recently, Bang et al. [26] used an adaptive data-boosting technique for speech emotion detection in emotionally imbalanced small-sample environments.
SCADI (Self-Care Activities Dataset based on ICF-CY), the first standard dataset for self-care activities of children with physical and motor disabilities, was introduced by Zarchi, Bushehri & Dehghanizadeh [27] in 2018. The authors collected the self-care activities of 70 children in seven classes, and each child was carefully assessed on 29 self-care activities based on ICF-CY. They also proposed an ANN-based system for classifying the self-care problems of children with physical and motor disabilities. However, this dataset has three issues that reduce the performance of such a system. This study therefore proposes a robust framework that deals with these issues to obtain good prediction performance. The main contributions of this study can be summarized as follows. (1) A preprocessing module that converts SCADI, with 205 features, into SCADI_2, with only 31 features. (2) An FSX framework that uses an oversampling technique to deal with the class imbalance problem when identifying the self-care problems of children with physical and motor disabilities. (3) A feature importance analysis that identifies the important features. The proposed framework achieves 85.4% accuracy, which is expected to improve diagnosis as well as counseling and treatment.
The rest of this paper is structured as follows. Section 2 reviews related work on the class imbalance problem. Section 3 presents the materials and methods of this study: it summarizes the SCADI dataset and then proposes a robust framework with three modules, namely preprocessing, oversampling and classification. Section 4 presents the experiments that show the effectiveness of the developed framework for self-care problem classification. Finally, Section 5 concludes the study.

2. Related Works

Because of the class imbalance in the SCADI dataset, classifier performance is reduced, so a specific technique is needed to handle this problem. Current methods for dealing with class imbalance fall into the following four categories [28]. (1) Algorithm-level approaches modify existing classifier learning algorithms to bias learning toward the minority class [29,30] without adjusting the training data. (2) Data-level approaches change the class distribution by resampling the data space [31,32] to improve predictive performance. They include three strategies: undersampling, oversampling and hybrid techniques. In this way, modifying the learning algorithm is avoided, because the resampling step reduces the effect of the imbalance. (3) Cost-sensitive learning falls between the data-level and algorithm-level approaches and combines data-level transformations (adding costs to instances) with algorithm-level modifications (modifying the learning process to accept costs) [33,34]. The classifier is biased toward the minority class by assigning higher misclassification costs to that class and minimizing the total cost of errors over both classes. (4) Ensemble-based methods usually combine an ensemble learning algorithm with one of the techniques above, specifically data-level or cost-sensitive techniques [35]. Hybrids of the data-level approach and ensemble learning usually preprocess the data before training each classifier, whereas cost-sensitive ensembles, instead of modifying the base classifier to accept costs, guide cost minimization through the ensemble learning algorithm. Which of these four families works best depends on the dataset.
Among these approaches, data-level approaches are quite popular. Well-known data-level approaches include SMOTE [31], SMOTE-ENN [36] and SMOTE-Tomek [36]. SMOTE generates synthetic minority samples by interpolating between randomly chosen minority samples and their nearest neighbors (see Section 3.3). SMOTE-ENN and SMOTE-Tomek combine oversampling and undersampling: they use SMOTE to generate synthetic samples and then use the Edited Nearest Neighbours (ENN) rule or Tomek links to remove noisy samples from the balanced dataset.
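To make the cleaning step of SMOTE-Tomek concrete, the sketch below finds Tomek links: pairs of samples from different classes that are each other's nearest neighbor under the Euclidean distance. It is a minimal NumPy illustration, not the implementation used in [36]; the function names are hypothetical, and dropping both samples of every link is one of several possible variants (some implementations drop only the majority-class sample).

```python
import numpy as np


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    Two samples of different classes form a Tomek link when each one is the
    other's nearest neighbour (Euclidean distance).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    nn = d.argmin(axis=1)        # index of each sample's nearest neighbour
    return [(i, int(j)) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_links(X, y):
    """Cleaning step applied after SMOTE; this variant drops both samples of each link."""
    drop = {k for pair in tomek_links(X, y) for k in pair}
    keep = np.array([i not in drop for i in range(len(y))])
    return np.asarray(X)[keep], np.asarray(y)[keep]
```

Points that sit on the boundary between two classes tend to form such links, which is why removing them after oversampling is expected to reduce overlap between classes.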

3. Materials and Methods

As discussed in the introduction, this section summarises the standard dataset for self-care problem classification, SCADI, in Section 3.1. Next, we analyse three problems of this dataset and give three solutions to them in Sections 3.2 and 3.3. Section 3.4 then summarizes the extreme gradient boosting model, and Section 3.5 presents the overall framework, which is the main contribution of this study.

3.1. SCADI Dataset

The experimental dataset, SCADI [27], was collected by Zarchi et al. in collaboration with two professional therapists. According to [27], SCADI is the first standard dataset on self-care activities of children with physical and motor disabilities. Twenty-nine activities in eight categories (washing oneself, caring for body parts, toileting, dressing, eating, drinking, looking after one’s health and looking after one’s safety) were considered as self-care activities (see Table 1). Each activity takes one of seven levels: NO impairment, MILD impairment, MODERATE impairment, SEVERE impairment, COMPLETE impairment, NOT Specified and NOT Applicable. Each of these statuses is a separate binary feature in SCADI, so the activities contribute 203 (29 × 7) features. The age and gender of the participants are added as features F1 and F2, respectively; in the gender feature, a male participant is represented by 0 and a female participant by 1. In total, SCADI has 205 features for each child.
To collect the SCADI records, 70 students at three educational and health centers in Yazd, Iran, were investigated between 2016 and 2017. The children were 6 to 18 years old. Professional therapists categorized the seventy children into seven groups based on their self-care activities, as shown in Table 2; these groups are used as the target classes in medical diagnosis.

3.2. Preprocessing

The first problem with SCADI is that using seven binary features for each activity increases the dimensionality of the data and makes it more susceptible to noise. We therefore propose a modified dataset that uses a single feature per activity, in which the levels 0 to 6 represent the seven statuses NO impairment, MILD impairment, MODERATE impairment, SEVERE impairment, COMPLETE impairment, NOT Specified and NOT Applicable, respectively. The modified dataset, SCADI_2, has 31 features: the 29 activities, age and gender. Figure 1 shows a visualization of the SCADI_2 dataset obtained with principal component analysis (PCA).
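The conversion from SCADI to SCADI_2 amounts to replacing each block of seven one-hot columns with the index of its active column. The sketch below shows this under an assumed column layout (two demographic columns first, then 29 blocks of seven status columns ordered from NO impairment to NOT Applicable); the real file layout may differ, and `to_ordinal` is a hypothetical helper name, not part of the authors' code.

```python
import numpy as np

N_ACTIVITIES = 29  # self-care activities (F3..F31 in SCADI_2)
N_LEVELS = 7       # 0 = NO impairment ... 4 = COMPLETE, 5 = NOT Specified, 6 = NOT Applicable


def to_ordinal(X_onehot: np.ndarray) -> np.ndarray:
    """Collapse a (n, 2 + 29*7) one-hot matrix into a (n, 2 + 29) ordinal matrix.

    Assumed (hypothetical) layout: columns 0-1 are the demographic features,
    followed by 29 blocks of 7 binary columns, one block per activity.
    """
    n, d = X_onehot.shape
    expected = 2 + N_ACTIVITIES * N_LEVELS
    if d != expected:
        raise ValueError(f"expected {expected} columns, got {d}")
    demographics = X_onehot[:, :2]
    blocks = X_onehot[:, 2:].reshape(n, N_ACTIVITIES, N_LEVELS)
    if not np.all(blocks.sum(axis=2) == 1):
        raise ValueError("each activity block must contain exactly one active status")
    levels = blocks.argmax(axis=2)  # position of the active status = level 0..6
    return np.hstack([demographics, levels])
```

Note that this encoding treats NOT Specified and NOT Applicable as levels 5 and 6, above COMPLETE impairment, so a tree-based classifier such as XGBoost, which only splits on thresholds, is a more natural fit for it than a distance-based model.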
In addition, the seven classes are imbalanced, which reduces the performance of the prediction model. To handle this issue, we balance the dataset with the SMOTE algorithm, which is introduced in the next subsection.

3.3. SMOTE Algorithm

To deal with the class imbalance in the experimental dataset, we first resample SCADI_2 using the SMOTE algorithm. SMOTE [31] generates synthetic samples for the minority classes based on feature-space similarities between real samples. For each sample $x_i$ used to generate new data, SMOTE considers its k nearest neighbors under the Euclidean distance, denoted by $K_{x_i}$. It then randomly selects an element $\hat{x}_i \in K_{x_i}$ that belongs to the same class as $x_i$. The feature vector of the new synthetic sample, $x_{new}$, is determined as follows:
$$x_{new} = x_i + (\hat{x}_i - x_i) \times \delta \qquad (1)$$
where δ ∈ [0, 1] is a random number. According to Equation (1), the new synthetic sample lies on the line segment joining $x_i$ and $\hat{x}_i$. The algorithm repeats this step until the target ratio is reached, at which point the resampled dataset is nearly balanced. Note that we modified the algorithm so that, if SMOTE finds no nearest neighbor of $x_i$ or fewer than k nearest neighbors, it simply duplicates $x_i$.
Figure 2 presents the flowchart of the SMOTE algorithm with our modification for samples that have very few nearest neighbors. Figure 3 shows PCA visualizations of the SCADI_2 dataset after resampling with SMOTE at balancing ratios of 0.4, 0.6, 0.8 and 1.0.
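The resampling step, including the duplication fallback, can be sketched in NumPy as follows. Two details are assumptions, since the text does not specify them: the base sample $x_i$ is drawn at random from its class, and a balancing ratio r means that every class is grown to round(r × majority-class size). The second assumption matters because imbalanced-learn's `SMOTE` accepts a float `sampling_strategy` only for binary problems, so a multi-class ratio has to be translated into per-class target counts somehow. `smote_with_duplication` is a hypothetical name.

```python
import numpy as np


def smote_with_duplication(X, y, ratio=1.0, k=5, seed=None):
    """Oversample every class whose size is below round(ratio * n_majority).

    SMOTE interpolation (Equation (1)) plus the fallback described above:
    if a class has fewer than k other samples, x_i is duplicated instead.
    The per-class target size is an assumption about how the ratio is applied.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = int(round(ratio * counts.max()))
    out_X, out_y = [X], [y]
    for c, n_c in zip(classes, counts):
        n_new = target - n_c
        if n_new <= 0:
            continue
        X_c = X[y == c]
        # Pairwise Euclidean distances between the samples of class c.
        dists = np.linalg.norm(X_c[:, None, :] - X_c[None, :, :], axis=2)
        synthetic = np.empty((n_new, X.shape[1]))
        for s in range(n_new):
            i = rng.integers(n_c)              # random base sample x_i
            if n_c - 1 < k:
                synthetic[s] = X_c[i]          # modification: too few neighbours, duplicate
                continue
            order = np.argsort(dists[i])
            neighbours = order[order != i][:k]  # K_{x_i}: k nearest, excluding x_i
            j = rng.choice(neighbours)          # \hat{x}_i
            delta = rng.random()                # delta in [0, 1)
            synthetic[s] = X_c[i] + (X_c[j] - X_c[i]) * delta
        out_X.append(synthetic)
        out_y.append(np.full(n_new, c, dtype=y.dtype))
    return np.vstack(out_X), np.concatenate(out_y)
```

With ratio = 1.0 every class ends up with as many samples as the majority class; smaller ratios leave the minority classes partially under-represented, which is the trade-off explored with the ratios 0.4 to 1.0.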

3.4. Extreme Gradient Boosting

Chen and Guestrin [37] proposed Extreme Gradient Boosting (XGBoost), which is summarized as follows. Let $D = \{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ is a vector of m features and $y_i$ is the label. XGBoost is a tree ensemble model, i.e., a set of classification and regression trees (CARTs). The prediction for a sample $x_i$ is:
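$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F} \qquad (2)$$
where $\mathcal{F}$ is the set of all possible CARTs, K is the number of trees in the ensemble and $\hat{y}_i$ is the predicted value. The objective function can be written as follows: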
$$\mathcal{L}(\theta) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k) \qquad (3)$$
where θ denotes the model parameters (the CARTs of the ensemble), $\sum_i l(y_i, \hat{y}_i)$ is the sum of the loss between the true and predicted labels and $\Omega(f)$ is the regularization term that measures the complexity of the model.
At each iteration t, the model searches for a function $f_t$ that optimizes the objective and adds it to the ensemble. Let $\hat{y}_i^{(t)}$ be the prediction for $x_i$ at the t-th iteration. The model finds $f_t$ by optimizing the following objective:
$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t)}\right) + \sum_{k=1}^{t} \Omega(f_k) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{k=1}^{t} \Omega(f_k) \qquad (4)$$
A second-order Taylor expansion is used to approximate the objective at step t:
$$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \left[ \rho_i f_t(x_i) + \frac{1}{2} \varrho_i f_t^2(x_i) \right] + \Omega(f_t) \qquad (5)$$
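where $\rho_i = \partial_{\hat{y}_i^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ and $\varrho_i = \partial^2_{\hat{y}_i^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ are the first- and second-order gradients of the loss, and the constant term $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ has been dropped. The model iteratively adds functions that optimize Equation (4) for a user-specified number of iterations. In addition, Chen and Guestrin defined the regularization as follows: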
$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \qquad (6)$$
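where T is the number of leaves of the tree and $w_j$ is the weight (score) of leaf j. By combining the second-order approximation of Equation (4) with the regularization in Equation (6), XGBoost can compute the optimal leaf weights and the objective value in closed form, which makes the search for $f_t$ efficient and reduces training time.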
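The closed form mentioned above follows from the XGBoost paper [37]: for a fixed tree structure, with $G_j$ and $H_j$ the sums of $\rho_i$ and $\varrho_i$ over the samples in leaf j, the optimal weight is $w_j^* = -G_j/(H_j + \lambda)$ and the objective becomes $-\frac{1}{2}\sum_j G_j^2/(H_j + \lambda) + \gamma T$. The NumPy sketch below evaluates both for a given leaf assignment; it illustrates the formulas rather than the XGBoost library internals, and `leaf_weights_and_score` is a hypothetical name.

```python
import numpy as np


def leaf_weights_and_score(grad, hess, leaf, n_leaves, lam=1.0, gamma=0.0):
    """Optimal leaf weights and structure score for a fixed tree structure.

    grad[i], hess[i]: first- and second-order gradients rho_i and varrho_i.
    leaf[i]: index (0..n_leaves-1) of the leaf that sample i falls into.
    """
    G = np.bincount(leaf, weights=grad, minlength=n_leaves)  # G_j = sum of rho_i in leaf j
    H = np.bincount(leaf, weights=hess, minlength=n_leaves)  # H_j = sum of varrho_i in leaf j
    w = -G / (H + lam)                                        # w_j* = -G_j / (H_j + lambda)
    score = -0.5 * np.sum(G**2 / (H + lam)) + gamma * n_leaves
    return w, score
```

For squared-error loss $l = \frac{1}{2}(y - \hat{y})^2$, the gradients are $\rho_i = \hat{y}_i^{(t-1)} - y_i$ and $\varrho_i = 1$. With targets [1, 2, 3, 4], a previous prediction of 0, two leaves {0, 1} and {2, 3}, and λ = 1, the leaf weights are 1 and 7/3. The score is what XGBoost compares across candidate splits: a split is worth making only if it lowers this value by more than the γ paid for the extra leaf.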

3.5. The Robust Framework

Figure 4 shows the architecture of the FSX framework. First, the SCADI dataset is passed to the preprocessing module, which converts it into the SCADI_2 dataset as described in Section 3.2. This dataset is then divided into training and testing sets by 10-fold cross-validation. Next, the training set is resampled by the SMOTE algorithm summarized in Section 3.3, which helps improve prediction performance. The balanced dataset obtained in the previous step is used to train the XGBoost classifier summarized in Section 3.4. In the testing phase, the trained model predicts the problems of the new cases in the testing set. Based on this prediction, the doctor gives counseling and treatment to patients. The pseudocode of FSX is given in Algorithm 1.
Algorithm 1. FSX framework
Input: The SCADI dataset
Output: The best model for self-care problem identification and its accuracy
Let R = {0.4, 0.6, 0.8, 1.0}
Let ACCbest = 0 and Mbest = empty
Convert SCADI into SCADI_2 using the preprocessing module (Section 3.2)
For each r in R:
    Split SCADI_2 into training and testing sets using k-fold cross-validation
    Balance each training set using SMOTE with the balancing ratio r
    Train XGBr on the balanced training sets
    Evaluate XGBr on the corresponding testing sets to obtain ACCr
    If ACCr > ACCbest then
        ACCbest = ACCr
        Mbest = XGBr
End for
Return Mbest and ACCbest
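The ratio search of Algorithm 1 can be sketched as a small, self-contained Python function. To keep the example runnable without extra packages, two stand-ins are used and should not be read as the authors' components: `random_oversample` (random duplication instead of SMOTE) and `NearestCentroid` (instead of XGBoost); any oversampler with the same signature and any model with `fit`/`predict` can be plugged in. Refitting the final model on the whole balanced dataset is also an assumption, since Algorithm 1 does not say which fold's model becomes Mbest. The key point illustrated is that only the training folds are resampled, so each test fold keeps its original class distribution.

```python
import numpy as np


def random_oversample(X, y, ratio, rng):
    """Stand-in for SMOTE: duplicate random samples up to round(ratio * n_majority)."""
    classes, counts = np.unique(y, return_counts=True)
    target = int(round(ratio * counts.max()))
    idx = [np.arange(len(y))]
    for c, n_c in zip(classes, counts):
        if n_c < target:
            idx.append(rng.choice(np.flatnonzero(y == c), target - n_c))
    idx = np.concatenate(idx)
    return X[idx], y[idx]


class NearestCentroid:
    """Stand-in for the XGBoost classifier: predicts the class with the closest mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]


def fsx_search(X, y, ratios=(0.4, 0.6, 0.8, 1.0), k=10,
               make_model=NearestCentroid, oversample=random_oversample, seed=0):
    """Algorithm 1 sketch: choose the balancing ratio with the best k-fold accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    best_acc, best_ratio = -1.0, None  # -1 so that some ratio is always selected
    for r in ratios:
        correct = 0
        for f in range(k):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(k) if g != f])
            X_tr, y_tr = oversample(X[train], y[train], r, rng)  # resample training fold only
            model = make_model().fit(X_tr, y_tr)
            correct += int((model.predict(X[test]) == y[test]).sum())
        acc = correct / len(y)  # pooled accuracy over all test folds
        if acc > best_acc:
            best_acc, best_ratio = acc, r
    # Assumption: M_best is refitted on all data balanced with the chosen ratio.
    X_bal, y_bal = oversample(X, y, best_ratio, rng)
    return make_model().fit(X_bal, y_bal), best_ratio, best_acc
```

With the real components, imbalanced-learn's `Pipeline` combining `SMOTE` and `xgboost.XGBClassifier`, evaluated with scikit-learn's `cross_val_score`, gives the same fold-wise behavior, because the sampler is applied only when the pipeline is fitted.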

4. Results

4.1. Oversampling Comparison

To choose the best oversampling technique for the SCADI dataset, we compared several oversampling techniques, namely SMOTE [31], SMOTE-ENN [36] and SMOTE-Tomek [36], with various balancing ratios as the oversampling module of the FSX framework. As shown in Table 3, SMOTE with a balancing ratio of 0.8 is the best oversampling technique for the SCADI dataset, yielding 85.4% accuracy.

4.2. Performance Evaluation

This subsection compares the performance of the proposed method with a balancing ratio of 0.8 against the ANN-based system of Reference [27], SVM and Random Forest (RF). As shown in Table 4, the proposed framework improves prediction performance over the state-of-the-art methods ANN, SVM and RF for self-care problem classification in medical diagnosis. The proposed framework reaches the best accuracy, 85.4%, while ANN, SVM and RF achieve 83.1%, 78.0% and 83.4%, respectively. A paired t-test was used to compare the performance of ANN and FSX: we ran both methods on the SCADI dataset 30 times and obtained a p-value of 0.0002 for the two sets of accuracies, which is smaller than the 0.05 significance threshold. Therefore, the accuracy improvement of FSX over ANN is statistically significant.
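For readers who want to reproduce this kind of comparison, the paired t statistic is computed from the run-by-run differences $d_i$ as $t = \bar{d} / (s_d / \sqrt{n})$ with n − 1 degrees of freedom. The sketch below assumes the 30 accuracies of each method are paired by run and uses only the Python standard library; the helper name and the sample numbers are made up for illustration and are not the paper's data. If SciPy is available, `scipy.stats.ttest_rel(a, b)` returns the statistic together with the two-sided p-value.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Paired t-test statistic and degrees of freedom for matched runs a[i], b[i].

    t = mean(d) / (stdev(d) / sqrt(n)), with d_i = a_i - b_i and df = n - 1.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long sequences with at least 2 runs")
    d = [x - y for x, y in zip(a, b)]
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
    return t, len(d) - 1


# Hypothetical accuracies of two methods over four paired runs (made-up numbers).
t, df = paired_t_statistic([0.86, 0.85, 0.87, 0.84], [0.83, 0.83, 0.84, 0.82])
```

The pairing matters: because both methods are evaluated on the same runs, run-to-run variation cancels in $d_i$, so the test can detect smaller consistent differences than an unpaired test would.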
Table 5 shows the sensitivity and specificity of the compared approaches in 10-fold cross-validation. Random Forest achieved the best sensitivity (0.786), while FSX achieved the best specificity (0.963).
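Sensitivity and specificity are defined per class, so a seven-class problem needs an averaging rule, which the text does not state. The sketch below assumes one-vs-rest counts per class followed by a macro average; other choices (for example, weighting by class size) give different numbers. The function name is hypothetical.

```python
def macro_sensitivity_specificity(y_true, y_pred):
    """One-vs-rest sensitivity TP/(TP+FN) and specificity TN/(TN+FP), macro-averaged.

    Assumption: each class is scored against all others and the per-class
    values are averaged with equal weight; undefined ratios are skipped.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    sens, spec = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = len(pairs) - tp - fn - fp
        if tp + fn:
            sens.append(tp / (tp + fn))
        if tn + fp:
            spec.append(tn / (tn + fp))
    return sum(sens) / len(sens), sum(spec) / len(spec)


# Toy three-class example (not SCADI data).
sens, spec = macro_sensitivity_specificity([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```

Under this one-vs-rest view, each class's negatives include all other classes, so true negatives dominate and macro specificity tends to be much higher than macro sensitivity in multi-class problems.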

4.3. Feature Importance

This subsection reports the importance of each feature over the 10-fold cross-validation (Figure 5). Based on these results, the features can be divided into three groups. The first group, F2, F1, F4, F17 and F23, contains the most important features; if any of them is missing, the model may perform poorly. The second group, F28, F6, F25, F12, F3, F27, F31, F16, F29, F13, F5, F20, F9, F7 and F26, is still important for classification performance, although its influence is smaller than that of the first group. The remaining features, F18, F14, F8, F21, F11, F24, F15, F30, F22, F19 and F10, contribute little and could be removed, which suggests that the features in this third group may not need to be recorded during data collection.

5. Conclusions

This study proposed a robust framework, FSX, to improve prediction performance on SCADI, a standard dataset for self-care problem classification in medical diagnosis. The framework consists of three modules: a preprocessing module, an oversampling module using SMOTE and a classification module using the extreme gradient boosting model. The preprocessing module converts the original dataset into a modified dataset with only 31 features. The second module balances the training dataset to improve prediction performance. The last module uses a popular classification model, XGBoost, to achieve the best performance. The first experiment compared several well-known oversampling techniques, SMOTE, SMOTE-ENN and SMOTE-Tomek, with various balancing ratios to find the best oversampling method and the optimal balancing ratio for the oversampling module. The experimental results show that the proposed framework outperforms state-of-the-art methods on the SCADI dataset, reaching 85.4% accuracy for self-care problem classification in medical diagnosis.
In future work, we will first develop feature selection approaches to improve the performance of our framework. Second, we will study further sampling techniques, including oversampling and undersampling, to enhance performance.

Author Contributions

S.W.B. proposed the topic and obtained funding; T.L. proposed and implemented the framework. T.L. wrote the paper. S.W.B. improved the quality of the manuscript.

Acknowledgments

This research was supported by the Korean MSIT (Ministry of Science and ICT) under the National Program for Excellence in SW (2015-0-00938), supervised by the IITP (Institute for Information & communications Technology Promotion).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Le, T.; Le, H.S.; Vo, M.T.; Lee, M.Y.; Baik, S.W. A Cluster-Based Boosting Algorithm for Bankruptcy Prediction in a Highly Imbalanced Dataset. Symmetry 2018, 10, 250.
  2. Le, T.; Lee, M.Y.; Park, J.R.; Baik, S.W. Oversampling techniques for bankruptcy prediction: Novel features from a transaction dataset. Symmetry 2018, 10, 79.
  3. Le, T.; Vo, B.; Baik, S.W. Efficient algorithms for mining top-rank-k erasable patterns using pruning strategies and the subsume concept. Eng. Appl. Artif. Intell. 2018, 68, 1–9.
  4. Roan, T.N.; Ali, M.; Le, H.S. δ-equality of intuitionistic fuzzy sets: A new proximity measure and applications in medical diagnosis. Appl. Intell. 2018, 48, 499–525.
  5. Le, H.S.; Tran, M.T.; Fujita, H.; Dey, N.; Ashour, A.S.; Vo, T.N.N.; Le, Q.A.; Chu, D.T. Dental diagnosis from X-Ray images: An expert system based on fuzzy computing. Biomed. Signal Process. Control 2018, 39, 64–73.
  6. Ali, M.; Le, H.S.; Khan, M.; Nguyen, T.T. Segmentation of dental X-ray images in medical imaging using neutrosophic orthogonal matrices. Expert Syst. Appl. 2018, 91, 434–441.
  7. Vajda, S.; Karargyris, A.; Jäger, S.; Santosh, K.C.; Candemir, S.; Xue, Z.; Antani, S.K.; Thoma, G.R. Feature Selection for Automatic Tuberculosis Screening in Frontal Chest Radiographs. J. Med. Syst. 2018, 42, 146.
  8. Lan, K.; Wang, D.; Fong, S.; Liu, L.; Wong, K.; Dey, N. A Survey of Data Mining and Deep Learning in Bioinformatics. J. Med. Syst. 2018, 42, 139.
  9. Goshvarpour, A.; Goshvarpour, A. A Novel Feature Level Fusion for Heart Rate Variability Classification Using Correntropy and Cauchy-Schwarz Divergence. J. Med. Syst. 2018, 42, 109.
  10. Pham, N.T.; Lee, J.W.; Kwon, G.R.; Park, C.S. Efficient image splicing detection algorithm based on markov features. Multimed. Tools Appl. 2018.
  11. Le, D.H.; Pham, V.H. HGPEC: A Cytoscape app for prediction of novel disease-gene and disease-disease associations and evidence collection based on a random walk on heterogeneous network. BMC Syst. Biol. 2017, 11, 61.
  12. Le, D.H.; Dao, L.T.M. Annotating Diseases Using Human Phenotype Ontology Improves Prediction of Disease-Associated Long Non-coding RNAs. J. Mol. Biol. 2018, 430, 2219–2230.
  13. Malmir, B.; Amini, M.; Chang, S.I. A medical decision support system for disease diagnosis under uncertainty. Expert Syst. Appl. 2017, 88, 95–108.
  14. Eshtay, M.; Faris, H.; Obeid, N. Improving Extreme Learning Machine by Competitive Swarm Optimization and its application for medical diagnosis problems. Expert Syst. Appl. 2018, 104, 134–152.
  15. Turgeman, L.; May, J.; Sciulli, R. Insights from a machine learning model for predicting the hospital length of stay (los) at the time of admission. Expert Syst. Appl. 2017, 78, 376–385.
  16. Liu, Y.; Chen, Y.; Tzeng, G.H. Identification of key factors in consumers’ adoption behavior of intelligent medical terminals based on a hybrid modified MADM model for product improvement. Int. J. Med. Inform. 2017, 105, 68–82.
  17. Mustaqeem, A.; Anwar, S.M.; Khan, A.R.; Majid, M. A statistical analysis-based recommender model for heart disease patients. Int. J. Med. Inform. 2017, 108, 134–145.
  18. Lucini, F.R.; Fogliatto, F.S.; Silveira, G.J.C.; Neyeloff, J.; Anzanello, M.J.; Kuchenbecker, R.S.; Schaan, B.D. Text mining approach to predict hospital admissions using early medical records from the emergency department. Int. J. Med. Inform. 2017, 100, 1–8.
  19. Lewis Brown, R.; Turner, R.J. Physical disability and depression: Clarifying racial/ethnic contrasts. J. Aging Health 2010, 22, 977–1000.
  20. Lollar, D.J.; Simeonsson, R.J. Diagnosis to function: Classification for children and youths. J. Dev. Behav. Pediatrics 2005, 26, 323–330.
  21. Lee, A.M. Using the ICF-CY to organise characteristics of children’s functioning. Disabil. Rehabil. 2011, 33, 605–616.
  22. Ståhl, Y.; Granlund, M.; Gäre-Andersson, B.; Enskär, K. Review article: Mapping of children’s health and development data on population level using the classification system ICF-CY. Scand. J. Public Health 2011, 39, 51–57.
  23. World Health Organization. International Classification of Functioning, Disability, and Health: Children & Youth Version: ICF-CY; World Health Organization: Geneva, Switzerland, 2007.
  24. Case-Smith, J. Self-care Strategies for Children with Developmental Disabilities. In Ways of Living: Self-Care Strategies for Special Needs, 2nd ed.; Christiansen, C., Ed.; American Occupational Therapy Association: Bethesda, MD, USA, 2000; pp. 83–121.
  25. Ijaz, M.; Alfian, G.; Syafrudin, M.; Rhee, J. Hybrid Prediction Model for Type 2 Diabetes and Hypertension Using DBSCAN-Based Outlier Detection, Synthetic Minority Over Sampling Technique (SMOTE), and Random Forest. Appl. Sci. 2018, 8, 1325.
  26. Bang, J.; Hur, T.; Kim, D.; Lee, J.; Han, Y.; Banos, O.; Kim, J.I.; Lee, S. Adaptive Data Boosting Technique for Robust Personalized Speech Emotion in Emotionally-Imbalanced Small-Sample Environments. Sensors 2018, 18, 3744.
  27. Zarchi, M.S.; Fatemi Bushehri, S.M.M.; Dehghanizadeh, M. SCADI: A standard dataset for self-care problems classification of children with physical and motor disability. Int. J. Med. Inform. 2018, 114, 81–87.
  28. Fernández, A.; García, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Learning from Imbalanced Data Sets; Springer: Berlin, Germany, 2018; pp. 1–377. ISBN 978-3-319-98073-7.
  29. Lin, Y.; Lee, Y.; Wahba, G. Support vector machines for classification in nonstandard situations. Mach. Learn. 2002, 46, 191–202.
  30. Liu, B.; Ma, Y.; Wong, C. Improving an association rule-based classifier. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, PKDD, Lyon, France, 13–16 September 2000; pp. 293–317.
  31. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  32. Lemaitre, G.; Nogueira, F.; Aridas, C.K. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. J. Mach. Learn. Res. 2017, 18, 1–5.
  33. Chawla, N.; Cieslak, D.; Hall, L.; Joshi, A. Automatically countering imbalance and its empirical relationship to cost. Data Min. Knowl. Discov. 2008, 17, 225–252.
  34. Ling, C.; Sheng, V.; Yang, Q. Test strategies for cost-sensitive decision trees. IEEE Trans. Knowl. Data Eng. 2006, 18, 1055–1067.
  35. Galar, M.; Fernández, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A review on ensembles for class imbalance problem: Bagging, boosting and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C 2012, 42, 463–484.
  36. Batista, G.; Prati, R.C.; Monard, M.C. A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29.
  37. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Figure 1. Visualization of SCADI dataset.
Figure 2. The flowchart of SMOTE algorithm.
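Figure 2 outlines SMOTE [31]. Since the excerpt does not include the authors' code, the core interpolation step is sketched below in plain Python as an illustration only. The function names `euclidean` and `smote` are hypothetical, and this simplified variant picks base samples at random rather than cycling through every minority sample as the original paper does.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=None):
    """Generate n_synthetic samples by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    # Precompute the k nearest minority neighbours (by index) of every sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # base minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # uniform in [0, 1)
        x, nn = minority[i], minority[j]
        # New point on the segment between x and its neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

For example, `smote([[0, 0], [1, 0], [0, 1], [1, 1]], 10, k=2, seed=0)` returns ten new points that all lie inside the unit square spanned by the four minority samples.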
Figure 3. Visualization of SCADI_2 dataset after resampling data using SMOTE with different balancing ratios.
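Figure 3 shows SCADI_2 after SMOTE with different balancing ratios. The excerpt does not define the ratio for the seven-class case, so the helper below assumes one plausible reading: every class is oversampled until it holds at least `ratio` times as many samples as the largest class. `balancing_targets` is a hypothetical name, not part of the paper or of any library.

```python
import math
from collections import Counter


def balancing_targets(labels, ratio):
    """Hypothetical helper: return {class: target_count} so that every class
    holds at least ratio * (majority class count) samples after oversampling.
    Classes that already meet the target are left unchanged."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must lie in (0, 1]")
    counts = Counter(labels)
    majority = max(counts.values())
    # round() guards against float noise such as 0.30000000000000004
    target = math.ceil(round(ratio * majority, 9))
    return {cls: max(n, target) for cls, n in counts.items()}
```

With 10, 3 and 6 samples in three classes, a ratio of 0.8 yields targets of 10, 8 and 8. A dictionary of this `{class: target_count}` form is what imbalanced-learn's `SMOTE(sampling_strategy=...)` accepts for multiclass data; a float `sampling_strategy` is only supported there for binary problems.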
Figure 4. The FSX framework.
Figure 5. Feature importance.
Table 1. The SCADI dataset of self-care activities.

| Cat No. | Self-Care Category | Activity No. | Description | Feature Name |
|---|---|---|---|---|
| I | Washing oneself | 1 | Washing body parts | F3 |
| | | 2 | Washing whole body | F4 |
| | | 3 | Drying oneself | F5 |
| II | Caring for body parts | 4 | Caring for skin | F6 |
| | | 5 | Caring for teeth | F7 |
| | | 6 | Caring for hair | F8 |
| | | 7 | Caring for fingernails | F9 |
| | | 8 | Caring for toenails | F10 |
| | | 9 | Caring for nose | F11 |
| III | Toileting | 10 | Indicating need for urination | F12 |
| | | 11 | Carrying out urination appropriately | F13 |
| | | 12 | Indicating need for defecation | F14 |
| | | 13 | Carrying out defecation appropriately | F15 |
| | | 14 | Menstrual care | F16 |
| IV | Dressing | 15 | Putting on clothes | F17 |
| | | 16 | Taking off clothes | F18 |
| | | 17 | Putting on footwear | F19 |
| | | 18 | Taking off footwear | F20 |
| | | 19 | Choosing appropriate clothing | F21 |
| V | Eating | 20 | Indicating need for eating | F22 |
| | | 21 | Carrying out eating appropriately | F23 |
| VI | Drinking | 22 | Indicating need for drinking | F24 |
| | | 23 | Indicating need for drinking | F25 |
| VII | Looking after one’s health | 24 | Ensuring one’s physical comfort | F26 |
| | | 25 | Managing diet and fitness | F27 |
| | | 26 | Managing medications and following health advice | F28 |
| | | 27 | Seeking advice or assistance from caregivers or professionals | F29 |
| | | 28 | Avoiding risks of abuse of drugs or alcohol | F30 |
| VIII | Looking after one’s safety | 29 | Looking after one’s safety | F31 |
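Table 1 maps the 29 self-care activities to features F3–F31. The framework's first step reduces the dimensionality of the raw data, but this excerpt does not spell out how. The sketch below assumes, without confirmation from the paper, that each activity is stored as seven one-hot columns (one per ICF-CY qualifier code 0, 1, 2, 3, 4, 8, 9) and collapses every block into a single feature named after Table 1; `QUALIFIER_CODES` and `collapse_activities` are hypothetical names.

```python
# Assumed qualifier codes: one binary column per code for each activity.
QUALIFIER_CODES = (0, 1, 2, 3, 4, 8, 9)


def collapse_activities(row, n_activities=29, codes=QUALIFIER_CODES):
    """Collapse one row of one-hot activity columns into one code per activity.

    row: flat list of 0/1 values, ordered activity by activity, with
    len(row) == n_activities * len(codes).
    Returns {"F3": code, ..., "F31": code}, following Table 1's naming."""
    width = len(codes)
    if len(row) != n_activities * width:
        raise ValueError("unexpected number of activity columns")
    collapsed = {}
    for a in range(n_activities):
        block = row[a * width:(a + 1) * width]
        if sum(block) != 1:
            raise ValueError(f"activity {a + 1} is not one-hot encoded")
        # Activity 1 becomes F3, activity 29 becomes F31.
        collapsed[f"F{a + 3}"] = codes[block.index(1)]
    return collapsed
```

Under this assumption, 29 × 7 = 203 binary columns shrink to 29 ordinal features, which matches the feature range F3–F31 listed in Table 1.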
Table 2. Classes of self-care problems in the SCADI dataset.

| No. | Description | Notation |
|---|---|---|
| 1 | Caring for body parts problem | Class #1 |
| 2 | Toileting problem | Class #2 |
| 3 | Dressing problem | Class #3 |
| 4 | Washing oneself, caring for body parts and dressing problem | Class #4 |
| 5 | Washing oneself, caring for body parts, toileting and dressing problem | Class #5 |
| 6 | Eating, drinking, washing oneself, caring for body parts, toileting, dressing, looking after one’s health and looking after one’s safety problem | Class #6 |
| 7 | No problem | Class #7 |
Table 3. Performance results of several oversampling techniques with various balancing ratios.

| Oversampling Technique | Balancing Ratio | Accuracy |
|---|---|---|
| SMOTE | 0.4 | 0.837 |
| | 0.6 | 0.837 |
| | 0.8 | 0.854 |
| | 1.0 | 0.839 |
| SMOTE-ENN | 0.4 | 0.724 |
| | 0.6 | 0.812 |
| | 0.8 | 0.822 |
| | 1.0 | 0.807 |
| SMOTE-Tomek | 0.4 | 0.837 |
| | 0.6 | 0.837 |
| | 0.8 | 0.853 |
| | 1.0 | 0.839 |
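Table 3 compares plain SMOTE with two hybrid methods from [36] that clean the data after oversampling: SMOTE-ENN applies Edited Nearest Neighbours, and SMOTE-Tomek removes Tomek links. The two cleaning rules are sketched below in plain Python for illustration only; the function names are hypothetical, and the ENN variant shown keeps a sample only when a strict majority of its k nearest neighbours share its label.

```python
import math
from collections import Counter


def nearest(X, i, k):
    """Indices of the k nearest neighbours of X[i], excluding i itself.
    Ties are broken by index because list.sort is stable."""
    others = [j for j in range(len(X)) if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def tomek_links(X, y):
    """Pairs (i, j), i < j, of opposite-class samples that are each
    other's single nearest neighbour."""
    nn = [nearest(X, i, 1)[0] for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def enn_keep(X, y, k=3):
    """Edited Nearest Neighbours (majority-vote form): keep sample i only
    if a strict majority of its k nearest neighbours share its label."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(X, i, k))
        if votes[y[i]] * 2 > k:
            keep.append(i)
    return keep
```

On the one-dimensional toy data `X = [[0.0], [1.0], [5.0], [5.5], [10.0]]`, `y = [0, 0, 0, 1, 1]`, the only Tomek link is `(2, 3)`, and ENN with k = 3 keeps indices 0, 1 and 2. Libraries such as imbalanced-learn [32] offer other variants (for example, requiring all neighbours to agree, or removing only the majority-class member of a Tomek link), so results like those in Table 3 may depend on these settings.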
Table 4. Performance results in terms of the accuracy of the experimental approaches in 10-fold cross-validation.

| Fold | ANN | FSX | SVM | RF |
|---|---|---|---|---|
| Fold 1 | 0.727 | 0.727 | 0.818 | 0.636 |
| Fold 2 | 1.000 | 0.800 | 0.750 | 1.000 |
| Fold 3 | 0.857 | 0.857 | 0.857 | 1.000 |
| Fold 4 | 1.000 | 1.000 | 0.833 | 0.833 |
| Fold 5 | 0.857 | 0.857 | 0.857 | 0.857 |
| Fold 6 | 0.571 | 0.771 | 0.714 | 0.714 |
| Fold 7 | 0.800 | 1.000 | 0.800 | 1.000 |
| Fold 8 | 0.667 | 0.889 | 0.667 | 0.667 |
| Fold 9 | 1.000 | 0.800 | 0.800 | 1.000 |
| Fold 10 | 0.667 | 0.833 | 0.833 | 0.667 |
| Average | 0.815 | 0.854 | 0.793 | 0.837 |
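The Average row in Table 4 is the mean over the ten folds. The snippet below recomputes it, together with the fold-to-fold sample standard deviation, from the values as printed; `FOLD_ACCURACY` and `summarize` are illustrative names.

```python
from statistics import mean, stdev

# Per-fold accuracies copied from Table 4 (three decimals, as printed).
FOLD_ACCURACY = {
    "ANN": [0.727, 1.000, 0.857, 1.000, 0.857, 0.571, 0.800, 0.667, 1.000, 0.667],
    "FSX": [0.727, 0.800, 0.857, 1.000, 0.857, 0.771, 1.000, 0.889, 0.800, 0.833],
    "SVM": [0.818, 0.750, 0.857, 0.833, 0.857, 0.714, 0.800, 0.667, 0.800, 0.833],
    "RF":  [0.636, 1.000, 1.000, 0.833, 0.857, 0.714, 1.000, 0.667, 1.000, 0.667],
}


def summarize(scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return mean(scores), stdev(scores)


for model, scores in FOLD_ACCURACY.items():
    m, s = summarize(scores)
    print(f"{model}: mean={m:.3f} std={s:.3f}")
```

Averaging the printed three-decimal values reproduces 0.815 (ANN), 0.793 (SVM) and 0.837 (RF), but gives 0.853 rather than 0.854 for FSX, presumably because the reported average was computed from unrounded fold scores.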
Table 5. Performance results in terms of sensitivity and specificity of the experimental approaches in 10-fold cross-validation.

| Fold | Sensitivity (ANN) | Sensitivity (FSX) | Sensitivity (SVM) | Sensitivity (RF) | Specificity (ANN) | Specificity (FSX) | Specificity (SVM) | Specificity (RF) |
|---|---|---|---|---|---|---|---|---|
| Fold 1 | 0.643 | 0.875 | 0.714 | 0.428 | 0.948 | 0.958 | 0.966 | 0.935 |
| Fold 2 | 1.000 | 0.700 | 0.600 | 1.000 | 1.000 | 0.975 | 0.950 | 1.000 |
| Fold 3 | 0.875 | 1.000 | 0.875 | 1.000 | 0.938 | 1.000 | 0.958 | 1.000 |
| Fold 4 | 1.000 | 0.583 | 0.889 | 0.889 | 1.000 | 0.955 | 0.933 | 0.930 |
| Fold 5 | 0.600 | 0.572 | 0.600 | 0.750 | 0.971 | 0.944 | 0.971 | 0.958 |
| Fold 6 | 0.500 | 0.667 | 0.625 | 0.625 | 0.875 | 0.889 | 0.917 | 0.917 |
| Fold 7 | 0.889 | 0.600 | 0.667 | 1.000 | 0.917 | 0.967 | 0.950 | 1.000 |
| Fold 8 | 0.572 | 1.000 | 0.571 | 0.571 | 0.940 | 1.000 | 0.946 | 0.940 |
| Fold 9 | 1.000 | 1.000 | 0.500 | 1.000 | 1.000 | 1.000 | 0.950 | 1.000 |
| Fold 10 | 0.600 | 0.572 | 0.800 | 0.600 | 0.910 | 0.939 | 0.960 | 0.920 |
| Average | 0.768 | 0.757 | 0.684 | 0.786 | 0.950 | 0.963 | 0.950 | 0.960 |
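Sensitivity and specificity are defined per class, yet Table 5 reports a single value per fold for a seven-class problem. The excerpt does not state the averaging scheme, so the sketch below assumes macro-averaged one-vs-rest scores; `macro_sens_spec` is a hypothetical helper. Under this scheme each class's negatives include every other class, so TN tends to be large and specificity tends to be high, which is consistent with the 0.95–0.96 averages in the table.

```python
def macro_sens_spec(y_true, y_pred):
    """Macro-averaged one-vs-rest sensitivity TP/(TP+FN) and
    specificity TN/(TN+FP) over all classes that occur."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    sens, spec = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = n - tp - fn - fp
        if tp + fn:   # sensitivity is undefined for classes absent from y_true
            sens.append(tp / (tp + fn))
        if tn + fp:   # specificity is undefined if every sample is class c
            spec.append(tn / (tn + fp))
    return sum(sens) / len(sens), sum(spec) / len(spec)
```

For `y_true = [0, 0, 1, 1, 2, 2]` and `y_pred = [0, 1, 1, 1, 2, 0]`, the per-class sensitivities are 0.5, 1.0 and 0.5 and the specificities are 0.75, 0.75 and 1.0, giving macro averages of 2/3 and 5/6.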

MDPI and ACS Style

Le, T.; Baik, S.W. A Robust Framework for Self-Care Problem Identification for Children with Disability. Symmetry 2019, 11, 89. https://doi.org/10.3390/sym11010089

Chicago/Turabian Style

Le, Tuong, and Sung Wook Baik. 2019. "A Robust Framework for Self-Care Problem Identification for Children with Disability" Symmetry 11, no. 1: 89. https://doi.org/10.3390/sym11010089

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
