Applied Sciences | Article | Open Access | 13 April 2022

An Empirical Assessment of Performance of Data Balancing Techniques in Classification Task

1 Symbiosis Centre for Information Technology, Symbiosis International (Deemed University), Pune 411057, India
2 Faculty of Computers and Information, South Valley University, Qena 83523, Egypt
3 Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
4 Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia

Abstract

Many real-world classification problems, such as fraud detection, intrusion detection, churn prediction, and anomaly detection, suffer from the problem of imbalanced datasets. Therefore, in all such classification tasks, we need to balance the imbalanced datasets before building classifiers for prediction purposes. Several data-balancing techniques (DBT) have been discussed in the literature to address this issue. However, little work has been conducted to assess the performance of DBT. Therefore, in this research paper we empirically assess the performance of the data-preprocessing-level data-balancing techniques, namely Under Sampling (US), Over Sampling (OS), Hybrid Sampling (HS), Random Over Sampling Examples (ROSE), Synthetic Minority Over-Sampling Technique (SMOTE), and Clustering-Based Under Sampling (CBUS). We used six different classifiers and twenty-five different datasets with varying levels of imbalance ratio (IR) to assess the performance of DBT. The experimental results indicate that DBT helps to improve the performance of the classifiers. However, no significant difference was observed in the performance of US, OS, HS, SMOTE, and CBUS. It was also observed that the performance of DBT was not consistent across varying levels of IR in the dataset or across different classifiers.

1. Introduction

Classification is a supervised machine learning (ML) technique used to predict the class label of unseen data by building a classifier from historical data. Classification algorithms usually work under the assumption that the dataset used to build the classifier is balanced. However, many real-world datasets are highly imbalanced. An imbalanced dataset is one in which, with respect to the target class variable, one class greatly outnumbers the other classes. For example, consider a dataset that contains 1000 transactions, of which 990 are nonfraudulent and only 10 are fraudulent; this is a good example of a highly imbalanced dataset. Many such examples of imbalanced datasets in classification tasks are discussed in the literature, including software product defect detection [1], survival prediction of hepatocellular carcinoma patients [2], customer churn prediction [3], predicting freshmen student attrition [4], insurance fraud detection [5], and intrusion and crime detection [6]. When we build a classifier using a highly imbalanced dataset, the classifier is usually biased towards the majority class; that is, it will be better at correctly predicting majority class cases than minority class cases. In practice, however, we expect a classifier to be unbiased and equally good at correctly predicting both minority and majority cases. Therefore, balancing an imbalanced dataset is an important preprocessing activity, because it helps to reduce bias in the model's predictions and thereby enhances the classifier's performance.
To address the problem of imbalanced datasets in classification tasks, several solutions have been proposed in the literature [7,8,9]. These solutions are broadly divided into four categories: data-preprocessing-level solutions, cost-sensitive learning methods, algorithm-level solutions, and ensemble methods. Data-preprocessing-level solutions are based on resampling the original data. Because resampling is performed before the classifier is built, resampling techniques are easy to implement and independent of the classifier. Cost-sensitive learning approaches take into account the different costs of misclassifying majority and minority class instances. Algorithm-level solutions either propose a new algorithm or modify existing ones; they are algorithm-dependent and require a detailed understanding of the algorithm to implement, which makes them less popular than resampling techniques. Ensemble solutions combine ensemble (bagging and boosting) models with resampling techniques or a cost-sensitive approach [7,8].
Although several solutions have been proposed in the literature to deal with the imbalanced dataset problem in classification tasks, there is a lack of research assessing the performance of DBT [7]. Because a large number of solutions have been proposed, it is difficult to assess the performance of all of them. Therefore, we limited the scope of this study to assessing the performance of resampling techniques, which balance the imbalanced dataset at the data-preprocessing level. We chose resampling techniques because they are very widely used in the literature to deal with imbalanced dataset problems in classification tasks.
The objectives of this study are: (1) to assess the performance of DBT used to balance imbalanced datasets; (2) to assess whether the performance of DBT is independent of the level of imbalance ratio in the dataset; (3) to assess whether the performance of DBT is independent of the classifier; and (4) to assess whether DBT helps to improve the performance of the classifiers.
The rest of the paper is organized as follows. Section 2 describes the theoretical background and related work. In Section 3, we discuss the experimental setup. The results of the experiment are analyzed and discussed in Section 4. Finally, the paper is concluded in Section 5.

3. Experimental Setup and Datasets

We used six different classifiers, namely Decision Tree (C4.5), k-Nearest Neighbor (kNN), Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM), to assess the performance of DBT, rather than relying on a single classifier. Using six different classifiers also helps us to understand whether the performance of DBT varies across classifiers or remains the same. In this study, we used 25 different small datasets with varying levels of IR. All datasets were downloaded from the KEEL dataset repository [42]. Information about the datasets is given in Table 1, and more details are available at https://sci2s.ugr.es/keel/imbalanced.php (accessed on 3 November 2021). The last column in Table 1 is the Imbalance Ratio (IR), i.e., the ratio of the number of majority class instances to the number of minority class instances.
Table 1. Dataset Information.
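As an aside on the last column of Table 1, the IR of a binary dataset can be computed directly from the class counts. The following is a minimal sketch in base R; the data frame and its target column name ('Class') are illustrative placeholders rather than names taken from the KEEL files.

```r
# Imbalance Ratio (IR) = number of majority class instances /
#                        number of minority class instances.
# 'df' and its target column 'Class' are illustrative placeholder names.
imbalance_ratio <- function(df, target = "Class") {
  counts <- table(df[[target]])            # class frequencies
  as.numeric(max(counts) / min(counts))    # majority count over minority count
}

# Example: 990 nonfraudulent vs. 10 fraudulent transactions gives IR = 99.
toy <- data.frame(Class = factor(c(rep("nonfraud", 990), rep("fraud", 10))))
imbalance_ratio(toy)   # 99
```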
We built a total of 1050 classifiers (25 datasets × 7 strategies × 6 classifiers) using the open-source 'R' software. To build each classification model and assess its performance, the following process was used: (i) divide the dataset into training and test sets, with the training set containing 80% of the data and the test set 20%; (ii) apply the DBT to the training set; (iii) build the classification model on the balanced training set; (iv) test the performance of the model on the test set. The performance of the classifier was measured using the area under the ROC curve (AUC value) [43]. To train the classifiers, we used the default hyperparameter settings of the caret package in R. No specific hyperparameter tuning was performed, as the objective of this study was not to improve the performance of the classifiers but to assess the performance of DBT. More details about the caret package are given by Max Kuhn et al. [44].
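The four steps above can be summarised in a short R sketch. It assumes the ROSE package [15] for the US, OS, HS, and ROSE strategies, caret for training, and pROC for the AUC; the toy data frame, its column names, and the choice of "rf" as the caret method are illustrative assumptions, not the authors' code.

```r
library(caret)   # createDataPartition(), train()
library(ROSE)    # ovun.sample(), ROSE()
library(pROC)    # roc(), auc()

set.seed(42)
# Toy imbalanced dataset standing in for a KEEL dataset (480 vs. 20 cases).
dat <- data.frame(
  x1    = c(rnorm(480, 0), rnorm(20, 2)),
  x2    = c(rnorm(480, 0), rnorm(20, 2)),
  Class = factor(c(rep("negative", 480), rep("positive", 20)))
)

# (i) 80/20 stratified split into training and test sets.
idx       <- createDataPartition(dat$Class, p = 0.8, list = FALSE)
train_set <- dat[idx, ]
test_set  <- dat[-idx, ]

# (ii) Balance the training set only, e.g. random under-sampling (US).
#      method = "over" gives OS and method = "both" gives HS;
#      ROSE(Class ~ ., data = train_set)$data gives the ROSE strategy;
#      SMOTE (e.g. the smotefamily package) and a clustering-based
#      under-sampling routine would replace this step for SMOTE and CBUS.
balanced <- ovun.sample(Class ~ ., data = train_set, method = "under")$data

# (iii) Build the classifier on the balanced training set
#       with default caret settings ("rf" = random forest here).
fit <- train(Class ~ ., data = balanced, method = "rf")

# (iv) Evaluate on the untouched test set using the area under the ROC curve.
prob <- predict(fit, newdata = test_set, type = "prob")[, "positive"]
auc(roc(test_set$Class, prob))
```

Steps (ii) to (iv) are repeated for each of the seven strategies, each of the six classifiers, and each of the 25 datasets to obtain the 1050 models.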
We used the Friedman test to compare the performance of the different DBT, based on their average ranks across the datasets, separately for each classifier [45,46]. The Friedman test statistic tells us whether there is a significant difference in the performance of the DBT for a given classifier [46]. To identify where those differences lie, we applied the post hoc Nemenyi test [46,47], which tells us which DBT differ significantly from one another with respect to their performance in the classification task. We used Kendall's test statistic [48] to test the agreement between the rankings of DBT, based on their performance in the classification task, across classifiers and for varying levels of IR in the dataset; Kendall's test has previously been used in a similar way to compare data imputation methods [49]. If the value of Kendall's 'w' is 1, there is complete agreement over the ranking, and if it is 0, there is no agreement over the ranking.
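The three test statistics can be reproduced with a few lines of R. The sketch below assumes the PMCMRplus package for the Nemenyi post hoc test and DescTools for Kendall's W; these package choices, and the randomly generated matrices, are illustrative assumptions rather than the authors' actual setup (the Friedman test itself is in base R).

```r
library(PMCMRplus)   # frdAllPairsNemenyiTest()  -- assumed package choice
library(DescTools)   # KendallW()                -- assumed package choice

set.seed(1)
# Illustrative 25 x 7 matrix of AUC values for one classifier:
# one row per dataset, one column per strategy.
strategies <- c("None", "US", "OS", "HS", "ROSE", "SMOTE", "CBUS")
auc <- matrix(runif(25 * 7, 0.5, 1), nrow = 25,
              dimnames = list(NULL, strategies))

friedman.test(auc)            # is there any significant difference among the DBT?
frdAllPairsNemenyiTest(auc)   # which pairs of DBT differ significantly?

# Kendall's coefficient of concordance W: agreement over rankings, e.g. the
# ranks that the six classifiers (columns, the "raters") assign to the seven
# DBT (rows). Here the ranks are random, so W should be close to 0.
dbt_ranks <- replicate(6, sample(1:7))
rownames(dbt_ranks) <- strategies
KendallW(dbt_ranks, correct = TRUE, test = TRUE)   # W = 1: full agreement; 0: none
```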

4. Results and Discussion

4.1. Performance of DBT

Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 report the performance of the six classifiers, namely DT, kNN, LR, NB, RF, and SVM, for the different data-balancing strategies, namely None, US, OS, HS, ROSE, SMOTE, and CBUS. The performance of the classifier is measured using the area under the receiver operating characteristic curve, i.e., the AUC value. The first column in each table gives the name of the imbalanced dataset. The second column gives the performance of the classifier without balancing the imbalanced dataset (the None strategy). Columns 3 to 8 give the performance of the classifier under the US, OS, HS, ROSE, SMOTE, and CBUS strategies. The mean rank of the DBT, based on their performance over the 25 datasets, together with the Friedman test statistics, is reported in the last two rows of each table. Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13 report the post hoc analysis using the Nemenyi multiple comparison test, which is used to identify which DBT differ significantly in performance.
Table 2. Performance of DBT for DT classifier.
Table 3. Performance of DBT for kNN classifier.
Table 4. Performance of DBT for LR.
Table 5. Performance of DBT for NB classifier.
Table 6. Performance of DBT for RF.
Table 7. Performance of DBT for SVM.
Table 8. Post hoc analysis using Nemenyi multiple comparison test for DT.
Table 9. Post hoc analysis using Nemenyi multiple comparison test for kNN.
Table 10. Post hoc analysis using Nemenyi multiple comparison test for LR.
Table 11. Post hoc analysis using Nemenyi multiple comparison test for NB classifier.
Table 12. Post hoc analysis using Nemenyi multiple comparison test for RF.
Table 13. Post hoc analysis using Nemenyi multiple comparison test for SVM.
The Friedman test statistics show that for all the classifiers the 'p' value is less than 0.05, so we can say that, statistically, there is a significant difference in the performance of the DBT. As the difference in performance is significant, the Nemenyi test was then applied to find which DBT differ significantly. The following are our observations based on the Friedman statistics and the Nemenyi post hoc analysis:
  • For the DT classifier, the performance of the None strategy was poor and significantly different from US, OS, HS, SMOTE, and CBUS. However, no difference in performance was observed between the None and ROSE strategies. Further, no significant difference was observed in the performance of US, OS, HS, SMOTE, and CBUS.
  • For the kNN classifier, the performance of the None strategy was poor and significantly different from OS, HS, ROSE, SMOTE, and CBUS. However, no difference in performance was observed between the None and US strategies. Further, no significant difference was observed in the performance of OS, HS, ROSE, and CBUS, whereas a significant difference was observed between the US and OS strategies.
  • For the LR classifier, the performance of the None strategy was poor and significantly different from US, OS, HS, and SMOTE. However, no significant difference in performance was observed between None, ROSE, and CBUS. It was also observed that there was no difference in the performance of US, OS, HS, ROSE, SMOTE, and CBUS.
  • For the NB classifier, the performance of the None strategy was poor and significantly different from US, SMOTE, and CBUS. However, no difference in performance was observed between None, OS, HS, and ROSE. Further, no significant difference was observed in the performance of US, SMOTE, and CBUS.
  • For the RF classifier, the performance of the None strategy was poor and significantly different from the US and CBUS strategies. However, no difference was observed in the performance of None, OS, HS, ROSE, and SMOTE. Further, no significant difference was observed in the performance of US, HS, SMOTE, and CBUS.
  • For SVM, the performance of the None strategy was poor and significantly different from US, OS, HS, SMOTE, and CBUS. However, no significant difference was observed in the performance of None and ROSE. Further, no significant difference was observed in the performance of US, OS, HS, ROSE, SMOTE, and CBUS.
Therefore, from all the observations above, we can infer that: (i) the performance of the None and ROSE strategies is poor and significantly different from that of the others; (ii) no significant difference was observed in the performance of the US, OS, HS, SMOTE, and CBUS strategies. Dealing with imbalanced datasets is a very common problem in classification tasks, and which DBT is most suitable for enhancing the performance of the classifier is the most common question that needs to be answered. In this section, we have attempted to answer this question by applying data-preprocessing-level DBT to 25 different datasets using six different classifiers. From the results of the experiment and the statistical analysis, we can infer the following: (i) balancing the imbalanced dataset certainly helps to improve the performance of the classifier; (ii) for the DT classifier, the CBUS and US techniques give better performance; (iii) for logistic regression, the SMOTE and OS techniques give better performance; (iv) for the Naïve Bayes classifier, US and SMOTE give better performance; (v) for random forest, the CBUS and US techniques give better performance; (vi) for the support vector machine, HS and CBUS give better results; (vii) for the kNN classifier, OS and SMOTE give better results. However, it is important to note that every time we apply a DBT to an imbalanced dataset, there is no guarantee that the same data will be generated or removed in order to balance it; therefore, model performance and results could also vary slightly, as illustrated in the sketch below.
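The run-to-run variability mentioned above can be made concrete: the resampling routines draw random samples, so repeated calls produce different balanced training sets unless the random seed is fixed. A small illustration, again assuming ROSE's ovun.sample() and a hypothetical training set train_set with a binary factor column Class:

```r
library(ROSE)

set.seed(7)
# Hypothetical imbalanced training set (480 negative vs. 20 positive cases).
train_set <- data.frame(
  x     = rnorm(500),
  Class = factor(c(rep("negative", 480), rep("positive", 20)))
)

# Without fixing the seed, two under-sampling runs keep different majority cases.
a <- ovun.sample(Class ~ ., data = train_set, method = "under")$data
b <- ovun.sample(Class ~ ., data = train_set, method = "under")$data
identical(a, b)   # typically FALSE

# Fixing the seed before each call makes the balanced set, and hence the
# downstream model and its AUC, reproducible.
set.seed(2022); a <- ovun.sample(Class ~ ., data = train_set, method = "under")$data
set.seed(2022); b <- ovun.sample(Class ~ ., data = train_set, method = "under")$data
identical(a, b)   # TRUE
```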

4.2. Performance of DBT across the Classifiers

In this section, we assess whether the performance of DBT is consistent across classifiers or varies from one classifier to another. To do this assessment, we used Kendall's 'w' statistic. When 'w' is 1, there is complete agreement over the ranks; when 'w' is 0, there is no agreement over the ranks. The ranks of the DBT and the results of Kendall's test are shown in Table 14. The results show that there is agreement over the ranking of the data-balancing techniques; however, the concordance coefficient (w) is 0.562, which indicates that the agreement is only partial. It can be observed from Table 14 that the ranking is consistent only for the None and ROSE techniques, whereas there is no consistency in the ranking of the US, OS, HS, SMOTE, and CBUS techniques. Overall, the experimental results show that the performance of None and ROSE was poor but consistent, whereas the performance of the US, OS, HS, SMOTE, and CBUS techniques was not consistent across classifiers, although it was better than that of ROSE and None.
Table 14. Ranks of DBT for different classifiers.
In this section of the paper, we have attempted to answer the following question: Is performance of DBT consistent across classifiers? The results show that the performance of DBT is not consistent across different classifiers.

4.3. Performance of DBT for Varying Levels of IR in the Dataset

Table 15, Table 16, Table 17, Table 18, Table 19 and Table 20 show the ranks of the DBT for the six classifiers at varying levels of IR in the dataset. To assess the performance of the DBT for varying levels of IR, we used Kendall's test statistic. The rows in the tables indicate the data-balancing strategy and the columns indicate the range of IR in the dataset. The value in each cell is the rank of the DBT for a given range of IR and a given classifier. The last row in each table shows the results of Kendall's test.
Table 15. Ranks of DBT for DT for varying levels of IR.
Table 16. Ranks of DBT for kNN for varying levels of IR.
Table 17. Ranks of DBT for LR for varying levels of IR.
Table 18. Ranks of DBT for NB classifier for varying levels of IR.
Table 19. Ranks of DBT for RF for varying levels of IR.
Table 20. Ranks of DBT for SVM classifier for varying levels of IR.
From the results of Kendall’s statistics, we can infer that:
  • For the DT classifier, there is no agreement over the rankings of the DBT, as the "p" value is greater than 0.05. This means that for the DT classifier, the performance of the data-balancing techniques was not consistent across varying levels of IR.
  • For the kNN classifier, there is agreement over the rankings of the data-balancing techniques, as the "p" value is less than 0.05. However, the concordance coefficient (w) is 0.593, which indicates that the agreement is only partial. From the ranks of the DBT, it is observed that the performance of the None strategy appeared consistent, whereas the performance of the other DBT differed across varying levels of IR.
  • For the LR classifier, there is no agreement over the rankings of the data-balancing techniques, as the "p" value is greater than 0.05. This means that for the LR classifier, the performance of the data-balancing techniques was not consistent across varying levels of IR.
  • For the NB classifier, there is agreement over the rankings of the data-balancing techniques, as the "p" value is less than 0.05. However, the concordance coefficient (w) is 0.686, which indicates that the agreement is only partial. From the ranks of the DBT, it is observed that the performance of the None strategy was consistent, whereas the performance of the other data-balancing techniques differed across varying levels of IR.
  • For the RF classifier, there is agreement over the rankings of the data-balancing techniques, as the "p" value is less than 0.05. However, the concordance coefficient (w) is 0.539, which indicates that the agreement is only partial. From the ranks of the DBT, it is observed that the performance of the None, US, and CBUS strategies appeared consistent, whereas the performance of the other data-balancing techniques differed across varying levels of IR.
  • For the SVM classifier, there is agreement over the rankings of the data-balancing techniques, as the "p" value is less than 0.05. However, the concordance coefficient (w) is 0.564, which indicates that the agreement is only partial. From the ranks of the DBT, it is observed that only the performance of the None strategy was consistent, whereas the performance of the other data-balancing techniques differed across varying levels of IR.
In this section of the paper, we have attempted to answer the following question: Is the performance of the DBT consistent for varying levels of IR in the dataset? The results of the experiment show that, for all the classifiers, the performance of the None and ROSE strategies was poor but consistent for varying levels of IR in the dataset, whereas the performance of the other DBT was not consistent across varying levels of IR.

5. Conclusions and Recommendation for Further Work

In this research paper, we have assessed the performance of six different DBT. The assessment was performed using six different classifiers and 25 different datasets with different levels of IR. The performance of the DBT was assessed through the performance of the classifiers, measured using the area under the ROC curve. The experimental results show that: (i) for all six classifiers, the performance of the None and ROSE strategies was poor and significantly different from the others, and there was no significant difference in the performance of the US, OS, HS, SMOTE, and CBUS techniques; (ii) the performance of None and ROSE was poor but consistent across the classifiers, whereas there was no consistency in the performance of the US, OS, HS, SMOTE, and CBUS techniques, although their performance was better than that of the ROSE and None strategies; (iii) there was no agreement over the ranks of the DBT for varying levels of IR in the dataset, except for the None and ROSE strategies; (iv) DBT helps to improve the performance of the classifiers, although the performance of ROSE was not significantly different from the None strategy. Thus, from the experimental results, we may infer that DBT helps to improve the performance of the classifier in classification tasks, but that the performance of the DBT is not independent of the classification algorithm or of the IR in the dataset. These inferences are drawn based on our experimental results.
As stated earlier in the introduction section, we assessed the performance of only data-preprocessing-level data-balancing techniques. However, there is a need to assess the performance of more advanced approaches to the class imbalance problem, such as algorithm-level solutions, cost-sensitive learning, and ensemble methods.

Author Contributions

Conceptualization, A.J. and S.M.M.; methodology, A.J. and S.M.M.; software, A.J. and S.M.M.; validation, A.J. and S.M.M.; formal analysis, A.J. and S.M.M.; investigation, A.J., H.E., F.K.K. and S.M.M.; resources, A.J., H.E., F.K.K. and S.M.M.; data curation, A.J. and S.M.M.; writing—original draft preparation, A.J., H.E., F.K.K. and S.M.M.; writing—review and editing, A.J., H.E., F.K.K. and S.M.M.; visualization, A.J., H.E., F.K.K. and S.M.M.; supervision, H.E., F.K.K. and S.M.M.; project administration, H.E. and F.K.K.; funding acquisition, H.E. and F.K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research project was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R300).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request.

Acknowledgments

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R300), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare that they have no conflict of interest to report regarding the present study.

References

  1. Siers, M.J.; Islam, M.Z. Software defect prediction using a cost sensitive decision forest and voting, and a potential solution to the class imbalance problem. Inf. Syst. 2015, 51, 62–71.
  2. Santos, M.S.; Abreu, P.H.; Laencina, P.J.G.; Simão, A.; Carvalho, A. A new cluster-based oversampling method for improving survival prediction of hepatocellular carcinoma patients. J. Biomed. Inform. 2015, 58, 49–59.
  3. Zhu, B.; Baesens, B.; Broucke, S.K.L.M. An empirical comparison of techniques for the class imbalance problem in churn prediction. Inf. Sci. 2017, 408, 84–99.
  4. Thammasiri, D.; Delen, D.; Meesad, P.; Kasap, N. A critical assessment of imbalanced class distribution problem: The case of predicting freshmen student attrition. Expert Syst. Appl. 2014, 41, 321–330.
  5. Hassan, A.K.I.; Abraham, A. Modeling insurance fraud detection using imbalanced data classification. In Proceedings of the 7th World Congress on Nature and Biologically Inspired Computing (NaBIC2015), Pietermaritzburg, South Africa, 18 November 2015; pp. 117–127.
  6. Hajian, S.; Ferrer, J.D.; Balleste, A.M. Discrimination prevention in data mining for intrusion and crime detection. In Proceedings of the IEEE Symposium on Computational Intelligence in Cyber Security (CICS), Paris, France, 11–15 April 2011; pp. 1–8.
  7. Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2012, 42, 463–484.
  8. Haixiang, G.; Yijing, L.; Shang, J.; Mingyun, G.; Yuanyue, H.; Bing, G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239.
  9. Kotsiantis, S.; Kanellopoulos, D.; Pintelas, P. Handling imbalanced datasets: A review. GESTS Int. Trans. Comput. Sci. Eng. 2006, 30, 1–12.
  10. Kotsiantis, S.; Pintelas, P. Mixture of Expert Agents for Handling Imbalanced Data Sets. Ann. Math. Comput. TeleInformatics 2003, 1, 46–55.
  11. Tahir, M.A.; Kittler, J.; Mikolajczyk, K.; Yan, F. A multiple expert approach to the class imbalance problem using inverse random under sampling. In Proceedings of the International Workshop on Multiple Classifier Systems, Reykjavik, Iceland, 10–12 June 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 82–91.
  12. Kubat, M.; Matwin, S. Addressing the curse of imbalanced training sets: One sided selection. In Proceedings of the 14th International Conference on Machine Learning, Nashville, TN, USA, 8 July 1997; pp. 179–186.
  13. Cateni, S.; Colla, V.; Vannucci, M. A method for resampling imbalanced datasets in binary classification tasks for real-world problems. Neurocomputing 2014, 135, 32–41.
  14. Yeh, C.W.; Li, D.C.; Lin, L.S.; Tsai, T.I. A Learning Approach with Under and Over-Sampling for Imbalanced Data Sets. In Proceedings of the 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), Kumamoto, Japan, 10–14 July 2016; pp. 725–729.
  15. Lunardon, N.; Menardi, G.; Torelli, N. ROSE: A Package for Binary Imbalanced Learning. R J. 2014, 6, 79–89.
  16. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  17. Chawla, N.V.; Lazarevic, A.; Hall, L.O.; Bowyer, K.W. SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery, Cavtat-Dubrovnik, Dubrovnik, Croatia, 22–26 September 2003; pp. 107–119.
  18. Hu, S.; Liang, Y.; Ma, L.; He, Y. MSMOTE: Improving classification performance when training data is imbalanced. In Proceedings of the Second International Workshop on Computer Science and Engineering, Qingdao, China, 28–30 October 2009; pp. 13–17.
  19. Barua, S.; Islam, M.M.; Yao, X.; Murase, K. MWMOTE—Majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans. Knowl. Data Eng. 2012, 26, 405–425.
  20. Lin, W.; Tsai, C.; Hu, Y.; Jhang, J. Clustering-based undersampling in class-imbalanced data. Inf. Sci. 2017, 409, 17–26.
  21. Jadhav, A. Clustering Based Data Preprocessing Technique to Deal with Imbalanced Dataset Problem in Classification Task. In Proceedings of the IEEE Punecon, Pune, India, 30 November–2 December 2018; pp. 1–7.
  22. Fan, W.; Stolfo, S.J.; Zhang, J.; Chan, P.K. AdaCost: Misclassification cost-sensitive boosting. In Proceedings of the Sixteenth International Conference on Machine Learning, San Francisco, CA, USA, 27–30 June 1999; pp. 99–105.
  23. Zhou, Z.; Liu, X. Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem. IEEE Trans. Knowl. Data Eng. 2006, 18, 63–77.
  24. Domingos, P. MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 15–18 August 1999; pp. 155–164.
  25. López, V.; Río, S.D.; Benítez, J.M.; Herrera, F. Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data. Fuzzy Sets Syst. 2015, 258, 5–38.
  26. Sun, Y.; Kamel, M.S.; Wong, A.K.; Wang, Y. Cost-sensitive boosting for classification of imbalanced data. Pattern Recognit. 2007, 40, 3358–3378.
  27. Chen, Z.Y.; Shu, P.; Sun, M. A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data. Eur. J. Oper. Res. 2012, 223, 461–472.
  28. Zhang, Y.; Fu, P.; Liu, W.; Chen, G. Imbalanced data classification based on scaling kernel-based support vector machine. Neural Comput. Appl. 2014, 25, 927–935.
  29. Kim, S.; Kim, H.; Namkoong, Y. Ordinal Classification of Imbalanced Data with Application in Emergency and Disaster Information Service. IEEE Intell. Syst. 2016, 31, 50–56.
  30. Godoy, M.D.P.; Fernández, A.; Rivera, A.J.; Jesus, M.J.D. Analysis of an evolutionary RBFN design algorithm, CO2RBFN, for imbalanced data sets. Pattern Recognit. Lett. 2010, 31, 2375–2388.
  31. Seiffert, C.; Khoshgoftaar, T.M.; Hulse, J.V.; Napolitano, A. RUSBoost: A Hybrid Approach to Alleviating Class Imbalance. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2010, 40, 185–197.
  32. Wang, S.; Yao, X. Diversity analysis on imbalanced data sets by using ensemble models. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009; pp. 324–331.
  33. Barandela, R.; Valdovinos, R.M.; Sánchez, J.S. New applications of ensembles of classifiers. Pattern Anal. Appl. 2003, 6, 245–256.
  34. Liao, J.J.; Shih, C.H.; Chen, T.F.; Hsu, M.F. An ensemble-based model for two-class imbalanced financial problem. Econ. Model. 2014, 37, 175–183.
  35. Susan, S.; Kumar, A. The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art. Eng. Rep. 2021, 3, e12298.
  36. Halimu, C.; Kasem, A. Split balancing (sBal)—A data preprocessing sampling technique for ensemble methods for binary classification in imbalanced datasets. In Computational Science and Technology; Springer: Singapore, 2021; pp. 241–257.
  37. Tolba, M.; Ouadfel, S.; Meshoul, S. Hybrid ensemble approaches to online harassment detection in highly imbalanced data. Expert Syst. Appl. 2021, 175, 114751.
  38. Tao, X.; Zheng, Y.; Chen, W.; Zhang, X.; Qi, L.; Fan, Z.; Huang, S. SVDD-based weighted oversampling technique for imbalanced and overlapped dataset learning. Inf. Sci. 2022, 588, 13–51.
  39. Islam, A.; Belhaouari, S.B.; Rehman, A.U.; Bensmail, H. KNNOR: An oversampling technique for imbalanced datasets. Appl. Soft Comput. 2022, 115, 108288.
  40. López, V.; Fernández, A.; Torres, J.G.M.; Herrera, F. Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics. Expert Syst. Appl. 2012, 39, 6585–6608.
  41. Burez, J.; Poel, V.D. Handling class imbalance in customer churn prediction. Expert Syst. Appl. 2009, 36, 4626–4636.
  42. Alcalá-Fdez, J.; Fernández, A.; Luengo, J.; Derrac, J.; García, S.; Sánchez, L.; Herrera, F. KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Log. Soft Comput. 2011, 17, 255–287.
  43. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159.
  44. Kuhn, M.; Wing, J.; Weston, S.; Williams, A.; Keefer, C.; Engelhardt, A.; Cooper, T.; Mayer, Z.; Kenke, B.; R Core Team. Classification and Regression Training. 2022. Available online: https://cran.r-project.org/web/packages/caret/caret.pdf (accessed on 3 November 2021).
  45. Friedman, M. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat. 1940, 11, 86–92.
  46. Brown, I.; Mues, C. An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Syst. Appl. 2012, 39, 3446–3453.
  47. Nemenyi, P. Distribution-Free Multiple Comparisons. Ph.D. Thesis, Princeton University, Princeton, NJ, USA, 1963.
  48. Kendall, M.G.; Smith, B.B. The Problem of m Rankings. Ann. Math. Stat. 1939, 10, 275–287.
  49. Jadhav, A.; Pramod, D.; Ramanathan, K. Comparison of performance of data imputation methods for numeric dataset. Appl. Artif. Intell. 2019, 33, 913–933.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
