DTO-SMOTE: Delaunay Tessellation Oversampling for Imbalanced Data Sets
Abstract
1. Introduction
- We point out, for the first time, the chain trail pattern formed by the artificial instances that SMOTE generates.
- We propose a new preprocessing technique, named Delaunay Tessellation Oversampling SMOTE (DTO-SMOTE), which uses Delaunay tessellation to build a simplex mesh. We then use simplex quality measures to select candidates for instance generation (our previous study draws a simplex at random) and a Dirichlet distribution to control where synthetic instances are created inside a simplex (our former study uses the barycenter of the simplex).
- We conduct an extensive experimental evaluation with 61 bi-class data sets (our previous study considers only 15 binary data sets). This empirical comparison includes five preprocessing methods and ten learning algorithms (our former analysis compares only with SMOTE and uses kNN as the learning algorithm). It shows our approach's appropriateness in many situations, with better average performance on binary data sets.
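The Dirichlet-controlled generation mentioned in the second contribution can be sketched as follows. This is not the authors' code, only a minimal illustration of the mechanism: a synthetic point inside a simplex is a convex combination of its vertices, with barycentric weights drawn from a Dirichlet distribution whose concentration parameter plays the role of the hyperparameter c described later in the paper.

```python
import numpy as np

def sample_in_simplex(vertices, c=1.0, rng=None):
    """Draw one point inside a simplex.

    vertices: (k, d) array holding the k corners of a (k-1)-simplex.
    c: Dirichlet concentration; larger c pulls samples toward the
       barycenter, smaller c pushes them toward the vertices.
    """
    rng = np.random.default_rng(rng)
    # Barycentric weights: all >= 0 and summing to 1.
    weights = rng.dirichlet(np.full(vertices.shape[0], c))
    return weights @ vertices

# Example: a tetrahedron (3-simplex) in 3-D.
tet = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])
p = sample_in_simplex(tet, c=2.0, rng=0)
# p lies inside the tetrahedron: every coordinate is >= 0
# and the coordinates sum to at most 1.
```

Note that c = 1 gives a uniform distribution over the simplex, so the barycenter-only strategy of the earlier study is recovered in the limit of large c.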
2. The Imbalanced Data Set Classification Problem
2.1. Performance Evaluation
2.2. Methods for Dealing with Class Imbalance
2.2.1. Preprocessing
- Oversampling: Oversampling creates synthetic or duplicate minority class samples [16,20] until the minority class matches the number of samples in the majority class. As a result, the training data set becomes balanced before the training phase. The Synthetic Minority Oversampling Technique [13,21], discussed in Section 3, is the primary method in this category.
- Undersampling: Undersampling discards some majority class instances to match the number of samples in the minority class. The primary method in this category is Random Under-Sampling (RUS) [16,20,22]. Nevertheless, these techniques have some issues: when data are deleted, important information can be discarded, weakening the classifier due to the lack of relevant training information. To deal with this loss of information, some strategies were proposed in the literature. In Evolutionary Undersampling [23], undersampling is framed as a search problem for prototypes: an evolutionary algorithm reduces the number of instances in a data set while aiming not to lose a significant classification accuracy rate. Another interesting method is ACOSampling [24], which uses ant colony optimization [25] in the search phase to determine the best subset of majority class instances to keep in the training set.
- Hybrid: The main idea of hybrid methods is to minimize the drawbacks of undersampling and oversampling, while keeping their benefits, to achieve a balanced data set. Examples of this combination include SMOTE + Tomek Link [26], SMOTE + ENN [27], SMOTE-RSB [28], and SMOTE-IPF [29], which combine the SMOTE oversampling technique with different data-cleaning steps to remove spurious artificial instances introduced in the oversampling phase, as well as data clustering followed by oversampling [30].
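The two basic resampling strategies above can be sketched in a few lines of NumPy. These are illustrative implementations only (libraries such as imbalanced-learn [described later in the references] provide production-grade versions); the function names and the `minority` parameter are our own.

```python
import numpy as np

def random_oversample(X, y, minority, rng=None):
    """Duplicate random minority instances until both classes match."""
    rng = np.random.default_rng(rng)
    mino = np.where(y == minority)[0]
    majo = np.where(y != minority)[0]
    extra = rng.choice(mino, size=len(majo) - len(mino), replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

def random_undersample(X, y, minority, rng=None):
    """Discard random majority instances down to the minority count."""
    rng = np.random.default_rng(rng)
    mino = np.where(y == minority)[0]
    majo = np.where(y != minority)[0]
    keep = np.concatenate([mino, rng.choice(majo, size=len(mino), replace=False)])
    return X[keep], y[keep]

# Toy data set: 7 majority (class 0) and 3 minority (class 1) instances.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
Xo, yo = random_oversample(X, y, minority=1, rng=0)   # 7 + 7 instances
Xu, yu = random_undersample(X, y, minority=1, rng=0)  # 3 + 3 instances
```

The information-loss issue discussed above is visible here: `random_undersample` returns only 6 of the 10 original rows, which is precisely what motivates the smarter search-based undersampling methods [23,24].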
2.2.2. Cost-Sensitive Learning
2.2.3. Ensemble Learning
Bagging
Boosting
3. The SMOTE Family of Oversampling
Given a minority instance x_i and one of its k nearest minority neighbors x_j, SMOTE creates a synthetic instance as x_new = x_i + r (x_j − x_i), where:
- x_new is the new instance vector;
- x_i is the feature vector of instance i;
- x_j is the feature vector of instance j;
- r is a random number between 0 and 1.
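The interpolation above can be sketched as follows. This is a hedged, minimal version of the basic SMOTE step (no class-boundary handling, brute-force neighbor search); the function name is ours.

```python
import numpy as np

def smote_sample(X_min, i, k=5, rng=None):
    """Interpolate between minority instance i and one of its
    k nearest minority neighbors: x_new = x_i + r * (x_j - x_i)."""
    rng = np.random.default_rng(rng)
    # Brute-force Euclidean distances from x_i to all minority points.
    d = np.linalg.norm(X_min - X_min[i], axis=1)
    neighbors = np.argsort(d)[1:k + 1]  # skip x_i itself (distance 0)
    j = rng.choice(neighbors)
    r = rng.random()                    # r in [0, 1)
    return X_min[i] + r * (X_min[j] - X_min[i])

# Four minority points in 2-D; generate one synthetic instance.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
x_new = smote_sample(X_min, i=0, k=2, rng=0)
# x_new lies on the segment between X_min[0] and the chosen neighbor.
```

Because every synthetic point lies on a segment between two existing minority points, repeated generation produces the chain trail pattern discussed in Section 3.2.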
3.1. SMOTE Variations
3.1.1. Borderline-SMOTE
3.1.2. SMOTE-SVM
3.1.3. ADASYN
3.1.4. Geometric SMOTE
3.1.5. Manifold-Based Synthetic Oversampling
3.2. The SMOTE Chain Trail Pattern Bias
4. Delaunay Tessellation Oversampling—DTO-SMOTE
4.1. Simplex Geometry
4.2. Mesh Generation
4.3. Tetrahedral Quality Evaluation
- Relative Volume: The relative volume of the current tetrahedron is computed as its volume divided by the maximal volume in the tessellation [48].
- Radius Ratio: The radius ratio is the weighted ratio of the radius of the inscribed sphere (r) to the radius of the circumscribed sphere (R), as shown in Equation (18).
- Solid Angle: The solid angle is the area of a spherical triangle created on the unit sphere whose center is at a tetrahedron vertex [48]. We compute the sum of the four solid angles of the tetrahedron.
- Minimum Solid Angle: This returns the minimum solid angle instead of the sum.
- Maximum Solid Angle: This returns the maximum solid angle instead of the sum.
- Edge Ratio: The edge ratio is the ratio of the length of the longest edge (E) to the length of the shortest edge of the tetrahedron, as shown in Equation (19).
- Aspect Ratio: The aspect ratio is the ratio of the radius of the sphere that circumscribes the tetrahedron (R) to the length of the longest edge (E), as shown in Equation (20).
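Three of these measures can be computed directly from the vertex coordinates. The sketch below uses the standard geometric formulas (inradius r = 3V/S, circumcenter from the equidistance linear system); it is an illustration under those textbook formulas, not the authors' implementation.

```python
import numpy as np

def tet_quality(v):
    """Radius ratio, edge ratio, and aspect ratio of one tetrahedron.

    v: (4, 3) array of vertex coordinates.
    """
    a, b, c, d = v
    # Volume: |det(b-a, c-a, d-a)| / 6.
    vol = abs(np.linalg.det(np.array([b - a, c - a, d - a]))) / 6.0
    # Surface area: sum of the four triangular face areas.
    faces = [(a, b, c), (a, b, d), (a, c, d), (b, c, d)]
    area = sum(np.linalg.norm(np.cross(q - p, s - p)) / 2.0
               for p, q, s in faces)
    r_in = 3.0 * vol / area  # inscribed-sphere radius
    # Circumcenter x solves 2(b-a).x = |b|^2 - |a|^2 (and the c, d rows).
    A = 2.0 * np.array([b - a, c - a, d - a])
    rhs = np.array([b @ b - a @ a, c @ c - a @ a, d @ d - a @ a])
    center = np.linalg.solve(A, rhs)
    R = np.linalg.norm(center - a)  # circumscribed-sphere radius
    edges = [np.linalg.norm(p - q) for i, p in enumerate(v) for q in v[i + 1:]]
    return {"radius_ratio": r_in / R,           # r/R, = 1/3 for a regular tet
            "edge_ratio": max(edges) / min(edges),
            "aspect_ratio": R / max(edges)}

# A regular tetrahedron attains the optimal values of all three measures.
reg = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
q = tet_quality(reg)
```

For the regular tetrahedron above, r/R is exactly 1/3 and the edge ratio is exactly 1; flat or sliver tetrahedra drive r/R toward 0 and the edge ratio toward infinity, which is why these measures are useful for ranking simplices.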
4.4. Synthetic Instance Generation
4.5. Method Description
Algorithm 1: DTO-SMOTE
- The algorithm takes as input an imbalanced data set, a data compression algorithm, a tetrahedron quality measure (Section 4.3), and the Dirichlet hyperparameter c (Section 4.4);
- Using the compression technique, reduce the feature space to three dimensions, yielding a compressed data set;
- Construct the Delaunay tessellation from the compressed data set;
- For each simplex in the tessellation, compute its quality measure (Section 4.3);
- Then, calculate a weight for each simplex as the product of the proportion of its vertices that belong to the minority class and the quality index computed for the simplex, according to the selected quality measure; the weights are normalized to sum to 1, so that they represent probabilities.
- Randomly choose, with replacement, a simplex (tetrahedron) from the tessellation, according to the probabilities calculated in step 5.
- Once a simplex is selected, generate a new sample using a Dirichlet distribution, as described in Section 4.4.
- Steps 6 and 7 are repeated until the number of samples in the minority class matches the number of instances in the majority class.
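The steps above can be condensed into a short sketch built on `scipy.spatial.Delaunay`. Two simplifying assumptions are ours: the data is taken as already compressed to 3-D (step 2 is skipped), and relative volume is used as the quality measure (the paper supports several alternatives from Section 4.3).

```python
import numpy as np
from scipy.spatial import Delaunay

def dto_smote(X3, y, minority, c=1.0, rng=None):
    """Illustrative DTO-SMOTE core: returns the synthetic minority points.

    X3: (n, 3) compressed data; y: class labels; c: Dirichlet parameter.
    """
    rng = np.random.default_rng(rng)
    tess = Delaunay(X3)                      # step 3: tessellation
    simplices = tess.simplices               # (n_simplex, 4) vertex indices
    # Step 4: quality = relative volume of each tetrahedron.
    def volume(s):
        a, b, cv, d = X3[s]
        return abs(np.linalg.det(np.array([b - a, cv - a, d - a]))) / 6.0
    vols = np.array([volume(s) for s in simplices])
    quality = vols / vols.max()
    # Step 5: weight = minority proportion among the vertices x quality,
    # normalized to probabilities.
    minority_prop = (y[simplices] == minority).mean(axis=1)
    w = minority_prop * quality
    probs = w / w.sum()
    # Steps 6-8: sample simplices and draw Dirichlet points until balanced.
    n_new = int((y != minority).sum() - (y == minority).sum())
    new_points = []
    for s in rng.choice(len(simplices), size=n_new, p=probs):
        weights = rng.dirichlet(np.full(4, c))
        new_points.append(weights @ X3[simplices[s]])
    return np.array(new_points)

# Toy run: 30 random 3-D points, 8 of which belong to the minority class.
X3 = np.random.default_rng(1).random((30, 3))
y = (np.arange(30) < 8).astype(int)
syn = dto_smote(X3, y, minority=1, rng=0)  # 22 - 8 = 14 synthetic points
```

Because each synthetic point is a convex combination of the four vertices of a Delaunay tetrahedron, the generated instances fill volumes of the minority region rather than line segments, avoiding the chain trail pattern of Section 3.2.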
5. Experiments and Results
5.1. Influence of Parameters
5.2. Experimental Results for Bi-Class Data Sets
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Prati, R.C.; Batista, G.E.A.P.A.; Silva, D.F. Class imbalance revisited: A new experimental setup to assess the performance of treatment methods. Knowl. Inf. Syst. 2015, 45, 247–270.
- Krawczyk, B.; Galar, M.; Jeleń, Ł.; Herrera, F. Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy. Appl. Soft Comput. J. 2016, 38, 714–726.
- Troncoso, A.; Ribera, P.; Asencio-Cortés, G.; Vega, I.; Gallego, D. Imbalanced classification techniques for monsoon forecasting based on a new climatic time series. Environ. Model. Softw. 2018, 106, 48–56.
- Yan, B.; Han, G. LA-GRU: Building Combined Intrusion Detection Model Based on Imbalanced Learning and Gated Recurrent Unit Neural Network. Secur. Commun. Netw. 2018, 2018.
- Farías, D.I.H.; Prati, R.; Herrera, F.; Rosso, P. Irony detection in Twitter with imbalanced class distributions. J. Intell. Fuzzy Syst. 2020, 39, 2147–2163.
- Huang, X.; Zhang, C.Z.; Yuan, J. Predicting Extreme Financial Risks on Imbalanced Dataset: A Combined Kernel FCM and Kernel SMOTE Based SVM Classifier. Comput. Econ. 2020, 56, 187–216.
- Roumani, Y.F.; Nwankpa, J.K.; Tanniru, M. Predicting firm failure in the software industry. Artif. Intell. Rev. 2020, 53, 4161–4182.
- Zhang, X.; Li, Y.; Kotagiri, R.; Wu, L.; Tari, Z.; Cheriet, M. KRNN: K Rare-class Nearest Neighbour classification. Pattern Recognit. 2017, 62, 33–44.
- Sawangarreerak, S.; Thanathamathee, P. Random Forest with Sampling Techniques for Handling Imbalanced Prediction of University Student Depression. Information 2020, 11, 519.
- Oksuz, K.; Cam, B.C.; Kalkan, S.; Akbas, E. Imbalance problems in object detection: A review. IEEE Trans. Pattern Anal. Mach. Intell. 2020.
- Fiorentini, N.; Losa, M. Handling imbalanced data in road crash severity prediction by machine learning algorithms. Infrastructures 2020, 5, 61.
- Patel, H.; Singh Rajput, D.; Thippa Reddy, G.; Iwendi, C.; Kashif Bashir, A.; Jo, O. A review on classification of imbalanced data for wireless sensor networks. Int. J. Distrib. Sens. Netw. 2020, 16, 1550147720916404.
- Chawla, N.; Bowyer, K.; Hall, L.; Kegelmeyer, W. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
- Schaap, W.E.; van de Weygaert, R. Continuous fields and discrete samples: Reconstruction through Delaunay tessellations. Astron. Astrophys. 2000, 363, L29–L32.
- Carvalho, A.M.D.; Prati, R.C. Improving kNN classification under Unbalanced Data. A New Geometric Oversampling Approach. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–6.
- Fernández, A.; García, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Learning from Imbalanced Data Sets; Springer International Publishing: Cham, Switzerland, 2018.
- Japkowicz, N.; Shah, M. Evaluating Learning Algorithms: A Classification Perspective; Cambridge University Press: Cambridge, UK, 2011.
- García, V.; Sánchez, J.; Mollineda, R. On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowl.-Based Syst. 2012, 25, 13–21.
- Prati, R.C.; Batista, G.E.A.P.A.; Monard, M.C. A Survey on Graphical Methods for Classification Predictive Performance Evaluation. IEEE Trans. Knowl. Data Eng. 2011, 23, 1601–1618.
- Haixiang, G.; Yijing, L.; Shang, J.; Mingyun, G.; Yuanyue, H.; Bing, G. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 2017, 73, 220–239.
- Fernandez, A.; Garcia, S.; Herrera, F.; Chawla, N.V. SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary. J. Artif. Intell. Res. 2018, 61, 863–905.
- Lemaître, G.; Nogueira, F.; Aridas, C.K. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. J. Mach. Learn. Res. 2017, 18, 1–5.
- García, S.; Herrera, F. Evolutionary undersampling for classification with imbalanced datasets: Proposals and taxonomy. Evol. Comput. 2009, 17, 275–306.
- Yu, H.; Ni, J.; Zhao, J. ACOSampling: An ant colony optimization-based undersampling method for classifying imbalanced DNA microarray data. Neurocomputing 2013, 101, 309–318.
- Dorigo, M.; Birattari, M.; Stutzle, T. Ant colony optimization. IEEE Comput. Intell. Mag. 2006, 1, 28–39.
- Sun, Y.; Castellano, C.G.; Robinson, M.; Adams, R.; Rust, A.G.; Davey, N. Using pre & post-processing methods to improve binding site predictions. Pattern Recognit. 2009, 42, 1949–1958.
- Batista, G.E.A.P.A.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29.
- Ramentol, E.; Caballero, Y.; Bello, R.; Herrera, F. SMOTE-RSB*: A hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory. Knowl. Inf. Syst. 2012, 33, 245–265.
- Sáez, J.A.; Luengo, J.; Stefanowski, J.; Herrera, F. SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Inf. Sci. 2015, 291, 184–203.
- Guo, H.; Zhou, J.; Wu, C.A. Imbalanced learning based on data-partition and SMOTE. Information 2018, 9, 238.
- Fernández, A.; García, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Cost-Sensitive Learning. In Learning from Imbalanced Data Sets; Springer: New York, NY, USA, 2018; pp. 63–78.
- Fernández, A.; García, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Ensemble Learning. In Learning from Imbalanced Data Sets; Springer: New York, NY, USA, 2018; pp. 147–196.
- Galar, M.; Fernández, A.; Barrenechea, E.; Sola, H.; Herrera, F. A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2012, 42, 463–484.
- Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
- Han, H.; Wang, W.Y.; Mao, B.H. Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning. In Advances in Intelligent Computing; Huang, D.S., Zhang, X.P., Huang, G.B., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 878–887.
- Nguyen, H.M.; Cooper, E.W.; Kamei, K. Borderline over-sampling for imbalanced data classification. Int. J. Knowl. Eng. Soft Data Paradig. 2011, 3, 4–21.
- He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008; pp. 1322–1328.
- Douzas, G.; Bacao, F. Geometric SMOTE a geometrically enhanced drop-in replacement for SMOTE. Inf. Sci. 2019, 501, 118–135.
- Bellinger, C.; Drummond, C.; Japkowicz, N. Manifold-based synthetic oversampling with manifold conformance estimation. Mach. Learn. 2018, 107, 605–637.
- Elreedy, D.; Atiya, A.F. A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for Handling Class Imbalance. Inf. Sci. 2019, 505, 32–64.
- Gao, Z.; Yu, Z.; Holst, M. Feature-preserving surface mesh smoothing via suboptimal Delaunay triangulation. Graph. Model. 2013, 75, 23–38.
- Samat, A.; Gamba, P.; Liu, S.; Du, P.; Abuduwaili, J. Jointly Informative and Manifold Structure Representative Sampling Based Active Learning for Remote Sensing Image Classification. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6803–6817.
- Kolluri, R.; Shewchuk, J.R.; O'Brien, J.F. Spectral surface reconstruction from noisy point clouds. In Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, Nice, France, 8–10 July 2004; pp. 11–21.
- De Kok, T.; Van Kreveld, M.; Löffler, M. Generating realistic terrains with higher-order Delaunay triangulations. Comput. Geom. 2007, 36, 52–65.
- Anderson, S.J.; Karumanchi, S.B.; Iagnemma, K. Constraint-based planning and control for safe, semi-autonomous operation of vehicles. In Proceedings of the 2012 IEEE Intelligent Vehicles Symposium (IV), Madrid, Spain, 3–7 June 2012; pp. 383–388.
- Devriendt, K.; Van Mieghem, P. The simplex geometry of graphs. J. Complex Netw. 2019, 7, 469–490.
- Jones, E.; Oliphant, T.; Peterson, P. SciPy: Open Source Scientific Tools for Python. 2001. Available online: https://www.scipy.org/ (accessed on 5 November 2020).
- Maur, P. Delaunay Triangulation in 3D. Ph.D. Thesis, University of West Bohemia in Pilsen, Pilsen, Czech Republic, 2002.
- Santos, M.S.; Soares, J.P.; Abreu, P.H.; Araujo, H.; Santos, J. Cross-validation for imbalanced datasets: Avoiding overoptimistic and overfitting approaches [research frontier]. IEEE Comput. Intell. Mag. 2018, 13, 59–76.
- Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
- Abou-Moustafa, K.; Ferrie, F.P. Local generalized quadratic distance metrics: Application to the k-nearest neighbors classifier. Adv. Data Anal. Classif. 2018, 12, 341–363.
- Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: New York, NY, USA, 2017.
- Pearlmutter, B.A. Fast Exact Multiplication by the Hessian. Neural Comput. 1994, 6, 147–160.
- Utkin, L.V.; Zhuk, Y.A. Robust boosting classification models with local sets of probability distributions. Knowl.-Based Syst. 2014, 61, 59–75.
- Shen, H. Towards a Mathematical Understanding of the Difficulty in Learning with Feedforward Neural Networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 811–820.
- Chang, C.C.; Lin, C.J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 27:1–27:27.
- Zhang, T.; Damerau, F.; Johnson, D. Text chunking based on a generalization of winnow. J. Mach. Learn. Res. 2002, 2, 615–637.
Oversampling Technique | Alias | Reference |
---|---|---|
SMOTE | SMOTE | [21] |
BORDERLINE SMOTE 1 | BORDERLINE1 | [35] |
BORDERLINE SMOTE 2 | BORDERLINE2 | [35] |
SMOTE SVM | SMOTESVM | [36] |
GEOMETRIC SMOTE | GEOSMOTE | [38] |
No preprocessing | Original | — |
Algorithm | Alias | Parameters | Reference |
---|---|---|---|
Random Forest Classifier | RF | n_estimators = 100 | [50] |
k-Neighbors Classifier | KNN | n_neighbors = 5 | [51] |
Decision Tree Classifier | DTREE | criterion = ‘gini’ | [52] |
Logistic Regression | LRG | penalty = ‘l2’,C=1.0 | [53] |
AdaBoost Classifier | ABC | n_estimators = 50 | [54] |
MultiLayer Perceptron | MLP | max_iter = 500 | [55] |
Support Vector Machine | SVM | probability = True | [56] |
Stochastic Gradient Descent Classifier | SGD | loss = “hinge”, penalty = “l2”, max_iter = 500 | [57] |
Data Sets | IR | Inst. | Feat. |
---|---|---|---|
page-blocks0 | 8.79 | 5472 | 10 |
ecoli3 | 8.6 | 336 | 7 |
Spectrometer | 11.0 | 531 | 93 |
yeast3 | 8.1 | 1484 | 8 |
glass6 | 6.38 | 214 | 9 |
ecoli2 | 5.46 | 336 | 7 |
new-thyroid2 | 5.14 | 215 | 5 |
new-thyroid1 | 5.14 | 215 | 5 |
ecoli1 | 3.36 | 336 | 7 |
vehicle0 | 3.25 | 846 | 18 |
glass-0-1-2-3 vs 4-5-6 | 3.2 | 214 | 9 |
vehicle3 | 2.99 | 846 | 18 |
vehicle1 | 2.9 | 846 | 18 |
vehicle2 | 2.88 | 846 | 18 |
yeast1 | 2.46 | 1484 | 8 |
glass0 | 2.06 | 214 | 9 |
iris0 | 2 | 150 | 4 |
pima | 1.87 | 768 | 8 |
glass1 | 1.82 | 214 | 9 |
yeast6 | 41.4 | 1484 | 8 |
Pen Digits | 9.4 | 10,992 | 16 |
yeast5 | 32.73 | 1484 | 8 |
yeast-2 vs 8 | 23.1 | 482 | 8 |
page-blocks-1-3 vs 4 | 15.86 | 472 | 10 |
ecoli4 | 15.8 | 336 | 7 |
shuttle-c0-vs-c4 | 13.87 | 1829 | 9 |
vowel0 | 9.98 | 988 | 13 |
yeast-0-5-6-7-9 vs 4 | 9.35 | 528 | 8 |
yeast-2 vs 4 | 9.08 | 514 | 8 |
Wine Quality | 26 | 4898 | 11 |
Ecoli | 8.6 | 336 | 7 |
OIL | 22 | 937 | 49 |
appendicitis | 4.04 | 106 | 8 |
autoUniv-au1-1000 | 2.86 | 1000 | 20 |
climate-simulation-craches | 10.73 | 540 | 20 |
colon32 | 1.81 | 62 | 32 |
fertility-diagnosis | 7.3 | 100 | 9 |
habermans-survival | 2.78 | 306 | 3 |
indian-liver-patient | 2.49 | 583 | 10 |
ionosphere | 2.02 | 351 | 33 |
lsvt-voice-rehabilitation | 2.0 | 126 | 310 |
ozone-eighthr | 14.83 | 2533 | 72 |
ozone-onehr | 33.73 | 2535 | 72 |
parkinsons | 3.06 | 194 | 22 |
phoneme | 2.4 | 5413 | 5 |
pima-indians-diabetes | 1.86 | 767 | 8 |
planning-relax | 2.5 | 181 | 12 |
qsar-biodegradation | 1.96 | 1054 | 41 |
saheart | 1.88 | 461 | 9 |
seismic-bumps | 14.2 | 2583 | 18 |
spambase | 1.53 | 4600 | 57 |
spectf-heart | 2.67 | 348 | 44 |
german-credit | 2.33 | 999 | 20 |
german-credit-numeric | 2.33 | 999 | 24 |
thoracic-surgery | 5.71 | 469 | 16 |
thyroid-hypothyroid | 19.94 | 3162 | 25 |
thyroid-sick-euthyroid | 9.8 | 3163 | 25 |
vertebra-column-2c | 2.1 | 309 | 6 |
wdbc | 1.68 | 568 | 30 |
wholesale-channel | 2.1 | 439 | 7 |
wilt | 17.54 | 4838 | 5 |
(Figures: bi-class results for the AUC, GEO, and IBA metrics, shown separately for each learning algorithm: ABC, DTREE, KNN, LRG, MLP, RF, SGD, and SVM.)
ALGORITHM | ORIGINAL | SMOTE | SMOTESVM | BORDERLINE1 | BORDERLINE2 | GEOSMOTE | DTO-SMOTE |
---|---|---|---|---|---|---|---|
ABC | 6.36 | 3.27 | 3.42 | 3.59 | 4.28 | 3.81 | 3.26 |
DTREE | 5.6 | 3.43 | 3.41 | 3.96 | 4.4 | 3.79 | 3.4 |
KNN | 5.87 | 3.11 | 3.7 | 3.65 | 4.24 | 3.8 | 3.62 |
LRG | 6.43 | 3.32 | 3.69 | 3.69 | 4.42 | 3.36 | 3.08 |
MLP | 5.88 | 3.46 | 3.47 | 4.1 | 4.08 | 3.68 | 3.32 |
RF | 5.7 | 3.34 | 3.31 | 3.78 | 4.19 | 4.45 | 3.24 |
SGD | 5.64 | 3.53 | 3.58 | 3.84 | 3.92 | 4.02 | 3.47 |
SVM | 6.05 | 3.35 | 3.54 | 3.86 | 4.51 | 3.41 | 3.28 |
average | 5.94 | 3.35 | 3.52 | 3.81 | 4.25 | 3.79 | 3.34 |
std | 0.63 | 0.29 | 0.21 | 0.25 | 0.37 | 0.4 | 0.17 |
ALGORITHM | ORIGINAL | SMOTE | SMOTESVM | BORDERLINE1 | BORDERLINE2 | GEOSMOTE | DTO-SMOTE |
---|---|---|---|---|---|---|---|
ABC | 6.02 | 3.54 | 3.34 | 3.8 | 4.31 | 3.81 | 3.16 |
DTREE | 5.79 | 3.41 | 3.57 | 4.05 | 3.94 | 3.9 | 3.33 |
KNN | 5.93 | 3.35 | 3.71 | 3.69 | 3.78 | 3.94 | 3.59 |
LRG | 6.76 | 3.19 | 3.63 | 3.68 | 4.47 | 3.17 | 3.09 |
MLP | 6.44 | 3.23 | 3.67 | 3.71 | 4.17 | 3.52 | 3.26 |
RF | 5.75 | 3.31 | 3.6 | 4.04 | 4.01 | 4.16 | 3.13 |
SGD | 6.35 | 3.19 | 3.56 | 3.61 | 4.22 | 3.67 | 3.4 |
SVM | 6.17 | 3.2 | 3.65 | 3.69 | 4.18 | 3.91 | 3.2 |
average | 6.15 | 3.3 | 3.59 | 3.79 | 4.14 | 3.76 | 3.27 |
std | 0.59 | 0.31 | 0.19 | 0.26 | 0.38 | 0.46 | 0.17 |
ALGORITHM | ORIGINAL | SMOTE | SMOTESVM | BORDERLINE1 | BORDERLINE2 | GEOSMOTE | DTO-SMOTE |
---|---|---|---|---|---|---|---|
ABC | 6.07 | 3.28 | 3.38 | 3.65 | 4.39 | 4.1 | 3.14 |
DTREE | 6.2 | 3.19 | 3.5 | 3.81 | 4.37 | 3.64 | 3.29 |
KNN | 5.83 | 3.11 | 3.76 | 3.72 | 4.07 | 3.87 | 3.65 |
LRG | 6.27 | 3.16 | 3.7 | 3.96 | 4.57 | 3.28 | 3.06 |
MLP | 6.5 | 3.21 | 3.4 | 3.91 | 3.95 | 3.68 | 3.36 |
RF | 6.0 | 3.34 | 3.52 | 3.69 | 3.93 | 4.29 | 3.24 |
SGD | 6.14 | 3.2 | 3.55 | 3.79 | 4.01 | 3.89 | 3.41 |
SVM | 5.93 | 3.47 | 3.61 | 3.86 | 4.57 | 3.37 | 3.2 |
average | 6.12 | 3.24 | 3.55 | 3.8 | 4.23 | 3.77 | 3.29 |
std | 0.6 | 0.33 | 0.18 | 0.24 | 0.38 | 0.46 | 0.19 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
de Carvalho, A.M.; Prati, R.C. DTO-SMOTE: Delaunay Tessellation Oversampling for Imbalanced Data Sets. Information 2020, 11, 557. https://doi.org/10.3390/info11120557