Feature-Weighted Sampling for Proper Evaluation of Classification Models
Abstract
1. Introduction
2. Materials and Methods
- (1) Generate numerous candidate train/test splits using the modified RBS.
- (2) Evaluate the similarity between the original dataset and each candidate; the similarity is measured by a distance.
- (3) Choose the candidate with the smallest distance to the original dataset (a minimal code sketch of the whole procedure follows this list).
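The three phases can be condensed into a short R sketch (the benchmark classifiers in Section 2.3.2 are R packages, so R is used here). This is a minimal illustration, not the authors' implementation: `fws_split` and `dist_fun` are hypothetical names, plain class-stratified random sampling stands in for the modified RBS of Phase 1, and the default `test_ratio` and `n_candidates` values are arbitrary. A feature-weighted, histogram-based `dist_fun` is sketched under Section 2.2.3.

```r
# Minimal sketch of the three-phase FWS procedure above (illustrative only).
# `dist_fun(data, test_idx)` scores how far a candidate split is from the
# whole dataset; class-stratified random sampling stands in for modified RBS.
fws_split <- function(data, labels, dist_fun,
                      test_ratio = 0.25, n_candidates = 100) {
  n <- nrow(data)
  best_idx  <- NULL
  best_dist <- Inf
  for (k in seq_len(n_candidates)) {
    # Phase 1: generate a candidate test set (stand-in for modified RBS).
    test_idx <- unlist(lapply(split(seq_len(n), labels), function(idx)
      idx[sample.int(length(idx), max(1, round(length(idx) * test_ratio)))]))
    # Phase 2: similarity distance between the original dataset and the candidate.
    d <- dist_fun(data, test_idx)
    # Phase 3: keep the candidate with the smallest distance.
    if (d < best_dist) {
      best_dist <- d
      best_idx  <- test_idx
    }
  }
  list(train_idx = setdiff(seq_len(n), best_idx),
       test_idx  = best_idx,
       distance  = best_dist)
}
```

The returned `train_idx`/`test_idx` pair is the candidate whose feature distributions are closest to those of the original data.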
2.1. Phase 1: Generate Candidates
2.1.1. Concept of Class Overlap
2.1.2. Modified RBS
2.2. Phase 2: Evaluate the Candidates and Select Best Train/Test Sets
2.2.1. Generation of Histograms
2.2.2. Measurement of Histogram Similarity
2.2.3. Feature Weighting
$$D = \sum_{i=1}^{n} w_i \left( d_i^{\mathrm{train}} + d_i^{\mathrm{test}} \right)$$

where

- $D$: similarity distance of the given train/test sets
- $w_i$: weight of the i-th feature
- $d_i^{\mathrm{train}}$: similarity distance between the whole dataset and the training set for the i-th feature
- $d_i^{\mathrm{test}}$: similarity distance between the whole dataset and the test set for the i-th feature (a histogram-based sketch of this computation follows)
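Assuming the per-feature distances are computed from the histograms of Sections 2.2.1 and 2.2.2, a minimal sketch of this weighted distance could look as follows. The shared `pretty()` binning and the L1 bin-wise difference are illustrative assumptions rather than the paper's exact binning and histogram metric, and `weighted_distance` is a hypothetical name.

```r
# Sketch of D = sum_i w_i * (d_i^train + d_i^test) for a data frame of numeric
# features; each d_i is an L1 difference between histograms built on shared bins.
weighted_distance <- function(data, test_idx, weights = rep(1, ncol(data))) {
  total <- 0
  for (i in seq_len(ncol(data))) {
    x       <- data[, i]
    breaks  <- pretty(x, n = 20)  # shared bins for the three histograms
    h_all   <- hist(x,            breaks = breaks, plot = FALSE)$density
    h_train <- hist(x[-test_idx], breaks = breaks, plot = FALSE)$density
    h_test  <- hist(x[ test_idx], breaks = breaks, plot = FALSE)$density
    d_train <- sum(abs(h_all - h_train))   # d_i^train
    d_test  <- sum(abs(h_all - h_test))    # d_i^test
    total   <- total + weights[i] * (d_train + d_test)
  }
  total
}

# Usage with the fws_split() sketch from Section 2, e.g. on iris:
# res <- fws_split(iris[, 1:4], iris$Species,
#                  dist_fun = function(d, idx) weighted_distance(d, idx))
```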
2.3. Evaluation of FWS Method
2.3.1. Evaluation Metric: MAI
2.3.2. Benchmark Datasets and Classifiers
3. Results
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
Appendix A
| No | Classifier | K = 3 | K = 4 | K = 5 | K = 6 | K = 7 |
|---|---|---|---|---|---|---|
| 1 | C50 | 2.383 | 2.368 | 0.499 | 0.499 | 0.499 |
| | KNN | 0.552 | 0.534 | 1.252 | 0.480 | 0.480 |
| | RF | 0.479 | 0.479 | 0.479 | 1.020 | 0.479 |
| | SVM | 0.792 | 0.775 | 0.940 | 1.568 | 0.543 |
| 2 | C50 | 0.076 | 0.491 | 0.232 | 0.719 | 0.120 |
| | KNN | 0.474 | 0.241 | 0.515 | 2.064 | 0.390 |
| | RF | 1.233 | 0.211 | 1.328 | 0.153 | 3.192 |
| | SVM | 1.075 | 1.557 | 0.058 | 1.175 | 0.418 |
| 3 | C50 | 0.078 | 0.446 | 1.189 | 0.835 | 0.164 |
| | KNN | 1.766 | 0.166 | 1.195 | 0.659 | 0.639 |
| | RF | 0.351 | 0.027 | 0.379 | 0.414 | 0.407 |
| | SVM | 0.721 | 0.682 | 0.682 | 0.663 | 0.644 |
| 4 | C50 | 0.161 | 1.010 | 0.932 | 1.293 | 0.079 |
| | KNN | 0.845 | 0.532 | 1.135 | 0.458 | 0.313 |
| | RF | 0.760 | 0.055 | 1.117 | 1.495 | 1.112 |
| | SVM | 1.213 | 0.332 | 0.023 | 1.407 | 0.107 |
| 5 | C50 | 0.538 | 0.078 | 0.414 | 0.322 | 1.122 |
| | KNN | 0.214 | 0.286 | 0.326 | 0.165 | 0.430 |
| | RF | 0.618 | 0.251 | 1.441 | 0.502 | 1.316 |
| | SVM | 0.485 | 0.576 | 0.182 | 0.730 | 0.731 |
| 6 | C50 | 0.268 | 0.837 | 0.832 | 1.184 | 1.943 |
| | KNN | 0.243 | 0.027 | 0.025 | 0.576 | 0.845 |
| | RF | 0.159 | 0.503 | 0.167 | 0.948 | 0.274 |
| | SVM | 0.737 | 0.447 | 0.729 | 0.447 | 0.727 |
| 7 | C50 | 0.347 | 0.524 | 0.337 | 0.883 | 0.160 |
| | KNN | 0.122 | 0.466 | 0.241 | 0.066 | 0.235 |
| | RF | 0.317 | 0.905 | 0.484 | 1.097 | 0.084 |
| | SVM | 0.252 | 1.006 | 0.810 | 0.620 | 0.246 |
| 8 | C50 | 0.145 | 0.920 | 1.169 | 2.073 | 0.344 |
| | KNN | 0.231 | 0.346 | 0.361 | 0.689 | 1.287 |
| | RF | 0.439 | 1.023 | 0.502 | 0.140 | 0.502 |
| | SVM | 1.185 | 2.047 | 0.529 | 1.490 | 0.217 |
| 9 | C50 | 0.807 | 0.075 | 7.890 | 1.820 | 0.818 |
| | KNN | 1.039 | 0.482 | 0.517 | 0.719 | 0.144 |
| | RF | 0.690 | 0.633 | 0.830 | 0.778 | 0.085 |
| | SVM | 1.652 | 2.145 | 0.086 | 1.531 | 0.086 |
| 10 | C50 | 0.020 | 0.448 | 0.508 | 0.832 | 1.027 |
| | KNN | 0.675 | 0.183 | 0.781 | 0.123 | 0.241 |
| | RF | 0.500 | 0.054 | 0.115 | 1.340 | 0.948 |
| | SVM | 0.359 | 1.253 | 0.232 | 1.212 | 1.123 |
| 11 | C50 | 0.365 | 0.398 | 0.273 | 0.429 | 0.273 |
| | KNN | 0.267 | 0.230 | 0.533 | 0.195 | 0.230 |
| | RF | 0.152 | 0.114 | 0.114 | 0.844 | 0.114 |
| | SVM | 0.316 | 0.278 | 0.500 | 0.242 | 0.500 |
| 12 | C50 | 0.930 | 0.702 | 1.616 | 0.899 | 1.128 |
| | KNN | 0.312 | 0.912 | 0.912 | 0.422 | 0.067 |
| | RF | 0.514 | 0.768 | 0.512 | 1.283 | 0.512 |
| | SVM | 0.839 | 1.615 | 2.160 | 0.294 | 1.112 |
| 13 | C50 | 0.641 | 0.311 | 0.825 | 0.634 | 1.249 |
| | KNN | 0.058 | 0.498 | 1.366 | 0.460 | 0.133 |
| | RF | 0.256 | 1.089 | 1.402 | 0.063 | 0.787 |
| | SVM | 0.784 | 1.250 | 0.619 | 0.414 | 1.642 |
| 14 | C50 | 0.462 | 0.185 | 1.220 | 0.720 | 0.605 |
| | KNN | 1.024 | 0.295 | 0.100 | 0.630 | 1.349 |
| | RF | 0.328 | 0.375 | 0.840 | 0.018 | 0.135 |
| | SVM | 0.405 | 0.619 | 0.746 | 0.502 | 0.453 |
| 15 | C50 | 0.454 | 0.201 | 0.585 | 0.280 | 0.042 |
| | KNN | 0.318 | 0.149 | 0.161 | 0.918 | 0.537 |
| | RF | 0.667 | 0.091 | 0.129 | 1.118 | 1.073 |
| | SVM | 0.425 | 0.150 | 0.643 | 0.506 | 0.678 |
| 16 | C50 | 0.100 | 0.138 | 0.176 | 0.138 | 0.247 |
| | KNN | 0.066 | 1.184 | 0.391 | 0.976 | 0.723 |
| | RF | 0.085 | 0.614 | 0.819 | 1.364 | 0.217 |
| | SVM | 0.023 | 0.063 | 0.101 | 0.063 | 0.174 |
| 17 | C50 | 1.062 | 2.067 | 0.677 | 0.145 | 2.023 |
| | KNN | 0.424 | 0.382 | 1.763 | 0.807 | 1.272 |
| | RF | 0.613 | 1.997 | 0.570 | 1.117 | 1.959 |
| | SVM | 0.826 | 1.581 | 1.552 | 0.512 | 1.727 |
| 18 | C50 | 0.078 | 0.446 | 1.189 | 0.835 | 0.164 |
| | KNN | 1.766 | 0.166 | 1.195 | 0.659 | 0.639 |
| | RF | 0.351 | 0.027 | 0.379 | 0.414 | 0.407 |
| | SVM | 0.721 | 0.682 | 0.682 | 0.663 | 0.644 |
| 19 | C50 | 0.636 | 0.062 | 0.347 | 1.515 | 0.388 |
| | KNN | 0.099 | 2.152 | 0.724 | 0.747 | 0.243 |
| | RF | 1.309 | 0.751 | 2.264 | 0.566 | 0.302 |
| | SVM | 0.033 | 1.142 | 0.343 | 1.126 | 2.135 |
| 20 | C50 | 0.226 | 0.459 | 0.366 | 0.737 | 0.381 |
| | KNN | 0.693 | 0.656 | 0.104 | 0.110 | 0.266 |
| | RF | 0.086 | 0.961 | 0.827 | 1.319 | 0.120 |
| | SVM | 0.610 | 1.173 | 0.819 | 0.448 | 0.440 |
| mean | | 0.554 | 0.654 | 0.775 | 0.754 | 0.645 |
| No | Classifier | bw = 0.05 | bw = 0.1 | bw = 0.2 |
|---|---|---|---|---|
| 1 | C50 | 2.383 | 2.383 | 2.383 |
| | KNN | 0.552 | 0.552 | 0.552 |
| | RF | 0.479 | 0.479 | 0.479 |
| | SVM | 0.792 | 0.792 | 0.792 |
| 2 | C50 | 0.076 | 1.519 | 1.519 |
| | KNN | 0.474 | 0.615 | 0.615 |
| | RF | 1.233 | 1.016 | 1.016 |
| | SVM | 1.075 | 1.040 | 1.040 |
| 3 | C50 | 0.741 | 0.078 | 0.078 |
| | KNN | 0.202 | 1.766 | 1.766 |
| | RF | 0.061 | 0.351 | 0.351 |
| | SVM | 0.155 | 0.721 | 0.721 |
| 4 | C50 | 0.161 | 0.941 | 0.161 |
| | KNN | 0.188 | 0.188 | 0.845 |
| | RF | 0.871 | 1.687 | 0.760 |
| | SVM | 0.054 | 0.719 | 1.213 |
| 5 | C50 | 0.538 | 0.538 | 0.538 |
| | KNN | 0.214 | 0.214 | 0.214 |
| | RF | 0.618 | 0.618 | 0.618 |
| | SVM | 0.485 | 0.485 | 0.485 |
| 6 | C50 | 0.268 | 0.268 | 0.268 |
| | KNN | 0.243 | 0.243 | 0.243 |
| | RF | 0.159 | 0.159 | 0.159 |
| | SVM | 0.737 | 0.737 | 0.737 |
| 7 | C50 | 0.347 | 0.347 | 0.347 |
| | KNN | 0.122 | 0.122 | 0.122 |
| | RF | 0.317 | 0.317 | 0.317 |
| | SVM | 0.252 | 0.252 | 0.252 |
| 8 | C50 | 0.148 | 0.148 | 0.145 |
| | KNN | 0.888 | 0.888 | 0.231 |
| | RF | 0.236 | 0.236 | 0.439 |
| | SVM | 0.479 | 0.479 | 1.185 |
| 9 | C50 | 0.807 | 0.807 | 0.807 |
| | KNN | 0.025 | 0.640 | 1.039 |
| | RF | 0.065 | 0.060 | 0.690 |
| | SVM | 0.297 | 1.662 | 1.652 |
| 10 | C50 | 0.344 | 0.344 | 0.020 |
| | KNN | 0.071 | 0.071 | 0.675 |
| | RF | 0.500 | 0.500 | 0.500 |
| | SVM | 0.848 | 0.848 | 0.359 |
| 11 | C50 | 0.365 | 0.365 | 0.365 |
| | KNN | 0.514 | 0.514 | 0.267 |
| | RF | 0.152 | 0.152 | 0.152 |
| | SVM | 0.316 | 0.316 | 0.316 |
| 12 | C50 | 0.213 | 0.930 | 0.930 |
| | KNN | 1.047 | 0.312 | 0.312 |
| | RF | 0.257 | 0.514 | 0.514 |
| | SVM | 1.112 | 0.839 | 0.839 |
| 13 | C50 | 1.667 | 1.667 | 0.641 |
| | KNN | 0.483 | 0.483 | 0.058 |
| | RF | 1.975 | 1.975 | 0.256 |
| | SVM | 1.056 | 1.056 | 0.784 |
| 14 | C50 | 0.289 | 0.462 | 0.462 |
| | KNN | 0.349 | 1.024 | 1.024 |
| | RF | 0.078 | 0.328 | 0.328 |
| | SVM | 0.209 | 0.405 | 0.405 |
| 15 | C50 | 1.290 | 0.197 | 0.454 |
| | KNN | 0.206 | 0.802 | 0.318 |
| | RF | 0.839 | 0.839 | 0.667 |
| | SVM | 0.134 | 0.134 | 0.425 |
| 16 | C50 | 0.348 | 0.348 | 0.100 |
| | KNN | 1.033 | 1.033 | 0.066 |
| | RF | 0.418 | 0.418 | 0.085 |
| | SVM | 1.127 | 1.127 | 0.023 |
| 17 | C50 | 1.697 | 0.419 | 1.062 |
| | KNN | 1.368 | 0.896 | 0.424 |
| | RF | 0.240 | 1.467 | 0.613 |
| | SVM | 1.843 | 0.210 | 0.826 |
| 18 | C50 | 0.741 | 0.078 | 0.078 |
| | KNN | 0.202 | 1.766 | 1.766 |
| | RF | 0.061 | 0.351 | 0.351 |
| | SVM | 0.155 | 0.721 | 0.721 |
| 19 | C50 | 0.695 | 0.636 | 0.636 |
| | KNN | 0.842 | 0.099 | 0.099 |
| | RF | 0.955 | 1.309 | 1.309 |
| | SVM | 0.783 | 0.033 | 0.033 |
| 20 | C50 | 0.226 | 0.055 | 0.226 |
| | KNN | 0.693 | 0.080 | 0.693 |
| | RF | 0.448 | 1.362 | 0.086 |
| | SVM | 0.610 | 0.244 | 0.610 |
| mean | | 0.827 | 0.831 | 0.829 |
No | Name | # of Features | # of Instances | # of Classes |
---|---|---|---|---|
1 | audit | 25 | 772 | 2 |
2 | avila | 10 | 10,430 | 12 |
3 | breastcancer | 30 | 569 | 2 |
4 | breastTissue | 91 | 106 | 6 |
5 | ecoli | 7 | 336 | 8 |
6 | Frogs_MFCCs | 22 | 7127 | 3 |
7 | gender_classification | 7 | 5001 | 2 |
8 | glass | 9 | 214 | 6 |
9 | hill_Valley | 100 | 1212 | 2 |
10 | ionosphere | 33 | 351 | 2 |
11 | iris | 4 | 150 | 3 |
12 | liver | 6 | 345 | 2 |
13 | music_genre | 26 | 1000 | 10 |
14 | pima_diabetes | 8 | 768 | 2 |
15 | satimage | 36 | 4435 | 6 |
16 | seed | 7 | 210 | 3 |
17 | statlog_segment | 16 | 2310 | 7 |
18 | wdbc | 30 | 569 | 2 |
19 | winequality | 11 | 4893 | 6 |
20 | Wireless_Indoor | 7 | 2000 | 4 |
Classifier | R Package | Parameter Values |
---|---|---|
KNN | class | k = 5 |
SVM | e1071 | Default |
RF | randomForest | Default |
C50 | C50 | trials = 1 |
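For reference, a hedged sketch of how the classifiers above could be invoked with the listed packages and parameter values. `fit_and_predict` is an illustrative wrapper, not the authors' evaluation code; it assumes `train_y` is a factor and the feature tables are numeric data frames.

```r
library(class)         # KNN
library(e1071)         # SVM
library(randomForest)  # RF
library(C50)           # C5.0 decision tree

# Train each classifier on (train_x, train_y) and predict labels for test_x,
# using the parameter values listed in the table above.
fit_and_predict <- function(train_x, train_y, test_x) {
  list(
    knn = class::knn(train_x, test_x, cl = train_y, k = 5),
    svm = predict(e1071::svm(train_x, train_y), test_x),
    rf  = predict(randomForest::randomForest(train_x, train_y), test_x),
    c50 = predict(C50::C5.0(train_x, train_y, trials = 1), test_x)
  )
}
```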
| Dataset | Classifier | MA | SD | RBS Accuracy | FWS Accuracy | RBS MAI | FWS MAI |
|---|---|---|---|---|---|---|---|
| 1 | C50 | 0.998 | 0.004 | 0.99479 | 0.990 | 0.972 | 2.383 |
| | KNN | 0.957 | 0.014 | 0.94792 | 0.949 | 0.628 | 0.552 |
| | RF | 0.998 | 0.003 | 0.99479 | 1.000 | 1.083 | 0.479 |
| | SVM | 0.969 | 0.012 | 0.9375 | 0.959 | 2.623 | 0.792 |
| 2 | C50 | 0.974 | 0.004 | 0.97694 | 0.974 | 0.645 | 0.076 |
| | KNN | 0.698 | 0.007 | 0.69254 | 0.702 | 0.758 | 0.474 |
| | RF | 0.979 | 0.003 | 0.97771 | 0.975 | 0.388 | 1.233 |
| | SVM | 0.69 | 0.011 | 0.68255 | 0.678 | 0.652 | 1.075 |
| 3 | C50 | 0.936 | 0.021 | 0.986 | 0.951 | 2.389 | 0.741 |
| | KNN | 0.968 | 0.013 | 0.986 | 0.965 | 1.348 | 0.202 |
| | RF | 0.959 | 0.017 | 0.972 | 0.958 | 0.740 | 0.061 |
| | SVM | 0.974 | 0.012 | 0.993 | 0.972 | 1.532 | 0.155 |
| 4 | C50 | 0.665 | 0.069 | 0.708 | 0.676 | 0.632 | 0.161 |
| | KNN | 0.661 | 0.078 | 0.625 | 0.595 | 0.458 | 0.845 |
| | RF | 0.699 | 0.066 | 0.583 | 0.649 | 1.746 | 0.760 |
| | SVM | 0.598 | 0.070 | 0.583 | 0.514 | 0.215 | 1.213 |
| 5 | C50 | 0.803 | 0.036 | 0.783 | 0.822 | 0.546 | 0.538 |
| | KNN | 0.850 | 0.027 | 0.855 | 0.856 | 0.209 | 0.214 |
| | RF | 0.860 | 0.029 | 0.831 | 0.878 | 0.987 | 0.618 |
| | SVM | 0.807 | 0.061 | 0.735 | 0.778 | 1.191 | 0.485 |
| 6 | C50 | 0.963 | 0.005 | 0.967 | 0.961 | 0.754 | 0.268 |
| | KNN | 0.992 | 0.002 | 0.991 | 0.992 | 0.319 | 0.243 |
| | RF | 0.987 | 0.003 | 0.984 | 0.987 | 0.973 | 0.159 |
| | SVM | 0.992 | 0.002 | 0.990 | 0.990 | 1.037 | 0.737 |
| 7 | C50 | 0.972 | 0.004 | 0.970 | 0.970 | 0.369 | 0.347 |
| | KNN | 0.965 | 0.005 | 0.970 | 0.965 | 1.086 | 0.122 |
| | RF | 0.974 | 0.004 | 0.977 | 0.973 | 0.655 | 0.317 |
| | SVM | 0.972 | 0.004 | 0.973 | 0.970 | 0.298 | 0.252 |
| 8 | C50 | 0.686 | 0.055 | 0.615 | 0.694 | 1.275 | 0.145 |
| | KNN | 0.634 | 0.049 | 0.596 | 0.645 | 0.768 | 0.231 |
| | RF | 0.779 | 0.048 | 0.750 | 0.758 | 0.608 | 0.439 |
| | SVM | 0.686 | 0.048 | 0.673 | 0.629 | 0.276 | 1.185 |
| 9 | C50 | 0.505 | 0.002 | 0.505 | 0.503 | 0.110 | 0.807 |
| | KNN | 0.548 | 0.025 | 0.558 | 0.549 | 0.380 | 0.025 |
| | RF | 0.600 | 0.026 | 0.653 | 0.601 | 2.061 | 0.065 |
| | SVM | 0.515 | 0.017 | 0.545 | 0.510 | 1.776 | 0.297 |
| 10 | C50 | 0.9 | 0.03 | 0.862 | 0.890 | 1.276 | 0.344 |
| | KNN | 0.844 | 0.029 | 0.839 | 0.846 | 0.169 | 0.071 |
| | RF | 0.934 | 0.022 | 0.954 | 0.923 | 0.882 | 0.500 |
| | SVM | 0.942 | 0.022 | 0.943 | 0.923 | 0.017 | 0.848 |
| 11 | C50 | 0.938 | 0.036 | 0.944 | 0.951 | 0.174 | 0.365 |
| | KNN | 0.96 | 0.031 | 0.944 | 0.951 | 0.484 | 0.267 |
| | RF | 0.956 | 0.03 | 0.944 | 0.951 | 0.376 | 0.152 |
| | SVM | 0.961 | 0.031 | 0.944 | 0.951 | 0.538 | 0.316 |
| 12 | C50 | 0.648 | 0.048 | 0.756 | 0.637 | 2.252 | 0.213 |
| | KNN | 0.607 | 0.045 | 0.605 | 0.560 | 0.062 | 1.047 |
| | RF | 0.725 | 0.043 | 0.767 | 0.714 | 0.983 | 0.257 |
| | SVM | 0.693 | 0.04 | 0.721 | 0.648 | 0.689 | 1.112 |
| 13 | C50 | 0.485 | 0.03 | 0.484 | 0.466 | 0.029 | 0.641 |
| | KNN | 0.616 | 0.027 | 0.568 | 0.617 | 1.790 | 0.058 |
| | RF | 0.645 | 0.027 | 0.628 | 0.652 | 0.610 | 0.256 |
| | SVM | 0.654 | 0.028 | 0.648 | 0.633 | 0.229 | 0.784 |
| 14 | C50 | 0.736 | 0.029 | 0.724 | 0.745 | 0.423 | 0.289 |
| | KNN | 0.734 | 0.026 | 0.719 | 0.724 | 0.570 | 0.349 |
| | RF | 0.762 | 0.025 | 0.734 | 0.760 | 1.109 | 0.078 |
| | SVM | 0.761 | 0.026 | 0.724 | 0.755 | 1.405 | 0.209 |
| 15 | C50 | 0.857 | 0.010 | 0.859 | 0.852 | 0.225 | 0.454 |
| | KNN | 0.901 | 0.008 | 0.898 | 0.898 | 0.343 | 0.318 |
| | RF | 0.910 | 0.008 | 0.914 | 0.915 | 0.548 | 0.667 |
| | SVM | 0.891 | 0.008 | 0.892 | 0.894 | 0.056 | 0.425 |
| 16 | C50 | 0.908 | 0.039 | 0.882 | 0.912 | 0.664 | 0.100 |
| | KNN | 0.928 | 0.032 | 0.882 | 0.930 | 1.421 | 0.066 |
| | RF | 0.927 | 0.035 | 0.882 | 0.930 | 1.276 | 0.085 |
| | SVM | 0.929 | 0.030 | 0.882 | 0.930 | 1.533 | 0.023 |
| 17 | C50 | 0.964 | 0.008 | 0.977 | 0.973 | 1.644 | 1.062 |
| | KNN | 0.959 | 0.007 | 0.963 | 0.956 | 0.661 | 0.424 |
| | RF | 0.978 | 0.006 | 0.980 | 0.974 | 0.465 | 0.613 |
| | SVM | 0.944 | 0.008 | 0.955 | 0.950 | 1.340 | 0.826 |
| 18 | C50 | 0.936 | 0.021 | 0.986 | 0.951 | 0.895 | 0.741 |
| | KNN | 0.968 | 0.013 | 0.986 | 0.965 | 0.965 | 0.202 |
| | RF | 0.959 | 0.017 | 0.972 | 0.958 | 0.951 | 0.061 |
| | SVM | 0.974 | 0.012 | 0.993 | 0.972 | 0.979 | 0.155 |
| 19 | C50 | 0.573 | 0.014 | 0.636 | 0.581 | 4.636 | 0.636 |
| | KNN | 0.543 | 0.012 | 0.571 | 0.541 | 2.329 | 0.099 |
| | RF | 0.684 | 0.011 | 0.723 | 0.699 | 3.404 | 1.309 |
| | SVM | 0.571 | 0.011 | 0.577 | 0.572 | 0.497 | 0.033 |
| 20 | C50 | 0.97 | 0.007 | 0.972 | 0.968 | 0.296 | 0.226 |
| | KNN | 0.984 | 0.005 | 0.98 | 0.980 | 0.732 | 0.693 |
| | RF | 0.984 | 0.005 | 0.978 | 0.984 | 1.040 | 0.086 |
| | SVM | 0.981 | 0.005 | 0.98 | 0.984 | 0.158 | 0.610 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).