A Survey of Six Classical Classifiers, Including Algorithms, Methodological Characteristics, Foundational Variants, and Recent Advances
Abstract
1. Introduction
2. Methodology
3. Classification Approaches
3.1. Support Vector Machines (SVM)
| Algorithm 1 Support Vector Machine (Kernel Soft-Margin SVM) |
| Require: Training set D = {(x_i, y_i)}_{i=1}^n; regularization parameter C; kernel function K(·,·); tolerance ε. Ensure: Support vectors, their weights α_i, and bias b. 1: Function SVM(D, C, K) 2: Compute the kernel similarities K(x_i, x_j) between all training pairs to form the kernel matrix. 3: Solve the kernelized dual optimization problem in (5) to obtain α. 4: Select support vectors: S = {i : α_i > ε}. 5: Compute the bias b using any support vector with 0 < α_i < C; if several exist, average the bias values. 6: Return {(x_i, y_i, α_i) : i ∈ S} and b 7: End function Prediction: for a new point x, compute the decision score f(x) = Σ_{i∈S} α_i y_i K(x_i, x) + b and assign the label using (2). |
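To ground Algorithm 1, the following is a minimal sketch of training and applying a kernel soft-margin SVM with scikit-learn; the synthetic data, the RBF kernel, and the parameter values are illustrative assumptions rather than settings taken from any of the surveyed studies.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative data; any labeled numeric dataset can be used the same way.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Standardize features: kernel SVMs are sensitive to feature scales.
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Kernel soft-margin SVM: C is the regularization parameter and the RBF
# kernel plays the role of K(., .); the dual problem is solved internally.
clf = SVC(C=1.0, kernel="rbf", gamma="scale").fit(X_tr, y_tr)

print("number of support vectors:", int(clf.n_support_.sum()))
print("decision scores:", clf.decision_function(X_te[:3]))  # sign gives the class
print("test accuracy:", clf.score(X_te, y_te))
```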
3.1.1. Characteristics
3.1.2. Foundational Variants
3.2. K-Nearest Neighbors (KNN)
| Algorithm 2 Classical k-Nearest Neighbors (KNN) |
| Require: Training set D = {(x_i, y_i)}_{i=1}^n; neighborhood size k; distance measure d(·,·); scaling rule. Ensure: Predicted class ŷ for a query instance x. 1: Function KNN(x, D, k, d) 2: Store all labeled instances in D. 3: Scale/standardize x and the stored data using the same rule. 4: For each (x_i, y_i) ∈ D 5: Compute distance d(x, x_i). 6: End for 7: Select the k smallest distances and their labels. 8: Assign ŷ by majority vote among those labels. 9: Return ŷ 10: End function |
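A minimal NumPy sketch of Algorithm 2 follows; Euclidean distance and z-score standardization are assumed here purely for illustration, and any distance measure or scaling rule could be substituted.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Classical kNN as in Algorithm 2: distances, k smallest, majority vote."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean d(x, x_i)
    nearest = np.argsort(dists)[:k]                    # indices of k smallest distances
    votes = Counter(y_train[nearest].tolist())         # labels of those neighbors
    return votes.most_common(1)[0][0]                  # majority vote

# Illustrative use with a shared scaling rule (z-score standardization).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mu) / sigma                            # scale the stored data
x_new = (rng.normal(size=3) - mu) / sigma              # apply the same rule to the query
print(knn_predict(X_scaled, y, x_new, k=5))
```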
3.2.1. Characteristics
3.2.2. Foundational Variants
3.3. Decision Tree
| Algorithm 3 Classical Decision Tree Induction (DT) |
| Require: Training set D; feature set F; split-quality measure Q (IG, GR, or Gini); stopping rules. Ensure: Decision tree T. 1: Function DT(D, F) 2: If D is empty then return a leaf with a default class. 3: If all labels in D are identical then return that leaf. 4: If F is empty or a stopping rule holds then return a leaf with the majority class in D. 5: Select the best split s (feature/threshold) by Q. 6: Create a node for s and partition D into subsets D_1, …, D_m. 7: For each D_j do attach the subtree DT(D_j, F). 8: Return the node. 9: End function |
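The following is a compact, illustrative implementation of the induction loop in Algorithm 3, assuming NumPy arrays, numeric features, integer class labels, and the Gini criterion; the depth and node-size limits stand in for the stopping rules.

```python
import numpy as np
from collections import Counter

def gini(y):
    """Gini impurity of a label vector (non-negative integer labels assumed)."""
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def grow_tree(X, y, depth=0, max_depth=3, min_samples=5):
    """Recursive induction following Algorithm 3 with the Gini criterion."""
    # Stopping rules: pure node, too few samples, or maximum depth reached.
    if len(np.unique(y)) == 1 or len(y) < min_samples or depth == max_depth:
        return {"leaf": Counter(y.tolist()).most_common(1)[0][0]}
    best = None
    for j in range(X.shape[1]):                        # candidate features
        for t in np.unique(X[:, j]):                   # candidate thresholds
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            # Weighted impurity of the split; lower is better.
            q = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
            if best is None or q < best[0]:
                best = (q, j, t, left)
    if best is None:                                   # no useful split found
        return {"leaf": Counter(y.tolist()).most_common(1)[0][0]}
    _, j, t, left = best
    return {"feature": j, "threshold": t,
            "left": grow_tree(X[left], y[left], depth + 1, max_depth, min_samples),
            "right": grow_tree(X[~left], y[~left], depth + 1, max_depth, min_samples)}

def tree_predict(node, x):
    """Route a single instance down the tree to a leaf label."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]
```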
3.3.1. Characteristics
3.3.2. Foundational Variants
3.4. Logistic Regression (LR)
| Algorithm 4 Binary Logistic Regression (LR) |
| Require: Training set D = {(x_i, y_i)}_{i=1}^n with y_i ∈ {0, 1}; convergence tolerance ε; decision threshold τ (usually 0.5). Ensure: Coefficients β. 1: Function LR(D, ε, τ) 2: Initialize β ← 0. 3: Repeat 4: Compute p_i = P(y_i = 1 | x_i; β) for all samples using (15). 5: Evaluate the log-likelihood ℓ(β) using (16). 6: Update β to increase ℓ(β). 7: Until the change in ℓ(β) < ε. 8: Return β 9: End function Prediction: for a new point x, compute p(x) by (15); assign class 1 if p(x) ≥ τ, else class 0. |
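A minimal sketch of Algorithm 4 is given below using batch gradient ascent on the log-likelihood; the learning rate, tolerance, and iteration cap are illustrative choices, and Newton-type updates such as IRLS could replace the plain gradient step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, tol=1e-6, max_iter=10_000):
    """Batch gradient ascent on the log-likelihood, following Algorithm 4."""
    Xb = np.hstack([np.ones((len(X), 1)), X])       # prepend intercept column
    beta = np.zeros(Xb.shape[1])                    # initialize coefficients
    prev_ll = -np.inf
    for _ in range(max_iter):
        p = sigmoid(Xb @ beta)                      # P(y=1 | x) for all samples
        ll = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        if abs(ll - prev_ll) < tol:                 # convergence on log-likelihood
            break
        beta += lr * Xb.T @ (y - p) / len(y)        # gradient step (ascent)
        prev_ll = ll
    return beta

def lr_predict(beta, X, threshold=0.5):
    """Assign class 1 when the estimated probability reaches the threshold."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return (sigmoid(Xb @ beta) >= threshold).astype(int)
```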
3.4.1. Characteristics
3.4.2. Foundational Variants
3.5. Naïve Bayes
| Algorithm 5 Naïve Bayes (generic classical NB) |
| Require: Training set D = {(x_i, y_i)}_{i=1}^n; choice of likelihood model (Gaussian, Multinomial, or Bernoulli). Ensure: Class priors P(c) and feature-wise likelihood parameters. 1: Function NB(D) 2: Estimate class priors P(c) from class frequencies. 3: For each class c, estimate per-feature likelihood parameters using the chosen variant. 4: Return priors and likelihood parameters. 5: End function Prediction: for a new point x, compute a posterior score for each class c using (17) under the independence assumption, then assign the class by (18). |
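The sketch below illustrates Algorithm 5 for the Gaussian variant, estimating class priors and per-feature means and variances and scoring classes in log space; the small variance offset is an illustrative smoothing choice, and the Multinomial or Bernoulli variants would swap only the likelihood model.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature Gaussian likelihood parameters."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),              # class frequency
            "mean": Xc.mean(axis=0),                # per-feature mean
            "var": Xc.var(axis=0) + 1e-9,           # per-feature variance (smoothed)
        }
    return params

def predict_gaussian_nb(params, x):
    """Posterior score per class under the feature-independence assumption."""
    scores = {}
    for c, p in params.items():
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * p["var"])
                                + (x - p["mean"]) ** 2 / p["var"])
        scores[c] = np.log(p["prior"]) + log_lik    # log prior + log likelihood
    return max(scores, key=scores.get)              # argmax over classes
```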
3.5.1. Characteristics
3.5.2. Foundational Variants
3.6. Random Forest (RF)
| Algorithm 6 Random Forest |
| Require: Training set D; number of trees T; number of candidate features per split m; split criterion Q (Gini or entropy); stopping rules. Ensure: A forest F = {h_1, …, h_T} of T decision trees. 1: Function RF(D, T, m, Q) 2: Initialize an empty forest F ← ∅. 3: For t = 1, …, T do 4: Draw a bootstrap sample D_t from D. 5: Grow a decision tree h_t on D_t: 6: At each node, randomly select m features. 7: Select the best split by Q. 8: Repeat splitting until a stopping rule holds. 9: Add h_t to F. 10: End for 11: Return F 12: End function Prediction: for a new point x, collect votes from all trees in F and output the majority class. |
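A minimal sketch of Algorithm 6 using scikit-learn's RandomForestClassifier follows; the dataset, the number of trees, and the square-root feature-subsampling rule are illustrative choices. Note that scikit-learn averages the trees' class probabilities rather than taking hard votes, so the explicit vote at the end only mirrors the prediction rule stated in Algorithm 6.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)            # illustrative binary dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# T = 200 trees, sqrt(p) candidate features per split, Gini criterion, and
# bootstrap sampling: the main ingredients of Algorithm 6.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                criterion="gini", bootstrap=True,
                                oob_score=True, random_state=0)
forest.fit(X_tr, y_tr)

print("out-of-bag accuracy:", round(forest.oob_score_, 3))
print("test accuracy:", round(forest.score(X_te, y_te), 3))

# The per-tree hard vote of Algorithm 6, made explicit for this binary task.
votes = np.stack([tree.predict(X_te) for tree in forest.estimators_])
majority_vote = (votes.mean(axis=0) >= 0.5).astype(int)
print("agreement with forest prediction:",
      np.mean(majority_vote == forest.predict(X_te)))
```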
3.6.1. Characteristics
3.6.2. Foundational Variants
4. Recent Research
4.1. Recent Research on Support Vector Machine (SVM)
| Study | Year | Targeted Issue | Key Outcomes |
|---|---|---|---|
| Tao et al. [148] | 2020 | Class imbalance | Top mean ranks over baselines (Friedman–Holm significant); under 10:1 imbalance reaches G-Mean 0.9867, F-Measure 0.9835, AUC 0.9775; robust to outliers, border, and class noise but sensitive to hyperparameters. |
| Gao et al. [151] | 2021 | Nonlinear decision boundaries | Highest accuracy on artificial data with the advantage growing in higher dimensions; on benchmarks typically +0.2–1.75 percentage points over the second best; training ≤20 s yet ~1–2 orders of magnitude slower, test time ~10⁻⁴ s per case; top AUC on some credit sets; lower accuracy variance than baselines. |
| Wang et al. [152] | 2022 | Training efficiency | On 2-D synthetic data with label flips 0–20%, L0/1-SVM’s test accuracy decreases from ≈97% to ≈78% but remains slightly above competitors; on 14 real datasets it usually achieves the highest accuracy with the fewest support vectors and short training times (~0.57–14.26 s). |
| Francis et al. [153] | 2022 | Overfitting and generalization | Accuracy about 84.9%, 84.2% and 86.2%; corresponding precision/recall/F1 about 85.7/84.1/84.9, 85.1/83.3/84.2 and 85.0/86.7/85.9; generally lower or comparable false positive and false negative rates than the other evaluated SVM variants. |
| Pimentel et al. [154] | 2024 | Scalability and training time | On the largest Car Sales sample, training is ~10× faster than full SVM (<2 h vs. 21 h); across simulated and real data, accuracy usually stays within ≤5 percentage points of full SVM while time gains grow with size, with reduced accuracy variance and a competitive time–accuracy trade-off. |
| Sowmya et al. [157] | 2025 | Improving accuracy | SHiP-RBF achieves 96.44% and 90.12% accuracy on two intrusion-detection tasks. |
4.2. Recent Research on K-Nearest Neighbors (KNN)
| Study | Year | Targeted Issue | Key Outcomes |
|---|---|---|---|
| Maillo et al. [158] | 2020 | Scalability and efficiency | Average accuracy 76.44–77.37% for k = 3, 5, 7 with GAHS-FkNN and LHS-FkNN; on average they outperform crisp kNN variants, with GAHS slowing on high-dimensional data and LHS gaining from increased partitions. |
| Gou et al. [159] | 2022 | Robustness to K selection | Average accuracy 85.49%, 80.74%, 90.78%, and 87.17% over tabular, time-series, image, and noisy-attribute tasks; RCKNCN shows markedly improved robustness to the choice of k across these domains. |
| Ma & Chi [160] | 2022 | Similarity measure and feature weighting | Outperforms standard KNN, with accuracy ranges roughly 88–100%, 89–95%, 81–88%, and 96–98%, and greater stability as dataset size increases (accuracy ranges approximate, read from Figures 9–12). |
| Liu et al. [161] | 2022 | Hyperparameter optimization | Averaged over 30 runs on WSN-DS, kNNPL-AOA attains ACC 99.721%, DR 99.171%, and FPR 6.897%, and it achieves the highest ACC and DR among the four compared kNN-based models. |
| Kiyak et al. [162] | 2023 | Robustness to noisy/ambiguous data | Average accuracy 81.01% versus 79.76% for standard KNN, with equal or better performance on 26/32 datasets; average precision 0.8129 vs. 0.7611 (>5% improvement) and F-score 0.8111 vs. 0.7779 (>3% improvement). |
| Lin [163] | 2024 | Feature and K optimization | Achieves higher accuracy than EA-KNN in 6/10 cases and higher or equal recall in 9/10 (with one tie) but is computationally heavier than EA-KNN. |
4.3. Recent Research on Decision Tree (DT)
| Study | Year | Targeted Issue | Key Outcomes |
|---|---|---|---|
| Cai et al. [164] | 2020 | Nonlinear fuzzy splits and tree compactness | Reduces average MI from 92.88 to 79.64 and TS from 16.41 to 8.54, and against two other oblique trees it lowers MI to 75.24 (vs. 86.79/87.36) at the cost of slightly larger TS (8.13 vs. 5.39/7.03); on 12 datasets, Holm tests show it is significantly more accurate than most standard axis-aligned and fuzzy decision trees but not than C4.5 or an earlier fuzzy tree. |
| Wang et al. [165] | 2020 | Multiclass splitting under imbalance | Matches or exceeds C4.5 accuracy on all datasets and, compared with CART, is at least two percentage points more accurate on five datasets and at least two points worse on only one (“phishing”). |
| Dhebar & Deb [166] | 2021 | Nonlinear trees and interpretability | Achieves SVM-level accuracy on several benchmarks with far fewer feature appearances and rules, attains single-rule models with top accuracy (including 100% on one engineering task), and confirms scalability by producing compact, interpretable trees even on up to about 500 features. |
| Loyola-González et al. [167] | 2023 | Split-selection robustness | The two VM-DT variants attain average AUC of 0.7985 and 0.7962 (STV-VM-DT); Friedman–Finner and Bayesian tests rank them above individual split measures, and their trees have average depths of 9.83 and 9.79, node counts of 81.89 and 81.09, and training times of 14.29 and 13.28 min. |
| Zhang et al. [168] | 2024 | Causal splitting and interpretability | Produces trees with average depth 5.5 versus 8.6 for the baseline (≈35% reduction) while maintaining accuracy in the range 70.50–97.96% and AUC 63.12–97.97%, improving interpretability and fairness. |
4.4. Recent Research on Logistic Regression (LR)
| Study | Year | Targeted Issue | Key Outcomes |
|---|---|---|---|
| Sheng et al. [169] | 2022 | Interpretability in high-dimensional neuroimaging | Attains 89.5–95.8% accuracy, improves classical LR by +6.3–10.4 pp (avg +8.34 pp), yields a much more balanced sensitivity–specificity profile than LR, and achieves best or near-best accuracy compared with recent competing classifiers. |
| Song et al. [170] | 2023 | High-dimensional image outliers | On a noisy EEG task, DRLRENR attains 83.33% accuracy vs. 66.67–79.17% for baselines; on five face-image tasks with 30%-pixel corruption it achieves 82.81–98.44% accuracy. |
| Charizanos et al. [171] | 2024 | Separation with class imbalance | On synthetic MAE-optimized models, fuzzy LR averages Spec 0.946, Sens 0.839, F1 0.874, MCC 0.744; on five real datasets, best F1 spans 0.807–0.996 and it drastically reduces Sens = 0/Spec = 1 collapse (45–62.5% vs. 0.9–1.3%) and separation-induced perfect-score runs (35–50% vs. <3%). |
| Khashei et al. [172] | 2024 | Loss–decision mismatch in LR | Attains 88.07% average accuracy vs. 81.95% for classical LR; on a Japanese dataset it reaches 91.58% vs. 85.33% and generally surpasses recent single and hybrid statistical/intelligent classifiers. |
| Sun [173] | 2025 | High-dimensional, correlated SNP data in GWAS LR | Lowers misclassification in high-dimensional SNP settings (for example, at n = 5000 its MCR is about 0.19–0.20 vs. 0.32–0.42). In a coronary artery disease GWAS it achieves MCR about 0.29–0.30 vs. 0.46–0.61 while selecting SNPs in genes previously linked to cardiac or metabolic traits. |
4.5. Recent Research on Naïve Bayes (NB)
| Study | Year | Targeted Issue | Key Outcomes |
|---|---|---|---|
| Chen et al. [174] | 2020 | Dependence and redundancy among features | Reduces zero–one loss and RMSE more often than it worsens them relative to classical NB (wins/draws/losses 33/14/18 and 36/9/20; p ≈ 0.05), and its average runtime (0.123 s) lies between the fastest and slowest methods. |
| Zhang et al. [175] | 2021 | Unequal informativeness of attributes and training instances | The eager and lazy variants lift mean accuracy to 84.94% and 85.52% (vs. 83.86–84.81% for competing NB variants), never lose in win–tie–loss counts (eager 7–13 wins, 23–29 ties; lazy 7–15 wins, 21–29 ties), and show clear Wilcoxon dominance with R+ far exceeding R− (eager 488.0–555.5 vs. 110.5–142.0; lazy 462.0–591.5 vs. 74.5–168.0). |
| Alizadeh et al. [176] | 2021 | Correlated high-dimensional features | Average AUC 0.92 and accuracy 0.89, while alternatives stay at or below 0.91 AUC and 0.87 accuracy. It records 98/62 wins–losses in AUC and 124/30 in accuracy, achieves the best Friedman ranks (3.7941, 2.7647; p = 3.08 × 10⁻⁶, 0.009), and post hoc tests confirm statistically significant gains. |
| Yang et al. [177] | 2023 | Uncertain and ambiguous predictions | Attains average F1 of 0.9501/0.9081 and precision of 0.9648/0.9289; versus standard Naïve Bayes on discrete data F1 improves from 0.6364 to 0.9167 and precision from 0.7778 to 1.0000, and versus Gaussian NB on continuous data F1 rises from 0.8036 to 0.9967 and precision from 0.5285 to 0.8850. |
| Kim & Lee [178] | 2023 | Class imbalance | Achieves average AUC gains over NB of +6.76%, +6.71%, and +6.56%, with the best AUC on eleven of the thirty datasets; Wilcoxon tests report p < 0.05 versus baselines. |
4.6. Recent Research on Random Forest (RF)
| Study | Year | Targeted Issue | Key Outcomes |
|---|---|---|---|
| Gajowniczek et al. [179] | 2020 | Overfitting and limited generalization in tree ensembles | Yields large, significant score/AUC gains on two tasks (e.g., 61.7 → 80.7 with 0.93 → 0.985 and 30.6 → 86.1 with ≈0.97 → >0.995), but only slight, non-significant score changes (~31.5 → 31.9) and near-baseline AUC (≈0.87–0.99 with changes within about ±0.01) on the remaining tasks. |
| Bi et al. [180] | 2020 | Overfitting in small-sample, high-dimensional data | On AD vs. HC, it reaches ~90% accuracy with 340 trees and 7 evolutions, and an RF using the top 290 CERF-selected pairs reaches 91.3%; it outperforms unimodal and t-test baselines and generalizes with stable accuracy to EMCI (37/36) and PPMI-PD (55/49) cohorts. |
| Wan et al. [181] | 2021 | Redundancy and inefficiency in large random forests | Achieves 97.09–98.12% accuracy (≈+0.7–0.8 pp over RF baselines), cuts diagnosis time to 3.1 min (~7× faster), and reduces training and sub-forest optimization times by 51–73% and 49–72% when using four workers; it also keeps >91% accuracy at 20% noise and ~80% at 100% noise, and under class imbalance it remains slightly ahead of the RF baselines. |
| Jalal et al. [182] | 2022 | Redundant features in high-dimensional, imbalanced text data | On binary SMS and Hate & Offensive data, IRFTC attains accuracies of 0.922 and 0.940 (≈+2.1 and +5.9 pp over RF), and on multiclass US Airline and Hate Speech it reaches 0.957 and 0.925 (≈+1.4 and +2.3 pp); it also shows the lowest accuracy standard deviations in 10-fold CV. |
| Tian et al. [183] | 2023 | Interpretability of feature subsets | Matches RF accuracy on NSCLC (0.9457 vs. 0.9483) and hESC (0.9280 vs. 0.9301), while its top-100 features are far more connected (NSCLC 20.65 vs. 94.9 components, largest component 73.75 vs. 3.7; hESC 31.15 vs. 83.85, largest 67 vs. 7.95); it also attains much higher feature-selection AUC (>0.9 vs. 0.6–0.75) with more connected predictors. |
4.7. Summary Across Performance Perspectives
5. Discussion and Future Research Directions
- In Random Forest classifiers, performance gains rely on randomness from bootstrap sampling of instances and random subspace selection of features across individual trees. Although Random Forests can be more computationally demanding than simpler models, they do not uniformly outperform alternative classifiers across tasks and datasets [184]. Future research could allocate region-specific ensemble sizes to parts of the feature space (for example, subsets of instances defined by a clear procedure such as out-of-bag margin diagnostics, leaf-neighborhood error, or clustering near class boundaries [132,185,186]), particularly where complex boundaries arise from overlap, imbalance, noise, outliers, or proximity to class margins. In such cases, allocating more trees to difficult regions and fewer to easy ones, under a fixed total-tree budget, may yield a controlled accuracy–compute trade-off [186,187]. We are not aware of any method that, during training within a single forest, explicitly allocates different numbers of trees to distinct regions; existing variants instead adjust capacity globally, reweight or prune trees, fit local forests, or use dynamic ensemble selection at prediction time. Pursuing this direction may yield higher accuracy under fixed training and inference budgets, with complexity that is controllable and justifiable.
- Hyperparameters are central determinants of whether a classifier underfits, overfits, or generalizes effectively [188]. Despite the availability of automated tuning methods, however, they are still often selected heuristically using domain expertise or tuned empirically [189,190]. A promising direction is to regularize these hyperparameters using signals derived directly from training instances rather than relying on decision-boundary alignment or aggregate accuracy metrics. For instance, one could estimate an instance error ratio for each misclassified example by comparing it with a peer instance from the same class that is correctly classified with high confidence. This quantifies each instance’s relative contribution to training and allows these per-instance values to guide optimization directly. While related to instance weighting and prototype-based methods, this mechanism differs by offering a boundary-agnostic, exemplar-driven signal that can be incorporated into the training of traditional classifiers. This direction may reduce the number of optimization iterations and the overall fitting time, but computing per-instance signals introduces additional overhead, so the net benefit would need to be demonstrated empirically [191].
- A large proportion of recent studies benchmark novel classification methods against well-established algorithms, typically demonstrating relative performance improvements using standard metrics such as accuracy, F1-score, and AUC. Yet the influence of preprocessing (e.g., normalization, handling missing values, noise reduction) and feature selection techniques is often overlooked in such evaluations [47,192]. Because feature values and dimensionality directly influence how instances align with a classifier’s decision boundary, preprocessing strategies should be systematically included in performance assessments, particularly under noisy or imbalanced conditions where outcomes can be disproportionately affected [193]. A similar concern applies to dimensionality reduction methods (e.g., PCA, LDA, t-SNE), which can substantially reshape the feature space and thereby alter classification outcomes [194]. Future research should therefore prioritize integrated evaluation frameworks that jointly assess classifiers together with preprocessing, feature selection, and dimensionality reduction pipelines, enabling benchmarking that more faithfully reflects the complexities of real-world data and the robustness of algorithms across diverse scenarios [195]; a minimal sketch of such a jointly evaluated pipeline is given after this list.
- The perceived contribution and generalizability of numerous proposed methodologies can be restricted by the challenges that real-world datasets frequently present, including high dimensionality, overlapping classes, imbalanced instance ratios, and scalability issues [54,196,197]. One major reason is that these methods are often fine-tuned for performance in specific experimental settings, and the benchmark datasets used to test them may not accurately represent the variety and complexity of real-world situations [195]. To fill this gap, future research could investigate modular or hybrid models that remain understandable at the component level. Such models could integrate dedicated mechanisms to handle distinct challenges when shaping the decision boundary. For example, one component might regulate the trade-off between accuracy and overfitting in regions with a relatively simple structure, while another could activate synthetic resampling to mitigate imbalance. An additional adaptive parameter could be designed to adjust the boundary flexibly in highly non-linear regions. With such next-generation modular models, classification methods may achieve greater robustness on real-world datasets and extend their applicability to unforeseen or emerging cases.
- In many classification methods, explicitly representing the learned decision boundary in the feature space is challenging, especially in high-dimensional settings where interpretability is constrained [198]. For example, ensemble approaches such as random forests induce fragmented, nonparametric boundaries, and probabilistic models such as Naïve Bayes yield class-conditional rules that are not easily visualized in high-dimensional spaces. Linear SVMs can yield explicit hyperplanes, but multiclass extensions and kernelized variants often increase computational complexity. Future research could investigate representing decision boundaries via per-class boundary instances: for each class, the decision boundary would be explicitly encoded by a small set of class-specific border instances. These boundary instances extend beyond one-vs.-one oppositions and form a comprehensive per-class boundary structure that governs membership decisions [44,199]. From these sets, compact per-class summaries (for example, a handful of representative border examples or simple local rules) could be derived with parallel procedures, lowering training and refresh cost while yielding more interpretable and defensible decision boundaries. When new classes emerge, models could be updated gradually rather than retrained end to end, with precautions to preserve established boundaries. To ensure stability, adaptive mechanisms should minimize redundant updates and manage boundary drift for additional samples within existing classes [200].
- The distributional characteristics of instances in the feature space, as indicated by their shapes and regional densities, are crucial for classifier selection and may substantially affect the computational effort required during the learning process [201]. A potentially fruitful research direction is to derive model requirements dynamically from the inherent characteristics of the data distribution, rather than relying solely on experimental parameter tuning to identify the optimal configuration for each distribution [202]. Standard deviation (capturing how spread out the features are), covariance (capturing how variables relate to one another), and empirically estimated probability density functions could all serve as signals for dynamically guiding the learning process [203]. In other words, a promising path is to reduce reliance on search-heavy tuning in favor of methods that adapt automatically to the data distribution. This is consistent with recent advances in meta-learning and automated machine learning (AutoML), and it extends these techniques toward more direct, distribution-driven adaptation [204].
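As anticipated in the third bullet above, the following is a minimal sketch of a jointly evaluated pipeline, assuming scikit-learn and an illustrative dataset; the preprocessing options, feature-selection step, classifier, and grid values are placeholders for whatever components a given benchmark actually compares, not a prescribed evaluation protocol.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)            # illustrative dataset

# Preprocessing, feature selection, and the classifier are tuned together,
# so reported scores reflect the whole pipeline rather than the classifier alone.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", SVC()),
])
grid = {
    "scale": [StandardScaler(), MinMaxScaler()],       # preprocessing choices
    "select__k": [10, 20, 30],                         # feature-selection strength
    "clf__C": [0.1, 1, 10],                            # classifier hyperparameter
}
search = GridSearchCV(pipe, grid, scoring="f1",
                      cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```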
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260.
- Bzdok, D.; Krzywinski, M.; Altman, N. Machine learning: Supervised methods. Nat. Methods 2018, 15, 5.
- Wu, X.; Kumar, V.; Ross Quinlan, J.; Ghosh, J.; Yang, Q.; Motoda, H.; McLachlan, G.J.; Ng, A.; Liu, B.; Yu, P.S. Top 10 algorithms in data mining. Knowl. Inf. Syst. 2008, 14, 1–37.
- Sarker, I.H. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021, 2, 160.
- Hollmann, N.; Müller, S.; Purucker, L.; Krishnakumar, A.; Körfer, M.; Hoo, S.B.; Schirrmeister, R.T.; Hutter, F. Accurate predictions on small data with a tabular foundation model. Nature 2025, 637, 319–326.
- Smith, A.M.; Walsh, J.R.; Long, J.; Davis, C.B.; Henstock, P.; Hodge, M.R.; Maciejewski, M.; Mu, X.J.; Ra, S.; Zhao, S. Standard machine learning approaches outperform deep representation learning on phenotype prediction from transcriptomics data. BMC Bioinform. 2020, 21, 119.
- Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 2019, 1, 206–215.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Xie, F.; Zhou, J.; Lee, J.W.; Tan, M.; Li, S.; Rajnthern, L.S.O.; Chee, M.L.; Chakraborty, B.; Wong, A.-K.I.; Dagan, A. Benchmarking emergency department prediction models with machine learning and public electronic health records. Sci. Data 2022, 9, 658.
- Dumitrescu, E.; Hué, S.; Hurlin, C.; Tokpavi, S. Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects. Eur. J. Oper. Res. 2022, 297, 1178–1192.
- Li, Q.; Peng, H.; Li, J.; Xia, C.; Yang, R.; Sun, L.; Yu, P.S.; He, L. A survey on text classification: From traditional to deep learning. ACM Trans. Intell. Syst. Technol. (TIST) 2022, 13, 1–41.
- Wang, P.; Fan, E.; Wang, P. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognit. Lett. 2021, 141, 61–67.
- Zhang, C.; Jia, D.; Wang, L.; Wang, W.; Liu, F.; Yang, A. Comparative research on network intrusion detection methods based on machine learning. Comput. Secur. 2022, 121, 102861.
- Adugna, T.; Xu, W.; Fan, J. Comparison of random forest and support vector machine classifiers for regional land cover mapping using coarse resolution FY-3C images. Remote Sens. 2022, 14, 574.
- Theissler, A.; Pérez-Velázquez, J.; Kettelgerdes, M.; Elger, G. Predictive maintenance enabled by machine learning: Use cases and challenges in the automotive industry. Reliab. Eng. Syst. Saf. 2021, 215, 107864.
- Wolpert, D.H. The lack of a priori distinctions between learning algorithms. Neural Comput. 1996, 8, 1341–1390.
- Fernández-Delgado, M.; Cernadas, E.; Barro, S.; Amorim, D. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 2014, 15, 3133–3181.
- Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215.
- Biau, G.; Scornet, E. A random forest guided tour. Test 2016, 25, 197–227.
- Mienye, I.D.; Jere, N. A survey of decision trees: Concepts, algorithms, and applications. IEEE Access 2024, 12, 86716–86727.
- de Hond, A.A.; Leeuwenberg, A.M.; Hooft, L.; Kant, I.M.; Nijman, S.W.; van Os, H.J.; Aardoom, J.J.; Debray, T.P.; Schuit, E.; van Smeden, M. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: A scoping review. NPJ Digit. Med. 2022, 5, 2.
- Woodman, R.J.; Mangoni, A.A. A comprehensive review of machine learning algorithms and their application in geriatric medicine: Present and future. Aging Clin. Exp. Res. 2023, 35, 2363–2397.
- Menghani, G. Efficient deep learning: A survey on making deep learning models smaller, faster, and better. ACM Comput. Surv. 2023, 55, 1–37.
- Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152.
- Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
- Wang, J.; Zhao, Z.; Zhu, J.; Li, X.; Dong, F.; Wan, S. Improved support vector machine for voiceprint diagnosis of typical faults in power transformers. Machines 2023, 11, 539.
- Kalita, D.J.; Singh, S. SVM Hyper-parameters optimization using quantized multi-PSO in dynamic environment. Soft Comput. Fusion Found. Methodol. Appl. 2020, 24, 1225.
- Chang, C.-C.; Lin, C.-J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2011, 2, 1–27.
- Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
- Luo, Y.; Tseng, H.-H.; Cui, S.; Wei, L.; Ten Haken, R.K.; El Naqa, I. Balancing accuracy and interpretability of machine learning approaches for radiation treatment outcomes modeling. BJR Open 2019, 1, 20190021.
- Sayeed, M.A.; Rahman, A.; Rahman, A.; Rois, R. On the interpretability of the SVM model for predicting infant mortality in Bangladesh. J. Health Popul. Nutr. 2024, 43, 170.
- Rezvani, S.; Pourpanah, F.; Lim, C.P.; Wu, Q. Methods for class-imbalanced learning with support vector machines: A review and an empirical evaluation. arXiv 2024, arXiv:2406.03398.
- Iranmehr, A.; Masnadi-Shirazi, H.; Vasconcelos, N. Cost-sensitive support vector machines. Neurocomputing 2019, 343, 50–64.
- Steinwart, I. Sparseness of support vector machines. J. Mach. Learn. Res. 2003, 4, 1071–1105.
- Chapelle, O.; Vapnik, V.; Bousquet, O.; Mukherjee, S. Choosing multiple parameters for support vector machines. Mach. Learn. 2002, 46, 131–159.
- Hofmann, T.; Schölkopf, B.; Smola, A.J. Kernel methods in machine learning. Ann. Statist. 2008, 36, 1171–1220.
- Azzeh, M.; Elsheikh, Y.; Nassif, A.B.; Angelis, L. Examining the performance of kernel methods for software defect prediction based on support vector machine. Sci. Comput. Program. 2023, 226, 102916.
- Piccialli, V.; Sciandrone, M. Nonlinear optimization and support vector machines. Ann. Oper. Res. 2022, 314, 15–47.
- Schölkopf, B.; Smola, A.J.; Williamson, R.C.; Bartlett, P.L. New support vector algorithms. Neural Comput. 2000, 12, 1207–1245.
- Suykens, J.A.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300.
- Khemchandani, R.; Chandra, S. Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 905–910.
- Zhu, J.; Rosset, S.; Tibshirani, R.; Hastie, T. 1-norm support vector machines. Adv. Neural Inf. Process. Syst. 2003, 16.
- Veropoulos, K.; Campbell, C.; Cristianini, N. Controlling the sensitivity of support vector machines. In Proceedings of the International Joint Conference on AI, Stockholm, Sweden, 31 July–6 August 1999; p. 60.
- Crammer, K.; Singer, Y. On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2001, 2, 265–292.
- Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
- Aha, D.W.; Kibler, D.; Albert, M.K. Instance-based learning algorithms. Mach. Learn. 1991, 6, 37–66.
- Singh, D.; Singh, B. Investigating the impact of data normalization on classification performance. Appl. Soft Comput. 2020, 97, 105524.
- Blanco-Mallo, E.; Morán-Fernández, L.; Remeseiro, B.; Bolón-Canedo, V. Do all roads lead to Rome? Studying distance measures in the context of machine learning. Pattern Recognit. 2023, 141, 109646.
- Jia, H.; Cheung, Y.-m.; Liu, J. A new distance metric for unsupervised learning of categorical data. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 1065–1079.
- Hall, P.; Park, B.U.; Samworth, R.J. Choice of neighbor order in nearest-neighbor classification. Ann. Statist. 2008, 36, 2135–2152.
- Zhang, S.; Li, X.; Zong, M.; Zhu, X.; Cheng, D. Learning k for knn classification. ACM Trans. Intell. Syst. Technol. (TIST) 2017, 8, 1–19.
- Gadat, S.; Klein, T.; Marteau, C. Classification in general finite dimensional spaces with the k-nearest neighbor rule. Ann. Statist. 2016, 44, 982–1009.
- François, D.; Wertz, V.; Verleysen, M. The concentration of fractional distances. IEEE Trans. Knowl. Data Eng. 2007, 19, 873–886.
- Radovanovic, M.; Nanopoulos, A.; Ivanovic, M. Hubs in space: Popular nearest neighbors in high-dimensional data. J. Mach. Learn. Res. 2010, 11, 2487–2531.
- Györfi, L.; Weiss, R. Universal consistency and rates of convergence of multiclass prototype algorithms in metric spaces. J. Mach. Learn. Res. 2021, 22, 1–25.
- Döring, M.; Györfi, L.; Walk, H. Rate of convergence of k-nearest-neighbor classification rule. J. Mach. Learn. Res. 2018, 18, 1–16.
- Lu, S.-C.; Swisher, C.L.; Chung, C.; Jaffray, D.; Sidey-Gibbons, C. On the importance of interpretable machine learning predictions to inform clinical decision making in oncology. Front. Oncol. 2023, 13, 1129380.
- Chen, G.H.; Shah, D. Explaining the success of nearest neighbor methods in prediction. Found. Trends® Mach. Learn. 2018, 10, 337–588.
- Mullick, S.S.; Datta, S.; Das, S. Adaptive learning-based k-nearest neighbor classifiers with resilience to class imbalance. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5713–5725.
- Zhang, X.; Li, Y.; Kotagiri, R.; Wu, L.; Tari, Z.; Cheriet, M. KRNN: K rare-class nearest neighbour classification. Pattern Recognit. 2017, 62, 33–44.
- Muja, M.; Lowe, D.G. Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 2227–2240.
- Wilson, D.L. Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans. Syst. Man Cybern. 2007, 3, 408–421.
- Dudani, S.A. The distance-weighted k-nearest-neighbor rule. IEEE Trans. Syst. Man Cybern. 1976, 4, 325–327.
- Keller, J.M.; Gray, M.R.; Givens, J.A. A fuzzy k-nearest neighbor algorithm. IEEE Trans. Syst. Man Cybern. 1985, 4, 580–585.
- Sun, S.; Huang, R. An adaptive k-nearest neighbor algorithm. In Proceedings of the 2010 Seventh International Conference on Fuzzy Systems And Knowledge Discovery, Yantai, China, 10–12 August 2010; IEEE: New York City, NY, USA, 2010; pp. 91–94.
- Weinberger, K.Q.; Saul, L.K. Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 2009, 10, 207–244.
- Hart, P. The condensed nearest neighbor rule (corresp.). IEEE Trans. Inf. Theory 1968, 14, 515–516.
- Quinlan, J.R. C4.5: Programs for Machine Learning; Elsevier: Amsterdam, The Netherlands, 2014.
- Breiman, L.; Friedman, J.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Chapman and Hall/CRC: Boca Raton, FL, USA, 2017.
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
- Costa, V.G.; Pedreira, C.E. Recent advances in decision trees: An updated survey. Artif. Intell. Rev. 2023, 56, 4765–4800.
- Zhang, G.; Gionis, A. Regularized impurity reduction: Accurate decision trees with complexity guarantees. Data Min. Knowl. Discov. 2023, 37, 434–475.
- Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674.
- Klusowski, J.M.; Tian, P.M. Large scale prediction with decision trees. J. Am. Stat. Assoc. 2024, 119, 525–537.
- Lazebnik, T.; Bunimovich-Mendrazitsky, S. Decision tree post-pruning without loss of accuracy using the SAT-PP algorithm with an empirical evaluation on clinical data. Data Knowl. Eng. 2023, 145, 102173.
- Liu, W.; Tsang, I.W. Making decision trees feasible in ultrahigh feature and label dimensions. J. Mach. Learn. Res. 2017, 18, 1–36.
- Loh, W.Y. Fifty years of classification and regression trees. Int. Stat. Rev. 2014, 82, 329–348.
- Esposito, F.; Malerba, D.; Semeraro, G.; Kay, J. A comparative analysis of methods for pruning decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 1997, 19, 476–491.
- Cieslak, D.A.; Hoens, T.R.; Chawla, N.V.; Kegelmeyer, W.P. Hellinger distance decision trees are robust and skew-insensitive. Data Min. Knowl. Discov. 2012, 24, 136–158.
- Zhu, Y.; Li, C.; Dunson, D.B. Classification trees for imbalanced data: Surface-to-volume regularization. J. Am. Stat. Assoc. 2023, 118, 1707–1717.
- Gajowniczek, K.; Ząbkowski, T. ImbTreeEntropy and ImbTreeAUC: Novel R packages for decision tree learning on the imbalanced datasets. Electronics 2021, 10, 657.
- Kass, G.V. An exploratory technique for investigating large quantities of categorical data. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1980, 29, 119–127.
- Domingos, P.; Hulten, G. Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 20–23 August 2000; pp. 71–80.
- Mehta, M.; Agrawal, R.; Rissanen, J. SLIQ: A fast scalable classifier for data mining. In International Conference on Extending Database Technology; Springer: Berlin/Heidelberg, Germany, 1996; pp. 18–32.
- Shafer, J.; Agrawal, R.; Mehta, M. SPRINT: A Scalable Parallel Classifier for Data Mining; Vldb: San Jose, CA, USA, 1996; pp. 544–555.
- Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B Stat. Methodol. 1958, 20, 215–232.
- Nelder, J.A.; Wedderburn, R.W. Generalized linear models. J. R. Stat. Soc. Ser. A Stat. Soc. 1972, 135, 370–384.
- Theil, H. A multinomial extension of the linear logit model. In Henri Theil’s Contributions to Economics and Econometrics: Econometric Theory and Methodology; Springer: Berlin/Heidelberg, Germany, 1992; pp. 181–191.
- Lin, C.-J.; Weng, R.C.; Keerthi, S.S. Trust region Newton method for large-scale logistic regression. J. Mach. Learn. Res. 2008, 9, 627–650.
- Riley, R.D.; Snell, K.I.; Ensor, J.; Burke, D.L.; Harrell, F.E., Jr.; Moons, K.G.; Collins, G.S. Minimum sample size for developing a multivariable prediction model: PART II-binary and time-to-event outcomes. Stat. Med. 2019, 38, 1276–1296.
- Mansournia, M.A.; Geroldinger, A.; Greenland, S.; Heinze, G. Separation in logistic regression: Causes, consequences, and control. Am. J. Epidemiol. 2018, 187, 864–870.
- Ostrovskii, D.M.; Bach, F. Finite-sample analysis of M-estimators using self-concordance. Electron. J. Statist. 2021, 15, 326–391.
- Sur, P.; Candès, E.J. A modern maximum-likelihood theory for high-dimensional logistic regression. Proc. Natl. Acad. Sci. USA 2019, 116, 14516–14525.
- Vatcheva, K.P.; Lee, M.; McCormick, J.B.; Rahbar, M.H. Multicollinearity in regression analyses conducted in epidemiologic studies. Epidemiology 2016, 6, 227.
- Norton, E.C.; Dowd, B.E. Log odds and the interpretation of logit models. Health Serv. Res. 2018, 53, 859–878.
- Van Calster, B.; McLernon, D.J.; Van Smeden, M.; Wynants, L.; Steyerberg, E.W. Calibration: The Achilles heel of predictive analytics. BMC Med. 2019, 17, 230.
- Moons, K.G.; Altman, D.G.; Reitsma, J.B.; Ioannidis, J.P.; Macaskill, P.; Steyerberg, E.W.; Vickers, A.J.; Ransohoff, D.F.; Collins, G.S. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): Explanation and elaboration. Ann. Intern. Med. 2015, 162, W1–W73.
- Murdoch, W.J.; Singh, C.; Kumbier, K.; Abbasi-Asl, R.; Yu, B. Definitions, methods, and applications in interpretable machine learning. Proc. Natl. Acad. Sci. USA 2019, 116, 22071–22080.
- King, G.; Zeng, L. Logistic regression in rare events data. Political Anal. 2001, 9, 137–163.
- Esposito, C.; Landrum, G.A.; Schneider, N.; Stiefl, N.; Riniker, S. GHOST: Adjusting the decision threshold to handle imbalanced data in machine learning. J. Chem. Inf. Model. 2021, 61, 2623–2640.
- Krawczyk, B. Learning from imbalanced data: Open challenges and future directions. Prog. Artif. Intell. 2016, 5, 221–232.
- Bewick, V.; Cheek, L.; Ball, J. Statistics review 14: Logistic regression. Crit. Care 2005, 9, 112.
- Cessie, S.L.; Houwelingen, J.V. Ridge estimators in logistic regression. J. R. Stat. Soc. Ser. C Appl. Stat. 1992, 41, 191–201.
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
- Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
- McCullagh, P. Regression models for ordinal data. J. R. Stat. Soc. Ser. B Methodol. 1980, 42, 109–127.
- Firth, D. Bias reduction of maximum likelihood estimates. Biometrika 1993, 80, 27–38.
- McFadden, D. Conditional Logit Analysis of Qualitative Choice Behavior; University of California, Berkeley: Berkeley, CA, USA, 1972.
- Hastie, T.; Tibshirani, R. Generalized additive models. Stat. Sci. 1986, 1, 297–310.
- Domingos, P.; Pazzani, M. On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 1997, 29, 103–130.
- Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4.
- Sebastiani, F. Machine learning in automated text categorization. ACM Comput. Surv. (CSUR) 2002, 34, 1–47.
- Xu, S. Bayesian Naïve Bayes classifiers to text classification. J. Inf. Sci. 2018, 44, 48–59.
- Chen, S.F.; Goodman, J. An empirical study of smoothing techniques for language modeling. Comput. Speech Lang. 1999, 13, 359–394.
- Wang, W.; Duan, Y.; Cao, L.; Jiang, Z. Application of improved Naive Bayes classification algorithm in 5G signaling analysis. J. Supercomput. 2023, 79, 6941.
- Raizada, R.D.; Lee, Y.-S. Smoothness without smoothing: Why Gaussian naive Bayes is not naive for multi-subject searchlight studies. PLoS ONE 2013, 8, e69566.
- Pajila, P.B.; Sheena, B.G.; Gayathri, A.; Aswini, J.; Nalini, M. A comprehensive survey on naive bayes algorithm: Advantages, limitations and applications. In Proceedings of the 2023 4th International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 20–22 September 2023; IEEE: New York City, NY, USA, 2023; pp. 1228–1234.
- Fang, X. Inference-based naive bayes: Turning naive bayes cost-sensitive. IEEE Trans. Knowl. Data Eng. 2012, 25, 2302–2313.
- Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, DC, USA, 4–6 August 2001; pp. 41–46.
- Abellán, J.; Castellano, J.G. Improving the Naive Bayes classifier via a quick variable selection method using maximum of entropy. Entropy 2017, 19, 247.
- Berend, D.; Kontorovich, A. A finite sample analysis of the Naive Bayes classifier. J. Mach. Learn. Res. 2015, 16, 1519–1545.
- Rahnama, A.H.A.; Bütepage, J.; Geurts, P.; Boström, H. Can local explanation techniques explain linear additive models? Data Min. Knowl. Discov. 2024, 38, 237–280.
- Nagahisarchoghaei, M.; Nur, N.; Cummins, L.; Nur, N.; Karimi, M.M.; Nandanwar, S.; Bhattacharyya, S.; Rahimi, S. An empirical survey on explainable ai technologies: Recent trends, use-cases, and categories from technical and application perspectives. Electronics 2023, 12, 1092.
- Lu, Y.; Cheung, Y.-M.; Tang, Y.Y. Bayes imbalance impact index: A measure of class imbalanced data set for classification problem. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 3525–3539.
- Blanquero, R.; Carrizosa, E.; Ramírez-Cobo, P.; Sillero-Denamiel, M.R. Constrained Naïve Bayes with application to unbalanced data classification. Cent. Eur. J. Oper. Res. 2022, 30, 1403–1425.
- Treder, M.S. MVPA-light: A classification and regression toolbox for multi-dimensional data. Front. Neurosci. 2020, 14, 289.
- Friedman, N.; Geiger, D.; Goldszmidt, M. Bayesian network classifiers. Mach. Learn. 1997, 29, 131–163.
- Webb, G.I.; Boughton, J.R.; Wang, Z. Not so naive Bayes: Aggregating one-dependence estimators. Mach. Learn. 2005, 58, 5–24.
- Rennie, J.D.; Shih, L.; Teevan, J.; Karger, D.R. Tackling the poor assumptions of naive bayes text classifiers. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 616–623.
- John, G.H.; Langley, P. Estimating continuous distributions in Bayesian classifiers. arXiv 2013, arXiv:1302.4964.
- Langley, P.; Sage, S. Induction of selective Bayesian classifiers. In Uncertainty in Artificial Intelligence; Elsevier: Amsterdam, The Netherlands, 1994; pp. 399–406.
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
- Chi, C.-M.; Vossler, P.; Fan, Y.; Lv, J. Asymptotic properties of high-dimensional random forests. Ann. Stat. 2022, 50, 3415–3438.
- Mentch, L.; Zhou, S. Randomization as regularization: A degrees of freedom explanation for random forest success. J. Mach. Learn. Res. 2020, 21, 1–36.
- Wright, M.N.; Ziegler, A. ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 2017, 77, 1–17.
- Lin, Y.; Jeon, Y. Random forests and adaptive nearest neighbors. J. Am. Stat. Assoc. 2006, 101, 578–590.
- Basu, S.; Kumbier, K.; Brown, J.B.; Yu, B. Iterative random forests to discover predictive and stable high-order interactions. Proc. Natl. Acad. Sci. USA 2018, 115, 1943–1948.
- Deng, H. Interpreting tree ensembles with intrees. Int. J. Data Sci. Anal. 2019, 7, 277–287.
- Athey, S.; Tibshirani, J.; Wager, S. Generalized random forests. Ann. Statist. 2019, 47, 1148–1178.
- O’Brien, R.; Ishwaran, H. A random forests quantile classifier for class imbalanced data. Pattern Recognit. 2019, 90, 232–249.
- He, J.; Cheng, M.X. Weighting methods for rare event identification from imbalanced datasets. Front. Big Data 2021, 4, 715320.
- Probst, P.; Wright, M.N.; Boulesteix, A.L. Hyperparameters and tuning strategies for random forest. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2019, 9, e1301.
- Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42.
- Rodriguez, J.J.; Kuncheva, L.I.; Alonso, C.J. Rotation forest: A new classifier ensemble method. IEEE Trans. Pattern Anal. Mach. Intell. 2006, 28, 1619–1630.
- Hothorn, T.; Hornik, K.; Zeileis, A. Unbiased recursive partitioning: A conditional inference framework. J. Comput. Graph. Stat. 2006, 15, 651–674.
- Menze, B.H.; Kelm, B.M.; Splitthoff, D.N.; Koethe, U.; Hamprecht, F.A. On oblique random forests. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2011; pp. 453–469.
- Lin, C.-F.; Wang, S.-D. Fuzzy support vector machines. IEEE Trans. Neural Netw. 2002, 13, 464–471.
- Tao, X.; Li, Q.; Ren, C.; Guo, W.; He, Q.; Liu, R.; Zou, J. Affinity and class probability-based fuzzy support vector machine for imbalanced data sets. Neural Netw. 2020, 122, 289–307.
- Tax, D.M.; Duin, R.P. Support vector data description. Mach. Learn. 2004, 54, 45–66.
- Yu, K.; Ji, L.; Zhang, X. Kernel nearest-neighbor algorithm. Neural Process. Lett. 2002, 15, 147–156.
- Gao, Z.; Fang, S.-C.; Luo, J.; Medhin, N. A kernel-free double well potential support vector machine with applications. Eur. J. Oper. Res. 2021, 290, 248–262.
- Wang, H.; Shao, Y.; Zhou, S.; Zhang, C.; Xiu, N. Support Vector Machine Classifier via L0/1 Soft-Margin Loss. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7253–7265.
- Francis, L.M.; Sreenath, N. Robust scene text recognition: Using manifold regularized twin-support vector machine. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 589–604.
- Pimentel, J.S.; Ospina, R.; Ara, A. A novel fusion Support Vector Machine integrating weak and sphere models for classification challenges with massive data. Decis. Anal. J. 2024, 11, 100457.
- Strack, R.; Kecman, V.; Strack, B.; Li, Q. Sphere support vector machines for large classification tasks. Neurocomputing 2013, 101, 59–67.
- Wang, S.; Li, Z.; Liu, C.; Zhang, X.; Zhang, H. Training data reduction to speed up SVM training. Appl. Intell. 2014, 41, 405–420.
- Sowmya, T.; Anita, E.M. A Novel SHiP Vector Machine for Network Intrusion Detection. IEEE Access 2025, 13, 117445–117463.
- Maillo, J.; García, S.; Luengo, J.; Herrera, F.; Triguero, I. Fast and scalable approaches to accelerate the fuzzy k-nearest neighbors classifier for big data. IEEE Trans. Fuzzy Syst. 2019, 28, 874–886.
- Gou, J.; Sun, L.; Du, L.; Ma, H.; Xiong, T.; Ou, W.; Zhan, Y. A representation coefficient-based k-nearest centroid neighbor classifier. Expert Syst. Appl. 2022, 194, 116529.
- Ma, C.; Chi, Y. KNN normalized optimization and platform tuning based on hadoop. IEEE Access 2022, 10, 81406–81433.
- Liu, G.; Zhao, H.; Fan, F.; Liu, G.; Xu, Q.; Nazir, S. An enhanced intrusion detection model based on improved kNN in WSNs. Sensors 2022, 22, 1407.
- Ozturk Kiyak, E.; Ghasemkhani, B.; Birant, D. High-Level K-Nearest Neighbors (HLKNN): A supervised machine learning model for classification analysis. Electronics 2023, 12, 3828.
- Lin, K.Y.C. Optimizing variable selection and neighbourhood size in the K-nearest neighbour algorithm. Comput. Ind. Eng. 2024, 191, 110142.
- Cai, Y.; Zhang, H.; He, Q.; Duan, J. A novel framework of fuzzy oblique decision tree construction for pattern classification. Appl. Intell. 2020, 50, 2959–2975.
- Wang, F.; Wang, Q.; Nie, F.; Li, Z.; Yu, W.; Ren, F. A linear multivariate binary decision tree classifier based on K-means splitting. Pattern Recognit. 2020, 107, 107521.
- Dhebar, Y.; Deb, K. Interpretable rule discovery through bilevel optimization of split-rules of nonlinear decision trees for classification problems. IEEE Trans. Cybern. 2020, 51, 5573–5584.
- Loyola-González, O.; Ramírez-Sáyago, E.; Medina-Pérez, M.A. Towards improving decision tree induction by combining split evaluation measures. Knowl. Based Syst. 2023, 277, 110832.
- Zhang, S.; Chen, X.; Ran, X.; Li, Z.; Cao, W. Prioritizing causation in decision trees: A framework for interpretable modeling. Eng. Appl. Artif. Intell. 2024, 133, 108224.
- Sheng, J.; Wu, S.; Zhang, Q.; Li, Z.; Huang, H. A binary classification study of Alzheimer’s disease based on a novel subclass weighted logistic regression method. IEEE Access 2022, 10, 68846–68856.
- Song, Z.; Wang, L.; Xu, X.; Zhao, W. Doubly robust logistic regression for image classification. Appl. Math. Model. 2023, 123, 430–446.
- Charizanos, G.; Demirhan, H.; İçen, D. A Monte Carlo fuzzy logistic regression framework against imbalance and separation. Inf. Sci. 2024, 655, 119893.
- Khashei, M.; Etemadi, S.; Bakhtiarvand, N. A new discrete learning-based logistic regression classifier for Bankruptcy prediction. Wirel. Pers. Commun. 2024, 134, 1075–1092.
- Sun, W. Integrative functional logistic regression model for genome-wide association studies. Comput. Biol. Med. 2025, 187, 109766.
- Chen, S.; Webb, G.I.; Liu, L.; Ma, X. A novel selective naïve Bayes algorithm. Knowl. Based Syst. 2020, 192, 105361.
- Zhang, H.; Jiang, L.; Yu, L. Attribute and instance weighted naive Bayes. Pattern Recognit. 2021, 111, 107674.
- Alizadeh, S.H.; Hediehloo, A.; Harzevili, N.S. Multi independent latent component extension of naive Bayes classifier. Knowl. Based Syst. 2021, 213, 106646.
- Yang, Z.; Ren, J.; Zhang, Z.; Sun, Y.; Zhang, C.; Wang, M.; Wang, L. A new three-way incremental naive Bayes classifier. Electronics 2023, 12, 1730.
- Kim, T.; Lee, J.-S. Maximizing AUC to learn weighted naive Bayes for imbalanced data classification. Expert Syst. Appl. 2023, 217, 119564.
- Gajowniczek, K.; Grzegorczyk, I.; Ząbkowski, T.; Bajaj, C. Weighted random forests to improve arrhythmia classification. Electronics 2020, 9, 99.
- Bi, X.-a.; Hu, X.; Wu, H.; Wang, Y. Multimodal data analysis of Alzheimer’s disease based on clustering evolutionary random forest. IEEE J. Biomed. Health Inform. 2020, 24, 2973–2983.
- Wan, L.; Gong, K.; Zhang, G.; Yuan, X.; Li, C.; Deng, X. An efficient rolling bearing fault diagnosis method based on spark and improved random forest algorithm. IEEE Access 2021, 9, 37866–37882.
- Jalal, N.; Mehmood, A.; Choi, G.S.; Ashraf, I. A novel improved random forest for text classification using feature ranking and optimal number of trees. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 2733–2742.
- Tian, L.; Wu, W.; Yu, T. Graph random forest: A graph embedded algorithm for identifying highly connected important features. Biomolecules 2023, 13, 1153.
- Shmuel, A.; Glickman, O.; Lazebnik, T. A comprehensive benchmark of machine and deep learning models on structured data for regression and classification. Neurocomputing 2025, 655, 131337.
- Shi, T.; Horvath, S. Unsupervised learning with random forest predictors. J. Comput. Graph. Stat. 2006, 15, 118–138.
- Yang, F.; Lu, W.-h.; Luo, L.-k.; Li, T. Margin optimization based pruning for random forest. Neurocomputing 2012, 94, 54–63.
- Ragab Hassen, H.; Alabdeen, Y.Z.; Gaber, M.M.; Sharma, M. D2TS: A dual diversity tree selection approach to pruning of random forests. Int. J. Mach. Learn. Cybern. 2023, 14, 467–481.
- Morales-Hernández, A.; Van Nieuwenhuyse, I.; Rojas Gonzalez, S. A survey on multi-objective hyperparameter optimization algorithms for machine learning. Artif. Intell. Rev. 2023, 56, 8043–8093.
- Gong, J.; Chen, T. Deep configuration performance learning: A systematic survey and taxonomy. ACM Trans. Softw. Eng. Methodol. 2024, 34, 1–62.
- Kannengiesser, N.; Hasebrook, N.; Morsbach, F.; Zöller, M.-A.; Franke, J.K.; Lindauer, M.; Hutter, F.; Sunyaev, A. Practitioner Motives to Use Different Hyperparameter Optimization Methods. ACM Trans. Comput. Hum. Interact. 2025, 32, 1–33.
- Soviany, P.; Ionescu, R.T.; Rota, P.; Sebe, N. Curriculum learning: A survey. Int. J. Comput. Vis. 2022, 130, 1526–1565.
- Xie, J.; Wang, M.; Grant, P.W.; Pedrycz, W. Feature selection with discernibility and independence criteria. IEEE Trans. Knowl. Data Eng. 2024, 36, 6195–6209.
- Niño-Adan, I.; Landa-Torres, I.; Portillo, E.; Manjarres, D. Influence of statistical feature normalisation methods on K-Nearest Neighbours and K-Means in the context of industry 4.0. Eng. Appl. Artif. Intell. 2022, 111, 104807.
- Jia, W.; Sun, M.; Lian, J.; Hou, S. Feature dimensionality reduction: A review. Complex Intell. Syst. 2022, 8, 2663–2693.
- Gijsbers, P.; Bueno, M.L.; Coors, S.; LeDell, E.; Poirier, S.; Thomas, J.; Bischl, B.; Vanschoren, J. Amlb: An automl benchmark. J. Mach. Learn. Res. 2024, 25, 1–65.
- Santos, M.S.; Abreu, P.H.; Japkowicz, N.; Fernández, A.; Santos, J. A unifying view of class overlap and imbalance: Key concepts, multi-view panorama, and open avenues for research. Inf. Fusion 2023, 89, 228–253.
- Verbraeken, J.; Wolting, M.; Katzy, J.; Kloppenburg, J.; Verbelen, T.; Rellermeyer, J.S. A survey on distributed machine learning. ACM Comput. Surv. (CSUR) 2020, 53, 1–33.
- Sohns, J.T.; Garth, C.; Leitte, H. Decision boundary visualization for counterfactual reasoning. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2023; Volume 42, pp. 7–20.
- Doǧan, Ü.; Glasmachers, T.; Igel, C. A unified view on multi-class support vector classification. J. Mach. Learn. Res. 2016, 17, 1550–1831.
- Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A. A survey on concept drift adaptation. ACM Comput. Surv. (CSUR) 2014, 46, 1–37.
- Lorena, A.C.; Garcia, L.P.; Lehmann, J.; Souto, M.C.; Ho, T.K. How complex is your classification problem? A survey on measuring classification complexity. ACM Comput. Surv. (CSUR) 2019, 52, 1–34.
- Rivolli, A.; Garcia, L.P.; Soares, C.; Vanschoren, J.; de Carvalho, A.C. Meta-features for meta-learning. Knowl. Based Syst. 2022, 240, 108101.
- Alcobaça, E.; Siqueira, F.; Rivolli, A.; Garcia, L.P.; Oliva, J.T.; De Carvalho, A.C. MFE: Towards reproducible meta-feature extraction. J. Mach. Learn. Res. 2020, 21, 1–5.
- Feurer, M.; Eggensperger, K.; Falkner, S.; Lindauer, M.; Hutter, F. Auto-sklearn 2.0: Hands-free automl via meta-learning. J. Mach. Learn. Res. 2022, 23, 1–61.
| Source/Model | DT | RF | NB | SVM | LR | KNN |
|---|---|---|---|---|---|---|
| IEEE: Title | 97 | 167 | 23 | 192 | 34 | 79 |
| IEEE: Abstract | 987 | 1550 | 283 | 2194 | 462 | 1227 |
| Science Direct | 4133 | 12,558 | 1067 | 5249 | 12,246 | 3342 |
| MDPI: Title | 44 | 121 | 10 | 126 | 21 | 16 |
| MDPI: Abstract | 2853 | 8689 | 849 | 5822 | 4305 | 2058 |
| Springer Nature: Title | 32 | 69 | 8 | 105 | 14 | 33 |
| Classifier | Identification Pool Entering Screening |
|---|---|
| DT | 216 |
| RF | 396 |
| NB | 80 |
| SVM | 581 |
| LR | 279 |
| KNN | 244 |
| Variant | Targeted Limit | Methodology | Trade-Off/Limitation |
|---|---|---|---|
| ν-SVM [39] | The parameter C is hard to interpret as a control of margin errors and sparsity. | Reparametrizes soft-margin SVM by replacing C with ν ∈ (0,1], which upper-bounds the fraction of margin errors and lower-bounds the fraction of support vectors. | Easier to understand but less flexible than C-SVC because errors and complexity are not controlled separately. |
| Least Squares Support Vector Machine (LS-SVM) [40] | Standard SVM needs QP training. | Uses equality constraints and a squared-error loss, reducing training to solving a system of linear equations and yielding dense example weights (a minimal sketch follows this table). | Dense solution, so prediction aggregates most training points. |
| Twin Support Vector Machine (TWSVM) [41] | One large QP slows training. | Solves two smaller class-specific QPs to learn nonparallel hyperplanes and assigns new points to the class whose hyperplane has the smallest perpendicular distance. | Two QPs and several hyperparameters make it more complex. |
| 1-norm Support Vector Machine (1-norm SVM) [42] | Lack of sparsity and interpretability under high-dimensional noise. | Replaces the L2 penalty with an L1 penalty that drives many coefficients to zero and uses a solution-path algorithm to trace the piecewise-linear path as features enter or leave the model. | May drop small but relevant effects; sparsity is limited by sample size; the path computation can be heavy on very large datasets. |
| Sensitivity-controlled Support Vector Machine [43] | Class imbalance. | Uses class-dependent misclassification penalties and class-dependent kernel regularization to bias the margin toward the minority or critical class; hyperparameters are tuned using ROC-based sensitivity–specificity trade-offs. | Extra hyperparameters; still sensitive to kernel choice and data distribution. |
| Crammer–Singer Multiclass Support Vector Machine [44] | Multiclass handling. | Formulates a single global multiclass objective with one weight vector per class; enforces a margin between true and other class scores and solves it via decomposition over individual training examples. | Global, interdependent optimization increases computational complexity. |
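To make the LS-SVM row above concrete, the following is a minimal NumPy sketch of its training step, assuming a small dense dataset with labels in {−1, +1} and an RBF kernel; the function and parameter names are illustrative rather than taken from the cited work. Consistent with the trade-off noted in the table, training reduces to one linear system instead of a QP, and every training point receives a (generally nonzero) weight.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Pairwise RBF similarities between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def lssvm_fit(X, y, reg=1.0, gamma=0.5):
    # y must be in {-1, +1}; LS-SVM training solves one linear system.
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, gamma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / reg
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha (dense weights), bias b

def lssvm_predict(X_train, y_train, alpha, b, X_new, gamma=0.5):
    # Prediction aggregates (nearly) all training points, as noted above.
    K = rbf_kernel(X_new, X_train, gamma)
    return np.sign(K @ (alpha * y_train) + b)
```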
| Variant | Targeted Limit | Methodology | Trade-Off/Limitation |
|---|---|---|---|
| Edited Nearest Neighbor (ENN) [62] | Sharpen boundaries and reduce error. | Iteratively removes samples whose label disagrees with the majority of their k nearest neighbors (typically k = 3). | Strongly dependent on the choice of k. |
| Distance-Weighted k-Nearest Neighbors (DW-KNN) [63] | Improve decisions in overlapping regions and reduce ties. | Changes the voting rule by weighting neighbor votes by distance, giving nearer neighbors greater influence while keeping the original training set (a minimal sketch follows this table). | Performance depends on the chosen distance metric and weighting scheme. |
| Fuzzy k-Nearest Neighbors (FKNN) [64] | Manage unequal neighbor influence. | Replaces hard votes with distance-based class-membership degrees, so each neighbor can support multiple classes according to its distance and class distribution. | Membership definition and parameter tuning are more complex. |
| Adaptive k-Nearest Neighbors (AdaNN) [65] | Improve reliability under varying densities. | For each training sample, finds its smallest correct k in the range 1–9; at prediction time, a query adopts the k of its nearest training neighbor, adapting neighborhood size to local data. | Needs offline k-search and still uses the full dataset at test time. |
| Large Margin Nearest Neighbor (LMNN) [66] | Use discriminative metric to improve accuracy. | Learns a discriminative Mahalanobis metric that pulls same-class neighbors closer and pushes impostors beyond a margin to enlarge interclass separation. | Training is costly because it optimizes a large weight matrix. |
| Condensed Nearest Neighbor (CNN) [67] | Speed up classification. | Iteratively traverses the dataset and adds any misclassified samples to the prototype set until no training errors remain. | Sensitive to data order; the subset may approach the full set in the worst case. |
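As referenced in the DW-KNN row, a minimal sketch of distance-weighted voting is shown below, assuming NumPy arrays, Euclidean distance, and inverse-distance weights; other metrics and weighting schemes are equally valid choices.

```python
import numpy as np

def dwknn_predict(X_train, y_train, x_query, k=5, eps=1e-9):
    # Distance-weighted vote: nearer neighbors get larger weights (1/d),
    # which softens ties and overlap compared with a plain majority vote.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nn = np.argsort(dists)[:k]                 # indices of the k nearest points
    weights = 1.0 / (dists[nn] + eps)          # eps guards against d == 0
    classes = np.unique(y_train)
    scores = {c: weights[y_train[nn] == c].sum() for c in classes}
    return max(scores, key=scores.get)
```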
| Variant | Targeted Limit | Methodology | Trade-Off/Limitation |
|---|---|---|---|
| Chi-squared Automatic Interaction Detector (CHAID) [82] | Produce shallower, more interpretable trees. | Uses chi-square tests and p-values with recursive category merging and multiway splits for nominal predictors, splitting until no statistically significant association remains. | Chi-square calculations become difficult in high-dimensional settings. |
| Very Fast Decision Tree/Hoeffding Tree (VFDT) [83] | DT induction on infinite/high-speed data streams. | Incrementally updates attribute–class counts in streaming leaves and splits only when the best attribute beats the second-best by the Hoeffding bound (a minimal sketch of this test follows the table); discards weak attributes locally. | Local attribute pruning can be less effective in high-dimensional settings and relies on heuristic parameters such as sample thresholds and tie-breaking rules. |
| Supervised Learning In Quest (SLIQ) [84] | Scale training to large, disk-resident datasets. | Globally presorts numeric attributes and keeps an in-memory class list with per-attribute lists on disk; grows the tree breadth-first with a single scan per level, retains Gini-based splits, and prunes with MDL. | Training costs still grow linearly with the number of attributes. |
| Scalable Parallelizable Induction of Decision Trees (SPRINT) [85] | Further improve scalability on large, disk-resident datasets beyond what SLIQ can achieve. | Extends SLIQ by removing the centralized class list; each attribute list directly carries class labels and record IDs and uses running histograms to evaluate all candidate splits in a single pass while keeping breadth-first growth, MDL pruning, and the categorical-split strategy. | Per-level scans over all attribute lists keep training cost linear in the number of attributes, which can limit scalability in very high-dimensional data. |
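The VFDT row above hinges on the Hoeffding-bound split test. The sketch below, with illustrative parameter names, checks whether the observed gap between the best and second-best attribute scores is large enough to justify a split after n examples; the default range R = 1 corresponds to information gain on a two-class problem.

```python
import math

def hoeffding_split_check(g_best, g_second, n, delta=1e-7, value_range=1.0):
    # VFDT-style test: split only when the gain gap between the two best
    # attributes exceeds the Hoeffding bound for the n examples seen so far.
    # value_range is R, the range of the split measure
    # (e.g., log2(num_classes) for information gain).
    eps = math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))
    return (g_best - g_second) > eps, eps
```

Because the bound shrinks as n grows, a leaf that keeps receiving examples will eventually split even when the gain gap is small, which is why practical implementations add tie-breaking and minimum-sample thresholds.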
| Variant | Targeted Limit | Methodology | Trade-Off/Limitation |
|---|---|---|---|
| Ridge Logistic Regression (L2-penalized LR) [103] | Unstable maximum-likelihood estimates and poor predictive accuracy when predictors are many or strongly correlated (multicollinearity). | Maximizes a penalized log-likelihood that subtracts an L2 penalty proportional to the squared coefficient magnitudes, shrinking them toward zero and stabilizing estimation under multicollinearity (a minimal sketch follows this table). | The quadratic penalty cannot drive coefficients exactly to zero, so all predictors remain in the model and there is no true sparsity or built-in variable selection. |
| Lasso Logistic Regression (L1-penalized LR) [104] | Need for sparse logistic models with built-in variable selection when many predictors may be irrelevant. | Maximizes a penalized log-likelihood with an L1 penalty on the absolute coefficient values, driving some coefficients exactly to zero and discarding the corresponding predictors. | With correlated predictors, it often keeps only one and discards others, losing grouped information and stability. |
| Elastic Net Logistic Regression [105] | Unstable, non-sparse estimates when predictors are many and highly correlated, where ridge or lasso alone is unsatisfactory. | Maximizes a penalized log-likelihood mixing L1 and L2 penalties; the corrected version rescales coefficients to avoid double shrinkage. | The naïve elastic net over-shrinks coefficients; the corrected form alleviates this, but the original work did not treat multinomial extensions. |
| Proportional-Odds Logistic Regression (Ordinal LR) [106] | Information loss when ordinal outcomes are binarized or assigned arbitrary numeric scores. | Models the cumulative probability of being in a given or lower category as a logistic function of the predictors, with a single set of slopes and category-specific thresholds estimated across ordered classes. | Relies on proportional-odds/parallel-slopes assumption; may misfit if effects vary across thresholds. |
| Jeffreys-prior Penalized/Bias-Reduced Logistic Regression [107] | Small or sparse data (including imbalance or perfect prediction) causes biased or infinite MLE estimates. | Adds a Jeffreys-prior penalty to the likelihood and maximizes the resulting penalized likelihood iteratively with weight updates, applying data-adaptive shrinkage that stabilizes the estimates. | Bias–variance trade-off; penalty can reduce precision depending on model complexity and data distribution. |
| Multinomial Logistic Regression (Random-Utility/SoftMax LR) [108] | Binary LR is insufficient for multi-class outcomes. | Uses class-specific utilities with Gumbel noise to yield closed-form multinomial probabilities; coefficients estimated jointly by MLE. | Independence of irrelevant alternatives (IIA) may fail for similar classes; higher computational cost from joint estimation. |
| GAM-Logit (Semiparametric Logistic Regression) [109] | Linear logit assumption cannot capture complex nonlinear patterns. | Replaces linear terms with smooth functions (typically spline or basis expansions) controlled by smoothing parameters; fitted by local scoring/backfitting, which is equivalent to maximizing a penalized log-likelihood with a roughness penalty. | Additive assumption remains; sensitive to smoothing choice; higher computational cost than classical LR. |
| Rare-Events Logistic Regression (ReLogit) [99] | Standard LR underestimates probabilities in rare-event data. | Post-estimation bias correction for coefficients, probability adjustment using variance–covariance, and case–control sampling with intercept/weight correction. | Adds methodological complexity; tailored to binary rare-event contexts, less applicable to multiclass or balanced settings. |
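For the Ridge Logistic Regression row, a minimal gradient-ascent sketch of the L2-penalized log-likelihood is given below, assuming binary labels in {0, 1}; the learning rate, penalty strength, and iteration count are illustrative, and the intercept is left unpenalized, which is the common convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_logreg_fit(X, y, lam=1.0, lr=0.1, n_iter=2000):
    # Gradient ascent on the L2-penalized log-likelihood; the first weight
    # belongs to the constant column added below and is not shrunk.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (y - p)          # gradient of the log-likelihood
        grad[1:] -= lam * w[1:]        # L2 shrinkage on non-intercept terms
        w += lr * grad / len(y)
    return w

def ridge_logreg_predict(w, X, threshold=0.5):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return (sigmoid(Xb @ w) >= threshold).astype(int)
```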
| Variant | Targeted Limit | Methodology | Trade-Off/Limitation |
|---|---|---|---|
| Tree Augmented Naïve Bayes (TAN) [127] | Independence assumption limits accuracy. | Adds a tree-structured dependency: each feature has the class as a parent and at most one extra feature parent chosen via conditional mutual information, learned as a maximum spanning tree. | Conditioning on class and another feature enlarges probability tables, causing data fragmentation and unreliable estimates in high-dimensional or small datasets. |
| Averaged One-Dependence Estimators (AODE) [128] | Bias from the independence assumption while keeping NB’s efficiency. | Forms an ensemble of one-dependence models by letting each attribute act as a “super-parent” with the class; averages their class-probability estimates, falling back to NB for rare values. | Prediction is slower; gains may diminish in very high-dimensional data with sparse super-parent values. |
| Complement Naïve Bayes (CNB) [129] | Multinomial NB biasing decisions under class imbalance. | Reweights features using counts from the complement of each class (all other classes), then smooths and normalizes those complement statistics. | Departs from the standard generative form and requires extra complement-statistic computation, making prediction slightly slower than multinomial NB. |
| Kernel-Density Naïve Bayes (KDE-NB)/Flexible Bayes [130] | Gaussian NB’s rigid normality assumption fails for multimodal, skewed, or irregular continuous feature distributions. | Replaces Gaussian likelihoods with kernel density estimation: each training point contributes a Gaussian kernel with a shared bandwidth, and class densities are averages of these kernels (a minimal sketch follows this table). | Prediction must consider all training instances; it is bandwidth-sensitive and noise-vulnerable and still assumes independence across features. |
| Selective Bayesian Classifier (SBC) [131] | NB can overweight correlated or redundant features, reducing accuracy. | Keeps NB estimation but adds greedy forward feature selection (no backtracking), retaining features that improve or do not reduce training accuracy. | Selected subset is not guaranteed optimal; independence is still assumed among kept features; training-accuracy selection can overfit. |
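The KDE-NB/Flexible Bayes row is illustrated by the sketch below, which assumes continuous features, a single shared Gaussian bandwidth, and the usual NB independence assumption; the bandwidth value and function names are illustrative and would normally be tuned.

```python
import numpy as np

def kdenb_fit(X, y):
    # "Flexible Bayes": store per-class training values; each value later
    # contributes one Gaussian kernel per feature.
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    per_class = {c: X[y == c] for c in classes}
    return classes, priors, per_class

def kdenb_predict(classes, priors, per_class, x, bandwidth=0.5):
    # Class-conditional density of each feature = average of Gaussian kernels
    # centred on that class's training values; features are combined by the
    # independence assumption (sum of per-feature log densities).
    log_posts = {}
    for c in classes:
        Xc = per_class[c]                          # (n_c, d) stored values
        z = (x[None, :] - Xc) / bandwidth
        kern = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
        dens = kern.mean(axis=0)                   # per-feature KDE at x
        log_posts[c] = np.log(priors[c]) + np.log(dens + 1e-300).sum()
    return max(log_posts, key=log_posts.get)
```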
| Variant | Targeted Limit | Methodology | Trade-Off/Limitation |
|---|---|---|---|
| Extremely Randomized Trees (Extra-Trees) [143] | Improve accuracy and reduce training time. | Increases randomness by selecting random split thresholds (no greedy threshold search) in addition to random feature subsets; uses the whole training set per tree (a minimal node-split sketch follows this table). | Each tree is more biased; the method still needs tuning for features-per-node, minimum node size, and number of trees. |
| Rotation Forest [144] | Increase base-learner diversity to improve accuracy | Trains each tree on a rotated feature space: randomly splits features into subsets, applies PCA on each subset (using 75% bootstrap and random class subset), keeps all components, and builds a rotation matrix for data transformation. | Higher computational complexity from repeated PCA and no standard hyperparameter-tuning mechanism. |
| Conditional Inference Forest (cforest) [145] | Remove split-selection bias. | Uses conditional-inference trees: variables are selected via permutation-based conditional independence tests (smallest p-value chosen), then the split point is determined within that variable; forests are grown on bootstraps with random feature subsampling. | More computation due to statistical testing at each node. |
| Oblique Random Forest (Oblique RF-Ridge) [146] | High-dimensional or correlated data. | Replaces axis-aligned splits with oblique splits using weighted linear combinations of features; ridge regression at each node finds the split direction, then samples are projected and thresholded by Gini. | Slower training; hyperparameters (number of trees, features per split) remain heuristically chosen. |
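As referenced in the Extra-Trees row, the sketch below shows the node-level step that distinguishes Extra-Trees from standard RF induction: each candidate feature receives one uniformly drawn threshold (no exhaustive threshold search), and the candidate with the lowest weighted Gini impurity is kept. X and y are assumed to be NumPy arrays, and the function and parameter names are illustrative.

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def extra_trees_split(X, y, n_candidate_features=3, rng=None):
    # Extra-Trees node step: draw a few random features, draw ONE random
    # threshold per feature, and keep the lowest-impurity candidate.
    rng = rng if rng is not None else np.random.default_rng()
    k = min(n_candidate_features, X.shape[1])
    features = rng.choice(X.shape[1], size=k, replace=False)
    best = None
    for f in features:
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:
            continue                                # feature is constant here
        thr = rng.uniform(lo, hi)
        left = X[:, f] <= thr
        score = left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])
        if best is None or score < best[2]:
            best = (f, thr, score)
    return best   # (feature index, threshold, weighted impurity) or None
```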
| Classifier | Accuracy and Boundary | Hyperparameters, High Dimensionality and Scalability | Class Imbalance, Multiclass Classification, Interpretability and Speed |
|---|---|---|---|
| SVM | Margin-based; kernels enable complex boundaries but accuracy depends on C and kernel settings → recent work uses membership/weighting and kernel-free nonlinear formulations → evidence is still limited for truly large-scale and sparse high-dimensional regimes. | Tuning burden is dominated by C and kernel parameters; cost grows with the number of support vectors → mitigations via margin-near active sets, coreset/weak-model selection, and SV reduction → adds heuristic/tunable settings governing the efficiency–accuracy trade-off. | Soft-margin training can bias toward the majority class → class-dependent costs and membership weighting address imbalance (a minimal class-weighting sketch follows this table) → multiclass relies on decompositions; nonlinear models remain less interpretable and SV-heavy prediction can be slow. |
| KNN | Local nonlinear; accuracy sensitive to k, distance choice, and noisy neighbors → recent work strengthens neighbor selection/metric/voting → high-dimensional distance degradation persists. | Few hyperparameters (mainly k), but storage and neighbor search dominate → mitigation via prototype reduction and approximate/distributed indexing, sometimes with joint k-tuning plus feature weighting/selection → added preprocessing/optimization cost scales with features. | Naturally multiclass and decisions are traceable to retrieved neighbors → majority-vote bias under imbalance and slow querying on large data remain → often needs explicit rebalancing/cost-sensitive handling. |
| DT | Piecewise nonlinear, axis-aligned; deep trees fit complex structure but overfit and become unstable → mitigations via more expressive/local splits and improved split selection (including shallower stopping) → instability and weak scalability evidence in very high-dimensional/large-scale settings persist. | Depth/node-size/leaves and pruning control complexity; many noisy/sparse attributes make split search costly and unreliable → mitigations via statistical-test splitting, streaming induction, and disk/parallel level-wise construction → added per-node computation/heuristic thresholds and linear dependence on attributes still constrain very high-dimensional regimes. | Interpretable path rules when shallow and supports multiclass, but impurity splits favor the majority class and large trees reduce clarity and speed → shallower/test- or causality-guided growth improves compactness while adding computation → classical imbalance bias remains largely unaddressed in the reviewed variants. |
| LR | Linear boundary; strong on near-linear structure but biased under nonlinearity → recent work adds structured effects, robust objectives, and discrete training → added complexity, and boundary remains linear unless nonlinear terms are introduced. | Few intrinsic hyperparameters, but high-d/collinearity and (quasi) separation destabilize MLE → regularization and structured representations stabilize and reduce effective dimensionality → tuning and added preprocessing/optimization can increase cost. | Interpretable coefficients, but imbalance plus fixed thresholds can miss the rare class → mitigations via thresholding/weighting and bias-reduction ideas → interpretability can weaken as added structure and tuning grow. |
| NB | Often competitive but independence and simple likelihoods can misfit dependencies/complex continuous structure → variants select/weight features, add limited dependencies or latent components, and use flexible densities → added structure/tuning and dependence effects remain limiting. | Nearly hyperparameter-free (mainly smoothing) and efficient with many features/classes → dependence-relaxing and flexible-density variants add tuning and can raise prediction or model-selection cost (e.g., cross-validated structure/latent size, bandwidth-like choices) → heavy tuning/CV can be costly at scale. | Interpretable additive per-feature terms and fast prediction; multiclass via class posteriors → skewed priors bias toward majority classes → imbalance-oriented NB variants reweight/optimize AUC and may use resampling, but add optimization and hyperparameter dependence. |
| RF | Strong accuracy; flexible nonlinear boundary via many trees → variants reweight/select trees, reduce redundancy, and use oblique/transform splits to exploit correlations → evidence still limited and gains can depend on heuristic choices. | Key settings (trees, features-per-split, leaf/depth) drive bias–variance; random subspaces help high-d, but cost scales with forest size → mitigations via extra randomization, redundancy pruning, and structure-based simplification → added tunables/heuristic thresholds and preprocessing or per-node tests can raise cost. | Majority voting supports multiclass but impurity and vote bias favor the majority class; ensemble is less transparent than single trees → some work improves diagnostics/feature-structure interpretability (graph-guided selection) → imbalance sensitivity and ensemble opacity largely persist without explicit correction. |
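As noted in the SVM row of the summary table, class-dependent misclassification costs are a common remedy for imbalance. The minimal scikit-learn sketch below uses the library's class_weight option, which rescales the per-class penalty C inversely to class frequency; it is offered only as one practical illustration, not as the specific formulation of any cited variant.

```python
from sklearn.svm import SVC

# Class-dependent penalties: 'balanced' rescales C per class inversely to
# class frequency, so minority-class errors cost more during training.
clf = SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced")
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```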
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.