Foundations and Innovations in Data Fusion and Ensemble Learning for Effective Consensus
Abstract
1. Introduction
2. Ensemble Learning Methods
3. Aggregation
4. Majority Voting
5. Theoretical Analysis of Ensemble Methods
5.1. Bias–Variance Decomposition
5.2. Margin Theory in Boosting
5.3. Ensemble Pruning
6. Bagging
7. Boosting
AdaBoost
Algorithm 1: The AdaBoost algorithm for boosting weak classifiers.
8. Random Forests and Decision Forests
8.1. Random Forests
Algorithm 2: Outline of the random forest algorithm.
8.2. Recent Advances in Random Forests
8.3. Decision Forests
8.4. Gradient-Boosted Decision Trees
8.5. AdaBoost, GBDTs Versus Random Forests
9. Comparison of Ensemble Methods: Computational Trade-Offs
10. Solving Multiclass Classification
10.1. One-Against-All Strategy
10.2. One-Against-One Strategy
10.3. Error-Correcting Output Codes
11. Dempster–Shafer Theory of Evidence
12. Multiple Kernel Learning
13. Multiview Learning
13.1. Subspace-Based Approach
13.2. Coregularization Approach
13.3. Multiview Clustering
14. Ensemble Neural Networks
15. Theoretical Results
15.1. Ensemble Size
15.2. Diversity Versus Ensemble Accuracy
15.3. Bias Versus Variance
15.4. Regularization
16. Incremental Ensemble Learning for Streaming Data
17. Ensemble Learning Versus Deep Learning
18. Empirical Validation from the Literature
Practical Applications of Ensemble Methods
- Finance: Random forests and gradient boosting models are widely used in fraud detection, where ensemble learning helps identify anomalous transaction patterns with high accuracy [374] (a minimal code sketch follows this list).
- Cybersecurity: Intrusion detection systems (IDS) benefit from ensemble learning by combining multiple weak anomaly detectors, leading to enhanced threat detection [377].
- Autonomous systems: In self-driving cars, ensembles of deep learning models help improve object detection and scene understanding, increasing safety in real-world deployment [378].
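The finance example above can be made concrete with a brief, hedged sketch: a random forest and a gradient-boosted model are fit to a synthetic, imbalanced dataset standing in for transaction records and compared by ROC-AUC. The data, class ratio, and hyperparameters (n_estimators, learning_rate, class_weight) are illustrative assumptions rather than settings from the cited studies; scikit-learn is used only because it ships standard implementations of both ensembles.

```python
# Illustrative sketch only: synthetic "transactions" stand in for real fraud data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, heavily imbalanced data: roughly 2% of samples labeled "fraud".
X, y = make_classification(
    n_samples=20_000, n_features=30, n_informative=10,
    weights=[0.98, 0.02], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    # Bagging-style ensemble of decorrelated trees (Section 8).
    "random forest": RandomForestClassifier(
        n_estimators=300, class_weight="balanced", random_state=0
    ),
    # Stagewise boosting of shallow trees (Section 8.4).
    "gradient boosting": GradientBoostingClassifier(
        n_estimators=300, learning_rate=0.05, max_depth=3, random_state=0
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    fraud_scores = model.predict_proba(X_test)[:, 1]  # estimated fraud probability
    print(f"{name}: ROC-AUC = {roc_auc_score(y_test, fraud_scores):.3f}")
```

In a deployed fraud-detection pipeline, the extreme class imbalance is usually handled more carefully (resampling, cost-sensitive losses, or calibrated decision thresholds); the class_weight="balanced" option above only gestures at that concern.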
19. Future Research Directions
19.1. Theoretical Analysis of Ensemble Learning
19.2. Integration of Ensemble Learning and Deep Learning
19.3. Multimodal Data Fusion
19.4. Processing Low-Quality Multimodal Data
20. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AdaBoost | adaptive boosting |
ADMM | alternating direction method of multipliers |
AE | autoencoder |
Agghoo | aggregated hold-out |
ARCing | adaptive resampling and combining |
bagging | bootstrap aggregating |
BPA | basic probability assignment |
CART | classification and regression trees |
CCA | canonical correlation analysis |
DAGSVM | directed acyclic graph SVM |
ECOC | error-correcting output codes |
EKF | extended Kalman filter |
EM | expectation-maximization |
EnKF | ensemble Kalman filter |
FCM | fuzzy C-means |
GBDT | gradient-boosted decision tree |
GBM | gradient boosting machines |
GFDC | granule fusion density-based clustering with evidential reasoning |
k-NN | k-nearest neighbors |
LDA | linear discriminant analysis |
LP | linear programming |
MCMC | Markov Chain Monte Carlo |
mDA | marginalized denoising autoencoder |
MFC-ACL | multiview fusion clustering with attentive contrastive learning |
MKL | multiple kernel learning |
MSE | mean squared error |
MVRL | multiview representation learning |
NMF | nonnegative matrix factorization |
MULPP | multiview uncorrelated locality-preserving projection |
OPLS | orthogonal partial least squares |
OWA | ordered weighted averaging |
OXT | online extra trees |
PAC | probably approximately correct |
PCA | principal component analysis |
pdf | probability density function |
PLS | partial least squares |
QCQP | quadratically constrained quadratic program |
RBF | radial basis function |
RC | randomized combination |
ReLU | rectified linear unit |
ResNet | residual network |
RTRL | real-time recurrent learning |
SDP | semidefinite programming |
SNR | signal-to-noise ratio |
SPORF | sparse projection oblique randomer forests |
SVM | support vector machine |
TFFC | transformer feature fusion contrastive module |
XGBoost | extreme gradient boosting |
References
- Tumer, K.; Ghosh, J. Analysis of decision boundaries in linearly combined neural classifiers. Pattern Recognit. 1996, 29, 341–348. [Google Scholar] [CrossRef]
- Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976. [Google Scholar]
- Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
- Schapire, R.E. The strength of weak learnability. Mach. Learn. 1990, 5, 197–227. [Google Scholar] [CrossRef]
- Du, K.-L.; Swamy, M.N.S. Wireless Communication Systems; Cambridge University Press: Cambridge, UK, 2010. [Google Scholar]
- Koliander, G.; El-Laham, Y.; Djuric, P.M.; Hlawatsch, F. Fusion of probability density functions. Proc. IEEE 2022, 110, 404–453. [Google Scholar] [CrossRef]
- Ting, K.M.; Witten, I.H. Stacking bagged and dagged models. In Proceedings of the International Conference on Machine Learning (ICML), Nashville, TN, USA, 8–12 July 1997. [Google Scholar]
- Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
- Bauer, E.; Kohavi, R. An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Mach. Learn. 1999, 36, 105–139. [Google Scholar] [CrossRef]
- Webb, G.I. MultiBoosting: A technique for combining boosting and wagging. Mach. Learn. 2000, 40, 159–196. [Google Scholar] [CrossRef]
- Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pat. Anal. Mach. Intell. 1998, 20, 832–844. [Google Scholar]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Rodriguez, J.J.; Kuncheva, L.I.; Alonso, C.J. Rotation forest: A new classifier ensemble method. IEEE Trans. Pat. Anal. Mach. Intell. 2006, 28, 1619–1630. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.; Yao, X. Ensemble learning via negative correlation. Neural Netw. 1999, 12, 1399–1404. [Google Scholar] [CrossRef]
- Hansen, L.K.; Salamon, P. Neural network ensembles. IEEE Trans. Pat. Anal. Mach. Intell. 1990, 12, 993–1001. [Google Scholar] [CrossRef]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Moghimi, M.; Belongie, S.J.; Saberian, M.; Yang, J.; Vasconcelos, N.; Li, L.-J. Boosted convolutional neural networks. In Proceedings of the British Machine Vision Conference (BMVC), Dundee, UK, 19–22 September 2016; pp. 1–13. [Google Scholar]
- Huang, G.; Li, Y.; Pleiss, G.; Liu, Z.; Hopcroft, J.E.; Weinberger, K.Q. Snapshot ensembles: Train 1, get M for free. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017; pp. 1–9. [Google Scholar]
- Wyner, A.J.; Olson, M.; Bleich, J.; Mease, D. Explaining the success of AdaBoost and random forests as interpolating classifiers. J. Mach. Learn. Res. 2017, 18, 1–33. [Google Scholar]
- Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition (ICDAR), Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282. [Google Scholar]
- Jacobs, R.A.; Jordan, M.I.; Nowlan, S.J.; Hinton, G.E. Adaptive mixtures of local experts. Neural Comput. 1991, 3, 79–87. [Google Scholar] [CrossRef]
- Jiang, W. The VC dimension for mixtures of binary classifiers. Neural Comput. 2000, 12, 1293–1301. [Google Scholar] [CrossRef] [PubMed]
- Tresp, V. A Bayesian committee machine. Neural Comput. 2000, 12, 2719–2741. [Google Scholar] [CrossRef] [PubMed]
- Domingos, P. Bayesian averaging of classifiers and the overfitting problem. In Proceedings of the 17th International Conference on Machine Learning, Stanford, CA, USA, 29 June–2 July 2000; Morgan Kaufmann: San Mateo, CA, USA, 2000; pp. 223–230. [Google Scholar]
- Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
- Wolpert, D.; Macready, W.G. Combining Stacking with Bagging to Improve a Learning Algorithm; Technical Report; Santa Fe Inst.: Santa Fe, NM, USA, 1996; p. 30. [Google Scholar]
- Clarke, B. Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. J. Mach. Learn. Res. 2003, 4, 683–712. [Google Scholar]
- Lakshminarayanan, B.; Pritzel, A.; Blundell, C. Simple and scalable predictive uncertainty estimation using deep ensembles. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 3140–3148. [Google Scholar]
- Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2021. [Google Scholar]
- Du, K.-L.; Swamy, M.N.S. Search and Optimization by Metaheuristics: Techniques and Algorithms Inspired by Nature; Springer: New York, NY, USA, 2016. [Google Scholar]
- Zhang, B.; Wu, Y.; Lu, J.; Du, K.-L. Evolutionary Computation and Its Applications in Neural and Fuzzy Systems. Appl. Comput. Intell. Soft Comput. 2011, 2011, 938240. [Google Scholar] [CrossRef]
- Vanschoren, J. Meta-learning: A Survey. arXiv 2018, arXiv:1810.03548. [Google Scholar]
- Kim, W.; Goyal, B.; Chawla, K.; Lee, J.; Kwon, K. Attention-based ensemble for deep metric learning. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 736–751. [Google Scholar]
- Sha, C.; Wang, K.; Wang, X.; Zhou, A. Ensemble pruning: A submodular function maximization perspective. In Database Systems for Advanced Applications; LNCS; Springer: Cham, Switzerland, 2014; Volume 8422, pp. 1–15. [Google Scholar]
- Du, K.-L.; Zhang, R.; Jiang, B.; Zeng, J.; Lu, J. Understanding machine learning principles: Learning, inference, generalization, and computational learning theory. Mathematics 2025, 13, 451. [Google Scholar] [CrossRef]
- Du, K.-L.; Swamy, M.N.S. Neural Networks and Statistical Learning, 2nd ed.; Springer: London, UK, 2019. [Google Scholar]
- Du, K.-L.; Jiang, B.; Lu, J.; Hua, J.; Swamy, M.N.S. Exploring kernel machines and support vector machines: Principles, techniques, and future directions. Mathematics 2024, 12, 3935. [Google Scholar] [CrossRef]
- Dietterich, T.G.; Bakiri, G. Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 1995, 2, 263–286. [Google Scholar] [CrossRef]
- Li, H.; Song, J.; Xue, M.; Zhang, H.; Song, M. A survey of neural trees: Co-evolving neural networks and decision trees. IEEE Trans. Neural Netw. Learn. Syst. 2025. [Google Scholar] [CrossRef]
- Tsybakov, A.B. Optimal aggregation of classifiers in statistical learning. Ann. Stat. 2004, 32, 135–166. [Google Scholar] [CrossRef]
- Du, K.-L.; Swamy, M.N.S. Neural Networks in a Softcomputing Framework; Springer: London, UK, 2006. [Google Scholar]
- Yager, R.R. On ordered weighted averaging aggregation operators in multicriteria decision-making. IEEE Trans. Syst. Man Cybern. 1988, 18, 183–190. [Google Scholar] [CrossRef]
- Salmon, J.; Dalalyan, A.S. Optimal aggregation of affine estimators. In Proceedings of the 24th Annual Conference on Learning Theory (COLT), Budapest, Hungary, 9–11 June 2011; pp. 635–660. [Google Scholar]
- Maillard, G.; Arlot, S.; Lerasle, M. Aggregated hold-out. J. Mach. Learn. Res. 2021, 22, 1–55. [Google Scholar]
- Hoyos-Idrobo, A.; Schwartz, Y.; Varoquaux, G.; Thirion, B. Improving sparse recovery on structured images with bagged clustering. In Proceedings of the IEEE 2015 International Workshop on Pattern Recognition in NeuroImaging (PRNI), Stanford, CA, USA, 10–12 June 2015; pp. 73–76. [Google Scholar]
- Varoquaux, G.; Raamana, P.R.; Engemann, D.A.; Hoyos-Idrobo, A.; Schwartz, Y.; Thirion, B. Assessing and tuning brain decoders: Crossvalidation, caveats, and guidelines. NeuroImage 2017, 145, 166–179. [Google Scholar] [CrossRef]
- Jung, Y.; Hu, J. A K-fold averaging cross-validation procedure. J. Nonparam. Statist. 2015, 27, 167–179. [Google Scholar] [CrossRef] [PubMed]
- Hall, P.; Robinson, A.P. Reducing variability of crossvalidation for smoothing-parameter choice. Biometrika 2009, 96, 175–186. [Google Scholar] [CrossRef]
- Rokach, L. Ensemble-based classifiers. Artif. Intell. Rev. 2010, 33, 1–39. [Google Scholar] [CrossRef]
- Chan, L.-W. Weighted least square ensemble networks. In Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), Washington, DC, USA, 10–16 July 1999; Volume 2, pp. 1393–1396. [Google Scholar]
- McAllester, D.A. PAC-Bayesian model averaging. In Proceedings of the 12th ACM Annual Conference Computational Learning Theory (COLT), Santa Cruz, CA, USA, 6–9 July 1999; pp. 164–170. [Google Scholar]
- Langford, J.; Shawe-Taylor, J. PAC-Bayes & margins. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 9–14 December 2002; Volume 15, pp. 423–430. [Google Scholar]
- McAllester, D. Simplified PAC-Bayesian margin bounds. In Computational Learning Theory and Kernel Machines; LNCS; Springer: New York, NY, USA, 2003; Volume 2777, pp. 203–215. [Google Scholar]
- Lacasse, A.; Laviolette, F.; Marchand, M.; Germain, P.; Usunier, N. PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2006; Volume 19, pp. 769–776. [Google Scholar]
- Laviolette, F.; Marchand, M.; Roy, J.-F. From PAC-Bayes bounds to quadratic programs for majority votes. In Proceedings of the 28th International Conference on Machine Learning (ICML), Bellevue, WA, USA, 28 June–2 July 2011; pp. 649–656. [Google Scholar]
- Germain, P.; Lacasse, A.; Laviolette, F.; Marchand, M.; Roy, J.-F. Risk bounds for the majority vote: From a PAC-Bayesian analysis to a learning algorithm. J. Mach. Learn. Res. 2015, 16, 787–860. [Google Scholar]
- Bellet, A.; Habrard, A.; Morvant, E.; Sebban, M. Learning a priori constrained weighted majority votes. Mach. Learn. 2014, 97, 129–154. [Google Scholar] [CrossRef]
- Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis; Chapman & Hall/CRC: London, UK, 2004. [Google Scholar]
- Kuncheva, L.I. Combining Pattern Classifiers: Methods and Algorithms, 2nd ed.; Wiley: Hoboken, NJ, USA, 2014. [Google Scholar]
- Opitz, D.; Maclin, R. Popular ensemble methods: An empirical study. J. Artif. Intell. Res. 1999, 11, 169–198. [Google Scholar] [CrossRef]
- Ueda, N.; Nakano, R. Generalization Error of Ensemble Estimators. IEEE Trans. Pat. Anal. Mach. Intell. 1996, 20, 871–885. [Google Scholar]
- Schapire, R.E.; Freund, Y.; Bartlett, P.L.; Lee, W.S. Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Stat. 1998, 26, 1651–1686. [Google Scholar]
- Bastos, M.A.M.; de Oliveira, H.B.C.; Valle, C.A. An optimal pruning algorithm of classifier ensembles: Dynamic programming approach. Neural Comput. Appl. 2020, 32, 6345–6358. [Google Scholar]
- Friedman, J.; Hastie, T.; Tibshirani, R. Additive logistic regression: A statistical view of boosting. Ann. Stat. 2000, 28, 337–407. [Google Scholar] [CrossRef]
- Lemense, G.; Liu, H. Stratified bagging for imbalanced classification. Pat. Recogn. 2022, 130, 108801. [Google Scholar]
- Ganaie, M.A.; Hu, M.; Tanveer, M.; Chen, Y. Ensemble deep learning: A review. Artif. Intell. Rev. 2022, 55, 4431–4486. [Google Scholar] [CrossRef]
- Li, Y.; Zhao, X. Kernel-enhanced bagging for high-dimensional data classification. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 5891–5904. [Google Scholar]
- Oza, N.C.; Russell, S. Online bagging and boosting. In Proceedings of the 18th International Workshop Artificial Intelligence and Statistics (AISTATS), Key West, FL, USA, 1–3 April 2001; Morgan Kaufmann: San Mateo, CA, USA, 2001; pp. 105–112. [Google Scholar]
- Lee, H.K.H.; Clyde, M.A. Lossless online Bayesian bagging. J. Mach. Learn. Res. 2004, 5, 143–151. [Google Scholar]
- Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
- Zhou, Z.-H. Self-Adaptive Boosting for Noisy Data. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 712–728. [Google Scholar]
- Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the 13th International Conference on Machine Learning, Bari, Italy, 3–6 July 1996; Morgan Kaufmann: San Mateo, CA, USA, 1996; pp. 148–156. [Google Scholar]
- Breiman, L. Population theory for predictor ensembles. Ann. Stat. 2004, 32, 1–11. [Google Scholar] [CrossRef]
- Aravkin, A.Y.; Bottegal, G.; Pillonetto, G. Boosting as a kernel-based method. Mach. Learn. 2019, 108, 1951–1974. [Google Scholar] [CrossRef]
- Muhlbaier, M.D.; Topalis, A.; Polikar, R. Learn++.NC: Combining ensemble of classifiers with dynamically weighted consult-and-vote for efficient incremental learning of new classes. IEEE Trans. Neural Netw. 2009, 20, 152–168. [Google Scholar] [CrossRef] [PubMed]
- Dietterich, T.G. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Mach. Learn. 2000, 40, 139–158. [Google Scholar] [CrossRef]
- Buhlmann, P.; Yu, B. Boosting with the L2 loss: Regression and classification. J. Amer. Stat. Assoc. 2003, 98, 324–339. [Google Scholar] [CrossRef]
- Ratsch, G.; Onoda, T.; Muller, K.-R. Soft margins for AdaBoost. Mach. Learn. 2001, 43, 287–320. [Google Scholar] [CrossRef]
- Mease, D.; Wyner, A. Evidence contrary to the statistical view of boosting. J. Mach. Learn. Res. 2008, 9, 131–156. [Google Scholar]
- Johnson, R.; Zhang, T. Learning nonlinear functions using regularized greedy forest. IEEE Trans. Pat. Anal. Mach. Intell. 2013, 36, 942–954. [Google Scholar] [CrossRef] [PubMed]
- Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 3146–3154. [Google Scholar]
- Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 3–8 December 2018; pp. 6638–6648. [Google Scholar]
- Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS) Datasets Benchmarks Track, New Orleans, LA, USA, 28 November–9 December 2022; pp. 1–14. [Google Scholar]
- Shwartz-Ziv, R.; Armon, A. Tabular data: Deep learning is not all you need. Inf. Fusion 2022, 81, 84–90. [Google Scholar] [CrossRef]
- Guestrin, C. PAC-Learning, VC Dimension and Margin-Based Bounds; Lecture Notes for 10-701/15-781: Machine Learning; Carnegie Mellon University: Pittsburgh, PA, USA, 2006; Available online: https://www.cs.cmu.edu/~guestrin/Class/10701-S07/Slides/learning-theory2-big-picture.pdf (accessed on 1 February 2025).
- Schapire, R.E.; Singer, Y. Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 1999, 37, 297–336. [Google Scholar] [CrossRef]
- Collins, M.; Schapire, R.E.; Singer, Y. Logistic regression, AdaBoost and Bregman distances. Mach. Learn. 2002, 47, 253–285. [Google Scholar] [CrossRef]
- Bartlett, P.L.; Traskin, M. AdaBoost is consistent. J. Mach. Learn. Res. 2007, 8, 2347–2368. [Google Scholar]
- Mukherjee, I.; Rudin, C.; Schapire, R.E. The rate of convergence of AdaBoost. J. Mach. Learn. Res. 2013, 14, 2315–2347. [Google Scholar]
- Ratsch, G.; Warmuth, M.K. Efficient margin maximizing with boosting. J. Mach. Learn. Res. 2005, 6, 2153–2175. [Google Scholar]
- Shalev-Shwartz, S.; Singer, Y. On the equivalence of weak learnability and linear separability: New relaxations and efficient boosting algorithms. Mach. Learn. 2010, 80, 141–163. [Google Scholar] [CrossRef]
- Li, S.Z.; Zhang, Z. FloatBoost learning and statistical face detection. IEEE Trans. Pat. Anal. Mach. Intell. 2004, 26, 1112–1123. [Google Scholar] [CrossRef]
- Gambs, S.; Kegl, B.; Aimeur, E. Privacy-preserving boosting. Data Min. Knowl. Discov. 2007, 14, 131–170. [Google Scholar] [CrossRef]
- Buhlmann, P.; Hothorn, T. Boosting algorithms: Regularization, prediction and model fitting. Statist. Sci. 2007, 22, 477–505. [Google Scholar]
- Servedio, R.A. Smooth boosting and learning with malicious noise. J. Mach. Learn. Res. 2003, 4, 633–648. [Google Scholar]
- Amit, Y.; Dekel, O.; Singer, Y. A boosting algorithm for label covering in multilabel problems. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2007; pp. 27–34. [Google Scholar]
- Zhu, J.; Zou, H.; Rosset, S.; Hastie, T. Multi-class AdaBoost. Stat. Interface 2009, 2, 349–360. [Google Scholar]
- Geist, M. Soft-max boosting. Mach. Learn. 2015, 100, 305–332. [Google Scholar] [CrossRef]
- Freund, Y. An adaptive version of the boost by majority algorithm. Mach. Learn. 2001, 43, 293–318. [Google Scholar] [CrossRef]
- Kanamori, T.; Takenouchi, T.; Eguchi, S.; Murata, N. Robust loss functions for boosting. Neural Comput. 2007, 19, 2183–2244. [Google Scholar] [CrossRef] [PubMed]
- Gao, C.; Sang, N.; Tang, Q. On selection and combination of weak learners in AdaBoost. Pat. Recogn. Lett. 2010, 31, 991–1001. [Google Scholar] [CrossRef]
- Shrestha, D.L.; Solomatine, D.P. Experiments with AdaBoost.RT, an improved boosting scheme for regression. Neural Comput. 2006, 18, 1678–1710. [Google Scholar] [CrossRef] [PubMed]
- Viola, P.; Jones, M. Robust real-time object detection. Int. J. Comput. Vis. 2001, 57, 137–154. [Google Scholar] [CrossRef]
- Saberian, M.; Vasconcelos, N. Boosting algorithms for detector cascade learning. J. Mach. Learn. Res. 2014, 15, 2569–2605. [Google Scholar]
- Viola, P.; Jones, M. Fast and robust classification using asymmetric AdaBoost and a detector cascade. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2002; Volume 14, pp. 1311–1318. [Google Scholar]
- Duffy, N.; Helmbold, D. Boosting methods for regression. Mach. Learn. 2002, 47, 153–200. [Google Scholar] [CrossRef]
- Livshits, E. Lower bounds for the rate of convergence of greedy algorithms. Izv. Math. 2009, 73, 1197–1215. [Google Scholar] [CrossRef]
- Hastie, T.; Taylor, J.; Tibshirani, R.; Walther, G. Forward stagewise regression and the monotone lasso. Electron. J. Stat. 2007, 1, 1–29. [Google Scholar] [CrossRef]
- Ehrlinger, J.; Ishwaran, H. Characterizing L2Boosting. Ann. Stat. 2012, 40, 1074–1101. [Google Scholar] [CrossRef]
- Zhang, T.; Yu, B. Boosting with early stopping: Convergence and consistency. Ann. Stat. 2005, 33, 1538–1579. [Google Scholar] [CrossRef]
- Wang, Y.; Liao, X.; Lin, S. Rescaled boosting in classification. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2598–2610. [Google Scholar] [CrossRef]
- Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106. [Google Scholar] [CrossRef]
- Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Chapman & Hall/CRC: London, UK, 1984. [Google Scholar]
- Genuer, R.; Poggi, J.-M.; Tuleau, C. Random forests: Some methodological insights. arXiv 2008, arXiv:0811.3619. [Google Scholar]
- Bernard, S.; Heutte, L.; Adam, S. Influence of hyperparameters on random forest accuracy. In International Workshop on Multiple Classifier Systems; Springer: Berlin/Heidelberg, Germany, 2009; pp. 171–180. [Google Scholar]
- Probst, P.; Boulesteix, A.-L. To tune or not to tune the number of trees in random forest. J. Mach. Learn. Res. 2017, 18, 1–18. [Google Scholar]
- Fernandez-Delgado, M.; Cernadas, E.; Barro, S.; Amorim, D. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 2014, 15, 3133–3181. [Google Scholar]
- Biau, G.; Devroye, L.; Lugosi, G. Consistency of random forests and other averaging classifiers. J. Mach. Learn. Res. 2008, 9, 2015–2033. [Google Scholar]
- Biau, G. Analysis of a random forests model. J. Mach. Learn. Res. 2012, 13, 1063–1095. [Google Scholar]
- Klusowski, J.M. Sharp analysis of a simple model for random forests. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), Virtual, 13–15 April 2021; pp. 757–765. [Google Scholar]
- Sexton, J.; Laake, P. Standard errors for bagged and random forest estimators. Computat. Stat. Data Anal. 2009, 53, 801–811. [Google Scholar] [CrossRef]
- Wager, S.; Hastie, T.; Efron, B. Confidence intervals for random forests: The Jackknife and the infinitesimal Jackknife. J. Mach. Learn. Res. 2014, 15, 1625–1651. [Google Scholar] [PubMed]
- Scornet, E.; Biau, G.; Vert, J.-P. Consistency of random forests. Ann. Stat. 2015, 43, 1716–1741. [Google Scholar] [CrossRef]
- Wager, S.; Athey, S. Estimation and inference of heterogeneous treatment effects using random forests. J. Amer. Stat. Assoc. 2018, 113, 1228–1242. [Google Scholar] [CrossRef]
- Mentch, L.; Hooker, G. Quantifying uncertainty in random forests via confidence intervals and hypothesis tests. J. Mach. Learn. Res. 2016, 17, 841–881. [Google Scholar]
- Mourtada, J.; Gaiffas, S.; Scornet, E. Minimax optimal rates for Mondrian trees and forests. Ann. Stat. 2020, 48, 2253–2276. [Google Scholar] [CrossRef]
- Lakshminarayanan, B.; Roy, D.M.; Teh, Y.W. Mondrian forests: Efficient online random forests. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 8–13 December 2014; pp. 3140–3148. [Google Scholar]
- O’Reilly, E.; Tran, N.M. Minimax rates for high-dimensional random tessellation forests. J. Mach. Learn. Res. 2024, 25, 1–32. [Google Scholar]
- Scornet, E. Random forests and kernel methods. IEEE Trans. Inf. Theo. 2016, 62, 1485–1500. [Google Scholar] [CrossRef]
- Cevid, D.; Michel, L.; Naf, J.; Buhlmann, P.; Meinshausen, N. Distributional random forests: Heterogeneity adjustment and multivariate distributional regression. J. Mach. Learn. Res. 2022, 23, 1–79. [Google Scholar]
- Lu, B.; Hardin, J. A unified framework for random forest prediction error estimation. J. Mach. Learn. Res. 2021, 22, 1–41. [Google Scholar]
- Lin, Y.; Jeon, Y. Random forests and adaptive nearest neighbors. J. Amer. Stat. Assoc. 2006, 101, 578–590. [Google Scholar] [CrossRef]
- Pospisil, T.; Lee, A.B. RFCDE: Random forests for conditional density estimation. arXiv 2018, arXiv:1804.05753. [Google Scholar]
- Kocev, D.; Vens, C.; Struyf, J.; Dzeroski, S. Ensembles of multi-objective decision trees. In Proceedings of the European Conference on Machine Learning (ECML), Berlin, Germany, 3–5 September 2007; pp. 624–631. [Google Scholar]
- Segal, M.; Xiao, Y. Multivariate random forests. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2011, 1, 80–87. [Google Scholar] [CrossRef]
- Ghosal, I.; Hooker, G. Boosting random forests to reduce bias; One-step boosted forest and its variance estimate. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), Virtual, 13–15 April 2021; PMLR: San Diego, CA, USA, 2021; pp. 757–765. [Google Scholar]
- Breiman, L. Using Adaptive Bagging to Debias Regressions. Technical report; University of California, Berkeley, Department of Statistics: Berkeley, CA, USA, 1999. [Google Scholar]
- Zhang, G.; Lu, Y. Bias-corrected random forests in regression. J. Appl. Stat. 2012, 39, 151–160. [Google Scholar] [CrossRef]
- Hooker, G.; Mentch, L. Bootstrap bias corrections for ensemble methods. Stat. Comput. 2018, 28, 77–86. [Google Scholar] [CrossRef]
- Meinshausen, N. Quantile regression forests. J. Mach. Learn. Res. 2006, 7, 983–999. [Google Scholar]
- Athey, S.; Tibshirani, J.; Wager, S. Generalized random forests. Ann. Stat. 2019, 47, 1148–1178. [Google Scholar] [CrossRef]
- Zhang, H.; Zimmerman, J.; Nettleton, D.; Nordman, D.J. Random forest prediction intervals. Amer. Stat. 2019, 74, 392–406. [Google Scholar] [CrossRef]
- Lei, J.; Wasserman, L. Distribution-free prediction bands for non-parametric regression. J. Roy. Stat. Soc. B 2014, 76, 71–96. [Google Scholar] [CrossRef]
- Johansson, U.; Bostrom, H.; Lofstrom, T.; Linusson, H. Regression conformal prediction with random forests. Mach. Learn. 2014, 97, 155–176. [Google Scholar] [CrossRef]
- Li, H.; Wang, W.; Ding, H.; Dong, J. Trees weighting random forest method for classifying high-dimensional noisy data. In Proceedings of the 2010 IEEE 7th International Conference on E-Business Engineering, Shanghai, China, 10–12 November 2010; pp. 160–163. [Google Scholar]
- Winham, S.J.; Freimuth, R.R.; Biernacka, J.M. A weighted random forests approach to improve predictive performance. Stat. Anal. Data Min. 2013, 6, 496–505. [Google Scholar] [CrossRef]
- Chen, X.; Yu, D.; Zhang, X. Optimal weighted random forests. J. Mach. Learn. Res. 2024, 25, 1–81. [Google Scholar]
- Gaiffas, S.; Merad, I.; Yu, Y. WildWood: A new random forest algorithm. IEEE Trans. Inf. Theory 2023, 69, 6586–6604. [Google Scholar] [CrossRef]
- Mourtada, J.; Gaiffas, S.; Scornet, E. AMF: Aggregated Mondrian forests for online learning. J. Roy. Stat. Soc. B 2021, 83, 505–533. [Google Scholar] [CrossRef]
- Capitaine, L.; Bigot, J.; Thiebaut, R.; Genuer, R. Frechet random forests for metric space valued regression with non Euclidean predictors. J. Mach. Learn. Res. 2024, 25, 1–41. [Google Scholar]
- Kouloumpris, E.; Vlahavas, I. Markowitz random forest: Weighting classification and regression trees with modern portfolio theory. Neurocomputing 2025, 620, 129191. [Google Scholar] [CrossRef]
- Menze, B.H.; Kelm, B.M.; Splitthoff, D.N.; Koethe, U.; Hamprecht, F.A. On oblique random forests. In Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2011; pp. 453–469. [Google Scholar]
- Blaser, R.; Fryzlewicz, P. Random rotation ensembles. J. Mach. Learn. Res. 2016, 17, 126–151. [Google Scholar]
- Rainforth, T.; Wood, F. Canonical correlation forests. arXiv 2015, arXiv:1507.05444. [Google Scholar]
- Tomita, T.M.; Browne, J.; Shen, C.; Chung, J.; Patsolic, J.L.; Falk, B.; Priebe, C.E.; Yim, J.; Burns, R.; Maggioni, M.; et al. Sparse projection oblique randomer forests. J. Mach. Learn. Res. 2020, 21, 1–39. [Google Scholar]
- Boot, T.; Nibbering, D. Subspace methods. In Macroeconomic Forecasting in the Era of Big Data; Springer: Cham, Switzerland, 2020; pp. 267–291. [Google Scholar]
- Cannings, T.I.; Samworth, R.J. Random-projection ensemble classification. J. Roy. Stat. Soc. B 2017, 79, 959–1035. [Google Scholar] [CrossRef]
- Garcia-Pedrajas, N.; Ortiz-Boyer, D. Boosting random subspace method. Neural Netw. 2008, 21, 1344–1362. [Google Scholar] [CrossRef] [PubMed]
- Tian, Y.; Feng, Y. RaSE: Random subspace ensemble classification. J. Mach. Learn. Res. 2021, 22, 1–93. [Google Scholar]
- Blaser, R.; Fryzlewicz, P. Regularizing axis-aligned ensembles via data rotations that favor simpler learners. Stat. Comput. 2021, 31, 15. [Google Scholar] [CrossRef]
- Durrant, R.J.; Kaban, A. Random projections as regularizers: Learning a linear discriminant from fewer observations than dimensions. Mach. Learn. 2015, 99, 257–286. [Google Scholar] [CrossRef]
- Mukhopadhyay, M.; Dunson, D.B. Targeted random projection for prediction from high-dimensional features. J. Amer. Stat. Assoc. 2020, 115, 1998–2010. [Google Scholar] [CrossRef]
- Lee, D.; Yang, M.-H.; Oh, S. Fast and accurate head pose estimation via random projection forests. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1958–1966. [Google Scholar]
- Tomita, T.M.; Maggioni, M.; Vogelstein, J.T. Romao: Robust oblique forests with linear matrix operations. In Proceedings of the SIAM International Conference on Data Mining (SDM), Houston, TX, USA, 27–29 April 2017; pp. 498–506. [Google Scholar]
- Biau, G.; Cadre, B.; Rouviere, L. Accelerated gradient boosting. Mach. Learn. 2019, 108, 971–992. [Google Scholar] [CrossRef]
- Breiman, L. Arcing classifiers. Ann. Stat. 1998, 26, 801–849. [Google Scholar]
- Popov, S.; Morozov, S.; Babenko, A. Neural oblivious decision ensembles for deep learning on tabular data. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019; pp. 1–9. [Google Scholar]
- Yeh, C.-K.; Kim, J.S.; Yen, I.E.H.; Ravikumar, P. Representer point selection for explaining deep neural networks. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 2–8 December 2018; pp. 9311–9321. [Google Scholar]
- Pruthi, G.; Liu, F.; Kale, S.; Sundararajan, M. Estimating training data influence by tracing gradient descent. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Virtual, 6–12 December 2020; Volume 33, pp. 9311–9321. [Google Scholar]
- Brophy, J.; Hammoudeh, Z.; Lowd, D. Adapting and evaluating influence-estimation methods for gradient-boosted decision trees. J. Mach. Learn. Res. 2023, 24, 1–48. [Google Scholar]
- Geurts, P.; Wehenkel, L.; d’Alche-Buc, F. Gradient boosting for kernelized output spaces. In Proceedings of the 24th International Conference on Machine Learning (ICML), Corvallis, OR, USA, 20–24 June 2007; pp. 289–296. [Google Scholar]
- Si, S.; Zhang, H.; Keerthi, S.S.; Mahajan, D.; Dhillon, I.S.; Hsieh, C. Gradient boosted decision trees for high dimensional sparse output. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 3182–3190. [Google Scholar]
- Zhang, Z.; Jung, C. GBDT-MO: Gradient-boosted decision trees for multiple outputs. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 3156–3167. [Google Scholar] [CrossRef]
- Li, X.; Du, B.; Zhang, Y.; Xu, C.; Tao, D. Iterative privileged learning. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 2805–2817. [Google Scholar] [CrossRef]
- Geurts, P.; Ernst, D.; Wehenkel, L. Extremely randomized trees. Mach. Learn. 2006, 63, 3–42. [Google Scholar] [CrossRef]
- Pedrajas, N.G.; Boyer, D.O. Improving multiclass pattern recognition by the combination of two strategies. IEEE Trans. Pat. Anal. Mach. Intell. 2006, 28, 1001–1006. [Google Scholar] [CrossRef]
- Platt, J.C.; Cristianini, N.; Shawe-Taylor, J. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1999; Volume 12, pp. 547–553. [Google Scholar]
- Chang, C.-C.; Chien, L.-J.; Lee, Y.-J. A novel framework for multi-class classification via ternary smooth support vector machine. Pat. Recogn. 2011, 44, 1235–1244. [Google Scholar] [CrossRef]
- Kong, E.; Dietterich, T.G. Error-correcting output coding corrects bias and variance. In Proceedings of the 12th International Conference on Machine Learning (ICML), Tahoe City, CA, USA, 9–12 July 1995; Morgan Kaufmann: San Francisco, CA, USA, 1995; pp. 313–321. [Google Scholar]
- Allwein, E.L.; Schapire, R.E.; Singer, Y. Reducing multiclass to binary: A unifying approach for margin classifiers. J. Mach. Learn. Res. 2000, 1, 113–141. [Google Scholar]
- Escalera, S.; Pujol, O.; Radeva, P. On the decoding process in ternary error-correcting output codes. IEEE Trans. Pat. Anal. Mach. Intell. 2010, 32, 120–134. [Google Scholar] [CrossRef]
- Escalera, S.; Tax, D.; Pujol, O.; Radeva, P.; Duin, R. Subclass problem dependent design of error-correcting output codes. IEEE Trans. Pat. Anal. Mach. Intell. 2008, 30, 1041–1054. [Google Scholar] [CrossRef] [PubMed]
- Pujol, O.; Radeva, P.; Vitria, J. Discriminant ECOC: A heuristic method for application dependent design of error correcting output codes. IEEE Trans. Pat. Anal. Mach. Intell. 2006, 28, 1001–1007. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R. Classification by pairwise coupling. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1998; Volume 11, pp. 451–471. [Google Scholar]
- Escalera, S.; Masip, D.; Puertas, E.; Radeva, P.; Pujol, O. Online error correcting output codes. Pat. Recogn. Lett. 2011, 32, 458–467. [Google Scholar] [CrossRef]
- Klautau, A.; Jevtic, N.; Orlitsky, A. On nearest-neighbor error-correcting output codes with application to all-pairs multiclass support vector machines. J. Mach. Learn. Res. 2003, 4, 1–15. [Google Scholar]
- Dempster, A.P. Upper and lower probabilities induced by multivalued mappings. Ann. Math. Stat. 1967, 38, 325–339. [Google Scholar] [CrossRef]
- Smets, P. The combination of evidence in the transferable belief model. IEEE Trans. Pat. Anal. Mach. Intell. 1990, 12, 447–458. [Google Scholar] [CrossRef]
- Salehy, N.; Okten, G. Monte Carlo and quasi-Monte Carlo methods for Dempster’s rule of combination. Int. J. Appr. Reason. 2022, 145, 163–186. [Google Scholar] [CrossRef]
- Quost, B.; Masson, M.-H.; Denoeux, T. Classifier fusion in the Dempster-Shafer framework using optimized t-norm based combination rules. Int. J. Approx. Reason. 2011, 52, 353–374. [Google Scholar] [CrossRef]
- Zadeh, L.A. A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination. AI Mag. 1986, 2, 85–90. [Google Scholar]
- Schubert, J. Conflict management in Dempster-Shafer theory using the degree of falsity. Int. J. Approx. Reason. 2011, 52, 449–460. [Google Scholar] [CrossRef]
- Dezert, J.; Smarandache, F. An introduction to DSmT. arXiv 2009, arXiv:0903.0279. [Google Scholar]
- Denoeux, T. Logistic regression, neural networks and Dempster-Shafer theory: A new perspective. Knowl.-Based Syst. 2019, 176, 54–67. [Google Scholar] [CrossRef]
- Cai, M.; Wu, Z.; Li, Q.; Xu, F.; Zhou, J. GFDC: A granule fusion density-based clustering with evidential reasoning. Int. J. Approx. Reason. 2024, 164, 109075. [Google Scholar] [CrossRef]
- Grina, F.; Elouedi, Z.; Lefevre, E. Re-sampling of multi-class imbalanced data using belief function theory and ensemble learning. Int. J. Approx. Reason. 2023, 156, 1–15. [Google Scholar] [CrossRef]
- Lanckriet, G.R.G.; Cristianini, N.; Bartlett, P.; Ghaoui, L.E.; Jordan, M.I. Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res. 2004, 5, 27–72. [Google Scholar]
- Ong, C.S.; Smola, A.J.; Williamson, R.C. Learning the kernel with hyperkernels. J. Mach. Learn. Res. 2005, 6, 1043–1071. [Google Scholar]
- Sonnenburg, S.; Ratsch, G.; Schafer, C.; Scholkopf, B. Large scale multiple kernel learning. J. Mach. Learn. Res. 2006, 7, 1531–1565. [Google Scholar]
- Ye, J.; Ji, S.; Chen, J. Multi-class discriminant kernel learning via convex programming. J. Mach. Learn. Res. 2008, 9, 719–758. [Google Scholar]
- Kim, S.-J.; Magnani, A.; Boyd, S. Optimal kernel selection in kernel Fisher discriminant analysis. In Proceedings of the International Conference on Machine Learning (ICML), Helsinki, Finland, 28 June–1 July 2006; pp. 465–472. [Google Scholar]
- Subrahmanya, N.; Shin, Y.C. Sparse multiple kernel learning for signal processing applications. IEEE Trans. Pat. Anal. Mach. Intell. 2010, 32, 788–798. [Google Scholar] [CrossRef]
- Yang, H.; Xu, Z.; Ye, J.; King, I.; Lyu, M.R. Efficient sparse generalized multiple kernel learning. IEEE Trans. Neural Netw. 2011, 22, 433–446. [Google Scholar] [CrossRef]
- Rakotomamonjy, A.; Bach, F.; Canu, S.; Grandvalet, Y. SimpleMKL. J. Mach. Learn. Res. 2008, 9, 2491–2521. [Google Scholar]
- Chapelle, O.; Rakotomamonjy, A. Second order optimization of kernel parameters. In NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels, Proceedings of the Machine Learning Research (PMLR), Whistler, BC, Canada, 8–13 December 2008; PMLR: Cambridge, MA, USA, 2008; pp. 465–472. [Google Scholar]
- Kloft, M.; Brefeld, U.; Sonnenburg, S.; Zien, A. lp-norm multiple kernel learning. J. Mach. Learn. Res. 2011, 12, 953–997. [Google Scholar]
- Aflalo, J.; Ben-Tal, A.; Bhattacharyya, C.; Nath, J.S.; Raman, S. Variable sparsity kernel learning. J. Mach. Learn. Res. 2011, 12, 565–592. [Google Scholar]
- Suzuki, T.; Tomioka, R. SpicyMKL: A fast algorithm for multiple kernel learning with thousands of kernels. Mach. Learn. 2011, 85, 77–108. [Google Scholar] [CrossRef]
- Xu, X.; Tsang, I.W.; Xu, D. Soft margin multiple kernel learning. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 749–761. [Google Scholar]
- Vishwanathan, S.V.N.; Sun, Z.; Ampornpunt, N.; Varma, M. Multiple kernel learning and the SMO algorithm. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2010; Volume 23, pp. 465–472. [Google Scholar]
- Gonen, M. Bayesian efficient multiple kernel learning. In Proceedings of the 29th International Conference on Machine Learning (ICML), Edinburgh, UK, 18–21 June 2012; Volume 1, pp. 1–8. [Google Scholar]
- Mao, Q.; Tsang, I.W.; Gao, S.; Wang, L. Generalized multiple kernel learning with data-dependent priors. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 1134–1148. [Google Scholar] [CrossRef] [PubMed]
- Huang, H.-C.; Chuang, Y.-Y.; Chen, C.-S. Multiple kernel fuzzy clustering. IEEE Trans. Fuzzy Syst. 2012, 20, 120–134. [Google Scholar] [CrossRef]
- Bickel, S.; Scheffer, T. Multi-view clustering. In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM), Brighton, UK, 1–4 November 2004; pp. 19–26. [Google Scholar]
- Liu, X.; Dou, Y.; Yin, J.; Wang, L.; Zhu, E. Multiple kernel k-means clustering with matrix-induced regularization. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30, pp. 1–7. [Google Scholar]
- Zhou, S.; Ou, Q.; Liu, X.; Wang, S.; Liu, L.; Wang, S.; Zhu, E.; Yin, J.; Xu, X. Multiple kernel clustering with compressed subspace alignment. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 252–263. [Google Scholar] [CrossRef] [PubMed]
- Yao, Y.; Li, Y.; Jiang, B.; Chen, H. Multiple kernel k-means clustering by selecting representative kernels. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4983–4996. [Google Scholar] [CrossRef] [PubMed]
- Han, Y.; Yang, K.; Yang, Y.; Ma, Y. Localized multiple kernel learning with dynamical clustering and matrix regularization. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 486–499. [Google Scholar] [CrossRef]
- Wang, C.; Chen, M.; Huang, L.; Lai, J.; Yu, P.S. Smoothness regularized multiview subspace clustering with kernel learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 5047–5060. [Google Scholar] [CrossRef] [PubMed]
- Wang, J.; Li, Z.; Tang, C.; Liu, S.; Wan, X.; Liu, X. Multiple kernel clustering with adaptive multi-scale partition selection. IEEE Trans. Knowl. Data Eng. 2024, 36, 6641–6652. [Google Scholar] [CrossRef]
- Li, M.; Zhang, Y.; Ma, C.; Liu, S.; Liu, Z.; Yin, J.; Liu, X.; Liao, Q. Regularized simple multiple kernel k-means with kernel average alignment. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 15910–15919. [Google Scholar] [CrossRef]
- Alioscha-Perez, M.; Oveneke, M.C.; Sahli, H. SVRG-MKL: A fast and scalable multiple kernel learning solution for features combination in multi-class classification problems. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 1710–1723. [Google Scholar] [CrossRef] [PubMed]
- Fu, L.; Zhang, M.; Li, H. Sparse RBF networks with multi-kernels. Neural Process. Lett. 2010, 32, 235–247. [Google Scholar] [CrossRef]
- Hong, S.; Chae, J. Distributed online learning with multiple kernels. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 1263–1277. [Google Scholar] [CrossRef] [PubMed]
- Shen, Y.; Chen, T.; Giannakis, G.B. Random feature-based online multi-kernel learning in environments with unknown dynamics. J. Mach. Learn. Res. 2019, 20, 1–36. [Google Scholar]
- Li, Y.; Yang, M.; Zhang, Z. A survey of multi-view representation learning. IEEE Trans. Knowl. Data Eng. 2018, 31, 1863–1883. [Google Scholar] [CrossRef]
- Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Zhu, Y.; Kiros, R.; Zemel, R.; Salakhutdinov, R.; Urtasun, R.; Torralba, A. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 19–27. [Google Scholar]
- Karpathy, A.; Fei-Fei, L. Deep visual-semantic alignments for generating image descriptions. IEEE Trans. Pat. Anal. Mach. Intell. 2017, 39, 664–676. [Google Scholar] [CrossRef]
- Niu, L.; Li, W.; Xu, D.; Cai, J. An exemplar-based multi-view domain generalization framework for visual recognition. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 259–272. [Google Scholar] [CrossRef] [PubMed]
- Ding, Z.; Shao, M.; Fu, Y. Incomplete multisource transfer learning. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 310–323. [Google Scholar] [CrossRef]
- Guan, Z.; Zhang, L.; Peng, J.; Fan, J. Multi-view concept learning for data representation. IEEE Trans. Knowl. Data Eng. 2015, 27, 3016–3028. [Google Scholar] [CrossRef]
- Deng, C.; Lv, Z.; Liu, W.; Huang, J.; Tao, D.; Gao, X. Multiview matrix decomposition: A new scheme for exploring discriminative information. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina, 25–31 July 2015; pp. 3438–3444. [Google Scholar]
- Srivastava, N.; Salakhutdinov, R.R. Multimodal learning with deep Boltzmann machines. In Proceedings of the Neural Information Processing Systems (NIPS) Conference, Lake Tahoe, NV, USA, 3–8 December 2012; pp. 2222–2230. [Google Scholar]
- Hu, D.; Li, X.; Lu, X. Temporal multimodal learning in audiovisual speech recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3574–3582. [Google Scholar]
- Li, B.; Yuan, C.; Xiong, W.; Hu, W.; Peng, H.; Ding, X. Multi-view multi-instance learning based on joint sparse representation and Multi-view dictionary learning. IEEE Trans. Pat. Anal. Mach. Intell. 2017, 39, 2554–2560. [Google Scholar] [CrossRef]
- Baltrusaitis, T.; Ahuja, C.; Morency, L.-P. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pat. Anal. Mach. Intell. 2019, 41, 423–443. [Google Scholar] [CrossRef] [PubMed]
- Zong, L.; Zhang, X.; Zhao, L.; Yu, H.; Zhao, Q. Multi-view clustering via multi-manifold regularized non-negative matrix factorization. Neural Netw. 2017, 88, 74–89. [Google Scholar] [CrossRef] [PubMed]
- Yang, W.; Shi, Y.; Gao, Y.; Wang, L.; Yang, M. Incomplete-data oriented multiview dimension reduction via sparse low-rank representation. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 6276–6291. [Google Scholar] [CrossRef] [PubMed]
- Peng, J.; Luo, P.; Guan, Z.; Fan, J. Graph-regularized multi-view semantic subspace learning. Int. J. Mach. Learn. Cybern. 2019, 10, 879–895. [Google Scholar] [CrossRef]
- Du, K.-L.; Swamy, M.N.S.; Wang, Z.-Q.; Mow, W.H. Matrix factorization techniques in machine learning, signal processing and statistics. Mathematics 2023, 11, 2674. [Google Scholar] [CrossRef]
- Liu, J.; Jiang, Y.; Li, Z.; Zhou, Z.-H.; Lu, H. Partially shared latent factor learning with multiview data. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 1233–1246. [Google Scholar]
- Kim, H.; Choo, J.; Kim, J.; Reddy, C.K.; Park, H. Simultaneous discovery of common and discriminative topics via joint nonnegative matrix factorization. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Sydney, Australia, 10–13 August 2015; pp. 567–576. [Google Scholar]
- Ngiam, J.; Khosla, A.; Kim, M.; Nam, J.; Lee, H.; Ng, A.Y. Multimodal deep learning. In Proceedings of the International Conference on Machine Learning (ICML), Bellevue, WA, USA, 28 June–2 July 2011; pp. 689–696. [Google Scholar]
- Zhang, C.; Fu, H.; Hu, Q.; Cao, X.; Xie, Y.; Tao, D. Generalized latent multi-view subspace clustering. IEEE Trans. Pat. Anal. Mach. Intell. 2020, 42, 86–99. [Google Scholar] [CrossRef]
- Trigeorgis, G.; Bousmalis, K.; Zafeiriou, S.; Schuller, B.W. A deep matrix factorization method for learning attribute representations. IEEE Trans. Pat. Anal. Mach. Intell. 2017, 39, 417–429. [Google Scholar] [CrossRef] [PubMed]
- Sharma, P.; Abrol, V.; Sao, A.K. Deep-sparse-representation-based features for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 2162–2175. [Google Scholar] [CrossRef]
- Zhao, H.; Ding, Z.; Fu, Y. Multi-view clustering via deep matrix factorization. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 2921–2927. [Google Scholar]
- Huang, S.; Kang, Z.; Xu, Z. Auto-weighted multi-view clustering via deep matrix decomposition. Pat. Recogn. 2020, 97, 107015. [Google Scholar] [CrossRef]
- Li, K.; Lu, J.; Zuo, H.; Zhang, G. Multi-source contribution learning for domain adaptation. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 5293–5307. [Google Scholar] [CrossRef]
- Kettenring, J.R. Canonical analysis of several sets of variables. Biometrika 1971, 58, 433–451. [Google Scholar] [CrossRef]
- Nielsen, A.A. Multiset canonical correlations analysis and multispectral, truly multitemporal remote sensing data. IEEE Trans. Image Process. 2002, 11, 293–305. [Google Scholar] [CrossRef] [PubMed]
- Horst, P. Relations among m sets of measures. Psychometrika 1961, 26, 129–149. [Google Scholar] [CrossRef]
- Luo, Y.; Tao, D.; Ramamohanarao, K.; Xu, C.; Wen, Y. Tensor canonical correlation analysis for multi-view dimension reduction. IEEE Trans. Knowl. Data Eng. 2015, 27, 3111–3124. [Google Scholar] [CrossRef]
- Lai, P.L.; Fyfe, C. Kernel and nonlinear canonical correlation analysis. Int. J. Neural Syst. 2000, 10, 365–377. [Google Scholar] [CrossRef] [PubMed]
- Hsieh, W.W. Nonlinear canonical correlation analysis by neural networks. Neural Netw. 2000, 13, 1095–1105. [Google Scholar] [CrossRef]
- Andrew, G.; Arora, R.; Bilmes, J.A.; Livescu, K. Deep canonical correlation analysis. In Proceedings of the 30th International Conference on Machine Learning (ICML), Atlanta, GA, USA, 16–21 June 2013; pp. 1247–1255. [Google Scholar]
- Wang, Y.; Chen, L. Multi-view fuzzy clustering with minimax optimization for effective clustering of data from multiple sources. Expert Syst. Appl. 2017, 72, 457–466. [Google Scholar] [CrossRef]
- Bach, F.R.; Jordan, M.I. A Probabilistic Interpretation of Canonical Correlation Analysis; Technical Report 688; Department of Statistics; University of California, Berkeley: Berkeley, CA, USA, 2005; pp. 1–11. [Google Scholar]
- Yu, B.; Krishnapuram, S.; Rosales, R.; Rao, R.B. Bayesian cotraining. J. Mach. Learn. Res. 2011, 12, 2649–2680. [Google Scholar]
- Sharma, A.; Kumar, A.; Daume, H.; Jacobs, D.W. Generalized multiview analysis: A discriminative latent space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; pp. 2160–2167. [Google Scholar]
- Horst, P. Generalized canonical correlations and their applications to experimental data. J. Clin. Psychol. 1961, 17, 331–347. [Google Scholar] [CrossRef] [PubMed]
- Via, J.; Santamaria, I.; Perez, J. A learning algorithm for adaptive canonical correlation analysis of several data sets. Neural Netw. 2007, 20, 139–152. [Google Scholar] [CrossRef]
- Kan, M.; Shan, S.; Zhang, H.; Lao, S.; Chen, X. Multi-view discriminant analysis. IEEE Trans. Pat. Anal. Mach. Intell. 2016, 38, 188–194. [Google Scholar] [CrossRef]
- Kuehlkamp, A.; Pinto, A.; Rocha, A.; Bowyer, K.W.; Czajka, A. Ensemble of multi-view learning classifiers for cross-domain iris presentation attack detection. IEEE Trans. Inf. Forensics Secur. 2019, 14, 1419–1431. [Google Scholar] [CrossRef]
- Somandepalli, K.; Kumar, N.; Travadi, R.; Narayanan, S. Multimodal representation learning using deep multiset canonical correlation. arXiv 2019, arXiv:1904.01775. [Google Scholar]
- Li, D.; Dimitrova, N.; Li, M.; Sethi, I.K. Multimedia content processing through cross-modal association. In Proceedings of the 11th ACM International Conference Multimedia (MULTIMEDIA), Berkeley, CA, USA, 2–8 November 2003; pp. 604–611. [Google Scholar]
- Sun, L.; Ji, S.; Yu, S.; Ye, J. On the equivalence between canonical correlation analysis and orthonormalized partial least squares. In Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI), Pasadena, CA, USA, 11–17 July 2009; pp. 1230–1235. [Google Scholar]
- Arenas-Garcia, J.; Camps-Valls, G. Efficient kernel orthonormalized PLS for remote sensing applications. IEEE Trans. Geosci. Remote Sens. 2008, 46, 2872–2881. [Google Scholar] [CrossRef]
- Yin, J.; Sun, S. Multiview Uncorrelated Locality Preserving Projection. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 3442–3455. [Google Scholar] [CrossRef]
- Shi, Y.; Pan, Y.; Xu, D.; Tsang, I.W. Multiview alignment and generation in CCA via consistent latent encoding. Neural Comput. 2020, 32, 1936–1979. [Google Scholar] [CrossRef]
- Hastie, T.; Buja, A.; Tibshirani, R. Penalized discriminant analysis. Ann. Stat. 1995, 23, 73–102. [Google Scholar] [CrossRef]
- Diethe, T.; Hardoon, D.R.; Shawe-Taylor, J. Multiview Fisher discriminant analysis. In Proceedings of the NIPS Workshop on Learning from Multiple Sources, Whistler, BC, Canada, 12–13 December 2008. [Google Scholar]
- Sun, S.; Xie, X.; Yang, M. Multiview uncorrelated discriminant analysis. IEEE Trans. Cybern. 2015, 46, 3272–3284. [Google Scholar] [CrossRef]
- Kan, M.; Shan, S.; Chen, X. Multi-view deep network for cross-view classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26–30 June 2016; pp. 4847–4855. [Google Scholar]
- Cao, G.; Iosifidis, A.; Chen, K.; Gabbouj, M. Generalized multiview embedding for visual recognition and cross-modal retrieval. IEEE Trans. Cybern. 2017, 48, 2542–2555. [Google Scholar] [CrossRef]
- Sindhwani, V.; Rosenberg, D.S. An RKHS for multi-view learning and manifold co-regularization. In Proceedings of the 25th International Conference on Machine Learning (ICML), Helsinki, Finland, 5–9 July 2008; pp. 976–983. [Google Scholar]
- Farquhar, J.D.R.; Hardoon, D.R.; Meng, H.; Shawe-Taylor, J.; Szedmak, S. Two view learning: SVM-2K, theory and practice. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, 5–8 December 2005; pp. 355–362. [Google Scholar]
- Sun, S.; Chao, G. Multi-view maximum entropy discrimination. In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 3–9 August 2013; pp. 1706–1712. [Google Scholar]
- Chao, G.; Sun, S. Alternative multiview maximum entropy discrimination. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 1445–1456. [Google Scholar] [CrossRef]
- Mao, L.; Sun, S. Soft margin consistency based scalable multi-view maximum entropy discrimination. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI), New York, NY, USA, 9–15 July 2016; pp. 1839–1845. [Google Scholar]
- Song, G.; Wang, S.; Huang, Q.; Tian, Q. Multimodal similarity Gaussian process latent variable model. IEEE Trans. Image Process. 2017, 26, 4168–4181. [Google Scholar] [CrossRef] [PubMed]
- Damianou, A.C.; Ek, C.H.; Titsias, M.K.; Lawrence, N.D. Manifold relevance determination. In Proceedings of the 29th International Conference Machine Learning (ICML), Edinburgh, UK, 26 June–1 July 2012; pp. 531–538. [Google Scholar]
- Lawrence, N.D. Gaussian process latent variable models for visualisation of high dimensional data. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Vancouver and Whistler, BC, Canada, 8–13 December 2003; pp. 329–336. [Google Scholar]
- Liu, Q.; Sun, S. Multi-view regularized Gaussian processes. In Proceedings of the 21st Pacific–Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Jeju, Republic of Korea, 23–26 May 2017; pp. 655–667. [Google Scholar]
- Wang, L.; Li, R.-C.; Lin, W.-W. Multiview orthonormalized partial least squares: Regularizations and deep extensions. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 4371–4385. [Google Scholar] [CrossRef] [PubMed]
- Bramon, R.; Boada, I.; Bardera, A.; Rodriguez, J.; Feixas, M.; Puig, J.; Sbert, M. Multimodal data fusion based on mutual information. IEEE Trans. Vis. Comput. Graph. 2012, 18, 1574–1587. [Google Scholar] [CrossRef] [PubMed]
- Groves, A.R.; Beckmann, C.F.; Smith, S.M.; Woolrich, M.W. Linked independent component analysis for multimodal data fusion. NeuroImage 2011, 54, 2198–2217. [Google Scholar] [CrossRef]
- Du, K.-L. Clustering: A neural network approach. Neural Netw. 2010, 23, 89–107. [Google Scholar] [CrossRef] [PubMed]
- Tao, Z.; Liu, H.; Li, S.; Ding, Z.; Fu, Y. Marginalized multiview ensemble clustering. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 600–611. [Google Scholar] [CrossRef] [PubMed]
- Huang, X.; Zhang, R.; Li, Y.; Yang, F.; Zhu, Z.; Zhou, Z. MFC-ACL: Multi-view fusion clustering with attentive contrastive learning. Neural Netw. 2025, 184, 107055. [Google Scholar] [CrossRef] [PubMed]
- Strehl, A.; Ghosh, J. Cluster ensembles—A knowledge reuse framework for combining multiple partitions. J. Mach. Learn. Res. 2003, 3, 583–617. [Google Scholar]
- Fred, A.L.N.; Jain, A.K. Combining multiple clusterings using evidence accumulation. IEEE Trans. Pat. Anal. Mach. Intell. 2005, 27, 835–850. [Google Scholar] [CrossRef] [PubMed]
- Topchy, A.; Jain, A.K.; Punch, W. Clustering ensembles: Models of consensus and weak partitions. IEEE Trans. Pat. Anal. Mach. Intell. 2005, 27, 1866–1881. [Google Scholar] [CrossRef]
- Wu, J.; Liu, H.; Xiong, H.; Cao, J.; Chen, J. K-means-based consensus clustering: A unified view. IEEE Trans. Knowl. Data Eng. 2015, 27, 155–169. [Google Scholar] [CrossRef]
- Kumar, A.; Daume, H., III. A co-training approach for multi-view spectral clustering. In Proceedings of the 28th International Conference on Machine Learning (ICML), Bellevue, WA, USA, 28 June–2 July 2011; pp. 393–400. [Google Scholar]
- Blaschko, M.B.; Lampert, C.H. Correlational spectral clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar]
- Singh, A.P.; Gordon, G.J. Relational learning via collective matrix factorization. In Proceedings of the 14th ACM SIGKDD International Conference Knowledge Discovery Data Mining (KDD), Las Vegas, NV, USA, 24–27 August 2008; pp. 650–658. [Google Scholar]
- Xia, R.; Pan, Y.; Du, L.; Yin, J. Robust multi-view spectral clustering via low-rank and sparse decomposition. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; pp. 2149–2155. [Google Scholar]
- Cao, X.; Zhang, C.; Fu, H.; Liu, S.; Zhang, H. Diversity-induced multi-view subspace clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 586–594. [Google Scholar]
- Liu, H.; Liu, T.; Wu, J.; Tao, D.; Fu, Y. Spectral ensemble clustering. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Sydney, Australia, 10–13 August 2015; pp. 715–724. [Google Scholar]
- Li, T.; Ding, C.; Jordan, M. Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In Proceedings of the 7th IEEE International Conference Data Mining (ICDM), Omaha, NE, USA, 28–31 October 2007; pp. 577–582. [Google Scholar]
- Iam-On, N.; Boongoen, T.; Garrett, S.M.; Price, C.J. A link-based approach to the cluster ensemble problem. IEEE Trans. Pat. Anal. Mach. Intell. 2011, 33, 2396–2409. [Google Scholar] [CrossRef]
- Yousefnezhad, M.; Huang, S.-J.; Zhang, D. WoCE: A framework for clustering ensemble by exploiting the wisdom of crowds theory. IEEE Trans. Cybern. 2018, 48, 486–499. [Google Scholar] [CrossRef] [PubMed]
- Tao, Z.; Liu, H.; Li, S.; Fu, Y. Robust spectral ensemble clustering. In Proceedings of the 25th ACM International Conference Information and Knowledge Management (CIKM), Indianapolis, IN, USA, 24–28 October 2016; pp. 367–376. [Google Scholar]
- Liu, H.; Shao, M.; Li, S.; Fu, Y. Infinite ensemble for image clustering. In Proceedings of the 22nd ACM SIGKDD International Conference Knowledge Discovery and Data Mining (KDD), San Francisco, CA, USA, 13–17 August 2016; pp. 1745–1754. [Google Scholar]
- Pavlov, D.; Mao, J.; Dom, B. Scaling-up support vector machines using boosting algorithm. In Proceedings of the 15th International Conference on Pattern Recognition (ICPR), Barcelona, Spain, 3–7 September 2000; pp. 2219–2222. [Google Scholar]
- Collobert, R.; Bengio, S.; Bengio, Y. A parallel mixture of SVMs for very large scale problems. Neural Comput. 2002, 14, 1105–1114. [Google Scholar] [CrossRef]
- Valentini, G.; Dietterich, T.G. Bias-variance analysis of support vector machines for the development of SVM-based ensemble methods. J. Mach. Learn. Res. 2004, 5, 725–775. [Google Scholar]
- Fu, Z.; Robles-Kelly, A.; Zhou, J. Mixing linear SVMs for nonlinear classification. IEEE Trans. Neural Netw. 2010, 21, 1963–1975. [Google Scholar]
- Singh, V.; Mukherjee, L.; Peng, J.; Xu, J. Ensemble clustering using semidefinite programming with applications. Mach. Learn. 2010, 79, 177–200. [Google Scholar] [CrossRef]
- Kuncheva, L.I.; Vetrov, D.P. Evaluation of stability of k-means cluster ensembles with respect to random initialization. IEEE Trans. Pat. Anal. Mach. Intell. 2006, 28, 1798–1808. [Google Scholar] [CrossRef]
- Shigei, N.; Miyajima, H.; Maeda, M.; Ma, L. Bagging and AdaBoost algorithms for vector quantization. Neurocomputing 2009, 73, 106–114. [Google Scholar] [CrossRef]
- Steele, B.M. Exact bootstrap k-nearest neighbor learners. Mach. Learn. 2009, 74, 235–255. [Google Scholar] [CrossRef]
- Mirikitani, D.T.; Nikolaev, N. Efficient online recurrent connectionist learning with the ensemble Kalman filter. Neurocomputing 2010, 73, 1024–1030. [Google Scholar] [CrossRef]
- Zhou, S.; Wang, J.; Wang, L.; Wan, X.; Hui, S.; Zheng, N. Inverse adversarial diversity learning for network ensemble. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 7923–7935. [Google Scholar] [CrossRef]
- Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.V.; Hinton, G.E.; Dean, J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. In Proceedings of the International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017. [Google Scholar]
- Zhou, T.; Wang, S.; Bilmes, J.A. Diverse ensemble evolution: Curriculum data-model marriage. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 2–8 December 2018; pp. 5905–5916. [Google Scholar]
- Pang, T.; Xu, K.; Du, C.; Chen, N.; Zhu, J. Improving adversarial robustness via promoting ensemble diversity. In Proceedings of the International Conference Machine Learning (ICML), Long Beach, CA, USA, 9–15 June 2019; pp. 4970–4979. [Google Scholar]
- Li, N.; Yu, Y.; Zhou, Z.-H. Diversity regularized ensemble pruning. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Bristol, UK, 24–28 September 2012; Springer: Cham, Switzerland, 2012; pp. 330–345. [Google Scholar]
- Oshiro, T.M.; Perez, P.S.; Baranauskas, J.A. How many trees in a random forest? In Proceedings of the Machine Learning and Data Mining in Pattern Recognition (MLDM), Berlin, Germany, 13–20 July 2012; pp. 154–168. [Google Scholar]
- Tsoumakas, G.; Partalas, I.; Vlahavas, I. A taxonomy and short review of ensemble selection. In Proceedings of the Workshop Supervised and Unsupervised Ensemble Methods and their Applications (SUEMA), Patras, Greece, 21–22 July 2008; pp. 41–46. [Google Scholar]
- Gomes, J.B.; Gaber, M.M.; Sousa, P.A.C.; Menasalvas, E. Mining recurring concepts in a dynamic feature space. IEEE Trans. Neural Netw. Learn. Syst. 2014, 25, 95–110. [Google Scholar] [CrossRef] [PubMed]
- Kuncheva, L.I.; Whitaker, C.J. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Mach. Learn. 2003, 51, 181–207. [Google Scholar] [CrossRef]
- Liu, H.; Mandvikar, A.; Mody, J. An empirical study of building compact ensembles. In Advances in Web-Age Information Management; Springer: Berlin, Germany, 2004; pp. 622–627. [Google Scholar]
- Hu, X. Using rough sets theory and database operations to construct a good ensemble of classifiers for data mining applications. In Proceedings of the IEEE International Conference Data Mining (ICDM), San Jose, CA, USA, 29 November–2 December 2001; pp. 233–240. [Google Scholar]
- Bonab, H.; Can, F. Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2735–2745. [Google Scholar] [CrossRef] [PubMed]
- Ulas, A.; Semerci, M.; Yildiz, O.T.; Alpaydin, E. Incremental construction of classifier and discriminant ensembles. Inf. Sci. 2009, 179, 1298–1318. [Google Scholar] [CrossRef]
- Windeatt, T.; Zor, C. Ensemble pruning using spectral coefficients. IEEE Trans. Neural Netw. Learn. Syst. 2013, 24, 673–678. [Google Scholar] [CrossRef]
- Pietruczuk, L.; Rutkowski, L.; Jaworski, M.; Duda, P. How to adjust an ensemble size in stream data mining? Inf. Sci. 2017, 381, 46–54. [Google Scholar] [CrossRef]
- Latinne, P.; Debeir, O.; Decaestecker, C. Limiting the number of trees in random forests. In Multiple Classifier Systems; Springer: Berlin, Germany, 2001; pp. 178–187. [Google Scholar]
- Fumera, G.; Roli, F.; Serrau, A. A theoretical analysis of bagging as a linear combination of classifiers. IEEE Trans. Pat. Anal. Mach. Intell. 2008, 30, 1293–1299. [Google Scholar] [CrossRef] [PubMed]
- Fumera, G.; Roli, F. A theoretical and experimental analysis of linear combiners for multiple classifier systems. IEEE Trans. Pat. Anal. Mach. Intell. 2005, 27, 942–956. [Google Scholar] [CrossRef]
- Lopes, M.E.; Wu, S.; Lee, T.C.M. Measuring the algorithmic convergence of randomized ensembles: The regression setting. SIAM J. Math. Data Sci. 2020, 2, 921–943. [Google Scholar] [CrossRef]
- Lopes, M.E. Estimating the algorithmic variance of randomized ensembles via the bootstrap. Ann. Stat. 2019, 47, 1088–1112. [Google Scholar] [CrossRef]
- Zhang, Y.; Burer, S.; Street, W.N. Ensemble pruning via semi-definite programming. J. Mach. Learn. Res. 2006, 7, 1315–1338. [Google Scholar]
- Meynet, J.; Thiran, J.-P. Information theoretic combination of pattern classifiers. Pattern Recogn. 2010, 43, 3412–3421. [Google Scholar] [CrossRef]
- Tang, E.K.; Suganthan, P.N.; Yao, X. An analysis of diversity measures. Mach. Learn. 2006, 65, 247–271. [Google Scholar] [CrossRef]
- Kleinberg, E. On the algorithmic implementation of stochastic discrimination. IEEE Trans. Pat. Anal. Mach. Intell. 2000, 22, 473–490. [Google Scholar] [CrossRef]
- Breiman, L. Bias Variance and Arcing Classifiers; Technical Report TR 460; Statistics Department, University of California: Berkeley, CA, USA, 1996. [Google Scholar]
- Friedman, J.; Hall, P. On Bagging and Nonlinear Estimation; Technical Report; Statistics Department, Stanford University: Stanford, CA, USA, 2000. [Google Scholar]
- Friedman, J.H. On bias, variance, 0/1–loss, and the curse-of-dimensionality. Data Min. Knowl. Discov. 1997, 1, 55–77. [Google Scholar] [CrossRef]
- Domingos, P. A unified bias-variance decomposition for zero-one and squared loss. In Proceedings of the 17th National Conference on Artificial Intelligence, Austin, TX, USA, 30 July–3 August 2000; pp. 564–569. [Google Scholar]
- Valentini, G. An experimental bias-variance analysis of SVM ensembles based on resampling techniques. IEEE Trans. Syst. Man Cybern. B 2005, 35, 1252–1271. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: New York, NY, USA, 2009. [Google Scholar]
- Mentch, L.; Zhou, S. Randomization as regularization: A degrees of freedom explanation for random forest success. J. Mach. Learn. Res. 2020, 21, 1–36. [Google Scholar]
- Olson, M.A.; Wyner, A.J. Making sense of random forest probabilities: A kernel perspective. arXiv 2018, arXiv:1812.05792. [Google Scholar]
- Welbl, J. Casting random forests as artificial neural networks (and profiting from it). In Proceedings of the German Conference on Pattern Recognition (GCPR), Munster, Germany, 3–5 September 2014; pp. 765–771. [Google Scholar]
- Biau, G.; Scornet, E.; Welbl, J. Neural random forests. Sankhya A 2016, 78, 140–158. [Google Scholar] [CrossRef]
- Coleman, T.; Peng, W.; Mentch, L. Scalable and efficient hypothesis testing with random forests. J. Mach. Learn. Res. 2022, 23, 1–35. [Google Scholar]
- Garipov, T.; Izmailov, P.; Podoprikhin, D.; Vetrov, D.; Wilson, A.G. Loss surfaces, mode connectivity, and fast ensembling of DNNs. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 2–8 December 2018; pp. 876–885. [Google Scholar]
- Izmailov, P.; Podoprikhin, D.; Garipov, T.; Vetrov, D.; Wilson, A.G. Averaging weights leads to wider optima and better generalization. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence 2018 (UAI), Corvallis, OR, USA, 1–4 August 2018; pp. 1–9. [Google Scholar]
- Maddox, W.; Garipov, T.; Izmailov, P.; Vetrov, D.; Wilson, A.G. A simple baseline for Bayesian uncertainty in deep learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 8–14 December 2019; pp. 102–111. [Google Scholar]
- Wang, A.; Wan, G.; Cheng, Z.; Li, S. An incremental extremely random forest classifier for online learning and tracking. In Proceedings of the 16th IEEE International Conference Image Processing (ICIP), Cairo, Egypt, 7–11 November 2009; pp. 1449–1452. [Google Scholar]
- Gomes, H.M.; Bifet, A.; Read, J.; Barddal, J.P.; Enembreck, F.; Pfahringer, B.; Holmes, G.; Abdessalem, T. Adaptive random forests for evolving data stream classification. Mach. Learn. 2017, 106, 1469–1495. [Google Scholar] [CrossRef]
- Gomes, H.M.; Barddal, J.P.; Ferreira, L.E.B.; Bifet, A. Adaptive random forests for data stream regression. In Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), Bruges, Belgium, 25–27 April 2018; pp. 267–272. [Google Scholar]
- Luong, A.V.; Nguyen, T.T.; Liew, A.W.-C. Streaming active deep forest for evolving data stream classification. arXiv 2020, arXiv:2002.11816. [Google Scholar]
- Korycki, L.; Krawczyk, B. Adaptive deep forest for online learning from drifting data streams. arXiv 2020, arXiv:2010.07340. [Google Scholar]
- Gomes, H.M.; Read, J.; Bifet, A.; Durrant, R.J. Learning from evolving data streams through ensembles of random patches. Knowl. Inf. Syst. 2021, 63, 1–29. [Google Scholar] [CrossRef]
- Gomes, H.M.; Montiel, J.; Mastelini, S.M.; Pfahringer, B.; Bifet, A. On ensemble techniques for data stream regression. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
- Mastelini, S.M.; Nakano, F.K.; Vens, C.; de Carvalho, A.C.P.d.L.F. Online extra trees regressor. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 6755–6767. [Google Scholar] [CrossRef] [PubMed]
- Polikar, R.; Upda, L.; Upda, S.S.; Honavar, V. Learn++: An incremental learning algorithm for supervised neural networks. IEEE Trans. Syst. Man Cybern. C 2001, 31, 497–508. [Google Scholar] [CrossRef]
- Zhao, Q.; Jiang, Y.; Xu, M. Incremental learning by heterogeneous bagging ensemble. In Proceedings of the 6th International Conference on Advanced Data Mining and Applications (ADMA), Chongqing, China, 19–21 November 2010; Volume 2, pp. 1–12. [Google Scholar]
- Zliobaite, I. Adaptive Training Set Formation. Ph.D. Thesis, Vilnius University, Vilnius, Lithuania, 2010. [Google Scholar]
- Elwell, R.; Polikar, R. Incremental learning of concept drift in nonstationary environments. IEEE Trans. Neural Netw. 2011, 22, 1517–1531. [Google Scholar] [CrossRef] [PubMed]
- Kumano, S.; Akutsu, T. Comparison of the representational power of random forests, binary decision diagrams, and neural networks. Neural Comput. 2022, 34, 1019–1044. [Google Scholar] [CrossRef]
- Du, K.-L. Several misconceptions and misuses of deep neural networks and deep learning. In Proceedings of the 2023 International Congress on Communications, Networking, and Information Systems (CNIS 2023), Guilin, China, 25–27 March 2023; pp. 155–171. [Google Scholar]
- Yarotsky, D. Error bounds for approximations with deep ReLU networks. Neural Netw. 2017, 94, 103–114. [Google Scholar] [CrossRef] [PubMed]
- Baldi, P.; Vershynin, R. The capacity of feedforward neural networks. Neural Netw. 2019, 116, 288–311. [Google Scholar] [CrossRef]
- Veit, A.; Wilber, M.; Belongie, S. Residual networks behave like ensembles of relatively shallow networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 29, pp. 550–558. [Google Scholar]
- He, F.; Liu, T.; Tao, D. Why ResNet works? Residuals generalize. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 5349–5362. [Google Scholar] [CrossRef] [PubMed]
- Du, K.-L.; Leung, C.-S.; Mow, W.H.; Swamy, M.N.S. Perceptron: Learning, generalization, model selection, fault tolerance, and role in the deep learning era. Mathematics 2022, 10, 4730. [Google Scholar] [CrossRef]
- Bhattacharyya, S.; Jha, S.; Tharakunnel, K.; Westland, J.C. Data mining for credit card fraud: A comparative study. Decis. Support Syst. 2011, 50, 602–613. [Google Scholar] [CrossRef]
- Santos, M.S.; Soares, J.P.; Abreu, P.H.; Araújo, H.; Santos, J. Cross-validation for imbalanced datasets: Avoiding overoptimistic and overfitting approaches. IEEE Comput. Intell. Mag. 2018, 13, 59–76. [Google Scholar] [CrossRef]
- Cruz, J.A.; Wishart, D.S. Applications of machine learning in cancer prediction and prognosis. Cancer Inform. 2006, 2, 59–77. [Google Scholar] [CrossRef]
- Zhang, J.; Zulkernine, M.; Haque, A. Random-forests-based network intrusion detection systems. IEEE Trans. Syst. Man Cybern. C 2008, 38, 649–659. [Google Scholar] [CrossRef]
- Janai, J.; Guney, F.; Behl, A.; Geiger, A. Computer vision for autonomous vehicles: Problems, datasets and state of the art. Found. Trends Comput. Graph. Vis. 2020, 12, 1–308. [Google Scholar]
- Yang, Y.; Lv, H.; Chen, N. A Survey on ensemble learning under the era of deep learning. Artif. Intell. Rev. 2023, 56, 5545–5589. [Google Scholar] [CrossRef]
- Cao, Y.; Geddes, T.A.; Yang, J.Y.H.; Yang, P. Ensemble deep learning in bioinformatics. Nat. Mach. Intell. 2020, 2, 500–508. [Google Scholar] [CrossRef]
- Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249. [Google Scholar] [CrossRef]
- Dong, X.; Yu, Z.; Cao, W.; Shi, Y.; Ma, Q. A survey on ensemble learning. Front. Comput. Sci. 2020, 14, 241–258. [Google Scholar] [CrossRef]
- Gomes, H.M.; Barddal, J.P.; Enembreck, F.; Bifet, A. A survey on ensemble learning for data stream classification. ACM Comput. Surv. 2017, 50, 23:1–23:36. [Google Scholar] [CrossRef]
- Gao, J.; Li, P.; Chen, Z.; Zhang, J. A survey on deep learning for multimodal data fusion. Neural Comput. 2020, 32, 829–864. [Google Scholar] [CrossRef]
| Method | Strengths | Weaknesses | Typical Applications |
|---|---|---|---|
| Bagging | Reduces variance, improves stability | Less effective for biased models | Image classification, tabular data |
| Boosting | High accuracy, adaptive learning | Prone to overfitting, sequential training | Fraud detection, ranking tasks |
| Random forests | Robust, feature selection, scalable | Requires storage for multiple trees | Medical diagnosis, remote sensing |
| XGBoost | Efficient, handles missing data, fast training | High memory usage, tuning complexity | Time series forecasting, NLP |
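The qualitative comparison above can be reproduced on a toy problem. The following minimal sketch (an illustrative assumption, not part of the original study) cross-validates the four method families with scikit-learn on a synthetic tabular dataset; scikit-learn's GradientBoostingClassifier stands in for XGBoost, and xgboost.XGBClassifier could be substituted if that package is available. The dataset and hyperparameters are arbitrary.

```python
# Illustrative sketch: cross-validating the ensemble families compared above
# on a synthetic tabular dataset (hyperparameters chosen arbitrarily).
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              RandomForestClassifier, GradientBoostingClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)

models = {
    "Bagging": BaggingClassifier(n_estimators=100, random_state=0),
    "Boosting (AdaBoost)": AdaBoostClassifier(n_estimators=100, random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBDT (XGBoost-style)": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold accuracy
    print(f"{name:22s} accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```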
| Method | Training Complexity | Inference Complexity | Storage Requirement |
|---|---|---|---|
| Bagging | | | |
| Boosting | | | |
| Random forests | | | |
| XGBoost | | | |
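The trade-offs summarized in this table can also be probed empirically. The sketch below (again an illustrative assumption, with GradientBoostingClassifier standing in for XGBoost) times training and prediction and uses the pickled model size as a rough proxy for storage; absolute numbers depend entirely on the dataset and hyperparameters.

```python
# Illustrative sketch: timing training/inference and measuring serialized
# model size as a rough storage proxy for each ensemble family.
import pickle
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              RandomForestClassifier, GradientBoostingClassifier)

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)

models = {
    "Bagging": BaggingClassifier(n_estimators=100, random_state=0),
    "Boosting": AdaBoostClassifier(n_estimators=100, random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBDT": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    t0 = time.perf_counter()
    model.fit(X, y)                            # training cost
    t_fit = time.perf_counter() - t0

    t0 = time.perf_counter()
    model.predict(X)                           # inference cost
    t_pred = time.perf_counter() - t0

    size_mb = len(pickle.dumps(model)) / 1e6   # storage proxy
    print(f"{name:14s} fit {t_fit:6.2f}s  predict {t_pred:5.2f}s  size {size_mb:6.2f} MB")
```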
| Aspect | Ensemble Learning | Deep Learning |
|---|---|---|
| Definition | Combines multiple model predictions to improve accuracy and robustness. | Learns hierarchical representations from raw data through multiple layers. |
| Data Requirements | Effective with small to medium-sized datasets. | Typically requires large labeled datasets for optimal performance. |
| Computational Complexity | Generally less demanding, depending on the base models used. | Computationally intensive, often requiring GPUs or TPUs. |
| Interpretability | More interpretable, as individual model contributions can be analyzed. | Less interpretable due to complex feature transformations. |
| Fusion Method | Aggregates predictions via majority voting, averaging, or weighted combinations. | Integrates features across multiple layers for final predictions. |
| Strengths | Reduces bias, variance, or both by leveraging model diversity. | Excels at feature extraction and handling high-dimensional data. |
| Weaknesses | Performance depends on the quality and diversity of base learners. | Prone to overfitting and requires careful tuning and regularization. |
| Applications | Suitable for classification, regression, and anomaly detection. | Ideal for image recognition, NLP, and speech analysis. |
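The fusion rules named in the "Fusion Method" row are simple to state concretely. The NumPy sketch below is a minimal illustration, assuming three hypothetical base learners over three classes; the predictions and combination weights are made-up examples, not values from the paper.

```python
# Illustrative sketch of prediction fusion: hard majority voting over class
# labels and weighted averaging of class-probability estimates.
import numpy as np

# Predicted labels from three hypothetical base learners for 5 samples.
labels = np.array([[0, 1, 1, 2, 0],
                   [0, 1, 2, 2, 0],
                   [1, 1, 1, 2, 2]])
n_classes = 3

# Hard fusion: per-sample majority vote across the learners.
votes = np.apply_along_axis(lambda col: np.bincount(col, minlength=n_classes),
                            axis=0, arr=labels)        # shape (n_classes, 5)
hard_vote = votes.argmax(axis=0)
print("majority vote:   ", hard_vote)

# Soft fusion: weighted average of per-learner class probabilities
# (random probabilities and arbitrary weights, for illustration only).
proba = np.random.default_rng(0).dirichlet(np.ones(n_classes), size=(3, 5))
weights = np.array([0.5, 0.3, 0.2])
soft_vote = np.tensordot(weights, proba, axes=1).argmax(axis=1)
print("weighted average:", soft_vote)
```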
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).