Less Is More: Principled Diversity in Heterogeneous Anomaly Detection Ensembles
Abstract
1. Introduction
- An NCL-inspired unsupervised weighting scheme that down-weights detectors whose scores are strongly correlated with those of the other detectors, extending the diversity-penalisation principle of Negative Correlation Learning to unsupervised anomaly detection.
- A GMM-based adaptive contamination estimator that calibrates decision thresholds automatically across datasets with anomaly rates from 1.2% to 35.9%, replacing the fixed contamination assumptions of existing ensemble methods.
- A combinatorial search over all 2036 combinations of two or more detectors ($\sum_{k=2}^{11}\binom{11}{k} = 2^{11} - 12$), showing that compact ensembles of three to five complementary detectors match or exceed the full eleven-detector ensemble at as little as 8.7% of its computational cost.
- Statistical evaluation across 22 benchmark datasets using the Friedman test with Nemenyi post hoc comparisons, confirming significant improvements over six individual detectors.
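
The exact weighting formula is not given in this section. As an illustration only, one NCL-inspired way to penalise correlated detectors is to compute pairwise Pearson correlations between their score vectors and shrink each detector's weight as its average correlation with the others grows. The exponential penalty, the strength parameter `lam`, and the function names below are assumptions, not the authors' method.

```python
import numpy as np


def zscore_rows(scores: np.ndarray) -> np.ndarray:
    """Standardise each detector's scores (one row per detector)."""
    mu = scores.mean(axis=1, keepdims=True)
    sd = scores.std(axis=1, keepdims=True)
    return (scores - mu) / (sd + 1e-12)


def correlation_penalised_weights(scores: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Hypothetical NCL-style weights: redundant detectors are down-weighted.

    scores: shape (n_detectors, n_samples); higher means more anomalous.
    lam:    assumed penalty strength; lam = 0 reduces to equal weighting.
    """
    n_det = scores.shape[0]
    corr = np.corrcoef(scores)               # pairwise Pearson correlations (rows = detectors)
    # Average correlation of each detector with every *other* detector.
    mean_corr = (corr.sum(axis=1) - np.diag(corr)) / (n_det - 1)
    raw = np.exp(-lam * mean_corr)           # higher correlation -> smaller weight
    return raw / raw.sum()                   # normalise so weights sum to 1


def combine(scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of standardised detector scores."""
    return weights @ zscore_rows(scores)


rng = np.random.default_rng(0)
base = rng.normal(size=500)
scores = np.vstack([
    base,                                    # detector A
    base + 0.01 * rng.normal(size=500),      # detector B: near-copy of A
    rng.normal(size=500),                    # detector C: independent of A and B
])
w = correlation_penalised_weights(scores)
ensemble_score = combine(scores, w)
```

With two near-duplicate detectors and one independent one, the independent detector receives the largest weight, and setting `lam=0` recovers equal weighting. A detector with constant scores would make `np.corrcoef` return NaN, so a practical version would need to guard against that case.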
2. Related Work
3. Materials and Methods
3.1. Ensemble Framework
3.2. Experimental Setup
4. Results
4.1. Benchmark Datasets
4.2. Hyperparameter Sensitivity
4.3. Overall Comparison
4.4. Ablation Studies
4.5. Compact Ensemble Analysis
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AD | Anomaly Detection |
| AUPRC | Area Under the Precision-Recall Curve |
| AUROC | Area Under the Receiver Operating Characteristic Curve |
| CD | Critical Difference |
| COPOD | Copula-Based Outlier Detector |
| DSVDD | Deep Support Vector Data Description (DeepSVDD) |
| EM | Expectation–Maximisation |
| GMM | Gaussian Mixture Model |
| HBOS | Histogram-Based Outlier Score |
| IF | Isolation Forest |
| INNE | Isolation-Based Nearest-Neighbour Ensemble |
| KNN | k-Nearest Neighbour |
| LOF | Local Outlier Factor |
| LSCP | Locally Selective Combination in Parallel Outlier Ensembles |
| LUNAR | Locally Unified Neighbourhood Anomaly Ranking |
| MCD | Minimum Covariance Determinant |
| NCL | Negative Correlation Learning |
| ODDS | Outlier Detection DataSets |
| PCA | Principal Component Analysis |
| SUOD | Scalable Unsupervised Outlier Detection |
| SVD | Singular Value Decomposition |
| SVDD | Support Vector Data Description |
| VAE | Variational Autoencoder |






| Dataset | n | d | Anomalies | Anom. Rate (%) |
|---|---|---|---|---|
| Wine | 129 | 13 | 10 | 7.7 |
| Lympho | 148 | 18 | 6 | 4.1 |
| Glass | 214 | 9 | 9 | 4.2 |
| Vertebral | 240 | 6 | 30 | 12.5 |
| Ionosphere | 351 | 33 | 126 | 35.9 |
| WBC | 378 | 30 | 21 | 5.6 |
| Arrhythmia | 452 | 274 | 66 | 14.6 |
| BreastW | 683 | 9 | 239 | 35.0 |
| Pima | 768 | 8 | 268 | 34.9 |
| Vowels | 1456 | 12 | 50 | 3.4 |
| Letter | 1600 | 32 | 100 | 6.3 |
| Cardio | 1831 | 21 | 176 | 9.6 |
| Musk | 3062 | 166 | 97 | 3.2 |
| Speech | 3686 | 400 | 61 | 1.7 |
| Thyroid | 3772 | 6 | 93 | 2.5 |
| Optdigits | 5216 | 64 | 150 | 2.9 |
| Satimage-2 | 5803 | 36 | 71 | 1.2 |
| Satellite | 6435 | 36 | 2036 | 31.6 |
| Pendigits | 6870 | 16 | 156 | 2.3 |
| Annthyroid | 7200 | 6 | 534 | 7.4 |
| MNIST | 7603 | 100 | 700 | 9.2 |
| Mammography | 11,183 | 6 | 260 | 2.3 |
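
The anomaly rates above span 1.2% (Satimage-2) to 35.9% (Ionosphere), so no single fixed contamination rate suits every dataset. A minimal sketch of a GMM-based estimate follows, assuming a two-component, one-dimensional Gaussian mixture fitted by EM to the combined ensemble score, with the higher-mean component treated as anomalous. The initialisation, the 0.5 posterior cut-off, the clipping bounds `lo`/`hi`, and all function names are illustrative assumptions rather than the paper's estimator.

```python
import numpy as np


def _log_weighted_pdf(x, pi, mu, sd):
    """log(pi_k * N(x | mu_k, sd_k)) for each component k; shape (2, n)."""
    return (np.log(pi)[:, None]
            - 0.5 * ((x[None, :] - mu[:, None]) / sd[:, None]) ** 2
            - np.log(sd)[:, None] - 0.5 * np.log(2.0 * np.pi))


def _e_step(x, pi, mu, sd):
    """Responsibilities via log-sum-exp, plus the total log-likelihood."""
    lw = _log_weighted_pdf(x, pi, mu, sd)
    m = lw.max(axis=0)
    log_total = m + np.log(np.exp(lw - m).sum(axis=0))
    return np.exp(lw - log_total), log_total.sum()


def fit_gmm_1d(x, n_iter=500, tol=1e-8):
    """Plain EM for a two-component 1-D Gaussian mixture."""
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, [0.5, 0.99])          # assumed init: bulk vs. upper tail
    sd = np.full(2, x.std() + 1e-6)
    pi = np.array([0.9, 0.1])
    prev = -np.inf
    for _ in range(n_iter):
        r, ll = _e_step(x, pi, mu, sd)        # E-step
        nk = r.sum(axis=1) + 1e-12            # M-step
        pi = nk / nk.sum()
        mu = (r @ x) / nk
        sd = np.sqrt((r * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk) + 1e-6
        if ll - prev < tol:                   # stop when log-likelihood stalls
            break
        prev = ll
    r, _ = _e_step(x, pi, mu, sd)             # responsibilities under final params
    return pi, mu, sd, r


def estimate_contamination(scores, lo=0.001, hi=0.5):
    """Fraction of points assigned to the higher-mean (anomalous) component."""
    _, mu, _, r = fit_gmm_1d(scores)
    anom = int(np.argmax(mu))
    c = float(np.mean(r[anom] > 0.5))
    return float(np.clip(c, lo, hi))


rng = np.random.default_rng(42)
scores = np.concatenate([rng.normal(0.0, 1.0, 900),    # inliers
                         rng.normal(6.0, 1.0, 100)])   # 10% anomalies
c = estimate_contamination(scores)
threshold = np.quantile(scores, 1.0 - c)
labels = scores > threshold
```

The estimated rate then sets the decision threshold as the (1 - c) quantile of the scores. A library implementation such as scikit-learn's `GaussianMixture` could replace the hand-written EM loop.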
| Method | Type | F1 | AUROC | AUPRC |
|---|---|---|---|---|
| Ensemble-EM † | Proposed | | | |
| Ensemble-Equal † | Proposed | | | |
| Ensemble-NCL † | Proposed | | | |
| Ensemble-Var † | Proposed | | | |
| IsolationForest | Standalone | | | |
| MCD | Standalone | | | |
| LSCP † | Baseline | | | |
| PCA | Standalone | | | |
| HBOS | Standalone | | | |
| INNE | Standalone | | | |
| KNN | Standalone | | | |
| COPOD | Standalone | | | |
| VAE | Standalone | | | |
| LUNAR | Standalone | | | |
| DeepSVDD | Standalone | | | |
| LOF | Standalone | | | |
| Comparison | Ensemble-Equal | Ensemble-NCL |
|---|---|---|
| vs. DeepSVDD | 0.001 | 0.002 |
| vs. LOF | 0.008 | 0.016 |
| vs. PCA | 0.008 | 0.014 |
| vs. VAE | 0.019 | 0.024 |
| vs. LUNAR | 0.044 | 0.076 |
| vs. COPOD | 0.055 | 0.069 |
| vs. HBOS | 0.197 | 0.240 |
| vs. KNN | 0.477 | 0.598 |
| vs. LSCP | 0.503 | 0.549 |
| vs. INNE | 1.000 | 1.000 |
| vs. IsolationForest | 0.954 | 0.971 |
| vs. MCD | 0.992 | 0.998 |
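
The table above lists pairwise p-values for the two ensembles against each comparator. A minimal sketch of the underlying procedure, the Friedman test on per-dataset ranks followed by the Nemenyi critical difference of Demšar (2006), is shown below. It assumes a results matrix with one row per dataset and one column per method, higher values being better (as for AUROC); the function name is hypothetical, and the hard-coded q values only cover up to 10 methods.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Nemenyi critical values q_0.05 for k = 2..10 methods (Demšar, 2006).
Q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
        7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def friedman_nemenyi(results: np.ndarray):
    """results: (n_datasets, k_methods) matrix of a metric where higher is better."""
    n, k = results.shape
    stat, p = friedmanchisquare(*results.T)          # one sample per method
    ranks = rankdata(-results, axis=1)               # rank 1 = best; ties get mean rank
    avg_ranks = ranks.mean(axis=0)
    cd = Q_05[k] * np.sqrt(k * (k + 1) / (6.0 * n))  # Nemenyi critical difference
    return stat, p, avg_ranks, cd


# Toy example: 22 datasets, 4 methods with a fixed ordering of quality.
rng = np.random.default_rng(1)
base = rng.uniform(0.6, 0.7, size=(22, 1))
results = base + np.array([0.3, 0.2, 0.1, 0.0])
stat, p, avg_ranks, cd = friedman_nemenyi(results)
# Two methods differ significantly at alpha = 0.05 if their average ranks differ by more than cd.
```

With 22 datasets and 4 methods the critical difference is about 1.0 average-rank units. For more than 10 methods, q_α is the studentized range quantile for k groups and infinite degrees of freedom divided by √2.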
| Scheme | F1 | AUROC | AUPRC |
|---|---|---|---|
| Equal | | | |
| Variance | | | |
| NCL | | | |
| EM | | | |
| Ensemble | n | F1 | AUROC | AUPRC | Cost (%) |
|---|---|---|---|---|---|
| Full-11-EM | 11 | | | | 100 |
| IF+INNE+KNN+HBOS | 4 | | | | 13.8 |
| IF+INNE+KNN+HBOS+LUNAR | 5 | | | | 23.4 |
| IF+KNN+HBOS | 3 | | | | 8.7 |
| IF+KNN+MCD | 3 | | | | 17.7 |
| Detector | Fit Time (s) | Inference (ms/1k) | Peak Memory (MB) |
|---|---|---|---|
| IsolationForest | | | |
| INNE | | | |
| LOF | | | |
| KNN | | | |
| HBOS | | | |
| MCD | | | |
| COPOD | | | |
| PCA | | | |
| VAE | | | |
| DeepSVDD | | | |
| LUNAR | | | |
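
The measurement protocol behind the fit time, inference time, and peak memory columns is not described in this section. One way to collect comparable numbers in Python, assuming wall-clock timing with `time.perf_counter` and peak memory from `tracemalloc`, is sketched below; `profile_detector` and the toy detector are hypothetical.

```python
import time
import tracemalloc

import numpy as np


def profile_detector(fit, score, X_train, X_test):
    """Return (fit seconds, inference ms per 1k samples, peak traced MB).

    fit(X) and score(X) are placeholders for a detector's training and
    scoring calls; any detector API can be wrapped this way.
    """
    tracemalloc.start()
    t0 = time.perf_counter()
    fit(X_train)
    fit_s = time.perf_counter() - t0
    t0 = time.perf_counter()
    score(X_test)
    infer_s = time.perf_counter() - t0
    _, peak_bytes = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()
    ms_per_1k = infer_s * 1e6 / len(X_test)          # seconds -> ms, per 1000 rows
    return fit_s, ms_per_1k, peak_bytes / 2**20


# Toy "detector": Euclidean distance to the training mean (illustration only).
state = {}


def toy_fit(X):
    state["centre"] = X.mean(axis=0)


def toy_score(X):
    return np.linalg.norm(X - state["centre"], axis=1)


rng = np.random.default_rng(0)
X_tr, X_te = rng.normal(size=(5000, 20)), rng.normal(size=(2000, 20))
fit_s, ms_per_1k, peak_mb = profile_detector(toy_fit, toy_score, X_tr, X_te)
```

`tracemalloc` only sees allocations routed through Python's allocator or reported by extensions such as NumPy, so memory held by some native back ends may be missed. Tracing also slows execution, so timings are better taken in a separate, untraced run.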
| n | Best Combination | F1 | AUROC | AUPRC |
|---|---|---|---|---|
| 2 | IF+PCA | | | |
| 3 | IF+KNN+HBOS | | | |
| 4 | IF+INNE+KNN+HBOS | | | |
| 5 | IF+INNE+KNN+HBOS+LUNAR | | | |
| 6 | IF+INNE+KNN+HBOS+PCA+LUNAR | | | |
| 7 | IF+INNE+KNN+HBOS+MCD+COPOD+LUNAR | | | |
| 8 | IF+INNE+KNN+HBOS+MCD+COPOD+PCA+LUNAR | | | |
| 9 | IF+INNE+LOF+KNN+HBOS+MCD+COPOD+PCA+LUNAR | | | |
| 10 | IF+INNE+KNN+HBOS+MCD+COPOD+PCA+VAE+DSVDD+LUNAR | | | |
| 11 | IF+INNE+LOF+KNN+HBOS+MCD+COPOD+PCA+VAE+DSVDD+LUNAR | | | |
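
The best-per-size table above is the output of an exhaustive search. A minimal sketch of such a search over the 2036 subsets with at least two members is given below, assuming an `evaluate` callable that returns a higher-is-better score for a detector subset and an additive per-detector cost for a relative-cost percentage like the Cost (%) column. The function name, quality values, and unit costs are made up for illustration and do not reproduce the paper's results.

```python
from itertools import combinations

DETECTORS = ["IF", "INNE", "LOF", "KNN", "HBOS", "MCD",
             "COPOD", "PCA", "VAE", "DSVDD", "LUNAR"]


def best_per_size(detectors, evaluate, cost=None, min_size=2):
    """Score every subset with >= min_size members; keep the best per size.

    evaluate: callable(tuple_of_names) -> float, higher is better (placeholder
              for e.g. mean AUROC of the combined ensemble over all datasets).
    cost:     optional dict name -> cost, assumed additive across members.
    Returns ({size: (subset, score, cost_pct_or_None)}, n_evaluated).
    """
    total = sum(cost.values()) if cost else None
    best, n_evaluated = {}, 0
    for size in range(min_size, len(detectors) + 1):
        for subset in combinations(detectors, size):
            n_evaluated += 1
            score = evaluate(subset)
            if size not in best or score > best[size][1]:
                pct = 100.0 * sum(cost[d] for d in subset) / total if cost else None
                best[size] = (subset, score, pct)
    return best, n_evaluated


# Made-up per-detector quality values and unit costs, for illustration only.
quality = {"IF": 0.90, "INNE": 0.80, "LOF": 0.60, "KNN": 0.86, "HBOS": 0.76,
           "MCD": 0.74, "COPOD": 0.72, "PCA": 0.70, "VAE": 0.65,
           "DSVDD": 0.55, "LUNAR": 0.68}


def toy_eval(subset):
    return sum(quality[d] for d in subset) / len(subset)


best, n_eval = best_per_size(DETECTORS, toy_eval, cost={d: 1.0 for d in DETECTORS})
```

Because each detector's scores on a dataset can be computed once and cached, only the inexpensive combination and scoring step has to be repeated for every subset, which keeps a search of this size tractable.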
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Krčmar, T.; Šabanović, D.; Köhler, M.; Lukić, I. Less Is More: Principled Diversity in Heterogeneous Anomaly Detection Ensembles. AI 2026, 7, 214. https://doi.org/10.3390/ai7060214

