A Probabilistic Transformation of Distance-Based Outliers
Abstract
1. Introduction
| Notation | |
|---|---|
| A scalar (integer or real) | |
| A vector | |
| A matrix | |
| A scalar random variable | |
| A set | |
| A space | |
| A distribution | |
| The set of real numbers | |
| A dataset | |
| The set of all integers between 0 and n | |
| A function of x with domain and range | |
2. Distance-Based Outlier Detection
2.1. Nearest Neighbors
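As a rough illustration of the nearest-neighbor scoring this section refers to, the following minimal sketch scores each query point by its distances to the k nearest reference points. The function name `knn_distance_scores`, the Euclidean metric, the default `k=5`, and the `reduce` options are assumptions made for this example, not the authors' implementation.

```python
import numpy as np


def knn_distance_scores(train, query, k=5, reduce="max"):
    """Score each query row by its distances to the k nearest training rows.

    Hypothetical helper for illustration only; higher scores mean more outlying.
    """
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training rows")
    # Pairwise Euclidean distances, shape (n_query, n_train).
    diffs = query[:, None, :] - train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep the k smallest distances per query row, sorted ascending.
    nearest = np.sort(dists, axis=1)[:, :k]
    if reduce == "max":
        # Distance to the k-th nearest neighbor.
        return nearest[:, -1]
    if reduce == "mean":
        # Average distance to the k nearest neighbors.
        return nearest.mean(axis=1)
    raise ValueError("reduce must be 'max' or 'mean'")


# Usage: the far-away second query point receives the larger score.
train = [[0, 0], [0, 1], [1, 0], [1, 1]]
query = [[0.5, 0.5], [5, 5]]
scores = knn_distance_scores(train, query, k=2, reduce="max")
print(scores)
```

The sketch scores query points against a separate reference set, so a point is never counted as its own neighbor. Scoring a dataset against itself would need the zero self-distance excluded first.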
2.2. Local Outlier Factor
2.3. Closed-World and Open-World
3. Outlier Score Normalization
3.1. Interpretability, Explanation, and Trustworthiness
- Interpretability: the ability to judge the relevance of a prediction.
- Explanation: the ability to explain the reasoning behind a prediction.
- Trustworthiness: the ability to describe the confidence behind a prediction.
4. Probabilistic Outlier Scores
5. Results
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Correction Statement
References
- Barnett, V.; Lewis, T. Outliers in Statistical Data; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1978. [Google Scholar]
- Ruff, L.; Kauffmann, J.R.; Vandermeulen, R.A.; Montavon, G.; Samek, W.; Kloft, M.; Dietterich, T.G.; Müller, K.R. A Unifying Review of Deep and Shallow Anomaly Detection. Proc. IEEE 2021, 109, 756–795. [Google Scholar] [CrossRef]
- Hawkins, D.M. Identification of Outliers; Springer: Dordrecht, The Netherlands, 1980. [Google Scholar] [CrossRef]
- Markou, M.; Singh, S. Novelty Detection: A Review—Part 1: Statistical Approaches. Signal Process. 2003, 83, 2481–2497. [Google Scholar] [CrossRef]
- Markou, M.; Singh, S. Novelty Detection: A Review—Part 2. Signal Process. 2003, 83, 2499–2521. [Google Scholar] [CrossRef]
- Hodge, V.; Austin, J. A Survey of Outlier Detection Methodologies. Artif. Intell. Rev. 2004, 22, 85–126. [Google Scholar] [CrossRef]
- Chandola, V.; Banerjee, A.; Kumar, V. Anomaly Detection. ACM Comput. Surv. 2009, 41, 1–58. [Google Scholar] [CrossRef]
- Pimentel, M.A.; Clifton, D.A.; Clifton, L.; Tarassenko, L. A Review of Novelty Detection. Signal Process. 2014, 99, 215–249. [Google Scholar] [CrossRef]
- Knorr, E.M.; Ng, R.T. A Unified Approach for Mining Outliers. In Proceedings of the 1997 Conference of the Centre for Advanced Studies on Collaborative Research (CASCON ’97), Toronto, ON, USA, 10–13 November 1997; p. 11. [Google Scholar] [CrossRef]
- Knorr, E.M.; Ng, R.T. Algorithms for Mining Distance-Based Outliers in Large Datasets. In Proceedings of the 24th International Conference on Very Large Data Bases, New York, NY, USA, 24–27 August 1998; VLDB ’98. Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1998; pp. 392–403. [Google Scholar]
- Knorr, E.M.; Ng, R.T.; Tucakov, V. Distance-Based Outliers: Algorithms and Applications. VLDB J. Int. J. Very Large Data Bases 2000, 8, 237–253. [Google Scholar] [CrossRef]
- Ramaswamy, S.; Rastogi, R.; Shim, K. Efficient Algorithms for Mining Outliers from Large Data Sets. SIGMOD Rec. 2000, 29, 427–438. [Google Scholar] [CrossRef]
- Angiulli, F.; Pizzuti, C. Fast Outlier Detection in High Dimensional Spaces. In Principles of Data Mining and Knowledge Discovery; Goos, G., Hartmanis, J., van Leeuwen, J., Carbonell, J.G., Siekmann, J., Elomaa, T., Mannila, H., Toivonen, H., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2002; pp. 15–27. [Google Scholar] [CrossRef]
- Geler, Z.; Kurbalija, V.; Radovanović, M.; Ivanović, M. Comparison of Different Weighting Schemes for the kNN Classifier on Time-Series Data. Knowl. Inf. Syst. 2016, 48, 331–378. [Google Scholar] [CrossRef]
- Dudani, S.A. The Distance-Weighted k-Nearest-Neighbor Rule. IEEE Trans. Syst. Man Cybern. 1976, SMC-6, 325–327. [Google Scholar] [CrossRef]
- Zavrel, J. An Empirical Re-Examination of Weighted Voting for k-NN. In Proceedings of the BENELEARN-97 7th Belgian-Dutch Conference on Machine Learning, Tilburg, The Netherlands, 21 October 1997; Daelemans, W., Flach, P., van den Bosch, A., Eds.; Tilburg University: Tilburg, The Netherlands, 1997; pp. 139–145. [Google Scholar]
- Macleod, J.E.S.; Luk, A.; Titterington, D.M. A Re-Examination of the Distance-Weighted k-Nearest Neighbor Classification Rule. IEEE Trans. Syst. Man Cybern. 1987, 17, 689–696. [Google Scholar] [CrossRef]
- Wu, M.; Jermaine, C. Outlier Detection by Sampling with Accuracy Guarantees. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA, 20–23 August 2006; pp. 767–772. [Google Scholar] [CrossRef]
- Sugiyama, M.; Borgwardt, K. Rapid Distance-Based Outlier Detection via Sampling. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2013; Volume 26. [Google Scholar]
- Pang, G.; Ting, K.M.; Albrecht, D. LeSiNN: Detecting Anomalies by Identifying Least Similar Nearest Neighbours. In Proceedings of the 2015 IEEE International Conference on Data Mining Workshop (ICDMW), Atlantic City, NJ, USA, 14–17 November 2015; pp. 623–630. [Google Scholar] [CrossRef]
- Zimek, A.; Gaudet, M.; Campello, R.J.; Sander, J. Subsampling for Efficient and Effective Unsupervised Outlier Detection Ensembles. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; KDD ’13. Association for Computing Machinery: New York, NY, USA, 2013; pp. 428–436. [Google Scholar] [CrossRef]
- Aggarwal, C.C.; Sathe, S. Theoretical Foundations and Algorithms for Outlier Ensembles. ACM SIGKDD Explor. Newsl. 2015, 17, 24–47. [Google Scholar] [CrossRef]
- Muhr, D.; Affenzeller, M. Little Data Is Often Enough for Distance-Based Outlier Detection. Procedia Comput. Sci. 2022, 200, 984–992. [Google Scholar] [CrossRef]
- Aggarwal, C.; Yu, P. Outlier Detection for High Dimensional Data. ACM SIGMOD Rec. 2002, 30. [Google Scholar] [CrossRef]
- Kriegel, H.P.; Kröger, P.; Schubert, E.; Zimek, A. Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data. In Advances in Knowledge Discovery and Data Mining; Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.B., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; pp. 831–838. [Google Scholar] [CrossRef]
- Agrawal, A. Local Subspace Based Outlier Detection. In Contemporary Computing; Ranka, S., Aluru, S., Buyya, R., Chung, Y.C., Dua, S., Grama, A., Gupta, S.K.S., Kumar, R., Phoha, V.V., Eds.; Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2009; pp. 149–157. [Google Scholar] [CrossRef]
- Zhang, L.; Lin, J.; Karim, R. An Angle-Based Subspace Anomaly Detection Approach to High-Dimensional Data: With an Application to Industrial Fault Detection. Reliab. Eng. Syst. Saf. 2015, 142, 482–497. [Google Scholar] [CrossRef]
- Trittenbach, H.; Böhm, K. Dimension-Based Subspace Search for Outlier Detection. Int. J. Data Sci. Anal. 2019, 7, 87–101. [Google Scholar] [CrossRef]
- Keller, F.; Müller, E.; Böhm, K. HiCS: High Contrast Subspaces for Density-Based Outlier Ranking. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, Arlington, VA, USA, 1–5 April 2012; pp. 1037–1048. [Google Scholar] [CrossRef]
- Cabero, I.; Epifanio, I.; Piérola, A.; Ballester, A. Archetype Analysis: A New Subspace Outlier Detection Approach. Knowl.-Based Syst. 2021, 217, 106830. [Google Scholar] [CrossRef]
- Dang, T.T.; Ngan, H.Y.; Liu, W. Distance-Based k-Nearest Neighbors Outlier Detection Method in Large-Scale Traffic Data. In Proceedings of the 2015 IEEE International Conference on Digital Signal Processing (DSP), Singapore, 21–24 July 2015; pp. 507–510. [Google Scholar] [CrossRef]
- Bergman, L.; Cohen, N.; Hoshen, Y. Deep Nearest Neighbor Anomaly Detection. arXiv 2020, arXiv:2002.10445. [Google Scholar]
- Cohen, N.; Hoshen, Y. Sub-Image Anomaly Detection with Deep Pyramid Correspondences. arXiv 2021, arXiv:2005.02357. [Google Scholar]
- Roth, K.; Pemula, L.; Zepeda, J.; Schölkopf, B.; Brox, T.; Gehler, P. Towards Total Recall in Industrial Anomaly Detection. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 14298–14308. [Google Scholar] [CrossRef]
- Hautamäki, V.; Kärkkäinen, I.; Fränti, P. Outlier Detection Using K-Nearest Neighbour Graph. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 26 August 2004; Volume 3, pp. 430–433. [Google Scholar] [CrossRef]
- Radovanović, M.; Nanopoulos, A.; Ivanović, M. Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection. IEEE Trans. Knowl. Data Eng. 2015, 27, 1369–1382. [Google Scholar] [CrossRef]
- Zhu, Q.; Feng, J.; Huang, J. Natural Neighbor: A Self-Adaptive Neighborhood Method without Parameter K. Pattern Recognit. Lett. 2016, 80, 30–36. [Google Scholar] [CrossRef]
- Wahid, A.; Annavarapu, C.S.R. NaNOD: A Natural Neighbour-Based Outlier Detection Algorithm. Neural Comput. Appl. 2021, 33, 2107–2123. [Google Scholar] [CrossRef]
- Tang, B.; He, H. ENN: Extended Nearest Neighbor Method for Pattern Recognition [Research Frontier]. IEEE Comput. Intell. Mag. 2015, 10, 52–60. [Google Scholar] [CrossRef]
- Tang, B.; He, H. A Local Density-Based Approach for Outlier Detection. Neurocomputing 2017, 241, 171–180. [Google Scholar] [CrossRef]
- Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying Density-Based Local Outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data: 2000, Dallas, TX, USA, 15–18 May 2000; Dunham, M., Naughton, J.F., Chen, W., Koudas, N., Eds.; Association for Computing Machinery: New York, NY, USA, 2000; pp. 93–104. [Google Scholar] [CrossRef]
- Schubert, E.; Zimek, A.; Kriegel, H.P. Local Outlier Detection Reconsidered: A Generalized View on Locality with Applications to Spatial, Video, and Network Outlier Detection. Data Min. Knowl. Discov. 2014, 28, 190–237. [Google Scholar] [CrossRef]
- Schubert, E.; Zimek, A.; Kriegel, H.P. Generalized Outlier Detection with Flexible Kernel Density Estimates. In Proceedings of the 2014 SIAM International Conference on Data Mining, Philadelphia, PA, USA, 24–26 April 2014; Zaki, M., Obradovic, Z., Tan, P.N., Banerjee, A., Kamath, C., Parthasarathy, S., Eds.; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2014; pp. 542–550. [Google Scholar] [CrossRef]
- Zhang, K.; Hutter, M.; Jin, H. A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data. In Advances in Knowledge Discovery and Data Mining, Proceedings of the 13th Pacific-Asia Conference, PAKDD 2009, Bangkok, Thailand, 27–30 April 2009; Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.B., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5476, pp. 813–822. [Google Scholar]
- Jin, W.; Tung, A.K.H.; Han, J.; Wang, W. Ranking Outliers Using Symmetric Neighborhood Relationship. In Proceedings of the Advances in Knowledge Discovery and Data Mining; Ng, W.K., Kitsuregawa, M., Li, J., Chang, K., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; Lecture Notes in Computer Science; pp. 577–593. [Google Scholar] [CrossRef]
- Kriegel, H.P.; Kröger, P.; Schubert, E.; Zimek, A. LoOP: Local Outlier Probabilities. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, China, 2–6 November 2009; Cheung, D.W.L., Song, I.Y., Chu, W.W., Hu, X., Lin, J.J., Eds.; Association for Computing Machinery: New York, NY, USA, 2009; pp. 1649–1652. [Google Scholar] [CrossRef]
- Alghushairy, O.; Alsini, R.; Soule, T.; Ma, X. A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams. Big Data Cogn. Comput. 2021, 5, 1. [Google Scholar] [CrossRef]
- Goodge, A.; Hooi, B.; Ng, S.K.; Ng, W.S. LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Event, 22 February–1 March 2022. [Google Scholar]
- Zhao, Y.; Nasrullah, Z.; Li, Z. PyOD: A Python Toolbox for Scalable Outlier Detection. J. Mach. Learn. Res. 2019, 20, 1–7. [Google Scholar]
- Muhr, D.; Affenzeller, M.; Blaom, A.D. OutlierDetection.jl: A Modular Outlier Detection Ecosystem for the Julia Programming Language. arXiv 2022, arXiv:2211.04550. [Google Scholar]
- Kriegel, H.P.; Kröger, P.; Schubert, E.; Zimek, A. Outlier Detection in Arbitrarily Oriented Subspaces. In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2012; pp. 379–388. [Google Scholar] [CrossRef]
- Janssens, J.; Huszár, F.; Postma, E. Stochastic Outlier Selection; Technical Report TiCC TR 2012–001; Tilburg University: Tilburg, The Netherlands, 2012. [Google Scholar]
- van Stein, B.; van Leeuwen, M.; Bäck, T. Local Subspace-Based Outlier Detection Using Global Neighbourhoods. In Proceedings of the 2016 IEEE International Conference on Big Data, Washington, DC, USA, 5–8 December 2016; pp. 1136–1142. [Google Scholar] [CrossRef]
- Latecki, L.J.; Lazarevic, A.; Pokrajac, D. Outlier Detection with Kernel Density Functions. In Proceedings of the Machine Learning and Data Mining in Pattern Recognition, Leipzig, Germany, 18–20 July 2007; Perner, P., Ed.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2007; pp. 61–75. [Google Scholar] [CrossRef]
- Gao, J.; Tan, P.N. Converting Output Scores from Outlier Detection Algorithms into Probability Estimates. In Proceedings of the Sixth International Conference on Data Mining (ICDM’06), Hong Kong, China, 18–22 December 2006; pp. 212–221. [Google Scholar] [CrossRef]
- Kriegel, H.P.; Kröger, P.; Schubert, E.; Zimek, A. Interpreting and Unifying Outlier Scores. In Proceedings of the 2011 SIAM International Conference on Data Mining, Mesa, AZ, USA, 28–30 April 2011; Liu, B., Liu, H., Clifton, C., Washio, T., Kamath, C., Eds.; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2011; pp. 13–24. [Google Scholar] [CrossRef]
- Bouguessa, M. Modeling Outlier Score Distributions. In Proceedings of the Advanced Data Mining and Applications; Lecture Notes in Computer Science. Zhou, S., Zhang, S., Karypis, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 713–725. [Google Scholar] [CrossRef]
- Schubert, E.; Wojdanowski, R.; Zimek, A.; Kriegel, H.P. On Evaluation of Outlier Rankings and Outlier Scores. In Proceedings of the 2012 SIAM International Conference on Data Mining, Anaheim, CA, USA, 26–28 April 2012; Ghosh, J., Liu, H., Davidson, I., Domeniconi, C., Kamath, C., Eds.; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2012; pp. 1047–1058. [Google Scholar] [CrossRef]
- Li, X.; Xiong, H.; Li, X.; Wu, X.; Zhang, X.; Liu, J.; Bian, J.; Dou, D. Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond. Knowl. Inf. Syst. 2022, 64, 3197–3234. [Google Scholar] [CrossRef]
- Zimek, A.; Filzmoser, P. There and Back Again: Outlier Detection between Statistical Reasoning and Data Mining Algorithms. WIREs Data Min. Knowl. Discov. 2018, 8, e1280. [Google Scholar] [CrossRef]
- Micenková, B.; Ng, R.T.; Dang, X.H.; Assent, I. Explaining Outliers by Subspace Separability. In Proceedings of the 2013 IEEE International Conference on Data Mining, Dallas, TX, USA, 7–10 December 2013; pp. 518–527. [Google Scholar] [CrossRef]
- Vinh, N.X.; Chan, J.; Romano, S.; Bailey, J.; Leckie, C.; Ramamohanarao, K.; Pei, J. Discovering Outlying Aspects in Large Datasets. Data Min. Knowl. Discov. 2016, 30, 1520–1555. [Google Scholar] [CrossRef]
- Macha, M.; Akoglu, L. Explaining Anomalies in Groups with Characterizing Subspace Rules. Data Min. Knowl. Discov. 2018, 32, 1444–1480. [Google Scholar] [CrossRef]
- Angiulli, F.; Fassetti, F.; Palopoli, L. Discovering Characterizations of the Behavior of Anomalous Subpopulations. IEEE Trans. Knowl. Data Eng. 2013, 25, 1280–1292. [Google Scholar] [CrossRef]
- Samek, W.; Montavon, G.; Vedaldi, A.; Hansen, L.K.; Müller, K.R. (Eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2019; Volume 11700. [Google Scholar] [CrossRef]
- Lapuschkin, S.; Wäldchen, S.; Binder, A.; Montavon, G.; Samek, W.; Müller, K.R. Unmasking Clever Hans Predictors and Assessing What Machines Really Learn. Nat. Commun. 2019, 10, 1096. [Google Scholar] [CrossRef]
- Kauffmann, J.; Ruff, L.; Montavon, G.; Müller, K.R. The Clever Hans Effect in Anomaly Detection. arXiv 2020, arXiv:2006.10609. [Google Scholar]
- Lee, J.D.; See, K.A. Trust in Automation: Designing for Appropriate Reliance. Hum. Factors 2004, 46, 50–80. [Google Scholar] [CrossRef]
- Jiang, H.; Kim, B.; Guan, M.; Gupta, M. To Trust or Not to Trust a Classifier. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31. [Google Scholar]
- Ovadia, Y.; Fertig, E.; Ren, J.; Nado, Z.; Sculley, D.; Nowozin, S.; Dillon, J.; Lakshminarayanan, B.; Snoek, J. Can You Trust Your Model’s Uncertainty? Evaluating Predictive Uncertainty under Dataset Shift. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32. [Google Scholar]
- Perini, L.; Vercruyssen, V.; Davis, J. Quantifying the Confidence of Anomaly Detectors in Their Example-Wise Predictions. In Proceedings of the Machine Learning and Knowledge Discovery in Databases; Lecture Notes in Computer Science; Hutter, F., Kersting, K., Lijffijt, J., Valera, I., Eds.; Springer: Cham, Switzerland, 2021; pp. 227–243. [Google Scholar] [CrossRef]
- Muhr, D.; Affenzeller, M. Hybrid (CPU/GPU) Exact Nearest Neighbors Search in High-Dimensional Spaces. In Proceedings of the Artificial Intelligence Applications and Innovations; IFIP Advances in Information and Communication Technology; Maglogiannis, I., Iliadis, L., Macintyre, J., Cortez, P., Eds.; Springer: Cham, Switzerland, 2022; pp. 112–123. [Google Scholar] [CrossRef]
- Kirner, E.; Schubert, E.; Zimek, A. Good and Bad Neighborhood Approximations for Outlier Detection Ensembles. Lect. Notes Comput. Sci. 2017, 10609, 173–187. [Google Scholar] [CrossRef]
- Burghouts, G.; Smeulders, A.; Geusebroek, J.M. The Distribution Family of Similarity Distances. In Proceedings of the Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2007; Volume 20. [Google Scholar]
- Schnitzer, D.; Flexer, A.; Schedl, M.; Widmer, G. Local and Global Scaling Reduce Hubs in Space. J. Mach. Learn. Res. 2012, 13, 2871–2902. [Google Scholar]
- Houle, M.E. Dimensionality, Discriminability, Density and Distance Distributions. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining Workshops, Dallas, TX, USA, 7–10 December 2013; pp. 468–473. [Google Scholar] [CrossRef]
- Lellouche, S.; Souris, M. Distribution of Distances between Elements in a Compact Set. Stats 2020, 3, 1–15. [Google Scholar] [CrossRef]
- Pekalska, E.; Duin, R. Classifiers for Dissimilarity-Based Pattern Recognition. In Proceedings of the 15th International Conference on Pattern Recognition, ICPR-2000, Barcelona, Spain, 3–8 September 2000; Volume 2, pp. 12–16. [Google Scholar] [CrossRef]
- Hubert, M.; Debruyne, M. Breakdown Value. WIREs Comput. Stat. 2009, 1, 296–302. [Google Scholar] [CrossRef]
- Rousseeuw, P.J.; Hubert, M. Robust Statistics for Outlier Detection. WIREs Data Min. Knowl. Discov. 2011, 1, 73–79. [Google Scholar] [CrossRef]
- Kim, J.; Scott, C.D. Robust Kernel Density Estimation. J. Mach. Learn. Res. 2012, 13, 2529–2565. [Google Scholar]
- Campos, G.O.; Zimek, A.; Sander, J.; Campello, R.J.G.B.; Micenková, B.; Schubert, E.; Assent, I.; Houle, M.E. On the Evaluation of Unsupervised Outlier Detection: Measures, Datasets, and an Empirical Study. Data Min. Knowl. Discov. 2016, 30, 891–927. [Google Scholar] [CrossRef]
- Muhr, D.; Affenzeller, M. Outlier/Anomaly Detection of Univariate Time Series: A Dataset Collection and Benchmark. In Proceedings of the Big Data Analytics and Knowledge Discovery; Wrembel, R., Gamper, J., Kotsis, G., Tjoa, A.M., Khalil, I., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2022; pp. 163–169. [Google Scholar] [CrossRef]
- Bergmann, P.; Fauser, M.; Sattlegger, D.; Steger, C. MVTec AD—A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 9584–9592. [Google Scholar] [CrossRef]
- Bergmann, P.; Batzner, K.; Fauser, M.; Sattlegger, D.; Steger, C. The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. Int. J. Comput. Vis. 2021, 129, 1038–1059. [Google Scholar] [CrossRef]
- Dua, D.; Graff, C. The UCI Machine Learning Repository. 2017. Available online: http://archive.ics.uci.edu/ml (accessed on 1 June 2023).
- Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 2000, 101, e215. [Google Scholar] [CrossRef]
- Tan, C.W.; Webb, G.I.; Petitjean, F. Indexing and Classifying Gigabytes of Time Series under Time Warping. In Proceedings of the 2017 SIAM International Conference on Data Mining (SDM), Houston, TX, USA, 27–29 April 2017; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2017; pp. 282–290. [Google Scholar] [CrossRef]
- Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR Time Series Archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305. [Google Scholar] [CrossRef]
- Murray, D.; Liao, J.; Stankovic, L.; Stankovic, V.; Hauxwell-Baldwin, R.; Wilson, C.; Coleman, M.; Kane, T.; Firth, S. A Data Management Platform for Personalised Real-Time Energy Feedback. In Proceedings of the 8th International Conference on Energy Efficiency in Domestic Appliances and Lighting, Lucerne, Switzerland, 26–28 August 2015. [Google Scholar]
- Davis, L.M. Predictive Modelling of Bone Ageing. Ph.D. Thesis, University of East Anglia, Norwich, UK, 2013. [Google Scholar]
- Keogh, E.; Wei, L.; Xi, X.; Lonardi, S.; Shieh, J.; Sirowy, S. Intelligent Icons: Integrating Lite-Weight Data Mining and Visualization into GUI Operating Systems. In Proceedings of the Sixth International Conference on Data Mining (ICDM’06), Hong Kong, China, 18–22 December 2006; pp. 912–916. [Google Scholar] [CrossRef]
- Wang, X.; Ye, L.; Keogh, E.J.; Shelton, C.R. Annotating Historical Archives of Images. Int. J. Digit. Libr. Syst. (IJDLS) 2010, 1, 59–80. [Google Scholar] [CrossRef]
- Sun, J.; Papadimitriou, S.; Faloutsos, C. Online Latent Variable Detection in Sensor Networks. In Proceedings of the 21st International Conference on Data Engineering, Tokyo, Japan, 5–8 April 2005; pp. 1126–1127. [Google Scholar] [CrossRef]
- Sapsanis, C.; Georgoulas, G.; Tzes, A.; Lymberopoulos, D. Improving EMG Based Classification of Basic Hand Movements Using EMD. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, 3–7 July 2013; pp. 5754–5757. [Google Scholar] [CrossRef]
- Mueen, A.; Keogh, E.; Young, N. Logical-Shapelets: An Expressive Primitive for Time Series Classification. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 1154–1162. [Google Scholar] [CrossRef]
- Micenková, B.; van Beusekom, J.; Shafait, F. Stamp Verification for Automated Document Authentication. In Computational Forensics, Proceedings of the 5th International Workshop, IWCF 2012, Tsukuba, Japan, 11 November 2012 and 6th International Workshop, IWCF 2014, Stockholm, Sweden, 24 August 2014; Garain, U., Shafait, F., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2015; Volume 8915, pp. 117–129. [Google Scholar] [CrossRef]
- Rebbapragada, U.; Protopapas, P.; Brodley, C.E.; Alcock, C. Finding Anomalous Periodic Time Series: An Application to Catalogs of Periodic Variable Stars. Mach. Learn. 2009, 74, 281–313. [Google Scholar] [CrossRef]
- Liu, J.; Zhong, L.; Wickramasuriya, J.; Vasudevan, V. uWave: Accelerometer-based Personalized Gesture Recognition and Its Applications. Pervasive Mob. Comput. 2009, 5, 657–675. [Google Scholar] [CrossRef]
- Olszewski, R.T. Generalized Feature Extraction for Structural Pattern Recognition in Time-Series Data. Ph.D. Thesis, Carnegie Mellon University, Pittsburgh, PA, USA, 2001. [Google Scholar]
- Yang, J.; Rahardja, S.; Fränti, P. Outlier Detection: How to Threshold Outlier Scores? In Proceedings of the International Conference on Artificial Intelligence, Information Processing and Cloud Computing-AIIPCC ’19, Sanya, China, 19–21 December 2019; pp. 1–6. [Google Scholar] [CrossRef]
- Perini, L.; Bürkner, P.C.; Klami, A. Estimating the Contamination Factor’s Distribution in Unsupervised Anomaly Detection. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar] [CrossRef]
| Dataset | N (samples) | O (outliers) | d (dimensions) | Source | Refs. |
|---|---|---|---|---|---|
| Annthyroid | 6942 | 347 | 21 | ELKI | [86] | 
| Arrhythmia | 256 | 12 | 259 | ELKI | [86] | 
| Cardiotocography | 1734 | 86 | 21 | ELKI | [86] | 
| CinCECGTorso | 373 | 18 | 1639 | UTSD | [87] | 
| Crop | 1052 | 52 | 46 | UTSD | [88] | 
| Earthquakes | 387 | 19 | 512 | UTSD | [89] | 
| ECG5000 | 3072 | 153 | 140 | UTSD | [87] | 
| ECGFiveDays | 465 | 23 | 136 | UTSD | [89] | 
| ElectricDevices | 4500 | 225 | 96 | UTSD | [89] | 
| FaceAll | 344 | 17 | 131 | UTSD | [89] | 
| FordA | 2660 | 133 | 500 | UTSD | [89] | 
| FordB | 2380 | 119 | 500 | UTSD | [89] | 
| FreezerRegularTrain | 1578 | 78 | 301 | UTSD | [90] | 
| HandOutlines | 921 | 46 | 2709 | UTSD | [91] | 
| HeartDisease | 157 | 7 | 13 | ELKI | [86] | 
| Hepatitis | 70 | 3 | 19 | ELKI | [86] | 
| InternetAds | 1682 | 84 | 1555 | ELKI | [86] | 
| ItalyPowerDemand | 575 | 28 | 24 | UTSD | [92] | 
| MedicalImages | 625 | 31 | 99 | UTSD | [89] | 
| MixedShapesRegularTrain | 793 | 39 | 1024 | UTSD | [93] | 
| MoteStrain | 721 | 36 | 84 | UTSD | [94] | 
| PageBlocks | 5139 | 256 | 10 | ELKI | [86] | 
| Parkinson | 50 | 2 | 22 | ELKI | [86] | 
| PhalangesOutlinesCorrect | 1787 | 89 | 80 | UTSD | [91] | 
| Pima | 526 | 26 | 8 | ELKI | [86] | 
| SemgHandGenderCh2 | 568 | 28 | 1500 | UTSD | [95] | 
| SonyAIBORobotSurface2 | 635 | 31 | 65 | UTSD | [96] | 
| SpamBase | 2661 | 133 | 57 | ELKI | [86] | 
| Stamps | 325 | 16 | 9 | ELKI | [97] | 
| StarLightCurves | 5607 | 280 | 1024 | UTSD | [98] | 
| Strawberry | 369 | 18 | 235 | UTSD | [89] | 
| TwoLeadECG | 611 | 30 | 82 | UTSD | [87] | 
| UWaveGestureLibraryAll | 589 | 29 | 945 | UTSD | [99] | 
| Wafer | 6738 | 336 | 152 | UTSD | [100] | 
| Yoga | 1863 | 93 | 426 | UTSD | [89] | 
| | Empiric | Exponential | None | Normal |
|---|---|---|---|---|
| (distance) | 0.388 ± 0.274 | 0.385 ± 0.273 | 0.387 ± 0.273 | 0.387 ± 0.273 | 
| (exponential) | 0.387 ± 0.273 | 0.384 ± 0.272 | 0.389 ± 0.273 | 0.386 ± 0.272 | 
| (max) | 0.389 ± 0.275 | 0.389 ± 0.275 | 0.389 ± 0.275 | 0.389 ± 0.275 | 
| (mean) | 0.384 ± 0.273 | 0.382 ± 0.271 | 0.384 ± 0.272 | 0.385 ± 0.273 | 
| (rank) | 0.389 ± 0.274 | 0.387 ± 0.273 | 0.388 ± 0.273 | 0.388 ± 0.273 | 
| (reverse) | 0.387 ± 0.273 | 0.383 ± 0.272 | 0.385 ± 0.272 | 0.386 ± 0.273 | 
| Dataset | Train (Normal) | Test (Normal) | Test (Outlier) | Masks | Groups | Shape | 
|---|---|---|---|---|---|---|
| Carpet | 280 | 28 | 89 | 97 | 5 | |
| Grid | 264 | 21 | 57 | 170 | 5 | |
| Leather | 245 | 32 | 92 | 99 | 5 | |
| Tile | 230 | 33 | 84 | 86 | 5 | |
| Wood | 247 | 19 | 60 | 168 | 5 | |
| Bottle | 209 | 20 | 63 | 68 | 3 | |
| Cable | 224 | 58 | 92 | 151 | 8 | |
| Capsule | 219 | 23 | 109 | 114 | 5 | |
| Hazelnut | 391 | 40 | 70 | 136 | 4 | |
| Metal Nut | 220 | 22 | 93 | 132 | 4 | |
| Pill | 267 | 26 | 141 | 245 | 7 | |
| Screw | 320 | 41 | 119 | 135 | 5 | |
| Toothbrush | 60 | 12 | 30 | 66 | 1 | |
| Transistor | 213 | 60 | 40 | 44 | 4 | |
| Zipper | 240 | 32 | 119 | 177 | 7 | |
| | F1 (Image) | ROC AUC (Image) | ROC AUC (Pixel) |
|---|---|---|---|
| PatchCore | 0.978 ± 0.012 | 0.987 ± 0.017 | 0.976 ± 0.017 | 
| ProbabilisticPatchCore | 0.971 ± 0.021 | 0.982 ± 0.023 | 0.976 ± 0.016 | 
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Muhr, D.; Affenzeller, M.; Küng, J. A Probabilistic Transformation of Distance-Based Outliers. Mach. Learn. Knowl. Extr. 2023, 5, 782-802. https://doi.org/10.3390/make5030042