An Incremental Local Outlier Detection Method in the Data Stream
Abstract
1. Introduction
- A new incremental local outlier detection method for data streams, built on an incremental update strategy for the composite nearest neighborhood, which comprises the k-nearest neighbors (kNN), reverse k-nearest neighbors (RkNN), and shared k-nearest neighbors (SkNN).
- A theoretical analysis of the proposed approach, covering algorithm complexity, scalability, and parameter selection.
- Extensive experiments on both synthetic and real-life data sets, demonstrating the performance improvement of the proposed approach over the k-nearest-neighbor-based method.
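The composite nearest neighborhood named in the first contribution combines three neighbor sets. The following Python sketch computes them by brute force for a small in-memory set of points. It assumes Euclidean distance and, because this section does not define SkNN, takes the shared k-nearest neighbors of p to be the other points whose kNN lists overlap p's kNN list. The function names (`knn`, `rknn`, `sknn`, `composite_neighborhood`) are illustrative only and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Set

Point = Sequence[float]


def knn(points: List[Point], k: int) -> Dict[int, List[int]]:
    """Brute-force kNN: indices of the k closest other points (ties broken by index)."""
    result: Dict[int, List[int]] = {}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result[i] = [j for _, j in others[:k]]
    return result


def rknn(nn: Dict[int, List[int]]) -> Dict[int, Set[int]]:
    """Reverse kNN: q belongs to RkNN(p) when p is among q's k nearest neighbors."""
    rev: Dict[int, Set[int]] = {i: set() for i in nn}
    for q, neighbors in nn.items():
        for p in neighbors:
            rev[p].add(q)
    return rev


def sknn(nn: Dict[int, List[int]]) -> Dict[int, Set[int]]:
    """Assumed SkNN definition: other points sharing at least one kNN with p."""
    return {
        p: {q for q, nq in nn.items() if q != p and set(np_) & set(nq)}
        for p, np_ in nn.items()
    }


def composite_neighborhood(points: List[Point], k: int) -> Dict[int, Set[int]]:
    """Union of the kNN, RkNN, and SkNN sets of every point."""
    nn = knn(points, k)
    rev, shared = rknn(nn), sknn(nn)
    return {i: set(nn[i]) | rev[i] | shared[i] for i in nn}


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 4.0)]
print(knn(pts, 2)[4], rknn(knn(pts, 2))[4])  # prints: [3, 1] set()
```

The isolated point (5, 4) has two nearest neighbors but appears in no other point's kNN list, so its RkNN set is empty; this asymmetry is what the reverse neighborhood adds over plain kNN. The linear scan costs O(n²) distance evaluations per window, so a spatial index would normally replace it.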
2. Materials and Methods
2.1. Related Work
2.2. Distance-Based Local Density and Outlier Factor Estimation
2.3. Incremental Outlier Detection
3. Results
3.1. Scalability
3.2. Outlier Detection
4. Discussion
Author Contributions
Funding
Conflicts of Interest
References
Input: k, …, d, n, t and N = …
Output: Outliers in X
1. Collect the first n data points as the initial training set;
2. Search the kNN, RkNN, and SkNN of each x_i, 1 ≤ i ≤ n;
3. Calculate CLOF(x_i); if CLOF(x_i) > 1, increase the outlier count of x_i by 1;
4. Collect a new data point x_{n+1} and delete the obsolete data point x_1;
5. If the outlier count of x_1 ≥ t (1 ≤ t ≤ n), x_1 is an outlier;
6. Search the kNN, RkNN, and SkNN of x_{n+1} among x_i, 2 ≤ i ≤ n + 1;
7. Update the kNN, RkNN, SkNN, and CLOF of the affected data points;
8. Calculate CLOF(x_i); if CLOF(x_i) > 1, increase the outlier count of x_i by 1;
9. Collect a new data point x_{n+2} and delete the obsolete data point x_2;
10. If the outlier count of x_2 ≥ t (1 ≤ t ≤ n), x_2 is an outlier;
11. Search the kNN, RkNN, and SkNN of x_{n+2} among x_i, 3 ≤ i ≤ n + 2;
12. Update the kNN, RkNN, SkNN, and CLOF of the affected data points;
13. Calculate CLOF(x_i); if CLOF(x_i) > 1, increase the outlier count of x_i by 1;
14. Continue in the same way as steps 4–13;
15. until the end of X is reached;
16. Output the outliers in X.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Yao, H.; Fu, X.; Yang, Y.; Postolache, O. An Incremental Local Outlier Detection Method in the Data Stream. Appl. Sci. 2018, 8, 1248. https://doi.org/10.3390/app8081248