# Real-Time Anomaly Detection with Subspace Periodic Clustering Approach

## Abstract

## 1. Introduction

#### 1.1. Origin of the Problem

#### 1.2. Motivation and Contribution

## 2. Related Works

## 3. Problem Definitions

**Definition 1.**

…$_{a}$, where $V_{a}$ is the domain of the attribute $a \in A$, then the quadruple $S = (U, A, V, f)$ defines a set-valued information system [60]. A function $f: U \times A \to V$ is defined as $1 \le f(x, a) \in V_{a}$, $\forall x \in U$, $a \in A$. Also, we take the attribute set $A = C \cup \{d\}$ with $C \cap \{d\} = \phi$, where $C$ is the set of conditional attributes and $\{d\}$ is the decision attribute.

**Definition 2.**

**Definition 3.**

**Definition 4.**

**Property 1.**

**Definition 5.**

**Remark 1.**

**Definition 6.**

**Definition 7.**

**Definition 8.**

**Definition 9.**

**Definition 10.**

…$a_{1}, a_{2}, \ldots, a_{m}$. Let the attribute set $A$ have $n$ members. Then, $S$ is expressed as an $m \times n$ matrix with rows as objects and columns as attributes. Attributes can be designated as dimensions, and each $a_{i} = (a_{i1}, a_{i2}, \ldots, a_{in})$; $i = 1, 2, \ldots, m$ will be a point in the $n$-dimensional space $S$.

**Definition 11.**

…$a_{i} = (a_{i1}, a_{i2}, \ldots, a_{in})$, being the points in $n$-dimensional space, then the distance $d(a_{i}, C_{j})$ between $a_{i}$; $i = 1, 2, \ldots, n$ and cluster $C_{j}$; $j = 1, 2, \ldots, k$ is defined as follows.

…$_{j}$ is the centroid of $C_{j}$ and $d(a_{i}, C_{j}) \in [0, 1]$.

**Definition 12.**

…$_{A}(x) > 0\}$, whereas the core of $A$ in $X$ is the crisp set containing every element of $X$ with membership grade equal to 1 in $A$ (see, e.g., [59]). Obviously, core $[t_{1}, t_{2}] = [t_{1}, t_{2}]$, since a closed interval $[t_{1}, t_{2}]$ is an equi-fuzzy interval with membership 1 (see, e.g., [56,57,58,59]).

**Definition 13.**

$$A_{1}\,(S)\,A_{2} = (A_{1} - A_{2})^{(1/2)} \,(+)\, (A_{1} \cap A_{2})^{(1)} \,(+)\, (A_{2} - A_{1})^{(1/2)} \tag{7}$$

where $(A_{1} - A_{2})^{(1/2)}$ and $(A_{2} - A_{1})^{(1/2)}$ are fuzzy sets [57,59] with constant membership value 1/2, and $(+)$ signifies a union of disjoint sets. To elaborate, let $A_{1} = [s_{1}, t_{1}]$ and $A_{2} = [s_{2}, t_{2}]$ be two real intervals with $A_{1} \cap A_{2} \ne \phi$, so that a superimposed part is obtained. When two intervals are superimposed, each interval contributes half of its value to the superimposed interval, so from Equation (7) we obtain:

$$[s_{1}, t_{1}]\,(S)\,[s_{2}, t_{2}] = [s_{(1)}, s_{(2)}]^{(1/2)} \,(+)\, [s_{(2)}, t_{(1)}]^{(1)} \,(+)\, (t_{(1)}, t_{(2)}]^{(1/2)} \tag{8}$$

where $s_{(1)} = \min(s_{1}, s_{2})$, $s_{(2)} = \max(s_{1}, s_{2})$, $t_{(1)} = \min(t_{1}, t_{2})$, and $t_{(2)} = \max(t_{1}, t_{2})$. The superimposition process is presented using Figure 1, Figure 2 and Figure 3 below.

The intervals $[s_{1}, t_{1}]$, $[s_{2}, t_{2}]$, and $[s_{3}, t_{3}]$ (with non-empty intersection) are superimposed to obtain the following expression.

$$[s_{1}, t_{1}]\,(S)\,[s_{2}, t_{2}]\,(S)\,[s_{3}, t_{3}] = [s_{(1)}, s_{(2)}]^{(1/3)} \,(+)\, [s_{(2)}, s_{(3)}]^{(2/3)} \,(+)\, [s_{(3)}, t_{(1)}]^{(1)} \,(+)\, [t_{(1)}, t_{(2)}]^{(2/3)} \,(+)\, [t_{(2)}, t_{(3)}]^{(1/3)} \tag{9}$$

where $\{s_{(i)};\ i = 1, 2, 3\}$ is arranged from $\{s_{i};\ i = 1, 2, 3\}$ in increasing order of magnitude and $\{t_{(i)};\ i = 1, 2, 3\}$ is arranged from $\{t_{i};\ i = 1, 2, 3\}$ in a similar fashion. Let $[s_{i}, t_{i}]$, $i = 1, 2, \ldots, n$, be $n$ real intervals with $\bigcap_{i=1}^{n} [s_{i}, t_{i}] \ne \phi$. Generalizing (9) gives the following.

$$\begin{aligned}
[s_{1}, t_{1}]\,(S)\,[s_{2}, t_{2}]\,(S) \cdots (S)\,[s_{n}, t_{n}] ={}& [s_{(1)}, s_{(2)}]^{(1/n)} \,(+)\, [s_{(2)}, s_{(3)}]^{(2/n)} \,(+) \cdots (+)\, [s_{(r)}, s_{(r+1)}]^{(r/n)} \,(+) \cdots \\
&(+)\, [s_{(n)}, t_{(1)}]^{(1)} \,(+)\, [t_{(1)}, t_{(2)}]^{((n-1)/n)} \,(+) \cdots (+)\, [t_{(n-r)}, t_{(n-r+1)}]^{(r/n)} \,(+) \cdots \\
&(+)\, [t_{(n-2)}, t_{(n-1)}]^{(2/n)} \,(+)\, [t_{(n-1)}, t_{(n)}]^{(1/n)}
\end{aligned} \tag{10}$$

where $\{s_{(i)}\}$ is organized from $\{s_{i}\}$ in increasing order of magnitude for $i = 1, 2, \ldots, n$, and similarly $\{t_{(i)}\}$ is organized from $\{t_{i}\}$ in increasing order of magnitude [57]. It is to be noted here that the membership functions are a mixture of an empirical probability distribution function and a complementary probability distribution function, given as follows:
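The n-interval superimposition above can be illustrated with a minimal Python sketch. Reading the membership levels off the expression, a point on $[s_{(r)}, s_{(r+1)}]$ has level $r/n$ and a point on $[t_{(n-r)}, t_{(n-r+1)}]$ also has level $r/n$, which is the fraction of the $n$ intervals that contain the point. The function name `superimposed_membership` and the sample intervals are illustrative assumptions, not part of the original method description.

```python
from bisect import bisect_left, bisect_right


def superimposed_membership(intervals, x):
    """Fraction of the given closed intervals [s_i, t_i] that contain x."""
    n = len(intervals)
    starts = sorted(s for s, _ in intervals)  # s_(1) <= ... <= s_(n)
    ends = sorted(t for _, t in intervals)    # t_(1) <= ... <= t_(n)
    opened = bisect_right(starts, x)          # number of s_i <= x
    closed = bisect_left(ends, x)             # number of t_i < x
    return (opened - closed) / n


# Hypothetical intervals with a non-empty common intersection [3, 5].
intervals = [(1, 5), (2, 6), (3, 7)]
levels = [superimposed_membership(intervals, x) for x in (1.5, 2.5, 4, 5.5, 6.5)]
```

For these three intervals the levels rise from 1/3 through 2/3 to 1 on the common core and then fall back symmetrically, which mirrors the pattern of Equation-style expression for three intervals given earlier in this definition.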

**Definition 14.**

**Definition 15.**

## 4. Proposed Algorithm

**Algorithm 1:** Subspace Generation

Input: $(U, A)$: the information system consisting of $n$ objects, where the attribute set $A$ is divided into $C$ (conditional attributes) and $D$ (decision attributes).

Output: Subspace of $(U, A)$

Step 1. Generate a dominance relation $R_{C}^{\ge}$ on $U$ corresponding to $C$ and $X \subseteq U$.

Step 2. Generate the nano topology $\tau_{C}^{\ge}(X)$ and its basis $\beta_{C}^{\ge}(X)$.

Step 3. for each $x \in C$, find $\tau_{C-\{x\}}^{\ge}(X)$ and $\beta_{C-\{x\}}^{\ge}(X)$

Step 4. if ($\beta_{C}^{\ge}(X) = \beta_{C-\{x\}}^{\ge}(X)$)

Step 5. then drop $x$ from $C$,

Step 6. else form criterion reduction

Step 7. end for

Step 8. generate CORE(C) = ∩ {criterion reductions}

Step 9. Generate subspace of the given information system.
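The attribute-dropping loop of Algorithm 1 can be sketched in Python under several loudly stated assumptions, because the defining expressions for the dominance relation and the nano topology are not reproduced in this copy. The sketch assumes numeric attributes, takes "y dominates x" to mean that y is at least x on every attribute considered, and takes the basis to be {U, lower approximation, boundary region}. It also tests each attribute against the full set C rather than dropping attributes sequentially. All function names are hypothetical.

```python
def dominating_class(rows, attrs, i):
    """Objects that are >= object i on every attribute in attrs (assumed dominance)."""
    return frozenset(
        j for j, row in enumerate(rows)
        if all(row[a] >= rows[i][a] for a in attrs)
    )


def nano_basis(rows, attrs, target):
    """Assumed basis {U, lower approximation, boundary} of the nano topology of target."""
    universe = frozenset(range(len(rows)))
    classes = [dominating_class(rows, attrs, i) for i in range(len(rows))]
    lower = frozenset(i for i in universe if classes[i] <= target)
    upper = frozenset(i for i in universe if classes[i] & target)
    return {universe, lower, upper - lower}


def core_attributes(rows, attrs, target):
    """Attributes whose removal changes the basis; the others are candidates to drop."""
    full = nano_basis(rows, attrs, target)
    return [
        a for a in attrs
        if nano_basis(rows, [b for b in attrs if b != a], target) != full
    ]


# Hypothetical 4-object system with three attributes; attribute 2 is constant.
rows = [(1, 1, 5), (2, 1, 5), (1, 2, 5), (2, 2, 5)]
target = frozenset({1, 3})  # X: the objects whose first attribute equals 2
core = core_attributes(rows, [0, 1, 2], target)
```

In this toy system only the first attribute is retained, since removing either of the other two leaves the basis unchanged. The subspace would then be the projection of the data onto the retained attributes.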

…$t_{max}$. Otherwise, a new lifespan starts with the current-time as its start-time, and the previous lifespan of the cluster is closed with last-time as its end. The closed lifespan is put on the list maintained for the cluster if its length is greater than a specified length (say $t_{min}$). The lifespans of the earlier and later clusters are updated if a data instance switches from one cluster to another during execution. For instance, if the time-stamp of the outgoing data instance is either the start-time or the end-time of the earlier cluster's lifespan, that lifespan is updated using the next or previous time-stamp of the cluster, respectively. The cluster centroids are updated as well. If the time-stamp of the outgoing data instance falls strictly within the lifespans of the earlier and later clusters, the lifespans do not change, but the cluster centroids are still modified. Similarly, if the time-stamp of a data instance migrating from one cluster to another falls outside the later cluster's lifespan, the cluster centroid is updated and the later cluster's lifespan is extended as well, provided that the time gap is within the limit $t_{max}$. The pseudocode of the algorithm is given below.

…$t_{min}$ are provided by Algorithm 2.

**Algorithm 2:** Dynamic k-means clustering algorithm

Input: E: information system consisting of $n$ objects and attribute set CORE(A) ⊆ A; $t_{max}$: the maximum time-gap between consecutive time-stamps; $t_{min}$: the minimum length of a lifespan.

Output: Set of clusters, where each cluster is associated with a sequence of time intervals as its lifespans

Step 1. Given the $d_{1}$-dimensional dataset CORE(A)

Step 2. Select C[i] = {x[i], tp[i]}; i = 1, 2, …, k, where x[i] is the data instance or mean of cluster i and tp[i] points to the list of time intervals maintained for that cluster; the list initially holds the time-stamp (start-time) of x[i], with start-time = last-time

Step 3. for each incoming data instance x with current time-stamp current-time { if $d(x, C_{j}) \le d(x, C_{i})$, i ≠ j; i = 1, 2, …, k

Step 4. { Add x to $C_{j}$

Step 5. Update mean($C_{j}$)

Step 6. if (|current-time − last-time[j]| ≤ $t_{max}$)

Step 7. { if (last-time[j] ≤ current-time)

Step 8. extend lifespan($C_{j}$) by setting last-time[j] = current-time

Step 9. else go to Step 3

Step 10. }

Step 11. else if |last-time[j] − start-time[j]| ≥ $t_{min}$

Step 12. { Add [start-time[j], last-time[j]] to tp[j]

Step 13. set last-time[j] = start-time[j] = current-time

Step 14. }

Step 15. }

Step 16. }

Step 17. if (assign does not occur) go to Step 19

Step 18. else go to Step 3

Step 19. Output cluster set
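The lifespan bookkeeping in Steps 6 to 13 of Algorithm 2 can be sketched as follows. This is a minimal illustration for a single cluster, assuming numeric time-stamps in minutes; it leaves out the distance computation and the centroid update. Following the prose description, a new lifespan is opened whenever the gap exceeds `t_max`, and the closed lifespan is stored only when its length reaches `t_min`. The class name `LifespanTracker` and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LifespanTracker:
    """Tracks the open lifespan of one cluster and the list of stored lifespans."""
    start_time: float
    last_time: float
    stored: list = field(default_factory=list)  # plays the role of tp[j]

    def observe(self, current_time, t_max, t_min):
        """Process the time-stamp of an instance just assigned to this cluster."""
        if abs(current_time - self.last_time) <= t_max:
            # Gap is small enough: extend the open lifespan if time moved forward.
            if self.last_time <= current_time:
                self.last_time = current_time
            return
        # Gap too large: close the open lifespan, keep it only if long enough.
        if abs(self.last_time - self.start_time) >= t_min:
            self.stored.append((self.start_time, self.last_time))
        self.start_time = self.last_time = current_time


tracker = LifespanTracker(start_time=0, last_time=0)
for ts in list(range(0, 201, 10)) + [300, 310, 400]:
    tracker.observe(ts, t_max=20, t_min=180)
```

With `t_max = 20` and `t_min = 180`, the run of time-stamps from 0 to 200 is stored as one lifespan, while the short run from 300 to 310 is discarded when the next gap arrives.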

**Algorithm 3:** Algorithm for finding periodic (fully/partially) and fuzzy periodic clusters

Input: Set of clusters along with their lifespans (set of sequences of time intervals).

Output: Set of fuzzy periodic clusters

Step 1. For each cluster c with list of lifespans L {

Step 2. initially Lc = null // Lc is the list of superimposed intervals

Step 3. lt = L.get() // lt points to the 1st time interval (lifespan) in L

Step 4. Lc.append(lt)

Step 5. m = 1 // m = number of intervals superimposed

Step 6. while ((lt = L.get()) != null)

Step 7. { flag = 0

Step 8. while ((lct = Lc.get()) != null)

Step 9. if (compsuperimp(lt, lct))

Step 10. flag = 1

Step 11. if (flag == 0)

Step 12. Lc.append(lt)

Step 13. }

Step 14. }

Step 15. compsuperimp(lt, lct)

Step 16. if (intersect(lct, lt) != null)

Step 17. { superimp(lct, lt)

Step 18. m++

Step 19. return 1

Step 20. }

Step 21. return 0

Step 22. Compute match ratio = m/n // n = number of periods in the whole dataset

Step 23. if (match ratio == 1)

Step 24. the cluster c is fully periodic

Step 25. else partially periodic

Step 26. generate fuzzy time intervals from superimposed time intervals to get fuzzy periodic clusters.

Step 27. End
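A simplified Python sketch of the superimposition pass in Algorithm 3 is given here for one cluster. It assumes the lifespans are already expressed on a common period (for example, minutes of the day), which is an assumption of this sketch, and it attaches each lifespan to the first superimposed group whose core it intersects, whereas the pseudocode scans the whole list. The match ratio is computed per group. The names `superimpose_lifespans` and `match_ratio` are hypothetical.

```python
def superimpose_lifespans(lifespans):
    """Group lifespans (s, t) that intersect the running core of a group."""
    groups = []  # each group keeps its core and the end points of its members
    for s, t in lifespans:
        for g in groups:
            lo, hi = max(g["core"][0], s), min(g["core"][1], t)
            if lo <= hi:  # non-empty intersection with the core
                g["core"] = (lo, hi)
                g["starts"].append(s)
                g["ends"].append(t)
                break
        else:
            # No group matched: this lifespan starts a new superimposed interval.
            groups.append({"core": (s, t), "starts": [s], "ends": [t]})
    return groups


def match_ratio(group, n_periods):
    """m / n: superimposed intervals in the group over periods in the dataset."""
    return len(group["starts"]) / n_periods


# Hypothetical lifespans of one cluster over 3 days, in minutes of the day.
lifespans = [(540, 780), (560, 800), (550, 790), (1200, 1400)]
groups = superimpose_lifespans(lifespans)
labels = [
    "fully periodic" if match_ratio(g, 3) == 1 else "partially periodic"
    for g in groups
]
```

The sorted `starts` and `ends` of a group are the end points from which the fuzzy time interval of Step 26 would be generated.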

## 5. Complexity Analysis

…$O(n^{2} \cdot d)$, where $|U| = n$ and $|C| = d$. For generating the nano topology, the lower and upper approximations of the set have to be generated, which takes computational time $O(|X| \cdot |U|)$. So the total computational cost of Step 1 and Step 2 is $O(n^{2} \cdot d + |X| \cdot |U|) = O(n^{2} \cdot d)$, which is the worst-case complexity. The for loop starting at Step 3 runs over at most all the attributes of the attribute set. The computation from Step 4 to Step 7 takes constant time, say $O(k_{1})$, where $k_{1}$ is a constant. Therefore, the computational cost from Step 3 to Step 8 is $O(k_{1} d)$. Similarly, that of Steps 9 and 10 is also constant, say $O(k_{2})$, where $k_{2}$ is a constant. The overall complexity of Algorithm 1 is $O(n^{2} \cdot d + k_{1} d + k_{2}) = O(n^{2} \cdot d)$.

The complexity of Algorithm 2 is found as follows. Let $k\ (\le n)$ be the number of clusters. The computational cost of a centroid is $O(n + n \cdot k \cdot d_{1}) = O(n \cdot k \cdot d_{1})$, where $d_{1}\ (\le d)$ is the dimension of the CORE. Also, $O(2n \cdot k) = O(n \cdot k)$ is the time required to compute the minimum distance and time-gap for each cluster. The cost of updating the cluster mean and lifespan is $O(2k)$. The total cost of Algorithm 2 is $O(i(n \cdot k \cdot d_{1} + n \cdot k + k)) = O(i \cdot n \cdot k \cdot d_{1}) = O(n^{3})$, as $i\ (\le n)$ is the number of iterations, $k \le n$, and $d_{1}$ is considerably small. The worst-case complexity of Algorithms 1 and 2 together is $O(n^{2} \cdot d + n^{3})$.

The time complexity of Algorithm 3 is found as follows. Let $n_{1}$ be the size of the sequence of time intervals associated with a cluster and $n_{2}$ be the average number of time intervals superimposed. For each time interval of a cluster, a pass through the list of superimposed time intervals is required to check whether the time interval can be superimposed on any of the available superimposed time intervals. For this, the intersection of the current time interval with the core of the superimposed time interval is computed, which requires $O(1)$ time. If the current time interval is superimposed, its boundaries have to be inserted into two sorted arrays that keep the end points of the superimposed time intervals (one sorted array for left end points and the other for right end points). Searching in a sorted array requires $O(\log n_{1})$ time and insertion needs $O(n_{1})$ time, so the two end points require $O(2(\log n_{1} + n_{1})) = O(n_{1})$ time. For one cluster, the process requires $O(n_{1} \cdot p \cdot n_{2})$ time, where $p$ is the size of the list of superimposed time intervals. Since $p = O(n_{1})$ and $n_{2} = O(n_{1})$, the overall time complexity in the worst case is $O(n_{1}^{3})$. For $k$ clusters, the total worst-case time complexity is $O(k \cdot n_{1}^{3})$.

Therefore, the worst-case complexity of the whole method is $O((n^{2} \cdot d + n^{3}) + k \cdot n_{1}^{3})$. Also $k = O(n)$, which gives $O(n^{2} \cdot d + n^{3} + n \cdot n_{1}^{3}) = O(n^{3} + n \cdot n_{1}^{3})$, as $d \le n$. Since this bound depends on $n$ and $n_{1}$ and not on $d$ (the dimension), and $n_{1}$ is small in comparison to $n$, the worst-case complexity of the method can be rewritten as $O(n^{3})$. Thus, the method runs in cubic time.
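The two sorted end-point arrays mentioned in the analysis of Algorithm 3 can be maintained with Python's standard `bisect` module, as in the short sketch below. `insort` locates the position by binary search in $O(\log n_{1})$ time and then shifts elements in $O(n_{1})$ time, which matches the per-insertion cost used in the analysis. The variable and function names are illustrative assumptions.

```python
from bisect import insort

left_ends = []   # sorted left end points s_(1) <= s_(2) <= ...
right_ends = []  # sorted right end points t_(1) <= t_(2) <= ...


def record_interval(s, t):
    """Insert the boundaries of a newly superimposed interval, keeping both lists sorted."""
    insort(left_ends, s)
    insort(right_ends, t)


# Hypothetical lifespans arriving in arbitrary order.
for s, t in [(560, 800), (540, 780), (550, 790)]:
    record_interval(s, t)
```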

## 6. Experimental Analysis and Results

…$t_{min}$ (minimum length of a lifespan = 180 min) and $t_{max}$ (maximum time-gap between two consecutive time-stamps associated with a cluster = 20 min) are to be specified. Then Algorithm 3 is applied to the clusters to generate periodic, partially periodic, and fuzzy periodic clusters. The performances of the proposed method and of the afore-mentioned methods are recorded. The performance is measured using the following evaluation metrics.
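The result tables report recall, precision, and F1-score. As a minimal sketch, assuming the standard definitions in terms of true positives, false positives, and false negatives (the article's own formulas are not reproduced in this copy), the three values can be computed as follows. The function name and the sample counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from confusion counts for the anomaly class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 90 anomalies found correctly, 10 false alarms, 30 missed.
precision, recall, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

The F1-score is the harmonic mean of precision and recall, so it lies between the two values and is pulled toward the smaller one.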

## 7. Conclusions, Limitations and Lines for Future Works

#### 7.1. Conclusions

…$O(n^{3} + n \cdot n_{1}^{3})$ in the worst case, where $n$ is the number of instances and $n_{1}$ is the maximum number of intervals associated with any cluster. Since $n_{1}$ is very small in comparison to $n$, the method runs in cubic time. Further, it has also been found that RADSPCA runs linearly with respect to the dimension of the datasets.

#### 7.2. Limitations and Future Directions of Work

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Xu, L.D.; He, W.; Li, S. Internet of Things in Industries: A Survey. IEEE Trans. Ind. Inform. **2014**, 10, 2233–2243.
2. Sisinni, E.; Saifullah, A.; Han, S.; Jennehag, U.; Gidlund, M. Industrial Internet of Things: Challenges, Opportunities, and Directions. IEEE Trans. Ind. Inform. **2018**, 14, 4724–4734.
3. Sethi, P.; Sarangi, S. Internet of Things: Architectures, Protocols, and Applications. J. Electr. Comput. Eng. **2017**, 2017, 9324035.
4. Papaioannou, M.; Karageorgou, M.; Mantas, G.; Sucasas, V.; Essop, I.; Rodriguez, J.; Lymberopoulos, D. A Survey on Security Threats and Countermeasures in Internet of Medical Things (IoMT). Trans. Emerg. Telecommun. Technol. **2020**, 33, e4049.
5. Mantas, G.; Komninos, N.; Rodriguez, J.; Logota, E.; Marques, H. Security for 5G Communications. In Fundamentals of 5G Mobile Networks; Wiley: Hoboken, NJ, USA, 2015; pp. 207–220.
6. Zarpelão, B.B.; Miani, R.S.; Kawakami, C.T.; de Alvarenga, S.C. A survey of intrusion detection in Internet of Things. J. Netw. Comput. Appl. **2017**, 84, 25–37.
7. Makhdoom, I.; Abolhasan, M.; Lipman, J.; Liu, R.P.; Ni, W. Anatomy of Threats to the Internet of Things. IEEE Commun. Surv. Tutorials **2019**, 21, 1636–1675.
8. Zachos, G.; Essop, I.; Mantas, G.; Porfyrakis, K.; Ribeiro, J.C.; Rodriguez, J. Generating IoT Edge Network Datasets based on the TON_IoT Telemetry Dataset. In Proceedings of the IEEE 26th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD-2021), Porto, Portugal, 25–27 October 2021.
9. Mazarbhuiya, F.A.; Shenify, M. A Mixed Clustering Approach for Real-Time Anomaly Detection. Appl. Sci. **2023**, 13, 4151.
10. Mazarbhuiya, F.A.; AlZahrani, M.Y.; Mahanta, A.K. Detecting Anomaly Using Partitioning Clustering with Merging. ICIC Express Lett. **2020**, 14, 951–960.
11. Mazarbhuiya, F.A.; AlZahrani, M.Y.; Georgieva, L. Anomaly Detection Using Agglomerative Hierarchical Clustering Algorithm; ICISA 2018. Lecture Notes on Electrical Engineering (LNEE); Springer: Hong Kong, China, 2019; Volume 514, pp. 475–484.
12. Mazarbhuiya, F.A. Detecting Anomaly using Neighborhood Rough Set based Classification Approach. ICIC Express Lett. **2023**, 17, 73–80.
13. Al Mamun, S.M.A.; Valimaki, J. Anomaly Detection and Classification in Cellular Networks Using Automatic Labeling Technique for Applying Supervised Learning. Procedia Comput. Sci. **2018**, 140, 186–195.
14. Liu, Y.; Wang, H.; Zhang, X.; Tian, L. An Efficient Framework for Unsupervised Anomaly Detection over Edge-Assisted Internet of Things. ACM Trans. Sens. Netw. **2023**, 2023, 1–26.
15. Mozaffari, M.; Doshi, K.; Yilmaz, Y. Self-Supervised Learning for Online Anomaly Detection in High-Dimensional Data Streams. Electronics **2023**, 12, 1971.
16. Angiulli, F.; Fassetti, F.; Serrao, C. Anomaly detection with correlation laws. Data Knowl. Eng. **2023**, 145, 102181.
17. Fan, Z.; Wang, G.; Zhang, K.; Liu, S.; Zhong, T. Semi-Supervised Anomaly Detection via Neural Process. IEEE Trans. Knowl. Data Eng. **2023**, 2023, 1–13.
18. Lu, T.; Wang, L.; Zhao, X. Review of Anomaly Detection Algorithms for Data Streams. Appl. Sci. **2023**, 13, 6353.
19. Hartigan, J.A. Clustering Algorithms; John Wiley & Sons: Hoboken, NJ, USA, 1975.
20. Cheng, Y.-M.; Jia, H. A Unified Metric for Categorical and Numeric Attributes in Data Clustering. Hong Kong University Technical Report. 2011. Available online: https://www.comp.hkbu.edu.hk/tech-report (accessed on 12 June 2018).
21. Mazarbhuiya, F.A.; Abulaish, M. Clustering Periodic Patterns using Fuzzy Statistical Parameters. Int. J. Innov. Comput. Inf. Control. **2012**, 8, 2113–2124.
22. Gil-Garcia, R.; Badia-Contelles, J.M.; Pons-Porrata, A. Dynamic Hierarchical Compact Clustering Algorithm. In Progress in Pattern Recognition, Image Analysis and Applications; Sanfeliu, A., Cortés, M.L., Eds.; CIARP 2005, LNCS 3775; Springer: Berlin/Heidelberg, Germany, 2005; pp. 302–310.
23. Hammouda, K.M.; Kamel, M.S. Efficient phrase-based document indexing for Web document clustering. IEEE Trans. Knowl. Data Eng. **2004**, 16, 1279–1296.
24. Erfani, S.M.; Rajasegarar, S.; Karunasekera, S.; Leckie, C. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognit. **2016**, 58, 121–134.
25. Hodge, V.; Austin, J. A survey of outlier detection methodologies. Artif. Intell. Rev. **2004**, 22, 85–126.
26. Kaya, M.; Schoop, M. Analytical Comparison of Clustering Techniques for the Recognition of Communication Patterns. Group Decis. Negot. **2022**, 31, 555–589.
27. Aggarwal, C.C.; Philip, S.Y. An effective and efficient algorithm for high-dimensional outlier detection. VLDB J. **2005**, 14, 211–221.
28. Ramchandran, A.; Sangaiah, A.K. Chapter 11—Unsupervised Anomaly Detection for High Dimensional Data—An Exploratory Analysis. In Computational Intelligence for Multimedia Big Data on the Cloud with Engineering Applications; Intelligent Data-Centric Systems; Academic Press: Cambridge, MA, USA, 2018; pp. 233–251.
29. Rettig, L.; Khayati, M.; Cudre-Mauroux, P.; Piorkowski, M. Online anomaly detection over Big Data streams. In Proceedings of the 2015 IEEE International Conference on Big Data, Santa Clara, CA, USA, 29 October–1 November 2015.
30. Alguliyev, R.; Aliguliyev, R.; Sukhostat, L. Anomaly Detection in Big Data based on Clustering. Stat. Optim. Inf. Comput. **2017**, 5, 325–340.
31. Hahsler, M.; Piekenbrock, M.; Doran, D. dbscan: Fast Density-Based Clustering with R. J. Stat. Softw. **2019**, 91, 1–30.
32. Song, H.; Jiang, Z.; Men, A.; Yang, B. A Hybrid Semi-Supervised Anomaly Detection Model for High Dimensional Data. Comput. Intell. Neurosci. **2017**, 2017, 8501683.
33. Mazarbhuiya, F.A. Detecting IoT Anomaly Using Rough Set and Density Based Subspace Clustering. ICIC Express Lett. **2022**. accepted.
34. Ahmed, S.; Lavin, A.; Purdy, S.; Agha, Z. Unsupervised real-time anomaly detection for streaming data. Neurocomputing **2017**, 262, 134–147.
35. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. **1982**, 11, 341–356.
36. Thivagar, M.L.; Richard, C. On nano forms of weakly open sets. Int. J. Math. Stat. Invent. **2013**, 1, 31–37.
37. Thivagar, M.L.; Priyalatha, S.P.R. Medical diagnosis in an indiscernibility matrix based on nano topology. Cogent Math. **2017**, 4, 1330180.
38. Kim, B.; Alawami, M.A.; Kim, E.; Oh, S.; Park, J.; Kim, H. A Comparative Study of Time Series Anomaly Detection, Models for Industrial Control Systems. Sensors **2023**, 23, 1310.
39. Alghawli, A.S. Complex methods detect anomalies in real time based on time series analysis. Alex. Eng. J. **2022**, 61, 549–561.
40. Younas, M.Z. Anomaly Detection using Data Mining Techniques: A Review. Int. J. Res. Appl. Sci. Eng. Technol. **2020**, 8, 568–574.
41. Thudumu, S.; Branch, P.; Jin, J.; Singh, J. A comprehensive survey of anomaly detection techniques for high dimensional big data. J. Big Data **2020**, 7, 42.
42. Habeeb, R.A.A.; Nasaruddin, F.; Gani, A.; Hashem, I.A.T.; Ahmed, E.; Imran, M. Real-time big data processing for anomaly detection: A Survey. Int. J. Inf. Manag. **2019**, 45, 289–307.
43. Wang, B.; Hua, Q.; Zhang, H.; Tan, X.; Nan, Y.; Chen, R.; Shu, X. Research on anomaly detection and real-time reliability evaluation with the log of cloud platform. Alex. Eng. J. **2022**, 61, 7183–7193.
44. Halstead, B.; Koh, Y.S.; Riddle, P.; Pechenizkiy, M.; Bifet, A. Combining Diverse Meta-Features to Accurately Identify Recurring Concept Drift in Data Streams. ACM Trans. Knowl. Discov. Data **2023**, 17, 1–36.
45. Zhao, Z.; Birke, R.; Han, R.; Robu, B.; Bouchenak, S.; Ben Mokhtar, S.; Chen, L.Y. RAD: On-line Anomaly Detection for Highly Unreliable Data. arXiv **2019**, arXiv:1911.04383.
46. Chenaghlou, M.; Moshtaghi, M.; Leckie, C.; Salahi, M. Online Clustering for Evolving Data Streams with Online Anomaly Detection. Advances in Knowledge Discovery and Data Mining. In Proceedings of the 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, 3–6 June 2018; pp. 508–521.
47. Firoozjaei, M.D.; Mahmoudyar, N.; Baseri, Y.; Ghorbani, A.A. An evaluation framework for industrial control system cyber incidents. Int. J. Crit. Infrastruct. Prot. **2022**, 36, 100487.
48. Chen, Q.; Zhou, M.; Cai, Z.; Su, S. Compliance Checking Based Detection of Insider Threat in Industrial Control System of Power Utilities. In Proceedings of the 2022 7th Asia Conference on Power and Electrical Engineering (ACPEE), Hangzhou, China, 15–17 April 2022; pp. 1142–1147.
49. Zhao, Z.; Mehrotra, K.G.; Mohan, C.K. Online Anomaly Detection Using Random Forest. In Recent Trends and Future Technology in Applied Intelligence; Mouhoub, M., Sadaoui, S., Ait Mohamed, O., Ali, M., Eds.; IEA/AIE 2018; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2018.
50. Izakian, H.; Pedrycz, W. Anomaly detection in time series data using fuzzy c-means clustering. In Proceedings of the 2013 Joint IFSA World Congress and NAFIPS Annual Meeting, Edmonton, AB, Canada, 24–28 June 2013.
51. Decker, L.; Leite, D.; Giommi, L.; Bonacorsi, D. Real-time anomaly detection in data centers for log-based predictive maintenance using fuzzy-rule based approach. arXiv **2020**, arXiv:2004.13527v1.
52. Masdari, M.; Khezri, H. Towards fuzzy anomaly detection-based security: A comprehensive review. Fuzzy Optim. Decis. Mak. **2020**, 20, 1–49.
53. de Campos Souza, P.V.; Guimarães, A.J.; Rezende, T.S.; Silva Araujo, V.J.; Araujo, V.S. Detection of Anomalies in Large-Scale Cyberattacks Using Fuzzy Neural Networks. AI **2020**, 1, 92–116.
54. Habeeb, R.A.A.; Nasaruddin, F.; Gani, A.; Hashem, I.A.T.; Amanullah, A.M.E.; Imran, M. Clustering-based real-time anomaly detection—A breakthrough in big data technologies. Trans. Emerg. Telecommun. Technol. **2022**, 33, e3647.
55. Mahanta, A.K.; Mazarbhuiya, F.A.; Baruah, H.K. Finding calendar-based periodic patterns. Pattern Recognit. Lett. **2008**, 29, 1274–1284.
56. Mazarbhuiya, F.A.; Mahanta, A.K.; Baruah, H.K. The Solution of fuzzy equation A+X=B using the method of superimposition. Appl. Math. **2011**, 2, 1039–1045.
57. Zadeh, L.A. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Syst. **1978**, 1, 3–28.
58. Loeve, M. Probability Theory; Springer Verlag: New York, NY, USA, 1977.
59. Klir, J.; Yuan, B. Fuzzy Sets and Logic Theory and Application; Prentice Hall Pvt. Ltd.: Upper Saddle River, NJ, USA, 2002.
60. Qian, Y.; Dang, C.; Liang, J.; Tang, D. Set-valued ordered information systems. Inf. Sci. **2009**, 179, 2809–2832.
61. Stripling, E.; Baesens, B.; Chizi, B.; Broucke, B.V. Isolation-based conditional anomaly detection on mixed-attribute data to uncover workers’ compensation fraud. Decis. Support Syst. **2018**, 111, 13–26.
62. Ding, Z.; Fei, M. An Anomaly Detection Approach Based on Isolation Forest Algorithm for Streaming Data using Sliding Window. IFAC Proc. Vol. **2013**, 46, 12–17.
63. Abdullah, J.; Chandran, N. Hierarchical Density-based Clustering of Malware Behaviour. J. Telecommun. Electron. Comput. Eng. (JTEC) **2017**, 9, 159–164.
64. KDD CUP’99 Data. Available online: https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on 15 January 2020).
65. Kitsune Network Attack Dataset. Available online: https://github.com/ymirsky/Kitsune-py (accessed on 12 December 2021).

**Figure 6.** The execution time with respect to the dimension using the KDDCUP’99 dataset [64].

**Figure 7.** The execution time with respect to the dimension using the Kitsune dataset [65].

**Table 1.** Comparative performance analysis of RADSPCA with some well-known existing methods using the KDDCUP’99 [64] dataset.

No. | Algorithms | Recall | Precision | F1-Score | Execution Time (in Seconds) | Periodic Clusters Obtained |
---|---|---|---|---|---|---|
1 | k-means | 0.9605 | 0.9400 | 0.9500 | 28 | × |
2 | IF model | 0.8301 | 0.850 | 0.8400 | 19 | × |
3 | SC | 0.6220 | 0.6004 | 0.6110 | 44 | × |
4 | HDBSCAN | 0.2530 | 0.2300 | 0.2410 | 95 | × |
5 | ACA | 0.8400 | 0.8010 | 0.8200 | 16 | × |
6 | LOF | 0.9550 | 0.9390 | 0.9470 | 14 | × |
7 | SSWLOFCC | 0.9665 | 0.9460 | 0.9560 | 12 | × |
8 | PCM | 0.8800 | 0.8420 | 0.8600 | 26 | × |
9 | OnCAD | 0.9751 | 0.9650 | 0.9700 | 30 | × |
10 | MICA | 0.9822 | 0.9780 | 0.9800 | 28 | × |
11 | Proposed Approach (RADSPCA) | 0.9812 | 0.9790 | 0.9800 | 58 | √ |

**Table 2.** Comparative performance analysis of RADSPCA with some well-known existing methods using the Kitsune [65] dataset.

No. | Algorithms | Recall | Precision | F1-Score | Execution Time (in Seconds) | Periodic Clusters Obtained |
---|---|---|---|---|---|---|
1 | k-means | 0.8701 | 0.8501 | 0.8600 | 95 | × |
2 | IF model | 0.7300 | 0.7502 | 0.7400 | 64.5 | × |
3 | SC | 0.6645 | 0.6420 | 0.6530 | 149.5 | × |
4 | HDBSCAN | 0.3899 | 0.3793 | 0.3850 | 150 | × |
5 | ACA | 0.7410 | 0.7010 | 0.7200 | 54.4 | × |
6 | LOF | 0.90401 | 0.9000 | 0.9020 | 47.6 | × |
7 | SSWLOFCC | 0.9280 | 0.9499 | 0.9390 | 40 | × |
8 | PCM | 0.7430 | 0.7810 | 0.7600 | 88 | × |
9 | OnCAD | 0.8450 | 0.8353 | 0.8400 | 102 | × |
10 | MICA | 0.9832 | 0.9770 | 0.9800 | 68 | × |
11 | Proposed Approach (RADSPCA) | 0.9860 | 0.9801 | 0.9830 | 88.5 | √ |

Acronym | Full Form and Purpose |
---|---|
IF | Isolation Forest: an anomaly detection method based on binary trees. |
SC | Spectral Clustering: a clustering method that has often been used as an outlier detection algorithm. |
HDBSCAN | Hierarchical Density-based Spatial Clustering of Applications with Noise: a density-based hierarchical clustering approach that has often been used for anomaly detection, with lower efficacy. |
ACA | Agglomerative Clustering Algorithm: a hierarchical clustering approach for anomaly detection. |
LOF | Local Outlier Factor: an algorithm that identifies outliers based on the local neighborhood. |
SSWLOFCC | Streaming Sliding Window Local Outlier Factor Coreset Clustering Algorithm: it focuses on real-time detection of anomalies using big data technologies. |
PCM | Partitioning Clustering with Merging: an algorithm for finding anomalies that uses both partitioning and hierarchical approaches. |
OnCAD | Online Clustering and Anomaly Detection: a clustering-based anomaly detection approach for data streams that considers the temporal as well as the spatial proximity of observations to detect real-time anomalies. |
MICA | Mixed Clustering Algorithm: an algorithm for finding real-time anomalies using both partitioning and hierarchical approaches. |
RADSPCA | Real-time Anomaly Detection with Subspace Periodic Clustering Approach: the method proposed in this article. |

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Mazarbhuiya, F.A.; Shenify, M.
Real-Time Anomaly Detection with Subspace Periodic Clustering Approach. *Appl. Sci.* **2023**, *13*, 7382.
https://doi.org/10.3390/app13137382
