# The Application of a Double CUSUM Algorithm in Industrial Data Stream Anomaly Detection


## Abstract


## 1. Introduction

- The volume of data is practically unbounded, and new data arrive continuously over time.
- Each data item carries its own timestamp.
- Concept drift occurs, and the data do not follow a fixed distribution.
- Because of factors such as the sensor's operating environment and installation location, some data are distorted or invalid and are of low quality.

## 2. Related Work

#### 2.1. Sliding Window

#### 2.2. Theoretical Background: CUSUM

**Algorithm 1** CUSUM

1. CUSUM: ${S}_{1,i}=0$, $i\in T=\{1,2,\cdots ,m\}$
2. ${S}_{k,i}=\mathrm{max}(0,{S}_{k-1,i}+{x}_{k,i}-{\mu}_{i})$, if ${S}_{k-1,i}\le {T}_{i}$
3. ${S}_{k,i}=0$ and ${k}_{i}=k-1$, if ${S}_{k-1,i}>{T}_{i}$
4. Design parameters: bias ${\mu}_{i}\in {\mathbb{R}}_{>0}$ and threshold ${T}_{i}\in {\mathbb{R}}_{>0}$
5. Output: alarm time(s) ${k}_{i}$
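The recursion in Algorithm 1 can be sketched for a single channel $i$ as follows. This is a minimal illustration, not the authors' implementation: the function name `cusum`, the argument names, and the choice to return both the statistic and the alarm times are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def cusum(x: Sequence[float], mu: float, threshold: float) -> Tuple[List[float], List[int]]:
    """One-sided CUSUM for one channel, following the steps of Algorithm 1.

    x[0] is the sample at k = 1. Returns the statistic S_k for every k
    and the 1-based alarm times k_i. Names are illustrative assumptions.
    """
    if len(x) == 0:
        return [], []
    s = [0.0]                 # step 1: S_1 = 0
    alarms: List[int] = []
    for k in range(2, len(x) + 1):
        prev = s[-1]          # S_{k-1}
        if prev > threshold:
            # step 3: previous statistic exceeded T, so reset and record k - 1
            alarms.append(k - 1)
            s.append(0.0)
        else:
            # step 2: accumulate the deviation from the bias mu, clipped at 0
            s.append(max(0.0, prev + x[k - 1] - mu))
    return s, alarms


stat, alarm_times = cusum([0, 0, 3, 3, 0, 0], mu=1.0, threshold=3.0)
```

Because the alarm is recorded one step after the statistic crosses the threshold, a crossing on the very last sample is not reported by this sketch.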

## 3. DCUSUM-DS Algorithm

**Algorithm 2** DCUSUM-DS

1. DCUSUM-DS: initialize ${L}_{w}$, ${S}_{w}$, $T$, $\beta $
2. Compute: ${M}_{s}$, ${S}_{s}$, ${M}_{L}$, ${S}_{L}$
3. ${D}_{m}={M}_{s}-{M}_{L}$
4. Compute: ${D}_{ms}$, ${D}_{mL}$, ${D}_{SS}$, ${D}_{SL}$
5. ${D}_{mr}={D}_{ms}-{D}_{mL}$
6. Compute: ${D}_{2ms}$, ${D}_{2mL}$, ${D}_{2SS}$, ${D}_{2SL}$
7. If ${D}_{mr}>0$
8. Compute: $\mathrm{sum}({D}_{mr})$
9. If ${D}_{mr}<0$
10. Compute: $\mathrm{sum}({D}_{mr})$
11. Compute: ${R}_{S}={D}_{2mL}\cdot \mathrm{abs}(\mathrm{sum}({D}_{mr}))$; CUSUM(${R}_{S}$)
12. Box(CUSUM(${R}_{S}$))
13. If ${R}_{S}>{D}_{2mL}+T\cdot {D}_{2SL}$
14. Compute: $n=n+1$ (initialize $n=0$)
15. If $n>\beta $
16. Output: label ${V}_{a}$
17. If ${D}_{mr}<0$
18. Compute: $\mathrm{sum}({D}_{mr})$
19. Compute: ${R}_{S}={D}_{2mL}\cdot \mathrm{abs}(\mathrm{sum}({D}_{mr}))$
20. CUSUM(${R}_{S}$)
21. Box(CUSUM(${R}_{S}$)); if ${R}_{S}<{D}_{2mL}-T\cdot {D}_{2SL}$
22. Compute: $n=n+1$ (initialize $n=0$)
23. If $n>\beta $
24. Output: label
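Two of the more self-contained steps of Algorithm 2 can be sketched in code: the window statistics of steps 2 and 3, and the exceedance counter of steps 13 to 16 and 21 to 24. The sketch rests on several assumptions that the listing does not confirm: that ${M}_{s}$, ${S}_{s}$, ${M}_{L}$, ${S}_{L}$ are the mean and (population) standard deviation of the short and long windows, that the short window is the most recent part of the long window, that ${D}_{2mL}$ and ${D}_{2SL}$ are supplied as fixed numbers, and that the counter $n$ is never reset. All function and argument names are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def window_stats(x: Sequence[float], short_len: int,
                 long_len: int) -> Tuple[float, float, float, float, float]:
    """Assumed reading of steps 2-3: returns (M_s, S_s, M_L, S_L, D_m)."""
    if not 0 < short_len <= long_len <= len(x):
        raise ValueError("need 0 < short_len <= long_len <= len(x)")
    short = x[-short_len:]    # most recent short window (assumed nested)
    long_ = x[-long_len:]     # most recent long window
    m_s, s_s = fmean(short), pstdev(short)
    m_l, s_l = fmean(long_), pstdev(long_)
    return m_s, s_s, m_l, s_l, m_s - m_l   # D_m = M_s - M_L


def label_exceedances(r_s: Sequence[float], d2_ml: float, d2_sl: float,
                      t: float, beta: int) -> List[int]:
    """Assumed reading of steps 13-16 and 21-24.

    Counts samples of R_S outside the band D_2mL +/- T * D_2SL and returns
    the 0-based indices at which the running count n is greater than beta.
    """
    upper = d2_ml + t * d2_sl
    lower = d2_ml - t * d2_sl
    n = 0                      # initialize n = 0
    labelled: List[int] = []
    for idx, value in enumerate(r_s):
        if value > upper or value < lower:
            n += 1             # n = n + 1 (never reset: an assumption)
            if n > beta:
                labelled.append(idx)
    return labelled


m_s, s_s, m_l, s_l, d_m = window_stats([1, 1, 1, 1, 3, 3], short_len=2, long_len=4)
flags = label_exceedances([1.0, 5.0, 5.0, 5.0, 1.0, 5.0],
                          d2_ml=1.0, d2_sl=2.0, t=0.5, beta=2)
```

Requiring more than $\beta$ exceedances before labelling trades detection delay for fewer false alarms from isolated spikes; whether the counter should reset between exceedances is left open here because the listing does not say.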

## 4. Simulation and Comparison

## 5. Summary

## Author Contributions

## Funding

## Conflicts of Interest


**Figure 12.** Comparison of receiver operating characteristic (ROC) and area under the curve (AUC) of various data stream machine learning algorithms.

Algorithm | Length of Short Window | Length of Long Window | Threshold | Out Rate |
---|---|---|---|---|
DCUSUM-DS | 25 | 140 | 0.5 | 8 |
SNWCAD-DS | 25 | 140 | 0.5 | 8 |
A-ODDS | 25 | 140 | 0.5 | / |

Setting of Long Window | DCUSUM-DS | A-ODDS | SNWCAD-DS |
---|---|---|---|
131 | 0.1812 | 0.0867 | 0.0912 |
132 | 0.1821 | 0.0871 | 0.0919 |
133 | 0.1823 | 0.0875 | 0.0924 |
134 | 0.1826 | 0.0879 | 0.0932 |
135 | 0.1833 | 0.0931 | 0.1041 |
136 | 0.1839 | 0.0939 | 0.1085 |
137 | 0.1841 | 0.0944 | 0.1167 |
138 | 0.1846 | 0.0952 | 0.1174 |
139 | 0.1853 | 0.0959 | 0.1181 |
140 | 0.1859 | 0.0963 | 0.1190 |
Average | 0.18353 | 0.0918 | 0.10525 |
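The "Average" row can be checked directly from the ten per-setting values. The short script below is only an arithmetic check on the numbers in the table. The dictionary layout and variable names are assumptions, and the script does not interpret the quantity being reported, since the table does not name it.

```python
from statistics import fmean

# Values copied from the table, for long-window settings 131 to 140.
results = {
    "DCUSUM-DS": [0.1812, 0.1821, 0.1823, 0.1826, 0.1833,
                  0.1839, 0.1841, 0.1846, 0.1853, 0.1859],
    "A-ODDS": [0.0867, 0.0871, 0.0875, 0.0879, 0.0931,
               0.0939, 0.0944, 0.0952, 0.0959, 0.0963],
    "SNWCAD-DS": [0.0912, 0.0919, 0.0924, 0.0932, 0.1041,
                  0.1085, 0.1167, 0.1174, 0.1181, 0.1190],
}

# Column means, rounded to the five decimals used in the "Average" row.
averages = {name: round(fmean(values), 5) for name, values in results.items()}

for name, value in averages.items():
    print(f"{name}: {value}")
```

The recomputed means agree with the reported averages of 0.18353, 0.0918, and 0.10525.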

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Li, G.; Wang, J.; Liang, J.; Yue, C. The Application of a Double CUSUM Algorithm in Industrial Data Stream Anomaly Detection. *Symmetry* **2018**, *10*, 264.
https://doi.org/10.3390/sym10070264
