# The Application of a Double CUSUM Algorithm in Industrial Data Stream Anomaly Detection

## Abstract

## 1. Introduction

- The volume of data is effectively unbounded, because new data keep arriving over time.
- Each data point carries its own timestamp.
- The data exhibit concept drift, so they do not follow a fixed distribution.
- Because of conditions such as the sensor's operating environment and installation location, some data are distorted or invalid, i.e., of low quality.

## 2. Related Work

#### 2.1. Sliding Window

#### 2.2. Theoretical Background: CUSUM

**Algorithm 1.** CUSUM

1. CUSUM: $S_{1} = 0$, $i \in T = \{1, 2, \cdots, m\}$
2. $S_{k,i} = \max\left(0,\; S_{k-1,i} + x_{k,i} - \mu_{i}\right)$, if $S_{k-1,i} \le T_{i}$
3. $S_{k,i} = 0$ and $k_{i} = k - 1$, if $S_{k-1,i} > T_{i}$
4. Design parameters: bias $\mu_{i} \in \mathbb{R}_{>0}$ and threshold $T_{i} \in \mathbb{R}_{>0}$
5. Output: alarm time(s) $k_{i}$
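The recursion in Algorithm 1 can be sketched for a single channel $i$ as follows. This is a minimal illustration, not the authors' implementation: the function names `cusum` and `cusum_channels` are hypothetical. The listing writes $S_{1} = 0$; the sketch assumes this means the statistic starts at zero before the first sample, so $x_{1,i}$ is already accumulated.

```python
from typing import List, Sequence, Tuple


def cusum(x: Sequence[float], mu: float, threshold: float) -> Tuple[List[float], List[int]]:
    """One-sided CUSUM for one channel i, following the steps of Algorithm 1.

    x:         samples x_{k,i}, indexed from k = 1
    mu:        bias mu_i (> 0)
    threshold: threshold T_i (> 0)
    Returns the statistics S_{k,i} and the alarm times k_i.
    """
    s_prev = 0.0  # ASSUMPTION: initial statistic taken as 0 before the first sample
    stats: List[float] = []
    alarms: List[int] = []
    for k, x_k in enumerate(x, start=1):
        if s_prev > threshold:
            # Step 3: the previous statistic exceeded T_i, so reset and report k - 1
            s = 0.0
            alarms.append(k - 1)
        else:
            # Step 2: accumulate the excess over the bias, floored at zero
            s = max(0.0, s_prev + x_k - mu)
        stats.append(s)
        s_prev = s
    return stats, alarms


def cusum_channels(
    channels: Sequence[Sequence[float]],
    mus: Sequence[float],
    thresholds: Sequence[float],
) -> List[List[int]]:
    """Run the per-channel CUSUM independently for channels i = 1..m."""
    return [cusum(x, mu, th)[1] for x, mu, th in zip(channels, mus, thresholds)]
```

For example, `cusum([0, 0, 5, 5, 0, 0], mu=1, threshold=3)` yields the statistics `[0, 0, 4, 0, 0, 0]` and a single alarm at $k = 3$. Because step 3 tests $S_{k-1,i}$, an exceedance at the very last sample is only reported once the next sample arrives.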

## 3. DCUSUM-DS Algorithm

**Algorithm 2.** DCUSUM-DS

1. DCUSUM-DS: initialize $L_{w}$, $S_{w}$, $T$, $\beta$
2. Compute: $M_{s}$, $S_{s}$, $M_{L}$, $S_{L}$
3. $D_{m} = M_{s} - M_{L}$
4. Compute: $D_{ms}$, $D_{mL}$, $D_{SS}$, $D_{SL}$
5. $D_{mr} = D_{ms} - D_{mL}$
6. Compute: $D_{2ms}$, $D_{2mL}$, $D_{2SS}$, $D_{2SL}$
7. If $D_{mr} > 0$
8. Compute: $\mathrm{sum}(D_{mr})$
9. If $D_{mr} < 0$
10. Compute: $\mathrm{sum}(D_{mr})$
11. Compute: $R_{S} = D_{2mL} \cdot \left|\mathrm{sum}(D_{mr})\right|$; CUSUM($R_{S}$)
12. Box(CUSUM($R_{S}$))
13. If $R_{S} > D_{2mL} + T \cdot D_{2SL}$
14. Compute: $n = n + 1$ (initialize $n = 0$)
15. If $n > \beta$
16. Output: label $V_{a}$
17. If $D_{mr} < 0$
18. Compute: $\mathrm{sum}(D_{mr})$
19. Compute: $R_{S} = D_{2mL} \cdot \left|\mathrm{sum}(D_{mr})\right|$
20. CUSUM($R_{S}$)
21. Box(CUSUM($R_{S}$)); if $R_{S} < D_{2mL} - T \cdot D_{2SL}$
22. Compute: $n = n + 1$ (initialize $n = 0$)
23. If $n > \beta$
24. Output: label
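The double-window differencing in steps 2 to 5 and the counting rule in steps 13 to 16 (mirrored in steps 21 to 24) can be sketched as below. This is a hedged illustration under explicit assumptions, since the listing does not define its intermediate quantities: $M_{s}, S_{s}$ and $M_{L}, S_{L}$ are taken to be the mean and population standard deviation of the trailing short window (length $S_{w}$) and long window (length $L_{w}$); $D_{ms}$ and $D_{mL}$ are taken to be the short- and long-window means of the $D_{m}$ series itself. The box-plot and CUSUM stages applied to $R_{S}$ are not modeled, and all function names are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def trailing_windows(x: Sequence[float], short_len: int, long_len: int) -> List[Tuple[float, float, float, float]]:
    """(mean, std) of the trailing short and long windows at each step where
    the long window is full. ASSUMPTION: plain means and population stds."""
    stats = []
    for t in range(long_len, len(x) + 1):
        short, long_ = x[t - short_len:t], x[t - long_len:t]
        stats.append((statistics.fmean(short), statistics.pstdev(short),
                      statistics.fmean(long_), statistics.pstdev(long_)))
    return stats


def mean_difference(x: Sequence[float], short_len: int, long_len: int) -> List[float]:
    """Steps 2-3: D_m = M_s - M_L for every full long window."""
    return [m_s - m_l for m_s, _, m_l, _ in trailing_windows(x, short_len, long_len)]


def second_difference(x: Sequence[float], short_len: int, long_len: int) -> List[float]:
    """Steps 4-5: D_mr = D_ms - D_mL.
    ASSUMPTION: D_ms and D_mL are the window means of the D_m series."""
    d_m = mean_difference(x, short_len, long_len)
    return [m_s - m_l for m_s, _, m_l, _ in trailing_windows(d_m, short_len, long_len)]


def first_label_index(r_s: Sequence[float], center: float, spread: float,
                      threshold: float, beta: int, upper: bool = True) -> Optional[int]:
    """Steps 13-16 (upper) or 21-24 (lower): count R_S values beyond
    center +/- threshold * spread and label once the count n exceeds beta.
    Returns the index where the label is first emitted, or None."""
    n = 0
    for idx, r in enumerate(r_s):
        beyond = r > center + threshold * spread if upper else r < center - threshold * spread
        if beyond:
            n += 1
            if n > beta:
                return idx
    return None
```

On a unit step such as `[0] * 6 + [1] * 6` with windows of 2 and 4, `mean_difference` rises to 0.5 around the change and returns to zero once both windows cover the new level. In `first_label_index`, `center` and `spread` stand for $D_{2mL}$ and $D_{2SL}$; if $\beta$ corresponds to the "Out Rate" column of the parameter table, the settings there would map to `short_len=25`, `long_len=140`, `threshold=0.5`, `beta=8`.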

## 4. Simulation and Comparison

## 5. Summary

## Author Contributions

## Funding

## Conflicts of Interest

**Figure 12.** Comparison of receiver operating characteristic (ROC) and area under the curve (AUC) of various data stream machine learning algorithms.

Algorithm | Length of Short Window | Length of Long Window | Threshold | Out Rate |
---|---|---|---|---|
DCUSUM-DS | 25 | 140 | 0.5 | 8 |
SNWCAD-DS | 25 | 140 | 0.5 | 8 |
A-ODDS | 25 | 140 | 0.5 | / |

Setting of Long Window | DCUSUM-DS | A-ODDS | SNWCAD-DS |
---|---|---|---|
131 | 0.1812 | 0.0867 | 0.0912 |
132 | 0.1821 | 0.0871 | 0.0919 |
133 | 0.1823 | 0.0875 | 0.0924 |
134 | 0.1826 | 0.0879 | 0.0932 |
135 | 0.1833 | 0.0931 | 0.1041 |
136 | 0.1839 | 0.0939 | 0.1085 |
137 | 0.1841 | 0.0944 | 0.1167 |
138 | 0.1846 | 0.0952 | 0.1174 |
139 | 0.1853 | 0.0959 | 0.1181 |
140 | 0.1859 | 0.0963 | 0.1190 |
Average | 0.18353 | 0.0918 | 0.10525 |
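The Average row is the arithmetic mean over the ten long-window settings (131 to 140). The short check below recomputes it from the per-setting values in the table; the dictionary layout is only an illustration, and the metric is whatever the table reports.

```python
from statistics import fmean

# Per-setting values copied from the table, long window = 131..140
results = {
    "DCUSUM-DS": [0.1812, 0.1821, 0.1823, 0.1826, 0.1833, 0.1839, 0.1841, 0.1846, 0.1853, 0.1859],
    "A-ODDS": [0.0867, 0.0871, 0.0875, 0.0879, 0.0931, 0.0939, 0.0944, 0.0952, 0.0959, 0.0963],
    "SNWCAD-DS": [0.0912, 0.0919, 0.0924, 0.0932, 0.1041, 0.1085, 0.1167, 0.1174, 0.1181, 0.1190],
}

# Mean over the ten settings, rounded to the precision used in the Average row
averages = {name: round(fmean(values), 5) for name, values in results.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```

The recomputed means agree with the reported Average row for all three algorithms.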

