# Data Discovery and Anomaly Detection Using Atypicality for Real-Valued Data

## Abstract

## 1. Introduction

#### 1.1. Anomaly Detection and Data Discovery Based on Description Length

**Definition 1.**

#### 1.2. Alternative Approaches

## 2. Minimum Description Length Methods

#### 2.1. Sufficient Statistic Method (SSM)

**Theorem 1.**

**Lemma 2.**

**Theorem 2.**

1. The support of $\mathbf{t}$ is independent of **θ** and its interior is connected.
2. The extended CDF $\check{\mathbf{F}}_{i}$ of $\mathbf{Y}_{i}$ is continuous and differentiable.
3. The function $\mathbf{Y}_{i}\mapsto \mathbf{s}_{i}(\mathbf{Y}_{i};\mathit{\theta})$ is one-to-one, continuous, and differentiable for fixed **θ**.

**θ** given by $\mathbf{r}_{1}$ and $\mathbf{r}_{2}$ are identical.

**Proof.**

**Corollary 3.**

**θ**, and assume the equivalence map is a diffeomorphism. Then the distribution on **θ** given by the sufficient statistic approach is the same for $\mathbf{t}_{1}$ and $\mathbf{t}_{2}$.

**Proof.**

#### 2.2. Normalized Likelihood Method (NLM)

#### 2.3. Examples

## 3. Scalar Signal Processing Methods

#### 3.1. Iid Gaussian Case

#### 3.1.1. Linear Transformations

#### 3.2. Linear Prediction

#### 3.3. Filterbanks and Wavelets

## 4. Vector Case

#### 4.1. Vector Gaussian Case with Unknown Mean

#### 4.2. Vector Gaussian Case with Unknown $\mathsf{\Sigma}$

#### 4.3. Vector Gaussian Case with Unknown Mean and $\mathsf{\Sigma}$

#### 4.4. Sparsity and DFT

## 5. Experimental Results

#### 5.1. Transient Detection Using Hydrophone Recordings

#### 5.2. Anomaly Detection Using Holter Monitoring Data

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A. Linear Prediction

## Appendix B. Vector Gaussian Case: Unknown $\mathsf{\Sigma}$

## Appendix C. Vector Gaussian Case: Unknown Mean and $\mathsf{\Sigma}$

## References

1. Høst-Madsen, A.; Sabeti, E.; Walton, C. Data Discovery and Anomaly Detection Using Atypicality: Theory. IEEE Trans. Inf. Theory **2016**. submitted.
2. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly Detection for Discrete Sequences: A Survey. IEEE Trans. Knowl. Data Eng. **2012**, 24, 823–839.
3. Li, Y.; Nitinawarat, S.; Veeravalli, V.V. Universal Outlier Hypothesis Testing. IEEE Trans. Inf. Theory **2014**, 60, 4066–4082.
4. Li, Y.; Nitinawarat, S.; Veeravalli, V.V. Universal Outlier Detection. In Proceedings of the Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 10–15 February 2013; pp. 1–5.
5. Li, Y.; Nitinawarat, S.; Veeravalli, V.V. Universal Sequential Outlier Hypothesis Testing. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 3205–3209.
6. Grimmett, G.R.; Stirzaker, D.R. Probability and Random Processes, 3rd ed.; Oxford University Press: Oxford, UK, 2001.
7. Sabeti, E.; Høst-Madsen, A. Atypicality for the Class of Exponential Family. In Proceedings of the 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 27–30 September 2016.
8. Kay, S.M. Fundamentals of Statistical Signal Processing, Volume II: Detection Theory; Prentice-Hall: Upper Saddle River, NJ, USA, 1993.
9. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin, Germany, 2006.
10. Cover, T.; Thomas, J. Information Theory, 2nd ed.; John Wiley: Hoboken, NJ, USA, 2006.
11. Ziv, J.; Lempel, A. A Universal Algorithm for Sequential Data Compression. IEEE Trans. Inf. Theory **1977**, 23, 337–343.
12. Ziv, J.; Lempel, A. Compression of Individual Sequences via Variable-Rate Coding. IEEE Trans. Inf. Theory **1978**, 24, 530–536.
13. Ghido, F.; Tabus, I. Sparse Modeling for Lossless Audio Compression. IEEE Trans. Audio Speech Lang. Proc. **2013**, 21, 14–28.
14. Rissanen, J. A Universal Prior for Integers and Estimation by Minimum Description Length. Ann. Stat. **1983**, 11, 416–431.
15. Kostina, V. Data Compression With Low Distortion and Finite Blocklength. IEEE Trans. Inf. Theory **2017**, 63, 4268–4285.
16. Rissanen, J. Stochastic Complexity and Modeling. Ann. Stat. **1986**, 14, 1080–1100.
17. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly Detection: A Survey. ACM Comput. Surv. **2009**, 41, 15.
18. Ranshous, S.; Shen, S.; Koutra, D.; Harenberg, S.; Faloutsos, C.; Samatova, N.F. Anomaly Detection in Dynamic Networks: A Survey. WIREs Comput. Stat. **2015**, 7, 223–247.
19. Lee, Y.J.; Yeh, Y.R.; Wang, Y.C.F. Anomaly Detection via Online Oversampling Principal Component Analysis. IEEE Trans. Knowl. Data Eng. **2013**, 25, 1460–1470.
20. Pimentel, M.A.; Clifton, D.A.; Clifton, L.; Tarassenko, L. A Review of Novelty Detection. Signal Process. **2014**, 99, 215–249.
21. Esling, P.; Agon, C. Time-Series Data Mining. ACM Comput. Surv. **2012**, 45, 12.
22. Li, W.; Mahadevan, V.; Vasconcelos, N. Anomaly Detection and Localization in Crowded Scenes. IEEE Trans. Pattern Anal. Mach. Intell. **2014**, 36, 18–32.
23. Jia, Z.; Shen, C.; Yi, X.; Chen, Y.; Yu, T.; Guan, X. Big-Data Analysis of Multi-Source Logs for Anomaly Detection on Network-Based System. In Proceedings of the 13th IEEE Conference on Automation Science and Engineering (CASE), Xi’an, China, 20–23 August 2017; pp. 1136–1141.
24. Ahmed, M.; Mahmood, A.N.; Hu, J. A Survey of Network Anomaly Detection Techniques. J. Netw. Comp. Appl. **2016**, 60, 19–31.
25. Yoon, M.K.; Mohan, S.; Choi, J.; Christodorescu, M.; Sha, L. Learning Execution Contexts from System Call Distribution for Anomaly Detection in Smart Embedded System. In Proceedings of the Second International Conference on Internet-of-Things Design and Implementation, Pittsburgh, PA, USA, 18–21 April 2017; pp. 191–196.
26. Sari, A. A Review of Anomaly Detection Systems in Cloud Networks and Survey of Cloud Security Measures in Cloud Storage Applications. J. Inf. Secur. **2015**, 6, 142.
27. Høst-Madsen, A.; Sabeti, E.; Walton, C.; Lim, S.J. Universal Data Discovery Using Atypicality. In Proceedings of the 3rd International Workshop on Pattern Mining and Application of Big Data (BigPMA 2016) at the 2016 IEEE International Conference on Big Data (Big Data 2016), Washington, DC, USA, 5–8 December 2016.
28. Han, C.; Willett, P.; Chen, B.; Abraham, D. A Detection Optimal Min-Max Test for Transient Signals. IEEE Trans. Inf. Theory **1998**, 44, 866–869.
29. Wang, Z.; Willett, P. A Performance Study of Some Transient Detectors. IEEE Trans. Signal Proc. **2000**, 48, 2682–2685.
30. Wang, Z.; Willett, P.K. All-Purpose and Plug-In Power-Law Detectors for Transient Signals. IEEE Trans. Signal Proc. **2001**, 49, 2454–2466.
31. Wang, Z.J.; Willett, P. A Variable Threshold Page Procedure for Detection of Transient Signals. IEEE Trans. Signal Proc. **2005**, 53, 4397–4402.
32. Guépié, B.K.; Fillatre, L.; Nikiforov, I. Sequential Detection of Transient Changes. Seq. Anal. **2012**, 31, 528–547.
33. Egea-Roca, D.; López-Salcedo, J.A.; Seco-Granados, G.; Poor, H.V. Performance Bounds for Finite Moving Average Tests in Transient Change Detection. IEEE Trans. Signal Proc. **2018**, 66, 1594–1606.
34. Guépié, B.K.; Fillatre, L.; Nikiforov, I. Detecting a Suddenly Arriving Dynamic Profile of Finite Duration. IEEE Trans. Inf. Theory **2017**, 63, 3039–3052.
35. Hirai, S.; Yamanishi, K. Detecting Changes of Clustering Structures Using Normalized Maximum Likelihood Coding. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 343–351.
36. Yamanishi, K.; Miyaguchi, K. Detecting Gradual Changes from Data Stream Using MDL-Change Statistics. In Proceedings of the IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 5–8 December 2016; pp. 156–163.
37. Killick, R.; Fearnhead, P.; Eckley, I.A. Optimal Detection of Changepoints with a Linear Computational Cost. J. Am. Stat. Assoc. **2012**, 107, 1590–1598.
38. Zou, S.; Fellouris, G.; Veeravalli, V.V. Quickest Change Detection under Transient Dynamics: Theory and Asymptotic Analysis. IEEE Trans. Inf. Theory **2018**, 1.
39. Molloy, T.L.; Ford, J.J. Minimax Robust Quickest Change Detection in Systems and Signals with Unknown Transients. IEEE Trans. Autom. Control **2018**, 1.
40. Veeravalli, V.V.; Banerjee, T. Quickest Change Detection. Acad. Press Library Signal Proc. **2013**, 3, 209–256.
41. Fuh, C.D.; Tartakovsky, A.G. Asymptotic Bayesian Theory of Quickest Change Detection for Hidden Markov Models. IEEE Trans. Inf. Theory **2019**, 65, 511–529.
42. Lavielle, M. Using Penalized Contrasts for the Change-Point Problem. Signal Proc. **2005**, 85, 1501–1510.
43. Larsen, R.J.; Marx, M. An Introduction to Mathematical Statistics and Its Applications; Prentice-Hall: Englewood Cliffs, NJ, USA, 1986; Volume 2.
44. Roos, T.; Rissanen, J. On Sequentially Normalized Maximum Likelihood Models. In Proceedings of the Workshop on Information Theoretic Methods in Science and Engineering (WITMSE-08), Tampere, Finland, 18 August 2008.
45. Sabeti, E.; Høst-Madsen, A. Enhanced MDL with Application to Atypicality. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017.
46. Scharf, L.L. Statistical Signal Processing: Detection, Estimation, and Time Series Analysis; Addison-Wesley: Boston, MA, USA, 1990.
47. Grünwald, P.D. The Minimum Description Length Principle; MIT Press: Cambridge, MA, USA, 2007.
48. Rissanen, J. Stochastic Complexity in Statistical Inquiry; World Scientific: Singapore, 1998; Volume 15.
49. Forchini, G. The Density of the Sufficient Statistics for a Gaussian AR(1) Model in Terms of Generalized Functions. Stat. Probab. Let. **2000**, 50, 237–243.
50. Mallat, S. A Wavelet Tour of Signal Processing: The Sparse Way; Academic Press: Cambridge, MA, USA, 2008.
51. Vetterli, M.; Kovacevic, J. Wavelets and Subband Coding; Prentice Hall: Englewood Cliffs, NJ, USA, 1995; Volume 995.
52. Vetterli, M.; Herley, C. Wavelets and Filter Banks: Theory and Design. IEEE Trans. Signal Process. **1992**, 40, 2207–2232.
53. Mitra, S.K.; Kuo, Y. Digital Signal Processing: A Computer-Based Approach; McGraw-Hill: New York, NY, USA, 2006; Volume 2.
54. Willems, F.M.J.; Shtarkov, Y.; Tjalkens, T. The Context-Tree Weighting Method: Basic Properties. IEEE Trans. Inf. Theory **1995**, 41, 653–664.
55. Willems, F.; Shtarkov, Y.; Tjalkens, T. Reflections on “The Context Tree Weighting Method: Basic properties”. Newslett. IEEE Inf. Theory Soc. **1997**, 47, 1.
56. Sabeti, E.; Høst-Madsen, A. How interesting images are: An Atypicality Approach For Social Networks. In Proceedings of the IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 5–8 December 2016.
57. Muirhead, R.J. Aspects of Multivariate Statistical Theory; John Wiley & Sons: Hoboken, NJ, USA, 2009; Volume 197.
58. Silver, K. A Passive Acoustic Automated Detector for Sei and Fin Whale Calls. Master’s Thesis, University of Hawaii, Honolulu, HI, USA, 12 November 2014.
59. Høst-Madsen, A.; Sabeti, E. Atypical Information Theory for Real-Valued Data. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 666–670.
60. Goldberger, A.L.; Amaral, L.A.N.; Glass, L.; Hausdorff, J.M.; Ivanov, P.C.; Mark, R.G.; Mietus, J.E.; Moody, G.B.; Peng, C.K.; Stanley, H.E. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation **2000**, 101, e215–e220.

**Figure 1.** Redundancy comparison between ordinary predictive minimum description length (O.P. MDL) and our proposed sufficient statistic method for $\mu =0$ and ${\sigma}^{2}=4$.
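
As a rough illustration of the kind of quantity compared in Figure 1, the following Python sketch estimates the redundancy of an ordinary predictive (plug-in) MDL code for zero-mean Gaussian data. Several things here are assumptions rather than details taken from the article: that the unknown parameter is the variance, that the first `start` samples only initialize the estimate and are not coded, and all function names, which are hypothetical. The sketch does not implement the sufficient statistic method.

```python
import math
import random


def gaussian_codelength_bits(x, var):
    """Ideal codelength -log2 f(x) of x under a zero-mean Gaussian with variance var.

    For real-valued data this is a codelength only up to a quantization
    constant, which cancels when two codelengths are subtracted.
    """
    return 0.5 * math.log2(2 * math.pi * var) + (x * x) / (2 * var) * math.log2(math.e)


def predictive_mdl_bits(samples, start=3):
    """Plug-in predictive codelength: x_i is coded with the ML variance of the samples before it.

    The first `start` samples only initialize the estimate and are not coded
    (an assumption of this sketch). The initial samples must not all be zero.
    """
    if start < 1:
        raise ValueError("start must be at least 1")
    total = 0.0
    sum_sq = sum(x * x for x in samples[:start])
    for i in range(start, len(samples)):
        var_hat = sum_sq / i  # ML variance estimate from the i previous samples
        total += gaussian_codelength_bits(samples[i], var_hat)
        sum_sq += samples[i] ** 2
    return total


def redundancy_bits(samples, true_var, start=3):
    """Extra bits of the predictive code relative to a code that knows true_var."""
    known = sum(gaussian_codelength_bits(x, true_var) for x in samples[start:])
    return predictive_mdl_bits(samples, start) - known


if __name__ == "__main__":
    rng = random.Random(0)
    sigma2 = 4.0  # caption values: mu = 0, sigma^2 = 4
    n, trials = 200, 200
    avg = sum(
        redundancy_bits([rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n)], sigma2)
        for _ in range(trials)
    ) / trials
    print(f"average redundancy over {trials} runs: {avg:.2f} bits")
```

The default `start=3` is a deliberate choice in this sketch: a variance estimated from one or two Gaussian samples can be arbitrarily close to zero, which makes the expected plug-in codelength of the next sample unbounded.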

**Figure 3.** Detected atypical segments of Holter Monitoring heart rate variability (HRV): “S” stands for supraventricular arrhythmia and “V” stands for ventricular contraction based on annotation provided by PhysioNet [60].

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Sabeti, E.; Høst-Madsen, A.
Data Discovery and Anomaly Detection Using Atypicality for Real-Valued Data. *Entropy* **2019**, *21*, 219.
https://doi.org/10.3390/e21030219
