The need to detect outliers or otherwise unusual data, which can be formalized as the estimation a particular quantile of a distribution, is an important problem that frequently arises in a variety of applications of pattern recognition, computer vision and signal processing. For example, our work was most proximally motivated by the practical limitations and requirements of many semi-automatic surveillance analytics systems that detect abnormalities in closed-circuit television (CCTV) footage using statistical models of low-level motion features. In this paper, we specifically address the problem of estimating the running quantile of a data stream with non-stationary stochasticity when the absolute (rather than asymptotic) memory for storing observations is severely limited. We make several major contributions: (i) we derive an important theoretical result that shows that the change in the quantile of a stream is constrained regardless of the stochastic properties of data; (ii) we describe a set of high-level design goals for an effective estimation algorithm that emerge as a consequence of our theoretical findings; (iii) we introduce a novel algorithm that implements the aforementioned design goals by retaining a sample of data values in a manner adaptive to changes in the distribution of data and progressively narrowing down its focus in the periods of quasi-stationary stochasticity; and (iv) we present a comprehensive evaluation of the proposed algorithm and compare it with the existing methods in the literature on both synthetic datasets and three large “real-world” streams acquired in the course of operation of an existing commercial surveillance system. Our results and their detailed analysis convincingly and comprehensively demonstrate that the proposed method is highly successful and vastly outperforms the existing alternatives, especially when the target quantile is high-valued and the available buffer capacity severely limited.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited