Article

Detecting Metachanges in Data Streams from the Viewpoint of the MDL Principle

Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku 113-8656, Japan
* Author to whom correspondence should be addressed.
Entropy 2019, 21(12), 1134; https://doi.org/10.3390/e21121134
Received: 11 October 2019 / Revised: 11 November 2019 / Accepted: 16 November 2019 / Published: 20 November 2019
(This article belongs to the Special Issue Information-Theoretical Methods in Data Mining)

Abstract

This paper addresses the issue of how we can detect changes of changes, which we call metachanges, in data streams. A metachange refers to a change in patterns of when and how changes occur, referred to as “metachanges along time” and “metachanges along state”, respectively. Metachanges along time mean that the intervals between change points vary significantly, whereas metachanges along state mean that the magnitude of changes varies. It is practically important to detect metachanges because they may be early warning signals of important events. This paper introduces a novel notion of metachange statistics as a measure of the degree of a metachange. The key idea is to integrate metachanges along both time and state in terms of “code length” according to the minimum description length (MDL) principle. Based on these statistics, we develop an online metachange detection algorithm (MCD) for data streams. With synthetic datasets, we demonstrate that MCD detects metachanges earlier and more accurately than existing methods. With real datasets, we demonstrate that MCD can lead to the discovery of important events that might be overlooked by conventional change detection methods.
Keywords: change detection; change of change; data stream; minimum description length principle; code length

1. Introduction

1.1. Purpose of This Paper

In this study, we are concerned with detecting changes in data streams. The goal of change detection is to detect the time points at which the nature of the data-generating mechanism significantly changes.
Thus far, many algorithms have been proposed to detect change points in data streams (e.g., [1,2,3,4,5,6,7,8,9,10,11]), and several studies have addressed, or are related to, the issue of changes of changes [12,13,14,15,16,17,18]. In this paper, we refer to changes of changes as metachanges. A metachange refers to a change in the pattern of when or how changes occur. It is practically important to detect metachanges because they may be early warning signals of important events [12,13]. Previous work has treated metachanges from the viewpoint of metachanges along time, which indicate that the intervals between change points vary significantly. Such metachanges were called burstiness [12] and volatility [13] in previous studies. Detecting metachanges along time provides users with useful information from data streams. For example, for a machine in a manufacturing plant, a decrease in the interval between change points might be a sign of a serious failure.
There is also another type of metachange: metachanges along state. Here, “state” refers to the parameter value of the probability density function of a distribution. Consider a situation where change points $t_1, t_2, \ldots$ are detected for a data stream $y_1, y_2, \ldots$, and $y_t$ is drawn from $p_y(y_t;\eta)$. Here, $p_y$ is a probability density function, and $\eta$ is its parameter. In this paper, $\eta$ is called the state, and its value differs before and after a change point. A metachange along state means a change in how significantly $\eta$ varies before and after a change point. Metachanges along state might provide information such as changes in magnitude and velocity, which indicate an important change in the underlying data-generating mechanism. For example, for a machine in a manufacturing plant, a shift from a gradual (incremental) change to an abrupt (sudden) change [19], or the inverse shift, might be a sign of serious events.
A conceptual illustration of metachanges is shown in Figure 1, where the upper graph shows a data stream $y_1, y_2, \ldots$ and change points $\{t_i\}_{i=1}^{8}$ on the horizontal axis. The lower left graph shows the intervals between change points, $\Delta t = t_i - t_{i-1}$, on the vertical axis. Metachanges along time occur at $t_4, t_5, t_6, t_7$: for example, $t_4 - t_3$ is different from $t_3 - t_2$ and $t_2 - t_1$. The lower right graph shows the states estimated piecewise between the change points. Here, we assume that $y_t$ is drawn from the univariate normal distribution $p_y(y_t;\mu,\sigma)$, where $\mu$ is the mean and $\sigma$ is the standard deviation; in this case, $(\mu,\sigma)$ is the state. In Figure 1, because there is no significant difference between the magnitudes of the state changes at $t_1$ and $t_2$, a metachange along state does not occur at $t_2$. However, the magnitude of the change of $\mu$ differs significantly between $t_2$ and $t_3$; thus, a metachange along state occurs at $t_3$. Because the magnitudes of the changes of $\mu$ and $\sigma$ are almost the same at $t_3$ and $t_4$, a metachange along state does not occur at $t_4$. Applying the same procedure, we conclude that metachanges along state occur at $t_3$ and $t_7$ with respect to $\mu$. Moreover, metachanges along state occur at $t_6$ and $t_8$ with respect to $\sigma$: the magnitude of the change of the standard deviation around $t_6$ ($t_8$) is greater than that around $t_5$ ($t_7$). As a result, metachanges along state occur at $t_3$, $t_6$, $t_7$, and $t_8$. Combining the metachanges along time and state, we infer that metachanges along both time and state occur at $t_6$ and $t_7$.
Metachanges along time have been investigated in previous studies [12,13], and, although several studies are related to metachanges along state [14,15,16,17,18], they did not focus on metachanges along state in particular. The purpose of this paper is to propose a framework and an approach to detect metachanges along time and state from a unified viewpoint based on the minimum description length (MDL) principle [20]. Our framework and approach therefore not only include previous notions such as burstiness [12] and volatility [13] but also extend them to metachanges along state. The MDL principle asserts that the best statistical decision strategy is the one that compresses the data best. Description and coding with MDL are suitable for quantifying changes, and they make it easy to integrate the code lengths for time and state.

1.2. Related Work

Change detection has been extensively explored in the area of data mining. Thus far, several methods have been proposed to detect metachanges along time in data streams [12,13], and several studies have been related to metachanges along state [14,15,16,17,18].
Kleinberg [12] and Huang et al. [13] proposed algorithms for detecting metachanges along time. Kleinberg [12] proposed an algorithm to detect bursts in a time series. This algorithm assumes that intervals between successive events are drawn from an exponential distribution, and the discretized values of the parameter of the exponential distribution are regarded as states. The states for the intervals between successive events are estimated with dynamic programming, and changes of state indicate changes of the intervals between successive events. Huang et al. [13] proposed an algorithm, called the volatility detector, which detects changes in the rate of change. The volatility detector uses two buckets, called the buffer and the reservoir, to store intervals between change points. Intervals are put into the buffer sequentially. When the buffer is full, the oldest interval is dropped from the buffer and moved to the reservoir in a first-in-first-out fashion. The reservoir stores the dropped interval by randomly replacing one of its stored intervals. If the ratio of the variances of the buffer and the reservoir is above or below a specified threshold, the algorithm judges that the intervals between change points have changed; the authors called this event a volatility shift. Both the burst detector and the volatility detector are meant to be used in two steps: change points are first detected with another change detection algorithm, and then changes in the intervals between the change points are detected. While the burst detector works in an offline fashion, the volatility detector works in an online fashion.
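The buffer/reservoir mechanism of the volatility detector described above can be sketched as follows. This is a minimal illustration that follows only the description in this paragraph, not Huang et al.'s implementation; the function name `volatility_shifts`, the seeded random replacement, and the rule of flagging a shift when the variance ratio exceeds `beta` or falls below `1/beta` are our assumptions.

```python
import random
import statistics
from collections import deque


def volatility_shifts(intervals, buffer_size=32, reservoir_size=32, beta=2.0, seed=0):
    """Return indices of intervals at which a (simplified) volatility shift is flagged.

    Hypothetical sketch: the threshold rule with `beta` is an assumption,
    not the reference volatility detector.
    """
    rng = random.Random(seed)
    buffer = deque()   # most recent intervals, first in first out
    reservoir = []     # older intervals, refreshed by random replacement
    shifts = []
    for idx, x in enumerate(intervals):
        if len(buffer) == buffer_size:
            dropped = buffer.popleft()  # the oldest interval leaves the buffer
            if len(reservoir) < reservoir_size:
                reservoir.append(dropped)
            else:
                reservoir[rng.randrange(reservoir_size)] = dropped
        buffer.append(x)
        # Compare the two buckets only once both are full (B + R intervals seen).
        if len(buffer) == buffer_size and len(reservoir) == reservoir_size:
            var_res = statistics.pvariance(reservoir)
            if var_res == 0:
                continue  # ratio undefined; skip this step
            ratio = statistics.pvariance(buffer) / var_res
            if ratio > beta or ratio < 1.0 / beta:
                shifts.append(idx)
    return shifts
```

For intervals alternating between 90 and 110 followed by intervals alternating between 50 and 150, this sketch flags no shift in the first regime and flags one within a few intervals after the switch.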
Moreover, several studies have been related to metachanges along state [14,15,16,17,18]. Aggarwal [15] introduced velocity density estimation to understand, visualize, and determine trends in the evolution of fast data streams. Spiliopoulou et al. [16,17] proposed an algorithm, called MONIC, to model and track cluster transitions. Ntoutsi et al. [18] proposed an algorithm, called FINGERPRINT, to summarize cluster evolution. Huang et al. [14] proposed a change type detector that categorizes changes into three relative types, some of which correspond to the concept drift types described in [19]. Although these algorithms [14,15,16,17,18] are related to metachanges along state, they are not intended to characterize and detect metachanges directly. In addition, many change detection algorithms have been proposed based on detecting changes of state (e.g., [6,7,8,9,21,22]). Dynamic model selection [6,7] is the seminal work applying MDL to dynamic model selection and change detection. The MDL change statistics [8], SCAW [9], and STREAMKRIMP [22] are change detection algorithms based on MDL. However, these algorithms are not intended to characterize and detect metachanges along state directly.

1.3. Significance of This Paper

In the context of Section 1.1 and Section 1.2, the contributions of this paper are summarized in the following subsections.

1.3.1. Proposal of Concept of Metachange

To detect changes of changes in data streams, we define the concept of metachanges along both time and state. Previous studies [12,13] considered metachanges along time only, whereas in this paper we deal with metachanges along both time and state. Metachanges along time include notions proposed in previous studies, such as burstiness [12] and volatility [13]. Metachanges along state capture changes in how the parameters of the distribution change at successive change points.
Our concept of metachange enables the detection of potential changes of changes in data streams that were overlooked in previous studies.

1.3.2. Novel Algorithm for Detection of Metachanges

We define metachange statistics along both time and state. A challenge lies in combining the metachange statistics along time with those along state. In this paper, both statistics are defined based on the MDL principle. Metachange statistics along time (MCAT) are defined as the code length of an interval between change points, whereas metachange statistics along state (MCAS) are defined as the difference between the predictive code length and the normalized maximum likelihood (NML) code length [23] after a change. Because both are code lengths, they can simply be added, which enables us to detect metachanges along both time and state in a unified manner.

2. Theoretical Background of Metachange Statistics

In this section, we consider how to encode both the intervals between change points and the states around the change points. We assume that change points $t_1, t_2, \ldots$ are detected for a data stream $y_1, y_2, \ldots$ and that the intervals between change points, $x_i = t_i - t_{i-1}$, and the data $y_t$ are drawn, respectively, from
$$x_i \sim p_x(x_i;\xi), \qquad y_t \sim p_y(y_t;\eta),$$
where $p_x$ and $p_y$ are probability density functions and $\xi$ and $\eta$ are their parameters. The parameter $\eta$ is the state whose metachanges are addressed in this paper.

2.1. Definitions of Metachanges

In this subsection, we give definitions of metachanges.
Definition 1.
(Metachange along time) For intervals between change points $x_1, x_2, \ldots$, we say that a metachange along time occurs at a change point $t_i$ for a threshold parameter $\delta_t > 0$ if and only if
$$q_1 \to q_2 \text{ at } t = t_i,\quad d(q_1,q_2) > \delta_t,\quad q_1, q_2 \in \mathcal{F}_t,\quad \mathcal{F}_t = \{p_x(x;\xi)\}, \tag{1}$$
where $q_1$ and $q_2$ are distributions of intervals. Here, $q_1 \to q_2$ means that $x_t \sim q_1$ at $t = t_{i-1}$ and $x_t \sim q_2$ at $t = t_i$, and $d$ is a distance function between probability density functions.
Definition 2.
(Metachange along state) For a data stream $y_1, y_2, \ldots$, we say that a metachange along state occurs at a change point $t_i$ for a threshold parameter $\delta_s > 0$ if and only if
$$q_1 \to q_2 \text{ at } t = t_{i-1},\quad q_2 \to q_3 \text{ at } t = t_i,\quad |d(q_2,q_3) - d(q_1,q_2)| > \delta_s,\quad q_1, q_2, q_3 \in \mathcal{F}_s,\quad \mathcal{F}_s = \{p_y(y;\eta)\}, \tag{2}$$
where $q_1$, $q_2$, and $q_3$ are distributions of the values of the data stream. Equation (2) means that $y_t \sim q_1$ for $t = t_{i-2}, \ldots, t_{i-1}-1$, $y_t \sim q_2$ for $t = t_{i-1}, \ldots, t_i - 1$, and $y_t \sim q_3$ for $t = t_i, \ldots, t_{i+1}-1$. Here, $d$ is the same as that in Definition 1.
Definition 3.
(Integrated metachange) For a change point $t_i$, we say that an integrated metachange occurs at $t_i$ if and only if Equation (1) or Equation (2) holds.
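Definitions 1 and 2 leave the distance $d$ unspecified. As a minimal sketch, assuming $d$ is the Hellinger distance (our choice for illustration, not prescribed by the definitions), exponential interval distributions for $\mathcal{F}_t$, and univariate normal states $(\mu, \sigma)$ for $\mathcal{F}_s$, both conditions can be checked in closed form; all function names are hypothetical.

```python
import math


def hellinger_normal(mu1, s1, mu2, s2):
    """Hellinger distance between N(mu1, s1^2) and N(mu2, s2^2)."""
    bc = math.sqrt(2 * s1 * s2 / (s1 ** 2 + s2 ** 2)) * math.exp(
        -((mu1 - mu2) ** 2) / (4 * (s1 ** 2 + s2 ** 2)))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))


def hellinger_exponential(rate1, rate2):
    """Hellinger distance between exponential densities with the given rates."""
    bc = 2 * math.sqrt(rate1 * rate2) / (rate1 + rate2)
    return math.sqrt(max(0.0, 1.0 - bc))


def metachange_along_time(rate_before, rate_after, delta_t):
    # Definition 1: the interval distribution switches q1 -> q2 with d(q1, q2) > delta_t.
    return hellinger_exponential(rate_before, rate_after) > delta_t


def metachange_along_state(q1, q2, q3, delta_s):
    # Definition 2: compare the size of the change at t_{i-1} (q1 -> q2)
    # with the size of the change at t_i (q2 -> q3); each q is (mu, sigma).
    d12 = hellinger_normal(*q1, *q2)
    d23 = hellinger_normal(*q2, *q3)
    return abs(d23 - d12) > delta_s
```

Under this choice of $d$, two equally large abrupt mean shifts give no metachange along state, whereas a mean shift followed by a variance change does.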

2.2. Problem Setting

In this subsection, we consider a situation where $(m+1)$ change points $t_1, \ldots, t_{m+1}$ are given, and we consider how to encode $x_i$ and $y_t$ as compactly as possible. The ideal code length required for encoding $x_i$ is given by what we call the predictive code length, which is the sum of the negative logarithms of the predictive densities $p_x$ at each time point:
$$\min_{\{\hat\xi_{x^{i-1}}\}_{i=1}^{m}} \sum_{i=1}^{m} -\log p_x\left(x_i;\hat\xi_{x^{i-1}}\right), \tag{3}$$
where $\hat\xi_{x^{i-1}}$ is estimated at each change point. Similarly, the ideal code length required for encoding $y_t$ around the change points is given by the predictive code length
$$\min_{\{\hat\eta_{y^{t-1}} \,\mid\, t \in \mathrm{Neighbor}(t_i)\}_{i=1}^{m}} \sum_{i=1}^{m}\sum_{t \in \mathrm{Neighbor}(t_i)} -\log p_y\left(y_t;\hat\eta_{y^{t-1}}\right), \tag{4}$$
where $\mathrm{Neighbor}(t_i)$ denotes the neighborhood of a change point $t_i$. In practice, as explained in Section 3.3, $\mathrm{Neighbor}(t_i) = [t_i - h, t_i + h]$ with $h \in \mathbb{N}$. The estimates $\hat\xi_{x^{i-1}}$ and $\hat\eta_{y^{t-1}}$ are computed from $x^{i-1} = x_1 \cdots x_{i-1}$ and $y^{t-1} = y_1 \cdots y_{t-1}$, respectively. A change of $\hat\eta_{y^{t-1}}$ indicates a change of state. Detecting a metachange along time thus reduces to detecting a change of $\hat\xi_{x^{i-1}}$ in Equation (3). Detecting a metachange along state, on the other hand, reduces to detecting a change, from one change point to the next, in how $\hat\eta_{y^{t-1}}$ in Equation (4) changes around each change point.

3. Metachange Detection Algorithm

In this section, we present our online algorithm, called the metachange detection algorithm (MCD), for detecting metachanges along both time and state. We consider how to achieve Equations (3) and (4) in an online fashion. A schematic description of MCD is shown in Figure 2.
First, we detect change points from the data stream (A). Next, we concurrently detect metachanges along time (B) and along state (C), introducing metachange statistics to quantify them. Finally, we integrate the metachange statistics along time and state into a single statistic (D).
The key challenge of detecting metachanges along time and state is how to describe and integrate them. Our approach describes both metachanges as code lengths with MDL; therefore, it is easy to combine them.

3.1. Detecting Change Points

First, we detect change points $t_1, t_2, \ldots$. Because MCD works in an online fashion, the change detection algorithm must also work in an online fashion (e.g., [1,2,3,4,8,9]). In general, MCD is affected by the errors of the change detection algorithm and by the choice of its threshold parameter. We investigate this point empirically and discuss it in detail in Section 4.

3.2. Detecting Metachanges along Time

For the detected change points $t_1, t_2, \ldots$, consider the intervals between successive change points, $I_i = [t_{i-1}, t_i - 1]$, with length $x_i = t_i - t_{i-1}$. For an interval sequence $x^i = x_1 \cdots x_i$, we consider how to achieve Equation (3) in an online fashion. We define the metachange statistics along time (MCAT) $a_{t_i}$ as the predictive code length
$$a_{t_i} \overset{\text{def}}{=} -\log p_x\left(x_i;\hat\xi_{x^{i-1}}\right), \tag{5}$$
where $p_x \in \mathcal{F}_t$, $\mathcal{F}_t = \{p_x(x;\xi)\}$ is a parametric class of probability distributions, and $\hat\xi_{x^{i-1}}$ is estimated from $x^{i-1} = x_1 \cdots x_{i-1}$. For example, $\hat\xi_{x^{i-1}}$ can be the maximum likelihood estimator. To deal with nonstationary data streams, we use the online discounting maximum likelihood estimator [24]
$$\hat\xi_{x^{i-1}} = \operatorname*{argmax}_{\xi} \sum_{t=1}^{i-1} r(1-r)^{i-1-t}\log p_x(x_t;\xi), \tag{6}$$
where $0 < r < 1$ is a discounting parameter. A larger $r$ forgets past data more quickly.
In this paper, we use the parametric class of exponential distributions
$$\mathcal{F}_t = \left\{p_x(x;\xi) = \xi\exp(-\xi x) : \xi > 0\right\}. \tag{7}$$
By substituting Equation (7) into Equation (6), we get
$$\hat\xi_{x^{i-1}} = \operatorname*{argmax}_{\xi} \sum_{t=1}^{i-1} r(1-r)^{i-1-t}\log\left\{\xi\exp(-\xi x_t)\right\} = \operatorname*{argmax}_{\xi} \sum_{t=1}^{i-1} r(1-r)^{i-1-t}\left(\log\xi - \xi x_t\right). \tag{8}$$
The objective inside the argmax on the right-hand side of Equation (8) is expanded as
$$\begin{aligned}
\sum_{t=1}^{i-1} r(1-r)^{i-1-t}\left(\log\xi - \xi x_t\right)
&= r\log\xi\sum_{t=1}^{i-1}(1-r)^{i-1-t} - r\xi\sum_{t=1}^{i-1}(1-r)^{i-1-t}x_t\\
&= r\log\xi\,\frac{1-(1-r)^{i-1}}{r} - r\xi\sum_{t=1}^{i-1}(1-r)^{i-1-t}x_t\\
&= \left(1-(1-r)^{i-1}\right)\log\xi - r\xi\sum_{t=1}^{i-1}(1-r)^{i-1-t}x_t.
\end{aligned} \tag{9}$$
The right-hand side of Equation (9) is concave in $\xi$, so it is maximized by setting its derivative with respect to $\xi$ to zero. As a result, we obtain the following optimal solution:
$$\hat\xi_{x^{i-1}} = \frac{1-(1-r)^{i-1}}{r\sum_{t=1}^{i-1}(1-r)^{i-1-t}x_t}. \tag{10}$$
Thus, by substituting Equation (10) into Equation (5), MCAT at $t_i$ is
$$a_{t_i} = -\log p_x\left(x_i;\hat\xi_{x^{i-1}}\right) = -\log\hat\xi_{x^{i-1}} + \hat\xi_{x^{i-1}}\,x_i. \tag{11}$$
In practice, we judge that a metachange along time occurs when MCAT changes greatly between change points. Technically, we use the change rate of MCAT: a metachange along time occurs when $|(a_{t_i} - a_{t_{i-1}})/a_{t_{i-1}}| > \epsilon_t$ holds, where $\epsilon_t > 0$ is a threshold parameter. We call the algorithm described above the metachange detection along time algorithm (MCD-T).
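As a quick numerical check of the derivation in Equations (8) to (10), the closed form can be compared with a direct maximization of the discounted log-likelihood. The sketch below uses a ternary search, which is valid because the objective is concave in $\xi$; the function names and the search bracket $[10^{-6}, 10]$ are our assumptions.

```python
import math


def xi_closed_form(past, r):
    """Equation (10): discounted MLE of the exponential rate from x_1..x_{i-1}."""
    n = len(past)  # n = i - 1
    weighted = sum((1 - r) ** (n - 1 - j) * x for j, x in enumerate(past))
    return (1 - (1 - r) ** n) / (r * weighted)


def objective(xi, past, r):
    """Discounted log-likelihood inside the argmax of Equation (8)."""
    n = len(past)
    return sum(r * (1 - r) ** (n - 1 - j) * (math.log(xi) - xi * x)
               for j, x in enumerate(past))


def xi_numeric(past, r, lo=1e-6, hi=10.0, iters=200):
    """Ternary search for the maximizer (assumes it lies in [lo, hi])."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1, past, r) < objective(m2, past, r):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2
```

For constant intervals, the geometric weights cancel and Equation (10) returns the reciprocal of the interval length, as expected for an exponential rate.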
As for the computational cost of MCAT, Equation (10) can be written as
$$\hat\xi_{x^{i-1}} = \frac{1-(1-r)^{i-1}}{r\,s_{i-1}},$$
where
$$s_{i-1} \overset{\text{def}}{=} \sum_{j=1}^{i-1}(1-r)^{i-1-j}x_j.$$
$s_i$ and $s_{i-1}$ satisfy the following relation:
$$s_i = (1-r)\,s_{i-1} + x_i.$$
Therefore, MCAT $a_{t_i}$ can be updated in $O(1)$ time at each change point, so processing $i$ change points costs $O(i)$ in total.
Example:
We consider a sequence of 200 intervals between change points: $x_i = 100$ ($i = 1, \ldots, 100$) and $x_i = 500$ ($i = 101, \ldots, 200$). This means that there are 201 change points $\{t_i\}_{i=1}^{201}$. If we assume $t_1 = 100$, then $t_2 = 200, \ldots, t_{101} = 10{,}100$, $t_{102} = 10{,}600$, $t_{103} = 11{,}100, \ldots, t_{201} = 60{,}100$, and $x_i$ is calculated as $x_i = t_i - t_{i-1}$. Figure 3 shows the time intervals at the change points (Figure 3, top), the MCATs $a_{t_i}$ (Figure 3, second graph), the change rates of MCAT $|(a_{t_i} - a_{t_{i-1}})/a_{t_{i-1}}|$ (Figure 3, third graph), and $\hat\xi_{x^{i-1}}$ (Figure 3, bottom). Here, the discounting parameter is set to $r = 0.5$. We observe in Figure 3 that the metachange along time can be detected with a suitable threshold $\epsilon_t$.
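The example above can be reproduced with a short sketch that updates $s_i$ recursively, evaluates Equations (10) and (11), and applies the change-rate rule of MCD-T. The function names and the threshold value 0.5 are illustrative assumptions; the threshold used for Figure 3 is not stated.

```python
import math


def mcat_scores(intervals, r):
    """Return (xi_hat, a) for x_2, x_3, ... via Equations (10) and (11)."""
    s = 0.0      # s_{i-1}: discounted sum of past intervals
    n_past = 0   # i - 1: number of past intervals
    scores = []
    for x in intervals:
        if n_past >= 1:
            xi_hat = (1 - (1 - r) ** n_past) / (r * s)                # Equation (10)
            scores.append((xi_hat, -math.log(xi_hat) + xi_hat * x))   # Equation (11)
        s = (1 - r) * s + x  # recursion s_i = (1 - r) s_{i-1} + x_i
        n_past += 1
    return scores


def mcd_t_alarms(scores, eps_t):
    """Indices k with |(a_k - a_{k-1}) / a_{k-1}| > eps_t (change rate of MCAT)."""
    alarms = []
    for k in range(1, len(scores)):
        prev, cur = scores[k - 1][1], scores[k][1]
        if prev != 0 and abs((cur - prev) / prev) > eps_t:
            alarms.append(k)
    return alarms


# The example above: 100 intervals of length 100, then 100 of length 500.
intervals = [100] * 100 + [500] * 100
scores = mcat_scores(intervals, r=0.5)
alarms = mcd_t_alarms(scores, eps_t=0.5)
# alarms == [99]; scores[99] belongs to x_101, the first interval of length 500
```

Because only $s_{i-1}$ and the count $i-1$ are stored, each update takes constant time and memory.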

3.3. Detecting Metachanges Along State

For a change point $t_i$ detected as in Section 3.1, we consider how to achieve Equation (4) in an online fashion. For $\mathrm{Neighbor}(t_i)$ in Equation (4), we take a window of time points around $t_i$, denoted by $J_i = [t_i - h, t_i + h]$, where $h \in \mathbb{N}$ is the window size. Thus, we consider a sequence $y_{t_i-h}^{t_i+h} = y_{t_i-h} \cdots y_{t_i+h}$ of length $n = 2h+1$. We introduce a parametric class of probability distributions $\mathcal{F}_s = \{p_y(Y;\eta) : \eta \in \mathcal{H}\}$, where $Y$ is a random variable, $\eta$ is a real-valued parameter (vector), and $\mathcal{H}$ is the associated parameter space.
Next, we define the metachange statistics along state (MCAS) at change point $t_i$. First, two statistics, $b^{+}_{t_i}$ and $b^{-}_{t_i}$, are introduced. Each is the difference between two code lengths for $y_{t_i+1}^{t_i+h}$: one is the “expected” code length, computed with a parameter extrapolated from the parameter change at $t_{i-1}$ and the parameter estimated from $y_{t_i-h}^{t_i-1}$; the other is the code length with the parameter estimated from $y_{t_i+1}^{t_i+h}$ itself. Formally, $b^{\pm}_{t_i}$ is defined as the difference between the predictive code length and the NML code length [20] after the change point. The former is the predictive code length for encoding $y_{t_i+1}^{t_i+h}$, averaged over the $h$ data points, using the estimated parameter $\hat\eta^{\pm}$:
$$\frac{1}{h}\sum_{t=t_i+1}^{t_i+h} -\log p_y\left(y_t;\hat\eta^{\pm}\right), \tag{12}$$
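where $\hat\eta^{\pm}$ is defined as
$$\hat\eta^{\pm} \overset{\text{def}}{=} \hat\eta_{y_{t_i-h}^{t_i-1}} \pm \left(\hat\eta_{y_{t_{i-1}+1}^{t_{i-1}+h}} - \hat\eta_{y_{t_{i-1}-h}^{t_{i-1}-1}}\right), \tag{13}$$
that is, $\hat\eta^{+}$ ($\hat\eta^{-}$) assumes that the parameter changes at $t_i$ by the same amount as at the previous change point $t_{i-1}$, in the same (opposite) direction. Here, $\hat\eta_{y_{\tau_1}^{\tau_2}}$ denotes the maximum likelihood estimator of $\eta$ computed from $y_{\tau_1}^{\tau_2} = y_{\tau_1} \cdots y_{\tau_2}$.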
The latter is calculated as the NML code length, which is defined as the negative logarithm of the NML distribution [20]:
$$\frac{1}{h}\sum_{t=t_i+1}^{t_i+h} -\log p_y\left(y_t;\hat\eta_{y_{t_i+1}^{t_i+h}}\right) + \log C_h. \tag{14}$$
The difference between Equation (12) and Equation (14) is given by
$$b^{\pm}_{t_i} \overset{\text{def}}{=} \frac{1}{h}\sum_{t=t_i+1}^{t_i+h}\left\{-\log p_y\left(y_t;\hat\eta^{\pm}\right) + \log p_y\left(y_t;\hat\eta_{y_{t_i+1}^{t_i+h}}\right)\right\} - \log C_h, \tag{15}$$
where $C_h = \sum_{z_{t_i+1}^{t_i+h}} \max_{\eta} p_y\left(z_{t_i+1}^{t_i+h};\eta\right)$ (an integral for continuous data) in Equation (15) is computed using Rissanen’s approximation formula under some regularity conditions [23]:
$$\log C_h \approx \frac{k}{2}\log\frac{h}{2\pi} + \log\int\sqrt{|I(\eta)|}\,\mathrm{d}\eta,$$
where $k$ is the dimension of $\mathcal{H}$ and $I(\eta) \overset{\text{def}}{=} \left(E_{\eta}\left[-\partial^2\log p_y(Y;\eta)/\partial\eta_i\partial\eta_j\right]\right)_{i,j}$ is the Fisher information matrix at the parameter value $\eta$. Intuitively, Equation (15) quantifies the redundancy in the code length for encoding $y_{t_i+1}^{t_i+h}$ with parameters extrapolated from the parameter change at $t_{i-1}$ and the parameter values just before $t_i$.
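As a worked instance of Rissanen’s formula, assume (for illustration only) a univariate normal model with known standard deviation $\sigma$ and unknown mean restricted to $|\mu| \le R$; the restriction is needed because $\int\sqrt{|I(\mu)|}\,\mathrm{d}\mu$ diverges over the whole real line. Then $k = 1$ and $I(\mu) = 1/\sigma^2$, so

$$\log C_h \approx \frac{1}{2}\log\frac{h}{2\pi} + \log\int_{-R}^{R}\frac{1}{\sigma}\,\mathrm{d}\mu = \frac{1}{2}\log\frac{h}{2\pi} + \log\frac{2R}{\sigma}.$$

For example, with $h = 200$ and $R/\sigma = 20$, this gives $\log C_h \approx \frac{1}{2}\log 31.83 + \log 40 \approx 1.73 + 3.69 = 5.42$ (natural logarithms). Since $\log C_h$ does not depend on $\hat\eta^{\pm}$, it shifts $b^{+}_{t_i}$ and $b^{-}_{t_i}$ by the same amount.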
Finally, we define MCAS as
$$b_{t_i} \overset{\text{def}}{=} \min\left(b^{+}_{t_i}, b^{-}_{t_i}\right), \tag{16}$$
which means that, in this paper, metachanges along state are quantified by the magnitude of the parameter change at $t_i$ relative to that at $t_{i-1}$. Because the window size $h$ is fixed, the computational cost of MCAS is $O(h) = O(1)$ per change point. We judge that a metachange along state occurs at $t_i$ when $b_{t_i} > \epsilon_s$ holds, where $\epsilon_s > 0$ is a threshold parameter. We call the algorithm described above the metachange detection along state algorithm (MCD-S).
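A minimal sketch of MCAS for a univariate normal model with state $\eta = (\mu, \sigma)$ is given below. It follows Equations (12) to (16) with 0-based indices and treats $\log C_h$ as a user-supplied constant (for example, from Rissanen’s approximation). Because $\hat\eta^{-}$ can yield a non-positive standard deviation, the sketch clips $\sigma$ from below; that clipping, the constant `SIGMA_FLOOR`, and all function names are our assumptions rather than part of the method as stated.

```python
import math
import random
import statistics

SIGMA_FLOOR = 1e-6  # hypothetical guard: eta^- may produce sigma <= 0


def gauss_mle(seg):
    """Maximum likelihood estimate (mu, sigma) of a univariate normal."""
    return statistics.fmean(seg), statistics.pstdev(seg)


def mean_neg_log_lik(seg, mu, sigma):
    """(1/h) * sum of -log N(y; mu, sigma) over the window."""
    sigma = max(sigma, SIGMA_FLOOR)
    return sum(0.5 * math.log(2 * math.pi * sigma ** 2)
               + (v - mu) ** 2 / (2 * sigma ** 2) for v in seg) / len(seg)


def mcas(y, t_prev, t_cur, h, log_c_h=0.0):
    """b_{t_i} = min(b^+, b^-) at change point t_cur (Equation (16))."""
    before_prev = gauss_mle(y[t_prev - h:t_prev])          # y_{t_{i-1}-h}^{t_{i-1}-1}
    after_prev = gauss_mle(y[t_prev + 1:t_prev + h + 1])   # y_{t_{i-1}+1}^{t_{i-1}+h}
    before_cur = gauss_mle(y[t_cur - h:t_cur])             # y_{t_i-h}^{t_i-1}
    after = y[t_cur + 1:t_cur + h + 1]                     # y_{t_i+1}^{t_i+h}
    delta = [a - b for a, b in zip(after_prev, before_prev)]
    nml = mean_neg_log_lik(after, *gauss_mle(after)) + log_c_h  # Equation (14)
    b_pm = []
    for sign in (1.0, -1.0):
        mu, sigma = (c + sign * d for c, d in zip(before_cur, delta))  # Equation (13)
        b_pm.append(mean_neg_log_lik(after, mu, sigma) - nml)          # Equation (15)
    return min(b_pm)


def make_stream(third_segment, seed=1):
    """Three 1000-point normal segments; change points at indices 1000 and 2000."""
    rng = random.Random(seed)
    segments = [(0.0, 0.05), (1.0, 0.05), third_segment]
    return [rng.gauss(mu, sd) for mu, sd in segments for _ in range(1000)]


same_jump = mcas(make_stream((0.0, 0.05)), 1000, 2000, h=200)
new_kind = mcas(make_stream((1.0, 0.3)), 1000, 2000, h=200)
```

In the first stream, the change at index 2000 mirrors the one at index 1000, so $b_{t_i}$ stays near zero; in the second, a mean shift is followed by a variance change, and $b_{t_i}$ becomes large.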
Example:
We generate a data stream with length 11,250:
$$y_t \sim \begin{cases}
\mathcal{N}(0.0, 0.05) & (t = 1, \ldots, 1000),\\
\mathcal{N}(1.0, 0.05) & (t = 1001, \ldots, 2000),\\
\mathcal{N}(0.0, 0.05) & (t = 2001, \ldots, 3000),\\
\mathcal{N}(1.0, 0.05) & (t = 3001, \ldots, 4000),\\
\mathcal{N}\left((t-4001)/1000, 0.05\right) & (t = 4001, \ldots, 5000),\\
\mathcal{N}(0.0, 0.05) & (t = 5001, \ldots, 6000),\\
\mathcal{N}\left((t-6000)/1000, 0.05\right) & (t = 6001, \ldots, 7000),\\
\mathcal{N}(1.0, 0.05) & (t = 7001, \ldots, 8000),\\
\mathcal{N}\left(1-(t-8000)/250, 0.05\right) & (t = 8001, \ldots, 8250),\\
\mathcal{N}(0.0, 0.1) & (t = 8251, \ldots, 9250),\\
\mathcal{N}(1.0, 0.1) & (t = 9251, \ldots, 10{,}250),\\
\mathcal{N}(1.0, 0.3) & (t = 10{,}251, \ldots, 11{,}250),
\end{cases}$$
where $\mathcal{N}(\mu,\sigma)$ denotes the probability density function of the univariate normal distribution with mean $\mu$ and standard deviation $\sigma$.
Figure 4 shows the data stream $\{y_t\}$ (Figure 4, top) and the statistics $\{b_{t_i}\}$ (Figure 4, bottom). The window size is set to $h = 200$. The true change points are 1001, 2001, 3001, 4001, 5001, 6001, 7001, 8001, 8251, 9251, and 10,251. Figure 4 shows that $b_{t_i}$ increases when the parameters behave differently around a change point than around the preceding change point. At $t_2 = 2001$ and $t_3 = 3001$, $b_{t_i}$ is relatively small, which shows that the parameter changes (i.e., their magnitudes) do not differ much between $t_1 = 1001$ and $t_2 = 2001$ or between $t_2 = 2001$ and $t_3 = 3001$. However, $b_{t_i}$ increases at $t_4 = 4001$ because the change shifts from an abrupt change to a gradual one. These results indicate that MCAS provides information on changes in the behavior of the data around change points.

3.4. Integrating Metachange Statistics

Finally, we consider how to integrate MCAT $a_{t_i}$ and MCAS $b_{t_i}$ at a change point $t_i$. Because $a_{t_i}$ and $b_{t_i}$ are both code lengths, they can be summed; we therefore propose adding them with a weight. The integrated metachange statistics (MCI) $s_{t_i}$ at $t_i$ is defined as
$$s_{t_i} \overset{\text{def}}{=} a_{t_i} + \lambda\, b_{t_i}, \tag{17}$$
where $\lambda \in \mathbb{R}$ is a hyperparameter that should be chosen carefully from data. In Section 4.3, $\lambda$ is determined using a grid search.
In practice, we judge that a metachange along both time and state occurs at $t_i$ when MCI changes greatly between change points. As in the case of metachanges along time in Section 3.2, we use the change rate of MCI: a metachange along both time and state occurs at $t_i$ if $|(s_{t_i} - s_{t_{i-1}})/s_{t_{i-1}}| > \epsilon_{ts}$, where $\epsilon_{ts} > 0$ is a threshold parameter.
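Equation (17) and the change-rate rule can be sketched as follows, assuming the MCAT and MCAS values at successive change points have already been computed. The function name and the guard that skips a zero previous value are our assumptions.

```python
def mci_alarms(a, b, lam, eps_ts):
    """Integrated statistic s = a + lam * b (Equation (17)) and its change-rate alarms.

    a, b: MCAT and MCAS values at successive change points.
    Returns (s, alarms), where alarms holds indices i with
    |(s_i - s_{i-1}) / s_{i-1}| > eps_ts.
    """
    s = [ai + lam * bi for ai, bi in zip(a, b)]
    alarms = [i for i in range(1, len(s))
              if s[i - 1] != 0 and abs((s[i] - s[i - 1]) / s[i - 1]) > eps_ts]
    return s, alarms


s, alarms = mci_alarms([5.0, 5.0, 5.0], [0.1, 0.1, 20.0], lam=0.5, eps_ts=0.5)
```

In this usage example, MCAT stays constant and only MCAS jumps at the third change point, yet the integrated statistic still raises an alarm there; this is the point of combining both code lengths.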
We call the overall algorithm described above MCD; it is summarized in Algorithm 1.
Algorithm 1 MCD.
Input: $r$: discounting parameter ($0 < r < 1$), $h$: window size, $\lambda$: weight of MCAS, $\epsilon_{ts}$: threshold parameter
Output: $a_{t_i}$: metachange statistics along time, $b_{t_i}$: metachange statistics along state, $s_{t_i}$: integrated metachange statistics
1: $i = 1$
2: for $t = 1, 2, \ldots$ do
3:    Input $y_t$.
4:    Detect change points with a change detection algorithm.
5:    if $t$ is a change point then
6:       $t_i \leftarrow t$.
7:       $x_i \leftarrow t_i - t_{i-1}$.
8:       Calculate the metachange statistics along time $a_{t_i}$ according to Equation (11).
9:       Calculate the metachange statistics along state $b_{t_i}$ according to Equation (16).
10:      Calculate the integrated metachange statistics $s_{t_i}$ according to Equation (17).
11:      Raise an alarm if and only if $|(s_{t_i} - s_{t_{i-1}})/s_{t_{i-1}}| > \epsilon_{ts}$.
12:      $i \leftarrow i + 1$.
13:   end if
14: end for

4. Experiment

We conducted five experiments to confirm the effectiveness of the proposed algorithm MCD (https://github.com/s-fuku/metachange).

4.1. Synthetic Dataset 1 (Metachanges along Time)

We defined six levels of time intervals between change points, following [13,25]: 100,000, 50,000, 10,000, 5000, 1000, and 500. At each change point, the mean of the Bernoulli distribution generating the data switched between $\mu = 0.2$ and $\mu = 0.8$. For each combination of two interval lengths, we generated streams based on this scheme, each containing 100 change points. In what follows, $L_1$ and $L_2$ denote the first and second interval lengths, respectively.
We confirmed the effectiveness of MCD by comparing it with the volatility detector (VD) [13]. For change detection, we used the SEED algorithm [13] and the sequential MDL-change statistics algorithm (SMDL) [8]. SEED is based on ADWIN2 [21], and its parameters were set to $\delta = 0.05$, $\Gamma = 75$, $\hat\epsilon = 0.025$, and $\alpha = 0.025$, the same as in [13]. The window size $w$ of SMDL was set to $w = 0.2L_1$, and the threshold parameter $\epsilon$ was set to $\epsilon = 0.01$. For the Bernoulli distribution, the change score $\Psi_t$ of SMDL at time $t$ was calculated as
$$\Psi_t = \left\{-\hat\mu_0\log\hat\mu_0 - (1-\hat\mu_0)\log(1-\hat\mu_0)\right\} - \frac{1}{2}\left\{-\hat\mu_1\log\hat\mu_1 - (1-\hat\mu_1)\log(1-\hat\mu_1)\right\} - \frac{1}{2}\left\{-\hat\mu_2\log\hat\mu_2 - (1-\hat\mu_2)\log(1-\hat\mu_2)\right\},$$
where $\hat\mu_0 = \sum_{i=t-w}^{t+w} y_i/(2w+1)$, $\hat\mu_1 = \sum_{i=t-w}^{t-1} y_i/w$, and $\hat\mu_2 = \sum_{i=t}^{t+w} y_i/(w+1)$. If $\Psi_t > \epsilon$, $t$ is regarded as a change point; more precisely, we determined that $t$ was a change point if its change score $\Psi_t$ was the maximum. The parameter of MCD-T was set to $r = 0.2$; we discuss the dependency of MCD-T on $r$ below (Figure 5). For VD, we set the buffer size $B = 32$ and the reservoir size $R = 32$, the same as in [13]; we also discuss the dependency of VD on $B$ and $R$ below (Figure 6). To run SEED [13], we used the Java source code provided by the authors (https://www.cs.auckland.ac.nz/research/groups/kmg/DavidHuang.html). For both MCD-T and VD, we started to use change points once their number reached $B + R$, because the buffer and the reservoir of VD are not full until $B + R$ intervals arrive.
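The Bernoulli change score above can be computed directly from its three windows. The sketch below illustrates this formula only (0-based indices; function names are hypothetical) and is not the full SMDL procedure of [8].

```python
import math


def bernoulli_entropy(p):
    """-p log p - (1 - p) log(1 - p), using the convention 0 log 0 = 0."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)


def smdl_bernoulli_score(y, t, w):
    """Psi_t from the windows y[t-w..t+w], y[t-w..t-1] and y[t..t+w]."""
    mu0 = sum(y[t - w:t + w + 1]) / (2 * w + 1)
    mu1 = sum(y[t - w:t]) / w
    mu2 = sum(y[t:t + w + 1]) / (w + 1)
    return (bernoulli_entropy(mu0)
            - 0.5 * bernoulli_entropy(mu1)
            - 0.5 * bernoulli_entropy(mu2))


# Mean 0.2 for 200 points, then mean 0.8 for 200 points (deterministic patterns).
y = [1, 0, 0, 0, 0] * 40 + [1, 1, 1, 1, 0] * 40
psi_flat = smdl_bernoulli_score(y, 100, 50)     # inside the first regime
psi_change = smdl_bernoulli_score(y, 200, 50)   # at the change point
```

At the change point, the score is roughly 0.2 nats, whereas inside a stationary stretch it is close to zero, so a threshold between the two separates them.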
We investigated the trade-off between detection delay and accuracy in terms of the benefit and the false alarm rate (FAR), defined as in [8,26]. For MCD-T, we first fixed the threshold parameter $\epsilon_t$ and converted the MCATs $\{a_{t_i}\}$ in Equation (11) into binary alarms $\{\alpha_{t_i}\}$, that is, $\alpha_{t_i} = \mathbb{1}\left(|(a_{t_i} - a_{t_{i-1}})/a_{t_{i-1}}| > \epsilon_t\right)$, where $\mathbb{1}(A)$ denotes the indicator function that takes 1 if and only if condition $A$ is true. We evaluated MCD-T by varying $\epsilon_t$. Let $\tau$ be the maximum tolerated delay of metachange detection. When the metachange actually started at $t^*$, we defined the benefit of an alarm at time $t$ as
$$b(t;t^*) = \begin{cases} 1 - \dfrac{|t-t^*|}{\tau} & (0 \le |t-t^*| < \tau),\\[4pt] 0 & (\text{otherwise}).\end{cases}$$
The number of false alarms was calculated as
$$n(\alpha_1^m) \overset{\text{def}}{=} \sum_{k=1}^{m}\alpha_{t_k}\,\mathbb{1}\left(b(t_k;t^*) = 0\right).$$
We visualized the performance by plotting the recall rate of the total benefit, $b$, against the false alarm rate, $n/\sup_{\epsilon_t} n$, with $\epsilon_t$ varying. Likewise, for VD, $\alpha_{t_i}$ was calculated using the relative volatility between the variances of the buffer and the reservoir by varying the threshold parameter $\beta$. We evaluated all four combinations of the change detectors (SEED and SMDL) and the metachange detectors (MCD-T and VD) by calculating the average and standard deviation of the area under the curve (AUC) of the benefit vs. FAR curves. The AUC scores were calculated over 50 sequences, and the delay parameter was set to $\tau = 5L_2$. Table 1 shows the average AUC scores: MCD-T outperforms VD with either change detector, SEED or SMDL. This indicates the effectiveness of MCD-T.
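The benefit and false-alarm definitions above translate directly into code. In this minimal sketch (hypothetical function names), alarms are given as the list of change-point times $t_k$ with $\alpha_{t_k} = 1$.

```python
def benefit(t, t_star, tau):
    """b(t; t*): credit decaying linearly from 1 to 0 as the delay |t - t*| reaches tau."""
    gap = abs(t - t_star)
    return 1.0 - gap / tau if gap < tau else 0.0


def count_false_alarms(alarm_times, t_star, tau):
    """n(alpha): number of raised alarms that earn no benefit."""
    return sum(1 for t in alarm_times if benefit(t, t_star, tau) == 0.0)
```

For example, with $t^* = 100$ and $\tau = 10$, an alarm at $t = 105$ earns a benefit of 0.5, while alarms at $t = 50$ and $t = 200$ both count as false alarms.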
Because MCD-T depends on the discounting parameter $r$ and on the change detection algorithm used, we investigated these effects. First, we examined the dependency of AUC on $r$ for all combinations of $L_1$ and $L_2$. We calculated AUC 30 times with $r = 0.01, 0.05, 0.1, 0.2, 0.3, 0.4$, and $0.5$. We used SEED [13] as the change detection algorithm with the same parameter values as above, and the same dataset as in the previous experiment. Figure 5 shows that, when $L_1$ is relatively small (e.g., $L_1 = 500, 1000, 5000, 10{,}000$), AUC is not heavily dependent on $r$. When $L_1$ is larger, however, AUC decreases as $r$ increases. This is because the number of false alarms of SEED increases with $L_1$, and MCD-T is more sensitive to these false alarms when $r$ is larger, since a larger $r$ forgets past intervals more quickly.
For comparison, Figure 6 shows the dependency of the AUC of VD on the buffer size $B$ and the reservoir size $R$ ($B = R$). We calculated AUC 50 times. We observe from Figure 6 that AUC decreases as $B$ increases. Comparing Figure 5 with Figure 6, we also see that MCD-T outperforms VD for various combinations of $r$ and $B$ ($= R$).
Next, we investigated the effect of the change detection algorithm. We used SEED with the parameter $\hat\epsilon = 0.0025, 0.005$, and $0.0075$, where $\hat\epsilon$ is a hyperparameter that controls the threshold parameter [13]. Other conditions and the dataset were the same as in the previous experiment. Figure 7 shows that AUC does not heavily depend on $\hat\epsilon$ for any combination of $L_1$ and $L_2$. In general, however, the threshold parameter of the change detection algorithm affects the performance of MCD-T, because MCD-T operates on the detected change points; hence, it should be set carefully.

4.2. Synthetic Dataset 2 (Metachanges along State)

We generated a data stream of length $24L$, where $L = 500, 1000, 2000$. The generated data stream contained a metachange along state. In the former part, each datum was drawn from
$$y_t \sim \begin{cases}\mathcal{N}(0, 0.1) & (t = 1, \ldots, L),\\ \mathcal{N}(0.5, 0.1) & (t = L+1, \ldots, 2L).\end{cases}$$
Repeating this procedure 10 times, we obtained a subsequence of length $20L$. In the latter part, each datum was drawn from
$$y_t \sim \begin{cases}\mathcal{N}\left((t-20L)/(2L), 0.1\right) & (t = 20L+1, \ldots, 21L),\\ \mathcal{N}(0, 0.1) & (t = 21L+1, \ldots, 24L).\end{cases}$$
A metachange along state occurred at $t = 20L+1$. For change detection, we employed four algorithms for comparison: (1) SMDL [8], a semi-instant method based on the MDL change statistics; (2) ChangeFinder (CF) [1,2,4], a state-of-the-art method for abrupt change detection; (3) Bayesian online change point detection (BOCPD) [3], a retrospective online change point detection method with a Bayesian scheme; and (4) ADWIN2 [21], an adaptive windowing method. Because we assumed a situation where the change and metachange mechanisms do not vary significantly, we chose the best combination of parameters for each change detection algorithm by grid search, as in [8,27]. We generated 10 sequences with the scheme above and calculated the F-scores for each combination of the following parameters:
  • SMDL: window size $w \in \{50, 100\}$ ($L = 500$), $w \in \{100, 200\}$ ($L = 1000$), and $w \in \{200, 400\}$ ($L = 2000$); threshold parameter $\epsilon \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7\}$.
  • CF: discounting rate $r \in \{0.003, 0.005, 0.01, 0.03, 0.1\}$; threshold parameter $\delta \in \{0, 0.5, 1.0, 1.5, 2.0\}$ (regression orders $k_1 = k_2 = 3$; smoothing parameters $T_1 = T_2 = 5$).
  • BOCPD: parameter related to change intervals $\alpha \in \{100, 300, 600\}$; threshold parameter $\epsilon \in \{0.1, 0.3\}$.
  • ADWIN2: confidence parameter $\delta \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9\}$.
The F-score is defined as the harmonic mean of precision and recall, which are calculated from the numbers of true positives (TP), false positives (FP), and false negatives (FN) as follows [9]. TP is the number of true change points that are $\tau$-neighbors of (i.e., lie within $\tau$ of) estimated change points. Then FP and FN are calculated as $\mathrm{FP} = \ell - \mathrm{TP}$ and $\mathrm{FN} = m - \mathrm{TP}$, where $\ell$ and $m$ denote the total numbers of estimated and true change points, respectively. Finally, we calculated $\mathrm{recall} = \mathrm{TP}/(\mathrm{TP} + \mathrm{FN})$ and $\mathrm{precision} = \mathrm{TP}/(\mathrm{TP} + \mathrm{FP})$ for each method. In this experiment, we set $\tau$ to 100.
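As a minimal sketch of this evaluation, the Python function below computes the F-score from lists of estimated and true change points. It is not the authors' implementation. We assume that a true change point is a $\tau$-neighbor of an estimate when their distance is at most $\tau$. We also clamp FP at zero, a safeguard the definition above does not include. The name `f_score` is hypothetical.

```python
def f_score(estimated, true, tau=100):
    """F-score with tau-neighbor matching of change points.

    Assumption: a true change point t counts as detected if some
    estimated change point e satisfies |t - e| <= tau.
    """
    # TP: true change points with at least one estimate within tau.
    tp = sum(1 for t in true if any(abs(t - e) <= tau for e in estimated))
    # FP = ell - TP and FN = m - TP (ell, m: numbers of estimated / true points).
    # Clamping FP at 0 is our safeguard for one estimate matching several true points.
    fp = max(len(estimated) - tp, 0)
    fn = len(true) - tp
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For instance, with true change points at 100, 200, and 300, estimates at 105, 290, and 500, and $\tau = 20$, two true points are matched. Precision and recall are then both 2/3, so the F-score is 2/3.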
After optimizing the parameters of each change detection algorithm, we generated 30 data streams with the scheme above and detected the change points and the metachange. For metachange detection, we compared MCD-S with SMDL. We chose SMDL for comparison because it calculates a change score at each time point based on parameter changes under the MDL principle. Hence, the change rate of the scores between consecutive change points can be regarded as the degree of a metachange along state. Hereafter, we refer to SMDL used for metachange detection as SMDL metachange (SMDL-MC) and denote its window parameter by $w_{\mathrm{mc}}$. We calculated MCAS in Equation (16) for MCD-S and the change rate $|(\Psi_{t_i} - \Psi_{t_{i-1}})/\Psi_{t_{i-1}}|$ for SMDL-MC, where $\Psi_t$ is the change score at time $t$ for a univariate normal distribution [8]:
$$
\Psi_t = \frac{1}{2}\log\frac{\hat{\sigma}_0^2}{\hat{\sigma}_1\hat{\sigma}_2} + \log\frac{C_{2w_{\mathrm{mc}}}}{C_{w_{\mathrm{mc}}}^2},
$$
where $\hat{\sigma}_0$, $\hat{\sigma}_1$, and $\hat{\sigma}_2$ are the maximum likelihood estimators of the standard deviations calculated for $y_{t-w_{\mathrm{mc}}+1}^{t+w_{\mathrm{mc}}}$, $y_{t-w_{\mathrm{mc}}+1}^{t-1}$, and $y_{t}^{t+w_{\mathrm{mc}}}$, respectively. $C_k$ is the normalizer of the normalized maximum likelihood (NML) code length [20]:
$$
\log C_k = \frac{1}{2}\log\frac{16\,\mu_{\max}}{\pi\,\sigma_{\min}^2} + \frac{k}{2}\log\frac{k}{2e} - \log\Gamma\left(\frac{k-1}{2}\right),
$$
where $\Gamma$ is the gamma function. In this experiment, we set $\mu_{\max} = 2$ and $\sigma_{\min} = 0.005$. The window parameters $h$ of MCD-S and $w_{\mathrm{mc}}$ of SMDL-MC were set to $h = w_{\mathrm{mc}} = 100$ ($L = 500$), $200$ ($L = 1000$), and $400$ ($L = 2000$). In calculating the F-scores, the maximum tolerant delay was set to $\tau = 0.5L$.
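The univariate score $\Psi_t$ and the SMDL-MC change rate can be sketched in Python as follows. The sketch transcribes the two formulas above directly. It is an illustration, not the SMDL implementation of [8]. The function names, the 0-based indexing, and the flooring of the standard deviations at $\sigma_{\min}$ (to avoid taking the logarithm of zero) are our assumptions.

```python
import math
import statistics


def log_C(k, mu_max=2.0, sigma_min=0.005):
    """log C_k, the univariate NML normalizer, as given above (requires k >= 2)."""
    return (0.5 * math.log(16 * mu_max / (math.pi * sigma_min ** 2))
            + (k / 2) * math.log(k / (2 * math.e))
            - math.lgamma((k - 1) / 2))


def mdl_change_score(y, t, w, mu_max=2.0, sigma_min=0.005):
    """Psi_t for a univariate normal model.

    t is a 0-based index with t >= w - 1 and t + w < len(y); w >= 2.
    """
    whole = y[t - w + 1 : t + w + 1]  # y_{t-w+1}^{t+w}
    left = y[t - w + 1 : t]           # y_{t-w+1}^{t-1}
    right = y[t : t + w + 1]          # y_t^{t+w}
    # Maximum likelihood standard deviations (divide by n), floored at sigma_min.
    s0, s1, s2 = (max(statistics.pstdev(seg), sigma_min)
                  for seg in (whole, left, right))
    return (0.5 * math.log(s0 ** 2 / (s1 * s2))
            + log_C(2 * w, mu_max, sigma_min) - 2 * log_C(w, mu_max, sigma_min))


def smdl_mc_scores(psi_at_change_points):
    """SMDL-MC: |(Psi_{t_i} - Psi_{t_{i-1}}) / Psi_{t_{i-1}}| for consecutive change points."""
    p = psi_at_change_points
    return [abs((p[i] - p[i - 1]) / p[i - 1]) for i in range(1, len(p))]
```

In a stationary segment, the first term of $\Psi_t$ stays close to zero. At a mean shift, $\hat{\sigma}_0$ inflates and the score rises. The complexity term $\log(C_{2w}/C_w^2)$ does not depend on $t$, so it only offsets all scores by the same amount.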
Table 2 shows the average AUC values of MCD-S and SMDL-MC for detecting the metachange along state at $t = 20L + 1$. The first and second header rows represent the change detection and metachange detection algorithms, respectively. For SMDL, the best parameters were $\epsilon = 0.7$ with $w = 100$ ($L = 500$), $w = 200$ ($L = 1000$), and $w = 400$ ($L = 2000$). Table 2 shows that MCD-S outperforms SMDL-MC in all settings except ADWIN2 with $L = 500$ and $L = 1000$. This is because MCD-S deals with metachanges along state directly in terms of MCAS. SMDL-MC, by contrast, only quantifies the difference in code length between the cases with and without a change.
We further investigated the effects of the window size $h$ and the threshold parameter of the change detection algorithm, using SMDL [8] for change detection. The interval length was set to $L = 500$, the threshold parameter to $\epsilon \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7\}$, and $h = w \in \{50, 100, 150\}$, where $w$ is the window parameter of SMDL. Figure 8 (top) shows the dependency of the AUC of MCD-S on $\epsilon$, and Figure 8 (bottom) shows the dependency of the F-score of SMDL on $\epsilon$. We observe in Figure 8 (top) that, for all of $h = 50$, $100$, and $150$, the AUC of MCD-S decreases between $\epsilon = 0.2$ and $0.4$ and begins to increase once $\epsilon$ exceeds $0.4$. This reflects the fact that the change scores of SMDL have many local maxima, which lead to false alarms of change points for $\epsilon$ between 0.2 and 0.4. Notably, the F-score of SMDL decreases for $\epsilon = 0.1$ ($h = 100$) and $\epsilon = 0.2$ ($h = 150$), whereas the AUC of MCD-S decreases much less. This is because, although SMDL detects many false-positive change points in these settings, the metachange point is still detected accurately.
As for the dependency on the window size $h$, we observe that the AUC generally increases with $h$ for a fixed $\epsilon$.

4.3. Synthetic Dataset 3 (Metachanges Along Time and State)

We generated a data stream that contained metachanges along both time and state. The stream consisted of two subsequences. The former part repeated changes of the mean: each instance was drawn from Equation (20) with $L = L_1$. We repeated this procedure 50 times and obtained a subsequence of length $100L_1$. The latter part comprised the following four parts, each of length $L_2$:
$$
y_t \sim \begin{cases} \mathcal{N}(0, 0.1) & (t = 100L_1+1, \dots, 100L_1+L_2), \\ \mathcal{N}(0.45, 0.1) & (t = 100L_1+L_2+1, \dots, 100L_1+4L_2). \end{cases}
$$
In total, we obtained a data stream of length $100L_1 + 4L_2$. A metachange along both time and state occurred at $t = 100L_1 + L_2 + 1$. We chose the lengths $L_1$ and $L_2$ from among 400, 450, and 500.
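Synthetic Dataset 3 can be generated along the same lines. The sketch below is hypothetical: the function name and the reading of 0.1 as a standard deviation are our assumptions. It returns the stream together with the 1-based time of the metachange.

```python
import random


def synthetic_dataset_3(L1, L2, sd=0.1, seed=None):
    """Sketch of Synthetic Dataset 3; returns (stream, 1-based metachange time).

    Assumption: sd is the standard deviation of the Gaussian noise.
    """
    rng = random.Random(seed)
    y = []
    # Former part: Equation (20) with L = L1, repeated 50 times -> length 100 * L1.
    for _ in range(50):
        y.extend(rng.gauss(0.0, sd) for _ in range(L1))
        y.extend(rng.gauss(0.5, sd) for _ in range(L1))
    # Latter part: N(0, sd) for L2 points, then N(0.45, sd) for 3 * L2 points.
    y.extend(rng.gauss(0.0, sd) for _ in range(L2))
    y.extend(rng.gauss(0.45, sd) for _ in range(3 * L2))
    return y, 100 * L1 + L2 + 1
```

With $L_1 = 400$ and $L_2 = 450$, for example, the stream has $100 \cdot 400 + 4 \cdot 450 = 41{,}800$ points.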
We detected the metachange in the following three ways: we first detected change points with the same algorithms as in Section 4.2, and then detected metachanges with MCD-T, MCD-S, and MCD. The parameters of the change detection algorithms were tuned as in Section 4.2 over the same ranges. The one exception is the threshold parameter of SMDL, which was selected from $\epsilon \in \{0.05, 0.1, 0.15\}$ for all combinations of $L_1$ and $L_2$. The parameter of MCD-T was selected from $r \in \{0.1, 0.2, 0.3\}$, and that of MCD-S from $h \in \{0.1L_1, 0.2L_1\}$. The window size of SMDL was set to $w = h$, and the maximum tolerant delay was $\tau = L_2$. We chose the weight parameter $\lambda$ in Equation (17) from $\lambda \in \{0.001, 0.01, 0.1, 1, 5, 10\}$. For VD, the buffer and reservoir sizes ($B$ and $R$) were selected from $\{16, 24, 32\}$. All parameters were selected by grid search so as to maximize the AUC of metachange detection.
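The parameter selection described above is an exhaustive grid search, sketched minimally below. Here, `evaluate` stands for a hypothetical callable that would run change detection and MCD on the generated streams and return the average AUC. We do not implement it. Only the candidate values come from the text; the function names are our own.

```python
import itertools


def grid_search(evaluate, grid):
    """Exhaustive grid search: return (best_params, best_score) maximizing evaluate(**params)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)  # e.g., the average AUC of metachange detection
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def mcd_grid(L1):
    """Candidate values for MCD stated in the text (SMDL then uses w = h)."""
    return {
        "r": [0.1, 0.2, 0.3],          # discounting parameter of MCD-T
        "h": [0.1 * L1, 0.2 * L1],     # window size of MCD-S
        "lam": [0.001, 0.01, 0.1, 1, 5, 10],  # weight parameter in Equation (17)
    }
```

The search evaluates all $3 \times 2 \times 6 = 36$ combinations. That cost is negligible next to running the detectors themselves.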
Table 3 shows the average AUC values: Table 3a–c show the results for MCD-T, MCD-S, and MCD, respectively. Comparing Table 3a–c, we see that MCD combined with SMDL as the change detection algorithm outperforms MCD-S and MCD-T.
Table 4 shows the best parameters for each combination of intervals. We observe that the more intense the metachange along time, the larger the selected $r$ and the smaller the selected $\lambda$. This reflects the need to adapt to recent data in such situations. Moreover, MCAT increases in such situations, which leads to a smaller $\lambda$.

4.4. Real Dataset: Human Action Recognition Data

We applied MCD to the detection of metachanges in human action recognition data, namely the HASC-PAC2016 dataset [28], which is publicly available at http://hub.hasc.jp/. The data were collected by the Human Activity Sensing Consortium (HASC, http://hasc.jp/). The HASC-PAC2016 dataset contains sequences of three-axis acceleration data, and each sequence is labeled with one of six actions: “stay”, “walk”, “jog”, “skip”, “stair up” (go upstairs), and “stair down” (go downstairs). In this experiment, we aimed to evaluate the effectiveness of the proposed algorithm MCD on a data stream with ground truth for “changes of action changes” and “changes of intervals of actions”. The former correspond to metachanges along state, and the latter to metachanges along time. We combined the actions into a data stream as follows: first, we alternated “stay” and “walk” 15 times; then “jog” and “skip” 15 times; and, finally, “stair up” and “stair down” 15 times. We used 15 repetitions for each pair because “stair up” and “stair down” have only 15 files, the fewest among the six actions. The resulting data stream had length 89,324. Table 5 shows the files used for a participant named Person06023. We read the files for each action sequentially in alphabetical order. Figure 9 shows the resulting data stream, where acc_X, acc_Y, and acc_Z represent the accelerations along the x-, y-, and z-axes, respectively.
First, we detected change points with SMDL [8]. Determining the hyperparameters of SMDL, namely the window size $w$ and the threshold parameter $\epsilon$, is challenging in online change detection. We tuned $w$ and $\epsilon$ on the remaining data for Person06023, which alternated “stay” and “walk” four times, and likewise “jog” and “skip”. Although these data lacked “stair up” and “stair down”, we considered them sufficient to estimate the best configuration of $w$ and $\epsilon$. We calculated the F-score as described in Section 4.2 for the change points between different action labels. We then selected $w = 900$ and $\epsilon = 0.75$ from $w \in \{500, 600, 700, 800, 900, 1000\}$ and $\epsilon \in \{0, 0.25, 0.5, 0.75, 1\}$. Figure 10 shows histograms of the intervals for each action label. We observe in Figure 10 that most intervals are around 960–970 for “jog”, “walk”, and “skip”, whereas they are around 1020 for “stay”, “stair up”, and “stair down”. This indicates that $w = 900$ was sufficient to detect changes.
We applied SMDL to the stream and obtained the estimated change scores $\{\Psi_t\}$ at each time point. We calculated $\Psi_t$ with the multivariate normal distribution as follows:
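$$
\Psi_t = \frac{1}{2}\log\frac{|\hat{\Sigma}_0|^2}{|\hat{\Sigma}_1||\hat{\Sigma}_2|} + \frac{1}{2w}\log\frac{C_{2w}}{C_w^2} + \frac{1}{2w}\left[\sum_{i=t-w}^{t+w}(y_i-\hat{\mu}_0)^\top\hat{\Sigma}_0^{-1}(y_i-\hat{\mu}_0) - \sum_{i=t-w}^{t-1}(y_i-\hat{\mu}_1)^\top\hat{\Sigma}_1^{-1}(y_i-\hat{\mu}_1) - \sum_{i=t}^{t+w}(y_i-\hat{\mu}_2)^\top\hat{\Sigma}_2^{-1}(y_i-\hat{\mu}_2)\right], \tag{21}
$$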
where $\hat{\mu}_0 = \frac{1}{2w+1}\sum_{i=t-w}^{t+w} y_i$, $\hat{\mu}_1 = \frac{1}{w}\sum_{i=t-w}^{t-1} y_i$, and $\hat{\mu}_2 = \frac{1}{w+1}\sum_{i=t}^{t+w} y_i$. Furthermore, $\hat{\Sigma}_0 = \frac{1}{2w}\sum_{i=t-w}^{t+w}(y_i-\hat{\mu}_0)(y_i-\hat{\mu}_0)^\top$, $\hat{\Sigma}_1 = \frac{1}{w}\sum_{i=t-w}^{t-1}(y_i-\hat{\mu}_1)(y_i-\hat{\mu}_1)^\top$, and $\hat{\Sigma}_2 = \frac{1}{w+1}\sum_{i=t}^{t+w}(y_i-\hat{\mu}_2)(y_i-\hat{\mu}_2)^\top$.
Note that $C_w$ in Equation (21) is the normalizer of the NML code length [29,30]:
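$$
\log C_w = -(m+1)\log\frac{m}{2} + \frac{m}{2}\log\mu_{\max} - \frac{m^2}{2}\log\sigma_{\min} + \frac{mw}{2}\log\frac{w}{2e} - \log\Gamma_m\left(\frac{m}{2}\right) - \log\Gamma_m\left(\frac{w-1}{2}\right), \tag{22}
$$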
where $m$ is the dimension of the data stream, $\Gamma$ is the gamma function, and $\Gamma_m$ is calculated as
$$
\Gamma_m(x) = \pi^{m(m-1)/4}\prod_{j=1}^{m}\Gamma\left(x + \frac{1-j}{2}\right).
$$
We set $\mu_{\max} = 50$ and $\sigma_{\min} = 0.005$.
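Equations (21) and (22) can be transcribed into Python with NumPy as follows. This is an illustrative sketch, not the authors' code. The function names are hypothetical, $t$ is a 0-based row index, and each covariance estimator uses the normalization stated in the text, including $1/(2w)$ for $\hat{\Sigma}_0$.

```python
import math

import numpy as np


def log_multigamma(x, m):
    """log Gamma_m(x) = m(m-1)/4 * log(pi) + sum_{j=1}^{m} log Gamma(x + (1 - j)/2)."""
    return (m * (m - 1) / 4) * math.log(math.pi) + sum(
        math.lgamma(x + (1 - j) / 2) for j in range(1, m + 1))


def log_C_mv(w, m, mu_max=50.0, sigma_min=0.005):
    """log C_w of Equation (22) (requires w > m)."""
    return (-(m + 1) * math.log(m / 2) + (m / 2) * math.log(mu_max)
            - (m ** 2 / 2) * math.log(sigma_min)
            + (m * w / 2) * math.log(w / (2 * math.e))
            - log_multigamma(m / 2, m) - log_multigamma((w - 1) / 2, m))


def mdl_change_score_mv(y, t, w, mu_max=50.0, sigma_min=0.005):
    """Psi_t of Equation (21); y has shape (n, m), t is 0-based with w <= t < n - w."""
    y = np.asarray(y, dtype=float)
    m = y.shape[1]
    segments = (y[t - w : t + w + 1], y[t - w : t], y[t : t + w + 1])
    # Sigma_0: 1/(2w) over 2w + 1 points (ddof=1); Sigma_1, Sigma_2: maximum likelihood (ddof=0).
    covs = [np.atleast_2d(np.cov(seg, rowvar=False, ddof=d))
            for seg, d in zip(segments, (1, 0, 0))]
    logdets = [np.linalg.slogdet(S)[1] for S in covs]

    def quad(seg, S):
        d = seg - seg.mean(axis=0)  # centre each segment with its own mean estimate
        return float(np.einsum("ij,jk,ik->", d, np.linalg.inv(S), d))

    q0, q1, q2 = (quad(seg, S) for seg, S in zip(segments, covs))
    complexity = log_C_mv(2 * w, m, mu_max, sigma_min) - 2 * log_C_mv(w, m, mu_max, sigma_min)
    return (0.5 * (2 * logdets[0] - logdets[1] - logdets[2])
            + complexity / (2 * w)
            + (q0 - q1 - q2) / (2 * w))
```

With the estimators used in this sketch, each quadratic-form sum equals the normalizing count times $m$. The bracketed term is therefore the constant $2wm - wm - (w+1)m = -m$, so $\Psi_t$ varies over time only through the log-determinant term.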
Next, we defined the ground truths for metachanges along state as the two time points at which the changes of action label changes occurred: t = 29,752, from “jog” to “stair up”, and t = 59,588, from “walk” to “skip”. We also defined the ground truths for metachanges along time as the time points at which the changes of intervals occurred. We see in Figure 10 that the interval distributions differ significantly across four types of “changes of action changes”: from “stay” to “jog”, from “jog” to “stair up”, from “stair up” to “walk”, and from “skip” to “stair down”.
We detected metachanges along time with MCD-T and with the volatility detector (VD) [13], and compared the results. Figure 11 shows the estimated MCAT with MCD-T ($r = 0.1, 0.2, 0.3$) and the relative volatility with VD ($B = R = 10, 15, 20$).
We observe in Figure 11 that MCAT detects the metachanges along time at all four transitions for each of $r = 0.1$, $0.2$, and $0.3$. In contrast, the relative volatility fails to detect some of these metachanges along time.
We detected metachanges along state with MCD-S and with the change rate of the MDL change statistics [8]. Figure 12 shows the estimated MCAS with MCD-S and the MDL change statistics. We observe in Figure 12 that both MCD-S and the MDL change statistics detect a time point around t = 29,752, from “jog” to “stair up”. However, the MDL change statistics do not change significantly around t = 59,588, where a metachange along state occurred from “walk” to “skip”. This indicates that the change rate of the MDL change statistics failed to detect the metachange along state around t = 59,588, whereas MCD-S detected it successfully.
In summary, the proposed algorithm MCD detected metachanges along both time and state more accurately than other methods.

4.5. Real Dataset: Production Condition Data

We applied MCD to the detection of metachanges in production condition data collected from a factory of a manufacturing company. Each datum comprised eight attributes, and the stream had length 26,450. The factory reported that important events occurred 10 times during the study period, at t = 668, 2634, 2635, 9663, 13,230, 13,231, 17,372, 17,832, 20,131, and 25,441. Figure 13 shows the attributes of the stream; the dashed lines indicate the time points at which the important events occurred. We investigated whether the detected metachanges were signs of these important events and concluded that they might have been. The details are as follows.
Figure 13 shows that the attributes differ in scale. Hence, we normalized each attribute $X$ to $(X - \mu)/\sigma$, where $\mu$ and $\sigma$ are the sample mean and standard deviation, respectively, calculated over the first 250 time points. First, we applied SMDL [8] to the stream and obtained the estimated change scores $\{\Psi_t\}$ at each time point. We calculated $\Psi_t$ with the multivariate normal distribution in Equation (21). The window size $w$ of SMDL and $h$ of MCD were set to $w = h = 250$, based on field knowledge that this length roughly corresponds to one unit of production. Moreover, $\mu_{\max}$ and $\sigma_{\min}$ in Equation (22) were set to 60 and 0.001, respectively. Next, we detected change points $t_1, t_2, \dots$ as the time points at which the change score $\Psi_{t_i}$ was locally maximal within an interval where $\Psi_t > \epsilon$. We set $\epsilon = 0.3$ so that the number of detected change points would be less than 0.5% of the stream length. This limit was a business requirement of the factory, which wanted to keep the number of alarms small. The number of detected change points was 97 (0.37%). Finally, we determined the discounting parameter $r$ and the weight parameter $\lambda$ of MCD in Equation (17) using the first 5000 time points. We selected $r = 0.1$ and $\lambda = 0.2$ to maximize the AUC score for the events at $t = 2634$ and $t = 2635$. The AUC score was calculated using Equations (18) and (19).
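The preprocessing and change point extraction just described can be sketched as follows. This reflects our reading of the procedure, not the authors' code. The function names are hypothetical, the standard deviation uses `ddof=1`, and we interpret "locally maximal within an interval where $\Psi_t > \epsilon$" as one change point per maximal run of scores above $\epsilon$, located at the run's maximum.

```python
import numpy as np


def normalize(X, n_ref=250):
    """Standardize each attribute with the mean and std of the first n_ref points.

    Assumption: the sample standard deviation uses ddof=1.
    """
    X = np.asarray(X, dtype=float)
    mu = X[:n_ref].mean(axis=0)
    sigma = X[:n_ref].std(axis=0, ddof=1)
    return (X - mu) / sigma


def extract_change_points(psi, eps):
    """One change point per maximal run with psi > eps, at the run's maximum."""
    change_points, start = [], None
    for t, value in enumerate(list(psi) + [eps]):  # the sentinel closes a run at the end
        if value > eps and start is None:
            start = t
        elif value <= eps and start is not None:
            change_points.append(start + int(np.argmax(psi[start:t])))
            start = None
    return change_points
```

For example, scores `[0, 0.5, 0.8, 0.4, 0, 0, 0.9, 0.2]` with $\epsilon = 0.3$ contain two runs above the threshold, which yield change points at indices 2 and 6.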
Figure 14 shows the MDL change statistics $\{\Psi_t\}$ calculated with SMDL [8] (top), the estimated MCAT $a_{t_i}$ (second), the logarithm of the estimated MCAS $\log_{10} b_{t_i}$ (third), and the logarithm of the estimated MCI $\log_{10} s_{t_i}$ (fourth). For comparison in detecting metachanges along both time and state, we also estimated the relative volatility with VD [13,25] (fifth) and the change rate of the MDL change statistics $|(\Psi_{t_i} - \Psi_{t_{i-1}})/\Psi_{t_{i-1}}|$ (bottom). For VD, the buffer size $B$ and the reservoir size $R$ were both set to 10. In Figure 14 (top), the red points indicate the detected change points.
We summarize our observations of the metachange statistics in Figure 14 as follows:
  • t = 9663: MCI shows an increasing trend from roughly t = 5000, which can be interpreted as the combined effect of MCAT and MCAS in Figure 14. Neither the relative volatility nor the change rate of the MDL change statistics shows such a clear sign.
  • t = 13,230, 13,231, 17,372, 17,832: Between t = 10,000 and t = 15,000, MCI shows an increasing trend. This trend is also due to the combination of MCAT and MCAS but is more strongly influenced by MCAS. It might be a sign of the important events at t = 17,372 and 17,832 as well as at t = 13,230 and 13,231. The relative volatility increases after t = 13,231, which might be a sign of the important event at t = 17,372. However, the change rate of the MDL change statistics shows no such clear sign.
  • t = 25,440: Between t = 20,000 and t = 25,000, MCI shows an increasing trend with large fluctuations, again more strongly influenced by MCAS. It might be a sign of the important event at t = 25,440. The relative volatility also increases over this period, but the change rate of the MDL change statistics shows no such clear sign.
In summary, we observed a sign of a metachange for each of the important events discussed above. We therefore infer that there might have been precursory symptoms that should be analyzed using field knowledge.

5. Conclusions

We proposed the concept of metachanges along time and state in data streams and introduced metachange statistics to quantify metachanges from a unified viewpoint based on the MDL principle. The key idea of our proposed method is to encode time intervals and changes of states as code lengths in the same fashion. We then developed the online metachange detection algorithm MCD based on these statistics. Using synthetic datasets, we empirically demonstrated that the proposed algorithm was highly effective in detecting metachanges along time and state. Using real datasets, we demonstrated that the proposed algorithm could detect metachanges along both time and state, some of which were overlooked by VD [13] and the MDL change statistics [8]. The estimated metachange statistics might have been signs of important events.
Future work will be directed toward theoretical guarantees for the metachange statistics, especially the integrated metachange statistics. We will also consider how to adapt to non-stationary data streams by updating the weight parameter $\lambda$ in Equation (17). Another research direction is to extend the metachange statistics to transient periods between change points. Furthermore, metachange detection for model structure changes and their change signs is another interesting line of research.

Author Contributions

Conceptualization, S.F. and K.Y.; methodology, S.F. and K.Y.; software, S.F.; validation, S.F.; formal analysis, S.F.; investigation, S.F.; resources, K.Y.; data curation, S.F.; writing–original draft preparation, S.F.; writing–review and editing, S.F. and K.Y.; visualization, S.F.; supervision, K.Y.; project administration, K.Y.; funding acquisition, K.Y.

Funding

This work was partially supported by JST KAKENHI 19H01114 and JST-AIP JPMJCR19U4.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Yamanishi, K.; Takeuchi, J. A unifying framework for detecting outliers and change points from non-stationary time series data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Edmonton, AB, Canada, 23–25 July 2002; pp. 676–681.
  2. Takeuchi, J.; Yamanishi, K. A unifying framework for detecting outliers and change-points from time series. IEEE Trans. Knowl. Data Eng. 2006, 18, 482–492.
  3. Adams, R.; MacKay, D. Bayesian online changepoint detection. arXiv 2007, arXiv:0710.3742.
  4. Takahashi, T.; Tomioka, R.; Yamanishi, K. Discovering emerging topics in social streams via link anomaly detection. IEEE Trans. Knowl. Data Eng. 2014, 26, 120–130.
  5. Miyaguchi, K.; Yamanishi, K. On-line detection of continuous changes in stochastic processes. In Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), Paris, France, 19–21 October 2015; pp. 1–9.
  6. Yamanishi, K.; Maruyama, Y. Dynamic syslog mining for network failure monitoring. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD), Chicago, IL, USA, 21–24 August 2005; pp. 499–508.
  7. Yamanishi, K.; Maruyama, Y. Dynamic model selection with its applications to novelty detection. IEEE Trans. Inform. Theory 2007, 53, 2180–2189.
  8. Yamanishi, K.; Miyaguchi, K. Detecting gradual changes from data stream using MDL-change statistics. In Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 5–8 December 2016; pp. 156–163.
  9. Kaneko, R.; Miyaguchi, K.; Yamanishi, K. Detecting changes in streaming data with information-theoretic windowing. In Proceedings of the 2017 IEEE International Conference on Big Data (Big Data), Boston, MA, USA, 11–14 December 2017; pp. 646–655.
  10. Yamanishi, K.; Fukushima, S. Model change detection with the MDL Principle. IEEE Trans. Inform. Theory 2018, 64, 6115–6126.
  11. Aminikhanghahi, S.; Cook, D.J. A survey of methods for time series change point detection. Knowl. Inf. Syst. 2017, 51, 339–367.
  12. Kleinberg, J. Bursty and hierarchical structure in streams. Data Min. Knowl. Discov. 2003, 7, 373–397.
  13. Huang, D.; Koh, Y.S.; Dobbie, G.; Pears, R. Detecting volatility shift in data streams. In Proceedings of the 2014 IEEE International Conference on Data Mining (ICDM), Shenzhen, China, 14–17 December 2014; pp. 863–868.
  14. Huang, D.; Koh, Y.S.; Dobbie, G.; Pears, R. Tracking drift types in changing data streams. In Proceedings of the International Conference on Advanced Data Mining and Applications, Hangzhou, China, 14–16 December 2013; pp. 72–83.
  15. Aggarwal, C. A framework for diagnosing changes in evolving data streams. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD), San Diego, CA, USA, 9–13 June 2003; pp. 575–586.
  16. Spiliopoulou, M.; Ntoutsi, I.; Theodoridis, Y.; Schult, R. MONIC: Modeling and monitoring cluster transitions. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Philadelphia, PA, USA, 20–23 August 2006; pp. 706–711.
  17. Spiliopoulou, M.; Ntoutsi, E.; Theodoridis, Y.; Schult, R. MONIC and followups on modeling and monitoring cluster transitions. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Prague, Czech Republic, 23–27 September 2013; pp. 622–626.
  18. Ntoutsi, I.; Spiliopoulou, M.; Theodoridis, Y. Summarizing cluster evolution in dynamic environments. In Proceedings of the International Conference on Computational Science and Its Applications, Santander, Spain, 20–23 June 2011; pp. 562–577.
  19. Gama, J.; Žliobaitė, I.; Bifet, A.; Pechenizkiy, M.; Bouchachia, A. A survey on concept drift adaptation. ACM Comput. Surv. 2014, 46, 44:1–44:37.
  20. Rissanen, J. Optimal Estimation of Parameters; Cambridge University Press: Cambridge, UK, 2012.
  21. Bifet, A.; Gavaldá, R. Learning from time-changing data with adaptive windowing. In Proceedings of the 2007 SIAM International Conference on Data Mining, Philadelphia, PA, USA, 26–28 April 2007; pp. 443–448.
  22. van Leeuwen, M.; Siebes, A. StreamKrimp: Detecting change in data streams. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Antwerp, Belgium, 15–19 September 2008; pp. 672–687.
  23. Rissanen, J. Stochastic complexity and modeling. Ann. Stat. 1986, 14, 1080–1100.
  24. Yamanishi, K.; Takeuchi, J.; Williams, G.; Milne, P. On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. Data Min. Knowl. Discov. 2004, 8, 275–300.
  25. Huang, D. Change Mining and Analysis for Data Streams. Ph.D. Thesis, The University of Auckland, Auckland, New Zealand, 2015.
  26. Fawcett, T.; Provost, F. Activity monitoring: noticing interesting changes in behavior. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), San Diego, CA, USA, 15–18 August 1999; pp. 53–62.
  27. Liu, S.; Yamada, M.; Collier, N.; Sugiyama, M. Change-point detection in time-series data by relative density-ratio estimation. Neural Netw. 2013, 43, 72–83.
  28. Ichino, H.; Kaji, K.; Sakurada, K.; Horii, K.; Kawaguchi, N. HASC-PAC2016: Large scale human pedestrian activity corpus and its baseline recognition. In Proceedings of the UBICOMP/ISWC’16 Adjunct, Heidelberg, Germany, 12–16 September 2016; pp. 705–714.
  29. Hirai, S.; Yamanishi, K. Efficient computation of normalized maximum likelihood coding for Gaussian mixtures with its applications to optimal clustering. In Proceedings of the IEEE International Symposium on Information Theory, St. Petersburg, Russia, 31 July–5 August 2011; pp. 1031–1035.
  30. Hirai, S.; Yamanishi, K. Efficient computation of normalized maximum likelihood coding for Gaussian mixtures with its applications to optimal clustering. IEEE Trans. Inform. Theory 2013, 59, 7718–7727.
Figure 1. Conceptual illustration of metachanges.
Figure 2. Schematic of the proposed metachange detection (MCD) algorithm.
Figure 3. Metachange statistics along time (MCAT): (top) time interval at each change point; (second) MCAT $a_{t_i}$; (third) change rate of MCAT $|(a_{t_i} - a_{t_{i-1}})/a_{t_{i-1}}|$; and (bottom) the estimated parameter of the exponential distribution $\hat{\lambda}$. The discounting parameter is $r = 0.5$.
Figure 4. Metachange statistics along state (MCAS): (top) data stream $y_t$; and (bottom) MCAS $b_{t_i}$. Window size $h = 200$.
Figure 5. Dependency of AUC on discounting parameter r for MCD-T on Synthetic Dataset 1.
Figure 6. Dependency of AUC on the buffer size B (= the reservoir size R) for VD on Synthetic Dataset 1.
Figure 7. Dependency of AUC on the threshold-controlling parameter $\hat{\epsilon}$ of SEED [13] on Synthetic Dataset 1.
Figure 8. Dependency of AUC on threshold parameter ϵ for SMDL [8] and window size h of MCD-S on Synthetic Dataset 2.
Figure 9. Human action recognition data for Person06023. Each row represents accelerations for x-, y-, and z-axes, respectively.
Figure 10. Histograms of intervals for each action label.
Figure 11. MCAT with MCD-T ( r = 0.1 , 0.2 , 0.3 ) and the relative volatility with the volatility detector [13] ( B = R = 10 , 15 , 20 ).
Figure 12. MCAS of MCD-S ( h = 900 ) and the MDL change statistics ( w = 900 ).
Figure 13. Data stream of the production condition data. Red dashed line indicates the time points where the important events occurred.
Figure 14. Metachange statistics of the production condition data: (top) the MDL change statistics $\{\Psi_{t_i}\}$, where blue dots show the change points $\{t_i\}$ with $\Psi_{t_i} > \epsilon$; (second) estimated MCAT $a_{t_i}$; (third) estimated logarithm of MCAS $\log_{10} b_{t_i}$; (fourth) estimated logarithm of the integrated metachange statistics (MCI) $\log_{10} s_{t_i}$; (fifth) relative volatility [13]; and (bottom) change rate of the MDL change statistics $|(\Psi_{t_i} - \Psi_{t_{i-1}})/\Psi_{t_{i-1}}|$. Parameters: $h = w = 250$, $\epsilon = 0.3$, $B = R = 10$, $\lambda = 0.2$.
Table 1. Average area under the curve (AUC) scores on Synthetic Dataset 1 ($r = 0.2$, $\tau = 5L_2$). Boldfaces describe best performances.
L1 | L2 | SEED / MCD-T | SEED / VD | SMDL / MCD-T | SMDL / VD
100,000 | 50,000 | 0.603 ± 0.180 | 0.458 ± 0.199 | 0.500 ± 0.060 | 0.313 ± 0.160
100,000 | 10,000 | 0.621 ± 0.147 | 0.310 ± 0.167 | 0.710 ± 0.254 | 0.463 ± 0.113
100,000 | 5000 | 0.645 ± 0.129 | 0.328 ± 0.152 | 0.668 ± 0.223 | 0.416 ± 0.164
100,000 | 1000 | 0.651 ± 0.110 | 0.275 ± 0.135 | 0.512 ± 0.123 | 0.448 ± 0.107
100,000 | 500 | 0.697 ± 0.140 | 0.336 ± 0.140 | 0.660 ± 0.111 | 0.506 ± 0.138
50,000 | 100,000 | 0.788 ± 0.093 | 0.647 ± 0.107 | 0.729 ± 0.067 | 0.639 ± 0.107
50,000 | 10,000 | 0.671 ± 0.103 | 0.280 ± 0.130 | 0.605 ± 0.171 | 0.556 ± 0.060
50,000 | 5000 | 0.708 ± 0.087 | 0.293 ± 0.144 | 0.617 ± 0.183 | 0.546 ± 0.146
50,000 | 1000 | 0.718 ± 0.067 | 0.294 ± 0.140 | 0.655 ± 0.161 | 0.501 ± 0.144
50,000 | 500 | 0.767 ± 0.110 | 0.316 ± 0.133 | 0.686 ± 0.074 | 0.470 ± 0.157
10,000 | 100,000 | 0.863 ± 0.059 | 0.794 ± 0.058 | 0.877 ± 0.068 | 0.791 ± 0.015
10,000 | 50,000 | 0.834 ± 0.050 | 0.735 ± 0.050 | 0.876 ± 0.066 | 0.823 ± 0.026
10,000 | 5000 | 0.723 ± 0.040 | 0.344 ± 0.250 | 0.658 ± 0.159 | 0.498 ± 0.084
10,000 | 1000 | 0.781 ± 0.014 | 0.375 ± 0.260 | 0.689 ± 0.083 | 0.444 ± 0.077
10,000 | 500 | 0.809 ± 0.063 | 0.391 ± 0.256 | 0.671 ± 0.163 | 0.520 ± 0.070
5000 | 100,000 | 0.856 ± 0.060 | 0.796 ± 0.067 | 0.854 ± 0.071 | 0.798 ± 0.036
5000 | 50,000 | 0.825 ± 0.047 | 0.726 ± 0.062 | 0.875 ± 0.032 | 0.708 ± 0.043
5000 | 10,000 | 0.777 ± 0.030 | 0.575 ± 0.139 | 0.716 ± 0.060 | 0.630 ± 0.031
5000 | 1000 | 0.783 ± 0.009 | 0.436 ± 0.257 | 0.709 ± 0.009 | 0.353 ± 0.098
5000 | 500 | 0.816 ± 0.054 | 0.493 ± 0.269 | 0.839 ± 0.191 | 0.413 ± 0.097
1000 | 100,000 | 0.872 ± 0.072 | 0.814 ± 0.072 | 0.836 ± 0.036 | 0.812 ± 0.036
1000 | 50,000 | 0.844 ± 0.059 | 0.754 ± 0.061 | 0.947 ± 0.037 | 0.810 ± 0.027
1000 | 10,000 | 0.802 ± 0.022 | 0.668 ± 0.050 | 0.873 ± 0.049 | 0.805 ± 0.023
1000 | 5000 | 0.801 ± 0.014 | 0.648 ± 0.064 | 0.895 ± 0.053 | 0.812 ± 0.053
1000 | 500 | 0.816 ± 0.048 | 0.560 ± 0.242 | 0.711 ± 0.141 | 0.409 ± 0.108
500 | 100,000 | 0.876 ± 0.068 | 0.831 ± 0.063 | 0.830 ± 0.079 | 0.820 ± 0.023
500 | 50,000 | 0.845 ± 0.062 | 0.767 ± 0.062 | 0.836 ± 0.044 | 0.818 ± 0.010
500 | 10,000 | 0.827 ± 0.051 | 0.676 ± 0.047 | 0.872 ± 0.023 | 0.822 ± 0.016
500 | 5000 | 0.830 ± 0.047 | 0.663 ± 0.042 | 0.864 ± 0.047 | 0.819 ± 0.017
500 | 1000 | 0.830 ± 0.050 | 0.612 ± 0.100 | 0.935 ± 0.022 | 0.853 ± 0.095
Table 2. Average AUC scores on Synthetic Dataset 2. The first and second headers represent change detection and metachange detection algorithms, respectively. Boldfaces describe best performances.
L SMDL CF BOCPD ADWIN2
MCD-S SMDL-MC MCD-S SMDL-MC MCD-S SMDL-MC MCD-S SMDL-MC
500 0.887 ± 0.100 0.795 ± 0.156 0.874 ± 0.111 0.851 ± 0.170 0.701 ± 0.318 0.572 ± 0.332 0.797 ± 0.186 0.853 ± 0.114
1000 0.921 ± 0.018 0.905 ± 0.012 0.912 ± 0.042 0.830 ± 0.052 0.751 ± 0.323 0.743 ± 0.291 0.834 ± 0.094 0.847 ± 0.048
2000 0.970 ± 0.010 0.953 ± 0.011 0.912 ± 0.033 0.843 ± 0.022 0.829 ± 0.124 0.821 ± 0.138 0.951 ± 0.032 0.887 ± 0.046
Table 3. Average AUC scores of metachange detection on Synthetic Dataset 3. The first and second header rows give the change detection and metachange detection algorithms, respectively. Boldface indicates the best performance.
(a) Metachange detection along time.
L₁ L₂ SMDL CF BOCPD ADWIN2
MCD-T VD MCD-T VD MCD-T VD MCD-T VD
400 450 0.867 ± 0.022 0.845 ± 0.013 0.818 ± 0.031 0.815 ± 0.025 0.843 ± 0.053 0.825 ± 0.038 0.839 ± 0.048 0.806 ± 0.043
400 500 0.871 ± 0.021 0.867 ± 0.021 0.815 ± 0.040 0.812 ± 0.041 0.831 ± 0.049 0.814 ± 0.033 0.823 ± 0.048 0.826 ± 0.038
450 400 0.813 ± 0.024 0.804 ± 0.017 0.795 ± 0.044 0.784 ± 0.029 0.810 ± 0.038 0.805 ± 0.031 0.805 ± 0.052 0.812 ± 0.035
450 500 0.872 ± 0.014 0.863 ± 0.019 0.822 ± 0.039 0.819 ± 0.047 0.847 ± 0.032 0.829 ± 0.042 0.816 ± 0.044 0.815 ± 0.034
500 400 0.874 ± 0.024 0.867 ± 0.024 0.837 ± 0.016 0.813 ± 0.045 0.822 ± 0.019 0.797 ± 0.029 0.815 ± 0.011 0.802 ± 0.031
500 450 0.893 ± 0.015 0.873 ± 0.019 0.829 ± 0.013 0.859 ± 0.039 0.833 ± 0.011 0.823 ± 0.049 0.819 ± 0.021 0.875 ± 0.031
(b) Metachange detection along state.
L₁ L₂ SMDL CF BOCPD ADWIN2
MCD-S SMDC-MC MCD-S SMDC-MC MCD-S SMDC-MC MCD-S SMDC-MC
400 450 0.901 ± 0.012 0.857 ± 0.014 0.823 ± 0.013 0.833 ± 0.024 0.855 ± 0.021 0.858 ± 0.031 0.809 ± 0.015 0.867 ± 0.011
400 500 0.923 ± 0.016 0.911 ± 0.023 0.813 ± 0.011 0.812 ± 0.014 0.852 ± 0.034 0.851 ± 0.028 0.805 ± 0.036 0.798 ± 0.024
450 400 0.895 ± 0.022 0.875 ± 0.011 0.835 ± 0.021 0.809 ± 0.033 0.855 ± 0.034 0.853 ± 0.025 0.809 ± 0.033 0.892 ± 0.031
450 500 0.917 ± 0.017 0.905 ± 0.023 0.842 ± 0.039 0.825 ± 0.047 0.837 ± 0.051 0.819 ± 0.042 0.838 ± 0.044 0.615 ± 0.034
500 400 0.875 ± 0.024 0.863 ± 0.022 0.822 ± 0.032 0.813 ± 0.045 0.810 ± 0.026 0.797 ± 0.022 0.729 ± 0.024 0.702 ± 0.023
500 450 0.865 ± 0.021 0.823 ± 0.028 0.715 ± 0.038 0.723 ± 0.049 0.728 ± 0.045 0.706 ± 0.038 0.694 ± 0.042 0.675 ± 0.031
(c) Metachange detection along both time and state.
L₁ L₂ SMDL CF BOCPD ADWIN2
MCD MCD MCD MCD
400 450 0.985 ± 0.011 0.971 ± 0.023 0.968 ± 0.033 0.967 ± 0.029
400 500 0.989 ± 0.007 0.975 ± 0.016 0.971 ± 0.005 0.969 ± 0.031
450 400 0.983 ± 0.016 0.981 ± 0.013 0.968 ± 0.035 0.966 ± 0.014
450 500 0.987 ± 0.010 0.982 ± 0.014 0.975 ± 0.025 0.970 ± 0.029
500 400 0.979 ± 0.015 0.973 ± 0.011 0.969 ± 0.012 0.964 ± 0.013
500 450 0.975 ± 0.012 0.969 ± 0.010 0.967 ± 0.018 0.954 ± 0.021
Table 4. Best parameters for each combination of intervals.
L₁ L₂ r w h λ
400 450 0.2 0.2L₁ 0.2L₁ 0.1
400 500 0.3 0.2L₁ 0.2L₁ 0.01
450 400 0.1 0.2L₁ 0.2L₁ 0.1
450 500 0.2 0.2L₁ 0.2L₁ 0.1
500 400 0.3 0.2L₁ 0.2L₁ 0.01
500 450 0.1 0.2L₁ 0.2L₁ 0.1
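Table 4 gives w and h as fractions of L₁ rather than as absolute values. As a small illustration of that arithmetic, the sketch below stores the table as data and resolves the fractions for a given (L₁, L₂) pair. It assumes that w and h are lengths that should be rounded to integers; the table does not state this, and the sketch makes no claim about what r, w, h, and λ control inside MCD. `BEST_PARAMS` and `resolve_params` are hypothetical names.

```python
# Best parameters from Table 4. Entries written as "0.2 L1" are stored
# as multipliers of L1 and resolved to absolute lengths on demand.
BEST_PARAMS = {
    # (L1, L2): (r, w / L1, h / L1, lambda)
    (400, 450): (0.2, 0.2, 0.2, 0.1),
    (400, 500): (0.3, 0.2, 0.2, 0.01),
    (450, 400): (0.1, 0.2, 0.2, 0.1),
    (450, 500): (0.2, 0.2, 0.2, 0.1),
    (500, 400): (0.3, 0.2, 0.2, 0.01),
    (500, 450): (0.1, 0.2, 0.2, 0.1),
}


def resolve_params(l1: int, l2: int) -> dict:
    """Return Table 4's parameters with w and h converted to absolute lengths.

    Assumption (not stated in the table): w and h are integer lengths,
    so the products are rounded to the nearest integer.
    """
    r, w_ratio, h_ratio, lam = BEST_PARAMS[(l1, l2)]
    return {
        "r": r,
        "w": round(w_ratio * l1),  # e.g. 0.2 * 400 -> 80
        "h": round(h_ratio * l1),
        "lambda": lam,
    }


if __name__ == "__main__":
    for (l1, l2) in BEST_PARAMS:
        print(l1, l2, resolve_params(l1, l2))
```

For example, the pair (400, 450) resolves to w = h = 80, and the pairs with L₁ = 500 resolve to w = h = 100.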
Table 5. Files for generating a sequence of Person06023.
Action Label Files
stay HASC<N>-acc.csv (N = 0605581–0605595)
walk HASC<N>-acc.csv (N = 0608420–0608434)
jog HASC<N>-acc.csv (N = 0611173–0611187)
skip HASC<N>-acc.csv (N = 0613411–0613425)
stair up HASC<N>-acc.csv (N = 0615620–0615634)
stair down HASC<N>-acc.csv (N = 0614162–0614166)
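To make the file ranges in Table 5 concrete, the following sketch expands each range into explicit file names and pairs each file with its action label. It assumes that N is written as a zero-padded seven-digit number (as printed in the table) and that files are taken in the row order shown. The actual order used to build the Person06023 sequence is not stated here, and reading the CSV contents is left out. `FILE_RANGES`, `hasc_filenames`, and `build_file_sequence` are hypothetical helpers.

```python
# Action labels and inclusive file-number ranges, copied from Table 5.
FILE_RANGES = [
    ("stay", 605581, 605595),
    ("walk", 608420, 608434),
    ("jog", 611173, 611187),
    ("skip", 613411, 613425),
    ("stair up", 615620, 615634),
    ("stair down", 614162, 614166),
]


def hasc_filenames(first: int, last: int) -> list[str]:
    """Expand an inclusive range of N into HASC acceleration file names.

    Assumption: N is zero-padded to seven digits, e.g. 605581 -> "0605581".
    """
    return [f"HASC{n:07d}-acc.csv" for n in range(first, last + 1)]


def build_file_sequence() -> list[tuple[str, str]]:
    """Pair every file with its action label, following Table 5's row order."""
    return [
        (label, name)
        for label, first, last in FILE_RANGES
        for name in hasc_filenames(first, last)
    ]


if __name__ == "__main__":
    for label, name in build_file_sequence():
        print(label, name)
```

With the ranges as printed, this yields 15 files for each label except stair down, which covers 5, for 80 files in total.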