
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

The trend to use large numbers of simple sensors, as opposed to a few complex sensors, to monitor places and systems creates a need for temporal pattern mining algorithms that work on such data. Existing methods that aim to discover re-usable and interpretable patterns in temporal event data have several shortcomings. We contrast several recent approaches to the problem, and extend the T-Pattern algorithm, which was previously applied for detection of sequential patterns in behavioural sciences. The temporal complexity of the T-pattern approach is prohibitive in the scenarios we consider. We remedy this with a statistical model to obtain a fast and robust algorithm to find patterns in temporal data. We test our algorithm on a recent database collected with passive infrared sensors with millions of events.

Endowing environments with a capability to respond intelligently to different situations depends on observing the activity in the environment and deriving patterns of behaviour. For instance in ubiquitous environments, a wealth of sensor data is produced by observing the behaviours and interactions of humans. Mining the data for temporal patterns aims to discover associations and structure, either in an offline manner to pave the way for new designs and applications, or in an online manner to ensure adaptation of the environment to the users.

Two things make this task especially challenging. First of all, in a real environment, action patterns that are composed of separate events are interleaved, either by the presence of multiple factors that act on the environment (such as multiple users triggering different sensors), or by the presence of single actors performing multiple actions at the same time. Thus, taking an event window to predict the next event in the system will simply not work. Secondly, these patterns exist in different time intervals, and the time difference between related events of a single action can have a large variation. Consequently, detecting associations with these patterns becomes a very challenging task, and most traditional pattern analysis methods are not directly applicable.

In this paper, we review approaches to the problem of detecting temporal patterns and extend the T-Pattern algorithm [

This paper is structured as follows: In Section 2, a detailed survey of the relevant literature is presented. The T-pattern method and our proposed modifications to it are presented in Section 3 and Section 4, respectively. In Section 5, we test our methodology on a simulated database and on the recently collected Mitsubishi Electric Research Laboratories (MERL) motion detection database [

The application of sequential pattern recognition in sensor networks includes long-term environmental monitoring [

In sensor-based applications it is possible to view the physical causes of measured sensor events as hypotheses, for which a belief can be expressed via Bayesian techniques [

There are several algorithms that can potentially discover patterns of event sequences. An early approach was proposed by [

Markov models have been recently employed to tackle simplified versions of this problem, where there are no action overlaps, and events are generated as one long sequence [

The problem of detecting interesting sequences of events in temporal data has been explored in the data mining literature [

A recent approach involves PCA-based methods to uncover daily human behaviour routines [

Finding a

The basic Lempel-Ziv algorithm (LZ78) uses an automatically updated dictionary to extract recurring “words” (patterns) in a string. It constructs a symbol tree, where the paths from root to leaves constitute the words in the dictionary. The Lempel-Ziv-Welch (LZW) variant starts off with a pre-defined basic dictionary (in the case of sensor networks these are single sensor events) to avoid ill-detected patterns at the beginning of the stream and to introduce some continuity. The Active LeZi (ALZ) uses a sliding window of length
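The dictionary mechanism described above can be illustrated with a minimal sketch (illustrative Python; sensor identifiers stand in for string symbols, and the function name is ours):

```python
def lz78_words(stream):
    """Split a symbol sequence into the incremental LZ78 dictionary words."""
    dictionary, words, current = set(), [], ()
    for symbol in stream:
        candidate = current + (symbol,)
        if candidate in dictionary:
            current = candidate          # keep extending a known phrase
        else:
            dictionary.add(candidate)    # new phrase: store it, start over
            words.append(candidate)
            current = ()
    return words

# Two alternating sensors yield progressively longer phrases.
print(lz78_words(list("ABABABA")))
```

Each new word extends a previously seen word by one symbol, which is how recurring sensor sequences gradually accumulate in the dictionary.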

LZW and Active LeZi both aim at adding continuity to LZ pattern extraction, while retaining linear complexity, a beneficial feature for a real-time event detection system. On the other hand, none of the compression-based methods take the temporal structure of the patterns into account, as the time delays are not modelled, and consequently overlapping events may escape detection. For a dense, low-cost sensor network without identification of the event source, this is a major drawback [

Most of the temporal pattern detection methods mentioned in the related work section cast the problem into a simpler representation by retaining only the order of events, and look for repeated patterns. In neural network, HMM, and compression-based approaches, the emphasis is on predicting the next event, which is not a suitable perspective for an environment where multiple people trigger sensors, and the sensor patterns that follow a logical order (

In the

To recast this problem in a mathematical framework, we first introduce some notation. We denote by t_{1}, t_{2},…, t_{n_{A}} the occurrence times of the A-events, where n_{A} is their number, and by d_{n} = t_{n} − t_{n−1} the corresponding inter-event times. Similar notation is used for B-events. Since we need to find out whether A-events tend to induce B-events, we refer to the combination of an A-event and the first subsequent B-event as an AB-event, and denote by d_{AB} the delay between them.

Magnusson introduced the notion of a critical interval (CI): an interval [d_{1}, d_{2}] is considered to be a CI for the pair of symbols (events) (A, B) if B occurs more often within [t + d_{1}, t + d_{2}] after an A-event at time t than in a random interval of the same size. He then suggests to use the standard null hypothesis that the A- and B-events are generated by independent Poisson processes with intensities λ_{A} = n_{A}/T and λ_{B} = n_{B}/T, where T denotes the total observation time. For a candidate interval [d_{1}, d_{2}] we find all the times t_{i} of A-events for which at least one B-event falls in [t_{i} + d_{1}, t_{i} + d_{2}], thus arriving at a number N_{AB} of successes. Under the null hypothesis, the probability that an interval of length d = d_{2} − d_{1} contains no B-event equals p_{0} = e^{−μ_{B}} = e^{−λ_{B}d}, where μ_{B} = λ_{B}d is the expected number of B-events in such an interval. The above-mentioned N_{AB} can then be tested for significance against the binomial distribution with parameters n_{A} and 1 − p_{0}.
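Under the Poisson null hypothesis described above, the CI test for a single candidate interval can be sketched as follows (an illustrative Python sketch; the function name and the exact binomial-tail computation are our choices, not a prescribed implementation):

```python
import math

def ci_p_value(a_times, b_times, d1, d2, total_time):
    """p-value of the observed A->B co-occurrences under the Poisson null.

    Null: B-events form a Poisson process of intensity lambda_B = n_B / T,
    independent of A. The chance that a window of length d2 - d1 holds at
    least one B-event is then p_hit = 1 - exp(-lambda_B * (d2 - d1)).
    """
    lam_b = len(b_times) / total_time
    p_hit = 1.0 - math.exp(-lam_b * (d2 - d1))
    # A-events followed by at least one B-event inside (t + d1, t + d2].
    n_ab = sum(1 for t in a_times
               if any(t + d1 < tb <= t + d2 for tb in b_times))
    n_a = len(a_times)
    # One-sided binomial tail: P(X >= n_ab) for X ~ Binomial(n_a, p_hit).
    return sum(math.comb(n_a, k) * p_hit ** k * (1 - p_hit) ** (n_a - k)
               for k in range(n_ab, n_a + 1))

# B reliably follows A by about one second, so [0.5, 1.5] is significant.
p = ci_p_value([0.0, 10.0, 20.0], [1.0, 11.0, 21.0], 0.5, 1.5, 30.0)
print(f"p = {p:.4f}")
```

A small p-value indicates that B-events fall inside the candidate interval far more often than an independent Poisson process would allow.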

T-patterns were previously used in modeling complex interactions in behavioural studies and sports events [

We propose two modifications to the T-pattern algorithm to make it more resilient to spurious patterns, and to make the search for patterns more robust.

The repeated significance testing expounded in the preceding section substantially increases the risk of false positives (suggesting spurious dependencies), since it increases the chances of finding random correlations between sensors. Applying a Bonferroni correction would be one way to mitigate this adverse effect. This can be done by replacing the

This proposition asserts that if the A and B processes are independent, then whenever an A-event occurs between two successive B-events, it will be uniformly distributed in that interval. It is intuitively clear that non-uniformity of A within the B-interval would allow a keen observer to improve his or her prediction of the next B-event, thus contradicting independence. More formally, it is well-known that for a
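This proposition can be checked empirically: for two independently simulated Poisson processes, the relative positions of A-events inside the surrounding B-intervals should be close to uniform on (0, 1). A minimal sketch (all names are ours):

```python
import bisect
import random

def poisson_times(rate, n, rng):
    """Cumulative event times of a homogeneous Poisson process."""
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        out.append(t)
    return out

def relative_positions(a_times, b_times):
    """Position of each A-event within its surrounding B-interval."""
    b = sorted(b_times)
    out = []
    for t in sorted(a_times):
        i = bisect.bisect_right(b, t)
        if 0 < i < len(b):  # A falls strictly between two B-events
            left, right = b[i - 1], b[i]
            out.append((t - left) / (right - left))
    return out

def ks_uniform(samples):
    """Kolmogorov-Smirnov distance between samples and Uniform(0, 1)."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((k + 1) / n - x, x - k / n) for k, x in enumerate(xs))

rng = random.Random(0)
a = poisson_times(1.0, 2000, rng)   # A-process, intensity 1.0
b = poisson_times(0.5, 1000, rng)   # independent B-process, intensity 0.5
u = relative_positions(a, b)
print(round(ks_uniform(u), 3))      # small distance: consistent with uniform
```

A large KS distance would instead signal non-uniformity, i.e. a dependency between the two processes.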

More complicated processes can be modelled by allowing the intensity to vary in time (

Proposition 1 therefore allows us to formulate a statistical procedure to test whether A and B are dependent: using the notation established above we compare for each event _{k}

The CI detection scheme as proposed in [ ] has a temporal complexity of O(n^{2}T^{2}), where n denotes the number of sensors and T the length of the event horizon.

A binary search over the candidate interval reduces this to log_{2}(T) significance tests per sensor pair; we refer to this shrinking-interval variant as SITPat, and to the variant that tests each of the T possible delays directly as TTPat.

Our proposed scheme (GMMTPat) has a complexity independent of the event horizon T, as it requires only a single test per sensor pair.

The peaks are even more pronounced if we plot the inter-event times on a logarithmic scale. We illustrate this on the MERL data, described in Section 5: the first peak occurs near 10^{0} = 1 second, and this peak gradually shifts to a value of approximately 10^{1} = 10 seconds for sensor 395, which is several meters down the corridor.

In order to compare the modified T-pattern approach with the original T-pattern scheme and compression based approaches, we have created a simple and realistic experimental setup by simulating a small number of interruption sensors in an office environment. A pre-defined event dictionary serves as a catalogue of prominent behaviours, where each behaviour takes the form of a number of sensor activations (events) separated by pre-defined time intervals. The events each correspond to some repeated activity, for instance going to the coffee machine, or to the photocopier. Depending on the layout and the working habits of the actors in the environment, there will be some consistent patterns, which our algorithm seeks to find. We create dummy office layouts with different sensor placements, and simulate activities of one or more users in them. Layout 1 is a rectangular office corridor block, with one door in the middle of each floor segment and sensors on the left and right hand side of three of these doors. Layout 2 consists of one entrance door connected to three corridors, and sensors placed along the corridors. Since the behaviour habits used in the simulation are known to us, we have the ground truth for the generated patterns. The existence of multiple users means that different sensors may be simultaneously activated, breaking the chain of causality (

These user-provided interval lengths are used in conjunction with an assumption of Gaussian noise between each triggered event. One or two users are simulated in the environment, where each user selects a behaviour from the dictionary, and executes it with a probability

We have tested two event dictionaries with six sensors, and generated training and test sequences by simulating one or two persons. We have investigated to what degree the patterns discovered in the training phase can be used as predictors for events in the second stream. The prediction is made for each discrete time slot, which is more granular than just predicting the next event. We have contrasted compression-based methods, T-patterns, and our modified T-pattern approach. As the first symbol emitted by each new pattern is random and therefore completely unpredictable, and as individual patterns are short, the prediction rate has an inherent upper bound.

We have associated probabilistic confidence values with each prediction. The compression-based approaches look at their prediction tree, and compute a posterior probability for each possible event. These probabilities are normalized to sum up to unity, and they indicate a confidence in the prediction; if the context is weak, the posteriors of several events will be close to each other, whereas a strong prediction is evident in a strong posterior. By setting a threshold of confidence (
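The confidence-thresholding step just described can be sketched as follows (a hypothetical helper; the 20% level matches the threshold used in the experiments below):

```python
def confident_prediction(posteriors, threshold=0.2):
    """Predict the most probable event only when its normalized posterior
    clears the confidence threshold; otherwise abstain (return None)."""
    total = sum(posteriors.values())
    if total == 0:
        return None
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] / total >= threshold else None

print(confident_prediction({"A": 0.5, "B": 0.3, "C": 0.2}))  # strong context
print(confident_prediction(dict.fromkeys("ABCDEF", 1.0)))    # weak context
```

With a weak context the posterior mass is spread over many events, none clears the threshold, and the predictor abstains rather than guess.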

For the T-patterns and the GMM T-patterns, the critical intervals are taken into account. Ordinarily, the stored patterns are useful for predicting the occurrence of multiple events in overlapping time intervals in near future. This result is more informative than the compression-based algorithm predictions. However, to make their comparison possible, we supply the algorithm with the time that an event occurs, and require the prediction of the event type. For this purpose, all detected T-patterns in the pattern dictionary are used to create their critical intervals based on a fixed history, and these are checked for inclusion of the event time. For each applicable pattern, a uniform distribution within the critical interval is assumed, and the probabilities of different patterns are combined.

We summarize the experimental results in

From the results it is evident that the T-pattern-based approaches perform better than compression-based approaches. It transpires that Magnusson's original scheme produces too many (spurious) T-patterns, making high-confidence prediction impossible. This is most apparent in the 2-person scenario, where the intermingling of 1-person patterns generates a large number of new combinations, a fair number of which are erroneously identified as T-patterns. The GMM approach fares much better, even in the more difficult 2-person scenario.

We have used the MERL motion detector dataset for larger-scale experiments [

The MERL dataset consists of activations recorded from more than 150 passive infrared (PIR) motion detectors placed around the MERL research facility over a long period of time. The PIR sensors fire when someone (or something) passes near the sensor. Via simple binary activations of these sensors, this dataset expresses the residual trace of the activity of all people working in the two-floor facility. It has been previously used in the IEEE Information Visualization Challenge, and presents a significant challenge for behaviour analysis, search, manipulation and visualization. The accompanying ground truth contains partial tracks and behaviour detections, as well as map data and anonymous calendar data. We have two separate experimental setups on this dataset.

Our first experiment considers 15 sensors and contrasts GMMTPat with the two TPattern variants we introduced before (TTPat and SITPat). We use a small portion of the MERL data for this purpose, as the temporal requirements of the TPattern variants are prohibitive. 5-fold validation with non-overlapping folds is used to report the results in this section. The 15 sensors are selected as five clusters of sensor triplets, where each triplet is in close proximity and highly correlated, but the clusters are remotely located in the building, and thus uncorrelated in principle. Any correctly sequenced within-cluster patterns are

We also consider Bonferroni correction in this section. The number of tests needs to be estimated for Bonferroni correction. The number of tests per event pair was elaborated before; we now complement this with an estimate of the number of event pairs. If there are n sensors, the number of ordered event pairs is n^{2}. Assume an event horizon of T steps. Then N_{SITPat} = log_{2}(T)n^{2}, N_{TTPat} = Tn^{2}, and N_{GMMTPat} = n^{2}. The independence testing for GMMTPat further reduces this number, as we no longer test all pairs of events for the existence of T-patterns. For the original T-pattern algorithm, on the other hand, this number would be N_{TPat} = 2T^{2}n^{2}. For an event horizon of 300 steps and 150 sensors, this means four billion tests.
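As a sanity check on these figures, a short computation (assuming our reading of the per-variant test counts, and rounding log_{2}(T) up for the binary-search variant):

```python
import math

n, T = 150, 300           # sensors and event-horizon length from the text
pairs = n ** 2            # ordered sensor pairs

n_tpat = 2 * T ** 2 * pairs                    # original T-pattern scheme
n_ttpat = T * pairs                            # TTPat: every delay tested
n_sitpat = math.ceil(math.log2(T)) * pairs     # SITPat: binary search
n_gmmtpat = pairs                              # GMMTPat: one test per pair

print(f"{n_tpat:,}")  # about four billion tests, as stated in the text
```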

In the final experiment, we perform prediction and Voronoi graph construction with GMMTPat on a large portion of the MERL set. We have used the recorded sensor events between 21 March 2006 and 11 June 2006 for training; there are four and a half million events in this subset, generated by 154 sensors. As the test set, we use a different set of recordings, collected a year later (24 May 2007–2 July 2007), comprising about two million events. Due to the large number of available instances, cross-validation was not used in this study.

The complete motion ground truth for people using the environment is not available, as the sensor outputs are sometimes ambiguous. Furthermore, a single sensor cannot produce rapid activations in close succession, so some activity is lost. Finally, the network transmission of the events from sensors to the central recording server is reported to cause minor data loss from time to time. Along with sensor activations, some information about movements called

Since the amount of data is massive, we do not construct the whole cascade of T-patterns, but look at the elementary patterns, each composed of two basic sensor events spaced at most five minutes apart. For each such pattern, the potential critical interval is found by fitting a two-component Gaussian mixture with the EM algorithm to the pooled interval times between the sensor firings, as described. Since the data are 1-dimensional, the convergence is fast (less than 10 iterations) and robust. Our experiments show that using more than 5,000 events for a single candidate pattern is not beneficial, as the distribution is very well approximated with 5,000 events. For real patterns, the first Gaussian has a very narrow shape that is characterized by a small standard deviation in comparison to the second Gaussian.
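A pure-Python sketch of this fitting step (the EM updates are standard; the initialization heuristic, which seeds the first component on the sample median with a small spread, is our illustrative choice and not prescribed by the method):

```python
import math
import random
import statistics

def em_two_gaussians(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Component 1 is seeded on the sample median with a small spread, so a
    sharp peak (if present) is captured; component 2 starts broad.
    Returns (weight, mean, std) per component, the narrower one first.
    """
    xs = list(xs)
    mu1, s1 = statistics.median(xs), statistics.pstdev(xs) / 10 + 1e-9
    mu2, s2 = statistics.fmean(xs), statistics.pstdev(xs) + 1e-9
    w1 = 0.5
    for _ in range(iters):
        # E-step: responsibility of the first component for each point.
        r = []
        for x in xs:
            p1 = w1 * math.exp(-0.5 * ((x - mu1) / s1) ** 2) / s1
            p2 = (1 - w1) * math.exp(-0.5 * ((x - mu2) / s2) ** 2) / s2
            r.append(p1 / (p1 + p2 + 1e-300))
        # M-step: re-estimate weights, means, and standard deviations.
        n1 = sum(r)
        n2 = len(xs) - n1
        w1 = n1 / len(xs)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2
                           for ri, x in zip(r, xs)) / n1) + 1e-9
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2
                           for ri, x in zip(r, xs)) / n2) + 1e-9
    return sorted([(w1, mu1, s1), (1 - w1, mu2, s2)], key=lambda c: c[2])

rng = random.Random(1)
# Synthetic delays: a sharp pattern peak near 2 s over a broad background.
delays = ([rng.gauss(2.0, 0.2) for _ in range(300)] +
          [rng.uniform(0.0, 60.0) for _ in range(300)])
(wn, mn, sn), (wb, mb, sb) = em_two_gaussians(delays)
print(f"critical interval ~ [{mn - 2 * sn:.2f}, {mn + 2 * sn:.2f}] s")
```

The narrow component recovers the pattern delay, and its mean plus or minus a few standard deviations yields the critical interval; the broad component absorbs the unrelated background firings.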

We use the elementary patterns detected by the algorithm to construct a Voronoi graph, which reflects the topology of the environment. Technically, the Voronoi graph or the Voronoi diagram of an environment is made up of points equidistant to existing obstacles, and thus serves as a roadmap [

In our implementation, every sensor is shown as a node in this graph, and once the elementary T-patterns [
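The graph-building step can be sketched as follows (names and the data layout are ours; each detected elementary pattern is reduced to its ordered sensor pair):

```python
def voronoi_graph(patterns):
    """Build an undirected sensor-adjacency graph from elementary
    T-patterns, given as (source_sensor, target_sensor) pairs."""
    edges = set()
    for a, b in patterns:
        if a != b:
            edges.add((min(a, b), max(a, b)))  # direction-agnostic edge
    adjacency = {}
    for a, b in sorted(edges):
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency

# Patterns 1->2 and 2->1 collapse to one edge; 2->3 adds another.
print(voronoi_graph([(1, 2), (2, 1), (2, 3)]))
```

Sensors linked by an edge are those between which consistent short delays were observed, which is why the resulting graph approximates the walkable topology of the environment.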

By using the T-patterns, we can try to predict events based on the activation of a given sensor. This is actually more powerful than predicting the next event in the system, as we can give a temporal window (

For each sensor activation of the test set, we looked at the two best T-patterns, and checked the corresponding critical intervals (given by two standard deviations) for the expected events. If at least one event was detected, the prediction was counted as a success. As the number of sensors increased in time, we did not take into account activations from sensors that were missing in the training data. The prediction accuracy under this protocol was 75%.
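The scoring protocol described above can be sketched as follows (all names and the data layout are hypothetical; critical intervals are taken as the mean delay plus or minus two standard deviations, and the two best patterns per sensor are checked):

```python
import bisect

def predict_and_score(activations, patterns, events_by_sensor, top_k=2):
    """Fraction of activations whose predicted follow-up event occurs.

    activations:      list of (time, sensor) pairs from the test stream.
    patterns:         dict sensor -> list of (target, mu, sigma, score),
                      best patterns first.
    events_by_sensor: dict sensor -> sorted activation times (test set).
    A prediction succeeds when any of the top_k targets fires inside its
    critical interval [t + mu - 2*sigma, t + mu + 2*sigma].
    """
    hits = total = 0
    for t, sensor in activations:
        candidates = patterns.get(sensor, [])[:top_k]
        if not candidates:
            continue   # sensor unseen in training: skipped, as in the text
        total += 1
        for target, mu, sigma, _score in candidates:
            times = events_by_sensor.get(target, [])
            lo, hi = t + mu - 2 * sigma, t + mu + 2 * sigma
            i = bisect.bisect_left(times, lo)
            if i < len(times) and times[i] <= hi:
                hits += 1
                break
    return hits / total if total else 0.0

toy_patterns = {1: [(2, 1.0, 0.1, 9.0)]}   # sensor 1 predicts sensor 2
toy_events = {2: [1.05, 5.0]}
print(predict_and_score([(0.0, 1), (4.5, 1)], toy_patterns, toy_events))
```

The binary search over the sorted activation times keeps each interval check logarithmic, which matters at the scale of millions of test events.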

It is also possible to analyse the prediction success sensor by sensor.

Recent progress in sensor technology makes it necessary to create algorithms that are capable of discovering structure in large-scale and possibly heterogeneous sensor systems. In this paper we have reviewed existing methodologies for the discovery of temporal patterns in sensor data. We have explicitly contrasted compression-based methods, which collapse the sequence into a string and then extract repetitive “words”, with the T-pattern approach, which takes advantage of the time dimension to find the typical delay between related events. We have proposed two improvements to the basic T-pattern methodology (referred to in this text as GMM T-patterns) that significantly improve the performance. Experiments show that T-patterns outperform the compression-based techniques and the proposed improvements (independence testing and GMM-modelling of correlation times) yield more reliable results.

We have applied the modified T-pattern algorithm to a recently published challenging dataset, consisting of binary motion sensor activations. We have shown that the proposed GMMTPat method significantly reduces the temporal complexity, even when contrasted to variants of the T-pattern approach that are several orders of magnitude faster than the original. We have shown the effect of Bonferroni adjustment in eliminating spurious patterns. We have also assessed the prediction accuracy, in which the detected patterns are used to predict the firing of the next sensor in the pattern, and the automatic construction of the Voronoi graph, which is a proximity-based physical map of the environment. We have validated the latter visually, by superposing it on the map of the environment that shows the true locations of the sensors. As a result, we have shown that the proposed method can be used for predicting events, or for discovering the layout from simple sensor activation patterns. The proposed method is not particular to motion sensors, and can be extended to any sensor modality where discrete events can be identified.

The application of data-mining methods to this problem seems very promising, and is conceived as a future work. In particular, the WINEPI algorithm [

E.P. would like to acknowledge partial support from EU-FP7 project LifeWatch (211372).


The distribution of the first B-event after any A-event. The mean and the standard deviation of the sharp Gaussian give the critical interval for the A-B event.

The Gaussian peaks for successive sensor firings after a given sensor event on a corridor.

Percentage of correct predictions at the 20% confidence level. Due to inherent randomness, the prediction upper bound is 70%.

| | Layout 1 | | Layout 2 | |
|---|---|---|---|---|
| | 1 person | 2 persons | 1 person | 2 persons |
| LZ | 29.8 | 17.7 | 56.5 | 13.2 |
| ALZ | 21.1 | 18.8 | 66.4 | 19.6 |
| LZW | 28.9 | 22.0 | 60.5 | 15.1 |
| T-patterns | 28.8 | 17.1 | 61.5 | 24.2 |
| GMM T-patterns | 34.8 | 29.3 | 61.9 | 48.3 |

Comparative evaluation of the GMMTPat method.

| | Without Bonferroni | | | With Bonferroni | | |
|---|---|---|---|---|---|---|
| | SITPat | TTPat | GMMTPat | SITPat | TTPat | GMMTPat |
| Spurious | 111.6 | 86.6 | 1.4 | 0.0 | 2.0 | 0.0 |
| Correct | 85.8 | 94.2 | 47.6 | 61.2 | 74.8 | 37.6 |
| Missed | 29.2 | 20.8 | 67.4 | 53.8 | 40.2 | 77.4 |
| Gray | 30.2 | 31.8 | 6.6 | 17.2 | 20.4 | 3.4 |
| | 462.910 | 17.180 | 690 | 363.740 | 12.790 | 635 |
| | 405.000 | 14.812 | 900 | 405.000 | 14.812 | 900 |