1. Introduction
Data that contains location and time information exists everywhere. As we move to the internet of things (IoT), many devices (smart phones and sensors) may report data back with the most recent value at a specific location and time. Stationary sites (such as weather stations) record multidimensional data at regular time intervals. Others, like computer servers and data centers, may log their performance data (e.g., number of traffic flows, system load, intrusion alerts, etc.) with time/location information. Even more challenging, the locations may change over time. For example, natural disasters, such as hurricanes, tornadoes and earthquakes, may move along a path, while diseases may spread across regions over time, carried by the movement of water, air and people.
Monitoring and understanding spatiotemporal data is nevertheless challenging, because the data not only grows quickly in size, but also becomes more complex in nature. This is further complicated by the fact that the data values are usually very dynamic, meaning they usually change not only across regions, but over time, as well. It is difficult for humans to understand the dynamics and correlation of events between time and space.
While data mining and machine learning approaches on spatiotemporal data [1,2,3,4,5,6,7] are useful, there is a gap between the data mining results and the interpretation of results, particularly in the domain of anomaly detection and situation awareness, where users usually want a more intuitive interface to view these relationships. In addition, the number of possible attribute combinations grows exponentially, so a purely data mining approach may become computationally infeasible for large and complex datasets. Having a visual interface may help to greatly reduce the computation space and allow human operators to make a decision in a shorter time.
Analyzing spatial data, for example through geographic visualization [8], is a good starting point. However, visualizing the spatial information alone does not take into consideration the causal relationships among events. Spatiotemporal visualization [9,10,11,12,13] has proven useful in analyzing such data. Nevertheless, some spatiotemporal visualizations are quite complex and have a fairly steep learning curve. Some visual designs only work on specific types of data. Many visualizations focus on aesthetics rather than on simplicity and anomaly analysis. Since investigators in situation awareness scenarios are most interested in detecting the areas where changes of values are abnormal compared to the past and to their neighbors, existing solutions are less effective for this task.
To that end, we developed a general visualization tool to analyze the spatiotemporal anomalies. This work is motivated by a simple task, i.e., given only a longitude/latitude (or x/y) location and a timestamped value at that location, can we find which location is abnormal without much a priori knowledge? To make it more challenging, there could be multiple attributes, or vectors, of values. How can we relate the value at each location to its neighbor’s value (spatial), and how does the value change over time (temporal)?
While the design principle of the tool is to make it general enough for many types of spatiotemporal data, one particular application scenario is network management. A network manager or system administrator can use the tool to gain a quick look at the current network health and find any place (data centers, servers, routers, hosts, etc.) exhibiting abnormal usage patterns, possibly due to malicious attacks, misconfiguration or hardware faults. Besides security investigation, the tool may also be useful for troubleshooting and debugging purposes. The visual analytic tool (Figure 1) allows investigators to interactively explore spatiotemporal datasets and analyze their anomalous changes. The system is built on top of a popular geographic information system (GIS), i.e., Google Earth (GE), and utilizes a generic data format, i.e., Keyhole Markup Language (KML).
Figure 1.
An overview of the spatiotemporal anomaly analytic tool. The filter (black box) and time slider (white box) allow interactive exploration of the evolution of multi-dimensional attributes. The map supports drag, spin, zoom and pan. Anomalous activities can be visually analyzed through 2D grids (a) and 3D bars (b). (a) The main visualization for spatiotemporal anomaly analysis; (b) zoomed-in view with anomaly bars in regions.
Our contribution lies in spatiotemporal anomaly detection: we study the effectiveness of various combinations of 2D/3D visual objects and spatiotemporal data analysis (clustering) using an interactive system. Unlike traditional geographic visualization, we introduce visual cues that can help users understand the correlation of anomalous events. In particular, we adopt visual schemes, such as 3D anomaly bars of different colors and sizes, for representing the value dynamics at different locations. Bars are intuitive to users and can effectively utilize the unused space above the map. Depending on how the bars are constructed, one can calculate anomaly scores that are used to encode the properties of bars. Colors of bars may represent different attributes/dimensions of data. Users can drag, spin, zoom and pan, click for queries or adjust time sliders to investigate events within a particular time window.
In addition, in order to bring some level of automation into visual analysis, we allow the tool to take outputs from spatiotemporal data mining techniques, in particular detecting significant spatiotemporal changing patterns (over-density and/or under-density clusters) through GridScan [14]. The irregularly-shaped clusters are encoded as 2D anomaly grids superimposed on the cartographic layer of the map to guide users to interesting areas that can potentially be anomalous. Such anomaly grids and bars can be used together for a better understanding of spatiotemporal anomalies. The interactive nature of the tool allows users to work on different levels of granularity during the investigation process. Through case studies on a publicly available dataset of a large enterprise network and on Air Quality Index data, we demonstrate the potential usefulness of the visualization tool. Due to its generality, the proposed spatiotemporal anomaly analytic system may be applied to other domains related to situation awareness.
The rest of this paper is organized as follows. Section 2 discusses the related work. Section 3 describes the architecture of the system, data processing, analysis and visualization design. Specifically, algorithms for encoding and analyzing anomalies are discussed. Section 4 evaluates the proposed visualization over publicly available datasets. Finally, Section 5 concludes our work.
2. Related Work
Visual analytics [15] has been valuable in exploring and analyzing general data. Many complex datasets contain spatial information, as pointed out by [16], which stresses the need for the closer integration of three largely disparate technologies: geographic visualization, knowledge discovery and geo-computation. Part of this work is related to geographic visualization [8], which focuses on the visualization of spatial data. Many real-world spatial datasets also have time attributes associated with them, i.e., they are spatiotemporal data. We try to create an interactive environment for users to analyze anomalies in spatiotemporal data.
Spatiotemporal data has become increasingly common and challenging to understand and analyze. Spatiotemporal visualization [9,10,11,12,13] is becoming an important research topic. Among them, Whisper [11] examines information diffusion in social media and microblogging for spatiotemporal patterns through the structure of a sunflower. GeoSTAT [10] is a web-based tool for visual analysis of spatiotemporal data over a map layer. In addition, spatiotemporal visualization can be achieved through time wheels, a space-time cube or a time-series graph linked to a map [9]. According to a survey [17] on the techniques and tools for visual exploratory analysis of spatiotemporal data, such data can be categorized according to the types of changes over time, e.g., existential (appearance/disappearance), spatial properties (locations) and the values of attributes (increase/decrease). A hybrid particle and texture-based approach [18] was proposed for the visualization of time-dependent vector fields. In addition, a web-based cartographic system [19] was designed for the interactive spatial analysis of social data using the potential smoothing method. Visual analysis of spatiotemporal data has been applied to social media content [20]. Visualization for analyzing movement and trajectory data has been proposed by using clustering and classification [21] and stacking trajectory bands [22]. Parallel coordinates were applied in a geographic context to visualize categorical spatiotemporal data [1].
Many visualization techniques for complex event analysis are restricted to one single dimension, e.g., time, geography or network connectivity. To counter that, GeoTime [23] visualizes the spatial inter-connectedness over time and geography in an interactive 3D view with 3D timelines imposed on the geographic map. While well-designed 2D displays may be sufficient to present an extra dimension of information [24], benefits have been proposed for moving from 2D to 3D geographical visualization [25], e.g., using 3D arc maps [26] and 3D heat maps [27]. The benefits include additional display space, additional data variables and a familiar view of the world. When a flat 3D map and spinning/interaction are possible, 3D can perform better than 2D with a space-time cube [28]. 3D pencil and helix icons [13] over maps have been adopted for visualizing spatiotemporal data, in which the icons are used to show the temporal attributes of data. Furthermore, stacking dots and lines in 3D [12] has shown usefulness as an adjunct to 2D visualization. In exploring possible visual solutions to the IEEE Conference on Visual Analytics Science and Technology (VAST 2012) challenge, M-Sieve [29] combines a map view, attribute explorer and treemap views. While a spatially ordered treemap layout [30] may be used to visualize the spatiotemporal data, the treemap view may be less intuitive for geographic data. This work builds on our previous VAST challenge work [31] by studying the effect of combining 2D spatiotemporal anomaly grids with 3D anomaly bars.
There has been research on data mining on spatial, temporal and spatiotemporal data [2]. Spatiotemporal datasets capture changing values of spatial and thematic attributes over a period of time. An event is usually defined as a spatial and temporal phenomenon that happens at a certain time and a certain location, e.g., an earthquake, hurricane or disease outbreak. Spatiotemporal data mining [3] may involve analyzing spatiotemporal topological relationship patterns, neighborhood, association rules, clustering, movement patterns and outlier analysis. One way to mine spatiotemporal patterns is to find the most frequently occurring sequences of events [4] and to use a depth-first-search-like approach for the fast discovery of long sequential patterns in spatiotemporal datasets. In addition, association rule mining [6] may be applied to spatiotemporal data.
In particular, Compieta et al. [7] analyzed large spatiotemporal data using spatial association rules based on Apriori algorithms and then displayed the mining outcomes combined with a Google Earth map. Their main objective was to predict hurricane Isabel (IEEE Visualization 2004 contest). Algorithms have been developed for discovering moving clusters in spatiotemporal trajectory data [21,32,33,34]. The movement of some objects obeys periodic patterns over regular time intervals. Spatiotemporal periodic pattern mining [5] is used to retrieve maximal periodic patterns using a specialized index structure for pruning purposes. As a result, time range queries can be answered efficiently. Spatiotemporal clustering methods, such as SaTScan [35] and GridScan [14], have also been developed to analyze such data. A discretized spatiotemporal scan [36] considers anomalous spatiotemporal windows as a set of contiguous spatial points across various temporal points that are unusual. In spatiotemporal scan statistics [37], the window shape is cylindrical.
3. Spatiotemporal Data Analysis and Visualization
In this section, we begin by describing our visual analytic system and then discuss how data is normalized and analyzed using anomaly bars and the different methods of constructing them. We illustrate how spatiotemporal clustering algorithms may be integrated to make the anomaly analysis process more efficient. In particular, we present how the color function is defined based on the user's selection of a time window. Finally, a user interaction workflow is summarized for general spatiotemporal anomaly detection.
3.2. Anomaly Bars Visualization
We simplify visual representations of situation/status data through bars, which are well understood and robust and, thus, have a gentler learning curve for ordinary users. Since most maps are 2D and the third dimension is therefore not well utilized, we adopt 3D bars to take advantage of both geographic locations and extra dimensions of attribute values. Data measurements can be seen directly and accurately at their geographic positions. We note that many geographic systems, such as Google Earth, are interactive, i.e., users can drag the map and view it from arbitrary angles, and the system can perform zoom-and-pan operations. The visual complexity associated with 3D objects is therefore likely to be alleviated by the spinning/interactive nature of the visualization system.
There can be multiple ways to construct and interpret 3D bars, as illustrated in Figure 3. Each bar has three main dimensions: longitude, latitude and altitude (height). The surface polygon is a square with equal length and width. The center of the square is the latitude and longitude (x,y)-based location of each data point. The model is flexible in terms of what the anomaly bars encode and how they are constructed. The heights of bars, i.e., the altitudes of 3D polygons, can be directly derived from the actual values of the attributes or dimensions of spatiotemporal data. These actual values may further be normalized to limit the height of each individual bar. Besides actual values, summary statistics, such as the sum, min/max or average, may also be used to encode the bar heights. Of particular interest, the temporal changes of attribute values, or Δ, may be used as bar heights to allow users to quickly see how much their systems have changed since a previous timestamp.
Besides the heights of bars, the sizes may be used to denote the granularity levels of interactive visualization. For example, when zoomed into the most detailed level on a map, the smallest bars represent values of each individual data point. When zoomed out, the larger bars may represent aggregate values of multiple data points in a region.
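The height options described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the tool's implementation: the function names, the min-max normalization scheme, the `max_height` cap and the treatment of a location missing from the previous timestamp (assumed to start from zero) are all hypothetical choices.

```python
from typing import Dict, List, Sequence


def normalize_heights(values: Sequence[float], max_height: float = 1000.0) -> List[float]:
    """Min-max scale raw attribute values into [0, max_height] (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to show: draw flat bars.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) * max_height for v in values]


def delta_heights(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Absolute change per location between two timestamps (the delta option).

    A location absent from `previous` is assumed to have started from 0.
    """
    return {loc: abs(current[loc] - previous.get(loc, 0.0)) for loc in current}
```

Either function yields values that could be used as bar altitudes; summary statistics such as sums or averages could be fed through `normalize_heights` in the same way.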
Figure 3.
Utilizing the upper space of a 2D map to encode additional attribute dimensions of spatiotemporal data. The center of the square surface is the latitude and longitude (x,y)-based location of each data point. The altitude or height of the 3D polygon represents the attribute value. Colors may represent the severity of anomalies in a region. (a) 3D bar model; (b) bars representing data dimensions.
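As a hedged sketch of how such a bar might be expressed in KML, the following builds one extruded square polygon centered on a data point. The function name, the `half_side_deg` parameter and the default color are assumptions made for illustration; the KML elements themselves (`Polygon`, `extrude`, `altitudeMode`, `PolyStyle`) are standard, and KML colors are written as `aabbggrr` hex, so `ff0000ff` is opaque red. Note that a square defined in degrees is only approximately square on the ground, since a degree of longitude shrinks with latitude.

```python
from xml.sax.saxutils import escape


def bar_placemark(name, lon, lat, height_m, half_side_deg=0.05, kml_color="ff0000ff"):
    """Return a KML Placemark for an extruded square centered on (lon, lat)."""
    d = half_side_deg
    corners = [
        (lon - d, lat - d),
        (lon + d, lat - d),
        (lon + d, lat + d),
        (lon - d, lat + d),
        (lon - d, lat - d),  # repeat the first corner to close the ring
    ]
    # KML coordinates are "lon,lat,alt" tuples separated by whitespace.
    coords = " ".join(f"{x:.6f},{y:.6f},{height_m:.1f}" for x, y in corners)
    return (
        "<Placemark>"
        f"<name>{escape(name)}</name>"
        f"<Style><PolyStyle><color>{kml_color}</color></PolyStyle></Style>"
        "<Polygon><extrude>1</extrude>"
        "<altitudeMode>relativeToGround</altitudeMode>"
        "<outerBoundaryIs><LinearRing>"
        f"<coordinates>{coords}</coordinates>"
        "</LinearRing></outerBoundaryIs></Polygon>"
        "</Placemark>"
    )
```

With `extrude` set to 1 and a relative altitude mode, the polygon is connected down to the ground, which gives the bar its sides.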
Colors may be used to reinforce the severity (or abnormality) level of a region. The color mappings follow the cold/warm color spectrum, i.e., cold (e.g., blue) means good, while warm (e.g., orange) means bad. The color spectrum function is illustrated in Equation (1). For example, we can use 10 bins if we want to divide the values equally, so that each bin contains the same number of values. Notice that the value range of each bin is not fixed, but dynamically decided based on the actual value distribution. Performing simple k-means clustering will achieve a result similar to the binning process. The benefit of this method is that we do not need a priori domain knowledge of the value range of each bin; we only need to decide the number of bins. We use this method in the Case I study.
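The equal-count binning described above can be illustrated with a short sketch. It is not Equation (1) itself; the function names and the choice of cut points (taken directly from the sorted values) are assumptions, and ties in the data can make bin counts uneven.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_edges(values: Sequence[float], n_bins: int = 10) -> List[float]:
    """Interior cut points so that each bin holds roughly the same number of values."""
    if not values or n_bins < 2:
        raise ValueError("need at least one value and at least two bins")
    ordered = sorted(values)
    # The i-th cut point is the value found i/n_bins of the way through the sorted data.
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]


def bin_index(value: float, edges: Sequence[float]) -> int:
    """Bin number in [0, len(edges)]; 0 maps to the coldest color, the last to the warmest."""
    return bisect_right(edges, value)
```

Because the edges come from the data, the value range of each bin adapts to the distribution, and only the number of bins has to be chosen in advance.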
Alternatively, we can also discretize the values into several predefined categories, each of which is mapped to one unique color. The category method is intuitive in many domains. For example, in network and system administration, depending on the syslog urgency levels, if the system load is above 80%, show red (critical); if it is between 50% and 80%, show yellow (warning); else, show green (normal). In environmental protection, depending on the air quality index, if above 250, show red (hazardous); if above 150, show yellow (unhealthy); if below 50, show green (good), etc. We use this method in the Case II study. Generally speaking, the higher and warmer the bars, the more anomalous the region.
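The category method amounts to a threshold lookup. The sketch below encodes only the system-load rule stated above; the helper name `categorize`, the `LOAD_RULES` table and the handling of exact boundary values (a load of exactly 80% counts as warning, exactly 50% as normal) are assumptions.

```python
from typing import Sequence, Tuple

# (exclusive lower bound, color), ordered from the most to the least severe.
LOAD_RULES: Sequence[Tuple[float, str]] = [(80.0, "red"), (50.0, "yellow")]


def categorize(value: float, rules: Sequence[Tuple[float, str]], default: str) -> str:
    """Return the color of the first rule whose lower bound the value exceeds."""
    for lower_bound, color in rules:
        if value > lower_bound:
            return color
    return default
```

For example, `categorize(65.0, LOAD_RULES, "green")` returns `"yellow"`. Other domains would supply their own rule table.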
3.6. Interaction and Trend Presentation
Important features of the visualization tool include interaction and trend presentation. The design of user interaction is two-fold, i.e., the granularity of data values based on zoom-in/out levels and the granularity of colors based on start-end time slider selection by users, as explained below.
In order to keep the quantity of information at an appropriate level during the investigation process, we create different scanning results by modifying grid sizes. After loading the data into the system, the investigator can perform zoom-in/out and pan operations and view anomaly grids in different sizes (Figure 4). The analysis transitions smoothly between overview and detail-on-demand. Ideally, the sizes of the grids should depend not only on the zoom levels, but also on the underlying data properties. For example, larger grids may be more suitable for one particular dataset, while smaller grids may be ideal for another dataset. Finding a good balance for the granularity of grid sizes for different datasets and incorporating it into user interaction are ongoing work and will be included in a future version of the tool. A similar interaction is also possible with the anomaly bar visualization (Figure 5) by controlling the number of bars based on different aggregation and zoom levels. In addition, users may simply click any bar to get further information, such as the original values, normalized values, heights, clusters, etc.
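Multi-level granularity can be illustrated by aggregating point values into square cells whose size changes with the zoom level. This is a plain summation sketch, not the GridScan clustering used by the tool; the function name, the `cell_deg` parameter and the use of a sum as the aggregate are assumptions.

```python
from collections import defaultdict
from math import floor
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]  # (lon, lat, value)


def aggregate_to_grid(points: Iterable[Point], cell_deg: float) -> Dict[Tuple[int, int], float]:
    """Sum point values into square cells that are cell_deg degrees per side."""
    cells: Dict[Tuple[int, int], float] = defaultdict(float)
    for lon, lat, value in points:
        # Integer cell indices; floor keeps negative coordinates in the correct cell.
        key = (floor(lon / cell_deg), floor(lat / cell_deg))
        cells[key] += value
    return dict(cells)
```

Calling the function with a large `cell_deg` gives the zoomed-out overview, while a smaller `cell_deg` gives the detailed view of the same points.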
Figure 4.
Interactive 2D anomaly grid visualization at multi-level granularity. Highlighted areas represent interesting/anomalous areas that are worth further investigation. Grid sizes may be adjusted depending on zoom levels or the underlying spatiotemporal properties of the data. (a) Level 1; (b) Level 2.
The ability to analyze the trend of status changes in the setting of situation awareness is another feature of the tool. While an overview of the entire time range of the dataset is a useful starting point, setting a time range between a beginning and an ending time lets the tool provide detail-on-demand investigation over a specified time window of events of interest. Sometimes, a potential issue may be too subtle to be identified in the overview. The common way to analyze a trend is to visualize pictures frame-by-frame, according to the timestamps. When the user drags the time slider, the visualization system dynamically generates an updated view at that specific time, creating an animation-like effect. This function is supported in our system.
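The time-window selection and frame-by-frame view can be sketched as a filter over timestamped records. The record layout, the function names and the half-open window convention `[start, end)` are assumptions chosen so that consecutive frames do not count a record twice.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[float, float, float, float]  # (timestamp, lon, lat, value)


def in_window(records: Iterable[Record], start: float, end: float) -> List[Record]:
    """Keep records whose timestamp falls in the half-open window [start, end)."""
    return [r for r in records if start <= r[0] < end]


def frames(records: Iterable[Record], start: float, end: float,
           step: float) -> Iterator[Tuple[float, List[Record]]]:
    """Yield (frame_start, records) for consecutive windows, as in an animation."""
    if step <= 0:
        raise ValueError("step must be positive")
    data = list(records)
    t = start
    while t < end:
        yield t, in_window(data, t, min(t + step, end))
        t += step
```

Each yielded frame could then be rendered as one set of bars or grids for that slice of time.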
Figure 5.
Interactive 3D anomaly bar visualization at multi-level granularity. Higher and warmer-colored bars likely indicate anomalous locations. In this example, each region contains many branches in a multinational financial corporation. (a) Region level; (b) branch level.
One shortcoming of the above approach is that humans are generally poor at remembering a sequence of images. Memorizing and comparing by shifting images back and forth can be difficult. We try to provide an alternative view that presents temporal dynamics and trends within a static view, without requiring users to remember previous images as in an animation. One useful feature (and novelty) of our tool is that the system dynamically changes the colors of anomaly grids and bars according to our color spectrum model (see Equation (1) in Section 3.7) based on the start and end time values of the slider, which controls the time window of investigation. Figure 6 shows such comparisons. Typically, a static image can substantially reduce the burden on human perception and memory.
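To make the idea of window-dependent coloring concrete, the following sketch scores each location by how much its value changed between its first and last observation inside the selected window. It is a stand-in, not Equation (1): the scoring rule, the observation layout and the function name are all assumptions, and the resulting score in [0, 1] would still have to be mapped onto the cold/warm spectrum.

```python
from typing import Dict, Iterable, Tuple

Obs = Tuple[float, str, float]  # (timestamp, location_id, value)


def window_change_scores(observations: Iterable[Obs], start: float,
                         end: float) -> Dict[str, float]:
    """Score each location in [0, 1] by |last - first| value inside [start, end]."""
    first: Dict[str, Tuple[float, float]] = {}
    last: Dict[str, Tuple[float, float]] = {}
    for t, loc, value in observations:
        if not (start <= t <= end):
            continue  # outside the slider's window
        if loc not in first or t < first[loc][0]:
            first[loc] = (t, value)
        if loc not in last or t >= last[loc][0]:
            last[loc] = (t, value)
    change = {loc: abs(last[loc][1] - first[loc][1]) for loc in first}
    peak = max(change.values(), default=0.0)
    if peak == 0.0:
        # Nothing changed in the window: every location gets the coldest score.
        return {loc: 0.0 for loc in change}
    return {loc: c / peak for loc, c in change.items()}
```

Moving either end of the slider changes which observations are compared, so the same location can appear cold in one window and warm in another within a single static view.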
For trend presentation, we have two solutions. One is to animate the moving trend and view the temporal variation of the data in either bars or grids. Animated visualization is used in this case to connect the dots between timelines. The function is achieved through a time control panel in the upper-left corner of Google Earth, as discussed above. An alternative solution to animation is to observe the dynamic trend through one static visualization, as discussed below.