Spatiotemporal Data Mining: A Computational Perspective

Explosive growth in geospatial and temporal data as well as the emergence of new technologies emphasize the need for automated discovery of spatiotemporal knowledge. Spatiotemporal data mining studies the process of discovering interesting and previously unknown, but potentially useful patterns from large spatiotemporal


Introduction
Explosive growth in geospatial and temporal data, as well as the emergence of new technologies, emphasizes the need for automated discovery of spatiotemporal knowledge. Spatiotemporal data mining studies the process of discovering interesting and previously unknown, but potentially useful, patterns from large spatial and spatiotemporal databases [1][2][3][4][5]. Figure 1 shows the process of spatiotemporal data mining. Given input spatiotemporal data, the first step is often preprocessing to correct noise, errors, and missing data, together with exploratory space-time analysis to understand the underlying spatiotemporal distributions. Then, an appropriate spatiotemporal data mining algorithm is selected to run on the preprocessed data and produce output patterns. Common output pattern families include spatiotemporal outliers, associations and tele-couplings, predictive models, partitions and summarization, hotspots, and change patterns. Spatiotemporal data mining algorithms often have statistical foundations and integrate scalable computational techniques. Output patterns are post-processed and then interpreted by domain scientists to find novel insights and refine data mining algorithms when needed.
Societal importance: Spatiotemporal data mining techniques are crucial to organizations that make decisions based on large spatial and spatiotemporal datasets, including NASA, the National Geospatial-Intelligence Agency [6], the National Cancer Institute [7], the US Department of Transportation [8], and the National Institute of Justice [9]. These organizations are spread across many application domains. In ecology and environmental management [10][11][12][13], researchers need tools to classify remote sensing images to map forest coverage. In public safety [14], crime analysts are interested in discovering hotspot patterns from crime event maps so as to allocate police resources effectively. In transportation [15], researchers analyze historical taxi GPS trajectories to recommend fast routes between places. Epidemiologists [16] use spatiotemporal data mining techniques to detect disease outbreaks. Other application domains include earth science [17], climatology [1,18], precision agriculture [19], and the Internet of Things [20].
The interdisciplinary nature of spatiotemporal data mining means that its techniques must be developed with awareness of the underlying physics or theories in the application domains [21]. For example, climate science studies find that observable predictors for climate phenomena discovered by data science techniques can be misleading if they do not take into account climate models, locations, and seasons [22]. In this case, statistical significance testing is critically important in order to further validate or discard relationships mined from data.
Challenges: In addition to interdisciplinary challenges, spatiotemporal data mining also poses statistical and computational challenges. Extracting interesting and useful patterns from spatiotemporal datasets is more difficult than extracting corresponding patterns from traditional numeric and categorical data due to the complexity of spatiotemporal data types and relationships. According to Tobler's first law of geography, "Everything is related to everything else, but near things are more related than distant things." For example, people with similar characteristics, occupations, and backgrounds tend to cluster together in the same neighborhoods. In spatial statistics, such spatial dependence is called the spatial autocorrelation effect. Ignoring autocorrelation and assuming an identical and independent distribution (i.i.d.) when analyzing data with spatial and spatiotemporal characteristics may produce hypotheses or models that are inaccurate or inconsistent with the data set [23]. In addition to spatial dependence at nearby locations, spatiotemporal tele-coupling phenomena, such as El Niño and La Niña effects in the climate system, indicate long-range spatial dependence. Another challenge comes from the fact that spatiotemporal datasets are embedded in continuous space and time, and thus many classical data mining techniques that assume discrete data (e.g., transactions in association rule mining) may not be effective. A third challenge is spatial heterogeneity and temporal non-stationarity, i.e., spatiotemporal data samples do not follow an identical distribution across the entire space and over all time. Instead, different geographical regions and temporal periods may have distinct distributions. The modifiable areal unit problem (MAUP), or multi-scale effect, is another challenge, since the results of spatial analysis depend on the choice of spatial and temporal scales. Finally, flow and movement, together with the Lagrangian frame of reference in spatiotemporal networks, pose challenges (e.g., directionality, anisotropy, etc.).
Previous surveys: As shown in Figure 2, surveys in spatial and spatiotemporal data mining can be categorized into two groups: those without a focus on statistical foundations and those with one. Among the surveys without a focus on statistical foundations, Koperski et al. [24] and Ester et al. [25] reviewed spatial data mining from a spatial database approach; Roddick et al. [12] provided a bibliography for spatial, temporal, and spatiotemporal data mining; and Miller et al. [26] covered a list of recent spatial and spatiotemporal data mining topics but without a systematic view of statistical foundations. Among surveys covering statistical foundations, Shekhar et al. 2003 [23] reviewed several spatial pattern families focusing on spatial data's unique characteristics; Kisilevich et al. [27] reviewed spatiotemporal clustering research; Aggarwal et al. [28] included a chapter summarizing spatial and spatiotemporal outlier detection techniques; Zhou et al. [29] reviewed spatial and spatiotemporal change detection research from an interdisciplinary view; Cheng et al. 2014 [30] reviewed state-of-the-art spatiotemporal data mining research, including spatiotemporal autocorrelation, space-time forecasting and prediction, space-time clustering, and space-time visualization; and Shekhar et al. 2011 [31] gave the most recent review of spatial data mining research. However, the discussion of spatiotemporal patterns in [31] is limited (e.g., nothing on spatiotemporal change patterns or statistically significant hotspots).
In summary, no current survey in the literature provides a systematic overview of spatiotemporal data mining that covers both its statistical foundations and all major spatiotemporal pattern families. We hope this survey contributes to spatiotemporal data mining research by filling these two gaps. More specifically: (1) we provide a taxonomy of spatiotemporal data types; (2) we provide a taxonomy of spatial and spatiotemporal statistics organized by data type; (3) we survey common computational techniques for all major spatiotemporal pattern families, including spatiotemporal outliers, spatiotemporal coupling and tele-coupling, spatiotemporal prediction, spatiotemporal partitioning and summarization, spatiotemporal hotspots, and change patterns, with the techniques within each pattern family categorized by the input spatiotemporal data types; and (4) we analyze research trends and future research needs.
Organization of the paper: This survey starts with the characteristics of the data inputs of spatiotemporal data mining (Section 2) and an overview of its statistical foundation (Section 3). It then describes in detail six main output pattern families of spatiotemporal data mining: outliers, association and tele-coupling, prediction, partitioning and summarization, hotspots, and change patterns (Section 4). Common software tools are discussed in Section 5. The paper concludes with an examination of research needs and future directions in Section 6.

Input: Spatial and Spatiotemporal Data
One important aspect of spatiotemporal data mining is its input data. This section provides a taxonomy of different spatial and spatiotemporal data types. The section also summarizes their unique data attributes and relationships. The goal is to provide a systematic overview of different techniques in spatiotemporal data mining tasks.

Types of Spatial and Spatiotemporal Data
The data inputs of spatiotemporal data mining tasks are more complex than the inputs of classical data science tasks because they include discrete representations of continuous space and time. Table 1 gives a taxonomy of different spatial and spatiotemporal data types (or models). Spatial data can be categorized into three models, i.e., the object model, the field model, and the spatial network model [3,32]. Spatiotemporal data, based on how temporal information is additionally modeled, can be categorized into three types, i.e., the temporal snapshot model, the temporal change model, and the event or process model [33][34][35]. In the temporal snapshot model, spatial layers of the same theme are time-stamped. For instance, if the spatial layers are points or multi-points, their temporal snapshots are trajectories of points or spatial time series (i.e., variables observed at different times on fixed locations). Similarly, snapshots can represent trajectories of lines and polygons, raster time series, and spatiotemporal networks such as time expanded graphs (TEGs) and time aggregated graphs (TAGs) [36,37]. The temporal change model represents spatiotemporal data with a spatial layer at a given start time together with incremental changes occurring afterward. For instance, it can represent motion (e.g., Brownian motion, random walk [38]), speed, and acceleration on spatial points, as well as rotation and deformation on lines and polygons. Event and process models represent temporal information in terms of events or processes. One way to distinguish events from processes is that events are entities whose properties are possessed timelessly and therefore are not subject to change over time, whereas processes are entities that are subject to change over time (e.g., a process may be said to be accelerating or slowing down) [39].

Data Attributes and Relationships
There are three distinct types of data attributes for spatiotemporal data: non-spatiotemporal attributes, spatial attributes, and temporal attributes. Non-spatiotemporal attributes are used to characterize non-contextual features of objects, such as name, population, and unemployment rate for a city. They are the same as the attributes used in the data inputs of classical data mining [40]. Spatial attributes are used to define the spatial location (e.g., longitude and latitude), spatial extent (e.g., area, perimeter) [41,42], shape, and elevation defined in a spatial reference frame. Temporal attributes include the timestamp of a spatial object, a raster layer, or a spatial network snapshot, as well as the duration of a process. Relationships on these data attributes are summarized in Tables 2 and 3. One way to deal with implicit spatiotemporal relationships is to materialize the relationships into traditional data input columns and then apply classical data mining techniques [60][61][62][63][64]. However, the materialization can result in loss of information [23]. The spatial and temporal vagueness that naturally exists in data and relationships usually creates further modeling and processing difficulty in spatial and spatiotemporal data mining. A preferable way to capture implicit spatial and spatiotemporal relationships is to develop statistics and techniques that incorporate spatial and temporal information into the data mining process. These statistics and techniques are the main focus of the survey.

Statistical Foundations
This section provides a taxonomy of common statistical concepts for different spatial and spatiotemporal data types. Spatial and spatiotemporal statistics are distinct from classical statistics due to the unique characteristics of space and time. One important property of spatial data is spatial dependency, a property so fundamental that geographers have elevated it to the status of the first law of geography: "Everything is related to everything else, but near things are more related than distant things" [65]. Spatial dependency is measured using spatial autocorrelation. Other important properties include spatial heterogeneity, temporal autocorrelation and non-stationarity, and the multiple scale effect.

Spatial Statistics for Different Types of Spatial Data
Spatial statistics [38,66,67,68] is a branch of statistics concerned with the analysis and modeling of spatial data. The main difference between spatial statistics and classical statistics is that spatial data often fails to meet the assumption of an identical and independent distribution (i.i.d.). As summarized in Table 4, spatial statistics can be categorized according to their underlying spatial data type: geostatistics for point referenced data, lattice statistics for areal data, and spatial point processes for spatial point patterns.
Geostatistics: Geostatistics [69] deals with the analysis of spatial continuity (i.e., dependence across locations), weak stationarity (i.e., some statistical properties not changing with locations), and isotropy (i.e., uniformity in all directions), which are inherent characteristics of spatial point reference data. It is mostly concerned with developing models of spatial dependence for prediction. Under the assumption of weak stationarity or intrinsic stationarity, spatial dependence at various distances can be captured by a covariance function or a semivariogram [66]. Geostatistics provides a set of statistical tools, such as Kriging [66], which can be used to interpolate attributes at unsampled locations. However, real world spatial data often shows inherent variation in measurements of a relationship over space, due to influences of spatial context on the nature of spatial relationships. For example, human behavior can vary intrinsically over space (e.g., differing cultures). Different jurisdictions tend to produce different laws (e.g., speed limit differences between Minnesota and Wisconsin). This effect is called spatial heterogeneity or non-stationarity. Special models (e.g., local space-time Kriging [70]) can be further used to reflect the varying functions at different locations.
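To make the semivariogram concrete, the following is a minimal Python sketch of the classical empirical estimator, which averages half the squared differences of attribute values over pairs of locations grouped by separation distance. It is an illustration rather than a method taken from the survey: the function name `empirical_semivariogram`, the distance bins, and the toy transect are all assumptions.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Classical estimator: gamma(h) = mean of 0.5 * (z_i - z_j)^2 over pairs in each lag bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)            # every unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)          # NaN marks empty lag bins
    for b in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[b]) & (dist < bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = 0.5 * sqdiff[in_bin].mean()
    return gamma

# Hypothetical toy data: ten samples along a transect, values rising with position.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
values = np.arange(10.0)
gamma = empirical_semivariogram(coords, values, bin_edges=[0.5, 1.5, 2.5, 3.5])
print(gamma)
```

In this toy case the semivariance increases with lag distance, which is the signature of spatial dependence: nearby samples are more alike than distant ones. In practice a parametric model would be fitted to such estimates before Kriging.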
Lattice statistics: Lattice statistics studies statistics for spatial data in the field (or areal) model. Here a lattice refers to a countable collection of regular or irregular cells in a spatial framework. The range of spatial dependency among cells is reflected by a neighborhood relationship, which can be represented by a contiguity matrix called a W-matrix. A spatial neighborhood relationship can be defined based on spatial adjacency (e.g., rook or queen neighborhoods) or Euclidean distance, or in more general models, cliques and hypergraphs [71]. Based on a W-matrix, spatial autocorrelation statistics can be defined to measure the correlation of a non-spatial attribute across neighboring locations. Common spatial autocorrelation statistics include Moran's I, Getis-Ord Gi*, Geary's C, Gamma index Γ [69], etc., as well as their local versions called local indicators of spatial association (LISA) [72]. Several spatial statistical models, including the spatial autoregressive model (SAR), conditional autoregressive model (CAR), Markov random fields (MRF), and other Bayesian hierarchical models [66], can be used to model lattice data. Another important issue is the modifiable areal unit problem (MAUP) (also called the multi-scale effect) [73], an effect in spatial analysis whereby the results of the same analysis method change with the aggregation scale. For example, analysis using data aggregated by states will differ from analysis using data at the individual family level.
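As an illustration of how a W-matrix feeds a spatial autocorrelation statistic, the sketch below computes global Moran's I on a small regular grid with rook contiguity. It is a minimal example under stated assumptions: the helper names `rook_weights` and `morans_i`, the binary (unstandardized) weights, and the two toy attribute layouts are not from the survey.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity W-matrix for a regular grid (cells in row-major order)."""
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:                      # neighbor below
                j = (r + 1) * ncols + c
                w[i, j] = w[j, i] = 1
            if c + 1 < ncols:                      # neighbor to the right
                j = r * ncols + c + 1
                w[i, j] = w[j, i] = 1
    return w

def morans_i(values, w):
    """Global Moran's I: (n / sum(w)) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()                               # deviations from the mean
    return (len(x) / w.sum()) * (w * np.outer(z, z)).sum() / (z ** 2).sum()

w = rook_weights(4, 4)
clustered = np.repeat([0.0, 0.0, 1.0, 1.0], 4)               # top half 0, bottom half 1
checkerboard = np.indices((4, 4)).sum(axis=0).ravel() % 2    # alternating 0 and 1
print(morans_i(clustered, w), morans_i(checkerboard, w))
```

The clustered layout yields a positive value (similar values are adjacent), while the checkerboard yields a negative value (adjacent cells always differ). A significance test, for example by permutation, would normally accompany the statistic.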

Spatial point processes: A spatial point process is a model for the spatial distribution of the points in a point pattern. It differs from point reference data in that the random variables are locations. Examples include positions of trees in a forest and locations of bird habitats in a wetland. One basic type of point process is a homogeneous spatial Poisson point process (also called complete spatial randomness, or CSR) [38], where point locations are mutually independent with the same intensity over space. However, real world spatial point processes often show either spatial aggregation (clustering) or spatial inhibition instead of complete spatial independence as in CSR. Spatial statistics such as Ripley's K function [74,75], i.e., the average number of points within a certain distance of a given point divided by the overall average intensity, can be used to test a point pattern against CSR. Moreover, real world spatial point processes such as crime events often contain hotspot areas instead of following homogeneous intensity across space. A spatial scan statistic [76] can be used to detect these hotspot patterns. It tests whether the intensity of points inside a scanning window is significantly higher (or lower) than outside. Though both the K-function and spatial scan statistics have the same null hypothesis of CSR, their alternative hypotheses are quite different: the K-function tests whether points exhibit spatial aggregation or inhibition instead of independence, while spatial scan statistics assume that points are independent and test whether a hotspot with much higher intensity exists. Finally, there are other spatial point processes such as the Cox process, in which the intensity function itself is a random function over space, as well as the cluster process, which extends a basic point process with a small cluster centered on each original point [38]. For extended spatial objects such as lines and polygons, spatial point processes can be generalized to line processes and flat processes in stochastic geometry [77].
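The K-function comparison against CSR can be sketched in a few lines of Python. The estimator below is the naive one, without edge correction, so it slightly underestimates K near the boundary of the study area; the function name `ripley_k`, the unit-square study area, the radius, and both synthetic point patterns are assumptions chosen for illustration.

```python
import numpy as np

def ripley_k(points, r, area):
    """Naive K(r): area * (# ordered pairs within distance r) / (n * (n - 1)); no edge correction."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbor
    return area * (dist <= r).sum() / (n * (n - 1))

rng = np.random.default_rng(0)
r = 0.05

# Pattern 1: complete spatial randomness (uniform points in the unit square).
csr = rng.uniform(0.0, 1.0, size=(500, 2))

# Pattern 2: aggregated points, 5 hypothetical cluster centers with 100 points each.
centers = rng.uniform(0.2, 0.8, size=(5, 2))
clustered = np.clip(np.repeat(centers, 100, axis=0)
                    + rng.normal(scale=0.01, size=(500, 2)), 0.0, 1.0)

k_csr = ripley_k(csr, r, area=1.0)
k_clustered = ripley_k(clustered, r, area=1.0)
print(k_csr, k_clustered, np.pi * r ** 2)          # under CSR, K(r) is close to pi * r^2
```

Under CSR the estimate stays near the theoretical value πr², whereas the aggregated pattern produces a much larger value. A formal test would compare the observed K against an envelope obtained from simulated CSR patterns.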

Spatial network statistics: Most spatial statistics research focuses on Euclidean space. Spatial statistics on network space is much less studied. Spatial network space, e.g., river networks and street networks, is important in applications of environmental science and public safety analysis. However, it poses unique challenges, including directionality and anisotropy of spatial dependency, connectivity, and high computational cost. Statistical properties of random fields on a network are summarized in [78]. Recently, several spatial statistics, such as spatial autocorrelation, the K-function, and Kriging, have been generalized to spatial networks [79][80][81]. Little research has been done on spatiotemporal statistics on network space.

Spatiotemporal Statistics
Spatiotemporal statistics [38,82] combine spatial statistics with temporal statistics (time series analysis [83], dynamic models [82]). Table 4 summarizes common statistics for different spatiotemporal data types, including spatial time series, spatiotemporal point processes, and time series of lattice (areal) data.
There is also temporal autocorrelation and tele-coupling (high correlation across spatial time series at a long distance). Methods to model spatiotemporal processes include physics-inspired models (e.g., stochastic differential equations) [38] and hierarchical dynamic spatiotemporal models (e.g., Kalman filtering) for data assimilation [38].
Spatiotemporal point process: A spatiotemporal point process generalizes the spatial point process by incorporating the factor of time. As with spatial point processes, there is a spatiotemporal Poisson process, Cox process, and cluster process. There are also corresponding statistical tests, including a spatiotemporal K function and spatiotemporal scan statistics [38].
Time series of lattice (areal) data: Similar to lattice statistics, there is spatial and temporal autocorrelation, a SpatioTemporal Autoregressive Regression (STAR) model [85], and Bayesian hierarchical models [66]. Other spatiotemporal statistics include empirical orthogonal function (EOF) analysis (the name used in geophysics for principal component analysis), canonical-correlation analysis (CCA), and dynamic spatiotemporal models (Kalman filter) for data assimilation [82].

To understand the meaning of spatiotemporal outliers, it is useful first to consider global outliers. Global outliers [86][87][88] have been informally defined as observations in a data set which appear to be inconsistent with the remainder of that set of data, or which deviate so much from other observations as to arouse suspicions that they were generated by a different mechanism. In contrast, a spatiotemporal outlier [89][90][91][92] is a spatially and temporally referenced object whose non-spatiotemporal attribute values differ significantly from those of other objects in its spatiotemporal neighborhood. Informally, a spatiotemporal outlier is a local instability or discontinuity.

Application Domains
Detecting spatiotemporal outliers is useful in many applications including transportation, ecology, homeland security, public health, climatology, and location-based services [93,94]. For example, spatiotemporal outlier detection can be used to detect anomalous traffic patterns from sensor observations on a highway road network.

Statistical Foundation
The spatial statistics for spatial outlier detection are also applicable to spatiotemporal outliers as long as spatiotemporal neighborhoods are well-defined. The literature provides two kinds of bi-partite multidimensional tests: graphical tests, including variogram clouds [95] and Moran scatterplots [68,96], and quantitative tests, including scatterplot [97] and neighborhood spatial statistics [93,98].

Common Approaches
The intuition behind spatiotemporal outlier detection is that outliers reflect "discontinuity" in non-spatiotemporal attributes within a spatiotemporal neighborhood. Approaches can be summarized according to the input data types.
Outliers in spatial time series: For spatial time series (on point reference data, raster data, as well as graph data), basic spatial outlier detection methods, such as visualization based approaches and neighborhood based approaches, can be generalized with a definition of spatiotemporal neighborhoods. Thus, for simplicity, we only discuss these basic spatial outlier detection approaches here. The visualization approach plots spatial locations on a graph to identify spatial outliers. The common methods are variogram clouds and Moran scatterplots, as introduced earlier. The neighborhood approach defines a spatial or spatiotemporal neighborhood, and a spatial statistic is computed as the difference between the non-spatial attribute of the current location and that of the neighborhood aggregate [93,99,100,101]. Spatial neighborhoods can be identified by distances on spatial attributes (e.g., K nearest neighbors), or by graph connectivity (e.g., locations on road networks) [89]. This research has been extended in a number of ways to allow for multiple non-spatial attributes [100], average and median attribute values [99,102], weighted spatial outliers [103], categorical spatial outliers [104], local spatial outliers [105], and fast detection [106,107].
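The neighborhood approach described above can be sketched as follows: for each location, take the difference between its attribute value and the mean over its neighbors, then standardize these differences and flag large magnitudes. This is a minimal illustration, not the exact procedure of any cited work; the function name, the chain-shaped neighborhood, the sensor readings, and the threshold of 2 are all assumptions.

```python
import numpy as np

def neighborhood_outlier_scores(values, neighbors):
    """Standardized difference between each value and the mean of its neighbors' values."""
    values = np.asarray(values, dtype=float)
    diffs = np.array([values[i] - values[list(nb)].mean()
                      for i, nb in enumerate(neighbors)])
    return (diffs - diffs.mean()) / diffs.std()

# Hypothetical readings at nine locations along a road; location 4 is unusual.
values = [10, 11, 10, 12, 30, 11, 10, 12, 11]
n = len(values)
# Neighborhood by graph connectivity: adjacent locations on the chain.
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]

scores = neighborhood_outlier_scores(values, neighbors)
threshold = 2.0                                    # assumed cut-off on |score|
flagged = np.flatnonzero(np.abs(scores) > threshold)
print(flagged)
```

Note that the neighbors of the unusual location also receive inflated scores, because their neighborhood mean includes the extreme value; this side effect is one motivation for the median-based variants mentioned above. For spatiotemporal data, the `neighbors` list would be built from a spatiotemporal neighborhood definition instead.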
Flow anomalies: Given a set of observations across multiple spatial locations on a spatial network flow, flow anomaly discovery aims to identify dominant time intervals where the fraction of time instants with significantly mismatched sensor readings exceeds a given percentage threshold. Figure 3a is a simple example of problem input, which consists of two neighboring locations (i.e., an upstream (up) and a downstream (down) sensor), 10 time instants, and the notion of travel time (TT) or flow between the locations. The output contains two flow anomalies, periods 1-3 and 6-9 (indexed by the time instants at the upstream sensor), in which the majority of time points show significant differences between the two sensors (Figure 3b). Flow anomaly discovery can be considered as detecting discontinuities or inconsistencies of a non-spatiotemporal attribute within a neighborhood defined by the flow between nodes, where such discontinuities are persistent over a period of time. A time-scalable technique called SWEET (Smart Window Enumeration and Evaluation of persistent-Thresholds) was proposed [57,108,109] that utilizes several algebraic properties in the flow anomaly problem to discover these patterns efficiently. To account for flow anomalies across multiple locations, recent work [58] defines a teleconnected flow anomaly pattern and proposes a RAD (Relationship Analysis of Dynamic-neighborhoods) technique to efficiently identify this pattern.
Anomalous moving object trajectories: Detecting spatiotemporal outliers from moving object trajectories is challenging due to the high dimensionality and dynamic nature of trajectories. A context-aware stochastic model has been proposed to detect anomalous moving patterns in indoor device trajectories [110]. Another method, based on spatial deviations (distance), has been proposed for anomaly monitoring over moving object trajectory streams [111]. In this case, anomalies are defined as rare patterns with big spatial deviations from normal trajectories in a certain temporal interval. A supervised approach called Motion-Alert has also been proposed to detect anomalies in massive moving objects [112]. This approach first extracts motif features from moving object trajectories, then clusters the features, and learns a supervised model to classify whether a trajectory is an anomaly. Other techniques have been proposed to detect anomalous driving patterns from taxi GPS trajectories [113][114][115].

Spatiotemporal coupling patterns represent spatiotemporal object types whose instances often occur in close geographic and temporal proximity. These patterns can be categorized according to whether there exists temporal ordering of object types: spatiotemporal (mixed drove) co-occurrences [48] are used for unordered patterns, spatiotemporal cascades [51] for partially ordered patterns, and spatiotemporal sequential patterns [53] for totally ordered patterns. Spatiotemporal tele-coupling [46] is the pattern of significantly positive or negative temporal correlation between spatial time series data at a great distance.

Application Domains
Discovering various patterns of spatiotemporal coupling and tele-coupling is important in applications related to ecology, environmental science, public safety, and climate science. For example, identifying spatiotemporal cascade patterns from crime event datasets can help police departments understand crime generators in a city, and thus take effective measures to reduce crime events [116].

Statistical Foundation
The underlying statistic for spatiotemporal coupling patterns is the spatiotemporal cross K function [117], which extends spatiotemporal Ripley's K function (Section 3.2) to the case of multiple variables.

Common Approaches
Mixed-drove spatiotemporal co-occurrence patterns (MDCOPs) represent subsets of two or more different object types whose instances are often located in spatial and temporal proximity. Discovering MDCOPs is potentially useful in identifying tactics in battlefields and games, understanding predator-prey interactions, and in transportation (road and network) planning [118,119]. However, mining MDCOPs is computationally very expensive because the interest measures are computationally complex, datasets are larger due to the archival history, and the set of candidate patterns is exponential in the number of object types. Recent work has produced a monotonic composite interest measure for discovering MDCOPs, and novel MDCOP mining algorithms are presented in [48,120]. A filter-and-refine approach has also been proposed to identify spatiotemporal co-occurrence on extended spatial objects [49].
A spatiotemporal sequential pattern is a sequence of spatiotemporal event types in the form of $f_1 \rightarrow f_2 \rightarrow \cdots \rightarrow f_k$. It represents a "chain reaction" from event type $f_1$ to event type $f_2$, then to event type $f_3$, and so on until it reaches event type $f_k$. A spatiotemporal sequential pattern differs from a colocation pattern in that it has a total order of event types. Such patterns are important in applications such as epidemiology, where some disease transmission may follow paths between several species through spatial contacts. Mining spatiotemporal sequential patterns is challenging due to the lack of statistically meaningful measures as well as high computation cost. A measure called the sequence index, which can be interpreted by K-function statistics, was proposed in [52,53], together with computationally efficient algorithms. Other works have investigated spatiotemporal sequential patterns from data other than spatiotemporal events, such as moving object trajectories [121][122][123].
Cascading spatio-temporal patterns: Partially ordered subsets of event types whose instances are located together and occur in stages are called cascading spatio-temporal patterns (CSTPs). In the domain of public safety, events such as bar closings and football games are considered generators of crime. Preliminary analysis revealed that football games and bar closing events do indeed generate CSTPs. CSTP discovery can play an important role in disaster planning, climate change science [124,125] (e.g., understanding the effects of climate change and global warming), and public health (e.g., tracking the emergence, spread, and re-emergence of multiple infectious diseases [126]). A statistically meaningful metric was proposed to quantify interestingness, and computational pruning strategies were proposed to make the pattern discovery process more computationally efficient [50,51].
Spatial time series and teleconnection: Given a collection of spatial time series at different locations, teleconnection discovery aims to identify pairs of spatial time series whose correlation is above a given threshold. Teleconnection patterns are important in understanding oscillations in climate science. Computational challenges arise from the length of the time series and the large number of candidate pairs. An efficient index structure, called a cone-tree, as well as a filter-and-refine approach [46,127], have been proposed; they utilize spatial autocorrelation of nearby spatial time series to filter out redundant pair-wise correlation computation. Another challenge is spurious "high correlation" pairs of locations that happen by chance. Recently, statistical significance tests have been proposed to identify statistically significant teleconnection patterns called dipoles from climate data [47]. The approach uses a "wild bootstrap" to capture the spatiotemporal dependencies, and takes account of the spatial autocorrelation, the seasonality, and the trend in the time series over a period of time.
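For reference, the brute-force baseline that index structures such as the cone-tree aim to avoid can be sketched as an all-pairs correlation computation. The sketch below is an assumption-laden illustration: the function name `correlated_pairs`, the use of Pearson correlation with an absolute-value threshold (so that strongly negative pairs are also reported), and the synthetic series are not taken from the cited work, and no significance testing is performed.

```python
import numpy as np

def correlated_pairs(series, threshold):
    """Return (i, j, corr) for all location pairs with |Pearson correlation| >= threshold.

    series: array of shape (n_locations, n_timesteps).
    """
    corr = np.corrcoef(series)                     # all-pairs correlation, O(n^2) pairs
    i, j = np.triu_indices(corr.shape[0], k=1)
    keep = np.abs(corr[i, j]) >= threshold
    return [(int(a), int(b), float(corr[a, b])) for a, b in zip(i[keep], j[keep])]

# Hypothetical monthly series at three locations over ten years.
t = np.arange(120)
annual = np.sin(2 * np.pi * t / 12)
other = np.cos(2 * np.pi * t / 5)
series = np.vstack([
    annual,                      # location 0: annual oscillation
    other,                       # location 1: unrelated oscillation
    -2 * annual + 0.1 * other,   # location 2: opposite phase to location 0
])

pairs = correlated_pairs(series, threshold=0.9)
print(pairs)
```

Here only locations 0 and 2 are reported, with a strongly negative correlation of the kind associated with dipoles. With n locations the number of candidate pairs grows quadratically, which is the computational challenge noted above.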

What is Spatiotemporal Prediction?
Given spatiotemporal data items, with a set of explanatory variables (also called explanatory attributes or features) and a dependent variable (also called a target variable), the spatiotemporal prediction problem aims to learn a model that can predict the dependent variable from the explanatory variables. When the dependent variable is discrete, the problem is called spatiotemporal classification. When the dependent variable is continuous, the problem is called spatiotemporal regression. One example of a spatiotemporal classification problem is remote sensing image classification over temporal snapshots [128], where the explanatory variables consist of various spectral bands or channels (e.g., blue, green, red, infra-red, thermal, etc.) and the dependent variable is a thematic class such as forest, urban, water, or agriculture. Examples of spatiotemporal regression include yearly crop yield prediction [129] and daily temperature prediction at different locations.

Application Domains
Spatiotemporal prediction has broad applications such as land cover classification on remote sensing images [130], future trends projection in global or regional climate variables [131], and real estate price modeling [132].

Statistical Foundation
The statistical foundation of spatiotemporal prediction techniques includes classical statistics augmented to account for spatially and temporally lagged variables [133], as well as spatiotemporal statistics, including spatial and temporal autocorrelation, spatial heterogeneity, temporal non-stationarity, and the multi-scale effect (introduced in Section 3).

Spatiotemporal Autoregressive Regression (STAR): In the spatial autoregression (SAR) model, the spatial dependencies of the error term, or of the dependent variable, are directly modeled in the regression equation [134]. If the dependent values $y_i$ are related to each other, then the regression equation can be modified as $y = \rho W y + X\beta + \epsilon$, where $W$ is the neighborhood relationship contiguity matrix, $X$ is the matrix of explanatory variables with coefficients $\beta$, $\epsilon$ is the error term, and $\rho$ is a parameter that reflects the strength of the spatial dependencies between the elements of the dependent variable. For binary dependent variables, the model can be extended via the logistic function. Spatiotemporal Autoregressive Regression (STAR) extends SAR by further explicitly modeling the temporal and spatiotemporal dependency across variables at different locations. More details can be found in [68].
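To make the role of ρ and W concrete, the sketch below solves the SAR equation y = ρWy + Xβ + ε for y on a toy layout. It is an illustration under stated assumptions rather than an estimation procedure: W is row-normalized, |ρ| < 1 so that a simple fixed-point iteration converges, the noise is switched off, and all names (`row_normalize`, `sar_rhs`, `sar_solve`) and numbers are hypothetical.

```python
def row_normalize(adjacency):
    """Scale each row of a 0/1 contiguity matrix so that it sums to 1."""
    normalized = []
    for row in adjacency:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized

def sar_rhs(W, X, beta, eps, rho, y):
    """Evaluate rho*W*y + X*beta + eps for the current guess y."""
    n = len(y)
    return [
        rho * sum(W[i][j] * y[j] for j in range(n))
        + sum(x * b for x, b in zip(X[i], beta))
        + eps[i]
        for i in range(n)
    ]

def sar_solve(W, X, beta, eps, rho, iters=200):
    """Solve y = rho*W*y + X*beta + eps by fixed-point iteration.

    Assumes a row-normalized W and |rho| < 1, so the iteration contracts.
    """
    y = [0.0] * len(W)
    for _ in range(iters):
        y = sar_rhs(W, X, beta, eps, rho, y)
    return y

# Four locations on a line: 0 - 1 - 2 - 3 (toy layout).
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
W = row_normalize(adjacency)
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]]  # intercept + one feature
beta = [2.0, 1.0]
eps = [0.0, 0.0, 0.0, 0.0]  # noise switched off to keep the example deterministic

y_independent = sar_solve(W, X, beta, eps, rho=0.0)  # reduces to classical regression
y_dependent = sar_solve(W, X, beta, eps, rho=0.5)    # neighbors influence each other
print(y_independent)
print([round(v, 3) for v in y_dependent])
```

With ρ = 0 the model collapses to ordinary linear regression; with ρ > 0 each location's value is pulled toward the average of its neighbors, which is the spatial dependency the equation encodes.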
Spatiotemporal Kriging: Kriging [68] is a geostatistical technique for making predictions at locations where observations are unknown, based on locations where observations are known. In other words, Kriging is a spatial "interpolation" model. Spatial dependency is captured by the spatial covariance matrix, which can be estimated through spatial variograms. Spatiotemporal Kriging [82] generalizes spatial Kriging with a spatiotemporal covariance matrix and variograms. It can be used to make predictions from incomplete and noisy spatiotemporal data.
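A small sketch of the interpolation idea follows. It is not the formulation of [82]; it assumes simple Kriging with a known mean and a separable exponential space-time covariance whose sill and ranges are fixed by hand instead of being estimated from variograms. The helper names (`solve`, `st_cov`, `simple_kriging`) and the sample values are hypothetical.

```python
from math import exp, hypot

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def st_cov(p, q, sill=1.0, space_range=2.0, time_range=3.0):
    """Assumed separable exponential covariance between two (x, y, t) points."""
    ds = hypot(p[0] - q[0], p[1] - q[1])
    dt = abs(p[2] - q[2])
    return sill * exp(-ds / space_range) * exp(-dt / time_range)

def simple_kriging(points, values, target, mean):
    """Predict the value at `target` as the mean plus weighted residuals."""
    K = [[st_cov(p, q) for q in points] for p in points]  # covariance among samples
    k0 = [st_cov(p, target) for p in points]              # covariance to the target
    weights = solve(K, k0)
    return mean + sum(w * (v - mean) for w, v in zip(weights, values))

# Toy observations at (x, y, t) with an assumed known mean of 12.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
values = [10.0, 12.0, 11.0, 15.0]
mean = 12.0

at_sample = simple_kriging(points, values, (0.0, 0.0, 0.0), mean)
nearby = simple_kriging(points, values, (0.5, 0.5, 0.5), mean)
far_away = simple_kriging(points, values, (100.0, 100.0, 100.0), mean)
print(round(at_sample, 3), round(nearby, 3), round(far_away, 3))
```

Two properties of the interpolator are visible in the toy run: the prediction reproduces the observation at a sampled point, and it falls back to the mean far from all samples, where the covariance to every observation is close to zero.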
Hierarchical Dynamic Spatiotemporal Models: Hierarchical dynamic spatiotemporal models (DSMs) [82], as the name suggests, aim to model spatiotemporal processes dynamically within a Bayesian hierarchical framework. At the top is a data model, which represents the conditional dependency of (actual or potential) observations on the underlying hidden process with latent variables. In the middle is a process model, which captures the spatiotemporal dependency of the hidden process. At the bottom is a parameter model, which captures the prior distributions of the model parameters. DSMs have been widely used in climate science and environmental science, e.g., for simulating population growth or atmospheric and oceanic processes. For model inference, the Kalman filter can be used under the assumption of linear and Gaussian models.
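The linear Gaussian case can be illustrated with a scalar Kalman filter at a single location. This is a deliberately reduced sketch: a full DSM would carry a state vector over many locations with a spatial transition matrix. The random-walk process model, the function name `kalman_1d`, and the noise variances `q` and `r` are assumptions made for the example.

```python
def kalman_1d(observations, q, r, x0=0.0, p0=1.0):
    """Filter a scalar series under a random-walk process model.

    Process model: x_t = x_{t-1} + w_t, with Var(w_t) = q
    Data model:    z_t = x_t + v_t,     with Var(v_t) = r
    Returns the filtered state estimate after each observation.
    """
    x, p = x0, p0
    estimates = []
    for z in observations:
        # Predict: the state mean is unchanged, its uncertainty grows by q.
        p = p + q
        # Update: blend prediction and observation using the Kalman gain.
        gain = p / (p + r)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates

# A constant true signal of 10 observed repeatedly; the filter starts at 0.
smoothed = kalman_1d([10.0] * 50, q=0.01, r=1.0)
print(round(smoothed[0], 3), round(smoothed[-1], 3))

# With zero observation noise the filter trusts each observation completely.
exact = kalman_1d([1.0, 2.0, 3.0], q=0.1, r=0.0)
print(exact)
```

The two stages in the loop mirror the hierarchy described above: the predict step plays the part of the process model and the update step that of the data model, while `q`, `r`, `x0`, and `p0` stand in for quantities that a parameter model would place priors on.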

What is Spatiotemporal Partitioning and Summarization?
Spatiotemporal partitioning, or spatiotemporal clustering, is the process of grouping similar spatiotemporal data items, and thus partitioning the underlying space and time [27]. It is important to note that spatiotemporal partitioning or clustering is closely related to, but not the same as, spatiotemporal hotspot detection. Hotspots can be considered special clusters in which events or activities inside the cluster have much higher intensity than outside.
Spatiotemporal summarization aims to provide a compact representation of spatiotemporal data. For example, traffic accident events on a road network can be summarized into several main routes that cover most of the accidents. Spatiotemporal summarization is often done after or together with spatiotemporal partitioning so that objects in each partition can be summarized by aggregated statistics or representative objects.

Application Domains
Spatiotemporal partitioning and summarization are important in many societal applications such as public safety, public health, and environmental science. For example, partitioning and summarizing crime data, which is spatial and temporal in nature, helps law enforcement agencies find crime trends and deploy their police resources effectively [135].

Statistical Foundation
Relevant statistics for spatiotemporal partitioning and summarization include spatiotemporal point density estimation [38] (e.g., kernel density estimation) and temporal correlation for spatial time series.

Common Approaches
Spatiotemporal Event Partitioning: Some classic clustering algorithms for two-dimensional space can be easily generalized to spatiotemporal scenarios [136]. These techniques can be classified as global partitioning, density-based, hierarchical, and graph-based. Global partitioning groups spatial objects to maximize within-group similarity. Examples are K-means, K-medoids, the EM algorithm, CLIQUE [137], BIRCH, and CLARANS [138]. Density-based approaches first identify "dense" points and connect them to form contiguous clusters or partitions. Examples include ST-DBSCAN [139], a spatiotemporal extension of its spatial version called DBSCAN [140,141], as well as ST-GRID [142], which splits space and time into 3D cells and merges dense cells together into clusters. The hierarchical approach, e.g., agglomerative [40], dendrogram [40], and BIRCH [143], partitions or groups spatiotemporal data at different hierarchical levels. Graph-based approaches such as Chameleon [40,144] first represent data items as a sparse k-nearest-neighbor graph, then partition the graph into segments, and hierarchically merge graph segments according to the similarity between original segments and merged segments.
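The idea of splitting space and time into 3D cells and merging dense cells can be sketched as below. This is a simplified illustration of the grid-based idea only, not the published ST-GRID algorithm [142]; the function name `grid_clusters`, the face-adjacency rule for merging, and the parameters `cell_size`, `time_step`, and `min_count` are assumptions chosen for the example.

```python
from collections import Counter, deque

# Offsets to the six face-adjacent neighbors of a 3D cell (x, y, t).
NEIGHBOR_OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def grid_clusters(events, cell_size, time_step, min_count):
    """Bin (x, y, t) events into 3D cells, keep dense cells, merge adjacent ones."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size), int(t // time_step))
        for x, y, t in events
    )
    dense = {cell for cell, count in counts.items() if count >= min_count}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:  # breadth-first search over adjacent dense cells
            cx, cy, ct = queue.popleft()
            members.append((cx, cy, ct))
            for dx, dy, dt in NEIGHBOR_OFFSETS:
                neighbor = (cx + dx, cy + dy, ct + dt)
                if neighbor in dense and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(members))
    return clusters

events = [
    (0.1, 0.2, 0.5), (0.4, 0.3, 0.2), (0.7, 0.9, 0.8),  # falls in cell (0, 0, 0)
    (1.2, 0.1, 0.4), (1.5, 0.6, 0.9), (1.9, 0.2, 0.1),  # cell (1, 0, 0), adjacent in space
    (5.5, 5.5, 3.5), (5.1, 5.9, 3.2), (5.8, 5.3, 3.7),  # cell (5, 5, 3), far away
    (9.0, 9.0, 9.0),                                    # isolated event, cell is not dense
]
clusters = grid_clusters(events, cell_size=1.0, time_step=1.0, min_count=3)
print(clusters)
```

The choice of `cell_size` relative to `time_step` fixes how space and time are traded off, which is a modeling decision in any approach that treats time as a third dimension.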
Spatial Time-Series Partitioning: Spatial time series partitioning aims to divide the space into regions such that the correlation or similarity between time series within the same region is maximized. Global partitioning methods such as K-means, K-medoids, and EM can be applied, as can the hierarchical approach. However, due to the high dimensionality of spatial time series, density-based and graph-based approaches are often not effective. When computing similarity between spatial time series, a filter-and-refine approach [46] can be used to avoid redundant computation.
Trajectory Data Partitioning: Trajectory data partitioning aims to partition trajectories into groups according to their similarity. Algorithms are of two types: density-based and frequency-based. The density-based approaches [145,146] first break trajectories into small segments and then apply density-based clustering algorithms similar to DBSCAN [140] to connect dense areas of segments. The frequency-based approach [147] uses association rule mining [63] algorithms to identify subsections of trajectories that have high frequencies (also called high "support").
Spatiotemporal Summarization: Data summarization aims to find a compact representation of a data set [148]. It is important for data compression as well as for making pattern analysis more convenient. As shown in Table 5, data summarization can be conducted on classical data, spatial data, and spatiotemporal data. For spatial time series data, summarization can be done by removing the spatial and temporal redundancy that results from autocorrelation. A family of such algorithms has been used to summarize traffic data streams [149]. Similarly, the centroids from K-means can also be used to summarize spatial time series. For trajectory data, especially spatial network trajectories, summarization is more challenging due to the huge cost of similarity computation. A recent approach summarizes network trajectories into k primary corridors [150,151]. The work proposes efficient algorithms to reduce the cost of network trajectory distance computation.

What are Spatiotemporal Hotspots?
Given a set of spatial objects (e.g., activity locations) in a study area, spatiotemporal hotspots are regions, together with certain time intervals, where the number of objects is anomalously or unexpectedly high within those time intervals. Spatiotemporal hotspots are a special kind of clustered pattern whose inside has significantly higher intensity than its outside.

Application Domains
Application domains for spatiotemporal hotspot detection range from public health to criminology. For example, in epidemiology, finding disease hotspots allows officials to detect an epidemic and allocate resources to limit its spread [152].

Statistical Foundation
Spatiotemporal scan statistics [76,152] are used to detect statistically significant hotspots from spatiotemporal datasets. The method uses a cylinder to scan space-time for candidate hotspots and performs hypothesis testing. The null hypothesis states that the activity points are distributed randomly according to a homogeneous (i.e., same intensity) Poisson process over the geographical space. The alternative hypothesis states that the inside of the cylinder has a higher intensity of activities than the outside. A test statistic called the log likelihood ratio is computed for each candidate hotspot (or cylinder), and the candidate with the highest likelihood ratio is then evaluated for statistical significance (i.e., using a p-value).
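The test statistic for a single candidate cylinder can be sketched as follows, using the standard Poisson log likelihood ratio for an elevated rate inside a window. Several simplifications are assumed: the expected count is proportional to the cylinder's space-time volume (homogeneous intensity), the cylinder lies entirely inside the study region, and the Monte Carlo step that turns the maximum ratio into a p-value is omitted. The function names and toy events are hypothetical.

```python
from math import hypot, log, pi

def poisson_llr(c, e, total):
    """Log likelihood ratio for c observed vs. e expected events out of `total`."""
    if c <= e:
        return 0.0  # only an elevated intensity inside the cylinder is of interest
    inside = c * log(c / e)
    outside = (total - c) * log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside

def cylinder_llr(events, center, radius, t_start, t_end, area, duration):
    """Count events in a space-time cylinder and score it against the null model."""
    cx, cy = center
    observed = sum(
        1 for x, y, t in events
        if hypot(x - cx, y - cy) <= radius and t_start <= t <= t_end
    )
    total = len(events)
    # Under a homogeneous Poisson process the expected count is proportional
    # to the share of the study region's space-time volume inside the cylinder.
    volume_share = (pi * radius ** 2 * (t_end - t_start)) / (area * duration)
    expected = total * volume_share
    return observed, expected, poisson_llr(observed, expected, total)

# Toy study region: a 10 x 10 area observed for 10 time units.
events = [
    (2.0, 2.0, 1.5), (2.3, 1.8, 2.0), (1.7, 2.2, 2.5),
    (2.1, 2.4, 1.2), (1.9, 1.6, 2.8), (2.5, 2.1, 2.2),   # clustered in space and time
    (8.0, 8.0, 9.0), (6.0, 3.0, 5.0), (4.0, 9.0, 7.0), (9.0, 1.0, 0.5),
]
hot = cylinder_llr(events, (2.0, 2.0), 1.0, 1.0, 3.0, area=100.0, duration=10.0)
cold = cylinder_llr(events, (7.0, 6.0), 1.0, 1.0, 3.0, area=100.0, duration=10.0)
print(hot)
print(cold)
```

In a full scan, this score would be computed for many cylinder centers, radii, and time intervals, and the largest value would be compared against the maxima obtained from data sets simulated under the null hypothesis.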

Common Approaches
Clustering-based approaches: Clustering methods can be used to identify candidate areas for further evaluation as spatiotemporal hotspots. These methods include global partitioning, density-based clustering, and hierarchical clustering (see Section 4.4). They can be used as a preprocessing step to generate candidate hotspot areas, after which statistical tools may be used to test statistical significance. CrimeStat, a software package for spatial analysis of crime locations, incorporates several clustering methods to determine the crime hotspots in a study area. The CrimeStat package has a k-means tool, a nearest neighbor hierarchical (NNH) clustering tool [153], a Risk Adjusted NNH (RANNH) tool, a STAC Hot Spot Area tool [135], and a Local Indicator of Spatial Association (LISA) tool [72], which are used to evaluate potential hotspot areas. Although many of the clustering methods mentioned above are generally designed for two-dimensional Euclidean space and mostly used for purely spatial data, they can be used to identify candidate spatiotemporal hotspots by treating the temporal part of the data as a third dimension. For example, DBSCAN can cluster both spatial and spatiotemporal data using the density of the data as its measure.
Spatiotemporal Scan Statistics based approaches: Spatiotemporal hotspot detection can be seen as an extension of purely spatial hotspot detection in which time is added as a third dimension. Two types of spatiotemporal hotspots are of particular importance: "persistent" and "emerging" spatiotemporal hotspots. A "persistent" spatiotemporal hotspot is a region where the rate of observations is consistently high over time. Persistent hotspot detection therefore assumes that the risk of a hotspot (i.e., outbreak) is constant over time, and it searches over space and time for a hotspot by simply totaling the number of observations in each time interval. An example tool for persistent spatiotemporal hotspot detection is SaTScan, which uses a cylindrical window in three dimensions (time is the third dimension) instead of the circular window in two dimensions [152] used for detecting spatial hotspots. An "emerging" spatiotemporal hotspot is a region where the rate of observations is monotonically increasing over time [154,155]. This kind of spatiotemporal hotspot occurs when an outbreak emerges, causing a sudden increase in the number of observations. Such phenomena can be observed in epidemiology, where the number of disease cases suddenly increases at the start of an outbreak. Tools for the detection of emerging spatiotemporal hotspots use spatial scan statistics with a change in expectation over time [156].

What is Spatiotemporal Change?
Although the single term "change" is used to name the spatiotemporal change footprint patterns in different applications, the underlying phenomena may differ significantly. This section briefly summarizes the main ways a change may be defined in spatiotemporal data [29]:
Change in Statistical Parameter: In this case, the data is assumed to follow a certain distribution and the change is defined as a shift in this statistical distribution. For example, in statistical quality control, a change in the mean or variance of the sensor readings is used to detect a fault.
Change in Actual Value: Here, change is modeled as the difference between a data value and its spatial or temporal neighborhood. For example, in a one-dimensional continuous function, the magnitude of change can be characterized by the derivative function, while on a two-dimensional surface, it can be characterized by the gradient magnitude.
Change in Models Fitted to Data: This type of change is identified when a number of function models are fitted to the data and one or more of the models exhibits a change (e.g., a discontinuity between consecutive linear functions) [157].
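As a concrete reading of the "change in actual value" definition, the sketch below approximates the gradient magnitude on a two-dimensional grid with finite differences. The function name `gradient_magnitude`, the unit grid spacing, and the use of one-sided differences at the borders are assumptions made for illustration; the grid must have at least two rows and two columns.

```python
from math import hypot

def gradient_magnitude(grid):
    """Approximate the gradient magnitude at each cell of a 2D grid.

    Uses central differences in the interior and one-sided differences at
    the borders, with unit spacing between neighboring cells.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            i0, i1 = max(i - 1, 0), min(i + 1, rows - 1)
            j0, j1 = max(j - 1, 0), min(j + 1, cols - 1)
            d_row = (grid[i1][j] - grid[i0][j]) / (i1 - i0)
            d_col = (grid[i][j1] - grid[i][j0]) / (j1 - j0)
            out[i][j] = hypot(d_col, d_row)
    return out

# A tilted plane: the value grows by 3 per row and by 2 per column.
plane = [[3.0 * i + 2.0 * j for j in range(4)] for i in range(3)]
magnitudes = gradient_magnitude(plane)
print(magnitudes[1][1])

# A flat surface with one abrupt step between the second and third columns.
step = [[0.0, 0.0, 5.0, 5.0] for _ in range(3)]
print(gradient_magnitude(step)[1])
```

On the plane the magnitude is the same everywhere, while on the step surface it is nonzero only next to the discontinuity, which is where a change in actual value would be reported.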

Common Approaches
This section follows the taxonomy of spatiotemporal change footprint patterns proposed in [29]. In this taxonomy, spatiotemporal change footprints are classified along two dimensions: temporal and spatial. Temporal footprints are classified into four categories: single snapshot, set of snapshots, point in a long series, and interval in a long series. A single snapshot refers to a purely spatial change that does not have a temporal context. A set of snapshots indicates a change between two or more snapshots of the same spatial field, e.g., satellite images of the same region. Spatial footprints can be classified as raster footprints or vector footprints. Vector footprints are further classified into four categories: point(s), line(s), polygon(s), and network footprint patterns. Raster footprints are classified based on the scale of the pattern, namely, local, focal, or zonal patterns. This classification describes the scale of the change operation of a given phenomenon in the spatial raster field [158]. Local patterns are patterns in which change at a given location depends only on attributes at that location. Focal patterns are patterns in which change at a location depends on attributes at that location and in its assumed neighborhood. Zonal patterns define change using an aggregation of location values in a region.
Spatiotemporal Change Patterns with Raster-Based Spatial Footprint: This includes patterns of spatial changes between snapshots. In remote sensing, detecting changes between satellite images can help identify land cover change due to human activity, natural disasters, or climate change [159][160][161]. Given two geographically aligned raster images, this problem aims to find a collection of pixels that have significant changes between the two images [162]. This pattern is classified as a local change between snapshots since the change at a given pixel is assumed to be independent of changes at other pixels. Alternative definitions have assumed that a change at a pixel also depends on its neighborhood [163]. For example, the pixel values in each block may be assumed to follow a Gaussian distribution [164]. We refer to this type of change footprint pattern as a focal spatial change between snapshots. Researchers in remote sensing and image processing have also tried to apply image change detection to objects instead of pixels [165][166][167], yielding zonal spatial change patterns between snapshots.
A well-known technique for detecting a local change footprint is simple differencing. The technique starts by calculating the differences between the corresponding pixel intensities in the two images. A change at a pixel is flagged if the difference at that pixel exceeds a certain threshold. Alternative approaches have also been proposed to discover focal change footprints between images. For example, the block-based density ratio test detects change based on a group of pixels, known as a block [168,169]. Object-based approaches in remote sensing [167,170,171] employ image segmentation techniques to partition temporal snapshots of images into homogeneous objects [172] and then classify object pairs in the two temporal snapshots into change or no-change classes.
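Simple differencing is short enough to sketch directly. The example assumes two aligned single-band images stored as nested lists of intensities; the function name `simple_differencing`, the use of the absolute difference, and the threshold value are choices made for illustration.

```python
def simple_differencing(image_a, image_b, threshold):
    """Flag pixels whose absolute intensity difference exceeds the threshold."""
    return [
        [abs(b - a) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]

# Two aligned 3 x 3 snapshots of the same area (toy intensities).
before = [[10, 10, 10],
          [10, 10, 10],
          [10, 10, 10]]
after = [[10, 12, 10],
         [10, 60, 55],
         [10, 10, 9]]

change_mask = simple_differencing(before, after, threshold=20)
changed_pixels = [
    (i, j)
    for i, row in enumerate(change_mask)
    for j, flag in enumerate(row)
    if flag
]
print(changed_pixels)
```

Each pixel is judged on its own, which is what makes this a local change footprint; small differences such as the ones at pixels (0, 1) and (2, 2) stay below the threshold and are not flagged.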
Spatiotemporal Change Patterns with Vector-Based Spatial Footprint: This includes the spatiotemporal volume change footprint pattern. This pattern represents a change process occurring in a spatial region (a polygon) during a time interval. For example, an outbreak of a disease can be defined as an increase in disease reports in a certain region during a certain time window up to the current time. Change patterns known to have a spatiotemporal volume footprint include the spatiotemporal scan statistics [173,174], a generalization of the spatial scan statistic, and the emerging spatiotemporal clusters defined by [156].

Spatial and Spatiotemporal Analysis Tools
This section lists existing spatial and spatiotemporal analysis tools, including geographic information system (GIS) software, spatial and spatiotemporal statistical tools, spatial database management systems, and spatial big data platforms.
GIS Software: ArcGIS [175] is currently the most widely used commercial GIS software for working with maps and geographic information. It has an extension named Tracking Analyst to support visualization and analysis of spatiotemporal data. QGIS [176] (previously Quantum GIS) is a very popular open source GIS software package.
Spatial Statistical Tools: R provides many packages for spatial and spatiotemporal statistical analysis [177], such as spatstat for point pattern analysis, gstat and geoR for geostatistics, and spdep for areal data analysis. Matlab also provides the Mapping Toolbox [178] and other spatial statistical toolboxes. SAS has recently added support for spatial statistics [179], such as the KRIGE2D procedure for Kriging, the SIM2D procedure for Gaussian random fields, the SPP procedure for spatial point patterns, and the VARIOGRAM procedure for variograms.
Spatial Database Management Systems: Many commercial databases provide extensions to support spatial data, such as Oracle Spatial [180] and DB2 Spatial Extender [181]. PostGIS [182] is a widely used open source spatial database management system, which is an extension to Postgres, an object-relational DBMS.
Spatial Big Data Platform: The upcoming spatial big data from vehicle GPS trajectories, cellphone location data, and remote sensing imagery exceeds the capabilities of traditional spatial DBMSs and requires new platforms to support scalable spatial analysis. Current spatial big data platforms include ESRI GIS Tools for Hadoop [183,184], Hadoop GIS [185], and SpatialHadoop [186].

Research Trend and Future Research Needs
Most current research in spatiotemporal data mining uses Euclidean space, which often assumes isotropy and symmetric neighborhoods. However, in many real-world applications, the underlying space is a network space, such as a river network or a road network [187][188][189]. One of the main challenges in spatial and spatiotemporal network data mining is to account for the network structure in the dataset. For example, in anomaly detection, spatial techniques do not consider the spatial network structure of the dataset; that is, they may not be able to model graph properties such as one-way streets, connectivity, and left turns. The network structure often violates the isotropy and symmetry of neighborhoods and instead requires asymmetric neighborhoods and directional neighborhood relationships (e.g., network flow direction).
Recently, some cutting-edge research has been conducted in spatial network statistics and data mining [80]. For example, several spatial network statistical methods have been developed, e.g., the network K function and network spatial autocorrelation. Several spatial analysis methods have also been generalized to the network space, such as network point cluster analysis and the clumping method, network point density estimation, network spatial interpolation (Kriging), and the network Huff model. Because spatial network space differs in nature from Euclidean space, these statistics and analyses often rely on advanced spatial network computational techniques [80].
We believe more spatiotemporal data mining research is still needed in the network space. First, although several spatial statistics and data mining techniques have been generalized to the network space, few spatiotemporal network statistics and data mining techniques have been developed, and the vast majority of research is still in Euclidean space. Future research is needed to develop more spatial network statistics, such as spatial network scan statistics, spatial network random field models, and spatiotemporal autoregressive models for networks. Furthermore, phenomena observed on spatiotemporal networks need to be interpreted in an appropriate frame of reference to prevent a mismatch between the nature of the observed phenomena and the mining algorithm. For instance, moving objects on a spatiotemporal network need to be studied from a traveler's perspective, i.e., the Lagrangian frame of reference [190][191][192], instead of a snapshot view. This is because a traveler moving along a chosen path in a spatiotemporal network experiences a road segment (and its properties, such as fuel efficiency and travel time) at the time of arrival at that segment, which may differ from the departure time at the start of the journey. These unique requirements (non-isotropy and the Lagrangian reference frame) call for novel spatiotemporal statistical foundations [187] as well as new computational approaches for spatiotemporal network data mining.
Another future research need is to develop spatiotemporal graph big data platforms, motivated by the upcoming rich spatiotemporal network data collected from vehicles. Modern vehicles have rich instrumentation to measure hundreds of attributes at high frequency and are generating big data (at exabyte scale [193]). This vehicle measurement big data (VMBD) consists of a collection of trips on a transportation graph, such as a road map, annotated with several measurements of engine sub-systems. Collecting and analyzing VMBD during real-world driving conditions can aid in understanding the underlying factors that govern real-world fuel inefficiencies or high greenhouse gas (GHG) emissions [194]. Current relevant big data platforms for spatial and spatiotemporal data mining include ESRI GIS Tools for Hadoop [183,184] and Hadoop GIS [185]. These provide distributed systems for geometric data (e.g., lines, points, and polygons), including geometric indexing and partitioning methods such as the R-tree, R+-tree, or Quad tree. Recently, SpatialHadoop has been developed [186]. SpatialHadoop embeds geometric notions in its language, visualization, storage, MapReduce, and operations layers. However, spatiotemporal graphs (STGs) violate the core assumption of current spatial big data platforms that geometric concepts are adequate for conveniently representing STG analytics operations and for partitioning data for load balancing. STGs also violate the core assumption underlying graph analytics software (e.g., Giraph [195], GraphLab [196], and Pregel [197]) that traditional location-unaware graphs are adequate for conveniently representing STG analytics operations and for partitioning data for load balancing. Therefore, novel spatiotemporal graph big data platforms are needed. Several challenges should be addressed; for example, spatiotemporal graph big data requires novel distributed file systems (DFS) to partition the graph, and a novel programming model is still needed to support abstract data types and fundamental STG operations.

Summary
This paper provides an overview of current research in the field of spatiotemporal data mining from a computational perspective. Spatiotemporal data mining has broad application domains including ecology and environmental management, public safety, transportation, earth science, epidemiology, and climatology. However, the complexity of spatiotemporal data and its intrinsic relationships limits the usefulness of conventional data science techniques for extracting spatiotemporal patterns. We provide a taxonomy of different spatiotemporal data types and underlying spatiotemporal statistics. We also review common spatiotemporal data mining techniques organized by major output pattern families: spatiotemporal outliers, spatiotemporal coupling and tele-coupling, spatiotemporal prediction, spatiotemporal partitioning and summarization, spatiotemporal hotspots, and change detection. Popular software tools for spatial and spatiotemporal data analysis are also listed. Finally, we discuss cutting-edge research areas and future research needs.

Figure 1. The process of spatiotemporal data mining.

Figure 2. Categorization of spatial and spatiotemporal data mining surveys.

Table 1. Taxonomy of Spatial and Spatiotemporal Data Models.

Table 2. Common Relationships among Non-spatial and Spatial Attributes.

Table 3. Relationships on Spatiotemporal Data.

Table 4. Taxonomy of Spatial and Spatiotemporal Statistics.

Table 5. Summarization Framework for Various Data Types.