Sampling and Estimation of Pairwise Similarity in Spatio-Temporal Data Based on Neural Networks

Increasingly fast computing systems for simulations and high-accuracy measurement techniques drive the generation of time-dependent volumetric data sets with high resolution in both time and space. To gain insights from this spatio-temporal data, the computation and direct visualization of pairwise distances between time steps not only supports interactive user exploration, but also drives automatic analysis techniques like the generation of a meaningful static overview visualization, the identification of rare events, or the visual analysis of recurrent processes. However, the computation of pairwise differences between all time steps is prohibitively expensive for large-scale data, not only due to the significant cost of computing expressive distances between high-resolution spatial data, but in particular owing to the large number of distance computations (O(|T|²), with |T| being the number of time steps). Addressing this issue, we present and evaluate different strategies for the progressive computation of similarity information in a time series, as well as an approach for estimating distance information that has not been determined so far. In particular, we investigate and analyze the utility of using neural networks for estimating pairwise distances. On this basis, our approach automatically determines the sampling strategy yielding the best result in combination with trained networks for estimation. We evaluate our approach with a variety of time-dependent 2D and 3D data from simulations and measurements as well as artificially generated data, and compare it against an alternative technique. Finally, we discuss prospects and limitations, and outline different directions for improvement in future work.


Introduction
Time-dependent data sets with increasing resolution in both time and space are generated at a fast rate, enabled by advances in parallel computing systems for simulations and high-accuracy measurement techniques. This data can feature millions of cells and thousands of time steps, and thus poses significant challenges for visual analysis. Even if the complete data (well exceeding the available memory in most cases) could be presented to the user interactively, numerous issues due to visual clutter and occlusion would still need to be addressed. A popular and natural choice to visualize the data without (temporal) occlusion and clutter issues is animation (i.e., sequentially rendering individual time steps). However, it has been shown to be ineffective as only a limited number of frames can be memorized by an observer (e.g., [1]). This motivates the development of visualization approaches that select and/or aggregate data in a data-driven way to enable efficient visual analysis and exploration.
For some of these types of data-driven visualization approaches, the computation of mutual distances between time steps is a fundamental operation for automatic analysis techniques. Recent examples of applications in visualization include the generation of a meaningful static overview visualization, the identification of rare events, as well as the visual analysis of recurrent processes.
Apart from that, directly visualizing the similarity information alone can also drive interactive visual exploration by indicating processes of interest to a user. In general, these application scenarios require the full computation of pairwise differences between all time steps. This is prohibitively expensive, in particular in the context of large-scale data. This is not only due to (1) the significant cost of computing expressive distances between high-resolution time steps, but (2) especially owed to the large number of distance computations involved (O(|T|²), with |T| being the number of time steps). This means that thousands of time steps already induce millions of (costly) distance computations.
In this work, we present and evaluate different strategies for the progressive computation of similarity information in a time series, as well as an approach for estimating missing distance information based on neural networks. The sampling strategies we consider range from purely random sampling over uniform (data-agnostic) to adaptive (data-driven) sampling. With this sampled similarity information (i.e., a subset of the pairwise distances D of a time series T), we then aim to reconstruct the full set of pairwise distances D using a neural network. The goal of this approach is to let the neural network implicitly capture the special structure and properties of similarity information in spatio-temporal data. We then essentially combine both aspects by training neural networks for similarity estimation specifically on the sampling patterns of the different strategies. This eventually allows us to automatically determine the sampling strategy yielding the best result in combination with trained networks for estimation.
The remainder of this paper is structured as follows. First, we review related work in Section 2. Then, we introduce our problem statement and give an overview of our approach in Section 3. We then discuss the two main parts of this work: different strategies for the sampling of similarity data (Section 4), as well as neural networks for the estimation of similarities (Section 5). Finally, we evaluate and discuss the properties of our approach in Section 6, before concluding our work in Section 7.

Visualization of Spatio-Temporal Data
A wide range of techniques has been proposed for the visualization of time-varying data, and many of them could be extended to make them more general towards dealing with multi-field data. Lee and Shen [2] visualize trend relationships among variables in multivariate time-varying data. Joshi and Rheingans [3] evaluate illustration-inspired techniques for time-varying data, like speedlines or flow ribbons. One approach is to interpret the data as a space-time hypercube, and apply extended classic visualization operations like slicing and projection techniques [4] or temporal transfer functions [5] to it (cf. Bach et al. [6] for an overview). Another way to approach time-dependent data is via feature-based techniques. Here, particularly Time Activity Curves (TACs) that contain each voxel's time series have been used as the basis for different techniques (e.g., [7]). Apart from such techniques working directly with scalar volume data, a large body of work in time-dependent volume visualization is based on feature extraction. Wang et al. [8] extract a feature histogram per volume block (typically hundreds to thousands of voxels). They then derive entropy-based importance curves that characterize the local temporal behavior of each block, and classify them via k-means clustering. Widanagamaachchi et al. [9] employ feature tracking graphs. Lee and Shen [10] visualize time-varying features and their motion on the basis of TACs. Fang et al. [7] use TACs in combination with different similarity measures. Silver et al. [11] isolate and track representations of regions of interest. The robustness of this approach has been improved by Ji and Shen [12] with a global optimization correspondence algorithm based on the Earth Mover's Distance. Scale-space based methods and topological techniques have also been used here (e.g., [13,14]). Schneider et al. [15] compare scalar fields on the basis of the largest contours.
Another line of techniques is based on the direct comparison of time steps. The Earth Mover's Distance (EMD, also known as the Wasserstein metric) is a common metric to compute the difference between mass distributions (conceptually, it determines the minimum cost of turning one (mass) distribution into the other) [16]. For instance, Tong et al. [17] use different metrics to compute the distance between data sets, and employ dynamic programming to select the most interesting time steps accordingly. The field of video analysis also deals with related analysis problems, yet typically employs different methodologies. Specialized image and video metrics are used to compare frames (e.g., [18]), and distinct approaches were proposed to generate summaries of videos, e.g., based on the motion of actors over time [19]. In addition, illustrative techniques have been used to depict processes of interest. Lu and Shen [20] propose interactive storyboards composed of volume renderings and descriptive geometric primitives. While most techniques mentioned above deal with volume data, numerous approaches have been presented for flow visualization (cf. Post et al. [21] and McLoughlin et al. [22] for an overview). Note that different fields have developed different methodologies to quantify similarity for other application settings. For instance, to enable style-based search, Garces et al. [23] present a method for measuring the similarity in style between two pieces of vector art, based on the differences between four types of features: color, shading, texture, and stroke. Feature weightings are learned from crowdsourced experiments.
Frey and Ertl [24] presented a technique to generate transformations between arbitrary volumes, providing both expressive distances and smooth interpolates. On this basis, they presented a new approach for the streaming selection of time steps in temporal data that allows for the reconstruction of the full sequence with a user-specified error bound. An accelerated version with overall improved efficiency as well as an extension to manycore devices (i.e., GPUs) has been presented in a follow-up work [25]. We use this approach to quantify distances between time steps in this paper.
On the basis of similarity information between time steps, Frey and Ertl [26] adaptively select time steps from time-dependent volume data sets for an integrated and comprehensive visualization. This reduced set of time steps not only saves cost, but also allows showing both the spatial structure and temporal development in one combined rendering. The selection optimizes the coverage of the complete data on the basis of a minimum-cost flow-based technique to determine meaningful distances between time steps. An interactive volume raycaster produces an integrated rendering of the selected time steps, and their computed differences are visualized in a dedicated chart to provide additional temporal similarity information.

Similarity Matrices to Directly Visualize and Analyze Similarity Information
The benefits and utility of recurrence plots and similarity matrices are discussed in detail by Marwan et al. [27]. There are also variants that extend those concepts from univariate to multivariate data. One possibility to apply these concepts to study the spatial structure of data is to separate the data into many one-dimensional data series, and to apply the recurrence analysis separately to each of these series [28]. Another possibility is the extension of the temporal approach of recurrence plots to a spatial one [29] at the cost of high-dimensional domains, e.g., a time-dependent 2D image is mapped to a 4D recurrence plot, which is, however, hard to visualize. Bautista et al. [30] analyze the difference between recurrence plots from different time series. For multi-field visualization, Frey et al. [31] presented an interactive approach on the basis of similarity matrices for extracting and exploring time-dependent phenomena that allows comparing different locations, modalities, ensemble runs, or generally even data sets with no direct relation. It focuses on periodic and quasi-periodic behavior at single points, but was also used to analyze cross-correlations in ensemble and multi-variate data.

Machine Learning for Image Interpolation and Similarity Learning
Machine learning is popularly regarded as the only viable approach to building AI systems that can deal with (very) complicated environments [32]. In particular, in this work, we employ neural networks for estimating the similarity between different time steps. Image interpolation is a different task with different inherent characteristics, but there are also some related aspects. Hu et al. [33] propose an interpolation algorithm using a classification-based neural network approach with the goal of improving the image quality. Plaziac [34] compared two adaptive algorithms for image interpolation based on a multilayer perceptron. More recently, Chen et al. [35] used an anisotropic probabilistic neural network on the basis of an anisotropic Gaussian kernel to provide high adaptivity of smoothness/sharpness during image/video interpolation.
Neural networks have also been applied to similarity learning, which belongs to the category of supervised machine learning in artificial intelligence. In general, this resembles our task of learning the similarities between time steps of spatio-temporal data, yet previous work has mostly addressed very different application scenarios. The goal is to learn from examples a similarity function that measures how similar or related two entities are. It has applications in ranking, recommendation systems, visual identity tracking, face verification, and speaker verification. Guillaumin et al. [36] present two methods for learning robust distance measures for assessing the similarity of faces. Similarity learning is closely related to distance metric learning, in that metric learning is the task of learning a distance function over objects. Kulis [37] presents an overview of existing research in metric learning. Davis et al. [38] present an information-theoretic approach to learning a Mahalanobis distance function (the Mahalanobis distance is a measure of the distance between a point and a distribution).

Overview
The motivation behind this work is to gain insights from spatio-temporal data. This data can feature different processes and structures, and be obtained via measurements and different types of CFD simulations (Figure 1). This means that this type of analysis is of interest to a wide variety of different fields. In this section, we first provide an introduction to similarity information from time series data in Section 3.1. We then cover the fundamentals of neural networks that are the basis for our similarity estimation approach discussed later in this work (Section 3.2). Finally, we give an outline of our approach and its different components in Section 3.3.

Similarity Information from Spatio-Temporal Data
We aim to analyze our time series data T on the basis of similarities between individual time steps t ∈ T. Here, the similarity between individual time steps is quantified by a function d : T × T → R, with the result being in the range [0, ∞) (0 denotes identity). We further assume d(·, ·) to be symmetric, i.e., d(t₀, t₁) = d(t₁, t₀) for t₀, t₁ ∈ T. In the following, we therefore only consider d(t₀, t₁) for t₀ < t₁ (in the identity case of t₀ = t₁, d(t₀, t₁) = 0). As a basis for numerous analysis applications, in this work we are interested in obtaining all pairwise similarities d(t₀, t₁) between time steps t₀, t₁ ∈ T with t₀ < t₁. This means that for |T| time steps there are |d(T)| (also denoted as |D|) pairwise similarities:
$$|D| = |d(T)| = \frac{|T| \, (|T| - 1)}{2} \tag{1}$$
For the real-world data sets in Figure 1, the results are shown in Figure 2 in the form of similarity plots. We compute similarity information from the real-world data using the metric by Frey and Ertl [24,25] (cf. discussion later in Section 5.2). Essentially, it constitutes a fast computation method of the Earth Mover's Distance ([16], also known as the Wasserstein metric) that makes it computationally feasible to apply it directly to high-resolution data. While the computation of similarities between all pairs of time steps is very expensive, for training and testing purposes, we generated reference similarity information for the time series data introduced above over the course of several weeks on different machines (using both CPUs and GPUs).
Most notably, due to the symmetry of the distance function d as discussed above (i.e., d(t₀, t₁) = d(t₁, t₀) for t₀, t₁ ∈ T), the plots do not show a full pairwise matrix but only the cases for t₀ < t₁. In the end, this forms a triangle-shaped plot. Different similarity structures can be seen, with the Kármán and most notably the Supernova data set featuring processes at distant points in time that are very similar (in the Kármán data set, this can be observed in the bottom right where line structures with a small offset to the diagonal can be seen).
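To make the quadratic growth of the number of distance computations concrete, the following minimal Python sketch counts the pairs with t₀ < t₁ and stores only that upper triangle. The helper names (`num_pairs`, `pairwise_distances`) and the toy scalar "time steps" are illustrative assumptions; the absolute difference merely stands in for the (far more expensive) metric used in this work.

```python
from itertools import combinations


def num_pairs(n_steps):
    """Number of distinct pairs (t0, t1) with t0 < t1 among n_steps time steps."""
    return n_steps * (n_steps - 1) // 2


def pairwise_distances(series, d):
    """Evaluate a symmetric distance d for all pairs t0 < t1.

    Returns a dict mapping (t0, t1) -> distance, i.e., the upper triangle of
    the similarity matrix; the diagonal (distance 0) is left implicit.
    """
    return {
        (t0, t1): d(series[t0], series[t1])
        for t0, t1 in combinations(range(len(series)), 2)
    }


# Toy stand-in for an expensive metric: absolute difference of scalars.
toy_series = [0.0, 1.0, 4.0, 9.0]
D = pairwise_distances(toy_series, lambda a, b: abs(a - b))

print(len(D), num_pairs(len(toy_series)))  # both count the same set of pairs
print(num_pairs(1000))                     # pairs for one thousand time steps
```

Already for one thousand time steps, roughly half a million distance computations are required, which is why the remainder of this work samples only a subset of these pairs.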

Neural Networks Basics (for Time Series Similarity Estimation)
In this work, we make use of so-called feedforward neural networks (also known as multi-layer perceptrons (MLPs)) [32]. The goal of a feedforward network is to approximate some function f*. In our case, we aim to estimate the distance function d(t₀, t₁) that determines the similarity between any two time steps t₀, t₁ ∈ T in a time series. In general, a feedforward network defines a mapping y = f(x; θ) and learns the value of the parameters θ that result in the best function approximation. In our application scenario, both the input and the output of the neural network are pairwise similarity information in a time series.
These kinds of models are called feedforward because information flows through the function being evaluated from x, through the intermediate operations used to define f, and eventually to the output y. There are no feedback connections in which outputs of the model are fed back into itself (as in recurrent neural networks, e.g., [32]).
Feedforward neural networks are typically represented by composing together different functions. The model is associated with a directed acyclic graph describing how the functions are composed together. For example, there might be three functions f⁽¹⁾, f⁽²⁾, and f⁽³⁾ connected in a chain, to form f(x) = f⁽³⁾(f⁽²⁾(f⁽¹⁾(x))). These chain structures are the most commonly used structures of neural networks. In this case, f⁽¹⁾ is called the first layer of the network, f⁽²⁾ is called the second layer, and so on. While the first layer is denoted as the input layer, the final layer of a feedforward network is called the output layer. During neural network training, we adjust f(x) to match f*(x). Each example x is accompanied by reference result values y = f*(x). The training examples specify directly what the output layer must do at each point x; it must produce a value that is close to y. As the training data does not show the desired output for the layers between the input and the output layer, these layers are called hidden layers.

Approach Outline
A conceptual overview of our approach is given in Figure 3. Essentially, there are two distinct phases, which are indicated by the wide gray arrows: (1) select and (2) optimize. First, in selection, we evaluate different sampling strategies to sample similarity information. Sampling strategies define different techniques to approach the adaptive sampling of similarity information (cf. Section 4). We then use the full similarity information along with the sparse set generated by the adaptive sampling for training a neural network to reconstruct the full information. We then validate the generated model, which essentially results in an error value (subsequently denoted as cost). Using this information, we compare the obtained cost values of all sampling strategies, and choose the strategy yielding the lowest cost as our best strategy. With this, we enter our second phase in which we optimize the network belonging to the best strategy. While the selection just carries out one training and validation run for each strategy, the subsequent optimization step iteratively improves the network belonging to the best sampling strategy via continuous training.

Strategies for Similarity Sampling
As previously discussed in Section 3.1, we utilize and evaluate progressive approaches to compute similarity information between individual time steps t ∈ T of a time series T. Here, progressive means that we iteratively add new similarity information (i.e., distances between pairs of time steps). The strategy basically determines for which time step pair the similarity is computed next. The strategies for determining the next time step pair to compute (subsequently denoted as similarity sampling) that are considered in this paper are discussed in the following. Depending on their procedure, they belong to one of two categories: similarity pair-based and time step-based.
Similarity pair-based denotes the concept that pairs of time steps can be selected in an arbitrary fashion, and therefore also completely independently from the samples taken so far (naturally, a pair of time steps only needs to be computed once). More formally, this means that the next time step pair (t₀, t₁) for which to compute the similarity using metric d(t₀, t₁) is chosen arbitrarily from the full set of time steps T (i.e., t₀ ∈ T and t₁ ∈ T). Here, the only restriction is that we limit ourselves to t₀ < t₁ due to the symmetry property of d (cf. discussion in Section 3.1).
In contrast, time step-based means that not individual pairs but time steps are progressively added into consideration, and the similarity between all pairwise combinations of considered time steps is computed before selecting new time steps. This means that only a subset T* ⊂ T is currently considered. Before adding a new time step t ∈ T, t ∉ T*, we first compute all combinations (t₀, t₁) ∈ T* × T* (with t₀ < t₁). Only when all pairwise combinations have been computed do we add a new time step to T*.
The different sampling strategies (creating a set of similarity pairs P) we employ and evaluate in this paper on the basis of the approaches outlined above are as follows ((1) and (2) follow similarity pair-based, (3)-(7) follow time step-based).
(1) uniform pair. In this strategy, the goal is to distribute samples in the temporal space T × T as evenly as possible (in a progressive fashion). Doing this, we start out with a random sample pair P = {(t₀, t₁)} (t₀ ∈ T and t₁ ∈ T). Subsequently, we iteratively compute the new similarity pair that has the maximum distance to the pairs (t₀, t₁) ∈ P (i.e., to the pairs that have been computed so far).
(2) random pair. A time step pair that has not been computed yet (∉ P) is chosen randomly and processed next.
(3) random time. A random time step is selected that has not yet been considered (∉ T*). As described above, before adding a new time step, first all pairwise combinations of time steps ∈ T* are computed before proceeding further.
(4) uniform time. Select the time step in the middle of the largest interval range tᵢ₊₁ − tᵢ in T* (with tᵢ and tᵢ₊₁ denoting subsequent time steps ∈ T*). In case there are multiple intervals with the same size, we choose one randomly.
The different sampling patterns arising from these seven different strategies are exemplified in Figure 4 (distances which have not been computed yet are indicated in red). We achieve a large variety of strategies, following completely random, uniform, and similarity-adaptive approaches.
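The four strategies spelled out above can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in this work: all function names are hypothetical, the distance between pairs in (1) is assumed to be the Euclidean distance in index space, and (4) is assumed to start from the first and last time step.

```python
import random
from itertools import combinations


def uniform_pair_order(n, rng):
    """(1) uniform pair: farthest-point sampling in T x T (t0 < t1).

    Assumption: pairs are compared via Euclidean distance in index space.
    """
    remaining = list(combinations(range(n), 2))
    first = rng.choice(remaining)
    remaining.remove(first)
    order = [first]
    # Squared distance of every remaining pair to its nearest sampled pair.
    nearest = {p: (p[0] - first[0]) ** 2 + (p[1] - first[1]) ** 2
               for p in remaining}
    while nearest:
        nxt = max(nearest, key=nearest.get)
        del nearest[nxt]
        order.append(nxt)
        for p in nearest:
            dist = (p[0] - nxt[0]) ** 2 + (p[1] - nxt[1]) ** 2
            if dist < nearest[p]:
                nearest[p] = dist
    return order


def random_pair_order(n, rng):
    """(2) random pair: all pairs t0 < t1 in random order."""
    pairs = list(combinations(range(n), 2))
    rng.shuffle(pairs)
    return pairs


def time_step_based_order(step_order):
    """Turn an order of time steps into an order of pairs.

    When a time step joins T*, its pairs with all steps already in T* are
    emitted, so T* is fully covered before the next step is added.
    """
    considered, pairs = [], []
    for t in step_order:
        pairs.extend((min(t, u), max(t, u)) for u in considered)
        considered.append(t)
    return pairs


def random_time_steps(n, rng):
    """(3) random time: time steps in random order."""
    steps = list(range(n))
    rng.shuffle(steps)
    return steps


def uniform_time_steps(n, rng):
    """(4) uniform time: repeatedly split the largest interval in T*.

    Assumption: T* is seeded with the first and the last time step.
    """
    if n < 2:
        return list(range(n))
    chosen, order = [0, n - 1], [0, n - 1]
    while len(chosen) < n:
        chosen.sort()
        gaps = [(b - a, a, b) for a, b in zip(chosen, chosen[1:]) if b - a > 1]
        largest = max(g[0] for g in gaps)
        _, a, b = rng.choice([g for g in gaps if g[0] == largest])
        mid = (a + b) // 2
        chosen.append(mid)
        order.append(mid)
    return order


rng = random.Random(42)
n = 8
orders = {
    "uniform pair": uniform_pair_order(n, rng),
    "random pair": random_pair_order(n, rng),
    "random time": time_step_based_order(random_time_steps(n, rng)),
    "uniform time": time_step_based_order(uniform_time_steps(n, rng)),
}
```

Each entry of `orders` is a complete ordering of all pairs; truncating it after k entries yields the sparse sampling pattern of the respective strategy after k distance computations.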

Neural Networks for Similarity Estimation
In this section, we first discuss the basic model setup of our neural network (Section 5.1). We then outline how we obtain and generate the data used for its training and validation (Section 5.2). On this basis, we finally discuss our approach to train neural networks and to select the most appropriate respective sampling strategy (Section 5.3).

Model Setup
Our model is designed to estimate one missing similarity pair (t₀, t₁) on the basis of other available similarity information. The design choices described below are based on empirical tests, informally evaluating different model designs and neural network setups against each other. Note that while our resulting design is the best we found in our tests, we cannot consider it optimal due to the large search space (there is a large variety of different ways to configure a neural network alone).
Considered Input Data. For estimating the similarity information, we utilize a subset of the available similarity information from different regions. In more detail, we jointly consider two types of information in a (temporal) region of extent δ around the requested element (t₀*, t₁*) (individually, for each component of the pair, this results in a region of T₀ = {t₀* − δ, ..., t₀* + δ} and T₁ = {t₁* − δ, ..., t₁* + δ}, respectively). An illustration of the considered time steps for different examples is shown in Figure 5.
1. The square region around (t₀*, t₁*), except for (t₀*, t₁*) itself (blue in Figure 5): This results in |T₀||T₁| − 1 elements, and gives the similarity of close pairs in temporal space.
2. Two subsets of the similarity matrix, one around t₀* and one around t₁* (i.e., containing all pairwise combinations of time steps within T₀ and within T₁, respectively) (green in Figure 5): This results in a total of |d(T₀)| + |d(T₁)| elements (according to Equation (1)), and gives the similarity to close time steps for each component of the pair.
In total, this accordingly results in |I_δ| input elements (cf. Figure 5):
$$|I_\delta| = |T_0|\,|T_1| - 1 + |d(T_0)| + |d(T_1)| \tag{3}$$
In cases where no similarity information is available (because it has not been sampled yet or it is out of the temporal range), the respective element gets a dedicated (missing) value of m (in our implementation we set m = −1, which clearly indicates a special value as similarity information is generally quantified by positive values). Similarity pair information from the two different types of information may also overlap (e.g., Figure 5b,c), in which case the respective similarity information is considered redundantly as input for the neural network.
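The assembly of the input vector described above can be sketched as follows. The function names, the dictionary-based storage of sampled distances, and the choice to return 0 for identical time steps are assumptions made for illustration; only the two region types and the missing value m = −1 are taken from the text.

```python
from itertools import combinations

MISSING = -1.0  # dedicated value m for unavailable similarity information


def lookup(known, a, b, n_steps, missing=MISSING):
    """Distance between time steps a and b, or `missing` if unavailable."""
    if not (0 <= a < n_steps and 0 <= b < n_steps):
        return missing            # out of the temporal range
    if a == b:
        return 0.0                # assumption: identity is always known
    return known.get((min(a, b), max(a, b)), missing)


def input_size(delta):
    """Number of input elements for window extent delta."""
    w = 2 * delta + 1             # |T0| = |T1| = 2 * delta + 1
    return w * w - 1 + 2 * (w * (w - 1) // 2)


def build_input(known, t0, t1, delta, n_steps):
    """Input vector for estimating the (not yet sampled) pair (t0, t1).

    `known` maps sampled pairs (a, b) with a < b to their distance.
    """
    T0 = range(t0 - delta, t0 + delta + 1)
    T1 = range(t1 - delta, t1 + delta + 1)
    # Type 1: region around (t0, t1), excluding the requested pair itself.
    features = [lookup(known, a, b, n_steps)
                for a in T0 for b in T1 if (a, b) != (t0, t1)]
    # Type 2: all pairwise combinations within T0 and within T1.
    for window in (T0, T1):
        features += [lookup(known, a, b, n_steps)
                     for a, b in combinations(window, 2)]
    return features


known = {(4, 10): 2.5}            # a single sampled distance
x = build_input(known, t0=5, t1=10, delta=2, n_steps=35)
print(len(x), input_size(2))
```

For δ = 2, both windows contain five time steps, so the vector has 5·5 − 1 + 10 + 10 = 44 entries, all of which equal m except for the one sampled distance.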
Neural Network Structure. Neural networks have three types of layers: input, hidden, and output. There is exactly one input layer, with the number of neurons being determined by the size of the input data. Based on the previous discussion regarding considered input data, this means that we have a total number of |I_δ| input neurons (Equation (3)). There is also exactly one output layer. Here, we just use a single neuron, as only one specific similarity pair is predicted at a time by our neural network. Regarding the hidden layers, there is a much larger degree of freedom: how many hidden layers to actually have in the neural network, and how many neurons (and of which type) to use in each of these layers. Typically, these decisions have a significant impact on the results that can be achieved, but they come down to experience and trial-and-error to a certain extent. In this work, we use one hidden layer, with the number of neurons equalling the number of neurons in the input layer. According to our experiments, this provides a good trade-off between underfitting and overfitting. For each neuron, we use the sigmoid as an activation function. Note that this is the design that worked best according to our experiments, but we do not consider it to be optimal or any kind of definite solution to the problem (but rather a step toward it).
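As a minimal sketch of this structure, the forward pass of a network with one hidden layer of the same width as the input and a single output neuron can be written in plain Python. The weight initialization range and the use of the sigmoid on the output neuron (which presumes reference distances scaled into (0, 1)) are assumptions of this sketch, and training is omitted.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def init_params(n_in, rng):
    """Random parameters for a network n_in -> n_in (hidden) -> 1 (output).

    Assumption: small uniform initial weights and zero biases.
    """
    def w():
        return rng.uniform(-0.1, 0.1)

    return {
        "W1": [[w() for _ in range(n_in)] for _ in range(n_in)],
        "b1": [0.0] * n_in,
        "W2": [w() for _ in range(n_in)],
        "b2": 0.0,
    }


def forward(params, x):
    """Evaluate the network: hidden layer first, then the output neuron."""
    hidden = [
        sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    # Assumption: sigmoid output, i.e., targets are scaled into (0, 1).
    return sigmoid(sum(w * h for w, h in zip(params["W2"], hidden))
                   + params["b2"])


n_inputs = 44                      # input size for a window extent of 2
params = init_params(n_inputs, random.Random(0))
estimate = forward(params, [-1.0] * n_inputs)  # all inputs marked missing
print(estimate)
```

With the hidden layer matching the input width, the number of weights grows quadratically with the input size, so the window extent δ directly controls the model size.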

Real-World 2D/3D Data
Algorithm 1 (excerpt) Preparation of the training data X.
    ...
16: for all e ∈ [0 ... n_T) do        (generate variant of P_i)
    ...
18: end for
    ...
20: determine number of samples to take, maximum number computed via Equation (1)
    ...
    end while
24: return X
25: end function
A large amount of adequate training and validation data is crucial for success when training neural networks. However, the computation of real-world distances is very expensive and can only be done for a few data sets. To overcome this issue, we generate additional artificial data and further modify the similarity information. In detail, we use the following multi-stage approach to generate a large variety of training and validation data (cf. Figure 6).
Real-World 2D/3D Data. As input, we use a set of typical real-world 2D/3D + time data sets (Figure 1).
Compute Similarity. To compute the pairwise distances between different time steps within each series, we use the approach proposed by Frey and Ertl [24,25]. It makes it computationally feasible to directly compute the similarity between high-resolution field data sets. Conceptually, it starts with an initial random assignment of so-called source elements from one data set to so-called target elements of the other data set (each element refers to a (scalar) mass unit given at a certain cell/position in the data). Then, this assignment is improved iteratively. In each iteration, source elements exchange their respectively assigned target elements under the condition that this improves the assignment. For this, the quality of an assignment is quantified by d, which essentially computes the sum of weighted distances of the assignments. Here, assignments are weighted by the scalar quantity that is transported. We use this value d directly (on the basis of Euclidean distances) to quantify the distance between the respective time steps. The respective results are shown in Figure 2. Please refer to Frey and Ertl [24,25] for a more detailed discussion.
Artificial Similarity Data. Only using a small number of data sets is not sufficient to cover the large variety of typical patterns of similarity information in general, and also bears the risk of training the network on the concrete data rather than generalizing for similarity estimation. Therefore, we added further, synthetically generated time series data to supplement this. Here, the idea is to mimic the typical patterns that we have seen occurring in the similarity data, yet providing a larger variety to yield better generalization characteristics after learning. For this, we used an equation ψ defined for t ∈ [0, 1) and three random values ρ₀, ρ₁, ρ₂ ∈ [0, 1). We then compute similarity information from these, and use it during training and validation (Figure 7).
Modify. We do not use the obtained similarity information directly, but randomly offset and scale the time series to get numerous variations on the basis of the available data. We outline our approach to prepare the training data X by means of Algorithm 1 (validation data is generated accordingly). We randomly pick data sets from our collection of real-world and artificial data (Line 10). To modify the data, we randomly choose a scaling factor s that basically defines the step size with which time steps are considered (Line 12). Then, we use a random offset, which basically determines the first time step that is considered in a time series (Line 14). Finally, we employ this information to generate a new training element P*_i (Lines 16-19), each one consisting of n_T time steps (we use n_T = 35 throughout this work).
Sample. We then take a random number of samples from the modified similarity data using the respective sampling strategy (cf. Section 4 and Figure 4; Line 21).
Training / Validation Data. Finally, this yields the data that can be used for training and validation of the neural network. In more detail, each training / validation data element consists of a pair: (1) the original similarity data after Modify, and (2) the respective data after Sample. Both (1) and (2) contain pairwise similarity information between n_T time steps (a portion of this information has been removed from (2)).
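The Modify and Sample stages can be illustrated with a short Python sketch. It assumes an integer scaling factor (step size), dictionary-based storage of the reference distances, and uniformly random selection of the retained pairs; the function names are hypothetical, and the 10% to 50% coverage range is the one used in our evaluation setup.

```python
import math
import random
from itertools import combinations


def make_variant(full, n_total, n_t, rng):
    """Modify: pick a random step size s and offset, extract n_t time steps.

    `full` maps pairs (a, b) with a < b of the original series to distances.
    Assumption: s is an integer stride over the original time steps.
    """
    max_step = (n_total - 1) // (n_t - 1)
    s = rng.randint(1, max_step)
    offset = rng.randint(0, n_total - 1 - s * (n_t - 1))
    steps = [offset + s * e for e in range(n_t)]
    variant = {(i, j): full[(steps[i], steps[j])]
               for i, j in combinations(range(n_t), 2)}
    return variant, s, offset


def sample_variant(variant, rng, min_frac=0.1, max_frac=0.5):
    """Sample: keep a random number of pairs (here: chosen uniformly)."""
    pairs = list(variant)
    k = rng.randint(math.ceil(min_frac * len(pairs)),
                    math.floor(max_frac * len(pairs)))
    return {p: variant[p] for p in rng.sample(pairs, k)}


# Toy reference data: 100 time steps, distance = temporal difference.
n_total, n_t = 100, 35
full = {(a, b): float(b - a) for a, b in combinations(range(n_total), 2)}

rng = random.Random(1)
reference, s, offset = make_variant(full, n_total, n_t, rng)
sparse = sample_variant(reference, rng)
training_element = (reference, sparse)   # (1) after Modify, (2) after Sample
```

In this sketch, a different sampling strategy would only change which pairs `sample_variant` retains; the reference part of each training element stays the same.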

Similarity Estimation
Algorithm 2 (excerpt) Our approach to estimate missing similarity information based on neural networks (see Figure 3 for a conceptual overview).
 1: in: pool of similarity information P
    ...
    for all λ ∈ Λ do
12:   obtain n_λ samples using sampling strategy λ
13:   we use the Adam optimizer [40] for training the neural network
    ...
    return Θ
30: end function
Our overall approach to use neural networks to estimate missing similarity information and to select sampling strategies has been conceptually outlined already in Figure 3. In the following, we describe it in more detail by means of Algorithm 2.
First of all, we generate separate sets for training and validation using the procedure described above (Lines 6 and 7). On this basis, we then aim to determine the sampling strategy with the lowest cost č (Lines 9-22). For each sampling strategy λ ∈ Λ (Line 11), we obtain a sampled (i.e., sparse) variant of the similarity information (Line 13). We then use this as input for training (Line 15). During validation (Lines 17-18), we determine a cost c that we then compare against the cost obtained by other strategies. If it is smaller than the smallest cost č determined so far (Line 19), we save the model of the respective strategy λ as the best one so far (Line 20). After testing all strategies, we continue refining the model that corresponds to the sampling strategy that led to the smallest validation cost č (Line 27).
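The select-then-optimize procedure described above can be summarized by the following Python skeleton. It is a structural sketch only: `train` and `validate` are hypothetical callables standing in for the actual network training and validation, the strategies are represented by functions that sparsify reference data, and the number of refinement rounds is an arbitrary assumption.

```python
def select_and_optimize(strategies, train_ref, val_ref, train, validate,
                        refine_rounds=3):
    """Pick the sampling strategy with the lowest validation cost, then
    keep training the corresponding model.

    strategies: dict mapping a strategy name to a function that turns
                reference similarity data into its sampled (sparse) variant.
    train(model, reference, sparse) -> model   (model=None starts fresh)
    validate(model, reference, sparse) -> cost
    """
    best_cost, best_name, best_model = float("inf"), None, None

    # Phase 1 (select): one training and validation run per strategy.
    for name, sample in strategies.items():
        model = train(None, train_ref, sample(train_ref))
        cost = validate(model, val_ref, sample(val_ref))
        if cost < best_cost:
            best_cost, best_name, best_model = cost, name, model

    # Phase 2 (optimize): continue training the best strategy's model.
    best_sample = strategies[best_name]
    for _ in range(refine_rounds):
        best_model = train(best_model, train_ref, best_sample(train_ref))

    return best_name, best_model, best_cost


# Toy stand-ins so that the skeleton can be executed.
toy_costs = {"random pair": 0.3, "uniform time": 0.1}
toy_strategies = {name: (lambda ref, n=name: (n, ref)) for name in toy_costs}


def toy_train(model, reference, sparse):
    rounds = 0 if model is None else model[1]
    return (sparse[0], rounds + 1)       # (strategy name, training rounds)


def toy_validate(model, reference, sparse):
    return toy_costs[model[0]]


result = select_and_optimize(toy_strategies, "train data", "val data",
                             toy_train, toy_validate)
print(result)
```

Keeping the selection loop separate from the refinement loop mirrors the two phases of the approach: every strategy receives the same limited training budget before one of them is singled out for further optimization.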

Results
In this section, we first discuss our evaluation setup (Section 6.1). We then evaluate the results with different sampling strategies (Section 6.2) as well as similarity estimation with neural networks for the selected strategy (Section 6.3). Finally, we discuss properties and limitations of our approach (Section 6.4).

Evaluation Setup
Parameters, Software and Hardware Setup. For our implementation, we use Python and TensorFlow [41] (r0.11). For training, we used the GPU implementation on the basis of CUDA using a GTX1070 on an Ubuntu 16.04 system with 32 GB of RAM and an Intel Core i7-4770 CPU. Furthermore, we employ a batch size of 4096 and 1024 training iterations. In total, this means that there are roughly four million training cases in each epoch (4096 × 1024 = 4,194,304). For validation, we use 16384 cases. As discussed above, each individual case consists of a reference (i.e., a modified version of a real-world or a synthetic data set) as well as a sampled version of it. In our evaluation, we randomly vary the number of samples such that they cover between 10% and 50% of all pairwise similarity information. We use squared distances to assess the difference of the estimated similarity to the reference. In this work, as mentioned above, we evaluate our approach by considering time windows of size n_T = 35. In total, the generation of training sets, training, and validation takes around six hours for testing each sampling strategy using our setup described above. Note that the overall goal here is to train a network that is able to predict similarity for a wide range of temporal data, which is why we train and validate considering a large variety of generated training data. This means that the training process only needs to be done once, and the resulting neural network can then be applied to estimating similarity data as-is. Evaluating the trained network can be done very quickly and yields comparable performance to other types of estimation considering a similar amount of data for estimation (e.g., via inverse distance weighting as discussed below).
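As a small worked example of the numbers above, the following sketch computes the number of training cases per epoch and a squared-distance cost between an estimate and its reference. Averaging the squared differences over all matrix entries is an assumption made here for illustration; the text only states that squared distances are used.

```python
BATCH_SIZE = 4096   # cases per training iteration
ITERATIONS = 1024   # training iterations per epoch


def cases_per_epoch(batch_size=BATCH_SIZE, iterations=ITERATIONS):
    """Number of training cases processed in one epoch."""
    return batch_size * iterations


def squared_cost(estimate, reference):
    """Mean squared difference between two equally sized matrices.

    Assumption: the mean over all entries is used for normalization.
    """
    total, count = 0.0, 0
    for row_e, row_r in zip(estimate, reference):
        for e, r in zip(row_e, row_r):
            total += (e - r) ** 2
            count += 1
    return total / count


print(cases_per_epoch())  # 4194304
```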
Comparison against Inverse Distance Weighting for Similarity Estimation. We compare our approach for estimating missing similarities with neural networks against a standard approach for scattered data interpolation, namely inverse distance weighting. Here, the similarity information corresponding to a time step pair (t_0, t_1) is calculated as a weighted average of the values available for the known pairs. The neighborhood considered here is the same as is used by the neural network (i.e., as specified in Equation (2)). With this, the estimate d̃(t_0, t_1) for a missing (i.e., not yet sampled) value is computed as follows (D denotes a map of previously computed similarity information, with m being returned for unknown pairs):
$$\tilde{d}(t_0, t_1) = \frac{\sum_{(\tilde{t}_0, \tilde{t}_1) \in T_\delta(t_0, t_1)} \omega(\tilde{t}_0, \tilde{t}_1)\, D(\tilde{t}_0, \tilde{t}_1)}{\sum_{(\tilde{t}_0, \tilde{t}_1) \in T_\delta(t_0, t_1)} \omega(\tilde{t}_0, \tilde{t}_1)},$$
with a weight function ω governed by the power parameter p, a positive real number. The weight decreases as the distance from the interpolated point increases. Larger values of p assign greater influence to the values closest to the interpolated point, with the result converging toward nearest neighbor interpolation for large values of p.
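A minimal sketch of inverse distance weighting for a single missing pair is given below. Several details are assumptions rather than the paper's definitions: the neighborhood is taken as a square of half-width `delta` around (t_0, t_1), the weight is the reciprocal of the Euclidean index distance raised to the power p, and unknown pairs are stored as `None` instead of the value m.

```python
import math


def idw_estimate(D, t0, t1, delta=6, p=4):
    """Estimate D[t0][t1] from the known pairs in its neighborhood.

    D is a square list of lists; unknown pairs are None (assumption).
    Returns None if no known pair lies within the neighborhood.
    """
    n = len(D)
    numerator = denominator = 0.0
    for a in range(max(0, t0 - delta), min(n, t0 + delta + 1)):
        for b in range(max(0, t1 - delta), min(n, t1 + delta + 1)):
            if (a, b) == (t0, t1) or D[a][b] is None:
                continue  # skip the target itself and unknown pairs
            dist = math.hypot(a - t0, b - t1)
            weight = 1.0 / dist ** p  # weight decreases with distance
            numerator += weight * D[a][b]
            denominator += weight
    return numerator / denominator if denominator > 0 else None
```

With p = 1, a known value of 10 at distance 1 and a known value of 0 at distance 2 receive weights 1 and 0.5, so the estimate is 10 / 1.5 ≈ 6.67; larger p pulls the estimate toward the closer value.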

Sampling Strategies
For evaluating the different sampling strategies (cf. Section 4) regarding their performance in the context of neural networks for similarity estimation, we train a neural network over 2048 epochs with respectively sampled data.
For each sampling strategy, we further compare the validation results of the neural network to the results achieved with inverse distance weighting for different power parameters. The respective results are shown in Figure 8. Most prominently, it can be seen that the relative performance of the sampling strategies is similar across all estimation approaches: uniform pair and uniform time yield the best results (lowest validation cost), while random time and similarity time yield the worst results here. On the basis of the different sampling patterns in Figure 4, we assume that the main reason behind this is the resulting larger temporal regions in which no similarity information is available. For these, it is much more difficult across all similarity estimation approaches to yield reasonably good results. Note that we consider a number of samples that is randomly chosen to be between 10% and 50% of the full sampling. In preliminary tests with a larger number of samples, adaptive approaches performed relatively much better. This indicates that a more complex combination of strategies (or more advanced sampling strategies overall) could be worthwhile to consider in this context. However, a closer investigation and evaluation of this remains for future work.
The best result overall in our evaluation setting is achieved by our neural network-based similarity estimation with the uniform time sampling strategy. Not only in this case but within each sampling strategy, it can be seen that our neural network-based approach consistently yields better results (i.e., lower validation cost) than any inverse distance weighting variant. Please refer to the upcoming section for a closer discussion of the different reasons behind this at the example of the uniform time sampling strategy.
Figure 8. Respective costs are given for the estimation with the trained neural network as well as inverse distance weighting (IDW) for different power parameters p. For the neural network, not only the results for training with all data sets (Neural Network (all)) are presented, but also the validation costs are provided for neural networks that have only been trained with either the von Kármán data set (Neural Network (von Kármán) on the basis of Figure 2d) or the Supernova data set (Neural Network (Supernova) on the basis of Figure 2e).
For analyzing the utility of using a variety of different data sets for training, we also include the results of networks that have been trained on the basis of just one data set. For this evaluation, we use the von Kármán data set (Neural Network (von Kármán) in Figure 8, employing similarity information from Figure 2d) as well as the Supernova data set (Neural Network (Supernova), on the basis of Figure 2e). In both cases, the same total number of training data sets is still generated via modification and sampling. This means that the only difference is that just a single real-world data set (and no artificial similarity data) is employed for training, but the same validation process is used as for Neural Network (all) (i.e., all data sets are always considered for validation). It can be seen in Figure 8 that the neural networks trained with a single data set deliver worse results than the neural network that has been trained more diversely with all data sets (Neural Network (all)). However, they still produce reasonable results that are comparable to the quality generated by inverse distance weighting. Comparing Neural Network (von Kármán) and Neural Network (Supernova) against each other, it can be seen that their relative performance depends significantly on the sampling strategy as well. Essentially, this indicates that the success of different sampling strategies also depends on the properties and characteristics of the similarity information that is employed for training. Among others, this supports the approach (as discussed in Section 3.3) of taking a variety of strategies into account and using an automatic approach to select the best one for a provided collection of data sets of interest. A more exhaustive evaluation of respective properties remains for future work.

Similarity Estimation
Next, we discuss the results of similarity estimation with the uniform time sampling strategy that has been determined to deliver the best results in our setup. Reference similarity information, samples, and the reconstructed information with the estimated similarities for our neural network as well as for inverse distance weighting are shown in Figure 9 (at the example of a subset of the validation set). Overall, as reflected by the small cost/error value, it can be seen that even in cases where a large portion of the data is missing, the trained network performs well in filling in the missing information.
Good results can be achieved with our neural network-based approach over a large variety of cases. Despite potentially only a fraction of the similarity information being available and/or temporal changes occurring at a high rate, we are still able to obtain a good approximation of the actual values.
In comparison, inverse distance weighting struggles particularly in the case of a higher rate of changes (e.g., case 5 and case 10). As we consider a fairly large neighborhood (δ = 6), the low-power variants of inverse distance weighting (i.e., p = 1 and p = 2), which also give samples further away a significant weight, yield insufficient results, in particular for the cases with a large variation. In turn, a large power parameter (p = 16) effectively only considers the closest points, which yields blocky (non-smooth) results that are particularly noticeable in some smoother cases (e.g., cases 7 and 8). The best performance of inverse distance weighting overall is achieved in between with p = 4 and p = 8, which share the issues of both a high and a low power parameter, yet to a lesser extent. However, generally inferior results are achieved in comparison to the similarity estimation by the neural network.
For a network trained with only a single data set, it can be seen that, while delivering significantly worse results than our comprehensively trained network, it still yields decent results that are comparable in quality to the reference inverse distance weighting techniques. While this can be interpreted as a small indicator that we are able to achieve good results for general similarity information from a small set of data sets, a more extensive evaluation is required to thoroughly analyze respective properties.

Conclusions
In this work, we presented different strategies for the progressive computation of similarity information in spatio-temporal data, along with an approach for estimating missing distance information. For similarity estimation, we proposed to use a neural network design that directly takes the already available similarity information of a time series into account. We then automatically determined the sampling strategy that yields the best result in combination with respectively trained networks for estimation. For training and validation, we used a variety of time-dependent 2D and 3D data from simulations and measurements as well as artificially generated data.
We demonstrated that our proposed approach already achieves good results, with further improvements being subject to future work. In particular, we aim to explore the huge parameter space inherent to the setup and training of neural networks by systematically testing different types of neurons, different numbers of layers and numbers of neurons in each layer, different batch sizes and learning rates, etc. to further improve our results. We also plan to evaluate the impact of the quality of the distance estimation on the final result of different types of visualization applications. Finally, we aim to compare our approach to a larger variety of alternative approaches for similarity estimation, develop more advanced techniques for artificial data generation, and conduct a more comprehensive evaluation regarding generalization properties.
(a) Bottle (resolution 900 × 430, 160 time steps considered): laser pulse shooting through a bottle, captured via Femto Photography (Velten et al. [39]).
(b) von Kármán (resolution 301 × 101, 418 time steps): 2D time-dependent CFD simulation of a von Kármán vortex street.
(c) Hot Room (resolution 101 × 101, 265 time steps): air flow within a closed container, driven by buoyant forces imposed by a heated bottom plate and a cooled top plate. To provoke transient aperiodic flow, the container exhibits two barriers (one on the top, one on the bottom).
(e) Supernova (resolution 432³, 60 time steps): result of a supernova simulation. The data set is made available by Dr. John Blondin at the North Carolina State University through the US Department of Energy's SciDAC Institute for Ultrascale Visualization.

Figure 1. All data sets include scalar values that are mapped via a user-defined transfer function to the representation shown here, which is also used for the distance computation (respective distances of each time series are plotted in Figure 2).

Figure 2. Input similarity information from different data sets presented in the form of similarity matrices (with t_0 along the x-axis and t_1 along the y-axis, cf. (a)). Only one half of the matrix is visualized due to the symmetry property of our distance metric (i.e., d(t_0, t_1) = d(t_1, t_0)). Values are mapped to colors using the viridis color map (low distances = purple, medium = green/blue, large = yellow).

Figure 3. Overview of our approach for the adaptive sampling and estimation of similarities with neural networks (cf. Algorithm 2 for a more detailed description). The selection phase (Select) chooses the best strategy by carrying out one training and validation run for each strategy. Afterwards, the optimization phase (Opt) iteratively improves the network belonging to the best sampling strategy via continuous training with repeatedly updated training data.

(5) distance-weighted time. Choose a time step randomly (similar to (3)), but the probability of selecting an interval is weighted by t_{i+1} − t_i − 1 (akin to the selection criterion in (4)).
(6) similarity time. Consider the distance between two subsequent time steps in T: d(t_i, t_{i+1}). Add a time step in the interval with the largest distance.
(7) similarity-weighted time. Select an interval randomly to add a new time step to T*, with the probability being weighted by d(t_i, t_{i+1}).
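The interval selection of strategies (5) to (7) can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes that T is a sorted list of integer time step indices, that intervals without a free time step in between are skipped, and that the new time step is placed at the interval midpoint (`add_time_step`), which the descriptions above do not specify.

```python
import random


def open_intervals(T):
    """Indices i of intervals (T[i], T[i+1]) that still contain free time steps."""
    return [i for i in range(len(T) - 1) if T[i + 1] - T[i] > 1]


def distance_weighted_time(T, rng=random):
    """Strategy (5): random interval, weighted by t_{i+1} - t_i - 1."""
    cand = open_intervals(T)
    if not cand:
        return None
    weights = [T[i + 1] - T[i] - 1 for i in cand]
    return rng.choices(cand, weights=weights, k=1)[0]


def similarity_time(T, d):
    """Strategy (6): interval with the largest distance d(t_i, t_{i+1})."""
    cand = open_intervals(T)
    if not cand:
        return None
    return max(cand, key=lambda i: d(T[i], T[i + 1]))


def similarity_weighted_time(T, d, rng=random):
    """Strategy (7): random interval, weighted by d(t_i, t_{i+1})."""
    cand = open_intervals(T)
    weights = [d(T[i], T[i + 1]) for i in cand]
    if not cand or sum(weights) <= 0:
        return None
    return rng.choices(cand, weights=weights, k=1)[0]


def add_time_step(T, i):
    """Insert the midpoint of interval i into T (placement is an assumption)."""
    mid = (T[i] + T[i + 1]) // 2
    return T[:i + 1] + [mid] + T[i + 1:]
```

For example, `add_time_step(T, similarity_time(T, d))` would refine the interval whose end points currently differ most, given a distance function `d` over time step indices.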

Figure 4. Different sampling strategies by example. Horizontally, we provide the results of a certain sampling strategy, while vertically these strategies are demonstrated by means of different input data sets. Similarity pairs that have not yet been computed by the sampling strategy are indicated in red.

Figure 5. Illustration of the different time step pairs that are considered for training, using the example of different time step pairs (t*_0, t*_1) of interest (cf. Section 5.1, Considered Input Data).

Figure 6. Pipeline for preparing training and validation data.

Figure 9. Reference, sampling, and estimation of similarities for different interpolation strategies. The numbers below the plots with estimated similarity give the respective validation cost ((c)-(h)).