Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data

## Abstract

## 1. Introduction

## 2. Related Work

#### 2.1. Multivariate Volume Data Visualization

#### 2.2. Clustering

#### 2.3. Interpolation in Attribute Space

## 3. Clustering

## 4. Interpolation

## 5. Adaptive Scheme

## 6. Nearest-Neighbor Interpolation at Sharp Material Boundaries

## 7. Interactive Visual Exploration

## 8. Results

## 9. Discussion

#### 9.1. Histogram Bin Size

#### 9.2. Upsampling Rate

## 10. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

**Figure 1.** Grid partition of a two-dimensional dataset: the space is divided into equally sized bins in the first dimension (**a**), and the non-empty bins are further subdivided in the second dimension (**b**).
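The dimension-by-dimension partition of Figure 1 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation. It assumes the same number of equally sized bins in every dimension over the data range, and the names `bin_index` and `grid_partition` are hypothetical. Only non-empty bins are ever stored, so memory grows with the number of occupied bins rather than with `n_bins ** dims`.

```python
from collections import defaultdict


def bin_index(value, lo, hi, n_bins):
    """Map a value in [lo, hi] to a bin index in 0..n_bins-1 (hi maps to the last bin)."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def grid_partition(points, n_bins):
    """Sparse histogram built one dimension at a time.

    Dimension 0 is binned first; afterwards only the non-empty cells are
    subdivided along the next dimension. Returns {bin_tuple: count}.
    """
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    # Start with a single cell (empty key prefix) holding every point.
    cells = {(): list(points)}
    for d in range(dims):
        refined = defaultdict(list)
        for prefix, members in cells.items():  # only non-empty cells exist here
            for p in members:
                key = prefix + (bin_index(p[d], lows[d], highs[d], n_bins),)
                refined[key].append(p)
        cells = refined
    return {key: len(members) for key, members in cells.items()}
```

Take the four points `(0.0, 0.0)`, `(0.1, 0.9)`, `(0.95, 0.95)`, `(1.0, 1.0)` with two bins per dimension. This returns `{(0, 0): 1, (0, 1): 1, (1, 1): 2}`, and bin `(1, 0)` is never created.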

**Figure 2.** (**a**) Grid partition of a two-dimensional dataset with six different density levels; and (**b**) the respective density cluster tree with four modes shown as leaves of the tree.
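A density cluster tree like the one in Figure 2(b) can be built from such a sparse histogram. Threshold at every distinct density level and take the connected components of the bins that remain. The sketch below is an illustrative assumption, not the paper's exact procedure:

- Bins touching in a face, edge, or corner count as connected. This enumerates `3**d - 1` neighbors per bin, so it is only practical in low dimensions.
- The tree keeps one node per level and does not collapse chains.
- Leaves are reported as modes.

All function names are hypothetical.

```python
from itertools import product


def neighbors(b):
    """Bins touching bin b in a face, edge, or corner (Chebyshev distance 1)."""
    for offset in product((-1, 0, 1), repeat=len(b)):
        if any(offset):
            yield tuple(x + o for x, o in zip(b, offset))


def components(bins):
    """Connected components (as frozensets) of an iterable of bin tuples."""
    remaining, comps = set(bins), []
    while remaining:
        seed = remaining.pop()
        comp, stack = {seed}, [seed]
        while stack:
            for nb in neighbors(stack.pop()):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    stack.append(nb)
        comps.append(frozenset(comp))
    return comps


def cluster_tree(hist):
    """Build a density cluster tree from a sparse histogram {bin: count}.

    Nodes are (level, component) pairs, one per component per distinct count
    level; a node's parent is the component one level lower that contains it.
    """
    nodes, parent, previous = [], {}, []
    for level in sorted(set(hist.values())):
        current = [(level, comp)
                   for comp in components(b for b, c in hist.items() if c >= level)]
        for node in current:
            # A component at this level lies inside one component one level down.
            for prev in previous:
                if node[1] <= prev[1]:
                    parent[node] = prev
                    break
            nodes.append(node)
        previous = current
    return nodes, parent


def modes(nodes, parent):
    """Leaves of the tree, i.e., nodes that are nobody's parent (the density modes)."""
    parents = set(parent.values())
    return [n for n in nodes if n not in parents]
```

Consider the 1D histogram `{(0,): 3, (1,): 1, (2,): 4, (3,): 2}`. Its tree splits at level 2 and has two leaves, i.e., two modes separated by the low-density bin `(1,)`.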

**Figure 3.** Clustering of arbitrarily shaped clusters: (**a**) original dataset; and (**b**) histogram-based clustering result.

**Figure 5.** Sensitivity of the clustering results with respect to the bin size. The graph plots the number of mode clusters as a function of the number of bins per dimension.

**Figure 6.** Upsampling for a 2D physical space and a 2D attribute space: (**a**) the corner points of the 2D cell correspond to histogram bins that are not connected; and (**b**) after upsampling, the filled bins of the histogram are connected.
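The effect shown in Figure 6 can be reproduced with a toy cell. This sketch rests on several assumptions:

- Values inside a 2D cell are bilinearly interpolated.
- Each cell is sampled on a regular `(2**depth + 1)**2` lattice.
- Attributes are normalized to `[0, 1]`.
- Bins touching in a face, edge, or corner count as connected.

The corner values in the usage example are hypothetical. They are chosen so that the four corners land in opposite corners of a 10 × 10 histogram. All function names are illustrative.

```python
from itertools import product


def bilinear(c00, c10, c01, c11, u, v):
    """Bilinearly interpolate attribute tuples at local cell coordinates (u, v) in [0, 1]^2."""
    return tuple((1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
                 for a, b, c, d in zip(c00, c10, c01, c11))


def upsample_cell(c00, c10, c01, c11, depth):
    """Sample the cell on a regular (2**depth + 1)^2 lattice of local coordinates."""
    n = 2 ** depth
    return [bilinear(c00, c10, c01, c11, i / n, j / n)
            for i in range(n + 1) for j in range(n + 1)]


def filled_bins(samples, n_bins, lo=0.0, hi=1.0):
    """Set of histogram bins hit by the samples."""
    def idx(x):
        return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)
    return {tuple(idx(x) for x in s) for s in samples}


def is_connected(bins):
    """True if the bins form one component (face, edge, or corner contact counts)."""
    bins = set(bins)
    if not bins:
        return True
    start = next(iter(bins))
    seen, stack = {start}, [start]
    while stack:
        b = stack.pop()
        for offset in product((-1, 0, 1), repeat=len(b)):
            nb = tuple(x + o for x, o in zip(b, offset))
            if nb in bins and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(bins)


corners = ((0.05, 0.05), (0.95, 0.05), (0.05, 0.95), (0.95, 0.95))
coarse = filled_bins(upsample_cell(*corners, depth=0), n_bins=10)
fine = filled_bins(upsample_cell(*corners, depth=4), n_bins=10)
```

At depth 0 only the four corner bins are filled, and they are disconnected. At depth 4 the samples fill all 100 bins, which form a single connected region.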

**Figure 7.** Discrete histograms with 100 bins each and cluster trees at different interpolation depths for the data in Figure 9: (**a**) $d_{\max}=0$, tree nodes: 58; (**b**) $d_{\max}=4$, tree nodes: 39; (**c**) $d_{\max}=6$, tree nodes: 17; and (**d**) $d_{\max}=7$, tree nodes: 3. Red bins are local minima corresponding to branchings in the trees. Interpolation makes the histograms approach the shape of the continuous distribution and corrects the cluster tree.
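Figure 7 relates interpolation depth to spurious local minima in the histogram. A 1D toy version makes the mechanism concrete. A linear ramp has a uniform value distribution, so any interior minimum in its histogram is a sampling artifact.

The sketch assumes piecewise-linear upsampling with `2**depth` sub-intervals per grid cell. It defines a local minimum as an interior bin whose count is strictly below both neighbors. The ramp data and function names are hypothetical and are not the data of Figure 9.

```python
def upsample_1d(values, depth):
    """Piecewise-linear upsampling: each grid cell is split into 2**depth sub-intervals."""
    n = 2 ** depth
    out = []
    for a, b in zip(values, values[1:]):
        out.extend(a + (b - a) * k / n for k in range(n))
    out.append(values[-1])
    return out


def histogram(samples, n_bins, lo, hi):
    """Equal-width histogram over [lo, hi]; the value hi falls into the last bin."""
    counts = [0] * n_bins
    for s in samples:
        counts[min(int((s - lo) / (hi - lo) * n_bins), n_bins - 1)] += 1
    return counts


def local_minima(counts):
    """Interior bins whose count is strictly below both neighbors."""
    return [i for i in range(1, len(counts) - 1)
            if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]]


ramp = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical samples of a linear field
coarse_counts = histogram(upsample_1d(ramp, 0), 8, 0.0, 1.0)
fine_counts = histogram(upsample_1d(ramp, 4), 8, 0.0, 1.0)
```

With five samples of the ramp and 8 bins, depth 0 yields the counts `[1, 0, 1, 0, 1, 0, 1, 1]` and three spurious minima (bins 1, 3, and 5). At depth 4 the counts become `[8, 8, 8, 8, 8, 8, 8, 9]` and the minima disappear.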

**Figure 8.** Effect of bin size choice and interpolation procedure on synthetic data with known ground truth: ${10}^{2}$ bins are not enough to separate all clusters, resulting in a degenerate tree (**upper row**); ${30}^{2}$ bins are too many to keep clusters together (**middle row**); and interpolation of the data with the same number of bins corrects the tree (**lower row**). Cluster trees, parallel coordinates, and clusters in physical space are shown in the left, middle, and right columns, respectively.

**Figure 9.** Scalar field distribution (**a**) and continuous histogram (**b**) for artificial data. The bold vertical line denotes the scaled Dirac delta function in the histogram.

**Figure 10.** Designing a synthetic dataset: algebraic surfaces separate clusters in physical space (**a**); functions of the algebraic distance to these surfaces (**b**) define the distribution of two attributes; and the resulting distribution in attribute space is shown as a 2D scatterplot (**c**).

**Figure 11.** Cluster tree (**upper row**), parallel coordinates plot (**middle row**), and physical space visualization (**lower row**) for the 2008 IEEE Visualization Design Contest dataset, time slice 75. The original attribute space is clustered using a 10-dimensional histogram (**a**) before and (**b**) after interpolation. Several mode clusters are merged when applying our approach, which simplifies the tree and yields better cluster separation.

**Figure 12.** Two fields of the climate simulation dataset: (**a**) the surface temperature field is globally defined and smooth; (**b**) the surface runoff and drainage field has non-vanishing values only over land and may, therefore, be discontinuous along the land–sea border; and (**c**) due to the coarse grid resolution, about $20\%$ of the grid cells contain part of the land–sea border (shown in red).

**Figure 13.** 2D histograms with 25 bins each, built for (**a**) the original climate simulation data; (**b**) data upsampled using global smooth interpolation; and (**c**) data upsampled using nearest-neighbor interpolation within the cells intersecting the coastline. The upsampling depth in (**b**,**c**) is four. The horizontal and vertical axes correspond to the fields surface runoff and drainage, and surface temperature, respectively.
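Figure 13 contrasts smooth interpolation with nearest-neighbor interpolation in cells that intersect the coastline. A minimal sketch of that switch follows, under these assumptions:

- A cell counts as a boundary cell when its corners disagree in a land–sea mask.
- Nearest-neighbor interpolation copies the full attribute vector of the closest corner.
- All other cells use bilinear interpolation.

The function names and the corner values (runoff, temperature) are hypothetical.

```python
def is_boundary_cell(mask_corners):
    """Assumed rule: a cell intersects the land-sea border if its corner mask values differ."""
    return len(set(mask_corners)) > 1


def upsample_boundary_aware(corners, depth, sharp_boundary):
    """Upsample one 2D cell on a (2**depth + 1)^2 lattice.

    corners maps (0, 0), (1, 0), (0, 1), (1, 1) to attribute tuples.
    Boundary cells copy the nearest corner (nearest-neighbor interpolation);
    other cells use bilinear interpolation.
    """
    n = 2 ** depth
    c00, c10, c01, c11 = (corners[k] for k in ((0, 0), (1, 0), (0, 1), (1, 1)))
    samples = []
    for i in range(n + 1):
        for j in range(n + 1):
            u, v = i / n, j / n
            if sharp_boundary:
                # Nearest corner; ties on the cell midline go to the corner with index 1.
                samples.append(corners[(int(u >= 0.5), int(v >= 0.5))])
            else:
                samples.append(tuple(
                    (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
                    for a, b, c, d in zip(c00, c10, c01, c11)))
    return samples


# Hypothetical coastal cell: sea at u = 0 (runoff 0), land at u = 1 (runoff 0.8).
cell = {(0, 0): (0.0, 290.0), (1, 0): (0.8, 285.0), (0, 1): (0.0, 291.0), (1, 1): (0.8, 286.0)}
nn = upsample_boundary_aware(cell, 2, sharp_boundary=is_boundary_cell([False, True, False, True]))
smooth = upsample_boundary_aware(cell, 2, sharp_boundary=False)
```

Nearest-neighbor upsampling only reproduces the corner attribute vectors, so no runoff values appear between the sea and land values. Bilinear upsampling of the same cell creates intermediate runoff values, for example about 0.2 at a quarter of the cell width, that exist in neither region.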

**Figure 14.** Scatterplots of the “tornado” dataset, initially sampled on a $128^{3}$ regular grid: original data (**a**); and the result of adaptive upsampling with interpolation depth 5 (**b**).

**Table 1.** Computation times for the non-adaptive vs. adaptive upsampling schemes at different upsampling depths $d_{\max}$ (2008 IEEE Visualization Design Contest dataset).

| $d_{\max}$ | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Non-adaptive | 7.57 s | 59.84 s | 488.18 s | 3929 s | 27,360 s |
| Adaptive | 5.99 s | 27.5 s | 136.56 s | 717.04 s | 3646 s |
| Non-empty bins | 1984 | 3949 | 6400 | 9411 | 12,861 |
| Modified adaptive | 14.3 s | 26.0 s | 80.91 s | 437.76 s | 2737 s |
| Non-empty bins | 1984 | 2075 | 2451 | 3635 | 5945 |
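Table 1 shows that the adaptive scheme's cost grows much more slowly with $d_{\max}$ than the non-adaptive one. The adaptive criterion is not spelled out in this section. The sketch below assumes, in the spirit of Figure 6, that a 2D cell is split into four quadrants only while the histogram bins of its corners are not all mutually adjacent. Splitting stops after at most `d_max` levels.

Samples are keyed by their position on the finest lattice, so corners shared by neighboring sub-cells are counted once. This is a hypothetical illustration with hypothetical names. It does not model the "modified adaptive" variant.

```python
def bin_of(attr, n_bins, lo=0.0, hi=1.0):
    """Histogram bin of an attribute tuple (attributes assumed normalized to [lo, hi])."""
    return tuple(min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1) for x in attr)


def corners_adjacent(bins):
    """True if all corner bins pairwise differ by at most one in every coordinate."""
    return all(max(abs(p - q) for p, q in zip(a, b)) <= 1 for a in bins for b in bins)


def adaptive_upsample(corner_attr, d_max, n_bins):
    """Adaptively upsample one 2D cell.

    corner_attr maps (0, 0), (1, 0), (0, 1), (1, 1) to attribute tuples.
    Returns {lattice_point: attribute_tuple} on the finest (2**d_max + 1)^2
    lattice, so corners shared by sub-cells are stored only once.
    """
    n = 2 ** d_max
    c00, c10, c01, c11 = (corner_attr[k] for k in ((0, 0), (1, 0), (0, 1), (1, 1)))
    samples = {}

    def value(i, j):
        # Bilinear interpolation of the original cell at lattice point (i, j).
        u, v = i / n, j / n
        return tuple((1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
                     for a, b, c, d in zip(c00, c10, c01, c11))

    def visit(i0, j0, size):
        pts = [(i0, j0), (i0 + size, j0), (i0, j0 + size), (i0 + size, j0 + size)]
        for p in pts:
            if p not in samples:
                samples[p] = value(*p)
        # Stop at the finest level or once the corners' bins already touch.
        if size == 1 or corners_adjacent([bin_of(samples[p], n_bins) for p in pts]):
            return
        half = size // 2
        for di, dj in ((0, 0), (half, 0), (0, half), (half, half)):
            visit(i0 + di, j0 + dj, half)

    visit(0, 0, n)
    return samples
```

Under these assumptions, a cell whose corners fall into one bin yields just its 4 corner samples at any depth. A cell with widely spread corner values is refined only where needed. It never exceeds the `(2**d_max + 1)**2` samples of the non-adaptive lattice.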

