# Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data

## Abstract


## 1. Introduction

## 2. Related Work

#### 2.1. Multivariate Volume Data Visualization

#### 2.2. Clustering

#### 2.3. Interpolation in Attribute Space

## 3. Clustering

## 4. Interpolation

## 5. Adaptive Scheme

## 6. Nearest-Neighbor Interpolation at Sharp Material Boundaries

## 7. Interactive Visual Exploration

## 8. Results

## 9. Discussion

#### 9.1. Histogram Bin Size

#### 9.2. Upsampling Rate

## 10. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

- Whalen, D.; Norman, M.L. Competition Data Set and Description. 2008 IEEE Visualization Design Contest. 2008. Available online: http://vis.computer.org/VisWeek2008/vis/contests.html (accessed on 20 June 2018).
- Competition Data Set and Description. 2010 IEEE Visualization Design Contest. 2010. Available online: http://viscontest.sdsc.edu/2010/ (accessed on 20 June 2018).
- Bellman, R.E. Dynamic Programming; Princeton University Press: Princeton, NJ, USA, 1957.
- Molchanov, V.; Linsen, L. Overcoming the Curse of Dimensionality When Clustering Multivariate Volume Data. In Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Funchal, Portugal, 27–29 January 2018; SciTePress: Setubal, Portugal, 2018; Volume 3, pp. 29–39.
- Sauber, N.; Theisel, H.; Seidel, H.P. Multifield-Graphs: An Approach to Visualizing Correlations in Multifield Scalar Data. IEEE Trans. Vis. Comput. Graph. **2006**, 12, 917–924.
- Woodring, J.; Shen, H.W. Multi-variate, Time Varying, and Comparative Visualization with Contextual Cues. IEEE Trans. Vis. Comput. Graph. **2006**, 12, 909–916.
- Akiba, H.; Ma, K.L. A Tri-Space Visualization Interface for Analyzing Time-Varying Multivariate Volume Data. In Proceedings of the Eurographics/IEEE VGTC Symposium on Visualization, Norrkoping, Sweden, 23–25 May 2007; pp. 115–122.
- Blaas, J.; Botha, C.P.; Post, F.H. Interactive Visualization of Multi-Field Medical Data Using Linked Physical and Feature-Space Views. In Proceedings of the Eurographics/IEEE VGTC Symposium on Visualization (EuroVis), Norrkoping, Sweden, 23–25 May 2007; pp. 123–130.
- Daniels, J., II; Anderson, E.W.; Nonato, L.G.; Silva, C.T. Interactive Vector Field Feature Identification. IEEE Trans. Vis. Comput. Graph. **2010**, 16, 1560–1568.
- Maciejewski, R.; Woo, I.; Chen, W.; Ebert, D. Structuring Feature Space: A Non-Parametric Method for Volumetric Transfer Function Generation. IEEE Trans. Vis. Comput. Graph. **2009**, 15, 1473–1480.
- Linsen, L.; Long, T.V.; Rosenthal, P.; Rosswog, S. Surface extraction from multi-field particle volume data using multi-dimensional cluster visualization. IEEE Trans. Vis. Comput. Graph. **2008**, 14, 1483–1490.
- Linsen, L.; Long, T.V.; Rosenthal, P. Linking multi-dimensional feature space cluster visualization to surface extraction from multi-field volume data. IEEE Comput. Graph. Appl. **2009**, 29, 85–89.
- Dobrev, P.; Long, T.V.; Linsen, L. A Cluster Hierarchy-based Volume Rendering Approach for Interactive Visual Exploration of Multi-variate Volume Data. In Proceedings of the 16th International Workshop on Vision, Modeling and Visualization (VMV 2011), Berlin, Germany, 4–6 October 2011; Eurographics Association: Geneva, Switzerland, 2011; pp. 137–144.
- Jain, A.K.; Dubes, R.C. Algorithms for Clustering Data; Prentice Hall: Upper Saddle River, NJ, USA, 1988.
- Han, J.; Kamber, M. Data Mining: Concepts and Techniques; Morgan Kaufmann Publishers: Burlington, MA, USA, 2006.
- Hartigan, J.A. Clustering Algorithms; Wiley: Hoboken, NJ, USA, 1975.
- Hartigan, J.A. Statistical Theory in Clustering. J. Classif. **1985**, 2, 62–76.
- Wong, A.; Lane, T. A kth Nearest Neighbor Clustering Procedure. J. R. Stat. Soc. Ser. B **1983**, 45, 362–368.
- Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231.
- Hinneburg, A.; Keim, D. An efficient approach to clustering in large multimedia databases with noise. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 27–31 August 1998; pp. 58–65.
- Hinneburg, A.; Keim, D.A.; Wawryniuk, M. HD-Eye: Visual Mining of High-Dimensional Data. IEEE Comput. Graph. Appl. **1999**, 19, 22–31.
- Ankerst, M.; Breunig, M.M.; Kriegel, H.P.; Sander, J. OPTICS: Ordering points to identify the clustering structure. In Proceedings of the 1999 ACM SIGMOD International Conference On Management of Data, Seattle, WA, USA, 1–4 June 1999; pp. 49–60.
- Stuetzle, W. Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. J. Classif. **2003**, 20, 25–47.
- Stuetzle, W.; Nugent, R. A generalized single linkage method for estimating the cluster tree of a density. Tech. Rep. **2007**, 19, 397–418.
- Long, T.V. Visualizing High-Density Clusters in Multidimensional Data. Ph.D. Thesis, School of Engineering and Science, Jacobs University, Bremen, Germany, 2009.
- Bachthaler, S.; Weiskopf, D. Continuous Scatterplots. IEEE Trans. Vis. Comput. Graph. **2008**, 14, 1428–1435.
- Bachthaler, S.; Weiskopf, D. Efficient and Adaptive Rendering of 2-D Continuous Scatterplots. Comput. Graph. Forum **2009**, 28, 743–750.
- Heinrich, J.; Bachthaler, S.; Weiskopf, D. Progressive Splatting of Continuous Scatterplots and Parallel Coordinates. Comput. Graph. Forum **2011**, 30, 653–662.
- Lehmann, D.J.; Theisel, H. Discontinuities in Continuous Scatter Plots. IEEE Trans. Vis. Comput. Graph. **2010**, 16, 1291–1300.
- Lehmann, D.J.; Theisel, H. Features in Continuous Parallel Coordinates. IEEE Trans. Vis. Comput. Graph. **2011**, 17, 1912–1921.
- Karypis, G.; Han, E.H.; Kumar, V. Chameleon: Hierarchical Clustering Using Dynamic Modeling. Computer **1999**, 32, 68–75.
- Crawfis, R.; Max, N. Texture Splats for 3D Vector and Scalar Field Visualization. In Proceedings of the 4th Conference on Visualization ’93, San Jose, CA, USA, 26 October 1993; IEEE CS Press: Los Alamitos, CA, USA, 1993; pp. 261–266.

**Figure 1.** Grid partition of a two-dimensional dataset: the space is divided into equally sized bins in the first dimension (**a**); and the non-empty bins are further subdivided in the second dimension (**b**).
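As a rough illustration of the idea in Figure 1, the sketch below builds a histogram that stores only non-empty bins. It is not the authors' implementation: the function name `sparse_histogram`, the dictionary keyed by bin-index tuples, and the sample points are assumptions made for this example. Keying on the full index tuple yields the same set of non-empty bins as subdividing one dimension at a time.

```python
from collections import Counter


def sparse_histogram(points, bins_per_dim, lo, hi):
    """Count points per bin, storing only bins that contain at least one point.

    points: iterable of attribute tuples; lo/hi: per-dimension value range.
    (All names here are hypothetical, chosen for this sketch.)
    """
    counts = Counter()
    for p in points:
        key = []
        for x, a, b in zip(p, lo, hi):
            # Map x in [a, b] to an integer bin index in [0, bins_per_dim - 1].
            i = int((x - a) / (b - a) * bins_per_dim)
            key.append(min(max(i, 0), bins_per_dim - 1))
        counts[tuple(key)] += 1
    return counts


# Toy data: two points share a bin, the other two fall into separate bins.
points = [(0.05, 0.15), (0.07, 0.12), (0.95, 0.93), (0.55, 0.55)]
hist = sparse_histogram(points, bins_per_dim=10, lo=(0.0, 0.0), hi=(1.0, 1.0))
```

Memory use then grows with the number of occupied bins rather than with the full grid size, which matters once the attribute space has many dimensions.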

**Figure 2.** (**a**) Grid partition of a two-dimensional dataset with six different density levels; and (**b**) the respective density cluster tree with four modes shown as leaves of the tree.
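The relation between density levels and tree leaves in Figure 2 can be sketched as follows. This is a minimal example under stated assumptions, not the paper's algorithm: bins are treated as neighbours under 4-adjacency, and the helper names `components` and `mode_clusters` as well as the toy histogram are hypothetical. A connected set of bins with count at least `level` is reported as a mode when its highest count equals `level`, meaning it disappears at the next level.

```python
def components(bins):
    """Split a set of 2D bin indices into 4-connected components."""
    remaining = set(bins)
    result = []
    while remaining:
        seed = remaining.pop()
        comp, stack = {seed}, [seed]
        while stack:
            x, y = stack.pop()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    stack.append(nb)
        result.append(comp)
    return result


def mode_clusters(hist):
    """Return the leaves (modes) of the density cluster tree.

    hist maps a 2D bin index to its point count (sparse histogram).
    """
    modes = []
    for level in sorted(set(hist.values())):
        dense = {b for b, c in hist.items() if c >= level}
        for comp in components(dense):
            # A component whose peak equals the current level vanishes
            # at the next level, so it is a leaf of the tree.
            if max(hist[b] for b in comp) == level:
                modes.append(comp)
    return modes


# Toy histogram: two peaks (counts 3 and 2) separated by a valley (count 1).
hist = {(0, 0): 1, (1, 0): 3, (2, 0): 1, (3, 0): 2, (4, 0): 1}
modes = mode_clusters(hist)
```

In this toy histogram the two peaks are reported as separate modes because the valley bin between them drops out of the dense set as soon as the level exceeds 1.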

**Figure 3.** Clustering of arbitrarily shaped clusters: (**a**) original dataset; and (**b**) histogram-based clustering result.

**Figure 5.** Sensitivity of clustering results with respect to the bin size. The graph plots the number of mode clusters over the number of bins per dimension.

**Figure 6.** Upsampling for a 2D physical space and a 2D attribute space: (**a**) the corner points of the 2D cell correspond to bins of the histogram that are not connected; and (**b**) after upsampling, the filled bins of the histogram are connected.
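The effect shown in Figure 6 can be reproduced with a small experiment. The sketch below assumes bilinear interpolation inside a single 2D cell, attribute values normalized to [0, 1], and 8-neighbour adjacency between bins; these choices, the function names, and the corner values are assumptions for illustration and are not taken from the paper.

```python
def bilinear(corners, u, v):
    """Bilinearly interpolate attribute tuples given at the four cell corners.

    corners = (c00, c10, c01, c11); (u, v) are local coordinates in [0, 1].
    """
    c00, c10, c01, c11 = corners
    return tuple(
        (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
        for a, b, c, d in zip(c00, c10, c01, c11)
    )


def bin_index(value, bins_per_dim):
    """Map an attribute tuple in [0, 1]^n to integer bin indices."""
    return tuple(min(int(x * bins_per_dim), bins_per_dim - 1) for x in value)


def filled_bins(corners, depth, bins_per_dim):
    """Bins hit by the (2**depth + 1)**2 samples of one upsampled cell."""
    n = 2 ** depth
    return {
        bin_index(bilinear(corners, i / n, j / n), bins_per_dim)
        for i in range(n + 1)
        for j in range(n + 1)
    }


def is_connected(bins):
    """True if the 2D bins form one component under 8-neighbour adjacency."""
    bins = set(bins)
    stack = [next(iter(bins))]
    seen = set(stack)
    while stack:
        bx, by = stack.pop()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nb = (bx + dx, by + dy)
                if nb in bins and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return seen == bins


# Corner attribute values that land in four far-apart bins of a 10 x 10 histogram.
corners = ((0.05, 0.05), (0.95, 0.15), (0.15, 0.95), (0.85, 0.85))
before = filled_bins(corners, depth=0, bins_per_dim=10)
after = filled_bins(corners, depth=5, bins_per_dim=10)
```

At depth 0 only the four corner bins are filled and they are isolated. At depth 5 neighbouring samples differ by less than one bin width in each attribute, so the filled bins form a single connected region.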

**Figure 7.** (**a**) ${d}_{\mathrm{max}}=0$, tree nodes: 58; (**b**) ${d}_{\mathrm{max}}=4$, tree nodes: 39; (**c**) ${d}_{\mathrm{max}}=6$, tree nodes: 17; and (**d**) ${d}_{\mathrm{max}}=7$, tree nodes: 3. Discrete histograms with 100 bins each and cluster trees at different interpolation depths for the data in Figure 9. Red bins are local minima corresponding to branchings in the trees. Interpolation makes the histograms approach the form of a continuous distribution and corrects the cluster tree.

**Figure 8.** Effect of bin size choice and interpolation procedure on synthetic data with known ground truth: ${10}^{2}$ bins are not enough to separate all clusters, resulting in a degenerate tree (**upper row**); ${30}^{2}$ bins are too many to keep clusters together (**middle row**); and interpolation of data with the same number of bins corrects the tree (**lower row**). Cluster trees, parallel coordinates, and clusters in physical space are shown in the left, middle, and right columns, respectively.

**Figure 9.** Scalar field distribution (**a**); and continuous histogram (**b**) for artificial data. The bold vertical line denotes the scaled Dirac function in the histogram.

**Figure 10.** Designing a synthetic dataset: algebraic surfaces separate clusters in physical space (**a**). Functions of the algebraic distance to the surfaces (**b**) define the distribution of two attributes. The resulting distribution in the attribute space is shown in the form of a 2D scatterplot (**c**).

**Figure 11.** Cluster tree (**upper row**); parallel coordinates plot (**middle row**); and physical space visualization (**lower row**) for the 2008 IEEE Visualization Design Contest dataset, time slice 75, for the original attribute space using a 10-dimensional histogram (**a**) before and (**b**) after interpolation. Several mode clusters are merged when applying our approach, which leads to a simplification of the tree and better cluster separation.

**Figure 12.** Two fields of the climate simulation dataset: (**a**) the surface temperature field is globally defined and smooth; (**b**) the surface runoff and drainage field has non-vanishing values only for land regions and, therefore, may be discontinuous along the land–sea border; and (**c**) the fraction of grid cells containing part of the land–sea border (shown in red) is about $20\%$ due to the coarse resolution of the grid.

**Figure 13.** 2D histograms with 25 bins each built for: (**a**) the original climate simulation data; (**b**) data upsampled using a global smooth interpolation; and (**c**) data upsampled using nearest-neighbor interpolation within the cells intersecting the coastline. The upsampling depth in (**b**,**c**) is four. The horizontal and vertical axes correspond to the surface runoff and drainage field and the surface temperature field, respectively.
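The difference between panels (b) and (c) of Figure 13 comes down to which values an interpolant can produce inside a cell that straddles a discontinuity. The sketch below contrasts bilinear and nearest-neighbor interpolation on a single scalar cell. The corner values (0 for sea, 4 for land), the helper names, and the tie-breaking rule in `nearest` are assumptions for illustration only.

```python
def bilinear(corners, u, v):
    """Bilinear interpolation of scalar corner values (c00, c10, c01, c11)."""
    c00, c10, c01, c11 = corners
    return ((1 - u) * (1 - v) * c00 + u * (1 - v) * c10
            + (1 - u) * v * c01 + u * v * c11)


def nearest(corners, u, v):
    """Value of the closest corner; ties go to the corner at coordinate 1."""
    c00, c10, c01, c11 = corners
    if v < 0.5:
        return c00 if u < 0.5 else c10
    return c01 if u < 0.5 else c11


def upsample(corners, depth, interpolate):
    """Set of values produced on a (2**depth + 1)**2 sample grid in one cell."""
    n = 2 ** depth
    return {
        interpolate(corners, i / n, j / n)
        for i in range(n + 1)
        for j in range(n + 1)
    }


# Hypothetical coastal cell: two sea corners (0.0) and two land corners (4.0).
corners = (0.0, 0.0, 4.0, 4.0)
smooth = upsample(corners, depth=2, interpolate=bilinear)
sharp = upsample(corners, depth=2, interpolate=nearest)
```

Bilinear interpolation fills the histogram with intermediate values that bridge the sea and land values, while nearest-neighbor interpolation only repeats values already present at the corners and so keeps the two populations apart.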

**Figure 14.** Scatterplots of the “tornado” dataset initially sampled on a ${128}^{3}$ regular grid: original data (**a**); and result of adaptive upsampling with interpolation depth 5 (**b**).

**Table 1.** Computation times for non-adaptive vs. adaptive upsampling scheme at different upsampling depths (2008 IEEE Visualization Design Contest dataset).

${d}_{\mathrm{max}}$ | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Non-adaptive | 7.57 s | 59.84 s | 488.18 s | 3929 s | 27,360 s |
Adaptive | 5.99 s | 27.5 s | 136.56 s | 717.04 s | 3646 s |
Non-empty bins | 1984 | 3949 | 6400 | 9411 | 12,861 |
Modified adaptive | 14.3 s | 26.0 s | 80.91 s | 437.76 s | 2737 s |
Non-empty bins | 1984 | 2075 | 2451 | 3635 | 5945 |
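The timings in Table 1 are easier to compare as ratios. The short script below derives speedups and per-level growth factors from the tabulated values; the helper names `speedups` and `growth` are hypothetical, and the numbers are copied from the table without any further measurement.

```python
# Times in seconds for upsampling depths 0..4, copied from Table 1.
non_adaptive = [7.57, 59.84, 488.18, 3929.0, 27360.0]
adaptive = [5.99, 27.5, 136.56, 717.04, 3646.0]
modified = [14.3, 26.0, 80.91, 437.76, 2737.0]


def speedups(baseline, variant):
    """Ratio baseline / variant per depth (values above 1 mean variant is faster)."""
    return [round(b / v, 2) for b, v in zip(baseline, variant)]


def growth(times):
    """Factor by which the time increases from one depth to the next."""
    return [round(t1 / t0, 2) for t0, t1 in zip(times, times[1:])]


for name, times in (("adaptive", adaptive), ("modified adaptive", modified)):
    print(name, "speedup vs. non-adaptive:", speedups(non_adaptive, times))
print("non-adaptive growth per level:", growth(non_adaptive))
```

At depth 4 the adaptive scheme is about 7.5 times faster than the non-adaptive one, and the modified adaptive scheme about 10 times faster, although at depth 0 the modified scheme is slower (14.3 s vs. 7.57 s). The non-adaptive time grows by a factor close to 8 per level, which would be consistent with each level splitting every 3D cell into eight sub-cells; that reading is an assumption, since the table does not state it.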

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Molchanov, V.; Linsen, L.
Upsampling for Improved Multidimensional Attribute Space Clustering of Multifield Data. *Information* **2018**, *9*, 156.
https://doi.org/10.3390/info9070156
