# Multivariate Pointwise Information-Driven Data Sampling and Visualization


## Abstract


## 1. Introduction

- We propose a new multivariate association-driven data sampling algorithm for large-scale data summarization.
- Given a user-specified sampling fraction, we use pointwise information measures and statistical distribution-based sampling techniques to generate a sub-sampled data set that preserves the important multivariate features.
- We perform a detailed qualitative and quantitative study to demonstrate the efficacy of the proposed sampling scheme.
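
The sampling idea in the contributions above can be sketched for two variables. Pointwise mutual information is PMI(x, y) = log(p(x, y) / (p(x) p(y))), estimated here from equal-width histograms. Everything else in this sketch is an assumption made for illustration and is not the paper's procedure: the function names `pmi_per_point` and `pmi_guided_sample`, the default bin count, the rescaling of PMI into a weight, and the weighted draw without replacement.

```python
import math
import random
from collections import Counter


def bin_index(value, lo, hi, bins):
    """Map a value to an equal-width histogram bin in [0, bins - 1]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def pmi_per_point(xs, ys, bins=16):
    """Return PMI(x, y) = log(p(x, y) / (p(x) p(y))) for every data point."""
    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    bx = [bin_index(x, x_lo, x_hi, bins) for x in xs]
    by = [bin_index(y, y_lo, y_hi, bins) for y in ys]
    cx, cy, cxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # With counts c, p(x, y) / (p(x) p(y)) equals c_xy * n / (c_x * c_y).
    return [math.log(cxy[(i, j)] * n / (cx[i] * cy[j])) for i, j in zip(bx, by)]


def pmi_guided_sample(xs, ys, fraction, bins=16, seed=0):
    """Pick about fraction * n point indices, favouring high-PMI points."""
    rng = random.Random(seed)
    pmi = pmi_per_point(xs, ys, bins)
    lo, hi = min(pmi), max(pmi)
    span = (hi - lo) or 1.0
    # Assumption: rescale PMI into (0, 1] so it can serve as a sampling weight.
    weights = [(p - lo) / span + 1e-6 for p in pmi]
    k = max(1, round(fraction * len(xs)))
    # Weighted sampling without replacement: larger weights tend to get larger keys.
    keys = [rng.random() ** (1.0 / w) for w in weights]
    order = sorted(range(len(xs)), key=keys.__getitem__, reverse=True)
    return sorted(order[:k])
```

Calling `pmi_guided_sample(pressure, velocity, 0.03)` on two equally long lists of values would return the indices of the retained points. A fixed seed keeps the draw reproducible.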

## 2. Related Works

#### 2.1. Information Theory in Visualization

#### 2.2. Sampling for Data Analysis and Visualization

#### 2.3. Multivariate Data Analysis and Visualization

## 3. Method

#### 3.1. Random Sampling

#### 3.2. Proposed Multivariate Statistical Association-Driven Sampling

#### 3.2.1. Multivariate Pointwise Information Characterization

#### 3.2.2. Generalized Pointwise Information

#### 3.2.3. Pointwise Information-Guided Multivariate Sampling

## 4. Results

#### 4.1. Sample-Based Multivariate Query-Driven Visual Analysis

#### 4.1.1. Hurricane Isabel Data

#### 4.1.2. Turbulent Combustion Data

#### 4.1.3. Asteroid Impact Data

#### 4.1.4. Quantitative Evaluation of Query-Driven Analysis

#### 4.2. Reconstruction-Based Visualization of Sampled Data

#### 4.2.1. Hurricane Isabel Data

#### 4.2.2. Turbulent Combustion Data

#### 4.2.3. Asteroid Impact Data

#### 4.2.4. Image-Based Quantitative Evaluation of Reconstruction-Based Visualization

#### 4.3. Multivariate Correlation Analysis of the Proposed Sampling Method

## 5. Discussion, Limitations, and Future Works

## 6. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest


**Figure 1.** Visualization of the Pressure and Velocity fields of the Hurricane Isabel data set. The hurricane eye at the center of the Pressure field and the high-velocity region around the eye can be observed.

**Figure 2.** PMI computed from the Pressure and Velocity fields of the Hurricane Isabel data set. (**a**) shows the 2D plot of PMI values for all value pairs of Pressure and Velocity; (**b**) shows the PMI field for analyzing the PMI values in the spatial domain. Around the hurricane eye, the eyewall is highlighted as a high PMI-valued region, which indicates a joint feature in the data set involving the Pressure and Velocity fields.

**Figure 3.** Sampling results on the Isabel data set when the Pressure and Velocity variables are used. (**a**) shows the result of random sampling and (**b**) shows the result of the proposed pointwise information-driven sampling for sampling fraction $0.03$. Comparing with the PMI field presented in Figure 2b, it can be seen that the proposed method (**b**) samples densely from the regions where the statistical association between Pressure and Velocity is stronger.

**Figure 4.** Sampling result for the Isabel data set when three variables (QGraup, QCloud, and Precipitation) are used to perform sampling. In this case, the generalized specific correlation measure presented in Equation is used to compute multivariate associativity for the data points considering all three variables. (**a**–**c**) show the rendering of the QGraup, QCloud, and Precipitation fields, respectively. (**d**) presents the rendering of the sampled data points when the proposed multivariate sampling algorithm is applied to these three variables. It can be seen that the cloud and the rain bands show stronger statistical association among the three variables and hence are sampled densely. The sampling fraction used in this example is $0.05$.
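
For reference, the two-variable PMI and the specific correlation measure named in the Figure 4 caption are commonly written as shown here. The notation is an assumption: it follows the usual textbook statement of specific correlation as the pointwise form of total correlation, since the numbered equation itself is not recoverable from this text.

$$\mathrm{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\,p(y)}, \qquad \mathrm{SI}(x_1, \dots, x_n) = \log \frac{p(x_1, \dots, x_n)}{\prod_{i=1}^{n} p(x_i)}$$

For $n = 2$ the second expression reduces to the first, so the three-variable case in Figure 4 corresponds to $n = 3$ with the joint distribution of QGraup, QCloud, and Precipitation in the numerator.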

**Figure 5.** Results of the proposed sampling technique when the number of histogram bins used to compute the information-theoretic measure PMI is varied. The overall result remains similar, so the bin count does not significantly affect the outcome of the sampling algorithm.

**Figure 6.** Visualization of multivariate query-driven analysis performed on the sampled data using the Hurricane Isabel data set. The multivariate query −4900 < Pressure < −100 AND Velocity > 10 is applied on the sampled data sets. (**a**) shows all the points selected by the proposed sampling algorithm using the Pressure and Velocity variables. (**b**) shows the data points selected by the query when applied to the raw data. (**c**) shows the points selected when the query is performed on the sub-sampled data produced by the proposed sampling scheme, and (**d**) presents the result of the query when applied to a randomly sampled data set. The sampling fraction used in this experiment is $0.07$.

**Figure 7.** Visualization of multivariate query-driven analysis performed on the sampled data using the Turbulent Combustion data set. The multivariate query 0.3 < mixfrac < 0.7 AND 0.0006 < Y_OH < 0.1 is applied on the sampled data sets. (**a**) shows all the points selected by the proposed sampling algorithm using the mixfrac and Y_OH variables. (**b**) shows the data points selected by the query when applied to the raw data. (**c**) shows the points selected when the query is performed on the sub-sampled data produced by the proposed sampling scheme, and (**d**) presents the result of the query when applied to a randomly sampled data set. The sampling fraction used in this experiment is $0.07$.

**Figure 8.** Visualization of multivariate query-driven analysis performed on the sampled data using the Asteroid impact data set. The multivariate query 0.13 < tev < 0.5 AND 0.45 < v02 < 1.0 is applied on the sampled data sets. (**a**) shows all the points selected by the proposed sampling algorithm using the tev and v02 variables. (**b**) shows the data points selected by the query when applied to the raw data. (**c**) shows the points selected when the query is performed on the sub-sampled data produced by the proposed sampling scheme, and (**d**) presents the result of the query when applied to a randomly sampled data set. The sampling fraction used in this experiment is $0.07$.

**Figure 9.** Reconstruction-based visualization of the Velocity field of the Hurricane Isabel data set. Linear interpolation is used to reconstruct the data from the sub-sampled data sets. (**a**) shows the result from the original raw data. (**b**) provides the reconstruction result from the sub-sampled data generated by the proposed method, and (**c**) presents the result of reconstruction from randomly sampled data. The sampling fraction used in this experiment is $0.05$.

**Figure 10.** Reconstruction-based visualization of the mixfrac field of the Turbulent Combustion data set. Linear interpolation is used to reconstruct the data from the sub-sampled data sets. (**a**) shows the result from the original raw data. (**b**) provides the reconstruction result from the sub-sampled data generated by the proposed method, and (**c**) presents the result of reconstruction from randomly sampled data. The sampling fraction used in this experiment is $0.05$.

**Figure 11.** Reconstruction-based visualization of the Y_OH field of the Turbulent Combustion data set. Linear interpolation is used to reconstruct the data from the sub-sampled data sets. (**a**) shows the result from the original raw data. (**b**) provides the reconstruction result from the sub-sampled data generated by the proposed method, and (**c**) presents the result of reconstruction from randomly sampled data. The sampling fraction used in this experiment is $0.05$.

**Figure 12.** Reconstruction-based visualization of the tev field of the Asteroid impact data set. Linear interpolation is used to reconstruct the data from the sub-sampled data sets. (**a**) shows the result from the original raw data. (**b**) provides the reconstruction result from the sub-sampled data generated by the proposed method, and (**c**) presents the result of reconstruction from randomly sampled data. The sampling fraction used in this experiment is $0.05$.
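
The linear-interpolation reconstruction named in the captions of Figures 9 to 12 can be illustrated in one dimension. This is a minimal sketch, not the paper's 3D reconstruction: it assumes the samples are (position, value) pairs along a line and that grid positions outside the sampled range take the nearest sampled value. The function name `reconstruct_linear` is hypothetical.

```python
from bisect import bisect_left


def reconstruct_linear(sample_pos, sample_val, grid_pos):
    """Linearly interpolate sampled values back onto every grid position."""
    pairs = sorted(zip(sample_pos, sample_val))
    pos = [p for p, _ in pairs]
    val = [v for _, v in pairs]
    out = []
    for g in grid_pos:
        j = bisect_left(pos, g)  # index of the first sample at or right of g
        if j == 0:
            out.append(val[0])  # left of all samples: clamp (assumption)
        elif j == len(pos):
            out.append(val[-1])  # right of all samples: clamp (assumption)
        else:
            p0, p1 = pos[j - 1], pos[j]
            t = (g - p0) / (p1 - p0)
            out.append(val[j - 1] + t * (val[j] - val[j - 1]))
    return out
```

A denser set of samples in a region shortens the interpolation intervals there, which is why a sampling scheme that concentrates points on features would be expected to reconstruct those features more faithfully.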

**Figure 13.** Regions of interest (ROI) of the different data sets used for analysis. (**a**) shows the ROI in the Isabel data set, where the hurricane eye feature is selected. (**b**) shows the ROI for the Combustion data set, where the turbulent flame region is highlighted. Finally, (**c**) shows the ROI for the Asteroid data set, which covers the region where the asteroid has impacted the ocean surface and the water splash is ejected into the environment.

| Query / samp. frac | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) | 0.07 (Random) | 0.07 (Proposed) | 0.09 (Random) | 0.09 (Proposed) |
|---|---|---|---|---|---|---|---|---|---|---|
| Isabel data (−4900 < Pres < −100 & Vel > 10) | 0.0096 | 0.0468 | 0.029 | 0.143 | 0.048 | 0.233 | 0.0676 | 0.315 | 0.0846 | 0.388 |
| Isabel data (0 < Pres < 1500 & 10 < Vel < 35) | 0.0116 | 0.0103 | 0.0293 | 0.0332 | 0.05 | 0.0524 | 0.0724 | 0.078 | 0.0842 | 0.0969 |
| Isabel data (−4900 < Pres < −100 & Qva > 0.017) | 0.0086 | 0.0912 | 0.033 | 0.163 | 0.05 | 0.266 | 0.0637 | 0.284 | 0.086 | 0.314 |
| Isabel data (Pres > 300 & 0.02 < Qva < 0.03) | 0.0088 | 0.023 | 0.0159 | 0.0585 | 0.062 | 0.1241 | 0.0726 | 0.1507 | 0.0975 | 0.2446 |

| Query / samp. frac | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) | 0.07 (Random) | 0.07 (Proposed) | 0.09 (Random) | 0.09 (Proposed) |
|---|---|---|---|---|---|---|---|---|---|---|
| Combustion data (0.3 < mixfrac < 0.7 & 0.0006 < Y_OH < 0.1) | 0.0099 | 0.0275 | 0.029 | 0.081 | 0.048 | 0.135 | 0.0671 | 0.191 | 0.0862 | 0.244 |
| Combustion data (0.7 < mixfrac < 1.0 & 0.0005 < Y_OH < 0.0019) | 0.00884 | 0.0329 | 0.0291 | 0.1139 | 0.0474 | 0.1892 | 0.0686 | 0.2636 | 0.0877 | 0.3518 |

| Query / samp. frac | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) | 0.07 (Random) | 0.07 (Proposed) | 0.09 (Random) | 0.09 (Proposed) |
|---|---|---|---|---|---|---|---|---|---|---|
| Asteroid data (0.13 < tev < 0.5 & 0.45 < v02 < 1.0) | 0.013 | 0.067 | 0.029 | 0.202 | 0.0479 | 0.328 | 0.0678 | 0.431 | 0.086 | 0.52 |
| Asteroid data (0.1 < tev < 0.3 & 0.01 < v02 < 0.6) | 0.0097 | 0.0827 | 0.0302 | 0.2497 | 0.0491 | 0.4154 | 0.0668 | 0.5777 | 0.0866 | 0.7083 |
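
One way to score a sub-sample against a two-variable range query is the fraction of the raw data's query-satisfying points that survive in the sample. The sketch below works under that assumption only: the metric definition, the function names `query_mask` and `retained_fraction`, and the generic variables `a` and `b` are illustrative and are not taken from the paper.

```python
def query_mask(a, b, a_range, b_min):
    """Flag points that satisfy a_lo < a < a_hi AND b > b_min."""
    a_lo, a_hi = a_range
    return [a_lo < u < a_hi and v > b_min for u, v in zip(a, b)]


def retained_fraction(mask, sampled_indices):
    """Fraction of query-satisfying raw points that are present in the sample."""
    hits = {i for i, flag in enumerate(mask) if flag}
    if not hits:
        return 0.0
    return len(hits & set(sampled_indices)) / len(hits)
```

Under this definition a uniform random sample is expected to score close to its sampling fraction, so a score well above the sampling fraction would indicate that the sample concentrates on the queried region.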

| Isabel Pressure Field (samp. frac) | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) |
|---|---|---|---|---|---|---|
| SSIM | 0.9844 | 0.9915 | 0.9916 | 0.9931 | 0.9926 | 0.9939 |
| MSE | 6.5563 | 1.9267 | 2.5239 | 1.2559 | 2.0576 | 0.8961 |

| Isabel Velocity Field (samp. frac) | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) |
|---|---|---|---|---|---|---|
| SSIM | 0.9234 | 0.9559 | 0.9427 | 0.9649 | 0.9516 | 0.9702 |
| MSE | 13.9638 | 8.492 | 10.6865 | 6.0452 | 8.1166 | 5.0213 |

| Isabel Pressure Field (samp. frac) | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) |
|---|---|---|---|---|---|---|
| SSIM | 0.9834 | 0.9919 | 0.9915 | 0.9926 | 0.9916 | 0.9929 |
| MSE | 6.5982 | 2.3903 | 3.0432 | 2.0518 | 3.0987 | 1.9561 |

| Isabel QVapor Field (samp. frac) | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) |
|---|---|---|---|---|---|---|
| SSIM | 0.7495 | 0.7726 | 0.7745 | 0.7899 | 0.7838 | 0.80521 |
| MSE | 12.7532 | 11.8243 | 10.2122 | 9.2676 | 9.262 | 8.2 |

| Combustion mixfrac Field (samp. frac) | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) |
|---|---|---|---|---|---|---|
| SSIM | 0.8913 | 0.9138 | 0.9373 | 0.9538 | 0.9452 | 0.9708 |
| MSE | 14.5252 | 12.2813 | 9.376 | 7.9371 | 18.141 | 5.7753 |

| Combustion Y_OH Field (samp. frac) | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) |
|---|---|---|---|---|---|---|
| SSIM | 0.8868 | 0.9061 | 0.9401 | 0.9565 | 0.955 | 0.9739 |
| MSE | 14.4677 | 13.6179 | 9.111 | 8.0836 | 7.4155 | 5.9128 |

| Asteroid tev Field (samp. frac) | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) |
|---|---|---|---|---|---|---|
| SSIM | 0.9746 | 0.9813 | 0.9808 | 0.9885 | 0.9849 | 0.9908 |
| MSE | 4.93 | 4.3499 | 3.8366 | 3.1976 | 3.2674 | 2.7139 |

| Asteroid v02 Field (samp. frac) | 0.01 (Random) | 0.01 (Proposed) | 0.03 (Random) | 0.03 (Proposed) | 0.05 (Random) | 0.05 (Proposed) |
|---|---|---|---|---|---|---|
| SSIM | 0.7898 | 0.8121 | 0.7972 | 0.8213 | 0.8064 | 0.8326 |
| MSE | 31.27 | 32.91 | 26.301 | 27.656 | 23.9335 | 25.4177 |
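
The two image metrics reported in these tables can be sketched on flattened grayscale images. MSE is the mean of squared pixel differences. The SSIM variant below is a simplification and should be read as an assumption: it evaluates the SSIM formula once over the whole image instead of averaging it over local windows as the standard metric does, and it uses the conventional constants 0.01 and 0.03. The function names and the `data_range` default are illustrative.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def global_ssim(a, b, data_range=255.0):
    """SSIM evaluated over a single global window (simplified variant)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Stabilizing constants from the usual SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator
```

Higher SSIM and lower MSE both indicate a rendering closer to the one produced from the raw data, which is how the Random and Proposed columns are meant to be compared.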

| Data set (variables) | Raw Data: Pearson’s Correlation | Raw Data: Distance Correlation | PMI-Based Sampling: Pearson’s Correlation | PMI-Based Sampling: Distance Correlation | Random Sampling: Pearson’s Correlation | Random Sampling: Distance Correlation |
|---|---|---|---|---|---|---|
| Isabel Data (Pressure and QVapor) | −0.19803 | 0.3200 | −0.19805 | 0.3205 | −0.1966 | 0.3213 |
| Combustion Data (mixfrac and Y_OH) | 0.01088 | 0.4012 | 0.01624 | 0.4054 | 0.02123 | 0.4071 |
| Asteroid Data (tev and v02) | 0.2116 | 0.2938 | 0.2273 | 0.2994 | 0.2382 | 0.31451 |

| Data set (variables) | Raw Data: Pearson’s Correlation | Raw Data: Distance Correlation | PMI-Based Sampling: Pearson’s Correlation | PMI-Based Sampling: Distance Correlation | Random Sampling: Pearson’s Correlation | Random Sampling: Distance Correlation |
|---|---|---|---|---|---|---|
| Isabel Data (Pressure and QVapor) | 0.3725 | 0.5470 | 0.3735 | 0.5530 | 0.3686 | 0.5480 |
| Combustion Data (mixfrac and Y_OH) | 0.3462 | 0.5113 | 0.3588 | 0.5248 | 0.3663 | 0.5321 |
| Asteroid Data (tev and v02) | −0.028 | 0.3622 | −0.0209 | 0.1795 | −0.0259 | 0.1797 |
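
The two dependence measures in these tables can be computed as follows. This is a small pure-Python sketch for illustration, assuming the standard sample definitions of Pearson's correlation and of distance correlation (double-centered pairwise distance matrices). The function names are hypothetical, and the quadratic memory use makes it suitable only for small inputs.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Pairwise |v_i - v_j| matrix with row, column, and grand means removed."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # matrix is symmetric: row means = column means
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation; 0 means no detected dependence."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for r in a for v in r) / n ** 2
    dvar_y = sum(v * v for r in b for v in r) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0
```

Reporting both is useful because Pearson's coefficient only captures linear dependence: for a symmetric relation such as y = x² on a range centered at zero it is zero, while the distance correlation stays clearly positive.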

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Dutta, S.; Biswas, A.; Ahrens, J.
Multivariate Pointwise Information-Driven Data Sampling and Visualization. *Entropy* **2019**, *21*, 699.
https://doi.org/10.3390/e21070699
