# Multivariate Pointwise Information-Driven Data Sampling and Visualization

## Abstract

## 1. Introduction

- We propose a new multivariate association-driven data sampling algorithm for large-scale data summarization.
- Given a user-specified sampling fraction, we use pointwise information measures and statistical distribution-based sampling techniques to generate sub-sampled data that preserve the important multivariate features.
- We perform a detailed qualitative and quantitative study to demonstrate the efficacy of the proposed sampling scheme.

## 2. Related Works

#### 2.1. Information Theory in Visualization

#### 2.2. Sampling for Data Analysis and Visualization

#### 2.3. Multivariate Data Analysis and Visualization

## 3. Method

#### 3.1. Random Sampling

#### 3.2. Proposed Multivariate Statistical Association-Driven Sampling

#### 3.2.1. Multivariate Pointwise Information Characterization

#### 3.2.2. Generalized Pointwise Information

#### 3.2.3. Pointwise Information-Guided Multivariate Sampling

## 4. Results

#### 4.1. Sample-Based Multivariate Query-Driven Visual Analysis

#### 4.1.1. Hurricane Isabel Data

#### 4.1.2. Turbulent Combustion Data

#### 4.1.3. Asteroid Impact Data

#### 4.1.4. Quantitative Evaluation of Query-Driven Analysis

#### 4.2. Reconstruction-Based Visualization of Sampled Data

#### 4.2.1. Hurricane Isabel Data

#### 4.2.2. Turbulent Combustion Data

#### 4.2.3. Asteroid Impact Data

#### 4.2.4. Image-Based Quantitative Evaluation of Reconstruction-Based Visualization

#### 4.3. Multivariate Correlation Analysis of the Proposed Sampling Method

## 5. Discussion, Limitations, and Future Works

## 6. Conclusions

**Figure 1.** Visualization of the Pressure and Velocity fields of the Hurricane Isabel data set. The hurricane eye at the center of the Pressure field and the high-velocity region around the eye can be observed.

**Figure 2.** PMI computed from the Pressure and Velocity fields of the Hurricane Isabel data set. (**a**) shows the 2D plot of PMI values for all value pairs of Pressure and Velocity; (**b**) shows the PMI field for analyzing the PMI values in the spatial domain. Around the hurricane eye, the eyewall is highlighted as a high-PMI region, which indicates a joint feature of the Pressure and Velocity fields.
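The PMI table and PMI field described in this caption can be approximated from a joint histogram. The following is a minimal sketch, assuming equal-width histogram bins and the standard definition $\mathrm{PMI}(x, y) = \log_2 \frac{p(x, y)}{p(x)\,p(y)}$; the function name `pmi_field` and the default bin count are illustrative choices, not taken from the paper.

```python
import numpy as np

def pmi_field(x, y, bins=64):
    """Return (PMI table per bin pair, PMI value per data point).

    Assumes x and y are co-located samples of two scalar fields.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()

    # Joint and marginal probabilities estimated from a 2D histogram.
    joint, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)

    with np.errstate(divide="ignore", invalid="ignore"):
        table = np.log2(p_xy / (p_x * p_y))
    table[p_xy == 0] = 0.0  # empty bins carry no information (a convention)

    # Look up the bin of every data point to obtain a spatial PMI field.
    ix = np.clip(np.digitize(x, x_edges[1:-1]), 0, bins - 1)
    iy = np.clip(np.digitize(y, y_edges[1:-1]), 0, bins - 1)
    return table, table[ix, iy]
```

Reshaping the second return value to the grid dimensions gives a field that can be rendered like panel (**b**); the table corresponds to the 2D plot in panel (**a**).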

**Figure 3.** Sampling results on the Isabel data set when the Pressure and Velocity variables are used. (**a**) shows the result of random sampling and (**b**) shows the result of the proposed pointwise information-driven sampling for sampling fraction $0.03$. Comparing with the PMI field presented in Figure 2b shows that the proposed sampling method samples densely from the regions where the statistical association between Pressure and Velocity is stronger (**b**).
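To illustrate how a per-point association score can bias sample selection toward strongly associated regions, here is a hedged sketch. It uses weighted sampling without replacement, which is a simplification and not necessarily the acceptance scheme of the proposed method; `importance_sample` and the `floor` parameter are hypothetical names introduced for this example.

```python
import numpy as np

def importance_sample(scores, fraction, rng=None, floor=1e-3):
    """Pick round(fraction * N) point indices, favoring high scores.

    `floor` keeps low-score points selectable so that the background
    is still sparsely represented (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    scores = np.asarray(scores, dtype=float).ravel()
    n_keep = int(round(fraction * scores.size))

    # Rescale scores to [0, 1]; fall back to uniform weights if constant.
    span = scores.max() - scores.min()
    if span > 0:
        weights = (scores - scores.min()) / span
    else:
        weights = np.ones_like(scores)
    weights = weights + floor

    probs = weights / weights.sum()
    return rng.choice(scores.size, size=n_keep, replace=False, p=probs)
```

Passing a per-point PMI field as `scores` with `fraction=0.03` would yield a point set that is dense where the association is strong, in the spirit of panel (**b**); uniform scores reduce this to random sampling as in panel (**a**).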

**Figure 4.** Sampling result for the Isabel data set when three variables (QGraup, QCloud, and Precipitation) are used to perform sampling. In this case, the generalized specific correlation measure presented in Equation is used to compute multivariate associativity for the data points considering all three variables. (**a**–**c**) show the renderings of the QGraup, QCloud, and Precipitation fields, respectively. (**d**) presents the rendering of the sampled data points when the proposed multivariate sampling algorithm is applied to these three variables. The cloud and the rain bands show stronger statistical association among the three variables and hence are sampled densely. The sampling fraction used in this example is $0.05$.
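For more than two variables, one common generalization of PMI is the pointwise form of total correlation, $\log_2 \frac{p(x_1, \ldots, x_n)}{\prod_i p(x_i)}$. The sketch below assumes this is the form meant by "specific correlation" here and estimates it with an $n$-dimensional histogram; the function name and bin count are illustrative.

```python
import numpy as np

def pointwise_total_correlation(fields, bins=32):
    """Per-point log2( p(x1..xn) / prod_i p(xi) ) from an n-D histogram."""
    data = np.column_stack(
        [np.asarray(f, dtype=float).ravel() for f in fields]
    )
    n_vars = data.shape[1]

    joint, edges = np.histogramdd(data, bins=bins)
    p_joint = joint / joint.sum()

    # Product of the marginals, built by broadcasting each marginal
    # back to the full n-D histogram shape.
    prod = np.ones_like(p_joint)
    for axis in range(n_vars):
        others = tuple(a for a in range(n_vars) if a != axis)
        prod = prod * p_joint.sum(axis=others, keepdims=True)

    with np.errstate(divide="ignore", invalid="ignore"):
        score = np.log2(p_joint / prod)
    score[p_joint == 0] = 0.0  # convention for empty bins

    # Bin index of every data point along every variable.
    idx = tuple(
        np.clip(np.digitize(data[:, k], edges[k][1:-1]), 0, bins - 1)
        for k in range(n_vars)
    )
    return score[idx]
```

The histogram has `bins ** n` cells, so the bin count should shrink as variables are added; for three variables, 32 bins per axis is still inexpensive.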

**Figure 5.** Results of the proposed sampling technique when the number of histogram bins used to compute the information-theoretic measure PMI is varied. The overall result remains similar, so the bin count does not significantly affect the outcome of the sampling algorithm.

**Figure 6.** Visualization of multivariate query-driven analysis performed on the sampled data using the Hurricane Isabel data set. The multivariate query −4900 < Pressure < −100 AND Velocity > 10 is applied to the sampled data sets. (**a**) shows all the points selected by the proposed sampling algorithm using the Pressure and Velocity variables. (**b**) shows the data points selected by the query when applied to the raw data. (**c**) shows the points selected when the query is performed on the sub-sampled data produced by the proposed sampling scheme, and (**d**) presents the result of the query when applied to a randomly sampled data set. The sampling fraction used in this experiment is $0.07$.
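A query such as the one in this caption is a Boolean mask over the variables, and one way to compare sampling schemes is the share of query-matching raw points that survive in the sample. The sketch below assumes that metric and writes the Pressure bounds low-to-high; `query_retention` and its parameters are hypothetical names.

```python
import numpy as np

def query_retention(pressure, velocity, sample_idx,
                    p_range=(-4900.0, -100.0), v_min=10.0):
    """Fraction of query hits in the raw data that are kept by a sample.

    sample_idx holds the flat indices of the points chosen by a sampler.
    """
    pressure = np.asarray(pressure, dtype=float).ravel()
    velocity = np.asarray(velocity, dtype=float).ravel()

    # Multivariate range query evaluated on the raw data.
    hits = (
        (pressure > p_range[0])
        & (pressure < p_range[1])
        & (velocity > v_min)
    )
    n_hits = int(hits.sum())
    if n_hits == 0:
        return float("nan")  # the query selects nothing in the raw data

    # Count the hits that are also present in the sample.
    return float(hits[np.asarray(sample_idx)].sum()) / n_hits
```

For a purely random sample this ratio is expected to stay close to the sampling fraction, whereas a sampler that concentrates on the queried feature should score higher.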

**Figure 7.** Visualization of multivariate query-driven analysis performed on the sampled data using the Turbulent Combustion data set. The multivariate query 0.3 < mixfrac < 0.7 AND 0.0006 < Y_OH < 0.1 is applied to the sampled data sets. (**a**) shows all the points selected by the proposed sampling algorithm using the mixfrac and Y_OH variables. (**b**) shows the data points selected by the query when applied to the raw data. (**c**) shows the points selected when the query is performed on the sub-sampled data produced by the proposed sampling scheme, and (**d**) presents the result of the query when applied to a randomly sampled data set. The sampling fraction used in this experiment is $0.07$.

**Figure 8.** Visualization of multivariate query-driven analysis performed on the sampled data using the Asteroid Impact data set. The multivariate query 0.13 < tev < 0.5 AND 0.45 < v02 < 1.0 is applied to the sampled data sets. (**a**) shows all the points selected by the proposed sampling algorithm using the tev and v02 variables. (**b**) shows the data points selected by the query when applied to the raw data. (**c**) shows the points selected when the query is performed on the sub-sampled data produced by the proposed sampling scheme, and (**d**) presents the result of the query when applied to a randomly sampled data set. The sampling fraction used in this experiment is $0.07$.

**Figure 9.** Reconstruction-based visualization of the Velocity field of the Hurricane Isabel data set. Linear interpolation is used to reconstruct the data from the sub-sampled data sets. (**a**) shows the result from the original raw data. (**b**) shows the reconstruction from the sub-sampled data generated by the proposed method, and (**c**) shows the reconstruction from randomly sampled data. The sampling fraction used in this experiment is $0.05$.
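Reconstructing a full grid from scattered samples by linear interpolation can be sketched with SciPy's `griddata`, which triangulates the sample locations and interpolates linearly inside each simplex. This is an assumption about tooling, not the pipeline used for the figures; the nearest-neighbor fill outside the convex hull of the samples is likewise a choice made for this example.

```python
import numpy as np
from scipy.interpolate import griddata

def reconstruct_linear(shape, sample_idx, sample_values):
    """Rebuild a regular grid of `shape` from samples at flat indices."""
    # Grid coordinates of the sampled points.
    coords = np.column_stack(
        np.unravel_index(sample_idx, shape)
    ).astype(float)
    # Coordinates of every grid point, in C (row-major) order.
    grid = np.indices(shape).reshape(len(shape), -1).T.astype(float)

    recon = griddata(coords, sample_values, grid, method="linear")

    # Linear interpolation is undefined outside the convex hull of the
    # samples; fill those points from the nearest sample instead.
    holes = np.isnan(recon)
    if holes.any():
        recon[holes] = griddata(
            coords, sample_values, grid[holes], method="nearest"
        )
    return recon.reshape(shape)
```

The same function works for 2D slices and 3D volumes, although triangulating a large 3D sample set is considerably slower.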

**Figure 10.** Reconstruction-based visualization of the mixfrac field of the Turbulent Combustion data set. Linear interpolation is used to reconstruct the data from the sub-sampled data sets. (**a**) shows the result from the original raw data. (**b**) shows the reconstruction from the sub-sampled data generated by the proposed method, and (**c**) shows the reconstruction from randomly sampled data. The sampling fraction used in this experiment is $0.05$.

**Figure 11.** Reconstruction-based visualization of the Y_OH field of the Turbulent Combustion data set. Linear interpolation is used to reconstruct the data from the sub-sampled data sets. (**a**) shows the result from the original raw data. (**b**) shows the reconstruction from the sub-sampled data generated by the proposed method, and (**c**) shows the reconstruction from randomly sampled data. The sampling fraction used in this experiment is $0.05$.

**Figure 12.** Reconstruction-based visualization of the tev field of the Asteroid Impact data set. Linear interpolation is used to reconstruct the data from the sub-sampled data sets. (**a**) shows the result from the original raw data. (**b**) shows the reconstruction from the sub-sampled data generated by the proposed method, and (**c**) shows the reconstruction from randomly sampled data. The sampling fraction used in this experiment is $0.05$.

**Figure 13.** Regions of interest (ROI) of the different data sets used for analysis. (**a**) shows the ROI in the Isabel data set, where the hurricane eye feature is selected. (**b**) shows the ROI for the Combustion data set, where the turbulent flame region is highlighted. Finally, (**c**) shows the ROI for the Asteroid data set, which covers the region where the asteroid has impacted the ocean surface and the water splash is ejected into the environment.

| Query | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) | Random (samp. frac 0.07) | Proposed (samp. frac 0.07) | Random (samp. frac 0.09) | Proposed (samp. frac 0.09) |
|---|---|---|---|---|---|---|---|---|---|---|
| Isabel data (−4900 < Pres < −100 & Vel > 10) | 0.0096 | 0.0468 | 0.029 | 0.143 | 0.048 | 0.233 | 0.0676 | 0.315 | 0.0846 | 0.388 |
| Isabel data (0 < Pres < 1500 & 10 < Vel < 35) | 0.0116 | 0.0103 | 0.0293 | 0.0332 | 0.05 | 0.0524 | 0.0724 | 0.078 | 0.0842 | 0.0969 |
| Isabel data (−4900 < Pres < −100 & Qva > 0.017) | 0.0086 | 0.0912 | 0.033 | 0.163 | 0.05 | 0.266 | 0.0637 | 0.284 | 0.086 | 0.314 |
| Isabel data (Pres > 300 & 0.02 < Qva < 0.03) | 0.0088 | 0.023 | 0.0159 | 0.0585 | 0.062 | 0.1241 | 0.0726 | 0.1507 | 0.0975 | 0.2446 |

| Query | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) | Random (samp. frac 0.07) | Proposed (samp. frac 0.07) | Random (samp. frac 0.09) | Proposed (samp. frac 0.09) |
|---|---|---|---|---|---|---|---|---|---|---|
| Combustion data (0.3 < mixfrac < 0.7 & 0.0006 < Y_OH < 0.1) | 0.0099 | 0.0275 | 0.029 | 0.081 | 0.048 | 0.135 | 0.0671 | 0.191 | 0.0862 | 0.244 |
| Combustion data (0.7 < mixfrac < 1.0 & 0.0005 < Y_OH < 0.0019) | 0.00884 | 0.0329 | 0.0291 | 0.1139 | 0.0474 | 0.1892 | 0.0686 | 0.2636 | 0.0877 | 0.3518 |

| Query | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) | Random (samp. frac 0.07) | Proposed (samp. frac 0.07) | Random (samp. frac 0.09) | Proposed (samp. frac 0.09) |
|---|---|---|---|---|---|---|---|---|---|---|
| Asteroid data (0.13 < tev < 0.5 & 0.45 < v02 < 1.0) | 0.013 | 0.067 | 0.029 | 0.202 | 0.0479 | 0.328 | 0.0678 | 0.431 | 0.086 | 0.52 |
| Asteroid data (0.1 < tev < 0.3 & 0.01 < v02 < 0.6) | 0.0097 | 0.0827 | 0.0302 | 0.2497 | 0.0491 | 0.4154 | 0.0668 | 0.5777 | 0.0866 | 0.7083 |

| Isabel Pressure Field | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) |
|---|---|---|---|---|---|---|
| SSIM | 0.9844 | 0.9915 | 0.9916 | 0.9931 | 0.9926 | 0.9939 |
| MSE | 6.5563 | 1.9267 | 2.5239 | 1.2559 | 2.0576 | 0.8961 |

| Isabel Velocity Field | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) |
|---|---|---|---|---|---|---|
| SSIM | 0.9234 | 0.9559 | 0.9427 | 0.9649 | 0.9516 | 0.9702 |
| MSE | 13.9638 | 8.492 | 10.6865 | 6.0452 | 8.1166 | 5.0213 |

| Isabel Pressure Field | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) |
|---|---|---|---|---|---|---|
| SSIM | 0.9834 | 0.9919 | 0.9915 | 0.9926 | 0.9916 | 0.9929 |
| MSE | 6.5982 | 2.3903 | 3.0432 | 2.0518 | 3.0987 | 1.9561 |

| Isabel QVapor Field | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) |
|---|---|---|---|---|---|---|
| SSIM | 0.7495 | 0.7726 | 0.7745 | 0.7899 | 0.7838 | 0.80521 |
| MSE | 12.7532 | 11.8243 | 10.2122 | 9.2676 | 9.262 | 8.2 |

| Combustion mixfrac Field | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) |
|---|---|---|---|---|---|---|
| SSIM | 0.8913 | 0.9138 | 0.9373 | 0.9538 | 0.9452 | 0.9708 |
| MSE | 14.5252 | 12.2813 | 9.376 | 7.9371 | 18.141 | 5.7753 |

| Combustion Y_OH Field | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) |
|---|---|---|---|---|---|---|
| SSIM | 0.8868 | 0.9061 | 0.9401 | 0.9565 | 0.955 | 0.9739 |
| MSE | 14.4677 | 13.6179 | 9.111 | 8.0836 | 7.4155 | 5.9128 |

| Asteroid tev Field | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) |
|---|---|---|---|---|---|---|
| SSIM | 0.9746 | 0.9813 | 0.9808 | 0.9885 | 0.9849 | 0.9908 |
| MSE | 4.93 | 4.3499 | 3.8366 | 3.1976 | 3.2674 | 2.7139 |

| Asteroid v02 Field | Random (samp. frac 0.01) | Proposed (samp. frac 0.01) | Random (samp. frac 0.03) | Proposed (samp. frac 0.03) | Random (samp. frac 0.05) | Proposed (samp. frac 0.05) |
|---|---|---|---|---|---|---|
| SSIM | 0.7898 | 0.8121 | 0.7972 | 0.8213 | 0.8064 | 0.8326 |
| MSE | 31.27 | 32.91 | 26.301 | 27.656 | 23.9335 | 25.4177 |
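The SSIM and MSE rows above compare a rendering of reconstructed data against a rendering of the raw data. As a rough illustration of the two metrics, the sketch below computes MSE and a single-window (global) SSIM over whole images. Standard SSIM averages the same expression over local windows, so these values will not match a windowed implementation exactly; the 8-bit `data_range` default is an assumption.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images of equal shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean((a - b) ** 2))

def global_ssim(a, b, data_range=255.0):
    """SSIM evaluated once over the whole image (no sliding window)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # Stabilizing constants with the conventional K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = np.mean((a - mu_a) * (b - mu_b))

    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)
```

Higher SSIM and lower MSE both indicate a reconstruction closer to the reference image.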

| Data set (variables) | Raw Data: Pearson’s Correlation | Raw Data: Distance Correlation | PMI-Based Sampling: Pearson’s Correlation | PMI-Based Sampling: Distance Correlation | Random Sampling: Pearson’s Correlation | Random Sampling: Distance Correlation |
|---|---|---|---|---|---|---|
| Isabel Data (Pressure and QVapor) | −0.19803 | 0.3200 | −0.19805 | 0.3205 | −0.1966 | 0.3213 |
| Combustion Data (mixfrac and Y_OH) | 0.01088 | 0.4012 | 0.01624 | 0.4054 | 0.02123 | 0.4071 |
| Asteroid Data (tev and v02) | 0.2116 | 0.2938 | 0.2273 | 0.2994 | 0.2382 | 0.31451 |

| Data set (variables) | Raw Data: Pearson’s Correlation | Raw Data: Distance Correlation | PMI-Based Sampling: Pearson’s Correlation | PMI-Based Sampling: Distance Correlation | Random Sampling: Pearson’s Correlation | Random Sampling: Distance Correlation |
|---|---|---|---|---|---|---|
| Isabel Data (Pressure and QVapor) | 0.3725 | 0.5470 | 0.3735 | 0.5530 | 0.3686 | 0.5480 |
| Combustion Data (mixfrac and Y_OH) | 0.3462 | 0.5113 | 0.3588 | 0.5248 | 0.3663 | 0.5321 |
| Asteroid Data (tev and v02) | −0.028 | 0.3622 | −0.0209 | 0.1795 | −0.0259 | 0.1797 |
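The two correlation measures reported in these tables can be computed for a pair of variables as sketched below. Pearson's correlation captures linear dependence only, while distance correlation is zero only under independence. This is a direct $O(n^2)$ implementation of the sample distance correlation for scalar variables, so it assumes the inputs have already been subsampled to a few thousand points; the function names are illustrative.

```python
import numpy as np

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])

def _double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return (
        d
        - d.mean(axis=0, keepdims=True)
        - d.mean(axis=1, keepdims=True)
        + d.mean()
    )

def distance_correlation(x, y):
    """Sample distance correlation of two scalar variables."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()

    # Pairwise absolute distances, double-centered.
    A = _double_center(np.abs(x[:, None] - x[None, :]))
    B = _double_center(np.abs(y[:, None] - y[None, :]))

    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0  # at least one variable is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

For example, with `x` symmetric about zero and `y = x**2`, `pearson` is close to zero while `distance_correlation` is clearly positive, which is why both measures are useful when checking whether a sample preserves the multivariate relationships of the raw data.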

