# Dynamic Mixed Data Analysis and Visualization

## Abstract

`R`, and the methodology is illustrated on a real data set on healthcare, policy and restriction measures during the 2020–2021 COVID-19 pandemic across EU Member States.

## 1. Introduction

`R` which can be retrieved from https://github.com/giancaman/Manuscript-Dynamic-mixed-data-analysis-and-visualization-apps. The data sets and the instructions for running the apps are also contained in this GitHub repository, along with the Matlab and R code.

## 2. Materials and Methods

#### 2.1. A Robust Metric to Track Proximity over Time

#### 2.2. Tracking a Source of Variability over Time

#### 2.3. Distance Visualization over Time

`R` Shiny application.

#### 2.4. Dynamic Multiple MDS Maps

## 3. Results

#### 3.1. Data Set Construction

#### 3.2. Visualizing Time Series Data

#### 3.3. `R` Shiny Applications to Dynamically Monitor Countries’ Distances over Time

`R`. The first application shows time series of the distances between a reference country and other chosen countries. The second compares the reference country with all the other countries on a map, where each country is colored on a gradient proportional to its distance from the reference country. The third is a scatter plot of the countries on the first two MDS axes, with each country drawn as a circle whose size is proportional to either the number of new COVID-19 cases or the number of vaccinated people. The last two Shiny applications let the user follow the comparison dynamically over the entire 59-week period by means of a slider. All three applications start by asking the user to choose the location where the input data sets are stored.
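The quantities behind the first and third applications can be illustrated with a minimal Python sketch. It is not the authors' `R` or Matlab code: the data are synthetic, plain Euclidean distances stand in for the robust metric of Section 2.1, and the names `reference_series`, `classical_mds` and `dist_stack` are hypothetical. Given one distance matrix per week, it extracts the distances from a reference country over time and computes the first two principal coordinates by classical MDS.

```python
import numpy as np

def reference_series(dist_stack, ref):
    """Distances from country `ref` to every country, one row per week."""
    # dist_stack has shape (weeks, n, n); slicing gives shape (weeks, n).
    return dist_stack[:, ref, :]

def classical_mds(dist, k=2):
    """First k principal coordinates of a single distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain the inner-product matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # eigh returns ascending eigenvalues; keep the k largest.
    order = np.argsort(eigvals)[::-1][:k]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Synthetic stand-in data (assumption): 59 weeks, 25 countries, 3 numeric features.
rng = np.random.default_rng(0)
points = rng.normal(size=(59, 25, 3))
dist_stack = np.linalg.norm(points[:, :, None, :] - points[:, None, :, :], axis=-1)

series = reference_series(dist_stack, ref=0)  # one time series per country
coords = classical_mds(dist_stack[0], k=2)    # week-0 map coordinates
```

Each column of `series` is the kind of curve the time series application would plot, and `coords` gives the scatter-plot positions for a single week; repeating the MDS step for every week yields the frames a slider would step through.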

#### 3.3.1. Application 1: Time Series

#### 3.3.2. Application 2: Using a Map for Comparison

#### 3.3.3. Application 3: Displaying Principal Coordinates for Comparison

## 4. Discussion

`R` Shiny applications were implemented, allowing for flexible, user-friendly visualizations (time series, geographic maps, dimensional maps), the choice of countries to compare and the choice of variables to integrate. The time complexity of the robust representation of the data points is another important feature of our work, given the dynamic construction of the distance matrices (i.e., the computation of 59 weekly distance matrices for 25 countries): the execution time of the Matlab code was 7.38 s without graphs and 10.68 s with graphs (more than 30 plots produced). The code was executed on a Toshiba laptop with an Intel(R) Core(TM) i7-5500U CPU @ 2.40 GHz, 16.0 GB of RAM and a 64-bit Windows operating system (x64-based processor).
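To give a feel for the size of this computation, the following Python sketch builds 59 weekly distance matrices for 25 units and times the loop. It rests on several assumptions: the data are random, the plain Gower dissimilarity replaces the robust metric used in the paper, the numeric variables are scaled by their within-week range, and the function name `gower_distance` is hypothetical. Timings obtained this way are not comparable with the Matlab figures reported above.

```python
import time
import numpy as np

def gower_distance(num, cat):
    """Gower dissimilarity for n units with numeric and categorical columns."""
    # Numeric part: absolute differences scaled by each column's range.
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns then contribute zero
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / ranges
    # Categorical part: 1 where categories differ, 0 where they match.
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    # Average the per-variable dissimilarities over all variables.
    return np.concatenate([d_num, d_cat], axis=2).mean(axis=2)

# Random stand-in data (assumption): 4 numeric and 2 categorical variables.
rng = np.random.default_rng(1)
weeks, n = 59, 25
num = rng.normal(size=(weeks, n, 4))
cat = rng.integers(0, 3, size=(weeks, n, 2))

start = time.perf_counter()
dist_stack = np.stack([gower_distance(num[t], cat[t]) for t in range(weeks)])
elapsed = time.perf_counter() - start
print(f"{weeks} matrices of size {n}x{n} in {elapsed:.4f} s")
```

Scaling by the within-week range makes each weekly matrix self-contained; scaling by the range over the whole period would instead keep distances comparable across weeks, which may matter when tracking them over time.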

`R` package that, starting from the mixed-type raw data, implements user-selected distances and allows all possible visualizations to be obtained interactively.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

**Figure 4.** Maximum and minimum pairwise distances over time and dynamic box plot for pairwise distances.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Grané, A.; Manzi, G.; Salini, S. Dynamic Mixed Data Analysis and Visualization. *Entropy* **2022**, *24*, 1399.
https://doi.org/10.3390/e24101399
