Article

Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets

by Stefano Garlaschi 1,*,†, Anna Fochesato 2,3 and Anna Tovo 1,4,†
1
Dipartimento di Fisica e Astronomia “Galileo Galilei”, Università degli studi di Padova, Via Marzolo 8, 35131 Padova, Italy
2
Fondazione The Microsoft Research—University of Trento, Centre for Computational and Systems Biology (COSBI), Piazza Manifattura 1, 38068 Rovereto, Italy
3
Dipartimento di Matematica, Università degli studi di Trento, Via Sommarive 14, 38123 Povo, Italy
4
Dipartimento di Matematica “Tullio Levi-Civita”, Università degli studi di Padova, Via Trieste 63, 35121 Padova, Italy
*
Author to whom correspondence should be addressed.
†
These authors contributed equally to this work.
Entropy 2020, 22(10), 1084; https://doi.org/10.3390/e22101084
Received: 6 August 2020 / Revised: 17 September 2020 / Accepted: 23 September 2020 / Published: 26 September 2020
(This article belongs to the Section Information Theory, Probability and Statistics)
Recent technological and computational advances have enabled the collection of data at an unprecedented rate. On the one hand, the large amount of data suddenly available has opened up opportunities for new data-driven research; on the other hand, it has brought to light new obstacles and challenges related to storage and analysis limits. Here, we strengthen an upscaling approach borrowed from theoretical ecology that allows us to infer, with small errors, relevant patterns of a dataset in its entirety, even though only a limited fraction of it has been analysed. In particular, we show that, after reducing the amount of input information on the system under study, our framework can still recover two statistical patterns of interest for the entire dataset. Tested against big ecological, human activity and genomics data, our framework successfully reconstructed global statistics related to both the number of types and their abundances, starting from limited presence/absence information on small random samples of the datasets. These results pave the way for future applications of our procedure in different life science contexts, from social activities to natural ecosystems.
Keywords: upscaling life science data; statistical patterns inference; big data storage reduction
MDPI and ACS Style

Garlaschi, S.; Fochesato, A.; Tovo, A. Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets. Entropy 2020, 22, 1084. https://doi.org/10.3390/e22101084

AMA Style

Garlaschi S, Fochesato A, Tovo A. Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets. Entropy. 2020; 22(10):1084. https://doi.org/10.3390/e22101084

Chicago/Turabian Style

Garlaschi, Stefano, Anna Fochesato, and Anna Tovo. 2020. "Upscaling Statistical Patterns from Reduced Storage in Social and Life Science Big Datasets" Entropy 22, no. 10: 1084. https://doi.org/10.3390/e22101084
