Open Access Article
Axioms 2019, 8(1), 29; https://doi.org/10.3390/axioms8010029

Using Ramsey Theory to Measure Unavoidable Spurious Correlations in Big Data

1 Department of Mathematics and Statistics, University of Calgary, Calgary, AB T2N 1N4, Canada
2 Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY 10027, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Received: 18 January 2019 / Revised: 8 February 2019 / Accepted: 11 February 2019 / Published: 5 March 2019
(This article belongs to the Special Issue Perspectives on Big Data and Data Sciences)

Abstract

Given a dataset, we quantify the size of patterns that must always exist in the dataset. This is done formally through the lens of the Ramsey theory of graphs and a quantitative bound known as Goodman’s theorem. By combining statistical tools with the Ramsey theory of graphs, we give a nuanced understanding of how far a dataset is from being correlated, and of what qualifies as a meaningful pattern. This method is applicable to a wide range of datasets. As examples, we analyze two very different datasets. The first is a dataset of repeated voters (n = 435) in the 1984 US Congress, for which we quantify how homogeneous a subset of congressional voters is. We also measure how transitive a subset of voters is. Statistical Ramsey theory is also used with global economic trading data (n = 214) to provide evidence that global markets are quite transitive. While these datasets are small relative to Big Data, they illustrate the new applications we are proposing. We end with specific calls to strengthen the connections between Ramsey theory and statistical methods.
Keywords: statistics; data analysis; Ramsey theory; graph theory; transitivity
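
To make the role of Goodman’s theorem concrete, the following is a minimal sketch, not the authors’ code. It assumes the standard statement of the theorem: a lower bound on the number of monochromatic triangles in any 2-colouring of the edges of the complete graph on n vertices. The function names `goodman_bound` and `count_monochromatic_triangles`, and the encoding of a dataset as a set of "red" edges (for example, pairs of voters who agree), are hypothetical choices made for illustration.

```python
from itertools import combinations


def goodman_bound(n):
    """Goodman's lower bound on the number of monochromatic triangles
    in any red/blue colouring of the edges of K_n (standard statement)."""
    if n % 2 == 0:
        u = n // 2
        return u * (u - 1) * (u - 2) // 3
    if n % 4 == 1:
        u = (n - 1) // 4
        return 2 * u * (u - 1) * (4 * u + 1) // 3
    # Remaining case: n = 4u + 3
    u = (n - 3) // 4
    return 2 * u * (u + 1) * (4 * u - 1) // 3


def count_monochromatic_triangles(n, red_edges):
    """Brute-force count of triangles whose three edges share a colour.

    Vertices are 0..n-1. Edges listed in red_edges are red; every other
    pair of vertices is treated as blue (an illustrative assumption).
    """
    red = {frozenset(e) for e in red_edges}
    count = 0
    for a, b, c in combinations(range(n), 3):
        colours = {
            frozenset((a, b)) in red,
            frozenset((a, c)) in red,
            frozenset((b, c)) in red,
        }
        if len(colours) == 1:  # all three edges have the same colour
            count += 1
    return count


# Example: colour K_6 so that red is the complete bipartite graph between
# {0, 1, 2} and {3, 4, 5}. The only monochromatic triangles are then the
# two blue triangles inside each part, which meets Goodman's bound.
red = [(i, j) for i in range(3) for j in range(3, 6)]
observed = count_monochromatic_triangles(6, red)
unavoidable = goodman_bound(6)
excess = observed - unavoidable  # patterns beyond what every colouring must contain
```

In this reading, only the excess over the bound carries information about the dataset, since the bound counts triangles that any colouring is forced to contain. The brute-force count examines every triple of vertices, so it is intended for small n only.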
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
MDPI and ACS Style

Pawliuk, M.; Waddell, M.A. Using Ramsey Theory to Measure Unavoidable Spurious Correlations in Big Data. Axioms 2019, 8, 29.

Axioms, EISSN 2075-1680. Published by MDPI AG, Basel, Switzerland.