If regularity in data takes the form of higher-order functions among groups of variables, models which are biased towards lower-order functions may easily mistake the data for noise. To distinguish whether this is the case, one must be able to quantify the contribution of different orders of dependence to the total information. Recent work in information theory attempts to do this through measures of multivariate mutual information (MMI) and information decomposition (ID). Despite substantial theoretical progress, practical issues related to tractability and learnability of higher-order functions are still largely unaddressed. In this work, we introduce a new approach to information decomposition—termed Neural Information Decomposition (NID)—which is both theoretically grounded, and can be efficiently estimated in practice using neural networks. We show on synthetic data that NID can learn to distinguish higher-order functions from noise, while many unsupervised probability models cannot. Additionally, we demonstrate the usefulness of this framework as a tool for exploring biological and artificial neural networks.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.