Inferring and comparing complex, multivariable probability density functions is fundamental to problems in several fields, including probabilistic learning, network theory, and data analysis. Classification and prediction are the two faces of this class of problem. This study takes an approach that simplifies many aspects of these problems by presenting a structured series expansion of the Kullback-Leibler divergence, a function central to information theory, and devising a distance metric based on this divergence. Using the Möbius inversion duality between multivariable entropies and multivariable interaction information, we express the divergence as an additive series in the number of interacting variables, which provides a restricted and simplified set of distributions to use as approximations and with which to model data. Truncations of this series yield approximations based on the number of interacting variables. The first few terms of the expansion-truncation are illustrated and shown to lead naturally to familiar approximations, including the well-known Kirkwood superposition approximation. Truncation can also induce a simple relation between the multi-information and the interaction information. A measure of distance between distributions, based on the Kullback-Leibler divergence, is then described and shown to be a true metric if properly restricted. The expansion is shown to generate a hierarchy of metrics and to connect this work to information geometry formalisms. An example applying these metrics to a graph comparison problem shows that the formalism can be applied to a wide range of network problems and provides a general approach for systematic approximations in numbers of interactions or connections, as well as a related quantitative metric.
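
As a concrete illustration of the quantities named above, the following is a minimal sketch, not the study's own implementation, of the Kirkwood superposition approximation for three discrete variables together with the Kullback-Leibler divergence between a joint distribution and that approximation. The function names `kl_divergence` and `kirkwood`, the array layout `p[x, y, z]`, and the assumption of strictly positive single-variable marginals are choices made here for illustration only.

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) in nats, summed over the cells where p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def kirkwood(p):
    """Kirkwood superposition approximation of a joint p[x, y, z].

    Builds p(x,y) p(x,z) p(y,z) / (p(x) p(y) p(z)) from the marginals.
    Assumes every single-variable marginal is strictly positive.
    """
    p = np.asarray(p, dtype=float)
    pxy = p.sum(axis=2)        # p(x, y)
    pxz = p.sum(axis=1)        # p(x, z)
    pyz = p.sum(axis=0)        # p(y, z)
    px = p.sum(axis=(1, 2))    # p(x)
    py = p.sum(axis=(0, 2))    # p(y)
    pz = p.sum(axis=(0, 1))    # p(z)
    num = pxy[:, :, None] * pxz[:, None, :] * pyz[None, :, :]
    den = px[:, None, None] * py[None, :, None] * pz[None, None, :]
    return num / den

# Illustrative input: three independent binary variables.
px = np.array([0.3, 0.7])
py = np.array([0.6, 0.4])
pz = np.array([0.5, 0.5])
p_indep = px[:, None, None] * py[None, :, None] * pz[None, None, :]

q_indep = kirkwood(p_indep)
d_indep = kl_divergence(p_indep, q_indep)
```

For independent variables the approximation reproduces the joint distribution exactly, so the divergence vanishes up to rounding. For a general joint distribution the Kirkwood product need not sum to one, so it may have to be renormalized before being treated as a probability distribution.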
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.