This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Building mathematical models of cellular networks lies at the core of systems biology. It involves, among other tasks, the reconstruction of the structure of interactions between molecular components, which is known as network inference or reverse engineering. Information theory can help in the goal of extracting as much information as possible from the available data. A large number of methods founded on these concepts have been proposed in the literature, not only in biology journals, but in a wide range of areas. Their critical comparison is difficult due to the different focuses and the adoption of different terminologies. Here we attempt to review some of the existing information theoretic methodologies for network inference, and clarify their differences. While some of these methods have achieved notable success, many challenges remain, among which we can mention dealing with incomplete measurements, noisy data, counterintuitive behaviour emerging from nonlinear relations or feedback loops, and computational burden of dealing with large data sets.

Systems biology is an interdisciplinary approach for understanding complex biological systems at the system level [

This review paper deals with the problem of constructing models of biological systems from experimental data. More specifically, we are interested in reverse engineering cellular systems that can be naturally modeled as biochemical networks. A network consists of a set of nodes and a set of links between them. In cellular networks the nodes are molecular entities such as genes, proteins, or metabolites. The links or edges are the interactions between nodes, such as the chemical reactions where the molecules are present, or a higher level abstraction such as a regulatory interaction involving several reactions. Thus cellular networks can be classified, according to the type of entities and interactions involved, as gene regulatory, metabolic, or protein signaling networks.

The main goal of the methods studied here is to infer the network structure, that is, to deduce the set of interactions between nodes. This means that the focus is put on methods that—if we choose metabolism as an example—aim at finding which metabolites appear in the same reaction, as opposed to methods that aim at the detailed characterization of the reaction (determining its rate law and estimating the values of its kinetic constants). The latter is a related but different part of the inverse problem, and will not be considered here.

Some attributes of the entities are measurable, such as the concentration of a metabolite or the expression level of a gene. When available, those data are used as the input for the inference procedure. For that purpose, attributes are considered random variables that can be analyzed with statistical tools. For example, dependencies between variables can be expressed by correlation measures. Information theory provides a rigorous theoretical framework for studying the relations between attributes.

Information theory can be viewed as a branch of applied mathematics, or more specifically as a branch of probability theory [

The fundamental notion of information theory is entropy, which quantifies the uncertainty of a random variable and is used as a measure of information. Closely related to entropy is mutual information, a measure of the amount of information that one random variable provides about another. These concepts can be used to infer interactions between variables from experimental data, thus allowing reverse engineering of cellular networks.

A number of surveys approaching the network inference problem from different points of view, and covering information-theoretic as well as other methods, have been published in the past. To the best of the authors' knowledge, the first survey focused on the identification of biological systems dates back to 1978 [

The problem of network inference has been investigated in many different communities. The aforementioned reviews deal mostly with biological applications, and were published in journals of the bioinformatics, systems biology, microbiology, molecular biology, physical chemistry, and control engineering communities. Many more papers on the subject are regularly published in journals from other areas. System identification, a part of systems and control theory, is a discipline in its own right, with a rich literature [

Biological sciences have a long history of using statistical tools to measure the strength of dependence among variables. An early example is the correlation coefficient introduced by Pearson, which for paired samples $(x_i, y_i)$, $i = 1, \dots, n$, is defined as
$$ r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}, $$
where $\bar{x}$ and $\bar{y}$ are the sample means. The Pearson coefficient quantifies linear dependence between the two variables.

It should be noted that in this context the word “linear” may be used in two different ways. When applied to a deterministic system, it means that the differential equations that define the evolution of the system's variables in time are linear. On the other hand, when applied to the relationship between two variables, it means that the two-dimensional plot of their values (not of the variables as a function of time, i.e., not of their time courses) is a straight line.

A related concept, partial correlation, measures the dependence between two random variables while controlling for (removing the linear effect of) one or more additional variables.
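To make this concrete, the following is a minimal sketch (plain Python; the variable names and simulated chain are illustrative assumptions, not taken from the reviewed methods) of the first-order partial correlation formula, applied to a chain X → Z → Y in which the X–Y dependence is entirely mediated by Z:

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5

# Simulated chain X -> Z -> Y: X and Y correlate, but only through Z
random.seed(0)
x = [random.gauss(0, 1) for _ in range(5000)]
z = [xi + 0.3 * random.gauss(0, 1) for xi in x]
y = [zi + 0.3 * random.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)            # large: indirect dependence is visible
r_xy_z = partial_corr(x, y, z)  # near zero: Z explains the X-Y link
```

The same explaining-away logic reappears later in the methods that use partial correlation to discard indirect interactions.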

The Pearson coefficient is easy to calculate and symmetric, and its range of values has a clear interpretation. However, as noted in [

The joint entropy of a pair of discrete random variables $(X, Y)$ with joint probability function $p(x, y)$ is
$$ H(X, Y) = -\sum_{x}\sum_{y} p(x, y) \log p(x, y). $$

Conditional entropy quantifies the remaining uncertainty of a random variable $Y$ when another variable $X$ is known:
$$ H(Y \mid X) = -\sum_{x}\sum_{y} p(x, y) \log p(y \mid x). $$

The joint entropy and the conditional entropy are related so that the entropy of a pair of random variables is the entropy of one plus the conditional entropy of the other:
$$ H(X, Y) = H(X) + H(Y \mid X) = H(Y) + H(X \mid Y). $$

The relative entropy (or Kullback–Leibler divergence) is a measure of the distance between two distributions with probability functions $p(x)$ and $q(x)$:
$$ D(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}. $$

The relative entropy is always non-negative, and it is zero if and only if $p = q$.

Mutual information, $I(X; Y)$, is the relative entropy between the joint distribution $p(x, y)$ and the product of the marginal distributions $p(x)p(y)$:
$$ I(X; Y) = \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}. $$

Linfoot [

The mutual information is a measure of the amount of information that one random variable contains about another. It can also be defined as the reduction in the uncertainty of one variable due to the knowledge of another. Mutual information is related to entropy as follows:
$$ I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y). $$

Finally, the conditional mutual information measures the amount of information shared by two variables when a third variable is known:
$$ I(X; Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z). $$

If Y and Z carry the same information about X, the conditional mutual information $I(X; Y \mid Z)$ is zero, even if the unconditional mutual information $I(X; Y)$ is large.

The relationship between entropy, joint entropy, conditional entropy, and mutual information is graphically depicted in
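These relations between entropy, joint entropy, conditional entropy, and mutual information can be verified numerically on a small discrete joint distribution (the probability values below are arbitrary illustrative choices):

```python
from math import log2

# Arbitrary example joint distribution p(x, y) over binary X and Y
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal distributions
px = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
py = {y: sum(v for (_, b), v in p.items() if b == y) for y in (0, 1)}

H_X = -sum(v * log2(v) for v in px.values())
H_Y = -sum(v * log2(v) for v in py.values())
H_XY = -sum(v * log2(v) for v in p.values())

H_Y_given_X = H_XY - H_X        # chain rule: H(X,Y) = H(X) + H(Y|X)
I_XY = H_X + H_Y - H_XY         # mutual information from entropies

# The two standard expressions for I(X;Y) agree
assert abs(I_XY - (H_Y - H_Y_given_X)) < 1e-12
```

For this distribution the marginals are uniform, so H(X) = H(Y) = 1 bit, while the dependence between X and Y contributes roughly 0.28 bits of mutual information.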

Mutual information is a general measure of dependencies between variables. This suggests its application for evaluating similarities between datasets, which allows for inferring interaction networks of any kind: chemical, biological, social, or other. If two components of a network interact closely, their mutual information will be large; if they are not related, it will theoretically be zero. As already mentioned, mutual information is more general than the Pearson correlation coefficient, which is only rigorously applicable to linear correlations with Gaussian noise. Hence, mutual information may be able to detect additional non-linear dependencies that are undetectable by the Pearson coefficient, as has been shown for example in [

In practice, for the purpose of network inference, mutual information cannot be analytically calculated, because the underlying network is unknown. Therefore, it must be estimated from experimental data, a task for which several algorithms of different complexity can be used. The most straightforward approximation is to use a “naive” algorithm that partitions the data into a number of bins of a fixed width, and approximates the probabilities by the frequencies of occurrence. This simple approach has the drawback that the mutual information is systematically overestimated [
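A minimal sketch of such a naive estimator (equal-width binning with plug-in frequencies; the bin count and sample sizes are arbitrary choices of ours) illustrates both the systematic overestimation on independent data and the ability to detect a nonlinear relation that a linear correlation measure would miss:

```python
import random
from math import log2

def mutual_info_binned(xs, ys, bins=8):
    """Naive plug-in MI estimate (in bits): equal-width bins, empirical
    frequencies as probabilities. Known to overestimate MI on finite samples."""
    n = len(xs)
    def idx(v, lo, hi):
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)
    lox, hix, loy, hiy = min(xs), max(xs), min(ys), max(ys)
    pxy, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        a, b = idx(x, lox, hix), idx(y, loy, hiy)
        pxy[(a, b)] = pxy.get((a, b), 0) + 1 / n
        px[a] = px.get(a, 0) + 1 / n
        py[b] = py.get(b, 0) + 1 / n
    return sum(v * log2(v / (px[a] * py[b])) for (a, b), v in pxy.items())

random.seed(1)
xs = [random.uniform(-1, 1) for _ in range(4000)]
indep = [random.uniform(-1, 1) for _ in range(4000)]
quad = [x * x + 0.05 * random.gauss(0, 1) for x in xs]   # nonlinear dependence

mi_indep = mutual_info_binned(xs, indep)  # true MI is 0, estimate is > 0 (bias)
mi_quad = mutual_info_binned(xs, quad)    # clearly positive: nonlinear link found
```

The small but strictly positive estimate on independent data is exactly the overestimation bias mentioned above, which motivates the more sophisticated estimators and the threshold-selection schemes discussed next.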

Another issue related to estimation of mutual information is the determination of a threshold to distinguish interaction from non-interaction. One solution is given by the minimum description length (MDL) principle [

We finish this section by mentioning that a discussion of some issues concerning the definition of multivariate dependence has been presented in [

In the 1960s Marko proposed a generalization of Shannon's information theory called bidirectional information theory [ ], formulated for two stationary random processes $X_1$ and $X_2$. The directed transinformation from $X_1$ to $X_2$ quantifies the information that the past of $X_1$ provides about the current sample of $X_2$ beyond what is already contained in the past of $X_2$ itself; the directed transinformation from $X_2$ to $X_1$ is defined in the same way, exchanging the roles of the two processes. Unlike mutual information, these quantities are asymmetric, so they can indicate a direction of information flow.

Marko's work was continued two decades later by Massey [ ], who defined the directed information from a sequence $X^N = (X_1, \dots, X_N)$ to a sequence $Y^N = (Y_1, \dots, Y_N)$ as
$$ I(X^N \to Y^N) = \sum_{n=1}^{N} I(X^n; Y_n \mid Y^{n-1}), $$
which, unlike the mutual information $I(X^N; Y^N)$, is not symmetric in $X^N$ and $Y^N$.

If there is no feedback from $Y^N$ to $X^N$, the directed information coincides with the mutual information, $I(X^N \to Y^N) = I(X^N; Y^N)$; in general $I(X^N \to Y^N) \le I(X^N; Y^N)$, with feedback accounting for the difference.

Another generalization of Shannon entropy is the concept of nonextensive entropy. The Shannon entropy (also called Boltzmann–Gibbs entropy, which we denote here as $S_{BG}$) of a discrete distribution $p_1, \dots, p_N$ is
$$ S_{BG} = -k_B \sum_{i=1}^{N} p_i \ln p_i, $$
where $k_B$ is Boltzmann's constant. Tsallis proposed the generalization
$$ S_q = k_B \, \frac{1 - \sum_{i=1}^{N} p_i^{q}}{q - 1}, $$
which depends on an entropic index $q$ and recovers $S_{BG}$ in the limit $q \to 1$. For $q \neq 1$, $S_q$ is nonextensive: the entropy of a system composed of independent subsystems is no longer the sum of the subsystem entropies, in contrast with $S_{BG}$.
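The limiting behaviour can be checked numerically (the distribution values are arbitrary, and we set $k_B = 1$; this is an illustrative sketch, not code from the reviewed papers):

```python
from math import log

def shannon_entropy(p):
    """Boltzmann-Gibbs / Shannon entropy in nats, with k_B = 1."""
    return -sum(pi * log(pi) for pi in p if pi > 0)

def tsallis_entropy(p, q):
    """Tsallis nonextensive entropy S_q = (1 - sum_i p_i^q) / (q - 1), k_B = 1."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

p = [0.5, 0.25, 0.125, 0.125]        # arbitrary example distribution
s_bg = shannon_entropy(p)
s_near_1 = tsallis_entropy(p, 1.000001)  # approaches S_BG as q -> 1
s_2 = tsallis_entropy(p, 2.0)            # equals 1 - sum_i p_i^2
```

At $q = 2$ the Tsallis entropy reduces to one minus the sum of squared probabilities, which for this distribution is $1 - 0.34375 = 0.65625$.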

Early examples of techniques based on mutual information in a biological context can be found in [

Two years later, Butte and Kohane proposed the relevance networks approach, which links every pair of variables whose mutual information exceeds a chosen threshold.

Pearson's correlation coefficient was also used in [

In [

The aforementioned methods were developed mostly for gene expression data. In contrast, the next two techniques, Correlation Metric Construction (CMC) and Entropy Metric Construction (EMC), aimed at reverse engineering chemical reaction mechanisms, and used time series data (typically metabolic) of the concentration of the species present in the mechanism. In CMC [ ], the time-lagged correlation functions $r_{ij}(\tau)$ between the time series of species $i$ and $j$ are computed for a range of time lags $\tau$; the similarity of two species is summarized by the maximum of $|r_{ij}(\tau)|$ over $\tau$, and the resulting similarity matrix is converted into a distance matrix, which is analyzed with multidimensional scaling to reveal the connectivity of the underlying reaction network. Because the lag at which the correlation peaks reflects which species responds to which, the time series information also provides hints about the direction of the interactions.
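As an illustration of the time-lagged correlation idea, here is a simplified stand-in of ours for the CMC similarity step (the full method also builds a distance matrix and applies multidimensional scaling, which we omit), applied to a signal and a delayed copy of it:

```python
def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5

def lagged_similarity(x, y, max_lag=5):
    """Maximum of |corr| over time lags, in both directions (CMC-style score)."""
    best = 0.0
    for tau in range(max_lag + 1):
        m = len(x) - tau
        best = max(best,
                   abs(pearson(x[:m], y[tau:tau + m])),   # y lags behind x
                   abs(pearson(y[:m], x[tau:tau + m])))   # x lags behind y
    return best

# A periodic signal and a copy of it delayed by 3 time steps
x = [float(i % 7) for i in range(100)]
y = [0.0] * 3 + x[:-3]

zero_lag = abs(pearson(x, y))      # moderate: the zero-lag view is misleading
lagged = lagged_similarity(x, y)   # the delayed copy is detected at lag 3
```

The lag at which the maximum is attained (here, 3 steps with y following x) is what allows a CMC-style analysis to suggest directionality.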

The Entropy Metric Construction method, EMC [

Recently, CMC/EMC has inspired a method [

The empirical distance correlation (DCOR) was presented in [ ]. Given $n$ paired samples $(X_k, Y_k)$, $k = 1, \dots, n$, one first computes the pairwise distance matrices $a_{kl} = \|X_k - X_l\|$ and $b_{kl} = \|Y_k - Y_l\|$ and double-centers them:
$$ A_{kl} = a_{kl} - \bar{a}_{k\bullet} - \bar{a}_{\bullet l} + \bar{a}_{\bullet\bullet}, $$
where $\bar{a}_{k\bullet}$ and $\bar{a}_{\bullet l}$ are row and column means and $\bar{a}_{\bullet\bullet}$ is the grand mean, and similarly for $B_{kl}$. The empirical distance covariance is then
$$ \mathcal{V}_n^2(X, Y) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl} B_{kl}, $$
and the empirical distance correlation is
$$ \mathcal{R}_n(X, Y) = \frac{\mathcal{V}_n(X, Y)}{\sqrt{\mathcal{V}_n(X)\,\mathcal{V}_n(Y)}}, $$
where $\mathcal{V}_n(X) = \mathcal{V}_n(X, X)$ and the ratio is defined to be zero when the denominator vanishes.

Unlike the classical definition of correlation, distance correlation is zero only if the random vectors are independent. Furthermore, it is defined for X and Y of arbitrary dimensions, rather than only for univariate quantities. DCOR is a good example of a method that has gained recognition inside a research community (statistics) but whose merits have hardly become known to scientists working in other areas (such as the applied biological sciences), although some exceptions have recently appeared. In [
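A direct O(n²) implementation of the empirical distance correlation for scalar samples is short; the data below are simulated purely for illustration:

```python
import random

def dcor(x, y):
    """Empirical distance correlation for 1-D samples (double-centering form)."""
    n = len(x)
    def centered(v):
        d = [[abs(a - b) for b in v] for a in v]
        row = [sum(r) / n for r in d]          # row means (= column means here)
        grand = sum(row) / n
        return [[d[k][l] - row[k] - row[l] + grand for l in range(n)]
                for k in range(n)]
    A, B = centered(x), centered(y)
    dcov2 = max(sum(A[k][l] * B[k][l] for k in range(n) for l in range(n)) / n ** 2, 0.0)
    dvarx = sum(a * a for r in A for a in r) / n ** 2
    dvary = sum(b * b for r in B for b in r) / n ** 2
    denom = (dvarx * dvary) ** 0.5
    return (dcov2 / denom) ** 0.5 if denom > 0 else 0.0

random.seed(2)
xs = [random.uniform(-1, 1) for _ in range(200)]
quad = [x * x for x in xs]                        # nonlinear; Pearson r ~ 0
indep = [random.uniform(-1, 1) for _ in range(200)]

d_quad = dcor(xs, quad)    # substantial: the nonlinear dependence is detected
d_indep = dcor(xs, indep)  # small
```

Note that `d_quad` is clearly positive even though the Pearson correlation of `xs` and `quad` is near zero, which is precisely the advantage claimed for distance correlation.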

The Maximal Information Coefficient (MIC) is another recently proposed measure of association between variables [

The claims about MIC's performance expressed in the original publication [

Recently, the nonextensive entropy proposed by Tsallis has also been used in the context of reverse-engineering gene networks [

The reported results show an improvement in inference accuracy when nonextensive entropies are adopted instead of traditional entropies. The best computational results in terms of reduction of the number of false positives were obtained with the range of values 2.5 <

Finally, we discuss some methods that use the minimum description length principle (MDL) described in Subsection 2.2. MDL was applied in [

A number of methods have been proposed that use information theoretic considerations to distinguish between direct and indirect interactions. The underlying idea is to establish whether the variation in a variable can be explained by the variations in a subset of other variables in the system.

The Entropy Reduction Technique, ERT [

Given a species Y, start with an empty set of explanatory variables, S = ∅.

Find the variable X that minimizes the conditional entropy H(Y | S ∪ {X}).

Set S := S ∪ {X}.

Stop if H(Y | S) is zero or can no longer be reduced; otherwise, return to the second step.

Intuitively, the method determines whether the nonlinear variation in a variable Y, as given by its entropy, is explainable by the variations of a subset (possibly all) of the other variables in the system.
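A hedged sketch of such a greedy entropy-reduction search on discrete data follows; the plug-in entropy estimator, the variable names, and the stopping tolerance are our assumptions, not the original authors' implementation:

```python
import random
from math import log2

def entropy(rows):
    """Plug-in entropy (bits) of the empirical distribution of tuples in rows."""
    n = len(rows)
    counts = {}
    for r in rows:
        counts[r] = counts.get(r, 0) + 1
    return -sum(c / n * log2(c / n) for c in counts.values())

def cond_entropy(y, cols):
    """H(Y | cols) = H(Y, cols) - H(cols), estimated from discrete samples."""
    if not cols:
        return entropy(list(zip(y)))
    return entropy(list(zip(y, *cols))) - entropy(list(zip(*cols)))

def ert(y, variables, tol=1e-9):
    """Greedily add the variable that most reduces H(Y | selected)."""
    chosen, remaining = [], dict(variables)
    h = cond_entropy(y, [])
    while remaining:
        def h_with(name):
            return cond_entropy(y, [variables[k] for k in chosen] + [remaining[name]])
        best = min(remaining, key=h_with)
        h_new = h_with(best)
        if h - h_new <= tol:        # no further entropy reduction: stop
            break
        chosen.append(best)
        del remaining[best]
        h = h_new
    return chosen, h

random.seed(3)
a = [random.randint(0, 1) for _ in range(500)]
b = [random.randint(0, 1) for _ in range(500)]
c = [random.randint(0, 1) for _ in range(500)]  # irrelevant variable
y = [ai & bi for ai, bi in zip(a, b)]           # Y fully determined by a and b

selected, h_final = ert(y, {"a": a, "b": b, "c": c})
```

On this example the search selects the two true explanatory variables and stops with zero residual entropy, leaving the irrelevant variable out.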

The ARACNE method [ ] first estimates the mutual information of every pair of variables; the pairs whose mutual information exceeds a given threshold $I_0$ are identified as candidate interactions. This part is similar to the method of mutual information relevance networks. Then, invoking the data processing inequality, ARACNE examines every triplet of candidate edges whose three mutual informations are all greater than $I_0$ and removes the edge with the smallest value. In this way, ARACNE manages to reduce the number of false positives, which is a limitation of mutual information relevance networks. Indeed, when tested on synthetic data, ARACNE outperformed relevance networks and Bayesian networks. ARACNE has also been applied to experimental data, with the first application being reverse engineering of regulatory networks in human B cells [
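The triplet-pruning step can be sketched as follows (a simplified illustration with made-up mutual information values; the published ARACNE also uses a tolerance parameter when comparing edges, which we omit here):

```python
from itertools import combinations

def aracne_prune(mi, threshold=0.0):
    """Keep edges with MI above the threshold, then in every fully connected
    triplet mark the weakest edge for removal (data processing inequality)."""
    edges = {e: v for e, v in mi.items() if v > threshold}
    nodes = sorted({n for e in edges for n in e})
    weakest = set()
    for a, b, c in combinations(nodes, 3):
        tri = [frozenset({a, b}), frozenset({b, c}), frozenset({a, c})]
        if all(t in edges for t in tri):
            weakest.add(min(tri, key=lambda t: edges[t]))
    return {e: v for e, v in edges.items() if e not in weakest}

# Chain X -> Y -> Z: the indirect X-Z pair has the smallest MI and is pruned
mi = {frozenset({"X", "Y"}): 0.8,
      frozenset({"Y", "Z"}): 0.7,
      frozenset({"X", "Z"}): 0.3}
network = aracne_prune(mi, threshold=0.1)
```

For the chain above, the two direct edges survive while the indirect X–Z edge is removed, which is exactly the false-positive reduction described in the text.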

The definition of conditional mutual information suggests a natural way of discarding indirect interactions: for a candidate edge between X and Y, one examines $I(X; Y \mid Z)$ for the remaining variables $Z \in V \setminus \{X, Y\}$, and the edge is removed if conditioning on some Z reduces the information shared by X and Y to (approximately) zero.

The Context Likelihood of Relatedness technique, CLR [ ], also starts from the matrix of pairwise mutual information estimates, but scores each pair against its network context: the mutual information of a pair is converted into z-scores with respect to the background distributions of the mutual information values involving each of the two genes, and the two z-scores are combined into a joint score. This adaptive background correction reduces the number of false positives caused by indirect relationships and uneven noise levels.
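A sketch of a CLR-style scoring step follows (per-gene background z-scores combined quadratically; the exact normalization used in the published CLR implementation may differ, so treat this as an assumption-laden illustration):

```python
import statistics

def clr_scores(mi):
    """mi maps a pair (i, j) with i < j to a mutual information estimate.
    Each MI value is z-scored against the background of all MI values
    involving each of the two genes; negative z-scores are clipped to zero."""
    nodes = sorted({n for e in mi for n in e})
    background = {n: [v for e, v in mi.items() if n in e] for n in nodes}
    def z(n, value):
        mu = statistics.mean(background[n])
        sd = statistics.pstdev(background[n]) or 1.0  # guard: constant background
        return max(0.0, (value - mu) / sd)
    return {e: (z(e[0], v) ** 2 + z(e[1], v) ** 2) ** 0.5 for e, v in mi.items()}

# A interacts with B; the A-C and B-C pairs are background noise
mi = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "C"): 0.1}
scores = clr_scores(mi)
```

The A–B pair stands out from both genes' backgrounds and receives a large score, while the background pairs score zero.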

The Minimum Redundancy Networks technique (MRNET [ ]) infers a network by solving a series of supervised variable selection problems: each variable is taken in turn as the target, and the remaining variables are ranked with the maximum relevance/minimum redundancy (MRMR) criterion, which greedily selects variables having high mutual information with the target (relevance) and low average mutual information with the already selected variables (redundancy).
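The MRMR selection underlying MRNET can be sketched as follows (the toy mutual information values are invented for illustration; the published MRNET operates on MI matrices estimated from expression data):

```python
def mrmr_rank(target, candidates, mi, k=2):
    """Greedy MRMR: repeatedly pick the candidate maximizing
    relevance I(X; target) minus mean redundancy with already picked variables.
    `mi` maps frozenset({a, b}) -> mutual information estimate."""
    def I(a, b):
        return mi.get(frozenset({a, b}), 0.0)
    picked, pool = [], list(candidates)
    while pool and len(picked) < k:
        def score(x):
            red = sum(I(x, s) for s in picked) / len(picked) if picked else 0.0
            return I(x, target) - red
        best = max(pool, key=score)
        picked.append(best)
        pool.remove(best)
    return picked

# B and C are both relevant to Y but redundant with each other,
# while D adds complementary information
mi = {frozenset({"Y", "B"}): 0.8, frozenset({"Y", "C"}): 0.75,
      frozenset({"Y", "D"}): 0.5, frozenset({"B", "C"}): 0.9,
      frozenset({"B", "D"}): 0.1, frozenset({"C", "D"}): 0.1}
ranked = mrmr_rank("Y", ["B", "C", "D"], mi, k=2)
```

Although C is individually more relevant than D, its redundancy with the already selected B pushes it below D in the ranking, which is the intended behaviour of the criterion.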

A statistical learning strategy called three-way mutual information (MI3) was presented in [ ]. It considers triplets of variables $X_1$, $X_2$, and $Y$, where $X_1$ and $X_2$ are possible “regulators” of the target variable $Y$, and ranks candidate regulator pairs by a score that quantifies how much information $X_1$ and $X_2$ jointly provide about $Y$.

Both MI3 and ERT try to detect higher-order interactions and, for this purpose, they use scores calculated from single-variable entropies H(*) and from two- and three-variable joint entropies, H(*,*) and H(*,*,*). MI3 was specifically designed to detect cooperative activity between two regulators in transcriptional regulatory networks, and it was reported to outperform other methods such as Bayesian networks, two-way mutual information and a discrete version of MI3. A method [

A similar measure, averaged three-way mutual information (AMD), which considers triplets of variables $X_i$, $X_j$, $X_k$ and averages three-way mutual information terms over the possible orderings of the triplet, was defined in [

The Inferelator [

Other methods have relied on correlation measures instead of mutual information for detecting indirect interactions. A method to construct approximate undirected dependency graphs from large-scale biochemical data using partial correlation coefficients was proposed in [

In [

In [

Inferring the causality of an interaction is a complicated task, with deep theoretical implications. This topic has been extensively investigated by Pearl [

It was already mentioned that CMC can determine directionality because it takes time series information into account, as shown in [

Some methods based on mutual information have taken causality into account. One of them is EMC [

As mentioned in the Introduction, there are some publications where detailed analyses and comparisons of some of the methods reviewed here have been carried out. For example, in [

The differences and similarities of three other network inference algorithms—ARACNE, Context Likelihood of Relatedness (CLR), and MRNET—were studied in [

The same three inference algorithms, together with the Relevance Networks method (RN), were compared in [

A number of methods for inferring the connectivity of cellular networks have been reviewed in this article. Most of these methods, which have been published during the last two decades, adopt some sort of information theoretic approach for evaluating the probability of the interactions between network components. We have tried to review as many techniques as possible, surveying the literature from areas such as systems and computational biology, bioinformatics, molecular biology, microbiology, biophysics, physical and computational chemistry, physics, systems and process control, computer science, and statistics. Some methods were designed for specific purposes (e.g., reverse engineering gene regulatory networks), while others aim at a wider range of applications. We have attempted to give a unified treatment to methods from different backgrounds, clarifying their differences and similarities. When available, comparisons of their performances have been reported.

It has been shown that information theory provides a solid foundation for developing reverse engineering methodologies, as well as a framework to analyze and compare them. Concepts such as entropy or mutual information are of general applicability and make no assumptions about the underlying systems; for example, they do not require linearity or absence of noise. Furthermore, most information theoretic methods are scalable and can be applied to large-scale networks with hundreds or thousands of components. This gives them in some cases an advantage over other techniques that have higher computational cost, such as Bayesian approaches.

A conclusion of this review is that no single method outperforms the rest for all problems. There is “no free lunch”: methods that are carefully tailored to a particular application or dataset may yield better results than others when applied to that particular problem, but frequently perform worse when applied to different systems. Therefore, when facing a new problem it may be useful to try several methods. Interestingly, the results of the DREAM challenges show that community predictions are more reliable than individual predictions [

In the last fifteen years different information theoretic methods have been successfully applied to the reverse engineering of genetic networks. The resulting predictions about existing interactions have enabled the design of new experiments and the generation of hypotheses that were later confirmed experimentally, demonstrating the ability of computational modeling to provide biological insight. Another indication of the success of the information theoretic approach is that in recent years methods that combine mutual information with other techniques have been among the top performers in the DREAM reverse engineering challenges [

Despite all the advances made in the last decades, the problem faced by these methods (inferring large-scale networks with nonlinear interactions from incomplete and noisy data) remains challenging. To progress towards that goal, several breakthroughs need to be achieved. A systematic way of determining causality that is valid for large-scale systems is still lacking. Computational and experimental procedures for identifying feedback loops and other complex structures are also needed. For these and other obstacles to be overcome, the future developments should be aware of the existing methodologies and build on their capabilities. We hope that this review will help researchers in that task.

This work was supported by the EU project “BioPreDyn” (EC FP7-KBBE-2011-5, grant number 289434), the Spanish government project “MULTISCALES” (DPI2011-28112-C04-03), the CSIC intramural project “BioREDES” (PIE-201170E018), and the National Science Foundation grant CHE 0847073.

Graphical representation of the entropies (