Disentangling the Information in Species Interaction Networks

Shannon’s entropy measure is a popular means for quantifying ecological diversity. We explore how one can use information-theoretic measures (that are often called indices in ecology) on joint ensembles to study the diversity of species interaction networks. We leverage the little-known balance equation to decompose the network information into three components describing the species abundance, specificity, and redundancy. This balance reveals that there exists a fundamental trade-off between these components. The decomposition can be straightforwardly extended to analyse networks through time as well as space, leading to the corresponding notions for alpha, beta, and gamma diversity. Our work aims to provide an accessible introduction for ecologists. To this end, we illustrate the interpretation of the components on numerous real networks. The corresponding code is made available to the community in the specialised Julia package EcologicalNetworks.jl.


Introduction
The use of networks to address ecological questions has become increasingly popular [1][2][3]. Ecological networks not only allow for studying species, but, more importantly, also their interactions [2]. Networks can be used to represent all kinds of ecological interactions, such as predation, parasitism, and mutualism, while the composing interactions can indicate energy transfer, exchange of material, or even exchange of information.
Indices borrowed from the field of information theory can be used to characterise the structure of these networks [4]. In ecology, one uses the term 'index' for a measure that quantifies diversity. In this work, we will often refer to information-theoretic measures as indices when they are used for this purpose. Many networks show a particular organisation, where interactions are heterogeneously distributed among species [5]. Generalists interact with many species, while specialists interact with only a few species [6]. This combination of generalists and specialists makes a network more robust [7].
Graphs are commonly used to represent the ecological interactions between different species in an ecosystem [8,9]. A graph consists of nodes that are connected through edges. The nodes represent the species, while the edges connect the interacting species. A distinction can be made between unipartite and bipartite graphs. Any two species can interact in a unipartite graph. Therefore, any two nodes can be connected [2]. A food web without distinct sets of species is an example of a unipartite network. A bipartite network, on the other hand, consists of two disjoint groups of species, with the species from the first group only interacting with species from the second group [10], for instance, feeding interactions between two trophic levels. Some specific examples of bipartite graphs are pollination networks, host-parasite networks, seed dispersal networks, and anemone-fish networks. Our work predominantly focuses on bipartite networks. In Figure 1 (left), interactions between five anemone species and five (anemone-) fish species are visualised as a bipartite graph. The two distinct interaction levels are the anemone species and the fish species, respectively, the bottom and top interaction level. This mutualistic symbiosis between anemones and fishes of the genus Amphiprion , as observed by Ricciardi et al. [11] in the Manado region of Indonesia, will be used throughout this work as an illustration. The interactions between five anemone species and five fish species [11] represented as a bipartite graph (left), binary incidence matrix (middle), and probability matrix of the joint distribution with an indication of the marginal probabilities of the anemone species p B i and the fish species p T j (right). This example will serve throughout this work to illustrate the proposed indices.
to prevent misinterpretation of entropy as a diversity index. We illustrate the proposed methodology on several types of ecological networks.

Ecological Couplings
To straightforwardly apply information theory, rather than represent the network as a graph, one uses an n × m incidence matrix Y. The rows and column of such a matrix represent the species of the two trophic levels. For bipartite networks, the n rows represent the species of one interaction level, while the m columns represent the species of the other interaction level. An incidence matrix can contain information regarding the frequency or strength of the interactions (i.e., a weighted matrix) or solely indicate the presence or absence of an interaction (i.e., a binary matrix). Binary observations of interactions are more frequently recorded than weighted descriptions of interaction networks [27]. A binary representation could be seen as a loss of information, as every interaction becomes equally important [28]. However, taking the strength of interactions into account can also lead to mistakes, since the observed frequencies do not always reflect the true frequencies.
Quantitative observations of interactions strongly depend on the sampling effort [8], and they often result in undersampling [29]. In this work, we opted to illustrate our methods on binary incidence matrices (possibly obtained through binarizing, i.e., mapping non-zero values to one). Figure 1 (middle) shows the binary incidence matrix Y of the bipartite interaction network between anemone species and fish species. The n rows and m columns of the matrix Y represent, respectively, the anemone species (i.e., the bottom interaction level) and the fish species (i.e., the top interaction level). A matrix element Y ij is equal to 1 if the species i of the bottom interaction level interacts with species j of the top interaction level and it is equal to 0 otherwise. In the incidence matrix shown in Figure 1 (middle), 1 indicates that anemone species i is visited by fish species j, while 0 indicates the opposite. However, an interaction between two species is not a pure yes-no event, as the interaction may be rare or depend on several local and behavioural circumstances. As such, we follow Poisot et al. [30] and compute the n × m probability matrix P of the joint distribution as This value can be interpreted as the probability that species i interacts with species j. In our earlier work, we called this normalized incidence matrix an ecological coupling [31]. This coupling arises from random and targeted interactions between the species and it is dependent on the relative species abundances.
In the context of mutualistic symbiosis between anemones and fishes, as shown in Figure 1 (right), P ij is the probability that anemone species i is visited by fish species j. When interactions are associated with the energy transfers between trophic levels, as in food webs, P can be interpreted as the probability distribution of the system's energy flow. The incidence matrix reveals the distribution of the energy flow from the bottom of the network, the energy source, to the top of the network, the energy sink [32].
The marginal distributions of both interaction levels can be computed as where p B i is the probability that bottom species i establishes an interaction and p T j is the probability that top species j establishes an interaction. Note that we introduced two random variables, B and T, for the bottom species and the top species, respectively. The probability matrix P can be augmented to indicate the marginal probabilities, as shown in i is the probability that anemon species i is visited and p T j is the probability that a visit is made by fish species j.

Information Theory for Interaction Networks
Given the above probabilistic interpretation, measures that were borrowed from the field of information theory can be applied to characterise interaction networks. Foremost, we recall the concept of entropy , which is defined for a random variable X, as where p X is the probability mass function of X [13]. By convention, 0 · log 0 is evaluated as 0 [33]. Therefore, values with zero probability will have no influence [34]. The logarithm to the base two is commonly used [35]. Therefore, we will drop the explicit notation of the base.
When base two is used, all of the information-theoretic measures are expressed in bits [36].
Entropy conveys how much information is contained in an outcome and, thus, how surprising a particular outcome is [18]. When the probabilities of all possible outcomes are equal (i.e., X is uniformly distributed), the entropy is maximal, since the effective outcome is the most difficult to guess [34,37]. Imagine the situation where a fish has to choose between a green and an orange anemone. The random variable X represents the outcome of the experiment and p X (x) is the probability that X takes value x. If both anemone species are equally desirable, then the probability distribution p X is uniform. The probability that the fish chooses the orange anemone is equal to the probability of choosing the green one, namely 1 2 . The entropy is now maximal, since every outcome is equally likely and, thus, equally surprising. When every outcome is of equal probability, we obtain the largest amount of information by observing the outcome of the experiment, since the effect was the hardest to predict. Suppose that the green anemone species would be less desirable, with the probability of being chosen equal to 1 8 . In that case, the entropy is reduced to 0.54 bits, which is less than the maximal entropy of one bit when both anemone species are equally desirable. The probability distribution is no longer uniform when one colour is preferred over the other, since it is much more likely that the fish chooses the orange anemone. The more the distribution deviates from the uniform distribution, the less information we obtain by observing the outcome, since we know better what outcome to expect. The entropy is equal to zero in the extreme case where the probability of selecting the orange anemone would be one. We obtain no new information from observing which colour the fish chooses, since we already knew that the outcome would be orange. We can extend this simple example of one fish choosing an interaction partner to an incidence matrix that represents multiple ecological interactions.
The entropies of the marginal distributions of the bottom species B and the top species T, the marginal entropies, can be computed as The joint entropy of the bivariate distribution is computed as The marginal entropies quantify the equality of the species at the bottom and top interaction level, or, in the context of mutualistic symbiosis between anemones and fishes, the equality of the anemone species and fish species, respectively. A large value indicates that the marginal distribution of the species of the interaction level is close to a uniform distribution. In contrast, a low value indicates that some species dominate the interactions more than others. On the other hand, joint entropy can be used to analyse the distribution of the interactions.
When the logarithm to base two is used, the entropy is expressed in bits. In this case, we can interpret entropy as the minimal number of yes-no questions that are, on average, required to learn the outcome of an experiment [38]. For species interactions, this boils down to the average number of questions needed to identify an interaction or interaction partner. The answer to these questions is 'yes' (1) or 'no' (0), so one bit is needed to store the information. For example, suppose that an ecosystem contains four species (a, b, c, and d) that occur with relative frequencies p a = 0.5, p b = 0.25, p c = 0.125, and p d = 0.125. Because species a is most abundant, the first question one might ask to identify a species is "Is it species a?". In the fifty percent of the cases that the answer is 'yes', one has identified the species using a single question. If the answer is 'no', then one has to ask additional questions. The next natural question would be "Is it species b?". Again, if the answer is 'yes', one has identified the species; otherwise, one has to pose a third question. This question could be "Is it species c?", which settles the matter as we were left with only two options (c and d). Because we can identify a using a single question (50% of the cases), b using two questions (25% of the cases), and c and d using three questions (12.5% of the cases each), we can identify the species using an average of 1.75 questions. Given that the entropy of this system equals we know that this scheme is optimal. However, if the species would be present in equal proportions, this scheme would no longer be optimal, as it now requires 2.25 questions on average. In this case, a different set of questions, starting with, for example, "Is it species a or b?", followed by a question to distinguish between the remaining two options, would be optimal. This scheme always requires two questions. Because the entropy of a uniform discrete distribution on a set of four elements is equal to 2, we know that we cannot improve this scheme. This interpretation of entropy expressed in bits as the average number of questions required to identify the interaction or interaction partner is also applicable to other information-theoretic measures, as presented later in this work. The conditional entropy of B given T, and vice versa, are defined as and These measures quantify the average uncertainty that remains regarding the top species when the bottom species is known and the average uncertainty that remains with regard to the bottom species when the top species is known, respectively. In the example of mutualistic symbiosis between anemones and fishes, these measures quantify the remaining uncertainty regarding the fish, respectively, anemone species, given that the anemone species, respectively, fish species, is known. Suppose that, for instance, each fish species visits a single anemone species and that each anemone species is visited by a single fish species. In that case, the marginal entropy of both anemone species and fish species is maximal, since both marginal distributions are uniform. However, the conditional entropy is zero because an anemone species is only visited by a single fish species. There is no freedom of choice. If we know the anemone species, then there is no more uncertainty regarding the interacting fish species, since each anemone species is only visited by one specific fish species. Conditional entropy can also be interpreted as the average number of questions needed to identify an interaction partner, as explained above. When the conditional entropy is zero, there is no freedom of choice and no uncertainty about the interaction partner. Therefore, no questions will need to be asked. A conditional entropy that is different from zero indicates that there is remaining uncertainty [39], thus, freedom of choice, for the anemone species or fish species. In that case, questions are needed in order to identify the interaction partner since there are multiple possibilities. The specificity of the interactions can be more directly quantified by the mutual information: which is symmetric with respect to B and T, i.e., I(B; T) = I(T; B), and that always satisfies I(B; T) ≥ 0. Mutual information quantifies the average reduction in uncertainty regarding B, given that T is known, or vice versa. It expresses how much information about B is conveyed by T, or how much information regarding T is conveyed by B. When B and T are independent, B holds no information about T, or vice versa; therefore, I(B; T) is equal to zero [40]. Mutual information can be interpreted as a measure of the efficiency of an interaction network [39], as high mutual information implies that the species are highly specialised towards a single or a few ecological partners [41].
Finally, the variation of information is defined as This measure is the difference between the joint entropy and mutual information. It is the sum of the average remaining uncertainty regarding the bottom species and top species when, respectively, the top and bottom species are known. In the example of mutualistic symbiosis between anemones and fishes, it is the sum of the average remaining uncertainty about the anemone species when the fish species is known and the average remaining uncertainty about the fish species when the anemone species is known. It captures the residual freedom of choice of the species, and it can be interpreted as a measure of stability [15]. The more redundant interactions, the higher the resistance of the network against the extinction of interaction partners [3]. The variation of information and, thus, the stability of the network, can increase when the number of possible interaction partners of the species increases or when the interactions become more equally distributed, thus increasing the uncertainty. Rearranging the formula above results in the relation between the joint entropy, mutual information, and variation of information: This formula suggests a trade-off between mutual information (i.e., efficiency) and the variation of information (i.e., stability) for an interaction network with a given joint entropy. The ecological interpretation hereof will be discussed more extensively later in this section. The information-theoretic decomposition of an interaction network can be visualised in a bar plot [37], as opposed to more misleading Venn diagrams. This bar plot displays the relationships between the joint entropy, the marginal entropies, the conditional entropies, and the mutual information of an interaction network. The variation of information is indirectly represented, as it is the sum of the two conditional entropies. The barplot shown in Figure 2 displays the information-theoretic decomposition of the interactions between the five anemone species and five fish species presented in Figure 1. The contributions of the different components to the joint entropy and their relative importance can be analysed and interpreted using this plot. The above-defined measures can be linked to a uniform distribution, whose en is always maximal. When the interactions are uniformly distributed over the n sp of the bottom interaction level, every species has the same probability of interactio namely 1 n . Therefore, when the probability distribution is uniform, the marginal entro the bottom species can be computed as Similarly, the marginal entropy of uniformly distributed top species can be compute Finally, every interaction is equally likely when the joint distribution is uniform. A net with n bottom species and m top species comprises nm potential interactions. Ther in the case of a uniform joint distribution, every interaction has the same probabili namely 1 nm . Thus, the joint entropy of the uniform distribution can be computed as and it is equal to the sum of the two marginal entropies H(U B ) and H(U T ).
The differences in entropy between the uniform distributions and the correspon true distributions are defined as These measures quantify how much each distribution deviates from the correspon The above-defined measures can be linked to a uniform distribution, whose entropy is always maximal. When the interactions are uniformly distributed over the n species of the bottom interaction level, every species has the same probability of interaction p B i , namely 1 n . Therefore, when the probability distribution is uniform, the marginal entropy of the bottom species can be computed as Similarly, the marginal entropy of uniformly distributed top species can be computed as Finally, every interaction is equally likely when the joint distribution is uniform. A network with n bottom species and m top species comprises nm potential interactions. Therefore, in the case of a uniform joint distribution, every interaction has the same probability P ij , namely 1 nm . Thus, the joint entropy of the uniform distribution can be computed as and it is equal to the sum of the two marginal entropies H(U B ) and H(U T ). The differences in entropy between the uniform distributions and the corresponding true distributions are defined as These measures quantify how much each distribution deviates from the corresponding uniform distribution [42]. Note that the difference for the joint distribution is not equal to the difference beweten the entropy of a uniform bivariate distribution and the joint entropy, but rather to the sum of the marginal differences in entropy. We can see H(B) + H(T) as the joint entropy of the random vector (B, T), while assuming that B and T are independent. This renders the differences in entropy being additive, while joint entropy is not. The difference in entropy as compared to a uniform distribution, the mutual information, and the variation of information are related by the following balance equation [42]: This can be demonstrated by combining the equations shown above: The balance equation can be decomposed into the separate contributions of the marginal distributions of the bottom and top species: Note that this equation also illustrates why the term I(B; T) occurs twice in the global balance equation. These equations show how the maximal potential information of an ecological network is divided into a component expressing that some species are more important or active than others (D), a component that is related to the specific interactions between species (I) and a final component comprising the remaining freedom of the interactions (V). Table 1 presents an overview of these components of the decomposition, for the marginal distributions as well as the joint distribution.  A ternary entropy diagram or entropy triangle plot can be used to visualise the different components of the balance equation. Each side of the triangle corresponds to one of these three components. The entropy triangle enables a direct comparison of different networks, since each network will be represented by a single dot in the triangle. Such a diagram can be constructed for the total balance equation, as well as for the marginal balance equations of the bottom and top interaction level. Figure 3 displays these three entropy triangles. In order to determine the location of a network in the triangle, the balance equation is normalised by dividing all the components of the equation by the entropy of the corresponding uniform distribution [42]. For the triangle of the joint distribution, the computation of the three coordinates of a network is based on the following normalised equation: Recall that H(U BT ) = log nm. Each term of this sum corresponds to the coordinate of the network on one of the three sides of the triangle. Normalising the components of the balance equation by the maximal entropy results in values that are between zero and one that can be plotted on the triangle. The same applies for the balance equations of the marginal distributions: The left side of the entropy triangle corresponds to no deviation from the uniform distribution. The interactions of a network located at this side are uniformly distributed. Therefore, the potential freedom of choice is maximal. The bottom side of the entropy triangle corresponds to no mutual information between the interaction levels. The bottom species convey no information regarding the top species and vice versa. This indicates that there is no specialisation in the network. Finally, the right side of the triangle corresponds to no variation of information. There is no residual freedom of choice for the bottom and top species. Therefore, the stability of the network is low. The location of a network on the triangle gives us information regarding the importance of the different components of the balance equation and, hence, the interaction network structure. Networks that are located close to each other on the triangle will have a similar structure.
The left side of the entropy triangle corresponds to no deviation from the uniform distribution. The interactions of a network located at this side are uniformly distributed. Therefore, the potential freedom of choice is maximal. The bottom side of the entropy triangle corresponds to no mutual information between the interaction levels. The bottom species convey no information regarding the top species and vice versa. This indicates that there is no specialisation in the network. Finally, the right side of the triangle corresponds to no variation of information. There is no residual freedom of choice for the bottom and top species. Therefore, the stability of the network is low. The location of a network on the triangle gives us information regarding the importance of the different components of the balance equation and, hence, the interaction network structure. Networks that are located close to each other on the triangle will have a similar structure.   Three fictive interaction networks with extreme distributions are added to the triangle shown in Figure 4 in order to illustrate the use of the balance equation and the entropy triangle. These three extreme situations correspond to the three vertices of the triangle. To ease the interpretation, they are presented as interactions between anemone species and fish species. Table 2 contains the corresponding incidence matrices and their informationtheoretic decomposition. Note that the presented matrices are binary matrices. The observations need to be converted to probabilities before information theory can be applied.
The upper vertex of the triangle shown in Figure 4 represents a network with a uniform distribution. Its variation of information is zero, while the mutual information is maximal. This situation corresponds to the left incidence matrix presented in Table 2, which is an example of perfect specialisation. Each fish species interacts with one specific anemone species and vice versa. The mutual information between the anemone species and fish species is maximal. If we know which fish species participated in an interaction, then we immediately know which anemone species was visited, as there is only one possibility. Similarly, if we know which anemone species was visited, then we immediately know the interacting fish species. Knowing the fish species reduces the uncertainty regarding the anemone species completely and knowing the anemone species reduces the uncertainty about the fish species completely. Therefore, the variation of information is equal to zero. There is no residual uncertainty and, thus, no freedom of choice. Such a network is maximally efficient, but vulnerable, since the limitations on possible interactions between the bottom and top species are very strict. In the absence of its specific anemone species, a fish species has no symbiotic partner. Because both marginal distributions are uniform, the deviation from the uniform distribution is zero.   Table 2.
Three fictive interaction networks with extreme distributions are added to the t shown in Fig. 4 in order to illustrate the use of the balance equation and the entropy tr These three extreme situations correspond to the three vertices of the triangle. To e interpretation, they are presented as interactions between anemone species and fish s Table 2 contains the corresponding incidence matrices and their information-th decomposition. Note that the presented matrices are binary matrices. The observ need to be converted to probabilities before information theory can be applied.
The upper vertex of the triangle shown in Fig. 4 represents a network with a u distribution. Its variation of information is zero, while the mutual information is m  Table 2.
The bottom-right vertex represents a network deviating maximally from the uniform distribution, while the mutual information and variation of information are zero. These characteristics correspond to the middle incidence matrix shown in Table 2, where one interaction is dominating the network. The variation of information is again equal to zero, but the mutual information is now also zero. Knowing the anemone species does not further reduce the uncertainty regarding the fish species, since there is simply no uncertainty, as there is only one possible interaction. However, the deviation from the uniform distribution is maximal, since both marginal distributions deviate completely from the uniform distribution as one interaction dominates the network.
Finally, the bottom-left vertex represents a network with no mutual information between the interaction levels and a maximal variation of information, while the deviation from the uniform distributions is zero. Therefore, freedom of choice is maximal. The right incidence matrix that is presented in Table 2 is an example of such a network, where each fish species interacts with every anemone species. The network is homogeneous, without any specialisation. Similar to the first incidence matrix, the deviation from the uniform distribution is equal to zero. However, the mutual information is now also zero. Knowing the anemone species does not reduce the uncertainty regarding the fish species, since every interaction is equally possible. On the other hand, the variation of information is maximal. In contrast to the left incidence matrix, this network has high stability. In the absence of one or even multiple anemone species, a fish species has plenty of other interaction options. The freedom of choice of the anemone species and fish species is not restricted at all. However, the network has a low efficiency as a result of the trade-off between stability and efficiency.
In Figure 4, the real interaction network between the anemone species and fish species is also added to the entropy triangle. The network is located very close to the left side of the triangle, which indicated that the deviation from the uniform distribution is minimal. Its structure lies somewhere in between the homogeneous network structure where each fish species interacts with every anemone species and the perfectly specialised network where each fish species interacts with one specific anemone species, but is slightly closer to perfect specialisation. A visual comparison of the interaction networks that are shown in Figure 4 supports this result. Figure 3 displays the entropy triangles of the joint distribution and the marginal distributions of this example. The black dot represents the real interaction network between anemone species and fish species. The three extreme interaction networks still correspond to the same three vertices of the triangle. Their location on the marginal triangles is the same as on the joint triangle because the marginal distributions of the bottom and top interaction level are identical in each network. Note that this is not always true and it entirely depends on the structure of the network. The location of the real interaction network is slightly different in the three triangles, but is still very similar. The marginal distribution of the top species deviates slightly more from the uniform distribution than the marginal distribution of the bottom species. Table 2. Fictive incidence matrices illustrating the interpretation of the balance equation. Figure 4 presents the triangle entropy plot with the corresponding interaction networks.
As demonstrated above, the balance equation indicates a trade-off between efficiency and stability: one comes at the cost of the other [39,43]. For example, Gorelick et al. [44] used entropy and mutual information to quantify the division of labour. Their method is similar to the information-theoretic decomposition described above and the subsequent normalisation in the entropy triangle. When species have a wider variety of interaction partners, their freedom of choice becomes larger. Therefore, the overall network stability increases [32], but the efficiency of the interactions decreases as they are less specialised [44]. Figure 5 illustrates this antagonistic relation. In this graph, the deviation from the uniform distribution is assumed to be constant. Therefore, the joint entropy of the network and, thus, the diversity of the network, remains constant. The variation of information increases with an increasing freedom of choice at the expense of the contribution of the mutual information. In a changing environment, stability will be an essential network characteristic.
However, in a stable environment, efficiency will be a key factor [39]. The same graph can be constructed for the marginal distributions of the interaction levels, based on the decomposition of the balance equation into the separate contributions of the interaction levels. Table 3 summarises the elements of the information-theoretic decomposition of an interaction network and their ecological interpretation. Example networks are added in order to aid the interpretation. Table 3. An overview of the components of the information-theoretic decomposition of an interaction network and their ecological interpretation. For H(U BT ), the last column contains a network with uniformly distributed interactions. For the other indices, the last column contains a network with a low (left) and high (right) value for the respective index.

H(U BT )
Entropy of the network if all interactions would be uniformly distributed over the species and therefore the freedom of choice would be maximal.

D(B, T)
Expresses how much the observed interactions differ from a uniform distribution. A large deviation indicates that one or more interactions dominate the network and that the freedom of choice is restricted.

I(B; T)
Quantifies the level of organisation of the network, i.e., the limitation on possible interactions between the bottom and top species. A restricted number of possible interactions, i.e., a large mutual information, can lead to a higher efficiency.

V(B; T)
Quantifies the uncertainty that remains when the structure of the interaction network is known. A large variation of information corresponds to a large variety of possible interaction partners and thus a large uncertainty. This index can be seen as a measure of the network's stability. A restriction of the number of possible interactions and thus freedom of choice of the species, decreases the stability of the network.

H(B|T) H(T|B)
Quantifies the uncertainty that remains about the bottom species when the top species are known or vice versa. A large conditional entropy indicates that the interacting species have a large freedom of choice. This index is similar to the variation of information described above, but is based on the marginal distribution of a single interaction level, whereas the variation of information combines the information of both marginal distributions. A low conditional entropy indicates that the freedom of choice of the species of the interaction level is restricted, lowering the stability of the network.
information. In a changing environment, stability will be an essential network characteristic. However, in a stable environment, efficiency will be a key factor [39]. The same graph can be constructed for the marginal distributions of the interaction levels, based on the decomposition of the balance equation into the separate contributions of the interaction levels. Table 3 summarises the elements of the information-theoretic decomposition of an interaction network and their ecological interpretation. Example networks are added in order to aid the interpretation.  Vázquez et al. [45] list several mechanisms that could explain the structure of an interaction network. The influence of these mechanisms can be linked to the components of the balance equation. The first mechanism, interaction neutrality, causes all individuals to have the same interaction probability. For binary incidence matrices, where the frequencies are not taken into account, this situation corresponds to a uniform distribution of the interactions and, thus, H(U BT ), the left-hand side of the balance equation. Other mechanisms will influence the distribution of the interactions and, therefore, influence the individual contributions of the three components at the right-hand side of the balance equation. Trait matching, for example, results in some interactions being favoured, while other interactions are impossible. The mutual information will increase as interactions become more efficient. However, as a result of the trade-off, the variation of information and stability will decrease as interactions are restricted. The spatio-temporal distribution will also influence the interactions. Species cannot interact if they are not at the same location at the same time. This can also be taken into account in the information-theoretic decomposition. Location, as well as time, can be introduced as an additional variable, in addition to the bottom and top interaction levels B and T. It will impose a further restriction on the interactions, leading to increased mutual information and a decrease in the variation of information. This notion will be discussed in Section 2.4. As mentioned before, Vázquez et al. [45] also note that observed interaction networks do not always match the true interactions due to sampling artefacts. Therefore, sampling can also influence the observed interaction structure and information-theoretic decomposition.

Entropy and Diversity
A proper interpretation of the information-theoretic indices is vital in correctly analysing ecological interaction networks. Entropy can be used to quantify the diversity of a network, but it is solely an index of diversity. Entropy is by no means synonymous for diversity [19,25]. For example, an ecosystem with a Shannon-Wiener index of four-bits is not twice as diverse as an ecosystem with a Shannon-Wiener index of two bits. The first system is four times as diverse, due to the logarithmic scale of the index. Similarly, a change in an interaction network will have a different effect on the diversity than on the entropy. More specifically, the order of magnitude of the effect will not be the same. The entropy of an incidence matrix with twice as many interactions will not be twice as high, because entropy does not obey the replication principle or doubling property [18,22]. This hampers the direct interpretation of information-theoretic indices. Converting entropies into effective numbers solves this problem. In this way, information-theoretic indices can be easily interpreted and compared, as they are now expressed on a linear scale. Indices are then no longer expressed in bits, but in the original units, i.e., the number of species or interactions, aiding the interpretation [8,21].
The effective number of interactions E BT is the number of interactions between B and T in a network with the same joint entropy, but with all species engaging in equally strong interactions. When all of the E BT interactions in a network are equally strong, the probability of an interaction P ij is equal to 1 E BT . Therefore, the effective number of interactions can be computed as i.e., Note that, in the case of a binary incidence matrix, this results in the original number of interactions, as, without any information regarding the frequency or strength of the interactions, interactions were already assumed to be equally strong. For weighted matrices, the effective number of interactions will be different from the original number of interactions and will often not be an integer. Although the effective number of interactions might seem to be redundant for binary interaction matrices, it is not only applicable to the joint entropy, but also to other information-theoretic indices, where it is also more effective for binary interactions.
Similarly, we can define the marginal effective numbers E B = 2 H(B) and E T = 2 H(T) , which represent the effective numbers of the bottom and top species, respectively. The conditional entropies give rise to E B|T = 2 H(B|T) and E T|B = 2 H(T|B) , which, in turn, represent the average effective number of interactions for the top, resp. bottom, species.
Finally, the mutual information gives rise to the effective number E I = 2 I(B;T) , which represents the effective number of specific interactions. Together, the effective numbers give rise to: Here, the last one is the most revealing. It states that the effective number of interactions equals the product of: (i) the effective number of interactions of the bottom species (i.e., E T|B ), (ii) the effective number of interactions of the top species (i.e., E B|T ), and (iii) the effective number of specific interactions (i.e., E I ).

Higher-Order Diversity
So far, only information-theoretic indices for two variables, which represent the bottom and top interaction levels, have been considered. However, the formulas introduced above can be easily extended to three or more variables. In the case of three variables, the third discrete variable Z could represent an additional species level, but also a different influencing factor, such as the location of the interaction, the time, or an environmental variable. The joint entropy of three variables can be computed as Other information-theoretic measures can be extended by conditioning them on the third variable Z. For example, for the mutual information, we have In a similar way, we can compute D(B, T|Z) and V(B, T|Z) (for more information, see MacKay [37] and Cover and Thomas [38]). Note that, in our framework, we currently do not consider expressions, such as I(B; T; Z) and I(B|T; Z). Only conditioning on a single variable is allowed. Some information theorists provide an interpretation for such multivariate measures [46], although as of yet there does not seem to be a consensus. We leave the ecological interpretation of such measures for future work. For instance, consider the case where Z represents the location. By including this third variable in the indices, the influence of location on the uncertainty can be accounted for. In this way, entropy can be used to quantify alpha, beta, and gamma diversity. Alpha diversity is defined at a local scale, at a particular site [22]. This can be expressed by the conditional entropy given that the location is known: The alpha entropy H α quantifies the remaining uncertainty regarding the interactions when their location is known. Beta diversity, on the other hand, expresses the differentiation between local networks [22]. Therefore, beta entropy is the reduction in uncertainty that results from learning the location [25]: Using the chain rule for entropy [38], it can be shown that H β is also equal to the marginal entropy H(Z) of the location. Gamma diversity is the total diversity of an entire region. Because there is no knowledge regarding the location and, hence, also no reduction in uncertainty, gamma entropy can be quantified as The relation between alpha, beta, and gamma entropy is given by: These entropies can also be converted to effective numbers in the same way as above to be able to easily compare the alpha, beta, and gamma entropies, and interpret them as measures of interaction diversity: By converting these entropies to effective numbers, the relation between alpha, beta, and gamma diversity, as proposed by Whittaker [47], is retrieved: Beta diversity can be quantified as the ratio between regional (i.e., gamma) and local (i.e., alpha) diversity [48]. The effective numbers have an interesting interpretation. E γ corresponds to the effective number of interactions over the networks, while E α represents the effective number of unique interactions per individual network. Subsequently, we can interpret E β as the effective number of unique networks. Figure 6 presents two fictive incidence matrices for two different locations to illustrate the use of alpha, beta, and gamma entropy, and the conversion to effective numbers. The joint incidence matrix of the bottom interaction level, top interaction level, and location contains ten binary interactions. Therefore, the non-zero P ijk values are equal to 1 10 . Alpha, beta and gamma entropy can be computed using the formulas that are derived above. As mentioned earlier, the entropies do not obey the doubling property. Converting them to effective numbers eases the interpretation. Figure 6 presents the resulting values. E β indicates that the interactions in the entire region, comprising the two locations, are almost twice as diverse as the local interactions. Inferring this directly from the value of H β is less straightforward. beta and gamma entropy can be computed using the formulas that are derived above. As mentioned earlier, the entropies do not obey the doubling property. Converting them to effective numbers eases the interpretation. Fig. 6 presents the resulting values. E β indicates that the interactions in the entire region, comprising the two locations, are almost twice as diverse as the local interactions. Inferring this directly from the value of H β is less straightforward.  6. Fictive incidence matrices for two different locations to illustrate alpha, beta, and gamma entropy, and the conversion to effective numbers. There are 5.1 equivalent interactions per network and 1.96 equivalent networks.

Interaction Datasets
The information-theoretic analysis developed in this paper is applied to the interaction networks available in the Web of Life database 1 . This database contains various interaction types, including 51 host-parasite (HP), four plant-herbivore (PH), 17 anemone-fish (AF), four plant-ant (PA), 148 pollination (PL), and 34 seed dispersal (SD) interaction networks across the world. Weighted incidence matrices have been converted to binary observation .web-of-life.es/ Figure 6. Fictive incidence matrices for two different locations to illustrate alpha, beta, and gamma entropy, and the conversion to effective numbers. There are 5.1 equivalent interactions per network and 1.96 equivalent networks.

Interaction Datasets
The information-theoretic analysis developed in this paper is applied to the interaction networks available in the Web of Life database (http://www.web-of-life.es/ (accessed on 1 June 2021)). This database contains various interaction types, including 51 host-parasite (HP), four plant-herbivore (PH), 17 anemone-fish (AF), four plant-ant (PA), 148 pollination (PL), and 34 seed dispersal (SD) interaction networks across the world. Weighted incidence matrices have been converted to binary observation by mapping non-zero values to one. The 27 food webs that are available in the Web of Life database were not included, as we are focusing on bipartite networks.
We performed the analysis using the EcologicalNetworks.jl package in the Julia programming language [49], which we extended to include the indices that are described in this work. This package allows for an easy analysis of ecological interaction networks and it contains the information-theoretic indices introduced in this work, in addition to various other non-information-theoretic features for ecological network analysis. All of the Web of Life networks can also be accessed via this package. From here on, when we talk about the normalized components, either global or for a trophic level, we will informally use the terms D-, I-, and V-component. Figure 7 shows the information-theoretic decomposition of the interaction networks of the Web of Life database in the triangle entropy plot, while Figure 8 shows an extract for the 16 anemone-fish interaction networks included. There is no clear difference between the decompositions of the different interaction types. More striking is the fact that all of the networks are located at the left side of the triangle. The full range of the V-component and I-component is used, while the range of the D-component is limited. This can be explained by the fact that all of the networks are binary. In binary networks, the deviation from the uniform distribution is limited. The larger the network, the larger the deviation can be, but the larger H(U BT ) will be, the denominator of the normalised balance equation. We recall that any weighted incidence matrix was binarised. Figure 9 was created using the original data, containing both binary and weighted incidence matrices, in order to illustrate the full potential of the entropy triangle. This figure shows that, when weighted observations are used, the entire range of the D-component and, therefore, the entire triangle, is utilised. However, this does not mean that the information-theoretic decomposition of binary incidence matrices is irrelevant. Contrary to the D-component, the V-component and I-component show a wide spread for binary incidence matrices. These two components precisely explain the ecologically important trade-off between stability and efficiency. Because the D-component is small and less essential to analyse the network structure of binary networks, it makes sense to convert the entropy triangle to a graph with only two axes, one for the V-component and one for the I-component. Figure 10 shows the relation between the normalised V-and I-components. The values are the same as in Figure 7, with the only difference being that the D-component is no longer shown. Because the D-component is small, Figure 10 shows a linear relation, once more illustrating the trade-off between stability and efficiency.    be, but the larger H(U BT ) will be, the denominator of the normalised balance equation. 417 We recall that any weighted incidence matrix was binarized. To illustrate the full potential of the entropy triangle, Fig. 9 was created using the 419 original data, containing both binary and weighted incidence matrices. This figure   420 shows that when weighted observations are used, the entire range of the D-component 421 and therefore the entire triangle is utilised. However, this does not mean that the 422 information-theoretic decomposition of binary incidence matrices is irrelevant. Contrary  Figure 10 shows the relation between the normalised V-and I-components.

429
The values are the same as in Fig. 7, the only difference being that the D-component is 430 no longer shown. As the D-component is small, Fig. 10 shows a linear relation, once 431 more illustrating the trade-off between stability and efficiency. To illustrate the full potential of the entropy triangle, Fig. 9 was created using the 419 original data, containing both binary and weighted incidence matrices. This figure   420 shows that when weighted observations are used, the entire range of the D-component 421 and therefore the entire triangle is utilised. However, this does not mean that the 422 information-theoretic decomposition of binary incidence matrices is irrelevant. Contrary  Figure 10 shows the relation between the normalised V-and I-components.

429
The values are the same as in Fig. 7, the only difference being that the D-component is 430 no longer shown. As the D-component is small, Fig. 10 shows a linear relation, once 431 more illustrating the trade-off between stability and efficiency. tensor. Subsequently, we computed the higher-order indices as described in Section 442 2.4 as well as the information-theoretic decomposition, both marginalised as well as 443 conditional, e.g., I(B; T) and I(B; T|Z). These results are presented in Table 4.
444 Figure 10. The relation between the normalised V-and I-components for the joint distribution (left) and the marginal distributions of the bottom (middle) and top (right) interaction level. The different colours denote the interaction types, as described in Section 3.1.

Higher-Order Diversity: Time and Space
This section illustrates how higher-order information-theoretic indices can be used to study a spatially distributed metaweb. To this end, we use the networks that were collected by Hadfield et al. [50], containing 51 host-parasite networks collected over a large spatial region in Eurasia. The aggregated metaweb contains 206 flea species and 121 mammal species.
The collection of networks could be represented as a 206 × 121 × 51 tensor, where the respective dimensions correspond to flea species, mammal species, and location. We obtained the trivariate probability mass function by normalising this presence-absence tensor. Subsequently, we computed the higher-order indices, as described in Section 2.4, as well as the information-theoretic decomposition, both marginalised as well as conditional, e.g., I(B; T) and I(B; T|Z). Table 4 presents these results.  The alpha, beta, and gamma entropy are presented both in bits and in equivalent effective numbers. It is notable that the former are additive, while the latter are multiplicative. Here, because all of the interactions are equally weighted, E γ boils down to the total number of interactions distributed over all networks. This number is split into two parts, where E α can be seen as an estimate of the 'average' number of unique interactions for each network. At the same time, E β corresponds to the theoretical number of unique networks. Because of the overlap of the interactions, E β is substantially smaller than the 51 networks that were observed.
In the second part of Table 4, we present the information-theoretic decomposition. The marginal version corresponds to the setting where we have summed over the location variable Z. In contrast to many other networks that are discussed in this work, this marginal network does not have equal probabilities for all present interactions, since the same ecological interaction can occur multiple times at different locations. The Vcomponent seems to dominate this metaweb, meaning that this metaweb has plenty of redundancy for the species, which implies stability. The D-component is relatively low, which means that the species composing the network have a relatively equal importance. However, the marginal indices take the differences between networks into account, which leads to a completely different picture. The D-component is now dominating, which suggests that individual networks are largely determined by marginals or activities of the composing species. Conditioning reduces the relative importance of the I-component, showing that specificity is more present in the metaweb when compared to the individual networks. Note that, in whatever way the decomposition is computed, its components sum to 14.60 ≈ log(206 × 121), as suggested by the theory.

Discussion and Conclusions
Our work translates species interaction networks into bivariate distributions and it gives meaning to the various information-theoretic measures that one can compute. As discussed in the introduction, we are most certainly not the first (nor will we be the last) authors to apply information theory to community ecology. However, our work introduces the elegant balance equation of Valverde-Albacete and Peláez-Moreno [42], allowing for researchers to decompose the information in an interaction network. Similarly, we find that the information also decomposes interestingly across spatial and temporal scales, in a way that is compatible with the concepts of alpha, beta, and gamma diversity.
Normalising an incidence matrix results in a valid probability distribution. In this work, we mainly used presence-absence, basically giving every interaction an equal weight. This process removes considerable information regarding the interaction strength that is present in the visitation rates and similar data. Although the frequency of interaction does not always match the interaction strength [3,51], it can convey information regarding the relative importance of the species [52]. We found directly interpreting the abundances as frequencies to be suboptimal, both conceptually and empirically. Nevertheless, using normalised binary incidence matrices has the drawback of giving equal weight to all interactions, while it gives no weight to the possible interactions that were not sampled. These are chiefly limitations of the data collection side and not of the theoretical framework.
The problems outlined above arise because the sampled network is a realisation of the underlying interaction probabilities rather than the probabilities themselves. The challenge is to find a suitable statistical method for recovering these probabilities from the observations. Here, thresholding and normalising are straightforward ones.
Several authors have proposed more sophisticated models for recovering interaction probabilities, for example, based on Bayesian reasoning [29,53,54] or filtering [26,55]. In recent work, some of the present authors suggested a MaxEnt model to recover interaction probabilities given species abundances and interaction preferences [31]. Finding suitable methods for estimating the various proposed indices and assessing their statistical relevance is an important future challenge.
The proposed framework does not make use of the species identities. Indeed, all of the indices are invariant under permutation of rows and columns. As such, they do not take some species potentially having a similar ecological function into account. Leinster and Cobbold [56] addressed this problem by incorporating species similarities (for example, based on traits) as weights into the Shannon entropy. Recent work that was conducted by Gallego et al. [57] generalised this approach to all information-theoretic measures, including mutual information and a variation of information, to study probability distributions over any metric space. Thus, the latter work is fully compatible with our proposed methodology for analysing species interaction networks. Combining the proposed information-theoretic framework with the ecological role of the species will result in a much more in-depth characterization of an ecosystem.
When Shannon and Weaver popularised the landmark paper The Mathematical Theory of Communication to the non-specialist, they clarified that information is fundamentally related to the freedom of choice [14]. We interpreted this as an ecological choice, where we quantify the freedom of the top species given the bottom species and vice versa.

Data Availability Statement:
The data used for this study was retrieved from the Web of Life website and can be accessed via the EcologicalNetworks.jl package.

Conflicts of Interest:
The authors declare no conflict of interest.