Information Thermodynamics and Reducibility of Large Gene Networks

Gene regulatory networks (GRNs) control biological processes such as pluripotency, differentiation, and apoptosis. Omics methods can identify a large number of putative network components (on the order of hundreds or thousands), but in many cases a small subset of genes may control the state of a GRN. Here, we explore how the topology of the interactions between network components may indicate whether the effective state of a GRN can be represented by a small subset of genes. We use methods from information theory to model the regulatory interactions in GRNs as cascading and superposing information channels. We propose an information loss function that enables identification of the conditions under which a small set of genes can represent the state of all the other genes in the network. This information-theoretic analysis extends to a measure of free energy change due to communication within the network, which provides a new perspective on the reducibility of GRNs. Both the information loss and the relative free energy depend on the density of interactions and the edge communication error in a network. Therefore, this work indicates that a loss in mutual information between genes in a GRN is directly coupled to a thermodynamic cost, i.e., a reduction of relative free energy, of the system.

" and also a directed edge from " to ! . We transformed the undirected Barabási-Albert graphs into directed graphs by deleting any edge from ! to " when the node numbers constituting the edge satisfy the condition, > . This deletion results in every node in the graph having the same in-degree, , without any constraint on the out-degree. The final graph object, = ( , ), contains vertices numbered from 0 to − 1, and each directed edge ! → " in the set is such that < . A source node ! can only send information to node numbers higher than itself. Hence, the non-trivial values in the loss field, ( → ), (as shown in figures 2b and 2c) are confined within the upper diagonal, < .

S2. Creating model GRNs with mixtures of up and down regulation
To create the model GRNs with a mixture of upregulation and downregulation edges, as shown in Fig. 1b in the main text, we started with the directed Barabási-Albert graphs, G = (V, E), generated as described in Section S1. Let the total number of edges, i.e., the cardinality of the set E, be |E|, and let the ratio of downregulation edges to upregulation edges be r, as defined in the main text. We determined the number of downregulation edges in the graph, N_down, as the largest whole number less than or equal to [r/(1 + r)]|E|. Then, we randomly selected N_down edges from the set E and designated them as downregulation edges, while specifying the rest as upregulation edges.
The number of nodes that can receive mixed signals, N_mixed, was determined by counting the number of nodes in the graph, G = (V, E), that have both an upregulation edge and a downregulation edge directly connecting into them.
a https://networkx.org/documentation/stable/reference/generated/networkx.generators.random_graphs.barabasi_albert_graph.html#networkx.generators.random_graphs.barabasi_albert_graph
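The edge-labelling and counting steps of this section can be illustrated as follows. This is a sketch under stated assumptions: the function names `assign_regulation` and `count_mixed_nodes`, the edge attribute name `sign`, and the small hand-built example graph are all hypothetical and are not taken from the original work. The number of downregulation edges is computed as the floor of r/(1 + r) times the number of edges, as described above.

```python
import math
import random

import networkx as nx


def assign_regulation(G, r, seed=None):
    """Label floor(r / (1 + r) * |E|) randomly chosen edges as downregulation.

    Hypothetical helper: stores the label in an edge attribute called "sign".
    """
    rng = random.Random(seed)
    edges = list(G.edges())
    n_down = math.floor(r / (1 + r) * len(edges))
    down_edges = set(rng.sample(edges, n_down))
    for u, v in edges:
        G.edges[u, v]["sign"] = "down" if (u, v) in down_edges else "up"
    return n_down


def count_mixed_nodes(G):
    """Count nodes with at least one incoming "up" and one incoming "down" edge."""
    mixed = 0
    for node in G.nodes():
        signs = {data["sign"] for _, _, data in G.in_edges(node, data=True)}
        if {"up", "down"} <= signs:
            mixed += 1
    return mixed


# Small illustrative directed graph (assumed, not from the paper); 7 edges.
G = nx.DiGraph([(0, 1), (0, 2), (1, 2), (0, 3), (2, 3), (1, 4), (3, 4)])
n_down = assign_regulation(G, r=1.0, seed=0)  # floor(0.5 * 7) = 3
n_mixed = count_mixed_nodes(G)
print(n_down, n_mixed)
```

Only nodes with an in-degree of at least two can receive mixed signals, so in graphs where every node has a single incoming edge the count is necessarily zero.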

S3. Accessibility score distributions
The accessibility score for a receiver node j is the number of other nodes in the graph that have a path, i.e., a connected sequence of directed edges, along which they can send information to it. Increasing the in-degree creates more paths, and consequently a receiver node can be accessed by more nodes in the graph. We demonstrate the effect of increasing m by evaluating the distribution of accessibility scores for m = 1, 2, and 3 type GRNs with 100 nodes.
For each value of m we generated 1000 graphs using the method described in Section S1. Then, for each graph, we calculated the accessibility score of each of the 100 nodes. We therefore obtained a total of 1000 × 100 samples of the accessibility score for each value of m, which were used to obtain the accessibility score distributions shown in Figure S1(b). The out-degree distributions for each value of m are shown in Figure S1(c).
Figure S1: Accessibility scores of nodes in Barabási-Albert graphs as a function of the in-degree, m, of every node. (a) One of 1000 replicates of m = 1, 2, and 3 type model GRNs used to survey the accessibility score. (b) Distributions of the accessibility scores obtained from 1000 replicates of m = 1, 2, and 3 type model GRNs. (c) Out-degree distributions for m = 1, 2, and 3 type model GRNs from the same 1000 replicates used to compute the accessibility score distributions in Figure S1(b).
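The accessibility survey can be sketched as below. This is an illustrative sketch, not the original analysis code: the helper names are hypothetical, the graph construction repeats the lower-to-higher orientation of Section S1, and the accessibility score is computed with `networkx.ancestors`, which returns the set of nodes that have a directed path to a given node. To keep the example fast it uses 20 replicates per value of m rather than the 1000 used for Figure S1.

```python
import networkx as nx


def directed_ba_graph(n_nodes, m, seed=None):
    """Hypothetical helper: Barabási-Albert graph oriented from lower to higher node."""
    undirected = nx.barabasi_albert_graph(n_nodes, m, seed=seed)
    directed = nx.DiGraph()
    directed.add_nodes_from(range(n_nodes))
    directed.add_edges_from((min(u, v), max(u, v)) for u, v in undirected.edges())
    return directed


def accessibility_scores(G):
    """Number of other nodes with a directed path to each node, in node order."""
    return [len(nx.ancestors(G, node)) for node in G.nodes()]


def pooled_accessibility(m, n_nodes=100, replicates=1000, seed=0):
    """Pool the scores of all nodes over independently generated replicate graphs."""
    scores = []
    for k in range(replicates):
        G = directed_ba_graph(n_nodes, m, seed=seed + k)
        scores.extend(accessibility_scores(G))
    return scores


# Reduced number of replicates (an assumption made here for speed).
mean_by_m = {}
for m in (1, 2, 3):
    scores = pooled_accessibility(m, n_nodes=100, replicates=20)
    mean_by_m[m] = sum(scores) / len(scores)
    print(f"m = {m}: {len(scores)} samples, mean accessibility = {mean_by_m[m]:.2f}")
```

The pooled list for each m can be passed directly to a histogram routine such as `matplotlib.pyplot.hist` to obtain a distribution of the kind shown in Figure S1(b).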
Other Python packages used to compute the loss fields and generate the plots are numpy, scipy, and matplotlib.