Optimal Microbiome Networks: Macroecological Characterization and Criticality

The human microbiome is extremely complex considering the number of species, their interactions, and its variability over time as a function of environmental drivers. Here we untangle the complexity of the human microbiome for Irritable Bowel Syndrome (IBS), the most prevalent functional gastrointestinal disorder, which is linked to many causes. Based on a novel information theoretic network inference model (which considers conditional entropy reduction until the maximum entropy is no longer reduced), we detect species interaction networks that are functionally and structurally different for healthy and unhealthy individuals. Healthy networks are characterized by a neutral, symmetrical pattern of species interaction and by small-world features for functional node degree and distance, whereas unhealthy networks are random. We detect an inverse scaling relationship between species total outgoing information flow ("active flow") and abundance. For the healthy microbiome, the top 10 interacting species are also the least abundant and the most detrimental; however, these species are controlled by other species (via negative feedbacks) and the microbiome self-organizes into a healthy state. Conversely, for the unhealthy microbiome the most abundant species are the least interactive and the most detrimental. These findings support the idea of a diminishing role of network hubs, and suggest that hubs should be defined by their total outgoing information flow. The healthy microbiome is characterized by a high diversity growth rate, a small decay of species similarity over time (i.e., low species turnover), and small variability in the abundance of all species. This result challenges current views that associate health states with the highest diversity in ecosystems, rather than with the highest biodiversity growth as found in this study.
In a network perspective, the healthy microbiome is configured as a small-world network with a tendency toward a critical scale-free network, while the unhealthy one is organized as a random network with many more interacting species. We show that the transitory microbiome at the edge between the healthy and unhealthy ones is unstable, and that the criticality of the healthy microbiome is not at the (second-order) phase transition between order and chaos but in a meta-stable state (contrary to other critical systems, where energy and entropy grow in the same direction and criticality is at the transition). We stress the importance of considering interacting pairs rather than single-node dynamics when characterizing the microbiome nexus, and of ranking these pairs in terms of their dynamics; interactions (i.e., species collective behavior) shape the transition from healthy to unhealthy states. The macroecological characterization of the microbiome is useful for diagnostic purposes and disease etiognosis, while species-specific analyses can detect species that are more beneficial to humans, leading, for example, to the personalized design of pre- and probiotic treatments and of engineered microbiome transplants.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 17 December 2018 doi:10.20944/preprints201812.0188.v1


Introduction
Microbiome Dynamics and Health
Microbial ecology has become an important topic for the health sciences and for other fields such as biology, ecology, forensics and agriculture. Recent work has shown that each person maintains a fairly unique microbial fingerprint, and that microbial dysbioses are often associated with shifts in health status. These shifts are typically associated with the gut, which is the most diverse part of the human body in terms of its bacterial holobiont (Coyte et al., 2015; van de Guchte et al., 2018). Our microbiota are highly dynamic, and these dynamics are linked to environmental and individual states (van de Guchte et al., 2018). The field is still in its infancy, and it is not yet settled whether gut microbial community structure varies continuously or jumps between "discrete" community states, nor whether these states are shared across individuals. In particular, some researchers suggest that gut communities can be binned into discrete enterotypes (Arumugam, 2011), while others argue that gut communities vary along multidimensional continua without any universality (Knights et al., 2014). If the ultimate goal of microbiome research is to improve human health by engineering the ecology of the gut (among other applications of interest), we must first understand how and why our microbiota vary in time, whether these dynamics are consistent across humans, and whether we can define stable or healthy dynamics. What this line of research primarily lacks is an account of how microbial diversity is organized across all its facets, and of how this diversity changes when species interaction networks change. For instance, the same level of diversity can be achieved via different network topologies that may lead to different health states (Caesar et al., 2015).

Microbiome Diversity and Functional Network Organization
In order to determine the network organization of the microbiome and associate it with healthy or unhealthy states, we consider Irritable Bowel Syndrome (IBS) as the template syndrome to characterize microbiome dynamics (Martí et al., 2017; Sitkin et al., 2018). IBS shows common symptoms of cramping, abdominal pain and diarrhea related to altered gut flora. Previous research has found that the microbiome of people with IBS differs from that of healthy people (Martí et al., 2017); however, it has not been demonstrated how the microbiome network differs between these healthy and unhealthy groups (i.e., "states", generally speaking, when not focused on a particular subpopulation), nor how the transition from one to the other occurs. To explore this topic we propose novel network inference models for extracting microbiome networks from species big data; these models are based on the principle of maximum entropy, which seeks the most informative set of variables about stable-state patterns (Marsili et al., 2013; Servadio and Convertino, 2018). For example, such a set can be a group of species that predicts a diverse set of species networks. Big data here refers not only to the size of the data used but also to the number of calculations required to infer the underlying network structure. These computations grow rapidly with the number of species (nodes) n considered: an undirected network has n(n − 1)/2 possible connections, a directed network has twice as many, n(n − 1), and the number of candidate network configurations grows exponentially with n. A directed topology is found, for instance, when species interaction networks are non-symmetrical, meaning that the influence of species i on species j does not have the same magnitude as the influence of j on i (Layeghifard et al., 2017). A variety of models have been proposed to infer network structures from small and large datasets. For biological systems in particular, the inference of causal interactions among system components is a daunting task because not all interactions are known, nor is the "true" magnitude of interactions, given the data used to assess them. For instance, microbiome networks are in principle different if the input data are species occurrence, abundance, geographic range or other features. Partly for this reason, we employ assumption-free inference models that consider the whole probability distribution of species dynamics and that are validated by their ability to predict population patterns over time. We extract optimal microbiome networks as optimal information networks (OINs) (Servadio and Convertino, 2018) for the healthy, transitory and unhealthy groups to investigate the general patterns and drivers underlying microbiome stability, and the interactions among different species in terms of network topology, magnitude and preferential direction. Additionally, we characterize the macroecological α, β and γ diversity functions, which describe the temporal organization of microbiome biodiversity in terms of point-in-time, intertemporal and total diversity. We show how these functions are related to microbiome network features, and that different topologies emerge for different diversity/health states. The link between microbiome networks and macroecology is novel and offers additional insights into the ecology and evolution of the microbiome, with relevance to ecosystem health.

Microbiome Neutrality and Criticality
Hypotheses about the processes underlying ecosystem organization have been advanced in the past on the basis of diversity patterns and of models able to predict these patterns, such as neutral models (Zillio et al., 2008; Convertino, 2011; Azaele et al., 2016; Martinello et al., 2017). Neutral models posit that biological diversity is driven solely by ecological drift, without strong interference from environmental biases that lead to preferential dynamics ("niche") for some species over others. Between neutral and niche states a critical transition is typically observed, where the species network organization exhibits scale-free behavior (Convertino et al., 2012; Lahti et al., 2014; Ma, 2015; Gentile and Weir, 2018; Gonze et al., 2018). This scale-free behavior was thought to occur only at the critical transition point, but recent evidence shows that criticality also exists for stable states in which the organization of system components is optimal, owing to optimal information sharing among components and with the environment (Hidalgo et al., 2014; Martinello et al., 2017). This result had already been found, for instance, for geophysical networks and coupled ecological networks (Banavar et al., 2001; Convertino et al., 2009), where energy dissipation tends to a minimum. However, these models are typically driven by assumptions that may lead to erroneous conclusions about the fitted patterns: in other words, the predictability of biological patterns (under some assumptions) does not imply causality of the hypothesized and implemented processes (Sugihara et al., 2012). A different approach is taken by pattern-oriented models (Grimm et al., 2005), such as the one proposed here, which do not assume any mechanism a priori and use the whole information content of the data (via probability distributions and their relevance for predicting patterns via entropic functions (Marsili et al., 2013)) to infer the underlying processes. In this sense, we move the discussion of microbiome dynamics toward which information is critical, and toward how model criticality (Mastromatteo and Marsili, 2011; Marsili et al., 2013) is associated with biological criticality, also considering the neutrality of diversity dynamics. Therefore, rather than trying to untangle biological complexity by fitting biologically inspired models, we use all available data and assess their information content to define all possible microbiome states and the associated diversity patterns. Within this information theoretic framework, we show in particular how criticality coincides with neutrality and with the optimal microbial network organization that leads to healthy states.

Microbiome Data
We considered microbiome data from Martí et al. (2017), for which species data for six individuals were available over time (30 days). Species Operational Taxonomic Unit (OTU) abundance data were derived from the published 16S rRNA and shotgun metagenomic sequencing (SMS) data on the gut microbiota. Two individuals suffered from IBS and two were healthy; of the remaining two, one was treated with antibiotics and the other was on the verge of becoming unhealthy. These last two individuals are considered part of a "transitory" group between healthy and unhealthy.

Probabilistic Characterization of Microbiome Variables
We characterize probabilistically the distribution of macroecological and microbiome network variables (generally indicated as Y) using the following general exceedance probability distribution function:

$$P(Y \ge y) \propto \begin{cases} e^{-y/\lambda}, & y < Y^* \\ y^{-\varepsilon+1} f(y/m), & y \ge Y^* \end{cases} \qquad (2.1)$$

where Y* is the truncation point ("hard truncation") at which the regime of the probability distribution changes from exponential to power-law. We speak of "hard truncation" when the pdf clearly exhibits two regimes (for y < Y* and y > Y*) in which two different pdfs can be identified. The λ factors are scale factors of the exponential distribution, either below the lower cutoff or above the upper cutoff that bound the scale-free regime with power-law distribution. m is the upper cutoff beyond which exponential finite-size effects occur.
We introduce the function f(y/m) to give more generality to the cutoff (or homogeneity) function. $y^{-\varepsilon+1}$ is the scaling function, where ε is the scaling exponent of the power-law distribution. Note that the probability density function $p(y) \propto y^{-\varepsilon}$ scales with ε only.
ε dictates how the mean and the variance behave. For ε = 2 the pdf follows Zipf's law, which is found for many socio-ecological systems (James et al., 2018).
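The power-law part of the exceedance function can be explored empirically. The sketch below (Python with NumPy; the helper names `exceedance_probability` and `tail_exponent` are hypothetical, not from the paper) computes the empirical P(Y ≥ y) and estimates the exponent, written ε here, from the log-log slope above a user-chosen Y*. It relies on the fact that a pdf p(y) ∝ y^(−ε) gives an exceedance probability ∝ y^(−ε+1). It ignores the exponential regime and the cutoff function f(y/m), and least squares on a log-log plot is a simple but biased estimator; maximum-likelihood fits are generally preferable for real abundance data.

```python
import numpy as np

# Hypothetical helper names; an illustrative sketch, not the authors' code.

def exceedance_probability(samples):
    """Empirical exceedance probability P(Y >= y) at each sorted sample value."""
    y = np.sort(np.asarray(samples, dtype=float))
    n = y.size
    # Count of samples >= y[k]; side="left" gives tied values the same exceedance.
    p = (n - np.searchsorted(y, y, side="left")) / n
    return y, p

def tail_exponent(samples, y_star):
    """Least-squares estimate of the pdf exponent eps from the tail y >= y_star.

    If p(y) ~ y**(-eps), then P(Y >= y) ~ y**(-eps + 1), so the log-log slope is 1 - eps.
    """
    y, p = exceedance_probability(samples)
    tail = y >= y_star
    slope, _intercept = np.polyfit(np.log(y[tail]), np.log(p[tail]), 1)
    return 1.0 - slope

# Usage: deterministic Pareto quantiles with y_min = 1 and eps = 2 (the Zipf case).
eps_true = 2.0
u = np.arange(1000) / 1000
y_pareto = (1.0 - u) ** (-1.0 / (eps_true - 1.0))
print(tail_exponent(y_pareto, y_star=1.0))
```

Using deterministic quantiles makes the empirical exceedance exactly y^(−1), so the estimate recovers ε = 2; on sampled data the estimate fluctuates and depends on the choice of Y*.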

Information Balance and Exchange
To infer species interaction networks from microbial abundance data, we build on the model developed by Servadio and Convertino (2018), which is based on previous efforts (Lizier, 2014; Villaverde et al., 2014). We consider the microbiome as a dynamic network of species interactions whose total free energy changes over time. Considering information entropy as the counterpart of energy, the total network entropy can be written as:

$$H(N) = \sum_{i=1}^{n} H(x_i) + \sum_{i=1}^{n} \sum_{j \neq i} TE(x_i, x_j) + \sigma(N) \qquad (2.2)$$

where $x_i$ denotes the i-th of the n variables that contribute to the total information of the network N.
In our case x is the abundance of species. In this equation, H(x_i) denotes the Shannon entropy, and TE(x_i, x_j) denotes the Transfer Entropy from the first variable to the second (Razak and Jeldtoft Jensen, 2014; Lizier, 2014; Servadio and Convertino, 2018); in our case both variables are the abundances of two different species. Eq. 2.2 represents a fundamental principle of information balance, independently of the chosen entropy (Hanel and Thurner, 2011), and forms the general basis of sensitivity analyses. Eq. 2.2 states that the total network entropy can be decomposed into the entropy of each individual node plus the entropy of interactions. The sum of absolute TEs is a proxy for the Mutual Information (MI) of a variable, and thus accounts for the whole set of variable interdependencies; in Eq. 2.2 we retain the sign of TE because H(N) should reflect the type of interactions through their sign.
σ(N ) is a noise term that captures the unexplained variability of N related to variables not considered and other discretization factors.Eq. 2.2 can also be extended in space if spatially explicit calculations are needed as in Servadio and Convertino (2018).
The computation of TE is based on the distributions of the two variables of interest (species abundances) conditioned on their histories. Comparing the probability of a variable conditioned on its own history with the probability of that variable conditioned on both its own history and the history of a predictor variable introduces an asymmetry that determines the predictive ability of one variable on another. Thus, a directed network can be inferred.
The directed TE between two time series variables, denoted $X_i$ and $X_j$, was calculated as

$$TE(X_i, X_j) = H(X_j \mid X_{j,\tau}) - H(X_j \mid X_{j,\tau}, X_{i,\tau}) \qquad (2.3)$$

where $X_{i,\tau}$ and $X_{j,\tau}$ denote the respective histories of $X_i$ and $X_j$ at time t, i.e., all past values over the period t − τ. Here we consider the same memory lag for $X_i$ and $X_j$, but in principle the historical dependencies of the predictor and of the variable itself can differ. In our microbiome study $X_i$ and $X_j$ are the abundances of species i and j.
The definition of TE can assume that the processes analyzed obey a Markov model, which is suitable for memoryless stochastic processes: future states depend only on the current state and not on the events that preceded it. Thus, in a Markov process it is assumed that τ = 1. This assumption often holds, especially for rapidly varying processes (such as microbial abundance); however, it can be relaxed by choosing temporal lags that are small enough to focus on short-term interdependencies, which are not related to long-range dependencies in the underlying processes. In our case study, the abundance values of two randomly selected species were not correlated at τ = 1; memory processes are therefore relevant, and, as in Villaverde et al. (2014), we selected the τ that maximizes the interdependency between two species as assessed by the functional distance (see Eq. 2.12).
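A minimal plug-in sketch of the transfer entropy of Eq. 2.3 and of the information balance of Eq. 2.2 (without the noise term σ(N)) is shown below. Several choices are assumptions rather than the paper's method: abundances are discretized into equal-width bins, each history is a single value lagged by τ instead of the full past, entropies are in bits, and the helpers `discretize`, `transfer_entropy` and `network_entropy` are hypothetical names. The plug-in estimate is non-negative, so the sign convention the authors attach to TE is not reproduced.

```python
import numpy as np
from collections import Counter

# Hypothetical helper names; an illustrative plug-in sketch, not the authors' code.

def entropy(counter):
    """Shannon entropy (bits) of the empirical distribution stored in a Counter."""
    counts = np.array(list(counter.values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def discretize(x, n_bins):
    """Equal-width binning of an abundance series into labels 0..n_bins-1 (assumed scheme)."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    return np.digitize(x, edges[1:-1])

def transfer_entropy(source, target, tau=1):
    """Plug-in TE(source -> target) in bits, with a single history value lagged by tau.

    TE = H(Y_t | Y_{t-tau}) - H(Y_t | Y_{t-tau}, X_{t-tau}), using H(A | B) = H(A, B) - H(B).
    """
    if tau < 1:
        raise ValueError("tau must be >= 1")
    x_past, y_past, y_now = source[:-tau], target[:-tau], target[tau:]
    h_given_own = entropy(Counter(zip(y_now, y_past))) - entropy(Counter(y_past))
    h_given_both = (entropy(Counter(zip(y_now, y_past, x_past)))
                    - entropy(Counter(zip(y_past, x_past))))
    return h_given_own - h_given_both

def network_entropy(series, tau=1):
    """Eq. 2.2 without the noise term: sum_i H(x_i) + sum_{i != j} TE(x_i, x_j)."""
    n = len(series)
    h_nodes = sum(entropy(Counter(s)) for s in series)
    h_links = sum(transfer_entropy(series[i], series[j], tau)
                  for i in range(n) for j in range(n) if i != j)
    return h_nodes + h_links

# Usage: y copies x with a one-step delay, so information flows from x to y.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=2000)
y = np.roll(x, 1)
y[0] = 0
print(transfer_entropy(x, y), transfer_entropy(y, x), network_entropy([x, y]))
```

In this synthetic example TE(x → y) is close to 1 bit while TE(y → x) is close to 0, which is the asymmetry that makes a directed network inferable. In practice τ would be scanned and chosen with the functional distance, as described above.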

Maximum Entropy Networks
Among all values of TE, the question remains which value is the most informative about the potential causal relationship between two variables. As in Servadio and Convertino (2018), we select the TEs that lead to the maximum entropy for the inferred network. This corresponds to maximizing the Fisher information matrix (Borile et al., 2012). MaxEnt favors probability distribution functions with maximum entropy as the most general distributions that fit the observed data (Hanel et al., 2014). This principle can be applied to a functional network whose edge weights are based on TE: the network with the greatest total entropy can similarly be favored as the most general network structure that fits the observed data. The method considers all possible pairs of variables, in both directions, for predicting a pattern of interest, and the edges that make up the network with the greatest total TE are included.
Selecting the edges that contribute to the greatest amounts of TE, according to the MaxEnt theory, produces the network that most accurately describes "causal" patterns among the included variables.
A utility function is needed to define the quantity to which MaxEnt is applied.
The utility function can be thought of as a systemic (network) value function $\sum_{i,j} f_{i,j}(X)\, w_{i,j}$ (the value functions are potentially multiplied by weight factors $w_{i,j}$), where the value functions $f_{i,j}$ are TEs among species abundances. These TEs, as in Eq. 2.3, assess the potential causal interaction between species pairs. Thus, the utility function is the total network entropy H(N) (Eq. 2.2), which must be optimized in order to define the necessary and sufficient TEs with maximum entropy. The optimization can be subject to feasibility constraints, for instance related to the ability to control certain species or to data limitations. In the context of the present goal of creating a microbiome network indicator, the value functions $f_{i,j}$ are defined as:

$$f_{i,j}(X) = \begin{cases} TE(X_i, X_j), & \{X_i, X_j\} \in \text{MENet} \\ 0, & \text{otherwise} \end{cases} \qquad (2.4)$$

where $\{X_i, X_j\}$ represents the directed edge connecting $X_i$ to $X_j$, and MENet (Maximum Entropy Network) represents the set of directed edges in the network with the maximum total network entropy H(N). The edges to be included in the network are selected by finding the network with the greatest total entropy, as in Eq. 2.2. In the present study, the utility function is defined as the total TE of the network (plus the Shannon entropies of each species' abundance, which turn out to be second- or third-order factors and can be neglected), and it was maximized by selecting the $f_{i,j}$ functions. To the best of our knowledge, this is one of the first times that TE has been framed in a decision-analytical model via a network entropy threshold criterion that defines MENets.
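To make the value functions and the utility concrete, the sketch below assumes one specific feasibility constraint, a maximum number of edges (`max_edges`, a hypothetical parameter), and an unweighted utility (w_ij = 1 unless weights are given). With an additive utility and only a cardinality constraint, keeping the directed edges with the largest positive TE is exactly optimal. The authors' MaxEnt edge selection may differ, and the TE matrix is made up for illustration.

```python
import numpy as np

# Hypothetical function names and data; an illustrative sketch, not the authors' code.

def menet_edges(te, max_edges):
    """Edge set maximising sum_ij f_ij subject to at most max_edges directed edges.

    With an additive utility and a cardinality constraint, the optimum is the
    max_edges off-diagonal entries with the largest positive TE.
    """
    te = np.asarray(te, dtype=float)
    n = te.shape[0]
    candidates = sorted(((te[i, j], i, j) for i in range(n) for j in range(n)
                         if i != j and te[i, j] > 0), reverse=True)
    return {(i, j) for _, i, j in candidates[:max_edges]}

def value_functions(te, edges):
    """f_ij = TE_ij if the directed edge (i, j) is selected, 0 otherwise."""
    te = np.asarray(te, dtype=float)
    f = np.zeros_like(te)
    for i, j in edges:
        f[i, j] = te[i, j]
    return f

def utility(f, weights=None):
    """Systemic value function sum_ij f_ij * w_ij, with w_ij = 1 unless weights are given."""
    f = np.asarray(f, dtype=float)
    w = np.ones_like(f) if weights is None else np.asarray(weights, dtype=float)
    return float((f * w).sum())

# Usage with an illustrative 3-species TE matrix (row = source, column = target).
te = np.array([[0.0, 0.5, 0.1],
               [0.2, 0.0, 0.4],
               [0.05, 0.3, 0.0]])
edges = menet_edges(te, max_edges=3)
f = value_functions(te, edges)
print(sorted(edges), utility(f))
```

Richer constraints (for example, species that cannot be controlled) would turn the edge choice into a genuine combinatorial optimization rather than a simple ranking.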

Optimal Information Networks
To reduce redundancy in creating a MENet, variables that are strongly predicted by other variables (hypothetically indicating strong causality if the prediction accuracy of one decreases quickly when the other is removed (Sugihara et al., 2012)) can be excluded. This can be done by evaluating the weighted in-degree and out-degree (i.e., TE) of each node in the network.
Nodes with a greater weighted out-degree than in-degree can be included in the Optimal Information Network (OIN), which is one among many MENets with the same average total entropy. The OIN is then the necessary and sufficient MENet for predicting microbiome function.
These nodes strongly predict the variability of other nodes, and thus the overall network dynamics. This entropy reduction (which does not affect the total entropy much) can be achieved by introducing the functions $g(X_i)$, defined as follows:

$$g(X_i) = \begin{cases} 1, & \sum_j f_{i,j}(X) > \sum_j f_{j,i}(X) \\ 0, & \text{otherwise} \end{cases} \qquad (2.5)$$

where $\sum_j f_{i,j}(X) = OTE$ and $\sum_j f_{j,i}(X) = ITE$ are the total outgoing and incoming TE of a node. Thus, variable inclusion depends on the comparison between the TE projected by the variable $X_i$ onto the other variables and the TE projected by the other variables onto $X_i$.
The function g is then used to define the total network entropy that describes the network dynamics:

$$H(\text{OIN}) = \sum_{i} \sum_{j} g(X_i)\, f_{i,j}(X) \qquad (2.6)$$

which sums the contributions of all necessary variables included by the MENet structure (through the value functions $f_{i,j}$ in a multi-criteria value function), restricted to the sufficient variables that remain after the redundancy exclusion that forms the OIN. In this way, the OIN inference is based on information theoretic and topological criteria to screen (i) the necessary information that maximizes the network entropy H(MENet) (i.e., information content), and (ii) the smallest non-redundant information that sufficiently predicts the total network function (of maximum entropy H(OIN)). This OIN is the network with the highest accuracy in predicting macroecological patterns of diversity over time, which depend on fluctuating species abundance.
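The node-inclusion rule g(X_i) can be sketched directly from a matrix of value functions. Here `f[i, j]` is assumed to hold f_ij for the directed edge i → j (0 when the edge is absent); the function name `optimal_information_network` is hypothetical, and the returned OIN entropy assumes that H(OIN) is the outgoing TE summed over the nodes with g(X_i) = 1. The matrix is illustrative.

```python
import numpy as np

# Hypothetical function name and data; an illustrative sketch, not the authors' code.

def optimal_information_network(f):
    """Apply the g(X_i) rule: keep node i when its outgoing TE exceeds its incoming TE.

    Returns g (0/1 per node), OTE, ITE and the assumed OIN entropy sum_i sum_j g(X_i) f_ij.
    """
    f = np.asarray(f, dtype=float)
    ote = f.sum(axis=1)          # total outgoing TE of each node (row sums)
    ite = f.sum(axis=0)          # total incoming TE of each node (column sums)
    g = (ote > ite).astype(int)  # 1 = node is a net information source
    h_oin = float((g[:, None] * f).sum())
    return g, ote, ite, h_oin

# Usage: node 0 only sends information; nodes 1 and 2 exchange it in both directions.
f = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, 0.4],
              [0.0, 0.3, 0.0]])
g, ote, ite, h_oin = optimal_information_network(f)
print(g, ote, ite, h_oin)
```

Only node 0 is retained here because nodes 1 and 2 receive at least as much TE as they send, which illustrates how the rule prunes strongly predicted (redundant) nodes.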

Assessment of Species Importance & Collectivity
After inferring the OIN, it is possible to quantify the importance of different species for the dynamics of the microbiome, considering their variability both in isolation and in cooperation with other species. Species first-order importance and interaction for reproducing the network dynamics are then calculated with new indices based on nodal information transfer rather than on Mutual Information Indices (MII) as in Lüdtke et al. (2008). $\sigma_i$ describes species interaction and is calculated as the ratio between the total Outgoing Transfer Entropy, thought of as information flow ($OTE(j) = \sum_i TE_{j \to i}$), and the total network entropy, while $\mu_i$ describes species importance as the ratio between the nodal entropy, thought of as information content (using Shannon entropy), and the total network entropy. These Transfer Entropy Indices (TEI) are useful when no systemic variable is needed; analytically they are the ratios defined above, computed for each variable $X_i$ (e.g., species abundance), while Y is the predicted variable, built with the same process used to construct OINs but selecting variable features rather than keeping the entropy of species as independent variables. The use of transfer entropy can give further information about the directionality of causality (in the predictive sense of the model) and about the time lag of causality.
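The two Transfer Entropy Indices are simple ratios once OTE, the nodal entropies and the total network entropy are known. In this sketch (hypothetical function `te_indices`, illustrative numbers), if H(N) is not supplied it is approximated by the information balance without its noise term, i.e. the sum of nodal Shannon entropies plus the sum of the value functions f_ij.

```python
import numpy as np

# Hypothetical function name and data; an illustrative sketch, not the authors' code.

def te_indices(node_entropy, f, total_entropy=None):
    """sigma_i = OTE_i / H(N) (interaction) and mu_i = H(X_i) / H(N) (importance).

    If total_entropy is None, H(N) is taken as sum_i H(X_i) + sum_ij f_ij,
    i.e. the information balance without its noise term (an assumption).
    """
    h = np.asarray(node_entropy, dtype=float)
    f = np.asarray(f, dtype=float)
    if total_entropy is None:
        total_entropy = h.sum() + f.sum()
    ote = f.sum(axis=1)  # OTE(i) = sum_j TE_{i -> j}
    return ote / total_entropy, h / total_entropy

# Usage: three species with illustrative nodal entropies (bits) and TE value functions.
sigma, mu = te_indices([1.0, 0.5, 0.5], [[0.0, 0.5, 0.0],
                                          [0.0, 0.0, 0.4],
                                          [0.0, 0.3, 0.0]])
print(sigma, mu)
```

Because both indices share the same denominator, they can be compared directly to rank species by information flow (σ) versus information content (μ).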

Macroecological Indicators
To characterize the microbiome we introduce macroecological indicators that describe the collective variability of ecosystem diversity locally (within communities or time points) and globally. In this paper we use macroecological indicators that are time-dependent (space is neither relevant nor available) and of order zero (Jost, 2006). For a set of unique species $S = \{S_1, S_2, \ldots, S_n\}$ whose abundances $X = \{X_1, X_2, \ldots, X_n\}$ change over time, we define the local species diversity, or α diversity, as:

$$\alpha(t) = \sum_{k=1}^{n} p_k(t)^0 \qquad (2.8)$$

where $p_k(t)$ is the probability of finding species k at time t (with the convention $0^0 = 0$, so that only the species present at time t are counted). Thus, α is the number of distinct species at any given time during the observation period (30 days). Given this definition of α, it is easy to see that the sum of the entropies of all species abundances is proportional to the Shannon index, which is the local species diversity of order one (Jost, 2006).
Leaving aside the controversy about the definition of interspecies diversity over time, i.e., species turnover, we define β diversity as the complement of species similarity (here introduced via the Jaccard Similarity Index (JSI)):

$$\beta(t) = 1 - JSI(t, t+1) = 1 - \frac{S_{t,t+1}}{\alpha(t) + \alpha(t+1) - S_{t,t+1}} \qquad (2.9)$$

where $S_{t,t+1} = \sum_{k=1}^{n} \left(p_k(t)^0 + p_k(t+1)^0\right)/2$, summed over the species with $p_k(t) \neq 0$ and $p_k(t+1) \neq 0$ (each such species contributes 1), is the number of species present at both time steps.
$\alpha(t)$ (or $\alpha(t+1)$) is the number of species present at time t (or t + 1) (Eq. 2.8).
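Order-zero α and the Jaccard-based β reduce to presence/absence counts. The sketch assumes an abundance matrix with time steps as rows and species as columns, treats any nonzero abundance as presence, and, as an assumed convention, sets JSI = 1 when both time steps are empty; the function names `alpha_diversity` and `beta_diversity` are hypothetical.

```python
import numpy as np

# Hypothetical function names and data; an illustrative sketch, not the authors' code.

def alpha_diversity(abundance):
    """Order-zero alpha(t): number of species with nonzero abundance at each time step.

    abundance has shape (T, n): rows are time steps, columns are species.
    """
    return (np.asarray(abundance) > 0).sum(axis=1)

def beta_diversity(abundance):
    """beta(t) = 1 - JSI(t, t+1) for each pair of consecutive time steps."""
    present = np.asarray(abundance) > 0
    shared = (present[:-1] & present[1:]).sum(axis=1)  # S_{t,t+1}
    union = (present[:-1] | present[1:]).sum(axis=1)   # alpha(t) + alpha(t+1) - S_{t,t+1}
    # Assumed convention: two empty samples are identical (JSI = 1, beta = 0).
    jsi = np.divide(shared, union, out=np.ones(len(shared)), where=union > 0)
    return 1.0 - jsi

# Usage: three time steps, three species.
abundance = np.array([[5, 0, 2],
                      [3, 1, 0],
                      [0, 1, 0]])
print(alpha_diversity(abundance), beta_diversity(abundance))
```

Here α = [2, 2, 1] and β = [2/3, 1/2]: one of three species is shared between the first two time steps, and one of two between the last two.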
Note that this definition of β is proportional to the "true" β, which is classically defined as the number of distinct species between two samples (either over space or time). β diversity can also be defined as a second-order index in terms of entropy, $H_\beta = H_\gamma - H_\alpha$ (Jost, 2006), where $H_\gamma = H(N)$ is the total network entropy (Eq. 2.2). Considering the variation of diversity over time, β diversity is proportional to the complement of the mutual information, which is in turn proportional to the sum of the TEs.
The total diversity γ is established over the total number of speciation events M (i.e., the sum of all species present at each time step, regardless of whether they are distinct), or equivalently over the period from time t = 1 to the final time of observation T. Given the validity of the information balance equation (Eq. 2.2), which leads to the diversity balance $H_\gamma = H_\alpha + H_\beta$, the total diversity can also be calculated as $\gamma = \alpha \cdot \beta$ (Jost, 2006).
The total number of speciation events, that is the number of events where new or existing species can be introduced, can be related to the number of unique species as follows.
where S is the number of unique species across the whole observation period, $x_i$ is the abundance of the counted species, and $m_i$ is the number of times that species occurs.

Functional and Structural Network Metrics
The organizational topology of the microbiome is characterized via structural and functional complex network metrics.Functional metrics are based on information theoretic quantities that quantify the interactions among species while structural metrics are based on the geometry of the network and derived from the former ones.
The functional distance $d_f$ between species i and j is defined from the mutual information between their abundances, taking the minimum value over all possible time delays τ. $X_i$ and $X_j$ are the abundances of species i and j, and MI is the mutual information evaluated for different values of the temporal scale of species dependency τ. The τ that minimizes the distance $d_f$ is chosen because it captures the maximum interdependence, $MI_{max}$. As in Villaverde et al. (2014), this distance quantifies the magnitude of interactions between species: the higher the MI, the shorter the distance, which signifies a high level of interaction without specifying its direction. Thus, because it cannot assess the direction of causality, MI is useful for identifying the most interacting pairs of the microbiome rather than individual species.
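The exact normalization of the functional distance follows Villaverde et al. (2014) and is not reproduced here. As an assumption, the sketch uses the normalized variation-of-information distance d = 1 − MI/H(joint) between x_t and y_(t−τ) on already discretized series, and returns the minimum over lags 0..max_tau together with the lag that attains it. The names `lagged_distance` and `functional_distance` are hypothetical.

```python
import numpy as np
from collections import Counter

# Hypothetical function names; an illustrative sketch under an assumed normalisation.

def _entropy(values):
    """Shannon entropy (bits) of the empirical distribution of hashable values."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def lagged_distance(x, y, tau):
    """Assumed normalised distance 1 - MI(x_t; y_{t-tau}) / H(x_t, y_{t-tau}).

    x and y are already discretised series of equal length; tau >= 0.
    """
    a, b = list(x[tau:]), list(y[:len(y) - tau])
    h_joint = _entropy(zip(a, b))
    if h_joint == 0.0:  # both series constant: no measurable shared information
        return 1.0
    mi = _entropy(a) + _entropy(b) - h_joint
    return 1.0 - mi / h_joint

def functional_distance(x, y, max_tau):
    """Minimum distance over lags 0..max_tau and the lag that attains it."""
    distances = [lagged_distance(x, y, tau) for tau in range(max_tau + 1)]
    best = int(np.argmin(distances))
    return distances[best], best

# Usage: x repeats y two steps later, so the distance vanishes at tau = 2.
rng = np.random.default_rng(1)
y = rng.integers(0, 3, size=500)
x = np.roll(y, 2)
print(functional_distance(x, y, max_tau=4))
```

The selected lag is the one of maximum normalized dependence, which is how τ is chosen for the transfer entropy calculation; MI itself is symmetric, so the distance ranks pairs without giving a direction.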
The calculation of the structural distance is based on the functional distance and on the concept of shortest path. The structural distance is defined as the minimum number of steps from one node (species) to another, independently of the magnitude of these steps (e.g., in terms of TE). In terms of connectivity, the functional degree of the directed network is defined as the sum of the weighted in- and out-degree (i.e., TE) raised to a power exponent equal to zero. Analytically, the functional degree is:

$$k_{f,i} = \sum_{j} \left[ f_{i,j}(X)^0 + f_{j,i}(X)^0 \right]$$

where $f_{i,j}(X) = TE_{ij}$ is the transfer entropy as defined in Eq. 2.3 (with $0^0 = 0$, so that only existing links are counted).
The structural degree is defined by treating the network as undirected:

$$k_{s,i} = \sum_{j} a_{i,j}$$

where $a_{i,j} = 1 = TE_{i,j}^0$ if i and j are connected, and 0 otherwise. Classically, the structural degree counts the number of connections regardless of the bidirectional pathways implied by TE.
Thus, the functional degree is always greater than or equal to the structural degree.
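Given a matrix of value functions whose nonzero entries are links, the functional and structural degrees and a shortest-path structural distance can be computed as follows. The sketch assumes that structural distance follows edge direction (breadth-first search over unweighted links) and marks unreachable nodes with −1; the function names and the matrix are illustrative.

```python
import numpy as np
from collections import deque

# Hypothetical function names and data; an illustrative sketch, not the authors' code.

def functional_degree(f):
    """k_f(i) = sum_j (f_ij**0 + f_ji**0) with 0**0 := 0: incoming plus outgoing links."""
    a = (np.asarray(f) != 0).astype(int)
    np.fill_diagonal(a, 0)
    return a.sum(axis=1) + a.sum(axis=0)

def structural_degree(f):
    """Number of distinct neighbours when the network is treated as undirected."""
    a = np.asarray(f) != 0
    und = a | a.T
    np.fill_diagonal(und, False)
    return und.sum(axis=1)

def structural_distance(f, source):
    """Minimum number of directed steps from source to each node (BFS); -1 if unreachable."""
    a = np.asarray(f) != 0
    dist = [-1] * a.shape[0]
    dist[source] = 0
    queue = deque([source])
    while queue:
        i = queue.popleft()
        for j in np.flatnonzero(a[i]):
            if dist[j] < 0:
                dist[j] = dist[i] + 1
                queue.append(j)
    return dist

# Usage: edges 0 -> 1, 1 -> 2 and 2 -> 1 (a bidirectional pair between 1 and 2).
f = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, 0.4],
              [0.0, 0.3, 0.0]])
print(functional_degree(f), structural_degree(f), structural_distance(f, 0))
```

The bidirectional pair between nodes 1 and 2 counts twice in the functional degree but once in the structural degree, which is why the functional degree is never smaller.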

Results
A cursory analysis shows that the average abundance of the healthy microbiome is lower than that of the unhealthy microbiome, independently of the species; however, the maximum abundance is higher for the healthy microbiome, and the corresponding species is one of the most beneficial for health. Looking at species diversity (Figure S1), the average number of species at any time point (α) is lower for the healthy microbiome than for the unhealthy one. This may seem to contrast with previous findings that report higher diversity for the healthy microbiome, or for healthy ecosystems in general (Mellin et al., 2014; Lahti et al., 2014; Zaneveld et al., 2016). A controversy on the subject already exists in the literature (Mellin et al., 2014), and simply maximizing total diversity, without considering how that diversity grows and is organized, is not intuitively a necessary and sufficient ingredient to achieve a stable healthy state (Johnson and Burnet, 2016). Figure 1B and C show the rank-abundance plot and the Preston plot (Hubbell, 2001) of species diversity as a function of abundance. More importantly, the rank-abundance pattern (Fig. 1B) shows only one dynamical regime for the healthy microbiome (exponential-like) vs. two regimes for the transitory and unhealthy microbiomes, a result that likely confirms the bimodality in local species richness α. Figure 1C shows that the decay in richness with abundance is faster for the unhealthy microbiome: plotted in log-log, the Preston plot reveals a scaling relationship with a faster decay in species richness for the unhealthy group. This result underlines that higher diversity does not imply stability, because of the suboptimal and unsustainable distribution of species in the unhealthy microbiome.
Considering the abundance of species over time, from the most abundant to the least abundant, the pdf of abundance shifts from a pseudo-normal distribution (corresponding to a homogeneous spatial distribution) to a Dirac-like distribution (corresponding to a singular point distribution) between the maximum and minimum abundance. Considering the abundance of all species together (Figure S2), the transition is less dramatic: from an exponential to a log-normal-like distribution. Intermediate-abundance species, regardless of whether they belong to the healthy, unhealthy or transitory group, show a scale-free-like distribution, underlining that these species are fundamentally important for the function of the complex microbiome, as highlighted in Lahti et al. (2014). Rare species also seem to display a truncated scale-free behavior (limited by their maximum abundance as a finite-size factor rather than by spatial biological constraints), which also underlines their importance for microbiome organization. These pdfs are a signature of the species interaction networks of the different abundance groups: pseudo-random, scale-free, and small-world topologies for the highest, intermediate and lowest abundance classes, respectively. The connection between abundance and species information flow is discussed in the following results.
Figure 2 shows the inferred microbial networks corresponding to the three microbiome groups. The maximum entropy networks highlight the different topology of microbiome organization for the healthy, unhealthy and transitory groups. In these networks, the size of each node is proportional to the Shannon entropy of the species and the color is proportional to the structural degree. Fig. S3 shows the networks with node color proportional to the total outgoing TE (OTE), which is likely more representative of node activity in a collective network sense. The higher the structural degree (or OTE), the warmer the color. The width of each edge is proportional to the TE between the pair, and the edge direction corresponds to the direction of influence. The MaxEnt networks are OINs, i.e. the networks for which the total network entropy is maximized (MENets) and from which redundant nodes are removed (see section 2.3.3). The transition in network topology, from random (unhealthy group) to small-world tending toward scale-free (healthy group), also shows up as a shift in the total entropy pattern (left plot in Fig. 2). This pattern is asymmetrical for the random/unhealthy microbiome and symmetrical for the SW/healthy microbiome. For healthy individuals, network entropy as a function of information flow is roughly symmetrical, indicating that interconnectedness in healthy communities is more dynamically balanced. Figure S3 shows the microbiome networks for a high value of the threshold $TE_{ij}$, i.e. the level of information exchange between species above which links are considered relevant. When the total network entropy is decomposed, the most important nodes in terms of OTE (Eq. 2.6 and Fig. S6) dominate the total network information (Fig. S5). OTE is the information flow necessary to predict the dynamics of all other nodes. In other words, the entropy of each single node in isolation, $H(x_i)$, is only a second- or third-order factor in determining the total network entropy. Figure S7 shows that most species interactions (TEs) are positive for the unhealthy microbiome, consistent with the evidence that mutualistic positive feedback leads to instability. Therefore, a higher diversity α does not guarantee stability if interactions are predominantly in one direction. The healthy microbiome, instead, has balanced positive and negative interactions.
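The OTE of Eq. 2.6 is the row sum of the pairwise TE matrix, which the sketch below computes. It assumes a NumPy array `te` in which `te[j, i]` stores $TE_{j \to i}$, so the source species indexes the row. That orientation, the exclusion of self-loops, and the helper names are assumptions for illustration; this is not the authors' code. The second helper ranks species by OTE, as in the "top 10" lists discussed in the text.

```python
import numpy as np


def outgoing_transfer_entropy(te):
    """OTE(j) = sum_i TE_{j->i}, assuming te[j, i] holds the TE from species j to i."""
    te = np.asarray(te, dtype=float)
    off_diagonal = te - np.diag(np.diag(te))  # drop self-loops, if any
    return off_diagonal.sum(axis=1)  # sum over targets i for each source j


def top_k_by_ote(te, names, k=10):
    """Names and OTE values of the k species with the largest OTE, in decreasing order."""
    ote = outgoing_transfer_entropy(te)
    order = np.argsort(ote)[::-1][:k]  # indices sorted by decreasing OTE
    return [names[i] for i in order], ote[order]
```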
Figure 3 shows macroecological indicators of microbiome diversity for healthy, unhealthy and transitory individuals. Species diversity α and total species diversity γ are highest in the unhealthy group, for which average abundance is also highest. Species similarity 1 − β and the diversity growth rate over time, by contrast, are highest for the healthy group. This result is key to whether the microbiome organizes around a healthy or a dysbiotic state. The largest fluctuations in abundance and macroecological indicators (in particular α and γ) occur in the transitory and unhealthy groups. These results suggest that excessively high levels of diversity are unsustainable and lead to unhealthy, unstable states related to the abnormally excessive multiplication of species in the gut ecosystem. The behavior of the pdf of α is informative about the potential states of the microbiome in each group. The pdf is platykurtic and multimodal for the unhealthy microbiome, which suggests multiple unstable states. It is leptokurtic and unimodal for the healthy microbiome, which implies one stable state. The transitory microbiome shows an almost symmetrical pdf, underlining that it lies between the healthy and unhealthy microbiomes. These results underline the resilience of the microbiome as a whole, dictated by its ability to change as a function of external stressors, as well as the higher stability of the optimal healthy state. The healthy state, however, seems easy to perturb, given its lower entropy (and probability) of being defined by a single state.
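The leptokurtic/platykurtic distinction can be checked numerically with the sample excess kurtosis of the α time series. The sketch below computes it from plain NumPy central moments. Kurtosis alone does not detect multimodality, so a histogram of α is still needed to see the multiple modes described above.

```python
import numpy as np


def excess_kurtosis(x):
    """Excess kurtosis from population central moments:
    > 0 leptokurtic (peaked), < 0 platykurtic (flat); undefined for constant x."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)  # second central moment (variance)
    m4 = np.mean(d ** 4)  # fourth central moment
    return m4 / m2 ** 2 - 3.0  # subtract the normal-distribution value of 3
```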
Figure 4 shows collective species interaction and individual species importance through the information theoretic global sensitivity indices $\sigma_i$ and $\mu_i$ (see Methods). For the healthy microbiome, the top 10 interacting species are also the least abundant and the most detrimental. However, these species are controlled by other species, and the microbiome is organized into a healthy state. This result supports the view, also reported by other studies, that network hubs play a diminishing role when hubs are defined by total information flow.
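The Figure 4 caption defines $\sigma_i$ as OTE_i divided by the total network entropy and $\mu_i$ as the nodal Shannon entropy divided by the total network entropy, so both indices are simple ratios. The sketch below takes the total network entropy as a given input, because its formula is not reproduced in this section. It estimates nodal entropy with a histogram plug-in estimator, and the default of 10 bins is an arbitrary assumption. The function names are hypothetical.

```python
import numpy as np


def shannon_entropy(samples, bins=10):
    """Plug-in Shannon entropy (bits) of a 1-D abundance series after histogram binning."""
    counts, _ = np.histogram(samples, bins=bins)
    p = counts[counts > 0] / counts.sum()  # empirical probabilities of occupied bins
    return float(-(p * np.log2(p)).sum())


def sensitivity_indices(ote, nodal_entropy, total_network_entropy):
    """sigma_i = OTE_i / H_tot (interaction); mu_i = H(x_i) / H_tot (importance)."""
    ote = np.asarray(ote, dtype=float)
    h = np.asarray(nodal_entropy, dtype=float)
    return ote / total_network_entropy, h / total_network_entropy
```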
For the unhealthy microbiome, the least abundant species are the most interactive and the least detrimental. The most abundant species (Fig. S4), on the contrary, are the least interactive and the most detrimental. For the healthy microbiome, the most abundant species interact the least and are the most beneficial. These species-specific analyses help detect species that are more beneficial or detrimental, and this knowledge can inform, for instance, the design of probiotic treatments or microbiome transplants. Figure S7 shows that, for the healthy microbiome, the pdf of abundance shifts from a bimodal to a unimodal distribution when moving from the top 10 to the bottom 10 TE species.
For the transitory and unhealthy microbiomes, instead, the pdf shifts from leptokurtic (Dirac-like) to platykurtic (uniform-like). The top 10 TE species are the most dangerous bacteria ("antibiotic"), but their abundance is small in the healthy microbiome, which means that these bacteria are controlled by the other, beneficial bacteria.
Figure 5 shows the non-linear duality between microbiome structure and function. Structure is captured by the network degree (Figs. S8 and S9) and function by the nodal information flow OTE. The exceedance probability distributions (epdfs) show that microbiome function reveals network topology much better than microbiome structure does. As shown by Figure 2, the function of the healthy microbiome tends toward a scale-free organization. This mild scale-free organization, however, does not correspond to a scale-free distribution of α diversity (Fig. 5, bottom plot), which is instead exponential. Plotting α as a function of functional network degree and distance highlights the non-linearity between structure, function and microbiome service (diversity). α increases for high numbers of functional connections (Eq. 2.14) but shows no clear trend with functional distance (Eq. 2.12). However, $\alpha(d_f)$ is lower for the unhealthy than for the healthy microbiome over the same range of functional distances, which highlights the more random distribution of diversity in any dysbiotic state. We observe 72, 378, and 9647 unique values of functional distance for the healthy, transitory and unhealthy groups, respectively. After normalizing these values, the distribution of α over the normalized distance shows a random arrangement for the unhealthy group with respect to the healthy one.
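The epdfs used for degree, OTE and α can be computed directly from the sorted sample. The sketch below uses the convention $P(X \ge x)$ and handles tied values with `np.searchsorted`. Whether the paper uses $\ge$ or $>$ is an assumption.

```python
import numpy as np


def exceedance_probability(values):
    """Empirical exceedance probability P(X >= x) at each sorted sample value."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    n_below = np.searchsorted(x, x, side="left")  # count of values strictly below each x
    return x, (n - n_below) / n
```

Plotting the returned pair on log-log axes gives the exceedance curves. A straight tail in such a plot is the usual visual cue for scale-free-like behavior.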
The most interesting results emerge when microbiome service and function indicators are combined, for instance total macroecological diversity γ and OTE. Figure 6 shows the relationship between γ and the temporal sampling scale (i.e. the number of speciation events), in analogy to the species-area relationship widely used in macroecology (Hubbell, 2001). The plot shows a scaling relationship over two orders of magnitude. Its exponent is higher for the healthy than for the unhealthy group, underlining the optimal growth of diversity in the healthy microbiome. Given this relationship, it is meaningful that the transitory microbiome has the largest value of γ. It marks a change in diversity from the species-"poor" healthy microbiome to the species-"rich" unhealthy microbiome. These results are consistent with the power-law decay of species similarity 1 − β over time (Fig. 6, bottom left plot). The OTE of species as a function of their abundance follows a surprising scaling law over four orders of magnitude. This law has an average exponent close to 1/4, which is very common in biology (e.g. the mass-specific Kleiber's law; DeLong et al., 2010), and it implies that species interaction decays for highly abundant species. Plotting γ against OTE reveals a non-linear growth. Total diversity increases in all groups up to a critical species interaction value, above which γ slows down or, as for the healthy group, remains stationary.
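Exponents such as those of the γ versus speciation-events relationship, or of the OTE versus abundance law, are commonly estimated by ordinary least squares on log-transformed data. The sketch below implements that approach. It is an assumption that the paper used this estimator. Non-positive values are dropped because their logarithm is undefined.

```python
import numpy as np


def scaling_exponent(x, y):
    """Least-squares slope b and prefactor a of y = a * x**b, fitted in log10-log10 space."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (x > 0) & (y > 0)  # logarithms require strictly positive values
    # np.polyfit returns coefficients from highest degree: [slope, intercept]
    b, log_a = np.polyfit(np.log10(x[keep]), np.log10(y[keep]), 1)
    return b, 10 ** log_a
```

OLS in log-log space is adequate for relationships between two measured quantities. For fitting the tail of a probability distribution, maximum-likelihood estimators are generally preferred.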

Discussion
We employ an information theoretic model to infer the microbial species interaction network. The model is used to infer a microbial network suitable for predicting selected biodiversity patterns that characterize the space-time organization of α, β, and γ diversity. The purpose of the model is therefore not to infer causal (or "true") species-species interactions among bacteria. The "exact" computational inference of these interactions is always very hard, since it would require complete knowledge of the reality against which results can be validated, and it depends on the analytics and data used. For instance, abundance profiles may not contain all the information about the species-species interactions one aims to assess.
In this perspective, the entropy-based model focuses on the predictability of patterns rather than on the causal investigation of mechanisms. For any threshold of the information flow TE, the total network entropy is lowest for the healthy microbiome (Fig. 2). This implies that the healthy microbiome has more free energy available and needs less information to function. Here, information entropy in the physical space can be thought of as the average interspecies communication. The lower entropy of species collective interaction certainly has implications for data collection, potentially implying that less data are needed to characterize healthy microbiomes. More theoretically, the highest entropy is a sign of criticality. Criticality is the state toward which any ecosystem tends (Hidalgo et al., 2014), where self-organization and environmental influence are balanced.
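For reference, the sketch below implements the standard (Schreiber) transfer entropy with history length 1, using a plug-in estimator on equal-width bins:
$TE_{X \to Y} = \sum p(y_{t+1}, y_t, x_t) \log_2 \frac{p(y_{t+1} \mid y_t, x_t)}{p(y_{t+1} \mid y_t)}$.
It does not reproduce the paper's full inference pipeline, so the conditional entropy reduction and the MaxEnt network extraction are absent. The bin count, the history length and the function name are assumptions.

```python
import numpy as np
from collections import Counter


def transfer_entropy(x, y, bins=4):
    """Plug-in TE_{X->Y} in bits, history length 1, after equal-width binning."""
    def discretize(s):
        s = np.asarray(s, dtype=float)
        interior = np.linspace(s.min(), s.max(), bins + 1)[1:-1]  # interior bin edges
        return np.digitize(s, interior)  # symbols 0 .. bins-1

    xd, yd = discretize(x), discretize(y)
    triples = list(zip(yd[1:], yd[:-1], xd[:-1]))  # (y_{t+1}, y_t, x_t)
    n = len(triples)
    c_full = Counter(triples)
    c_past = Counter((y0, x0) for _, y0, x0 in triples)  # (y_t, x_t)
    c_self = Counter((y1, y0) for y1, y0, _ in triples)  # (y_{t+1}, y_t)
    c_y = Counter(y0 for _, y0, _ in triples)            # y_t
    te = 0.0
    for (y1, y0, x0), c in c_full.items():
        # p(y1|y0,x0) / p(y1|y0) written with counts; the 1/n factors cancel
        ratio = (c * c_y[y0]) / (c_past[(y0, x0)] * c_self[(y1, y0)])
        te += (c / n) * np.log2(ratio)
    return float(te)
```

Take a random driver series `x` and a copy `y` lagged by one step. For that pair, `transfer_entropy(x, y)` should approach the entropy rate of the driver, while the reverse direction stays near zero. Plug-in estimates are positively biased for short series.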
We do not find strong scale invariance such as that reported, for instance, in Servadio and Convertino (2018), likely because no pure scale-free networks are observed. In this paper we use the total entropy as a utility function. Servadio and Convertino (2018) instead defined a value function that considered raw values of the network variables rather than the TEs among them. Focusing on the interdependence of network variables (here, between species) rather than on nodal values (i.e. abundance for the microbiome) leads to higher variability in network entropy patterns. However, we believe the focus should be on network function in order to better characterize networks. This is substantiated by the greater importance of species interactions compared with independent species dynamics, as shown in Fig. S5.
Entropy-flow patterns are therefore useful for detecting scale invariance in the functional topology of the network. They can also reveal healthy vs. unhealthy states through the symmetry of the entropy distribution. When the distribution is symmetrical, positive and negative species interactions (TEs) sum to zero, leading to a healthy neutral state. The asymmetry of the unhealthy microbiome can certainly relate to non-neutral states created by strong stressors, as highlighted theoretically in Borile et al. (2012). It may also prevent host individuals from keeping the microbiome "on a leash" (R. Foster et al., 2017). A broken symmetry can indeed manifest an unhealthy state. The neutral state also coincides with the critical state because of the near scale-free organization of the network. This organization is manifested by the epdf of OTE (Fig. 5), by higher functional distances and by smaller functional degrees (Fig. S8). At high thresholds the healthy group maintains its topology while TE changes, because healthy networks are more scale-free than unhealthy ones. This configuration enhances stability, as confirmed by the dominant eigenvalue of both the adjacency and the TE matrices. The dominant eigenvalue is smallest for the healthy group, which is a signature of network stability.
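The stability check via the dominant eigenvalue can be reproduced with NumPy's general eigenvalue solver. The sketch makes two assumptions. First, "dominant eigenvalue" is read as the largest eigenvalue modulus (spectral radius). Second, the adjacency matrix is obtained by thresholding the TE matrix at a user-chosen value. Both helper names are hypothetical.

```python
import numpy as np


def dominant_eigenvalue(matrix):
    """Largest eigenvalue modulus (spectral radius) of a square interaction matrix."""
    m = np.asarray(matrix, dtype=float)
    return float(np.max(np.abs(np.linalg.eigvals(m))))  # eigvals may be complex


def adjacency_from_te(te, threshold):
    """Binary adjacency: link j->i is kept when TE_{j->i} exceeds the threshold."""
    te = np.asarray(te, dtype=float)
    return (te > threshold).astype(float)
```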
In our microbiome data we consider the complement of β-diversity over time via the Jaccard Similarity Index (JSI). We show that JSI is higher over time for the healthy than for the unhealthy microbiome. This means that the local species richness α tends to stay close to its previous values over time, which underlines the stability of α (species organization) in the healthy state. For the unhealthy microbiome the similarity over time is lower (i.e. higher species turnover, or higher β diversity). The same holds for the corals in Zaneveld et al. (2016), which were evaluated over time as a function of external stressors: in coral ecosystems under stress, the true β-diversity was found to increase over time (Zaneveld et al., 2016). Leaving aside the debates about the many definitions of species turnover, macroecology in an entropic context defines β-diversity as the ratio between regional (γ) and local (α) species diversity (Jost, 2006). This definition is in line with the general information balance equation (Eq. 2.2) and with the more specific diversity balance equation of Jost (2006). An increase in β is typically associated with a decrease in α, as we observe for the healthy microbiome. This is also associated with fluctuations of α that are smaller than for the unhealthy microbiome. The "proportional species turnover" $\beta_p = 1 - \alpha/\gamma$ is also higher; it quantifies what proportion of species diversity is not contained in an average representative sample. In ecology these quantities are typically evaluated over space. In healthy conditions, 1 − β decays relatively fast but never reaches zero, meaning that heterogeneity exists but even communities far apart share some species. Over space in unhealthy conditions, the "true" β diversity is typically smaller than in healthy conditions because much more homogeneity is achieved.
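The sketch below computes the diversity quantities discussed above from a presence/absence matrix whose rows are time points and whose columns are species. It uses richness (the Hill number of order 0) as the diversity measure, for which Jost's multiplicative β = γ/α reduces to Whittaker's β. Similarity is the Jaccard index relative to the first time point, which is one possible reading of the "decay of 1 − β over time". Both choices and the function names are assumptions.

```python
import numpy as np


def diversity_summary(presence):
    """alpha, gamma, multiplicative beta and proportional turnover from a
    (time points x species) presence/absence matrix, using richness."""
    p = np.asarray(presence, dtype=bool)
    alpha = p.sum(axis=1).mean()      # mean local richness per time point
    gamma = int(p.any(axis=0).sum())  # richness of all time points pooled
    beta = gamma / alpha              # Whittaker / Jost multiplicative beta
    beta_p = 1.0 - alpha / gamma      # proportional species turnover
    return float(alpha), gamma, float(beta), float(beta_p)


def jaccard_decay(presence):
    """Jaccard similarity between the first time point and every time point."""
    p = np.asarray(presence, dtype=bool)
    shared = (p & p[0]).sum(axis=1)  # species present in both samples
    union = (p | p[0]).sum(axis=1)   # zero only if both samples are empty
    return shared / union
```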
The higher variation of β-diversity in healthy individuals highlights the "Anna Karenina phenomenon" for human microbiomes. The principle underlying the phenomenon states that dysbiotic individuals vary more in microbial community composition than healthy individuals. This parallels Leo Tolstoy's dictum that all happy families look alike (each unhappy family is unhappy in its own way). The unimodal pattern of diversity associated with stability agrees with current theories that consider β diversity, rather than α diversity alone, for the stability of ecosystems (Mellin et al., 2014). Convertino et al. (2015) previously found that ecosystem hotspots are those that maximize the Value of (biodiversity) Information, and that these coincide with the hotspots that minimize β diversity variability over time. The network topology reflects the multiplicity of "unhappy/unhealthy" states: it is random for the unhealthy group, which allows many more potential unhealthy microbiome combinations. We concur with previous studies that Anna Karenina effects are a common and important response of animal microbiomes to stressors that reduce the ability of the host or its microbiome to regulate community composition.
As in other ecosystems, we show that scale invariance (occurring for the healthy microbiome) arises neither from an underlying criticality, where fluctuations grow bigger and bigger until the system tips abruptly, nor from self-organization at the edge of a phase transition. Instead, it emerges because perturbations to the system exhibit a neutral drift with respect to the endogenous spontaneous dynamics; this drift is also related to small extrinsic environmental changes. This neutral dynamics, similar to that in genetics and ecology, shows fluctuations of all sizes simultaneously. These fluctuations likely determine the power-law distribution of species diversity, as well as power-law information exchange among species. The observed tipping point, e.g. between the healthy and unhealthy microbiome, is a second-order critical transition. There, exogenous fluctuations are too large to be assimilated by the system, and the microbiome tips from healthy to unhealthy. This transition is evident in the shape of the pdf of microbiome function and diversity. It is not evident in the shape of microbiome structure unless a rescaling in size is applied, for instance to the microbial network degree (see Fig. S5, top plot).
The introduction of new pathogens driven by the environment can alter the whole ecosystem microbiome. In our case study, although the disturbance agent is not explicitly considered, we see a transition in IBS individuals from healthy to unhealthy states. The microbiome is like the gut of any ecosystem: no other species at any scale of biological organization can survive optimally if the microbiome is altered. The microbiome links the fundamental genetic organization of life with the stochastic environmental dynamics; in the context of a person's growth, these two processes can be called nature and nurture. The proposed information theoretic global sensitivity and uncertainty analyses (Figure 4) allow one to map the dynamics of species in terms of their interactions and absolute influence. They also show how these quantities vary with intrinsic biological variability and external variability. In the healthy state many more species influence the collective dynamics, with a more organized distribution of interactions ("hierarchical" organization). In the transitory and unhealthy states, all species behave somewhat equally and are likely driven by external environmental stimuli ("random" organization). This organization is also reflected by network properties (Figs. S8 and S9) that can be altered for the same set of species/diversity. Previous papers found that cooperation promotes ecosystem biodiversity. Biodiversity in turn increases stability without any fine tuning of the species interaction strengths or of the self-interactions, i.e. neutrality (Suweis et al., 2015; Tu et al., 2018). Even small TE values, which manifest mutualistic interactions among species, can stabilize the dynamics. Stability also increases with ecosystem complexity, which here is related to the scale-free-like organization of bacteria. Interestingly, this scale-free cooperation of species leads to Taylor's laws between the mean and variance of abundance (Kilpatrick and Ives, 2003; Ma, 2015), and Taylor's exponent differs between healthy and unhealthy groups (Martí et al., 2017). This re-emphasizes the connection between time dynamics, network organization, and ecological patterns of diversity and abundance (Suweis et al., 2015; Grilli et al., 2017; Gonze et al., 2018). In particular, higher-order interactions (e.g. captured by $\sigma_i$ in our model) have been shown to play a stabilizing role (Grilli et al., 2017). Higher-order interactions are all those beyond simple pairwise interactions, whose sum indeed cannot explain the whole composition of ecosystems (Levine et al., 2017). We show that these higher-order interactions cannot be prevalent, because some species must have independent dynamics (captured by $\mu_i$). Otherwise, instability and a tendency toward a disorganized unhealthy state are very likely (Fig. 4).
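Taylor's law relates the variance of abundance to its mean as $\mathrm{Var} = a\,\mathrm{Mean}^{b}$. The sketch below estimates b by least squares in log-log space across species, using each species' temporal mean and sample variance. This temporal, across-species variant and the estimator are assumptions, not the specific procedure of Martí et al. (2017).

```python
import numpy as np


def taylor_exponent(abundance):
    """Fit variance = a * mean**b across species; abundance has shape (time points, species)."""
    x = np.asarray(abundance, dtype=float)
    mean = x.mean(axis=0)        # temporal mean of each species
    var = x.var(axis=0, ddof=1)  # temporal sample variance of each species
    keep = (mean > 0) & (var > 0)  # log-log fit needs positive values
    b, log_a = np.polyfit(np.log10(mean[keep]), np.log10(var[keep]), 1)
    return b, 10 ** log_a
```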
If human microbiota dynamics are universal, they can ideally be manipulated in a similar or even identical fashion in multiple individuals. Once universality is established and the beneficial effects of specific interventions are demonstrated, microbiome engineering can be applied to large numbers of people. Microbiome engineering would then be highly cost-effective as a public-health approach. This is in sharp contrast to the excessive cost of "precision medicine", which tries to correct individual microbiome dynamics by treating them as a purely individual-based function.
Current frontier topics also include understanding how the microbiome and functional brain networks "communicate" (Allen et al., 2017). The hypothalamic-pituitary-adrenal axis (HPA axis) is a primary mechanism by which the brain communicates with the gut, helping to control digestion through the action of hormones (Allen et al., 2017). Through its ability to affect gut transit time and mucus secretion, the nervous system seems able to help dictate which microbes inhabit the gut. This in turn affects emotional response and long-term well-being beyond short-term health.
• The healthy state is characterized by the highest growth rate of total species diversity γ (leaving aside the transitory microbiome) and the lowest loss of species similarity over time, i.e. the lowest species turnover (decay of 1 − β). A relationship similar to the species-area relationship is found between γ diversity and the number of species generations (over time), with an exponent of 0.20 on average. The healthy microbiome has the lowest average diversity. This contrasts with large-scale ecosystems at stationarity, where the highest total diversity corresponds to the stable and supposedly healthy state. However, we speculate that optimal diversity growth aims to maximize the growth rate rather than total diversity, as in many Pareto portfolio theories. Maximizing total diversity can lead to over-redundancy and instability, as observed for the dysbiotic microbiome. Hence, we tend to challenge the diversity-health-stability hypothesis when diversity is taken to be the total systemic diversity γ;

Conclusions
• We observe a second-order phase transition from the healthy to the unhealthy state and vice versa. The transition from healthy to unhealthy shows signs typical of transitions in many complex systems, i.e. an increase and a decrease in mean and variance of species diversity while approaching the transition. In the unhealthy state, α has a higher variance than in the healthy state and its pdf is concentrated around two values, which underlines the chaotic-like dynamics of the microbiome.
In terms of functional network topology, the microbiome network shows a transition from a SW-like to a random topology. The critical state, defined by a scale-free-like organization of microbial species interactions, coincides with the neutral state (i.e. the symmetrical network entropy pattern). This shows that criticality does not necessarily occur at critical phase transitions, particularly second-order transitions as in this case; rather, criticality can coincide with neutrality in open energy-dissipative systems. Given the small-world organization of the microbial network, neutrality implies higher topological complexity but also higher dynamical stability.
• A probabilistic linkage is found between microbiome function and services, defined as species interactions and macroecological indicators, respectively. We find no correspondence between microbiome structure and function. This emphasizes the non-linearity between the two and the need to assess function rather than structure in biological networks. We propose the total Outgoing Transfer Entropy (OTE) as the measure that identifies the most influential nodes, i.e. the nodes able to predict the behavior of all other connected nodes. OTE largely determines the total entropy of the network, whereas the contribution of the sum of nodal entropies is negligible.
The highest-OTE nodes have the lowest abundance. They are the most beneficial bacteria for the dysbiotic microbiome and the most detrimental for the healthy one. A scaling law with an exponent close to 1/4 is found between OTE and abundance. It is similar to the mass-specific Kleiber's law, with OTE playing the role of the species-specific metabolic rate and abundance that of mass. For the healthy state, microbiome function (i.e. the sum of nodal OTE) follows a power-law distribution with an exponent ∼ 2, which implies a finite mean but infinite variance. This holds even though the network entropy pattern shows no information (or resolution) invariance.
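The link between the exponent ∼ 2 and the moments can be made explicit under two assumptions. The first, consistent with Fig. 5, is that the exponent ε refers to the exceedance probability rather than to the density. The second is a lower cutoff $x_{\min} > 0$:

```latex
P(X \ge x) \propto x^{-\varepsilon}
\;\Rightarrow\;
p(x) \propto x^{-(\varepsilon + 1)}, \qquad x \ge x_{\min} > 0,
\\[4pt]
\mathbb{E}[X^k] \propto \int_{x_{\min}}^{\infty} x^{k}\, x^{-(\varepsilon + 1)}\, dx
= \int_{x_{\min}}^{\infty} x^{k - \varepsilon - 1}\, dx < \infty
\iff k < \varepsilon .
```

With ε = 2, the mean (k = 1) is finite. The second moment (k = 2) has integrand $x^{-1}$ and diverges logarithmically, so the variance is infinite. If 2 were instead the exponent of the density (ε = 1), the mean would diverge as well.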

Figure Captions
Graphical Abstract. and extracted Maximum Entropy Networks on the right. The size of each node is proportional to the Shannon entropy of the species. The color of the node is proportional to the structural degree; in Fig. S3 the color of each node is instead proportional to its total outgoing TE (OTE), and the higher the OTE, the warmer the color. The distance is proportional to $\exp(-I(X,Y))$, where $I(X,Y)$ is the mutual information between the abundances x and y of two species. The width of each edge is proportional to the pairwise transfer entropy, and its direction is given by $TE(i \to j)$, i.e. the edge points from i to j.

the adjacency matrix that can be formulated in terms of TE. Shortest paths are considered for two reasons: the ensemble of distances grows exponentially with the number of nodes, and biological systems always optimize information transmission. Hence, Pareto shortest paths are always chosen (Seoane and

An information theoretic model is proposed for inferring microbiome networks and their biodiversity organization over time. The model assesses transfer-entropy-based species interactions after entropy reduction calculations that remove spurious direct interactions arising from indirect interactions between species. Maximum entropy networks are then extracted by retaining the highest information without overfitting the model. The model is validated macroecologically by its ability to simultaneously predict the pdf of α-diversity, γ-diversity growth over time, the decay of species similarity (1 − β), and the rank-abundance profile. This validation allows other biodiversity patterns to be predicted, such as the Preston's plot of average species richness as a function of species abundance. Applying the model to healthy and Irritable Bowel Syndrome symptomatic individuals, the following points are worth mentioning.
• Directed species interdependencies and phase transitions of the microbiome over time are detected. The healthy microbiome is characterized by balanced positive and negative species interactions, whereas most species interactions in the unhealthy microbiome are positive. The balance shows up as a symmetrical pattern of the network total entropy as a function of the pairwise information flow, whereas the dysbiotic microbiome shows a positively biased asymmetrical pattern. The healthy symmetrical network entropy pattern underlines the neutral "sum to zero" dynamics of species interactions. The same neutrality is found for the biodiversity of large-scale ecosystems at stationarity, which are driven predominantly by intrinsic ecological stochasticity (ecological drift). Unhealthy microbiome entropic patterns, on the contrary, are affected by environmental disturbances. The positive bias in information flow (e.g. related to infections and antibiotics) causes an overgrowth in abundance of many opportunistic species as well as the generation of new detrimental species;

Figure 1. Abundance trajectories, rank-abundance, and Relative Species Abundance. Blue, green and red curves refer to the healthy, transitory and unhealthy microbiome, respectively. The healthy microbiome shows smaller fluctuations in species diversity α vs. abundance and a single regime in the rank-abundance profile. An inverse scaling law is detected for the RSA.

Figure 2. Network entropy patterns and inferred Maximum Entropy Networks. Network entropy as a function of the pairwise information flow (TE) (left plots)

Figure 3. Macroecological indicators of microbiome networks and probabilistic characterization. Average α, species similarity 1 − β, and total diversity γ are plotted as a function of time. Their probability distributions are shown on the right.

Figure 4. Importance and interaction of microbial species, and the top 10 most active species. σ describes species interaction. It is the ratio between the total outgoing information flow, $OTE(j) = \sum_i TE_{j \to i}$, and the total network entropy. µ describes species importance as the ratio between the nodal (Shannon) entropy and the total network entropy. The continuous line in each σ-µ plot marks the critical edge, i.e. a state between regularity and chaos. On the right, the top 10 most active species in terms of OTE (which are also the least abundant) are ranked. These species are the most detrimental for the healthy group and the most beneficial for the unhealthy one.

Figure 5. Exceedance probability distribution of microbiome structure, function, and service.Network degree, total outgoing transfer entropy (OTE) of each node, and α diversity over time characterize the structure, function and service of the microbiome network.

Figure 6. Macroecological scaling patterns and predicted species interactions. The left plots show the scaling of total γ diversity and of species similarity 1 − β with the number of speciation events, i.e. the number of new and existing species introduced up to the time considered. Speciation time is a proxy for the sampling area over time. The right plots show the scaling of OTE vs. abundance and of γ diversity vs. OTE.