Comparison Between Bayesian and Maximum Entropy Analyses of Flow Networks

We compare the application of Bayesian inference and the maximum entropy (MaxEnt) method for the analysis of flow networks, such as water, electrical and transport networks. The two methods have the advantage of allowing a probabilistic prediction of flow rates and other variables, when there is insufficient information to obtain a deterministic solution, and also allow the effects of uncertainty to be included. Both methods of inference update a prior to a posterior probability density function (pdf) by the inclusion of new information, in the form of data or constraints. The MaxEnt method maximises an entropy function subject to constraints, using the method of Lagrange multipliers, to give the posterior, while the Bayesian method finds its posterior by multiplying the prior with likelihood functions incorporating the measured data. In this study, we examine MaxEnt using soft constraints, either included in the prior or as probabilistic constraints, in addition to standard moment constraints. We show that when the prior is Gaussian, both Bayesian inference and the MaxEnt method with soft prior constraints give the same posterior means, but their covariances are different. In the Bayesian method, the interactions between variables are applied through the likelihood function, using second or higher-order cross-terms within the posterior pdf. In contrast, the MaxEnt method incorporates interactions between variables using Lagrange multipliers, avoiding second-order correlation terms in the posterior covariance. The MaxEnt method with soft prior constraints, therefore, has a numerical advantage over Bayesian inference, in that the covariance terms are avoided in its integrations. The second MaxEnt method with soft probabilistic constraints is shown to give posterior means of similar, but not identical, structure to the other two methods, due to its different formulation.


Introduction
The analysis of flow rates on networks is required for the design and monitoring of electrical, water, sewer, irrigation, fire suppression, drainage, oil, gas and any other networks through which fluids or energy are transported. Their analysis is an important engineering problem. Traditionally, these systems have been analysed using deterministic methods. These methods incorporate physical laws such as Kirchhoff's first and second laws (conservation of mass and equivalence of potentials at nodes) and sufficient known parameter values, giving a closed-form set of equations which is solved for the (deterministic) solution. Deterministic methods yield precise parameter values but do not consider uncertainty, either due to a lack of knowledge of the state of the system or flow variability. To account for uncertainty, a probabilistic framework is required. There are two primary methods for probabilistic inference: Bayesian inference using Bayes' rule, and maximum entropy (MaxEnt) analysis.
Bayes' theorem comes from the product rule of probabilities. To use Bayes' theorem, the prior and likelihood functions need to be chosen before the data are analysed. To analyse the data, a set of data values is incorporated in the likelihood function, and the prior is then updated by Bayes' rule to obtain the posterior. This process can be repeated for each data set by using the posterior as the prior for the next data set. The order in which the data sets are analysed should not impact the final result.
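The sequential use of Bayes' rule described above can be illustrated with a minimal sketch. It assumes a single scalar flow rate with a Gaussian prior and Gaussian measurement noise, which is a simplification chosen for illustration and not the network model developed later; the function names and numerical values are hypothetical.

```python
import math

def gaussian_update(mean, var, y, noise_var):
    """One Bayes update of a Gaussian belief N(mean, var) about a scalar flow
    rate, given an observation y with Gaussian noise of variance noise_var."""
    post_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    post_mean = post_var * (mean / var + y / noise_var)
    return post_mean, post_var

def sequential_posterior(prior, observations):
    """Feed each posterior back in as the prior for the next data set."""
    mean, var = prior
    for y, noise_var in observations:
        mean, var = gaussian_update(mean, var, y, noise_var)
    return mean, var

prior = (10.0, 4.0)                       # hypothetical prior mean and variance
observations = [(12.0, 1.0), (9.5, 0.5)]  # hypothetical (value, noise variance) pairs

forward = sequential_posterior(prior, observations)
backward = sequential_posterior(prior, list(reversed(observations)))
print(forward, backward)  # the two orderings agree up to rounding
```

In this conjugate case, the posterior precision is the sum of the prior and data precisions, so the final result cannot depend on the order in which the data sets are processed.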
The analysis presented here follows the well-known Bayesian parameter estimation or regression procedure as described in [1]. Although the applications of this procedure are vast, including least-squares regression, the authors are not aware of it being applied in its general form to estimate flows on a network. An example of using Bayes' theorem with transient pipe flows is presented in Rougier and Goldstein [2], who solve the water hammer partial differential equations, incorporating uncertainty from the pipeline characteristics and the boundary conditions. Bayes' theorem is used to estimate the flows, pressures and pipeline characteristics as time progresses, using data obtained through real-time monitoring of the pipeline in a few locations. As this method requires the solution of a partial differential equation which incorporates time and uncertainty, its computational cost is high and it is therefore restricted to small networks; in the example, a single pipe is analysed. The Bayesian method can also be used to calibrate model parameters, often using least-squares regression. Savic et al. [3] provide a comprehensive review of calibration techniques used with water networks. As an alternative to predicting model parameters, Hutton et al. [4] use Bayes' theorem to update the coefficients in an autoregressive, data-driven model to predict future flow rates (at two locations in their example) using current and previous flow rate observations. In their case study, they were able to provide accurate one-hour forecasts for the monitored locations. Hutton and Kapelan [5] extended their previous analysis to predict pipe bursts by considering the difference between their predicted and observed flow rates. They were able to detect abnormal flow conditions representing pipe bursts greater than 5% of normal flow conditions.
Entropy is a measure of uncertainty [6][7][8][9][10]. The MaxEnt method for inference can be derived from an axiomatic approach based on the axioms of locality, coordinate invariance and subsystem independence [11][12][13]. Alternatively, the MaxEnt method can be derived from a combinatorial approach [14][15][16][17], which shows that the MaxEnt method infers the most probable distribution, subject to the constraints and prior [6][14][15][16][17][18][19][20]. The maximum relative entropy method (MaxEnt), equivalent to the minimum Kullback-Leibler divergence [21], is a method of inference used to infer or update a probability distribution describing an under-determined system, which respects all constraints imposed on the system and is closest to the prior distribution [8]. However, the MaxEnt method is a method of inference, with no guarantee that the inferred solution will be realised [6,8,9]. The validity of the distribution will depend on the assumptions used to construct the MaxEnt model. The MaxEnt method has been applied to predict the flows on water distribution networks [22][23][24][25], transportation networks [26], electrical networks [26] and generic flow networks [27][28][29].
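As a small illustration of inferring the distribution closest to the prior subject to a constraint, the sketch below applies MaxEnt to a six-sided die with a prescribed mean (the classic dice example, not a flow network). The function name `maxent_die`, the bisection bounds and the sign convention for the Lagrange multiplier are assumptions made for this example.

```python
import math

def maxent_die(target_mean, prior=None, lo=-50.0, hi=50.0, iters=200):
    """Distribution over faces 1..6 that minimises the Kullback-Leibler
    divergence from `prior` subject to a fixed mean (MaxEnt with one
    moment constraint). Returns the probabilities and the multiplier."""
    faces = range(1, 7)
    prior = prior or [1.0 / 6.0] * 6

    def tilted(lam):
        # MaxEnt solution has the form p_k proportional to q_k * exp(-lam * k)
        w = [q * math.exp(-lam * k) for q, k in zip(prior, faces)]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(k * p for k, p in zip(faces, tilted(lam)))

    # mean(lam) decreases monotonically in lam, so bisection finds the root
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return tilted(lam), lam

p, lam = maxent_die(4.5)
print([round(pk, 4) for pk in p], lam)
```

With a uniform prior and a target mean of 3.5, the constraint carries no new information and the routine returns the prior itself; a larger target mean shifts probability towards the higher faces.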
There have been many studies on the connection between Bayes' theorem and the MaxEnt method, with some authors suggesting that one can be obtained from the other (in either direction) [30][31][32][33]. Giffin and Caticha's method [31][32][33] to obtain Bayes' rule using MaxEnt requires the relative entropy function to be defined over the model parameters and the data. The normalisation constraint is applied, and the variables representing the data are constrained with a pdf (probability density function) representing the Bayesian likelihood function. The pdf constraint is applied to the parameters defined to be the prior in the Bayesian method. Bayes' rule is obtained by dividing the inferred distribution by the pdf defined over the data parameters, i.e., dividing by the pdf constraint. Although this equivalence is mathematically correct, the second constraint appears somewhat contrived as it is applied over the parameters of the Bayesian prior.
The current authors have compared the probability distributions of quasi-Newton rules obtained when inferring the Jacobian or Hessian using Bayesian inference [34,35] and the MaxEnt method [36]. In both methods, the same Gaussian prior was used. In the MaxEnt method, secant equations were used as the constraints. In Bayes' method, delta likelihood functions incorporating the secant equation were used to represent the data. It was found that both methods obtained the same posterior means, but the covariance matrices were found to be different.
In this study, we develop a Bayesian method to analyse flow networks (Section 2). This theory contains many features in common with the MaxEnt method of [25]. In Section 3.1, we present a MaxEnt theory using soft constraints that are implemented in the prior pdf. In Section 3.2, we compare the distributions obtained by the two methods. In Section 4.1, we also present a MaxEnt theory with soft constraints implemented using pdfs as constraints, and, in Section 4.2, we compare this to the Bayesian method. Finally, in Section 5, we discuss our findings.

Bayesian Analysis
Consider a flow network with N external flow rates and M internal flow rates, which can be assembled into the vectors Θ and Q, respectively, which, in turn, can be assembled into the single vector Ψ of length N + M. In the Bayesian method, to avoid inconsistencies due to different network representations, we consider a basis set X of n flow rates selected from Ψ as parameters of the pdf used to represent the uncertainty. The indices of the basis set X in Ψ are given by the set B, while the indices of the complementary non-basis set of flow rates in Ψ are given by the set $\mathcal{N}$. For closure, at least N − 1 basis flow rates must be chosen but up to N + M can be chosen. The derivation of the Bayesian method requires a prior belief of the state of the system, represented as a prior pdf q(X), which is updated using observed data to a posterior pdf according to Bayes' rule:

$$p(X|y) = \frac{p(y|X)\, q(X)}{\int_{\Omega} p(y|X)\, q(X)\, \mathrm{d}X}$$

where p(y|X) is the likelihood function, the denominator allows for normalisation, X is the basis set of flow rates, y is the vector of observed data and Ω is the domain of X. The flow rates not included in the basis set are taken as functions of the model parameters X, using: where in which
• diag() places the elements of a vector on the diagonal of a square matrix of zeros;
• the set V contains the N + M − n indices of the equations required to uniquely define the non-basis flow rates from X;
• the matrix C is an N × (N + M) connectivity matrix containing elements {−1, 0, 1}. Its entries $C_{i,r}$, ∀i ∈ {1, ..., N}, indicate membership of edge r to the node i, given by 0 if the edge is not connected to the node, 1 if the assumed direction of $Q_m$ or $\Theta_i$ is entering the node and −1 otherwise;
• the vector K is an (M + N) × 1 vector of flow resistances;
• the matrix W is a w × (N + M) loop matrix containing elements {−1, 0, 1}, where w is the number of independent cycles (loops) within the network. Its entries $W_{i,r}$, ∀i ∈ {1, ..., w}, indicate membership of edge r within loop i, given by 0 if the edge is not in the loop, 1 if the assumed direction of $Q_m$ is in a clockwise direction around the loop and −1 otherwise;
• the matrix F is an $(N_\Theta + N_Q) \times (N + M)$ matrix containing either 0 or 1 in each of its elements. Each row will have a single 1 on the index corresponding to the dimension of the observed link, with the remaining elements set to 0;
• $N_\Theta$ and $N_Q$ are the numbers of flow rate observation locations for flows entering/exiting or within the network, respectively; and
• the matrix T is an $h_c \times (M + N)$ pseudo-loop matrix containing {−1, 0, 1}, where $h_c$ is the number of potential difference constraints applied. The pseudo-loop matrix contains paths between nodes of known pressure or potential values. For convenience, these are referenced to the potential at a single reference node $H_0$; this gives $Y_T$ as the $h_c \times 1$ vector of mean potential differences between $H_0$ and $H_j$, for all nodes with potential observations. The entries $T_{i,r}$, ∀i ∈ {1, ..., $h_c$}, indicate membership of edge r within the potential difference constraint index i, given by 0 if the edge is not in the constraint, 1 if the assumed direction of $Q_m$ is defined as in the direction from node 0 to node j, and −1 otherwise.
The prior is chosen to represent one's belief of the system state before incorporating any measured data. Although any distribution which represents what is believed about the system state could be chosen, in this study, a multidimensional Gaussian distribution is selected, defined over the real domain:

$$q(X) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left(-\frac{1}{2}(X - m)^\top \Sigma^{-1} (X - m)\right)$$

where m is the n × 1 vector of prior mean flow rates and Σ is the n × n matrix of prior covariances. In Bayes' method, likelihood functions are used to incorporate the physics of the system as well as any observed data, as follows:
• The likelihood function to incorporate conservation of mass at each node or Kirchhoff's first law (or the flow rate for incompressible systems) is given by a delta function where This delta function is defined by the limit of a Gaussian distribution
• The likelihood function to incorporate the loop laws for each loop, Kirchhoff's second law, is given by a delta function where This delta function is defined by the limit of the Gaussian distribution
• Observed flow rates can be constrained using the likelihood function where $Y_F$ is an $(N_\Theta + N_Q) \times 1$ vector that has the flow rate of each observation for a link in its elements, $\Sigma_F$ is the $(N_\Theta + N_Q) \times (N_\Theta + N_Q)$ covariance matrix of the observations and
• Observed potential differences can be constrained with the likelihood function where $Y_T$ is an $h_c \times 1$ vector that has the potential difference of an observation between two points in each element, $\Sigma_T$ is the $h_c \times h_c$ covariance matrix of the observations and

Applying Bayes' rule with each of the likelihood functions, and expanding and dropping all terms which are not functions of X gives the posterior in the form (21) where Combining like factors gives Completing the square gives where the mean flow rates and variance matrix are given by Using the Woodbury matrix identity [37] to find the posterior covariance gives The following algebra is needed to find a form which does not require an inversion of a zero matrix arising from the delta functions. Right multiplying the inverse posterior covariance by $\Sigma O$ gives then left multiplying with the posterior covariance Extracting $\Sigma_p O S^{-1}$ by right multiplying by $(S + O \Sigma O)^{-1}$ gives The posterior mean flow rates can then be found from Equation (27) by substituting Equation (29) and Equation (32) to give
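The role of the Woodbury identity in the steps above can be illustrated with a generic linear-Gaussian sketch. It assumes a linear observation model y = OX + e with noise covariance S, which is a textbook simplification and not necessarily the exact operator used in this derivation; the transposes, function names and numerical values are assumptions of the example. The point is that the covariance form needs only the inverse of S + OΣOᵀ, so it remains valid when S tends to zero, as it does for the delta-function likelihoods.

```python
import numpy as np

def posterior_covariance_form(m, Sigma, O, y, S):
    """Posterior of X ~ N(m, Sigma) given y = O X + e, e ~ N(0, S).
    Uses only (S + O Sigma O^T)^-1, so S may be exactly zero."""
    gain = Sigma @ O.T @ np.linalg.inv(S + O @ Sigma @ O.T)
    return m + gain @ (y - O @ m), Sigma - gain @ O @ Sigma

def posterior_precision_form(m, Sigma, O, y, S):
    """Same posterior from completing the square; needs S^-1,
    so S must be nonsingular."""
    Sigma_inv = np.linalg.inv(Sigma)
    S_inv = np.linalg.inv(S)
    cov = np.linalg.inv(Sigma_inv + O.T @ S_inv @ O)
    return cov @ (Sigma_inv @ m + O.T @ S_inv @ y), cov

# Hypothetical three-parameter prior and two linear relations between flows.
m = np.array([1.0, 2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
O = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
y = np.zeros(2)

# With finite noise, both forms agree (Woodbury identity).
S = 0.1 * np.eye(2)
mean_a, cov_a = posterior_covariance_form(m, Sigma, O, y, S)
mean_b, cov_b = posterior_precision_form(m, Sigma, O, y, S)

# Delta-function limit: S = 0 is handled by the covariance form only.
mean_0, cov_0 = posterior_covariance_form(m, Sigma, O, y, np.zeros((2, 2)))
print(mean_0, O @ mean_0)
```

In the zero-noise limit, the posterior mean satisfies the linear relations exactly and the posterior covariance has no spread along the constrained directions.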

Formulation
The maximum entropy method is defined by the following algorithm [6,9]: (i) define a probability measure over the uncertainties of interest; (ii) construct a relative entropy function; (iii) define a prior probability function and constraints; (iv) maximise the entropy subject to the constraints and prior, to infer the probability distribution which describes the system; and, if desired, (v) extract statistical moments of quantities of interest. Soft MaxEnt constraints have previously been suggested by the authors [24,25] but have not been formally derived. To implement soft constraints, we define a pdf which expresses the uncertainty in the system defined over a reduced parameter set X, consisting of a basis set of n flow rates selected from Ψ and also the parameter observations $Y_F$ and $Y_T$. The indices of X in Ψ are again given by the set B. Again for closure, at least N − 1 basis flow rates must be chosen but up to N + M can be chosen. The joint probability is defined to be: where $\Upsilon_X$, $\Upsilon_{Y_F}$ and $\Upsilon_{Y_T}$ are the vectors of the random variables for X, $Y_F$ and $Y_T$, respectively. We also assume that each of the flow rate and potential difference constraints is applied as a soft constraint, but this does not restrict the method to soft constraints only, and strict constraints defined by expectations can still be applied. This choice of pdf gives the following relative entropy or negative Kullback-Leibler function [21], over the space of uncertainties used in this formulation: where $n_o = N_\Theta + N_Q + h_c$ is the number of data observations, $q(X, Y_F, Y_T)$ is the prior pdf, and $l_i$ and $u_i$ are the lower and upper bounds of the ith flow rate. The relative entropy is then maximised subject to the constraints on the system. The following constraints are always required:
• Normalisation of probability:
• Kirchhoff's first law, for the conservation of flow rates at each internal node, here imposed in the mean:
• Kirchhoff's second law, which requires the potential difference to vanish around each enclosed loop, again imposed in the mean:

We also allow for any of the following constraints:
• A set of specified inflow/outflow and internal flow rate constraints:
• Potential difference constraints between pairs of nodes:

After identifying the constraints, the entropy (35) is then maximised subject to Equations (36)-(38) and whichever of Equations (39) and (40) apply. Applying the calculus of variations, we form the Lagrangian: where κ (scalar) and α, β, λ and η (row vectors) are the Lagrange multipliers for the normalisation, Kirchhoff's first and second laws, flow rates and the head loss constraints, respectively. The variation of L is given by δL = 0. Extremising Equation (41) by taking the functional derivative with respect to p(X) and combining integrals gives: where κ′ = κ + 1. Rearrangement gives the following solution for p(X) (the Boltzmann distribution): This can be solved, in conjunction with the constraints (36)-(40), to give $p^*(X, Y_F, Y_T)$ and the Lagrange multipliers κ, α, β, λ and η.
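The variational step above follows a standard pattern, which can be written out in generic form. The following is a sketch for arbitrary constraint functions $f_k(X)$ with prescribed expectations $F_k$ and multipliers $\gamma_k$; these symbols, and the sign convention of the multipliers, are assumptions introduced for illustration and do not reproduce the numbered equations of this section.

```latex
% Generic relative entropy and constraints (illustrative notation)
H = -\int_{\Omega} p(X) \ln \frac{p(X)}{q(X)} \, \mathrm{d}X ,
\qquad
\int_{\Omega} p(X) \, \mathrm{d}X = 1 ,
\qquad
\int_{\Omega} p(X) f_k(X) \, \mathrm{d}X = F_k .

% Lagrangian with multipliers \kappa (normalisation) and \gamma_k (moments)
\mathcal{L} = H
  - \kappa \left( \int_{\Omega} p(X) \, \mathrm{d}X - 1 \right)
  - \sum_k \gamma_k \left( \int_{\Omega} p(X) f_k(X) \, \mathrm{d}X - F_k \right) .

% Setting the functional derivative to zero
\frac{\delta \mathcal{L}}{\delta p(X)}
  = -\ln \frac{p(X)}{q(X)} - 1 - \kappa - \sum_k \gamma_k f_k(X) = 0 .

% Boltzmann-type solution, with \kappa' = \kappa + 1
p^{*}(X) = q(X) \exp\left( -\kappa' - \sum_k \gamma_k f_k(X) \right) .
```

The constant 1 produced by differentiating $p \ln p$ is absorbed into the normalisation multiplier, which is why the shifted multiplier appears in the solution; the multipliers themselves are then fixed by substituting $p^*$ back into the constraints.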

Solution and Comparison to Bayesian Solution
In the MaxEnt method, we choose the prior pdf as the multidimensional Gaussian where m (n × 1 vector) and Σ (n × n matrix) are the mean and covariance of the prior flow rates within the entropy function, $m_F$ ($(N_\Theta + N_Q) \times 1$ vector) and $m_T$ ($h_c \times 1$ vector) are the values of the observations of the flow rate and potential differences, respectively, and where $\gamma = [\alpha \;\; \beta \;\; \lambda \;\; \eta]$. Combining terms of the same order and assuming the covariance is symmetric and positive definite and completing the square The above form allows the mean to be obtained as Using the constraint equations, the Lagrange multipliers can be found from Substituting Equation (49) into Equation (48) gives Extracting the posterior means then gives Applying the limit to $\Sigma_C$ and $\Sigma_W$ gives Equation (33). In consequence, the MaxEnt formulation with soft prior constraints (Section 3.1) and the Bayesian formulation (Section 2) give the same mean flow rate prediction (33), but with different covariance matrices.
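The statement that the two methods share a posterior mean but not a covariance can be checked numerically in a simplified setting. The sketch below assumes a Gaussian prior and hard linear constraints A E[X] = b, imposed in the mean for MaxEnt and as an exact (delta-function) observation for Bayes; it omits the soft observation variables of the formulation above, and the matrix A, vector b and function names are hypothetical.

```python
import numpy as np

def maxent_mean_constraint(m, Sigma, A, b):
    """MaxEnt posterior for prior N(m, Sigma) under the mean constraint
    A E[X] = b. The solution is the prior times exp(-gamma A X), which is
    Gaussian with a shifted mean and an unchanged covariance."""
    gamma = np.linalg.solve(A @ Sigma @ A.T, A @ m - b)  # Lagrange multipliers
    mean = m - Sigma @ A.T @ gamma
    return mean, Sigma.copy(), gamma

def bayes_exact_constraint(m, Sigma, A, b):
    """Bayesian posterior for prior N(m, Sigma) with a delta-function
    likelihood enforcing A X = b (zero-noise Gaussian conditioning)."""
    gain = Sigma @ A.T @ np.linalg.inv(A @ Sigma @ A.T)
    return m + gain @ (b - A @ m), Sigma - gain @ A @ Sigma

# Hypothetical prior over three flow rates and two linear constraints.
m = np.array([1.0, 2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.zeros(2)

mean_me, cov_me, gamma = maxent_mean_constraint(m, Sigma, A, b)
mean_bayes, cov_bayes = bayes_exact_constraint(m, Sigma, A, b)
print(mean_me, mean_bayes)
print(np.diag(cov_me), np.diag(cov_bayes))
```

In this simplified case, the interaction between variables enters the MaxEnt solution only through the multipliers, so the prior covariance is carried over unchanged, whereas the Bayesian update contracts the covariance along the constrained directions.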

Formulation
The MaxEnt method can also incorporate soft constraints using a probabilistic representation of the observed data. To implement this, we define a pdf that expresses the uncertainty in the system defined over a reduced parameter set X, again consisting of a basis set of n flow rates selected from Ψ. After identifying the constraints, the entropy (53) is then maximised subject to Equations (54)-(56) and whichever of Equations (57) and (58) apply. Applying the calculus of variations, we form the Lagrangian: where κ (scalar) and α, β, λ and η (row vectors) are the Lagrange multipliers for the normalisation, Kirchhoff's first and second laws, flow rates and the head loss constraints, respectively. The variation of L is given by δL = 0. Extremising Equation (59) by taking the functional derivative with respect to p(X) and combining integrals gives: where κ′ = κ + 1. Rearrangement gives the following solution for p(X) (the Boltzmann distribution): This can be solved, in conjunction with the constraints (36)-(40), to give $p^*(X)$ and the Lagrange multipliers κ, α, β, λ and η. As the purpose of the soft constraints is to incorporate a distribution as a constraint, we take λ = −1 and η = −1.

Solution and Comparison to Bayesian Solution
If the prior is chosen to be proportional to Equation ( 6 The posterior mean flow rates can now be expressed using Equation (70) by substituting Equations (72) and (75) to give Using the constraint equations, the Lagrange multipliers can be found from Substituting Equation (78) into Equation (77) gives the posterior means or As is evident, the first bracketed term is of similar structure to Equations (33) and (51), although it contains parameters related to the soft probabilistic constraints. The second term gives a second-order expansion of that solution relating to the interaction between the hard moment constraints and soft probabilistic constraints. If all constraints were applied as soft probabilistic constraints, Equation (33) would be obtained. Numerical experiments suggest that the means obtained in Equation (80) are equal to the Bayesian posterior means of Equation (33) in several examples considered, but the covariances are different.

Discussion
The MaxEnt and Bayesian methods rest on different theoretical foundations but are both able to predict flows on networks by updating the prior belief to the posterior with the inclusion of new information in the form of constraints or uncertain data. This study compares the application of Bayesian inference and the MaxEnt method for the analysis of flow networks, for the latter using soft constraints (included in the prior or imposed as probabilistic constraints) in addition to standard moment constraints. It is shown that both the Bayesian method and MaxEnt method with soft prior constraints, implemented using a multidimensional Gaussian prior pdf, infer the same mean flow rates but different covariance matrices. In the Bayesian method, the interactions between variables are applied through the likelihood function, using second or higher-order cross-terms within the posterior pdf. In contrast, the MaxEnt method incorporates interactions between variables using Lagrange multipliers, avoiding second-order correlation terms in the posterior covariance. The MaxEnt method with soft prior constraints therefore has a numerical advantage in its integrations, in that the covariance terms are avoided.
In contrast, the second MaxEnt method with probabilistic and moment constraints is shown to give a posterior mean of similar, but not identical, structure to the other two methods. Due to the mixture of constraint types, some of the interactions between variables are incorporated in the Lagrange multipliers and some are incorporated in the covariance matrix, leading to a more complicated formulation.
For both MaxEnt formulations given herein, the equivalence between the posterior means inferred by the Bayesian and MaxEnt methods is dependent on the choice of a multidimensional Gaussian prior and its parameterisation. Further research is required to classify the effect of other prior distributions on the MaxEnt and Bayesian formulations, and whether these lead to equivalences between the means or higher-order moments of the inferred posterior pdf.
), the MaxEnt probability distribution with normalisation, Kirchhoff's first and second law, potential difference and flow rate constraints is proportional to

Left multiplying with the posterior covariance then gives $\Sigma \tilde{O} = \Sigma_P \tilde{O} S^{-1} (S + \tilde{O} \Sigma \tilde{O})$ (74), and obtaining $\Sigma_P \tilde{O} S^{-1}$ by right multiplying by $(S + \tilde{O} \Sigma \tilde{O})^{-1}$ gives

are their respective covariances. The resulting MaxEnt pdf with normalisation, Kirchhoff's first and second law, potential difference and flow rate constraints is proportional to ln