A Bayesian Approach to the Balancing of Statistical Economic Data

This paper addresses the problem of balancing statistical economic data, when data structure is arbitrary and both uncertainty estimates and a ranking of data quality are available. Using a Bayesian approach, the prior configuration is described as a multivariate random vector and the balanced posterior is obtained by application of relative entropy minimization. The paper shows that conventional data balancing methods, such as generalized least squares, weighted least squares and biproportional methods are particular cases of the general method described here. As a consequence, it is possible to determine the underlying assumptions and range of application of each traditional method. In particular, the popular biproportional method is found to assume that all source data has the same relative uncertainty. Finally, this paper proposes a simple linear iterative method that generalizes the biproportional method to the data balancing problem with arbitrary data structure, uncertainty estimates and multiple data quality levels.


Introduction
In the compilation of statistical economic data, such as a census-based Input-Output (IO) table or a social-accounting matrix (SAM), it is often the case that the data is not balanced, i.e., row and column sums do not add up [1]. Furthermore, data balancing is important in practical applications such as updating or regionalizing IO tables, or decomposing proximate causes of economic change [2][3][4]. As more countries develop IO tables with greater regularity and regional SAMs are increasingly used for computable general equilibrium (CGE) modeling, the use of balancing techniques will undoubtedly rise as well.
As Lahr and de Mesnard [5] note, many alternative formulations exist that can perform table balancing. Empirical work demonstrating the merits and costs of the various approaches is not always convincing. Indeed, because no theory of optimal IO data processing exists, there is no way to determine a priori which particular technique will work best under particular circumstances.
In this paper I intend to develop a theory for balancing elements in input-output tables based on the theory of Bayesian inference of Jaynes [6]. This approach has appeal because it is based on first principles and does not rely on ad-hoc reasoning. Using it, I am therefore able to prove which numerical algorithm is best suited for a given set of uncertainty parameters in a set of IO accounts.
The present paper addresses the problem of IO data balancing under the following conditions:
• The constraints are not necessarily biproportional but can take arbitrary structure.
• There is some degree of uncertainty affiliated with the values of IO elements.
• The IO elements may come from different sources with differing degrees of data quality.
In the classical biproportional or RAS problem [5,7], the intermediate inputs in a matrix are adjusted while row and column sums are fixed. When arbitrary structure is considered, every element in the data set may be constrained to be the sum of a subset of all other elements in the data set.
Data uncertainty is an estimate of the empirical error associated with each numerical datum. In contrast to biproportional balancing methods [5] and their variants [8], in this paper every datum is characterized both by a best guess and by an uncertainty estimate. Some optimization methods [9], such as least-squares methods [10], allow the use of uncertainty information during balancing but provide no general rule to determine uncertainty when that information is initially absent.
Finally, data balancing problems frequently involve the combination of data from several sources with potentially different degrees of quality. For example, in the classical table update problem [5], there is an initial estimate from the previous year for interior points (low quality data) and row and column sums for the present year (high quality data). In practice, the data update problem combines data from multiple sources with differing degrees of trustworthiness [11][12][13]. The present paper deals with the general problem of combining data with differing degrees of quality (e.g., data from national statistical offices, from international organizations, survey data, etc.).
Currently, there is no data balancing method that addresses all of these issues, even though all of them arise in the compilation of multi-regional IO models. In this paper, this problem is solved using concepts and techniques of Bayesian inference [6].
Conventional methods address the balancing problem by imposing constraints on data interpreted as real numbers.In contrast, in a Bayesian framework, data are interpreted as random variables, and constraints are imposed on their first and second moments (best guess and uncertainty).Application of relative entropy minimization leads to an analytical solution.
Unfortunately, the analytical solution is impractical, so a series of numerical approximations is derived, whose validity depends on the amount of uncertainty information initially available.After this derivation conventional data balancing methods are reviewed and a one-to-one correspondence between the conventional methods and the numerical approximations is identified.
The existence of a one-to-one correspondence between Bayesian and conventional methods means that it is possible to identify the underlying assumptions of conventional methods.In particular, the popular RAS method assumes all data to have the same relative uncertainty.
Therefore, the Bayesian linear algorithm (recommended for most practical applications) turns out to be a generalization of the classical RAS method to the situation of arbitrary structure, uncertainty information and data quality hierarchy.
The paper proceeds as follows. Section 2 derives the general solution and numerical simplifications of the Bayesian data balancing method. Section 3 reviews conventional methods and compares them to the Bayesian methods. Section 4 concludes, and Appendix A reports auxiliary material.

Problem Formulation
This paper addresses the problem of balancing an IO table with arbitrary structure, uncertainty estimates and multiple data sources. These three properties are modeled as follows.
An arbitrary structure is formalized by considering that the IO data is arranged in a vector $t$ of length $n_T$ and is subject to $n_K$ accounting identities of the form:

$$\sum_{j=1}^{n_T} G_{ij}\, t_j = k_i, \qquad i = 1, \ldots, n_K \qquad (1)$$

where $k_i$ is a numerical constraint and each $G_{ij}$ can take values −1, 0 or 1. The accounting identities can be arranged in a constraint vector $k$ and a concordance matrix $G$, such that:

$$G t - k = 0 \qquad (2)$$

where $0$ is a vector of zeros and $t$ is the balanced posterior. The starting point for the balancing procedure is the unbalanced prior, $\theta$, for which:

$$G \theta - k \neq 0 \qquad (3)$$

In a nutshell, the data balancing problem with arbitrary structure is as follows: initially there is knowledge of $G$, $k$ and $\theta$, satisfying Equation (3), and the goal is to determine the $t$ which satisfies Equation (2) and some additional properties.
For the purpose of this paper it is considered that every entry of the data vector is positive.Appendix A.1 shows how to deal with negatives and zeros in the original IO table.
To illustrate the construction of the concordance matrix and constraint vector, consider the case of a 2 × 2 matrix $Z$ with known row and column sums, $Z e = x_R$ and $Z' e = x_C$, where $x_R$ and $x_C$ are the vectors of row and column sums, $e$ is a vector of ones and $'$ denotes transpose. Stacking the entries of $Z$ row by row, this problem is formulated with $n_T = 4$ numerical data and $n_K = 4$ accounting identities as:

$$G = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}, \quad t = \begin{pmatrix} Z_{11} \\ Z_{12} \\ Z_{21} \\ Z_{22} \end{pmatrix}, \quad k = \begin{pmatrix} x_{R,1} \\ x_{R,2} \\ x_{C,1} \\ x_{C,2} \end{pmatrix} \qquad (4)$$

The system described by Equation (4) is the conventional RAS problem.

The handling of uncertainty estimates requires the formalization of the stochastic properties of the IO data. Following Weise and Woger [14], who apply concepts of Bayesian inference [6] to the problem of measurement errors, in this paper each IO datum is considered to be subject to empirical measurement errors and is therefore described by a random variable.
Thus, the prior $\theta$ is characterized by a probability distribution $\pi(q)$, which expresses the degree of belief that the inaccurately known prior takes realization $q$. The prior best guess or expectation vector is $\mu$, the prior uncertainty or standard-deviation vector is $\sigma$, and the prior correlation matrix is $P$. The posterior $t$ is in turn characterized by a probability distribution $p(q)$; the posterior best guess vector is $m$, the posterior uncertainty vector is $s$ and the posterior correlation matrix is $R$. The prior and posterior covariance matrices are, respectively, $\Sigma = \hat{\sigma} P \hat{\sigma}$ and $S = \hat{s} R \hat{s}$, where a hat denotes a diagonal matrix.
This paper considers that the probability distribution, best guess, uncertainty and correlations of the prior are known. The best guess, $m_i$, and uncertainty, $s_i$, of a numerical datum are referred to as observables, to distinguish them from the corresponding parameters of the truncated Gaussian distribution, $\tilde{m}_i$ and $\tilde{s}_i$, that will appear in Section 2.2.
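As a minimal sketch (Python with NumPy is an assumption; the paper prescribes no implementation), the concordance matrix of Equation (4) can be built for any $n_R \times n_C$ matrix by stacking $Z$ row by row. The helper name `ras_concordance` is hypothetical.

```python
import numpy as np

def ras_concordance(n_rows, n_cols):
    """Hypothetical helper: concordance matrix G for the row- and column-sum
    identities of an n_rows x n_cols matrix Z, with t = Z.flatten() (row-major)."""
    G = np.zeros((n_rows + n_cols, n_rows * n_cols), dtype=int)
    for i in range(n_rows):
        for j in range(n_cols):
            pos = i * n_cols + j          # position of Z[i, j] in t
            G[i, pos] = 1                 # identity i: sum of row i
            G[n_rows + j, pos] = 1        # identity n_rows + j: sum of column j
    return G

Z = np.array([[1.0, 2.0], [3.0, 4.0]])
G = ras_concordance(2, 2)
k = np.concatenate([Z.sum(axis=1), Z.sum(axis=0)])   # (x_R, x_C)
residual = G @ Z.flatten() - k                       # zero: this Z is balanced by construction
```

The same helper covers rectangular tables; only the ordering convention of $t$ has to be kept consistent with $G$.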
Finally, the problem of combining different data sources is formalized with the concept of data quality.That is, this paper considers that besides quantitative uncertainty information the source data is also characterized by a qualitative ranking, h, which indicates how trustworthy that data point is relative to others.
The ranking of data quality is used to solve the problem of conflicting constraints.Essentially, the present paper suggests that data of lower quality should be balanced while keeping data of higher quality fixed as constraints.But if a balanced solution cannot be found, then the "constraints" become adjustable.
To illustrate this concept, consider the RAS problem described in Equation (4). In this case the entries of the $Z$ matrix are of lower quality than the row and column sums. Thus, the general problem, taking data quality into account, can be formulated with $n_T = 8$ numerical data and $n_K = 4$ accounting identities as:

$$G = \begin{pmatrix} 1 & 1 & 0 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 & -1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 & 0 & -1 \end{pmatrix}, \quad t = \begin{pmatrix} Z_{11} & Z_{12} & Z_{21} & Z_{22} & x_{R,1} & x_{R,2} & x_{C,1} & x_{C,2} \end{pmatrix}' \qquad (5)$$

and $k = 0$. The problem defined by Equation (4) has a single level of data quality (interior points are adjusted while row and column sums are fixed), whereas the problem defined by Equation (5) has two data quality levels, allowing row and column sums to be adjusted too.
For clarity of exposition, the remainder of this section is organized as follows. The data balancing problem with just two quality levels (numerical data and numerical constraints) is studied in Section 2.2. Data quality and the construction of numerical constraints are addressed in Section 2.3. Finally, Section 2.4 presents numerical approximations.
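The two-level formulation of Equation (5) can be sketched the same way: interior points enter each identity with $+1$ (disaggregate data), the corresponding sum enters with $-1$ (aggregate datum), and a quality vector $h$ ranks the sums above the interior points. Python/NumPy and the name `augmented_ras_concordance` are illustrative assumptions.

```python
import numpy as np

def augmented_ras_concordance(n_rows, n_cols):
    """Hypothetical helper: G and quality ranking h for t = (vec(Z), x_R, x_C), k = 0."""
    n_int, n_K = n_rows * n_cols, n_rows + n_cols
    G = np.zeros((n_K, n_int + n_K), dtype=int)
    for i in range(n_rows):
        for j in range(n_cols):
            G[i, i * n_cols + j] = 1            # Z[i, j] contributes to row sum i
            G[n_rows + j, i * n_cols + j] = 1   # ... and to column sum j
    G[:, n_int:] = -np.eye(n_K, dtype=int)      # one aggregate datum per identity
    h = np.array([1] * n_int + [2] * n_K)       # sums are ranked as more trustworthy
    return G, h

Z = np.array([[1.0, 2.0], [3.0, 4.0]])
G, h = augmented_ras_concordance(2, 2)
t = np.concatenate([Z.flatten(), Z.sum(axis=1), Z.sum(axis=0)])
residual = G @ t    # k = 0, so a balanced t gives a zero vector
```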

Analytical Solution
Bayesian inference was first developed by Laplace [15] and later expanded by others, such as Jeffreys [16] and Jaynes [6,17].According to the Bayesian paradigm, a probability is a degree of belief about the likelihood of an event, and should reflect all relevant available information about that event.
If more information about the event becomes available, then the prior probability must be updated to a posterior probability.
In the data balancing problem the goal is to update a probability distribution, under the guiding principle that the best inference is the one which takes into account all available information and no other.This principle is operationalized by searching for a posterior distribution that is as close as possible to the prior (in an information sense) and that satisfies the accounting identities, expressed in terms of moment constraints.
That is, if a discrete distribution is considered, the goal is to obtain a posterior, $p(q_j)$, when both a prior, $\pi(q_j)$, and moment constraints are known, by minimizing relative entropy [18,19]. The Lagrangean is:

$$\mathcal{L} = \sum_{j=1}^{n_L} p(q_j) \ln\frac{p(q_j)}{\pi(q_j)} - \sum_{i=1}^{n_M} \lambda_i \left( \sum_{j=1}^{n_L} p(q_j)\, q_j^{\,i} - M_i \right) \qquad (6)$$

The first term on the right-hand side of Equation (6) is the entropy of $p(q_j)$ relative to $\pi(q_j)$, and the second term is the set of moment constraints. $n_L$ is the number of discrete realizations, $n_M$ is the number of moment constraints and $M_i$ is the $i$-th moment (e.g., the first moment is the best guess, and the second moment, together with the first, determines the variance). The solution of relative entropy minimization takes the form:

$$p(q_j) = \frac{1}{Z}\, \pi(q_j) \exp\left( \sum_{i=1}^{n_M} \lambda_i\, q_j^{\,i} \right) \qquad (7)$$

where $Z$ is a normalization factor to convert relative probabilities into absolute ones. According to Robinson et al. [20] (p. 52), the solution of relative entropy minimization "is analogous to Bayes' Theorem, whereby the posterior distribution, $p(q_i)$, is equal to the product of the prior distribution, $\pi(q_i)$, and the likelihood function (probability of drawing the data given parameters being estimated), $\exp\left(\sum_{i=1}^{n_M} \lambda_i (q_j)^i\right)$, dividing by a normalization factor, $Z$."

As reviewed in Section 3.1, there is a class of conventional cross-entropy methods in which an IO datum is treated as a scalar, $t_i$, and so the constraints take the form of Equation (1). That formalization is radically different from the Bayesian interpretation followed here, in which a numerical datum is conceptualized as a random variable. To our knowledge, no data balancing method using the Bayesian interpretation of IO data has ever been proposed, although Golan et al. [21] offer a bridge between the two interpretations (datum as scalar and datum as random variable) through the concept of generalized cross entropy (see Section 3.1).
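For intuition about Equation (7), the following sketch solves the discrete problem with a single first-moment constraint ($n_M = 1$): the posterior is the prior tilted by $\exp(\lambda q_j)$ and renormalized, and $\lambda$ is found by bisection, which works because the posterior mean increases monotonically with $\lambda$. The function name, the bracketing interval and the example numbers are assumptions for illustration only.

```python
import numpy as np

def min_relative_entropy(q, prior, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Posterior p_j = prior_j * exp(lam * q_j) / Z whose mean equals target_mean
    (Eq. (7) with one moment constraint); lam is found by bisection."""
    q = np.asarray(q, dtype=float)
    prior = np.asarray(prior, dtype=float)

    def posterior(lam):
        z = lam * q
        w = prior * np.exp(z - z.max())   # subtract the max exponent for stability
        return w / w.sum()                # division by the normalization factor Z

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if posterior(mid) @ q < target_mean:
            lo = mid                      # mean too low: increase lam
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return posterior(lam), lam

q = np.arange(5.0)                 # realizations 0, 1, 2, 3, 4
prior = np.full(5, 0.2)            # uniform prior
p, lam = min_relative_entropy(q, prior, target_mean=3.0)
```

With a uniform prior, the ratio of consecutive posterior probabilities is exactly $e^{\lambda}$, which makes the exponential tilting of Equation (7) easy to see.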
According to the Bayesian paradigm, the best solution to the data balancing problem should take all available information into account. This information consists of the constraints on the first and second moments of the numerical data. Appendix A.2 shows that the constraints take the matrix form:

$$G m = \bar{m} \qquad (8)$$

$$\mathrm{diag}\left( G S G' \right) = \bar{s}^2 \qquad (9)$$

where $\bar{m}$ and $\bar{s}^2$ are the vectors of best guess and uncertainty (variance) constraints. The construction of these vectors is explained in Section 2.3. Appendix A.3 shows how the introduction of these constraints in the cross-entropy minimization problem leads to the solution:

$$\tilde{S}^{-1} = \tilde{\Sigma}^{-1} - 2\, G' \hat{\beta}\, G \qquad (10)$$

$$\tilde{m} = \tilde{S} \left( \tilde{\Sigma}^{-1} \tilde{\mu} + G' \alpha \right) \qquad (11)$$

where $\alpha$ and $\beta$ are the first and second moment Lagrange parameters. Taken together, Equations (8)-(11) define the analytical solution of the Bayesian data balancing method. Note, however, that Equations (10) and (11) contain symbols marked with a tilde (Gaussian parameters) while Equations (8) and (9) do not. When relative uncertainty, $\sigma_j/\mu_j$ or $s_j/m_j$, is low, the Gaussian parameter and the observable are approximately identical. When relative uncertainty is high, the best guess Gaussian parameter tends to $-\infty$ and the uncertainty Gaussian parameter tends to $+\infty$. There is no closed-form expression relating observables and Gaussian parameters in the multivariate case.
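The link between Gaussian parameters and observables does have a closed form in the univariate case, which helps to see the limits described above. The sketch below uses the standard formulas for the mean and standard deviation of a Gaussian truncated to the positive half-line; it covers a single datum only and is not the multivariate conversion the paper refers to. The function name is hypothetical.

```python
import math

def truncated_gaussian_observables(mu_t, sigma_t):
    """Mean and standard deviation (the 'observables') of a Gaussian with
    parameters (mu_t, sigma_t) truncated to the positive half-line."""
    a = -mu_t / sigma_t                                    # standardized truncation point
    phi = math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)  # standard normal pdf at a
    tail = 0.5 * math.erfc(a / math.sqrt(2))               # P(X > a) = 1 - Phi(a)
    mills = phi / tail                                     # inverse Mills ratio
    mean = mu_t + sigma_t * mills
    var = sigma_t**2 * (1 + a * mills - mills**2)
    return mean, math.sqrt(var)

for mu_t in (10.0, 0.0, -3.0):
    m, s = truncated_gaussian_observables(mu_t, 1.0)
    print(f"mu~={mu_t:5.1f}  m={m:.4f}  s={s:.4f}  s/m={s / m:.3f}")
```

For $\tilde{\mu} \gg \tilde{\sigma}$ the observables coincide with the parameters, while pushing the relative uncertainty $s/m$ towards one requires driving $\tilde{\mu}$ towards $-\infty$, in line with the behavior described above.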

Data Quality
This subsection introduces the concept of data quality, which determines the sequence in which the data balancing procedure is implemented and how numerical constraints are constructed.As described in Section 2.1, a key motivation for the present work is the possibility to incorporate information on the quality of source data directly in the balancing method.
Thus, consider that each numerical datum $i$ is characterized by an integer-valued number $h_i$ which indicates its quality. That is, if datum $i$ is more trustworthy than datum $j$, then $h_i > h_j$. Section 2.1 gives the example of a 2 × 2 RAS problem, in which the row and column sums were assumed to have higher quality than interior points. (The choice of integer values for the entries of $h$ is for convenience only; any ordinal ranking such as a, b, c, etc., would work as well.)

Issues of data quality inevitably arise in the compilation of IO tables from multiple data sources. If a practitioner wishes to construct a table combining official data from a national statistical office with survey data and data collected by third parties, it is likely that discrepancies between the different datasets will arise. When removing those discrepancies (the purpose of data balancing), it is natural that the method should allow the practitioner to use a qualitative measure of how trustworthy the different datasets are, relative to one another.
Data quality, $h_i$, should not be confused with the uncertainty estimate, $s_i$. The latter is a quantitative expression of how trustworthy the best guess, $m_i$, is. The former is a qualitative expression of how trustworthy $(m_i, s_i)$ are in relation to other source data $(m_j, s_j)$, where $j \neq i$.
The present paper proposes incorporating data quality in the balancing problem by keeping higher quality data fixed while lower quality data is balanced; only if a balanced solution cannot be found is higher quality data adjusted too.
Consider that among the $n_T$ numerical data there are $Q$ data quality levels, and that the numerical data are indexed by increasing level of data quality. That is, all points with indices $n_{L-1}+1, \ldots, n_L$ have data quality of level $L$, where $n_0 = 0$ and $n_Q = n_T$. The method searches for a balanced solution at quality level $L$ by holding fixed all data points $j > n_L$. The method starts with $L = 1$ and moves up until a solution is found. In the worst-case scenario, $L = Q$, all data are adjusted and a solution always exists.
That is, in the data balancing problem at level $L$, the vectors of numerical data and the columns of the concordance matrix are truncated from $n_T$ to $n_L$, and the posterior moment constraints (Equations (8) and (9)) become:

$$G(L)\, m(L) = \bar{m}(L), \qquad \mathrm{diag}\left( G(L)\, S(L)\, G(L)' \right) = \bar{s}^2(L) \qquad (12)$$

The numerical constraints $k(L)$, introduced in Section 2.1, are therefore an aggregation of higher quality data for the particular balancing problem of level $L$. The constraint best guess, $\bar{m}(L)$, and variance, $\bar{s}^2(L)$, vectors, introduced in Section 2.2, are defined as:

$$k_i(L) = -\sum_{j=n_L+1}^{n_T} G_{ij}\, \theta_j, \qquad \bar{m}_i(L) = -\sum_{j=n_L+1}^{n_T} G_{ij}\, \mu_j, \qquad \bar{s}_i^2(L) = \sum_{j=n_L+1}^{n_T} \sum_{l=n_L+1}^{n_T} G_{ij}\, \Sigma_{jl}\, G_{il} \qquad (13)$$

where $G(L)$ is the concordance matrix truncated to its first $n_L$ columns, and $\theta_j$, $\mu_j$ and $\Sigma_{jl}$ are the entries of the prior random vector, the prior best guess vector and the prior covariance matrix at quality level $L$. It follows that at the highest quality level, $Q$, the numerical constraints are zero, $k(Q) = 0$. The solution at the current quality level is incorporated in the prior of the next quality level: $\mu_j(L+1) = m_j(L)$, $\sigma_j(L+1) = s_j(L)$ and $\rho_{jk}(L+1) = r_{jk}(L)$ for $j, k = 1, \ldots, n_L$.
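A minimal sketch of how the level-$L$ constraints could be assembled, assuming (as described above) that data are indexed by increasing quality, that the general problem is written with $k = 0$, and that the level-$L$ constraint best guesses and variances are the mean and variance of the aggregated higher-quality data held fixed. The function name, the 0-based column split at `n_L` and the example numbers are illustrative choices.

```python
import numpy as np

def level_constraints(G, mu, Sigma, n_L):
    """Split G at column n_L: columns < n_L are balanced at level L, columns
    >= n_L (higher quality) are held fixed and aggregated into constraints."""
    G_lo, G_hi = G[:, :n_L], G[:, n_L:]
    m_bar = -G_hi @ mu[n_L:]                               # constraint best guesses
    s2_bar = np.diag(G_hi @ Sigma[n_L:, n_L:] @ G_hi.T)    # constraint variances
    return G_lo, m_bar, s2_bar

# Two-level 2 x 2 example: t = (Z11, Z12, Z21, Z22, xR1, xR2, xC1, xC2), k = 0
G = np.array([[1, 1, 0, 0, -1, 0, 0, 0],
              [0, 0, 1, 1, 0, -1, 0, 0],
              [1, 0, 1, 0, 0, 0, -1, 0],
              [0, 1, 0, 1, 0, 0, 0, -1]])
mu = np.array([1.0, 2.0, 3.0, 4.0, 3.5, 6.5, 4.5, 5.5])   # unbalanced prior best guesses
sigma = 0.1 * mu                                          # assumed 10% relative uncertainty
Sigma = np.diag(sigma**2)                                 # assumed zero prior correlations
G1, m_bar1, s2_bar1 = level_constraints(G, mu, Sigma, n_L=4)
```

At level 1 the fixed row and column sums reappear as the constraint vector, which recovers the single-level RAS problem of Equation (4).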
A word of caution is necessary. If the assignment of data quality is incorrect, the problem may become ill-posed. As a general rule, the user should always check whether the results are meaningful: the Bayesian data balancing method can only provide a good solution if good data is provided. This is not a handicap but an advantage of the method, because it warns the practitioner that the initial assignment of data quality is incorrect. This behavior agrees with the suggestion of Jaynes [6] that a Bayesian inference robot should apply rules uncritically, so that if an absurd outcome emerges, it is easy to identify the error in the problem formulation.

Numerical Approximations
The analytical solution of Section 2.2 requires the analytical conversion from the multivariate truncated Gaussian parameters to observables [22,23] and matrix inversions [24], operations which are far from trivial.
In this subsection a series of numerical approximations is reported, whose validity depends on how well source data uncertainty is characterized, which will in turn affect the value of correlations.
In practical applications, it frequently happens that an accounting identity (introduced in Section 2.1) contains only one entry $G_{ij} = -1$ and several entries $G_{ij} = 1$. In this paper the former is referred to as an aggregate datum and the latter as disaggregate data.
If there is a good characterization of all source data uncertainties, then the generalized least squares algorithm should be used.If there is a good characterization of disaggregate data but a poor one of aggregate data, then the weighted least squares algorithm or the linear algorithm should be considered.Finally, if there is a poor characterization of all uncertainties, the proportional algorithm should be preferred.
All of these algorithms are iterative and at each step the best guess displacement must be kept small and relative uncertainty constant.

The GLS Algorithm
The generalized least-squares (GLS) algorithm is obtained under two simplifying assumptions. The first and strongest assumption is to replace the truncated multivariate Gaussian with the non-truncated Gaussian, while still imposing that observable uncertainty is bounded by observable best guess, $0 < \sigma \le \mu$ and $0 < s \le m$ (Section 2.1). As shown in Section 3, the algorithms derived from this simplification turn out to be generalizations of the most widely used conventional data balancing methods. Thus, if in the future someone proves that the numerical algorithms proposed here are bad approximations of the analytical solution, that would imply that the data balancing practice of the past 50 years is also wrong. If the present paper is the catalyst of such a revolutionary discovery, that alone is a valid contribution to the literature.
The second assumption is to consider that best guesses, µ, are known more accurately than uncertainties and correlations, σ and P, and so uncertainties and correlations should be adjusted before best guesses.That way, if the best guesses are initially balanced, they will remain unchanged.
Thus, if second-order data is initially balanced, $S = \Sigma$, Equation (11) simplifies to:

$$m = \mu + \Sigma G' \alpha \qquad (14)$$

The combination of Equations (8) and (14) determines the best guess Lagrange multipliers, $\alpha$, as the solution of:

$$\left( G \Sigma G' \right) \alpha = \bar{m} - G \mu \qquad (15)$$

Equations (14) and (15) represent a generalized least-squares (GLS) solution, which is valid if a balanced set of covariances has been found.
This problem may lack a solution if some accounting identities are linearly dependent and the corresponding numerical constraints are inconsistent. This case can be addressed by finding the minimum-norm solution, i.e., the $\alpha$ which minimizes $\|\alpha\|_2$ among all minimizers of:

$$\left\| A \alpha - b \right\|_2 \qquad (16)$$

where $A = G \Sigma G'$ and $b = \bar{m} - G \mu$. A minimum-norm solution can be found using the Moore-Penrose inverse [25], among other possible numerical algorithms [26]. The Moore-Penrose inverse was used in the context of IO analysis by Pereira et al. [27].
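The GLS step of Equations (14) and (15), with the minimum-norm fallback, fits in a few lines of linear algebra. The sketch below (Python/NumPy, an assumption) uses `numpy.linalg.pinv`, the Moore-Penrose inverse, so that a minimum-norm $\alpha$ is returned even when identities are linearly dependent, as in the 2 × 2 RAS problem where row totals and column totals share the grand total. The function name and example values are hypothetical.

```python
import numpy as np

def gls_balance(G, mu, Sigma, m_bar):
    """One GLS step: alpha is the minimum-norm least-squares solution of
    (G Sigma G') alpha = m_bar - G mu, and m = mu + Sigma G' alpha."""
    A = G @ Sigma @ G.T
    b = m_bar - G @ mu
    alpha = np.linalg.pinv(A) @ b      # Moore-Penrose inverse -> minimum-norm alpha
    return mu + Sigma @ G.T @ alpha, alpha

G = np.array([[1, 1, 0, 0],      # row sums
              [0, 0, 1, 1],
              [1, 0, 1, 0],      # column sums
              [0, 1, 0, 1]], dtype=float)
mu = np.array([1.0, 2.0, 3.0, 4.0])          # prior best guesses (row-major Z)
Sigma = np.diag((0.1 * mu) ** 2)             # assumed uncorrelated, 10% uncertainty
m_bar = np.array([4.0, 6.0, 5.0, 5.0])       # consistent targets (both total 10)
m, alpha = gls_balance(G, mu, Sigma, m_bar)
```

Because the four identities here have rank three, $G\Sigma G'$ is singular and an ordinary inverse would fail; the pseudo-inverse picks the $\alpha$ with no component along the dependent direction $(1, 1, -1, -1)$.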
The determination of posterior covariances, which is mathematically more complex, is described in Appendix A.4.

The WLS Algorithm
The weighted least-squares (WLS) algorithm is valid when aggregate data is maximally uninformative, in which case all correlations between disaggregate data are approximately unitary, $r_{jk} \approx 1$, as shown in Appendix A.5.
Two additional assumptions are now considered: first, that each disaggregate numerical datum is affected by few accounting identities; second, that each accounting identity affects many disaggregate numerical data.That is, the row sums of matrix G are large integers, while column sums are small (but positive) integers.These auxiliary assumptions are likely to be met in practice.
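In Appendix A.7 it is shown that under these conditions the data balancing algorithm is given by:

$$m = \mu + \hat{\sigma} G' \alpha \qquad (17)$$

and the Lagrange multipliers are determined by:

$$\left( G \hat{\sigma} G' \right) \alpha = \bar{m} - G \mu \qquad (18)$$

This is a weighted least-squares (WLS) method in which the weights are prior uncertainties.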
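Under the reading that Equation (17) moves each datum in proportion to its prior standard deviation and Equation (18) is the corresponding normal system, one WLS step differs from the GLS step only in the weight matrix: the diagonal matrix of standard deviations replaces the covariance matrix. A minimal sketch under that assumption, with a hypothetical function name and example values:

```python
import numpy as np

def wls_balance(G, mu, sigma, m_bar):
    """One WLS step with standard deviations as weights:
    (G diag(sigma) G') alpha = m_bar - G mu,  m = mu + diag(sigma) G' alpha."""
    W = np.diag(sigma)
    alpha = np.linalg.pinv(G @ W @ G.T) @ (m_bar - G @ mu)   # minimum-norm alpha
    return mu + W @ G.T @ alpha

G = np.array([[1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
mu = np.array([1.0, 2.0, 3.0, 4.0])
sigma = 0.1 * mu                          # assumed 10% relative uncertainty
m_bar = np.array([4.0, 6.0, 5.0, 5.0])
m = wls_balance(G, mu, sigma, m_bar)
```

Each adjustment divided by its $\sigma_j$ lies in the row space of $G$, which is the fingerprint of Equation (17).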

The Proportional Algorithm
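A further simplification is the situation when all (aggregate and disaggregate) prior relative uncertainties are identical, $\sigma_j/\mu_j = \text{const}$. As shown in Appendix A.5, in this case all correlations are unitary, $\rho_{jk} = 1$. This simplification leads to the proportional algorithm.
If the same considerations about data structure of the WLS case still apply (many numerical data per accounting identity, few accounting identities per numerical datum), Equation (17) becomes, after absorbing the constant ratio $\sigma_j/\mu_j$ into $\alpha$:

$$m = \hat{\mu} \left( e + G' \alpha \right) \qquad (19)$$

Each row of the previous expression is:

$$m_j = \mu_j \left( 1 + \sum_{i=1}^{n_K} G_{ij}\, \alpha_i \right) \qquad (20)$$

Recall that the first-order Taylor approximation of $e^x$ for $x \to 0$ is $e^x \approx 1 + x$. If the update rule is applied recursively in small steps, the previous expression can be rewritten as:

$$m_j = \mu_j \prod_{i=1}^{n_K} \gamma_i^{G_{ij}} \qquad (21)$$

where $\gamma_i = e^{\alpha_i}$. The first-order constraint, Equation (8), can be expressed in scalar form as:

$$\sum_{j=1}^{n_T} G_{ij}\, m_j = \bar{m}_i$$

Combining the two previous expressions and imposing that the multipliers are adjusted one at a time, by balancing the respective first-order constraint, leads to:

$$\gamma_i \sum_{j=1}^{n_T} G^P_{ij}\, \mu_j - \frac{1}{\gamma_i} \sum_{j=1}^{n_T} G^N_{ij}\, \mu_j = \bar{m}_i \qquad (22)$$

where $G^P_{ij} = G_{ij}$ if $G_{ij} > 0$ and zero otherwise, and where $G^N_{ij} = -G_{ij}$ if $G_{ij} < 0$ and zero otherwise. The previous expression yields the solution:

$$\gamma_i = \frac{\bar{m}_i + \sqrt{\bar{m}_i^2 + 4 \left( \sum_{j} G^P_{ij}\, \mu_j \right) \left( \sum_{j} G^N_{ij}\, \mu_j \right)}}{2 \sum_{j} G^P_{ij}\, \mu_j} \qquad (23)$$

if $\sum_{j=1}^{n_T} G^P_{ij}\, \mu_j > 0$, and otherwise:

$$\gamma_i = -\frac{\sum_{j} G^N_{ij}\, \mu_j}{\bar{m}_i}$$

The algorithm consists in the application of Equation (23) to each accounting identity separately to determine the Lagrange parameters, and the update of the best guess estimates by the application of Equation (21). This is a generalization of the popular RAS method for arbitrary structure.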
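The proportional algorithm can be sketched as a loop over accounting identities: for each identity, $\gamma_i$ is the positive factor that balances it (Equation (23), with the fallback for identities that contain no positive entries), and the best guesses are rescaled as in Equation (21). The sketch assumes entries of $G$ in $\{-1, 0, 1\}$, positive data and consistent targets; the function name, sweep count and tolerance are hypothetical.

```python
import numpy as np

def proportional_balance(G, mu, m_bar, n_sweeps=1000, tol=1e-10):
    """Generalized RAS sketch: visit identities one at a time, solve for the
    factor gamma_i that balances identity i, then rescale m_j by gamma_i**G_ij."""
    G = np.asarray(G)
    m = np.asarray(mu, dtype=float).copy()
    m_bar = np.asarray(m_bar, dtype=float)
    for _ in range(n_sweeps):
        for i in range(G.shape[0]):
            pos, neg = G[i] > 0, G[i] < 0
            a = m[pos].sum()            # sum_j G^P_ij m_j (disaggregate data)
            b = m[neg].sum()            # sum_j G^N_ij m_j (aggregate data)
            if a > 0:                   # positive root of a*g**2 - m_bar_i*g - b = 0
                gamma = (m_bar[i] + np.sqrt(m_bar[i] ** 2 + 4 * a * b)) / (2 * a)
            else:                       # only aggregate data: -b / gamma = m_bar_i
                gamma = -b / m_bar[i]
            if not np.isfinite(gamma) or gamma <= 0:
                raise ValueError(f"identity {i} cannot be balanced with positive data")
            m[pos] *= gamma             # G_ij = +1 entries scale by gamma
            m[neg] /= gamma             # G_ij = -1 entries scale by 1 / gamma
        if np.max(np.abs(G @ m - m_bar)) < tol:
            break
    return m

G = np.array([[1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 1, 0], [0, 1, 0, 1]])
m = proportional_balance(G, [1.0, 2.0, 3.0, 4.0], [4.0, 6.0, 5.0, 5.0])
```

On the 2 × 2 row/column problem this reduces to classical RAS and preserves the cross-product ratio $m_{11} m_{22} / (m_{12} m_{21})$; with an aggregate entry such as $z_1 + z_2 - x = 0$, the interior points are scaled up and the aggregate scaled down by the same factor.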
The derivation of the proportional algorithm started by considering that all priors are maximally uninformative. However, the critical assumption is that all prior relative uncertainties are identical, $\sigma_j/\mu_j = \text{const}$, which implies that all correlations are unitary.

The Linear Algorithm
In the two least-squares methods derived above it is necessary to solve linear systems, by calculating a matrix inverse, a pseudo-inverse or using some implicit method [26].These are operations of greater complexity than the simple iterative rule of the proportional method.The linear algorithm derived now is a variation of the WLS algorithm which does not require solving a linear system but that can take into account uncertainty information, which the proportional algorithm does not.
Consider that each accounting identity, $g(i)$, is used to determine the corresponding Lagrange multiplier, $\alpha_i$, in isolation. Thus, Equation (17) becomes:

$$m_j = \mu_j + \sigma_j\, G_{ij}\, \alpha_i$$

and direct substitution in Equation (8) leads to the solution:

$$\alpha_i = \frac{\bar{m}_i - \sum_{j=1}^{n_T} G_{ij}\, \mu_j}{\sum_{j=1}^{n_T} G_{ij}^2\, \sigma_j}$$

The adjustment induced by the previous expression is linear (as opposed to the multiplicative adjustment of the proportional algorithm) and can be applied simultaneously to all pairs of accounting identities and Lagrange multipliers. The linear algorithm consists in the application of Equation (17) and:

$$\alpha = \left( \bar{m} - G \mu \right) \div \left( |G|\, \sigma \right)$$

where $\div$ is Hadamard (or entry-wise) division and $|G|$ denotes the entry-wise absolute value of $G$.
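A sketch of the linear algorithm follows. Each identity computes its $\alpha_i$ in isolation and all updates are applied at once. Because simultaneous updates can overshoot when a datum belongs to several identities, the sketch scales each step by a damping factor `omega` (a hypothetical parameter, in the spirit of keeping displacements small), set by default to one over the largest number of identities sharing a datum, the usual choice for this Cimmino-type averaging of projections. For simplicity, `sigma` is held fixed rather than rescaled to keep relative uncertainty constant; the function name and numbers are illustrative.

```python
import numpy as np

def linear_balance(G, mu, sigma, m_bar, omega=None, n_iter=10_000, tol=1e-10):
    """Linear algorithm sketch: alpha = (m_bar - G m) / (|G| sigma) entry-wise,
    then m <- m + omega * sigma * (G' alpha), repeated until balanced."""
    G = np.asarray(G, dtype=float)
    m = np.asarray(mu, dtype=float).copy()
    sigma = np.asarray(sigma, dtype=float)
    m_bar = np.asarray(m_bar, dtype=float)
    denom = np.abs(G) @ sigma                      # sum_j G_ij**2 sigma_j
    if omega is None:                              # 1 / (max identities per datum)
        omega = 1.0 / np.max(np.count_nonzero(G, axis=0))
    for _ in range(n_iter):
        residual = m_bar - G @ m
        if np.max(np.abs(residual)) < tol:
            break
        alpha = residual / denom                   # Hadamard (entry-wise) division
        m += omega * sigma * (G.T @ alpha)         # linear, uncertainty-weighted update
    return m

G = np.array([[1, 1, 0, 0], [0, 0, 1, 1],
              [1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
mu = np.array([1.0, 2.0, 3.0, 4.0])
sigma = 0.1 * mu                                   # assumed 10% relative uncertainty
m_bar = np.array([4.0, 6.0, 5.0, 5.0])
m = linear_balance(G, mu, sigma, m_bar)
```

With `sigma` fixed, every iterate stays of the form $\mu + \hat{\sigma} G' \alpha$, so the fixed point coincides with the WLS solution of Equations (17) and (18) while only ever dividing by scalars.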

Proportional and Cross-Entropy Methods
Data balancing occurs in IO analysis under different circumstances, of which the most thoroughly explored is the problem of table update, when row and column sums, $\bar{m}^R_i$ and $\bar{m}^C_j$, are known for the current year, and interior points, $\mu^*_{ij}$, are known from a previous year [13]. In this problem, the goal is to update each interior point to $m^*_{ij}$, such that $M^* e = \bar{m}^R$ and $M^{*\prime} e = \bar{m}^C$. In the previous expressions $e$ is a vector of ones, $'$ denotes transpose, and the superscript $*$ was added to distinguish the original data in dense format from the data in sparse format introduced in the following paragraphs.
The most popular strategy to address this problem is a biproportional method in which the original matrix is iteratively multiplied by left and right diagonal perturbation matrices,

$$M^* = \hat{\gamma}^R\, \mu^*\, \hat{\gamma}^C$$

(where the $\gamma$'s are multiplicative adjustment factors), until the row and column sums are satisfied. The first such technique to be used in IO analysis was the RAS method [28,29], which spawned a vast offspring, whose genealogy is reviewed in Lahr and de Mesnard [5] and whose mathematical properties are characterized in de Mesnard [30].
Although this method is referred to as being specifically biproportional, it can be recast in a more general framework in which the numerical data is affected by any number of constraints, not necessarily aligned in rows and columns. That is,

$$m_j = \mu_j \prod_{i=1}^{n_K} \gamma_i^{G_{ij}}, \qquad G m = \bar{m}$$

where $\mu$ and $m$ are now the prior and posterior vectors, $\bar{m}$ is a vector of numerical constraints and, in the case of a 2 × 2 matrix:

$$G = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}, \quad m = \begin{pmatrix} m^*_{11} \\ m^*_{12} \\ m^*_{21} \\ m^*_{22} \end{pmatrix}, \quad \bar{m} = \begin{pmatrix} \bar{m}^R_1 \\ \bar{m}^R_2 \\ \bar{m}^C_1 \\ \bar{m}^C_2 \end{pmatrix}$$

(Notice that this is the same problem defined by Equation (4), but now expressed in terms of observable best guesses.)

Cross-entropy methods, such as Snickars and Weibull [31] (see also Golan and Vogel [32], Robinson et al. [20] or Fernandez-Vasquez [33]), address the table update problem using a constrained optimization framework, in which the objective function is cross entropy [18]. That is, a Lagrangean is defined as some variation of:

$$\mathcal{L} = \sum_{j=1}^{n_T} m_j \ln\frac{m_j}{\mu_j} - \sum_{i=1}^{n_K} \lambda_i \left( \sum_{j=1}^{n_T} G_{ij}\, m_j - \bar{m}_i \right)$$

where the first term on the right-hand side is the relative (or cross) entropy to be minimized, followed by the set of constraints. Cross-entropy minimization provides a posterior probability distribution that is closest to the prior in an information sense and is also consistent with the constraints, as discussed in Section 2.2. Therefore, the application of this technique in this context implies that IO quantities, either economic transactions or technical coefficients, are being treated as probabilities, and that the IO table as a whole (either transaction or technical) is viewed as a probability distribution. This interpretation should be contrasted with the Bayesian approach of Section 2.2, in which a numerical datum is represented by a random variable instead of a real number.
Minimization of the Lagrangean with respect to the posteriors yields a solution of the form:

$$m_j = \frac{\mu_j}{Z} \exp\left( \sum_{i=1}^{n_K} \lambda_i\, G_{ij} \right)$$

where $Z$ is a normalization constant. Substitution of $\gamma_i = \exp(\lambda_i)/Z^{1/n_K}$ leads to the finding, in agreement with Bacharach [34], that the solution of such a problem is none other than the simple RAS method described above. Thus, cross-entropy methods provide a theoretical interpretation of proportional methods in a transaction-as-probability sense. In fact, to describe a method as being proportional or cross-entropy is to view the same object from two different angles: "proportionality" describes the implementation algorithm while "cross entropy" describes the objective function.
An interesting variation of cross entropy is the concept of generalized cross entropy of Golan et al. [21]. They address the classical problem formulated in the first paragraph of this subsection, but expressed in terms of technical coefficients instead of transaction values. For consistency with the remainder of the present exposition, their problem is now reformulated as determining the interior points of a matrix, $m^*_{ij}$, subject to fixed row and column constraints, $\bar{m}^R_i$ and $\bar{m}^C_j$, given priors $\mu^*_{ij}$. They introduce a support of $M$ discrete points, $0 \le q_{ij1} < \cdots < q_{ijM} \le \bar{m}^C_j$, and a probability associated with each point, $p(q_{ijk})$, so that (according to Equation (24) in [21]) the datum is actually an expectation:

$$m^*_{ij} = \sum_{k=1}^{M} p(q_{ijk})\, q_{ijk}$$

The optimization problem they consider (Equations (20)-(23) in [21]) is:

$$\min_{p} \sum_{i,j} \sum_{k=1}^{M} p(q_{ijk}) \ln\frac{p(q_{ijk})}{\pi(q_{ijk})} \quad \text{s.t.} \quad \sum_{k=1}^{M} p(q_{ijk}) = 1, \quad \sum_{j} m^*_{ij} = \bar{m}^R_i, \quad \sum_{i} m^*_{ij} = \bar{m}^C_j$$

But this is none other than a multivariate version of Equation (6) where the zero and first order constraints are known. Thus, in spite of some technical inaccuracies (e.g., Equation (25) in [21] defines an object which is the variance of an expectation), the generalized cross entropy of Golan et al. [21] is a forerunner of the Bayesian theory of IO uncertainty developed here.
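For comparison with the sparse formulation, the classical dense RAS update can be sketched as alternating row and column rescalings that accumulate the diagonal factors ($r$ and $s$ below play the roles of $\gamma^R$ and $\gamma^C$). Python/NumPy, the function name and the stopping rule are illustrative assumptions.

```python
import numpy as np

def ras(Z0, row_totals, col_totals, n_iter=1000, tol=1e-10):
    """Classical biproportional (RAS) update: Z = diag(r) @ Z0 @ diag(s), with
    r and s rescaled alternately until row and column sums match the targets."""
    Z0 = np.asarray(Z0, dtype=float)
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    r, s = np.ones(Z0.shape[0]), np.ones(Z0.shape[1])
    for _ in range(n_iter):
        r = row_totals / (Z0 @ s)        # row step: row sums match for the current s
        s = col_totals / (r @ Z0)        # column step: column sums match exactly
        Z = r[:, None] * Z0 * s[None, :]
        if np.max(np.abs(Z.sum(axis=1) - row_totals)) < tol:
            break                        # rows still match after the column step
    return Z, r, s

Z0 = np.array([[1.0, 2.0], [3.0, 4.0]])
Z, r, s = ras(Z0, [4.0, 6.0], [5.0, 5.0])
```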

Least-Squares Methods
There are balancing methods that use constrained optimization with other objective functions besides cross entropy [9], of which the most popular are least-squares methods [35][36][37][38][39]. In these studies the Lagrangean is some variation of:

$$\mathcal{L} = \frac{1}{2} \left( m - \mu \right)' \Sigma^{-1} \left( m - \mu \right) - \alpha' \left( G m - \bar{m} \right)$$

Now the numerical datum is no longer characterized only by a best guess (or expected value), $\mu_j$ or $m_j$, but also by an uncertainty (or standard deviation), $\sigma_j$ or $s_j$. The corresponding covariance matrices are defined as $\Sigma = \hat{\sigma} P \hat{\sigma}$ and $S = \hat{s} R \hat{s}$, where $P$ and $R$ are the prior and posterior correlation matrices, and a hat denotes a diagonal matrix.
In these studies, prior uncertainty is defined as σ_j = a_j µ_j, the product of the prior best guess and a reliability index, a_j, which expresses the expert's subjective degree of belief in the accuracy of the data. Some of these studies [38] assume zero prior correlations, leading to a solution of the form: Other studies obtain covariances from considerations of time autocorrelation [36,39], leading to a solution:

In a variation on this theme, Rampa [10] notes that a second-order Taylor expansion of the cross-entropy objective function is a weighted least-squares objective function with the prior best guess as the weight: With this insight, he proposes a subjective weighted least-squares (SWLS) method where, as before, σ_j = a_j µ_j and a_j is a reliability index, but the weights in the objective function are now standard deviations rather than covariances, leading to a solution of the form:

This proposal should be contrasted with the KRAS method [8], which addresses the issue of conflicting constraints, i.e., the table-update problem in which the goal is to find M* such that M*1 = µ̄_R and M*ᵀ1 = µ̄_C, but the constraints are themselves inconsistent, 1ᵀµ̄_R ≠ 1ᵀµ̄_C. The method assumes that, for each constraint, a prior best guess, µ̄_i, and an uncertainty, σ̄_i, are available. It alternates a proportional adjustment of interior points (conventional RAS) with an adjustment of the constraints of the form: The previous expressions are a particular case of the SWLS method with a single accounting identity, gm = 0, which, in the case of a 2 × 2 matrix, becomes: So the KRAS method, implementation details aside, is actually a hybrid between a cross-entropy optimization problem (for interior points) and a weighted least-squares optimization problem (for constraints). For clarity, this example considered row and column sums as constraints, but the KRAS method allows for arbitrary structure (i.e., a numerical constraint can be linked to any subset of disaggregate data). However, the KRAS method always requires the data to be classified into two quality levels: data in the first quality level are adjusted using the proportional algorithm, and data in the second quality level are adjusted using the linear algorithm (which is identical to the SWLS method in the case of a single accounting identity).
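For reference, the conventional RAS (biproportional) step that KRAS alternates with its constraint adjustment can be sketched as follows. This is a minimal illustration of classic RAS for a strictly positive matrix with consistent targets (equal grand totals); it is not the KRAS method itself, which additionally adjusts inconsistent constraints, and the helper name `ras` and its tolerance are assumptions.

```python
import numpy as np


def ras(Z0, row_targets, col_targets, tol=1e-10, max_iter=10_000):
    """Classic biproportional (RAS) scaling of a positive matrix.

    Alternately rescales rows and columns; column sums match after every
    column step, so convergence is judged on the row sums. Assumes
    strictly positive row/column sums and equal grand totals.
    """
    Z = np.array(Z0, dtype=float)
    u = np.asarray(row_targets, dtype=float)
    v = np.asarray(col_targets, dtype=float)
    for _ in range(max_iter):
        Z *= (u / Z.sum(axis=1))[:, np.newaxis]   # R step: fit row sums
        Z *= (v / Z.sum(axis=0))[np.newaxis, :]   # S step: fit column sums
        if np.max(np.abs(Z.sum(axis=1) - u)) < tol:
            return Z
    raise RuntimeError("RAS did not converge; check that the targets are consistent")


Z = ras([[1.0, 2.0], [3.0, 4.0]], row_targets=[4.0, 6.0], col_targets=[5.0, 5.0])
print(Z)
```

Because every step multiplies whole rows or columns by positive factors, signs are preserved and the cross-product ratio Z_11Z_22/(Z_12Z_21) of the prior is kept unchanged.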
The hybrid character shares some affinities with the work of Lieu et al. [40] and Lieu and Hicks [41], who combine entropy maximization and least-squares minimization to address conflicting constraints.

Discussion
The conventional methods reviewed in Sections 3.1 and 3.2 are of limited use for the general data balancing problem outlined in Section 2.1 because of several problems, all of which are solved under the Bayesian approach.
Objective function: Conventional data balancing methods can be formulated as constrained optimization problems with different objective functions: cross entropy, generalized least squares, least squares weighted with variances, or least squares weighted with standard deviations. However, they offer no obvious rule for deciding which method should be applied in a particular circumstance.
Fortunately, the conventional methods reviewed here are very similar to the algorithms derived in Section 2.4, which means that it is possible to identify when each conventional method is valid. The GLS method should be used when uncertainty estimates are available for some disaggregate data and for all aggregate data. The WLS method should be used when uncertainty estimates are available only for some disaggregate data. The proportional method should be used when no uncertainty data are available. The Bayesian algorithms are all iterative, while conventional least-squares methods take place in a single step, which means that the latter are only valid if the initial inconsistency is small.
Uncertainty estimates: All data balancing methods can use information on best guesses, but they differ in their ability to incorporate information on uncertainty. In cross-entropy/biproportional methods there is no obvious way to introduce such information (although it can be done, e.g., [42]), while in least-squares methods it is mandatory to specify both the standard deviations and the correlations of the prior, using subjective reliability indices that require expert knowledge of the data.
These problems do not occur in the methods proposed here, which adhere strictly to the rule that only available information should be used, meaning that if some uncertainty or correlation is missing the worst-case scenario must be assumed: the uncertainty equals the best guess and the correlation is unitary.This strategy provides objective rules that can be used even in the absence of expert knowledge.
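A minimal sketch of the worst-case rule, assuming missing values are encoded as NaN (our convention, not the paper's): a missing standard deviation is replaced by the absolute best guess, a missing correlation by 1, and the prior covariance is then assembled as Σ = σ̂Pσ̂. The helper name is hypothetical, and the absolute value is our choice so that negative balancing items still receive a positive uncertainty.

```python
import numpy as np


def worst_case_covariance(mu, sigma=None, corr=None):
    """Prior covariance Sigma = diag(s) P diag(s) under the worst-case rule.

    NaN entries (or None for a whole argument) mark missing information:
    missing sigma_j -> |mu_j|, missing correlation -> 1.
    """
    mu = np.asarray(mu, dtype=float)
    n = mu.size
    s = np.full(n, np.nan) if sigma is None else np.array(sigma, dtype=float)
    s = np.where(np.isnan(s), np.abs(mu), s)           # worst-case uncertainty
    P = np.full((n, n), np.nan) if corr is None else np.array(corr, dtype=float)
    P = np.where(np.isnan(P), 1.0, P)                   # worst-case correlation
    np.fill_diagonal(P, 1.0)
    return np.outer(s, s) * P                           # same as diag(s) @ P @ diag(s)


Sigma = worst_case_covariance([10.0, 20.0], sigma=[1.0, np.nan])
print(Sigma)
```

When every correlation is filled in as 1, the resulting covariance matrix is singular; Appendix A.6 discusses why this does not break the method.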
Data quality: Another important characteristic of conventional methods is that either part of the numerical data is adjusted while the rest is held fixed (cross-entropy and proportional methods), or all the data are adjusted at the same time (least-squares methods).
In a Bayesian context, it is possible to introduce a ranking of data quality in the sequence in which the balancing procedure is implemented, as described in Section 2.3. In practice, a qualitative ordering of numerical data by level of trustworthiness is often more accessible than quantitative uncertainty estimates. Unlike conventional methods, the Bayesian approach to data balancing allows the user to make direct use of this knowledge.
There are two additional problems that some, but not all, conventional methods exhibit, and which are absent from the algorithms derived here.
Arbitrary structure: Proportional methods assume that the data is organized in a matrix format [43] while cross-entropy and least-squares methods allow for an arbitrary structure.
Sign preservation: Cross-entropy and proportional methods always ensure sign preservation, while least-squares methods do not (i.e., an initially positive datum may become negative). In practice, all transactions in a table should be positive, and balancing items such as fixed capital formation, variations in stocks or net taxes can take both signs. Appendix A.1 shows how to allow balancing items to shift sign while ensuring the sign preservation of transactions.
In summary, this work has achieved a major theoretical unification in the problem of data balancing by being able to state the conditions in which conventional methods are valid and by providing simple rules to determine missing second-order data. The range of source information that can be used in the data balancing problem was expanded (data quality), while interesting features that some conventional methods possess have been kept (arbitrary structure and sign preservation).

Empirical Considerations
Section 2.4 proposed a series of data balancing algorithms, the choice among which depends on the available information on source uncertainties and correlations. This subsection presents a brief survey of the empirical literature and discusses its implications for the choice of algorithm.
The Bayesian algorithm that requires the most detailed source data is the GLS method, so the review starts with the least-squares literature, in which relative uncertainties are referred to as reliability indices. Weale [38] considers three reliability indices: 1.5%, 6.5% and 15%; Byron et al. [44] consider four: 3%, 13%, 30% and 50%; Chen [45] considers three: 10%, 20% and 30%. Using a two-step procedure that first determines a qualitative reliability indicator and then a coefficient of variation, Rassier et al. [46] consider relative uncertainties ranging from 0% to 100%. Rampa [10] considers five: 1, 1.5, 2, 3 and 4 times the smallest value (in least-squares methods only the relative values of the reliability indices matter, not their absolute values). Some studies [38,39] go as far as estimating prior covariances from time autocorrelations, but covariances are routinely assumed to be zero.
From this very brief survey it is apparent that in least-squares methods the reliability indices do not express factual quantitative knowledge but only a broad sense of qualitative ordering of different types of data (row/column sums, domestic transactions, added value, imports, etc.). In the Bayesian framework this qualitative ordering should be used directly in the form of data quality. The assignment of subjective reliability indices in the absence of numerical empirical support is not only unnecessary but also contrary to the Bayesian philosophy, according to which only available information should be used in the data balancing problem.
Uncertainty estimates are not routinely provided by statistical offices, but such studies are occasionally produced, and their results are broadly consistent: the relative uncertainty, σ_j/µ_j, of IO transactions decreases monotonically with the best guess, µ_j, in the broad range of 40% to 10%, and row/column sums are known with proportionately better accuracy than interior points [47], with relative uncertainties decreasing down to 3%. These broad trends have been confirmed by studies in different countries, such as Bullard and Sebald [48] for the USA, Lenzen [49] for Australia, Nhambiú [50] for Portugal, and Lenzen et al. [51] for the UK. Yamakawa and Peters [52] use time-series inconsistencies to calculate source data uncertainty, and Díaz and Morillas [53] use fuzzy logic [54,55] and firm-level data to estimate the uncertainty of technical coefficients.
The detailed studies of source uncertainty mentioned in the preceding paragraph are very labour-intensive, so in a study that involves gathering empirical data of this type it probably makes sense to use the GLS data balancing method of Section 2.4. However, if uncertainty estimates are obtained from a literature survey, it may be better to use the WLS algorithm (or its linear variant) and to assign a higher quality level to aggregate data. The computational effort of the full GLS method is several orders of magnitude higher and, in the absence of a high degree of confidence in the quality of source data, that additional effort is unjustified.
In conclusion, this paper suggests the following: if no quantitative uncertainty estimates are available, use only knowledge of the ranking of data quality. Unless there is substantial confidence in the uncertainty estimates of aggregate data, use a simplification instead of the full GLS method. The linear algorithm is the most flexible and easiest to implement, and its theoretical shortcomings are probably of no consequence for most empirical applications.

Conclusions
This paper studies the problem of IO data balancing from the standpoint of Bayesian inference. The basic idea that motivates the present work is very simple, although its implications are far from trivial. A numerical datum known with some degree of uncertainty is treated as a random variable, t_j, whose probability density function, p_j(q), quantifies the degree of belief that the datum takes realization q_j. The numerical datum is characterized empirically by a best guess, m_j, and an uncertainty, s_j, which are interpreted as the expectation and standard deviation of the random variable t_j.
The set of posteriors, t, must satisfy a set of accounting identities, summarized as Gt = 0, while the set of priors, θ, that is initially available is such that Gθ ≠ 0. Application of cross-entropy minimization, subject to first- and second-moment constraints, leads to the analytical solution of the data balancing problem.
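To make the accounting identities concrete, the sketch below builds a concordance matrix G for the common case of an n × n IO matrix with row and column totals, assuming the data vector is ordered as interior entries (row-major), then row totals, then column totals. The ordering and the helper name are our assumptions; the method itself allows arbitrary structure, which only requires a different G.

```python
import numpy as np


def io_concordance(n):
    """Concordance matrix G for an n x n table with row and column totals.

    Assumed data layout: Z_11..Z_nn (row-major), then row totals
    x^R_1..x^R_n, then column totals x^C_1..x^C_n. Row i of G encodes
    sum_j Z_ij - x^R_i = 0; row n + j encodes sum_i Z_ij - x^C_j = 0.
    """
    N = n * n
    G = np.zeros((2 * n, N + 2 * n))
    for i in range(n):
        for j in range(n):
            k = i * n + j              # position of Z_ij in the data vector
            G[i, k] = 1.0              # enters row identity i
            G[n + j, k] = 1.0          # enters column identity j
        G[i, N + i] = -1.0             # row total x^R_i
        G[n + i, N + n + i] = -1.0     # column total x^C_i
    return G


G = io_concordance(2)
theta = np.array([1.0, 2.0, 3.0, 4.0,   # interior Z
                  3.0, 8.0,             # row totals (the second is off by 1)
                  4.0, 6.0])            # column totals
print(G @ theta)                        # nonzero entries flag unbalanced identities
```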
Several numerical algorithms are derived, whose scope of application depends on the availability of uncertainty estimates. If no uncertainty information is available, the algorithm is a natural generalization of the familiar RAS method. If some uncertainty estimates are available but there is no guarantee that the uncertainty of aggregate data was obtained independently from that of disaggregate data, then the algorithm is an uncertainty-weighted least-squares method. If the uncertainty of both aggregate and disaggregate data was obtained independently, then the algorithm is an alternate generalized least-squares method for first- and second-moment parameters. All algorithms are iterative and valid for arbitrary structure.
This paper presents a review of conventional data balancing algorithms and establishes a one-to-one correspondence with the Bayesian algorithms derived earlier, thus making explicit the assumptions underlying each conventional method. In particular, this paper finds that the conventional RAS method is a particular case of the proportional algorithm and thus implicitly assumes that the relative uncertainty of all data points is identical. This paper's suggestion for practical implementation (in the absence of high-quality uncertainty data) is the use of the linear algorithm described in Section 2.4, combined with the assignment of both uncertainty estimates (when available) and data quality to the numerical priors.
A negative entry that is not allowed to shift sign can be handled by altering the structure of the problem. Consider the problem defined by Equation (5), where entry Z_12 is negative. The accounting identities of the original system are: This problem can be recast as: where all quantities (including −Z_12) are now positive numbers.
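Because Equation (5) is not reproduced here, the sketch below illustrates the same idea on a generic identity written as Gt = 0: negating datum k together with its column of G leaves every identity unchanged while making the datum positive. The helper name and the example identity Z_11 + Z_12 − x_1 = 0 are hypothetical.

```python
import numpy as np


def recast_negative(G, theta, k):
    """Replace a negative datum theta_k by the positive quantity -theta_k.

    Column k of G is negated as well, so G' theta' == G theta.
    """
    G2 = np.array(G, dtype=float)
    th2 = np.array(theta, dtype=float)
    if th2[k] >= 0:
        raise ValueError("datum k is not negative")
    G2[:, k] = -G2[:, k]
    th2[k] = -th2[k]
    return G2, th2


# Hypothetical identity Z_11 + Z_12 - x_1 = 0 with Z_12 = -2 < 0.
G = np.array([[1.0, 1.0, -1.0]])
theta = np.array([10.0, -2.0, 8.0])
G2, th2 = recast_negative(G, theta, k=1)
print(G2, th2)   # the identity now reads Z_11 - (-Z_12) - x_1 = 0
```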
To allow an IO datum to shift sign, it is necessary to consider two different positive-valued entries in an IO table: the surplus (superavit) component, an entry in the corresponding row, and the deficit component, an entry in the corresponding column. If, for example, the datum is positive, then it is assigned to the surplus component, and an infinitesimal is assigned to the deficit component.
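A minimal sketch of the split, with a hypothetical helper and an assumed value for the infinitesimal ε: the signed datum is recovered, up to ε, as surplus minus deficit, and the balancing algorithm can then treat both components as positive data. Handling an exact zero by giving both components ε is our choice; where each component sits in the table (row or column) is left to the table's structure.

```python
def split_signed(x, eps=1e-9):
    """Split a signed datum into (surplus, deficit), both strictly positive.

    A positive datum goes to the surplus component and a negative one to
    the deficit component; the other component receives the infinitesimal eps.
    """
    if x > 0:
        return x, eps
    if x < 0:
        return eps, -x
    return eps, eps  # zero datum (our choice): both components infinitesimal


surplus, deficit = split_signed(-3.0)
print(surplus, deficit, surplus - deficit)
```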
To allow a balancing item to shift sign using the linear algorithm (which this paper recommends for practical use), it is sufficient not to enforce the displacement bound. That is, if t_j is a transaction, it is necessary to ensure that at every step |1 − m_j/µ_j| < ε; if t_j is a balancing item, that check should not be performed.
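The check can be sketched as a filter over the current iterate. The helper name is hypothetical and ε is a user-chosen bound; any ε ≤ 1 guarantees that a checked transaction keeps its sign, since |1 − m_j/µ_j| < 1 implies 0 < m_j/µ_j < 2.

```python
import numpy as np


def displacement_violations(m, mu, is_transaction, eps):
    """Indices of transactions whose displacement |1 - m_j/mu_j| reaches eps.

    Balancing items (is_transaction == False) are never checked, so they
    are free to change sign. Assumes mu_j != 0 for every datum.
    """
    m = np.asarray(m, dtype=float)
    mu = np.asarray(mu, dtype=float)
    checked = np.asarray(is_transaction, dtype=bool)
    displacement = np.abs(1.0 - m / mu)
    return np.flatnonzero(checked & (displacement >= eps))


mu = [10.0, -5.0, 4.0]        # prior best guesses
m = [12.0, 3.0, -1.0]         # current iterate
kinds = [True, False, True]   # the second datum is a balancing item
print(displacement_violations(m, mu, kinds, eps=0.5))
```

Here only the third datum is flagged: the second also changed sign, but as a balancing item it is exempt.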
A zero value in an IO table can mean two different things: either the transaction is logically impossible or it was simply too small to have been recorded. In the first case, it should be excluded from the set of numerical data.
However, if a transaction t_j is below the resolution of the IO table, ε, but is nonetheless logically possible, it should be assigned a maximally uninformative prior, σ_j = µ_j, with an infinitesimal best guess, µ_j. Because the initial inconsistency of IO tables is usually small, this step is unnecessary: the posterior m_j remains infinitesimal, so there is no problem in removing transaction t_j from the set of numerical data altogether.
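For the cases where the explicit treatment is wanted, the two zero-entry rules can be sketched as a preprocessing step. The helper name is hypothetical, and both the value of ε and the flags marking logically impossible entries are inputs the user must supply.

```python
import numpy as np


def prepare_zero_entries(mu, sigma, impossible, eps):
    """Apply the zero-entry rules to prior best guesses and uncertainties.

    Zeros flagged as logically impossible are dropped; other zeros get an
    infinitesimal best guess eps and a maximally uninformative prior,
    sigma_j = mu_j. Returns the kept indices and their (mu, sigma).
    """
    mu = np.array(mu, dtype=float)
    sigma = np.array(sigma, dtype=float)
    impossible = np.asarray(impossible, dtype=bool)
    zero = mu == 0.0
    unrecorded = zero & ~impossible
    mu[unrecorded] = eps
    sigma[unrecorded] = eps          # sigma_j = mu_j
    keep = np.flatnonzero(~(zero & impossible))
    return keep, mu[keep], sigma[keep]


keep, mu_k, sigma_k = prepare_zero_entries(
    mu=[5.0, 0.0, 0.0], sigma=[1.0, 0.0, 0.0],
    impossible=[False, True, False], eps=1e-6)
print(keep, mu_k, sigma_k)
```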
This paper suggests the explicit consideration of infinitesimals only when there is the suspicion that some non-infinitesimal best guess is misreported, a situation studied by Keogh and Quill [4].
Notice that the correlations r_jk between data with G_ij = 1 and G_ik = −1 are absent from Equation (58). The matrix form of Equation (58) is given by Equation (9). Notice that in Equation (9) the concordance matrix appears twice, and one of those times it is in absolute value, |·| (it does not matter which one). This is necessary so that the correlations r_jk between data with G_ij = 1 and G_ik = −1 cancel out.

A.3. Analytical Solution
The information about the first two moments is introduced in the Lagrangean of the system in scalar form as: In Equation (59) the expression ∫_Ω dq is a shorthand for the product ∏_{j=1}^{n_T} ∫_0^∞ dq_j. Each q_j is the realization of the random variables t_j and θ_j. The first term on the right-hand side of Equation (59) contains the entropy of the posterior relative to the prior. The second term is the normalization constraint. The third term is the set of best-guess constraints. The fourth term is the set of uncertainty constraints. The term m_j is the marginal expectation of t_j, defined as: The term for some i, j and k = j, or G*_ijk = 0 otherwise. The λ, α*'s and β*'s are, respectively, the Lagrange multipliers of the normalization, best-guess and uncertainty constraints. Minimization of Equation (59) with respect to p(q) yields: The C's in the previous and subsequent expressions denote appropriately chosen constants. The previous expression can be rewritten in the form: This expression can be simplified to: where and the latter term is obtained as: Thus, although m*_i is still unknown, it is a function only of the accounting-identity index i and is independent of the numerical-data indices j and k. Since the Lagrange multipliers are still free, the substitution above is valid.
Notice that the exponent in Equation (64) is a polynomial whose coefficients are linear combinations of Lagrange multipliers. If the prior is a multivariate truncated Gaussian and the constraints are of second order, the posterior is also a truncated multivariate Gaussian, whose probability density is: The exponents of the prior and posterior probability densities can be expanded in polynomial form. In particular, Equation (66) becomes: and the polynomial expansion of the prior distribution displays a similar pattern. In the previous expression, s⁻¹_jk is the (j, k) entry of matrix S⁻¹. An explicit expression for the parameters of the posterior can be obtained by solving equations of the form C_post = C_prior + C_constraint, where each constant is the coefficient of the corresponding term in the polynomial expansions of the posterior and prior distributions and of the expressions containing the Lagrange multipliers that result from differentiating the Lagrangean, Equation (59). This leads to Equations (10) and (11).

A.4. GLS Algorithm: Covariances
To determine posterior uncertainties and correlations, it is more convenient to express the covariance matrices as S = ŝRŝ and Σ = σ̂Pσ̂, so that Equation (10) can be recast as: where the truncated Gaussian parameters were replaced by observable parameters. Under the substitution σ̂R*σ̂ = ŝRŝ and B = Gᵀβ̂|G|, the previous expression simplifies to: Using the Woodbury identity [57], the previous expression is equivalent to: Another application of the Woodbury identity leads to: The previous expression can be rewritten as: If the displacement from prior to posterior is small, each Lagrange multiplier, β_i, is also small. The third term on the right-hand side of the previous expression contains products β_iβ_j ≈ 0, which can be discarded, and so the first-order approximation is obtained: The filter matrix, F, has entry F_jk = 1 if there is some accounting identity i for which G_ij = G_ik ≠ 0, and F_jk = 0 otherwise. Matrix F is introduced to avoid the appearance of mathematical artifacts. After all, Equation (73) is an approximation and, if unchecked, it may produce spurious correlations for an entry (j, k) for which F_jk = 0. The magnitude of these spurious correlations would be small, but they would be incorrect nonetheless, and the filter matrix swiftly addresses this issue.
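Two small helpers, both hypothetical, make these steps concrete. The first builds the filter matrix F from the concordance matrix following the rule just stated, reading it as "same nonzero coefficient in some identity". The second applies the standard Woodbury identity, (A + UCV)⁻¹ = A⁻¹ − A⁻¹U(C⁻¹ + VA⁻¹U)⁻¹VA⁻¹, which can be checked numerically against a direct inverse.

```python
import numpy as np


def filter_matrix(G):
    """F_jk = 1 if some identity i has G_ij == G_ik != 0, else 0."""
    G = np.asarray(G, dtype=float)
    n = G.shape[1]
    F = np.zeros((n, n), dtype=bool)
    for row in G:
        nonzero = row != 0
        same = (row[:, None] == row[None, :]) & nonzero[:, None] & nonzero[None, :]
        F |= same
    return F.astype(int)


def woodbury_inverse(A_inv, U, C, V):
    """(A + U C V)^{-1} computed from A^{-1} via the Woodbury identity."""
    inner = np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U)
    return A_inv - A_inv @ U @ inner @ V @ A_inv


G = np.array([[1, 1, -1, 0],
              [0, 1, 0, -1]])
print(filter_matrix(G))   # data 0 and 1 share the sign +1 in identity 0
```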
The Lagrange multipliers are determined by substituting Equation (73) into Equation (9), leading to: (Matrix F was ignored for computational purposes.) The previous expression can be further simplified by noting that diag(A + B) = diag(A) + diag(B) and that d = diag(A b̂ C) = (A # Cᵀ)b, where # is the Hadamard (entry-wise) product, since d_j = Σ_k A_jk b_k C_kj = Σ_k (A_jk C_kj) b_k. This implies that the solution is: If all relative uncertainties are identical, s_0/m_0 = s_j/m_j = const, the second-order constraint becomes: which, combined with the first-order constraint, leads to: which in turn implies: so that r_ij = 1, i.e., all correlations are unitary.
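For a single accounting identity, the argument for unitary correlations can be sketched as follows. This is our reconstruction, not necessarily the paper's exact sequence of equations, and it assumes that the identity t_0 = Σ_j t_j holds exactly for the posteriors (so the aggregate variance equals the variance of the sum of the components), that all best guesses are positive, m_j > 0, and that the common relative uncertainty is c = s_j/m_j > 0.

```latex
\begin{aligned}
&m_0 = \sum_{j=1}^{n} m_j, \qquad
 s_0^2 = \sum_{j=1}^{n}\sum_{k=1}^{n} s_j\, r_{jk}\, s_k, \qquad
 s_j = c\, m_j \quad (j = 0, 1, \dots, n) \\
&\Longrightarrow\; c^2 \Big(\sum_{j=1}^{n} m_j\Big)^2
 = c^2 \sum_{j=1}^{n}\sum_{k=1}^{n} m_j\, m_k\, r_{jk}
 \;\Longrightarrow\; \sum_{j=1}^{n}\sum_{k=1}^{n} m_j\, m_k\,(1 - r_{jk}) = 0 .
\end{aligned}
```

Every term of the last sum is non-negative, because m_j m_k > 0 and r_jk ≤ 1, so each term must vanish, which gives r_jk = 1 for all j and k.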
A reviewer of a previous version of this paper stated that unitary correlations are problematic, because the theoretical solution involves matrix inversion and a fully correlated covariance matrix is singular (non-invertible). Appendix A.6 shows that the theory has no problem handling a unitary correlation matrix.

A.6. A Single Accounting Identity
The analytical solution of the covariance update rule, Equation (10), in the case of a single accounting identity with unitary prior correlations, is now studied.It is considered that there are n disaggregate data (labeled from 1 to n) and an aggregate datum (labeled 0).
The first case considered is that in which the aggregate datum and the disaggregate data have the same quality level, so they are adjusted simultaneously. For clarity, consider the case n = 2. The non-truncated version of Equation (10) is: If the prior correlation is unitary, ρ = 1, there seems to be a problem, because the matrix inverse is ill-defined. Does this mean that the Bayesian data balancing method is inconsistent? No. It means that, as Jaynes [6] points out, direct reasoning in terms of infinite quantities (or in this case infinitesimals) should be avoided, since it may lead to paradoxes.
Instead, this paper follows his strict finite-sets policy: "Apply the ordinary processes of arithmetic and analysis only to expressions with a finite number n of terms. Then after the calculation is done, observe how the resulting finite expressions behave as the parameter n increases indefinitely" [6] (p. 452). In the present case, the previous expression can be rewritten as: The substitution β* = (1 − ρ²)β was performed, and it is assumed that r ≈ ρ. It is now clear that, as ρ → 1, the Lagrange multiplier affecting disaggregate data vanishes, β* ≪ β. This means that the adjustment effort falls entirely on the aggregate uncertainty, so s_0 = Σ_{j=1}^{n} σ_j, while s_j = σ_j and r_jk = ρ_jk = 1.
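A quick numerical check of this limit, assuming an equicorrelated prior P = (1 − ρ)I + ρ11ᵀ (our choice of illustration; helper names are hypothetical): the factor multiplying β, which equals 1 − ρ² when n = 2 and det(P) in general, vanishes as ρ → 1, while the aggregate uncertainty implied by the disaggregate priors, (σᵀPσ)^{1/2}, tends to Σ_j σ_j.

```python
import numpy as np


def equicorrelation(n, rho):
    """n x n correlation matrix with every off-diagonal entry equal to rho."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))


def implied_aggregate_sd(sigma, P):
    """Standard deviation of sum_j t_j given standard deviations sigma and correlations P."""
    sigma = np.asarray(sigma, dtype=float)
    return float(np.sqrt(sigma @ P @ sigma))


sigma = np.array([1.0, 2.0])
for rho in (0.5, 0.9, 0.99, 0.999):
    P = equicorrelation(2, rho)
    # det(P) = 1 - rho**2 shrinks to zero; the implied sd approaches 1 + 2 = 3
    print(rho, np.linalg.det(P), implied_aggregate_sd(sigma, P))
```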
The case n = 2 was analyzed, but the same result holds in general. Entry (j, k) of the inverse of an n × n matrix P is: where g_jk is potentially a function of every element of P and det is the determinant. As in the 2 × 2 case: where β* = det(P)β. Since β* → 0 when ρ → 1, all the adjustment effort falls on the aggregate uncertainty, just as in the bivariate case.

The empirically relevant case in which uncertainty estimates are known with better accuracy than correlations is now addressed. In this case s = σ and only correlations are adjusted. If n = 2, the non-truncated version of Equation (10) is: Proceeding as before leads to: where β* = (1 − ρ²)β.
There is no indeterminacy in the expression linking prior and posterior correlations: For arbitrary n, the equivalent implicit expression holds: where β* = det(P)β. So even in the limit case of a single accounting identity, the Bayesian data balancing method generates meaningful results.

A.7. Derivation of the WLS Algorithm
Consider the particular example of a dense IO matrix, where every interior point (ij) is affected by two accounting identities, corresponding to the row and column sums. The expansion of the term Gᵀα in Equation (14) becomes a vector where each entry is the sum of two Lagrange multipliers, α^R_i + α^C_j, corresponding to the i-th row and j-th column sums. For notational convenience, (ij) denotes a single numerical datum. The expansion of an entry of Equation (14) becomes: Under the substitution α^R*_i = α^R_i Σ_k σ_ik and α^C*_j = α^C_j Σ_k σ_kj, the previous expression becomes: If there are many numerical data per accounting identity, it is reasonable to consider that σ_ik ≪ Σ_l σ_il and that σ_kj ≪ Σ_l σ_lj. Introducing these considerations in the previous expression leads to: The expression above is valid for interior points. The corresponding expressions for row and column sums (labeled respectively with superscripts R and C) are easy to obtain, since these data exhibit zero correlations with every other datum. Under the assumption of unitary correlations and a balanced prior, σ^R_i = Σ_k σ_ik and σ^C_j = Σ_k σ_kj, and the adjustments of row and column sums are: The generalization of the previous expressions to matrix format is given by Equation (17).
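Under the unitary-correlation, balanced-prior assumption, the row- and column-sum uncertainties are simply the row and column sums of the matrix of interior uncertainties. The sketch below (hypothetical helper) computes them together with the largest single share σ_ik/σ^R_i or σ_kj/σ^C_j, a rough check of the "many numerical data per accounting identity" approximation.

```python
import numpy as np


def aggregate_uncertainties(sigma_Z):
    """Row/column-sum uncertainties and the largest single-datum share.

    sigma_Z[i, k] is the prior standard deviation of interior point (ik).
    A small max_share supports treating sigma_ik as negligible relative
    to sum_l sigma_il (and likewise for columns).
    """
    S = np.asarray(sigma_Z, dtype=float)
    sigma_R = S.sum(axis=1)                 # sigma^R_i = sum_k sigma_ik
    sigma_C = S.sum(axis=0)                 # sigma^C_j = sum_k sigma_kj
    max_share = max((S / sigma_R[:, None]).max(), (S / sigma_C[None, :]).max())
    return sigma_R, sigma_C, max_share


sR, sC, share = aggregate_uncertainties([[1.0, 2.0], [3.0, 4.0]])
print(sR, sC, share)
```

In this tiny 2 × 2 example the largest share is large, so the approximation would be poor; it improves as the number of data per identity grows.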