An Alternative Proof of Minimum Trace Reconciliation

Abstract: Minimum trace reconciliation, developed by Wickramasuriya et al. (2019), is an innovation in the literature on forecast reconciliation. Their proof, however, has a gap, and the underlying idea is not easy to extend to more general situations. This paper fills the gap by providing an alternative proof based on the first-order condition in the space of non-square matrices and argues that the proof is not only simpler but can also be extended to incorporate the more general results on minimum weighted trace reconciliation in Panagiotelis et al. (2021). Thus, our alternative proof not only has pedagogical value but also connects the results in the literature from a unified perspective.


Introduction
Minimum trace reconciliation, developed by [1], is an innovation in the literature on forecast reconciliation. The tool enables a systematic approach to forecasting with linear constraints, which encompasses a wide range of applications, including electricity demand forecasting [2] and macro-framework forecasting [3,4].
The proof of [1], however, has a gap and is not easy to extend to more general situations. Their proof attempts to solve a minimization problem by replacing the objective function with its lower bound. Although they find the solution that minimizes the lower bound, the minimizer is not shown to coincide with the solution to the original problem, which creates a gap in the proof.
This paper provides an alternative proof and argues that it is not only simpler but can also be extended to incorporate more general results in the literature. The proof is more direct in the sense that it solves the first-order condition in the space of non-square matrices. An almost identical proof can be used to prove Theorem 3.3 of [5], which shows that minimum trace reconciliation and minimum weighted trace reconciliation lead to an identical formula. By selecting a special weight in the weighted trace reconciliation problem, we can also see why the lower bound minimization in [1] reaches the same formula. Thus, the alternative proof not only has pedagogical value but also connects the results in the literature from a unified perspective.
The paper is organized into six sections. In Section 2, we provide the setup of the problem. In Section 3, we briefly illustrate the proof of [1]. In Section 4, we provide an alternative proof of [1]. Section 5 extends the proof to incorporate [5] and discusses the insights. In Section 6, we conclude.

Setup
The setup and notation follow [1]. Let y_t and b_t be m × 1 and n × 1 vectors of random variables, where m > n > 0. The two vectors are constrained linearly by

y_t = S b_t,

where S is an m × n matrix whose last n rows form the n × n identity matrix, i.e., S = [C′ I_n]′ for an (m − n) × n matrix C; thus, S is of full column rank for any matrix C. Intuitively, b_t represents the most disaggregated level and y_t includes b_t itself and aggregates of the subcomponents as specified by C, although, mathematically, C can include negative elements. In any case, the realization of y_t is linearly dependent and belongs to the linear subspace

A = {S b : b ∈ R^n}.

Suppose that an h-step-ahead forecast based on the information up to time T, denoted by ŷ_T(h) and called the "base" forecast, is given. The base forecast ŷ_T(h) is assumed to be an unbiased estimator of y_{T+h},

E_T[ŷ_T(h)] = E_T[y_{T+h}],

where E_T is the expectation conditional on the information up to time T. An issue, however, is that ŷ_T(h) may not belong to A, which motivates forecast reconciliation.
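As a concrete illustration of this setup, the following sketch (with hypothetical numbers, not taken from the paper) builds S = [C′ I_n]′ for a minimal two-variable hierarchy and checks that y_t = S b_t lies in the aggregation-consistent subspace A.

```python
import numpy as np

# Illustrative hierarchy (hypothetical numbers): two bottom series b = (b1, b2)
# and one total, so n = 2 and m = 3.  S stacks the aggregation matrix C on I_n.
C = np.array([[1.0, 1.0]])        # total = b1 + b2
n = C.shape[1]
S = np.vstack([C, np.eye(n)])     # S = [C; I_n], full column rank

b_t = np.array([2.0, 3.0])
y_t = S @ b_t                     # y_t = S b_t belongs to A = {S b : b in R^n}
print(y_t)                        # [5. 2. 3.]
```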
A reconciled forecast is formed as

ỹ_T(h) = S P ŷ_T(h),

where P is an n × m matrix. The role of P is to map the base forecast ŷ_T(h) into the most disaggregated level. The reconciled forecast ỹ_T(h) is assumed to be unbiased and, thus, satisfies

E_T[ỹ_T(h)] = E_T[y_{T+h}] ⟺ SPS = S ⟺ PS = I_n.

Note that the necessity of the last equivalence follows from multiplying both sides of SPS = S by S′ from the left: S′S is a full-rank square matrix as S is a full-rank matrix, so S′S is invertible. The sufficiency follows from multiplying both sides by S from the left.
The covariance matrix of the forecast error of the reconciled forecast can be expressed as

E_T[(y_{T+h} − ỹ_T(h))(y_{T+h} − ỹ_T(h))′] = S P W P′ S′,

where W = E_T[(y_{T+h} − ŷ_T(h))(y_{T+h} − ŷ_T(h))′] is the covariance matrix of the h-step-ahead base forecast error and is assumed to be invertible (i.e., excluding the case of zero forecast error and the case of a degenerate matrix C for aggregation). The equality follows because

y_{T+h} − ỹ_T(h) = y_{T+h} − S P ŷ_T(h) = S P (y_{T+h} − ŷ_T(h)),

noting that (I_m − SP)S = S − SPS = 0 and, hence, (I_m − SP) y_{T+h} = 0 holds. Ref. [1] attempted to prove that the matrix P that minimizes the trace of the covariance matrix subject to the unbiasedness constraint, i.e., the solution to

min_P tr(S P W P′ S′) subject to SPS = S, (9)

is

P = (S′W⁻¹S)⁻¹ S′W⁻¹.

Gap in Proof of [1]
The proof of [1] can be divided into two steps. First, they show in its online Appendix A2 that the objective function can be bounded from below, as follows:

tr(S P W P′ S′) = tr(P W P′ S′S) = tr(P W P′) + tr(C P W P′ C′) ≥ tr(P W P′),

using S′S = C′C + I_n and the positive semi-definiteness of C P W P′ C′. Second, a minimization problem in which the objective function is replaced by the lower bound,

min_P tr(P W P′) subject to SPS = S, (11)

is solved.
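Assuming, as the later sections indicate, that the lower bound in question is tr(PWP′), the trace of the bottom-level error covariance, the decomposition of the objective into the bound plus a non-negative remainder can be checked numerically; the matrices below are randomly generated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
C = rng.normal(size=(m - n, n))
S = np.vstack([C, np.eye(n)])          # S = [C; I_n]
B = rng.normal(size=(m, m))
W = B @ B.T + m * np.eye(m)            # symmetric positive definite W

for _ in range(100):
    P = rng.normal(size=(n, m))
    full = np.trace(S @ P @ W @ P.T @ S.T)
    bottom = np.trace(P @ W @ P.T)
    gap = np.trace(C @ P @ W @ P.T @ C.T)   # non-negative remainder
    # tr(SPWP'S') = tr(PWP') + tr(CPWP'C') >= tr(PWP')
    assert np.isclose(full, bottom + gap) and gap >= 0
```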
The proof ends here, and thus, one still needs to show that the minimizers of the two problems (9) and (11) coincide. This presents a gap in the proof.
This gap is non-trivial because minimizing a function is not generally the same as minimizing its lower bound. That is, a function f(x) being bounded from below by another function g(x) does not guarantee that their minimizers coincide: one can construct pairs with f(x) ≥ g(x) for all x whose minimizers nevertheless differ. In the case of (9) and (11), however, the minimizers do coincide, as explained in Proposition 2 of Section 5.
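The paper's own concrete counterexample is not reproduced above, but any pair f ≥ g with distinct minimizers makes the point; the pair below is our own illustrative choice, not necessarily the authors'.

```python
import numpy as np

# Illustrative pair (ours, not the paper's): g bounds f from below everywhere,
# yet the two functions are minimized at different points, so minimizing a
# lower bound is, in general, a different problem.
f = lambda x: x**2 + 1.0            # minimized at x = 0
g = lambda x: (x - 2.0)**2 / 8.0    # minimized at x = 2

xs = np.linspace(-10, 10, 2001)
assert np.all(g(xs) <= f(xs))       # g is a lower bound of f on the grid
x_f = xs[np.argmin(f(xs))]
x_g = xs[np.argmin(g(xs))]
print(round(x_f, 6), round(x_g, 6))  # 0.0 2.0
```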

An Alternative Proof of (9)
The alternative proof that we propose is based on the extension of the partial derivative and the first-order condition to a space of matrices.
Proof. Let (R^{n×m}, ⟨·,·⟩) be the space of n × m matrices equipped with the Frobenius inner product ([6]): ⟨A, B⟩ = tr(A′B). By Theorem 1 of [7] (p. 243), there exists an n × m matrix Lagrange multiplier Λ such that the Lagrangian

L(P) = tr(S P W P′ S′) + tr(Λ(S P S − S))

is stationary at its minimum point. This means that, at the minimum, the directional derivative (or Gateaux differential, as defined on page 171 of [7]) of L(P) is zero for any n × m matrix H. By a direct calculation,

L(P + αH) − L(P) = α[tr(S H W P′ S′) + tr(S P W H′ S′) + tr(Λ S H S)] + α² tr(S H W H′ S′).

By rearranging the terms and taking the limit in α, the quadratic term disappears, and the derivative becomes the following:

lim_{α→0} [L(P + αH) − L(P)]/α = tr(S H W P′ S′) + tr(S P W H′ S′) + tr(Λ S H S) = tr((2 S′S P W + S′Λ′S′) H′),

where the second equality uses tr(A′B) = tr(B′A) = tr(BA′) and the symmetry of W. Since this has to hold for all H,

2 S′S P W + S′Λ′S′ = 0.

Multiplying both sides by W⁻¹S from the right and using SPS = S gives the following:

2 S′S + S′Λ′S′W⁻¹S = 0 ⟹ S′Λ′ = −2 S′S (S′W⁻¹S)⁻¹.

Thus, the formula is as follows:

S′S P W = S′S (S′W⁻¹S)⁻¹ S′ ⟹ P = (S′W⁻¹S)⁻¹ S′W⁻¹.

□
The proof essentially uses the extension of a partial derivative and solves the first-order condition. Since the objective function is quadratic and the constraint is linear, the first-order condition is sufficient.
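As a numerical sanity check of the result (a sketch with randomly generated S and W, not part of the original proof), the candidate P = (S′W⁻¹S)⁻¹S′W⁻¹ satisfies the unbiasedness constraint and attains a trace no larger than that of randomly perturbed feasible alternatives.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
S = np.vstack([rng.normal(size=(m - n, n)), np.eye(n)])  # S = [C; I_n]
B = rng.normal(size=(m, m))
W = B @ B.T + m * np.eye(m)           # symmetric positive definite W

Winv = np.linalg.inv(W)
P_star = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)  # (S'W^-1 S)^-1 S'W^-1
assert np.allclose(P_star @ S, np.eye(n))             # unbiasedness: PS = I_n

def trace_obj(P):
    return np.trace(S @ P @ W @ P.T @ S.T)

# Any P = P* + Z M is feasible, where M projects onto the orthogonal
# complement of the column space of S (so that (Z M) S = 0).
M = np.eye(m) - S @ np.linalg.solve(S.T @ S, S.T)
for _ in range(200):
    P = P_star + rng.normal(size=(n, m)) @ M
    assert np.allclose(P @ S, np.eye(n))
    assert trace_obj(P) >= trace_obj(P_star) - 1e-8 * trace_obj(P_star)
```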

An Extension of the Alternative Proof
The proof can be applied to the environment of weighted trace minimization as in Theorem 3.3 of [5]. To motivate the extension, suppose we have the base forecast of the variables in the GDP expenditure approach and want to reconcile it to satisfy

Y = C + I + G + XM, (22)

where Y is GDP, C is consumption, I is investment, G is government expenditure, and XM is net export. The minimum trace reconciliation minimizes the variance of the forecast errors with equal weights, as follows:

min_P [Var_T(e_Y) + Var_T(e_C) + Var_T(e_I) + Var_T(e_G) + Var_T(e_XM)],

subject to the constraint (22), where e_V denotes the h-step-ahead reconciled forecast error of variable V. Since the forecast of GDP often attracts more attention than the others, a natural question is whether it is possible to improve the forecast of some variables at the expense of other variables by adjusting the weights.
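In this GDP example, the aggregation matrix has a single row of ones; a minimal sketch with hypothetical numbers:

```python
import numpy as np

# GDP identity from the text: Y = C + I + G + XM, with bottom level
# b = (C, I, G, XM), so n = 4 and m = 5.  Numbers are hypothetical.
C_mat = np.array([[1.0, 1.0, 1.0, 1.0]])   # aggregation row for Y
S = np.vstack([C_mat, np.eye(4)])          # S = [C; I_4]

b = np.array([10.0, 4.0, 3.0, -1.0])       # (C, I, G, XM); XM can be negative
y = S @ b
print(y[0])                                # 16.0 = C + I + G + XM
```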
Such a specification can be expressed as a weighted trace, as follows:

min_P tr(ω S P W P′ S′) subject to SPS = S, (25)

where ω is an m × m matrix with its (i, i) element equal to ω_i. When the m × m weight matrix is a diagonal matrix, the objective function is a weighted sum of the variances of the forecast errors. Note that the constraint in (25) is the same as that in (9), so the same unbiasedness assumption is still imposed. The same unbiasedness assumption is also reflected in the objective function as in (9). As [5] showed and [1] proved in their unpublished manuscript, the optimal matrix P is independent of ω as long as ω is symmetric and invertible. Therefore, in practice, one does not need to exercise judgment or estimate how much weight to put on which variable.

Proposition 1. For any symmetric and invertible m × m matrix ω, the solution to (25) is

P = (S′W⁻¹S)⁻¹ S′W⁻¹.

Proof. The proof is almost identical to Section 4. Let the Lagrangian be

L(P) = tr(ω S P W P′ S′) + tr(Λ(S P S − S)).

Following the same logic as Section 4, the first-order condition leads to

lim_{α→0} [L(P + αH) − L(P)]/α = tr((2 S′ωS P W + S′Λ′S′) H′) = 0.

Since this has to hold for all H,

2 S′ωS P W + S′Λ′S′ = 0.

Multiplying both sides by W⁻¹S from the right and using SPS = S gives the following:

2 S′ωS + S′Λ′S′W⁻¹S = 0 ⟹ S′Λ′ = −2 S′ωS (S′W⁻¹S)⁻¹ ⟹ S′ωS P W = S′ωS (S′W⁻¹S)⁻¹ S′.

The formula follows because S′ωS is a full-rank square matrix and, thus, invertible.

□
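Proposition 1 can likewise be checked numerically. In the sketch below (restricted to positive definite weights, so that S′ωS is guaranteed to be invertible), the same ω-free candidate P remains optimal among feasible perturbations for every randomly drawn weight.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
S = np.vstack([rng.normal(size=(m - n, n)), np.eye(n)])
B = rng.normal(size=(m, m))
W = B @ B.T + m * np.eye(m)
Winv = np.linalg.inv(W)
P_star = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)  # does not depend on ω

M = np.eye(m) - S @ np.linalg.solve(S.T @ S, S.T)     # feasible directions
for _ in range(20):
    G = rng.normal(size=(m, m))
    omega = G @ G.T + m * np.eye(m)   # random symmetric positive definite ω
    def wobj(P):
        return np.trace(omega @ S @ P @ W @ P.T @ S.T)
    for _ in range(20):
        P = P_star + rng.normal(size=(n, m)) @ M
        # the ω-free P* minimizes the ω-weighted trace as well
        assert wobj(P) >= wobj(P_star) - 1e-8 * abs(wobj(P_star))
```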
Intuitively, the fact that the weight matrix does not matter can be interpreted as saying that there is no trade-off between variables, as if the choice matrix P has enough degrees of freedom in mixing the base forecast so that the variance of each variable's forecast error can be minimized variable by variable, without affecting the variance of other variables' forecast errors.
Mathematically, the proof shares an almost identical structure with that of Section 4, which is the special case when ω = I_m. Since a symmetric invertible matrix can be factorized as ω = A′A from Takagi's factorization, the objective function can be written as

tr(ω S P W P′ S′) = tr(A S P W P′ S′ A′)

for any full-rank square matrix A. In fact, since the proof only requires S′ωS to be invertible, one can extend A to be a non-square matrix and show that the objective function of (11) is a special case of that of (25).
Proposition 2. There exists a weight ω such that the objective function of (11) equals that of (25).

Proof. Let

ω = S(S′S)⁻¹(S′S)⁻¹S′ = A′A, where A = (S′S)⁻¹S′.

Then A is a non-square matrix with AS = (S′S)⁻¹S′S = I_n, and the objective function of (25) collapses to that of (11):

tr(ω S P W P′ S′) = tr(A S P W P′ S′ A′) = tr(P W P′). (33)

Moreover, S′ωS = (AS)′(AS) = I_n is a full-rank square matrix and, thus, invertible.

□
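The particular weight used in the proof can be verified numerically: with ω = S(S′S)⁻¹(S′S)⁻¹S′, the identity S′ωS = I_n holds and the weighted objective coincides with tr(PWP′) for arbitrary P (a sketch with randomly generated matrices).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
S = np.vstack([rng.normal(size=(m - n, n)), np.eye(n)])
A = np.linalg.solve(S.T @ S, S.T)      # A = (S'S)^-1 S'
omega = A.T @ A                        # ω = S (S'S)^-1 (S'S)^-1 S' = A'A

assert np.allclose(S.T @ omega @ S, np.eye(n))   # S'ωS = I_n

B = rng.normal(size=(m, m))
W = B @ B.T + m * np.eye(m)
for _ in range(100):
    P = rng.normal(size=(n, m))
    lhs = np.trace(omega @ S @ P @ W @ P.T @ S.T)  # objective of (25)
    rhs = np.trace(P @ W @ P.T)                    # objective of (11)
    assert np.isclose(lhs, rhs)
```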
Note that since S′ωS = I_n is invertible, the proof of Proposition 1 can be applied to show (11). This is one way to see why the proof of [1] reaches the same formula. One insight from the right side of (33) is that it represents the summed variance of the forecast errors of the most disaggregated variables. Thus, minimizing the summed variance of all variables is equivalent to minimizing the summed variance of the most disaggregated variables.
In summary, the extension to allow a general weight highlights two observations. First, the irrelevance of the weight implies that the objective function, being the trace of the forecast error covariance matrix, is not essential, although (9) is called minimum trace reconciliation in the literature. What is essential is the unbiasedness assumption, and thus, it could alternatively be called optimal unbiased reconciliation. Second, the irrelevance of the weight suggests that (9) reconciles the base forecast as if the forecast error variance of each variable could be minimized independently, but at the same time, (9) can be obtained by minimizing the variance of only the bottom-level variables. The extension suggests that these two apparently contradictory interpretations can coexist.

Conclusions
In this paper, we have provided an alternative proof of the minimum trace reconciliation developed by [1], filling a gap in their proof. We have also shown that an almost identical proof can be used to prove Theorem 3.3 of [5], so both the trace and the weighted trace can be analyzed from a unified perspective. We believe the alternative, simpler proof provides pedagogical value and connects the results in the literature.