Abstract
Minimum trace reconciliation, developed by Wickramasuriya et al., 2019, is an innovation in the literature on forecast reconciliation. The proof, however, has a gap, and the idea is not easy to extend to more general situations. This paper fills the gap by providing an alternative proof based on the first-order condition in the space of non-square matrices, and it argues that the proof is not only simpler but can also be extended to incorporate the more general results on minimum weighted trace reconciliation in Panagiotelis et al., 2021. Thus, our alternative proof not only has pedagogical value but also connects the results in the literature from a unified perspective.
JEL Classification:
C53; E17
1. Introduction
Minimum trace reconciliation, developed by [1], is an innovation in the literature on forecast reconciliation. The tool enables a systematic approach to forecasting with linear constraints, which encompasses a wide range of applications, including electricity demand forecasting [2] and macroframework forecasting [3,4].
The proof of [1], however, has a gap and is not easy to extend to more general situations. Their proof attempts to solve a minimization problem by replacing the objective function with its lower bound. Although they find the solution that minimizes the lower bound, the minimizer is not shown to coincide with the solution to the original problem, which creates a gap in the proof.
This paper provides an alternative proof and argues that it is not only simpler but also can be extended to incorporate more general results in the literature. The proof is more direct in the sense that it solves the first-order condition in the space of the non-square matrix. An almost identical proof can be used to prove Theorem 3.3 of [5], which shows that the minimum trace reconciliation and minimum weighted trace reconciliation lead to an identical formula. By selecting a special weight in the weighted trace reconciliation problem, we can also see why the lower bound minimization in [1] reaches the same formula. Thus, the alternative proof not only has pedagogical value but also connects the results in the literature from a unified perspective.
The paper is organized into six sections. In Section 2, we provide the setup of the problem. In Section 3, we briefly illustrate the proof of [1]. In Section 4, we provide an alternative proof of [1]. Section 5 extends the proof to incorporate [5] and discusses the insights. In Section 6, we make our conclusions.
2. Setup
The setup and notation follow [1]. Let $y_t$ and $b_t$ be $n$- and $m$-vectors of random variables, where $n > m$. The two vectors are constrained linearly by
$$y_t = S b_t, \quad (1)$$
where $S$ is an $n \times m$ matrix, and its last $m$ rows are the identity matrix $I_m$,
$$S = \begin{pmatrix} A \\ I_m \end{pmatrix}, \quad (2)$$
and, thus, $S$ is of full column rank for any $(n-m) \times m$ matrix $A$. Intuitively, $b_t$ represents the most disaggregated level, and $y_t$ includes $b_t$ itself and aggregates of the subcomponents as specified by $A$, although, mathematically, $A$ can include negative elements. In any case, the realization of $y_t$ is linearly dependent and belongs to
$$\mathfrak{s} = \{ S b : b \in \mathbb{R}^m \}, \quad (3)$$
as $y_t = S b_t \in \mathfrak{s}$.
Suppose that an $h$-step-ahead forecast based on the information up to time $T$, denoted by $\hat{y}_{T+h}$ and called the “base” forecast, is given. The base forecast is assumed to be an unbiased estimator of $y_{T+h}$,
$$E_T[\hat{y}_{T+h}] = E_T[y_{T+h}], \quad (4)$$
where $E_T$ is the expectation conditional on the information up to time $T$. But an issue is that $\hat{y}_{T+h}$ may not belong to $\mathfrak{s}$, which motivates forecast reconciliation.
A reconciled forecast given an $m \times n$ matrix $G$ is a linear transformation of $\hat{y}_{T+h}$ such that
$$\tilde{y}_{T+h} = S G \hat{y}_{T+h}. \quad (5)$$
The role of $G$ is to map the base forecast into the most disaggregated level. The reconciled forecast is assumed to be unbiased and, thus, satisfies $E_T[\tilde{y}_{T+h}] = S G S\, E_T[b_{T+h}] = S\, E_T[b_{T+h}]$ for any $E_T[b_{T+h}]$, which is equivalent to
$$S G S = S \iff G S = I_m. \quad (6)$$
Note that the necessity of the last equivalence follows from multiplying $(S'S)^{-1} S'$ from the left of both sides; $S'S$ is a full-rank square matrix as $S$ is of full column rank, so $S'S$ is invertible. The sufficiency follows from multiplying $S$ from the left of both sides.
The forecast error of the reconciled forecast can be expressed as
$$\mathrm{Var}_T[y_{T+h} - \tilde{y}_{T+h}] = S G W_h G' S', \quad (7)$$
where $W_h = \mathrm{Var}_T[y_{T+h} - \hat{y}_{T+h}]$ is the covariance matrix of the $h$-step-ahead base forecast error and is assumed to be invertible (i.e., excluding the case of zero forecast error and the case of a degenerate matrix for aggregation). The equality follows because
$$y_{T+h} - \tilde{y}_{T+h} = y_{T+h} - S G \hat{y}_{T+h} = S G (y_{T+h} - \hat{y}_{T+h}), \quad (8)$$
noting that $S G y_{T+h} = S G S b_{T+h} = S b_{T+h} = y_{T+h}$ holds. Ref. [1] attempted to prove that the matrix $G$ that minimizes the trace of the covariance matrix (7) subject to the unbiasedness constraint (6) is
$$G = (S' W_h^{-1} S)^{-1} S' W_h^{-1}. \quad (9)$$
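As a concrete sanity check of the setup (not part of the original derivation), the formula can be computed numerically for a small hypothetical hierarchy, assuming NumPy; the matrices below are illustrative choices, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hierarchy: n = 3 series, m = 2 bottom-level series,
# where the first series is the sum of the two bottom-level ones.
A = np.array([[1.0, 1.0]])
m = A.shape[1]
S = np.vstack([A, np.eye(m)])     # S stacks A on top of the identity I_m

# Hypothetical invertible covariance W_h of the base forecast errors.
X = rng.standard_normal((3, 3))
W = X @ X.T + 3.0 * np.eye(3)

# Minimum trace reconciliation matrix G = (S' W^{-1} S)^{-1} S' W^{-1}.
Winv = np.linalg.inv(W)
G = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)

# The unbiasedness constraint G S = I_m holds by construction.
assert np.allclose(G @ S, np.eye(m))
```

The constraint $G S = I_m$ holds for any invertible $W_h$, since $(S' W_h^{-1} S)^{-1} S' W_h^{-1} S = I_m$ algebraically.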
3. Gap in Proof of [1]
The proof of [1] can be divided into two steps. First, they show in its online Appendix A2 that the objective function can be bounded from below, as follows:
$$\mathrm{tr}[S G W_h G' S'] = \mathrm{tr}[G W_h G' (I_m + A' A)] \ge \mathrm{tr}[G W_h G'], \quad (10)$$
since $S'S = I_m + A'A$ and $\mathrm{tr}[G W_h G' A' A] = \mathrm{tr}[A G W_h G' A'] \ge 0$. Second, a minimization problem where the objective function is the lower bound is solved:
$$\min_{G:\, G S = I_m} \mathrm{tr}[G W_h G'], \quad (11)$$
whose solution is shown to be (9).
The proof ends here, and thus, one still needs to show that the minimizer of (11) coincides with that of the original problem, which minimizes the left-hand side of (10). This presents a gap in the proof.
This gap is non-trivial because minimizing a function is not generally the same as minimizing its lower bound. That is, a function being bounded from below by another function does not guarantee that their minimizers coincide. For a counterexample where the minimizers do not coincide, consider
$$f(x) = x^2 \quad \text{and} \quad g(x) = \frac{1}{2}(x - 1)^2 - 1,$$
where $f$ is bounded by $g$ from below, as follows:
$$f(x) - g(x) = \frac{1}{2}(x + 1)^2 \ge 0,$$
but the minimizers do not coincide, as follows:
$$\arg\min_x f(x) = 0 \ne 1 = \arg\min_x g(x).$$
In the case of the trace objective and its lower bound, however, the minimizers do coincide, as explained in Proposition 2 of Section 5.
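The general point that a lower bound can have a different minimizer can be checked numerically; the pair of functions below is an illustrative choice ($f(x) = x^2$ bounded below by $g(x) = \tfrac{1}{2}(x-1)^2 - 1$), not taken from the original appendix.

```python
import numpy as np

# Illustrative pair: f(x) = x^2 is bounded below by g(x) = 0.5*(x - 1)^2 - 1,
# since f(x) - g(x) = 0.5*(x + 1)^2 >= 0, yet their minimizers differ (0 vs. 1).
x = np.linspace(-5.0, 5.0, 100001)
f = x**2
g = 0.5 * (x - 1.0) ** 2 - 1.0

assert (f >= g - 1e-9).all()               # the lower bound holds on the grid
assert abs(x[f.argmin()]) < 1e-6           # argmin f = 0
assert abs(x[g.argmin()] - 1.0) < 1e-6     # argmin g = 1
```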
4. An Alternative Proof of (9)
The alternative proof that we propose extends the partial derivative and the first-order condition to the space of non-square matrices.
Proof.
Let $\mathbb{R}^{m \times n}$ be the space of $m \times n$ matrices equipped with the Frobenius inner product ([6]):
$$\langle X, Y \rangle = \mathrm{tr}(X Y'), \quad X, Y \in \mathbb{R}^{m \times n}.$$
By Theorem 1 of [7] (p. 243), there exists an $m \times m$ matrix Lagrange multiplier $\Gamma$ such that the Lagrangian
$$\mathcal{L}(G, \Gamma) = \mathrm{tr}(S G W_h G' S') + \mathrm{tr}(\Gamma' (G S - I_m))$$
is stationary at its minimum point. This means that, at the minimum, the directional derivative (or Gateaux differential, as defined on page 171 of [7]) of $\mathcal{L}$ is zero for any direction $H \in \mathbb{R}^{m \times n}$.
By a direct calculation,
$$\mathcal{L}(G + tH, \Gamma) - \mathcal{L}(G, \Gamma) = t \left\{ \mathrm{tr}(S H W_h G' S') + \mathrm{tr}(S G W_h H' S') + \mathrm{tr}(\Gamma' H S) \right\} + t^2\, \mathrm{tr}(S H W_h H' S').$$
By rearranging the terms and taking the limit in $t \to 0$, the quadratic term disappears, and the derivative becomes the following:
$$\lim_{t \to 0} \frac{\mathcal{L}(G + tH, \Gamma) - \mathcal{L}(G, \Gamma)}{t} = \mathrm{tr}(S H W_h G' S') + \mathrm{tr}(S G W_h H' S') + \mathrm{tr}(\Gamma' H S) = \mathrm{tr}\big( H (2 W_h G' S' S + S \Gamma') \big),$$
where the second equality uses the cyclic property of the trace and the symmetry of $W_h$. Since this has to hold for all $H \in \mathbb{R}^{m \times n}$,
$$2 W_h G' S' S + S \Gamma' = 0.$$
Multiplying $S' W_h^{-1}$ on both sides from the left and using $S' G' = (G S)' = I_m$ gives the following:
$$2 S' S + S' W_h^{-1} S \Gamma' = 0 \iff \Gamma' = -2 (S' W_h^{-1} S)^{-1} S' S.$$
Thus, the formula is as follows:
$$G' = -\frac{1}{2} W_h^{-1} S \Gamma' (S' S)^{-1} = W_h^{-1} S (S' W_h^{-1} S)^{-1} \iff G = (S' W_h^{-1} S)^{-1} S' W_h^{-1},$$
which is (9).
□
The proof essentially uses the extension of a partial derivative and solves the first-order condition. Since the objective function is convex quadratic in $G$ and the constraint is linear, the first-order condition is sufficient.
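The sufficiency of the first-order condition can be illustrated numerically: no perturbation of the candidate minimizer that preserves the constraint lowers the trace. The hierarchy and covariance below are hypothetical, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical hierarchy: n = 4 series, m = 3 bottom-level series.
A = np.array([[1.0, 1.0, 1.0]])
m = A.shape[1]
S = np.vstack([A, np.eye(m)])
n = S.shape[0]

# Hypothetical invertible base-forecast-error covariance W_h.
X = rng.standard_normal((n, n))
W = X @ X.T + n * np.eye(n)
Winv = np.linalg.inv(W)
G = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)   # the claimed minimizer

def objective(Gmat):
    """Trace of the reconciled error covariance S G W G' S'."""
    return np.trace(S @ Gmat @ W @ Gmat.T @ S.T)

# Project random perturbations onto the null space of S' so that Delta @ S = 0;
# then G + Delta still satisfies the unbiasedness constraint (G + Delta) S = I_m.
P_null = np.eye(n) - S @ np.linalg.solve(S.T @ S, S.T)

base = objective(G)
for _ in range(1000):
    Delta = rng.standard_normal((m, n)) @ P_null
    assert np.allclose((G + Delta) @ S, np.eye(m))
    assert objective(G + Delta) >= base - 1e-6   # no feasible perturbation improves the trace
```

At the optimum, the cross term of the quadratic expansion vanishes for every feasible direction, so each perturbed objective exceeds the base value by a nonnegative quadratic term.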
5. An Extension of the Alternative Proof
The proof can be applied to the environment of weighted trace minimization as in Theorem 3.3 of [5]. To motivate the extension, suppose we have the base forecast of the variables in the GDP expenditure approach and want to reconcile it to satisfy
$$GDP_t = C_t + I_t + G_t + NX_t,$$
where $GDP_t$ is GDP, $C_t$ is consumption, $I_t$ is investment, $G_t$ is government expenditure, and $NX_t$ is net export. The minimum trace reconciliation minimizes the variance of forecast error with equal weights, as follows:
$$\min_G\ \mathrm{tr}[S G W_h G' S'] = \min_G\ \sum_{i=1}^{n} \mathrm{Var}_T[y_{i,T+h} - \tilde{y}_{i,T+h}],$$
subject to the unbiasedness constraint $G S = I_m$. Since the forecast of GDP often attracts more attention than the others, a natural question is whether it is possible to improve the forecast of some variables at the expense of other variables by adjusting the weights.
Such a specification can be expressed as a weighted trace, as follows:
$$\min_{G:\, G S = I_m}\ \mathrm{tr}[\Lambda S G W_h G' S'], \quad (25)$$
where $\Lambda$ is an $n \times n$ matrix with its $(i,j)$ element equal to $\lambda_{ij}$. When the weight matrix $\Lambda$ is a diagonal matrix, the objective function is a weighted sum of the variances of forecast errors. Note that the constraint in (25) is the same as that in (11), so the same unbiasedness assumption is still imposed. The same unbiasedness assumption is also reflected in the objective function, since the error covariance $S G W_h G' S'$ is derived using $G S = I_m$.
As [5] showed and [1] proved in their unpublished manuscript, the optimal matrix $G$ is independent of $\Lambda$ as long as $\Lambda$ is symmetric and invertible. Therefore, in practice, one does not need to exercise judgment or estimate how much weight to put on which variable.
Proposition 1.
For any symmetric and invertible matrix $\Lambda$, the solution to (25) is
$$G = (S' W_h^{-1} S)^{-1} S' W_h^{-1}.$$
Proof.
The proof is almost identical to Section 4. Let the Lagrangian be
$$\mathcal{L}(G, \Gamma) = \mathrm{tr}(\Lambda S G W_h G' S') + \mathrm{tr}(\Gamma' (G S - I_m)).$$
Following the same logic as Section 4, the first-order condition leads to
$$\lim_{t \to 0} \frac{\mathcal{L}(G + tH, \Gamma) - \mathcal{L}(G, \Gamma)}{t} = \mathrm{tr}\big( H (2 W_h G' S' \Lambda S + S \Gamma') \big).$$
Since this has to hold for all $H \in \mathbb{R}^{m \times n}$,
$$2 W_h G' S' \Lambda S + S \Gamma' = 0.$$
Multiplying $S' W_h^{-1}$ on both sides from the left and using $S' G' = (G S)' = I_m$ gives the following:
$$2 S' \Lambda S + S' W_h^{-1} S \Gamma' = 0 \iff \Gamma' = -2 (S' W_h^{-1} S)^{-1} S' \Lambda S.$$
The formula follows because $S' \Lambda S$ is a full-rank square matrix and, thus, invertible:
$$G' = -\frac{1}{2} W_h^{-1} S \Gamma' (S' \Lambda S)^{-1} = W_h^{-1} S (S' W_h^{-1} S)^{-1}.$$
□
Intuitively, the fact that the weight matrix does not matter can be interpreted as saying that there is no trade-off between variables: the choice of the matrix $G$ has enough degrees of freedom in mixing the base forecast that the variance of each variable’s forecast error can be minimized variable by variable, without affecting the variances of the other variables’ forecast errors.
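The weight irrelevance in Proposition 1 can be sketched numerically. The hierarchy is hypothetical, and the random positive definite weight is one instance of the symmetric invertible weights the proposition allows.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical hierarchy: n = 4 series, m = 3 bottom-level series.
A = np.array([[1.0, 1.0, 1.0]])
m = A.shape[1]
S = np.vstack([A, np.eye(m)])
n = S.shape[0]

X = rng.standard_normal((n, n))
W = X @ X.T + n * np.eye(n)                       # hypothetical W_h
Winv = np.linalg.inv(W)
G = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)   # formula independent of the weight

# Random symmetric positive definite weight Lambda.
Y = rng.standard_normal((n, n))
Lam = Y @ Y.T + n * np.eye(n)

def weighted_objective(Gmat):
    """Weighted trace tr(Lambda S G W G' S')."""
    return np.trace(Lam @ S @ Gmat @ W @ Gmat.T @ S.T)

# Feasible perturbations (Delta @ S = 0) never lower the weighted objective either,
# so the same G minimizes the weighted trace for this Lambda as well.
P_null = np.eye(n) - S @ np.linalg.solve(S.T @ S, S.T)
base = weighted_objective(G)
for _ in range(1000):
    Delta = rng.standard_normal((m, n)) @ P_null
    assert weighted_objective(G + Delta) >= base - 1e-6
```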
Mathematically, the proof shares an almost identical structure with that of Section 4, which is the special case $\Lambda = I_n$. Since a symmetric invertible matrix can be factorized as $\Lambda = P' P$ from Takagi’s factorization, the objective function can be written as
$$\mathrm{tr}[\Lambda S G W_h G' S'] = \mathrm{tr}[(P S) G W_h G' (P S)']$$
for any full-rank square matrix $P$. In fact, since the proof only requires $S' \Lambda S = (P S)'(P S)$ to be invertible, one can extend $P$ to be a non-square matrix and show that the objective function of (11) is a special case of that of (25).
Proposition 2.
There exists a weight $\Lambda$ such that the objective function of (11) equals that of (25).
Proof.
Let $P = (0_{m \times (n-m)}, I_m)$ and $\Lambda = P' P$. Then $P S = I_m$, and the objective function of (25) collapses to that of (11):
$$\mathrm{tr}[\Lambda S G W_h G' S'] = \mathrm{tr}[(P S) G W_h G' (P S)'] = \mathrm{tr}[G W_h G'].$$
□
Note that since $P S = I_m$ is invertible, the proof of Proposition 1 can be applied to (11) to show (9). This is one way to see why the proof of [1] reaches the same formula. One insight from the right-hand side of the lower bound, $\mathrm{tr}[G W_h G']$, is that it represents the summed variance of the forecast errors of the most disaggregated variables. Thus, minimizing the summed variance of all variables is equivalent to minimizing the summed variance of the most disaggregated variables.
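The choice of weight in Proposition 2 can be verified numerically; the hierarchy below is hypothetical, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical hierarchy: n = 4 series, m = 3 bottom-level series.
A = np.array([[1.0, 1.0, 1.0]])
m = A.shape[1]
S = np.vstack([A, np.eye(m)])
n = S.shape[0]

# The weight that keeps only the bottom-level block: P selects the last m rows.
P = np.hstack([np.zeros((m, n - m)), np.eye(m)])
Lam = P.T @ P

# P S = I_m because the last m rows of S are the identity, so S' Lam S = I_m is invertible.
assert np.allclose(P @ S, np.eye(m))

# For an arbitrary G, the weighted objective tr(Lam S G W G' S') collapses to tr(G W G').
G = rng.standard_normal((m, n))
X = rng.standard_normal((n, n))
W = X @ X.T + n * np.eye(n)
lhs = np.trace(Lam @ S @ G @ W @ G.T @ S.T)
rhs = np.trace(G @ W @ G.T)
assert np.isclose(lhs, rhs)
```

Note that this $\Lambda$ is singular, which is why the extension to non-square $P$ with $(PS)'(PS)$ invertible is needed.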
In summary, the extension to allow a general weight highlights two observations. First, the irrelevance of the weight implies that the objective function, being the trace of the forecast error covariance matrix, is not essential, although the approach is called minimum trace reconciliation in the literature. What is essential is the unbiasedness assumption, and thus, it could alternatively be called an optimal unbiased reconciliation. Second, the irrelevance of the weight suggests that (9) reconciles the base forecast as if the forecast error variance of each variable could be minimized independently, but, at the same time, (9) can be obtained by minimizing the variance of only the bottom-level variables. The extension suggests that these two apparently contradictory interpretations can coexist.
6. Conclusions
In this paper, we have provided an alternative proof to the minimum trace reconciliation developed by [1], filling a gap in their proof. We have also shown that an almost identical proof can be used to prove [5], so both the trace and weighted trace can be analyzed from a unified perspective. We believe the alternative simpler proof provides additional insights and contributes to deepening the understanding of the minimum trace reconciliation.
Author Contributions
S.A. and F.N. wrote and reviewed the paper together. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Acknowledgments
We thank Daniele Girolimetto, Olivier Sprangers, Shanika Wickramasuriya, Yangzhuoran Fin Yang, Machiko Narita, and other colleagues at the IMF for their useful comments. The views expressed in this paper are those of the authors and do not necessarily represent the views of the IMF, its Executive Board, or IMF management.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Wickramasuriya, S.L.; Athanasopoulos, G.; Hyndman, R.J. Optimal Forecast Reconciliation for Hierarchical and Grouped Time Series Through Trace Minimization. J. Am. Stat. Assoc. 2019, 114, 804–819.
- Taieb, S.B.; Taylor, J.W.; Hyndman, R.J. Hierarchical Probabilistic Forecasting of Electricity Demand with Smart Meter Data. J. Am. Stat. Assoc. 2021, 116, 27–43.
- Ando, S.; Kim, T. Systematizing Macroframework Forecasting: High-Dimensional Conditional Forecasting with Accounting Identities. IMF Econ. Rev. 2023.
- Athanasopoulos, G.; Gamakumara, P.; Panagiotelis, A.; Hyndman, R.J.; Affan, M. Hierarchical forecasting. In Macroeconomic Forecasting in the Era of Big Data; Springer: Berlin/Heidelberg, Germany, 2020.
- Panagiotelis, A.; Athanasopoulos, G.; Gamakumara, P.; Hyndman, R.J. Forecast Reconciliation: A Geometric View with New Insights on Bias Correction. Int. J. Forecast. 2021, 37, 343–359.
- Horn, R.A.; Johnson, C.R. Matrix Analysis, 2nd ed.; Cambridge University Press: Cambridge, UK, 2013.
- Luenberger, D.G. Optimization by Vector Space Methods; John Wiley & Sons: Hoboken, NJ, USA, 1969.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the IMF. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).