Heuristic Cross-Temporal Reconciliation Approaches Applied to Heterogeneous Models in Photovoltaic Forecasting

Gudiño-Ochoa, Alberto; Calderón-González, Harold Felipe

doi:10.3390/computers15070425



AgnaLab, New York, NY 10003, USA

* Author to whom correspondence should be addressed: Harold Felipe Calderón-González.

Computers 2026, 15(7), 425; https://doi.org/10.3390/computers15070425

Submission received: 3 May 2026 / Revised: 19 June 2026 / Accepted: 29 June 2026 / Published: 1 July 2026

(This article belongs to the Special Issue Machine Learning and Statistical Learning with Applications (2nd Edition))


Abstract

Forecast reconciliation has been widely studied in cross-sectional and temporal hierarchies, but its role in cross-temporal settings for photovoltaic (PV) forecasting remains insufficiently examined. In particular, the relative benefits of reconciliation across heterogeneous forecasting approaches, including statistical, machine learning, deep learning, and foundation models, have not been clearly established. This study addresses that gap by evaluating direct, univariate, and iterative cross-temporal reconciliation strategies applied to TBATS, LightGBM, KAN, NBEATSx, NHITS, and TimeGPT using Belgian PV generation data from 2020 to 2025 across weekly, daily, and hourly frequencies and national, regional, and provincial levels. Model performance is assessed through 52-week walk-forward cross-validation, which covers a full year. Under the fixed-configuration experimental protocol adopted in this study, the results show that the gains from reconciliation vary substantially across forecasting families. LightGBM achieved the largest observed gains, with its univariate and iterative schemes reducing global error by up to 19.6% relative to the Bottom-Up benchmark. KAN, NHITS, and NBEATSx also benefited from reconciliation, with their best reconciled variants yielding reductions of up to 11.9%. TimeGPT and TBATS achieved reductions of up to 9.2% and 14.5%, respectively, although their global errors were higher than those obtained by the best machine learning and deep learning configurations in this evaluation. Across the fixed baseline configurations considered here, LightGBM obtained the lowest global errors before and after reconciliation. These findings show that cross-temporal reconciliation can be an effective post-processing strategy, but its impact depends strongly on the underlying base forecasting model.
Therefore, the observed advantage of LightGBM should be interpreted as conditional on the adopted feature set, implementations, and baseline configurations.

Keywords:

photovoltaic; cross-temporal reconciliation; hierarchical time series; renewable energy

1. Introduction

Photovoltaic (PV) forecasting is essential for the operation of modern power systems. Forecast errors affect scheduling, market participation, reserve allocation, and grid balance, with direct economic consequences for both producers and system operators [1,2,3,4]. Therefore, improving forecast quality is not only a modeling objective, but also a practical requirement for the reliable and cost-effective integration of solar generation into electrical systems [5,6,7,8,9].

Recent studies in PV forecasting have largely focused on improving model accuracy. Statistical methods remain relevant because of their interpretability and relatively low complexity, although they may become less effective when nonlinearities and multi-scale patterns are more pronounced [10,11]. Machine learning models have expanded the forecasting toolkit by learning flexible relationships from lagged, calendar, and meteorological features [12,13,14,15,16]. Deep learning architectures have extended this line of work by modeling complex temporal dependencies, while more recent transformer-based and foundation models have introduced the possibility of transfer across domains through large-scale pretraining [17,18,19,20,21,22]. However, most studies in PV forecasting still focus primarily on improving the base forecasting model itself. In practice, PV systems may involve extensive hierarchies in which coherence across aggregated levels is also required [23,24,25,26].

In hierarchical PV systems, accuracy alone is therefore not sufficient. Forecasts are often required simultaneously at several temporal resolutions and aggregation levels. When these forecasts are generated independently, they may violate aggregation constraints: lower-level forecasts may fail to match higher-level totals, and high-frequency forecasts may be inconsistent with lower-frequency aggregates. These inconsistencies reduce the practical value of forecasts because different planning layers may end up operating with conflicting views of future generation. In this setting, forecast quality is determined not only by predictive accuracy but also by coherence across the temporal and cross-sectional dimensions [27,28,29,30,31].

Forecast reconciliation addresses this issue as an ex post mechanism that modifies base forecasts to satisfy aggregation constraints while exploiting the relationships among related series. In cross-sectional hierarchies, reconciliation enforces coherence across aggregation levels. In temporal hierarchies, it aligns forecasts produced at different sampling frequencies. Cross-temporal reconciliation extends these two settings by seeking forecasts that are coherent simultaneously across space and time. Over the last few years, this framework has evolved from heuristic and sequential procedures to optimal combination formulations, iterative alternatives, and broader analyses of their statistical and computational properties [27,28,29,31]. Applications in energy forecasting have also shown that cross-temporal reconciliation can improve forecast accuracy [32,33,34,35].

However, the empirical evidence on cross-temporal reconciliation remains limited in two specific respects. First, a previous PV study using the same Belgian dataset examined temporal hierarchical reconciliation across heterogeneous forecasting models, but did not evaluate the joint effect of temporal and cross-sectional coherence constraints [22]. Second, preliminary cross-temporal evidence has considered direct, univariate, and iterative reconciliation schemes, but the role of the base forecasting paradigm and the reconciliation strategy still requires a clearer assessment under a common PV forecasting protocol [36]. This distinction is important because statistical models, machine learning models, deep learning architectures, and foundation models differ in how they represent temporal dependence, use exogenous information, and distribute forecast errors across aggregation levels. There is therefore little reason to expect cross-temporal reconciliation to affect all forecasting paradigms in the same way [27,31].

Accordingly, this study evaluates cross-temporal reconciliation in a PV generation forecasting setting using representative models from four forecasting families: statistical, machine learning, deep learning, and foundation models. Base forecasts are generated independently and then reconciled through three established strategies: a direct approach, in which temporal and cross-sectional constraints are imposed jointly; a univariate approach, in which one dimension is reconciled explicitly while the other is enforced through bottom-up aggregation; and an iterative approach, which alternates reconciliation across both dimensions until convergence [29]. The analysis is conducted under a 52-week walk-forward cross-validation protocol that covers a full annual PV cycle, with forecasts evaluated across weekly, daily, and hourly resolutions and across national, regional, and provincial aggregation levels.

The contribution of this work is therefore empirical. It clarifies how established cross-temporal reconciliation strategies behave when applied to heterogeneous PV forecasting models under the same experimental design. Specifically, the study compares whether the gains from reconciliation depend on the forecasting paradigm, the reconciliation strategy, the temporal frequency, and the cross-sectional aggregation level. This design distinguishes the present work from temporal-only reconciliation studies and from more limited cross-temporal evaluations by focusing on the joint temporal–cross-sectional setting and on the interaction between reconciliation strategy and base model family.

The subsequent sections of this paper are organized as follows. Section 2 elaborates on the cross-temporal reconciliation framework and the estimators under consideration. Section 3 outlines the base forecasting models employed. Section 4 details the dataset and experimental design adopted. Section 5 presents the empirical findings and their discussion. Section 6 draws the concluding remarks and directions for future research.

2. Cross-Temporal Forecast Reconciliation

In PV systems, forecasts are required simultaneously at different temporal resolutions (weekly, daily, hourly) and at cross-sectional levels (national, regional, provincial) to support control, scheduling, and market participation [20,37,38]. When these dimensions are modeled independently, inconsistencies emerge—hourly trajectories may not match daily or weekly totals, and national forecasts may diverge from the sum of regional or provincial nodes. At the same time, valuable information available at one temporal resolution remains unused by the others.

Cross-temporal reconciliation addresses these limitations by integrating forecasts along both dimensions in a unified framework. Low-frequency patterns can stabilize finer-grained predictions, while high-frequency and local dynamics can enrich aggregated views. The result is a coherent set of forecasts that can also be more accurate, and therefore more useful for coordinated decision making in PV energy management.

2.1. Cross-Sectional and Temporal Summing Matrix

In this study, two hierarchical structures are considered. The first is the cross-sectional hierarchy, which organizes forecasts geographically. At the bottom level there are 11 provincial series. These aggregate into 3 regional totals (Flanders, Wallonia, and Brussels), which in turn sum into the national forecast of Belgium.

Formally, let $P(r)$ denote the set of provinces belonging to region $r$ (with Brussels-Capital forming a singleton set). If $\tilde{y}_{p}$ denotes the forecast for province $p$, $\tilde{y}_{r}$ the forecast for region $r$, and $\tilde{y}_{national}$ the national forecast, the aggregation constraints at any given time are:

$$\tilde{y}_{r} = \sum_{p \in P(r)} \tilde{y}_{p}, \qquad \tilde{y}_{national} = \sum_{r=1}^{3} \tilde{y}_{r}. \tag{1}$$

These relations can be compactly represented as

$$\tilde{y}^{cross} = S_{cross}\, \tilde{y}_{b}^{cross}, \qquad S_{cross} = \begin{bmatrix} \mathbf{1}_{1 \times 11} \\ B_{3 \times 11} \\ I_{11} \end{bmatrix}, \tag{2}$$

where $\tilde{y}_{b}^{cross} \in \mathbb{R}^{11}$ contains the provincial forecasts, $\mathbf{1}_{1 \times 11}$ produces the national total, $B_{3 \times 11}$ aggregates the provincial forecasts into their three corresponding regions, and $I_{11}$ preserves the 11 bottom-level provincial series.
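The construction in Equation (2) can be sketched in NumPy as follows. The province names are Belgium's ten provinces plus Brussels-Capital, but the column ordering used here (Flemish provinces, then Walloon provinces, then Brussels-Capital, matching the hypothetical region order Flanders, Wallonia, Brussels) is an illustrative assumption; the ordering used in the study is not stated.

```python
import numpy as np

# Hypothetical bottom-level ordering: province -> region.
PROVINCE_REGION = {
    "Antwerp": "Flanders", "East Flanders": "Flanders",
    "Flemish Brabant": "Flanders", "Limburg": "Flanders",
    "West Flanders": "Flanders",
    "Hainaut": "Wallonia", "Liège": "Wallonia", "Luxembourg": "Wallonia",
    "Namur": "Wallonia", "Walloon Brabant": "Wallonia",
    "Brussels-Capital": "Brussels",
}
REGIONS = ["Flanders", "Wallonia", "Brussels"]


def build_s_cross(province_region, regions):
    """Stack the national row, the regional block B, and I, as in Equation (2)."""
    provinces = list(province_region)  # dicts keep insertion order
    national = np.ones((1, len(provinces)))
    # B[r, p] = 1 when province p belongs to region r.
    B = np.array([[1.0 if province_region[p] == r else 0.0 for p in provinces]
                  for r in regions])
    return np.vstack([national, B, np.eye(len(provinces))])


S_cross = build_s_cross(PROVINCE_REGION, REGIONS)
y_b = np.arange(1.0, 12.0)       # illustrative provincial forecasts 1, ..., 11
y = S_cross @ y_b                # [national, 3 regions, 11 provinces]
print(S_cross.shape)             # (15, 11)
```

Any provincial vector mapped through $S_{cross}$ is coherent by construction, whereas base forecasts produced independently for all 15 nodes generally are not.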

The second structure is the temporal hierarchy. PV generation is predicted over a one-week forecasting horizon at three temporal resolutions: 1 week ahead, 1–7 days ahead, and 1–168 h ahead. Because each resolution is forecast independently, coherence across resolutions is not guaranteed; for example, the sum of the 168 hourly forecasts may differ from the sum of the seven daily forecasts and from the weekly forecast. Temporal hierarchies provide a principled framework for imposing these aggregation constraints, which can be expressed compactly through a summing matrix. Let $\tilde{y} \in \mathbb{R}^{176}$ denote the vector of reconciled forecasts, ordered as 1 weekly total, 7 daily totals, and 168 hourly values. Then, for each cross-sectional node (national, regional, or provincial):

$$\tilde{y} = S_{temp}\, \tilde{y}_{b}, \tag{3}$$

where $\tilde{y}_{b} \in \mathbb{R}^{168}$ contains the bottom-level hourly forecasts. For this hierarchy,

$$S_{temp} = \begin{bmatrix} \mathbf{1}_{168}^{\top} \\ I_{7} \otimes \mathbf{1}_{24}^{\top} \\ I_{168} \end{bmatrix}, \tag{4}$$

with $\otimes$ denoting the Kronecker product, $I_{m}$ an identity matrix of order $m$, and $\mathbf{1}_{m}$ an $m$-dimensional vector of ones. This enforces consistency between daily totals and their 24 hourly constituents, while ensuring that the weekly forecast equals the sum of all daily values and, equivalently, the sum of all 168 hourly predictions.
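Equation (4) translates almost literally into NumPy through `np.kron`. The sketch below builds $S_{temp}$ and checks, on illustrative random hourly values, that the weekly entry equals the sum of the daily entries and that each daily entry equals the sum of its 24 hours.

```python
import numpy as np

HOURS, DAYS = 24, 7
n_hours = HOURS * DAYS                       # 168 bottom-level values

# Equation (4): weekly row, daily block I_7 (x) 1_24^T, hourly identity.
S_temp = np.vstack([
    np.ones((1, n_hours)),
    np.kron(np.eye(DAYS), np.ones((1, HOURS))),
    np.eye(n_hours),
])

rng = np.random.default_rng(0)
y_b = rng.uniform(0.0, 100.0, size=n_hours)  # illustrative hourly forecasts
y = S_temp @ y_b                             # [1 week, 7 days, 168 hours]
week, days, hours = y[0], y[1:8], y[8:]
print(S_temp.shape)                          # (176, 168)
```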

Figure 1 illustrates the three structures considered. The cross-sectional hierarchy (left) links provincial forecasts to regional and national totals, while the temporal hierarchy (middle) connects hourly, daily and weekly resolutions. Combining these two dimensions yields the cross-temporal hierarchy (right), which enforces coherence at all temporal resolutions for each cross-sectional node. This integrated structure constitutes the basis for the reconciliation strategies evaluated in this study.

2.2. Cross-Temporal Reconciliation Methods and Heuristic Alternatives

All reconciliation methods considered in this study can be expressed within the generalized least squares (GLS) projection framework. Reconciliation is achieved by mapping the vector of base forecasts $\hat{y}$ onto the coherent subspace induced by the summing matrix $S$:

$$\tilde{y} = R\, \hat{y}, \tag{5}$$

where $\tilde{y}$ denotes the reconciled forecasts and $R$ is the reconciliation matrix. As shown by [39], $R$ can be written as

$$R = S G, \qquad G = \left(S^{\top} W^{-1} S\right)^{-1} S^{\top} W^{-1}, \tag{6}$$

where $S$ encodes the hierarchical aggregation structure and $W$ denotes the covariance matrix of the base forecast errors, estimated from the in-sample residuals of the base forecasting models, which governs the magnitude and direction of the forecast adjustments. Within this formulation, $G$ maps the base forecasts to a set of bottom-level forecasts, which $S$ then aggregates, so that $R = SG$ projects the base forecasts onto the coherent subspace spanned by the columns of $S$ while minimizing the reconciled error variance given the covariance structure $W$ [27].
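A minimal sketch of Equations (5) and (6) on a toy two-series hierarchy (a hypothetical example, not data from the study), using OLS weights $W = I$. It illustrates that the reconciled vector is coherent and that $R$ is a projection.

```python
import numpy as np


def reconciliation_matrix(S, W):
    """R = S G with G = (S^T W^-1 S)^-1 S^T W^-1, Equation (6)."""
    W_inv = np.linalg.inv(W)                 # fine for small, well-conditioned W
    G = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)
    return S @ G


# Toy hierarchy: rows are [total, a, b] with total = a + b.
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
y_hat = np.array([10.0, 4.0, 5.0])           # incoherent: 4 + 5 != 10
R_ols = reconciliation_matrix(S, np.eye(3))  # OLS: W = I
y_tilde = R_ols @ y_hat                      # approximately [9.667, 4.333, 5.333]
```

With equal weights, the discrepancy of 1 between the base total and the sum of its children is spread evenly: the total drops by 1/3 and each child rises by 1/3. Because $R$ is idempotent ($RR = R$), forecasts that are already coherent pass through unchanged.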

The Bottom-Up (BU) method, which constructs reconciled forecasts by directly aggregating the bottom-level series, does not arise from the GLS framework and therefore does not rely on any specification of $W$. However, BU is routinely included as a benchmark because it enforces coherence deterministically and provides a natural baseline against which GLS-based reconciliations can be compared.
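Although BU is not a GLS solution, it can still be written in the form $R = SG$ with a fixed selection matrix $G_{BU} = [\,0 \mid I\,]$ in place of the GLS combination matrix. The sketch below assumes, as in Equations (2) and (4), that the bottom-level series are stacked last; the toy hierarchy is hypothetical.

```python
import numpy as np

# Toy hierarchy: rows [total, a, b] with total = a + b; bottom series stored last.
S = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
n_total, n_bottom = S.shape

# BU selects the bottom-level base forecasts: G_BU = [0 | I].
G_bu = np.hstack([np.zeros((n_bottom, n_total - n_bottom)), np.eye(n_bottom)])
R_bu = S @ G_bu

y_hat = np.array([10.0, 4.0, 5.0])
y_bu = R_bu @ y_hat   # [9., 4., 5.]: the base total of 10 is discarded
```

The example makes the trade-off explicit: BU is exactly coherent, but any information carried by the upper-level base forecasts is ignored.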

Under the GLS formulation, distinct reconciliation approaches arise from different specifications of $W$: the identity matrix (OLS), simple structural weights, or fully data-driven covariance matrix estimators. In practice, however, $W$ is unknown and must be approximated from the data. This has motivated a range of approaches that introduce simplified or adaptive weighting schemes, trading some statistical efficiency for numerical stability and scalability.

Table 1 provides a summary of the covariance estimators considered in this study, together with their mathematical form and main assumptions. For methods that require estimating a full covariance matrix, Ledoit–Wolf shrinkage is applied to regularize the sample covariance, improving conditioning and ensuring stable inversion in high-dimensional settings [40,41].
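Table 1 is not reproduced in this section, so the weight matrices below are a sketch under stated assumptions: SS is taken to mean structural scaling, $W = \mathrm{diag}(S\mathbf{1})$, where each diagonal entry counts the bottom-level series aggregated in that row, and COV is taken to be the sample covariance of in-sample residuals shrunk toward a scaled identity with the Ledoit–Wolf (2004) intensity. Whether the study shrinks toward a scaled identity or a diagonal target is not stated; `sklearn.covariance.LedoitWolf` is an off-the-shelf alternative to the hand-written estimator.

```python
import numpy as np


def w_ols(S):
    """OLS: identity weights (W = I)."""
    return np.eye(S.shape[0])


def w_structural(S):
    """Structural scaling (assumed meaning of SS): W = diag(S @ 1)."""
    return np.diag(S.sum(axis=1))


def w_ledoit_wolf(residuals):
    """Ledoit-Wolf shrinkage of the residual covariance toward mu * I.

    residuals: (n_samples, n_series) in-sample errors.
    Returns the shrunk covariance and the shrinkage intensity in [0, 1].
    """
    X = residuals - residuals.mean(axis=0)
    n, p = X.shape
    emp_cov = X.T @ X / n
    mu = np.trace(emp_cov) / p
    # Normalised squared Frobenius distance between sample covariance and target.
    delta = np.sum((emp_cov - mu * np.eye(p)) ** 2) / p
    # Estimated sampling variability of the sample covariance.
    X2 = X ** 2
    beta = (np.sum(X2.T @ X2) / n - np.sum(emp_cov ** 2)) / (n * p)
    shrinkage = 0.0 if delta == 0 else min(beta, delta) / delta
    shrunk = (1.0 - shrinkage) * emp_cov + shrinkage * mu * np.eye(p)
    return shrunk, shrinkage


# Toy hierarchy total = a + b, and more series than residual samples.
S = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
W_ss = w_structural(S)                      # diag([2, 1, 1])
rng = np.random.default_rng(1)
residuals = rng.normal(size=(20, 50))       # the sample covariance is singular
W_cov, lam = w_ledoit_wolf(residuals)
print(np.linalg.eigvalsh(W_cov).min() > 0)  # True: invertible after shrinkage
```

The last line illustrates the point made in the text: with 20 residual samples for 50 series, the raw sample covariance cannot be inverted, whereas the shrunk estimate is positive definite and can be used in Equation (6).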

These estimators provide the foundation for cross-temporal reconciliation. We examine three complementary strategies: a direct method that enforces temporal and cross-sectional constraints in a single step; a univariate alternative that reconciles forecasts along only one hierarchy while imposing coherence on the other through BU aggregation; and an iterative scheme that alternates between both dimensions until convergence.

2.2.1. Direct Cross-Temporal Method

The direct cross-temporal method enforces coherence simultaneously for both the cross-sectional and temporal hierarchies [29], using a joint cross-temporal summing matrix constructed as the Kronecker product of the cross-sectional and temporal summing matrices:

$$S_{CT} = S_{cross} \otimes S_{temp}, \tag{7}$$

where $\otimes$ denotes the Kronecker product. This operation expands the cross-sectional aggregation structure across all temporal aggregation levels and the temporal aggregation structure across all cross-sectional nodes. Thus, $S_{CT}$ maps the bottom-level hourly provincial forecasts to the complete set of coherent cross-temporal aggregates. Since $S_{cross} \in \mathbb{R}^{15 \times 11}$ and $S_{temp} \in \mathbb{R}^{176 \times 168}$, the resulting matrix satisfies $S_{CT} \in \mathbb{R}^{2640 \times 1848}$. Reconciliation is then obtained through the GLS projection described in Section 2.2, with $S$ replaced by $S_{CT}$.

We implemented reconciliation under four specifications: BU, OLS, SS, and COV. Figure 2 illustrates the block structure of $S_{CT}$. The horizontal axis corresponds to the bottom-level series (hourly forecasts for each province), while the vertical axis stacks the higher-level aggregates (regional, national, daily, and weekly). The characteristic staircase pattern reflects how the aggregation constraints are embedded simultaneously in space and time, with each block capturing the contribution of lower-level forecasts to higher-level aggregates.

The main advantage of the direct approach is that it imposes all cross-temporal aggregation constraints in a single step. Its limitation is computational, since the joint summing matrix grows rapidly and covariance estimation becomes demanding under COV.
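The size argument can be made concrete with a short NumPy sketch of Equation (7). The province ordering (five Flemish, five Walloon, then Brussels-Capital) is a hypothetical choice. Under `np.kron`, rows are ordered node by node with each node holding its 176 temporal values, and columns province by province, hour by hour.

```python
import numpy as np

# Cross-sectional summing matrix (Eq. 2) with a hypothetical province order:
# 5 Flemish provinces, 5 Walloon provinces, then Brussels-Capital.
region_of = np.array([0] * 5 + [1] * 5 + [2])
B = (np.arange(3)[:, None] == region_of[None, :]).astype(float)
S_cross = np.vstack([np.ones((1, 11)), B, np.eye(11)])

# Temporal summing matrix (Eq. 4): week, 7 days, 168 hours.
S_temp = np.vstack([np.ones((1, 168)),
                    np.kron(np.eye(7), np.ones((1, 24))),
                    np.eye(168)])

# Equation (7). Columns: province-major, hour-minor (11 x 168 = 1848).
# Rows: node-major, 176 temporal values per node (15 x 176 = 2640).
S_ct = np.kron(S_cross, S_temp)
print(S_ct.shape)  # (2640, 1848)

# Coherence check on random hourly provincial forecasts.
rng = np.random.default_rng(0)
y = S_ct @ rng.uniform(0.0, 50.0, size=S_ct.shape[1])
Y = y.reshape(15, 176)   # row = cross-sectional node, column = temporal index
```

Stored densely, $S_{CT}$ alone occupies roughly 39 MB in double precision; a sparse representation (for example `scipy.sparse.kron`) avoids this. The heavier cost lies in COV, where a full $2640 \times 2640$ covariance matrix, with about 3.5 million distinct entries, must be estimated from in-sample residuals and inverted.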

2.2.2. Univariate Cross-Temporal Method

The univariate strategy applies reconciliation in a single dimension, either temporal or cross-sectional, while the other is imposed deterministically through BU aggregation [29,31,45]. In the temporal-first variant, the provincial forecasts are reconciled in the week–day–hour hierarchy, and the regional and national totals are then obtained by BU aggregation. In the cross-sectional-first variant, the national–regional–provincial hierarchy is reconciled at the hourly level, and temporal consistency is subsequently enforced by aggregating hours into days and weeks.

This approach ensures that forecasts respect both hierarchies: one through GLS adjustment and the other through exact summation. Its main strength lies in simplicity and computational efficiency, as it avoids constructing the full cross-temporal summing matrix [29]. However, only the reconciled dimension benefits from variance–covariance weighting, while the other remains constrained to BU aggregation.
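The temporal-first variant can be sketched as follows, assuming that the forecasts are stored as a matrix whose rows follow $S_{temp}$ and whose columns follow $S_{cross}$, with the provincial columns last. OLS temporal weights are used by default, and the toy hierarchies (one "day" of two "hours"; one region made of two provinces) are hypothetical.

```python
import numpy as np


def reconciliation_matrix(S, W):
    """GLS reconciliation matrix R = S (S^T W^-1 S)^-1 S^T W^-1 (Equation (6))."""
    W_inv = np.linalg.inv(W)
    return S @ np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)


def univariate_temporal_first(Y_hat, S_temp, S_cross, W_temp=None):
    """Temporal GLS reconciliation per province, then cross-sectional BU.

    Y_hat: base forecasts of shape (n_temporal, n_nodes); rows follow S_temp,
    columns follow S_cross, with the provincial (bottom) columns stored last.
    """
    if W_temp is None:
        W_temp = np.eye(S_temp.shape[0])      # OLS weights by default
    R_temp = reconciliation_matrix(S_temp, W_temp)
    n_bottom = S_cross.shape[1]
    # Step 1: reconcile the temporal hierarchy of every province.
    Y_prov = R_temp @ Y_hat[:, -n_bottom:]
    # Step 2: regional and national columns follow by exact aggregation.
    return Y_prov @ S_cross.T


# Toy example: one "day" of two "hours"; one region made of two provinces.
S_toy = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
rng = np.random.default_rng(2)
Y_hat = rng.uniform(0.0, 10.0, size=(3, 3))   # incoherent in both dimensions
Y_tilde = univariate_temporal_first(Y_hat, S_toy, S_toy)
```

In this variant, the base forecasts of the national and regional nodes are not used at all, which is the price of avoiding the full cross-temporal matrix. The cross-sectional-first variant is symmetric: each hourly row is reconciled with $R_{cross}$, and days and weeks are then obtained with $S_{temp}$.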

2.2.3. Iterative Cross-Temporal Method

The iterative method alternates temporal and cross-sectional reconciliation until both hierarchies are satisfied simultaneously [29,31,45]. Let $Y \in \mathbb{R}^{T \times C}$ denote the forecast matrix, with $T$ temporal indices (week, days, hours) and $C$ cross-sectional nodes (national, regional, provincial). Table 2 illustrates this structure.

Let $R_{temp}$ and $R_{cross}$ denote the reconciliation matrices for the temporal and cross-sectional hierarchies, respectively, obtained under the GLS framework described in Section 2.2. In this subsection, the superscript $(t)$ denotes the iteration counter of the alternating reconciliation procedure. The iterative cross-temporal method proceeds in three steps:

Step 1: Temporal reconciliation (column-wise).

At each iteration, temporal reconciliation is applied column-wise, for each cross-sectional node $c = 1, \dots, C$:

$$Y_{\cdot, c}^{(t + \frac{1}{2})} = R_{temp}\, Y_{\cdot, c}^{(t)}, \qquad c = 1, \dots, C,$$

ensuring that hourly forecasts aggregate to daily values, and daily values to the weekly total. Here, $Y_{\cdot, c}^{(t)}$ denotes the vector of forecasts across the temporal hierarchy for cross-sectional node $c$.

Step 2: Cross-sectional reconciliation (row-wise).

The cross-sectional reconciliation is then applied row-wise, for each temporal index $\tau = 1, \dots, T$:

$$Y_{\tau, \cdot}^{(t + 1)} = R_{cross}\, Y_{\tau, \cdot}^{(t + \frac{1}{2})}, \qquad \tau = 1, \dots, T,$$

where $Y_{\tau, \cdot}^{(t)}$ denotes the vector of forecasts across all cross-sectional nodes at temporal index $\tau$. Thus, provinces sum to their regions and regions to the national total.

Step 3: Check aggregation inconsistencies.

We measure the remaining inconsistencies against the summing matrices using the $\ell_{1}$ norm (Manhattan distance):

$$d_{temp}^{(t)} = \sum_{c=1}^{C} \left\| Y_{\cdot, c}^{(t)} - S_{temp}\, Y_{B_{temp}, c}^{(t)} \right\|_{1}, \qquad d_{cross}^{(t)} = \sum_{\tau=1}^{T} \left\| Y_{\tau, \cdot}^{(t)} - S_{cross}\, Y_{\tau, B_{cross}}^{(t)} \right\|_{1}.$$

Here, $B_{temp}$ and $B_{cross}$ denote the sets of bottom-level indices in the temporal and cross-sectional hierarchies, respectively.

The iterations stop once $\max\{d_{temp}^{(t)}, d_{cross}^{(t)}\} < \delta$, with $\delta = 10^{-6}$. The cycle can also start with cross-sectional reconciliation followed by temporal reconciliation. Because the temporal and cross-sectional updates generally do not commute, the intermediate iterates depend on the order in which they are applied; however, once the tolerance criterion is met, the remaining aggregation inconsistencies are negligible under either ordering. The method thus simply alternates the two projections until the forecasts are coherent along both the temporal and cross-sectional hierarchies [29,31]. In the implementation, the maximum number of iterations was set to 300, and an early-stopping patience of 10 iterations was used to avoid unnecessary cycling once the coherence objective stopped improving.
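A minimal sketch of the three steps, assuming that each cross-sectional node $c$ may have its own temporal matrix and each temporal index $\tau$ its own cross-sectional matrix (for example, from node-specific diagonal weights; the weights below are hypothetical). This assumption matters: if a single $R_{temp}$ and a single $R_{cross}$ were shared by all columns and rows, the two steps would act on opposite sides of $Y$ and commute, so one cycle would already yield coherence. The tolerance, iteration cap, and patience follow the values given in the text; the exact early-stopping rule is an assumption.

```python
import numpy as np


def wls_matrix(S, w):
    """GLS reconciliation matrix with diagonal weights W = diag(w)."""
    W_inv = np.diag(1.0 / w)
    return S @ np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)


def incoherence(Y, S_temp, S_cross):
    """Step 3: l1 gaps d_temp and d_cross (bottom levels stored last)."""
    bt, bc = S_temp.shape[1], S_cross.shape[1]
    d_temp = np.abs(Y - S_temp @ Y[-bt:, :]).sum()
    d_cross = np.abs(Y - Y[:, -bc:] @ S_cross.T).sum()
    return d_temp, d_cross


def iterative_reconcile(Y_hat, S_temp, S_cross, R_temp, R_cross,
                        tol=1e-6, max_iter=300, patience=10):
    """Alternate temporal (per column) and cross-sectional (per row) steps.

    R_temp: (C, T, T), one temporal matrix per node c.
    R_cross: (T, C, C), one cross-sectional matrix per temporal index tau.
    """
    Y = np.array(Y_hat, dtype=float)
    best, stall, gap = np.inf, 0, np.inf
    for it in range(1, max_iter + 1):
        Y = np.einsum("cij,jc->ic", R_temp, Y)   # Step 1: Y[:, c] <- R_temp[c] Y[:, c]
        Y = np.einsum("tij,tj->ti", R_cross, Y)  # Step 2: Y[t, :] <- R_cross[t] Y[t, :]
        gap = max(incoherence(Y, S_temp, S_cross))
        if gap < tol:
            break
        if gap < best:                           # early stopping on stalled progress
            best, stall = gap, 0
        else:
            stall += 1
            if stall >= patience:
                break
    return Y, it, gap


# Toy hierarchies (total = two children) with hypothetical node-specific weights.
S = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
rng = np.random.default_rng(3)
R_temp = np.stack([wls_matrix(S, rng.uniform(1.0, 1.5, 3)) for _ in range(3)])
R_cross = np.stack([wls_matrix(S, rng.uniform(1.0, 1.5, 3)) for _ in range(3)])
Y_hat = rng.uniform(0.0, 10.0, size=(3, 3))
Y_tilde, n_iter, gap = iterative_reconcile(Y_hat, S, S, R_temp, R_cross)
```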

3. Heterogeneous Base Models

In this work, the cross-temporal reconciliation framework is evaluated on base models from four families: statistical, machine learning, deep learning, and foundation models. Specifically, we consider TBATS, LightGBM, NHITS, KAN, NBEATSx, and TimeGPT. These models differ in structure and learning mechanism, which allows us to examine how reconciliation interacts with distinct forecasting paradigms.

3.1. TBATS: Trigonometric Box–Cox ARMA Trend Seasonal Model

The TBATS model [46,47] generalizes exponential smoothing by integrating Box–Cox transformations, ARMA errors, trend, and trigonometric seasonal components, allowing it to capture multiple and even fractional seasonal cycles. Variance stabilization is achieved through a Box–Cox transformation with transformation parameter $\lambda_{BC}$:

$$y_{t}^{(\lambda_{BC})} = \begin{cases} \dfrac{y_{t}^{\lambda_{BC}} - 1}{\lambda_{BC}}, & \lambda_{BC} \neq 0, \\ \ln(y_{t}), & \lambda_{BC} = 0, \end{cases} \tag{8}$$

where $\lambda_{BC}$ controls the power transformation and the case $\lambda_{BC} = 0$ corresponds to the natural logarithm. The transformed observation equation is formulated as a composite of level $\ell_{t}$, trend $b_{t}$, trigonometric seasonal components, and ARMA$(p, q)$ disturbances:

$$y_{t}^{(\lambda_{BC})} = \ell_{t-1} + \phi\, b_{t-1} + \sum_{r=1}^{M} s_{t-1}^{(r)} + d_{t}, \qquad s_{t}^{(r)} = \sum_{u=1}^{k_{r}} s_{u,t}^{(r)}. \tag{9}$$

The ARMA disturbance is defined as

$$d_{t} = \sum_{a=1}^{p} \varphi_{a}\, d_{t-a} + \sum_{b=1}^{q} \theta_{b}\, \varepsilon_{t-b} + \varepsilon_{t}. \tag{10}$$

Here, $\ell_{t}$ denotes the level component, $b_{t}$ the trend component, $\phi$ the damping parameter, $M$ the number of seasonal components, $s_{t}^{(r)}$ the $r$th trigonometric seasonal component, $k_{r}$ the number of Fourier harmonics used for the $r$th seasonal component, $d_{t}$ the ARMA disturbance, $p$ and $q$ the autoregressive and moving-average orders, $\varphi_{a}$ and $\theta_{b}$ the AR and MA coefficients, and $\varepsilon_{t}$ the innovation term. Fourier expansions efficiently approximate complex seasonal cycles, making TBATS a strong statistical reference for high-resolution PV forecasting tasks.
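Equation (8) and its inverse, which maps forecasts back to the original scale, can be sketched in a few lines. One practical caveat for PV data: night-time generation is zero, so the logarithmic case (and any negative $\lambda_{BC}$) is undefined at those hours. How the study handled zeros (for example, through an offset or TBATS's own parameter selection) is not stated, and this sketch simply requires positive inputs when $\lambda_{BC} \leq 0$. The sample values are hypothetical.

```python
import numpy as np


def boxcox(y, lam):
    """Box-Cox transform of Equation (8); needs y > 0 when lam <= 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y ** lam - 1.0) / lam


def inv_boxcox(z, lam):
    """Inverse of Equation (8), mapping forecasts back to the original scale."""
    z = np.asarray(z, dtype=float)
    if lam == 0:
        return np.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)


y = np.array([0.5, 12.0, 240.0])   # hypothetical hourly PV energy values (MWh)
z = boxcox(y, 0.3)
y_back = inv_boxcox(z, 0.3)        # recovers y up to rounding
```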

3.2. LightGBM for Time Series Forecasting

The Light Gradient Boosting Machine (LightGBM) is a fast and memory-efficient implementation of gradient-boosted decision trees [48]. Although it was not developed specifically for temporal data, it can be used for forecasting by constructing supervised learning datasets enriched with time-related features. The model is built additively: each successive tree $f_{t}$ is fitted to the negative gradient of the loss evaluated at the current predictions, which for squared-error loss coincides with the residual errors of the previous stage:

$$\hat{y}_{i}^{(t)} = \hat{y}_{i}^{(t-1)} + f_{t}(x_{i}), \qquad f_{t} \in T_{tree}, \tag{11}$$

where $\hat{y}_{i}^{(t)}$ denotes the prediction for observation $i$ after boosting iteration $t$, $f_{t}$ is the regression tree added at that iteration, $x_{i}$ is the corresponding feature vector, and $T_{tree}$ denotes the space of regression trees.

Because the algorithm does not model autocorrelation explicitly, predictive accuracy hinges on feature engineering. Common practices involve including lagged values for short-term dependencies, rolling statistics for local dynamics, expanding-window summaries for long-term memory, and external regressors such as weather variables to reflect environmental drivers of PV generation.
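A minimal sketch of this kind of feature engineering, using only NumPy. The lag set, window length, and calendar encodings are hypothetical, since the study's feature set is not listed in this section, and weather regressors are omitted. Lags are restricted to at least 168 h so that, with a one-week horizon and no recursive prediction, every feature is observable at forecast time.

```python
import numpy as np


def make_features(y, lags=(168, 336), window=24, gap=168):
    """Supervised design matrix for an hourly series (hypothetical feature set).

    Every feature for target hour t uses only data up to hour t - gap, so the
    same row layout is valid for any horizon up to `gap` hours (lags >= gap).
    """
    y = np.asarray(y, dtype=float)
    start = max(max(lags), gap + window)
    t = np.arange(start, len(y))
    cols = [y[t - lag] for lag in lags]                      # lagged values
    csum = np.concatenate([[0.0], np.cumsum(y)])
    end = t - gap + 1                                        # hour t - gap is the last one used
    cols.append((csum[end] - csum[end - window]) / window)   # trailing mean
    cols.append(t % 24)                                      # hour of day
    cols.append((t // 24) % 7)                               # day of week, 0 = first day
    return np.column_stack(cols), y[t]


y = np.arange(1000.0)                # stand-in for an hourly PV series
X, target = make_features(y)
```

The resulting `X` and `target` could then be passed to a gradient-boosting regressor such as `lightgbm.LGBMRegressor` through its `fit` method. If the series starts at 00:00 on a Sunday, day-of-week code 0 corresponds to Sunday.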

3.3. TimeGPT: Foundation Transformer Model

TimeGPT [49] is a foundation model trained on large collections of heterogeneous time series spanning domains such as energy, finance, and IoT. In contrast to locally trained methods, it relies on large-scale pretraining to support zero- and few-shot forecasting, which is advantageous in PV forecasting, where historical depth is often limited and non-stationarity is pronounced.

Its architecture follows a Transformer encoder–decoder with sinusoidal positional encodings:

$$PE(pos, 2i) = \sin\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE(pos, 2i+1) = \cos\left(\frac{pos}{10000^{2i/d_{model}}}\right). \tag{12}$$

Dependencies are modeled using multi-head attention:

$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_{k}}}\right) V, \tag{13}$$

$$\text{MultiHead}(Q, K, V) = \text{Concat}(h_{1}, \dots, h_{H_{att}})\, W^{O}, \tag{14}$$

combined with layer normalization for stability:

$$\text{LN}(X) = \gamma\, \frac{X - \mu}{\sigma + \epsilon} + \beta. \tag{15}$$

Here, $pos$ denotes the temporal position, $i$ indexes the embedding dimension, $d_{model}$ is the model dimension, $Q$, $K$, and $V$ are the query, key, and value matrices, $d_{k}$ is the key dimension, $h_{1}, \dots, h_{H_{att}}$ are the attention-head outputs, $H_{att}$ is the number of attention heads, $W^{O}$ is the output projection matrix, and $X$ denotes the input representation normalized by layer normalization, with $\mu$ and $\sigma$ its mean and standard deviation, $\gamma$ and $\beta$ learnable scale and shift parameters, and $\epsilon$ a small constant for numerical stability.
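The sketch below is a generic, single-head illustration of Equations (12), (13), and (15) only. It is not TimeGPT's implementation: the model's dimensions, learned projections, and layer arrangement are not given in the text, so the token matrix and the residual-plus-normalization step are hypothetical.

```python
import numpy as np


def positional_encoding(n_pos, d_model):
    """Sinusoidal encodings of Equation (12); d_model is assumed even."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angle = pos / 10000.0 ** (2 * i / d_model)
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angle)        # even dimensions
    pe[:, 1::2] = np.cos(angle)        # odd dimensions
    return pe


def attention(Q, K, V):
    """Scaled dot-product attention, Equation (13)."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V


def layer_norm(X, gamma=1.0, beta=0.0, eps=1e-5):
    """Equation (15), normalising each row over its features."""
    mu = X.mean(axis=-1, keepdims=True)
    sigma = X.std(axis=-1, keepdims=True)
    return gamma * (X - mu) / (sigma + eps) + beta


rng = np.random.default_rng(0)
H = rng.normal(size=(168, 16)) + positional_encoding(168, 16)  # one week of hourly tokens
out = layer_norm(H + attention(H, H, H))  # self-attention + residual + normalisation
```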

From a probabilistic standpoint, TimeGPT estimates the conditional distribution of future trajectories:

$$P\left(y_{t+1:t+H_{for}} \mid y_{0:t},\, x_{0:t+H_{for}}\right) = f_{\theta}\left(y_{0:t},\, x_{0:t+H_{for}}\right), \tag{16}$$

where $H_{for}$ denotes the forecast horizon, $y_{0:t}$ is the historical target sequence, $x_{0:t+H_{for}}$ denotes the available covariates, and $f_{\theta}$ is the Transformer-based forecasting function parameterized by $\theta$. By generating forecasts auto-regressively, TimeGPT reduces the need for manual feature design and can capture multi-scale patterns. It has been reported to achieve competitive results across a broad range of forecasting tasks [21].

3.4. Kolmogorov–Arnold Networks (KANs)

Kolmogorov–Arnold Networks (KANs) draw their theoretical basis from the Kolmogorov–Arnold representation theorem, which states that every continuous multivariate function on a bounded domain can be written as a finite superposition of continuous univariate functions and addition. Unlike traditional multilayer perceptrons (MLPs), which apply fixed activation functions at the nodes, KANs place trainable univariate nonlinear mappings on the edges, typically parameterized as spline functions [50].

A general form is:

$$f(x_{1}, \dots, x_{n}) = \sum_{q=1}^{2n+1} \Phi_{q}\left(\sum_{p=1}^{n} \varphi_{q,p}(x_{p})\right), \tag{17}$$

leading to the layer structure:

$$x_{j}^{(l+1)} = \sum_{i=1}^{n_{l}} \varphi_{i,j}^{(l)}\left(x_{i}^{(l)}\right), \qquad j = 1, \dots, n_{l+1}, \tag{18}$$

where $n$ is the input dimension, $p$ and $q$ are local summation indices, $\varphi_{q,p}$ and $\Phi_{q}$ denote the inner and outer univariate functions in the Kolmogorov–Arnold representation, $i$ and $j$ index units in consecutive layers, $n_{l}$ is the number of units in layer $l$, and $\varphi_{i,j}^{(l)}$ is the trainable edge function from unit $i$ to unit $j$. With $\varphi_{i,j}^{(l)}$ modeled as B-splines, the full network is then a composition:

$$\text{KAN}(x) = \left(\Psi_{L-1} \circ \dots \circ \Psi_{0}\right)(x). \tag{19}$$

Here, $\Psi_{l}$ denotes the vector-valued transformation implemented by the $l$th KAN layer, and $L$ is the number of layers.

This edge-based formulation provides two main benefits, as argued in [50]: (i) favorable scaling laws, with test error decreasing proportionally to $N^{-\alpha}$, where $N$ is the number of model parameters, with $\alpha = 4$ for cubic splines compared to $\alpha \leq 1$ for MLPs; and (ii) enhanced interpretability, since the learned spline functions can be inspected directly. In forecasting contexts, KANs thus offer a compact yet expressive deep learning alternative.
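A minimal sketch of Equations (18) and (19). For readability, it replaces the cubic B-splines with degree-1 B-splines (hat functions) on a uniform grid and omits the residual base activation that the original KAN adds to each spline; the grid sizes, coefficients, and the grid recomputation for the second layer are hypothetical simplifications, not the configuration used in the study.

```python
import numpy as np


def hat_basis(x, grid):
    """Degree-1 B-spline (hat) basis on a uniform grid -> (len(x), len(grid))."""
    h = grid[1] - grid[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - grid[None, :]) / h)


def kan_layer(X, coef, grid):
    """One KAN layer, Equation (18): x_j^(l+1) = sum_i phi_ij(x_i^(l)).

    X: (n_samples, n_in); coef: (n_in, n_out, n_grid), so that
    phi_ij(u) = sum_k coef[i, j, k] * B_k(u).
    """
    out = np.zeros((X.shape[0], coef.shape[1]))
    for i in range(X.shape[1]):
        B = hat_basis(X[:, i], grid)     # (n_samples, n_grid)
        out += B @ coef[i].T             # adds phi_ij(x_i) for every output j
    return out


# Two-layer composition as in Equation (19), with random coefficients.
rng = np.random.default_rng(0)
grid1 = np.linspace(-1.0, 1.0, 9)
X = rng.uniform(-1.0, 1.0, size=(5, 3))
hidden = kan_layer(X, rng.normal(size=(3, 4, grid1.size)), grid1)
# Hypothetical stand-in for KAN grid updating: span the observed hidden range.
grid2 = np.linspace(hidden.min(), hidden.max(), 9)
y = kan_layer(hidden, rng.normal(size=(4, 1, grid2.size)), grid2)
```

Because each hat function equals 1 at its own knot and 0 at all others, each edge function returns its coefficient exactly at the grid points. This is the sense in which the learned univariate functions are directly inspectable.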

3.5. NBEATSx: Neural Basis Expansion with Exogenous Variables

NBEATS [51] is a deep residual architecture built from fully connected blocks; in its interpretable configuration, it decomposes time series into trend and seasonal components. Its extension, NBEATSx, incorporates exogenous variables, which are especially relevant in PV forecasting because of the role of weather [52].

Each block $b$ within stack $s$ derives features from lagged inputs and exogenous regressors:

$$h_{s,b} = \text{FCNN}_{s,b}\left(y_{s,b-1}^{back}, X_{s,b-1}\right), \qquad \theta_{s,b}^{back} = W^{back} h_{s,b}, \qquad \theta_{s,b}^{for} = W^{for} h_{s,b}, \tag{20}$$

which are then expanded through basis functions:

$$\hat{y}_{s,b}^{back} = V_{s,b}^{back}\, \theta_{s,b}^{back}, \qquad \hat{y}_{s,b}^{for} = V_{s,b}^{for}\, \theta_{s,b}^{for}. \tag{21}$$

Residual connections propagate across blocks and stacks, with each block removing its backcast from the input it received:

$$y_{s,b}^{back} = y_{s,b-1}^{back} - \hat{y}_{s,b}^{back}, \qquad \hat{y}^{for} = \sum_{s=1}^{S} \sum_{b=1}^{B} \hat{y}_{s,b}^{for}. \tag{22}$$

Here, $s = 1, \dots, S$ indexes stacks, $b = 1, \dots, B$ indexes blocks, $S$ and $B$ denote the total numbers of stacks and blocks, respectively, $h_{s,b}$ is the hidden representation produced by the fully connected network, $y_{s,b}^{back}$ is the backcast residual passed between blocks, $X_{s,b-1}$ contains the exogenous regressors, $\theta_{s,b}^{back}$ and $\theta_{s,b}^{for}$ are the backcast and forecast coefficient vectors, and $W^{back}$, $W^{for}$, $V_{s,b}^{back}$, and $V_{s,b}^{for}$ are projection or basis matrices.

Two variants are typical: an interpretable design using polynomial, harmonic, and exogenous bases, and a generic form that trades transparency for flexibility. Both configurations employ specialized encoders for covariates to preserve temporal alignment.
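The doubly residual mechanics of Equations (20)–(22) can be sketched for a single stack with an interpretable polynomial (trend) basis. Two simplifications are hypothetical and labeled as such: exogenous regressors are omitted, and the learned fully connected network is replaced by a least-squares trend fit whose coefficients are reused for the forecast. This shows how a shared polynomial basis turns backcast coefficients into a trend extrapolation over the horizon.

```python
import numpy as np


def poly_basis(times, degree):
    """Polynomial (trend) basis: columns times**0 ... times**degree."""
    return np.vander(times, degree + 1, increasing=True)


def nbeats_stack(y_back, blocks):
    """Doubly residual pass of Equations (20)-(22) for one stack.

    blocks: list of (coef_fn, V_back, V_for); coef_fn maps the residual it
    receives to (theta_back, theta_for), standing in for FCNN + W^back/W^for.
    """
    residual = np.array(y_back, dtype=float)
    forecast = np.zeros(blocks[0][2].shape[0])
    for coef_fn, V_back, V_for in blocks:
        theta_back, theta_for = coef_fn(residual)
        residual = residual - V_back @ theta_back   # Eq. (22): backcast removed
        forecast = forecast + V_for @ theta_for     # Eq. (22): partial forecasts summed
    return forecast, residual


L, H = 24, 6
t_back = np.arange(L) / L                 # normalised input-window times
t_for = (L + np.arange(H)) / L            # continuation into the horizon
V_back, V_for = poly_basis(t_back, 2), poly_basis(t_for, 2)


def trend_block(residual):
    # Hypothetical block: least-squares trend fit reused as forecast coefficients.
    theta = np.linalg.lstsq(V_back, residual, rcond=None)[0]
    return theta, theta


y_hist = 2.0 + 3.0 * t_back + t_back ** 2        # pure quadratic trend
forecast, residual = nbeats_stack(y_hist, [(trend_block, V_back, V_for)])
```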

3.6. NHITS: Neural Hierarchical Interpolation for Time Series

NHITS builds on the N-BEATS family to enhance long-horizon forecasts through multi-rate sampling and hierarchical interpolation, promoting specialization across temporal frequencies [53].

For block $\ell$, an input subsequence $y_{t-L:t,\ell}$ is pooled to highlight a temporal scale:

$$y_{t-L:t,\ell}^{(p)} = \text{MaxPool}\left(y_{t-L:t,\ell},\, k_{\ell}\right), \tag{23}$$

processed by an MLP into backcast and forecast coefficients:

$$h_{\ell} = \text{MLP}_{\ell}\left(y_{t-L:t,\ell}^{(p)}\right), \tag{24}$$

$$\theta_{f}^{\ell} = W_{f}\, h_{\ell}, \qquad \theta_{b}^{\ell} = W_{b}\, h_{\ell}, \tag{25}$$

and expanded through an interpolation function $I$:

$$\hat{y}_{t+1:t+H,\ell} = I\left(\xi, \theta_{f}^{\ell}\right), \qquad \tilde{y}_{t-L:t,\ell} = I\left(\xi, \theta_{b}^{\ell}\right). \tag{26}$$

In this subsection, $\ell = 1, \dots, B$ denotes the NHITS block index, $L$ is the input window length, $H$ is the forecast horizon, $k_{\ell}$ is the pooling kernel associated with block $\ell$, $h_{\ell}$ is the hidden representation produced by the MLP, $\theta_{f}^{\ell}$ and $\theta_{b}^{\ell}$ are the forecast and backcast coefficient vectors, $\xi$ denotes the interpolation grid, and $I(\cdot)$ denotes the interpolation function used to expand these coefficients.

Block outputs accumulate additively:

$$\hat{y}_{t+1:t+H} = \sum_{\ell=1}^{B} \hat{y}_{t+1:t+H,\ell}, \tag{27}$$

while residuals refine the inputs iteratively:

$$y_{t-L:t,\ell+1} = y_{t-L:t,\ell} - \tilde{y}_{t-L:t,\ell}. \tag{28}$$

This doubly residual, multi-rate design encourages blocks to specialize in different frequencies, reducing redundancy and improving both accuracy and interpretability for long-horizon PV forecasts.
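The data flow of Equations (23)–(28) can be traced with a minimal NumPy forward pass. This is a sketch under stated assumptions, not a trained model: the weights are random, each MLP is a single ReLU layer, pooling is non-overlapping (stride equal to the kernel), $I(\cdot)$ is linear interpolation via `np.interp`, and the pooling kernels and coefficient counts are hypothetical rather than the settings used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

def max_pool(x: np.ndarray, k: int) -> np.ndarray:
    """Eq. (23): non-overlapping max pooling (stride = k); assumes len(x) % k == 0."""
    return x.reshape(-1, k).max(axis=1)

def interpolate(theta: np.ndarray, length: int) -> np.ndarray:
    """Eq. (26) with a linear I: place len(theta) knots evenly over `length` steps."""
    knots = np.linspace(0, length - 1, num=len(theta))
    return np.interp(np.arange(length), knots, theta)

def make_block(L: int, k: int, n_theta_f: int, n_theta_b: int, hidden: int = 32) -> dict:
    """Random, untrained weights for one block (hypothetical sizes)."""
    return {
        "k": k,
        "W1": rng.normal(scale=0.1, size=(hidden, L // k)),
        "W_f": rng.normal(scale=0.1, size=(n_theta_f, hidden)),
        "W_b": rng.normal(scale=0.1, size=(n_theta_b, hidden)),
    }

def nhits_forward(y_window: np.ndarray, blocks: list, H: int):
    residual = y_window.astype(float).copy()             # y_{t-L:t,1}
    forecast = np.zeros(H)
    for blk in blocks:
        pooled = max_pool(residual, blk["k"])             # Eq. (23)
        h = np.maximum(blk["W1"] @ pooled, 0.0)           # Eq. (24): one ReLU layer
        theta_f, theta_b = blk["W_f"] @ h, blk["W_b"] @ h  # Eq. (25)
        forecast += interpolate(theta_f, H)               # Eqs. (26)-(27)
        residual -= interpolate(theta_b, len(residual))   # Eq. (28)
    return forecast, residual

L = H = 168
# Coarse-to-fine blocks: fewer coefficients for larger pooling kernels.
blocks = [make_block(L, k, n_theta_f=H // k, n_theta_b=L // k) for k in (24, 8, 1)]
y = np.clip(np.sin(2 * np.pi * np.arange(L) / 24), 0.0, None)  # toy PV-like daily cycle
forecast, backcast_residual = nhits_forward(y, blocks, H)
```

The first block sees a heavily pooled input and emits only 7 forecast knots, so it can only describe slow variation; the last block works at full resolution on what the earlier blocks left unexplained.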

4. Data and Features

This study uses the open Elia dataset "PV power production estimation and forecast on the Belgian grid (Historical)", distributed under the Elia Open Data License. The dataset reports quarter-hourly measurements of realized PV generation, monitored capacity, and load factor. Our analysis covers the period from 5 January 2020 to 1 March 2025 and focuses on realized generation. The dataset encompasses 15 hierarchically structured series: one national-level series, three regional aggregates (Brussels, Flanders, and Wallonia), and eleven provincial-level series. The spatial organization of these Belgian PV generation nodes follows the structure previously reported by Calderón-González and Gudiño-Ochoa [22], where the national, regional, and provincial nodes are illustrated in Figure 3 of the referenced article.
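The 15-node cross-sectional hierarchy can be encoded as a summing matrix $S$ that maps the provincial (bottom-level) series to every node. This section does not list which provinces belong to which region, so the sketch below assumes Brussels is a single bottom-level node and that Flanders and Wallonia each contain five provinces; the `REGIONS` mapping and the toy values are hypothetical.

```python
import numpy as np

# Hypothetical grouping of the 11 provincial series under the 3 regions.
REGIONS = {"Brussels": 1, "Flanders": 5, "Wallonia": 5}

def summing_matrix(children_per_region: dict) -> np.ndarray:
    """Stack national, regional, and provincial rows into S (15 x 11 here)."""
    n_bottom = sum(children_per_region.values())
    national = np.ones((1, n_bottom))
    regional = np.zeros((len(children_per_region), n_bottom))
    start = 0
    for r, n_children in enumerate(children_per_region.values()):
        regional[r, start:start + n_children] = 1.0   # provinces of region r
        start += n_children
    return np.vstack([national, regional, np.eye(n_bottom)])

S = summing_matrix(REGIONS)
provincial = np.arange(1.0, 12.0)   # toy provincial values, e.g. MWh in one hour
all_nodes = S @ provincial          # national, 3 regional, 11 provincial values
```

Any vector of the form `S @ provincial` is cross-sectionally coherent by construction: the national value equals the sum of the regional values, which equal the sums of their provinces.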

For the forecasting analysis, the original 15 min records were converted to hourly series. Because the raw PV generation data were recorded as power in megawatts (MW), each quarter-hourly observation was first converted into energy by multiplying it by 0.25 h, after which the four intra-hour values were summed to yield hourly generation in megawatt-hours (MWh). Daily and weekly series were then derived by direct aggregation of the hourly data, with weekly periods defined to begin on Sundays. This transformation preserves physical consistency in the conversion from power to energy and ensures additivity across temporal aggregation levels.
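The power-to-energy conversion and the Sunday-start weekly aggregation can be sketched in pandas. The sketch assumes each timestamp marks the start of its 15 min interval; the function name `to_energy_levels` is hypothetical.

```python
import pandas as pd

def to_energy_levels(power_mw: pd.Series) -> dict:
    """15-min power (MW) -> hourly, daily, and Sunday-start weekly energy (MWh)."""
    energy_mwh = power_mw * 0.25                      # MW x 0.25 h = MWh
    hourly = energy_mwh.groupby(energy_mwh.index.floor("h")).sum()
    daily = hourly.groupby(hourly.index.floor("D")).sum()
    # dayofweek: Monday = 0, ..., Sunday = 6; shift each day back to its Sunday.
    days_since_sunday = (daily.index.dayofweek + 1) % 7
    week_start = daily.index - pd.to_timedelta(days_since_sunday, unit="D")
    weekly = daily.groupby(week_start).sum()
    return {"hourly": hourly, "daily": daily, "weekly": weekly}

# Two weeks of a constant 4 MW, starting on Sunday 5 January 2020.
idx = pd.date_range("2020-01-05", periods=14 * 96, freq="15min")
levels = to_energy_levels(pd.Series(4.0, index=idx))
```

Here each hour holds 4 MWh, each day 96 MWh, and each week 672 MWh. Because every coarser level is a plain sum of the finer one, totals agree across levels, which is the additivity the temporal hierarchy relies on.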

Examples of the series at different aggregation levels are shown in Figure 3, spanning January 2020 to March 2025. The national time series exhibits pronounced seasonal dynamics and notable inter-annual variability, whereas the regional and provincial series underscore substantial heterogeneity in both magnitude and relative contribution across Belgian territories.

4.1. Exogenous Features

Two groups of exogenous predictors were considered: meteorological and calendar-related variables. The meteorological predictors consisted of relative humidity and temperature retrieved from the NOAA API, together with shortwave radiation obtained from the Open-Meteo API. Data were collected at the provincial level and then aggregated to regional and national scales to align the spatial resolution of the covariates with the hierarchical structure of the PV generation time series. Correlation analysis with PV generation (y) showed that shortwave radiation, temperature, and humidity were the most informative predictors (Figure 4). By contrast, cloud cover showed only a weak relationship with PV output, while other irradiance-related variables, such as direct, diffuse, and global tilted radiation, were omitted because of their collinearity with shortwave radiation.
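The screening logic described above, which keeps predictors that correlate with PV generation and drops near-duplicates of an already retained irradiance variable, can be illustrated with a greedy Pearson-correlation filter. The threshold `max_abs_corr = 0.9`, the column names, and the synthetic data are assumptions; the text does not state the exact cut-off used.

```python
import numpy as np
import pandas as pd

def screen_features(df: pd.DataFrame, target: str, max_abs_corr: float = 0.9) -> list:
    """Greedy screen: rank candidates by |corr| with the target and skip any
    feature whose |corr| with an already kept feature reaches max_abs_corr."""
    corr = df.corr()                                  # Pearson correlations
    ranked = corr[target].drop(target).abs().sort_values(ascending=False)
    kept = []
    for feat in ranked.index:
        if all(abs(corr.loc[feat, k]) < max_abs_corr for k in kept):
            kept.append(feat)
    return kept

# Synthetic stand-ins: "global_tilted" nearly duplicates "shortwave".
rng = np.random.default_rng(1)
n = 500
shortwave = rng.normal(size=n)
df = pd.DataFrame({
    "shortwave": shortwave,
    "global_tilted": shortwave + 0.05 * rng.normal(size=n),
    "temperature": rng.normal(size=n),
})
df["y"] = df["shortwave"] + 0.3 * df["temperature"] + 0.1 * rng.normal(size=n)
kept = screen_features(df, "y")
```

Only one of the two almost identical irradiance columns survives, while the weakly collinear temperature column is kept.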

Calendar-derived features were included to capture seasonal and intra-annual patterns. These variables comprised daylight duration, trigonometric encodings, and radial basis function (RBF) transformations of temporal indices, including quarter, month, week, day of the year, and hour. For a generic temporal index x with period T, cyclical encodings were computed as:

x_{sin} = sin (\frac{2 π x}{T}), x_{cos} = cos (\frac{2 π x}{T}) .

(29)

RBF expansions follow Gaussian kernels:

ϕ_{k} (x) = exp (- \frac{1}{2} {(\frac{x - μ_{k}}{σ})}^{2}), k = 1, \dots, n,

(30)

with centers $\mu_{k}$ and width $\sigma$ chosen to balance flexibility and parsimony.
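Equations (29) and (30) translate directly into NumPy. The sketch below encodes the hour of the day; the number of RBF centers and the width are hypothetical choices, since the text only states that they balance flexibility and parsimony.

```python
import numpy as np

def cyclical_encoding(x, period: float):
    """Eq. (29): sine and cosine of a periodic index x with period T."""
    angle = 2 * np.pi * np.asarray(x, dtype=float) / period
    return np.sin(angle), np.cos(angle)

def rbf_encoding(x, centers, width: float) -> np.ndarray:
    """Eq. (30): Gaussian bumps phi_k(x); one column per center mu_k."""
    x = np.asarray(x, dtype=float)[:, None]
    mu = np.asarray(centers, dtype=float)[None, :]
    return np.exp(-0.5 * ((x - mu) / width) ** 2)

hours = np.arange(24)
hour_sin, hour_cos = cyclical_encoding(hours, period=24)
# Hypothetical choice: 6 centers spread over the day, width equal to their spacing.
centers = np.linspace(0, 23, 6)
hour_rbf = rbf_encoding(hours, centers, width=centers[1] - centers[0])  # shape (24, 6)
```

Note that Equation (30), as written, uses linear distance, so hour 23 and hour 0 land far apart in RBF space. The sine/cosine pair is what carries the wrap-around between the end and start of each cycle.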

Exogenous information was used only in models that support external inputs, namely LightGBM, NHITS, KAN, and NBEATSx. The selected covariates included shortwave radiation, temperature, relative humidity, and calendar-based features, which were incorporated as historical and future inputs according to the forecasting horizon and temporal frequency. Full details of the feature sets used for each model and temporal frequency are available in [22].

LightGBM, NHITS, KAN, and NBEATSx were trained as global models across multiple PV generation series. To distinguish among individual series during training, static identifiers were encoded through one-hot representations. These identifiers were predefined from the fixed set of 15 PV nodes and remained the same in every walk-forward cutoff. They were used only as static features to distinguish the series in the global models and were not derived from future observations; therefore, they do not introduce look-ahead bias. TimeGPT was also evaluated in a global setting, but without static identifiers or exogenous inputs, operating in a zero-shot fashion based solely on the historical target series. By contrast, TBATS was implemented as a local univariate model, fitted independently to each series using only past target values.

4.2. Walk-Forward Cross-Validation

Forecasting performance was evaluated using a walk-forward cross-validation (WFCV) framework. Out-of-sample accuracy was assessed over 52 weekly forecast origins, each defined at the end of a calendar week (Sunday), thereby spanning a complete annual cycle.

At each iteration, the forecast origin was moved forward sequentially by one calendar week, producing non-overlapping test windows. The forecasting horizon was fixed to one week ahead across all temporal levels: 1 week for weekly series, 7 days for daily series, and 168 h for hourly series. This specification ensures that, at each cutoff, all forecasts start from the same forecast origin and cover the corresponding future week, while preserving the temporal aggregation structure [43,44].

The forecast origins extended from 3 March 2024 to 23 February 2025 and were further grouped into four seasonal subsets, each comprising 13 consecutive weeks: spring (3 March to 26 May 2024), summer (2 June to 25 August 2024), autumn (1 September to 24 November 2024), and winter (1 December 2024 to 23 February 2025).
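The cutoff schedule and its seasonal grouping can be reproduced with pandas, which also serves as a check on the dates quoted above. The `split` helper is hypothetical and assumes that each cutoff timestamp marks the first instant of the forecast week, a timestamp-alignment detail the text does not spell out.

```python
import pandas as pd

# 52 weekly forecast origins, one calendar week apart, from Sunday 3 March 2024.
cutoffs = pd.date_range("2024-03-03", periods=52, freq="7D")
SEASONS = ("spring", "summer", "autumn", "winter")
season_of = {c: SEASONS[i // 13] for i, c in enumerate(cutoffs)}  # 13 weeks each
HORIZON_STEPS = {"weekly": 1, "daily": 7, "hourly": 168}          # one week ahead

def split(series: pd.Series, cutoff: pd.Timestamp):
    """Train on everything before the cutoff; test on the following week."""
    end = cutoff + pd.Timedelta(days=7)
    train = series[series.index < cutoff]
    test = series[(series.index >= cutoff) & (series.index < end)]
    return train, test
```

The last origin falls on 23 February 2025, and the season boundaries land on 26 May/2 June, 25 August/1 September, and 24 November/1 December 2024, matching the subsets listed above.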

For reconciliation schemes, weighting matrices in COV, COVSh, WLS, WLSV, WLSH, AUTO, and CROSS were recomputed at every cutoff exclusively from in-sample residuals. This adaptation ensured that weights captured the error structure of the available history, avoided look-ahead bias, and preserved comparability with base forecasts. Reconciliation was then applied exclusively to out-of-sample predictions, so the entire evaluation remained predictive and consistent with best practice [22,29].
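To make the per-cutoff weighting concrete, the sketch below reconciles base forecasts with the projection $\tilde{y} = S(S^\top W^{-1} S)^{-1} S^\top W^{-1}\hat{y}$, using a diagonal $W$ of in-sample residual variances. This is a generic variance-scaling example in the spirit of the WLS-type estimators listed above, not a reproduction of their exact definitions; the toy hierarchy and residuals are hypothetical. In a walk-forward loop, `insample_resid` would be rebuilt at every cutoff from residuals available up to that cutoff only.

```python
import numpy as np

def reconcile_wls_var(S: np.ndarray, y_hat: np.ndarray, insample_resid: np.ndarray) -> np.ndarray:
    """y_tilde = S (S' W^-1 S)^-1 S' W^-1 y_hat with W = diag(in-sample variances).
    insample_resid: (T, n) residuals observed up to the current cutoff only."""
    w = insample_resid.var(axis=0, ddof=1)            # one variance per node
    W_inv = np.diag(1.0 / w)
    G = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv)  # maps all nodes -> bottom level
    return S @ (G @ y_hat)

# Toy hierarchy: total = A + B.
S = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
rng = np.random.default_rng(42)
resid = rng.normal(scale=[2.0, 1.0, 1.0], size=(60, 3))  # residuals up to the cutoff
y_tilde = reconcile_wls_var(S, np.array([10.0, 4.0, 5.0]), resid)
```

Nodes with noisier in-sample residuals receive less weight, so the incoherent total (10 versus 4 + 5) is adjusted more than the bottom-level forecasts. Forecasts that are already coherent pass through unchanged.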

5. Experimental Results

All experiments were conducted on Databricks Runtime 16.4 ML (Apache Spark 3.5.2, Scala 2.12) with GPU acceleration enabled. Computations were executed on a g4dn.xlarge instance equipped with four virtual CPUs, 32 GB RAM, and an NVIDIA T4 GPU.

Model performance was assessed using the Normalized Root Mean Squared Error (NRMSE). For each series $i = 1, \dots, N_{series}$, forecasting model $j$, temporal frequency $f \in \{\text{Hourly}, \text{Daily}, \text{Weekly}\}$, and cutoff $k$, the series-level NRMSE was defined as

$$\mathrm{NRMSE}_{i,j,f}^{(k)} = \frac{\sqrt{\frac{1}{L_{f}} \sum_{\ell=1}^{L_{f}} \left(\tilde{y}_{i,j,f,\ell}^{(k)} - y_{i,f,\ell}^{(k)}\right)^{2}}}{\frac{1}{L_{f}} \sum_{\ell=1}^{L_{f}} y_{i,f,\ell}^{(k)}},$$

(31)

where $\tilde{y}_{i,j,f,\ell}^{(k)}$ and $y_{i,f,\ell}^{(k)}$ denote the reconciled forecast and the observed value, respectively, at horizon step $\ell$ for series $i$ and cutoff $k$; $L_{f}$ denotes the forecast horizon associated with frequency $f$ (1, 7, and 168 steps for weekly, daily, and hourly series, respectively); and $N_{series}$ denotes the total number of evaluated series. The averaging over horizon steps in Equation (31) is performed within each temporal frequency before any aggregation across frequencies or hierarchy levels is computed.
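Equation (31) can be implemented as a short NumPy function. The worked example uses hypothetical values with $L_f = 3$.

```python
import numpy as np

def nrmse(y_rec, y_obs) -> float:
    """Eq. (31): RMSE over the L_f horizon steps divided by the mean observed value."""
    y_rec = np.asarray(y_rec, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    rmse = np.sqrt(np.mean((y_rec - y_obs) ** 2))
    return float(rmse / np.mean(y_obs))

# Errors (1, -1, 1) -> RMSE = 1; mean of observations = 4 -> NRMSE = 0.25.
score = nrmse([3.0, 3.0, 7.0], [2.0, 4.0, 6.0])
```

Because the denominator is the mean over the whole forecast window rather than each individual observation, night-time zeros in hourly PV series do not cause division problems. For weekly series ($L_f = 1$), the metric reduces to the absolute error divided by the observed weekly total.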

For each model $j$, temporal frequency $f$, cross-sectional level $g \in \{\text{National}, \text{Regional}, \text{Provincial}\}$, and cutoff $k$, the level-specific NRMSE was computed by averaging the corresponding series-level NRMSE values across all series belonging to that level:

$$\mathrm{NRMSE}_{j,f,g}^{(k)} = \frac{1}{|S_{g}|} \sum_{i \in S_{g}} \mathrm{NRMSE}_{i,j,f}^{(k)},$$

(32)

where $S_{g}$ denotes the set of series belonging to cross-sectional level $g$, and $|S_{g}|$ is the number of series in that level.

A global cutoff-level NRMSE for model $j$ was then obtained by averaging across all temporal frequencies and cross-sectional levels:

$$\mathrm{NRMSE}_{j}^{(k)} = \frac{1}{|F|\,|G|} \sum_{f \in F} \sum_{g \in G} \mathrm{NRMSE}_{j,f,g}^{(k)},$$

(33)

where $F = \{\text{Hourly}, \text{Daily}, \text{Weekly}\}$ and $G = \{\text{National}, \text{Regional}, \text{Provincial}\}$. The global cutoff-level NRMSE is therefore a frequency-balanced macro-average: each temporal frequency contributes equally within each cross-sectional level and cutoff, regardless of its number of forecast steps.

The overall global NRMSE over the full WFCV period was obtained by averaging the cutoff-level global NRMSE values across all $K = 52$ cutoffs:

$$\overline{\mathrm{NRMSE}}_{j} = \frac{1}{K} \sum_{k=1}^{K} \mathrm{NRMSE}_{j}^{(k)}.$$

(34)

The NRMSE facilitates comparisons across series, cross-sectional levels, and temporal frequencies with different scales, while the macro-averaging scheme prevents any temporal frequency from dominating the global metric solely because it contains more forecast steps.
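The three averaging stages of Equations (32)–(34) map onto successive pandas `groupby` means. The toy data below are hypothetical and use only two frequencies (h, d) and two levels (N with one series, P with two) to show that the macro-average differs from a pooled mean over all rows. The sketch assumes a balanced design in which every frequency–level cell is present at every cutoff.

```python
import pandas as pd

def global_nrmse(df: pd.DataFrame) -> pd.Series:
    """df columns: model, freq, level, cutoff, series, nrmse (series-level values)."""
    by_level = df.groupby(["model", "freq", "level", "cutoff"])["nrmse"].mean()  # Eq. (32)
    by_cutoff = by_level.groupby(level=["model", "cutoff"]).mean()               # Eq. (33)
    return by_cutoff.groupby(level="model").mean()                               # Eq. (34)

cells = [("h", "N", "n1"), ("h", "P", "p1"), ("h", "P", "p2"),
         ("d", "N", "n1"), ("d", "P", "p1"), ("d", "P", "p2")]
values_cutoff1 = [0.4, 0.6, 0.8, 0.2, 0.3, 0.5]
rows = [("A", f, g, 1, s, v) for (f, g, s), v in zip(cells, values_cutoff1)]
rows += [("A", f, g, 2, s, 0.1) for (f, g, s) in cells]
df = pd.DataFrame(rows, columns=["model", "freq", "level", "cutoff", "series", "nrmse"])
result = global_nrmse(df)  # A: ((0.4 + 0.7 + 0.2 + 0.4) / 4 + 0.1) / 2 = 0.2625
```

A pooled mean over the twelve rows would give about 0.283 instead, because it implicitly weights the provincial level by its larger number of series.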

Table 3 illustrates how NRMSE values were aggregated across temporal frequencies and cross-sectional hierarchy levels. Here, $j$ indexes the forecasting model, $f \in \{w, d, h\}$ denotes the temporal frequency (weekly, daily, and hourly, respectively), and $g \in \{N, R, P\}$ denotes the cross-sectional hierarchy level (national, regional, and provincial).

Each gray cell represents the average of the cutoff-level NRMSE values for a given combination of temporal frequency and cross-sectional hierarchy level. Blue cells represent cross-sectional summaries obtained by averaging NRMSE over all frequencies and cutoffs within each hierarchy level. Green cells represent temporal summaries obtained by averaging NRMSE over all cross-sectional hierarchy levels and cutoffs within each temporal frequency. The orange cell represents the global summary obtained by averaging NRMSE over all temporal frequencies, cross-sectional hierarchy levels, and cutoffs. Accordingly, the orange cell should be interpreted as a balanced global summary rather than as a pooled error over all forecasted time points. It should therefore be read together with the frequency-specific summaries shown in the green cells.

For comparability across methods, all models were trained and evaluated using common baseline configurations, and no model-specific hyperparameter tuning was undertaken. This choice was made to preserve a common evaluation protocol for the reconciliation analysis, rather than to optimize each forecasting method individually. The reported results should therefore be interpreted as a controlled comparison of reconciliation behavior across forecasting paradigms under standardized conditions. Detailed information on software libraries, model architectures, configurations, and parameter settings follows the fixed experimental protocol previously reported in [22].

Figure 5 provides an overview of the forecasting and cross-temporal reconciliation pipeline employed in this work. Starting from raw Belgian PV generation data at 15 min resolution, the cross-sectional hierarchy is first defined, and the series are then temporally aggregated to hourly, daily, and weekly frequencies. Feature engineering is subsequently conducted independently at each temporal resolution according to the model-specific settings outlined in Section 4.1. Base forecasts are then produced for each model–frequency combination using the walk-forward cross-validation (WFCV) framework described in Section 4.2. These forecasts serve as inputs to the cross-temporal reconciliation stage, which applies the direct, univariate, and iterative approaches described in Section 2.2. The final output consists of weekly, daily, and hourly forecasts reconciled across both the temporal and cross-sectional hierarchies.

5.1. Error Diagnostic Analysis

To evaluate whether the impact of cross-temporal reconciliation on predictive accuracy varies across forecasting paradigms, NRMSE values were analyzed using a multi-factor analysis of variance (ANOVA) based on the aligned rank transform (ART) framework. This approach provides a non-parametric alternative for factorial experimental designs, enabling the assessment of main and interaction effects without requiring normally distributed residuals [54,55]. The Bottom-Up (BU) strategy was included as a benchmark reconciliation approach, as it guarantees both temporal and cross-sectional coherence by construction through the aggregation of forecasts from the hourly level to coarser temporal frequencies and from bottom-level series to higher-level aggregates. Separate analyses were performed for weekly, daily, and hourly frequencies and for each spatial aggregation level (national, regional, and provincial). In each case, forecasting accuracy was evaluated using ART-based mixed-effects models, where cutoff was specified as a random effect and degrees of freedom were adjusted using the Kenward–Roger approximation [56].
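The core of the ART procedure is the alignment step: for each effect, all estimated cell effects are stripped from the response, only the effect of interest is added back, and the aligned values are ranked. The pandas sketch below covers that step for two fixed factors (model and reconciliation method). The subsequent mixed-effects ANOVA on each rank column, with cutoff as a random effect and Kenward–Roger degrees of freedom, is not reproduced here; dedicated implementations exist for that (for example, the R package ARTool). The toy data are hypothetical.

```python
import pandas as pd

def aligned_ranks(df: pd.DataFrame, y: str, a: str, b: str) -> pd.DataFrame:
    """Aligned rank transform for two fixed factors a and b (one rank column per effect)."""
    cell = df.groupby([a, b])[y].transform("mean")
    mean_a = df.groupby(a)[y].transform("mean")
    mean_b = df.groupby(b)[y].transform("mean")
    grand = df[y].mean()
    resid = df[y] - cell                                     # remove all cell effects
    aligned = {
        a: resid + (mean_a - grand),                         # add back main effect of a
        b: resid + (mean_b - grand),                         # add back main effect of b
        f"{a}:{b}": resid + (cell - mean_a - mean_b + grand),  # add back interaction
    }
    return pd.DataFrame({f"rank_{k}": v.rank(method="average") for k, v in aligned.items()})

# Toy balanced design: 2 models x 3 methods x 4 cutoffs; NRMSE depends on the model only.
rows = [(m, r, k, {"m1": 0.25, "m2": 0.5}[m])
        for m in ("m1", "m2") for r in ("r1", "r2", "r3") for k in range(4)]
toy = pd.DataFrame(rows, columns=["model", "method", "cutoff", "nrmse"])
ranks = aligned_ranks(toy, "nrmse", "model", "method")
```

In this toy case, only the model effect survives alignment: the method and interaction columns collapse to a single tied rank, while the model column separates the two models.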

At the global level, we applied a repeated-measures ART ANOVA. Table 4 indicates significant effects of model, reconciliation method, and their interaction on NRMSE. This shows that predictive accuracy varies across forecasting models, that reconciliation variants do not produce equivalent errors, and that their effect depends on the base forecaster. The same pattern was observed across all hierarchical levels and temporal frequencies, where all effects remained statistically significant ($p < 2.22 \times 10^{-16}$) in every case.

To complement the analysis, Figure 6 presents a Multiple Comparisons with the Best (MCB) analysis based on average NRMSE for the six baseline forecasting models, both globally and separately by temporal frequency and cross-sectional level. This comparison is relevant because the effectiveness of reconciliation depends, to a large extent, on the quality of the underlying base forecasts. LightGBM ranked first globally and at all three cross-sectional levels. The second to fourth positions were occupied by NHITS, NBEATSx, and KAN, with only minor changes in their ordering across the national, regional, and provincial levels, whereas TBATS and TimeGPT consistently appeared at the bottom of the ranking. Across frequencies, the weekly panel showed the weakest separation, with all baseline models falling within the critical band, indicating statistically similar performance. The daily panel showed LightGBM ranked first, followed by NHITS and NBEATSx. The hourly panel showed the clearest separation, with LightGBM alone occupying the top position.

Figure 7 extends the ranking analysis to reconciled forecasts. The MCB procedure was computed on the full set of 270 reconciled configurations generated across forecasting models, reconciliation approaches, and estimators. However, for visualization, we restricted our attention to a representative subset of 36 configurations constructed from the best- and worst-performing reconciled variants of each forecasting model under direct, univariate, and iterative cross-temporal approaches. This yielded six representative configurations per model: the best and worst of direct, univariate, and iterative schemes, for a total of 36. From this representative subset, the plot displays only the 10 highest-ranked configurations to preserve readability and facilitate comparisons between forecasting models and reconciliation strategies.
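The construction of the representative subset (best and worst variant per model and approach, then the top 10) can be sketched in pandas. Ranking configurations within each cutoff and averaging those ranks is an assumption about how the MCB ordering is obtained; the configuration labels and NRMSE values are hypothetical.

```python
import pandas as pd

def representative_subset(res: pd.DataFrame, top: int = 10) -> pd.DataFrame:
    """res columns: model, approach, config, cutoff, nrmse (one row per cutoff)."""
    # Rank all configurations within each cutoff (1 = lowest NRMSE), then average.
    res = res.assign(rank=res.groupby("cutoff")["nrmse"].rank())
    avg = res.groupby(["model", "approach", "config"], as_index=False)["rank"].mean()
    by = avg.groupby(["model", "approach"])["rank"]
    picked = pd.concat([avg.loc[by.idxmin()], avg.loc[by.idxmax()]]).drop_duplicates()
    return picked.sort_values("rank").head(top)

rows = [
    {"model": m, "approach": a, "config": f"{m}|{a}|{c}", "cutoff": k,
     "nrmse": 0.4 + 0.1 * i + 0.01 * j + 0.001 * c}
    for i, m in enumerate(["LightGBM", "KAN"])
    for j, a in enumerate(["dir", "uni", "ite"])
    for c in range(3)
    for k in range(4)
]
res = pd.DataFrame(rows)
top10 = representative_subset(res)
```

With six models and three approaches, the same procedure yields the 36 representative configurations described above before the top-10 cut.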

Globally, the ranking was dominated by LightGBM, whose univariate tCROSS:cBU and iterative tCROSS:cWLS configurations consistently occupied the first two positions, followed by its direct COV variant. Importantly, 117 reconciled configurations overlapped the MCB critical difference band. The top part of this group was dominated by LightGBM variants across all reconciliation approaches and estimators, followed by the best-performing iterative and univariate deep learning configurations.

The global top 10 included both the best- and worst-performing reconciled variants of LightGBM for each of the three cross-temporal approaches. This indicates that, across the direct, univariate, and iterative schemes, and across all associated estimators, LightGBM consistently remained ahead of the other forecasting models, even when compared against their best reconciled variants. KAN and NHITS also entered the global top 10: KAN mainly through its best iterative and univariate variants, and NHITS through its best direct and univariate variants.

A similar pattern was observed across cross-sectional levels. At the national and regional levels, the leading positions were still dominated by LightGBM variants. At the provincial level, however, only the three strongest LightGBM configurations remained at the top, while the best NHITS, NBEATSx, and KAN variants became more prominent. The frequency-specific panels revealed a clearer contrast. At the weekly level, all plotted configurations lay within the critical band, indicating very similar performance among the best reconciled models. At the daily and hourly levels, the separation became more evident: the two LightGBM tCROSS variants remained the top-ranked configurations, followed by other LightGBM variants and then the best NHITS and KAN alternatives. Meanwhile, TBATS and TimeGPT consistently occupied the lowest positions in the global, frequency-specific, and cross-sectional MCB rankings, indicating weaker reconciled performance relative to the other forecasting models. Together, these results confirm that the strongest reconciled performance was repeatedly achieved by LightGBM, especially under the univariate and iterative tCROSS specifications.

Table 5 reports the global NRMSE of the baseline forecasts together with the best- and worst-performing reconciled variants for each model under the direct, univariate, and iterative cross-temporal approaches. This condensed presentation was adopted to preserve table readability and facilitate comparisons across forecasting models and reconciliation strategies. The complete set of results for all reconciliation methods, separately reported for the direct, univariate, and iterative approaches and for each base model, is provided in Appendix A. For the univariate and iterative cases, the notation in parentheses identifies the order in which the temporal and cross-sectional dimensions are handled, together with the corresponding estimators. In the univariate case, either temporal reconciliation is applied first and cross-sectional coherence is then imposed through aggregation, denoted by tX:cBU, or cross-sectional reconciliation is applied first and temporal coherence is then imposed through aggregation, denoted by cY:tBU. In the iterative case, tX:cY indicates that temporal reconciliation is applied using temporal estimator X, followed by cross-sectional reconciliation using cross-sectional estimator Y.
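The tX:cY sequence can be sketched on a toy cross-temporal array in which rows are cross-sectional nodes and columns are temporal levels. OLS stands in for both X and Y, and the loop assumes that the two steps are repeated until the temporal step no longer changes the forecasts; the stopping rule and all names are hypothetical, and the study's estimators (CROSS, WLS, COV, and others) would replace the OLS projections.

```python
import numpy as np

def ols_projection(S: np.ndarray) -> np.ndarray:
    """Orthogonal projection onto the column space of S: S (S'S)^-1 S'."""
    return S @ np.linalg.solve(S.T @ S, S.T)

def temporal_summing_matrix(m: int, factors) -> np.ndarray:
    """Rows for each aggregation order k (coarsest first); every k must divide m."""
    return np.vstack([np.kron(np.eye(m // k), np.ones((1, k)))
                      for k in sorted(factors, reverse=True)])

def iterative_reconcile(Y, S_cs, S_t, tol=1e-10, max_iter=100):
    """tOLS:cOLS alternation. Y: (n_cs, n_t); rows follow S_cs, columns follow rows of S_t."""
    P_t, P_cs = ols_projection(S_t), ols_projection(S_cs)
    for _ in range(max_iter):
        Y = Y @ P_t.T                          # temporal step: each node coherent over time
        Y = P_cs @ Y                           # cross-sectional step: each column coherent
        if np.abs(Y @ P_t.T - Y).max() < tol:  # stop once temporal coherence survives
            break
    return Y

# Toy: two provinces plus their total; 4 "hours" -> 2 "days" -> 1 "week".
S_cs = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
S_t = temporal_summing_matrix(4, factors=(4, 2, 1))  # shape (7, 4)
Y_hat = np.random.default_rng(3).random((3, 7))      # incoherent base forecasts
Y_tilde = iterative_reconcile(Y_hat, S_cs, S_t)
```

Because this sketch applies one projection to all rows and one to all columns, the two steps commute and a single pass already yields full coherence. When the weights differ across nodes or across temporal levels (for example, node-specific temporal error variances), the steps no longer commute, and repeated alternation becomes necessary.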

LightGBM showed the largest error reductions under reconciliation, with its best results obtained under the univariate variant (tCROSS:cBU) and the iterative variant (tCROSS:cWLS). More importantly, it remained the strongest model overall: even its worst reconciled variants yielded lower global NRMSE values than the best reconciled configurations obtained by any of the other forecasting models.

KAN also benefited substantially, with its best results obtained under the univariate and iterative schemes, both of which relied on variance-based information and delivered very similar errors. For the NHITS and NBEATSx models, the most favorable results were associated with structural scaling. In NBEATSx, the best direct and iterative configurations reached the same error, whereas NHITS achieved its lowest value under the univariate scheme. TBATS attained its best performance with the iterative variant (tOLS:cCOVSh), and TimeGPT improved mainly under covariance-based specifications in the univariate and iterative settings. A broader pattern also emerges from the table: the lowest errors were concentrated almost entirely in the univariate and iterative approaches. For five of the six forecasting models, the best reconciled result came from one of these two schemes; NBEATSx was the only exception, with a tie between the direct and iterative approaches.

In all cases, the direct cross-temporal BU benchmark did not outperform the best reconciled variant obtained under the direct, univariate, or iterative approaches for any base model. This indicates that, although BU provides a coherent and useful reference, it was consistently surpassed by at least one alternative reconciliation specification in every forecasting model considered.

The effect of reconciliation was not consistently favorable. Although it reduced error for selected configurations, it also produced clear deteriorations in several cases. The largest degradations appeared under the iterative scheme for NBEATSx, NHITS, TimeGPT, and especially TBATS, where some combinations led to dramatic increases in NRMSE. The highest-error cases were concentrated in covariance-based iterative variants such as (tOLS:cCOV) and (tCROSS:cCOV), suggesting that the main challenge lies in the estimation and use of full covariance matrices within the iterative procedure. These estimators could deliver substantial gains in some settings, but they also produced the least favorable outcomes in others. By comparison, simpler variance-based and structural estimators tended to yield more favorable behavior across models. Taken together, the results identify LightGBM and KAN as the models that made the most effective use of cross-temporal reconciliation, while for the remaining models the outcome depended much more on the specific reconciliation sequence and estimator.

These findings are reinforced by the heatmaps in Figure 8, which report the percentage change in global NRMSE for each model–reconciliation combination relative to the BU benchmark. In the direct approach, the most consistent gains were obtained with SS, which reduced error for all models, with the largest decreases observed for NBEATSx (−11.8%), NHITS (−11.3%), and TBATS (−10.3%). COV also produced marked improvements for LightGBM (−10.2%) and NHITS (−11.3%), followed by smaller reductions for TimeGPT and NBEATSx, whereas its effect on TBATS and KAN was limited. By contrast, OLS increased error for KAN, LightGBM, NBEATSx, and NHITS, with the largest deterioration observed for KAN (+14.5%), whereas it reduced error for TBATS (−13.6%) and TimeGPT (−6.6%), making these the only two models that benefited from this estimator under the direct approach.
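Each heatmap cell is a relative difference against the BU column: $100 \times (\mathrm{NRMSE}_{\text{method}} - \mathrm{NRMSE}_{\text{BU}}) / \mathrm{NRMSE}_{\text{BU}}$, so negative values mean improvement. The pandas sketch below computes such a table from a models-by-methods NRMSE matrix; the model names and values are hypothetical.

```python
import pandas as pd

def pct_change_vs_bu(table: pd.DataFrame, bu_col: str = "BU") -> pd.DataFrame:
    """Percentage change in global NRMSE relative to BU; negative = improvement."""
    bu = table[bu_col]
    return (table.drop(columns=bu_col).sub(bu, axis=0).div(bu, axis=0) * 100).round(1)

# Hypothetical global NRMSE values (rows: models, columns: reconciliation methods).
table = pd.DataFrame({"BU": [0.50, 0.40], "SS": [0.44, 0.42]}, index=["model_a", "model_b"])
heat = pct_change_vs_bu(table)  # model_a/SS -> -12.0, model_b/SS -> 5.0
```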

The univariate approach revealed a clearer separation across reconciliation methods. When cross-sectional reconciliation was applied first and BU was then used temporally (e.g., cCOV:tBU, cOLS:tBU), the reductions were generally small and most combinations increased error, reaching +12.7% for TBATS under cOLS:tBU and +11.8% for NBEATSx under cCOV:tBU. In contrast, the reverse order, in which temporal reconciliation was applied first and BU was then used cross-sectionally, was much more effective, particularly for variance- and covariance-based estimators. A plausible explanation is that temporal reconciliation corrects within-node inconsistencies across hourly, daily, and weekly resolutions before cross-sectional aggregation is imposed. This allows lower-frequency information to regularize provincial hourly forecasts, so that the subsequent BU aggregation to regional and national levels propagates a less distorted error structure. By contrast, when cross-sectional reconciliation is applied first at the hourly level and temporal coherence is imposed afterward only through aggregation, residual temporal misspecification at the lower level is carried into daily and weekly totals rather than explicitly corrected. Under the temporal-first scheme, LightGBM achieved the largest reductions, especially with tAUTOCOV:cBU (−15.8%) and tCROSS:cBU (−19.6%). Deep learning models also benefited, particularly with variance-based estimators such as WLSS, WLSH, and WLSV, with reductions ranging from 5.2% to 12.4%. TimeGPT attained its largest decrease under tAUTOCOV:cBU, whereas TBATS did so under tOLS:cBU, with reductions of up to 9.2% and 14.2%, respectively. The main exceptions were the more noticeable error increases under tOLS:cBU for KAN (+11.6%) and, to a lesser extent, LightGBM (+1.2%).
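The difference between the two univariate orders can be seen in a toy NumPy sketch. In tX:cBU, each province's forecasts at all temporal levels are reconciled temporally before being summed cross-sectionally. In cY:tBU, only the highest-frequency forecasts of every node are reconciled cross-sectionally, and daily and weekly values are obtained by summation, so the lower-frequency base forecasts are never used. OLS stands in for both X and Y, and the dimensions are hypothetical.

```python
import numpy as np

def ols_projection(S: np.ndarray) -> np.ndarray:
    """Orthogonal projection onto the column space of S: S (S'S)^-1 S'."""
    return S @ np.linalg.solve(S.T @ S, S.T)

def t_then_cs_bu(base_bottom, S_t, S_cs):
    """tOLS:cBU. base_bottom: (n_bottom, n_t) provincial forecasts at every temporal level."""
    rec_bottom = base_bottom @ ols_projection(S_t).T  # temporal OLS per province
    return S_cs @ rec_bottom                          # cross-sectional bottom-up

def cs_then_t_bu(base_hf, S_cs, S_t):
    """cOLS:tBU. base_hf: (n_cs, m) highest-frequency forecasts for every node."""
    rec_hf = ols_projection(S_cs) @ base_hf           # cross-sectional OLS per time step
    return rec_hf @ S_t.T                             # temporal bottom-up by summation

# Toy: two provinces plus their total; 4 "hours" -> 2 "days" -> 1 "week".
S_cs = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
S_t = np.vstack([np.ones((1, 4)), np.kron(np.eye(2), np.ones((1, 2))), np.eye(4)])
rng = np.random.default_rng(7)
out_tc = t_then_cs_bu(rng.random((2, 7)), S_t, S_cs)  # uses provincial daily/weekly forecasts
out_ct = cs_then_t_bu(rng.random((3, 4)), S_cs, S_t)  # ignores daily/weekly base forecasts
```

Both outputs are fully cross-temporally coherent. Only the temporal-first order, however, lets the provincial daily and weekly base forecasts influence the hourly values, which is consistent with the explanation offered above.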

The iterative approach produced the best improvements for some models, but it also generated the largest failures. LightGBM again showed the clearest benefit, especially under temporal covariance-based estimators combined with cross-sectional WLS-type methods. Its best result was obtained with tCROSS:cWLS, which reduced global NRMSE by 19.8% relative to BU. KAN improved under temporal WLSH combined with cross-sectional WLS, SS, COVSh, and OLS, with reductions between 9.7% and 11.7%. NHITS and NBEATSx performed best when WLSS was used temporally and paired with any of the cross-sectional estimators, producing decreases of roughly 9.2% to 11.8%. TBATS benefited from every combination in which OLS was used as the temporal estimator, with reductions of up to 14.5% regardless of the cross-sectional estimator applied afterward.

At the same time, the iterative heatmap shows that this approach could also produce the least favorable outcomes. Several combinations caused error increases above 100%, particularly for NBEATSx, NHITS, and TBATS. The most problematic cases involved covariance-based cross-sectional reconciliation applied after temporal reconciliation, including variants such as tAUTOCOV:cCOV, tCROSS:cCOV, and tWLSS:cCOV. This pattern suggests that, under the iterative scheme, the main difficulty may lie in the estimation and inversion of the cross-sectional covariance matrix rather than in the temporal estimator itself. KAN also showed its largest deterioration under this approach, with error increases of up to 30.4% when OLS was used as the temporal estimator and combined with the cross-sectional estimators. Overall, these results indicate that cross-sectional covariance reconciliation was the main source of severe degradations in the iterative setting, whereas temporal OLS appeared to be an additional model-specific risk for KAN.
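One generic way that full-covariance estimation can become fragile is illustrated below. When relatively few residual vectors are available compared with the number of nodes, the sample covariance is singular and cannot be inverted, while shrinking it toward its diagonal restores invertibility. This is an illustration of the hypothesis above, not a reproduction of the COV or COVSh estimators: the dimensions are hypothetical, and the fixed shrinkage intensity `lam` replaces the data-driven intensity that shrinkage estimators typically use.

```python
import numpy as np

def shrink_to_diagonal(resid: np.ndarray, lam: float) -> np.ndarray:
    """(1 - lam) * sample covariance + lam * its diagonal; resid has shape (T, n)."""
    sigma = np.cov(resid, rowvar=False)
    return (1.0 - lam) * sigma + lam * np.diag(np.diag(sigma))

rng = np.random.default_rng(5)
T, n = 10, 15                                   # fewer residual vectors than the 15 nodes
resid = rng.normal(size=(T, n))
sample_cov = np.cov(resid, rowvar=False)
shrunk = shrink_to_diagonal(resid, lam=0.2)     # hypothetical fixed intensity

print(np.linalg.matrix_rank(sample_cov))        # at most T - 1 = 9, so not invertible
print(np.linalg.cond(shrunk))                   # finite: the shrunk matrix is invertible
```

Even when the sample covariance is technically invertible, a large condition number means that its inverse, and therefore the reconciliation weights, can react strongly to small estimation errors.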

Table 6 extends the global comparison by reporting the baseline forecasts and the representative best- and worst-performing reconciled variants separately for the weekly, daily, and hourly frequencies. The same model-specific patterns observed at the global level remained visible across frequencies, although their magnitude changed with the temporal aggregation level. As in the global analysis, the best results were concentrated mainly in the univariate and iterative approaches. Weekly and hourly forecasts were dominated by these two schemes, whereas the daily frequency showed a weaker and more heterogeneous response. Across all three frequencies, LightGBM remained the best-performing model. This frequency-dependent behavior is consistent with prior evidence from temporal hierarchical reconciliation on the same Belgian PV dataset, where the clearest benefits of reconciliation were observed at the weekly and hourly resolutions, while the daily level showed only modest and model-dependent improvements [22].

Weekly forecasts showed the clearest gains from reconciliation. The differences between the baseline and the best reconciled variant were negligible for TBATS and TimeGPT, but much larger for LightGBM and the deep learning models. The strongest reductions relative to the baseline were obtained by KAN (−48.9%) under the univariate and iterative approaches, NBEATSx (−37.2%) under the direct and iterative approaches, and NHITS (−31.8%) under the univariate approach. LightGBM still achieved the lowest weekly NRMSE overall, with tCROSS:cBU reducing error by 43.6%, followed closely by tCROSS:cWLS with a 43.4% reduction. Even its worst reconciled weekly variants remained below the baseline error, indicating that reconciliation was beneficial for this model across all weekly specifications considered.

Daily forecasts showed a different pattern. For LightGBM, NBEATSx, and NHITS, the baseline forecasts already yielded the lowest errors, even when compared with the best reconciled variants of KAN, TBATS, and TimeGPT. For these latter three models, the reductions relative to the baseline were also small, ranging only from 1.7% to 3.4%. Taken together, these results indicate that the daily level was the least responsive to reconciliation, with improvements that were generally modest and more dependent on the specific forecasting model.

Hourly forecasts again benefited from reconciliation, although less markedly than weekly forecasts. LightGBM achieved the lowest error with the iterative variant tCROSS:cWLS, which reduced NRMSE by 17.5%. KAN ranked second, also under the iterative approach, with a reduction of 9.0%. The best-case reductions relative to the baselines of the remaining models were 6.3% for NHITS, 4.8% for NBEATSx, and 9.5% for TimeGPT and TBATS.

Table 7 reports the baseline forecasts and the representative best- and worst-performing reconciled variants at the national, regional, and provincial levels. Unlike the frequency results, the best reconciled variant improved upon the baseline for every model at all three cross-sectional levels.

LightGBM yielded the lowest NRMSE at every level, in all cases under the iterative approach, with values of 0.3825, 0.3891, and 0.4580 at the national, regional, and provincial levels, corresponding to reductions of 22.4%, 21.2%, and 14.7%, respectively. KAN also showed clear reductions, again mainly under the iterative approach, with improvements of 17.3% at the national level, 16.5% at the regional level, and 14.5% at the provincial level.

For the deep learning models, NBEATSx attained the same best value under the direct and iterative approaches at the national and regional levels, whereas its best provincial result was obtained under the univariate approach. NHITS showed the same shift, with the direct approach yielding the lowest error at the national and regional levels and the univariate approach doing so at the provincial level. TimeGPT also improved at all three levels, with the same best value under the univariate and iterative approaches, whereas TBATS consistently reached its lowest error under the iterative approach. However, TBATS also showed the most inflated worst-case reconciled values across all three levels. The largest deteriorations again came from iterative variants combining temporal OLS or CROSS with cross-sectional COV, reproducing the same instability already observed in the global and frequency-specific analyses.

Figure 9 summarizes the best-performing reconciled cross-temporal configuration of each forecasting model across hierarchy levels and temporal frequencies. The best configurations were predominantly iterative for LightGBM, KAN, and TBATS, univariate for NHITS and TimeGPT, and direct only for NBEATSx. In the x-axis labels, the abbreviation after ‘|’ identifies the selected reconciliation scheme: ite for iterative, uni for univariate, and dir for direct.

The ranking changes little across hierarchy levels; instead, the figure shows a much stronger separation across temporal frequencies. Weekly forecasts consistently occupy the lowest NRMSE range, daily forecasts lie at an intermediate level, and hourly forecasts display the largest errors for every model. Within each frequency, national and regional results are very close, whereas the provincial level is systematically associated with the highest NRMSE, indicating that error increases as the series become more spatially disaggregated. This pattern is most evident at the hourly frequency.

LightGBM:tCROSS:cWLS remains the top-performing specification throughout, with the lowest NRMSE at the weekly, daily, and hourly frequencies and at all three hierarchy levels. KAN:tWLSH:cWLS is the second-best configuration overall. NHITS:tWLSS:cBU and NBEATSx:ctSS form a middle group with very similar errors. TimeGPT:tAUTOCOV:cBU and TBATS:tOLS:cCOVSh occupy the upper part of the error scale, although their ordering changes with frequency: at the weekly and daily frequencies, TBATS generally has a lower NRMSE than TimeGPT, whereas at the hourly frequency TimeGPT becomes slightly more favorable. Overall, the figure reinforces two main findings: first, the relative ranking of the best reconciled configurations is remarkably stable across hierarchy levels; second, forecast difficulty is driven primarily by temporal resolution and spatial disaggregation, with hourly provincial series representing the most challenging setting.

These patterns are also reflected in Figure 10, which shows forecasts during the summer period, when PV generation reaches its highest levels in this dataset.

At the weekly and daily frequencies, the best reconciled LightGBM configuration followed the temporal evolution of PV generation more closely than the other models, although it still tended to underestimate the observed series, particularly around the largest peaks. The deep learning models displayed broadly similar trajectories, but with a less accurate representation of short-term fluctuations and local turning points in generation. By contrast, TimeGPT and TBATS showed the weakest performance at the weekly and daily levels, with a clearer underestimation of major production peaks and a poorer reconstruction of the overall trajectory. At the hourly frequency, all models reproduced the pronounced intraday cycle, but differences in peak amplitude, day-to-day modulation, and local variability remained visible. In line with the previous results, the best reconciled LightGBM forecasts remained among the closest to the observed series across frequencies.

5.2. Pairwise Statistical Comparison of the Best Reconciled Configurations

To complement the rank-based comparisons reported in Figure 6 and Figure 7, the best reconciled configuration from each forecasting family was compared through targeted pairwise statistical tests. For each family, the configuration with the lowest mean global NRMSE across the 52 walk-forward cutoffs was first selected. The cutoff-level NRMSE sequence of the best LightGBM configuration, LightGBM:tCROSS:cWLS, was then compared against the corresponding sequence of each competing family’s best reconciled configuration using the Diebold–Mariano test with the Harvey–Leybourne–Newbold small-sample correction [57,58]. Loss differentials were defined as the competitor’s NRMSE minus the LightGBM NRMSE, so positive values indicate lower error for LightGBM. Since five pairwise comparisons were performed against the same reference model, the resulting p-values were adjusted using Holm’s procedure.
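The testing step described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: it treats the 52 cutoff-level NRMSE differentials as one sequence, assumes a horizon of h = 1 in the long-run variance (the value used in the study is not restated here), reports two-sided p-values from a Student-t distribution with n - 1 degrees of freedom computed through the regularized incomplete beta function, and applies Holm's step-down adjustment. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided Student-t p-value: I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def dm_hln(d, h=1):
    """Diebold-Mariano statistic with the Harvey-Leybourne-Newbold correction.

    d holds loss differentials (competitor NRMSE minus reference NRMSE per
    cutoff), so a positive mean favours the reference model.
    """
    n = len(d)
    mean = sum(d) / n

    def autocov(k):
        return sum((d[t] - mean) * (d[t - k] - mean) for t in range(k, n)) / n

    lrv = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    if lrv <= 0.0:
        raise ValueError("non-positive long-run variance estimate")
    dm = mean / math.sqrt(lrv / n)
    hln = dm * math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return hln, t_two_sided_p(hln, n - 1)


def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted
```

In use, `dm_hln` would be called once per competing family on its 52 paired differentials, and the five resulting p-values would then be passed together to `holm_adjust`.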

Table 8 shows that the mean and median loss differentials were positive in all comparisons. LightGBM:tCROSS:cWLS achieved a mean NRMSE of 0.4099, whereas the best competing reconciled configurations ranged from 0.5565 for KAN:tWLSH:cWLS to 0.7918 for TBATS:tOLS:cCOVSh. LightGBM also obtained lower NRMSE in most paired cutoffs, with win rates between 84.6% and 100.0%. All Holm-adjusted p-values remained below 0.05, indicating that the observed advantage of LightGBM:tCROSS:cWLS over the best reconciled alternatives was statistically significant under the adopted walk-forward protocol.
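The summary quantities reported from Table 8 reduce to paired arithmetic on the per-cutoff differentials. In the minimal sketch below (hypothetical function name; ties are assumed not to count as wins), a win rate of 84.6% over 52 cutoffs corresponds to 44 wins, since 44/52 ≈ 0.846, and a win rate of 100.0% to all 52 cutoffs.

```python
import statistics


def paired_summary(reference, competitor):
    """Summarise per-cutoff loss differentials d_k = competitor_k - reference_k.

    Positive values favour the reference model. Ties are not counted as wins,
    which is an assumption of this sketch.
    """
    if len(reference) != len(competitor):
        raise ValueError("NRMSE sequences must be paired by cutoff")
    d = [c - r for r, c in zip(reference, competitor)]
    return {
        "mean_diff": statistics.fmean(d),
        "median_diff": statistics.median(d),
        "win_rate": sum(x > 0 for x in d) / len(d),
    }
```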

5.3. Numerical Diagnostic of Unstable Iterative Configurations

Some iterative configurations produced error increases that were too large to be treated as ordinary losses in accuracy. To examine these cases, we performed a targeted diagnostic analysis of the most unstable covariance-based iterative variants and their shrinkage counterparts. The purpose was to check whether the error explosions were linked to covariance conditioning, convergence behavior, or excessive amplification by the reconciliation operator.

Table 9 summarizes the diagnostic results. Each configuration is reported in two rows: cCOV corresponds to the sample covariance estimator, and cCOVSh to the Ledoit–Wolf shrinkage estimator. NRMSE measures the actual loss in forecasting accuracy under the same global evaluation protocol used in the main experiments. Convergence (Conv.) reports the percentage of cutoffs that reached the numerical tolerance. Iter. med/max gives the median and maximum number of iterations. Patience stop (Pat. stop) reports the percentage of cutoffs terminated by the patience rule, that is, because the objective stopped improving before the tolerance was reached.
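To make the Conv., Iter., and Pat. stop columns concrete, the sketch below runs an alternating (temporal, then cross-sectional) reconciliation on a toy 3 × 3 forecast matrix Y, whose rows are cross-sectional nodes (a total and two children) and whose columns are temporal indices (an aggregate period and its two sub-periods). It is a simplified illustration under stated assumptions, not the study's implementation: each step is a diagonal-weight (WLS) projection onto the single constraint total = child 1 + child 2, the inconsistency is the $ℓ_{1}$ sum of temporal and cross-sectional aggregation errors, and the patience rule stops when that inconsistency has not improved for `patience` consecutive sweeps. The weights, tolerance, patience, and function names are hypothetical.

```python
import math


def wls_step(y, w):
    """Project [total, child1, child2] onto total = child1 + child2.

    Constraint form y - W c (c' W c)^-1 c' y with c = [1, -1, -1] and
    W = diag(w): each element absorbs a share of the gap proportional to w.
    """
    gap = y[0] - y[1] - y[2]
    s = w[0] + w[1] + w[2]
    return [y[0] - gap * w[0] / s, y[1] + gap * w[1] / s, y[2] + gap * w[2] / s]


def inconsistency(Y):
    """l1 norm of temporal (row) plus cross-sectional (column) aggregation errors."""
    d_temp = sum(abs(row[0] - row[1] - row[2]) for row in Y)
    d_cross = sum(abs(Y[0][j] - Y[1][j] - Y[2][j]) for j in range(3))
    return d_temp + d_cross


def iterative_reconcile(Y, w_temp, w_cross, tol=1e-8, patience=5, max_iter=200):
    """Alternate temporal and cross-sectional WLS steps until coherent.

    Returns (Y, status, iterations), with status in
    {"converged", "patience_stop", "max_iter"}.
    """
    Y = [list(row) for row in Y]
    best, stale = math.inf, 0
    for it in range(1, max_iter + 1):
        # Temporal step: reconcile each node's series with its own weights.
        Y = [wls_step(Y[c], w_temp[c]) for c in range(3)]
        # Cross-sectional step: reconcile each temporal index with its own weights.
        cols = [wls_step([Y[0][j], Y[1][j], Y[2][j]], w_cross[j]) for j in range(3)]
        Y = [[cols[j][c] for j in range(3)] for c in range(3)]
        gap = inconsistency(Y)
        if gap <= tol:
            return Y, "converged", it
        if gap < best:
            best, stale = gap, 0
        else:
            stale += 1
            if stale >= patience:
                return Y, "patience_stop", it
    return Y, "max_iter", max_iter
```

When every row shares one weight vector and every column shares another, the two steps act as a fixed right and a fixed left matrix multiplication, so they commute and a single sweep already yields a coherent matrix; heterogeneous weights are what make additional sweeps, and possibly patience stops, necessary.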

The conditioning metric is $\log_{10} \kappa(A_{cross})$, where $A_{cross} = S_{cross}^{\top} W^{-1} S_{cross}$ is the cross-sectional GLS matrix involved in the reconciliation projection and $\kappa(\cdot)$ denotes the condition number. Large values indicate poor numerical conditioning. The metric $\| R_{cross} \|_{2}$ is the spectral norm of the cross-sectional reconciliation operator and measures its capacity to amplify forecast perturbations. The amplification ratio (Amp. ratio) is the maximum absolute reconciled forecast divided by the maximum observed value. It gives a direct measure of forecast explosion.
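The three quantities just defined can be computed as in the following stdlib-only sketch. It assumes the standard GLS (MinT-type) form of the cross-sectional operator, $R_{cross} = S_{cross} A_{cross}^{-1} S_{cross}^{\top} W^{-1}$, which is consistent with the description of $A_{cross}$ as the GLS matrix of the projection but is not restated in this section. It also assumes the 2-norm condition number, obtained for the symmetric positive-definite $A_{cross}$ as $\lambda_{max}(A_{cross}) \lambda_{max}(A_{cross}^{-1})$, with power iteration for the largest eigenvalues. The toy hierarchy (one total, two children) and all helper names are hypothetical, and dense inversion of this kind is only meant for small matrices.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if aug[p][c] == 0.0:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0.0:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def largest_eigenvalue(m, iters=1000):
    """Largest eigenvalue of a symmetric positive semi-definite matrix (power iteration)."""
    v = [1.0 + 0.1 * i for i in range(len(m))]
    lam = 0.0
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(row, v)) for row in m]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            break
        v = [x / lam for x in w]
    return lam


def cross_diagnostics(S, W):
    """Return (log10 kappa(A_cross), ||R_cross||_2, R_cross) for a GLS projection."""
    W_inv = inverse(W)
    St = transpose(S)
    A = matmul(matmul(St, W_inv), S)                 # A_cross = S' W^-1 S
    A_inv = inverse(A)
    # For symmetric positive-definite A: kappa_2 = lambda_max(A) * lambda_max(A^-1).
    kappa = largest_eigenvalue(A) * largest_eigenvalue(A_inv)
    R = matmul(matmul(matmul(S, A_inv), St), W_inv)  # R_cross = S A^-1 S' W^-1
    spectral_norm = math.sqrt(largest_eigenvalue(matmul(transpose(R), R)))
    return math.log10(kappa), spectral_norm, R


def amplification_ratio(reconciled, observed):
    """Maximum absolute reconciled forecast divided by the maximum observed value."""
    return max(abs(x) for x in reconciled) / max(observed)
```

With $W = I$ (OLS) and $S_{cross} = [[1, 1], [1, 0], [0, 1]]$, $A_{cross} = S_{cross}^{\top} S_{cross}$ has eigenvalues 3 and 1, so $\log_{10} \kappa = \log_{10} 3 \approx 0.48$, and $R_{cross}$ is an orthogonal projection with spectral norm 1. A nearly singular sample covariance $W$ can push both diagnostics far above these reference values.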

The diagnostic results show that the problem is not simply that some base models forecast worse. The large failures are concentrated in the cCOV variants, where the sample covariance matrix leads to poorly conditioned cross-sectional GLS projections. This is reflected in the NRMSE values. For example, TBATS:tCROSS:cCOV reaches an NRMSE of 94.8124, NHITS:tOLS:cCOV reaches 7.9416, and NBEATSx:tOLS:cCOV reaches 7.0525. After replacing cCOV with cCOVSh, these values decrease to 0.9288, 0.7207, and 0.6965, respectively.

The conditioning diagnostics help explain this behavior. For NBEATSx, $\log_{10} \kappa(A_{cross})$ decreases from 10.10 to 3.36 after shrinkage. For NHITS, it decreases from 10.17 to 2.70. These values indicate that the cross-sectional matrix involved in the GLS projection was severely ill-conditioned under cCOV and became much better conditioned under cCOVSh. TBATS shows a smaller conditioning value under cCOV (6.69), yet it produced the largest practical explosion. Conditioning alone therefore does not explain the failures; the amplification induced by the reconciliation operator also matters.
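The effect of shrinkage on conditioning can be seen on a 2 × 2 example. The sketch below does not reproduce the Ledoit–Wolf estimator, which estimates the shrinkage intensity from the data: it uses a fixed, hypothetical intensity λ and scales the off-diagonal entries of a hypothetical sample covariance toward zero (a diagonal target). It also conditions $W$ itself rather than $A_{cross}$. For unit variances and correlation ρ the eigenvalues are 1 ± ρ, so κ = (1 + ρ)/(1 − ρ), which explodes as ρ approaches 1.

```python
import math


def shrink_offdiagonal(W, lam):
    """Return lam * diag(W) + (1 - lam) * W, i.e. off-diagonals scaled by 1 - lam."""
    n = len(W)
    return [[W[i][j] if i == j else (1.0 - lam) * W[i][j] for j in range(n)]
            for i in range(n)]


def cond_sym_2x2(W):
    """2-norm condition number of a symmetric positive-definite 2 x 2 matrix."""
    (a, b), (_, c) = W
    centre = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return (centre + radius) / (centre - radius)


# Hypothetical sample covariance of two nearly collinear forecast errors.
W_sample = [[1.0, 0.999], [0.999, 1.0]]
W_shrunk = shrink_offdiagonal(W_sample, lam=0.5)       # hypothetical intensity
print(round(math.log10(cond_sym_2x2(W_sample)), 2))  # prints 3.3
print(round(math.log10(cond_sym_2x2(W_shrunk)), 2))  # prints 0.48
```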

The projection norm shows this amplification effect. Under cCOV, $\| R_{cross} \|_{2}$ is 9804.73 for NBEATSx and 11,876.24 for NHITS. After shrinkage, these values decrease to 13.86 and 17.31, respectively. For TBATS, the norm decreases from 268.28 to 10.51. These reductions mean that the cross-sectional reconciliation operator under cCOV had a high capacity to magnify small changes in the forecasts. The amplification ratio conveys the same message in practical terms. The clearest case is TBATS:tCROSS:cCOV, where the largest reconciled forecast was more than 3000 times larger than the maximum observed value. With cCOVSh, this ratio decreased to 0.78.

The convergence diagnostics also point to numerical instability under cCOV. All cCOVSh variants reached the tolerance in 100% of the cutoffs, and none stopped by patience. Under cCOV, convergence was incomplete in several cases. TBATS:tCROSS:cCOV reached the tolerance in only 30.8% of the cutoffs and stopped by patience in 69.2%. NBEATSx with tAUTOCOV or tCROSS reached the tolerance in 84.6% of the cutoffs and stopped by patience in 15.4%. NHITS with tOLS reached the tolerance in 82.7% of the cutoffs and stopped by patience in 17.3%.
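Because the evaluation uses 52 walk-forward cutoffs, each reported percentage corresponds to a whole number of cutoffs. The counts below are inferred from the rounded percentages rather than reported directly; in each case the converged and patience-stopped cutoffs add up to 52, consistent with every cutoff ending under one of the two rules.

$$
\frac{16}{52} \approx 30.8\%, \quad \frac{36}{52} \approx 69.2\%; \qquad
\frac{44}{52} \approx 84.6\%, \quad \frac{8}{52} \approx 15.4\%; \qquad
\frac{43}{52} \approx 82.7\%, \quad \frac{9}{52} \approx 17.3\%
$$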

The iteration counts show that coherence convergence and numerical reliability are not the same. Some configurations reached their stopping criterion quickly but still produced unrealistic forecasts. For example, NBEATSx:tOLS:cCOV has a median/maximum of 2/11 iterations, but its amplification ratio is 64.56. Similarly, NHITS:tOLS:cCOV has 2/11 iterations and an amplification ratio of 63.17. In these cases, the failure occurs almost immediately after applying the covariance-based projection. Other cases, such as NBEATSx:tAUTOCOV:cCOV and NBEATSx:tCROSS:cCOV, require many more iterations in the worst cutoffs, with maximum values of 168 and 199, showing an additional convergence issue.

These results support a more cautious interpretation of full sample covariance estimators in iterative cross-temporal reconciliation. In this experiment, cCOVSh acts as a numerical regularization of cCOV and prevents the largest forecast explosions. The results also show that reaching coherence is not enough to guarantee useful forecasts. For this reason, shrinkage regularization, early stopping, conditioning checks, projection-norm checks, and fallback to simpler diagonal estimators such as cWLS should be treated as practical safeguards when covariance-based iterative reconciliation is used.
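The safeguards listed above can be combined into a simple selection rule. The sketch below is hypothetical: the thresholds, the dictionary keys, and the preference order (cCOV, then cCOVSh, then the diagonal cWLS fallback) are illustrative assumptions, not values calibrated in this study. A covariance-based estimator is accepted only if its run reached the tolerance on every cutoff and its conditioning, projection norm, and amplification ratio all stay within the chosen limits.

```python
def choose_cross_estimator(diagnostics, preference=("cCOV", "cCOVSh"),
                           fallback="cWLS", max_log10_kappa=8.0,
                           max_norm=100.0, max_amp=1.5, min_conv=1.0):
    """Pick the first covariance-based estimator whose diagnostics pass all checks.

    diagnostics maps an estimator name to a dict with keys 'log10_kappa',
    'norm', 'amp_ratio' and 'conv_rate' (share of cutoffs that reached the
    tolerance). All default thresholds are hypothetical.
    """
    for name in preference:
        d = diagnostics.get(name)
        if d is None:
            continue
        if (d["conv_rate"] >= min_conv
                and d["log10_kappa"] <= max_log10_kappa
                and d["norm"] <= max_norm
                and d["amp_ratio"] <= max_amp):
            return name
    return fallback  # diagonal estimator as the safe default
```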

6. Conclusions, Limitations and Future Work

Under the fixed-configuration experimental protocol adopted in this study, cross-temporal reconciliation improved PV forecasting accuracy for all evaluated models, with the best reconciled configuration for each model outperforming both its unreconciled baseline and the BU benchmark. These improvements, however, were not uniform: univariate and iterative schemes produced the largest error reductions, particularly when applied to more accurate base forecasts. The effectiveness of reconciliation depended mainly on the base forecasting model and, to a lesser extent, on the reconciliation strategy. LightGBM achieved the lowest errors in both the baseline and reconciled comparisons under the specific feature set, software implementations, and baseline configurations considered here, with its best results obtained under the univariate and iterative approaches. This finding should therefore be interpreted as conditional on the adopted experimental design, rather than as evidence of an intrinsic superiority of LightGBM over other forecasting models. KAN, NBEATSx, and NHITS also benefited from reconciliation, although the magnitude of improvement varied more across specifications, whereas TBATS and TimeGPT showed smaller or less consistent gains. Within the univariate approach, temporal-first variants generally outperformed cross-sectional-first alternatives. Univariate and iterative schemes involving full cross-sectional covariance estimators were usually more prone to instability, producing large error increases in some configurations. Although covariance-based estimators can be beneficial when the error structure is well estimated, their use in the cross-sectional step requires caution for this dataset. 
In particular, covariance-based iterative reconciliation should be accompanied by shrinkage regularization, explicit early-stopping rules, covariance-conditioning checks, and projection-amplification diagnostics to reduce the risk of unstable inversions and unrealistic forecast adjustments. Because the difference between the best univariate and iterative configurations was often small, the extra computational and estimation effort required by the iterative scheme may not always be justified in practice. Across cross-sectional levels, the reductions in error followed a similar pattern, whereas across temporal frequencies the clearest gains were observed at the weekly and hourly levels, with a weaker effect at the daily level.

Several limitations should be acknowledged. The analysis was restricted to a single national PV system with a fixed cross-sectional hierarchy and three temporal aggregation levels, so the findings should not be assumed to transfer directly to other energy systems or climatic regimes. A further important limitation is that all forecasting models were evaluated under fixed baseline configurations, without model-specific hyperparameter tuning or architecture search. This design was adopted to preserve a common evaluation protocol and to keep the computational burden manageable across the large set of forecasting models, temporal frequencies, cross-sectional levels, walk-forward cutoffs, and reconciliation specifications considered in the study. However, different forecasting paradigms can exhibit different degrees of sensitivity to hyperparameter choices, network architecture, training length, regularization, and feature construction. Consequently, the relative ranking of the forecasting models, including the observed advantage of LightGBM, may partly reflect the selected implementations and baseline configurations. The study therefore provides a controlled comparison of cross-temporal reconciliation behavior across forecasting paradigms under standardized conditions, but not a fully optimized benchmark for each model family.

Future work should evaluate these strategies in other PV systems and renewable generation settings, incorporate richer spatial information through static spatial features or embeddings, and explore better-conditioned weighting matrices for high-dimensional reconciliation. A priority for future research is to conduct budgeted and methodologically balanced hyperparameter tuning across model families, for example through rolling-validation search spaces or equivalent computational budgets, to assess whether the comparative rankings remain stable after optimization. Another relevant direction is to assess whether ensemble base forecasts can provide a better starting point for cross-temporal reconciliation than single-model inputs. Additional extensions include probabilistic reconciliation, non-negativity constraints, and hybrid reconciliation frameworks [59,60].

Author Contributions

Conceptualization, H.F.C.-G. and A.G.-O.; methodology, H.F.C.-G. and A.G.-O.; software, H.F.C.-G. and A.G.-O.; validation, H.F.C.-G. and A.G.-O.; formal analysis, H.F.C.-G.; investigation, A.G.-O.; resources, H.F.C.-G.; data curation, A.G.-O.; writing—original draft preparation, H.F.C.-G. and A.G.-O.; writing—review and editing, H.F.C.-G. and A.G.-O.; visualization, A.G.-O.; supervision, H.F.C.-G.; project administration, H.F.C.-G.; funding acquisition, H.F.C.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The photovoltaic generation data used in this study are publicly available on the Elia Open Data platform at https://opendata.elia.be/explore/dataset/ods032/ (accessed on 1 November 2025). All other data and the code developed for the analysis are available from the corresponding author upon reasonable request.

Acknowledgments

The authors appreciate the support provided by AgnaLab for computational infrastructure and technical resources. This work was carried out using cloud-based platforms and GPU-enabled environments generously made available through AgnaLab’s data science innovation program.

Conflicts of Interest

Alberto Gudiño-Ochoa and Harold Felipe Calderón-González are employed by AgnaLab. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Mathematical Symbols

The following mathematical symbols are used in this manuscript:

Symbol	Definition
$P (r)$	Set of provinces belonging to region r
p	Province index
r	Region index
${\tilde{y}}_{p}$	Forecast for province p
${\tilde{y}}_{r}$	Forecast for region r
${\tilde{y}}_{national}$	National-level forecast
$\hat{y}$	Base forecast before reconciliation
$\tilde{y}$	Reconciled forecast after enforcing aggregation constraints
$\hat{\mathbf{y}}$	Vector of base forecasts
$\tilde{\mathbf{y}}$	Vector of reconciled forecasts
${\tilde{y}}^{cross}$	Cross-sectional vector of reconciled forecasts
${\tilde{y}}_{b}^{cross}$	Bottom-level cross-sectional forecast vector
${\tilde{y}}_{b}$	Bottom-level temporal forecast vector
$S$	Generic summing matrix defining hierarchical aggregation constraints
$S_{cross}$	Cross-sectional summing matrix
$S_{temp}$	Temporal summing matrix
$S_{CT}$	Joint cross-temporal summing matrix
$B_{3 \times 11}$	Regional aggregation matrix from provinces to regions
$I_{m}$	Identity matrix of order m
$1_{m}$	m-dimensional vector of ones
⊗	Kronecker product
$R$	Reconciliation matrix
$G$	Combination matrix in the GLS reconciliation framework
$W$	Forecast-error covariance matrix
$Σ_{k}$	Covariance matrix for temporal level k in the AUTOCOV estimator
$\hat{Σ}$	Estimated covariance matrix
${\hat{Σ}}_{LW}$	Ledoit–Wolf shrinkage covariance estimate
$Y$	Forecast matrix in the iterative reconciliation procedure
$R_{temp}$	Temporal reconciliation matrix
$R_{cross}$	Cross-sectional reconciliation matrix
T	Number of temporal indices in the iterative matrix representation
C	Number of cross-sectional nodes in the iterative matrix representation
c	Cross-sectional node index in the iterative reconciliation procedure
$τ$	Temporal index in the iterative reconciliation procedure
t	Iteration index in the iterative reconciliation procedure
$B_{temp}$	Set of bottom-level temporal indices
$B_{cross}$	Set of bottom-level cross-sectional indices
$d_{temp}^{(t)}$	Temporal aggregation inconsistency at iteration t
$d_{cross}^{(t)}$	Cross-sectional aggregation inconsistency at iteration t
$δ$	Convergence tolerance for the iterative reconciliation procedure
${∥ \cdot ∥}_{1}$	$ℓ_{1}$ norm
$λ_{BC}$	Box–Cox transformation parameter
$ℓ_{t}$	Level component in the TBATS model
$b_{t}$	Trend component in the TBATS model
$ϕ$	Damping parameter in the TBATS model
$s_{t}^{(r)}$	rth trigonometric seasonal component in TBATS
M	Number of seasonal components in TBATS
$k_{r}$	Number of Fourier harmonics for the rth seasonal component
$d_{t}$	ARMA disturbance term in TBATS
$p, q$	Autoregressive and moving-average orders in TBATS
$φ_{a}, θ_{b}$	Autoregressive and moving-average coefficients in TBATS
$ε_{t}$	Innovation term
$x_{i}$	Feature vector for observation i in LightGBM
$f_{t}$	Regression tree added at boosting iteration t in LightGBM
$T_{tree}$	Space of regression trees in LightGBM
$pos$	Temporal position in the sinusoidal positional encoding
$d_{model}$	Model dimension in the Transformer architecture
$Q, K, V$	Query, key, and value matrices in the attention mechanism
$d_{k}$	Dimension of the key vectors in attention
$h_{1}, \dots, h_{H_{att}}$	Attention-head outputs in the multi-head attention mechanism
$H_{att}$	Number of attention heads
$W^{O}$	Output projection matrix in multi-head attention
$X$	Input representation or matrix of exogenous regressors, depending on context
$γ, β$	Scale and shift parameters in layer normalization
$μ, σ$	Mean and standard deviation used in layer normalization
$ϵ$	Small numerical constant used for stability in layer normalization
$H_{for}$	Forecast horizon in TimeGPT
$f_{θ}$	Forecasting function parameterized by $θ$
$x$	Input vector in the KAN composition
$φ_{q, p}$	Inner univariate function in the Kolmogorov–Arnold representation
$Φ_{q}$	Outer univariate function in the Kolmogorov–Arnold representation
$φ_{i, j}^{(l)}$	Trainable edge function from unit i to unit j in KAN layer l
n	Input dimension in the Kolmogorov–Arnold representation
$n_{l}$	Number of units in KAN layer l
$Ψ_{l}$	Vector-valued transformation implemented by the lth KAN layer
N	Number of trainable parameters when discussing KAN scaling laws
S	Number of stacks in NBEATSx
B	Number of blocks in NBEATSx or NHITS, depending on the local model subsection
$s, b$	Stack and block indices in NBEATSx
$h_{s, b}$	Hidden representation produced by block b in stack s of NBEATSx
$y_{s, b}^{back}$	Backcast residual input in NBEATSx
$X_{s, b - 1}$	Matrix of exogenous regressors in NBEATSx
$θ_{s, b}^{back}$	Backcast coefficient vector in NBEATSx
$θ_{s, b}^{for}$	Forecast coefficient vector in NBEATSx
$W^{back}, W^{for}$	Projection matrices for backcast and forecast coefficients in NBEATSx
$V_{s, b}^{back}, V_{s, b}^{for}$	Basis matrices for backcast and forecast expansions in NBEATSx
L	Input window length in NHITS
H	Forecast horizon in NHITS
ℓ	Local index; it denotes the NHITS block index in Section 3 and the forecast horizon step in the NRMSE evaluation
$k_{ℓ}$	Pooling kernel associated with NHITS block ℓ
$h_{ℓ}$	Hidden representation produced by NHITS block ℓ
$θ_{f}^{ℓ}, θ_{b}^{ℓ}$	Forecast and backcast coefficient vectors in NHITS
$ξ$	Interpolation grid used in NHITS
$I (\cdot)$	Interpolation function used in NHITS
$N_{series}$	Total number of evaluated series
i	Series index in the NRMSE evaluation
j	Forecasting model index in the NRMSE evaluation
f	Temporal frequency index
$w, d, h$	Weekly, daily, and hourly temporal frequencies, respectively
g	Cross-sectional hierarchy-level index in the NRMSE evaluation
$N, R, P$	National, Regional, and Provincial cross-sectional levels, respectively
k	Walk-forward validation cutoff index
$L_{f}$	Forecast horizon length associated with temporal frequency f
$F$	Set of temporal frequencies
$G$	Set of cross-sectional hierarchy levels
$S_{g}$	Set of series belonging to cross-sectional level g
K	Total number of walk-forward validation cutoffs
NRMSE	Normalized Root Mean Squared Error
$\overline{\mathrm{NRMSE}}_{j}$	Overall global NRMSE for forecasting model j averaged over all cutoffs

Appendix A

Global NRMSE by Reconciliation and Base Model

Table A1. Global NRMSE obtained by each cross-temporal reconciliation approach and estimator for all base models. Blue values indicate the lowest score in each model column, and red values the highest.

Reconciliation	Estimator	KAN	LightGBM	NBEATSx	NHITS	TBATS	TimeGPT
Base	base	0.6624	0.5080	0.6576	0.6544	0.8561	0.7421
Direct	ctBU	0.6304	0.5109	0.6883	0.6941	0.9266	0.7764
	ctCOV	0.6316	0.4586	0.6487	0.6155	0.9187	0.7258
	ctOLS	0.7220	0.5516	0.6989	0.7095	0.8009	0.7252
	ctSS	0.5980	0.4772	0.6070	0.6156	0.8311	0.7334
Iterative	tAUTOCOV:cCOV	0.6373	0.4305	1.6584	1.0043	1.0079	0.7685
	tAUTOCOV:cCOVSh	0.6129	0.4303	0.6652	0.6712	0.8972	0.7069
	tAUTOCOV:cOLS	0.6183	0.4506	0.6688	0.6592	0.9419	0.7104
	tAUTOCOV:cSS	0.6123	0.4306	0.6551	0.6264	0.9015	0.7077
	tAUTOCOV:cWLS	0.6173	0.4288	0.6510	0.6131	0.8809	0.7056
	tCROSS:cCOV	0.6688	0.4174	1.7631	0.9685	94.8124	1.0209
	tCROSS:cCOVSh	0.6184	0.4175	0.6685	0.6643	0.9288	0.7420
	tCROSS:cOLS	0.6232	0.4508	0.6718	0.6551	0.9604	0.7357
	tCROSS:cSS	0.6139	0.4175	0.6587	0.6253	0.9140	0.7319
	tCROSS:cWLS	0.6191	0.4099	0.6557	0.6142	0.8917	0.7299
	tOLS:cCOV	0.8218	0.5251	7.0525	7.9416	0.7964	0.7222
	tOLS:cCOVSh	0.7135	0.5246	0.6965	0.7207	0.7918	0.7116
	tOLS:cOLS	0.7220	0.5516	0.6989	0.7095	0.8009	0.7252
	tOLS:cSS	0.7080	0.5244	0.6885	0.6733	0.7958	0.7210
	tOLS:cWLS	0.7046	0.5169	0.6841	0.6683	0.7941	0.7168
	tWLSH:cCOV	0.6662	0.4645	1.1238	1.0865	0.9678	0.7602
	tWLSH:cCOVSh	0.5603	0.4644	0.6680	0.6533	0.9611	0.7470
	tWLSH:cOLS	0.5692	0.4788	0.6686	0.6608	1.0380	0.7728
	tWLSH:cSS	0.5609	0.4640	0.6467	0.6403	0.9780	0.7639
	tWLSH:cWLS	0.5565	0.4636	0.6331	0.6257	0.9435	0.7548
	tWLSS:cCOV	0.6066	0.4770	2.1387	2.2417	1.0483	0.7760
	tWLSS:cCOVSh	0.6025	0.4768	0.6170	0.6301	0.8306	0.7223
	tWLSS:cOLS	0.6046	0.4976	0.6121	0.6289	0.8486	0.7395
	tWLSS:cSS	0.5980	0.4772	0.6070	0.6156	0.8311	0.7334
	tWLSS:cWLS	0.5985	0.4716	0.6073	0.6125	0.8255	0.7274
	tWLSV:cCOV	0.5924	0.4808	0.9506	0.9027	0.9680	0.7641
	tWLSV:cCOVSh	0.5872	0.4807	0.6746	0.6699	0.9623	0.7546
	tWLSV:cOLS	0.5912	0.4975	0.6722	0.6728	1.0394	0.7807
	tWLSV:cSS	0.5832	0.4811	0.6533	0.6563	0.9795	0.7716
	tWLSV:cWLS	0.5824	0.4788	0.6429	0.6463	0.9452	0.7621
Univariate	cCOV:tBU	0.6493	0.5164	0.7693	0.7477	0.9682	0.7719
	cCOVSh:tBU	0.6490	0.5164	0.7088	0.7102	0.9662	0.7724
	cOLS:tBU	0.6420	0.5391	0.6961	0.7069	1.0443	0.8030
	cSS:tBU	0.6333	0.5185	0.6879	0.7018	0.9838	0.7924
	cWLS:tBU	0.6391	0.5118	0.6919	0.7033	0.9495	0.7805
	tAUTOCOV:cBU	0.6264	0.4301	0.6498	0.6117	0.8709	0.7051
	tCROSS:cBU	0.6286	0.4109	0.6556	0.6126	0.8823	0.7298
	tOLS:cBU	0.7038	0.5171	0.6822	0.6480	0.7951	0.7155
	tWLSH:cBU	0.5586	0.4647	0.6203	0.6128	0.9209	0.7518
	tWLSS:cBU	0.5979	0.4713	0.6073	0.6077	0.8240	0.7255
	tWLSV:cBU	0.5805	0.4794	0.6299	0.6328	0.9225	0.7589

Base denotes the unreconciled forecast of each model. The first column indicates the reconciliation type, and the second column reports the specific estimator used within that family.

References

Reindl, T.; Walsh, W.; Yanqin, Z.; Bieri, M. Energy meteorology for accurate forecasting of PV power output on different time horizons. Energy Procedia 2017, 130, 130–138.
Duranay, Z.B. Extreme learning machine-based power forecasting in photovoltaic systems. IEEE Access 2023, 11, 128923–128931.
Sabadus, A.; Blaga, R.; Hategan, S.M.; Calinoiu, D.; Paulescu, E.; Mares, O.; Boata, R.; Stefu, N.; Paulescu, M.; Badescu, V. A cross-sectional survey of deterministic PV power forecasting: Progress and limitations in current approaches. Renew. Energy 2024, 226, 120385.
Van der Meer, D.W.; Widén, J.; Munkhammar, J. Review on probabilistic forecasting of photovoltaic power production and electricity consumption. Renew. Sustain. Energy Rev. 2018, 81, 1484–1512.
Falope, T.O.; Lao, L.; Huo, D.; Kuang, B. Development of an integrated energy management system for off-grid solar applications with advanced solar forecasting, time-of-use tariffs, and direct load control. Sustain. Energy Grids Netw. 2024, 39, 101449.
Visser, L.; AlSkaif, T.; Hu, J.; Louwen, A.; van Sark, W. On the value of expert knowledge in estimation and forecasting of solar photovoltaic power generation. Sol. Energy 2023, 251, 86–105.
Wang, Y.; Yin, W.; Zhai, Y.; Zhang, H.; Xu, F. A Day-Ahead Power Prediction Algorithm for PV Power Plants Based on Power Market Benefit and BP. In Proceedings of the 2024 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR); IEEE: New York, NY, USA, 2024; pp. 1–6.
Hai, T.; Singh, N.S.S.; Jamal, F. Energy management of a microgrid with integration of renewable energy sources considering energy storage systems with electricity price. J. Energy Storage 2025, 110, 115191.
Ye, H.; Yang, B.; Han, Y.; Chen, N. State-of-the-art solar energy forecasting approaches: Critical potentials and challenges. Front. Energy Res. 2022, 10, 875790.
Kontopoulou, V.I.; Panagopoulos, A.D.; Kakkos, I.; Matsopoulos, G.K. A review of ARIMA vs. machine learning approaches for time series forecasting in data driven networks. Future Internet 2023, 15, 255.
Matushkin, D.; Zaporozhets, A.; Babak, V.; Kulyk, M.; Denysov, V. Hourly Photovoltaic Power Forecasting Using Exponential Smoothing: A Comparative Study Based on Operational Data. Solar 2025, 5, 48.
Masini, R.P.; Medeiros, M.C.; Mendes, E.F. Machine learning advances for time series forecasting. J. Econ. Surv. 2023, 37, 76–111.
Markovics, D.; Mayer, M.J. Comparison of machine learning methods for photovoltaic power forecasting based on numerical weather prediction. Renew. Sustain. Energy Rev. 2022, 161, 112364.
Gupta, M.; Arya, A.; Varshney, U.; Mittal, J.; Tomar, A. A review of PV power forecasting using machine learning techniques. Prog. Eng. Sci. 2025, 2, 100058.
Alcañiz, A.; Grzebyk, D.; Ziar, H.; Isabella, O. Trends and gaps in photovoltaic power forecasting with machine learning. Energy Rep. 2023, 9, 447–471.
Pereira, S.; Canhoto, P.; Oozeki, T.; Salgado, R. Comprehensive approach to photovoltaic power forecasting using numerical weather prediction data and physics-based models and data-driven techniques. Renew. Energy 2025, 251, 123495.
An, S.; Oh, T.J.; Sohn, E.; Kim, D. Deep learning for precipitation nowcasting: A survey from the perspective of time series forecasting. Expert Syst. Appl. 2025, 268, 126301.
Tan, H.; Qin, J.; Li, Z.; Wu, W. PSCNet: Long sequence time-series forecasting for photovoltaic power via period selection and cross-variable attention. Appl. Intell. 2025, 55, 642.
López Santos, M.; García-Santiago, X.; Echevarría Camarero, F.; Blázquez Gil, G.; Carrasco Ortega, P. Application of temporal fusion transformer for day-ahead PV power forecasting. Energies 2022, 15, 5232.
Tao, K.; Zhao, J.; Tao, Y.; Qi, Q.; Tian, Y. Operational day-ahead photovoltaic power forecasting based on transformer variant. Appl. Energy 2024, 373, 123825.
Liao, W.; Wang, S.; Yang, D.; Yang, Z.; Fang, J.; Rehtanz, C.; Porté-Agel, F. TimeGPT in load forecasting: A large time series model perspective. Appl. Energy 2025, 379, 124973.
Gonzalez, F.C.; Gudiño-Ochoa, A. Temporal hierarchical forecast reconciliation of photovoltaic power generation from heterogeneous base models. Meas. Energy 2026, 10, 100094.
Yang, D.; Quan, H.; Disfani, V.R.; Liu, L. Reconciling solar forecasts: Geographical hierarchy. Sol. Energy 2017, 146, 276–286.
Agoua, X.G.; Girard, R.; Kariniotakis, G. Photovoltaic power forecasting: Assessment of the impact of multiple sources of spatio-temporal data on forecast accuracy. Energies 2021, 14, 1432.
Yang, D.; Yang, G.; Perez, M.J.; Perez, R. Effectively dispatchable solar power with hierarchical reconciliation and firm forecasting. J. Mod. Power Syst. Clean Energy 2024, 13, 585–596.
Di Fonzo, T.; Girolimetto, D. Spatio-temporal reconciliation of solar forecasts. Sol. Energy 2023, 251, 13–29.
Athanasopoulos, G.; Hyndman, R.J.; Kourentzes, N.; Panagiotelis, A. Forecast reconciliation: A review. Int. J. Forecast. 2024, 40, 430–456.
Kourentzes, N.; Athanasopoulos, G. Cross-temporal coherent forecasts for Australian tourism. Ann. Tour. Res. 2019, 75, 393–409.
Di Fonzo, T.; Girolimetto, D. Cross-temporal forecast reconciliation: Optimal combination method and heuristic alternatives. Int. J. Forecast. 2023, 39, 39–57.
Rombouts, J.; Ternes, M.; Wilms, I. Cross-temporal forecast reconciliation at digital platforms with machine learning. Int. J. Forecast. 2025, 41, 321–344.
Girolimetto, D.; Di Fonzo, T. Cross-temporal forecast reconciliation: Insights on sequential, iterative, and optimal approaches. Stat. Methods Appl. 2025, 35, 161–180.
Spiliotis, E.; Petropoulos, F.; Kourentzes, N.; Assimakopoulos, V. Cross-temporal aggregation: Improving the forecast accuracy of hierarchical electricity consumption. Appl. Energy 2020, 261, 114339.
Di Fonzo, T.; Girolimetto, D. Enhancements in cross-temporal forecast reconciliation, with an application to solar irradiance forecasts. arXiv 2022, arXiv:2209.07146.
Quinn, C.O.; Corliss, G.F.; Povinelli, R.J. Cross-temporal hierarchical forecast reconciliation of natural gas demand. Energies 2024, 17, 3077.
Abolghasemi, M.; Girolimetto, D.; Di Fonzo, T. Improving cross-temporal forecasts reconciliation accuracy and utility in energy market. Appl. Energy 2025, 394, 126053.
Calderon Gonzalez, H.F.; Gudiño-Ochoa, A. Heuristic Cross-Temporal Reconciliation Applied to Heterogeneous Models in Photovoltaic Forecasting. SSRN 2025. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5527782 (accessed on 28 June 2026).
Zsiborács, H.; Pintér, G.; Vincze, A.; Birkner, Z.; Baranyai, N.H. Grid balancing challenges illustrated by two European examples: Interactions of electric grids, photovoltaic power generation, energy storage and power generation forecasting. Energy Rep. 2021, 7, 3805–3818.
Iheanetu, K.J. Solar photovoltaic power forecasting: A review. Sustainability 2022, 14, 17005.
Hyndman, R.J.; Ahmed, R.A.; Athanasopoulos, G.; Shang, H.L. Optimal combination forecasts for hierarchical time series. Comput. Stat. Data Anal. 2011, 55, 2579–2589.
Ledoit, O.; Wolf, M. The power of (non-) linear shrinking: A review and guide to covariance matrix estimation. J. Financ. Econ. 2022, 20, 187–218.
Ledoit, O.; Wolf, M. Analytical nonlinear shrinkage of large-dimensional covariance matrices. Ann. Stat. 2020, 48, 3043–3065.
Wickramasuriya, S.L.; Athanasopoulos, G.; Hyndman, R.J. Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. J. Am. Stat. Assoc. 2019, 114, 804–819.
Athanasopoulos, G.; Hyndman, R.J.; Kourentzes, N.; Petropoulos, F. Forecasting with temporal hierarchies. Eur. J. Oper. Res. 2017, 262, 60–74.
Nystrup, P.; Lindström, E.; Pinson, P.; Madsen, H. Temporal hierarchies with autocorrelation for load forecasting. Eur. J. Oper. Res. 2020, 280, 876–888.
Girolimetto, D.; Di Fonzo, T. Insights into regression-based cross-temporal forecast reconciliation. In Proceedings of the Scientific Meeting of the Italian Statistical Society; Springer: Berlin/Heidelberg, Germany, 2024; pp. 119–125.
De Livera, A.M. Modeling Time Series with Complex Seasonal Patterns Using Exponential Smoothing. Ph.D. Thesis, Monash University, Melbourne, Australia, 2010.
De Livera, A.M.; Hyndman, R.J.; Snyder, R.D. Forecasting time series with complex seasonal patterns using exponential smoothing. J. Am. Stat. Assoc. 2011, 106, 1513–1527.
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. Lightgbm: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 3149–3157.
Garza, A.; Challu, C.; Mergenthaler-Canseco, M. TimeGPT-1. arXiv 2023, arXiv:2310.03589. [Google Scholar]
Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. Kan: Kolmogorov-arnold networks. arXiv 2024, arXiv:2404.19756. [Google Scholar]
Oreshkin, B.N.; Carpov, D.; Chapados, N.; Bengio, Y. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. arXiv 2019, arXiv:1905.10437. [Google Scholar]
Olivares, K.G.; Challu, C.; Marcjasz, G.; Weron, R.; Dubrawski, A. Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx. Int. J. Forecast. 2023, 39, 884–900. [Google Scholar]
Challu, C.; Olivares, K.G.; Oreshkin, B.N.; Ramirez, F.G.; Canseco, M.M.; Dubrawski, A. Nhits: Neural hierarchical interpolation for time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2023; Volume 37, pp. 6989–6997. [Google Scholar]
Mansouri, H.; Paige, R.L.; Surles, J.G. Aligned rank transform techniques for analysis of variance and multiple comparisons. Commun. Stat.-Theory Methods 2004, 33, 2217–2232. [Google Scholar] [CrossRef]
Wobbrock, J.O.; Findlater, L.; Gergle, D.; Higgins, J.J. The aligned rank transform for nonparametric factorial analyses using only anova procedures. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems; ACM: New York, NY, USA, 2011; pp. 143–146. [Google Scholar]
Kowalchuk, R.K.; Keselman, H.; Algina, J. Repeated measures interaction test with aligned ranks. Multivar. Behav. Res. 2003, 38, 433–461. [Google Scholar] [CrossRef]
Diebold, F.X.; Mariano, R.S. Comparing predictive accuracy. J. Bus. Econ. Stat. 2002, 20, 134–144. [Google Scholar] [CrossRef]
Harvey, D.; Leybourne, S.; Newbold, P. Testing the equality of prediction mean squared errors. Int. J. Forecast. 1997, 13, 281–291. [Google Scholar] [CrossRef]
Girolimetto, D. Non-negative forecast reconciliation: Optimal methods and operational solutions. Forecasting 2025, 7, 64. [Google Scholar] [CrossRef]
Girolimetto, D.; Di Fonzo, T. Coherent forecast combination for linearly constrained multiple time series. arXiv 2024, arXiv:2412.03429. [Google Scholar]

Figure 1. Cross-sectional (left), temporal (middle), and cross-temporal (right) hierarchy considered in this study.

Figure 2. Hierarchical structure of the joint summing matrix $S_{CT}$, obtained as the Kronecker product of the cross-sectional and temporal summing matrices.
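The Kronecker construction in Figure 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the study's code: the helper names are hypothetical, the assignment of the 11 provinces to the 3 regions (5, 5 and 1) is an assumption made only so the matrices have the sizes implied by Table 2, and each node's 176 temporal values (1 week, 7 days, 168 hours) are assumed to be stacked node by node. Under the opposite stacking convention the factors swap order, giving $S_{te} \otimes S_{cs}$.

```python
import numpy as np

def cross_sectional_S(groups):
    """National row, one row per region, then an identity over the provinces."""
    n_bottom = sum(len(g) for g in groups)
    regional = np.zeros((len(groups), n_bottom))
    for r, provinces in enumerate(groups):
        regional[r, provinces] = 1.0
    return np.vstack([np.ones((1, n_bottom)), regional, np.eye(n_bottom)])

def temporal_S(days=7, hours_per_day=24):
    """Weekly total, daily totals, then an hourly identity for one week."""
    m = days * hours_per_day
    daily = np.kron(np.eye(days), np.ones((1, hours_per_day)))
    return np.vstack([np.ones((1, m)), daily, np.eye(m)])

# Hypothetical province-to-region map (P1-P5, P6-P10, P11).
groups = [list(range(0, 5)), list(range(5, 10)), [10]]
S_cs = cross_sectional_S(groups)   # (15, 11): N, R1-R3, P1-P11
S_te = temporal_S()                # (176, 168): week, 7 days, 168 hours
S_ct = np.kron(S_cs, S_te)         # (2640, 1848), node-by-node stacking
print(S_cs.shape, S_te.shape, S_ct.shape)
```

Multiplying `S_ct` by the stacked hourly provincial values returns every node at every temporal level; its first row, for instance, is the national weekly total.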


Figure 3. Representative PV generation series between January 2020 and April 2025 at different frequencies. Colors denote hierarchical levels (national, regional, provincial).

Figure 4. Correlation between PV generation (y) and meteorological features. Shortwave radiation, temperature, and humidity dominate, while cloud cover has limited explanatory value. Collinear irradiance indicators were removed.

Figure 5. Workflow of the proposed forecasting and cross-temporal reconciliation framework.

Figure 6. MCB Nemenyi test using NRMSE for the baseline forecasting models. Results are shown globally, by hierarchical level, and by temporal frequency. The Friedman test p-value is reported in the lower-right corner ($p < 0.05$). Lower ranks indicate better performance. The blue band denotes the critical interval around the best-ranked model; blue triangles mark models statistically indistinguishable from it, whereas red circles mark statistically worse models.


Figure 7. MCB Nemenyi test based on NRMSE for the top 10 reconciled forecasting models. Results are reported at the global level and separately by hierarchical level and temporal frequency. The Friedman test p-value is shown in the lower-right corner ($p < 0.05$). Lower ranks indicate better performance. The blue band denotes the critical interval around the best-ranked model; blue triangles mark models statistically indistinguishable from it.


Figure 8. Percentage change in global NRMSE relative to the BU benchmark under the three cross-temporal reconciliation approaches. Negative values indicate lower error than BU, whereas positive values indicate higher error. (a) Direct cross-temporal reconciliation. (b) Univariate cross-temporal reconciliation. (c) Iterative cross-temporal reconciliation.

Figure 9. NRMSE by temporal frequency and cross-sectional hierarchy level for the best-performing reconciled configuration of each forecasting model.

Figure 10. Observed summer PV generation at the Belgian national level (black), together with forecasts from the best-performing reconciled configuration of each forecasting model at the weekly (top), daily (middle), and hourly (bottom) frequencies.

Table 1. Estimators of the covariance matrix $W$ applied in temporal and cross-sectional reconciliation.


Method	Form of W	Description
Temporal estimators
Ordinary least squares (OLS)	$I$	Assumes homoscedastic and uncorrelated errors. Ignores heterogeneity between levels and all correlations [42].
Weighted least squares with structural scaling (WLSS)	$\sigma^{2}\,\mathrm{diag}(S\mathbf{1})$	Variances are assumed proportional to the number of bottom-level series aggregated at each node. Does not require residuals [43].
Weighted least squares with level variance (WLSV)	$\mathrm{diag}(\hat{\sigma}_{[k]}^{2})$	A single variance is estimated from in-sample residuals for each temporal level (week/day/hour). No cross-node correlations are modeled [43].
Weighted least squares with hierarchy variance (WLSH)	$\mathrm{diag}(\hat{\sigma}_{[k],c}^{2})$	Fully heterogeneous estimator in which each node has its own variance estimated from residuals. Captures intra-level differences but ignores correlations [43].
Autocovariance scaling (AUTOCOV)	$\mathrm{blockdiag}(\Sigma_{k})$	Block-diagonal covariance estimated from residuals. Preserves within-level autocorrelation, while correlations between levels are not considered [44].
Cross-covariance scaling (CROSS)	$\hat{\Sigma}_{\mathrm{LW}}$	Full covariance across all nodes estimated from residuals. Ledoit–Wolf shrinkage regularizes the matrix inversion while preserving both within-level and cross-level correlation structures [44].
Cross-sectional estimators
Ordinary least squares (OLS)	$I$	Assumes homoscedastic and uncorrelated errors across units. This is the simplest diagonal form [27].
Structural scaling (SS)	$\sigma^{2}\,\mathrm{diag}(S\mathbf{1})$	Weights are proportional to the number of bottom-level units contributing to each aggregate. Does not require residuals [27].
Weighted least squares (WLS)	$\mathrm{diag}(\hat{\sigma}_{c}^{2})$	Uses node-specific variances estimated from residuals. Correlations are not modeled [27].
Covariance estimator (COV)	$\hat{\Sigma}$	Sample covariance matrix estimated directly from residuals. It can become unstable in high-dimensional settings [42].
Shrinkage covariance estimator (COVSh)	$\hat{\Sigma}_{\mathrm{LW}}$	Covariance matrix estimated from residuals and regularized with Ledoit–Wolf shrinkage to improve numerical stability and inversion [42].
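Assuming the estimators in Table 1 enter reconciliation through the usual generalized least squares (MinT-type) projection $\tilde{y} = S(S^{\top}W^{-1}S)^{-1}S^{\top}W^{-1}\hat{y}$, the cross-sectional variants can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: residuals and base forecasts are synthetic, the province-to-region map is hypothetical, and the shrinkage step uses the classic linear Ledoit–Wolf estimator towards a scaled identity, whose target may differ from the one used in the study.

```python
import numpy as np

def ledoit_wolf(residuals):
    """Linear Ledoit-Wolf shrinkage of the sample covariance towards a scaled identity."""
    X = residuals - residuals.mean(axis=0)
    n, p = X.shape
    sample = X.T @ X / n
    target = np.trace(sample) / p * np.eye(p)
    delta = np.sum((sample - target) ** 2)                  # distance to the target
    beta = sum(np.sum((np.outer(x, x) - sample) ** 2) for x in X) / n**2
    if delta == 0.0:
        return target
    shrink = min(beta, delta) / delta                      # shrinkage intensity in [0, 1]
    return shrink * target + (1.0 - shrink) * sample

def build_W(method, S, residuals=None):
    """Cross-sectional estimators of W, following the rows of Table 1."""
    if method == "OLS":
        return np.eye(S.shape[0])
    if method == "SS":
        return np.diag(S @ np.ones(S.shape[1]))            # sigma^2 omitted: it cancels below
    if method == "WLS":
        return np.diag(residuals.var(axis=0))
    if method == "COV":
        return np.cov(residuals, rowvar=False, bias=True)
    if method == "COVSh":
        return ledoit_wolf(residuals)
    raise ValueError(f"unknown method: {method}")

def reconcile(y_hat, S, W):
    """GLS projection S (S' W^-1 S)^-1 S' W^-1 y_hat; W must be symmetric positive definite."""
    Winv_S = np.linalg.solve(W, S)                         # W^-1 S
    G = np.linalg.solve(S.T @ Winv_S, Winv_S.T)            # (S' W^-1 S)^-1 S' W^-1
    return S @ (G @ y_hat)

# Hypothetical grouping of 11 provinces into 3 regions; rows are N, R1-R3, P1-P11.
groups = [list(range(5)), list(range(5, 10)), [10]]
S = np.vstack([np.ones((1, 11)),
               np.array([[1.0 if j in g else 0.0 for j in range(11)] for g in groups]),
               np.eye(11)])

rng = np.random.default_rng(0)
residuals = rng.normal(size=(200, 15))     # synthetic in-sample errors for the 15 nodes
y_hat = rng.normal(loc=10.0, size=15)      # synthetic, incoherent base forecasts
for method in ["OLS", "SS", "WLS", "COV", "COVSh"]:
    y_tilde = reconcile(y_hat, S, build_W(method, S, residuals))
    print(method, np.round(y_tilde[:4], 3))  # reconciled N, R1, R2, R3
```

Because $\sigma^{2}$ multiplies every diagonal entry of the structural estimator, it cancels in the projection, which is why the sketch omits it. A forecast that is already coherent is returned unchanged by every choice of $W$.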

Table 2. Structure of the forecast matrix $Y$. Columns represent cross-sectional nodes: N = national, R = region, and P = province. Rows represent the temporal hierarchy: w = week, d = day, and h = hour.


Time/Node	N	R1	⋯	R3	P1	⋯	P11
Week-1	$y_{w,N}$	$y_{w,R1}$	⋯	$y_{w,R3}$	$y_{w,P1}$	⋯	$y_{w,P11}$
Day-1	$y_{d1,N}$	$y_{d1,R1}$	⋯	$y_{d1,R3}$	$y_{d1,P1}$	⋯	$y_{d1,P11}$
⋮	⋮	⋮		⋮	⋮		⋮
Day-7	$y_{d7,N}$	$y_{d7,R1}$	⋯	$y_{d7,R3}$	$y_{d7,P1}$	⋯	$y_{d7,P11}$
Hour-1	$y_{h1,N}$	$y_{h1,R1}$	⋯	$y_{h1,R3}$	$y_{h1,P1}$	⋯	$y_{h1,P11}$
⋮	⋮	⋮		⋮	⋮		⋮
Hour-168	$y_{h168,N}$	$y_{h168,R1}$	⋯	$y_{h168,R3}$	$y_{h168,P1}$	⋯	$y_{h168,P11}$
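The layout of Table 2 can be generated from the hourly provincial block alone. Writing $B$ for the 168 × 11 matrix of hourly provincial values, the sketch below forms $Y = S_{te} B S_{cs}^{\top}$ and checks the identity $\mathrm{vec}(Y) = (S_{cs} \otimes S_{te})\,\mathrm{vec}(B)$ that links this matrix form to the joint summing matrix of Figure 2. The data are synthetic, `S_te` and `S_cs` are our names for the temporal and cross-sectional summing matrices, and the province-to-region grouping is hypothetical.

```python
import numpy as np

def vec(M):
    """Column-major vectorization: stack the columns of M."""
    return M.reshape(-1, order="F")

# Hypothetical grouping of the 11 provinces into 3 regions.
groups = [list(range(5)), list(range(5, 10)), [10]]
S_cs = np.vstack([np.ones((1, 11)),
                  np.array([[1.0 if j in g else 0.0 for j in range(11)] for g in groups]),
                  np.eye(11)])                          # (15, 11): N, R1-R3, P1-P11
S_te = np.vstack([np.ones((1, 168)),
                  np.kron(np.eye(7), np.ones((1, 24))),
                  np.eye(168)])                         # (176, 168): week, 7 days, 168 hours

rng = np.random.default_rng(1)
B = rng.gamma(shape=2.0, scale=1.0, size=(168, 11))    # synthetic hourly provincial values

# Rows: Week-1, Day-1..Day-7, Hour-1..Hour-168; columns: N, R1..R3, P1..P11.
Y = S_te @ B @ S_cs.T
print(Y.shape)                                          # (176, 15)

# Column-major vec links the matrix form to the Kronecker form.
print(np.allclose(vec(Y), np.kron(S_cs, S_te) @ vec(B)))   # True
```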

Table 3. NRMSE aggregation scheme.

Cross-Sectional Level / Temporal Frequency	Week	Day	Hour	Cross-Sectional Summary
National	$\overline{\mathrm{NRMSE}}_{j,w,N}$	$\overline{\mathrm{NRMSE}}_{j,d,N}$	$\overline{\mathrm{NRMSE}}_{j,h,N}$	$\overline{\mathrm{NRMSE}}_{j,N}$
Regional	$\overline{\mathrm{NRMSE}}_{j,w,R}$	$\overline{\mathrm{NRMSE}}_{j,d,R}$	$\overline{\mathrm{NRMSE}}_{j,h,R}$	$\overline{\mathrm{NRMSE}}_{j,R}$
Provincial	$\overline{\mathrm{NRMSE}}_{j,w,P}$	$\overline{\mathrm{NRMSE}}_{j,d,P}$	$\overline{\mathrm{NRMSE}}_{j,h,P}$	$\overline{\mathrm{NRMSE}}_{j,P}$
Temporal Summary	$\overline{\mathrm{NRMSE}}_{j,w}$	$\overline{\mathrm{NRMSE}}_{j,d}$	$\overline{\mathrm{NRMSE}}_{j,h}$	$\overline{\mathrm{NRMSE}}_{j}$
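One possible way to fill Table 3 is sketched below. Two points are assumptions: the NRMSE normalization (RMSE divided by the mean observed value, a hypothetical choice) and the use of unweighted means of cell averages for the row and column summaries. The second assumption is at least consistent with the reported baselines: for LightGBM, the frequency summaries in Table 6 (0.3336, 0.3089, 0.8815) and the level summaries in Table 7 (0.4930, 0.4938, 0.5372) both average to the global value of 0.5080 in Table 5.

```python
import numpy as np

FREQS = ("week", "day", "hour")
LEVELS = ("national", "regional", "provincial")

def nrmse(y, y_hat):
    """RMSE normalized by the mean observed value (hypothetical normalization)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)) / np.mean(y))

def table3(cells):
    """cells[(freq, level)]: NRMSE values (series x cutoffs) of one configuration j.
    Returns cell means, level summaries, frequency summaries and the global mean."""
    cell = {key: float(np.mean(vals)) for key, vals in cells.items()}
    by_level = {lev: float(np.mean([cell[(f, lev)] for f in FREQS])) for lev in LEVELS}
    by_freq = {f: float(np.mean([cell[(f, lev)] for lev in LEVELS])) for f in FREQS}
    overall = float(np.mean(list(cell.values())))
    return cell, by_level, by_freq, overall

# Consistency check with the reported LightGBM baselines.
freq_summary = [0.3336, 0.3089, 0.8815]    # weekly, daily, hourly (Table 6)
level_summary = [0.4930, 0.4938, 0.5372]   # national, regional, provincial (Table 7)
print(round(float(np.mean(freq_summary)), 4), round(float(np.mean(level_summary)), 4))   # 0.508 0.508
```

With a complete 3 × 3 grid, the mean of the row summaries, the mean of the column summaries and the mean of all cells coincide, so the global entry does not depend on which margin is averaged first.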

Table 4. Results of the ART-based ANOVA analyses, using a repeated-measures design for the global evaluation and mixed-effects models for the cross-sectional levels and temporal frequencies.

Level	Effect	F Value	Pr(>F)
Global	Model	9450.029	$<2.22 \times 10^{-16}$
	Reconciliation	74.690	$<2.22 \times 10^{-16}$
	Model × Reconciliation	11.914	$<2.22 \times 10^{-16}$
National	Model	9036.778	$<2.22 \times 10^{-16}$
	Reconciliation	71.576	$<2.22 \times 10^{-16}$
	Model × Reconciliation	11.702	$<2.22 \times 10^{-16}$
Regional	Model	9575.855	$<2.22 \times 10^{-16}$
	Reconciliation	72.482	$<2.22 \times 10^{-16}$
	Model × Reconciliation	11.740	$<2.22 \times 10^{-16}$
Provincial	Model	9468.474	$<2.22 \times 10^{-16}$
	Reconciliation	79.910	$<2.22 \times 10^{-16}$
	Model × Reconciliation	12.116	$<2.22 \times 10^{-16}$
Weekly	Model	6107.6597	$<2.22 \times 10^{-16}$
	Reconciliation	41.9497	$<2.22 \times 10^{-16}$
	Model × Reconciliation	10.1985	$<2.22 \times 10^{-16}$
Daily	Model	2800.4069	$<2.22 \times 10^{-16}$
	Reconciliation	72.7935	$<2.22 \times 10^{-16}$
	Model × Reconciliation	7.8148	$<2.22 \times 10^{-16}$
Hourly	Model	2855.0029	$<2.22 \times 10^{-16}$
	Reconciliation	62.5668	$<2.22 \times 10^{-16}$
	Model × Reconciliation	6.4258	$<2.22 \times 10^{-16}$
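Table 4 relies on the aligned rank transform (ART) of Wobbrock et al. The alignment step for a two-factor design (base model × reconciliation method) is sketched below: the cell mean is subtracted from each observation, the estimated effect of interest is added back, and the result is ranked with average ranks for ties. Only this step is shown; each aligned-and-ranked response would then go into a standard factorial ANOVA in which only the aligned effect is interpreted, with the repeated-measures or mixed-effects error structure described in the caption. The R package ARTool is one existing implementation; the Python helpers and the data here are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values sharing their average rank."""
    x = np.asarray(x, float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0   # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def art_align(y, a, b, effect):
    """Align y for one effect ('A', 'B' or 'AB') of a two-factor design, then rank."""
    y, a, b = np.asarray(y, float), np.asarray(a), np.asarray(b)
    grand = y.mean()
    cell_mean = np.array([y[(a == ai) & (b == bi)].mean() for ai, bi in zip(a, b)])
    mean_a = np.array([y[a == ai].mean() for ai in a])
    mean_b = np.array([y[b == bi].mean() for bi in b])
    residual = y - cell_mean                      # strip all effects
    if effect == "A":
        estimate = mean_a - grand
    elif effect == "B":
        estimate = mean_b - grand
    elif effect == "AB":
        estimate = cell_mean - mean_a - mean_b + grand
    else:
        raise ValueError(f"unknown effect: {effect}")
    return average_ranks(residual + estimate)     # add back only the effect of interest

# Tiny balanced example: 2 base models x 2 methods, 2 cutoffs per cell (synthetic values).
model = ["M1", "M1", "M2", "M2"] * 2
method = ["BU", "OLS"] * 4
y_demo = [0.51, 0.41, 0.66, 0.56, 0.50, 0.42, 0.67, 0.55]
for effect in ("A", "B", "AB"):
    print(effect, art_align(y_demo, model, method, effect))
```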

Table 5. Global NRMSE of the baseline forecasts, the direct Bottom-Up benchmark, and representative best- and worst-performing reconciled variants across the direct, univariate, and iterative cross-temporal approaches for each base model. Blue values denote the best result(s), and red values the worst, across the baseline and all reported reconciled variants for each model.

Model	Case	Base	BU	Direct	Univariate	Iterative
KAN	Best	0.6624	0.6304	0.5980 (SS)	0.5586 (tWLSH:cBU)	0.5565 (tWLSH:cWLS)
KAN	Worst	0.6624	0.6304	0.7220 (OLS)	0.7038 (tOLS:cBU)	0.8218 (tOLS:cCOV)
LightGBM	Best	0.5080	0.5109	0.4586 (COV)	0.4109 (tCROSS:cBU)	0.4099 (tCROSS:cWLS)
LightGBM	Worst	0.5080	0.5109	0.5516 (OLS)	0.5391 (cOLS:tBU)	0.5516 (tOLS:cOLS)
NBEATSx	Best	0.6576	0.6883	0.6070 (SS)	0.6073 (tWLSS:cBU)	0.6070 (tWLSS:cSS)
NBEATSx	Worst	0.6576	0.6883	0.6989 (OLS)	0.7693 (cCOV:tBU)	7.0525 (tOLS:cCOV)
NHITS	Best	0.6544	0.6941	0.6155 (COV)	0.6077 (tWLSS:cBU)	0.6125 (tWLSS:cWLS)
NHITS	Worst	0.6544	0.6941	0.7095 (OLS)	0.7477 (cCOV:tBU)	7.9416 (tOLS:cCOV)
TBATS	Best	0.8561	0.9266	0.8009 (OLS)	0.7951 (tOLS:cBU)	0.7918 (tOLS:cCOVSh)
TBATS	Worst	0.8561	0.9266	0.9187 (COV)	1.0443 (cOLS:tBU)	94.8124 (tCROSS:cCOV)
TimeGPT	Best	0.7421	0.7764	0.7252 (OLS)	0.7051 (tAUTOCOV:cBU)	0.7056 (tAUTOCOV:cWLS)
TimeGPT	Worst	0.7421	0.7764	0.7334 (SS)	0.8030 (cOLS:tBU)	1.0209 (tCROSS:cCOV)

Base denotes the unreconciled forecast of each model, and BU denotes the direct cross-temporal Bottom-Up benchmark. Values in parentheses indicate the specific reconciliation method achieving the reported result.
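Figure 8 expresses results as the percentage change in global NRMSE relative to BU. The arithmetic is sketched below and applied to four global values copied from Table 5; negative outputs mean lower error than BU, as in Figure 8. The helper name is hypothetical.

```python
def pct_change_vs_bu(nrmse_rec, nrmse_bu):
    """Percentage change in NRMSE relative to Bottom-Up; negative means lower error than BU."""
    return 100.0 * (nrmse_rec - nrmse_bu) / nrmse_bu

# (reconciled NRMSE, BU NRMSE) pairs copied from Table 5.
table5 = {
    "LightGBM tCROSS:cWLS (iterative)": (0.4099, 0.5109),
    "LightGBM tCROSS:cBU (univariate)": (0.4109, 0.5109),
    "KAN tWLSH:cWLS (iterative)": (0.5565, 0.6304),
    "TimeGPT tAUTOCOV:cBU (univariate)": (0.7051, 0.7764),
}
for name, (rec, bu) in table5.items():
    print(f"{name}: {pct_change_vs_bu(rec, bu):+.1f}%")
```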

Table 6. NRMSE of the baseline forecasts and representative best- and worst-performing reconciled variants across the direct, univariate, and iterative cross-temporal approaches for each base model, reported separately for weekly, daily, and hourly frequencies. Blue values denote the best result(s), and red values the worst, within each frequency and model.

Frequency	Model	Case	Base	Direct	Univariate	Iterative
Weekly	KAN	Best	0.4250	0.2355 (SS)	0.2172 (tWLSH:cBU)	0.2172 (tWLSH:cWLS)
	KAN	Worst	0.4250	0.4032 (OLS)	0.3861 (tOLS:cBU)	0.5148 (tOLS:cCOV)
	LightGBM	Best	0.3336	0.2037 (COV)	0.1881 (tCROSS:cBU)	0.189 (tCROSS:cWLS)
	LightGBM	Worst	0.3336	0.3191 (OLS)	0.2613 (cOLS:tBU)	0.3191 (tOLS:cOLS)
	NBEATSx	Best	0.4277	0.2686 (SS)	0.2706 (tWLSS:cBU)	0.2686 (tWLSS:cSS)
	NBEATSx	Worst	0.4277	0.3956 (OLS)	0.4288 (cCOV:tBU)	6.7421 (tOLS:cCOV)
	NHITS	Best	0.4010	0.2925 (COV)	0.2736 (tWLSS:cBU)	0.2796 (tWLSS:cWLS)
	NHITS	Worst	0.4010	0.4161 (OLS)	0.4123 (cCOV:tBU)	7.6567 (tOLS:cCOV)
	TBATS	Best	0.3173	0.3220 (OLS)	0.3131 (tOLS:cBU)	0.3120 (tOLS:cCOVSh)
	TBATS	Worst	0.3173	0.4335 (BU)	0.5819 (cOLS:tBU)	269.3 (tCROSS:cCOV)
	TimeGPT	Best	0.3252	0.3252 (OLS)	0.3440 (tAUTOCOV:cBU)	0.3440 (tAUTOCOV:cWLS)
	TimeGPT	Worst	0.3252	0.3958 (BU)	0.4075 (cOLS:tBU)	0.4800 (tCROSS:cCOV)
Daily	KAN	Best	0.4688	0.4801 (SS)	0.4581 (tWLSH:cBU)	0.4570 (tWLSH:cWLS)
	KAN	Worst	0.4688	0.6027 (OLS)	0.5826 (tOLS:cBU)	0.7050 (tOLS:cCOV)
	LightGBM	Best	0.3089	0.3429 (COV)	0.3143 (tCROSS:cBU)	0.3130 (tCROSS:cWLS)
	LightGBM	Worst	0.3089	0.4283 (OLS)	0.4434 (cOLS:tBU)	0.4283 (tOLS:cOLS)
	NBEATSx	Best	0.4183	0.4777 (SS)	0.4794 (tWLSS:cBU)	0.4777 (tWLSS:cSS)
	NBEATSx	Worst	0.4183	0.5706 (OLS)	0.6499 (cCOV:tBU)	6.9232 (tOLS:cCOV)
	NHITS	Best	0.4224	0.4861 (COV)	0.4764 (tWLSS:cBU)	0.4800 (tWLSS:cWLS)
	NHITS	Worst	0.4224	0.5750 (OLS)	0.6402 (cCOV:tBU)	7.8221 (tOLS:cCOV)
	TBATS	Best	0.5698	0.5688 (OLS)	0.5628 (tOLS:cBU)	0.5602 (tOLS:cCOVSh)
	TBATS	Worst	0.5698	0.7059 (BU)	0.8306 (cOLS:tBU)	6.602 (tCROSS:cCOV)
	TimeGPT	Best	0.6098	0.5889 (OLS)	0.6060 (tAUTOCOV:cBU)	0.6060 (tAUTOCOV:cWLS)
	TimeGPT	Worst	0.6098	0.6673 (BU)	0.6922 (cOLS:tBU)	0.9390 (tCROSS:cCOV)
Hourly	KAN	Best	1.0936	1.0783 (SS)	1.0007 (tWLSH:cBU)	0.9950 (tWLSH:cWLS)
	KAN	Worst	1.0936	1.1600 (OLS)	1.1426 (tOLS:cBU)	1.2456 (tOLS:cCOV)
	LightGBM	Best	0.8815	0.8293 (COV)	0.7303 (tCROSS:cBU)	0.7270 (tCROSS:cWLS)
	LightGBM	Worst	0.8815	0.9075 (OLS)	0.9125 (cOLS:tBU)	0.9075 (tOLS:cOLS)
	NBEATSx	Best	1.1266	1.0747 (SS)	1.0719 (tWLSS:cBU)	1.0747 (tWLSS:cSS)
	NBEATSx	Worst	1.1266	1.1304 (OLS)	1.2290 (cCOV:tBU)	7.492 (tOLS:cCOV)
	NHITS	Best	1.1398	1.0679 (COV)	1.0732 (tWLSS:cBU)	1.0778 (tWLSS:cWLS)
	NHITS	Worst	1.1398	1.1375 (OLS)	1.1906 (cCOV:tBU)	8.346 (tOLS:cCOV)
	TBATS	Best	1.6811	1.5118 (OLS)	1.5095 (tOLS:cBU)	1.5033 (tOLS:cCOVSh)
	TBATS	Worst	1.6811	1.6403 (BU)	1.7203 (cOLS:tBU)	8.549 (tCROSS:cCOV)
	TimeGPT	Best	1.2911	1.2616 (OLS)	1.1700 (tAUTOCOV:cBU)	1.1680 (tAUTOCOV:cWLS)
	TimeGPT	Worst	1.2911	1.2661 (BU)	1.3093 (cOLS:tBU)	1.6430 (tCROSS:cCOV)
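The univariate labels in Tables 5–7 pair a first-step method with bottom-up in the other dimension (e.g., tWLSH:cBU or cCOV:tBU). Reading tX:cBU as "temporal reconciliation of each provincial series with estimator X, followed by cross-sectional bottom-up", one such pass can be sketched as below. This reading of the labels is our interpretation; the structural-scaling weights stand in for whichever temporal estimator is chosen, and the data and province grouping are synthetic or hypothetical.

```python
import numpy as np

def summing_matrices():
    """Cross-sectional (15 x 11) and temporal (176 x 168) summing matrices; grouping is hypothetical."""
    groups = [list(range(5)), list(range(5, 10)), [10]]
    S_cs = np.vstack([np.ones((1, 11)),
                      np.array([[1.0 if j in g else 0.0 for j in range(11)] for g in groups]),
                      np.eye(11)])
    S_te = np.vstack([np.ones((1, 168)),
                      np.kron(np.eye(7), np.ones((1, 24))),
                      np.eye(168)])
    return S_cs, S_te

def project(Y, S, w):
    """Reconcile every column of Y with a diagonal W = diag(w)."""
    Winv_S = S / w[:, None]                                   # W^-1 S for diagonal W
    G = np.linalg.solve(S.T @ Winv_S, Winv_S.T)
    return S @ (G @ Y)

def temporal_then_cs_bu(Y_hat, w_temporal):
    """One 't<method>:cBU' pass on a (176, 15) matrix laid out as in Table 2."""
    S_cs, S_te = summing_matrices()
    provinces = Y_hat[:, 4:]                                  # columns P1..P11
    provinces_rec = project(provinces, S_te, w_temporal)      # temporal step, series by series
    return provinces_rec @ S_cs.T                             # cross-sectional bottom-up

S_cs, S_te = summing_matrices()
rng = np.random.default_rng(2)
coherent = S_te @ rng.gamma(2.0, 1.0, size=(168, 11)) @ S_cs.T
Y_hat = coherent + rng.normal(scale=0.5, size=coherent.shape)   # synthetic incoherent base forecasts
w_ss = S_te.sum(axis=1)                                         # structural weights: 168, 24 (x7), 1 (x168)
Y_tilde = temporal_then_cs_bu(Y_hat, w_ss)
print(np.allclose(Y_tilde[:, 0], Y_tilde[:, 4:].sum(axis=1)),   # national = sum of provinces
      np.allclose(Y_tilde[0], Y_tilde[8:].sum(axis=0)))          # week = sum of hours
```

Only the provincial columns survive the first step, so the base forecasts of the national and regional nodes are discarded; this is the defining trait of a bottom-up second step.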

Table 7. NRMSE of the baseline forecasts and representative best- and worst-performing reconciled variants across the direct, univariate, and iterative cross-temporal approaches for each base model, reported separately for the national, regional, and provincial levels. Blue values denote the best result(s), and red values the worst, within each level and model.

Level	Model	Case	Base	Direct	Univariate	Iterative
National	KAN	Best	0.6506	0.5822 (SS)	0.5382 (tWLSH:cBU)	0.5391 (tWLSH:cWLS)
	KAN	Worst	0.6506	0.6909 (OLS)	0.6858 (tOLS:cBU)	0.7995 (tOLS:cCOV)
	LightGBM	Best	0.4930	0.4218 (COV)	0.3851 (tCROSS:cBU)	0.3825 (tCROSS:cWLS)
	LightGBM	Worst	0.4930	0.5071 (OLS)	0.5068 (cOLS:tBU)	0.5071 (tOLS:cOLS)
	NBEATSx	Best	0.6400	0.5903 (SS)	0.5918 (tWLSS:cBU)	0.5903 (tWLSS:cSS)
	NBEATSx	Worst	0.6400	0.6709 (OLS)	0.737 (cCOV:tBU)	6.7615 (tOLS:cCOV)
	NHITS	Best	0.6531	0.5813 (COV)	0.5911 (tWLSS:cBU)	0.5954 (tWLSS:cWLS)
	NHITS	Worst	0.6531	0.6687 (OLS)	0.7242 (cCOV:tBU)	7.64 (tOLS:cCOV)
	TBATS	Best	0.8321	0.7832 (OLS)	0.7815 (tOLS:cBU)	0.7777 (tOLS:cCOVSh)
	TBATS	Worst	0.8321	0.9036 (BU)	0.9067 (cOLS:tBU)	93.12 (tCROSS:cCOV)
	TimeGPT	Best	0.7258	0.7036 (OLS)	0.6900 (tAUTOCOV:cBU)	0.6900 (tAUTOCOV:cWLS)
	TimeGPT	Worst	0.7258	0.7522 (BU)	0.7754 (cOLS:tBU)	0.9824 (tCROSS:cCOV)
Regional	KAN	Best	0.6456	0.5840 (SS)	0.5402 (tWLSH:cBU)	0.5391 (tWLSH:cWLS)
	KAN	Worst	0.6456	0.6961 (OLS)	0.6893 (tOLS:cBU)	0.8086 (tOLS:cCOV)
	LightGBM	Best	0.4938	0.4297 (COV)	0.3893 (tCROSS:cBU)	0.3891 (tCROSS:cWLS)
	LightGBM	Worst	0.4938	0.5115 (OLS)	0.5066 (cOLS:tBU)	0.5115 (tOLS:cOLS)
	NBEATSx	Best	0.6478	0.5928 (SS)	0.5949 (tWLSS:cBU)	0.5928 (tWLSS:cSS)
	NBEATSx	Worst	0.6478	0.676 (OLS)	0.7452 (cCOV:tBU)	7.2873 (tOLS:cCOV)
	NHITS	Best	0.6330	0.5846 (COV)	0.5964 (tWLSS:cBU)	0.5995 (tWLSS:cWLS)
	NHITS	Worst	0.6330	0.6673 (OLS)	0.7369 (cCOV:tBU)	8.2288 (tOLS:cCOV)
	TBATS	Best	0.8652	0.7824 (OLS)	0.7816 (tOLS:cBU)	0.776 (tOLS:cCOVSh)
	TBATS	Worst	0.8652	0.907 (BU)	1.0159 (cOLS:tBU)	96.33 (tCROSS:cCOV)
	TimeGPT	Best	0.7324	0.7107 (OLS)	0.694 (tAUTOCOV:cBU)	0.694 (tAUTOCOV:cWLS)
	TimeGPT	Worst	0.7324	0.7635 (BU)	0.7851 (cOLS:tBU)	1.0099 (tCROSS:cCOV)
Provincial	KAN	Best	0.6911	0.6278 (SS)	0.5976 (tWLSH:cBU)	0.5912 (tWLSH:cWLS)
	KAN	Worst	0.6911	0.7789 (OLS)	0.7361 (tOLS:cBU)	0.8573 (tOLS:cCOV)
	LightGBM	Best	0.5372	0.5244 (COV)	0.4583 (tCROSS:cBU)	0.458 (tCROSS:cWLS)
	LightGBM	Worst	0.5372	0.6362 (OLS)	0.6039 (cOLS:tBU)	0.6362 (tOLS:cOLS)
	NBEATSx	Best	0.6849	0.6379 (SS)	0.6353 (tWLSS:cBU)	0.6379 (tWLSS:cSS)
	NBEATSx	Worst	0.6849	0.7498 (OLS)	0.8255 (cCOV:tBU)	7.1086 (tOLS:cCOV)
	NHITS	Best	0.6772	0.6805 (COV)	0.6357 (tWLSS:cBU)	0.6424 (tWLSS:cWLS)
	NHITS	Worst	0.6772	0.7926 (OLS)	0.7821 (cCOV:tBU)	7.9559 (tOLS:cCOV)
	TBATS	Best	0.8710	0.8371 (OLS)	0.8223 (tOLS:cBU)	0.8219 (tOLS:cCOVSh)
	TBATS	Worst	0.8710	0.9691 (BU)	1.2101 (cOLS:tBU)	94.99 (tCROSS:cCOV)
	TimeGPT	Best	0.7679	0.7613 (OLS)	0.737 (tAUTOCOV:cBU)	0.737 (tAUTOCOV:cWLS)
	TimeGPT	Worst	0.7679	0.8134 (BU)	0.8486 (cOLS:tBU)	1.0703 (tCROSS:cCOV)
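The iterative variants alternate between the two dimensions until the forecasts are coherent in both. A minimal sketch of that alternation for a forecast matrix laid out as in Table 2 follows, assuming diagonal weight matrices that may differ from node to node (temporal step) and from time slot to time slot (cross-sectional step). The stopping rule, tolerance and helper names are assumptions and do not reproduce the convergence or patience settings behind Table 9. When every column shares one temporal $W$ and every row one cross-sectional $W$, the two steps commute and the loop stops after a single pass; heterogeneous weights are what make several passes necessary.

```python
import numpy as np

def summing_matrices():
    """Cross-sectional (15 x 11) and temporal (176 x 168) summing matrices; grouping is hypothetical."""
    groups = [list(range(5)), list(range(5, 10)), [10]]
    S_cs = np.vstack([np.ones((1, 11)),
                      np.array([[1.0 if j in g else 0.0 for j in range(11)] for g in groups]),
                      np.eye(11)])
    S_te = np.vstack([np.ones((1, 168)),
                      np.kron(np.eye(7), np.ones((1, 24))),
                      np.eye(168)])
    return S_cs, S_te

def projector(S, w):
    """S (S' W^-1 S)^-1 S' W^-1 for a diagonal W = diag(w)."""
    Winv_S = S / w[:, None]
    return S @ np.linalg.solve(S.T @ Winv_S, Winv_S.T)

def iterative_reconcile(Y_hat, W_te, W_cs, tol=1e-9, max_iter=100):
    """Alternate temporal and cross-sectional reconciliation of a (176, 15) matrix.
    W_te[:, j]: temporal weights of node j; W_cs[i, :]: cross-sectional weights of row i."""
    S_cs, S_te = summing_matrices()
    P_te = [projector(S_te, W_te[:, j]) for j in range(Y_hat.shape[1])]
    P_cs = [projector(S_cs, W_cs[i, :]) for i in range(Y_hat.shape[0])]

    def temporal(Y):
        return np.column_stack([P_te[j] @ Y[:, j] for j in range(Y.shape[1])])

    def cross(Y):
        return np.vstack([P_cs[i] @ Y[i, :] for i in range(Y.shape[0])])

    Y = np.asarray(Y_hat, float)
    for it in range(1, max_iter + 1):
        Y = cross(temporal(Y))
        # Rows are coherent after the cross-sectional step; measure remaining temporal incoherence.
        gap = np.abs(temporal(Y) - Y).max() / max(1.0, np.abs(Y).max())
        if gap < tol:
            return Y, it, True
    return Y, max_iter, False

rng = np.random.default_rng(5)
Y_hat = rng.normal(size=(176, 15))            # synthetic incoherent base forecasts
W_te = 1.0 + 0.05 * rng.random((176, 15))     # hypothetical, mildly heterogeneous weights
W_cs = 1.0 + 0.05 * rng.random((176, 15))
Y, n_iter, converged = iterative_reconcile(Y_hat, W_te, W_cs)
print(n_iter, converged)
ones = np.ones((176, 15))
print(iterative_reconcile(Y_hat, ones, ones)[1])   # identical weights: the steps commute
```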

Table 8. Pairwise DM–HLN tests against LightGBM:tCROSS:cWLS.

Competitor	Best Setup	NRMSE	Mean Diff.	Median Diff.	LightGBM Wins	DM–HLN Statistic	Holm-Adjusted p
KAN	tWLSH:cWLS (iterative)	0.5565	0.1466	0.1114	84.6%	6.754	$4.00 \times 10^{-8}$
NBEATSx	ctSS (direct)	0.6070	0.1971	0.1380	90.4%	7.477	$3.84 \times 10^{-9}$
NHITS	tWLSS:cBU (univariate)	0.6077	0.1979	0.1602	86.5%	6.160	$1.56 \times 10^{-7}$
TBATS	tOLS:cCOVSh (iterative)	0.7918	0.3820	0.2413	98.1%	6.267	$1.56 \times 10^{-7}$
TimeGPT	tAUTOCOV:cBU (univariate)	0.7051	0.2952	0.2230	100.0%	8.365	$1.96 \times 10^{-10}$

Reference model: LightGBM:tCROSS:cWLS, NRMSE = 0.4099. Positive differences favor LightGBM.
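Table 8 reports Diebold–Mariano statistics with the Harvey–Leybourne–Newbold (HLN) small-sample correction and Holm-adjusted p-values. A self-contained sketch follows, assuming the loss differential is the per-cutoff NRMSE difference (competitor minus reference, so positive values favor the reference, as in Table 8) and a horizon of $h = 1$. The p-value uses Student's $t$ with $n-1$ degrees of freedom, computed here by numerical integration of the density to avoid a SciPy dependency (`scipy.stats.t.sf` would be the usual choice). The losses are synthetic and the competitor names hypothetical.

```python
import math
import numpy as np

def dm_hln(loss_a, loss_b, h=1):
    """Diebold-Mariano statistic with the HLN correction; positive if loss_a is larger on average.
    Returns (statistic, degrees of freedom)."""
    d = np.asarray(loss_a, float) - np.asarray(loss_b, float)
    n = d.size
    dc = d - d.mean()
    gamma = [np.dot(dc[k:], dc[:n - k]) / n for k in range(h)]   # autocovariances, lags 0..h-1
    lrv = gamma[0] + 2.0 * sum(gamma[1:])
    dm = d.mean() / math.sqrt(lrv / n)
    hln = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return hln * dm, n - 1

def t_two_sided_p(stat, df, n_grid=20000):
    """Two-sided Student-t p-value by Simpson integration of the density on [0, |stat|]."""
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    u = np.linspace(0.0, abs(stat), n_grid + 1)                  # n_grid must be even
    f = const * (1.0 + u**2 / df) ** (-(df + 1) / 2)
    step = u[1] - u[0]
    area = step / 3.0 * (f[0] + f[-1] + 4.0 * f[1:-1:2].sum() + 2.0 * f[2:-1:2].sum())
    return max(0.0, 1.0 - 2.0 * area)

def holm(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, float)
    m = len(p)
    adjusted = np.empty(m)
    running = 0.0
    for rank, idx in enumerate(np.argsort(p)):
        running = max(running, (m - rank) * p[idx])              # enforce monotonicity
        adjusted[idx] = min(1.0, running)
    return adjusted

rng = np.random.default_rng(3)
reference = rng.gamma(4.0, 0.1, size=52)     # synthetic per-cutoff NRMSE of the reference configuration
competitors = {f"model_{k}": reference + rng.normal(0.05 * k, 0.1, size=52) for k in (1, 2, 3)}
results = {name: dm_hln(loss, reference) for name, loss in competitors.items()}
adjusted = holm([t_two_sided_p(stat, df) for stat, df in results.values()])
for (name, (stat, _)), p in zip(results.items(), adjusted):
    print(f"{name}: DM-HLN = {stat:.3f}, Holm-adjusted p = {p:.2e}")
```

For $h = 1$ the HLN-corrected statistic coincides with the one-sample $t$ statistic of the loss differentials.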

Table 9. Targeted numerical diagnostic of unstable iterative reconciliation configurations under cCOV and cCOVSh.

Model	Temporal Method	Cross-Sectional Method	NRMSE	Conv.	Iter. Med/Max	Pat. Stop	$\log_{10}\kappa(A_{\mathrm{cross}})$	$\lVert R_{\mathrm{cross}}\rVert_{2}$	Amp. Ratio
NBEATSx	AUTOCOV	cCOV	1.6584	84.6%	14/168	15.4%	10.10	9804.73	4.51
NBEATSx	AUTOCOV	cCOVSh	0.6652	100.0%	10/57	0.0%	3.36	13.86	0.90
NBEATSx	CROSS	cCOV	1.7631	84.6%	14/199	15.4%	10.10	9804.73	4.89
NBEATSx	CROSS	cCOVSh	0.6685	100.0%	10/101	0.0%	3.36	13.86	0.90
NBEATSx	OLS	cCOV	7.0525	88.5%	2/11	11.5%	10.10	9804.73	64.56
NBEATSx	OLS	cCOVSh	0.6965	100.0%	2/2	0.0%	3.36	13.86	0.93
NBEATSx	WLSS	cCOV	2.1387	88.5%	2/12	11.5%	10.10	9804.73	9.36
NBEATSx	WLSS	cCOVSh	0.6170	100.0%	2/2	0.0%	3.36	13.86	0.84
NHITS	OLS	cCOV	7.9416	82.7%	2/11	17.3%	10.17	11,876.24	63.17
NHITS	OLS	cCOVSh	0.7207	100.0%	2/2	0.0%	2.70	17.31	0.83
NHITS	WLSS	cCOV	2.2417	86.5%	2/10	13.5%	10.17	11,876.24	23.31
NHITS	WLSS	cCOVSh	0.6301	100.0%	2/2	0.0%	2.70	17.31	0.81
TBATS	CROSS	cCOV	94.8124	30.8%	7/77	69.2%	6.69	268.28	3026.45
TBATS	CROSS	cCOVSh	0.9288	100.0%	12/33	0.0%	2.97	10.51	0.78

Each pair compares the sample covariance estimator cCOV with its Ledoit–Wolf shrinkage version cCOVSh. Values are summarized across the 52 walk-forward cutoffs.
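Table 9 links the unstable runs to the unshrunk sample covariance cCOV. One plausible mechanism can be reproduced on synthetic data: if the base residuals of the national and regional nodes are close to the sums of the provincial residuals, the 15 × 15 sample covariance is nearly singular, whereas Ledoit–Wolf shrinkage bounds its condition number. The sketch assumes 52 residual vectors (one per walk-forward cutoff), uses the linear Ledoit–Wolf estimator towards a scaled identity, and reports $A = S^{\top}W^{-1}S$ only as a hypothetical analogue of $A_{\mathrm{cross}}$ in Table 9; it illustrates the mechanism and does not reproduce the table.

```python
import numpy as np

def ledoit_wolf(residuals):
    """Linear Ledoit-Wolf shrinkage of the sample covariance towards a scaled identity."""
    X = residuals - residuals.mean(axis=0)
    n, p = X.shape
    sample = X.T @ X / n
    target = np.trace(sample) / p * np.eye(p)
    delta = np.sum((sample - target) ** 2)
    beta = sum(np.sum((np.outer(x, x) - sample) ** 2) for x in X) / n**2
    if delta == 0.0:
        return target
    shrink = min(beta, delta) / delta
    return shrink * target + (1.0 - shrink) * sample

# Hypothetical grouping of 11 provinces into 3 regions; rows are N, R1-R3, P1-P11.
groups = [list(range(5)), list(range(5, 10)), [10]]
S = np.vstack([np.ones((1, 11)),
               np.array([[1.0 if j in g else 0.0 for j in range(11)] for g in groups]),
               np.eye(11)])

rng = np.random.default_rng(4)
n_obs = 52                                              # assumed: one residual vector per cutoff
e_prov = rng.normal(size=(n_obs, 11))                   # synthetic provincial residuals
# Aggregate residuals are almost exactly sums of provincial residuals (near-coherent errors).
residuals = e_prov @ S.T + 1e-4 * rng.normal(size=(n_obs, 15))

W_cov = np.cov(residuals, rowvar=False, bias=True)
W_covsh = ledoit_wolf(residuals)
for name, W in (("cCOV", W_cov), ("cCOVSh", W_covsh)):
    A = S.T @ np.linalg.solve(W, S)                     # hypothetical analogue of A_cross
    print(f"{name}: log10 cond(W) = {np.log10(np.linalg.cond(W)):.2f}, "
          f"log10 cond(A) = {np.log10(np.linalg.cond(A)):.2f}")
```

Shrinkage adds the same positive constant to every eigenvalue of the (rescaled) sample covariance, so whenever the shrinkage intensity is positive the condition number strictly decreases.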

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gudiño-Ochoa, A.; Calderón-González, H.F. Heuristic Cross-Temporal Reconciliation Approaches Applied to Heterogeneous Models in Photovoltaic Forecasting. Computers 2026, 15, 425. https://doi.org/10.3390/computers15070425

AMA Style

Gudiño-Ochoa A, Calderón-González HF. Heuristic Cross-Temporal Reconciliation Approaches Applied to Heterogeneous Models in Photovoltaic Forecasting. Computers. 2026; 15(7):425. https://doi.org/10.3390/computers15070425

Chicago/Turabian Style

Gudiño-Ochoa, Alberto, and Harold Felipe Calderón-González. 2026. "Heuristic Cross-Temporal Reconciliation Approaches Applied to Heterogeneous Models in Photovoltaic Forecasting" Computers 15, no. 7: 425. https://doi.org/10.3390/computers15070425

APA Style

Gudiño-Ochoa, A., & Calderón-González, H. F. (2026). Heuristic Cross-Temporal Reconciliation Approaches Applied to Heterogeneous Models in Photovoltaic Forecasting. Computers, 15(7), 425. https://doi.org/10.3390/computers15070425

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
