Article

Assessing the Performance of Hierarchical Forecasting Methods on the Retail Sector

by José Manuel Oliveira 1,2,* and Patrícia Ramos 1,3
1 INESC Technology and Science, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
2 Faculty of Economics, University of Porto, Rua Dr. Roberto Frias, 4200-464 Porto, Portugal
3 School of Accounting and Administration of Porto, Polytechnic Institute of Porto, Rua Jaime Lopes Amorim, 4465-004 S. Mamede de Infesta, Portugal
* Author to whom correspondence should be addressed.
Entropy 2019, 21(4), 436; https://doi.org/10.3390/e21040436
Submission received: 18 March 2019 / Revised: 16 April 2019 / Accepted: 22 April 2019 / Published: 24 April 2019
(This article belongs to the Special Issue Entropy Application for Forecasting)

Abstract
Retailers need demand forecasts at different levels of aggregation in order to support a variety of decisions along the supply chain. To ensure aligned decision-making across the hierarchy, it is essential that forecasts at the most disaggregated level add up to forecasts at the aggregate levels above. It is not clear whether these aggregate forecasts should be generated independently or by using a hierarchical forecasting method, which ensures coherent decision-making at the different levels but does not guarantee the same accuracy. To provide guidelines on this issue, our empirical study investigates the relative performance of independent and reconciled forecasting approaches, using real data from a Portuguese retailer. We consider two alternative forecasting model families for generating the base forecasts; namely, state space models and ARIMA. Appropriate models from both families are chosen for each time-series by minimising the bias-corrected Akaike information criterion. The results show significant improvements in forecast accuracy, providing valuable information to support management decisions. Reconciled forecasts using the Minimum Trace Shrinkage estimator (MinT-Shrink) generally improve on the accuracy of the ARIMA base forecasts for all levels and for the complete hierarchy, across all forecast horizons. The accuracy gains generally increase with the horizon, varying between 1.7% and 3.7% for the complete hierarchy. The gains in forecast accuracy are also more substantial at the higher levels of aggregation, which means that the information about the individual dynamics of the series, lost due to aggregation, is recovered from the lower levels by the reconciliation process, substantially improving the forecast accuracy over the base forecasts.


1. Introduction

Retailers need demand forecasts at different levels of aggregation to support decision-making at operational and short-term strategic levels [1]. Consider a retailer warehouse storing inventory that is used to replenish multiple retail stores: Store-level forecasts at different product levels are needed to manage inventory in the store or to allocate shelf space, but aggregate forecasts are also required for the inventory decisions of the retailer warehouse [2]. Understanding whether these aggregate forecasts should be generated independently at each level of the hierarchy, based on the aggregated demand, or obtained using a hierarchical forecasting method, which depends on the aggregation constraints of the hierarchy but ensures coherent decision-making at the different levels, is the gap we seek to address in this paper.
SKUs (Stock Keeping Units) are naturally grouped together in hierarchies, with the individual sales of each product at the bottom level of the hierarchy, sales for groups of related products (such as categories, families, or areas) at increasing aggregation levels, and the total sales at the top level [3]. Generating accurate forecasts for hierarchical time-series can be particularly difficult. Time-series at different levels of the hierarchical structure have different scales and can exhibit very different patterns. The time-series at the most disaggregated level can be very noisy and are often intermittent, being more challenging to model and forecast. Aggregated series at higher levels are usually much smoother and, therefore, easier to forecast. Additionally, in order to ensure coherent decision-making at the different levels of the hierarchy, it is essential that forecasts of each aggregated series be equal to the sum of the forecasts of the corresponding disaggregated series. However, it is very unlikely that these aggregation constraints will be satisfied if the forecasts for each series in the hierarchical structure are generated independently. Finally, hierarchical forecasting methods should take advantage of the interrelations between the series at each level of the hierarchy.
The most traditional approaches to hierarchical forecasting are bottom-up and top-down methods. The bottom-up method involves forecasting each series at the bottom level, and then summing these to obtain forecasts at the higher levels of the hierarchy [4,5,6,7]. The main advantage of this approach is that, since forecasts are obtained at the bottom level, no information is lost due to aggregation. However, it ignores the inter-relations between the series and usually performs poorly on highly aggregated data. The top-down method involves forecasting the most aggregated series at the top level, and then disaggregating these, using either historical [8] or forecasted proportions [9], to obtain bottom level forecasts. Top-down approaches based on historical proportions tend to produce less accurate forecasts at lower levels of the hierarchy. The middle-out approach combines both bottom-up and top-down methods. First, forecasts for each series of an intermediate level of the hierarchy chosen previously are obtained. The forecasts for the series above the intermediate level are produced using the bottom-up approach, while the forecasts for the series below the intermediate level are produced using the top-down approach. Empirical studies comparing the performance of bottom-up and top-down methods have mixed results as to a preference for either bottom-up or top-down [4,6,10,11,12].
Recent work in the area tackles the problem using a two-stage approach: In the first step, forecasts for all series at all the levels of the hierarchy, rather than at a single level, are independently produced (these are called base forecasts). Then, a regression model is used to combine these to give coherent forecasts (these are called reconciled forecasts). Athanasopoulos et al. [9] and Hyndman et al. [13] used the Ordinary Least Squares (OLS) estimator and showed that their approach worked well, compared to most traditional methods. Hyndman et al. [14] suggested the Weighted Least Squares (WLS) estimator, proposing the variances of the base forecast errors as a proxy for the diagonal of the errors covariance matrix, with null off-diagonal elements. They also introduced several algorithms to make the computations involved more efficient for a very large number of series. Extending the work of Hyndman et al. [14], Wickramasuriya et al. [15] proposed a closed-form solution, based on the Generalised Least Squares (GLS) estimator, that minimises the sum of the variances of the reconciled forecast errors, incorporating information from a full covariance matrix of the base forecast errors. The authors evaluated the performance of their method against the most commonly-used methods, and the results showed that it worked well with both artificial and real data.
Erven and Cugliari [16] proposed a Game-Theoretically OPtimal (GTOP) reconciliation method that selected the set of reconciled predictions, such that the total weighted quadratic loss of the reconciled predictions will never be greater than the total weighted quadratic loss of the base predictions. The authors illustrated the benefits of their approach on both simulated data and real electricity consumption data. This approach required fewer assumptions about the forecasts and forecast errors, but it did not have a closed-form solution and did not scale well for a huge set of time-series.
Mircetic et al. [17] proposed a top-down approach for hierarchical forecasting in a beverage supply chain, based on projecting the ratio of bottom and top level series into the future. Forecast projections were then used to disaggregate the base forecasts of the top level series. The disadvantage of all top-down approaches, including this one, is that they do not produce unbiased coherent forecasts [13].
The remainder of the paper is organized as follows. The next section presents a brief description of the two most widely-used approaches to time-series forecasting: State space models and ARIMA models. The procedure for using information criteria in model selection is also discussed. Section 3 describes the methods more commonly used to forecast hierarchical time-series. Section 4 presents the case study of a Portuguese retailer, explains the evaluation setup implemented and error measures used, and discusses the results obtained. Finally, Section 5 offers the concluding remarks.

2. Pure Forecasting Models

We consider two alternative forecasting methods for generating the base forecasts used by hierarchical forecasting approaches; namely, state space models and ARIMA models. These are briefly described in this section, with a special focus on the use of information criteria for model selection.

2.1. State Space Models

Forecasts generated by exponential smoothing methods are weighted averages of past observations, where the weights decrease exponentially as the observations get older. The component form representation of these methods comprises the forecast equation and one smoothing equation for each of the components considered, which can be the level, the trend, and the seasonality. The possibilities for each of these components are: Trend $\in \{N, A, A_d\}$ and Seasonality $\in \{N, A, M\}$, where $N$, $A$, $A_d$, and $M$ mean, respectively, none, additive, additive damped, and multiplicative. By considering all combinations of the trend and seasonal components, nine exponential smoothing methods are possible. Each method is usually labelled by a pair of letters, $(T,S)$, specifying the type of trend and seasonal components. Denoting the time-series by $y_t$, $t = 1, 2, \ldots, n$, and the forecast of $y_{t+h}$ based on all data up to time $t$ by $\hat{y}_{t+h|t}$, the component form of the additive Holt-Winters' method, $(A,A)$, is
$$\begin{aligned}
\hat{y}_{t+h|t} &= l_t + h b_t + s_{t+h-m(k+1)} && \text{(1a)}\\
l_t &= \alpha (y_t - s_{t-m}) + (1-\alpha)(l_{t-1} + b_{t-1}) && \text{(1b)}\\
b_t &= \beta^* (l_t - l_{t-1}) + (1-\beta^*) b_{t-1} && \text{(1c)}\\
s_t &= \gamma (y_t - l_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m} && \text{(1d)}
\end{aligned}$$
$$0 \le \alpha \le 1, \quad 0 \le \beta^* \le 1, \quad 0 \le \gamma \le 1-\alpha,$$
where $l_t$, $b_t$, and $s_t$ denote, respectively, the estimates of the series level, trend (slope), and seasonality at time $t$; $m$ denotes the period of seasonality; and $k$ is the integer part of $(h-1)/m$. The smoothing parameters $\alpha$, $\beta^*$, and $\gamma$ are constrained, to ensure that the smoothing equations can be interpreted as weighted averages. Fitted values are calculated by setting $h = 1$ with $t = 0, 1, \ldots, n-1$. The $h$-step ahead forecasts, for $h = 1, 2, \ldots$, can then be obtained using the last estimated values of the level, trend, and seasonality ($t = n$). Details about all the other methods may be found in Hyndman and Athanasopoulos [18]. To be able to produce forecast intervals and use a model selection criterion, Hyndman et al. [19] (amongst others) developed a statistical framework where an innovation state space model can be written for each of the exponential smoothing methods. Each state space model comprises a measurement equation, which describes the observed data, and state equations, which describe how the unobserved components (level, trend, and seasonality) change with time. For each exponential smoothing method, two possible state space models are considered, one with additive errors and one with multiplicative errors, giving a total of 18 models. To distinguish state space models with additive and multiplicative errors, an extra letter $E$ was added: The triplet $(E,T,S)$ identifies the type of error, trend, and seasonality. The general state space model is
$$\begin{aligned}
y_t &= w(x_{t-1}) + r(x_{t-1})\,\varepsilon_t && \text{(2a)}\\
x_t &= f(x_{t-1}) + g(x_{t-1})\,\varepsilon_t, && \text{(2b)}
\end{aligned}$$
where $y_t$ denotes the observation at time $t$, $x_t$ is the state vector, $\{\varepsilon_t\}$ is a white noise process with variance $\sigma^2$ referred to as the innovation (new and unpredictable), $w(\cdot)$ is the measurement function, $r(\cdot)$ is the error term function, $f(\cdot)$ is the transition function, and $g(\cdot)$ is the persistence function. Equation (2a) is the measurement equation and Equation (2b) gives the state equations. The measurement equation shows the relationship between the observations and the unobserved states. The transition equation shows the evolution of the state through time. The equations of the ETS $(A,A,A)$ model (underlying the additive Holt-Winters' method with additive errors) are [18]
$$\begin{aligned}
y_t &= l_{t-1} + b_{t-1} + s_{t-m} + \varepsilon_t && \text{(3a)}\\
l_t &= l_{t-1} + b_{t-1} + \alpha \varepsilon_t && \text{(3b)}\\
b_t &= b_{t-1} + \beta \varepsilon_t && \text{(3c)}\\
s_t &= s_{t-m} + \gamma \varepsilon_t, && \text{(3d)}
\end{aligned}$$
and the equations of the ETS $(M,A,A)$ model (underlying the additive Holt-Winters' method with multiplicative errors) are [19]
$$\begin{aligned}
y_t &= (l_{t-1} + b_{t-1} + s_{t-m})(1 + \varepsilon_t) && \text{(4a)}\\
l_t &= l_{t-1} + b_{t-1} + \alpha (l_{t-1} + b_{t-1} + s_{t-m})\,\varepsilon_t && \text{(4b)}\\
b_t &= b_{t-1} + \beta (l_{t-1} + b_{t-1} + s_{t-m})\,\varepsilon_t && \text{(4c)}\\
s_t &= s_{t-m} + \gamma (l_{t-1} + b_{t-1} + s_{t-m})\,\varepsilon_t. && \text{(4d)}
\end{aligned}$$
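As an illustration of the recursions in Equation (1), the following Python sketch runs the additive Holt-Winters updates over a series and produces $h$-step ahead forecasts from the last states. The initialization heuristic and the function interface are assumptions of this sketch; in practice, the initial states and smoothing parameters are estimated by maximum likelihood, as described next.

```python
import numpy as np

def holt_winters_additive(y, m, alpha, beta_star, gamma, h):
    """Additive Holt-Winters (A,A) recursions of Equation (1).

    Initial states use a crude heuristic (an assumption of this sketch)
    and require at least two full seasons of data.
    """
    y = np.asarray(y, dtype=float)
    l = y[:m].mean()                          # initial level
    b = (y[m:2 * m].mean() - y[:m].mean()) / m  # initial trend
    s = list(y[:m] - l)                       # one seasonal state per period
    for t in range(len(y)):
        s_tm = s[-m]                          # s_{t-m}
        l_new = alpha * (y[t] - s_tm) + (1 - alpha) * (l + b)
        b_new = beta_star * (l_new - l) + (1 - beta_star) * b
        s.append(gamma * (y[t] - l - b) + (1 - gamma) * s_tm)
        l, b = l_new, b_new
    # h-step ahead forecasts from the last states (t = n): level plus
    # extrapolated trend plus the most recent seasonal state for each period
    return np.array([l + k * b + s[-m + ((k - 1) % m)]
                     for k in range(1, h + 1)])
```

For a constant series the recursions settle immediately, so all forecasts equal the constant, which is a quick sanity check of the update equations.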

2.1.1. Estimation of State Space Models

Maximum likelihood estimates of the parameters and initial states of the state space model (2) can be obtained by maximizing its likelihood, or equivalently by minimizing the negative log-likelihood. The probability density function for $y = (y_1, \ldots, y_n)$ is given by [19]
$$p(y \mid \theta, x_0, \sigma^2) = \prod_{t=1}^{n} p(y_t \mid x_{t-1}) = \prod_{t=1}^{n} p(\varepsilon_t)/|r(x_{t-1})|, \qquad \text{(5)}$$
where $\theta$ is the parameters vector, $x_0$ is the initial states vector, and $\sigma^2$ is the innovation variance. By assuming that the distribution of $\{\varepsilon_t\}$ is Gaussian, this likelihood has the form
$$L(\theta, x_0, \sigma^2 \mid y) = (2\pi\sigma^2)^{-n/2} \left| \prod_{t=1}^{n} r(x_{t-1}) \right|^{-1} \exp\left( -\frac{1}{2} \sum_{t=1}^{n} \varepsilon_t^2/\sigma^2 \right), \qquad \text{(6)}$$
and its logarithm is
$$\log L = -\frac{n}{2} \log(2\pi\sigma^2) - \sum_{t=1}^{n} \log |r(x_{t-1})| - \frac{1}{2} \sum_{t=1}^{n} \varepsilon_t^2/\sigma^2. \qquad \text{(7)}$$
The maximum likelihood estimate of $\sigma^2$ can be obtained by taking the partial derivative of (7) with respect to $\sigma^2$ and setting it to zero:
$$\hat{\sigma}^2 = n^{-1} \sum_{t=1}^{n} \varepsilon_t^2. \qquad \text{(8)}$$
This estimate can be used to eliminate $\sigma^2$ from the likelihood (6), which becomes
$$L(\theta, x_0 \mid y) = (2\pi e \hat{\sigma}^2)^{-n/2} \left| \prod_{t=1}^{n} r(x_{t-1}) \right|^{-1}. \qquad \text{(9)}$$
Hence, twice the negative logarithm of this likelihood is
$$-2 \log L(\theta, x_0 \mid y) = c_n + n \log\left( \sum_{t=1}^{n} \varepsilon_t^2 \right) + 2 \sum_{t=1}^{n} \log |r(x_{t-1})|, \qquad \text{(10)}$$
where $c_n = n \log(2\pi e) - n \log(n)$. Thus, maximum likelihood estimates for the parameters $\theta$ and the initial states $x_0$ can be obtained by minimizing
$$L^*(\theta, x_0) = n \log\left( \sum_{t=1}^{n} \varepsilon_t^2 \right) + 2 \sum_{t=1}^{n} \log |r(x_{t-1})|. \qquad \text{(11)}$$
The innovations can be computed recursively, using the relationships
$$\varepsilon_t = [y_t - w(x_{t-1})]/r(x_{t-1}) \qquad \text{(12)}$$
$$x_t = f(x_{t-1}) + g(x_{t-1})\,\varepsilon_t. \qquad \text{(13)}$$
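For the simplest model, ETS(A,N,N) (simple exponential smoothing with additive errors, where $w(x) = f(x) = l$, $r(x) = 1$, and $g(x) = \alpha$), the recursive computation of the innovations and the criterion $L^*(\theta, x_0)$ reduce to a few lines of Python. This is a sketch for illustration, not code from the paper:

```python
import numpy as np

def ets_ann_criterion(y, alpha, l0):
    """Evaluate the criterion L*(theta, x0) for ETS(A,N,N), where
    w(x) = f(x) = l, r(x) = 1 and g(x) = alpha (a sketch; other ETS
    models swap in their own w, r, f and g functions)."""
    y = np.asarray(y, dtype=float)
    l, sse = l0, 0.0
    for yt in y:
        eps = yt - l           # innovation: [y_t - w(x_{t-1})]/r(x_{t-1}), with r = 1
        sse += eps ** 2
        l = l + alpha * eps    # state transition: f(x_{t-1}) + g(x_{t-1}) * eps
    # the 2 * sum(log|r(x_{t-1})|) term vanishes because r(x) = 1
    return len(y) * np.log(sse)
```

Maximum likelihood estimation then amounts to minimising this function over $\alpha$ and the initial level $l_0$, for example with `scipy.optimize.minimize`.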

2.1.2. Information Criteria for Model Selection

Forecast accuracy measures can be used to select a model for a given time-series, as long as the errors are computed from a test set and not from the training set used to estimate the model. However, the test-set errors available are usually too few to draw reliable conclusions. One possible solution is to use an information criterion (IC), based on the likelihood $L(\theta, x_0 \mid y)$, that includes a regularization term to compensate for potential overfitting. The Akaike Information Criterion (AIC) for state space models is defined as [18]
$$\text{AIC} = -2 \log L(\theta, x_0 \mid y) + 2k, \qquad \text{(14)}$$
where $L(\theta, x_0 \mid y)$ is the likelihood and $k$ is the number of parameters and initial states of the estimated model. Akaike based his model selection criterion on the Kullback–Leibler (K-L) discrimination information, also known as negative entropy, defined by
$$I(f, g) = \int f(x) \log \frac{f(x)}{g(x \mid \theta)} \, dx, \qquad \text{(15)}$$
which measures the information lost when the model $g$ is used to approximate the real model $f$. Akaike found that he could estimate the expectation of the K-L information by the maximized log-likelihood corrected for bias, and that this bias can be approximated by the number of estimated parameters in the approximating model. Thus, the model selection procedure is to choose, amongst the candidates, the model having the minimum value of the AIC. The Bayesian Information Criterion (BIC) is defined as [20]
$$\text{BIC} = \text{AIC} + k [\log(n) - 2]. \qquad \text{(16)}$$
The BIC is order-consistent, but is not asymptotically efficient like the AIC. The AIC corrected for small-sample bias, denoted by AIC$_c$, is defined as [19]
$$\text{AIC}_c = \text{AIC} + \frac{2k(k+1)}{n-k-1}. \qquad \text{(17)}$$
Appropriate models can be selected by minimizing the AIC, the BIC, or the AIC$_c$.
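All three criteria can be computed from the same two quantities, $-2\log L$ and $k$. The helper below is a sketch (the function name and dictionary interface are ours, not the paper's):

```python
import numpy as np

def information_criteria(neg2loglik, k, n):
    """AIC, small-sample-corrected AICc and BIC for a model with
    -2 log-likelihood `neg2loglik`, k estimated parameters and
    initial states, and n observations."""
    aic = neg2loglik + 2 * k
    return {
        "AIC": aic,
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),
        "BIC": aic + k * (np.log(n) - 2),
    }
```

Model selection then reduces to fitting each candidate, computing its criterion, and keeping the minimiser.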

2.2. ARIMA Models

ARIMA models are generally accepted as one of the most versatile classes of models for forecasting time-series [21,22]. Many different types of stochastic seasonal and non-seasonal time-series can be represented by them. These include pure autoregressive (AR), pure moving average (MA), and mixed AR and MA processes, all requiring stationary data so that they can be applied. Although many time-series are non-stationary, they can be transformed to stationary time-series by taking proper degrees of differencing (regular and/or seasonal). The multiplicative seasonal ARIMA model, denoted as ARIMA$(p,d,q) \times (P,D,Q)_m$, has the following form [23]:
$$\phi_p(B)\,\Phi_P(B^m)\,(1-B)^d (1-B^m)^D y_t = c + \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t, \qquad \text{(18)}$$
where
$$\begin{aligned}
\phi_p(B) &= 1 - \phi_1 B - \cdots - \phi_p B^p, & \Phi_P(B^m) &= 1 - \Phi_1 B^m - \cdots - \Phi_P B^{Pm},\\
\theta_q(B) &= 1 + \theta_1 B + \cdots + \theta_q B^q, & \Theta_Q(B^m) &= 1 + \Theta_1 B^m + \cdots + \Theta_Q B^{Qm},
\end{aligned}$$
$m$ is the period of seasonality, $D$ is the degree of seasonal differencing, $d$ is the degree of ordinary differencing, $B$ is the backward shift operator, $\phi_p(B)$ and $\theta_q(B)$ are the regular autoregressive and moving average polynomials of orders $p$ and $q$, respectively, $\Phi_P(B^m)$ and $\Theta_Q(B^m)$ are the seasonal autoregressive and moving average polynomials of orders $P$ and $Q$, respectively, $c = \mu (1 - \phi_1 - \cdots - \phi_p)(1 - \Phi_1 - \cdots - \Phi_P)$, where $\mu$ is the mean of $(1-B)^d (1-B^m)^D y_t$, and $\varepsilon_t$ is a zero-mean Gaussian white noise process with variance $\sigma^2$. To ensure causality and invertibility, the roots of the polynomials $\phi_p(B)$, $\Phi_P(B^m)$, $\theta_q(B)$, and $\Theta_Q(B^m)$ should lie outside the unit circle. One of the main tasks in ARIMA forecasting is selecting the values of $p$, $q$, $P$, $Q$, $d$, and $D$. Usually, the following steps are used [23]: Plot the series, identify outliers, and choose a proper variance-stabilizing transformation. For that purpose, a Box-Cox transformation may be applied [24]:
$$y_t^{(\lambda)} = \begin{cases} \ln(y_t), & \lambda = 0\\ (y_t^\lambda - 1)/\lambda, & \lambda \neq 0, \end{cases} \qquad \text{(19)}$$
where the parameter $\lambda$ is a real number, often between $-1$ and $2$. Then, the sample ACF (Auto-Correlation Function) and sample PACF (Partial Auto-Correlation Function) can be computed to decide appropriate degrees of differencing ($d$ and $D$). Alternatively, unit-root tests may be applied. The Canova–Hansen test [25] can be used to choose $D$. After $D$ is selected, $d$ can be chosen by applying successive KPSS (Kwiatkowski, Phillips, Schmidt & Shin) tests [26]. Finally, the sample ACF and sample PACF are matched with the theoretical patterns of known models, to identify the orders of $p$, $q$, $P$, and $Q$.

Information Criteria for Model Selection

As for state space models, the values of $p$, $q$, $P$, and $Q$ may be selected by an information criterion, such as the Akaike Information Criterion [18]:
$$\text{AIC} = -2 \log L(\theta, \sigma^2 \mid y) + 2(p+q+P+Q+k+1), \qquad \text{(20)}$$
where $k = 1$ if $c \neq 0$ and $k = 0$ otherwise, and $\log L(\theta, \sigma^2 \mid y)$ is the log-likelihood of the model fitted to the properly transformed and differenced data, given by [27]
$$\log L(\theta, \sigma^2 \mid y) = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log(\sigma^2) - \sum_{t=1}^{n} \frac{\varepsilon_t^2}{2\sigma^2}, \qquad \text{(21)}$$
where $\theta$ is the parameter vector of the model and $\sigma^2$ is the innovation variance (the last term in parentheses in (20) is the total number of parameters that have been estimated, including the innovation variance). Note that the AIC is defined by considering the same principles of maximum likelihood and negative entropy discussed in Section 2.1. The AIC corrected for small sample sizes, AIC$_c$, is defined as
$$\text{AIC}_c = \text{AIC} + \frac{2(p+q+P+Q+k+1)(p+q+P+Q+k+2)}{n-p-q-P-Q-k-2}. \qquad \text{(22)}$$
The Bayesian Information Criterion is defined as
$$\text{BIC} = \text{AIC} + [\log(n) - 2](p+q+P+Q+k+1). \qquad \text{(23)}$$
As for the state space models, appropriate ARIMA models may be obtained by minimizing either the AIC, AIC$_c$, or BIC.

3. Hierarchical Forecasting

3.1. Hierarchical Time-Series

For the purpose of illustration, consider the example of the hierarchical structure shown in Figure 1. At the top of the hierarchy (level 0) is the most aggregated time-series, denoted by Total. The observation at time $t$ of the Total series is denoted by $y_{Total,t}$. The Total series is disaggregated into series A and series B, at level 1. The $t$-th observation of series A is denoted by $y_{A,t}$ and the $t$-th observation of series B is denoted by $y_{B,t}$. The series A and B are disaggregated, respectively, into two and three series that are at the bottom level (level 2). For example, $y_{AA,t}$ denotes the $t$-th observation of series AA. In this case, the total number of series is $n = 8$ and the number of series at the bottom level is $m = 5$. For any time $t$, the observations at the bottom level will sum to the observations of the series above. Hence, in this case, we have
$$\begin{aligned}
y_{Total,t} &= y_{AA,t} + y_{AB,t} + y_{BA,t} + y_{BB,t} + y_{BC,t},\\
y_{A,t} &= y_{AA,t} + y_{AB,t},\\
y_{B,t} &= y_{BA,t} + y_{BB,t} + y_{BC,t}.
\end{aligned} \qquad \text{(24)}$$
These aggregation constraints can be easily represented using matrix notation:
$$y_t = S b_t, \qquad \text{(25)}$$
where $y_t = (y_{Total,t}, y_{A,t}, y_{B,t}, y_{AA,t}, y_{AB,t}, y_{BA,t}, y_{BB,t}, y_{BC,t})^{\prime}$ is an $n$-dimensional vector, $b_t = (y_{AA,t}, y_{AB,t}, y_{BA,t}, y_{BB,t}, y_{BC,t})^{\prime}$ is an $m$-dimensional vector, and $S$ is the summing matrix of order $n \times m$, given by
$$S = \left[ \begin{array}{ccccc}
1 & 1 & 1 & 1 & 1\\
1 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & 1 & 1\\
\multicolumn{5}{c}{I_5}
\end{array} \right]. \qquad \text{(26)}$$
Note that the first three rows of $S$ correspond, respectively, to the three aggregation constraints in (24). The identity matrix $I_5$ below guarantees that each bottom level observation on the right-hand side of the equation is equal to itself on the left-hand side. These concepts can be applied to an arbitrary set of $n$ time-series that are subject to an aggregation structure, with $m$ series at the bottom level [18]. The goal is to produce coherent forecasts for each series in the hierarchy; that is, forecasts that add up according to the aggregation constraints of the hierarchical structure.
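For the Figure 1 hierarchy, the summing matrix and the coherence of $y_t = S b_t$ can be checked in a few lines of NumPy (the observation values below are made up for illustration):

```python
import numpy as np

# Summing matrix S for the Figure 1 hierarchy.
# Row order: Total, A, B, AA, AB, BA, BB, BC; columns: AA, AB, BA, BB, BC.
S = np.vstack([
    [[1, 1, 1, 1, 1],     # Total = AA + AB + BA + BB + BC
     [1, 1, 0, 0, 0],     # A     = AA + AB
     [0, 0, 1, 1, 1]],    # B     = BA + BB + BC
    np.eye(5, dtype=int), # each bottom series equals itself
])

b_t = np.array([3, 1, 4, 1, 5])  # bottom level observations at time t
y_t = S @ b_t                    # stacks Total, A, B and the bottom series
```

Here `y_t[0]` equals 14, the sum of all bottom observations, and any vector of the form `S @ b` satisfies the aggregation constraints by construction.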

3.2. Hierarchical Forecasting Methods

Let $\hat{y}_{t+h|t}$ be an $n$-dimensional vector containing the forecasts of the values of all series in the hierarchy at time $t+h$ (with $h = 1, 2, \ldots$), obtained using observations up to and including time $t$, and stacked in the same order as $y_t$. These are usually called base forecasts. They are calculated independently for each time-series, not taking into account any relationship that might exist between them due to the aggregation constraints. Any forecasting method, such as ETS or ARIMA, can be used to generate these forecasts. The issue is that it is very unlikely that these will be coherent forecasts; hence, some reconciliation method should be further applied. All existing reconciliation methods can be expressed as
$$\tilde{y}_{t+h|t} = S P \hat{y}_{t+h|t}, \qquad \text{(27)}$$
where $\tilde{y}_{t+h|t}$ is an $n$-dimensional vector of reconciled forecasts, which are now coherent, and $P$ is a matrix of dimension $m \times n$, which maps the base forecasts $\hat{y}_{t+h|t}$ into reconciled bottom level forecasts, which are then aggregated by the summing matrix $S$. If the bottom-up (BU) approach is used, then $P = [\,0_{m \times (n-m)} \mid I_m\,]$, where $0_{m \times (n-m)}$ is the null matrix of order $m \times (n-m)$ and $I_m$ is the identity matrix of order $m$ [4,5,6,9,10,28,29]. For the hierarchy shown in Figure 1, $P$ is given by
$$P = \left[ \begin{array}{ccc|ccccc}
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array} \right]. \qquad \text{(28)}$$
This approach is computationally very efficient, since it only requires summing the bottom level base forecasts. It also has the advantage of forecasting the series at the most disaggregated level: although these series are more difficult to model, no information about their dynamics is lost due to aggregation. However, it usually provides very poor forecasts for the upper levels in the hierarchy [13]. If a top-down (TD) approach is used, then $P = [\,p \mid 0_{m \times (n-1)}\,]$, where $p = [p_1, \ldots, p_m]^{\prime}$ is an $m$-dimensional vector containing the disaggregation proportions, which indicate how the top level base forecast at time $t+h$ is to be distributed to obtain forecasts for the bottom level series, which are then summed by $S$ [8,17,30,31,32,33]. For the hierarchy shown in Figure 1, $P$ is given by
$$P = \left[ \begin{array}{c|ccccccc}
p_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
p_2 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
p_3 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
p_4 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
p_5 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{array} \right]. \qquad \text{(29)}$$
The most common top-down methods performed quite well in Gross and Sohl [8]. In method "a" of Gross and Sohl [8] (referred to in the results that follow as TD$_{\text{GSa}}$), each proportion $p_i$ is the average of the historical proportions of the bottom level series $y_{i,j}$, relative to the top level series $y_{T,j}$, over the time period $j = 1, \ldots, t$:
$$p_i = \frac{1}{t} \sum_{j=1}^{t} \frac{y_{i,j}}{y_{T,j}}, \quad i = 1, \ldots, m. \qquad \text{(30)}$$
In method "f" (referred to in the results that follow as TD$_{\text{GSf}}$), each proportion $p_i$ is the average value of the historical data of the bottom level series $y_{i,j}$, relative to the average value of the historical data of the top level series $y_{T,j}$, over the time period $j = 1, \ldots, t$:
$$p_i = \sum_{j=1}^{t} \frac{y_{i,j}}{t} \bigg/ \sum_{j=1}^{t} \frac{y_{T,j}}{t}, \quad i = 1, \ldots, m. \qquad \text{(31)}$$
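The two sets of historical proportions differ only in where the averaging happens, as the sketch below makes explicit (the function name and the array layout, one row per bottom level series, are ours):

```python
import numpy as np

def top_down_proportions(bottom, total, method="a"):
    """Historical disaggregation proportions of Gross and Sohl:
    method "a" averages the per-period ratios y_ij / y_Tj, while
    method "f" takes the ratio of the averages. `bottom` has shape
    (m, t); `total` has shape (t,)."""
    bottom = np.asarray(bottom, dtype=float)
    total = np.asarray(total, dtype=float)
    if method == "a":
        return (bottom / total).mean(axis=1)   # average of ratios
    return bottom.mean(axis=1) / total.mean()  # ratio of averages
```

For stable series the two methods agree; they diverge when the bottom level shares drift over time.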
These two methods are very simple to implement, since they only require forecasts for the most aggregated series in the hierarchy, and they seem to provide reliable forecasts for the aggregate levels. However, they are not able to capture the individual dynamics of the series that are lost due to aggregation. Moreover, since they are based on historical proportions, they tend to produce less accurate forecasts than the bottom-up approach at lower levels of the hierarchy, as they do not take into account how these proportions may change over time. To address this issue, Athanasopoulos et al. [9] proposed to obtain proportions based on forecasts rather than historical data:
$$p_i = \prod_{l=0}^{k-1} \frac{\hat{y}_{i,t+h|t}^{(l)}}{\hat{S}_{i,t+h|t}^{(l+1)}}, \quad i = 1, \ldots, m, \qquad \text{(32)}$$
where $k$ is the number of levels of the hierarchy, $\hat{y}_{i,t+h|t}^{(l)}$ is the base forecast at time $t+h$ of the series corresponding to the node which is $l$ levels above node $i$, and $\hat{S}_{i,t+h|t}^{(l+1)}$ is the sum of the base forecasts at time $t+h$ of the series corresponding to the nodes that are below the node which is $l$ levels above node $i$ and are directly connected to it. In the results that follow, this top-down method is referred to as TD$_{\text{fp}}$. In the methods discussed so far, no real reconciliation has been performed, because these have been based on base forecasts from a single level of the hierarchy. However, procedures that reconcile the base forecasts from the whole hierarchical structure in order to produce coherent forecasts can also be considered. Hyndman et al. [13] proposed an approach based on the regression model
$$\hat{y}_{t+h|t} = S \beta_{t+h|t} + \varepsilon_h, \qquad \text{(33)}$$
where $\beta_{t+h|t}$ is the unknown conditional mean of the most disaggregated series and $\varepsilon_h$ is the coherency error, assumed to have zero mean and covariance matrix $\Sigma_h$. If $\Sigma_h$ were known, the generalised least squares (GLS) estimator of $\beta_{t+h|t}$ would lead to the following reconciled forecasts:
$$\tilde{y}_{t+h|t} = S \hat{\beta}_{t+h|t} = S (S^{\prime} \Sigma_h^{-1} S)^{-1} S^{\prime} \Sigma_h^{-1} \hat{y}_{t+h|t} = S P \hat{y}_{t+h|t}, \qquad \text{(34)}$$
where $P = (S^{\prime} \Sigma_h^{-1} S)^{-1} S^{\prime} \Sigma_h^{-1}$. Hyndman et al. [13] also showed that, if the base forecasts $\hat{y}_{t+h|t}$ are unbiased, then the reconciled forecasts $\tilde{y}_{t+h|t}$ will be unbiased, provided that $S P S = S$. This condition holds for this reconciliation approach and also for the bottom-up method, but not for top-down methods. Therefore, top-down approaches will never give unbiased reconciled forecasts, even if the base forecasts are unbiased. Recently, Wickramasuriya et al. [15] showed that, in general, $\Sigma_h$ is not identifiable. They showed that the covariance matrix of the $h$-step ahead reconciled forecast errors is given by
$$\text{Var}(y_{t+h} - \tilde{y}_{t+h|t}) = S P W_h P^{\prime} S^{\prime}, \qquad \text{(35)}$$
for any $P$ such that $S P S = S$, where $W_h = \text{Var}(y_{t+h} - \hat{y}_{t+h|t}) = \mathrm{E}(\hat{e}_{t+h|t} \hat{e}_{t+h|t}^{\prime})$ is the covariance matrix of the corresponding $h$-step ahead base forecast errors. The goal is to find the matrix $P$ that minimises the error variances of the reconciled forecasts, which are on the diagonal of the covariance matrix $\text{Var}(y_{t+h} - \tilde{y}_{t+h|t})$. Wickramasuriya et al. [15] showed that the optimal reconciliation matrix $P$ that minimises the trace of $S P W_h P^{\prime} S^{\prime}$, subject to $S P S = S$, is
$$P = (S^{\prime} W_h^{-1} S)^{-1} S^{\prime} W_h^{-1}. \qquad \text{(36)}$$
Therefore, the optimal reconciled forecasts are given by
$$\tilde{y}_{t+h|t} = S (S^{\prime} W_h^{-1} S)^{-1} S^{\prime} W_h^{-1} \hat{y}_{t+h|t}, \qquad \text{(37)}$$
which is referred to as the MinT (Minimum Trace) estimator. Note that the MinT and GLS estimators only differ in the covariance matrix. We still need to estimate $W_h$, which is a matrix of order $n$ that can be quite large. The following simplifying approximations were considered by Wickramasuriya et al. [15]:
(1) $W_h = k_h I_n$ for all $h$, with $k_h > 0$. In this case, the MinT estimator corresponds to the ordinary least squares (OLS) estimator of $\beta_{t+h|t}$. It is the strongest simplifying approximation considered: $P$ is independent of the data (it only depends on $S$), which means that this method does not account for differences in scale between the levels of the hierarchy (captured by the error variances of the base forecasts), or for the relationships between the series (captured by the error covariances of the base forecasts). This is optimal only when the base forecast errors are uncorrelated and equivariant, which are unrealistic assumptions for a hierarchical time-series. In the results that follow, this method is referred to as OLS.
(2) $W_h = k_h\,\text{diag}(\hat{W}_1)$ for all $h$, with $k_h > 0$, where $\hat{W}_1$ is the sample covariance estimator of the in-sample 1-step ahead base forecast errors. Then, $W_h$ is a diagonal matrix with the diagonal entries of $\hat{W}_1$, which are the variances of the in-sample 1-step ahead base forecast errors, stacked in the same order as $y_t$. This approximation scales the base forecasts using the variances of the residuals. In the results that follow, this specification is referred to as MinT-VarScale.
(3) $W_h = k_h \Lambda$ for all $h$, with $k_h > 0$, and $\Lambda = \text{diag}(S \mathbf{1})$, where $\mathbf{1}$ is a unit vector of dimension $m$. This method was proposed by Athanasopoulos et al. [34] for temporal hierarchies, and assumes that the bottom level base forecast errors are uncorrelated between nodes and have variance $k_h$. Hence, the diagonal entries in $\Lambda$ are the numbers of forecast error variances contributing to each node, stacked in the same order as $y_t$. This estimator only depends on the aggregation constraints, being independent of the data. Therefore, it is usually referred to as structural scaling, and we label it as MinT-StructScale. Notice that this specification only assumes equivariant base forecast errors at the bottom level, which is an advantage over OLS. It is particularly useful when the residuals are not available, which is the case when the base forecasts are generated by judgmental forecasting.
(4) $W_h = k_h \hat{W}_{1,D}^{*}$ for all $h$, with $k_h > 0$, where $\hat{W}_{1,D}^{*} = \lambda \hat{W}_{1,D} + (1 - \lambda)\hat{W}_1$ is a shrinkage estimator that shrinks the off-diagonal elements of $\hat{W}_1$ towards zero (while the diagonal elements remain unchanged), $\hat{W}_{1,D}$ is a diagonal matrix with the diagonal entries of $\hat{W}_1$, and $\lambda$ is the shrinkage intensity parameter. By parameterizing the shrinkage in terms of variances and correlations, rather than variances and covariances, and assuming that the variances are constant, Schäfer and Strimmer [35] proposed the following shrinkage intensity parameter
$$\hat{\lambda} = \frac{\sum_{i \neq j} \widehat{\operatorname{Var}}(\hat{r}_{ij})}{\sum_{i \neq j} \hat{r}_{ij}^2},$$
where $\hat{r}_{ij}$ is the $(i,j)$th element of $\hat{R}_1$, the sample correlation matrix of the in-sample 1-step-ahead base forecast errors. In contrast to the variance and structural scaling estimators, which are diagonal covariance estimators accommodating only differences in scale between the levels of the hierarchy, this shrinkage estimator is a full covariance estimator that also accounts for the relationships between the series, while the shrinkage parameter regulates the complexity of the matrix $W_h$. In the results that follow, this method is referred to as MinT-Shrink. In all estimators, $k_h$ is a proportionality constant that needs to be estimated only to obtain prediction intervals.
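The four specifications above differ only in the choice of $W_h$ plugged into the MinT reconciliation formula $\tilde{y} = S(S^{\top}W^{-1}S)^{-1}S^{\top}W^{-1}\hat{y}$ of Wickramasuriya et al. [15]. The following minimal sketch (not the hts implementation used in this paper) illustrates all four choices on a toy two-level hierarchy; the error sample and shrinkage intensity are invented for illustration.

```python
import numpy as np

# Toy hierarchy: total = A + B, so n = 3 series and m = 2 bottom-level series.
S = np.array([[1.0, 1.0],   # total
              [1.0, 0.0],   # A
              [0.0, 1.0]])  # B

def mint_reconcile(base, W):
    """MinT reconciliation: y~ = S (S' W^-1 S)^-1 S' W^-1 y^."""
    Winv = np.linalg.inv(W)
    P = np.linalg.solve(S.T @ Winv @ S, S.T @ Winv)
    return S @ P @ base

# Simulated in-sample 1-step-ahead base forecast errors (rows = time periods).
rng = np.random.default_rng(0)
E = rng.normal(size=(100, 3))
W1 = np.cov(E, rowvar=False)                  # sample covariance W^_1

n = S.shape[0]
W_ols    = np.eye(n)                          # (1) OLS: identity
W_var    = np.diag(np.diag(W1))               # (2) variance scaling
W_struct = np.diag(S @ np.ones(S.shape[1]))   # (3) structural scaling, diag(S 1)

# (4) shrinkage: shrink off-diagonal entries of W^_1 towards zero
lam = 0.5                                     # illustrative; estimated from data in the paper
W_shrink = lam * np.diag(np.diag(W1)) + (1 - lam) * W1

base = np.array([10.0, 6.0, 5.0])             # incoherent: 6 + 5 != 10
for W in (W_ols, W_var, W_struct, W_shrink):
    y = mint_reconcile(base, W)
    assert np.isclose(y[0], y[1] + y[2])      # coherence holds for any valid W
```

Whatever $W$ is chosen, the reconciled forecasts satisfy the aggregation constraints by construction; the choice of $W$ only changes how the incoherence in the base forecasts is distributed across the nodes.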

4. Empirical Study

4.1. Case Study Data

The Jerónimo Martins Group is an international company, based in Portugal, with 225 years of accumulated experience in the retail sector. Food distribution is its main business and represents more than 95% of its consolidated sales. In Portugal, it leads the supermarket segment through a chain called Pingo Doce. This empirical study was performed using a real database of product sales from one of the largest stores of Pingo Doce. The data were aggregated on a weekly basis and span the period between 3 January 2012 and 27 April 2015, comprising a total of 173 weeks. Only the products that have at least one sale every week were considered, since these are the most challenging for inventory planning. The hierarchical structure of products adopted by the retailer, from the top level to the bottom level, is: Store > Area > Division > Family > Category > Sub-category > SKU. The total number of time-series considered is 1751 (aggregated and disaggregated), and their distribution across the six levels below the top of the hierarchy is summarised in Table 1. The most aggregated level, referred to as the top level, comprises the total sales at the store level. Level 1 comprises these sales disaggregated by the six main areas: Grocery, specialized perishables, non-specialized perishables, beverages, detergents and cleaning, and personal care. These are further disaggregated, at level 2, into 21 divisions; at level 3, into 73 families; at level 4, into 203 categories; at level 5, into 459 subcategories; and, at the bottom level, into 988 SKUs (Stock Keeping Units).
Figure 2 plots the sales at the top level and at level 1 of the hierarchy, aggregating these by the store and by each of the 6 main areas. The scale on the y axis was removed due to confidentiality reasons. The strong peak in sales in 2012, observed in all series, is relative to a promotional event carried out at a national level by Pingo Doce on 1 May (Labour day), after which the company shifted from an Every Day Low Price strategy to a continuous promotional cycle.
All the series show local upward and downward trends, although these are less prominent in the detergents/cleaning and personal care time-series. The store time-series shows a similar behaviour to the perishables time-series, as the latter represents the major proportion of the total sales. These aggregate series do not show any seasonal variation.
For a better understanding of the hierarchical structure of the data, we show, in Table 2, the complete hierarchy for the milk division (level 2). The total sales of the milk division are disaggregated, at level 3, into 2 families: Raw and UHT. The Raw family is disaggregated into the Pasteurized category at level 4, which is further disaggregated into the Brik sub-category at level 5, which comprises 5 SKUs. The UHT family is disaggregated into the Current and Special categories. The Current category is disaggregated into the Semi-skimmed and Skimmed sub-categories, which comprise 2 and 3 SKUs, respectively. The Special category is disaggregated into the Semi-skimmed, Skimmed, and Flavored sub-categories, which comprise 10, 10, and 3 SKUs, respectively. The plots in Figure 3 show the sales of the SKUs within each sub-category of the milk division. These help us to visualise the diverse individual dynamics within each sub-category and the relative importance of each SKU. As we move down the hierarchy, the signal-to-noise ratio of the series decreases. Therefore, the series at the bottom level show much more random variation, compared to the higher levels.
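The aggregation constraints embodied in this structure can be reproduced mechanically: each aggregate series is simply the sum of the bottom-level series beneath it. A small illustration using pandas, with a few hypothetical milk SKUs (the labels follow Table 2; the sales figures are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical SKU-level sales labelled with part of the milk hierarchy.
rng = np.random.default_rng(1)
skus = pd.DataFrame({
    "family":      ["Raw", "Raw", "UHT", "UHT", "UHT"],
    "category":    ["Pasteurized", "Pasteurized", "Current", "Current", "Special"],
    "subcategory": ["Brik", "Brik", "Semi-skimmed", "Skimmed", "Flavored"],
    "sales":       rng.integers(10, 100, size=5),
})

# Aggregates at each level are sums over the corresponding label columns.
by_subcat = skus.groupby(["family", "category", "subcategory"])["sales"].sum()
by_family = skus.groupby("family")["sales"].sum()
total     = skus["sales"].sum()

# Coherence holds by construction: every level adds up to the one above.
assert by_family.sum() == total
assert by_subcat.sum() == total
```

When the series are instead forecast independently at each level, this adding-up property is exactly what is lost, which is what motivates the reconciliation methods compared in this study.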

4.2. Experimental Setup

Generating accurate forecasts for each of the 1751 time-series within the hierarchical structure is crucial for the planning operations of the store. We can always forecast the series at each level of the hierarchy independently (we refer to these as base forecasts), based on forecasting models fitted individually for each series. However, by ignoring the aggregation constraints, it is very unlikely that the resulting forecasts will be coherent. To ensure aligned decision-making across the various levels of management, it is essential that these forecasts are reconciled across all levels of the hierarchy.
We consider two alternative forecasting model families for generating the base forecasts; namely, ETS and ARIMA, as discussed in Section 2. The appropriate ETS model for each time-series is chosen from the 18 potential models by minimising AICc, and the smoothing parameters and initial states are estimated by maximising the likelihood L [19], as implemented in the forecast package in the R software [36]. The ARIMA model is chosen following the algorithm proposed by Hyndman and Khandakar [37], also implemented in the forecast package. First, the numbers of seasonal and ordinary differences D and d required for stationarity are selected, and then the orders p, q, P, and Q are identified, based on AICc. ETS and ARIMA models are the two most widely-used approaches to time-series forecasting. They are based on different perspectives on the problem and often, but not always, perform differently, although they share some mathematically equivalent models [21,22,38,39,40]. ARIMA can potentially capture higher-order time-series dynamics than ETS [34]. Therefore, we use both approaches to generate base forecasts, in order to evaluate how these can influence the performance of each reconciliation process. To make incoherent ETS and ARIMA forecasts coherent, we use the implementations of the hierarchical forecasting approaches, as discussed in Section 3.2, available in the hts package [41] for R.
We evaluate the forecasting accuracies of several competing methods using a rolling origin, as illustrated in Figure 4. By increasing the number of forecast errors available, we increase the confidence in our results.
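The rolling-origin scheme can be sketched as a simple index generator; the constants below match the setup used in this study (139-week initial training set, forecast origins up to week 161, 12-week horizon, 173 weeks in total):

```python
def rolling_origins(n_obs, initial=139, horizon=12, last_origin=161):
    """Yield (train_end, test_idx) pairs for an expanding-window rolling origin.

    train_end is the number of observations in the training set; test_idx are
    the 0-based indices of the (up to) `horizon` observations forecast from
    that origin.
    """
    for train_end in range(initial, last_origin + 1):
        test_idx = list(range(train_end, min(train_end + horizon, n_obs)))
        yield train_end, test_idx

splits = list(rolling_origins(n_obs=173))
print(len(splits))  # 23 forecast origins
```

At each origin the models are re-specified and re-estimated on the expanded training set, so every forecast is genuinely out-of-sample.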
We start with a training set containing the first 139 weeks and generate 1- to 12-week ahead base forecasts for each of the 1751 series using ETS and ARIMA. These base forecasts are then reconciled, using the alternative hierarchical methods. The training set is then expanded by one week, and the process is repeated until week 161. This gives a total of 23 forecast origins for each of the 1751 series. For each forecast origin, new ETS and ARIMA models based on the updated training data are specified, from which we generate new base forecasts; these are again reconciled, and the corresponding forecast errors are calculated for both. The performance of the hierarchical forecasting methods was evaluated using the Average Relative Mean Squared Error (AvgRelMSE) [42]. As we are comparing forecast accuracy across time-series with different units, it is important to use a scale-independent error measure. For each time-series i, we calculate the Relative Mean Squared Error (RelMSE) [43]
$$\mathrm{RelMSE}_{i,h} = \frac{\mathrm{MSE}_{i,h}}{\mathrm{MSE}_{i,h}^{\mathrm{base}}}, \quad i = 1, \ldots, 1751; \; h = 1, 2, 4, 8, 12, \quad (39)$$
where $\mathrm{MSE}_{i,h}$ is the mean squared error of the forecast of interest, averaged across all forecast origins and over forecast horizons 1 to h, and $\mathrm{MSE}_{i,h}^{\mathrm{base}}$ is the corresponding mean squared error of the base forecast, which is used as a benchmark. If the hierarchical forecasting method reconciles ARIMA (ETS) base forecasts, then the ARIMA (ETS) base forecasts are taken as the benchmark. For each forecast horizon h, we average RelMSE (39) across the time-series of the hierarchy using the following geometric mean
$$\mathrm{AvgRelMSE}_{L,h} = \left( \prod_{i \in L} \mathrm{RelMSE}_{i,h} \right)^{1/\#L}, \quad h = 1, 2, 4, 8, 12,$$
where L is the level (i.e., Top level, Level 1, …, Level 5, Bottom level, All) and $\#L$ is the number of series at that level. The geometric mean should be used for averaging benchmark ratios, since it gives equal weight to reciprocal relative changes [44]. An advantage of AvgRelMSE is its interpretability: when it is smaller than 1, $(1 - \mathrm{AvgRelMSE}) \times 100\%$ is the average percentage improvement in MSE of the evaluated forecast over the benchmark.
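The measure is straightforward to compute. A minimal sketch, with made-up MSE values for three hypothetical series at some level and horizon:

```python
import numpy as np

def avg_rel_mse(mse, mse_base):
    """Geometric mean of RelMSE_i = MSE_i / MSE_i^base over the series of a level."""
    rel = np.asarray(mse, dtype=float) / np.asarray(mse_base, dtype=float)
    return float(np.exp(np.mean(np.log(rel))))  # geometric mean via log-average

# Hypothetical MSEs of a reconciled method and of the base-forecast benchmark:
mse      = [4.0, 9.0, 1.0]
mse_base = [8.0, 9.0, 2.0]
a = avg_rel_mse(mse, mse_base)         # geometric mean of (0.5, 1.0, 0.5)
print(round(100 * (1 - a), 1))         # average % improvement over the benchmark
```

Note that computing the arithmetic mean of the ratios instead would overstate the improvement, which is why the geometric mean is used for benchmark ratios.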

4.3. Results

Table 3 presents the results of AvgRelMSE for the series of each hierarchical level, while Table 4 presents the results of AvgRelMSE for the complete hierarchy. BU refers to the bottom-up method, TD GSa to the top-down "a" method of Gross and Sohl [8], TD GSf to the top-down "f" method of Gross and Sohl [8], TD fp to top-down with forecast proportions, OLS to Ordinary Least Squares, MinT-VarScale to the Minimum Trace Variance Scaling estimator, MinT-StructScale to the Minimum Trace Structural Scaling estimator, MinT-Shrink to the Minimum Trace Shrinkage estimator, and Base to the base forecasts. The left side of these tables shows the results using ARIMA base forecasts, while the right side shows the results using ETS base forecasts. As the base forecasts were used to scale the errors, in the rows labelled Base the AvgRelMSE is equal to 1 across all columns. We provide forecast results for 1 week, 2 weeks, 4 weeks (about one month), 8 weeks (about two months), and 12 weeks (about three months). The column labelled Rank provides the mean rank of each method across all forecast horizons. A rank of 1 means the method was the best on all horizons, while a rank of 9 means it was always the worst. To support the comparisons between the methods that are expected to perform better, Figure 5 visualises the results of AvgRelMSE for the MinT-VarScale, MinT-StructScale, MinT-Shrink, and Base methods presented in Table 3 and Table 4. The results for the complete hierarchy are highlighted with a light grey background.
It is immediately clear that the MinT-Shrink forecasts improved on the accuracy of the ARIMA base forecasts for all levels and for the complete hierarchy, across all forecast horizons. The only exception was the bottom level at the short-term horizons h = 1 and 1–2 (h = 2), albeit with marginal differences. The gains in forecast accuracy were more substantial at the higher levels of aggregation. This was not the case for all other reconciliation methods, attesting to the difficulty of producing reconciled forecasts that are (at least) as accurate as the base forecasts. Furthermore, the MinT-Shrink method using ARIMA base forecasts returned the most accurate coherent forecasts for all levels, the only exceptions being the Store level, for which MinT-VarScale returned the most accurate forecasts, and the Area level, where MinT-StructScale performed best. The improvements in the accuracy of MinT-Shrink forecasts, across all forecast horizons, are more pronounced with the ARIMA base forecasts than with the ETS base forecasts (with the exception of horizon h = 1 at the bottom level), although the former were almost always more accurate than the latter (see Table 5). This may be due to a limitation of the ets() function in the forecast package, which restricts seasonality to a maximum period of 24. Without this limitation, ARIMA can potentially capture seasonalities of a higher order than ETS.
Clearly, the least accurate method was OLS, for both ETS and ARIMA forecasts and across all forecast horizons; it only improved forecast accuracy over the base forecasts at the top level. This is a consequence of ignoring the differences in scale between the levels of the hierarchy and any relationships between the series. A major drawback of the TD GSa and TD GSf methods is that they only consider information from the top level. Interestingly, their forecasts only improved on the accuracy of the ARIMA base forecasts at the Area level, and never improved over the ETS base forecasts (at the top level, their forecasts are equal to the base forecasts). The TD fp method bases its proportions on forecasts from all disaggregated levels of the hierarchy, but it performed badly, never improving on the accuracy of the ARIMA base forecasts at any forecast horizon. This could be expected, since top-down approaches never give unbiased reconciled forecasts, even if the base forecasts are unbiased. BU provided poor forecasts for all aggregate levels in the hierarchy, showing average increases in the MSE relative to the base forecasts for all levels of aggregation and all forecast horizons (at the bottom level, its forecasts are equal to the base forecasts). These losses in forecast accuracy were more substantial at higher levels of aggregation.
Like OLS, MinT-StructScale only depended on the structure of the aggregations and not on the actual data, resulting in poor forecasts, especially at the lower levels of aggregation; in our case, at the Category, Sub-category, and SKU levels, which comprised about 94% of the time-series of the complete hierarchy (see Figure 5). On the other hand, by accommodating the differences in scale between the levels of the hierarchy, MinT-VarScale performed well almost always, generally improving the forecast accuracy over the base forecasts. MinT-Shrink also accounted for the inter-relationships between the series in the hierarchy, always performing better than MinT-VarScale, across both ETS and ARIMA forecasts for all forecast horizons; the only exception being at the Store level (which comprised only one time-series).
To improve on the accuracy of the base forecasts, the reconciliation methods have to take advantage of the combination of informative signals from all levels of aggregation. It is clear that MinT-Shrink was able to do this and, hence, improvements in forecast accuracy over the base forecasts were attained. For the complete hierarchy, the accuracy gains generally increased with the forecast horizon, varying between 1.7% and 3.7%. It is also evident that the gains in forecast accuracy were more substantial at higher levels of aggregation, which means that information about the individual dynamics of the series, lost due to aggregation, was brought back from the lower levels of aggregation to the higher levels by the reconciliation process, substantially improving the forecast accuracy over the base forecasts.
These results are in accordance with those obtained by Kourentzes and Athanasopoulos [45], which compared MinT-Shrink and MinT-VarScale forecasts with base forecasts in the context of generating coherent cross-temporal forecasts for Australian tourism. Both MinT-Shrink and MinT-VarScale improved the forecast accuracy over the base ETS and ARIMA forecasts for the bottom level and the complete hierarchy. MinT-Shrink performed better than MinT-VarScale across both ETS and ARIMA forecasts.
In order to find out whether the forecast error differences between the several competing methods are statistically significant, we conducted a Nemenyi test [46]. The results of this test are shown in Figure 6. The panels on the left side show the results for the complete hierarchy using ARIMA base forecasts, for each forecast horizon, while the panels on the right side show the respective results using ETS base forecasts. On the vertical axis, the methods are sorted by MSE mean rank; on the horizontal axis, they are ordered as in Table 3 and Table 4. In each row, the black cell marks the method being tested, a blue cell indicates a method whose performance is not statistically significantly different at the 5% level, and a white cell indicates a method that differs significantly. We use the Nemenyi test implementation available in the tsutils [47] package for R.
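The mean-rank computation underlying the Nemenyi test can be sketched as follows. This is not the tsutils implementation used in the paper: the error matrix is simulated, and the critical value q_0.05 = 3.102 for nine methods is the tabulated value from Demšar's post-hoc comparison tables, taken here as an assumption.

```python
import numpy as np

def nemenyi_cd(k, n, q_alpha=3.102):
    """Nemenyi critical distance CD = q_alpha * sqrt(k(k+1) / (6n)).

    q_alpha is the tabulated 5%-level critical value for k = 9 methods
    (assumed from Demsar's tables); k = number of methods, n = number of series.
    """
    return q_alpha * np.sqrt(k * (k + 1) / (6 * n))

# errors[i, j]: squared error of method j on series i (simulated here).
rng = np.random.default_rng(3)
errors = rng.gamma(2.0, size=(1751, 9))

# Rank the 9 methods on each series (1 = smallest error), then average.
ranks = errors.argsort(axis=1).argsort(axis=1) + 1
mean_ranks = ranks.mean(axis=0)

cd = nemenyi_cd(k=9, n=1751)
# Two methods differ significantly when their mean ranks differ by more than cd.
significant = np.abs(mean_ranks[:, None] - mean_ranks[None, :]) > cd
```

Each blue cell in Figure 6 corresponds to a pair of methods whose mean-rank difference falls within this critical distance.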
Analysing the results for ARIMA presented in the panels on the left side, we observe that, for h = 1, BU and Base are grouped together as the top-performing methods. They are immediately followed by MinT-Shrink and MinT-VarScale, which are found to be statistically indistinguishable. For the forecast horizon 1–2 (h = 2), BU, Base, MinT-Shrink, and MinT-VarScale are grouped together as the top-performing methods. For the forecast horizon 1–4 (h = 4), MinT-Shrink and MinT-VarScale belong to the top-performing group, while BU and Base perform significantly worse. For the long-term forecasts, MinT-Shrink performs significantly better than MinT-VarScale, BU, and Base. The TD fp and MinT-StructScale methods perform significantly worse than MinT-Shrink, MinT-VarScale, BU, and Base across all forecast horizons, and are statistically indistinguishable from each other, outperforming only TD GSa, TD GSf, and OLS.
Analysing the results for ETS presented in the panels on the right side, we observe that, for h = 1, BU and Base are again grouped together as the top-performing methods, followed by MinT-VarScale and MinT-Shrink. For the forecast horizon 1–2 (h = 2), MinT-VarScale and Base are grouped together as the top-performing methods, immediately followed by MinT-Shrink and BU, which are found to be statistically indistinguishable. For the other forecast horizons, MinT-VarScale performs best, always followed by MinT-Shrink. Overall, for both ETS and ARIMA, the MinT approach outperforms the other competing methods, with the exception of the shortest horizon h = 1.

5. Conclusions

Retailers need forecasts for a huge number of related time-series which can be organised into a hierarchical structure. Sales at the SKU level can be naturally aggregated into categories, families, areas, stores, and regions. To ensure aligned decision-making across the hierarchy, it is essential that forecasts at the most disaggregated level add up to forecasts at the aggregate levels above. It is not immediately clear whether these aggregate forecasts should be generated independently or by using a hierarchical forecasting method that ensures coherent decision-making at the different levels but does not guarantee (at least) the same accuracy. To give guidelines on this issue, our empirical study investigates the relative performance of independent and reconciled forecasting approaches.
We use weekly data on SKU sales from one of the largest stores of a Portuguese retailer, spanning the period between 3 January 2012 and 27 April 2015, and consider the hierarchical structure of products adopted by the company, from the top level to the bottom level, comprising six levels of aggregation below the store total. We generate the independent forecasts using two alternative forecasting model families; namely, ETS and ARIMA. These are compared to the most commonly-used hierarchical forecasting approaches. We evaluate the forecast accuracies of the several competing methods through the Average Relative Mean Squared Error, using cross-validation based on a rolling forecast origin.
It is clear that MinT-Shrink forecasts generally improve on the accuracy of the ARIMA base forecasts for all levels and for the complete hierarchy, across all forecast horizons. The accuracy gains generally increase with the horizon, varying between 1.7% and 3.7% for the complete hierarchy. That is not the case for all other reconciliation methods, attesting to the difficulty of producing reconciled forecasts that are at least as accurate as the base forecasts. The improvements in the accuracy of MinT-Shrink forecasts, across all forecast horizons, are more pronounced with the ARIMA base forecasts than with the ETS base forecasts (with the exception of horizon h = 1 at the bottom level), although the former are almost always more accurate than the latter.
To improve on the accuracy of the base forecasts, the reconciliation methods have to take advantage of the combination of informative signals from all levels of aggregation. It is clear that MinT-Shrink is able to do this and, hence, improvements in forecast accuracy over the base forecasts are attained. It is also evident that the gains in forecast accuracy are more substantial at higher levels of aggregation, which means that the information about the individual dynamics of the series, lost when aggregating, is brought back from the lower levels of aggregation to the higher levels by the reconciliation process, substantially improving the forecast accuracy over the base forecasts.

Author Contributions

Conceptualization, J.M.O. and P.R.; methodology, J.M.O. and P.R.; software, J.M.O. and P.R.; validation, J.M.O. and P.R.; formal analysis, J.M.O. and P.R.; investigation, J.M.O. and P.R.; resources, J.M.O. and P.R.; data curation, J.M.O. and P.R.; writing–original draft preparation, J.M.O. and P.R.; writing–review and editing, J.M.O. and P.R.; visualization, J.M.O. and P.R.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fildes, R.; Ma, S.; Kolassa, S. Retail forecasting: Research and practice. Working paper. Available online: http://eprints.lancs.ac.uk/128587/ (accessed on 24 April 2019).
  2. Kremer, M.; Siemsen, E.; Thomas, D.J. The sum and its parts: Judgmental hierarchical forecasting. Manag. Sci. 2016, 62, 2745–2764. [Google Scholar] [CrossRef]
  3. Pennings, C.L.; van Dalen, J. Integrated hierarchical forecasting. Eur. J. Oper. Res. 2017, 263, 412–418. [Google Scholar] [CrossRef]
  4. Orcutt, G.H.; Watts, H.W.; Edwards, J.B. Data aggregation and information loss. Am. Econ. Rev. 1968, 58, 773–787. [Google Scholar]
  5. Dunn, D.M.; Williams, W.H.; Dechaine, T.L. Aggregate versus subaggregate models in local area forecasting. J. Am. Stat. Assoc. 1976, 71, 68–71. [Google Scholar] [CrossRef]
  6. Shlifer, E.; Wolff, R.W. Aggregation and proration in forecasting. Manag. Sci. 1979, 25, 594–603. [Google Scholar] [CrossRef]
  7. Kohn, R. When is an aggregate of a time series efficiently forecast by its past? J. Econom. 1982, 18, 337–349. [Google Scholar] [CrossRef]
  8. Gross, C.W.; Sohl, J.E. Disaggregation methods to expedite product line forecasting. J. Forecast. 1990, 9, 233–254. [Google Scholar] [CrossRef]
  9. Athanasopoulos, G.; Ahmed, R.A.; Hyndman, R.J. Hierarchical forecasts for Australian domestic tourism. Int. J. Forecast. 2009, 25, 146–166. [Google Scholar] [CrossRef]
  10. Dangerfield, B.J.; Morris, J.S. Top-down or bottom-up: Aggregate versus disaggregate extrapolations. Int. J. Forecast. 1992, 8, 233–241. [Google Scholar] [CrossRef]
  11. Widiarta, H.; Viswanathan, S.; Piplani, R. Forecasting aggregate demand: An analytical evaluation of top-down versus bottom-up forecasting in a production planning framework. Int. J. Prod. Econ. 2009, 118, 87–94. [Google Scholar] [CrossRef]
  12. Syntetos, A.A.; Babai, Z.; Boylan, J.E.; Kolassa, S.; Nikolopoulos, K. Supply chain forecasting: Theory, practice, their gap and the future. Eur. J. Oper. Res. 2016, 252, 1–26. [Google Scholar] [CrossRef]
  13. Hyndman, R.J.; Ahmed, R.A.; Athanasopoulos, G.; Shang, H.L. Optimal combination forecasts for hierarchical time series. Comput. Stat. Data Anal. 2011, 55, 2579–2589. [Google Scholar] [CrossRef]
  14. Hyndman, R.J.; Lee, A.; Wang, E. Fast computation of reconciled forecasts for hierarchical and grouped time series. Comput. Stat. Data Anal. 2016, 97, 16–32. [Google Scholar] [CrossRef]
  15. Wickramasuriya, S.L.; Athanasopoulos, G.; Hyndman, R.J. Optimal forecast reconciliation for hierarchical and grouped time series through trace minimization. J. Am. Stat. Assoc. 2018. [Google Scholar] [CrossRef]
  16. Erven, T.; Cugliari, J. Game-Theoretically Optimal Reconciliation of Contemporaneous Hierarchical Time Series Forecasts. In Modeling and Stochastic Learning for Forecasting in High Dimensions; Antoniadis, A., Poggi, J.M., Brossat, X., Eds.; Springer: Cham, Switzerland, 2015; Volume 217, pp. 297–317. [Google Scholar]
  17. Mircetic, D.; Nikolicic, S.; Stojanovic, Đ.; Maslaric, M. Modified top down approach for hierarchical forecasting in a beverage supply chain. Transp. Res. Procedia 2017, 22, 193–202. [Google Scholar] [CrossRef]
  18. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; Online Open-access Textbooks, 2018. Available online: https://OTexts.com/fpp2/ (accessed on 24 April 2019).
  19. Hyndman, R.J.; Koehler, A.B.; Ord, J.K.; Snyder, R.D. Forecasting with Exponential Smoothing: The State Space Approach; Springer: Berlin, Germany, 2008. [Google Scholar]
  20. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  21. Ramos, P.; Santos, N.; Rebelo, R. Performance of state space and ARIMA models for consumer retail sales forecasting. Robot. Comput. Integr. Manuf. 2015, 34, 151–163. [Google Scholar] [CrossRef]
  22. Ramos, P.; Oliveira, J.M. A procedure for identification of appropriate state space and ARIMA models based on time-series cross-validation. Algorithms 2016, 9, 76. [Google Scholar] [CrossRef]
  23. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control, 5th ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2015. [Google Scholar]
  24. Box, G.E.P.; Cox, D.R. An analysis of transformations. J. R. Stat. Soc. 1964, 26, 211–252. [Google Scholar] [CrossRef]
  25. Canova, F.; Hansen, B.E. Are seasonal patterns constant over time? A test for seasonal stability. J. Bus. Econ. Stat. 1995, 13, 237–252. [Google Scholar] [CrossRef]
  26. Kwiatkowski, D.; Phillips, P.C.; Schmidt, P.; Shin, Y. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? J. Econom. 1992, 54, 159–178. [Google Scholar] [CrossRef]
  27. Hamilton, J. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 1994. [Google Scholar]
  28. Theil, H. Linear Aggregation of Economic Relations; North-Holland: Amsterdam, The Netherlands, 1974. [Google Scholar]
  29. Zellner, A.; Tobias, J. A note on aggregation, disaggregation and forecasting performance. J. Forecast. 2000, 19, 457–465. [Google Scholar] [CrossRef]
  30. Grunfeld, Y.; Griliches, Z. Is aggregation necessarily bad? Rev. Econ. Stat. 1960, 42, 1–13. [Google Scholar] [CrossRef]
  31. Lutkepohl, H. Forecasting contemporaneously aggregated vector ARMA processes. J. Bus. Econ. Stat. 1984, 2, 201–214. [Google Scholar] [CrossRef]
  32. McLeavey, D.W.; Narasimhan, S. Production Planning and Inventory Control; Allyn and Bacon Inc.: Boston, MA, USA, 1974. [Google Scholar]
  33. Fliedner, G. An investigation of aggregate variable time series forecast strategies with specific subaggregate time series statistical correlation. Comput. Oper. Res. 1999, 26, 1133–1149. [Google Scholar] [CrossRef]
  34. Athanasopoulos, G.; Hyndman, R.J.; Kourentzes, N.; Petropoulos, F. Forecasting with temporal hierarchies. Eur. J. Oper. Res. 2017, 262, 60–74. [Google Scholar] [CrossRef]
  35. Schäfer, J.; Strimmer, K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. Biol. 2005, 4, 151–163. [Google Scholar] [CrossRef]
  36. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019. [Google Scholar]
  37. Hyndman, R.J.; Khandakar, Y. Automatic time series forecasting: the forecast package for R. J. Stat. Softw. 2008, 26, 1–22. [Google Scholar] [CrossRef]
  38. Papacharalampous, G.; Tyralis, H.; Koutsoyiannis, D. Predictability of monthly temperature and precipitation using automatic time series forecasting methods. Acta Geophys. 2018, 66, 807–831. [Google Scholar] [CrossRef]
  39. Papacharalampous, G.; Tyralis, H.; Koutsoyiannis, D. One-step ahead forecasting of geophysical processes within a purely statistical framework. Geosci. Lett. 2018, 5, 12. [Google Scholar] [CrossRef]
  40. Papacharalampous, G.; Tyralis, H.; Koutsoyiannis, D. Comparison of stochastic and machine learning methods for multi-step ahead forecasting of hydrological processes. Stoch. Environ. Res. Risk Assess. 2019. [Google Scholar] [CrossRef]
  41. Hyndman, R.; Lee, A.; Wang, E.; Wickramasuriya, S. hts: Hierarchical and Grouped Time Series, 2018. R package Version 5.1.5. Available online: https://pkg.earo.me/hts/ (accessed on 24 April 2019).
  42. Davydenko, A.; Fildes, R. Measuring forecasting accuracy: The case of judgmental adjustments to SKU-level demand forecasts. Int. J. Forecast. 2013, 29, 510–522. [Google Scholar] [CrossRef]
  43. Fildes, R.; Petropoulos, F. Simple versus complex selection rules for forecasting many time series. J. Bus. Res. 2015, 68, 1692–1701. [Google Scholar] [CrossRef]
  44. Fleming, P.J.; Wallace, J.J. How not to lie with statistics: The correct way to summarize benchmark results. Commun. ACM 1986, 29, 218–221. [Google Scholar] [CrossRef]
  45. Kourentzes, N.; Athanasopoulos, G. Cross-temporal coherent forecasts for Australian tourism. Ann. Tourism Res. 2019, 75, 393–409. [Google Scholar] [CrossRef]
  46. Hollander, M.; Wolfe, D.A.; Chicken, E. Nonparametric Statistical Methods; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2015. [Google Scholar]
  47. Kourentzes, N.; Svetunkov, I.; Schaer, O. tsutils: Time Series Exploration, Modelling and Forecasting, 2019. R package Version 0.9.0. Available online: https://rdrr.io/cran/tsutils/ (accessed on 24 April 2019).
Figure 1. Example of a two-level hierarchical structure.
Figure 2. Total sales (top level, or store) and sales aggregated by area (level 1).
Figure 3. Sales of the SKUs within each sub-category of the milk division.
Figure 4. Cross-validation procedure, based on a rolling forecast origin with 1- to 12-week ahead forecasts.
Figure 5. AvgRelMSE for the MinT-VarScale, MinT-StructScale, MinT-Shrink, and Base methods with ARIMA and ETS.
Figure 6. Nemenyi test results, at a 5% significance level, for the complete hierarchy.
Table 1. Number of series in each hierarchical level by area.
Area | Divisions | Families | Categories | Subcategories | SKUs
Specialized perishables | 6 | 19 | 50 | 102 | 193
Non-specialized perishables | 4 | 16 | 48 | 117 | 287
Grocery | 3 | 14 | 51 | 144 | 309
Beverages | 4 | 6 | 16 | 32 | 103
Personal care | 2 | 9 | 19 | 37 | 59
Detergents & cleaning | 2 | 9 | 19 | 27 | 37
Total | 21 | 73 | 203 | 459 | 988
Table 2. Hierarchical structure of the milk division.
Area | Division | Families | Categories | Subcategories | SKUs
Non-specialized perishables | Milk | Raw | Pasteurized | Brik | 5
 | | UHT | Current | Semi-skimmed | 2
 | | | | Skimmed | 3
 | | | Special | Semi-skimmed | 10
 | | | | Skimmed | 3
 | | | | Flavored | 3
Table 3. Average Relative Mean Squared Error (AvgRelMSE) for each level of the hierarchy obtained with ARIMA and ETS base forecasts.
(In each sub-table, the first six columns report the AvgRelMSE over the forecast horizons h = 1, 1–2, 1–4, 1–8, and 1–12, plus the mean rank, for ARIMA base forecasts; the last six columns report the same for ETS base forecasts.)

Top-level: Store

| Method | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BU | 2.074 | 2.179 | 2.489 | 2.569 | 2.237 | 9 | 1.748 | 1.721 | 1.869 | 1.990 | 1.914 | 9 |
| TD GSa | 1 | 1 | 1 | 1 | 1 | 6.5 | 1 | 1 | 1 | 1 | 1 | 4.6 |
| TD GSf | 1 | 1 | 1 | 1 | 1 | 6.5 | 1 | 1 | 1 | 1 | 1 | 4.6 |
| TD fp | 1 | 1 | 1 | 1 | 1 | 6.5 | 1 | 1 | 1 | 1 | 1 | 4.6 |
| OLS | 0.949 | 0.951 | 0.959 | 0.950 | 0.947 | 4 | 0.985 | 0.990 | 0.998 | 1 | 0.999 | 2.5 |
| MinT-VarScale | 0.736 | 0.754 | 0.778 | 0.762 | 0.750 | 1 | 0.967 | 0.972 | 1.021 | 1.046 | 1.036 | 5 |
| MinT-StructScale | 0.749 | 0.777 | 0.836 | 0.837 | 0.796 | 2.4 | 1.035 | 1.031 | 1.099 | 1.142 | 1.123 | 8 |
| MinT-Shrink | 0.737 | 0.764 | 0.848 | 0.877 | 0.856 | 2.6 | 0.915 | 0.913 | 0.993 | 1.011 | 0.999 | 2.1 |
| Base | 1 | 1 | 1 | 1 | 1 | 6.5 | 1 | 1 | 1 | 1 | 1 | 4.6 |

Level 1: Area

| Method | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BU | 1.096 | 1.154 | 1.242 | 1.274 | 1.268 | 8.6 | 1.264 | 1.274 | 1.314 | 1.327 | 1.288 | 9 |
| TD GSa | 0.895 | 0.899 | 0.922 | 0.950 | 0.972 | 5 | 1.077 | 1.074 | 1.069 | 1.092 | 1.083 | 7.6 |
| TD GSf | 0.886 | 0.888 | 0.911 | 0.938 | 0.961 | 4 | 1.067 | 1.063 | 1.057 | 1.080 | 1.071 | 6.6 |
| TD fp | 1.020 | 1.012 | 1.002 | 1.009 | 1.015 | 7 | 1.021 | 1.009 | 0.998 | 0.998 | 0.998 | 4.5 |
| OLS | 1.189 | 1.186 | 1.150 | 1.134 | 1.125 | 8.4 | 1.123 | 1.079 | 1.004 | 0.990 | 0.977 | 5.3 |
| MinT-VarScale | 0.754 | 0.763 | 0.794 | 0.814 | 0.827 | 2.8 | 0.962 | 0.965 | 0.980 | 0.990 | 0.985 | 2.3 |
| MinT-StructScale | 0.717 | 0.734 | 0.777 | 0.804 | 0.817 | 1 | 0.980 | 0.983 | 0.997 | 1.008 | 0.998 | 3.9 |
| MinT-Shrink | 0.733 | 0.741 | 0.786 | 0.812 | 0.830 | 2.2 | 0.906 | 0.917 | 0.936 | 0.946 | 0.947 | 1 |
| Base | 1 | 1 | 1 | 1 | 1 | 6 | 1 | 1 | 1 | 1 | 1 | 4.8 |

Level 2: Division

| Method | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BU | 1.082 | 1.131 | 1.175 | 1.212 | 1.192 | 7.6 | 1.278 | 1.278 | 1.277 | 1.256 | 1.227 | 8 |
| TD GSa | 1.081 | 1.098 | 1.130 | 1.146 | 1.138 | 5.8 | 1.259 | 1.219 | 1.190 | 1.156 | 1.132 | 6 |
| TD GSf | 1.089 | 1.104 | 1.136 | 1.151 | 1.142 | 7 | 1.269 | 1.226 | 1.197 | 1.162 | 1.137 | 7 |
| TD fp | 1.091 | 1.091 | 1.082 | 1.068 | 1.056 | 5.6 | 1.026 | 1.020 | 1.002 | 1.004 | 1.006 | 3.6 |
| OLS | 1.966 | 1.953 | 1.994 | 2.029 | 2.027 | 9 | 1.523 | 1.495 | 1.461 | 1.471 | 1.457 | 9 |
| MinT-VarScale | 0.848 | 0.864 | 0.881 | 0.887 | 0.889 | 2.4 | 1.009 | 1.007 | 1.005 | 1.006 | 1.003 | 3.4 |
| MinT-StructScale | 0.842 | 0.861 | 0.885 | 0.903 | 0.908 | 2.6 | 1.036 | 1.031 | 1.029 | 1.029 | 1.023 | 5 |
| MinT-Shrink | 0.795 | 0.802 | 0.819 | 0.839 | 0.851 | 1 | 0.964 | 0.969 | 0.978 | 0.994 | 0.998 | 1 |
| Base | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 1 | 1 | 1 | 2 |

Level 3: Family

| Method | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BU | 1.016 | 1.022 | 1.031 | 1.040 | 1.036 | 5 | 1.083 | 1.083 | 1.073 | 1.067 | 1.061 | 6.4 |
| TD GSa | 1.194 | 1.182 | 1.174 | 1.155 | 1.130 | 7 | 1.217 | 1.176 | 1.132 | 1.079 | 1.043 | 6.8 |
| TD GSf | 1.200 | 1.188 | 1.179 | 1.159 | 1.134 | 8 | 1.223 | 1.181 | 1.136 | 1.083 | 1.046 | 7.8 |
| TD fp | 1.101 | 1.094 | 1.079 | 1.079 | 1.075 | 6 | 1.024 | 1.018 | 1.008 | 1.005 | 1.005 | 4 |
| OLS | 2.348 | 2.314 | 2.338 | 2.405 | 2.399 | 9 | 1.567 | 1.542 | 1.533 | 1.524 | 1.503 | 9 |
| MinT-VarScale | 0.930 | 0.927 | 0.924 | 0.927 | 0.929 | 2 | 0.989 | 0.988 | 0.983 | 0.981 | 0.981 | 2 |
| MinT-StructScale | 0.982 | 0.979 | 0.981 | 0.991 | 0.998 | 3 | 1.035 | 1.031 | 1.026 | 1.023 | 1.021 | 5 |
| MinT-Shrink | 0.898 | 0.888 | 0.883 | 0.885 | 0.890 | 1 | 0.961 | 0.963 | 0.963 | 0.970 | 0.975 | 1 |
| Base | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 1 | 1 | 1 | 3 |

Level 4: Category

| Method | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BU | 1.014 | 1.015 | 1.019 | 1.029 | 1.029 | 4 | 1.027 | 1.028 | 1.027 | 1.028 | 1.027 | 4.2 |
| TD GSa | 1.300 | 1.290 | 1.271 | 1.249 | 1.233 | 7 | 1.295 | 1.263 | 1.219 | 1.159 | 1.122 | 7 |
| TD GSf | 1.306 | 1.296 | 1.276 | 1.253 | 1.237 | 8 | 1.302 | 1.269 | 1.224 | 1.163 | 1.125 | 8 |
| TD fp | 1.129 | 1.121 | 1.108 | 1.107 | 1.103 | 5.1 | 1.033 | 1.031 | 1.028 | 1.027 | 1.030 | 4.8 |
| OLS | 2.463 | 2.418 | 2.403 | 2.398 | 2.375 | 9 | 1.636 | 1.618 | 1.602 | 1.563 | 1.537 | 9 |
| MinT-VarScale | 0.977 | 0.973 | 0.969 | 0.966 | 0.966 | 2 | 0.988 | 0.990 | 0.989 | 0.988 | 0.989 | 1.8 |
| MinT-StructScale | 1.129 | 1.125 | 1.121 | 1.115 | 1.112 | 5.9 | 1.076 | 1.073 | 1.069 | 1.063 | 1.062 | 6 |
| MinT-Shrink | 0.940 | 0.932 | 0.926 | 0.928 | 0.933 | 1 | 0.972 | 0.976 | 0.980 | 0.986 | 0.992 | 1.2 |
| Base | 1 | 1 | 1 | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 1 | 3 |

Level 5: Subcategory

| Method | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BU | 1.008 | 1.009 | 1.012 | 1.015 | 1.014 | 3.8 | 1.011 | 1.009 | 1.009 | 1.009 | 1.009 | 4 |
| TD GSa | 1.326 | 1.301 | 1.274 | 1.231 | 1.208 | 6.8 | 1.314 | 1.270 | 1.220 | 1.155 | 1.117 | 7 |
| TD GSf | 1.335 | 1.309 | 1.282 | 1.238 | 1.215 | 7.8 | 1.323 | 1.278 | 1.228 | 1.161 | 1.123 | 8 |
| TD fp | 1.155 | 1.143 | 1.131 | 1.122 | 1.115 | 5 | 1.052 | 1.046 | 1.044 | 1.039 | 1.039 | 5 |
| OLS | 2.478 | 2.426 | 2.408 | 2.378 | 2.353 | 9 | 1.677 | 1.651 | 1.626 | 1.582 | 1.558 | 9 |
| MinT-VarScale | 1.009 | 1.004 | 1.001 | 0.994 | 0.992 | 2.8 | 1.000 | 0.997 | 0.997 | 0.995 | 0.995 | 1.7 |
| MinT-StructScale | 1.260 | 1.250 | 1.243 | 1.225 | 1.219 | 6.4 | 1.135 | 1.125 | 1.117 | 1.106 | 1.103 | 6 |
| MinT-Shrink | 0.970 | 0.962 | 0.955 | 0.948 | 0.949 | 1 | 0.989 | 0.992 | 0.996 | 1.001 | 1.005 | 1.8 |
| Base | 1 | 1 | 1 | 1 | 1 | 2.4 | 1 | 1 | 1 | 1 | 1 | 2.5 |

Bottom-level: SKU

| Method | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BU | 1 | 1 | 1 | 1 | 1 | 2.1 | 1 | 1 | 1 | 1 | 1 | 1.5 |
| TD GSa | 1.381 | 1.355 | 1.321 | 1.267 | 1.243 | 6.2 | 1.387 | 1.346 | 1.293 | 1.217 | 1.177 | 7 |
| TD GSf | 1.393 | 1.366 | 1.331 | 1.276 | 1.251 | 7.4 | 1.398 | 1.357 | 1.303 | 1.225 | 1.184 | 8 |
| TD fp | 1.182 | 1.166 | 1.148 | 1.129 | 1.126 | 5 | 1.080 | 1.079 | 1.075 | 1.068 | 1.069 | 5 |
| OLS | 2.077 | 2.038 | 2.009 | 1.972 | 1.959 | 9 | 1.506 | 1.496 | 1.479 | 1.448 | 1.433 | 9 |
| MinT-VarScale | 1.035 | 1.029 | 1.022 | 1.012 | 1.012 | 4 | 1.015 | 1.016 | 1.014 | 1.011 | 1.012 | 3.6 |
| MinT-StructScale | 1.378 | 1.364 | 1.347 | 1.320 | 1.315 | 7.4 | 1.204 | 1.200 | 1.191 | 1.177 | 1.172 | 6 |
| MinT-Shrink | 1.011 | 1.004 | 0.995 | 0.987 | 0.990 | 1.8 | 1.004 | 1.009 | 1.011 | 1.015 | 1.020 | 3.4 |
| Base | 1 | 1 | 1 | 1 | 1 | 2.1 | 1 | 1 | 1 | 1 | 1 | 1.5 |
Table 4. AvgRelMSE for the complete hierarchy obtained with ARIMA and ETS base forecasts.
(The first six columns report the AvgRelMSE over the forecast horizons h = 1, 1–2, 1–4, 1–8, and 1–12, plus the mean rank, for ARIMA base forecasts; the last six columns report the same for ETS base forecasts.)

| Method | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank | h=1 | 1–2 | 1–4 | 1–8 | 1–12 | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BU | 1.006 | 1.008 | 1.010 | 1.013 | 1.012 | 3.7 | 1.013 | 1.013 | 1.013 | 1.012 | 1.012 | 4 |
| TD GSa | 1.343 | 1.320 | 1.292 | 1.248 | 1.225 | 6.8 | 1.346 | 1.306 | 1.256 | 1.186 | 1.148 | 7 |
| TD GSf | 1.353 | 1.329 | 1.301 | 1.255 | 1.232 | 7.8 | 1.356 | 1.315 | 1.264 | 1.193 | 1.154 | 8 |
| TD fp | 1.163 | 1.150 | 1.134 | 1.121 | 1.117 | 5 | 1.064 | 1.061 | 1.057 | 1.052 | 1.052 | 5 |
| OLS | 2.223 | 2.182 | 2.159 | 2.132 | 2.116 | 9 | 1.565 | 1.549 | 1.530 | 1.496 | 1.477 | 9 |
| MinT-VarScale | 1.013 | 1.008 | 1.003 | 0.996 | 0.995 | 2.9 | 1.006 | 1.006 | 1.005 | 1.003 | 1.003 | 2.6 |
| MinT-StructScale | 1.286 | 1.276 | 1.265 | 1.246 | 1.242 | 6.4 | 1.160 | 1.154 | 1.146 | 1.135 | 1.131 | 6 |
| MinT-Shrink | 0.983 | 0.975 | 0.968 | 0.963 | 0.966 | 1 | 0.994 | 0.998 | 1.001 | 1.006 | 1.011 | 2 |
| Base | 1 | 1 | 1 | 1 | 1 | 2.4 | 1 | 1 | 1 | 1 | 1 | 1.4 |
Table 5. AvgRelMSE results of ARIMA base forecasts with ETS base forecasts used as benchmark.
| Level | h=1 | 1–2 | 1–4 | 1–8 | 1–12 |
|---|---|---|---|---|---|
| Top-level | 0.592 | 0.572 | 0.549 | 0.563 | 0.617 |
| Level 1 | 1.007 | 0.980 | 0.958 | 0.947 | 0.929 |
| Level 2 | 1.075 | 1.010 | 0.962 | 0.914 | 0.913 |
| Level 3 | 0.986 | 0.961 | 0.931 | 0.902 | 0.894 |
| Level 4 | 0.985 | 0.967 | 0.950 | 0.921 | 0.905 |
| Level 5 | 0.984 | 0.969 | 0.955 | 0.937 | 0.925 |
| Bottom-level | 1.007 | 0.998 | 0.987 | 0.972 | 0.961 |
| All | 0.998 | 0.985 | 0.971 | 0.953 | 0.941 |
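The AvgRelMSE values in Tables 3–5 are, as this measure is commonly defined (in the style of Davydenko and Fildes), geometric means of per-series relative MSEs against a benchmark. A minimal sketch with toy data (the inputs are illustrative, not the paper's results):

```python
import numpy as np

def avg_rel_mse(mse_method, mse_benchmark):
    """Geometric mean of per-series relative MSEs.  Values below 1 mean
    the method beats the benchmark on average; e.g. 0.983 corresponds
    to a 1.7% accuracy gain over the base forecasts."""
    rel = np.asarray(mse_method, dtype=float) / np.asarray(mse_benchmark, dtype=float)
    return float(np.exp(np.mean(np.log(rel))))

# Toy example: three series where the method halves, matches, and
# doubles the benchmark MSE -- the geometric mean is exactly 1.
print(avg_rel_mse([0.5, 1.0, 2.0], [1.0, 1.0, 1.0]))
```

The geometric mean is used rather than the arithmetic mean because relative errors are ratios: it treats a halving and a doubling of the benchmark MSE symmetrically.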

Share and Cite

Oliveira, J.M.; Ramos, P. Assessing the Performance of Hierarchical Forecasting Methods on the Retail Sector. Entropy 2019, 21, 436. https://doi.org/10.3390/e21040436