Article

Pivot Clustering to Minimize Error in Forecasting Aggregated Demand Streams Each Following an Autoregressive Moving Average Model

1 Sy Syms School of Business, Yeshiva University, Suite 334, 215 Lexington Avenue, New York, NY 10012, USA
2 Sy Syms School of Business, Yeshiva University, BH-428, 500 West 185th St., New York, NY 10033, USA
3 Technology, Operations and Statistics, Leonard N. Stern School of Business, New York University, 44 West 4th St., New York, NY 10012, USA
4 Gies College of Business and Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, 601 E John Street, Champaign, IL 61820, USA
* Author to whom correspondence should be addressed.
Stats 2023, 6(4), 1198-1225; https://doi.org/10.3390/stats6040075
Submission received: 18 September 2023 / Revised: 23 October 2023 / Accepted: 27 October 2023 / Published: 2 November 2023
(This article belongs to the Section Time Series Analysis)

Abstract: In this paper, we compare the effects of forecasting demand using the individual (disaggregated) components versus first aggregating the components, either fully or into several clusters. Demand streams are assumed to follow autoregressive moving average (ARMA) processes. Using the individual demand streams will always lead to a superior forecast compared to any aggregates; however, we show that if several aggregated clusters are formed in a structured manner, then these subaggregated clusters will lead to a forecast with a minimal increase in mean-squared forecast error (MSFE). We show this result based on theoretical MSFE obtained directly from the models generating the clusters as well as on estimated MSFE obtained from simulated demand observations. We suggest a pivot algorithm, which we call Pivot Clustering, to create these clusters. We also provide theoretical results to investigate subaggregation, including for special cases, such as aggregating demand generated by MA(1) models and aggregating demand generated by ARMA models with similar or the same parameters.

1. Introduction

Modern-day technologies not only permit firms to accurately track their point-of-sale and lost-sales data (purchases not made by customers due to a lack of inventory) but also to gather more granular data. These data streams deluge firms with information, which can be aggregated for planning purposes, considered in its entirety, or handled with an in-between approach. In this paper we analyze a model in which a retailer faces exactly these choices and provide guidelines for combining the data for the purpose of forecasting demand.
Consider a retailer who has access to its individual customers’ demand streams. Assume that each of these demand streams follows an ARMA model, with possibly contemporaneously correlated shock sequences across streams. The primary contribution of this research is to quantify to what extent such a retailer would benefit from forecasting each of the individual streams as opposed to the aggregate. In general, retailers forecast their aggregate demand stream since, historically, the retailer may only have accurate aggregate demand information, and the forecasting of the individual customer demand streams is often thought of as being cumbersome and time consuming. We demonstrate that a retailer observing multiple demand streams generated by ARMA models can drastically reduce its mean squared forecasting error (MSFE) by forecasting the individual demand streams as opposed to just the aggregate demand stream, as noted in [1].
Although the retailer’s MSFE is never lower when forecasting using aggregated demand compared to the individual demand streams (this holds for the theoretical case when model coefficients are known and need not be estimated), there are cases when the individual streams do not reduce the MSFE. The primary situation where this occurs, as is discussed in the paper, is when various models generating the demand streams have identical ARMA parameter values. For the in-between case, our results demonstrate that the retailer’s MSFE under aggregate forecasting can be greatly reduced if the retailer forecasts various clusters of aggregated demand streams. We also show by example that clustering continues to perform well in the event that ARMA models are estimated for non-ARMA data.
In other words, retailers can make use of data mining and other clustering approaches in order to generate clusters of similar customers and their demand streams. We show that such clustering methods significantly enhance forecast accuracy. This is related to the study of telephone data in [2], where the authors concluded that subaggregated data can be effective for improving forecast accuracy compared to aggregate data. They also note that data are often aggregated to the level that forecasts are required. We describe here ways in which subaggregated clusters can be selected to minimize forecast error. Several examples are mentioned in [3] in the context of assigning demand allocation to different facilities.
Many researchers and practitioners focus on the need to determine clusters of similarly situated customers in order to create and provide customized and/or personalized products. This type of clustering is generally performed on specific characteristics that customers possess (see, for example, [4] and the references within). On the other hand, our focus is on forecasting demand for a product by a firm’s customers, recognizing that these customers may have different preferences and hence differing demand. As we describe below, from the demand forecasting perspective, the information contained within the individual demand streams provides the optimal forecast (in terms of minimizing the MSFE and hence inventory related costs) for product demand. Nonetheless, there has been research on the use of clustering methods within a forecasting environment when customer demand data are high dimensional (see, for example, [5]).
As opposed to generating clusters based upon customer preferences and customer demographics, we explain how clusters can be generated explicitly from the time series structure of the individual demand streams or customers. Even though it is always optimal from a forecasting perspective to use the individual streams, clusters of similar customer streams may be very helpful to the firm for other reasons, as described above. Future empirical work would be necessary to investigate to what extent, and in what contexts, clusters generated based upon the time series structure of the demand streams correspond to clusters based upon other customer preferences. Where such a relationship exists, firms could use clustering both to simplify their demand forecasting and to identify groups of customers to receive personalized products.
The purpose of this paper is to demonstrate that clustering based upon time series structure can be used to produce demand forecasts that are superior to forecasting aggregate demand and nearly as good as forecasting the individual demand streams. The structure of our paper is as follows. In the next section, we describe the demand framework and supply chain setting of our research problem as well as the way that theoretical MSFE computations are determined for the various forecasts (using aggregated demand processes) included herein. In Section 3, we illustrate (through an example) that there exists a particular set of subaggregated clusters which results in an MSFE that is close to the MSFE obtained from using the disaggregated streams and much lower than the MSFE obtained from the fully aggregated sequence. In Section 4, we describe how to cluster demand streams generated by ARMA models using Pivot Clustering and how this compares to other clustering methods. In Section 5, we describe an objective function that can be minimized to obtain an optimal assignment of streams to clusters in terms of MSFE reduction. Finally, in Section 6, we obtain theoretical results on how demand streams produced by MA(1) models can be clustered in the most efficient way possible to reduce the resulting subaggregated MSFE.

2. Model Framework

We consider a retailer with possibly many large customers. In general, retailers forecast their aggregate demand stream since the forecasting of the individual customer demand streams is often thought of as being cumbersome and time consuming. We demonstrate that a retailer observing multiple streams of demand generated by ARMA models can drastically reduce its MSFE by forecasting the individual demand streams or aggregated clusters of similar demand streams. We limit the focus of this paper to ARMA models that describe stationary demand in order to keep the exposition as clear as possible. If we were to consider ARIMA (or seasonal ARIMA) models, then differencing (or seasonal differencing) would need to be carried out on the data to apply the methodology discussed here. We further note that even the simple ARMA(1,1) models appearing in Section 4 can have coefficients that produce seasonal patterns in demand realizations.
Hence, consider a retailer that observes multiple demand streams for a single product, $\{X_{1,t}\}, \{X_{2,t}\}, \ldots, \{X_{N,t}\}$. Each demand stream $\{X_{k,t}\}$ is assumed to be generated by an ARMA model with respect to a sequence of shocks $\{\epsilon_{k,t}\}$ given by
\[ \Phi_k(B) X_{k,t} = \Theta_k(B)\epsilon_{k,t} \]
where $\Phi_k(z) = 1 + \Phi_{k,1}z + \cdots + \Phi_{k,p_k}z^{p_k}$ and $\Theta_k(z) = 1 + \Theta_{k,1}z + \cdots + \Theta_{k,q_k}z^{q_k}$, such that $\{X_{k,t}\}$ is invertible and causal with respect to $\{\epsilon_{k,t}\}$ (see Brockwell and Davis, page 77, for a definition and discussion of causality and invertibility). We denote the variance of each shock sequence by $\sigma_k^2 = E[\epsilon_{k,t}^2]$. Furthermore, we note that the shock sequences are potentially contemporaneously correlated, with $\sigma_{ij} = E[\epsilon_{i,t}\epsilon_{j,t}]$. In general, this setup guarantees that the shocks $\epsilon_{k,t}$ are the retailer's Wold shocks (see [6] pp. 187–188 for a description of a Wold decomposition of a time series) and that the MSFE of one-step-ahead leadtime demand (when using the disaggregated (individual) streams) is the sum of the elements in the covariance matrix $\Sigma_\epsilon$, where $\Sigma_{ij} = \sigma_{ij}$ such that $\sigma_k^2 = \sigma_{kk}$ (see Equation (7)).
The focus of this paper is evaluating the difference in one-step-ahead MSFEs at time $t$ when the forecast of leadtime demand, given by $\sum_{i=1}^{\ell+1}(X_{1,t+i} + X_{2,t+i} + \cdots + X_{N,t+i})$, is based on the different series described below, where $C_{k,\tau} = X^{C_k}_{1,\tau} + \cdots + X^{C_k}_{n_{C_k},\tau}$. Studying one-step-ahead MSFEs is mathematically simpler than those for general leadtimes since the former do not depend on model parameters.
Disaggregated (individual) sequences: $\{X_{1,\tau}\}_{\tau=-\infty}^{t}, \{X_{2,\tau}\}_{\tau=-\infty}^{t}, \ldots, \{X_{N,\tau}\}_{\tau=-\infty}^{t}$
Subaggregated (clustered) sequences: $\{C_{1,\tau}\}_{\tau=-\infty}^{t}, \{C_{2,\tau}\}_{\tau=-\infty}^{t}, \ldots, \{C_{n,\tau}\}_{\tau=-\infty}^{t}$
Aggregated (full) sequence: $\{D_\tau = X_{1,\tau} + X_{2,\tau} + \cdots + X_{N,\tau}\}_{\tau=-\infty}^{t}$
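As a concrete illustration of this setup, the following minimal Python sketch simulates $N$ ARMA demand streams driven by contemporaneously correlated shocks and forms the aggregated series and one possible subaggregation. The coefficients, covariance matrix, and sample size are illustrative placeholders rather than values from the paper, and the recursion uses the common sign convention $X_t = \sum_i \phi_i X_{t-i} + \epsilon_t + \sum_j \theta_j \epsilon_{t-j}$ (which differs in sign from the paper's $\Phi_k(z) = 1 + \Phi_{k,1}z + \cdots$ notation).

import numpy as np

rng = np.random.default_rng(0)
T = 500                                    # illustrative number of periods
phi = [[0.5], [-0.3], []]                  # AR coefficients of each stream (assumed)
theta = [[0.4], [0.2], [-0.6]]             # MA coefficients of each stream (assumed)
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])        # contemporaneous shock covariance (assumed)

N = len(phi)
eps = rng.multivariate_normal(np.zeros(N), Sigma, size=T)   # shocks epsilon_{k,t}
X = np.zeros((T, N))
for k in range(N):
    for t in range(T):
        ar = sum(c * X[t - 1 - i, k] for i, c in enumerate(phi[k]) if t - 1 - i >= 0)
        ma = sum(c * eps[t - 1 - j, k] for j, c in enumerate(theta[k]) if t - 1 - j >= 0)
        X[t, k] = ar + ma + eps[t, k]

D = X.sum(axis=1)                          # fully aggregated demand
C1, C2 = X[:, :2].sum(axis=1), X[:, 2]     # one possible subaggregation into two clusters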
Our problem is related to the one posed by [7], where a two-stage supply chain was considered with the retailer observing two demand streams. The focus of that paper was on evaluating information sharing between the retailer and supplier in a situation where the retailer forecasts each demand stream separately. Here, we show the benefit to the retailer (the situation is identical for any player in the supply chain that observes multiple demand streams, where information sharing does not take place) of determining the separate forecasts while allowing for (possibly) more than two demand streams. Kohn [1] was the first to identify conditions under which using the individual demand streams leads to a better forecast than using the aggregated sequence; however, he did not determine the MSFE in the two cases. The same conditions can be used to show that if streams are subaggregated into clusters where optimal clusters are always used, then the MSFE of the forecast decreases as the number of clusters increases. Our aim is to determine the optimal cluster assignment based on a predetermined choice of the number of clusters. The number of clusters can be based on the level of detail in the data available to a firm or on the tradeoff between lowering the MSFE and increasing the complexity of the model as the number of clusters grows.
In this paper, we extend the results of [7] by providing formulas for computing MSFEs under the possibility of more than two streams and a general leadtime in the situation where a player’s (retailer’s) forecast can only be based on their Wold shocks. We also describe how a retailer would forecast its demand by identifying clusters of similar demand streams and then forecasting each cluster after it is aggregated. We provide a pivot algorithm, which we call Pivot Clustering, for identifying a locally optimal assignment of streams into a fixed number of clusters. The algorithm will often find the best possible assignment. We also describe a fast clustering algorithm which results in a globally optimal assignment when demand streams are generated by independent MA(1) models. Our results show that after the algorithms are carried out, much of the benefit of forecasting individual streams can be obtained by forecasting the aggregated clusters.

2.1. Forecasting Using the Disaggregated (Individual) Sequences $\{X_{1,\tau}\}_{\tau=-\infty}^{t}, \ldots, \{X_{N,\tau}\}_{\tau=-\infty}^{t}$

In this section we are interested in forecasting $\sum_{i=1}^{\ell+1}(X_{1,t+i} + X_{2,t+i} + \cdots + X_{N,t+i})$ when the forecast is based on the disaggregated individual sequences discussed in the previous section. The forecasting contained here is a textbook multivariate time-series result along with the propagation described in [8], as seen in (5). Thus, we consider processes $\{X_{1,t}\}, \ldots, \{X_{N,t}\}$ such that $X_{k,t}$ is generated by an ARMA model given by
\[ \Phi_k(B) X_{k,t} = \Theta_k(B)\epsilon_{k,t} \]
with $\sigma_{jk} = E[\epsilon_{j,t}\epsilon_{k,t}]$.
In this case, the disaggregated MSFE is given by
\[ MSFE_{ind} = E\left[\left(\sum_{k=1}^{N}\sum_{i=1}^{\ell+1} X_{k,t+i} - \sum_{k=1}^{N}\widehat{\sum_{i=1}^{\ell+1} X_{k,t+i}}\right)^{2}\right] \]
where $\widehat{\sum_{i=1}^{\ell+1} X_{k,t+i}}$ is the best linear forecast of leadtime demand at time $t$ for stream $\{X_{k,t}\}$ based on (2). We note that
\[ E\left[\left(\sum_{i=1}^{\ell+1} X_{j,t+i} - \widehat{\sum_{i=1}^{\ell+1} X_{j,t+i}}\right)\left(\sum_{i=1}^{\ell+1} X_{k,t+i} - \widehat{\sum_{i=1}^{\ell+1} X_{k,t+i}}\right)\right] = \sigma_{jk}\sum_{i=0}^{\ell}\omega_{j,i}\,\omega_{k,i} \]
such that
\[ \omega_{k,i} = \begin{cases} 0 & i < 0 \\ \psi_{k,i} & i = 0 \\ \omega_{k,i-1} + \psi_{k,i} & 0 < i < \ell+1 \\ \omega_{k,i-1} + \psi_{k,i} - \psi_{k,i-\ell-1} & i \geq \ell+1 \end{cases} \]
where $\psi_{k,i}$ is the $i$th coefficient appearing in the MA($\infty$) representation of $\{X_{k,t}\}$ with respect to $\{\epsilon_{k,t}\}$ (see [8] for details). That is, $1 + \psi_{k,1}z + \psi_{k,2}z^{2} + \cdots = \Psi_k(z)$ and
\[ X_{k,t} = \Psi_k(B)\epsilon_{k,t} \]
such that $\Psi_k(z) = \Theta_k(z)/\Phi_k(z)$.
Thus, the MSFE of the best linear forecast (BLF) of leadtime demand when using the individual sequences $\{X_{1,\tau}\}_{\tau=-\infty}^{t}, \{X_{2,\tau}\}_{\tau=-\infty}^{t}, \ldots, \{X_{N,\tau}\}_{\tau=-\infty}^{t}$ is given by
\[ MSFE_{ind} = \sum_{k=1}^{N}\sum_{j=1}^{N}\sigma_{kj}\sum_{i=0}^{\ell}\omega_{j,i}\,\omega_{k,i}. \]
We note that the one-step-ahead ($\ell = 0$) disaggregated MSFE is given by $\sum_{j=1}^{N}\sum_{k=1}^{N}\sigma_{jk}$. Thus, we can compare (7) with (26) below to determine the reduction in MSFE when using the individual sequences as opposed to the aggregated sequence.
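The computations in (5) and (7) are straightforward to carry out numerically. The following minimal Python sketch builds the $\omega$ weights by cumulative summation of the MA($\infty$) coefficients and evaluates the disaggregated MSFE; the inputs anticipate the three MA(1) streams of the example in Section 2.5, and the helper names are our own.

import numpy as np

def leadtime_weights(psi, lead):
    """omega_{k,0},...,omega_{k,lead}: cumulative sums of the MA(inf) weights."""
    psi = list(psi) + [0.0] * (lead + 1)      # pad in case the MA expansion is short
    om = [psi[0]]
    for i in range(1, lead + 1):
        om.append(om[-1] + psi[i])
    return np.array(om)

def msfe_individual(psi_list, Sigma, lead):
    """Equation (7): sum_{k,j} sigma_{kj} sum_{i=0}^{lead} omega_{j,i} omega_{k,i}."""
    Om = np.array([leadtime_weights(p, lead) for p in psi_list])   # N x (lead+1)
    return float(np.sum(np.asarray(Sigma) * (Om @ Om.T)))

Sigma = [[1.6, -1.4, 0.5], [-1.4, 1.3, -0.8], [0.5, -0.8, 2.0]]
psi_list = [[1.0, -0.9], [1.0, 0.9], [1.0, 0.9]]   # MA(1) streams of Section 2.5
print(msfe_individual(psi_list, Sigma, lead=0))    # 1.5 (sum of the entries of Sigma)
print(msfe_individual(psi_list, Sigma, lead=1))    # approximately 7.311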

2.2. ARMA Representation of a Summed Sequence $\{S_\tau = X_{1,\tau} + \cdots + X_{s,\tau}\}_{\tau=-\infty}^{t}$

In this subsection, we determine the ARMA representation of a series $\{S_t\}$ with respect to a series of Wold shocks, where $\{S_t\}$ is the sum of several ARMA-generated streams given by $\{S_t\} = \{X_{1,t} + X_{2,t} + \cdots + X_{s,t}\}$. Furthermore, we determine the variance of the Wold shocks appearing in this representation. This will allow us to determine the MSFE when forecasts are based on the fully aggregated series as well as when the forecasts are based on subaggregated clusters. In order to obtain the ARMA representation, we first need to obtain the spectral density $f_S(\lambda)$ and the covariance generating function $G_S(z)$ of $\{S_t\}$.
 Proposition 1. 
Let $\{S_t\} = \{X_{1,t} + X_{2,t} + \cdots + X_{s,t}\}$. The spectral density of $\{S_t\}$ is given by
\[ f_S(\lambda) = \sum_{i=1}^{s} f_{X_i}(\lambda) + \sum_{i=1}^{s-1}\sum_{j=i+1}^{s}\left( f_{X_iX_j}(\lambda) + \bar f_{X_iX_j}(\lambda)\right) \]
where the cross-spectrum $f_{X_iX_j}(\lambda)$ is defined by $f_{X_iX_j}(\lambda) = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} C_{X_iX_j}(r)$ with $C_{X_iX_j}(r) = E[X_{i,t+r}X_{j,t}]$, and $\bar f_{X_iX_j}(\lambda) = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{i\lambda r} C_{X_iX_j}(r)$.
 Proof. 
We will prove this by induction. Note that when $\{S_t\} = \{X_{1,t} + X_{2,t}\}$, $f_S(\lambda)$ is given by
\[ f_S(\lambda) = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} E[(X_{1,t+r}+X_{2,t+r})(X_{1,t}+X_{2,t})] = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} E[X_{1,t+r}X_{1,t}] + \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} E[X_{2,t+r}X_{2,t}] + \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} E[X_{1,t+r}X_{2,t}] + \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} E[X_{2,t+r}X_{1,t}]. \]
Noting that $\frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} E[X_{2,t+r}X_{1,t}] = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{i\lambda r} E[X_{2,t}X_{1,t+r}] = \bar f_{X_1X_2}(\lambda)$, we see that
\[ f_S(\lambda) = f_{X_1}(\lambda) + f_{X_2}(\lambda) + f_{X_1X_2}(\lambda) + \bar f_{X_1X_2}(\lambda), \]
which matches representation (8).
Now suppose (8) holds for processes $\{S_{n,t}\} = \{X_{1,t} + \cdots + X_{n,t}\}$. Thus,
\[ f_{S_n}(\lambda) = \sum_{i=1}^{n} f_{X_i}(\lambda) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left( f_{X_iX_j}(\lambda) + \bar f_{X_iX_j}(\lambda)\right). \]
Consider $\{S_{n+1,t}\} = \{S_{n,t} + X_{n+1,t}\}$. Since $\{S_{n,t}\}$ follows an ARMA model, we observe that
\[ f_{S_{n+1}}(\lambda) = f_{S_n}(\lambda) + f_{X_{n+1}}(\lambda) + f_{S_n,X_{n+1}}(\lambda) + \bar f_{S_n,X_{n+1}}(\lambda). \]
Note that, starting with the definition of $f_{S_n,X_{n+1}}(\lambda)$,
\[ f_{S_n,X_{n+1}}(\lambda) = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} E[S_{n,t+r}X_{n+1,t}] = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} E[X_{1,t+r}X_{n+1,t} + \cdots + X_{n,t+r}X_{n+1,t}] = \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} E[X_{1,t+r}X_{n+1,t}] + \cdots + \frac{1}{2\pi}\sum_{r=-\infty}^{\infty} e^{-i\lambda r} E[X_{n,t+r}X_{n+1,t}] = f_{X_1X_{n+1}}(\lambda) + \cdots + f_{X_nX_{n+1}}(\lambda). \]
Similarly,
\[ \bar f_{S_n,X_{n+1}}(\lambda) = \bar f_{X_1X_{n+1}}(\lambda) + \cdots + \bar f_{X_nX_{n+1}}(\lambda). \]
Thus, from (12),
\[ f_{S_{n+1}}(\lambda) = f_{S_n}(\lambda) + f_{X_{n+1}}(\lambda) + f_{X_1X_{n+1}}(\lambda) + \cdots + f_{X_nX_{n+1}}(\lambda) + \bar f_{X_1X_{n+1}}(\lambda) + \cdots + \bar f_{X_nX_{n+1}}(\lambda) \]
or equivalently
\[ f_{S_{n+1}}(\lambda) = \sum_{i=1}^{n} f_{X_i}(\lambda) + \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\left( f_{X_iX_j}(\lambda) + \bar f_{X_iX_j}(\lambda)\right) + f_{X_{n+1}}(\lambda) + f_{X_1X_{n+1}}(\lambda) + \cdots + f_{X_nX_{n+1}}(\lambda) + \bar f_{X_1X_{n+1}}(\lambda) + \cdots + \bar f_{X_nX_{n+1}}(\lambda), \]
which can be written simply as
\[ f_{S_{n+1}}(\lambda) = \sum_{i=1}^{n+1} f_{X_i}(\lambda) + \sum_{i=1}^{n}\sum_{j=i+1}^{n+1}\left( f_{X_iX_j}(\lambda) + \bar f_{X_iX_j}(\lambda)\right), \]
and the result is proved. □
Now, consider the covariance generating function $G_S(z) = \sum_{j=-\infty}^{\infty} E[S_tS_{t-j}]z^{j}$ of $\{S_t\}$. Here, we use the equivalence $G_S(e^{-i\lambda}) = 2\pi f_S(\lambda)$ and note the following:
\[ f_{X_i}(\lambda) = \frac{\sigma_i^2}{2\pi}\,\frac{|\Theta_i(e^{-i\lambda})|^{2}}{|\Phi_i(e^{-i\lambda})|^{2}} \]
\[ f_{X_iX_j}(\lambda) = \frac{\sigma_{ij}}{2\pi}\,\frac{\Theta_i(e^{-i\lambda})}{\Phi_i(e^{-i\lambda})}\,\frac{\Theta_j(e^{i\lambda})}{\Phi_j(e^{i\lambda})} \]
\[ \bar f_{X_iX_j}(\lambda) = \frac{\sigma_{ij}}{2\pi}\,\frac{\Theta_i(e^{i\lambda})}{\Phi_i(e^{i\lambda})}\,\frac{\Theta_j(e^{-i\lambda})}{\Phi_j(e^{-i\lambda})} \]
Thus, from Proposition 1 we observe that
\[ G_S(z) = \sum_{i=1}^{s}\sigma_i^2\,\frac{\Theta_i(z)\Theta_i(z^{-1})}{\Phi_i(z)\Phi_i(z^{-1})} + \sum_{i=1}^{s-1}\sum_{j=i+1}^{s}\left(\sigma_{ij}\frac{\Theta_i(z)}{\Phi_i(z)}\frac{\Theta_j(z^{-1})}{\Phi_j(z^{-1})} + \sigma_{ij}\frac{\Theta_i(z^{-1})}{\Phi_i(z^{-1})}\frac{\Theta_j(z)}{\Phi_j(z)}\right). \]
As described in Theorem 5 of [7], the covariance generating function $G_S(z)$ can be factorized as the ratio $\frac{O(z)P(z)}{Q(z)}$, where $O(z)$, $P(z)$ and $Q(z)$ are Laurent polynomials, with $O(z)$ having all its roots on the unit circle and $P(z)$ and $Q(z)$ having no roots on the unit circle. This result follows from the fact that each additive term in the expression for $G_S(z)$ above is a ratio of Laurent polynomials and the fact that for any Laurent polynomial $P(z)$, both $P(z)P(z^{-1})$ and $P(z) + P(z^{-1})$ are Laurent polynomials. Furthermore, if $P_1(z)$ and $P_2(z)$ are Laurent polynomials, then $P_1(z)P_2(z)$ and $P_1(z) + P_2(z)$ are Laurent polynomials as well.
We can now use the factorization provided in [9] and described in [7] to obtain the ARMA representation of $\{S_t\}$ with respect to the Wold shocks $\{\epsilon_t\}$ (appearing in its Wold representation). It should be noted that Remark 1 is simply a restatement of Theorem 5 of [7] with the slight addition of determining the polynomials appearing in the ARMA representation of the aggregate sequence $\{S_t\}$.
 Remark 1. 
$\{S_t\}$ can be represented with respect to shocks $\{\epsilon_t\}$ using the ARMA model
\[ \Phi(B) S_t = \Theta(B)\epsilon_t \]
where $\Theta(z) = \prod_{i=1}^{m}(1 - a_iz)$, with $\{a_i\}$ the roots of $O(z)P(z)$ on or inside the unit circle, and $\Phi(z) = \prod_{i=1}^{n}(1 - b_iz)$, with $\{b_i\}$ the roots of $Q(z)$ inside the unit circle. Furthermore,
\[ \sigma_\epsilon^2 = E[\epsilon_t^2] = \frac{p_m\prod_{j=1}^{m}(-1/a_j)}{q_n\prod_{j=1}^{n}(-1/b_j)} \]
where $p_m$ is the coefficient of $z^m$ in $P(z)$ and $q_n$ is the coefficient of $z^n$ in $Q(z)$.
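To make Remark 1 concrete, the following Python sketch carries out the factorization for the special case in which every stream is MA(1) (so $\Phi_i(z) = Q(z) = 1$ and, after multiplying by $z$, the numerator is a quadratic); the function name is ours, and the inputs shown reproduce the example of Section 2.5 below.

import numpy as np

def aggregate_of_ma1_streams(thetas, Sigma):
    """MA coefficient and shock variance of the Wold form of a sum of MA(1) streams."""
    thetas = np.asarray(thetas, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    g0 = Sigma.sum() + thetas @ Sigma @ thetas     # coefficient of z^0 in G_S(z)
    g1 = thetas @ Sigma.sum(axis=1)                # coefficient of z (and of z^{-1})
    if np.isclose(g1, 0.0):
        return 0.0, g0                             # the aggregate is white noise
    # roots of g1 + g0*z + g1*z^2; a1 is the root on or inside the unit circle
    a1 = (-g0 + np.sqrt(g0**2 - 4.0 * g1**2)) / (2.0 * g1)
    return -a1, -g1 / a1                           # Theta(z) = 1 - a1*z, sigma^2 = p_m*(-1/a1)

theta_agg, sigma2 = aggregate_of_ma1_streams(
    [-0.9, 0.9, 0.9],
    [[1.6, -1.4, 0.5], [-1.4, 1.3, -0.8], [0.5, -0.8, 2.0]])
print(theta_agg, sigma2)   # approximately 0.01598704 and 5.629561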

2.3. Forecasting Using the Fully Aggregated Sequence $\{D_\tau\}_{\tau=-\infty}^{t}$ for a General Leadtime

We note that Remark 1 can be used to obtain the ARMA representation of $\{D_t\}$ with respect to its Wold shocks $\{\epsilon_t\}$ as well as $\sigma_\epsilon^2 = E[\epsilon_t^2]$. We can therefore use Lemma 1 of [8] and its proof to determine the BLF of $\sum_{k=1}^{\ell+1} D_{t+k}$ and its MSFE when forecasting using the infinite past of $\{D_t\}$ up to time $t$, namely $\{D_\tau\}_{\tau=-\infty}^{t}$. That is, if we consider the MA($\infty$) representation of $\{D_t\}$ with respect to $\{\epsilon_t\}$ given by
\[ D_t = \Psi(B)\epsilon_t \]
where $\Psi(z) = 1 + \psi_1 z + \psi_2 z^{2} + \cdots$, then the MSFE of the BLF when using $\{D_\tau\}_{\tau=-\infty}^{t}$ is
\[ MSFE_{agg} = \sigma_\epsilon^2\sum_{i=0}^{\ell}\omega_i^{2} \]
where
\[ \omega_i = \begin{cases} 0 & i < 0 \\ \psi_i & i = 0 \\ \omega_{i-1} + \psi_i & 0 < i < \ell+1 \\ \omega_{i-1} + \psi_i - \psi_{i-\ell-1} & i \geq \ell+1 \end{cases} \]
We note that $\Psi(z) = \Theta(z)/\Phi(z)$.

2.4. Forecasting Using Subaggregated Sequences

In this section we use the results in the previous sections to create a forecast and compute its MSFE when some of the individual streams are subaggregated. That is, for $k \in \{1, \ldots, n\}$, let cluster $C_k$ consist of $n_{C_k}$ streams such that $\{C_{k,t} = X^{C_k}_{1,t} + \cdots + X^{C_k}_{n_{C_k},t}\}$. We are interested in the forecast and MSFE based on $\{C_{1,\tau}\}_{\tau=-\infty}^{t}, \ldots, \{C_{n,\tau}\}_{\tau=-\infty}^{t}$.
Section 2.2 describes how we can obtain the ARMA representation and the variance of the Wold shocks appearing in the Wold representation of each sequence $\{C_{k,t}\}$. This can then be used to create a forecast from each subaggregated sequence, the sum of which can be taken as the forecast for $D_{t+\ell+1}$. The one-step-ahead MSFE of this forecast is the sum of the entries of the covariance matrix of the Wold shocks appearing in the Wold representation of $\{C_{1,t}\}, \ldots, \{C_{n,t}\}$. Equation (7) describes how we can also use the covariance matrix of the shocks of $\{C_{1,t}\}, \ldots, \{C_{n,t}\}$ to obtain the MSFE for multi-step-ahead forecasts. The remainder of this section will focus on obtaining this covariance matrix.
Without loss of generality, consider two subaggregated series $\{C_{1,t} = X_{1,t} + \cdots + X_{a,t}\}$ and $\{C_{2,t} = X_{a+1,t} + \cdots + X_{b,t}\}$ with ARMA representations
\[ \phi_1(B) C_{1,t} = \theta_1(B)\epsilon_{1,t} \]
\[ \phi_2(B) C_{2,t} = \theta_2(B)\epsilon_{2,t} \]
where $\{\epsilon_{1,t}\}$ and $\{\epsilon_{2,t}\}$ are the shocks appearing in the Wold representations of $\{C_{1,t}\}$ and $\{C_{2,t}\}$, respectively. We note that the variances of $\{\epsilon_{1,t}\}$ and $\{\epsilon_{2,t}\}$ can be obtained using Remark 1. To obtain the covariance $\sigma_{12} = E[\epsilon_{1,t}\epsilon_{2,t}]$, consider the following.
We can rewrite the ARMA representations above as
\[ \epsilon_{1,t} = \frac{\phi_1(B)}{\theta_1(B)} C_{1,t} \]
\[ \epsilon_{2,t} = \frac{\phi_2(B)}{\theta_2(B)} C_{2,t} \]
The ARMA representations of $\{X_{1,t}\}, \ldots, \{X_{b,t}\}$ can also be rewritten as
\[ X_{1,t} = \frac{\Theta_1(B)}{\Phi_1(B)}\epsilon_{1,t}, \quad \ldots, \quad X_{b,t} = \frac{\Theta_b(B)}{\Phi_b(B)}\epsilon_{b,t}. \]
Based on the definitions $C_{1,t} = X_{1,t} + \cdots + X_{a,t}$ and $C_{2,t} = X_{a+1,t} + \cdots + X_{b,t}$, we observe that
\[ \epsilon_{1,t} = \frac{\phi_1(B)}{\theta_1(B)}\left(\frac{\Theta_1(B)}{\Phi_1(B)}\epsilon_{1,t} + \cdots + \frac{\Theta_a(B)}{\Phi_a(B)}\epsilon_{a,t}\right) \]
\[ \epsilon_{2,t} = \frac{\phi_2(B)}{\theta_2(B)}\left(\frac{\Theta_{a+1}(B)}{\Phi_{a+1}(B)}\epsilon_{a+1,t} + \cdots + \frac{\Theta_b(B)}{\Phi_b(B)}\epsilon_{b,t}\right) \]
To obtain $E[\epsilon_{1,t}\epsilon_{2,t}]$, we need to compute the expectation of the product of the right-hand sides of these two equations. Thus, we need to consider the sum of terms such as
\[ E\left[\left(\frac{\phi_1(B)}{\theta_1(B)}\frac{\Theta_i(B)}{\Phi_i(B)}\epsilon_{i,t}\right)\left(\frac{\phi_2(B)}{\theta_2(B)}\frac{\Theta_j(B)}{\Phi_j(B)}\epsilon_{j,t}\right)\right]. \]
Note that we can write
\[ \frac{\phi_1(B)}{\theta_1(B)}\frac{\Theta_i(B)}{\Phi_i(B)}\epsilon_{i,t} = \sum_{k=0}^{\infty}\tilde\psi_{i,k}\,\epsilon_{i,t-k} \]
and
\[ \frac{\phi_2(B)}{\theta_2(B)}\frac{\Theta_j(B)}{\Phi_j(B)}\epsilon_{j,t} = \sum_{k=0}^{\infty}\tilde\psi_{j,k}\,\epsilon_{j,t-k} \]
where $\tilde\psi_{i,k}$ and $\tilde\psi_{j,k}$ can be obtained in the same way as the MA($\infty$) coefficients in (27). Hence, the term in Equation (35) can be rewritten as
\[ E\left[\left(\sum_{k=0}^{\infty}\tilde\psi_{i,k}\,\epsilon_{i,t-k}\right)\left(\sum_{k=0}^{\infty}\tilde\psi_{j,k}\,\epsilon_{j,t-k}\right)\right]. \]
Since the shock sequences are not correlated across time by assumption, this is equivalent to
\[ E\left[\sum_{k=0}^{\infty}\tilde\psi_{i,k}\,\tilde\psi_{j,k}\,\epsilon_{i,t-k}\,\epsilon_{j,t-k}\right] \]
or equivalently
\[ \sum_{k=0}^{\infty}\tilde\psi_{i,k}\,\tilde\psi_{j,k}\,\sigma_{ij}. \]
Adding up these terms as required yields the covariance. Hence,
\[ \sigma_{12} = E[\epsilon_{1,t}\epsilon_{2,t}] = \sum_{i=1}^{a}\sum_{j=a+1}^{b}\sum_{k=0}^{\infty}\tilde\psi_{i,k}\,\tilde\psi_{j,k}\,\sigma_{ij}. \]
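The expansion and accumulation above are easy to carry out numerically. The following Python sketch expands each composite filter as a truncated power series in $B$ and sums the products of coefficients as in the covariance formula just derived; the function names, the truncation length, and the assumption that the cluster-level Wold polynomials (e.g., from Remark 1) are already available are ours.

import numpy as np

def series_of_ratio(num, den, n):
    """First n coefficients of num(z)/den(z); both are lists [1, c1, c2, ...]."""
    out = np.zeros(n)
    num = list(num) + [0.0] * n
    for k in range(n):
        out[k] = num[k] - sum(den[j] * out[k - j]
                              for j in range(1, min(k, len(den) - 1) + 1))
    return out

def psi_tilde(phi_c, theta_c, Theta_i, Phi_i, n=400):
    """Coefficients of (phi_c(z)/theta_c(z)) * (Theta_i(z)/Phi_i(z))."""
    return np.convolve(series_of_ratio(phi_c, theta_c, n),
                       series_of_ratio(Theta_i, Phi_i, n))[:n]

def cluster_shock_cov(idx1, idx2, wold1, wold2, Theta, Phi, Sigma, n=400):
    """Covariance of the Wold shocks of two clusters (lists of stream indices)."""
    total = 0.0
    for i in idx1:
        pi = psi_tilde(wold1[0], wold1[1], Theta[i], Phi[i], n)
        for j in idx2:
            pj = psi_tilde(wold2[0], wold2[1], Theta[j], Phi[j], n)
            total += Sigma[i][j] * float(pi @ pj)
    return total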
We note that this methodology can easily be extended to obtain any of the covariances in the covariance matrix
\[ \Sigma_\epsilon = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1n} \\ \sigma_{21} & \cdots & \sigma_{2n} \\ \vdots & & \vdots \\ \sigma_{n1} & \cdots & \sigma_{nn} \end{pmatrix}. \]
We will use the previous methodology for all theoretical MSFE computations found in this paper. In the next subsection, we provide an example describing the importance of forecasting leadtime demand based upon the individual sequences.

2.5. Example

Consider a retailer that observes aggregate demand $\{D_t = X_{1,t} + X_{2,t} + X_{3,t}\}$ where each individual demand stream is generated by one of the following ARMA models:
\[ X_{1,t} = (1 - 0.9B)\epsilon_{1,t} \]
\[ X_{2,t} = (1 + 0.9B)\epsilon_{2,t} \]
\[ X_{3,t} = (1 + 0.9B)\epsilon_{3,t} \]
where the shock covariance matrix is given by
\[ \Sigma = \begin{pmatrix} 1.6 & -1.4 & 0.5 \\ -1.4 & 1.3 & -0.8 \\ 0.5 & -0.8 & 2.0 \end{pmatrix}. \]
We use the results described in Section 2.1 and Section 2.2 to compare the MSFE of the forecasts of leadtime demand $\sum_{j=1}^{\ell+1} D_{t+j}$ when using the individual sequences $\{X_{1,\tau}\}_{\tau=-\infty}^{t}, \{X_{2,\tau}\}_{\tau=-\infty}^{t}, \{X_{3,\tau}\}_{\tau=-\infty}^{t}$ versus when using the aggregate sequence $\{D_\tau\}_{\tau=-\infty}^{t}$.
The covariance generating function of $\{D_t\}$ is given by
\[ G_S(z) = 0.090z^{-1} + 5.631 + 0.090z \]
and the ARMA representation of $\{D_t\}$ is given by
\[ D_t = (1 + 0.01598704B)\epsilon_t \]
where $\sigma_\epsilon^2 = 5.629561$. We can check that the ARMA representation of $\{D_t\}$ and its covariance generating function match up by noting that
\[ G_S(z) = \sigma_\epsilon^2\,\frac{\Theta(z)\Theta(z^{-1})}{\Phi(z)\Phi(z^{-1})} = 5.629561\,(1 + 0.01598704z)(1 + 0.01598704z^{-1}) = 0.090z^{-1} + 5.631 + 0.090z \]
with $\Theta(z) = (1 + 0.01598704z)$ and $\Phi(z) = 1$ being the MA and AR polynomials appearing in the ARMA representation of $\{D_t\}$ as defined in Remark 1.
We note that the one-step-ahead forecast then has an MSFE of 5.629561 and the two-step-ahead forecast (of $D_{t+1} + D_{t+2}$) has an MSFE of 11.44056.
Using Equation (7) and noting that $\omega_{k,0} = \psi_{k,0} = 1$ in Equation (5), we can obtain the MSFE of the forecast based on the individual demand streams. In this case, the MSFE is 1.5, which is 4.129561 (73.3%) lower than the forecast error when using the aggregated series. We can likewise compute the elements $\omega_{k,i}$ when forecasting $D_{t+1} + D_{t+2}$. In this case, $\omega_{k,0} = \psi_{k,0} = 1$, $\omega_{1,1} = 0.1$, $\omega_{2,1} = 1.9$, and $\omega_{3,1} = 1.9$. From (7) this implies that the MSFE is 7.311, which is also 4.129561 (36.1%) lower (in this case, the reduction in MSFE appears the same regardless of leadtime; however, this is due to the series being generated by MA(1) models; for higher-order ARMA models, the reduction in MSFE may depend on the leadtime) than the forecast error when using the aggregated series. In the next section we demonstrate, by example, the benefit of forecasting using subaggregated clusters (which would have been identified using a suitable technique).
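For concreteness, the one-step-ahead figures above can be verified directly from the covariance matrix and the aggregate representation:
\[ MSFE_{ind}(\ell = 0) = \sum_{j=1}^{3}\sum_{k=1}^{3}\sigma_{jk} = (1.6 + 1.3 + 2.0) + 2(-1.4 + 0.5 - 0.8) = 4.9 - 3.4 = 1.5, \]
\[ MSFE_{agg}(\ell = 0) - MSFE_{ind}(\ell = 0) = \sigma_\epsilon^2 - 1.5 = 5.629561 - 1.5 = 4.129561 \approx 73.3\%\ \text{of}\ 5.629561. \]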

3. The Benefit of Forecasting Using Subaggregated Clusters

In this section, we consider a retailer that has ten demand streams that aggregate into three clusters consisting of similar streams. The models generating these streams are specifically chosen to provide a clear separation between “good” clusters leading to low MSFE and “bad” clusters leading to high MSFE. Intuition gleaned from Section 6 and Section 7 hints that streams generated from ARMA models with similar coefficients should be clustered together. Hence, for our example we consider three groups of models having similar sets of coefficients within each group. Later, in Section 4, we will randomly assign coefficients to streams and still observe a sharp drop in MSFE when streams are clustered to minimize MSFE.
Suppose the retailer observes aggregate demand $\{D_t = X_{1,t} + X_{2,t} + \cdots + X_{10,t}\}$ where each individual demand stream is generated by one of the following ARMA processes:
\[ (1 - 0.3B - 0.6B^2) X_{1,t} = (1 - 0.6B - 0.2B^2)\epsilon_{1,t} \]
\[ (1 - 0.35B - 0.5B^2) X_{2,t} = (1 - 0.65B - 0.15B^2)\epsilon_{2,t} \]
\[ (1 - 0.27B - 0.55B^2) X_{3,t} = (1 - 0.63B - 0.17B^2)\epsilon_{3,t} \]
\[ (1 - 0.8B) X_{4,t} = \epsilon_{4,t} \]
\[ (1 - 0.9B) X_{5,t} = \epsilon_{5,t} \]
\[ (1 - 0.75B) X_{6,t} = \epsilon_{6,t} \]
\[ (1 + 0.77B) X_{7,t} = (1 + 0.6B)\epsilon_{7,t} \]
\[ (1 + 0.68B) X_{8,t} = (1 + 0.55B)\epsilon_{8,t} \]
\[ (1 + 0.73B) X_{9,t} = (1 + 0.52B)\epsilon_{9,t} \]
\[ (1 + 0.7B) X_{10,t} = (1 + 0.5B)\epsilon_{10,t} \]
where the shock covariance matrix is given by
\[ \Sigma = \begin{pmatrix}
2 & 1 & 0.8 & 0.9 & 1.2 & 1.5 & 0.8 & 0.9 & 0.95 & 1 \\
1 & 2.1 & 0.7 & 0.6 & 0.5 & 0.4 & 0.21 & 0.31 & 0.36 & 0.39 \\
0.8 & 0.7 & 2.2 & 0.5 & 1.3 & 1 & 0.4 & 0.8 & 1 & 1.1 \\
0.9 & 0.6 & 0.5 & 3 & 1.8 & 1.9 & 2 & 2.1 & 2.2 & 2.3 \\
1.2 & 0.5 & 1.3 & 1.8 & 3.2 & 2 & 1.9 & 1.8 & 1.7 & 1.5 \\
1.5 & 0.4 & 1 & 1.9 & 2 & 3.3 & 2.2 & 2.3 & 2.4 & 2.5 \\
0.8 & 0.21 & 0.4 & 2 & 1.9 & 2.2 & 5 & 1 & 1.25 & 1.5 \\
0.9 & 0.31 & 0.8 & 2.1 & 1.8 & 2.3 & 1 & 5.1 & 1.3 & 1.6 \\
0.95 & 0.36 & 1 & 2.2 & 1.7 & 2.4 & 1.25 & 1.3 & 5.7 & 1.8 \\
1 & 0.39 & 1.1 & 2.3 & 1.5 & 2.5 & 1.5 & 1.6 & 1.8 & 5.9
\end{pmatrix}. \]
Consider the three natural clusters in the above ten demand streams, namely, streams 1–3, 4–6, and 7–10. It can be shown that this grouping is indeed optimal among all possible choices of three clusters simply by enumerating all possible combinations and computing their MSFEs. Our analysis of the MA(1) case in Section 6 also points to these being the correct clusters based on the proximity of the ARMA coefficients of the models generating the demand streams. We demonstrate that, even though the best the retailer can achieve in such a situation is to forecast all ten streams individually, if the retailer correctly clusters the demand streams as mentioned, the results will be similar. All MSFEs stated in this section are obtained using the methods described in Section 2.
Specifically in this case, if the retailer were to forecast the individual streams, its one-step ahead theoretical MSFE would be 21.64. If the retailer were to consider the subaggregated processes consisting of the correct clusters and forecast these separately, then the MSFE would be 21.74. This is in stark contrast to an MSFE of 61.39 when forecasting using the aggregate process of all ten streams. In other words, it is sufficient to determine clusters of similar customers as opposed to forecasting each individual stream in order to keep inventory-related costs down.
To see the impact of choosing the correct clusters, we consider the case in which the retailer incorrectly clusters the streams as 1–2, 3–5, and 6–10. In this case, the MSFE rises to 33.4. Similarly, if the clusters chosen are {1, 4}, {2, 3, 5}, and 6–10, then the MSFE is 45.04. Assigning streams randomly to three clusters consisting of three, two, and five streams yields Table 1.
We note that there could be substantial reduction in MSFE even when multiple streams are clustered incorrectly. In the next section, we demonstrate how a retailer would be able to generate clusters of its individual demand streams using Pivot Clustering.

4. TS Clustering Algorithms and Pivot Clustering; Empirical Evaluation

We have shown that when the retailer forecasts using clusters of its demand streams as opposed to the individual streams, its MSFE can vary greatly, from close to the optimal value (when forecasting using the individual streams) to close to the MSFE of a forecast based on the aggregate of all the streams. Hence, if the retailer wishes to forecast using clusters of its demand streams, the selection of the clusters is important. In general, a retailer might have customer-related information that could be used to generate the clusters. The benefit of generating the clusters and forecasting them is that there could be situations where it would be cumbersome for the retailer to collect the individual demand streams and service them individually. Forecasting clusters would therefore be a second-best option.
We propose Pivot Clustering for determining clusters that usually result in a relatively low MSFE among all assignments of streams to clusters. We consider two ways to obtain the subaggregated MSFE based on some clustering assignment. The first is to use the individual ARMA demand models appearing in (2) to compute the subaggregated theoretical MSFE as per Section 2.2, Section 2.3 and Section 2.4. We note that if there is only one cluster, then the MSFE is the one computed for a forecast based on the aggregate of all the streams, while if the number of clusters is equal to the number of streams, the MSFE is for a forecast based on the disaggregated (individual) sequences. We also estimate the MSFE by generating demand realizations for each stream based on (2). Once demand realizations are simulated for each stream, we subaggregate the realizations based on our choice of clusters. So, if some cluster is made up of streams $\{X_{i,t}\}$ and $\{X_{j,t}\}$, we say that the cluster has realization $\{X_{i,t} + X_{j,t}\}$. We then estimate an ARMA(5,5) model using each cluster's realization (the AR and MA degrees were chosen with the understanding that these degrees increase with the number of streams subaggregated into a particular cluster, while trying to limit the complexity of the models being estimated). Finally, we use the estimated models to obtain in-sample forecast errors and compute the covariance matrix of these forecast errors to estimate the MSFE for a particular assignment of streams to clusters. In the analysis below we see that the estimated MSFEs are close to the theoretical ones and often lead to similar choices of clusters. Based on a predetermined number of clusters n, Pivot Clustering works as follows (a short sketch in code is given after the steps).
1. Randomly assign each stream to one of the n clusters.
2. For each cluster:
   (a) For each stream in the cluster, compute or estimate the MSFE for the current assignment along with the resulting MSFEs if the stream were moved to each of the other clusters. The MSFE can be either estimated based on realizations of the demand streams or computed using Equations (28), (29), (41) and (42).
   (b) Move each stream in the cluster to the cluster which leads to the largest overall MSFE reduction among all choices of clusters.
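A minimal Python sketch of the procedure is given below, assuming a user-supplied function msfe_of_clusters(assignment) that returns the theoretical or estimated subaggregated MSFE of a given assignment of streams to clusters; the function name, the pass structure, and the stopping rule are our own illustrative choices rather than details from the paper.

import numpy as np

def pivot_clustering(n_streams, n_clusters, msfe_of_clusters, rng=None, max_passes=20):
    rng = rng or np.random.default_rng()
    assign = rng.integers(0, n_clusters, size=n_streams)      # random initialization
    best = msfe_of_clusters(assign)
    for _ in range(max_passes):
        improved = False
        for stream in range(n_streams):
            current = assign[stream]
            # try moving this stream to every other cluster and keep the best move
            for cluster in range(n_clusters):
                if cluster == current:
                    continue
                assign[stream] = cluster
                candidate = msfe_of_clusters(assign)
                if candidate < best:
                    best, current, improved = candidate, cluster, True
            assign[stream] = current                           # keep the best cluster found
        if not improved:
            break
    return assign, best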
In the remainder of this section, we perform various simulations to assess the efficacy of Pivot Clustering. We focus on ARMA(1,1) models as these do not require too much runtime for Pivot Clustering based on theoretical MSFE and are complex enough to describe demand data such as in [10]. Additionally, forecasting an aggregate of ARMA(1,1) demand sequences has been studied by [11], where forecasts were based on exponential smoothing. The methods herein are generally applicable, however, to higher-order ARMA models. From a computational standpoint, it is possible to determine theoretical MSFE based on the aggregate of up to twenty demand streams generated by ARMA(1,1) models. The burden lies in having to find roots of large-degree polynomials in order to determine the ARMA model and shock variance that describe the aggregated sequence. To understand the computational requirements, we check the runtimes of Pivot Clustering when determining theoretical and estimated MSFEs. Based on a given number of streams (between 10 and 20), we carry out Pivot Clustering for twenty different combinations of ARMA(1,1) models when assigning the streams to four clusters, and plot the average of the runtimes in Figure 1. If using estimated MSFE, then many more streams can be clustered, and Pivot Clustering has faster runtimes for larger numbers of streams. Upon checking, Pivot Clustering with estimated MSFEs for 200 streams takes approximately 20 min.
We can check the efficacy of Pivot Clustering through simulation. We begin by randomly assigning coefficients to twenty ARMA(1,1) models to produce twenty demand streams, as well as the covariance matrix of the shock sequences. We make sure that each assignment results in causal and invertible demand with respect to the shocks and that the resulting covariance matrix is positive definite. The AR and MA coefficients and the covariance matrix can be found in “Models.csv” and “covarmat.csv” in our GitHub repository [12].
After randomly assigning streams to one of four clusters, we compute both the estimated and theoretical one-step-ahead MSFEs based on this random assignment and use it to start Pivot Clustering. We output the clusters found by Pivot Clustering as well as the MSFE of the forecast based on this set of clusters. We iterate this procedure 50 times to study how much the MSFE improves under Pivot Clustering for the different starting allocations. The MSFEs of the final clusters and random clusters can be found in “MSFEresults.csv” in our GitHub repository [12]. These can also be compared with the MSFEs of the forecast based on the individual (disaggregated) demand streams and the forecast based on fully aggregating the streams.
For the twenty demand streams and models used, the theoretical and estimated MSFEs when forecasting based on individual (disaggregated) streams are 102.1 and 96.2. The theoretical and estimated MSFEs when forecasting based on the fully aggregated streams are 231.3 and 220.6. For the 50 simulations of assigning streams to random clusters (used in the initialization step of Pivot Clustering) the average of the theoretical and estimated MSFEs based on the subaggregated random clusters are 202.2 and 194.2. After Pivot Clustering is carried out to obtain a better set of subaggregated clusters in each of the 50 simulations, the averages of the theoretical and estimated MSFEs are 109.4 and 101.0. The various theoretical and estimated MSFEs for the different initializations are provided in Figure 2 and Figure 3. We note that regardless of the initial random assignment of streams to clusters, Pivot Clustering leads to the clustering of streams such that the subaggregated MSFE is very low. In fact, typically, Pivot Clustering results in clusters for which the subaggregated MSFE ends up very close to the MSFE obtained when forecasts are based on the individual (disaggregated) streams.
We can compare our results with existing time-series clustering methods. Two distance measures that can be computed for time-series realizations are available in the TSclust package for R, namely AR.PIC and AR.LPC.CEPS. These distances can be used to perform hierarchical clustering such as average-linkage clustering. The final groups determined by these methods lead to MSFEs of 123.4 and 108.8, respectively, higher than those found by Pivot Clustering starting from random assignments. We note that the cluster assignments found by these methods can also be used in the initialization of Pivot Clustering, potentially leading to even better clusters.
Since the previous simulations were carried out on only one set of twenty ARMA(1,1) demand models, we should also check the efficacy of Pivot Clustering for other sets of models as well. As such, we consider twenty simulations where within each simulation a new set of twenty demand models is considered. We compare the estimated and theoretical MSFEs of one random assignment of streams to four clusters with the estimated and theoretical MSFEs of the four clusters obtained by Pivot Clustering. In each simulation, we also compute the MSFEs that would be found when fully aggregating the streams or when considering forecasts based on individual streams as well as the MSFEs that would be found using the AR.PIC and AR.LPC.CEPS distances for hierarchical clustering streams into four clusters. The results of these simulations are displayed in Figure 4 and Figure 5. We note that if forecasts are to be based on four clusters, the lowest MSFEs are obtained when clusters are formed using Pivot Clustering. Furthermore, Pivot Clustering leads to forecasts whose MSFE is very close to the MSFE of the forecast based on the individual streams in every simulation.
We continue with twenty simulations where in each simulation we consider a separate set of 20 streams being subaggregated into four clusters, with 10 random initializations of Pivot Clustering. The means of the various theoretical and estimated MSFEs under the different clustering approaches are displayed in Figure 6 and Figure 7. We note again that, for every set of twenty streams, the subaggregated MSFEs averaged over the different initial random assignments of streams to clusters are very close to the disaggregated MSFEs.
We continue by assessing how well Pivot Clustering performs when compared against an exhaustive algorithm which checks all possible assignments of streams to clusters using theoretical MSFE calculations. To achieve this, we consider twenty simulations where within each simulation we randomly generate 10 ARMA(1,1) streams and compute the lowest MSFE possible among all assignments of streams to three clusters. (We reduced the number of streams and clusters here due to the fact that an exhaustive algorithm requires $O(k^N)$ iterative steps to check all possible cluster assignments, where k is the number of clusters and N is the number of streams. We note that Pivot Clustering has a complexity of $O(kN)$ in the event that each stream is only allowed to change clusters once.) Furthermore, we consider 10 random initializations of Pivot Clustering each time new streams are considered. The results are displayed in Table 2. For the first simulation (described by the first row of the table), 8 out of 10 initializations of our algorithm led Pivot Clustering to find the optimal solution. Among the two initializations that did not lead to the optimal solution, the average ratio of the optimal MSFE to the MSFE of the grouping found by Pivot Clustering was 96.81%; the median was 96.81%, while the minimum ratio was 96.24%. In some instances Pivot Clustering never found an optimal solution (such as in the sixth simulation); however, the average MSFE of the optimal solution was around 99.6% of the MSFE of the groupings found by Pivot Clustering. In the worst performance of Pivot Clustering (simulation 15), the MSFE of the best possible grouping was 80.27464% of the MSFE found by Pivot Clustering.
Finally, we consider the robustness of the Pivot Clustering algorithm to cases where the data generation process is not ARMA. To accomplish this, we perform ten simulations where in each simulation we simulate twenty demand stream realizations such that stream $X_k$ follows an ARFIMA($0, d_k, 0$) model given by
\[ (1 - B)^{d_k} X_{k,t} = \epsilon_{k,t} \]
where $-0.4 < d_k < 0.4$ and $Cov(\epsilon_{k,t}, \epsilon_{j,t})$ may be nonzero. Each realization, consisting of 1500 time periods, is used to fit an ARMA(5,5) model to compute an estimated one-step-ahead MSFE for the disaggregated series (appearing as a blue dot in Figure 8). Summing the realizations together to fit an ARMA(5,5) model yields an estimated MSFE for the aggregated series (appearing as a black dot in Figure 8). Finally, the Pivot Clustering algorithm is carried out using five different random initializations of assigning streams to one of four clusters. The MSFEs for the subaggregated random clusters and pivot clusters appear as red and green dots in Figure 8. We note that in the subaggregated case the number of ARMA models that needs to be estimated is equal to the number of clusters.
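One simple way to generate such realizations is to truncate the MA($\infty$) expansion of $(1 - B)^{-d_k}$ and apply it to correlated Gaussian shocks, as in the Python sketch below; the truncation length, burn-in, and the construction of the shock covariance matrix are our own illustrative choices.

import numpy as np

def arfima_weights(d, n_lags):
    """psi_j in (1 - B)^{-d} = sum_j psi_j B^j, via psi_j = psi_{j-1}(j-1+d)/j."""
    psi = np.empty(n_lags)
    psi[0] = 1.0
    for j in range(1, n_lags):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    return psi

rng = np.random.default_rng(1)
N, T, burn, n_lags = 20, 1500, 500, 300
d = rng.uniform(-0.4, 0.4, size=N)                 # one d_k per stream
A = rng.normal(size=(N, N))
Sigma = A @ A.T / N + np.eye(N)                    # a positive definite shock covariance
eps = rng.multivariate_normal(np.zeros(N), Sigma, size=T + burn + n_lags)

X = np.empty((T, N))
for k in range(N):
    w = arfima_weights(d[k], n_lags)
    filtered = np.convolve(eps[:, k], w)           # X_{k,t} = sum_j psi_j eps_{k,t-j}
    X[:, k] = filtered[burn + n_lags: burn + n_lags + T]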
As before, we note that Pivot Clustering provides a sharp reduction in MSFE compared to random cluster assignments as well as compared to the aggregated case. We also observe that when fitting ARMA models to non-ARMA data, it is possible for Pivot Clustering to yield clusters which lead to a subaggregated MSFE that is lower than the MSFE using the individual (disaggregated) series. The exact cause of this is unclear; however, it may have to do with the larger number of misspecified ARMA models that are fit in the disaggregated case.

5. Clustering Demand Streams through Minimizing an Objective Function Based on Subaggregated MSFE

In this section, we describe how to determine the optimal assignment of streams to clusters by identifying and minimizing an objective function which computes the overall MSFE given a particular assignment of streams to clusters. We begin by assuming that the desired number of clusters is known to be $n$. For $\alpha \in \{1, 2, \ldots, n\}$, let the subaggregated cluster series $\{C_{\alpha,t}\} = \{X_{1,t}y_{1,\alpha} + \cdots + X_{N,t}y_{N,\alpha}\}$, where $y_{i,\alpha} = 1$ if stream $\{X_{i,t}\}$ is in cluster $\{C_{\alpha,t}\}$ and $0$ otherwise, have the ARMA representation
\[ \phi_\alpha(B) C_{\alpha,t} = \theta_\alpha(B)\epsilon_{\alpha,t}. \]
For $\alpha \in \{1, 2, \ldots, n\}$, the shocks $\epsilon_{\alpha,t}$ have covariance matrix $\Sigma_\epsilon$ given in Equation (42). We define the number of demand streams in cluster $\{C_{\alpha,t}\}$ to be $n_{C_\alpha} = \sum_{i=1}^{N} y_{i,\alpha} > 0$ for all $\alpha \in \{1, 2, \ldots, n\}$. Furthermore, if $y_{i,\alpha} = 1$ then $y_{i,\beta} = 0$ for all $\beta \neq \alpha$, $\beta \in \{1, 2, \ldots, n\}$. For each of the $n_{C_\alpha}$ streams $\{X^{C_\alpha}_{i,t}\}$ in cluster $\{C_{\alpha,t}\}$, we adapt the notation of its ARMA representation to be
\[ \Phi_{i,\alpha}(B) X^{C_\alpha}_{i,t} = \Theta_{i,\alpha}(B)\epsilon_{\alpha,i,t}. \]
 Lemma 1. 
An optimal set of clusters can be found by minimizing the subaggregated MSFE given by
\[ MSFE_{subagg} = \sum_{\alpha=1}^{n}\sum_{\beta=1}^{n}\sum_{l=0}^{\ell}\omega_{\beta,l}\,\omega_{\alpha,l}\cdot\sum_{i=1}^{n_{C_\alpha}}\sum_{j=1}^{n_{C_\beta}}\sum_{k=0}^{\infty}\tilde\psi_{\alpha,i,k}\,\tilde\psi_{\beta,j,k}\,\sigma_{ij} \]
where $\tilde\psi_{\alpha,i,k}$ and $\tilde\psi_{\beta,j,k}$ are obtained from the equivalence
\[ \frac{\phi_\alpha(z)}{\theta_\alpha(z)}\frac{\Theta_{i,\alpha}(z)}{\Phi_{i,\alpha}(z)} \equiv \sum_{k=0}^{\infty}\tilde\psi_{\alpha,i,k}z^{k} \]
and
\[ \frac{\phi_\beta(z)}{\theta_\beta(z)}\frac{\Theta_{j,\beta}(z)}{\Phi_{j,\beta}(z)} \equiv \sum_{k=0}^{\infty}\tilde\psi_{\beta,j,k}z^{k}, \]
where the key terms are defined in the proof below.
Alternatively, the objective can be stated as finding the optimal set of $\{y_1, \ldots, y_N\}$ such that we minimize
\[ MSFE_{subagg} = \sum_{\alpha=1}^{n}\sum_{\beta=1}^{n}\sum_{l=0}^{\ell}\omega_{\beta,l}\,\omega_{\alpha,l}\cdot\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=0}^{\infty}y_{i,\alpha}\,y_{j,\beta}\,\tilde\psi_{\alpha,i,k}\,\tilde\psi_{\beta,j,k}\,\sigma_{ij} \]
where $\omega_{\alpha,l}$ and $\psi_{\alpha,l}$ in Equation (69) are obtained through the equivalence
\[ \frac{\theta_\alpha(z)}{\phi_\alpha(z)} = \sum_{l=0}^{\infty}\psi_{\alpha,l}z^{l}, \]
where again the key terms are defined in the subsequent proof.
 Proof. 
From Equation (7) we note that a forecast based on the clusters would have an MSFE given by
\[ MSFE_{subagg} = \sum_{\alpha=1}^{n}\sum_{\beta=1}^{n}\sum_{l=0}^{\ell}\sigma_{\alpha\beta}\,\omega_{\beta,l}\,\omega_{\alpha,l} \]
where
\[ \omega_{\alpha,l} = \begin{cases} 0 & l < 0 \\ \psi_{\alpha,l} & l = 0 \\ \omega_{\alpha,l-1} + \psi_{\alpha,l} & 0 < l < \ell+1 \\ \omega_{\alpha,l-1} + \psi_{\alpha,l} - \psi_{\alpha,l-\ell-1} & l \geq \ell+1 \end{cases} \]
where $\psi_{\alpha,l}$ is the $l$th coefficient appearing in the MA($\infty$) representation of $\{C_{\alpha,t}\}$ with respect to $\{\epsilon_{\alpha,t}\}$. From Equation (41), we note that for any two subaggregated clusters $\{C_{\alpha,t}\}$ and $\{C_{\beta,t}\}$ consisting of $n_{C_\alpha}$ and $n_{C_\beta}$ streams, respectively, with corresponding shock series $\{\epsilon_{\alpha,t}\}$ and $\{\epsilon_{\beta,t}\}$, the covariance $E[\epsilon_{\alpha,t}\epsilon_{\beta,t}]$ is expressed by
\[ \sigma_{\alpha\beta} = \sum_{i=1}^{n_{C_\alpha}}\sum_{j=1}^{n_{C_\beta}}\sum_{k=0}^{\infty}\tilde\psi_{\alpha,i,k}\,\tilde\psi_{\beta,j,k}\,\sigma_{ij}. \]
Therefore, the objective is to assign streams to clusters such that we minimize the MSFE
\[ MSFE_{subagg} = \sum_{\alpha=1}^{n}\sum_{\beta=1}^{n}\sum_{l=0}^{\ell}\omega_{\beta,l}\,\omega_{\alpha,l}\cdot\sum_{i=1}^{n_{C_\alpha}}\sum_{j=1}^{n_{C_\beta}}\sum_{k=0}^{\infty}\tilde\psi_{\alpha,i,k}\,\tilde\psi_{\beta,j,k}\,\sigma_{ij} \]
where $\tilde\psi_{\alpha,i,k}$ and $\tilde\psi_{\beta,j,k}$ are obtained from the equivalence
\[ \frac{\phi_\alpha(z)}{\theta_\alpha(z)}\frac{\Theta_{i,\alpha}(z)}{\Phi_{i,\alpha}(z)} \equiv \sum_{k=0}^{\infty}\tilde\psi_{\alpha,i,k}z^{k} \]
and
\[ \frac{\phi_\beta(z)}{\theta_\beta(z)}\frac{\Theta_{j,\beta}(z)}{\Phi_{j,\beta}(z)} \equiv \sum_{k=0}^{\infty}\tilde\psi_{\beta,j,k}z^{k}. \]
Alternatively, we can say that we are finding the optimal set of $\{y_1, \ldots, y_N\}$ such that we minimize
\[ MSFE_{subagg} = \sum_{\alpha=1}^{n}\sum_{\beta=1}^{n}\sum_{l=0}^{\ell}\omega_{\beta,l}\,\omega_{\alpha,l}\cdot\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{k=0}^{\infty}y_{i,\alpha}\,y_{j,\beta}\,\tilde\psi_{\alpha,i,k}\,\tilde\psi_{\beta,j,k}\,\sigma_{ij} \]
where $\omega_{\alpha,l}$ and $\psi_{\alpha,l}$ in Equation (69) are obtained through the equivalence
\[ \frac{\theta_\alpha(z)}{\phi_\alpha(z)} = \sum_{l=0}^{\infty}\psi_{\alpha,l}z^{l} \]
with $\theta_\alpha(z)$ and $\phi_\alpha(z)$ found using Remark 1, where the Laurent polynomials $O(z)$, $P(z)$ and $Q(z)$ are obtained from the covariance generating function $G_{C_\alpha}(z)$ given by
\[ G_{C_\alpha}(z) = \sum_{i=1}^{N}y_{i,\alpha}\,\sigma_i^2\,\frac{\Theta_i(z)\Theta_i(z^{-1})}{\Phi_i(z)\Phi_i(z^{-1})} + \sum_{i=1}^{N-1}\sum_{j=i+1}^{N}y_{i,\alpha}\,y_{j,\alpha}\left(\sigma_{ij}\frac{\Theta_i(z)}{\Phi_i(z)}\frac{\Theta_j(z^{-1})}{\Phi_j(z^{-1})} + \sigma_{ij}\frac{\Theta_i(z^{-1})}{\Phi_i(z^{-1})}\frac{\Theta_j(z)}{\Phi_j(z)}\right). \]
□
We note that it is impossible to offer an explicit solution because of the dependence of the coefficients $\omega_{\beta,l}$, $\omega_{\alpha,l}$, $\tilde\psi_{\alpha,i,k}$ and $\tilde\psi_{\beta,j,k}$ on the selection of clusters. In the next section, we consider a much simpler case of demand streams being generated by MA(1) models, which leads to a much simpler objective function. This allows us to find several theoretical results, culminating in the fact that optimal clusters can be found in this case by identifying streams having the closest MA coefficients with one another.

6. MA(1) Streams

In this section, we consider the case in which the demand streams being considered are independent MA(1). As we demonstrate below, this leads to a simpler objective function. We will use this fact to show how we can use non-linear optimization to assign streams to clusters and to come up with an efficient way to cluster independent MA(1) streams based on segmenting the coefficient space into intervals. The focus on MA(1) streams here allows us to observe that streams with MA coefficients close to each other should be clustered together. At the end of the section, we provide a lemma on aggregating streams produced by models having identical ARMA coefficients.
 Lemma 2. 
Suppose $\{X_{1,t}\}, \{X_{2,t}\}, \ldots, \{X_{N,t}\}$ are MA(1) with MA coefficients $\theta_1, \theta_2, \ldots, \theta_N$. Optimal clusters can be found by assigning $y_{jk}$ as an indicator variable for stream $X_j$ being in cluster $C_k$ such that we minimize
\[ \sum_{k=1}^{n}\sqrt{(b_k + 2a_k)(b_k - 2a_k)} \]
where
\[ b_k = \sum_{j=1}^{N}\sigma_j^2(1 + \theta_j^2)\,y_{jk} \]
and
\[ a_k = \sum_{j=1}^{N}\sigma_j^2\theta_j\,y_{jk}. \]
Alternatively, the objective function (77) can be written as
\[ \sum_{k=1}^{n}\sqrt{\sum_{j=1}^{N}\sum_{i=1}^{N}\sigma_j^2\sigma_i^2(1 + \theta_j)^2(1 - \theta_i)^2\,y_{jk}\,y_{ik}}. \]
 Proof. 
Suppose $\{X_{1,t}\}, \{X_{2,t}\}, \ldots, \{X_{N,t}\}$ are MA(1) with MA coefficients $\theta_1, \theta_2, \ldots, \theta_N$. Suppose cluster $C_{\alpha,t}$ consists of streams $\{X_{1,t}\}, \ldots, \{X_{\alpha,t}\}$. The covariance generating function of $C_{\alpha,t}$ simplifies to
\[ G_{C_\alpha}(z) = \sigma_1^2(1 + \theta_1z)(1 + \theta_1z^{-1}) + \cdots + \sigma_\alpha^2(1 + \theta_\alpha z)(1 + \theta_\alpha z^{-1}). \]
In order to determine the variance of the shocks of $C_{\alpha,t}$, we need to find the roots of Equation (81). Note that it can be rewritten as
\[ \left(\sigma_1^2\theta_1 + \cdots + \sigma_\alpha^2\theta_\alpha\right)z^{-1} + \left(\sigma_1^2(1+\theta_1^2) + \cdots + \sigma_\alpha^2(1+\theta_\alpha^2)\right) + \left(\sigma_1^2\theta_1 + \cdots + \sigma_\alpha^2\theta_\alpha\right)z. \]
We can find the roots of Equation (82) using the quadratic formula $\frac{-b \pm \sqrt{b^2 - 4a^2}}{2a}$, where
\[ a = \sigma_1^2\theta_1 + \cdots + \sigma_\alpha^2\theta_\alpha = \sum_{j=1}^{\alpha}\sigma_j^2\theta_j \]
\[ b = \sigma_1^2(1+\theta_1^2) + \cdots + \sigma_\alpha^2(1+\theta_\alpha^2) = \sum_{j=1}^{\alpha}\sigma_j^2(1+\theta_j^2). \]
We note that one root, $r_1$, will be outside the unit circle and, from Remark 1, that the variance of the shocks of $C_{\alpha,t}$ is equal to $-a\,r_1$. Since $b > 0$, the variance is then given by
\[ \sigma_\alpha^2 = \frac{b + \sqrt{(b+2a)(b-2a)}}{2}. \]
Thus, if we are subaggregating into $n$ clusters, the MSFE is given by
\[ MSFE_{subagg} = \frac{1}{2}\left(\sum_{k=1}^{n}b_k + \sum_{k=1}^{n}\sqrt{(b_k+2a_k)(b_k-2a_k)}\right). \]
Since $\sum_{k=1}^{n}b_k$ will be the same regardless of the choice of clusters, we are minimizing the objective function given by
\[ \sum_{k=1}^{n}\sqrt{(b_k+2a_k)(b_k-2a_k)} \]
where
\[ b_k = \sum_{j=1}^{N}\sigma_j^2(1+\theta_j^2)\,y_{jk} \]
and
\[ a_k = \sum_{j=1}^{N}\sigma_j^2\theta_j\,y_{jk}, \]
where $y_{jk}$ is an indicator variable for stream $X_j$ being in cluster $C_k$.
Noting that
\[ (b_k + 2a_k) = \sum_{j=1}^{N}\sigma_j^2(1+\theta_j)^2\,y_{jk} \]
\[ (b_k - 2a_k) = \sum_{j=1}^{N}\sigma_j^2(1-\theta_j)^2\,y_{jk}, \]
the objective function can also be rewritten as
\[ \sum_{k=1}^{n}\sqrt{\sum_{j=1}^{N}\sum_{i=1}^{N}\sigma_j^2\sigma_i^2(1+\theta_j)^2(1-\theta_i)^2\,y_{jk}\,y_{ik}}. \]
□
We have results (not shown here) where we use Equation (92) in a non-linear optimization algorithm to come up with clusters. When using 10 streams with three clusters, the approach actually leads to a globally optimal solution almost every time.
We note that Equation (92) can also be rewritten as
\[ \sum_{k=1}^{n}\sqrt{\sum_{X_i, X_j \in C_k}\sigma_j^2\sigma_i^2(1+\theta_j)^2(1-\theta_i)^2}, \]
where the inner sum is taken over all pairs of streams in each cluster, including pairs of streams with themselves.
If stream $X_p$ is moved from cluster $k$ to cluster $\kappa$, then the change in the objective function is
\[ \left(\sigma_p^2(1+\theta_p)^2\sum_{X_i \in C_\kappa}\sigma_i^2(1-\theta_i)^2 + \sigma_p^2(1-\theta_p)^2\sum_{X_i \in C_\kappa}\sigma_i^2(1+\theta_i)^2\right) - \left(\sigma_p^2(1+\theta_p)^2\sum_{X_i \in C_k}\sigma_i^2(1-\theta_i)^2 + \sigma_p^2(1-\theta_p)^2\sum_{X_i \in C_k}\sigma_i^2(1+\theta_i)^2\right), \]
which provides an alternative way to cluster streams by identifying and moving the stream from one cluster to another which yields the largest drop in the objective function.
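The following Python sketch evaluates the Lemma 2 objective for a given assignment and performs the kind of greedy reassignment just described, re-evaluating the objective for each candidate move rather than using the incremental expression; the variable names and the stopping rule are our own.

import numpy as np

def ma1_objective(assign, sigma2, theta, n_clusters):
    """sum_k sqrt((b_k + 2 a_k)(b_k - 2 a_k)) for an assignment of MA(1) streams."""
    total = 0.0
    for k in range(n_clusters):
        members = np.flatnonzero(assign == k)
        plus = np.sum(sigma2[members] * (1 + theta[members]) ** 2)    # b_k + 2 a_k
        minus = np.sum(sigma2[members] * (1 - theta[members]) ** 2)   # b_k - 2 a_k
        total += np.sqrt(plus * minus)
    return total

def greedy_ma1_clustering(sigma2, theta, n_clusters, rng=None, max_passes=50):
    """Repeatedly make the single stream move that most reduces the objective."""
    rng = rng or np.random.default_rng()
    sigma2, theta = np.asarray(sigma2, float), np.asarray(theta, float)
    assign = rng.integers(0, n_clusters, size=len(theta))
    best = ma1_objective(assign, sigma2, theta, n_clusters)
    for _ in range(max_passes):
        move = None
        for p in range(len(theta)):
            original = assign[p]
            for k in range(n_clusters):
                if k == original:
                    continue
                assign[p] = k
                val = ma1_objective(assign, sigma2, theta, n_clusters)
                if val < best:
                    best, move = val, (p, k)
            assign[p] = original
        if move is None:
            break
        assign[move[0]] = move[1]
    return assign, best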

Aggregate of Two MA(1) Streams

In this subsection, we consider two MA(1) streams whose shock variances are equal to one. We demonstrate that the MA coefficient of their aggregate process (the aggregate of two MA(1) streams is always MA(1)) always lies between the two MA coefficients of the individual streams. Furthermore, we show that as the two coefficients of the two MA(1) streams are moved further apart from each other, the variance of the shocks appearing in the aggregated process increases. These two facts imply that if we are studying N individual MA(1) streams (with unit shock variance) and would like to cluster them into n clusters, then the globally optimal clustering assignment will cluster the streams along intervals. That is, the assignment will split the streams into groups whose MA coefficients are next to each other in a sorted arrangement. This implies that an efficient algorithm for obtaining a globally optimal cluster assignment consists of arranging the MA coefficients in increasing order and checking all possible “interval” clusters, without worrying that if two streams are clustered together, another stream with an MA coefficient between the two is assigned to a different cluster. This algorithm would need to check $\binom{N}{n}$ possible cluster assignments to find the globally optimal arrangement.
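A direct implementation of this interval search is sketched below in Python: the streams are sorted by MA coefficient and every way of cutting the sorted list into n contiguous, nonempty groups is scored with the Lemma 2 objective (with unit shock variances); the function and variable names, and the example inputs, are ours.

from itertools import combinations
import numpy as np

def best_interval_clusters(theta, n_clusters):
    """Enumerate contiguous groupings of the sorted MA coefficients and keep the best."""
    theta = np.asarray(theta, dtype=float)
    order = np.argsort(theta)
    N = len(theta)
    best_score, best_groups = np.inf, None
    for cuts in combinations(range(1, N), n_clusters - 1):   # interior cut points
        bounds = (0,) + cuts + (N,)
        groups = [order[bounds[i]:bounds[i + 1]] for i in range(n_clusters)]
        score = 0.0
        for g in groups:
            plus = np.sum((1 + theta[g]) ** 2)                # b_k + 2 a_k with sigma^2 = 1
            minus = np.sum((1 - theta[g]) ** 2)               # b_k - 2 a_k
            score += np.sqrt(plus * minus)
        if score < best_score:
            best_score, best_groups = score, [list(g) for g in groups]
    return best_groups, best_score

groups, score = best_interval_clusters([-0.6, -0.5, 0.1, 0.2, 0.7, 0.8], 3)
print(groups)   # contiguous groups of stream indices in sorted-coefficient order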
 Theorem 1. 
Consider two streams whose ARMA representations are given by
\[ X_{1,t} = (1 + \theta_1B)\epsilon_{1,t} \]
\[ X_{2,t} = (1 + \theta_2B)\epsilon_{2,t} \]
with $var(\epsilon_{1,t}) = var(\epsilon_{2,t}) = 1$, $\sigma_{12} = 0$ and $\theta_1 < \theta_2$. Note that $\theta_1$ and $\theta_2$ are allowed to equal 0.
The aggregated process $\{X_{1,t} + X_{2,t}\}$ is described by the MA(1) model
\[ X_{1,t} + X_{2,t} = (1 + \theta B)\epsilon_t \]
such that $\theta_1 < \theta < \theta_2$.
 Proof. 
Note that the covariance generating function of the aggregate process is
G S ( z ) = ( θ 1 + θ 2 ) z 1 + ( 2 + θ 1 2 + θ 2 2 ) + ( θ 1 + θ 2 ) z .
As long as θ 1 θ 2 (if θ 1 = θ 2 , then θ = 0 and the result still holds), this polynomial has roots a 1 and 1 / a 1 . Suppose a 1 is inside the unit circle, then according to Remark 1, θ = a 1 and v a r ( ϵ t ) = σ ϵ 2 = ( θ 1 + θ 2 ) ( 1 / a 1 ) .
We can note that the Laurent polynomial in Equation (95) has roots given by
( 2 + θ 1 2 + θ 2 2 ) ± ( 2 + θ 1 2 + θ 2 2 ) 2 4 ( θ 1 + θ 2 ) 2 2 ( θ 1 + θ 2 )
This implies that
a 1 = ( 2 + θ 1 2 + θ 2 2 ) + ( 2 + θ 1 2 + θ 2 2 ) 2 4 ( θ 1 + θ 2 ) 2 2 ( θ 1 + θ 2 )
and
θ = ( 2 + θ 1 2 + θ 2 2 ) ( 2 + θ 1 2 + θ 2 2 ) 2 4 ( θ 1 + θ 2 ) 2 2 ( θ 1 + θ 2 )
In the remainder of this section, we will prove that
θ 1 < ( 2 + θ 1 2 + θ 2 2 ) ( 2 + θ 1 2 + θ 2 2 ) 2 4 ( θ 1 + θ 2 ) 2 2 ( θ 1 + θ 2 ) < θ 2
Suppose first that θ 1 + θ 2 > 0 .
Then, we can rewrite Equation (99) as
2 θ 1 2 + 2 θ 2 θ 1 < ( 2 + θ 1 2 + θ 2 2 ) ( 2 + θ 1 2 + θ 2 2 ) 2 4 ( θ 1 + θ 2 ) 2 < 2 θ 2 2 + 2 θ 2 θ 1
or equivalently,
θ 1 2 θ 2 2 + 2 θ 2 θ 1 2 < ( 2 + θ 1 2 + θ 2 2 ) 2 4 ( θ 1 + θ 2 ) 2 < θ 2 2 θ 1 2 + 2 θ 1 θ 2 2
or
θ 1 2 θ 2 2 2 θ 2 θ 1 + 2 < ( 2 + θ 1 2 + θ 2 2 ) 2 4 ( θ 1 + θ 2 ) 2 < θ 2 2 θ 1 2 2 θ 1 θ 2 + 2
Noting that the left and right-hand sides of the inequality are always larger than zero, (102) holds if and only if the following inequality holds as well:
( θ 1 2 θ 2 2 2 θ 2 θ 1 + 2 ) 2 < ( 2 + θ 1 2 + θ 2 2 ) 2 4 ( θ 1 + θ 2 ) 2 < ( θ 2 2 θ 1 2 2 θ 1 θ 2 + 2 ) 2 .
Labeling the three sides of this inequality as A < B < C , we observe that
A = θ 1 4 θ 1 2 θ 2 2 2 θ 1 3 θ 2 + 2 θ 1 2 θ 1 2 θ 2 2 + θ 2 4 + 2 θ 2 3 θ 1 2 θ 2 2 2 θ 1 3 θ 2 + 2 θ 2 3 θ 1 + 4 θ 1 2 θ 2 2 4 θ 2 θ 1 + 2 θ 1 2 2 θ 2 2 4 θ 2 θ 1 + 4 B = 4 + 2 θ 1 2 + 2 θ 2 2 + 2 θ 1 2 + θ 1 4 + θ 1 2 θ 2 2 + 2 θ 2 2 + θ 1 2 θ 2 2 + θ 2 4 4 ( θ 1 2 + 2 θ 1 θ 2 + θ 2 2 ) C = θ 2 4 θ 2 2 θ 1 2 2 θ 2 3 θ 1 + 2 θ 2 2 θ 2 2 θ 1 2 + θ 1 4 + 2 θ 1 3 θ 2 2 θ 1 2 2 θ 2 3 θ 1 + 2 θ 1 3 θ 2 + 4 θ 2 2 θ 1 2 4 θ 1 θ 2 + 2 θ 2 2 2 θ 1 2 4 θ 1 θ 2 + 4
Removing equivalent terms and combining like terms, we observe
2 θ 1 2 θ 2 2 4 θ 1 3 θ 2 + 4 θ 1 2 + 4 θ 2 3 θ 1 4 θ 2 2 8 θ 1 θ 2 < 2 θ 1 2 θ 2 2 8 θ 1 θ 2 < θ 2 2 θ 1 2 4 θ 2 3 θ 1 + 4 θ 2 2 + 4 θ 1 3 θ 2 4 θ 1 2 8 θ 2 θ 1
which can be rewritten as
4 θ 1 3 θ 2 + 4 θ 1 2 + 4 θ 1 θ 2 3 4 θ 2 2 < 0 < 4 θ 1 θ 2 3 + 4 θ 2 2 + 4 θ 1 3 θ 2 4 θ 1 2 .
The left- and right-hand sides of this inequality are additive inverses, and the left-hand side factors as $4(\theta_1^2 - \theta_2^2)(1 - \theta_1\theta_2)$. Since $\theta_1 + \theta_2 > 0$ and $\theta_1 < \theta_2$ imply $\theta_2 > |\theta_1|$, we have $\theta_1^2 - \theta_2^2 < 0$, while $1 - \theta_1\theta_2 > 0$ for the invertible MA(1) representations considered here; hence the left-hand side is negative, the right-hand side is positive, and therefore inequality (99) holds.
Finally, if $\theta_1 + \theta_2 < 0$ in (99), then the direction of the inequalities is reversed in (100), and a similar sequence of steps leads us to observe
$$-4\theta_1^3\theta_2 + 4\theta_1^2 + 4\theta_1\theta_2^3 - 4\theta_2^2 > 0 > -4\theta_1\theta_2^3 + 4\theta_2^2 + 4\theta_1^3\theta_2 - 4\theta_1^2,$$
and by the same argument (now $\theta_1 + \theta_2 < 0$ and $\theta_1 < \theta_2$ imply $|\theta_1| > |\theta_2|$, so the left-hand side is positive), we see that (99) holds and the theorem is proved. □
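As a quick numerical sanity check of Theorem 1 (an illustrative sketch, not taken from the paper's code), the closed-form expressions above can be evaluated directly and verified against the moment equations of the aggregate MA(1) process.

```python
# Check Theorem 1 numerically: the aggregate MA(1) coefficient lies strictly
# between theta_1 and theta_2, and the implied (theta, sigma^2) reproduce the
# lag-0 and lag-1 autocovariances of the sum.
import math

def aggregate_ma1(theta1, theta2):
    D = 2 + theta1 ** 2 + theta2 ** 2      # lag-0 autocovariance of the sum
    S = theta1 + theta2                    # lag-1 autocovariance of the sum
    if S == 0:
        return 0.0, D                      # theta = 0: the sum is white noise
    root = math.sqrt(D ** 2 - 4 * S ** 2)
    return (D - root) / (2 * S), (D + root) / 2

theta1, theta2 = -0.4, 0.7
theta, sigma2 = aggregate_ma1(theta1, theta2)
assert abs(sigma2 * theta - (theta1 + theta2)) < 1e-12                            # lag-1 matches
assert abs(sigma2 * (1 + theta ** 2) - (2 + theta1 ** 2 + theta2 ** 2)) < 1e-12   # lag-0 matches
assert theta1 < theta < theta2
print(theta, sigma2)
```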
 Theorem 2. 
Consider two streams whose ARMA representations are given by
$$X_{1,t} = (1 + \theta_1 B)\,\epsilon_{1,t}$$
$$X_{2,t} = (1 + \theta_2 B)\,\epsilon_{2,t}$$
with $\sigma_1^2 = \sigma_2^2 = 1$, $\sigma_{12} = 0$, and $\theta_1 < \theta_2$. Note that $\theta_1$ and $\theta_2$ may be 0.
As the distance between $\theta_1$ and $\theta_2$ increases, the variance $\mathrm{var}(\epsilon_t) = \sigma_\epsilon^2$ of the shocks in the MA(1) representation of the aggregated process $\{X_{1,t} + X_{2,t}\}$ (see Theorem 1) increases.
 Proof. 
From Equation (95), we see that the root $1/a_1$ of $G_S(z)$, which is outside the unit circle, is given by
$$\frac{1}{a_1} = \frac{-(2 + \theta_1^2 + \theta_2^2) - \sqrt{(2 + \theta_1^2 + \theta_2^2)^2 - 4(\theta_1 + \theta_2)^2}}{2(\theta_1 + \theta_2)}.$$
Since $\sigma_\epsilon^2 = -(\theta_1 + \theta_2)/a_1$, we have
$$\sigma_\epsilon^2 = \frac{(2 + \theta_1^2 + \theta_2^2) + \sqrt{(2 + \theta_1^2 + \theta_2^2)^2 - 4(\theta_1 + \theta_2)^2}}{2}.$$
To show that this is increasing in $\theta_2$, consider the derivative of the above with respect to $\theta_2$:
$$\frac{\partial \sigma_\epsilon^2}{\partial \theta_2} = \theta_2 + \frac{\theta_2(2 + \theta_1^2 + \theta_2^2) - 2(\theta_1 + \theta_2)}{\sqrt{(2 + \theta_1^2 + \theta_2^2)^2 - 4(\theta_1 + \theta_2)^2}}. \qquad (109)$$
We need to show that (109) is larger than zero; thus, consider
$$\frac{\partial \sigma_\epsilon^2}{\partial \theta_2} = \theta_2 + \frac{\theta_2(2 + \theta_1^2 + \theta_2^2) - 2(\theta_1 + \theta_2)}{\sqrt{(2 + \theta_1^2 + \theta_2^2)^2 - 4(\theta_1 + \theta_2)^2}} > 0 \qquad (110)$$
or equivalently
$$\theta_2(2 + \theta_1^2 + \theta_2^2) - 2(\theta_1 + \theta_2) > -\theta_2\sqrt{(2 + \theta_1^2 + \theta_2^2)^2 - 4(\theta_1 + \theta_2)^2},$$
which simplifies to
$$2(\theta_1 + \theta_2) - \theta_2(2 + \theta_1^2 + \theta_2^2) < \theta_2\sqrt{(2 + \theta_1^2 + \theta_2^2)^2 - 4(\theta_1 + \theta_2)^2}$$
or
$$2\theta_1 - \theta_1^2\theta_2 - \theta_2^3 < \theta_2\sqrt{(2 + \theta_1^2 + \theta_2^2)^2 - 4(\theta_1 + \theta_2)^2}. \qquad (113)$$
We will refer to the left- and right-hand sides of the inequality in (113) as $LHS$ and $RHS$. Note that squaring both yields
$$LHS^2 = 4\theta_1^2 - 4\theta_1^3\theta_2 - 4\theta_1\theta_2^3 + \theta_1^4\theta_2^2 + 2\theta_1^2\theta_2^4 + \theta_2^6,$$
$$RHS^2 = 4\theta_2^2 + 4\theta_1^2\theta_2^2 + \theta_1^4\theta_2^2 + 2\theta_1^2\theta_2^4 + \theta_2^6 - 4\theta_1^2\theta_2^2 - 8\theta_1\theta_2^3.$$
Note that, furthermore,
$$LHS^2 - RHS^2 = 4\theta_1^2 - 4\theta_2^2 - 4\theta_1^3\theta_2 + 4\theta_1\theta_2^3 = 4(1 - \theta_1\theta_2)(\theta_1^2 - \theta_2^2). \qquad (115)$$
Suppose first that $\theta_1 < \theta_2 = 0$. Note that (113) reduces to $2\theta_1 < 0$, which holds, and therefore (110) holds as well.
Next, suppose that $0 = \theta_1 < \theta_2$. Note that (113) reduces to
$$-\theta_2^3 < \theta_2\sqrt{(2 + \theta_2^2)^2 - 4\theta_2^2},$$
which holds as well since the left-hand side is negative in this case.
In the remainder of the proof, we assume that $\theta_1 \neq 0$ and $\theta_2 \neq 0$. Consider first the case $|\theta_2| > |\theta_1|$. Since $\theta_2 > \theta_1$, this implies that $\theta_2 > 0$ and that (115) is less than zero. Thus, in this case $LHS^2 < RHS^2$, and since $RHS = \theta_2\sqrt{\,\cdot\,} > 0$, it follows that $LHS < RHS$ in (113). Therefore, (110) holds in this case as well.
Now, suppose that $|\theta_1| > |\theta_2|$ and that $\theta_2 > 0$. Since $\theta_1 < \theta_2$, the former implies $\theta_1 < 0$, so every term of $LHS = 2\theta_1 - \theta_1^2\theta_2 - \theta_2^3$ is negative and $LHS < 0$, while $RHS > 0$. Therefore $LHS < RHS$, and (110) holds in this case as well.
Finally, suppose that $|\theta_1| > |\theta_2|$ and that $\theta_2 < 0$, so that $\theta_1 < \theta_2 < 0$. Here $RHS < 0$, and since $|\theta_2| < |\theta_1| \le 1$ we have $LHS = 2\theta_1 - \theta_2(\theta_1^2 + \theta_2^2) < 2\theta_1 + 2|\theta_1| = 0$. The former condition again implies that $LHS^2 - RHS^2 > 0$, so $|LHS| > |RHS|$ and hence $LHS = -|LHS| < -|RHS| = RHS$. Therefore, $LHS < RHS$, and (110) holds in all cases. □
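The monotonicity in Theorem 2 is easy to visualize numerically; the following sketch (illustrative only, not taken from the paper's code) evaluates the closed form for $\sigma_\epsilon^2$ as $\theta_2$ moves away from a fixed $\theta_1$.

```python
# Check Theorem 2 numerically: for fixed theta_1, the innovation variance of the
# aggregate MA(1) process increases as theta_2 moves further from theta_1.
import math

def aggregate_shock_variance(theta1, theta2):
    D = 2 + theta1 ** 2 + theta2 ** 2
    S = theta1 + theta2
    return (D + math.sqrt(D ** 2 - 4 * S ** 2)) / 2

theta1 = -0.2
variances = [aggregate_shock_variance(theta1, t2)
             for t2 in (-0.1, 0.1, 0.3, 0.5, 0.7, 0.9)]
assert all(a < b for a, b in zip(variances, variances[1:]))  # strictly increasing in theta_2
print(variances)
```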

7. Demand Streams Produced by Identical ARMA Models

Note that Theorems 1 and 2 imply that the algorithm prescribed at the beginning of the previous subsection always leads to a globally optimal solution. The following lemma establishes that if two streams are generated by the same ARMA model, then their aggregate also follows that same ARMA model. Therefore, we can greatly reduce the number of streams that need to be assigned to clusters by first aggregating demand streams generated by equivalent models.
 Lemma 3. 
Consider two sequences $\{X_{1,t}\}$ and $\{X_{2,t}\}$ that have the same ARMA representation with respect to Wold shock sequences $\{\epsilon_{1,t}\}$ and $\{\epsilon_{2,t}\}$, given by
$$\Phi(B) X_{1,t} = \Theta(B)\,\epsilon_{1,t}$$
$$\Phi(B) X_{2,t} = \Theta(B)\,\epsilon_{2,t}$$
such that the variances of the shock sequences are $\sigma_1^2$ and $\sigma_2^2$, with covariance $\sigma_{12}$.
The aggregate $\{S_t = X_{1,t} + X_{2,t}\}$ also has the same ARMA representation with respect to its Wold shocks $\{\epsilon_t\}$, given by
$$\Phi(B) S_t = \Theta(B)\,\epsilon_t$$
such that the variance of $\{\epsilon_t\}$ is given by $\sigma_1^2 + \sigma_2^2 + 2\sigma_{12}$.
 Proof. 
From Remark 1, we note that the ARMA representation of $\{S_t\}$ is given by
$$\Phi^*(B) S_t = \Theta^*(B)\,\epsilon_t$$
such that $\Theta^*(z) = \prod_{i=1}^{m}(1 - a_i z)$, where $\{a_i\}$ are the roots of $O(z)P(z)$ on or inside the unit circle, and $\Phi^*(z) = \prod_{i=1}^{n}(1 - b_i z)$, where $\{b_i\}$ are the roots of $Q(z)$ inside the unit circle, with $O(z)$, $P(z)$ and $Q(z)$ obtained from the covariance generating function $G_S(z)$ given by
$$G_S(z) = (\sigma_1^2 + \sigma_2^2)\,\frac{\Theta(z)\Theta(z^{-1})}{\Phi(z)\Phi(z^{-1})} + 2\sigma_{12}\,\frac{\Theta(z)\Theta(z^{-1})}{\Phi(z)\Phi(z^{-1})}$$
as per (76). This can be further simplified to
$$G_S(z) = (\sigma_1^2 + \sigma_2^2 + 2\sigma_{12})\,\frac{\Theta(z)\Theta(z^{-1})}{\Phi(z)\Phi(z^{-1})},$$
and therefore $\Phi^*(z) = \Phi(z)$ and $\Theta^*(z) = \Theta(z)$, and the result is proved. □
Lemma 3 shows us that if we have $n$ demand sequences $X_{1,t}, \ldots, X_{n,t}$ generated by models with the same ARMA coefficients with respect to their Wold shocks, then (by repeated application of the lemma) their aggregate has those same ARMA coefficients. Therefore, if the customer base of a firm comprises many demand streams having the same ARMA representation, it is possible to greatly reduce the number of streams that need to be considered for clustering by first aggregating these equivalent streams.
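The following simulation sketch (illustrative only; the parameter values and the use of statsmodels are our own choices, not the paper's) checks Lemma 3 for two streams sharing an ARMA(1,1) model with contemporaneously correlated shocks: fitting an ARMA(1,1) model to their sum should recover the common coefficients, with an innovation variance close to $\sigma_1^2 + \sigma_2^2 + 2\sigma_{12}$.

```python
# Simulation check of Lemma 3: the sum of two streams generated by the same
# ARMA(1,1) model follows that model, with innovation variance
# sigma1^2 + sigma2^2 + 2*sigma12.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
phi, theta = 0.6, 0.4
sigma1_sq, sigma2_sq, sigma12 = 1.0, 2.0, 0.5
n = 20000

# contemporaneously correlated shock sequences
cov = [[sigma1_sq, sigma12], [sigma12, sigma2_sq]]
eps = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)

def simulate_arma11(e, phi, theta):
    x = np.zeros_like(e)
    for t in range(1, len(e)):
        x[t] = phi * x[t - 1] + e[t] + theta * e[t - 1]
    return x

s = simulate_arma11(eps[:, 0], phi, theta) + simulate_arma11(eps[:, 1], phi, theta)

fit = ARIMA(s, order=(1, 0, 1), trend="n").fit()
print(fit.params)  # estimated AR coefficient, MA coefficient, and innovation variance
print(phi, theta, sigma1_sq + sigma2_sq + 2 * sigma12)  # Lemma 3 targets: 0.6, 0.4, 4.0
```

The fitted values match the Lemma 3 targets only up to sampling error, which shrinks as the simulated sample grows.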

8. Extensions and Other Questions

In this paper, we compare the theoretical MSFEs of a firm forecasting its lead-time demand based on disaggregated (individual) demand streams and on subaggregated clusters formed from those streams. We highlight examples illustrating that the MSFE based on subaggregates need not be much larger than the MSFE based on the disaggregated streams, as long as those clusters are well formed. We propose a Pivot Clustering algorithm that searches over cluster assignments for one that minimizes the MSFE. We end with theoretical results for demand streams generated by MA(1) models, showing that the clusters yielding the lowest MSFE are formed by grouping streams according to the proximity of their MA coefficients.
The MA(1) case hints that in the general ARMA case, the "best" clusters would be formed based on the proximity of the ARMA coefficients of the models generating the various streams (or, equivalently, based on the proximity of the roots of the AR and MA polynomials). Alternatively, the best cluster assignments may result from grouping streams with the most similar coefficients in their MA($\infty$) representations. Future work could establish which approach is preferable.
Our current theoretical approach based on general ARMA models is limited in that root-finding algorithms become unstable once the degree of the polynomial involved grows too large. In our study, this begins to occur when we consider the aggregate of around twenty streams with at least one AR coefficient. It is possible to greatly reduce the dimensionality, however, by first aggregating demand streams produced by identical or nearly identical ARMA models. This is another direction for future research.

Author Contributions

All authors contributed equally to conceptualization, methodology, formal analysis and writing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Runtimes of Pivot Clustering for assigning streams to four clusters using theoretical and estimated MSFEs. The graph contains the average runtime of Pivot Clustering for each specific number of streams being assigned to four clusters. That is, for N streams (between 10 and 20), we consider twenty randomly assigned ARMA(1,1) models for each stream. Pivot Clustering (with theoretical and estimated MSFE) is then used to obtain the four optimal clusters. The average runtime of Pivot Clustering is approximately 63 s when twenty streams are assigned to clusters using theoretical MSFE and 40 s using estimated MSFE.
Figure 2. Theoretical subaggregated MSFE for random initialization of Pivot Clustering. Theoretical MSFEs are computed on the four clusters obtained by Pivot Clustering for different random initializations. The MSFE of the initial random assignment is provided as well as the MSFE that is obtained by Pivot Clustering. Horizontal lines are drawn to represent the MSFE based on the fully aggregated demand sequence (top) and the MSFE based on the fully disaggregated demand sequences (bottom).
Figure 3. Estimated subaggregated MSFE for random initialization of Pivot Clustering. Estimated MSFEs are computed for different initializations of Pivot Clustering. The MSFE of the initial random assignment is provided as well as the MSFE that is obtained by Pivot Clustering. Horizontal lines are drawn to represent the MSFE based on the fully aggregated demand sequence (top) and the MSFE based on the fully disaggregated demand sequences (bottom).
Figure 4. Theoretical subaggregated MSFE found by Pivot Clustering for different sets of streams. Theoretical MSFEs are computed for twenty simulations using a different set of twenty streams in each simulation. We note that using the individual streams to forecast leads to the lowest MSFE, while basing the forecast on the aggregate of the streams always leads to the highest MSFE. If subaggregated clusters are formed from the streams, the lowest MSFEs are obtained when the clusters are based on Pivot Clustering.
Figure 5. Estimated subaggregated MSFE found by Pivot Clustering for different sets of streams. Estimated MSFEs are computed for twenty simulations using a different set of twenty streams in each simulation. We note that using the individual streams to forecast leads to the lowest MSFE, while basing the forecast on the aggregate of the streams always leads to the highest MSFE. If subaggregated clusters are formed from the streams, the lowest MSFEs are obtained when the clusters are based on Pivot Clustering.
Figure 6. Theoretical subaggregated MSFE found by Pivot Clustering for different sets of streams. Theoretical MSFEs are computed for twenty simulations using a different set of twenty streams in each simulation and different initial assignments of streams to clusters. When averaging the final MSFEs over the different initializations for each set of streams, we note that using the individual streams to forecast leads to the lowest averaged MSFE, while basing the forecast on the aggregate of the streams always leads to the highest averaged MSFE. If subaggregated clusters are formed from the streams, the lowest averaged MSFEs are obtained when the clusters are based on Pivot Clustering.
Figure 7. Estimated subaggregated MSFE found by Pivot Clustering for different sets of streams. Estimated MSFEs are computed for twenty simulations using a different set of twenty streams in each simulation and different initial assignments of streams to clusters. When averaging the final MSFEs over the different initializations for each set of streams, we note that using the individual streams to forecast leads to the lowest averaged MSFE, while basing the forecast on the aggregate of the streams always leads to the highest averaged MSFE. If subaggregated clusters are formed from the streams, the lowest averaged MSFEs are obtained when the clusters are based on Pivot Clustering.
Figure 8. Estimated MSFEs when using aggregated, disaggregated, and subaggregated series to forecast one-step-ahead demand for series that are not generated by an ARMA process. A total of 10 simulations are performed; within each simulation, twenty separate demand realizations are generated according to an ARFIMA(0, d, 0) model with a different d for each realization, such that the shocks appearing in the ARFIMA models are contemporaneously correlated. Pivot Clustering is carried out for five random initializations of assigning the streams to one of four clusters. We compute the estimated MSFEs for the disaggregated series (blue), the aggregated series (black), subaggregated clusters generated using random assignment (red), and subaggregated clusters generated using the result of Pivot Clustering. We note the superior performance of Pivot Clustering in all ten simulations.
Table 1. The MSFEs for various clusters of three, two, and five streams.

MSFE          Clusters
52.34495576   6,10,9 and 1,2 and 8,3,4,5,7
51.90912188   2,10,5 and 3,8 and 4,1,7,6,9
31.40789218   7,2,10 and 8,9 and 5,4,6,3,1
44.15962369   10,1,7 and 3,4 and 8,2,9,6,5
50.32525078   5,8,3 and 1,10 and 4,2,7,6,9
39.31100769   6,9,5 and 10,7 and 1,3,8,4,2
45.09358141   3,1,10 and 5,8 and 2,4,9,7,6
51.54828609   9,5,10 and 4,2 and 7,6,8,1,3
34.21154829   8,7,2 and 1,9 and 6,4,5,10,3
55.21445794   6,1,10 and 2,4 and 7,3,8,5,9
Table 2. Twenty simulations are carried out; within each simulation we select a new set of ARMA(1,1) coefficients for each of 10 streams. The streams are then clustered via Pivot Clustering (using theoretical MSFE), starting from 10 different initial groupings. We also obtain the optimal (minimum-MSFE) clustering assignment via an exhaustive search over all possible assignments of streams to clusters. Each row corresponds to a new simulation. The three columns contain the mean, median, and minimum ratios of the globally optimal MSFE to the MSFE obtained by Pivot Clustering, computed over those initializations for which Pivot Clustering did not find the optimal solution.

Simulation   Average MSFE Ratio (Global/Pivot)   Median MSFE Ratio (Global/Pivot)   Minimum MSFE Ratio (Global/Pivot)
1            0.9681                              0.9681                             0.9624
2            0.9774                              0.9767                             0.9748
3            0.9894                              0.9890                             0.9846
4            0.9865                              0.9865                             0.9865
5            0.9484                              0.9479                             0.9284
6            0.9959                              0.9963                             0.9952
7            0.9721                              0.9810                             0.9435
8            0.9933                              0.9933                             0.9933
9            0.9172                              0.8920                             0.8920
10           0.9809                              0.9809                             0.9809
11           0.9762                              0.9861                             0.9310
12           0.9902                              0.9894                             0.9864
13           0.9877                              0.9877                             0.9877
14           0.9890                              0.9991                             0.9688
15           0.8417                              0.8481                             0.8027
16           0.9568                              0.9568                             0.9315
17           0.9916                              0.9952                             0.9785
18           0.9942                              0.9965                             0.9802
19           0.9920                              0.9948                             0.9695
20           0.9946                              0.9946                             0.9946