Impact of Information Sharing and Forecast Combination on Fast-Moving-Consumer-Goods Demand Forecast Accuracy

Abstract: This article empirically demonstrates the impacts of truthfully sharing forecast information and using forecast combinations in a fast-moving-consumer-goods (FMCG) supply chain. Although it is known a priori that sharing information improves the overall efficiency of a supply chain, information such as pricing or promotional strategy is often kept proprietary for competitive reasons. In this regard, it is herein shown that simply sharing the retail-level forecasts (which does not reveal the exact business strategy, due to the effect of omni-channel sales) yields nearly all the benefits of sharing all pertinent information that influences FMCG demand. In addition, various forecast combination methods are used to further stabilize the forecasts, in situations where multiple forecasting models are used during operation. In other words, it is shown that combining forecasts is less risky than "betting" on any component model.


Introduction
Demand forecasting at various forecast horizons is a key component in manufacturing operation management. Although the demand-forecasting problem has been addressed extensively in the literature and from many perspectives, the evolution of the complex supply chain (such as exposure to new data types, new retailing regimes, or new operation strategies) creates new challenges and opportunities for more accurate forecasts.
Forecast skill generally depends on two factors: the amount of data (information) that is available, and the depth of understanding about the underlying data-generating process. Whereas many papers in the literature focus on better inferring the underlying process, i.e., proposing various models to improve forecasts for a particular application, the integration of information across a supply chain is relatively less discussed. From an operational perspective, it is well understood that coordination across an entire supply chain creates win-win situations for all players in that supply chain. Such coordination is not limited to the flow of materials, but also requires an effective flow of information [1,2]. In a typical supply chain, demand-related data are abundant, ranging from a news release on a catastrophic event that may affect the global demand of a particular product, to a mobile-device advertising campaign launched by a local retailer. It is almost impossible for a particular player in a supply chain to obtain, manage, and use all of this information during forecasting. As a result, demand forecasts are often generated by using only an organization's internal data. For example, a manufacturer usually produces forecasts based on historical order data; the accuracy of such forecasts is often limited. The first contribution of this paper is therefore to investigate the benefits of forecast information sharing in a supply chain.

Literature Review
The literature review is divided into four parts. A summary is presented at the end of the section to relate the various ideas herein discussed.

On Information Sharing
Although it is known a priori that more information, provided that it is relevant, leads to better forecasts, the supply-chain literature on this topic seems to be divided. On one hand, unsurprisingly, there are abundant papers in the literature, many being empirical, showing that sharing historical demand information between downstream retailers and upstream suppliers is valuable. On the other hand, a stream of research says that, under certain conditions, sharing of demand information from downstream retailers to upstream suppliers is not valuable, as the historical time series data of the retailers' orders provide sufficient information for suppliers to impute the end-consumer demand process. In other words, the divided insights partially depend on whether the retailer-level demand is inferable from the retailers' order data (see [7,8] and references therein for more details).
In a recent study, Ali and Boylan [9] showed that the part of the literature that supports "no value of information sharing" mostly relies on certain restrictive assumptions, some of which may not hold in practice. For example, Cui et al. [10] showed that there is indeed value in sharing the demand time series. Although they use a demand-process model similar to that of the papers showing "no value of information sharing", they find that, when retailers change their inventory policy (contingent on some private information), sharing the demand time series is valuable for the supply chain. The reader is referred to [10,11] for further information on this debate.
Apart from the general debate on information sharing, there are also more focused works on forecast sharing in supply chains (e.g., [12-15]). These works are mostly theoretical and/or deal with supply-chain contracting. For instance, the authors in [13,14] considered a scenario where the retailer may game the system by strategically distorting the shared information. The game-theoretic interaction between retailers and suppliers, i.e., the continuum between the extremes of "all-or-nothing" cooperation (situations where the supply-chain members either absolutely trust each other and cooperate, or do not trust each other at all), was modeled. Nevertheless, such behavior can be argued away by invoking the Folk theorem, a class of theorems about possible Nash equilibrium payoff profiles in repeated games [16]. Even when the interactions are not long-term, truthful sharing of forecasts can be encouraged by using appropriately designed contracts [15]. To that end, this paper assumes that all shared forecasts are truthful. Based on this assumption, this paper, along with the aforementioned papers, provides a strong prescription for supply-chain managers: sharing forecast information truthfully provides most of the benefits derived by sharing more (potentially sensitive) details, such as pricing or promotional strategies.

On Forecast Reconciliation
The concept of reconciliation emerges when the forecast quantity, such as renewable power generation [17-19], electrical load [20], tourism demand [21], or FMCG demand [22], can be modeled as a hierarchy. Due to the different information sets available at various levels in the hierarchy (e.g., retailers' business strategy is opaque to the distributor) and to modeling uncertainties, the lower-level forecasts almost surely do not sum up to the higher-level forecasts. When such aggregation inconsistency is observed, decision makers are challenged with a selection problem, i.e., which set of forecasts should be used? In the case of FMCG forecasting, such inconsistency contradicts the core idea of manufacturing operation management, namely, effective and efficient supply-demand matching.
In the literature, there are several well-known aggregate-consistent forecasting methods, such as the bottom-up approach, top-down approach, or middle-out approach [21]. The bottom-up approach assumes the forecasts at a higher level of the hierarchy are the arithmetic sum of the bottom-level forecasts. In this case, a manufacturer does not make its own forecasts but awaits retailers' forecasts (sometimes in the form of order data). However, due to the need to create "safety stock", retailers often over-forecast their demand. The excess demand propagates up the supply chain and results in a phenomenon commonly known as the bullwhip effect [23,24], which leads to inefficiency in the supply chain. Another aggregate-consistent forecasting method is the top-down approach, which assumes the forecasts of lower-level demand are fractions of the manufacturer's total forecast. There are several ways to assign such fractions. However, such demand assignment often leads to inventory shortage/excess at lower levels. The middle-out approach combines the previous two approaches by initiating the forecast from a middle level of the demand hierarchy; the forecasts on other levels are aggregated or disaggregated using the middle-level forecasts. However, when the supply chain is complex and contains many levels, identifying the "middle-level player" is not trivial; it also involves contracting issues that may complicate operation.
Recently, through a series of papers [21,25-27], the optimal reconciliation technique (detailed in the next section) was demonstrated. This technique takes the independently-made forecasts at various levels in a hierarchy (also known as the base forecasts), and produces a set of aggregate-consistent forecasts. Since the base forecasts are generated by different supply-chain entities, most likely using the best information available to these entities, they are often more accurate. Furthermore, as only the forecasts are required, neither the forecasting model nor the raw data needs to be revealed to the other entities. Due to omni-channel sales, even when a particular forecast shows an increase in demand, the specific business strategy that causes that boost remains opaque to the peers, which protects the supply-chain entity that made that forecast. There are several versions of optimal reconciliation; this paper does not consider all of them. Instead, the aim is to demonstrate the benefits of forecast sharing, and thus only one version is used to exemplify the reconciliation procedure.

On Forecast Combination
Twenty years after the paper by Bates and Granger [6], Granger [28] noted in his review that the combination of forecasts is a simple, pragmatic, and sensible way to possibly produce better forecasts. Forecast combination often works even when some component forecasts are inefficient. For such reasons, forecast combination has become a standard practice in many scientific domains (e.g., [29-31]).
The core idea of forecast combination is to produce a final forecast using several component forecasts. Suppose $N$ component forecasts, $\hat{y}^{(i)}_t$, $i \in \{1, \cdots, N\}$, are available; the combined forecast takes the form:
$$\hat{y}^{(c)}_t = \sum_{i=1}^{N} \alpha_i \hat{y}^{(i)}_t, \tag{1}$$
where $\alpha_i$ are the weights placed on the component forecasts. It is immediately clear that there are many ways to estimate those weights. de Menezes et al. [32] summarized seven well-established methods that were thought to be good representatives of varying degrees of sophistication. The reader is referred to the original paper for details. Among the seven methods discussed by de Menezes et al. [32], the regression-based combination has the most variants. This paper considers several popular methods including ordinary least squares, least absolute deviations, lasso, and complete subset regression.

On FMCG Forecasting
FMCG demand forecasting, or any other forecasting, is essentially an input-output matching problem, where the input is the historical and current information, and the output is the forecast. To construct an appropriate forecasting model, many methods are available. For instance, econometric approaches (e.g., [33,34]) and machine-learning approaches (e.g., [35]) have both been used in the demand forecasting literature. The reader is referred to the work by Fildes et al. [36] for a comprehensive review on forecasting in operations research. In the supply-chain-planning process, due to the large number of factors that may affect demand, forecasting is often done in two steps: an automated forecasting system produces an initial demand forecast, and adjustments are subsequently made based on either an expert's view or some other algorithm [37].
General-purpose univariate forecasting methods, such as the autoregressive integrated moving average (ARIMA) family of models, or the exponential smoothing (ETS) family of models, are available in many statistical software environments, such as R or Python. They are therefore commonly utilized by the industry. Since forecasts that consider domain knowledge (exogenous factors) are generally better than those solely relying on univariate modeling, the output of a univariate forecasting method can be further adjusted using domain knowledge. The base-times-lift model [38] is one such approach.
Besides postprocessing the forecasts, the exogenous information can also be built into the forecasting models directly. In the case of FMCG demand forecasting, Huang et al. [39] considered the autoregressive distributed lag (ADL) model, which showed excellent performance on their dataset. A large number of exogenous variables, as well as the lagged versions of these variables, were used in their ADL model. The model is quite general and customizable. In its complete form, the model regresses the demand of the focal product, $y_0$, on its own lagged values and on the following exogenous variables and their lagged versions: $p_0$ and $p_p$, the prices of the focal and competitor products; and $I_0$ and $I_q$, the promotion indices of the focal and competitor products. Together with the 12-month dummy variables and the dummy variables for nine major public holidays (and the weeks before them) in the US, if the three lag orders and the numbers of competitor products, $P$ and $Q$, are all set to 2, the model would have 48 exogenous variables. Huang et al. [39] showed that the ADL model not only outperformed the ETS and base-times-lift models, but also marginally outperformed the reduced (without competitor information) ADL model. However, only a reduced model is considered in this paper because the real-time information on pricing and promotional strategy, as well as the corresponding demand time series faced by the competitors, is seldom available in practice. Moreover, a reduced model is less likely to be over-specified as compared to the full model.

Section Summary
In view of the above literature review, the following aspects are discussed in the remaining part of the paper:

1. Various information-sharing strategies in hierarchical FMCG demand forecasting. Four cases are elaborated, based on different levels of information sharing. Forecasting models are selected based on the available information in each case. More specifically, Case I (see below) considers univariate forecasting models (ARIMA and ETS); Case II considers a reduced ADL model; the forecasting models for Case III utilize hierarchical reconciliation; and Case IV again uses the ADL model, but with more predictors.

2. Hierarchical reconciliation procedure. It has three main steps: (1) arrange the data into a hierarchical structure, (2) generate base forecasts, and (3) reconcile the base forecasts. Two methods, namely, the bottom-up approach and the optimal reconciliation, are used.

3. Effect of combining forecasts. After the forecast accuracies under different levels of information sharing are compared, all models are subsequently treated as component models to investigate the effect of combining forecasts. A total of seven forecast combination methods are considered in this paper.

Cases of Different Levels of Information Sharing and Hierarchical Reconciliation
In this section, details of the supply-chain structure and the different cases of information sharing are elaborated. More specifically, four cases are considered: (1) no information other than the past demand is shared (Case I); (2) the past demand, price, and promotional information, i.e., all past pertinent information, is shared (Case II); (3) only the past demand time series and the retailers' forecasts are shared (Case III); and (4) the past demand, price, and promotion time series, as well as the future planned pricing and promotional campaign strategies, are shared (Case IV).
These cases are broadly arranged in increasing order of the amount of shared information. Case IV is the one where all pertinent information is shared, so naturally it is expected to perform the best. However, this case is highly unrealistic, as future price and promotional campaign strategies are highly sensitive information that retailers may not be willing to share. Case II also comes with practical challenges, as it would require retailers to divulge their historical pricing and promotional strategies. However, one may argue that this information may be less sensitive than what is needed for Case IV. Case I, on the other hand, does not have any challenge in terms of feasibility; nevertheless, its forecast accuracy may be greatly limited by the lack of information. Finally, Case III comes with relatively fewer practical challenges than Cases II and IV; it is thus argued to be the most feasible case of information sharing in the retail industry.
Before the forecasting models for each case are revealed, the hierarchical nature of the supply chain is exemplified. The number of levels in a demand hierarchy and the aggregation/disaggregation interpretation may vary according to the granularity of the supply-chain structure. For example, one may either disaggregate the total demand based on product types or do so geographically. Without loss of generality, Figure 1 shows a three-level hierarchical demand time series structure, which will be used throughout this paper. In Figure 1, the level 0 ($L_0$) demand is disaggregated based on $n$ universal product codes (UPCs) to form the time series at level 1 ($L_1$). At $L_1$, $y_{i,t}$ represents the demand of UPC $i$ at time $t$. For each UPC, the demand is further disaggregated based on stores to form the level 2 ($L_2$) time series. Here, $y_{ij,t}$ represents the demand of UPC $i$ at store $j$ at time $t$. Let $m_i$ be the number of stores that sell UPC $i$; then the total number of $L_2$ demand time series is $\sum_{i=1}^{n} m_i$.

Case I
Case I assumes that the business strategies of the downstream retailers are completely opaque to their supplier. In the context of Figure 1, it implies that the pricing and promotional strategies at $L_2$ are unknown to $L_0$ and $L_1$. Under this setting, the information set at time $t$ available to the supplier is
$$\Omega^{\mathrm{I}}_t = \{Y_{L_2,1}, \cdots, Y_{L_2,t-1}\}, \tag{3}$$
where
$$Y_{L_2,t'} = (y_{11,t'}, \cdots, y_{1m_1,t'}, \cdots, y_{n1,t'}, \cdots, y_{nm_n,t'})^\top, \quad t' \in \{1, \cdots, t-1\}, \tag{4}$$
is the historical retailer-level demand.
Using $\Omega^{\mathrm{I}}_t$, the supplier can come up with an $L_1$ forecast by first aggregating the demand data:
$$y_{i,t'} = \sum_{j=1}^{m_i} y_{ij,t'}, \tag{5}$$
so that the $n$ UPC-level demand time series $\{y_{i,1}, \cdots, y_{i,t-1}\}$, $i \in \{1, \cdots, n\}$, are obtained. Subsequently, univariate forecasting methods can be used on the aggregated time series. In this work, two frequently used univariate families of models are considered for Case I, namely, ARIMA and ETS. Furthermore, the case number and the model abbreviation in small capitals are jointly used to denote a model, e.g., the ARIMA family of models for Case I is denoted as C1ARIMA.

Case II
In this case, it is assumed that the retailers share the historical price, promotion, and the corresponding demand information with their supplier. Sharing such information does not pose severe practical challenges from an operational perspective. In fact, for those supply-chain relationships governed by long-term contracts, such information can be shared by enabling connectivity between IT systems of the firms. However, from a strategic perspective, retailers may not want to divulge their historical pricing and promotional strategy, since such information may be useful for their competitors in updating their priors. In other cases when the competitors also source from the same supplier, leakage of competitive information may not be ruled out completely. Therefore, this case poses certain strategic barriers to being implemented in practice.
The information set in this case is:
$$\Omega^{\mathrm{II}}_t = \{Y_{L_2,1}, \cdots, Y_{L_2,t-1}, P_{L_2,1}, \cdots, P_{L_2,t-1}, I_{L_2,1}, \cdots, I_{L_2,t-1}\}, \tag{6}$$
where the additional terms to Case I, namely,
$$P_{L_2,t'} = (p_{11,t'}, \cdots, p_{1m_1,t'}, \cdots, p_{n1,t'}, \cdots, p_{nm_n,t'})^\top, \quad t' \in \{1, \cdots, t-1\}, \tag{7}$$
$$I_{L_2,t'} = (I_{11,t'}, \cdots, I_{1m_1,t'}, \cdots, I_{n1,t'}, \cdots, I_{nm_n,t'})^\top, \tag{8}$$
are the historical retailer-level price and promotion information, respectively. Whereas the $L_1$ demand is the simple summation of the $L_2$ demand (see Equation (5)), the $L_1$ price and promotional information needs to be strategically aggregated. A weighted-average approach is used:
$$p_{i,t'} = \frac{\sum_{j=1}^{m_i} \mathrm{ACV}_j \, p_{ij,t'}}{\sum_{j=1}^{m_i} \mathrm{ACV}_j}, \tag{9}$$
$$I_{i,t'} = \frac{\sum_{j=1}^{m_i} \mathrm{ACV}_j \, I_{ij,t'}}{\sum_{j=1}^{m_i} \mathrm{ACV}_j}, \tag{10}$$
when aggregating $P_{L_2,t'}$ and $I_{L_2,t'}$. $\mathrm{ACV}_j$ is the all commodity volume (ACV) of retailer $j$, i.e., the annual sales volume of retailer $j$. Equations (9) and (10) average the price and promotion index of each retailer according to the store's ACV. The rationale is that a bigger store receives a higher demand during a promotional campaign than a smaller store with the same promotion.
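The aggregation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `aggregate_upc`, the two-store toy data, and the ACV values are hypothetical and are not taken from the Dominick's dataset.

```python
def aggregate_upc(demand, price, promo, acv):
    """Aggregate store-level series of one UPC to the UPC level.

    demand, price, promo: lists of per-store lists (one value per week).
    acv: per-store all commodity volumes (hypothetical values here).
    """
    n_weeks = len(demand[0])
    total_acv = sum(acv)
    # Demand is a plain sum over stores.
    y = [sum(store[t] for store in demand) for t in range(n_weeks)]
    # Price and promotion index are ACV-weighted averages over stores.
    p = [sum(a * s[t] for a, s in zip(acv, price)) / total_acv
         for t in range(n_weeks)]
    idx = [sum(a * s[t] for a, s in zip(acv, promo)) / total_acv
           for t in range(n_weeks)]
    return y, p, idx


# Toy example: two stores, two weeks; store 2 is three times larger.
demand = [[10, 12], [30, 28]]
price = [[2.0, 1.0], [2.0, 2.0]]
promo = [[0, 1], [0, 0]]
acv = [1.0, 3.0]
y, p, idx = aggregate_upc(demand, price, promo, acv)
```

In the toy example, the promotion run by the smaller store in week 2 moves the aggregated promotion index only to 0.25, which reflects the ACV weighting.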
Considering the exogenous factors, the $L_1$ forecasts for Case II can be produced via a regression model, Equation (11), which is fitted separately for each UPC $i \in \{1, \cdots, n\}$ under consideration. Three lag-order parameters control the numbers of lagged values of the dependent variable and of each explanatory variable. The regression parameters in Equation (11), namely, $\alpha^{i,t}_0$, $\alpha^{i,t}_k$, $\beta^{i,t}_k$, and $\gamma^{i,t}_k$, are designed as adaptive parameters, to capture the changing dynamics in the demand series. After the regression parameters are estimated, the forecast for UPC $i$ at time $t$ is given by the corresponding prediction equation, Equation (12). As parameter estimation is not the primary concern of this paper, least squares fitting is used throughout the text. Furthermore, the model equations (such as Equation (11)) and the prediction equations (such as Equation (12)) will be used interchangeably hereafter. In summary, a reduced form of the ADL model is used for Case II; it is denoted as C2ADL.
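To make the least-squares fitting of a reduced ADL-type regression concrete, a minimal sketch is given below. It assumes a plain linear specification with an intercept, lagged demand, lagged price, and lagged promotion index; the function names, the lag orders `lag_y` and `lag_x`, and the simulated data are hypothetical, and the exact specification used in this paper (e.g., any transformation of the variables) may differ.

```python
import numpy as np


def _regressors(y, p, promo, t, lag_y, lag_x):
    """Row of regressors for target time t (uses information up to t-1 only)."""
    row = [1.0]
    row += [y[t - k] for k in range(1, lag_y + 1)]
    row += [p[t - k] for k in range(1, lag_x + 1)]
    row += [promo[t - k] for k in range(1, lag_x + 1)]
    return row


def fit_reduced_adl(y, p, promo, lag_y=2, lag_x=2):
    """Least-squares fit of demand on its own lags and lagged price/promotion."""
    start = max(lag_y, lag_x)
    X = np.array([_regressors(y, p, promo, t, lag_y, lag_x)
                  for t in range(start, len(y))])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y[start:], dtype=float), rcond=None)
    return coef


def forecast_next(coef, y, p, promo, lag_y=2, lag_x=2):
    """One-step-ahead forecast for the period after the last observation."""
    row = _regressors(y, p, promo, len(y), lag_y, lag_x)
    return float(np.dot(coef, row))


# Simulated series that follows the assumed model exactly (no noise).
rng = np.random.default_rng(0)
T = 80
p = rng.uniform(1.0, 3.0, T)
promo = rng.integers(0, 2, T).astype(float)
y = np.zeros(T)
y[0] = 10.0
for t in range(1, T):
    y[t] = 5.0 + 0.5 * y[t - 1] - 2.0 * p[t - 1] + 3.0 * promo[t - 1]

coef = fit_reduced_adl(y, p, promo, lag_y=1, lag_x=1)
y_next = forecast_next(coef, y, p, promo, lag_y=1, lag_x=1)
```

Refitting `fit_reduced_adl` on each new training window is one simple way to obtain time-varying (adaptive) coefficients.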

Case III
Case III refers to a situation where retailers share their forecasts with their suppliers. Just like Cases I and II, it is assumed that the historical demand information is available to the supplier. However, the price and promotional information is assumed unknown to the supplier. The information available is thus $\Omega^{\mathrm{III}}_t = \{Y_{L_2,1}, \cdots, Y_{L_2,t-1}, \hat{Y}_{L_2,t}\}$, where $\hat{Y}_{L_2,t}$ denotes the forecasts made by retailers for time $t$. As mentioned earlier, the information available in Case III may even be less sensitive than that in Case II, as information on price and promotions is only embedded in the forecasts implicitly. In other words, if the supplier observes an increase in the forecast demand, there is no way for the supplier to infer the cause of the additional demand (e.g., a targeted promotion or a simple price reduction), due to the lack of information on $P_{L_2,t}$ and $I_{L_2,t}$. The pricing and promotional strategies of retailers are thus (somewhat) protected.
At first glance, Case III is straightforward, as the forecasts at $L_1$ can be easily obtained by summing up $\hat{Y}_{L_2,t}$. However, doing so implies that the supplier is not utilizing the information provided by $\{Y_{L_2,1}, \cdots, Y_{L_2,t-1}\}$. Therefore, a more holistic approach is needed. A highlight of this paper is to present the usefulness of the optimal hierarchical reconciliation method [21,25] in forecasting for a supplier in a supply chain with multiple retailers.
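In hierarchical reconciliation, simply summing up $\hat{Y}_{L_2,t}$ is referred to as the bottom-up approach. The reconciled forecasts at all levels, $\tilde{Y}_t$, using the bottom-up approach are:
$$\tilde{Y}_t = S \hat{Y}_{L_2,t}, \tag{13}$$
where $\hat{Y}_{L_2,t}$ is given in Equation (22) below, and each entry therein is a forecast of a particular UPC at a particular store. $S$ is a summing matrix. For illustration purposes, consider Figure 1 with $n = 3$, $m_i = 3$, $i \in \{1, 2, 3\}$; the summing matrix is then given by:
$$S = \left(\begin{array}{ccccccccc} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\ \multicolumn{9}{c}{I_9} \end{array}\right), \tag{14}$$
where $I_9$ is the identity matrix of size 9. By observing the summing matrix, it becomes apparent how $\tilde{Y}_t$ may be obtained. For example, multiplying the second row of $S$ with $\hat{Y}_{L_2,t}$ gives the forecast for $y_{1,t}$.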
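As an illustration of the summing matrix and the bottom-up approach, the following sketch builds the matrix for a three-level hierarchy and applies it to a vector of store-level forecasts. The function names and the forecast values are hypothetical; only the structure (one total row, one row per UPC, one identity block) follows the description above.

```python
def summing_matrix(stores_per_upc):
    """Build S for a three-level hierarchy: total, UPCs, UPC-store pairs."""
    m = sum(stores_per_upc)
    S = [[1] * m]                                  # L0: total demand
    start = 0
    for m_i in stores_per_upc:                     # L1: one row per UPC
        S.append([1 if start <= c < start + m_i else 0 for c in range(m)])
        start += m_i
    for r in range(m):                             # L2: identity block
        S.append([1 if c == r else 0 for c in range(m)])
    return S


def bottom_up(S, y_l2_hat):
    """Multiply S by the vector of bottom-level forecasts."""
    return [sum(s * f for s, f in zip(row, y_l2_hat)) for row in S]


# n = 3 UPCs, each sold in 3 stores; forecast values are made up.
S = summing_matrix([3, 3, 3])
y_l2_hat = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y_tilde = bottom_up(S, y_l2_hat)
```

The second entry of `y_tilde` is the bottom-up forecast for the first UPC, i.e., the sum of its three store-level forecasts.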
The optimal reconciliation method is similar to the bottom-up approach, but it utilizes the information that is ignored by the bottom-up approach, namely, $\{Y_{L_2,1}, \cdots, Y_{L_2,t-1}\}$. The idea of optimal reconciliation originates from a linear regression model:
$$\hat{Y}_t = S \beta_t + \varepsilon_t, \tag{15}$$
where $\hat{Y}_t$, given by Equation (21), is a vector of all base forecasts, $\varepsilon_t$ has zero mean and unknown covariance $\Sigma$, and $\beta_t = \mathrm{E}(\hat{Y}_{L_2,t})$ is the unknown mean of retailer-level forecasts. As compared to Equation (13), Equation (15) models the errors associated with reconciliation ($\varepsilon_t$ should not be confused with the base-forecast errors). In other words, when the base forecasts are generated individually at $L_0$, $L_1$, and $L_2$, Equation (15) explains the aggregate inconsistency in the forecasts. Regression parameter $\beta_t$ can be estimated using generalized least squares (GLS):
$$\hat{\beta}_t = (S^\top \Sigma^\dagger S)^{-1} S^\top \Sigma^\dagger \hat{Y}_t, \tag{16}$$
where $\Sigma^\dagger$ is the Moore-Penrose generalized inverse of $\Sigma$. A problem with the GLS solution is that the covariance matrix, $\Sigma$, is unknown. As the length of $\varepsilon_t$ is $l = 1 + n + \sum_{i=1}^{n} m_i$, i.e., the total number of base forecasts, there are $l(l+1)/2$ unknown parameters in $\Sigma$ that need to be estimated. This is not a straightforward computation. Therefore, an alternative strategy is to use the weighted least squares (WLS) approach:
$$\hat{\beta}_t = (S^\top \Lambda S)^{-1} S^\top \Lambda \hat{Y}_t, \tag{17}$$
where $\Lambda$ is a diagonal matrix whose elements are the inverse of the variances of $\varepsilon_t$. Using the WLS solution, the optimally reconciled forecasts are:
$$\tilde{Y}_t = S \hat{\beta}_t = S (S^\top \Lambda S)^{-1} S^\top \Lambda \hat{Y}_t. \tag{18}$$
Using the optimal reconciliation technique, the entire information set $\Omega^{\mathrm{III}}_t$ is used. The results are thus expected to be better than those from a bottom-up approach. The forecast method in Case III is summarized by a system of equations, Equations (19)-(26), of which the following can be stated compactly:
$$\tilde{Y}_t = S P \hat{Y}_t, \tag{19}$$
$$\hat{Y}_t = (\hat{y}_t, \hat{Y}_{L_1,t}^\top, \hat{Y}_{L_2,t}^\top)^\top, \tag{21}$$
$$\hat{Y}_{L_2,t} = (\hat{y}_{11,t}, \cdots, \hat{y}_{1m_1,t}, \cdots, \hat{y}_{n1,t}, \cdots, \hat{y}_{nm_n,t})^\top, \tag{22}$$
$$\hat{Y}_{L_1,t} = (\hat{y}_{1,t}, \cdots, \hat{y}_{n,t})^\top, \tag{24}$$
where $i \in \{1, \cdots, n\}$, $j \in \{1, \cdots, m_i\}$, and, for optimal reconciliation, $P = (S^\top \Lambda S)^{-1} S^\top \Lambda$, in agreement with Equation (18). Equation (19) is the general form of forecast reconciliation [21].
The bottom-up approach can be represented by setting $P$ as $(0_{m \times (n+1)} \mid I_m)$, where $0$ and $I$ are the null matrix and the identity matrix, respectively, and $m = \sum_{i=1}^{n} m_i$. Equation (23) produces the $L_2$ base forecasts. It assumes that the retailers know their own price and promotional information at a future time $t$; the summations over $p$ and $I$ thus start from $k = 0$. Equations (25) and (26)
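A small numerical sketch of WLS-based reconciliation is given below. It assumes a toy hierarchy with two UPCs sold in two stores and one store, respectively; the base forecasts and the error variances are made-up numbers, and `reconcile_wls` is a hypothetical helper name rather than a function from any particular package.

```python
import numpy as np


def reconcile_wls(S, y_hat, variances):
    """Reconcile base forecasts: S (S' L S)^{-1} S' L y_hat, L = diag(1/var)."""
    S = np.asarray(S, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    Lam = np.diag(1.0 / np.asarray(variances, dtype=float))
    P = np.linalg.solve(S.T @ Lam @ S, S.T @ Lam)
    return S @ P @ y_hat


# Rows: total, UPC 1, UPC 2, then the three UPC-store series.
S = np.array([[1, 1, 1],
              [1, 1, 0],
              [0, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Independently made base forecasts; they do not add up (30 + 20 != 55).
y_hat = np.array([100.0, 55.0, 40.0, 30.0, 20.0, 45.0])
variances = np.array([9.0, 4.0, 4.0, 1.0, 1.0, 1.0])  # assumed, not estimated

y_opt = reconcile_wls(S, y_hat, variances)

# Bottom-up as a special case: P = (0 | I) keeps only the bottom-level block.
P_bu = np.hstack([np.zeros((3, 3)), np.eye(3)])
y_bu = S @ P_bu @ y_hat
```

Both `y_opt` and `y_bu` are aggregate-consistent, but `y_opt` also draws on the upper-level base forecasts, weighted by the assumed variances.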

Case IV
This case represents the ideal case for traditional supply-chain management. In this case, the retailers share all pertinent information with the supplier, that is, $\Omega^{\mathrm{IV}}_t = \{Y_{L_2,1}, \cdots, Y_{L_2,t-1}, P_{L_2,1}, \cdots, P_{L_2,t}, I_{L_2,1}, \cdots, I_{L_2,t}\}$. It is important to note that, in this case, different from Case II, the planned pricing and promotional strategies $P_{L_2,t}$ and $I_{L_2,t}$ are also shared. Due to the competitive importance of this information, this case is less practical than all other cases.
The forecast method for Case IV is identical to that of Case II, except that the planned price and promotional information for time $t$ is now available to the supplier: the $L_1$ price and promotion are aggregated through Equations (9) and (10), but for $t' \in \{1, \cdots, t\}$. This model is denoted as C4ADL.

Forecast Combination Methods
Section 3 has discussed the various modeling approaches to be used in cases of different levels of information sharing. It is noted that there is information overlap among these cases. For example, if the information set $\Omega^{\mathrm{III}}_t$ is available, all forecasts in Case I also become available, i.e., $\Omega^{\mathrm{I}}_t \subset \Omega^{\mathrm{III}}_t$. Although it can be argued that the "best" method known to a forecaster will most likely outperform all other alternatives, e.g., C3OPT will outperform C1ARIMA, suboptimal forecasts can sometimes contribute positively to the final forecast, as discussed in Section 2. To that end, several frequently used forecast combination methods are formulated in this section.

Simple Averaging
The most intuitive way to combine forecasts is to use the average of all forecasts as the final combined forecast. Suppose a total of $N$ component models are used to produce forecasts at each time stamp $t$; the final forecast is given by
$$\hat{y}^{(\mathrm{avg})}_t = \frac{1}{N} \sum_{i=1}^{N} \hat{y}^{(i)}_t,$$
where $\hat{y}^{(i)}_t$ is the forecast produced by method $i$, and $\hat{y}^{(\mathrm{avg})}_t$ is the combined forecast using simple averaging. This method is denoted as AVG.

Trimmed Simple Averaging
In practice, if more than five component forecasts are available, the forecaster can afford to drop the highest and lowest forecasts, i.e., the outliers among the component forecasts. This method is known as the trimmed simple averaging, or TRIM.
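The two averaging schemes can be illustrated with a short sketch. The component forecast values are made up, and the trimming rule coded here (dropping exactly one highest and one lowest forecast) is one reading of the description above.

```python
def avg_combine(forecasts):
    """AVG: equal-weight average of the component forecasts."""
    return sum(forecasts) / len(forecasts)


def trim_combine(forecasts):
    """TRIM: drop the single highest and lowest forecasts, then average."""
    if len(forecasts) < 3:
        raise ValueError("need at least three component forecasts to trim")
    kept = sorted(forecasts)[1:-1]
    return sum(kept) / len(kept)


# Six hypothetical component forecasts for one UPC and one week;
# the value 300 plays the role of an outlying component forecast.
component_forecasts = [100, 120, 90, 300, 110, 105]
avg_forecast = avg_combine(component_forecasts)
trim_forecast = trim_combine(component_forecasts)
```

In this toy case, AVG is pulled up to 137.5 by the outlying forecast, whereas TRIM gives 108.75.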

Combination through Variance
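AVG puts equal weights on the component models. However, it can be argued that a heavier weight should be placed on a better performing model. The variance-based method (VAR) computes the mean squared error (MSE) of each component model and weighs the forecasts according to their accuracy. Mathematically, it is expressed as:
$$\hat{y}^{(\mathrm{var})}_t = \sum_{i=1}^{N} \frac{\mathrm{MSE}_i^{-1}}{\sum_{j=1}^{N} \mathrm{MSE}_j^{-1}} \, \hat{y}^{(i)}_t,$$
where $\mathrm{MSE}_i$ is the MSE of component model $i$. It is noted that, in this method, some historical forecasts need to be available for the computation of the MSE.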
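A minimal sketch of variance-based weighting follows. It assumes that the weights are proportional to the inverse of each component model's historical MSE, which is the usual reading of this method; the function name and all numbers are hypothetical.

```python
def inverse_mse_weights(past_forecasts, past_actuals):
    """Weights proportional to 1/MSE of each component model.

    past_forecasts: one list of historical forecasts per component model.
    Assumes every model has a strictly positive MSE.
    """
    n = len(past_actuals)
    mses = [sum((f - a) ** 2 for f, a in zip(model, past_actuals)) / n
            for model in past_forecasts]
    inverse = [1.0 / m for m in mses]
    total = sum(inverse)
    return [v / total for v in inverse]


# Two hypothetical component models evaluated on two past weeks.
past_actuals = [10.0, 20.0]
past_forecasts = [[11.0, 19.0],   # model A: MSE = 1
                  [12.0, 22.0]]   # model B: MSE = 4
weights = inverse_mse_weights(past_forecasts, past_actuals)

# Combine the two models' new forecasts with the estimated weights.
new_forecasts = [30.0, 35.0]
var_forecast = sum(w * f for w, f in zip(weights, new_forecasts))
```

Model A, being four times more accurate in MSE terms, receives a weight of 0.8 against 0.2 for model B.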

Combination through Ordinary Least Squares
Besides assigning equal or variance-based weights, another class of weight-assignment methods is through linear regression:
$$y_{t'} = x_{t'}^\top W + e_{t'},$$
where $x_{t'} = (1, \hat{y}^{(1)}_{t'}, \cdots, \hat{y}^{(N)}_{t'})^\top$ is the size-$(N+1)$ vector of component model forecasts (with a leading constant) at time $t'$, $y_{t'}$ is the actual demand at time $t'$, and $W$ is the size-$(N+1)$ vector of weights. The above regression equation can be written in vector form, if a total of $T$ historical forecasts are available:
$$Y = XW + e,$$
where $Y = (y_1, \cdots, y_T)^\top$, $e = (e_1, \cdots, e_T)^\top$, and $X = (x_1, \cdots, x_T)^\top$. If the weight vector $W$ can be estimated, the combined forecast at time $t$ can be obtained via
$$\hat{y}^{(c)}_t = x_t^\top \hat{W}.$$
Given the regression setting, the ordinary least squares (OLS) method provides a basic solution to $W$:
$$\hat{W}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top Y.$$
This method is denoted as OLS.
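The regression-based combination can be sketched as follows, assuming that an intercept is included (consistent with a weight vector of size N + 1). The data are simulated, and `ols_weights` and `combine` are hypothetical helper names.

```python
import numpy as np


def ols_weights(F, y):
    """Estimate intercept and weights by regressing actuals on forecasts.

    F: (T, N) array of historical component forecasts; y: (T,) actuals.
    """
    X = np.column_stack([np.ones(len(y)), F])
    W, *_ = np.linalg.lstsq(X, y, rcond=None)
    return W


def combine(W, f_new):
    """Combined forecast from new component forecasts f_new (length N)."""
    return float(W[0] + np.dot(W[1:], f_new))


# Simulated history: the actual demand is an exact linear blend of two
# component forecasts, so OLS should recover the blend.
rng = np.random.default_rng(1)
F = rng.normal(100.0, 10.0, size=(50, 2))
y = 1.0 + 0.3 * F[:, 0] + 0.7 * F[:, 1]

W = ols_weights(F, y)
y_comb = combine(W, np.array([90.0, 110.0]))
```

`np.linalg.lstsq` is used instead of forming the normal equations explicitly, which is numerically safer when component forecasts are highly correlated.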

Combination through Least Absolute Deviations
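Whereas the OLS regression minimizes $\sum_{t'=1}^{T} e_{t'}^2$, the median regression, or least absolute deviations (LAD) regression, minimizes $\sum_{t'=1}^{T} |e_{t'}|$, i.e.,
$$\hat{W}_{\mathrm{LAD}} = \arg\min_{W} \sum_{t'=1}^{T} \left| y_{t'} - x_{t'}^\top W \right|,$$
where $x_{t'}$ collects a constant and the component forecasts at time $t'$. Linear programming is required to solve for $\hat{W}_{\mathrm{LAD}}$. The advantage of LAD is that it penalizes all errors in proportion to their magnitude, whereas OLS penalizes large errors disproportionately. In other words, LAD is more resistant to outliers in the component forecasts.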

Combination through Lasso
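Similarly, lasso is another commonly used regression technique, which minimizes the sum of squares subject to a regularization on the $\ell_1$ norm of the regression parameters. Formally, the lasso estimator is given by:
$$\hat{W}_{\mathrm{lasso}} = \arg\min_{W} \left\{ \sum_{t'=1}^{T} \left( y_{t'} - x_{t'}^\top W \right)^2 + \lambda \sum_{i=1}^{N} |w_i| \right\},$$
where $x_{t'}$ collects a constant and the component forecasts at time $t'$, $w_i$ is the weight on component model $i$, and $\lambda \geq 0$ is the regularization parameter. Due to its geometry (see Tibshirani [40]), lasso is capable of shrinking some regression parameters to exactly zero, and thus acts as a variable selection method. In the present context, lasso shrinks the weights and selects only the good component model forecasts, which may be more appropriate than OLS and LAD.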
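To illustrate how lasso shrinks and selects component forecasts, a bare-bones coordinate-descent sketch is given below. It assumes the penalized form of the problem with an unpenalized intercept; the function names, the penalty values, and the toy data are hypothetical, and a production implementation would normally rely on an established library.

```python
import numpy as np


def soft_threshold(z, g):
    """Shrink z toward zero by g; return zero if |z| <= g."""
    return np.sign(z) * max(abs(z) - g, 0.0)


def lasso_weights(F, y, lam, n_iter=200):
    """Minimize sum (y - w0 - F w)^2 + lam * sum |w_i| by coordinate descent."""
    T, N = F.shape
    w = np.zeros(N)
    w0 = 0.0
    for _ in range(n_iter):
        w0 = float(np.mean(y - F @ w))              # unpenalized intercept
        for i in range(N):
            # Residual with the contribution of component i added back.
            r = y - w0 - F @ w + F[:, i] * w[i]
            rho = float(F[:, i] @ r)
            w[i] = soft_threshold(rho, lam / 2.0) / float(F[:, i] @ F[:, i])
    return w0, w


# Toy design with two orthogonal, zero-mean component forecasts;
# only the first one carries information about y.
F = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = 5.0 + 2.0 * F[:, 0]

w0_ols, w_ols = lasso_weights(F, y, lam=0.0)        # no shrinkage
w0_l1, w_l1 = lasso_weights(F, y, lam=2.0)          # mild shrinkage
w0_big, w_big = lasso_weights(F, y, lam=1e9)        # everything shrunk to zero
```

With `lam=2.0` the weight on the informative forecast is shrunk from 2 to 1.5, while the uninformative forecast keeps a weight of exactly zero.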

Combination through Complete Subset Regression
The last variant of the regression-based combination method considered in this paper is the complete subset regression. Given $N$ component models, a total of $\Psi$ regressions can be built, one for each subset of component forecasts considered. These $\Psi$ combined forecasts can then be combined into a final forecast using averaging (mean, median, or mode) or any other combination method. This method is denoted as SUBSET.
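The subset-regression idea can be sketched as follows. The sketch makes two assumptions that are not confirmed by the text above: every non-empty subset of the component forecasts is used (giving 2^N - 1 regressions), and the resulting forecasts are averaged by the mean. The function name and the simulated data are hypothetical.

```python
from itertools import combinations

import numpy as np


def subset_combine(F, y, f_new):
    """Fit one OLS regression per non-empty subset and average the forecasts.

    F: (T, N) historical component forecasts; y: (T,) actuals;
    f_new: (N,) new component forecasts to be combined.
    """
    T, N = F.shape
    preds = []
    for k in range(1, N + 1):
        for idx in combinations(range(N), k):
            cols = list(idx)
            X = np.column_stack([np.ones(T), F[:, cols]])
            W, *_ = np.linalg.lstsq(X, y, rcond=None)
            preds.append(W[0] + f_new[cols] @ W[1:])
    return float(np.mean(preds)), len(preds)


# Simulated history with three component forecasts.
rng = np.random.default_rng(2)
F = rng.normal(100.0, 10.0, size=(40, 3))
y = 2.0 + F[:, 0] + rng.normal(0.0, 1.0, size=40)

subset_forecast, n_regressions = subset_combine(F, y, np.array([95.0, 100.0, 105.0]))
```

Because the number of subsets grows exponentially with N, this brute-force enumeration is only practical for a handful of component models.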

Empirical Study
The empirical part of this paper considers the FMCG data from a retail chain. The data are stored in the Dominick's database (http://research.chicagobooth.edu/kilts/marketing-databases/dominicks/), which is freely available online. The dataset is provided by the James M. Kilts Center, University of Chicago Booth School of Business, in a collaborative effort with Dominick's Finer Foods (DFF). Although the dataset records weekly historical data from 1989 to 1994, owing to its informative nature, it is still frequently used to conduct marketing research (e.g., [41,42]). It was previously found that sales promotions in Dominick's database are large and frequent [43]; the dataset thus provides a suitable platform for the current investigation, i.e., forecasting under strong demand fluctuation.
The database contains four types of files: the customer count file, store-level demographics file, UPC files, and movement files. Store-level sales information for FMCG from 29 categories, along with the store-level price and promotional information, is provided in the movement files. Since the goal of this paper is to investigate the benefits of information sharing during forecasting, only data from one category, namely, the bottled juice category (BJC), is used. However, similar results and conclusions can be expected, should the data from other categories be used.
There are three types of promotion in BJC: simple price reduction, bonus buy, and coupons, with bonus buy being the dominant type (e.g., three bottles of cranberry juice for 10 dollars). As the price in the data file is registered for the bundle, the total price is simply divided by the size of the bundle to obtain the price of a single item (which is equivalent to a price reduction). In addition, a "1" is recorded if a particular store has a promotion (of any type) on a particular product. After careful data preprocessing and filtering (e.g., list-wise deletion of missing values), complete data from 37 UPCs over a period of 377 weeks remain. These UPCs are listed in the first column of Table 1.

Forecast Accuracies under Different Levels of Information Sharing
A rolling forecast is used in this paper. Data from the first 200 weeks are used to train the model and generate the first forecast. A sliding window then defines the training data for subsequent forecasts. Given the span of the data (377 weeks), 177 pseudo out-of-sample forecasts can be made for each time series. This procedure is depicted in Figure 2. The mean absolute percentage errors (MAPEs) of UPC-level demand under each case are shown in the colored columns of Table 1. The results shown in Table 1 agree very well with the hypothesized scenario, namely, that the forecast errors decrease from Case I to Case IV. Models in Case I, namely, C1ETS and C1ARIMA, perform the worst because they can hardly capture any demand surge during the promotional periods. This strongly supports the conjecture that pricing and promotional strategy forms a relevant part of the suppliers' information set, in the absence of which forecast errors may be significant. Case II performs better than Case I due to the additional information (past price and promotion) used during forecasting. However, the most interesting observation is that Case III performs almost as well as Case IV, with C3OPT having a higher accuracy than C3BU.
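The sliding-window procedure and the MAPE can be sketched as below. This is an illustration under stated assumptions, not the implementation used in this paper: the forecasts are taken to be one-step-ahead, the default forecaster is a naive last-value placeholder rather than ETS or ARIMA, and the function names are hypothetical.

```python
def rolling_forecasts(series, train_len=200, forecaster=None):
    """One-step-ahead forecasts from a fixed-length sliding training window.

    `forecaster` maps a training window (list of floats) to a single forecast.
    The default is a naive last-value placeholder, not the paper's models.
    """
    if forecaster is None:
        forecaster = lambda window: window[-1]
    forecasts, actuals = [], []
    for origin in range(train_len, len(series)):
        window = series[origin - train_len:origin]  # most recent train_len points
        forecasts.append(forecaster(window))
        actuals.append(series[origin])
    return forecasts, actuals


def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent (actuals must be nonzero)."""
    errors = [abs((a - f) / a) for a, f in zip(actuals, forecasts)]
    return 100.0 * sum(errors) / len(errors)
```

With a series of 377 weekly observations and `train_len=200`, the loop runs over origins 200 through 376 and therefore produces the 177 pseudo out-of-sample forecasts mentioned above.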
The results shown in Table 1 indicate that, for the current dataset, sharing full information with a supplier benefits the supplier greatly. However, sharing only forecast information also results in significant forecast improvements on the supplier's side. The fact that forecast sharing benefits the supplier is not surprising; however, the observation that it yields nearly the same level of benefits as Case IV is quite interesting.

Forecast Accuracies of the Combined Forecasts
The second part of the empirical study investigates the effects of forecast combination on forecast accuracy. To simplify the case study, the full information set in Case IV is assumed to be available, so that all Cases I-III forecasts can be generated as well. In this way, a total of seven component models are used during forecast combination. Since all forecast combination methods except AVG and TRIM require training/fitting, 100 out of the 177 sets of pseudo out-of-sample forecasts are randomly chosen for training. The remaining samples are used for error evaluation. The MAPEs of the combined UPC-level forecasts are shown in columns 9-15 of Table 1.
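The two training-free combinations (AVG and TRIM) and the random split can be sketched as follows. This is a sketch under assumptions: the number of forecasts trimmed from each end (`n_trim`), the seed, and the function names are illustrative choices, not values taken from this paper, and the fitted combination methods such as LAD are not shown.

```python
import random


def avg_combine(component_forecasts):
    """AVG: simple average of the component forecasts for one period."""
    return sum(component_forecasts) / len(component_forecasts)


def trim_combine(component_forecasts, n_trim=1):
    """TRIM: average after dropping the n_trim smallest and largest forecasts.

    n_trim=1 is an illustrative assumption, not the paper's setting.
    """
    ordered = sorted(component_forecasts)
    kept = ordered[n_trim:len(ordered) - n_trim]
    if not kept:
        raise ValueError("n_trim removes all component forecasts")
    return sum(kept) / len(kept)


def split_indices(n_total=177, n_train=100, seed=0):
    """Randomly pick n_train forecast indices for fitting; the rest evaluate."""
    rng = random.Random(seed)
    train = sorted(rng.sample(range(n_total), n_train))
    train_set = set(train)
    evaluation = [i for i in range(n_total) if i not in train_set]
    return train, evaluation
```

With seven component forecasts and `n_trim=1`, `trim_combine` averages the five middle values, so a single extreme component forecast does not affect the combined value. The split leaves 77 of the 177 forecast sets for error evaluation.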
Contrasting the combined forecasts with the component forecasts shows that the combined forecasts are more accurate in general. For certain UPCs, such as "3828103123," the worst combined forecasts are found to be better than the best component forecasts. Among the various forecast combination methods, LAD performs the best, whereas AVG performs the worst. Since the differences between the best and worst combined forecasts are smaller than those between the component forecasts, it seems that choosing combined forecasts is less risky than choosing the forecasts from the "best-known" component model. This also implies that, when the best forecasting method for a new UPC is unknown, it is better to choose a combined forecast than to choose a component forecast based on previous results. To elaborate, suppose that only the forecasts for UPC "5300015132" are available; the forecaster would then choose C1ETS as the best-known model, based on its small MAPE of 15.13% for that UPC. However, C1ETS would give large errors for other UPCs, which is undesirable.

Conclusions
The benefits of forecast sharing in a supply chain, in terms of forecast accuracy, are demonstrated through an empirical example. Four cases characterized by different levels of information sharing are considered. It is found that the full information sharing case (Case IV) produces the best forecast accuracy. However, by sharing only the retailer-level forecast values (Case III), the forecast at the supplier level is almost as accurate as in Case IV. This implies that it is not necessary for retailers to share pricing and promotional strategies explicitly, which protects them from leakage of sensitive competitive information. This concludes the first contribution of the paper.
When the retailer-level forecasts are shared, suppliers can reconcile those forecasts with their own forecasts. It is also shown that optimal reconciliation (which considers all forecasts) outperforms the traditional bottom-up method (which considers only the retailer-level forecasts). The merit of the optimal reconciliation approach is that it places greater weight on the more accurate forecasts, so that the overall performance of all forecasts across the hierarchy can be improved. Additionally, forecast combination has been shown to improve the reliability of the forecasts, especially when the best component model is unknown to a forecaster. To that end, hierarchical reconciliation and forecast combination are two ways to address forecast inconsistencies in supply-chain management; this concludes the second contribution of the paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: