Reframing demand forecasting: a two-fold approach for lumpy and intermittent demand

Demand forecasting is a crucial component of demand management. While shortening the forecasting horizon allows for more recent data and less uncertainty, this frequently means lower data aggregation levels and a more significant data sparsity. Sparse demand data usually results in lumpy or intermittent demand patterns, which have sparse and irregular demand intervals. Usual statistical and machine learning models fail to provide good forecasts in such scenarios. Our research shows that competitive demand forecasts can be obtained through two models: predicting the demand occurrence and estimating the demand size. We analyze the usage of local and global machine learning models for both cases and compare results against baseline methods. Finally, we propose a novel evaluation criterion of lumpy and intermittent demand forecasting models' performance. Our research shows that global classification models are the best choice when predicting demand event occurrence. When predicting demand sizes, we achieved the best results using Simple Exponential Smoothing forecast. We tested our approach on real-world data consisting of 516 three-year-long time series corresponding to European automotive original equipment manufacturers' daily demand.


Introduction
Demand forecasting is a critical component of supply chain management, directly affecting production planning and order fulfillment. Accurate forecasts have an impact across the whole supply chain and manufacturing plant organization: operational and strategic decisions are made on resources (allocation and scheduling of raw material and tooling), workers (scheduling, training, promotions, or hiring), manufactured products (market share increase, production diversification), and logistics for deliveries.
To issue accurate forecasts, we have to consider demand characteristics. Multiple demand characterizations were proposed (Williams (1984); Johnston and Boylan (1996)). Among the most influential ones is the characterization proposed by Syntetos et al. (2005), which divides demand patterns into four quadrants based on the inter-demand interval and the coefficient of variation. The four demand types are smooth (regular demand occurrence and low demand quantity variation), erratic (regular demand occurrence and high demand quantity variation), intermittent (irregular demand occurrence and low demand quantity variation), and lumpy (irregular demand occurrence and high demand quantity variation). Intermittent and lumpy demand forecasting is considered among the most challenging demand forecasting problems. Both present infrequent demand arrivals with many zero-demand periods, which pose an additional challenge to accurate demand quantity estimation. Demand quantity estimation is harder for lumpy demand, since it also presents variable demand sizes (Petropoulos and Kourentzes (2015); Amin-Naseri and Tabar (2008); Bartezzaghi and Kalchschmidt (2011)). Bartezzaghi et al. (1999) consider demand lumpiness a consequence of different market characteristics, such as the numerousness and heterogeneity of customers, the frequency at which customers place orders, the variety of customers' requests (e.g., high customization in make-to-order settings (Verganti (1997))), and the correlation between customers' behavior. Lumpiness is also related to the granularity level at which demand is considered (e.g., viewing demand at a client and product level vs. only at a product level, or viewing daily vs. monthly demand). Higher aggregation levels usually reduce the number of periods without demand, changing the demand pattern classification.
Babai et al. (2014) stated that intermittent demand items account for considerable proportions of any organization's stock value. The importance of lumpy demand was characterized in a use case by Johnston et al. (2003), who found that 75% of items had a lumpy demand, accounting for 40% of the company's revenue and 60% of stock investment. In this line, Amin-Naseri and Tabar (2008) cite multiple authors who observe that lumpy patterns are widespread, especially in organizations that hold many spare parts, such as process and automotive industries, telecommunication systems, and others.
Increasing industry automation, digitalization, and information sharing (e.g., through Electronic Data Interchange software), fostered by national and regional initiatives (Davies (2015); Glaser (2019); Yang and Gu (2021)), accelerates the data and information flow within the organization, enabling greater agility. To achieve greater agility in supply chain management, it is critical to develop demand forecasting models capable of providing forecasts at a low granularity level. Such models enable short forecasting horizons and provide insights at a significant level of detail, making it possible to foresee and react to changes quickly. While these forecasting models benefit from the most recent data available (which helps enhance the forecast's accuracy), the low granularity level frequently requires dealing with irregular demand patterns.
Given the variety of demand types, researchers proposed multiple approaches to provide accurate demand forecasts. While smooth and erratic demands achieve good results using regression models, intermittent and lumpy demand require specialized models that consider demand occurrence. Statistical, machine learning, and hybrid models were developed to that end. The increasing digitalization of industry enables the timely collection of data relevant to demand forecasts. Data availability is key for developing machine learning models, which in some cases achieve the best results.
To deal with intermittent demand, Croston (1972) proposed a forecasting model that provides separate estimates for demand occurrence and demand quantity. Since then, much work followed this direction: multiple authors proposed corrections to Croston's method to address forecast biases or provide different means to estimate demand occurrence.
Measuring the performance of intermittent and lumpy demand models was also the subject of extensive research. Many authors agree that both regression accuracy metrics and inventory metrics are required. There is increasing agreement that regression metrics alone, as used for smooth and erratic demand, are not useful for measuring intermittent and lumpy demand, since they fail to weigh zero-demand periods. Inventory metrics suffer the same bias, while providing a perspective of how much time products stay in stock.
Croston provided a valuable intuition on separating demand occurrence from demand sizes. While many authors followed this intuition, we found that, in the literature we reviewed, no author fully considered demand forecasting as a compound problem, which requires not only separate models but also separate metrics. We propose reframing demand forecasting as a two-phase problem, which requires (i) a classification model to predict demand occurrence and (ii) a regression model to predict demand sizes. For smooth and erratic demands, classification can be omitted since demand (almost) always occurs. In those cases, using only a regression model provides good demand forecasts (Brühl et al. (2009); Wang et al. (2011); Sharma and Sinha (2012); Gao et al. (2018); Salinas et al. (2020); Bandara et al. (2020)). For intermittent and lumpy demands, using separate models for classification and regression provides at least two benefits. First, separate models allow optimizing for different objectives. Second, each problem has adequate metrics, so the cause of good or poor performance can be clearly understood and addressed.
The classification model, used to estimate demand occurrence, should use specific features that correlate with demand occurrence. We found past research to be a rich source of intuitions on factors related to demand occurrence. We consider the classification model a global model that is trained over all the time series or subsets of them. This way, even though demand events for a single product may be scarce, the sparsity of demand events decreases as more products are considered. Simultaneously, the model can learn underlying patterns, which may be related to specific behaviors (e.g., deliveries take place only on certain days). When demand events are scarce, the classification problem may be an imbalanced one, posing an additional challenge.
In this research, we compare the statistical methods proposed by Croston (1972), Syntetos and Boylan (2005), and Teunter and Sani (2009), the hybrid models developed by Nasiri Pour et al. (2008) and Willemain et al. (2004), and the ADIDA forecasting method introduced by Nikolopoulos et al. (2011), measuring their performance through classification, regression, and inventory metrics. We also developed a compound model of our own, which outperforms the listed ones in all three dimensions.
We perform our research on a dataset consisting of 516 time series of intermittent and lumpy demand at a daily aggregation level, which corresponds to the demand of European manufacturing companies related to the automotive industry.
The rest of this paper is structured as follows: Section 2 presents related work, Section 3 describes our approach to demand forecasting with a particular focus on intermittent and lumpy demands, Section 4 describes the features we created for each forecasting model, how we built and evaluated them, Section 5 describes the experiments we performed and results obtained, and in Section 6, we provide our conclusions and outline future work.

Demand characterization
Many authors tried to characterize demand to provide cues to decide which forecasting model is most appropriate for each case. Croston (1972) assessed demand based on demand size and inter-demand intervals, providing a method to forecast intermittent demand. Williams (1984) considered the variance of the number of orders and their size given a particular lead time, classifying items into five categories regarding high/low demand sporadicity and size. A particular category is created for products with a highly sporadic demand occurrence and high demand size variance, based on the author's empirical findings regarding demand intermittence. Eaves and Kingsman (2004) found the classification schema proposed by Williams did not provide means to distinguish steady demand solely based on transaction variability. They proposed dividing demand into five categories considering lead time variability, transaction rate variability, and demand size variability. Johnston and Boylan (1996) introduced the concept of average demand interval (ADI, see Eq. 1), which was complemented by Syntetos et al. (2005) introducing the coefficient of variation (CV, see Eq. 2).

\[ \mathrm{ADI} = \frac{\text{Total Periods}}{\text{Total Demand Buckets}} \tag{1} \]

Both concepts allow dividing demand into quadrants, considering them smooth, erratic, intermittent, or lumpy demand types (see Fig. 1). Smooth and erratic demand present regular demand occurrence, with smooth demand having little variability in demand sizes, while this variability is strong for erratic demand. Intermittent and lumpy demand present irregular demand intervals over time.
Intermittent demand has little variability in demand sizes, in contrast to lumpy demand, which has a greater demand size variability. Thresholds were set based on empirical findings regarding where the methods proposed by Croston (1972) and Syntetos and Boylan (1999, 2001) performed best. In this paper, we use the term steady demand to refer to smooth and erratic demands (which display regular demand occurrence) and irregular demand for intermittent and lumpy demand (which display irregular demand occurrence).
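The quadrant assignment described above can be illustrated with a minimal sketch. It assumes the ADI cut-off of 1.32 mentioned later in this paper and a squared-CV cut-off of 0.49, which is the value commonly attributed to Syntetos et al. (2005) but is not stated in this text. Computing the CV over non-zero demand sizes only, and the function name `classify_demand`, are also assumptions made for illustration.

```python
from statistics import mean, pstdev


def classify_demand(series, adi_cutoff=1.32, cv2_cutoff=0.49):
    """Return (adi, cv2, label) for a list of per-period demand quantities.

    Cut-offs are assumptions taken from the literature, not from this paper's Eq. 2.
    """
    nonzero = [d for d in series if d > 0]
    if not nonzero:
        raise ValueError("series contains no demand events")
    # Eq. 1: total periods divided by the number of periods with demand.
    adi = len(series) / len(nonzero)
    # Squared coefficient of variation of the non-zero demand sizes (assumption).
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2
    regular_occurrence = adi < adi_cutoff
    stable_sizes = cv2 < cv2_cutoff
    if regular_occurrence:
        label = "smooth" if stable_sizes else "erratic"
    else:
        label = "intermittent" if stable_sizes else "lumpy"
    return adi, cv2, label


# Demand every third period with a constant size: irregular occurrence, stable size.
print(classify_demand([0, 0, 5, 0, 0, 5, 0, 0, 5]))
```

In the terminology of this paper, the first two labels correspond to steady demand and the last two to irregular demand.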

Demand Forecasting models
Forecasting irregular demand is considered a challenging task since, in addition to the demand size forecast, it requires taking into account irregular demand occurrence. We categorize demand forecasting models into three types. Type I consists of a single model providing a demand size estimate. Type II uses aggregation to remove demand intermittency and benefit from regular time-series models to forecast demand. Type III uses separate models to estimate demand occurrence and demand size.
Box-Jenkins approaches, frequently used for regular time series forecasting, are considered useless in the context of irregular demand (Wallström and Segerstedt (2010)) since it is challenging to estimate trends and seasonality given the high proportion of zeros.

Type I models
Wright (1986) developed linear exponential smoothing, an adaptation of Holt's double exponential smoothing model that considers variable reporting frequency and irregularities in time spacing to compute and update a trend line with exponential smoothing. Altay et al. (2008) demonstrated that this method is useful for forecasting intermittent demand where a trend is present. Sani and Kingsman (1997) and Ghobbar and Friend (2003) found that averaging methods can provide acceptable performance in some cases, despite demand intermittency. Chatfield and Hayya (2007) suggested using a zero-demand forecast for highly lumpy demands, where the holding cost is much higher than the shortage cost. Gutierrez et al. (2008) proposed forecasting lumpy demand with a three-layer multilayer perceptron (MLP), considering only two inputs: the demand of the immediately preceding period and the number of periods separating the last two non-zero demand transactions.
Equation 3: Croston's formula (Croston (1972)) for irregular demand estimation, where a is demand level, p is periodicity, d refers to demand observations, q is previous demand occurrence, and α represents a smoothing constant.
Equation 5: Teunter, Syntetos & Babai formula (Teunter et al. (2011)) for irregular demand estimation, where a is demand level, p is probability of demand occurrence, d refers to demand observations, q is previous demand occurrence, and α represents a smoothing constant.
A seminal work regarding intermittent demand forecasting was developed by Croston (1972), who identified Exponential Smoothing as inadequate to estimate demand when the mean demand interval between two transactions is greater than two time periods. Croston proposed a method to estimate the expected interval between transactions and the expected demand size (Eq. 3), assuming that successive demand intervals and sizes are independent, the inter-demand intervals follow a Geometric distribution, and the demand sizes follow a Normal distribution. Shenstone and Hyndman (2005) showed Croston's method was not consistent with intermittent demand properties, but its results still outperformed conventional methods. Many researchers followed Croston's approach, either by enhancing this method or proposing similar ones. Syntetos and Boylan (2005) proposed a slight modification to Croston's method, known as the Syntetos-Boylan Approximation (SBA), to avoid a positive correlation between the forecasted demand size and the smoothing constant (Eq. 4). Levén and Segerstedt (2004) suggested computing a new demand rate every time demand takes place, at most once per time bucket. Teunter et al. (2011) compute a demand probability for each period and update the demand quantity forecast only when demand takes place (Eq. 5). Prestwich et al. (2014b) proposed a hybrid of Croston's method and Bayesian inference to consider item obsolescence. Chua et al. (2008) developed an algorithm that estimates future demand occurrence based on three time series: non-zero demands, the interarrival period between demand occurrences, and periods spanned between two demand occurrences. They estimate demand size with a simple moving average.
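As a rough illustration of the three update schemes discussed above, the following sketch implements textbook versions of Croston's method, SBA, and TSB, using the symbols from the equation captions (a for demand level, p for interval or probability, q for periods since the last demand, α for the smoothing constant). The initialisation choices and the separate smoothing constant `beta` for the TSB occurrence probability are assumptions made for this example and are not taken from the paper's equations.

```python
def croston(demand, alpha=0.1, variant="croston"):
    """One-step-ahead demand rate forecast; variant is "croston" or "sba"."""
    a = None  # smoothed demand size
    p = None  # smoothed inter-demand interval
    q = 1     # periods elapsed since the last demand occurrence
    for d in demand:
        if d > 0:
            if a is None:
                a, p = float(d), float(q)  # assumed initialisation at first demand
            else:
                a = alpha * d + (1 - alpha) * a
                p = alpha * q + (1 - alpha) * p
            q = 1
        else:
            q += 1  # estimates are only updated when demand occurs
    if a is None:
        return 0.0
    forecast = a / p
    if variant == "sba":
        forecast *= 1 - alpha / 2  # Syntetos-Boylan bias correction
    return forecast


def tsb(demand, alpha=0.1, beta=0.1):
    """TSB-style forecast: occurrence probability updated every period."""
    a, prob, started = 0.0, 0.0, False
    for d in demand:
        if d > 0:
            a = float(d) if not started else alpha * d + (1 - alpha) * a
            started = True
            prob = beta + (1 - beta) * prob
        else:
            prob = (1 - beta) * prob  # probability decays while no demand is seen
    return prob * a


history = [0, 4, 0, 4, 0, 4]
print(croston(history, alpha=0.5), croston(history, alpha=0.5, variant="sba"))
print(tsb([0, 4, 0, 4], alpha=0.5, beta=0.5))
```

Because TSB updates the occurrence probability in every period, its forecast decays during long runs of zeros, whereas the Croston and SBA estimates stay constant until the next demand event.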

Type II models
Type II models address irregular demand occurrence by performing data aggregation and achieving smooth time series. Much research was performed on the effect of aggregation on regular time series (Hotta and Neto (1993); Souza and Smith (2004) 2016)), showing that a higher aggregation improves forecast results. Nikolopoulos et al. (2011) proposed the aggregate-disaggregate intermittent demand approach (ADIDA) as a three-stage process: (i) perform time-series aggregation (either overlapping or non-overlapping), (ii) forecast the next value of the aggregated time series, and (iii) disaggregate the forecasted value to the original aggregation level.
Figure 2: Neural network model proposed by Nasiri Pour et al. (2008). Inputs to the model are demand size at the end of the preceding period, the number of periods between the last two demand occurrences, the number of periods between target period and last demand occurrence, and the number of periods between target period and first immediately preceding zero demand period.
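The three ADIDA stages can be sketched as follows. This is a simplified illustration: non-overlapping aggregation, Simple Exponential Smoothing on the aggregated series, and equal-weight disaggregation are assumptions chosen for brevity, as are the function name and default parameters.

```python
def adida_forecast(demand, bucket=7, alpha=0.2):
    """Per-period forecast via aggregate -> forecast -> disaggregate."""
    usable = len(demand) - len(demand) % bucket
    if usable == 0:
        raise ValueError("need at least one full aggregation bucket")
    # Drop the oldest observations so the series splits into whole buckets.
    recent = demand[len(demand) - usable:]
    # (i) Non-overlapping temporal aggregation.
    aggregated = [sum(recent[i:i + bucket]) for i in range(0, usable, bucket)]
    # (ii) Simple Exponential Smoothing over the aggregated series.
    level = float(aggregated[0])
    for value in aggregated[1:]:
        level = alpha * value + (1 - alpha) * level
    # (iii) Equal-weight disaggregation back to the original granularity.
    return level / bucket


print(adida_forecast([0, 0, 6, 0, 6, 0, 6, 0, 0], bucket=3))
```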

Type III models
Following Croston's (1972) intuition, some researchers developed separate models to forecast demand occurrence and demand size. Willemain et al. (2004) proposed modeling demand occurrence as a Markov process and forecasting demand size by randomly sampling past demand sizes, eventually jittering them to account for values not yet seen. Hua et al. (2007) follow a similar approach, attributing demand occurrence to autocorrelation or explanatory variables. If they attribute demand occurrence to autocorrelation, they predict demand occurrence based on Markov processes. Otherwise, they use a logistic regression model. Nasiri Pour et al. (2008) developed a hybrid approach, forecasting demand occurrence with a neural network (see Fig. 2), while they estimate demand size with exponential smoothing. The neural network they propose considers four input variables: demand size at the end of the preceding period, the number of periods between the last two demand occurrences, the number of periods between the target period and the last demand occurrence, and the number of periods between the target period and the first immediately preceding zero-demand period. Finally, Petropoulos et al. (2016) developed an alternative perspective to the ADIDA framework (Nikolopoulos et al. (2011)), aggregating time series in such a way that each time bucket contains a single demand occurrence. By doing so, the transformed time series no longer present intermittency, and the authors forecast the time-varying number of periods until such demand will occur. Demand size is estimated with Croston's, SBA, or Simple Exponential Smoothing methods, selected based on the mean values of the inter-demand interval and the coefficient of variation.
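To make the Markov-process view of demand occurrence concrete, the sketch below estimates a two-state (demand / no demand) first-order transition model from a demand history. It is only an illustration of the general idea and is not the exact procedure of Willemain et al. (2004); the function name and the fallback value for unseen states are assumptions.

```python
def occurrence_transition_probabilities(demand):
    """Estimate P(demand in next period | current state) for states 0 and 1."""
    states = [1 if d > 0 else 0 for d in demand]
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for current, following in zip(states, states[1:]):
        counts[(current, following)] += 1
    probabilities = {}
    for state in (0, 1):
        total = counts[(state, 0)] + counts[(state, 1)]
        # Fall back to 0.0 when a state was never observed (assumption).
        probabilities[state] = counts[(state, 1)] / total if total else 0.0
    return probabilities


probs = occurrence_transition_probabilities([0, 0, 3, 0, 0, 2, 0])
print(probs)  # probability of demand after a zero period and after a demand period
```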

Forecasting features
In the scientific literature related to demand forecasting of intermittent and lumpy demands, authors describe multiple characteristics and features relevant to demand occurrence forecasting. Among them we find the average inter-demand interval size (Levén and Segerstedt (2004)), previous demand event occurrence (Gutierrez et al. (2008)), distribution of inter-demand interval sizes (Croston (1972)), demand size (Nasiri Pour et al. (2008)), demand shape distribution (Zotteri (2000)), usage of early information generated by customers during the purchasing process (Verganti (1997)), the presence of paydays, billing cycles, or holidays (Hyndman et al. (2006)), demand event autocorrelation (Willemain et al. (1994)), or the fact that items may be purchased from the same supplier or shipped using the same transportation mode (Syntetos (2001)).
To estimate demand size, we found two techniques were applied across all cases. The first one was exponential smoothing (and its variants), applied across previous non-zero demand sizes (Nasiri Pour et al. (2008)). The second one was the use of jittering on top of randomly sampled past demand sizes to account for yet unseen values (Willemain et al. (2004); Hua et al. (2007)). Altay et al. (2008) described means to adjust demand size values based on the presence of trend in data.

Metrics
Measuring the performance of forecasting models for lumpy and intermittent demand was a subject of extensive research. Syntetos (2001) compared the performance of the Mean Signed Error, Wilcoxon Rank Sum Statistic, Mean Square Forecast Error, Relative Geometric Root Mean Square Error, and Percentage of times Better metrics, and concluded that the Relative Geometric Root Mean Square Error behaves well in the context of irregular demand. Teunter and Duncan (2009) pointed out that the Relative Geometric Root Mean Square Error cannot be applied on a single item for zero or moving average forecasts (it would result in zero error). Hemeimat et al. (2016) suggested using the tracking signal metric, calculated by dividing the most recent sum of forecast errors by the most recent estimate of Mean Absolute Deviation. Among many metrics, Hyndman et al. (2006) suggested using the Mean Absolute Scaled Error (MASE) for lumpy and intermittent demands, providing a scale-free assessment of how accurate demand size forecasts are. Though traditional per-period forecasting metrics, such as Root Mean Squared Error, Mean Squared Error, Mean Absolute Deviation, or Mean Absolute Percentage Error, were widely used in the literature regarding irregular demand, Teunter and Duncan (2009) and Kourentzes (2014) showed they are not adequate due to the high proportion of zeros. Prestwich et al. (2014a) proposed computing a modified version of the error measures, considering the mean of the underlying stochastic process instead of the point demand for each point in time. Finally, it is relevant to point out that two metrics were used in the M5 competition (Makridakis et al. (2020)) to assess time series regarding irregular sales: the Root Mean Squared Scaled Error (RMSSE, introduced by Hyndman et al. (2006)) and the Weighted RMSSE. By using a score that considers squared errors, result measurements optimize towards the mean. The Weighted RMSSE variant allows for penalizing each time series error based on some criteria (e.g., item price). Both metrics, though, unevenly penalize products sold during the whole time period against those that are not.
Syntetos and Boylan (2010) noted that, regardless of the metrics used to estimate how accurate demand forecasts are, it is crucial to measure the impact of forecasts on stock-holding and service levels. In this line, Wallström and Segerstedt (2010) proposed two complementary metrics. The first one is the number of shortages, counting how many times the cumulated forecast error is over zero in the time interval of interest. The second one is Periods in Stock, the number of periods the forecasted items spent in fictitious stock (or how many stock-out periods existed). More recently, Martin et al. (2020) proposed the Stock-keeping-oriented Prediction Error Costs (SPEC), which considers that, at each time step, demand forecasts translate either into opportunity costs or stock-keeping costs, but never both at the same time.

Figure (demand categorization schemas): Fig. 2A corresponds to the influential categorization developed by Syntetos et al. (2005). On the left (Fig. 2B), we propose a new schema that only considers demand occurrence, dividing demand into two groups. 'R' denotes regular demand occurrence, where demand size can be predicted with a regression model. 'C+R' denotes irregular demand occurrence that requires a model to predict demand occurrence and a model to predict demand size.

Croston (1972) developed the intuition of considering two components for intermittent demand forecasts: demand occurrence and demand sizes. Building on the categorization by Syntetos et al. (2005) and the division mentioned above, we propose an alternative demand categorization schema. Considering demand occurrence and demand quantity forecasting as two different problems, we can divide demand into two types (see Fig. 3). The first demand type is 'R', which refers to demand with regular demand event occurrence. Since demand (almost) always occurs, the estimation of demand occurrence is rendered irrelevant and does not pose a challenge when developing a regression model to estimate demand size. The second demand type is 'C+R' and refers to demand with irregular demand event occurrence. These cases benefit from models that consider both demand occurrence and demand size (see Section 2.2).

Demand characterization and forecasting models
Figure 4: Two-fold machine learning approach to demand forecasting. Fig. 4A shows a basic architecture for demand forecasting when reframing demand forecasting as classification and regression problems. Fig. 4B shows a fluxogram with steps followed to create the demand forecasting models and issue demand forecasts.
We present a demand forecasting model architecture and a fluxogram describing how to build it and issue demand forecasts in Fig. 4. Since the classification and regression models address different problems, we expect them to use different features to help achieve their goals. We described aspects relevant to demand occurrence and demand size forecasting present in the literature in Section 2.2.4. These can be used as features for the classification and regression models, respectively.
An increasing body of research suggests global machine learning time-series models (models built with multiple time series) provide better results than local ones (models considering time series corresponding to a single product) (Bandara et al. (2020); Salinas et al. (2020)). Increased performance is observed even when training models with disparate time series that have different magnitudes or may not be considered related to each other (Laptev et al. (2017)), though how to bound the maximum possible error in such models remains a topic of research. Given this insight, we consider that dividing demand based on the coefficient of variation provides limited value and is no longer relevant to demand categorization.
We keep the cut-off value of ADI=1.32 proposed by Syntetos et al. (2005) as a reference. While this cut-off value remains relevant for statistical methods, we consider that its relevance blurs for machine learning models, and it may become irrelevant for global machine learning classification models that predict demand occurrence. By considering multiple items at once, global models perceive a higher density of demand events and less irregularity than models developed with data regarding a single demand item. Simultaneously, the model can learn underlying patterns, which may be related to specific behaviors (e.g., deliveries take place only on certain days). It is important to note that event scarcity usually results in imbalanced classification datasets, posing an additional challenge.
We present suggested metrics to assess each model, and overall demand forecasting performance, in Section 3.2.

Metrics
Though several authors (e.g., Croston (1972); Syntetos and Boylan (2001); Kiefer et al.) considered separating demand occurrence and demand size when forecasting irregular demand, in the literature we reviewed, we found no researcher who measured them separately. We thus conclude that they did not treat demand occurrence and demand size forecasting as entirely different problems.
Considering irregular demand forecasting only as a regression problem led to much research and discussion (presented in Section 2.3) on how to mitigate and integrate zero-demand occurrence to measure the performance of demand forecasting models adequately. In our research, we provide a different perspective. We adopt four metrics to assess the performance of demand forecasting models: (i) AUC ROC to measure how accurately the model forecasts demand occurrence, (ii) two variants of MASE to measure how accurately the model forecasts demand size, and (iii) SPEC to measure how the forecast impacts inventory. When measuring SPEC, we consider α1 = α2 = 0.5.
AUC ROC is widely adopted as a classification metric, having many desirable properties such as being threshold independent and invariant to a priori class probabilities. MASE has the desirable property of being scale-invariant. We consider two variants (namely MASE I and MASE II). Following the criteria in Wallström and Segerstedt (2010), we compute MASE I on the time series that results from ignoring zero-demand values. By doing so, we assess how well the regression model performs against a Naïve forecast, assuming a perfect demand occurrence prediction. On the other hand, MASE II is computed on the time series considering all points where either demand took place or the classification model predicted demand occurrence. By doing so, we measure the impact of demand event occurrence misclassification on the demand size forecast. When the model predicting demand occurrence has perfect performance, MASE I should equal MASE II. Finally, we compute SPEC on the whole time series (considering zero and non-zero demand occurrences). This way, the metric measures the overall forecast impact on inventory, weighting stock-keeping and opportunity costs.
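The difference between the two MASE variants can be illustrated with a small sketch. It is a simplification under stated assumptions: the scaling term is the mean absolute error of a one-step Naïve forecast on the same filtered series (rather than on a separate in-sample window), and the function names and inputs are hypothetical.

```python
from statistics import mean


def mase(actual, forecast):
    """MAE of the forecast scaled by the MAE of a one-step Naive forecast."""
    naive_errors = [abs(b - a) for a, b in zip(actual, actual[1:])]
    scale = mean(naive_errors)
    if scale == 0:
        raise ValueError("Naive forecast error is zero; MASE is undefined")
    errors = [abs(y - f) for y, f in zip(actual, forecast)]
    return mean(errors) / scale


def mase_variants(actual, size_forecast, predicted_occurrence):
    """Return (MASE I, MASE II) for one time series."""
    # MASE I: only periods with demand, i.e. perfect occurrence prediction assumed.
    idx_i = [t for t, y in enumerate(actual) if y > 0]
    mase_i = mase([actual[t] for t in idx_i], [size_forecast[t] for t in idx_i])
    # MASE II: periods with demand or with predicted demand occurrence.
    idx_ii = [t for t, y in enumerate(actual) if y > 0 or predicted_occurrence[t]]
    combined = [size_forecast[t] if predicted_occurrence[t] else 0.0 for t in idx_ii]
    mase_ii = mase([actual[t] for t in idx_ii], combined)
    return mase_i, mase_ii


actual = [0, 10, 0, 0, 12, 0, 8]
sizes = [9, 9, 9, 11, 11, 11, 10]
perfect = [y > 0 for y in actual]
print(mase_variants(actual, sizes, perfect))
print(mase_variants(actual, sizes, [False, True, True, False, False, False, True]))
```

With a perfect occurrence prediction both values coincide; a false alarm or a missed demand event only affects MASE II.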

Business understanding
Demand forecasting is a critical component of supply chain management since its outcomes directly affect the supply chain and manufacturing plant organization. This research focuses on demand forecasting for a European original equipment manufacturer from the automotive industry. We explored providing demand forecasts for each material and client at a daily level. Such forecasts enable highly detailed planning. From the forecasting perspective, accurate forecasts at a daily level can leverage the most recent information, which is lost at higher aggregation levels for non-overlapping aggregations. They also avoid imprecisions that result from higher-level forecast disaggregations.

Data understanding
For this research, we used a dataset with three years of demand data extracted from an Enterprise Resource Planning software. We derive demand data from records of products shipped from manufacturing locations, dated on the day they are shipped. The dataset comprises 516 time series that correspond to 279 materials and 149 clients. When categorized according to the schema proposed by Syntetos et al. (2005), we found 49 correspond to a lumpy demand pattern and 467 to an intermittent demand pattern. In Table 1 and Table 2 we provide summary statistics for the time series corresponding to each demand pattern. We find that demand occurrence for both sets of time series is highly infrequent, having a mean of one demand event in almost two months or more.

Data Preparation, Feature Creation and Modeling
We forecast irregular demand with two separate models: a classification model to predict demand occurrence and a regression model to predict demand size. Though source data is the same for both, different features must address each model's goals. Following knowledge distilled in the literature, we created the features presented in Section 2.2.4, and some features of our own.
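How the outputs of the two models can be combined into a single demand forecast is sketched below. The 0.5 threshold matches the cut used for classification scores in the evaluation section; the function name and the input format (one occurrence score and one size estimate per period) are assumptions for illustration.

```python
def two_fold_forecast(occurrence_scores, size_estimates, threshold=0.5):
    """Issue the size estimate only where demand occurrence is predicted."""
    return [
        size if score >= threshold else 0.0
        for score, size in zip(occurrence_scores, size_estimates)
    ]


# Classifier scores and regression estimates for three target days (made-up values).
print(two_fold_forecast([0.9, 0.2, 0.5], [10.0, 8.0, 6.0]))
```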
To model demand occurrence, we considered weekdays since last demand, day of the week of the last demand occurrence, day of the week for the target date, mean of inter-demand intervals, the mean of last inter-demand intervals (across all products), and the skew and kurtosis of demand size distributions, among others. To model demand size, we considered, for each product, the size of the last demand, the average of the last three demand occurrences, the median value of past occurrences, the most frequent demand size value, and the exponential smoothing of past values, among others.
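A few of the features listed above can be derived from a single demand history as in the following sketch. The feature names and the dictionary layout are hypothetical, periods are counted without any weekday or calendar handling, and only a subset of the features is shown.

```python
from statistics import mean, median


def basic_features(demand):
    """Features known at the end of `demand` (oldest observation first)."""
    event_idx = [t for t, d in enumerate(demand) if d > 0]
    if not event_idx:
        return None  # no demand observed yet, nothing to derive
    sizes = [demand[t] for t in event_idx]
    intervals = [b - a for a, b in zip(event_idx, event_idx[1:])]
    return {
        # Occurrence-related features.
        "periods_since_last_demand": len(demand) - 1 - event_idx[-1],
        "mean_inter_demand_interval": mean(intervals) if intervals else None,
        # Size-related features.
        "last_demand_size": sizes[-1],
        "mean_last_three_sizes": mean(sizes[-3:]),
        "median_size": median(sizes),
    }


print(basic_features([0, 5, 0, 0, 7, 0, 9, 0, 0]))
```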
When computing feature values, we considered two forecasting horizons: fourteen and fifty-six days (Fig. 5), to understand how the forecasting horizon size affects the forecasts.
To forecast irregular demand, we compare seven methods from the literature:
• Croston's method (Croston (1972));
• SBA (Syntetos and Boylan (2005));
• TSB (Teunter and Sani (2009));
• a hybrid model proposed by Nasiri Pour et al. (2008), which uses a NN model (see Fig. 2) to forecast demand occurrence, while demand size is computed as exponential smoothing over non-zero demand quantities in past periods;
• a hybrid model proposed by Willemain et al. (2004), in which demand occurrence is estimated as a Markov process, while demand sizes are randomly sampled from previous occurrences;
• the ADIDA forecasting method, proposed by Nikolopoulos et al. (2011), which removes intermittence through aggregation and then disaggregates the forecast back to the original aggregation level.
We also developed models of our own. We created a Catboost model (Prokhorenkova et al. (2018)) to forecast demand occurrence and compared six models to forecast demand size: Naïve, Most Frequent Value (MFV), Moving Average over the last three demand periods (MA(3)), Simple Exponential Smoothing (SES), random sampling from past values with jittering (RAND), and Random Forest Regressor (ML). While we initially used the LightGBM algorithm (Ke et al. (2017)), since four of the top five time-series forecasting models in the M5 competition were based on this algorithm (Makridakis et al. (2020)), we obtained better results with the Random Forest Regressor.
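The four simplest demand size estimators can be sketched as follows, each operating on the list of past non-zero demand sizes. The function names, the tie-breaking rule for the most frequent value, the SES initialization, and the smoothing constant are illustrative assumptions.

```python
from statistics import mean, multimode


def naive(sizes):
    """Naïve: repeat the last observed demand size."""
    return sizes[-1]


def most_frequent_value(sizes):
    """MFV: most frequent past size (ties resolved by first appearance)."""
    return multimode(sizes)[0]


def moving_average(sizes, window=3):
    """MA(3): mean of the last `window` demand sizes."""
    return mean(sizes[-window:])


def simple_exponential_smoothing(sizes, alpha=0.3):
    """SES: level initialized with the first size, then smoothed with alpha."""
    level = float(sizes[0])
    for x in sizes[1:]:
        level += alpha * (x - level)
    return level
```

All four run in linear time over the demand history and need no training step.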
Catboost uses gradient descent to minimize a cost function, which indicates how successful the model is at meeting the classification goal. Since the dataset regarding demand occurrence is heavily imbalanced (less than 6% of instances correspond to demand occurrence), we chose to optimize the model training with the focal loss (Lin et al. (2017)). The focal loss has the desirable property of penalizing training samples asymmetrically, focusing on misclassified ones to improve the overall classification.
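The per-sample focal loss can be written down as a short sketch. The defaults for `gamma` and `alpha` follow the values commonly reported by Lin et al. (2017); the values used in our training are not stated here, so they should be read as assumptions, and the sketch does not show how the loss is plugged into the classifier.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one sample.

    p: predicted probability of the positive class (demand occurrence).
    y: true label, 1 for demand occurrence and 0 otherwise.
    """
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)             # avoid log(0)
    # (1 - p_t) ** gamma shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` the modulating factor disappears and the expression reduces to a class-weighted cross-entropy.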

Experiments and Results
This section describes the experiments we conducted and assesses their results with the metrics we described in Section 3.2. For the SPEC metric, we considered α1 and α2 equal to 0.5. We summarize our experiments in Table 3, while we present their results in Table 4 and Table 5.
We adopt two forecasting horizons (fourteen and fifty-six days) to understand how sensitive the existing approaches are to the forecast lead time. To evaluate the models, we used nested cross-validation (Stone (1974)), which is frequently used to evaluate time-sensitive models. We test our models by making predictions at the weekday level for six months of data. For classification models, we measure AUC ROC considering prediction scores cut at a threshold of 0.5. The only exception to this was the model by Nasiri Pour et al. (2008), since the authors explicitly stated that they considered any prediction above zero as an indication of demand occurrence.
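To make the effect of the threshold concrete, the sketch below computes AUC ROC through its pairwise-ranking interpretation and applies it to binarized scores. The helper names and the inclusive comparison at the threshold are assumptions; the sketch only illustrates the metric and is not the evaluation code behind the reported results.

```python
def auc_roc(labels, scores):
    """AUC ROC as the probability that a positive outranks a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def binarize(scores, threshold=0.5):
    """Cut prediction scores at a threshold (assumed inclusive)."""
    return [1 if s >= threshold else 0 for s in scores]
```

For labels `[0, 0, 1, 1]` and scores `[0.2, 0.3, 0.4, 0.9]`, the raw scores rank every positive above every negative, whereas cutting at 0.5 turns one positive into a tie with both negatives and lowers the value.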
We observed that although the literature approaches provided different means to estimate demand occurrence, their performance in doing so was close to 0.5 AUC ROC. Differences in performance were mainly driven by the method used to estimate demand size.
All the models we developed in our experiments strongly outperformed the models replicated from the literature. When considering AUC ROC, our models achieved scores of at least 0.94, almost double the score of every model described in the literature. When considering regression metrics, our models displayed three to four times better MASE I and MASE II, and even more significant differences regarding the SPEC metric. We consider that these results confirm the importance of treating the forecasting of irregular demand as two separate problems (demand occurrence and demand size), each with its own features and optimized against its own set of metrics. Improvements in classification scores have a substantial impact on demand size and inventory metrics.
We obtained the best results with the Catboost classifier trained over all instances, making no distinction between lumpy and intermittent demand. The model achieved an almost perfect classification AUC ROC score, reaching a value of 0.97 for both the fourteen and fifty-six day horizons. Among the regression models used to estimate demand size, we achieved the best results with SES and MFV. The first one outperformed every other model on MASE I and MASE II, while the second one had the best median SPEC score and remained competitive on MASE I and MASE II.
When building the global classification model, we were interested in how much better it performed than models built using only lumpy or only intermittent demand. We found that the models built only on lumpy or intermittent demand achieved an AUC ROC of at least 0.7368 and 0.9666 in each subset of products, while the global model built on all time series increased the performance to 0.9097 and 0.9776, respectively (see Table 6).
From the results, it seems the forecasting horizon has little influence on the classification performance. We attribute this to the fact that demand occurrence is scarce, and thus changes in demand behavior are likely to be slow. This fact has engineering implications, since there is no need to retrain and deploy the classification model frequently. Achieving the best demand size forecasts with SES and MFV also means that demand forecasting requires neither expensive computations nor much maintenance in production.
Finally, we compared our C2R1-SES model against the ADIDA approach. To that end, we selected 64 products that had steady demand over the three years we considered, used SES to provide two-month-ahead forecasts, and measured MASE for the last six months using cross-validation. The C2R1-SES model with a fifty-six day forecasting horizon showed a strong performance, achieving a MASE of 0.0052, while for ADIDA-SES we measured a MASE of 1.6640. Our model outperforms the state-of-the-art self-improving mechanism ADIDA at an aggregate level. In the light of the results obtained, we consider that our approach has at least two significant advantages over aggregate forecasts. First, we avoid issues related to forecast disaggregation, provided the classification model is good enough. Second, our models benefit from the information that is lost with aggregation, which may help to issue better forecasts.
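For context, the aggregate-disaggregate idea behind ADIDA combined with SES can be sketched as follows. The bucket size, the smoothing constant, the choice to drop the oldest partial bucket, and the equal-weight disaggregation are all assumptions made for illustration; they do not necessarily match the configuration behind the MASE values reported above.

```python
def adida_ses(series, bucket=7, alpha=0.3):
    """Per-period forecast via aggregation, SES, and equal-weight disaggregation."""
    usable = len(series) - len(series) % bucket
    if usable == 0:
        raise ValueError("series is shorter than one bucket")
    # Drop the oldest partial bucket so that the most recent data is kept.
    recent = series[len(series) - usable:]
    # Aggregate into non-overlapping buckets to reduce intermittence.
    totals = [sum(recent[i:i + bucket]) for i in range(0, usable, bucket)]
    # Simple exponential smoothing on the aggregated series.
    level = float(totals[0])
    for total in totals[1:]:
        level += alpha * (total - level)
    # Equal-weight disaggregation back to the original periods.
    return level / bucket
```

The equal-weight step spreads the aggregate forecast over every period in the bucket, which is the disaggregation issue that a per-period occurrence classifier avoids.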

Conclusions
In this research, we propose a new look at the demand forecasting problem for infrequent demand. By breaking it down into two prediction problems (classification for demand occurrence and regression for demand size), we (i) enable accurate model diagnostics and (ii) optimize each model for its specific task. Our results show that such decomposition enhances overall forecasting performance. Analyzing models proposed in the literature, we found that most of them underperformed by first failing to predict demand occurrence accurately. Together with the problem decomposition, we propose a set of four metrics (AUC ROC for classification, MASE I and MASE II for regression, and SPEC to assess the impact on inventory). We also developed a novel model, which outperformed six models described in the literature for lumpy and intermittent demand and the state-of-the-art self-improving mechanism known as ADIDA. Considering the problem separation mentioned above, we also propose a new demand classification schema based on the approach that provides a good demand forecast. We consider two types of demand: 'R' for demand where only regression is required, and 'C+R' where classification and regression models estimate demand occurrence and demand size, respectively.
We envision several directions for future research. First, we would like to replicate these experiments on some widely cited datasets, such as SKUs of the automotive industry (initially used by Syntetos and Boylan (2005)) or SKUs of the Royal Air Force (initially used by Teunter and Duncan (2009)). Second, by decoupling demand occurrence from demand size, we enable new approaches to deal with forecasts involving item obsolescence, stationarity, and trend in irregular demand. Third, we understand this approach can be extended to forecasting irregular and infrequent sales, which can significantly impact retail. Finally, we consider that this approach can be applied in other domains with irregular time series, where self-improving mechanisms such as ADIDA are currently applied.
CV stands for Coefficient of Variation. It is computed over non-zero demand occurrences (Syntetos and Boylan (2001)).

Figure 1: Demand patterns classification. Fig. 1A shows the classification proposed by Syntetos et al. (2005) based on empirical findings. In Fig. 1B we propose a new demand pattern classification, based on the models required to solve the demand forecasting problem. 'R' stands for 'Regression', while 'C+R' stands for 'Classification + Regression'.

Figure 2: MLP for the hybrid approach proposed by Nasiri Pour et al. (2008). Inputs to the model are the demand size at the end of the preceding period, the number of periods between the last two demand occurrences, the number of periods between the target period and the last demand occurrence, and the number of periods between the target period and the first immediately preceding zero demand period.

Figure 3: Demand categorization schemas. The schema on the right (Fig. 2A) corresponds to the influential categorization developed by Syntetos et al. (2005). On the left (Fig. 2B), we propose a new schema that only considers demand occurrence, dividing demand into two groups. 'R' denotes regular demand occurrence, where demand size can be predicted with a regression model. 'C+R' denotes irregular demand occurrence that requires a model to predict demand occurrence and a model to predict demand size.
Syntetos et al. (2005) considered these two components and developed an influential demand categorization dividing demand into four types: smooth, erratic, intermittent, and lumpy, based on the coefficient of variation and the average demand interval. Many authors followed Croston's intuition, developing separate models to estimate demand occurrence and demand size (e.g., Willemain et al. (2004); Hua et al. (2007); Nasiri Pour et al. (2008)), though none of them measured the performance of the demand occurrence component. We thus propose decoupling the demand forecasting problem into two sub-problems, each of which requires a separate model, features, and metrics: (i) demand occurrence, addressed as a classification problem, and (ii) demand size estimation, addressed as a regression problem. Following the original work by Syntetos et al. (

Figure 5: We compute predictions for two forecasting horizons, 14 and 56 days, to test the sensitivity of predictions regarding demand occurrence and demand size to the forecasting horizon.

Table 1: Summary statistics for 49 lumpy demand time series.

Table 2: Summary statistics for 467 intermittent demand time series.

Table 3: Description of the reference models we evaluated for demand occurrence forecasting and demand size estimation.

Table 5: Comparison of methods found in related work and two of the best models we created: C2R1-SES displays the best performance on AUC ROC, MASE I and MASE II, while C2R1-MFV has the best performance on the SPEC metric and remains competitive on the rest.

Table 6: Comparison of AUC ROC for lumpy and intermittent demand, obtained from C1 and C2 models. Using all data for a single classification model to predict demand occurrence shows improvements in both groups and time horizons. The highest improvement is observed for lumpy demand, with an improvement greater than 0.17.