Forecasting and Inventory Planning: An Empirical Investigation of Classical and Machine Learning Approaches for Svanehøj’s Future Software Consolidation

Abstract: Challenges related to effective supply and demand planning and inventory management pose critical planning issues for many small and medium-sized enterprises (SMEs). In recent years, data-driven machine learning (ML) algorithms have delivered beneficial results for many large-scale enterprises (LSEs). However, ML applications have not yet been widely tested in SMEs, leaving a technological gap. Limited resource capabilities and financial constraints expose SMEs to the risk of implementing an insufficient enterprise resource planning (ERP) setup, which amplifies the need for additional support systems for data-driven decision-making. We found that forecasts and inventory management policies in SMEs are often based on subjective decisions, which might fail to capture the complexity of achieving performance goals. Our research aims to leverage ML models for SMEs within demand and inventory management by considering various key performance indicators (KPIs). The research is based on a collaboration with a Danish SME that faced issues related to forecasting and inventory planning. We implemented the following ML models for forecasting: Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), Support Vector Regression (SVR), Random Forest (RF), Wavelet-ANN (W-ANN), and Wavelet-LSTM (W-LSTM); for inventory management, we applied the reinforcement learning approaches Q-learning and Deep Q-Network (DQN). The results demonstrate that predictive ML models outperform statistical forecasting approaches, but not always when the focus is on industrial KPIs. Moreover, when ML models alone are considered, the results indicate that careful consideration is required, given that model evaluation can be perceived from both an academic and a managerial perspective. Secondly, Q-learning is found to yield preferable economic results in terms of inventory planning.
The proposed models can serve as an extension to modern ERP systems by offering a data-driven approach to demand and supply planning decision-making.


Introduction
Businesses within supply chains often differ in relative size, resulting in firms having various financial, management, knowledge, and technology capabilities and resources available to secure efficient supply chain planning [1]. These capabilities substantially affect the business goals that small and medium-sized enterprises (SMEs) can achieve [2]. To investigate the origin of this key issue, ref. [3] describes the future of enterprise resource planning (ERP) systems and the associated software complications in their comprehensive study on the practical implications of ERP systems. The authors found that digitalization will enable new opportunities for SMEs, since domain knowledge can serve as a modifier to the forecasting framework by incorporating distinctive features.
In the literature on SMEs and ERP implementation, we found that substantial barriers and challenges still remain, and that ERP implementations should not be categorized as standard IT projects [4]. Rather, an ERP implementation is an organizational change that requires interdepartmental participation across the whole organization [5]. Ref. [6] found reduced inventory to be the most common gain, while reduced planning effort, decreased lead time, and improved communication and decision-making are all established organizational gains after successful ERP implementation. Although ERP systems can impact SMEs positively, awareness of potential implementation risks and the issues associated with unsuccessful implementation must be maintained. Ref. [7] identified that the main risks in ERP adoption for SMEs are their limited resources and distinct operating characteristics, which make the implementation case different from that of larger enterprises. ERP systems can adapt to manufacturing environmental changes but not to uncertainty, which generally constitutes the random interventions of the real world. SMEs tend to use buffering and dampening techniques to manage the impact of uncertainty arising from the competitive environment, strategic objectives, and manufacturing structures. As a response to standardized commercial commodity ERP software solutions, ref. [8] more recently proposed free, open-source ERP as a framework, enabling cost-effective, customized ERP solutions for SMEs that lack the financial capacity to invest in commercial ERP systems. Additionally, the authors argue that free, open-source ERP can serve and support such SMEs.
Accurate demand forecasting forms the backbone of supply chain planning, since a forecast is a main facilitator of efficient procurement, production planning and control, and inventory management [9,10]. Forecasts establish the operational and strategic information background that underpins the future decision-making processes of an organization. Data limitations or limited sources of historical data often prevent the application of data-driven models. To combat this issue, ref. [11] proposes a set of grouping schemes based on different grouping criteria for product classification to enhance the application of machine learning models with the objective of learning future demand. However, the qualitative forecast methodology lacks a generic approach to track the overall performance related to forecast accuracy and the potential benefits of combining the qualitative forecast with a data-driven forecast to aid the identification of trends and patterns.
Forecasting inefficiency also has a cascading effect on inventory planning. Inventory management should be assessed as an enabling competence, and in order to fully utilize a company's forecasting tools, inventory management and forecasting should be combined [12]. In practice, many SMEs reduce their competitive advantage through inadequate inventory management. The main contributing factors to inadequate inventory management within SMEs include the lack of established planning processes, insufficient information flow between sales, the supply chain, and the production department, and the lack of knowledge and skills of the employees [13]. Most SMEs use either Material Resource Planning (MRP) or re-order point methods as their inventory policies, but the parameters these methods use need to be closely monitored and updated in order for the models to adjust to the dynamic environmental changes in a given supply chain [14][15][16]. Theoretical methods that focus on the specific problem of a joint multi-item and multi-stage optimum have been widely investigated, but due to their limitations, and some methods being too complex, many of the theoretical methods are deemed infeasible in practice. As a consequence, a gap between theoretical inventory models and practical industrial application has appeared, which [17] discusses.
Recently, we have worked with Svanehøj Danmark A/S (https://www.svanehoj.com/, accessed on 23 February 2023), where the sponsor company relies on qualitative forecasts based on the sales representatives' market intuition in combination with external market research reports. The main objective was to gain insights into future requirement quantities for the bill of material (BOM)-level raw materials, optimize supply chain planning, and close internal information gaps between outbound and inbound logistics by conducting demand planning. The process aimed to initiate the first steps of sales and operations planning (S&OP) on the different product groups to balance and reduce gaps between demand and supply chain planning. Within this case study, the company manufactures according to the Make-to-Order (MTO) principle. Svanehøj's current supply chain planning process is confined to the MRP environment of their ERP system, which operates deterministically and suggests future procurement quantities based on a set of coverage groups. However, due to human intervention and scaled-up labor costs, they face difficulties: products with large inventories, as well as stockouts, disturb their timely delivery commitments.
This paper aims to evaluate data-driven algorithms for forecasting and inventory planning and to validate how these can benefit SMEs. We implemented eight forecasting methods and three inventory replenishment algorithms to verify future software consolidation possibilities. In particular, our study contributes by applying Q-learning to an inventory management problem, using reward mechanisms that allow a more efficient inventory management process. The organization of the article is as follows: in the following subsection, we review the state-of-the-art literature on inventory and forecasting issues. In Section 2, we briefly introduce the sponsor company, the methodology implemented to evaluate forecasting and inventory planning methods, and the key performance indicators (KPIs) used for testing. Section 3 presents the results. Managerial insights are drawn in Section 4, and, finally, concluding remarks are presented in Section 5.

Literature Review
Within recent years, reinforcement learning has shown notable progress within the field of operations research. Inventory planning can be semi-automated and modeled as an environment that an agent interacts with. We present a brief overview of recent applications of machine learning methods to inventory planning in Table 1.

Table 1. Literature Review: application of machine learning in inventory replenishment decisions.

Ref. [18]. Supply chain environment/settings: an ordering replenishment problem in a retailer setting with the objective of minimizing cost; the authors assume a fixed lead time from a supplier. Approach: Q-learning and the SARSA algorithm to find near-optimal replenishment policies for perishable products, comparing the outcomes of six situations. Findings: the numerical results show that the discount factor, learning rate, and exploration rate influence the model's learning performance; an ordering policy that incorporates age information and inventory quantity yields better outcomes than quantity-dependent policies. Practical implications: a comprehensive cost analysis of the policies with respect to demand variability, lead time, cost ratio, and product lifetime.

Ref. [19]. Supply chain environment/settings: a multi-echelon supply chain with a factory and multiple warehouses under stochastic, seasonal demand. Approach: the SARSA and REINFORCE algorithms with an agent that acts based on the (s,Q) policy under a simple and a complex scenario. Findings: both the SARSA and REINFORCE approaches yield higher performance than the (s,Q) policy. Practical implications: agents can be programmed to cope with demand trends, production levels, and stock allocation, and can cope, operate, and take action under simple market conditions.

Ref. [20]. Supply chain environment/settings: a multi-echelon supply chain system with a plant, warehouse, retailer, and customers under a simple, a complex, and a special case, each with different settings. Approach: a vanilla policy gradient algorithm to solve the supply chain problem through profit maximization; production targets and quantity targets are either discrete or continuous variables, and the results are compared with the outcome of a (r,Q) policy. Findings: in all three cases, the DRL agent outperforms the (r,Q) policy. Practical implications: both (1) action clipping and (2) an output activation function yield better results than the (r,Q) policy; the model-free DRL models do not consider transition probabilities and are thereby able to make decisions without knowing the demand probability.

Ref. [21]. Supply chain environment/settings: multi-echelon systems.

Ref. [24]. Supply chain environment/settings: a multi-echelon supply chain system based on the beer game. Approach: reconfigure the DQN algorithm to operate in a cooperative environment. Findings: DQN models obtain near-optimal policies when compared with agents that follow a base-stock policy. Practical implications: transfer learning makes the DQN agents flexible enough to cope with different cost structures and settings without the need for vast retraining; DQN shows promising numerical results in a supply chain setting with real-time information sharing between chain entities.

Ref. [25]. Supply chain environment/settings: three different inventory problems: (1) lost sales; (2) dual sourcing; (3) multi-echelon inventory management. Approach: formulate the inventory problems as MDPs and use the Asynchronous Advantage Actor-Critic (A3C) algorithm. Findings: the paper provides a proof of concept of deep reinforcement learning's ability to solve classic inventory problems and underlines that the A3C algorithm adapts well to the stochastic environment. Practical implications: a sensitivity analysis identifies the optimality gaps between state-of-the-art heuristic policies and the A3C algorithm; the findings showcase that A3C yields good performance for long lead times and has a higher overall performance than the heuristic policies.

Table 1 reflects that various algorithms have been implemented for inventory planning. We refer to the recent review by [26] for a more detailed overview of this stream of research. Regarding the application of forecasting methods, we refer to [27,28] for recent developments. Note that researchers mostly treat inventory planning and forecasting as two separate directions: evaluations are performed while either disregarding the importance of forecasting or assuming an idealistic forecast is already available. However, this is not the case, at least for our sponsor company, and for SMEs the improvement of forecast accuracy and inventory performance might not be aligned, creating issues of various magnitudes. Therefore, our study supplements the literature and highlights the possible consequences for SMEs.

Company Background
This study is based on a collaboration with the Danish SME Svanehøj Danmark A/S (https://www.svanehoj.com/, accessed on 23 February 2023), which produces high-technology pump solutions for cargo pumps and the safe handling of complicated fluids for the marine industry. In recent years, Svanehøj has experienced a vast demand spike across their whole product portfolio. While being affected by supply chain disruption, mostly due to COVID and technology integration in their products, they face several issues. Primarily, the company's challenges have been within the procurement process of raw materials while maintaining appropriate inventory levels to serve future expected demand. We use three of their critical sub-components, which facilitate their production planning, to study whether their current procurement planning and inventory management are fundamental issues behind the degraded company performance.

Data Used
The quantitative data used in this study originate from the company's ERP system and were extracted through Microsoft Dynamics AX. The data consist of two types, categorical and numerical. The categorical data are both ordinal (e.g., the BOM list) and nominal (e.g., supplier information) and were used for labeling and storing the data. The numerical data are both discrete and continuous and provide insight into quantities and measurements. Based on discussions with the sponsor company's representatives, we selected three important sub-components, which are used for producing multiple finished products.

Methodology
Figure 1 provides a step-wise overview of the computational work. As shown in Figure 1, we start by processing the data and performing time series decomposition on the processed data to reveal trends, seasonality, and residuals. We then perform a stationarity test on each data set to see whether the data are stationary. Lastly, we prepare the data sets for the ML methods by dividing the data into training and test sets. We then run each method before validating the forecasts and computing their performance on the chosen KPIs. For the inventory models, we feed the models with demand information and observe how they behave and perform according to the chosen KPIs. The models are then compared, and the best-performing methods are identified.

Technical Details
All the methods used in this paper have been developed and written in the Python language, with implementations relying on TensorFlow [29].

Methods for Forecasting
Different statistical and data-driven methodologies are investigated in this study to establish a reliable forecasting framework and evaluate its performance, with respective impacts on the business from both a managerial and a mathematical perspective. The forecasting methods we tested are categorized as: (i) statistical (e.g., Simple Exponential Smoothing (SES), Autoregressive Integrated Moving Average (ARIMA)); (ii) artificial intelligence (e.g., RF, SVR, W-ANN, W-LSTM). When we use a data-driven approach, we divide the data set in an 80:20 ratio into a training and a test set. All the forecasting methods are implemented in the Python programming language. A brief description of each method is presented below:

Simple Exponential Smoothing (SES): Simple exponential smoothing forecasts the next value by adding to the previous forecast a fraction, given by a predetermined smoothing constant α between 0 and 1, of the difference between the past observation and that forecast. This treatment of the error gap between past and next observations, together with the simplicity of the model, is the main driver of its practical adoption. The smoothing constant used in our study is optimized automatically. SES is easy to learn and apply and requires only a small sample of historical data. However, the disadvantage is that the model is inaccurate for long-term forecasts and is negatively affected by volatile demand [28].
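The SES recursion can be sketched in a few lines of Python. This is illustrative only: the smoothing constant is fixed here, whereas in our study α is optimized automatically, and the function name is a hypothetical helper, not part of our implementation.

```python
def ses_forecast(series, alpha=0.3):
    """One-step-ahead SES forecasts.

    After processing observation t, the current level is the forecast
    for period t+1: level <- alpha * y_t + (1 - alpha) * level.
    The last element of the returned list is the forecast for the
    next, unseen period.
    """
    level = series[0]            # initialize the level with the first observation
    forecasts = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level   # smooth toward the new value
        forecasts.append(level)
    return forecasts
```

Because the update only keeps a single running level, SES needs almost no memory and very little history, which matches its appeal for SMEs with limited data.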
Autoregressive Integrated Moving Average (ARIMA): The autoregressive integrated moving average is a statistical forecasting model that seeks to capture correlations between observations in a time series data set. ARIMA can be understood through its components: (i) Autoregression (AR), (ii) Integrated (I), and (iii) Moving Average (MA). AR uses observations from past time steps as input to the regression; the parameter p specifies the number of past time steps used. I is the differencing of the original observations, which transforms non-stationary data into stationary data; the parameter d specifies how many orders of differencing are applied to reach stationarity. MA models the autocorrelation between an observation and the residual errors of lagged observations; the parameter q determines the number of moving-average terms included in the model [30].
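The differencing step behind the "I" component can be illustrated with a short sketch (a hypothetical helper for exposition, not the ARIMA implementation used in the study):

```python
def difference(series, d=1):
    """Apply d-th order differencing (the 'I' in ARIMA).

    Each pass replaces the series with consecutive differences,
    removing a polynomial trend of degree d and pushing the data
    toward stationarity.
    """
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series
```

For example, a series with a linear trend becomes constant after one pass, and a quadratic trend after two, which is why d rarely needs to exceed 2 in practice.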
To determine the order of the ARIMA(p,d,q) models in this paper, the Akaike Information Criterion (AIC) has been used. The ARIMA model has the advantage of capturing trends and seasonality but fails to capture intrinsic non-linearity [31].

Support Vector Regression (SVR): SVR is a classical approach based on statistical learning theory. The method is based on the principle of identifying a function that minimizes risk, as opposed to minimizing empirical error as in linear regression [32,33]. Three hyper-parameters govern the performance of a given SVR model. The parameters are determined either through theoretical error bounds or through error estimation by cross-validation. However, in practical cases, parameter tuning is difficult due to the complex nature of the parameter space, which makes it challenging to determine the relationship between the parameters and the input noise [33][34][35]. We identified SVR's key advantage as the model's ability to handle small data sets consisting of non-stationary and volatile data.
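The ε-insensitive loss that distinguishes SVR from ordinary least-squares regression can be sketched as follows (the function and the eps value are illustrative; they are one of the hyper-parameters tuned by cross-validation):

```python
def eps_insensitive_loss(y_true, y_pred, eps=0.5):
    """SVR's epsilon-insensitive loss.

    Errors inside the eps-tube around the regression function cost
    nothing; deviations beyond the tube are penalized linearly,
    which is what gives SVR its robustness to noise.
    """
    return sum(max(0.0, abs(y - f) - eps) for y, f in zip(y_true, y_pred))
```

Only points outside the tube become support vectors, so with noisy, volatile demand the fitted function does not chase every fluctuation.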
Random Forest (RF): Random Forest is a machine learning algorithm that can be used for both classification and regression. RF can serve as a method for a wide range of predictive problems, especially practical ones [36]. It has the advantage of handling both relatively small samples and large-scale, complex data structures, and the model's simplicity makes it easy to use, with few parameters to set [37,38]. As a predictor, RF is based on growing M randomized regression trees, where each tree is fitted on a resample of the fitting set before growing; the average of all trees' predicted values is then taken as the final prediction [38]. From a forecasting perspective, RF regression can be applied to a time series data set for value prediction, where performance can be improved by tuning a set of parameters and by variable selection (feature selection).
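The resample-and-average idea behind RF can be sketched with a toy bootstrap-aggregation example. Each "tree" is replaced here by the mean of a bootstrap resample, a purely illustrative stand-in for a fitted regression tree:

```python
import random

def bagging_predict(train, n_models=100, seed=0):
    """Toy bootstrap aggregation: M stand-in models are each fitted
    (here: just averaged) on a bootstrap resample of the training
    data, and their predictions are averaged, as in RF regression."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]   # resample with replacement
        preds.append(sum(sample) / len(sample))       # stand-in for a tree's prediction
    return sum(preds) / len(preds)
```

Averaging over many resampled models reduces the variance of any single model, which is the mechanism that makes RF robust on small, noisy samples.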
Wavelet-Artificial Neural Network (W-ANN): The artificial neural network is a deep learning model that operates as a universal approximator and can accurately predict values. The model consists of a network structure with input, hidden, and output layers and captures non-linearity between the data points. This strength enables ANN to forecast future time series values [39,40]. However, to denoise and smooth the time series data and further increase predictive accuracy, wavelet transformation can be used in combination with an ANN model. Wavelet transformation offers a complementary approach to decompose the series and cope with local trends and seasonality. We refer to [41,42] for a detailed discussion of W-ANN. The main advantages are ANN's ability to learn, comprehend, and approximate solutions to learning problems that are complex in nature, but the methodology is computationally intensive and sensitive to hyper-parameters [43].
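One level of a wavelet decomposition can be sketched with the Haar wavelet, the simplest member of the family (the choice of Haar here is for illustration only and is not necessarily the wavelet used in our models):

```python
import math

def haar_step(x):
    """One level of the Haar wavelet transform.

    Splits an even-length series into a smooth approximation
    (pairwise scaled sums) and a detail component (pairwise scaled
    differences), i.e., low- and high-frequency parts.
    """
    s = math.sqrt(2.0)
    approx = [(a + b) / s for a, b in zip(x[0::2], x[1::2])]
    detail = [(a - b) / s for a, b in zip(x[0::2], x[1::2])]
    return approx, detail

def haar_inverse(approx, detail):
    """Reconstruct the original series from one decomposition level."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out
```

In a W-ANN (or W-LSTM) pipeline, the approximation and detail components are modeled separately and the forecasts recombined, which is how the hybrid copes with local trends and abrupt changes at the same time.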
Wavelet-Long Short-Term Memory (W-LSTM): Within deep learning, LSTM models are an extension of the Recurrent Neural Network (RNN). An LSTM model handles non-linearity and is able to cope with sequential data while capturing long-term dependencies. The LSTM model consists of a memory cell and an output gate, where an activation function controls the output [44]. The LSTM model thus incorporates additional parameters compared with the ANN model [45], but, as previously mentioned, the wavelet transformation allows for the decomposition of a time series data set, which is the main argument for applying WT to the LSTM model [41]. While evaluating performance, we tested both standard LSTM and W-LSTM. We found that LSTM offers the advantage of handling long data sequences and complex learning patterns, but the method is more prone to overfitting, especially when training data are limited. Although LSTMs were designed to mitigate the vanishing gradient problem, in practice they are still prone to this issue.

Performance Measures for Evaluating Forecasting Methods
The following classic performance measures are used to evaluate the forecasting methods from a theoretical perspective: (i) Mean absolute error (MAE) = (1/n) Σ_{t=1}^{n} |y_t − f_t|; (ii) Mean absolute percentage error (MAPE) = (100/n) Σ_{t=1}^{n} |y_t − f_t| / |y_t|; (iii) Root mean squared error (RMSE) = sqrt((1/n) Σ_{t=1}^{n} (y_t − f_t)^2), where y_t represents the actual value and f_t the forecasted value in Period t.
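These error measures can be computed directly from their definitions; a minimal sketch (hypothetical helper functions, with y the actual and f the forecasted values):

```python
import math

def mae(y, f):
    """Mean absolute error: average magnitude of the forecast errors."""
    return sum(abs(a - b) for a, b in zip(y, f)) / len(y)

def mape(y, f):
    """Mean absolute percentage error, in percent (undefined if any y_t = 0)."""
    return 100.0 / len(y) * sum(abs(a - b) / abs(a) for a, b in zip(y, f))

def rmse(y, f):
    """Root mean squared error: penalizes large errors more than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)) / len(y))
```

Because RMSE squares the errors before averaging, a method can rank well on MAE yet poorly on RMSE if it produces occasional large misses, which is one reason the measures disagree across our items.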
However, it is found that selection based on those performance measures might not lead to the desirable outcome from a company's perspective [46]. To keep accuracy high, it is common in data-driven approaches to procure a large volume, at least at the beginning of the training data period [47]. Therefore, the following KPIs are also used to judge forecasting performance: (iv) Fill Rate (FR), (v) Average Inventory Level (AIL), (vi) Total Holding Cost (THC), and (vii) the number of stockouts. The listed KPIs are deliberately selected based on both academic and industrial implications, given that the objective is to measure performance from both a mathematical and an industrial perspective.

Methods for Inventory
Over the years, many replenishment policies have been proposed for defining the replenishment strategy under time-varying demand. Stochasticity is certainly an issue, but in this study, we focus on replenishment decisions under time-varying demand. One of the simplest ways to decide a replenishment value is the Economic Order Quantity (EOQ), based on the aggregated demand level for a fixed planning horizon. Very recently, data-driven approaches have also been tested. We made the following assumptions when implementing the various replenishment policies: (i) demand is considered to be time-varying, (ii) the capacity of the supplier is unlimited, (iii) for each order, the sponsor company incurs a fixed ordering cost, (iv) the holding cost is considered constant, and the purchase cost and sales price remain constant during the evaluation period, and (v) the lead time is constant for each product and remains the same throughout the evaluation period.
EOQ: The economic order quantity (EOQ) model is a simple fixed-order-quantity model that minimizes the holding and setup costs associated with carrying and procuring inventory on an annual basis [48]. However, its disadvantage in the context of our study is that it fails to capture the time-varying nature of demand.
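The classic EOQ formula, Q* = sqrt(2DK/h) for annual demand D, fixed ordering cost K, and per-unit annual holding cost h, can be computed as follows (the parameter values below are illustrative, not the sponsor company's figures):

```python
import math

def eoq(annual_demand, ordering_cost, holding_cost):
    """Economic order quantity: the fixed order size that minimizes
    the sum of annual ordering cost (D/Q * K) and annual holding
    cost (Q/2 * h); the optimum balances the two terms."""
    return math.sqrt(2 * annual_demand * ordering_cost / holding_cost)
```

For example, with D = 1000 units/year, K = 50 per order, and h = 4 per unit per year, the model orders about 158 units at a time. Because D enters as a single aggregate, the formula cannot react to the month-to-month demand variation that Q-learning exploits.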
Q-Learning Algorithm: Q-learning is a class of reinforcement learning that has gained growing popularity for semi-automating inventory replenishment decisions in recent years. The algorithm is off-policy, which means the optimal policy is obtained independently of the agent's interactions with a given environment. In Q-learning, the decision-maker needs to set the following parameters to optimize the model's performance: α, the learning rate, and γ, the discount factor [49].
Assume Q(S,A) represents the Q-value for a given action A in state S. The algorithm updates the Q-value as Q(S,A) ← Q(S,A) + α [R + γ max_{A'} Q(S',A') − Q(S,A)], using the learning rate α, the reward R obtained by taking the action, the discount factor γ, and the maximum Q-value over the next state-action pairs. The reward function defined in Q-learning and DQN is based on:

Total profit (TP) = sales revenue − holding cost − purchase cost − ordering cost − salvage cost. (1)

Note that if the product is non-deteriorating, we exclude the salvage cost.
Here, R represents the reward, which is problem-specific; for instance, when we implement the algorithm, we use the reward function as presented in Appendix B. Note that we use the cost parameters presented in Table 2 for inventory planning. Based on the literature, the main advantages of Q-learning are the algorithm's ability to learn from past experience and that it can be adjusted to prioritize long-term or short-term gain. The disadvantage of Q-learning is that the algorithm becomes inefficient when interacting with large-scale environments and is sensitive to hyper-parameter adjustment.
Deep Q-Network (DQN): As mentioned earlier, although Q-learning is extensively used in various decision-making contexts, we also introduce DQN, which is more robust in terms of computational efficiency and capabilities [50]. DQN incorporates neural networks by utilizing their ability to approximate the optimal action-value function; the network does so by minimizing a loss value. For a transition (s, a, c, s'), where c is the observed cost after simulating the given period(s), the temporal difference between the estimated action-value function q_θ(s, a) and the bootstrapped target is δ = c + γ min_{a'} q_θ(s', a') − q_θ(s, a). When we optimize the performance, we use the ADAM optimizer, and the loss function is defined as the mean squared error of this temporal difference [51]; a gradient-based search updates θ_k to minimize it. The main advantage of considering a value-based approach is that it tends to learn better policies in offline settings. For both Q-learning and DQN, we consider a value function that consists of price, procurement quantity, holding cost, and ordering cost [52,53]. Based on the literature, we found that DQN has the advantage of handling large-scale environments, unlike Q-learning, but DQN is computationally intensive and requires a large sample of data for model training.
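The per-transition loss that the DQN's gradient search minimizes can be sketched as follows (a hypothetical helper; in the full algorithm q_next would come from a target network evaluated at s'):

```python
def td_loss(q_sa, cost, q_next, gamma=0.99):
    """Squared temporal-difference error for one transition (s, a, c, s')
    in a cost-minimizing DQN: target = c + gamma * min_a' q_theta(s', a'),
    and the loss is (q_theta(s, a) - target)^2."""
    target = cost + gamma * min(q_next)   # min, since values represent costs
    return (q_sa - target) ** 2
```

Averaging this quantity over a batch of sampled transitions gives the mean squared error that the ADAM optimizer drives toward zero.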

Results
The stock keeping unit (SKU) manager of our sponsor company needs to manage thousands of products. Therefore, product categorization is key, although it is not directly associated with our study. We briefly introduce one of the practices used for product categorization before moving to the main analysis.
The product categorization is performed by determining the coefficient of variation (CV) = StdDev/Mean as a proxy for forecastability. Although we consider only three items, we still follow the sponsor company's current practice of item classification, as shown in Figure 2, and we found that the requirements are not unique. One of the objectives of the study is also to test which forecasting method or replenishment decision is suitable for slow-, medium-, and fast-moving raw materials. Another type of classification is variance analysis, which is commonly used in industry and which might support the categorization of SKU items [54]. We found that: (i) Item-1 is slow-moving, high cost, and low variance; (ii) Item-2 is slow-moving, high cost, and high variance; and (iii) Item-3 is fast-moving, low cost, and medium variance.
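The CV computation behind this classification can be sketched in a few lines (the threshold below is illustrative only and is not the sponsor company's actual cut-off):

```python
import statistics

def classify_by_cv(series, threshold=50.0):
    """Coefficient of variation in percent (population std / mean * 100),
    used as a rough forecastability proxy: higher CV means more
    erratic demand relative to its level."""
    cv = 100.0 * statistics.pstdev(series) / statistics.mean(series)
    label = "hard-to-forecast" if cv > threshold else "forecastable"
    return cv, label
```

Because CV is scale-free, it allows slow-moving low-volume items and fast-moving high-volume items to be compared on the same footing.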
The data were aggregated on a monthly basis and converted into a time series data set consisting of 46 data points, from April 2019 to January 2023; the three critical items have no missing values. These products are mostly sourced from Denmark and Asia and are used in almost every finished product. As mentioned earlier, when we rely on machine learning methods, we divide the data set in an 80:20 ratio into training and test sets.
As mentioned earlier, whether we focus on forecasting accuracy based on classical measures or on KPIs, we find that no unique method ensures higher performance across the board. For the item with the lowest cV (36.94), ARIMA(2,1,0) performs best in terms of MAE and RMSE, while SVR has the best MAPE score. From a managerial perspective, LSTM and W-LSTM have the highest fill rate, ARIMA has the lowest AIL and THC, and LSTM and W-LSTM have the fewest stockouts. For the item with the highest cV (64.95), RF and SES perform best on MAE, W-ANN has the best MAPE score, and LSTM has the best RMSE. Looking at FR, W-LSTM performs best, while W-ANN performs best on AIL and THC. Both SVR and W-LSTM have the fewest stockouts.
Noticeably, as the cV increases, for instance for the item with the highest cV, wavelet-incorporated machine learning methods lead to higher performance from a managerial perspective. Some of the advantages of wavelet transformation incorporated with machine learning are: (i) capturing both smooth and abrupt changes in the data, and (ii) highlighting both high- and low-frequency components. Most studies that consider forecasting measures ignore the effect of industrial KPIs. Therefore, our results clearly demonstrate that selecting forecasting techniques based on classical measures (e.g., MAPE, RMSE) might fail to translate into managerial benefits. This also reflects that data-driven methods should not be blindly implemented as forecasting tools in an industrial context. Managers in SMEs face several operational limitations, such as limited technology infrastructure, given that SMEs' industrial data management processes and data quality often lack volume and accuracy [55]. A frequent stockout scenario (negative AIL) does not bring any value in terms of delivery performance and might hurt the reputation of SMEs. Noticeably, a 100% fill rate is an idealistic scenario; some of the forecasting methods in our study context (W-LSTM) ensure a 100% fill rate, but companies would have to keep an extensively high inventory, and their capacity may be over-utilized. Still, we found that classical approaches, such as ARIMA, might be a viable option, and among data-driven approaches, LSTM is a possible solution. Note that, similar to our study, LSTM was also tested by [56], and the author also found reasonably good performance. Figure 3 shows an overview comparing the forecasting models' performance for all three items. SVR does not perform optimally, which is also common in the literature [57].
Another point, which the literature also supports, is that when the coefficient of variation (cV) is low, conventional forecasting models such as SES are often adequate [58]. For data sets with a high cV, the literature posits that machine learning models can predict with high accuracy even in the presence of high volatility [59]. We refer to Appendix C.1 for the parameter settings for both the forecasting and inventory models.
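A sketch of how this cV-based screening might look in code follows; note that the 50% cutoff is purely illustrative and is not a threshold taken from the cited literature:

```python
import numpy as np

def coefficient_of_variation(demand):
    """cV = (std / mean) * 100; higher values indicate more erratic demand."""
    demand = np.asarray(demand, float)
    return 100 * demand.std(ddof=0) / demand.mean()

def suggest_model_family(demand, cutoff=50.0):
    """Illustrative screening rule: low cV -> classical, high cV -> ML."""
    if coefficient_of_variation(demand) < cutoff:
        return "classical (e.g. SES/ARIMA)"
    return "ML (e.g. wavelet-LSTM)"

stable = [100, 105, 95, 102, 98]
erratic = [10, 200, 15, 180, 5]
print(coefficient_of_variation(stable), suggest_model_family(stable))
print(coefficient_of_variation(erratic), suggest_model_family(erratic))
```

Such a rule would only be a first filter; as the results above show, the final choice should still be validated against managerial KPIs.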
Previously, we compared the performance of data-driven and classical approaches and found that classical approaches can still be implemented, at least for forecasting. For inventory management, however, data-driven solutions such as Q-learning are a possible replenishment solution. We refer to Tables 3 and 4 for the detailed results. In terms of total cost, Q-learning outperforms the EOQ; the EOQ performs well when holding and ordering costs are balanced, but in our case, due to the high ordering cost compared to the low inventory holding cost, we obtain the desirable solution under Q-learning. The results in Table 3 demonstrate that the EOQ method orders more frequently than Q-learning and DQN and generally performs much worse. Although DQN is another possible data-driven approach, it also does not perform well in the context of our study. The rationale is that, compared to Q-learning, DQN requires a much larger data set to train the agent properly; this study only has four years of data, which might be considered a bottleneck for DQN's potential performance [60]. In addition to non-deteriorating items, we also compute the replenishment decisions for deteriorating items; the results are presented in Table 4 and Figure 4. We refer to Appendix A for the detailed calculation of the profit function. As expected, deterioration increases the cost, and the results demonstrate this fact.

Figure 4. Results for non-deteriorating items with time-varying demand (panel (c): total profit).
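For context, the EOQ baseline follows the textbook formula Q* = sqrt(2DK/h). The sketch below, with illustrative costs rather than Svanehøj's actual figures, shows how the ratio of ordering cost K to holding cost h determines the batch size and order frequency:

```python
from math import sqrt

def eoq(annual_demand, ordering_cost, holding_cost):
    """Classical economic order quantity: Q* = sqrt(2 * D * K / h)."""
    return sqrt(2 * annual_demand * ordering_cost / holding_cost)

def orders_per_year(annual_demand, q):
    """How often the EOQ policy replenishes."""
    return annual_demand / q

# Illustrative numbers: a high ordering cost K relative to the holding
# cost h yields large batches and few orders per year.
D, K, h = 1200, 400.0, 2.0
q_star = eoq(D, K, h)
print(q_star, orders_per_year(D, q_star))
```

The EOQ is optimal only under its own assumptions (constant demand, no deterioration); with time-varying demand and an asymmetric cost structure, a learned policy can deviate from Q* and, as in our results, do better.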

Discussion
Over the years, research on forecasting models has gained popularity and the models have become more advanced. From an academic point of view, statistical methods have a strong foundation in improving performance and show great promise for supporting smooth operations. However, regarding the practical use of forecasting methods, there is little research on how the models perform in terms of industrial KPIs. This paper explores the selected models' performance from both an academic perspective and the company's perspective.
Similar work on forecasting and inventory management has been performed, but it mainly focuses on either inventory replenishment decisions or improving forecasting accuracy. For instance, ref. [61] also implements forecasting techniques; however, the author does not consider combining forecasting and inventory management. Similar to our work, ref. [24] also implement DQN, but lack the practical perspective; furthermore, they only consider inventory management. Our study clearly reflects that the two streams should be analyzed together. In the subsequent subsections, we discuss the implications in detail [62].

Time Series Forecast Implications
The findings presented in Table 5 provide a comprehensive view of the performance of the selected forecasting methods, taking into account both classical academic measures and managerial KPIs. From an academic perspective, the SES model performs better than most of the machine learning models in all instances, making it more desirable from an academic viewpoint. From a managerial perspective, however, SES does not outperform any of the other forecasting methods: looking at the industrial KPIs in Table 5, it consistently underforecasts and experiences stockouts in every period for all three items. In contrast, the W-LSTM model performs well from a managerial standpoint despite having higher error scores than SES. W-LSTM achieves a higher FR in all instances and, although it experiences stockouts in some periods, it is still more desirable than SES from a managerial perspective. This balance illustrates the importance of considering various KPIs from multiple perspectives when evaluating forecasting models. From a managerial perspective, there is a trade-off between high inventory levels and holding costs on the one hand and a 100% FR on the other. In conclusion, there is no single approach to selecting a forecast model; the choice heavily depends on the business context and the trade-offs a company is willing to make between various KPIs. For instance, a company that values customer satisfaction and wants to avoid stockouts at all costs might prefer the W-LSTM model, while a company that wants to minimize costs might choose the W-ANN model because of its lower AIL and lower holding cost, at the price of more frequent stockouts and lower customer satisfaction.
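The managerial KPIs contrasted here (FR, AIL, holding cost, stockouts) can be derived from a simple inventory simulation. The base-stock policy and all numbers below are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def managerial_kpis(demand, order_up_to, unit_holding_cost=1.0):
    """Simulate a periodic order-up-to policy (the target level would come
    from a forecast) and report fill rate, average inventory level,
    total holding cost, and stockout count."""
    inventory, served, stockouts, levels = 0.0, 0.0, 0, []
    for d in demand:
        inventory = max(inventory, order_up_to)  # replenish up to the target
        shipped = min(inventory, d)
        served += shipped
        inventory -= shipped
        if shipped < d:
            stockouts += 1
        levels.append(inventory)
    return {
        "FR": served / sum(demand),                       # fill rate
        "AIL": float(np.mean(levels)),                    # avg inventory level
        "THC": unit_holding_cost * float(np.sum(levels)), # total holding cost
        "stockouts": stockouts,
    }

print(managerial_kpis([90, 120, 80, 150], order_up_to=130))
```

Raising `order_up_to` pushes FR toward 100% but inflates AIL and THC, which is exactly the W-LSTM versus W-ANN trade-off discussed above.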

Inventory Management Implications
In terms of ease of implementation, the EOQ model has the upper hand. As one of the most classical inventory management methods, EOQ can be readily implemented using basic mathematical calculations and does not require specialized software. However, its assumptions of constant demand and non-deteriorating items may limit its effectiveness in our scenario of deteriorating items with time-varying demand. Conversely, the Q-Learning technique, being a type of reinforcement learning algorithm, necessitates an advanced understanding of ML and AI, but it is potentially better suited to our scenario, given its ability to adapt to variable demand and optimize total profit. The resources required for implementing each method also vary significantly. The EOQ model, due to its simplicity, has minimal resource requirements: it can be calculated using basic tools and does not demand a high level of technical expertise. In contrast, the Q-Learning model requires substantial computational resources and data science expertise. Despite these higher requirements, the superior performance of the Q-Learning model, particularly in terms of total profit, may make this investment worthwhile for companies with sufficient resources.
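The core of tabular Q-learning is a single update rule. The sketch below, with toy states, actions, and reward rather than the study's environment, shows that one update is basic arithmetic, even though tuning the agent requires ML expertise:

```python
from collections import defaultdict

def q_learning_step(Q, state, action, reward, next_state,
                    actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    return Q

# Toy inventory framing: state = on-hand stock, action = order quantity,
# reward = negative total cost for the period. All values are illustrative.
actions = (0, 50, 100)
Q = defaultdict(float)
Q = q_learning_step(Q, state=20, action=50, reward=-35.0,
                    next_state=40, actions=actions)
print(Q[(20, 50)])
```

Repeated over the historical demand episodes, this table converges toward a replenishment policy tailored to the observed cost structure, which is the mechanism behind the Q-learning results above.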

Time-Varying Demand
Given the actual practice at Svanehøj during the study period, the results in Figure 3 show that Svanehøj's current inventory practice performs worst for items 1 and 2, while ranking second best for item 3. Among the ML models, Q-Learning delivers the best performance in terms of total profit for all three products. One of the reasons Q-learning outperforms all the other methods is that it is extremely efficient at keeping inventory levels low by balancing order quantities against the given demand, ultimately leading to a much higher profit than the other methods. Lastly, the most complex method, DQN, does not show any improvement over Q-Learning; one issue is that DQN tends to procure higher quantities than the actual demand. From these comparisons, it is clear that different inventory management techniques have distinct impacts on the cost components and total profit for non-deteriorating items with time-varying demand, and that the Q-Learning model is superior to DQN, EOQ, and the actual practice.
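The effect described here, matching orders to demand versus over-procuring, can be reproduced with a toy profit simulation. All prices and costs below are illustrative assumptions, not the company's figures:

```python
def total_profit(demand, orders, price=10.0, unit_cost=6.0,
                 order_cost=50.0, holding_cost=0.5):
    """Evaluate an ordering schedule against time-varying demand:
    revenue from units sold minus ordering, purchasing, and holding costs."""
    inventory, profit = 0.0, 0.0
    for d, q in zip(demand, orders):
        if q > 0:
            profit -= order_cost + unit_cost * q
            inventory += q
        sold = min(inventory, d)
        inventory -= sold
        profit += price * sold - holding_cost * inventory
    return profit

demand = [30, 50, 20, 60]
lean  = [30, 50, 20, 60]   # tracks demand closely (Q-learning-like behaviour)
bulky = [100, 0, 100, 0]   # over-procures (DQN-like behaviour in our results)
print(total_profit(demand, lean), total_profit(demand, bulky))
```

Even in this tiny example, the lean schedule earns more because it avoids carrying stock that only accrues holding cost, mirroring why low inventory levels drive Q-learning's profit advantage.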

Software Consolidation
The proposed framework should be viewed as an ERP extension tool intended to decrease the gap between SMEs and the software consolidations we have previously highlighted. Additionally, the presented methodology is intended to demonstrate how ML models can be an applicable approach for facilitating data-driven decision making within SMEs, and how ML models can support effective production planning and control from a forecasting and inventory perspective. However, the governing input parameters for ML models must be compliant with internal domain knowledge, and ML model outputs must be benchmarked against carefully selected KPIs to facilitate effective model implementation.

Conclusions
We focused on the interaction between demand forecasting and inventory planning in the context of SMEs. The empirical results demonstrate that a selection of ML models yields superior performance in comparison to the classic statistical forecasting methods. However, when ML models are considered alone, the results indicate that careful consideration is required, given that model evaluation can be perceived from both an academic and a managerial perspective. The performance indicators for all three items show that the ML models yield different outcomes depending on the nature of the given input data and the selected hyperparameter settings. From both a practical and an academic standpoint, this implies that no single ML model should be relied upon; rather, a selection of several ML models must be evaluated in accordance with the framework presented in Figure 1. Secondly, drawing on the empirical findings from the two scenarios, with and without deterioration, it can be deduced that reinforcement learning with the Q-learning algorithm obtains the most preferable economic results in all cases. Furthermore, since RL generates a unique policy based on the provided data and the environmental setting, the results show that RL is a promising data-driven approach that can solve inventory problems and generate custom inventory policies for individual items, which can indeed serve to close the gap between theoretical inventory models and practical industrial use.
As a last remark, this paper aids in reducing the current industrial research gap identified by [26], which outlines the lack of applications of RL models to real-world problems within the field of production planning and control. Further investigation is required to evaluate performance in a multi-echelon setup. In this study, we aggregated the data on a monthly basis, as the sponsor company also operates in this manner. For a more thorough analysis, sensitivity analysis of the data aggregation, a rolling horizon, or more practical demand patterns, such as stock-dependent and price-dependent demand, could provide further critical insights.

Data Availability Statement: All data generated or analyzed during this study are included in this published article. The raw data are available on request from the author (madsheltoft@hotmail.com).

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The company had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. List of Notations
The following notations are used in this article as listed in Table A1:

Appendix B. Optimization Model for Deteriorating Products
Assume that Q is the amount of inventory replenished at the beginning of each cycle. The inventory level I(t) decreases due to demand and deterioration up to time T1, the length of the replenishment cycle. With demand rate D(t) and deterioration rate θ, the inventory at time t ∈ (0, T1) is governed by the differential equation

dI(t)/dt + θ I(t) = −D(t), 0 ≤ t ≤ T1, (A1)

with the boundary condition I(T1) = 0. Solving the differential equation, we obtain

I(t) = ∫_t^{T1} D(u) e^{θ(u−t)} du. (A2)

Therefore, the holding cost for the entire cycle is obtained as

HC = h ∫_0^{T1} I(t) dt. (A3)

From the boundary condition I(T1) = 0 and Q = I(0), we obtain the replenishment quantity for each cycle as

Q = ∫_0^{T1} D(u) e^{θu} du. (A4)

The total profit for each cycle is then obtained as the sales revenue minus the ordering, purchasing, and holding costs (Equation (A5)). Based on Equation (A5), we construct two profit functions: (i) the profit function when the product does not deteriorate (θ = 0), and (ii) the profit function when the product deteriorates (0 < θ ≪ 1).

Appendix C. Parameter Settings

The choice of hyper-parameters influences the model's outcome. We do not integrate any hyper-parameter optimization tools, but in practice an operator might tune them by conducting sensitivity analysis; this is not a key focus of this study. For all neural-network-based models that require an activation function, ReLU activation has been utilized.
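As a numeric sanity check of the Appendix B model, the sketch below assumes the standard exponential-deterioration dynamics with a constant demand rate D (a simplification for illustration; the study uses time-varying demand) and verifies that as θ approaches 0 the cycle quantities recover the non-deteriorating case Q = D·T1 and HC = h·D·T1²/2. All parameter values are illustrative:

```python
from math import exp

def cycle_quantities(D, theta, T1, h):
    """Replenishment quantity Q and holding cost HC per cycle for the
    exponential-decay inventory model with constant demand rate D."""
    Q = D / theta * (exp(theta * T1) - 1.0)
    HC = h * D / theta * ((exp(theta * T1) - 1.0) / theta - T1)
    return Q, HC

Q_det, hc_det = cycle_quantities(D=100, theta=0.05, T1=1.0, h=2.0)
Q_nd, hc_nd = cycle_quantities(D=100, theta=1e-6, T1=1.0, h=2.0)
# Deterioration inflates the required replenishment quantity (Q_det > D*T1),
# while theta -> 0 approaches the non-deteriorating limits Q = 100, HC = 100.
print(Q_det, Q_nd, hc_nd)
```

This matches the observation in the results that costs must increase once deterioration is introduced: a larger Q must be purchased to serve the same demand.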

14: Perform a gradient descent step on (y_j − Q(φ_j, a_j; θ))² with respect to the network parameters θ
15: Every C steps, reset Q̂ = Q
16: end for
17: end for

Similarly, Table A3 presents the parameters used for implementing the Q-learning and DQN policies when determining the optimal procurement schedule. The same parameter settings are also used by [63].
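The target y_j in line 14's squared loss is the standard Bellman target computed from the frozen network Q̂. A minimal sketch of its computation, with illustrative values rather than the study's data, follows:

```python
import numpy as np

def dqn_targets(rewards, next_q_values, gamma=0.99, terminal=None):
    """Bellman targets y_j = r_j + gamma * max_a' Qhat(s'_j, a'),
    with y_j = r_j for terminal transitions; these targets enter the
    squared loss (y_j - Q(s_j, a_j))^2 minimized in line 14."""
    next_best = next_q_values.max(axis=1)
    if terminal is not None:
        next_best = np.where(terminal, 0.0, next_best)
    return rewards + gamma * next_best

r = np.array([1.0, -0.5])                 # rewards for a batch of 2 transitions
nq = np.array([[0.2, 0.8], [1.0, 0.4]])   # Qhat values for each next state/action
print(dqn_targets(r, nq, gamma=0.9, terminal=np.array([False, True])))
```

Freezing Q̂ and refreshing it only every C steps (line 15) stabilizes these targets; without enough transitions to fill the replay buffer, as with our four years of monthly data, the targets stay noisy, which is consistent with DQN's weaker results here.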