A Sales Forecasting Model for New-Released and Short-Term Product: A Case Study of Mobile Phones

Abstract: In today's competitive market, forecasting sales of newly released and short-term products is a major challenge because little historical sales data is available. To address this challenge, we propose a sales forecasting model for newly released and short-term products and study the case of mobile phones. The main approach is to develop an integrated sales forecasting model by learning the sales patterns and product characteristics of products in the same category. In particular, we compare the performance of 12 machine learning models and propose the best-performing one. The models compared are Ridge, Lasso, Support Vector Machine (SVM), Random Forest, Gradient Boosting Machine (GBM), AdaBoost, LightGBM, XGBoost, CatBoost, Deep Neural Network (DNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM). We use a dataset of monthly sales data for 38 mobile phones in the Korean market. The Random Forest model outperformed the other models in prediction accuracy, achieving a mean absolute percentage error (MAPE) of 42.6258, a root mean square error (RMSE) of 8443.3328, and a correlation coefficient of 0.8629.


Introduction
The current economic environment is characterized by intense competition, rapid product development, and increased product differentiation, resulting in shorter product lifecycles and greater volatility in sales patterns. These changes have significant implications for the retail industry, which faces stronger requirements for sales forecasting. Accurate sales forecasting is becoming increasingly important: if sales are overestimated, excess inventory accumulates, and if sales are underestimated, the opportunity to increase profits is lost. In particular, as product life cycles shorten, sales forecasting immediately after release is becoming very important. However, forecasting sales for newly released and short-term products is challenging because of the limited availability of historical sales data, the main input to sales forecasting. Sectors such as electronics and fashion in particular struggle to forecast sales accurately due to high product diversity and limited sales history [1]. Even with business expertise, predictions can still be influenced by cognitive and motivational biases [2,3]. Additionally, while sales series are known to follow some kind of nonlinear mapping relationship, it is difficult to describe that relationship with an explicit mathematical model. For these reasons, machine learning models are well suited to learning and predicting linear and nonlinear sales patterns from quantitative data.
While individual forecasting models of short-term products may face learning failures and generalization errors due to limited amounts and diversity of data, integrated models across short-term product groups can achieve stronger results. Given the shorter product lifecycles of products like mobile phones [4], an integrated model for the product group is [...] perceptron algorithms in predicting demand for short-term and textile products, multi-layer perceptron emerged as the dominant model [29]. One study verified the prediction of blood demand through SVM and artificial neural networks and confirmed that artificial neural networks accurately predict actual demand [30]. A novel sales forecasting model is proposed, integrating temporal convolutional networks (TCN) for the robust extraction of deep temporal features, demonstrating superior performance compared to conventional neural network models [31]. A Directed Acyclic Graph Neural Network, consisting of a layer of Convolutional Neural Networks and BiLSTM, showed high predictive performance as a revenue prediction method for e-commerce [32]. Another study leverages several machine learning (ML) models, including recurrent neural network (RNN) models such as LSTM and the Temporal Fusion Transformer, to present models for accurate sales forecasting for restaurants. Its results confirmed that the RNN model shows the highest performance when trends and seasonality are preserved [33]. A study utilizes RNN, LSTM, and GRU models for precise power consumption prediction in IoT and big data settings, revealing that the ensemble model combining the three models achieves the highest accuracy rate of 98.43% [34]. There are also studies using the SGTM neural-like structure, its modifications, and non-iterative approaches for demand and sales forecasting. One study proposes a new linear supervised learning predictor for health insurance cost prediction, utilizing Ito decomposition and the Successive Geometric Transformation Model (SGTM). The results demonstrate its superiority over existing approaches (common SGTM neural-like structure, multi-layer perceptron, Support Vector Machine, adaptive boosting, linear regression) in terms of speed, generalization, accuracy, and scalability for large datasets [35]. A stacking-based GRNN-SGTM Ensemble Model is proposed for used car price prediction, and it is found to outperform classical regression methods and neural network-based approaches in terms of RMSE [36]. A novel non-iterative learning approach has been proposed that combines a Random Vector Functional Link (RVFL) network with Ensemble Empirical Mode Decomposition (EEMD) for crude oil price forecasting. The proposed EEMD-based RVFL network is confirmed to outperform other single algorithms and ensemble methods in both forecasting accuracy and computational speed [37].
There are many prior studies on sales forecasting, but demand and sales forecasting studies have mainly addressed mid- to long-term products for which sufficient past sales data can be collected, such as electricity, automobiles, oil prices, and daily necessities. Our review of previous studies found few sales forecasting studies for newly released or short-term products. One study predicts the sales of new products through the correlation coefficient of sales of similar products [38], but we found no study that developed a sales forecasting model using machine learning or deep learning. To bridge this gap, we define product-related and sales-related variables that capture sales patterns for products belonging to the same product category and propose a sales forecasting model by comparing the performance of 12 supervised machine learning algorithms on real data from the Korean mobile phone market. We evaluate model performance using metrics commonly used in sales forecasting studies: Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Correlation. Our results show that the Random Forest model has the highest predictive power. Among the linear, neural network, tree-based, and other nonlinear machine learning models considered, we also confirm that tree-based models perform better for sales forecasting. We believe that our work provides a basis not present in previous studies, especially for forecasting sales of newly released and short-term products such as mobile phones.
The remainder of this paper is organized as follows: Section 2 covers variable definitions and machine learning algorithms. Section 3 describes data collection, data statistics, experiments, and the results of model performance comparisons. Finally, Section 4 presents the conclusion.

Methodology
This section describes the methodology employed in this study, encompassing the meticulous definition of independent and dependent variables for sales forecasting and presenting a comprehensive overview of 12 sales forecasting models. Figure 1 illustrates the conceptual model for integrated sales forecasting adopted in this study. Sales forecasting is influenced by a variety of factors, which can be categorized into product-related and sales-related factors.

Product Related Factors
• Product attribute specification: Product attribute specifications play a crucial role in shaping consumers' perception of a product's relevance to their personal needs. Understanding consumer preferences across different lifestyles is essential since individuals prioritize their functional and hedonic needs to varying extents [39]. For instance, when purchasing a tablet computer, customers consider factors such as the operating system, battery life, screen size, and RAM level. Therefore, it is reasonable to take into account attribute levels when forecasting sales [40]. In this study, we consider 14 attributes, including the operating system, display size (mm), display resolution (ppi), CPU processor speed (GHz), number of processor cores, rear camera pixels (MP), front camera pixels (MP), storage (GB), width (mm), length (mm), depth (mm), weight (g), battery capacity (mAh), and RAM (GB).
• Brands: Brand image plays a vital role in building brand equity, which encompasses consumers' overall perception and emotional response towards a brand, influencing their behaviors. Marketers aim to shape consumers' perceptions and attitudes towards a brand through marketing activities. The goal is to establish a strong brand image in consumers' minds, stimulate their purchasing behavior, boost sales, maximize market share, and develop brand equity [41].
• Price: Price plays a significant role in consumer purchasing decisions and is equally important for providers [42]. Lower pricing can impact sales volume, as some providers strategically price certain products low to attract the attention of consumers with the intention of selling them other, higher-priced items. However, consumers may question the quality of a product if the price is excessively low. Many consumers prioritize value over the lowest price and are willing to pay a price that reflects the worth of a product. Setting prices too low can create a perception among consumers that a product is less satisfactory compared to similar products on the market [43].
• Time of Introduction: Products like mobile phones have short release cycles, making it crucial to consider this factor when forecasting sales. Continuous releases of new mobile phone models in the market create competition, trends, and consumer demand [4]. Technology products, including mobile phones, often experience high sales immediately after their release, followed by a rapid decline in the sales curve. Therefore, the time elapsed since the product release is a significant factor in understanding the sales pattern [44,45]. The value assigned to the months after release starts with 1 for the month of release and incrementally increases by 1 for each subsequent month.

Sales Related Factors
• Previous sales: In the manufacturing industry, the previous month's sales have been identified as a particularly influential parameter in sales forecasting [6]. This suggests that the sales performance in the immediately preceding month plays a significant role in predicting future sales. Furthermore, research conducted in this domain has consistently shown that not only the previous month's sales but also the sales figures from the two to three months prior can impact the sales outcomes in the predicted months [7,8].

• Moving average of sales: Capturing the trend of sales is recognized as a crucial variable in related studies. One commonly employed method to represent this trend is the use of moving averages. It is a prevalent research practice to calculate the moving average of sales over a period of two to three months [6,7]. By calculating the average sales over this time window, the moving average provides a smoothed representation of the sales trend, allowing for a better understanding and prediction of sales patterns.

• Relative difference of sales: Most time series data exhibit discernible fluctuation patterns with either a decreasing or an increasing trend. These patterns are quantified as relative difference variables, which represent the growth rates of sales over time. Such variables are important primary factors in sales forecasting models [46,47].

Therefore, a total of 24 predictor variables, comprising both sales-related and product-related variables, are listed in Table 1.
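The sales-related variables described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sales_features`, the two-month window, and the exact formula for the relative difference (growth between the two preceding months) are choices made here, not definitions taken from Table 1.

```python
from typing import Dict, List, Optional


def sales_features(sales: List[float]) -> List[Dict[str, Optional[float]]]:
    """Build one feature row per month from a product's monthly sales.

    sales[0] is the release month. Features for month t use only sales
    observed before month t; values that are not yet available are None.
    """
    rows = []
    for t in range(len(sales)):
        prev1 = sales[t - 1] if t >= 1 else None
        prev2 = sales[t - 2] if t >= 2 else None
        # Assumed definition: mean of the two preceding months.
        ma2 = (prev1 + prev2) / 2 if prev2 is not None else None
        # Assumed definition: growth rate between the two preceding months
        # (left as None when the denominator is missing or zero).
        rel_diff = (prev1 - prev2) / prev2 if prev2 else None
        rows.append({
            "months_after_release": t + 1,  # 1 for the month of release
            "prev_1": prev1,
            "prev_2": prev2,
            "moving_avg_2": ma2,
            "rel_diff": rel_diff,
            "target": sales[t],
        })
    return rows
```

Rows for the first months after release contain missing values by construction; how those months were handled in the study is not stated in the text, so the sketch leaves them as `None`.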

Machine Learning Methods
This section describes 12 machine learning models applied in our study.

Ridge, Lasso Regression
Multiple linear regression models tend to overfit: they fit the relationship between feature values and label values in the training data more closely than necessary, which results in poor generalization and poor prediction on new data. Ridge and Lasso are methods used to overcome these shortcomings. The Ridge regression model estimates the regression coefficients by minimizing an objective function that adds an L2 penalty term to the sum of squared errors of the ordinary regression [48]. The loss in Ridge regression is defined as:

$$L_{\mathrm{Ridge}}(\beta) = \sum_{i=1}^{n} \left( y_i - x_i^{\top} \beta \right)^2 + \lambda \lVert \beta \rVert_2^2$$

where β is the vector of regression coefficients associated with the input parameters of the Ridge model, x and y are the input and output, respectively, n is the number of samples in the training dataset, and the hyperparameter λ is the penalty parameter. The Lasso regression model adds an L1 penalty term instead [49]. The loss in Lasso regression is defined as:

$$L_{\mathrm{Lasso}}(\beta) = \sum_{i=1}^{n} \left( y_i - x_i^{\top} \beta \right)^2 + \lambda \lVert \beta \rVert_1$$

where x and y are the input and output, respectively, n is the number of samples in the training dataset, β is the vector of regression coefficients, and λ is the penalty parameter.
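To make the difference between the two penalties concrete, the following minimal sketch evaluates both losses for a given coefficient vector. It assumes the plain sum of squared errors without a 1/n scaling factor and no intercept term; the function names are illustrative.

```python
from typing import Sequence


def _sum_squared_errors(X: Sequence[Sequence[float]], y: Sequence[float],
                        beta: Sequence[float]) -> float:
    """Sum over samples of (y_i - x_i . beta)^2."""
    return sum(
        (yi - sum(b * x for b, x in zip(beta, xi))) ** 2
        for xi, yi in zip(X, y)
    )


def ridge_loss(X, y, beta, lam: float) -> float:
    """Squared error plus the L2 penalty lam * sum(beta_j^2)."""
    return _sum_squared_errors(X, y, beta) + lam * sum(b * b for b in beta)


def lasso_loss(X, y, beta, lam: float) -> float:
    """Squared error plus the L1 penalty lam * sum(|beta_j|)."""
    return _sum_squared_errors(X, y, beta) + lam * sum(abs(b) for b in beta)
```

For the same λ, the L2 penalty grows quadratically with a coefficient while the L1 penalty grows linearly, which is why Lasso tends to drive small coefficients to exactly zero and Ridge only shrinks them.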

Support Vector Regression
Support Vector Machine (SVM) is a supervised learning model that uses kernel functions to make predictions. The main objective of SVM is to find the best decision boundary for separating an n-dimensional space into classes; this boundary is called a hyperplane. The hyperplane helps improve the predictive power of the model and reduces errors in prediction and classification [50]. Figure 2 shows the main structure of the SVM. y represents the model's output, b is the bias term optimized under the regularized objective, and K is the kernel function. The support vectors shown in Figure 2 are a small subset of the training data selected by the algorithm. The kernel transforms the input data into the form required by the model. SVM models use different types of kernel functions, such as the linear kernel, Bessel kernel, and radial basis kernel. The most popular of these is the radial basis kernel, which has nonlinear characteristics.
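As a rough illustration of how the kernel, the support vectors, and the bias term combine into an output, the sketch below evaluates a kernel expansion with a radial basis kernel. The names `dual_coefs` and `gamma` are assumptions introduced for this example; the training step that would determine the coefficients and the bias is not shown.

```python
import math
from typing import Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """Radial basis kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def svr_predict(x: Sequence[float],
                support_vectors: Sequence[Sequence[float]],
                dual_coefs: Sequence[float],
                bias: float,
                gamma: float = 1.0) -> float:
    """Output y = sum_i coef_i * K(sv_i, x) + b for already-fitted values."""
    weighted = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return weighted + bias
```

Only the support vectors enter the sum, so prediction cost depends on how many training points the fitted model retains rather than on the full training set.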

Random Forest Regression
Random Forest (RF) is a tree-based ensemble model; in this study it is used as a regression model. The RF model draws samples of the data to build multiple decision trees and aggregates the outputs of the individual trees, by voting for classification or averaging for regression, to produce the final result [51]. Key strengths of RF include speed and flexibility in modeling the relationship between inputs and outputs. RF also handles large datasets more efficiently than many other machine learning techniques.
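The bootstrap-and-average idea can be sketched with one-split trees (stumps) on a single feature. This is a deliberate simplification and not the configuration used in the study: a full Random Forest grows deeper trees and also samples a random subset of features at each split. All function names are illustrative.

```python
import random
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_mean, right_mean)


def fit_stump(x: List[float], y: List[float]) -> Stump:
    """Fit a one-split regression tree on a single feature."""
    mean = sum(y) / len(y)
    # Fallback when no split is possible: predict the mean on both sides.
    best_sse, best = float("inf"), (x[0], mean, mean)
    for thr in sorted(set(x))[:-1]:  # samples with x <= thr go left
        left = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if sse < best_sse:
            best_sse, best = sse, (thr, lm, rm)
    return best


def fit_forest(x: List[float], y: List[float],
               n_trees: int = 50, seed: int = 0) -> List[Stump]:
    """Fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(x)) for _ in x]
        forest.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest: List[Stump], xi: float) -> float:
    """Average the predictions of all stumps (regression aggregation)."""
    preds = [lm if xi <= thr else rm for thr, lm, rm in forest]
    return sum(preds) / len(preds)
```

Because every tree sees a different resample, individual trees disagree, and averaging them reduces the variance of the final prediction.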

Gradient Boosting Regression
Gradient Boosting Machine (GBM) is a tree-based ensemble model that trains several weak learners sequentially, with each learner fitted to the residual errors of the preceding ones so that the overall error is progressively reduced. Gradient descent is used to update the model. The process repeats until the maximum number of trees is reached or the fit no longer improves [52].
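The sequential residual fitting can be illustrated with a minimal sketch for squared-error loss, where the negative gradient is simply the residual. The weak learner here is a one-split stump on a single feature, and the learning rate and tree count are arbitrary illustrative values, not the settings from Table 3.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Fit a one-split regression tree to the targets r."""
    mean = sum(r) / len(r)
    best_sse, best = float("inf"), (x[0], mean, mean)  # fallback: no split
    for thr in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
        if sse < best_sse:
            best_sse, best = sse, (thr, lv, rv)
    return best


def stump_value(stump: Stump, xi: float) -> float:
    thr, lv, rv = stump
    return lv if xi <= thr else rv


def fit_gbm(x: List[float], y: List[float],
            n_trees: int = 100, learning_rate: float = 0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For squared error, the negative gradient equals the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump_value(stump, xi)
                for pi, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict_gbm(model, xi: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, xi) for s in stumps)
```

Each round removes only a fraction of the remaining error (set by the learning rate), which is why boosting typically needs many trees but is less prone to overshooting than fitting the residuals in full.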

AdaBoost Regression
AdaBoost, or Adaptive Boosting, is a tree-based sequential ensemble technique that combines several weak learners into a strong learner. Each observation in the training set is assigned a weight; observations that are predicted poorly receive larger weights so that the next learner focuses on them. The process repeats until the algorithm predicts the output accurately or the maximum number of learners is reached [53].

XGBoost Regression
XGBoost is a tree-based ensemble model that uses decision trees as base learners and trains each new tree to compensate for the weaknesses of the previous ones. Specifically, XGBoost uses a boosting algorithm to iteratively correct the fit: each tree is grown on the residuals of the previous trees, and the weighted outputs of all regression trees are combined to obtain the prediction [54].

LightGBM Regression
LightGBM is a tree-based ensemble model that grows trees leaf-wise rather than level-wise. It repeatedly splits the leaf node with the largest loss reduction without balancing the tree, which produces deep, asymmetric trees. This can reduce the prediction error more than level-wise growth [55].

CatBoost Regression
CatBoost is a tree-based ensemble model created to solve the overfitting problem of existing boosting models. To this end, CatBoost learns after calculating the residual with only a part of the learning data, and as a result, the model is rebuilt [56].

Deep Neural Network
Deep Neural Network (DNN) is a deep learning method that defines complex architectures for artificial neural networks (ANN). In an ANN, artificial neurons (nodes) connected by synapses form a network, and learning adjusts the connection strengths of the synapses to minimize the error between predicted and actual values [57]. A DNN is an ANN with two or more hidden layers [58]. Figure 3 shows the main structure of a DNN with two hidden layers, where y represents the model's output and h denotes the neurons. In practice, neural networks with two hidden layers are widely used and have performed very well for time series data [59].

Recurrent Neural Network
Recurrent Neural Network (RNN) is a neural network algorithm used for learning time-dependent or sequential data because it contains an internal recurrent structure. Through this recurrent structure, previous information is accumulated into the current information, and the information is updated continually as the data circulate [60]. Given an input $x_t$ at time $t$, the hidden state $h_t$ is computed as Equation (3):

$$h_t = \tanh\left( W_h \, [h_{t-1}, x_t] + b_h \right) \tag{3}$$

where $W_h$ and $b_h$ are parameters to be learned, $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input, and tanh is the hyperbolic tangent function.
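A single recurrent update can be sketched in plain Python. The sketch assumes the common formulation in which the previous hidden state and the current input are concatenated before one weight matrix is applied; the function name and the list-based matrix layout are illustrative.

```python
import math
from typing import List


def rnn_step(x_t: List[float], h_prev: List[float],
             W_h: List[List[float]], b_h: List[float]) -> List[float]:
    """One recurrent update: h_t = tanh(W_h [h_prev, x_t] + b_h).

    W_h has one row per hidden unit and len(h_prev) + len(x_t) columns.
    """
    z = h_prev + x_t  # list concatenation of previous state and input
    return [
        math.tanh(sum(w * v for w, v in zip(row, z)) + b)
        for row, b in zip(W_h, b_h)
    ]


def rnn_forward(sequence: List[List[float]], h0: List[float],
                W_h: List[List[float]], b_h: List[float]) -> List[float]:
    """Run the step over a whole sequence and return the last hidden state."""
    h = h0
    for x_t in sequence:
        h = rnn_step(x_t, h, W_h, b_h)
    return h
```

The same weights are reused at every time step, which is what lets the hidden state carry information from earlier months into later predictions.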

Long Short-Term Memory
Long Short-Term Memory (LSTM) is a neural network algorithm designed to retain both long-term and short-term memory, compensating for the inability of conventional RNNs to remember information far from the output. It is mainly used for time series prediction and natural language processing. To address the long-term dependency and vanishing gradient problems, LSTM uses the cell state to adaptively balance historical memory against the new information currently available [61]. At time $t$, LSTM comprises two state vectors, the hidden state $h_t$ and the cell state $C_t$, and three gates: the forget gate $f_t$, the input gate $i_t$, and the output gate $o_t$. Each state and gate is computed as follows:

$$f_t = \sigma\left( W_f \, [h_{t-1}, x_t] + b_f \right)$$
$$i_t = \sigma\left( W_i \, [h_{t-1}, x_t] + b_i \right)$$
$$o_t = \sigma\left( W_o \, [h_{t-1}, x_t] + b_o \right)$$
$$\tilde{C}_t = \tanh\left( W_C \, [h_{t-1}, x_t] + b_C \right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \tanh(C_t)$$

where $W$ and $b$ are parameters to be learned and $\sigma$ is the sigmoid activation function. The cell state preserves long-term dependence between data points in the input sequence and allows the LSTM to be applied to long sequence data.
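The gate computations can be traced with a small plain-Python sketch of one LSTM step. It assumes the standard formulation with a concatenated input $[h_{t-1}, x_t]$ and a candidate cell state; the parameter dictionary keys (`W_f`, `b_f`, and so on) are naming choices made for this example.

```python
import math
from typing import Dict, List, Tuple


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _affine(W: List[List[float]], b: List[float], z: List[float]) -> List[float]:
    """Compute W z + b for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t: List[float], h_prev: List[float], c_prev: List[float],
              params: Dict[str, list]) -> Tuple[List[float], List[float]]:
    """One LSTM update; returns the new hidden state and cell state."""
    z = h_prev + x_t  # concatenation [h_prev, x_t]
    f = [_sigmoid(v) for v in _affine(params["W_f"], params["b_f"], z)]  # forget gate
    i = [_sigmoid(v) for v in _affine(params["W_i"], params["b_i"], z)]  # input gate
    o = [_sigmoid(v) for v in _affine(params["W_o"], params["b_o"], z)]  # output gate
    g = [math.tanh(v) for v in _affine(params["W_c"], params["b_c"], z)]  # candidate
    # Keep part of the old memory and add part of the new candidate.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c
```

The cell state is updated additively, scaled by the forget gate, rather than being passed through a squashing function at every step, which is what helps gradients survive across long sequences.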

Electronics 2023, 12, 3256

Experiments and Results
In this section, we present the experiments conducted and the results of the performance comparison of the forecasting models. The methodology is outlined in Figure 4. First, we collected the required data. We then examined descriptive statistics for sales and other pertinent variables to better understand the dataset. Next, we applied feature normalization and feature selection to preprocess the data for modeling. These steps ensured appropriate scaling of the input features and the inclusion of only the most relevant ones. We then used Leave-One-Out Cross-Validation (LOOCV) to forecast values for the test dataset and identify robust models. This approach suits sales forecasting scenarios with little sales data, such as short-term and newly released products. Finally, we compared predicted and actual values using the MAPE, RMSE, and Correlation metrics to find the best-performing forecasting model.
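The three evaluation metrics can be written out as follows. This is a minimal sketch that assumes MAPE is reported in percent, that all actual values are nonzero, and that "Correlation" refers to the Pearson correlation coefficient between actual and predicted values; the source does not spell out these details.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actual values nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the target."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def correlation(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)
```

The metrics are complementary: MAPE is scale-free and penalizes relative errors on low-selling products, RMSE is dominated by errors on high-selling products, and correlation measures whether predictions follow the shape of the actual sales regardless of scale.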


Data Collection and Descriptive Statistics
In this study, we collected sales data for 38 mobile phones released between January 2020 and December 2021, specifically the monthly sales for the 7 months after each phone's release. The sales data were provided by one of the three telecommunication companies in South Korea and include the monthly sales of each mobile product. Figure 5 illustrates the monthly sales trend for each mobile phone over the seven months following its release. The graph includes 38 products: 22 Samsung-branded, 12 Apple-branded, and 4 LG-branded. Each line represents a specific product. Five of the 38 mobile phones achieved monthly sales exceeding 40,000 units at least once during the observation period, while the remaining thirty-three stayed below this threshold. Sales generally increase during the first three months after release, after which the growth rate gradually diminishes. We did not remove outliers, to avoid excluding relatively high-selling products; because forecasting high-selling products is crucial, our model training includes the sales data for all products. This ensures that our predictions cover the entire sales spectrum, including high-performing products. In addition to the sales data, we gathered the brand, the release price, and 14 product attribute specifications for each mobile phone. The release price data were obtained from the mobile phone information website "http://www.cetizen.co.kr" (accessed on 3 April 2023). The 14 detailed specifications for each product were collected from the official websites of Samsung ('https://www.samsung.com'), LG ('https://www.lge.co.kr'), and Apple ('https://www.apple.com'), all accessed on 3 April 2023.
In Table 2, we present the descriptive statistics of the X and Y variables used to forecast sales for the 38 mobile phones released in Korea between January 2020 and December 2021.

[...] [62,63]. When variables possess varying magnitudes, machine learning techniques may fail to accurately capture their influence on the dependent variable. By applying the min-max scaling method, the normalization of values can effectively mitigate the impact of disparate magnitudes on the analysis results. This normalization process ensures a more reliable representation of the variables' influence, regardless of their original scales.
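Min-max scaling maps each feature to the range [0, 1] using its observed minimum and maximum. A minimal sketch follows; the handling of a constant feature (mapped to 0.0) is a choice made here, and the battery capacities in the comment are made-up values.

```python
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative use with made-up battery capacities in mAh:
# min_max_scale([4000.0, 4500.0, 5000.0])
```

In a cross-validation setting, the minimum and maximum would usually be computed on the training products only and then applied to the held-out product; the text does not state which variant was used.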

Feature Selection
Feature selection is a valuable technique that reduces the number of features used in model building, resulting in a more concise model that is quick to train, analyze, and comprehend. To avoid subjective human intervention, many studies employ quantitative methods for feature selection. Random Forest is a commonly used method for this purpose, as demonstrated in several previous studies [64-66]. Typically, variables with feature importance close to zero are eliminated [64]. In our study, we also employed Random Forest for feature selection.
To analyze the feature importance using the Random Forest method, categorical variables such as brand and operating system were converted into dummy variables. Additionally, since the number of trees utilized affects the estimation of variable importance, we performed the analysis with various numbers of trees (200, 500, 1000, and 2000) to obtain robust results for variable importance [64]. The average value of feature importance was calculated based on these different tree configurations. As depicted in Figure 6, the analysis of feature importance led to the removal of variables with close to zero importance, such as brand, number of processor cores, and operating system. Previous 1-2 months' moving average of sales, rear camera pixels, release price, and CPU processor speed were identified as variables of high importance. Consequently, the number of selected variables was reduced to 21.
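The averaging-and-filtering step described above can be sketched as follows. The importance values would come from Random Forest models fitted with different numbers of trees; that fitting is not shown here. The threshold of 0.001 for "close to zero" is an assumption made for this example, since the text gives no numeric cutoff, and the feature names and numbers in the comment are made up.

```python
from typing import Dict, List


def average_importance(runs: Dict[int, Dict[str, float]]) -> Dict[str, float]:
    """Average each feature's importance over runs keyed by number of trees."""
    features = next(iter(runs.values())).keys()
    return {f: sum(run[f] for run in runs.values()) / len(runs) for f in features}


def select_features(avg: Dict[str, float], threshold: float = 0.001) -> List[str]:
    """Keep features whose average importance exceeds the (assumed) threshold."""
    return [f for f, value in avg.items() if value > threshold]


# Illustrative use with made-up importances for two forest sizes:
# runs = {
#     200: {"moving_avg": 0.42, "release_price": 0.10, "brand": 0.0004},
#     500: {"moving_avg": 0.40, "release_price": 0.12, "brand": 0.0002},
# }
# select_features(average_importance(runs))
```

Averaging over several forest sizes smooths out the run-to-run variation in importance estimates, so a feature is dropped only if it is consistently unimportant.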
We utilized the 12 models mentioned above to compare their performance in sales forecasting. The key hyperparameter settings employed in this study are presented in Table 3. The hyperparameter "alpha" corresponds to the regularization intensity for Lasso and Ridge, while the hyperparameter "cost" relates to SVM. For Random Forest, GBM, AdaBoost, LightGBM, XGBoost, and CatBoost, the hyperparameter "number of estimators" refers to the number of trees in the ensemble. As for DNN, RNN, and LSTM, the hyperparameter represents the number of neurons. We set candidate values for each hyperparameter and selected the values with the lowest RMSE and MAPE for the test dataset.
To assess the forecasting performance and calculate the error rate of sales forecasting, we employed Leave-One-Out Cross-Validation (LOOCV) [67]. LOOCV involves training a model on all but one product and then evaluating the sales forecasting performance on the excluded product using the trained model. This process is repeated for all products in the dataset, ensuring comprehensive testing and minimizing randomness to obtain stable results. In our study, as illustrated in Figure 7, we conducted 38 iterations, one per mobile phone, and computed the error of the predicted values. This approach is appropriate when little sales data is available, as with short-term and newly released products, because the model is trained on and predicts data from the same product category.
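The product-level LOOCV loop can be sketched as below. The helper name, the toy data, and the mean-of-training-sales placeholder "model" are assumptions made for illustration; any of the 12 models could take the placeholder's place.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_product_out(
    product_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_product, train_rows, test_rows) once per product."""
    rows_by_product: Dict[str, List[int]] = defaultdict(list)
    for row, pid in enumerate(product_ids):
        rows_by_product[pid].append(row)
    for held_out, test_rows in rows_by_product.items():
        train_rows = [r for r, pid in enumerate(product_ids) if pid != held_out]
        yield held_out, train_rows, test_rows


# Hypothetical toy data: 3 products with 2 monthly observations each.
product_ids = ["A", "A", "B", "B", "C", "C"]
sales = [10.0, 12.0, 20.0, 22.0, 30.0, 32.0]

abs_errors: List[float] = []
for held_out, train_rows, test_rows in leave_one_product_out(product_ids):
    # Placeholder "model": predict the mean sales of the training products.
    prediction = sum(sales[r] for r in train_rows) / len(train_rows)
    abs_errors.extend(abs(sales[r] - prediction) for r in test_rows)
```

All rows of one product are held out together, so no observation of the test product leaks into training. scikit-learn's `LeaveOneGroupOut` splitter offers the same grouping behavior.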

Evaluation Metric
To evaluate the performance of the models, this study employs three evaluation metrics: Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), and Correlation. MAPE expresses the average absolute deviation as a percentage of actual sales, making it a suitable indicator for detecting marginal errors. Conversely, RMSE, which is based on squared deviations, is particularly sensitive to values with significant errors or outliers [68].

$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

where $n$ is the number of data points, and $y_i$ and $\hat{y}_i$ are the actual sales and predicted sales, respectively.
Correlation is an indicator that analyzes the strength of the relationship between the predicted value and actual value.

$$\mathrm{Correlation} = \frac{\mathrm{cov}(y, \hat{y})}{\sigma_y \, \sigma_{\hat{y}}}$$

where cov is the covariance, $\sigma_y$ is the standard deviation of actual sales, and $\sigma_{\hat{y}}$ is the standard deviation of predicted sales.
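The three metrics can be computed with a few lines of standard-library Python. This sketch assumes the standard definitions, with MAPE expressed in percent and correlation taken as the Pearson coefficient. The function names and sample values are illustrative and not taken from the study.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the same unit as the sales figures."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def correlation(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    return cov / (std_a * std_p)


# Hypothetical monthly sales and forecasts (not data from the study).
actual_sales = [100.0, 200.0, 400.0]
predicted_sales = [110.0, 180.0, 400.0]

mape_value = mape(actual_sales, predicted_sales)
rmse_value = rmse(actual_sales, predicted_sales)
corr_value = correlation(actual_sales, predicted_sales)
```

Because MAPE divides by actual sales, a fixed absolute error weighs more heavily on low-selling products, whereas RMSE is dominated by errors on high-selling ones.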

Predictive Performance
The scatterplot in Figure 8 depicts the relationship between actual sales and predicted sales for the test data using LOOCV. The plot includes 266 sale points, representing 7 months multiplied by 38 products. The purpose of this plot is to examine the correlation between predicted and actual sales. Ideally, a forecast that closely aligns with actual sales would follow the red line. Points below the red line indicate that predicted sales exceed actual sales, while points above the red line indicate the opposite. Based on Figure 8c, the Support Vector Machine's predictions reveal limitations in forecasting sales in the high sales range. The DNN, RNN, and LSTM models are also inaccurate in this range. Specifically, these models tend to underestimate the actual values in the high sales range.
To gain deeper insights into the variations in predictive performance among the top three models, namely Random Forest, CatBoost, and AdaBoost, Figure 9 plots the actual sales against the predicted sales generated by these models. It is evident from the figure that the Random Forest model outperformed the other two models across all sales ranges, demonstrating superior accuracy in sales forecasting.

Figure 9. Scatter plot of measured sales and predicted sales with Random Forest, CatBoost, and AdaBoost.

Comparison of Models
After forecasting sales using 12 machine learning models, the results were compared and evaluated based on RMSE, MAPE, and Correlation. Each model was ranked on each metric, with a lower MAPE or RMSE and a larger Correlation earning a better rank. The total ranking was calculated by summing the rankings for each metric, with a lower total indicating a more dominant model.
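The rank-sum comparison can be sketched as follows. The model names and scores are hypothetical placeholders rather than results from the study, and ties are not treated specially, which is an assumption.

```python
from typing import Dict


def rank(values: Dict[str, float], higher_is_better: bool) -> Dict[str, int]:
    """Rank models from 1 (best) upward; ties are not handled specially."""
    ordered = sorted(values, key=values.get, reverse=higher_is_better)
    return {model: position for position, model in enumerate(ordered, start=1)}


def total_ranking(
    mape: Dict[str, float], rmse: Dict[str, float], corr: Dict[str, float]
) -> Dict[str, int]:
    """Sum the per-metric ranks; a lower total means a stronger model."""
    r_mape = rank(mape, higher_is_better=False)
    r_rmse = rank(rmse, higher_is_better=False)
    r_corr = rank(corr, higher_is_better=True)
    return {m: r_mape[m] + r_rmse[m] + r_corr[m] for m in mape}


# Hypothetical scores for three placeholder models (not results from the study).
mape_scores = {"model_a": 40.0, "model_b": 55.0, "model_c": 90.0}
rmse_scores = {"model_a": 8000.0, "model_b": 7500.0, "model_c": 12000.0}
corr_scores = {"model_a": 0.86, "model_b": 0.84, "model_c": 0.60}

totals = total_ranking(mape_scores, rmse_scores, corr_scores)
best_model = min(totals, key=totals.get)
```

Summing ranks rather than raw scores keeps metrics with very different units, such as MAPE in percent and RMSE in units sold, on an equal footing.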
The evaluation results of the machine learning models on the test dataset are presented in Table 4.
We confirm that the Random Forest model exhibits the highest prediction performance as an integrated prediction model, and we further review its prediction accuracy by brand. As shown in Table 5, on all performance evaluation indicators, predictive performance was highest for Samsung brand products, followed by Apple brand products and then LG brand products. Relative to Samsung brand products, Apple brand products were worse by 35.9% in error rate, 36.6% in RMSE, and 3.5% in correlation, while LG brand products were worse by 82% in error rate, 48.9% in RMSE, and 63.8% in correlation.

Conclusions
For products with short lifecycles, such as mobile phones, and for newly released products, only limited sales data can be collected, which makes sales difficult to predict. However, accurate sales forecasting is an important factor in maximizing a company's profits, so this problem needs to be solved. This study proposes an integrated model that is trained on product-related and sales-related variables, capturing the sales patterns and product specifications of the same product category. To this end, we compared and evaluated the performance of 12 machine learning models using sales data for 38 mobile phones in the Korean market between 2020 and 2021, and identified the best-performing model. The following observations were found in the analysis of forecasting models considering product and sales-related variables:

• For the mobile phone sales forecasting case, the moving average of sales over the previous 1-2 months (a sales-related variable) and rear camera pixels, release price, and CPU processor speed (product-related variables) were identified as variables that significantly affect sales.

• The Random Forest model outperformed other models in sales forecasting, with the lowest-performing model, LSTM, exhibiting a significantly higher relative error percentage of 665% for MAPE and 86% for RMSE compared to Random Forest.

• Consistent with previous studies [5], deep learning models such as DNN, RNN, and LSTM demonstrated lower performance than the other machine learning models when working with relatively small datasets.

• The Random Forest model, with the highest prediction performance, exhibited varying accuracy for each brand. The order of high accuracy was Samsung brand products > Apple brand products > LG brand products.
The analysis results of this study have the following important implications for companies engaged in sales forecasting of products with short lifecycles, such as mobile phones:

• Significant performance differences observed between the best and worst performance models highlight the need for informed decision making. Employing an unsuitable model can result in significant forecasting errors that accumulate over time, adversely impacting the entire supply chain. Thus, businesses should meticulously evaluate the specific characteristics of their sales data, consider the strengths and weaknesses of each model, and select the most suitable model aligned with their specific requirements and objectives.

• We believe that companies that produce short-term products can optimize their supply chain strategy by applying the Random Forest model or the analysis process proposed in our study.

• The variation in predictive performance by brand may be attributed to differences in sales patterns resulting from brand-specific marketing strategies, including promotions and price policies [69,70]. To enhance forecasting accuracy, collecting additional data on promotion timing, price fluctuations, and advertising timing to reflect brand-specific marketing strategies would be beneficial.
Two directions to enhance model performance warrant further research. Firstly, assessing the impact of outlier processing on forecasting accuracy is crucial, as outliers can significantly influence results. Secondly, exploring the implementation of more advanced models, such as SGTM neural-like structures, modifications, and non-iterative approaches, holds promise for improving the overall forecasting performance.
Furthermore, the generalizability of our proposed forecasting approach deserves exploration. One study reports increased online sales of electronics during the COVID-19 pandemic [71], while others indicate a decrease in sales volume [72]. As research on the pandemic's impact on sales is ongoing, analyzing our model's predictive results before and after the COVID-19 pandemic could provide valuable insights to academia. This analysis would contribute to a better understanding of the model's effectiveness under varying market conditions. Future research could focus on validating the findings in diverse markets, taking into account the unique characteristics of different product lifecycles. Additionally, studies that collect and analyze daily or weekly sales data would contribute to a more comprehensive understanding of sales forecasting and should be a subject of interest for future investigations.