Sales Forecasting for Fashion Products Considering Lost Sales

Abstract: Sales forecasting for new products is significantly important for fashion retail companies because high-accuracy predictions help the company improve management efficiency and customer satisfaction. The low inventory strategy for fashion products and the low stock level in each brick-and-mortar store lead to serious censored demand problems, making forecasting difficult. In this regard, a two layers (TLs) model is proposed in this paper to predict the total sales of new products. In the first layer, the demand is estimated by linear regression (LR). In the second layer, sales are modeled as a function of not only the demand but also the inventory. To solve the TLs model, a gradient-boosting decision tree (GBDT) method is used for feature selection. Considering the heterogeneity in products, a mixed k-mean algorithm is applied for product clustering and a genetic algorithm for parameter estimation in each cluster. The model is tested on real-world data from a Singapore company, and the experimental results show that our model is better than LR, GBDT, support vector regression (SVR) and artificial neural network (ANN) models in most cases. Furthermore, two indicators are built, the average conversion rate and the marginal conversion rate, to measure products' competitiveness and explore the optimal inventory level, respectively, which provides helpful guidance on decision-making for fashion industry managers.


Introduction
Sales forecasting is a critically important task in the fashion retail industry. It is an integral component of many core fashion retail processes, including the functional areas of pricing and inventory management, marketing and production purchasing, as well as finance and accounting [1,2]. Forecasts on sales also provide the basis for regional distribution and replenishment plans [3]. Therefore, the ability to estimate the probable sales quantity could lead to improved customer satisfaction, reduced waste, increased sales revenue and more effective and efficient production plans, as well as better after-sales services [4]. Generally, accurate sales forecasting is a prerequisite to competing efficaciously in the fashion world [5].
A number of methods have been developed over the past two decades. Statistical methods are widely used in forecasting fashion products, including traditional time series techniques and discrete choice models such as the multinomial logistic model and Bayesian analysis [6,7]. Another popular approach to sales forecasting is machine learning, such as artificial neural networks (ANN), evolutionary neural networks (ENN) and extreme learning machines (ELM) [8][9][10].
Despite the extensive research on the sales forecasting problem in the fashion retail industry, there remain some practical issues that need consideration. Fashion products have the characteristic of short product life cycles [11]. Therefore, fashion products usually follow a low inventory strategy, and every brick-and-mortar store keeps a low stock level, which leads to a serious problem with demand censoring [12,13]. Censored demand increases the difficulty of forecasting, and the impact of inventory and censored demand on sales cannot be ignored in the fashion retail industry. Research on the total sales forecasting problem for fashion products with censored demand is relatively rare, which motivates us to design a new model that alleviates the impact of censored demand on sales by accounting for the lost sales phenomenon. The objective of this paper is to improve the total sales forecast accuracy for fashion products and provide helpful guidance on decision-making for fashion industry managers.
In this paper, a two layers (TLs) model for forecasting the total sales of new products is proposed, considering not only the actual demand but also the effect of censored demand in brick-and-mortar stores. In the first layer, the demand is estimated by traditional linear regression (LR), while in the second layer, the sales are modeled as a function of not only the demand but also the inventory. We incorporate two key parameters in this model, one describing a product's basic rate of converting one unit of inventory into sales and the other describing the customers' preference for products. To solve the model, the gradient-boosting decision tree (GBDT) method is applied for feature selection. Considering the heterogeneity of products, they are clustered by a mixed k-mean clustering algorithm, and parameter estimation is conducted separately in each cluster by a genetic algorithm (GA). Collaborating with a multinational fashion retailer in Singapore, experiments are conducted based on real-world data for shoes and belts. The results are compared across different models, including our TLs model, LR, GBDT, support vector regression (SVR) and ANN, and show that our method outperforms the other sales forecasting methods in most cases. For further conclusions, we analyze the results obtained by the TLs model and define the average conversion rate when the inventory level equals the demand as an indicator of product competitiveness. We also introduce the marginal conversion rate as another indicator that helps managers make proper inventory decisions.
Generally, our contribution is summarized in three points: (1) A TLs sales forecasting model is proposed based on the relationship between sales, inventory and demand, which incorporates two parameters to describe the conversion rate of inventory to sales. The marginal sales, which is the change in total sales that arises when the inventory is increased by one unit, is modeled by the conversion rate of inventory to sales. (2) A comparison is conducted among the sales forecasting results of a fashion enterprise using the TLs model, LR, GBDT, SVR and ANN. The results of the experiments show that our TLs model outperforms the benchmarks in most cases. (3) Two different indicators, the average conversion rate and the marginal conversion rate, are defined to measure product competitiveness and decide the optimal inventory level. These can be of great help to fashion managers in product and inventory management.
The rest of this paper is organized as follows. In Section 2, we present a literature review on the related work. Section 3 introduces the proposed TLs model and gives an explanation of the parameters. Section 4 is about the results and analysis of the data experiment, introducing two indicators, the average conversion rate and the marginal conversion rate. Finally, Section 5 concludes our study and points out potential directions for future research in the subject area.

Literature Review
Our work is related to the research on the forecasting problem in the fashion industry and retail industry. The commonly adopted methods are statistical methods and machine learning methods. In the following, we review several existing methods for solving the forecasting problem.
Being widely applied, statistical methods include regression [14], time series techniques, Bayesian analysis and discrete choice models. Time-series methods are traditional univariate forecasting models, ranging from simple moving averages and the exponential smoothing family to the Auto-Regressive Integrated Moving Average approach (ARIMA) [15]. Green and Harrison [16] apply a Bayesian approach to conduct forecasting for a mail-order company selling ladies' dresses, while Lee et al. [17] propose a hierarchical Bayesian model for prelaunch sales forecasting of recorded music. The discrete choice model (DCM), for example, the multinomial logistic model (MNL), is also popular in prediction. DCM conducts retail forecasting by modeling consumers' probability of choice, and the literature explores its application to forecasting problems in the retail industry [7,18]. Despite their popularity, statistical methods present several limitations. For example, they require relatively restrictive assumptions to perform well, such as the sample distribution of products and the independent and identical distribution of samples. Since sales of fashion products exhibit a high degree of randomness, these hypotheses might not correspond to the actual situation and can have a negative impact on forecasting performance.
Another stream of studies on machine learning methods emerged with the advance of computer technology. Popular methods such as artificial neural networks (ANNs), evolutionary neural networks (ENNs) and extreme learning machines (ELM) are widely used in the literature. Frank et al. [19] conduct fashion retail sales forecasting using ANN due to its promising performance in the areas of control and pattern recognition. Chawla et al. [20] present a case study of an American retail corporation on forecasting with ANN. ENN is a hybrid combination of evolutionary computation and neural networks; Au et al. [8] develop an optimized neural network structure for forecasting apparel sales using ENN. ELM learns faster than conventional gradient-based learning methods and is usually adopted together with other methods. Yu et al. [21] employ both ELM and traditional statistical methods to forecast sales, and Wong and Guo [22] propose a hybrid intelligent model using ELM and the harmony search algorithm. Further, Choi et al. [23] achieve real-time sales forecasting by combining ELM with the grey model. Machine learning performs well in identifying and modeling data patterns that are not easily discernible by traditional statistical methods, and some studies show that machine learning methods outperform traditional statistical methods [24,25]. However, machine learning methods are sometimes unstable; in other words, there are cases in which they might not work well. Moreover, these methods lack interpretability and need a substantial amount of time to conduct the prediction.
For the problem of censored demand, there have been research studies on inventory management in recent years. Nahmias and Smith [26] explore the optimal inventory levels in a retailer system with lost sales, while Heese and Swaminathan [27] provide insights regarding optimal inventory and sales effort management in the presence of unobserved sales. To the best of our knowledge, no previous literature studies the sales forecasting problem of fashion products in the case of censored demand.

Methodology
In this section, we propose a two layers (TLs) model for forecasting the total sales of new fashion products accounting for censored demand. We then develop an optimization method and establish the optimal sales plan after introducing the proposed model. The flow of the solution is shown in Figure 1, where the whole process transforming the raw dataset into forecast results is displayed from left to right. After the raw dataset enters the system, a gradient-boosting decision tree model is used to conduct feature selection. After selecting the important features for training, we cluster the dataset using a mixed k-mean method to account for the heterogeneity of different products. After finishing the steps above, a genetic algorithm is applied for parameter estimation in each cluster to produce the forecasting output of the TLs model.

The Proposed Sales Forecasting Model
Sales are transformed from the potential demand, and the transformation process is limited by the scale of the inventory of a fashion product. When the inventory is less than the potential demand, some demand will not be met. However, even if the inventory is sufficient, some customers may not purchase the product because products are not placed in the correct store, which leads to demand loss. Given the demand, we define marginal sales as the change in total sales that arises when the inventory increases by one unit. In an ideal state, as long as there exists unsatisfied potential demand, the increased inventory will be sold out, and the marginal sales are equal to one. However, in reality, marginal sales are always less than one. As sales increase, the number of potential customers decreases, and the chance of the product being discovered by a customer drops, leading to a decrease in marginal sales as well. In this manner, we model the marginal sales by the following equation:

M(I) = S(I) − S(I − 1) = q(1 − (S(I − 1)/D)^α), I ∈ N+, S(0) = 0, (1)

where S(I) and M(I) are the sales and the marginal sales when the inventory is I. This sales iterative formula describes the relationship between sales, inventory and demand. Given the marginal sales equation for inventory I, the product sales can be calculated by the iterative formula S(I) = ∑_{i=1}^{I} M(i). The right side of Equation (1) consists of two parts, and the first part is q. Since S(0) = 0, M(1) = q describes the marginal sales of the initial inventory. Therefore, q ∈ [0, 1] is defined as the basic conversion rate, which is determined by a product's inherent properties.
The other part of Equation (1) describes the chance of products being found in brick-and-mortar stores, where S(I − 1)/D is the proportion of the demand being satisfied. When the proportion is small, there is still a lot of unsatisfied demand, so the marginal sales brought by one unit of increased inventory are relatively large. When the proportion approaches one, most of the demand is satisfied, and increments in inventory can hardly transform into sales, resulting in small marginal sales. In general, marginal sales that decline as sales increase are described by 1 − (S(I − 1)/D)^α. For some products, customers have a strong willingness to look for stores with stock. The chance of these products being discovered is thus higher, and the decline rate of marginal sales with increasing sales is relatively small. We use α to control the decline rate. As α approaches infinity, the marginal sales remain at the level of the basic conversion rate.
Combining the influence on sales of both a product's inherent properties and its potential customers, the increase in sales when one unit of inventory is added is determined by q(1 − (S(I − 1)/D)^α).
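To make the recursion concrete, the second layer can be sketched in a few lines of Python. The demand D, basic conversion rate q and decline parameter α below are illustrative values, not estimates from the paper's data:

```python
def total_sales(D, q, alpha, I):
    """Second-layer recursion: S(I) = S(I-1) + q * (1 - (S(I-1)/D)**alpha).

    D: estimated demand, q: basic conversion rate in [0, 1],
    alpha >= 1: controls how fast marginal sales decline.
    """
    s = 0.0  # S(0) = 0: with no inventory there are no sales
    for _ in range(I):
        s += q * (1.0 - (s / D) ** alpha)  # add the marginal sales M(I)
    return s

# The first unit of inventory sells at the basic conversion rate: M(1) = q.
first_unit = total_sales(D=100.0, q=0.8, alpha=2.0, I=1)
```

Because each extra unit contributes q(1 − (S/D)^α) ≤ q, the sequence increases with inventory but never exceeds the demand D.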
According to different q and α, products are divided into three categories: major commodity, common commodity and targeted commodity.
Products that have large q and α are defined as major commodities. Major commodities are usually products keeping up with the fashion trend in season. Their targeted customers have high stickiness and enthusiasm and are willing to make an effort to find them.
Products that have large q and small α are defined as common commodities. Common commodities are products with basic or classic styles. Their demands are stable, but customers give up finding them when the difficulty of finding them increases. For example, when the brick-and-mortar store the customer enters is out of stock in a common commodity, the possibility is small that the customer will go to another store specifically just to find the commodity.
Products that have the smallest q are defined as targeted commodities. Targeted commodities are designed for minority groups. Their basic rate of conversion is the lowest. Enterprises produce targeted commodities to improve competitiveness by increasing their product variety.
The total demand for a product is denoted by D. We use multiple linear regression to estimate the demand from a product's features. The linear model can be written as:

D_i = β_0 + β_1 X_i, i = 1, . . . , n, (2)

where n is the number of products, D_i refers to the demand for product i, X_i is the feature vector of product i and β_1 specifies the impact of the features on demand. Since the TLs model is used to predict the total sales of new products and demand is decided by the product's inherent properties, such as price, factory cost and color, sales are not included in the features X_i when estimating demand. Finally, the TLs model is described in two layers: the first layer consists of Equation (2), while the second layer consists of Equation (1).
The following theorem states the fundamental properties of the TLs model.

Theorem 1. Suppose that α ≥ 1, 0 ≤ q ≤ 1 and D ≥ α · q. Then the sales sequence {S(I)} generated by the TLs model satisfies 0 ≤ S(I) ≤ D for every I ∈ N+, is monotonically increasing and converges to D.

Proof of Theorem 1. We start by proving that 0 ≤ S(I) ≤ D for any I ∈ N+ using mathematical induction. When I = 1, Formula (1) and the given conditions yield S(1) = q(1 − (S(0)/D)^α) = q, and since 0 ≤ q ≤ D, obviously 0 ≤ S(1) ≤ D holds. When I ≥ 2, suppose 0 ≤ S(I) ≤ D holds; then there always exists a p ∈ [0, 1] such that S(I) = D · p. From the TLs model, we have

S(I + 1) = S(I) + q(1 − p^α) = D · p + q(1 − p^α). (3)

Define f(p) = D − S(I + 1), and bring (3) into f(p); then we get

f(p) = D(1 − p) − q(1 − p^α), (4)

and its derivative function f′(p) = −D + α · q · p^{α−1}. Since p^{α−1} ≤ 1 and D ≥ α · q, the derivative satisfies f′(p) ≤ −D + α · q ≤ 0. Noting that f(p) is thus a decreasing function and f(1) = 0, we obtain

f(p) ≥ 0 for all p ∈ [0, 1]. (5)

Moreover, S(I + 1) = S(I) + q(1 − p^α) ≥ S(I) ≥ 0. Therefore, according to f(p) = D − S(I + 1), (3) and (5), 0 ≤ S(I + 1) ≤ D holds. Through mathematical induction, we now have

0 ≤ S(I) ≤ D for all I ∈ N+. (6)

From Formulas (1) and (6), the sequence {S(I)} is monotonically increasing and bounded above by D. According to the monotone bounded convergence theorem, the limit of {S(I)} exists. Suppose lim_{I→∞} S(I) = S*; letting I → ∞ in the TLs model gives S* = S* + q(1 − (S*/D)^α). It can be obtained from this formula that (S*/D)^α = 1, hence S* = D, which completes the proof.
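The claims of Theorem 1 (monotonicity, the bound S(I) ≤ D and convergence to D) can also be checked numerically for any parameters satisfying D ≥ α · q; the values below are illustrative:

```python
def sales_sequence(D, q, alpha, I_max):
    # Generate S(1), ..., S(I_max) from the TLs recursion with S(0) = 0.
    seq, s = [], 0.0
    for _ in range(I_max):
        s += q * (1.0 - (s / D) ** alpha)
        seq.append(s)
    return seq

# Parameters chosen so that the theorem's condition D >= alpha * q holds.
seq = sales_sequence(D=50.0, q=0.8, alpha=3.0, I_max=2000)
monotone = all(a <= b for a, b in zip(seq, seq[1:]))
```

For large inventory the sequence approaches the demand D = 50, as the theorem predicts.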

Feature Selection
Each product has a great number of categorical features, which are processed by one-hot encoding. This generates hundreds of features to be handled, while most of them are unimportant. With all features included, the model becomes too complex, and the large number of features leads to over-fitting. To alleviate this problem, a GBDT algorithm is used to select features by sorting them in descending order of importance.
GBDT is a popular machine learning method constructed as an ensemble of weak CARTs (classification and regression trees). During training, the selected split features and their relative influence defined in [28] are recorded. Let m_org denote the initial number of features. The importance of the j-th feature in a single decision tree T is expressed as:

Ĵ²_j(T) = ∑_{t=1}^{L−1} Î²_t · I(v_t = j),

where L − 1 is the number of non-leaf nodes of tree T, I(·) is the indicator function, v_t is the splitting feature associated with node t and Î²_t is the corresponding reduction of squared loss as a result of the split. The global importance of the j-th feature is measured by its average importance over the single trees:

Ĵ²_j = (1/H) ∑_{h=1}^{H} Ĵ²_j(T_h),

where H is the number of trees and T_h refers to the h-th decision tree. Then the five-fold cross-validation method is applied to select important features based on a greedy strategy. After randomly dividing the dataset into a training set and a validation set, the method adds features one by one in descending order of importance. By computing the MAPE of each result, the feature subset with the smallest MAPE is selected for the subsequent clustering and forecasting operations.
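As an illustration on synthetic data (not the paper's dataset), scikit-learn's `GradientBoostingRegressor` exposes exactly this averaged split-gain importance via `feature_importances_`; a greedy forward pass over the ranked features with cross-validated MAPE would then pick the final subset, as described above:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Synthetic target driven mainly by features 0 and 3; the rest are noise.
y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

gbdt = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)
# Rank features by their averaged split-gain importance, descending.
ranked = np.argsort(gbdt.feature_importances_)[::-1]
```

The two informative features end up at the top of the ranking, which is the ordering the greedy selection step would then consume.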

Clustering
Products show heterogeneity with respect to each other and have different α and q. It is therefore natural to cluster the products and conduct parameter estimation separately, with each cluster sharing the same α, q, β_0 and β_1. Considering that these products usually have both continuous and categorical attributes, this paper uses the mixed k-mean algorithm proposed by Ahmad and Dey [29] for clustering. Let X_1 = (X_11, X_12, . . . , X_1m)^T and X_2 = (X_21, X_22, . . . , X_2m)^T denote the attribute vectors of two products, where the first m_r attributes are numeric and the remaining m − m_r attributes are categorical. The distance between X_1 and X_2 is:

D(X_1, X_2) = ∑_{j=1}^{m_r} (w_j(X_1j − X_2j))² + ∑_{j=m_r+1}^{m} δ(X_1j, X_2j)²,

where w_j is the significance function for numeric attributes and δ is the distance function for categorical attributes defined in [29]. If the number of clusters is too small, the differences between products cannot be reflected; if it is too large, the samples become over-dispersed. Therefore, the number of clusters is limited to 3 to 5, and we employ the popular silhouette coefficient [30] to determine the best number of clusters. The silhouette coefficient for a dataset {X_1, X_2, . . . , X_n} is defined as

s = (1/n) ∑_{i=1}^{n} (b(X_i) − a(X_i)) / max{a(X_i), b(X_i)},

where a(X_i) is the average distance from sample X_i to the other samples in X_i's cluster, and b(X_i) is the average distance from X_i to all samples in the nearest neighboring cluster.
The higher the value of the silhouette coefficient is, the better the clustering effect is. The number of clusters with the largest silhouette coefficient is selected to conduct clustering and establish the TLs model.
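A minimal sketch of the cluster-number search, using plain k-means on purely numeric synthetic data as a stand-in for the mixed k-mean algorithm of Ahmad and Dey (the group centers and sizes are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Three well-separated synthetic "product" groups with four numeric attributes.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(60, 4)) for c in (0.0, 3.0, 6.0)])

# Evaluate the silhouette coefficient for 3 to 5 clusters and keep the best.
scores = {k: silhouette_score(X, KMeans(n_clusters=k, n_init=10,
                                        random_state=0).fit_predict(X))
          for k in (3, 4, 5)}
best_k = max(scores, key=scores.get)
```

With three genuinely separated groups, the silhouette coefficient peaks at k = 3, mirroring how the paper settles on three clusters for both shoes and belts.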

Optimization Tasks
The parameters of our model are estimated by minimizing the gap between actual sales and forecast results during the training process. The optimization task is defined as:

min_{β_0k, β_1k, q_k, α_k} ∑_{k=1}^{K} ∑_{i=1}^{N_k} (S_i − Ŝ_i)², s.t. α_k ≥ 1, 0 ≤ q_k ≤ 1,

where K is the number of clusters, N_k is the number of products in cluster k, S_i is the actual sales of product i and Ŝ_i is the predicted sales of product i from the TLs model. Here, in cluster k, β_0k and β_1k belong to the first layer for demand estimation, and α_k and q_k belong to the second layer for sales prediction. This optimization task is difficult to handle because the TLs model is an iterative formula. To solve the problem, a genetic algorithm (GA) is applied. The GA computes the optimal parameters iteratively: in each iteration, the parameters move toward the optimum through mutation, crossover and selection. GA is a popular technique for solving complex optimization problems [31,32] and has been widely used in the field of sales forecasting [33,34].
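A compact sketch of the estimation loop for one cluster's second-layer parameters (only q and α here, with the demand D assumed already produced by the first layer; the population size, mutation scales and synthetic data are illustrative choices, not the paper's settings):

```python
import random

def predict_sales(D, q, alpha, I):
    # TLs second layer: iterate the marginal-sales recursion up to inventory I.
    s = 0.0
    for _ in range(int(I)):
        s += q * (1.0 - (s / D) ** alpha)
    return s

def fitness(params, data):
    # Squared error between actual and predicted sales over one cluster.
    q, alpha = params
    return sum((s - predict_sales(D, q, alpha, I)) ** 2 for D, I, s in data)

def evolve(data, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    # Initialize within the constraints q in [0, 1], alpha >= 1.
    pop = [(rng.uniform(0, 1), rng.uniform(1, 6)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, data))
        parents = pop[: pop_size // 2]                      # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            q = 0.5 * (a[0] + b[0]) + rng.gauss(0, 0.05)    # crossover + mutation
            al = 0.5 * (a[1] + b[1]) + rng.gauss(0, 0.2)
            children.append((min(max(q, 0.0), 1.0), max(al, 1.0)))
        pop = parents + children
    return min(pop, key=lambda p: fitness(p, data))

# Hypothetical cluster: (demand, inventory, observed sales) generated with
# "true" parameters q = 0.7, alpha = 2.5, which the GA should approach.
data = [(D, I, predict_sales(D, 0.7, 2.5, I))
        for D, I in [(100.0, 60), (80.0, 50), (120.0, 90)]]
best_q, best_alpha = evolve(data)
```

The returned pair stays inside the feasible region and fits the synthetic cluster far better than an arbitrary parameter guess.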

Data Description
The data are provided by our industry partner, a multinational fast fashion enterprise founded in Singapore. The company's products are mainly sold offline and are popular around the world. The company has many stores distributed in different large shopping malls and commercial pedestrian streets in Singapore.
Shoes and belts are selected as the forecast objects, for they are popular products that fashion managers pay much attention to. The original data include the sales records of 30 stores from 1 January 2017 to 1 June 2019, a total of 126 weeks. Through field research, we found that it usually takes 16 weeks for the company to receive the finished products from the factory. It is also known that fashion products have a short product life cycle: they go off the shelves or are discounted 11 weeks after release, which has a negative influence on profit, so managers of fashion companies only care about sales in the first 11 weeks. Under this definition, the product life period is set to 11 weeks, and the lead time for fashion products is set to 16 weeks.
The training set and testing set are divided by time (see Figure 2). Because the forecasting model concerns the total sales of new products, we have to make sure that all products in the training set and the testing set have been released onto the market for at least 11 weeks. On the left side of Figure 2, the fashion products' launch time is from 1 January 2017 to 7 April 2018, a total of 66 weeks, with a product life of 11 weeks; hence, the sales records in the first 77 weeks make up the training set, and a lead time of 16 weeks is considered when dividing the testing set. On the right side of Figure 2, the products' launch time is from 14 October 2018 to 16 March 2019, with a product life of 11 weeks; hence, the sales records in the last 33 weeks are defined as the testing set. This ensures that every product in these sets has a complete life cycle. Three statistics for each category are computed: the size of the data and the average and standard deviation of sales and inventory. These numerical characteristics are summarized in Table 1, from which it is noted that sales are close to inventory. This is because fashion products have a short life cycle and companies usually take a low inventory strategy. The difference between sales and inventory is thus generally small, which causes a serious censored demand problem.

Preprocessing
The original data are incomplete and contain a lot of noise, so the data are preprocessed first. The steps are as follows: (1) Missing value processing. Delete data with more than 50% missing values. For continuous data, fill the missing values with the mean value. For discrete data, treat the missing values as a separate category. (2) Feature normalization on continuous data. Through normalization, the unit limit of the data is removed, which is convenient for the comparison and weighting of indicators. The min-max normalization scheme is expressed as:

Y_ij = (X_ij − X_min,j) / (X_max,j − X_min,j), i = 1, . . . , n; j = 1, . . . , m_r,

where X_min,j and X_max,j denote the minimum value and maximum value of continuous feature j, respectively.
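The two preprocessing steps can be sketched with pandas. The column names and the toy frame are hypothetical, and step 1 is interpreted here as dropping features that are mostly missing:

```python
import pandas as pd

def preprocess(df, cont_cols, cat_cols):
    # Step 1: drop features with more than 50% missing values.
    df = df.loc[:, df.isna().mean() <= 0.5].copy()
    cont = [c for c in cont_cols if c in df.columns]
    cat = [c for c in cat_cols if c in df.columns]
    df[cont] = df[cont].fillna(df[cont].mean())   # continuous: mean imputation
    df[cat] = df[cat].fillna("missing")           # discrete: own category
    # Step 2: min-max normalize continuous features to [0, 1].
    df[cont] = (df[cont] - df[cont].min()) / (df[cont].max() - df[cont].min())
    return df

raw = pd.DataFrame({"price": [10.0, None, 30.0, 20.0],
                    "color": ["red", None, "blue", "red"],
                    "mostly_na": [None, None, None, 1.0]})
clean = preprocess(raw, ["price"], ["color"])
```

After this, one-hot encoding of the categorical columns and GBDT feature selection proceed on `clean`.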
After preprocessing the data, one-hot is conducted to handle categorical features, and the GBDT algorithm is applied before feature selection. Clustering and parameter estimation are both conducted based on the processed dataset after feature selection.

Comparison Models
Statistical methods have been widely used in previous research on forecasting problems in the retail industry and often require relatively restrictive assumptions to perform well [7]. Nevertheless, the research problem of this paper is total sales forecasting in the fashion industry, especially for fashion enterprises with brick-and-mortar stores, while statistical methods in existing research on retail prediction usually give periodical forecasting results rather than total sales [15,17]. Since research on the sales forecasting problem with censored demand is relatively rare, we conducted field research and found that the industry prefers traditional machine learning methods for forecasting problems. Therefore, we introduce four comparison models as follows:
• Linear regression (LR)
• Gradient-boosting decision tree (GBDT)
• Support vector regression (SVR)
• Artificial neural network (ANN)
Table 2 reports the base forecasters and their software tools in this experiment. The data for the benchmarks are the processed data after feature selection and clustering. Moreover, inventory is also used as a feature for the benchmarks for fairness of comparison. When forecasting, if the predicted sales are negative, sales are set to zero; if they are greater than the inventory, sales are set to the inventory quantity.
All experiments were run in Matlab 2019b and Python 3.8.10 on a quad-core Intel Core i5 PC clocked at 2.4 GHz with 16 GB RAM.

Measure of Forecast Accuracy
In this paper, forecast accuracy is evaluated using the following performance measures: (1) mean absolute error (MAE); (2) mean absolute percentage error (MAPE). The definitions are as follows:

MAE = (1/n) ∑_{i=1}^{n} |S_i − Ŝ_i|, MAPE = (1/n) ∑_{i=1}^{n} |S_i − Ŝ_i| / S_i,

where n is the total number of products, and S_i and Ŝ_i represent the actual and predicted sales of product i, i = 1, . . . , n.
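These two measures translate directly into code (the toy numbers are for illustration only):

```python
def mae(actual, predicted):
    # Mean absolute error over n products.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    # Mean absolute percentage error; assumes actual sales are nonzero.
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)
```

For example, with actual sales (100, 50) and predictions (90, 60), the MAE is 10 and the MAPE is 0.15.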

Results for Parameter Estimation
A total of 20 important features are selected for clustering in the shoe category using the feature selection method in Section 3.3. The runtime of each step is shown in Table A1, and detailed information on the selected features is given in Table A2 in Appendix A. Then, the silhouette coefficients are computed to determine the best number of clusters. For shoes, the silhouette coefficients for cluster numbers 3 to 5 are 0.2407, 0.1999 and 0.2019; for belts, they are 0.4240, 0.2938 and 0.3946. As a result, both shoes and belts are divided into three clusters. The statistical characteristics of each cluster are given in Table 3. After clustering, GA is used to find the optimal parameters of the TLs model. The results for shoes and belts are listed in Table 4. Each cluster corresponds to a commodity classification in Section 3.1. For shoes, the differences between the q values are small, and sales are mainly influenced by α. In the shoe dataset, cluster 3 has the highest α, 5.0117, and q, 0.8065, so products in cluster 3 are recognized as major commodities. Similarly, cluster 1 contains the targeted commodities and cluster 2 the common commodities. For belts, cluster 2 contains the major commodities because it has the highest q and α, cluster 1 the common commodities and cluster 3 the targeted commodities. To conclude, the estimation results align well with the product categories of our model. The graphs also support this conclusion. Each item in the dataset has its own demand estimate. To display the general result of our TLs model, we set the demand to the rounded average demand of all samples in the corresponding dataset, and the maximum inventory to 3 times the predicted demand. Figure 3 illustrates how the sales forecast changes with inventory, compared with the ideal conversion rate of inventory to sales.
Here, Figure 3a,b correspond to the shoe and belt datasets, whose demands are 94 and 116, respectively. The difference between the curves is significant. The major commodity has the curve closest to the ideal state, the common commodity has a slower conversion rate than the major commodity, and the targeted commodity lies at the bottom of the three curves. Table 5 reports the forecasting performance of all base forecasters, with the best performance highlighted for each setting. In general, our model outperforms the other models in most cases. Both shoes and belts show the best MAE performance in cluster 3, with 4.5365 and 3.8445, respectively, from our model. In terms of MAPE, the proposed model achieves 0.1359 in shoe cluster 2 and 0.1365 in belt cluster 3. It is also noted that the TLs model shows the best performance over the whole shoe dataset in terms of MAE and MAPE, and over the belt dataset in terms of MAPE. The reason might be that the TLs model takes inventory as a significant property of products and makes full use of it: it delves into the relationship between sales and inventory, while the other methods only take inventory as an input feature and a constraint on sales. This is crucial because the existence of censored demand for fashion products makes sales closely related to inventory. As can be seen from the table, LR, GBDT, SVR and ANN did not show significant differences in their results. We note that ANN performs relatively well in forecasting belt sales but much more poorly on the shoe clusters. In particular, on the whole belt dataset, ANN has a smaller MAE than the TLs model. The forecasting performance of ANN is unstable, and a possible reason is that the training sample is not big enough for ANN to extract abundant information.

Results for Forecast
To conclude, the TLs model shows the best performance in the experiment in most cases. Since the products in this dataset are typical fashion products with high demand volatility, the TLs model is likely to give better performance in the fashion industry with censored demand and multiple brick-and-mortar stores.

Further Discussion
In this subsection, further conclusions from the model are provided for fashion managers. The conclusions consist of two parts: a measure of product competitiveness and an auxiliary index for inventory decisions.
Three important indicators are introduced first: the average conversion rate γ_a, the marginal conversion rate γ_m and the demand-satisfied proportion γ_d. The expression for each indicator is as follows:

γ_a(I) = S(I)/I, γ_m(I) = S(I) − S(I − 1), γ_d(I) = S(I)/D, I ∈ N+, S(0) = 0,

where γ_a is the conversion ratio of the total inventory to sales, γ_m is the conversion ratio of one unit of increased inventory to sales, also known as the marginal conversion rate, and γ_d is the proportion of satisfied demand. All ratios are related to q and α in the proposed model. Figures 4 and 5 display the relationship between inventory and the three ratios over the shoe and belt datasets. Figure 4a-c plots the curves of the three clusters in the shoe dataset. The demand is set using the same method as in Figure 3, where the three clusters in the shoe dataset have average demands of 106, 89 and 90, respectively. The setting of Figure 5 is the same as that of Figure 4.
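Given the second-layer recursion, the three indicators follow directly; the parameter values below are illustrative:

```python
def conversion_rates(D, q, alpha, I):
    """Return (gamma_a, gamma_m, gamma_d) at inventory level I for the TLs recursion."""
    s_prev = s = 0.0
    for _ in range(I):
        s_prev = s
        s += q * (1.0 - (s / D) ** alpha)
    gamma_a = s / I        # average conversion rate: total sales per unit of inventory
    gamma_m = s - s_prev   # marginal conversion rate: sales added by the last unit
    gamma_d = s / D        # proportion of demand satisfied
    return gamma_a, gamma_m, gamma_d
```

At I = 1 the three indicators reduce to q, q and q/D, and γ_m falls as the inventory grows, matching the declining curves in Figures 4 and 5.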
In general, the curves show similarity within each product category. In terms of γ_m, the major commodities, shown in Figures 4c and 5b, have a relatively high γ_m that declines rapidly once the inventory level exceeds the demand. The common and targeted commodities have smoother curves, as shown in Figures 4a,b and 5a,c. Further, at the same inventory level, more demand is satisfied for major commodities than for the other two categories.
(1) Measure of product competitiveness. The average conversion rate at an inventory level equal to demand is denoted by γ_a(D). We define γ_a(D) as the inventory conversion level of an SKU, which can also be understood as the SKU's product competitiveness. Computation gives γ_a(D) values of 0.6232, 0.6908 and 0.7678 for shoe clusters 1 to 3, and 0.6561, 0.7532 and 0.5771 for belt clusters 1 to 3. Clearly, cluster 3 in shoes and cluster 2 in belts have the highest product competitiveness, i.e., the highest percentage of inventory converted into sales, corresponding with the characteristics of a major commodity. Cluster 1 in shoes and cluster 3 in belts have the lowest product competitiveness and are classed as targeted commodities. All these results agree with the commodity-category conclusions given in Section 4.5.1.
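Ranking clusters by γ_a(D) reproduces the commodity classification directly. The snippet below uses the γ_a(D) values reported above:

```python
# gamma_a(D) values per cluster as reported in the text
shoes = {"cluster 1": 0.6232, "cluster 2": 0.6908, "cluster 3": 0.7678}
belts = {"cluster 1": 0.6561, "cluster 2": 0.7532, "cluster 3": 0.5771}

def rank_by_competitiveness(clusters):
    """Sort clusters from most to least competitive by gamma_a(D)."""
    return sorted(clusters, key=clusters.get, reverse=True)

print(rank_by_competitiveness(shoes))  # ['cluster 3', 'cluster 2', 'cluster 1']
print(rank_by_competitiveness(belts))  # ['cluster 2', 'cluster 1', 'cluster 3']
```

The top-ranked cluster in each dataset is the major commodity and the bottom-ranked one the targeted commodity, matching the conclusions of Section 4.5.1.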
(2) Inventory decision. The TLs decision I_ζ is defined as an auxiliary index for inventory decisions:

I_ζ = max{I ∈ N+ : γ_m(I) ≥ ζ},   (13)

where ζ ∈ (0, 1) is a given benchmark for the marginal conversion rate. Table 6 displays the TLs decision I_ζ for each SKU in shoes' cluster 3 for given values of ζ, together with the actual inventory of each SKU; all SKUs are randomly selected from cluster 3 of the shoe dataset. By combining data on inventory cost, sales profit and production scale, an enterprise sets a benchmark γ_m0 for the marginal conversion rate; setting ζ = γ_m0 in Formula (13) then yields the recommended inventory level. For example, if the company's minimum acceptable marginal conversion rate is 0.7, Table 6 shows that the actual inventory levels of SKU2 and SKU8 are higher than the TLs decision I_0.7, meaning that the marginal conversion rates of these two SKUs are relatively low and that they generate revenue inefficiently; the company might consider decreasing their inventory levels. By contrast, the actual inventory levels of the remaining SKUs are lower than the TLs decision I_0.7; these SKUs might be understocked, and the company should instead increase their inventory.
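The TLs decision can be evaluated by a simple search over inventory levels. The sketch below follows the verbal definition (the largest inventory level whose marginal conversion rate still meets the benchmark ζ) and again uses an illustrative concave sales curve as a stand-in for the fitted model:

```python
import math

def tls_decision(S, zeta, I_max=1000):
    """Largest inventory level I with marginal conversion rate
    S(I) - S(I-1) >= zeta, i.e. max{I : gamma_m(I) >= zeta}.
    A sketch of the idea behind Formula (13), not its exact expression."""
    best = 0
    for I in range(1, I_max + 1):
        if S(I) - S(I - 1) >= zeta:
            best = I
    return best

# Illustrative concave sales curve (a stand-in for the fitted model)
D = 94
S = lambda I: D * (1.0 - math.exp(-1.5 * I / D))

# A stricter benchmark yields a lower recommended inventory level
strict = tls_decision(S, zeta=0.9)
loose = tls_decision(S, zeta=0.5)
```

Because the marginal conversion rate declines as inventory grows, raising ζ always lowers the recommended level, which is the lever managers use when inventory cost rises.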

Conclusions, Limitations and Future Research
In this work, a TLs model is proposed for sales forecasting in the fashion industry. To the best of our knowledge, although sales forecasting models have been explored by a great number of researchers, the proposed TLs model is new. It delves into the relationship among sales, demand and inventory and combines two parameters to describe the conversion rate of inventory into sales. To solve the proposed model, feature selection and clustering are conducted to process the historical data, and a genetic algorithm (GA) is used to perform parameter estimation. Furthermore, the proposed model is compared with the LR, GBDT, SVR and ANN models on historical data provided by a multinational fashion retailer in Singapore. The experimental results show that the TLs model outperforms the other models in most cases in terms of forecast accuracy. The four traditional methods do not show significant differences, and the forecasting performance of ANN is unstable. Moreover, the results indicate that in the shoe dataset, cluster 1 is a targeted commodity, cluster 2 is a common commodity and cluster 3 is a major commodity, while for belts, clusters 1 to 3 are common, major and targeted commodities, respectively.
Moreover, two indicators are built on the TLs model: the average conversion rate and the marginal conversion rate. The average conversion rate provides fashion managers with a measure of product competitiveness. The marginal conversion rate serves as an auxiliary index for inventory decisions: by setting an acceptable rate based on factors such as inventory cost, sales profit and production scale, managers obtain a recommended inventory level from the TLs-decision formula.
Nevertheless, the proposed TLs model has limitations. It forecasts the total sales across all stores and does not consider forecasting sales in each brick-and-mortar store or periodical sales; a top-down approach could be applied here in further research. Moreover, alternative techniques could be considered for each step of the proposed algorithm to achieve better performance; one possible improvement is to replace the linear regression used in the first layer with another method. Finally, more effort could be put into extending the TLs model with more realistic constraints and into mining deeper information about products, such as the impact of feature crosses on sales.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
The runtimes of the GBDT feature selection and the mixed k-mean algorithm were 488.3135 and 224.0785 s, respectively, on the shoe dataset, and 76.2131 and 1.6885 s on the belts dataset. Table A1 describes the training time of our TLs model and the benchmark models in each cluster and on the total dataset. It is noted that the TLs model and the ANN model have the longest runtimes. This might be because the TLs model is solved iteratively, and ANN is relatively complex compared with the other three benchmark models.