Augmented Data and XGBoost Improvement for Sales Forecasting in the Large-Scale Retail Sector

The organized large-scale retail sector has gradually established itself around the world and increased its activity exponentially during the pandemic period. This modern sales system uses Data Mining technologies to process valuable information and increase profit. In this direction, the extreme gradient boosting (XGBoost) algorithm was applied in an industrial project as a supervised learning algorithm to predict product sales, including the promotion condition and a multiparametric analysis. The implemented XGBoost model was trained and tested by means of the Augmented Data (AD) technique, which is suitable when the available data are not sufficient to achieve the desired accuracy, as in many practical cases of artificial intelligence data processing where a large dataset is not available. The prediction was applied to a grid of segmented customers, allowing personalized services according to their purchasing behavior. The AD technique provided good accuracy compared with results obtained using the initial dataset with few records. An improvement of the prediction errors, namely the Root Mean Square Error (RMSE) and the Mean Square Error (MSE), which decrease by about an order of magnitude, was achieved. The AD technique formulated for the large-scale retail sector also represents a good way to calibrate the training model.


Introduction
The large-scale retail trade has now established itself internationally for decades. The spread of this modern retail system is associated with a society characterized by a strong economy. Despite this, when a country's economy slows, this sector immediately suffers as sales drop. In this scenario, it is of great importance to have an innovative method that makes it possible to predict sales whether the trend of the economy is negative, steady, or even positive. Machine learning has tools capable of modeling business strategies taking into account consumer purchasing patterns. Machine learning is that branch of artificial intelligence that deals with algorithms capable of extracting useful information from the shapeless mass of data [1]. Machine learning algorithms learn from data in the training phase and are thus able to make predictions on experimental data [2,3]. Data mining (DM) combines the methodologies of machine learning and statistics in order to extract useful information from large amounts of data, such as databases and data warehouses [4]. DM uses artificial neural networks, classification trees, k-Nearest neighbor, Naive Bayes, and logistic regression to analyze data and generate classifications [5][6][7].
Data mining tools are also used by Business Intelligence (BI), which includes various processes and technologies used to collect, analyze, and make data usable in order to make better decisions [8]. BI also makes use of frontal analysis (dashboards), Executive Information Systems (EIS), and IT tools intended to support management activities, i.e., Decision Support Systems (DSS). BI performs a support function for business management with the aim of contributing to the qualitative improvement and greater speed of the organization's decision-making processes. In the present work:
- the DSS platform main functions are defined;
- the multiple attributes influencing the supermarket sales predictions are defined;
- the customer grid (customer segmentation) is structured;
- the AD approach able to increase the DSS performance is defined;
- the platform is implemented;
- the XGBoost algorithm is tested, proving the correct choice of the AD approach for the DSS.
Lack of data is a problem that machine learning algorithms face in many fields, especially when there are privacy concerns, such as in the medical field [32,33]. One method used in this case is the construction of synthetic or surrogate data using the copula method [34].
There are numerous research studies that have focused on data mining methods in the field of sales forecasting [17,35–39]. In particular, the XGBoost method [16,17,40] is widely used for its performance and also finds applications in e-commerce [40–42]. However, this powerful method requires a sufficient amount of data to be applied profitably. In our case the available data were not enough, so we used the augmented data technique. This technique, which has already been tested in different sectors [43–45], is applied here for the first time to sales forecasting. The paper concludes with a detailed section on the testing phase of the algorithm and the results obtained.

Platform Design and Implementation

Figure 1 illustrates the Unified Modeling Language (UML) functional scheme of the implemented system, applied for an industry working in the large-scale retail sector. The main objective of the present research work was the implementation of an innovative BI platform. In detail, Figure 1 shows the Use Case Diagram (UCD) of the project. This diagram outlines the main functionalities of the intelligent platform. Here, we focus on the functionality outlined in red, which is the prediction of sales through XGBoost algorithms. Figure 2 shows the UCD of the intelligent platform. The operator carries out the data entry. The data are then deposited in a data warehouse (DW), which collects data coming from multiple sources and at the same time simplifies data processing. A web app allows the customer to receive personalized purchase information, made possible by the customer profiling carried out by K-means algorithms. The various branches of the store send data in real time to a central database (DB). This is a Structured Query Language (SQL) DB, which is not suitable for archiving historical data. Therefore, the massive data are sent to the Vertica DB. The resulting DW architecture is shown in Figure 3.

We used the XGBoost algorithm to predict the future purchases of customers who have made a number of purchases equal to at least 30% of the number of days in the training period. On the basis of these predictions and of the analysis of the association rules of the products purchased by the supermarket's customers as a whole, it is possible to formulate personalized offers for each customer. Customers were grouped into clusters, calculated on the basis of recency (distance from the last entry into the supermarket) and frequency (frequency of entry in the last 90 days). The prediction algorithm calculates the product that will be purchased by a customer who belongs to a cluster or to the entire clientele. Figure 4 shows the flow chart of the implemented XGBoost algorithm. The algorithm takes as input the cluster on which to make the forecast, the training period, the number of days for which it carries out the prediction, and the code of the product in question. Finally, a check box permits one to select which of the following parameters must be used by the algorithm to make the prediction. The multiple parameters used for the prediction estimation were the following:
- day of purchase of the product;
- week number;
- promotion status;
- weather conditions;
- perceived temperature;
- ratio between the number of receipts containing the specific product code and the total number of receipts issued during the day;
- ratio between the number of receipts of customers in the identified cluster and the total number of receipts for the day;
- ratio between the total receipts issued by the department of the specific product code and the total number of receipts issued during the day;
- unit price;
- ratio between the number of customers who bought that product and the total number of customers of the day;
- total number of customers of the day;
- product in promotion at the competitor;
- holiday week;
- week before a holiday week;
- holiday;
- day after a holiday;
- pre-holiday;
- closing day of the financial year.

After building the dataset, a data pre-processing activity was carried out in order to apply XGBoost. Specifically, the techniques used were the following:
• one hot encoding, which is a mechanism that consists of transforming the input data into binary code (a minimal sketch is given after this list);
• clustering: in order to obtain a relevant statistic, the clientele was divided into groups of customers with similar habits.
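As a minimal sketch of the one hot encoding step, the Python fragment below expands a few hypothetical categorical attributes into binary columns with pandas; the column names and values are illustrative and do not reflect the actual schema of the project dataset.

```python
import pandas as pd

# Hypothetical raw attributes for three daily observations (illustrative values only)
raw = pd.DataFrame({
    "day_of_week": ["mon", "tue", "sat"],
    "promotion_status": ["yes", "no", "yes"],
    "weather": ["sunny", "rain", "sunny"],
    "unit_price": [1.20, 1.20, 0.99],
})

# One hot encoding: every categorical attribute becomes a set of 0/1 columns
encoded = pd.get_dummies(raw, columns=["day_of_week", "promotion_status", "weather"])
print(encoded)
```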
Unless the customer makes a large number of purchases in the training period, it is difficult to acquire an accurate prediction of the products they will buy. To overcome the lack of statistics on customers who are not very loyal, customers were grouped into clusters. Clusters are segments or groups of people with similar habits of recency and frequency. This segmentation allows for the increase of the statistics available as the data of the whole group are analyzed to predict what the whole group will buy. Subsequently, other algorithms could be used to distribute the predicted sales among the customers belonging to that cluster. Clustering is based on the calculation of the quintiles (20%, 40%, ... 100%) of the cumulative frequency of recency (number of days since the last order) and frequency (total number of orders in the last 90 days) for each customer who made purchases on the analysis day.
The quintiles were calculated over the last 90 days. On the basis of the recency and frequency values with respect to the quintile boundaries, a recency score (R) and a frequency score (F) were assigned to each customer. Once the two scores have been calculated, the client's cluster is evaluated using the segmentation map (Figure 5). The segment is given by the Cartesian point (R, F) in the graph.
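A minimal sketch of this quintile-based scoring, assuming a hypothetical pandas table with recency and frequency columns (not the platform's actual implementation), could look as follows:

```python
import pandas as pd

# Hypothetical customers: recency in days, frequency of visits in the last 90 days
customers = pd.DataFrame({
    "customer_id": range(1, 11),
    "recency":   [2, 35, 10, 80, 5, 60, 1, 25, 45, 15],
    "frequency": [30, 3, 12, 1, 25, 2, 40, 8, 5, 18],
})

# Quintile scores: the assumed convention gives recent visitors and frequent
# buyers the highest score (5) and the least active customers the lowest (1)
customers["R"] = pd.qcut(customers["recency"], 5, labels=[5, 4, 3, 2, 1]).astype(int)
customers["F"] = pd.qcut(customers["frequency"], 5, labels=[1, 2, 3, 4, 5]).astype(int)

# The segment is the Cartesian point (R, F) on the segmentation map
customers["segment"] = list(zip(customers["R"], customers["F"]))
print(customers)
```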
Furthermore, various optimization techniques of the predictive algorithm were elaborated, such as the Grid search technique and Feature Selection.
Grid search, also called parameter sweep, is useful for finding the best combination of hyperparameters for the algorithm, which in this case are max_depth, min_child_weight, learning_rate (eta), n_estimators, subsample, colsample_bytree, reg_lambda, and reg_alpha, in order to minimize the cost function as much as possible. The most commonly used error estimation parameters are: root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), root mean square error of prediction (RMSEP), cross-correlation coefficient (R), and standard deviation (SD) [46–49]. The reliability of the algorithms depends on the estimates of the errors that are made during data processing. The errors may also depend on the creation of inefficient training models. The cost function chosen to evaluate the accuracy of the implemented algorithm was the RMSE.
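For reference, the error measures mentioned above can be computed as in the short sketch below; the daily quantities are synthetic numbers used purely for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Synthetic real and predicted daily quantities (illustrative values)
y_true = np.array([120.0, 95.0, 130.0, 110.0, 90.0, 150.0, 105.0])
y_pred = np.array([118.0, 100.0, 125.0, 115.0, 88.0, 140.0, 110.0])

mse = mean_squared_error(y_true, y_pred)    # mean square error
rmse = np.sqrt(mse)                         # root mean square error (the chosen cost function)
mae = mean_absolute_error(y_true, y_pred)   # mean absolute error

print(f"MSE = {mse:.2f}, RMSE = {rmse:.2f}, MAE = {mae:.2f}")
```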
The other technique used to increase the accuracy of the algorithm was Feature Selection, which consists of selecting the features with the highest relevance for the provided training set. In fact, there is no unique set of features suitable for predicting the quantities sold of all products.
We used the Python programming language to implement the XGBoost algorithm. In particular, Python provides the XGBoost and scikit-learn open-source libraries. Using the scikit-learn wrapper interface [50–52] for XGBoost provides the opportunity to run the algorithm with the maximum number of parallel threads via the optional n_jobs parameter. The n_jobs = −1 option, which permits the use of all CPU cores, saves time, especially for trees that use a large number of nodes. Parallel computing is widely used in various sectors [12,53–55] because it optimizes computational costs.
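A minimal sketch of the scikit-learn wrapper interface with all CPU cores enabled is shown below; the data and the number of features are placeholders, not the project dataset.

```python
import numpy as np
from xgboost import XGBRegressor

# Placeholder training data: one row per day, one column per predictive attribute
rng = np.random.default_rng(42)
X_train = rng.random((200, 18))
y_train = rng.random(200) * 100   # quantities sold (synthetic)

# n_jobs=-1 lets XGBoost use all available CPU cores during tree construction
model = XGBRegressor(objective="reg:squarederror", n_jobs=-1)
model.fit(X_train, y_train)

X_next_week = rng.random((7, 18))
print(model.predict(X_next_week))
```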
We used the algorithm to be able to predict the future purchases of customers who had made at least 15 purchases in the last 90 days in order to create personalized offers for each customer. The offers are suggested by the analysis of the products purchased by the total customers of the supermarket. This analysis was performed using the association rules.
Appendix A lists the Python code of the adopted model, namely the piece of code where the XGBoost model is declared in terms of the hyperparameters: eta (alias learning_rate), max_depth, gamma, min_child_weight, subsample, colsample_bytree, alpha, reg_lambda, and n_estimators. For this purpose, the XGBRegressor class of the XGBoost library and the GridSearchCV class of the scikit-learn library were used [56–58].
GridSearchCV allows the model to automatically optimize the values of the hyperparameters. Initially, the values of the parameters were not defined, with the exception of the objective 'reg:squarederror' and the value of n_jobs, which was set to −1 to parallelize the code. During the construction of the GridSearchCV class, it is necessary to provide a dictionary of hyperparameters to evaluate. This class determines the best combination of the hyperparameters, as it builds and evaluates a model for each combination of parameters. In particular, the correct calibration of the values of the eta and subsample hyperparameters avoids overfitting problems. This is achieved by choosing eta so that the update step is not too large. Furthermore, only a fraction of randomly extracted data is sampled; this fraction is set by the subsample value. Finally, cross-validation was used to evaluate each individual model.
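A hedged sketch of this hyperparameter search is given below; the grid values are illustrative and are not the values used to produce the results reported in this paper (the actual declaration is listed in Appendix A).

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X, y = rng.random((200, 18)), rng.random(200) * 100   # placeholder data

# Only the objective and n_jobs are fixed; the other hyperparameters are tuned
base_model = XGBRegressor(objective="reg:squarederror", n_jobs=-1)

param_grid = {                      # illustrative grid, not the project values
    "learning_rate": [0.1, 0.3],    # eta
    "max_depth": [3, 5],
    "min_child_weight": [1, 5],
    "subsample": [0.7, 1.0],
    "colsample_bytree": [0.7, 1.0],
}

# Cross-validation evaluates a model for each combination of hyperparameters
search = GridSearchCV(base_model, param_grid,
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X, y)
print(search.best_params_)
```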

XGBoost Algorithm
The XGBoost algorithm is a machine-learning algorithm that is composed of a sequence of weak predictors. This algorithm was introduced by Chen [59] and perfects the gradient boosting algorithm. Gradient boosting is based on the iterative estimation of trees on the residuals obtained at each step and on adaptive updating of the estimates. Gradient boosting uses the gradient descent technique, for which the split that favors the approach to the minimum point of the objective function is chosen.
The optimizations of XGBoost, compared with gradient boosting, are due to its scalability, the use of parallel and distributed computing, tree pruning operations, management of missing values, and regularization to avoid overfitting and bias.
The input data are the set of values of the variables $x_i$ used to make a prediction on the variable $y_i$; this constitutes the training dataset. The model predicts the value of the variable $y_i$ starting from the variables $x_i$, which are characterized by multiple features. In a linear regression problem, the predicted value is $\hat{y}_i = \sum_j \theta_j x_{ij}$, where $\theta_j$ is the weight of the j-th feature. In a generic problem, $\theta$ denotes the parameters of the model.

The objective function, which measures the model's ability to fit the training data, consists of two terms:

$$\mathrm{obj}(\theta) = L(\theta) + \Omega(\theta),$$

where $L(\theta)$ is the training loss function and $\Omega(\theta)$ is the regularization term. The loss function is a differentiable function that evaluates the prediction. The regularization term allows one to control the model's complexity and avoid overfitting. XGBoost uses a second-order Taylor expansion of the loss function to write the objective function at the t-th boosting step in the following form:

$$\mathrm{obj}^{(t)} \simeq \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t),$$

where $f_t$ is the tree added at step t, and $g_i$ and $h_i$ are the first- and second-order derivatives of the loss with respect to the prediction of the previous step. By defining the following quantities:

$$G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i, \qquad I_j = \{\, i \mid q(x_i) = j \,\},$$

we can write the optimal weight of the j-th leaf as $\theta_j = -\frac{G_j}{H_j + \lambda}$, where $I_j$ is the instance set of the j-th leaf and $q(x_i)$ is the function that maps a data instance to a leaf of the tree, returning the index of the leaf itself. The objective function that optimizes the model is

$$\mathrm{obj}^{*} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda} + \gamma T.$$

Here, T is the number of leaves that characterize the tree. The computational cost of the algorithm is optimized due to the simultaneous training of all trees. Finally, the gain function, which allows one to evaluate the split candidates, is

$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{G_P^2}{H_P + \lambda}\right] - \gamma,$$

where the first term is the contribution of the left nodes (subscript L), the second term is that of the right nodes (subscript R), and the last term is the contribution of the parent leaf node (subscript P), with $G_P = G_L + G_R$ and $H_P = H_L + H_R$. The split condition that generates the greatest gain is chosen; this is a pruning strategy operating at the tree level to avoid overfitting. A small numerical sketch of this gain computation is given below, after the list of libraries. For the implementation of the code in Python, the following open-source libraries were used:
- NumPy, which facilitates the use of large matrices and multidimensional arrays and provides high-level mathematical functions to operate effectively on these data structures;
- Pandas, which allows the manipulation and analysis of data;
- XGBoost, which provides a gradient boosting framework;
- Sklearn (scikit-learn), which provides various supervised and unsupervised learning algorithms.
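As a purely numerical illustration of the gain formula, the sketch below evaluates a candidate split from made-up gradient statistics; lambda_ and gamma_ stand for the regularization hyperparameters and the values are arbitrary.

```python
def split_gain(G_L, H_L, G_R, H_R, lambda_=1.0, gamma_=0.0):
    """Gain of splitting a parent leaf into left (L) and right (R) children."""
    G_P, H_P = G_L + G_R, H_L + H_R              # parent node statistics
    score = lambda G, H: G * G / (H + lambda_)   # leaf contribution G^2 / (H + lambda)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_P, H_P)) - gamma_

# Arbitrary gradient/hessian sums for the two children of a candidate split
print(split_gain(G_L=4.0, H_L=3.0, G_R=-2.5, H_R=2.0))
```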
A seed was assigned to initialize the pseudorandom number generator. The random module uses the seed value as the basis for generating random numbers. This initialization allows for the replication of the same experiment, thus always obtaining the same results even if the process consists of random functions.
The training dataset was created using a shuffle function that allows for writing the lines in a random order.
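The seeding and shuffling described above might be sketched as follows; the seed value, array shapes, and 80/20 split are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

SEED = 42                         # illustrative seed: fixes the pseudorandom generator
np.random.seed(SEED)

X = np.random.random((1000, 18))  # placeholder feature matrix
y = np.random.random(1000) * 100  # placeholder target (quantities sold)

# shuffle=True writes the rows in random order before splitting;
# random_state makes the shuffle reproducible across runs
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=SEED)
print(X_train.shape, X_val.shape)
```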

AD Approach Applied for Large-Scale Retail Sector
In order to further improve the performance of the model, the augmented data technique was used. This approach was already validated in the case of LSTM neural networks [43–45], where data were artificially created to feed a data-poor training dataset. In [44], the training dataset was increased from 768 records to 10,000 records. In the present case, the dataset consisted of 7,212,348 records (instances or rows) relating to 897 sampling days. Each of these records refers to the quantity, expressed in pieces or weight according to the type of article, sold of a single item in a store branch. For each individual article, the data are not sufficient to reach the desired accuracy, and for this reason augmented data were used. The dataset attributes (variables or columns) were 14 parameters, among them: the day of the week; the quantity sold (number of items); the store branch identifier; the unit amount; the measured quantity (kg); the product description; the number of receipts; and the shop department identifier. Each of these observations refers to transactions relating to a specific product, sold on a specific day in a specific branch, by customers belonging to a specific cluster. Table 1 summarizes the statistics of the data.

Figure 6 shows the structure of the records that make up the original dataset and the methodology used to create artificial data. The technique consists of generating a certain number of receipts for each product in a sequential manner for each day. The identification number of the first artificial record is equal to the number of the last record of the original dataset increased by 1. For this reason, a reordering of the data was carried out based on the receipt date and not on the identification number.
The number of receipts generated for a certain product on a given day is chosen randomly by setting an upper and a lower threshold. In a similar way to the technique used in [23], we used the random method random(a,b) to generate real numbers in the range (a,b). Furthermore, in generating the quantity sold and the number of receipts relating to a product, the average quantity sold for each item of that product was taken into account. Figure 7 shows the statistical distribution of the number of receipts; a logarithmic scale was used for the histogram of this dataset attribute.
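To make the augmented-data procedure concrete, the sketch below generates artificial receipt records for a single product over 30 days following the description above; the thresholds, the average quantity per receipt, the starting identifier, and the column names are illustrative assumptions rather than the project's actual values.

```python
import random
import pandas as pd

random.seed(1)

# Assumed statistics for the product (illustrative values)
avg_qty_per_receipt = 0.215              # average kg sold per receipt
min_receipts, max_receipts = 15, 30      # lower/upper thresholds of receipts per day

last_id = 7_212_348                      # assumed ID of the last original record
rows = []
for day in pd.date_range("2021-05-01", periods=30, freq="D"):
    n_receipts = random.randint(min_receipts, max_receipts)   # receipts generated for this day
    for _ in range(n_receipts):
        last_id += 1                     # artificial IDs continue from the last original record
        qty = random.uniform(0.8, 1.2) * avg_qty_per_receipt  # quantity around the observed average
        rows.append({"record_id": last_id, "date": day,
                     "plu": 3000055, "quantity_kg": round(qty, 3)})

augmented = pd.DataFrame(rows)
# The artificial rows are appended to the original data and the whole dataset
# is then re-ordered by receipt date rather than by record identifier.
print(augmented.head())
```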

Results and Discussion
The proposed methodological approach used in the development phase was validated and the functionalities implemented in the system were evaluated by comparing the results obtained from the prediction with the real results.
The test on the entire prototype system allowed the analysis of the results and the improvement of the prediction with the help of the XGBoost algorithm, whose behavior showed very good performance both in training and prediction times and in the percentage of success of the predictions. In fact, the algorithm made it possible to create customized models, allowing any company to predict which customers will enter on the chosen day with a success rate of 70% and to predict what customers will buy with a success rate of 35%. Figure 8 shows an example of the test result of the predictive algorithm for micro clusters. The figure also shows the dashboard that was implemented to allow the manager to set up some parameters useful for the prediction, such as the PLU (price look-up) code, training days, prediction days, etc. In this test, we predicted the quantities of the product with code 3000055, which corresponds to a type of bread. The red plot shows the quantities actually sold and the blue plot represents the quantities foreseen by the algorithm.

The model was trained on 80% of the data and validated on the remaining 20%. The method was tested only for the item with PLU equal to 3000055 in order to reduce the computational cost. This item appears on average in about 58 receipts, with an average of 215 g of product sold per receipt. Respecting this average, 672 receipts spread over 30 consecutive days were created and added to the existing data. Figure 9a,c shows the sales predictions of the item for two different periods of seven days each. The predictions were compared day by day with the real values recorded in these periods. The blue curve, which represents the predicted values, is very close to the red curve. As for the test, the prediction was made over a period of seven days following the period to which the data used for training and validation of the model refer; the model does not know what happened in those seven days, and the comparison is made only afterwards. Furthermore, Figure 9b,d shows the same predictions as Figure 9a,c, respectively, but this time performed with the AD technique. In this case, the blue prediction curve follows the real red curve very well. The improvement obtained with the use of the augmented data is confirmed by the RMSE value. Figure 10 makes this enhancement tangible; in fact, it shows the comparison of the RMSE between the cases of Figure 9a,b. In Figure 10, the blue curve refers to the case where training is performed using historical data, while the orange curve refers to the use of augmented data (AD). The AD technique improves the accuracy of the XGBoost algorithm by a factor of about 2 to 3.
Figure 9. Predictions obtained for a given product for two different time intervals lasting 7 days, compared with the respective cases obtained with the augmented data technique: from 29 May to 4 June with training using the original data (a) and the augmented data (b); from 2 June to 8 June using the original data (c) and the augmented data (d).

The results of Figure 10 show that the AD data provide a stable, low RMSE output compared with the oscillatory behavior obtained with the basic dataset. Table 2 reports the accuracy of the XGBoost model by comparing the values of RMSE and MSE between the original data and the augmented data for the test referred to in Figure 10. The RMSE value reported in the first row of Table 2 is calculated by averaging the seven values of Figure 10; each of these values refers to a different run in which the prediction was made for a given day of the week shown in Figure 10. The MSE values were calculated using the same runs. We computed the values shown in Table 2 as follows:

$$\overline{\mathrm{RMSE}} = \frac{1}{7}\sum_{i=1}^{7}\mathrm{RMSE}_i, \qquad \overline{\mathrm{MSE}} = \frac{1}{7}\sum_{i=1}^{7}\mathrm{MSE}_i,$$

where the subscript i refers to the i-th day of the week. When the model was trained by means of augmented data, the MSE improved by a factor of 10.
As in [60], we studied the accuracy of the model as the hyperparameters vary. The study confirmed that the optimal values are those obtained using GridSearchCV; however, in the present case there was not a significant increase in performance as the values of the hyperparameters varied. Table 3 shows the values of the main hyperparameters obtained automatically for the prediction shown in Figure 9d and a brief description of their meaning (for a more detailed description, see [41]). Table 3. Values of the main hyperparameters for the run of Figure 9d and their descriptions. Finally, Figure 11 shows the Mean Square Error (MSE) obtained for the prediction, for the same product analyzed in the previous figures, as a function of the number of training days. The error has a minimum in the vicinity of 45 days. When the training took place over a very long period, i.e., over 120 days, the error grew because the products suffer the effect of seasonality and because the habits of customers can change over a long period. Moreover, other studies using other algorithms focus on predictions of only a few days [61], confirming that over long periods there are many variables that cannot be controlled in the forecasting.

In order to fully validate the model, we carried out further tests on other inventory items from different departments. Department 1 is the largest, as it contains most of the food and drinks, so we chose various products from this department; in detail, the items chosen were a bottle of fresh milk and smoked scamorza.

The implemented XGBoost algorithm predicts the quantity that will be sold in the selected period, which varies between 1 and 7 days. For packaged products, the number of pieces sold is predicted; for other products, such as fresh products and delicatessen department products, the predicted quantities are expressed in kg.

In all the presented cases, the improvement obtained with the use of the augmented data technique is evident. Table 4 shows the RMSE values for the runs related to Figure 12. In the case of loose tomatoes and mozzarella, the quantities sold were very high, and therefore the RMSE values were quite high compared with the other cases. In order to provide a parameter independent of the quantity sold, the RMSEP value was also calculated. This parameter normalizes each i-th error to the corresponding real value:

$$\mathrm{RMSEP} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{y_i - \hat{y}_i}{y_i}\right)^2},$$

where N = 7, while $y_i$ and $\hat{y}_i$ are the i-th real and predicted values, respectively.

The computational cost of a single prediction run was on the order of one minute. Table 5 shows the execution times of the runs whose predictions are shown in Figures 9 and 12.
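The normalized error defined above can be computed with a few lines of NumPy, as in the illustrative sketch below (synthetic quantities over N = 7 days):

```python
import numpy as np

# Synthetic real and predicted quantities (kg) over N = 7 days
y_true = np.array([12.0, 15.5, 9.8, 14.2, 11.0, 16.3, 10.5])
y_pred = np.array([11.4, 16.0, 10.5, 13.8, 10.2, 15.1, 11.3])

# Each daily error is normalized to the corresponding real value before averaging
rmsep = np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2))
print(f"RMSEP = {rmsep:.3f}")
```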

Conclusions
In this paper, we discussed the implementation of a predictive model, based on the XGBoost algorithm, that was applied to forecasting sales in the large-scale retail sector. It is a multi-parameter model that allows for the consideration of various factors, such as, for example, the weather conditions, which are very important for fresh products. The use of many parameters makes the method very precise, but it requires an adequate amount of historical data to train the algorithm. To make up for the lack of data, the AD technique was used. This technique of enriching the training dataset has already been applied with excellent results to LSTM neural network models, but it was used here for the first time with an XGBoost algorithm. The method was tested in detail on one product, characterized by the 3000055 PLU code, with very promising results, and the approach will be tested on other products in future works. The test carried out for the prediction of the chosen product allowed us to validate the method through the calculation of the MSE and RMSE errors. The results reported in Table 2 confirm that for both errors the accuracy improved due to the use of the technique. The improvement in the case of the MSE is particularly advantageous; in fact, it decreased by an order of magnitude. Finally, the use of the XGBoost algorithm corroborated by the AD technique allowed us to implement a model that is sophisticated, as it is based on the use of many parameters, but also robust, because it is capable of enriching the training dataset and obtaining excellent accuracy measured in terms of MSE.

Data Availability Statement: Not applicable; the study does not report any data.

Acknowledgments:
The proposed work has been developed within the framework of the industry project titled "Piattaforma BigData/B.I. per il volantino dinamico profilato della clientela e per il supporto alle decisioni, basato su analisi predittive, analisi/utilizzo 'social', e su ricerche di mercato in ambito GDO. 'GDO-DSS Dynamic Intelligence'" [BigData/B.I. platform for the dynamic flyer profiled for customers and for decision support, based on predictive analyses, "social" analysis/usage, and on market research in the large-scale retail (GDO) sector. 'GDO-DSS Dynamic Intelligence'].

Conflicts of Interest:
The authors declare no conflicts of interest.