Optimization of Forecasting Performance in the Retail Sector Using Artificial Intelligence
Abstract
1. Introduction
2. Related Literature
3. Materials and Methods
3.1. Proposed Methodology
- Data preparation: cleans and prepares the input features.
- Feature engineering: extracts relevant temporal and product-related characteristics.
- Model training and evaluation: trains and evaluates all models using consistent criteria.
- Performance comparison: uses standard classification and regression metrics.
- Final model selection: is based on the experimental results.
- Linear Regression: a conventional regression method utilized as a reference point.
- Decision Tree: a rule-based model known for interpretability.
- Random Forest: an ensemble of decision trees that improves generalization.
- XGBoost: a gradient boosting model known for accuracy and performance.
- Prophet: a time-series forecasting model designed to capture seasonality and trend.

3.2. Dataset Description
- Over 100,000 records;
- Key variables: product code, warehouse ID, date, and order demand;
- CSV format, with missing and inconsistent values.
3.3. Data Preprocessing and Binarization
- Removing null or inconsistent entries;
- Transforming categorical variables through label encoding or one-hot encoding;
- Extracting time components such as day, month, and weekday from the date;
- Exploratory Data Analysis (EDA): The data reveal a clear rising trend, with a 30-day rolling average of 3,069,188 units and weekly seasonality averaging 2,733,319 units. Outliers were removed to improve model training.
- Low Demand: instances where Order_Demand ≤ 5248.1
- High Demand: instances where Order_Demand > 5248.1
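The date-component extraction and the binarization rule above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the record layout, the `Date` field name, the date format (`%Y/%m/%d`), and the helper names `time_features`, `demand_label`, and `preprocess` are assumptions. Only the 5248.1 threshold and the `Order_Demand` field name come from the text.

```python
from datetime import datetime

# Threshold separating the two classes, as stated in the text.
DEMAND_THRESHOLD = 5248.1


def time_features(date_str, fmt="%Y/%m/%d"):
    """Extract day, month, and weekday (Monday = 0) from a date string.

    The date format is an assumption; adjust `fmt` to match the actual file.
    """
    d = datetime.strptime(date_str, fmt)
    return {"day": d.day, "month": d.month, "weekday": d.weekday()}


def demand_label(order_demand, threshold=DEMAND_THRESHOLD):
    """Return 1 for High Demand (> threshold), 0 for Low Demand (<= threshold)."""
    return 1 if order_demand > threshold else 0


def preprocess(records):
    """Drop records with missing fields, then add time features and a class label."""
    cleaned = []
    for rec in records:
        if rec.get("Date") is None or rec.get("Order_Demand") is None:
            continue  # remove null entries
        row = dict(rec)
        row.update(time_features(rec["Date"]))
        row["High_Demand"] = demand_label(rec["Order_Demand"])
        cleaned.append(row)
    return cleaned


# Hypothetical sample records (not taken from the dataset).
sample = [
    {"Date": "2016/01/04", "Order_Demand": 7000},
    {"Date": "2016/01/05", "Order_Demand": 5248.1},
    {"Date": None, "Order_Demand": 100},
]
rows = preprocess(sample)
```

Note that a demand exactly equal to the threshold falls in the Low Demand class, matching the "≤" in the definition.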
3.4. Model Implementation
3.5. Experimental Setup
- Scikit-learn for preprocessing, evaluation metrics, and baseline models;
- XGBoost for gradient boosting;
- Prophet for additive time series forecasting;
- Keras/TensorFlow for implementing the LSTM model.
- Number of epochs (LSTM): 50
- Batch size: 32
- Train/test split: 80%/20%
- XGBoost: training time 4.12 s; memory use during training 6.71 MB; prediction time 0.09 s; peak memory usage during prediction 1.68 MB.
- LSTM: training time 20.60 s; maximum memory use during training 4.68 MB; prediction time 0.39 s.
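The split and the resource figures listed above could be obtained along the following lines. This is a sketch under assumptions: the text does not state whether the 80%/20% split was chronological or random, nor which tools measured time and memory. Here a chronological split and the standard-library `time.perf_counter` and `tracemalloc` are used, and `chronological_split` and `measure` are hypothetical helper names.

```python
import time
import tracemalloc


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into train/test parts without shuffling."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def measure(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds, peak traced memory in MB).

    tracemalloc reports memory allocated through Python's allocators, so
    memory allocated directly by native code may not be counted.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak_bytes / (1024 * 1024)


# Stand-in workload: in practice func would be a model's fit or predict call.
data = list(range(100))
train, test = chronological_split(data)
total, seconds, peak_mb = measure(sum, train)
```

Wrapping both the training and the prediction call in the same helper keeps the reported times and memory peaks comparable across models.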
4. Results
- Accuracy: The proportion of correct predictions (true positives and true negatives) among all evaluated examples.
- Precision (Pre): The proportion of predicted positives that are actually positive.
- Recall (Rec): The proportion of actual positives that the model correctly identifies.
- F1-Score: The harmonic mean of Precision and Recall, which balances the two.
- RMSE (Root Mean Square Error): The square root of the mean squared prediction error, reflecting the typical magnitude of errors in regression tasks.
- R2 Score (Coefficient of Determination): The proportion of variance in the actual values that the forecasts explain; values closer to 1 indicate a closer fit.
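For reference, the metrics above follow their standard definitions, sketched here in plain Python. The function names and the toy inputs are illustrative assumptions, not the authors' code. As a consistency check, the last lines recompute the F1-score from the precision and recall reported for the LSTM model (92.31% and 100.00%), which agrees with the reported 96.00% to two decimals.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Consistency check against the reported LSTM precision and recall.
lstm_f1 = f1_score(0.9231, 1.0)
lstm_f1_percent = round(100 * lstm_f1, 2)
```

A negative `r2_score` is possible and means the forecasts fit worse than always predicting the mean of the actual values.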
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Belattar, S.; Abdoun, O.; Haimoudi, E.K. New learning approach for unsupervised neural networks model with application to agriculture field. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 360–369.
- Dash, R. Retail Demand Forecasting Dataset. 2024. Available online: https://www.kaggle.com/datasets/rishavdash/retail-demand-forecasting-dataset (accessed on 9 May 2025).
- Belattar, S.; Abdoun, O.; Haimoudi, E.K. A novel strategy for improving the counter propagation artificial neural networks in classification tasks. J. Commun. Softw. Syst. 2022, 18, 17–27.
- Thivakaran, T.K.; Ramesh, M. Exploratory Data analysis and sales forecasting of bigmart dataset using supervised and ANN algorithms. Meas. Sens. 2022, 23, 100388.
- Falatouri, T.; Darbanian, F.; Brandtner, P.; Udokwu, C. Predictive Analytics for Demand Forecasting - A Comparison of SARIMA and LSTM in Retail SCM. Procedia Comput. Sci. 2022, 200, 993–1003.
- Mahin, M.P.R.; Shahriar, M.; Das, R.R.; Roy, A.; Reza, A.W. Enhancing Sustainable Supply Chain Forecasting Using Machine Learning for Sales Prediction. Procedia Comput. Sci. 2025, 252, 470–479.
- Riachy, C.; He, M.; Joneidy, S.; Qin, S.; Payne, T.; Boulton, G.; Occhipinti, A.; Angione, C. Enhancing deep learning for demand forecasting to address large data gaps. Expert Syst. Appl. 2025, 268, 126200.
- Darshan, S.M. Integrating Data Mining and Predictive Modeling Techniques for Enhanced Retail Optimization. arXiv 2024, arXiv:2409.19248.
- Chakri, P.; Pratap, S.; Lakshay; Gouda, S.K. Analyzing financial accounting data using machine learning. Data Anal. J. 2023, 7, 100212.
- Mafakheri, F.; Wang, C.; Seyedan, M. Order-up-to-level inventory optimization model using time-series demand forecasting with ensemble deep learning. Sustain. Comput. Inform. Syst. 2023, 3, 100024.
- ul Husna, A.; Amin, S.H.; Ghasempoor, A. Machine learning techniques and multi-objective programming to select the best suppliers and determine the orders. Mach. Learn. Appl. 2025, 19, 100623.
- Belattar, S.; Abdoun, O.; Haimoudi, E.K. Comparing machine learning and deep learning classifiers for enhancing agricultural productivity. Case study: Larache Province, Northern Morocco. Int. J. Electr. Comput. Eng. 2023, 13, 1689–1697.
- Belattar, S.; Abdoun, O.; Haimoudi, E.K. Performance analysis of the application of convolutional neural networks architectures in the agricultural diagnosis. Indones. J. Electr. Eng. Comput. Sci. 2022, 27, 156–162.

| Author(s) | Year | Studied Variables | Research Methods | Findings | Limitations |
|---|---|---|---|---|---|
| Thivakaran and Ramesh | 2022 | BigMart sales | ANN, Random Forest, XGBoost | Random Forest outperforms XGBoost | Data limited to BigMart |
| Falatouri | 2022 | Fresh product sales | SARIMA, LSTM | LSTM better for potatoes and tomatoes | SARIMA better for cucumbers and lettuce |
| Rahman Mahin | 2025 | Sales data | RF, KNN, Voting Regressor | R2 = 0.9997 | Complex hybrid model |
| Riachy | 2025 | Sales with COVID-19 restrictions | Deep Learning | Improved forecasting accuracy | Pandemic-specific data |
| Sri M Darshan | 2024 | Sales data | Association Rules, Prophet | Pattern mining + time-series | Method complexity |
| Potta Chakri | 2023 | Financial ratios, revenue | EDA, LR, KNN, SVR, Decision Tree | Decision Tree (depth=9) best | No time-series or risk analysis |
| Asma ul Husna | 2025 | Demand, costs, customs, delivery | RF, GBR, LSTM, ARIMA, Fuzzy logic | Supplier optimization model | No real-time test |
| Seyedan Mahya | 2023 | Retail demand, inventory | LSTM, CNN, Bayesian Averaging | Better inventory forecasting | No real-time adaptability |

| Metric | Linear Regression | XGBoost | Random Forest | Decision Tree | LSTM | Prophet |
|---|---|---|---|---|---|---|
| Accuracy (%) | 60.67 | 83.21 | 91.99 | 93.05 | 92.31 | 85.71 |
| Precision (%) | 25.46 | 45.70 | 66.86 | 75.76 | 92.31 | 92.31 |
| Recall (%) | 89.75 | 87.84 | 88.08 | 76.13 | 100.00 | 92.31 |
| F1-score (%) | 39.67 | 60.12 | 76.02 | 75.94 | 96.00 | 92.31 |
| RMSE | 28,803.57 | 22,469.02 | 21,527.80 | 25,317.23 | - | - |
| R2 Score | 0.0226 | 0.4052 | 0.4540 | 0.2449 | - | - |
| Classes | High Demand, Low Demand | | | | | |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jatte, H.; Belattar, S.; Haimoudi, E.K. Optimization of Forecasting Performance in the Retail Sector Using Artificial Intelligence. Eng. Proc. 2025, 112, 37. https://doi.org/10.3390/engproc2025112037

