Systematic Mapping Study of Sales Forecasting: Methods, Trends, and Future Directions

Ahaggach, Hamid; Abrouk, Lylia; Lebon, Eric

doi:10.3390/forecast6030028

by Hamid Ahaggach ¹,²,*, Lylia Abrouk ¹,³ and Eric Lebon ²

¹ LIB Laboratory, University of Burgundy, 21000 Dijon, France
² Syartec, 13290 Aix-en-Provence, France
³ MISTEA, INRAE & Institut Agro, University of Montpellier, 34000 Montpellier, France
* Author to whom correspondence should be addressed.

Forecasting 2024, 6(3), 502-532; https://doi.org/10.3390/forecast6030028

Submission received: 2 May 2024 / Revised: 23 June 2024 / Accepted: 1 July 2024 / Published: 5 July 2024

(This article belongs to the Section Forecasting in Economics and Management)

Abstract

In a dynamic business environment, the accuracy of sales forecasts plays a pivotal role in strategic decision making and resource allocation. This article offers a systematic review of the existing literature on techniques and methodologies used in forecasting, especially in sales forecasting across various domains, aiming to provide a nuanced understanding of the field. Our study examines the literature from 2013 to 2023, identifying key techniques and their evolution over time. The methodology involves a detailed analysis of 516 articles, categorized into classical qualitative approaches, traditional statistical methods, machine learning models, deep learning techniques, and hybrid approaches. The results highlight a significant shift towards advanced methods, with machine learning and deep learning techniques experiencing an explosive increase in adoption. The popularity of these models has surged, as evidenced by a rise from 10 articles in 2013 to over 110 by 2023. This growth underscores their increasing prominence and effectiveness in handling complex time series data. Additionally, we explore the challenges and limitations that influence forecasting accuracy, focusing on complex market structures and the benefits of extensive data availability.

Keywords: sales forecasting; predictive analytics; machine learning; time series analysis; regression analysis; artificial intelligence

1. Introduction

Sales forecasting is a pivotal concept in the economic sphere, essential for strategic planning in modern industries. This study explores the evolving methodologies, technologies, and trends that shape the landscape of sales forecasting, which has seen significant transformations due to technological advancements, market dynamics, and shifts in consumer behavior [1,2,3].

Before the widespread adoption of heuristic and basic statistical methods, sales forecasting was often a rudimentary process embedded in the general experience and intuition of business owners and managers. Decisions were predominantly made based on personal observations and simple extrapolations of past sales performances without any formal methodological support. This informal approach was largely unstructured and subjective, heavily reliant on individual judgment and local market knowledge.

As businesses expanded and markets became more dynamic, these primitive methods proved inadequate. This led to the development and adoption of heuristic methods, which provided a more structured yet still largely intuitive approach to forecasting. Heuristics involved rules of thumb and were slightly more systematic, drawing on accumulated business experience and historical data patterns [4]. Alongside them, basic statistical methods began to emerge which utilized simple mathematical models to predict future sales based on historical data, marking the initial steps towards more scientific approaches in forecasting.

However, the modern era has witnessed a paradigm shift to sophisticated algorithmic and data-driven approaches, drastically changing the field’s dynamics. The limitations of heuristic and simple statistical methods became apparent as markets grew in complexity and data availability increased. This shift was driven by the need for more accurate forecasting, which is crucial for effective inventory management, financial planning, and strategic decision making. Inaccuracies in forecasting can lead to substantial financial losses, missed opportunities, and operational inefficiencies. As such, businesses are increasingly leveraging advanced methodologies, including modern statistical models and machine learning techniques, to enhance forecast accuracy [5,6].

Despite these advancements, significant gaps remain in our understanding and practical application of these methods, particularly in how they can be integrated and optimized across various market conditions. Our research is motivated by the need to systematically review and synthesize the breadth of existing works, identify key trends and gaps, and propose a comprehensive framework that can guide both current research and future practice in sales forecasting. This need becomes critical, as many previous literature reviews focus narrowly on specific sectors or a single technological approach, which may limit their applicability and relevance across broader sales forecasting contexts.

This comprehensive study delves into the intricate world of sales forecasting, a critical component of contemporary business strategy and decision-making. Our research meticulously examines the titles and abstracts of 1007 papers, which represent the cutting-edge developments over the past decade, from January 2013 to March 2024, in this field. Our objective is to construct a well-defined taxonomy of the technologies employed in sales forecasting, scrutinize the variety of publication venues, elucidate the prevalent terminology, and ultimately decode the key factors that drive the concept of sales forecasting in the context of predictive business analytics.

To achieve this, the systematic mapping study (SMS) method [7] was employed, a rigorous approach that enables a comprehensive and structured exploration of the existing literature. This method facilitates the identification of trends, gaps, and clusters in research, allowing us to map out the field of sales forecasting effectively.

This paper makes substantial contributions to the field of sales forecasting by offering a detailed analysis of technological advancements across various domains and identifying critical research gaps. Our systematic approach not only tracks the evolution of forecasting methodologies from traditional statistical techniques to modern machine learning and deep learning applications but also proposes future research directions to fill identified gaps. By bridging the gap between theoretical research and practical application, this study provides actionable insights for both academics and practitioners, enhancing their understanding of the field. The comprehensive review and the recommendations presented here aim to inspire ongoing innovation and underscore the growing significance of integrating real-time data and advanced analytics into sales forecasting practices.

The rest of the paper is organized as follows: The research methodology is detailed in Section 2, outlining the systematic steps taken to curate our study. Section 3 presents an overview of the existing sales forecasting approaches using a comparison framework and the findings from our analysis, presenting a distilled synthesis of the extensive data we have gathered. This section reveals insights into the technological underpinnings, thematic concentrations, and the evolutionary trajectory of sales forecasting research. Finally, Section 4 concludes the paper and bridges the current state of knowledge with prospective exploratory avenues, setting the stage for subsequent inquiries and advancements in the field.

2. Research Approach

The research approach for this systematic mapping review is designed to comprehensively map out the existing literature on sales forecasting, focusing particularly on its evolution and the methodologies employed. A systematic mapping study is used as the methodology to investigate the field of sales forecasting. SMSs resemble other systematic reviews, such as Systematic Literature Reviews (SLRs) [8], which synthesize existing research in established fields, yet they differ in that they use broader inclusion criteria to select a more diverse array of research papers.

Their aim is to classify topics within a field, as opposed to synthesizing the results of studies. Our study encompasses the extant body of work within sales forecasting. The procedure depicted in Figure 1 comprises six distinct steps, each yielding its own specific outcomes.

Formulating research questions with broad search criteria, such as keywords, language, and publication type. This foundational step shapes the direction of the review and sets the boundaries for inclusion.
Conducting searches for primary studies in various databases including Scopus, Elsevier, Springer, and IEEE Xplore. This step ensures the comprehensiveness of the literature collection and the breadth of the research coverage.
Screening of papers, which involves a meticulous review to ascertain each paper’s relevance based on the predefined inclusion and exclusion criteria.
Keywording of abstracts: here, the aim is to identify and catalog key terms and concepts from the abstracts, which aids in categorizing the papers and discerning thematic trends.
Data extraction: this step extracts pertinent data from the selected papers, ensuring that all relevant information is captured for analysis.
Mapping of studies: this step is the culmination of the process, in which the extracted data are analyzed to identify trends and gaps in the existing research and to chart the landscape of sales forecasting studies.

Our systematic approach, outlined above, enabled us to construct a detailed overview of the sales forecasting field, providing a platform for future research initiatives to build upon. The next sections detail each step of the SMS.

2.1. Definition of Research Questions

In crafting our systematic mapping study, the initial step was to precisely define the research questions. The questions were designed to be both inclusive and specific, guiding the subsequent search and ensuring that the study remained focused on the pertinent issues within the field of sales forecasting.

RQ₁: What is the annual number of studies on sales forecasting?
RQ₂: In what venues are research papers on sales forecasting published?
RQ₃: What is the specific terminology used in sales forecasting?
RQ₄: What datasets are used to evaluate the proposed approaches for sales forecasting?
RQ₅: What performance metrics are used in the sales forecasting literature?
RQ₆: What limitations do the proposed solutions for sales forecasting have?
RQ₇: What are the methods and the technologies used in sales forecasting?
RQ₈: How have those techniques evolved over time?
RQ₉: How do sales forecasting models differ across various industries?
RQ₁₀: How are real-time sales forecasting models implemented, and what impact do they have on revenue?

Section 3 provides detailed responses to each of these research questions.

2.2. Conducting Searches for Primary Studies

The primary objective of this phase is to compile a comprehensive collection of scientific articles relevant to our research questions. The searches were conducted in April 2023, using renowned databases such as Scopus, Elsevier, Springer, and IEEE Xplore to acquire pertinent literature. Our search strategy was guided by a carefully crafted query string, designed to capture the core aspects of our research focus within the expansive subject area. This study employed a search query that amalgamated research themes and questions with essential keywords, using boolean operators. The initial query, (“sales” OR “revenue”) AND (“prediction” OR “forecasting” OR “estimating” OR “recommendation”), yielded more than 325,000 publications.
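As an illustration of how such a boolean search string can be assembled from keyword groups, the following minimal Python sketch joins the terms within each group with OR and the groups with AND. The helper name `build_query` is hypothetical, and the exact query syntax accepted by each database (Scopus, IEEE Xplore, and so on) may differ from this generic form.

```python
def build_query(term_groups):
    """Join terms inside a group with OR and the groups themselves with AND."""
    clauses = []
    for group in term_groups:
        quoted_terms = " OR ".join(f'"{term}"' for term in group)
        clauses.append(f"({quoted_terms})")
    return " AND ".join(clauses)


# Keyword groups taken from the initial query reported in the text.
subject_terms = ["sales", "revenue"]
task_terms = ["prediction", "forecasting", "estimating", "recommendation"]

query = build_query([subject_terms, task_terms])
print(query)
```

Keeping the keyword groups as separate lists makes it straightforward to refine the string later, for example by dropping a term that returns too many irrelevant results.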

2.3. Examination of Papers

In this phase, two steps were performed: the screening of papers and the analysis of keywords in abstracts to further refine our search. We implemented filters, including publication date, language, and publication type (refer to Table 1), to enhance our search string (refer to Table 2) and to focus on the most pertinent studies. This rigorous process allowed us to significantly narrow down our results to 3801 articles, ensuring that our review was both thorough and concentrated on the most relevant and up-to-date research in the field of sales forecasting.

Based on the abstracts, we excluded 554 sources that were not relevant to sales forecasting, obtaining the 516 papers presented in this study.

2.4. Data Extraction

The data extraction phase is a crucial component of the SMS, serving as the foundation for synthesizing and analyzing the vast amounts of articles collected from the reviewed literature. The primary objective during this stage is to methodically extract pertinent information that directly contributes to answering the predefined research questions. This process involves identifying relevant data points from each selected paper.

To facilitate a structured and efficient extraction process, a data extraction form is meticulously designed and utilized. This form is tailored to capture essential attributes and metrics from each study, which are critical for addressing the research questions and objectives of the SMS. The attributes include, but are not limited to, the study title, publication details, type of study, key findings, and any identified limitations. Additionally, information regarding the journal or conference name, performance measures, models, technology used, the main objective of the study, experimental performance, and the domain area are collected. These attributes are chosen based on their relevance to the research questions. Table 3 provides the data extraction form used in this SMS.
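One possible way to hold the attributes listed above in a uniform structure is sketched below in Python. The class name `ExtractionRecord`, the field names, and the example values are illustrative assumptions only; they are not taken from Table 3, and the record shown is a placeholder rather than one of the reviewed studies.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ExtractionRecord:
    """One row of a data extraction form (field names are hypothetical)."""
    title: str
    venue: str                 # journal or conference name
    publication_year: int
    study_type: str
    domain: str
    main_objective: str
    models: List[str] = field(default_factory=list)
    technologies: List[str] = field(default_factory=list)
    performance_measures: List[str] = field(default_factory=list)
    experimental_performance: str = ""
    key_findings: str = ""
    limitations: str = ""


# Placeholder record, used only to show how the form would be filled in.
record = ExtractionRecord(
    title="Example study (placeholder)",
    venue="Example Journal (placeholder)",
    publication_year=2020,
    study_type="journal article",
    domain="retail",
    main_objective="weekly sales forecasting",
    models=["gradient boosting"],
    performance_measures=["MAE", "RMSE"],
)

# A dictionary view is convenient for exporting records to a table.
row = asdict(record)
print(sorted(row.keys()))
```

Recording every study with the same fields supports the consistent data collection that the extraction form is meant to guarantee.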

2.5. Addressing Validity Threats

In qualitative research, the integrity of the study is often challenged by various validity threats. These threats, if not properly managed, can undermine the reliability and generalizability of the research findings. Recognizing this, we embarked on a meticulous analysis of potential biases and validity threats that could influence the outcomes of our investigation. To fortify our study against such vulnerabilities, we introduce a series of strategic measures aimed at mitigating these risks. Drawing upon the insights provided by Pinciroli et al. [9], we detail these interventions as follows:

Descriptive Validity: This pertains to the factual accuracy of the reported data. To safeguard against inaccuracies, we standardized the terminologies and criteria across the study. Furthermore, a comprehensive data extraction template was deployed to ensure consistent and precise recording of information, thereby enhancing the reliability of our data collection process.
Theoretical Validity: This concerns the study’s capacity to accurately capture the concepts it aims to investigate. To enhance our theoretical grasp, we meticulously crafted a search strategy, employing both automatic and manual search techniques across esteemed digital libraries in computer science and software engineering. Additionally, by defining clear inclusion and exclusion criteria, we minimized the risk of omitting relevant literature, thereby strengthening our theoretical foundation.
Generalization Validity: This aspect examines the study’s potential for broader applicability beyond the immediate research context. By formulating general yet incisive research questions, we paved the way for findings that not only shed light on specific instances of sales forecasting applications but also offer insights with wider relevance, enhancing the study’s overall external validity.
Evaluative Validity: This facet evaluates the logical soundness of the study’s conclusions. To uphold the integrity of our evaluative processes, the analysis was conducted independently by multiple researchers, with overlapping responsibilities to identify and reconcile any discrepancies. This collaborative yet independent review process ensured that our conclusions were not only grounded in the data but also subjected to rigorous scrutiny.
Transparency Validity: The reproducibility of a study is paramount to its credibility. We documented our research methodology with meticulous detail, providing a clear and comprehensive guide that allows for the replication of our study. This commitment to transparency not only validates our research process but also contributes to the body of knowledge by enabling subsequent scholars to build upon our work with confidence.

2.6. Data Synthesis

Data synthesis aims to integrate and analyze information from selected studies, addressing the research questions effectively. In this process, we employ a multifaceted approach to data analysis, leveraging a combination of narrative synthesis and quantitative visualization techniques. This includes the utilization of descriptive tables that summarize key findings and trends, as well as various graphical tools. Specifically, we make use of bar charts, pie charts, and line graphs to visually represent data distributions, patterns, and correlations identified during our analysis. These methods collectively enhance our ability to interpret the compiled data comprehensively, facilitating a deeper understanding of the study’s overarching themes and findings.

3. Results of SMS on Sales Forecasting

In this section, the findings of this SMS will be presented and discussed for each research question (RQ).

3.1. Studies on Sales Forecasting

A detailed review of the literature focusing on sales forecasting underscores a burgeoning interest in the field. Data analysis from 2013 to 2023 highlights a remarkable, sustained growth in research efforts, especially evident from 2016, as illustrated in Figure 2.

This analysis covers a decade of research with a total of 516 articles. The distribution of these articles over the years shows initial modest engagement, with only 10 studies identified in 2013. However, the ensuing years witnessed a substantial increase in scholarly output, especially after 2016, reaching a peak in recent years. This indicates that sales forecasting has become a trending topic within the academic community. The data for 2024 were excluded from this analysis to avoid misinterpretation due to the incomplete nature of the year’s data. This ensures that the trends depicted are based on complete and accurate data, providing a clearer and more reliable representation of the research trends in sales forecasting.

3.2. Venues Publishing Sales Forecasting Research

Research on sales forecasting is disseminated across a variety of venues, reflecting the interdisciplinary nature of the field that spans business, economics, statistics, and computer science. As presented in Figure 3, the distribution of the top 10 publication venues provides insights into the focal points and trends within the field. For instance, a significant portion of the research, constituting 11.9% (62 papers), is published in the International Journal of Forecasting, indicating its prominence as a leading journal in the forecasting community. Expert Systems with Applications accounts for 9.2% of the papers (47 papers), which points to the integration of artificial intelligence in sales forecasting methodologies. Moreover, “Others”, representing 42.2% of the distribution, comprises a multitude of journals that cater to niche aspects of sales forecasting, suggesting a diverse body of research.

Concerning publication types, the papers analyzed in this study are distributed as follows: 87% in journals, 11% in conference proceedings, and 2% in books, highlighting the predominant role of journal publications in disseminating research findings.

3.3. Terminology Used in Sales Forecasting

We created two keyword clouds: one based on the titles of the selected publications (Figure 4) and the second one based on their abstracts (Figure 5). The size of the word in the cloud indicates its frequency in the dataset. Here, we can observe that several terms are central to the discussion on sales forecasting within the selected publications.

In the keyword cloud derived from publication titles (Figure 4), prominent terms include “forecasting”, “sales”, “demand”, “prediction”, “time series”, “model”, and “machine learning”. These terms suggest a strong focus on predictive modeling and the use of machine learning techniques to anticipate sales and demand.

On the other hand, the keyword cloud from the abstracts (Figure 5) highlights not only “forecasting” and “sales” but also “data”, “algorithm”, “method”, “product”, and “market”. The presence of “neural network”, “deep learning”, “time series”, and “regression” indicates the application of statistical and artificial intelligence methods to sales forecasting. Additionally, the term “feature” points to the importance of feature engineering in improving the accuracy of predictive models.

Based on the study of the keywords, we can establish that both clouds indicate a multidisciplinary approach that leverages statistical methods, machine learning algorithms, and time series analysis to address the dynamic and complex nature of sales forecasting. The emphasis on “market”, “product”, and “demand” underscores the practical, real-world application of these techniques in various industries, as suggested by terms such as “retail”, “electric vehicle”, “agricultural”, and “supply chain”.
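The term frequencies that drive the word sizes in a keyword cloud can be computed with a simple counting procedure. The Python sketch below is a minimal illustration under stated assumptions: the three titles are invented examples, the stop-word list is a small hypothetical one, and only single words are counted, so multi-word terms such as “time series” would require an additional n-gram step.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "in", "and", "with", "using", "on"}


def term_frequencies(texts):
    """Count lowercase alphabetic tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(token for token in tokens if token not in STOPWORDS)
    return counts


# Invented titles, used only to demonstrate the counting step.
titles = [
    "Sales forecasting with machine learning",
    "Demand forecasting using time series models",
    "A deep learning model for retail sales prediction",
]

frequencies = term_frequencies(titles)
print(frequencies["forecasting"], frequencies["sales"], frequencies["learning"])  # 2 2 2
```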

3.4. Datasets Used in Sales Forecasting

The evaluation of sales forecasting approaches is essential for understanding their effectiveness, scalability, and applicability to real-world scenarios. The datasets used in these evaluations vary widely depending on the domain of application, the granularity of data required, and the specific goals of the forecasting model. Although many of these datasets remain private and thus inaccessible, there are notable public datasets that serve as standard benchmarks (Figure 6). However, due to the specialized and confidential nature of sales forecasting, it is rare for studies to use the same dataset more than once, underscoring the variability and complexity of the challenges faced in this field.

In this SMS, we categorize these datasets into three main types based on their availability: public datasets, which are open and freely accessible, private datasets, which are not publicly available, and hybrid datasets. It is observed that some research efforts use a mix of both public and private datasets or combine multiple public datasets to conduct their experiments.

The datasets employed in the field of sales forecasting are highly varied, tailored to the specific needs of the forecasting task at hand, whether it involves analyzing historical sales data, inventory levels, or economic indicators. Table 4 gives an overview of the percentage of datasets used in the articles reviewed in this study. From the table, it is clear that time series data are the most used in sales forecasting studies. This prevalence is understandable given that sales are inherently tied to time; these types of data are crucial for time series analysis. Textual data are also employed, as these forms of data consist of natural language or unstructured text, making them valuable for sentiment analysis to gauge customer sentiment and predict sales trends. Tabular data are another key type used in sales forecasting. Structured in nature, they can encapsulate product characteristics that are essential for predicting sales outcomes.

Furthermore, there are studies that combine multiple data types, such as product characteristics, sales data, inventory levels, and economic indicators, to forecast sales. These works often employ multiple models to enhance prediction accuracy. Image data and network data are less commonly used in this context. Regarding hybrid data (combining public and private datasets), the researchers in some instances use the same types of data across both datasets, whereas in other cases, they integrate various data types, such as merging sales history with product features. These hybrid strategies showcase the adaptability and intricacy of data analysis techniques in modern sales forecasting research.

3.5. Performance Metrics Used in Sales Forecasting

This section aims to explore and elaborate on the various performance metrics prevalent in the realm of sales forecasting research. The assessment of sales forecasting models is of paramount importance, with the selection of metrics being largely contingent on the type of forecasting task at hand, such as classification, regression, or time series analysis, as well as the methodologies employed, like genetic algorithm models.

In the following, we provide a detailed description of the most frequently used metrics, as illustrated in Figure 7.

At the forefront is the Mean Absolute Error (MAE), which stands out for its widespread use in 113 studies, highlighting its leading role as an indicator of forecast precision. Next in line are the Root Mean Squared Error (RMSE) and the Mean Squared Error (MSE), applied in 89 and 70 studies, respectively. These indicators are especially valuable for regression-oriented forecasting models, emphasizing the importance of quantifying the differences between forecasted and actual values of sales.

The metrics used in evaluating sales forecasting models can be broadly grouped into four categories. Each category is tailored to assess different aspects of a model’s performance, depending on the nature of the prediction and task:

Time series and regression evaluation metrics: These metrics are essential when the output is numerical, such as the number of sales next month or the number of days of aging. They help in quantifying the accuracy of predictions in a continuous space.
Classification evaluation metrics: This category is crucial when the objective is to evaluate the prediction of classes or categories, such as predicting whether sales will increase, decrease, or remain stable, or assigning a sentiment label in sentiment analysis.
Clustering evaluation metrics: These metrics are used in sales forecasting when the goal is to evaluate models that group similar data points together without predefined labels. This can be useful in market segmentation and understanding customer behaviors.
Statistical model evaluation metrics: These metrics assess the statistical robustness and validity of forecasting models, ensuring that predictions are not only accurate but also statistically significant.

In the following sections, we detail each metric category, starting with time series and regression evaluation metrics.

3.5.1. Time Series and Regression Evaluation Metrics

Mean Absolute Error (MAE)

This is a metric commonly used in sales forecasting, particularly in time series analysis and regression models. It measures the average magnitude of errors in a set of predictions, without considering their direction. It is the average over the test sample of the absolute differences between predictions and actual observations. The definition of MAE is as follows:

MAE = \frac{1}{n} \sum_{i = 1}^{n} | Y_{i} - {\hat{Y}}_{i} | \qquad (1)

where Y_{i} represents the actual sales value of the i-th observation, {\hat{Y}}_{i} is the predicted value for the i-th observation, and n is the number of observations. The absolute value indicates that we take into account the magnitude of the errors regardless of their direction.
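To make Equation (1) concrete, the following minimal Python sketch computes the MAE for a short series. The function name and the sales figures are hypothetical and serve only as a worked example.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted))
    return total / len(actual)


# Hypothetical monthly sales (units) and the corresponding forecasts.
actual_sales = [100, 150, 200, 250]
predicted_sales = [110, 140, 195, 260]

# The absolute errors are 10, 10, 5 and 10, so the MAE is 35 / 4.
print(mean_absolute_error(actual_sales, predicted_sales))  # 8.75
```

The result is expressed in the same unit as the sales values, so here the forecasts are off by 8.75 units on average.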

Mean Squared Error (MSE)

MSE is another popular metric used in evaluating the performance of sales forecasting models, especially those based on time series and regression analysis. Unlike MAE, MSE penalizes larger errors more heavily, as it squares the differences between actual and predicted values before averaging them. This makes MSE more sensitive to outliers and extreme values. The definition of MSE is as follows:

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(Y_{i} - {\hat{Y}}_{i})}^{2} \qquad (2)

where Y_{i} represents the actual sales value of the i-th observation, {\hat{Y}}_{i} is the predicted value for the i-th observation, and n is the number of observations. By squaring the differences, the impact of larger errors on the overall metric is magnified.
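The sensitivity of MSE to large errors can be seen by comparing two forecasts that have the same MAE. The Python sketch below uses invented numbers: one forecast is off by 10 units everywhere, the other is exact except for a single miss of 40 units.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


# Invented example values.
actual = [100, 100, 100, 100]
steady_forecast = [90, 110, 90, 110]    # four errors of 10
spiky_forecast = [100, 100, 100, 140]   # a single error of 40

# Both forecasts share the same MAE ...
print(mean_absolute_error(actual, steady_forecast))  # 10.0
print(mean_absolute_error(actual, spiky_forecast))   # 10.0

# ... but the single large miss quadruples the MSE.
print(mean_squared_error(actual, steady_forecast))   # 100.0
print(mean_squared_error(actual, spiky_forecast))    # 400.0
```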

Mean Absolute Scaled Error (MASE)

MASE is a metric that extends the idea of Mean Absolute Error (MAE) by scaling the errors based on the in-sample Mean Absolute Error of a naive forecast method. This scaling makes MASE a more interpretable metric, as it allows for comparisons across different datasets and forecasting methods. The definition of MASE is as follows:

MASE = \frac{\frac{1}{n} \sum_{i = 1}^{n} | Y_{i} - {\hat{Y}}_{i} |}{\frac{1}{T - 1} \sum_{t = 2}^{T} | Y_{t} - Y_{t - 1} |} \qquad (3)

where Y_{i} represents the actual sales value of the i-th observation, {\hat{Y}}_{i} is the predicted value for the i-th observation, n is the number of observations in the test set, T is the number of observations in the training set, and Y_{t} and Y_{t - 1} are consecutive actual sales values in the training set. The sum in the denominator starts at t = 2 because the naive forecast for the first training observation has no preceding value.
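A minimal Python sketch of MASE follows. It assumes that the naive one-step errors in the denominator are averaged over the T − 1 consecutive pairs available in the training set, which is the usual convention; the function name and the data are hypothetical.

```python
def mase(train, actual, predicted):
    """Test-set MAE scaled by the in-sample MAE of the naive one-step forecast."""
    if len(train) < 2:
        raise ValueError("the training set needs at least two observations")
    # Naive forecast: each training value is predicted by the previous one.
    naive_errors = [abs(train[t] - train[t - 1]) for t in range(1, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("constant training series: MASE is undefined")
    forecast_mae = sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)
    return forecast_mae / scale


# Invented example values.
train_sales = [10, 12, 14, 16]   # naive in-sample errors: 2, 2, 2
test_sales = [18, 20]
forecast = [17, 22]              # test errors: 1, 2

print(mase(train_sales, test_sales, forecast))  # 0.75
```

A value below 1 indicates that the forecast errors are, on average, smaller than those of the naive method on the training data.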

Root Mean Squared Error (RMSE)

RMSE is closely related to MSE and is often used alongside it. RMSE is simply the square root of the MSE, which brings the error values back to the original scale of the data. This makes RMSE more interpretable than MSE, as the error values are in the same units as the original data. The definition of RMSE is as follows:

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(Y_{i} - {\hat{Y}}_{i})}^{2}} \qquad (4)

where Y_{i}, {\hat{Y}}_{i}, and n are defined as before. By taking the square root, RMSE provides a more intuitive understanding of the average error magnitude in the model’s predictions.
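The relationship between MSE and RMSE can be checked on a small invented example, as in the Python sketch below; the function names are illustrative.

```python
import math


def mean_squared_error(actual, predicted):
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the unit of the data."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Invented example values: the errors are 6, 8, 0 and 0.
actual = [100, 200, 300, 400]
predicted = [106, 192, 300, 400]

print(mean_squared_error(actual, predicted))       # 25.0 (squared units)
print(root_mean_squared_error(actual, predicted))  # 5.0 (original units)
```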

Mean Absolute Percentage Error (MAPE)

This is a widely used metric in sales forecasting that expresses the accuracy of a forecasting model as a percentage. It measures the average absolute difference between actual and predicted values as a proportion of the actual values. The definition of MAPE is as follows:

MAPE = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{Y_{i} - {\hat{Y}}_{i}}{Y_{i}} \right| \times 100\% \qquad (5)

where Y_{i} represents the actual sales value of the i-th observation, {\hat{Y}}_{i} is the predicted value for the i-th observation, and n is the number of observations.
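The following Python sketch illustrates Equation (5) on invented values. Because each error is divided by the actual value, the metric is undefined when an actual value is zero; raising an error in that case is a design choice of this sketch, not something prescribed by the reviewed studies.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((y - p) / y) for y, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios) * 100


# Invented example values: relative errors of 10%, 10% and 0%.
actual = [100, 200, 400]
predicted = [110, 180, 400]

print(f"{mape(actual, predicted):.2f}%")  # 6.67%
```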

Symmetric Mean Absolute Percentage Error (sMAPE)

This is a variant of MAPE that addresses the issue of infinite or undefined values when actual values are close to zero. sMAPE also penalizes both over- and under-forecasting equally. The definition of sMAPE is as follows:

sMAPE = \frac{1}{n} \sum_{i = 1}^{n} \frac{| Y_{i} - {\hat{Y}}_{i} |}{(| Y_{i} | + | {\hat{Y}}_{i} |) / 2} \times 100\% \qquad (6)

where Y_{i} and {\hat{Y}}_{i} are defined as before.
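A short Python sketch of Equation (6) is given below, using invented values that include a zero actual value to show that the metric remains defined in that case. Treating a pair in which both the actual and the predicted value are zero as contributing zero error is an assumption of this sketch.

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent."""
    total = 0.0
    for y, p in zip(actual, predicted):
        denominator = (abs(y) + abs(p)) / 2
        if denominator == 0:
            continue  # assumption: 0 forecast for 0 actual counts as no error
        total += abs(y - p) / denominator
    return total / len(actual) * 100


# Invented example values; the second actual value is zero.
actual = [100, 0]
predicted = [120, 10]

print(f"{smape(actual, predicted):.2f}%")  # 109.09%
```

With this definition, each term is bounded by 200%, which is reached whenever exactly one of the two values is zero.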

R² (Coefficient of Determination)

This is a metric that indicates the proportion of the variance in the dependent variable (actual sales) that is explained by the independent variables (features) in a regression model. R² typically ranges between 0 and 1, with higher values indicating a better fit of the model to the data; it can become negative when the model fits the data worse than simply predicting the mean of the actual values. The definition of R² is as follows:

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(Y_{i} - {\hat{Y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(Y_{i} - \bar{Y})}^{2}} \qquad (7)

where Y_{i} and {\hat{Y}}_{i} are defined as before and \bar{Y} is the mean of the actual sales values.
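Equation (7) can be evaluated directly from the residual and total sums of squares, as in the minimal Python sketch below. The data are invented; the second forecast is deliberately poor to show that the formula can return a value below zero when predictions are worse than the mean of the actual values.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant actual series")
    return 1 - ss_res / ss_tot


# Invented example values.
actual = [1, 2, 3, 4, 5]
good_forecast = [1.1, 1.9, 3.2, 3.8, 5.0]
poor_forecast = [5, 4, 3, 2, 1]   # reversed trend

print(f"{r_squared(actual, good_forecast):.2f}")  # 0.99
print(r_squared(actual, poor_forecast))           # -3.0
```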

Squared Loss (SQL)

This is a metric that measures the difference between the actual and predicted values of a single observation by squaring the error. It is the per-observation term that Mean Squared Error (MSE) averages over all observations. The definition of SQL is as follows:

SQL = {(Y_{i} - {\hat{Y}}_{i})}^{2} \qquad (8)

where Y_{i} and {\hat{Y}}_{i} are defined as before.

Sum of Squared Errors (SSE)

This is a metric that measures the sum of the squared differences between actual and predicted values. It is used to assess the overall quality of a forecasting model. The definition of the SSE is as follows:

SSE = \sum_{i = 1}^{n} {(Y_{i} - {\hat{Y}}_{i})}^{2}

(9)

where Y_{i} and {\hat{Y}}_{i} are defined as before.

Mean Absolute Deviation (MAD)

This is a metric that summarizes the typical absolute difference between actual and predicted values. It is similar to Mean Absolute Error (MAE), but, as defined here, it takes the median of the absolute errors instead of their mean, which makes it less sensitive to a few very large errors. The definition of MAD is as follows:

MAD = {median}_{i = 1}^{n} | Y_{i} - {\hat{Y}}_{i} |

(10)

where Y_{i} and {\hat{Y}}_{i} are defined as before.
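Equations (8) to (10) can be sketched together in a few lines of Python. The function names and the sample data are hypothetical. The last forecast is deliberately far off to show how a single outlier dominates SSE while leaving the median-based measure of Equation (10) almost unchanged.

```python
from statistics import median


def squared_losses(actual, predicted):
    """Per-observation squared loss, Equation (8)."""
    return [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]


def sse(actual, predicted):
    """Sum of squared errors, Equation (9)."""
    return sum(squared_losses(actual, predicted))


def median_absolute_error(actual, predicted):
    """Median of the absolute errors, Equation (10)."""
    return median([abs(y - y_hat) for y, y_hat in zip(actual, predicted)])


actual = [10, 20, 30, 40]
predicted = [12, 18, 30, 80]  # the last forecast is a large outlier
print(squared_losses(actual, predicted))         # [4, 4, 0, 1600]
print(sse(actual, predicted))                    # 1608
print(median_absolute_error(actual, predicted))  # 2.0
```

For comparison, the mean absolute error of the same data is 11, which shows the robustness of the median to the outlier.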

3.5.2. Classification Evaluation Metrics

Classification evaluation metrics are crucial for assessing the performance of sales forecasting models when the objective is to predict discrete classes or categories, such as predicting whether sales will increase, decrease, or remain stable. These metrics provide valuable insights into the model’s ability to correctly classify instances and help in selecting the most suitable model for a given task.

Accuracy

Accuracy is a widely used metric for evaluating the performance of classification models. It measures the proportion of correct predictions out of the total number of predictions. The formula for accuracy is as follows:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(11)

where TP, TN, FP, and FN represent the number of true positives, true negatives, false positives, and false negatives, respectively. TPs represent the instances where the model correctly predicted a positive outcome, such as an increase in sales, when the actual outcome was also positive.

TNs represent instances where the model correctly predicted a negative outcome, such as a decrease in sales or the absence of a particular event, when the actual outcome was indeed negative. This metric is crucial for understanding how well a model can identify and predict instances where an event does not occur.

FPs, often referred to as Type I errors, occur when the model incorrectly predicts a positive outcome (e.g., forecasting an increase in sales) when the actual outcome is negative (e.g., sales decreased or remained the same). A high number of false positives can indicate that a model is overly optimistic in predicting positive events.

FNs, also known as Type II errors, happen when the model fails to predict a positive outcome (e.g., an increase in sales) that actually occurs. This can be particularly costly in scenarios where missing out on a positive event (such as failing to anticipate a surge in demand) could have significant implications.

Precision

Precision is a metric that measures the proportion of true positive predictions out of all positive predictions made by the model. It is calculated as follows:

Precision = \frac{TP}{TP + FP}

(12)

Precision is particularly useful when the cost of false positives is high, as it provides insights into the model’s ability to avoid making false positive predictions.

Recall

Recall, also known as sensitivity, is a metric that measures the proportion of true positive predictions out of all actual positive instances in the dataset. It is calculated as follows:

Recall = \frac{TP}{TP + FN}

(13)

Recall is particularly useful when the cost of false negatives is high, as it provides insights into the model’s ability to correctly identify positive instances.

F1 Score

F1 score is a metric that combines precision and recall into a single value, providing a balanced assessment of the model’s performance. It is calculated as the harmonic mean of precision and recall:

F 1 - Score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}

(14)

F1 score is particularly useful when dealing with imbalanced datasets or when both precision and recall are important for the application.
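Equations (11) to (14) can be computed from the four counts alone, as in the following Python sketch. The function name `classification_metrics` and the counts are hypothetical; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    accuracy = (tp + tn) / total
    # Returning 0.0 for an empty denominator is an assumed convention.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for an "increase in sales" classifier.
metrics = classification_metrics(tp=40, tn=30, fp=10, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

For these counts, accuracy is 0.7, precision is 0.8, recall is about 0.667, and the F1 score is about 0.727.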

Confusion Matrix

A confusion matrix is a powerful tool for visualizing the performance of a classification model. It is essentially a table used to describe the performance of a predictive model on a test dataset for which the true values are known. The matrix itself displays the number of correct and incorrect predictions made by the model, categorized by type. Specifically, the diagonal elements of the matrix represent the number of instances where the predicted class matches the actual class—these are the correct predictions. This direct visualization aids in more easily understanding which classes are being predicted correctly and which are not.

For a binary classification problem, the confusion matrix can be graphically represented in Table 5.

Evaluation metrics like accuracy, precision, and recall can be directly computed from the values in this matrix. It also explicitly shows different types of errors made by the model.

For multi-class problems, the confusion matrix generalizes to an N \times N grid, where N is the number of classes. Each row represents the instances in an actual class, while each column represents the instances in a predicted class.
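A multi-class confusion matrix of this form can be built with a few lines of Python. In the sketch below, the function name `confusion_matrix`, the three class labels, and the sample predictions are all hypothetical; rows index the actual class and columns the predicted class, as described above.

```python
def confusion_matrix(actual, predicted, labels):
    """N x N matrix: rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


labels = ["decrease", "stable", "increase"]
actual = ["increase", "increase", "stable", "decrease", "decrease", "stable"]
predicted = ["increase", "stable", "stable", "decrease", "increase", "stable"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)

# Correct predictions lie on the diagonal.
correct = sum(matrix[i][i] for i in range(len(labels)))
print(correct / len(actual))
```

Here four of the six predictions fall on the diagonal, so the accuracy derived from the matrix is 4/6.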

Area under the Receiver Operating Characteristic Curve (AUC-ROC)

The Area Under the Receiver Operating Characteristic Curve (AUC-ROC) is a metric for evaluating the performance of a classification model. It measures the model’s ability to distinguish between positive and negative classes across various threshold levels. The ROC curve plots the true positive rate against the false positive rate, providing a visual representation of the model’s discrimination thresholds.

The AUC-ROC, the integral of the ROC curve, encapsulates the model’s discrimination capacity into a single value between 0 and 1. A perfect model has an AUC of 1.0, while a model with an AUC of 0.5 does not perform better than random chance. Values below 0.5 suggest that a model performs worse than random, potentially indicating a need to reevaluate the model’s approach. The AUC-ROC is particularly useful when decision thresholds are variable, in situations with imbalanced class distributions, or when the costs associated with false positives and negatives are significantly different. Figure 8 illustrates the ROC curve, where the solid line represents the trade-off between the true positive rate (TPR) and the false positive rate (FPR) at various thresholds. The dashed line indicates the line of no discrimination, equivalent to random guessing. The area shaded under the ROC curve (AUC) quantifies the overall ability of the model to correctly separate the positive and negative cases. The closer the ROC curve is to the top-left corner of the graph, the better the model’s discriminative ability.
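The AUC can also be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one, with ties counted as one half. The Python sketch below uses this pairwise formulation rather than integrating the curve; the function name `auc_roc` and the scores are hypothetical, and the quadratic loop is meant for illustration on small samples only.

```python
def auc_roc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for "sales will increase" (1) vs. not (0).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(auc_roc(labels, scores))  # 0.75
```

Three of the four positive-negative pairs are ranked correctly, which gives an AUC of 0.75.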

3.5.3. Clustering Evaluation Metrics

Clustering evaluation metrics are used to assess the performance of clustering methods in sales forecasting, particularly when hybrid methods that incorporate clustering are employed. Despite the scarcity of extensive research in this area, these metrics are crucial for gauging the quality and effectiveness of the clustering process.

Diversity Measures

Diversity measures are used to assess the variability, heterogeneity, and diversity of predictions made by ensemble models or within clusters of data. These measures help in understanding the spread or dissimilarity in the data or models. Common diversity measures include the Shannon Diversity Index, Simpson’s Diversity Index, and the Gini–Simpson Index. In ensemble learning, diversity among the models can lead to better generalization and robustness of the predictions. In clustering, diversity measures can help in assessing the effectiveness of the clustering process by evaluating how distinct the clusters are from one another.
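As a minimal sketch of two of the indices named above, the following Python code computes the Shannon Diversity Index and the Gini–Simpson Index from category counts. Applying them to the number of items per cluster is an assumption made for illustration; the reviewed studies may apply these indices differently, and the function names are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over non-empty categories."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson_index(counts):
    """Gini-Simpson index: 1 - sum(p^2)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return 1 - sum((c / total) ** 2 for c in counts)


# Hypothetical cluster sizes: balanced versus heavily concentrated.
print(shannon_index([50, 50]), gini_simpson_index([50, 50]))
print(shannon_index([99, 1]), gini_simpson_index([99, 1]))
```

Two equally sized clusters give the maximum values for two categories (ln 2 and 0.5), while a 99-to-1 split gives values close to zero.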

3.5.4. Statistical Model Evaluation Metrics

Bayesian Information Criterion (BIC)

The Bayesian Information Criterion (BIC) is a criterion for model selection among a finite set of models. The BIC is based on the likelihood function and is closely related to the Akaike Information Criterion (AIC). However, the BIC introduces a higher penalty for models with more parameters, which helps to prevent overfitting. It is calculated as

BIC = k \ln(n) - 2 \ln(\hat{L})

where n is the number of observations, k is the number of parameters, and \hat{L} is the maximized value of the likelihood function of the model. Models with lower BIC values are generally preferred.
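The BIC formula above translates directly into code. In the Python sketch below, the function name `bic` and the two candidate models (their log-likelihoods and parameter counts) are hypothetical values chosen only to show the trade-off between fit and complexity.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC = k * ln(n) - 2 * ln(L_hat), with ln(L_hat) passed in directly."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


n = 100  # hypothetical number of observations
simple_model = bic(log_likelihood=-120.0, n_params=3, n_obs=n)
complex_model = bic(log_likelihood=-118.0, n_params=6, n_obs=n)
print(round(simple_model, 2), round(complex_model, 2))
```

In this made-up comparison the more complex model fits slightly better, but its three extra parameters cost more than the gain in likelihood, so the simpler model has the lower BIC.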

Overall Goodness of Fit (OGF)

The OGF is a comprehensive metric used to evaluate how well a statistical model fits the observed data. It typically involves a combination of tests and measures, such as the Chi-square test, R-squared, and Root Mean Squared Error (RMSE), to assess the discrepancy between observed values and the values expected under the model in question. A higher goodness of fit indicates that the model accurately represents the data, while a lower goodness of fit suggests discrepancies between the model and observed data.

Convergence Rate

The convergence rate of a statistical model or algorithm refers to the speed at which it approaches its final solution or optimal performance as the number of iterations increases or as the sample size grows. It is a critical factor in the evaluation of iterative algorithms, including optimization algorithms and machine learning models. The convergence rate can be described quantitatively, often as a function of the number of iterations or the size of the dataset. For example, an algorithm has a linear convergence rate if the error shrinks by a roughly constant factor at each iteration, and a quadratic convergence rate if the error at each iteration is proportional to the square of the error at the previous iteration. Fast convergence rates are desirable as they imply that the model or algorithm requires fewer iterations and resources to reach an acceptable level of accuracy. However, the convergence rate may vary depending on the complexity of the problem, the initial conditions, and the algorithm’s parameters. Understanding and optimizing the convergence rate is crucial for efficient algorithm design and implementation, especially in data-intensive applications where computational resources are a limiting factor.
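The difference between linear and quadratic convergence can be seen on two toy iterations, sketched below in Python. Neither iteration comes from the reviewed studies: the first simply halves its value at each step (the error shrinks by a constant factor, which is linear convergence), and the second is Newton's method for the square root of 2 (the error is roughly squared at each step, which is quadratic convergence). The function names are hypothetical.

```python
import math


def halving_errors(steps, x=1.0):
    """Errors of x_{k+1} = 0.5 * x_k, which converges linearly to 0."""
    errors = []
    for _ in range(steps):
        x = 0.5 * x
        errors.append(abs(x))
    return errors


def newton_sqrt2_errors(steps, x=2.0):
    """Errors of Newton's method for sqrt(2), which converges quadratically."""
    target = math.sqrt(2.0)
    errors = []
    for _ in range(steps):
        x = 0.5 * (x + 2.0 / x)
        errors.append(abs(x - target))
    return errors


print(halving_errors(4))       # each error is half of the previous one
print(newton_sqrt2_errors(4))  # each error is about the square of the previous one
```

After four steps the halving iteration has reduced its error to 0.0625, whereas Newton's method is already accurate to about eleven decimal places.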

3.6. Limitations of Proposed Solutions in Sales Forecasting

The limitations existing in sales forecasting can be systematically categorized into several distinct areas, highlighting the multifaceted challenges encountered in developing and implementing effective forecasting models. These limitations are detailed as follows:

Data limitations:
Many studies are constrained by the scope, quality, and availability of the data (Table 4) used for training and testing their models. Issues such as reliance on data from specific regions, industries, or platforms may limit the applicability of the findings to other contexts. Notable examples include the studies [27,31,32].
Model complexity and interpretability:
The complexity of the proposed solutions, which often involve complex ensemble models or the integration of multiple techniques (like deep learning, optimization algorithms, and sentiment analysis), poses significant challenges in terms of interpretability, computational requirements, and practical implementation. Representative studies include [28,46].
Handling dynamic factors:
Accurately capturing and forecasting the impact of dynamic factors, such as economic fluctuations and consumer behavior changes, remains a challenge. This issue is particularly pronounced in scenarios involving unforeseen events, like the COVID-19 pandemic, which can drastically affect sales and demand patterns. This category is inferred from discussions in various studies [47,48,49,50,51].
Overfitting and bias:
The risk of overfitting to training data or introducing biases through specific datasets or algorithms used, which could impair the model’s performance on unseen data or in real-world applications, is a noted concern. The studies [52,53,54] mention limitations concerning dataset biases and overfitting.
Computational resources:
The computationally intensive nature of some deep learning models and ensemble techniques, which require significant resources for training and deployment, limits their practical applicability in certain scenarios. Key examples include [55,56,57,58].
Integration and adaptation:
The potential difficulties in integrating the proposed solutions with existing systems or adapting them to various industries, products, or market conditions are acknowledged, indicating a need for further research and customization. Examples include [41,45].

3.7. Methods and Technologies Used in Sales Forecasting

Sales forecasting encompasses a broad array of methods and technologies tailored to the unique challenges and goals of predicting future sales. The complexity of sales forecasting arises from the need to understand and quantify the relationships among the many factors that influence sales, such as historical sales, consumer behavior, market conditions, economic indicators, and company operations.

The methods and techniques in sales forecasting vary depending on the problem that the study tries to answer. It is often approached as a time series problem where future sales, y_{t + 1}, are predicted as a function, f, of sales in a prior window of time, [t - window, t]. The function f can take various forms, such as statistical models or machine learning techniques. These methods leverage historical sales data to identify patterns and trends, which are then used to generate forecasts.
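The windowed formulation can be sketched in Python as follows. The helper names, the sample series, and the choice of a plain moving average as the function f are all assumptions made for illustration; the sketch also assumes a window of exactly `window` past values, which is one possible reading of the interval [t - window, t].

```python
def make_windows(series, window):
    """Build (input window, next value) training pairs from a sales series."""
    pairs = []
    for t in range(window, len(series)):
        pairs.append((series[t - window:t], series[t]))
    return pairs


def moving_average_forecast(history, window):
    """A very simple choice of f: the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


sales = [10, 12, 14, 16, 18]  # hypothetical sales per period
for inputs, target in make_windows(sales, window=3):
    print(inputs, "->", target)

print(moving_average_forecast(sales, window=3))  # 16.0
```

The pairs produced by `make_windows` are the form of input expected by most supervised learners, so the moving average could be swapped for a statistical or machine learning model without changing the data preparation.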

However, sales data are often noisy and non-stationary, meaning they can exhibit trends, cycles, and volatilities that evolve over time. Effective forecasting models must account for these characteristics, adapting and evolving as new data become available. Moreover, the advent of big data and advanced computational power has catalyzed the development of sophisticated models that can capture complex patterns and relationships within the data that were previously undetectable with simpler models. This is why deep learning models and hybrid models, which combine various statistical models with machine learning, are increasingly prevalent. In some cases, the problem of sales forecasting can also be viewed as a text mining problem to understand client demand using data from customer reviews to predict future sales. Here, techniques such as text mining and sentiment analysis can be invaluable.

Thus, the selection of appropriate sales forecasting methods and technologies heavily depends on the nature of the sales data, the forecasting horizon, the level of accuracy required, and the computational resources available. Techniques can range from simple moving averages and exponential smoothing to complex machine learning algorithms and artificial intelligence applications. Each of these methods has its strengths and weaknesses, and often, a hybrid or ensemble approach is used to combine the benefits of multiple methods to enhance forecasting performance.

Figure 9 summarizes the techniques and models often used in sales forecasting. As we delve into the various techniques, we explore how each category of methods contributes to the overarching goal of predicting future sales.

3.7.1. Qualitative Methods

Qualitative methods in sales forecasting are based on expert judgments, market intelligence, and subjective evaluations, rather than relying on quantitative historical data. These methods prove beneficial in scenarios involving new products, markets, or disruptive events where past data might not be available or relevant. The literature includes works comparing these methods with machine learning approaches, such as the studies by Bian et al. (2022) [59] and Shiman (2023) [60].

3.7.2. Machine Learning Models

In recent years, machine learning techniques have experienced a significant surge in popularity within the sales forecasting domain, primarily due to their remarkable ability to process and learn from vast datasets, thereby unearthing complex patterns [32,42,61,62,63]. These techniques have found diverse applications, such as identifying key factors influencing sales [32,42,61,62,63]; predicting customer churn and lifetime value to enable targeted retention strategies [31,42,64,65]; forecasting demand for new products or services based on similar historical patterns [66,67,68,69,70,71]; and optimizing pricing and promotional strategies to maximize sales and profitability [72,73,74,75,76,77].

A diverse array of machine learning algorithms has been employed in sales forecasting, including Support Vector Machines (SVMs), K-Nearest Neighbors (KNNs), Logistic Regression (LR), Decision Trees (DTs), Random Forest (RF), Gradient Boosting, and Extreme Gradient Boosting (XGBoost).

3.7.3. Deep Learning Models

Deep learning, a subset of machine learning, has emerged as a powerful set of techniques in the sales forecasting area, particularly over the last decade. It is well-suited for addressing complex time series problems through models like Recurrent Neural Networks (RNNs) [78,79,80], Long Short-Term Memory (LSTM) networks [38,81], and Bidirectional LSTM (BiLSTM) [82,83]. These models are adept at capturing temporal dependencies and patterns in sales data, which is crucial for accurate forecasting.

Another notable technique is the Gated Recurrent Unit (GRU) [10,49], which uses a simpler gating structure than the LSTM while retaining the ability to capture dependencies in time series data.

Moreover, Convolutional Neural Networks (CNNs) [84,85] have been applied to sales data, utilizing filters to extract spatial hierarchies of features, which is beneficial in recognizing patterns that influence sales trends. This array of deep learning models demonstrates the evolving landscape of sales forecasting techniques, driven by advancements in AI and computational capabilities. Recently, more advanced models such as Transformers and pretrained Large Language Models (LLMs) have gained traction in the field. Transformers, with their self-attention mechanisms [86], are particularly effective in handling long-range dependencies and have shown superior performance in various forecasting tasks [87]. LLMs, originally designed for natural language processing tasks [88], are now being adapted for sales forecasting by leveraging their ability to analyze and interpret vast amounts of textual data, such as customer reviews and social media sentiments. These models enhance the forecasting accuracy by incorporating sentiment analysis and other qualitative data into the predictive models.

3.7.4. Statistical Models

Despite the rise of deep learning models in sales forecasting, traditional statistical models [89,90] continue to play a crucial role due to their effectiveness in handling sales data. Among these, the Autoregressive Integrated Moving Average (ARIMA) [91] model and its variant, the Seasonal ARIMA (SARIMA) [92], stand out for their ability to model various types of time series data, including those with seasonal trends.

Exponential smoothing techniques are also widely used for their simplicity and robustness in forecasting, especially when dealing with time series data that exhibit a trend or seasonal pattern. These methods apply weighted averages of past observations to predict future values [93], making them particularly effective for short-term forecasts.
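The weighted-average idea behind exponential smoothing can be illustrated with its simplest form, which has neither a trend nor a seasonal component. In the Python sketch below, the function name, the smoothing parameter `alpha`, the sample series, and the choice of the first observation as the initial level are assumptions made for illustration.

```python
def simple_exponential_smoothing(series, alpha):
    """One-step-ahead forecast: level = alpha * y + (1 - alpha) * level."""
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initializing with the first observation is an assumption
    for y in series[1:]:
        # Recent observations get weight alpha; older ones decay geometrically.
        level = alpha * y + (1 - alpha) * level
    return level


sales = [10, 20, 30]  # hypothetical sales per period
print(simple_exponential_smoothing(sales, alpha=0.5))  # 22.5
```

With `alpha` equal to 1 the forecast is simply the last observation, while smaller values give more weight to older data.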

Markov Chain Monte Carlo (MCMC) methods support a probabilistic, Bayesian approach to sales forecasting, in which prior knowledge is combined with observed evidence to update the probability of a hypothesis as more information becomes available, and sampling is used to approximate the resulting distributions. These approaches are highly adaptable and are applied to a wide range of forecasting problems [55].

3.7.5. Decomposition and Clustering Techniques

Decomposition and clustering techniques are powerful tools in sales forecasting that help break down complex data or identify groups within the data.

Decomposition techniques [94,95] are used to break down complex sales data into simpler components, making it easier to identify patterns and generate accurate forecasts. These techniques include Time Series Decomposition, Segmentation Methods, and Hierarchical Forecasting. Clustering methods are used to group similar sales data points together based on their characteristics [96,97]. These methods help identify customer segments or product categories that exhibit similar sales patterns, allowing for more targeted forecasting.

3.7.6. Optimization and Heuristic Approaches

Optimization and heuristic approaches are increasingly being used in sales forecasting to find optimal solutions in complex and high-dimensional search spaces. These methods are particularly useful when dealing with large datasets and when traditional statistical methods may not be sufficient. Two popular optimization and heuristic approaches are Particle Swarm Optimization and Genetic Algorithms [30,98,99].

3.7.7. Natural Language Processing (NLP)

Natural language processing techniques have become indispensable in the realm of sales forecasting due to their ability to derive meaningful insights from unstructured textual data. These data encompass customer reviews, social media posts, and news articles, which are rich sources of information on consumer sentiment and market trends.

One of the cornerstone NLP techniques in this context is sentiment analysis [22,29], which focuses on determining the emotional tone or opinion expressed in text data. This capability is pivotal for sales forecasting, as it allows businesses to gauge customer sentiment towards their products or brands. Sentiment analysis serves as a tool for identifying leading indicators of future sales performance by examining the positive or negative trends in customer sentiment found in reviews or social media mentions.

3.7.8. Data Processing Techniques

Data processing techniques play a vital role in sales forecasting by transforming raw data into a format suitable for analysis and modeling. These techniques help improve the quality of the data, reduce computational complexity, and enhance the performance of forecasting models. Two key data processing techniques in sales forecasting are dimensionality reduction and feature selection. Dimensionality reduction techniques [25,31] aim to reduce the number of variables or features in a dataset while retaining the most important information. Feature selection techniques [100,101] focus on identifying the most relevant features or variables for sales forecasting, while discarding irrelevant or redundant ones. This process helps improve model interpretability, reduce overfitting, and enhance the generalization performance of forecasting models.

3.7.9. Hybrid Approaches

Hybrid approaches in sales forecasting integrate two or more distinct forecasting techniques or models to enhance accuracy and robustness. These methods aim to capitalize on the strengths and offset the weaknesses of individual models, thereby improving overall performance. Common hybrid approaches include the integration of ARIMA and SARIMA with LSTM models; ARIMA and SARIMA are adept at capturing linear patterns, while LSTM excels in modeling nonlinear dynamics in data [102]. Another approach is the Nonlinear Autoregressive with ARIMA, which merges nonlinear autoregressive models with ARIMA to capture both nonlinear and linear dependencies [103]. Additionally, the clustering and SVM strategy employs clustering algorithms to group similar sales patterns, followed by the application of SVM to each cluster for forecasting [96]. Furthermore, combining NLP techniques with machine learning models allows the incorporation of textual information such as customer reviews or social media data into the forecasting process, enriching the predictive accuracy [104,105].

3.7.10. Ensemble Techniques

Ensemble techniques, on the other hand, combine multiple models of the same type to create a more powerful and accurate forecasting system. The main idea behind ensemble techniques is to leverage the collective knowledge of multiple models to make better predictions. Ensemble methods aim to reduce the variance and bias of individual models, leading to improved generalization performance.

A key difference between hybrid approaches and ensemble techniques is that hybrid approaches typically combine models from different families (e.g., statistical and machine learning models), while ensemble techniques often use multiple models of the same type (e.g., multiple Decision Trees in a Random Forest). The most popular ensemble techniques used in sales forecasting include the following:

Bagging: this trains multiple instances of the same model on different subsets of the training data and combines their predictions through averaging or voting [27,106].
Boosting: boosting algorithms, such as AdaBoost [36] and Gradient Boosting [12], iteratively train weak learners and combine them to create a strong learner, focusing on difficult samples or minimizing the overall loss function.
Stacking: this combines the predictions of multiple heterogeneous models using a meta-model that learns to optimally combine the base models’ predictions [107].
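To make the bagging idea concrete, the following Python sketch trains the same simple base model on several bootstrap resamples and averages the predictions. The base model (a least-squares line), the function names, the number of models, and the toy data are all assumptions made for illustration and do not reproduce any method from the cited studies.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bagged_forecast(xs, ys, x_new, n_models=25, seed=0):
    """Average the predictions of lines fitted on bootstrap resamples."""
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct x values")
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    while len(predictions) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        if len({xs[i] for i in idx}) < 2:
            continue  # skip degenerate resamples that cannot define a line
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(slope * x_new + intercept)
    return sum(predictions) / len(predictions)


periods = [1, 2, 3, 4, 5, 6]
sales = [3, 5, 7, 9, 11, 13]  # hypothetical, exactly linear for clarity
print(bagged_forecast(periods, sales, x_new=7))
```

Because the toy data lie exactly on a line, every resampled model agrees and the averaged forecast for period 7 is 15 up to floating-point rounding; with noisy data, the averaging is what reduces the variance of the individual models.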

3.7.11. Miscellaneous Techniques

In addition to the various forecasting models and techniques discussed earlier, there are several emerging technologies that have the potential to revolutionize sales forecasting. These technologies offer new ways of processing and analyzing vast amounts of data, improve computational efficiency, and uncover hidden patterns and insights. Such technologies include quantum computing [108], IoT analysis [81], and big data analytics [109].

3.8. The Evolution of Sales Forecasting Techniques

Sales forecasting has undergone considerable transformation over time. Initially grounded in qualitative methods based on expert opinions, the field has progressively embraced more sophisticated quantitative approaches.

Figure 10 indicates that traditional statistical models were popular early on but saw a gradual decline in studies over the years. Machine learning models have shown an increasing trend, indicating their rising importance in sales forecasting. Deep learning models, while not as prominent as machine learning initially, experienced a surge in studies, peaking around 2021 before declining. Notably, ensemble techniques and optimization and heuristic approaches also gained attention in certain periods, suggesting their utility in addressing complex forecasting challenges. This progression underscores the industry’s shift towards more data-driven and algorithmically complex methods to grapple with the growing intricacies of market dynamics, the abundance of data, and the continual improvements in computational capabilities.

3.9. Variations in Sales Forecasting Models across Industries

Sales forecasting models are designed to meet the distinct needs and tackle the unique challenges of various industries, reflecting a diversity in data characteristics and operational priorities. These models are fundamental in translating complex data into actionable insights that drive strategic decisions and operational efficiencies.

In the retail sector, demand forecasting is essential for managing inventory and optimizing supply chains, utilizing consumer behavior trends, seasonal variations, and promotional impacts to predict future product demands. This helps in minimizing costs and in enhancing customer satisfaction [30]. Contrastingly, water resource management employs forecasting models to ensure sustainable usage by predicting future water needs, critical for urban and agricultural planning [24].

The financial markets rely heavily on forecasting for guiding investment decisions through the prediction of stock prices and market trends, using historical data and advanced statistical methods [27]. Similarly, supply chain management forecasts product demand to optimize manufacturing and distribution processes, which is vital for reducing costs and improving logistics [31].

In the realm of corporate finance, forecasting is geared towards projecting economic trends and assessing financial health, aiding fiscal strategies in businesses and governments [32]. The fight against the drug industry benefits from forecasting by predicting sales and locations. This is highlighted by several studies, such as [110,111], which discuss the methodologies and successes of predictive analytics in this context.

Furthermore, the telecommunications sector uses these models to anticipate service demand and network traffic, which is essential for planning infrastructure development and ensuring quality service [109]. The energy sector employs forecasting to manage supply and demand, optimizing production to meet market conditions and consumption patterns [112]. Finally, in e-commerce, sales forecasting involves analyzing consumer data and market trends to manage inventory and target marketing efforts, using sophisticated data analytics tools [20].

3.10. Real-Time Sales Forecasting Models

Real-time sales forecasting models play a crucial role in sales, where they substantially improve decision making by providing prompt and accurate predictions of sales. Although there is a limited number of studies specifically addressing these models, existing research offers valuable insights. For example, Purnama et al. (2023) [113] demonstrate that incorporating online data-driven methods into product development and supply chain management significantly reduces the time to market and enhances adaptability to consumer demands in high-technology industries. In a similar study, Panda et al. (2023) [38] show that real-time forecasting with advanced machine learning models enables more agile adjustments in the food industry’s supply chain by accurately predicting demand fluctuations.

For the impact of real-time sales forecasting models, Zhao et al. (2022) [105] demonstrated the use of Linear Regression and Support Vector Regression models, integrated with sentiment and influence data from microblogs, to predict short-term movie box office sales. The real-time analysis of sentiment significantly enhanced the prediction accuracy, showing that timely data integration could lead to better forecasting outcomes. By adjusting marketing and distribution strategies based on these forecasts, movie distributors can maximize opening weekend revenues and overall profitability.

The study on three-dimensional concurrent engineering (3DCE) by Purnama et al. (2023) [113] is another excellent example of real-time sales forecasting impacting revenue. In high-technology markets, such as smartphones, where product life cycles are short and consumer demands rapidly evolve, real-time data-driven approaches can link customer requirements with supply chain dynamics effectively. This approach not only reduces the time to market but also minimizes the risks of stock outs or excess inventory, directly enhancing revenue potential. Real-time sales forecasting models offer significant benefits to sales in decision making, supply chain management, and revenue potential. By providing accurate and timely predictions of sales, these models enable more agile adjustments to supply chain dynamics, reduce waste, and enhance revenue potential.

3.11. Overview of Previous Studies

This section provides a detailed comparison of previous reviews and studies in sales forecasting and identifies the specific gaps that our research aims to fill. Table 6 below compares key aspects of existing research with our study, highlighting the differences in methodologies, focus areas, and outcomes.

The review of the literature underscores a prevalent issue: most studies concentrate intensely on specific sectors or utilize a single technological approach, which considerably narrows their applicability and relevance to the broader field of sales forecasting. Our research addresses these gaps in the following ways:

Expanding domain applicability: unlike previous works, our study does not confine its approach to a single domain but seeks to develop methodologies that are applicable across various types of sales contexts, enhancing the generalizability of our findings.
Technological diversity: we incorporate a variety of technological approaches, including AI and statistical models, to provide a comprehensive tool that is adaptable to different sales forecasting needs rather than focusing on a singular aspect of technology.
Holistic methodological approach: our research combines multiple studies to cover a range of sales types and scenarios, bridging the gap between domain-specific studies and the need for versatile, all-encompassing forecasting tools.

An important limitation in sales forecasting that our study addresses is sales intermittency, where data are often sparse and non-continuous. Sales intermittency refers to irregular and unpredictable occurrences of sales, including periods where sales may drop to zero, complicating the forecasting process. Such conditions demand robust methodologies that can effectively handle sporadic demand. Our research reviews how various works have addressed this issue, with machine learning techniques providing potential solutions. These methodologies are capable of analyzing patterns in sparse data and predicting future sales with greater accuracy, thus improving the reliability of sales forecasts in industries where sales intermittency is prevalent [47,48,49,50,51].
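To make the notion of intermittency concrete, the following minimal Python sketch computes two descriptors commonly used in the intermittent-demand literature: the average inter-demand interval (ADI) and the squared coefficient of variation (CV²) of the non-zero sales. It is an illustration only, not a method taken from the reviewed studies. The function names are hypothetical, ADI is approximated as the number of periods divided by the number of periods with sales, and the cut-offs of 1.32 and 0.49 are the conventional Syntetos-Boylan values, assumed here.

```python
from statistics import mean, pstdev


def intermittency_profile(sales):
    """Return (ADI, CV2) for a sequence of per-period sales quantities.

    ADI is approximated as total periods / periods with sales.
    CV2 is the squared coefficient of variation of the non-zero sales.
    """
    nonzero = [x for x in sales if x > 0]
    if not nonzero:
        raise ValueError("series contains no sales")
    adi = len(sales) / len(nonzero)
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2
    return adi, cv2


def classify_demand(sales, adi_cut=1.32, cv2_cut=0.49):
    """Label a series as smooth, erratic, intermittent, or lumpy.

    The default cut-offs are assumed conventional values, not taken
    from the reviewed articles.
    """
    adi, cv2 = intermittency_profile(sales)
    if adi < adi_cut:
        return "smooth" if cv2 < cv2_cut else "erratic"
    return "intermittent" if cv2 < cv2_cut else "lumpy"


# Example: sales occur in 4 of 12 periods, with similar quantities each time.
weekly_sales = [0, 0, 5, 0, 0, 0, 4, 0, 6, 0, 0, 5]
print(intermittency_profile(weekly_sales))
print(classify_demand(weekly_sales))  # intermittent
```

A series with long gaps between sales but stable quantities falls in the "intermittent" class, whereas long gaps combined with highly variable quantities would be labeled "lumpy"; such a profile can help decide whether a specialized method is needed before any model is fitted.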

3.12. Discussion

In this section, we synthesize and discuss the answers given to each research question posed in our systematic mapping study. We provide a comprehensive evaluation of the methodologies, technologies, and strategies used in sales forecasting, uncovering an evolving terrain where traditional quantitative and statistical techniques not only persist but are also progressively augmented by advanced machine learning and deep learning approaches.

A significant observation is the surge in research efforts, particularly since 2016, signaling a growing interest in sales forecasting due to its strategic business importance. The study identifies a wide range of publication sources, indicating the interdisciplinary nature of the field that encompasses business, economics, statistics, and computer science.

The analysis of keywords and terminology used in the literature shows a strong focus on predictive modeling, machine learning techniques, time series analysis, and the amalgamation of various data sources. This demonstrates the multidisciplinary approach to sales forecasting, which merges statistical methods, artificial intelligence algorithms, and domain-specific knowledge to tackle the dynamic and intricate nature of sales prediction.

This study also underscores the variety of datasets used to assess sales forecasting approaches, ranging from public benchmarks to proprietary datasets specific to particular industries or companies. This variety mirrors the customized nature of sales forecasting tasks, which often demand domain-specific data and considerations.

Moreover, the examination of the performance metrics used in sales forecasting studies shows that the choice of evaluation metric is central to how models are assessed and compared. These metrics are essential for gauging the accuracy of regression and classification predictions, a critical aspect of sales forecasting.
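As an illustration of the regression-type accuracy measures referred to above, the sketch below implements three widely used ones: mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE). The function names and the sample figures are assumptions made for this example and are not drawn from the reviewed studies. Note that MAPE is undefined when an actual value is zero, which matters for intermittent sales series; the sketch raises an error in that case rather than silently skipping periods.

```python
import math


def _check(actual, forecast):
    # Both series must be non-empty and aligned period by period.
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")


def mae(actual, forecast):
    """Mean absolute error, in the units of the sales series."""
    _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error; penalizes large errors more than MAE."""
    _check(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    _check(actual, forecast)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical monthly sales and forecasts.
actual = [100, 200, 400]
forecast = [110, 190, 380]
print(round(mae(actual, forecast), 2))   # 13.33
print(round(rmse(actual, forecast), 2))  # 14.14
print(round(mape(actual, forecast), 2))  # 6.67
```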

A notable finding from the study is the increasing integration of classical statistical models with contemporary machine learning algorithms. This hybrid method takes advantage of the robustness of traditional models while harnessing the predictive capabilities of machine learning to manage large datasets and complex patterns that elude simpler, linear models.

However, the study also identifies several challenges in the field of sales forecasting. Issues related to data, such as availability and quality, pose significant obstacles to the creation of effective forecasting models. Furthermore, the assimilation of these advanced models into existing business systems and workflows remains a complex task that necessitates continuous research and innovation.

A critical aspect that has emerged from the study is sales intermittency. Sales data often exhibit irregular patterns, with periods of zero sales interspersed with bursts of activity, particularly in industries with seasonal demand or irregular purchase cycles. Traditional forecasting models struggle to handle such intermittency, necessitating the development of specialized techniques that can accurately predict sales in such erratic environments. Addressing intermittency requires robust methods capable of capturing both the frequency and magnitude of sales events, ensuring businesses can maintain appropriate inventory levels and optimize their supply chains.
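One classical way of capturing the frequency and the magnitude of sales events separately is Croston's method, which smooths the size of non-zero sales and the interval between them with two exponential smoothers and forecasts their ratio. The sketch below is a minimal illustration of that idea under stated assumptions; it is not presented as the approach of any reviewed study, and the function name, the smoothing constant of 0.1, and the initialization at the first observed sale are choices made for this example.

```python
def croston_forecast(sales, alpha=0.1):
    """Per-period demand rate forecast using Croston's method.

    Two quantities are smoothed, and only in periods with a sale:
      size     - magnitude of non-zero sales
      interval - number of periods between consecutive sales
    The forecast is size / interval.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    size = None
    interval = None
    periods_since_sale = 1
    for demand in sales:
        if demand > 0:
            if size is None:
                # Initialize both estimates at the first observed sale (assumed scheme).
                size = float(demand)
                interval = float(periods_since_sale)
            else:
                size += alpha * (demand - size)
                interval += alpha * (periods_since_sale - interval)
            periods_since_sale = 1
        else:
            periods_since_sale += 1
    if size is None:
        return 0.0  # no sale observed yet
    return size / interval


# Six units sold every third period gives a rate of two units per period.
print(croston_forecast([0, 0, 6, 0, 0, 6, 0, 0, 6]))  # 2.0
```

Because the estimates are updated only when a sale occurs, long runs of zeros do not drag the size estimate towards zero, as they would with plain exponential smoothing applied to the raw series.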

Based on the insights derived from this extensive review, several potential avenues for future research in the field of sales forecasting are evident: enhancing predictive capabilities by leveraging diverse data sources such as sales figures, customer demographics, and economic indicators; developing hybrid and ensemble modeling approaches that combine the strengths of multiple forecasting techniques; incorporating dynamic and contextual factors such as seasonality, promotional activities, and market trends to improve forecasting accuracy and real-world applicability; focusing on the development of models that are not only accurate but also interpretable to enhance trust and facilitate better decision making within businesses; and exploring the implementation of real-time data processing and forecasting models to provide immediate insights and support agile decision making processes.

The findings from this study have significant implications for practice. Businesses can leverage advanced sales forecasting methods to enhance their strategic planning, inventory management, and financial decision making. The integration of machine learning and deep learning techniques allows for more accurate and scalable forecasting, which is crucial in today’s data-driven business environment. However, to effectively implement these advanced models, businesses must invest in data quality management, ensure the continuous training of their analytics teams, and develop systems that can integrate these models into their existing workflows.

4. Conclusions

In this article, we reviewed the landscape of sales forecasting research across sectors, highlighting the significant advancements and diverse methodologies that have emerged over the past decade. We addressed the research questions defined in Section 2.1, using a structured approach that allowed us to thoroughly examine the current state of sales forecasting research and practice. The mapping study reveals a marked transition towards more sophisticated analytical techniques, notably machine learning and deep learning, aimed at refining the accuracy of sales forecasts. These advanced methods are better equipped to handle the complexities of big data and have demonstrated substantial improvements in forecast accuracy. The integration of big data analytics into sales forecasting not only enhances predictive precision but also allows for a more nuanced understanding of market dynamics and customer behavior. However, the adoption of these advanced technologies is not without its challenges: the quality and complexity of data, data access, integration with existing systems, and the dynamic nature of the sales environment remain substantial barriers.

Based on the insights derived from this extensive review, we foresee several potential avenues for future research in the field of sales forecasting, specifically within the automotive industry due to its complex market structure, global supply chains, evolving consumer preferences, and significant economic impact. These include embracing multimodal data integration to enhance predictive capabilities by leveraging diverse data sources such as historical sales, customer demographics, and economic indicators. We also aim to advance hybrid and ensemble modeling approaches that combine the strengths of multiple forecasting techniques, such as time series analysis and machine learning algorithms. Incorporating dynamic and contextual factors, such as seasonality, promotional activities, and market trends, is also crucial for improving forecasting accuracy and real-world applicability. Lastly, addressing interpretability and explainability is essential for enhancing trust in our forecasting models and facilitating better decision making within the automotive industry.

Author Contributions

H.A.: Conceptualization, Formal analysis, Investigation, Methodology, Visualization, Writing—original draft, Writing—review & editing. L.A.: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Writing—review & editing. E.L.: Data curation, Project administration, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by both the company Syartec and the ANRT (National Association for Research and Technology).

Data Availability Statement

The data extraction details are available upon request. Interested readers can contact the authors to obtain the CSV file containing our state-of-the-art analysis, which includes various attributes such as article year, citation (number/BibTeX), domain area, experimental performance of proposed models, findings, journal/conference name, limitations, models and technology, objective, and performance measures.

Conflicts of Interest

The authors declare that this study received funding from Syartec and ANRT. The funders were not involved in the study design, the collection, analysis, or interpretation of data, the writing of this article, or the decision to submit it for publication.

References

Liang, M.; Yang, L.; Li, K.; Zhai, H. Improved collaborative filtering for cross-store demand forecasting. Comput. Ind. Eng. 2024, 190, 110067. [Google Scholar] [CrossRef]
Sleem, A.A.; Alromema, M.; Abdel-Aal, M.A.M. Improved bass model using sales proportional average for one condition of mono peak curves. arXiv 2024, arXiv:2403.08993. [Google Scholar]
Geertsema, P.; Lu, H. Return Predictability: Accounting versus Market Information. 2024. Available online: https://ssrn.com/abstract=4725107 (accessed on 1 May 2024).
Makridakis, S.; Andersen, A.; Carbone, R.; Fildes, R.; Hibon, M.; Lewandowski, R.; Newton, J.; Parzen, E.; Winkler, R. Accuracy of forecasting: An empirical investigation. J. R. Stat. Soc. Ser. A (General) 1983, 142, 97–145. [Google Scholar] [CrossRef]
Petropoulos, F.; Makridakis, S.; Assimakopoulos, V.; Nikolopoulos, K. Forecasting with multivariate temporal aggregation: The case of promotional modelling. Int. J. Prod. Econ. 2018, 204, 161–171. [Google Scholar]
Snyder, R.D.; Ord, J.K.; Beaumont, A. Forecasting the intermittent demand for slow-moving inventories: A modelling approach. Int. J. Forecast. 2012, 28, 485–496. [Google Scholar] [CrossRef]
Petersen, K.; Feldt, R.; Mujtaba, S.; Mattsson, M. Systematic mapping studies in software engineering. In Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering (EASE), BCS Learning & Development, Bari, Italy, 26–27 June 2008; pp. 1–10. [Google Scholar]
Swaminathan, K.; Venkitasubramony, R. Demand forecasting for fashion products: A systematic review. Int. J. Forecast. 2024, 40, 247–267. [Google Scholar] [CrossRef]
Pinciroli, F.; Justo, J.L.B.; Zeligueta, L.; Pma, M. Systematic Mapping Protocol-Coverage of Aspect-oriented Methodologies for the Early Phases of the Software Development Life Cycle. arXiv 2017, arXiv:1702.02653. [Google Scholar]
Li, D.; Li, X.; Gu, F.; Pan, Z.; Chen, D.; Madden, A. A Universality-Distinction Mechanism-Based Multi-Step Sales Forecasting for Sales Prediction and Inventory Optimization. Systems 2023, 11, 311. [Google Scholar] [CrossRef]
Omar, H.; Klibi, W.; Babai, M.Z.; Ducq, Y. Basket data-driven approach for omnichannel demand forecasting. Int. J. Prod. Econ. 2023, 257, 108748. [Google Scholar] [CrossRef]
Wang, J.; Chong, W.K.; Lin, J.; Hedenstierna, C.P.T. Retail Demand Forecasting Using Spatial-Temporal Gradient Boosting Methods. J. Comput. Inf. Syst. 2023, 1–13. [Google Scholar] [CrossRef]
Tillmann, A.M.; Joormann, I.; Ammann, S.C. Reproducible air passenger demand estimation. J. Air Transp. Manag. 2023, 112, 102462. [Google Scholar] [CrossRef]
Madongo, C.T.; Zhongjun, T. A movie box office revenue prediction model based on deep multimodal features. Multimed. Tools Appl. 2023, 82, 31981–32009. [Google Scholar] [CrossRef]
Chen, M.Y.; Liao, C.H.; Hsieh, R.P. Modeling public mood and emotion: Stock market trend prediction with anticipatory computing approach. Comput. Hum. Behav. 2019, 101, 402–408. [Google Scholar] [CrossRef]
Zhang, C.; Li, Y.; Yang, X. Predicting Car Sales Based on Web Search Data and Sentiment Classification. In Proceedings of the 2nd International Conference on Computing and Data Science, CONF-CDS 2021, Stanford, CA, USA, 28–29 January 2021; ACM: New York, NY, USA, 2021; pp. 1–6. [Google Scholar] [CrossRef]
Wang, Z.; Zhang, J.; Ji, S.; Meng, C.; Li, T.; Zheng, Y. Predicting and ranking box office revenue of movies based on big data. Inf. Fusion 2020, 60, 25–40. [Google Scholar] [CrossRef]
Satish, K.R.; Kavya, N.P. Trend Analysis of E-Commerce Data using Hadoop Ecosystem. Int. J. Comput. Appl. 2016, 147, 1–5. [Google Scholar] [CrossRef]
Jabeur, S.B.; Mefteh-Wali, S.; Viviani, J.L. Forecasting gold price with the XGBoost algorithm and SHAP interaction values. Ann. Oper. Res. 2021, 334, 679–699. [Google Scholar] [CrossRef]
Zhou, S. E-commerce Sales Forecast Based on Neural Network LSTM. In Proceedings of the 2nd International Conference on Mathematical Statistics and Economic Analysis, MSEA 2023, Nanjing, China, 26–28 May 2023; EAI: Nanjing, China, 2023; pp. 0–30. [Google Scholar] [CrossRef]
Xu, X.; Zhang, Y. Edible oil wholesale price forecasts via the neural network. Energy Nexus 2023, 12, 100250. [Google Scholar] [CrossRef]
Lin, Q.; Jia, N.; Chen, L.; Zhong, S.; Yang, Y.; Gao, T. A two-stage prediction model based on behavior mining in livestream e-commerce. Decis. Support Syst. 2023, 174, 114013. [Google Scholar] [CrossRef]
Makoni, T.; Chikobvu, D. Assessing and Forecasting the Long-Term Impact of the Global Financial Crisis on New Car Sales in South Africa. Data 2023, 8, 78. [Google Scholar] [CrossRef]
Patil, R.; Alandikar, P.; Chaudhari, V.; Patil, P.; Deshpande, P.S. Water Demand Prediction Using Machine Learning. Int. J. Res. Appl. Sci. Eng. Technol. 2022, 10, 122–128. [Google Scholar] [CrossRef]
Chen, J.; Wu, J. The prediction of Chongqing’s GDP based on the LASSO method and chaotic whale group algorithm—Back propagation neural network—ARIMA model. Sci. Rep. 2023, 13, 15002. [Google Scholar] [CrossRef]
Ahaggach, H.; Abrouk, L.; Foufou, S.; Lebon, E. Predicting Car Sale Time with Data Analytics and Machine Learning. In Proceedings of the IFIP International Conference on Product Lifecycle Management, Grenoble, France, 10–13 July 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 399–409. [Google Scholar]
Deng, S.; Zhu, Y.; Yu, Y.; Huang, X. An integrated approach of ensemble learning methods for stock index prediction using investor sentiments. Expert Syst. Appl. 2024, 238, 121710. [Google Scholar] [CrossRef]
Li, H.; Gao, H.; Song, H. Tourism forecasting with granular sentiment analysis. Ann. Tour. Res. 2023, 103, 103667. [Google Scholar] [CrossRef]
Wang, Y.; Zhang, Y. Multivariate SVR Demand Forecasting for Beauty Products Based on Online Reviews. Mathematics 2023, 11, 4420. [Google Scholar] [CrossRef]
Punia, S.; Shankar, S. Predictive analytics for demand forecasting: A deep learning-based decision support system. Knowl. Based Syst. 2022, 258, 109956. [Google Scholar] [CrossRef]
Islam, S.; Amin, S.H.; Wardley, L.J. A supplier selection & order allocation planning framework by integrating deep learning, principal component analysis, and optimization techniques. Expert Syst. Appl. 2024, 235, 121121. [Google Scholar] [CrossRef]
Hao, J.; Yuan, J.; Wu, D.; Xu, W.; Li, J. A dynamic ensemble approach for multi-step price prediction: Empirical evidence from crude oil and shipping market. Expert Syst. Appl. 2023, 234, 121117. [Google Scholar] [CrossRef]
Gao, H.; Bai, Z.; Li, J. Sales Prediction Based On Product Titles and Images with Deep Learning Approaches. CS230: Deep Learning, Fall 2021, Stanford University, CA. Available online: https://cs230.stanford.edu/projects_fall_2021/reports/103165343.pdf (accessed on 1 May 2024).
Patil, S.; Vaze, V.; Agarkar, P.; Mahajan, H. Social context-aware and fuzzy preference temporal graph for personalized B2B marketing campaigns recommendations. Soft Comput. 2023. [CrossRef]
Xu, C.; Wang, X.; Hu, B.; Zhou, D.; Dong, Y.; Huo, C.; Ren, W. Graph attention networks for new product sales forecasting in e-commerce. In Proceedings of the Database Systems for Advanced Applications: 26th International Conference, DASFAA 2021, Part III 26, Taipei, Taiwan, 11–14 April 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 553–565. [Google Scholar]
Chien, C.H.; Trappey, A.J.; Wang, C.C. ARIMA-AdaBoost hybrid approach for product quality prediction in advanced transformer manufacturing. Adv. Eng. Inform. 2023, 57, 102055. [Google Scholar] [CrossRef]
Borrero, J.D.; Borrero-Domínguez, J.D. Enhancing Short-Term Berry Yield Prediction for Small Growers Using a Novel Hybrid Machine Learning Model. Horticulturae 2023, 9, 549. [Google Scholar] [CrossRef]
Panda, S.K.; Mohanty, S.N. Time Series Forecasting and Modelling of Food Demand Supply Chain based on Regressors Analysis. IEEE Access 2023. [Google Scholar] [CrossRef]
Rožanec, J.M.; Kažič, B.; Škrjanc, M.; Fortuna, B.; Mladenić, D. Automotive OEM demand forecasting: A comparative study of forecasting algorithms and strategies. Appl. Sci. 2021, 11, 6787. [Google Scholar] [CrossRef]
Vaiciukynas, E.; Danenas, P.; Kontrimas, V.; Butleris, R. Two-Step Meta-Learning for Time-Series Forecasting Ensemble. IEEE Access 2021, 9, 62687–62696. [Google Scholar] [CrossRef]
Ebadi Jokandan, S.M.; Bayat, P.; Farrokhbakht Foumani, M. Predicting product advertisement links using hybrid learning within social networks. J. Supercomput. 2023, 79, 15023–15050. [Google Scholar] [CrossRef]
Hossain, M.S.; Rahman, M.F.; Uddin, M.K.; Hossain, M.K. Customer sentiment analysis and prediction of halal restaurants using machine learning approaches. J. Islam. Mark. 2022, 14, 1859–1889. [Google Scholar] [CrossRef]
Zhang, B.; Tseng, M.L.; Qi, L.; Guo, Y.; Wang, C.H. A comparative online sales forecasting analysis: Data mining techniques. Comput. Ind. Eng. 2023, 176, 108935. [Google Scholar] [CrossRef]
Giampaolo, F.; Gatta, F.; Prezioso, E.; Cuomo, S.; Zhou, M.; Fortino, G.; Piccialli, F. ENCODE - Ensemble neural combination for optimal dimensionality encoding in time-series forecasting. Inf. Fusion 2023, 100, 101918. [Google Scholar] [CrossRef]
Sun, F.; Meng, X.; Zhang, Y.; Wang, Y.; Jiang, H.; Liu, P. Agricultural Product Price Forecasting Methods: A Review. Agriculture 2023, 13, 1671. [Google Scholar] [CrossRef]
Borucka, A. Seasonal Methods of Demand Forecasting in the Supply Chain as Support for the Company’s Sustainable Growth. Sustainability 2023, 15, 7399. [Google Scholar] [CrossRef]
Kim, H.J.; Kim, J.H.; Im, J.b. Forecasting Offline Retail Sales in the COVID-19 Pandemic Period: A Case Study of a Complex Shopping Mall in South Korea. Buildings 2023, 13, 627. [Google Scholar] [CrossRef]
Leenawong, C.; Chaikajonwat, T. Event Forecasting for Thailand’s Car Sales during the COVID-19 Pandemic. Data 2022, 7, 86. [Google Scholar] [CrossRef]
Zhang, C.; Tian, Y.X. Forecast Daily Tourist Volumes During the Epidemic Period Using COVID-19 data, search engine data and weather data. Expert Syst. Appl. 2022, 210, 118505. [Google Scholar] [CrossRef]
Sleiman, R.; Mazyad, A.; Hamad, M.; Tran, K.P.; Thomassey, S. Forecasting Sales Profiles of Products in an Exceptional Context: COVID-19 Pandemic. Int. J. Comput. Intell. Syst. 2022, 15, 99. [Google Scholar] [CrossRef]
Hartanto, C.; Sofianti, T.D.; Budiarto, E. Multivariate Sales Forecast Model Towards Trend Shifting During COVID-19 Pandemic: A Case Study in Global Beauty Industry. In Proceedings of the 2022 International Conference on Engineering and Information Technology for Sustainable Industry, Tangerang, Indonesia, 21–22 September 2022; pp. 1–7. [Google Scholar]
Huang, T.; Fildes, R.; Soopramanien, D. The value of competitive information in forecasting FMCG retail product sales and the variable selection problem. Eur. J. Oper. Res. 2014, 237, 738–748. [Google Scholar] [CrossRef]
Kristjanpoller, W.; Minutolo, M.C. Forecasting volatility of oil price using an artificial neural network-GARCH model. Expert Syst. Appl. 2016, 65, 233–241. [Google Scholar] [CrossRef]
Li, C.; Cheang, B.; Luo, Z.; Lim, A. An exponential factorization machine with percentage error minimization to retail sales forecasting. ACM Trans. Knowl. Discov. Data (TKDD) 2021, 15, 1–32. [Google Scholar] [CrossRef]
Martin, G.M.; Frazier, D.T.; Maneesoonthorn, W.; Loaiza-Maya, R.; Huber, F.; Koop, G.; Maheu, J.; Nibbering, D.; Panagiotelis, A. Bayesian forecasting in economics and finance: A modern review. Int. J. Forecast. 2023. [Google Scholar] [CrossRef]
Ma, S.; Fildes, R. Retail sales forecasting with meta-learning. Eur. J. Oper. Res. 2021, 288, 111–128. [Google Scholar] [CrossRef]
Yao, H.; Wu, F.; Ke, J.; Tang, X.; Jia, Y.; Lu, S.; Gong, P.; Ye, J.; Li, Z. Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar] [CrossRef]
Joseph, R.V.; Mohanty, A.; Tyagi, S.; Mishra, S.; Satapathy, S.K.; Mohanty, S.N. A hybrid deep learning framework with CNN and Bi-directional LSTM for store item demand forecasting. Comput. Electr. Eng. 2022, 103, 108358. [Google Scholar] [CrossRef]
Bian, T.; Chen, J.; Feng, Q.; Li, J. Comparing Econometric Analyses with Machine Learning Approaches: A Study on Singapore Private Property Market. Singap. Econ. Rev. 2022, 67, 1787–1810. [Google Scholar] [CrossRef]
Shiman, X. Comparison of Sales Prediction in Conventional Insights and Machine Learning Perspective. Psychology 2023, 13, 146–154. [Google Scholar] [CrossRef]
Suaza-Medina, M.E.; Zarazaga-Soria, F.J.; Pinilla-Lopez, J.; Lopez-Pellicer, F.J.; Lacasta, J. Effects of data time lag in a decision-making system using machine learning for pork price prediction. Neural Comput. Appl. 2023, 35, 19221–19233. [Google Scholar] [CrossRef]
Zohdi, M.; Rafiee, M.; Kayvanfar, V.; Salamiraad, A. Demand forecasting based machine learning algorithms on customer information: An applied approach. Int. J. Inf. Technol. 2022, 14, 1937–1947. [Google Scholar] [CrossRef]
Tugay, R.; Oguducu, S.G. Demand prediction using machine learning methods and stacked generalization. arXiv 2020, arXiv:2009.09756. [Google Scholar]
Hwang, S.; Yoon, G.; Baek, E.; Jeon, B.K. A Sales Forecasting Model for New-Released and Short-Term Product: A Case Study of Mobile Phones. Electronics 2023, 12, 3256. [Google Scholar] [CrossRef]
Vukovic, D.B.; Spitsina, L.; Gribanova, E.; Spitsin, V.; Lyzin, I. Predicting the Performance of Retail Market Firms: Regression and Machine Learning Methods. Mathematics 2023, 11, 1916. [Google Scholar] [CrossRef]
Gupta, R.; Lau, C.K.M.; Plakandaras, V.; Wong, W.K. The role of housing sentiment in forecasting U.S. home sales growth: Evidence from a Bayesian compressed vector autoregressive model. Econ. Res.-Ekon. Istraživanja 2019, 32, 2554–2567. [Google Scholar] [CrossRef]
Liu, Y.; Yang, X.; Zhu, C.; Meng, J. Drugs Sale Forecasting Based on SVR Integrated Promotion Factors. J. Phys. Conf. Ser. 2021, 1910, 012056. [Google Scholar] [CrossRef]
Dharmawan, P.A.S.; Indradewi, I.G.A.A.D. Double exponential smoothing brown method towards sales forecasting system with a linear and non-stationary data trend. J. Phys. Conf. Ser. 2021, 1810, 012026. [Google Scholar] [CrossRef]
Hardi, S.; Fakhrur Rozi, N. Data Mining Forecasting Sales of Building Materials on CV. Forward Together in Surabaya With Use Time Series. J. Phys. Conf. Ser. 2020, 1569, 022085. [Google Scholar] [CrossRef]
Kachniewska, M. The Use of Big Data in Tourism Sales Forecasting. Int. J. Contemp. Manag. 2020, 19, 7–35. [Google Scholar] [CrossRef]
Kurniawan, H.; Triloka, J.; Ardhan, Y. Analysis of the Artificial Neural Network Approach in the Extreme Learning Machine Method for Mining Sales Forecasting Development. Int. J. Adv. Comput. Sci. Appl. 2023, 14. [Google Scholar] [CrossRef]
Raju, S.M.T.U.; Sarker, A.; Das, A.; Islam, M.M.; Al-Rakhami, M.S.; Al-Amri, A.M.; Mohiuddin, T.; Albogamy, F.R. An Approach for Demand Forecasting in Steel Industries Using Ensemble Learning. Complexity 2022, 2022, 1–19. [Google Scholar] [CrossRef]
Dou, Z.; Sun, Y.; Zhang, Y.; Wang, T.; Wu, C.; Fan, S. Regional Manufacturing Industry Demand Forecasting: A Deep Learning Approach. Appl. Sci. 2021, 11, 6199. [Google Scholar] [CrossRef]
Massaro, A.; Panarese, A.; Giannone, D.; Galiano, A. Augmented Data and XGBoost Improvement for Sales Forecasting in the Large-Scale Retail Sector. Appl. Sci. 2021, 11, 7793. [Google Scholar] [CrossRef]
Panarese, A.; Settanni, G.; Vitti, V.; Galiano, A. Developing and Preliminary Testing of a Machine Learning-Based Platform for Sales Forecasting Using a Gradient Boosting Approach. Appl. Sci. 2022, 12, 11054. [Google Scholar] [CrossRef]
Aguilar-Palacios, C.; Munoz-Romero, S.; Rojo-Alvarez, J.L. Cold-Start Promotional Sales Forecasting Through Gradient Boosted-Based Contrastive Explanations. IEEE Access 2020, 8, 137574–137586. [Google Scholar] [CrossRef]
Raizada, S.; Saini, J.R. Comparative Analysis of Supervised Machine Learning Techniques for Sales Forecasting. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 102–110. [Google Scholar] [CrossRef]
Kalaiarasan, T.R.; An kumar, V.; Ratheesh, K.A.M. Sales Forecasting using RNN. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 2748–2751. [Google Scholar] [CrossRef]
Khan, M.A.; Saqib, S.; Alyas, T.; Ur Rehman, A.; Saeed, Y.; Zeb, A.; Zareei, M.; Mohamed, E.M. Effective Demand Forecasting Model Using Business Intelligence Empowered With Machine Learning. IEEE Access 2020, 8, 116013–116023. [Google Scholar] [CrossRef]
Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent Neural Networks for Time Series Forecasting: Current status and future directions. Int. J. Forecast. 2021, 37, 388–427. [Google Scholar] [CrossRef]
Liu, Q.; Zhao, X.; Shi, K. The analysis of agricultural Internet of things product marketing by deep learning. J. Supercomput. 2023, 79, 4602–4621. [Google Scholar] [CrossRef]
Liu, B.; Song, C.; Liang, X.; Lai, M.; Yu, Z.; Ji, J. Regional differences in China’s electric vehicle sales forecasting: Under supply-demand policy scenarios. Energy Policy 2023, 177, 113554. [Google Scholar] [CrossRef]
Mouthami, K.; Yuvaraj, N.; Pooja, R.I. Analysis of SARIMA-BiLSTM-BiGRU in Furniture Time Series Forecasting. In Lecture Notes in Networks and Systems; Springer Nature: Cham, Switzerland, 2023; pp. 959–970. [Google Scholar] [CrossRef]
Arunkumar, O.; Divya, D. Deep learning techniques for demand forecasting: Review and future research opportunities. Inf. Resour. Manag. J. 2022, 35, 1–24. [Google Scholar] [CrossRef]
Rizvi, S.M.; Syed, T.; Qureshi, J. Real-time forecasting of petrol retail using dilated causal CNNs. J. Ambient. Intell. Humaniz. Comput. 2022, 1–12. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
Vallés-Pérez, I.; Soria-Olivas, E.; Martínez-Sober, M.; Serrano-López, A.J.; Gómez-Sanchís, J.; Mateo, F. Approaching sales forecasting using recurrent neural networks and transformers. Expert Syst. Appl. 2022, 201, 116993. [Google Scholar] [CrossRef]
Lan, Y.; Wu, Y.; Xu, W.; Feng, W.; Zhang, Y. Chinese Fine-Grained Financial Sentiment Analysis with Large Language Models. arXiv 2023, arXiv:2306.14096. [Google Scholar]
Belvedere, V.; Goodwin, P. The influence of product involvement and emotion on short-term product demand forecasting. Int. J. Forecast. 2017, 33, 652–661. [Google Scholar] [CrossRef]
Schaer, O.; Kourentzes, N.; Fildes, R. Demand forecasting with user-generated online information. Int. J. Forecast. 2019, 35, 197–212. [Google Scholar] [CrossRef]
Ramos, P.; Santos, N.; Rebelo, R. Performance of state space and ARIMA models for consumer retail sales forecasting. Robot. Comput. Integr. Manuf. 2015, 34, 151–163. [Google Scholar] [CrossRef]
Singh, K.; Booma, P.M.; Eaganathan, U. E-Commerce System for Sale Prediction Using Machine Learning Technique. J. Phys. Conf. Ser. 2020, 1712, 012042. [Google Scholar] [CrossRef]
Zhao, L.; Liu, Z.; Mbachu, J. Energy Management through Cost Forecasting for Residential Buildings in New Zealand. Energies 2019, 12, 2888. [Google Scholar] [CrossRef]
Liu, J.; Chen, L.; Luo, R.; Zhu, J. A combination model based on multi-angle feature extraction and sentiment analysis: Application to EVs sales forecasting. Expert Syst. Appl. 2023, 224, 119986. [Google Scholar] [CrossRef]
Hu, M.; Li, H.; Song, H.; Li, X.; Law, R. Tourism demand forecasting using tourist-generated online review data. Tour. Manag. 2022, 90, 104490. [Google Scholar] [CrossRef]
Dai, W.; Chuang, Y.Y.; Lu, C.J. A Clustering-based Sales Forecasting Scheme Using Support Vector Regression for Computer Server. Procedia Manuf. 2015, 2, 82–86. [Google Scholar] [CrossRef]
van Ruitenbeek, R.E.; Koole, G.; Bhulai, S. Hierarchical Agglomerative Clustering for Product Sales Forecasting; Vrije Universiteit Amsterdam: Amsterdam, The Netherlands, 2023. [Google Scholar] [CrossRef]
Jiménez, F.; Sánchez, G.; García, J.M.; Sciavicco, G.; Miralles, L. Multi-objective evolutionary feature selection for online sales forecasting. Neurocomputing 2017, 234, 75–92.
Sohrabpour, V.; Oghazi, P.; Toorajipour, R.; Nazarpour, A. Export sales forecasting using artificial intelligence. Technol. Forecast. Soc. Chang. 2021, 163, 120480.
Zhou, X.; Li, Z.; Feng, X.; Yan, H.; Chen, D.; Yang, C. A hybrid deep learning framework driven by data and reaction mechanism for predicting sustainable glycolic acid production performance. AIChE J. 2023, 69, e18083.
Fan, G.F.; Wei, X.; Li, Y.T.; Hong, W.C. Forecasting electricity consumption using a novel hybrid model. Sustain. Cities Soc. 2020, 61, 102320.
Ma, X.; Li, M.; Tong, J.; Feng, X. Deep Learning Combinatorial Models for Intelligent Supply Chain Demand Forecasting. Biomimetics 2023, 8, 312.
Sarpong-Streetor, R.M.N.Y.; Sokkalingam, R.; Othman, M.; Azad, A.S.; Syahrantau, G.; Arifin, Z. Intelligent Hybrid ARIMA-NARNET Time Series Model to Forecast Coconut Price. IEEE Access 2023.
Ding, Y.; Wu, P.; Zhao, J.; Zhou, L. Forecasting product sales using text mining: A case study in new energy vehicle. Electron. Commer. Res. 2023, 1–33.
Zhao, J.; Xiong, F.; Jin, P. Enhancing Short-Term Sales Prediction with Microblogs: A Case Study of the Movie Box Office. Future Internet 2022, 14, 141.
Nti, I.K.; Adekoya, A.F.; Weyori, B.A. A comprehensive evaluation of ensemble learning for stock-market prediction. J. Big Data 2020, 7.
Livieris, I.E.; Pintelas, E.; Stavroyiannis, S.; Pintelas, P. Ensemble Deep Learning Models for Forecasting Cryptocurrency Time-Series. Algorithms 2020, 13, 121.
Gandhudi, M.; Gangadharan, G.R.; Alphonse, P.J.; Velayudham, V.; Nagineni, L. Causal aware parameterized quantum stochastic gradient descent for analyzing marketing advertisements and sales forecasting. Inf. Process. Manag. 2023, 60, 103473.
Chatterjee, S.; Chaudhuri, R.; Gupta, S.; Sivarajah, U.; Bag, S. Assessing the impact of big data analytics on decision-making processes, forecasting, and performance of a firm. Technol. Forecast. Soc. Chang. 2023, 196, 122824.
Koch, F.H.; Prestemon, J.P.; Donovan, G.H.; Hinkley, E.A.; Chase, J.M. Predicting cannabis cultivation on national forests using a rational choice framework. Ecol. Econ. 2016, 129, 161–171.
Meng, J.; Yang, X.; Yang, C.; Liu, Y. Comparative Analysis of Prophet and LSTM Model in Drug Sales Forecasting. J. Phys. Conf. Ser. 2021, 1910, 012059.
Zhou, C.; Che, C.; Wang, P.; Zhang, Q. Diformer: A dynamic self-differential transformer for new energy power autoregressive prediction. Knowl.-Based Syst. 2023, 281, 111061.
Purnama, D.A.; Masruroh, N.A. Online data-driven concurrent product-process-supply chain design in the early stage of new product development. J. Open Innov. Technol. Mark. Complex. 2023, 9, 100093.
Basu, R.; Lim, W.M.; Kumar, A.; Kumar, S. Marketing analytics: The bridge between customer psychology and marketing decision-making. Psychol. Mark. 2023, 40, 2588–2611.
Brasse, J.; Broder, H.R.; Förster, M.; Klier, M.; Sigler, I. Explainable artificial intelligence in information systems: A review of the status quo and future research directions. Electron. Mark. 2023, 33, 26.
Alzubaidi, L.; Bai, J.; Al-Sabaawi, A.; Santamaría, J.; Albahri, A.S.; Al-dabbagh, B.S.N.; Fadhel, M.A.; Manoufali, M.; Zhang, J.; Al-Timemy, A.H.; et al. A survey on deep learning tools dealing with data scarcity: Definitions, challenges, solutions, tips, and applications. J. Big Data 2023, 10, 46.
Papastefanopoulos, V.; Linardatos, P.; Panagiotakopoulos, T.; Kotsiantis, S. Multivariate Time-Series Forecasting: A Review of Deep Learning Methods in Internet of Things Applications to Smart Cities. Smart Cities 2023, 6, 2519–2552.
Haghani, M.; Sprei, F.; Kazemzadeh, K.; Shahhoseini, Z.; Aghaei, J. Trends in electric vehicles research. Transp. Res. Part D Transp. Environ. 2023, 123, 103881.
Lim, B.; Zohren, S. Time-series forecasting with deep learning: A survey. Philos. Trans. R. Soc. A 2021, 379, 20200209.
Henrique, B.M.; Sobreiro, V.A.; Kimura, H. Literature review: Machine learning techniques applied to financial market prediction. Expert Syst. Appl. 2019, 124, 226–251.
Khadjeh Nassirtoussi, A.; Aghabozorgi, S.; Ying Wah, T.; Ngo, D.C.L. Text mining for market prediction: A systematic review. Expert Syst. Appl. 2014, 41, 7653–7670.
Han, Z.; Zhao, J.; Leung, H.; Ma, K.F.; Wang, W. A review of deep learning models for time series prediction. IEEE Sens. J. 2019, 21, 7833–7848.

Figure 1. Methodology overview.

Figure 2. The evolution of the number of published works on sales forecasting over time.

Figure 3. Top 10 paper venues.

Figure 4. Word cloud of sales forecasting literature based on the papers’ titles.

Figure 5. Word cloud of sales forecasting literature based on the papers’ abstracts.

Figure 6. Distribution of dataset by privacy type.

Figure 7. Distribution of evaluation metrics used in studies for sales forecasting.

Figure 8. The ROC curve and its corresponding AUC.

Figure 9. Mind map of techniques and models used in sales forecasting.

Figure 10. Evolution of sales forecasting techniques from 2013 to 2023.

Table 1. Inclusion/exclusion criteria for the study on sales forecasting.

Selection Criteria	Criteria Description
Inclusion criteria (3801 sources identified)	- The title, abstract, or keywords match the search string in Table 2. - The paper was published in 2013 or later.
Exclusion criteria (1070 sources selected)	- The source is not a research paper (blog, presentation, etc.). - The source is a duplicate. - The source is not open access. - The source is not in English or French.

Table 2. Search query structure.

(“sales” OR “revenue” OR “product” OR “finance” OR “market” OR “industry” OR “demand”) AND (“prediction” OR “prevision” OR “forecasting” OR “estimating” OR “recommendation” OR “machine learning” OR “regression” OR “time series” OR “artificial intelligence” OR “deep learning” OR “neural networks” OR “data mining” OR “predictive analytics” OR “predictive modeling”)
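
The Boolean structure of the query in Table 2 can be illustrated with a short Python sketch. It is not the authors' tooling: the function names `contains_any` and `matches_query` are hypothetical, whole-word and case-insensitive matching is an assumption about how a bibliographic database would treat the terms, and reading “demand” as one more OR-term of the first group is also an assumption.

```python
import re

# Term groups taken from Table 2. Treating "demand" as an ordinary OR-term
# of the first group is an assumption about the intended query.
TOPIC_TERMS = ["sales", "revenue", "product", "finance", "market",
               "industry", "demand"]
METHOD_TERMS = [
    "prediction", "prevision", "forecasting", "estimating", "recommendation",
    "machine learning", "regression", "time series",
    "artificial intelligence", "deep learning", "neural networks",
    "data mining", "predictive analytics", "predictive modeling",
]


def contains_any(text, terms):
    """Return True if any term occurs in text as a whole word or phrase."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in terms
    )


def matches_query(title, abstract="", keywords=""):
    """Apply (topic OR ...) AND (method OR ...) to title/abstract/keywords."""
    text = " ".join([title, abstract, keywords])
    return contains_any(text, TOPIC_TERMS) and contains_any(text, METHOD_TERMS)


if __name__ == "__main__":
    # Titles taken from the reference list of this article.
    print(matches_query("Export sales forecasting using artificial intelligence"))
    print(matches_query("Trends in electric vehicles research"))
```

Under these assumptions the first title satisfies both groups, while the second contains no term from either group and is rejected.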

Table 3. Data extraction form.

No.	Attribute Name	Research Question
1	Abstract	RQ₃
2	Article year	RQ₁, RQ₈
3	Citation (number/BibTeX)	-
4	Domain area	RQ₉
5	Experimental performance of proposed models	RQ₇
6	Findings	RQ₄, RQ₁₀
7	Journal/conference name	RQ₃
8	Limitations	RQ₆
9	Models and technology	RQ₇, RQ₈
10	Objective	RQ₇
11	Performance measures	RQ₅
12	Study title	RQ₃
13	Type of study	RQ₂

Table 4. Summary of studies by data privacy type.

Privacy	Data Type	Example of Studies	Number of Studies
Public	Time series data	[10,11,12]	61
	Tabular data	[13,14,15]	15
	Textual data	[16,17]	2
	Combined data	[18,19,20]	6
Private	Time series data	[21,22,23]	115
	Tabular data	[24,25,26]	30
	Textual data	[27,28,29]	5
	Combined data	[30,31,32]	42
	Image data	[33]	1
	Network data	[34,35]	3
Hybrid	Time series data	[36,37]	87
	Tabular data	[38,39,40]	24
	Textual data	[41,42,43]	6
	Combined data	[20,44,45]	58
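
As a small worked example, the row counts in Table 4 can be aggregated per privacy type. The sketch below only sums the numbers as they are listed; the names `TABLE_4`, `totals_by_privacy`, and `shares` are hypothetical, and it assumes each study is counted in exactly one row, which the table does not state.

```python
# Counts copied from Table 4: privacy type -> data type -> number of studies.
TABLE_4 = {
    "Public": {"Time series": 61, "Tabular": 15, "Textual": 2, "Combined": 6},
    "Private": {"Time series": 115, "Tabular": 30, "Textual": 5,
                "Combined": 42, "Image": 1, "Network": 3},
    "Hybrid": {"Time series": 87, "Tabular": 24, "Textual": 6, "Combined": 58},
}


def totals_by_privacy(table):
    """Sum the study counts of all data types within each privacy type."""
    return {privacy: sum(rows.values()) for privacy, rows in table.items()}


def shares(totals):
    """Convert per-category totals into fractions of the grand total."""
    grand_total = sum(totals.values())
    return {name: count / grand_total for name, count in totals.items()}


totals = totals_by_privacy(TABLE_4)
for privacy, share in shares(totals).items():
    print(f"{privacy}: {totals[privacy]} studies ({share:.1%})")
```

With the counts as listed, the rows sum to 84 public, 196 private, and 175 hybrid entries.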

Table 5. Confusion matrix.

	Predicted Positive	Predicted Negative
Actual Positive	True positive (TP)	False negative (FN)
Actual Negative	False positive (FP)	True negative (TN)
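
The classification metrics listed in Section 3.5.2 (accuracy, precision, recall, and F1 score) follow directly from the four cells of a binary confusion matrix: true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN). The following is a minimal sketch using the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from any reviewed study.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute standard metrics from the cells of a binary confusion matrix.

    Ratios with a zero denominator are reported as 0.0 (a convention chosen
    for this sketch, not something prescribed by the article).
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

Precision divides by the predicted positives (TP + FP) and recall by the actual positives (TP + FN), which is why the off-diagonal cells of the matrix must be distinguished as FN and FP.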

Table 6. Comparison of reviews on sales forecasting studies.

Study	Methodology	Focus Area	Key Findings	Gaps Identified
[114]	State-of-the-art overview of marketing analytics	Marketing, specifically analyzing customer behavior	Provides insights into customer behavior and decision-making processes	Limited depth in specific industries; focuses broadly on marketing
[115]	Discusses XAI methods, particularly in AI transparency	Information systems, particularly in AI applications	Discusses the importance of transparency in AI applications	Does not address specific industry challenges
[116]	Comprehensive survey of deep learning tools	Various applications of deep learning across industries	Outlines the utility and application of deep learning tools in data analytics	Broad focus, lacks industry-specific insights
[117]	Examines deep learning architectures like RNN and LSTM	Smart cities and related applications	Reviews the effectiveness of time series forecasting in smart city management	Specific to smart cities, not applicable to other industries directly
[118]	Electrification of vehicles, trends, and implications	Transportation, focusing on electric vehicles	Discusses current trends and future directions in electric vehicle markets	Focused solely on electric vehicles, not covering other automotive areas
[119]	Deep learning architectures for time series forecasting	Various applications including climate, finance, retail, and healthcare	Effective in handling complex time series data, outperforming traditional methods	Complexity and black-box nature of deep learning models
[120]	Machine learning models, especially SVMs and neural networks	Financial markets, particularly stock market forecasting	Effective in analyzing financial time series; highlights need for research in developing markets	Predicting financial markets remains complex due to inherent unpredictability
[121]	Text mining in online sentiment analysis	Financial markets, sentiment analysis for stocks and FOREX	Potential of text mining to predict market trends emphasized	Challenge in predicting market movements due to complex data
[122]	Various deep learning models like CNNs, LSTMs, and GANs	Time series prediction across various fields	Unique advantages of models in specific scenarios highlighted	Computational intensity and overfitting issues noted

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

