1. Introduction
Inventory management involves tactical decisions for managing and controlling inventory levels, which are essential for meeting customer demand at the right time and at the best cost, thereby improving the production and delivery of end products [1]. Inventory control attempts to meet customer demand at a predetermined service level. By assuming deterministic demand, most existing studies strive to minimize the total of projected ordering and inventory carrying costs [2]. Nevertheless, in practice, owing to market fluctuations, consumer demand becomes exceedingly unpredictable [3]. In a fiercely competitive market, precise demand forecasting is essential for streamlining inventory control, cutting expenses, and increasing income. Accurate and timely forecasting of future demand helps retailers match marketing, distribution, and production to anticipated consumer demand, improving both operational effectiveness and customer satisfaction [4]. In general, holding either excessive or insufficient inventory results in inefficient stock management and negative outcomes for the upstream supply chain [5]. For instance, too much inventory can lead to an oversupply scenario in which the quantity of stored goods significantly surpasses market needs [6]. In this scenario, the related inventory expense becomes elevated, as many items must be kept for an extended period, resulting in slow or inadequate resource turnover for businesses [7]. Conversely, inadequate inventory results in insufficient coverage of customer demand, which can gradually lead to product shortages, diminished customer trust, and potentially reduced profits [8]. Despite the difficulties of attaining precise demand forecasting amid market uncertainties, the abundance of historical data and the application of big data analytics have enhanced the accuracy of demand forecasting [9].
Numerous previous studies have extended classical inventory models to account for uncertainty and variability. One study developed the Economic Order Quantity (EOQ) model with Monte Carlo simulation to account for demand variability and time-sensitive product quality [10]. Another study examined lead time and defective products as uncertainty factors in addition to demand by formulating a Min–Max inventory method [11]. A further approach to addressing uncertainty in inventory is the use of predictive techniques, such as ensemble deep learning for optimizing order-up-to-level inventory in demand forecasting [12]. Nevertheless, none of these studies consider both the uncertainties in supply and demand and the uncertainty of discount events offered by the supplier at any given time. Such a discount event can be viewed as an uncertain or fluctuating price, which has become increasingly prevalent lately [13]. Demand forecasting is crucial for many industries, but the retail industry presents greater challenges than others, primarily because of the lack of precise knowledge about the factors that will influence demand. Seasonality and trend are two further factors that heavily impact sales in the apparel sector [14]. Sales in the retail sector typically happen quickly: customers visit other stores to satisfy their needs if they are unable to find the item they desire, which translates into lower sales for the store. As a result, it is critical to anticipate customer needs precisely [15]. If production exceeds real demand, the consequences include higher stock expenses, stock carried over to subsequent years, items falling out of fashion, and customer loss. If production falls short of demand, sales decline and the brand's reputation suffers, as customers are unable to find the products; this may also erode brand trust and customer loyalty. The retail fashion and apparel sector in particular is known for its dynamic, fast-paced environment, fueled by shifting consumer tastes, seasonal fashions, and changing market dynamics [16,17]. Consumers today want their products delivered quickly and like to choose from a vast array of alternatives, making it difficult for fashion retailers to satisfy customer needs promptly [18]. As a result, it is critical that sellers of fashion apparel act quickly and effectively when restocking inventory by forecasting future demand patterns [19].
Deep learning is a branch of machine learning that uses multi-layered artificial neural networks and has shown impressive results in identifying complex patterns and connections in big datasets [20,21]. Retailers can use sophisticated algorithms to derive valuable insights from a variety of data sources, such as past sales, product features, industry trends, and external variables, by utilizing deep learning models [13]. Accurately predicting future demand and sales of fashion items remains a major challenge for both business and academia; addressing it requires studying the intricacy of the fashion industry and the managerial techniques that enable products to be created, manufactured, and delivered on schedule [14]. By utilizing deep learning techniques to forecast demand in the retail clothing and fashion industry and investigating the possibilities of models such as transformers, convolutional neural networks (CNNs) [3], and recurrent neural networks (RNNs) [15], it is possible to increase the precision and dependability of demand forecasts and address the shortcomings of conventional forecasting techniques [22]. The fashion industry is trying to manage these quick changes more effectively by implementing an agile supply chain [16,17]. By presenting a novel strategy for building an intelligent demand forecasting system that employs cutting-edge deep learning techniques and historical product images, this study addresses the shortcomings of traditional forecasting systems [2]. However, it is crucial to understand the challenges associated with demand forecasting models in the fashion and apparel retail industry. Fashion cycles are seasonal and often highly volatile. External factors such as economic conditions and social media trends cannot be overlooked, and the data from this industry may include historical sales, current trends, and consumer data. Existing models fail to consider fluctuating trends in consumer preferences and other characteristics.
Therefore, this research offers a significant improvement to demand forecasting in fashion and apparel retail using Formicary Zebra (FZ) optimization with a distributed attention-based convolutional–recurrent (DACR) model. This integration leverages the inherent adaptability of FZ to optimize the feature extraction and sequential learning strengths of deep learning architectures. In this study, FZ is used to select the most relevant search space, which helps improve the efficiency of the model. The key contributions of the proposed model are as follows.
Formicary Zebra Optimization (FZ): The FZ algorithm is a novel optimization algorithm inspired by the foraging behavior of zebras and the path-search capability of ants, designed to obtain the optimal solution. Based on patterns of collected and shared collective intelligence, the FZ algorithm balances the exploration–exploitation trade-off, avoiding redundant search across the solution space. Furthermore, the incorporation of these unique traits improves the foraging and decision-making concepts used to solve high-dimensional optimization problems and provides improved performance.
FZ-DACR: In the proposed model, the combination of the convolutional neural network (CNN) and the recurrent neural network (RNN) model offers the benefits of extracting the context and hierarchical patterns present in the input data. Specifically, the proposed model is trained over the features extracted from product images and historical sales data. Furthermore, the distributed attention mechanism allows the model to focus on the important regions in the sequence data representations using the attention weights, improving forecasting performance.
The study is organized as follows. Section 2 concentrates on contemporary techniques and provides an overview of the strategies and obstacles faced. Section 3 explains in detail the recommended method for demand forecasting and the mathematical formulation of the DACR model. Section 4 examines the outcomes of the DACR technique. Section 5 provides a thorough examination of the study's findings as the conclusion.
2. Literature Review
Chandadevi Giri et al. [2] used ML clustering and classification algorithms to propose a novel sales forecasting approach for fashion products. Their clustering and classification model was trained on real fashion retail data and has the potential to support several supply chain planning activities in the fashion garment business. The results show that a fashion retailing business can forecast the sales of a new item by using images and the item's past data. However, the image database could be enlarged in subsequent work to increase model accuracy.
Simsir Fuat and Ilker Guven [8] examined the impact of product variety and artificial intelligence on sales forecasting in the retail apparel sector, intending to minimize errors. Within this framework, artificial intelligence models, including support vector machines (SVMs) and artificial neural networks (ANNs), were developed, attaining high performance on the datasets and efficiently reducing over-fitting issues. They note that sales decline and brand prestige suffers when supply is insufficient relative to demand, because customers cannot locate the products.
Tran Thi Bich Chau et al. [11] presented demand forecasting and inventory prediction for apparel products. Their study predicted demand using the ARIMA (Autoregressive Integrated Moving Average) model and proposed a policy framework for production and forecasting models to ensure optimal inventory levels and enhance enterprise supply capacity. A subsequent goal is to identify the distribution path for the whole distribution system.
Giri Chandadevi et al. [3] presented a deep learning and non-linear neural network regression technique for fashion product sales forecasting. The approach appears promising for estimating the quantity of products to be sold in the future. The study's primary weakness stems from the use of a small dataset; this restriction should be overcome in future work by choosing a sizable image dataset to enhance model performance. The strategy may help fashion retailers and designers in the big data and data mining era.
Majd Kharfan et al. [10] used machine learning techniques to provide a data-driven forecasting solution for recently launched seasonal products. Forecasting affects a company's entire supply chain activities and is influenced by a variety of implicit product attributes in addition to sales patterns. The study suggests a data-driven, machine learning-based methodology to improve demand forecasting for recently launched products in the fashion retailing industry. Decision makers can use the suggested methodology to anticipate demand for newly introduced products even in the absence of previous data for fashion supply chains. Still, questions remain regarding the features to be used and the choice of machine learning algorithms for further research.
Smit Marvaniya et al. [18] first suggested the use of modular neural networks for demand forecasting with seasonal climate predictions. Climate-aware demand forecasting in the seasonal context involves employing a sub-neural-network architecture to efficiently learn joint latent representations of historical data, known inputs, and climate predictions. The model effectively overcomes existing problems and attains high performance. However, procedures are still needed to assess the uncertainty associated with demand projections and to propagate it from ensemble forecasts at different granularities.
Muhammad Yasir et al. [5] aimed to forecast the output of the textile and clothing sectors. GLS and single-layer perceptron models were applied to the time-series dataset of a textile apparel company. The model has high potential to forecast the sectors' output with high accuracy. However, such time-series datasets are rarely available, a limitation that also affected this investigation; future research must therefore be based on large datasets. It is also necessary to include more macro-level (exogenous) elements in the forecasting models.
Sushil Punia et al. [15] presented a novel forecasting technique that integrates deep learning and offers an advantage over previous forecasting techniques. The suggested method can represent intricate correlations of both temporal and regression types. For time-series forecasting challenges, more recent and sophisticated neural networks, including spiking neural networks and convolutional neural networks (CNNs), were used; CNNs handle large amounts of data with notable success. However, an overfitting issue persists in the model, affecting overall performance.
Javad Feizabadi [19] introduced a hybrid demand forecasting method grounded in machine learning, combining ARIMAX and a neural network. The method was applied and evaluated in the context of a functional product and a steel manufacturer. However, the major limitation of the research is the use of a single dataset, which limits the generalization ability of the model.
Marco A. Villegas [20] presented a new model selection approach that combines different criteria using a support vector machine (SVM). The methodology is promising for scenarios with highly volatile demand, as it allows the model to be changed when it does not fit the data sufficiently well, thereby reducing the risk of misusing modeling techniques in the automatic processing of large datasets. However, the model learns slowly when new trends and outside forces are involved.
Rathipriya et al. [23] introduced the Demand Forecast Model (DFM), which assists pharmaceutical firms in achieving high efficiency in the global market. Specifically, the DFM employed diverse shallow and deep neural networks to improve demand forecasting. In addition, the DFM effectively suggested sales and marketing strategies considering trends or seasonal effects associated with diverse groups of pharmaceutical products with dissimilar characteristics. In the overall analysis, DFMs utilizing shallow neural networks efficiently predicted the future demand for pharmaceutical products.
Chandriah and Naraganahalli [24] implemented an RNN/LSTM with a modified Adam optimizer that effectively predicted the demand for spare parts. To improve the RNN/LSTM model's performance, the weights were optimally tuned using the Modified-Adam algorithm. Experimental validation demonstrated that the RNN/LSTM with Modified-Adam yields minimal errors compared to the other baseline techniques used for comparison.
Zohdi et al. [25] utilized machine learning algorithms to address the bias of statistical methods in demand prediction, introducing an extreme learning machine to improve intermittent demand prediction. The model's accuracy and performance were validated against other machine learning models, including the decision tree, gradient boosting, K-nearest neighbors, and multi-layer perceptron.
Most existing demand prediction techniques are limited in providing precise demand forecasts. Specifically, existing methods have difficulty handling the complex patterns hidden in the attributes, and modeling their subordinate and complementary relationships adds further complexity. Existing models need to be trained on large datasets to function properly, which limits their ability to forecast demand for products that are new or subject to high demand variability. Due to limited feature representations, such models have difficulty extracting the complex and dynamic relations in historical sales data. Furthermore, existing models are hard to fit to large data for automatic processing and slow to learn, resulting in subpar performance when analyzing new trends. The proposed research therefore utilizes the FZ-DACR model with an advanced architecture that extracts the complex relationships present in historical sales data using effective feature representations, minimizing computational complexity, and improving generalization performance.
Challenges
- ➢
Some of the common issues that fashion retailers face are missing, incorrect, and noisy data. The insufficient capacity of existing frameworks to handle such problems is one of their weaknesses [2].
- ➢
Numerous types of relationships can exist between products, including subordinate, complementary, and cannibalizing relationships. Currently available models may not adequately capture these interdependencies [8].
- ➢
Many fashion items have relatively short lifetimes, which dramatically limits the availability of historical data for forecasting. Current models need to be trained on large datasets to function properly, severely limiting the capability of demand forecasting for products that are new or subject to high demand variability [8].
- ➢
Certain contemporary models have difficulty extracting complex and dynamic relationships due to limited feature representations [11].
- ➢
Fashion demand may be shaped by several external factors, such as the climate, changes in the economy, social media, and celebrity endorsements. Historical models may not fully include such factors, often resulting in poor forecasts [3].
- ➢
The SVM model has difficulty fitting large data for automatic processing and learns slowly, resulting in subpar performance when new trends and outside forces are involved in the forecasting [20].
The proposed FZ-DACR model addresses these challenges through the FZ algorithm, which improves the decision-making concepts and solves the high-dimensional optimization problem, resulting in improved performance. The combination of the CNN and RNN algorithms assists in extracting the context and hierarchical patterns present in the input historical sales data. In addition, the proposed model is trained on features extracted from both product images and historical sales data, which improves generalization performance. The distributed attention mechanism highlights the important regions in the sequence data representations using attention weights, improving forecasting performance. Moreover, hyperparameter tuning using the proposed FZ algorithm provides the optimal configuration for minimizing computational complexity while improving demand forecasting performance.
3. Proposed Methodology for Demand Forecasting
Existing demand forecasting models face limitations, including poor performance under fluctuating fashion trends, limited resources, incorrect feature selection, overfitting, high computation time, and poor generalizability. The proposed research offers an effective methodology that incorporates advanced pre-processing, feature extraction, and a segmentation process for effective feature extraction, which supports accurate demand prediction. A hybrid FZ optimization is developed to overcome existing limitations, such as poor convergence, false errors, and complexity issues. Furthermore, the combination of a distributed attention mechanism with deep learning models improves the training process for demand forecasting while simultaneously reducing errors. The proposed FZ-DACR model is trained on three different fashion datasets (dress, skirt, and T-shirt), as shown in Figure 1. Each dataset includes diverse consumer reviews and product images, which enhance the generalizability of the model on unseen data.
The input data comprise price information, product qualities, historical sales indicators, and other relevant details in image and text formats. Initially, the review data are processed through pre-processing and feature extraction stages to guarantee data consistency and quality. Similarly, the image input is processed through segmentation and feature extraction procedures. A key component of the process is feature extraction, which includes TF-IDF (Term Frequency–Inverse Document Frequency) and graph-based feature extraction techniques for review data, in order to examine the links between fashion items and capture word importance within brand names and product descriptions.
In contrast, the input image is processed with the Formicary Zebra Optimization-based segmentation process to separate fashion objects from the background. The segmented images are then fed into pre-trained deep neural networks, VGG16 and ResNet101, for extraction of texture, form, and fine features. The extracted feature vectors from the review data and image data are forwarded to the deep learning architecture, known as DACR. The proposed DACR model comprises a CNN model, a Zero Attention mechanism, and an RNN model for extracting the input features for accurate demand forecasting. Additionally, the proposed model integrates a distributed attention mechanism that enables it to concentrate on pertinent aspects from many data sources. The features extracted from review and image data through the CNN model are concatenated and sent to the RNN layer. Furthermore, Formicary Zebra Optimization adaptively optimizes the FZ-DACR model parameters, boosting demand forecasting performance and generating accurate results. The method's outcome is a set of demand forecasts for fashion items, which give traders useful information for improving supply chain decisions, pricing tactics, and inventory management.
3.1. Input Data for Demand Forecasting
The demand forecasting model for the fashion industry utilizes input data such as fashion images, reviews, news, etc. In this research, three different fashion datasets (dress, skirt, and T-shirt) are combined into a single Flipkart Grid Software Development Challenge 2.0 dataset [26], which is then used to train and test the proposed FZ-DACR model for accurate demand forecasting. The input image data are collected through scraping from the original dataset. This dataset also includes product reviews stored as time-series data, which are mathematically presented as
$D = \{I_1, I_2, \ldots, I_u\}$
where $D$ denotes the database, $I_j$ represents the input data, and $u$ denotes the number of classes.
3.2. Pre-Processing for Review Data
The source dataset contains a time series of data that includes historical sales characteristics with reviews, URLs, brand items, discounted prices, MRP, stars, and ratings, providing the basis for forecasting the demand for fashion products. The input review data pass through the pre-processing phase, where a text cleaning process removes symbols, punctuation, stop words, and distracting elements from the text. This ensures that the textual data derived from consumer reviews are pre-processed into a reliable dataset that aids more precise demand forecasting. The pre-processed input review data are mathematically represented as
$P_r = \{p_1, p_2, \ldots, p_u\}$
where $u$ denotes the number of classes and $p_j$ represents the pre-processed review data.
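The text-cleaning step described above can be sketched as follows; the regular expression and the tiny stop-word list are illustrative assumptions, not the exact pipeline used in this study (a full pipeline would use a standard stop-word list, e.g., NLTK's).

```python
import re

# Illustrative stop-word list (assumption; not the study's actual list).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of", "in", "it"}

def clean_review(text: str) -> str:
    """Remove symbols, punctuation, and stop words from a raw review string."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # strip punctuation and symbols
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

cleaned = clean_review("The dress is GREAT!!! Worth the price :)")
# → "dress great worth price"
```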
3.3. Feature Extraction for Review Data
The pre-processed review data are forwarded into a combined feature extraction technique to obtain different aspects of the information contained in the reviews. This involves TF-IDF graph embedding techniques, which include cosine similarity, cosine distance, and linear kernels. Each of these techniques gives a different view of the data and thus improves the reliability of feature extraction.
Term Frequency–Inverse Document Frequency (TF-IDF)-Based Graph Embedding
To determine a word’s statistical significance in a document, the TF-IDF approach is utilized based on a keyword extraction technique in which the text is converted into vector values.
Its primary premise is that a word's significance to a text is positively correlated with how often it appears in the document and negatively correlated with how often it appears in the corpus of documents. The term frequency is measured by the frequency of a word occurring in the document, mathematically expressed as
$TF(w, d) = n_{w,d}$
where $d$ represents the document, $w$ represents the word, and $n_{w,d}$ represents the number of times the word $w$ appears in the document $d$.
Inverse Document Frequency (IDF) is used to reduce the weights of words that appear in multiple documents in the corpus, which is defined as
$IDF(w) = \log \frac{M}{M_w + 1}$
where $M_w$ represents the number of documents that contain the word $w$ and $M$ is the total number of documents in the corpus. We typically add 1 to the denominator to account for words that do not appear in the corpus, which would otherwise result in a 0 denominator. The final calculation of TF-IDF is defined as
$TFIDF(w, d) = TF(w, d) \times IDF(w)$
The TF-IDF score is then computed for all of the terms in the documents, and keywords can be obtained by choosing the terms with higher TF-IDF values. TF-IDF-based graph embedding serves as a dimensionality-reduction step that works well because it retains the inherent features and connections of the initial data. These transformations enable techniques such as cosine similarity, cosine distance, and linear kernels to improve the graph embeddings when used in demand forecasting. Therefore, TF-IDF-based graph embedding provides a better perspective on the structure and relationships of the data and thereby improves the comprehensibility of the prediction results.
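A minimal sketch of this weighting, using the raw-count term frequency and the +1 denominator smoothing described above; the toy corpus is purely illustrative.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Compute TF-IDF scores for every (document, word) pair.

    TF is the raw count of a word in a document; IDF = log(M / (M_w + 1)),
    where M is the corpus size and M_w the number of documents containing the word.
    """
    M = len(corpus)
    docs = [doc.split() for doc in corpus]
    df = Counter(w for doc in docs for w in set(doc))  # document frequency per word
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        scores.append({w: tf[w] * math.log(M / (df[w] + 1)) for w in tf})
    return scores

corpus = ["red dress summer", "blue skirt summer", "red dress party"]
scores = tf_idf(corpus)
# "summer" appears in 2 of the 3 documents, so its weight is lower than that of
# "party", which appears in only 1.
```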
- (i)
Cosine Similarity
Cosine similarity measures the similarity of two vectors, generally texts or particular segments of texts, based on the cosine of the angle formed by the two vectors in a high-dimensional space. It is estimated as the ratio of the dot product of the two vectors $B$ and $C$ to the product of the magnitudes of the vectors. A score of 1 means that the two vectors are parallel, conveying the same information; a score of 0 means that the two vectors are perpendicular, containing unrelated information; and a score of −1 means that the two vectors are opposite and contain completely different information.
$\cos(B, C) = \frac{B \cdot C}{\|B\| \, \|C\|}$
where $\|B\|$ and $\|C\|$ are the magnitudes of vectors $B$ and $C$, respectively.
- (ii)
Cosine Distance
Cosine distance measures how dissimilar two vectors, which can represent documents or text segments, are from one another by using the complement of cosine similarity: it is defined as one minus the cosine of the angle formed by the vectors. This metric measures how orthogonal the orientations of the two vectors are in a high-dimensional space. Cosine distance eliminates the dependence on vector magnitudes while keeping the angle between vectors as the parameter for dissimilarity, and thus allows for reliable measurement of the intrinsic dissimilarity of textual content.
- (iii)
Linear Kernels
The linear kernel measures similarity based on the dot product of two input vectors $B$ and $C$, which generates a single numerical value. The linear kernel $K$ is mathematically expressed as
$K(B, C) = B \cdot C$
Finally, the graph embedding feature vector $G_e$ is generated by concatenating the findings of cosine similarity, cosine distance, and linear kernels. This hybrid vector is mathematically defined as
$G_e = \beta \, S_{c} \oplus \gamma \, D_{c} \oplus \delta \, K$
In Equation (9), β, γ, and δ represent the weights assigned according to the importance of each feature type, $S_c$ and $D_c$ denote the cosine similarity and cosine distance features, and $\oplus$ denotes concatenation. Graph embedding thus combines multiple features, including the cosine similarity, cosine distance, and linear kernel, into a unified feature vector, improving the representation of a term or document in different machine learning tasks.
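The three measures and their weighted concatenation can be sketched as follows; the unit values for the weights β, γ, and δ and the scalar (pairwise) form of the features are simplifying assumptions for illustration.

```python
import numpy as np

def cosine_similarity(b, c):
    """Cosine of the angle between vectors b and c."""
    return float(np.dot(b, c) / (np.linalg.norm(b) * np.linalg.norm(c)))

def cosine_distance(b, c):
    """One minus the cosine similarity."""
    return 1.0 - cosine_similarity(b, c)

def linear_kernel(b, c):
    """Plain dot product of the two vectors."""
    return float(np.dot(b, c))

def graph_embedding(b, c, beta=1.0, gamma=1.0, delta=1.0):
    """Concatenate the three weighted similarity measures into one hybrid vector."""
    return np.array([beta * cosine_similarity(b, c),
                     gamma * cosine_distance(b, c),
                     delta * linear_kernel(b, c)])

b = np.array([1.0, 2.0, 3.0])
c = np.array([1.0, 2.0, 3.0])
g = graph_embedding(b, c)  # identical vectors: similarity 1, distance 0, kernel 14
```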
3.4. Optimization-Based Segmentation for Image Data
The input image is gathered from the dataset using a scraping method and passed to the segmentation process. In this research, the segmentation process uses a thresholding method to transform the original input image into a binary format, separating the segmented object from the background. Binary thresholding maps the original pixel intensity values, spanning a range of 0 to 255, onto a binary representation of either 0 or 1. Furthermore, the segmentation process is enhanced by the integration of Formicary Zebra Optimization, which applies foraging and path-finding strategies to segment fashion items from their backgrounds. The optimized segmentation therefore captures the important features of fashion items across diverse image complexities and patterns, including texture, color, and shape. The segmented image is then further processed by the feature extraction techniques explained in the next section.
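A minimal sketch of the binary thresholding step; the fixed threshold of 128 and the toy image are assumptions for illustration, whereas in the proposed pipeline the FZ search would select the threshold adaptively.

```python
import numpy as np

def binary_segment(image, threshold):
    """Map pixel intensities (0-255) to a binary mask: 1 = foreground, 0 = background."""
    return (image > threshold).astype(np.uint8)

# Toy 4x4 grayscale image: a bright "garment" region on a dark background.
img = np.array([[ 10,  12, 200, 210],
                [ 11,  13, 205, 220],
                [  9,  14, 198, 215],
                [ 12,  10, 202, 208]])

mask = binary_segment(img, threshold=128)  # fixed here; FZ would search for it
```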
3.5. Feature Extraction for Image Data
After the segmentation process, the segmented image enters the feature extraction phase, where two pre-trained deep neural networks are utilized to extract significant features that may influence demand forecasting. The two networks, VGG16 and ResNet101, are employed to learn multiple levels of abstraction, such as edges, textures, shapes, and more complex structures, from the input images.
In this research, we utilize both VGG16 and ResNet101 to take advantage of the strengths of both models and produce a better feature representation when dealing with image data. VGG16 and ResNet101 are good at highlighting different aspects of image features because of their structural differences. Compared with the more complex VGG19, VGG16 is good at preserving delicate texture and structure information. ResNet101, with its deeper architecture and residual connections, is able to learn more complex and abstract features. This integration improves the system's ability to capture a larger variety of visual features, thereby benefiting related tasks such as classification, clustering, and regression.
3.5.1. VGG16
VGG16 is a CNN model that has proven to be simple but very effective in solving image classification problems. Developed by the Visual Geometry Group (VGG), it has a straightforward architecture of 16 weight layers consisting mostly of small 3 × 3 convolutional filters. These filters are applied successively across the layers of the network, followed by max-pooling layers, with fully connected layers at the end. The small filter size and homogeneous structure help VGG16 extract detailed textures from images. Although much shallower than today's architectures, VGG16 excels in pattern and shape recognition because of its feature extraction depth. It is also computationally efficient and easy to implement due to its simple design, which has made it one of the most used solutions for many image recognition problems. The significant features extracted through the VGG16 model are denoted as $F_v$.
3.5.2. ResNet101
ResNet101, or residual network with 101 layers, is a deep CNN model that seeks to overcome a major problem of very deep networks known as the vanishing gradient problem. ResNet101 was developed by researchers at Microsoft, who integrated residual connections that enable the network to learn identity mappings. These residual connections allow the network to train more effectively by ensuring that gradients can pass through the network without loss as its depth increases. ResNet101 mainly consists of numerous residual blocks; each block contains convolutional layers followed by batch normalization and ReLU activation. This design helps ResNet101 capture a hierarchy of features, from image edges up to whole objects. Due to its depth and residual connections, ResNet101 is very efficient at identifying complex patterns and relationships in image data. The input segmented image is passed through the ResNet101 model, which generates the corresponding feature map.
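The key idea behind the residual connections described above can be illustrated with a minimal sketch (NumPy only, with dense transforms standing in for convolutions; all names and sizes here are illustrative, not the paper's):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """A minimal residual block: two linear transforms with ReLU,
    plus the identity shortcut that lets gradients bypass the block."""
    out = relu(x @ w1)      # first transform (stand-in for a convolution)
    out = out @ w2          # second transform (pre-activation output)
    return relu(out + x)    # identity shortcut: F(x) + x

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
# With zero weights, F(x) = 0 and the block reduces to relu(x):
w_zero = np.zeros((8, 8))
y = residual_block(x, w_zero, w_zero)
```

The shortcut is what lets very deep stacks of such blocks train: even when a block learns nothing useful, the signal (and its gradient) still flows through unchanged.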
The final output of the image feature extraction stage is produced by fusing the feature maps obtained from the two pretrained models. The final feature vector is generated by concatenating the feature map received from VGG16 with the feature map received from ResNet101; the dimension of the fused vector is therefore the sum of the dimensions of the two individual feature maps.
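The fusion step can be sketched as a simple concatenation of two pooled backbone feature vectors; the dimension of the result is the sum of the parts. The 512 and 2048 widths below are the usual pooled output sizes of VGG16 and ResNet101, used here purely for illustration:

```python
import numpy as np

# Hypothetical pooled feature vectors from the two backbones
# (sizes are illustrative defaults, not the paper's actual dimensions).
vgg16_features = np.random.default_rng(1).standard_normal(512)
resnet101_features = np.random.default_rng(2).standard_normal(2048)

# Fusion by concatenation: the fused dimension is the sum of the parts.
fused = np.concatenate([vgg16_features, resnet101_features])
```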
3.6. Distributed Attention-Based Deep Learning Model
The final feature vectors generated from the review data and the image data are fed into the proposed FZ-DACR model for demand forecasting (as shown in Figure 2). The distributed attention-based deep learning approach employs a CNN model integrated with a Zero Attention mechanism. This architecture is organized into two parallel branches that process the text data and the image data, respectively. An RNN model is then incorporated to combine the textual and visual representations and improve the prediction. In addition, FZ optimization is integrated to tune the model parameters, which increases prediction accuracy and the convergence rate while reducing prediction errors. The proposed FZ-DACR model starts by learning from the review and image data separately through the distributed attention method.
At first, the feature vector received from the review data is applied to the convolutional network for further processing. The input feature vector passes through multiple layers (convolutional layers, a linear activation function, batch normalization, max-pooling, a reshape layer, the Zero Attention mechanism, a flatten layer, and a dense layer) in a deep CNN model to learn context and hierarchical patterns. Each convolutional layer contains two or more filters (kernels) that scan over the input feature map and compute element-wise products that are summed to generate several feature maps. These filters detect low-level patterns such as edges, texture, and shapes in the earlier layers and higher levels of abstraction in the deeper layers. By stacking several convolutional layers, the network forms a deep hierarchical feature space. This hierarchical feature extraction is vital for identifying details in the combined review and image data, which in turn helps the model make an appropriate demand forecast. The architecture of the proposed FZ-DACR model is shown in Figure 2.
Each convolutional layer computes the dot product between its weights and the input received from the review data and adds a bias term. The result is forwarded to a linear activation function, which leaves the dimensionality unchanged, and then sent to the next layer. A batch normalization layer is employed to speed up learning while reducing overfitting. A pooling layer is then applied to decrease the spatial extent of the feature maps, reducing their dimensions while retaining the important information and lowering the computational load. The CNN model is further incorporated with a Zero Attention mechanism, which learns informative spatial regions and complex patterns; the attention mechanism thus extracts important features from the input reviews and images while effectively neglecting irrelevant ones. The flatten and reshape layers then reduce the dimensionality of the representation, and the output of the reshape layer is sent to the dense layer. A dense layer, also known as a fully connected layer, connects every neuron to the neurons of the adjacent layers; it learns from the input and passes its output to the subsequent layers. The combination of the CNN model with the attention mechanism enhances the accuracy of the demand forecasting results.
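A minimal 1-D sketch of the per-layer computation described above (the dot product of a kernel with each input window plus a bias, followed by non-overlapping max-pooling; the specific kernel and input values are illustrative):

```python
import numpy as np

def conv1d_valid(x, kernel, bias):
    """Valid 1-D convolution: each output is the dot product of the
    kernel with a window of the input, plus the bias term."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel + bias
                     for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    """Non-overlapping max-pooling halves the spatial extent while
    keeping the strongest activation in each window."""
    trimmed = x[: (len(x) // size) * size]
    return trimmed.reshape(-1, size).max(axis=1)

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])
feature_map = conv1d_valid(x, kernel=np.array([1.0, -1.0]), bias=0.5)
pooled = max_pool(feature_map)   # reduced spatial extent, key values kept
```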
Similarly, the feature vector received from image data is processed through a separate CNN learning network. The feature vector is processed using a distributed attention mechanism and a CNN model to extract more detailed patterns and higher-level, hierarchical features, mirroring the processing applied to the review data.
Finally, the outputs generated for the review data and the image data by the two CNN branches have the same dimension. These generated features are concatenated and then fed to the RNN model. The combined feature vector is formed by concatenating the individual feature vectors received from the review-data-based CNN and the image-data-based CNN, so its dimension is the sum of the two branch dimensions. The concatenated output is then fed into the reshape layer, which restructures the data into the sequence form required by the RNN model.
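The branch-fusion and reshape steps can be sketched as follows (the 64-dimensional branch outputs and the 8 × 16 sequence shape are illustrative choices, not the paper's actual dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)
review_branch = rng.standard_normal(64)   # CNN output for review data
image_branch = rng.standard_normal(64)    # CNN output for image data

# Concatenate the two branch outputs, then reshape into a
# (timesteps, features) sequence for the RNN stage.
combined = np.concatenate([review_branch, image_branch])   # shape (128,)
sequence = combined.reshape(8, 16)                         # illustrative split
```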
The RNN handles temporal sequences and is important for understanding the temporal variation in sales. The RNN model is incorporated with the Formicary Zebra Optimization, which tunes the weights and bias parameters of the model to achieve accurate demand estimates while minimizing prediction errors. The predicted output is generated from the second RNN's hidden state and passed through a linear activation function, a dropout layer, and a dense layer to produce the final output, which represents the predicted demand for fashion items. This multi-hierarchical combined strategy utilizes the strengths of both CNNs and RNNs, together with an attention mechanism, to forecast demand and assist retailers in managing stock and meeting consumer demand.
3.6.1. Distributed Attention Mechanism
In this research, a Zero Attention mechanism is utilized to process the review and image features individually throughout the deep learning network. The Zero Attention mechanism combines zero-channel attention and zero spatial attention, which pool over the spatial dimensions of the input features. The aim is to improve model performance through attention without introducing any additional learnable parameters: the pooled features are merged by element-wise summation, and the spatial relationships among the features are leveraged to create a spatial attention map. The final attention output is formed by concatenating the features received from the zero spatial attention and zero-channel attention branches.
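A parameter-free attention of this kind can be sketched as below. This is an illustrative interpretation of the zero-channel and zero spatial attention described in the text (average- and max-pooled statistics merged by element-wise summation and normalized into weights), not the paper's exact formulation:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def zero_channel_attention(f):
    """Parameter-free channel attention: average- and max-pool each
    channel over space, merge the pooled vectors by element-wise
    summation, and turn them into channel weights via softmax."""
    pooled = f.mean(axis=(1, 2)) + f.max(axis=(1, 2))
    w = softmax(pooled)                       # one weight per channel
    return f * w[:, None, None]

def zero_spatial_attention(f):
    """Parameter-free spatial attention: average- and max-pool across
    channels, sum the two maps, and normalize into a spatial mask."""
    m = f.mean(axis=0) + f.max(axis=0)
    mask = softmax(m.ravel()).reshape(m.shape)
    return f * mask[None, :, :]

rng = np.random.default_rng(4)
f = rng.standard_normal((3, 4, 4))            # (channels, H, W)
# Concatenate the two attention outputs, as in the text.
out = np.concatenate([zero_spatial_attention(f),
                      zero_channel_attention(f)], axis=0)
```

Note that no learnable parameters appear anywhere: both branches are built entirely from pooling, summation, and normalization of the existing features.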
3.6.2. Recurrent Neural Network (RNN)
The RNN model extracts the temporal dependencies and sequences within the data, receiving its input from the output of the reshape layer. The hidden-state update of the RNN can be written as h_t = f(W_h h_{t-1} + W_x x_t + b), where h_t is the hidden state at time t, W_h and W_x are the weight matrices, b is the bias vector, and f is the linear activation function. The output of the first RNN layer is integrated with FZ optimization, where the weights and biases of the RNN model are tuned for accurate demand prediction and reduced prediction errors. The tuned representation is passed through the linear activation function and forwarded to the second RNN layer. The final prediction applies output weight and bias parameters to the final hidden state of the network at the last time step. This design also makes it possible to integrate inputs of different forms, namely images and reviews, thereby enhancing the accuracy of the demand forecasting model.
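The recurrence above can be sketched in a few lines (tanh is assumed as the activation purely for this illustration, and the weights are random stand-ins for trained parameters):

```python
import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One recurrent update: the new hidden state mixes the previous
    hidden state and the current input through the activation."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

rng = np.random.default_rng(5)
hidden, feat = 4, 16
W_h = rng.standard_normal((hidden, hidden)) * 0.1
W_x = rng.standard_normal((hidden, feat)) * 0.1
b = np.zeros(hidden)

sequence = rng.standard_normal((8, feat))     # (timesteps, features)
h = np.zeros(hidden)
for x_t in sequence:                          # unroll over the sequence
    h = rnn_step(h, x_t, W_h, W_x, b)
# h is the final hidden state passed to the output layer
```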
3.7. Formicary Zebra Optimization
A new metaheuristic algorithm is developed from nature-inspired processes to address and overcome limitations found in existing algorithms, such as poor convergence, getting stuck in local optima, and the occurrence of false errors. Hence, the foraging behavior of the Zebra Optimization Algorithm (ZOA) [
27] and the path search capability of Ant Colony Optimization (ACO) [
28] are integrated into the proposed FZ optimization to overcome these limitations. Consequently, the FZ algorithm effectively navigates the large search space to find the optimal solution and offers the benefit of fewer parameter requirements while maintaining faster convergence. As a result, the FZ algorithm provides an efficient binary thresholding technique that facilitates accurate image segmentation: it segments the relevant features of the different fashion items from the image background, simplifying the images and feeding the DACR model with the significant image features. In addition, the FZ algorithm is applied for optimal hyperparameter selection, fine-tuning the weights and parameters of the DACR model to achieve high accuracy and reduce prediction errors while facilitating a high convergence rate. Altogether, these characteristics enhance the model's precision in demand forecasting for the fashion industry.
Inspiration: FZ optimization is inspired by the efficient foraging characteristics of zebras and the path search capability of ants. The ACO algorithm comprises the collective intelligence and path-finding characteristics of ants, which use pheromone trails to discover the shortest paths between food sources and their nest. FZ incorporates this mechanism to guide the search process towards optimal solutions. Additionally, the ZOA integrates the natural foraging and defense traits of zebras in the wild, including their coordinated foraging and defensive tactics against predators. By integrating these complementary behaviors, FZ enhances model effectiveness and robustness. The ACO ensures efficient exploration and exploitation of the search space, while the ZOA provides adaptability and resilience. Together, these characteristics improve the model’s ability to generate accurate demand forecasting.
Initialization: The position of each solution in the search space determines the values of the decision variables. The population of solutions can be modeled as a matrix in which each row is one candidate solution, the number of rows is the population size, and the number of columns is the number of decision variables. The initial position of every solution in the search space is assigned randomly, with the initial random solutions encoding the weight and bias parameters of the model.
Fitness Evaluation: In FZ, the fitness function is considered as the Mean Squared Error (MSE). The model is tuned for minimum MSE values to attain the optimal solution, which also increases the prediction accuracy. Thus, by minimizing the MSE, the FZ algorithm continuously refines the model’s parameters to improve the method’s ability to make an accurate demand forecast for the fashion industry.
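The fitness evaluation described above reduces to a direct MSE computation over the forecasts:

```python
import numpy as np

def mse_fitness(y_true, y_pred):
    """FZ fitness: mean squared forecast error (lower is better)."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

# Illustrative demand values: errors of 1, 0, and -2 units.
fit = mse_fitness([10.0, 12.0, 8.0], [9.0, 12.0, 10.0])
```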
Phase 1: Foraging strategy: This phase is applied when the phase-selection condition on the deterministic factor is satisfied. The pioneer solution and the leading solution are especially important in the foraging process of FZ, where the aim is to explore the search space and look for the best solutions. The leading solution paves the way for the other solutions, which rely on its guidance while mapping the search space. Here, the pioneer solution of zebra optimization and the following behavior of ant optimization are combined into an effective foraging strategy. This makes the pioneer solution effective in guiding the search, as it can identify successful paths and exploit them for the benefit of the group; it also helps establish a group average that guides members toward the position of the optimal solution. Besides supporting effective exploration and exploitation, it makes the process robust in the presence of noise and disturbances. The position update of this foraging strategy determines the new status of each zebra from its current and previous positions, the leading solution, a random factor in the interval (0, 1), the energy factor of the leading solution in the range (1, 2), the pheromone-based ability of the ants to follow the leading solution, the construction step, the evaporation rate, and the speed of the solution. The first part of the update models the leading behavior of the zebra leader, and the second part models the following behavior of the ants.
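Since the exact update equation is not reproduced here, the following sketch gives one plausible reading of the phase-1 update, combining a ZOA-style pull toward the leading solution with an ACO-style pheromone-weighted follow term. The functional form, step size, and evaporation rate are assumptions made for illustration only, not the paper's equation:

```python
import numpy as np

rng = np.random.default_rng(6)

def foraging_update(pos, leader, pheromone, step=0.5, evap=0.1):
    """Illustrative (assumed) phase-1 update: a ZOA-style pull toward
    the leading solution plus an ACO-style pheromone-weighted follow
    term. r ~ U(0, 1) and the energy factor E in (1, 2) as in the text."""
    r = rng.uniform(0.0, 1.0, size=pos.shape)
    E = rng.uniform(1.0, 2.0)
    lead_term = r * (leader - E * pos)              # zebra leading behavior
    follow_term = step * (1.0 - evap) * pheromone * (leader - pos)
    return pos + lead_term + follow_term            # ant following behavior

pos = np.array([0.0, 0.0])
leader = np.array([1.0, 1.0])
new_pos = foraging_update(pos, leader, pheromone=np.array([0.2, 0.2]))
```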
Phase 2: Escape strategy: In the second phase, the solutions' defensive strategies against attackers are employed to update the positions of population members in the search space. It is assumed that one of the following two conditions occurs with equal probability.
(i) Defensive strategy: this case applies when the attacker solution has a better fitness value than the current solution.
In FZ, when a solution is threatened by an opponent with better fitness, it implements defense mechanisms to remain safe from the attacker. Trailing and attractiveness characteristics guide the threatened solution to shift toward the center of the population. In this way, weaker solutions remain in a safe region where they can continue to improve their fitness, while stronger solutions carry on with the actual optimization without compromising the randomness, and hence the diversity, of the search. The corresponding update depends on the solution's updated following ability, the pheromone matrix (its ability to stay in the population), a small constant with value 0.01, and a probabilistic term in the range (0, 1).
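An illustrative (assumed, not exact) form of the defensive move toward the population center, using the pheromone value, the constant 0.01, and a probabilistic term as described:

```python
import numpy as np

rng = np.random.default_rng(7)

def defensive_update(pos, population, pheromone, c=0.01):
    """Illustrative defensive move: when the attacker is fitter, the
    threatened solution shifts toward the population center, scaled by
    its pheromone value, a small constant c = 0.01, and a random term."""
    centre = population.mean(axis=0)
    q = rng.uniform(0.0, 1.0)                   # probabilistic term in (0, 1)
    return pos + (pheromone * q + c) * (centre - pos)

population = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])
new_pos = defensive_update(np.array([0.0, 0.0]), population, pheromone=0.5)
```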
(ii) Offensive strategy: this case applies when the solution's fitness value is better than that of the attacker. In FZ, if a solution has a better fitness value than the attacker, it goes on the attack in order to safeguard the inferior solutions in the population. When an attacker targets a weaker solution, all of the stronger solutions converge toward the attacker. This coordinated movement distracts and tires out the attacker while creating a shield around the weaker solution, so that the weaker member is protected by the group acting together. The offensive strategy not only neutralizes the attacker but also strengthens the weakest solution in the population through this herding effect. This collective defense mechanism guarantees that members that have stagnated can still survive and contribute to the subsequent search, making the entire optimization algorithm stronger. Within the overall FZ framework, special attention is paid to minimizing the loss with respect to the model weights to improve prediction accuracy. This is done by including the loss function in the objective function during the training phase. As optimization progresses, the solutions move toward the set of parameters that produces the minimal loss, which enables the model to generate accurate and reliable predictions. This continuous reduction in loss enhances not only the forecast accuracy of the model but also its overall performance and reliability for demand forecasting in the fashion and apparel retail business.
Termination: Finally, the algorithm terminates the iterative process when the stopping condition for the optimal solution is satisfied; otherwise, the solutions are re-evaluated to continue the search for the optimal solution. The global best solution is then used to update the hyperparameters of the model, resulting in improved demand forecasting performance.
Figure 3 shows the flowchart of the proposed Formicary Zebra Optimization, and Algorithm 1 provides the pseudocode of FZ optimization.
| Algorithm 1. Pseudocode for the Formicary Zebra Optimization model.
|
| 1. | Start |
| 2. | Initialize a random population of solutions |
| 3. | Evaluate the fitness function (MSE) |
| 4. | While the termination condition is not met |
| 5. | If the foraging condition on the deterministic factor holds |
| 6. | Apply the foraging strategy (Phase 1) |
| 7. | Else if the attacker solution has better fitness |
| 8. | Apply the defensive strategy (Phase 2(i)) |
| 9. | Else |
| 10. | Apply the offensive strategy (Phase 2(ii)) |
| 11. | Re-evaluate the fitness of the updated solutions |
| 12. | End While |
| 13. | Declare the global best solution |
| 14. | Terminate the process |
| 15. | End |