1. Introduction
Inventory management involves tactical decisions for managing and controlling inventory levels, which are essential for meeting customer demand at the right time and at the best cost, thereby improving the production and delivery of end products [1]. Inventory control attempts to meet customer demand at a predetermined service level. By assuming deterministic demand, most existing studies strive to minimize the total of projected ordering and inventory carrying costs [2]. Nevertheless, in practice, owing to market fluctuations, consumer demand becomes exceedingly unpredictable [3]. In a fiercely competitive market, precise demand forecasting is essential for streamlining inventory control, cutting expenses, and increasing income. Accurate and timely forecasting of future demand helps retailers match marketing, distribution, and production to anticipated consumer demand, improving both operational effectiveness and customer satisfaction [4]. In general, holding either excessive or insufficient inventory results in inefficient stock management and negative outcomes for the upstream supply chain [5]. For instance, too much inventory can lead to an oversupply scenario in which the quantity of stored goods significantly surpasses market needs [6]. In this scenario, the related inventory expense becomes elevated, as many items must be kept for an extended period, resulting in slow or inadequate resource turnover for businesses [7]. Conversely, inadequate inventory results in insufficient coverage of customer demand, which can gradually lead to product shortages, diminished customer trust, and potentially reduced profits [8]. Despite the difficulties of attaining precise demand forecasting amid market uncertainties, the abundance of historical data and the application of big data analytics have enhanced the accuracy of demand forecasting [9].
Numerous previous studies have extended classical inventory models to account for uncertainty and variability. One study developed the Economic Order Quantity (EOQ) model with Monte Carlo simulation to account for demand variability and time-sensitive product quality [10]. Another study examined lead time and defective products as uncertainty factors in addition to demand by formulating a Min–Max inventory method [11]. A further approach to addressing uncertainty in inventory is the use of predictive techniques, such as ensemble deep learning for optimizing order-up-to-level inventory in demand forecasting [12]. Nevertheless, none of these studies consider both the uncertainties in supply and demand and the uncertainty of discount events offered by the supplier at any given time. Such a discount event can be viewed as an uncertain or fluctuating price, which has become increasingly prevalent lately [13]. Demand forecasting is crucial for many industries, but the retail industry presents greater challenges than others, primarily because of the lack of precise knowledge about the factors that will influence demand. Seasonality and trend are two further factors that heavily impact sales in the apparel sector [14]. Sales in the retail sector typically happen quickly: customers visit other stores to satisfy their needs if they are unable to find the item they desire, which translates into lower sales for the store. As a result, it is critical to anticipate customer needs precisely [15]. If production exceeds real demand, the consequences include higher stock expenses, stock carried over to subsequent years, items falling out of fashion, and customer loss. If production falls short of demand, sales decline and the brand's reputation suffers, as customers are unable to find the products; this may also erode brand trust and customer loyalty. The retail fashion and apparel sector in particular is known for its dynamic, fast-paced environment, fueled by shifting consumer tastes, seasonal fashions, and changing market dynamics [16,17]. Consumers today want their products delivered quickly and like to choose from a vast array of alternatives, making it difficult for fashion retailers to satisfy customer needs promptly [18]. As a result, it is critical that sellers of fashion apparel act quickly and effectively when restocking inventory by forecasting future demand patterns [19].
Deep learning is a branch of machine learning that uses multi-layered artificial neural networks and has shown impressive results in identifying complex patterns and connections in big datasets [20,21]. Retailers can use sophisticated algorithms to derive valuable insights from a variety of data sources, such as past sales, product features, industry trends, and external variables, by utilizing deep learning models [13]. Accurately predicting future demand and sales of fashion items remains a major challenge for both business and academia; addressing it requires studying the intricacy of the fashion industry and the managerial techniques that enable products to be created, manufactured, and delivered on schedule [14]. By utilizing deep learning techniques to forecast demand in the retail clothing and fashion industry and investigating the possibilities of models such as transformers, convolutional neural networks (CNNs) [3], and recurrent neural networks (RNNs) [15], it is possible to increase the precision and dependability of demand forecasts and address the shortcomings of conventional forecasting techniques [22]. The fashion industry is trying to manage these quick changes more effectively by implementing an agile supply chain [16,17]. By presenting a novel strategy for building an intelligent demand forecasting system that employs cutting-edge deep learning techniques and historical product images, this study addresses the shortcomings of traditional forecasting systems [2]. However, it is crucial to understand the challenges associated with demand forecasting models in the fashion and apparel retail industry. Fashion cycles are seasonal and often highly volatile. External factors such as economic conditions and social media trends cannot be overlooked, and the data from this industry may include historical sales, current trends, and consumer data. Existing models fail to consider fluctuating trends in consumer preferences and other characteristics.
Therefore, this research offers a significant improvement to demand forecasting in fashion and apparel retail using Formicary Zebra (FZ) optimization with a distributed attention-based convolutional–recurrent (DACR) model. This integration leverages the inherent adaptability of FZ to optimize the feature extraction and sequential learning strengths of deep learning architectures. In this study, FZ is used to select the most relevant search space, which helps improve the efficiency of the model. The key contributions of the proposed model are as follows.
Formicary Zebra Optimization (FZ): The FZ algorithm is a novel optimization algorithm inspired by the foraging behavior of zebras and the path-search capability of ants, designed to obtain the optimal solution. Based on patterns of collected and shared collective intelligence, the FZ algorithm balances the exploration–exploitation trade-off, avoiding redundant search across the solution space. Furthermore, the incorporation of these unique traits improves the foraging and decision-making concepts used to solve high-dimensional optimization problems and provides improved performance.
FZ-DACR: In the proposed model, the combination of the convolutional neural network (CNN) and the recurrent neural network (RNN) model offers the benefits of extracting the context and hierarchical patterns present in the input data. Specifically, the proposed model is trained over the features extracted from product images and historical sales data. Furthermore, the distributed attention mechanism allows the model to focus on the important regions in the sequence data representations using the attention weights, improving forecasting performance.
The study is organized as follows. Section 2 concentrates on contemporary techniques and provides an overview of the strategies and obstacles faced. Section 3 explains in detail the recommended method for demand forecasting and the mathematical formulation of the DACR model. Section 4 examines the outcomes of the DACR technique. Section 5 provides a thorough examination of the study's findings as the conclusion.
2. Literature Review
Chandadevi Giri et al. [2] used ML clustering and classification algorithms to propose a novel sales forecasting approach for fashion products. Their clustering and classification model was trained on real fashion retail data and has the potential to support several supply chain planning activities in the fashion garment business. The results show that a fashion retailing business can forecast the sales of a new item by using images and the item's past data. However, the image database could be enlarged in subsequent work to increase model accuracy.
Simsir Fuat and Ilker Guven [8] examined the impact of product variety and artificial intelligence on sales forecasting in the retail apparel sector, intending to minimize errors. Within this framework, artificial intelligence models, including support vector machines (SVMs) and artificial neural networks (ANNs), were developed, attaining high performance on the datasets and efficiently reducing over-fitting issues. They note that sales decline and brand prestige suffers when supply is insufficient relative to demand, because customers cannot locate the products.
Tran Thi Bich Chau et al. [11] presented demand forecasting and inventory prediction for apparel products. Their study predicted demand using the ARIMA (Autoregressive Integrated Moving Average) model and proposed a policy framework for production and forecasting models to ensure optimal inventory levels and enhance enterprise supply capacity. A subsequent goal is to identify the distribution path for the whole distribution system.
Giri Chandadevi et al. [3] presented a deep learning and non-linear neural network regression technique for fashion product sales forecasting. The approach appears promising for estimating the quantity of products to be sold in the future. The study's primary weakness stems from the use of a small dataset; this restriction should be overcome in future work by choosing a sizable image dataset to enhance model performance. The strategy may help fashion retailers and designers in the big data and data mining era.
Majd Kharfan et al. [10] used machine learning techniques to provide a data-driven forecasting solution for recently launched seasonal products. Forecasting affects a company's entire supply chain activities and is influenced by a variety of implicit product attributes in addition to sales patterns. The study suggests a data-driven, machine learning-based methodology to improve demand forecasting for recently launched products in the fashion retailing industry. Decision makers can use the suggested methodology to anticipate demand for newly introduced products even in the absence of previous data for fashion supply chains. Still, questions remain regarding the features to be used and the choice of machine learning algorithms for further research.
Smit Marvaniya et al. [18] first suggested the use of modular neural networks for demand forecasting with seasonal climate predictions. Climate-aware demand forecasting in the seasonal context involves employing a sub-neural-network architecture to efficiently learn joint latent representations of historical data, known inputs, and climate predictions. The model effectively overcomes existing problems and attains high performance. However, procedures are still needed to assess the uncertainty associated with demand projections and to propagate it from ensemble forecasts at different granularities.
Muhammad Yasir et al. [5] aimed to forecast the output of the textile and clothing sectors. GLS and single-layer perceptron models were applied to the time-series dataset of a textile apparel company. The model has high potential to forecast the sectors' output with high accuracy. However, such time-series datasets are rarely available, a limitation that also affected this investigation; future research must therefore be based on large datasets. It is also necessary to include more macro-level (exogenous) elements in the forecasting models.
Sushil Punia et al. [15] presented a novel forecasting technique that integrates deep learning and offers an advantage over previous forecasting techniques. The suggested method can represent intricate correlations of both temporal and regression types. For time-series forecasting challenges, more recent and sophisticated neural networks, including spiking neural networks and convolutional neural networks (CNNs), were used; CNNs handle large amounts of data with notable success. However, an overfitting issue persists in the model, affecting overall performance.
Javad Feizabadi [19] introduced a hybrid demand forecasting method grounded in machine learning, combining ARIMAX and a neural network. The method was applied and evaluated in the context of a functional product and a steel manufacturer. However, the major limitation of the research is the use of a single dataset, which limits the generalization ability of the model.
Marco A. Villegas [20] presented a new model selection approach that combines different criteria using a support vector machine (SVM). The methodology is promising for scenarios with highly volatile demand, as it allows the model to be changed when it does not fit the data sufficiently well, thereby reducing the risk of misusing modeling techniques in the automatic processing of large datasets. However, the model learns slowly when new trends and outside forces are involved.
Rathipriya et al. [23] introduced the Demand Forecast Model (DFM), which assists pharmaceutical firms in achieving high efficiency in the global market. Specifically, the DFM employed diverse shallow and deep neural networks to improve demand forecasting. In addition, the DFM effectively suggested sales and marketing strategies considering trends or seasonal effects associated with diverse groups of pharmaceutical products with dissimilar characteristics. In the overall analysis, DFMs utilizing shallow neural networks efficiently predicted the future demand for pharmaceutical products.
Chandriah and Naraganahalli [24] implemented an RNN/LSTM with a modified Adam optimizer that effectively predicted the demand for spare parts. To improve the RNN/LSTM model's performance, the weights were optimally tuned using the Modified-Adam algorithm. Experimental validation demonstrated that the RNN/LSTM with Modified-Adam yields minimal errors compared to the other baseline techniques used for comparison.
Zohdi et al. [25] utilized machine learning algorithms to address the bias of statistical methods in demand prediction, introducing an extreme learning machine to improve intermittent demand prediction. The model's accuracy and performance were validated against other machine learning models, including the decision tree, gradient boosting, K-nearest neighbors, and multi-layer perceptron.
Most existing demand prediction techniques are limited in providing precise demand forecasts. Specifically, existing methods have difficulty handling the complex patterns hidden in the attributes, and modeling their subordinate and complementary relationships adds further complexity. Existing models need to be trained on large datasets to function properly, which limits their ability to forecast demand for products that are new or subject to high demand variability. Due to limited feature representations, such models have difficulty extracting the complex and dynamic relations in historical sales data. Furthermore, existing models are hard to fit to large data for automatic processing and slow to learn, resulting in subpar performance when analyzing new trends. The proposed research therefore utilizes the FZ-DACR model with an advanced architecture that extracts the complex relationships present in historical sales data using effective feature representations, minimizing computational complexity, and improving generalization performance.
Challenges
- ➢
Some of the common issues that fashion retailers face are missing, incorrect, and noisy data. The insufficient capacity of existing frameworks to handle such problems is one of their weaknesses [2].
- ➢
Numerous types of relationships can exist between products, including subordinate, complementary, and cannibalizing relationships. Currently available models may not adequately capture these interdependencies [8].
- ➢
Many fashion items have relatively short lifetimes, which dramatically limits the availability of historical data for forecasting. Current models need to be trained on large datasets to function properly, severely limiting the capability of demand forecasting for products that are new or subject to high demand variability [8].
- ➢
Certain contemporary models have difficulty extracting complex and dynamic relationships due to limited feature representations [11].
- ➢
Fashion demand may be shaped by several external factors, such as the climate, changes in the economy, social media, and celebrity endorsements. Historical models may not fully include such factors, often resulting in poor forecasts [3].
- ➢
The SVM model has difficulty fitting large data for automatic processing and learns slowly, resulting in subpar performance when new trends and outside forces are involved in the forecasting [20].
The proposed FZ-DACR model addresses these challenges through the FZ algorithm, which improves the decision-making concepts and solves the high-dimensional optimization problem, resulting in improved performance. The combination of the CNN and RNN algorithms assists in extracting the context and hierarchical patterns present in the input historical sales data. In addition, the proposed model is trained on features extracted from both product images and historical sales data, which improves generalization performance. The distributed attention mechanism highlights the important regions in the sequence data representations using attention weights, improving forecasting performance. Moreover, hyperparameter tuning using the proposed FZ algorithm provides the optimal configuration for minimizing computational complexity while improving demand forecasting performance.
3. Proposed Methodology for Demand Forecasting
Existing demand forecasting models face limitations, including poor performance under fluctuating fashion trends, limited resources, incorrect feature selection, overfitting, high computation time, and poor generalizability. The proposed research offers an effective methodology that incorporates advanced pre-processing, feature extraction, and a segmentation process for effective feature extraction, which supports accurate demand prediction. A hybrid FZ optimization is developed to overcome existing limitations, such as poor convergence, false errors, and complexity issues. Furthermore, the combination of a distributed attention mechanism with deep learning models improves the training process for demand forecasting while simultaneously reducing errors. The proposed FZ-DACR model is trained on three different fashion datasets (dress, skirt, and T-shirt), as shown in Figure 1. Each dataset includes diverse consumer reviews and product images, which enhance the generalizability of the model on unseen data.
The input data comprise price information, product qualities, historical sales indicators, and other relevant details in image and text formats. Initially, the review data are processed through pre-processing and feature extraction stages to guarantee data consistency and quality. Similarly, the image input is processed through segmentation and feature extraction procedures. A key component of the process is feature extraction, which includes TF-IDF (Term Frequency–Inverse Document Frequency) and graph-based feature extraction techniques for review data, in order to examine the links between fashion items and capture word importance within brand names and product descriptions.
In contrast, the input image is processed with the Formicary Zebra Optimization-based segmentation process to separate fashion objects from the background. The segmented images are then fed into pre-trained deep neural networks, VGG16 and ResNet101, for extraction of texture, form, and fine features. The extracted feature vectors from the review data and image data are forwarded to the deep learning architecture, known as DACR. The proposed DACR model comprises a CNN model, a Zero Attention mechanism, and an RNN model for extracting the input features for accurate demand forecasting. Additionally, the proposed model integrates a distributed attention mechanism that enables it to concentrate on pertinent aspects from many data sources. The features extracted from review and image data through the CNN model are concatenated and sent to the RNN layer. Furthermore, Formicary Zebra Optimization adaptively optimizes the FZ-DACR model parameters, boosting demand forecasting performance and generating accurate results. The method's outcome is a set of demand forecasts for fashion items, which give traders useful information for improving supply chain decisions, pricing tactics, and inventory management.
3.1. Input Data for Demand Forecasting
The demand forecasting model for the fashion industry utilizes input data such as fashion images, reviews, news, etc. In this research, three different fashion datasets (dress, skirt, and T-shirt) are combined into a single Flipkart Grid Software Development Challenge 2.0 dataset [26], which is then used to train and test the proposed FZ-DACR model for accurate demand forecasting. The input image data are collected through scraping from the original dataset. This dataset also includes product reviews stored as time-series data, which are mathematically presented as
$D = \{I_1, I_2, \ldots, I_u\}$
where $D$ denotes the database, $I_j$ represents the input data, and $u$ denotes the number of classes.
3.2. Pre-Processing for Review Data
The source dataset contains a time series of data that includes historical sales characteristics with reviews, URLs, brand items, discounted prices, MRP, stars, and ratings, providing the basis for forecasting the demand for fashion products. The input review data pass through the pre-processing phase, where a text cleaning process removes symbols, punctuation, stop words, and distracting elements from the text. This ensures that the textual data derived from consumer reviews are pre-processed into a reliable dataset that aids more precise demand forecasting. The pre-processed input review data are mathematically represented as
$P_r = \{p_1, p_2, \ldots, p_u\}$
where $u$ denotes the number of classes and $p_j$ represents the pre-processed review data.
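The text-cleaning step described above can be sketched as follows; the regular expression and the tiny stop-word list are illustrative assumptions, not the exact pipeline used in this study (a full pipeline would use a standard stop-word list, e.g., NLTK's).

```python
import re

# Illustrative stop-word list (assumption; not the study's actual list).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of", "in", "it"}

def clean_review(text: str) -> str:
    """Remove symbols, punctuation, and stop words from a raw review string."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # strip punctuation and symbols
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

cleaned = clean_review("The dress is GREAT!!! Worth the price :)")
# → "dress great worth price"
```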
3.3. Feature Extraction for Review Data
The pre-processed review data are forwarded into a combined feature extraction technique to obtain different aspects of the information contained in the reviews. This involves TF-IDF graph embedding techniques, which include cosine similarity, cosine distance, and linear kernels. Each of these techniques gives a different view of the data and thus improves the reliability of feature extraction.
Term Frequency–Inverse Document Frequency (TF-IDF)-Based Graph Embedding
To determine a word’s statistical significance in a document, the TF-IDF approach is utilized based on a keyword extraction technique in which the text is converted into vector values.
Its primary premise is that a word's significance to a text is positively correlated with how often it appears in the document and negatively correlated with how often it appears in the corpus of documents. The term frequency is measured by the frequency of a word occurring in the document, mathematically expressed as
$TF(w, d) = n_{w,d}$
where $d$ represents the document, $w$ represents the word, and $n_{w,d}$ represents the number of times the word $w$ appears in the document $d$.
Inverse Document Frequency (IDF) is used to reduce the weights of words that appear in multiple documents in the corpus, which is defined as
$IDF(w) = \log \frac{M}{M_w + 1}$
where $M_w$ represents the number of documents that contain the word $w$ and $M$ is the total number of documents in the corpus. We typically add 1 to the denominator to account for words that do not appear in the corpus, which would otherwise result in a 0 denominator. The final calculation of TF-IDF is defined as
$TFIDF(w, d) = TF(w, d) \times IDF(w)$
The TF-IDF score is then computed for all of the terms in the documents, and keywords can be obtained by choosing the terms with higher TF-IDF values. TF-IDF-based graph embedding serves as a dimensionality-reduction step that works well because it retains the inherent features and connections of the initial data. These transformations enable techniques such as cosine similarity, cosine distance, and linear kernels to improve the graph embeddings when used in demand forecasting. Therefore, TF-IDF-based graph embedding provides a better perspective on the structure and relationships of the data and thereby improves the comprehensibility of the prediction results.
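A minimal sketch of this weighting, using the raw-count term frequency and the +1 denominator smoothing described above; the toy corpus is purely illustrative.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Compute TF-IDF scores for every (document, word) pair.

    TF is the raw count of a word in a document; IDF = log(M / (M_w + 1)),
    where M is the corpus size and M_w the number of documents containing the word.
    """
    M = len(corpus)
    docs = [doc.split() for doc in corpus]
    df = Counter(w for doc in docs for w in set(doc))  # document frequency per word
    scores = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        scores.append({w: tf[w] * math.log(M / (df[w] + 1)) for w in tf})
    return scores

corpus = ["red dress summer", "blue skirt summer", "red dress party"]
scores = tf_idf(corpus)
# "summer" appears in 2 of the 3 documents, so its weight is lower than that of
# "party", which appears in only 1.
```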
- (i)
Cosine Similarity
Cosine similarity measures the similarity of two vectors, generally texts or particular segments of texts, based on the cosine of the angle formed by the two vectors in a high-dimensional space. It is estimated as the ratio of the dot product of the two vectors $B$ and $C$ to the product of the magnitudes of the vectors. A score of 1 means that the two vectors are parallel, conveying the same information; a score of 0 means that the two vectors are perpendicular, containing unrelated information; and a score of −1 means that the two vectors are opposite and contain completely different information.
$\cos(B, C) = \frac{B \cdot C}{\|B\| \, \|C\|}$
where $\|B\|$ and $\|C\|$ are the magnitudes of vectors $B$ and $C$, respectively.
- (ii)
Cosine Distance
Cosine distance measures how dissimilar two vectors, which can represent documents or text segments, are from one another by using the complement of cosine similarity: it is defined as one minus the cosine of the angle formed by the vectors. This metric measures how orthogonal the orientations of the two vectors are in a high-dimensional space. Cosine distance eliminates the dependence on vector magnitudes while keeping the angle between vectors as the parameter for dissimilarity, and thus allows for reliable measurement of the intrinsic dissimilarity of textual content.
- (iii)
Linear Kernels
The linear kernel measures similarity based on the dot product of two input vectors $B$ and $C$, which generates a single numerical value. The linear kernel $K$ is mathematically expressed as
$K(B, C) = B \cdot C$
Finally, the graph embedding feature vector $G_e$ is generated by concatenating the findings of cosine similarity, cosine distance, and linear kernels. This hybrid vector is mathematically defined as
$G_e = \beta \, S_{c} \oplus \gamma \, D_{c} \oplus \delta \, K$
In Equation (9), β, γ, and δ represent the weights assigned according to the importance of each feature type, $S_c$ and $D_c$ denote the cosine similarity and cosine distance features, and $\oplus$ denotes concatenation. Graph embedding thus combines multiple features, including the cosine similarity, cosine distance, and linear kernel, into a unified feature vector, improving the representation of a term or document in different machine learning tasks.
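The three measures and their weighted concatenation can be sketched as follows; the unit values for the weights β, γ, and δ and the scalar (pairwise) form of the features are simplifying assumptions for illustration.

```python
import numpy as np

def cosine_similarity(b, c):
    """Cosine of the angle between vectors b and c."""
    return float(np.dot(b, c) / (np.linalg.norm(b) * np.linalg.norm(c)))

def cosine_distance(b, c):
    """One minus the cosine similarity."""
    return 1.0 - cosine_similarity(b, c)

def linear_kernel(b, c):
    """Plain dot product of the two vectors."""
    return float(np.dot(b, c))

def graph_embedding(b, c, beta=1.0, gamma=1.0, delta=1.0):
    """Concatenate the three weighted similarity measures into one hybrid vector."""
    return np.array([beta * cosine_similarity(b, c),
                     gamma * cosine_distance(b, c),
                     delta * linear_kernel(b, c)])

b = np.array([1.0, 2.0, 3.0])
c = np.array([1.0, 2.0, 3.0])
g = graph_embedding(b, c)  # identical vectors: similarity 1, distance 0, kernel 14
```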
3.4. Optimization-Based Segmentation for Image Data
The input image is gathered from the dataset using a scraping method and passed to the segmentation process. In this research, the segmentation process uses a thresholding method to transform the original input image into a binary format, separating the segmented object from the background. Binary thresholding maps the original pixel intensity values, spanning a range of 0 to 255, onto a binary representation of either 0 or 1. Furthermore, the segmentation process is enhanced by the integration of Formicary Zebra Optimization, which applies foraging and path-finding strategies to segment fashion items from their backgrounds. The optimized segmentation therefore captures the important features of fashion items across diverse image complexities and patterns, including texture, color, and shape. The segmented image is then further processed by the feature extraction techniques explained in the next section.
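A minimal sketch of the binary thresholding step; the fixed threshold of 128 and the toy image are assumptions for illustration, whereas in the proposed pipeline the FZ search would select the threshold adaptively.

```python
import numpy as np

def binary_segment(image, threshold):
    """Map pixel intensities (0-255) to a binary mask: 1 = foreground, 0 = background."""
    return (image > threshold).astype(np.uint8)

# Toy 4x4 grayscale image: a bright "garment" region on a dark background.
img = np.array([[ 10,  12, 200, 210],
                [ 11,  13, 205, 220],
                [  9,  14, 198, 215],
                [ 12,  10, 202, 208]])

mask = binary_segment(img, threshold=128)  # fixed here; FZ would search for it
```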
3.5. Feature Extraction for Image Data
After the segmentation process, the segmented image enters the feature extraction phase, where two pre-trained deep neural networks are utilized to extract significant features that may influence demand forecasting. The two networks, VGG16 and ResNet101, are employed to learn multiple levels of abstraction, such as edges, textures, shapes, and more complex structures, from the input images.
In this research, we utilize both VGG16 and ResNet101 to take advantage of the strengths of both models and produce a better feature representation when dealing with image data. VGG16 and ResNet101 are good at highlighting different aspects of image features because of their structural differences. Compared with the more complex VGG19, VGG16 is good at preserving delicate texture and structure information. ResNet101, with its deeper architecture and residual connections, is able to learn more complex and abstract features. This integration improves the system's ability to capture a larger variety of visual features, thereby benefiting related tasks such as classification, clustering, and regression.
3.5.1. VGG16
VGG16 is a CNN model that has proven to be simple but very effective in solving image classification problems. Developed by the Visual Geometry Group (VGG), it has a straightforward architecture of 16 weight layers consisting mostly of small 3 × 3 convolutional filters. These filters are applied successively across the layers of the network, followed by max-pooling layers, with fully connected layers at the end. The small filter size and homogeneous structure help VGG16 extract detailed textures from images. Although much shallower than today's architectures, VGG16 excels in pattern and shape recognition because of its feature extraction depth. It is also computationally efficient and easy to implement due to its simple design, which has made it one of the most used solutions for many image recognition problems. The significant features extracted through the VGG16 model are denoted as $F_v$.
3.5.2. ResNet101
ResNet101, or residual network with 101 layers, is a deep CNN model that seeks to overcome a major problem of very deep networks known as the vanishing gradient problem. ResNet101 was developed by researchers at Microsoft, who integrated residual connections that enable the network to learn identity mappings. These residual connections allow the network to train more effectively by ensuring that gradients can pass through the network without loss as its depth increases. ResNet101 mainly consists of numerous residual blocks; each block contains convolutional layers followed by batch normalization and ReLU activation. This design helps ResNet101 capture a hierarchy of features, from image edges up to whole objects. Due to its depth and residual connections, ResNet101 is very efficient at identifying complex patterns and relationships in image data. The input segmented image is passed through the ResNet101 model, which generates the corresponding feature map.
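The key idea behind the residual connections described above can be illustrated with a minimal sketch (NumPy only, with dense transforms standing in for convolutions; all names and sizes here are illustrative, not the paper's):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """A minimal residual block: two linear transforms with ReLU,
    plus the identity shortcut that lets gradients bypass the block."""
    out = relu(x @ w1)      # first transform (stand-in for a convolution)
    out = out @ w2          # second transform (pre-activation output)
    return relu(out + x)    # identity shortcut: F(x) + x

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
# With zero weights, F(x) = 0 and the block reduces to relu(x):
w_zero = np.zeros((8, 8))
y = residual_block(x, w_zero, w_zero)
```

The shortcut is what lets very deep stacks of such blocks train: even when a block learns nothing useful, the signal (and its gradient) still flows through unchanged.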
The final output of the image feature extraction stage is produced by fusing the feature maps obtained from the two pretrained models. The final feature vector is generated by concatenating the feature map received from VGG16 with the feature map received from ResNet101; the dimension of the fused vector is therefore the sum of the dimensions of the two individual feature maps.
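The fusion step can be sketched as a simple concatenation of two pooled backbone feature vectors; the dimension of the result is the sum of the parts. The 512 and 2048 widths below are the usual pooled output sizes of VGG16 and ResNet101, used here purely for illustration:

```python
import numpy as np

# Hypothetical pooled feature vectors from the two backbones
# (sizes are illustrative defaults, not the paper's actual dimensions).
vgg16_features = np.random.default_rng(1).standard_normal(512)
resnet101_features = np.random.default_rng(2).standard_normal(2048)

# Fusion by concatenation: the fused dimension is the sum of the parts.
fused = np.concatenate([vgg16_features, resnet101_features])
```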
3.6. Distributed Attention-Based Deep Learning Model
The final feature vectors generated from the review data and the image data are fed into the proposed FZ-DACR model for demand forecasting (as shown in Figure 2). The distributed attention-based deep learning approach employs a CNN model integrated with a Zero Attention mechanism. This architecture is organized into two parallel branches that process the text data and the image data, respectively. An RNN model is then incorporated to combine the textual and visual representations and improve the prediction. In addition, FZ optimization is integrated to tune the model parameters, which increases prediction accuracy and the convergence rate while reducing prediction errors. The proposed FZ-DACR model starts by learning from the review and image data separately through the distributed attention method.
At first, the feature vector received from the review data is applied to the convolutional network for further processing. The input feature vector passes through multiple layers (convolutional layers, a linear activation function, batch normalization, max-pooling, a reshape layer, the Zero Attention mechanism, a flatten layer, and a dense layer) in a deep CNN model to learn context and hierarchical patterns. Each convolutional layer contains two or more filters (kernels) that scan over the input feature map and compute element-wise products that are summed to generate several feature maps. These filters detect low-level patterns such as edges, texture, and shapes in the earlier layers and higher levels of abstraction in the deeper layers. By stacking several convolutional layers, the network forms a deep hierarchical feature space. This hierarchical feature extraction is vital for identifying details in the combined review and image data, which in turn helps the model make an appropriate demand forecast. The architecture of the proposed FZ-DACR model is shown in Figure 2.
Each convolutional layer computes the dot product between its weights and the input received from the review data and adds a bias term. The result is forwarded to a linear activation function, which leaves the dimensionality unchanged, and then sent to the next layer. A batch normalization layer is employed to speed up learning while reducing overfitting. A pooling layer is then applied to decrease the spatial extent of the feature maps, reducing their dimensions while retaining the important information and lowering the computational load. The CNN model is further incorporated with a Zero Attention mechanism, which learns informative spatial regions and complex patterns; the attention mechanism thus extracts important features from the input reviews and images while effectively neglecting irrelevant ones. The flatten and reshape layers then reduce the dimensionality of the representation, and the output of the reshape layer is sent to the dense layer. A dense layer, also known as a fully connected layer, connects every neuron to the neurons of the adjacent layers; it learns from the input and passes its output to the subsequent layers. The combination of the CNN model with the attention mechanism enhances the accuracy of the demand forecasting results.
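A minimal 1-D sketch of the per-layer computation described above (the dot product of a kernel with each input window plus a bias, followed by non-overlapping max-pooling; the specific kernel and input values are illustrative):

```python
import numpy as np

def conv1d_valid(x, kernel, bias):
    """Valid 1-D convolution: each output is the dot product of the
    kernel with a window of the input, plus the bias term."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel + bias
                     for i in range(len(x) - k + 1)])

def max_pool(x, size=2):
    """Non-overlapping max-pooling halves the spatial extent while
    keeping the strongest activation in each window."""
    trimmed = x[: (len(x) // size) * size]
    return trimmed.reshape(-1, size).max(axis=1)

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])
feature_map = conv1d_valid(x, kernel=np.array([1.0, -1.0]), bias=0.5)
pooled = max_pool(feature_map)   # reduced spatial extent, key values kept
```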
Similarly, the feature vector received from image data is processed through a separate CNN learning network. The feature vector is processed using a distributed attention mechanism and a CNN model to extract more detailed patterns and higher-level, hierarchical features, mirroring the processing applied to the review data.
Finally, the outputs generated for the review data and the image data by the two CNN branches have the same dimension. These generated features are concatenated and then fed to the RNN model. The combined feature vector is formed by concatenating the individual feature vectors received from the review-data-based CNN and the image-data-based CNN, so its dimension is the sum of the two branch dimensions. The concatenated output is then fed into the reshape layer, which restructures the data into the sequence form required by the RNN model.
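The branch-fusion and reshape steps can be sketched as follows (the 64-dimensional branch outputs and the 8 × 16 sequence shape are illustrative choices, not the paper's actual dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)
review_branch = rng.standard_normal(64)   # CNN output for review data
image_branch = rng.standard_normal(64)    # CNN output for image data

# Concatenate the two branch outputs, then reshape into a
# (timesteps, features) sequence for the RNN stage.
combined = np.concatenate([review_branch, image_branch])   # shape (128,)
sequence = combined.reshape(8, 16)                         # illustrative split
```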
The RNN handles temporal sequences and is important for understanding the temporal variation in sales. The RNN model is incorporated with the Formicary Zebra Optimization, which tunes the weights and bias parameters of the model to achieve accurate demand estimates while minimizing prediction errors. The predicted output is generated from the second RNN's hidden state and passed through a linear activation function, a dropout layer, and a dense layer to produce the final output, which represents the predicted demand for fashion items. This multi-hierarchical combined strategy utilizes the strengths of both CNNs and RNNs, together with an attention mechanism, to forecast demand and assist retailers in managing stock and meeting consumer demand.
3.6.1. Distributed Attention Mechanism
In this research, a Zero Attention mechanism is utilized to process the review and image features individually throughout the deep learning network. The Zero Attention mechanism combines zero-channel attention and zero spatial attention, which pool over the spatial dimensions of the input features. The aim is to improve model performance through attention without introducing any additional learnable parameters: the pooled features are merged by element-wise summation, and the spatial relationships among the features are leveraged to create a spatial attention map. The final attention output is formed by concatenating the features received from the zero spatial attention and zero-channel attention branches.
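A parameter-free attention of this kind can be sketched as below. This is an illustrative interpretation of the zero-channel and zero spatial attention described in the text (average- and max-pooled statistics merged by element-wise summation and normalized into weights), not the paper's exact formulation:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def zero_channel_attention(f):
    """Parameter-free channel attention: average- and max-pool each
    channel over space, merge the pooled vectors by element-wise
    summation, and turn them into channel weights via softmax."""
    pooled = f.mean(axis=(1, 2)) + f.max(axis=(1, 2))
    w = softmax(pooled)                       # one weight per channel
    return f * w[:, None, None]

def zero_spatial_attention(f):
    """Parameter-free spatial attention: average- and max-pool across
    channels, sum the two maps, and normalize into a spatial mask."""
    m = f.mean(axis=0) + f.max(axis=0)
    mask = softmax(m.ravel()).reshape(m.shape)
    return f * mask[None, :, :]

rng = np.random.default_rng(4)
f = rng.standard_normal((3, 4, 4))            # (channels, H, W)
# Concatenate the two attention outputs, as in the text.
out = np.concatenate([zero_spatial_attention(f),
                      zero_channel_attention(f)], axis=0)
```

Note that no learnable parameters appear anywhere: both branches are built entirely from pooling, summation, and normalization of the existing features.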
3.6.2. Recurrent Neural Network (RNN)
The RNN model extracts the temporal dependencies and sequences within the data, receiving its input from the output of the reshape layer. The hidden-state update of the RNN can be written as h_t = f(W_h h_{t-1} + W_x x_t + b), where h_t is the hidden state at time t, W_h and W_x are the weight matrices, b is the bias vector, and f is the linear activation function. The output of the first RNN layer is integrated with FZ optimization, where the weights and biases of the RNN model are tuned for accurate demand prediction and reduced prediction errors. The tuned representation is passed through the linear activation function and forwarded to the second RNN layer. The final prediction applies output weight and bias parameters to the final hidden state of the network at the last time step. This design also makes it possible to integrate inputs of different forms, namely images and reviews, thereby enhancing the accuracy of the demand forecasting model.
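The recurrence above can be sketched in a few lines (tanh is assumed as the activation purely for this illustration, and the weights are random stand-ins for trained parameters):

```python
import numpy as np

def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One recurrent update: the new hidden state mixes the previous
    hidden state and the current input through the activation."""
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

rng = np.random.default_rng(5)
hidden, feat = 4, 16
W_h = rng.standard_normal((hidden, hidden)) * 0.1
W_x = rng.standard_normal((hidden, feat)) * 0.1
b = np.zeros(hidden)

sequence = rng.standard_normal((8, feat))     # (timesteps, features)
h = np.zeros(hidden)
for x_t in sequence:                          # unroll over the sequence
    h = rnn_step(h, x_t, W_h, W_x, b)
# h is the final hidden state passed to the output layer
```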
3.7. Formicary Zebra Optimization
A new metaheuristic algorithm is developed from nature-inspired processes to address and overcome limitations found in existing algorithms, such as poor convergence, getting stuck in local optima, and the occurrence of false errors. Hence, the foraging behavior of the Zebra Optimization Algorithm (ZOA) [
27] and the path search capability of Ant Colony Optimization (ACO) [
28] are integrated into the proposed FZ optimization to overcome these limitations. Consequently, the FZ algorithm effectively navigates the large search space to find the optimal solution and offers the benefit of fewer parameter requirements while maintaining faster convergence. As a result, the FZ algorithm provides an efficient binary thresholding technique that facilitates accurate image segmentation: it segments the relevant features of the different fashion items from the image background, simplifying the images and feeding the DACR model with the significant image features. In addition, the FZ algorithm is applied for optimal hyperparameter selection, fine-tuning the weights and parameters of the DACR model to achieve high accuracy and reduce prediction errors while facilitating a high convergence rate. Altogether, these characteristics enhance the model's precision in demand forecasting for the fashion industry.
Inspiration: FZ optimization is inspired by the efficient foraging characteristics of zebras and the path search capability of ants. The ACO algorithm comprises the collective intelligence and path-finding characteristics of ants, which use pheromone trails to discover the shortest paths between food sources and their nest. FZ incorporates this mechanism to guide the search process towards optimal solutions. Additionally, the ZOA integrates the natural foraging and defense traits of zebras in the wild, including their coordinated foraging and defensive tactics against predators. By integrating these complementary behaviors, FZ enhances model effectiveness and robustness. The ACO ensures efficient exploration and exploitation of the search space, while the ZOA provides adaptability and resilience. Together, these characteristics improve the model’s ability to generate accurate demand forecasting.
Initialization: The position of each solution in the search space determines the values of the decision variables. The population of solutions can be modeled as a matrix in which each row is one candidate solution, the number of rows is the population size, and the number of columns is the number of decision variables. The initial position of every solution in the search space is assigned randomly, with the initial random solutions encoding the weight and bias parameters of the model.
Fitness Evaluation: In FZ, the fitness function is considered as the Mean Squared Error (MSE). The model is tuned for minimum MSE values to attain the optimal solution, which also increases the prediction accuracy. Thus, by minimizing the MSE, the FZ algorithm continuously refines the model’s parameters to improve the method’s ability to make an accurate demand forecast for the fashion industry.
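The fitness evaluation described above reduces to a direct MSE computation over the forecasts:

```python
import numpy as np

def mse_fitness(y_true, y_pred):
    """FZ fitness: mean squared forecast error (lower is better)."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

# Illustrative demand values: errors of 1, 0, and -2 units.
fit = mse_fitness([10.0, 12.0, 8.0], [9.0, 12.0, 10.0])
```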
Phase 1: Foraging strategy: This phase is applied when the phase-selection condition on the deterministic factor is satisfied. The pioneer solution and the leading solution are especially important in the foraging process of FZ, where the aim is to explore the search space and look for the best solutions. The leading solution paves the way for the other solutions, which rely on its guidance while mapping the search space. Here, the pioneer solution of zebra optimization and the following behavior of ant optimization are combined into an effective foraging strategy. This makes the pioneer solution effective in guiding the search, as it can identify successful paths and exploit them for the benefit of the group; it also helps establish a group average that guides members toward the position of the optimal solution. Besides supporting effective exploration and exploitation, it makes the process robust in the presence of noise and disturbances. The position update of this foraging strategy determines the new status of each zebra from its current and previous positions, the leading solution, a random factor in the interval (0, 1), the energy factor of the leading solution in the range (1, 2), the pheromone-based ability of the ants to follow the leading solution, the construction step, the evaporation rate, and the speed of the solution. The first part of the update models the leading behavior of the zebra leader, and the second part models the following behavior of the ants.
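Since the exact update equation is not reproduced here, the following sketch gives one plausible reading of the phase-1 update, combining a ZOA-style pull toward the leading solution with an ACO-style pheromone-weighted follow term. The functional form, step size, and evaporation rate are assumptions made for illustration only, not the paper's equation:

```python
import numpy as np

rng = np.random.default_rng(6)

def foraging_update(pos, leader, pheromone, step=0.5, evap=0.1):
    """Illustrative (assumed) phase-1 update: a ZOA-style pull toward
    the leading solution plus an ACO-style pheromone-weighted follow
    term. r ~ U(0, 1) and the energy factor E in (1, 2) as in the text."""
    r = rng.uniform(0.0, 1.0, size=pos.shape)
    E = rng.uniform(1.0, 2.0)
    lead_term = r * (leader - E * pos)              # zebra leading behavior
    follow_term = step * (1.0 - evap) * pheromone * (leader - pos)
    return pos + lead_term + follow_term            # ant following behavior

pos = np.array([0.0, 0.0])
leader = np.array([1.0, 1.0])
new_pos = foraging_update(pos, leader, pheromone=np.array([0.2, 0.2]))
```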
Phase 2: Escape strategy: In the second phase, the solutions' defensive strategies against attackers are employed to update the positions of population members in the search space. It is assumed that one of the following two conditions occurs with equal probability.
(i) Defensive strategy: this case applies when the attacker solution has a better fitness value than the current solution.
In FZ, when a solution is threatened by an opponent with better fitness, it implements defense mechanisms to remain safe from the attacker. Trailing and attractiveness characteristics guide the threatened solution to shift toward the center of the population. In this way, weaker solutions remain in a safe region where they can continue to improve their fitness, while stronger solutions carry on with the actual optimization without compromising the randomness, and hence the diversity, of the search. The corresponding update depends on the solution's updated following ability, the pheromone matrix (its ability to stay in the population), a small constant with value 0.01, and a probabilistic term in the range (0, 1).
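An illustrative (assumed, not exact) form of the defensive move toward the population center, using the pheromone value, the constant 0.01, and a probabilistic term as described:

```python
import numpy as np

rng = np.random.default_rng(7)

def defensive_update(pos, population, pheromone, c=0.01):
    """Illustrative defensive move: when the attacker is fitter, the
    threatened solution shifts toward the population center, scaled by
    its pheromone value, a small constant c = 0.01, and a random term."""
    centre = population.mean(axis=0)
    q = rng.uniform(0.0, 1.0)                   # probabilistic term in (0, 1)
    return pos + (pheromone * q + c) * (centre - pos)

population = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])
new_pos = defensive_update(np.array([0.0, 0.0]), population, pheromone=0.5)
```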
(ii) Offensive strategy: this case applies when the solution's fitness value is better than that of the attacker. In FZ, if a solution has a better fitness value than the attacker, it goes on the attack in order to safeguard the inferior solutions in the population. When an attacker targets a weaker solution, all of the stronger solutions converge toward the attacker. This coordinated movement distracts and tires out the attacker while creating a shield around the weaker solution, so that the weaker member is protected by the group acting together. The offensive strategy not only neutralizes the attacker but also strengthens the weakest solution in the population through this herding effect. This collective defense mechanism guarantees that members that have stagnated can still survive and contribute to the subsequent search, making the entire optimization algorithm stronger. Within the overall FZ framework, special attention is paid to minimizing the loss with respect to the model weights to improve prediction accuracy. This is done by including the loss function in the objective function during the training phase. As optimization progresses, the solutions move toward the set of parameters that produces the minimal loss, which enables the model to generate accurate and reliable predictions. This continuous reduction in loss enhances not only the forecast accuracy of the model but also its overall performance and reliability for demand forecasting in the fashion and apparel retail business.
Termination: Finally, the algorithm terminates the iterative process when the stopping condition for the optimal solution is satisfied; otherwise, the solutions are re-evaluated to continue the search for the optimal solution. The global best solution is then used to update the hyperparameters of the model, resulting in improved demand forecasting performance.
Figure 3 shows the flowchart of the proposed Formicary Zebra Optimization, and Algorithm 1 provides the pseudocode of FZ optimization.
| Algorithm 1. Pseudocode for the Formicary Zebra Optimization model.
|
| 1. | Start |
| 2. | Initialize a random population of solutions |
| 3. | Evaluate the fitness function (MSE) |
| 4. | While the termination condition is not met |
| 5. | If the foraging condition on the deterministic factor holds |
| 6. | Apply the foraging strategy (Phase 1) |
| 7. | Else if the attacker solution has better fitness |
| 8. | Apply the defensive strategy (Phase 2(i)) |
| 9. | Else |
| 10. | Apply the offensive strategy (Phase 2(ii)) |
| 11. | Re-evaluate the fitness of the updated solutions |
| 12. | End While |
| 13. | Declare the global best solution |
| 14. | Terminate the process |
| 15. | End |