1. Introduction
In recent years, the volatility of global supply chains has intensified due to factors such as geopolitical tensions, rapid technological changes, and rising environmental expectations. These challenges have elevated the importance of agility in supply chain design—allowing systems to rapidly adapt to disruptions while maintaining service levels and sustainability goals. The concept of the Agile Supply Chain (ASC) has, thus, emerged as a cornerstone of modern logistics, focusing on responsiveness, resilience, and operational efficiency [1,2].
At the same time, traditional optimization approaches, often based on static and deterministic assumptions, have shown limitations in effectively handling real-time variability and uncertainty. To address this gap, researchers and practitioners are increasingly turning to the integration of Internet of Things (IoT) and Digital Twin (DT) technologies. IoT enables the collection of real-time data from supply chain nodes such as warehouses, transport fleets, and production lines [3], while digital twins offer virtual simulations of physical systems, facilitating scenario testing, anomaly detection, and predictive analytics [4,5].
Despite their potential, the practical integration of IoT and DT into optimization models is still evolving. In particular, handling uncertain data streams from these technologies requires advanced modeling frameworks. Fuzzy logic provides a powerful tool to represent uncertainties in parameters such as demand, transportation time, and emissions [6], while adaptive learning algorithms such as reinforcement learning can enhance the model’s responsiveness to changing conditions [7].
This paper proposes a unified fuzzy multi-objective model that leverages IoT and DT architectures to continuously update and refine supply chain decisions in real-time. A Deep Deterministic Policy Gradient (DDPG) algorithm is used to adapt fuzzy parameters based on live data, enabling the system to learn and improve over time. The model addresses multiple conflicting objectives simultaneously, such as cost, delivery time, carbon emissions, and agility.
In addition to the methodological contributions, the paper presents two illustrative case studies:
Urban Grocery Distribution: A retailer manages a smart-city distribution network where consumer demand and local traffic vary significantly throughout the day. By applying the proposed model, the company dynamically adjusts delivery schedules and routes, resulting in reduced delivery time and lower emissions.
Humanitarian Relief Operations: In the aftermath of a natural disaster, rapid deployment of essential supplies is critical. The proposed model supports decision-making under uncertainty by updating route plans and resource allocations using IoT sensor data and digital twin simulations.
Through these contributions, this paper aims to advance the development of intelligent, self-learning supply chains that can operate effectively in uncertain and dynamic environments.
The key innovation of this research is that it provides an integrated combination of fuzzy logic, IoT, digital twin, and reinforcement learning that simultaneously enables accurate prediction, real-time intelligent control, and automatic adaptation to environmental uncertainties. This integration brings advantages over previous works that have used only IoT or only DT, such as reduced operating cost, reduced decision-making latency, increased accuracy in condition monitoring, and improved flexibility in the face of sudden changes in demand and capacity. Thus, the proposed model does not just collect data, but dynamically and continuously improves the multi-objective optimization process with reinforcement learning and fuzzy logic.
From the perspective of numerical solution, this research also adopts a hybrid approach. For solving small-scale problems, GAMS has been used to generate exact solutions and validation benchmarks. However, when dealing with large-scale and highly complex problems, metaheuristic algorithms, including NSGA-II, Multi-Objective Particle Swarm Optimization (MOPSO), and the Whale Optimization Algorithm, have been used. Each of these algorithms has been analyzed and compared across different problem dimensions and objective function structures to identify the best option based on the data type and decision-making objectives. In addition, an adaptive learning mechanism based on Reinforcement Learning has been designed to continuously and dynamically update the fuzzy values and bring the optimized solutions closer to real-world conditions.
2. Literature Review
The literature on agile and sustainable supply chains highlights the growing need for systems that can rapidly adapt to disruptions, reduce environmental impacts, and remain resilient in the face of uncertainty. Classical decision-making approaches based on deterministic parameters are increasingly viewed as inadequate in addressing challenges such as fluctuating demand, resource variability, and sustainability requirements [5,6]. Consequently, modern supply chain research has gravitated toward multi-objective and uncertainty-aware frameworks.
One critical development in this space is the application of fuzzy logic to capture the inherent vagueness in parameters like demand, lead times, and emissions. Fuzzy models allow for decisions that better reflect real-world ambiguity, enhancing model realism and robustness [8,9]. In particular, multi-objective fuzzy programming has enabled simultaneous optimization of competing goals—such as cost, time, and sustainability—even under incomplete or uncertain information [10,11].
In terms of optimization techniques, metaheuristic algorithms like NSGA-II, MOPSO, and the Whale Optimization Algorithm have gained popularity due to their scalability and flexibility. Unlike traditional linear programming methods, these algorithms can handle large, nonlinear, and non-convex problems that are common in real-world supply chains [12,13]. NSGA-II is well-regarded for producing diverse Pareto fronts and rapid convergence [14], while MOPSO offers high computational efficiency. The Whale Optimization Algorithm, though newer, is gaining attention for its dual-phase search capabilities [15,16].
Beyond optimization, the integration of real-time data streams into decision models has emerged as a key trend. The advent of the Internet of Things (IoT) has made it feasible to collect granular data on transportation, inventory, and environmental conditions [17,18]. However, access to data alone is insufficient—there remains a need for systems that can analyze, simulate, and act upon this data dynamically.
This is where Digital Twin (DT) technology plays a transformative role. DTs offer virtual representations of physical assets, enabling continuous monitoring and predictive analytics within the supply chain [19,20]. Despite growing interest, many existing studies only integrate the DT conceptually or use IoT data passively. There is a notable lack of frameworks where IoT and DT are actively connected to a mathematical optimization model [21].
An emerging solution lies in the use of adaptive learning mechanisms, particularly reinforcement learning techniques such as DDPG (Deep Deterministic Policy Gradient). These models can bridge the gap between data collection and decision-making by learning to adjust fuzzy parameters dynamically based on feedback [22,23]. Yet, few studies have successfully operationalized this feedback loop within a fully integrated, real-time supply chain optimization model [24,25].
To address these gaps, this research proposes a comprehensive, adaptive fuzzy multi-objective optimization framework that actively connects IoT and DT technologies with real-time learning algorithms. Unlike previous studies that treat these components in isolation, our model is built around their synergy—offering both high-level strategy and moment-to-moment adaptability. This integrated approach has the potential to reshape the future of supply chain intelligence.
3. Problem Definition and Assumptions
In this research, the problem of designing and optimizing an agile and sustainable supply chain under uncertainty is defined by utilizing digital twin and IoT technologies. The overall structure of the supply chain includes multiple suppliers, production centers, warehouses, and end customers that interact in real-time in a multi-level network. The main goal is to select the optimal configuration of this network in the presence of limited resources, variable demand, and environmental risks, so that multiple economic, environmental, and agility goals are realized simultaneously.
Decision-making in this model takes place at three main levels. At the strategic level, decisions related to facility location and overall network design are made. At the tactical level, resource allocation, production planning, and inventories are considered. Finally, at the operational level, transportation scheduling, real-time response to IoT data, and modification of logistics routes are performed using the digital twin model. This multi-layered structure allows the model to incorporate both long-term decisions and real-time adjustments into a coherent framework.
In the design of this system, a digital twin is used as an intelligent virtual system that simulates and analyzes the physical supply chain in real time. By connecting to sensors, cameras, and IoT platforms, this digital model receives real-time data on the location of goods, inventory levels, energy consumption, weather conditions, and transportation conditions, and provides them to the optimization algorithms as part of an intelligent decision-making system. The digital twin acts not only as a monitoring tool but also as an active part of the decision-making process, analyzing current conditions and providing optimal suggestions.
The IoT structure also includes a set of sensors and communication devices installed at key points in the supply chain, including in warehouses, trucks, production lines, and even finished products. These sensors measure information such as temperature, humidity, inventory levels, delays, energy consumption, and environmental conditions, and continuously send them to the digital twin model. This data is recorded and processed at high frequency and with minimal time delay, which allows the model to respond quickly to changes.
In practical implementation, the data sampling frequency is set between 5 and 30 s, depending on the variable type, which ensures high accuracy for sensitive data (such as temperature and energy). Information exchange is carried out via industry-standard protocols such as OPC-UA and MQTT, which provide data reliability and security. Also, a synchronization mechanism based on timestamp alignment and buffering strategy is used so that data received from different sensors is processed and used simultaneously in the digital twin model with minimal timing errors. This infrastructure ensures that optimization decisions are made based on an accurate and up-to-date picture of the entire system.
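To make this ingestion and synchronization step concrete, the following minimal Python sketch shows one way such a telemetry layer could be wired up. It assumes the paho-mqtt client library (v1.x API); the broker address, topic hierarchy, payload fields, and the alignment window are illustrative placeholders rather than values prescribed in this paper.

```python
# Minimal sketch of an MQTT-based IoT ingestion layer with timestamp alignment.
# Assumes paho-mqtt 1.x; broker, topics, and payload schema are hypothetical.
import json
import time
from collections import defaultdict

import paho.mqtt.client as mqtt

SYNC_WINDOW_S = 5.0                      # align readings that arrive within 5 s
buffer = defaultdict(list)               # sensor_id -> list of (timestamp, value)

def on_message(client, userdata, msg):
    """Store each reading with its timestamp for later alignment."""
    payload = json.loads(msg.payload)    # e.g. {"sensor": "wh3_temp", "t": 1700000000.0, "value": 4.2}
    buffer[payload["sensor"]].append((payload["t"], payload["value"]))

def aligned_snapshot(now=None):
    """Return the latest reading per sensor that falls inside the sync window."""
    now = now or time.time()
    return {
        sensor: readings[-1][1]
        for sensor, readings in buffer.items()
        if readings and now - readings[-1][0] <= SYNC_WINDOW_S
    }

client = mqtt.Client()
client.on_message = on_message
client.connect("broker.example.local", 1883)   # placeholder broker address
client.subscribe("supplychain/+/telemetry")    # placeholder topic hierarchy
client.loop_start()                            # background network loop
# The digital twin would call aligned_snapshot() every 5-30 s, matching the
# sampling cadence described above, to refresh its state.
```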
The type of demand in this model is considered fuzzy and dynamic, meaning that the exact amount of customer demand is not known in advance, but is modeled in the form of intervals with membership functions. This reflects the reality of unstable and unpredictable markets in high-risk industries. Specifically, demand is defined as a triangular or trapezoidal fuzzy model over a multi-period time horizon and is modified with real-time IoT data.
Resource constraints in this model also include supplier capacity, production constraints, warehouse capacity, and transportation fleet. These constraints are also modeled in a fuzzy manner, considering operational fluctuations and unforeseen events. For example, the production capacity of a factory may be reduced due to emergency repairs or power outages, which are detected and included in the model by the digital twin.
Environmental uncertainties, which include unexpected events such as sudden changes in raw material prices, traffic, natural disasters, or weather conditions, are modeled using fuzzy linguistic variables (such as “high”, “medium”, “low”). The proposed system detects these uncertainties through the IoT sensing layer and transmits them to the digital twin to assess their effects on macro decisions.
Finally, real-time data acquisition is a key feature of the present model, which is fully realized using a hybrid IoT and digital twin architecture. This capability enables decisions to be made not only based on historical data or forecasts, but also on live and current information, which plays a significant role in increasing the agility, resilience, and rapid response capability of the supply chain. However, in real-world environments, sensor data is not always error-free and may be subject to noise, transmission delays, and even short interruptions in data transmission. To mitigate the effects of these issues, the proposed architecture is designed to be coupled with preprocessing and denoising techniques such as the Kalman Filter and Moving Average smoothing, and to identify and replace suspicious data when necessary. These features ensure that the model’s decision-making remains valid even in the presence of incomplete or noisy data.
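As an illustration of the preprocessing step described above, the sketch below combines a simple moving-average outlier check with a one-dimensional Kalman filter; the noise variances, window length, and outlier threshold are illustrative assumptions, not calibrated values from this study.

```python
# Sketch of sensor-stream denoising: moving-average outlier replacement
# followed by a 1-D Kalman filter. Parameter values are illustrative.
from collections import deque

class Kalman1D:
    def __init__(self, q=1e-3, r=0.25, x0=0.0, p0=1.0):
        self.q, self.r = q, r          # process and measurement noise variances
        self.x, self.p = x0, p0        # state estimate and its variance

    def update(self, z):
        self.p += self.q               # predict
        k = self.p / (self.p + self.r) # Kalman gain
        self.x += k * (z - self.x)     # correct with measurement z
        self.p *= (1.0 - k)
        return self.x

def smooth_stream(readings, window=5, outlier_sigma=3.0):
    """Replace readings far from the local average, then Kalman-filter the rest."""
    kf, recent, cleaned = Kalman1D(x0=readings[0]), deque(maxlen=window), []
    for z in readings:
        if recent:
            mean = sum(recent) / len(recent)
            std = (sum((v - mean) ** 2 for v in recent) / len(recent)) ** 0.5
            if std > 0 and abs(z - mean) > outlier_sigma * std:
                z = mean               # replace suspicious value with local average
        recent.append(z)
        cleaned.append(kf.update(z))
    return cleaned

print(smooth_stream([4.1, 4.0, 4.2, 9.7, 4.1, 4.3]))  # 9.7 is treated as noise
```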
It should be noted that at this stage, the main focus of the research is on developing and testing the methodology, and, therefore, simulated and semi-random data have been used. However, the model architecture is designed in such a way that direct connection to real supply chain data, including IoT flows and digital twin data, is possible without the need for structural changes. In future plans, a pilot study in collaboration with a real supply chain is foreseen for the calibration and empirical validation of the model. In the following, the proposed structure of an integrated agile and sustainable supply chain model based on digital twin and Internet of Things layers is presented in Figure 1.
In this architecture, three decision-making levels (strategic, tactical, operational) are seamlessly combined with information flows from IoT and the Digital Twin analytics layer to enable multi-objective optimization under uncertainty. This integration of real-time data mining, intelligent analytics, and fuzzy modeling forms the innovative core of this research.
4. Fuzzy Multi-Objective Mathematical Model
In this section, a multi-objective fuzzy mathematical model is presented that aims to simultaneously optimize key metrics of an agile and sustainable supply chain, in the presence of real-time IoT data and digital twin simulation capabilities. This model is presented in the form of a mathematical framework including sets, parameters, decision variables, objective functions, constraints, and an adaptive learning mechanism. Due to the inherent uncertainties in demand, resources, and environmental risks, many of the model parameters are defined in a fuzzy manner, and their values are updated over time with the help of live data.
| Symbol | Description |
| --- | --- |
| $I$ | Set of suppliers (index $i$) |
| $J$ | Set of manufacturing plants (index $j$) |
| $K$ | Set of distribution centers (index $k$) |
| $L$ | Set of customers (index $l$) |
| $P$ | Set of products (index $p$) |
| $T$ | Set of time periods (index $t$) |
| $\tilde{D}_{lpt}$ | Fuzzy demand of customer $l$ for product $p$ in period $t$ |
| $c^{S}_{ij}$ | Transportation cost per unit from supplier $i$ to plant $j$ |
| $c^{P}_{jk}$ | Transportation cost from plant $j$ to distribution center $k$ |
| $c^{D}_{kl}$ | Transportation cost from distribution center $k$ to customer $l$ |
| $h_{kp}$ | Inventory holding cost for product $p$ at DC $k$ |
| $\tilde{t}_{uv}$ | Transportation time between nodes $u \rightarrow v$ |
| $\tilde{e}_{uv}$ | Carbon emission on the route $u \rightarrow v$ |
| $\tilde{g}_{j}$ | Energy consumption of plant $j$ |
| $a_{jk}$ | Agility index of the route from plant $j$ to DC $k$ |
| $Cap^{S}_{i}$ | Capacity of supplier $i$ |
| $Cap^{M}_{j}$ | Production capacity of plant $j$ |
| $Cap^{D}_{k}$ | Storage capacity at distribution center $k$ |
| $T^{\max}$ | Maximum allowable delivery time |
| $E^{\max}$ | Maximum total allowable carbon emission |
| $G^{\max}$ | Maximum total energy consumption |
| $\mu(\cdot)$ | Fuzzy membership function |
| $IoT_{t}$ | Real-time IoT data stream |
| $DT_{t}$ | Digital Twin feedback on current system status |
| $x_{ijpt}$ | Amount of product $p$ transported from supplier $i$ to plant $j$ in period $t$ |
| $y_{jkpt}$ | Amount transported from plant $j$ to distribution center $k$ |
| $z_{klpt}$ | Amount delivered to customer $l$ |
| $q_{jpt}$ | Production quantity of product $p$ at plant $j$ |
| $Inv_{kpt}$ | Inventory level of product $p$ at DC $k$ |
In the presented mathematical model, objective function (1) is designed to minimize the total cost of the supply chain, including the costs of supply, production, transportation, and maintenance, and its goal is to optimize the overall economic system. Objective function (2) focuses on minimizing the total cycle time from supply to final delivery and ensures the speed of network response in agile conditions. Objective function (3) considers the amount of greenhouse gas emissions resulting from logistics and production operations and is introduced with the aim of reducing the environmental impact of the model. Objective function (4) represents the maximization of the supply chain agility index, in which the ability to respond quickly, flexibility in distribution routes, and adaptability to demand fluctuations are taken into account.
Constraint (5) ensures that the amount ordered from each supplier does not exceed its allowed capacity. Constraint (6) limits the production quantity of each unit to the actual production capacity of that unit. Constraint (7) ensures that the quantity of goods transported between different centers does not exceed the capacity of the transportation fleet. Constraint (8) establishes the connection between ordering, production, and shipping so that the flow of materials in the chain is modeled in an integrated manner. Constraint (9) ensures that the inventory level in the distribution centers remains within an acceptable range in each time period. Constraint (10) satisfies the demand of end customers within the planned time horizon so that no shortages occur. Constraint (11) ensures the sustainability of the supply of raw materials from environmentally friendly sources. Constraint (12) limits the maximum allowable carbon emissions in each part of the network and controls environmental sustainability. Constraint (13) prevents excessive energy consumption in production processes so that energy-saving goals are achieved. Constraint (14) models the connection between IoT components and their data acquisition capabilities at different nodes of the chain. Constraint (15) ensures the minimum level of expected agility in key parts of the chain to maintain responsiveness in critical situations. Constraint (16) keeps the fuzzy values within defined ranges so that the uncertainties modeled in the fuzzy framework are controllable. Constraint (17) adjusts the adaptation rate of the fuzzy parameters according to learning from real-time data. Finally, constraint (18) restricts all decision variables to allowable numerical and logical ranges so that the mathematical model structure remains consistent and solvable.
In this model, environmental constraints such as carbon emission caps and energy consumption limits are defined as fuzzy variables to cover uncertainties due to regulatory changes and operational fluctuations. These variables are used as baseline values during the initial optimization process, but the model structure is designed to allow for dynamic updating. In other words, if environmental constraints change during or after the start of operation (e.g., due to stricter regulations or changes in energy prices), these changes can be fed into the state space online, and the DDPG reinforcement learning mechanism can update the values of the fuzzy membership functions and optimal bounds without having to stop or rerun the entire model.
In order to understand and manage the real uncertainties in the supply chain, many key parameters in this model are defined in fuzzy terms. Customer demand, transportation time, energy consumption, production capacity, and carbon emissions are among the variables that fluctuate and are unpredictable in real operating environments. Triangular or trapezoidal fuzzy numbers are used to model these parameters. For example, the fuzzy demand of customer $l$ for product $p$ at time $t$ is defined as follows:

$$\tilde{D}_{lpt} = \left(D^{\min}_{lpt},\; D^{\mathrm{mode}}_{lpt},\; D^{\max}_{lpt}\right),\qquad
\mu_{\tilde{D}_{lpt}}(x) =
\begin{cases}
\dfrac{x - D^{\min}_{lpt}}{D^{\mathrm{mode}}_{lpt} - D^{\min}_{lpt}}, & D^{\min}_{lpt} \le x \le D^{\mathrm{mode}}_{lpt},\\[2mm]
\dfrac{D^{\max}_{lpt} - x}{D^{\max}_{lpt} - D^{\mathrm{mode}}_{lpt}}, & D^{\mathrm{mode}}_{lpt} \le x \le D^{\max}_{lpt},\\[2mm]
0, & \text{otherwise,}
\end{cases}$$

in which the minimum, most likely, and maximum demand values are specified in the form of a fuzzy membership function. Other uncertain parameters such as delivery time or carbon emissions are modeled in the same way. When solving the model, these fuzzy numbers are converted to crisp values using methods such as the centroid method or α-cuts, but they remain as updatable fuzzy variables in the dynamic structure of the model.
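For illustration, the following minimal Python sketch shows a triangular fuzzy number with the two defuzzification options mentioned above (centroid and α-cut); the class name and the demand values are hypothetical placeholders.

```python
# Sketch of a triangular fuzzy number with centroid and alpha-cut defuzzification.
# The demand values below are illustrative, not data from the paper.
from dataclasses import dataclass

@dataclass
class TriangularFuzzy:
    a: float  # minimum
    m: float  # most likely (mode)
    b: float  # maximum

    def centroid(self) -> float:
        """Centroid defuzzification of a triangular membership function."""
        return (self.a + self.m + self.b) / 3.0

    def alpha_cut(self, alpha: float) -> tuple[float, float]:
        """Interval of values whose membership degree is at least alpha."""
        return (self.a + alpha * (self.m - self.a),
                self.b - alpha * (self.b - self.m))

demand = TriangularFuzzy(a=80.0, m=100.0, b=130.0)   # hypothetical demand (units)
print(demand.centroid())          # crisp value used when solving the model
print(demand.alpha_cut(0.6))      # tighter interval as the confidence level rises
```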
The proposed model is fully integrated with real-time data from IoT sensors and the digital twin infrastructure. The IoT infrastructure consists of a network of sensors, data collection devices, and communication systems installed at various points in the supply chain (such as production lines, warehouses, transportation fleets, and stores). These sensors transmit data such as ambient temperature, humidity, road traffic, machinery status, and inventory levels in real time. In turn, the digital twin, as an active digital representation of the entire supply chain, collects, analyzes, and simulates real-time data and reconstructs the current state of the system at any moment using analytical algorithms. This information is used not only to monitor system status but also to dynamically adjust the fuzzy parameters of the mathematical model. For example, if the actual transportation time between two points changes due to road conditions, the corresponding fuzzy value in the model is automatically corrected as well.
In this research, in order to dynamically and intelligently update the fuzzy parameters of the model, an adaptive learning module based on the Deep Deterministic Policy Gradient (DDPG) algorithm has been designed and implemented. DDPG is an Actor-Critic, policy-gradient algorithm in the field of reinforcement learning that is well suited to problems with continuous state and action spaces and performs efficiently in time-dependent environments with delayed feedback signals. These features make it particularly appropriate for adaptive modeling of the fuzzy parameters of the supply chain under uncertainty.
In this structure, the DDPG learning agent is responsible for generating the optimal value for fuzzy parameters (such as demand, shipping time, carbon emissions, etc.) at each time step, based on the current state of the system (State) provided through real-time IoT and digital twin data.
The basic structure is defined as follows:
State (S): A combination of real-time data including transportation status, inventory level, weather conditions, energy consumption, and past data of the desired fuzzy parameter.
Action (A): Determination of a new (adjusted) value for the fuzzy parameter, such as the fuzzy demand or the fuzzy transportation time.
Reward (R): Defined based on the amount of improvement in the model’s objective function (e.g., cost or delay reduction) or the reduction in uncertainty in the outputs.
Policy Network (Actor): A neural network to generate the best action (new parameter value).
Value Network (Critic): A second neural network to evaluate the value of the action taken.
Each time the mathematical model is run and the outputs for the current fuzzy parameters are calculated, a reward signal is issued to the DDPG agent. The agent gradually learns which modifications to the fuzzy parameters improve system performance under which conditions. Over time, the fuzzy values are intelligently adjusted to match the real-time state of the supply chain.
To avoid instability in learning, the following techniques are used:
Replay Buffer to prevent data correlation
Soft Target Networks for gradual Actor and Critic updates
Normalization of input values for compatibility with IoT data
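To make these stabilization techniques concrete, the following compact PyTorch-style sketch shows a replay buffer and the soft target update used to nudge the target Actor and Critic toward the online networks; the network sizes, state/action dimensions, and the value of τ are illustrative assumptions rather than the tuned settings of this study.

```python
# Sketch of DDPG stabilization mechanics: replay buffer and soft target updates.
# Dimensions, layer sizes, and tau are placeholders, not the paper's settings.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, TAU = 12, 3, 0.005   # illustrative values

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

actor, actor_target = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM, ACTION_DIM)
critic, critic_target = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
actor_target.load_state_dict(actor.state_dict())
critic_target.load_state_dict(critic.state_dict())

replay_buffer = deque(maxlen=100_000)       # breaks temporal correlation of samples

def soft_update(target, source, tau=TAU):
    """Move the target weights a small step toward the online network."""
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.mul_(1.0 - tau).add_(tau * s_param.data)

def store_and_sample(transition, batch_size=64):
    """Store (state, action, reward, next_state) and draw a decorrelated batch."""
    replay_buffer.append(transition)
    if len(replay_buffer) >= batch_size:
        return random.sample(replay_buffer, batch_size)
    return None

# After each gradient step on the actor and critic, the targets drift slowly:
soft_update(actor_target, actor)
soft_update(critic_target, critic)
```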
Integrating this structure with the mathematical model allows the system to not only be responsive to momentary changes but also to use past experiences to optimize future fuzzy decisions. This feature transforms the model from a static fuzzy system to a living, learning, and adaptive system. The detailed design of the state space, action space, and reward function, along with the interaction flow between DDPG and the fuzzy model, is explicitly described below and visually summarized in Figure 2.
Figure 2 clearly shows how the learning agent, using IoT data and digital twins, continuously refines fuzzy values and guides the system towards intelligent and responsive decision-making.
During the training of the DDPG agent, the convergence behavior was carefully monitored to ensure the stability of the learning process. The results showed that the cumulative reward increased steadily and reached a steady state after a few thousand iterations, while the learning errors of the Actor and Critic networks decreased steadily and remained within a stable range. This pattern indicates that the Replay Buffer and the Soft Target Updates mechanism prevented severe fluctuations and ensured stable convergence. For coherence and brevity, the reward and error trends are summarized descriptively rather than presented as training graphs.
5. Solution Methods
The proposed solution approach addresses the multifaceted complexity of supply chain optimization under uncertainty by combining exact mathematical programming, evolutionary computation, and adaptive machine learning techniques. This hybridized methodology is designed to ensure that the model is not only solvable across various problem scales—from small to industrially large—but also responsive and intelligent in adapting to real-time data inputs.
For small-scale problem instances, where the dimensionality of decision variables and constraints remains manageable, the model is implemented and solved using the General Algebraic Modeling System (GAMS). Within GAMS, classical exact techniques such as Branch and Bound, implemented through solvers such as BARON, are employed to obtain exact solutions. These exact solutions serve two essential purposes: first, they provide a ground truth for validating the outputs of heuristic algorithms; and second, they facilitate sensitivity analyses that help gauge the responsiveness of the model to parameter fluctuations in well-defined scenarios (see Table 1).
For medium to large-scale problem instances where exact optimization becomes computationally intractable, the paper turns to metaheuristic algorithms, known for their robustness in handling nonlinear, multi-objective, and high-dimensional problems. Three algorithms are applied:
NSGA-II (Non-dominated Sorting Genetic Algorithm II): Known for generating high-quality Pareto-optimal fronts, NSGA-II is used as a reference point for both performance accuracy and convergence diversity. It is particularly adept at balancing exploration (searching a wide space) and exploitation (refining current solutions).
MOPSO (Multi-Objective Particle Swarm Optimization): MOPSO is employed for its rapid convergence behavior, making it suitable for applications requiring near real-time decision-making. Its swarm intelligence mechanism allows for efficient exploration of complex landscapes.
Whale Optimization Algorithm: Chosen for its simplicity and low computational cost, this algorithm mimics the bubble-net hunting strategy of humpback whales, making it especially suitable for high-dimensional problems where quick and reasonably good solutions are acceptable.
All algorithms are tested under consistent experimental settings to ensure fair comparison. Parameters such as population size, mutation and crossover rates, maximum iterations, and archive size are carefully tuned and standardized. These settings are held constant across all experiments to isolate the influence of algorithmic structure on outcome performance.
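As an illustration of how one of these metaheuristics could be configured under such standardized settings, the sketch below sets up NSGA-II on a small surrogate two-objective problem, assuming the pymoo library (version 0.6 or later); the surrogate objectives, the constraint, and all parameter values are placeholders rather than the actual supply chain model or its tuned settings.

```python
# Sketch of an NSGA-II run with fixed settings, assuming pymoo >= 0.6.
# The surrogate problem stands in for the fuzzy supply chain model.
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.core.problem import ElementwiseProblem
from pymoo.optimize import minimize

class SurrogateSupplyChain(ElementwiseProblem):
    """Stand-in problem: minimize a cost-like and an emission-like objective."""
    def __init__(self):
        super().__init__(n_var=10, n_obj=2, n_ieq_constr=1, xl=0.0, xu=1.0)

    def _evaluate(self, x, out, *args, **kwargs):
        f1 = np.sum(x ** 2)                 # proxy for total cost
        f2 = np.sum((x - 1.0) ** 2)         # proxy for total emissions
        g1 = 0.6 - np.mean(x)               # proxy capacity/agility constraint (g1 <= 0)
        out["F"] = [f1, f2]
        out["G"] = [g1]

algorithm = NSGA2(pop_size=100)             # population size held constant across runs
res = minimize(SurrogateSupplyChain(), algorithm, ("n_gen", 200), seed=1, verbose=False)
print(res.F[:5])                            # sample of the non-dominated front
```

MOPSO and the Whale Optimization Algorithm would be benchmarked on the same problem instances with the same population size and iteration budget, so that differences in outcomes can be attributed to the algorithms themselves.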
A key innovation of the solution approach is the integration of a Deep Deterministic Policy Gradient (DDPG) algorithm. This reinforcement learning component serves as a dynamic, self-correcting mechanism for continuously adjusting fuzzy input parameters—such as uncertain customer demand, fluctuating transportation time, and shifting carbon emission rates. Drawing on real-time data from IoT devices and the analytical insights of the digital twin, the DDPG agent iteratively refines these parameters by maximizing reward signals tied to improved objective function outcomes.
The learning framework is structured around:
State representations based on sensor data and current system performance
Actions that propose parameter updates within fuzzy bounds
Rewards computed from improvements in cost, time, or sustainability metrics
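As a hedged illustration, the reward could be assembled from the relative improvement of each objective, as in the sketch below; the weights and the baseline figures are hypothetical and would need tuning for a concrete deployment.

```python
# Sketch of a reward signal built from relative objective improvements.
# Weights and example values are hypothetical placeholders.
def reward(prev_obj, new_obj, weights=(0.5, 0.3, 0.2)):
    """Weighted relative improvement in cost, delivery time, and emissions."""
    w_cost, w_time, w_co2 = weights
    rel = lambda before, after: (before - after) / max(before, 1e-9)
    return (w_cost * rel(prev_obj["cost"], new_obj["cost"])
            + w_time * rel(prev_obj["time"], new_obj["time"])
            + w_co2 * rel(prev_obj["co2"], new_obj["co2"]))

# Example: the parameter update reduced cost and emissions but not delivery time
prev = {"cost": 1000.0, "time": 48.0, "co2": 120.0}
new = {"cost": 960.0, "time": 48.0, "co2": 112.0}
print(reward(prev, new))   # positive reward -> reinforce this adjustment
```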
To stabilize training, techniques such as experience replay, soft target updates, and input normalization are applied. As a result, the learning system gradually develops a deep contextual understanding of the supply chain’s behavior, leading to smarter decision-making over time.
Performance evaluation is multi-dimensional. Quantitative measures include convergence rate, solution optimality, Pareto front spread and hypervolume, and runtime efficiency. Qualitative assessments involve robustness to parameter uncertainty, adaptability to changing inputs, and resilience under stressed conditions. These analyses validate not only the model’s numerical accuracy but also its practical viability in volatile real-world settings.
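For the hypervolume component of this evaluation, a typical computation looks like the sketch below, assuming the pymoo indicator API; the fronts and the reference point are hypothetical, normalized values rather than results from this study.

```python
# Sketch of the hypervolume indicator for comparing Pareto fronts (pymoo assumed).
# Front values and the reference point are hypothetical, normalized to [0, 1].
import numpy as np
from pymoo.indicators.hv import HV

front_nsga2 = np.array([[0.10, 0.80], [0.35, 0.45], [0.70, 0.20]])
front_mopso = np.array([[0.15, 0.85], [0.40, 0.55], [0.75, 0.30]])

hv = HV(ref_point=np.array([1.1, 1.1]))   # reference point slightly beyond the worst values
print("NSGA-II HV:", hv(front_nsga2))     # larger hypervolume -> better spread and convergence
print("MOPSO   HV:", hv(front_mopso))
```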
In summary, the proposed solution approach effectively integrates deterministic precision, evolutionary flexibility, and adaptive learning capabilities. This synergy enables the model to function robustly across diverse and dynamic operational scenarios. As a result, it serves as a powerful decision-support system, equally valuable for theoretical research and real-world industrial implementation.
6. Analysis of Results
To evaluate the performance of the proposed model in real-world conditions, a set of numerical experiments has been designed, including both small problems (solvable exactly with GAMS) and large problems (suitable for metaheuristic algorithms). These experiments are designed to cover a wide range of complexity, from scenarios with a small number of nodes and products to large scenarios with simulated industrial-scale data.
All data used in this table are semi-randomly generated and, where available, field experiments or data available in the literature have also been used. The most important feature of this table is that it is used as a unified reference for all subsequent numerical analyses, so that for each case, optimization analysis, algorithm comparison, and sensitivity testing can be performed separately.
For greater clarity, the main parameters of this study are organized into two categories. The first category comprises parameters determined directly from real data and the available scientific literature, including production and storage capacities, transportation cost ranges, energy consumption ranges, and carbon emissions, whose values are selected according to industry reports and valid case studies. The second category comprises parameters generated semi-randomly in order to evaluate the model under various conditions and examine sensitivity scenarios, including changes in customer demand and fluctuations in transportation time. To generate these data, standard statistical distributions (normal and uniform) were used within ranges consistent with the values observed in field studies. This hybrid approach both preserves the generalizability of the model and allows the results to be validated scientifically.
To conduct numerical experiments and evaluate the model’s performance at different scales, a set of ten diverse scenarios with different dimensions has been designed, which are presented in Table 2.
In this table, 10 different numerical scenarios with different dimensions are defined. Scenarios S1 to S3 are used for exact solution (with GAMS or reference analysis), while scenarios S4 to S10 are designed to be solved only through metaheuristic algorithms. The “Problem Size” column indicates an estimate of the number of variables and constraints in each model. Variation in the number of nodes (supplier, factory, distribution center, customer), products, and time periods also allows for analysis of the impact of complexity.
In this study, three small-scale problems (scenarios S1 to S3 from Table 2) are first solved using GAMS 46.1 software and exact solution methods to serve as a reference criterion for evaluating the quality of solutions produced by metaheuristic algorithms. This comparison allows us to numerically assess the accuracy and reliability of each algorithm before moving on to the analysis of larger problems.
In this study, simulated and semi-random data have been used to define the numerical scenarios in Table 2 in order to test a wide range of problem dimensions with precise control of variables and reproducible results. This approach allows for sensitivity analysis, model stability examination, and performance testing of optimization algorithms at different scales. It is worth noting that the model structure and the IoT and digital twin infrastructures are designed in such a way that they are capable of receiving and processing real data directly without any fundamental changes.
Figure 3 shows the final values of the objective functions for each algorithm and the optimal values obtained from GAMS. As can be seen in the graph, the NSGA-II algorithm produced the closest results to GAMS in all three scenarios, with an average relative error of less than 2%. The MOPSO algorithm also performed well, although with a slightly larger difference from the reference values. In contrast, the Whale algorithm showed a significant deviation from the optimal value in scenarios S2 and S3, indicating its lower accuracy in small problems.
This analysis confirms that the NSGA-II and MOPSO algorithms can provide results very close to optimal solutions on a small scale, and, therefore, their use in larger problems where exact solutions are not possible is scientifically defensible. This comparison provides a solid basis for entering into an analysis of the algorithms’ performance in more complex scenarios.
To compare the performance of the optimization algorithms in solving the proposed model, five medium to large-sized scenarios (S5 to S9) have been selected, which are suitable for evaluating efficiency under increasing computational complexity. For each scenario, the model is run with the three algorithms, NSGA-II, MOPSO, and the Whale algorithm, and two main criteria are examined: the final value of the aggregated objective functions (the lower the value, the better the quality of the solution) and the execution time of the algorithms (which indicates time efficiency and computational cost).
Figure 4 shows a comparison of the objective functions for each algorithm in five different scenarios. As can be seen, the NSGA-II algorithm produced a more optimal value for the objective functions in most cases, especially in larger problems such as S8 and S9, where the difference in performance with MOPSO and the Whale algorithm becomes more noticeable. In contrast, the Whale algorithm performed worse in optimizing the objective values, but it has special features in terms of runtime, which are discussed below.
In Figure 5, the execution time of each algorithm is examined in the same five scenarios. Contrary to expectations, the Whale algorithm has a shorter execution time than NSGA-II and MOPSO in most cases, indicating that it can be considered as a lighter option in situations where computational resources are limited. However, despite the longer time, NSGA-II provides better answers in terms of final quality and is a more suitable option for situations where the accuracy of the solution is of higher importance.
Overall, this analysis shows that NSGA-II, as the dominant algorithm in this model, offers a good balance between accuracy and time. MOPSO converges faster on medium-sized problems, and the Whale algorithm has an advantage in low execution times, but the latter shows a significant gap with NSGA-II in solution quality.
In order to evaluate the performance of optimization algorithms, this section discusses Pareto front analysis and solution convergence. The goal of this analysis is to measure the ability of algorithms to produce a diverse and qualitative set of non-dominated solutions in a multi-objective space. This analysis is especially critical for decision-making in situations of conflict between objectives (such as cost and carbon emissions).
Figure 6 shows the Pareto fronts of NSGA-II and MOPSO for scenario S8 (one of the most complex problems in Table 2). The two main objective functions compared in this analysis are the first objective (f1), the total cost of the entire supply chain, and the second objective (f2), the total carbon emissions. Both objectives must be minimized, so points that are simultaneously closer to the origin are considered more optimal solutions. It is worth noting that the Whale algorithm has been deliberately excluded from this analysis. The reason is that, in the numerical experiments, this algorithm showed two important weaknesses in the Pareto structure of large scenarios: first, the generation of fronts with low diversity and strong clustering in a limited part of the objective space; second, high fluctuation in the results across iterations, which indicates instability in convergence towards the true non-dominated front. Therefore, to avoid biasing the analysis or creating ambiguity, only the results of the two valid and stable algorithms are reported in this section.
As can be seen in the graph, the NSGA-II algorithm was able to create a more uniform, wider front and, on average, closer to the origin. This feature indicates that NSGA-II has a stronger performance in terms of the diversity of responses and the balance between exploration and exploitation. In contrast, the Pareto front obtained from MOPSO is more compact and has limited diversity, especially at the lower end (lower cost points), which may limit decision options in cost-priority scenarios.
Overall, this analysis confirms that NSGA-II not only performs well in terms of objective function value and running time, but also has a significant advantage over other algorithms in terms of Pareto front structure. This advantage is especially important in multi-criteria decision-making environments where a rich set of non-dominated solutions is required.
In order to measure the flexibility and robustness of the proposed model against operational fluctuations, three key parameters were systematically varied: customer demand, production and distribution resource capacity, and environmental constraints. The analyses were conducted for scenario S7 as a large and complex problem, and the results are presented in three curve plots (Figure 7, Figure 8 and Figure 9). For better clarity, 95% confidence intervals are shown as error bars in Figure 7, Figure 8 and Figure 9.
In the first analysis, the fuzzy demand level was varied in the range of 80% to 120% of the baseline value to examine its impact on the total cost of the supply chain. As can be seen in Figure 7, the increase in demand level caused a relatively linear growth in the value of the economic objective function, but the magnitude of this increase varied among the algorithms. The NSGA-II algorithm, with higher adaptability, was able to control the costs and showed a smoother increase slope than the other algorithms. In contrast, the Whale algorithm was more sensitive to the increase in demand and recorded the highest cost at level 1.2.
In the second step, the capacity of resources, including production, storage, and transportation, was changed in the range of 75% to 125% to analyze its impact on the average delivery time of the goods. The results presented in Figure 8 show that reducing the capacity caused a significant increase in the response time, especially in the MOPSO algorithm, whose performance was significantly slower under resource constraints. Even at low-capacity levels, the NSGA-II algorithm was able to maintain the delivery time within an acceptable range by optimally reallocating resources and routes. With increasing capacity, all three algorithms experienced a reduction in time, but the dominant algorithm remained NSGA-II.
Finally, in the third sensitivity analysis, the sustainability constraints were made more stringent: the allowable carbon emissions and energy consumption were reduced by 10 percent and 20 percent, respectively. Figure 9 shows the trend in the number of feasible non-dominated solutions for each algorithm under these constraints. It can be seen from this graph that NSGA-II, while maintaining the Pareto front structure, was able to provide an acceptable number of solutions even under the most stringent conditions, while the Whale algorithm lost almost all of its feasible solutions at the strictest level (0.8). The MOPSO algorithm also experienced a sharp decrease in the number of solutions and showed less stability than NSGA-II.
These three analyses confirm that the proposed model has a high ability to respond to critical changes, and the NSGA-II algorithm has provided a suitable combination of stability, agility, and response diversity in all cases. These findings make the model a reliable option for industrial applications, not only from a numerical perspective but also from a decision-making perspective in complex and real-world situations.
To increase statistical robustness, the sensitivity analysis results were evaluated not only by comparing the curves but also by calculating the mean and standard deviation of five independent runs of each scenario and reporting 95% confidence intervals for key indicators (such as total cost and delivery time). These calculations showed that the observed changes at all levels of parameter variation were statistically significant (p < 0.05), strengthening the stability and generalizability of the results. Furthermore, the low variance among runs indicates that the proposed algorithm is highly stable against input fluctuations.
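A minimal sketch of this statistical treatment, assuming SciPy and hypothetical run results, is shown below; it computes a 95% confidence interval from five runs and a paired t-test between two parameter levels.

```python
# Sketch of the confidence-interval and significance computations described above.
# The run values are hypothetical placeholders, not results from the paper.
import numpy as np
from scipy import stats

# Hypothetical total-cost results from five independent runs at the baseline demand level
runs_base = np.array([1052.3, 1048.9, 1061.2, 1055.7, 1050.4])

mean = runs_base.mean()
sem = stats.sem(runs_base)                 # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=len(runs_base) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.1f}, 95% CI = [{ci_low:.1f}, {ci_high:.1f}]")

# Hypothetical paired comparison against runs at the 120% demand level
runs_120 = np.array([1183.5, 1179.2, 1190.8, 1186.1, 1181.0])
t_stat, p_value = stats.ttest_rel(runs_base, runs_120)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")
```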
To evaluate the behavior of the adaptive learning model in modifying fuzzy values, a systematic analysis based on the interaction between real-time data and the fuzzy parameter update process has been performed. In this analysis, the initial value of some uncertain parameters, such as delivery time, supplier reliability, and transportation cost, has been estimated in a fuzzy manner with initial membership functions. Then, using the adaptive learning mechanism based on Reinforcement Learning, these membership functions have been updated based on the real feedback of the system (such as the deviation of the actual performance from the predictions).
To perform this evaluation, a comparison between two cases, “without adaptive learning” and “with adaptive learning,” was made over 10 consecutive periods. In the first case, the fuzzy membership functions remained fixed and only the initial information was used for decision-making. In the second case, the learning model dynamically modified the membership functions in each period using the data received from the IoT infrastructure and the Digital Twin platform. These modifications resulted in the gradual convergence of the fuzzy values towards more realistic values.
Figure 10 shows the trend of the forecast error reduction for the key variable “delivery cost” over 10 periods. It can be seen that the use of adaptive learning has led to a significant reduction in error and faster convergence of the model compared to the case without learning.
Next, Figure 11 shows how the fuzzy membership functions for one of the key parameters (e.g., delivery time) are updated over four time intervals using real-time data. It can be seen that the initial membership function, which has a wide and uncertain range, gradually becomes a more accurate and narrower function focused on more probable values.
These results clearly demonstrate that the proposed model, combining the digital twin architecture, IoT infrastructure, and adaptive learning algorithm, has the ability to dynamically and in real time modify fuzzy values. Such a capability is an important competitive advantage for decision stability in volatile and agile environments.
To accurately assess the impact of real-time traceability based on IoT and digital twin architecture, a comparative analysis was designed between two models: (1) a static fuzzy model with predefined parameters and (2) a dynamic model with parameters updated based on real-time data.
Figure 12 shows the difference in total cost in ten numerical scenarios with different problem sizes. In all scenarios, the traceable model was able to lead to more optimal and less costly decisions, indicating a better performance in adapting to environmental changes.
This analysis clearly demonstrates that integrating IoT and digital twin architectures is not only a technological advantage but can also lead to improved economic efficiency at the decision-making model level. This finding can be cited as a key innovation in designing agile and sustainable supply chains.
7. Discussion
The comprehensive numerical analysis and the application of diverse optimization strategies in this study provide compelling evidence of the effectiveness and versatility of the proposed model. By integrating fuzzy logic, IoT, digital twins, and reinforcement learning, the model advances the state of supply chain design—demonstrating both theoretical robustness and practical applicability.
A key finding across all experimental scales is the superior performance of the NSGA-II algorithm in balancing solution quality and diversity. Particularly in large-scale instances, NSGA-II consistently generated broader and more well-distributed Pareto fronts, outperforming MOPSO and the Whale algorithm in both convergence quality and response consistency. While MOPSO offered advantages in execution speed for medium-sized problems, it displayed weaker Pareto diversity in constrained environments. The Whale Optimization Algorithm, though computationally lightweight and fast, showed performance limitations in solution optimality, especially under strict environmental constraints.
Convergence patterns confirmed the stability and maturity of the proposed optimization framework. Successive iterations revealed a consistent reduction in objective function values and increasing tightness of Pareto fronts. This stability is crucial in practical applications where decision confidence and reliability are essential.
Sensitivity analyses further underscored the model’s robustness. Variations in demand, capacity, and environmental constraints did not destabilize the system; instead, the model effectively recalibrated its decision outputs. For instance, even under aggressive carbon and energy limits, NSGA-II maintained a high density of non-dominated solutions, validating its utility for sustainability-centered applications. These results affirm that the model is not only capable of reacting to uncertainty but can also proactively maintain performance within volatile operating environments.
A particularly innovative dimension of the model lies in its adaptive learning mechanism, which provides continuous feedback between real-world conditions and decision parameters. The DDPG reinforcement learning module demonstrated high efficacy in reducing prediction error and accelerating convergence of fuzzy values. Over multiple periods, the learning system significantly narrowed the gap between expected and observed values in variables such as delivery cost and transport time. This self-correcting mechanism minimizes reliance on static assumptions and transforms the model into a truly dynamic system.
Real-time traceability, enabled through the integration of IoT and DT technologies, emerged as a critical enabler of operational excellence. Comparative evaluations showed that when live data and digital twin simulations informed the optimization engine, overall supply chain costs were markedly reduced. In scenarios with high demand fluctuations or logistical disruptions, this feedback loop allowed for faster response times and better resource allocation, reinforcing the model’s relevance in agile contexts.
The case studies further grounded these insights in tangible applications. The urban grocery distribution scenario demonstrated the model’s ability to optimize delivery time and reduce emissions simultaneously, supporting smart city goals. The humanitarian relief scenario illustrated how adaptive logistics can safeguard service continuity in crisis zones, where rapid shifts in infrastructure and need levels are commonplace. These examples validate the model’s value not only as a theoretical construct but as a real-world decision-support framework.
Finally, the effect of real-time tracking and the adaptive structure of the model was observed not only in reducing costs but also in improving sustainability, increasing flexibility, and enabling real-time decision-making. Combining these features in an integrated and implementable model provides a new path for designing forward-looking, agile, and intelligent supply chains in complex industries. These results can also inspire the development of decision-support solutions in other operational areas such as crisis management, humanitarian logistics, or healthcare supply chains.
In conclusion, the solution strategy successfully blends deterministic accuracy, evolutionary adaptability, and real-time learning. This triad enables the model to operate effectively across a range of scenarios—making it a robust decision support tool for both academic exploration and industrial deployment.
To further demonstrate the practical applicability of the proposed model, two illustrative case studies are presented. These examples reflect distinct supply chain contexts that benefit from real-time adaptation and multi-objective decision-making under uncertainty.
Case Study 1: Urban Grocery Distribution
In this scenario, a national grocery retail chain manages last-mile deliveries across a smart city environment. Urban conditions are characterized by high variability in customer demand, traffic congestion, and strict sustainability regulations. The model is deployed to optimize the daily scheduling and routing of delivery vehicles across multiple urban zones. Real-time data from IoT devices—such as in-vehicle GPS, ambient temperature sensors, and inventory monitors—are fed into the digital twin, which simulates delivery feasibility and congestion risks. The DDPG-based adaptive mechanism continuously refines demand forecasts and vehicle capacities. Results from simulation experiments show that the model reduces total delivery time by 15%, lowers fuel consumption by 12.7%, and cuts total CO2 emissions by 14.3%, ensuring compliance with local emission targets. Moreover, the agile response mechanism enables dynamic re-routing during traffic incidents, further enhancing service continuity.
Case Study 2: Humanitarian Relief Operations
This case simulates logistics operations in the aftermath of a natural disaster, such as a cyclone or earthquake, where transportation infrastructure is partially disrupted and demand for essential goods is uncertain and rapidly evolving. The model is applied to coordinate deliveries of food, water, and medical supplies from regional warehouses to affected shelters and hospitals. IoT devices provide real-time updates on road accessibility, warehouse stock levels, and shelter capacities. The digital twin simulates operational scenarios, factoring in weather forecasts and regional constraints. The DDPG learning module adapts the supply network configuration based on changes in accessibility and emerging needs. In stress-test simulations, the model outperformed static planning approaches by 22% in meeting critical delivery windows. Key contributions from the model included real-time routing adjustments based on sensor data, prioritization of delivery loads through fuzzy demand modeling, and continuous adaptation of transportation availability influenced by digital twin feedback. The system maintained operational continuity in more than 90% of disrupted routes.
These illustrative applications highlight the model’s flexibility and relevance across sectors. Whether navigating the challenges of last-mile delivery in an urban setting or responding to rapidly shifting priorities in humanitarian contexts, the integration of adaptive learning with IoT/DT infrastructure empowers supply chains to make real-time, informed, and sustainable decisions.
Comparing the results of this research with studies that have used only the Internet of Things or only digital twins shows that the proposed hybrid framework provides significant practical advantages. These include improved average accuracy in forecasting demand and operating conditions, reduced response time to unexpected events, and increased resilience of the model in the face of noisy data. These improvements are achieved through the synergy of fuzzy logic for uncertainty management, the simulation and accurate prediction capabilities of the digital twin, and the dynamic decision-making mechanism based on reinforcement learning, which together provide a real-time, self-adaptive multi-objective optimization system.
One of the limitations of this research is the use of simulated and semi-random data, which aims to create controlled conditions for testing the model and measuring its response in diverse scenarios and at different scales. However, the proposed architecture, including IoT and digital twin layers, is fully ready to work with real data and can be used in industrial deployments and field studies without the need for structural changes.
Despite the strengths and demonstrated performance of the proposed model, several underlying assumptions and limitations should be acknowledged to contextualize its application and inform future research.
Dependence on High-Quality Real-Time Data: The effectiveness of the IoT and digital twin integration depends heavily on the availability, accuracy, and granularity of real-time data. In real-world supply chains, delays, sensor malfunctions, or data noise can compromise model responsiveness. In addition, it should be noted that in real-world conditions, there is a possibility of sensor errors, noise in the data, transmission delays, or loss of part of the data. In the present version, to simplify the modeling, these factors are not simulated, but the fuzzy structure and digital twin infrastructure are designed in a way that allows the integration of data preprocessing algorithms, noise removal filters, and incomplete data replacement methods, and can be fully integrated with such conditions in later stages of research.
Another important limitation of the present study is the lack of field data at the current stage. The reason for this choice was to focus on the development of the multi-objective fuzzy algorithm and ensure the repeatability of the method. However, initial collaborations with companies active in the food supply chain and urban logistics have begun to implement a pilot and feed the model with real data, and in the next stages, experimental results will be added to the research for validation and accurate calibration.
Computational Complexity for Large-Scale Deployments: Although metaheuristics and reinforcement learning reduce exact computation burdens, the overall solution time and resource requirements can still be significant, especially for very large problem instances with high fuzziness and stochasticity.
Assumed Stability of Learning Convergence: The reinforcement learning mechanism (DDPG) assumes stable convergence of the policy over time. In volatile environments or under sparse feedback, there is a risk of suboptimal policy oscillation or convergence stagnation.
No Human-in-the-Loop Consideration: The current model operates under a fully automated framework. In practice, decision-makers often override algorithmic suggestions based on qualitative insights, which is not currently accommodated.
Simplification of Multi-Agent Logistics Dynamics: The model assumes a centralized optimization perspective. In distributed or decentralized supply chains with multiple autonomous agents (e.g., third-party carriers), coordination mechanisms would need further development.
Tuning Sensitivity: The performance of the DDPG module and metaheuristics is sensitive to hyperparameter settings (e.g., learning rate, mutation probability). Extensive tuning is required, and the model lacks automated self-tuning mechanisms at present.
Environmental Factors Are Static During Execution: While environmental constraints (e.g., emissions) are modeled as fuzzy, they remain fixed during optimization runs. The model does not yet support dynamic environmental regulation updates during execution.
Although the sensitivity analysis was repeated with multiple replicates to increase statistical robustness, and mean, standard deviation, and confidence interval indicators were presented, the statistical validity of the results can still be further strengthened by applying more advanced statistical tests (such as multivariate analysis of variance and bootstrap tests) in future studies.
Sensor reliability is always one of the fundamental challenges in implementing IoT and digital twin-based systems. The possibility of sensor failure, gradual deviation in measurement accuracy, and the need for regular calibration can affect the quality of input data. Although the present model has the ability to integrate error detection and data replacement algorithms, in the next steps, it is necessary to implement predictive maintenance strategies and periodic calibration programs to ensure data stability in practice.
The multi-objective nature and the need for repeated execution of metaheuristic algorithms such as NSGA-II and MOPSO can create significant computational costs, especially in large-scale problems. Although the use of parallel processing and code optimization has led to a relative reduction in the computational load, the development of more scalable versions and the use of cloud computing infrastructures or GPUs is essential for future industrial applications.
Model performance depends on fine-tuning hyperparameters such as population size, mutation rate, learning rate, and stopping criterion. These hyperparameters directly affect the convergence speed and quality of optimal solutions. In the present study, these values were selected based on preliminary experiments and literature recommendations, but in the future, the process of selecting hyperparameters can be automated and more stable by using auto-tuning methods or higher-level meta-heuristic algorithms.
These limitations do not undermine the value of the model but instead highlight opportunities for further enhancement and real-world adaptation. Addressing these areas in future work—such as adding robust error-tolerant data pipelines, hybrid human–AI decision frameworks, or dynamic policy re-training modules—can further increase the resilience and practical utility of the framework.
Although the current model operates in a fully automated mode, future deployments will include Human–AI interaction pathways. Decision-makers will be able to monitor, override, or guide AI-driven recommendations, ensuring that expert insights complement real-time adaptive learning. This human-in-the-loop mechanism will strengthen trust and transparency in critical operational contexts.
8. Conclusions
This research presented a dynamic and adaptive model for designing and optimizing agile, sustainable supply chains under uncertainty. By integrating fuzzy multi-objective optimization with real-time Internet of Things (IoT) data, digital twin (DT) architecture, and an adaptive reinforcement learning mechanism (DDPG), the model offers a comprehensive and scalable framework for supply chain decision-making in volatile environments.
Key contributions of the study include the development of a multi-layered decision model that simultaneously minimizes cost, reduces delivery time, controls carbon emissions, and enhances supply chain agility. Unlike traditional models, the proposed approach dynamically updates fuzzy parameters based on real-time data inputs, enabling continuous responsiveness to evolving operational conditions. The implementation of the DDPG module further enhances learning efficiency and prediction accuracy, ensuring that supply chains are not only reactive but also anticipatory in nature.
Experimental analyses across multiple scenarios confirmed the model’s robustness and effectiveness. NSGA-II emerged as the most balanced and accurate optimization strategy, while MOPSO and the Whale algorithm offered advantages in speed and computational simplicity under specific conditions. Sensitivity analyses demonstrated the model’s ability to maintain stable performance across a range of demand patterns, capacity constraints, and environmental limitations.
The practical potential of the model was illustrated through two hypothetical but realistic case studies. In urban grocery distribution, the model enabled substantial reductions in delivery time (15%), fuel consumption (12.7%), and CO2 emissions (14.3%) through adaptive routing and load balancing. In humanitarian relief operations, the model ensured delivery reliability by dynamically adjusting route plans and resource allocations in response to disrupted infrastructure and fluctuating needs.
Beyond operational performance, the model contributes to broader strategic goals, including environmental sustainability, digital transformation, and crisis readiness. Its modular architecture and real-time learning capabilities make it adaptable for a variety of applications—from perishable goods logistics and e-commerce to public health and emergency response systems.
Future research directions include deploying the model in real-world pilot studies, integrating deeper predictive analytics using advanced deep learning techniques, and expanding the architecture to accommodate blockchain-based traceability or autonomous vehicle coordination. These extensions can further solidify the model’s role as a next-generation decision-support tool in the era of smart and resilient supply chains.
The next step of this research is to conduct field studies and experimentally deploy the model in real environments to validate and calibrate its performance with live data from IoT sensors and digital twins. Such a pilot study with real industrial datasets will complement the simulation-based findings and strengthen the experimental robustness and industrial applicability of the model.
In conclusion, this study offers a robust foundation for intelligent supply chain design, capable of navigating the challenges of complexity, uncertainty, and sustainability in the modern industrial landscape.