Smart Master Production Schedule for the Supply Chain: A Conceptual Framework

Risks arising from disruptions and unsustainable practices constantly push the supply chain toward uncompetitive positions. A smart production planning and control process must address both risks by reducing them, thereby strengthening supply chain (SC) resilience and its ability to survive in the long term. On the one hand, the antidisruptive potential and the inherent sustainability implications of the zero-defect manufacturing (ZDM) management model should be highlighted. On the other hand, the digitization and virtualization of processes by Industry 4.0 (I4.0) digital technologies, namely digital twin (DT) technology, enable new simulation and optimization methods, especially in combination with machine learning (ML) procedures. This paper reviews the state of the art and proposes a ZDM strategy-based conceptual framework that models, optimizes and simulates the master production schedule (MPS) problem to maximize service levels in SCs. This conceptual framework will serve as a starting point for developing new MPS optimization models and algorithms in supply chain 4.0 (SC4.0) environments.


Introduction
Since artificial intelligence began to make its way into almost every sector of today's society, the adjectives intelligent and smart have become commonplace to describe a myriad of entities that are, in one way or another, endowed with the ability to react to changes in their environment and to establish optimal operating conditions by themselves. Examples in the industrial sector include intelligent software, intelligent systems and intelligent agents, as well as smart grids, smart sensors and smart products. For a supply chain 4.0 (SC4.0), understood as the supply chain (SC) that is reorganized by using the design principles and enabling technologies of the Industry 4.0 (I4.0) spectrum [1], it seems appropriate to link the intelligent or smart attributes with the SC's ability to overcome the risks it faces and to survive, as this is the main proof of its capability to respond to challenging changes in the environment and to achieve optimal operating conditions. Along these lines, and regardless of whether the causes are natural, economic, political or technological, disruption is the most significant risk that an SC faces in the short and mid terms, whereas on a long-term horizon, lack of sustainability is one of the main risks for SC survival. Hence an SC4.0, as a smart SC, should be both resilient and sustainable.
The effect of technological advances on industrial companies is remarkable, and it guides their development toward a production paradigm in which resilience and sustainability emerge as decisive SC management elements, both for occupying better market positions in the future and for survival [2]. The digital transformation that SCs undergo on their way toward SC4.0 can help address, from a more favorable position, those aspects that compromise resilience and sustainability, by using I4.0 design principles and enabling technologies to mitigate the complexity [...] heterogeneity grows, particularly if the MPS is posed as a multi-objective problem. These limitations can lead to unacceptable computational times for a decision support system (DSS) that is expected to facilitate real-time decision making, especially when the intention is to give it a certain level of autonomy, as the approach set out in this research calls for. The potential of machine learning (ML) to tackle this situation is remarkable at every production planning and control decision level [23,24], and its application to the MPS problem should therefore be considered. Furthermore, the feasibility of formulating the MPS problem as a Markov decision process (MDP) [25][26][27] makes reinforcement learning the most suitable candidate among existing ML methodologies.
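Because the choice of reinforcement learning rests on the possibility of casting the MPS as an MDP, a toy example may help make that link concrete. The sketch below is a minimal illustration under loudly stated assumptions, not the authors' model: a single product, a hypothetical per-period capacity, an equiprobable demand set and made-up cost coefficients, with on-hand inventory as the state, the period's production quantity as the action and the negative period cost as the reward. It is solved with plain tabular Q-learning rather than the DRL methods discussed later in the paper.

```python
import random
from collections import defaultdict

# Hypothetical toy MPS instance (all values are illustrative assumptions).
MAX_INV = 10          # storage limit (units)
CAPACITY = 4          # production capacity per period (units)
DEMANDS = [1, 2, 3]   # equiprobable demand per period (units)
UNIT_COST, HOLD_COST, SHORTAGE_PENALTY = 0.5, 1.0, 5.0
ACTIONS = range(CAPACITY + 1)  # feasible production quantities


def step(inventory, produce, rng):
    """One MDP transition: produce, observe demand, pay production/holding/shortage costs."""
    demand = rng.choice(DEMANDS)
    available = min(inventory + produce, MAX_INV)
    shipped = min(available, demand)
    lost = demand - shipped  # unmet demand is lost, not backlogged
    next_inventory = available - shipped
    cost = UNIT_COST * produce + HOLD_COST * next_inventory + SHORTAGE_PENALTY * lost
    return next_inventory, -cost  # reward is the negative period cost


def q_learning(episodes=2000, horizon=12, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = defaultdict(float)  # q[(state, action)], initialised to 0
    for _ in range(episodes):
        inventory = 0
        for _ in range(horizon):  # one planning horizon of `horizon` periods
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(inventory, a)])
            nxt, reward = step(inventory, action, rng)
            target = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(inventory, action)] += alpha * (target - q[(inventory, action)])
            inventory = nxt
    return q


def greedy_policy(q):
    """Production quantity chosen in each inventory state after training."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(MAX_INV + 1)}


policy = greedy_policy(q_learning())
print(policy)
```

Replacing the Q-table with a neural network approximator is what turns this setting into the DRL variants (e.g., DQN) referred to in the literature review.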
Thus the combined use in the MPS process of (i) the DT enabling technology, (ii) the ZDM management model, and (iii) ML-based modeling approaches is particularly relevant because it can guide SCs toward positions of greater resilience and sustainability and, for this reason, can be qualified as a smart approach that provides threefold, complementary assistance to SCs' responsiveness to changes in their environment. Nevertheless, this joint perspective on the smart improvement of the MPS problem has not yet been addressed by the academic community: only one paper in the existing literature provides a simple initial conceptual framework that coincides with the joint approach indicated herein. To bridge this knowledge gap, this paper presents an overview of the addressed topics from a joint perspective and proposes an initial ML-based DT framework for automated MPS management in an SC4.0 context with a zero-defect characteristic, which we call smart MPS, to answer the following research questions:
- RQ1: What mechanisms can make the DT competent in assisting the MPS process from an enabling-strategy perspective?
- RQ2: How can ML techniques help to overcome the difficulties arising from the computational efficiency of solving the MPS problem?
- RQ3: How does the ZDM antidisturbance strategy push the MPS toward a more resilient and sustainable SC?
- RQ4: Can the DT technology, the ZDM management model and ML-based modeling approaches be considered complementary conceptual tools that support the MPS and push it toward higher resilience and sustainability levels?
The rest of the paper is organized as follows. Section 2 first provides definitions of the main concepts included in this research and subsequently offers an overview of the related literature. Section 3 describes the conceptual proposal by defining an initial framework and presenting the setup of a smart MPS. Section 4 discusses the main implications of the proposal by reviewing how it responds to the research questions. Finally, Section 5 presents the main conclusions and lines of further research.

Literature Review
The review of the selected literature was carried out in four stages: (i) a semantic introduction to the main concepts involved; (ii) a literature search; (iii) a thematic analysis of the joint domain covered by the selected literature; (iv) a content analysis to identify the main contributions from the perspective of this research.

The Main Involved Concepts
The introductory definitions of the main concepts employed in this research are provided in Table 1.

Concept Definitions
Industry 4.0 (Enabling context)
I4.0 stands for the fourth industrial revolution, which is defined as a new level of organization and control over the entire value chain of products' life cycle. It is geared to increasingly individualized customer requirements [28]. A combination of digital technology with manufacturing that transforms industrial production to the next level [29]. The convergence of industrial production and information and communication technologies [30].

Supply chain 4.0 (Target context)
A transformational holistic approach to SC management that utilizes I4.0 disruptive technologies to streamline SC processes, activities and relations to generate significant strategic benefits for all the SC stakeholders [31]. SC4.0 is the SC created as a result of the new digital era brought forth by the fourth industrial revolution, I4.0 [32]. The reorganization of SCs (design and planning, production, distribution, consumption, reverse logistics) using technologies known as I4.0 [1].

Master production schedule (Research object)
A line on the master schedule grid that reflects the anticipated build schedule for those items assigned to it, and that represents the items a company plans to produce, expressed in specific configurations, quantities and dates [8]. The MPS is essential for maintaining customer service levels and stabilizing production planning in a material requirements planning (MRP) environment [33]. The MPS drives the MRP system and provides an important link between the forecasting, order entry and production planning activities on the one hand, and the detailed planning and scheduling of components and raw materials on the other [34].
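The MPS record behind the "line on the master schedule grid" definition can be illustrated with the usual textbook projection of the projected available balance (PAB). The sketch below assumes a simplified rule in which each period consumes the larger of forecast and booked customer orders, a fixed lot size and an optional safety stock; time fences and available-to-promise logic are deliberately omitted, and the function name `mps_record` is hypothetical.

```python
def mps_record(on_hand, forecast, orders, lot_size, safety_stock=0):
    """Return (MPS quantities, projected available balance) for each period.

    Simplified rule (assumption): each period consumes max(forecast, booked orders),
    and whole lots are scheduled whenever the projected balance would fall
    below the safety stock.
    """
    if lot_size <= 0:
        raise ValueError("lot_size must be positive")
    mps, pab = [], []
    balance = on_hand
    for f, o in zip(forecast, orders):
        balance -= max(f, o)           # gross requirement of the period
        quantity = 0
        while balance < safety_stock:  # add lots until the PAB is back at the target
            quantity += lot_size
            balance += lot_size
        mps.append(quantity)
        pab.append(balance)
    return mps, pab


# Example: 20 units on hand, lot size 25, four periods.
mps, pab = mps_record(20, forecast=[10, 10, 10, 10], orders=[12, 8, 5, 0], lot_size=25)
print(mps)  # [0, 25, 0, 0]
print(pab)  # [8, 23, 13, 3]
```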

Digital twin (Research tool)
A dynamic model in the virtual world that is fully consistent with its corresponding physical entity in the real world and can simulate its physical counterpart's characteristics, behavior, life, and performance in a timely fashion [35]. A virtual model in the virtual space that is used to simulate the behavior and characteristics of the corresponding physical object in real time [36]. A virtual and computerized counterpart of a physical system that can exploit the real-time synchronization of the sensed data from the field and is closely linked with I4.0 [37].

Machine learning (Research tool)
A computer program capable of learning from experience to improve a performance measure of a given task [38]. ML is an evolving branch of computational algorithms, designed to emulate human intelligence by learning from the surrounding environment [39]. ML is an artificial intelligence application that provides computers with the ability to automatically learn and improve from experience with no direct programming [40].

Zero-defect manufacturing (Research tool)
A strategy whose goal is to decrease and mitigate failures in manufacturing processes and to do things right the first time [41]. A manufacturing strategy that, assuming that errors and failures will always exist, focuses on minimising them and detecting them online so that no production output that deviates from specifications advances to the next step [16]. ZDM consists of four strategies: detection, repair, prediction and prevention [42].
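To make the four ZDM strategies in the definitions above more tangible, the following sketch routes measured items through a quality gate. It is a hypothetical illustration, not a method from the cited works: the specification limits, the `repair` callable and the naive trend extrapolation used for prediction are all assumptions. Conforming items advance (detection), nonconforming items are repaired if a correction brings them back into specification and are otherwise rejected so that they never reach the next step, and a predicted out-of-specification value raises a prevention alert.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Spec:
    lower: float
    upper: float

    def conforms(self, value: float) -> bool:
        return self.lower <= value <= self.upper


def predict_next(history: List[float], window: int = 3) -> float:
    """Prediction: extrapolate the average increment of the last `window` values."""
    recent = history[-window:]
    if len(recent) < 2:
        return recent[-1]
    slope = (recent[-1] - recent[0]) / (len(recent) - 1)
    return recent[-1] + slope


def zdm_gate(measurements: List[float], spec: Spec,
             repair: Callable[[float], float], window: int = 3
             ) -> Tuple[List[int], List[int], List[int], List[int]]:
    """Return indices of advanced, repaired and rejected items, plus prevention alerts."""
    advanced, repaired, rejected, alerts = [], [], [], []
    history: List[float] = []
    for i, value in enumerate(measurements):
        history.append(value)
        if spec.conforms(value):            # detection: item is within specification
            advanced.append(i)
        elif spec.conforms(repair(value)):  # repair: hypothetical corrective action
            repaired.append(i)
        else:                               # rejected: never passed to the next step
            rejected.append(i)
        if not spec.conforms(predict_next(history, window)):
            alerts.append(i)                # prevention: adjust the process upstream
    return advanced, repaired, rejected, alerts


result = zdm_gate([10.0, 10.2, 10.5, 10.9, 11.3, 12.5], Spec(9.0, 11.0),
                  repair=lambda x: x - 0.5)
print(result)  # ([0, 1, 2, 3], [4], [5], [3, 4, 5])
```

Note that the alert at item 3 fires while that item is still within specification, which is the point of combining prediction with prevention rather than relying on detection alone.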

Intelligence (I4.0 design principle)
The attribute of an artificial system whose behavior would be considered intelligent if a human behaved in the same way [43]. Intelligence assists decision making by converting raw business data into valuable and meaningful information and knowledge [44], and is supported by the development of advanced analytics and data visualization models, platforms and services that support decision-making processes [45]. Intelligence is a corporate capability to forecast change, whether it comes as an opportunity or a threat, in time to do something about it [46].

Real-time action ability (I4.0 design principle)
A set of conditions, qualities and abilities that allows a device or system to correctly perform a function when interacting with a real-world physical process that shares the same temporal constraints. In the SC context, this capability characterizes the way in which a given SC device or system successfully performs its function within the time frame of the process with which it interacts, without altering the pace of that process. This capability is one of the main concerns in an SC because it speeds up the elicitation of responses during decision making and, consequently, increases efficiency [47].

Supply chain resilience (Expected effect)
Resilience is an SC's capacity to persist, adapt or transform when faced with change, from both engineering and social-ecological perspectives [48]. The adaptive capability of an SC to prepare for and/or respond to disruptions, to make a timely and cost-effective recovery and, therefore, to progress to a post-disruption state of operations, ideally a better state than that before the disruption [49].
SC resilience is the adaptive capability to prepare for unexpected events, respond to disruptions, and recover from them by maintaining the continuity of operations at the desired level of connectedness and control over both structure and function [50].

Supply chain sustainability (Expected effect)
SC sustainability is the management of environmental, social and economic impacts, and the encouragement of good governance practices, throughout the life cycles of goods and services [51]. The extent to which the SC organization's decisions impact the future situation of the natural environment, society and business viability [52]. A sustainable SC is one that includes measures of profit and loss, as well as social and environmental dimensions. Such conceptualization has been referred to as the sustainability triple dimension: financial, social, environmental [53].

Literature Search
The SC is a conceptual realm that has been approached from many different angles, with more than 50,000 entries in Scopus in the past decade alone. In an attempt to identify these trends, Maryniak et al. [54] determine the dominant SC topic areas of the last three decades. However, hardly any literature has been identified that simultaneously addresses the MPS problem in the SC from the ZDM perspective and with the joint support of DT and ML technologies. Thus, in the Scopus database, the search instance TITLE-ABS-KEY (("supply chain" OR "supply network") AND ("master production" OR mps) AND (zdm OR "zero * defect") AND "digital twin" AND ("machine learning" OR "artificial intelligence")) returned only one result, which evidences a knowledge gap. For this reason, we further explored the existing literature in the individual knowledge domains of MPS, ZDM, DT and ML, applied specifically to the SC, which added 24 relevant papers to the aforementioned one (Table 2). Given the special relevance of some of the concepts involved, namely (i) I4.0, which represents the enabling context; (ii) SC4.0, which represents the target context; (iii) intelligence and real-time action ability, which are the main I4.0 design principles implied in the proposed research line; and (iv) resilience and sustainability, which are the ultimate expected effects of applying the DT-ML-ZDM scheme to the MPS, the review of these 25 papers has also considered how each of them treats these concepts.
Computers 2021, 10, 156

Thematic Analysis
The thematic analysis of the selected papers, carried out with the VOSviewer 1.6.16 tool, shows (Figure 1) a first grouping of concepts around the main one, "supply chain", which is connected to four other groupings: "digital twin", "production plan", "digital technology" and "supply chain management". From the thematic map, based on the co-occurrences in the text composed of the title and abstract of each paper, we observe that: (i) besides "supply chain", the main group contains "organization", "ZDM", "sustainability" and "uncertainty"; (ii) the concept most closely related to "supply chain" is "digital twin", which might reveal the importance that this technology has acquired in the SC field; (iii) "digital twin" and "supply chain planning" form a cluster, which shows the importance that academia attributes to the DT for SC planning processes; (iv) the "production plan" cluster also contains "CPS" (cyber-physical systems) and "agent", which suggests that they are common tools for researchers in production planning; (v) the cluster headed by "digital technology" includes concepts such as "quality", "simulation", "ripple effect" and "resilience", which may indicate where digital technology attracts attention or generates interest in the SC domain; (vi) the cluster headed by "supply chain management" also integrates "knowledge" and "future research", which might be related to the interest shown in acquiring new knowledge in the SC domain to support improvements to its management processes.
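The thematic map rests on term co-occurrences in titles and abstracts. VOSviewer applies its own term extraction, normalization and clustering, which are not reproduced here; the sketch below only shows, under the simplifying assumption of case-insensitive substring matching against a fixed term list, how raw co-occurrence counts of the kind such maps start from can be obtained. The example titles and the function name are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List


def cooccurrences(documents: Iterable[str], terms: List[str]) -> Counter:
    """Count, for each pair of terms, the number of documents mentioning both.

    Matching is naive case-insensitive substring search (an assumption), so
    "supply chain" also matches inside "supply chain management".
    """
    counts: Counter = Counter()
    for text in documents:
        lowered = text.lower()
        present = sorted({t for t in terms if t.lower() in lowered})
        for pair in combinations(present, 2):  # each unordered pair once per document
            counts[pair] += 1
    return counts


docs = [
    "Digital twin for supply chain resilience",
    "A digital twin approach to supply chain planning",
    "Zero-defect manufacturing and sustainability in the supply chain",
]
terms = ["supply chain", "digital twin", "resilience", "sustainability"]
counts = cooccurrences(docs, terms)
print(counts.most_common(1))  # [(('digital twin', 'supply chain'), 2)]
```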

Content Analysis
Proposing new resolution and optimization models has been a recurrent approach in the specific SC literature segment that focuses on the MPS. Chern et al. [55] put forward a multi-objective MPS resolution model with a heuristic method based on a genetic algorithm, called the GA-based master planning algorithm (GAMPA), to solve an MPS problem with multiple final products, substitutions and a recycling process with a stochastic pattern, which creates a loop in both the SC and the product structure trees. Grillo et al. [56] use fuzzy set theory to model uncertainty and propose a particle swarm optimization (PSO) metaheuristic as the solution method. Sutthibutr and Chiadamrong [57] propose a method to achieve an optimal MPS in an uncertain environment. It is based on a multi-objective fuzzy linear model with an α-cut analysis that ensures the result satisfies decision makers' preferences, based on a specified minimum allowed satisfaction value (α). Arani and Torabi [58] integrate physico-material tactical plans with financial ones to account for their reciprocal effects in a bi-objective mixed possibilistic-stochastic model for an SC master planning problem. Ghasemy et al. [59] propose a mixed integer nonlinear programming model with probabilistic constraints to determine centralized planning from a sustainability perspective under uncertainty; here, the sustainability aspect is reduced to sustainable procurement planning, addressed through appropriate supplier selection. Martin et al. [60] address the uncertain MPS problem for an automotive second-tier supplier with two optimization approaches based on other authors' research; both were tested in a real automotive SC and compared with a deterministic approach. Peidro et al. [61] tackle the MPS problem for a centralized SC of replenishment, production and distribution, which they model with a fuzzy multi-objective linear programming approach.
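The α-cut analysis mentioned for Sutthibutr and Chiadamrong [57] can be illustrated with a triangular fuzzy number, a common choice for modeling uncertain demand or capacity; whether the cited model uses triangular numbers is an assumption here. For a triangular number (a, b, c), whose membership rises linearly from a to the peak b and falls back to zero at c, the α-cut is the interval of values with membership at least α, namely [a + α(b - a), c - α(c - b)]; a higher minimum satisfaction value α narrows the range the plan must cover. The fuzzy demand in the example is hypothetical.

```python
def membership(x: float, a: float, b: float, c: float) -> float:
    """Membership degree of x in the triangular fuzzy number (a, b, c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def alpha_cut(a: float, b: float, c: float, alpha: float) -> tuple:
    """Interval of values whose membership is at least alpha."""
    if not a <= b <= c:
        raise ValueError("expected a <= b <= c")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return a + alpha * (b - a), c - alpha * (c - b)


# Hypothetical fuzzy demand: about 100 units, at least 80, at most 130.
low, high = alpha_cut(80, 100, 130, alpha=0.5)
print(low, high)  # 90.0 115.0
```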

Serrano et al. [18] propose an initial DT-based conceptual framework to model and simulate the MPS problem with a ZDM feature in the SC4.0 context. This is the only paper in the literature that comprehensively addresses the focus of this research, albeit with an initial descriptive approach. This framework focuses on creating an enabling space for running MPS optimization algorithms based on deep reinforcement learning (DRL) techniques. The framework is designed to accommodate the set of SC actors, along with their physical and virtual processes and resources, in a collaborative manner. Its design aims to improve SC performance by reinforcing the I4.0 attributes of digitization, intelligence, visibility, interconnectedness, organization and sustainability. To narrow down the problem's scope, this initial framework is restricted to the manufacturer and its suppliers up to the second tier.
According to Orozco-Romero et al. [62], DT technology is a tool that enables both real-time digital monitoring and automatic decision making; DTs are therefore relevant tools when pursuing the goal of automating SC systems. Marmolejo-Saucedo et al. [12] review the scientific literature on DTs as one of the main I4.0 enabling technologies in the SC management realm. The association of DTs with SC visibility, together with the possibility of planning and making real-time decisions, leads to better disruption risk management and higher resilience levels. Along these lines, Barykin et al. [63] attribute the need to build DTs to SCs' poor reliability and stability caused by errors in their operation. They assert that DTs can generate information on the impact of such errors and can influence SC performance by simulating scenarios that vary the location and duration of errors and by analysing recovery policies, all of which leads to greater SC resilience. Ivanov et al. [13] explain the SC DT concept and propose a risk management framework by analyzing perspectives and future transformations that can help to integrate resilience thanks to the information provided by the DT. According to the authors, an SC DT is a model that can represent the network state at any given moment in time and allows complete end-to-end SC visibility to improve resilience and test contingency plans, which is clearly aligned with this research's focus on resilience and sustainability. The research by Ivanov and Das [64] centres on SC resilience after the disruptive events caused by the COVID-19 pandemic and on how to optimally restore normalcy in an SC. Taking the disruptive effect of the pandemic as an example, it identifies the need to implement partnerships to map supply networks and ensure their visibility as a means of recovering from disruption, where the DT can play a significant role. Dolgui et al. [65] propose reconfigurability as a parameter that characterizes the SC in an uncertain and changing environment. They do so by addressing the notion of a reconfigurable SC, or X-network, designed on the basis of the DT. In a reconfigurable SC, the organization design at the network level must be shaped by I4.0, circular economy, industrial symbiosis and collaborative industry. In SCs, reconfigurability plays an important role in I4.0 design principles, such as intelligence, real-time action capability, flexibility and sustainability (the last in its three well-known dimensions), as well as in enabling technologies such as the DT, in this specific case SC DTs, which, according to the authors, are computerized models representing the network state at any given moment in real time. SC resilience to fluctuations in make-to-order SC environments with customized production is addressed by Park et al. [66], who propose a logistics CPS, or CPLS, coordinated with agent-based cyber-physical production systems (CPPS) in a multi-level cyber-physical system structure based on distributed DT simulation technology. Wang et al. [10] address the SC problem from a DT perspective by detailing its benefits and potential compared with other approaches: (i) through synchronization between the physical and virtual twins, the DT promotes faster action and response to reduce lead times; (ii) through dynamic and comprehensive data collection, the DT improves forecast accuracy; (iii) through high-quality modeling, the DT significantly improves planning verification. Thus, in the I4.0 and SC4.0 eras, the DT makes demand forecasting, aggregate planning and inventory planning more analytical, reliable, efficient and quick to obtain, all of which favors SC resilience.
As for using ML techniques to support production planning and control problems in the SC domain, it is worth noting that most contributions focus on the operational decision level. Of those dealing with planning at the tactical decision level, most focus on inventory replenishment or, to a lesser extent, dynamic supplier selection problems. Alves and Mateus [67] consider a DRL approach based on an improved version of the proximal policy optimization (PPO) algorithm, called PPO2, to solve the inventory problem of a four-stage SC with two nodes per stage and stochastic demands. The optimization approach of Peng et al. [68] is similar, but the modeled problem considers a simpler three-stage SC (plant, plant warehouse and retailer) subject to independent, stochastic and seasonal demand. It assumes an adequate and stable supply of raw materials, but a limited plant production capacity. The article by Boute et al. [69] offers a conceptual approach whose objective is to describe the key design choices of DRL algorithms to facilitate their implementation in SC inventory control. It first introduces MDPs for inventory control optimization and their different solution approaches. Second, it describes the use of neural networks to solve MDPs, as well as the different methods that arise depending on how the Bellman equations are used in the neural network design. After these theoretical introductions, the authors explain the procedure followed to develop DRL algorithms by providing a taxonomic analysis. The research by Afridi et al. [70] focuses on certain complex SC environments, such as those of the semiconductor industry, where innovation cycles are short, production lead times are long and demand uncertainty is high.
These operating conditions mean that semiconductor manufacturers are particularly exposed to the undesired amplification of demand fluctuations along the chain, a phenomenon known as the bullwhip effect, described by Lee et al. [76]. In this context, the authors propose adopting a collaborative strategy known as vendor-managed inventory (VMI), in which the supplier takes control of, and full responsibility for, replenishing the customer's inventory by defining minimum and maximum inventory levels, all supported by the deep Q-network (DQN) method. The authors consider a two-stage SC and model this problem as an MDP. Synchronization of SCs as a means to avoid the bullwhip effect in stochastic environments is the central theme of the research by Kegenbekov and Jackson [71]. Indeed, an SC with synchronized stages and nodes can prevent the cascading inventory increases and decreases that follow unanticipated fluctuations in demand, and can mitigate the bullwhip effect caused by operational errors. A DRL agent can perform the adaptive coordination needed for such synchronization, provided that end-to-end SC visibility is complete. The authors model as an MDP a single-product, multi-stage SC environment with a single node per stage, in which a PPO agent chooses at each step how many products each SC agent orders, thereby indirectly determining local inventory levels.
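The bullwhip effect is commonly quantified as the ratio between the variance of the orders a stage places upstream and the variance of the demand it faces; a ratio above 1 signals amplification. The sketch below assumes a single stage with i.i.d. normal demand, an order-up-to policy whose level is (lead time + 1) times a moving-average forecast, and unconstrained (possibly negative) orders; these are illustrative assumptions, not the settings of the cited studies, and the function names are hypothetical.

```python
import random
from statistics import mean, pvariance
from typing import List


def order_up_to_orders(demands: List[float], lead_time: int = 2, window: int = 4) -> List[float]:
    """Orders placed under an order-up-to policy driven by a moving-average forecast."""
    k = lead_time + 1  # periods covered: lead time plus one review period
    orders: List[float] = []
    prev_level = None
    for t in range(window, len(demands) + 1):
        level = k * mean(demands[t - window:t])  # forecast uses demand up to period t-1
        if prev_level is not None:
            # replace the last observed demand and move stock to the new level
            orders.append(demands[t - 1] + level - prev_level)
        prev_level = level
    return orders


def bullwhip_ratio(demands: List[float], orders: List[float]) -> float:
    return pvariance(orders) / pvariance(demands)


rng = random.Random(42)
demand = [rng.gauss(100, 10) for _ in range(5000)]
ratio = bullwhip_ratio(demand, order_up_to_orders(demand))
print(round(ratio, 2))
```

Under these assumptions the orders follow q_t = (1 + k/p) D_{t-1} - (k/p) D_{t-1-p}, with p the moving-average window, so for i.i.d. demand the ratio should come out close to 1 + 2k/p + 2k²/p², i.e., 3.625 for k = 3 and p = 4: the forecast-driven order-up-to level alone amplifies variability, before any operational error is considered.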
The application of the ZDM philosophy to the SC domain has also been addressed by researchers, albeit sparsely. Most studies focus more on the quality management discipline than on production planning and control, and the zero-defect outcome results indirectly from applying other strategies or philosophies. For Siddh et al. [72], the objective is to integrate lean six sigma into SCs instead of ZDM, but the zero-defect outcome is achieved indirectly as an effect. Within the lean six sigma framework, the authors highlight a central idea: once the number of defects in a process is known, it becomes possible to systematically work out how to eliminate them. This research does not address resilient SC properties and, as the authors state, the only mention made of sustainability is through the 5S of lean manufacturing: sort, store, shine, standardize, sustain. Pardamean and Wibisono [73] propose a framework to explain the impact of six sigma on SC performance based on increasing process capability in the value stream by seeking zero defects and reducing process variation. This approximates the aforementioned exogenous strategy for mitigating environmental stochasticity, also described as an antidisturbance strategy, thereby favoring the automation of SC production systems and processes and the capability to respond in real time to changes in the environment. In this research, the logistics performance of sustainable SCs is assessed in three categories, namely sustainable supplier selection, sustainable production and sustainable delivery. Poornachandrika and Venkatasudhakar [74] present a behavioral process and a system model for achieving zero defects, with a case study conducted in an automotive company. This article focuses mainly on the transformation of quality within SCs. One of its main conclusions is that eliminating human intervention in some processes improves results, which relates it to automation.
Unlike the above authors, Thakur and Mangla [75] understand the zero-defect concept in the SC as one of the final effects of sustainable practices.
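The six sigma notions invoked in this paragraph (counting defects, then raising process capability by reducing variation) rest on a few standard metrics. The sketch below computes defects per million opportunities (DPMO), the corresponding sigma level under the conventional 1.5-sigma long-term shift, and the process capability index Cpk from sample data; the figures in the example are invented for illustration.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects * 1_000_000 / (units * opportunities_per_unit)


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Short-term sigma level, adding the conventional 1.5-sigma shift."""
    return NormalDist().inv_cdf(1 - dpmo_value / 1_000_000) + shift


def cpk(samples: Sequence[float], lsl: float, usl: float) -> float:
    """Process capability: distance from the mean to the nearest limit, in 3-sigma units."""
    mu, s = mean(samples), stdev(samples)
    return min(usl - mu, mu - lsl) / (3 * s)


print(dpmo(5, 1000, 10))                                     # 500.0
print(round(sigma_level(3.4), 2))                            # 6.0
print(round(cpk([9.8, 10.0, 10.2, 10.0], lsl=9.0, usl=11.0), 2))  # 2.04
```

The well-known "3.4 defects per million" figure maps to a six sigma level only because of the 1.5-sigma shift; without it, the same DPMO corresponds to roughly 4.5 sigma.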
Finally, the relevant literature on MPS, DT, ML and ZDM applied specifically to the SC shows a common thread worth highlighting. Most articles present research results that, in one way or another and to a greater or lesser extent, are based on some of the design principles and enabling technologies of I4.0 and, therefore, of SC4.0, from a positive perspective on both paradigms; in other words, from a position that assumes, as a valid axiom, that introducing I4.0 and SC4.0 into SCs only leads to positive effects. Some researchers argue that this is not the case: in addition to opportunities, adopting I4.0 and SC4.0 involves barriers and poses risks [77] that must be duly considered in any digital transformation project in the SC, and these depend largely on the selected digitization strategy and on the core capabilities that the SC acquires through that strategy [78].
The review leads to the following conclusions: (i) the existing literature on the MPS problem addressing the DT, ML and ZDM individually is abundant and varied, but the literature that addresses the problem from a joint perspective is practically nonexistent; (ii) researchers consider DT technology an enabling tool for achieving higher efficiency and reliability levels by endowing SC systems with capabilities such as decision-making automation, real-time response, end-to-end visibility or disruption risk management; (iii) proposals of conceptual frameworks or models based on the DRL-driven DT are very limited; (iv) using ML methods to support production planning in the SC domain is also a limited practice that centers mostly on DRL-based methods; (v) of all the DRL-based methods used by researchers in the SC planning domain, PPO implementations have become more prominent in the last 3 years, followed by DQN algorithms, whose use is currently declining in favor of PPO and its variants, as previously indicated; (vi) the ZDM issue in the SC domain is still not approached as a strategy in its own right, but appears as an effect of applying other strategies, such as lean manufacturing, six sigma, or their merger, lean six sigma, despite the remarkable and growing interest shown by researchers in applying ZDM to other planning domains, especially at the operational decision level. Only a couple of authors mention the potential of this strategy for mitigating disturbances that affect processes, a potential that is widely exploited in other planning contexts, especially in operational decisions such as job scheduling and sequencing.
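Since PPO stands out in point (v), it may help to recall the clipped surrogate objective that defines the method: for each sampled step, the probability ratio r between the new and old policies is multiplied by the advantage estimate A, and the objective takes the minimum of r·A and clip(r, 1 - ε, 1 + ε)·A, which removes the incentive to move the policy too far in a single update. The plain-Python function below only evaluates this objective for given ratios and advantages; in practice it is maximized by gradient ascent inside a deep learning framework, and ε = 0.2 is merely a commonly used default.

```python
from typing import Sequence


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       epsilon: float = 0.2) -> float:
    """Mean over samples of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    if len(ratios) != len(advantages) or len(ratios) == 0:
        raise ValueError("ratios and advantages must be non-empty and of equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        total += min(r * a, clipped * a)  # pessimistic (lower) of the two terms
    return total / len(ratios)


# A large ratio with positive advantage is capped at 1 + eps; a small ratio with
# negative advantage is clipped too, and the more pessimistic value is kept.
print(ppo_clip_objective([1.5], [2.0]))   # 2.4
print(ppo_clip_objective([0.5], [-1.0]))  # -0.8
```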

Proposal
The conceptual framework for the smart MPS based on the DT-ML-ZDM scheme is formulated in this section in five stages: (i) alignment axes of the proposal with the I4.0 and SC4.0 paradigms; (ii) integrating the DT for the MPS into the SC context; (iii) integrating the physical and virtual environments of the DRL-based DT; (iv) description of the DRL-based agent's learning and prescription processes; and, finally, (v) the proposal summary. The proposal is based on the general assumption that the environment in which it is developed has the characteristics of an SC4.0, i.e., an SC whose digital transformation is aligned with the design principles governing I4.0 and is carried out by using its enabling technologies. In this particular case, some I4.0 design principles, such as flexibility, intelligence, integration, virtualization, interconnectedness, interoperability, visibility, real-time action ability, energy efficiency and sustainability [79][80][81][82], play a relevant role, directly or indirectly, in confronting SC complexity and heterogeneity and moving toward greater resilience and sustainability on the way to SC4.0. The same applies to some of its enabling technologies, such as information and communication technologies (ICT), cyber-physical systems (CPS) or cyber-physical production systems (CPPS), the Internet of things (IoT) or the industrial IoT (IIoT), smart enterprise resource planning (ERP), manufacturing execution systems (MES), virtual reality, DT, ML algorithms, big data, cloud services or cloud manufacturing, semantic technologies and cybersecurity [79,81,83-85], which are involved in the design of the proposal, along with techniques such as modeling, simulation and optimization.

Integrating the DT into the SC Context
Within the conceptual framework that is herein proposed, the DT is firstly characterized by virtually replicating the MPS, an operation also known as digital twinning [86].
Based partially on the research by [13,63,66], the proposed DT shapes the MPS as two different planes, the physical plane and the virtual one, as shown in Figure 2. In the physical DT plane, the MPS is determined by physical processes and resources, i.e., data and information on the processes and resources of the actual SC environment. The main physical processes that determine the MPS are: (i) demand forecasting; (ii) receiving customer orders; (iii) planning processing; (iv) formalizing the intervening parties' commitment to the MPS; (v) communicating the MPS to suppliers; and (vi) controlling MPS evolution. As for the physical resources involved, the MPS is determined by: (i) manpower; (ii) productive equipment; (iii) inventory; (iv) started production; (v) subcontracted quantities; (vi) capacity constraints; and (vii) time as a resource, represented by the different milestones that shape and constrain the problem. The sources and communication systems of these data and information can vary widely depending on the environment in question, which is characterized by the I4.0 and SC4.0 paradigms: CPS/CPPS, sensorization, the IoT/IIoT, cloud manufacturing, smart ERP and MES, among other I4.0 enabling technologies. The data and information fed to the DT from any SC node must be automated, and their real-time flow must be guaranteed.
In order to perform the analysis, simulation, optimization and prescription, the data and information from the physical SC environment must be replicated and processed virtually at two different levels: the backend or support level, and the frontend or interface level. The backend forms part of the DT development and is responsible for running the system logic behind the interface with the human operator.
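As an illustration of how the physical-plane resources listed above might be packaged before being replicated in the virtual plane, the following minimal Python sketch defines a hypothetical per-period snapshot record. The class name, its field names and the simple capacity rule in `max_feasible_output` are assumptions made for this example; the framework itself does not prescribe a data schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PhysicalMPSSnapshot:
    """Hypothetical record of physical-plane MPS data for one period.

    Field names are illustrative assumptions, not a schema from the framework.
    """
    period: int                            # time bucket (time as a resource)
    manpower_hours: float                  # available labor capacity
    equipment_hours: float                 # available productive-equipment capacity
    inventory: dict[str, float]            # on-hand stock per product
    started_production: dict[str, float]   # work in progress per product
    subcontracted: dict[str, float]        # quantities committed to subcontractors
    hours_per_unit: dict[str, float]       # capacity consumed per unit produced
    milestones: list[int] = field(default_factory=list)  # e.g. frozen-horizon limits

    def max_feasible_output(self, product: str) -> float:
        """In-house output bound for one product if it received all capacity:
        the tighter of manpower and equipment hours over hours per unit."""
        binding_hours = min(self.manpower_hours, self.equipment_hours)
        return binding_hours / self.hours_per_unit[product]

    def to_observation(self, products: list[str]) -> list[float]:
        """Flatten the snapshot into a fixed-order numeric vector, the kind of
        input a DRL agent in the virtual plane could consume."""
        obs = [float(self.period), self.manpower_hours, self.equipment_hours]
        for p in products:
            obs += [self.inventory.get(p, 0.0),
                    self.started_production.get(p, 0.0),
                    self.subcontracted.get(p, 0.0)]
        return obs


# Example snapshot for a single hypothetical product "A".
snap = PhysicalMPSSnapshot(
    period=1, manpower_hours=160.0, equipment_hours=120.0,
    inventory={"A": 40.0}, started_production={"A": 10.0},
    subcontracted={"A": 0.0}, hours_per_unit={"A": 0.5},
)
```

Here equipment hours are binding, so `snap.max_feasible_output("A")` is 120 / 0.5 = 240 units. Keeping the flattening order fixed matters because a trained agent expects each observation position to keep the same meaning from period to period.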
In the backend, the process and resource data and information from the physical plane are translated into virtual processes and resources. The virtual processes that enable the DT to function in the backend are: (i) simulation in the virtual environment for agent training, based on historical data or on the generation of synthetic scenarios; (ii) agent training in the virtual environment, a process that runs in parallel with the previous one; and (iii) agent prediction, herein called prescription, a process enabled by the successful completion of training. By virtual backend resources, we mean both the data and information related to the elements of the real plane and those related to the formulation and modeling of the MPS problem, all coded and combined so that they can feed the above simulation, training and prescription processes based on the DRL method. This includes data and information on: (i) the MDP model of the MPS and the DRL model; (ii) demand; (iii) costs; (iv) lot sizing; (v) capacity; (vi) deadlines and periods; and (vii) possible policies. From these backend processes, data and information, the frontend, an interface specially prepared for human users, automatically provides in real time the schedule currently prescribed by the agent and the necessary information about resource use. This information can also automatically feed other tactical-level processes, such as MRP, inventory control or capacity requirements planning (CRP).
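To show how a prescribed MPS could feed MRP downstream, the following sketch performs a textbook gross-to-net calculation for a single component of an MPS item. It is a simplification under stated assumptions: one bill-of-materials level, lot-for-lot ordering, no scheduled receipts, and a hypothetical function name. Requirements whose release date would fall before the first period are treated as past due and added to period 0.

```python
def mrp_net_requirements(mps, bom_qty, on_hand, lead_time):
    """Gross-to-net MRP for one component driven by its parent's MPS.

    mps       -- planned parent quantities per period (the prescribed schedule)
    bom_qty   -- component units needed per parent unit
    on_hand   -- initial component stock
    lead_time -- periods between releasing a component order and receiving it
    Returns (net_requirements, planned_order_releases), one value per period.
    Assumes lot-for-lot ordering and no scheduled receipts.
    """
    gross = [q * bom_qty for q in mps]        # explode the MPS through the BOM
    net, stock = [], on_hand
    for g in gross:
        covered = min(stock, g)               # consume on-hand stock first
        stock -= covered
        net.append(g - covered)
    releases = [0] * len(mps)
    for t, n in enumerate(net):
        # Offset by the lead time; past-due releases are pulled into period 0.
        releases[max(t - lead_time, 0)] += n
    return net, releases


net, releases = mrp_net_requirements([10, 0, 20, 15], bom_qty=2, on_hand=25, lead_time=1)
```

With these inputs the gross requirements are [20, 0, 40, 30]; the 25 units on hand cover period 0 and part of period 2, giving net requirements [0, 0, 35, 30] and order releases [0, 35, 30, 0] one period earlier.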
The DT backend, and the MPS data and information it contains, in principle belong to the manufacturer's sphere and are not replicated for other SC stakeholders, i.e., suppliers, warehousers, retailers or, in some cases of customized manufacturing, even customers. In contrast, the DT frontend, whose replication scope extends beyond the manufacturer's sphere (the center of the SC within this framework), is shared with the other SC stakeholders in a collaborative cloud-computing environment. This provides end-to-end visibility to each SC actor and enables real-time process synchronization to achieve: (i) greater SC preparedness against unexpected demand fluctuations, which makes the SC more resilient; and (ii) optimized use of resources through inventory reduction, improved transportation efficiency, reduced energy use and shorter lead times, and thus lower costs, among other effects that result in greater sustainability.
Within this framework, the SC is understood as a single domain for all the intervening SC stakeholders, in which each one uses personalized blocks of MPS data and information, with different access categories according to their particular needs, but all from a single common origin: the DT. This scheme not only facilitates the flow of production planning data and information among actors, but also creates a coordination channel for the zero-defect strategy in the SC, as it makes it possible to: (i) enable collaborative manufacturing, with the DT as a means of sharing data and information about processes and resources; (ii) for each involved stakeholder, monitor the MPS process parameters that need to be shared in this collaborative manufacturing context to improve early defect detection, or even prediction, as a way to empower prevention policies and, thus, to better cope with disturbing or disruptive events and their subsequent recovery; (iii) enhance data storage, analysis and visualization by unifying these functions through the DT; (iv) quickly reconfigure and reorganize the MPS in a coordinated manner whenever necessary, gaining efficiency and avoiding idle times; and (v) collaboratively launch real-time production rescheduling across the entire SC, generated and disseminated by the DT. In a nutshell, (i) collaborative manufacturing; (ii) process monitoring; (iii) data management enhancement; (iv) reconfiguration and reorganization; and (v) real-time rescheduling ability, i.e., five of the seven system areas (the other two being continuous quality control and online predictive maintenance) formulated by Lindström et al. [16] in their ZDM model, would be collected and considered within this framework to favor a zero-defect goal in the SC and, in this specific manner, to understand MPS processes and fight process failures, in order to minimize, mitigate and eliminate possible disturbances that could place the SC's normal operation at risk, and to achieve higher resilience levels.
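The idea of personalized data blocks drawn from a single common origin can be sketched as role-based projections of the same MPS records. In the following minimal example, the role names, the field names and the `ROLE_FIELDS` mapping are all hypothetical; an actual deployment would rely on the access-control mechanisms of the chosen cloud platform.

```python
# Hypothetical access categories: which MPS fields each stakeholder role may see.
ROLE_FIELDS = {
    "manufacturer": {"product", "period", "mps_qty", "cost", "capacity_use", "atp"},
    "supplier": {"product", "period", "component_demand"},
    "warehouser": {"product", "period", "atp"},
    "retailer": {"product", "period", "atp"},
}


def view_for(role, mps_records):
    """Return the personalized block of MPS data that a given SC actor may access.

    Every view is projected from the same source records (the DT), so all
    actors see a consistent, single-origin picture of the schedule.
    """
    allowed = ROLE_FIELDS.get(role)
    if allowed is None:
        raise PermissionError(f"unknown role: {role}")
    return [{k: v for k, v in rec.items() if k in allowed} for rec in mps_records]


# One illustrative record held in the DT backend.
records = [
    {"product": "A", "period": 1, "mps_qty": 120, "cost": 950.0,
     "capacity_use": 0.8, "atp": 30, "component_demand": 240},
]
```

For example, `view_for("retailer", records)` exposes only product, period and ATP, while the supplier view exposes component demand but not costs. Deriving every view from one record set, instead of keeping separate copies per actor, is what keeps the shared MPS consistent.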
The implementation of the DT for the SC smart MPS according to the described framework would require several stages to be extended throughout the chain as a whole. Nevertheless, the first and most important one is to develop the manufacturer's specific domain, where the backend is located as the core of the DT, before extending it to the scope of the other actors involved in the SC. The basic infrastructure and processes in this restricted DT space are described in the two following subsections of the proposal.

Integrating the Physical and Virtual Environments of the DRL-Based DT
The DRL-based DT is configured within this framework as a set of overlapping and interrelated layers, where each individual layer demarcates a defined part of the DT environment (Figure 3). This setup is partially based on the research by Serrano et al. [88] on a smart DT for ZDM-based job-shop scheduling.
All these DT layers or elements act as receivers, processors and/or generators of data and information, depending on the role they play in the DT. The physical environment of the DT groups the following five elements: (i) the hardware and software making up the DT frontend interface and backend processing core; (ii) the hardware and software for storing the dataset in the cloud; (iii) the IIoT; (iv) CPS distributed throughout the SC physical environment; and (v) information captured locally on the current state of production and resources that is relevant to the MPS. The virtual environment groups the following elements: (i) demand forecasts and the current status of customer orders, dynamically updated in real time; (ii) the DRL agent; (iii) the master scheduling policy; (iv) the simulation environment for agent training; (v) the accumulated training data; and (vi) the set of actions taken by the agent on the MPS. All these elements are synchronized and constitute a single cohesive environment in the DT.

Description of the DRL-Based Agent's Learning and Prescription Processes
Both processes are based on the DRL method [69] and are carried out by two basic elements, the training environment and the DRL agent (Figure 4), to be implemented in a Python-based DRL framework with the help of specialized open-source libraries.
The training environment is the MPS modeled as an MDP made up of: (i) an observation space; (ii) an action space; (iii) an initial state; and (iv) a state transition function. The observation space specifies the variables of the MPS problem and delimits the bounds within which they may vary in each period. The action space determines the range of actions that can be decided on the MPS problem and their extent. The initial state represents the MPS state in the first period of the MPS training cycle, and is defined by the values taken by the MPS variables of the observation space in that initial period. Finally, the state transition function defines what varies, and to what extent, from one state to the next after the agent applies an action from the valid action space. This environment, to be implemented in Python with the OpenAI Gym library, is assisted by an ad hoc scenario generator that can create synthetic problem instances adequately modeled to facilitate agent training. If data are available and it is deemed necessary or convenient, training can also be assisted with stored historical MPS data modeled according to the MDP requirements.
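The four MDP components and the scenario generator described above can be sketched in plain Python. The example below is a single-product toy environment with a Gym-like `reset`/`step` interface. It deliberately avoids any library dependency, so it is not a Gym subclass. In Gymnasium, the maintained successor of OpenAI Gym, `reset()` returns `(observation, info)`, `step()` returns a five-tuple with separate `terminated` and `truncated` flags, and environments declare `observation_space` and `action_space` (e.g., `gymnasium.spaces.Box`). The class name, the seasonal demand generator and all cost parameters are illustrative assumptions.

```python
import math
import random


def generate_scenario(n_periods, base=100.0, amplitude=20.0, noise=10.0, seed=None):
    """Hypothetical synthetic-instance generator: seasonal demand plus Gaussian noise."""
    rng = random.Random(seed)
    return [max(0.0, base + amplitude * math.sin(2 * math.pi * t / 12) + rng.gauss(0.0, noise))
            for t in range(n_periods)]


class MPSEnv:
    """Single-product MPS modeled as an MDP with a Gym-like reset/step API.

    Observation: (period, on-hand inventory, backlog, demand in this period).
    Action: production quantity, clipped to [0, capacity] (the valid action space).
    Reward: negative period cost = -(production + holding + backlog costs).
    All cost parameters are illustrative assumptions.
    """

    def __init__(self, demand, capacity=120.0, init_inventory=50.0,
                 unit_cost=1.0, holding_cost=0.5, backlog_cost=4.0):
        self.demand = list(demand)
        self.capacity = capacity
        self.init_inventory = init_inventory
        self.unit_cost = unit_cost
        self.holding_cost = holding_cost
        self.backlog_cost = backlog_cost

    def _obs(self):
        d = self.demand[self.t] if self.t < len(self.demand) else 0.0
        return (self.t, self.inventory, self.backlog, d)

    def reset(self):
        """Initial state: first period of the planning cycle."""
        self.t, self.inventory, self.backlog = 0, self.init_inventory, 0.0
        return self._obs()

    def step(self, action):
        """State transition for one period; returns (obs, reward, done)."""
        qty = min(max(float(action), 0.0), self.capacity)
        available = self.inventory + qty
        required = self.demand[self.t] + self.backlog   # unmet demand carries over
        shipped = min(available, required)
        self.inventory = available - shipped
        self.backlog = required - shipped
        cost = (self.unit_cost * qty + self.holding_cost * self.inventory
                + self.backlog_cost * self.backlog)
        self.t += 1
        done = self.t >= len(self.demand)
        return self._obs(), -cost, done


# Usage: a short hand-written rollout over three periods.
env = MPSEnv(demand=[80, 100, 90])
obs = env.reset()
trace = [env.step(q) for q in (40, 200, 0)]
demo_demand = generate_scenario(24, seed=7)
```

In this rollout the second action (200) exceeds capacity and is clipped to 120, and producing nothing in the last period leaves a backlog of 60 units, which the reward penalizes heavily. Training instances would come from `generate_scenario` (or from historical data) instead of a fixed demand list.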
In the training stage, the DRL agent plays its role in the arena shaped by the environment described above. From the initial state prepared by the scenario generator, the agent acts on the environment, triggering an advance toward a new state in the next period. The environment grants the agent a reward for this step, whose value essentially depends on how much the new state improves the MPS. With this reward and the new MPS state, the agent performs a new action, which depends on the selected type of DRL method (value-based, policy-based, or hybrid methods such as actor-critic, among others) and leads to a new state and a new reward, and so on, period after period, until a planning cycle is completed. These training cycles are repeated as often as necessary until the agent's performance evaluation exceeds a certain threshold, or until the DRL algorithm is changed because the threshold has not been exceeded after a predetermined number of cycles. Finally, when training is evaluated as satisfactory and the DRL agent is considered sufficiently trained, it is ready to interact with the real environment, which, unlike the training environment, is dynamic and continuous, and to prescribe new MPS states from it.
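The stopping rule in the previous paragraph (train until the evaluation exceeds a threshold, otherwise switch algorithms after a fixed number of cycles) can be expressed as a small control loop. In the sketch below, `train_one_cycle` and `evaluate` are hypothetical callables standing in for whatever training and evaluation calls the chosen DRL library provides. The stub agents exist only to make the example executable, and their names merely label them.

```python
def train_until_threshold(trainers, threshold, max_cycles):
    """Try DRL algorithms in order until one exceeds the evaluation threshold.

    trainers   -- ordered dict: name -> (train_one_cycle, evaluate) callables
                  (hypothetical stand-ins for a DRL library's API)
    threshold  -- evaluation score the agent must exceed
    max_cycles -- training cycles allowed per algorithm before switching
    Returns (algorithm_name, total_cycles_used), or (None, total) if none passes.
    """
    total = 0
    for name, (train_one_cycle, evaluate) in trainers.items():
        for _ in range(max_cycles):
            train_one_cycle()
            total += 1
            if evaluate() > threshold:
                return name, total        # agent considered sufficiently trained
    return None, total


def make_stub_agent(rate):
    """Toy agent whose evaluation score rises by `rate` each training cycle."""
    state = {"score": 0.0}

    def train_one_cycle():
        state["score"] += rate

    def evaluate():
        return state["score"]

    return train_one_cycle, evaluate


trainers = {"DQN": make_stub_agent(0.05), "PPO": make_stub_agent(0.2)}
result = train_until_threshold(trainers, threshold=0.9, max_cycles=10)
```

Here the first stub reaches only about 0.5 after its 10 cycles, so the loop switches to the second one, which passes the threshold at its fifth cycle; `result` is `("PPO", 15)`.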
The DRL agent can be a Python algorithm implemented with the RLlib library (part of the Ray framework) and TensorFlow, specifically designed to interact with the training environment described above in two basic operation modes: training and prescription. When training, the DRL agent collects the current MPS state and predicts a new state for the next period, and so on, until all the periods of a complete MPS training cycle have been completed. The agent's predictions are based on a methodology learned from synthetic or real data, which depends on the DRL algorithm selected from those available in the RLlib library and on the tuning of its hyperparameters. Besides the most basic value-based and policy-based algorithms, this library mainly collects the most usual hybrid DRL methodologies, such as actor-critic and policy-gradient methods, e.g., policy gradients (PG), soft actor-critic (SAC), advantage actor-critic (A2C, A3C) or proximal policy optimization (PPO), as well as some high-performance architectures, such as asynchronous proximal policy optimization (APPO). DRL algorithm selection relies on an additional module attached to the agent that evaluates its performance during training and can modify: (i) the agent's number of training cycles, also called epochs; (ii) the DRL algorithm type, depending on the evaluation results; and (iii) certain basic hyperparameters, which vary according to the selected DRL algorithm.
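Because PPO is one of the candidate algorithms here and the most prominent one in the reviewed literature, it may help to recall the quantity it maximizes. The sketch below computes the standard clipped surrogate objective for a batch of sampled steps, mean(min(r·A, clip(r, 1 - eps, 1 + eps)·A)), where r is the probability ratio between the new and old policies and A is the advantage estimate. It illustrates the objective only and is not RLlib's implementation; full PPO implementations typically add value-function and entropy terms. The function name is hypothetical.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Clipped surrogate objective of PPO, averaged over a batch.

    ratios     -- pi_new(a|s) / pi_old(a|s) for each sampled step
    advantages -- advantage estimates for the same steps
    eps        -- clipping range (0.2 is a commonly used value)
    """
    if len(ratios) != len(advantages) or not ratios:
        raise ValueError("ratios and advantages must be non-empty and equal length")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # Taking the minimum removes any incentive to move the ratio outside
        # [1 - eps, 1 + eps], which keeps each policy update conservative.
        total += min(r * a, clipped * a)
    return total / len(ratios)
```

For ratios [1.5, 0.5] with advantages [1.0, -1.0], the terms are clipped to 1.2 and -0.8, so the objective is 0.2. Without clipping, it would be 1.0.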

Proposal Summary
In summary: (i) the proposed DT is conceived as a DSS implemented by the manufacturer and partially shared, through a cloud-computing system, with suppliers, warehousers, retailers and, depending on the case, customers; (ii) the DT receives from all these SC stakeholders the data and information about processes and resources, properly modeled as a DRL instance; (iii) once the DRL agent is trained, the DT solves the MPS problem automatically and autonomously in real time based on the DRL method; (iv) whenever any input changes, the DT outputs a continuously optimized MPS, while respecting the committed ordering policy within the fixed demand horizon, if any; (v) the DT allows the manufacturer to transmit changes without delay to lower planning levels, such as MRP, CRP or inventory control; (vi) it passes a master supply schedule on to suppliers at their different tiers for their own planning; (vii) it passes the products available to promise in each period on to warehousers, retailers and, depending on the case, customers; and (viii) it delimits the data and information available to each actor according to its role.
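Item (vii) refers to available-to-promise (ATP) quantities. As a hedged illustration, the textbook discrete ATP calculation that a DT frontend could publish per period is sketched below. The function name is hypothetical. The sketch also omits the look-ahead step that nets a negative ATP in a later block against ATP from earlier periods.

```python
def discrete_atp(on_hand, mps, orders):
    """Discrete available-to-promise per period (without look-ahead netting).

    ATP is computed in the first period (including on-hand stock) and in every
    period with an MPS receipt: the receipt minus customer orders booked from
    that period up to, but not including, the next MPS receipt.
    """
    n = len(mps)
    atp = [0.0] * n
    for t in range(n):
        if t != 0 and mps[t] == 0:
            continue                                  # no receipt, nothing to promise
        nxt = next((k for k in range(t + 1, n) if mps[k] > 0), n)
        supply = mps[t] + (on_hand if t == 0 else 0)
        atp[t] = supply - sum(orders[t:nxt])
    return atp


atp = discrete_atp(on_hand=10, mps=[20, 0, 30, 0], orders=[15, 10, 12, 8])
```

In this example, the first block has 10 + 20 = 30 units against 25 units of orders, leaving 5. The second block has 30 units against 20, leaving 10, so `atp` is [5, 0, 10, 0].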

Discussion
The MPS plays a crucial role in the SC and has been a sustained driver of research into new planning methodologies, which has provided continuous scientific development and generated new models with a wide range of approaches. However, in today's dynamic environment, the growing scale and complexity of global SCs and the new technological developments occurring at an ever-increasing speed mean that knowledge gaps persistently appear. In the case at hand, the aim of this paper is to respond to the lack of contributions detected in the literature on the joint use of the ZDM management model and the ML-based DT enabling technology to pursue smart master planning to, thus, contribute to a resilient and sustainable SC.
Regarding the mechanisms that make the DT, an I4.0 enabling technology par excellence, a competent tool for endowing the MPS with higher levels of automation, autonomy and real-time action capacity, it can be stated that the DT is a system that combines physical entities (in our case, the data and information about the real master planning environment) with their virtual counterparts (the virtual MPS), taking advantage of both the virtual and physical environments to benefit the whole system [11]. The DT captures information from the physical entity, which it stores, processes, analyzes and evaluates, so that the knowledge generated by these operations can subsequently be applied not only to current physical entities, but also to future ones [11]. All this happens without localization restrictions, given the DT's ability to enable shared virtual spaces where data and information about systems become more visible [12,13], thus enabling collaborative production scenarios. In the literature, it is commonplace to relate the implementation of this technology to the digital transformation of processes from the perspective of automation [89] and of endowing processes with higher autonomy levels [90]. Moreover, the DT's potential to enable real-time management is a recurrent research topic in the logistics and industrial field in general [36], and in the area of production planning and control in particular [37], especially when assisted by artificial intelligence [7]. Few examples in the literature show the benefits of the DT in the specific MPS field [18], but they can be found in many other SC fields, such as real-time monitoring and control [62], risk management [13], recovery from disruption [65], SC resilience to disruption [66], and planning verification related to demand forecasting, aggregate planning and inventory planning [10].
One limitation of this technology is that the existing commercial solutions on the market currently have relatively high acquisition and maintenance costs, and need to be handled by qualified personnel. However, the possibility of implementing ad hoc solutions with open source tools has increased significantly since this technology began to make its way in the early part of the last decade.
Regarding ML and its ability to cope with NP-hard computational complexity, once again, the academic community in the production planning and control area has mostly chosen to apply ML methods to process problems other than the MPS, i.e., at the tactical decision level, mainly inventory control and supplier selection problems, and at the operational decision level, the various configurations of the job scheduling problem [18]. It is important to emphasize that these problems share with the MPS the possibility of being modeled as an MDP, which would a priori allow reinforcement learning to be applied to the MPS with guarantees of success similar to those obtained in other problems. However, it must be assumed that the complex structure of current SCs, especially global ones with many stages and nodes, the number of variables included in the modeled problem and its intrinsically stochastic nature make modeling real cases with reinforcement learning alone, without the assistance of other methods, a considerable challenge. Only through the gradual incorporation of the DRL methodology [69], a combination of reinforcement learning with deep learning (another ML methodology that uses artificial neural networks to transform a set of inputs into a set of outputs, solving tasks that involve complex and high-dimensional raw input data sets [91]), has it been possible to begin studying SCs of a certain complexity, e.g.: (i) the multistage SC problem of Alves and Mateus [67], validated with a four-stage SC scenario with two nodes per stage, local inventories, lead time, a single product and demand uncertainty; (ii) the capacitated SC problem of Peng et al. [68], validated with a three-stage SC scenario with one node in the first stage, two in the second and three in the last, capacitated production, independent, stochastic and seasonal demand, and a single product; and (iii) the case of Meisheri et al. [92], who, despite restricting the validation of their retailers' inventory replenishment approach to the last SC layers, i.e., warehouse and retailer, consider product variety, with instances of 100 and 220 products that substantially increase combinatorial computation, and incorporate lead time, limited storage capacity, cross-product constraints, and weight and volume transportation constraints. Computational limitations in this regard manifest themselves in the size of the problem to be solved, in terms of the size of the input dataset and, especially, the size of the modeled problem's observation space. Nevertheless, advances in the DRL methodology are continuous, and new implementations with meta-learning, contextual bandits or high-performance architectures, among others, frequently appear. Their application in the SC planning field is yet to be explored, as the most advanced implementations in the related literature do not go beyond policy-gradient methods such as PPO and advantage actor-critic (A2C), or even the basic DQN.
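To make the point about combinatorial growth concrete, the arithmetic below assumes, purely for illustration, that each product's replenishment quantity is chosen from five discrete levels. This is not the action design of Meisheri et al. [92], which is not described here. Under that assumption, the joint action space grows as 5^N for N products, which is why product variety quickly outpaces tabular or exhaustive approaches.

```python
def joint_action_space_size(n_products, levels_per_product):
    """Number of joint decisions when each product independently takes one of
    `levels_per_product` discrete order quantities (illustrative assumption)."""
    return levels_per_product ** n_products


for n in (1, 10, 100, 220):
    size = joint_action_space_size(n, 5)
    print(f"{n:>3} products: 5^{n} has {len(str(size))} digits")
```

Ten products already give 9,765,625 joint actions. For 100 and 220 products, the counts have 70 and 154 digits (about 7.9 × 10^69 and 5.9 × 10^153), far beyond anything that can be enumerated.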
The ZDM management model is often associated with I4.0 because the two present largely compatible objectives and provide synergistic and complementary approaches [17,93,94]. Beyond the best-known ZDM objectives, such as minimizing failures and defects and detecting them early online, this management model shares with I4.0 the purpose of minimizing production costs and making production more efficient and sustainable by reducing the number of failures, breakdowns and defective parts [95]. Although the ZDM model does not fully appear in the literature on SCs and the MPS, it should be emphasized that meeting its objectives entails certain benefits for SCs that are worth discussing in the present research: (i) minimizing online defects, whether failures, breakdowns or defective quality parts, in turn favors the minimization of the disturbances that usually affect the system [7,15,16,96], and is thus beneficial for process automation; and (ii) minimizing and mitigating defects favors SC sustainability in two of its dimensions, economic and environmental, because achieving the ZDM strategy reduces not only costs, but also emissions, energy use and raw material use [17,94]. It should also be noted that ZDM and resilience are related in the literature [97], as they share some significant points. The path toward higher SC resilience levels involves promoting both the properties that reduce the SC's vulnerability to disruptive events and those that reduce its recovery time. ZDM has the dual potential to improve both groups of properties: manufacturing without failures or defects is robust, persistent and, therefore, less vulnerable, but it also allows faster recovery after disruptions because it is more agile and adaptable. As for the relation between ZDM and sustainability, it is remarkable how directly the research of Psarommatis et al. [19] establishes it, and the word sustainability plays a leading role in the very definition of ZDM provided in that paper: "ZDM offers a holistic approach, aiming at greater manufacturing sustainability, which ensures both process and product quality by reducing product defects through the use of corrective, preventive, and predictive techniques made possible by data-based technologies, and guarantees that no defective products leave the production site and reach the customer". However, the ZDM model has some restrictions that should be mentioned. As a quality improvement (QI) method, ZDM differs from traditional methods, such as lean manufacturing, six sigma or total quality management (TQM): while traditional methods use historical data to improve the future without considering the current production status, ZDM employs both historical and current data, which are essential for tracing the cause of a defect and learning from the event. This advantage of ZDM has a negative counterpart, insofar as it requires intensive real-time data use, without which the model's efficiency is compromised [19].
Thus it seems reasonable to think that the DT technology, the ML method and the ZDM model applied to the MPS are individually aligned with a smart MPS model that contributes to a more resilient and sustainable SC. However, this alignment is reinforced by the triple combination of the DT-ML-ZDM scheme, given the cross synergies among its components, of which the following stand out: (i) the DT technology benefits from the ML method because the latter enables the real-time prescription of solutions to high-dimensional MPS problems, a field in which traditional methods, such as analytics, simulation and heuristics, are limited; (ii) the DT technology benefits from the ZDM model because the latter mitigates the number and magnitude of disturbances in the system, which favors its automation, including DT functions; (iii) the ML method benefits from the DT technology because virtualizing the real environment allows the ML agent to act on the real environment only once it has been positively evaluated and is, therefore, able to prescribe after training, which makes planning more robust; (iv) for the same reason, the ZDM model benefits from the DT technology, because having the ML agent act only once trained favors the reduction and mitigation of errors and, thus, the elimination of defects; and, finally, (v) the ZDM model benefits from the ML method because the latter supports the real-time data feeding that the former needs to properly carry out its function. The potential benefits are, therefore, significant. Yet the other side of the coin is the possible barriers and risks associated with implementing the DT-ML-ZDM scheme, which, according to the research of Müller et al. [77], include those associated with: (i) suppliers and SC partners, e.g., their critical attitude toward change or rejection of data transparency; (ii) organization and implementation, e.g., the amount of investment required, or lack of resources or expertise; (iii) data management, e.g., data security, quality or availability; (iv) human aspects, e.g., the role of new employees, labor market disruptions or critical attitudes toward change; (v) technology, e.g., its implementation procedure, overestimation of its benefits, use of immature systems or poor selection; and, finally, (vi) legal issues and standards, e.g., public framework conditions, standardization and business ethics. Digital integration with customers is also an aspect to consider in this regard, given its positive influence on SC management and performance, as indicated by Queiroz et al. [78]. Thus an effective MPS digital transformation process following the scheme proposed herein must take all these challenges and risks into account, beyond the simple synergistic implementation of DT-ML-ZDM.
That said, from a practical perspective, MPS robustness rests on several fundamental pillars, of which the following stand out: (i) the accuracy of demand forecasts; (ii) the consideration of realistic constraints and deadlines; (iii) the use of accurate calculation methods; (iv) the flexibility to synchronize with the evolution of demand patterns; (v) a fluid flow of information between agents and areas; and (vi) acceptance by the parties involved. Instruments that support this structure can therefore be key to minimizing, mitigating and/or eliminating failures and defects, and to reinforcing SC resilience and sustainability: instruments that make SC systems and processes visible to the agents involved, in their different areas and at their distinct decision levels, and that provide collaborative interaction capacity, wide data access, simulation and offline analysis power, and real-time action capacity. Therefore, in the particular MPS context, we consider that virtualization by means of the DT, the intelligence imbued in decision-making processes with ML assistance, and the stable fluency of processes in line with the zero-defect philosophy can play a significant role in the smart MPS.

Conclusions
This paper proposes an initial DT-based conceptual framework to model, optimize and prescribe the MPS of an SC in a ZDM context. The framework focuses on developing optimization algorithms that solve the MPS problem in the specific environment described, based on digital twinning supported by deep reinforcement learning (DRL) techniques.
The paper describes the proposed DT-based model, which is designed to accommodate the set of SC stakeholders, along with their real and virtual processes and resources, in two different planes. It also presents the DRL-based, DT-driven MPS setup.
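As a rough illustration of the two-plane idea, the following Python sketch keeps a physical plane of stakeholder states, derives a virtual replica from it, and evaluates a candidate MPS only on the replica. It is a minimal sketch under stated assumptions, not the framework's implementation: the `Stakeholder`, `Plane`, `sync` and `simulate_fill_rate` names are hypothetical, each stakeholder is reduced to a single on-hand inventory figure, unmet demand is treated as lost sales, and the fill rate stands in for the service level that the framework aims to maximize.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Stakeholder:
    """Hypothetical SC actor reduced to one on-hand inventory figure."""
    name: str
    on_hand: int


@dataclass
class Plane:
    """A plane (physical or virtual) holding stakeholder states by name."""
    nodes: dict = field(default_factory=dict)


def sync(physical: Plane) -> Plane:
    """Build the virtual plane as an independent copy of the physical one."""
    return copy.deepcopy(physical)


def simulate_fill_rate(virtual: Plane, node: str, plan: list, demand: list) -> float:
    """Run a candidate MPS on the virtual plane only; return the fill rate.

    Unmet demand is assumed to be lost (no backlog).
    """
    stock = virtual.nodes[node].on_hand
    served = total = 0
    for produced, d in zip(plan, demand):
        stock += produced          # planned production arrives
        shipped = min(stock, d)    # serve as much demand as possible
        stock -= shipped
        served += shipped
        total += d
    virtual.nodes[node].on_hand = stock  # only the replica is updated
    return served / total if total else 1.0


if __name__ == "__main__":
    physical = Plane({"manufacturer": Stakeholder("manufacturer", on_hand=20)})
    virtual = sync(physical)
    rate = simulate_fill_rate(virtual, "manufacturer", plan=[30, 30, 30], demand=[60, 35, 30])
    print(f"{rate:.2f}")                           # 0.88 (110 of 125 units served)
    print(physical.nodes["manufacturer"].on_hand)  # 20: physical plane untouched
```

Because the candidate plan runs on a deep copy, the physical plane is never modified, which mirrors the idea that a plan is evaluated virtually before anything is released to the real SC.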
Both the described framework and its configuration are considered a first contribution of this research. Their design aims to improve SC performance by reinforcing its digitization, intelligence, visibility, interconnectedness and organization, all of which take the SC toward higher levels of resilience and sustainability, a goal for any traditional SC that intends to become an SC4.0. DT technology stands out for its potential to influence all these aspects simultaneously and positively because: (i) digitization is an intrinsic property of a DT; (ii) although the most common purpose of a DT is to simulate, analyze, predict or optimize, this technology can go one step further, toward autonomously prescribing actions, a capability that can be described as intelligent; (iii) a model in which the DT replicates a specific planning subject (e.g., the MPS) for shared use across the entire SC can raise visibility, interconnectedness and organization to a higher level; and (iv) a more effective ZDM strategy, facilitated by the model design, contributes not only to SC resilience, by minimizing, mitigating or eliminating potential disturbing factors, but also to sustainability.
The reinforcement learning approach offers several benefits, which are highlighted here. Proper DRL-based modeling that balances the exploration-exploitation trade-off can help to solve the problem of correlating immediate planning actions with their long-term consequences. In addition, unlike analytical or heuristic approaches, the DRL-based modeling approach provides acceptable solutions in real-world environments, such as manufacturing, for problems in which feedback is often subject to time delays, provided that these problems can be characterized as Markov decision processes (MDPs), as is the case for the MPS problem. The DRL method is also shown to be an effective tool for problems that are harder to solve with analytical or heuristic approaches because of their inherent computational complexity.
This proposal has some limitations. The model does not foresee the inclusion of financial considerations. Moreover, because unwise decisions could put at risk the economic value of the resources that the SC actors commit to the MPS, it is advisable to restrict the DT's prescriptive action in a first stage, so that final MPS confirmation depends on a human operator. This recommendation would remain advisable until the system's reliability has been properly verified.
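To make the MDP framing more concrete, the sketch below casts a toy single-item MPS as an MDP (state: on-hand inventory; action: production quantity; reward: negative holding plus shortage cost) and learns it with tabular Q-learning. This is a deliberately simplified stand-in rather than the proposed DRL model: DRL would replace the Q-table with a neural network, episodes are simply truncated after a fixed horizon, and every parameter (storage capacity, action set, uniform demand, unit costs, learning rate, discount factor, exploration rate) is a hypothetical choice. The ε-greedy rule is one common way of balancing exploration and exploitation, and the discount factor `gamma` in the temporal-difference target is what carries delayed costs back to earlier production decisions.

```python
import random

MAX_INV = 6              # hypothetical storage capacity (units)
ACTIONS = [0, 1, 2, 3]   # hypothetical production quantities per period
DEMAND = [0, 1, 2, 3]    # hypothetical demand values, equally likely
HOLD, SHORT = 1.0, 4.0   # hypothetical unit holding and shortage costs


def step(inv, produce, demand):
    """One MPS period: produce (capped by storage), serve demand, pay costs."""
    available = min(inv + produce, MAX_INV)
    sold = min(available, demand)
    next_inv = available - sold
    cost = HOLD * next_inv + SHORT * (demand - sold)  # unmet demand is lost
    return next_inv, -cost


def q_learning(episodes=2000, horizon=12, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(MAX_INV + 1)]
    for _ in range(episodes):
        inv = rng.randint(0, MAX_INV)
        for _ in range(horizon):
            if rng.random() < eps:                 # explore
                a = rng.randrange(len(ACTIONS))
            else:                                  # exploit current estimates
                a = max(range(len(ACTIONS)), key=lambda i: q[inv][i])
            nxt, r = step(inv, ACTIONS[a], rng.choice(DEMAND))
            # TD target: the discounted value of the next state links this
            # period's production decision to costs incurred later on
            target = r + gamma * max(q[nxt])
            q[inv][a] += alpha * (target - q[inv][a])
            inv = nxt
    return q


def greedy_policy(q):
    """Production quantity chosen for each inventory level."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]
```

`greedy_policy(q_learning())` turns the learned table into one production quantity per inventory level. In line with the human-in-the-loop recommendation, such a rule would be proposed to the planner for confirmation rather than executed directly.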
Regarding research perspectives, this conceptual framework should be considered a starting point and roadmap for modeling, applications and empirical validation in a real-world SC MPS case study. It is also necessary to study whether the modeling approach can be extended to other planning levels, such as MRP or inbound and outbound logistics, and under what conditions this would be possible.
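One way to picture the link between the MPS and the next planning level is the classical gross-to-net MRP calculation, in which MPS quantities become component gross requirements. The following is a minimal lot-for-lot sketch, assuming a single component, a known quantity per end item, no scheduled receipts or safety stock, and a fixed lead time; `mrp_netting` is a hypothetical helper and not part of the proposed framework.

```python
def mrp_netting(mps, qty_per, on_hand, lead_time):
    """Lot-for-lot gross-to-net calculation for one component.

    mps:       end-item quantities per period, taken from the MPS
    qty_per:   component units per end item (single-level BOM)
    on_hand:   component stock at the start of the horizon
    lead_time: component lead time, in periods
    Returns (net_requirements, planned_order_releases), one entry per period.
    """
    gross = [q * qty_per for q in mps]
    net, available = [], on_hand
    for g in gross:
        net.append(max(0, g - available))    # shortfall to be covered
        available = max(0, available - g)    # stock carried to next period
    releases = [0] * len(net)
    for t, n in enumerate(net):
        # offset by the lead time; releases that would fall before the first
        # period are lumped into period 0 (i.e., they are already past due)
        releases[max(0, t - lead_time)] += n
    return net, releases
```

For example, `mrp_netting([10, 0, 20, 15], qty_per=2, on_hand=30, lead_time=1)` yields net requirements `[0, 0, 30, 30]` and planned order releases `[0, 30, 30, 0]`.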
Although the proposed conceptual framework accommodates all the actors involved in the SC, developing the model beyond the manufacturer and its two closest supplier tiers, that is, incorporating further supplier tiers as well as logistics warehousers, wholesale distributors, retailers and, finally, customers, is challenging and opens up a supplementary research line.
A better understanding of the relevance of the human factor in SC4.0 and its planning is also a topic for further research. The European Union's Industry 5.0 initiative is in line with this. In particular, further research is desirable into the role that humans should play in environments where not only the most physically demanding or risky tasks, but also the responsibility for decision making, are being transferred from humans to systems.
Lastly, the described conceptual framework, and the technical background behind the proposed DT, can be adapted to other novel tactical planning frameworks, such as adaptive sales and operations planning (AS&OP), which derives from the demand-driven adaptive enterprise (DDAE) model, by replacing the MPS with other planning subjects, e.g., the replenishment of items in the buffers identified at the tactical level. Moreover, although the problem is formulated here as an MDP, it could also be modeled as a nonlinear, stochastic and/or fuzzy problem to deal with uncertainty, which would be a promising future research line.
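For the buffer-replenishment example, demand-driven MRP (DDMRP) is the planning method usually associated with the DDAE model; the sketch below applies its commonly described zone and net-flow rules to a single buffer. It is a simplified illustration under assumptions, not part of the proposed framework: the `Buffer` class and its parameter values are hypothetical, qualified demand is passed in as a single number, and dynamic buffer adjustments and spike detection are ignored. In a DT-driven setting, the order quantity returned here is the kind of decision that would take the place of the MPS quantity.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    """Hypothetical DDMRP-style stock buffer for one item (simplified)."""
    adu: float                 # average daily usage
    dlt: float                 # decoupled lead time, in days
    lt_factor: float           # lead-time factor
    var_factor: float          # variability factor
    moq: float = 0.0           # minimum order quantity
    order_cycle: float = 0.0   # imposed or desired order cycle, in days

    def zones(self):
        """Return (red, yellow, green) zone sizes."""
        red_base = self.adu * self.dlt * self.lt_factor
        red = red_base + red_base * self.var_factor  # red base + red safety
        yellow = self.adu * self.dlt
        green = max(self.moq, red_base, self.adu * self.order_cycle)
        return red, yellow, green

    def replenish(self, on_hand, on_order, qualified_demand):
        """Order up to top of green when net flow is at or below top of yellow."""
        red, yellow, green = self.zones()
        top_of_yellow = red + yellow
        top_of_green = top_of_yellow + green
        net_flow = on_hand + on_order - qualified_demand
        if net_flow <= top_of_yellow:
            return top_of_green - net_flow
        return 0.0
```

With `adu=10`, `dlt=5`, `lt_factor=0.5` and `var_factor=0.5`, the zones are 37.5 (red), 50 (yellow) and 25 (green), so a net flow position of 60 triggers an order of 52.5 units, up to the top of green (112.5).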