Energy Consumption-Based Maintenance Policy Optimization

The optimal predictive, preventive, corrective and opportunistic maintenance policies play an important role in the success of sustainable maintenance operations. This study discusses a new energy efficiency-related maintenance policy optimization method, which is based on failure data and status information from both the physical system and the digital twin-based discrete event simulation. The study presents the functional model, the mathematical model and the solution algorithm. The maintenance optimization method proposed in this paper is made up of four main phases: computation of energy consumption based on the levelized cost of energy, computation of GHG emission, computation of value determination equations and application of Howard's policy iteration technique. The approach was tested with a scenario analysis, where different electricity generation sources were taken into consideration. The computational results validated the optimization method and show that optimized maintenance policies can lead to an average of 38% cost reduction regarding energy consumption related costs. Practical implications of the proposed model and method regard the possibility of finding optimal maintenance policies that can affect the energy consumption and emissions from the operation and maintenance of manufacturing systems.


Introduction
Today, significant changes are taking place in production and service systems as the impact of the Fourth Industrial Revolution is felt, resulting in both quantitative and qualitative changes. These changes affect not only the production processes, but also other important activities related to their operation, such as procurement, distribution and maintenance. In the case of complex systems, the implementation of design tasks requires an integrated approach [1], through which both investment and operating costs can be reduced while a sustainable system is implemented. As statistical surveys show, the top causes for maintenance are the following: 34% aging resources, 20% mechanical failure of equipment, 11% human and operational error, 9% lack of time for maintenance, and 8% bad design of technical and logistics resources [2]. These causes influence the energy efficiency and energy consumption of manufacturing systems; therefore, they require a complex maintenance policy. The application of maintenance policies in digital contexts is a new trend in the Industry 4.0 era. Digital twin-based simulation makes it possible to optimize the operation and maintenance of manufacturing systems, because the impact of maintenance policies can be validated through discrete event simulation [3]. The digital twin methodology is used in many fields of manufacturing, construction and the oil industry, and new research shows that this technology is applicable in the field of risk control and prevention [4]. Energy efficiency plays an important role in manufacturing processes, where smart production systems apply preventive maintenance and restoration to prevent out-of-control states [5]. Maintenance policies can also be applied in complex supply chain solutions with maintenance and warranty policy [6].
A wide range of studies focus on the maintenance policies of imperfect production systems [7], but only a few of them analyze the impact of these policies on energy efficiency and emission reduction.
A suitable maintenance policy has a great impact on the reduction of energy use for three reasons: a well-chosen preventive or corrective maintenance strategy can keep assets in the best condition, which leads to energy efficient operation; digital and automated work orders regarding maintenance can support the standardization of maintenance operations; and identification of the real problems in the manufacturing system can reveal the reasons for excessive energy consumption. The maintenance optimization methodology proposed in this paper links maintenance optimization to the reduction of energy consumption. This paper studies the impact of maintenance policy optimization on energy consumption and GHG emission, while discounted profit is also taken into consideration. As the literature review section will show, the majority of the articles in the field of maintenance policy optimization focus on optimization in a conventional manufacturing environment, and only a few of them describe maintenance policy and strategy optimization using digital-twin-based discrete event simulation regarding energy consumption and GHG emission depending on the electricity generation sources. The application of suitable design and control methods can increase efficiency, availability and sustainability. Figure 1 summarizes the background of this research focusing on the linkage of energy and maintenance policy optimization.
This paper is organized as follows. Section 2 presents a systematic literature review, which summarizes the research background of manufacturing policy optimization from a descriptive and content analysis point of view. Section 3 describes the model framework of maintenance policy optimization including the functional model, the mathematical model and a Howard's policy iteration-based optimization. The model is focusing on the energy consumption related costs based on a levelized cost of electricity.
The model also takes into consideration GHG emissions depending on their source of generation. Section 4 demonstrates the scenario analysis, which validates the model and the optimization algorithm. Conclusions, future research directions and managerial impacts are discussed in Section 5.

Literature Review
Within the frame of this short systematic literature review the following questions are answered regarding maintenance policy optimization [8]: What is the current state of knowledge on maintenance related research? How is maintenance policy optimization analyzed and supported in previous works? What methods and parameters are used? What factors influence the results of maintenance policy optimization? Who first did it or published it? What are the main research gaps and what are the limitations of existing research results?

The systematic literature review process used here can be divided into two main parts. The first part is a descriptive analysis showing the tendencies of research topics in the field of maintenance policy optimization, while the second part focuses on the content analysis of the available research results. The methodology of the systematic literature review includes the following steps: (a) definition of research questions; (b) search process in Science Direct; (c) inclusion and exclusion process; (d) descriptive analysis of the chosen articles; (e) content analysis and (f) identification of scientific gaps, bottlenecks, and limitations (see Figure 2). Firstly, the relevant terms were defined. The following keywords were used to search in the Science Direct database: "maintenance" AND "policy" AND "optimisation". Initially, 42 articles were identified. This list was reduced to 33 articles by selecting journal articles only. The search was conducted in July 2021; therefore, new articles may have been published since then.

Descriptive Analysis
The journal articles focusing on maintenance policy optimization can be classified based on the subject area defined in Science Direct. Figure 3 shows the results of the classification of the 33 articles considering 8 subject areas. This classification shows that engineering, mathematics, computer sciences, and decision making account for the majority of the articles. The optimization of maintenance strategies and maintenance policies, especially in the field of manufacturing, production and transportation, has been intensively researched in the past 15 years, but there are some early research results from the 1970s and 1980s. One of the first articles in this field was published in 1978 on the optimization of stochastic maintenance policies [9]; in that article the author summarizes the main types of maintenance policies, covering both deterministic and stochastic environments from a cost optimization point of view.

Content Analysis
As the descriptive analysis part of the systematic literature review shows, there is a wide range of articles regarding maintenance policy optimization. The literature sources focus on both modelling and optimization problems of maintenance strategies and validate the proposed policy optimization methods with different types of scenarios. Within the frame of this chapter, the most important results of maintenance policy optimization research are summarized. The results of the content analysis are summarized in Figure 4. The time-dependent and average unavailability has a great impact on the performance of standby units; therefore, several basic cost rate equation-based general formalisms were developed to support the selection of optimal test and maintenance periods, where risk constraints were also taken into consideration [10]. The Markov Decision Process (MDP) is a general method which can be applied to a wide range of maintenance strategy optimization problems. MDP-based modelling makes it possible to describe and optimize stochastic dynamic decision-making processes of condition-based maintenance. This modelling method integrates failure rate curves and non-convex mixed integer non-linear programming (MINLP), where the complexity of the model led to computational problems [11].
The integration of statistical process control and preventive maintenance plays an important role in the reliability and quality of production processes, because cost savings can be achieved by applying the optimized policy of preventive and corrective maintenance. In this approach the deterioration of production and logistics resources can be described as a discrete-time Markov chain model [12]. The inspection and maintenance policies in manufacturing systems generally focus on the reliability, availability and life-cycle cost of the production and material handling resources. The Markov-Decision-Process-based policy optimization can be integrated with the Value of Information (VoI) methodology. This VoI-based MDP approach is used to find the ideal aperiodic sequential inspection and condition-based maintenance policy, where the decision-making process and its results depend on the value of information [13]. The optimization of inspection and maintenance strategies and policies is unavoidable in the case of preparedness systems, where failures can be detected only by inspections. A wide range of inspection and maintenance strategies can be taken into consideration, and mixing various approaches and policies is also a suitable option. The solution algorithm for preparedness systems can be based on both MDP and dynamic programming [9]. The efficiency of maintenance policies has a great impact on the quality of processes and products; therefore, it is important to find the correct correlation between maintenance strategy and quality of products and processes.
The relationship between maintenance and quality is influenced by a wide range of environmental factors, and the reliability and accuracy of the policies depend on the accuracy and availability of measuring instruments and sensors. This fact led to the increased importance of simulation-based decision support in maintenance-based quality control [14,15]. In the case of complex systems and equipment, learning algorithms can be used to optimize the maintenance policy. This is especially important in the field of aviation, where the most important influencing factor of the preventive and corrective maintenance of airplanes and their components is operational reliability. The reinforcement learning approach was adopted in the case of aero-engine maintenance, where the policy optimization process is a Markov Decision Process and the reinforcement learning algorithm is based on Gauss-Seidel value iteration [16].
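To illustrate the Gauss-Seidel value iteration mentioned above, the following minimal sketch applies it to a small, hypothetical three-state deterioration model. The states, transition probabilities, costs and discount factor are assumptions chosen for illustration only; they are not taken from [16], and the sketch is a plain model-based iteration rather than the reinforcement learning algorithm of that study. The distinguishing feature of the Gauss-Seidel variant is that each state value is overwritten in place, so states updated later in the same sweep already use the new values.

```python
import numpy as np

# Hypothetical deterioration model: 0 = good, 1 = degraded, 2 = failed.
# P[a, s, s'] is the transition probability under action a
# (a = 0: keep operating, a = 1: maintain). All numbers are illustrative.
P = np.array([
    [[0.8, 0.2, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0],
     [0.9, 0.1, 0.0],
     [0.8, 0.2, 0.0]],
])
# C[a, s]: assumed expected one-step cost of action a in state s.
C = np.array([
    [1.0, 3.0, 20.0],
    [4.0, 5.0, 12.0],
])


def gauss_seidel_value_iteration(P, C, gamma=0.9, tol=1e-10, max_sweeps=10_000):
    """Minimize expected discounted cost with in-place (Gauss-Seidel) sweeps."""
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(n_states):
            # Action values for state s, using the freshest V available.
            q = C[:, s] + gamma * (P[:, s, :] @ V)
            new_v = q.min()
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v  # in-place update, visible to later states in this sweep
        if delta < tol:
            break
    # Greedy (cost-minimizing) action per state with respect to the final V.
    policy = (C + gamma * (P @ V)).argmin(axis=0)
    return V, policy


V, policy = gauss_seidel_value_iteration(P, C)
print("state values:", V)
print("policy (0 = operate, 1 = maintain):", policy)
```

With these assumed numbers the failed state is always assigned the maintain action, because continuing to operate there incurs the failure cost indefinitely.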
The reliability of systems and processes depends on the reliability of their components. Increasing the lifetime of components is an important aspect of system and process reliability; therefore, it is important to find an optimal maintenance strategy both for prevention and for correction. The proportional hazard model, as a class of survival models in statistics, is a suitable tool for analyzing different maintenance strategies and describing their impact on the reliability of the system. Based on different proportional hazard models, the impact of corrective and preventive maintenance of system components on the reliability and lifecycle of the whole process can be analyzed [17]. One of the most critical parts of maintenance policies is the procurement of spare parts, which is represented by an optimized spare part ordering strategy. In the case of complex systems, where serial and parallel processes are performed, the optimization of the order strategy is a core problem. In the case of silent problems, the components of systems can be divided into two main groups: components in the first group can cause only minor failures, while the failure of components in the second group can cause major system failure. The inventory and order optimization of spare parts is a complex problem, where a hybrid strategy can lead to optimal spare part logistics. In the case of hybrid solutions, the periodic and the continuous review policy can be combined [18]. In the case of complex production systems, optimization of production and maintenance must be taken into consideration as an integrated process.
This integration is demonstrated in the case of subcontracted production processes, where the optimization of the production and maintenance strategy takes the subcontracting constraints into consideration and, using a sequential design process, optimizes the total inventory, production, and maintenance cost of the subcontracted production network [19]. Maintenance policies can be determined using analytical or heuristic methods. In the case of preventive maintenance, a Weibull distribution-based analytical method can be used to optimize the maintenance policy with minimal repair failure and periodic maintenance. The analytical method uses the available system failure data [20].
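As an illustration of a Weibull-based analytical method for periodic preventive maintenance with minimal repair, the sketch below uses the classical textbook cost-rate model; it is an assumption that this model resembles the one in [20], and all parameter values are hypothetical. Under minimal repair, the expected number of failures in an interval of length T is (T/eta)^beta for a Weibull hazard with shape beta and scale eta, so the long-run cost rate is C(T) = (c_pm + c_repair (T/eta)^beta) / T. For beta > 1, setting the derivative to zero gives the closed-form optimum T* = eta (c_pm / (c_repair (beta - 1)))^(1/beta).

```python
def cost_rate(T, beta, eta, c_pm, c_repair):
    """Long-run cost per unit time for a periodic PM interval T with minimal repair."""
    expected_failures = (T / eta) ** beta  # cumulative Weibull hazard over [0, T]
    return (c_pm + c_repair * expected_failures) / T


def optimal_pm_interval(beta, eta, c_pm, c_repair):
    """Closed-form minimizer of cost_rate; only valid for a wear-out hazard (beta > 1)."""
    if beta <= 1:
        raise ValueError("periodic PM has no finite optimum unless beta > 1")
    return eta * (c_pm / (c_repair * (beta - 1))) ** (1 / beta)


# Hypothetical parameters: shape 2, scale 1000 h, PM cost 500, minimal repair cost 2000.
T_star = optimal_pm_interval(beta=2.0, eta=1000.0, c_pm=500.0, c_repair=2000.0)
print("optimal PM interval [h]:", T_star)
print("cost rate at optimum:", cost_rate(T_star, 2.0, 1000.0, 500.0, 2000.0))
```

In practice beta and eta would be estimated from the available failure data; the estimation step is left out of this sketch.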
The joint optimization of inspection maintenance and spare parts provisioning is a complex optimization problem, therefore the simulation-based decision making and survey data can support the analysis of different inventory review policies of spare parts, which has a great impact on the performance of the maintenance strategy. The simulation-based methodology was validated in a paper production plant [21].
The time of preventive maintenance operations is an idle time, when the system is not working. Therefore, a potential solution lies in integrating the preventive and predictive maintenance of other components. Researchers demonstrated a new maintenance optimization policy which focuses on the selection strategy of components to be linked to the failed component's maintenance process. Computational results validate the component maintenance priority model and its application in the case of linking the preventive maintenance policy [22]. The integrated approach of maintenance in production systems plays an important role, especially in the case of complex production systems. The failures of the production systems can be categorized and different hybrid maintenance strategies can be used as a result of an integrated production, inventory, and maintenance policy optimization algorithm [23]. The same approach is suitable to determine the displacement sequence for spare units in multi-unit parallel production systems [24]. Not only the production but also the logistics and related maintenance processes are influenced by the qualification of human resources [25]. The performance analysis of maintenance processes focuses on both technological and human resources and considers the impact of human error on corrective, preventive, and predictive maintenance. It is confirmed that human error in maintenance increases the production cost and the stock level of spare parts, therefore the qualification of maintenance staff is a core issue in maintenance [26]. In the case of parallel systems with identical components and units, the optimization of condition-based opportunistic maintenance can be linked to optimization of the spare parts provisioning policy, and the long-run cost rate model can optimize the maintenance, spare parts ordering, and inventory strategy. 
The used method describes the production process as a stochastic process with the property that certain parts, components and sub-processes are statistically independent of each other and the maintenance probabilities are derived using a deterioration state space partitioning method. This joint optimization method is validated in a wind power farm scenario [27].
The preventive, corrective, predictive, and opportunistic maintenance processes can be analyzed in many ways using analytical methods, heuristic methods or discrete-event simulation. Discrete-event simulation can handle complex systems and processes, and it can account for an uncertain environment as a constraint. Simulation can support the analysis and flexible optimization of maintenance policies as a hybrid integration of various types of maintenance and different approaches, including the minimization of maintenance cost or the maximization of availability and reliability. The discrete-event simulation was validated in the case of a mining factory [28]. The closed loop economy and the circular economy play an increasingly important role. The recycling and reuse options are highlighted in maintenance processes, and the operations of maintenance are linked to the inverse processes of value chains. The remanufacturing of spare parts and manufacturing machines requires new maintenance policies. In the case of monotonous and stochastic deterioration, a proposed adapted condition-based maintenance approach can support the performance of maintenance processes. In the case of this complex system, which was modelled as an NP-hard problem, the optimization was based on a genetic algorithm [29]. Maintenance strategies can also consider sustainability aspects. This means that the objective function is based on profit, greenhouse gas (GHG) emission, system availability, and reliability [30]. Aircraft maintenance represents a special field of maintenance, where policies are influenced by authorities' regulations. The integration of failure and repair processes, the order strategy, and the spare part inventory management leads to new maintenance policies, which can be optimized using simulation-based optimization algorithms [31]. Within the frame of a multi-dimensional warranty, different maintenance aspects can be taken into consideration.
After renewal or nonrenewal of warranty, the maintenance strategy must be changed, and a new post-warranty periodic preventive maintenance policy can be applied. During post-warranty maintenance, new and different factors (e.g., the maintenance cost is paid by the user) have a great impact on the maintenance strategy [32]. Maintenance policy influences the sustainable operation of wastewater infrastructure, where the process parameters are influenced by the maintenance strategy, linked performance measurement tools and energy recovery technologies [33][34][35], and a wide range of mathematical models and methods can be used to support the design and operation in this field [36,37].
A new performance-guided maintenance policy was proposed for multicomponent systems. The maintenance policy takes maintenance cost, system availability and operating revenue of the system into consideration, and the optimal maintenance strategy is defined using a policy iteration method. The maintenance process is guided by the efficiency-cost ratio criterion based on system availability; in this case the optimized maintenance strategy was used in communication for navigation safety [38]. Healthcare organizations also use complex technical systems where the optimization of maintenance strategies is a core problem. A new decision support system for maintenance policy optimization was proposed for medical gas systems. Within the frame of this decision support system, a Markov Decision Process (MDP) was linked to a Categorical Based Evaluation Technique (CBET) for the optimal medical asset management policy [39]. The objective function of maintenance policy optimization can include a wide range of aspects and it can take special parameters into consideration. In the case of tube heat exchangers, not only the minimization of the life cycle cost of the whole exchanger system, but also the equipment design and the cleaning strategy represent important parts of the objective function. Within the frame of this proposed policy optimization, two main streams of conventional optimization tasks are integrated: the first is the design phase, while the second is the operational phase of the exchanger system from a maintenance perspective, focusing on the optimal cleaning schedule [40]. In complex systems, especially in the case of series-parallel multistate systems, the definition of a suitable maintenance policy is a complex optimization task. A Markov model-based behavior of the analyzed system is linked to the model of dependencies and priorities of the system, and an analytical approach is proposed for the optimization of the mentioned complex system [41].
Monte Carlo simulation makes it possible to include practical aspects of maintenance strategies, like stand-by operation modes, deteriorating repairs, aging and sequences of periodic maintenances. An integration of genetic algorithm and the mentioned Monte Carlo simulation was proposed for the optimization of maintenance and repair policy for plant logistics management [42]. Analytical optimization methods are also suitable in the case of complex systems such as electrical utilities. In a proposed analytical approach for preventive maintenance a wide range of parameters, minimal repair at failure, periodic overhaul, and replacement can be taken into consideration, and historical failure time data influences the optimal preventive maintenance strategy [43].
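Since policy iteration appears both in the work cited above [38] and in the method proposed in this paper, a minimal sketch of Howard's policy iteration for a discounted-reward Markov decision process may be useful. It alternates between solving the value determination equations for the current policy and a greedy policy improvement step. The three-state model, the transition probabilities, the rewards and the discount factor are hypothetical, and the function name is illustrative; none of them come from the cited studies or from the model of Section 3.

```python
import numpy as np

# Hypothetical model: states 0 = good, 1 = degraded, 2 = failed;
# actions 0 = keep operating, 1 = maintain. All numbers are illustrative.
P = np.array([
    [[0.8, 0.2, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]],
    [[1.0, 0.0, 0.0],
     [0.9, 0.1, 0.0],
     [0.8, 0.2, 0.0]],
])
# R[a, s]: assumed expected one-step profit of action a in state s.
R = np.array([
    [10.0, 6.0, -5.0],
    [4.0, 3.0, -2.0],
])


def howard_policy_iteration(P, R, gamma=0.9, max_iter=1000):
    """Maximize expected discounted reward by Howard's policy iteration."""
    n_states = P.shape[1]
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)  # start with "operate" everywhere
    for _ in range(max_iter):
        # Value determination: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, states, :]
        r_pi = R[policy, states]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: switch only where another action is strictly better,
        # which avoids cycling between equally good actions.
        q = R + gamma * (P @ v)
        improved = q.max(axis=0) > q[policy, states] + 1e-12
        if not improved.any():
            return policy, v
        policy = np.where(improved, q.argmax(axis=0), policy)
    raise RuntimeError("policy iteration did not converge")


policy, v = howard_policy_iteration(P, R)
print("policy (0 = operate, 1 = maintain):", policy)
print("discounted values:", v)
```

Because each evaluation step solves a linear system exactly, the procedure terminates after finitely many improvement steps on a finite model, which is one reason it is attractive for small maintenance state spaces.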
The quality of maintenance is influenced by the quality of human and technical resources of maintenance actions. Imperfect maintenance operations have a great impact on the availability and reliability of the system and they can cause increased maintenance and operation costs. In the case of multi-component systems, a clustering maintenance policy was proposed for optimal repair processes, where not only the imperfect maintenance operations but also degradation of components is analyzed [44]. Inventory policy optimization can be linked to maintenance policies, especially in the case of deteriorating spare part inventory, because deteriorating inventory can lead to increased inventory costs. The deterministic and stochastic deteriorating inventory (DDI and SDI) models can be used to optimize the preventive replacement interval and the maximum inventory level. This analytical optimization method was demonstrated in the case of maintenance of electric locomotives [45]. In the case of complex infrastructure systems an adaptive control approach is proposed for modeling life-cycle maintenance policy selection [46]. The multipurpose nature of components increases the complexity of the system and causes more problems during operation and maintenance. This multipurpose nature can be recognized in the case of matrix production systems, where the maintenance of standardized, multipurpose production cells influences the productivity of the flexible production process [47]. Using multistage stochastic optimization, it is possible to optimize the maintenance policy of the components [48].
In the next part of the content analysis, the focus is on the energy point of view. In modern manufacturing systems, including CNC machine tools, energy consumption is influenced by tool wear. The integration of machine maintenance and tool replacement processes can reduce energy consumption [49]. The scheduling of maintenance operations in manufacturing systems improves the energy efficiency of the processes, while productivity, product quality, and energy consumption can be taken into account. The manufacturing system can be modelled as a standard discrete-state Markov process [50]. It is possible to integrate both energy and quality aspects into optimal maintenance policies. The degradation process of production resources can be modeled as a Lévy-type process and the multi-objective optimization problem can be solved using the Monte Carlo method [51]. Another research area focuses on the energy efficiency of the product and its price, where the manufacturing system produces defective products in the out-of-control state. To prevent this out-of-control state, a preventive maintenance and restoration methodology is proposed [5]. Road maintenance is a special field of maintenance processes, where maintenance operations require high energy consumption; therefore, maintenance policy optimization with respect to routing, cost, and GHG emission is an important problem to be solved [52]. Imperfect maintenance operations are performed in a stochastic environment, and the quality of the maintenance and its impact on energy efficiency can be measured [53,54]. Maintenance policies and maintenance operations have a great impact on the cost reduction and energy efficiency of operations in the energy sector, as case studies show in the field of wind farms [55][56][57] and ocean wave farms [58]; therefore, maintenance plays a special role in the energy sector, where service level and cost efficiency are the most important objective functions.
Condition-based maintenance has been applied in many industrial systems, where the future condition of technological and logistic resources is predictable. Condition-based maintenance policies consider maintenance cost, energy efficiency, and output performance [59][60][61]. Other maintenance related research focuses on sustainable manufacturing based on energy saving window [62], using field data for maintenance optimization regarding energy efficiency [63], and control-chart-based queueing approaches for maintenance policy optimization [64]. The authors' contributions related to energy aspects of maintenance policy optimization are shown in Table A2.
As the above-mentioned energy-related maintenance policy optimization research indicates, existing studies focus on the energy efficiency of the main processes (manufacturing operations, logistics operations, transportation, or energy generation in energy farms), while only a few of them consider the energy aspects of maintenance operations.
The increasing number of publications indicates the importance and scientific potential of research on maintenance policy optimization. The articles that addressed the optimization of maintenance policies and strategies are based on a wide range of production and service environments, but few of them have aimed to research the potential of digital twin applications in maintenance policy optimization, especially in the case of complex systems including technological and logistics resources. Table A1 in Appendix A summarizes the analyzed articles, including their main contribution and their focus on maintenance, optimization, application of digital twin or other Industry 4.0 technologies, and supply chain management aspects. The table identifies a research gap, because maintenance policy optimization from an energy point of view, exploiting the potential of digital twin technology, has not been extensively studied until now. Figure 3 summarizes the methodological framework of the analyzed maintenance policy optimization studies.
Therefore, Industry 4.0 technologies still need more attention and research in the field of preventive, predictive, and corrective maintenance. As the above-described content analysis shows, energy- and emission-related aspects are rarely taken into consideration in maintenance policy optimization; therefore, within the frame of this article, energy and GHG emission reduction is the focus of maintenance policy optimization. Accordingly, the main focus of this research is the modelling and optimization of maintenance policies in cyber-physical environments, where not only the cost, but also energy consumption and GHG emission can be optimized.
The main contributions of this article include: (1) a systematic literature review with descriptive and content analyses to define research gaps and limitations of existing research results; (2) Markov process-based modelling of the maintenance processes of a cyber-physical production environment including Howard's policy iteration technique, which focuses on energy consumption and GHG emission reduction based on the levelized cost of energy and electricity generation sources; (3) an optimization algorithm to find the best maintenance strategy for a cyber-physical manufacturing environment from an energy efficiency, GHG emission, and operation cost point of view; and (4) computational results of maintenance policy optimization with different datasets.

Modeling of the Maintenance Optimization Process
Within the frame of this chapter, the maintenance model is divided into two main parts. In the first part, the general functional model of the maintenance process is described, while in the second part the mathematical model and the solution algorithm of maintenance policy optimization are discussed.

Functional Model of Maintenance Policy Optimization
The maintenance module of an ERP system includes the following functions of maintenance-related processes: maintenance organization, order management of spare parts and maintenance-related tools and materials, maintenance object management, maintenance measurement and controlling management, maintenance processing, refurbishment processes, and maintenance information system. This maintenance module defines the input parameters, objective functions, and constraints for the maintenance policy. The literature defines four types of maintenance policies: preventive maintenance policy, predictive maintenance policy, corrective maintenance policy, and opportunistic maintenance policy. The combination of these basic maintenance policies leads to hybrid and integrated maintenance policies. These maintenance policies are defined by an optimization algorithm. In this model, the optimization algorithm includes the well-known Howard's policy iteration technique, which is combined with energy consumption and greenhouse gas emission calculation.
The optimization algorithm is based on the information from the ERP maintenance module, the database of the whole ERP, information from the physical manufacturing and logistics system, and the results of discrete event simulation based on digital twin information. The functional model of the proposed maintenance optimization is shown in Figure 5.

The manufacturing plant can be divided into two main parts; the first part is the manufacturing system with technological resources, while the second part is the logistics system including devices and logistics resources for loading and unloading, transporting, storage, packaging, and other material handling operations. The fourth industrial revolution makes it possible to transform conventional manufacturing systems into cyber-physical systems using Industry 4.0 technologies. In the field of technological resources, the most important I4.0 technologies are smart sensors, identification technologies, machine-to-machine (M2M) solutions, advanced robotics, and IoT solutions. The technological part of the manufacturing plant can include intelligent tools and intelligent products; they can link the physical technological resources to the digital twin model. In the same way, the logistics resources and material handling machines can be linked with smart sensors to the digital twin. The smart sensors send failure data and status information from the physical system to the digital twin, and based on these data it is possible to forecast the future failures and status of the whole system. The digital twin solution can integrate process simulation, while big data problems can be solved with three different levels of data processing. These three levels are represented by edge computing, fog computing, and cloud computing.
The digital twin of the manufacturing system is a digital aggregate, which is the integration of digital instances and digital prototypes covered by a digital twin environment. This digital twin can support discrete event simulation with failure data and status information, and the discrete event simulation can forecast the future status of instances. Based on these statuses, it is possible to define the optimal maintenance strategy. In conventional manufacturing systems, maintenance is a part of enterprise resource planning, but it is also possible to integrate the maintenance process into the manufacturing execution system (MES), because maintenance operations are close to the operational level and MES can be directly connected to the maintenance operators. The MES makes the correlation between production and maintenance data stronger [65].
The primary benefit of the application of energy efficient maintenance policy optimization is a reduction of energy cost. Energy efficiency and environmental performance are linked to each other. In this approach, energy centered maintenance is taken into consideration, but in the literature [66] there are other types of maintenance approaches (reliability centered maintenance or total productive maintenance).
The reduction of energy consumption of the technological and logistic resources (machine tools and material handling machines) can lead to significant savings in energy costs. Therefore, it is important to find the optimal maintenance policy and maintenance operation schedule. This digital twin-based model makes it possible to perform a maintenance strategy optimization: the maintenance policy optimization is a long-term optimization which is based on Howard's policy iteration method [67].

Mathematical Model and Solution of Maintenance Policy Optimization
Based on the above described functional model, the digital twin-based maintenance policy optimization process has the following main phases: (a) data collection from the physical system and transformation of data to the digital twin; (b) collection of real-time failure data and status information from the digital twin and discrete event simulation scenarios, especially focusing on energy consumption, emissions, and sustainability aspects; (c) definition of objective functions and constraints of the maintenance strategy; (d) optimization of the maintenance policy based on Howard's policy iteration method and value determination equations. This optimization phase includes the determination of energy consumption related costs based on the levelized cost of energy, while GHG emission can also be calculated based on the different potential electricity generation sources. The optimization process is shown in Figure 6.
The status of technological and logistics resources can be described as a discrete-time stochastic process; therefore, Markov chains offer suitable representations for the description of this process. The goal of this maintenance policy optimization problem represented by a Markov process is to identify an optimal policy for the decision maker, which can be represented as a set function (see Equation (2)) that specifies the action the decision maker will choose when the system is in a defined status. A policy is optimal if it minimizes some cumulative function of the random costs, typically the discounted sum over a predefined time window, which can be an infinite time horizon:

$$\min_{D} \; E\left[\sum_{\tau=0}^{\infty} \vartheta^{\tau} \cdot c(D, \tau)\right]$$

where $c(D, \tau)$ is the cost at time $\tau$ when the decision maker follows policy $D$, and $\vartheta$ is the discount rate. This Markov decision problem is the infinite-horizon discounted Markov decision problem, which can be solved in several ways, for example with Bellman's successive approximations method or Howard's policy iteration.
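As a brief illustration of the discounted sum, the following Python sketch evaluates it for a finite sequence of per-period costs and compares it with the closed-form limit for a constant cost. The function names, the cost values, and the discount rate are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def discounted_cost(costs, discount):
    """Sum of discount**tau * costs[tau] over a finite sequence of period costs."""
    return sum((discount ** tau) * cost for tau, cost in enumerate(costs))


def constant_cost_limit(cost, discount):
    """Infinite-horizon discounted sum of a constant per-period cost (needs discount < 1)."""
    if not 0 <= discount < 1:
        raise ValueError("the geometric series only converges for 0 <= discount < 1")
    return cost / (1.0 - discount)


# Hypothetical per-period costs under some fixed policy (illustrative values only).
example_costs = [100.0, 100.0, 100.0]
three_period_total = discounted_cost(example_costs, discount=0.5)
infinite_total = constant_cost_limit(100.0, discount=0.5)
```

With a discount rate below one the later periods contribute progressively less, which is why the infinite-horizon sum remains finite.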
The benefits of solving the mentioned Markov decision process with Howard's policy iteration technique for energy consumption are as follows:
• the energy consumption and emissions of the resources in the manufacturing system depend on their status; therefore, the policies iterated using Howard's methodology can lead to decreased energy consumption and emissions;
• the energy consumption costs of operation and maintenance influence the discounted profit; therefore, Howard's methodology results in optimal maintenance policies from an energy point of view.
The maintenance policy optimization problem has the character of an assignment problem, where suitable maintenance operations have to be assigned to every status of the system resources. The following assumptions are taken into consideration.
In the maintenance model we can define a set of possible states:

$$S = (s_1, s_2, \cdots, s_i, \cdots, s_z) \quad (1)$$

where $s_i$ is state $i$ of the system and $z$ is the maximum number of possible states of the system. These states have a great impact on the energy efficiency of the manufacturing system, which means that this set of possible states also defines a link between status information, maintenance operations, and energy. This set includes all potential states of the system, and these states can be identified either from the failure data and status information of the physical manufacturing and logistics system or from their digital twin aggregate. In the same way, it is possible to define a decision set, which includes all potential decisions regarding the various states of the system. The decision set includes all suitable maintenance processes:

$$D = (d_1, d_2, \cdots, d_j, \cdots, d_r) \quad (2)$$

where $d_j$ is potential decision $j$ regarding a maintenance request and $r$ is the maximum number of possible decisions regarding the maintenance of the whole system. These decisions regarding maintenance operations also form a link to the energy consumption, because it is possible to calculate the energy consumption of all maintenance operations. The transition probabilities represent the constraints of the maintenance policy optimization:

$$\Gamma = [\gamma_{ih}], \quad \sum_{h=1}^{z} \gamma_{ih} = 1 \quad (3)$$

where $\gamma_{ih}$ is the transition probability from state $s_i$ to state $s_h$. The transition probabilities for the transition from state $s_i$ to state $s_h$ resulting from decision $d_j$, denoted by $p(s_h \mid s_i, d_j)$, can also be calculated. For these transition probability values, the chosen decision $d_j$ does not necessarily result in the transition probability $\gamma_{ih}$ from state $s_i$ to state $s_h$, because $d_j$ results in a new state of the system; $p(s_h \mid s_i, d_j)$ can therefore be defined as the transition probability to the state $s_h$ from the state that results from applying decision $d_j$ in state $s_i$.
Within the frame of the maintenance policy optimization, different objective functions can be taken into consideration: cost, reliability, availability, GHG emission, energy consumption, inventory value, or level of spare parts. In the case of multiple criteria decision making (MCDM), it is possible to involve more than one objective function to be optimized simultaneously. In MCDM, different techniques are used for weighting multiple objectives; well-known methods include the Churchman-Ackoff method and the Guilford method. Within the frame of our multi-stage maintenance policy optimization, the profit is used as the objective function of the policy optimization.
In this model, the discounted profit caused by the energy consumption of the manufacturing system and the energy consumption of the maintenance process for an infinite time horizon is taken into consideration. As Winston defines [68], infinite-horizon probabilistic dynamic programming problems are Markov decision processes, where the profit-based objective function can be defined in the following form:

$$P^{S}(s_i) = \sigma_{s_i, d_j(s_i)} + \vartheta \cdot \sum_{h=1}^{z} p\left(s_h \mid s_i, d_j(s_i)\right) \cdot P^{S}(s_h) \quad (5)$$

where $\sigma_{s_i, d_j(s_i)}$ is the expected profit in the initial period if decision $d_j$ was chosen for status $s_i$, $P^{S}(s_i)$ is the expected discounted profit of status $s_i$, $\vartheta$ is the discounting factor of the profits, and $0 < \vartheta \leq 1$. Howard's policy iteration method is a suitable technique to optimize policies from a value (or cost) determination equation point of view. In this technique, it is possible to calculate a special parameter of the chosen maintenance policy:

$$\Theta^{S_D}(s_i) = \max_{d_j \in D} \left\{ \sigma_{s_i, d_j} + \vartheta \cdot \sum_{h=1}^{z} p\left(s_h \mid s_i, d_j\right) \cdot P^{S}(s_h) \right\} \quad (6)$$

where $\Theta^{S_D}(s_i)$ is a special parameter of Howard's policy iteration technique for the state $s_i$ of the system. If $\Theta^{S_D}(s_i) = P^{S}(s_i)$ for all states $s_i$, $i = 1 \cdots z$, then $D$ is the optimal maintenance policy; otherwise, the policy must be changed and a new iteration phase must be computed using the value determination equations and the Howard's policy iteration parameter calculation.
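The alternation between value determination and policy improvement can be sketched in Python as follows. This is a minimal sketch of Howard's policy iteration for a generic finite Markov decision process, not the implementation used in this study. The array layout (`profit[i, j]`, `trans[j, i, h]`), the function names, and the two-state example data are assumptions made purely for illustration, and the sketch assumes a discount factor strictly below one so that the linear system has a unique solution.

```python
import numpy as np


def evaluate_policy(policy, profit, trans, discount):
    """Solve the value determination equations for a fixed policy.

    profit[i, j]   : expected one-period profit in state i under decision j
    trans[j, i, h] : probability of moving from state i to state h under decision j
    """
    n = len(policy)
    idx = np.arange(n)
    profit_pi = profit[idx, policy]      # profit of each state under the policy
    trans_pi = trans[policy, idx, :]     # transition matrix induced by the policy
    return np.linalg.solve(np.eye(n) - discount * trans_pi, profit_pi)


def policy_iteration(profit, trans, discount, policy=None):
    """Return (policy, values) after Howard's policy iteration has converged."""
    n_states, _ = profit.shape
    if policy is None:
        policy = np.zeros(n_states, dtype=int)
    else:
        policy = np.asarray(policy, dtype=int)
    while True:
        values = evaluate_policy(policy, profit, trans, discount)
        # q[i, j] = profit[i, j] + discount * sum_h trans[j, i, h] * values[h]
        q = profit + discount * np.einsum("jih,h->ij", trans, values)
        theta = q.max(axis=1)            # best achievable value per state
        if np.allclose(theta, values):
            return policy, values        # no state can be improved any further
        improved = theta > values + 1e-9
        policy = np.where(improved, q.argmax(axis=1), policy)


# Hypothetical two-state example: state 0 = good, state 1 = bad;
# decision 0 = no maintenance, decision 1 = maintenance (returns the system to state 0).
profit = np.array([[10.0, 6.0],
                   [2.0, 0.0]])
trans = np.array([
    [[0.5, 0.5], [0.0, 1.0]],   # decision 0
    [[1.0, 0.0], [1.0, 0.0]],   # decision 1
])
policy, values = policy_iteration(profit, trans, discount=0.9)
```

In this toy example the iteration stops at the policy that performs no maintenance in the good state and maintenance in the bad state; the decision is changed only in states where a strict improvement is found, which prevents cycling between equally good decisions.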

Computational Results
This section discusses a scenario analysis, which focuses on the validation of the above-described multi-phase maintenance policy optimization, including the long-term optimization based on Howard's policy iteration technique. In this scenario, the manufacturing system has five different statuses (excellent, good, average, poor, and bad), and based on these states the set of possible statuses is given as follows:

$$S = (E, G, A, P, B)$$

The decision set of this scenario includes four different levels of maintenance policy (no maintenance, level 1, level 2, and level 3):

$$D = (NO, L_1, L_2, L_3)$$

The status of the system in the next period depends on the status of the current period; therefore, it is possible to define the transition probabilities between statuses as a transition matrix of the Markov decision process. After performing a maintenance process, the transition probabilities from the current status to statuses E, G, A, and P can be calculated in the same way. For status A, they are as follows:

$$\begin{aligned}
p(A \mid P, L_1) &= p(A \mid B, L_2) = \gamma_{A \rightarrow A} = 0.7 \\
p(P \mid P, L_1) &= p(P \mid B, L_2) = \gamma_{A \rightarrow P} = 0.2 \\
p(B \mid P, L_1) &= p(B \mid B, L_2) = \gamma_{A \rightarrow B} = 0.1
\end{aligned} \quad (12)$$

The profit of the analyzed time window of the manufacturing process is influenced by the income and the costs, including the energy consumption (electricity) of manufacturing and maintenance. In the profit calculation, $\sigma_{i,j}$ is the profit of the system in status $i$ while maintenance level $j$ is performed, $\sigma^{I}$ is the initial income of the manufacturing system, $\sigma^{EC1}_{i}$ is the energy consumption of the manufacturing system in status $i$, and $\sigma^{EC2}_{j}$ is the energy consumption of the performed maintenance operations of maintenance level $j$.
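The decision-dependent transition probabilities of this scenario can be sketched in Python as follows. Only the row for status A (0.7, 0.2, 0.1) is taken from Equation (12); all other rows of `GAMMA` are hypothetical placeholders. The rule that maintenance level $L_k$ moves the system $k$ status levels upward before the transition is inferred from the pattern of Equation (12), and the clamping at status E is an additional assumption.

```python
import numpy as np

STATES = ["E", "G", "A", "P", "B"]                  # excellent ... bad
LIFT = {"NO": 0, "L1": 1, "L2": 2, "L3": 3}         # status levels gained (assumption)

# GAMMA[i, h]: transition probability from status i to status h without maintenance.
GAMMA = np.array([
    [0.7, 0.2, 0.1, 0.0, 0.0],   # from E (hypothetical values)
    [0.0, 0.7, 0.2, 0.1, 0.0],   # from G (hypothetical values)
    [0.0, 0.0, 0.7, 0.2, 0.1],   # from A (values of Equation (12))
    [0.0, 0.0, 0.0, 0.8, 0.2],   # from P (hypothetical values)
    [0.0, 0.0, 0.0, 0.0, 1.0],   # from B (hypothetical values)
])


def transition_row(state, decision):
    """Probabilities p(h | state, decision) over all next statuses h."""
    current = STATES.index(state)
    restored = max(current - LIFT[decision], 0)     # cannot be better than E (assumption)
    return GAMMA[restored]


row_p_l1 = transition_row("P", "L1")   # uses the row of status A
row_b_l2 = transition_row("B", "L2")   # uses the row of status A as well
```

Both calls return the row of status A, which reproduces the equalities $p(\cdot \mid P, L_1) = p(\cdot \mid B, L_2) = \gamma_{A \rightarrow \cdot}$ of Equation (12).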
The energy of manufacturing and maintenance operations can be generated from different sources (lignite, coal, oil, natural gas, photovoltaic, biomass, nuclear, water, and wind). Depending on these energy generation sources, it is possible to calculate both the cost of energy consumption and the virtual emissions of the whole manufacturing process as a function of the status of the system and the maintenance operations performed under the chosen maintenance policy. The cost calculation of energy consumption is based on the levelized cost of energy (Figure 7). The profit of the system, which depends on the status of the system, the performed maintenance level, the costs caused by the energy consumption of the related maintenance level, and the energy consumption of the manufacturing system in the current status, is an input parameter of the optimization problem (Equation (15)). For this calculation, the initial energy consumption of manufacturing and maintenance is taken into consideration (Tables 1 and 2). In this model, the greenhouse gas emissions of various electricity generation sources published by the World Nuclear Association are taken into consideration [69].
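The relation between energy consumption, levelized cost of energy, and emissions can be illustrated with a short Python sketch. The formulas below are simplifying assumptions made for illustration (energy cost as consumption multiplied by the levelized cost, emission as consumption multiplied by a specific emission factor); they are not the exact equations of this study, and all function names and numerical values are hypothetical.

```python
def energy_based_profit(income_eur, manufacturing_kwh, maintenance_kwh, lcoe_eur_per_kwh):
    """Income minus the energy cost of manufacturing and maintenance (assumed form)."""
    energy_cost = (manufacturing_kwh + maintenance_kwh) * lcoe_eur_per_kwh
    return income_eur - energy_cost


def virtual_emission_kg(manufacturing_kwh, maintenance_kwh, specific_g_per_kwh):
    """Emission in kg from the total consumption and a specific factor in g/kWh (assumed form)."""
    return (manufacturing_kwh + maintenance_kwh) * specific_g_per_kwh / 1000.0


# Hypothetical inputs: 2000 kWh for manufacturing, 500 kWh for maintenance,
# a levelized cost of 0.10 EUR/kWh and a specific emission of 400 g/kWh.
profit_eur = energy_based_profit(1000.0, 2000.0, 500.0, 0.10)
emission_kg = virtual_emission_kg(2000.0, 500.0, 400.0)
```

Under these assumptions a better system status or a cheaper, cleaner generation source lowers both the energy cost and the virtual emission for the same income.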
Based on the specific GHG emission (Table 3), the total virtual GHG emission can be calculated based on the status of the manufacturing system and the chosen maintenance policy. In this calculation, $e^{f,g}_{i,j}$ is the virtual emission of GHG $f$ in the case of state $i$ of the manufacturing system, chosen maintenance policy $j$, and electricity generation source $g$; $\varepsilon^{spec}_{f}$ is the specific GHG emission of GHG $f$; and $c^{LCOE}_{g}$ is the levelized cost of electricity in the case of electricity generation source $g$, where $f \in (CO_2, SO_2, CO, HC, NO_X, PM)$.

Table 3. Specific greenhouse gas (GHG) emission depending on the electricity generation source in CO2 emission in g/kWh [69].

The virtual GHG emission in the case of the natural gas electricity generation source is shown in Table 4. The computational results regarding the virtual emission of other GHGs are in Tables 5-9. As the above tables show, the GHG emission depends on the status of the systems and the maintenance operations performed. The resulting proportion of GHG emission is shown in Figure A1. The expected CO2 emission reduction caused by the first iteration of the maintenance policy [kg] is shown in Figure A2.

After the initialization of the energy consumption and GHG emission values, the first step is to choose an initial maintenance policy:

$$S^{*}(E) = NO, \quad S^{*}(G) = NO, \quad S^{*}(A) = L_1, \quad S^{*}(P) = L_1, \quad S^{*}(B) = L_2$$

which means that in the case of status E and status G no maintenance is performed, in the case of status A and P first level maintenance is performed, and in the case of status B second level maintenance is chosen. Based on the value determination equations (see Equation (5)), the expected discounted profit for the predefined time window can then be calculated.
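For orientation, the value determination equations of the initial policy $S^{*}$ can be written out by analogy with Equation (42). The form below is a reconstruction under the assumption that maintenance level $L_k$ moves the system $k$ status levels upward before the transition, as suggested by Equation (12); it should not be read as the original equations of this study.

```latex
\begin{aligned}
P^{S^{*}}(E) &= \sigma_{E,NO} + \vartheta \cdot \left( \gamma_{E \rightarrow E} \cdot P^{S^{*}}(E) + \gamma_{E \rightarrow G} \cdot P^{S^{*}}(G) + \gamma_{E \rightarrow A} \cdot P^{S^{*}}(A) \right) \\
P^{S^{*}}(G) &= \sigma_{G,NO} + \vartheta \cdot \left( \gamma_{G \rightarrow G} \cdot P^{S^{*}}(G) + \gamma_{G \rightarrow A} \cdot P^{S^{*}}(A) + \gamma_{G \rightarrow P} \cdot P^{S^{*}}(P) \right) \\
P^{S^{*}}(A) &= \sigma_{A,L_1} + \vartheta \cdot \left( \gamma_{G \rightarrow G} \cdot P^{S^{*}}(G) + \gamma_{G \rightarrow A} \cdot P^{S^{*}}(A) + \gamma_{G \rightarrow P} \cdot P^{S^{*}}(P) \right) \\
P^{S^{*}}(P) &= \sigma_{P,L_1} + \vartheta \cdot \left( \gamma_{A \rightarrow A} \cdot P^{S^{*}}(A) + \gamma_{A \rightarrow P} \cdot P^{S^{*}}(P) + \gamma_{A \rightarrow B} \cdot P^{S^{*}}(B) \right) \\
P^{S^{*}}(B) &= \sigma_{B,L_2} + \vartheta \cdot \left( \gamma_{A \rightarrow A} \cdot P^{S^{*}}(A) + \gamma_{A \rightarrow P} \cdot P^{S^{*}}(P) + \gamma_{A \rightarrow B} \cdot P^{S^{*}}(B) \right)
\end{aligned}
```

This is a system of five linear equations in the five unknowns $P^{S^{*}}(E), \ldots, P^{S^{*}}(B)$, so it has a unique solution whenever $\vartheta < 1$.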

Solving these equations, the expected discounted profit for each status in the case of the initial maintenance policy can be determined. After the solution of the value determination equations, the next phase is to apply Howard's policy iteration technique and calculate the $\Theta^{S^{*}}$ parameter for the $S^{*}$ maintenance policy. In the case of status E no maintenance is required; therefore, based on Equation (6), it is meaningless to change the maintenance policy for status E.

In the case of status G, the solution of Equation (21) gives $\Theta^{S^{*}}(G) = 3351.43$ €, which means that there is a better maintenance policy than the initial policy chosen in the first iteration.

In the case of status A, the solution of Equation (23) gives $\Theta^{S^{*}}(A) = 3331.43$ €, which means that there is a better maintenance policy for status A of the system than the initial policy $S^{*}(A) = L_1$ chosen in the first iteration.

In the case of status P, the solution of Equation (25) gives $\Theta^{S^{*}}(P) = 3061.43$ €, which means that there is a better maintenance policy for status P of the system than the initial policy $S^{*}(P) = L_1$ chosen in the first iteration.

In the case of status B, the solution of Equation (27) gives $\Theta^{S^{*}}(B) = 2795.33$ €, which means that there is a better maintenance policy for status B of the system than the initial policy $S^{*}(B) = L_2$ chosen in the first iteration.

After that, the new value determination equations can be defined, and the expected discounted profit can be calculated in the case of the $S^{**}$ maintenance policy for the predefined time window. Solving these equations, the expected profit for each status in the case of the $S^{**}$ first iteration of the maintenance policy (see Equation (29)) can be given.

Based on the results of the $S^{**}$ first iteration, the energy consumption reduction can be determined, where $\omega^{*}_{i,g}$ is the energy consumption reduction resulting from the first iteration phase in the case of system status $i$ and electricity generation source $g$. Similarly, the GHG emission reduction can also be calculated, where $\zeta^{*}_{i,g,f}$ is the GHG emission reduction resulting from the first iteration phase in the case of system status $i$, electricity generation source $g$, and GHG $f$.
The computational results of energy consumption reduction and GHG emission reduction after the first iteration phase in the case of the natural gas electricity generation source are described in Table 10.

Table 10. Energy consumption reduction and GHG emission reduction resulting from the first iteration of maintenance policy optimization within the whole time window of analysis in the case of the natural gas electricity generation source.

After the second solution of the value determination equations, Howard's policy iteration technique can be applied, and the $\Theta^{S^{**}}$ parameter for the $S^{**}$ maintenance policy can be calculated. In the case of status E, no maintenance is required; therefore, based on Equation (6), it is meaningless to change the maintenance policy for status E.

In the case of status G, the solution of Equation (34) gives $\Theta^{S^{**}}(G) = 5016$ €, which means that the first-iteration maintenance policy $S^{**}(G) = L_1$ is the optimal maintenance policy for status G.

In the case of status A, the solution of Equation (36) gives $\Theta^{S^{**}}(A) = 4996$ €, which means that the first-iteration maintenance policy $S^{**}(A) = L_2$ is the optimal maintenance policy for status A.

In the case of status P, the solution of Equation (38) gives $\Theta^{S^{**}}(P) = 4816.68$ €, which means that there is a better maintenance policy for status P of the system than the policy $S^{**}(P) = L_3$ resulting from the first iteration:

$$\Theta^{S^{**}}(P) > P^{S^{**}}(P) \rightarrow S^{***}(P) \neq S^{**}(P) \rightarrow S^{***}(P) = L_2 \quad (39)$$

In the case of status B, the solution of Equation (40) gives $\Theta^{S^{**}}(B) = 4630.14$ €, which means that the first-iteration maintenance policy $S^{**}(B) = L_2$ is the optimal maintenance policy for status B.

After that, the new value determination equations can be defined and the expected discounted profit in the case of the $S^{***}$ maintenance policy for the predefined time window can be calculated:

$$\begin{aligned}
P^{S^{***}}(E) &= \sigma_{E,NO} + \vartheta \cdot \left( \gamma_{E \rightarrow E} \cdot P^{S^{***}}(E) + \gamma_{E \rightarrow G} \cdot P^{S^{***}}(G) + \gamma_{E \rightarrow A} \cdot P^{S^{***}}(A) \right) \\
P^{S^{***}}(G) &= \sigma_{G,L_1} + \vartheta \cdot \left( \gamma_{E \rightarrow E} \cdot P^{S^{***}}(E) + \gamma_{E \rightarrow G} \cdot P^{S^{***}}(G) + \gamma_{E \rightarrow A} \cdot P^{S^{***}}(A) \right) \\
P^{S^{***}}(A) &= \sigma_{A,L_2} + \vartheta \cdot \left( \gamma_{E \rightarrow E} \cdot P^{S^{***}}(E) + \gamma_{E \rightarrow G} \cdot P^{S^{***}}(G) + \gamma_{E \rightarrow A} \cdot P^{S^{***}}(A) \right) \\
P^{S^{***}}(P) &= \sigma_{P,L_2} + \vartheta \cdot \left( \gamma_{G \rightarrow G} \cdot P^{S^{***}}(G) + \gamma_{G \rightarrow A} \cdot P^{S^{***}}(A) + \gamma_{G \rightarrow P} \cdot P^{S^{***}}(P) \right) \\
P^{S^{***}}(B) &= \sigma_{B,L_2} + \vartheta \cdot \left( \gamma_{A \rightarrow A} \cdot P^{S^{***}}(A) + \gamma_{A \rightarrow P} \cdot P^{S^{***}}(P) + \gamma_{A \rightarrow B} \cdot P^{S^{***}}(B) \right)
\end{aligned} \quad (42)$$

Solving these equations, the expected profit for each status in the case of the $S^{***}$ second iteration of the maintenance policy can be determined:

$$\begin{aligned}
P^{S^{***}}(E) &= 5126\ \text{€}, \quad P^{S^{***}}(G) = 5016\ \text{€}, \quad P^{S^{***}}(A) = 4996\ \text{€}, \\
P^{S^{**}}(P) &< P^{S^{***}}(P) = 4821.2\ \text{€}, \quad P^{S^{**}}(B) < P^{S^{***}}(B) = 4650.13\ \text{€}
\end{aligned} \quad (43)$$

Based on the results of the $S^{***}$ second iteration, the energy consumption reduction can be determined, where $\omega^{**}_{i,g}$ is the energy consumption reduction resulting from the second iteration phase in the case of system status $i$ and electricity generation source $g$. Similarly, the GHG emission reduction can also be calculated, where $\zeta^{**}_{i,g,f}$ is the GHG emission reduction resulting from the second iteration phase in the case of system status $i$, electricity generation source $g$, and GHG $f$.
The computational results of energy consumption reduction and GHG emission reduction after the second iteration phase in the case of the natural gas electricity generation source are described in Table 11. The computational results of energy consumption and GHG emission reduction in the case of other electricity generation sources are shown in Tables 12 and 13 and Figure A2.

Table 11. Energy consumption reduction and GHG emission reduction resulting from the second iteration of maintenance policy optimization within the whole time window of analysis in the case of the natural gas electricity generation source.

After the third solution of the value determination equations, Howard's policy iteration technique can be applied, and the $\Theta^{S^{***}}$ parameter for the $S^{***}$ maintenance policy can be calculated. Based on the condition defined in Equation (4), it is clear that there is no way to improve the maintenance policy and find a better policy with a higher profit, so the optimal maintenance policy for this scenario is the following:

$$S^{***}(E) = NO, \quad S^{***}(G) = L_1, \quad S^{***}(A) = L_2, \quad S^{***}(P) = L_2, \quad S^{***}(B) = L_2$$

The improvement of the maintenance policy and the resultant discounted profit of the manufacturing system are shown in Figure 8.

Table 13. Energy consumption reduction and GHG emission reduction resulting from the second iteration of maintenance policy optimization within the whole time-window of analysis in the case of oil, photovoltaic, and biomass electricity generation sources.

The discounted profit resulting from the optimized maintenance policy is influenced by the parameters of the physical systems and the cost model. Figure 9 and Table A3 demonstrate the influence of discounting parameter ϑ on the discounted profit. As Figure 9 shows, the increased value of the discount parameter leads to an increase in discounted profit.
The discounted profit depends on the initial status level of the system. Figure 10 demonstrates the influence of initial status level of the system on the discounted profit. As Figure 10 shows, the better status level causes higher discounted profit and the increased ϑ value also increases the discounted profit for all initial system statuses.
The impact of the initial system status and the ϑ discount parameter on the discounted profit is shown in Figure 11 and Table A4. The sensitivity analysis shows that optimal maintenance policy is influenced by the parameters of the technological system and the corresponding cost model represented by the ϑ discount parameter.
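The direction of this sensitivity can be checked with a small Python sketch. It holds one maintenance policy fixed and re-solves the value determination equations for several values of ϑ; the transition matrix `P_pi`, the one-period profits `r_pi`, and the list of discount values are hypothetical placeholders and not the data behind Figures 9 to 11.

```python
import numpy as np

def discounted_profit(P_pi, r_pi, theta):
    """Solve v = r_pi + theta * P_pi @ v for a fixed maintenance policy."""
    n = len(r_pi)
    return np.linalg.solve(np.eye(n) - theta * P_pi, r_pi)

# Hypothetical fixed-policy data for 3 statuses (0 = best status).
P_pi = np.array([
    [0.8, 0.2, 0.0],
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
])
r_pi = np.array([100.0, 80.0, 50.0])   # strictly positive one-period profits

thetas = [0.80, 0.85, 0.90, 0.95]      # hypothetical discount parameter values

# One row per discount value, one column per initial status.
profits = np.array([discounted_profit(P_pi, r_pi, t) for t in thetas])

for t, row in zip(thetas, profits):
    print(f"theta={t:.2f}: " + ", ".join(f"{x:.1f}" for x in row))
```

Because the one-period profits in this sketch are strictly positive, each column of `profits` increases with ϑ, which matches the trend described for Figure 9. Whether a better initial status also gives a higher profit depends on the chosen data.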
Figure 9. Impact of the discounting parameter on the discounted energy-consumption-based profit resulting from the optimized maintenance policy in the case of different initial status of the system (best status is Status 1 and worst status is Status 5).
Energies 2021, 14, x FOR PEER REVIEW 24 of 35

Figure 10. Impact of the initial system status on the discounted energy-consumption-based profit resulting from the optimized maintenance policy in the case of different discount parameters.
Figure 11. Impact of the initial system status and the ϑ discount parameter on the discounted profit resulted by the optimized maintenance policy.
The discount parameter does not influence the optimal maintenance policy, but it has a great impact on the discounted profit, because energy costs have to be taken into consideration as discounted costs. The above-described scenario validates the digital twin-based maintenance policy optimization and justifies the maintenance policy in both conventional and cyber-physical manufacturing systems. Services must be optimized in order to increase performance and ensure that all technological and logistics resources can operate at 100% efficiency at all times. To summarize, the proposed Howard's policy iteration technique-based optimization model makes it possible to analyze the impact of maintenance policy on the performance parameters of technological and logistics processes and decrease the energy consumption and the related discounted costs.
As the findings of the literature review show, the articles that addressed maintenance policy optimization focus on conventional production environments, and none of them aimed to identify the potential of a cyber-physical production environment, where Industry 4.0 technologies can increase the impact of maintenance policy optimization. The comparison of our results with those from other studies shows that the optimization of maintenance policies in cyber-physical environments still requires more attention and research.

Discussion and Conclusions
Supply chain optimization is a key factor for a sustainable, cost-efficient economy [70]. The maintenance of production and service processes represents an important part of value chains; maintenance is important not only in the field of manufacturing, but also in the processes of purchasing and distribution. The fourth industrial revolution indicates new directions for the improvement of maintenance processes and their policies. Maintenance policies are discussed in a wide range of literature, but only a few works focus on the potential of linking energy and maintenance using real-time data for discrete event simulation. The content analysis part of the literature review showed that existing works mainly discuss the energy efficiency of the primary processes, and that the energy aspects of maintenance processes are out of their scope. To help fill this gap, this study developed a maintenance policy optimization methodology that analyzes an existing maintenance policy based on value determination equations and improves it based on Howard's policy iteration technique. In the model, a wide range of objective functions can be defined, such as costs, reliability, availability, sustainability, or GHG emission, while different constraints can also be taken into consideration. The described methodology shows that the optimization of maintenance policy has a great impact on energy consumption, GHG emission, and profit of the manufacturing system, and that it can support the optimization of maintenance policy, because real-time data collection can improve the efficiency of failure data forecasting and status information collection.
As a managerial impact, I would like to mention that the above-described methodology can support managerial decisions regarding organizational, technological, and logistical aspects of maintenance strategies. As the computational results show, the maintenance policy influences the energy consumption of both the manufacturing processes and the maintenance operations; therefore, strategic decisions regarding maintenance policy have a great impact on the costs and income of the whole system. The electricity generation source of the energy used in production and maintenance has a great impact on the GHG emissions, which makes it important to find the most suitable maintenance strategy for manufacturing systems; this is also an important managerial impact of the proposed approach.
If real data is available from an industrial application, then through data mining techniques and digital twin-based simulation this model can be extended for further study of the effect of real-time optimization in a sustainable manufacturing system. As in the case of this proposed model, a wide range of energy- and emission-related constraints can immediately be incorporated within the model, and heuristics or metaheuristics can be applied for optimization, which is another immediate extension of the problem [71]. The conclusions and research implications can be summarized as follows:

• The development of new maintenance policy optimization methods must include Industry 4.0 technologies to improve the performance and reliability of algorithms and techniques. A new maintenance policy optimization method was developed for the improvement of available maintenance policies to improve the discounted profit of the manufacturing system, while energy efficiency and GHG emissions reduction are taken into consideration. The computational results of the analyzed scenario show that the maintenance policy has a great impact on the cost reduction. In the demonstrated case, a cost reduction of 38% was calculated, taking natural gas as the electricity generation source. The energy consumption cost was calculated on the basis of the levelized cost of electricity.

• The design and operation of maintenance processes in services and production systems is a great challenge for researchers, because the complex and stochastic environment can lead to complex optimization problems.

• The computational results validated the described mathematical model and the Howard's policy iteration-based optimization algorithm. The integration of value determination equations and Howard's policy iteration-based optimization with energy-consumption and GHG emission minimization is a suitable tool to reduce costs from a maintenance policy point of view. As the computational results show, the different electricity generation sources influence the virtual GHG emission, which is virtual from the manufacturing system's point of view, as the real emissions are generated at the electricity generation plant and not at the manufacturing plant.
In the case of NP-hard problems, heuristic and metaheuristic algorithms can solve the optimization problems [72]. A further study of the proposed work would be the integration of the above-mentioned methodology with a real-time heuristic policy optimization. Digital twin-based discrete event simulation makes it possible to analyze data from smart sensors of the manufacturing system and link the results of analysis to the optimization process to find the best maintenance policy for the current status of the resources in the manufacturing system.
In the case of a multi-level optimization, the described methodology can be improved and the new algorithm could have the following main phases: (a) data collection from the physical system and transformation of data to the digital twin; (b) definition of objective functions and constraints of the maintenance strategy; (c) long-term optimization of the maintenance policy based on Howard's policy iteration method and value determination equations, including the costs of energy consumption and GHG emission calculation based on the levelized cost of energy and electricity generation sources; (d) collection of real-time failure data and status information from the digital twin; (e) analysis of the maintenance policy based on value determination equations; (f) real-time heuristic optimization of the long-term maintenance policy (re-optimization or correction based on real-time failure data and status information) from a cost, energy consumption, and GHG emissions perspective. This new, multi-level optimization process can be discussed within the frame of further study (see future plans in Figure A3). Taking into consideration the maturity of Industry 4.0 technologies [73], another future research direction is the integration of other I4.0 technologies (RFID, RTLS) with maintenance processes [74].

Conflicts of Interest:
The author declares no conflict of interest.

Tables for Sensitivity Analysis

Table A3. Impact of the discounting parameter on the discounted profit resulting from the optimized maintenance policy in the case of different initial status of the system (best status is Status 1 and worst status is Status 5).

Table A4. Impact of the initial system status and the ϑ discount parameter on the discounted profit resulted by the optimized maintenance policy.