Digital Twin-Based Hybrid Simulation–Prediction Framework for KPI Optimization in Sustainable Digital Printing

Bratić, Diana; Pasanec Preprotić, Suzana; Cajner, Hrvoje; Preprotić, Branimir

doi:10.3390/technologies14030170

¹ University of Zagreb Faculty of Graphic Arts, Getaldićeva 2, 10 000 Zagreb, Croatia

² University of Zagreb Faculty of Mechanical Engineering and Naval Architecture, Ivana Lučića 5, 10 000 Zagreb, Croatia

³ University of Applied Sciences, Department of Mechanical Engineering, Vrbik 8, 10 000 Zagreb, Croatia

* Author to whom correspondence should be addressed.

Technologies 2026, 14(3), 170; https://doi.org/10.3390/technologies14030170

Submission received: 22 December 2025 / Revised: 22 February 2026 / Accepted: 2 March 2026 / Published: 10 March 2026

(This article belongs to the Special Issue Agentic AI-Driven Optimization in Advanced Manufacturing Systems)

Abstract

The increasing emphasis on sustainability in digital printing requires quantitative methods for optimizing key performance indicators (KPIs) under technical and operational constraints. The term digital twin is used here in a methodological and analytical sense, as a simulation framework for analyzing interdependence, prediction, and multi-criteria optimization of KPIs, rather than as a direct virtual replica of a specific physical production system. This paper proposes a hybrid simulation–prediction model based on a digital twin framework for optimization of KPIs in sustainable digital printing, with particular emphasis on overall equipment effectiveness (OEE). Due to the limited availability of structured industrial data, the model is developed using a synthetically generated dataset constructed in accordance with industry-reported operating ranges and technically realistic digital printing process variables. Random Forest and XGBoost algorithms are applied to model nonlinear relationships between process parameters and KPIs, including material waste, energy consumption, machine downtime, and OEE. Based on these predictive models, a constrained multi-objective optimization procedure is performed to identify Pareto-efficient configurations that reduce material waste and energy consumption while maintaining acceptable downtime and OEE levels. The results characterize structural trade-offs among environmental and operational KPIs within a formally defined decision space.

Keywords: digital twin; digital printing; overall equipment effectiveness (OEE); key performance indicators; ESG metrics; sustainability; multi-criteria optimization; machine learning

1. Introduction

The printing industry is undergoing a process of digital transformation [1,2,3,4,5], in which traditional production structures are increasingly being replaced by automated and data-driven systems [6,7]. Digital printing, as the fastest-growing segment, enables flexible production and adaptation to market demands, while simultaneously introducing new technical and organizational challenges related to sustainability [8]. Reducing material waste, energy consumption, and machine downtime has become a primary technical priority in systems aiming to optimize resource utilization and ensure compliance with ESG principles. In this context, key performance indicators (KPIs) have become a fundamental instrument for efficiency monitoring and real-time technical decision-making [9].

Despite the growing availability of digital production systems, most printing facilities lack centralized KPI databases or standardized models for collecting technical information [10,11,12]. Data is often distributed across heterogeneous software systems, without effective mechanisms for aggregation and analytical processing. As a result, the practical application of predictive analysis and optimization methods remains limited, and research frequently relies on partial or synthetic datasets. Existing studies typically focus on isolated aspects of efficiency, such as energy consumption or the number of out-of-specification prints, rather than adopting an integrated approach that connects all key technical variables within a unified optimization framework.

The concept of a digital twin offers a solution to the limited availability of real-world data. A digital twin represents a virtual replica of a physical process, enabling the monitoring and analysis of production parameters within a simulated environment. In this context, synthetic data generated by simulation models do not constitute a substitute for real measurements, but rather a methodological tool for investigating relationships, sensitivities, and trade-offs within the operational space of the process, in line with dominant approaches in the digital twin literature [13,14]. In the context of digital printing, such a digital twin framework can reproduce the relationships between print speed, toner coverage, fuser temperature, paper type, and ambient humidity, as well as their impact on technical performance indicators. In this study, the digital twin framework is defined as a model-based and structurally constrained virtual representation of the process, rather than as a data-synchronized replica of a specific industrial installation.

This paper develops a simulation-based and methodological digital twin framework designed to analyze the relationships among key performance indicators in digital printing production under conditions of limited availability of industrial data [13]. The proposed digital twin framework does not constitute a direct virtual replica of a specific physical printing machine, nor is it connected to real-time measurements from an operational production system. Instead, it functions as a controlled virtual environment for the generation of synthetic process data, their predictive modeling, and multi-criteria optimization. Accordingly, the primary contribution of this study lies in the development and validation of an integrated simulation–prediction–optimization framework, rather than in the derivation of process-specific optimal settings for a particular digital printing system. The objective of the study is not to determine actual operational parameter settings or to provide directly applicable industrial recommendations, but rather to investigate structural interdependencies, trade-offs, and the influence of process parameters on sustainable and operational KPIs within a defined modeling framework. The optimization results presented in this study should therefore be interpreted exclusively as model-dependent and comparative outcomes, rather than as empirically validated values of a real production process. The proposed framework represents an early stage in the development of a digital twin, intentionally designed to allow future calibration using real industrial data and gradual extension toward more adaptive or autonomous decision-support systems. In the context of sustainable manufacturing, this approach enables the analysis of trade-offs between environmental and operational objectives and supports the development of data-driven optimization models that are not constrained by real production experiments [14]. Within this context, the scientific validity of the analysis does not arise from the origin of the data, but from the physical and operational realism of the modeled relationships, the transparency of the underlying assumptions, and the coverage of the relevant operational space of the process.

The framework employs a synthetic dataset generated in accordance with industry standards and technical variables of the digital printing process. Based on this dataset, Random Forest and XGBoost predictive algorithms are developed to estimate and optimize indicators such as material waste, energy consumption, machine downtime, and overall equipment effectiveness (OEE). The optimized technical parameters are validated within a simulation environment that operates as a digital twin framework of the digital printing process.

The objective of this research is to develop a methodological framework that enables the technical optimization of production processes in situations where real industrial data is unavailable or limited. The particular contribution of the proposed approach lies in the integration of artificial intelligence and simulation into a unified digital twin framework that supports the quantification, analysis, and multi-criteria optimization of technical performance indicators [12,15,16,17,18,19,20,21].

2. Theoretical Background

The graphic industry is undergoing a process of digital transformation in which traditional production structures are being replaced by automated and data-driven systems. Digital printing, as the fastest-growing segment, enables flexible production and adaptation to market demands, while simultaneously introducing new technical and organizational challenges related to sustainability [22]. Reducing material waste, energy consumption, and machine downtime has become a primary technical priority in systems that aim to optimize resource utilization and ensure compliance with ESG principles. In this context, KPIs emerge as a fundamental instrument for efficiency monitoring and real-time technical decision-making [23].

Within the Industry 4.0 paradigm, production systems and manufacturing units operate autonomously and respond dynamically to changes in orders, priorities, and operational disturbances, which requires comprehensive information on the current system state and its operational capabilities [24]. Process and decision-making autonomy is therefore recognized as a core characteristic of smart manufacturing, with digitalization and the digital twin playing a central role in the development of intelligent production systems [25]. Tao, Zhang, Liu, and Nee [26] further identify digital twins as one of the key enabling technologies of Industry 4.0, owing to their ability to continuously integrate the cyber and physical domains throughout the entire life cycle of products and production systems.

Pressures associated with greenhouse gas emissions further intensify the need for reliable, measured data on energy consumption and material flows in digital printing [9], as evidenced by life cycle assessment (LCA) studies based on measurements conducted in real printing facilities [27]. Evidence from comparable industries further demonstrates that digitally enabled production models can support circular economy objectives and resource efficiency through the application of Industry 4.0 technologies, highlighting their relevance for sustainability-oriented manufacturing contexts [28]. In production environments employing digital printing variants, empirical measurements under real operating conditions further suggest that digital technologies may exhibit a more favorable environmental profile than conventional methods, depending on the application scenario and context [29]. Similarly, the concept of on-demand production associated with digital printing is increasingly positioned as a mechanism for reducing overproduction, excess inventories, and waste, while maintaining economic viability through order customization and personalization [30].

Despite the increasing availability of digital production systems, most printing facilities do not maintain centralized KPI databases or standardized models for collecting technical information. Data is often distributed across multiple software systems, without effective mechanisms for aggregation and analytical processing. As a result, the practical application of predictive analysis and optimization methods remains limited, and research frequently relies on partial or synthetic datasets. Review studies further highlight the heterogeneity of digital twin approaches, differences in modeling depth, and the lack of unified terminology, all of which hinder their systematic adoption in production environments [31]. In this context, the digital twin is recognized as a central concept of smart manufacturing that integrates physical systems, data, and analytical models with the aim of improving the efficiency and reliability of production processes [32]. Kusiak [33] emphasizes that data is becoming a high-value resource and that digital environments enable the optimization and simulation of decisions prior to their implementation in physical systems, while simultaneously raising new challenges related to model complexity, interpretability, and reusability.

Overall equipment effectiveness (OEE) and related KPIs are increasingly used in modern manufacturing systems not only as descriptive tools for retrospective analysis, but also as target variables in predictive and optimization models. In the context of digital manufacturing, OEE and KPI frameworks enable quantitative linkage between process parameters, operational efficiency, and environmental impact, thereby forming the basis for data-driven technical decision-making.
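As a point of reference, OEE is conventionally decomposed into the product of availability, performance, and quality. The sketch below illustrates that standard decomposition in Python. The function name, argument names, and the example shift figures are assumptions introduced here for illustration; they are not taken from the dataset or from the specific OEE formulation used in this study.

```python
def oee(planned_min, downtime_min, ideal_cycle_min, total_count, good_count):
    """Return (availability, performance, quality, oee) as fractions.

    Standard decomposition: OEE = availability * performance * quality.
    All argument names are hypothetical and chosen for readability.
    """
    run_time = planned_min - downtime_min                       # minutes actually running
    availability = run_time / planned_min                       # share of planned time used
    performance = (ideal_cycle_min * total_count) / run_time    # actual vs. ideal output rate
    quality = good_count / total_count                          # share of in-spec prints
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 planned minutes, 48 minutes of downtime,
# 0.01 min ideal time per sheet, 36,000 sheets printed, 35,280 within specification.
availability, performance, quality, oee_value = oee(480, 48, 0.01, 36_000, 35_280)
print(f"A={availability:.3f}  P={performance:.3f}  Q={quality:.3f}  OEE={oee_value:.3f}")
```

Because the three factors multiply, a moderate loss in each one compounds into a noticeably lower OEE, which is one reason the indicator is useful as a single target variable for prediction.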

Numerous studies indicate that the relationships between process variables and KPIs are highly nonlinear, multidimensional, and characterized by the simultaneous interaction of multiple parameters. Such structural complexity limits the applicability of classical analytical and linear models and justifies the use of supervised machine learning algorithms to approximate functional relationships between input variables and target performance metrics, including OEE, energy consumption, and waste indicators.

Among various machine learning approaches, ensemble models based on decision trees, such as Random Forest and gradient boosting methods, have proven to be particularly well suited for predictive modeling of KPIs in industrial systems. Their robustness to data noise, ability to capture nonlinearities and interaction effects, and relative interpretability make them suitable for applications where understanding the influence of individual process parameters on overall system efficiency is essential, especially in manufacturing environments characterized by pronounced process variability [34,35]. Previous studies confirm the successful application of Random Forest and XGBoost models for predicting OEE and related performance indicators in manufacturing contexts, particularly within digital twin frameworks and data-driven optimization settings [36,37]. Similar approaches combining ensemble models with optimization techniques have also been successfully applied to the prediction and optimization of complex engineering systems [38,39].

In a broader context, machine learning is increasingly applied to the quantification of non-financial indicators, with ESG metrics being treated as dynamic, data-driven measures of risk and sustainability [40]. To ensure the practical applicability of predictive models in such settings, model interpretability becomes as important as predictive accuracy itself [41]. In predictive tasks related to equipment efficiency, Dobra and Jósvai [42] demonstrate that KPI metric structures such as OEE are not used solely for monitoring purposes, but also for estimating future performance, allowing supervised machine learning models to be compared in terms of predictive accuracy across different time-horizon strategies. In their subsequent work [43], the authors emphasize that, despite the routine collection of large volumes of production data through MES and ERP systems, OEE and related indicators remain only partially predictable, thereby motivating the use of more robust and combined machine learning approaches.

A recent review by Ördek, Borgianni, and Coatanea [44] further confirms the rapid expansion of machine learning applications in manufacturing, while also highlighting the fragmentation of the field and the fact that most existing solutions target limited functions and narrowly defined objectives. The authors argue that excessive specialization could be mitigated through broader adoption of transfer learning approaches, enabling knowledge transfer across machines, production lines, and industrial domains. In the context of industrial automation, another group of authors [45] emphasizes that machine learning is increasingly embedded in system monitoring, inspection, and optimization through the analysis of sensor data and operational histories, enabling proactive fault prediction, intelligent inspection, and dynamic process control. They also highlight the growing prominence of deep learning, particularly convolutional and recurrent architectures, and a shift toward real-time implementations at the network edge, accompanied by closer integration with digital twin and Edge AI concepts.

In the domain of quality assurance, studies indicate that algorithm selection is non-trivial due to performance variability, scalability issues, and inconsistent evaluation metrics [46]. In this context, different machine learning models exhibit complementary strengths across tasks such as defect detection, predictive maintenance, and process parameter optimization, with interpretability remaining an important consideration for process management. With respect to sustainability management in competitive manufacturing environments, Karjust et al. [47] emphasize that performance measurement plays a central role in aligning operational efficiency with sustainable development objectives through structured KPI systems and propose an advanced KPI selection and prioritization framework based on expert judgment, outlier detection, and multi-criteria decision-making under uncertainty.

In the context of the circular economy, Mejía-Moncayo et al. [48] show that sustainable remanufacturing requires a particularly broad repertoire of KPIs encompassing economic, environmental, and social dimensions, and they identify numerous indicators and application areas that link KPI systems with smart and sustainable remanufacturing practices. At the level of process quality control, Zuher Hassan Abdullaha [49] further demonstrates that operational and environmental sustainability can be directly addressed by minimizing non-productive time and resource consumption, with indicators such as production efficiency, production cycle efficiency, and resource utilization efficiency being explicitly linked to reductions in environmental and operational impacts, and with lead time emphasized as a critical quality performance indicator.

Optimization of production systems oriented toward sustainability typically involves multiple objectives that are inherently competing. In the context of digital printing, reductions in material waste and energy consumption are often in conflict with increases in productivity, process stability, and overall equipment effectiveness. Such a structure of objectives precludes the formulation of a single optimal solution that would simultaneously maximize all performance- and environment-related indicators.

Conventional optimization approaches frequently attempt to collapse multiple objectives into a single aggregated loss function through the use of weighting factors. However, the selection of weights is inherently subjective and context dependent, and the resulting solutions tend to obscure the actual trade-offs between individual KPIs. In production systems subject to pronounced environmental and operational constraints, such approaches may lead to technically unacceptable or operationally unstable solutions.
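The sensitivity of a weighted-sum formulation to the chosen weights can be illustrated with a small Python sketch. The three candidate settings and their objective values are hypothetical numbers invented for this example, and the objectives are left unnormalized for brevity, which is itself a simplification.

```python
# Hypothetical candidate settings: (material waste in %, energy in kWh per 1000 sheets).
candidates = {
    "A": (2.0, 9.0),
    "B": (3.5, 6.5),
    "C": (5.0, 5.0),
}


def weighted_best(weights):
    """Return the candidate that minimizes the weighted sum of both objectives."""
    w_waste, w_energy = weights
    return min(
        candidates,
        key=lambda name: w_waste * candidates[name][0] + w_energy * candidates[name][1],
    )


waste_focused = weighted_best((0.8, 0.2))    # emphasis on material waste
energy_focused = weighted_best((0.2, 0.8))   # emphasis on energy consumption
print(waste_focused, energy_focused)
```

The two weight vectors select different candidates from the same set, and neither result shows how much of one objective was given up for the other.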

Multi-criteria optimization based on the Pareto principle provides a conceptually more appropriate framework for the analysis of such systems. Rather than seeking a single optimum, the Pareto approach identifies a set of non-dominated solutions for which no objective can be improved without deteriorating at least one other objective. This explicitly quantifies the trade-offs between environmental and operational goals and shifts the decision-making process from the algorithmic level to the level of engineering judgment.
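The non-dominance definition above can be made concrete with a minimal Python sketch. It assumes that every objective is to be minimized; the sample points (waste, energy) are hypothetical, and the brute-force pairwise comparison is chosen for clarity rather than efficiency.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one.

    Assumes all objectives are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates (O(n^2) comparison)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (material waste, energy consumption) pairs for five configurations.
points = [(2.0, 9.0), (3.5, 6.5), (5.0, 5.0), (4.0, 8.0), (5.5, 6.0)]
front = pareto_front(points)
print(front)
```

In this toy set, (4.0, 8.0) is dominated by (3.5, 6.5) and (5.5, 6.0) by (5.0, 5.0); the remaining three points form the front from which an engineer would choose.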

Within the literature on sustainable manufacturing and industrial system optimization, Pareto analysis is widely accepted as a standard tool for evaluating trade-offs among energy-related, material-related, and performance-related objectives. This approach is particularly relevant in the context of digital twins, where simulation and predictive environments enable systematic exploration of the entire solution space and informed decision-making based on the structure of trade-offs rather than on predefined weighting factors [50,51,52].

Within this context, the digital twin enables the integration of simulation, data, and analytical models into a unified framework for technical decision-making. Since not all relevant quantities in real production systems can be directly measured, combining real operational data with simulation models originating from the design phase allows for more reliable predictions of system behavior, particularly in flexible manufacturing environments [24].

Digital twins are increasingly defined as networked digital entities that integrate data, functional capabilities, and communication features, enabling the automation of complex production and value chains within the Internet of Things [53]. Lu et al. [54] further extend this concept to the supply chain level, emphasizing that a sustainable digital twin must encompass not only technical systems, but also people and processes across the entire value chain. Such a holistic approach enables the alignment of operational decisions with sustainability objectives throughout the full product life cycle and associated logistics flows. Systematic reviews of industrial digital twin applications further confirm their broad applicability in design, production, maintenance, and system health management, while also identifying clear challenges related to scalability and standardization [26]. In the context of the circular economy, Sajadieh and Noh [55] propose a hierarchical digital twin maturity model that links implementation stages with closed-loop principles, positioning the digital twin as a fundamental mechanism for resource optimization, life-cycle management, and predictive decision-making in sustainable industrial systems. In operational sustainability applications, Nozari and Yordanova [56] propose a digital twin environment as a source of real-time data and combine it with uncertainty fuzzification and multi-objective optimization (NSGA-II) to simultaneously manage cost, energy, and waste, with Pareto analysis explicitly quantifying the trade-offs between economic and environmental objectives.

The application of simulation techniques enables the development of experimentable digital twins that can be interconnected and tested within virtual environments and subsequently integrated with real production systems in hybrid control and monitoring scenarios [53]. Such approaches support the development of advanced control strategies, user interfaces, and mental models for intelligent systems, through which the digital twin progressively evolves into a comprehensive digital representation of the physical asset and its behavior. The authors of [57] demonstrate that networked digital twins can be used to link multiple production sites into a unified monitoring and control system through the application of IoT technologies, advanced networking protocols, and real-time security mechanisms. These models confirm that digital twins can simultaneously support technical optimization, remote monitoring, and proactive maintenance in distributed manufacturing environments.

A distinct application domain relates to monitoring systems, where digital twins based on augmented reality enable real-time data analysis, advanced visualization of system performance, and reductions in configuration time and equipment setup errors [58,59]. In the context of sustainable intelligent manufacturing, digital twins are further emphasized as tools for early fault detection and prediction of system degradation, thereby reducing resource losses and improving long-term production reliability [60]. Timperi et al. [61] further emphasize that the potential of digital twins extends beyond purely technical prerequisites to encompass business and organizational dimensions of sustainability, particularly in the transition toward a circular economy.

Similarly, Gbededo and Liyanage [62] demonstrate that simulation-based digital twins enable the simultaneous evaluation of the economic, environmental, and social impacts of production decisions, thereby positioning the digital twin as an operational tool for multi-criteria decision-making in sustainable manufacturing. In the context of implementation planning and change management, Jawad et al. [63] further emphasize that the expected added value of a digital twin should be defined at early stages of digital transformation through both operational and strategic criteria, allowing optimization to be performed in parallel at the levels of plant layout, machine fault tolerance and design, and product quality.

In equipment maintenance practice, predictive maintenance is increasingly recognized as a core component of sustainable smart manufacturing, where large-scale operational datasets are leveraged for automated fault detection and diagnostics in order to reduce downtime and increase system utilization [64]. Broader review studies further confirm that machine learning algorithms are systematically applied to process and parameter optimization across a wide range of technology-driven manufacturing environments, while also identifying prevailing trends, key challenges, and directions for future research [65].

In real-world production settings, the competency gap frequently emerges as a limiting factor in the implementation of machine learning solutions. To address this issue, Rosemeyer, Pinzone, and Metternich [66] propose digital assistive systems that enable domain experts with limited programming expertise to develop sustainable machine learning applications. The authors note that existing solutions predominantly focus on technical aspects, while user integration and validation under industrial conditions remain insufficient, which is critical for effective knowledge transfer and stable operational deployment.

At the level of corporate governance, Hristov and Chirico [67] further caution that the measurement of sustainable development becomes problematic when KPI systems are not directly linked to business strategy. They therefore propose structuring KPIs along environmental, social, and economic dimensions in order to ensure strategic alignment and value creation. Within the context of the circular economy, another group of authors [68] systematically identifies and aggregates key circular economy (CE) indicators in manufacturing, highlighting composite measures such as strategies and initiatives, material efficiency, remanufacturing productivity, investment in technology, and eco-innovation. This provides an operationalized framework for measuring and optimizing circularity in industrial systems.

This paper proposes a hybrid simulation–prediction model based on a digital twin for the optimization of KPIs in sustainable digital printing. The model relies on a synthetic dataset generated in accordance with industry standards [10,12] and technical variables of the digital printing process. Based on this dataset, Random Forest and XGBoost predictive algorithms are developed to estimate and optimize indicators such as material waste, energy consumption, machine downtime, and overall equipment effectiveness (OEE). The optimized technical parameters are validated within a simulation environment that functions as a digital twin of the digital printing process.

In the context of intelligent manufacturing, digital twins are increasingly being extended toward knowledge-driven architectures that enable perception, simulation, prediction, optimization, and autonomous control of production units through the use of dynamic knowledge bases and intelligent skills [69]. Ponnusamy, Ekambaram, and Zdravković [70] emphasize that the integration of deep learning with digital twins supports adaptive learning, fault prediction, and production scheduling optimization, thereby transforming digital twins from descriptive models into predictive and self-adaptive systems. This approach is particularly relevant in data-intensive manufacturing environments in which large and heterogeneous datasets are processed in real time. In such settings, deep learning is identified as a suitable approach for advanced manufacturing data analytics, especially in fault diagnosis, quality management, and predictive maintenance tasks that directly affect system sustainability [35]. In practice, this implies that, in addition to predicting KPI outcomes, transparent explanations of the influence of key variables on individual predictions must be provided, which is operationalized by the SHAP framework through feature importance attribution for each model output [41].
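To illustrate what per-prediction attribution means, the sketch below computes exact Shapley values by enumerating all feature coalitions for a toy model. This is not the SHAP library and not the model used in this study: the toy waste function, the feature values, and the single reference baseline used to represent "absent" features are all assumptions made for illustration, and brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def toy_waste_model(z):
    """Hypothetical KPI model: waste as a function of (speed, coverage, humidity)."""
    speed, coverage, humidity = z
    return 0.02 * speed + 0.05 * coverage + 0.001 * speed * coverage + 0.01 * humidity


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(subset):
        # Features in the coalition take their actual value, the rest the baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


x = [100.0, 30.0, 50.0]         # hypothetical operating point
baseline = [80.0, 20.0, 45.0]   # hypothetical reference point
phi = shapley_values(toy_waste_model, x, baseline)
print(phi, sum(phi))
```

The attributions sum to the difference between the prediction at the operating point and at the reference point, and the speed-coverage interaction term is shared between the two interacting features.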

An analysis of the relevant literature indicates that digital twins, when combined with predictive modeling, simulation, and multi-criteria optimization, constitute a key conceptual framework for managing complex manufacturing systems within the context of Industry 4.0 and sustainable manufacturing [12,15,71,72,73].

At the same time, existing studies reveal a pronounced gap between theoretical digital twin models and their operational industrial deployment, particularly with respect to maturity levels, model validation, and integration with technical decision-making processes, as well as in domains characterized by limited or fragmented availability of reliable industrial data [74]. This methodological gap highlights the need for integrated approaches that connect generative modeling, prediction, and optimization within a unified digital twin framework. This approach is operationalized in the remainder of this paper through the proposed digital twin framework.

Within the Industry 4.0 paradigm, the proposed framework is positioned as an analytical and optimization-oriented digital twin that enables the integration of process variables, KPIs, and multi-criteria optimization within a unified simulation environment. Although the system presented in this study does not include autonomous decision-making or agent-based process control, its conceptual architecture is compatible with further development toward adaptive and agent-based approaches, for example, through integration with supervisory systems, real-time feedback loops, or autonomous optimization algorithms. In this way, conceptual alignment with the Industry 4.0 framework is ensured, while preserving methodological clarity and the defined limitations of the research.

3. Methodology

The methodological approach is based on the development of a hybrid model that integrates simulation, prediction, and validation of technical performance indicators within the digital printing process. The model is grounded in the digital twin concept, which enables virtual representation of the process and systematic examination of the relationships between production parameters and key performance indicators (KPIs). The proposed approach comprises three interconnected layers: a simulation layer, in which synthetic data describing process variables are generated; a predictive layer, in which machine learning algorithms are trained and evaluated; and a validation layer, in which optimized parameter settings are applied and visualized within the digital twin environment.
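The flow of information through the three layers can be sketched in a few lines of Python. Everything in this sketch is a stand-in chosen to keep the example dependency-free: the U-shaped energy response is a hypothetical toy function, a k-nearest-neighbour average replaces the Random Forest and XGBoost models, a grid search replaces the multi-criteria optimizer, and only one input (print speed) and one KPI (energy) are considered.

```python
import random


def simulate_energy(speed, rng=None):
    """Simulation layer: hypothetical U-shaped energy response (illustrative only)."""
    noise = rng.gauss(0.0, 0.05) if rng is not None else 0.0
    return 400.0 / speed + 0.04 * speed + noise


def knn_predict(train, speed, k=5):
    """Predictive layer: average the k nearest simulated runs (stand-in surrogate)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - speed))[:k]
    return sum(energy for _, energy in nearest) / k


rng = random.Random(42)

# 1) Simulation layer: generate noisy synthetic runs over an assumed speed range.
train = []
for _ in range(200):
    speed = rng.uniform(50.0, 150.0)
    train.append((speed, simulate_energy(speed, rng)))

# 2) Predictive layer plus search: pick the grid speed with the lowest predicted energy.
grid = [50.0 + i for i in range(101)]
best_speed = min(grid, key=lambda s: knn_predict(train, s))
predicted = knn_predict(train, best_speed)

# 3) Validation layer: re-run the noise-free simulator at the selected setting.
simulated = simulate_energy(best_speed)
gap = abs(simulated - predicted)
print(f"speed={best_speed:.0f}  predicted={predicted:.3f}  simulated={simulated:.3f}")
```

The final comparison between the surrogate's prediction and the simulator's response is the kind of internal consistency check a validation layer performs; it says nothing about agreement with a real press.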

The procedure involves the generation of input variables using the Latin Hypercube Sampling statistical method, the development of Random Forest and XGBoost predictive models, and the application of multi-criteria optimization aimed at reducing material waste and energy consumption while maintaining acceptable levels of machine downtime and overall efficiency. The optimization results are validated within the simulation environment through the digital twin framework, thereby verifying model consistency and the stability of the proposed technical parameters.
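Latin Hypercube Sampling divides each input range into as many equal-width strata as there are samples and places exactly one sample in each stratum per variable. A minimal pure-Python sketch of this idea follows. The function name, the seed, and the three variable ranges (print speed, toner coverage, fuser temperature) are hypothetical and are not the ranges used to build the study's dataset; categorical inputs such as paper type are omitted.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each variable hits every one of its n strata once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of the n equal-width strata of [0, 1).
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle independently per variable so strata are paired at random.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical ranges: print speed, toner coverage (%), fuser temperature (deg C).
bounds = [(50.0, 120.0), (5.0, 60.0), (160.0, 200.0)]
samples = latin_hypercube(10, bounds)
for row in samples[:3]:
    print(tuple(round(v, 2) for v in row))
```

Compared with plain random sampling, this stratification gives even marginal coverage of each input range with relatively few samples, which suits the construction of training data for surrogate models.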

3.1. Framework Concept and Architecture

The proposed hybrid framework is based on a three-layer architecture comprising a simulation layer, a predictive layer, and a validation layer (Figure 1). Each layer serves a clearly defined role in the processes of data generation, framework development, and verification of optimized technical configurations. The architecture is designed to operate under conditions of limited availability of real industrial data while preserving analytical rigor and enabling technically interpretable results [75]. From a methodological perspective, it is important to emphasize that the proposed framework is intended for the analysis of system behavior and interrelationships among process variables and KPIs within a defined simulation structure, rather than for modeling a real physical system with the aim of direct industrial application. In this context, the numerical optimal values obtained through multi-criteria optimization represent optimal solutions within the defined model space, rather than direct recommended settings of a specific production system. Their practical utility lies in identifying structurally feasible operating regions, analyzing sensitivity of KPIs to parameter variations, and revealing the magnitude and direction of trade-offs between environmental and operational objectives. For engineering practice, such information supports scenario evaluation, risk-aware parameter adjustment, and prioritization of process interventions without requiring immediate experimental implementation on production equipment.

The simulation layer represents the initial phase in which a synthetic dataset is generated based on digital printing process variables. Within this layer, the ranges of input parameters and their interdependencies are defined, and a statistical data generation approach is applied to capture a wide spectrum of possible process states. The objective of the simulation layer is to construct a representative set of input and output variables that can serve as a foundation for training predictive models.

The predictive layer utilizes the generated data to train machine learning algorithms. In this layer, relationships between input variables and key performance indicators (KPIs) are analyzed, model accuracy is evaluated, and optimization of technical process parameters within the defined model space is performed. The predictive layer enables the identification of parameter combinations that minimize energy consumption and material waste while simultaneously maintaining target OEE levels and acceptable machine downtime.

The validation layer links the optimized parameter settings to the digital twin framework of the process. In this layer, selected parameter combinations are transferred to the digital twin simulation environment to assess their stability and technical feasibility within a dynamic operational simulation structure. Validation involves comparison between a baseline scenario and an optimized scenario, as well as analysis of differences in key performance indicators.

Data flow through the proposed architecture proceeds through four stages: data generation in the simulation layer, analysis and modeling in the predictive layer, parameter optimization, and feedback validation within the digital twin simulation environment. Such an approach ensures methodological consistency and provides a foundation for the potential development of adaptive control systems in sustainable digital printing environments.

3.2. Definition of Variables and Input Data Generation

Modeling the relationships between digital printing process variables and the corresponding KPIs requires a clear definition of the input parameters and the mechanisms through which they influence system performance. At this stage, the structure of the input vector is defined, together with the functional forms used to generate KPI values and the simulation procedure based on Latin Hypercube Sampling. The objective is to construct a rich, statistically representative, and technically realistic dataset that can be used for training predictive algorithms and performing optimization within the defined model space.

This study considers digital printing based on an electrophotographic toner-based process. The selected process variables, including fuser temperature, toner coverage, and paper behavior under thermal load, are specific to electrophotographic systems and are not directly applicable to inkjet technologies, which rely on different drying mechanisms and ink-substrate interaction principles. Accordingly, the proposed digital twin framework and the associated generative model are defined exclusively within the context of electrophotographic digital printing.

Although color management represents an important segment of graphic production, it is not included in this study as a separate set of variables. This is because the research objective is focused on the technical and operational aspects of the process that directly affect resource consumption, process stability, and equipment efficiency, rather than on color accuracy or the visual quality of the print. Incorporating color management systems would require an additional modeling layer and a different set of KPIs, which is beyond the scope of this work.

The input variable vector comprises the core digital printing process parameters and is extended with variables describing paper properties, production intensity, and equipment condition. This extension enables more realistic modeling of the effects of maintenance activities and mechanical degradation on KPIs within the simulation framework.

Digital printing process parameters are represented as an input vector of dimension d, where each component corresponds to a single measurable technological variable. Formally, the system state in the i-th simulation is described as:

\begin{matrix} x_{i} = \begin{bmatrix} v_{i} \\ c_{i} \\ t_{i} \\ h_{i} \\ p_{i} \\ g_{i} \\ n_{i} \\ s_{i} \\ r_{i} \end{bmatrix} \in \mathbb{R}^{d} \end{matrix}

(1)

where vᵢ denotes the print speed, cᵢ the average toner coverage, tᵢ the fuser temperature, hᵢ the ambient relative humidity, pᵢ the paper type (encoded as a discrete variable), gᵢ the paper grammage, nᵢ the print run size, sᵢ the machine wear index, and rᵢ the number of prints since the last scheduled maintenance. This structure reflects the technical specificity of digital printing, in which variations in any of these variables have an immediate and measurable impact on toner fusing behavior, paper transport, image stability, and energy consumption.

To ensure that the simulation reflects real operating conditions of digital printing, an operational range is defined for each input variable within the technically permissible limits of production systems. This approach is consistent with recent literature on digital twins in sustainable manufacturing [76,77,78], where simulation models are constructed within a physically and operationally feasible process space and their realism is subsequently validated through analysis of system behavior [55,79,80,81,82,83,84].

In this study, the following variables are considered: print speed v [A4/min], average toner coverage c [% area], fuser temperature t [°C], relative humidity h [%], paper type p, paper grammage g [g/m²], job size n [number of prints], machine wear index s, and the interval since the last maintenance r [number of prints]. The operational ranges and their technical justification are presented in Table 1, and the same ranges are used as bounds for the Latin Hypercube sampling procedure described later in this section. The defined operational ranges are derived from manufacturer technical specifications of production-class electrophotographic digital presses and from industrial operating practice reported in the literature on digital printing systems. The ranges do not correspond to a single specific machine model but represent consolidated engineering bounds that capture the typical operating space of contemporary production digital printers [79,80,81].

The paper type is modeled as a discrete variable with three categories, where p = 0 denotes uncoated papers used for office and standard commercial jobs, p = 1 represents coated papers intended for more demanding graphic applications, and p = 2 corresponds to specialty papers that exhibit a higher propensity for waste and energy variability at higher coverage levels. Paper grammage g, machine wear index s, and the interval since the last maintenance r are included in the input vector as additional variables influencing process stability, downtime occurrence, and overall energy efficiency.

In an industrial setting, variables such as machine wear index and maintenance interval would not represent directly measured physical quantities, but aggregated indicators derived from in-process monitoring systems, maintenance logs, vibration signals, acoustic emissions, temperature sensors, and historical fault records. Their numerical values would therefore depend not only on the physical condition of the equipment, but also on the specific process context in which measurements are acquired.

In contemporary manufacturing environments, in-process signals are frequently influenced by context-dependent factors, including sensor positioning, relative motion between measurement sources and mechanical components, production speed regimes, substrate characteristics, and geometric process configurations. Such contextual variability can introduce systematic bias into derived indicators and, consequently, into KPI estimation. Robust data-driven models are therefore required not merely to detect correlations, but to decode context-dependent signatures embedded in monitoring signals and to distinguish structural degradation patterns from operational variability.

Within the present study, the machine wear index and maintenance interval are modeled at an abstract level in order to capture their structural influence on downtime and efficiency. This abstraction does not assume direct sensor equivalence; rather, it represents a parametrized proxy for condition-aware monitoring variables that, in real-world deployment, would be derived from context-sensitive data acquisition systems. Accordingly, the predictive layer of the proposed digital twin framework is conceptually aligned with context-aware learning approaches, where model stability must be preserved under variations in measurement conditions and operational regimes.

The output variables, i.e., the KPIs, are modeled as a nonlinear function of the input variables with an additional stochastic component that captures unstructured influences that cannot be directly measured, such as micro-scale temperature fluctuations or deviations in paper transport:

\begin{matrix} y_{i} = F (x_{i}) + G (x_{i}) η_{i}, η_{i} \sim N (0, Σ) \end{matrix}

(2)

where the vector yᵢ comprises material waste wᵢ, energy consumption Eᵢ, downtime dᵢ, print quality Qᵢ, and overall equipment effectiveness OEEᵢ. This formulation allows the process to be defined as a multivariate stochastic system, which is consistent with approaches commonly used in industrial simulations and optimization models.

Linear and quadratic terms are employed to model individual KPIs, enabling the representation of nonlinearities and interaction effects among process parameters.

Material waste is defined as:

\begin{matrix} w_{i} = x_{i}^{⊤} A_{w} x_{i} + b_{w}^{⊤} x_{i} + ϵ_{w, i} \end{matrix}

(3)

Energy consumption is defined as:

\begin{matrix} E_{i} = x_{i}^{⊤} A_{E} x_{i} + b_{E}^{⊤} x_{i} + ϵ_{E, i} \end{matrix}

(4)

Downtime is defined as:

\begin{matrix} d_{i} = x_{i}^{⊤} A_{d} x_{i} + b_{d}^{⊤} x_{i} + ϵ_{d, i} \end{matrix}

(5)

For practical implementation of the model, normalized variables are introduced by linearly mapping the actual ranges defined in Table 1 onto the interval [0, 1]. For print speed v, toner coverage c, fuser temperature t, humidity h, paper grammage g, job size n, and the interval since the last service r, the following transformations are defined:

x^{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}

(6)

while the machine wear index s is defined directly on the interval [0, 1]. The paper type is modeled using indicator variables I_{p} (coated) and I_{spec} (specialty), where uncoated paper corresponds to the case I_{p} = 0 and I_{spec} = 0.
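The normalization of Equation (6) and the indicator encoding of paper type can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the speed bounds used in the example are placeholders, not the values of Table 1.

```python
def normalize(x, x_min, x_max):
    """Equation (6): linear map of [x_min, x_max] onto [0, 1]."""
    if x_max <= x_min:
        raise ValueError("x_max must exceed x_min")
    return (x - x_min) / (x_max - x_min)


def paper_indicators(p):
    """Return (I_p, I_spec) for paper type p (0 uncoated, 1 coated, 2 specialty)."""
    if p not in (0, 1, 2):
        raise ValueError("paper type must be 0, 1, or 2")
    return (1 if p == 1 else 0, 1 if p == 2 else 0)


# Placeholder bounds for print speed in A4/min (assumption, NOT from Table 1).
V_MIN, V_MAX = 60.0, 180.0
v_norm = normalize(120.0, V_MIN, V_MAX)
print(v_norm, paper_indicators(0), paper_indicators(1), paper_indicators(2))
```

Encoding the three paper categories with two indicators keeps uncoated paper as the reference case, so the coefficients attached to I_{p} and I_{spec} read as offsets relative to uncoated stock.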

Material waste wᵢ in the i-th simulation is defined as a nonlinear function of the input variables:

w_{i} = \max \{ 0, 1.5 + 2.0 v_{i}^{norm} + 3.0 c_{i}^{norm} + 1.5 h_{i}^{norm} + 2.5 | t_{i}^{norm} - 0.6 | + 1.0 g_{i}^{norm} + 2.0 s_{i} + 1.5 r_{i}^{norm} + 1.5 v_{i}^{norm} c_{i}^{norm} + 1.0 | t_{i}^{norm} - 0.6 | g_{i}^{norm} + 1.0 h_{i}^{norm} I_{spec, i} + ε_{w, i} \}

(7)

where the coefficients reflect technically established trends: an increase in material waste with toner coverage, print speed, humidity, system wear, and deviation of temperature from the optimal range around t^norm ≈ 0.6, which corresponds to approximately 180 °C.
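For illustration, Equation (7) can be written as a short Python function. The coefficients are those of Equation (7); the function name and the treatment of the noise term as an explicit argument are assumptions of this sketch, since the noise distribution is not specified at this point in the text.

```python
def material_waste(v, c, t, h, g, s, r, i_spec, noise=0.0):
    """Equation (7). All continuous inputs are already normalized to [0, 1]."""
    dt = abs(t - 0.6)  # deviation from the optimal normalized fuser temperature
    value = (1.5 + 2.0 * v + 3.0 * c + 1.5 * h + 2.5 * dt + 1.0 * g
             + 2.0 * s + 1.5 * r
             + 1.5 * v * c        # speed x coverage interaction
             + 1.0 * dt * g       # temperature deviation x grammage interaction
             + 1.0 * h * i_spec   # humidity effect on specialty paper
             + noise)
    return max(0.0, value)        # waste cannot be negative


# Noise-free extremes of the normalized input space.
best_case = material_waste(0, 0, 0.6, 0, 0, 0, 0, 0)
worst_case = material_waste(1, 1, 0.0, 1, 1, 1, 1, 1)
print(best_case, worst_case)
```

Without noise, the expression ranges from 1.5 (all drivers at their lower bound, temperature at the optimum) to approximately 17.1 (all drivers at their upper bound, temperature at the lower limit, specialty paper).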

Before further interpretation of these relationships, it is important to emphasize that the mathematical expressions defined in Equations (7)–(15) are formulated as generative, parameterized models whose primary purpose is the synthesis of datasets with controlled structural properties, rather than the physically precise modeling of a real printing process. In this context, the coefficients do not represent empirically calibrated parameters or physical constants, but weighted terms introduced to regulate the relative influence of individual process variables and their interactions within the simulation environment. Accordingly, the resulting optimal values and modeled objective relationships should be interpreted strictly as model-dependent outcomes, within the defined framework space, with a clear distinction between structural model realism and empirical representativeness.

Changes in the weighted coefficients within the generative model necessarily lead to shifts in the position and shape of the Pareto front of the optimization problem. Such shifts do not represent a limitation of the proposed approach, but rather the expected behavior of a multi-criteria optimization model in which trade-offs between KPIs are defined with respect to selected priorities. Accordingly, the optimal solutions obtained in this study should be interpreted as optimal within a given set of weights and constraints, and not as universally optimal settings of the digital printing process.

Consequently, all numerical values obtained in the prediction and optimization phases, including relative changes in aggregated indicators such as OEE, represent results conditioned by the defined framework structure and the selected weighted relationships among variables. These results do not describe the behavior of a real digital printing process, but instead serve for sensitivity analysis of the system, identification of structural trade-offs between KPIs, and evaluation of relative changes within the simulated operational space. In this sense, the value of the obtained results lies not in their absolute magnitude, but in their ability to reveal the internal structure of interactions among process variables and KPIs under the defined modeling assumptions.

Energy consumption per 1000 prints is defined by a linear model:

E_{i} = 24 + 6.0 t_{i}^{norm} + 3.0 v_{i}^{norm} + 2.0 c_{i}^{norm} + 1.5 g_{i}^{norm} + 1.5 I_{p, i} + 1.0 I_{spec, i} + ε_{E, i}

(8)

where elevated temperature, print speed, toner coverage, and paper grammage, as well as more surface-demanding paper types, lead to an increase in specific energy consumption.

The defect rate, expressed in ppm, is modeled as:

D_{i} = 300 + 700 | t_{i}^{norm} - 0.6 | + 500 h_{i}^{norm} + 300 c_{i}^{norm} + 200 v_{i}^{norm} + 150 I_{spec, i} + ε_{D, i}

(9)

which reflects the sensitivity of digital printing to humidity, fuser temperature deviations, and high toner coverage.
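Equations (8) and (9) translate directly into code. The following Python sketch uses the published coefficients; the function names and the explicit noise argument are illustrative assumptions.

```python
def energy_per_1000(t, v, c, g, i_p, i_spec, noise=0.0):
    """Equation (8): energy consumption per 1000 prints (normalized inputs)."""
    return (24 + 6.0 * t + 3.0 * v + 2.0 * c + 1.5 * g
            + 1.5 * i_p + 1.0 * i_spec + noise)


def defect_rate_ppm(t, h, c, v, i_spec, noise=0.0):
    """Equation (9): defect rate in ppm (normalized inputs)."""
    return (300 + 700 * abs(t - 0.6) + 500 * h + 300 * c
            + 200 * v + 150 * i_spec + noise)


# Mid-range scenario on coated paper at the optimal fuser temperature.
energy_mid = energy_per_1000(t=0.6, v=0.5, c=0.5, g=0.5, i_p=1, i_spec=0)
defects_mid = defect_rate_ppm(t=0.6, h=0.5, c=0.5, v=0.5, i_spec=0)
print(energy_mid, defects_mid)
```

For this mid-range scenario the noise-free expressions give about 32.35 energy units per 1000 prints and 800 ppm.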

Downtime dᵢ is defined using a logistic function to ensure values within the interval [0, 100]:

d_{i} = 100 \cdot σ (- 2.2 + 1.8 s_{i} + 1.6 r_{i}^{norm} + 0.9 v_{i}^{norm} + 0.5 c_{i}^{norm} + ε_{d, i})

(10)

where σ (z) = \frac{1}{1 + e^{- z}}.
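The logistic form of Equation (10) can be illustrated with the following Python sketch, in which the function names are hypothetical and the noise term, which enters inside the logistic function, is passed explicitly.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def downtime_percent(s, r, v, c, noise=0.0):
    """Equation (10): downtime in percent; r, v, c normalized, s in [0, 1]."""
    z = -2.2 + 1.8 * s + 1.6 * r + 0.9 * v + 0.5 * c + noise
    return 100.0 * sigmoid(z)


low = downtime_percent(0, 0, 0, 0)    # new machine, just serviced, slow, low coverage
high = downtime_percent(1, 1, 1, 1)   # worn machine, overdue service, fast, high coverage
print(round(low, 2), round(high, 2))
```

Without noise, the expression yields roughly 10% downtime at the lower corner of the input space and roughly 93% at the upper corner, so the bound [0, 100] is respected by construction.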

Total throughput Tᵢ and the OEE components, namely availability Aᵢ, quality Qᵢ, and performance Pᵢ, are expressed as:

T_{i} = v_{i} (1 - \frac{d_{i}}{100})

(11)

A_{i} = 1 - \frac{d_{i}}{100}

(12)

Q_{i} = 1 - \frac{w_{i}}{100}

(13)

P_{i} = \frac{T_{i}}{v_{nom}}

(14)

and the overall efficiency is estimated by the expression:

OEE_{i} = A_{i} P_{i} Q_{i}

(15)

with nominal speed vₙₒₘ = 180 A4/min. The presented expressions constitute the parametrization of the quadratic and nonlinear components of the model, enabling the realistic generation of synthetic KPI values for the purpose of training predictive models and performing optimization.
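A brief worked example may help to fix the meaning of Equations (11)–(15). The Python sketch below assumes that print speed is given in A4/min and that downtime and waste are given in percent, as Equations (12) and (13) imply; the function name and the example values are illustrative.

```python
V_NOM = 180.0  # nominal speed in A4/min, as stated in the text


def oee_components(v, d, w):
    """Equations (11)-(15) for speed v [A4/min], downtime d [%], waste w [%]."""
    throughput = v * (1.0 - d / 100.0)           # Eq. (11)
    availability = 1.0 - d / 100.0               # Eq. (12)
    quality = 1.0 - w / 100.0                    # Eq. (13)
    performance = throughput / V_NOM             # Eq. (14)
    oee = availability * performance * quality   # Eq. (15)
    return {"T": throughput, "A": availability, "Q": quality,
            "P": performance, "OEE": oee}


example = oee_components(v=120.0, d=10.0, w=5.0)
print(example)
```

For v = 120 A4/min, d = 10%, and w = 5%, this gives T = 108 A4/min, A = 0.90, Q = 0.95, P = 0.60, and OEE ≈ 0.513.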

The selection and scaling of coefficients in the generative model are performed to preserve simulation stability and to ensure a clear differentiation of the relative influence of individual process variables. The coefficients are defined to produce comparable ranges of variation in the KPIs within the normalized variable space, thereby preventing the dominance of specific inputs due solely to differences in numerical magnitude. In this way, the generative model remains aligned with its primary role in this study, namely the generation of synthetic data suitable for multi-criteria analysis and the evaluation of optimization strategies.

To assess the robustness of the generative model, the weighting coefficients associated with toner coverage, print speed, and machine wear in Equations (7)–(10) were varied within ±20% of their nominal values, and model stability was evaluated based on the preservation of relative KPI scaling and the qualitative structure of the resulting optimization solution space. The results of this analysis indicate that variations in the coefficients primarily affect the range and relative scaling of individual KPIs, whereas the fundamental structure of interdependencies and the shape of trade-off solutions within the Pareto space remain preserved.

The definitions of the OEE components are adapted to the available KPIs in digital printing, with all quantities expressed in a dimensionless form to ensure consistent integration into the generative and optimization model. The use of symmetric matrices A_{w}, A_{E}, and A_{d} enables the representation of interdependencies among input variables, while the linear terms represent the primary and strongest effects of individual inputs on KPI values. This formulation is commonly used in response surface models and nonlinear regression systems in the field of industrial optimization.

These expressions represent the widely used industrial metrics for evaluating the efficiency of production systems and are required for defining constraints in the subsequent optimization phase.

The presented expressions constitute a parameterized generative model describing the relationship between process variables and KPIs, where the directions of influence and functional relationships are defined in accordance with general technical principles of production processes and with interaction patterns reported in the literature on digital twins and data-driven manufacturing [63,71,85], as well as with methodologies of multivariate modeling and simulation experiments [75,86]. The framework is not directly derived from existing sources but is adapted to the objectives of this study in order to enable controlled generation of synthetic data while preserving the physical and operational realism of the process.

For each continuous variable x with industry-defined bounds [x_{min}, x_{max}], a linear mapping is used:

x^{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}

(16)

This ensures the comparability of the influence of individual variables within the generative functions and the numerical stability of the procedures applied in later stages of model development. Discrete variables, specifically paper type, are modeled using indicator variables, while the machine wear index is considered directly on the interval [0, 1].

To generate the synthetic dataset, the Latin Hypercube Sampling (LHS) method is applied, as it enables efficient and uniform coverage of the multidimensional input space without requiring an exponentially large number of simulations [75]. LHS is performed over the defined ranges of all input variables listed in Table 1, resulting in a total of 10,000 generated samples.

To further assess model behavior at the boundaries of the process space, six edge validation scenarios are additionally generated, corresponding to typical extreme operating conditions in digital printing: low speed and low coverage, high speed and high coverage, elevated relative humidity, high paper grammage, low fuser temperature, and high fuser temperature. These scenarios are not included in the training dataset but are used exclusively to evaluate the stability of the generative model and to support subsequent validation of the optimization results.

LHS constructs N samples such that each variable uniformly and exhaustively covers its respective interval. The generation matrix U is defined as:

U_{i j} = \frac{π_{j} (i) - ξ_{i j}}{N}, ξ_{i j} \sim U (0,1)

(17)

where πⱼ represents the permutation of indices for the j-th variable. This formulation ensures that each subinterval is represented exactly once, resulting in significantly better coverage than simple random sampling.
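The construction in Equation (17) can be sketched with the Python standard library alone. The function names, sample size, and random seed below are illustrative assumptions and do not reproduce the configuration used in the study.

```python
import math
import random


def lhs_unit(n_samples, n_dims, rng):
    """Unit-cube Latin Hypercube sample following Equation (17).

    Returns n_samples rows, each holding n_dims values in (0, 1].
    """
    columns = []
    for _ in range(n_dims):
        perm = list(range(1, n_samples + 1))   # pi_j: a permutation of 1..N
        rng.shuffle(perm)                      # drawn independently per variable
        columns.append([(perm[i] - rng.random()) / n_samples
                        for i in range(n_samples)])
    # Transpose so that each row is one sampled scenario.
    return [[columns[j][i] for j in range(n_dims)] for i in range(n_samples)]


def strata_of(column, n_samples):
    """Sorted indices k of the subintervals ((k-1)/N, k/N] hit by the column."""
    return sorted(math.ceil(u * n_samples) for u in column)


rng = random.Random(0)   # fixed seed, chosen only for reproducibility
U = lhs_unit(10, 3, rng)
print(strata_of([row[0] for row in U], 10))
```

The printed list should contain each index from 1 to N once, which is the stratification property that distinguishes LHS from simple random sampling.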

The input values are transformed into the physical range of each variable according to:

x_{i j} = a_{j} + (b_{j} - a_{j}) U_{i j}, 1 \leq i \leq N, 1 \leq j \leq d

(18)

In this way, N technically consistent digital printing scenarios are obtained, covering the entire operational space defined by industry standards.

To prevent variables with different scales from dominating the predictive models, the input data are standardized as follows:

z_{i j} = \frac{x_{i j} - μ_{j}}{σ_{j}}

(19)

where μⱼ and σⱼ denote the arithmetic mean and standard deviation of the j-th variable across all simulations.
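Equations (18) and (19) can be illustrated together. In the Python sketch below, the temperature bounds are placeholders rather than the Table 1 values, and the population form of the standard deviation (division by N) is an assumption, because the text does not state which form is used.

```python
import math


def scale_to_range(u, a, b):
    """Equation (18): map a unit-interval value u onto the physical range [a, b]."""
    return a + (b - a) * u


def standardize(column):
    """Equation (19): z-scores using the column mean and population std."""
    n = len(column)
    mu = sum(column) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / n)
    if sigma == 0.0:
        raise ValueError("constant column cannot be standardized")
    return [(x - mu) / sigma for x in column]


# Placeholder bounds for fuser temperature in degrees C (assumption).
T_MIN, T_MAX = 150.0, 200.0
unit_values = [0.1, 0.4, 0.9]
temperatures = [scale_to_range(u, T_MIN, T_MAX) for u in unit_values]
z_scores = standardize(temperatures)
print(temperatures, z_scores)
```

After standardization each column has zero mean and unit variance, so variables measured on different physical scales enter the predictive models on a comparable footing.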

The standardized set of input variables together with the corresponding KPI values defines the dataset used for predictive modeling:

D = \{ (z_{i}, y_{i}) \}_{i = 1}^{N}

(20)

This approach provides full control over the data structure, ensures reproducibility of the generated scenarios, and enables the creation of a sufficiently rich set of examples required for training and validating machine learning algorithms in the subsequent phase of the proposed framework.

The defined edge scenarios are not used during the training phase of the predictive models; instead, they serve for subsequent validation of the stability of the generative model and for examining the behavior of optimized solutions under extreme, yet technically feasible, operating conditions.

The generated dataset is divided into three disjoint subsets: training, validation, and test sets, following a 70/15/15 split. The training set is used for learning the model parameters, the validation set for hyperparameter tuning and assessment of generalization performance, and the test set for the final evaluation of predictive accuracy.
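A 70/15/15 partition of the 10,000 samples can be sketched as follows. The shuffling procedure, the seed, and the function name are assumptions of this illustration; the text does not specify how the split was randomized.

```python
import random


def split_indices(n, fractions=(0.70, 0.15, 0.15), seed=42):
    """Shuffle indices 0..n-1 and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)      # seed is an arbitrary choice
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]          # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(10_000)
print(len(train_idx), len(val_idx), len(test_idx))
```

With N = 10,000 this yields 7000 training, 1500 validation, and 1500 test samples, and the three index sets are disjoint by construction.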

To ensure the physical and operational realism of the generated KPI values, lower and upper bounds are defined for all target metrics, as presented in Table 2. Because the predictive and optimization stages rely on surrogate models trained on synthetically generated data, the admissible range of KPI values must be explicitly constrained to technically plausible operating regimes. These constraints prevent unrealistic combinations of input variables within the generative model and stabilize the training of the predictive models.

3.3. Predictive Modeling and Optimization

Based on the generative model defined in this study, predictive models are developed to learn mappings between the input space and the target KPIs using a synthetically generated dataset. The objective of predictive modeling is to approximate the functional relationships between process variables and the associated KPIs embedded in the generative model of the digital twin framework, thereby enabling efficient multi-criteria optimization and sensitivity analysis.

The predictive layer of the framework uses the generated dataset to construct models that describe nonlinear relationships between process parameters and individual KPIs. The objective is to obtain stable and generalizable models that support reliable prediction, sensitivity analysis, and subsequent optimization of process settings. In this work, the Random Forest and XGBoost algorithms are employed due to their robustness, ability to capture interaction effects, and suitability for engineering systems characterized by pronounced nonlinear behavior.

For the set of standardized inputs \{z_{i}\} and corresponding outputs \{y_{i}\}, the objective of the predictive model is to identify an approximation function \hat{y} = M (z) that minimizes the estimation error. The Random Forest model is defined as an ensemble of T decision trees, where the prediction is given by the arithmetic mean of the individual trees:

\begin{matrix} {\hat{y}}_{i} = \frac{1}{T} \sum_{t = 1}^{T} h_{t} (z_{i}) \end{matrix}

(21)

where hₜ is the t-th tree trained on a bootstrap subsample of the data. This formulation enables the capture of nonlinearities and reduction in prediction variance, which is particularly important in simulations that involve noise and interactions among variables.

The XGBoost boosting model is constructed through iterative addition of weak learners that correct the errors of previous models. The estimate at the k-th iteration is given by:

\begin{matrix} {\hat{y}}_{i}^{(k)} = {\hat{y}}_{i}^{(k - 1)} + η f_{k} (z_{i}) \end{matrix}

(22)

where fₖ is a regression tree trained on the gradients of the loss function, and η is the learning rate. This procedure provides high model flexibility and enables adaptation to complex patterns in the data.
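The additive update of Equation (22) can be illustrated with a deliberately simplified Python sketch. It is not the XGBoost library: it uses one-split regression stumps on a single input, a squared-error loss (for which the negative gradient equals the residual), and no regularization term. All names and settings are illustrative assumptions.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump (one split) on a single input variable."""
    mean_all = sum(targets) / len(targets)
    best = None
    for threshold in sorted(set(xs))[:-1]:       # keeps both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:                             # all inputs identical
        return lambda x: mean_all
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, rounds=50, eta=0.1):
    """Additive model built with the update of Equation (22)."""
    base = sum(ys) / len(ys)                     # initial constant prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient
        f_k = fit_stump(xs, residuals)
        stumps.append(f_k)
        preds = [p + eta * f_k(x) for p, x in zip(preds, xs)]
    return lambda x: base + eta * sum(f(x) for f in stumps)


xs = [i / 19 for i in range(20)]
ys = [abs(x - 0.6) for x in xs]                  # V-shaped target
model = boost(xs, ys)
mse_base = sum((y - sum(ys) / len(ys)) ** 2 for y in ys) / len(ys)
mse_boost = sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(mse_base, mse_boost)
```

The V-shaped target mimics the temperature-deviation term of the generative model; the boosted ensemble reduces the training error relative to the constant baseline because each round fits the remaining residuals.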

The overall optimization objective minimized by XGBoost can be formalized as:

\begin{matrix} L^{(k)} = \sum_{i = 1}^{N} l (y_{i}, {\hat{y}}_{i}^{(k)}) + Ω (f_{k}) \end{matrix}

(23)

where l (\cdot, \cdot) is a differentiable loss function, and the regularization term Ω(·) controls the complexity of the tree in order to prevent overfitting. The regularizer is modeled as a combination of L1 and L2 penalties:

\begin{matrix} Ω (f_{k}) = α ∥ w ∥_{1} + \frac{1}{2} λ ∥ w ∥_{2}^{2} \end{matrix}

(24)

where w is the vector of leaf weights of the tree fₖ, and α and λ are the L1 and L2 regularization coefficients, respectively.

Model evaluation is performed using standard regression metrics. The mean absolute error (MAE) is defined as:

MAE = \frac{1}{N} \sum_{i = 1}^{N} | y_{i} - {\hat{y}}_{i} |

(25)

while the mean squared error (MSE) is defined as:

\begin{matrix} MSE = \frac{1}{N} \sum_{i = 1}^{N} (y_{i} - {\hat{y}}_{i})^{2} \end{matrix}

(26)

The coefficient of determination R² measures the proportion of variance explained by the model:

\begin{matrix} R^{2} = 1 - \frac{\sum_{i = 1}^{N} (y_{i} - {\hat{y}}_{i})^{2}}{\sum_{i = 1}^{N} (y_{i} - \bar{y})^{2}} \end{matrix}

(27)

Models that achieve stable R² values and low MAE/MSE values are selected for optimization.
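The three metrics of Equations (25)–(27) can be computed as in the following Python sketch. The function names and the small numerical example are illustrative and are not results of the study.

```python
def mae(y, y_hat):
    """Equation (25): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mse(y, y_hat):
    """Equation (26): mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    """Equation (27): coefficient of determination."""
    y_bar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
print(mae(y_true, y_pred), mse(y_true, y_pred), r2(y_true, y_pred))
```

In this example every prediction is off by 0.5, so MAE = 0.5 and MSE = 0.25; the total sum of squares around the mean is 20 and the residual sum of squares is 1, giving R² = 0.95.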

Multi-criteria optimization is based on the simultaneous minimization of waste, energy consumption, and downtime, subject to industry-standard technical constraints requiring that downtime does not exceed the allowable limit and that OEE remains above a predefined minimum target value. In this study, individual KPIs including material waste, energy consumption, downtime, defect rate, and throughput are treated as elementary process metrics that directly describe the environmental and operational aspects of digital printing. In contrast, overall equipment effectiveness (OEE) is not considered as an independent KPI of the same level, but rather as an aggregated operational indicator derived from selected process components. Within the proposed digital twin framework, the KPIs constitute the objective functions of the optimization, while OEE is introduced exclusively as a constraint that ensures the optimized solutions maintain an acceptable level of overall operational efficiency and system stability.

The optimization problem is defined as:

\begin{matrix} \underset{x}{\min} (w (x), E (x), d (x)) \end{matrix}

(28)

Subject to:

d (x) \leq d_{max}, OEE (x) \geq OEE_{min}

(29)

Multi-criteria optimization and Pareto front analysis represent a standard approach in the optimization of complex production systems with competing objectives [79].

Since the system is nonlinear and multi-objective, the solution is sought on the Pareto front, where no objective component can be improved without degrading at least one other. Formally, the solution x^{*} satisfies the condition:

\begin{matrix} ∄ x : (w (x), E (x), d (x)) ≺ (w (x^{*}), E (x^{*}), d (x^{*})) \end{matrix}

(30)

where the symbol ≺ denotes strict Pareto dominance.
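The feasibility constraints of Equation (29) and the dominance condition of Equation (30) can be combined into a simple filter over a finite set of candidate solutions. In the Python sketch below, dominance is interpreted in the usual sense (no worse in every objective and strictly better in at least one); the candidate values and the thresholds d_max and OEE_min are invented for illustration only.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def objectives(candidate):
    """Objective vector (w, E, d) of Equation (28)."""
    return (candidate["w"], candidate["E"], candidate["d"])


def pareto_front(candidates, d_max, oee_min):
    """Feasible candidates (Eq. 29) that no other feasible one dominates (Eq. 30)."""
    feasible = [c for c in candidates
                if c["d"] <= d_max and c["oee"] >= oee_min]
    return [c for c in feasible
            if not any(dominates(objectives(o), objectives(c)) for o in feasible)]


# Invented candidate KPI values (assumption, not results of the study).
candidates = [
    {"name": "A", "w": 3.0, "E": 30.0, "d": 10.0, "oee": 0.50},
    {"name": "B", "w": 4.0, "E": 28.0, "d": 12.0, "oee": 0.45},
    {"name": "C", "w": 5.0, "E": 31.0, "d": 15.0, "oee": 0.40},  # dominated by A
    {"name": "D", "w": 2.0, "E": 26.0, "d": 30.0, "oee": 0.30},  # infeasible
]
front = pareto_front(candidates, d_max=20.0, oee_min=0.35)
print([c["name"] for c in front])
```

Candidates A and B remain on the front because A is better in waste and downtime while B is better in energy; C is dominated by A, and D violates both constraints despite its low waste and energy values.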

To verify the local optimality and consistency of the optimized parameters, the Karush–Kuhn–Tucker (KKT) conditions are evaluated, ensuring that the solution satisfies the imposed technical constraints of the digital printing process:

\begin{matrix} \nabla_{x} L (x, μ) = 0 \end{matrix}

(31)

where L denotes the Lagrangian of the constrained problem and μ the vector of constraint multipliers. This condition ensures that the solution is stable with respect to local optimality and that the technical limits of the digital printing process are satisfied.

The obtained optimized parameters x^{*} are then transferred to the validation layer of the methodology, where a digital twin simulation is performed and the baseline and optimized process states are compared.

3.4. Simulation and Validation of the Digital Twin Framework

The digital twin in this study is defined as a reduced simulation model of the digital printing process that abstracts the key functional components of the production flow and their mutual interactions, without being tied to a specific commercial simulation environment.

The validation layer of the methodology establishes a link between the predictive models and the simulation-based representation of the digital printing process. The objective is to verify the technological feasibility of the optimized parameters x^{*} and to assess the discrepancy between the predictions produced by the machine learning model and the results obtained from the simulated system behavior. In this work, the digital twin is defined as a discrete-event simulation model that represents the operational flow of digital printing, including processing times, downtime intervals, and variability dependent on the input parameters.

The validation performed in this study refers to the internal consistency of the proposed simulation-prediction framework rather than to empirical validation against a real production system. The predictive models are trained on synthetically generated data, and their performance is evaluated in terms of their ability to reproduce the relationships defined by the generative model. While this approach supports verification of model consistency and stability, empirical validation using measurements from a real printing system is beyond the scope of this study.

The simulation framework is described by the operator S, which maps the input parameter vector to simulated KPI values:

\begin{matrix} y^{sim} = S (x) \end{matrix}

(32)

where the operator incorporates the event structure, stochastic processes governing downtime generation, and mechanisms for material flow processing. In this way, the simulation model is treated as a black box that returns the estimated system behavior under dynamic conditions for each input vector.

To ensure consistency between the simulation and the predictive models, the deviation between simulated and predicted KPI values for the optimized parameters $x^{*}$ is analyzed. The difference is defined using a projection operator:

$$\Delta y = P\left(y^{\mathrm{sim}}(x^{*}) - y^{\mathrm{pred}}(x^{*})\right) \tag{33}$$

where the operator $P$ may represent a selected metric, such as absolute differences, weighted differences with respect to technical priorities, or normalized values. This formulation enables assessment of the degree to which the results of the predictive models are consistent with the simulation-based representation of the process.
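The three choices of $P$ mentioned for Equation (33) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `deviation`, its `mode` argument, and all numerical values are assumptions introduced here.

```python
import numpy as np

def deviation(y_sim, y_pred, mode="absolute", weights=None, scale=None):
    """Return P(y_sim - y_pred) for one of three illustrative choices of P."""
    diff = np.asarray(y_sim, dtype=float) - np.asarray(y_pred, dtype=float)
    if mode == "absolute":
        # P = element-wise absolute difference
        return np.abs(diff)
    if mode == "weighted":
        # P = absolute difference scaled by technical priority weights
        if weights is None:
            raise ValueError("weights are required for mode='weighted'")
        return np.asarray(weights, dtype=float) * np.abs(diff)
    if mode == "normalized":
        # P = absolute difference divided by a reference scale per KPI
        if scale is None:
            raise ValueError("scale is required for mode='normalized'")
        return np.abs(diff) / np.asarray(scale, dtype=float)
    raise ValueError(f"unknown mode: {mode!r}")

# Hypothetical KPI vectors: waste (%), energy (kWh/1000 prints), downtime (%), OEE (%)
y_sim = np.array([4.6, 51.0, 14.0, 76.0])
y_pred = np.array([4.5, 52.0, 15.0, 78.0])

abs_dev = deviation(y_sim, y_pred)
norm_dev = deviation(y_sim, y_pred, mode="normalized", scale=[1.0, 10.0, 10.0, 100.0])
```

The normalized variant makes deviations comparable across KPIs with different units, which is useful when a single acceptance threshold is applied to all components.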

The comparison between the baseline and optimized scenarios is performed by evaluating the relative improvement for each KPI value. For the $j$-th KPI component, the relative improvement is defined as:

$$\delta_{j} = \frac{y_{j}^{\mathrm{base}} - y_{j}^{\mathrm{opt}}}{y_{j}^{\mathrm{base}}} \tag{34}$$

where $y_{j}^{\mathrm{base}}$ denotes the KPI value in the reference scenario, and $y_{j}^{\mathrm{opt}}$ the value simulated for the optimized parameters. A positive value of $\delta_{j}$ indicates a reduction in waste, energy consumption, or downtime, that is, an increase in efficiency. For a KPI that is to be maximized, such as OEE, the sign convention is reversed: under this definition an improvement corresponds to a negative $\delta_{j}$.
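Equation (34) translates directly into a short vectorized function. The sketch below is illustrative only; the function name and the example values are assumptions, not figures from the study.

```python
import numpy as np

def relative_improvement(y_base, y_opt):
    """Relative improvement (y_base - y_opt) / y_base for KPIs that are minimized."""
    y_base = np.asarray(y_base, dtype=float)
    y_opt = np.asarray(y_opt, dtype=float)
    if np.any(y_base == 0):
        raise ValueError("baseline KPI values must be non-zero")
    return (y_base - y_opt) / y_base

# Hypothetical baseline and optimized values for two minimized KPIs:
# waste (%) and downtime (%)
delta = relative_improvement([5.0, 20.0], [4.0, 18.0])
```

Here `delta` holds one value per KPI, so a 20% reduction in the first component and a 10% reduction in the second appear as 0.2 and 0.1.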

The stability of the optimized solution is verified through an additional simulation of $n$ perturbed scenarios around $x^{*}$. For the perturbation vector $\epsilon$, the expected simulated KPI response is defined as:

$$E\left[y^{\mathrm{sim}}(x^{*} + \epsilon)\right] = \int_{\mathbb{R}^{d}} S(x^{*} + \epsilon)\, p(\epsilon)\, d\epsilon \tag{35}$$

where $p(\epsilon)$ denotes the density of the perturbation noise. This expression provides an estimate of the robustness of the optimized parameters, which is critical under industrial conditions where input parameters cannot be maintained perfectly constant.
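One common way to evaluate the integral in Equation (35) is a Monte Carlo average over sampled perturbations. The sketch below assumes Gaussian perturbation noise and uses a hypothetical `toy_simulator` as a stand-in for the operator $S$; neither choice is taken from the study.

```python
import numpy as np

def toy_simulator(x):
    """Hypothetical stand-in for S: maps a 2-D input to (waste, energy)."""
    x = np.asarray(x, dtype=float)
    waste = 4.0 + 0.5 * (x[0] - 1.0) ** 2
    energy = 40.0 + 3.0 * x[1]
    return np.array([waste, energy])

def expected_kpis(simulator, x_star, sigma, n=1000, seed=0):
    """Monte Carlo estimate of E[S(x* + eps)], assuming eps ~ N(0, diag(sigma^2))."""
    rng = np.random.default_rng(seed)
    x_star = np.asarray(x_star, dtype=float)
    # Draw n perturbation vectors, one row per perturbed scenario
    eps = rng.normal(0.0, sigma, size=(n, x_star.size))
    samples = np.array([simulator(x_star + e) for e in eps])
    # Sample mean approximates the integral; sample std indicates spread
    return samples.mean(axis=0), samples.std(axis=0, ddof=1)

x_star = np.array([1.0, 2.0])
mean_kpis, std_kpis = expected_kpis(toy_simulator, x_star, sigma=np.array([0.1, 0.1]))
```

A small spread in `std_kpis` relative to `mean_kpis` would suggest that the configuration is not sensitive to small input fluctuations, which is the sense of robustness discussed above.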

Based on the assessment of differences (33), improvements (34), and robustness (35), the digital twin enables evaluation of whether the set of optimized parameters $x^{*}$ is technically acceptable, stable, and sustainable under the defined modeling assumptions. The validation layer thus completes the methodological architecture, confirming that the predictive and optimization models are internally consistent and operationally coherent.

3.5. Implementation of the Framework in the Python Environment

The methodology was implemented in Python 3.11 using standard scientific computing libraries, including NumPy, pandas, scikit-learn, and XGBoost, which served as the computational platform for executing the simulation, predictive modeling, and optimization procedures defined in the preceding sections. The implementation strictly follows the formal mathematical definitions of the models and does not introduce additional methodological assumptions.

The synthetic dataset was generated using the Latin Hypercube Sampling procedure over the ranges of process variables defined in Section 3.2. For each generated input vector, the values of material waste, energy consumption, downtime, throughput, and OEE were computed using the parameterized generative model. The resulting dataset was stored in a structured tabular format and used as input for the predictive and optimization stages.
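The sampling step can be illustrated with a NumPy-only Latin Hypercube sampler. This is a sketch under stated assumptions: the function name, the two example variables, and their ranges are hypothetical and do not reproduce the ranges of Section 3.2.

```python
import numpy as np

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each variable has one point per stratum."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)  # shape (d, 2): [low, high] per variable
    d = bounds.shape[0]
    # One uniformly placed point inside each of n equal-width strata of [0, 1)
    u = (np.arange(n_samples)[:, None] + rng.random((n_samples, d))) / n_samples
    # Shuffle the strata independently for every variable
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    low, high = bounds[:, 0], bounds[:, 1]
    return low + u * (high - low)

# Hypothetical ranges: print speed (pages/min) and toner coverage (%)
bounds = [(20.0, 100.0), (5.0, 60.0)]
X = latin_hypercube(50, bounds)
```

Each row of `X` is one input vector for the generative model. SciPy's `scipy.stats.qmc.LatinHypercube` offers an equivalent ready-made sampler if SciPy is available.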

Predictive modeling was performed using the Random Forest and XGBoost algorithms as described in Section 3.3. The implementation includes input standardization, partitioning of the dataset into training, validation, and test subsets, and training of separate models for individual KPIs. Model performance was evaluated using MAE, MSE, and the coefficient of determination R².
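The training and evaluation steps listed above can be sketched with scikit-learn for a single KPI. The toy data, the 70/15/15 split, and the hyperparameters are assumptions made for illustration; the study's actual settings are those of Section 3.3.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the synthetic dataset: three inputs, one nonlinear KPI
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(600, 3))
y = 5.0 + 2.0 * X[:, 0] ** 2 + X[:, 1] * X[:, 2] + rng.normal(0.0, 0.05, 600)

# Assumed 70/15/15 partition into training, validation, and test subsets
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)

# Fit the scaler on training data only to avoid information leakage
scaler = StandardScaler().fit(X_train)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(scaler.transform(X_train), y_train)

def evaluate(fitted_model, X_part, y_part):
    """Compute MAE, MSE, and R^2 on one data partition."""
    pred = fitted_model.predict(scaler.transform(X_part))
    return {
        "MAE": mean_absolute_error(y_part, pred),
        "MSE": mean_squared_error(y_part, pred),
        "R2": r2_score(y_part, pred),
    }

val_metrics = evaluate(model, X_val, y_val)
test_metrics = evaluate(model, X_test, y_test)
```

Repeating this procedure once per KPI yields the separate models described above. An `xgboost.XGBRegressor` can be substituted for the Random Forest, since it follows the same scikit-learn estimator interface.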

Based on the trained predictive models, a multi-criteria optimization procedure was carried out. The predictive models were used as surrogate approximators of nonlinear relationships between process variables and KPIs, enabling the identification of Pareto-optimal solutions that minimize material waste and energy consumption subject to constraints on downtime and minimum required OEE.
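A simple way to extract Pareto-optimal candidates from surrogate predictions is to filter for feasibility first and then remove dominated points. The sketch below assumes a finite set of candidate configurations with already predicted KPIs; the function names, thresholds, and values are hypothetical, and the study may use a different optimization algorithm.

```python
import numpy as np

def pareto_mask(objectives):
    """Boolean mask of non-dominated rows; every column is minimized."""
    objectives = np.asarray(objectives, dtype=float)
    mask = np.ones(objectives.shape[0], dtype=bool)
    for i in range(objectives.shape[0]):
        # Row j dominates row i if it is no worse everywhere and better somewhere
        no_worse = np.all(objectives <= objectives[i], axis=1)
        better = np.any(objectives < objectives[i], axis=1)
        if np.any(no_worse & better):
            mask[i] = False
    return mask

def feasible_pareto(waste, energy, downtime, oee, max_downtime, min_oee):
    """Indices of candidates that satisfy the constraints and are Pareto-efficient."""
    waste = np.asarray(waste, dtype=float)
    energy = np.asarray(energy, dtype=float)
    feasible = (np.asarray(downtime) <= max_downtime) & (np.asarray(oee) >= min_oee)
    idx = np.flatnonzero(feasible)
    front = pareto_mask(np.column_stack([waste[idx], energy[idx]]))
    return idx[front]

# Hypothetical surrogate predictions for five candidate configurations
waste = [4.0, 4.5, 5.0, 3.8, 4.2]          # %
energy = [50.0, 45.0, 44.0, 60.0, 52.0]    # kWh per 1000 prints
downtime = [15.0, 18.0, 25.0, 12.0, 16.0]  # %
oee = [60.0, 58.0, 70.0, 40.0, 62.0]       # %

front_idx = feasible_pareto(waste, energy, downtime, oee, max_downtime=20.0, min_oee=55.0)
```

In this toy case, candidates 2 and 3 violate a constraint and candidate 4 is dominated by candidate 0, so `front_idx` contains candidates 0 and 1. Applying the constraints before the dominance check keeps infeasible configurations from shaping the front.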

Validation of the optimized parameters was conducted using the reduced simulation model described in Section 3.4. For selected optimized configurations, simulated KPI values were compared with a baseline scenario to assess optimization effects and verify consistency between the predictive and simulation layers.

Visual analysis of the results included KPI distribution plots, Pareto fronts of the multi-criteria optimization, model-based feature importance analysis, and comparisons between baseline and optimized scenarios. These visualizations support interpretation of the results and illustrate the influence of individual process parameters on system sustainability and efficiency.

The implementation is organized in a modular manner, ensuring procedural transparency, reproducibility of results, and straightforward adaptability of the framework to alternative digital printing configurations or extension to include additional KPIs in future research.

4. Results and Discussion

The results obtained through the application of the proposed digital twin framework are analyzed as a consistency-based validation of the conceptual architecture established in the preceding sections. The underlying assumption of this study is that sustainability in digital graphic production cannot be reliably addressed at a normative or declarative level but must instead be technically operationalized through measurable, interrelated and optimizable key performance indicators. Within this context, the digital twin framework is conceived as an integrative layer that enables the quantitative coupling of process variables, predictive models and multi-criteria decision-making within a unified analytical structure.

The first group of results focuses on the validation of the predictive core of the digital twin framework, as prediction reliability represents a necessary prerequisite for any subsequent sustainability assessment. The relationship between simulated reference values and model-predicted KPI values, illustrated in Figure 2, demonstrates that the predictive models accurately approximate the structured nonlinear relationships defined in the generative formulation across the operational space. Notably, no significant degradation in predictive accuracy is detected across regions corresponding to more extreme operating regimes, demonstrating that the predictive models maintain stable performance characteristics even under parameter configurations associated with elevated process variability. This robustness is particularly relevant for the systematic evaluation of scenarios targeting waste reduction or energy consumption minimization.

Quantitatively, the residuals remain within approximately −400 to +600 units across the entire range of predicted values, with no statistically discernible trend of increasing error as process load increases. The symmetric distribution of residuals around zero indicates the absence of systematic model bias, while the absence of heteroskedasticity (i.e., homoskedastic residual variance) confirms the stability of the predictive models even under more extreme operating regimes.

To complement the graphical insight with quantitative evidence, the predictive performance of the models for individual KPIs is presented in Table 3. The table clearly shows that the predictive models maintain a consistent level of predictive quality across all six KPIs, without pronounced deviations that would suggest systematic bias toward specific aspects of the process. This result has a direct methodological implication: the digital twin can be used for relative comparison of scenarios within the optimization procedure, ensuring that ESG-relevant conclusions are based on actual process differences rather than numerical artifacts of the model structure.

The coefficient of determination (R²) values range from 0.831 to 0.905 for environmental and operational KPIs, while OEE achieves an R² value of 0.864, indicating a high level of explained variance across the full set of objectives. At the same time, MAE and RMSE values remain consistent with the natural scale of the individual metrics, confirming that the predictive models do not favor specific KPIs at the expense of others, but enable reliable relative comparison of optimization scenarios within the multi-criteria optimization procedure.

Prediction reliability is further supported by residual structure analysis, which shows no systematic increase in error with rising KPI values. This behavior directly reflects the methodological choices described in Section 3, particularly the use of a generative dataset that preserves the physical and operational realism of the digital printing process.

The next set of results focuses on analyzing the relationships between process variables and KPIs, with the aim of identifying structural constraints on sustainability. The analysis of the relative influence of process variables, shown in Figure 3, indicates that no single process configuration simultaneously optimizes all KPIs. Instead, individual parameters exhibit differentiated and often conflicting effects on environmental and operational metrics.

Quantitative analysis of the distributions shows that the optimization results in a reduction in material waste in the range of approximately 1–3%, while energy consumption is reduced by about 2–6% compared to the baseline scenario. At the same time, changes in machine downtime exhibit a wider range of variability, indicating the sensitivity of downtime to equipment condition and maintenance-related parameters.

The most pronounced and stable optimization effect within the defined model space is identified for OEE, with relative improvements in the range of approximately 20–40%. These changes confirm that trade-offs between environmental and operational objectives are most clearly manifested through overall equipment effectiveness within the simulated model. The reported relative changes in KPIs, including percentage changes in OEE, describe the behavior of the optimization model defined in this study and are directly conditioned by the selected weighting structure among variables. Therefore, these results are not interpreted as descriptions of the behavior of a real digital printing process, but rather as an illustration of the structure of trade-offs and the sensitivity of the optimization solution within the defined model space.

The increase in OEE should be interpreted as a relative improvement within the simulated operational space of the digital twin and should be regarded as a comparative indicator of process efficiency rather than an absolute measure of real-world industrial performance.

This structure of relationships has significant interpretative value. Parameters related to the thermal regime, toner coverage, and print speed primarily shape energy demand and waste generation, while variables associated with equipment condition and service characteristics strongly influence downtime and overall equipment effectiveness. Taken together, these observations motivate the application of multi-criteria optimization in order to explicitly formalize the trade-offs arising from such interdependent effects.

Input variables related to machine wear and maintenance intervals are modeled at an abstract level in this study but are conceptually aligned with data that are available in practice through process monitoring and maintenance systems. In a real production environment, such variables are typically estimated based on in-process monitoring, historical failure records, service interventions, and equipment performance indicators. It is important to note that the availability and quality of such data may introduce a certain bias in the evaluation of KPIs, particularly with respect to OEE and downtime, which further underscores the need to interpret the results within the defined model and process context.

For this reason, multi-criteria optimization is applied in this study, with results presented through the Pareto front in Figure 4. The Pareto front does not represent a set of best solutions in an absolute sense, but rather a formalization of the trade-offs arising from the technical and operational constraints of the system.

Analysis of the Pareto front structure shows that extreme solutions that minimize individual environmental KPIs lead to unacceptable deterioration of other performance aspects, such as throughput or process stability. The selected solution is not located at the edges of the front, but rather in its central region, thereby achieving a balanced compromise between environmental and operational objectives.

In order to assess the robustness of the obtained optimization results, a sensitivity analysis was conducted in which the weighted coefficients in the generative model were varied within a range of ±20% relative to their nominal values.

For each such set of coefficients, the optimization space and the corresponding Pareto front were regenerated. The results indicate that, although expected shifts in the position of the Pareto front occur, the fundamental structure of trade-offs between environmental and operational KPIs remains preserved (Figure 5). In this context, the sensitivity analysis indicates that the selected compromise solution does not represent a fragile optimum, but rather a stable configuration whose relative performance characteristics are retained under moderate variations in weighting priorities.

Quantitative robustness analysis of the selected solution shows that, under perturbations of input variables up to 8% of their range, the average OEE value decreases from approximately 78% to about 75%, while the baseline scenario remains stable at around 35%. This indicates that the selected configuration retains a significant relative operational advantage even under conditions of increased process uncertainty.

This type of selection directly reflects the engineering logic of sustainability, where the objective is long-term system stability rather than short-term optimization of individual indicators.

The comparison of the baseline and optimized scenarios, shown in Figure 6, further quantifies the effect of such a compromise. The normalized representation of KPIs enables simultaneous analysis of metrics with different scales and clearly demonstrates that optimization leads to a reduction in environmental burden without substantial degradation of key production performance.

Quantitatively, the selected optimized solution shows a reduction in waste from approximately 5.5% in the baseline scenario to about 4.5%, representing a relative reduction of approximately 18%. At the same time, energy consumption increases from around 40 to approximately 52 kWh per 1000 prints, quantifying the trade-off between environmental objectives. The position of the optimized solution within the central region of the Pareto front indicates that it is not an extreme optimum, but a balanced configuration that reduces waste at the cost of an acceptable increase in energy demand. This observation motivates a more comprehensive examination of the solution within the global optimization space and across multiple performance dimensions.
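As a worked check, Equation (34) can be applied to the rounded values quoted in this paragraph. Because those values are approximate, the results are indicative only; a negative value denotes an increase relative to the baseline.

```latex
\begin{aligned}
\delta_{\mathrm{waste}} &= \frac{5.5 - 4.5}{5.5} \approx 0.18 \\
\delta_{\mathrm{energy}} &= \frac{40 - 52}{40} = -0.30
\end{aligned}
```

On these figures, the selected solution trades a waste reduction of roughly 18% for an energy increase of roughly 30% per 1000 prints.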

However, an isolated comparison of individual KPIs does not provide a complete picture without integrated interpretation. For this reason, the results are further analyzed through a combined representation of the optimization space and KPI profiles, shown in Figure 7. A detailed view of the selected solution is presented in Figure 8. These representations enable simultaneous assessment of the position of the selected solution relative to other Pareto-optimal configurations and its behavior across all KPI dimensions.

The visualization of the optimization space shows that the selected solution lies within the feasible region defined by the downtime and overall equipment effectiveness constraints, achieving an OEE above approximately 55% with downtime below 20%, in contrast to the baseline scenario, which lies at the boundary of the defined constraint limits. The parallel visualization of KPI profiles further indicates that the selected solution does not push any individual metric to an extreme but instead achieves a balanced profile in which no KPI deviates significantly from the target range.

In this way, sustainability is no longer treated primarily as an abstract objective but becomes a concrete technical characteristic of the system that can be analyzed, compared, and optimized.

The final set of results, shown in Figure 9, is focused on verifying the structural realism of the optimized solutions. The heat map illustrating the relationships between process variables and KPIs shows that regions of improved performance coincide with the technically expected ranges of key process parameters. This indicates that optimization does not exploit marginal or unrealistic parameter combinations but is conducted within a process-acceptable operating space.

A more detailed analysis of the heat map shows that regions of improved KPI values are concentrated within moderate ranges of key process variables. The lowest levels of waste and energy consumption are achieved at medium toner coverage levels and stable thermal regimes, while extreme parameters, such as maximum print speeds or significantly elevated temperatures, are associated with deterioration in OEE and increased downtime. It is also evident that configurations with OEE values above approximately 55% and downtime below 20% are not located at the boundaries of the process space, but rather in its central region, indicating that the optimization tends to favor technically stable and long-term sustainable operating regimes.

This result has a methodologically important implication for the potential application of the proposed approach. The digital twin framework does not generate solutions that are mathematically optimal but technically infeasible; instead, it enables the identification of configurations that are simultaneously sustainable, stable, and operationally acceptable. This indicates that the digital twin, in combination with predictive models and multi-criteria optimization, represents a methodologically consistent analytical framework for integrating ESG objectives into engineering decision-making in digital graphic production.

The proposed digital twin framework is conceptually aligned with recent developments in explainable artificial intelligence (XAI), where interpretability is treated as a functional requirement for industrial deployment in production systems characterized by technical constraints and sustainability targets. In manufacturing contexts, model outputs are expected to remain traceable to physically meaningful process variables in order to support engineering validation and technically grounded decision-making. Although SHAP-based interpretability analysis is not explicitly implemented in this study, the structure of the predictive layer and the parametrically defined process space enable potential attribution of KPI variations to individual input variables. Consequently, the framework preserves a transparent linkage between statistical learning behavior and engineering process understanding within the defined modeling assumptions.

5. Applicability, Potentials, and Limitations of the Proposed Digital Twin Framework

A limitation of the proposed approach lies in the fact that the analysis is based on a synthetically generated dataset, which precludes direct quantitative validation at the level of an individual industrial facility. However, this limitation arises from the research objective of the study, which is focused on the methodological integration of a digital twin framework, predictive modeling, and multi-criteria optimization, rather than on calibrating the framework for a specific production installation.

The results presented in the previous section indicate that the digital twin framework, in combination with predictive modeling and multi-criteria optimization, provides a structured analytical basis for technical decision-making in the context of sustainable digital graphic production. Nevertheless, in order to avoid excessive generalization of the obtained results, it is necessary to consider the practical implications of the proposed approach as well as its structural limitations.

From the perspective of practical applicability, the key contribution of the proposed approach lies in its ability to support scenario-based decision-making without intervention in the actual production process. The digital twin framework enables the simulation and evaluation of alternative process configurations in situations where experimental testing would be technically infeasible, economically unjustified, or operationally risky. This capability is particularly relevant in the context of ESG requirements, where changes in process parameters often have multiple interdependent effects that cannot be reliably assessed intuitively or on the basis of partial data.

An additional potential of the proposed approach is reflected in its ability to operate with limited and fragmented datasets. Rather than relying on long-term industrial data collection, the digital twin framework employed in this study uses a generative model that preserves the physical and operational realism of the digital printing process. This opens up the possibility of applying similar methodological approaches in environments where complete historical data is not available, which is frequently the case in practice, especially in smaller or technologically heterogeneous production systems.

From an operational perspective, the proposed approach can provide analytical support for decision-making at the process management level, for example, in the selection of operating regimes, assessment of the impact of changes in the production portfolio or planning of preventive maintenance. The integration of multiple KPIs into a single optimization framework enables decisions to be made not on the basis of individual metrics, but by considering the overall pattern of system behavior, which is particularly important in production environments characterized by pronounced trade-offs between productivity, quality, and environmental requirements.

The limitations of this study arise from the fact that the proposed digital twin framework was developed and analyzed exclusively within a simulation environment, without direct validation on a real production machine or with real industrial process data. The generative model, predictive algorithms, and optimization results are based on synthetically generated data and predefined assumptions, which implies that the obtained numerical values do not represent empirically validated optimal settings of an actual production system. Consequently, the results cannot be interpreted as directly applicable industrial recommendations, but rather as analytical insight into the behavior of the modeled system and the relationships between process variables and KPIs. Despite these limitations, the proposed framework enables systematic exploration of scenarios under conditions of limited data availability and provides a foundation for future research. Although it does not implement agent-based or autonomous process control, its structure allows for calibration with real industrial measurements and subsequent extension toward adaptive, agent-based, or autonomous digital twins.

6. Conclusions

This paper addresses the problem of data fragmentation and the lack of operational tools for technical decision-making in the context of sustainable digital graphic production. Instead of approaching sustainability through general ESG guidelines or isolated performance indicators, a methodological framework is proposed in which a digital twin–based model is used as an integration platform to connect process variables, key performance indicators, and multi-criteria optimization-based decision-making.

The results show that a digital twin framework, grounded in predictive models and a synthetically generated dataset, can approximate complex relationships within the digital printing process and enable the analysis of scenarios that are not readily accessible in a real production environment. This supports a systematic investigation of trade-offs between environmental and operational objectives, where sustainability is not treated as an abstract requirement but as a technical problem that can be quantitatively analyzed and optimized.

The application of multi-criteria optimization within the digital twin framework was shown to be important for formalizing conflicts among KPIs. Rather than identifying a single optimal solution, a set of compromise configurations was obtained, reflecting the real constraints of the production system. Such an approach supports technically grounded and operationally informed decision-making, which is particularly important in the context of ESG requirements, where improvements in one aspect often have implications for others.

The scientific contribution of this work consists of showing how a digital twin framework can be used not only as a simulation tool but also as an active mechanism for integrated decision-making in production systems characterized by pronounced multi-criteria constraints. The engineering contribution is reflected in showing how ESG objectives can be operationalized through concrete KPIs and incorporated into the optimization process without compromising fundamental production performance.

The usefulness of the obtained results does not stem from their absolute numerical values, but from the ability of the proposed digital twin framework to explicitly represent the structure of trade-offs between environmental and operational KPIs. By modifying weights, constraints, or input variables, it is possible to generate alternative Pareto spaces that reflect different production priorities without altering the underlying architecture of the framework. In this way, the results of this study serve as an analytical basis for understanding system interdependencies and sensitivity, rather than as a prescriptive set of optimal settings for a real production process.

Although the proposed approach was developed and evaluated using synthetically generated data, the results indicate the potential applicability in real industrial environments, subject to appropriate calibration and adaptation to the context of a specific production system. The primary value of the proposed approach lies in the architecture of the digital twin framework and the integrated methodological structure, rather than in the specific numerical results obtained under simulated conditions. In a real production environment, the generative module can, in principle, be replaced by a physical data stream without altering the structure of the framework. This involves linking the input variables of the digital twin to data collected through sensors, process monitoring systems, maintenance systems, and production information systems, while the predictive models and the optimization procedure remain unchanged. In this way, the proposed framework supports a gradual transition from a simulation environment to industrial application, allowing the digital twin framework to evolve from an analytical tool toward an operational decision-support role for sustainable digital printing.

In this sense, the conclusions of this study pertain to the methodological value and analytical consistency of the proposed digital twin framework, while the obtained results and optimal values should be interpreted exclusively within the context of the defined model, rather than as empirically validated or directly applicable industrial recommendations.

Author Contributions

Conceptualization, D.B.; methodology, D.B., S.P.P., H.C. and B.P.; software, D.B.; validation, D.B., H.C. and B.P.; formal analysis, D.B., S.P.P., H.C. and B.P.; investigation, D.B. and S.P.P.; resources, D.B. and S.P.P.; data curation, D.B.; writing—original draft preparation, D.B. and S.P.P.; writing—review and editing, D.B., S.P.P., H.C. and B.P.; visualization, D.B.; supervision, D.B. and H.C.; project administration, D.B.; funding acquisition, D.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research and the APC were supported by the University of Zagreb through internal financial support for the project “Smart system for technical optimization of KPIs in sustainable graphic arts production”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting the findings of this study are generated synthetically within the proposed simulation framework. The Python 3.11 implementation used for data generation, model training, and optimization is available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Intergraf. Available online: https://www.intergraf.eu/communications/publications/item/441-joint-statement-digital-is-not-neutral (accessed on 29 June 2025).
Intergraf. Available online: https://www.intergraf.eu/communications/publications/item/331-intergraf-recommendations-on-co2-emissions-calculation-in-the-printing-industry (accessed on 29 June 2025).
Smithers. Available online: https://www.smithers.com/services/market-reports/printing/the-future-of-screen-vs-digital-printing-to-2030 (accessed on 29 June 2025).
Smithers. Available online: https://www.smithers.com/Services/market-reports/Printing/The-Future-of-Digital-Printing-to-2032 (accessed on 29 June 2025).
Intergraf. Available online: https://www.intergraf.eu/latest-news/item/517-intergraf-signs-the-joint-european-industry-manifesto-to-relaunch-competitiveness (accessed on 29 June 2025).
Smithers. Available online: https://www.smithers.com/services/market-reports/printing/digital-print-strategic-forecasts-to-2029 (accessed on 29 June 2025).
Smithers. Available online: https://www.smithers.com/services/market-reports/printing/the-future-of-digital-vs-offset-printing-to-2029 (accessed on 29 June 2025).
Othen, R.; Padberg, J.; Möbitz, C.; Gries, T. Digital Transformation in the Paper Industry: Assessing Maturity, Challenges, and Opportunities. Sustainability 2025, 17, 770.
Rodzvilla, J. Project Management for Book Publishers: The Programs and Workflows Behind Making Books and Digital Products, 1st ed.; Routledge: London, UK, 2024; pp. 42–68.
International Organization for Standardization. Guidelines for Using Print Production Standards; v2 January; ISO/TC 130 Graphic Technology; International Organization for Standardization: Geneva, Switzerland, 2024; Available online: https://committee.iso.org/files/live/sites/tc130/files/Resources/Guidelines%20for%20using%20print%20production%20standards%20v2%20Jan%202024.pdf (accessed on 29 June 2025).
Stora Enso. Forest Certification. Available online: https://www.storaenso.com/en/sustainability/forest/forest-certification (accessed on 29 June 2025).
International Organization for Standardization. Framework for ISO/TC 130 Standards; ISO/TC 130 Graphic Technology; International Organization for Standardization: Geneva, Switzerland, 2024; Available online: https://committee.iso.org/files/live/sites/tc130/files/Resources/Framework%20for%20standards%202024.pdf (accessed on 29 June 2025).
Petrov, S. Digital Twins and Sustainability: A Comprehensive Review of Limitations and Opportunities. Master’s Thesis, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden, June 2023.
Bhatia, M.; Kumar, R. Digital Twin and sustainability: A data-driven scientometric exploration. Internet Things 2025, 32, 101652.
ISO 14001:2015; Environmental Management Systems—Requirements with Guidance for Use. International Organization for Standardization: Geneva, Switzerland, 2015.
ISO 14040:2006; Environmental Management Systems—Life Cycle Assessment—Principles and Framework. International Organization for Standardization: Geneva, Switzerland, 2006.
ISO 14044:2006; Environmental Management Systems—Life Cycle Assessment—Requirements and Guidelines. International Organization for Standardization: Geneva, Switzerland, 2006.
ISO 14067:2018; Greenhouse Gases—Carbon Footprint of Products—Requirements and Guidelines for Quantification. International Organization for Standardization: Geneva, Switzerland, 2018.
ISO 50001:2018; Energy Management Systems—Requirements with Guidance for Use. International Organization for Standardization: Geneva, Switzerland, 2018.
ISO 16759:2013; Graphic technology—Quantification and communication for calculating the carbon footprint of print media products. International Organization for Standardization: Geneva, Switzerland, 2013.
Sustainability and Climate Change Strategy: Evaluation Framework. Department for Education, UK Government. Available online: https://www.gov.uk/government/publications/sustainability-and-climate-change-strategy-evaluation-framework (accessed on 29 June 2025).
Kluczek, A.; Gladysz, B.; Buczacki, A.; Krystosiak, K.; Ejsmont, K.; Palmer, E. Aligning sustainable development goals with Industry 4.0 for the design of business model for printing and packaging companies. Packag. Technol. Sci. 2023, 36, 307–325. [Google Scholar] [CrossRef]
Baxter, R. Operational Excellence Handbook: A Must Have for Those Embarking on a Journey of Transformation and Continuous Improvement, 1st ed.; Lulu Press: Morrisville, NC, USA, 2015; pp. 12–13. [Google Scholar]
Rosen, R.; von Wichert, G.; Lo, G.; Bettenhausen, K.D. About The Importance of Autonomy and Digital Twins for the Future of Manufacturing. IFAC-PapersOnLine 2015, 48, 567–572. [Google Scholar] [CrossRef]
Kusiak, A. Smart manufacturing. Int. J. Prod. Res. 2018, 56, 508–517. [Google Scholar] [CrossRef]
Tao, F.; Zhang, H.; Liu, A.; Nee, A.Y.C. Digital Twin in Industry: State-of-the-Art. J. Manuf. Syst. 2019, 58, 213–227. [Google Scholar] [CrossRef]
Kariniemi, M.; Nors, M.; Kujanpää, M.; Pajula, T.; Pihkola, H. Evaluating Environmental Sustainability of Digital Printing. Digit. Print. Technol. Digit. Fabr. 2010, 26, 92–96. [Google Scholar] [CrossRef]
Glogar, M.; Petrak, S.; Mahnić Naglić, M. Digital Technologies in the Sustainable Design and Development of Textiles and Clothing—A Literature Review. Sustainability 2025, 17, 1371. [Google Scholar] [CrossRef]
Azizul Hoque, S.M.; Parrillo Chapman, L.; Moore, M.; Lavelle, J.; Saloni, D.; Woodbridge, J.; Maguire King, K. Environmental Sustainability Analysis of Rotary-Screen Printing and Digital Textile Printing. AATCC J. Res. 2024, 11, 464–473. [Google Scholar] [CrossRef]
Rahman, M. Print-on-demand fashion models for reducing overproduction and environmental waste in the fashion industry. Int. J. Innov. Technol. Soc. Sci. 2024, 8, 1–10. [Google Scholar] [CrossRef]
Tao, F.; Zhang, H.; Qi, Q.; Xu, J.; Sun, Z.; Hu, T.; Liu, X.; Liu, T.; Guan, J.; Chen, C.; et al. Theory of digital twin modeling and its application. Comput. Integr. Manuf. Syst. CIMS 2021, 27, 1–15. [Google Scholar] [CrossRef]
Tao, F.; Zhang, M.; Nee, A.Y.C. Digital Twin Driven Smart Manufacturing, 1st ed.; Academic Press: Boston, MA, USA, 2019; pp. 7–54. [Google Scholar] [CrossRef]
Kusiak, A. Predictive models in digital manufacturing: Research, applications, and future outlook. Int. J. Prod. Res. 2023, 61, 6052–6062. [Google Scholar] [CrossRef]
Sheppard, C. Tree-Based Machine Learning Algorithms: Decision Trees, Random Forest and Boosting, 2nd ed.; CreateSpace Independent Publishing Platform: Austin, TX, USA, 2019; pp. 75–79. [Google Scholar]
Jamwal, A.; Agrawal, R.; Sharma, M. Deep learning for manufacturing sustainability: Models, applications in Industry 4.0 and implications. Int. J. Inf. Manag. Data Insights 2022, 2, 100107. [Google Scholar] [CrossRef]
Chen, C.; Li, X.; Wang, K. Applying XGBoost for Fault Prediction in Industrial Production Line. J. Intell. Knowl. Eng. 2024, 2, 154–161. [Google Scholar] [CrossRef]
Kritzinger, W.; Karner, M.; Traar, G.; Henjes, J.; Sihn, W. Digital Twin in manufacturing: A categorical literature review and classification. IFAC-PapersOnLine 2018, 51, 1016–1022. [Google Scholar] [CrossRef]
Zhang, W.; Wu, C.; Zhong, H.; Li, Y.; Wang, L. Prediction of undrained shear strength using extreme gradient boosting and random forest based on Bayesian optimization. Geosci. Front. 2021, 12, 469–477. [Google Scholar] [CrossRef]
Chen, Y.; Li, F.; Zhou, S.; Zhang, X.; Zhang, S.; Zhang, Q.; Su, Y. Bayesian optimization based random forest and extreme gradient boosting for the pavement density prediction in GPR detection. Constr. Build. Mater. 2023, 387, 131564. [Google Scholar] [CrossRef]
Dwivedi, D.; Batra, S.; Pathak, Y.K. A machine learning based approach to identify key drivers for improving corporate’s ESG ratings. J. Law Sustain. Dev. 2023, 11, e0242. [Google Scholar] [CrossRef]
Tursunalieva, A.; Alexander, D.L.J.; Dunne, R.; Li, J.; Riera, L.; Zhao, Y. Making Sense of Machine Learning: A Review of Interpretation Techniques and Their Applications. Appl. Sci. 2024, 14, 496. [Google Scholar] [CrossRef]
Dobra, P.; Jósvai, J. Cumulative and Rolling Horizon Prediction of Overall Equipment Effectiveness (OEE) with Machine Learning. Big Data Cogn. Comput. 2023, 7, 138. [Google Scholar] [CrossRef]
Dobra, P.; Jósvai, J. Prediction of Overall Equipment Effectiveness in assembly processes using machine learning. J. Mech. Eng. 2024, 74, 57–64. [Google Scholar] [CrossRef]
Ördek, B.; Borrgiani, Y.; Coatanea, E. Machine learning supported manufacturing: A review and research agenda. Circ. Econ. Sustain. 2024, 12, 2326526. [Google Scholar] [CrossRef]
Rahman, M.A.; Shahrior, M.F.; Iqbal, K.; Abushaiba, A.A. Enabling Intelligent Industrial Automation: A Review of Machine Learning Applications with Digital Twin and Edge AI Integration. Automation 2025, 6, 37. [Google Scholar] [CrossRef]
Kausik, A.K.; Rashid, A.B.; Baki, R.F.; Maktum, M.F.J. Machine learning algorithms for manufacturing quality assurance: A systematic review of performance metrics and applications. Array 2025, 26, 100393. [Google Scholar] [CrossRef]
Karjust, K.; Mehrparvar, M.; Kaganski, S.; Raamets, T. Development of a Sustainability-Oriented KPI Selection Model for Manufacturing Processes. Sustainability 2025, 17, 6374. [Google Scholar] [CrossRef]
Mejía-Moncayo, C.; Chaabane, A.; Kenné, J.-P.; Hof, L.A. Key performance indicators for sustainable remanufacturing: A literature review and methodological framework. Clean. Logist. Supply Chain 2025, 17, 100260. [Google Scholar] [CrossRef]
Zuher, H.A. Impact Quality Indicators onto Sustainable Manufacturing. J. Sustain. Dev. Innov. 2024, 1, 137–141. [Google Scholar] [CrossRef]
Kesireddy, A.; Medrano, F.A. Elite Multi-Criteria Decision Making–Pareto Front Optimization in Multi-Objective Optimization. Algorithms 2024, 17, 206. [Google Scholar] [CrossRef]
Chen, C.; Wang, Y.; Lu, S.; Li, X. Design and Multiobjective Optimization of Green Closed-Loop Manufacturing-Recycling Network Considering Raw Material Attribute. Processes 2022, 10, 904. [Google Scholar] [CrossRef]
Vadenbo, C.; Hellweg, S.; Guillén-Gosálbez, G. Multi-objective optimization of waste and resource management in industrial networks—Part I: Model description. Resour. Conserv. Recycl. 2014, 89, 52–63. [Google Scholar] [CrossRef]
Schluse, M.; Priggemeyer, M.; Atorf, L.; Rossmann, J. Experimentable digital twins–Streamlining simulation-based systems engineering. IEEE Trans. Ind. Inform. 2018, 14, 1722–1731. [Google Scholar] [CrossRef]
Lu, Y.; Liu, C.; Wang, K.I.-K.; Huang, H.; Xu, X. Digital Twin-driven smart manufacturing: Connotation, reference model, applications and research issues. Robot. Comput.-Integr. Manuf. 2020, 61, 101837. [Google Scholar] [CrossRef]
Sajadieh, S.M.M.; Noh, S.D. A Review of Digital Twin Integration in Circular Manufacturing for Sustainable Industry Transition. Sustainability 2025, 17, 7316. [Google Scholar] [CrossRef]
Nozari, H.; Yordanova, Z. A fuzzy–digital twin optimization framework for simultaneous management of waste and energy consumption in sustainable manufacturing. Int. J. Optim. Control. Theor. Appl. 2025, 1, 025360152. [Google Scholar] [CrossRef]
Kamble, S.S.; Gunasekaran, A.; Parekh, H.; Mani, V.; Belhadi, A.; Sharma, R. Digital twin for sustainable manufacturing supply chains: Current trends, future perspectives, and an implementation framework. Technol. Forecast. Soc. Change 2022, 176, 121448. [Google Scholar] [CrossRef]
Stavropoulos, P.; Mourtzis, D. Digital twins in industry 4.0. In Design and Operation of Production Networks for Mass Personalization in the Era of Cloud Technology, 1st ed.; Mourtzis, D., Ed.; Elsevier: Amsterdam, The Netherlands, 2022; Volume 1, pp. 277–316. [Google Scholar]
Stavropoulos, P.; Papacharalampopoulos, A.; Siatras, V.; Mourtzis, D. An AR based Digital Twin for Laser based manufacturing process monitoring. In Proceedings of the 18th CIRP Conference on Modeling of Machining Operations, Ljubljana, Slovenia, 15–17 June 2021. [Google Scholar] [CrossRef]
He, B.; Bai, K.J. Digital twin-based sustainable intelligent manufacturing: A review. Adv. Manuf. 2021, 9, 1–21. [Google Scholar] [CrossRef]
Timperi, M.; Kokkonen, K.; Hannola, L. Digital twins for environmentally sustainable and circular manufacturing sector: Visions from industry professionals. Prod. Manuf. Res. 2024, 12, 2428249. [Google Scholar] [CrossRef]
Gbededo, M.; Liyanage, K. Advancing sustainable manufacturing through digital twin based simulation. In Proceedings of the MATEC Web of Conferences, ICMR 2024, Glasgow, Scotland, 28–30 August 2024. [Google Scholar] [CrossRef]
Jawad, M.S.; Chandran, P.; Ramli, A.A.B.; Mahdin, H.B.; Abdullah, Z.B.; Rejab, M.B.M. Adoption of digital twin for sustainable manufacturing and achievements of production strategic-planned goals. MethodsX 2022, 9, 101920. [Google Scholar] [CrossRef]
Çınar, Z.M.; Abdussalam Nuhu, A.; Zeeshan, Q.; Korhan, O.; Asmael, M.; Safaei, B. Machine Learning in Predictive Maintenance towards Sustainable Smart Manufacturing in Industry 4.0. Sustainability 2020, 12, 8211. [Google Scholar] [CrossRef]
Rajesh, A.S.; Prabhuswamy, M.S.; Krishnasamy, S. Smart Manufacturing through Machine Learning: A Review, Perspective, and Future Directions to the Machining Industry. J. Eng. 2022, 1, 9735862. [Google Scholar] [CrossRef]
Rosemeyer, J.; Pinzone, M.; Metternich, J. Digital Assistance Systems to Implement Machine Learning in Manufacturing: A Systematic Review. Mach. Learn. Knowl. Extr. 2024, 6, 2808–2828. [Google Scholar] [CrossRef]
Hristov, I.; Chirico, A. The Role of Sustainability Key Performance Indicators (KPIs) in Implementing Sustainable Strategies. Sustainability 2019, 11, 5742. [Google Scholar] [CrossRef]
Aljamal, D.; Salem, A.; Khanna, N.; Hegab, H. Towards sustainable manufacturing: A comprehensive analysis of circular economy key performance indicators in the manufacturing industry. Sustain. Mater. Technol. 2024, 40, e00953. [Google Scholar] [CrossRef]
Zhou, G.; Zhang, C.; Zi, L.; Ding, K.; Wang, C. Knowledge-driven digital twin manufacturing cell towards intelligent manufacturing. Int. J. Prod. Res. 2019, 59, 1034–1051. [Google Scholar] [CrossRef]
Ponnusamy, V.; Ekambaram, D.; Zdravkovic, N. Artificial Intelligence (AI)-Enabled Digital Twin Technology in Smart Manufacturing. In Industry 4.0, Smart Manufacturing, and Industrial Engineering; Tyagi, A.K., Tiwari, S., Ahmad, S.S., Eds.; CRC Press: Boca Raton, FL, USA, 2024; Volume 1, pp. 248–270. [Google Scholar]
Leng, J.; Wang, D.; Shen, W.; Li, X.; Liu, Q.; Chen, X. Digital twins-based smart manufacturing system design in Industry 4.0: A review. J. Manuf. Syst. 2021, 60, 119–137. [Google Scholar] [CrossRef]
Fuller, A.; Fan, Z.; Day, C.; Barlow, C. Digital Twin: Enabling technologies, challenges and open research. IEEE Access 2020, 8, 108952–108971. [Google Scholar] [CrossRef]
Villegas, L.F.; Macchi, M.; Polenghi, A. Digital twins in manufacturing: A unified conceptual framework. Annu. Rev. Control 2025, 60, 101031. [Google Scholar] [CrossRef]
Pentti, V. Environmental Sustainability in the Finnish Printing and Publishing Industry. Licentiate Thesis, EVTEK University of Applied Sciences, Espoo, Finland, 2008. [Google Scholar]
Kleijnen, J.P.C. Design and Analysis of Simulation Experiments, 1st ed.; Springer: New York, NY, USA, 2008; pp. 179–203. [Google Scholar]
Bertocci, F.; Grandoni, A.; Fidanza, M.; Berni, R. A Guideline for Implementing a Robust Optimization of a Complex Multi-Stage Manufacturing Process. Appl. Sci. 2021, 11, 1418. [Google Scholar] [CrossRef]
Sun, L.; Ji, Y.; Zhu, X.; Peng, T. Process knowledge-based random forest regression for model predictive control on a nonlinear production process with multiple working conditions. Adv. Eng. Inform. 2022, 52, 101561. [Google Scholar] [CrossRef]
Ahmed, I.; Raihan, A.S. Machine learning techniques for sustainable industrial process control. In Computational Intelligence Techniques for Sustainable Supply Chain Management, 1st ed.; Sanjoy, K.P., Sandeep, K., Eds.; Elsevier: Amsterdam, The Netherlands, 2024; Volume 1, pp. 141–176. [Google Scholar]
Hewlett-Packard Indigo. Available online: https://support.hp.com/in-en/product/setup-user-guides/hp-indigo-7000-digital-press-series/3722856 (accessed on 13 June 2025).
Canon Europe. Available online: https://www.canon-europe.com/support/business/products/controllers/imagepress/imagepress-c9010vp.html (accessed on 13 June 2025).
Konica Minolta Europe. Available online: https://manuals.konicaminolta.eu/konicaminolta/ (accessed on 13 June 2025).
Majnarić, I. Fundametals of Digital Printing, 1st ed.; University of Zagreb, Faculty of Graphic Arts: Zagreb, Croatia, 2015; pp. 137–189. [Google Scholar]
Spleth, P.; Korbel, J.J.; Zarnekow, R. Sustainability Effects of Digital Twins: A Review. In Proceedings of the Pacific Asia Conference on Information Systems 2024, Ho Chi Minh City, Vietnam, 1–5 July 2024. [Google Scholar]
Mouflih, C.; Gaha, R.; Durupt, A.; Bosch-Mauchand, M.; Martinsen, K.; Eynard, B. Decision Support Framework using Knowledge Based Digital Twin for Sustainable Product Development and End of Life. Proc. Des. Soc. 2023, 3, 1157–1166. [Google Scholar] [CrossRef]
Franciosi, C.; Miranda, S.; Veneroso, C.R.; Riemma, S. Improving industrial sustainability by the use of digital twin models in maintenance and production activities. IFAC-PapersOnLine 2022, 55, 37–42. [Google Scholar] [CrossRef]
Hair, J.F.; Black, W.C.; Babin, B.J.; Anderson, R.E. Multivariate Data Analysis: A Global Perspective, 7th ed.; Pearson: Boston, MA, USA, 2010; pp. 76–77. [Google Scholar]

Figure 1. Conceptual structure of the proposed hybrid digital twin-based simulation and optimization framework.

Figure 2. Relationship between simulated reference KPI values and machine-learning-predicted values within the digital twin model. The red marker in the plots denotes the reference point corresponding to the ideal agreement between simulated and predicted values.

Figure 3. Relative influence of process variables on selected KPIs within the digital twin framework.

Figure 4. Pareto front of non-dominated optimization solutions for selected ESG-relevant KPIs.

Figure 5. Sensitivity analysis of the Pareto front with respect to weighting coefficients.

Figure 6. Normalized comparison of KPIs for the baseline and selected optimized scenarios.

Figure 7. Distribution of Pareto-optimal solutions in the objective space under defined operational constraints.

Figure 8. Integrated visualization of KPI profiles for the baseline and selected optimized solution.

Figure 9. Heatmap of the relationships between process variables and KPIs within the optimized solution.
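
The Pareto fronts referred to in Figures 4 and 7 consist of non-dominated solutions that minimize material waste and energy consumption while respecting limits on downtime and OEE. The following is a minimal sketch of that filtering step, not the authors' implementation: the candidate values, the thresholds `MAX_DOWNTIME` and `MIN_OEE`, and the function name `pareto_mask` are all assumptions introduced for illustration.

```python
import numpy as np


def pareto_mask(objectives):
    """Boolean mask of non-dominated rows; every column is minimized."""
    obj = np.asarray(objectives, dtype=float)
    mask = np.ones(len(obj), dtype=bool)
    for i in range(len(obj)):
        # Another row dominates row i if it is no worse in every objective
        # and strictly better in at least one.
        no_worse = np.all(obj <= obj[i], axis=1)
        better = np.any(obj < obj[i], axis=1)
        if np.any(no_worse & better):
            mask[i] = False
    return mask


# Hypothetical candidates; columns are waste (%), energy (kWh/1000 prints),
# downtime (%), OEE (%). Illustrative values only, not results from the paper.
candidates = np.array([
    [4.0, 30.0, 10.0, 80.0],
    [5.0, 28.0, 12.0, 78.0],
    [6.0, 35.0,  9.0, 82.0],   # worse than the first row in waste and energy
    [3.5, 27.0, 30.0, 55.0],   # violates the assumed constraints below
])
MAX_DOWNTIME, MIN_OEE = 15.0, 70.0   # assumed thresholds

# Apply the constraints first, then keep the non-dominated rows
# with respect to the two minimized objectives (waste, energy).
feasible = candidates[(candidates[:, 2] <= MAX_DOWNTIME) & (candidates[:, 3] >= MIN_OEE)]
front = feasible[pareto_mask(feasible[:, :2])]
print(front)
```

Filtering on feasibility before the dominance check keeps an infeasible configuration from pushing feasible ones off the front. The pairwise comparison is quadratic in the number of candidates, which is adequate for a sketch but not for large populations.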

Table 1. Digital printing input variables and operational ranges.

Variable Name	Symbol	Unit	Range	Description
Print speed	v	A4/min	40–180	Typical range of production digital printers; speeds below 40 A4/min correspond to the office segment, while speeds above 180 A4/min belong to the high-performance class.
Toner coverage	c	%	5–85	Range spanning from text-based jobs to high-density graphic applications; values above 85% increase the risk of smearing and fusing defects.
Fuser temperature	t	°C	150–200	Range within which manufacturers specify stable toner fusing in electrophotographic processes; lower values result in insufficient fusing, while higher values may cause thermal damage to the substrate.
Relative humidity	h	%	25–65	Typical operating conditions; low humidity increases electrostatic issues, whereas high humidity leads to paper curling and registration errors.
Paper type	p	-	0, 1, 2	Encoding: 0 = uncoated paper, 1 = coated paper, 2 = special printing substrates with increased variability.
Paper grammage	g	g/m²	70–300	Range from lightweight office papers to commercial-grade cardboard; grammages outside this range require specialized paper transport paths.
Job size	n	prints	200–50,000	Covers personalized runs, prototype batches, and medium-scale commercial jobs; larger runs fall within the domain of conventional printing.
Machine wear index	s	-	0–1	0 = new or freshly serviced condition; 1 = state immediately prior to servicing; intermediate values represent continuous performance degradation.
Last service interval	r	prints	0–30,000	Number of prints since the last scheduled service; higher values increase the risk of downtime and quality deterioration.
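
As a rough illustration of how input vectors could be drawn from the ranges in Table 1, the sketch below samples each variable independently. Uniform, independent sampling is an assumption made here for simplicity; the section does not state which distributions or correlations the synthetic dataset uses. The names `sample_inputs`, `CONTINUOUS_RANGES`, and `INTEGER_RANGES` are hypothetical.

```python
import numpy as np

# Operating ranges copied from Table 1, keyed by symbol.
CONTINUOUS_RANGES = {
    "v": (40.0, 180.0),    # print speed, A4/min
    "c": (5.0, 85.0),      # toner coverage, %
    "t": (150.0, 200.0),   # fuser temperature, deg C
    "h": (25.0, 65.0),     # relative humidity, %
    "g": (70.0, 300.0),    # paper grammage, g/m^2
    "s": (0.0, 1.0),       # machine wear index
}
INTEGER_RANGES = {
    "n": (200, 50_000),    # job size, prints
    "r": (0, 30_000),      # prints since last service
}
PAPER_TYPES = (0, 1, 2)    # 0 = uncoated, 1 = coated, 2 = special substrate


def sample_inputs(n_samples, seed=0):
    """Draw n_samples input vectors; uniform sampling is an assumption."""
    rng = np.random.default_rng(seed)
    data = {
        name: rng.uniform(lo, hi, n_samples)
        for name, (lo, hi) in CONTINUOUS_RANGES.items()
    }
    for name, (lo, hi) in INTEGER_RANGES.items():
        # Generator.integers excludes the upper bound, hence hi + 1.
        data[name] = rng.integers(lo, hi + 1, n_samples)
    data["p"] = rng.choice(PAPER_TYPES, size=n_samples)
    return data


samples = sample_inputs(1000)
print({name: (values.min(), values.max()) for name, values in samples.items()})
```

A fixed seed keeps the generated dataset reproducible across runs.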

Table 2. Physical and operational limits of KPIs used for validation of the generated data.

KPI	Symbol	Unit	Lower Bound	Upper Bound	Description
Material waste	w	%	0	20	Values above 20% are considered unacceptable in production digital printing.
Energy consumption	E	kWh/1000 prints	10	80	The range covers low to high consumption across different print speeds, coverage levels, and paper types.
Defect rate	D	ppm	0	3000	Values exceeding several thousand ppm indicate severe process instability.
Downtime	d	%	0	40	Downtime above 40% in practice indicates an unsustainable system condition.
Throughput	T	prints/hour	0	-	The upper limit depends on print speed and system configuration; negative values are not permitted.
Overall Equipment Effectiveness (OEE)	OEE	%	0	100	Standard definition of overall equipment effectiveness.
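
The bounds in Table 2 can be applied as a simple validity check on generated records. The sketch below is one possible way to do that, with hypothetical names (`KPI_BOUNDS`, `violations`, `oee_percent`). It also includes the standard OEE definition as the product of availability, performance, and quality; how the paper derives these three factors from its process variables is not shown in this section, so the helper takes them directly as inputs.

```python
import math

# Bounds from Table 2; None marks the open upper limit for throughput.
KPI_BOUNDS = {
    "w": (0.0, 20.0),      # material waste, %
    "E": (10.0, 80.0),     # energy consumption, kWh/1000 prints
    "D": (0.0, 3000.0),    # defect rate, ppm
    "d": (0.0, 40.0),      # downtime, %
    "T": (0.0, None),      # throughput, prints/hour
    "OEE": (0.0, 100.0),   # overall equipment effectiveness, %
}


def violations(record):
    """Return the KPI symbols in `record` that fall outside the Table 2 bounds."""
    bad = []
    for name, (lo, hi) in KPI_BOUNDS.items():
        value = record[name]
        if math.isnan(value) or value < lo or (hi is not None and value > hi):
            bad.append(name)
    return bad


def oee_percent(availability, performance, quality):
    """Standard OEE in percent; each factor is a fraction in [0, 1]."""
    return 100.0 * availability * performance * quality


# Illustrative record only: the waste value exceeds the 20% upper bound.
record = {"w": 25.0, "E": 35.0, "D": 400.0, "d": 8.0, "T": 5200.0, "OEE": 82.0}
print(violations(record))            # -> ['w']
print(oee_percent(0.90, 0.95, 0.98))
```

Records with a non-empty violation list could be rejected or regenerated; which of the two the authors do is not stated here.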

Table 3. Predictive performance metrics for individual KPIs.

Target	MAE	RMSE	R²
Material waste (%)	0.988	1.245	0.831
Energy consumption (kWh/1000 prints)	2.597	3.229	0.905
Defect rate (ppm)	161.303	195.820	0.857
Downtime (%)	1.527	2.117	0.883
Throughput (prints/hour)	4.121	5.181	1.000
OEE (%)	2.890	3.691	0.864
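
For reference, the three metrics reported in Table 3 follow their usual definitions. The sketch below computes them with NumPy on a small made-up example; the function name `regression_metrics` and the sample values are illustrative and do not reproduce the paper's data.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for one target variable."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)                    # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Made-up values: three exact predictions and one that is off by 2.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]))
```

MAE and RMSE carry the unit of the target, so they are only comparable within one row of the table, whereas R² is dimensionless and can be compared across KPIs.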


