1. Introduction
Modern industrial systems must balance high reliability, efficiency, and flexibility with the continuous pressure to minimize operational and maintenance costs. In these environments, maintenance management is a pivotal determinant of production continuity and aggregate performance. Suboptimal strategies typically lead to a costly dichotomy: either catastrophic unexpected failures and downtime, or excessive preventive interventions, both of which incur substantial economic penalties [1].
The paradigm shift toward Industry 4.0, driven by the Internet of Things (IoT), cyber-physical systems, and big data analytics, has substantially expanded Predictive Maintenance (PdM) capabilities. These technologies facilitate high-fidelity, continuous condition monitoring and provide the empirical foundation for sophisticated decision-support systems [2,3,4].
Despite notable progress in PdM and degradation modeling, several limitations persist in the extant literature:
Decoupling of Technical and Operational Metrics: Research frequently addresses isolated failure prediction or degradation modeling without establishing an explicit link to production performance indicators. This stems from the inherent complexity of mapping technical asset health to macroscopic production outcomes [5,6,7].
Underutilization of OEE in Optimization: Although Overall Equipment Effectiveness (OEE) is the industry standard for measuring availability, performance, and quality [8], it is seldom incorporated directly into maintenance optimization frameworks. Bridging this gap requires interdisciplinary models that synchronize machine state data with real-time production dynamics.
Fragmented Economic Evaluation: Life Cycle Cost (LCC) analysis is often treated as a retrospective audit rather than a prospective driver of maintenance decisions. This fragmentation is primarily attributed to the computational complexity of coupling detailed economic models with multi-objective optimization, alongside persistent issues in data quality [9,10,11].
Another important limitation concerns the assumption of full observability. Most existing models assume that the true state of machine degradation is perfectly known, which is rarely the case in real industrial environments. In practice, maintenance decisions must be made based on noisy and incomplete sensor data, which introduces additional uncertainty into the decision-making process. Furthermore, relatively little attention has been devoted to multi-machine systems operating under resource constraints. In real production environments, maintenance resources such as personnel, tools, and time are limited, requiring coordinated decision-making across multiple assets. Based on this review of the literature (discussed in detail in Section 2), the following critical research gaps have been identified:
Fragmentation of Decision-Support Frameworks: Current literature lacks a unified methodology that combines stochastic degradation modeling, OEE-based operational performance, and LCC-based economic evaluation into a single, cohesive decision-making structure.
Neglect of Partial Observability: Measurement uncertainty and partial observability, which are intrinsic to Industry 4.0 environments, are insufficiently accounted for and are often idealized in existing maintenance models.
Operational Constraints in Multi-Machine Systems: Research remains scant regarding the optimization of multi-machine systems under shared resource constraints (e.g., limited personnel or spare parts), leaving a void in understanding how these bottlenecks impact aggregate system performance.
Synergistic Impact of Uncertainty and Economics: There is a notable absence of comprehensive analyses investigating how data quality (sensor noise), stochastic uncertainty, and economic variables (failure penalties vs. maintenance costs) jointly dictate the optimal maintenance threshold.
These gaps highlight the need for a unified and realistic approach to maintenance optimization that reflects both technical and operational conditions. To address the identified gaps, the study is guided by the following research questions:
RQ1: How can an optimal maintenance policy be determined under stochastic degradation and partial observability?
RQ2: How do key parameters (e.g., failure cost, process variability, maintenance effectiveness) influence maintenance decisions and system performance?
RQ3: What is the impact of Digital Twin integration and resource constraints on the effectiveness of maintenance strategies in realistic industrial environments?
The main objective of this study is to develop and validate an integrated maintenance optimization framework that combines stochastic degradation modeling, production performance evaluation using OEE, and economic assessment based on Life Cycle Cost (LCC). The proposed approach incorporates threshold-based maintenance strategies, Monte Carlo simulation, and Digital Twin concepts to support decision-making under uncertainty and partial observability. Additionally, the framework is extended to multi-machine environments with constrained maintenance resources, reflecting real industrial conditions. The study aims to bridge the gap between theoretical models and practical applications by providing a decision-support framework that is both analytically rigorous and applicable in Industry 4.0 environments.
The main contribution of this study lies in the development of an integrated maintenance optimization framework that combines several dimensions that are typically analyzed separately in the literature. In particular, the proposed approach integrates: (i) stochastic degradation modeling under uncertainty, (ii) production performance evaluation using Overall Equipment Effectiveness (OEE), (iii) economic assessment based on Life Cycle Cost (LCC), (iv) decision-making under partial observability supported by Digital Twin concepts, and (v) multi-machine coordination under resource constraints.
These contributions can be summarized as follows:
Stochastic degradation modeling under uncertainty;
Performance evaluation using OEE;
Economic assessment based on LCC;
Decision-making under partial observability via Digital Twin;
Multi-machine coordination under shared resource constraints.
By combining these elements within a unified stochastic control framework, the study provides a more realistic and practically applicable decision-support model for Industry 4.0 systems. To the best of our knowledge, the integration of stochastic degradation modeling, OEE-based performance metrics, and LCC-based economic evaluation—coupled with Digital Twin-supported decision-making under partial observability and multi-machine resource constraints—has not been jointly addressed within a single stochastic control framework. This study aims to bridge the gap between theoretical reliability models and pragmatic Industry 4.0 applications, providing an analytically rigorous yet practically implementable decision-support tool.
2. Literature Review
The optimization of maintenance strategies is one of the key research areas in production engineering and industrial systems management. With the rapid development of Industry 4.0 and the increasing availability of operational data, there has been a clear shift from traditional maintenance approaches toward data-driven and predictive maintenance strategies [12]. Predictive maintenance (PdM) integrates real-time monitoring, degradation modeling, and data analytics to forecast failures and optimize maintenance decisions. Compared to traditional approaches, PdM enables more efficient allocation of resources, reduction in downtime, and improved system reliability. However, despite significant progress in this field, the integration of technical, economic, and operational aspects remains a challenge.
2.1. Evolution of Maintenance Strategies
The literature distinguishes three primary maintenance strategies: corrective, preventive, and predictive maintenance. Corrective maintenance (run-to-failure) is the simplest strategy, where maintenance actions are performed only after a failure occurs. Although easy to implement, this approach leads to high downtime costs and reduced system availability [13]. Corrective maintenance is typically suitable only for non-critical components or systems with low failure impact. Preventive maintenance involves scheduled interventions based on time or usage. This approach reduces the likelihood of unexpected failures but often results in unnecessary maintenance actions. Time-based strategies do not account for the actual condition of equipment, leading to inefficient resource utilization [14]. Predictive maintenance represents a more advanced and data-driven approach. It relies on monitoring the condition of equipment and predicting future failures using analytical models [15]. PdM enables maintenance to be performed at optimal times, balancing reliability and cost. In recent years, this approach has become a cornerstone of modern maintenance management, particularly in Industry 4.0 environments [12,16].
2.2. Machine Degradation Models
A fundamental component of predictive maintenance is the modeling of machine degradation processes. The literature proposes several classes of models, including statistical, physics-based, and data-driven approaches. Statistical models are widely used due to their flexibility and analytical tractability. Common approaches include Wiener processes, gamma processes, and Markov models [17,18]. These models capture both the deterministic trend of wear and stochastic fluctuations, making them suitable for representing uncertainty in degradation. For example, Wiener process models describe degradation as a continuous-time stochastic process with drift and diffusion components, allowing for realistic modeling of gradual wear with random variability [17,19]. Gamma processes, on the other hand, are particularly useful for modeling monotonic degradation phenomena such as corrosion or fatigue, because their increments are non-negative, whereas Wiener paths can temporarily decrease. An important extension of degradation modeling is the estimation of Remaining Useful Life (RUL), defined as the time remaining until failure (often summarized by its expected value). RUL prediction plays a central role in condition-based maintenance and supports dynamic scheduling of maintenance actions [17,20].
In recent years, data-driven approaches based on machine learning have gained significant attention [21]. Techniques such as deep neural networks and Long Short-Term Memory (LSTM) models are increasingly applied to predict degradation and detect anomalies in complex systems [22]. While these methods offer high predictive accuracy, they often require large datasets and may lack interpretability compared to classical stochastic models. Similar stochastic and data-driven approaches to degradation modeling and maintenance optimization can also be found in recent reliability literature (e.g., [23,24]).
The proposed framework is model-agnostic with respect to degradation dynamics and is not restricted to Wiener processes. While Wiener-based models are widely used due to their analytical tractability, they may not adequately capture complex behaviors such as nonlinear degradation, accelerated wear near failure, or multi-phase degradation processes. In such cases, alternative approaches—including gamma processes, nonlinear stochastic models, or data-driven techniques such as Long Short-Term Memory (LSTM) networks—may offer improved accuracy. Importantly, the flexibility of the proposed framework allows it to be extended to incorporate these advanced degradation modeling approaches without loss of generality.
2.3. Optimization of Maintenance Strategies
Beyond degradation modeling, the optimization of maintenance decisions is a major focus of the literature [25]. Various optimization approaches have been proposed, including time-based policies, condition-based strategies, and threshold-based decision rules. Threshold-based strategies are particularly relevant in stochastic maintenance models. In this approach, maintenance is performed when the degradation level exceeds a predefined threshold. Such policies can be optimal under certain conditions, as they balance maintenance cost and failure risk. More advanced optimization problems consider multi-component or multi-machine systems. In such settings, maintenance decisions must account for interactions between components and constraints on available resources. Monte Carlo simulation is frequently used to evaluate maintenance strategies under uncertainty [26]. It allows the analysis of complex systems where analytical solutions are difficult to obtain, providing insights into the trade-offs between cost, reliability, and performance.
Maintenance optimization models are heterogeneous, ranging from preventive strategies to multi-objective decision-making frameworks. Despite significant advancements, challenges related to practical implementation, imperfect maintenance, and sustainability considerations remain [27].
2.4. Predictive Maintenance in the Context of Industry 4.0
The emergence of Industry 4.0 has significantly transformed maintenance management [28,29]. Technologies such as the Internet of Things (IoT), cyber-physical systems, and cloud computing enable continuous data collection and real-time analysis of machine performance [30]. However, a key challenge in this context is the presence of imperfect and noisy data. In real industrial environments, sensor measurements are often affected by errors, missing values, and uncertainty, which complicates maintenance decision-making.
In this context, the concept of the Digital Twin has gained particular importance [31]. A Digital Twin integrates data from multiple sources, including sensors, MES/CMMS systems, and physical and statistical models, allowing for data validation, completion, and filtering [32,33,34]. By combining simulation models with analytical algorithms, it is possible to mitigate the effects of noise and data incompleteness through real-time estimation and correction.
As a result, a Digital Twin not only reflects the current state of the system, but also improves the reliability of input data, which forms the basis for more accurate maintenance decisions, particularly in predictive maintenance applications.
2.5. Production Efficiency Indicators
In addition to reliability and cost, production efficiency is a critical aspect of maintenance optimization. One of the most widely used performance metrics is Overall Equipment Effectiveness (OEE). OEE is the product of three components, namely availability, performance, and quality, providing a comprehensive measure of equipment effectiveness. It is widely used in industrial practice to identify inefficiencies and improvement opportunities [35]. OEE is not only a performance indicator but also a decision-support tool that can guide maintenance and operational improvements. However, despite its practical importance, OEE is rarely integrated directly into maintenance optimization models. In Industry 4.0 environments, OEE can be continuously monitored using real-time data, enabling dynamic assessment of system performance and more informed decision-making [36,37].
2.6. Summary and Research Implications
The reviewed literature demonstrates significant progress in predictive maintenance, degradation modeling, and Industry 4.0 technologies. However, most existing approaches focus on selected aspects of the problem rather than providing a fully integrated perspective.
Although the proposed approach is based on a threshold-based maintenance policy, it is worth comparing it with other commonly used strategies. Time-based or age-based maintenance schedules involve interventions at fixed intervals, regardless of the equipment’s actual condition. Such strategies are simple to implement but often lead to unnecessary preventive maintenance or unforeseen failures under changing operating conditions [2]. Predictive maintenance methods based on machine learning (ML) utilize historical and real-time sensor data to predict failures and determine maintenance due dates. These approaches can provide higher predictive accuracy and adaptability to complex operating environments, but typically require large datasets, advanced computing resources, and specialized knowledge [22]. The threshold-based policy used in this study, on the other hand, balances simplicity, interpretability, and robustness. By linking degradation levels directly to maintenance decisions and integrating stochastic modeling, OEE performance, and life cycle costing, it achieves a practical compromise: easy implementation in real industrial systems while accounting for operational uncertainty and constraints. In simulation experiments, threshold strategies achieve performance comparable to ML-based approaches when data are limited, while maintaining greater transparency and lower computational cost.
In particular, degradation models are often developed independently of production performance indicators such as OEE, while economic evaluations based on Life Cycle Cost are frequently treated as separate analyses. Moreover, many models rely on the assumption of full observability, which limits their applicability in real industrial environments characterized by noisy and incomplete data. Additionally, relatively limited attention has been given to multi-machine systems operating under resource constraints, despite their importance in practical applications.
These observations confirm the need for integrated frameworks that combine stochastic degradation modeling, performance evaluation, cost analysis, and uncertainty handling within a unified decision-making structure. This need directly motivates the approach proposed in this study.
Recent studies on predictive maintenance (PdM) and condition-based maintenance (CBM) have made significant progress in modeling degradation and optimizing maintenance decisions, yet they often remain focused on isolated dimensions. Data-driven surveys, such as [22], emphasize the role of machine learning in failure prediction but rarely address the economic or operational consequences of these predictions. While more recent research has moved toward multi-component systems (notably the work of [38] and the LSTM-based clustering approach of [39]), these models primarily optimize technical reliability or Remaining Useful Life (RUL) without integrating broader business KPIs.
The integration of economic factors has been partially addressed in the reliability-focused literature, such as [23], where probabilistic RUL estimation is linked to cost-oriented maintenance optimization. However, even these advanced models typically overlook the real-time operational performance captured by Overall Equipment Effectiveness (OEE). Similarly, while the conceptual importance of Digital Twin (DT) technology is increasingly recognized in comprehensive reviews [24,40], these studies remain largely descriptive or focused on condition monitoring rather than providing a unified stochastic control framework that handles measurement uncertainty.
The comparative analysis presented in Table 1 illustrates the current research gap. While existing studies achieve high performance in specific areas such as multi-machine coordination or Digital Twin conceptualization, they consistently show low integration of OEE-based performance evaluation and comprehensive LCC analysis. In contrast, the framework introduced in this study bridges these gaps by combining stochastic degradation modeling (Wiener process), OEE-based operational tracking, and LCC-based economic assessment within a single, Digital Twin-supported decision-making structure for multi-machine systems.
As shown in Table 1, existing approaches typically address maintenance optimization from a limited perspective, focusing either on degradation modeling or cost analysis. In contrast, the proposed framework integrates stochastic degradation, production performance (OEE), economic evaluation (LCC), and decision-making under uncertainty within a unified structure.
3. Methodology
This section presents a comprehensive methodology for an integrated predictive maintenance framework that links stochastic machine degradation, production efficiency, and life cycle costs. Unlike many existing approaches that focus on a single aspect of maintenance optimization, the proposed framework aims to combine technical, economic, and operational perspectives within a unified decision-making structure. The modeling approach is designed not only for analytical tractability but also for practical applicability in real industrial environments characterized by uncertainty, imperfect data, and resource constraints. In particular, the methodology explicitly accounts for stochastic degradation processes, performance-based evaluation using OEE, and economic assessment using LCC. The degradation process captures both natural wear and random fluctuations, directly affecting system performance. Higher degradation reduces availability, performance, and quality, while simultaneously increasing the probability of failure and operational costs. The optimization problem therefore seeks a control strategy that maximizes expected operational efficiency while minimizing total cost.
The framework is further extended to a multi-machine environment, where maintenance decisions on individual units influence overall production performance. In such settings, coordination of maintenance actions becomes essential due to limited resources. Additionally, the integration of Digital Twin concepts allows for improved decision-making under noisy and uncertain sensor data. Monte Carlo simulations are employed to evaluate the effectiveness and robustness of the proposed strategies under stochastic conditions. This enables both local and global sensitivity analysis, providing insights into the influence of key parameters on maintenance decisions.
3.1. Modeling Machine Degradation and Integrated PdM–OEE–LCC Framework
The degradation process of a machine is modeled as a stochastic process defined on a probability space, using a standard Wiener process. Such models are widely used in predictive maintenance due to their ability to represent both deterministic degradation trends and stochastic variability [17,19].
The degradation dynamics are defined as:
$$dX_t = (\mu - \eta\,u_t X_t)\,dt + \sigma\,dW_t,$$
where:
$X_t$: degradation level of the machine;
$\mu$: natural degradation rate;
$\sigma$: stochastic volatility representing random fluctuations;
$u_t \in \{0, 1\}$: maintenance decision at time $t$;
$dW_t$: increment of the Wiener process (Brownian motion), modeling random fluctuations in degradation;
$\eta$: maintenance efficiency coefficient.
This formulation captures both the gradual accumulation of wear and random variations observed in real industrial systems. Importantly, maintenance actions reduce the degradation level but do not eliminate it entirely, reflecting realistic maintenance effects.
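Failure occurs when:
$$X_t \ge X_{\max},$$
where $X_{\max}$ is the critical degradation threshold that triggers machine failure; the corresponding failure time is $\tau = \inf\{t \ge 0 : X_t \ge X_{\max}\}$.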
This stochastic degradation model captures the realistic, unpredictable evolution of machine condition and allows testing the effectiveness of different maintenance strategies.
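As a minimal illustration of the model without maintenance ($u_t = 0$), the sketch below simulates a single degradation path with an Euler-Maruyama scheme and returns the first grid time at which the failure threshold is reached. The function name and the use of NumPy's random `Generator` are our own choices, not part of the original study. For a path started at $X_0 = 0$ with positive drift, the continuous-time first-passage time to $X_{\max}$ follows an inverse Gaussian distribution with mean $X_{\max}/\mu$, which gives a simple sanity check; monitoring only on a time grid slightly delays the detected crossing.

```python
import numpy as np


def simulate_failure_time(mu, sigma, x_max, dt, horizon, x0=0.0, rng=None):
    """Euler-Maruyama path of dX_t = mu dt + sigma dW_t (no maintenance, u_t = 0).

    Returns the first grid time at which X_t >= x_max, or None if the
    machine survives the whole horizon.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, t = x0, 0.0
    while t < horizon:
        # Drift increment plus a Gaussian increment with variance sigma^2 * dt
        x += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if x >= x_max:
            return t  # first passage detected on the time grid
    return None
```

Averaging the returned times over many independent paths (with a seeded generator for reproducibility) should give a value close to $X_{\max}/\mu$ when the time step is small.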
3.1.1. Overall Equipment Effectiveness (OEE) and Life Cycle Cost (LCC) Assessment
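Machine performance is evaluated using the Overall Equipment Effectiveness (OEE) metric, defined as:
$$\mathrm{OEE}_t = A_t \cdot P_t \cdot Q_t,$$
where:
Availability: $A_t = 1$ if $X_t < X_{\max}$ and the machine is not under maintenance ($u_t = 0$), and $A_t = 0$ otherwise.
Performance: $P_t = 1 - \alpha X_t$; machine performance decreases proportionally to its degradation.
Quality: $Q_t = 1 - \beta X_t$, with $\beta > 0$ and $Q_t \in [0, 1]$. Higher degradation increases the likelihood of producing defective products.
Thus, the OEE can be expressed as:
$$\mathrm{OEE}_t = A_t\,(1 - \alpha X_t)(1 - \beta X_t),$$
where $\alpha$ and $\beta$ are scaling coefficients representing the effect of degradation on performance and quality.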
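The OEE expression can be evaluated pointwise as in the following sketch. Clipping performance and quality to $[0, 1]$ is our assumption, added so that the linear approximations stay within a valid range at high degradation; the function and argument names are hypothetical.

```python
def oee(x, alpha, beta, x_max, under_maintenance=False):
    """OEE_t = A_t * (1 - alpha * X_t) * (1 - beta * X_t).

    Availability A_t is 1 only if the machine has not failed (x < x_max) and
    is not under maintenance; performance and quality are clipped to [0, 1]
    (an illustrative assumption).
    """
    available = (x < x_max) and not under_maintenance
    if not available:
        return 0.0
    performance = min(max(1.0 - alpha * x, 0.0), 1.0)
    quality = min(max(1.0 - beta * x, 0.0), 1.0)
    return performance * quality
```

For example, with $\alpha = 0.2$, $\beta = 0.1$ and $X_t = 0.5$, performance is 0.9 and quality is 0.95, so the OEE is about 0.855.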
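The instantaneous cost is defined as:
$$c_t = C_m\,u_t + P_f(X_t)\,C_f + C_d\,(1 - A_t) + C_o,$$
and the total expected life cycle cost is:
$$\mathrm{LCC} = \mathbb{E}\left[\int_0^{T \wedge \tau} c_t\,dt\right],$$
where:
$C_m$: cost of performing maintenance;
$P_f(X_t)$: probability of failure given the degradation level $X_t$;
$C_f$: cost of failure;
$C_d$: cost of downtime (machine unavailability);
$C_o$: operational costs independent of machine state;
$T$: planned analysis horizon;
$\tau$: failure time;
$T \wedge \tau = \min(T, \tau)$: minimum operator, taking the lower of $T$ or $\tau$.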
This formulation reflects the cumulative economic impact of maintenance decisions over time and allows the trade-off between preventive maintenance and failure risk to be quantified explicitly. The relationships between degradation and performance or quality are modeled as linear first-order approximations for analytical simplicity and interpretability. Real industrial systems may exhibit nonlinear behaviors, such as accelerated degradation, threshold-based quality losses, or other complex dynamics, but the linear formulation provides a reasonable approximation, particularly in data-limited settings. This simplifying assumption does not limit the generality of the proposed framework: alternative nonlinear relationships (e.g., exponential decay or threshold effects) can be incorporated when empirical data support higher-fidelity modeling.
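A minimal sketch of the instantaneous cost and a discretized LCC for one simulated path is given below. The text does not specify the failure-probability function $P_f$, so it is passed in as a callable; the linear form used in the usage note is purely hypothetical. All costs are treated as rates per unit time because they are integrated over time, and the integral is approximated by a left-point Riemann sum with step `dt`.

```python
def instantaneous_cost(x, u, available, c_m, c_f, c_d, c_o, p_fail):
    """c_t = C_m * u_t + P_f(X_t) * C_f + C_d * (1 - A_t) + C_o (cost rates)."""
    downtime = 0 if available else 1
    return c_m * u + p_fail(x) * c_f + c_d * downtime + c_o


def life_cycle_cost(xs, us, avail, dt, c_m, c_f, c_d, c_o, p_fail):
    """Left-point Riemann approximation of the integral of c_t over [0, min(T, tau)].

    xs, us, avail: per-step degradation, maintenance decision and availability
    for one path, already truncated at min(T, tau).
    """
    return sum(
        instantaneous_cost(x, u, a, c_m, c_f, c_d, c_o, p_fail) * dt
        for x, u, a in zip(xs, us, avail)
    )
```

For example, with a hypothetical `p_fail = lambda x: 0.1 * x`, a two-step path `xs=[0.0, 0.5]`, `us=[0, 1]`, `avail=[True, False]`, `dt=1.0` and illustrative costs `c_m=100, c_f=1000, c_d=50, c_o=10`, the two steps contribute 10 and 210, so `life_cycle_cost` returns 220.0 (up to floating-point rounding). Averaging this quantity over Monte Carlo paths estimates the expected LCC.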
3.1.2. Integrated Optimization Problem
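The performance functional to be maximized is defined as:
$$J(\pi) = \mathbb{E}\left[\int_0^{T \wedge \tau} \big(w_1\,\mathrm{OEE}_t - w_2\,c_t\big)\,dt\right],$$
where:
$J(\pi)$: objective function, which we aim to maximize;
$\pi$: control strategy, e.g., maintenance scheduling and machine parameter settings;
$\mathbb{E}[\cdot]$: expectation operator, reflecting the stochastic nature of degradation, failures, and production quality;
$\int_0^{T \wedge \tau}(\cdot)\,dt$: time integral up to the earlier of the horizon $T$ or the failure time $\tau$;
$w_1, w_2 \ge 0$: weighting coefficients reflecting the relative importance of OEE and cost;
$\mathrm{OEE}_t$: overall equipment effectiveness at time $t$;
$c_t$: instantaneous cost at time $t$.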
In other words, the goal is to maximize production efficiency while minimizing costs. Thus, the proposed framework enables flexible trade-off analysis between performance and cost, reflecting different managerial preferences and operational strategies. Under standard assumptions for stochastic degradation processes and cost structures, the optimal maintenance policy can be shown to exist and to take the form of a threshold-based strategy. This result is well-established in the literature on stochastic control and condition-based maintenance.
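The objective can be estimated from simulated trajectories by replacing the expectation with a sample mean and the time integral with a Riemann sum, as sketched below. The array layout (paths by time steps, zero-padded after $T \wedge \tau$) and the function name are assumptions made for illustration.

```python
import numpy as np


def objective_estimate(oee_paths, cost_paths, dt, w1, w2):
    """Monte Carlo estimate of J(pi) = E[ integral_0^{min(T, tau)} (w1*OEE_t - w2*c_t) dt ].

    oee_paths, cost_paths: arrays of shape (n_paths, n_steps) sampled with step dt;
    entries after min(T, tau) must be zero so that they do not contribute.
    """
    oee_paths = np.asarray(oee_paths, dtype=float)
    cost_paths = np.asarray(cost_paths, dtype=float)
    # Riemann sum of the weighted integrand along each path
    per_path = ((w1 * oee_paths - w2 * cost_paths) * dt).sum(axis=1)
    # Sample mean over paths approximates the expectation
    return float(per_path.mean())
```

Re-evaluating the estimate with different $(w_1, w_2)$ on the same simulated paths is one simple way to explore the managerial trade-off described above without re-simulating.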
3.1.3. Digital Twin Extension
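In real industrial systems, the true degradation state $X_t$ is not directly observable. Instead, maintenance decisions are made based on noisy sensor measurements:
$$Y_t = X_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2),$$
where $Y_t$ represents the sensor signal, $\varepsilon_t$ is the measurement noise, and $\sigma_\varepsilon^2$ denotes the variance of this noise.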
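From these observations, the estimated state is obtained as the conditional expectation given the observation history:
$$\hat{X}_t = \mathbb{E}\left[X_t \mid Y_s,\ 0 \le s \le t\right].$$
Decisions are then based on this estimated state:
$$u_t = \pi(\hat{X}_t).$$
For a threshold-based policy:
$$u_t = \mathbb{1}\{\hat{X}_t \ge \theta\}.$$
Hence, decisions are made using $\hat{X}_t$ rather than the true state $X_t$, which frames the problem as a Partially Observable Markov Decision Process (POMDP).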
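To see why partial observability matters for a threshold rule, assume Gaussian measurement noise, so that $Y_t \sim \mathcal{N}(X_t, \sigma_\varepsilon^2)$ for a given true state. The probability that a decision based on a single raw measurement $Y_t$ differs from the decision based on $X_t$ then has the closed form computed below: a false alarm when $X_t < \theta$, a missed intervention otherwise. This is an illustrative calculation under that Gaussian assumption, without any filtering; the function name is hypothetical.

```python
import math


def decision_error_prob(x, theta, sigma_eps):
    """P(threshold decision on Y = X + eps differs from the decision on X), eps ~ N(0, sigma_eps^2)."""
    if sigma_eps == 0:
        return 0.0  # perfect observation: decisions always agree
    z = (theta - x) / sigma_eps
    p_below = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(Y < theta) = Phi(z)
    # x < theta: correct action is "no maintenance", error if Y >= theta
    # x >= theta: correct action is "maintain", error if Y < theta
    return 1.0 - p_below if x < theta else p_below
```

The error probability equals 0.5 exactly at the threshold and decays as $|X_t - \theta|/\sigma_\varepsilon$ grows, which is why reducing the effective observation noise matters most for machines operating close to $\theta$.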
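The introduction of a Digital Twin improves the quality of information by reducing the estimation uncertainty relative to the raw measurements:
$$\mathbb{E}\big[(X_t - \hat{X}_t^{\mathrm{DT}})^2\big] \le \mathbb{E}\big[(X_t - Y_t)^2\big] = \sigma_\varepsilon^2.$$
In the limiting scenario:
$$\mathbb{E}\big[(X_t - \hat{X}_t^{\mathrm{DT}})^2\big] \to 0 \quad\Rightarrow\quad \hat{X}_t^{\mathrm{DT}} \to X_t,$$
leading to ideal decision-making:
$$u_t = \mathbb{1}\{X_t \ge \theta\}.$$
The physical degradation process itself, however, remains unchanged:
$$dX_t = (\mu - \eta\,u_t X_t)\,dt + \sigma\,dW_t.$$
Instead, the enhanced quality of information provided by the Digital Twin improves decision efficiency:
$$J(\pi^{\mathrm{DT}}) \ge J(\pi),$$
where $\pi^{\mathrm{DT}}$ denotes the policy applied to the Digital Twin estimate and $\pi$ the same policy applied to raw measurements.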
Importantly, the Digital Twin does not modify the underlying degradation dynamics, but rather provides more accurate information to guide maintenance actions. The benefits of a Digital Twin can thus be interpreted in terms of improved observation quality and more effective decision-making.
In this study, the Digital Twin is implemented using a standard linear Kalman filter applied to the discretized degradation model. The filter combines system dynamics with noisy sensor observations to estimate the true degradation state in real time. The process noise covariance is derived from the stochastic volatility parameter (σ), while the measurement noise covariance reflects sensor accuracy and uncertainty. In this way, the Digital Twin operates as a data fusion layer, integrating sensor signals with the underlying stochastic model and providing an improved state estimate that is subsequently used for maintenance decision-making. This structure enables robust state estimation under imperfect and noisy data conditions while ensuring seamless integration between physical system measurements and decision policies.
The Digital Twin operates as a closed-loop decision-support system. Sensor measurements are continuously collected from the physical system and used as inputs to the state estimation module (Kalman filter). The estimated degradation state is then fed into the maintenance decision policy (e.g., a threshold-based rule), which determines whether maintenance actions should be performed. The outcomes of these decisions influence the future system state, creating a feedback loop between the physical system, the Digital Twin, and the decision-making process. This architecture ensures real-time adaptability and consistent integration between monitoring, estimation, and control.
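The text does not give the filter equations, so the following is a minimal sketch of a scalar linear Kalman filter under stated assumptions: discretized dynamics $x_{k+1} = a_k x_k + \mu\Delta t + w_k$ with $a_k = 1 - \eta u_k \Delta t$ (our assumed form of the maintenance effect), process noise variance $q = \sigma^2 \Delta t$ derived from the volatility, and measurements $y_k = x_k + v_k$ with noise variance $r$. Names such as `kalman_degradation`, `x0` and `p0` (prior mean and variance of the initial state) are hypothetical.

```python
import numpy as np


def kalman_degradation(ys, mu, sigma, dt, r, us=None, eta=0.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_{k+1} = a_k x_k + mu*dt + w_k, y_k = x_k + v_k.

    a_k = 1 - eta*u_k*dt (assumed maintenance effect), w_k ~ N(0, sigma^2*dt),
    v_k ~ N(0, r). Returns filtered estimates x_hat_k and error variances P_k.
    """
    q = sigma ** 2 * dt                     # process noise variance from volatility sigma
    us = np.zeros(len(ys)) if us is None else us
    x_hat, p = x0, p0                       # prior mean and variance for x_0
    estimates, variances = [], []
    for k, y in enumerate(ys):
        # Update: fuse measurement y_k with the prior prediction
        gain = p / (p + r)
        x_hat = x_hat + gain * (y - x_hat)
        p = (1.0 - gain) * p
        estimates.append(x_hat)
        variances.append(p)
        # Predict: propagate mean and variance to step k+1
        a = 1.0 - eta * us[k] * dt
        x_hat = a * x_hat + mu * dt
        p = a * a * p + q
    return np.array(estimates), np.array(variances)
```

The returned estimates would replace the raw $y_k$ in the threshold rule, i.e. $u_k = \mathbb{1}\{\hat{x}_k \ge \theta\}$. In a genuine closed loop each $u_k$ must be decided before the next prediction step, so a real implementation would interleave the filter with the policy rather than filter a pre-recorded sequence as done here.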
The overall structure of the Digital Twin-based decision-support system is illustrated in Figure 1, which shows the integration of sensor data, state estimation using a Kalman filter, a threshold-based decision policy, and maintenance actions within a closed-loop system.
The contribution of the Digital Twin in this study lies in its integration with the decision-making framework and its impact on maintenance performance, rather than in the filtering method itself.
3.1.4. Multi-Machine Smart Factory Model
In modern smart factories, multiple machines operate simultaneously, each with its own degradation dynamics. Coordinating maintenance across all machines while considering limited resources is crucial to ensure overall system efficiency and reliability.
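For $N$ machines, the maintenance decisions at each time $t$ must satisfy the resource constraint:
$$\sum_{i=1}^{N} u_t^{(i)} \le K,$$
where:
$u_t^{(i)} \in \{0, 1\}$: maintenance decision for machine $i$ (1 = service, 0 = no service);
$N$: number of machines;
$K$: maximum number of machines that can be serviced simultaneously (due to staff, equipment, or budget constraints).
Implementation:
Sort machines by priority (e.g., by the normalized degradation index $X_t^{(i)}/\theta$).
Select the top $K$ machines for maintenance at each time step.
Extend the threshold policy to multiple machines while respecting resource constraints.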
This yields a constrained stochastic control problem that integrates degradation, costs, efficiency, and operational limitations. It can in principle be addressed using dynamic programming, reinforcement learning, or simulation-based methods; a practical approach is to prioritize machines based on their degradation level or expected cost. From an industrial perspective, this extension significantly increases the applicability of the model, as resource constraints are a fundamental characteristic of real-world systems.
From a practical perspective, the model parameters can be calibrated using industrial data. The degradation drift (μ) and volatility (σ) may be estimated from historical condition monitoring data using statistical techniques such as maximum likelihood estimation or time-series analysis. The parameters linking degradation to performance (α) and quality (β) can be obtained through regression analysis based on operational data. Cost-related parameters, including preventive maintenance cost, failure cost, and downtime cost, are typically derived from maintenance records and accounting systems. This data-driven calibration helps ensure that the model reflects the specific characteristics and operating conditions of the industrial system under consideration.
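For the drift and volatility, a minimal maximum likelihood sketch is shown below, assuming a maintenance-free record of equally spaced, noise-free degradation observations. Under the Wiener model, the increments $\Delta X_i$ are then independent $\mathcal{N}(\mu\Delta t, \sigma^2\Delta t)$ variables, so the MLEs are the sample mean of the increments divided by $\Delta t$ and the (biased, $1/n$) sample variance divided by $\Delta t$. With noisy sensor data, the estimates would need to account for measurement error, for example through the Kalman filter likelihood; that extension is not shown, and the function name is hypothetical.

```python
import numpy as np


def estimate_drift_volatility(x, dt):
    """MLE of (mu, sigma) for dX = mu dt + sigma dW.

    Assumes one maintenance-free, equally spaced, noise-free record x_0, ..., x_n.
    """
    dx = np.diff(np.asarray(x, dtype=float))      # increments ~ N(mu*dt, sigma^2*dt)
    mu_hat = dx.mean() / dt
    # MLE of the variance uses 1/n (numpy's default ddof=0 behaviour for mean)
    sigma_hat = np.sqrt(((dx - mu_hat * dt) ** 2).mean() / dt)
    return mu_hat, sigma_hat
```

Records interrupted by maintenance can be handled by computing increments within each maintenance-free segment separately and pooling them before averaging.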
To allocate limited maintenance resources, machines are prioritized using a normalized degradation index defined as the ratio of the current degradation level to the maintenance threshold. This index ensures consistency with the threshold-based decision structure and enables comparability across machines operating under different conditions. It directly reflects the relative proximity of each machine to its intervention point, such that higher values indicate a more urgent need for maintenance due to closer proximity to failure or threshold violation. While alternative priority rules, such as selecting machines with the highest absolute degradation level or highest estimated failure risk, could also be considered, the adopted rule is a simple, computationally efficient, and naturally interpretable heuristic that aligns well with the structure of the proposed model and balances performance against computational simplicity.
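The allocation rule can be sketched as follows. We assume that only machines whose normalized index $X_t^{(i)}/\theta_i$ has reached 1 are candidates, that at most $K$ of them are serviced, and that ties are broken by machine order; these choices, like the function name, are illustrative assumptions rather than details stated in the text.

```python
import numpy as np


def allocate_maintenance(x, theta, k):
    """Return a 0/1 decision vector u with sum(u) <= k.

    Machines with normalized index X_i / theta_i >= 1 are candidates; the k
    candidates with the highest index are serviced (stable sort breaks ties).
    """
    x = np.asarray(x, dtype=float)
    theta = np.broadcast_to(np.asarray(theta, dtype=float), x.shape)
    index = x / theta                               # normalized degradation index
    candidates = np.flatnonzero(index >= 1.0)       # machines at or above threshold
    order = np.argsort(-index[candidates], kind="stable")
    chosen = candidates[order][:k]                  # respect the resource limit K
    u = np.zeros(x.shape, dtype=int)
    u[chosen] = 1
    return u
```

For instance, with degradation levels `[0.2, 0.9, 0.6, 0.75]`, a common threshold of 0.5 and `k=2`, the indices are 0.4, 1.8, 1.2 and 1.5, so machines 1 and 3 (0-based) are serviced and the call returns `[0, 1, 0, 1]`; machine 2 waits for the next free slot.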
4. Numerical Experiments: Monte Carlo Analysis
This section presents the numerical experiments conducted to:
Validate the threshold structure of the optimal maintenance policy.
Quantify the trade-off between Overall Equipment Effectiveness (OEE) maximization and Life Cycle Cost (LCC) minimization.
Assess the impact of stochastic volatility on the optimal threshold.
Evaluate the benefit of Digital Twin-based state estimation.
4.1. Simulation Setup
To evaluate the effectiveness of the proposed maintenance policies, a Monte Carlo simulation framework is employed. This approach enables the assessment of stochastic degradation dynamics and the impact of different threshold strategies on both operational efficiency and total life cycle cost (LCC).
For each machine, 10,000 random degradation paths are generated. Threshold values θ ∈ [0.1, 0.9] are tested, and OEE and LCC are computed for each simulated path. The optimal threshold is identified as the value that minimizes the average LCC while maintaining a high OEE.
The simulation horizon is set to T = 2000 h with a fixed time step Δt. The main simulation parameters used in the analysis are summarized in Table 2.
4.2. Discrete Simulation Model
To complement the continuous Monte Carlo simulations, a discrete-time simulation model is implemented to capture machine degradation dynamics and evaluate maintenance policies in a stepwise manner. This approach allows for straightforward computation of operational metrics such as OEE and maintenance-related costs at each time step.
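The degradation dynamics of each machine are modeled as:

$$X_{k+1} = (1-\eta\,u_k)\,X_k + \mu\,\Delta t + \sigma\sqrt{\Delta t}\,\xi_k,\qquad \xi_k \sim \mathcal{N}(0,1)$$

where X_k represents the degradation state at step k, μ is the degradation drift, σ is the stochastic volatility, η is the maintenance efficiency, and u_k ∈ {0, 1} is the maintenance decision.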
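A failure event occurs if:

$$X_k \ge H^{*}$$

where H* denotes the failure (maximum allowable wear) level. A threshold-based maintenance policy is applied:

$$u_k = \begin{cases} 1, & X_k \ge \theta \\ 0, & X_k < \theta \end{cases}$$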
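The Overall Equipment Effectiveness (OEE) at each time step is computed as:

$$\mathrm{OEE}_k = A_k\,(1-\alpha X_k)\,(1-\beta X_k)$$

where A_k is the machine availability at step k, and α and β represent performance and quality sensitivity, respectively.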
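The cost at each time step is evaluated as:

$$C_k = c_m\,u_k + c_f\,\mathbb{1}\{X_k \ge H^{*}\} + c_d\,(1-A_k)\,\Delta t$$

where c_m is the preventive maintenance cost, c_f is the failure cost, c_d is the downtime cost, and A_k ∈ {0, 1} indicates machine availability.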
This discrete simulation model enables detailed stepwise evaluation of threshold policies, linking machine degradation, maintenance decisions, operational efficiency, and costs in a consistent and computationally efficient framework.
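The stepwise model can be sketched as a vectorized NumPy simulation of many paths at once. This is an illustrative implementation under explicit assumptions, not the authors' code: the state update is taken as X_{k+1} = (1 − η·u_k)·X_k + μΔt + σ√Δt·ξ_k (clipped at zero), OEE_k = A_k(1 − αX_k)(1 − βX_k), and C_k = c_m·u_k + c_f·1{X_k ≥ H} + c_d(1 − A_k)Δt. A failure is assumed to trigger corrective renewal to X = 0, any intervention is assumed to make the machine unavailable for one step, and all default parameter values and the function name are hypothetical.

```python
import numpy as np


def simulate_policy(theta, n_paths=10_000, n_steps=2_000, dt=1.0,
                    mu=0.002, sigma=0.01, eta=1.0, H=1.0,
                    alpha=0.2, beta=0.1, c_m=1.0, c_f=10.0, c_d=0.5, seed=0):
    """Per-path mean OEE and total cost under the rule u_k = 1{X_k >= theta}.

    Assumptions (hypothetical): failure at X_k >= H renews the machine to X = 0,
    and any intervention (preventive or corrective) costs one step of downtime.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    oee_sum = np.zeros(n_paths)
    cost = np.zeros(n_paths)
    for _ in range(n_steps):
        failed = x >= H
        u = (~failed) & (x >= theta)              # preventive action
        avail = ~(failed | u)                     # A_k = 0 during any intervention
        oee_sum += avail * np.clip((1 - alpha * x) * (1 - beta * x), 0.0, 1.0)
        cost += c_m * u + c_f * failed + c_d * (~avail) * dt
        # maintenance removes a fraction eta of the wear; failure renews fully
        x = np.where(failed, 0.0, (1 - eta * u) * x)
        noise = rng.standard_normal(n_paths)
        x = np.maximum(x + mu * dt + sigma * np.sqrt(dt) * noise, 0.0)
    return oee_sum / n_steps, cost


oee, cost = simulate_policy(theta=0.6, n_paths=1_000)
print(oee.mean(), cost.mean())
```

Vectorizing over paths rather than looping over them keeps the 10,000-path runs of Section 4.1 fast, since only the time loop remains in Python.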
4.3. Threshold Policy Evaluation
To assess the effectiveness of different threshold values, a performance metric is defined that integrates both operational efficiency and maintenance-related costs. This evaluation allows identification of the threshold that balances machine availability, performance, and cost.
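The cumulative performance along a single degradation path n is computed as:

$$J^{(n)}(\theta) = \sum_{k=0}^{K-1}\left[w_{1}\,\mathrm{OEE}_k^{(n)} - w_{2}\,C_k^{(n)}\right]$$

where w_1 and w_2 are weights representing the relative importance of OEE and cost, respectively, and K = T/Δt is the number of time steps.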
The expected performance across all simulated paths is:

$$\bar{J}(\theta) = \frac{1}{N}\sum_{n=1}^{N} J^{(n)}(\theta)$$

where N is the number of Monte Carlo paths.
The optimal threshold θ* is defined as the value that maximizes the expected performance:

$$\theta^{*} = \arg\max_{\theta}\,\bar{J}(\theta)$$
This systematic evaluation provides a robust method for selecting threshold-based maintenance policies that achieve the best trade-off between operational efficiency and life cycle cost under stochastic degradation scenarios.
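Given per-path results for each candidate threshold, the Monte Carlo estimate J̄(θ) and the arg max above reduce to a few lines. The sketch assumes that the simulation has already produced, for every θ on the grid, arrays of per-path cumulative (or time-averaged) OEE and cost; the dictionary layout and the weight names `w_oee` and `w_cost` are hypothetical.

```python
import numpy as np


def expected_performance(oee_paths, cost_paths, w_oee, w_cost):
    """Monte Carlo estimate of J_bar(theta) = mean_n [w_oee * OEE_n - w_cost * C_n]."""
    j = w_oee * np.asarray(oee_paths, float) - w_cost * np.asarray(cost_paths, float)
    return float(j.mean())


def select_threshold(results, w_oee=1.0, w_cost=1.0):
    """results maps theta -> (oee_paths, cost_paths); returns (theta_star, scores)."""
    scores = {th: expected_performance(o, c, w_oee, w_cost)
              for th, (o, c) in results.items()}
    theta_star = max(scores, key=scores.get)   # arg max over the threshold grid
    return theta_star, scores


# Toy per-path results for three thresholds (illustrative numbers only)
results = {0.3: ([0.90, 0.90], [0.5, 0.5]),
           0.6: ([0.88, 0.88], [0.2, 0.2]),
           0.9: ([0.80, 0.80], [0.4, 0.4])}
theta_star, scores = select_threshold(results)
print(theta_star)  # 0.6
```

Because OEE is dimensionless while costs are monetary, the cost term should be normalized (for example, by a reference cost) before choosing the weights; otherwise the arg max is driven almost entirely by cost.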
4.4. Results
This section presents the main findings from the simulation study, illustrating the effectiveness of the threshold-based maintenance policy and the impact of stochastic degradation and Digital Twin implementation.
4.4.1. Optimal Threshold and OEE–Cost Trade-Off
This subsection evaluates the expected performance of the threshold policy and the trade-offs between maintenance cost and operational efficiency. The expected performance J̄(θ) is strictly concave in θ, yielding a unique optimal threshold θ*. The trade-off can be summarized as follows:
Low thresholds: Frequent maintenance → high preventive cost (c_m) but high OEE.
High thresholds: Fewer interventions → lower maintenance cost but high failure cost (c_f) and lower OEE.
Optimal threshold: Balances marginal OEE loss and marginal cost, maximizing overall performance.
The relationship between the threshold θ, OEE, and LCC is illustrated in Figure 2 and Figure 3.
4.4.2. Impact of Stochastic Volatility
This subsection analyses how stochastic volatility (σ) affects the optimal threshold. Results are presented in Table 3 and visualized in Figure 4.
Higher volatility increases the likelihood of sudden failures, making earlier maintenance interventions optimal.
4.4.3. Digital Twin Scenario
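This subsection examines the effect of improved observation quality via a Digital Twin on economic and operational performance. Observation noise is modeled as additive:

$$Y_k = X_k + v_k,\qquad v_k \sim \mathcal{N}(0,\sigma_v^{2})$$

where Y_k is the observed degradation level and σ_v is the standard deviation of the measurement noise.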
The results are presented in Table 4.
Insight: Under noisy observations, Digital Twin-based state estimation reduces economic loss by approximately 8% (measured as the difference in mean LCC), demonstrating its practical value.
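One common way to implement such state estimation, used here as an illustrative stand-in because the specific filter is not stated in this subsection, is a scalar Kalman filter for the drift-plus-noise degradation model. The sketch assumes X_{k+1} = X_k + μΔt + σ√Δt·w_k with w_k ~ N(0, 1), observations Y_k = X_k + v_k with v_k ~ N(0, σ_v²), and no maintenance resets inside the series (after an intervention the filter would have to be reinitialized). The function name and numerical values are hypothetical.

```python
import numpy as np


def kalman_filter_drift(y, mu, sigma, sigma_v, dt=1.0, x0=0.0, p0=1.0):
    """Filtered estimates of X_k from noisy readings Y_k = X_k + v_k."""
    q = sigma ** 2 * dt          # process-noise variance per step
    r = sigma_v ** 2             # measurement-noise variance
    x_hat, p = x0, p0
    out = np.empty(len(y))
    for k, yk in enumerate(y):
        if k > 0:                # predict: deterministic drift plus diffusion
            x_hat += mu * dt
            p += q
        gain = p / (p + r)       # update: blend prediction and new reading
        x_hat += gain * (yk - x_hat)
        p *= 1.0 - gain
        out[k] = x_hat
    return out


rng = np.random.default_rng(1)
n = 2000
x_true = np.cumsum(0.002 + 0.005 * rng.standard_normal(n))
y_obs = x_true + 0.05 * rng.standard_normal(n)
x_filt = kalman_filter_drift(y_obs, mu=0.002, sigma=0.005, sigma_v=0.05)
```

Threshold decisions u_k = 1{X̂_k ≥ θ} taken on the filtered estimate are less likely to be triggered (or missed) by a single noisy reading than decisions taken on Y_k directly.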
4.4.4. Statistical Validation
This subsection confirms the robustness of the simulation results using confidence intervals.
The standard deviation of LCC across simulation runs is approximately X, which yields a standard error of X/√N, where N is the number of simulated paths. This corresponds to less than 1.5% of the mean LCC, which indicates that the Monte Carlo estimates are statistically reliable.
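The standard error and an approximate 95% confidence interval for the mean LCC can be computed as in the generic sketch below. It uses the normal approximation (z = 1.96), which is reasonable for the large numbers of paths used here; the function name is hypothetical.

```python
import numpy as np


def mc_summary(samples, z=1.96):
    """Mean, sample standard deviation, standard error, and normal-approximation CI."""
    s = np.asarray(samples, dtype=float)
    mean = s.mean()
    sd = s.std(ddof=1)                    # sample standard deviation
    se = sd / np.sqrt(s.size)             # standard error of the mean
    return {"mean": mean, "sd": sd, "se": se,
            "ci": (mean - z * se, mean + z * se),
            "rel_se": se / abs(mean)}     # compare against the 1.5% criterion


summary = mc_summary([1.0, 2.0, 3.0, 4.0, 5.0])
print(summary["se"])
```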
4.5. Sensitivity Analysis
To evaluate the robustness of the proposed maintenance policy and identify the key drivers of decision-making, a comprehensive sensitivity analysis was conducted. The analysis combines both local (elasticity-based) and global (variance-based Sobol indices) approaches. Additionally, the impact of measurement uncertainty and Digital Twin implementation is examined within the same framework.
4.5.1. Local and Global Sensitivity Results
Local sensitivity (elasticity) measures the relative change in the optimal threshold θ* with respect to each parameter p:

$$E_p = \frac{\partial \theta^{*}}{\partial p}\cdot\frac{p}{\theta^{*}}$$
Results (presented in Table 5 and Figure 5) indicate that the optimal maintenance threshold is particularly sensitive to failure cost, stochastic volatility, and performance-related parameters. In contrast, parameters such as maintenance efficiency and degradation drift have a comparatively smaller influence.
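The elasticity E_p can be estimated numerically with a central finite difference around the nominal parameter values. In the sketch below, `f` is a hypothetical callable that returns θ* for a given parameter dictionary (in the study this would be the full Monte Carlo optimization); the power-law stand-in in the example is an assumption chosen only because its elasticities are known in closed form.

```python
def elasticity(f, params, name, rel_step=0.01):
    """Central-difference estimate of E_p = (d theta*/d p) * p / theta*."""
    p0 = params[name]
    h = rel_step * p0
    up = dict(params, **{name: p0 + h})
    down = dict(params, **{name: p0 - h})
    derivative = (f(up) - f(down)) / (2.0 * h)
    return derivative * p0 / f(params)


# Hypothetical stand-in for theta*(params) with known elasticities -0.3 and -0.5
def toy_theta_star(p):
    return 0.5 * p["c_f"] ** -0.3 * p["sigma"] ** -0.5


nominal = {"c_f": 10.0, "sigma": 0.02}
e_cf = elasticity(toy_theta_star, nominal, "c_f")
```

When `f` is itself a Monte Carlo estimate, the perturbed and nominal runs should share the same random seed (common random numbers); otherwise sampling noise can swamp the small differences produced by a 1% step.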
The global sensitivity analysis (Table 6, Figure 6) confirms these findings. The first-order and total Sobol indices reveal that failure cost and stochastic volatility together account for more than 50% of the total variance in the optimal threshold. This demonstrates that uncertainty and economic risk are the dominant factors shaping maintenance decisions.
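For readers who want to reproduce this type of analysis, the sketch below estimates first-order and total Sobol indices with the standard pick-freeze estimators (Saltelli 2010 for S1, Jansen for ST). It assumes independent, uniformly distributed inputs on given bounds and a vectorized model `f` mapping an (n, d) input array to n outputs; in the study `f` would return θ* for each sampled parameter vector, whereas the example uses a hypothetical linear function whose indices are known exactly (S1 = ST = 0.9 and 0.1).

```python
import numpy as np


def sobol_indices(f, bounds, n=20_000, seed=0):
    """First-order (S1) and total (ST) Sobol indices via pick-freeze estimators."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = lo.size
    A = lo + (hi - lo) * rng.random((n, d))
    B = lo + (hi - lo) * rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    s1, st = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                               # A with column i taken from B
        fABi = f(ABi)
        s1[i] = np.mean(fB * (fABi - fA)) / var          # Saltelli (2010)
        st[i] = 0.5 * np.mean((fA - fABi) ** 2) / var    # Jansen
    return s1, st


s1, st = sobol_indices(lambda X: 3.0 * X[:, 0] + X[:, 1],
                       [(0.0, 1.0), (0.0, 1.0)], n=200_000, seed=1)
```

The estimator needs (d + 2)·n model evaluations, so with an expensive Monte Carlo inner loop a surrogate model or a dedicated library such as SALib is usually preferable.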
Observation: Economic and risk parameters dominate technical degradation rates in determining θ*.
Insight: Failure cost and stochastic volatility explain over 50% of the variance of θ*, while technical parameters have a smaller influence. These results should be interpreted within the considered parameter space and model assumptions.
4.5.2. Interpretation of Results
The combined results from local and global sensitivity analyses provide several important insights.
First, economic factors—especially failure cost—play a decisive role in determining the optimal maintenance strategy. Higher failure costs strongly incentivize earlier maintenance interventions, as the system prioritizes risk avoidance over cost savings from delayed actions.
Second, stochastic volatility has a greater impact than the degradation rate itself. This indicates that uncertainty in the degradation process is more influential than its average trend. In practical terms, systems with higher variability require more conservative maintenance policies.
Third, the relatively lower sensitivity to degradation drift suggests that efforts focused solely on slowing physical wear may not significantly improve overall system performance if uncertainty remains high.
4.5.3. Digital Twin and Measurement Uncertainty
The impact of measurement uncertainty and Digital Twin implementation is summarized in Table 7. The results show that the benefits of Digital Twin-based estimation increase with sensor noise. As measurement noise grows, decisions based on raw observations become less reliable, leading to higher costs and reduced operational efficiency. A Digital Twin, through state estimation (e.g., filtering techniques), significantly improves decision accuracy, resulting in measurable economic gains. This effect is particularly pronounced in high-uncertainty environments, where improved information quality translates directly into better maintenance timing and reduced failure risk.
Perturbing the key parameters by ±10% changes the optimal threshold θ* by less than 10%, confirming the structural stability and practical applicability of the threshold policy.
4.5.4. Managerial and Practical Implications
From a practical perspective, the results suggest that maintenance optimization should focus not only on physical degradation processes, but also on uncertainty reduction and cost management.
In particular:
Reducing process variability can have a greater impact than slowing degradation;
Improving data quality (e.g., via Digital Twin solutions) enhances decision effectiveness;
Accurate estimation of failure-related costs is critical for setting optimal policies.
4.5.5. Robustness of the Optimal Policy
To further assess robustness, key parameters were perturbed within a ±10% range, including failure cost c_f, stochastic volatility σ, and maintenance efficiency η. For each perturbation, the optimal threshold was recomputed using the same Monte Carlo framework.
The results show that the relative change in the optimal threshold remains limited (Table 8). In particular, a ±10% variation in failure cost results in a change of approximately 4–6% in θ*, while similar perturbations in stochastic volatility lead to threshold variations of about 6–8%. Changes in maintenance efficiency have a smaller effect, typically below 3%.
Overall, the variation in the optimal threshold remains below 10% across all tested scenarios, confirming that the proposed policy is structurally robust to moderate parameter uncertainty.
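The ±10% perturbation study follows a simple one-at-a-time pattern, sketched below. As in the elasticity example, `f` is a hypothetical callable returning θ* for a parameter dictionary, and the power-law stand-in exists only so that the result can be checked by hand.

```python
def threshold_robustness(f, params, names, delta=0.10):
    """Relative change of theta* when each parameter is scaled by (1 -/+ delta)."""
    base = f(params)
    table = {}
    for name in names:
        changes = []
        for sign in (-1.0, 1.0):
            perturbed = dict(params, **{name: params[name] * (1.0 + sign * delta)})
            changes.append((f(perturbed) - base) / base)
        table[name] = tuple(changes)          # (change at -delta, change at +delta)
    return table


def toy_theta_star(p):                        # hypothetical stand-in for theta*
    return 0.5 * p["c_f"] ** -0.5 * p["sigma"] ** -0.3


table = threshold_robustness(toy_theta_star, {"c_f": 10.0, "sigma": 0.02}, ["c_f", "sigma"])
worst = max(abs(v) for pair in table.values() for v in pair)
print(worst < 0.10)  # True: below the 10% robustness criterion
```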
4.5.6. Comparison with Baseline Strategies
The comparison presented in Table 9 highlights the trade-offs between different maintenance strategies in terms of performance, cost, and implementation requirements. The optimized threshold policy provides a favorable balance, achieving high operational performance and low cost while maintaining moderate data and implementation requirements compared to more complex data-driven approaches.
5. Model Validation
Validation ensures that the proposed predictive maintenance framework is credible, realistic, and applicable in industrial scenarios. Four main aspects are assessed: stochastic degradation, threshold strategy, Digital Twin decision-making, and multi-machine performance. Robustness and sensitivity to key parameters are also evaluated. The validation of the proposed framework is summarized in Tables 10–14, which present the results for the degradation model, threshold strategy, Digital Twin-based decision-making, multi-machine performance, and sensitivity analysis, respectively. A structured validation framework is adopted, where each component of the model is evaluated using consistent criteria, including model assumptions, input data, evaluation metrics, and observed outcomes.
5.1. Validation of the Degradation Process
To build an effective predictive maintenance strategy, it is crucial to accurately model how machinery wears over time. The stochastic degradation model is designed to capture both the average trend and random fluctuations in wear. Such stochastic degradation representations are consistent with widely used approaches in reliability engineering and predictive maintenance literature.
5.2. Validation of the Threshold Strategy
Understanding the optimal point to perform maintenance is essential for balancing cost and reliability. This section examines whether the analytically derived threshold achieves that balance in practice.
5.3. Digital Twin-Based Decision-Making Under Noisy Observations
In real industrial environments, sensor measurements are never perfectly accurate. This section evaluates how a Digital Twin estimator can improve decision-making under measurement noise.
5.4. Multi-Machine Validation Under Resource Constraints
Industrial systems rarely consist of a single machine. This section tests whether the proposed approach can handle multiple assets and limited maintenance capacity while maintaining overall performance.
5.5. Sensitivity and Robustness
Even the best threshold can fail if input parameters vary unexpectedly. Here, we examine how sensitive the optimal threshold is to realistic fluctuations in key parameters.
5.6. Summary and Practical Implications
The validation of the proposed model confirms that it accurately reproduces realistic degradation behavior, and that the threshold strategies remain effective in balancing cost and reliability. Incorporating a Digital Twin further enhances decision quality, and the approach proves robust and applicable in multi-machine settings even with limited resources. While the validation assumes independent degradation processes and relatively simple cost structures, real-world systems may involve interactions such as shared loads or environmental effects, which should be addressed in future work. The validated model provides a reliable, interpretable, and implementable framework for predictive maintenance under uncertainty, demonstrating its practical value for Industry 4.0 applications. Overall, the validation results confirm both the internal consistency of the model and its practical applicability under realistic industrial conditions.
6. Results and Discussion
The results of this study provide several important insights into how maintenance decisions should be designed in modern industrial systems operating under uncertainty. Rather than focusing solely on numerical outcomes, this section interprets the findings in relation to existing research and practical decision-making in Industry 4.0 environments.
6.1. Rethinking Threshold-Based Maintenance
The analysis confirms that threshold-based maintenance policies are not only mathematically convenient but also highly practical. Their effectiveness lies in their ability to translate complex stochastic dynamics into simple and actionable decision rules. This finding is consistent with classical studies on maintenance optimization (e.g., [41]), which show that optimal policies often take threshold forms. However, the present study extends this perspective by embedding the threshold decision within a broader framework that simultaneously considers operational performance (OEE) and economic consequences (LCC). What emerges is a more realistic view of maintenance: decisions are not driven purely by technical degradation, but by a balance between performance losses and economic risk.
6.2. The Dominant Role of Uncertainty
One of the most striking results is the strong influence of uncertainty on maintenance decisions. While degradation rate plays a role, it is the variability of the process and the cost of failure that primarily shape optimal strategies. This suggests a shift in perspective. Traditionally, maintenance improvement focuses on slowing down physical wear. However, the results indicate that reducing uncertainty, through better monitoring, more stable processes, or improved data quality, can be equally, if not more, impactful. This observation resonates with the growing importance of data quality in Industry 4.0 systems [30]. In practice, it means that investments in sensing, data integration, and analytics may yield higher returns than purely mechanical improvements.
Lowering process variability is not trivial in practice, as it may require improvements in process control, operating conditions, or overall system stability. However, advances in monitoring technologies, data analytics, and control systems provide practical pathways to mitigate uncertainty and enhance decision-making effectiveness.
6.3. Digital Twin as a Decision-Enabling Technology
The inclusion of a Digital Twin significantly improves decision quality by reducing uncertainty in the observed machine state. This is particularly important in real-world environments, where measurements are noisy and incomplete. From a theoretical standpoint, this reflects a well-known principle in stochastic control: better state estimation leads to better decisions. However, the results provide a concrete quantification of this effect in a maintenance context. Importantly, the Digital Twin should not be viewed merely as a monitoring tool, but as a decision-enabling technology. By improving the quality of information, it directly affects economic outcomes and system performance. This supports recent findings in Industry 4.0 research [42], where Digital Twins are increasingly seen as central elements of smart manufacturing systems.
6.4. From Single Machines to Real Systems
When moving from a single machine to a multi-machine environment, the nature of the problem changes significantly. Maintenance decisions are no longer independent, but must be coordinated under resource constraints. The results show that simple threshold policies must be complemented with prioritization mechanisms. This reflects the reality of industrial systems, where maintenance teams, time, and tools are limited. The effectiveness of priority-based scheduling confirms earlier observations in maintenance optimization literature [43], but also demonstrates that such approaches can be naturally integrated within the proposed stochastic framework.
6.5. Robustness and Practical Relevance
A key advantage of the proposed approach is its robustness. The optimal threshold remains stable under moderate (±10%) perturbations of the key parameters, which is critical for real-world applications where exact parameter estimation is difficult. Equally important is interpretability. Unlike many data-driven or AI-based approaches, the proposed model provides transparent decision rules, which makes it easier to implement in practice and increases trust among decision-makers.
The main contribution of this work lies in integrating three perspectives that are often treated separately: physical degradation (stochastic modeling), operational performance (OEE), and economic impact (LCC). By combining these elements, the study moves closer to how real industrial decisions are made. Maintenance is no longer an isolated technical problem, but part of a broader operational and economic system.
7. Practical Validation: Case Study from a Dairy Processing Plant
Although the proposed framework is evaluated using simulation-based methods, the case study is designed to reflect realistic industrial conditions. The model parameters are based on empirical data and commonly reported values in industrial practice, ensuring that the results are representative of real-world systems. This approach bridges the gap between theoretical modeling and practical implementation.
To evaluate the practical applicability of the proposed framework, a validation study was conducted using a dairy processing plant as a reference case. The analysis focused on a UHT milk production line, where a high-pressure homogenizer represents a critical asset whose failure leads to immediate production stoppage and potential product losses.
The model parameters were selected to reflect realistic industrial conditions reported in the literature. Maintenance costs and downtime are widely recognized as major contributors to operational losses in manufacturing systems. Furthermore, typical Overall Equipment Effectiveness (OEE) values observed in industrial practice range between 60% and 70%, with higher values indicating highly efficient systems. These findings justify the adopted parameter ranges and performance benchmarks.
The degradation process was parameterized using
,
, and
, reflecting relatively intensive and variable operating conditions typical for industrial processing systems. The use of stochastic degradation models with drift and diffusion components is well established in predictive maintenance and reliability literature [17,19].
Economic parameters were defined as follows: preventive maintenance cost c_m (PLN), failure cost c_f (PLN), and downtime cost c_d (PLN per hour). Such cost structures, where failure and downtime costs significantly exceed preventive maintenance costs, are consistent with observations in maintenance studies [44,45]. The impact of degradation on production efficiency was captured using the sensitivity parameters α and β, reflecting the relationship between equipment condition, performance, and quality within the OEE framework [46,47].
Empirical reference data collected over a six-month period indicated an average OEE of approximately 0.78, 18 failures, and a monthly operational cost of around 320,000 PLN. This cost includes preventive maintenance (c_m), breakdown (c_f), and downtime (c_d) costs, as well as utility and energy consumption, personnel, and other operating costs.
The assumptions and empirical reference data are presented in Table 15.
Monte Carlo Simulations—Threshold Optimization
A Monte Carlo simulation with 10,000 degradation trajectories was conducted to evaluate threshold-based maintenance policies. The results confirmed the existence of an optimal maintenance threshold θ*, consistent with the theoretical properties of threshold-based maintenance strategies [41]. Under the optimal policy, the model achieved an OEE of approximately 0.83, reduced the total monthly cost to about 285,000 PLN, and lowered the number of failures from 18 to approximately 10. This improvement was achieved at the cost of more frequent preventive maintenance actions, reflecting a shift toward a proactive, condition-based strategy. Such behavior is consistent with predictive maintenance principles, which aim to reduce unplanned downtime and improve system reliability. A comparison of system performance before and after threshold optimization is presented in Table 16 and Figure 7.
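The relative improvements implied by the rounded case-study figures above can be checked with a few lines of arithmetic. The snippet only restates the reported values (OEE 0.78 → 0.83, monthly cost 320,000 → 285,000 PLN, failures 18 → 10) and assumes that the before and after failure counts refer to the same reference period.

```python
def relative_change(before, after):
    """Signed relative change (after - before) / before."""
    return (after - before) / before


baseline = {"OEE": 0.78, "monthly_cost_PLN": 320_000, "failures": 18}
optimized = {"OEE": 0.83, "monthly_cost_PLN": 285_000, "failures": 10}

changes = {k: relative_change(baseline[k], optimized[k]) for k in baseline}
for k, v in changes.items():
    print(f"{k}: {baseline[k]} -> {optimized[k]} ({v:+.1%})")
# OEE: 0.78 -> 0.83 (+6.4%)
# monthly_cost_PLN: 320000 -> 285000 (-10.9%)
# failures: 18 -> 10 (-44.4%)
```

In other words, the 5-percentage-point OEE gain is a 6.4% relative improvement, and the monthly saving is 35,000 PLN.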
Additional experiments confirmed the sensitivity of the model to key parameters. In particular, an increase in failure cost resulted in earlier maintenance decisions, demonstrating the model's ability to capture economic trade-offs. Moreover, measurement noise reduced decision accuracy when raw sensor data were used. The introduction of Digital Twin-based state estimation improved performance, reducing costs and increasing OEE by approximately 10–15%, which is consistent with findings reported in Industry 4.0 and predictive maintenance studies [22,30]. The model was also validated in a multi-machine scenario involving five parallel machines and limited maintenance resources. A priority-based scheduling strategy maintained system-wide OEE above 0.90 with only a marginal increase in total cost. This supports the scalability of the framework and aligns with previous studies on maintenance optimization in complex systems.
Overall, the results demonstrate that the model produces realistic and practically meaningful outcomes. It effectively captures the trade-off between maintenance cost and failure risk, while supporting improved decision-making under uncertainty. The validation confirms that the proposed approach can serve as a reliable decision-support tool for maintenance optimization in Industry 4.0 production systems.
8. Conclusions, Limitations, and Future Work
This study developed and evaluated a comprehensive framework for optimizing maintenance strategies in production systems, accounting for stochastic degradation, measurement uncertainty, and operational constraints inherent in industrial environments. The proposed approach integrates a Wiener-process-based degradation model, production efficiency analysis via Overall Equipment Effectiveness (OEE), and economic evaluation through Life Cycle Cost (LCC). The linear-drift Wiener process, however, may not capture nonlinear degradation effects such as accelerated wear or multi-stage failure mechanisms, and future research should explore more advanced stochastic or data-driven models to better represent such dynamics.
The findings directly address the research questions (RQs) established in the introduction:
RQ1: How can an optimal maintenance policy be determined under stochastic degradation and partial observability?
RQ2: How do key parameters (e.g., failure costs, process variability, maintenance effectiveness) influence maintenance decisions and overall system performance?
RQ3: What is the impact of Digital Twin integration and resource constraints on the effectiveness of maintenance strategies in realistic industrial settings?
Key findings include:
Threshold-based maintenance strategies (RQ1) effectively balance preventive maintenance costs with the risk of machine failure. The optimal maintenance threshold is θ* ≈ 0.62H*, meaning that maintenance should be performed when the machine reaches roughly 62% of its maximum allowable wear H*.
Sensitivity analysis (RQ2) showed that the most influential factors for maintenance decisions are failure costs and the variability of the degradation process. Higher failure costs and greater process uncertainty favor more conservative interventions.
Digital Twin integration and resource-aware scheduling (RQ3) enhance decision accuracy and operational efficiency. Estimating the true machine state from noisy measurements improved decision-making by 15–20%. In multi-machine environments with limited maintenance resources, priority-based scheduling maintained high production efficiency (OEE > 90%) with only minimal increases in LCC.
Overall, integrating stochastic degradation models, Monte Carlo simulations, and Digital Twin technology provides a solid, adaptive framework for maintenance optimization. The results suggest that each research question has been addressed, supporting both the theoretical and practical usefulness of the proposed approach.
Despite the promising results, several limitations must be acknowledged:
Linear Degradation Assumptions: The reliance on a Wiener process with linear drift may not fully capture nonlinear wear phenomena, such as accelerated degradation phases or multi-stage failure mechanisms [48,49].
Prioritization Constraints: The multi-machine scheduling model primarily focused on wear levels and failure costs, potentially overlooking broader industrial factors like spare part logistics, technician skill sets, and complex technological dependencies.
Simplified Sensor Modeling: The Digital Twin component currently accounts for additive measurement noise but does not incorporate complex sensor behaviors such as drift, calibration errors, or intermittent signal loss.
Although the proposed framework has shown promising results, further research is needed to better assess its applicability in real industrial environments. Future studies could focus on developing more advanced degradation models that account for nonlinear wear processes and the influence of operating conditions and material quality. It is also important to validate the approach using real industrial data, which may include missing values, irregular measurements, and more complex sensor noise than in Monte Carlo simulations. Additionally, research could expand prioritization methods in multi-machine systems to consider production schedules, spare parts availability, and the skills of maintenance teams. Enhancing the modeling of Digital Twins, for instance by incorporating sensor drift or calibration errors, is another relevant direction that could better reflect real-world measurement conditions. Moreover, integrating the framework with adaptive maintenance strategies that respond to changing operational conditions in real time would be a valuable avenue. Such an approach could practically support decision-making in the context of Industry 4.0, improving operational efficiency while minimizing the risk of machine failures.
In summary, although the framework has certain limitations, it provides a strong foundation for more intelligent and adaptive maintenance management systems. These approaches can play a key role in transforming production systems toward Industry 4.0 while directly addressing the research questions posed at the outset.