An Optimal Opportunistic Maintenance Planning Integrating Discrete- and Continuous-State Information

Abstract: Information-driven group maintenance is crucial to enhance the operational availability and profitability of diverse industrial systems. Existing group maintenance models have primarily concentrated on a single health criterion upon maintenance implementation, where the fusion of multiple health criteria is rarely reported. However, this is not aligned with actual maintenance planning of multi-component systems on many occasions, where multi-source health information can be integrated to support robust decision making. Additionally, how to improve maintenance effectiveness through a scientific union of both scheduled and unscheduled maintenance remains a challenge in group maintenance. This study addresses these research gaps by devising an innovative multiple-information-driven group replacement policy for serial systems. In contrast to existing studies, both discrete-state information (hidden defect) and continuous degradation information are employed for group maintenance planning, and scheduled postponed maintenance and unscheduled opportunistic maintenance are dynamically integrated for the first time to mitigate downtime loss. To be specific, inspections are equally spaced to reveal system health states, followed by the multi-level replacement implemented when either (a) the degradation of the continuously degrading unit reaches a specified threshold, or (b) the age of the multi-state unit since the defect's identification reaches a pre-set age (delayed replacement). Such scheduling further enables the implementation of multi-source opportunistic replacement to alleviate downtime. The Semi-Markov Decision Process (SMDP) is utilized for the collaborative optimization of continuous- and discrete-state thresholds, so as to minimize the operational costs. Numerical experiments conducted on the critical structure of circulating pumps verify the model's applicability.


Introduction
Preventive maintenance is a critical element of asset health management, which has a significant impact on ensuring the operational reliability and availability of industrial plants [1,2], as well as enhancing their profitability [3,4]. According to the foundation of maintenance decision making, preventive maintenance can be substantially partitioned into two types, time-based maintenance (TBM) and condition-based maintenance (CBM) [5,6]. In recent decades, with the rapid advancement of sensor and data processing technology, CBM has attracted considerable attention, with a series of field applications in areas such as navigation, rail transit, and advanced manufacturing [7,8]. The core principle of CBM is to collect multiple types of health information to support data-driven maintenance decision making, with full consideration of the system structure, operational conditions, and failure mechanisms [9,10].
Inspection, as a fundamental part of CBM, provides crucial supporting information for subsequent repair/spare part decisions [11,12]. Typically, there are two main types of health information that can be identified through inspections [13]. The first type of health data is continuous degradation information, which includes measurements such as blade crack length, wear accumulation of rotating machinery, capacity reduction of lithium batteries, and parameter drift of electronic devices [14]. The second type is discrete health status, which is typically observed in multi-state plants. Among these, the most commonly encountered pattern during inspection activities is two-stage deterioration, where two successive and random stages occur before ultimate failure [15,16]. The former stage is called the defect initialization stage, while the latter stage is called the defect expansion stage, whose duration is also called the delay time [17,18]. Typical defects include dents, holes, cracks, out-of-control quality, etc., which broadly exist in industrial plants such as in bearings, pumps, medical equipment, and manufacturing lines [19]. The defective state, analogous to continuous degradation, is usually non-fatal and hidden, and its effective identification and removal are closely related to inspection activities [20,21]. This highlights the importance of scheduling and optimizing inspection plans, which significantly affects maintenance performance [22,23].
Despite the extensive research into CBM planning of critical components degrading either continuously or discretely, the existing literature on group maintenance models has primarily focused on a single failure mode, either degradation or sudden failure [24,25]. There remains a lack of a unified group maintenance framework that examines the interaction effect between two separate failure modes [26]. For instance, Gopalan et al. [27] and Malefaki et al. [28] studied degradation analysis and condition-based maintenance modeling approaches for two-component systems subject to gradual degradation. Zhang [29] studied the optimal maintenance decision with regard to group maintenance of a two-component system subject to sudden failure, with application to petrochemical enterprises. Wang et al. [6,30] extended the single failure mode to competing failures, either degradation-based or shock-induced. However, scheduled group maintenance was not considered in their work since shock-induced failure does not hinge on inspection outcomes. Xu et al. [15] investigated the group maintenance optimization of generalized multi-component systems subject to imperfect inspections, which was also limited to the degradation-based failure mode.
After conducting a detailed literature review, we identified four significant research gaps that remain to be addressed. Firstly, the majority of current group maintenance models make maintenance decisions solely based on a single type of health information, either accumulated degradation or hidden failure/defect. There is a scarcity of group maintenance frameworks that sufficiently utilize both continuous and discrete health information to (a) improve the robustness of maintenance decision making and (b) enhance system effectiveness [31]. However, in realistic industrial systems such as wind turbines, gas pumps, and aircraft, both continuous degradation information and discrete-state information (for instance, defect information) are extensively seen through condition monitoring of crucial mechanical/electronic components. It is crucial to schedule group maintenance with sufficient consideration of both types of information arising from different failure modes, so as to develop more effective maintenance strategies [32,33]. Secondly, few studies have investigated how inspection information can be used to integrate both (a) scheduled group maintenance (GM) and (b) unscheduled opportunistic maintenance (OM), with the goal of reducing downtime loss. Given that preventive actions for system components may vary depending on their health conditions, an effective integration of GM and OM is essential to capture the interaction between separate maintenance activities and to alleviate maintenance downtime losses to the maximum extent [34,35]. Thirdly, most group maintenance models assume immediate maintenance, particularly for two-component systems [36]. This may be a sub-optimal option for multi-state plants since the potential of the remaining lifetime is not sufficiently exploited, leading to over-maintenance [37,38]. Also, immediate maintenance is unable to provide extra chances for opportunistic maintenance [39,40].
Fourthly, there is no high-efficiency algorithm to deal with such constrained maintenance interaction problems. The commonly adopted renewal-reward theory, although facilitating single-component maintenance modeling, confronts renewal asynchrony challenges when applied to the OM-GM interaction model [41]. Simulation algorithms such as Monte Carlo, on the other hand, are challenged by the computational burden and limited model interpretability.
To address the foregoing research gaps, we innovatively devised an inspection-driven, multi-source group maintenance policy for serial systems subject to both continuous and discrete deterioration processes. As opposed to previous studies, the information about (a) continuous degradation accumulation and (b) discrete early warning signals is incorporated to support the implementation of mutually interacted maintenance actions such as opportunistic maintenance and delayed maintenance. Through such information interaction, the robustness and timeliness of group maintenance policies can be sufficiently ensured, which ultimately improves the system service availability. Moreover, we are the first to schedule defect-induced postponed maintenance within group condition-based maintenance models, which allows preventive maintenance to be postponed to future inspection windows upon defect identification. As such, delayed replacement can be integrated with degradation-based replacement to constitute cost-effective group maintenance planning that sufficiently shares the set-up cost. Furthermore, we innovatively extract multiple maintenance opportunities from (a) the corrective replacement of both units, (b) the threshold-based replacement of degrading units, and (c) the delayed replacement of multi-state units. Through the scheduling of such multi-source opportunistic maintenance, the economic and structural dependence of the entire system can be fully harnessed to effectively mitigate system downtime. To the best of our knowledge, this is the first effort to integrate delayed maintenance and opportunistic maintenance within group maintenance models, which significantly promotes system profitability and availability through (a) multiple-source information fusion and (b) dynamic maintenance interaction.
Ultimately, we devised a high-efficiency optimization algorithm for such multi-variable problems that leverages the Semi-Markov Decision Process to solve the model convergence and computation burden problems caused by conventional renewal theories. The applicability of the proposed model was validated by numerical experiments on circulating pump systems experiencing both crack propagation and corrosive pitting processes.
To summarize, this study contributes to group maintenance optimization from the following four perspectives:
• Constructing an innovative group maintenance framework integrating both continuous and discrete health information, which dynamically integrates (a) opportunistic maintenance and (b) delayed maintenance to significantly enhance maintenance effectiveness and availability;
• Allowing defect removals to be postponed so as to (a) exploit the remaining lifetime potential and (b) offer extra chances for the selection and implementation of cost-effective opportunistic maintenance;
• Scheduling multiple types of opportunistic maintenance arising from (a) threshold-based replacement, (b) delayed replacement, and (c) corrective maintenance to sufficiently control system downtime and enhance decision-making robustness;
• Realizing the high-efficiency optimization of maintenance interaction problems via the Semi-Markov Decision Process, and demonstrating the model's applicability through numerical experiments on a circulating pump.
The remainder of this paper is structured as follows. Section 2 introduces the basic problem with regard to the system structure and unit failure mechanism. Section 3 designs the inspection-based maintenance policy. Section 4 formulates the maintenance model under the SMDP framework. Section 5 illustrates the applicability via a numerical experiment on a circulating pump. Section 6 concludes the paper and lists some possible extensions.

Problem Description
This paper is mainly divided into four parts: degradation modeling, replacement policy, cost modeling, and optimization algorithm. Figure 1 is the research framework of this paper. Starting from this section, the optimal replacement policy for two-unit series systems considering discrete and continuous degradation information will be studied following the research ideas shown in the framework.
Consider a deteriorating system that consists of two critical units connected in series, in that failure of each unit leads to an immediate system failure. The two units possess independent deterioration mechanisms throughout their lifetime. Unit 1 is continuously deteriorating with observable degradation trajectories through inspections. Such mechanisms are widely seen in diverse industrial plants, such as fatigue crack propagation, battery capacity reduction, and wear accumulation [27]. Unit 1 is deemed as failed when the accumulated degradation attains a pre-set threshold $D$ ($D > 0$) according to industrial standards or safety constraints. Unit 2 is a multi-state unit that encounters one or more non-fatal transition states prior to ultimate failure, which can be viewed as effective early-warning signals supporting timely preventive maintenance. The hidden health information of both units can only be identified by inspections, whose diagnosing outcomes support group maintenance decision making.
In this study, the degradation process of Unit 1 is characterized by the Wiener process. The Wiener process is a widely used stochastic process that captures non-monotone degradation behaviors, owing to its good mathematical properties and physical interpretability [8].
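Accordingly, the underlying degradation process $X(t)$ is formulated as $X(t) = \mu t + \sigma W(t)$, where $\mu$ is the drift coefficient, $\sigma$ represents the diffusion coefficient, and $W(t)$ represents the standard Brownian motion. A prominent property of such a process is that the degradation increments over disjoint intervals are independent, and the increment over an interval of length $\Delta t$ follows a normal distribution with mean $\mu \Delta t$ and variance $\sigma^2 \Delta t$; the first passage time to a fixed threshold, in turn, follows an inverse Gaussian distribution. Notably, other forms of stochastic processes, such as random walk and Gamma processes, are also applicable without model restriction.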
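To make the degradation model concrete, the following minimal sketch simulates the degradation level of Unit 1 at equally spaced inspections, assuming the standard linear-drift form $X(t) = \mu t + \sigma W(t)$ with $X(0) = 0$. The function names, the parameter values in the usage line, and the threshold are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def simulate_wiener_path(mu, sigma, dt, n_steps, seed=0):
    """Return [X(0), X(dt), ..., X(n_steps*dt)] for X(t) = mu*t + sigma*W(t)."""
    rng = random.Random(seed)
    path = [0.0]  # assumption: the unit starts as good as new
    for _ in range(n_steps):
        # Independent Gaussian increment with mean mu*dt and std sigma*sqrt(dt).
        increment = rng.gauss(mu * dt, sigma * math.sqrt(dt))
        path.append(path[-1] + increment)
    return path


def first_inspection_exceeding(path, threshold):
    """Index of the first inspection whose level is >= threshold, else None."""
    for k, level in enumerate(path):
        if level >= threshold:
            return k
    return None


# Illustrative values only (hypothetical, not the paper's estimates).
path = simulate_wiener_path(mu=1.0, sigma=0.5, dt=1.0, n_steps=50)
print(first_inspection_exceeding(path, threshold=20.0))
```

Because the path is only observed at inspection epochs, the sketch reports the first inspection at which the threshold is found to be exceeded, not the exact crossing time.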
On the other hand, the deterioration process of Unit 2 is specified as two-phase deterioration, due to its generality and representativeness in inspection-based maintenance [40]. Such a process usually defines a non-fatal, identifiable defective state, during which the unit remains operational but experiences significantly higher malfunction risk. In other words, the random sojourn of the defect propagation process (stage $b$) is statistically smaller than that of the defect initialization process (stage $a$). Representative instances of such defects include dents, holes, stripping, over-vibration, overheating, out-of-control quality, etc.
Obviously, inspections are crucial and fundamental preventive maintenance activities, as they report the hidden health state (either continuous or discrete) of both units, which supports timely, state-driven maintenance planning. In the following sections, we use the inspection outcomes to devise, formulate, and then optimize the group-level condition-based maintenance policy.

Maintenance Planning
The core focus of the maintenance policy is to minimize the operational cost of the entire system by considering the following factors: (a) system structure, (b) unit failure behavior, and (c) combinations of maintenance activities. In particular, for a serial system with both structural dependency and economic dependency, group maintenance is more cost-effective than individual maintenance. Group maintenance allows for the sharing of set-up costs and the utilization of unavoidable maintenance downtime. In the remainder of this section, we outline the approach for scheduling an inspection-based group maintenance policy that captures unit dependencies.

Basic Assumptions
In order to clarify the maintenance policy, some basic assumptions with regard to failure characteristics, operational conditions, and maintenance activities are outlined, with proper justifications or interpretations.
(a) Units 1 and 2 are as good as new when initially put into use. In other words, the initial degradation accumulation of Unit 1 and the virtual age of Unit 2 are equal to 0. This is a common assumption used to simplify the maintenance problem, which can be easily relaxed [26];
(b) Inspections are instantaneous, non-destructive, and perfect. In other words, both the degradation severity (Unit 1) and the defective state (Unit 2) of the system can be accurately reported. This is a widely accepted setting since well-prepared inspections, in contrast to sensor-based condition monitoring, incur negligible measurement error [7];
(c) Maintenance, whether corrective, preventive, or opportunistic, returns the unit to as-good-as-new status. This is equivalent to the effect of spare part replacement. In the rest of the section, we use maintenance and replacement interchangeably;
(d) The time to execute maintenance activities is non-negligible. These activities require the stoppage of the entire system, and the downtime loss cannot be ignored.

Group Maintenance Scheduling
As addressed earlier, group maintenance is a more cost-effective selection for multi-component systems compared with individual maintenance [5,42], due to its capacity to (a) adequately harness unavoidable downtime and (b) save set-up and personnel costs. On the other hand, for a multi-phase plant with a defective state, it is suggested to postpone preventive maintenance when the defect is revealed, instead of executing it immediately. Such postponement ensures a sufficient exploitation of the remaining lifetime potential.
In this study, we designed a novel inspection-based, multi-dimensional maintenance policy, which is the union of four types of mutually interacting maintenance activities: (a) threshold-centered replacement (TR), (b) delayed replacement (DR), (c) opportunistic replacement (OR), and (d) corrective replacement (CR). In particular, opportunistic replacement is integrated with the other three types of activities to form maintenance groups, so as to enhance maintenance efficiency. Also, the provision of postponed maintenance offers more space and flexibility for opportunistic maintenance. The specific scheme of the maintenance policy is outlined below, and a specific situation of integrated maintenance activities is presented in Figure 2.
• Opportunistic replacement (OR). OR is available for both units, whose acceptance hinges on the entire system state. Here, two situations are possible:
(a) OR of Unit 1. If Unit 2 fails unexpectedly due to defect evolution or is due for replacement, Unit 1 is offered an extra chance for OR. Specifically, if the degradation accumulation exceeds $L$, the chance is accepted and a cost $C_o^2$ is incurred; otherwise, no action is taken;
(b) OR of Unit 2. If the degradation of Unit 1 exceeds the pre-set replacement threshold while Unit 2 is waiting for DR, Unit 2 is offered a chance for OR. To be specific, if the time elapsed since the defect identification exceeds $VT$, $1 \le V \le H$, the chance for OR is accepted and a cost $C_o^1$ is incurred; otherwise, no action is taken.
Remark. The sufficient interaction and integration of OR with TR and DR addressed in this maintenance policy is an effective and robust way to exploit unit dependencies (structural dependency and economic dependency), so as to enhance operational profitability by minimizing system downtime and downtime loss. For similar reasons, postponed maintenance is a cost-effective solution for multi-phase units/systems, as it (a) avoids excessive maintenance, (b) extends the remaining lifetime, (c) allows sufficient resource preparation, and (d) offers extra chances for OR.
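The interaction between the four replacement types can be sketched as a single decision rule evaluated at a decision epoch. The sketch below is only an illustrative reading of the policy: the function name, its arguments, and the string labels are hypothetical, $T$ denotes the inspection interval, $H$ the number of intervals by which delayed replacement is postponed, and the threshold values in the usage lines are placeholders.

```python
def decide_actions(x1, unit2_failed, defect_age, U, L, D, H, V, T):
    """Return (action_unit1, action_unit2) at a decision epoch.

    x1           : observed degradation level of Unit 1
    unit2_failed : True if Unit 2 has failed
    defect_age   : time since the defect of Unit 2 was identified, or None
    Each action is one of "CR", "TR", "DR", "OR", or None (no action).
    """
    # Scheduled and corrective triggers for Unit 1.
    if x1 >= D:
        a1 = "CR"
    elif x1 >= U:
        a1 = "TR"
    else:
        a1 = None

    # Scheduled and corrective triggers for Unit 2.
    if unit2_failed:
        a2 = "CR"
    elif defect_age is not None and defect_age >= H * T:
        a2 = "DR"
    else:
        a2 = None

    # Opportunistic replacement: one unit piggybacks on the other's stoppage.
    if a1 is None and a2 is not None and x1 >= L:
        a1 = "OR"
    if (a2 is None and a1 in ("CR", "TR")
            and defect_age is not None and defect_age >= V * T):
        a2 = "OR"
    return a1, a2


# Placeholder thresholds for illustration.
params = dict(U=19.6, L=17.1, D=22.8, H=9, V=7, T=1.0)
print(decide_actions(18.0, False, 9.0, **params))
print(decide_actions(20.0, False, 7.0, **params))
```

The second opportunistic check only reacts to "CR" or "TR" of Unit 1, so an opportunistic replacement of one unit never triggers an opportunistic replacement of the other.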

Model Formulation and Optimization
The objective of the maintenance model is to minimize the average cost per unit time. In this study, we solve the maintenance problem under the Semi-Markov Decision Process (SMDP) framework, which has been proven to be an efficient and stable analytical approach to renewal problems with generalized state sojourn times [9,37]. To this end, the one-step transition probabilities of each unit are calculated, based on which the system transition probabilities are derived. Then, the expected sojourn time and cost of the system are provided, and the maintenance strategy minimizing the cost is searched via a heuristic random search approach.

State Transition of Unit 1
We begin with the state transition of Unit 1. To this end, we first investigate its stochastic degradation behavior. Let $X(t)$ represent the degradation process of Unit 1. Then, the degradation trajectory, starting from brand-new status, is formulated as $X(t) = \mu t + \sigma W(t)$. It is well acknowledged that the Wiener process is an independent incremental process. Therefore, the transition from state $i$ to state $j$ within time $t$ is governed solely by the degradation increment over that period [...]. When Unit 1 is known to be operable at time [...], the one-step transition probability within a single inspection interval is written as [...], where the lower bound [...]. As aforementioned, replacement is required when the degradation of Unit 1 exceeds $U$ at an inspection. In particular, CR is immediate if the degradation exceeds $D$; otherwise, TR is immediate. Unit 1 will be restored to as-good-as-new status after replacement. Then, the one-step transition probabilities from state $i$ to state 0 are given as [...].
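As a rough illustration of how such one-step probabilities can be evaluated, the sketch below computes the probability that the degradation level falls into each of several bins at the next inspection, given its current level. It assumes the linear-drift Wiener model with Gaussian increments, looks only at the level at the next inspection (not at the path in between), and uses a hypothetical function name and bin layout; it is not the paper's exact discretization.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_step_probs(x, edges, mu, sigma, T):
    """P(X(t+T) in each bin | X(t) = x).

    Bins are (-inf, e0), [e0, e1), ..., [e_last, +inf) for sorted edges.
    The increment over T is Gaussian with mean mu*T and std sigma*sqrt(T).
    """
    mean = x + mu * T
    std = sigma * math.sqrt(T)
    cdfs = [0.0] + [normal_cdf((e - mean) / std) for e in edges] + [1.0]
    return [hi - lo for lo, hi in zip(cdfs, cdfs[1:])]


# With edges [U, D], the three bins correspond to "no scheduled replacement",
# "TR" and "CR" at the next inspection (illustrative values only).
print(one_step_probs(x=18.0, edges=[19.6, 22.8], mu=0.741, sigma=0.012, T=1.0))
```

A finer grid of edges below $U$ would give the transition probabilities between discretized operating states in the same way.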

State Transition of Unit 2
Let [...] denote the survival function of the defective stage. Moreover, the probability of Unit 2 transitioning from the normal state to the defective state is [...]
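For intuition, the state transition of a two-stage unit over one inspection interval can be evaluated numerically as follows. This is a sketch under stated assumptions: both sojourns are taken to be Weibull distributed (as in the numerical experiment later in the paper), the unit is assumed to have been normal since its last renewal at time 0, and the function names and grid size are hypothetical.

```python
import math


def weibull_sf(t, scale, shape):
    """Weibull survival function P(sojourn > t)."""
    return math.exp(-((t / scale) ** shape)) if t > 0 else 1.0


def unit2_transition_probs(k, T, scale_a, shape_a, scale_b, shape_b, n_grid=2000):
    """Given Unit 2 is normal at the k-th inspection (age k*T), return
    (P(normal at next inspection),
     P(defective but still working at next inspection),
     P(failed before next inspection))."""
    start, end = k * T, (k + 1) * T
    r_start = weibull_sf(start, scale_a, shape_a)
    p_normal = weibull_sf(end, scale_a, shape_a) / r_start

    # Sum over small slices: the defect arrives in [lo, hi] and the
    # defective stage then survives until the next inspection.
    step = T / n_grid
    p_defective = 0.0
    for m in range(n_grid):
        lo, hi = start + m * step, start + (m + 1) * step
        arrival_mass = weibull_sf(lo, scale_a, shape_a) - weibull_sf(hi, scale_a, shape_a)
        mid = 0.5 * (lo + hi)
        p_defective += arrival_mass * weibull_sf(end - mid, scale_b, shape_b)
    p_defective /= r_start

    p_failed = 1.0 - p_normal - p_defective
    return p_normal, p_defective, p_failed


print(unit2_transition_probs(0, 1.0, 2.0, 1.0, 1.0, 1.0))
```

Working with differences of the survival function, instead of the density, keeps the sum well behaved when the shape parameter is below one and the density is unbounded near zero.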

System Transition Probability
As addressed, the two stochastic processes [...] represent the deterioration processes of both units. In this study, the SMDP framework is employed to solve the maintenance optimization problem. To this end, we first define the state set and action set of the system, as stated below.
• System state set. Denote the system operational state as [...], and let [...] represent that both units fail. Clearly, the entire system state set is the union of the above-mentioned scenarios, i.e., [...]. Similarly, the transition probability from state $(i, k, a)$ to state $(j, k+1, b)$ is [...]. Here, $(i, k, a)$ represents that at the $k$-th inspection point, component 1 is in state $i$ and component 2 is in a normal state; $(j, k+1, b)$ represents that at the $(k+1)$-th inspection point, component 1 is in state $j$ and component 2 is in a defect state.
Equations (9) and (10) represent the joint probability distribution function of the degradation of component 1 and the defect of component 2 within an inspection interval, respectively.
Moreover, when at least one unit requires maintenance at the decision point, the system transition probability is equal to 1, since the maintained unit is restored to as-good-as-new status after replacement. For instance, when Unit 2 is experiencing CR and Unit 1 takes the chance to execute OR, i.e., under action $(3,1)$, the state transition probability to the renewed state $(0, k+1, a)$ equals 1. Other system transition probabilities can be derived in a similar manner. In the following, we specify each state transition possibility with regard to system renewals, and derive the corresponding sojourn time and cost.

a. No maintenance action is taken
We first focus on the case of taking no maintenance action, which can be partitioned into the following two cases: (a) Unit 1 has not reached the TR threshold, and Unit 2 is working normally; (b) Unit 1 has not reached the OR threshold, and Unit 2 remains defective and is waiting for delayed maintenance. Specifically, the expected sojourn under the first scenario is [...]. The left-hand term indicates that Unit 2 will not fail in the next inspection cycle, and the right-hand term indicates that Unit 2 will fail randomly in the next inspection cycle. Likewise, the expected sojourn under the second scenario is [...]

b. Only Unit 1 is replaced
When Unit 1 is experiencing TR or CR, a decision is made on whether to execute OR on Unit 2. Here, the time to execute TR/CR is no less than that of OR, and thus, we omit the OR execution time. Notably, when a maintenance action is determined at $T_n$, the subsequent decision point $T_{n+1}$ is the maintenance completion time, as graphically represented in Figure 2. Thus, the expected sojourn time when only Unit 1 is correctively or preventively replaced is equal to [...]

c. Only Unit 2 is replaced
Analogously, when DR or CR is executed on Unit 2, we determine whether to execute OR on Unit 1. The expected sojourn times with respect to such a case, i.e., under actions $(0,1)$ and $(3,1)$, are given by [...]

d. Both components are replaced
It is possible for both units to undergo preventive replacement at an inspection, when (a) the degradation of Unit 1 lies between $U$ and $D$ and TR is immediate, and (b) the accumulated age of Unit 2 has reached the postponement threshold and replacement is immediate. In this case, i.e., under action $(2,2)$, the expected sojourn time is the maximum of these two preventive replacement times.
The sojourn times when one unit experiences preventive replacement while the other experiences corrective replacement are given by

Expected Cost
The maintenance cost arises due to the execution of inspections as well as replacements. Let [...] denote the expected cost incurred when action $A$ is taken. As mentioned in Section 3, the maintenance cost can be divided into several components, including inspection cost, corrective maintenance cost, preventive maintenance cost, and opportunistic maintenance cost. Analogous to Section 4.4, there are four scenarios that need to be discussed.

a. No maintenance action is taken
Clearly, when no maintenance action is taken, the only cost incurred is the inspection cost upon decision making. Recall that Unit 2 can be either normal or defective when inspected. Therefore, the expected cost can be constructed as

b. Only Unit 1 is replaced
When Unit 1 is experiencing TR or CR, Unit 2 is also offered a chance to execute OR. Depending on whether OR is accepted, four situations are possible, e.g., action $(1,3)$: [...]

Solution Procedure
The proposed maintenance model contains multiple dependent decision variables, which are difficult to solve via analytical approaches. To this end, we propose a heuristic random search approach under the framework of the Ant Colony Algorithm. Ant Colony Optimization, initially proposed by Marco Dorigo, is a high-efficiency probabilistic evolutionary algorithm employed to find optimal paths in graphs [43]. Its inspiration comes from the way ants find paths while searching for food. As they move, ants deposit pheromones, whose concentration gradually decreases as the distance of movement increases. Therefore, the concentration of pheromones is often the strongest around the nest or food, and ants choose their direction according to the pheromones. The main procedure of the optimization approach is outlined below:


Step 1. Initialize the parameters, including the degradation coefficients of Unit 1 and Unit 2, and the time and cost needed to execute CR, TR, DR, and OR. Initialize the number of ants $W$, the information heuristic factor $\alpha$ ($\alpha \ge 0$), the expectation heuristic factor $\beta$ ($\beta \ge 0$), the objective function $g$, and the number of iterations $N$. Initialize the solution set $(H^*, V^*, U^*, L^*)$;

Step 2. Put the ant starting point in the current solution set. Each ant moves to the next point $j$ with probability $P_{ij}$, and vertex $j$ is then placed in the current solution set;

Step 3. Calculate the target value function $g(H^*, V^*, U^*, L^*)$ for each ant;

Step 4. Modify the trajectory strength by updating the process [...];

Step 5. Reduce the number of iterations by one, i.e., $N \leftarrow N - 1$.
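The steps above can be sketched as a simplified ant-colony-style search over a discretized grid of the four decision variables. This is a hedged illustration, not the authors' implementation: the pheromone is stored per candidate value of each variable, the expectation heuristic factor is omitted (the heuristic desirability is taken as 1), the constraints $V \le H$ and $L \le U$ are an assumed feasibility rule, and `toy_cost` is a hypothetical stand-in for the SMDP average cost $g(H, V, U, L)$.

```python
import random


def ant_colony_search(objective, candidates, is_feasible, n_ants=20,
                      n_iter=40, alpha=1.0, rho=0.1, seed=0):
    """Minimize a non-negative objective over a grid of candidate values."""
    rng = random.Random(seed)
    pheromone = {name: [1.0] * len(vals) for name, vals in candidates.items()}
    best_solution, best_cost = None, float("inf")
    for _ in range(n_iter):
        accepted = []
        for _ in range(n_ants):
            # Each ant picks one value per variable, biased by pheromone.
            picks = {}
            for name, vals in candidates.items():
                weights = [tau ** alpha for tau in pheromone[name]]
                picks[name] = rng.choices(range(len(vals)), weights=weights)[0]
            solution = {name: candidates[name][i] for name, i in picks.items()}
            if not is_feasible(solution):
                continue
            cost = objective(solution)
            accepted.append((cost, picks))
            if cost < best_cost:
                best_solution, best_cost = solution, cost
        # Evaporation, then deposit: cheaper solutions leave more pheromone.
        for name in pheromone:
            pheromone[name] = [(1.0 - rho) * tau for tau in pheromone[name]]
        for cost, picks in accepted:
            for name, i in picks.items():
                pheromone[name][i] += 1.0 / (1.0 + cost)
    return best_solution, best_cost


def toy_cost(s):
    """Hypothetical placeholder for the average cost rate g(H, V, U, L)."""
    return ((s["H"] - 9) ** 2 + (s["V"] - 7) ** 2
            + (s["U"] - 19.5) ** 2 + (s["L"] - 17.0) ** 2)


def feasible(s):
    """Assumed constraints: OR thresholds do not exceed scheduled ones."""
    return s["V"] <= s["H"] and s["L"] <= s["U"]


grid = {"H": list(range(1, 13)), "V": list(range(1, 13)),
        "U": [15.0 + 0.5 * i for i in range(15)],
        "L": [12.0 + 0.5 * i for i in range(15)]}
best, cost = ant_colony_search(toy_cost, grid, feasible)
print(best, cost)
```

In an actual application, `toy_cost` would be replaced by a routine that evaluates the long-run average cost of the SMDP for a given threshold combination.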

Numerical Experiment
This section applies the proposed maintenance framework to the crucial mechanical structure of the circulating pump. The circulating pump is a common component in larger-scale systems used for transport reaction, absorption, separation, and absorption liquid regeneration. The structure of the circulating pump consists of two safety-critical key components: the main bearing and the impeller. These two components are connected in series, meaning that if one of them fails, the entire pump will break down immediately. The primary mode of degradation for the main bearing is fatigue fracture. This type of degradation can be quantified by measuring the length of cracks that form in the bearing over time. In contrast, the degradation process for the impeller can be broken down into two distinct phases: cavitation and corrosive pitting. In particular, the degradation process that occurs during the corrosive pitting phase is typically more severe than that during the cavitation phase. This makes it intractable to monitor and detect these types of degradation processes in real time. Instead, the failure data as well as right-censored data can be employed to capture the time-scale failure characteristics.
Through the goodness-of-fit test, Weibull distributions characterize the sojourns of the defect initialization and evolution processes [43], with scale parameters 0.78 and 0.83 and shape parameters $k_1 = 0.63$ and $k_2 = 1.65$, respectively. Additionally, the bearing degradation is described by the Wiener process, which has been widely adopted to characterize the crack propagation process [18], with a failure threshold of 22.8 mm. Based on parameter estimation outcomes, the drift and diffusion coefficients of the Wiener process are $\mu = 0.741$ and $\sigma = 0.012$, respectively. The circulating pump is inspected weekly to identify its health state, including the bearing crack length and whether corrosive pitting is initialized. The cost structure is set as follows. The CR, TR/DR, and OR costs for the main bearing are 12,000, 6000, and 4500, respectively; the CR, TR/DR, and OR costs for the impeller are 9000, 4000, and 3000, respectively; the inspection cost is 500 per time unit.
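As a quick plausibility check on these parameters, the sketch below computes two summary quantities: the mean time for a Wiener process with positive drift to first reach the failure threshold, which equals the threshold divided by the drift, and the mean of a Weibull sojourn. It assumes that the drift is expressed per week (matching the weekly inspections) and that the standard scale-shape Weibull parameterization applies; both are assumptions, and the helper name is hypothetical.

```python
import math

FAILURE_THRESHOLD_MM = 22.8   # bearing failure threshold from the text
DRIFT = 0.741                 # drift coefficient (assumed mm per week)

# Mean first-passage time of a Wiener process with positive drift: D / mu.
mean_weeks_to_failure = FAILURE_THRESHOLD_MM / DRIFT


def weibull_mean(scale, shape):
    """Mean of a Weibull variable in the scale-shape parameterization."""
    return scale * math.gamma(1.0 + 1.0 / shape)


print(round(mean_weeks_to_failure, 2))
print(weibull_mean(0.78, 0.63))   # defect initialization sojourn
print(weibull_mean(0.83, 1.65))   # defect evolution sojourn
```

Under these assumptions, an unmaintained bearing would reach the failure threshold after roughly 31 weeks on average, which gives a scale against which the inspection interval and replacement thresholds can be read.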

Optimization Results
The optimal combination of decision variables and the minimum average maintenance cost are searched using the algorithm in Section 4.7. According to the optimization outcome, the minimum cost is obtained when (1) Unit 1 (main bearing) is preventively replaced (TR) when its crack length reaches 19.6 mm, or opportunistically replaced (OR) when the length reaches 17.1 mm; (2) Unit 2 (impeller) is preventively replaced (DR) 9 weeks after the identification of corrosive pitting, or opportunistically replaced (OR) if a chance arrives between 7 and 9 weeks. The minimum cost under the optimal solution is 2941.5 per week.
Clearly, the optimization outcome is affected by several health-related factors of the circulating pump, such as the propagation velocity and volatility of bearing degradation, as well as the state sojourn times of the impeller. Therefore, we conducted a sensitivity analysis on the crucial coefficients representing the deterioration severity of the system. Here, we mainly pay attention to the variations in the bearing degradation coefficients, as the impeller deterioration can be analyzed in a similar way. First, we consider the sensitivity of the maintenance cost to the diffusion parameter. As shown in Figure 3a, the optimal cost decreases as the diffusion increases, although the rate of decrease diminishes. Moreover, the optimal cost increases as the drift increases, again at a diminishing rate. From Figure 4, the drift and diffusion parameters contribute significantly and almost equally to maintenance performance. Therefore, the bearing degradation should be carefully identified and controlled. To better illustrate the joint effect, we tested the maintenance cost variation with respect to both the drift and diffusion coefficients, as indicated in Figure 4. From the diagram, a larger drift and a smaller diffusion coefficient contribute to a higher cost. This is because the diffusion coefficient affects the degradation rate, in which case the condition-based threshold will decrease, indicating that threshold-based maintenance (TR) needs to be executed more frequently, resulting in higher maintenance-induced costs.
Figure 5 indicates the maintenance cost with respect to the (a) scale parameter and (b) shape parameter of the corrosive pitting initialization process. It can be seen from the figure that the maintenance cost gradually increases as the shape parameter increases and the scale parameter decreases. Moreover, the impact of the scale parameter on the maintenance cost is larger than that of the shape parameter. This is because the scale parameter is more closely related to the initialization duration and affects replacement executions more significantly.
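To give a feel for how the drift and diffusion coefficients act on threshold crossings, the following sketch simulates the time for a degradation path to first reach a threshold. It assumes the crack length follows a Wiener process with positive drift, which the drift/diffusion terminology suggests but this section does not confirm; the parameter values, function names, and time step are illustrative only, and the sketch does not reproduce the SMDP cost model.

```python
import random


def simulate_first_passage(drift, diffusion, threshold, dt=0.05,
                           max_time=1000.0, rng=None):
    """Simulate one discretized Wiener path and return its first time at or above threshold.

    Returns max_time if the threshold is not reached within the horizon.
    """
    rng = rng or random.Random()
    level, t = 0.0, 0.0
    step_sd = diffusion * dt ** 0.5  # standard deviation of one increment
    while t < max_time:
        level += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
        if level >= threshold:
            return t
    return max_time


def mean_first_passage(drift, diffusion, threshold, n_paths=500, seed=0):
    """Monte Carlo estimate of the mean first-passage time."""
    rng = random.Random(seed)
    total = sum(simulate_first_passage(drift, diffusion, threshold, rng=rng)
                for _ in range(n_paths))
    return total / n_paths


if __name__ == "__main__":
    # Hypothetical coefficients; 19.6 mm is the TR threshold quoted in the text.
    for mu in (0.5, 1.0, 2.0):
        print(mu, round(mean_first_passage(mu, 0.5, 19.6), 2))
```

Under this assumed model the mean hitting time equals threshold/drift, so a larger drift brings the unit to the threshold sooner, whereas the diffusion coefficient changes the spread of the hitting time rather than its mean. How these effects translate into cost depends on the full policy and cost structure analyzed above.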

Policy Comparison
To highlight the superiority of the proposed maintenance policy in cost control, we introduce three alternative policies for comparison, each either widely used in maintenance engineering or easy to implement.
Policy 1. TR and CR are executed for Unit 1, but OR is ignored. Unit 2 undergoes TR, DR, CR, and OR, which is aligned with the proposed policy;
Policy 2. DR and CR are executed for Unit 2, but OR is ignored. Unit 1 remains the same as in the proposed policy;
Policy 3. Both units undergo TR, DR, and CR, and OR is omitted. The policy then reduces to a conventional preventive maintenance policy that does not consider unit dependencies.
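The differences between the four policies can be summarized as sets of permitted actions per unit. The mapping below is a hedged reading of the definitions above, assuming that TR applies to Unit 1 and DR to Unit 2 (as in the optimization result) and that the policies differ only in whether OR is permitted; the dictionary layout and helper names are hypothetical.

```python
# Permitted replacement actions per unit under each policy (illustrative encoding).
POLICIES = {
    "Proposed": {"unit1": {"TR", "CR", "OR"}, "unit2": {"DR", "CR", "OR"}},
    "Policy 1": {"unit1": {"TR", "CR"},       "unit2": {"DR", "CR", "OR"}},
    "Policy 2": {"unit1": {"TR", "CR", "OR"}, "unit2": {"DR", "CR"}},
    "Policy 3": {"unit1": {"TR", "CR"},       "unit2": {"DR", "CR"}},
}


def allows(policy: str, unit: str, action: str) -> bool:
    """Return True if the given policy permits the action on the given unit."""
    return action in POLICIES[policy][unit]


def units_without_or(policy: str) -> list:
    """List the units for which opportunistic replacement is disabled."""
    return sorted(unit for unit, actions in POLICIES[policy].items()
                  if "OR" not in actions)


if __name__ == "__main__":
    for name in POLICIES:
        print(name, "-> OR disabled for:", units_without_or(name))
```

Encoding the policies this way makes it explicit that Policy 3 is the proposed policy with every opportunistic action switched off, while Policies 1 and 2 each switch off one unit's opportunistic action.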
The comparison result for these four maintenance policies is given in Table 1. Clearly, the proposed policy outperforms the other heuristic policies, reducing the cost by 11.5%, 6.8%, and 13.7% relative to Policies 1, 2, and 3, respectively. This indicates the profitability of (a) executing OM, either PM-induced or CM-induced, and (b) allowing defect-induced maintenance to be performed, since more maintenance opportunities can be integrated to reduce downtime. Moreover, due to the existence of OM, a more radical (scheduled) maintenance policy is possible, in that preventive maintenance can be arranged less frequently.
To test the robustness of the proposed policy, we conducted a sensitivity analysis on some critical cost parameters of the circulating pump, such as the opportunistic replacement cost and the preventive replacement cost. We first tested the variation in pump maintenance cost with respect to the OR cost of the main bearing. As clearly indicated in Figure 6, the proposed policy outperforms the other three policies regardless of the OR cost variation. Notably, the optimal cost under Policy 1 and Policy 3 is not affected by this OR cost, because these policies do not exploit the corresponding maintenance opportunities. Also, the proposed maintenance policy is less sensitive to the OR cost than Policy 2.
Likewise, the sensitivity of the optimal cost to the OR cost of the impeller can be analyzed. As one can see from Figure 7, the proposed policy is more cost-effective than the other policies and is barely affected by the impeller OR cost. Also, the optimal cost is the least sensitive to the OR cost due to the existence of multiple interacting maintenance actions, which weakens the influence of any single maintenance action.
Finally, we tested the influence of the threshold-based TR cost on the optimal maintenance cost. To this end, we fixed the cost of OR and varied the TR cost from 1000 to 6000. The test outcome is indicated in Figure 8. It is important to note that although the optimal maintenance cost increases with the TR cost under all four maintenance policies, the cost-effectiveness of the proposed policy is not challenged by such variations, since its rate of cost increase is slightly smaller than or equivalent to that of the other policies. This indicates the robustness of the maintenance policy and framework.

Conclusions
An innovative maintenance optimization problem regarding a two-unit serial system with both continuous and non-continuous degradation was investigated. Unlike previous studies, three types of maintenance renewal activities, namely, threshold-based renewal, postponement renewal, and opportunistic renewal, were integrated to enhance maintenance efficiency and mitigate downtime loss, and the resulting policy can be quantitatively analyzed and optimized via the Semi-Markov Decision Process. The applicability of the model and its cost advantage over other conventional maintenance policies are demonstrated through a case study of the critical mechanical structure of a circulating pump. The comparative outcomes indicate that the proposed policy outperforms some heuristic/conventional policies in downtime mitigation and cost control, and possesses better model robustness.
In future research, there are four possible extensions to the current study. First, the proposed maintenance model can be applied to more general and sophisticated multi-unit systems [44,45]. Second, the system structure can also be extended, including but not limited to serial, parallel, standby, and voting systems, with modified maintenance policies [30]. Third, the failure interaction and load sharing between units can also be integrated into the proposed model [40,46,47]. Finally, the statistical properties of the proposed model can be further explored to enhance the robustness and applicability of the maintenance policy.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest:
The authors declare no conflict of interest.