Stochastic Analysis of a Priority Standby System under Preventive Maintenance

Abstract: In this paper, we propose a system of two dissimilar units: one unit has priority in operation (priority unit), and the other is kept as a cold standby (ordinary unit). We assume that the failure, repair, and preventive maintenance (PM) times of both units follow arbitrary distributions, except that the repair time of the ordinary unit follows an exponential distribution. The priority unit has normal, partial failure, and total failure modes, while the ordinary unit has normal and total failure modes. PM of the system can be started after time t when (i) the priority unit has been in the normal or partial failure mode up to time t and (ii) the standby unit is available up to time t. PM is of two types: the costlier type, chosen with probability p, and the cheaper type, chosen with probability (1 − p). Under these assumptions, we investigate the reliability measures of the system using the regenerative point technique. Finally, we present a numerical example to illustrate the theoretical findings and to show the effect of preventive maintenance on the reliability measures of the proposed system.


Introduction
Standby redundancy is a technique that can be used to improve and increase the reliability and availability of systems. Improvement of a standby system depends on many factors, such as the repair, preventive maintenance, and the availability of workmen.
Many authors have investigated the reliability of two-unit standby systems under the assumption that the failure time of the operating unit is exponentially distributed. Among those, El-Sherbeny and Al-Hussaini [1] calculated the steady-state availability and mean time to failure of different configurations of series systems, assuming that the failure and repair times of the operative and warm standby components follow exponential distributions. Kumar and Malik [2] focused on the stochastic modeling of a computer system of two identical units, one operating and the other kept as cold standby. In each unit, the hardware (h/w) and software (s/w) components work together and fail independently, and the failure times were assumed to follow a negative exponential distribution. Sharma et al. [3] considered a cold standby system with two dissimilar units, one named priority and the other standby, and obtained some reliability measures for the system with exponentially distributed failure times. Yusuf et al. [4] presented reliability measures for a system subject to deterioration with time and studied the effect of time and deterioration rates on system reliability. Yann and Meng [5] considered a warm standby repairable system consisting of two dissimilar units, where one unit takes priority in use if one repairman is available. They assumed that the working and repair times follow negative exponential distributions, illustrated the theoretical results with numerical examples, and graphed the reliability measures for different failure and repair rates.

El-said [6] investigated the cost-benefit analysis of a two-unit cold standby system with two stages for repair of the failed unit. Mahmoud and Moshref [7] studied the stochastic analysis of a two-unit cold standby system considering hardware failure, human error failure, and PM. Mohamoud et al. [8] discussed a system of two priority standby units under repair, post-repair, and PM. Di Nardo et al. [9] used a quantitative model based on Bhopal accident data, in which the technical, organizational, and human factors could be managed effectively with a safety management system (SMS). Wells [10] used distributions of phase type and matrix Laplace transformations to extend the known analytic results for systems with warm standbys to the case in which repairable and non-repairable failures occur in the system as a whole instead of in a single operating unit. Zhong and Jin [11] discussed a novel optimal PM policy for a cold standby system with two components when a repairman is present. In their study, a semi-Markov process (SMP) and the regenerative point technique were used to model the system under some reasonable assumptions, and all possible states of the system and the transition probabilities between them were analyzed. In addition, they derived the optimal PM cycle, stated as a theorem, by maximizing the mean time from the initial state to system failure.

Levitin et al. [12] considered homogeneous cold standby systems performing missions of fixed duration when a failure of an operating element results in a mission failure. They considered a system operating in a random environment modeled by a Poisson process of shocks. Ruiz-Castro and Dawabsha [13] discussed a multi-state warm standby system with preventive maintenance, loss of units, and an indeterminate number of repairpersons. This system can be applied to a backup server with periodic inspection by an installed monitoring program that analyzes logical and physical parameters to detect possible errors arising from internal and/or external events. Ge et al. [14] investigated the reliability analysis of a cold standby system under stepwise Poisson shocks. Gao and Wang [15] studied the reliability and availability of a repairable retrial system with mixed standbys and a single repair facility subject to preventive maintenance or active breakdown. They obtained the steady-state availability by solving for the steady-state probabilities in matrix form. Kumar et al. [16] developed a reliability model for a non-identical cold standby system to evaluate the system reliability, mean time to system failure, steady-state availability, busy period of the server, expected number of repairs, expected number of visits by the server, and profit function of the system, taking all random times to be Weibull distributed. Kumar and Goel [17] discussed a two-unit cold standby system by considering the concepts of degradation, inspection, preventive maintenance (PM), and priority. They considered the unit to work at a reduced capacity after its repair and thus called it a degraded unit. Furthermore, they discussed various reliability characteristics, such as the mean time to system failure (MTSF), availability, busy period of the repairman, expected number of visits by the repairman, and profit of the system.

Hirata et al. [18] addressed a derivation procedure for the reliability function of the two-component priority standby redundant system based on the maximum entropy principle. This principle from information theory is widely known as an elegant approach to deriving a probability density function from the given information about the stochastic properties of random variables. Kumar [19] carried out a cost-benefit analysis of a cold standby repairable system under varying environmental conditions: normal and abnormal. Kumar and Chowdhary [20] investigated the stochastic analysis of an identical two-unit standby system model in which preventive maintenance of the operative unit is carried out after some significant time, which is considered to be a random variable with some probability distribution.
In this paper, we propose a system of two dissimilar units: one has priority in operation, while the other is kept as a standby unit. In addition, we assume that there are two types of PM for the priority unit, namely, the costlier type with probability p and the cheaper type with probability (1 − p). All of the failure, repair, and PM times are arbitrarily distributed, except for the repair time of the ordinary unit, which is exponentially distributed. We also assume that the priority unit has normal, partial failure, and total failure modes. Next, we use the regenerative point technique to investigate the performance of the proposed system in the steady-state case in terms of some reliability measures: (i) the mean time to system failure (MTSF); (ii) availability; (iii) busy periods of the server due to repair and PM; (iv) the profit gain of the system. Finally, in order to show the effect of PM and the system parameters on the reliability measures of the proposed system, we provide some numerical illustrations and graphs.

• The system consists of two dissimilar units: one is the priority unit and the other is the ordinary unit.
• The priority unit has three modes, namely, normal, partial failure, and total failure, while the ordinary unit has two modes, normal and total failure.
• Initially, the priority unit operates in normal mode, while the ordinary unit is kept as a cold standby.
• If the priority unit is in normal mode until time t without total (or partial) failure, then it goes to PM (costlier with probability p and cheaper with probability (1 − p)), provided that the standby unit is available. If the priority unit is in partial failure mode until time t (without total failure), it goes to costlier PM, provided that the standby unit is available. In all cases, when the standby unit is not available, PM is postponed until the next PM period.
• The switchover is perfect and instantaneous, and only one server operator is available for both repairs and PM.
• All of the time distributions are arbitrary, except that of the repair time of the ordinary unit, which is exponential.
• After repair or PM, the unit returns to normal mode.
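The PM rule stated in the assumptions above can be sketched as a small decision function. This is only an illustration: the function name `choose_pm`, the string labels for modes and actions, and the optional uniform draw `u` are assumptions introduced here and are not notation from the paper.

```python
import random

def choose_pm(priority_mode, standby_available, p, u=None):
    """Return the PM action taken at the scheduled PM time t.

    priority_mode: "normal" or "partial" (hypothetical labels for the two
        modes from which PM can start; a totally failed unit goes to repair).
    standby_available: True if the ordinary unit is available as standby.
    p: probability of the costlier PM type when the priority unit is normal.
    u: optional uniform(0, 1) draw, exposed so the rule can be tested.
    """
    if priority_mode not in ("normal", "partial"):
        raise ValueError("PM starts only from normal or partial failure mode")
    if not standby_available:
        return "postpone"      # PM waits for the next PM period
    if priority_mode == "partial":
        return "costlier"      # partial failure mode always gets costlier PM
    if u is None:
        u = random.random()
    return "costlier" if u < p else "cheaper"
```

Passing `u` explicitly makes the random branch reproducible; in a simulation one would omit it and let the function draw its own value.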

System States
The configuration and states of the proposed system can be described through the following notation:
$N_O$ / $N_{OC}$ / $N_{pf}$ / $N_{Pf}$ / $N_s$: the unit is in normal mode / in normal mode and continued from the earlier state / in partial failure mode / in partial failure mode and continued from the earlier state / in normal mode and kept as cold standby.
$F_r$ / $F_R$ / $F_{wr}$: the unit is in failure mode and under repair / in failure mode and under repair continued from the earlier state / in failure mode and waiting for repair.
$N_{co}$ / $N_{ch}$: the unit is in normal mode and under costlier PM / in normal mode and under cheaper PM.
$N_{Co}$ / $N_{Ch}$: the unit is in normal mode and under costlier PM continued from the earlier state / in normal mode and under cheaper PM continued from the earlier state.
By using this notation, the system can be in any one of the states $S_0, S_1, \dots, S_{12}$, where the first argument of a state represents the state of the priority unit and the second argument represents the state of the ordinary unit. The up states of the system are $S_0$, $S_1$, $S_2$, $S_3$, $S_5$, $S_7$, $S_8$, $S_9$, and $S_{10}$; the down states are $S_4$, $S_6$, $S_{11}$, and $S_{12}$. Additionally, the states $S_0$, $S_1$, $S_2$, $S_3$, $S_5$, $S_8$, $S_{10}$, and $S_{12}$ are regenerative, whereas $S_4$, $S_6$, $S_7$, $S_9$, and $S_{11}$ are non-regenerative; see Figure 1.

In this section, we calculate the transition probabilities between the different states of the system and the corresponding sojourn times. From the system setup, it can be observed that the epochs of entry into any of the states $S_i \in E$ are regenerative points. Let $T_0 (\equiv 0), T_1, T_2, \dots$ denote the epochs at which the system enters any state $S_i \in E$, and let $X_n$ denote the state visited at epoch $T_n+$, i.e., just after the transition at $T_n$. Then $\{X_n, T_n\}$ is a Markov renewal process with state space $E$, and $Q_{ij}(t) = P[X_{n+1} = S_j,\ T_{n+1} - T_n \le t \mid X_n = S_i]$ is the semi-Markov kernel over $E$.
Thus, the transition probability matrix of the embedded Markov chain is $P = (p_{ij}) = (Q_{ij}(\infty))$, whose non-zero elements $p_{ij}$ can be calculated by probabilistic arguments. For example, $p_{01}$ is the probability that the total failure of the priority unit does not occur until time t, PM is not carried out until time t, and the partial failure of the priority unit occurs at time t. All other transition probabilities can be explained in the same manner.
In addition, the mean sojourn time $\mu_i$ of the system in state $S_i \in E$ is calculated as $\mu_i = \int_0^\infty P[\text{the system sojourns in state } S_i \text{ for at least time } t]\,dt$.
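As a worked illustration of the mean sojourn time as the integral of a survival probability, the sketch below integrates a hypothetical survival function numerically. It assumes a state that is left at the first of two independent exponential events with rates `rate_a` and `rate_b`, so the survival function is exp(−(a + b)t) and the exact mean is 1/(a + b). The rates and the helper name `mean_sojourn` are assumptions made for this example only; they are not the distributions of the proposed system.

```python
import math

def mean_sojourn(survival, upper, steps=100_000):
    """Trapezoidal approximation of the integral of survival(t) on [0, upper]."""
    h = upper / steps
    total = 0.5 * (survival(0.0) + survival(upper))
    for k in range(1, steps):
        total += survival(k * h)
    return total * h

# Hypothetical rates, chosen only for illustration.
rate_a, rate_b = 0.3, 0.5

def survival(t):
    """P[sojourn time >= t] when the state is left at the first of two events."""
    return math.exp(-(rate_a + rate_b) * t)

# The upper limit truncates the infinite integral; the tail beyond 60 is negligible here.
mu = mean_sojourn(survival, upper=60.0)
```

For these rates the numerical value agrees with the closed form 1/(0.3 + 0.5) = 1.25 to within the discretization error.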

Mean Time to System Failure
In this section, we derive the mean time to system failure (MTSF) as one of the reliability measures of the proposed system. According to the theory of regenerative processes, the mean times to system failure $\Pi_i(t)$ starting from state $S_i \in E$, $i = 0, 1, 2, 3, 10$, can be obtained by solving the recursive relations (1)-(5). Taking the Laplace transform (LT) of Equations (1)-(5), solving for $\Pi_0^*(s)$, and setting $s = 0$, we obtain the MTSF in the steady state starting from state $S_0$.
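Setting s = 0 in the transformed equations turns the MTSF calculation into a linear system. As a rough illustration, the sketch below assumes a simplified semi-Markov model in which every transition goes directly from an up state either to another up state or to an absorbing failed state, so that the mean times satisfy m_i = µ_i + Σ_j p_ij m_j over the up states. The two-state matrix `p_up`, the vector `mu_up`, and the function names are hypothetical; they are not the transition probabilities or sojourn times of the proposed system.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def mtsf(p_up, mu_up):
    """Mean time to failure from each up state: solve (I - P_up) m = mu_up."""
    n = len(mu_up)
    a = [[(1.0 if i == j else 0.0) - p_up[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(a, mu_up)

# Hypothetical example with two up states; the probability mass missing from
# each row is the probability of moving to the (absorbing) failed state.
p_up = [[0.0, 0.6],
        [0.5, 0.0]]
mu_up = [2.0, 1.0]
m = mtsf(p_up, mu_up)   # m[0] is the MTSF starting from the first up state
```

Here the equations are m_0 = 2 + 0.6 m_1 and m_1 = 1 + 0.5 m_0, which give m_0 = 26/7 and m_1 = 20/7.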

Availability Analysis
In this section, we discuss the pointwise and steady-state availability of the proposed system. Using probabilistic arguments, we obtain the recursive relations (9)-(16) for $AV_i(t)$, $i = 0, 1, 2, 3, 5, 8, 10, 12$. Applying the LT to (9)-(16) and solving for $AV_0^*(s)$, we obtain the steady-state availability of the system, $AV_0(\infty)$, in terms of two determinants $N$ and $D$ whose entries are formed from the mean sojourn times $\mu_i$ and the transition probabilities $p_{ij}$; $D$ is obtained by differentiating the determinant and is given in (19).
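For intuition about the steady-state availability, the following sketch uses the standard ratio formula for a semi-Markov process, Σ_{i up} π_i µ_i / Σ_i π_i µ_i, where π is the stationary distribution of the embedded Markov chain. It assumes that all states are regenerative and that the embedded chain is irreducible and aperiodic, which is simpler than the proposed system with its non-regenerative states. The three-state matrix `p`, the sojourn times `mu`, and the function names are hypothetical.

```python
def stationary(p, iterations=500):
    """Stationary distribution of an irreducible, aperiodic chain (power iteration)."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi

def steady_state_availability(p, mu, up_states):
    """Long-run fraction of time spent in the up states."""
    pi = stationary(p)
    weighted = [pi_i * mu_i for pi_i, mu_i in zip(pi, mu)]
    return sum(weighted[i] for i in up_states) / sum(weighted)

# Hypothetical example: states 0 and 1 are up, state 2 is down.
p = [[0.0, 0.7, 0.3],
     [0.6, 0.0, 0.4],
     [1.0, 0.0, 0.0]]
mu = [2.0, 1.0, 0.5]          # mean sojourn times of the three states
av = steady_state_availability(p, mu, up_states=[0, 1])
```

For this example π is proportional to (1, 0.7, 0.58), so the availability equals 2.7/2.99 = 270/299.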

Busy Period Analysis
In this section, we investigate and analyze the expected busy period of the proposed system under repair of failed units and PM of the priority unit.

Expected Busy Period with Repair
Using probabilistic arguments, the expected busy periods of the server with the repair of the priority unit, $R_i^1(t)$, $i = 0, 1, 2, 3, 5, 8, 10, 12$, are obtained by solving the recursive relations (20)-(27). Applying the LT to (20)-(27) and solving for $R_0^{1*}(s)$, we obtain the expected busy period with the repair of the priority unit in the steady state starting from state $S_0$, $R_0^1(\infty)$, where $D$ is given in (19). Similarly, the expected busy period with the repair of the ordinary unit in the steady state starting from $S_0$, $R_0^2(\infty)$, can be obtained with the same $D$ given in (19).

Expected Busy Period with PM
Following the procedure presented in Section 5.1, we obtain the expected busy period with costlier PM in the steady state starting from $S_0$, $R_0^3(\infty)$, where $D$ is given in (19). Similarly, we obtain the expected busy period with cheaper PM in the steady state starting from $S_0$, $R_0^4(\infty)$, with the same $D$ given in (19).

Cost-Benefit Analysis
In this section, we determine the cost function of the proposed system incurred during the time interval (0, t), in which α is the revenue per unit of up time and $k_i$, $i = 1, 2, 3, 4$, are the costs per unit time of the repair of the priority unit, the repair of the ordinary unit, costlier PM, and cheaper PM, respectively. Combining (36) and (37) gives (38), from which the expected profit per unit of time in the steady state (cost function) of the proposed system is obtained.
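A minimal sketch of how such a steady-state profit rate is typically evaluated is given below. It assumes the usual linear form, revenue rate times availability minus the sum of each cost rate times the corresponding busy-period fraction; this form is an assumption made here, since the displayed equations are not reproduced in this text. The values of α and k_i are those used in the numerical illustration, while the availability and busy-period fractions are hypothetical placeholders and the function name `profit_rate` is introduced only for this example.

```python
def profit_rate(alpha, availability, costs, busy_fractions):
    """Assumed linear form: alpha * AV - sum_i k_i * R_i (all in steady state)."""
    if len(costs) != len(busy_fractions):
        raise ValueError("one busy-period fraction is needed per cost rate")
    return alpha * availability - sum(
        k * r for k, r in zip(costs, busy_fractions)
    )

alpha = 100.0                                # revenue per unit of up time
costs = (30.0, 10.0, 20.0, 10.0)             # k_1, k_2, k_3, k_4
# Hypothetical steady-state values, not results from the paper.
availability = 0.9
busy_fractions = (0.05, 0.02, 0.03, 0.04)    # stand-ins for R^1, R^2, R^3, R^4
profit = profit_rate(alpha, availability, costs, busy_fractions)
```

With these placeholder inputs the revenue term is 90 and the cost term is 1.5 + 0.2 + 0.6 + 0.4 = 2.7, so the profit rate is 87.3 per unit of time.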

Numerical Illustration
In order to demonstrate the usefulness of the proposed system, we investigate the effect of PM and the other parameters on the reliability measures through the following numerical illustrations. We set α = 100, $k_1$ = 30, $k_2$ = 10, $k_3$ = 20, $k_4$ = 10 and $\lambda_\ell > 0$, $\ell = 1, 2, 3, 4$. From the figures, we see that the system reliability measures increase from the starting time of $t_0$ to reach their maximum values and then decrease to the steady-state level (the values of the system reliability measures without PM). This enables us to select the optimal period of time to perform PM. Tables 1 and 2 display the system reliability measures for some other choices of parameters. From the numerical results displayed in the graphs and Tables 1 and 2, one can observe the following:

(i) The reliability measures of the system increase (decrease) as the failure rate parameters decrease (increase).
(ii) The reliability measures of the system increase (decrease) as the repair rates increase (decrease).
(iii) The reliability measures of the system increase (decrease) as the PM rates increase (decrease).
(iv) PM does not appear to be useful when the parameters of the failure rates are large or small (for example, when $\lambda_1$ = 1 and $\lambda_2$ = 2, the optimal time for PM is not approached).
(v) For large values of $t_0$, the reliability measures of the system reach the stable stage; this means that PM is not useful.
(vi) The starting point and the optimal times of PM decrease as the failure parameters increase. Moreover, the time interval between the starting and optimal times of PM decreases as the parameters of the failure rate increase.

Table 1. MTSF without PM at the starting time (ST) and the optimal time (OT) of $t_0$ with the corresponding optimal value of MTSF($t_0$) (OV) to perform PM when ν = 0.5, δ = 1, $\lambda_3$ = 0.3, $\lambda_4$ = 0.5.

Table 2. The optimal time (OT) and optimal value (OV) of the availability and cost function of the system in the steady state when ν = 0.5, δ = 1, $\lambda_3$ = 0.3, $\lambda_4$ = 0.5.
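The optimal time (OT) and optimal value (OV) reported in the tables can be read off a curve of a reliability measure against $t_0$ by a simple grid search. The sketch below shows the idea on a made-up unimodal curve that rises to a peak and then decays to a steady level; the function `toy_measure`, its constants, and the grid are hypothetical and do not reproduce any result of the paper.

```python
import math

def optimal_pm_time(measure, t_grid):
    """Return (OT, OV): the grid point that maximizes `measure` and its value."""
    ot = max(t_grid, key=measure)
    return ot, measure(ot)

def toy_measure(t0):
    """Hypothetical curve: steady level 5 plus a bump that peaks at t0 = 1."""
    return 5.0 + 4.0 * t0 * math.exp(-t0)

t_grid = [k / 100 for k in range(1, 1001)]   # 0.01, 0.02, ..., 10.00
ot, ov = optimal_pm_time(toy_measure, t_grid)
```

For this toy curve the maximum lies at $t_0$ = 1 with value 5 + 4/e; for large $t_0$ the curve flattens toward the steady level, which mirrors observation (v).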

Conclusions
In this paper, the effect of preventive maintenance on the reliability measures of the proposed system, consisting of one priority unit and one ordinary unit, is investigated in terms of the mean time to first system failure, availability, and cost function. The priority unit has three modes (normal, partial failure, and total failure), while the ordinary unit has two modes (normal and total failure). If the priority unit is in normal mode until time t without total (or partial) failure, then it goes to PM (costlier with probability p and cheaper with probability (1 − p)), provided that the standby unit is available. The switchover is perfect and instantaneous, and only one server operator is available for both repairs and PM.
The configurations and states of the system are described, and the transition probabilities between the different states and the corresponding mean sojourn times are calculated. Next, the Laplace transforms and transition probabilities are utilized to derive a closed form of the mean time to system failure as one of the reliability measures of the system. In addition, the steady-state availability of the proposed system is investigated through some recursive relations. The expected busy period of the system with both repair and PM is derived. To demonstrate the effect of PM on the system reliability, a cost-benefit analysis is carried out.
Some numerical illustrations are displayed to show the usefulness of the findings of the paper based on certain choices of the system parameters. These illustrations show the importance of parameter selection in increasing the efficiency of systems and in determining the optimal time interval to start PM. Finally, we conclude from the findings that the choices of system parameters and the type of PM should be taken into account in the configuration of new systems.

Acknowledgments: The authors would like to thank the referees for their helpful comments, which improved the presentation of the paper. The authors would also like to extend their sincere appreciation to the Deanship of Scientific Research at King Saud University for funding this Research group (RG-1435-056).

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:
cdf of repair time of priority unit;
$q_{ij}(t)$, $Q_{ij}(t)$: pdf and cdf of the time for the system to transit from regenerative state $S_i$ to $S_j$;
$q_{ij}^{(k)}(t)$, $Q_{ij}^{(k)}(t)$: pdf and cdf of the time for the system to transit from regenerative state $S_i$ to $S_j$ via the non-regenerative state $S_k \in \bar{E}$;
$m_{ij}$: contribution to the mean sojourn time in state $S_i$ when the system transits directly to $S_j$;
$m_{ij}^{(k)}$: contribution to the mean sojourn time in state $S_i$ when the system transits to $S_j$ via the non-regenerative state $S_k \in \bar{E}$;
$\mu_i$: $\int_0^\infty P[\text{system sojourns in state } S_i \text{ for at least time } t]\,dt$;
$M_i(t)$: P[system that is up initially in state $S_i \in E$ is up at time t without passing through any other regenerative state or returning to itself through one or more states ∈ E];
$AV_i(t)$: P[the system is up at time t | $E_0 = S_i \in E$];
$R_i^{\ell}(t)$, $\ell = 1, 2$: P[server operator is busy with repair of the priority unit ($\ell = 1$) or repair of the ordinary unit ($\ell = 2$) at time t starting from state $S_i \in E$];
$R_i^{\ell}(t)$, $\ell = 3, 4$: P[server operator is busy with costlier PM ($\ell = 3$) or cheaper PM ($\ell = 4$) at time t starting from state $S_i \in E$];
$C(t)$: the expected profit incurred in (0, t];
δ: repair rate of the ordinary unit;
s: dummy variable in the Laplace transform (LT);
©: symbol for the convolution of f(t) and g(t), that is, $f(t) © g(t) = \int_0^t f(x) g(t - x)\,dx$;
*: symbol for LT.