In software reliability engineering, software-rejuvenation and -checkpointing techniques are widely used for enhancing system reliability and strengthening data protection. In this paper, a stochastic framework composed of a composite stochastic Petri reward net and its resulting non-Markovian availability model is presented to capture the dynamic behavior of an operational software system in which time-based software rejuvenation and checkpointing are both aperiodically conducted. In particular, apart from the software-aging problem that may cause the system to fail, human-error factors (i.e., a system operator’s misoperations) during checkpointing are also considered. To solve the stationary solution of the non-Markovian availability model, which is derived on the basis of the reachability graph of stochastic Petri reward nets and is actually not one of the trivial stochastic models such as the semi-Markov process and the Markov regenerative process, the phase-expansion approach is considered. In numerical experiments, we illustrate steady-state system availability and find optimal software-rejuvenation policies that maximize steady-state system availability. The effects of human-error factors on both steady-state system availability and the optimal software-rejuvenation trigger timing are also evaluated. Numerical results showed that human errors during checkpointing both decreased system availability and brought a significant effect on the optimal rejuvenation-trigger timing, so that it should not be overlooked during system modeling.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited