A Markov reliability model is considered for a failover cluster that performs computations in a cyber-physical system. Continuity of the cluster's computing process when the physical resources of servers fail is provided by virtualization technology and relies on the migration of virtual machines. The distinguishing feature of the proposed model is that it accounts for the limit on the allowable interruption time of the computing process during cluster recovery. This limit arises because control of the object is lost when two physical servers fail, which is unacceptable. A failure is declared if their recovery time exceeds the maximum allowable interruption time of the computing process. Cluster operation modes are considered both with and without recovery after failures of part of the system resources that do not break the continuity of the computing process. The results make it possible to estimate the probability that the cluster remains operable while maintaining continuity of computations, as well as its operating time to failure, that is, to an interruption of the computing (control) process that exceeds the maximum permissible time. A calculation example for the presented models shows that recovery performed while the continuity of the computing process is maintained increases the mean time to failure by more than two orders of magnitude.
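The comparison of modes with and without recovery can be illustrated with a minimal sketch. It is not the article's model: it assumes a hypothetical two-server cluster with exponential failure rate `lam` per server, exponential repair rate `mu`, and a probability `q` that an interruption caused by the second failure exceeds the allowable time. The function names, the state layout, and the exponential form used for `q` are all assumptions made for illustration.

```python
import math


def prob_exceeds_limit(mu_r, t_max):
    """Probability that an exponentially distributed recovery time with
    rate mu_r exceeds the allowable interruption t_max (assumed form)."""
    return math.exp(-mu_r * t_max)


def mttf_two_server(lam, mu=0.0, q=1.0):
    """Mean time to failure, starting with both servers up.

    Hypothetical transient states:
      A = both servers up, B = one server up (VMs migrated).
    Transitions: A -> B at rate 2*lam, B -> A at rate mu,
    B -> failure at rate lam*q (second failure whose recovery
    exceeds the allowable interruption time).
    """
    exit_b = mu + lam * q  # total rate of leaving state B
    # First-step equations for the mean times to absorption:
    #   T_A = 1/(2*lam) + T_B
    #   T_B = 1/exit_b + (mu/exit_b) * T_A
    # Substituting T_B into T_A and solving for T_A gives:
    return (exit_b / (2.0 * lam) + 1.0) / (lam * q)


if __name__ == "__main__":
    lam = 1e-4  # assumed failure rate, 1/h
    mu = 0.1    # assumed repair rate, 1/h
    no_repair = mttf_two_server(lam)
    with_repair = mttf_two_server(lam, mu=mu)
    q = prob_exceeds_limit(mu_r=2.0, t_max=0.5)  # assumed values
    with_limit = mttf_two_server(lam, mu=mu, q=q)
    print(f"MTTF without recovery: {no_repair:.3e} h")
    print(f"MTTF with recovery:    {with_repair:.3e} h")
    print(f"MTTF with recovery and tolerated short interruptions: {with_limit:.3e} h")
```

With `mu = 0` and `q = 1` the expression reduces to the standard result for a duplicated system without repair, `3 / (2 * lam)`. The parameter values are placeholders, so the printed figures are not the article's results.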
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.