Systems
  • Article
  • Open Access

22 January 2026

Performance Margin and Reliability Modeling Method for Multi-Level Redundant System

1 School of Reliability and System Engineering, Beihang University, Beijing 100191, China
2 Science and Technology on Reliability and Environmental Engineering Laboratory, Beihang University, Beijing 100191, China
* Author to whom correspondence should be addressed.

Abstract

This study proposes a multi-level performance margin modeling and belief reliability framework for redundant systems. Starting from system performance, a “performance–margin–reliability” linkage is established by defining the performance and margin of multi-level redundant systems and deriving performance, margin, and metric equations that account for failures. For complex redundant systems, a hierarchical Behavior Interaction Priority (BIP) modeling approach is developed to explicitly represent the normal and failure states of atomic component models. The effects of redundant components on the overall system are transformed into variations of performance parameters, enabling quantitative analysis of redundancy mechanisms. A boundary search algorithm with pruning optimization is further proposed to overcome the computational bottleneck posed by non-analytic threshold sets in high-dimensional topological spaces. A case study is conducted on a power supply system with multi-level structural redundancy: based on the proposed method, a performance-margin model of the redundant power supply system is constructed, its critical states are analyzed, and the system reliability is calculated. The results verify the effectiveness of the proposed margin-equation formulation and solution algorithm and offer practical guidance for the reliability design of redundant systems.

1. Introduction

As highly reliable complex systems, redundant systems are widely used in aerospace, Internet communication, nuclear safety, and other fields, particularly in the electronic systems of critical applications, where multi-level redundancy designs are common. Because the failure behaviors of redundant systems are diverse and their state changes are complex, the reliability modeling and analysis of redundant systems have long been a key and active research topic in reliability systems engineering. Scientifically predicting and evaluating the reliability of redundant systems is the foundation for reliable design, risk avoidance, and the development of maintenance and support strategies for electronic systems.
The modeling of redundant systems has long been a central research focus in the field of reliability engineering. Existing modeling approaches include traditional methods such as reliability block diagrams (RBD) [1], fault trees [2], Petri nets [3,4], and Markov processes [5]. In recent years, methods based on binary decision diagrams (BDD) [6] and universal generating functions (UGF) [7] have also attracted increasing attention.
The reliability block diagram method represents the most fundamental modeling approach in reliability analysis and is suitable for relatively simple redundant-system modeling problems. Benabid et al. [1] employed the RBD method to evaluate the reliability of a redundant power supply system and compared the results with those obtained using fault tree analysis. Ahmadi et al. [8] conducted a reliability assessment of various redundancy strategies for modular multi-level converters.
Traditional static fault-tree logic gates cannot effectively represent the temporal dynamics of redundant systems. To address this limitation, researchers have developed dynamic fault trees [9], state/event fault trees [2], and fuzzy fault trees [10]. Solving such complex fault trees often requires transforming them into Markov chains [5] or using Monte Carlo simulation [11].
The transitions between states in a redundant system can be regarded as a stochastic process. When each state transition is assumed to depend only on the current state and not on previous ones, the process exhibits the Markov property. Consequently, Markov processes are frequently employed for modeling redundant systems. Zhong et al. [12] incorporated system state duration into a regenerative-point method and a semi-Markov process to construct a cold-standby system model, addressing the problem of optimal preventive maintenance strategies. Fathizadeh et al. [13] proposed a model for a cold-standby system comprising a single server and two identical units with an imperfect switch, based on a matrix renewal function and a semi-Markov process. However, when applied to complex systems, Markov modeling suffers from a rapid increase in computational complexity as the number of components and states grows, leading to the well-known state-space explosion problem.
Petri nets are capable of describing the dynamic characteristics of complex discrete-event systems and are commonly used for state-transition modeling of redundant systems. Ereau et al. [3] elaborated on the process of modeling redundant systems using Petri nets. Zeng et al. [14] employed Petri nets to conduct quantitative analyses of failure dependencies and common-cause failures within redundant systems. Chen and Zhang [4] proposed a novel emergency supply chain stress evolution modeling methodology based on stochastic Petri nets and a corresponding isomorphic Markov chain model. As an information-flow model, however, a Petri net does not directly support analytical computation; it requires auxiliary analytical or simulation-based methods for quantitative evaluation.
In recent years, decision diagram–based modeling methods have gained widespread attention in redundant-system analysis owing to their advantages in handling temporal logic, their compatibility with multi-level redundant systems, and their ability to mitigate the combinatorial explosion problem during computation. For example, Zhai et al. [6] utilized a generalized binary decision diagram to model the reliability of warm-standby systems. For k-out-of-n warm-standby systems with component lifetimes following arbitrary distributions, Tannous et al. [15] proposed a system reliability evaluation method based on sequential binary decision diagrams. Jia et al. [16] applied a multi-state multi-valued decision diagram modeling approach to the reliability assessment of demand-oriented warm-standby systems in which component degradation follows arbitrary distributions. Mo and Xing [17] integrated binary decision diagrams with multi-valued decision diagrams to perform reliability analysis of warm-standby systems. Yeh et al. [18] proposed an improved d-MP method based on a multistate flow network to study the reliability of logistics delivery by trucks driving on two types of roads under constraints.
The Universal Generating Function (UGF) is essentially a probabilistic computation method that enables the combination of discrete random variables, thereby addressing the calculation of state-combination probabilities in redundant systems. Owing to its simplicity and computational efficiency, the UGF approach has been widely applied to the reliability modeling of multi-level redundant systems. For active-standby redundant systems, Wang et al. [7] employed the interval UGF method and a discrete approximation approach to solve the reliability optimization problem of heterogeneous cold-standby systems. Levitin et al. [19] integrated the UGF method with the reliability block diagram approach to determine the performance of heterogeneous warm-standby systems under discrete-state and continuous-time processes. Furthermore, they proposed a UGF-based method for obtaining the demand satisfaction probability, which effectively addressed the optimal component allocation and sequencing problem in warm-standby systems [20].
Most existing reliability modeling approaches for redundant systems focus on failure relationships, emphasizing component lifetime distributions or failure rates to link reliability with failure probability. However, these conventional methods encounter fundamental limitations when applied to complex multi-level architectures: they often suffer from computational state-space explosion, rely on component independence assumptions that are often invalid because they overlook dynamic coupling, and, most critically, restrict analysis to binary logical states, thereby failing to capture the ‘Performance Margin’, i.e., the quantifiable distance to functional thresholds. Yet reliability is fundamentally defined as the ability of a system to perform its intended function under specified conditions. This capability is intrinsically evaluated by determining whether system performance satisfies a given threshold; in essence, failures influence reliability precisely through their impact on this performance [21].
In 2013, Professor Kang [22] proposed the belief reliability approach, which quantifies the influence of specified conditions on system performance through performance equations, describes performance degradation over time through degradation equations, derives performance margins through margin equations to represent whether system performance meets functional requirements, and evaluates the uncertainty of performance margins through metric equations to measure system reliability. Belief reliability has since been widely applied in system reliability modeling. Based on the uncertain maximum likelihood method and the uncertain variational inference algorithm, belief reliability theory has been used to address insufficient data in the reliability analysis of two-stage degradation modeling [23]. To describe the state transition process of multi-state deteriorating systems under epistemic uncertainty, a mathematical model called the uncertain state transition chain has been proposed on the basis of belief reliability [24]. The arithmetic Liu process, also grounded in belief reliability, has been used to model the uncertain performance degradation of components in feedback control systems [25]. In addition, the hybrid uncertainty of systems with dependent competing failure processes has been analyzed and quantified with chance theory based on belief reliability [26]. These studies indicate that belief reliability provides a more comprehensive reliability modeling framework. Within belief reliability theory, the margin equation defines the system’s performance margin as a distance function between system performance and its threshold. The accuracy of the margin equation directly affects the computation of the metric equation and the overall reliability assessment, making it a critical component of reliability analysis.
A multi-level structure is a hierarchical architecture in which redundancy strategies are applied at distinct system levels. Unlike single-level designs, this configuration introduces complex cross-level state dependencies. Static methods struggle to represent dynamic transitions between levels, and dynamic methods often suffer from severe state-space explosion. More importantly, modeling the coupling between failure behaviors and performance is practically necessary for high-reliability applications. A system may remain logically functional after a partial failure, yet its performance can degrade significantly, shrinking the safety margin toward a critical threshold. Ignoring this coupling leads to overly optimistic reliability assessments because it fails to identify “grey states”, in which the system is topologically intact but its performance is borderline.
However, traditional methodologies face critical challenges in accurately capturing this linkage for multi-level redundant systems. Static tools such as RBDs and fault trees fail to model internal failure dependencies or dynamic temporal behaviors, while Markov processes and Petri nets are hindered by state-space explosion or low computational efficiency. Furthermore, decision diagrams and UGF depend heavily on prior statistical data and often overlook the physical connections between failures and the influence of external operating conditions.
To bridge these modeling gaps and quantitatively evaluate performance margins under specified conditions, this study proposes a hierarchical framework based on Behavior Interaction Priority (BIP), which unifies physical laws and logical rules to establish a precise ‘Performance–Margin–Reliability’ linkage. The effectiveness of the proposed approach stems from three core mechanisms: (1) Complexity reduction: hierarchical BIP modeling uses equivalent connectors to flatten cross-level interactions, representing complex behaviors with the fewest layers and thereby avoiding the state-space explosion inherent in traditional dynamic methods. (2) Failure–performance coupling: the ‘Performance–Margin–Reliability’ linkage maps discrete failure transitions within atomic components directly to continuous variations in performance parameters. (3) Analytical support: a pruning-optimized boundary search algorithm provides a numerical solution for high-dimensional, non-analytic threshold sets, enabling reliability quantification where simple analytical expressions are unavailable.
The innovations of this study are reflected in the following aspects:
(1)
Presents the definitions of performance and margin for multi-level redundant systems, together with the basic workflow of performance margin modeling for redundant systems, thereby laying the foundation for belief reliability modeling of redundant systems.
(2)
Investigates a BIP-based performance modeling method for redundant systems that accounts for the coupling of system failure behaviors. The failure behaviors introduced by adding redundant components to the original system are incorporated into performance margin modeling, and the system’s margin equation is established under the coupling between system behavior and redundant components.
(3)
Adopts a depth-first search method with pruning optimization to rapidly obtain numerical solutions for the set boundary (the requirement thresholds), because the performance range of complex multi-level redundant systems forms a high-dimensional topological set that is difficult to describe with simple functions; this avoids the excessive computational burden of exhaustive enumeration.
The remainder of this paper is organized as follows: Section 2 presents the concepts of performance and margin for multi-level redundant systems and provides the margin expressions and modeling procedure for redundant systems. Section 3 introduces the basic principles of the BIP modeling method, proposes a multi-level redundant system BIP modeling approach that considers the influence of failure behavior coupling, and develops a boundary search algorithm for solving the margin equation. Section 4 takes a multi-level redundant power-supply system as an example to demonstrate the implementation process and results of the proposed modeling method. Section 5 summarizes the paper and discusses potential directions for future research.

2. Performance and Margin Modeling of Multi-Level Redundant Systems

2.1. Definition of Performance and Margin in Redundant Systems

Within the framework of belief reliability theory, Professor Rui Kang [24] proposed three fundamental principles of reliability science: the margin reliability principle, the degradation eternity principle, and the uncertainty principle. He further pointed out that the reliability model of a system can be constructed from four fundamental equations, namely the performance equation, the degradation equation, the margin equation, and the metric equation, among which the performance degradation equation is expressed as follows [22]:
$$P = f_{t=0}(X, Y, t) \tag{1}$$
where $X$ denotes the internal variables, $Y$ denotes the external variables, $t$ represents reversible time, $P$ represents the vector of system performance parameters (hereinafter, performance parameters), and $f_{t=0}(\cdot)$ denotes the functional relationship among these variables. Equation (1) is the general form of the performance degradation equation for various types of systems. For redundant systems, by considering both the failure state and the system state, the performance degradation equation can be expressed as:
$$P(\delta) = \begin{cases} F_0(X, Y, t), & \delta = 0 \\ F_\delta\big(X(\delta), Y(\delta), t\big), & 0 < \delta \le \delta_m \end{cases} \tag{2}$$
In this equation, $\delta$ is the representative index of the current state of the redundant system. The system state is defined by the combination of failure conditions of its functional components, and each unique combination corresponds to a specific failure configuration. $\delta_m$ is the maximum number of failures the system can tolerate before it is considered to have completely failed. Thus, $\delta$ tracks the current failure status of the system, while $\delta_m$ defines its failure tolerance threshold. When $\delta = 0$, the system is in the normal operating state and no failure has occurred. When $0 < \delta \le \delta_m$, the system is in a failure state, with $\delta$ indicating the number of failures that have occurred. For convenience, the representative indices of the failure states in which the redundant system can still operate are denoted $1, \ldots, \delta_m$. $F_0$ is the performance equation of the system when no failure has occurred, hereinafter referred to as the performance equation of the initial state of the redundant system. $F_\delta$ is the performance equation of the system after recovery or reconfiguration; if reconfiguration is not considered, $F_\delta$ is identical to $F_0$. The performance equation therefore describes the physical law governing system performance, including the dynamic evolution of the system caused by failures.
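As an illustration of the piecewise performance degradation equation above, the sketch below assumes a hypothetical power supply of `n_modules` identical load-sharing modules, in which $\delta$ counts failed modules, the internal variable $X(\delta)$ is the number of healthy modules, and the external variable $Y$ is the load current; time $t$ is omitted. Because reconfiguration is not modeled, both branches use the same physical law, mirroring the case in which $F_\delta$ is identical to $F_0$.

```python
def performance(delta: int, n_modules: int = 4, load_current: float = 10.0,
                delta_max: int = 2) -> float:
    """Per-module current P(delta) for a hypothetical load-sharing supply.

    delta = 0 evaluates F_0(X, Y); 0 < delta <= delta_max evaluates the same
    law with the reduced internal variable X(delta) = n_modules - delta.
    """
    if not 0 <= delta <= delta_max:
        raise ValueError("delta must lie in [0, delta_max]")
    if delta_max >= n_modules:
        raise ValueError("at least one module must survive delta_max failures")

    def law(healthy_modules: int, load: float) -> float:
        # assumed physical law: the external load is shared equally
        return load / healthy_modules

    if delta == 0:
        return law(n_modules, load_current)          # F_0(X, Y)
    return law(n_modules - delta, load_current)      # F_delta(X(delta), Y)
```

Each tolerated failure raises the current carried by the surviving modules, which is exactly the kind of performance shift the margin equation later measures.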
It is assumed that, within the n-dimensional normalized margin space $\mathbb{R}_n^*$ of the redundant system, there exist demand threshold points $P_{th} \in \partial A$, and that for each $\delta \le \delta_m$ the performance parameter $P(\delta)$ is the unique solution of the performance equation, as illustrated in Figure 1. In this case, the system’s performance parameter is a fixed value, for example, the output voltage of the power supply in electronic equipment. This type of performance parameter does not vary with changes in user demand.
Figure 1. Performance margin diagram of redundant system.
In Figure 1, the dashed arrow represents the margin $M$, $A$ represents the range of performance parameters that satisfy the functional requirements of the system, and $P_{th}$ denotes the demand threshold. The boundary of the performance index range is the set of these demand thresholds. For each state $\delta \le \delta_m$ shown in the figure, the performance equation has a unique solution $P(\delta)$. For a redundant system, system reliability requires that, for all $\delta \le \delta_m$, the solutions of the performance equation satisfy the functional requirements.
The performance of a redundant system depends on the internal failures that occur within it and can be characterized by the representative failure index $\delta$, which indicates the current combination of functional states of the system. When $\delta \le \delta_m$, the shortest distance $M(\delta)$ between the performance parameter $P(\delta)$ and its corresponding demand threshold can be used to describe the reliability performance of the redundant system. Accordingly, the performance margin of the redundant system is defined as the expected value of the state margin $M(\delta)$ over all system states $\delta \le \delta_m$.
Without considering system recovery or reconfiguration after failures, the expression of the performance margin M ( δ ) is given as follows:
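$$M(\delta) = \begin{cases} \min\limits_{P_{th} \in \partial A_{th}} \pi_1\big(F_0(X, Y, t), P_{th}\big)\,\big\lVert F_0(X, Y, t) - P_{th} \big\rVert_2, & \delta = 0 \\ \min\limits_{P_{th} \in \partial A_{th}} \pi_1\big(F_0(X(\delta), Y, t), P_{th}\big)\,\big\lVert F_0(X(\delta), Y, t) - P_{th} \big\rVert_2, & 0 < \delta \le \delta_m \end{cases} \tag{3}$$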
where $\pi_1(\cdot)$ denotes the indicator function, $A_{th}$ represents the range of performance parameters that satisfy the functional requirements of the system, and $\partial A_{th}$ represents the boundary of $A_{th}$, which corresponds to the demand thresholds. The indicator function equals 1 if and only if its two arguments satisfy the specified constraint relationship; otherwise, it equals 0.
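When $\partial A_{th}$ is only available as a finite set of sampled threshold points, the minimum in the margin equation can be evaluated directly. The sketch below is a minimal version under one stated assumption: the indicator $\pi_1$ depends only on whether the performance point lies inside $A_{th}$ (the hypothetical `feasible` callback), so the minimum factorizes into the indicator times the Euclidean distance to the nearest sampled threshold point.

```python
import math
from typing import Callable, Sequence

Point = Sequence[float]

def margin(p: Point, thresholds: Sequence[Point],
           feasible: Callable[[Point], bool]) -> float:
    """Discrete approximation of the state margin M(delta).

    Assumption: the indicator pi_1 depends only on whether p lies in A_th
    (`feasible(p)`), so the minimum over threshold points factorises into
    pi_1(p) * min ||p - p_th||_2 over the sampled boundary points.
    """
    if not thresholds:
        raise ValueError("at least one sampled threshold point is required")
    if not feasible(p):
        return 0.0  # indicator is 0: the functional requirement is not met
    return min(math.dist(p, q) for q in thresholds)
```

The quality of this approximation depends on how densely $\partial A_{th}$ is sampled, which is the role of the boundary search algorithm in Section 3.3.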
Let $f(\delta)$ denote the probability density function of the system being in different performance state combinations. The performance margin equation of the redundant system considering uncertainty can then be expressed as:
$$M_{re} = \int_0^{\delta_m} f(\delta)\, M(\delta)\, \mathrm{d}\delta \tag{4}$$
where $\delta_m$ represents the maximum representative index of allowable failures in the redundant system. When $\delta$ is a discrete variable, $f(\delta)$ becomes a probability mass function and Equation (4) reduces to the discrete form:
$$M_{re} = \sum_{\delta=0}^{\delta_m} f(\delta)\, M(\delta) \tag{5}$$
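The discrete margin equation is a probability-weighted sum over states. A minimal sketch, assuming the state probabilities $f(\delta)$ and state margins $M(\delta)$ are already available as dictionaries keyed by $\delta$ (the function name is hypothetical):

```python
from typing import Mapping

def expected_margin(f: Mapping[int, float], m: Mapping[int, float],
                    tol: float = 1e-9) -> float:
    """Discrete margin equation: M_re = sum over delta of f(delta) * M(delta).

    `f` maps each state index delta in 0..delta_m to its probability and `m`
    maps it to the state margin M(delta); both inputs are user-supplied.
    """
    total = sum(f.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"state probabilities sum to {total}, not 1")
    missing = set(f) - set(m)
    if missing:
        raise KeyError(f"no margin given for states {sorted(missing)}")
    return sum(p * m[d] for d, p in f.items())
```

For example, with made-up probabilities `{0: 0.9, 1: 0.08, 2: 0.02}` and margins `{0: 0.3, 1: 0.2, 2: 0.05}`, the function returns approximately 0.287.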
The system reliability is defined as the probability that the performance margin is greater than zero, i.e., the metric equation is expressed as:
$$R = c\{M_{re}(\delta) > 0\} \tag{6}$$
where $R$ denotes the belief reliability, and $c\{\cdot\}$ represents the mathematical measure used to quantify uncertainty, such as a probability measure.
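When $c\{\cdot\}$ is taken to be a probability measure, the metric equation can be estimated by sampling. The sketch below assumes a hypothetical user-supplied model `sample_margin` that draws one realization of the uncertain inputs and returns the corresponding $M_{re}$; the estimator simply counts how often the margin is positive. It is one possible numerical route, not the paper's prescribed solver.

```python
import random
from typing import Callable

def belief_reliability_mc(sample_margin: Callable[[random.Random], float],
                          n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of R = c{M_re > 0}, taking c{.} as a probability measure.

    `sample_margin` is a hypothetical user model: it draws one realisation of
    the uncertain inputs from `rng` and returns the resulting margin M_re.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_samples) if sample_margin(rng) > 0.0)
    return hits / n_samples
```

For instance, if $M_{re}$ is modeled as $0.1 + \mathcal{N}(0, 0.05^2)$, the estimate approaches $\Phi(2) \approx 0.977$ as the sample count grows.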

2.2. Basic Process of Performance Margin Modeling for Complex Redundant Systems

Equation (3) gives the general form of the margin equation for redundant systems. For complex multi-level redundant systems, however, the margin cannot be expressed as a simple equation; instead, a surrogate multi-level diagram model can be constructed. Figure 2 shows the process for constructing the margin equation of complex redundant systems: panel (a) shows the conceptual relationship among belief reliability, performance margin, and the hierarchical BIP model, and panel (b) shows the basic process for establishing the margin equation of redundant systems.
Figure 2. Basic process for establishing the margin equation of redundant systems. (a) The conceptual relationship. (b) Establishing the margin equation.
(1)
By collecting information on the design, structure, functions, and other aspects of the redundant system, key performance parameters of the system are obtained using functional, performance, and margin analysis methods. Based on this, the hierarchical structure and information for modeling the redundant system are organized.
(2)
Using the information obtained in step 1, a multi-state graph model of the redundant system is constructed (e.g., the BIP model described later). The BIP model serves as a surrogate form for the redundant system’s performance equation. During the construction of the BIP model, the first step is to identify the internal and external parameters of the redundant system that need to be analyzed, followed by the selection of the necessary BIP components and modeling methods.
(3)
The system’s performance equation is obtained based on the BIP model. If there are correlations between different key performance parameters, decoupling must be performed first to ensure independent performance equations are derived. By simulating the BIP model, the values of the system performance parameters under the given internal and external parameters can be obtained.
(4)
The demand thresholds of the redundant system are determined, completing the construction of the system’s margin equation. All constraints are integrated, and a functional expression satisfying the demand thresholds is provided. The probabilities of the system being in different functional carrier states are then determined to obtain the state distribution $f(\delta)$, which is substituted into Equation (4) or (5). Finally, the belief reliability is calculated using Equation (6).

3. BIP Modeling Method for Performance Margin of Redundant Systems

3.1. BIP Modeling Method

The BIP model mentioned in Figure 2 is a formal modeling framework proposed by the Verimag Laboratory in France (founded by Turing Award winner Joseph Sifakis [27]) and is primarily used for the design and verification of complex systems. Its core idea is to describe system dynamics through interactions between components and to use a priority mechanism to resolve concurrency control issues [28].
The BIP model consists of three layers: The bottom layer describes the behavior of atomic components through an extended labeled transition system. The middle layer expresses the interactions between components via connectors, which can be defined on any number of ports. The types of connectors include rendezvous (strong synchronization) and broadcast. The top layer defines interaction priority rules that are used to schedule the sequence of interactions in the bottom layer. These priorities describe the system’s scheduling strategy and allow for the implementation of complex logical relationships.
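The middle and top layers can be illustrated with a small sketch: a connector enumerates the interactions it allows given the set of currently enabled ports, and a priority filter removes interactions dominated by an enabled higher-priority connector. The `Connector` class, the string port names, and the (low, high) encoding of priorities are hypothetical choices for illustration, not the API of any BIP toolset.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set, Tuple

@dataclass(frozen=True)
class Connector:
    """Middle-layer connector over a set of ports (minimal hypothetical encoding)."""
    name: str
    ports: FrozenSet[str]
    kind: str = "rendezvous"        # "rendezvous" or "broadcast"
    trigger: Optional[str] = None   # trigger port, used only by broadcast

    def interactions(self, enabled: Set[str]) -> List[FrozenSet[str]]:
        """Feasible interactions of this connector given the enabled ports."""
        if self.kind == "rendezvous":
            # strong synchronisation: all ports must be ready simultaneously
            return [self.ports] if self.ports <= enabled else []
        if self.kind == "broadcast":
            # the trigger fires together with every receiver that is ready
            if self.trigger is not None and self.trigger in enabled:
                return [frozenset({self.trigger}) | (self.ports & enabled)]
            return []
        raise ValueError(f"unknown connector kind {self.kind!r}")


def schedule(connectors: List[Connector], enabled: Set[str],
             priority: Set[Tuple[str, str]]) -> List[Tuple[str, FrozenSet[str]]]:
    """Top layer: drop interactions dominated by an enabled higher-priority one.

    `priority` holds (low, high) pairs of connector names: high beats low.
    """
    candidates = [(c.name, i) for c in connectors for i in c.interactions(enabled)]
    active = {name for name, _ in candidates}
    return [(name, i) for name, i in candidates
            if not any((name, high) in priority for high in active)]
```

Keeping priorities outside the connectors mirrors the layered separation described above: the same interactions can be scheduled under different strategies by changing only the priority set.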
An atomic component consists of a set of ports (Ports), local variables (Variables), a set of states (States), and a set of transitions (Transitions). Figure 3 shows the general structure of an atomic component, which comprises the port set, state set, variable set, and the set of transitions corresponding to the computation steps of the model.
Figure 3. Atomic component structure diagram.
In Figure 3, the port set of the atomic component is generally denoted by $\{p_1, p_2, \ldots, p_n\}$. Ports are operation names used for synchronizing with other components. The state set is represented by $S = \{s_1, s_2, \ldots, s_n\}$, where a control state indicates a position at which the component waits for synchronization. The variable set is denoted by $V = \{v_1, v_2, \ldots, v_n\}$ and holds the numerical values of the model variables. A transition from state $s_i$ to state $s_j$ is represented by the tuple $(s_i, p, g_p, f_p, s_j)$, where $p$ is the port associated with the transition, through which data can be exchanged with other components; $g_p$ is the guard condition, a Boolean condition on the variables that specifies when the transition can occur; and $f_p$ is the action executed when the transition occurs, which can apply a function to the variable set.
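The transition tuple maps naturally onto a small data structure. The following sketch assumes Python callables for the guard $g_p$ and the action $f_p$ and a dictionary for the variable set $V$; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vars = Dict[str, float]

@dataclass
class Transition:
    """One transition (s_i, p, g_p, f_p, s_j) of a BIP atomic component."""
    source: str                        # s_i
    port: str                          # p
    guard: Callable[[Vars], bool]      # g_p: Boolean condition on the variables
    action: Callable[[Vars], None]     # f_p: update applied to the variables
    target: str                        # s_j

@dataclass
class AtomicComponent:
    state: str                                         # current control state
    variables: Vars                                    # variable set V
    transitions: List[Transition] = field(default_factory=list)

    def enabled_ports(self) -> List[str]:
        """Ports offered for synchronisation in the current state."""
        return [t.port for t in self.transitions
                if t.source == self.state and t.guard(self.variables)]

    def fire(self, port: str) -> None:
        """Execute the first enabled transition labelled with `port`."""
        for t in self.transitions:
            if t.source == self.state and t.port == port and t.guard(self.variables):
                t.action(self.variables)
                self.state = t.target
                return
        raise RuntimeError(f"port {port!r} not enabled in state {self.state!r}")
```

`enabled_ports` is what a connector would query before synchronizing components, and `fire` is called only after the interaction has been selected.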

3.2. BIP Modeling Method for Multi-Level Redundant Systems

In the BIP model of a redundant system, each atomic component can represent either a system-level component or a subsystem-level unit. For atomic components representing system-level components, there must be two or more states, with one state being a failure state and the remaining states representing normal or intermediate states. The corresponding set of transitions must include at least one transition whose endpoint is the failure state. The execution action of this transition can represent the performance equation and the dynamic evolution behavior of the system caused by failures. The guard condition for the transition leading to failure represents the failure criterion of the system component at the unit level.
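To make the normal/failure structure concrete, the sketch below models a hypothetical power module as an atomic component with two states. The nominal voltage, the temperature-based failure criterion, and the choice to drop the output voltage to zero on failure are illustrative assumptions, not values from the case study.

```python
from dataclasses import dataclass, field

@dataclass
class PowerModule:
    """Hypothetical atomic component with a normal state and a failure state."""
    v_nominal: float = 28.0   # assumed nominal output voltage in V
    t_limit: float = 85.0     # assumed unit-level failure criterion in deg C
    state: str = "normal"
    v_out: float = field(init=False)

    def __post_init__(self) -> None:
        self.v_out = self.v_nominal

    def failure_guard(self, temperature: float) -> bool:
        # g_p of the transition into the failure state
        return self.state == "normal" and temperature > self.t_limit

    def step(self, temperature: float) -> None:
        """Fire the failure transition if its guard holds; otherwise stay put."""
        if self.failure_guard(temperature):
            self.v_out = 0.0          # f_p: the failure changes the performance variable
            self.state = "failed"
```

The guard plays the role of the unit-level failure criterion, and the action is where the failure's effect on the performance parameter enters the model.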
For multi-level redundant systems, the system performance equation based on BIP is constructed by modeling from the lowest layer to the higher layers. The lowest-level components can be represented by BIP atomic components, which describe behavior and interaction. At higher levels, BIP atomic components can describe the redundancy rules at the subsystem level. For the entire system, it is necessary to combine the lower-level BIP atomic components with the higher-level BIP atomic components. Here, we introduce BIP composite components to represent the BIP model for such multi-level systems. The composite component model has ports for interacting with the external environment, and its internal structure consists of interactions between lower-level atomic components, as shown in Figure 4.
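The bottom-up composition can be sketched with plain Python objects, assuming a hypothetical k-out-of-n redundancy rule at each composite level and taking the value exported through a composite's port to be an output voltage. Composite components nest, so the same class serves every higher level.

```python
from dataclasses import dataclass
from typing import List, Protocol

class HasOutput(Protocol):
    def output(self) -> float: ...

@dataclass
class Module:
    """Lowest-level atomic component with a nominal output voltage."""
    v_nominal: float
    failed: bool = False

    def output(self) -> float:
        return 0.0 if self.failed else self.v_nominal

@dataclass
class RedundantGroup:
    """Composite component applying a k-out-of-n rule to its members.

    Members may be Modules or nested RedundantGroups, giving the multi-level
    structure; output() is the value exported through the composite's port.
    """
    members: List[HasOutput]
    k: int = 1

    def output(self) -> float:
        working = [v for v in (m.output() for m in self.members) if v > 0.0]
        # assumed rule: deliver the highest healthy voltage if >= k members work
        return max(working) if len(working) >= self.k else 0.0
```

For example, a top-level 2-out-of-2 group whose members are a 1-out-of-2 module pair and a single module keeps its output after one module of the pair fails, but loses it if the single module fails.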
Figure 4. Construction of BIP model composite component.
The BIP model of a system with multi-level redundancy can achieve component-level reuse through the addition of ports and interactions. However, the multiple levels make the model difficult to implement programmatically. To address this, the logical relationships are simplified so that the global impact of each component’s behavior on system behavior is described with the fewest system layers. Using equivalent connectors, interactions across different levels of the multi-level BIP model can be flattened; that is, the internal interactions within composite components and the external interactions are mapped to the same level. The three connector structures shown in Figure 5 are equivalent [29].
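For the special case in which every level uses rendezvous connectors, flattening reduces to collecting all atomic ports into one connector, since nested strong synchronizations require exactly the same set of ports to be ready. The sketch below assumes that case and a hypothetical encoding of connectors as nested frozensets; mixed rendezvous/broadcast hierarchies are not covered by it.

```python
def flatten(connector: frozenset) -> frozenset:
    """Map a multi-level rendezvous connector onto a single-level one.

    Elements are atomic port names (str) or nested connectors exported by
    composite components (frozenset). Assumes every level is a rendezvous,
    so nested strong synchronisations collapse into one over all ports.
    """
    ports = set()
    for element in connector:
        if isinstance(element, frozenset):
            ports |= flatten(element)      # interaction exported by a composite
        else:
            ports.add(element)             # port of an atomic component
    return frozenset(ports)
```

For example, a three-level connector `{a.p, {b.p, {c.p, d.p}}}` flattens to the single-level rendezvous `{a.p, b.p, c.p, d.p}`.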
Figure 5. Equivalent transformation of interactions between multi-level and single-level BIP models. (a) Single-level interaction; (b) three-level interaction; (c) two-level interaction.
The performance BIP model of a redundant system can adopt a multi-level BIP model, where the atomic component model includes both normal and failure states. This allows for the consideration of the impact of system failure behavior on the performance of redundant components. The relationship between redundant components represents the new component behavior generated by adding redundant system components to the original system. There is an interaction between system behavior and the redundant component relationships, and the effects of this interaction will be reflected in the changes to the system performance parameters.

3.3. Boundary Search Algorithm for Margin Equation

Given the characteristics of the margin equation for redundant systems, constructing it requires the range $A$ of performance parameters that satisfy the functional requirements of the system, together with its thresholds $P_{th}$ (as shown in Figure 1). For complex systems, this range must be represented by a set in a high-dimensional topological space, whose boundary corresponds to the thresholds. Because the boundary of such a set is difficult to describe with a function, the most straightforward approach is exhaustive enumeration to obtain a numerical solution for the boundary; however, this approach is computationally expensive and inefficient.
This section proposes a depth-first search method with pruning optimization to quickly obtain a numerical solution for the boundary of a set in a high-dimensional topological space. The core idea is to search along the boundary and identify enough boundary points to approximate the boundary curve, thereby obtaining the required thresholds. To explain the algorithm fully, the following definitions are provided.
Definition 1. 
Let there be a topological space $(\mathbb{R}^n, \mathcal{T}^n_{p_0})$ and a point $x \in \mathbb{R}^n$. Using $x$ as the origin and a step size of $\lambda$, the topological space is discretized into a set of evenly spaced points. The resulting point set $U_x^\lambda$ is called the gridding of the topological space $(\mathbb{R}^n, \mathcal{T}^n_{p_0})$.
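Since the full gridding $U_x^\lambda$ is infinite, any computation works on a finite window of it. The sketch below assumes a symmetric window of `half_width` steps around the origin along every axis; that window and the function name are assumptions of this illustration.

```python
import itertools
from typing import List, Sequence, Tuple

def gridding(origin: Sequence[float], step: float,
             half_width: int) -> List[Tuple[float, ...]]:
    """Finite window of the gridding U_x^lambda (Definition 1).

    Keeps only points within `half_width` steps of the origin along every
    axis, i.e. (2 * half_width + 1) ** n points in n dimensions.
    """
    if step <= 0:
        raise ValueError("step size lambda must be positive")
    offsets = range(-half_width, half_width + 1)
    return [tuple(o + k * step for o, k in zip(origin, ks))
            for ks in itertools.product(offsets, repeat=len(origin))]
```

The point count grows as $(2w+1)^n$ with window half-width $w$, which is why exhaustive enumeration becomes expensive in high dimensions.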
Definition 2. 
Let there be a point set $U_x$ in the normalized margin space $\mathbb{R}_n^*$, where the elements of the set satisfy the rule $F_U$. By gridding $\mathbb{R}_n^*$ and using a Boolean value $b_u$ to distinguish points that satisfy rule $F_U$ from those that do not, tuples $([x_1, \ldots, x_n], b_u)$ are formed. The set composed of these tuples is referred to as the grayscale of the normalized margin space.
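The grayscale can be stored as a map from integer grid indices to Boolean values, which avoids comparing floating-point coordinates during the search. In the sketch below, the callable `rule` stands in for $F_U$ (for example, "the functional requirement is satisfied at this point"); the index-keyed layout is an implementation assumption.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple

Index = Tuple[int, ...]

def grayscale(origin: Sequence[float], step: float, half_width: int,
              rule: Callable[[Tuple[float, ...]], bool]) -> Dict[Index, bool]:
    """Grayscale of a normalised margin space (Definition 2), keyed by grid index.

    Each grid point x = origin + step * index is paired with b_u = rule(x),
    where `rule` plays the role of F_U.
    """
    gray: Dict[Index, bool] = {}
    for idx in itertools.product(range(-half_width, half_width + 1),
                                 repeat=len(origin)):
        x = tuple(o + k * step for o, k in zip(origin, idx))
        gray[idx] = bool(rule(x))
    return gray
```

For a disc-shaped requirement region, for instance, `rule` could be `lambda x: x[0]**2 + x[1]**2 <= 0.5`.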
A complete example of the gridding and grayscale process in a three-dimensional normalized margin space is shown in Figure 6.
Figure 6. Gridding and Grayscale process of the normalized margin space.
Through gridding and grayscale transformation, the high-dimensional space can be represented by a discrete set of points, and the original boundary curve is approximated by connecting boundary points; the accuracy of this approximation depends on the gridding step size. After grayscale transformation, the boundary points still satisfy the necessary and sufficient conditions for boundary points in the topological space. In this study, a depth-first search is used to identify the boundary points, and pruning optimization terminates the search early for the neighbors of certain boundary points. The principle of this method is illustrated in Figure 7.
Figure 7. Schematic diagram of the domain boundary search algorithm.
The forward direction and termination conditions of the depth-first search follow the rules below:
(1)
The forward search directions are the $3^n - 1$ adjacent searchable points of the current point. For example, if the current point is $(0,0)$, the adjacent searchable points are $(1,0)$, $(0,1)$, $(1,1)$, $(-1,0)$, $(0,-1)$, $(-1,-1)$, $(1,-1)$ and $(-1,1)$.
(2)
The search continues from the current point $a$ only if two requirements hold. First, the Boolean value of $a$ must equal 1. Second, the open ball $B(a;\gamma_a) = \{x \in \mathbb{R}^n \mid \rho_n(x, a) < \gamma_a\}$ of radius $\gamma_a$ ($\gamma_a > \lambda$) centered at $a$ must contain points with Boolean values of both 1 and 0. The termination condition is the failure of either requirement. In that case, the search terminates at $a$ and backtracks to the previous search point.
The radius of the open ball in the termination condition should not be too large. An overly large ball always contains points with Boolean values of both 1 and 0, so the condition no longer distinguishes boundary points from other points. Empirically, the radius should not exceed $9n\lambda$.
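The forward directions and the continuation test can be written compactly. This Python sketch works on integer grid indices $k$, where index $k$ stands for the point $x + \lambda k$. It expresses the radius in steps, as $\gamma_a/\lambda$, and it assumes that $\rho_n$ is the Euclidean metric. The helper names and the disk-shaped grayscale `b` are hypothetical.

```python
import itertools
import math
from typing import Callable, Iterator, List, Tuple

Index = Tuple[int, ...]


def neighbors(a: Index) -> Iterator[Index]:
    """Yield the 3**n - 1 adjacent searchable grid indices of a."""
    for d in itertools.product((-1, 0, 1), repeat=len(a)):
        if any(d):  # skip the zero offset (the point itself)
            yield tuple(ai + di for ai, di in zip(a, d))


def ball_offsets(n: int, radius_in_steps: float) -> List[Index]:
    """Integer offsets k with Euclidean norm < radius_in_steps (an open ball in grid units)."""
    r = math.ceil(radius_in_steps)
    return [
        k for k in itertools.product(range(-r, r + 1), repeat=n)
        if math.sqrt(sum(c * c for c in k)) < radius_in_steps
    ]


def can_continue(a: Index, b: Callable[[Index], int], offsets: List[Index]) -> bool:
    """Continuation test: b(a) == 1 and the open ball around a holds both 0s and 1s."""
    if b(a) != 1:
        return False
    values = {b(tuple(ai + ki for ai, ki in zip(a, k))) for k in offsets}
    return values == {0, 1}


# Hypothetical grayscale: grid indices inside a disk of radius 4 steps have b = 1.
def b(k: Index) -> int:
    return int(k[0] ** 2 + k[1] ** 2 <= 16)


ball = ball_offsets(n=2, radius_in_steps=1.5)  # gamma_a = 1.5 * lambda
print(len(list(neighbors((0, 0)))))   # 8 = 3**2 - 1
print(can_continue((4, 0), b, ball))  # True: a 1-point next to 0-points
print(can_continue((0, 0), b, ball))  # False: every point in the ball is 1
```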
To improve the search speed of the depth-first search algorithm, the following pruning optimization rules are adopted:
(1)
Suppose one-third or fewer of the adjacent searchable points of the current point can continue the search, that is, do not satisfy the termination condition. Then these points are not searched. For example, let the current point be $(0,0)$, and let $(1,1)$ be the only adjacent searchable point that can continue the search, with all others satisfying the termination condition. Then $(1,1)$ is not taken as the next current point.
(2)
If the Boolean value of the current point is 0, none of its adjacent searchable points will be searched.
(3)
Suppose three-fourths or more of the adjacent searchable points of the current point can continue the search, that is, do not satisfy the termination condition. Then the adjacent points of the current point are likewise not searched. For example, let the current point be $(0,0)$, and suppose all of its adjacent searchable points $(1,0)$, $(0,1)$, $(1,1)$, $(-1,0)$, $(0,-1)$, $(-1,-1)$, $(1,-1)$ and $(-1,1)$ can continue the search. Then none of them is taken as the next current point.
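The search and pruning rules above can be combined into one routine. The following Python sketch is one plausible reading of them, not the authors' implementation. It makes these assumptions:

- "can continue" means the point does not satisfy the termination condition;
- the search starts from a known boundary point `start`;
- an explicit stack replaces recursion;
- the one-third and three-fourths cut-offs are exposed as the parameters `low` and `high`.

The disk-shaped grayscale `b` is hypothetical.

```python
import itertools
import math
from typing import Callable, List, Set, Tuple

Index = Tuple[int, ...]


def boundary_search(start: Index, b: Callable[[Index], int], radius_in_steps: float,
                    low: float = 1 / 3, high: float = 3 / 4,
                    max_points: int = 100_000) -> Set[Index]:
    """Depth-first boundary search on grid indices with the three pruning rules."""
    n = len(start)
    steps = [d for d in itertools.product((-1, 0, 1), repeat=n) if any(d)]  # 3**n - 1 directions
    r = math.ceil(radius_in_steps)
    ball = [k for k in itertools.product(range(-r, r + 1), repeat=n)
            if math.sqrt(sum(c * c for c in k)) < radius_in_steps]

    def shift(a: Index, d: Index) -> Index:
        return tuple(ai + di for ai, di in zip(a, d))

    def can_continue(a: Index) -> bool:
        # Continue only from a 1-point whose open ball contains both 0s and 1s.
        return b(a) == 1 and {b(shift(a, k)) for k in ball} == {0, 1}

    boundary: Set[Index] = set()
    visited: Set[Index] = {start}
    stack: List[Index] = [start]
    while stack and len(boundary) < max_points:
        a = stack.pop()
        if not can_continue(a):
            continue  # termination condition (also covers pruning rule 2): backtrack
        boundary.add(a)
        movable = [c for c in (shift(a, d) for d in steps) if can_continue(c)]
        frac = len(movable) / len(steps)
        if frac <= low or frac >= high:
            continue  # pruning rules 1 and 3: do not expand this point
        for c in movable:
            if c not in visited:
                visited.add(c)
                stack.append(c)
    return boundary


# Hypothetical grayscale: a disk of radius 4 grid steps.
def b(k: Index) -> int:
    return int(k[0] ** 2 + k[1] ** 2 <= 16)


found = boundary_search(start=(4, 0), b=b, radius_in_steps=1.5)
print(sorted(found))
```

Every point returned has Boolean value 1 and lies within the ball radius of a 0-point. Connecting these points approximates the boundary curve.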
Based on the above definitions and rules, the process of the proposed domain boundary search algorithm is shown in Figure 8.
Figure 8. Flowchart of the boundary search algorithm.

3.4. System Reliability Calculation Method

To solve the margin equation of a redundant system, it is first necessary to obtain the shortest distance between the performance parameters and the demand thresholds under each system state. Then, the margin is determined based on the probability of each system state, followed by the calculation of system reliability. The main steps are as follows:
(1)
Generate the set of system states in which the redundant system can operate normally.
(2)
Use the boundary search algorithm to obtain the demand threshold $A_{th}$.
(3)
Based on the system performance equation, calculate $F_0(X, Y, t)$ and $F_0(X(\delta), Y, t)$. For a given state $\delta$, iterate over each $P_{th} \in A_{th}$ and compute $\pi_1\big(F_0(X(\delta), Y, t), P_{th}\big)\,\lVert F_0(X(\delta), Y, t) - P_{th} \rVert_2$. The minimum value is the margin $M_\delta$ of state $\delta$.
(4)
According to the obtained $f_\delta$, select Equation (4) or (5) as appropriate to compute $M_{re}$, and then calculate the system reliability using Equation (6).
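Step (1) amounts to enumerating binary component states and filtering them with an operability rule. The Python sketch below shows only that step. Steps (3) and (4) depend on the performance equation and on Equations (4) to (6), which it does not reproduce. The rule `any_group_has_k` and the four-component toy system are hypothetical.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]  # 1 = functional carrier normal, 0 = failed


def operable_states(n_components: int, can_operate: Callable[[State], bool]) -> List[State]:
    """Step (1): enumerate all 2**n binary component states and keep the operable ones."""
    return [s for s in itertools.product((0, 1), repeat=n_components) if can_operate(s)]


def any_group_has_k(groups: Sequence[Sequence[int]], k: int) -> Callable[[State], bool]:
    """Hypothetical rule: the system operates if some group has at least k working components."""
    def rule(s: State) -> bool:
        return any(sum(s[i] for i in group) >= k for group in groups)
    return rule


# Toy system: two parallel units, each needing both of its two components.
states = operable_states(4, any_group_has_k([(0, 1), (2, 3)], k=2))
print(len(states))  # 7
```

Full enumeration grows as $2^n$. It is practical for the eight-module case study but not for very large systems.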

4. Case Analysis

4.1. Case Introduction

The power supply system of a certain industrial control device adopts a multi-level redundant system design as shown in Figure 9. The battery simultaneously powers two chargers, each consisting of four charging modules. Two of the modules serve as the primary working components, while the other two are used for hot backup. The two chargers are connected in parallel for redundancy.
Figure 9. Power supply system structure diagram.
The function of the power supply system is to provide a fixed output current through the charging modules. A system failure is detected when neither charger can meet the required output current. The device's mainboard controls each charging module through serial communication. This enables switching between failed and backup modules, the selection and matching of primary working modules, and charger output control. The mainboard's control logic is as follows:

- The mainboard monitors the status of each working module.
- When all charging modules within a charger are operating normally, the two modules with the highest total output current are selected as the primary modules. The remaining modules serve as hot backups.
- When a primary module fails, the backup module with the highest output current is activated as the new primary module.
- Of the two chargers, the one with the higher output current is selected to supply power.

The effective voltage supplied to the battery must not be too low, so the equivalent internal resistance of the charging modules is also constrained. The output current and the equivalent internal resistance of the charging modules are relatively independent performance parameters. The system performance is therefore represented as a two-dimensional vector of output current and equivalent internal resistance.

4.2. Power Supply System BIP Modeling

The first step is to establish the atomic components of the power supply system. Figure 10 illustrates the atomic component of a non-repairable single resistor in a parallel resistor configuration. This resistor has two states: $S_1$ is the abnormal state and $S_2$ is the normal state. Normal operation corresponds to the self-loop transition $P_2$. $P_1$ denotes the transition to the abnormal state caused by over-voltage disconnection (denoted $\varepsilon = 0$), in which the current through the resistor is $I = 0$. $p_3$ denotes the transition to the abnormal state caused by a short circuit, with $\alpha = 1$ as the short-circuit indicator. The input port $in$ transmits the voltage signal to the resistor. Ports $P_1$, $P_2$, and $p_3$ transmit the state indicator signal and the current signal to the other parallel resistor and to the subsequent circuitry, respectively.
Figure 10. Atomic component construction of a single resistor in a parallel resistor configuration.
The atomic components are combined to form a multi-level BIP model. Figure 11 shows the multi-level BIP model of Charger 1.
Figure 11. Multi-level BIP model of charger 1.
By applying simplification rules, the BIP model of the system is reduced, as shown in Figure 12.
Figure 12. BIP model of the power supply system.
In the BIP model of the redundant system, two failure-triggering components act on the charging modules in Charger 1 and Charger 2, respectively. The $f$ ports of the four charging modules strongly interact with the failure output port $p_1$ of the failure-triggering component. This keeps failure information synchronized across all components. The time progression of the failure-triggering components is strongly synchronized with that of each component through port $t$. Each charging module component has both a normal state and a failure state. The normal-state port $n$ connects to the port of the logic decision component via connectors $C_1$ and $C_2$. The guard of each connector checks the status of the charging modules within the charger and selects the primary working modules. $I_{pe\_1}$ and $I_{pe\_2}$ are assigned the current outputs of Charger 1 and Charger 2 via the connectors and are given by:
$$I_{pe\_1} = \max_{i \neq j} \left( I_{i,1} + I_{j,1} \right)$$
$$I_{pe\_2} = \max_{i \neq j} \left( I_{i,2} + I_{j,2} \right)$$
In this model, $I_{i,1}$ is the output current capability of the $i$-th charging module in Charger 1. The indices $i, j \in \{1, 2, 3, 4\}$ ($i \neq j$) of the selected pair correspond to the states of the logic decision component. The transition constraints of states $s_1$ and $s_2$ in the logic decision component determine whether the charger itself can output normally. When both chargers can output normally, the system output current $I_{pe}$ is:
$$I_{pe} = \max\left( I_{pe\_1}, I_{pe\_2} \right)$$
When at least one charger is functioning normally, the system is in a normal output state. When both chargers fail, the system transitions from the normal output state to a failure state. The output current of a failed charger is set to zero. The equivalent resistance of the system is computed from the two primary charging modules of the charger that supplies the output, as the equivalent resistance of two resistors in parallel:
$$R_{pe} = \frac{R_{i,N} \, R_{j,N}}{R_{i,N} + R_{j,N}}$$
In this model, $N$ is the index of the charger with the larger output current. $i$ and $j$ are the indices of the two charging modules with the largest output currents in Charger $N$.
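The selection logic behind $I_{pe}$ and $R_{pe}$ can be sketched in Python. In each charger, the sketch picks the pair of working modules with the largest combined current, then keeps the charger with the larger output. This pair choice matches the mainboard rule of promoting the best backup when a primary module fails. When no pair is available, the sketch returns $(0, \infty)$; the infinite resistance is an assumption of the sketch. The module data are hypothetical and are not the values of Table 1.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# (output currents in A, internal resistances in mOhm, working flags 1/0) for one charger
Charger = Tuple[Sequence[float], Sequence[float], Sequence[int]]


def charger_output(currents: Sequence[float], resistances: Sequence[float],
                   working: Sequence[int]) -> Tuple[float, float]:
    """Best primary pair among working modules: (I_i + I_j, R_i * R_j / (R_i + R_j))."""
    best = None
    for i, j in itertools.combinations(range(len(currents)), 2):
        if working[i] and working[j]:
            total = currents[i] + currents[j]
            if best is None or total > best[0]:
                r_par = resistances[i] * resistances[j] / (resistances[i] + resistances[j])
                best = (total, r_par)
    # Fewer than two working modules: no output (assumed resistance of infinity).
    return best if best is not None else (0.0, math.inf)


def system_performance(chargers: List[Charger]) -> Tuple[float, float]:
    """(I_pe, R_pe) of the charger with the larger output current (first one on a tie)."""
    outputs = [charger_output(*c) for c in chargers]
    return max(outputs, key=lambda out: out[0])


# Hypothetical module data (A, mOhm); these are not the values of Table 1.
c1 = ([50, 48, 46, 45], [30, 35, 40, 45], [1, 1, 1, 1])
c2 = ([52, 47, 46, 44], [40, 40, 45, 50], [1, 1, 1, 1])
print(system_performance([c1, c2]))  # (99, 20.0): Charger 2, its first two modules
```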

4.3. Performance Margin Equation of the Power Supply System

The current state of the redundant system can be represented by a tuple $(N_{nf}, N_{idf})$ consisting of the number of failures and the index of the failure-bearing component. Each state of the redundant system is numbered and labeled $\delta$, where $\delta = 0$ is the state with no failures. $X(\delta)$ represents the changes in the output current and equivalent internal resistance of the charging modules after failures occur. To simplify the performance equation, the physical laws governing system performance are derived for two cases: the initial moment, with no failures, and the period after a failure. The BIP model of the system then gives the simplified performance equation:
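$$P_{PS} = \begin{bmatrix} I_{pe} \\ R_{pe} \end{bmatrix} = \begin{bmatrix} \max\limits_{N = 1,2} \max\limits_{i \neq j} \left( I_{i,N}(\delta) + I_{j,N}(\delta) \right) f_y(N, \delta) \\ \dfrac{R_{i(\delta),N(\delta)} \, R_{j(\delta),N(\delta)}}{R_{i(\delta),N(\delta)} + R_{j(\delta),N(\delta)}} \, f_R(\delta) \end{bmatrix}$$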
where:

- $P_{PS} = [I_{pe}, R_{pe}]^T$ is the performance parameter vector, composed of the output current and the equivalent internal resistance of the power supply system;
- $N(\delta)$ is the index of the charger selected in redundant system state $\delta$;
- $i(\delta)$ and $j(\delta)$ are the indices of the two charging modules with the largest output currents in the selected charger $N(\delta)$;
- $f_y(N, \delta)$ is the indicator function for output current. It equals 0 if fewer than two charging modules in charger $N$ function normally in state $\delta$, and 1 otherwise;
- $f_R(\delta)$ is the indicator function for equivalent internal resistance. It equals $+\infty$ if the output current in state $\delta$ is 0, and 1 otherwise.
The performance margin equation of the power supply system is:
$$M = \min_{P_{th} \in A_{th}} \pi_1\left(P_{PS}, P_{th}\right) \left\lVert P_{PS} - P_{th} \right\rVert_2$$
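The margin can be evaluated numerically once $A_{th}$ is available as a point set. The Python sketch below reads $\pi_1$ as a sign: $+1$ if $P_{PS}$ meets the demand rule and $-1$ otherwise. It applies that sign to the shortest Euclidean distance, consistent with the margin being the shortest distance to the demand threshold. This reading of $\pi_1$ is an assumption. The rectangular demand region is hypothetical and replaces the limits of Table 2. The sketch also skips the normalization of the margin space.

```python
import math
from typing import Callable, Iterable, Sequence


def performance_margin(p_ps: Sequence[float],
                       threshold_points: Iterable[Sequence[float]],
                       satisfies_demand: Callable[[Sequence[float]], bool]) -> float:
    """Signed shortest Euclidean distance from P_PS to the threshold point set A_th.

    Assumed reading of pi_1: +1 if P_PS meets the demand rule, -1 otherwise.
    """
    d_min = min(math.dist(p_ps, p_th) for p_th in threshold_points)
    return d_min if satisfies_demand(p_ps) else -d_min


# Hypothetical demand region: 75 A <= I <= 125 A and 0 <= R <= 25 mOhm (not Table 2).
I_LO, I_HI, R_HI = 75, 125, 25
A_th = ([(i, r) for i in range(I_LO, I_HI + 1) for r in (0, R_HI)]
        + [(i, r) for i in (I_LO, I_HI) for r in range(0, R_HI + 1)])


def demand(p: Sequence[float]) -> bool:
    return I_LO <= p[0] <= I_HI and 0 <= p[1] <= R_HI


print(performance_margin((99, 20.0), A_th, demand))  # 5.0 (inside, nearest edge R = 25)
print(performance_margin((70, 20.0), A_th, demand))  # -5.0 (outside, nearest edge I = 75)
```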

4.4. Belief Reliability Analysis of the Power Supply System

The parameter table for each module in Charger 1 and Charger 2 is shown in Table 1. It includes the output current capability, equivalent internal resistance, and the failure detection criteria for the industrial control system.
Table 1. Parameter table for each module of charger 1 and charger 2.
We first solve the performance equation, and the resulting system performance calculation is presented as a set of data points in Figure 13.
Figure 13. System performance vs. failure parameters in power supply charging.
In Figure 13, the horizontal axis represents the output current (A) and the vertical axis represents the equivalent internal resistance (mΩ). Colored dashed lines depict the boundaries of the system performance parameter set, which vary with failure conditions as the number of component failures increases from 1 to 5.
Figure 13 shows that as the number of failures increases, the system performance shifts toward a lower output current and a higher equivalent internal resistance. At the same time, the set of performance points continually expands. Consequently, one need not solve the system performance equation directly. Instead, one can analyze how the performance point set changes with the number of failures.
In particular, six failures significantly affect the system's performance output, and seven or more failures cause the power supply system to fail. The output current of the power supply system is constrained to be no less than 75 A and no more than 125 A. Table 2 details the relationship between the equivalent internal resistance limits and the output current.
Table 2. Equivalent internal resistance limits vs. output current in power system.
The demand threshold of the system is determined by the failure criteria defined in Table 3 and by the output current limits. Together, these define the rule satisfied by the points within the threshold. The boundary search algorithm then yields the set of points constituting this demand threshold, as illustrated in Figure 14.
Table 3. Set of system states for normal power system operation. (Symbol “x”: arbitrary state of a functional carrier).
Figure 14. Set of demand threshold points for the power system.
Figure 14 uses the same axes and units as Figure 13 and shows the numerical demand threshold set $A_{th}$ obtained by the algorithm. This boundary defines the acceptable region of system operation under the combined output current and internal resistance constraints.
The set of system states enabling normal operation is generated from the configuration of the redundant power system. The system state is represented by an octuple of the statuses of the eight charging modules. In this octuple, 0 and 1 denote the failure and normal status of a charging module's functional carrier, respectively. A total of 239 system states capable of normal operation were obtained. These states are then consolidated according to which combinations of operating chargers and charging modules provide the system output. Table 3 presents the resulting categorized states.
In Table 3, the symbol “x” denotes an arbitrary state, either failure or normal. The system states were consolidated into twelve distinct states differentiated by their output current capability. $M_\delta$ and $f_\delta$ are computed separately using the margin solution algorithm. All system states in the same consolidated category have the same performance margin $M_\delta$. Table 4 presents the results.
Table 4. Performance margin for each system state of the power supply system.
Based on the failure criteria for charging modules provided in Table 1, the performance margin and belief reliability are calculated for the eight charging modules within the power system. Each charging module is characterized by two performance parameters—output current and equivalent internal resistance—along with their respective demand thresholds.
System uncertainty originates primarily from the parameter uncertainty of the charging modules, which is mainly introduced during manufacturing. The uncertainty of the power system margin is quantified by modeling these parameters with normal distributions. This analysis assumes that no failures exist in the industrial control equipment. Table 5 gives the probability distributions of the parameters of each charging module.
Table 5. Probability distribution table of parameters for each charging module.
Table 6 shows the calculated belief reliability of each charging module.
Table 6. The belief reliability of each charging module.
Based on the states of the system as shown in Table 3, the probability results of the system being in each state are presented in Table 7.
Table 7. The possibility of the power system being in various states.
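The probability of a consolidated state with wildcard entries can be computed from module-level reliabilities. This requires the assumption that the module states are mutually independent; the paper does not state that assumption here. In the Python sketch below, each pattern is an octuple string of `'1'`, `'0'` and `'x'`. The function name, the pattern ordering, and the reliability values are hypothetical and do not reproduce Table 6 or Table 7.

```python
from itertools import product
from math import prod
from typing import Sequence


def pattern_probability(pattern: str, reliabilities: Sequence[float]) -> float:
    """P(system in a consolidated state), assuming independent module states.

    pattern[k] is '1' (module k normal), '0' (module k failed) or 'x' (either).
    """
    if len(pattern) != len(reliabilities):
        raise ValueError("pattern and reliabilities must have the same length")
    factors = []
    for symbol, r in zip(pattern, reliabilities):
        if symbol == "1":
            factors.append(r)
        elif symbol == "0":
            factors.append(1.0 - r)
        elif symbol != "x":
            raise ValueError(f"unexpected symbol {symbol!r}")
    return prod(factors)  # an 'x' entry contributes a factor of 1


# Hypothetical module reliabilities (not the values of Table 6).
R = [0.99, 0.98, 0.97, 0.96, 0.99, 0.98, 0.97, 0.96]
print(pattern_probability("11xxxxxx", R))  # equals 0.99 * 0.98
# Sanity check: the probabilities of all 2**8 fully specified states sum to 1.
print(sum(pattern_probability("".join(s), R) for s in product("01", repeat=8)))
```

Summing pattern probabilities gives the probability of a union of states only if the patterns do not overlap.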
Based on Equation (6), the system performance margin is calculated as $M_{re} = 0.091613$. The probability distributions of the output current capability and equivalent internal resistance are substituted into the margin equation. Analyzing the resulting distribution of the performance margin gives the belief reliability of the system as $R_B = P\{M_{re} > 0\} = 0.9611$.
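One generic way to evaluate $P\{M > 0\}$ when parameters are normally distributed is Monte Carlo sampling. The Python sketch below does not reproduce Equation (6) or the paper's analysis. It only shows the sampling idea and makes these assumptions:

- the parameters are independent normals;
- the margin function is a hypothetical toy;
- the means and standard deviations are placeholders, not the values of Table 5.

```python
import random
from typing import Callable, Sequence, Tuple


def prob_margin_positive(margin: Callable[[Sequence[float]], float],
                         params: Sequence[Tuple[float, float]],
                         n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P{M > 0}; params are (mean, std) of independent normals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = [rng.gauss(mu, sigma) for mu, sigma in params]
        if margin(x) > 0:
            hits += 1
    return hits / n_samples


# Toy margin (hypothetical): two modules must deliver at least 75 A together.
def toy_margin(x: Sequence[float]) -> float:
    return x[0] + x[1] - 75.0


estimate = prob_margin_positive(toy_margin, [(40.0, 3.0), (40.0, 3.0)])
print(round(estimate, 3))  # close to Phi(5 / sqrt(18)), roughly 0.88
```

The standard error of the estimate is about $\sqrt{p(1-p)/n}$. With 100,000 samples it is below 0.002 for any $p$.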
The number of fault-bearing components in the power supply system reflects its state. The curve in Figure 15 provides a sensitivity analysis of the system performance margin $M_\delta$. The minimum margin is measured as the shortest distance to the requirement boundary. The curve shows how this margin decreases as the number of concurrent component faults increases, which quantifies the degradation of belief reliability.
Figure 15. Variation of performance margin with the number of component failures.

5. Conclusions

This study proposes a multi-level modeling approach for the performance margin and belief reliability of redundant systems. Unlike traditional methods, the proposed approach establishes a margin equation that explicitly considers degradation and fault states. It derives reliability from the principle that the performance margin remains greater than zero.
  • Starting from the scientific principles of belief reliability, the performance and margin of multi-level redundant systems are defined. Performance, margin, and metric equations that account for system faults are derived. A hierarchical BIP modeling method for complex redundant systems is proposed. It defines the normal and fault states of atomic components and captures how fault behaviors influence the performance of redundant components. The impact of redundant components on the system is thus reflected through variations in system performance parameters.
  • To address the low computational efficiency of boundary calculations in high-dimensional topological spaces during the threshold solving process of complex redundant system performance parameters, a Boundary Search Algorithm based on pruning optimization is proposed. This method enables numerical computation of demand-threshold sets without requiring analytical expressions.
  • This paper provides an efficient numerical solution approach for the threshold set of complex system requirements. Taking the multi-layer redundant power supply system of a certain industrial control equipment as an example, a performance margin model was established to identify the critical failure state, and the system belief reliability was calculated to be 0.9611, verifying the practicality of the proposed method in redundancy design and reliability evaluation.
Future research will focus on two areas. First, the current boundary search algorithm effectively solves the margin equations for the presented engineering case, but its computational cost in very high-dimensional spaces remains a challenge. Future work will examine the scalability of the algorithm. It will also investigate advanced heuristic techniques that further reduce the complexity for systems with many more modules and higher levels of redundancy. Second, we intend to examine how interdependencies between module parameters affect overall system reliability. The focus will be on how these dependencies influence the accuracy and sensitivity of the reliability assessment under the assumptions made in this study.

Author Contributions

Conceptualization, Y.C.; formal analysis, T.Y.; investigation, Y.W.; data curation, Y.W.; writing—original draft preparation, T.Y. and Y.G.; writing—review and editing, T.Y. and Y.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62573020.

Data Availability Statement

The data that has been used is confidential.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Benabid, R.; Merrouche, D.; Bourenane, A.; Alzbutas, R. Reliability Assessment of Redundant Electrical Power Supply Systems Using Fault Tree Analysis, Reliability Block Diagram, and Monte Carlo Simulation Methods. In Proceedings of the 2018 International Conference on Electrical Sciences and Technologies in Maghreb (CISTEM); IEEE: New York, YN, USA, 2018; pp. 1–7. [Google Scholar]
  2. Kaiser, B.; Gramlich, C.; Förster, M. State/Event Fault Trees—A Safety Analysis Model for Software-Controlled Systems. Reliab. Eng. Syst. Saf. 2007, 92, 1521–1537. [Google Scholar] [CrossRef]
  3. Ereau, J.-F.; Saleman, M.; Valette, R.; Demmou, H. Petri Nets for the Evaluation of Redundant Systems. Reliab. Eng. Syst. Saf. 1997, 55, 95–104. [Google Scholar] [CrossRef]
  4. Chen, Q.; Zhang, J. Evolution Model of Emergency Material Supply Chain Stress Based on Stochastic Petri Nets—A Case Study of Emergency Medical Material Supply Chains in China. Systems 2025, 13, 423. [Google Scholar] [CrossRef]
  5. Boudali, H.; Crouzen, P.; Stoelinga, M. A Rigorous, Compositional, and Extensible Framework for Dynamic Fault Tree Analysis. IEEE Trans. Dependable Secur. Comput. 2010, 7, 128–143. [Google Scholar] [CrossRef]
  6. Zhai, Q.; Peng, R.; Xing, L.; Yang, J. Binary Decision Diagram-Based Reliability Evaluation of k-out-of-(n plus k) Warm Standby Systems Subject to Fault-Level Coverage. Proc. Inst. Mech. Eng. Part O-J. Risk Reliab. 2013, 227, 540–548. [Google Scholar] [CrossRef]
  7. Wang, W.; Xiong, J.; Xie, M. A Study of Interval Analysis for Cold-Standby System Reliability Optimization under Parameter Uncertainty. Comput. Ind. Eng. 2016, 97, 93–100. [Google Scholar] [CrossRef]
  8. Ahmadi, M.; Shekhar, A.; Bauer, P. Impact of the Various Components Consideration on Choosing Optimal Redundancy Strategy in MMC. In Proceedings of the 2022 IEEE 20th International Power Electronics and Motion Control Conference (PEMC); IEEE: New York, YN, USA, 2022; pp. 21–26. [Google Scholar]
  9. Dugan, J.B.; Bavuso, S.J.; Boyd, M.A. Dynamic Fault-Tree Models for Fault-Tolerant Computer Systems. IEEE Trans. Reliab. 1992, 41, 363–377. [Google Scholar] [CrossRef]
  10. Popa, C.; Stefanov, O.; Goia, I.; Nistor, F. A Hybrid Fault Tree–Fuzzy Logic Model for Risk Analysis in Multimodal Freight Transport. Systems 2025, 13, 429. [Google Scholar] [CrossRef]
  11. Zhang, P.; Chan, K.W. Reliability Evaluation of Phasor Measurement Unit Using Monte Carlo Dynamic Fault Tree Method. IEEE Trans. Smart Grid 2012, 3, 1235–1243. [Google Scholar] [CrossRef]
  12. Zhong, C.; Jin, H. A Novel Optimal Preventive Maintenance Policy for a Cold Standby System Based on Semi-Markov Theory. Eur. J. Oper. Res. 2014, 232, 405–411. [Google Scholar] [CrossRef]
  13. Fathizadeh, M.; Khorshidian, K. An Alternative Approach to Reliability Analysis of Cold Standby Systems. Commun. Stat.—Theory Methods 2016, 45, 6471–6480. [Google Scholar] [CrossRef]
  14. Zeng, Y.; Duan, R.; Huang, S.; Feng, T. Reliability Analysis for Complex Systems Based on Generalized Stochastic Petri Nets and EDA Approach Considering Common Cause Failure. Eng. Comput. 2019, 37, 1513–1530. [Google Scholar] [CrossRef]
  15. Tannous, O.; Xing, L.; Dugan, J.B. Reliability Analysis of Warm Standby Systems Using Sequential BDD. In Proceedings of the Annual Reliability and Maintainability Symposium (Rams), 2011 proceedings; IEEE: New York, YN, USA, 2011. [Google Scholar]
  16. Jia, H.; Ding, Y.; Peng, R.; Song, Y. Reliability Evaluation for Demand-Based Warm Standby Systems Considering Degradation Process. IEEE Trans. Reliab. 2017, 66, 795–805. [Google Scholar] [CrossRef]
  17. Mo, Y.; Xing, L. An Enhanced Decision Diagram-Based Method for Common-Cause Failure Analysis. Proc. Inst. Mech. Eng. Part O-J. Risk Reliab. 2013, 227, 557–566. [Google Scholar] [CrossRef]
  18. Yeh, W.-C.; Huang, C.-L.; Wu, H.-S. An Improved d-MP Algorithm for Reliability of Logistics Delivery Considering Speed Limit of Different Roads. Signals 2022, 3, 895–910. [Google Scholar] [CrossRef]
  19. Levitin, G.; Xing, L.; Dai, Y. Optimizing Dynamic Performance of Multistate Systems With Heterogeneous 1-Out-of-N Warm Standby Components. IEEE Trans. Syst. Man Cybern.-Syst. 2018, 48, 920–929. [Google Scholar] [CrossRef]
  20. Levitin, G.; Xing, L.; Ben-Haim, H.; Huang, H.-Z. Dynamic Demand Satisfaction Probability of Consecutive Sliding Window Systems with Warm Standby Components. Reliab. Eng. Syst. Saf. 2019, 189, 397–405. [Google Scholar] [CrossRef]
  21. Kang, R.; Zhang, Q.; Zeng, Z.; Zio, E.; Li, X. Measuring Reliability under Epistemic Uncertainty: Review on Non-Probabilistic Reliability Metrics. Chin. J. Aeronaut. 2016, 29, 571–579. [Google Scholar] [CrossRef]
  22. Zeng, Z.; Wen, M.; Kang, R. Belief Reliability: A New Metrics for Products’ Reliability. Fuzzy Optim. Decis. Mak. 2013, 12, 15–27. [Google Scholar] [CrossRef]
  23. Wang, Y.; Kang, R.; Chen, Y. Belief Reliability Modeling for the Two-Phase Degradation System with a Change Point under Small Sample Conditions. Comput. Ind. Eng. 2022, 173, 108697. [Google Scholar] [CrossRef]
  24. Li, Y.; Chen, Y.; Zhang, Q.; Kang, R. Belief Reliability Analysis of Multi-State Deteriorating Systems under Epistemic Uncertainty. Inf. Sci. 2022, 604, 249–266. [Google Scholar] [CrossRef]
  25. Chen, Y.; Wang, Y.; Kang, R. Epistemic Uncertainty Propagation and Reliability Evaluation of Feedback Control System. IEEE Trans. Reliab. 2024, 73, 521–535. [Google Scholar] [CrossRef]
  26. Chen, Y.; Wang, Y.; Li, S.; Kang, R. Hybrid Uncertainty Quantification of Dependent Competing Failure Process with Chance Theory. Reliab. Eng. Syst. Saf. 2023, 230, 108958. [Google Scholar] [CrossRef]
  27. Basu, A.; Bozga, M.; Sifakis, J. Modeling heterogeneous real-time components in BIP. In Proceedings of the Fourth IEEE International Conference on Software Engineering and Formal Methods (SEFM 2006); IEEE: New York, YN, USA, 2006; pp. 3–12. [Google Scholar] [CrossRef]
  28. Basu, A.; Bensalem, S.; Bozga, M.; Bourgos, P.; Sifakis, J. Rigorous System Design: The BIP Approach; Lecture Notes in Computer Science (MEMICS 2011); Springer: Berlin/Heidelberg, Germany, 2012; Volume 7119. [Google Scholar] [CrossRef]
  29. Bliudze, S.; Sifakis, J. The Algebra of Connectors—Structuring Interaction in BIP. IEEE Trans. Comput. 2008, 57, 1315–1330. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
