Optimization Methods for Redundancy Allocation in Hybrid Structure Large Binary Systems

Abstract: This paper addresses the issue of optimal redundancy allocation in hybrid structure large binary systems. Two aspects of optimization are considered: (1) maximizing the reliability of the system under the cost constraint, and (2) obtaining the necessary reliability at a minimum cost. The complex binary system considered in this work is composed of many subsystems with redundant structure. To cover most of the cases encountered in practice, the following kinds of redundancy are considered: active redundancy, passive redundancy, hybrid standby redundancy with a hot or warm reserve and possibly other cold ones, triple modular redundancy (TMR) structure with control facilities and cold spare components, static redundancy (triple modular redundancy or 5-modular redundancy (5MR)), TMR/Simplex with cold standby redundancy, and TMR/Duplex with cold standby redundancy. A classic evolutionary algorithm highlights the complexity of this optimization problem. To master the complexity of this problem, two fundamentally different optimization methods are proposed: an improved evolutionary algorithm and a zero-one integer programming formulation. To speed up the search process, a lower bound is determined first. The paper highlights the difficulty of these optimization problems for large systems and, based on numerical results, shows the effectiveness of zero-one integer programming.


Introduction
The problem of reliability optimization in large hybrid systems mainly refers to the type of the system (binary or multi-state), type of solution (reliability allocation and/or redundancy allocation), or the kind of redundancy, which can be static (TMR or 5MR, for example), dynamic (active redundancy or standby redundancy), or hybrid (TMR/Simplex or TMR/Duplex with spare components, etc.). Useful overviews covering models and methods for these reliability optimization problems (ROPs), including reliability allocation, redundancy allocation, and reliability-redundancy allocation can be found in many works, such as [1][2][3].
The mathematical formulation of a reliability optimization problem requires the specification of three elements: decision variables, imposed constraints, and objective function(s).
The decision variables describe those elements that can be changed or adjusted or the decisions that can be made to improve system performance, as expressed by the objective function(s). As examples of decision variables one can mention the types of components and their characteristics (reliability, cost, etc.), the type of redundancy for each subsystem, the number of spare components for each subsystem, etc.
The constraints reflect practical design limitations, e.g., a required level of reliability or the available budget, which occur in almost all cases. But in practice there may be other limitations, related to the volume or weight of the system, for example.
The objective function measures the performance of the system for a set of values of the decision variables. Thus, by optimizing the objective function(s) under the specified constraints it is possible to identify the combination of values of the decision variables that leads to the best possible design solution for the studied system.
Usually for ROPs, the goal of optimization is to maximize system reliability or minimize system cost. In reliability engineering the problem of system reliability maximization under two or more constraints often arises, e.g., under cost constraints, but also under weight and/or volume constraints. When an analytical approach is possible (e.g., in the case of active-redundancy-only subsystems), to ensure that two or more constraints are satisfied, Lagrangian multipliers are often introduced as part of the objective function [4][5][6].
In this paper we address a class of redundancy allocation problems (RAPs) where the decision variable is the number of redundant components for each subsystem in a series redundant reliability model. RAP is one of the most studied reliability optimization problems, because it has been proven to be quite difficult to solve, and many different optimization approaches have been used to determine optimal or near-optimal solutions. As [7] demonstrates, RAPs belong to the NP-hard class of optimization problems.
The RAPs we consider involve hybrid structures with no less than eight types of redundancy; these are conditions where the optimization problems are difficult to solve, even if we limit ourselves to single-constraint optimization problems. More specifically, our goal is to highlight the difficulty of these RAPs for large systems, when the number of subsystems grows to the order of tens or even hundreds.
In order to master the complexity of RAPs in the case of large systems, for which the difficulty of the problem increases, special research efforts have been made in recent years. In addition, to cover a wide range of techniques used in practice to increase reliability, many hybrid reliability models have been considered, for which the RAPs get even more complicated. For example, [8] investigates a complex reliability-redundancy allocation problem with a component mixing strategy, which changes the traditional RAP model to a heterogeneous one. Moreover, in the hybrid reliability models proposed in [9], the choice of redundancy strategy is considered as a decision variable. So, for each subsystem, an active or cold standby redundancy may be considered. In addition, components of different types can be used in each subsystem, i.e., a component mixing strategy. Consequently, this RAP involves determining a solution that maximizes system reliability in terms of the type of redundancy and the number of spare components of each type (for each subsystem). To solve this RAP, a genetic algorithm is developed. Also, a reliability model based on cold standby redundancy combined with component mixing is investigated by [10]. For this complex problem, the author proposes a simplified swarm optimization method in which a multi-role resource sharing strategy is adopted to provide the diverse system components. Another reliability model based on active or cold standby redundancy combined with component mixing is investigated in [11]. To solve this RAP, the authors propose a parallel stochastic fractal search algorithm. Other RAPs involving a heterogeneous structure and/or component allocation strategy of a different type can be found in [12][13][14].
Such a hybrid reliability model is also considered in this paper. In the previously cited works, RAPs are formulated by considering redundant systems with hybrid redundancy strategies and/or reliability models with heterogeneous components, which means that each component of a subsystem can have its own failure rate. In this paper we limit ourselves to the case where subsystems include homogeneous components, but we extend RAPs to cover more redundancy strategies (not just active redundancy or cold standby), including static redundancy or reconfigurable structures such as TMR/Simplex or TMR/Duplex with cold standby redundancy.
As the highlights of our contribution we can mention:
• The formalization of two RAPs for binary systems with hybrid structure, which include no less than eight types of redundancy, where reliability modeling of redundant and reconfigurable structures is based on Markov chains;
• The design and implementation of two evolutionary algorithms and the formulation of a zero-one integer program for solving these complex optimization problems;
• An extensive performance evaluation study of the three proposed techniques on thousands of problems, which demonstrates the effectiveness of the zero-one integer programming approach for large systems with tens or even hundreds of subsystems.
This paper is organized as follows. Section 2 presents the issue addressed, whereas the types of redundancy considered here and the models or equations used for reliability evaluation are presented in detail in Section 3. Some related works are mentioned in Section 4. The algorithms used for these optimal allocation issues are described in Section 5. The objective functions adopted for the evolutionary algorithms and for the linear programming model are reported in Section 6, whereas in Section 7 a lower bound solution is proven. Experimental results are presented in Section 8. Further discussion is the subject of Section 9. The conclusions of the paper and several directions of future research are included in Section 10.

Problem Description
For systems with a large number of components without redundancy, reliability is often very low. To achieve the required reliability, a certain type of redundancy is applied to a certain element, depending on technical particularities, which can be static, dynamic, or hybrid redundancy. All of these types of redundancy are considered in this paper. The reliability model for this redundant system is a series-redundant one as presented in Figure 1. The notations used to describe the redundant structures and their reliability evaluation models are presented at the end of the paper. Along with these notations we include a short nomenclature and some assumptions under which the reliability models are valid.
Typically, in this allocation process the criterion may be reliability, cost, weight, or volume. One or more criteria can be considered in an objective function, while the others may be considered constraints, as discussed in [22] (pp. 331-338). In this paper, the criteria we consider are reliability and cost, and in this situation, two optimization problems are frequently encountered in practice:

1. Minimizing the cost of the redundant system for which a required reliability must be achieved;
2. Maximizing the reliability of the system within a maximum allowed cost.

In both cases, from the mathematical point of view, one must solve an optimization problem with an objective function and constraints. More exactly, for the first problem, one must minimize the cost function in Equation (1) subject to the reliability constraint in Equation (2). For the second problem, one must maximize the reliability function in Equation (3) subject to the cost constraint in Equation (4). For example, when active redundancy is considered for all the subsystems, a series-parallel reliability model results for the redundant system, and the cost and reliability functions take the forms given in Equations (5) and (6), respectively. Thus, we have to determine the values $k_1, k_2, \ldots, k_n$ that minimize the cost function in Equation (5) with the reliability constraint in Equation (2), or maximize the reliability function in Equation (6) with the cost constraint in Equation (4), as the case may be.
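As an illustration of the series-parallel case, the following Python sketch evaluates the cost and the reliability of an allocation $k_1, \ldots, k_n$ and solves both problems by exhaustive search on a toy system. The function names, the symbols `c` (component costs) and `r` (component reliabilities), the bound `k_max`, and the numerical data are hypothetical; the cost is assumed to be additive in the number of components. Exhaustive search is used only because the example is tiny; it is not one of the methods proposed in this paper.

```python
from itertools import product
from math import prod


def system_cost(c, k):
    """Total cost, assuming an additive cost model: sum of c_i * k_i."""
    return sum(ci * ki for ci, ki in zip(c, k))


def system_reliability(r, k):
    """Series system of active-redundancy subsystems: prod(1 - (1 - r_i)^k_i)."""
    return prod(1.0 - (1.0 - ri) ** ki for ri, ki in zip(r, k))


def max_reliability(r, c, c_max, k_max=5):
    """Problem 2: maximize reliability subject to cost <= c_max (brute force)."""
    best = None
    for k in product(range(1, k_max + 1), repeat=len(r)):
        if system_cost(c, k) <= c_max:
            rel = system_reliability(r, k)
            if best is None or rel > best[1]:
                best = (k, rel)
    return best


def min_cost(r, c, r_req, k_max=5):
    """Problem 1: minimize cost subject to reliability >= r_req (brute force)."""
    best = None
    for k in product(range(1, k_max + 1), repeat=len(r)):
        if system_reliability(r, k) >= r_req:
            cost = system_cost(c, k)
            if best is None or cost < best[1]:
                best = (k, cost)
    return best


# Hypothetical toy data: three subsystems.
r = [0.90, 0.80, 0.95]
c = [2.0, 3.0, 1.0]
print(max_reliability(r, c, c_max=20.0))
print(min_cost(r, c, r_req=0.99))
```

The search space has `k_max ** n` points, which is why this approach stops being usable long before the number of subsystems reaches the tens or hundreds considered in this paper.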

Types of Redundancy
To cover most situations encountered in practice, the following types of redundancy are considered in this study:
• active redundancy (tr = A);
• passive redundancy (or cold standby redundancy) (tr = B);
• hybrid standby redundancy with a hot reserve (tr = C) or a warm one (tr = D) and possibly other cold ones;
• hybrid redundancy consisting of a TMR structure with control facilities and possibly cold reserves (tr = E);
• static redundancy: TMR or 5MR (tr = F);
• reconfigurable TMR/Simplex type structure with possible other cold-maintained spare components (tr = G);
• reconfigurable TMR/Duplex type structure with possible other cold-maintained spare components (tr = H).

The reliability model and the equations used to evaluate the reliability for a subsystem, depending on the type of redundancy, are presented in this section. Since the time to failure for a component is assumed to have a negative exponential distribution, the following equations are valid:
$$r = e^{-\lambda T} \tag{7}$$
and
$$\lambda T = -\ln r \tag{8}$$
Remember that for any redundant subsystem the spare components are considered identical to the basic ones.

Mathematics 2022, 10, 3698

Active Redundancy (tr = A)
For this parallel reliability model where all components operate simultaneously, the well-known equation is applied:
$$R = 1 - (1 - r)^{k} \tag{9}$$

Passive Redundancy (tr = B)
In this case, one component is in operation and all other identical $k - 1$ spare components are maintained in a cold state, which means that a spare component is switched off until it is needed to replace the defective one (i.e., a redundant component does not fail in cold standby mode). The following equation can be applied to this model:
$$R = e^{-\lambda T} \sum_{i=0}^{k-1} \frac{(\lambda T)^{i}}{i!} = r \sum_{i=0}^{k-1} \frac{(-\ln r)^{i}}{i!} \tag{10}$$
Note that Equation (10) is the sum of the first $k$ terms of the Poisson distribution of the parameter $\lambda T$.
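The two models described so far (all components operating in parallel, and one operating component backed by cold spares) can be sketched in Python as follows. This is a minimal illustration under the assumptions already stated in the text (exponential time to failure, identical components, spares that do not fail while cold) plus the assumption of perfect switching; the function names are hypothetical.

```python
from math import factorial, log


def active_reliability(r, k):
    """Active redundancy: the subsystem works if at least one of k components works."""
    return 1.0 - (1.0 - r) ** k


def cold_standby_reliability(r, k):
    """Cold standby with k components in total (one operating, k - 1 spares).

    With lambda*T = -ln(r), the subsystem survives if fewer than k failures
    occur in [0, T], i.e. the first k terms of a Poisson distribution.
    """
    lam_t = -log(r)
    return r * sum(lam_t ** i / factorial(i) for i in range(k))


for k in (1, 2, 3):
    print(k, active_reliability(0.9, k), cold_standby_reliability(0.9, k))
```

For the same number of components, the cold standby value is never lower than the active one under these assumptions, because spares do not age while switched off.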

Hybrid Standby Redundancy with a Hot (tr = C) or a Warm (tr = D) Spare and Possibly Other Cold Ones
In this case of standby redundancy, a component is in operation, a spare component is active or kept in a warm state, and possibly other spare components are kept in cold conditions, as illustrated in Figure 2. A warm component may fail before being put into operation, and its failure rate is less than that of the same component in active mode. Therefore, let $\alpha\lambda$, $0 < \alpha \le 1$, be the failure rate for this reserve. For this type of redundancy, the subsystem reliability function is obtained based on the Markov method, depending on the total number of components, as shown below.

Case 1: k = 2
Consider a subsystem consisting of a component in operation and a warm-maintained reserve. The evolution of this redundant subsystem until failure is illustrated by the Markov chain presented in Figure 3.

To begin with, let us refer to a general Markov model. Let $S_1, S_2, \ldots, S_N$ be the states of the Markov chain and $A = [a_{x,y}]_{N \times N}$ be the matrix of state transition rates, where $a_{x,y}$, $x \ne y$, represents the rate of transition from state $S_y$ to state $S_x$, while an element of the main diagonal (i.e., $x = y$) is the negative value of the sum of all the other elements in the column.

Let $s(t)$ be the state of the subsystem at the time $t$, and
$$p_x(t) = \Pr\{s(t) = S_x\}, \quad x = 1 : N. \tag{11}$$
To obtain the probability functions $p_x(t)$, $x = 1 : N$, the following system of differential equations must be solved:
$$\frac{dP(t)}{dt} = A \cdot P(t) \tag{12}$$
where
$$P(t) = [\,p_1(t) \;\; p_2(t) \;\; \ldots \;\; p_N(t)\,]^{T}.$$
Note that the state probabilities for $t = 0$ are also known.

Let us resume the analysis of the subsystem under study. In the Markov chain presented in Figure 3, $S_1$ and $S_2$ are successful states, while $S_3$ is a failure state. Thus, the reliability function of this redundant subsystem can be defined as
$$R(t) = p_1(t) + p_2(t) \tag{13}$$
As the transition rate matrix is:
$$A = \begin{bmatrix} -(1+\alpha)\lambda & 0 & 0 \\ (1+\alpha)\lambda & -\lambda & 0 \\ 0 & \lambda & 0 \end{bmatrix} \tag{14}$$
to determine the probability functions $p_1(t)$ and $p_2(t)$, the following system of differential equations must be solved:
$$\begin{cases} p_1'(t) = -(1+\alpha)\lambda\, p_1(t) \\ p_2'(t) = (1+\alpha)\lambda\, p_1(t) - \lambda\, p_2(t) \end{cases} \tag{15}$$
With the initial values $p_1(0) = 1$ and $p_2(0) = p_3(0) = 0$, by applying the Laplace transform ($\mathcal{L}$), the following system of algebraic equations results:
$$\begin{cases} s P_1(s) - 1 = -(1+\alpha)\lambda\, P_1(s) \\ s P_2(s) = (1+\alpha)\lambda\, P_1(s) - \lambda\, P_2(s) \end{cases} \tag{16}$$
where $P_i(s) = \mathcal{L}\{p_i(t)\}$, $i \in \{1, 2\}$, are functions in the frequency domain, and $s$ is the Laplace operator. Based on (16), after some algebraic operations, the following functions are obtained:
$$P_1(s) = \frac{1}{s + (1+\alpha)\lambda}, \qquad P_2(s) = \frac{(1+\alpha)\lambda}{(s + \lambda)\,(s + (1+\alpha)\lambda)}$$
After a partial-fraction expansion, the function $P_2(s)$ can be expressed as follows:
$$P_2(s) = \frac{1+\alpha}{\alpha} \left[ \frac{1}{s + \lambda} - \frac{1}{s + (1+\alpha)\lambda} \right]$$
As the function $R(s) = \mathcal{L}\{R(t)\} = P_1(s) + P_2(s)$, the following expression results:
$$R(s) = \frac{1+\alpha}{\alpha} \cdot \frac{1}{s + \lambda} - \frac{1}{\alpha} \cdot \frac{1}{s + (1+\alpha)\lambda}$$
The reliability function $R(t)$ can then be obtained by applying the inverse Laplace transform, $R(t) = \mathcal{L}^{-1}\{R(s)\}$. Thus, the reliability function has the following form:
$$R(t) = \frac{1+\alpha}{\alpha}\, e^{-\lambda t} - \frac{1}{\alpha}\, e^{-(1+\alpha)\lambda t}$$
For a certain period of time $T$, the component reliability is $r = e^{-\lambda T}$, so that the subsystem reliability $R$ as a function of $r$ and $\alpha$ is given by the equation:
$$R = \frac{(1+\alpha)\, r - r^{1+\alpha}}{\alpha}$$
For a redundant subsystem with a larger number of components, the reliability function can be obtained based on the Markov method in the same way, but the algebraic operations are more complicated. The results for the other two cases are presented below.
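The Markov method for the k = 2 subsystem (one operating component with failure rate λ and one warm spare with failure rate αλ) can be checked numerically. The sketch below integrates the state equations of the three-state chain with a classical fourth-order Runge-Kutta scheme and compares the result with the closed form $R = ((1+\alpha) r - r^{1+\alpha})/\alpha$ that follows from that chain. The function names and the numerical values of λ, α and T are hypothetical, and the integrator is a generic illustration rather than part of the method used in this paper.

```python
from math import exp


def warm_standby_pair(r, alpha):
    """Closed form for k = 2: R = ((1 + alpha) * r - r**(1 + alpha)) / alpha."""
    return ((1.0 + alpha) * r - r ** (1.0 + alpha)) / alpha


def integrate_chain(A, p0, t_end, steps=2000):
    """Integrate dP/dt = A P with RK4; A[x][y] is the rate from state y to state x."""
    n = len(p0)

    def deriv(p):
        return [sum(A[x][y] * p[y] for y in range(n)) for x in range(n)]

    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([p[i] + 0.5 * h * k1[i] for i in range(n)])
        k3 = deriv([p[i] + 0.5 * h * k2[i] for i in range(n)])
        k4 = deriv([p[i] + h * k3[i] for i in range(n)])
        p = [p[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
             for i in range(n)]
    return p


# Hypothetical values: failure rate, warm-spare factor, mission time.
lam, alpha, T = 1.0e-3, 0.5, 200.0
a = (1.0 + alpha) * lam
A = [[-a, 0.0, 0.0],      # S1: operating component and warm spare both good
     [a, -lam, 0.0],      # S2: a single good component left
     [0.0, lam, 0.0]]     # S3: subsystem failed (absorbing state)
p = integrate_chain(A, [1.0, 0.0, 0.0], T)
r = exp(-lam * T)
print(p[0] + p[1], warm_standby_pair(r, alpha))
```

The same integrator can be reused for chains with more states by changing only the matrix `A` and the set of successful states, which is convenient when a closed form is tedious to derive by hand.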

Case 2: k = 3
Take a redundant subsystem composed of an active component, a hot/warm spare component, and another one maintained in cold conditions. For this case, the following reliability function results:

Case 3: k = 4
For a redundant subsystem with an active component, a hot/warm spare component, and two other ones maintained in cold conditions, the reliability function is given by the following equation:

TMR Structure with Control Facilities and Cold Spare Components (tr = E)
In this case, another hybrid redundancy is considered. Thus, a redundant system is composed of a TMR structure with control facilities as a basic structure (i.e., static redundancy) and possibly one or more components maintained in cold conditions (i.e., standby redundancy). This type of hybrid redundancy is illustrated in Figure 4.
The decision logic works on the principle of majority logic, 2 out of 3; it is called a voter and is represented by the symbol V in Figure 4. When one of the three components in operation ($CO_1$, $CO_2$ or $CO_3$) fails, an error signal indicates the faulty component. Thus, the faulty component can be replaced with a cold-maintained standby one as soon as possible. In this way, this redundant hybrid subsystem can tolerate one or more defective components, as the case may be. For the additional decision and control block, the failure rate, denoted by $\lambda_{dc}$, is expressed based on the basic component rate, $\lambda$. In this study, the following expression is used:
$$\lambda_{dc} = \frac{\lambda}{\beta}, \quad \beta > 1$$
Consequently, the reliability function for the logical decision and control block, denoted by $r_{dc}$, is expressed as:
$$r_{dc} = e^{-\lambda_{dc} T} = r^{1/\beta}$$

Case 1: TMR Structure without Standby Redundancy
In the case of a TMR structure without reserves (i.e., $k = 3$), the redundant subsystem can tolerate only one faulty component, so the subsystem reliability function is given by the well-known equation:
$$R = r_{dc}\,(3r^{2} - 2r^{3})$$

Case 2: TMR Structure and One Cold Spare Component
A redundant subsystem with hybrid redundancy composed of a TMR structure and one CSC (i.e., $k = 4$) may tolerate two faulty components. For a start, for the logical block of decision and control, the possibility of failure is neglected. The reliability evaluation is made based on the Markov graph given in Figure 5. In this graph, $S_1$, $S_2$ and $S_3$ are successful states, while $S_4$ is a failure one. Given these aspects, the reliability function of this redundant subsystem is expressed as:
$$R(t) = p_1(t) + p_2(t) + p_3(t)$$
As the transition rate matrix is:
$$A = \begin{bmatrix} -3\lambda & 0 & 0 & 0 \\ 3\lambda & -3\lambda & 0 & 0 \\ 0 & 3\lambda & -2\lambda & 0 \\ 0 & 0 & 2\lambda & 0 \end{bmatrix}$$
by applying Equation (12) in order to determine the probability functions $p_1(t)$, $p_2(t)$ and $p_3(t)$, the next system of differential equations results:
$$\begin{cases} p_1'(t) = -3\lambda\, p_1(t) \\ p_2'(t) = 3\lambda\, p_1(t) - 3\lambda\, p_2(t) \\ p_3'(t) = 3\lambda\, p_2(t) - 2\lambda\, p_3(t) \end{cases}$$
With the initial values $p_1(0) = 1$ and $p_2(0) = p_3(0) = 0$, by applying the Laplace transform, the following system of algebraic equations is obtained:
$$\begin{cases} s P_1(s) - 1 = -3\lambda\, P_1(s) \\ s P_2(s) = 3\lambda\, P_1(s) - 3\lambda\, P_2(s) \\ s P_3(s) = 3\lambda\, P_2(s) - 2\lambda\, P_3(s) \end{cases}$$
By solving the system, the following functions in the frequency domain result:
$$P_1(s) = \frac{1}{s + 3\lambda}, \qquad P_2(s) = \frac{3\lambda}{(s + 3\lambda)^{2}}, \qquad P_3(s) = \frac{9\lambda^{2}}{(s + 2\lambda)(s + 3\lambda)^{2}}$$
As the function
$$R(s) = P_1(s) + P_2(s) + P_3(s)$$
the following expression results:
$$R(s) = \frac{9}{s + 2\lambda} - \frac{8}{s + 3\lambda} - \frac{6\lambda}{(s + 3\lambda)^{2}}$$
The reliability function $R(t)$, obtained by applying the inverse Laplace transform, is of the form:
$$R(t) = 9 e^{-2\lambda t} - (8 + 6\lambda t)\, e^{-3\lambda t}$$
Finally, taking also into account the reliability of the decision and control logic, the subsystem reliability $R$ as a function of $r$ and $\beta$ is given by the equation:
$$R = r^{1/\beta} \left( 9r^{2} - 8r^{3} + 6r^{3} \ln r \right)$$
For a hybrid redundancy subsystem with a larger number of CSCs, the reliability function can be obtained by applying the Markov method in the same way, but the algebraic operations are more complicated. A result obtained for another case is presented as follows.

Case 3: TMR Structure and Two Cold Spare Components
Take a redundant subsystem with hybrid redundancy composed of a TMR structure and two CSCs (i.e., $k = 5$). This redundant subsystem can tolerate three defective components. A Markov-based approach similar to the one presented above gives the following subsystem reliability as a function of $r$ and $\beta$:

Static Redundancy: TMR or 5MR (tr = F)
This type of redundancy refers to those subsystems for which a static redundancy with majority logic (TMR or 5MR) can be adopted, depending on the desired level of reliability. Thus, in the process of finding an optimal solution, the valid values for variable $k$ are 1, 3 and 5.

Case 1: TMR Structure
This case where k = 3 was also considered in Section 3.4, Case 1, so that the reliability function for this redundant subsystem is given by Equation (25).

Case 2: 5MR Structure
When a 5MR redundancy is adopted (i.e., $k = 5$), as noted in [22] (pp. 165-176), the additional logic of decision and control is more complex than that used for TMR redundancy. Consequently, the failure rate, denoted by $\lambda_{dc}$, expressed on the basis of the failure rate of the basic components, is considered of the form:
$$\lambda_{dc} = \frac{\lambda}{\gamma}, \quad \gamma > 1$$
where the reduction factor $\gamma$ is lower than the reduction factor $\beta$ used for the TMR redundancy. Because the 5MR structure can tolerate two defective components, the reliability of the subsystem can be calculated as follows:
$$R = r^{1/\gamma} \sum_{i=3}^{5} \binom{5}{i} r^{i} (1 - r)^{5-i}$$
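Static redundancy with majority logic can be sketched in Python as a "majority of n" model multiplied by the reliability of the decision and control logic. The sketch assumes that the logic has a failure rate equal to the component failure rate divided by a reduction factor greater than 1, so that its reliability is `r ** (1 / factor)`; this form, the function names, and the factor values 20 and 10 are assumptions made for illustration only.

```python
from math import comb


def majority_reliability(r, n):
    """Probability that a strict majority of n independent components works."""
    m = n // 2 + 1
    return sum(comb(n, i) * r ** i * (1.0 - r) ** (n - i)
               for i in range(m, n + 1))


def nmr_subsystem(r, n, factor):
    """Majority core times the reliability of the decision/control logic.

    Assumption: the logic fails at rate lambda / factor, with factor > 1,
    hence its reliability over the mission time is r ** (1 / factor).
    """
    return r ** (1.0 / factor) * majority_reliability(r, n)


r = 0.9
print("TMR core:", majority_reliability(r, 3))
print("5MR core:", majority_reliability(r, 5))
print("TMR with logic (factor 20):", nmr_subsystem(r, 3, 20.0))
print("5MR with logic (factor 10):", nmr_subsystem(r, 5, 10.0))
```

With these hypothetical factors, the gain of the 5MR core over the TMR core is partly offset by its less reliable logic, which is why the choice among k = 1, 3 and 5 is left to the optimizer.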

TMR/Simplex and Cold Standby Redundancy (tr = G)
This is another case of hybrid redundancy in which the basic structure is reconfigurable. Specifically, the redundant subsystem consists of a TMR structure with control and reconfiguration facilities and other possible CSCs, as shown in Figure 6.
If one of the three components in operation fails, the subsystem continues to operate successfully based on redundancy, and the control logic generates an error signal indicating the faulty component. The status of the active component (good or failed) is reflected by three dedicated flip-flops. For example, Figure 6 illustrates the case where components CO 1 and CO 3 work successfully and component CO 2 is defective.
When an error signal is activated, the defective component must be replaced with a spare one as soon as possible to restore the initial fault tolerance state. Let us suppose this replacement is done quickly enough so that reliability is not significantly affected. When only two components remain in a good state, in order to increase the reliability, it is preferable for only one component to continue to work, not both. This reconfigurable structure is known as TMR/Simplex [32] (p. 233) or TMR 3-2-1 [22] (p. 152). Note that after a component has failed, the control logic can no longer correctly indicate another fault, so the values of the status flip-flops must be preserved until the fault tolerance is restored. This is the role of the 3-input NAND logic gate.

For the additional decision, control and reconfiguration logic block, the failure rate, denoted by $\lambda_{dcr}$, is expressed based on the basic component rate. In this study, the following equation is used:
$$\lambda_{dcr} = \frac{\lambda}{\delta}, \quad \delta > 1$$
where the reduction factor $\delta$ is lower than the reduction factor $\beta$ used for TMR redundancy. Consequently, the reliability function for the logic of decision, control and reconfiguration, denoted by $r_{dcr}$, is expressed as:
$$r_{dcr} = e^{-\lambda_{dcr} T} = r^{1/\delta}$$
The reliability of the redundant subsystem depends on the number of CSCs, as shown below.

Case 1: TMR/Simplex without Standby Redundancy
In the case of TMR/Simplex redundancy without spare components (i.e., $k = 3$), the subsystem reliability function is given by the well-known equation [32] (p. 233):
$$R = r_{dcr} \cdot \frac{3r - r^{3}}{2}$$

Case 2: TMR/Simplex and One Cold Reserve
For this case of hybrid redundancy, the reliability evaluation is made by applying the Markov method. For starters, for the logical block of decision, control and reconfiguration, the possibility of failure is neglected. In this condition, the evolution of the redundant subsystem to failure is illustrated by the Markov chain shown in Figure 7. In this graph, $S_1$, $S_2$ and $S_3$ are states of success, while $S_4$ is a failure state. Consequently, the subsystem reliability is defined as:
$$R(t) = p_1(t) + p_2(t) + p_3(t)$$
Since the transition rate matrix is:
$$A = \begin{bmatrix} -3\lambda & 0 & 0 & 0 \\ 3\lambda & -3\lambda & 0 & 0 \\ 0 & 3\lambda & -\lambda & 0 \\ 0 & 0 & \lambda & 0 \end{bmatrix}$$
based on (12), the following system of differential equations results:
$$\begin{cases} p_1'(t) = -3\lambda\, p_1(t) \\ p_2'(t) = 3\lambda\, p_1(t) - 3\lambda\, p_2(t) \\ p_3'(t) = 3\lambda\, p_2(t) - \lambda\, p_3(t) \end{cases}$$
With the initial values $p_1(0) = 1$ and $p_2(0) = p_3(0) = 0$, by applying the Laplace transform, the following system of algebraic equations is obtained:
$$\begin{cases} s P_1(s) - 1 = -3\lambda\, P_1(s) \\ s P_2(s) = 3\lambda\, P_1(s) - 3\lambda\, P_2(s) \\ s P_3(s) = 3\lambda\, P_2(s) - \lambda\, P_3(s) \end{cases}$$
By solving this equation system, the following functions in the field of the Laplace transform are obtained:
$$P_1(s) = \frac{1}{s + 3\lambda}, \qquad P_2(s) = \frac{3\lambda}{(s + 3\lambda)^{2}}, \qquad P_3(s) = \frac{9\lambda^{2}}{(s + \lambda)(s + 3\lambda)^{2}}$$
The reliability function in the field of the Laplace transform is:
$$R(s) = \frac{9}{4} \cdot \frac{1}{s + \lambda} - \frac{5}{4} \cdot \frac{1}{s + 3\lambda} - \frac{3}{2} \cdot \frac{\lambda}{(s + 3\lambda)^{2}}$$
The reliability function $R(t)$, obtained by applying the inverse Laplace transform, is of the form:
$$R(t) = \frac{9}{4}\, e^{-\lambda t} - \left( \frac{5}{4} + \frac{3}{2} \lambda t \right) e^{-3\lambda t}$$
Finally, taking also into account the reliability of the logical block of decision, control and reconfiguration, the reliability of the subsystem $R$ as a function of $r$ and $\delta$ is given by the equation:
$$R = r^{1/\delta} \left( \frac{9}{4}\, r - \frac{5}{4}\, r^{3} + \frac{3}{2}\, r^{3} \ln r \right)$$

Case 3: TMR/Simplex and Two Cold Reserves
Take a reconfigurable subsystem with hybrid redundancy composed of a TMR/Simplex structure and two CSCs (i.e., $k = 5$). This redundant subsystem can tolerate three defective components. A Markov-based approach similar to the one presented above gives the following subsystem reliability as a function of $r$ and $\delta$:
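As a sanity check of the TMR/Simplex behaviour without spare components, the following Python sketch estimates the reliability by Monte Carlo simulation and compares it with the closed form $(3r - r^{3})/2$ for the core structure. The sketch neglects failures of the decision, control and reconfiguration logic, assumes exponentially distributed lifetimes, and uses hypothetical function names and parameter values; it is an independent illustration, not the evaluation procedure used in this paper.

```python
import random
from math import exp


def tmr_simplex_closed_form(r):
    """Core TMR/Simplex reliability with the control logic assumed perfect."""
    return (3.0 * r - r ** 3) / 2.0


def tmr_simplex_monte_carlo(lam, T, runs=100_000, seed=1):
    """Estimate P(subsystem survives T) by simulating component lifetimes."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(runs):
        life = [rng.expovariate(lam) for _ in range(3)]
        first = min(range(3), key=life.__getitem__)
        # TMR masks the first fault; afterwards a single good component
        # (here: the lowest-indexed survivor) continues alone in simplex mode.
        survivor = next(i for i in range(3) if i != first)
        if life[survivor] > T:
            ok += 1
    return ok / runs


lam, T = 0.5, 1.0  # hypothetical failure rate and mission time
r = exp(-lam * T)
print(tmr_simplex_monte_carlo(lam, T), tmr_simplex_closed_form(r))
```

The simulation does not rely on the Markov derivation at all, so agreement between the two values is a useful independent check before the chain is extended with cold reserves.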

TMR/Duplex and Cold Standby Redundancy (tr = H)
As in the previous case, the redundant subsystem has a hybrid redundancy consisting of a reconfigurable TMR structure and possibly other CSCs, as shown in Figure 6. But this reconfigurable structure also aims at high operational safety. Thus, when one component of the TMR structure fails, the other two good components are put into operation in duplex mode. Specifically, the two components operate in parallel and their outputs are compared continuously. When the two components no longer generate the same response, an error signal is activated (as shown in Figure 6), so that the operation is stopped in safe mode. This reconfigurable structure is called TMR/Duplex in [32].
Regarding the reliability assessment, note that this redundant subsystem can tolerate the same number of faulty components as the TMR structure presented in Section 3.4 for type E redundancy. Consequently, depending on the total number of components (k), Equations (26), (35) or (36) are valid in this case as well, with the only difference that the reduction factor β is replaced by δ.

Related Work
The problems of maximizing reliability with a cost constraint or minimizing cost with a reliability constraint can be solved using various methods. One is by solving an analytical model based on Lagrange multipliers with an alternative indicator for reliability [4]. The resulting system of algebraic equations can be solved, but it involves some approximate relations, which may impact the accuracy of the solution. Also, this method gives real-valued results which must be converted into integers, and this may have a strong impact on solution quality. Therefore, heuristic methods can be appropriate. For example, one such technique described by [22] (p. 335) is a greedy approach that tries to make an optimal choice at each step: starting with the minimum system design, the system reliability is increased by adding one component to the subsystem with the lowest reliability. This process is repeated as long as the cost constraint is met.
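The greedy heuristic described above can be sketched in a few lines of Python. The sketch assumes active redundancy in every subsystem and an additive cost model, and it stops as soon as the weakest subsystem can no longer be reinforced within the budget; these choices, the function name, and the toy data are assumptions made for illustration and are not taken from [22].

```python
def greedy_allocation(r, c, c_max):
    """Repeatedly add one component to the currently least reliable subsystem."""
    k = [1] * len(r)                      # minimum system design
    cost = sum(c)
    if cost > c_max:
        raise ValueError("the minimum design already exceeds the budget")
    while True:
        # Subsystem reliabilities under active redundancy (assumption).
        rel = [1.0 - (1.0 - ri) ** ki for ri, ki in zip(r, k)]
        weakest = min(range(len(r)), key=rel.__getitem__)
        if cost + c[weakest] > c_max:
            break                         # cost constraint would be violated
        k[weakest] += 1
        cost += c[weakest]
    return k, cost


# Hypothetical toy data: three subsystems, budget of 12 cost units.
print(greedy_allocation([0.90, 0.80, 0.95], [2.0, 3.0, 1.0], 12.0))
```

Because each step looks only at the weakest subsystem and ignores component costs, the result is not guaranteed to be optimal, which is one motivation for the improved methods discussed in this paper.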
Another method described by [33] (pp. 499-532) tries to accelerate the allocation process by noticing that the subsystem with the highest reliability should have the smallest number of components, and the least reliable subsystem should have the greatest number of components. Starting with the initial system, the reliability is increased by adding one component to each subsystem as long as the cost constraint is met. For the most reliable subsystem, this is the final allocation. The process continues with the other subsystems, until no allocation is possible any longer.
Pairwise Hill Climbing (PHC) [29] adapts the idea of classic hill climbing to the reliability-cost problem. Two candidate solutions are generated for each pair of subsystems. The first candidate is created by adding one component to the first subsystem, i.e., the direct hill climbing operation. The second is created by adding one component to the first subsystem and subtracting one from the second subsystem, i.e., a swapping operation. A hybrid approach starting from an approximate, but nearly-optimal solution given by the analytical approach, further improved by PHC was found to provide good results.
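The two PHC moves can be sketched as follows. This is a simplified illustration rather than the algorithm of [29]: it assumes active redundancy for all subsystems, a linear cost model, and a best-improvement acceptance rule, and all function names are hypothetical.

```python
import math
from itertools import permutations

def system_reliability(r, k):
    # Series system of subsystems with active redundancy (assumption)
    return math.prod(1.0 - (1.0 - ri) ** ki for ri, ki in zip(r, k))

def system_cost(c, k):
    # Assumed linear cost model
    return sum(ci * ki for ci, ki in zip(c, k))

def phc_step(r, c, k, cost_limit):
    """Return the best feasible improving neighbor of k, or None if there is none."""
    best, best_rel = None, system_reliability(r, k)
    for i, j in permutations(range(len(k)), 2):
        add = list(k)
        add[i] += 1            # direct hill climbing move
        swap = list(add)
        swap[j] -= 1           # swapping move: +1 for i, -1 for j
        for cand in (add, swap):
            if cand[j] < 1 or system_cost(c, cand) > cost_limit:
                continue
            rel = system_reliability(r, cand)
            if rel > best_rel:
                best, best_rel = cand, rel
    return best

def pairwise_hill_climbing(r, c, k0, cost_limit):
    k = list(k0)
    while True:
        nxt = phc_step(r, c, k, cost_limit)
        if nxt is None:
            return k
        k = nxt
```

The swapping move is what allows the search to keep improving once the budget is nearly exhausted and no component can be added directly.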
The problem can also be expressed as a quadratic unconstrained binary optimization (QUBO). This formulation has the potential of being solved by the D-Wave quantum computer as shown by [29] or [34].
The problem must be stated in the form:

$$\min_{q} \; \sum_{i} a_i q_i + \sum_{i<j} b_{ij} q_i q_j, \qquad q_i \in \{0, 1\}.$$

The user needs to specify the parameters $a_i$ (the weights associated with each qubit) and $b_{ij}$ (the strengths of the couplers between qubits). The expression is minimized by quantum annealing when run on the quantum computer, and the observed $q_i$ values of either 0 or 1 represent the solution. A special procedure is required to transform the inequality constraint into additional terms to be optimized together with the main objective function in the same expression [29].

The Optimization Algorithms
The experimental studies presented in Section 9 are based on three approaches: a classical real-valued evolutionary algorithm, an improved evolutionary algorithm called RELIVE, that combines global search with local search, and a zero-one integer programming model, i.e., a special case of linear programming. While these techniques have been extensively used for various optimization problems, an original contribution of the current paper is the design of the objective functions corresponding to the problem under study, described in Section 6.

Classic Evolutionary Algorithm
Evolutionary algorithms (EAs) are inspired by biological natural selection [35,36]. They maintain a population of individuals (or chromosomes) which are potential solutions, i.e., different values of the input x of the objective function f(x) that needs to be optimized. There are three main genetic operators which are repeatedly applied for a pre-specified number of generations or until a convergence criterion is satisfied: selection (which identifies "parents", such that individuals with better objective function values have a higher probability of being selected), crossover (which combines the genes of two parents and creates an offspring), and mutation (which may change some genes of a child before it is inserted into the new population). All these operators are stochastic, but the constant favoring of better individuals for reproduction drives the algorithm towards increasingly better solutions, while random changes in the chromosomes try to prevent it from converging to local optima. For the experiments in Section 8, the standard evolutionary algorithm (SEA) uses the following types of operators and parameters:
• tournament selection with two individuals;
• elitism, i.e., the best individual is directly copied into the next generation;
• arithmetic crossover, where a child chromosome is a linear combination of the parent chromosomes, with a probability of 0.9;
• mutation by gene resetting, where the value of a randomly selected gene is set to a random number from a uniform distribution defined on its domain of definition, with a probability of 0.2;
• a stopping criterion with a fixed number of generations; depending on the experiment, 1000 or 10,000 generations are used.
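The operators listed for SEA can be sketched in Python as shown below. This is a minimal illustration, not the authors' implementation. Several details are assumptions: a single random weight is used for the arithmetic crossover, the child is a copy of the first parent when crossover is not applied, and the mutation probability is applied once per child. All function names are hypothetical.

```python
import random

def tournament_select(population, fitness, rng):
    """Tournament with two individuals: the fitter of two random ones wins."""
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if fitness[a] >= fitness[b] else population[b]

def arithmetic_crossover(p1, p2, rng, rate=0.9):
    """Child = w * p1 + (1 - w) * p2, applied with probability `rate`."""
    if rng.random() >= rate:
        return list(p1)  # assumption: otherwise copy the first parent
    w = rng.random()
    return [w * x + (1.0 - w) * y for x, y in zip(p1, p2)]

def reset_mutation(child, k_max, rng, rate=0.2):
    """With probability `rate`, reset one random gene uniformly in [1, k_max]."""
    child = list(child)
    if rng.random() < rate:
        child[rng.randrange(len(child))] = rng.uniform(1, k_max)
    return child

def next_generation(population, fitness_fn, k_max, rng):
    fitness = [fitness_fn(ind) for ind in population]
    best = max(range(len(population)), key=lambda i: fitness[i])
    new_pop = [list(population[best])]  # elitism
    while len(new_pop) < len(population):
        p1 = tournament_select(population, fitness, rng)
        p2 = tournament_select(population, fitness, rng)
        child = arithmetic_crossover(p1, p2, rng)
        new_pop.append(reset_mutation(child, k_max, rng))
    return new_pop
```

Calling `next_generation` repeatedly for a fixed number of generations gives the overall loop; the fitness function passed in is the one that maps the real-valued genes to an allocation and evaluates it.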

RELIVE
The cross-generational evolutionary algorithm with local improvements (RELIVE) [4] is an original evolutionary algorithm which performs secondary local searches in addition to the main global search and includes the concept of personal improvement of individuals that survive for several generations, instead of just one. Since the lifespan of individuals is no longer fixed, the size of the population is variable. Personal improvement is based on a number of hill climbing steps in each generation. During a generation, the individuals undergo the classic evolution based on selection, crossover and mutation.
Another typical feature of RELIVE is the way in which it encourages exploration. This has proved particularly useful for difficult optimization problems such as the one addressed in our work. First, a few newly created chromosomes are added in each generation. Second, to generate a neighbor state in the hill climbing stage, three types of mutation are used with different probabilities: Gaussian mutation, resetting mutation, and pairwise mutation, where two genes exchange a unit, i.e., one gene's value is incremented and the other's is decremented. The latter type is specifically designed for problems involving integer solutions, such as the present one. For the experiments in Section 8, RELIVE uses the following parameter values:
• the initial size of the population is 50;
For the remaining operators, RELIVE uses, like SEA, tournament selection with two individuals, elitism, arithmetic crossover with a probability of 0.9, and a maximum number of 100 or 1000 generations.
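The pairwise mutation can be illustrated with a short Python sketch. It is only an interpretation of the description above, not the RELIVE code: the choice of the two genes uniformly among all admissible pairs and the domain check against [1, k_max] are assumptions, and the function name is hypothetical.

```python
import random

def pairwise_mutation(genes, k_max, rng):
    """Move one unit between two genes: one is incremented, the other decremented.
    The move is applied only if both genes stay inside [1, k_max]."""
    genes = list(genes)
    receivers = [i for i, g in enumerate(genes) if g + 1 <= k_max]
    donors = [j for j, g in enumerate(genes) if g - 1 >= 1]
    pairs = [(i, j) for i in receivers for j in donors if i != j]
    if not pairs:
        return genes  # no admissible exchange
    i, j = rng.choice(pairs)
    genes[i] += 1
    genes[j] -= 1
    return genes
```

Because the total number of components is preserved, this move changes the distribution of redundancy among subsystems without necessarily changing how many components are allocated.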

Linear Programming
Linear programming (LP) is an optimization method aimed at problems with a linear objective function and linear constraints. There are several specific LP algorithms implemented in various libraries and programs. For our experiments, lpsolve [28] was used, which implements an optimized version of the simplex algorithm proposed by [37]. Depending on the nature of the optimization problem, it can select either the primal or the dual method, with factorization and scaling procedures to increase numerical stability. The problem we address in this paper is in fact cast as a zero-one integer programming (01IP) problem, a special case of LP.

Designing the Objective Functions

Evolutionary Algorithms

Problem Definition
For the two evolutionary algorithms, the objective (or fitness) function closely follows the definition of the two correlated problems stated in Section 3 and repeated here for convenience.
The maximization of the reliability with a maximum cost limit can be expressed as:

$$\max \; R_{rs} = \prod_{i=1}^{n} R_i \quad \text{subject to} \quad \sum_{i=1}^{n} C_i \le C^*. \tag{52}$$

The minimization of the cost of the redundant system with a required reliability can be expressed as:

$$\min \; C_{rs} = \sum_{i=1}^{n} C_i \quad \text{subject to} \quad \prod_{i=1}^{n} R_i \ge R^*. \tag{53}$$

As $C_i$ and $R_i$ are computed by means of the equations detailed in Section 3, which depend on the number of components for each subsystem, the optimization problem reduces to finding $k_1, k_2, \ldots, k_n$.
For the two evolutionary algorithms, the fitness functions are the expressions in (52) and (53) that need to be optimized. Since an EA maximizes the fitness function by default, in the case of (53) the negative of the sum of costs is actually used as the fitness function. The encoding of the problem uses real values, thus the chromosomes have n real genes, corresponding to $k_i$. The domain of the genes is $[1, k_{max}]$, i.e., $1 \le k_i \le k_{max}$. The upper limit $k_{max}$ depends on the problem and therefore needs to be chosen by the user.

Genotype-Phenotype Mapping
The real values involved in the evolutionary search are interpreted as integer values for $k_i$ before the computation of the fitness function. Therefore, the first step is to round the real values to the nearest integer:

$$k_i^{p} = \mathrm{round}\left(k_i^{g}\right),$$

where $k_i^{g}$ reflects the genotype (the actual value of the gene), and $k_i^{p}$ reflects the phenotype (its interpretation for further use).
Because in our case studies, for some types of redundancy we limited ourselves to a certain number of spare components as sufficient, another important issue is related to the unsuitability of some values of k i for certain subsystems. Therefore, the adjustment rules in Table 1 are used to interpret the values of k i as valid ones.
It must be mentioned that trying to enforce a valid domain for each subsystem gene a priori would have caused discontinuities in the evolutionary search, would have decreased the genetic diversity, and thus would have led to inferior results.
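A minimal Python sketch of such a mapping is given below. The adjustment rules of Table 1 are not reproduced in this section, so the `ALLOWED` dictionary is a purely hypothetical stand-in (here, a type F subsystem is assumed to admit only 1, 3, or 5 components), as are the function name and the tie-breaking rule.

```python
import math

# Hypothetical stand-in for the adjustment rules of Table 1:
# subsystem type -> admissible numbers of components.
ALLOWED = {"F": (1, 3, 5)}

def to_phenotype(gene, subsystem_type, k_max):
    """Interpret a real-valued gene as a valid integer number of components."""
    k = math.floor(gene + 0.5)      # round half up (Python's round() rounds half to even)
    k = max(1, min(k_max, k))       # keep the value inside [1, k_max]
    allowed = ALLOWED.get(subsystem_type)
    if allowed is not None:
        # nearest admissible value; ties resolved toward the smaller one (assumption)
        k = min(allowed, key=lambda a: (abs(a - k), a))
    return k
```

The genes themselves remain unrestricted real numbers; only their interpretation is adjusted, which is what keeps the search space continuous.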

Chromosome Repairing Procedure
Although expressed with a very simple equation, because of the possibly large size of a problem (e.g., n = 50 or n = 100, as considered in our case studies), the constraints are actually difficult to satisfy.
A naïve approach based on penalties for constraint violation decreases the genetic diversity to such an extent that the algorithms usually fail to find any solution at all, or find feasible solutions very far from the optimum. Therefore, one can apply a repairing procedure for the chromosomes, such that even if a certain individual resulting from the application of the genetic operators is initially infeasible, it can be slightly modified to become feasible. In this way, all the individuals in the population represent feasible solutions and the evolutionary algorithm focuses on optimizing the fitness function.
For the reliability maximization problem with cost constraints, a random repairing method is applied. Iteratively, a subsystem with $k_i > 1$ is randomly selected and its $k_i$ is decreased by 1, until the overall cost of the system becomes smaller than $C^*$.
Alternative methods were also attempted, but they were slower with no significant improvement of results:

• The selection of the subsystem with the highest cost. Because of the genotype-phenotype distinction, this could sometimes lead to infinite loops (e.g., the repairing procedure decrements a value, and the corresponding adjustment rule increments it);
• The selection of the subsystem with the highest reliability. This is even slower because it requires the recomputation of the system reliability after each $k_i$ is decremented, with i from 1 to n.
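The random repairing method for the cost-constrained problem can be sketched as follows. The sketch assumes a linear cost model (subsystem cost equal to $k_i$ times the component cost), stops as soon as the cost no longer exceeds the limit, and ignores the genotype-phenotype adjustment rules; the function name is hypothetical.

```python
import random

def repair_cost(k, c, cost_limit, rng):
    """Randomly remove components until the allocation fits the cost limit."""
    k = list(k)
    cost = sum(ki * ci for ki, ci in zip(k, c))  # assumed linear cost model
    while cost > cost_limit:
        candidates = [i for i, ki in enumerate(k) if ki > 1]
        if not candidates:
            break  # minimum design reached; nothing left to remove
        i = rng.choice(candidates)
        k[i] -= 1
        cost -= c[i]
    return k
```

Each iteration costs only a random choice and a subtraction, which is consistent with the observation that the alternatives requiring a reliability recomputation are slower.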
The repairing procedure for the cost minimization with reliability constraints proved much more challenging. Eventually, a random repairing method was also applied in this case. Iteratively, a subsystem with $k_i < k_{max}$ is randomly selected and its $k_i$ is increased by 1, until the overall reliability of the system becomes greater than $R^*$. However, the way in which this increment affects the overall system reliability is nonlinear. Simple random selection may be very slow, because it may take several trials to choose the proper subsystem whose increased reliability raises the overall reliability above the imposed threshold. That is why a maximum number of repairing attempts is imposed (e.g., 10). If after these repeated trials the reliability does not exceed $R^*$, the individual is penalized with a very low value for its fitness function (e.g., $-10^6$) and thus becomes likely to be excluded from the evolutionary selection process.
Several other alternative methods were attempted as well, but they all had various drawbacks compared to the random method presented above:

• The selection of the subsystem with the lowest reliability. This method is slower and its results are not much better;
• A more elaborate method, where the number of components is increased on layers, with subsystems taken in a random order. When one layer of incrementation is completed, the next one begins. This method was the slowest, about an order of magnitude slower than random selection.
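One possible reading of the random repairing method with a limited number of attempts is sketched below. It assumes that each attempt adds a single component to a randomly chosen subsystem, that all subsystems use active redundancy, and that cost is linear in the number of components; the penalty value follows the example given above, and all names are hypothetical.

```python
import math
import random

PENALTY = -1e6  # very low fitness for individuals that could not be repaired

def system_reliability(r, k):
    # Series system of subsystems with active redundancy (assumption)
    return math.prod(1.0 - (1.0 - ri) ** ki for ri, ki in zip(r, k))

def repair_reliability(k, r, k_max, r_required, rng, max_attempts=10):
    """Randomly add components, at most max_attempts times, to reach r_required."""
    k = list(k)
    for _ in range(max_attempts):
        if system_reliability(r, k) >= r_required:
            return k, True
        candidates = [i for i, ki in enumerate(k) if ki < k_max]
        if not candidates:
            break
        k[rng.choice(candidates)] += 1
    return k, system_reliability(r, k) >= r_required

def fitness_cost_min(k, r, c, k_max, r_required, rng):
    """Fitness for cost minimization: negative cost, or the penalty if infeasible."""
    k, feasible = repair_reliability(k, r, k_max, r_required, rng)
    return -sum(ki * ci for ki, ci in zip(k, c)) if feasible else PENALTY
```

The attempt limit bounds the repair time per individual, at the price of occasionally discarding individuals that a longer repair would have made feasible.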

Linear Programming
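The objective function is transformed in a different way in order to apply 01IP optimization, based on the idea proposed by [29]. The maximization of the product is equivalent to the maximization of the sum of logarithms. The desired solutions of the problem, i.e., $k_i$, $i = 1:n$, are included as separate terms, one for each possible result, from 1 to $k_{max}$:

$$\max \; \sum_{i=1}^{n} \sum_{j=1}^{k_{max}} x_{ij} \ln R_i(j),$$

where $x_{ij} \in \{0, 1\}$, $\forall i \in \{1, \ldots, n\}$, $\forall j \in \{1, \ldots, k_{max}\}$, is a binary variable that shows that for subsystem i, j components are needed to maximize reliability. The notation $R_i(j)$ signifies the reliability of subsystem i when it contains j redundant components. For a subsystem i, only one solution is possible, i.e., exactly one of its binary indicators must be 1 and the rest must be 0, which can be written as an additional constraint:

$$\sum_{j=1}^{k_{max}} x_{ij} = 1, \quad \forall i \in \{1, \ldots, n\}.$$

The main constraint of the problem is also expressed by using a different term for each possible solution:

$$\sum_{i=1}^{n} \sum_{j=1}^{k_{max}} x_{ij} C_i(j) \le C^*,$$

where, by analogy with $R_i(j)$, $C_i(j)$ denotes the cost of subsystem i when it contains j components. For the cost minimization problem, the formulation becomes:

$$\min \; \sum_{i=1}^{n} \sum_{j=1}^{k_{max}} x_{ij} C_i(j) \quad \text{subject to} \quad \sum_{i=1}^{n} \sum_{j=1}^{k_{max}} x_{ij} \ln R_i(j) \ge \ln R^*, \quad \sum_{j=1}^{k_{max}} x_{ij} = 1, \; \forall i \in \{1, \ldots, n\}.$$

The genotype-phenotype mapping described above is also used here to compute the reliability of the subsystems by handling the $k_i$ values that are not allowed for the corresponding subsystem type.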
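To make the structure of this formulation concrete, the following Python sketch solves a small instance of the reliability maximization problem exactly. It does not use lpsolve: a dynamic program over integer costs is substituted here, which solves the same "choose exactly one j per subsystem" problem for small instances. Active redundancy, integer component costs, and a subsystem cost of j times the component cost are assumptions made for illustration, and the function name is hypothetical.

```python
import math

def solve_max_reliability(r, c, k_max, cost_limit):
    """Maximize sum_i ln R_i(j_i) with exactly one j_i per subsystem,
    subject to sum_i j_i * c_i <= cost_limit (positive integer costs)."""
    neg_inf = float("-inf")
    # best[b] = (best log-reliability, allocation) with total cost exactly b
    best = [(neg_inf, None)] * (cost_limit + 1)
    best[0] = (0.0, [])
    for ri, ci in zip(r, c):
        nxt = [(neg_inf, None)] * (cost_limit + 1)
        for b, (val, alloc) in enumerate(best):
            if alloc is None:
                continue
            for j in range(1, k_max + 1):      # selecting j means x_ij = 1
                nb = b + j * ci
                if nb > cost_limit:
                    break
                cand = val + math.log(1.0 - (1.0 - ri) ** j)
                if cand > nxt[nb][0]:
                    nxt[nb] = (cand, alloc + [j])
        best = nxt
    val, alloc = max(best, key=lambda t: t[0])
    return alloc, math.exp(val)  # alloc is None if no feasible allocation exists

# Three subsystems in series, at most 4 components each, cost limit 12
print(solve_max_reliability([0.9, 0.8, 0.95], [2, 3, 1], 4, 12))
```

The dynamic program scales with the cost limit and is meant only to show the structure of the decision variables; for n = 50 or n = 100 with the full set of redundancy types, a dedicated integer programming solver is the appropriate tool.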

Lower Bound Solution
The minimum system design represents the first step toward achieving an optimized system design. Let us consider the optimization problem in which the required reliability $R^*$ must be achieved at a minimum cost. To obtain a lower bound solution expressed by the values $k_i$, $i = 1:n$, as the first step for optimization, an improved version of Albert's method [22,38] is used. Albert's method assumes that as spare elements are added, the reliability of the subsystems tends to become more uniform. This method involves the following steps:

Step 1. The components are renumbered so that the reliabilities are in increasing order:

$$r_1 \le r_2 \le \cdots \le r_n.$$

Step 2. Let m be the limit up to which all subsystems certainly require an additional allocation. According to Albert's method, the limit m is adopted so that

$$r_m \le R^* < r_{m+1},$$

or m = n in case of $r_n \le R^*$.
As an improved version, we propose that the limit m be adopted as the highest value for which the following condition is met:

$$r_m r_{m+1} \cdots r_n < R^*. \tag{61}$$

Let R be the reliability level that the first m subsystems must reach. Based on the condition that

$$R^m r_{m+1} r_{m+2} \cdots r_n \ge R^*,$$

for R the following condition results:

$$R \ge \left( \frac{R^*}{\prod_{j=m+1}^{n} r_j} \right)^{1/m}.$$

Step 3. With this intermediate result (reliability value R), for each subsystem i, $i = 1:m$, depending on the redundancy type, the lower bound $k_i$ is then determined. For example, for a subsystem i with active redundancy (tr = A), the following equations apply:

$$1 - (1 - r_i)^{k_i} \ge R, \quad \text{i.e.,} \quad (1 - r_i)^{k_i} \le 1 - R.$$

After applying the logarithm we get:

$$k_i \ln(1 - r_i) \le \ln(1 - R),$$

and then, since $\ln(1 - r_i) < 0$:

$$k_i \ge \frac{\ln(1 - R)}{\ln(1 - r_i)}.$$

So, the lower bound as an integer value is:

$$k_i = \left\lceil \frac{\ln(1 - R)}{\ln(1 - r_i)} \right\rceil.$$

For the redundancy types where the equations are too complicated, the lower bound is determined iteratively, and not algebraically. For the other components, with higher reliability, the lower bound corresponds to the non-redundant variant, so that:

$$k_i = 1, \quad i = m+1:n.$$

Based on this lower bound solution, the search space for an optimal solution can be reduced significantly.
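The three steps can be sketched in Python for the special case in which every subsystem uses active redundancy, so that the closed-form bound $k_i = \lceil \ln(1 - R) / \ln(1 - r_i) \rceil$ applies to all of them. The sketch uses the improved rule for m (the highest m with $r_m r_{m+1} \cdots r_n < R^*$) and takes R at its smallest admissible value; the function name and the handling of the case m = 0 are assumptions.

```python
import math

def lower_bound_active(r, r_required):
    """r: component reliabilities sorted in increasing order (Step 1).
    Returns (m, k), where k is the lower-bound allocation."""
    n = len(r)
    m, tail = 0, 1.0                    # tail = r_{m+1} * ... * r_n
    for idx in range(n - 1, -1, -1):    # Step 2: scan from the most reliable subsystem
        if tail * r[idx] < r_required:
            m = idx + 1                 # highest m with r_m * ... * r_n < R*
            break
        tail *= r[idx]
    k = [1] * n                         # non-redundant variant for i > m
    if m == 0:
        return m, k                     # the non-redundant system already meets R*
    level = (r_required / tail) ** (1.0 / m)   # smallest admissible R
    if level >= 1.0:
        raise ValueError("required level R must be below 1")
    for i in range(m):                  # Step 3: closed form for active redundancy
        k[i] = max(1, math.ceil(math.log(1.0 - level) / math.log(1.0 - r[i])))
    return m, k

print(lower_bound_active([0.8, 0.9, 0.99], 0.95))
```

For the example call, m = 2 and R is approximately 0.9796, so the two least reliable subsystems receive 3 and 2 components, respectively, and the third stays non-redundant.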

Experimental Results
In order to evaluate the effectiveness of the proposed algorithms, a large number of optimization problems of the order of thousands were analyzed. For all these optimization problems, all eight types of redundancy presented in Section 3 are considered. For any of the n subsystems, the type of redundancy is randomly generated based on the predetermined weights, as shown in Table 2. Component reliabilities and costs are also randomly generated. In terms of cost, the values are in the range of [1,50] units for all n subsystems. In terms of reliability, the value ranges depend on the type of redundancy, as shown in Table 3. Regarding the coefficient α and the reduction factors β and δ, the values are randomly generated in the ranges: In the case of type F redundancy subsystems, the value of the reduction factor γ is taken as half of the value for β (γ = β/2).
For the optimization problems we address, two levels of complexity were taken into account, when n = 50 and n = 100. For each case, extensive experimental studies were performed, including thousands of optimization problems.
For each reliability model, the proposed algorithms were tested taking into account both optimization problems. Specifically, for any reliability model, the study on the optimal allocation of redundancy was conducted as follows. First, the redundancy allocation problem is solved to maximize system reliability at a maximum allowable cost $C^* = 3 \times C_{ns}$. Let $R_{max}$ be the maximum system reliability obtained in this way. Then, another redundancy allocation problem is solved to obtain the required reliability $R^* = R_{max}$ at a minimum cost. In this way, either the solution from the first optimization problem is validated, or an improved solution is obtained. This is the final allocation that we consider, reflected by the vector k, for which the reliability and cost are $R_{rs}$ and $C_{rs}$, respectively. For any allocation solution, the redundancy efficiency is then calculated as follows:

$$Ef = \frac{1 - R_{ns}}{1 - R_{rs}},$$

where $R_{ns}$ is the reliability of the non-redundant system. Efficiency is a more intuitive indicator that shows how many times the risk of a failure decreases for the redundant system compared to the basic, non-redundant one.
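As a small numerical illustration of the efficiency indicator, the sketch below assumes that efficiency is the ratio between the failure probability of the non-redundant system and that of the redundant system, which matches the verbal description of the indicator. The function name and the sample values are hypothetical.

```python
def redundancy_efficiency(r_ns, r_rs):
    """Ratio of failure probabilities: non-redundant system over redundant system."""
    return (1.0 - r_ns) / (1.0 - r_rs)

# Hypothetical values: the failure probability drops from 0.5 to 0.05
print(redundancy_efficiency(0.5, 0.95))
```

With these sample values the risk of failure is reduced about tenfold, so the efficiency is approximately 10.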
To illustrate this approach, the numerical results of four experimental studies (problems P 1 − P 4 ) are presented below. First, two reliability models for a system with 50 subsystems are considered (problems P 1 and P 2 ). All the details of these models are presented in Tables 4 and 5.
Each problem is defined by a set of n tuples corresponding to the parameters of its subsystems. In Table 4, we define a problem with 50 subsystems, therefore we have 50 tuples. The first number in the tuple, i, goes from 1 to 50. The second item of a tuple is the subsystem type. It is identified by a letter following the convention defined in Section 3. For example, the first tuple (1: D, 0.989, 39; α = 0.55) has $tr_1$ = D, which corresponds to hybrid standby redundancy with a warm reserve and possibly other cold ones. The following two numbers identify the reliability and the cost of a single component. Again, for the first tuple, the reliability is $r_1 = 0.989$ and the cost is $c_1 = 39$.
Table 4 (last line): $C_{ns} = 1241$, $C^* = 3 \times C_{ns} = 3723$.
Table 5. Problem $P_2$ for n = 50 subsystems. Structural details: tuples of ($i$: $tr_i$, $r_i$, $c_i$) extended with parameters $\alpha_i$, $\beta_i$, $\gamma_i$ or $\delta_i$, as appropriate, $i = 1:n$.
The rest of the parameters depend on the subsystem type. They were defined in the mathematical description in Sections 3.1-3.7, but for convenience we include a summary here with the list of the parameters used for each type of subsystem:
• active redundancy (tr = A), passive redundancy (or cold standby redundancy) (tr = B), and hybrid standby redundancy with a hot reserve (tr = C) and possibly other cold ones: no additional parameters;
• hybrid standby redundancy with a warm reserve (tr = D) and possibly other cold ones: parameter α (the coefficient of reduction of the failure rate for a warm-maintained reserve compared to the failure rate of the component in operation);
• hybrid redundancy consisting of a TMR structure with control facilities and possibly cold reserves (tr = E): parameter β (the reduction factor used to express the failure rate of the decision and control logic of a TMR structure based on the failure rate of the basic components);
• static redundancy: TMR or 5MR (tr = F): parameters β (as above) and γ (the reduction factor used to express the failure rate of the decision and control logic of a 5MR structure based on the failure rate of the basic components);
• reconfigurable TMR/Simplex type structure with possible other cold-maintained spare components (tr = G) and reconfigurable TMR/Duplex type structure with possible other cold-maintained spare components (tr = H): parameter δ (the reduction factor used to express the failure rate of the decision, control and reconfiguration logic of a TMR/Simplex or a TMR/Duplex structure based on the failure rate of the basic components).
For example, in Table 4, since subsystem 1 is of type D, its parameter α 1 is 0.55. Since subsystem 4 is of type E, its parameter β 4 is 50. The subscripts were omitted to avoid cluttering the table, but the parameters have distinct values for each subsystem, i.e., they are α i , β i , γ i or δ i .
On the last line, one can see the cost of the non-redundant system C ns and the maximum allowable cost of the system C * , chosen to be three times greater than C ns . C * could have in fact any value, but greater values do not make the problem harder, because the main difficulty lies in finding the proper distribution of redundant components in the "upper" part of the allocation. Greater values for C * would lead to a certain number of redundant components included for all subsystems, and then the main issue would also lie in this "upper" part of the allocation.
The redundancy allocation for these problems generated by the three proposed algorithms after the first optimization process, which tries to maximize system reliability at a maximum allowable cost $C^*$, is presented in Tables 6 and 7. The solutions after the second optimization process, which tries to minimize the cost under the reliability constraint $R_{rs} \ge R^* = R_{max}$, are presented in Tables 8 and 9. For the second experiment, more complex reliability models corresponding to a system with 100 subsystems are considered (problems $P_3$ and $P_4$). These models are presented in Tables 10 and 11.
Table 10. Structural details: tuples of ($i$: $tr_i$, $r_i$, $c_i$) extended with parameters $\alpha_i$, $\beta_i$, $\gamma_i$ or $\delta_i$, as appropriate, $i = 1:n$. $C_{ns} = 2579$, $C^* = 3 \times C_{ns} = 7737$.
Table 11. Problem $P_4$ for n = 100 subsystems. Structural details: tuples of ($i$: $tr_i$, $r_i$, $c_i$) extended with parameters $\alpha_i$, $\beta_i$, $\gamma_i$ or $\delta_i$, as appropriate, $i = 1:n$.
The numerical results after the two optimization processes described above are presented in Tables 12-15.
Table 12. Best solutions to problem $P_3$ after first optimization (maximizing reliability under cost constraint: $C^* = 7737$). Columns: algorithm; optimal allocation $k_1, k_2, \ldots, k_n$; $C_{rs}$; $R_{rs}$; Ef.
Table 13. Best solutions to problem $P_4$ after first optimization (maximizing reliability under cost constraint: $C^* = 7518$). Columns: algorithm; optimal allocation $k_1, k_2, \ldots, k_n$; $C_{rs}$; $R_{rs}$; Ef.
Table 14. Best solutions to problem $P_3$ after second optimization (minimizing cost under the constraint of reliability $R^*$).
The three optimization algorithms considered in our study generate different solutions. The following three examples illustrate how we can determine whether one is superior to the other:

• Consider problem $P_1$ for which the best solutions generated by the three optimization algorithms are shown in Table 6. All three solutions require 3719 cost units, but the solution given by SEA achieves lower reliability (0.973398) compared to that given by RELIVE and LP (0.977724);
• Consider problem $P_3$ for which the best solutions generated by the three optimization algorithms are shown in Table 12. The solutions given by RELIVE and LP both require 7737 cost units, but the solution generated by LP achieves higher reliability (0.947769) compared to that given by RELIVE (0.927214);
• Consider problem $P_4$ for which the best solutions generated by the three optimization algorithms are shown in Table 15. Please note that the solution given by SEA requires the highest cost and offers the lowest reliability compared to the solutions given by RELIVE and LP.
For a better comparison of the three proposed optimization algorithms, 1000 randomly generated problems were considered for both n = 50 and n = 100. The corresponding results are presented in Figures 8-11. Each graph presents the mean values as the height of the bars, with the standard deviations represented as two sigmas (one up from the mean, and one down from the mean).
First, the reliability maximization case for n = 50 was considered. Figure 8 shows some statistics of the final system reliability obtained by the algorithms. Since the performance of the evolutionary algorithm greatly depends on the number of generations, two versions were considered: 1000 and 10,000 generations for SEA, and 100 and 500 generations for RELIVE.
It must be mentioned that RELIVE performs additional function evaluations during the hill climbing procedure, therefore it is normal that its number of generations be less than for SEA. Figure 8a presents the actual efficiency values obtained by the algorithms. Figure 8b includes a comparison relative to LP, where in each of the 1000 trials the efficiency found by LP was considered to correspond to 100% and the efficiency found by the other algorithms is represented as a percentage of that found by LP. It can be seen that the results of LP and RELIVE are very close, with LP being slightly better, while those of SEA are of a lower quality.
It can also be seen that there is no significant difference between the results of SEA and RELIVE with different numbers of generations: most likely, 1000 and 100 generations, respectively, are sufficient for such problems.
Similar statistics are displayed in Figure 9 for systems with n = 100. In this case, since the problems are more difficult, there are greater differences between algorithms. LP remains the best, while the relative average efficiency of RELIVE solutions is around 75%, and that of SEA is around 45%. Figures 10 and 11 show the results obtained for the cost minimization problems. Since an increase in the number of generations does not seem to be a decisive factor, only 100 and 500 generations were considered for SEA and RELIVE, respectively. The relative performance of algorithms is similar: LP provides the best results, RELIVE results are comparable, slightly worse especially for n = 100, while SEA gives an average minimum cost around 120-130% higher than the optimal solution.
In addition, in order to better verify the effectiveness of the proposed algorithms, for the 2000 problems studied, the results obtained for the initial variant were compared with those for two other variants in which the order of the subsystems changed, being sorted by reliability. The LP algorithm provided the same results for all 2000 problems checked, which highlights its stability for this type of stress. This is not the case with the two evolutionary algorithms, RELIVE and SEA, but the differences that occurred were not statistically significant.
Figure 10. Comparison between the performance of the algorithms for cost minimization with systems of n = 50 subsystems: (a) average cost; (b) results relative to LP.
Figure 11. Comparison between the performance of the algorithms for cost minimization with systems of n = 100 subsystems: (a) average cost; (b) results relative to LP.

Discussion
In the mathematical model, we assume that the time to failure of a component follows a negative-exponential distribution. For electronic components or electronic modules, especially for integrated circuits, the time to failure is usually considered to have such a distribution. This means that, for a given operating regime, the average failure rate is constant (and not a function of time). But for mechanical elements, for example, this assumption must be accepted with caution because of the physical wear and tear that can occur during system operation. In this case, a Weibull distribution may be more appropriate.
This assumption is important only for specifying the reliability of the redundant system. Only under this assumption can the reliability function for most of the redundant structures we considered be determined analytically, using Markov models, as presented in Section 3. For other distributions, the evaluation of subsystem reliability is more complicated and can be done in other ways, e.g., by using a Monte Carlo simulation.
The optimization methods used in this study are not fundamentally affected by this simplifying assumption. The only change concerns the calculation of the objective function, which would have to be done in a different way. Thus, we consider that the comparative performance results of the three optimization methods presented in this article are not significantly affected by this simplifying assumption.
The systems discussed in this paper all consist of series-connected subsystems. Our study does not cover cases where a system component may have a redundant structure composed of elements other than the base component, as shown in Figure 12.
Similar statistics are displayed in Figure 9 for systems with = 100. In th since the problems are more difficult, there are greater differences between algo LP remains the best, while the relative average efficiency of RELIVE solutions is 75%, and that of SEA is around 45%. Figures 10 and 11 show the results obtained for the cost minimization problem an increase in the number of generations does not seem to be a decisive factor, o and 500 generations were considered for SEA and RELIVE, respectively. The relat formance of algorithms is similar: LP provides the best results, RELIVE results a parable, slightly worse especially for = 100, while SEA gives an average minim around 120-130% higher than the optimal solution.
In addition, in order to better verify the effectiveness of the proposed algorith the 2000 problems studied, the results obtained for the initial variant were compar those for two other variants in which the order of the subsystems changed, bein by reliability. The LP algorithm provided the same results for all 2000 problems c which highlights its stability for this type of stress. This is not the case with the t lutionary algorithms, RELIVE and SEA, but the differences that occurred were no tically significant.

Discussion
In the mathematical model, we assume that the time to failure of a component a negative-exponential distribution. For electronic components or electronic mod pecially for integrated circuits, the time to failure is usually considered to have distribution. This means that, for a given operating regime, the average failure rat stant (and not a function of time). But for mechanical elements, for example, this a tion must be accepted with caution because of the physical wear and tear that ca during system operation. In this case, a Weibull distribution may be more approp This assumption is important only for specifying the reliability of the redund tem. Only under this assumption the reliability function for most of the redundan tures we considered can be determined analytically, using Markov models, as pr in Section 3. For other distributions, the evaluation of subsystem reliability is mo plicated and can be done in other ways, e.g., by using a Monte Carlo simulation.
The optimization methods used in this study are not fundamentally affected by this simplifying assumption. The only change concerns the calculation of the objective function, which otherwise should be done in a different way. Thus, we consider that the comparative performance results of the three optimization methods presented in this article are not significantly affected by this simplifying assumption.
The systems discussed in this paper all consist of series-aligned subsystems. Our study does not cover cases where a system component may have a redundant structure composed of elements other than the base component, as shown in Figure 12. In this situation, the optimization problem must be formulated differently, and it involves the inclusion of more types of components than those that form the non-redundant system.
Such cases are encountered in complex systems, e.g., with a network structure. Unfortunately, the conclusions regarding the performance of the three optimization algorithms compared in this paper cannot be extended to these more general cases, as there is no evidence to support such an extension.
Another point of discussion concerns the number of generations used by the two evolutionary algorithms. The specific numbers of generations used in the study are powers of ten so that the reader can have an intuitive view of the results. A fairer comparison would need to assess their performance, e.g., with the same number of objective function evaluations, a common setting in the area of biologically-inspired optimization algorithms. The number of function evaluations is easy to determine in the case of SEA. If the population consists of 50 chromosomes and 10,000 generations are used, then 500,000 evaluations are needed. However, RELIVE does not have a constant population size. Additional function evaluations are performed in the hill climbing step, although at most one of these solutions, i.e., the best local improvement, will actually be used in the next generation. It was empirically estimated that RELIVE with 100 generations needs about 27 times more function evaluations than SEA with 1000 generations. Thus, a comparison could be made with SEA with about 27,000 generations. Still, from the statistical analysis presented above, we hypothesize that the poorer results of SEA are not caused by an insufficient number of generations, since the performance with 1000 and with 10,000 generations is quite similar. Also, the main issue is not execution time, because this is not a real-time application, but the fact that SEA usually gets stuck in a local optimum because, e.g., at the "top" part of the allocation, one cannot include any more components without exceeding the cost limit. Improving the solution would require adding one component to a subsystem and removing one component from another subsystem. SEA lacks any mechanism to do so, and such improvements can come only from "lucky" mutations and removals of components during the chromosome repairing procedure.
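The evaluation counts quoted above can be reproduced with a few lines of arithmetic. The sketch assumes, as the example in the text does, that SEA evaluates each chromosome once per generation; the helper names are hypothetical, and the population size of 50 is the illustrative value from the text.

```python
def sea_evaluations(population_size, generations):
    """Objective function evaluations for a fixed-size population,
    assuming one evaluation per chromosome per generation."""
    return population_size * generations


def equivalent_sea_generations(evaluation_ratio, sea_generations):
    """Generations SEA would need to match a budget that is
    evaluation_ratio times larger than its own."""
    return evaluation_ratio * sea_generations


print(sea_evaluations(50, 10_000))            # 500000
print(equivalent_sea_generations(27, 1_000))  # 27000
# Implied RELIVE budget for 100 generations under the same assumptions:
print(27 * sea_evaluations(50, 1_000))        # 1350000
```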
On the other hand, RELIVE has a specially designed mutation for this situation, based on exchanging a unit between a pair of genes. Because of this, we eventually chose to use the lower number of generations, i.e., 1000 for SEA and 100 for RELIVE, because in this case the optimization is faster and it seems to show the ranking of the methods quite well.
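The idea of exchanging a unit between a pair of genes can be sketched as follows. This is only an illustration of the principle, not the RELIVE operator itself: the function name, the integer encoding of the allocation, the linear cost model and the rule of rejecting infeasible children are all assumptions made for the example.

```python
import random


def exchange_unit(alloc, unit_costs, cost_limit, rng, min_units=1):
    """Move one component from a randomly chosen subsystem to another.

    alloc[i] is the (assumed integer) number of components in subsystem i.
    The total number of components is preserved; the child is returned
    only if it respects the cost limit, otherwise the parent is kept.
    Requires at least two subsystems.
    """
    n = len(alloc)
    donors = [i for i in range(n) if alloc[i] > min_units]
    if not donors:
        return list(alloc)
    i = rng.choice(donors)                               # loses one unit
    j = rng.choice([g for g in range(n) if g != i])      # gains one unit
    child = list(alloc)
    child[i] -= 1
    child[j] += 1
    total_cost = sum(c * k for c, k in zip(unit_costs, child))
    return child if total_cost <= cost_limit else list(alloc)


rng = random.Random(42)
parent = [2, 3, 1, 2]            # illustrative allocation
costs = [4.0, 6.0, 3.0, 5.0]     # illustrative unit costs
print(exchange_unit(parent, costs, cost_limit=45.0, rng=rng))
```

Because the move removes a unit from a cheap or over-provisioned subsystem before adding one elsewhere, it can reach neighbouring allocations that a pure "add one component" mutation cannot reach once the cost limit is nearly saturated.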
Since evolutionary algorithms are stochastic, more runs may be necessary to obtain a good solution. In the case studies presented above, we used the following methodology:

• For the results presented in Figures 8-11, each algorithm was run a single time for a problem and 2000 problems were used, i.e., 1000 problems for n = 50 and another 1000 problems for n = 100. Due to the high number of problems, the results are statistically significant to assess the performance of the algorithms. These figures show this statistical analysis in terms of mean and standard deviation;
• For the results presented in Tables 6-9 and 12-15, the best out of ten runs was selected for SEA and RELIVE, because we were interested in the best solution. The LP algorithm was run only once.

Conclusions
Extensive experimental studies on redundancy allocation in large binary systems with a hybrid structure, covering thousands of optimization problems, highlight the difficulty of these problems as the number of subsystems increases. Three algorithms were used for optimization: zero-one integer programming, a classic evolutionary algorithm and an original evolutionary algorithm, RELIVE, which combines global search with local fine tuning and includes a number of mutation strategies in order to escape from local optima.
The proposed algorithms were compared, and their effectiveness was also verified by solving two properly correlated optimization problems. Specifically, the converse problem of minimizing cost for the reliability threshold found in the first case was also attempted as a means to verify the optimality of the solution and, when the solution was not optimal, to attempt to improve it from the cost perspective, the reliability perspective, or both. Experimental results demonstrate that for large instances of the reliability maximization problem, zero-one integer programming yields the best results, followed by RELIVE. The differences become apparent when the number of subsystems is large, e.g., when n = 100.
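The cross-check between the two correlated problems can be illustrated on a toy instance small enough to enumerate exhaustively. The sketch below is not one of the three algorithms studied: it assumes active redundancy in every subsystem, a linear cost model, made-up component data, and hypothetical function names. It first maximizes reliability under a cost limit, then minimizes cost for the reliability just obtained, and finally checks that the converse problem does not find a cheaper solution with lower reliability.

```python
from itertools import product
from math import prod


def reliability(alloc, r):
    """Series system of subsystems with active redundancy."""
    return prod(1.0 - (1.0 - ri) ** k for ri, k in zip(r, alloc))


def cost(alloc, c):
    return sum(ci * k for ci, k in zip(c, alloc))


def max_reliability(r, c, cost_limit, k_max=4):
    """Problem 1: maximize reliability subject to cost <= cost_limit."""
    candidates = product(range(1, k_max + 1), repeat=len(r))
    feasible = (a for a in candidates if cost(a, c) <= cost_limit)
    return max(feasible, key=lambda a: reliability(a, r))


def min_cost(r, c, target, k_max=4):
    """Problem 2 (converse): minimize cost subject to reliability >= target."""
    candidates = product(range(1, k_max + 1), repeat=len(r))
    feasible = (a for a in candidates if reliability(a, r) >= target)
    return min(feasible, key=lambda a: cost(a, c))


r = [0.90, 0.85, 0.95]   # illustrative component reliabilities
c = [4, 6, 3]            # illustrative unit costs
best = max_reliability(r, c, cost_limit=40)
converse = min_cost(r, c, target=reliability(best, r))
print(best, cost(best, c), reliability(best, r))
print(converse, cost(converse, c), reliability(converse, r))
```

If the first solution is optimal, the converse problem can only confirm it or return an allocation of equal reliability at lower or equal cost; a strictly better outcome in either direction signals that the first solution can be improved.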
As future research, the authors intend to extend the study on the optimal allocation of reliability in hybrid structure binary systems in several directions, as shown below.
For the optimization issues considered in this paper, the type of redundancy is predetermined for all subsystems, as shown in Table 4, Table 5, Table 10 or Table 11. But for certain reliability models this condition may be relaxed. For example, if a redundancy technique based on majority logic is appropriate for a subsystem, then one of the following solutions can be adopted: TMR, TMR/Simplex or 5MR, with or without cold-maintained spare components. The same is true for dynamic redundancy, where active redundancy or hybrid standby redundancy with a hot component and other passive spare ones can be adopted. Therefore, the optimization process can be extended to find an optimal solution that refers to both the type of redundancy and the number of components for each of the n subsystems.
On the other hand, some redundant structures often adopt the technical solution in which the components are functionally compatible but different in design to avoid common errors. For example, this idea applies to majority logic structures (TMR, TMR/Simplex and 5MR) or duplex structure. A future direction of research also refers to these redundant subsystems with heterogeneous structure.
In reliability engineering the problem of system reliability maximization under two or more constraints often arises; for example, under cost constraints, but also under weight and/or volume constraints. We intend to extend the research to also cover this important problem of maximizing system reliability under two or more constraints.
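One possible way to write the multi-constraint problem is given below, as a sketch only. The symbols $k_i$ (number of components in subsystem $i$), $W_i$, $V_i$, $W^*$ and $V^*$ (subsystem weight and volume and their limits) are assumptions introduced for this illustration; $R_i$, $C_i$, $R_{rs}$ and $C^*$ follow the notation of the paper.

```latex
\begin{aligned}
\max_{k_1,\dots,k_n} \quad & R_{rs} = \prod_{i=1}^{n} R_i(k_i) \\
\text{subject to} \quad & \sum_{i=1}^{n} C_i(k_i) \le C^{*}, \\
& \sum_{i=1}^{n} W_i(k_i) \le W^{*}, \qquad \sum_{i=1}^{n} V_i(k_i) \le V^{*}, \\
& k_i \in \mathbb{Z}, \quad k_i \ge 1, \quad i = 1,\dots,n .
\end{aligned}
```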
We also plan to study the transformation of the problem into a multi-objective optimization problem, e.g., maximize the system's reliability while minimizing the associated cost. The solutions to be considered would be the solutions around the imposed threshold for cost or reliability. Previously we saw that an increase in the cost limit of only 5% can lead to a larger increase in system reliability. By using a multi-objective optimization approach, such analysis could be more principled.
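To make the multi-objective view concrete, the sketch below enumerates all allocations of a toy system and keeps only the non-dominated (cost, reliability) pairs, i.e., the Pareto front. It assumes active redundancy, a linear cost model and made-up data, and the function names are hypothetical; for large systems, exhaustive enumeration would have to be replaced by a dedicated multi-objective method.

```python
from itertools import product
from math import prod


def evaluate(alloc, r, c):
    """Return (cost, reliability) of a series system with active redundancy."""
    total_cost = sum(ci * k for ci, k in zip(c, alloc))
    rel = prod(1.0 - (1.0 - ri) ** k for ri, k in zip(r, alloc))
    return total_cost, rel


def pareto_front(points):
    """Keep points not dominated by any point with lower-or-equal cost
    and higher-or-equal reliability."""
    front = []
    best_rel = -1.0
    # Sort by increasing cost; for equal cost, the most reliable comes first.
    for total_cost, rel, alloc in sorted(points, key=lambda p: (p[0], -p[1])):
        if rel > best_rel:
            front.append((total_cost, rel, alloc))
            best_rel = rel
    return front


r = [0.90, 0.85, 0.95]   # illustrative component reliabilities
c = [4, 6, 3]            # illustrative unit costs
points = [evaluate(a, r, c) + (a,) for a in product(range(1, 4), repeat=len(r))]
for total_cost, rel, alloc in pareto_front(points):
    print(total_cost, round(rel, 5), alloc)
```

Reading the front around a given cost limit shows directly how much reliability a small increase in budget buys, which is the kind of analysis referred to above.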
Another direction of investigation would be to assess the effect of integer-based representation for the evolutionary algorithms instead of the real-valued representation used so far.

Conflicts of Interest:
The authors declare no conflict of interest.

Reliability: The probability that a component or a system works successfully within a given period of time
The number of components that make up the redundant subsystem i
R_i: The reliability of subsystem i (subsystem with redundant structure)
C_i: The cost of subsystem i
tr_i: The type of redundancy for subsystem i
α, 0 < α < 1: The coefficient of reduction of the failure rate for a warm-maintained reserve compared to the failure rate of the component in operation
The reduction factor used to express the failure rate of the decision and control logic of a TMR structure based on the failure rate of the basic components
γ, γ > 1: The reduction factor used to express the failure rate of the decision and control logic of a 5MR structure based on the failure rate of the basic components
δ, δ > 1: The reduction factor used to express the failure rate of the decision, control and reconfiguration logic of a TMR/Simplex or a TMR/Duplex structure based on the failure rate of the basic components
R_ns: The reliability of the non-redundant system (system with series reliability model)
C_ns: The cost of the non-redundant system
R_rs: The reliability of the redundant system (system with series-redundant reliability model)
C_rs: The cost of the redundant system
R*: The required level of reliability of the system
C*: The maximum allowable cost of the system
CO: A component in operation (active component)
WSC: A warm-maintained spare component
CSC: A cold-maintained spare component

Note: For notations r_i to tr_i, when the subsystem is not indicated the index is not necessary, therefore the notations used are r, c, λ and so on.

Assumptions

• For any redundant subsystem, the spare components are considered identical to the basic one/ones.
• For the components in operating mode or for the spare components maintained in warm conditions, the time to failure has a negative-exponential distribution.
• The events of failure that may affect the components of the system are stochastically independent.