Model and Data-Driven System Portfolio Selection Based on Value and Risk

System portfolio selection is a kind of tradeoff analysis and decision-making on multiple systems as a whole to fulfill the overall performance on the perspective of System of Systems (SoS). To avoid the subjectivity of traditional expert experience-dependent models, a model and data-driven approach is proposed to make an advance on the system portfolio selection. Two criteria of value and risk are used to indicate the quality of system portfolios. A capability gap model is employed to determine the value of system portfolios, with the weight information determined by correlation analysis. Then, the risk is represented by the remaining useful life (RUL), which is predicted by analyzing time series of system operational data. Next, based on the value and risk, an optimization model is proposed. Finally, a case with 100 candidate systems is studied under the scenario of anti-missile. By utilizing the Non-dominated Sorting Differential Evolution (NSDE) algorithm, a Pareto set with 200 individuals is obtained. Some characters of the Pareto set are analyzed by discussing the frequency of being selected and the association rules. Through the conclusion of the whole procedures, it can be proved that the proposed model and data-driven approach is feasible and effective for system portfolio selection.


Introduction
Joint operations have become the main trend of modern warfare.The construction of "system of systems (SoS)" is not only a goal but also a basic guideline on the long-term weapon/equipment development.System portfolio selection is a widely used concept of weapon SoS construction, where a key step is evaluation [1].Traditional evaluation models rely too much on subjective awareness, making assessment results inaccurate and unconvincing to some extent.With the rise of data science, an effective method to compensate for the low accuracy and implementing difficulty of relying on expert experience is making decisions according to real data.Therefore, the combination of data-driven methods and model-based approaches is a new trend to solving system portfolio selection problems.
Markowitz first proposed the portfolio theory in 1952, opening a new era of utilizing mathematical approaches in resource allocation problems [2,3].In the field of management science and operation research, the portfolio theory is widely used in project research and development (R&D) [4], supplier selection [5], material selection etc.With the development of SoS science, portfolio theory shows increasing popularity in the field of weapon SoS construction, where the optimal system portfolio will be selected by evaluating system portfolio candidates through model-based methods.So far, there has been little research that measures weapon system portfolios without subjective criteria.Typical measurements in most literature, such as benefit-risk analysis [6], cost-efficiency analysis [7], and requirement-satisfaction analysis [8] are inaccurate and unconvincing to some extent because they usually require too much expert experience.
Motivated by solving issues mentioned above, data-driven methods are combined with traditional model-based approaches to improve the accuracy and credibility of evaluation results on system portfolios, by reducing dependence on artificial expertise.In addition, data-driven methods are complements of model-based approaches, instead of substitutions, because pure data without models cannot construct the bridge connecting scheme variable inputs and evaluation outputs.
In civilian fields, the portfolio selection theory has been mainly studied and applied on project portfolio problems [9].From the perspective of modeling, the scenario-based models are frequently used to describe the boundary of possible cases, based on which, decision-makers evaluate and select the well-matched optimal system portfolio [10][11][12][13].Robust models are also widely studied and applied in project portfolio problems to solve the difficulties in determining probabilities of future scenarios, aiming to select an ideal system portfolio that performances well at almost all possible situations [14][15][16].As for the evaluation and trade-off of project portfolios, variant methods are proposed and studied, such as risk analysis methods [17], value evaluation methods [18], cost-efficiency methods [19], fuzzy assessment methods [4], preference-based methods [20], game theory, interactive decision methods [21], etc.A common ground of those methods is determining the value and risk of a system portfolio to abstractly indicate what decision-makers expect or not expect.As regard to portfolio planning and optimization, the goal is to select the optimal project portfolio by analyzing and comparing candidate project portfolios.The mixed integer model [22], multi-objective optimization [23], hybrid and dynamic planning are the most popular optimization methods.In addition, genetic algorithms [23], Monte Carlo simulation [24] and Lagrangian relaxation methods are also widely used in the solving process, when facing a large solution scale and specific constraints.
In military fields, most methods in system portfolio selection are based on specific evaluation models, where the most commonly investigated techniques include multiple objective analysis, multiple criteria analysis [25], value analysis [26], cost-efficiency analysis [27], expert judgment [27], Monte Carlo technique, risk analysis and etc.In detail, Yang et al. [26] formulize the weapon system portfolio problem with a mixed integer non-linear optimization model and solve the problem with an adaptive immune genetic algorithm.Greiner et al. [28] conclude challenges of the Department of Defense (DoD) in determining weapon system value during portfolio selection processes.Cheng et al. [29] use combat network and operation loop to analyze strategies of the weapon system portfolio selection problem, where the operational capability evaluation indexes of weapon systems are constructed.Zhou et al. [30] deal with weapon system portfolio selection problems based on fuzzy clustering, with the maximum deviation methods applied to rank all the candidates by calculating the weight of each weapon system.Kangaspunta J et al. [27] use the cost-efficiency method to decide the acquisition and maintenance of military equipment, aiming to build long-term capabilities in future military conflicts.Li et al. [31] adopt a network-based method to formulate and analyze weapon system portfolio architecting problem by embedding different types of systems into a network.Zhou et al. [32] study the evolving capability requirement-oriented portfolio planning problem with a capability-based approach from the perspective of operational research.Huang et al. [33] regard the weapon system portfolio as a constrained combinatorial optimization problem and use a self-adaptive memetic algorithm-based decision-making method to maximize the expected damage of hostile targets.
Whatever in civilian fields and military fields, the model-based portfolio selection methods have been elaborately studied.With the increase of requirements for more accurate and valid approaches, the data-driven idea is appropriate to be applied to the portfolio selection.In the paper, we focus mainly on system portfolio evaluation, where a key part is determining criteria that influence the evaluation result of an object.Herein, two criteria of value and risk are used to evaluate system portfolios, where the value criterion is decided according to capability gaps of system portfolios and the risk criterion is decided by the remaining useful life (RUL) of systems.Based on the two criteria, the optimization is to obtain the system portfolio with the maximal value and minimal risk, within the limitation of a certain cost.To increase the credibility and practicability, the weight information in value evaluation and the RUL are all decided according to simulation data, instead of expert experience.
The remaining parts of the paper are structured as follows.In the second section, the capability gap-based value decision method and the RUL-based risk decision method are studied.In the third section, a case is examined to verify the utility and effectiveness of the proposed methods and models.Then, the results are discussed by analyzing the frequency of being selected and the association rules.

Weight Decision Based on Correlation Analysis
Weight information determines the criteria importance, and thus influences the final evaluation results of systems portfolios.Typically, criteria weights tend to be determined according to expert experience, which is criticized for its subjectivity and infeasibility.Correlation analysis is a quantitative method to measure the correlations between independent and dependent variables, and therefore can be used to indicate the weights of independent variables.
The maximal information coefficient (MIC) is used to measure correlations among all kinds of data.Compared to other measurements, it is acknowledged that the MIC is more sensible to identify correlations among variables, whatever for linear relations or non-linear relations (cubic, exponential, sinusoidal, parabolic, etc.) The correlation of variable X and Y can be indicated by the mutual information, as Equation (1) shows.

I(X, Y) =
x,y p(x, y) log p(x, y) p(x)p(y) , Because the information entropy of discrete random variables is denoted as H(p) = − n i=1 p i log p i , where n i=1 p i = 1.Therefore, the I(X, Y) can be proved to be equivalent to Equation (2).
With respect to MIC, it is assumed that the D = X × Y ⊂ R 2 is the variable space.Dividing the D into a grid G of x × y.The distribution of data D in grid G is represented by D| G .Executing multiple divisions on D and calculating the I(D G ) under different divisions.Then, the MIC can be obtained by calculating the max I(D G ) value from all possible division schemes.
Here, a grid of one row and one column is taken for example as Figure 1 shows.All data points are divided into four areas: top left, top right, bottom left, and bottom right.The numbers of data points belonging to each area are 1, 4, 4, and 1.Then, the normalized numbers of data point frequencies in the four regions are 0.1, 0.4, 0.4, and 0.1.Herein, X has two values: left and right, and Y has two values: upper and lower.The joint probabilities of the data in four regions can be calculated as p(X = le f t, Y = up) = 0.1, P(X = right, Y = up) = 0.4, P(X = le f t, Y = down) = 0.4, P(X = right, Y = down) = 0.1.Therefore, the point frequencies in X, Y are P(X = le f t) = 0.5, P(X = right) = 0.5, P(Y = upper) = 0.5 and P(Y = lower) = 0.5.According to the mutual information calculation formula introduced above, the mutual information of X and Y is obtained as Equation (3).
information calculation formula introduced above, the mutual information of X and Y is obtained as equation (3).Finally, for the relation of ( ) , ,..., n x x x x   =   , the weight information of elements in x can be obtained by calculating the MIC between each independent variable of

Value Model Construction
Value is a measurement to denote the importance degree of an object.In the military field, a frequently used measurement for denoting value is the capability gap, meaning the gap between system capabilities and capability requirements.Before calculating the capability gap, it's necessary to obtain system portfolio capabilities, which are the combination of those of component systems.
represents the capability value of the system i s , where ij c denotes the capability value of i s on Finally, for the relation of y = R(x), x = [x 1 , x 2, , . . ., x n ], the weight information of elements in x can be obtained by calculating the MIC between each independent variable of [x 1 , x 2, , . . ., x n ] and dependent variable y.

Value Model Construction
Value is a measurement to denote the importance degree of an object.In the military field, a frequently used measurement for denoting value is the capability gap, meaning the gap between system capabilities and capability requirements.Before calculating the capability gap, it's necessary to obtain system portfolio capabilities, which are the combination of those of component systems.
Firstly, let CR = {cr 1 , . . ., cr k } be the set of capabilities requirements, proposed by stakeholders, where cr j denotes the value of the jth capability requirement.The C(s i ) = (c i1 , . . ., c ik ), C(s i ) ∈ {0, 1} m represents the capability value of the system s i , where c ij denotes the capability value of s i on the jth capability.The PC(x i ) = (pc 1 (x i ), . . ., pc k (x i )) denotes the capabilities values of the system portfolio corresponding to scheme x i , where pc j (x i ) is the jth capability value.
In the combination process, one case must be considered is that certain systems in a system portfolio may have the same capabilities.It's a key procedure to deal with the combination of this kind of capabilities.Hereon, four rules are introduced as follows to support this combination.
Assume there are n systems s i , i = 1, . . ., n in a system portfolio, providing the same capability t, with capability values:c it , i = 1, . . ., n.
(1) Additive rule: the combined capability value is c it , i = 1, . . .n. E.g., Assuming that 3 transportation systems operate at the same time, the freight volume are 5t, 6t and 7t respectively, then the portfolio of the 3 systems can provide a freight capability of 18t.(2) Maximal rule: the combined capability value is max{c it }, i = 1, . . ., n. E.g., Assuming that there are three bridges over a river, and each bridge can bear the weight of 100t, 120t and 130t respectively, then only object less than 130t can pass the river, because one object can only pass over one bridge at the same time.(3) Minimal rule: the combined capability value is min{c it }, i = 1, . . ., n. E.g., Assuming that there are 3 tandem oil pipelines with oil flow of 5 t/hour, 6 t/hour and 3 t/hour, then the max oil flow of the 3 pipeline is 3 t/hour.(4) Average rule: the combined capability value is c it /n, i = 1, . . .n. E.g., Assuming there are 3 forecast systems, with correctly predicting probability 50%, 60%, and 70%, then the overall correctly predicting probability is 60%.
In addition, the capability can be classified into benefit and cost types.Firstly, the relation between a single capability of a solution and corresponding capability requirement is defined based on three premises: (1) When a capability of a solution is deficient or worse than the inferior value of the corresponding capability requirement interval, which means the solution absolutely cannot meet the capability requirement.Its capability gap is supposed to be 1. ( 2) When a solution can provide a capability, with its value falling in the interval of corresponding capability requirement, its capability gap should be a number between [0, 1].Especially when the solution's capability value exactly falls in the middle of the capability requirement interval, the capability gap is supposed to be 0.5, which means the capability can meet the capability requirement by half.(3) When a capability of a solution equals to or better than the superior value of the corresponding capability requirement interval, it means that the capability can absolutely meet the corresponding capability requirement, and the capability gap is supposed to be 0.
A concrete case is given in Figure 2, reflecting the relation between a capability and corresponding capability requirement interval [l, u].Four special points discussed above are marked by circle points.To mathematically model the relations with universal forms, a formula is defined to fit the linear line above, shown in Equation (4).
where G(c i , cr i ) represents the capability gap between the ith capability and corresponding capability requirement interval [l, u].Two special notation a and b are defined to simplify the formula, where The notation worst denotes the requirement threshold of cost-type capability that the capability gap will be judged to be 1 if a cost-type capability value exceeds worst.
Then, the capability gaps of all capabilities should be aggregated to compute the total capability gap of a system portfolio as k i=1 w i G(c i , cr i ), where the w i denote the weight of the ith capability and is determined according to the method in Section 2.1.1.Then, according to capability gaps, the value is defined as Equation ( 5), which indicates that the bigger the total capability gap, the smaller the value of a system portfolio.
Different to general "capability-based" evaluation methods, which are usually based on hierarchy structures such as a tree structure that the capabilities of all systems in bottom level are up-aggregated to an integrated value, capability gap is a criterion denoting the gap between capabilities and capability requirements.For general methods, an inevitable defect is that an extremely high capability will pull up the integrated capability value, which is obviously unreasonable.Whereas for the capability gap, it can mitigate the effects of extreme capability values.For example, even an infinitely high benefit-type capability can only result in corresponding capability gap of 0, instead of unreasonably extreme value.Then, the capability gaps of all capabilities should be aggregated to compute the total capability gap of a system portfolio as , where the i w denote the weight of the i th capability and is determined according to the method in Section 2.1.1.Then, according to capability gaps, the value is defined as equation 5, which indicates that the bigger the total capability gap, the smaller the value of a system portfolio.
( ) Different to general "capability-based" evaluation methods, which are usually based on hierarchy structures such as a tree structure that the capabilities of all systems in bottom level are upaggregated to an integrated value, capability gap is a criterion denoting the gap between capabilities and capability requirements.For general methods, an inevitable defect is that an extremely high capability will pull up the integrated capability value, which is obviously unreasonable.Whereas for the capability gap, it can mitigate the effects of extreme capability values.For example, even an infinitely high benefit-type capability can only result in corresponding capability gap of 0, instead of unreasonably extreme value.

Risk Decision Based on RUL Prediction
General risk assessment methods follow the steps of risk factor identification, risk analysis, risk assessment and risk management.However, for weapon systems, decision-makers mainly focus on the availability and stability of systems in the operation process.Therefore, the RUL, which denotes when a system is predicted to fault, is used as a weapon system risk criterion.The longer the RUL, the smaller the probability a risk will happen in the operational process.In addition, the RUL prediction method based on similarities of degradation characteristics is proved to perform better than other classical methods when there are sufficient historical samples.Therefore, the paper adopts degradation characteristics similarities to predict the RUL to indicate system portfolio risks.
The key resource for predicting RUL is the operational data, which is typically obtained from embedded sensors with time series.In specific, the degradation data is analyzed to support the

Risk Decision Based on RUL Prediction
General risk assessment methods follow the steps of risk factor identification, risk analysis, risk assessment and risk management.However, for weapon systems, decision-makers mainly focus on the availability and stability of systems in the operation process.Therefore, the RUL, which denotes when a system is predicted to fault, is used as a weapon system risk criterion.The longer the RUL, the smaller the probability a risk will happen in the operational process.In addition, the RUL prediction method based on similarities of degradation characteristics is proved to perform better than other classical methods when there are sufficient historical samples.Therefore, the paper adopts degradation characteristics similarities to predict the RUL to indicate system portfolio risks.
The key resource for predicting RUL is the operational data, which is typically obtained from embedded sensors with time series.In specific, the degradation data is analyzed to support the construction of the RUL prediction model and the main steps can be concluded as degradation track phase space construction, track matching, and RUL prediction.Detailed procedures are given in Figure 3. construction of the RUL prediction model and the main steps can be concluded as degradation track phase space construction, track matching, and RUL prediction.Detailed procedures are given in Figure 3.

Feature extraction based on variation coefficient
When multiple sensors are used to monitor the health of a system, a key step for predicting health states is selecting features from the multidimensional time series data, because only parameters without commonality can be treated as attributes features.Therefore, parameters screening is the prerequisite to ensure the non-commonality of parameters.Intuitively, practitioners

Feature extraction based on variation coefficient
When multiple sensors are used to monitor the health of a system, a key step for predicting health states is selecting features from the multidimensional time series data, because only parameters without commonality can be treated as attributes features.Therefore, parameters screening is the prerequisite to ensure the non-commonality of parameters.Intuitively, practitioners should firstly deal with correlations of parameters to make the transformed parameters independent.
In this paper, the statistic of variation coefficients is used to select the parameters features.Similar to the concept of standard deviation and variance in statistics, the variation coefficient is described as the dispersion degree of observations.The variation coefficient is calculated by the ratio of the standard deviation to the mean, as Equation (6) shows.Therefore, it is dimensionless.
Variables with larger variation coefficients show more obvious features and therefore are more suitable for describing characteristics of data.

Reconstruction of Degradation Track Phase Space
According to Taken's theorem, the potential dynamics laws of a system can be studied by constructing the phase space that preserves the topological properties of the original system.Coupled with the nonlinear characteristics of degradation, the time delay embedding theorem is often applied to construct high-dimensional phase space.
Assume that the time series is X = (x 1 , x 2 , • • • , x N ), then the points in the phase space can be expressed as the following form of row vector.
where d is the embedded dimension of phase space and τ is the time delay.Embedded dimension d is the key parameter in phase space construction, whose value will be determined by the non-subjective algorithm.X i (d) is a point in the d dimension phase space.If the point X n(i,d) nearest to X i (d) exists, the following relation can be deduced.
For all points in the d and d + 1 dimensional phase spaces, the definitions for a(i, d), E(d) E * (d), E 1 (d), and E 2 (d) are given respectively.
Then the value of the embedded dimension d can be determined by finding the smallest spatial dimension that makes the d and d + 1 dimensional phase spaces topologically equivalent.The topological equivalence here refers to that the nearest neighbors in the d dimensional space still remain the closest in the d + 1 dimensional space, which means that the E 1 (d) tends to be a stable value when the d and d + 1 dimensional phase spaces are topologically equivalent.However, in reality, it is difficult to find the smallest d that makes E 1 (d) a stable value.That's why the E * (d) and E 2 (d) are defined.If x is a random time series, then E 2 (d) will vary with d, or E 2 (d) will approach 1.Then by combining E 1 (d) and E 1 (d), the embedded dimension of the degradation track phase space can be determined.After the construction of the phase space, the time series can be transformed into the form of special track.By referring to the degradation data transformed from historical data and comparing the similarity of the new track with the reference track in phase space, the RUL can be predicted.

System Portfolio Risk Determination Based on RUL
is the current degradation track.Then, the normalized cross correlation (NCC) is introduced to measure the similarity between tracks.The specific definition of NCC is given as follows: where Taking the jth time series } as an example, the time points of the time series are denoted as is the running life of the j th degradation process.
Taking time delay τ as 1 and considering embedded dimension as d for all degradation track phase spaces (d is no less than the maximal embedded dimension of all phase spaces).Then the original time series of degradation process is transformed into the track of phase space.Denoting the reference track for the jth process as , the points in phase space are expressed as When performing track matching, a track subset is selected with a time window length of l in the current phase space.Then the NCC between the current track Y k and the reference track Z j should be modified as follows.
If the s j k reaches the maximum at the time point T j k , then sub-tracks at time T j k are regarded as the best matching results in terms of degradation reference track Z j .This result also reflects the influence of the track shape to the degradation stage.Therefore, the RUL of the most similar part to the jth degradation reference track can be estimated as On the basis of the above result of the best track matching, the weight of the remaining life calculated according to the jth degradation reference track is given as ) , where M is the total number of the degradation processes.Then the remaining life of the system k is estimated as Equation (12).
Appl.Sci.2019, 9, 1657 Then, the RUL should be transformed into risk according to the joint probability.It is identified as a risk event when any system in a system portfolio breaks.Therefore, for a system portfolio x i = (s 1 , s 2 , . . ., s n ), the risk is calculated as Equation ( 13) shows.
In conclusion, the RUL is an important criterion for the weapon system research because longer RUL can reduce the failure risk of systems in combat.The availability of each component system is the basis to guarantee the normal operation of systems portfolios.An operation activity will be broken or even unsuccessful if any system of a system portfolio malfunctions in the operation process.This induces the demand for long RUL of systems.

System Portfolio Optimization
As discussed above, the paper tries to solve the system portfolio problem considering system values and risks.The corresponding notation is as follows.Let S = {s 1 , . . ., s m } denotes the set of alternative systems, indicating m systems can be selected for a system portfolio.x i = (x i1 , . . ., x im ), x i ∈ {0, 1} m is one of system portfolio schemes, where x ij = 0\1, with x ij = 0 denoting the jth system in S is not selected in scheme x i , and x ij = 1 denoting the jth system in S is selected in scheme x i .All possible schemes compose the solution space X.Let E = (e 1 , . . ., e m ) represents the cost of systems, where e i is the cost of system s i , and B denotes the total budget.Then, the system portfolio optimization model can be formulated as follows.
x ji × e i ≤ B x j = x j1 , . . ., x jm , where V(x j ) and R(x j ) are the value and risk of the system portfolio scheme x j .G(c i , cr i ) represents the capability gap on the ith capability.In addition, the cost of scheme x j must within the budget limitation.
It can be seen that the optimization is a multi-objective problem and the 2 objectives are conflict with each other.Therefore, an optimal system portfolio with the best performance in both 2 objectives does not exist.The target is to obtain the Pareto optimal solutions of 2 objectives, also known as non-dominated solutions.

Background Description
It is hypothesized that the problem aims to select an optimal system portfolio under the anti-missile scenario, where an object will suffer saturated missiles attacks.The objective is selecting a system portfolio from 100 alternative systems (2 100 − 1 candidate system portfolios in total) under the budget limitation to maximize the system portfolio value and simultaneously minimize the whole risk.
According to the operation process of OODA, capabilities discussed in the paper are c1 detection range, c2 communication range, c3 striking range and c4 decision time, where the former three capabilities are beneficial type and the last one is cost type.In a specific operation scenario, the capability requirements and combination rules are shown in Table 1.In addition, for capabilities of systems, they are generated by executing a Monte Carlo simulation method according to truncated normal distribution functions.The histogram of generated data is shown in Figure 4.The worst value of cost-type capability decision time is 34.6853, which will be used in value calculation according to Section 2.1.2.

Weight Determination
Based on simulations on the "Command: Modern Air/Naval Operations", an ultimate military simulator for modern military conflicts, the weight information can be deduced by the correlation analysis.
The independent variables are four capabilities, that is detection range, communication range, striking range and decision time.The dependent variable is the intercepted missile number.By autosimulating for 10,000 times, 10,000 sets of data are generated.Through the MIC algorithm, the corresponding results can be obtained, as Table 2 shows.Through normalization, the weight of four capabilities are determined as 0.276, 0.250, 0.174, and 0.300.− candidate system portfolios, an example of the value calculation process is introduced.Assuming a system portfolio SP1 have 5 component systems of S1, S2, S3, S4, and S5, with capability information shown in Table 3.

Weight Determination
Based on simulations on the "Command: Modern Air/Naval Operations", an ultimate military simulator for modern military conflicts, the weight information can be deduced by the correlation analysis.
The independent variables are four capabilities, that is detection range, communication range, striking range and decision time.The dependent variable is the intercepted missile number.By auto-simulating for 10,000 times, 10,000 sets of data are generated.Through the MIC algorithm, the corresponding results can be obtained, as Table 2 shows.Through normalization, the weight of four capabilities are determined as 0.276, 0.250, 0.174, and 0.300.

Value Calculation
Because it is impossible to calculate all values of 2 100 − 1 candidate system portfolios, an example of the value calculation process is introduced.Assuming a system portfolio SP1 have 5 component systems of S1, S2, S3, S4, and S5, with capability information shown in Table 3.According to capability combination rules in Table 1, the combined capabilities of system portfolio SP1 are shown in the last column in Table 3.
Then, according to Equation ( 4), the capability gaps of four combined capabilities are calculated as Equation (15) shows.The value of any system portfolio can be calculated based on the same steps.

Risk Determination Based on RUL Prediction
In the case study, the key component of weapon systems, the turbine engine, is taken as an example for analyzing risks.The data is derived from the experiment conducted by a commercial modular simulation software C-MAPSS as shown in Figure 5.

Risk Determination Based on RUL Prediction
In the case study, the key component of weapon systems, the turbine engine, is taken as an example for analyzing risks.The data is derived from the experiment conducted by a commercial modular simulation software C-MAPSS as shown in  (3) There are 3 operational setting parameters that have a substantial impact on an engine's performance.(4) There are noises in the data.(5) The engines operate normally at the initial moment and begin to degrade at some points in time series.In the training set, the cumulative degradation quantity continues to grow until it reaches or exceeds the preset threshold.In the testing set, the time series will terminate when engines fail.Due to the fact that the constant variable is unable to reflect the evolution of engine degradations, variable 1, 5, 6, 10, 16, 18, and 19 are not regarded as feature variables.What's more, the tracks exhibit different trends in terms of variable 9 and 14, so variable 9 and 14 are inadequate to describe the degradation process.
Then, the variation coefficients of the rest 12 variables are calculated based on the degradation data and the results are shown in Table 4.According to the rule of eliminating variables with small variation coefficients, variable 3, 4, 11, 15, 17, 20 and 21 are chosen as the base variables that represent engine degradation characters.According to the RUL prediction method, the remaining life of the engines in the testing set can be estimated by matching the testing data with the reference tracks.Then, the risks can be obtained.The results are shown in Table 5. maximizing the value and minimize the risk of system portfolios.Therefore, a multi-objective algorithm is employed to solving the optimization problem.The non-dominated sorting generic algorithm (NSGA) is a kind of widely used multi-objective algorithm, which exhibits a good performance for retaining elites in offspring.On the other side, the differential evolution (DE) is a nice genetic operator, which plays really well on keeping population diversity.Thus, the paper uses the non-dominated differential evolution (NSDE) algorithm, which fuses the two advantages of NSGA and DE, to solve the system portfolio optimization problem.The corresponding parameters are set as follows.The population size is Pop = 100, the number of iterations is Gen = 1000, the mutation probability is 0.01 and the crossover probability is 0. In detail, the 200 individuals of the best Pareto set are shown in the system option diagram in Figure 9.The rectangular area is divided into 100 × 200 rectangles according to the number of system candidates and the dimension of the non-dominated weapon system portfolios.Each rectangle represents whether a system candidate is selected in the Pareto set.If a weapon system i is selected by the j th non-dominated system portfolio, the i th row and j th column rectangular block will be colored black, otherwise, it is left blank.
From Figure 10, it can be seen that some systems are frequently selected in the Pareto set.However, some systems are seldom selected or even never selected.To compare the importance degree of different systems, the frequencies for all systems of being selected in the Pareto set is counted.Systems of S6, S9, S15, S16, S22, S25, S32, S39, S50, S52, S65, S69, S70, S71, S72, S73, S75, S79, S83, S90, S98, and S99 are selected by at least one system portfolio in the Pareto set.In addition, the systems S9, S25, S50 are quite important according to the high selected numbers in the Pareto set.Further, the rank of systems according to selected numbers is: S25 > S9 > S50 > S39 > S69 > S75 > S83 > S22 > S71 > S32 > S73 > S16 > S98 > S72 > S15 > S6 > S65 > S99 > S70 > S79 > S52 > S90, which to some extend indicates the importance degrees of selected systems.As regarding to the rest systems, they can be directly neglected in the system portfolio selection process.In detail, the 200 individuals of the best Pareto set are shown in the system option diagram in Figure 9.The rectangular area is divided into 100 × 200 rectangles according to the number of system candidates and the dimension of the non-dominated weapon system portfolios.Each rectangle represents whether a system candidate is selected in the Pareto set.If a weapon system i is selected by the j th non-dominated system portfolio, the i th row and j th column rectangular block will be colored black, otherwise, it is left blank.
From Figure 10, it can be seen that some systems are frequently selected in the Pareto set.However, some systems are seldom selected or even never selected.To compare the importance degree of different systems, the frequencies for all systems of being selected in the Pareto set is counted.Systems of S6, S9, S15, S16, S22, S25, S32, S39, S50, S52, S65, S69, S70, S71, S72, S73, S75, S79, S83, S90, S98, and S99 are selected by at least one system portfolio in the Pareto set.In addition, the systems S9, S25, S50 are quite important according to the high selected numbers in the Pareto set.Further, the rank of systems according to selected numbers is: S25 > S9 > S50 > S39 > S69 > S75 > S83 > S22 > S71 > S32 > S73 > S16 > S98 > S72 > S15 > S6 > S65 > S99 > S70 > S79 > S52 > S90, which to some extend indicates the importance degrees of selected systems.As regarding to the rest systems, they can be directly neglected in the system portfolio selection process.By deeper analysis, it can be discovered that some systems tend to always be selected together.Therefore, a frequent item set mining algorithm of Apriori is applied to identify the association rules, shown in Table 6.The support parameter indicates the ratio between the simultaneously appearing frequency and all items, which means the probability of appearing simultaneously.The confidence of the rule of "A → B" represents the ratio of ( ) ( ) , which means the probability of A B ∪ when A appears.By deeper analysis, it can be discovered that some systems tend to always be selected together.Therefore, a frequent item set mining algorithm of Apriori is applied to identify the association rules, shown in Table 6.The support parameter indicates the ratio between the simultaneously appearing frequency and all items, which means the probability of appearing simultaneously.The confidence of the rule of "A → B" represents the ratio of ( ) ( ) , which means the probability of A B ∪ when A appears.By deeper analysis, it can be discovered that some systems tend to always be selected together.Therefore, a frequent item set mining algorithm of Apriori is applied to identify the association rules, shown in Table 6.The support parameter indicates the ratio between the simultaneously appearing frequency and all items, which means the probability of appearing simultaneously.The confidence of the rule of "A→B" represents the ratio of support(A ∪ B)/support(A), which means the probability of A ∪ B when A appears.
In Table 5, the association rules are ranked by the value of support and confidence respectively.Firstly, according to the ranking by support, it can be elicited that the "S9→S25" is the most frequent rule, which means they tend to be selected together.In addition, when system S75 is selected, the system S25 must also be selected according to the first rule in the ranking by confidence.Referring the association rules, decision-makers can have a deeper understanding of the significance of system portfolios.

Figure 1 .
Figure 1.Division for variable X and Y.
of capabilities requirements, proposed by stakeholders, where j cr denotes the value of the jth capability requirement.The

Figure 1 .
Figure 1.Division for variable X and Y.By using the traversal algorithm, we can find the maximal value from all possible division schemes as I * (D, x, y) = maxI(D G ) .Then, the characteristic matrix M(D) x,y is constructed through normalization operation on I * (D, x, y) as M(D) x,y = I * (D, x, y)/log min x, y .Whereupon the MIC can be obtained according to MIC(D) = max x,y<B(n) {M(D) x,y }.

Figure 2 .
Figure 2. The gap function curve: (a) Benefit type capability, (b) cost type capability.

Figure 2 .
Figure 2. The gap function curve: (a) Benefit type capability, (b) cost type capability.

Figure 4 .
Figure 4. Histogram of generated capability data.

Figure 4 .
Figure 4. Histogram of generated capability data.

Figure 5 .
Figure 5.The interface of the simulation platform.

Figure 5 .
Figure 5.The interface of the simulation platform.The C-MAPSS simulates the operation of a turbine engine with 900,000-pound thrust and records monitoring signals.Based on the principle of thermodynamics, two failure modes are designed: high-pressure compressor degradation and fan degradation.The main functional modules and connections are shown in Figure 6.The simulation runs in the following settings: (1) The simulation experiment data contains time series of 21 variables.It can be further divided into a training set and a testing set.Each multivariate time series corresponds to a specific engine, meaning that the data can be considered to be generated by engines of different systems.(2) The initial wear condition of each engine might not be identical and there are manufacturing variations, which are considered reasonable and not treated as reasons of engine failures.

19 Figure 6 .
Figure 6.The functional modules and their connection.

Figure 6 .
Figure 6.The functional modules and their connection.As a result, 100 degradation tracks are obtained in the training set and 100 tracks before failure in the testing set.The training data is used to establish the RUL prediction model of engines, and the testing set is used to test the feasibility of the model.The monitoring data is shown in the scatter plots in Figure 7.Each plot visualizes the 100 degradation tracks of one variable in the training set.The engine code, the operation cycle and the 3 operational setting parameters are not shown in the figure.Due to the fact that the constant variable is unable to reflect the evolution of engine degradations, variable 1, 5, 6, 10, 16, 18, and 19 are not regarded as feature variables.What's more, the tracks exhibit different trends in terms of variable 9 and 14, so variable 9 and 14 are inadequate to describe the degradation process.Then, the variation coefficients of the rest 12 variables are calculated based on the degradation data and the results are shown in Table4.According to the rule of eliminating variables with small variation coefficients, variable 3, 4, 11, 15, 17, 20 and 21 are chosen as the base variables that represent engine degradation characters.

Figure 8 .
Figure 8. Result exhibition: (a) Result of 10 times of running; and (b) result of 200 best individuals.

Figure 10 .
Figure 10.Frequencies for all systems of being selected in the Pareto set.

Figure 10 .
Figure 10.Frequencies for all systems of being selected in the Pareto set.

Figure 10 .
Figure 10.Frequencies for all systems of being selected in the Pareto set.
1and Y are of the same length.Y and Z i are the mean vectors of Y and Z i respectively.

Table 2 .
Obtained maximal information coefficient (MIC) of input and output.

Table 2 .
Obtained maximal information coefficient (MIC) of input and output.

Table 4 .
Variation coefficients (VC) of the rest 12 variables.

Table 6 .
The association rules mined by frequent item set mining.

Table 6 .
The association rules mined by frequent item set mining.