Roadmap Optimization: Multi-Annual Project Portfolio Selection Method

The process of project portfolio selection is crucial in many organizations, especially R&D organizations. There is a need to make informed decisions on the investment in various projects, or the lack thereof. As projects may continue over more than one year, and as there are connections between various projects, there is a need to decide not only which projects to invest in but also when to invest. Since future benefits from projects are depreciated in comparison with near-future ones, and due to the interdependency among projects, the question of allocating the limited resources becomes quite complex. This research provides a novel heuristic method for allocating the limited resources over multi-annual planning horizons and examines its results in comparison with an exact branch and bound solution and various heuristic ones. This paper culminates with an efficient tool that can provide both practical and academic benefits.


Introduction
Most project-based organizations are faced with a long list of proposed projects that compete for a limited set of resources such as money, manpower, and equipment [1]. Project portfolio selection (PPS) aims to find which projects an organization should take [2][3][4]. Needless to say, as the decision to allocate and prioritize projects today affects the organization's competitive position in the future [5], and the decisions of initiation (and termination) of projects are of a strategic nature since they involve the commitment of substantial enterprise resources [6], it is recognized worldwide that there is a need to manage projects as an overall portfolio [7] and not as separated projects [8].
Evidently, organizations wish to maximize their return on investment when selecting projects [9], and therefore, the selection process should be based on criteria that take into account this objective function [10].
The operational research problem of PPS was defined [11] as the situation where several projects are available for investment. They are different in their resource needs (both resource types and resource demand level).
The current research and methods are lacking in two aspects: they do not take into consideration the time value of the projects (i.e., the contribution depends on the projects' completion time), and they do not consider the dependence between the projects (i.e., precedence and competition over resources). This research presents a novel formulation of the problem, namely one that incorporates these aspects. The article also provides an exact solution algorithm (branch and bound) that can solve small to medium problems. However, due to the NP-hard nature of the problem, large-scale problems require a different approach. Thus, several metaheuristic algorithms are proposed and analyzed.

Literature
The research in the area of project portfolio selection has been growing intensively in the past decade, with intensive proliferation of research articles tackling a myriad of variations of this problem. Some of this proliferation was summarized in several review papers. For example, Frey and Buxmann [12] reviewed the literature on portfolio selection of IT projects. Weissenberger-Eibl and Teufel [13] provided a strategic and political review on project selection. Padhy [14] and Condé and Martens [15] reviewed six-sigma project selections. Danesh et al. [16] provided a broad review of multi-criteria portfolio management, and Mohagheghi et al. [17] reviewed models, uncertainty approaches, solution techniques, and case studies.
The simplest model of the project selection problem is single-attribute optimization under resource constraints. This problem resembles the well-known knapsack problem [18,19]. The main extensions to this type of optimization incorporate interactions between projects and uncertainty [18,20,21].
Another classical model of the project selection problem is financial portfolio selection, which is based on mean profit vs. profit variance [22]. However, this model is rooted in stock-market investment, and its assumptions and characteristics are more suited to pure investment decisions (portfolios of stocks and other financial assets) than to a company or organization's project decision problem. Thus, only a few papers on project portfolio selection adopted this modeling approach [9].
During the last two decades, fuzzy logic started to play an important role in decision making, and in the past decade, it made its way into the portfolio selection literature. For example, Perez and Gomez [33] and Perez et al. [38] used fuzzy constraints between projects, while others [1,35,39,40] used them in the objective function.
Another branch of research gathered both project scheduling and project selection into a single decision frame (some examples are in [19,30,[41][42][43]).
While the above discussion dealt with project selection in technical terms, strategic project portfolio selection presents a very different approach [25,[44][45][46]. Killen et al. [4] identify three strategic perspectives of project portfolio selection: (1) the resource-based view, (2) the dynamic capabilities view, and (3) the absorptive capacity view. Kaiser et al. [47] stressed the role of structural alignment of selected projects with the organization's values, vision, and strategy. Kopmann et al. [46] suggested fostering both deliberate and emergent strategies. Finally, Guo et al. [25] suggested balancing strategic contributions and financial returns.
While the existing reviews cover their relevant part of the literature and suggest some classifications of the project selection problems, a more formal and general classification system can contribute to them. We suggest a system that would be close to Kendall's notation in queuing theory [48]. To foster a discussion for filling this gap, we present here our initial attempt at classification of project portfolio selection's main problem types using the major characteristics of these problems. It is hoped that the suggested classification scheme will initiate a discussion (or even a debate) which will culminate in an agreed-upon standard classification method. In Figure 1, we propose a classification method for the PPS problem. The suggestion is to classify the problems by their objective type, the solution method used to make the selection, the nature of the data, and the constraints. Therefore, the proposed method has four classifiers: X-Objective type: single-attribute selection, multiple-attribute selection, fitness function, profit vs. risk selection, utility vs. risk selection, and strategic selection; Y-Solution type: optimized selection, robust selection, efficient frontier, AHP/ANP, and priority-based; Z-Data type: deterministic data, fuzzy data, and stochastic data; W-Constraint characteristics: a combination of letters that testify to the existence of the characteristic constraint as follows: P = precedence constraints, RL = resource-level constraints, DR = depleted resource constraints, and T = time constraints.
Some examples of this classification scheme are shown in Table 1. As stated above, enhancements to this classification scheme and even challenges are welcome as part of future research.
The problem researched in this article has a single objective (maximum value); the data are assumed to be deterministic, and no depleted resources are involved. The article utilizes two solution types: optimal for small-to-medium-sized problems and metaheuristic search for larger ones. Therefore, according to the hierarchical classification of Figure 1, this problem should be classified as single-attribute, optimized, and deterministic.
This article does not consider random, fuzzy, or gray data.

Problem Description
The problem deals with the ongoing situation of an R&D organization developing projects. The need is to decide which projects should be scheduled for each year in the planning horizon. As a company has a limited amount of resources each year, it is impossible to perform all projects at once, and therefore, there is a need to schedule less-lucrative projects for distant years. Were there no constraints, all projects would have been planned for the first year. However, this is not the case. Two reasons compel postponing projects to future years:

• Resource availability: Each project requires a specific level of various resources. The assumption is that the available level of each resource cannot be breached.
• Precedence: As part of an R&D project, the company acquires new capabilities that can be exploited for future projects.

Problem Assumptions
• Each project has a specific value to the company. This value can be measured in the same units (typically profit, measured in dollars).
• The value of the project to the company depreciates as a function of time; that is, each year can be assigned a corresponding coefficient that expresses the depreciated value. (For example, a project assigned to year 1 has a value of 100. The same project assigned to year 2 has a value of 80. If assigned to year 3, it has an even lower value, etc.)
• All relevant resources are renewable (e.g., work hours). For each year, there are new levels of resources available. The amount of a resource that was not consumed in year n cannot be used in year n + 1.
• Each project has given levels of resources needed for its completion (e.g., programmer hours or QA hours).
• Technical precedence dependencies exist between the projects. Thus, project x can be performed only based on technology developed in project y.
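A toy instance consistent with these assumptions can make the notation concrete. The sketch below is illustrative Python; all names, values, and resource figures are invented for this example and are not taken from the paper's data:

```python
from dataclasses import dataclass

# Hypothetical toy instance illustrating the assumptions above.

@dataclass
class Project:
    value: float                 # undepreciated value (e.g., dollars)
    needs: dict                  # resource type -> total amount required
    predecessors: tuple = ()     # projects whose technology this one builds on

projects = {
    1: Project(value=100.0, needs={"dev_hours": 300, "qa_hours": 50}),
    2: Project(value=60.0,  needs={"dev_hours": 150, "qa_hours": 30}),
    3: Project(value=120.0, needs={"dev_hours": 200, "qa_hours": 80},
               predecessors=(1,)),   # project 3 builds on project 1
}

# Depreciation coefficients per year of the planning horizon:
# a project completed in year i is worth value * y[i].
y = {1: 1.0, 2: 0.8, 3: 0.6}

# Renewable resource levels, available anew each year (unused amounts expire).
available = {1: {"dev_hours": 400, "qa_hours": 100},
             2: {"dev_hours": 400, "qa_hours": 100},
             3: {"dev_hours": 400, "qa_hours": 100}}

def depreciated_value(k: int, end_year: int) -> float:
    """Value of project k if it ends in the given year."""
    return projects[k].value * y[end_year]
```

With the coefficients above, a project worth 100 in year 1 is worth 80 if postponed to year 2, matching the numerical example in the second assumption.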

Problem Notations
To assist understanding of the formulation, Table 2 depicts the notations used for the formulation.

Problem Formulation
The objective function maximizes the cumulative (depreciated) value of the project portfolio:

maximize $\sum_{k=1}^{N}\sum_{i=1}^{H} v_k\, y_i\, x_{k,i}$.

Each project is to end in one year at most, so the relevant constraints are

$\sum_{i=1}^{H} x_{k,i} \le 1, \quad \forall k \in \{1, 2, \ldots, N\}$.

Since a project can stretch over more than one year (i.e., start before its final year), there is a need for an auxiliary set of variables (the matrix Z), denoting the years a project can use resources (i.e., the project cannot use resources after its ending):

$z_{k,i} \le \sum_{i'=i}^{H} x_{k,i'}, \quad \forall k, \forall i$.

Thus, $z_{k,i}$ can have the maximum value of 1 for each year until the project's ending year and 0 onward. To ensure that the project spreads over consecutive years, we use

$z_{k,i} \ge z_{k,i+1}, \quad \forall k, \forall i \in \{1, \ldots, H-1\}$.

This notation enables the setting of the resource-consuming variables (the matrix W), denoting the level of resource j consumed by project k in year i:

$w_{k,i,j} \le q_{k,j}\, z_{k,i}, \quad \forall k \in \{1, 2, \ldots, N\}, \forall i \in \{1, 2, \ldots, H\}, \forall j \in \{1, 2, \ldots, P\}$.

Thus, a project may consume part of the resources it needs in year n and part in year n + 1 (e.g., use 2023's budget and 2024's budget). Additionally, each scheduled project must consume all the resources it needs:

$\sum_{i=1}^{H} w_{k,i,j} = q_{k,j} \sum_{i=1}^{H} x_{k,i}, \quad \forall k, \forall j$.

To prevent over-consumption of resources in any given year, the resource-level constraints are

$\sum_{k=1}^{N} w_{k,i,j} \le r_{i,j}, \quad \forall i, \forall j$,

where $r_{i,j}$ denotes the available level of resource j in year i. Finally, the technical dependencies among the projects are expressed as

$\sum_{i'=1}^{i} x_{k,i'} \le \sum_{i'=1}^{i} x_{m,i'}, \quad \forall i$, for every pair (m, k) with $d_{m,k} = 1$,

so that project k cannot end before its predecessor m has ended. A small example to help illustrate this formulation is detailed in Appendix A.
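As a sanity check of the objective, the depreciated portfolio value of a candidate schedule can be computed directly. A minimal Python sketch, with hypothetical values and coefficients:

```python
# Evaluate the objective sum over k, i of v_k * y_i * x_{k,i} for a given
# assignment of ending years. All data below are hypothetical.

v = {1: 100.0, 2: 60.0}      # project values
y = {1: 1.0, 2: 0.8}         # per-year depreciation coefficients

def portfolio_value(end_year: dict) -> float:
    """end_year maps project -> ending year (unselected projects are omitted)."""
    return sum(v[k] * y[i] for k, i in end_year.items())
```

For example, ending Project 1 in year 1 and Project 2 in year 2 yields 100 + 48 = 148.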

Problem Complexity and Exact Solution
Although the formulation depicted in Section 3.3 is accurate and necessary, it is of little help when trying to solve such problems. The described problem is NP-complete, yet exact solutions can be obtained via the branch and bound (B&B) method. Section 4.1 describes a B&B solution for this problem.
Since a B&B solution is practical only for some of the problems, and the NP-completeness of the general problem hinders exact solutions, a practical and efficient metaheuristic solution is described in Section 5. This solution method is useful for large-sized problems and also provides an initial lower bound for the B&B algorithm.

Complexity
To prove the NP-completeness of PPS, a reduction from a well-known problem is needed: the precedence-constrained knapsack problem [54]. The reduction is performed as follows:
• Set the horizon (number of years) to 2. Thus, the projects are either assigned to the first year (i.e., inserted into the "knapsack") or to the second one (left out).
• Set the second year's depreciation coefficient to zero ($y_2 = 0$), so that projects assigned to the second year contribute no value.
Since the precedence-constrained knapsack problem is NP-complete, and the general PPS is far more complicated than this special case, it is clear that PPS is NP-complete as well.

Branch and Bound Algorithm
An efficient B&B algorithm is based on the following components:
• A lower bound (LB): Any feasible solution can provide an LB. The algorithm can start with either a solution produced by the metaheuristic algorithm or a simple heuristic solution.
• An upper bound (UB): This is an efficient way to assess the maximal potential of a partial solution (i.e., branch) of the tree. The UB can also serve as a convenient heuristic precedence rule (i.e., a rule to decide which branch to develop further first).
• A branching method: This is a way to create the next branches.
The following subsections describe the components of the algorithm.

Initial Solution
It is convenient (though not necessary) to start the run of the algorithm with a lower bound (LB). The higher the LB, the better. A reasonable algorithm should consider all important attributes of the projects, namely the resource requirements and precedence. The following algorithm can provide a decent initial solution:
1. For each resource type (j):
1.1 Calculate for each project the ratio of the project value to its resource requirement, that is, $\theta_k = v_k / q_{k,j}$, where $\theta_k$ denotes the value that can be obtained from one unit of the resource by performing the project.
1.2 For each project, calculate its total impact. The impact is calculated by aggregating the project's $\theta_k$ value and the values of all its successors (direct and indirect). For example, for project 1, the impact is $I_1 = \theta_1 + \theta_5 + \theta_7 + \theta_8$ (since projects 5, 7, and 8 are successors to project 1).
1.3 Sort the projects in descending order by the total impact.
1.4 Schedule the projects according to the order obtained in Step 1.3. Each project should be scheduled for the first available year.
1.5 Calculate the total value obtained from the schedule, V(j).
2. The solution is set to be LB = max ∀j V(j).
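The impact aggregation of Steps 1.1 and 1.2 can be sketched as follows. The helper names and figures are hypothetical; the aggregation walks all direct and indirect successors, as the example I_1 = θ_1 + θ_5 + θ_7 + θ_8 suggests:

```python
# Sketch of the impact calculation for the greedy initial solution.
# theta[k] = value per resource unit of project k (theta_k = v_k / q_{k,j});
# succ[k] = direct successors of project k. All figures are hypothetical.

def transitive_successors(k, succ):
    """All direct and indirect successors of project k (depth-first walk)."""
    seen, stack = set(), list(succ.get(k, ()))
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(succ.get(m, ()))
    return seen

def impact(k, theta, succ):
    """theta_k plus the theta values of all (transitive) successors of k."""
    return theta[k] + sum(theta[m] for m in transitive_successors(k, succ))
```

Sorting the projects by `impact(k, theta, succ)` in descending order then yields the schedule order of Step 1.3.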

Branching
The branching process is quite simple. Each "level" of the tree represents a year. Therefore, the tree depth can only be as deep as the planning horizon.
Each branch is simply a set of projects that fully utilizes at least one of the resources available for that year; that is, when branching year i, each branch is a set J of unscheduled projects that fulfills the following requirements:
• There exists a resource type (j) for which no unscheduled project outside J can be added to J without exceeding the available level $r_{i,j}$;
• For every proper subset of J, denoted by J⁻ (i.e., J⁻ ⊂ J and J⁻ ≠ J), and for every resource type (j), at least one project can still be added to J⁻ without exceeding $r_{i,j}$.
The two conditions may look cumbersome, but all they mean is that the set J exploits one resource to the fullest, while any proper subset of J can still be extended by adding project(s).

Bounding Rule
An efficient bounding rule should satisfy the following demands:
• Simply calculable: The rule should provide the result with a low-complexity algorithm (otherwise, it provides no benefit).
• Low UB: To trim the tree as much as possible, the UB should be as low as possible.
The following algorithm satisfies both demands. The algorithm is based on two relaxations of the problem: the first ignores the precedence constraints, and the second ignores the multi-resource nature of the problem (by dealing with one resource at a time).

1. For each resource type (j), the following should be performed:
1.1 Ignore all resource requirements (other than resource j) and all precedence constraints. The remaining problem is the multiple knapsack problem; solve it accordingly.
1.2 As the multiple knapsack problem is itself NP-complete, allow projects to split across years: when a project is scheduled to start in year i and finish in year i + 1, calculate the obtained value of the project proportionally to the years over which it was scheduled.
1.3 Calculate the total values of the projects (each according to its year), UB(j).
2. The bound is set to be UB = min ∀j UB(j), since each single-resource relaxation bounds the true optimum from above.
The proposed algorithm is quite simple and rapid for small-scale problems.
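One way to realize the single-resource relaxation is a greedy fractional fill: years are packed in order with the highest value-per-unit projects, and a project split across years is credited proportionally. This is only a sketch under those assumptions; the paper's exact crediting rule may differ, and all names and data are hypothetical:

```python
# Single-resource upper-bound sketch: ignore precedence and all resources
# except one, fill the years greedily by value density, and let projects
# split across a year boundary with value credited proportionally.

def single_resource_ub(values, needs, capacity, y):
    """values[k], needs[k]: value and single-resource need of project k;
    capacity[i]: resource available in year i; y[i]: depreciation coefficient."""
    order = sorted(values, key=lambda k: values[k] / needs[k], reverse=True)
    left = dict(capacity)            # remaining capacity per year
    years = sorted(capacity)
    total = 0.0
    for k in order:
        remaining = needs[k]
        for i in years:
            if remaining <= 0:
                break
            used = min(left[i], remaining)
            # credit value proportionally to the amount performed in year i
            total += values[k] * y[i] * used / needs[k]
            left[i] -= used
            remaining -= used
    return total
```

Because the fractional fill relaxes both integrality and precedence, its value can never fall below the best feasible schedule for that resource.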

Metaheuristic Search
Although the previous section describes an exact algorithm, it is impractical to apply this algorithm (or any exact algorithm) successfully to large-size problems. The algorithm provided in Section 4.2.1 may prove unsatisfactory even for medium-, let alone large-scale, problems.
To provide near-optimal solutions, a metaheuristic approach is suggested. The benefit of this approach is that, given enough runtime, solutions close to the optimum can be obtained. Since the first conception of metaheuristics, many approaches have been suggested. The proposed solution is based on the CLONALG metaheuristic. This search method was developed by de Castro and Von Zuben [55]. This basic method can be applied to various scheduling problems, as long as the following requirements are provided [56]:
• Representation of the solution space (i.e., the "antibodies");
• An initial set of solutions;
• A procedure to create valid mutations in the antibodies (i.e., mutations creating new solutions that are included in the solution space).
Section 5.1 elaborates on the application of the first requirement to PPS. The second requirement application is provided in Section 5.2. Finally, several different mutation algorithms (third requirement) are detailed in Section 6.

Vector Representation
Although the formulation provided in Section 3 (and the notations of Table 2) is mathematically accurate, the representation of the decision variables (the matrix X) is impractical for the proposed CLONALG process. The main problem is that most possible instances of X are not feasible, either because they represent schedules that violate the dependency constraints or because they violate the resource constraints. An ideal representation is one in which every possible instance represents a feasible solution. Thus, the mutation process would never yield infeasible solutions. Another requirement is that all feasible solutions can be represented (i.e., the vector representation should span the entire solution space) so the mutation process will not omit any possible solution.
To achieve this, a vector G is introduced. This vector represents the "genotype" of the solution (i.e., it is not the schedule itself), but from each possible instance of G, a feasible "phenotype" (the matrix X, the solution) can be derived. G is a vector of N natural numbers (from 1 to N), where g_k is the k-th project to be scheduled.
The transformation from G to X ("genotype-to-phenotype translation") is performed as follows:
MAIN
1. For k = 1 to N, do the following: if $\sum_{i=1}^{H} x_{g_k,i} = 0$ (i.e., $g_k$ has not been scheduled yet), run procedure SCHEDULE with parameter $g_k$.
SCHEDULE (parameter k)
1. For m = 1 to k − 1, do the following (without loss of generality, it is assumed that if project k depends on project m, then k > m):
1.1 If $d_{m,k} = 1$ and $\sum_{i=1}^{H} x_{m,i} = 0$ (project k depends on project m, and project m has not been scheduled), run procedure SCHEDULE with parameter m.
1.2 Otherwise (either project k does not depend on project m or project m has already been scheduled), continue.
2. Find the first year i in which project k can be scheduled (has enough available resources and does not violate dependencies), and set $x_{k,i} = 1$.
Return
This simple algorithm satisfies the two requirements: (1) any genotype G can be converted to a feasible phenotype (a feasible X) through a simple process, and (2) every feasible solution can be derived from some genotype G.
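A compact way to implement the genotype-to-phenotype translation is a recursive decoder. For brevity, this sketch assigns each project's whole resource demand to a single year, whereas the paper's model lets a project draw resources over consecutive years; the names, the equal per-year capacities, and the strictly-after precedence rule are assumptions of this sketch:

```python
# Genotype-to-phenotype decoding sketch. G is a permutation of the projects;
# predecessors are scheduled recursively before their dependents.

def decode(G, needs, capacity, pred, horizon):
    """Return {project: ending year}. needs[k]: resource -> amount;
    capacity: per-year resource levels (assumed equal each year);
    pred[k]: direct predecessors of project k."""
    left = {i: dict(capacity) for i in range(1, horizon + 1)}
    year_of = {}

    def schedule(k):
        if k in year_of:
            return
        for m in pred.get(k, ()):          # recursively schedule predecessors
            schedule(m)
        pre = pred.get(k, ())
        if any(m not in year_of for m in pre):
            return                         # a predecessor could not be placed
        earliest = 1 + max((year_of[m] for m in pre), default=0)
        for i in range(earliest, horizon + 1):
            if all(left[i][j] >= q for j, q in needs[k].items()):
                for j, q in needs[k].items():
                    left[i][j] -= q        # consume this year's resources
                year_of[k] = i
                return
        # no feasible year within the horizon: project left unscheduled

    for k in G:
        schedule(k)
    return year_of
```

Note that any permutation decodes to a feasible schedule, which is exactly the property the mutation process relies on.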

Initial Solution Generation
Since the aforementioned procedure makes any vector containing the natural numbers from 1 to N a feasible representation of a solution, a procedure to generate an initial feasible solution is quite straightforward; any "scrambled" vector containing the numbers from 1 to N in random order will suffice. To achieve a set of these scrambled vectors, the following method was applied to an N × 2 matrix S:
1. For i = 1 to N, do the following: set $s_{i,1} = i$ and $s_{i,2} = U(0, 1)$ (a random number from the unit uniform distribution).
2. After this stage, the first column is filled with running numbers and the second with random numbers.
3. Sort S in ascending order according to the second column.
4. The first column of S is an initial solution vector (G).
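The scrambling method above amounts to sorting indices by a random key. A minimal Python sketch:

```python
import random

# Initial genotype generation: pair each index with a uniform random key,
# sort by the key, and read off the first column.

def scrambled_vector(n, rng=random):
    s = [(i, rng.random()) for i in range(1, n + 1)]   # (index, U(0,1)) rows
    s.sort(key=lambda row: row[1])                     # sort by the random key
    return [i for i, _ in s]
```

The result is a uniformly random permutation of 1..N (equivalent to an in-place shuffle), so the initial antibody population is unbiased.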

Mutation Generation
The presentation of the solution vector and the generation of the initial solution set lay the groundwork for the central part of the CLONALG process: mutation generation and cloning. The mutations are generated by inserting random changes into the genome vector (G). Since the genotype vector describes the order of scheduling the projects, the mutation process is carried out by changing this order. This section describes three approaches to the mutation process. The first and the second are "traditional" approaches, and the third attempts to exploit clustering techniques to improve the search performance. An additional approach, a combination of the previous ones, is also presented.

Minor Mutations
The most trivial and straightforward approach to changing the order of the vector members is by replacement. The simplest way is to randomly choose a project (a member of the genotype vector) and replace it with its neighbor, as depicted in Figure 2. In this case, the fifth location (Project 6) was chosen and replaced with its neighbor in the sixth location (Project 3).
There is an advantage in small mutations. When in the vicinity of the optimal solution, a small mutation is less likely to cause damage and drift away from the optimal solution, as visualized in Figure 3. The small mutation, though in a wrong direction, is less likely to corrupt the solution value than the larger mutation in the correct direction.
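A minor mutation can be sketched as an adjacent swap (positions here are 0-based, unlike the 1-based locations in the text):

```python
import random

# Minor mutation sketch: swap a randomly chosen gene with its right
# neighbor, as in the Figure 2 example.

def minor_mutation(G, rng=random):
    g = list(G)
    i = rng.randrange(len(g) - 1)     # choose a position with a right neighbor
    g[i], g[i + 1] = g[i + 1], g[i]
    return g
```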

Major Mutations
Though the minor mutations have the advantage of minimal damage when in the vicinity of the optimum, they are likely to provide only minimal improvement (i.e., many steps are needed toward the optimum). To illustrate this, let us examine the case depicted in Figure 4. Project 7 has a high value and therefore should be scheduled ASAP. Let us assume a pre-mutated solution vector G = (4, 10, 1, 2, 3, 5, 6, 7, 8, 9). A mutation switching Projects 1 and 10 may decrease the total value (as Project 10 is postponed) without expediting the lucrative Project 7. This mutation has a high probability of being rejected (decrease in the objective function). To expedite Project 7, there is a need for several "lucky" small mutations. This sequence of small mutations will indeed appear eventually but may take quite a long time. Figure 5 visually depicts the difference between the mutation types.
To improve this, another version of mutations is suggested: randomly choosing two projects in the vector and switching their locations, as depicted in Figure 6. In this case, the third location (Project 2) and the eighth location (Project 1) were switched. While this "mega-mutation" may prove lethal (i.e., significantly reduce the objective function value), it may also save the need for a lucky sequence of minor mutations.
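The mega-mutation can likewise be sketched as a swap of two randomly chosen distinct positions:

```python
import random

# Major ("mega") mutation sketch: pick two distinct positions at random
# and swap their projects, as in the Figure 6 example.

def major_mutation(G, rng=random):
    g = list(G)
    i, j = rng.sample(range(len(g)), 2)   # two distinct positions
    g[i], g[j] = g[j], g[i]
    return g
```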

Oriented Mutations
As claimed by Darwin, in nature, all mutations, whether large or small, are totally random and have no special direction (unlike the Lamarckism theory, which claims that they evolve toward a defined goal). The two mutation types described in Sections 6.1 and 6.2 are Darwinian; that is, they are totally random, and each project has the same probability of being selected. This full randomness has the distinct advantage of being totally unbiased but may prove inefficient. Evolutionary scientists have claimed for decades that a gene cannot be considered simply "bad", "good", or even "helpful" for the organism, but rather a set of genes operating together may prove beneficial [57][58][59]. For example, a set of sharp incisors and canines is of no use for an herbivorous animal, nor are long intestines and complex stomachs useful for a carnivorous hunter. A gene for sharp teeth can contribute only when accompanied by other genes for a carnivorous lifestyle, yielding a cheetah, for example. Not to push the natural metaphor too much, but this observation from the field of "ordinary" evolution can be adapted to the field of evolutionary metaheuristic search. If project X and project Y are both predecessors of project Z (which is very lucrative), then there is no advantage in expediting project X alone, but it is very beneficial to expedite X and Y together. In our example, expediting Project 5 may prove unbeneficial if not accompanied by Projects 6 and 7. A beneficial mutation will include several combined small mutations and thus collectively improve the total value.
The problem then is how to recognize whether project X is beneficial with project Y. The projects have dependencies, and they compete for the same resources. To predict whether two projects should be scheduled together, a technique often used by data science was exploited: the similarity coefficient method (SCM). The concept of the SCM was originally used for group technology (GT) applications [60,61], but it is now used for a wide variety of classification and optimization problems [62][63][64]. Therefore, the proposed method is actually a combination of three fields: SCM, oriented (Lamarckist) evolution, and PPS. Obviously, the similarity between machines and manufacturing processes (as used by GT) should be altered too.
The challenge is to keep the process of CLONALG and its random mutations while inserting "smart" mutations. The basic notion underlying this approach is to increase the probability that clustered projects are moved simultaneously (either postponed or expedited, but together) by resorting to the SCM. The first step is to calculate the similarity between the projects (i.e., the likelihood of benefiting from being scheduled together). The second step is to generate the mutation in a way that is based on this similarity.
The similarity measure will be as follows:
• Dependent projects: In the example depicted in Figure 4, Projects 5 and 6 have a common dependent in Project 7. This means that it will not be possible to gain the value of Project 7 even if Project 5 is expedited; Projects 6 and 2 must be completed as well. Any small mutation expediting just one of these projects will leave the others untouched and fail to yield a major gain in value. Furthermore, any mutation that postpones one of these projects will yield a major reduction in the total value. A mutation involving all three projects, by contrast, may enable the expedition of the lucrative Project 7. The proposed similarity measure (based on [65]) is S1_{k,m} = |D_k ∩ D_m| / |D_k ∪ D_m|, where D_k denotes the set of projects that depend on project k; that is, S1 is the number of projects that depend on both project k and project m divided by the number of projects that depend on either of the two. For example, S1_{5,6} = 1/(1 + 1 + 1) = 1/3, since only Project 7 depends on both projects, and there are three projects that depend on either 5 or 6.
• Mutual dependencies: When two projects depend on the same (or nearly the same) projects, expediting both together has only a little more impact on the entire schedule than expediting one of them (i.e., "two for almost the same price"). Therefore, the second similarity applies the same form to the predecessor sets: S2_{k,m} = |P_k ∩ P_m| / |P_k ∪ P_m|, where P_k denotes the set of projects that project k depends on.
• Resource requirements: Obviously, all projects compete for the same resource pool. Therefore, the third similarity is based on the level of common resources required by both projects. Two projects that require totally different resources do not compete at all. Projects that compete for the same resource, where the resource itself is in abundance, experience only minimal competition. If, however, both have high requirements for a scarce resource, then they are in head-to-head competition. Therefore, the third measure is the ratio between the commonly required resources and the availability of these resources: s3_{k,m} = Σ_{r=1..R} min(q_{k,r}, q_{m,r}) / A_r, where q_{k,r} is the demand of project k for resource r and A_r is the availability of resource r. This measure differs from S1 and S2. First, it measures dissimilarity. Second, its value depends on the number of resources (R): the larger the value of R, the larger the value of s3. This poses a problem for the implementation of the CLONALG method. Therefore, the similarity measure S3_{k,m} is normalized as S3_{k,m} = max(0, 1 − s3_{k,m}/R). Thus, the similarity measure S3 is set to be a 0-1 number, where 1 indicates two non-competing projects, and the lower the value, the higher the competition for resources.
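A minimal Python sketch of the three measures follows. The set-based (Jaccard) form of `s1` and `s2` mirrors the verbal definitions above; the min-based aggregation and normalization inside `s3` are assumptions made for illustration, since the paper's exact formula is not reproduced in the text.

```python
def s1(dependents_k, dependents_m):
    """Dependent-projects similarity: projects that depend on BOTH k and m
    over projects that depend on EITHER (a Jaccard coefficient)."""
    either = dependents_k | dependents_m
    return len(dependents_k & dependents_m) / len(either) if either else 0.0

def s2(preds_k, preds_m):
    """Mutual-dependency similarity: the same Jaccard form applied to the
    sets of projects that k and m themselves depend on."""
    either = preds_k | preds_m
    return len(preds_k & preds_m) / len(either) if either else 0.0

def s3(demand_k, demand_m, avail):
    """Resource-competition similarity over resources required by both
    projects, normalized so that 1 means no competition (the min-based
    aggregation is an illustrative reconstruction, not the paper's formula)."""
    R = len(avail)
    raw = sum(min(dk, dm) / a for dk, dm, a in zip(demand_k, demand_m, avail))
    return max(0.0, 1.0 - raw / R)
```

For instance, if only Project 7 depends on both 5 and 6 and three projects depend on either of them, `s1` returns 1/3, matching the worked example above.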
These three similarity measures are utilized to create a similarity coefficient that incorporates all three attributes. The similarity coefficient is, therefore, the weighted sum SC_{k,m} = α1 · S1_{k,m} + α2 · S2_{k,m} + α3 · S3_{k,m}, where α1 + α2 + α3 = 1 and α1, α2, α3 are non-negative. The similarity coefficient is the base of the mutation generation algorithm:
1. Randomly choose a project k;
2. Create an empty set of projects Φ;
3. For each project m = 1 . . . N where m ≠ k, do the following:
3.1. Generate a random number u ∼ U(0, 1);
3.2. If SC_{k,m} > u, then add project m to Φ;
4. Randomly choose a location l for project k;
5. Move all members of the set Φ to location l (while maintaining the inner order of Φ).
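The five steps above can be sketched in Python, assuming a solution is represented as an ordered priority list of project ids and `SC` is the similarity coefficient matrix (both representational assumptions):

```python
import random

def oriented_mutation(schedule, SC, rng=random):
    """Apply one oriented mutation to `schedule` (an ordered list of project
    ids). SC[k][m] is the similarity coefficient between projects k and m."""
    k = rng.choice(schedule)                      # step 1: pick a seed project
    phi = {m for m in schedule                    # steps 2-3: cluster via SC
           if m != k and SC[k][m] > rng.random()}
    cluster = [p for p in schedule if p == k or p in phi]  # keeps inner order
    rest = [p for p in schedule if p != k and p not in phi]
    l = rng.randrange(len(rest) + 1)              # step 4: random location
    return rest[:l] + cluster + rest[l:]          # step 5: move the cluster
```

With an all-ones `SC`, every project joins the cluster and the schedule is returned intact; with an all-zeros `SC`, only the seed project moves, which is exactly a minor random mutation.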

Mixed Mutations
The basic concept of the oriented mutations is that the advance toward the optimum is not limited to random mutations but also benefits from knowledge and common sense (as expressed in the similarity coefficient matrix). Figure 7a illustrates random mutations, where the new solutions are spread randomly, while Figure 7b illustrates oriented mutations. The main risk of the oriented mutation is that the "orientation" will lower the diversification, that is, the ability to visit many different regions of the solution space [66]. Lower diversification will interfere with the random search and disturb the metaheuristic process that enables escape from the local optima. Though the oriented approach increases the search intensification, it is essential to find an optimal balance between intensification and diversification [67].
To overcome this risk, other approaches were examined for the mutation process. These approaches basically tune the level of reliance on similarity; that is, in the mutation generation algorithm depicted above, Step 3 is replaced by the following:
3. For each project m = 1 . . . N where m ≠ k, do the following:
3.1. Generate a random number u ∼ U(0, 1);
3.2. If αSC_{k,m} > u, then add project m to Φ;
where α is the tuning parameter. A high value of α means that the mutation process relies more on oriented mutations, while a low value means it relies on them less. When α = 0, the set Φ always remains empty, and the mutation process reduces to the purely random one depicted in Section 6.2.
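The tuned Step 3 amounts to a one-line change to the cluster-building rule. The sketch below (with hypothetical names) shows how α scales the similarity coefficient before the comparison with u:

```python
import random

def mixed_cluster(schedule, SC, k, alpha, rng=random):
    """Tuned Step 3: project m joins the cluster Phi only if
    alpha * SC[k][m] exceeds a fresh U(0, 1) draw."""
    return {m for m in schedule if m != k and alpha * SC[k][m] > rng.random()}
```

With alpha = 0 the cluster is always empty (a purely random single-project move); larger alpha values make clustered moves increasingly likely.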

Database
To examine the performance of the different approaches, a random data set of portfolios was generated. To encompass a wide variety of portfolios, the database comprised randomly generated cases varying in size, connectivity, and resources:

• Size: The number of projects varied between 20 and 80 (20, 40, 60, and 80 projects);
• Connectivity: The number of precedence connections can vary between zero (i.e., no project depends on any other one) and full connectivity (i.e., all projects can be presented as Project 1 precedes Project 2, which precedes Project 3, and so on). The problems were divided into three categories: high, medium, and low connectivity;
• Resources: For each type of resource, its scarcity can be measured by the ratio between the total demand (of all projects) and its annual availability. The resource scarcity has a strong connection to the planning horizon (the scarcer the resource, the more years are needed to complete the entire set of projects). As the planning horizon was set to five years, the scarcest resource was set to require seven times the annual resource level. The total number of resources varied between 1 and 3.
For each combination of size, connectivity, and resources, a set of five random portfolios was generated.
The data set was planned for 5 years (a project scheduled for year 6 is considered to be undesired and not performed).
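An instance generator along the lines described above can be sketched as follows; the connectivity probabilities, the demand range, and the availability scaling rule are illustrative assumptions rather than the paper's exact settings:

```python
import random

def random_portfolio(n_projects, connectivity, n_resources,
                     scarcity=7.0, rng=random):
    """Generate one random instance: a precedence matrix, per-project resource
    demands, and annual availabilities. Probabilities and demand ranges are
    illustrative assumptions, not the paper's exact parameters."""
    p = {"low": 0.05, "medium": 0.15, "high": 0.3}[connectivity]
    # allow precedence only from lower- to higher-indexed projects -> acyclic
    prec = [[1 if j > i and rng.random() < p else 0
             for j in range(n_projects)] for i in range(n_projects)]
    demand = [[rng.randint(1, 3) for _ in range(n_resources)]
              for _ in range(n_projects)]
    # scale availability so the scarcest resource needs ~`scarcity` years
    avail = [sum(demand[i][r] for i in range(n_projects)) / scarcity
             for r in range(n_resources)]
    return prec, demand, avail
```

Restricting precedence to higher-indexed projects guarantees an acyclic dependency structure without an explicit cycle check.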

Experiment Design
The purpose of the experiment was to assess the performance of the four approaches to CLONALG mutations (i.e., minor, major, oriented, and mixed mutations). As there was no database of optimal (or even best-known) solutions, the comparison was based on the following measures:
• Ratio to optimal solution: For smaller problems (i.e., sets of 20 projects), an optimal solution was found using B&B;
• Ratio to best-known solution: For each instance, the best value found by any method was identified, and for each method, the ratio between its solution and this best solution was calculated.
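The ratio-to-best measure can be computed in a few lines of Python; the method names and value lists below are purely illustrative:

```python
def ratios_to_best(values_by_method):
    """values_by_method maps a method name to its objective value on each
    instance. Returns each method's average ratio to the best value found
    by any method on that instance (higher objective value = better)."""
    methods = list(values_by_method)
    n = len(next(iter(values_by_method.values())))
    best = [max(values_by_method[m][i] for m in methods) for i in range(n)]
    return {m: sum(values_by_method[m][i] / best[i] for i in range(n)) / n
            for m in methods}
```

A method that attains the best-known value on every instance scores exactly 1.0; weaker methods score below 1.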
To compare the various methods correctly, they were run for the exact same time. The runtime for each problem was set by first running the minor mutation method until no improvement was achieved for 20 generations. The time was measured, and then all other methods were run for the same length of time, thus providing a fair benchmark.
The mixed method used a value of α = 0.5.

Initial Results
The results of the experiment for small problems are depicted in Table 3. For larger problems, the results are depicted in Table 4. As can be seen from Tables 3 and 4, the mixed method outperformed all other methods (and relying only on minor mutations proved to be underperforming). A comparison between the methods is described in Figure 8. The graph of Figure 8 reveals that the "oriented" (i.e., Lamarckian) mutations indeed improved the performance of the metaheuristic search. However, mixing them with "classic" mutations provided better solutions. The question that arises is the composition of this mix. The experiment was based on the arbitrary setting of α = 0.5 (i.e., halfway between oriented mutations and totally non-oriented ones). As the advantage of oriented mutations was established, it is interesting to further explore and find the optimal ratio of this "mix".
To test this point, a second experiment was performed. The same sets of 80 problems were used again for different values of α. The results are depicted in Figure 9. From the results, it is evident that the value of α had almost no significance as long as oriented mutations were present but did not monopolize the process.
Figure 9. Effect of α.

Summary and Conclusions
This paper aimed to tackle the problem of project selection subject to resource constraints and technical precedence. To test this novel problem, the research developed a benchmark database of portfolios varying in size, precedence complexity, and resources. As far as we can ascertain, this is a one-of-a-kind database, and one of the outcomes of this research is to set benchmark results. This paper provides an exact formulation and an example for the new problem.
To solve the problem, a practical search approach for reaching a solution was developed. This enhancement is applicable to most metaheuristic search techniques, as it uses clustering methods that portray the attractive search zones and act as intensifiers. The proposed search was able to generate feasible, meaningful, and highly satisfactory solutions for the planning of long-horizon problems.
The proposed algorithm has both theoretical and practical implications. The practical one is its ability to upgrade decision making for the PPS problem and base it on solid, exact foundations. The decision-making process should be based less on "gut feelings" and more on exact and well-presented data. Furthermore, the process may make decision makers aware of the impact of various constraints and lead to improved decisions (e.g., the economic benefit of recruiting more engineers of a specific type). The theoretical implications, on the other hand, can be derived from the metaheuristic approach; the suggested oriented search need not be limited to PPS and can be implemented in various scheduling (and perhaps other) problems.
An obvious weakness of the article is its limitation to a specific problem, where the data are deterministic and the objective function is limited to a single objective (maximum gain value). In reality, the data are often fuzzy or stochastic, and the proposed model does not take this into account. It is worth mentioning that nothing fundamental prevents the proposed metaheuristic search techniques from tackling fuzzy objective functions, and this may be a suitable direction for further research.
Another direction for future research could use the presented insights to develop better algorithms that smartly manipulate the mutation type in the different phases of the search, as well as a method for optimizing the various parameters of the search to improve its performance.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Problem Example
To help visualize the problem formulation, a miniature PPS is portrayed. Whereas a typical PPS may include hundreds of projects, this one includes only 10.

Appendix A.1. Projects
The set includes 10 projects with the dependencies depicted in Figure 4. The dependency matrix is, therefore, the following:

Appendix A.2. Resources
The planning horizon is limited to 3 years. In this example, the number of resources is limited to one. Therefore, the matrix R is of dimensions 4 × 1 (the 3 years of the horizon plus a dummy year). Let us assume a constant level of 5 units (e.g., 5 worker-years for each year of the entire horizon): R^T = (5, 5, 5, ∞). As can be seen, the last year has an infinite level of resources, since scheduling a project to year 4 means not performing it at all.
The demand for resources is depicted in Table A1. From this table, the matrix Q can be derived: Q^T = (2, 3, 1, 3, 2, 3, 1, 2, 3, 1).
As mentioned, the planning horizon spreads over 3 years. The first year has a value (depreciation) factor of 1 (no depreciation), the second has a value of 0.8, the third has a value of 0.5, and everything that follows has a value of 0 (i.e., not planned to be developed at all). Therefore, we set H = 4 (i.e., the 3 years of the planning horizon plus 1 year for the projects that would not be realized). We also set the following: Y^T = (1, 0.8, 0.5, 0).
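Given the data above, the discounted value of a candidate schedule can be computed directly. The per-project values V are not listed in the appendix, so they are passed in as a hypothetical argument, and the precedence checks of Figure 4 are deliberately omitted from this sketch:

```python
# Discounted value of a schedule for the appendix instance. V (per-project
# values) is a hypothetical input; precedence constraints are NOT checked.
Y = [1.0, 0.8, 0.5, 0.0]             # value factor per year (year 4 = not done)
Q = [2, 3, 1, 3, 2, 3, 1, 2, 3, 1]   # resource demand per project (Table A1)
R = [5, 5, 5, float("inf")]          # annual availability, incl. dummy year

def portfolio_value(schedule, V):
    """schedule[p] is the (0-based) year assigned to project p; V[p] its value.
    Returns None if the yearly resource capacity is exceeded."""
    for y, cap in enumerate(R):
        used = sum(Q[p] for p, yr in enumerate(schedule) if yr == y)
        if used > cap:               # resource constraint violated
            return None
    return sum(V[p] * Y[schedule[p]] for p in range(len(schedule)))
```

Scheduling every project to the dummy fourth year is always feasible and yields a value of zero, which is the baseline any selected portfolio must beat.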