Uncertain Public R&D Project Portfolio Selection Considering Sectoral Balancing and Project Failure

Abstract: In order to promote scientific and technological innovation and sustainable development, public funding agencies select and fund a large number of R&D projects every year. To guarantee the performance of the resulting project portfolio and the government's investment benefits, the decision maker needs to select appropriate projects and determine a reasonable funding amount for each selected project. In the process of project selection, it is necessary to consider the balance of funding allocated to different scientific sectors as well as the failure probability of the projects in future execution, so that the expected performance of the project portfolio is maximized as much as possible. In view of this, we propose and study the uncertain public R&D project portfolio selection problem considering sectoral balancing and project failure. We formulate a stochastic programming model for the problem to support the portfolio decisions of the funding agencies. We also transform the model into an equivalent deterministic second-order cone programming model that can be directly solved by exact solvers. We generate datasets reflecting different scenarios through simulation and perform computational experiments to validate our model. The impacts of various factors (i


Introduction
Science and technology play a key role in socially sustainable development. Governments of various countries have invested heavily in public R&D projects to promote scientific and technological innovation [1]. For example, the financial investment of the National Natural Science Foundation of China (NSFC) increased from 80 million RMB in 1986 to 28 billion RMB in 2018 [2]; China's total R&D investment reached 2786.4 billion RMB in 2021 [3]. The amount of capital investment received by each approved project has an impact on its performance, which in turn affects the effectiveness of government capital investment [4]. Since the total amount of funds available to the funding agencies is limited, they need to select a portfolio of projects with high potential from a large number of project proposals and allocate a reasonable budget to each selected project. As a result, the sustainability and efficiency of public funds are improved, and the project portfolio's performance is maximized [5].
When government funding agencies select a portfolio of R&D projects, they need to balance the budget allocation among different sectors or scientific disciplines so that a certain degree of fairness is ensured. Typical ways of balancing budget allocation include limiting the proportion or amount of funds allocated to different sectors [6,7] and limiting the number of projects funded in different sectors [8]. Karsu and Morton propose an imbalance indicator that measures the difference of a distribution from an ideally balanced distribution and define a bi-criteria framework for trading balance off against efficiency in resource allocation problems [9]. However, these studies assume that the decision maker knows the optimal balanced allocation. In practice, it is difficult for the decision maker to determine the balanced allocation in advance. Çağlar and Gürel propose a two-stage model: in the first stage, they make sectoral budget allocation decisions to maximize the total impact of the budget while ensuring the relative balance among sectors; in the second stage, they maximize the total score of the supported projects under the allocated sectoral budgets [10]. Both Mavrotas and Özpeynirci et al. develop interactive approaches that require the decision maker to perform pairwise comparisons among alternative allocation schemes [11,12]. In addition, the budget allocation can also be improved from the perspective of project resource allocation by using fuzzy set theory [13-15].
The projects are selected before they are executed, and some of the selected projects may fail during the long period between fund allocation and completion. Therefore, when constructing the project portfolio, government funding agencies may benefit from considering the probability that different projects fail in the future. In the project environment, researchers have developed many methods to estimate different types of likelihood [16]. For example, Kahraman and İhsan employ the concept of the probability of a fuzzy event to perform investment analysis [17]. Fuzzy numbers are used to model uncertain activity times and their distributions [18,19]. Balali et al. use artificial neural networks to predict the cost performance indices of a project [20]. In the field of public R&D project portfolio selection, only a few studies have considered project failure. To solve a public R&D project portfolio selection problem with cancellations, Çağlar and Gürel develop a robust optimization model for the case in which the cancellation probabilities are unknown, and a stochastic optimization model for the case in which the cancellation probabilities can be assessed [21]. Mohagheghi et al. propose project resilience as a key criterion in the large-scale construction project selection problem [22]. Gouglas and Marsh study the optimal portfolio solutions of vaccine technologies by estimating their probability of success for rapid response to emerging infections [23]. Some studies also consider project failure in the private R&D project portfolio selection field. Zhao et al. build a mixed-integer nonlinear programming model to maximize the expected net present value for the selection of projects under variable task failures [24]. Gökalp and Branke consider a pharmaceutical R&D pipeline management problem under two significant uncertainties, i.e., the outcomes of clinical trials and their durations, and they present an approximate dynamic programming approach to solve the problem [25]. Tamošaitienė et al. find that project risks might cause firms to face big losses or even failure; hence, they use the risk and return trade-off in choosing their portfolios of projects [26].
In addition, in the field of project portfolio selection, many studies consider the interrelationships and interactions between projects, whether the research object is public or private R&D projects. There are often substitutional or complementary relationships between some projects, and the outcomes or technologies of the projects influence each other. The interrelationships make the performance of the project portfolio a nonlinear function of the selected projects and further increase the complexity of R&D portfolio selection. In the field of public project portfolio selection, Fernandez et al. consider the interactions and synergies between projects and the decision maker's preferences when solving the problem of allocating public funds to competing programs or projects [27]. Arratia-Martinez et al. take into account portfolio balancing rules and interdependence between tasks and/or projects and solve the R&D project portfolio selection problem under uncertainty [28]. Wei et al. construct a co-citation network to simulate project interactions and study the ways in which project interactions influence the final values of project portfolios [29]. Delouyi et al. use network mapping to visualize project interdependencies and improve the quality of the dynamic portfolio selection decision in cross-country gas transmission projects [30]. In the private project portfolio selection field, both Pérez et al. and Kumar et al. argue that uncertain synergies and incompatibilities exist between projects, and they handle these uncertainties using fuzzy parameters [31,32]. Ghasemi et al. argue that project interdependencies and cause-effect relationships between risks create complexity for portfolio risk analysis, so they present a model that uses a Bayesian network for modeling and analyzing portfolio risks [33].
Overall, relatively few studies in project portfolio selection address public R&D projects. Table 1 summarizes the representative studies mentioned above. It can be seen that, in both deterministic and uncertain environments, no research has yet jointly considered sectoral balancing and project failure. This paper aims to fill this gap. The main contributions of this paper are as follows: (1) We propose and study the uncertain public R&D project portfolio selection problem, considering both sectoral balancing and project failure (URDPS). (2) We formulate a stochastic programming model for the URDPS. We then transform the stochastic programming model into an equivalent deterministic second-order cone programming model such that it can be solved by commercial solvers (e.g., CPLEX). (3) We perform computational experiments to validate our model and analyze the impacts of the number of project proposals, project failure probability, the upper limit of the budget allocated to each project, and the decision maker's tolerance for project failure on the portfolio performance.
The remainder of this article is structured as follows. In Section 2, we describe the uncertain public R&D project portfolio selection problem considering both sectoral balancing and project failure. Section 3 presents the stochastic optimization model and its equivalent deterministic second-order cone programming model. In Section 4, we analyze our model based on computational experiments. Section 5 offers conclusions and suggestions for future work.

Problem Statement
The funding organization receives $|J|$ project proposals ($J = \{1, 2, \ldots, |J|\}$) and needs to select a subset of the projects to fund. The selected projects form a portfolio. There are $K$ sectors, and each project belongs to a single sector. For example, the NSFC's funding covers nine sectors. Let $S(k)$ denote the set of project proposals under sector $k$. The funding organization has a total budget $B$. The variable $B_k$ represents the budget for sector $k$, $k = 1, 2, \ldots, K$. $B_k$ is not greater than the sum of the required budgets of all project proposals in sector $k$, i.e., $B_k \le \sum_{j \in S(k)} r_j$. There are $K' \le K$ sectors whose numbers of funded projects are limited to the interval $[m, n]$. Let $J_k$ denote the set of project proposals in such a sector $k$ ($k = 1, 2, \ldots, K'$).
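The sector structure described above can be illustrated with a minimal Python sketch. It counts the funded projects per sector and checks whether each quantity-limited sector stays within $[m, n]$. The names `sector_of`, `limited_sectors`, and `funded_counts_ok` are hypothetical and chosen only for illustration; they are not taken from the authors' code.

```python
from collections import Counter

def funded_counts_ok(sector_of, f, limited_sectors, m, n):
    """Check m <= (number of funded projects) <= n in every limited sector.

    sector_of[j] is the sector of proposal j, and f[j] is 1 if proposal j
    is funded and 0 otherwise (hypothetical data layout).
    """
    counts = Counter(sector_of[j] for j, fj in enumerate(f) if fj == 1)
    # A Counter returns 0 for sectors that have no funded project.
    return all(m <= counts[k] <= n for k in limited_sectors)

# Five proposals spread over three sectors; the first three are funded.
sector_of = [0, 0, 1, 1, 2]
f = [1, 1, 1, 0, 0]

print(funded_counts_ok(sector_of, f, limited_sectors={0, 1}, m=1, n=2))     # True
print(funded_counts_ok(sector_of, f, limited_sectors={0, 1, 2}, m=1, n=2))  # False
```

In the second call, sector 2 has no funded project, so the lower limit $m = 1$ is violated.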
Peer reviewers assess each project proposal $j$ and assign it a score $w_j \in [w_{\min}, w_{\max}]$. If $w_j$ is lower than a pre-given threshold, project $j$ will not be funded. Otherwise, the decision maker of the funding organization needs to further determine the actual fund allocated to project $j$. We use $x_j \in (0, 1]$ to denote the proportion of the actually allocated fund to the fund $r_j$ claimed by the principal investigator; $x_j$ is also used as an estimate of the expected future performance after project $j$ is finished. Therefore, the fund actually obtained by project $j$ is $x_j r_j$, which has an upper limit of $R_j$. In addition, to ensure that project $j$ receives enough funding to be executed, we impose a lower limit $b_j$ on $x_j$. For the convenience of subsequent modeling, we also introduce an auxiliary variable $f_j$ to indicate whether project $j$ is funded: if $x_j > 0$, then $f_j = 1$; otherwise $f_j = 0$. The nonzero elements of $\{f_1, f_2, \ldots, f_{|J|}\}$ thus correspond to a project portfolio.
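The per-project funding rules above can be sketched as a small feasibility check. This is only an illustration under the stated definitions ($x_j$, $r_j$, $b_j$, $R_j$, $f_j$); the class `Proposal` and the function names are hypothetical, and treating an unfunded project as $x_j = 0$ is an assumption made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    r: float  # fund claimed by the principal investigator (r_j)
    b: float  # lower limit on the funding proportion if funded (b_j)
    R: float  # upper limit on the allocated amount (R_j)

def funded_indicator(x: float) -> int:
    """Auxiliary variable f_j: 1 if x_j > 0, otherwise 0."""
    return 1 if x > 0 else 0

def allocation_is_valid(p: Proposal, x: float) -> bool:
    """Check the funding-proportion and funding-amount limits for one project."""
    if not 0.0 <= x <= 1.0:
        return False
    if funded_indicator(x) == 0:
        return True  # assumption: an unfunded project (x_j = 0) is always admissible
    return x >= p.b and x * p.r <= p.R

p = Proposal(r=30.0, b=0.5, R=27.0)
print(allocation_is_valid(p, 0.0))   # True: not funded
print(allocation_is_valid(p, 0.4))   # False: below the lower limit b_j
print(allocation_is_valid(p, 0.8))   # True: 24 <= 27
print(allocation_is_valid(p, 0.95))  # False: 28.5 exceeds R_j
```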
To ensure fairness, the funding organization considers sectoral balancing when selecting projects. A proper capital allocation [14] is essential for balancing the budget allocation. Sectoral balancing avoids situations where some sectors receive most of the funds and others receive little. We quantify the balance of budget allocation based on each sector's historical contribution and impact ($e_k$) on society [10]. It is a trend in public R&D funding to assess the impact of each sector's outcomes and decide the budget allocation accordingly. For example, Graham and Mackie shift 3.4% of the agency budget from lower-impact areas to higher-impact areas, which improves the public health impact of the resources [34]. Following Çağlar and Gürel [10], the impact $I$ of a portfolio is calculated as , where $\alpha \in (0, 1)$ represents the funding agency's aversion to the inequities that result from disparate budgets across disciplines. For a given $\alpha$, the maximum value $I_{\max}$ of $I$ can be obtained by solving the nonlinear programming model (1)-(4) [10], where Constraints (2) are used to determine the total budget of each sector. Constraint (3) ensures that the total budget allocated to all sectors does not exceed $B$. Constraints (4) define the range of the decision variables.
It is not uncommon for a project to be faced with uncertainties [13,17]. In our problem, due to the uncertainty of scientific research, each project $j$ has a probability $\theta_j$ of failure when executed in the future. If project $j$ fails, the actual budget used is $a_j r_j$, where $a_j$ is the proportion of the funds used at the time of failure to its grant amount. Let the random variable $\tilde{r}_j$ denote the actual budget used by project $j$ in its future execution; then we obtain

$$\tilde{r}_j = \begin{cases} r_j & \text{w.p. } 1 - \theta_j \\ a_j r_j & \text{w.p. } \theta_j \end{cases} \qquad \forall j \in J. \tag{5}$$

The funding organization wants to limit the number of failed projects in the portfolio, so the expected number $\sum_{j \in J} \theta_j f_j$ of failed projects should not exceed $\zeta \sum_{j \in J} f_j$, where $\zeta \in [0, 1]$ is set by the funding agency according to its needs and bounds the expected share of failed projects among the funded ones.
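The failure model above can be made concrete with a short Python sketch. It checks the expected-failure limit $\sum_j \theta_j f_j \le \zeta \sum_j f_j$ for a given selection and simulates the two-point random budget to compare its sample mean with the closed-form expectation $r_j[1 - \theta_j(1 - a_j)]$. All function names and numerical values are hypothetical and serve only as an illustration.

```python
import random

def expected_failures_ok(theta, f, zeta):
    """Check sum_j theta_j * f_j <= zeta * sum_j f_j for a 0/1 selection f."""
    expected_failed = sum(t * fj for t, fj in zip(theta, f))
    return expected_failed <= zeta * sum(f)

def sample_used_budget(r, a, theta, rng):
    """Draw the budget actually used: a*r with probability theta, else r."""
    return a * r if rng.random() < theta else r

def expected_used_budget(r, a, theta):
    """Closed-form expectation r * (1 - theta * (1 - a))."""
    return r * (1.0 - theta * (1.0 - a))

theta = [0.05, 0.30, 0.10, 0.25]  # illustrative failure probabilities
f = [1, 1, 1, 0]                  # the first three projects are funded

print(expected_failures_ok(theta, f, zeta=0.2))  # True: about 0.45 <= 0.6
print(expected_failures_ok(theta, f, zeta=0.1))  # False: about 0.45 > 0.3

rng = random.Random(0)  # fixed seed for reproducibility
draws = [sample_used_budget(10.0, 0.4, 0.2, rng) for _ in range(100_000)]
print(sum(draws) / len(draws), expected_used_budget(10.0, 0.4, 0.2))
```

The last line prints the simulated mean next to the analytical value, which should be close for a large number of draws.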
There are different relationships between projects, and they have different impacts on the ultimate performance of the corresponding portfolio. For example, co-funding complementary projects helps to increase the expected performance of the portfolio, whereas co-funding competing projects tends to decrease it. We consider these two types of relationships when constructing the project portfolio. When the interrelated projects $u$ and $v$ are chosen simultaneously, the additional impact on the total score of the project portfolio is denoted by $o_{u,v}$. When $o_{u,v} > 0$, selecting both projects $u$ and $v$ has a positive impact on the portfolio performance; otherwise, the impact is negative. Due to the project failure probability, the expected additional impact of projects $u$ and $v$ on the total score of the project portfolio is (1
The public R&D project portfolio selection problem studied in this paper aims at constructing a portfolio with the maximum expected total score $z$ while considering sectoral balancing, project failure, and project interrelationships. We express $z$ as , where $\sum_{j \in J} (1 - \theta_j) w_j x_j$ is the expected total score under project failure without considering project interrelationships, is the added value of the expected score of the portfolio caused by the interrelationships between the projects, and $S_l \subseteq J \times J$ is the set of all pairs of interrelated projects.

Stochastic Programming Model
We formulate a stochastic programming model (SP) for the above problem. SP maximizes the objective function (6) subject to Constraints (2)-(4) and to the following groups of constraints: constraints on the number of funded projects, (7) and (8); constraints on the funding proportion, (9) and (10); funding amount constraints, (11); the sectoral balancing constraint, (12); chance constraints on budget, (13); the constraint on the number of failed projects, (14); and the range of decision variables, (15) and (16).
In model SP, the objective function (6) maximizes the expected total score of the portfolio. Constraints (7) and (8) restrict the lower and upper limits of the number of funded projects in different sectors, respectively. Constraints (9) express the logical relations between decision variables $f_j$ and $x_j$. Constraints (10) impose the lower limit of the funding proportion for each project. Constraints (11) impose the upper limit of the funded amount of each project. Constraint (12) ensures that the degree of balance in budget allocation between sectors is not less than a specified fraction of $I_{\max}$; this fraction, the balancing preference parameter in $[0, 1]$, reflects the degree to which the funding organization wants the balance of the budget allocation to reach its theoretical maximum value $I_{\max}$. Chance Constraints (13) ensure that, when the projects are executed, the probability that the budget used in each sector does not exceed the planned sectoral budget is greater than or equal to the confidence level $\beta$. Constraint (14) limits the number of failed projects. Finally, Constraints (15) and (16) define the range of the decision variables.
It should be noted that both Constraints (12) and (13) in SP are nonlinear. Therefore, SP is a nonlinear optimization model. To solve the model with exact solvers (e.g., CPLEX), we transform SP into its equivalent deterministic second-order cone programming model in the next subsection.

Second-Order Cone Programming Model
By introducing auxiliary variables and second-order cone inequalities, we transform SP into an equivalent deterministic second-order cone programming model. First, Proposition 1 transforms the nonlinear constraints in (13) into deterministic second-order cone constraints.
Proposition 1. The constraints in (13) are equivalent to Constraints (17)-(19).
Proof. The expectation of $\tilde{r}_j$ is $E(\tilde{r}_j) = r_j (1 - \theta_j) + a_j r_j \theta_j = r_j [1 - \theta_j (1 - a_j)]$. The variance of $\tilde{r}_j$ is $\mathrm{Var}(\tilde{r}_j) = \theta_j (1 - \theta_j)(1 - a_j)^2 r_j^2$, since $\tilde{r}_j$ takes only the two values $r_j$ and $a_j r_j$. Based on the properties of the normal distribution, the chance constraints in (13) can be transformed into Constraints (20), where $\Phi$ denotes the cumulative distribution function of the standard normal distribution. Let $\Phi^{-1}$ denote the inverse function of $\Phi$. Since $\Phi^{-1}$ is monotonically increasing, Constraints (20) can be further transformed into Constraints (21). We then obtain Constraints (22) by plugging the expectation $E(\tilde{r}_j)$ and variance $\mathrm{Var}(\tilde{r}_j)$ into Constraints (21). Since $\Phi^{-1}(\beta) > 0$ for $\beta > 0.5$, Constraints (22) are convex and equivalent to Constraints (17)-(19), where Constraints (18) are second-order cone constraints and $y_k$ is a continuous auxiliary variable.
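The idea behind this reformulation can be illustrated numerically with a minimal Python sketch of a deterministic equivalent of a budget chance constraint. It rests on several assumptions that are not confirmed by the text: the sector's used budget is $\sum_{j \in S(k)} x_j \tilde{r}_j$, projects fail independently, and the sum is treated as normally distributed, so that the chance constraint becomes "mean plus $\Phi^{-1}(\beta)$ times the standard deviation is at most $B_k$". The function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def used_budget_moments(r, a, theta):
    """Mean and variance of the two-point budget (r w.p. 1-theta, a*r w.p. theta)."""
    mean = r * (1.0 - theta * (1.0 - a))
    var = theta * (1.0 - theta) * ((1.0 - a) * r) ** 2
    return mean, var

def sector_chance_constraint_ok(projects, x, B_k, beta):
    """Check mean + Phi^{-1}(beta) * std <= B_k for one sector.

    projects is a list of (r_j, a_j, theta_j); x holds the funding proportions.
    Assumes independent failures and a normal approximation of the sum.
    """
    mean, var = 0.0, 0.0
    for (r, a, theta), xj in zip(projects, x):
        m, v = used_budget_moments(r, a, theta)
        mean += xj * m
        var += xj ** 2 * v  # variances add under independence
    z = NormalDist().inv_cdf(beta)  # standard normal quantile
    return mean + z * sqrt(var) <= B_k

projects = [(30.0, 0.5, 0.1), (20.0, 0.4, 0.2)]
x = [1.0, 0.8]
print(sector_chance_constraint_ok(projects, x, B_k=55.0, beta=0.95))  # True
print(sector_chance_constraint_ok(projects, x, B_k=50.0, beta=0.95))  # False
```

The term `sqrt(var)` is the Euclidean norm of the vector with entries $x_j \sqrt{\mathrm{Var}(\tilde{r}_j)}$, which is what makes the deterministic inequality a second-order cone constraint in $x$ when $\Phi^{-1}(\beta) \ge 0$.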
MISOCP is a mixed-integer second-order cone program that can be directly solved by CPLEX.

Computational Experiments
In this section, we investigate the following four research questions (RQs) based on computational experiments.
RQ1: How does the number of project proposals affect the total score of the portfolio?
RQ2: The failure probability of a project affects the chance of the project being selected. How, then, does the proportion of projects with high and low failure probability among all candidate projects affect the portfolio performance?
RQ3: How do different upper limits of the budget allocated to each project affect the performance of the project portfolio?
RQ4: The decision maker's tolerance for project failure affects the chance of a project being selected. How, then, do different failure tolerances affect the portfolio performance?
In our experiments, the above RQs are answered by solving the MISOCP model with CPLEX. Our code is written in MATLAB 2016a, which is also used to call CPLEX. Our experiments were conducted on a computer equipped with a Ryzen5 5500U 2.1 GHz CPU and Windows 11 64-bit.

Benchmark Dataset
There is no dataset designed specifically for our problem. Therefore, we generate the benchmark dataset SET using a full factorial design. We call the situation corresponding to the benchmark dataset the baseline scenario. The parameter values of the benchmark dataset are shown in Table 2. For each parameter combination in Table 2, 10 instances are generated. Since each parameter in Table 2 has only one value, this leads to 10 instances in the benchmark dataset. The dataset used in this paper has been uploaded to https://github.com/RuiChen-329/URDPS, accessed on 18 October 2022.
Table 2 covers the following parameters:
The range of the project budget in each sector: [21, 36], [9, 36], [7, 36], [23, 36], [7, 22], [11, 36], [14, 36].
The proportion of the available budget to the total required budget of all project proposals.
The proportion of projects in the sets that have quantity constraints compared to the total number of proposals.
The number of project sets that have quantity constraints.
The lower limit of the number of funded projects in each quantity-constrained set $J_k$.
The upper limit of the number of funded projects in each quantity-constrained set $J_k$.
The range of the score $w_j$ for each project: [2, 10].
The upper limit $\eta$ of the budget allocated to each project: 85%.
The range of the performance score fluctuations for the projects with interrelationships.

Performance Measure
In our experiments, we generate multiple variants of the baseline scenario by changing its parameter values. Each scenario variant contains the same number of instances as the baseline scenario. Based on the computational results obtained by comparing the scenario variants with the baseline scenario, we answer the four RQs. In the comparisons, we use the average relative deviation (ARD) of the total score as the performance measure. ARD measures the average gap in the optimal objective function value between a scenario variant and the baseline scenario. ARD is calculated as follows: where $z_i^{VS}$ ($z_i^{BS}$) is the optimal objective function value of instance $i$ in the scenario variant (baseline scenario). The value of the ARD reveals the impacts of the factors (i.e., the number of project proposals, the funding amount, and the decision maker's tolerance for project failure) on the performance of the project portfolio. In other words, when the decision maker chooses an alternative scenario instead of the baseline scenario, the value of the ARD indicates the gain or loss of the decision. The smaller the ARD value, the weaker the impact of the factors on the portfolio selection result.
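As a rough illustration of such a measure, the following Python sketch computes one plausible form of the ARD: the mean over instances of $(z_i^{VS} - z_i^{BS}) / z_i^{BS}$, expressed as a percentage. The signed, percentage-based form is an assumption consistent with the description above (a gain or loss relative to the baseline); the exact formula is not reproduced here, and the function name and data are hypothetical.

```python
def average_relative_deviation(z_vs, z_bs):
    """Mean of (z_vs[i] - z_bs[i]) / z_bs[i] over paired instances, in percent.

    Assumed form: signed deviation relative to the baseline objective value.
    """
    if len(z_vs) != len(z_bs) or not z_bs:
        raise ValueError("need two non-empty lists of equal length")
    total = sum((v - b) / b for v, b in zip(z_vs, z_bs))
    return 100.0 * total / len(z_bs)

# Illustrative objective values for two paired instances.
z_bs = [100.0, 200.0]  # baseline scenario
z_vs = [110.0, 190.0]  # scenario variant
print(average_relative_deviation(z_vs, z_bs))  # about 2.5 (percent)
```

A positive value indicates that the variant outperforms the baseline on average, and a negative value indicates a loss.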
When using CPLEX to solve each instance in different scenarios, we set the time limit to 30 s. The results show that 180 out of the 410 instances are solved to optimality. For the remaining 230 instances, only feasible solutions are obtained, and the estimated optimality gaps given by CPLEX are within 0.02%.

Experimental Results
We perform four experiments to answer the four RQs, respectively. As shown in Table 3, in each experiment, only one parameter's value is changed compared to the baseline scenario, and we generate 10 scenario variants (other parameters are kept unchanged). Note that in Experiment 2, instances of scenario variants 1-5 and 6-10 are obtained by sampling the failure probabilities of 10-50% of the baseline scenario instances from the intervals (0, 0.1) and (0.2, 0.3), respectively. The computational results are shown in Figure 1. We can see from Figure 1 that the number of project proposals, the upper limit of the budget allocated to each project, and the decision maker's tolerance for project failure have positive impacts on the portfolio performance. This means that the greater the values of these parameters, the higher the performance of the project portfolio. For the number of project proposals, this result is intuitive: a larger pool of candidates tends to contain more high-potential projects that can contribute more to the portfolio. In addition, the impact of the number of project proposals is linear.

The impacts of the upper limit of the budget allocated to each project and the decision maker's tolerance for project failure are non-linear and follow a similar pattern. Specifically, before the thresholds (88% for the former factor and 14% for the latter) are reached, the performance of the portfolio increases rapidly as the two factors increase. After the thresholds are reached, the impacts of both factors level off. For the upper limit of the budget allocated to each project, this means that once most of the selected projects have been allocated sufficiently large funds, raising the limit further yields little additional benefit. For the decision maker's tolerance for project failure, it indicates that higher portfolio performance can be achieved by keeping the number of failed projects within a certain range.
Compared with the baseline scenario, when the probability of project failure is small (large), the performance of the project portfolio is higher (lower). Increasing the number of projects with a low failure probability helps enhance the portfolio performance.

Conclusions and Future Research
In this paper, considering sectoral balancing, project failure probability, and interrelationships between projects, we have studied the uncertain public R&D project portfolio selection problem to maximize the expected performance of the project portfolio. We formulate a stochastic programming model for the problem. We transform the stochastic programming model into an equivalent deterministic second-order cone programming model that can be directly solved by exact solvers. Computational experiments are conducted to analyze the impacts of different factors on the project portfolio performance. The results show that the number of project proposals, the funding amount, and the decision maker's tolerance for project failure have positive impacts on the portfolio performance, whereas the probability of project failure has a negative impact.
Our results indicate that the proposed method provides an effective tool to support the decision maker's project portfolio selection decisions. By incorporating real-world characteristics that appear in public R&D project portfolio selection, such as sectoral balancing, project failure, and project interrelationships, our method is able to generate satisfactory project portfolios for different decision scenarios. Our method can be embedded into funding agency information systems, so that portfolio selection solutions are obtained automatically and serve as a reference for decision-making and the effective utilization of public investment.
Future work will consider more factors in public R&D portfolio selection, such as the robustness of the solution. Additionally, since project funding agencies make project selection decisions dynamically or at regular intervals, it is necessary to study the problem from the perspective of a multi-stage decision-making process.

Sustainability 2022, 13
Figure 1. The impacts of different parameters on the portfolio performance.


Table 1. Representative studies in project portfolio selection.

Table 2. Parameter values for the benchmark dataset SET.

Table 3. Parameter values for the scenario variants.