3. The Robust Portfolio Optimization Problem
Consider a market with a finite horizon and discrete time . The proposed framework describes a consecutive process of decision-making by an investor who manages a portfolio trying to attain maximum reward at the end of the strategy. Each decision is made at the beginning of the investment period, while the outcome is observed at the end of the period. Throughout the paper, t will denote the beginning of the corresponding period.
Assume that there are n risky assets and one risk-free asset on the market. Let  be the market price of the i-th risky asset at time t and let . Let  be the market price of the risk-free asset. In the rest of the paper, the term “price” means the market price. The price of any risky asset at time t might be undetermined at  while the exact value of  is assumed to be known at  (such property will be referred to as predictability later on). An example of a risk-free asset is the price of money in the portfolio’s base currency. Another example is a sovereign bond account if the bonds are assumed risk-free and liquid, and if they are issued and traded in the portfolio’s base currency. For the rest of the paper, we assume that all prices are positive. Note that, while this is a classic assumption that is reasonable in many practical cases, it does not fully account for the risk of bankruptcy of the issuer. We also do not account for the price discretization and assume that .
By portfolio at time t we mean a predictable vector of asset volumes . Predictability means that the portfolio structure for the next investment period is decided at the beginning of that period. We assume that all the assets are infinitely divisible:  . At the start of the strategy the investor is provided with the initial portfolio . Let  denote the volumes of the risky assets at t.
To account for the friction in the market, we assume that each trade by the investor incurs transaction costs. Costs can be categorized as explicit, such as broker commissions and fees, and implicit, which are due to the insufficient liquidity of the market: a trade moves the corresponding market price against itself, so the average price of the trade might be worse than the market price if the volume is sufficiently large. In a frictionless market, a portfolio can be rebalanced with no loss of market value. In a real market, however, any rebalancing might lead to a loss of portfolio market value, which can be thought of as an additional cost. We assume that trades in the risk-free asset incur no costs.
We assume that all information about the market is contained in the prices of the assets. The system state can therefore be represented as a combination of the price values and the portfolio. Let  be the market value of the risk-free position at time t. Since the risk-free asset is assumed to be absolutely liquid, it is easier to operate in terms of  than to work with  and  separately. Thus, by the system state  at time t, we mean a combination of ,  and . We will also refer to the system history  up to the moment t as . For ease of notation, the functional dependence on the “·” argument will mean the dependence on the system’s past history throughout the paper. The dependence will sometimes be explicitly written out to clarify the dependence on the components of .
Let 
 denote the costs carried when transitioning to the portfolio 
. The function 
 is assumed predictable. We do not introduce any additional inflows and outflows of assets, hence the following budget equation should hold for all 
t:
      where 
 is the backward difference operator, that is, 
. Both in Equation (
1) and in the rest of the paper, the product of vectors should be understood as the inner product. Equation (
1) allows us to express 
 as a function of 
 and the past history as
      
      thus the optimization problem will be stated in terms of 
 instead of 
 in the rest of the paper, and by the portfolio strategy we mean the sequence of 
.
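As a minimal sketch of the budget equation, assume it takes the usual self-financing form: the market value of the risky trades and the transaction costs are both paid out of the risk-free position. The function names, argument names and the proportional cost model below are illustrative assumptions and are not part of the paper's notation.

```python
import numpy as np

def rebalance(risky_volumes, risk_free_value, prices, target_volumes, cost_fn):
    """Move to `target_volumes` and return the resulting risk-free position value.

    Assumed self-financing rule: the traded value (inner product of the volume
    change and the prices) and the costs are both paid from the risk-free position.
    """
    trade = np.asarray(target_volumes, float) - np.asarray(risky_volumes, float)
    prices = np.asarray(prices, float)
    cost = cost_fn(trade, prices)
    return risk_free_value - trade @ prices - cost

def proportional_cost(trade, prices, rate=0.001):
    # Hypothetical cost model: a fixed fraction of the traded market value.
    return rate * np.abs(trade * prices).sum()

prices = np.array([10.0, 20.0])
old_volumes = np.array([1.0, 2.0])
new_volumes = np.array([3.0, 1.0])
new_cash = rebalance(old_volumes, 100.0, prices, new_volumes, proportional_cost)

# The market value of the portfolio drops exactly by the costs paid.
value_before = 100.0 + old_volumes @ prices
value_after = new_cash + new_volumes @ prices
```

Under this assumed rule, the risk-free position is fully determined by the chosen risky volumes and the past history, which is why the strategy can be stated in terms of the risky volumes alone.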
We assume that at each time 
t the portfolio structure might be subject to the predictable constraints:
The constraint sets 
 can be interpreted as the trading limits, which might serve the risk-management purposes or originate from the aims of the portfolio strategy. For example, consider a portfolio liquidation problem where an investor tries to close the risky positions over 
N consecutive periods. The problem can be formulated in several ways, one of which is gradually reducing the allowed volumes of risky assets via the constraints, that is,
      
An empty constraint set would mean the inability to continue the investment process and the early termination of the strategy. This case would complicate the presentation of the results with technical details, so we assume that the constraint sets are non-empty throughout the paper. As for other properties, from the risk-management point of view it is usually reasonable to assume that the constraint sets are bounded, since unbounded sets might encourage entering infinitely large risky positions that are subject to infinite risk.
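The liquidation example can be illustrated with a hypothetical schedule of shrinking trading limits. The linear schedule below is an assumption chosen for simplicity and is not necessarily the form used in the paper; the function names are illustrative.

```python
import numpy as np

def liquidation_bounds(initial_volumes, n_periods):
    """Per-period caps on absolute risky volumes that shrink linearly to zero."""
    h0 = np.abs(np.asarray(initial_volumes, float))
    # Row t holds the cap for period t = 0, ..., n_periods.
    fractions = 1.0 - np.arange(n_periods + 1) / n_periods
    return fractions[:, None] * h0[None, :]

def is_admissible(volumes, cap):
    # A portfolio is admissible if every absolute volume is within its cap.
    return bool(np.all(np.abs(np.asarray(volumes, float)) <= cap + 1e-12))

caps = liquidation_bounds([100.0, -40.0], n_periods=4)
```

Each row of `caps` is a bounded, non-empty constraint set (a box around zero), and the last row forces all risky positions to be closed.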
At 
, the risk-free price 
 is assumed to be known by the investor. Consider the risk-free price dynamics in a multiplicative form as
      
      where 
 is assumed predictable. 
 can be interpreted as the risk-free rate of return. By combining (
2) and (
3), we can represent 
 as a function of portfolio 
 and the past history as
      
As for the risky assets, the only assumption about the prices concerns the range of returns. Let the risky price dynamics be represented in a multiplicative form as
      
      where 
 is the return of the 
i-th asset over the investment period starting at 
. Let 
 and let 
 be a diagonal matrix with 
 on the main diagonal. Then we can write
      
The main assumption about market ambiguity is that  belongs to a predefined set  for each , where the sets  are assumed to be convex and compact.
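The multiplicative dynamics can be sketched in a few lines. The box-shaped return set is only one possible convex compact set and is an assumption of this example, as are the function names and the numbers.

```python
import numpy as np

def step_prices(risky_prices, risk_free_price, returns, risk_free_rate):
    """One-period multiplicative update of the risky and risk-free prices."""
    theta = np.diag(returns)          # diagonal matrix of risky returns
    return theta @ risky_prices, risk_free_rate * risk_free_price

def in_box(returns, lower, upper):
    # Hypothetical convex compact return set: a box [lower, upper] per asset.
    return bool(np.all(returns >= lower) and np.all(returns <= upper))

x0 = np.array([10.0, 20.0])
returns = np.array([1.1, 0.9])
x1, y1 = step_prices(x0, 1.0, returns, 1.01)
```

Positive returns keep the prices positive, in line with the positivity assumption on prices.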
Below, we present the notion of optimality and the Bellman-Isaacs equation. Classic portfolio selection frameworks usually assume that the investor maximizes the expected reward function of the portfolio at the end of the strategy. For example, in Reference [7], the authors consider the negated value of the execution costs as a reward, while in Reference [8], a mixture of the execution costs and the market risk is optimized. Many frameworks are based on expected utility maximization, where the form of the utility reflects the risk-aversion of the investor. Our approach accounts for both the uncertainty and the ambiguity of the market: the former refers to the stochastic nature of the market given a specific market model, while the latter refers to the uncertainty about the market model itself. In the presence of ambiguity, the standard expected von Neumann-Morgenstern utility cannot be used to define the optimal strategy, see the Ellsberg paradox [19]. Instead, we say that the portfolio strategy is optimal if it maximizes the robust Savage representation as defined in Reference [20] (see Theorem 2.80). For this, we assume the reasonable additional axioms of uncertainty aversion and certainty independence presented in Reference [20] when describing the behavior of the investor. The form of the robust Savage representation resembles the worst-case expected reward, which makes it a natural extension of the classic framework.
Remark 1. Optimization of the worst-case expected von Neumann-Morgenstern utility might still be appropriate when applied to a particular investment process. Assume, for example, that the utility function is defined by the risk-averse decision-maker based on the subjective estimates of the market. At the same time, the optimal strategy is obtained by another agent, the analyst, who tries to maximize the expected value of the provided utility in the worst case based on the analyst’s independent estimate. Thus, from the analyst’s point of view, the utility is an exogenous reward function which should be used in the framework.
 We represent the ambiguity of the market via a set of probability measures describing price dynamics. For each 
, consider a set of probability measures 
, where each measure describes the uncertainty of risky prices over the corresponding period (in terms of the distribution of 
, see (
5)). Let 
 be the set of probability measures describing the risky prices across the investment periods starting at 
 or later. Throughout the paper, we assume the following properties:
- for each  and every ,  and  has compact and convex values; 
- for each  and every , , where ; 
- (The Rectangularity assumption) for each admissible portfolio strategy,
           
Property 1 refers to the previously mentioned assumption about the price dynamics. The integral of the vector 
 in Property 2 should be interpreted element-wise as 
 for 
 and 
. Property 2 means that at the beginning of each period the investor has certain expectations about the future price values, whereas 
 is the expected value based on the investor’s estimate. The assumption is not restrictive since, in many practical cases, the investment process is based on the estimate of future price movements, which is provided by the analyst. Besides, dropping the assumption would lead to the degeneration of the strategy in some practical cases, which will be illustrated at the end of the section. Property 3 is a generalization of the Rectangularity assumption of References [21, 22], which can be interpreted as the independence of market ambiguity in the future from the present market model. As an example of ambiguity that does not comply with (
6), consider a single-asset market with the multiplicative dynamics (
5) and 
. For each 
t, let 
, 
, and let 
 be either uniformly distributed on 
 with probability 
p or uniformly distributed on 
 with probability 
, where 
 is unknown, which makes the market model ambiguous. Let 
 denote the corresponding probability measure at 
t. Then 
 for 
, and
      
On the other hand,
      
      hence 
. The Rectangularity assumption means that the investor assumes that the market is “replenished” at each period in terms of ambiguity, which is reasonable in the context of the worst-case framework. See Reference [22] for a similar example for the Ellsberg paradox.
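The effect of rectangularity on a worst-case value can be seen in a small two-period illustration. This is a hypothetical example in the spirit of the one above, not a reproduction of it: the means of the two return laws, the reference level `C` and the test payoff are all assumed. When the same unknown mixture weight governs both periods, the family of measures is not rectangular; letting the weight vary independently per period gives a rectangular family with a lower worst case.

```python
import numpy as np

MEAN_A, MEAN_B = 0.95, 1.15   # means of the two return laws (hypothetical values)
C = 1.05                      # reference level used in the test payoff

def mean_return(p):
    # Mean of the mixture: law A with probability p, law B with probability 1 - p.
    return p * MEAN_A + (1.0 - p) * MEAN_B

def expected_payoff(p1, p2):
    # E[(theta_1 - C) * (theta_2 - C)] for independent periods with weights p1, p2.
    return (mean_return(p1) - C) * (mean_return(p2) - C)

grid = np.linspace(0.0, 1.0, 101)
# Same unknown p in both periods: the family is not rectangular.
worst_linked = min(expected_payoff(p, p) for p in grid)
# p chosen separately in each period: the rectangular family.
worst_rectangular = min(expected_payoff(p1, p2) for p1 in grid for p2 in grid)
```

In the linked family the expected payoff is a square and cannot be negative, whereas the rectangular family allows the "market" to pick opposite weights in the two periods and reach a strictly negative value.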
In this paper, we do not focus on the formal definition of the admissible strategies of the investor and the market; for a formal definition see, for example, Reference [23]. Let 
 be the reward function which depends on the portfolio structure and the market state. We assume that 
J has values from 
, where the 
 value means the failure of the investment strategy. Let 
 be the set of the admissible portfolio strategies, and let 
 be the set of the admissible strategies over the periods starting at 
 or later. Let 
 be the set of probability measures describing the market dynamics across all periods. Then we can say that the portfolio strategy 
 is 
optimal if
      
      where 
 is the system history for the strategy 
. The supremum in (
7) might not be achieved in general. In practice, however, it is usually achieved if the problem is stated correctly, so throughout the paper we assume the existence of an optimal strategy. Note that 
 is the robust Savage representation we mentioned earlier.
Remark 2. Definition (7) requires integrability of the underlying reward function. In the presented practical framework, we do not focus on the question of measurability but assume the required form of measurability as needed. It will be shown in the next section that we are mainly interested in the infimum over the subset of atomic measures from  that are concentrated at  or fewer points of , , and every function is integrable with respect to any measure from that subset.
Let 
 be the supremum of the robust Savage representation as estimated at time 
, 
:
For , let . We refer to  as the “value function” in the rest of the paper. The proposed portfolio optimization problem can be solved via the dynamic programming principle which can be applied to the proposed framework:
Theorem 1.
1. The value function  satisfies the following Bellman-Isaacs equation:
2. If for each  the supremum in (9) is achieved at some , then  is an optimal strategy.
Proof.  The proof is technical and has been moved to Appendix A. □
 The dynamic programming principle makes the problem more feasible for the numerical solution by allowing us to reconstruct the value function for each investment period through an iterative process. By combining the formulae for the risk-free position value (4) and the price dynamics (5) with (9) and (10), we obtain the final form of the Bellman-Isaacs equation for the portfolio investment problem as
      
The numerical solution of this equation might prove difficult due to the additional minimization problem in the recursive formula, which is not present in the classic stochastic framework. The difficulty is aggravated by the high dimension of the state space: in practical cases, the number of risky assets covered by the investment strategy can be quite high, for example, when the portfolio replicates a market index. In the next section, we provide sufficient conditions which allow the class of measures in the infimum problem to be narrowed down to atomic measures, which should make the Bellman-Isaacs equation more feasible for practical use.
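To make the structure of the recursion concrete, the following toy sketch performs the backward max-min step on a deliberately simplified problem: one risky asset, no transaction costs, and a reward that depends only on terminal wealth, so that wealth alone can serve as the state. The position grid, the three candidate measures (all with the same mean return) and the risk-free return are assumptions made for illustration.

```python
import math

R = 1.0                       # gross risk-free return per period (assumed)
POSITIONS = [0.0, 0.5, 1.0]   # admissible risky position values (assumed grid)
# Each candidate measure is a list of (return, probability); all have mean 1.05.
MEASURES = [
    [(1.05, 1.0)],
    [(0.95, 0.5), (1.15, 0.5)],
    [(0.85, 0.5), (1.25, 0.5)],
]

def value(t, wealth, horizon, reward):
    """Max-min value: sup over positions of inf over measures of the expected value."""
    if t == horizon:
        return reward(wealth)
    best = -math.inf
    for h in POSITIONS:
        # Inner minimization: the "market" picks the least favorable measure.
        worst = min(
            sum(p * value(t + 1, wealth * R + h * (theta - R), horizon, reward)
                for theta, p in measure)
            for measure in MEASURES
        )
        best = max(best, worst)
    return best

linear_value = value(0, 10.0, 2, lambda w: w)
log_value = value(0, 10.0, 2, math.log)
```

With a linear reward only the mean return matters, so all candidate measures give the same value; with a concave reward the inner minimization penalizes the measures with the widest spread. The nested loop over measures is the extra cost compared with the usual Bellman recursion.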
Note that the value function might attain both −∞ and +∞. While the value −∞ arises naturally from the definition of the reward function, the value +∞ usually indicates an incorrect problem statement, since the possibility of infinite reward usually implies infinite risk of the optimal strategy. Therefore, in the rest of the paper we assume that the value function is less than +∞ for each t. The assumption is satisfied, for example, for a bounded reward function:
Proposition 1. If  for some , then , for each .
 Proof.  We conduct the proof by backward induction. For , the statement is true by definition of the value function. Assume that  for some . Then  and from the Bellman-Isaacs equation we get the statement for . □
 If the reward function is unbounded, then proving that  would require some additional continuity assumptions, which is out of the scope of the presented research.
At the end of the current Section, we return to the question of the robust strategy in the absence of any expectations about risky prices, that is when 
, as defined above, is also ambiguous. For example, consider a one period problem (
) for a single risky asset (
) with no costs (
) and phase constraints of the form 
, where 
 and 
, so that positive, negative and zero volumes are allowed. Let 
, which means that the risky price rate can either be higher or lower than the risk-free rate 
 (the assumption is natural, since otherwise there would be obvious arbitrage in the market). Assume that 
, where the distribution of 
 is ambiguous, and belongs to the family of distributions 
 with support 
. Note that for 
, the Rectangularity property is automatic. Assume that 
 contains all the Dirac measures, that is, the admissible strategies of the “market” include the deterministic strategies. Let the initial risky price be 
 and let the initial portfolio have 
 volume invested in the risky asset, while its risk-free position value is 
. Let the reward 
 be a strictly monotone function of the portfolio market value 
. For any admissible strategy 
H, the market value of the portfolio at 
 would be
      
      where 
. Note that if 
 and the “market” chooses a deterministic strategy 
, where 
, then 
; similarly, if 
, then we have 
 for the deterministic “market” strategy 
 with 
. Note that 
 does not depend on the risky price (there are no risky investments in the portfolio). Therefore, the value of the robust Savage representation for any strategy 
 is less than its value for 
, which makes 
 the optimal strategy. The optimality of 
 means that the investor should liquidate the risky positions completely. One can easily extrapolate the example to the multiperiod problem and see that 
 is optimal for each period, which undermines the whole idea of the investment process. This example shows that, to obtain a non-trivial strategy in this case, the investor needs to have some prior expectations about the future price values.
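The degenerate one-period example can be checked numerically. The sketch below assumes no costs, a reward that is increasing in the terminal market value, and a "market" that may choose any deterministic return between a lower and an upper bound lying on either side of the risk-free rate. All numerical values and names are illustrative.

```python
import numpy as np

def worst_case_value(h, h0, cash0, price0, r, theta_low, theta_high):
    """Worst terminal market value over deterministic returns in [theta_low, theta_high]."""
    total0 = cash0 + h0 * price0                # initial market value
    # The value is linear in theta, so the worst case sits at an endpoint.
    outcomes = [r * total0 + h * price0 * (theta - r)
                for theta in (theta_low, theta_high)]
    return min(outcomes)

candidates = np.linspace(-2.0, 2.0, 41)         # allowed risky volumes (assumed)
values = [worst_case_value(h, 1.0, 50.0, 10.0, 1.02, 0.9, 1.1) for h in candidates]
best_h = candidates[int(np.argmax(values))]
```

Any non-zero risky volume can be punished by a return on the wrong side of the risk-free rate, so the worst-case value is maximized by holding no risky assets.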
  4. Simplified Forms of the Bellman-Isaacs Equation
This research provides a practical framework for the portfolio optimization problem. Therefore, we attempt to make economically reasonable assumptions while providing an interpretable solution via the Bellman-Isaacs equation. Still, a practical approach should allow for either an analytic or numerical solution. Equations (11) and (12) do not allow for an analytic representation of the value function in most practical cases, while solving them numerically might be computationally difficult compared to the usual Bellman equation, since we need to solve an additional minimization problem at every step. Below, we consider particular cases of the framework that allow us to simplify the numerical solution of the internal minimization problem.
Let  be the set of extreme points of . Let  (respectively, ) be the set of probability measures from  concentrated at  or fewer points (respectively, extreme points) of . Below we show that in some cases, the infimum over  in the Bellman-Isaacs equation is equivalent to the infimum over  or even . For this, we use the following result:
Lemma 1. Let  be a real-valued Borel measurable function on convex and compact . Then (i) the infimum of  over the set of probability measures  equals the infimum over ; (ii) if  is concave, the infimum over  coincides with the infimum over .
 Proof.  The proof is technical and has been moved to Appendix A. A similar result for a continuous  can be found in Smirnov [24]; the proof of Lemma 1 is based on the same idea and is provided in this paper for consistency. □
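A one-dimensional numerical check illustrates the kind of reduction Lemma 1 provides. The sketch assumes that the measures in question live on an interval and share a prescribed mean, in which case measures with at most two atoms are enough; for a concave function the worst case should then sit at the endpoints of the interval. The interval, the mean and the test functions are assumptions of this example.

```python
import numpy as np

def worst_two_point(f, a, b, mean, n_grid=201):
    """Smallest expectation of f over measures on [a, b] with the given mean
    and at most two atoms (searched on a grid)."""
    points = np.union1d(np.linspace(a, b, n_grid), [mean])
    best = f(mean)                                  # Dirac measure at the mean
    for lo in points[points < mean]:
        for hi in points[points > mean]:
            w = (mean - lo) / (hi - lo)             # weight on hi that matches the mean
            best = min(best, (1 - w) * f(lo) + w * f(hi))
    return best

def extreme_point_value(f, a, b, mean):
    # Measure concentrated at the endpoints of [a, b] with the required mean.
    w = (mean - a) / (b - a)
    return (1 - w) * f(a) + w * f(b)

a, b, mean = 0.8, 1.3, 1.05
concave_gap = worst_two_point(np.log, a, b, mean) - extreme_point_value(np.log, a, b, mean)
convex = lambda x: (x - mean) ** 2
convex_worst = worst_two_point(convex, a, b, mean)
```

For the concave function the grid search and the endpoint measure agree, while for the convex test function the worst case is the Dirac measure at the mean, which shows why concavity is needed for the reduction to extreme points.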
 Lemma 1 immediately yields the following result, which allows us to narrow the set of measures in the Bellman-Isaacs equation:
Theorem 2. Assume that  and the sets  are compact and convex for each . Then
 Proof.  Assume that for some t,  and H, , the subfunction  attains  for some , . Then the infimum is attained at the atomic measure concentrated at the points , such that  belongs to the interval connecting  and . If  is attained exactly at , then the corresponding atomic measure is concentrated at . If  is not attained on , then we use Lemma 1 to complete the proof. □
 Lemma 1 allows us to further narrow the class of measures when finding the infimum in the Bellman-Isaacs equation. For this, the subfunctions 
 in (
11) should be concave. Below, we will prove the concavity via backward induction by imposing restrictions on the form of the cost function, the reward function, the constraint sets and the measure sets. Under the restrictions, the dependence on the risky position volumes 
H would only be allowed through the dependence on the market values 
, where “∘” means the element-wise (Hadamard) product.
For further convenience, we sometimes represent the dependence on the system history  as the dependence on the arguments , ,  and , thus decomposing the dependence on the latest system state and the history before that. We also uphold this notation for , in which case the arguments should be interpreted as ,  and , meaning that there is no history at the beginning of the strategy. We also sometimes represent the system history  as a combination of the history of the portfolio risky volumes , the history of the risk-free position value  and the price history  up to time t. For example,  might be equivalently written as  or , whenever the notation is clear from the context. In the following proofs, we also make some monotonicity assumptions, in which case the comparison of vectors and arrays of arguments should be understood element-wise.
Assume that the constraint sets 
 are monotonic in the following sense: let 
 be any of the subfunctions 
, then
      
The assumption means that the increase in the portfolio risk-free position should provide additional trading options while keeping the previous ones available. The assumption might be inappropriate in some cases, for example, when the increase in risk-free funds leads to additional constraints on the less risky positions in favor of the more risky ones to stimulate the more risky and profitable strategies. We do not consider such cases, believing that the investor is reasonable enough to account for the increased risk-free reserves when choosing the strategy, thus the additional trading limits are not required.
Assume that the measure sets 
 are monotonic in the following sense: let 
 be any of the subfunctions 
, then
      
The assumption means that the increase in the risk-free funds does not lead to more ambiguity in the market, according to the investor’s subjective estimate. Otherwise, we would have assumed that the investor becomes more uncertain when provided with the additional risk-free liquid assets, which is unreasonable.
Below, we provide sufficient conditions for the concavity and monotonicity of the value function.
Theorem 3. Let  be the value function in the Bellman-Isaacs Equations (11) and (12). Assume that for each 
- the subfunctions  are convex; 
- the subfunctions  are non-increasing; 
- the subfunctions  are concave; 
- the subfunctions  are non-decreasing; 
- the constraint sets  have convex values and are monotone in the sense of (15); 
- the measure sets  are monotone in the sense of (16); 
- . 
Then the subfunctions  are concave and the subfunctions  are non-decreasing for .
 Proof.  The proof is technical and has been moved to Appendix A. □
 The convexity of the cost function is a common requirement in the financial literature, justified in most practical cases by the well-known price impact effect. The assumption that the costs are non-increasing with respect to the risk-free position value is economically reasonable, since additional risk-free and liquid funds should create more favorable conditions for choosing the execution strategy while incurring no additional costs themselves. The concavity of the reward function is reasonable if the investor is risk-averse, while the monotonicity with respect to the risk-free position value is natural in many practical cases, where adding to the total portfolio value at no additional cost should increase the portfolio reward.
Let  be the vector of risky position values at time t, and let  be the corresponding history up to time t, . Let  be the state of the portfolio in terms of the position market values.  can be derived from  and combines the market and portfolio states. Let  be the corresponding state history up to time t. Next, we show how the class of measures in the Bellman-Isaacs equation can be narrowed, if the value function can be represented as a function of .
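The role of the position market values can be illustrated with a short numerical check. It uses the element-wise product of volumes and prices and the multiplicative price dynamics with a diagonal matrix of returns; the numbers and variable names are illustrative.

```python
import numpy as np

volumes = np.array([3.0, -1.0, 2.0])      # risky volumes (illustrative numbers)
prices = np.array([10.0, 20.0, 5.0])      # current risky prices
returns = np.array([1.1, 0.9, 1.0])       # next-period multiplicative returns

values_now = volumes * prices             # element-wise product: position market values
prices_next = np.diag(returns) @ prices   # multiplicative price dynamics
values_next = volumes * prices_next

# With unchanged volumes, the position values evolve through the returns alone.
same = np.allclose(values_next, returns * values_now)
```

Because the matrix of returns is diagonal, the next-period position values depend on volumes and prices only through their element-wise product, which is what allows the state to be expressed in terms of market values.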
Theorem 4. Assume that all the assumptions of Theorem 3 hold and, in addition,
- the cost function has the following form: 
- the reward function has the following form: 
- the constraint sets  are given in the form of constraints on the risky position market values: 
- the probability measure sets  have the following form: 
 Proof.  The statement can be proven by using Lemma 1, so we only need to prove the concavity of the subfunctions 
 for each 
. First, we will prove by backward induction that the value function 
 can be represented in the form
        
The statement holds for 
 due to (
18). Assume that it is true for some 
. From (
17), we get that 
 can be represented as a function of 
 and 
. Indeed, from (
4) we have
        
        where 
 means the vector 
. Since 
 is a diagonal matrix, (
23) implies that
        
Therefore, 
 is, in fact, a function of 
, 
 and 
, which we will denote as 
. Then, by substituting (
19) and (
20) into the Bellman-Isaacs equation, we get that
        
Therefore, 
 is a function of 
, which proves the induction step, thus proving that the value function admits the representation (
23) for 
. Then, from (
23) and the diagonality of 
, we get that, for any 
,
        
By Theorem 3, the subfunctions 
 are concave. Therefore, the subfunctions 
 are concave as compositions of the linear function 
 and the concave function. Then, (
25) implies that the subfunctions 
 are concave as well, which means the concavity of the subfunctions 
 and proves the statement. □
 The conditions of Theorem 4 might seem too restrictive. However, they are satisfied in a variety of practical cases. Below, we provide some examples covered by the Theorem.
The form of the cost function (17) includes the widely-used affine model of costs, see Reference [5]. At the same time, it allows us to introduce non-linear cost growth, which might be relevant for large portfolios.
As an example of a concave and monotonic reward function of the form (18), consider a utility function which depends on the portfolio value, for example,
      
      or
      
      where 
U is a concave non-decreasing utility function and 
 is the cost of liquidating the remaining risky assets in the portfolio after the end of the strategy (assume that  are convex and  are non-increasing). In the first case, the investor maximizes the market value of the portfolio; in the second case, the liquidation value of the portfolio is maximized.
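The two reward types can be sketched as follows, taking the risky position values and the risk-free position value as inputs. The logarithmic utility and the quadratic liquidation cost are hypothetical choices made only to obtain a concrete concave, non-decreasing utility and a convex cost.

```python
import numpy as np

def market_value_reward(utility, risky_values, risk_free_value):
    """Utility of the terminal market value of the portfolio."""
    return utility(np.sum(risky_values) + risk_free_value)

def liquidation_value_reward(utility, risky_values, risk_free_value, liquidation_cost):
    """Utility of the market value net of the cost of closing the risky positions."""
    total = np.sum(risky_values) + risk_free_value
    return utility(total - liquidation_cost(risky_values))

def quadratic_cost(risky_values, impact=0.001):
    # Hypothetical convex liquidation cost, growing with position size.
    return impact * float(np.sum(np.square(risky_values)))

w = np.array([30.0, -20.0, 10.0])
u_market = market_value_reward(np.log, w, 100.0)
u_liquid = liquidation_value_reward(np.log, w, 100.0, quadratic_cost)
```

With non-negative liquidation costs and a non-decreasing utility, the liquidation-value reward never exceeds the market-value reward for the same portfolio.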
As an example of the constraint sets covered by Theorem 4, consider a set of predictable functions 
 for 
 and 
, such that the corresponding subfunctions 
 are quasi-convex. Let 
 be predictable coefficients and let 
 be the market value of the portfolio at 
. Consider the constraint sets of the form:
Quasi-convexity of  implies the convexity of  values, while non-negativity of  implies the monotonicity as required by Theorem 4. The introduced form of constraints includes several important trading limit types, for example:
- , , means the constraints on the maximum value of each risky position in terms of the percentage of the portfolio market value; 
- , , means the constraints on the minimum value of each risky position, which limits short-selling; 
-  means the constraints on combinations of asset positions, which is useful for limiting investments in a group of products that, for example, represent a sector of the economy or constitute an index; a special case of this type of constraint is , which might be used to limit short-selling of the risk-free asset. 
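The listed limit types can be checked with a few lines of code. The sketch assumes that every limit is expressed as a fraction of the portfolio market value and that this value is positive; the function name, the parameter names and the threshold values are hypothetical.

```python
import numpy as np

def satisfies_limits(risky_values, risk_free_value, max_share, min_share,
                     group, group_share, min_cash_share=0.0):
    """Check position limits expressed as fractions of the portfolio market value."""
    w = np.asarray(risky_values, float)
    total = w.sum() + risk_free_value          # portfolio market value, assumed positive
    # Upper and lower limits on each risky position.
    per_asset = np.all(w <= max_share * total) and np.all(w >= min_share * total)
    # Limit on a group of assets, for example a sector or an index.
    group_ok = w[group].sum() <= group_share * total
    # Limit on short-selling of the risk-free asset.
    cash_ok = risk_free_value >= min_cash_share * total
    return bool(per_asset and group_ok and cash_ok)

ok = satisfies_limits([30.0, 20.0, 10.0], 40.0, 0.35, -0.10, [0, 1], 0.50)
too_large = satisfies_limits([40.0, 20.0, 10.0], 30.0, 0.35, -0.10, [0, 1], 0.50)
```

All limits here are linear in the position values, so the admissible set is convex, as Theorem 4 requires.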
Theorem 4 covers several price impact effects. For example, consider the measure sets 
 with the expected value 
 defined as
      
      where 
, 
, 
 and 
 are assumed to be known, 
. For the 
i-th risky asset and time 
k, 
 is the expected drift of the market price, 
 is the expected permanent price impact of the trade with market value 
w, 
 is the expected temporary price impact of the trade with market value 
w, and 
 characterizes the expected resilience rate of the market price. The logarithm in the formula guarantees that the expected value of the multiplicative coefficient of the price dynamics is positive. Let the support of the measure set be defined as
      
      where the “+” operator means the Minkowski sum, and 
 characterizes both the uncertainty and the ambiguity of the market prices. Assume that 
 to guarantee that 
, 
. The provided forms of 
 and 
 capture several aspects of the price impact which are relevant for portfolio optimization in a market with limited liquidity; see, for example, References [8, 11, 25, 26]. The structure of 
 characterizes the investor’s estimate of the market ambiguity. For example, the investor might assume the rectangular structure to disregard any prior knowledge about the dependencies between the asset prices. On the other hand, by considering the elliptical form of 
, the investor assumes that the prices will not attain the respective extreme values all at the same time, thus introducing some prior knowledge about the market into the framework.
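The difference between the rectangular and the elliptical structure can be illustrated by testing which deviations of the returns from their expected values each set admits. The axis-aligned ellipsoid with the same half-widths as the box is an assumed parametrization chosen for simplicity.

```python
import numpy as np

def in_box(deviation, half_widths):
    """Rectangular set: each coordinate may deviate independently."""
    return bool(np.all(np.abs(deviation) <= half_widths))

def in_ellipsoid(deviation, half_widths):
    """Axis-aligned ellipsoid with the same half-widths along each axis."""
    return bool(np.sum((deviation / half_widths) ** 2) <= 1.0)

half_widths = np.array([0.10, 0.05])
corner = half_widths.copy()                 # all coordinates at their extremes
single = np.array([0.10, 0.0])              # only one coordinate at its extreme
```

The corner point belongs to the box but not to the ellipsoid, while a deviation that is extreme in a single coordinate belongs to both, which matches the interpretation that the elliptical form rules out simultaneous extremes.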