Constrained Dynamic Mean-Variance Portfolio Selection in Continuous-Time

This paper revisits the dynamic mean-variance (MV) portfolio selection problem with cone constraints in continuous time. We first reformulate the constrained MV portfolio selection model as a special constrained linear-quadratic (LQ) optimal control model and derive its optimal portfolio policy. We then provide an alternative method for solving the same problem: instead of solving the corresponding Hamilton-Jacobi-Bellman (HJB) equation directly, we derive the optimal solution from properties of the value function induced by the model structure, namely its monotonicity and convexity. Finally, we provide an example to illustrate how the solution can be used in real applications. In this example, the dynamic MV portfolio policy dominates the static MV portfolio policy.


Introduction
The classical static mean-variance (MV) model, pioneered by Markowitz [1] more than sixty years ago, laid the foundation of modern financial theory. However, a static policy is often a poor choice, since the investor usually needs to adjust his/her portfolio according to updated information to achieve better performance. Extending the MV model to a dynamic setting is analytically nontrivial, because the variance term is not separable in the sense of dynamic programming. In 2000, the static MV model was extended to the multi-period setting by Li and Ng [2] and to the continuous-time setting by Zhou and Li [3], in both cases by using an embedding method. Since then, the analysis of dynamic MV portfolio selection has made significant progress; see, for example, Zhu et al. [4], Bielecki et al. [5], He et al. [6], Gao et al. [7], Gao et al. [8], Zhou et al. [9], Cui et al. [10] and Strub et al. [11].
One prominent attraction of dynamic MV portfolio selection models is their explicit portfolio policy, which can be derived by using the embedding method and the dynamic programming approach. However, in real portfolio management, some constraints on the portfolio strategy are inevitable; e.g., the investor is usually subject to limits arising from risk considerations or economic regulations. Generally speaking, strategy constraints destroy this advantage of the MV model: it becomes hard to obtain a closed-form portfolio policy except in a few special cases. To address this issue, Li et al. [12] adopted the viscosity solution of the partial differential equation and characterized the analytical solution of the continuous-time MV portfolio selection problem with no-shorting constraints. Cui et al. [13] considered the discrete-time version of this problem with no-shorting constraints and solved it by using dynamic programming. Gao et al. [14] developed the optimal portfolio policy for the dynamic MV portfolio selection model with a cardinality constraint on the active periods of time. Wu et al. [15] investigated the stochastic linear-quadratic (LQ) optimal control problem with linear control constraints, which can be regarded as a generalization of dynamic MV portfolio selection with no-shorting constraints. In addition, some promising results have emerged recently for dynamic MV portfolio selection with other constraints, such as no-bankruptcy constraints (see, e.g., Zhu et al. [4] and Bielecki et al. [5]).
This paper studies the dynamic MV portfolio selection problem with cone constraints in continuous time. The discrete-time version of this problem was investigated in detail by Cui et al. [16], who solved it by using dynamic programming. The resulting optimal portfolio policy is a piecewise affine function of the current wealth level and can be computed efficiently by solving two equations offline. Wu et al. [15] further verified this result by utilizing the state separation property induced by the problem structure. For the continuous-time version, Hu and Zhou [17] solved the problem by using the backward stochastic differential equation approach. More specifically, they introduced two extended stochastic Riccati equations (ESREs) and characterized the optimal portfolio policy through the solutions of these two ESREs. However, their work did not show how these two ESREs can be constructed from the special structure of the model rather than by guessing their form. Recently, Wu et al. [18] proposed a state separation theorem that answers this question within the framework of the stochastic LQ control problem with linear control constraints.
This paper revisits the dynamic MV portfolio selection problem with cone constraints in continuous time. Our contributions are threefold. Firstly, we reformulate the dynamic MV portfolio selection model as a special stochastic LQ optimal control model and use the results derived in [18] to develop its optimal portfolio policy. Secondly, we propose an alternative method for solving this problem. More specifically, instead of solving the corresponding HJB equation directly, we derive the optimal solution from properties of the value function induced by the model structure, namely its monotonicity and convexity. This alternative method offers a new way of thinking within the constrained mean-risk framework and can be extended to other portfolio selection models with cone constraints. Finally, we provide an example based on real market data to illustrate how our solution can be used in real applications. In this example, the dynamic MV portfolio policy dominates the static MV portfolio policy.
The remainder of this paper is organized as follows. Section 2 provides the basic formulation of the continuous-time MV portfolio selection model with cone constraints. Section 3 presents the analytical solution of our model. Section 4 offers an alternative approach to solving this problem. Section 5 provides an example to show how to use our solution scheme in real applications. Finally, Section 6 concludes the paper and discusses some further extensions. The symbol 1 denotes the indicator function and the symbol I denotes the identity matrix. All lemmas and theorems are proved in Appendix A.

Problem Formulation
In this work, we consider a financial market consisting of one risk-free asset and n risky assets, which can be traded continuously within the time horizon [0, T]. All randomness is modeled on a complete filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \ge 0}, P)$, on which an $\mathcal{F}_t$-adapted n-dimensional Brownian motion $W(t) := (W_1(t), \cdots, W_n(t))'$ is defined. Here $\mathcal{F}_t$ denotes the augmented σ-algebra generated by W up to time t, which represents the information available at time t ∈ [0, T]. $L^2_{\mathcal{F}}(0, T; \mathbb{R})$ denotes the set of $\mathbb{R}$-valued, $\mathcal{F}_t$-adapted and square-integrable stochastic processes.
The price of the risk-free asset, $S_0(t)$, is governed by the following equation:
$$dS_0(t) = r(t) S_0(t)\,dt,$$
where r(t) > 0 is the risk-free return rate. The prices of the n risky assets satisfy the SDEs
$$dS_i(t) = S_i(t)\Big[\mu_i(t)\,dt + \sum_{j=1}^{n} \sigma_{ij}(t)\,dW_j(t)\Big], \quad i = 1, \cdots, n,$$
where $\mu_i(t) > 0$ and $\sigma_{ij}(t)$ are the appreciation rate and volatility coefficient, respectively. Let $\mu(t) := (\mu_1(t), \cdots, \mu_n(t))'$ and $\sigma(t) := (\sigma_{ij}(t))_{n \times n}$ for all t ∈ [0, T], where the prime denotes the transpose. We assume that r(t), µ(t) and σ(t) are deterministic functions of time and are bounded for t ∈ [0, T]. We also assume that the following nondegeneracy condition holds for some constant δ > 0:
$$\sigma(t)\sigma(t)' \ge \delta I, \quad \forall\, t \in [0, T].$$
An investor enters the market with initial wealth $x_0$ and allocates this wealth continuously during the investment horizon [0, T]. The total wealth at time t is denoted by x(t), and the portfolio decision vector is denoted by $u(t) := (u_1(t), \cdots, u_n(t))'$, which represents the allocations of wealth to the n risky assets. The wealth level x(t) evolves according to the following SDE (see, e.g., Zhou and Li [3]):
$$dx(t) = \big[r(t)x(t) + b(t)'u(t)\big]dt + u(t)'\sigma(t)\,dW(t), \quad x(0) = x_0, \qquad (3)$$
where $b(t) := (\mu_1(t) - r(t), \cdots, \mu_n(t) - r(t))'$ is the excess return rate vector. We also assume that the following condition holds.
Assumption 1. There exists some $i^* \in \{1, \cdots, n\}$ such that $b_{i^*}(t) > 0$.
Assumption 1 says that the return rates of some risky assets should be greater than the return rate of the risk-free asset, which is reasonable in real markets.
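To make the wealth dynamics concrete, the following is a minimal Euler-Maruyama sketch of a controlled wealth process of the form dx = [r x + b'u] dt + u'σ dW. The function name `simulate_wealth`, the constant coefficients, the numerical values and the `policy` callable are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def simulate_wealth(x0, r, b, sigma, policy, T=1.0, steps=100, seed=0):
    """Euler-Maruyama path of dx = [r*x + b'u] dt + u' sigma dW.

    b is the excess-return vector (length n), sigma an n-by-n volatility
    matrix (list of rows), and policy(t, x) returns the amounts u held in
    the n risky assets. Coefficients are constant here, which is a
    simplifying assumption made only for this sketch.
    """
    rng = random.Random(seed)
    n = len(b)
    dt = T / steps
    x = x0
    path = [x]
    for k in range(steps):
        t = k * dt
        u = policy(t, x)
        # Independent Brownian increments dW_j ~ N(0, dt).
        dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
        drift = r * x + sum(b[i] * u[i] for i in range(n))
        # u' sigma dW = sum_i sum_j u_i * sigma_ij * dW_j
        diffusion = sum(u[i] * sigma[i][j] * dW[j]
                        for i in range(n) for j in range(n))
        x = x + drift * dt + diffusion
        path.append(x)
    return path


# With nothing invested in risky assets, wealth grows at the risk-free rate.
riskless = simulate_wealth(100.0, 0.03, [0.05, 0.02],
                           [[0.2, 0.0], [0.05, 0.3]],
                           policy=lambda t, x: [0.0, 0.0])
```

Passing a different `policy` (for example, one that respects a cone constraint on u) yields sample paths of the corresponding wealth process; the zero policy is used above only because its outcome is deterministic and easy to check.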
Motivated by the restrictions on real investments, we consider the following constraint on the portfolio decision vector:
$$H(t)u(t) \ge 0, \quad t \in [0, T], \qquad (4)$$
where $H(t) \in \mathbb{R}^{k \times n}$ is a deterministic matrix for t ∈ [0, T]. The above constraint is known as the convex cone constraint, which includes various portfolio constraints as special cases, e.g., the no-shorting constraint [12]. In the following, we use $\mathrm{Var}[x] := E[(x - E[x])^2]$ to denote the variance of a random variable x. The investor adopts the following dynamic MV portfolio decision model to guide his/her investment:
$$(P_{MV}): \quad \min_{u(\cdot)} \ \mathrm{Var}[x(T)] + E\Big[\int_0^T u(t)'R(t)u(t)\,dt\Big] \quad \text{s.t.} \quad E[x(T)] = d, \ (3) \text{ and } (4),$$
where d is the expected terminal wealth level. Usually, we set $d > x_0 e^{\int_0^T r(s)\,ds}$; i.e., the target expected terminal wealth should be greater than the terminal wealth obtained by investing everything in the risk-free account. In model $(P_{MV})$, the penalty term $u(t)'R(t)u(t)$ adjusts the amount invested in the risky assets and thereby helps control the exposure to them. It is worth mentioning that our model is more general than those studied in the current literature. For example, the work [12] only involved the no-shorting constraint and required additional assumptions, such as $\mu_i(t) > r(t)$ for all i = 1, · · · , n; this assumption is crucial to their results, whereas in our model it is relaxed to Assumption 1. Moreover, [17] did not consider the penalty term $u(t)'R(t)u(t)$.

Solution Scheme for Problem (P MV )
To solve problem $(P_{MV})$, we first reformulate it as a special case of the LQ control problem $(P^T_{LQ})$ investigated in [18] and then apply the solution provided in Section 3 of [18]. We use the embedding technique introduced by Li and Ng [2] to overcome the non-separability of the variance term in the sense of dynamic programming. Specifically, we consider the auxiliary problem $(P_{MV}(\lambda))$, obtained by introducing the Lagrange multiplier λ ∈ R for the constraint E[x(T)] = d and subject to (3) and (4).
We define the discount factor as $\rho(t) := e^{-\int_t^T r(s)\,ds}$ for t ∈ [0, T] and construct a new state variable $z(t) := x(t) - \lambda\rho(t)$. Replacing the state variable in $(P_{MV}(\lambda))$ yields an equivalent problem in terms of z(t). This problem is a special case of problem $(P^T_{LQ})$, obtained by setting A(t) = r(t) together with the corresponding choices of the remaining coefficients. Thus, similarly to Equations (6) and (7) in [18], we introduce two ODEs, (7) and (8), for the two unknown functions $\hat{G}_{mv}(\cdot)$ and $\bar{G}_{mv}(\cdot)$. Furthermore, the functions $\hat{G}_{mv}(\cdot)$ and $\bar{G}_{mv}(\cdot)$ have the following properties.
Note that, in Lemma 1, the strict inequality only holds for (8). Such a result plays an important role in the following.
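The change of variable z(t) = x(t) − λρ(t) can be checked directly. The following derivation is a sketch that assumes wealth dynamics of the form dx = [r x + b'u] dt + u'σ dW and the discount factor ρ(t) = exp(−∫_t^T r(s) ds); both are inferred from the surrounding text rather than quoted from it.

```latex
\begin{aligned}
\rho(t) &= e^{-\int_t^T r(s)\,ds}
  \quad\Longrightarrow\quad \dot\rho(t) = r(t)\rho(t), \qquad \rho(T) = 1,\\
dz(t) &= dx(t) - \lambda\,\dot\rho(t)\,dt\\
      &= \bigl[r(t)x(t) + b(t)^{\top}u(t)\bigr]dt + u(t)^{\top}\sigma(t)\,dW(t)
         - \lambda r(t)\rho(t)\,dt\\
      &= \bigl[r(t)z(t) + b(t)^{\top}u(t)\bigr]dt + u(t)^{\top}\sigma(t)\,dW(t),\\
z(0) &= x_0 - \lambda\rho(0), \qquad z(T) = x(T) - \lambda .
\end{aligned}
```

Under these assumptions, z(t) obeys the same linear dynamics as x(t), and a terminal cost of the form (x(T) − λ)² becomes z(T)², which is the homogeneous quadratic form that an LQ formulation works with.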
The following result characterizes the solution of problem (P MV ).

Theorem 1.
The associated optimal portfolio policy of problem $(P_{MV})$ is given by (9), where $\hat{K}_{mv}(t)$ and $\bar{K}_{mv}(t)$ are each defined as the arg min of an auxiliary optimization problem. Moreover, the optimal Lagrange multiplier $\lambda^*$ is given by (12).
Usually, the investor is interested in the MV efficient frontier, i.e., the Pareto optimal set of the expected terminal wealth d and the associated minimum variance Var[x(T)]. Substituting $\lambda^*$ back into $v(P_{MV}(\lambda))$ yields a semi-analytical expression of the MV efficient frontier, valid for $E[x^*(T)] \ge x_0 \rho(0)^{-1}$. Note that the second term of this expression can be evaluated by Monte Carlo simulation after computing $\hat{K}_{mv}(t)$ and $\bar{K}_{mv}(t)$ for all t.

An Alternative Method for Problem (P MV ) without the Penalty Term
In this section, we provide an alternative method for solving the dynamic MV portfolio selection problem with cone constraints. Consider the following portfolio selection problem $(A_{MV})$ without the penalty term $u(t)'R(t)u(t)$:
$$(A_{MV}): \quad \min_{u(\cdot)} \ \mathrm{Var}[x(T)] \quad \text{s.t.} \quad E[x(T)] = d, \ (3) \text{ and } (4).$$
This problem was investigated in detail in [17]. Instead of using the complicated backward stochastic differential equation (BSDE) method, we solve it by exploiting properties induced by the special structure of the model. Firstly, we consider the following auxiliary problem $(A_{MV}(\lambda))$, obtained by introducing the Lagrange multiplier λ ∈ R for the constraint E[x(T)] = d:
$$(A_{MV}(\lambda)): \quad \min_{u(\cdot)} \ E\big[(x(T) - \lambda)^2\big] \quad \text{s.t.} \quad (3) \text{ and } (4).$$
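Define the value function of problem $(A_{MV}(\lambda))$ at any time t ∈ [0, T] as follows:
$$V(t, x) := \inf_{u(\cdot)} E\big[(x(T) - \lambda)^2 \,\big|\, x(t) = x\big].$$
From classical optimal control theory (see, e.g., [19,20]), the value function V(t, x) satisfies the following Hamilton-Jacobi-Bellman (HJB) equation:
$$V_t(t,x) + \inf_{H(t)u \ge 0} \Big\{ \tfrac{1}{2} V_{xx}(t,x)\, u'\sigma(t)\sigma(t)'u + V_x(t,x)\big[r(t)x + b(t)'u\big] \Big\} = 0, \qquad (14)$$
with the boundary condition $V(T, x) = (x - \lambda)^2$. By Theorem 1 in [18], we have the following important result.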

Theorem 2.
When $V_{xx}(t, x) \neq 0$, the infimum in the HJB Equation (14) is attained at the policy given in (15), where $\hat{K}(t)$ and $\bar{K}(t)$ are the optimal solutions of two auxiliary optimization problems, respectively.
Proof. The result follows from Theorem 1 in [18]; the details are left to the interested reader.
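Before we go further, we need the following result.
Lemma 2. For each fixed t ∈ [0, T], the value function V(t, x) is convex in x; moreover, it is strictly decreasing on (−∞, λρ(t)] and strictly increasing on [λρ(t), +∞).
From Theorem 2 and Lemma 2, we can obtain the optimal portfolio policy of problem $(A_{MV})$. To this end, we introduce two ODEs for the two unknown functions $\hat{F}(\cdot)$ and $\bar{F}(\cdot)$, whose parameters $\hat{L}(t)$ and $\bar{L}(t)$ depend on $\hat{F}(t)$ and $\bar{F}(t)$, respectively.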

An Illustrative Example
Here we consider an example of the MV portfolio selection model. We used the financial indices of six industrial sectors in the U.S. stock market as the risky assets, namely, toys and recreation, communication, ship building, coal, gold, and industrial mining. Detailed descriptions of these industrial indices and the historical data can be found at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/det_48_ind_port.html (accessed on 18 August 2021). Based on the historical monthly returns from January 2008 to December 2013, we estimated the average return rates and volatilities of these assets. Unlike the MV portfolio model studied in [12], in which no-shorting constraints were imposed on all risky assets, our model allows no-shorting constraints to be imposed on only some of the assets; here we set $u_2(t) \ge 0$, $u_3(t) \ge 0$, $u_4(t) \ge 0$ and $u_6(t) \ge 0$ for t ∈ [0, T], while $u_1(t)$ and $u_5(t)$ were left unconstrained. These constraints can be written in matrix form as H(t)u(t) ≤ 0 (equivalently, −H(t)u(t) ≥ 0) by setting the diagonal elements of H(t) to −1, except for the first and fifth, and setting all other elements to 0. We also assumed that there is a risk-free asset with a monthly return rate of 0.25%. The investor entered the market with initial wealth $x_0 = 100$ and targeted an expected terminal wealth of d = 130 over an investment horizon of T = 12 months. We also included a penalty matrix for the portfolio: $R(t) = 10^{-2} I$ for all t ∈ [0, T].
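As a small illustration of the matrix encoding described in this example, the sketch below builds H for the six-asset case under the convention H u ≤ 0 and checks two candidate allocations. The helper names `build_H` and `is_feasible` and the sample allocation vectors are hypothetical. The last line also evaluates the risk-free terminal wealth x_0 e^{rT}, treating 0.25% as a continuously compounded monthly rate (an assumption), for comparison with the target d = 130.

```python
import math


def build_H(n, constrained):
    """Diagonal H with -1 on constrained assets (0-based indices), 0 elsewhere."""
    return [[-1.0 if (i == j and i in constrained) else 0.0 for j in range(n)]
            for i in range(n)]


def is_feasible(H, u, tol=1e-12):
    """Check the cone constraint H u <= 0 componentwise."""
    return all(sum(h_ij * u_j for h_ij, u_j in zip(row, u)) <= tol for row in H)


# Assets 2, 3, 4 and 6 (1-based) carry no-shorting constraints.
H = build_H(6, constrained={1, 2, 3, 5})

long_short = [-5.0, 10.0, 0.0, 3.0, -2.0, 1.0]   # shorts only assets 1 and 5
bad_short = [1.0, -1.0, 0.0, 0.0, 0.0, 0.0]      # shorts asset 2

# Risk-free benchmark: x0 = 100, r = 0.25% per month, T = 12 months.
riskfree_terminal = 100.0 * math.exp(0.0025 * 12)
```

Here `is_feasible(H, long_short)` holds while `is_feasible(H, bad_short)` does not, and the risk-free terminal wealth is roughly 103, well below the target of 130, so the target cannot be met without taking risk.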
Based on Theorem 1, we computed the optimal Lagrangian multiplier as $\lambda^* = 189.78$; the optimal portfolio policy $u^*(t)$ then follows from Theorem 1, with $\hat{K}_{mv}(t)$ and $\bar{K}_{mv}(t)$ plotted in Figure 1 for t ∈ [0, T]. We used the buy-and-hold portfolio policy as a benchmark; i.e., we treated the entire horizon of T = 12 months as a single period and solved the corresponding static MV portfolio selection model. All the constraints and parameters were set to be the same as in our dynamic MV portfolio selection model $(P_{MV})$. Solving this static problem gave the buy-and-hold portfolio policy. Figure 2 plots the realizations of the wealth processes generated by our model $(P_{MV})$ and by the static benchmark for the same sample path of the price process; on this path, our model performed better than the static benchmark.
To compare the performances of the two models more systematically, we plotted the mean-variance efficient frontiers of the terminal wealth x(T) for both models in Figure 3. The efficient frontier describes the Pareto optimal set of the mean and standard deviation of the terminal wealth. In this example, the efficient frontier was obtained as follows. We varied the expected terminal wealth level d and computed the corresponding optimal portfolio policy. For each policy, we simulated 100,000 sample paths of the returns of the risky assets, computed the realized portfolio policy and the realized terminal wealth along each path, and then computed the sample mean and standard deviation of the terminal wealth. Figure 3 shows that our dynamic MV portfolio policy achieved a lower level of risk than the static MV portfolio policy for the same level of expected terminal wealth.
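The Monte Carlo step used for the frontier can be sketched in a deliberately simplified setting. The code below uses a single risky asset with constant coefficients and the classical unconstrained continuous-time MV feedback rule u(t) = (b/σ²)(λρ(t) − x(t)), not the constrained policy of Theorem 1; the function name and all parameter values are illustrative assumptions. It simulates terminal wealth by Euler-Maruyama and computes the sample mean and standard deviation, i.e., one point of a simulated frontier.

```python
import math
import random
import statistics


def terminal_wealth_samples(x0, lam, r, b, sigma, T, steps, n_paths, seed=0):
    """Simulate x(T) under u(t) = (b / sigma^2) * (lam * rho(t) - x(t)).

    rho(t) = exp(-r * (T - t)) is the discount factor for a constant rate r.
    This is the unconstrained one-asset rule, used here only as a stand-in
    for the paper's constrained policy.
    """
    rng = random.Random(seed)
    dt = T / steps
    sqrt_dt = math.sqrt(dt)
    samples = []
    for _ in range(n_paths):
        x = x0
        for k in range(steps):
            rho = math.exp(-r * (T - k * dt))
            u = (b / sigma ** 2) * (lam * rho - x)
            # Euler-Maruyama step of dx = (r*x + b*u) dt + u*sigma dW.
            x += (r * x + b * u) * dt + u * sigma * rng.gauss(0.0, sqrt_dt)
        samples.append(x)
    return samples


# Hypothetical monthly parameters: r = 0.25%, excess return 1%, volatility 5%.
samples = terminal_wealth_samples(x0=100.0, lam=190.0, r=0.0025, b=0.01,
                                  sigma=0.05, T=12.0, steps=120,
                                  n_paths=2000, seed=1)
mean_xT = statistics.mean(samples)
std_xT = statistics.stdev(samples)
```

Repeating this computation over a range of λ (equivalently, of targets d) and plotting the resulting (standard deviation, mean) pairs traces out a simulated frontier. The sketch uses far fewer paths than the 100,000 used in the experiment, purely to keep the run time short.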

Conclusions
This paper revisited the dynamic MV portfolio selection problem with cone constraints in continuous time. By reformulating the dynamic MV portfolio selection model as a special stochastic LQ optimal control model, we developed the optimal portfolio policy for our model. Moreover, we provided an alternative method for solving this problem: instead of solving the corresponding HJB equation directly, we derived the optimal solution from properties of the value function induced by the model structure, namely its monotonicity and convexity. This alternative method offers a new way of thinking within the constrained mean-risk framework and can be extended to other portfolio selection models with cone constraints. Finally, our illustrative example demonstrated that the dynamic MV portfolio policy dominates the static MV portfolio policy: it achieved a lower level of risk for the same level of expected terminal wealth, and it also performed better than the static benchmark in terms of the realized wealth process.
One possible direction for future study is to extend our results to stochastic control problems with partial moments as the objective function, which have various applications in portfolio management, such as the mean-downside-risk portfolio optimization model. It would also be valuable to extend our results to the dynamic portfolio problem with a no-bankruptcy constraint and other trading constraints, so that the model becomes easier to apply in real portfolio management.
Appendix A
The remaining task is to identify the optimal Lagrangian multiplier $\lambda^*$. From Theorem 2 in [18], it is not hard to derive the optimal value $v(P_{MV}(\lambda))$ of problem $(P_{MV}(\lambda))$ for any fixed λ. Then, the optimal Lagrangian multiplier can be identified as $\lambda^* = \arg\max_{\lambda \in \mathbb{R}} v(P_{MV}(\lambda))$. According to (7) and (8), $v(P_{MV}(\lambda))$ is a piecewise concave function of λ. Therefore, the optimal Lagrangian multiplier $\lambda^*$ can be derived as (12). The optimal portfolio decision (9) is obtained by substituting $z(t) = x(t) - \lambda\rho(t)$ into (A1).
Proof (of Lemma 2). Firstly, we prove the convexity of the value function V(t, x). Let $(X_1(\cdot), u_1(\cdot))$ and $(X_2(\cdot), u_2(\cdot))$ be any two admissible pairs. For any 0 < a < 1, define $(\tilde{X}(t), \tilde{u}(t)) := (aX_1(t) + (1 - a)X_2(t), \ au_1(t) + (1 - a)u_2(t))$. It is easy to verify that $(\tilde{X}(t), \tilde{u}(t))$ satisfies $H(t)\tilde{u}(t) \ge 0$; that is, $(\tilde{X}(t), \tilde{u}(t))$ is an admissible pair. Therefore,
$$V(t, \tilde{X}(t)) \le E\big[(\tilde{X}(T) - \lambda)^2 \,\big|\, \mathcal{F}_t\big] \le a\,E\big[(X_1(T) - \lambda)^2 \,\big|\, \mathcal{F}_t\big] + (1 - a)\,E\big[(X_2(T) - \lambda)^2 \,\big|\, \mathcal{F}_t\big],$$
where the last inequality holds due to the convexity of the square function. Since $(X_1(\cdot), u_1(\cdot))$ and $(X_2(\cdot), u_2(\cdot))$ were chosen arbitrarily, we obtain
$$V(t, aX_1(t) + (1 - a)X_2(t)) \le a\,V(t, X_1(t)) + (1 - a)\,V(t, X_2(t)),$$
which establishes the convexity of V(t, x). Secondly, before showing that V(t, x) is strictly decreasing on (−∞, λρ(t)] and strictly increasing on [λρ(t), +∞), we show that V(t, x) = 0 when x = λρ(t). Indeed, suppose that X(t) = λρ(t). Taking u(s) = 0 for s ∈ [t, T] gives X(T) = λ. Therefore, we have
$$V(t, \lambda\rho(t)) \le E\big[(X(T) - \lambda)^2 \,\big|\, \mathcal{F}_t\big] = 0.$$
Since the square function is non-negative, we obtain V(t, λρ(t)) = 0.
Finally, we prove that V(t, x) is strictly increasing on [λρ(t), +∞); the other part is similar. For any $X_b(t) > X_a(t) \ge X_0(t) := \lambda\rho(t)$, there exists k ∈ [0, 1) such that
$$X_a(t) = kX_b(t) + (1 - k)X_0(t).$$
Thus, based on the convexity of V(t, x), we have
$$V(t, X_a(t)) \le kV(t, X_b(t)) + (1 - k)V(t, X_0(t)) = kV(t, X_b(t)) < V(t, X_b(t)),$$
where the equality follows from V(t, λρ(t)) = 0 and the last inequality holds because k ∈ [0, 1) and $V(t, X_b(t)) > 0$. This implies that V(t, x) is strictly increasing on [λρ(t), +∞), which completes the proof.
Appendix A.4. The Proof of Theorem 3
Proof. Combining Theorem 2 and Lemma 2, we can obtain the optimal solution of the HJB Equation (14) in closed form whenever $V_{xx}(t, x) \neq 0$. Indeed, if $V_{xx}(t, x) = 0$, then $V_x(t, x)$ is constant in x. However, this contradicts Lemma 2, since V(t, x) is strictly decreasing on (−∞, λρ(t)] and strictly increasing on (λρ(t), +∞) for each fixed t ∈ [0, T].
Recall that, if there are no strategy constraints, the minimizer in the HJB Equation (14) is $u(t) = -\frac{V_x(t,x)}{V_{xx}(t,x)}\big(\sigma(t)\sigma(t)'\big)^{-1} b(t)$. Thus, with a suitably defined vector B(t), it is not hard to see that solving the HJB Equation (14) is equivalent to solving the following auxiliary HJB equation without strategy constraints:
$$V_t + \inf_{u(t)} \Big\{ \tfrac{1}{2} V_{xx}\, u(t)'\sigma(t)\sigma(t)'u(t) + V_x\big[r(t)x + u(t)'B(t)\big] \Big\} = 0, \qquad V(T, x) = (x - \lambda)^2. \qquad (A3)$$
More specifically, the solution of (A3) coincides with that of (14). In fact, the solution to the HJB Equation (A3) is also the value function associated with the following problem:
$$(\bar{A}_{mv}(\lambda)): \quad \min_{u(\cdot)} \ E\big[(x(T) - \lambda)^2\big] \quad \text{s.t.} \quad dx(t) = \big(r(t)x(t) + B(t)'u(t)\big)dt + u(t)'\sigma(t)\,dW(t).$$
This problem has been widely studied in the literature and can be solved by the stochastic optimal control method or the martingale method (see [3,5]), which gives the optimal portfolio policy of problem $(\bar{A}_{mv}(\lambda))$ explicitly. Therefore, the optimal control of the mean-variance problem with cone constraints is (15).
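For completeness, the pointwise minimization inside an unconstrained HJB equation of the form (A3) can be carried out explicitly. The derivation below is a sketch under the assumptions that $V_{xx} > 0$, that $\Sigma(t) := \sigma(t)\sigma(t)'$ is invertible, and that the value function has the quadratic form $V = P(t)(x - \lambda\rho(t))^2$ with $\rho(t) = e^{-\int_t^T r(s)\,ds}$; the symbols Σ and P are introduced here for illustration only.

```latex
\begin{aligned}
&\text{First-order condition:}\quad
  V_{xx}\,\Sigma(t)\,u + V_x\,B(t) = 0
  \;\Longrightarrow\;
  u^{*} = -\frac{V_x}{V_{xx}}\,\Sigma(t)^{-1}B(t),\\
&\text{Reduced equation:}\quad
  V_t + r(t)\,x\,V_x
  - \frac{1}{2}\,\frac{V_x^{2}}{V_{xx}}\,B(t)^{\top}\Sigma(t)^{-1}B(t) = 0,
  \qquad V(T,x) = (x-\lambda)^{2},\\
&\text{Quadratic ansatz:}\quad
  V = P(t)\bigl(x-\lambda\rho(t)\bigr)^{2}
  \;\Longrightarrow\;
  \frac{V_x}{V_{xx}} = x - \lambda\rho(t),\\
&\phantom{\text{Quadratic ansatz:}}\quad
  \dot P(t) = \bigl[B(t)^{\top}\Sigma(t)^{-1}B(t) - 2r(t)\bigr]P(t),
  \qquad P(T) = 1,\\
&\text{Feedback form:}\quad
  u^{*}(t) = -\Sigma(t)^{-1}B(t)\,\bigl(x(t) - \lambda\rho(t)\bigr).
\end{aligned}
```

The feedback rule is linear in the gap between current wealth and the discounted target λρ(t), which matches the classical unconstrained continuous-time MV policy.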