Mean-variance portfolio selection with tracking error penalization

This paper studies a variation of the continuous-time mean-variance portfolio selection where a tracking-error penalization is added to the mean-variance criterion. The tracking error term penalizes the distance between the allocation controls and a reference portfolio with the same wealth and fixed weights. Such consideration is motivated as follows: (i) On the one hand, it is a way to robustify the mean-variance allocation in case of misspecified parameters, by "fitting" it to a reference portfolio that can be agnostic to market parameters; (ii) On the other hand, it is a procedure to track a benchmark and improve the Sharpe ratio of the resulting portfolio by considering a mean-variance criterion in the objective function. This problem is formulated as a McKean-Vlasov control problem. We provide explicit solutions for the optimal portfolio strategy and asymptotic expansions of the portfolio strategy and efficient frontier for small values of the tracking error parameter. Finally, we compare the Sharpe ratios obtained by the standard mean-variance allocation and the penalized one for four different reference portfolios: equal-weights, minimum-variance, equal risk contributions and shrinking portfolio. This comparison is done on a simulated misspecified model, and on a backtest performed with historical data. Our results show that in most cases, the penalized portfolio outperforms both the standard mean-variance and the reference portfolio in terms of Sharpe ratio.


Introduction
The Markowitz mean-variance portfolio selection problem was first considered in Markowitz (1952) in a single-period model. In this framework, investment decision rules are made according to the objective of maximizing the expected return of the portfolio for a given financial risk quantified by its variance. The Markowitz portfolio is widely used in the financial industry due to its intuitive formulation and the fact that it produces, by construction, portfolios with high Sharpe ratios (defined as the ratio of the average portfolio return to its volatility), which is a key metric used to compare investment strategies.
The mean-variance criterion involves the expected terminal wealth in a nonlinear way due to the presence of the variance term. In a continuous-time dynamic setting, this induces the so-called time inconsistency problem and prevents the direct use of the dynamic programming technique. A first approach, from Zhou and Li (2000), consists in embedding the mean-variance problem into an auxiliary standard control problem that can be solved by using stochastic linear-quadratic theory. Some more recent approaches rely on the development of stochastic control techniques for control problems of McKean-Vlasov (MKV) type. MKV control problems are problems in which the equation of the state process and the cost function involve the law of this process and/or the law of the control, possibly in a non-linear way. The mean-variance portfolio problem in continuous-time is a McKean-Vlasov control problem of the linear-quadratic type. The state diffusion, which represents the wealth of the portfolio, involves the state process and the control in a linear way while the cost involves the terminal value of the state and the square of its expectation due to the variance criterion. In Andersson and Djehiche (2011), the authors solved the mean-variance problem as a McKean-Vlasov control problem by deriving a version of the Pontryagin maximum principle. More recently, Pham and Wei (2017) have developed a general dynamic programming approach for the control of MKV dynamics and applied it for the resolution of the mean-variance portfolio selection problem. In Fischer and Livieri (2016), the mean-variance problem is viewed as the MKV limit of a family of controlled many-component weakly interacting systems. These prelimit problems are solved by standard dynamic programming, and the solution to the original problem is obtained by passage to the limit.
A frequent criticism addressed to the mean-variance allocation is its sensitivity to the estimation of expected returns and covariance of the stocks and the risk of a poor out-of-sample performance. Several solutions to these issues have been considered. One approach consists in using a more sophisticated model than the Black-Scholes model, in which the parameters are stochastic or ambiguous, and in taking decisions under the worst-case scenario over all conceivable models. Robust mean-variance problems have thus been considered in the economic and engineering literature, mostly on single-period or multi-period models; see, e.g., Fabozzi et al. (2010), Pinar (2016), and Liu and Zeng (2016). In a continuous-time setting, Ismail and Pham (2019) have developed a robust approach by studying the mean-variance allocation with a market model where the model uncertainty affects the covariance matrix of multiple risky assets. In Guo et al. (2020), the authors study the problem of utility maximization under uncertain parameters in a model where the parameters of the model do not evolve freely within a given range, but are constrained via a penalty function. Let us also mention uncertain volatility models in Matoussi et al. (2012) and Lin and Riedel (2014) for robust portfolio optimization with expected utility criterion. Another approach is to rely on the shrinking of the portfolio weights or of the wealth invested in each risky asset in order to obtain a sparser or more stable portfolio. In DeMiguel et al. (2009), the authors find single-period portfolios that perform well out-of-sample in the presence of estimation error. Their framework deals with the resolution of the traditional minimum-variance problem with the additional constraint that the norm of the portfolio-weight vector must be smaller than a given threshold. In Ho et al. (2015), the authors study a one-period mean-variance problem in which the mean-variance objective function is regularized with a weighted elastic net penalty. They show that the use of this penalty can be justified by a robust reformulation of the mean-variance criterion that directly accounts for parameter uncertainty. In the same spirit, in Chen et al. (2013), $\ell_p$-norm regularized models are used to seek near-optimal sparse portfolios.
In this paper, we investigate the mean-variance portfolio selection in continuous time with a tracking error penalization. This penalization represents the distance between the optimized portfolio composition and the composition of a reference portfolio with the same wealth but fixed weights that have been chosen in advance. Typical reference portfolios widely used in the financial industry are the equal weights, the minimum variance and the equal risk contribution (ERC) portfolios. The equal weights portfolio studied, e.g., in Duchin and Levy (2009), is a portfolio where all the wealth of the investor is invested in risky assets and divided equally between the different assets. The minimum variance portfolio is a portfolio where all the wealth is invested in risky assets and portfolio weights are optimized in order to attain the minimal portfolio volatility. The ERC portfolio, presented in Maillard et al. (2010) and in the monograph Roncalli (2013), is totally invested in risky assets and optimized such that the contributions of each asset to the total volatility of the portfolio are equal. The mix of the mean-variance and of this tracking error criterion can be interpreted in two different ways: (i) From a first viewpoint, it is a procedure to regularize and robustify the mean-variance allocation. By choosing reference portfolio weights which are not based on the estimation of market parameters, or which are less sensitive to estimation error, the allocation obtained is more robust to parameter estimation error than the standard mean-variance one. (ii) From a second viewpoint, this optimization makes it possible to mimic an allocation corresponding to the reference portfolio weights while improving its Sharpe ratio via the consideration of the mean-variance criterion.
We tackle this problem as a McKean-Vlasov linear-quadratic control problem and adopt the approach developed in Basei and Pham (2019), where the authors give a general method to solve this type of problem by means of a weak martingale optimality principle. We obtain explicit solutions for the optimal portfolio strategy and value function, and provide asymptotic expansions of the portfolio strategy and efficient frontier for small values of the portfolio tracking error penalization parameter. We then compare the Sharpe ratios obtained by the standard mean-variance portfolio, the penalized one and the reference portfolio in two different ways. First, we compare these performances on simulated market data with misspecified market parameters. Different magnitudes of parameter misspecification are used to illustrate the impact of the parameter estimation error on the performance of the different portfolios. Second, we compare the performances of these portfolios on a backtest based on historical market data. In these tests, we consider the three reference portfolios cited above: the equal weights, the minimum variance and the equal risk contribution (ERC) portfolios. Finally, we also consider the case where the reference portfolio weights are all equal to zero. This case corresponds to a shrinking of the wealth invested in the different risky assets along the investment horizon.
The rest of the paper is organized as follows. Section 2 formulates the mean-variance problem with tracking error. In Section 3 we derive explicit solutions for this control problem and provide expansion of this solution for small values of the tracking error penalization parameter. Section 4 is devoted to the applications of those results and to the comparison of the mean-variance, penalized and reference portfolio for the different reference portfolios presented above. We show the benefit of the penalized portfolio compared to the standard mean-variance portfolio and the different reference portfolios on simulated and historical data in terms of Sharpe ratio and the lower sensitivity of the penalized portfolio to parameter estimation error.

Formulation of the problem
Throughout this paper, we fix a finite horizon $T \in (0, \infty)$ and a complete filtered probability space $(\Omega, \mathcal{F}, \mathbb{P}, \mathbb{F} = \{\mathcal{F}_t\}_{0 \le t \le T})$ on which a standard $\mathbb{F}$-adapted $d$-dimensional Brownian motion $W = (W^1, \ldots, W^d)$ is defined. We denote by $L^2_{\mathbb{F}}(0, T; \mathbb{R}^d)$ the set of all $\mathbb{R}^d$-valued, measurable stochastic processes $(f_t)_{t \in [0,T]}$ adapted to $\mathbb{F}$ such that $\mathbb{E}\big[\int_0^T |f_t|^2\,dt\big] < \infty$. We consider a financial market with price process $P := (P_t)_{t \in [0,T]}$, composed of one risk-free asset, assumed to be constant equal to one, i.e., $P^0 \equiv 1$, and $d$ risky assets on a finite investment horizon $[0, T]$. The price processes $P^i_t$, $i = 1, \ldots, d$, of these assets satisfy the following stochastic differential equation:
$$dP^i_t = P^i_t \Big( b_i\,dt + \sum_{j=1}^d \sigma_{ij}\,dW^j_t \Big), \quad t \in [0, T], \qquad P^i_0 > 0,$$
where $b := (b_1, \ldots, b_d)^\top \in \mathbb{R}^d$ is the appreciation rate, and $\sigma := (\sigma_{ij})_{i,j=1,\ldots,d} \in \mathbb{R}^{d \times d}$ is the volatility matrix of the $d$ stocks. We denote by $\Sigma := \sigma\sigma^\top$ the covariance matrix. Throughout this paper, we will assume that the following nondegeneracy condition holds:
$$\Sigma \ge \delta I_d \quad \text{for some } \delta > 0.$$
Let us consider an investor with total wealth at time $t \ge 0$ denoted by $X_t$, starting from some initial capital $x_0 > 0$. It is assumed that the trading of shares takes place continuously and that transaction costs and consumption are not considered. We define the set of admissible portfolio strategies $\alpha = (\alpha^1, \ldots, \alpha^d)$ as
$$\mathcal{A} := \Big\{ \alpha : \Omega \times [0, T] \to \mathbb{R}^d \ \text{s.t. } \alpha \text{ is } \mathbb{F}\text{-adapted and } \int_0^T \mathbb{E}\big[|\alpha_t|^2\big]\,dt < \infty \Big\},$$
where $\alpha^i_t$, $i = 1, \ldots, d$, represents the total market value of the investor's wealth invested in the $i$th asset at time $t$. The dynamics of the self-financed wealth process $X = X^\alpha$ associated to a portfolio strategy $\alpha \in \mathcal{A}$ is then driven by
$$dX_t = \alpha_t^\top b\,dt + \alpha_t^\top \sigma\,dW_t, \quad 0 \le t \le T, \qquad X_0 = x_0. \qquad (2.1)$$
Given a risk aversion parameter $\mu > 0$, and a reference weight $w_r \in \mathbb{R}^d$, the objective of the investor is to minimize over admissible portfolio strategies a mean-variance functional to which is added a running cost:
$$J(\alpha) := \mu \mathrm{Var}(X_T) - \mathbb{E}[X_T] + \mathbb{E}\Big[ \int_0^T (\alpha_t - w_r X_t)^\top \Gamma (\alpha_t - w_r X_t)\,dt \Big]. \qquad (2.2)$$
This running cost represents a running tracking error between the portfolio composition $\alpha_t$ of the investor and the reference composition $w_r X_t$ of a portfolio of same wealth $X_t$ and constant weights $w_r$. The matrix $\Gamma \in \mathbb{R}^{d \times d}$ is symmetric positive definite and is used to introduce an anisotropy in the portfolio composition penalization. The penalization $(\alpha_t - w_r X_t)^\top \Gamma (\alpha_t - w_r X_t)$, which we will call "tracking error penalization", is introduced in order to ensure that the portfolio of the investor does not move away too much from this reference portfolio with respect to the distance $|M| := M^\top \Gamma M$, $M \in \mathbb{R}^d$. The mean-variance portfolio selection with tracking error is then formulated as
$$V_0 := \inf_{\alpha \in \mathcal{A}} J(\alpha), \qquad (2.3)$$
and an optimal allocation given the cost $J(\alpha)$ will be given by
$$\alpha^* \in \arg\min_{\alpha \in \mathcal{A}} J(\alpha).$$
We complete this section by recalling the solution to the mean-variance problem when there is no tracking error running cost, which will serve later as a benchmark for comparison when studying the effect of the tracking error with several reference portfolios.
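As a rough illustration of the running cost, the sketch below discretizes the tracking error along a single sampled path. It assumes the penalty integrand is the quadratic form (α_t − w_r X_t)ᵀ Γ (α_t − w_r X_t) described above; the function name `tracking_error_cost` and the left-point Riemann sum are illustrative choices, not taken from the paper, and the expectation in the cost functional would still have to be estimated by averaging over many paths.

```python
import numpy as np

def tracking_error_cost(alphas, wealths, w_r, Gamma, dt):
    """Pathwise Riemann sum of (alpha_t - w_r X_t)^T Gamma (alpha_t - w_r X_t) dt.

    alphas  : (n_steps, d) amounts invested in each risky asset
    wealths : (n_steps,)   total wealth X_t at the same dates
    w_r     : (d,)         fixed reference weights (assumed given)
    Gamma   : (d, d)       symmetric positive definite penalization matrix
    """
    alphas = np.asarray(alphas, dtype=float)
    wealths = np.asarray(wealths, dtype=float)
    w_r = np.asarray(w_r, dtype=float)
    Gamma = np.asarray(Gamma, dtype=float)
    deviation = alphas - wealths[:, None] * w_r      # alpha_t - w_r X_t
    return float(np.einsum("ti,ij,tj->", deviation, Gamma, deviation) * dt)
```

A strategy that holds exactly the reference composition at every date incurs zero cost under this discretization, which is the sense in which the penalty measures a distance to the reference portfolio.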
Remark 2.1 (Case of no tracking error). When $\Gamma = 0$, it is known, see e.g. Zhou and Li (2000), that the optimal mean-variance strategy is given by
$$\alpha^*_t = \Big[ x_0 + \frac{e^{b^\top \Sigma^{-1} b\,T}}{2\mu} - X^*_t \Big] \Sigma^{-1} b, \quad 0 \le t \le T, \qquad (2.4)$$
where $X^*_t$ is the wealth process associated to $\alpha^*$. The vector $\Sigma^{-1} b$, which depends only on the model parameters of the risky assets, determines the allocation in the risky assets.
In the sequel, we study the quantitative impact of the tracking error running cost on the optimal mean-variance strategy.
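To make the unpenalized benchmark concrete, here is a minimal Euler-scheme sketch of the wealth under the classical mean-variance feedback rule. It rests on two assumptions that should be checked against the paper: the zero-interest-rate wealth dynamics dX = αᵀb dt + αᵀσ dW, and the feedback form α_t = (x₀ + e^{ρT}/(2μ) − X_t) Σ⁻¹b with ρ = bᵀΣ⁻¹b, which is consistent with the target wealth quoted later in the backtest section. The function name and all numerical inputs are hypothetical.

```python
import numpy as np

def simulate_mean_variance(b, sigma, mu, x0=1.0, T=1.0,
                           n_steps=252, n_paths=20000, seed=0):
    """Euler scheme for the wealth under the classical mean-variance control.

    Assumed dynamics: dX = alpha.b dt + alpha.sigma dW (zero interest rate).
    Assumed control:  alpha_t = (target - X_t) * Sigma^{-1} b.
    """
    rng = np.random.default_rng(seed)
    b = np.asarray(b, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    Sigma = sigma @ sigma.T
    direction = np.linalg.solve(Sigma, b)           # Sigma^{-1} b
    rho = float(b @ direction)                      # b^T Sigma^{-1} b
    target = x0 + np.exp(rho * T) / (2.0 * mu)      # wealth level the control aims at
    dt = T / n_steps
    X = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, b.size))
        alpha = (target - X)[:, None] * direction   # amounts held in each asset
        X = X + (alpha @ b) * dt + np.einsum("pi,ij,pj->p", alpha, sigma, dW)
    return X, target, rho
```

Under these assumptions the gap between the target and the wealth decays on average at rate ρ, so the sample mean of the terminal wealth should be close to x₀ + (e^{ρT} − 1)/(2μ); the leverage shrinks as the wealth approaches the target.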

Solution allocation with tracking error
Our main theoretical result provides an analytic characterization of the optimal control to the mean-variance problem with tracking error.
where $S_t := K_t \Sigma + \Gamma$. The optimal control for problem (2.3) is then given by

and $X = X^{\alpha^\Gamma}$ is the wealth process associated to $\alpha^\Gamma$. Moreover, we have

Proof. Given the existence of a pair $(K, \Lambda) \in C([0, T], \mathbb{R}^*_+) \times C([0, T], \mathbb{R}_+)$ solution to (3.1), the optimality of the control process in (3.2) follows by the weak version of the martingale optimality principle as developed in Basei and Pham (2019). The arguments are recalled in appendix A.1.
Here, let us verify the existence and uniqueness of a solution to the system (3.1).
(i) We first consider the equation for K, which is a scalar Riccati equation. The equation for K is associated to the standard linear-quadratic stochastic control problem: By a standard result in control theory (Yong and Zhou, 1999, Ch. 6, Thm. 6.1, 7.1, 7.2), there exists a unique solution (ii) Given K, we consider the equation for Λ. This is also a scalar Riccati equation.
By the same arguments as for the K equation, there exists a unique solution Λ ∈ C([0, T ], R + ) to the second equation of (3.1), provided that for some δ > 0. We already have that Λ T = 0. From the fact that K > 0, together with the nondegeneracy condition on the matrix Σ, we have that K t Σ + Γ ≥ Γ ≥ δI d .
Since $\Gamma > 0$, and under the nondegeneracy condition of matrix $\Sigma$, we can use the Woodbury matrix identity to obtain

We then get

(iii) Given $(K, \Lambda)$, the equation for $Y$ is a linear ODE, whose unique continuous solution is explicitly given by

We can see from the expression of the optimal control (3.2) that the allocation in the risky assets has two components. One component is determined by the vector $S_t^{-1} \Gamma w_r = (K_t \Sigma + \Gamma)^{-1} \Gamma w_r$ with leverage $X_t$, and the second one by the vector . Computing the average wealth $\bar{X} = \mathbb{E}[X]$ associated to $\alpha^\Gamma$, we can express the control $\alpha^\Gamma$ as a function of the initial wealth of the investor $x_0$ and the current wealth $X_t$

Remark 3.2. In the case when $\Gamma$ is the null matrix, $\Gamma = 0$, the first line of the optimal control (3.3) vanishes and the second line can be rewritten as

Computing the integral in this expression, we recover the optimal control of the classical mean-variance problem (2.4).
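The Woodbury identity invoked in the proof can be checked numerically. The sketch below uses one standard form of the identity applied to $S = \Gamma + K\Sigma$, namely $S^{-1} = \Gamma^{-1} - \Gamma^{-1}\big((K\Sigma)^{-1} + \Gamma^{-1}\big)^{-1}\Gamma^{-1}$; this may differ from the exact rearrangement used in the paper, and the matrices below are hypothetical.

```python
import numpy as np

def woodbury_inverse(K, Sigma, Gamma):
    """Inverse of K*Sigma + Gamma via the Woodbury identity.

    Uses (A + C)^{-1} = A^{-1} - A^{-1} (C^{-1} + A^{-1})^{-1} A^{-1}
    with A = Gamma and C = K * Sigma, both assumed invertible.
    """
    Gamma_inv = np.linalg.inv(Gamma)
    inner = np.linalg.inv(np.linalg.inv(K * Sigma) + Gamma_inv)
    return Gamma_inv - Gamma_inv @ inner @ Gamma_inv

# Hypothetical inputs: a 2x2 covariance and an anisotropic penalization matrix.
Sigma = np.array([[0.04, 0.01], [0.01, 0.065]])
Gamma = np.diag([0.5, 2.0])
direct = np.linalg.inv(1.5 * Sigma + Gamma)
print(np.max(np.abs(woodbury_inverse(1.5, Sigma, Gamma) - direct)))
```

The identity makes explicit that $S_t^{-1}$ is $\Gamma^{-1}$ minus a positive semidefinite correction, which is what allows bounds on $S_t^{-1}$ in terms of $\Gamma$.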
Remark 3.3 (Limit of $\alpha^\gamma_t$ for $\Gamma = \gamma I_d \to \infty$). If we consider $\Gamma$ in the form $\Gamma = \gamma I_d$, the optimal control can be rewritten as

We show in appendix A.4 that $K_t$ and $\Lambda_t$ are bounded functions of the penalization parameter $\gamma$, thus $\frac{K_t}{\gamma}, \frac{\Lambda_t}{\gamma} \to 0$ as $\gamma \to \infty$.
We rewrite $Y_t$ as

Thus the second term of (3.4) vanishes and we get $\alpha^\gamma_t \to w_r X_t$ as $\gamma \to \infty$, which corresponds to the reference portfolio.
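The large-penalization limit of the tracking component can be observed numerically. The sketch below evaluates the vector $(K\Sigma + \gamma I_d)^{-1}\gamma w_r$ for increasing $\gamma$, holding $K$ fixed as a stand-in for the bounded function $K_t$; the covariance matrix, the value of $K$ and the function name are hypothetical.

```python
import numpy as np

def tracking_component(K, Sigma, gamma, w_r):
    """Return (K*Sigma + gamma*I)^{-1} (gamma * w_r), the vector multiplying X_t."""
    Sigma = np.asarray(Sigma, dtype=float)
    w_r = np.asarray(w_r, dtype=float)
    S = K * Sigma + gamma * np.eye(len(w_r))
    return np.linalg.solve(S, gamma * w_r)

Sigma = np.array([[0.04, 0.01], [0.01, 0.065]])   # hypothetical covariance
w_r = np.array([0.5, 0.5])                         # equal-weights reference
for gamma in (1e-2, 1.0, 1e2, 1e4):
    # The printed vector approaches w_r as gamma grows.
    print(gamma, tracking_component(K=1.0, Sigma=Sigma, gamma=gamma, w_r=w_r))
```

For small $\gamma$ the same vector shrinks toward zero, so this component interpolates between no tracking and full replication of the reference weights.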
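Remark 3.4 (Expansion for $\Gamma = \gamma I_d \to 0$). We take $\Gamma = \gamma I_d$. Since the covariance matrix $\Sigma$ is symmetric, there exists an invertible matrix $Q \in \mathbb{R}^{d \times d}$ and a diagonal matrix $D \in \mathbb{R}^{d \times d}$ such that $\Sigma = Q D Q^{-1}$. We can then rewrite the matrix $S_t^{-1}$:
$$S_t^{-1} = Q (K_t D + \gamma I_d)^{-1} Q^{-1} = \frac{1}{K_t} \Sigma^{-1} - \frac{\gamma}{K_t^2} \Sigma^{-2} + O(\gamma^2),$$
keeping only the terms up to the linear term in $\gamma$.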
Putting this expression in the differential equation of K, and keeping only the terms up to the linear term in γ, we get the differential equation (3.5)
where we set $\rho := b^\top \Sigma^{-1} b$. We look for a solution to this equation of the form

Putting this expression in the differential equation (3.5), we get two differential equations, for the leading order and the linear order in $\gamma$ respectively

which yield the explicit solution

is the solution to the differential equation in the unpenalized case. From the expansion for $K$, we can write the expansion of the differential equation for $\Lambda$ up to the linear term in $\gamma$. We use the expansion

and we get the following expansion of the differential equation of $\Lambda$

As before, we look for a solution of this differential equation of the form

Plugging this expression into the equation (3.6), we get the two following differential equations

The first differential equation yields the solution $\Lambda^0_t = 0$, $\forall t \in [0, T]$. Replacing $\Lambda^0_t$ by this value in the second differential equation, we get the equation

and obtain the solution

We can also compute the first order expansion of

where we set

The last expansion we need to compute before rewriting the optimal control is the expansion of $H_t$. We can rewrite

As shown in appendix A.2, we can rewrite the optimal control

where we set $\Sigma^{-2} := (\Sigma^{-1})^2$, and with

(3.8)
We see that for $\gamma = 0$, we recover the classical mean-variance optimal control. For nonzero values of $\gamma$, we see that a mix of three different portfolio allocations is obtained. The weight of the allocation $\Sigma^{-1} b$ is modified and two allocations $\Sigma^{-2} b$ and $\Sigma^{-1} w_r$ appear with weights $\gamma \alpha^{1,2}$. From this expansion of the control $\alpha^\gamma$, we can compute the first order asymptotic expansion in $\gamma$ of the equation giving the relation between the variance of the terminal wealth of the portfolio and its expectation. In the classical mean-variance case, this equation is called the efficient frontier formula. As shown in appendix A.3, with the tracking error penalization, the first order asymptotic expansion in $\gamma$ gives

The leading order term corresponds to the efficient frontier equation of the classical mean-variance allocation computed in Zhou and Li (2000), and thus for $\gamma = 0$, we recover this classical result. The linear term in $\gamma$ contains contributions of the three perturbative allocations: a modification of the "leverage" of the original mean-variance allocation $\Sigma^{-1} b$, and two different allocations $\Sigma^{-2} b$ and $\Sigma^{-1} w_r$.
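The first-order expansion of $S_t^{-1}$ underlying Remark 3.4 can be sanity-checked numerically. The sketch below compares the exact inverse of $K\Sigma + \gamma I_d$ with the truncated Neumann series $\Sigma^{-1}/K - \gamma\Sigma^{-2}/K^2$, which is the linearization obtained from the diagonalization argument; the value of $K$, the covariance matrix and the function names are hypothetical.

```python
import numpy as np

def s_inverse_exact(K, Sigma, gamma):
    """Exact inverse of K*Sigma + gamma*I."""
    return np.linalg.inv(K * Sigma + gamma * np.eye(Sigma.shape[0]))

def s_inverse_first_order(K, Sigma, gamma):
    """First-order expansion in gamma: Sigma^{-1}/K - gamma * Sigma^{-2}/K^2."""
    Sigma_inv = np.linalg.inv(Sigma)
    return Sigma_inv / K - gamma * (Sigma_inv @ Sigma_inv) / K**2

def expansion_error(K, Sigma, gamma):
    """Frobenius norm of the difference between exact and first-order inverses."""
    return np.linalg.norm(s_inverse_exact(K, Sigma, gamma)
                          - s_inverse_first_order(K, Sigma, gamma))

Sigma = np.array([[0.04, 0.01], [0.01, 0.065]])   # hypothetical covariance
for gamma in (1e-3, 1e-4, 1e-5):
    print(gamma, expansion_error(1.0, Sigma, gamma))
```

Dividing $\gamma$ by ten should divide the error by roughly one hundred, as expected for a remainder of order $\gamma^2$. The expansion is only useful when $\gamma$ is small relative to $K$ times the smallest eigenvalue of $\Sigma$.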

Applications and numerical results
In this section, we apply the results of the previous section and study the allocation obtained by considering four different static portfolios as reference. First, we study these allocations on simulated data, in the case of misspecified parameters. The misspecification of parameters means that the market parameters used to compute the portfolio allocations are different from the ones driving the stock prices. This study allows us to estimate the impact of the estimation error on the portfolio performance. Second, we perform a backtest and run the different portfolios on real market data. To simplify the presentation, we will assume now that the tracking error penalization matrix is of the form $\Gamma = \gamma I_d$ with $\gamma \in \mathbb{R}^*_+$. With this simplification, we have $S_t^{-1} = (K_t \Sigma + \gamma I_d)^{-1}$ and we can rewrite the system of ODEs (3.1) and the optimal control (3.2) as

We will consider three different classical allocations as reference portfolios, as well as the case of zero reference weights.
(i) Equal-weights portfolio: in this classical equal-weights portfolio, the same capital is invested in each asset, thus
$$w_r = \frac{1}{d} e,$$
where $d$ is the number of risky assets considered and $e \in \mathbb{R}^d$ is the vector of ones.
(ii) Minimum variance portfolio: the minimum variance portfolio is the portfolio which achieves the lowest variance while investing all its wealth in the risky assets. The weight vector of this portfolio is equal to
$$w_r = \frac{\Sigma^{-1} e}{e^\top \Sigma^{-1} e}.$$
These weights correspond to the one-period Markowitz portfolio when every asset expected return $b_i$ is taken equal to 1. In that case, only the portfolio variance is relevant and is minimized during the optimization process.
(iii) ERC portfolio: the equal risk contributions (ERC) portfolio, presented in Maillard et al. (2010) and in the monograph Roncalli (2013), is constructed by choosing a risk measure and computing the risk contribution of each asset to the global risk of the portfolio. When the portfolio volatility is chosen as the risk measure, the principle of the ERC portfolio lies in the fact that the volatility function satisfies the hypothesis of Euler's theorem and can be reduced to the sum of its arguments multiplied by their first partial derivatives. The portfolio volatility $\sigma(w) = \sqrt{w^\top \Sigma w}$ of a portfolio with weights vector $w \in \mathbb{R}^d$ can then be rewritten as
$$\sigma(w) = \sum_{i=1}^d w_i\, \partial_{w_i} \sigma(w) = \sum_{i=1}^d \frac{w_i (\Sigma w)_i}{\sigma(w)}.$$
The term under the sum, $\frac{w_i (\Sigma w)_i}{\sigma(w)}$, corresponding to the $i$-th asset, can be interpreted as the contribution of this risky asset to the total portfolio volatility. The equal risk contribution allocation is then defined as the allocation in which these contributions are equal for all the risky assets of the portfolio,
$$\frac{w_i (\Sigma w)_i}{\sigma(w)} = \frac{w_j (\Sigma w)_j}{\sigma(w)}$$
for every $i, j \in \llbracket 1, d \rrbracket$. The equal risk contribution allocation is thus obtained when the portfolio weights $w^*$ are given by
$$w^* = \Big\{ w \in [0, 1]^d : \sum_{i=1}^d w_i = 1,\ w_i (\Sigma w)_i = w_j (\Sigma w)_j,\ \forall i, j \in \llbracket 1, d \rrbracket \Big\}.$$
With this risk measure, the ERC portfolio weights can be expressed in closed form only in the case where the correlations between every pair of stocks are equal, that is $\mathrm{corr}(P^i, P^j) = c$, $\forall i, j \in \llbracket 1, d \rrbracket$, with the additional assumption that $c \ge -\frac{1}{d-1}$. Under these assumptions, and with the constraint that $\sum_{i=1}^d (w^{erc}_r)_i = 1$, the weights of this portfolio are equal to
$$(w^{erc}_r)_i = \frac{\sigma_i^{-1}}{\sum_{j=1}^d \sigma_j^{-1}},$$
where $\sigma_i$ is the volatility of the $i$-th asset.
In the general case, the weights of the ERC portfolio do not have a closed form and must be computed numerically by solving the following optimization problem
$$w^{erc}_r = \arg\min_{w \in \mathbb{R}^d} \sum_{i=1}^d \sum_{j=1}^d \big( w_i (\Sigma w)_i - w_j (\Sigma w)_j \big)^2 \quad \text{s.t. } e^\top w = 1 \text{ and } 0 \le w_i \le 1,\ \forall i \in \llbracket 1, d \rrbracket.$$
(iv) Control shrinking (zero portfolio): this is the portfolio where all weights are equal to zero, $w^i_r = 0$ for all $i$. This case corresponds to a shrinking of the controls of the penalized allocation, in the same spirit as the shrinking of regression coefficients in ridge regression (or Tikhonov regularization).
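The reference weights listed above can be computed in a few lines. The sketch below is illustrative only: the equal-weights and minimum variance weights use their standard closed forms, while the ERC weights are obtained with a cyclical coordinate descent on the log-barrier formulation (minimize ½wᵀΣw − λ Σᵢ log wᵢ, then normalize), a swapped-in technique that differs from the constrained least-squares problem stated in the text. Function names and the test matrix are hypothetical.

```python
import numpy as np

def equal_weights(d):
    """Same capital in each of the d risky assets."""
    return np.full(d, 1.0 / d)

def minimum_variance(Sigma):
    """Fully invested minimum variance weights Sigma^{-1} e / (e^T Sigma^{-1} e)."""
    e = np.ones(Sigma.shape[0])
    x = np.linalg.solve(Sigma, e)
    return x / (e @ x)

def risk_contributions(w, Sigma):
    """Contribution w_i (Sigma w)_i / sigma(w) of each asset to the volatility."""
    return w * (Sigma @ w) / np.sqrt(w @ Sigma @ w)

def erc_weights(Sigma, n_sweeps=500, tol=1e-12):
    """Long-only ERC weights via coordinate descent (assumed alternative solver).

    Each coordinate update solves Sigma_ii w_i^2 + c_i w_i - lam = 0 for its
    positive root, where c_i = sum_{j != i} Sigma_ij w_j.
    """
    d = Sigma.shape[0]
    lam = 1.0 / d
    w = 1.0 / np.sqrt(np.diag(Sigma))        # inverse-volatility starting point
    for _ in range(n_sweeps):
        w_old = w.copy()
        for i in range(d):
            c = Sigma[i] @ w - Sigma[i, i] * w[i]
            w[i] = (-c + np.sqrt(c * c + 4.0 * Sigma[i, i] * lam)) / (2.0 * Sigma[i, i])
        if np.max(np.abs(w - w_old)) < tol:
            break
    return w / w.sum()                        # rescaling keeps contributions equal

def zero_portfolio(d):
    """Control shrinking: all reference weights equal to zero."""
    return np.zeros(d)
```

With a constant-correlation covariance matrix, `erc_weights` should reproduce the closed-form inverse-volatility weights, which gives a convenient check of the numerical solver before using it on a general covariance matrix.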

Performance comparison with Monte Carlo simulations
In this section we compare, for each reference portfolio, the classical dynamic mean-variance allocation, the reference portfolio and the "tracking error" penalized portfolio. In a real investment situation, expected return and covariance estimates are noisy and biased. Thus, in order to compare the three portfolios and observe the impact of adding a tracking error penalization in the mean-variance allocation, we will run Monte Carlo simulations, assuming that the real-world expected returns $b_{real}$ and covariances $\sigma_{real}$ are equal to reference expected returns $b_0$ and covariances $\sigma_0$ plus some noise:
$$b_{real} = b_0 + \epsilon \times \text{noise}, \qquad \sigma_{real} = \sigma_0 + \epsilon \times \text{noise},$$
$$C_0 = \begin{pmatrix} 1 & 0.05 & -0.05 & 0.10 \\ 0.05 & 1 & -0.03 & 0.12 \\ -0.05 & -0.03 & 1 & -0.13 \\ 0.10 & 0.12 & -0.13 & 1 \end{pmatrix},$$
with the volatilities $v_0$ and correlations $C_0$, and where the covariance matrix $\sigma_0$ is obtained from $v_0$ and $C_0$. The noise follows a standard normal distribution $N(0, 1)$ and $\epsilon$ is its magnitude. We use Monte Carlo simulations to estimate the expected Sharpe ratio of each portfolio, equal to the average of the portfolio daily returns $R$ divided by the standard deviation of those returns: $\mathbb{E}\Big[ \frac{\mathbb{E}[R]}{\mathrm{Stdev}(R)} \Big]$. We consider an investment horizon of one year, with 252 business days and a daily rebalancing of the portfolio. The risk aversion parameter $\mu$ is chosen so that the targeted annual return of the classical mean-variance allocation is equal to 20%, thus $\mu = \frac{e^{b^\top \Sigma^{-1} b}}{2 x_0 \times 1.20}$ according to Zhou and Li (2000). The initial wealth of the investor $x_0$ is chosen equal to 1 and we choose the penalization parameter $\gamma = \mu/100$. Indeed, as the value of $\mu$ depends on the value of the stocks' expected returns and covariance matrix and on the targeted return, and can be very large, we express $\gamma$ as a function of this $\mu$ in order for the penalization to be relevant and non-negligible.
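The two building blocks of this setup, a covariance matrix assembled from volatilities and correlations and a daily Sharpe ratio, can be sketched as follows. The correlation matrix is the one recoverable from the text (with the first diagonal entry taken to be 1); the volatilities are placeholder values, since the reference volatilities are not available here, and the function names are hypothetical.

```python
import numpy as np

# Correlation matrix C_0 as recoverable from the text (leading 1 assumed).
C0 = np.array([[1.00, 0.05, -0.05, 0.10],
               [0.05, 1.00, -0.03, 0.12],
               [-0.05, -0.03, 1.00, -0.13],
               [0.10, 0.12, -0.13, 1.00]])

def covariance_from(vols, corr):
    """Covariance matrix diag(v) C diag(v) from volatilities and correlations."""
    vols = np.asarray(vols, dtype=float)
    return np.outer(vols, vols) * np.asarray(corr, dtype=float)

def sharpe_ratio(daily_returns):
    """Mean of the daily returns divided by their sample standard deviation."""
    r = np.asarray(daily_returns, dtype=float)
    return r.mean() / r.std(ddof=1)

# Placeholder volatilities, NOT the paper's v_0.
vols_example = np.array([0.20, 0.25, 0.30, 0.15])
Sigma_example = covariance_from(vols_example, C0)
```

Note that this Sharpe ratio is computed on daily returns without annualization, matching the definition given in the text; multiplying by the square root of 252 would give the usual annualized figure.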
For each reference portfolio, we compare the reference portfolio, the classical mean-variance allocation and the penalized one for values of the noise amplitude $\epsilon$ ranging from 0 to 1. For each value of $\epsilon$, we run 2000 scenarios and we plot the graphs of the average Sharpe ratio as a function of $\epsilon$.
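The scenario loop can be sketched for the simplest case of a fixed-weight reference portfolio, whose weights do not depend on the estimated parameters. The sketch assumes an additive noise model (reference parameter plus amplitude times standard normal noise, applied entrywise to the drift vector and the volatility matrix) and an Euler approximation of daily returns under daily rebalancing; both are readings of the description above rather than the paper's exact procedure, and all names and inputs are hypothetical.

```python
import numpy as np

def fixed_weight_sharpe(w, b, sigma, n_days=252, rng=None):
    """Daily Sharpe ratio of one simulated year for constant weights w."""
    rng = np.random.default_rng() if rng is None else rng
    dt = 1.0 / n_days
    dW = rng.normal(scale=np.sqrt(dt), size=(n_days, len(b)))
    # Daily return of a portfolio rebalanced to w each day: w.b dt + w.sigma dW.
    returns = (w @ b) * dt + dW @ (sigma.T @ w)
    return returns.mean() / returns.std(ddof=1)

def average_sharpe(w, b0, sigma0, eps, n_scenarios=2000, seed=0):
    """Average Sharpe ratio over scenarios with noisy 'real' parameters."""
    rng = np.random.default_rng(seed)
    w, b0, sigma0 = (np.asarray(a, dtype=float) for a in (w, b0, sigma0))
    total = 0.0
    for _ in range(n_scenarios):
        b_real = b0 + eps * rng.standard_normal(b0.shape)            # assumed noise model
        sigma_real = sigma0 + eps * rng.standard_normal(sigma0.shape)
        total += fixed_weight_sharpe(w, b_real, sigma_real, rng=rng)
    return total / n_scenarios
```

Calling `average_sharpe` on a grid of noise amplitudes between 0 and 1 would give one curve of the kind described in the text; the mean-variance and penalized portfolios would additionally require their feedback controls, computed from the reference parameters, to be applied along each simulated path.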
On the following graphs, we can see that in the four cases, the mean-variance and the penalized portfolios are superior to the reference. In the case where the equal weights portfolio is chosen as reference, the penalized portfolio's Sharpe ratio is lower than the mean-variance one for small values of $\epsilon$. For $\epsilon$ greater than approximately 0.25, the penalized portfolio's Sharpe ratio becomes larger and the gap with the mean-variance Sharpe ratio tends to increase with $\epsilon$. The same phenomenon occurs in the case where the ERC portfolio is chosen as reference, with a smaller gap between the mean-variance and penalized portfolios' Sharpe ratios. When the minimum variance portfolio is chosen as reference, the penalized portfolio's Sharpe ratio is lower than the one of the mean-variance portfolio for all $\epsilon$ in the interval [0, 1]. This is likely due to the sensitivity of the minimum variance portfolio to the estimator of the covariance matrix. Finally, in the case of the control shrinking, the Sharpe ratio of the penalized portfolio is significantly higher than the Sharpe ratio of the mean-variance portfolio, for every value of the noise amplitude $\epsilon$ in the interval [0, 1].

Performance comparison on a backtest
We now compare the different allocations on a backtest based on adjusted close daily prices available on Quandl between 2013-09-03 and 2017-12-28 for four stocks: Apple, Microsoft, Boeing and Nike. Here we chose a value of $\mu$ which corresponds to an annual expected return of 25%. In our example, we express again $\gamma$ as a function of $\mu$ and we consider two different values, $\gamma = \mu$ and $\gamma = \mu/100$. Figures 5, 6 and 7 show the total wealth of the four different portfolios (mean-variance, reference, and the penalized portfolio with the large and the small penalization) as a function of time. On these graphs we observe that, at the beginning of the investment horizon, the mean-variance allocation has the largest wealth increase, hence the largest leverage. As the wealth of this portfolio attains the target wealth, expressed as $\frac{1}{2\mu} e^{b^\top \Sigma^{-1} b\,T} + x_0$ in the mean-variance control equation (2.4), its leverage decreases and its wealth curve flattens. The same phenomenon occurs for the penalized allocation with large penalization parameter $\gamma = \mu$. In this case, the high value of the penalization parameter keeps the penalized portfolio controls close to the ones of the mean-variance portfolio. On the contrary, the reference portfolios have constant weights and no target wealth. We can see that in each case the reference portfolio's wealth keeps increasing over the entire horizon. The wealth of the penalized portfolio with penalization parameter $\gamma = \mu/100$ follows the wealth of these reference portfolios due to the small value of the tracking error penalization.
For these three reference portfolios, we observe that the penalized portfolio with penalization parameter γ = µ outperforms both the mean-variance and the reference portfolios in terms of Sharpe ratio whereas the penalized portfolio with penalization parameter γ = µ/100 outperforms the mean-variance but underperforms the reference portfolio. This can be attributed to the larger weight of the mean-variance criterion with respect to the tracking error in the optimized cost (2.2) with penalization parameter γ = µ.
Finally, Figure 8 corresponds to the case of a reference portfolio with weights all equal to zero. This corresponds to a shrinking of the optimal control of the penalized portfolio. In that case, for a better visualization, we plot the total wealth of the mean-variance and penalized portfolios for penalization parameters $\gamma = \mu$ and $\gamma = \mu/100$, normalized by the standard deviation of their daily returns. On this graph, we can see that the normalized wealth of the two penalized portfolios is higher than the one of the mean-variance allocation. As for the three previous reference portfolios, the two penalized portfolios outperform the mean-variance allocation in terms of Sharpe ratio. As previously, we observe that the Sharpe ratio of the penalized portfolio with penalization parameter $\gamma = \mu$ is greater than the one with $\gamma = \mu/100$, due to the larger weight of the mean-variance criterion in the cost functional.

Conclusion
In this paper, we propose an allocation method based on a mean-variance criterion plus a tracking error between the optimized portfolio and a reference portfolio of same wealth and fixed weights. We solve this problem as a linear-quadratic McKean-Vlasov stochastic control problem using a weak martingale approach. We then show using simulations that for a certain degree of market parameter misspecification and the right choice of reference portfolio, the mean-variance portfolio with tracking error penalization outperforms the standard mean-variance and the reference allocations in terms of Sharpe ratio. A backtest based on historical market data also shows that the mean-variance portfolio with tracking error outperforms the traditional mean-variance and the reference portfolios in terms of Sharpe ratio for the four reference portfolios considered.
for some deterministic processes $(K_t, \Lambda_t, Y_t, R_t)$ to be determined. Condition (i) in Lemma A.1 fixes the terminal condition

where we set $\zeta := X_0 + \frac{1}{2\mu} e^{\rho T}$. We get the solution

The average of the square of the portfolio wealth at time $t$ is given by the ODE

We can then compute the variance of the terminal total wealth of the portfolio given by the control (3.7)

We have $B_t \to -2 b^\top w_r - |\sigma^\top w_r|^2$ as $\gamma \to \infty$, thus $\psi_t \to 0$ as $\gamma \to \infty$, $\forall t \in [0, T]$, and $K_t$ is bounded in $\gamma$ for every $t \in [0, T]$.