Portfolio Optimization Constrained by Performance Attribution

This paper investigates performance attribution measures as a basis for constraining portfolio optimization. We employ optimizations that minimize expected tail loss and investigate both asset allocation (AA) and the selection effect (SE) as hard constraints on asset weights. The test portfolio consists of stocks from the Dow Jones Industrial Average index; the benchmark is an equi-weighted portfolio of the same stocks. Performance of the optimized portfolios is judged using comparisons of cumulative price and the risk-measures maximum drawdown, Sharpe ratio, and Rachev ratio. The results suggest a positive role in price and risk-measure performance for the imposition of constraints on AA and SE, with SE constraints producing the larger performance enhancement.


Introduction
How well a portfolio performs is always the major concern for investors, and is usually the major metric reflecting investor confidence in the portfolio's management. In common terms, a good portfolio delivers satisfactory return with low risk. Attribution analysis provides measures for how well a portfolio is being managed. Paraphrasing from Bacon (2008), performance attribution is a technique used to quantify the excess return (relative to a benchmark) of a portfolio and explain that performance in terms of investment strategy and market conditions. From a management perspective, attribution analysis has been used to monitor performance, to identify early indications of underperformance, and to gain investor confidence by demonstrating a thorough understanding of the performance drivers.
Following the fundamental work on performance attribution by Brinson and Fachler (1985) and Brinson et al. (1986), we decompose excess return into two quantities that reflect investment strategy: asset allocation (AA), which measures the contribution of each asset class in a portfolio to total performance of the portfolio, and the selection effect (SE), which measures the impact of choice of assets within each class in the portfolio.
As is apparent from their definitions in the next section, AA and SE measure the differences between mean performance of asset classes in a managed portfolio and those of a market benchmark, and are therefore 'blind' to volatility effects, i.e. to tail-risk. Motivated by this and by the work of Biglova and Rachev (2007) and Rachev et al. (2009), we investigate the impact on portfolio optimization of using AA and SE as hard constraints on asset weights, as a method of combining performance and tail-risk control. We apply this methodology to a test portfolio of stocks comprising a major market index, specifically the Dow Jones Industrial Average. Optimization is performed by minimizing expected tail loss (ETL) at a specified quantile level, α. For the required market benchmark we utilize an equi-weighted portfolio comprised of the same assets. Performance of the resulting optimal portfolios is measured in terms of cumulative portfolio price and standard risk-measures.
Section 2 of the paper discusses performance attribution, the constrained ETL portfolio optimizations and the risk-measures employed to gauge their performance. Section 3 presents the results of the methodology applied to the test portfolio. Section 4 summarizes our conclusions.

Methodology
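Consider a managed portfolio $p$ comprised of $N$ assets, consisting of $M$ asset classes with $n_i$ assets in class $i$. Let $b$ denote a benchmark portfolio comprised of the same assets. Let the index pair $ij$; $i = 1, \dots, M$; $j = 1, \dots, n_i$, identify asset $j$ in class $i$. Denote the daily closing price for this asset as $S_{ij}(t)$ and its corresponding log-return as $r_{ij}(t) = \ln\left(S_{ij}(t)/S_{ij}(t-1)\right)$. For brevity, we will suppress the time variable for most of the discussion in this section. Let $w_{ij}^{(p)}$ denote the weight of asset $ij$ in the portfolio and $w_{ij}^{(b)}$ denote its weight in the benchmark. We assume all weights are non-negative; that is, all portfolios considered take long-only positions. Let $w_i^{(p)} = \sum_{j=1}^{n_i} w_{ij}^{(p)}$ and $w_i^{(b)} = \sum_{j=1}^{n_i} w_{ij}^{(b)}$ represent the total weights of the assets in class $i$ in the portfolio and benchmark respectively. The quantities AA and SE for asset class $i$ are defined as follows:
$$AA_i = \left(w_i^{(p)} - w_i^{(b)}\right)\left(R_i^{(b)} - R^{(b)}\right), \tag{1}$$
$$SE_i = w_i^{(b)}\left(R_i^{(p)} - R_i^{(b)}\right), \tag{2}$$
where
$$R_i^{(k)} = \sum_{j=1}^{n_i} \frac{w_{ij}^{(k)}}{w_i^{(k)}}\, E[r_{ij}], \quad k \in \{p, b\}, \qquad R^{(b)} = \sum_{i=1}^{M} \sum_{j=1}^{n_i} w_{ij}^{(b)}\, E[r_{ij}]. \tag{3}$$
Here $R_i^{(k)}$ is the expected log-return for asset class $i$ considered as a fully-invested portfolio by itself. In contrast, $R^{(b)}$ represents the usual expected log-return for the entire benchmark portfolio.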
Similarly we have the usual expected log-return for portfolio $p$,
$$R^{(p)} = \sum_{i=1}^{M} \sum_{j=1}^{n_i} w_{ij}^{(p)}\, E[r_{ij}]. \tag{4}$$
The excess return, $S = R^{(p)} - R^{(b)}$, can be viewed as the value added by portfolio management. From (1) through (4),
$$S = AA + SE + I, \qquad AA = \sum_{i=1}^{M} AA_i, \quad SE = \sum_{i=1}^{M} SE_i, \quad I = \sum_{i=1}^{M} I_i, \tag{5}$$
where $I_i = \left(w_i^{(p)} - w_i^{(b)}\right)\left(R_i^{(p)} - R_i^{(b)}\right)$ is an "interaction" term, with $w_i^{(p)}$, $w_i^{(b)}$ the class weights and $R_i^{(p)}$, $R_i^{(b)}$ the expected class log-returns in the portfolio and benchmark. AA, SE and I are, respectively, the total asset allocation, total selection effect, and total interaction terms for portfolio $p$. $AA_i$ is the contribution to the excess return $S$ from the weighting of asset class $i$, while $SE_i$ represents the contribution to $S$ determined by the choice of assets within class $i$. To understand these interpretations, consider first the sign of the value of $AA_i$ in (1).
• If $R_i^{(b)} - R^{(b)} > 0$, the expected return from asset class $i$ in the benchmark is outperforming the total expected return for the benchmark. Therefore if $w_i^{(p)} - w_i^{(b)} > 0$, the weight of asset class $i$ in portfolio $p$ is larger than in the benchmark, capitalizing further on the better return from class $i$. Otherwise, if $w_i^{(p)} - w_i^{(b)} < 0$, the class $i$ weighting in portfolio $p$ is hurting the potential performance of that class (as determined by the benchmark).
• If $R_i^{(b)} - R^{(b)} < 0$, the expected return from asset class $i$ in the benchmark is under-performing the total expected return for the benchmark. Therefore if $w_i^{(p)} - w_i^{(b)} < 0$, the weight of asset class $i$ in portfolio $p$ is smaller than in the benchmark, further suppressing the poorer return from that class. Otherwise, if $w_i^{(p)} - w_i^{(b)} > 0$, the class $i$ weighting in portfolio $p$ is overweighting the poor performance of that class.
Thus a positive sign for the value of $AA_i$ indicates a "correct" decision in the management of portfolio $p$ relative to the benchmark, while a negative sign indicates a "poor" decision. The magnitude of $AA_i$ quantifies how good or poor the decision is.
Similarly, as we assume $w_i^{(b)} > 0$ (also a requirement for class $i$ to be in the benchmark portfolio), a positive sign for the value of $SE_i$ indicates that the expected return from the choice of assets in class $i$ in portfolio $p$ is outperforming the benchmark, while a negative sign indicates that the expected return from the choice of assets in class $i$ in portfolio $p$ is under-performing.
If $R_i^{(b)}$, $R^{(b)}$ and $r_{ij}$ were simple (i.e. discrete) returns, formulas of the form (3) would be exact. However, as they are log-returns, such formulas are approximate. For example, the formula for $R^{(b)}$ incurs an error term which, to leading order in a Taylor series expansion, is quadratic in the asset returns.
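The attribution quantities discussed above can be checked numerically. The following is a minimal Python sketch, assuming the Brinson-Fachler forms $AA_i = (w_i^{(p)} - w_i^{(b)})(R_i^{(b)} - R^{(b)})$, $SE_i = w_i^{(b)}(R_i^{(p)} - R_i^{(b)})$ and $I_i = (w_i^{(p)} - w_i^{(b)})(R_i^{(p)} - R_i^{(b)})$. The function names and the toy weights and expected returns are illustrative assumptions, not data from the study.

```python
def class_return(weights, mu):
    """Expected return of one class treated as a fully invested portfolio."""
    total = sum(weights)  # class weight, must be positive
    return sum(w * m for w, m in zip(weights, mu)) / total


def brinson_fachler(w_p, w_b, mu):
    """Return (R_p, R_b, effects), where effects[i] = (AA_i, SE_i, I_i).

    w_p, w_b and mu are lists of per-class lists holding portfolio weights,
    benchmark weights and expected log-returns of the individual assets.
    """
    flat = lambda nested: [v for cls in nested for v in cls]
    R_p = sum(w * m for w, m in zip(flat(w_p), flat(mu)))
    R_b = sum(w * m for w, m in zip(flat(w_b), flat(mu)))
    effects = []
    for wp_c, wb_c, mu_c in zip(w_p, w_b, mu):
        W_p, W_b = sum(wp_c), sum(wb_c)
        Rp_i, Rb_i = class_return(wp_c, mu_c), class_return(wb_c, mu_c)
        aa = (W_p - W_b) * (Rb_i - R_b)     # asset allocation
        se = W_b * (Rp_i - Rb_i)            # selection effect
        inter = (W_p - W_b) * (Rp_i - Rb_i)  # interaction
        effects.append((aa, se, inter))
    return R_p, R_b, effects


# Toy example: two classes with two assets each (hypothetical numbers).
mu = [[0.0010, 0.0004], [0.0002, -0.0001]]
w_b = [[0.25, 0.25], [0.25, 0.25]]   # equi-weighted benchmark
w_p = [[0.40, 0.20], [0.30, 0.10]]   # managed portfolio

R_p, R_b, effects = brinson_fachler(w_p, w_b, mu)
total = sum(aa + se + inter for aa, se, inter in effects)
print("excess return:", R_p - R_b)
print("AA + SE + I  :", total)
```

In this toy case both classes have positive $AA_i$: class 1 outperforms the benchmark and is overweighted, while class 2 underperforms and is underweighted, matching the two sign cases described above.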
The interaction term, $I_i$, captures the part of the excess return unexplained by asset allocation and selection effect. Written as
$$I_i = \frac{AA_i \, SE_i}{w_i^{(b)}\left(R_i^{(b)} - R^{(b)}\right)}, \tag{6}$$
it can be viewed as the product of the asset allocation and selection effect contributions of class $i$ to portfolio $p$ compared to the weighted excess return of class $i$ in the benchmark $b$. Alternatively, written as
$$I_i = SE_i \, \frac{w_i^{(p)} - w_i^{(b)}}{w_i^{(b)}}, \tag{7}$$
it can be interpreted as the product of the asset selection effect and the over- or under-weighted part of asset class $i$. The relationship (7) between $I_i$ and $SE_i$ reveals a simple form for the sum of the selection effect and interaction terms,
$$\widetilde{SE}_i \equiv SE_i + I_i = w_i^{(p)}\left(R_i^{(p)} - R_i^{(b)}\right). \tag{8}$$
In the portfolio optimizations discussed next, (8) provides a way to incorporate a constraint on the sum of the selection and interaction effects for class $i$. We denote the total value of the combined selection effect plus interaction term by $\widetilde{SE} = \sum_{i=1}^{M} \widetilde{SE}_i$.

Expected tail loss (ETL), also known as conditional value-at-risk (CVaR), is defined in terms of value-at-risk (VaR). Let $F(x) = \Pr\{r \le x\}$ denote the cumulative distribution function of a return $r$. Then
$$\mathrm{VaR}_\alpha(r) = -\inf\{x : F(x) \ge 1 - \alpha\}, \tag{9}$$
$$\mathrm{ETL}_\alpha(r) = \frac{1}{1-\alpha} \int_\alpha^1 \mathrm{VaR}_u(r)\, du, \tag{10}$$
where $\alpha \in (0, 1)$ is a prescribed quantile level, typically having a value of 0.95, 0.99, or 0.995.

We consider four portfolio optimization problems, $P^k_\alpha$, $k = 0, \dots, 3$, based upon constrained minimization of $\mathrm{ETL}_\alpha$ (Krokhmal et al., 2002). These optimization problems successively add further performance attribution constraints to a long-only, fully invested, $\mathrm{ETL}_\alpha$-minimized portfolio. For the total portfolio return $R^{(p)}$ (4), perform the minimization
$$\min_{w_{ij}^{(p)}} \mathrm{ETL}_\alpha\left(R^{(p)}\right) \tag{11}$$
subject to the constraints of problem $P^k_\alpha$. All four problems impose the long-only, fully invested conditions $w_{ij}^{(p)} \ge 0$ and $\sum_{i,j} w_{ij}^{(p)} = 1$; the attribution constraints take the form of bounds $a_1 \le AA \le b_1$, $a_2 \le SE \le b_2$ and $a_3 \le \widetilde{SE} \le b_3$. The constants $a_i$, $b_i$ can be user-specified to meet particular goals. For example, the constraint $AA \ge 0$ requires that, on average, the asset classes in the optimized portfolio $p$ equal or outperform those in the benchmark. A constraint $SE_i \ge 0$ requires that the weights of the assets in class $i$ be adjusted to perform as well as, or better than, class $i$ in the benchmark. Since individual asset weights can be zero, this is equivalent to a choice of assets in the class.
The constraint $SE \ge 0$ requires that this be true averaged over classes. We note that the minimization (11) can be solved as a linear optimization problem (Tütüncü et al., 2003).

Performance of these four optimized portfolios, relative to each other and to the benchmark, will be judged based upon cumulative price, as well as performance relative to three common risk measures. Let $w_{ij}^{(p)}(t)$, $t = 1, \dots, T$, denote daily weights obtained from one of these optimizations. Recalling that $r_{ij}(t)$ is the log-return based upon the closing price of asset $ij$ on day $t$, the portfolio log-return and cumulative price are
$$R^{(p)}(t) = \sum_{i=1}^{M} \sum_{j=1}^{n_i} w_{ij}^{(p)}(t)\, r_{ij}(t), \qquad S^{(p)}(t) = S^{(p)}(0) \exp\left(\sum_{s=1}^{t} R^{(p)}(s)\right).$$
The three measures used are:
1. Maximum drawdown (MDD),
$$\mathrm{MDD}(T) = \max_{t \in [0,T]} \frac{\max_{s \in [0,t]} S^{(p)}(s) - S^{(p)}(t)}{\max_{s \in [0,t]} S^{(p)}(s)},$$
which characterizes the maximum loss incurred from peak to trough during the time period $[0, T]$;
2. Sharpe ratio (Sharpe, 1994),
$$\mathrm{SR} = \frac{\mu_p}{\sigma_p},$$
where $r_f(t)$ is a risk-free rate, and $\mu_p$ and $\sigma_p$ are the expected mean and standard deviation of the portfolio's excess return, $R^{(p)}(t) - r_f(t)$; and
3. Rachev ratio (Rachev et al., 2008),
$$\mathrm{RR}_{\alpha,\beta} = \frac{\mathrm{ETL}_\alpha\left(r_f - R^{(p)}\right)}{\mathrm{ETL}_\beta\left(R^{(p)} - r_f\right)},$$
which represents the reward potential for positive returns compared to the risk potential for negative returns at quantile levels defined by the user. In our analysis, we set $\alpha = \beta = 0.95$.
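The ETL objective and the three risk measures can be evaluated on a sample of returns with a few lines of code. The following is a minimal Python sketch using only the standard library. It assumes the common sample estimators: ETL via the Rockafellar-Uryasev auxiliary function, whose minimum over a threshold ζ equals the sample ETL (replacing each `max(...)` term by an auxiliary variable is what turns ETL minimization into a linear program), relative peak-to-trough drawdown, and the sample standard deviation in the Sharpe ratio. Function names and test data are illustrative assumptions, not the authors' implementation.

```python
import statistics


def empirical_etl(returns, alpha=0.95):
    """Sample ETL at level alpha via the Rockafellar-Uryasev function.

    F(zeta) = zeta + sum(max(loss_t - zeta, 0)) / ((1 - alpha) * T)
    is convex and piecewise linear in zeta, so its minimum is attained
    at one of the sample losses.
    """
    losses = [-r for r in returns]
    scale = (1.0 - alpha) * len(losses)

    def objective(zeta):
        return zeta + sum(max(l - zeta, 0.0) for l in losses) / scale

    return min(objective(z) for z in losses)


def max_drawdown(prices):
    """Largest relative drop from a running peak of the price series."""
    peak, mdd = prices[0], 0.0
    for p in prices:
        peak = max(peak, p)
        mdd = max(mdd, (peak - p) / peak)
    return mdd


def sharpe_ratio(returns, risk_free=0.0):
    """Mean over sample standard deviation of the excess returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def rachev_ratio(returns, alpha=0.95, beta=0.95, risk_free=0.0):
    """ETL_alpha(r_f - r) / ETL_beta(r - r_f): upper tail over lower tail."""
    excess = [r - risk_free for r in returns]
    reward = empirical_etl([-x for x in excess], alpha)
    risk = empirical_etl(excess, beta)
    return reward / risk


# Toy checks with hypothetical numbers.
sample = [0.01] * 18 + [-0.05, -0.03]
etl_90 = empirical_etl(sample, alpha=0.9)        # mean of the two worst losses
mdd = max_drawdown([1.0, 1.2, 0.9, 1.1, 1.3, 1.04])
rr = rachev_ratio([0.02, -0.04] + [0.0] * 18)    # best gain over worst loss
sr = sharpe_ratio([0.01, 0.03])
print(etl_90, mdd, rr, sr)
```

The quadratic-time ETL search is chosen for transparency; for long return series a sort-based tail average or a linear-programming solver would be the more practical choice.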

Application to a Test Portfolio
To illustrate portfolio optimization under performance attribution constraints, we consider a specific portfolio comprised of stocks from the Dow Jones Industrial Average (DJIA). Daily return data for the stocks covered 3,240 trading days. Using a standard rolling-window strategy for optimization with a window size of 1,008 days (four years), optimized portfolio return data were computed for an in-sample period of $T$ = 2,232 days. In the portfolio optimization, we assume no transaction costs and impose no turnover constraints. We optimized at two separate quantile levels, $\alpha \in \{0.95, 0.99\}$. For the attribution constraints in optimizations $P^1_\alpha$, $P^2_\alpha$, and $P^3_\alpha$, we set the lower bounds $a_i = 0$, $i = 1, 2, 3$, and set no upper bounds ($b_i = \infty$, $i = 1, 2, 3$). Thus, for example, optimization $P^1_\alpha$ minimizes the ETL for the long-only portfolio while requiring that, on average, its asset classes perform at least as well as those in the benchmark.
[6] We exclude Dow Inc., which was spun off from DowDuPont on April 1, 2019. Its stock, under the ticker symbol "DOW", began trading on March 20, 2019. It was added to the DJIA on April 2, 2019.
[7] Bloomberg Professional Services.
[8] As is apparent from equations (1) through (7), results from an attribution analysis depend critically on choice of benchmark.
To circumvent the non-linear SE constraints in portfolio optimization $P^2_\alpha$, we performed a two-step optimization procedure to linearize the SE constraints. The first step satisfies condition (b) of optimization $P^2_\alpha$ by performing optimization $P^1_\alpha$ to provide the optimal class weights $w_i^{(p)}$. The second step satisfies condition (c) using the fixed class weights from the first step. The two-step optimization was employed only for optimizations $P^2_\alpha$, $P^5_\alpha$, $P^7_\alpha$ and $P^8_\alpha$.
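Why fixing the class weights linearizes the SE constraint can be seen directly. Assuming the Brinson-Fachler form $SE_i = w_i^{(b)}(R_i^{(p)} - R_i^{(b)})$ with $R_i^{(p)} = \sum_j w_{ij}^{(p)} E[r_{ij}] / w_i^{(p)}$, the division by $w_i^{(p)}$ is the only non-linearity; once $w_i^{(p)}$ is held fixed, total SE is an affine function of the asset weights. The following minimal Python sketch checks this on toy numbers. The helper names and data are illustrative assumptions, not the authors' implementation.

```python
def se_direct(w_p, w_b, mu):
    """Total selection effect computed straight from its definition."""
    total = 0.0
    for wp_c, wb_c, mu_c in zip(w_p, w_b, mu):
        W_p, W_b = sum(wp_c), sum(wb_c)
        R_p_i = sum(w * m for w, m in zip(wp_c, mu_c)) / W_p
        R_b_i = sum(w * m for w, m in zip(wb_c, mu_c)) / W_b
        total += W_b * (R_p_i - R_b_i)
    return total


def se_affine(class_weights_p, w_b, mu):
    """Coefficients c_ij and constant k with SE = sum(c_ij * w_ij) - k.

    Valid only for asset weights whose class sums equal the fixed,
    strictly positive class_weights_p.
    """
    coeffs, const = [], 0.0
    for W_p, wb_c, mu_c in zip(class_weights_p, w_b, mu):
        W_b = sum(wb_c)
        coeffs.append([W_b * m / W_p for m in mu_c])
        const += sum(w * m for w, m in zip(wb_c, mu_c))  # equals W_b * R_b_i
    return coeffs, const


# Hypothetical inputs: two classes, two assets each.
mu = [[0.0010, 0.0004], [0.0002, -0.0001]]
w_b = [[0.25, 0.25], [0.25, 0.25]]
fixed_class_weights = [0.6, 0.4]          # output of a first-step optimization
coeffs, const = se_affine(fixed_class_weights, w_b, mu)

# Two candidate second-step portfolios sharing the same class weights.
candidates = [[[0.40, 0.20], [0.30, 0.10]],
              [[0.10, 0.50], [0.20, 0.20]]]
linear_values = [
    sum(c * w for cc, wc in zip(coeffs, w_p) for c, w in zip(cc, wc)) - const
    for w_p in candidates
]
direct_values = [se_direct(w_p, w_b, mu) for w_p in candidates]
print(linear_values, direct_values)
```

With the coefficients in hand, a bound such as $SE \ge 0$ becomes an ordinary linear inequality in the asset weights, which is the form a linear-programming solver expects.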
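Fig. 1 shows the cumulative price and the observed distributions of AA, SE and $\widetilde{SE}$ (the combined selection-plus-interaction measure) for the optimized portfolios. In this figure, in the text and in similar figures below, we employ the notation $AA^0_{0.95}$ to refer to AA computed for $P^0_{0.95}$, etc.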
• As expected, compared to the benchmark the optimized portfolios $P^0_\alpha$ show less aggressive overall price growth as a consequence of reducing ETL, with $P^0_{0.99}$ providing better cumulative return than $P^0_{0.95}$. Reflecting the decreased return performance of $P^0_\alpha$ relative to the benchmark, values of AA, SE and the combined selection-plus-interaction measure $\widetilde{SE}$ are overwhelmingly negative, from 58% of the daily values being negative for $AA^0_{0.99}$ to 93% for $SE^0_{0.99}$. The spread of values is smallest for AA and much wider for SE and $\widetilde{SE}$.
• For the same value of α, differences between the price processes for $P^1_\alpha$ and $P^0_\alpha$ are relatively minor. Most noticeable is a narrowing of the difference between the performance of P
• Compared with a 34% MDD for the benchmark, these optimized portfolios reduce MDD by a further: ∼5% for $P^k_{0.99}$, $k = 1, 2, 3$; 7.5% for $P^k_{0.95}$, $k = 0, 1, 2, 3$; and 11% for $P^0_{0.99}$.
• Compared with a value of 0.032 for the benchmark, Sharpe ratios for these optimized portfolios are reduced: by a factor of ∼2 for $P^0_\alpha$ and $P^1_\alpha$; by a factor of 1.2 to 1.5 for $P^2_\alpha$; and by a factor of 3 to 4 for $P^3_\alpha$. The large decrease in Sharpe ratio for optimizations $P^3_\alpha$ reflects their relatively flat price performance.
• Compared with the benchmark value of 0.867, Rachev ratios for these optimized portfolios generally increase (by 2% to 5%), with the exceptions of $P^3_\alpha$ and $P^1_{0.95}$.
• The improved price performance of $P^2_\alpha$ (compared to the other optimized portfolios) is also accompanied by improved values in MDD and Rachev ratio (compared to the benchmark).

The risk-measure values presented in Fig. 2 provide a single value for the entire in-sample data period. One consequence is that the MDD values in Fig. 2 reflect only the onset of the Covid-19 pandemic. Fig. 3 therefore plots daily risk measure values, computed using a 1,008-day moving window, for the benchmark and $P^0_\alpha$ through $P^3_\alpha$ over the period 3/22/2016 through 02/01/2021. The MDD data suggest three separate time periods: 2016:Q2 through 2018:Q4, 2019:Q1 through 2020:Q1, and 2020:Q2 into 2021:Q1. In the first time period, the benchmark and $P^2_\alpha$ display the lowest MDD, while the benchmark has the highest MDD in period three. The benchmark has the highest Sharpe ratio in period one, but the optimized portfolios, particularly those for α = 0.99, become competitive with the benchmark in periods two and three. Generally the benchmark and $P^3_\alpha$ have the worst Rachev ratio. Of these optimized portfolios, $P^2_\alpha$ generally displays the best risk measures. During the pandemic (period three), all $P^k_{0.95}$, $k = 0, 1, 2, 3$, portfolios display comparable risk measures. There is greater variation in the performance of the $P^k_{0.99}$, $k = 0, 1, 2, 3$, portfolios during the pandemic period, with $P^0_{0.99}$ being a consistent best performer.
The addition of the total performance attribution measures AA, SE and $\widetilde{SE}$ (the combined selection-plus-interaction measure) as constraints does not result in uniformly improved performance relative to the ETL-optimized portfolio $P^0_\alpha$. To test whether this is the result of using total measures that only constrain averages over all classes, we attempted a more aggressive set of optimizations, $P^4_\alpha$, $P^5_\alpha$ and $P^6_\alpha$, which apply asset allocation and selection effect constraints to each individual class.

Constraint (c) for $P^5_\alpha$ requires explanation. As with $P^2_\alpha$, to avoid the non-linear constraint inherent in $SE_i$, optimization problem $P^5_\alpha$ was also solved using the two-step approach, with the first optimization step determining the class weights $w_i^{(p)}$. If the first step returns $w_i^{(p)} = 0$, class $i$ is absent from portfolio $p$, and setting a constraint on $SE_i$ in such a case makes no sense. Thus a necessary, though certainly not sufficient, requirement to establish an SE constraint on class $i$ is that $w_i^{(p)}$ be positive.

Fig. 4 shows the cumulative price performance and the statistics on the observed distributions of the performance attributes for these portfolios. The price performance of $P^4_\alpha$ improved relative to that of $P^1_\alpha$, with the largest improvement occurring for $P^4_{0.95}$. Compared to $P^2_\alpha$, price performance improved for $P^5_{0.95}$, while it degraded for $P^5_{0.99}$. Significant improvements are seen in $P^6_\alpha$ relative to $P^3_\alpha$. The spread of $AA^4_\alpha$ values increased compared to $AA^1_\alpha$. The spreads of $AA^5_\alpha$, $SE^5_\alpha$ and $\widetilde{SE}^5_\alpha$ values increased, and the values were more significantly positive than for $P^2_\alpha$. Similar observations hold for the performance attribution distribution statistics for $P^6_\alpha$.
The risk measures for these portfolios are given in Fig. 2. Compared to their counterparts in optimizations $P^1_\alpha$ through $P^3_\alpha$: 1) with the exception of $P^4_{0.95}$, some improvement in MDD was observed; 2) Sharpe ratios improved, with strong improvements of $P^6_\alpha$ compared to $P^3_\alpha$; and 3) Rachev ratios improved, with the exception of $P^5_\alpha$ compared to $P^2_\alpha$.

Figure 4: (a) Cumulative price resulting from a \$1 investment and (b) box-whisker plots of the observed distributions of AA, SE and $\widetilde{SE}$ values for the optimized portfolios $P^4_\alpha$, $P^5_\alpha$ and $P^6_\alpha$.

To highlight the aggressiveness of optimizations $P^4_\alpha$, $P^5_\alpha$ and $P^6_\alpha$, Fig. 5 shows, for each optimized portfolio, the percentage of the time that no weight solution could be found in the feasible region.[10] "No-solution" rates are unacceptably high for $P^4_\alpha$ through $P^6_\alpha$, indicating that such aggressive constraints should be implemented as soft, rather than hard, constraints.
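The no-solution rate arises naturally in a rolling-window backtest. The following minimal Python sketch shows one way to organize it, using the carry-forward rule $w_{ij}(t) = w_{ij}(t-1)$ that the paper applies on days with no feasible solution. The `solve` callable stands in for any constrained ETL optimizer and is hypothetical, as are the function names and toy data; the study itself uses a 1,008-day window.

```python
def rolling_weights(returns, window, solve, initial_weights):
    """Daily weights from a rolling-window optimization with a fallback.

    returns         : list of per-day lists of asset returns
    window          : number of trailing days passed to the optimizer
    solve           : hypothetical optimizer; returns a weight list, or
                      None when the constraint set is infeasible
    initial_weights : weights used if the very first solve fails
    Returns (weights, no_solution_rate).
    """
    weights, failures = [], 0
    previous = initial_weights
    for t in range(window, len(returns)):
        w = solve(returns[t - window:t])
        if w is None:
            failures += 1
            w = previous            # carry forward: w(t) = w(t - 1)
        weights.append(w)
        previous = w
    rate = failures / len(weights) if weights else 0.0
    return weights, rate


def toy_solver(window_returns):
    """Stand-in optimizer: 'infeasible' when asset 0 fell on the last day."""
    if window_returns[-1][0] < 0:
        return None
    return [0.6, 0.4]


# Six days of returns for two assets (hypothetical numbers).
history = [[0.01, 0.0], [-0.01, 0.0], [0.02, 0.0],
           [-0.03, 0.0], [0.01, 0.0], [0.00, 0.0]]
weights, rate = rolling_weights(history, window=2, solve=toy_solver,
                                initial_weights=[0.5, 0.5])
print(weights, rate)
```

Tracking the failure count alongside the weights makes it straightforward to report how often the fallback, rather than the optimizer, determined the portfolio.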
To test the constraints defining the feasible region, we considered two further optimization approaches, $P^7_\alpha$ and $P^8_\alpha$, which mix total and individual class constraints. Cumulative price plots for these optimizations are shown in Fig. 6; risk measures are given in Fig. 2; and no-solution rates in Fig. 5.

[10] If no feasible solution was found for day $t$, we employed a momentum strategy and set $w_{ij}(t) = w_{ij}(t-1)$.

Figure 6: (a) Cumulative price resulting from a \$1 investment and (b) box-whisker plots of the observed distributions of AA, SE and $\widetilde{SE}$ values for the optimized portfolios $P^7_\alpha$ and $P^8_\alpha$.

Because negative $SE_i$ values are larger in magnitude than positive $SE_i$ values, non-constrained time steps skew total values of SE to comparatively large negative numbers. For the same value of α, no-solution rates for $P^7_\alpha$ are comparable to $P^2_\alpha$, while those for $P^8_\alpha$ are comparable to $P^5_\alpha$. Of the optimizations $P^2_\alpha$, $P^5_\alpha$, $P^7_\alpha$ and $P^8_\alpha$, Rachev ratios are the lowest for $P^7_\alpha$, while Sharpe ratios are generally comparable.

Conclusions
It is well known that no single optimization method can achieve all goals. Compared with the ETL optimizations $P^0_\alpha$, which impose no performance attribution constraints, the price and risk measure performances of the eight performance-constrained $\mathrm{ETL}_\alpha$ optimizations considered in this study, each computed at the two values α = 0.95, 0.99, lead to the following observations.
• The eight optimizations $P^4_\alpha$, $P^5_\alpha$, $P^6_\alpha$, and $P^8_\alpha$, α = 0.95, 0.99, that seek to simultaneously constrain $AA_i$, $i = 1, \dots, M$, have the largest rates for which no optimizing solution is found in the feasible region.[10] For each risk measure (MDD, Sharpe ratio, Rachev ratio), five or more of these eight optimizations performed in the top 50%.
• The optimizations $P^1_\alpha$ and $P^3_\alpha$, α = 0.95, 0.99, which had no-solution rates below 0.2%, generally registered (at least three out of the four) in the bottom half of the performers for each risk measure.
• Of the remaining optimizations, $P^2_\alpha$ and $P^7_\alpha$, which had no-solution rates between 2.5% and 5.5%, $P^2_{0.95}$ and $P^2_{0.99}$ were in the top half of performers in terms of Rachev and Sharpe ratios, while $P^2_{0.95}$ and $P^7_{0.95}$ were in the top half of performers in terms of MDD.
• The fact that most of the optimized portfolios outperformed the Rachev ratio of the benchmark while none of them surpassed the Sharpe ratio of the benchmark may reflect the use of the portfolio standard deviation in the denominator of the Sharpe ratio, which penalizes improved positive returns as strongly as it penalizes negative returns.
These observations, combined with the strong price performance of $P^2_{0.95}$ and $P^2_{0.99}$, suggest the following conclusions from this study.
1. There is support for considering optimization using constraints on the total values of AA and SE; there is less support, in terms of both price and risk measure performance, for considering constraints on the combined selection-plus-interaction measure $\widetilde{SE}$.
2. Constraining SE and AA produces a larger price performance effect than constraining AA alone.
3. Replacement of the strong constraints $a_{1,i} \le AA_i \le b_{1,i}$ or $a_{2,i} \le SE_i \le b_{2,i}$, $i = 1, \dots, M$, in optimizations $P^4_\alpha$, $P^5_\alpha$, $P^6_\alpha$, and $P^8_\alpha$ by soft constraints, or by imposing strong $AA_i$ or $SE_i$ constraints only on a subset of the asset classes, might drastically reduce no-solution rates while retaining the high risk measure performance of these constrained optimizations.
4. The performance of optimizations $P^2_\alpha$, $P^5_\alpha$, $P^7_\alpha$, and $P^8_\alpha$ should be investigated further using a non-linear optimization solver to replace the two-step optimization used in this study.