In this section, we present numerical examples of a set of performance measures including the AS and FH performance measures and empirical examples of the DOW 30 stocks.
3.1. Numerical Examples of the Performance Measures
We first present numerical examples of a set of performance measures using random variables representing cash flows. Although Kadan and Liu (2014) presented empirical examples of the AS and FH performance measures for various financial assets, the characteristics of the two performance measures were not fully explained there. We intend to show the characteristics of the two performance measures in comparison with the traditional performance measures: the Sharpe ratio, Sortino ratio, and Calmar ratio. The Sortino and Calmar ratios are derived from the Sharpe ratio by replacing the standard deviation with another risk measure. The Sortino ratio uses the downside deviation, so that only downside risk is taken into account rather than both the downside and upside risk captured by the standard deviation. The Calmar ratio uses the maximum drawdown, where a drawdown is a peak-to-trough decline. Comparing the two new performance measures with these traditional ones shows how the new measures behave relative to the old ones.
We consider two sets of random variables, Example 1 and Example 2, given in Table 1 and Table 2. Example 1 consists of four cases; for each case, the values the random variable takes are given with the corresponding probabilities, where pr stands for probability. Mean, s.d., third, fourth, skew, kurt, downrisk, and maxdrawd denote respectively the mean, standard deviation, third central moment, fourth central moment, skewness, kurtosis, downside deviation, and maximum drawdown of each random variable. Similar numerical examples were examined in Hodoshima (2020a), where the traditional performance measures were compared with the AS performance measure.
Sharpe, Sortino, Calmar, AS, and FH denote respectively the Sharpe ratio, Sortino ratio, Calmar ratio, AS performance measure, and FH performance measure.
We estimate the two performance measures by the generalized method of moments (GMM) estimator, as described in Kadan and Liu (2014). In particular, we find the two performance measures via grid search for the solutions of the sample analogs of the implicit equations that define them. The GMM estimator is consistent and asymptotically normally distributed. The implicit equations are as follows. The AS performance measure of a gamble $g$ is given by $P^{AS}(g) = \alpha^{*}$, which is the unique positive solution $\alpha^{*}$ of the implicit equation
$$E\left[\exp(-\alpha^{*} g)\right] = 1.$$
On the other hand, the FH performance measure of a gamble $g$ is defined by $P^{FH}(g)$, which is the unique positive solution of the implicit equation
$$E\left[\log\left(1 + P^{FH}(g)\, g\right)\right] = 0.$$
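The search for the two solutions can be illustrated on a discrete gamble. The following is a minimal sketch, not the estimation code used in this paper: it assumes the standard forms of the implicit equations, E[exp(-a g)] = 1 for the AS measure and E[log(1 + b g)] = 0 for the FH measure, and it swaps the grid search for bisection. The function names and the two-point gamble are hypothetical.

```python
import math

def as_measure(values, probs, tol=1e-12):
    """Positive root a of E[exp(-a*g)] = 1 for a discrete gamble g.

    Assumes E[g] > 0 and at least one negative outcome, so the root is unique.
    """
    def f(a):
        # E[exp(-a*g)] - 1, written with expm1 for accuracy near a = 0
        return sum(p * math.expm1(-a * x) for x, p in zip(values, probs))

    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:          # expand until the convex function turns positive
        hi *= 2.0
    while hi - lo > tol * hi:    # bisection: f < 0 left of the root, f > 0 right of it
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def fh_measure(values, probs, tol=1e-12):
    """Positive root b of E[log(1 + b*g)] = 0; it lies below 1 / (maximal loss)."""
    max_loss = -min(values)      # assumes at least one negative outcome
    def h(b):
        return sum(p * math.log1p(b * x) for x, p in zip(values, probs))

    lo, hi = 0.0, 1.0 / max_loss
    while hi - lo > tol * hi:    # bisection: h > 0 left of the root, h < 0 right of it
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical gamble: gain 2 or lose 1 with equal probability.
values, probs = [2.0, -1.0], [0.5, 0.5]
as_value = as_measure(values, probs)   # analytically log((1 + sqrt(5)) / 2)
fh_value = fh_measure(values, probs)   # analytically 0.5
```

For sample data, the same functions apply with the observed returns as `values` and equal weights 1/n as `probs`, which gives the sample analogs of the two equations.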
In Example 1, a case with a higher number dominates a case with a lower number, since the former is larger than the latter with probability one. Hence, an appropriate performance measure ought to take a higher value in the case with a higher number, a property called monotonicity. The de facto industry standard performance measure, the Sharpe ratio, fails to provide a larger value in the case with a higher number. Therefore, the Sharpe ratio does not satisfy monotonicity, one of the most fundamental criteria for performance measures. On the other hand, the Sortino ratio and Calmar ratio both satisfy this criterion. However, the Calmar ratio increases sharply in the case with a higher number, where a large value replaces a small value of the case with a lower number with a small probability. Therefore, the Calmar ratio is too sensitive to an increase in a value that the random variable takes with a small probability. The Sortino ratio increases gradually in the case with a higher number, so it satisfies monotonicity in Example 1. The AS performance measure also provides a larger value in the case with a higher number than in the case with a lower number². We remark that the AS performance measures in cases 2–4 of Example 1 agree up to the third decimal place, but that the AS performance measure in the case with a higher number is in fact larger at the fourth or a later decimal place. The increase of the AS performance measure in the case with a higher number is very small, so the measure is not sensitive to gains of the underlying random variable. The FH performance measure increases more clearly than the AS performance measure from a case with a lower number to a case with a higher number. This implies that the FH performance measure is more sensitive to gains than the AS performance measure, which has never been mentioned in the literature. The magnitude of the FH performance measure is similar to that of the AS performance measure and the Sortino ratio.
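The monotonicity failure of the Sharpe ratio can be reproduced with a small hypothetical pair of gambles (these are not the cases of Table 1). In this sketch, gamble B is obtained from gamble A by raising one outcome, so B is never smaller than A. The Sortino ratio is computed with the downside deviation measured below a target of zero, which is an assumption since the target is not specified here.

```python
import math

def mean(values, probs):
    return sum(p * x for x, p in zip(values, probs))

def sharpe(values, probs):
    # Mean divided by standard deviation (values are treated as excess cash flows)
    m = mean(values, probs)
    var = sum(p * (x - m) ** 2 for x, p in zip(values, probs))
    return m / math.sqrt(var)

def sortino(values, probs, target=0.0):
    # Downside deviation: root mean squared shortfall below the target (assumed 0)
    down = sum(p * min(x - target, 0.0) ** 2 for x, p in zip(values, probs))
    return (mean(values, probs) - target) / math.sqrt(down)

probs = [0.1, 0.8, 0.1]
gamble_a = [-1.0, 1.0, 2.0]
gamble_b = [-1.0, 1.0, 20.0]   # same as A except that the best outcome is larger

sharpe_a, sharpe_b = sharpe(gamble_a, probs), sharpe(gamble_b, probs)
sortino_a, sortino_b = sortino(gamble_a, probs), sortino(gamble_b, probs)
print(f"Sharpe:  A = {sharpe_a:.3f}, B = {sharpe_b:.3f}")
print(f"Sortino: A = {sortino_a:.3f}, B = {sortino_b:.3f}")
```

In this hypothetical pair, the Sharpe ratio falls from about 1.29 to about 0.47 even though B is never worse than A, because the larger gain inflates the standard deviation faster than the mean. The Sortino ratio rises because the downside deviation is unchanged.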
In Example 2, the random variable in each case has a disaster outcome with a small probability, and the other cash flows occur with the remaining probability, in proportion to their probabilities in the corresponding case of Example 1. The disaster risk does not much affect the Sharpe ratio and Sortino ratio but significantly affects the Calmar ratio. However, the Calmar ratio is again seen to be too sensitive to an increase in a value that the random variable takes with a small probability in the case with a higher number. Hence, it is not a reliable performance measure. The AS performance measure in Example 2 is less than half of its value in Example 1. We can say the disaster risk has a large effect on the AS performance measure. Thus, the AS performance measure is sensitive to losses or the maximal loss of the underlying random variable. The disaster risk has an even more significant negative effect on the FH performance measure. In Table 2, 0+ denotes the limit from the right, i.e., the limit of sequences of positive numbers converging to zero. The FH performance measure becomes virtually zero, which is its lower bound, in every case in Example 2. Adding larger positive values in the case with a higher number does not affect the FH performance measure in Example 2. Therefore, the FH performance measure is virtually determined by disaster risk or the maximal loss. Hence, it is the measure most sensitive to disaster risk, which conforms to previous studies of the FH performance measure (cf. Foster and Hart 2009; Kadan and Liu 2014; Anand et al. 2016, 2017; Riedel and Hellmann 2015). Hence, the two newly introduced performance measures, AS and FH, are both quite sensitive to losses of the underlying random variable.
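The dominance of the maximal loss can be seen from the defining equation. Assuming the standard form E[log(1 + b g)] = 0, the logarithm is defined only if 1 + b g > 0 for every outcome, so the FH measure can never exceed the reciprocal of the maximal loss. The sketch below uses hypothetical numbers (not those of Table 2) and a bisection solver to show how a rare disaster caps the FH measure regardless of the gains.

```python
import math

def fh_measure(values, probs, tol=1e-12):
    """Positive root b of E[log(1 + b*g)] = 0 for a discrete gamble with E[g] > 0."""
    max_loss = -min(values)              # assumes at least one negative outcome
    def h(b):
        return sum(p * math.log1p(b * x) for x, p in zip(values, probs))

    lo, hi = 0.0, 1.0 / max_loss         # the root lies strictly inside (0, 1/max_loss)
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical base gamble without a disaster.
fh_base = fh_measure([2.0, -1.0], [0.5, 0.5])

# Add a hypothetical disaster of -100 with probability 0.001 and rescale the rest.
dis_probs = [0.4995, 0.4995, 0.001]
fh_dis = fh_measure([2.0, -1.0, -100.0], dis_probs)

# Multiplying the best outcome by 100 cannot lift the measure above 1 / 100.
fh_dis_big_gain = fh_measure([200.0, -1.0, -100.0], dis_probs)
```

In this hypothetical setting, the FH value falls from 0.5 to just below 0.01, the reciprocal of the disaster loss of 100, and a much larger best outcome cannot raise it above that bound.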
We introduce another example, Example 3 given in Table 3, to show the problematic nature of the traditional performance measures: the Sharpe ratio, Sortino ratio, and Calmar ratio. Example 3 has a loss that is not huge but is sizable compared to the disaster in Example 2, occurring with a small probability. It has four cases, where a case with a higher number is larger than a case with a lower number with probability one. The loss produces larger skewness and kurtosis in absolute value in Example 3 than in Example 2. The Sharpe ratio takes absurd values in Example 3, again failing to satisfy monotonicity. In Example 3, the Sortino ratio also fails to satisfy monotonicity. Therefore, we cannot always trust the Sortino ratio, because it can give irrational values depending on the underlying random variable. We can also observe that the Calmar ratio is again too sensitive to an increase in a value that the random variable takes with a small probability. Hence, we cannot trust the Calmar ratio as an appropriate performance measure. The FH performance measure is again 0+, the limit of sequences of positive numbers converging to zero, in every case of Example 3. Adding larger positive values does not change the FH performance measure in Example 3. Hence, the FH performance measure is again virtually determined by the disaster risk in Example 3. One may say this indicates that the FH performance measure is excessively sensitive to disaster risk. On the other hand, the AS performance measure provides larger scores in Example 3 than in Examples 1 and 2. This conforms to the behavior of the mean in the three examples, i.e., the mean in Example 3 is larger than in Examples 1 and 2. Therefore, the AS performance measure is sensitive to losses of the underlying random variable but, unlike the FH performance measure, not excessively sensitive to disaster risk. The AS performance measure is again insensitive to gains in Example 3. One may consider the AS performance measure, which is sensitive to losses but not excessively sensitive to disaster risk, more appropriate than the FH performance measure since it can provide assessments more often.
Although our numerical examples are limited, we can summarize our numerical comparisons as follows. Overall, the traditional performance measures (the Sharpe ratio, Sortino ratio, and Calmar ratio) are not reliable, since they either fail to satisfy monotonicity or sometimes give irrational evaluations depending on the underlying target in question. On the other hand, the AS and FH performance measures are reliable, i.e., they satisfy monotonicity, when they are well defined. However, the FH performance measure is excessively sensitive to the maximal loss, which can make it incapable of providing appropriate assessments. The AS performance measure can provide assessments more often than the FH performance measure: it is quite sensitive to losses but, unlike the FH performance measure, not excessively sensitive to the maximal loss. The AS performance measure is less sensitive to gains than the FH performance measure when both performance measures can provide assessments.
In the next subsection, we provide empirical examples to show how the AS and FH performance measures function when both can be computed. In particular, we show empirically how the FH performance measure performs when its assessment can be obtained, which was possible only in Example 1 among the three numerical examples in this subsection.
3.2. Empirical Results
In this subsection, we present empirical results of evaluations by the two new performance measures and the Sharpe ratio for the DOW 30 components³ as of 2 April 2019. The DOW 30 components are listed in Table 4. Our sample period for daily (monthly) return data is from 4 January 2005 (February 2005) to 30 December 2019 (December 2019)⁴. We use log-returns as stock returns in this paper. The same data were studied by Hodoshima and Yamawake (2020), where only the winners and losers among the DOW 30 components were described, without percentiles including the maximal loss. The current study, by contrast, focuses on the sensitivity of the new performance measures to disaster risk, and we describe the performance of the DOW 30 stocks only cursorily.
We present summary statistics of daily return data in Table 5 and percentiles, including the maximum and minimum, in Table 6. In the tables, s.d., max, min, 80%, 60%, 40%, and 20% denote respectively the standard deviation, maximum, minimum, 80th percentile, 60th percentile, 40th percentile, and 20th percentile. Mean in the last row of Table 5 and Table 6 denotes the mean of the summary statistics and percentiles over 29 stocks. The mean is lowest for Walgreens and highest for Apple. The standard deviation is lowest for Johnson & Johnson and highest for JP Morgan. Skewness shows that 11 stocks are negatively skewed and 18 stocks are positively skewed. Kurtosis shows that all the data have heavy tails compared to the normal distribution. The extreme values of the maximum are attained by McDonald’s and UnitedHealth, and those of the minimum by Procter & Gamble and JP Morgan. These minimum values are daily returns and hence represent quite disastrous losses. The four other percentiles are given in Table 6, listed from larger values (80th percentile) to smaller values (20th percentile).
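Summary statistics of the kind reported in Table 5 and Table 6 can be computed along the following lines. This is a sketch on a short hypothetical price series, not the DOW 30 data. The population form of the moments, the raw kurtosis convention (under which the normal distribution gives 3), and the percentile interpolation rule are assumptions, since none of them is specified here.

```python
import math
import statistics

def log_returns(prices):
    """Log-returns log(P_t / P_{t-1}) of a price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]

def summary(returns):
    n = len(returns)
    m = statistics.fmean(returns)
    sd = statistics.pstdev(returns)  # population form (divides by n), an assumption
    skew = sum((r - m) ** 3 for r in returns) / (n * sd ** 3)
    kurt = sum((r - m) ** 4 for r in returns) / (n * sd ** 4)  # normal gives 3
    # With n=5, quantiles returns the 20%, 40%, 60%, and 80% cut points.
    p20, p40, p60, p80 = statistics.quantiles(returns, n=5, method="inclusive")
    return {"mean": m, "s.d.": sd, "skew": skew, "kurt": kurt,
            "max": max(returns), "80%": p80, "60%": p60,
            "40%": p40, "20%": p20, "min": min(returns)}

prices = [100.0, 101.0, 99.0, 102.0, 97.0, 103.0, 104.0]  # hypothetical prices
returns = log_returns(prices)
stats = summary(returns)
```

Because log-returns add over time, a monthly log-return is the sum of the daily log-returns within the month, so the same `summary` function applies to both frequencies.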
Table 7 presents the three performance measures, i.e., the AS and FH performance measures and the Sharpe ratio, for daily return data. We do not report the Sortino ratio and Calmar ratio in this subsection, since our focus is on the two new performance measures and the Sharpe ratio is the de facto industry standard against which to compare them. Mean in the last row of Table 7 denotes the mean of the performance measures over 29 stocks. We can obtain the AS performance measure for every stock among the DOW 30 stocks. In other words, computing the AS performance measure poses no difficulty in representative real daily stock data. The AS performance measure is lowest for Goldman Sachs and highest for McDonald’s. The AS performance measure scores are much smaller than those in the numerical examples of the previous subsection, although in some stocks the maximal loss is larger than in the numerical examples. The difference between the AS performance measure and the Sharpe ratio is large for outperforming stocks but small for underperforming stocks.
The FH performance measure is generally similar to the AS performance measure. Example 1 in the previous subsection shows that the FH performance measure is similar to the AS performance measure when there is no disaster risk, which may explain the similarity here, although the minimum values in Table 6 show the existence of severe negative returns in many stocks. These severe negative returns seem to have downward effects on the FH performance measure as well as on the AS performance measure in daily data. The two performance measures are substantially smaller than those in Example 1 and than the AS performance measure in Examples 2 and 3 of the previous subsection.
Table 8 presents summary statistics of monthly return data for the DOW 30 stocks. Mean in the last row of Table 8 denotes the mean of the summary statistics over 29 stocks. The mean ranges from its minimum in Walgreens to its maximum in Apple in monthly data, which is the same as in daily data. In the table, mean* denotes the monthly mean implied by the daily mean, i.e., the daily mean multiplied by the number of trading days in a month, to which the monthly mean should be close if daily returns follow identical distributions. Similarly, s.d.* denotes the monthly standard deviation implied by the daily standard deviation, i.e., the daily standard deviation multiplied by the square root of that number, to which the monthly standard deviation should be close if daily returns follow independent and identical distributions. The standard deviation is lowest for Johnson & Johnson and highest for Apple. The standard deviation of Johnson & Johnson is also the minimum in daily data.
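The implied monthly values mean* and s.d.* follow from the additivity of log-returns. The sketch below assumes q trading days per month with q = 30 × 5/7, the convention given later in this section for the Sharpe ratio. The function name and the daily inputs are hypothetical.

```python
import math

DAYS_PER_MONTH = 30 * 5 / 7   # assumed average number of trading days in a month

def implied_monthly(daily_mean, daily_sd, q=DAYS_PER_MONTH):
    """Monthly mean and s.d. implied by daily log-returns.

    The mean scales by q under identical distributions; the standard
    deviation scales by sqrt(q) only if returns are also independent.
    """
    return q * daily_mean, math.sqrt(q) * daily_sd

# Hypothetical daily mean and standard deviation of log-returns.
mean_star, sd_star = implied_monthly(0.0004, 0.015)
```

A large gap between s.d.* and the observed monthly standard deviation would point to serial dependence or changing volatility in daily returns, since the square-root rule relies on independence.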
Skewness is negative for all stocks in monthly data except American Express. Negative skewness indicates that we may expect frequent small gains and a few large losses. There are only four companies (3M, American Express, Cisco Systems, and Walgreens) for which skewness is larger in monthly data than in daily data; for the remaining companies, skewness is more negative in monthly data than in daily data. Kurtosis shows that, except for American Express, the DOW components have tails closer to those of the normal distribution in monthly data than in daily data.
Table 9 presents percentiles of monthly return data for the DOW 30 stocks. Mean in the last row of Table 9 denotes the mean of the percentiles over 29 stocks. The extreme values of the maximum are attained by Johnson & Johnson and American Express, and those of the minimum by Caterpillar and McDonald’s. The range of returns is wider in monthly data than in daily data. Consequently, most of the summary statistics are larger in absolute value in monthly data than in daily data, except kurtosis and the 40th percentile. This applies to the maximum and minimum as well. Hence, the minimum shows more severe disastrous observations in monthly data than in daily data. Since the AS and FH performance measures are sensitive to losses but insensitive to gains, as we saw in the numerical examples in the previous subsection, negative observations have disproportionately larger adverse effects on the two performance measures than positive observations of equal absolute value.
Table 10 presents the three performance measures of monthly return data for the DOW 30 stocks. The Sharpe ratio is high in monthly data for some stocks such as Visa, McDonald’s, Nike, and Apple. The Sharpe ratio in monthly data is much higher than that in daily data. This is natural, since the monthly Sharpe ratio is close to $\sqrt{30 \times 5/7}$ times the daily Sharpe ratio if daily returns are independently and identically distributed (cf. Lo 2002), where 30 denotes the average number of days in a month, 7 denotes the number of days in a week, and 5 denotes the number of weekdays in a week. On the other hand, the AS and FH performance measures in monthly data are much closer to those in daily data. They are nearly closed under temporal aggregation for some stocks, i.e., they take nearly the same values regardless of data frequency (cf. Hodoshima 2020b).
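The contrast between the Sharpe ratio and the AS performance measure can be made precise under the i.i.d. assumption. As a sketch, let $r_1, \dots, r_q$ be i.i.d. daily log-returns with mean $\mu$ and standard deviation $\sigma$, and let $R = r_1 + \dots + r_q$ be the return over $q$ days, where $q$ is an assumed number of trading days per month. Assuming the standard implicit equation $E[\exp(-\alpha g)] = 1$ for the AS measure:

```latex
\begin{align*}
\frac{E[R]}{\sqrt{\operatorname{Var}(R)}}
  &= \frac{q\mu}{\sqrt{q}\,\sigma}
   = \sqrt{q}\,\frac{\mu}{\sigma}, \\
E\left[e^{-\alpha R}\right]
  &= \prod_{i=1}^{q} E\left[e^{-\alpha r_i}\right]
   = \left(E\left[e^{-\alpha r_1}\right]\right)^{q}.
\end{align*}
```

The second line equals 1 exactly when $E[e^{-\alpha r_1}] = 1$, so under i.i.d. returns the AS measure of the $q$-day return coincides with that of the daily return, whereas the Sharpe ratio grows by the factor $\sqrt{q}$. With $q = 30 \times 5/7$, this factor is about 4.63.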
The AS performance measure is lowest for Walgreens and highest for McDonald’s, which also attains the maximum in daily data. McDonald’s is by far the best by the AS performance measure but is rated only second best, next to Visa, by the Sharpe ratio. The difference between the AS performance measure and the Sharpe ratio is large for outperforming stocks but small for underperforming stocks.
The FH performance measure is small compared to the AS performance measure. The companies with high AS performance measure scores in monthly data mentioned above all have FH performance measures considerably smaller than the corresponding AS performance measures. This is in contrast to the result in daily data. As we saw in the previous subsection, the FH performance measure is much more sensitive to losses of the underlying stock; we consider that the lower FH performance measure scores in monthly data are due to the larger losses in monthly stock returns. Therefore, our result indicates that the FH performance measure is more sensitive to losses, i.e., to negative skewness or the left tail of the underlying distribution, than the AS performance measure.