Article

# Adjusted Empirical Likelihood Method in the Presence of Nuisance Parameters with Application to the Sharpe Ratio

Department of Mathematics and Statistics, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada
*
Author to whom correspondence should be addressed.
Entropy 2018, 20(5), 316; https://doi.org/10.3390/e20050316
Received: 24 February 2018 / Revised: 11 April 2018 / Accepted: 11 April 2018 / Published: 25 April 2018

## Abstract

The Sharpe ratio is a widely used risk-adjusted performance measurement in economics and finance. Most of the known statistical inferential methods devoted to the Sharpe ratio are based on the assumption that the data are normally distributed. In this article, without making any distributional assumption on the data, we develop the adjusted empirical likelihood method to obtain inference for a parameter of interest in the presence of nuisance parameters. We show that the log adjusted empirical likelihood ratio statistic is asymptotically distributed as the chi-square distribution. The proposed method is applied to obtain inference for the Sharpe ratio. Simulation results illustrate that the proposed method is comparable to Jobson and Korkie’s method (1981) and outperforms the empirical likelihood method when the data are from a symmetric distribution. In addition, when the data are from a skewed distribution, the proposed method significantly outperforms all other existing methods. A real-data example is analyzed to exemplify the application of the proposed method.

## 1. Introduction

In financial economics, the Sharpe ratio, defined in [1], provides a measure of a fund’s excess return relative to its volatility. Let $\mu$ be the expected return of an asset, and $\sigma$ be the corresponding standard deviation. The Sharpe ratio is defined as
$$sr = \frac{\mu - R_f}{\sigma},$$
where $R_f$ is a known risk-free rate of return. Note that the larger the Sharpe ratio is, the more return the investor receives per unit of risk. Reporting the Sharpe ratio is a standard convention in economics and finance research. The Sharpe ratio is therefore well studied as a measure of mutual fund performance in areas of financial economics such as portfolio analysis, the pricing of capital assets under conditions of risk, and the general behavior of stock market prices. The popularity of the Sharpe ratio in financial economics comes not only from its simplicity; its study also leads directly to a deeper understanding of portfolio selection. Assuming that the asset returns are all normally distributed, Sharpe [1] showed that picking the asset with the largest Sharpe ratio is equivalent to solving the investor’s expected utility problem.
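As a small illustration of the definition above, the following Python sketch computes the plug-in (sample) Sharpe ratio. The function name `sample_sharpe_ratio`, the choice of the $n-1$ divisor for the variance, and the toy returns are assumptions made for this example; they are not taken from the article.

```python
import math

def sample_sharpe_ratio(returns, risk_free=0.0):
    """Plug-in estimate of (mu - R_f) / sigma from a list of returns."""
    n = len(returns)
    mean = sum(returns) / n
    # Assumption: sample variance with the n - 1 divisor.
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return (mean - risk_free) / math.sqrt(var)

# Hypothetical returns, for illustration only.
toy_returns = [0.02, -0.01, 0.03, 0.00, 0.01]
print(sample_sharpe_ratio(toy_returns, risk_free=0.001))
```

A larger value means more excess return per unit of volatility, which is the interpretation given in the text.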
Under the normality assumption, Jobson and Korkie [2] proposed a parametric test for the Sharpe ratio, which is a very popular inferential method in economics and finance. However, as shown by many researchers [3,4,5,6], it is very common for the actual returns of investments, such as hedge funds, to have a skewed distribution. When the normality assumption on the investment returns is violated, the commonly used approximate distributions of the Sharpe ratio, which are developed under that assumption, become problematic. Model misspecification is a major concern for all parametric approaches, since a misspecified model may lead to biased results. Since the Sharpe ratio involves only the first two moments of the data, one line of research attempts to resolve the problem by considering higher-order moments. There is abundant literature along this line, such as [7,8,9,10] and the references therein.
Another line of research on the problem uses the nonparametric approach. In this article, we adopt the empirical likelihood (EL) method. An empirical likelihood-type method was first used by Thomas and Grunkemeier [11] to study survival probabilities estimated by the Kaplan–Meier curve. Owen [12,13] formalized the EL as a unified inference method under more general settings. The EL-based confidence region has several beneficial properties: it does not impose prior constraints on the region shape, and it is transformation invariant and Bartlett correctable. Qin and Lawless applied the EL to inference on parameters that are defined through estimating equations. When the sample size is small and/or the dimension of the estimating equations is high, the EL approach can be hindered by an empty set problem and an under-coverage problem. In order to resolve the empty set problem and improve the coverage probability of tests based on the ordinary EL method, Chen et al. proposed the adjusted empirical likelihood (AEL) method, which adds one artificial point to the data set. However, that work considered only problems without nuisance parameters. In this article, we focus on the AEL method with nuisance parameters in addition to the parameter of primary interest. We develop the asymptotic theory of the AEL method when nuisance parameters exist, and demonstrate the use of the AEL method in an application to the Sharpe ratio. Our simulation studies show that the proposed approach provides a robust alternative for inference on the Sharpe ratio. The proposed AEL method is comparable to Jobson and Korkie’s method [2] and outperforms the EL method when the data are from a symmetric distribution, while for data generated from a skewed distribution, the proposed method outperforms all other existing methods, especially for small sample sizes.
The AEL method preserves the advantage of the EL method: the shape of confidence region based on the AEL ratio reflects the observed data set, while the confidence region based on other methods (excluding EL) is always symmetric about the point estimator. Therefore, the AEL approach allows the data to speak for themselves, and is robust against model mis-specification.
The rest of the article is organized as follows. A brief introduction to the EL and AEL methodologies is given in Section 2. In Section 3, we study the asymptotic property of the AEL method with nuisance parameters. In Section 4, simulation studies are conducted to investigate the precision of the coverage probabilities in the context of the Sharpe ratio. In Section 5, a real-data example is analyzed to illustrate the application of the proposed method. Some concluding remarks are given in Section 6. The technical details are presented in the Appendix.

## 2. Review of the Empirical Likelihood and the Adjusted Empirical Likelihood Methods

Let $X_1, X_2, \ldots, X_n \in \mathbb{R}^d$ be independent and identically distributed random vectors following a distribution $F$ with mean $\mu$ and a nonsingular covariance matrix. The corresponding observed values are denoted by $x_1, x_2, \ldots, x_n$. The EL function for the population distribution $F$ is given by
$$L(F) = \prod_{i=1}^{n} F(\{x_i\}),$$
where $F(\{x_i\})$ is the probability of observing the value $x_i$ in a sample from $F$. Denote $p_i = F(\{x_i\})$. The EL function can also be written as
$$L(F) = \prod_{i=1}^{n} p_i.$$
Clearly, we have $0 \le p_i \le 1$ and $\sum_{i=1}^{n} p_i = 1$. Suppose that the goal is to construct a confidence region for the mean $\mu$. The profile EL function of $\mu$ is defined to be
$$L_{EL}(\mu) = \sup\left\{ \prod_{i=1}^{n} p_i : p_i \ge 0,\ i = 1, \ldots, n;\ \sum_{i=1}^{n} p_i = 1;\ \sum_{i=1}^{n} p_i x_i = \mu \right\}.$$
Qin and Lawless showed that extra information in the form of a set of estimating equations can be used to improve the maximum empirical likelihood estimators (MELE) and the EL ratio confidence intervals. Suppose a $k$-dimensional parameter $\theta$ is associated with $F$ via a vector $g(x, \theta)$ of $r \ge k$ functionally independent unbiased estimating functions. Then for each $j = 1, 2, \ldots, r$, we have an estimating equation $E_F\{g_j(x, \theta)\} = 0$, which can be written in vector form as $E_F\{g(x, \theta)\} = 0$. The profile EL function of $\theta$ is
$$L_{EL}(\theta) = \sup\left\{ \prod_{i=1}^{n} p_i : p_i \ge 0,\ i = 1, \ldots, n;\ \sum_{i=1}^{n} p_i = 1;\ \sum_{i=1}^{n} p_i\, g(x_i, \theta) = 0 \right\},$$
and hence, the profile log-EL function is
$$l_{EL}(\theta) = \sup\left\{ \sum_{i=1}^{n} \log p_i : p_i \ge 0,\ i = 1, \ldots, n;\ \sum_{i=1}^{n} p_i = 1;\ \sum_{i=1}^{n} p_i\, g(x_i, \theta) = 0 \right\}. \tag{3}$$
The constrained optimization problem in (3) can be solved by applying the method of Lagrange multipliers. Let $\lambda$ and $t = (t_1, \ldots, t_r)^\tau$ be Lagrange multipliers and define
$$H = \sum_{i} \log p_i + \lambda\Bigl(1 - \sum_{i} p_i\Bigr) - n\, t^\tau \sum_{i} p_i\, g(x_i, \theta). \tag{4}$$
Then maximizing (3) is equivalent to maximizing $H$ without constraints. Setting the first partial derivative of (4) with respect to $p_i$ equal to 0, we have
$$\frac{\partial H}{\partial p_i} = \frac{1}{p_i} - \lambda - n\, t^\tau g(x_i, \theta) = 0,$$
$$\sum_{i=1}^{n} p_i \frac{\partial H}{\partial p_i} = n - \lambda = 0 \ \Rightarrow\ \lambda = n$$
and
$$\hat{p}_i = \frac{1}{n\,[1 + t^\tau g(x_i, \theta)]},$$
where $t$ can be expressed as a function of $\theta$ by solving the following equations
$$\sum_{i=1}^{n} \hat{p}_i\, g(x_i, \theta) = 0. \tag{5}$$
Now the profile log-EL function can be written as
$$l_{EL}(\theta) = -\sum_{i=1}^{n} \log\bigl[1 + t^\tau g(x_i, \theta)\bigr] - n \log n. \tag{6}$$
Note that (5) can be rewritten as
$$\sum_{i=1}^{n} \frac{g(x_i, \theta)}{1 + t^\tau g(x_i, \theta)} = 0. \tag{7}$$
Maximizing (3) has thus been reduced to solving (7) for the Lagrange multiplier $t$. In practice, this is achieved by numerical methods. One such algorithm devoted to this end can be found in . A necessary and sufficient condition for the existence of a solution $\tilde{t} = \tilde{t}(\theta)$ in (7) is that 0 must be an inner point of the convex hull spanned by $\{ g(x_i, \theta),\ i = 1, 2, \ldots, n \}$.
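To make the numerical step concrete, here is a minimal Python sketch for the scalar case $r = 1$, where (7) has a single unknown $t$. It is not the algorithm referenced in the text: it uses plain bisection, which works because the left-hand side of (7) is strictly decreasing in $t$ on the interval where all implied weights are positive. The function name `el_statistic_scalar` is hypothetical, and the returned statistic is $2\sum_i \log(1 + t\,g_i)$, which equals the EL ratio statistic when $r = k$.

```python
import math

def el_statistic_scalar(g, iterations=200):
    """Solve sum(g_i / (1 + t*g_i)) = 0 for scalar g_i; return (t, statistic)."""
    g_min, g_max = min(g), max(g)
    if not (g_min < 0.0 < g_max):
        # Zero is not an inner point of the convex hull: no solution exists.
        return None, math.inf
    # Positive weights require 1 + t*g_i > 0, i.e. t in (-1/g_max, -1/g_min).
    lo, hi = -1.0 / g_max, -1.0 / g_min
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        score = sum(gi / (1.0 + mid * gi) for gi in g)
        if score > 0.0:   # the score decreases in t, so the root lies to the right
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    return t, 2.0 * sum(math.log(1.0 + t * gi) for gi in g)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
for theta in (3.0, 2.5, 6.0):          # g(x, theta) = x - theta for a mean
    print(theta, el_statistic_scalar([x - theta for x in data]))
```

The last call illustrates the convex hull condition: for a hypothesized mean outside the range of the data, no multiplier exists and the sketch reports an infinite statistic.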
Qin and Lawless further showed that under some regularity conditions, the EL ratio statistic $W_0(\theta_0) = 2[\,l_{EL}(\tilde{\theta}) - l_{EL}(\theta_0)\,]$ converges to $\chi_k^2$ in distribution as the sample size $n$ approaches infinity. This result is the foundation for hypothesis tests on $\theta$ and can be used to construct an approximate $100(1-\alpha)\%$ confidence region for $\theta$,
$$I_{EL} = \{ \theta : W_0(\theta) \le \chi_k^2(1-\alpha) \},$$
where $\chi_k^2(1-\alpha)$ is the $100(1-\alpha)\%$ quantile of the $\chi_k^2$ distribution, and $\alpha$ is a pre-specified significance level.
Under mild conditions, the convex hull of $\{ g(x_i, \theta),\ i = 1, 2, \ldots, n \}$ contains 0 as an inner point with probability tending to 1 as $n \to \infty$. However, if $\theta$ is not close to the true parameter $\theta_0$ or when the sample size $n$ is small, the convex hull is not guaranteed to contain 0. Thus, there is a nonzero probability that the solution to (7) does not exist. This causes computational issues when solving the constrained optimization problem in the definition of the EL function. It is known as the empty set problem or the convex hull problem in the EL literature.
In order to resolve the convex hull problem, Chen et al. proposed the AEL method, which adds one artificial point to the data set. Denote
$$g_i = g_i(\theta) = g(x_i, \theta)$$
and
$$\bar{g}_n = \bar{g}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} g_i.$$
Let $a_n = o(n)$ be a given positive constant. Define a new point by
$$g_{n+1} = g_{n+1}(\theta) = -\frac{a_n}{n} \sum_{i=1}^{n} g_i = -a_n\, \bar{g}_n.$$
Similar to (2), the profile log-AEL function is defined as
$$l_{AEL}(\theta) = \sup\left\{ \sum_{i=1}^{n+1} \log[(n+1) p_i] : p_i \ge 0,\ i = 1, \ldots, n+1;\ \sum_{i=1}^{n+1} p_i = 1;\ \sum_{i=1}^{n+1} p_i\, g_i = 0 \right\},$$
and we have
$$l_{AEL}(\theta) = -\sum_{i=1}^{n+1} \log\bigl[1 + t^\tau g_i(\theta)\bigr],$$
where $t$ satisfies
$$\sum_{i=1}^{n+1} \frac{g_i(\theta)}{1 + t^\tau g_i(\theta)} = 0.$$
The introduction of $g_{n+1}$ guarantees that this adjusted analogue of (7) has a solution for $t$. Let the maximum AEL estimator $\tilde{\theta}$ be the maximizer of $l_{AEL}(\theta)$. Under mild regularity conditions, the AEL ratio statistic $W(\theta_0) = 2[\,l_{AEL}(\tilde{\theta}) - l_{AEL}(\theta_0)\,]$ converges to $\chi_k^2$ in distribution as the sample size $n$ approaches infinity. Chen et al. showed that statistical tests based on the AEL method give better coverage probabilities than those obtained by the original EL method.
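The effect of the artificial point can be seen in a scalar sketch. The Python code below appends $g_{n+1} = -a_n \bar{g}_n$ and then solves the adjusted equation by bisection. The function names are hypothetical, bisection is a choice made for this illustration rather than the authors' algorithm, and the natural logarithm is assumed in the default $a_n = \tfrac{1}{2}\log n$. The returned value is $-2\,l_{AEL}(\theta)$, which equals the AEL ratio statistic when $r = k$.

```python
import math

def solve_multiplier(g, iterations=200):
    """Bisection for t in sum(g_i / (1 + t*g_i)) = 0; needs min(g) < 0 < max(g)."""
    lo, hi = -1.0 / max(g), -1.0 / min(g)
    pad = 1e-10 * (hi - lo)
    lo, hi = lo + pad, hi - pad
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(gi / (1.0 + mid * gi) for gi in g) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ael_statistic_scalar(g, a_n=None):
    """Return -2 * l_AEL for scalar estimating-function values g_1, ..., g_n."""
    n = len(g)
    if a_n is None:
        a_n = 0.5 * math.log(n)        # adjustment level; natural log assumed
    if a_n <= 0.0:
        raise ValueError("a_n must be positive")
    g_bar = sum(g) / n
    if g_bar == 0.0:
        return 0.0                     # t = 0 already solves the equation
    g_aug = list(g) + [-a_n * g_bar]   # artificial point g_{n+1}
    t = solve_multiplier(g_aug)
    return 2.0 * sum(math.log(1.0 + t * gi) for gi in g_aug)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
# A hypothesized mean of 6 lies outside the data range, yet the statistic is finite.
print(ael_statistic_scalar([x - 6.0 for x in data]))
```

Because $g_{n+1}$ always has the opposite sign to $\bar{g}_n$, zero lies strictly between the smallest and largest augmented values, so the bisection bracket is always valid.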
In this article, we propose using the AEL method to conduct inference on the Sharpe ratio. Suppose the data are from a population with mean $\mu$ and variance $\sigma^2$. Without loss of generality (the known risk-free rate $R_f$ can be subtracted from the returns), for the rest of this article, define the Sharpe ratio of the population as
$$sr = \frac{\mu}{\sigma}.$$
In this case, the parameter vector is $\theta = (\mu, \sigma^2)$, and the parameter of interest is $sr$. The set of estimating functions can either be
$$X - \mu \quad \text{and} \quad (X - \mu)^2 - \left(\frac{\mu}{sr}\right)^2$$
or
$$X - \sigma \cdot sr \quad \text{and} \quad (X - \sigma \cdot sr)^2 - \sigma^2,$$
which have $\mu$ or $\sigma$ as the nuisance parameter, respectively. Chen et al. discussed AEL-based inference without nuisance parameters. Building upon [15,16], we develop the convergence theorem for the AEL with nuisance parameters, as shown in the next section.
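The two parameterizations can be written down directly. The Python sketch below is illustrative only; the function names are hypothetical. It also checks that both sets of estimating functions average to zero at the sample moments, using the divisor-$n$ variance.

```python
import math

def g_mu_nuisance(x, sr, mu):
    """Estimating functions with mu as the nuisance parameter (sigma = mu / sr)."""
    return (x - mu, (x - mu) ** 2 - (mu / sr) ** 2)

def g_sigma_nuisance(x, sr, sigma):
    """Estimating functions with sigma as the nuisance parameter (mu = sigma * sr)."""
    return (x - sigma * sr, (x - sigma * sr) ** 2 - sigma ** 2)

def average(pairs):
    """Componentwise mean of a list of 2-tuples."""
    n = len(pairs)
    return (sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n)

data = [1.0, 2.0, 3.0, 4.0, 5.0]           # hypothetical observations
mean = sum(data) / len(data)
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
sr_hat = mean / sd

print(average([g_mu_nuisance(x, sr_hat, mean) for x in data]))
print(average([g_sigma_nuisance(x, sr_hat, sd) for x in data]))
```

Both averages vanish up to rounding error, which is why the plug-in values solve the estimating equations in this just-identified setting.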

## 3. The Adjusted Empirical Likelihood Method in the Presence of Nuisance Parameters

Suppose a $k$-dimensional parameter $\theta = (\theta_1, \theta_2)$ consists of a $q$-dimensional parameter of interest $\theta_1$ as well as a $(k-q)$-dimensional nuisance parameter $\theta_2$. The goal is to test $H_0 : \theta_1 = \theta_1^0$ for some given $\theta_1^0$. In order to obtain inference for $\theta_1$ using the AEL method, the asymptotic results in  need to be reconstructed and extended to the situation with nuisance parameters.
First, we develop a lemma about positive definite matrices. If a matrix $M$ is positive semidefinite, we denote it by $M \ge 0$; if $M$ is positive definite, we write $M > 0$. For any matrices $G$ and $H$, let $G \ge H$ denote that $G - H$ is positive semidefinite, and let $G > H$ denote that $G - H$ is positive definite.
Lemma 1.
Let $M$ be a $k \times k$ symmetric positive definite block matrix of the form
$$M = \begin{pmatrix} A & B \\ B^\tau & C \end{pmatrix},$$
where $A$ is a $q \times q$ matrix, $B$ is a $q \times (k-q)$ matrix, and $C$ is a $(k-q) \times (k-q)$ matrix. Then $C$ is positive definite and
$$\begin{pmatrix} A & B \\ B^\tau & C \end{pmatrix}^{-1} \ge \begin{pmatrix} 0 & 0 \\ 0 & C^{-1} \end{pmatrix}.$$
The proof of the above lemma is given in the Appendix. In order to prove the main theorem, we also need the following two results about idempotent matrices. The proof of these two results can be found in  (pp. 186–187).
Result 1.
A necessary and sufficient condition that $Y' A Y$ has a $\chi^2$ distribution is that $A$ is idempotent, that is, $A^2 = A$, in which case the degrees of freedom of $\chi^2$ is rank $A$ = trace $A$.
Result 2.
If $A$, $B$, $A - B$ are matrices of non-negative quadratic forms and $A$ and $B$ are idempotent, then $A - B$ is also idempotent.
Based on Lemma 1 and the above two results, we have the following theorem, which gives the asymptotic properties of the AEL ratio test statistic. The theorem is a nonparametric analogue of Wilks’ theorem on the asymptotic distribution of the likelihood ratio. The difference is that Wilks’ theorem is based on parametric likelihood and ours is based on the adjusted empirical likelihood. Moreover, our theorem takes nuisance parameters into consideration. We follow the idea of profiling out nuisance parameters (Corollary 5 in  and Corollary 1 in ) to perform the AEL ratio test. The proof of the theorem is provided in the Appendix.
Theorem 1.
Let $\theta^\tau = (\theta_1, \theta_2)^\tau$, where $\theta_1$ and $\theta_2$ are $q \times 1$ and $(k-q) \times 1$ vectors, respectively. For $H_0 : \theta_1 = \theta_1^0$, the profile AEL ratio test statistic is
$$W(\theta_1^0) = 2\bigl[\,l_{AEL}(\tilde{\theta}_1, \tilde{\theta}_2) - l_{AEL}(\theta_1^0, \tilde{\theta}_2^0)\,\bigr],$$
where $\tilde{\theta}^\tau = (\tilde{\theta}_1, \tilde{\theta}_2)^\tau$ maximizes $l_{AEL}(\theta) = l_{AEL}(\theta_1, \theta_2)$, and $\tilde{\theta}_2^0$ maximizes $l_{AEL}(\theta_1^0, \theta_2)$ with respect to $\theta_2$. Under $H_0$, $W(\theta_1^0) \xrightarrow{d} \chi_q^2$ as $n \to \infty$.
It is worth noting that Theorem 1 holds as long as $a_n = o_p(n)$. In applications, an $a_n$ of higher order is usually not recommended, since the AEL ratio is a decreasing function of the adjustment level $a_n$. As suggested by Chen et al., we set $a_n = \frac{1}{2} \log n$ for all of the simulations and applications unless otherwise specified.
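For a sense of scale, the short Python sketch below evaluates $a_n = \tfrac{1}{2}\log n$ at the sample sizes used later in the simulation study, together with the ratio $a_n / n$ that must vanish for the theorem to apply. The natural logarithm is an assumption here, and the function name `adjustment_level` is hypothetical.

```python
import math

def adjustment_level(n):
    """a_n = (1/2) * log(n); the natural logarithm is assumed."""
    return 0.5 * math.log(n)

for n in (20, 50, 200, 500):
    a_n = adjustment_level(n)
    print(n, round(a_n, 4), round(a_n / n, 5))
```

The level grows very slowly with $n$, so the artificial point carries little weight relative to the data even at small sample sizes.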
In Theorem 1, $\theta_1$ is the parameter of interest and $\theta_2$ is treated as the nuisance parameter. We can therefore apply the theorem to the Sharpe ratio by setting $\theta_1 = sr$ along with $\theta_2 = \mu$ or $\theta_2 = \sigma^2$. The AEL ratio statistic under the null hypothesis $H_0 : sr = sr_0$ can then be either
$$W(sr_0) = 2\bigl[\,l_{AEL}(\widetilde{sr}, \tilde{\mu}) - l_{AEL}(sr_0, \tilde{\mu}_0)\,\bigr], \tag{12}$$
or
$$W(sr_0) = 2\bigl[\,l_{AEL}(\widetilde{sr}, \tilde{\sigma}^2) - l_{AEL}(sr_0, \tilde{\sigma}_0^2)\,\bigr]. \tag{13}$$
Our simulations show that using (12) or (13) as the AEL ratio statistic does not make any significant difference in the inference on $sr$.
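The following Python sketch shows one way a statistic of the form (12) could be computed, with $\mu$ profiled out. It is a rough illustration under several assumptions, not the authors' implementation (which was written in R): the inner maximization over $t$ uses damped Newton steps, the nuisance parameter is searched by golden section over a bracket of four standard errors around the sample mean, $sr_0$ and the sample mean are assumed positive, and the first term of (12) is taken to be zero because the problem is just-identified. All function names are hypothetical.

```python
import math

def max_log_sum(G, max_iter=100):
    """Maximise f(t) = sum_i log(1 + t.g_i) over t in R^2 by damped Newton steps."""
    def f(a, b):
        total = 0.0
        for g0, g1 in G:
            w = 1.0 + a * g0 + b * g1
            if w <= 0.0:
                return -math.inf           # an implied weight would be non-positive
            total += math.log(w)
        return total

    t0 = t1 = 0.0
    best = 0.0                             # f(0, 0) = 0
    for _ in range(max_iter):
        d0 = d1 = h00 = h01 = h11 = 0.0    # gradient and negated Hessian
        for g0, g1 in G:
            w = 1.0 + t0 * g0 + t1 * g1
            d0 += g0 / w
            d1 += g1 / w
            h00 += g0 * g0 / (w * w)
            h01 += g0 * g1 / (w * w)
            h11 += g1 * g1 / (w * w)
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        s0 = (h11 * d0 - h01 * d1) / det   # Newton direction
        s1 = (h00 * d1 - h01 * d0) / det
        slope = d0 * s0 + d1 * s1
        if slope < 1e-14:
            break                          # gradient is numerically zero
        step = 1.0
        while step > 1e-10 and f(t0 + step * s0, t1 + step * s1) < best + 1e-4 * step * slope:
            step *= 0.5                    # backtrack until sufficient increase
        if step <= 1e-10:
            break
        t0, t1 = t0 + step * s0, t1 + step * s1
        best = f(t0, t1)
    return best

def profile_ael_statistic(x, sr0, a_n=None):
    """Sketch of W(sr0): minimise -2 * l_AEL(sr0, mu) over the nuisance mu."""
    n = len(x)
    if a_n is None:
        a_n = 0.5 * math.log(n)
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    if sr0 <= 0.0 or mean <= 0.0:
        raise ValueError("this sketch assumes sr0 > 0 and a positive sample mean")

    def stat(mu):
        sigma2 = (mu / sr0) ** 2
        G = [(xi - mu, (xi - mu) ** 2 - sigma2) for xi in x]
        gb0 = sum(g[0] for g in G) / n
        gb1 = sum(g[1] for g in G) / n
        G.append((-a_n * gb0, -a_n * gb1))  # artificial point g_{n+1}
        return 2.0 * max_log_sum(G)

    half = 4.0 * sd / math.sqrt(n)          # search bracket: an assumption
    lo, hi = max(mean - half, 1e-3 * mean), mean + half
    r = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - r * (hi - lo), lo + r * (hi - lo)
    fc, fd = stat(c), stat(d)
    for _ in range(80):                     # golden-section search over mu
        if fc < fd:
            hi, d, fd = d, c, fc
            c = hi - r * (hi - lo)
            fc = stat(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + r * (hi - lo)
            fd = stat(d)
    return min(fc, fd)

sample = [1.2, 0.7, 1.5, 0.9, 1.1, 0.4, 1.8, 1.0, 0.6, 1.3, 0.8, 1.4]  # hypothetical
print(profile_ael_statistic(sample, sr0=2.0))
```

Comparing the returned value with a $\chi_1^2$ quantile gives the test of $H_0 : sr = sr_0$; collecting the values of $sr_0$ that are not rejected gives a confidence interval. A production implementation would need a more careful outer search than the fixed bracket used here.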

## 4. Simulation Study

In order to evaluate the accuracy of the asymptotic chi-square calibration of the AEL method, we choose the coverage probability as an indicator throughout this section. For some fixed sample size $n$ and $sr_0$, suppose we have run the simulation $m$ times and $s$ of the simulated $W(sr_0)$ are less than the $1-\alpha$ quantile of $\chi_1^2$ for some given $\alpha$. Then the coverage probability is defined to be $s/m$, which is compared with the nominal value $1-\alpha$. When $m$ is large, if the coverage probability $s/m$ is close to $1-\alpha$, then the level $\alpha$ test for $sr$ will tend to perform well and $\chi_1^2$ is considered an acceptable reference distribution for $W(sr_0)$ at sample size $n$.
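The coverage computation itself is simple to sketch. The Python code below estimates $s/m$ for normal data with $\mu = 1$ and $\sigma = 0.5$. To keep the example short it does not use the AEL statistic; it swaps in a Wald-type statistic based on the normal-theory asymptotic variance $(1 + sr^2/2)/n$ of the sample Sharpe ratio, in the spirit of the JK method. The function names, the seed, and the smaller number of replications are assumptions of this illustration.

```python
import math
import random

CHI2_1_Q95 = 3.841459   # 0.95 quantile of the chi-square distribution with 1 df

def wald_statistic(x, sr0):
    """Squared standardized distance between the sample Sharpe ratio and sr0."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))
    sr_hat = mean / sd
    var_hat = (1.0 + 0.5 * sr_hat ** 2) / n   # normal-theory asymptotic variance
    return (sr_hat - sr0) ** 2 / var_hat

def coverage(n, m, mu=1.0, sigma=0.5, seed=0):
    """Fraction s/m of simulated statistics not exceeding the chi-square quantile."""
    rng = random.Random(seed)
    sr0 = mu / sigma
    s = 0
    for _ in range(m):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        if wald_statistic(x, sr0) <= CHI2_1_Q95:
            s += 1
    return s / m

print(coverage(n=50, m=1000))
```

Replacing `wald_statistic` with any other statistic that has a $\chi_1^2$ limit gives the corresponding coverage estimate under the same definition.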
We compare the coverage probability of the proposed method with other methods for sample sizes $n = 20, 50, 200, 500$ at nominal values $1-\alpha = 0.9, 0.95$. Each coverage probability is obtained from $m = 5000$ simulations. The data are generated from the normal distribution with mean $\mu = 1$ and standard deviation $\sigma = 0.5$, from t-distributions, and from chi-square distributions with various degrees of freedom. The methods under comparison are the following: Jobson and Korkie’s method [2] (JK), Mertens’s method (Mertens), the usual EL inferential method (EL), application of the delta method to the asymptotic distribution of the EL estimator of the mean and standard deviation (Delta), and the proposed method (AEL) with the adjustment level $a_n = 0.5 \log n$. Jobson and Korkie [2] assumed that the data are from a normal distribution. By applying the delta method to approximate the mean and variance of the Sharpe ratio, a confidence interval for the Sharpe ratio can then be approximated by the Central Limit Theorem. Mertens used the skewness and kurtosis to give an adjusted approximation of the variance of the Sharpe ratio derived in Jobson and Korkie [2] and again obtained the confidence interval of the Sharpe ratio from the Central Limit Theorem. The approach denoted by Delta is similar to JK but based on the EL. For the EL method, whenever the convex hull problem occurs for a set of simulated data, we follow the convention of setting the value of the profile log-EL function to negative infinity. Results are summarized in Table 1.
From Table 1, we can see that the AEL method has the most robust performance across the underlying population distributions. The AEL method always performs significantly better than the EL method in terms of coverage probability. When the data are normally distributed, the JK method performs best, whereas when the data come from a skewed distribution, the JK method performs poorly. For normal data with small sample sizes, the AEL has slightly lower coverage probabilities than the JK method, while for normal data with sample sizes larger than 50 and for data from various t distributions, the AEL performs comparably to the JK method. In all other situations, the AEL method significantly outperforms all other methods, especially for small sample sizes.

## 5. Real Data Analysis

The data we consider are the Nasdaq GS returns of Apple Inc. (Cupertino, CA, USA) from 3 October 2017 to 12 December 2017 (https://finance.yahoo.com/quote/AAPL/). The return is evaluated from the close price of the current day compared with the close price of the previous day. There are 50 trading days during the period considered. We use the yearly return rate of the 5-year bonds, which is $2.116\%$, as the yearly risk-free return. Therefore, the daily risk-free return rate used in the analysis is $0.02116 / 252 = 8.397 \times 10^{-5}$. Based on our data, the Durbin-Watson test statistic is 1.58. Hence, there is no significant evidence of serial correlation. The Q-Q plot of the returns in Figure 1 reveals some skewness in the data. The confidence intervals of the Sharpe ratio for the Apple Inc. return data produced by different methods are listed in Table 2. For the JK and Mertens methods, the point estimate is the value of $sr$ that corresponds to the 50% quantile of the standard normal limiting distribution of the test statistic. For the Delta and EL methods, the point estimate is the maximum EL estimate; for the AEL method, it is the maximum AEL estimate.
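The data preparation described above can be sketched in a few lines of Python. The prices below are hypothetical placeholders, not the Apple Inc. data. The simple-return formula and the use of mean-centred returns as residuals in the Durbin-Watson statistic are assumptions of this illustration, since the text does not spell out either detail.

```python
def simple_returns(close):
    """Day-over-day returns (P_t - P_{t-1}) / P_{t-1} from closing prices."""
    return [(close[i] - close[i - 1]) / close[i - 1] for i in range(1, len(close))]

def durbin_watson(values):
    """Durbin-Watson statistic using deviations from the mean as residuals."""
    mean = sum(values) / len(values)
    e = [v - mean for v in values]
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den

annual_rf = 0.02116              # yearly rate of the 5-year bonds, from the text
daily_rf = annual_rf / 252       # 252 trading days per year, as in the text
print(daily_rf)

close = [100.0, 101.0, 99.99, 102.0, 101.5]   # hypothetical closing prices
excess = [r - daily_rf for r in simple_returns(close)]
print(excess)
print(durbin_watson(simple_returns(close)))
```

A Durbin-Watson value near 2 indicates little first-order serial correlation, which is how the reported value of 1.58 is read in the text.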
From Table 2, we see that since the JK and Mertens methods are moment-based, both of their estimates equal the sample Sharpe ratio. The Delta, EL and AEL methods are empirical-likelihood-based, so the corresponding estimates differ from those of the previous two approaches. We observe some differences among the confidence intervals produced by the various approaches. Note that the data have some skewness, as shown in Figure 1. Based on our simulation studies, the skewness will affect the JK method but not the other four methods. The confidence interval based on our proposed AEL method is more robust and trustworthy.

## 6. Conclusions

We extended the adjusted empirical likelihood method to obtain inference for a parameter of interest in the presence of nuisance parameters. The advantage of the proposed method is that it does not rely on a distributional assumption on the data. In particular, we applied the proposed method to obtain inference for the Sharpe ratio. Simulation results show that the proposed method gives coverage probabilities closer to the nominal value than those obtained by the standard empirical likelihood ratio method. Simulation results also illustrate that the proposed method is comparable to Jobson and Korkie’s method [2] and outperforms the EL method when the data are from a symmetric distribution. In addition, when the data are from a skewed distribution, the proposed method outperforms all other existing methods.
The time-series properties of investment strategies can have a nontrivial impact on the Sharpe ratio estimator. In this article, we proposed using empirical-likelihood-based inference for the Sharpe ratio. Empirical likelihood was motivated by independent and identically distributed data. When dealing with dependent data, we need to account for the dependency structure in constructing confidence regions for the parameter of interest. In general, the approach to handling dependent data within the EL framework parallels the methods based on parametric likelihood. The extension of our approach to dependent data is valuable and interesting. We will consider it in future research.

## Author Contributions

All authors equally participated in the design, methodology, writing and interpretation of the results. H.W. conducted the analysis in R as part of her Ph.D. thesis. All authors have read and approved the final manuscript.

## Acknowledgments

The authors would like to thank the Editor and three referees for their valuable suggestions and comments.

## Conflicts of Interest

The authors declare no conflict of interest.

## Appendix

Proof of Lemma 1.
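Since $M$ is a symmetric positive definite matrix, we have $C > 0$ and $A - B C^{-1} B^\tau > 0$; see Theorem 16.1 in . Noting that $M$ has the following factorization
$$\begin{pmatrix} A & B \\ B^\tau & C \end{pmatrix} = \begin{pmatrix} I & B C^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} A - B C^{-1} B^\tau & 0 \\ 0 & C \end{pmatrix} \begin{pmatrix} I & 0 \\ (B C^{-1})^\tau & I \end{pmatrix},$$
we have
$$\begin{pmatrix} A & B \\ B^\tau & C \end{pmatrix}^{-1} = \begin{pmatrix} I & 0 \\ (B C^{-1})^\tau & I \end{pmatrix}^{-1} \begin{pmatrix} (A - B C^{-1} B^\tau)^{-1} & 0 \\ 0 & C^{-1} \end{pmatrix} \begin{pmatrix} I & B C^{-1} \\ 0 & I \end{pmatrix}^{-1}.$$
Further note that
$$\begin{aligned} \begin{pmatrix} 0 & 0 \\ 0 & C^{-1} \end{pmatrix} &= \begin{pmatrix} I & 0 \\ (B C^{-1})^\tau & I \end{pmatrix}^{-1} \begin{pmatrix} I & 0 \\ (B C^{-1})^\tau & I \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 0 & C^{-1} \end{pmatrix} \begin{pmatrix} I & B C^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} I & B C^{-1} \\ 0 & I \end{pmatrix}^{-1} \\ &= \begin{pmatrix} I & 0 \\ (B C^{-1})^\tau & I \end{pmatrix}^{-1} \begin{pmatrix} 0 & 0 \\ 0 & C^{-1} \end{pmatrix} \begin{pmatrix} I & B C^{-1} \\ 0 & I \end{pmatrix}^{-1}. \end{aligned}$$
Therefore,
$$\begin{aligned} \begin{pmatrix} A & B \\ B^\tau & C \end{pmatrix}^{-1} - \begin{pmatrix} 0 & 0 \\ 0 & C^{-1} \end{pmatrix} &= \begin{pmatrix} I & 0 \\ (B C^{-1})^\tau & I \end{pmatrix}^{-1} \left[ \begin{pmatrix} (A - B C^{-1} B^\tau)^{-1} & 0 \\ 0 & C^{-1} \end{pmatrix} - \begin{pmatrix} 0 & 0 \\ 0 & C^{-1} \end{pmatrix} \right] \begin{pmatrix} I & B C^{-1} \\ 0 & I \end{pmatrix}^{-1} \\ &= \begin{pmatrix} I & 0 \\ (B C^{-1})^\tau & I \end{pmatrix}^{-1} \begin{pmatrix} (A - B C^{-1} B^\tau)^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} I & B C^{-1} \\ 0 & I \end{pmatrix}^{-1}. \end{aligned}$$
Since $A - B C^{-1} B^\tau > 0$, we have $(A - B C^{-1} B^\tau)^{-1} > 0$, which leads to
$$\begin{pmatrix} A & B \\ B^\tau & C \end{pmatrix}^{-1} \ge \begin{pmatrix} 0 & 0 \\ 0 & C^{-1} \end{pmatrix}.$$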
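As a sanity check of the lemma in the smallest case, the Python sketch below takes scalar blocks ($q = 1$, $k = 2$) and verifies numerically that $M^{-1} - \mathrm{diag}(0, C^{-1})$ is positive semidefinite, with top-left entry $(A - B C^{-1} B^\tau)^{-1}$ as in the proof. The function names and the test matrix are chosen for this illustration only.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def lemma1_difference(a, b, c):
    """M^{-1} - diag(0, 1/c) for M = [[a, b], [b, c]] with scalar blocks."""
    inv = inverse_2x2([[a, b], [b, c]])
    return [[inv[0][0], inv[0][1]], [inv[1][0], inv[1][1] - 1.0 / c]]

def is_psd_2x2(d, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its trace and determinant are >= 0."""
    trace = d[0][0] + d[1][1]
    det = d[0][0] * d[1][1] - d[0][1] * d[1][0]
    return trace >= -tol and det >= -tol

a, b, c = 2.0, 1.0, 3.0                  # M = [[2, 1], [1, 3]] is positive definite
diff = lemma1_difference(a, b, c)
print(diff, is_psd_2x2(diff))
print(1.0 / (a - b * b / c))             # inverse Schur complement, equals diff[0][0]
```

The difference has determinant zero here, which reflects the rank-$q$ structure of the final matrix in the proof.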
Proof of Theorem 1.
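For simplicity, denote $l(\theta) = -l_{AEL}(\theta)$. Then $\tilde{\theta}^\tau = (\tilde{\theta}_1, \tilde{\theta}_2)^\tau$ minimizes $l(\theta) = l(\theta_1, \theta_2)$, and $\tilde{\theta}_2^0$ minimizes $l(\theta_1^0, \theta_2)$ with respect to $\theta_2$. Under this new notation, the test statistic becomes
$$W(\theta_1^0) = 2\bigl[\,l(\theta_1^0, \tilde{\theta}_2^0) - l(\tilde{\theta}_1, \tilde{\theta}_2)\,\bigr].$$
First, the following notation is needed in this proof. Let
$$Q_{1n}(\theta, t) = \frac{1}{n+1} \sum_{i=1}^{n+1} \frac{g_i(\theta)}{1 + t^\tau g_i(\theta)},$$
$$Q_{2n}(\theta, t) = \frac{1}{n+1} \sum_{i=1}^{n+1} \frac{1}{1 + t^\tau g_i(\theta)} \left( \frac{\partial g_i(\theta)}{\partial \theta} \right)^\tau t.$$
Let $\tilde{\theta}$ and $\tilde{t} = t(\tilde{\theta})$ be the solution of
$$Q_{1n}(\tilde{\theta}, \tilde{t}) = 0, \qquad Q_{2n}(\tilde{\theta}, \tilde{t}) = 0.$$
The existence of $\tilde{\theta}$ and $\tilde{t} = t(\tilde{\theta})$ in a neighborhood of the true parameter $\theta_0$ is proved in [15,16]. Note that
$$\begin{aligned} \frac{\partial Q_{1n}(\theta, 0)}{\partial \theta} &= \frac{1}{n+1} \sum_{i=1}^{n+1} \frac{\partial g_i(\theta)}{\partial \theta}, & \frac{\partial Q_{1n}(\theta, 0)}{\partial t^\tau} &= -\frac{1}{n+1} \sum_{i=1}^{n+1} g_i(\theta)\, g_i(\theta)^\tau, \\ \frac{\partial Q_{2n}(\theta, 0)}{\partial \theta} &= 0, & \frac{\partial Q_{2n}(\theta, 0)}{\partial t^\tau} &= \frac{1}{n+1} \sum_{i=1}^{n+1} \left( \frac{\partial g_i(\theta)}{\partial \theta} \right)^\tau. \end{aligned}$$
Taylor expansion of $Q_{1n}(\tilde{\theta}, \tilde{t})$ and $Q_{2n}(\tilde{\theta}, \tilde{t})$ at $(\theta_0, 0)$ gives
$$0 = Q_{1n}(\tilde{\theta}, \tilde{t}) = Q_{1n}(\theta_0, 0) + \frac{\partial Q_{1n}(\theta_0, 0)}{\partial \theta} (\tilde{\theta} - \theta_0) + \frac{\partial Q_{1n}(\theta_0, 0)}{\partial t^\tau} (\tilde{t} - 0) + o_p(\delta_n)$$
$$0 = Q_{2n}(\tilde{\theta}, \tilde{t}) = Q_{2n}(\theta_0, 0) + \frac{\partial Q_{2n}(\theta_0, 0)}{\partial \theta} (\tilde{\theta} - \theta_0) + \frac{\partial Q_{2n}(\theta_0, 0)}{\partial t^\tau} (\tilde{t} - 0) + o_p(\delta_n),$$
where $\delta_n = \|\tilde{\theta} - \theta_0\| + \|\tilde{t}\|$. Observing that $Q_{2n}(\theta_0, 0) = 0$, we have
$$S_n \begin{pmatrix} \tilde{t} \\ \tilde{\theta} - \theta_0 \end{pmatrix} = \begin{pmatrix} -Q_{1n}(\theta_0, 0) + o_p(\delta_n) \\ o_p(\delta_n) \end{pmatrix}, \tag{A1}$$
where
$$S_n = \begin{pmatrix} \dfrac{\partial Q_{1n}}{\partial t^\tau} & \dfrac{\partial Q_{1n}}{\partial \theta} \\ \dfrac{\partial Q_{2n}}{\partial t^\tau} & 0 \end{pmatrix}_{(\theta_0, 0)}.$$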
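Now we solve (A1) for an expression of $\tilde{t}$. By the law of large numbers, as $n \to \infty$
$$\frac{1}{n} \sum_{i=1}^{n} \frac{\partial g_i(\theta)}{\partial \theta} \longrightarrow E\, \frac{\partial g(\theta)}{\partial \theta}.$$
Therefore,
$$\frac{\partial g_{n+1}(\theta)}{\partial \theta} = -\frac{a_n}{n} \sum_{i=1}^{n} \frac{\partial g_i(\theta)}{\partial \theta} = o_p(n).$$
Hence applying the law of large numbers again
$$\frac{\partial Q_{1n}(\theta, 0)}{\partial \theta} = \frac{1}{n+1} \sum_{i=1}^{n+1} \frac{\partial g_i(\theta)}{\partial \theta} = \frac{1}{n+1} \sum_{i=1}^{n} \frac{\partial g_i(\theta)}{\partial \theta} + \frac{1}{n+1} \frac{\partial g_{n+1}(\theta)}{\partial \theta} = E\, \frac{\partial g(\theta)}{\partial \theta} + o_p(1).$$
Similarly, we can obtain
$$\frac{\partial Q_{2n}}{\partial t^\tau} = E \left( \frac{\partial g(\theta)}{\partial \theta} \right)^\tau + o_p(1) \quad \text{and} \quad -\frac{\partial Q_{1n}}{\partial t^\tau} = E\, g(\theta)\, g(\theta)^\tau + o_p(1).$$
Thus as $n \to \infty$
$$S_n \longrightarrow \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & 0 \end{pmatrix} = \begin{pmatrix} -E\, g g^\tau & E\, \dfrac{\partial g}{\partial \theta} \\ E \left( \dfrac{\partial g}{\partial \theta} \right)^\tau & 0 \end{pmatrix}_{\theta = \theta_0}.$$
We can see that
$$S_n^{-1} \longrightarrow \begin{pmatrix} S_{11}^{-1} + S_{11}^{-1} S_{12} S_{22.1}^{-1} S_{21} S_{11}^{-1} & -S_{11}^{-1} S_{12} S_{22.1}^{-1} \\ -S_{22.1}^{-1} S_{21} S_{11}^{-1} & S_{22.1}^{-1} \end{pmatrix},$$
where $S_{22.1}^{-1} = \left\{ E \left( \dfrac{\partial g}{\partial \theta} \right)^\tau (E\, g g^\tau)^{-1} E\, \dfrac{\partial g}{\partial \theta} \right\}^{-1}$. Consequently, (A1) can be solved as
$$\begin{pmatrix} \tilde{t} \\ \tilde{\theta} - \theta_0 \end{pmatrix} = S_n^{-1} \begin{pmatrix} -Q_{1n}(\theta_0, 0) + o_p(\delta_n) \\ o_p(\delta_n) \end{pmatrix},$$
which means
$$\begin{aligned} \tilde{\theta} - \theta_0 &= S_{22.1}^{-1} S_{21} S_{11}^{-1} Q_{1n}(\theta_0, 0) + o_p(\delta_n), \\ \tilde{t} &= -\bigl( S_{11}^{-1} + S_{11}^{-1} S_{12} S_{22.1}^{-1} S_{21} S_{11}^{-1} \bigr) Q_{1n}(\theta_0, 0) + o_p(\delta_n). \end{aligned} \tag{A2}$$
Note that by the Central Limit Theorem
$$\begin{aligned} Q_{1n}(\theta_0, 0) &= \frac{1}{n+1} \sum_{i=1}^{n+1} g_i(\theta_0) \\ &= \frac{n^{1/2}}{n+1} \cdot n^{-1/2} \sum_{i=1}^{n} g_i(\theta_0) - n^{-1/2} \cdot \frac{a_n}{n+1} \cdot n^{-1/2} \sum_{i=1}^{n} g_i(\theta_0) \\ &= n^{-1/2} \cdot n^{-1/2} \sum_{i=1}^{n} g_i(\theta_0) + o_p(n^{-1/2}), \end{aligned}$$
which implies
$$\sqrt{n}\, Q_{1n}(\theta_0, 0) \longrightarrow N(0, E\, g g^\tau) \quad \text{and} \quad Q_{1n} = O_p(n^{-1/2}). \tag{A3}$$
From (A2), we know that
$$\delta_n = \|\tilde{\theta} - \theta_0\| + \|\tilde{t}\| = O_p(n^{-1/2}).$$
Therefore, we have obtained the desired result
$$\tilde{t} = -\bigl( S_{11}^{-1} + S_{11}^{-1} S_{12} S_{22.1}^{-1} S_{21} S_{11}^{-1} \bigr) Q_{1n}(\theta_0, 0) + o_p(n^{-1/2})$$
and
$$\tilde{\theta} - \theta_0 = S_{22.1}^{-1} S_{21} S_{11}^{-1} Q_{1n}(\theta_0, 0) + o_p(n^{-1/2}).$$
In particular, we can see that
$$\tilde{t} = O_p(n^{-1/2}) \quad \text{and} \quad \tilde{\theta} - \theta_0 = O_p(n^{-1/2}).$$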
Now we are ready to compute $l(\tilde{\theta}) = l(\tilde{\theta}_1, \tilde{\theta}_2)$. Taylor expansion yields
$$\begin{aligned} l(\tilde{\theta}_1, \tilde{\theta}_2) &= \sum_{i=1}^{n+1} \log\bigl[ 1 + \tilde{t}^\tau g_i(\tilde{\theta}) \bigr] \\ &= \sum_{i=1}^{n+1} \Bigl\{ \tilde{t}^\tau g_i(\tilde{\theta}) - \tfrac{1}{2} \bigl( \tilde{t}^\tau g_i(\tilde{\theta}) \bigr)^2 \Bigr\} + o_p(1) \\ &= \tilde{t}^\tau \sum_{i=1}^{n+1} g_i(\tilde{\theta}) - \frac{1}{2}\, \tilde{t}^\tau \Bigl[ \sum_{i=1}^{n+1} g_i(\tilde{\theta})\, g_i(\tilde{\theta})^\tau \Bigr] \tilde{t} + o_p(1). \end{aligned} \tag{A5}$$
Note that expanding $g_i(\tilde{\theta})$ at $\theta_0$, we get
$$g_i(\tilde{\theta}) = g_i(\theta_0) + \frac{\partial g_i(\theta_0)}{\partial \theta} (\tilde{\theta} - \theta_0) + O_p(n^{-1}),$$
for $i = 1, 2, \ldots, n$.
Hence
$$\begin{aligned} \sum_{i=1}^{n} g_i(\tilde{\theta}) &= \sum_{i=1}^{n} g_i(\theta_0) + \sum_{i=1}^{n} \frac{\partial g_i(\theta_0)}{\partial \theta} \cdot (\tilde{\theta} - \theta_0) + O_p(1) \\ &= n\, Q_{1n}(\theta_0, 0) + n\, S_{12} S_{22.1}^{-1} S_{21} S_{11}^{-1} Q_{1n}(\theta_0, 0) + o_p(n^{1/2}) \end{aligned}$$
and
$$g_{n+1}(\tilde{\theta}) = -\frac{a_n}{n} \sum_{i=1}^{n} g_i(\tilde{\theta}) = o_p(n^{1/2}).$$
Consequently, we can obtain the first term of (A5) as
$$\tilde{t}^\tau \sum_{i=1}^{n+1} g_i(\tilde{\theta}) = -n\, Q_{1n}(\theta_0, 0)^\tau \bigl( S_{11}^{-1} + S_{11}^{-1} S_{12} S_{22.1}^{-1} S_{21} S_{11}^{-1} \bigr) Q_{1n}(\theta_0, 0) + o_p(1).$$
Now we calculate the second term of (A5). For $i = 1, 2, \ldots, n$,
$$g_i(\tilde{\theta})\, g_i(\tilde{\theta})^\tau = g_i(\theta_0)\, g_i(\theta_0)^\tau + O_p(n^{-1/2}).$$
Thus
$$\sum_{i=1}^{n} g_i(\tilde{\theta})\, g_i(\tilde{\theta})^\tau = \sum_{i=1}^{n} g_i(\theta_0)\, g_i(\theta_0)^\tau + O_p(n^{1/2}) = -n\, S_{11} + O_p(n^{1/2}).$$
Note that
$$g_{n+1}(\tilde{\theta})\, g_{n+1}(\tilde{\theta})^\tau = o_p(n^{1/2})\, o_p(n^{1/2}) = o_p(n).$$
We have
$$\tilde{t}^\tau \sum_{i=1}^{n+1} g_i(\tilde{\theta})\, g_i(\tilde{\theta})^\tau\, \tilde{t} = -n\, Q_{1n}(\theta_0, 0)^\tau \bigl( S_{11}^{-1} + S_{11}^{-1} S_{12} S_{22.1}^{-1} S_{21} S_{11}^{-1} \bigr) Q_{1n}(\theta_0, 0) + o_p(1).$$
Finally, we have
$$l(\tilde{\theta}_1, \tilde{\theta}_2) = -\frac{n}{2}\, Q_{1n}(\theta_0, 0)^\tau \bigl( S_{11}^{-1} + S_{11}^{-1} S_{12} S_{22.1}^{-1} S_{21} S_{11}^{-1} \bigr) Q_{1n}(\theta_0, 0) + o_p(1). \tag{A6}$$
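Similarly, we can apply the above process to $l(\theta_1^0, \tilde{\theta}_2^0)$. The procedure is sketched as follows. Let $\tilde{\theta}_2^0$ and $\tilde{t}_0 = t(\theta_1^0, \tilde{\theta}_2^0)$ satisfy
$$Q_{1n}(\theta_1^0, \tilde{\theta}_2^0, \tilde{t}_0) = 0 \quad \text{and} \quad Q_{2n}(\theta_1^0, \tilde{\theta}_2^0, \tilde{t}_0) = 0.$$
Expanding $Q_{1n}$ and $Q_{2n}$ at $(\theta_1^0, \theta_2^0, 0)$ will produce the linear equations
$$H_n \begin{pmatrix} \tilde{t}_0 \\ \tilde{\theta}_2^0 - \theta_2^0 \end{pmatrix} = \begin{pmatrix} -Q_{1n}(\theta_0, 0) + o_p(\delta_n') \\ o_p(\delta_n') \end{pmatrix}, \tag{A7}$$
where $\theta_0 = (\theta_1^0, \theta_2^0)$ is the true value of $\theta$, $\delta_n' = \|\tilde{\theta}_2^0 - \theta_2^0\| + \|\tilde{t}_0\|$ and as $n \to \infty$
$$H_n \longrightarrow \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & 0 \end{pmatrix} = \begin{pmatrix} -E\, g g^\tau & E\, \dfrac{\partial g}{\partial \theta_2} \\ E \left( \dfrac{\partial g}{\partial \theta_2} \right)^\tau & 0 \end{pmatrix}_{\theta = \theta_0}.$$
Note that $H_{11} = S_{11}$.
Solving (A7) gives us
$$\tilde{t}_0 = -\bigl( H_{11}^{-1} + H_{11}^{-1} H_{12} H_{22.1}^{-1} H_{21} H_{11}^{-1} \bigr) Q_{1n}(\theta_0, 0) + o_p(n^{-1/2})$$
and
$$\tilde{\theta}_2^0 - \theta_2^0 = H_{22.1}^{-1} H_{21} H_{11}^{-1} Q_{1n}(\theta_0, 0) + o_p(n^{-1/2}).$$
By Taylor expansion, the above estimates yield
$$l(\theta_1^0, \tilde{\theta}_2^0) = -\frac{1}{2}\, n\, Q_{1n}(\theta_0, 0)^\tau \bigl( H_{11}^{-1} + H_{11}^{-1} H_{12} H_{22.1}^{-1} H_{21} H_{11}^{-1} \bigr) Q_{1n}(\theta_0, 0) + o_p(1). \tag{A9}$$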
Using (A9) and (A6), we can write
$W ( θ 1 0 ) = 2 l ( θ 1 0 , θ ˜ 2 0 ) − 2 l ( θ ˜ 1 , θ ˜ 2 ) = [ ( E g g τ ) − 1 2 n Q 1 n ( θ 0 , 0 ) ] τ ( A − B ) [ ( E g g τ ) − 1 2 n Q 1 n ( θ 0 , 0 ) ] + o p ( 1 ) ,$
where
$A = ( E g g τ ) − 1 2 E ∂ g ∂ θ E ∂ g ∂ θ τ ( E g g τ ) − 1 E ∂ g ∂ θ − 1 E ∂ g ∂ θ τ ( E g g τ ) − 1 2$
$B = ( E g g τ ) − 1 2 E ∂ g ∂ θ 2 E ∂ g ∂ θ 2 τ ( E g g τ ) − 1 E ∂ g ∂ θ 2 − 1 E ∂ g ∂ θ 2 τ ( E g g τ ) − 1 2$
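and all the evaluations related to $g$ are performed at the true value $\theta_0$. By assumption, $E\frac{\partial g}{\partial \theta}$ has rank $k$ and $E gg^\tau$ is positive definite. Therefore, both $A$ and $B$ are non-negative definite and idempotent. By Lemma 1,
$E\frac{\partial g}{\partial \theta} \left[ E\left(\frac{\partial g}{\partial \theta}\right)^\tau (E gg^\tau)^{-1}\, E\frac{\partial g}{\partial \theta} \right]^{-1} E\left(\frac{\partial g}{\partial \theta}\right)^\tau \ge \left( E\frac{\partial g}{\partial \theta_1},\; E\frac{\partial g}{\partial \theta_2} \right) \begin{pmatrix} 0 & 0 \\ 0 & \left[ E\left(\frac{\partial g}{\partial \theta_2}\right)^\tau (E gg^\tau)^{-1}\, E\frac{\partial g}{\partial \theta_2} \right]^{-1} \end{pmatrix} \begin{pmatrix} E\left(\frac{\partial g}{\partial \theta_1}\right)^\tau \\ E\left(\frac{\partial g}{\partial \theta_2}\right)^\tau \end{pmatrix} = E\frac{\partial g}{\partial \theta_2} \left[ E\left(\frac{\partial g}{\partial \theta_2}\right)^\tau (E gg^\tau)^{-1}\, E\frac{\partial g}{\partial \theta_2} \right]^{-1} E\left(\frac{\partial g}{\partial \theta_2}\right)^\tau,$
which means that $A - B$ is non-negative definite. Thus, by Result 2, $A - B$ is also idempotent. From (A3), we can see that $(E gg^\tau)^{-1/2} \sqrt{n}\, Q_{1n}(\theta_0, 0)$ follows the multivariate standard normal distribution asymptotically. Note that $\mathrm{tr}(A) = k$ and $\mathrm{tr}(B) = k - q$. We have $\mathrm{tr}(A - B) = k - (k - q) = q$. The requirement of Lemma 1 is satisfied, which implies
$W(\theta_1^0) \xrightarrow{d} \chi_q^2.$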
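The projection argument behind the trace count can be checked numerically. The sketch below is illustrative only: it assumes $E gg^\tau = I$, so that the factors $(E gg^\tau)^{-1/2}$ drop out (the general case reduces to this one by replacing $E\,\partial g / \partial \theta$ with $(E gg^\tau)^{-1/2} E\,\partial g / \partial \theta$), and it uses a made-up $4 \times 3$ matrix `G` in place of $E\,\partial g / \partial \theta$, with its last $k - q = 2$ columns playing the role of $E\,\partial g / \partial \theta_2$. All names are hypothetical, and exact rational arithmetic is used so that idempotence can be tested with equality.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def inverse(m):
    """Exact Gauss-Jordan inverse of a small nonsingular square matrix."""
    n = len(m)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        # Find a nonzero pivot in column c and move it onto the diagonal.
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def projector(g):
    """Orthogonal projector g (g'g)^{-1} g' onto the column space of g."""
    gt = transpose(g)
    return matmul(matmul(g, inverse(matmul(gt, g))), gt)


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


# Hypothetical full-column-rank matrix: 4 estimating functions, k = 3, q = 1.
G = [[1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 0, 1]]
G2 = [row[1:] for row in G]          # columns belonging to theta_2

A = projector(G)
B = projector(G2)
D = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]  # A - B

print(trace(A), trace(B), trace(D))  # 3 2 1
print(matmul(D, D) == D)             # True
```

Because the column space of `G2` is contained in that of `G`, $AB = BA = B$, which gives $(A - B)^2 = A - B$ directly; the trace of $A - B$ then equals its rank $q$, the degrees of freedom of the limiting chi-square distribution.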

## References

1. Sharpe, W.F. Mutual fund performance. J. Bus. 1966, 39, 119–138.
2. Jobson, J.D.; Korkie, B.M. Performance hypothesis testing with the Sharpe and Treynor measures. J. Financ. 1981, 36, 889–908.
3. Fama, E. The behavior of stock-market prices. J. Bus. 1965, 38, 34–105.
4. Leland, H.E. Beyond mean-variance: Risk and performance measurement in a nonsymmetric world. Financ. Anal. J. 1999, 1, 27–36.
5. Agarwal, V.; Naik, N.Y. Risk and portfolio decisions involving hedge funds. Rev. Financ. Stud. 2004, 17, 63–98.
6. Ingersoll, J.; Spiegel, M.; Goetzmann, W. Portfolio performance manipulation and manipulation-proof performance measures. Rev. Financ. Stud. 2007, 20, 1503–1546.
7. Samuelson, P. The fundamental approximation theorem of portfolio analysis in terms of means, variances, and higher moments. Rev. Econ. Stud. 1970, 37, 537–542.
8. Scott, R.; Horvath, P. On the direction of preference for moments of higher order than variance. J. Financ. 1980, 35, 915–919.
9. Zakamouline, V.; Koekebakker, S. Portfolio performance evaluation with generalized Sharpe ratios: Beyond the mean and variance. J. Bank. Financ. 2009, 33, 1242–1254.
10. Pierro, M.D.; Mosevich, J. Effects of skewness and kurtosis on portfolio rankings. Quant. Financ. 2011, 11, 1449–1453.
11. Thomas, D.R.; Grunkemeier, G.L. Confidence interval estimation of survival probabilities for censored data. J. Am. Stat. Assoc. 1975, 70, 865–871.
12. Owen, A. Empirical likelihood ratio confidence intervals for a single functional. Biometrika 1988, 75, 237–249.
13. Owen, A. Empirical likelihood ratio confidence regions. Ann. Stat. 1990, 18, 90–120.
14. Hall, P.; La Scala, B. Methodology and algorithms of empirical likelihood. Inter. Stat. Rev. 1990, 58, 109–127.
15. Qin, J.; Lawless, J. Empirical likelihood and general estimating equations. Ann. Stat. 1994, 22, 300–325.
16. Chen, J.; Variyath, A.M.; Abraham, B. Adjusted empirical likelihood and its properties. J. Comput. Graph. Stat. 2008, 17, 426–443.
17. Rao, C.R. Linear Statistical Inference and Its Applications; Wiley: New York, NY, USA, 1973.
18. Wilks, S.S. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses. Ann. Math. Stat. 1938, 9, 60–62.
19. Wang, H.J.; Zhu, Z. Empirical likelihood for quantile regression models with longitudinal data. J. Stat. Plan. Inference 2011, 141, 1603–1615.
20. Chen, J.; Huang, Y. Finite-sample properties of the adjusted empirical likelihood. J. Nonparametric Stat. 2013, 25, 147–159.
21. Mertens, E. Comments on the Correct Variance of Estimated Sharpe Ratios in Lo (2002, FAJ) When Returns Are IID. Research Note. Available online: http://www.elmarmertens.com/research/discussion/soprano01.pdf (accessed on 30 January 2018).
22. Gallier, J. Geometric Methods and Applications for Computer Science and Engineering; Springer: New York, NY, USA, 2011.
Figure 1. Quantile-quantile plot of Apple Inc. return data.
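Table 1. Coverage probabilities of the Sharpe ratio.

| Distribution | $1 - \alpha$ | Method | $n = 20$ | $n = 50$ | $n = 200$ | $n = 500$ |
|---|---|---|---|---|---|---|
| $N(1, 0.25)$ | 0.9 | JK | 0.8956 | 0.9022 | 0.8968 | 0.9060 |
| $N(1, 0.25)$ | 0.9 | Mertens | 0.8258 | 0.8694 | 0.8894 | 0.9004 |
| $N(1, 0.25)$ | 0.9 | EL | 0.8210 | 0.8760 | 0.8906 | 0.9040 |
| $N(1, 0.25)$ | 0.9 | AEL | 0.8486 | 0.8874 | 0.8942 | 0.9058 |
| $N(1, 0.25)$ | 0.9 | Delta | 0.8428 | 0.8840 | 0.8896 | 0.8976 |
| $N(1, 0.25)$ | 0.95 | JK | 0.9460 | 0.9488 | 0.9488 | 0.9522 |
| $N(1, 0.25)$ | 0.95 | Mertens | 0.8926 | 0.9270 | 0.9414 | 0.9494 |
| $N(1, 0.25)$ | 0.95 | EL | 0.8762 | 0.9214 | 0.9440 | 0.9514 |
| $N(1, 0.25)$ | 0.95 | AEL | 0.8980 | 0.9312 | 0.9466 | 0.9534 |
| $N(1, 0.25)$ | 0.95 | Delta | 0.9054 | 0.9334 | 0.9408 | 0.9476 |
| $t_3$ | 0.9 | JK | 0.8960 | 0.9004 | 0.9030 | 0.9040 |
| $t_3$ | 0.9 | Mertens | 0.8390 | 0.8646 | 0.8782 | 0.8890 |
| $t_3$ | 0.9 | EL | 0.8428 | 0.8738 | 0.8884 | 0.8946 |
| $t_3$ | 0.9 | AEL | 0.8794 | 0.8896 | 0.8944 | 0.8976 |
| $t_3$ | 0.9 | Delta | 0.8240 | 0.8586 | 0.8766 | 0.8858 |
| $t_3$ | 0.95 | JK | 0.9494 | 0.9538 | 0.9516 | 0.9550 |
| $t_3$ | 0.95 | Mertens | 0.9028 | 0.9144 | 0.9326 | 0.9442 |
| $t_3$ | 0.95 | EL | 0.9042 | 0.9268 | 0.9438 | 0.9508 |
| $t_3$ | 0.95 | AEL | 0.9340 | 0.9402 | 0.9466 | 0.9514 |
| $t_3$ | 0.95 | Delta | 0.8910 | 0.9092 | 0.9318 | 0.9372 |
| $t_6$ | 0.9 | JK | 0.8982 | 0.8984 | 0.8934 | 0.8976 |
| $t_6$ | 0.9 | Mertens | 0.8738 | 0.8840 | 0.8900 | 0.8954 |
| $t_6$ | 0.9 | EL | 0.8700 | 0.8860 | 0.8928 | 0.8962 |
| $t_6$ | 0.9 | AEL | 0.8986 | 0.9018 | 0.8966 | 0.8978 |
| $t_6$ | 0.9 | Delta | 0.8634 | 0.8828 | 0.8936 | 0.9008 |
| $t_6$ | 0.95 | JK | 0.9504 | 0.9482 | 0.9466 | 0.9476 |
| $t_6$ | 0.95 | Mertens | 0.9246 | 0.9364 | 0.9426 | 0.9458 |
| $t_6$ | 0.95 | EL | 0.9240 | 0.9394 | 0.9438 | 0.9466 |
| $t_6$ | 0.95 | AEL | 0.9466 | 0.9470 | 0.9480 | 0.9481 |
| $t_6$ | 0.95 | Delta | 0.9214 | 0.9330 | 0.9460 | 0.9494 |
| $\chi_4^2$ | 0.9 | JK | 0.9640 | 0.9532 | 0.9476 | 0.9474 |
| $\chi_4^2$ | 0.9 | Mertens | 0.8048 | 0.8474 | 0.8676 | 0.8938 |
| $\chi_4^2$ | 0.9 | EL | 0.7800 | 0.8352 | 0.8626 | 0.8942 |
| $\chi_4^2$ | 0.9 | AEL | 0.8216 | 0.8536 | 0.8664 | 0.8952 |
| $\chi_4^2$ | 0.9 | Delta | 0.8072 | 0.8354 | 0.8660 | 0.8914 |
| $\chi_4^2$ | 0.95 | JK | 0.9872 | 0.9808 | 0.9808 | 0.9774 |
| $\chi_4^2$ | 0.95 | Mertens | 0.8780 | 0.9072 | 0.9278 | 0.9422 |
| $\chi_4^2$ | 0.95 | EL | 0.8562 | 0.8972 | 0.9194 | 0.9418 |
| $\chi_4^2$ | 0.95 | AEL | 0.8924 | 0.9126 | 0.9228 | 0.9430 |
| $\chi_4^2$ | 0.95 | Delta | 0.8872 | 0.9046 | 0.9252 | 0.9388 |
| $\chi_6^2$ | 0.9 | JK | 0.9466 | 0.9476 | 0.9392 | 0.9414 |
| $\chi_6^2$ | 0.9 | Mertens | 0.8048 | 0.8460 | 0.8760 | 0.8916 |
| $\chi_6^2$ | 0.9 | EL | 0.7996 | 0.8392 | 0.8728 | 0.8904 |
| $\chi_6^2$ | 0.9 | AEL | 0.8346 | 0.8568 | 0.8780 | 0.8926 |
| $\chi_6^2$ | 0.9 | Delta | 0.8236 | 0.8524 | 0.8766 | 0.8846 |
| $\chi_6^2$ | 0.95 | JK | 0.9796 | 0.9776 | 0.9758 | 0.9754 |
| $\chi_6^2$ | 0.95 | Mertens | 0.8780 | 0.9168 | 0.9352 | 0.9412 |
| $\chi_6^2$ | 0.95 | EL | 0.8626 | 0.9032 | 0.9284 | 0.9402 |
| $\chi_6^2$ | 0.95 | AEL | 0.8894 | 0.9156 | 0.9326 | 0.9410 |
| $\chi_6^2$ | 0.95 | Delta | 0.8916 | 0.9170 | 0.9336 | 0.9432 |
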
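A coverage probability of this kind is estimated by simulation: draw many samples, build an interval from each, and record how often the interval contains the true Sharpe ratio. The sketch below illustrates that procedure under several assumptions that are not taken from the paper: the risk-free rate is set to zero, $N(1, 0.25)$ is read as mean 1 and variance 0.25, the interval uses the textbook iid normal-theory variance $(1 + SR^2/2)/n$, and the replication count and seed are arbitrary. The function names are hypothetical, and the output is not expected to reproduce the tabulated values.

```python
import math
import random
import statistics


def normal_theory_ci(returns, level=0.90):
    """Interval for the Sharpe ratio using the iid normal-theory variance
    (1 + SR^2 / 2) / n. Risk-free rate assumed to be 0 (an assumption)."""
    n = len(returns)
    sr = statistics.fmean(returns) / statistics.stdev(returns)
    se = math.sqrt((1.0 + 0.5 * sr * sr) / n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    return sr - z * se, sr + z * se


def coverage(n, mu=1.0, sigma=0.5, level=0.90, reps=2000, seed=0):
    """Monte Carlo estimate of the coverage probability for normal data."""
    rng = random.Random(seed)
    true_sr = mu / sigma
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lower, upper = normal_theory_ci(sample, level)
        hits += lower <= true_sr <= upper
    return hits / reps


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(n, coverage(n))
```

With a few thousand replications the Monte Carlo standard error of each estimate is below 0.01 at these nominal levels, which is enough to compare coverage across sample sizes.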
Table 2. Confidence Intervals of the Sharpe ratio for Apple Inc. return data.

| $1 - \alpha$ | Method | Estimate | Lower Bound | Upper Bound |
|---|---|---|---|---|
| 0.9 | JK | 0.1907 | −0.0441 | 0.4254 |
| 0.9 | Mertens | 0.1907 | −0.0350 | 0.4163 |
| 0.9 | Delta | 0.1926 | −0.0329 | 0.4181 |
| 0.9 | EL | 0.1926 | −0.0376 | 0.4140 |
| 0.9 | AEL | 0.1926 | −0.0479 | 0.4241 |
| 0.95 | JK | 0.1907 | −0.0890 | 0.4703 |
| 0.95 | Mertens | 0.1907 | −0.0783 | 0.4596 |
| 0.95 | Delta | 0.1926 | −0.0761 | 0.4613 |
| 0.95 | EL | 0.1926 | −0.0827 | 0.4558 |
| 0.95 | AEL | 0.1926 | −0.0949 | 0.4683 |
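Closed-form intervals of the kind labelled JK and Mertens are Wald intervals: the estimated Sharpe ratio plus or minus a normal quantile times a standard error. The sketch below shows how two such intervals can be computed from a return series. It uses the commonly quoted iid variance $(1 + SR^2/2)/n$ and the version with skewness and kurtosis corrections, $(1 + SR^2/2 - \gamma_3 SR + (\gamma_4 - 3) SR^2/4)/n$. Whether these match the exact implementations behind the table is an assumption, as are the zero risk-free rate, the use of the divisor-$n$ standard deviation, the function name, and the made-up data.

```python
import math
import statistics


def sharpe_intervals(returns, level=0.90, risk_free=0.0):
    """Wald intervals for the Sharpe ratio under two variance formulas.

    'normal'   uses (1 + SR^2/2) / n.
    'adjusted' adds sample skewness and kurtosis corrections.
    """
    n = len(returns)
    mean = statistics.fmean(returns)
    sd = statistics.pstdev(returns)            # divisor n, an assumption
    sr = (mean - risk_free) / sd

    standardized = [(r - mean) / sd for r in returns]
    skew = statistics.fmean([u ** 3 for u in standardized])
    kurt = statistics.fmean([u ** 4 for u in standardized])

    var_normal = (1.0 + 0.5 * sr ** 2) / n
    var_adjusted = (1.0 + 0.5 * sr ** 2 - skew * sr
                    + 0.25 * (kurt - 3.0) * sr ** 2) / n

    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    out = {"sr": sr}
    for name, var in (("normal", var_normal), ("adjusted", var_adjusted)):
        half = z * math.sqrt(var)
        out[name] = (sr - half, sr + half)
    return out


if __name__ == "__main__":
    # Made-up returns for illustration only; not the Apple Inc. data.
    demo = [0.01, 0.03, -0.02, 0.05, 0.00, 0.02, -0.01, 0.04, -0.03, 0.02]
    print(sharpe_intervals(demo, level=0.90))
```

The two intervals share the same centre and differ only in width; positive skewness narrows the adjusted interval for a positive Sharpe ratio, while heavy tails widen it.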

## Share and Cite

MDPI and ACS Style

Fu, Y.; Wang, H.; Wong, A. Adjusted Empirical Likelihood Method in the Presence of Nuisance Parameters with Application to the Sharpe Ratio. Entropy 2018, 20, 316. https://doi.org/10.3390/e20050316
