Testing Symmetry of Unknown Densities via Smoothing with the Generalized Gamma Kernels

Masayuki Hirukawa; Mari Sakudo

doi:10.3390/econometrics4020028

¹ Faculty of Economics, Setsunan University, 17-8 Ikeda Nakamachi, Neyagawa, Osaka 572-8508, Japan

² Research Institute of Capital Formation, Development Bank of Japan, 9-7, Otemachi 1-chome, Chiyoda-ku, Tokyo 100-8178, Japan

³ Waseda Institute of Political Economy, Waseda University, 6-1 Nishiwaseda 1-chome, Shinjuku-ku, Tokyo 169-8050, Japan

⁴ Japan Economic Research Institute, 2-1, Otemachi 2-chome, Chiyoda-ku, Tokyo 100-0004, Japan

Econometrics 2016, 4(2), 28; https://doi.org/10.3390/econometrics4020028

Abstract

This paper improves a kernel-smoothed test of symmetry through combining it with a new class of asymmetric kernels called the generalized gamma kernels. It is demonstrated that the improved test statistic has a normal limit under the null of symmetry and is consistent under the alternative. A test-oriented smoothing parameter selection method is also proposed to implement the test. Monte Carlo simulations indicate superior finite-sample performance of the test statistic. It is worth emphasizing that the performance is grounded on the first-order normal limit and a small number of observations, despite a nonparametric convergence rate and a sample-splitting procedure of the test.

Keywords:

asymmetric kernel; degenerate U-statistic; generalized gamma kernels; nonparametric kernel testing; smoothing parameter selection; symmetry test; two-sample goodness-of-fit test

MSC:

62G10; 62G20

JEL:

C12; C14

1. Introduction

Symmetry and conditional symmetry play a key role in numerous fields of economics and finance. Economists often focus on asymmetry of price adjustments (Bacon [1]), innovations in asset markets (Campbell and Hentschel [2]) or policy shocks (Clarida and Gertler [3]). In addition, the mean-variance analysis in finance is consistent with investors’ portfolio decision making if and only if asset returns are elliptically distributed (e.g., Chamberlain [4]; Owen and Rabinovitch [5]; Appendix B in Chapter 4 of Ingersoll [6]). Moreover, conditional symmetry in the distribution of the disturbance is often a key regularity condition for regression analysis. In particular, convergence properties of adaptive estimation and robust regression estimation are typically explored under this condition. For the former, Bickel [7] and Newey [8] demonstrate that conditional symmetry of the disturbance distribution in the contexts of linear regression and moment-condition models, respectively, suffices for adaptive estimators to attain their efficiency bounds. For the latter, Carroll and Welsh [9] warn of invalid inference based on robust regression estimation when the regression disturbance is asymmetrically distributed. Indeed, symmetry of the disturbance distribution is often a key assumption for consistency of parameter estimators in certain versions of robust regression estimation (e.g., Lee [10,11]; Zinde-Walsh [12]; Bondell and Stefanski [13]). Based on their simulation studies, Baldauf and Santos Silva [14] also argue that lack of conditional symmetry in the disturbance distribution may lead to inconsistency of parameter estimates via robust regression estimation.

Given the importance of symmetry, a number of tests for symmetry and conditional symmetry have been proposed. The tests can be classified into kernel and non-kernel methods. Examples of the former include Fan and Gencay [15], Ahmad and Li [16], Zheng [17], Diks and Tong [18], and Fan and Ullah [19]. The latter comprise tests based on: (i) sample moments (Randles et al. [20]; Godfrey and Orme [21]; Bai and Ng [22]; Premaratne and Bera [23]); (ii) regression percentile (Newey and Powell [24]); (iii) martingale transformation (Bai and Ng [25]); (iv) empirical processes (Delgado and Escanciano [26]; Chen and Tripathi [27]); and (v) Neyman’s smooth test (Fang et al. [28]). Our focus is on the test by Fernandes, Mendes and Scaillet [29] (abbreviated as “FMS” hereafter). While this test can be viewed as a kernel-smoothed one, it has a unique feature. When a probability density function (“pdf”) is symmetric about zero, its shapes on the positive and negative sides must be mirror images of each other. Then, after estimating the pdfs on the positive and negative sides separately using positive observations and absolute values of negative observations, respectively, FMS examine whether symmetry holds by gauging the closeness between the two density estimates. By this nature, we call the test the split-sample symmetry test (“SSST”) hereafter. One of the features of the SSST is that it relies on asymmetric kernels with support on

[0, \infty)

such as the gamma (“G”) kernel by Chen [30]. Asymmetric kernel estimators are nonnegative and boundary bias-free, and achieve the optimal convergence rate (in the mean integrated squared error sense) within the class of nonnegative kernel estimators. It is also reported (e.g., p. 597 of Gospodinov and Hirukawa [31]; p. 651 of FMS) that asymmetric kernel-based estimation and inference possess nice finite-sample properties. The split-sample approach is expected to result in efficiency loss. However, it can attain the same convergence rate as the smoothed symmetry tests using symmetric kernels do. Furthermore, unlike these tests, the SSST does not require continuity of density derivatives at the origin.

The aim of this paper is to ameliorate the SSST further through combining it with the generalized gamma (“GG”) kernels, a new class of asymmetric kernels with support on

[0, \infty)

that have been proposed recently by Hirukawa and Sakudo [32]. Our particular focus is on two special cases of the GG kernels, namely, the modified gamma (“MG”) and Nakagami-m (“NM”) kernels. While superior finite-sample performance of the MG kernel has been reported in the literature, the NM kernel is also anticipated to have an advantage when applied to the SSST. It is known that finite-sample performance of a kernel density estimator depends on proximity in shape between the underlying density and the kernel chosen. As shown in Section 2, the NM kernel collapses to the half-normal pdf when smoothing is made at the origin, and this shape is likely to be close to the positive side of single-peaked symmetric densities. We also pay particular attention to the smoothing parameter selection. While existing articles on asymmetric kernel-smoothed tests (e.g., Fernandes and Grammig [33]; FMS) simply borrow the choice method based on optimality for density estimation, we tailor the idea of test-oriented smoothing parameter selection by Kulasekera and Wang [34,35] to the SSST.

The SSST with the GG kernels plugged in preserves all appealing properties documented in FMS. First, the SSST has a normal limit under the null of symmetry and it is also consistent under the alternative. Hence, unlike the tests by Delgado and Escanciano [26] and Chen and Tripathi [27], no simulated critical values are required. Second, Monte Carlo simulations indicate superior finite-sample performance of the SSST smoothed by the GG kernels. The performance is confirmed even when the entire sample size is 50, despite a nonparametric convergence rate and a sample-splitting procedure. Remarkably, the superior performance is based simply on first-order asymptotic results, and thus the assistance of bootstrapping appears to be unnecessary, unlike most of the smoothed tests employing fixed, symmetric kernels. This result complements previous findings on asymmetric kernel-smoothed tests by Fernandes and Grammig [33] and FMS.

The remainder of this paper is organized as follows. In Section 2 a brief review of a family of the GG kernels is provided. Section 3 proposes symmetry and conditional symmetry tests based on the GG kernels. Their limiting null distributions and power properties are also explored. As an important practical problem, Section 4 discusses the smoothing parameter selection. Our particular focus is on the choice method for power optimality. Section 5 conducts Monte Carlo simulations to investigate finite-sample properties of the test statistics. Section 6 summarizes the main results of the paper. Proofs are provided in the Appendix.

This paper adopts the following notational conventions:

Γ (a) = \int_{0}^{\infty} y^{a - 1} exp (- y) d y

(a > 0)

is the gamma function;

1 \{\cdot\}

signifies an indicator function;

⌊\cdot⌋

denotes the integer part;

∥A∥ = {\{tr (A^{'} A)\}}^{1 / 2}

is the Euclidean norm of matrix A; and

c (> 0)

denotes a generic constant, the quantity of which varies from statement to statement. The expression “

X \overset{d}{=} Y

” reads “A random variable X obeys the distribution Y.” The expression “

X_{n} \sim Y_{n}

” is used whenever

X_{n} / Y_{n} \to 1

as

n \to \infty

. Lastly, in order to describe different asymptotic properties of an asymmetric kernel estimator across positions of the design point

x (> 0)

relative to the smoothing parameter

b (> 0)

that shrinks toward zero, we denote by “interior x” and “boundary x” a design point x that satisfies

x / b \to \infty

and

x / b \to κ

for some

0 < κ < \infty

as

b \to 0

, respectively.

2. Family of the GG Kernels: A Brief Review

Before proceeding, we provide a concise review of the family of the GG kernels. The family constitutes a new class of asymmetric kernels, and it consists of a specific functional form and a set of common conditions, as in Definition 1 below. The name “GG kernels” comes from the fact that the pdf of a GG distribution by Stacy [36] is chosen as the functional form. A major advantage of the family is that for each asymmetric kernel generated from this class, asymptotic properties of the kernel estimators (e.g., density and regression estimators) can be delivered by manipulating the conditions directly, as with symmetric kernels.

Definition 1.

(Hirukawa and Sakudo [32], Definition 1) Let

(α, β, γ) = (α_{b} (x), β_{b} (x), γ_{b} (x)) \in R_{+}^{3}

be a continuous function of the design point x and the smoothing parameter b. For such

(α, β, γ)

, consider the pdf of

G G (α, β Γ (α / γ) / Γ \{(α + 1) / γ\}, γ)

, i.e.,

K_{G G} (u; x, b) = \frac{γ u^{α - 1} exp [- {\{\frac{u}{β Γ (\frac{α}{γ}) / Γ (\frac{α + 1}{γ})}\}}^{γ}]}{{\{β Γ (\frac{α}{γ}) / Γ (\frac{α + 1}{γ})\}}^{α} Γ (\frac{α}{γ})} 1 \{u \geq 0\} .

(1)

This pdf is said to be a family of the GG kernels if it satisfies each of the following conditions:

Condition 1.

β = \{\begin{matrix} x & for x \geq C_{1} b \\ φ_{b} (x) & for x \in [0, C_{1} b) \end{matrix}

, where

0 < C_{1} < \infty

is some constant, the function

φ_{b} (x)

satisfies

C_{2} b \leq φ_{b} (x) \leq C_{3} b

for some constants

0 < C_{2} \leq C_{3} < \infty

, and the connection between x and

φ_{b} (x)

at

x = C_{1} b

is smooth.

Condition 2.

α, γ \geq 1

, and for

x \in [0, C_{1} b)

, α satisfies

1 \leq α \leq C_{4}

for some constant

1 \leq C_{4} < \infty

. Moreover, connections of α and γ at

x = C_{1} b

, if any, are smooth.

Condition 3.

M_{b} (x) : = \frac{Γ (\frac{α}{γ}) Γ (\frac{α + 2}{γ})}{{\{Γ (\frac{α + 1}{γ})\}}^{2}} = \{\begin{matrix} 1 + (C_{5} / x) b + o (b) & for x \geq C_{1} b \\ O (1) & for x \in [0, C_{1} b) \end{matrix}

, for some constant

0 < |C_{5}| < \infty

.

Condition 4.

H_{b} (x) : = \frac{Γ (\frac{α}{γ}) Γ (\frac{2 α}{γ})}{2^{1 / γ} Γ (\frac{α + 1}{γ}) Γ (\frac{2 α - 1}{γ})} = \{\begin{matrix} 1 + o (1) & for interior x \\ O (1) & for boundary x \end{matrix}

.

Condition 5.

A_{b, ν} (x) : = {\{\frac{γ Γ (\frac{α + 1}{γ})}{β}\}}^{ν - 1} \frac{Γ \{\frac{ν (α - 1) + 1}{γ}\}}{ν^{\frac{ν (α - 1) + 1}{γ}} {\{Γ (\frac{α}{γ})\}}^{2 ν - 1}} \sim \{\begin{matrix} V_{I} (ν) {(x b)}^{\frac{1 - ν}{2}} & for interior x \\ V_{B} (ν) b^{1 - ν} & for boundary x \end{matrix}

,

ν \in R_{+}

, where constants

0 < V_{I} (ν), V_{B} (ν) < \infty

depend only on ν.

The family embraces the following two special cases.¹ Putting

(α, β) = \{\begin{matrix} (\frac{x}{b}, x) & for x \geq 2 b \\ (\frac{1}{4} {(\frac{x}{b})}^{2} + 1, \frac{x^{2}}{4 b} + b) & for x \in [0, 2 b) \end{matrix}

and

γ = 1

in (1) generates the MG kernel

K_{M G} (u; x, b) = \frac{u^{α - 1} exp \{- u / (β / α)\}}{{(β / α)}^{α} Γ (α)} 1 \{u \geq 0\} .

It can be found that this is equivalent to the one proposed by Chen [30] by recognizing that

α = ρ_{b} (x)

on p. 473 of Chen [30] and

β / α = b

. The same

(α, β)

and

γ = 2

also yields the NM kernel

K_{N M} (u; x, b) = \frac{2 u^{α - 1} exp [- {\{u / (β Γ (\frac{α}{2}) / Γ (\frac{α + 1}{2}))\}}^{2}]}{{\{β Γ (\frac{α}{2}) / Γ (\frac{α + 1}{2})\}}^{α} Γ (\frac{α}{2})} 1 \{u \geq 0\} .
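As an informal numerical check (not part of the original derivations), the following sketch evaluates the quantities in Conditions 3–5 for the MG (γ = 1) and NM (γ = 2) specifications at an interior design point. The function names and the values x = 1 and b = 0.001 are illustrative assumptions. For the MG kernel with x ≥ 2b, the gamma-function recursion gives M_b(x) = (α + 1)/α = 1 + b/x and H_b(x) = (2α − 1)/(2α) exactly, which the code reproduces.

```python
import math

def shape(x, b):
    """(alpha, beta) shared by the MG and NM kernels, as specified in the text."""
    if x >= 2.0 * b:
        return x / b, x
    return 0.25 * (x / b) ** 2 + 1.0, x * x / (4.0 * b) + b

def conditions(x, b, gamma, nu=2):
    """Return (M_b(x), H_b(x), A_{b,nu}(x)) of Conditions 3-5 via log-gamma."""
    a, beta = shape(x, b)
    lg = math.lgamma
    m = math.exp(lg(a / gamma) + lg((a + 2) / gamma) - 2 * lg((a + 1) / gamma))
    h = math.exp(lg(a / gamma) + lg(2 * a / gamma) - math.log(2) / gamma
                 - lg((a + 1) / gamma) - lg((2 * a - 1) / gamma))
    p = (nu * (a - 1) + 1) / gamma
    log_a = ((nu - 1) * (math.log(gamma) + lg((a + 1) / gamma) - math.log(beta))
             + lg(p) - p * math.log(nu) - (2 * nu - 1) * lg(a / gamma))
    return m, h, math.exp(log_a)

if __name__ == "__main__":
    x, b = 1.0, 0.001  # interior design point: x / b is large
    for name, gamma in (("MG", 1), ("NM", 2)):
        m, h, a2 = conditions(x, b, gamma)
        # a2 * sqrt(x * b) should be close to the kernel-specific constant V_I(2).
        print(name, m, h, a2 * math.sqrt(x * b))
```

The last printed column can be compared with the kernel-specific constant V_I(2) in Condition 5; working on the log scale avoids overflow of the gamma function when x / b is large.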

The GG kernels are designed to inherit all appealing properties that the MG kernel possesses. We conclude this section by referring to the properties. Two properties below are basic ones. First, by construction, the GG kernels are free of boundary bias and always generate nonnegative density estimates everywhere. Second, the shape of each GG kernel varies according to the position at which smoothing is made; in other words, the amount of smoothing changes in a locally adaptive manner. To illustrate this property, Figure 1 plots the shapes of the MG and NM kernels at four different design points (

x = 0.0, 0.5, 1.0, 2.0

) at which the smoothing is performed. For reference, the G kernel is also drawn in each panel. When smoothing is made at the origin (Panel (A)), the NM kernel collapses to a half-normal pdf, whereas others reduce to exponential pdfs. As the design point moves away from the boundary (Panels (B–D)), the shape of each kernel becomes flatter and closer to symmetry. We should stress that Figure 1 is drawn with the value of the smoothing parameter fixed at

b = 0.2

. Unlike variable bandwidth methods for fixed, symmetric kernels (e.g., Abramson [37]), adaptive smoothing of these kernels can be achieved by a single smoothing parameter, which makes them much more appealing in empirical work.
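The shape properties described here can be reproduced with a few lines of code. The sketch below is a minimal illustration, assuming the MG/NM specification of (α, β) given earlier in this section; the function name `gg_kernel` is hypothetical. It evaluates the kernel (1) on the log scale and, in the usage part, compares the kernels at the origin with the exponential and half-normal pdfs.

```python
import math

def gg_kernel(u, x, b, gamma):
    """GG kernel (1) at data point u for design point x (gamma=1: MG, gamma=2: NM)."""
    if u < 0.0:
        return 0.0
    if x >= 2.0 * b:
        alpha, beta = x / b, x
    else:
        alpha, beta = 0.25 * (x / b) ** 2 + 1.0, x * x / (4.0 * b) + b
    # log of the scale beta * Gamma(alpha/gamma) / Gamma((alpha + 1)/gamma)
    log_s = math.log(beta) + math.lgamma(alpha / gamma) - math.lgamma((alpha + 1.0) / gamma)
    log_norm = math.log(gamma) - alpha * log_s - math.lgamma(alpha / gamma)
    if u == 0.0:
        # u^(alpha - 1) equals 1 if alpha == 1 and 0 if alpha > 1
        return math.exp(log_norm) if alpha == 1.0 else 0.0
    log_u = math.log(u)
    return math.exp(log_norm + (alpha - 1.0) * log_u - math.exp(gamma * (log_u - log_s)))

if __name__ == "__main__":
    b = 0.2
    sigma = b * math.sqrt(math.pi / 2.0)  # half-normal scale implied at x = 0
    for u in (0.1, 0.5):
        expo = math.exp(-u / b) / b
        half_normal = math.sqrt(2.0 / math.pi) / sigma * math.exp(-u * u / (2.0 * sigma ** 2))
        print(u, gg_kernel(u, 0.0, b, 1), expo, gg_kernel(u, 0.0, b, 2), half_normal)
```

Because the mean of the GG distribution in (1) equals β, each kernel is centered at the design point whenever x ≥ 2b, while its spread is governed by the single parameter b.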

Figure 1. Shapes of the GG Kernels When

b = 0.2

.

The remaining three properties are on density estimates using the GG kernels. Third, when best implemented, each GG density estimator attains Stone’s [38] optimal convergence rate in the mean integrated squared error within the class of nonnegative kernel density estimators. Fourth, the leading bias of each GG density estimator contains only the second-order derivative of the true density over the interior region, unlike many other asymmetric kernels including the G kernel. Fifth, the variance of the GG estimator tends to decrease as the design point moves away from the boundary. This property is particularly advantageous to estimating the distributions that have long tails with sparse data.

3. Tests for Symmetry and Conditional Symmetry Smoothed by the GG Kernels

3.1. SSST as a Special Case of Two-Sample Goodness-of-Fit Tests

This section proposes to combine the SSST with the GG kernels, explores asymptotic properties of the test statistic, and finally expands the scope of the test to testing the null of conditional symmetry. The SSST can be characterized as a special case of two-sample tests for equality of two unknown densities investigated by Anderson, Hall and Titterington [39]. Suppose that we are interested in testing symmetry of the distribution of a random variable

U \in R

. Without loss of generality, we hypothesize that the distribution is symmetric about zero. If U has a pdf, then under the null, its shapes on positive and negative sides of the entire real line

R

must be mirror images of each other. Let f and g be the pdfs to the right and left of the origin, respectively. Then, we would like to test the null hypothesis

H_{0} : f (u) = g (u) for almost all u \in R_{+}

against the alternative

H_{1} : f (u) \neq g (u) on a set of positive measure in R_{+} .

Accordingly, a natural test statistic should be built on the integrated squared error (“ISE”)

\begin{matrix} I & = \int_{0}^{\infty} {\{f (u) - g (u)\}}^{2} d u \\ & = \int_{0}^{\infty} \{f (u) - g (u)\} d F (u) - \int_{0}^{\infty} \{f (u) - g (u)\} d G (u), \end{matrix}

where F and G are cumulative distribution functions corresponding to f and g, respectively.

The name of the SSST comes from the way to construct a sample analog to I. A random sample of N observations

{\{U_{i}\}}_{i = 1}^{N}

is split into two sub-samples, namely,

{\{X_{i}\}}_{i = 1}^{n_{1}} : = {\{U_{i} : U_{i} \geq 0\}}_{i = 1}^{n_{1}}

and

{\{Y_{i}\}}_{i = 1}^{n_{2}} : = {\{- U_{i} : U_{i} < 0\}}_{i = 1}^{n_{2}}

, where

N = n_{1} + n_{2}

. Given the sub-samples, f and g can be estimated using a GG kernel with the smoothing parameter b as

\hat{f} (u) = \frac{1}{n_{1}} \sum_{i = 1}^{n_{1}} K_{G G} (X_{i}; u, b) and \hat{g} (u) = \frac{1}{n_{2}} \sum_{i = 1}^{n_{2}} K_{G G} (Y_{i}; u, b),

respectively.² Similarly,

(F, G)

is replaced with their empirical measures

(F_{n_{1}}, G_{n_{2}})

. In addition, because

n_{1} \sim n_{2}

under

H_{0}

, without loss of generality and for ease of exposition, we assume that N is even and that

n : = n_{1} = n_{2} = N / 2

. Using the shorthand notation

K_{X} (Y) = K_{G G} (Y; X, b)

finally yields the sample analog to I as

\begin{matrix} {\bar{I}}_{n} & = \frac{1}{n} \sum_{i = 1}^{n} \{\hat{f} (X_{i}) + \hat{g} (Y_{i}) - \hat{g} (X_{i}) - \hat{f} (Y_{i})\} \\ & = \sum_{i = 1}^{n} \frac{1}{n^{2}} \{K_{X_{i}} (X_{i}) + K_{Y_{i}} (Y_{i}) - K_{Y_{i}} (X_{i}) - K_{X_{i}} (Y_{i})\} \\ & \quad + \sum_{j = 1}^{n} \sum_{i = 1, i \neq j}^{n} \frac{1}{n^{2}} \{K_{X_{j}} (X_{i}) + K_{Y_{j}} (Y_{i}) - K_{Y_{j}} (X_{i}) - K_{X_{j}} (Y_{i})\} \\ & = I_{1 n} + I_{n} . \end{matrix}

Although we could use

{\bar{I}}_{n}

itself as the test statistic, the probability limit of

I_{1 n}

plays a role in a non-vanishing center term of the asymptotic null distribution. Because the term is likely to cause size distortions in finite samples, we focus only on

I_{n}

to construct the test statistic. Now

I_{n}

can be rewritten as

I_{n} : = \sum_{1 \leq i < j \leq n} Φ_{n} (Z_{i}, Z_{j}) : = \sum_{1 \leq i < j \leq n} \frac{1}{n^{2}} \{ϕ_{n} (Z_{i}, Z_{j}) + ϕ_{n} (Z_{j}, Z_{i})\},

where

Z_{i} : = (X_{i}, Y_{i})

and

ϕ_{n} (Z_{i}, Z_{j}) : = K_{X_{j}} (X_{i}) + K_{Y_{j}} (Y_{i}) - K_{Y_{j}} (X_{i}) - K_{X_{j}} (Y_{i})

. Observe that

Φ_{n} (Z_{i}, Z_{j})

is symmetric between

Z_{i}

and

Z_{j}

and that

E \{Φ_{n} (Z_{i}, Z_{j})| Z_{i}\} = 0

almost surely under

H_{0}

. It follows that

I_{n}

is a degenerate U-statistic, and thus we may apply a martingale central limit theorem (e.g., Theorem 1 of Hall [40]; Theorem 4.7.3 of Koroljuk and Borovskich [41]).
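The sample splitting and the decomposition of the sample analog into a diagonal part I_{1n} and the U-statistic I_n can be verified numerically. The following is a minimal sketch under stated assumptions: it uses the MG kernel, drops exact zeros, and trims the longer sub-sample so that both have a common length n (a simplification made only for this illustration; unequal sizes are treated in Section 3.2). All function names are hypothetical.

```python
import math
import random

def kern(u, x, b):
    """MG kernel K_GG(u; x, b) with gamma = 1, for a data point u > 0."""
    if x >= 2.0 * b:
        a, beta = x / b, x
    else:
        a, beta = 0.25 * (x / b) ** 2 + 1.0, x * x / (4.0 * b) + b
    log_s = math.log(beta / a)  # scale beta * Gamma(a) / Gamma(a + 1) = beta / a
    return math.exp((a - 1.0) * math.log(u) - u / math.exp(log_s)
                    - a * log_s - math.lgamma(a))

def split(us):
    """X = {U_i : U_i > 0}, Y = {-U_i : U_i < 0}, trimmed to a common length."""
    xs = [u for u in us if u > 0.0]
    ys = [-u for u in us if u < 0.0]
    n = min(len(xs), len(ys))
    return xs[:n], ys[:n]

def phi(i, j, xs, ys, b):
    """phi_n(Z_i, Z_j) = K_{X_j}(X_i) + K_{Y_j}(Y_i) - K_{Y_j}(X_i) - K_{X_j}(Y_i)."""
    return (kern(xs[i], xs[j], b) + kern(ys[i], ys[j], b)
            - kern(xs[i], ys[j], b) - kern(ys[i], xs[j], b))

def decomposition(xs, ys, b):
    """Return (I_bar_n, I_1n, I_n); the first should equal the sum of the others."""
    n = len(xs)
    f_hat = lambda u: sum(kern(x, u, b) for x in xs) / n
    g_hat = lambda u: sum(kern(y, u, b) for y in ys) / n
    i_bar = sum(f_hat(x) + g_hat(y) - g_hat(x) - f_hat(y) for x, y in zip(xs, ys)) / n
    i_1n = sum(phi(i, i, xs, ys, b) for i in range(n)) / n ** 2
    i_n = sum(phi(i, j, xs, ys, b) + phi(j, i, xs, ys, b)
              for i in range(n) for j in range(i + 1, n)) / n ** 2
    return i_bar, i_1n, i_n

if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = split([rng.gauss(0.0, 1.0) for _ in range(100)])
    print(decomposition(xs, ys, b=0.1))  # b = 0.1 is an arbitrary illustrative value
```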

Before describing the asymptotic properties of

I_{n}

, we make two remarks. First, applying the idea of two-sample goodness-of-fit tests to the symmetry test is not new. Ahmad and Li [16] and Fan and Ullah [19] have also studied the symmetry test based on closeness of two density estimates measured by the ISE. They estimate densities using two samples, namely, the original entire sample

{\{X_{i}\}}_{i = 1}^{N} : = {\{U_{i}\}}_{i = 1}^{N}

and the one obtained by flipping the sign of each observation

{\{Y_{i}\}}_{i = 1}^{N} : = {\{- U_{i}\}}_{i = 1}^{N}

in our notations. Because each of X and Y has support on

(- \infty, \infty)

by construction, a standard symmetric kernel is employed for density estimation unlike the SSST. Second, if X and Y are taken from two different distributions with support on

[0, \infty)

, then

I_{n}

can be viewed as a pure two-sample goodness-of-fit test. It can be immediately applied to the testing for equality of two unknown distributions of nonnegative economic and financial variables such as incomes, wages, short-term interest rates, and insurance claims.

To present the convergence properties of

I_{n}

, we make the following assumptions.

Assumption 1.

Two random samples

{\{X_{i}\}}_{i = 1}^{n_{1}}

and

{\{Y_{i}\}}_{i = 1}^{n_{2}}

are drawn independently from univariate distributions that have pdfs f and g with support on

[0, \infty)

, respectively.

Assumption 2.

f and g are twice continuously differentiable on

[0, \infty)

, and

E {|X f^{''} (X)|}^{2}

,

E |X^{2} f^{''} (X) g^{''} (X)|

,

E |Y^{2} f^{''} (Y) g^{''} (Y)|

,

E {|Y g^{''} (Y)|}^{2} < \infty

.

Assumption 3.

The smoothing parameter

b (= b_{n})

satisfies

b + {(n b)}^{- 1} \to 0

as

n \to \infty

.

Assumption 4.

Let

(X_{1}, X_{2})

and

(Y_{1}, Y_{2})

be two independent copies of X and Y, respectively. Then, the following hold:

(a): $E \{K_{X_{2}} (X_{1}) K_{Y_{2}} (X_{1})\} \sim E \{f (X) g (X)\}$ ; and $E \{K_{Y_{2}} (Y_{1}) K_{X_{2}} (Y_{1})\} \sim E \{f (Y) g (Y)\}$ .
(b): $E \{K_{X_{2}} (X_{1}) K_{X_{1}} (X_{2})\} \sim b^{- 1 / 2} V_{I} (2) E \{X^{- 1 / 2} f (X)\}$ ; $E \{K_{X_{2}} (Y_{1}) K_{Y_{1}} (X_{2})\} \sim b^{- 1 / 2} V_{I} (2) E \{X^{- 1 / 2} g (X)\}$ ; $E \{K_{Y_{2}} (X_{1}) K_{X_{1}} (Y_{2})\} \sim b^{- 1 / 2} V_{I} (2) E \{Y^{- 1 / 2} f (Y)\}$ ; and $E \{K_{Y_{2}} (Y_{1}) K_{Y_{1}} (Y_{2})\} \sim b^{- 1 / 2} V_{I} (2) E \{Y^{- 1 / 2} g (Y)\}$ , where $V_{I} (2)$ is a kernel-specific constant given in Condition 5 of Definition 1.

Assumptions 1–3 are standard in the literature of asymmetric kernel smoothing. On the other hand, Assumption 4 has a different flavor. Convergence results on

I_{n}

are built on several different moment approximations. While Definition 1 implies the statements in Lemma A2 in the Appendix, it is unclear whether the definition may even admit such approximations as in Assumption 4. The difficulty comes from the fact that unlike symmetric kernels, roles of design points and data points are nonexchangeable in asymmetric kernels. What makes the problem more complicated is that functional forms of

(α, β, γ) = (α_{b} (x), β_{b} (x), γ_{b} (x))

in the GG kernels are not fully specified in Definition 1. Considering that not all GG kernels may admit the moment approximations (a) and (b), we choose to make an extra assumption. Note that the MG and NM kernels fulfill Assumption 4, as documented in the next lemma.

Lemma 1.

If Assumptions 1–3 hold, then each of the MG and NM kernels satisfies Assumption 4.

The theorem below delivers the convergence properties of

I_{n}

and provides a consistent estimator of its asymptotic variance.

Theorem 1.

Suppose that Assumptions 1–4 and

n_{1} = n_{2} = n

hold.

(i): Under $H_{0}$ , $n b^{1 / 4} I_{n} \overset{d}{\to} N (0, σ^{2})$ as $n \to \infty$ , where

$σ^{2} = 2 V_{I} (2) E [X^{- 1 / 2} \{f (X) + g (X)\} + Y^{- 1 / 2} \{f (Y) + g (Y)\}],$

which reduces to $σ^{2} = 8 V_{I} (2) E \{X^{- 1 / 2} f (X)\}$ under $H_{0}$ , and $V_{I} (2)$ is a kernel-specific constant given in Condition 5 of Definition 1.
(ii): A consistent estimator of $σ^{2}$ is given by

${\hat{σ}}^{2} = 2 V_{I} (2) \frac{1}{n} \sum_{i = 1}^{n} [X_{i}^{- 1 / 2} \{\hat{f} (X_{i}) + \hat{g} (X_{i})\} + Y_{i}^{- 1 / 2} \{\hat{f} (Y_{i}) + \hat{g} (Y_{i})\}] .$

(2)

We make a few remarks. First, it follows from Lemma 1 and Theorem 1 that the MG and NM kernels can be safely employed for the SSST, where values of

V_{I} (2)

for these kernels are

(V_{I, M G} (2), V_{I, N M} (2)) = (1 / (2 \sqrt{π}), 1 / \sqrt{2 π})

. It also follows from Proposition 1 of FMS and Theorem 1 that limiting null distributions of

n b^{1 / 4} I_{n}

using the G and MG kernels coincide, as expected. Second, while a similar form to the asymptotic variance

σ^{2}

can be found in Proposition 1 of FMS,

σ^{2}

takes a more general form. Accordingly, the variance estimator

{\hat{σ}}^{2}

is consistent under both

H_{0}

and

H_{1}

. Third, it can be inferred from Theorem 1 that the test statistic becomes

T_{n} : = n b^{1 / 4} I_{n} / \hat{σ}

. As a consequence, the SSST is a one-sided test that rejects

H_{0}

in favor of

H_{1}

if

T_{n} > z_{α}

, where

z_{α}

is the upper α-percentile of

N (0, 1)

.
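To make the construction of T_n concrete, the sketch below computes I_n, the variance estimator (2) and the one-sided decision for equal sub-sample sizes. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, observations are assumed strictly positive, and the smoothing parameter b is supplied by the user (its selection is the subject of Section 4).

```python
import math
import random
from statistics import NormalDist

# V_I(2) for the MG (gamma = 1) and NM (gamma = 2) kernels, as stated in the text
V_I2 = {1: 1.0 / (2.0 * math.sqrt(math.pi)), 2: 1.0 / math.sqrt(2.0 * math.pi)}

def kern(u, x, b, gamma):
    """K_GG(u; x, b) for a data point u > 0 and design point x."""
    if x >= 2.0 * b:
        a, beta = x / b, x
    else:
        a, beta = 0.25 * (x / b) ** 2 + 1.0, x * x / (4.0 * b) + b
    log_s = math.log(beta) + math.lgamma(a / gamma) - math.lgamma((a + 1.0) / gamma)
    log_u = math.log(u)
    return math.exp(math.log(gamma) + (a - 1.0) * log_u
                    - math.exp(gamma * (log_u - log_s))
                    - a * log_s - math.lgamma(a / gamma))

def density(u, sample, b, gamma):
    """Kernel density estimate at design point u."""
    return sum(kern(s, u, b, gamma) for s in sample) / len(sample)

def ssst(xs, ys, b, gamma=1, level=0.05):
    """Return (T_n, reject) for sub-samples xs, ys of common length n."""
    n = len(xs)
    assert n == len(ys), "this sketch covers n1 = n2 only"
    total = 0.0
    for j in range(n):
        for i in range(n):
            if i != j:  # K_{X_j}(X_i) + K_{Y_j}(Y_i) - K_{Y_j}(X_i) - K_{X_j}(Y_i)
                total += (kern(xs[i], xs[j], b, gamma) + kern(ys[i], ys[j], b, gamma)
                          - kern(xs[i], ys[j], b, gamma) - kern(ys[i], xs[j], b, gamma))
    i_n = total / n ** 2
    s = 0.0
    for x, y in zip(xs, ys):
        s += (density(x, xs, b, gamma) + density(x, ys, b, gamma)) / math.sqrt(x)
        s += (density(y, xs, b, gamma) + density(y, ys, b, gamma)) / math.sqrt(y)
    sigma2 = 2.0 * V_I2[gamma] * s / n          # estimator (2)
    t_n = n * b ** 0.25 * i_n / math.sqrt(sigma2)
    return t_n, t_n > NormalDist().inv_cdf(1.0 - level)

if __name__ == "__main__":
    rng = random.Random(0)
    us = [rng.gauss(0.0, 1.0) for _ in range(200)]
    xs, ys = [u for u in us if u > 0.0], [-u for u in us if u < 0.0]
    n = min(len(xs), len(ys))                   # trimming is a simplification
    print(ssst(xs[:n], ys[:n], b=0.1, gamma=2))
```

The double loop costs O(n²) kernel evaluations, which is unproblematic for the small samples considered in this paper.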

The next proposition refers to consistency of the SSST. Observe that the power approaches one for local alternatives with convergence rates no faster than

n b^{1 / 4}

, as well as for fixed alternatives.

Proposition 1.

If Assumptions 1–4 hold, then under

H_{1}

,

Pr (T_{n} > B_{n}) \to 1

as

n \to \infty

for any non-stochastic sequence

B_{n}

satisfying

B_{n} = o (n b^{1 / 4})

.

3.2. SSST When Two Sub-Samples Have Unequal Sample Sizes

Convergence results in the previous section rely on the assumption that the sample sizes of two sub-samples

{\{X_{i}\}}_{i = 1}^{n_{1}}

and

{\{Y_{i}\}}_{i = 1}^{n_{2}}

are the same, i.e., so far

n_{1} = n_{2}

has been maintained. In reality,

n_{1} \neq n_{2}

is often the case, in particular, when the entire sample size

N = n_{1} + n_{2}

is odd or when

H_{1}

is true.

Handling this case requires more tedious calculation. When

n_{1} \neq n_{2}

,

I_{n}

can be rewritten as

\begin{matrix} I_{n_{1}, n_{2}} & = \sum_{j = 1}^{n_{1}} \sum_{i = 1, i \neq j}^{n_{1}} \frac{1}{n_{1}^{2}} K_{X_{j}} (X_{i}) + \sum_{j = 1}^{n_{2}} \sum_{i = 1, i \neq j}^{n_{2}} \frac{1}{n_{2}^{2}} K_{Y_{j}} (Y_{i}) \\ & \quad - \sum_{j = 1}^{n_{2}} \sum_{i = 1, i \neq j}^{n_{1}} \frac{1}{n_{1} n_{2}} K_{Y_{j}} (X_{i}) - \sum_{j = 1}^{n_{1}} \sum_{i = 1, i \neq j}^{n_{2}} \frac{1}{n_{1} n_{2}} K_{X_{j}} (Y_{i}) . \end{matrix}

(3)

Following Fan and Ullah [19], we deliver convergence results under the assumption that two sample sizes

n_{1}

and

n_{2}

diverge at the same rate. The asymptotic variance of

n_{1} b^{1 / 4} I_{n_{1}, n_{2}}

and its consistent estimate are also provided. Because the essential arguments are the same as those for Theorem 1 and Proposition 1, we omit the proofs of Theorem 2 and Proposition 2 and simply state the results. Observe that when

n_{1} = n_{2} = n

, these results collapse to Theorem 1 and Proposition 1, respectively.

Theorem 2.

Suppose that Assumptions 1–4 and

n_{1} / n_{2} \to λ

for some constant

λ \in (0, \infty)

hold.

(i): Under $H_{0}$ , $n_{1} b^{1 / 4} I_{n_{1}, n_{2}} \overset{d}{\to} N (0, σ_{λ}^{2})$ as $n_{1} \to \infty$ , where

$\begin{matrix} σ_{λ}^{2} & = 2 V_{I} (2) [E \{X^{- 1 / 2} f (X)\} + λ E \{X^{- 1 / 2} g (X)\} \\ & \quad + λ E \{Y^{- 1 / 2} f (Y)\} + λ^{2} E \{Y^{- 1 / 2} g (Y)\}], \end{matrix}$

which reduces to $σ_{λ}^{2} = 2 {(1 + λ)}^{2} V_{I} (2) E \{X^{- 1 / 2} f (X)\}$ under $H_{0}$ .
(ii): A consistent estimator of $σ_{λ}^{2}$ is given by

$\begin{matrix} {\hat{σ}}_{λ}^{2} & = 2 V_{I} (2) \{\frac{1}{n_{1}} \sum_{i = 1}^{n_{1}} X_{i}^{- 1 / 2} \hat{f} (X_{i}) + (\frac{n_{1}}{n_{2}}) \frac{1}{n_{1}} \sum_{i = 1}^{n_{1}} X_{i}^{- 1 / 2} \hat{g} (X_{i}) \\ & \quad + (\frac{n_{1}}{n_{2}}) \frac{1}{n_{2}} \sum_{i = 1}^{n_{2}} Y_{i}^{- 1 / 2} \hat{f} (Y_{i}) + {(\frac{n_{1}}{n_{2}})}^{2} \frac{1}{n_{2}} \sum_{i = 1}^{n_{2}} Y_{i}^{- 1 / 2} \hat{g} (Y_{i})\} . \end{matrix}$

(4)

Proposition 2.

If Assumptions 1–4 and

n_{1} / n_{2} \to λ \in (0, \infty)

hold, then under

H_{1}

,

Pr (T_{n_{1}, n_{2}} > B_{n_{1}}) : = Pr (n_{1} b^{1 / 4} I_{n_{1}, n_{2}} / {\hat{σ}}_{λ} > B_{n_{1}}) \to 1

as

n_{1} \to \infty

for any non-stochastic sequence

B_{n_{1}}

satisfying

B_{n_{1}} = o (n_{1} b^{1 / 4})

.
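For unequal sub-sample sizes, the statistic (3) and the variance estimator (4) can be coded directly. The sketch below is an illustration only, assuming the MG kernel and strictly positive observations; the function names are hypothetical, and the cross terms skip index pairs with i = j exactly as written in (3).

```python
import math

V_MG = 1.0 / (2.0 * math.sqrt(math.pi))  # V_I(2) for the MG kernel

def kern(u, x, b):
    """MG kernel K_GG(u; x, b) (gamma = 1) for a data point u > 0."""
    if x >= 2.0 * b:
        a, beta = x / b, x
    else:
        a, beta = 0.25 * (x / b) ** 2 + 1.0, x * x / (4.0 * b) + b
    s = beta / a  # scale beta * Gamma(a) / Gamma(a + 1)
    return math.exp((a - 1.0) * math.log(u) - u / s - a * math.log(s) - math.lgamma(a))

def pair_sum(data, design, b):
    """Sum of K_{design_j}(data_i) over index pairs with i != j, as in (3)."""
    return sum(kern(u, x, b)
               for j, x in enumerate(design)
               for i, u in enumerate(data) if i != j)

def ssst_unequal(xs, ys, b):
    """Return T_{n1,n2} = n1 * b^(1/4) * I_{n1,n2} / sigma_hat_lambda."""
    n1, n2 = len(xs), len(ys)
    i_stat = (pair_sum(xs, xs, b) / n1 ** 2 + pair_sum(ys, ys, b) / n2 ** 2
              - pair_sum(xs, ys, b) / (n1 * n2) - pair_sum(ys, xs, b) / (n1 * n2))
    f_hat = lambda u: sum(kern(x, u, b) for x in xs) / n1
    g_hat = lambda u: sum(kern(y, u, b) for y in ys) / n2
    lam = n1 / n2  # sample counterpart of lambda
    sigma2 = 2.0 * V_MG * (
        sum(f_hat(x) / math.sqrt(x) for x in xs) / n1
        + lam * sum(g_hat(x) / math.sqrt(x) for x in xs) / n1
        + lam * sum(f_hat(y) / math.sqrt(y) for y in ys) / n2
        + lam ** 2 * sum(g_hat(y) / math.sqrt(y) for y in ys) / n2)  # estimator (4)
    return n1 * b ** 0.25 * i_stat / math.sqrt(sigma2)
```

A call such as `ssst_unequal(xs, ys, b=0.1)` returns the statistic, which is then compared with the upper percentile of N(0, 1) as in the equal-size case; the value b = 0.1 is arbitrary.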

The next corollary is a natural outcome from Theorem 2 and comes from the fact that under

H_{0}

,

n_{1} \sim n_{2}

or

λ = 1

holds. Because N could be odd in this context, n should read

n = ⌊N / 2⌋

.

Corollary 1.

If Assumptions 1–4 and

n_{1} / n_{2} \to 1

hold, then

n_{1}, n_{2} \sim n = ⌊N / 2⌋

so that

n_{1} b^{1 / 4} I_{n_{1}, n_{2}} = n b^{1 / 4} I_{n} + o_{p} (1)

.

3.3. Extension to a Test for Conditional Symmetry

So far we have maintained the assumption that the random variable U is observable and has a distribution that is symmetric about zero. However, often U is unobservable or the axis of symmetry is not zero. The former is typical when we are interested in symmetry of the distribution of the disturbance conditional on regressors in regression analysis. In this scenario, the test is conducted after U is replaced with the residual. For the latter, the test should be based on location-adjusted observations, i.e., transformed observations with an estimate of the axis of symmetry (e.g., the sample mean or the sample median) subtracted from U. These aspects motivate us to generalize the SSST to the testing for conditional symmetry.

Following FMS, we consider testing for symmetry in the conditional distribution of

V_{1}| V_{2}

with

(V_{1}, V_{2}) \in R \times R^{d}

within a semiparametric framework. Specifically, for a parameter space

Θ_{1}

and a function

ξ_{1} : R^{d} \times Θ_{1} \to R

, it suffices to check whether the conditional distribution of

V_{1}| V_{2}

is symmetric about

ξ_{1} (V_{2}; θ_{1}^{0})

for some

θ_{1}^{0} \in

Θ_{1}

. Observe that this is equivalent to testing whether there is

θ_{1}^{0} \in

Θ_{1}

such that the conditional distribution of

V| V_{2} : =

V_{1} - ξ_{1} (V_{2}; θ_{1}^{0})| V_{2}

is symmetric about zero.

However, implementing this type of testing strategy requires estimating the conditional pdf of

V| V_{2}

nonparametrically. This is cumbersome, considering the curse of dimensionality in

V_{2}

and another smoothing parameter choice. Instead, as in Zheng [17], Bai and Ng [25] and Delgado and Escanciano [26], we assume that there are a parameter space

Θ_{2}

and a function

ξ_{2} : R^{d + 1} \times Θ_{2} \to R

that can attain symmetry of the marginal distribution of

U : = ξ_{2} (V_{1}, V_{2}; θ_{2}^{0})

about zero for some

θ_{2}^{0} \in

Θ_{2}

. Given the dependence of V (and thus U) on

Θ_{1}

, we can finally rewrite our testing scheme as the one that tests, for a suitable parameter space Θ and a function

ξ : R^{d + 1} \times Θ \to R

, symmetry of the marginal distribution of

U = ξ (V_{1}, V_{2}; θ_{0})

about zero for some

θ_{0} \in

Θ.

Accordingly, the procedure of the conditional symmetry test takes the following two steps. First, we estimate

ξ (\cdot, \cdot; θ_{0})

given N observations

{\{(V_{1 i}, V_{2 i})\}}_{i = 1}^{N}

and denote a consistent estimator of

(ξ, θ_{0})

as

(\hat{ξ}, \hat{θ})

. Second, the test is conducted using

{\{{\hat{U}}_{i}\}}_{i = 1}^{N} : =

{\{\hat{ξ} (V_{1 i}, V_{2 i}; \hat{θ})\}}_{i = 1}^{N}

. As before, the entire sample is split into two sub-samples

{\{{\hat{X}}_{i}\}}_{i = 1}^{n_{1}} : = {\{{\hat{U}}_{i} : {\hat{U}}_{i} \geq 0\}}_{i = 1}^{n_{1}}

and

{\{{\hat{Y}}_{i}\}}_{i = 1}^{n_{2}} : = {\{- {\hat{U}}_{i} : {\hat{U}}_{i} < 0\}}_{i = 1}^{n_{2}}

. Then, the test statistics, namely,

I_{n} (\hat{ξ}, \hat{θ})

and

I_{n_{1}, n_{2}} (\hat{ξ}, \hat{θ})

for equal (

n_{1} = n_{2} = n

) and unequal (

n_{1} \neq n_{2}

) sample sizes, can be obtained by replacing

(X, Y) = (X (ξ, θ_{0}), Y (ξ, θ_{0}))

in

I_{n} (ξ, θ_{0})

and

I_{n_{1}, n_{2}} (ξ, θ_{0})

with

(\hat{X}, \hat{Y})

, respectively.
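The two-step procedure can be sketched as follows. As a purely illustrative assumption, ξ is taken to be the residual of a linear regression of V_1 on a scalar V_2 estimated by ordinary least squares; the paper allows far more general specifications, and the function names are hypothetical.

```python
def ols_residuals(v1, v2):
    """Step 1: residuals U_hat_i = V_1i - a_hat - c_hat * V_2i from simple OLS."""
    n = len(v1)
    m1, m2 = sum(v1) / n, sum(v2) / n
    slope = (sum((z - m2) * (y - m1) for y, z in zip(v1, v2))
             / sum((z - m2) ** 2 for z in v2))
    intercept = m1 - slope * m2
    return [y - intercept - slope * z for y, z in zip(v1, v2)]

def split_by_sign(u_hat):
    """Step 2: X_hat = {U_hat_i : U_hat_i >= 0}, Y_hat = {-U_hat_i : U_hat_i < 0}."""
    xs = [u for u in u_hat if u >= 0.0]
    ys = [-u for u in u_hat if u < 0.0]
    return xs, ys
```

The resulting sub-samples are then passed to the test statistic in place of (X, Y). For the unconditional case with an unknown axis of symmetry, the first step reduces to subtracting the sample mean or the sample median.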

Our remaining task is to demonstrate that there is no asymptotic cost in the test statistics with

(ξ, θ_{0})

replaced by its estimator

(\hat{ξ}, \hat{θ})

, as long as

(\hat{ξ}, \hat{θ}) \overset{p}{\to} (ξ, θ_{0})

at a suitable rate of convergence. To control the convergence rate, we make Assumption 5 below. Observe that it allows for nonparametric rates of convergence; see Hansen [42], for instance, for uniform convergence rates of kernel estimators.

Assumption 5.

N^{r} ∥\hat{θ} - θ_{0}∥ \overset{p}{\to} 0

and

N^{r} |\hat{ξ} - ξ| \overset{p}{\to} 0

uniformly over

R^{d + 1} \times Θ

for some

r \in (0, 1 / 2]

.

Theorem 3 below provides combinations of the shrinking rate q for b and the convergence rate r for

(\hat{ξ}, \hat{θ})

that can establish the first-order asymptotic equivalence between

n b^{1 / 4} I_{n} (ξ, θ_{0})

(

n_{1} b^{1 / 4} I_{n_{1}, n_{2}} (ξ, θ_{0})

) and

n b^{1 / 4} I_{n} (\hat{ξ}, \hat{θ})

(

n_{1} b^{1 / 4} I_{n_{1}, n_{2}} (\hat{ξ}, \hat{θ})

) when two sub-samples have equal (unequal) sample sizes.

Theorem 3.

If Assumptions 1–5 hold, then under

H_{0}

,

n b^{1 / 4} I_{n} (\hat{ξ}, \hat{θ}) = n b^{1 / 4} I_{n} (ξ, θ_{0}) + o_{p} (1)

as

n \to \infty

when

n_{1} = n_{2} = n

and

n_{1} b^{1 / 4} I_{n_{1}, n_{2}} (\hat{ξ}, \hat{θ}) = n_{1} b^{1 / 4} I_{n_{1}, n_{2}} (ξ, θ_{0}) + o_{p} (1)

as

n_{1} \to \infty

when

n_{1} / n_{2} \to λ \in (0, \infty)

, provided that

(q, r)

belong to the set

\{(q, r) : r > - 5 q / 4 + 1, r > q / 2, r \leq 1 / 2\}

.

The set given in the theorem can be expressed as the triangular region formed by the corners

(2 / 5, 1 / 2)

,

(4 / 7, 2 / 7)

and

(1, 1 / 2)

on the

q - r

plane. The theorem also indicates that we must employ the sub-optimal smoothing parameter

b = o (n^{- 2 / 5})

or undersmooth the observations to avoid additional cost of estimating

(ξ, θ_{0})

, as is the case with other kernel-smoothed tests. Moreover, FMS set

b = o (n^{- 4 / 9})

and obtain the lower bound of r as

4 / 9

. Indeed, the set provided in Theorem 3 overlaps the one derived by FMS

\{(q, r) : q > 4 / 9, r > 4 / 9\}

.
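As a quick check of the admissible region, the following Python sketch tests whether a given pair (q, r) satisfies the three inequalities in Theorem 3. The function name `in_theorem3_region` is hypothetical, and exact rational arithmetic is used only to avoid rounding at the boundaries.

```python
from fractions import Fraction as F

def in_theorem3_region(q, r):
    """Check r > -5q/4 + 1, r > q/2 and r <= 1/2 with exact fractions."""
    return (r > 1 - F(5, 4) * q) and (r > q / 2) and (r <= F(1, 2))

# A parametric rate r = 1/2 combined with the shrinking rate q = 4/9
inside = in_theorem3_region(F(4, 9), F(1, 2))

# The three corners lie on the strict boundaries, so they are excluded
corners = [(F(2, 5), F(1, 2)), (F(4, 7), F(2, 7)), (F(1), F(1, 2))]
corner_flags = [in_theorem3_region(q, r) for q, r in corners]
```

Under these inequalities, any q strictly between 2/5 and 1 is admissible when r = 1/2.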

4. Smoothing Parameter Selection

How to choose the value of the smoothing parameter b is an important practical problem. Nonetheless, it appears that the issue has not been well addressed in the literature on testing problems using asymmetric kernels. While Fernandes and Grammig [33] adopt a method inspired by Silverman’s [43] rule-of-thumb, FMS adjust the value chosen via cross validation. Both methods choose the smoothing parameter value from the viewpoint of optimality for density estimation. Such choices cannot be justified in theory or practice, because estimation-optimal values may not be equally optimal for testing purposes. In contrast, there are a few works on test-oriented smoothing parameter selection. For the test of equality in two unknown regression curves, Kulasekera and Wang [34,35] analytically explore the idea of choosing the smoothing parameter value that maximizes the power with the size preserved. Gao and Gijbels [44] combine this idea with the Edgeworth expansion for a bootstrap specification test of parametric regression models.

Below we tailor the procedure by Kulasekera and Wang [35] to the SSST. For a realistic setup, the case of

n_{1} \neq n_{2}

is exclusively considered. Their basic idea comes from sub-sampling. Without loss of generality, assume that

{\{X_{i}\}}_{i = 1}^{n_{1}}

and

{\{Y_{i}\}}_{i = 1}^{n_{2}}

are ordered samples. Then, the entire sample

\{{\{X_{i}\}}_{i = 1}^{n_{1}}, {\{Y_{i}\}}_{i = 1}^{n_{2}}\}

can be split into M sub-samples, where

M = M_{n_{1}}

is a non-stochastic sequence that satisfies

1 / M + M / n_{1} \to 0

as

n_{1} \to \infty

. Given such M and

(k_{1}, k_{2}) : = (⌊n_{1} / M⌋, ⌊n_{2} / M⌋)

, the mth sub-sample is defined as

\{{\{X_{m + (i - 1) M}\}}_{i = 1}^{k_{1}}, {\{Y_{m + (i - 1) M}\}}_{i = 1}^{k_{2}}\}, m = 1, \dots, M

. This sub-sample yields the analogues to (3) and (4) as

\begin{matrix} I_{k_{1}, k_{2}} (m) & = \sum_{j = 1}^{k_{1}} \sum_{i = 1, i \neq j}^{k_{1}} \frac{1}{k_{1}^{2}} K_{X_{m + (j - 1) M}} (X_{m + (i - 1) M}) + \sum_{j = 1}^{k_{2}} \sum_{i = 1, i \neq j}^{k_{2}} \frac{1}{k_{2}^{2}} K_{Y_{m + (j - 1) M}} (Y_{m + (i - 1) M}) \\ - \sum_{j = 1}^{k_{2}} \sum_{i = 1, i \neq j}^{k_{1}} \frac{1}{k_{1} k_{2}} K_{Y_{m + (j - 1) M}} (X_{m + (i - 1) M}) - \sum_{j = 1}^{k_{1}} \sum_{i = 1, i \neq j}^{k_{2}} \frac{1}{k_{1} k_{2}} K_{X_{m + (j - 1) M}} (Y_{m + (i - 1) M}) \end{matrix}

and

\begin{matrix} {\hat{σ}}_{λ}^{2} (m) & = 2 V_{I} (2) \{\frac{1}{k_{1}} \sum_{i = 1}^{k_{1}} X_{m + (i - 1) M}^{- 1 / 2} {\hat{f}}_{m} (X_{m + (i - 1) M}) + (\frac{k_{1}}{k_{2}}) \frac{1}{k_{1}} \sum_{i = 1}^{k_{1}} X_{m + (i - 1) M}^{- 1 / 2} {\hat{g}}_{m} (X_{m + (i - 1) M}) \\ + (\frac{k_{1}}{k_{2}}) \frac{1}{k_{2}} \sum_{i = 1}^{k_{2}} Y_{m + (i - 1) M}^{- 1 / 2} {\hat{f}}_{m} (Y_{m + (i - 1) M}) + {(\frac{k_{1}}{k_{2}})}^{2} \frac{1}{k_{2}} \sum_{i = 1}^{k_{2}} Y_{m + (i - 1) M}^{- 1 / 2} {\hat{g}}_{m} (Y_{m + (i - 1) M})\}, \end{matrix}

where

\begin{matrix} {\hat{f}}_{m} (u) & = \frac{1}{k_{1}} \sum_{i = 1}^{k_{1}} K_{G G} (X_{m + (i - 1) M}; u, b) and \\ {\hat{g}}_{m} (u) & = \frac{1}{k_{2}} \sum_{i = 1}^{k_{2}} K_{G G} (Y_{m + (i - 1) M}; u, b) . \end{matrix}

It follows that the test statistic using the mth sub-sample becomes

T_{k_{1}, k_{2}} (m) : = \frac{k_{1} b^{1 / 4} I_{k_{1}, k_{2}} (m)}{{\hat{σ}}_{λ} (m)}, m = 1, \dots, M .
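The interleaved indexing m + (i - 1)M used for the mth sub-sample can be sketched in Python as follows. The function name `interleaved_subsample` is hypothetical, the arrays are assumed to be arranged as in the text, and Python's zero-based indexing shifts the starting position to m - 1.

```python
import numpy as np

def interleaved_subsample(x, y, m, M):
    """Return the m-th sub-sample (m = 1, ..., M) of sizes k1 = n1 // M, k2 = n2 // M."""
    x, y = np.asarray(x), np.asarray(y)
    k1, k2 = len(x) // M, len(y) // M
    # X_{m + (i - 1) M}, i = 1, ..., k1: start at position m - 1 and step by M
    x_sub = x[m - 1::M][:k1]
    # Y_{m + (i - 1) M}, i = 1, ..., k2
    y_sub = y[m - 1::M][:k2]
    return x_sub, y_sub

# Illustration with n1 = 10, n2 = 7 and M = 3, so (k1, k2) = (3, 2)
x_sub, y_sub = interleaved_subsample(np.arange(1, 11), np.arange(1, 8), m=2, M=3)
```

Each sub-sample then feeds the sub-sample statistic with its own sizes (k1, k2) in place of (n1, n2).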

Also denote the set of admissible values for

b = b_{n_{1}}

as

H_{n_{1}} : = [\underline{B} n_{1}^{- q}, \bar{B} n_{1}^{- q}]

for some prespecified exponent

q \in (0, 1)

and two constants

0 < \underline{B} < \bar{B} < \infty

. Moreover, let

{\hat{π}}_{M} (b_{k_{1}}) : = \frac{1}{M} \sum_{m = 1}^{M} 1 \{T_{k_{1}, k_{2}} (m) > c_{m} (α)\},

where

c_{m} (α)

is the critical value for the size α test using the mth sub-sample. We pick the power-maximized

{\hat{b}}_{k_{1}} = \hat{B} k_{1}^{- q} = arg {max}_{b_{k_{1}} \in H_{k_{1}}} {\hat{π}}_{M} (b_{k_{1}})

, and the smoothing parameter value

{\hat{b}}_{n_{1}} : = \hat{B} n_{1}^{- q}

follows.

The behavior of

{\hat{π}}_{M} (b_{k_{1}})

can be examined by considering the local alternative

H_{1}^{'} : f (u) = g (u) + \frac{h (u)}{\sqrt{n_{1} b^{1 / 4}}},

where

h (u)

satisfies

\int_{0}^{\infty} h (u) d u = 0

and

I_{h} : =

\int_{0}^{\infty} h^{2} (u) d u \in (0, \infty)

. Also let

π (b_{n_{1}}) : = Pr \{T_{n_{1}, n_{2}} > c (α)\}

, where

c (α)

is the critical value for the size α test using the entire sample. For such

π (b_{n_{1}})

, define

b_{n_{1}}^{*} : = B^{*} n_{1}^{- q} = arg {max}_{b_{n_{1}} \in H_{n_{1}}} π (b_{n_{1}})

. Then,

{\hat{b}}_{n_{1}}

is optimal in the sense of Proposition 3. The proof is omitted, because it is a minor modification of the one for Theorem 2.1 of Kulasekera and Wang [35]; indeed it can be established by recognizing that

T_{n_{1}, n_{2}} \overset{d}{\to} N (I_{h} / σ_{λ}, 1)

under

H_{1}^{'}

, as in Proposition 3 of Fernandes and Grammig [33].

Proposition 3.

If Assumptions 1–4,

1 / M + M / n_{1} \to 0

and

n_{1} / n_{2} \to λ \in (0, \infty)

hold, then

B^{*} / \hat{B} \overset{p}{\to} 1

as

n_{1} \to \infty

.

We conclude this section by stating how to obtain

{\hat{b}}_{n_{1}}

in practice. Step 1 reflects that M should be divergent but smaller than both

n_{1}

and

n_{2}

in finite samples. Step 3 follows from the implementation methods in Kulasekera and Wang [34,35]. Finally, Step 4 considers that there may be more than one maximizer of

{\hat{π}}_{M} (b_{k_{1}})

.

Step 1: Choose some

δ \in (0, 1)

and specify

M = min \{⌊n_{1}^{δ}⌋, ⌊n_{2}^{δ}⌋\}

.

Step 2: Make M sub-samples of sizes

(k_{1}, k_{2}) = (⌊n_{1} / M⌋, ⌊n_{2} / M⌋)

.

Step 3: Pick two constants

0 < \underline{H} < \bar{H} < 1

and define

H_{k_{1}} = [\underline{H}, \bar{H}]

.

Step 4: Set

c_{m} (α) \equiv z_{α}

and find

{\hat{b}}_{k_{1}} = inf \{arg {max}_{b_{k_{1}} \in H_{k_{1}}} {\hat{π}}_{M} (b_{k_{1}})\}

by a grid search.

Step 5: Obtain

\hat{B} = {\hat{b}}_{k_{1}} k_{1}^{q}

and calculate

{\hat{b}}_{n_{1}} = \hat{B} n_{1}^{- q}

.
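Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `select_bandwidth` is a hypothetical name, `stat` is a user-supplied callable (not defined here) that returns the sub-sample statistic for given sub-samples and b, and the grid is assumed to be sorted in ascending order so that the first maximizer is the smallest one.

```python
import math
import numpy as np

def select_bandwidth(x, y, stat, q, grid, delta=0.5, crit=1.645):
    """Power-maximizing choice of b by sub-sampling (sketch of Steps 1-5)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    # Step 1: number of sub-samples
    M = min(math.floor(n1 ** delta), math.floor(n2 ** delta))
    # Step 2: sub-sample sizes
    k1, k2 = n1 // M, n2 // M
    # Steps 3-4: grid search over the candidate values of b_{k1}
    power = []
    for b in grid:
        rejections = [stat(x[m::M][:k1], y[m::M][:k2], b) > crit for m in range(M)]
        power.append(np.mean(rejections))   # estimated power pi-hat_M(b)
    # np.argmax returns the first maximizer, i.e. the smallest one on an ascending grid
    b_k1 = float(grid[int(np.argmax(power))])
    # Step 5: rescale from sub-sample size k1 to full size n1
    B_hat = b_k1 * k1 ** q
    return B_hat * n1 ** (-q)
```

The rescaling in Step 5 keeps the constant B-hat fixed and changes only the sample-size factor, so the returned value is always smaller than the grid maximizer whenever k1 < n1.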

5. Finite-Sample Performance

5.1. Setup

It is widely recognized that asymptotic results on kernel-smoothed tests are not well transmitted to their finite-sample distributions, which reflects that omitted terms in the first-order asymptotics on the test statistics are highly sensitive to their smoothing parameter values in finite samples. On the other hand, Fernandes and Grammig [33] and FMS report superior finite-sample properties of asymmetric kernel-smoothed tests. To see which perspective dominates, this section investigates finite-sample performance of the test statistic for the SSST via Monte Carlo simulations.

To make a direct comparison with the results by FMS, we specialize in the conditional symmetry test using the same linear regression model

y = β_{0} + β_{1} x + u

as used in FMS. The data are generated in the following manner. First, the regressor x is drawn from

N (0, 1)

. Second, the disturbance u, which is independent of x, is drawn from one of eight distributions with means of zero given in Table 1. Distributions with “S” (symmetric) and “A” (asymmetric) are used to investigate size and power properties of the test statistic, respectively. All the distributions are popularly chosen in the literature; the generalized lambda distribution (“GLD”) by Ramberg and Schmeiser [45], in particular, is known to nest a wide variety of symmetric and asymmetric distributions³. Finally, the dependent variable y is generated by setting

β_{0} = β_{1} = 1

.

Table 1. Distributions of the Disturbance u in the Simulation Study.

We are interested in testing symmetry of the conditional distribution of y given x. For this purpose the SSST is applied for the least-squares residual

{\hat{u}}_{i} : = y_{i} - {\hat{β}}_{0} - {\hat{β}}_{1} x_{i}

using the sample

{\{(y_{i}, x_{i})\}}_{i = 1}^{N}

, where

({\hat{β}}_{0}, {\hat{β}}_{1})

are least-squares estimates of

(β_{0}, β_{1})

. Finite-sample size and power properties of the test statistic

T_{n_{1}, n_{2}}

for two sub-samples with unequal sample sizes are examined against nominal 5% and 10% levels. The MG and NM kernels (denoted as “

T_{n_{1}, n_{2}}

-MG” and “

T_{n_{1}, n_{2}}

-NM”, respectively) are employed as examples of the GG kernels.
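The data-generating step and the construction of the least-squares residuals can be sketched in Python as follows. The disturbance is drawn from N(0, 1) purely as a stand-in, since the eight distributions of Table 1 are not reproduced here; the seed and the sample size are likewise illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)          # illustrative seed
N = 200
x = rng.standard_normal(N)              # regressor drawn from N(0, 1)
u = rng.standard_normal(N)              # assumption: N(0, 1) stand-in for a Table 1 distribution
y = 1.0 + 1.0 * x + u                   # beta_0 = beta_1 = 1

# Least-squares fit of y on a constant and x
X = np.column_stack([np.ones(N), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat                # residuals passed to the test

# Sizes of the two sub-samples obtained from the signs of the residuals
n1 = int((u_hat >= 0).sum())
n2 = int((u_hat < 0).sum())
```

With an intercept in the regression, the residuals sum to zero by construction, but the counts of positive and negative residuals generally differ.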

Finite-sample properties of

T_{n_{1}, n_{2}}

-MG and

T_{n_{1}, n_{2}}

-NM are evaluated in comparison with other versions of the SSST. First, two versions of FMS’s original test statistic built on an equivalence to our

I_{n}

using the G kernel are considered. “FMS-G-O” is FMS’s truly original statistic, whereas “FMS-G-AltVar” is the one with the variance estimator replaced by

{\hat{σ}}_{λ}^{2}

given in Theorem 2. Second,

T_{n_{1}, n_{2}}

using the G kernel (denoted as “

T_{n_{1}, n_{2}}

-G”) is also calculated. Notice that FMS-G-AltVar and

T_{n_{1}, n_{2}}

-G take exactly the same form. The only difference is the method of choosing the smoothing parameter b, which will be discussed shortly. Effects of changing the variance estimator, the method of choosing b, and the kernel choice can be examined by comparing FMS-G-O with FMS-G-AltVar, FMS-G-AltVar with

T_{n_{1}, n_{2}}

-G, and

T_{n_{1}, n_{2}}

-G with

T_{n_{1}, n_{2}}

-MG or

T_{n_{1}, n_{2}}

-NM, respectively.

The smoothing parameter b for FMS-G-O and FMS-G-AltVar is determined by adjusting the value chosen by a cross-validation criterion; see p. 657 of FMS for details. On the other hand, the values of b for

T_{n_{1}, n_{2}}

-G,

T_{n_{1}, n_{2}}

-MG and

T_{n_{1}, n_{2}}

-NM are selected by the power-optimality criterion in the previous section. Implementation details are as follows: (i) all critical values in

{\hat{π}}_{M} (b_{k_{1}})

are set at

z_{0.05} = 1.645

; (ii) the shrinking rate of b is set at

q = 4 / 9

because of

N^{1 / 2}

-consistency of least-squares estimates and Theorem 3; (iii) three different values are considered for δ, namely,

δ \in \{0.3, 0.5, 0.7\}

; and (iv) the interval for

b_{k_{1}}

is set equal to

H_{k_{1}} = [0.01, 0.64]

. The sample size is

N \in \{50, 100, 200\}

, and 1000 replications are drawn for each combination of the sample size N and the distribution of u.

5.2. Simulation Results

Table 2 presents finite-sample rejection frequencies of each test statistic against nominal 5% and 10% levels across 1000 Monte Carlo samples. Critical values are simply based on the first-order normal limit, i.e.,

1.645

and

1.280

correspond to the 5% and 10% levels, respectively.
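The one-sided normal critical values can be reproduced with Python's standard library, as in the sketch below. The variable names are illustrative; note that the 10% quantile evaluates to approximately 1.2816, slightly above the rounded figure quoted in the text.

```python
from statistics import NormalDist

std_normal = NormalDist()            # standard normal, mean 0 and sd 1
z_05 = std_normal.inv_cdf(0.95)      # upper 5% critical value
z_10 = std_normal.inv_cdf(0.90)      # upper 10% critical value
```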

Table 2. Size and Power of the SSST.

Panel (A) reports size properties. At first glance, we can find that the results of FMS-G-O are close to what is reported in Table 3 of FMS. It tends to over-reject the null slightly relative to the nominal size. Comparing FMS-G-O with FMS-G-AltVar reveals that replacing the variance formula is likely to decrease the rejection frequencies. Changing the choice method of b further reduces the rejection frequencies, and

T_{n_{1}, n_{2}}

-G tends to result in mild under-rejection of the null. Effects of alternative kernel choices are mixed. While

T_{n_{1}, n_{2}}

-G and

T_{n_{1}, n_{2}}

-MG have similar size properties,

T_{n_{1}, n_{2}}

-NM looks more conservative in the sense that its rejection frequencies are slightly smaller. Impacts of varying δ are found to be minor at best. A concern is that all test statistics exhibit size distortions for S4. However, the distribution is platykurtic and has sharp boundaries at

\pm 1

. A platykurtic distribution is an exception rather than a rule in economics and finance, and a distribution with a compact support violates Assumption 1. In sum, all test statistics exhibit good size properties, although their convergence rates are nonparametric ones, effective sample sizes are (roughly) half of the entire sample size N, and no size correction devices such as bootstrapping are used.

Panel (B) refers to power properties. We can immediately see that the rejection frequencies of each test statistic approach one as the sample size N grows, which confirms consistency of the SSST. There is substantial improvement in power as the sample size increases from

N = 50

to 100. Most of the rejection frequencies become nearly one for a sample size as small as

N = 200

. After a closer look, we can find it hard to judge whether changing the variance formula from FMS-G-O to FMS-G-AltVar may affect power properties favorably or adversely. However, once the smoothing parameter value is chosen via the power-optimality criterion, power properties are improved in general. Power properties of

T_{n_{1}, n_{2}}

-G and

T_{n_{1}, n_{2}}

-MG again look alike, whereas

T_{n_{1}, n_{2}}

-NM appears to be more powerful than these two. Because the power tends to decrease with δ, it could be safe to choose

δ = 0.3

from the viewpoint of power-maximization. Indeed, for

N = 200

and

δ = 0.3

, each of

T_{n_{1}, n_{2}}

-G,

T_{n_{1}, n_{2}}

-MG and

T_{n_{1}, n_{2}}

-NM exhibits better power properties than FMS-G-O and FMS-G-AltVar.

For convenience, Panel (C) presents size-adjusted powers, where the best case scenario (i.e.,

δ = 0.3

) is considered for

T_{n_{1}, n_{2}}

-G,

T_{n_{1}, n_{2}}

-MG and

T_{n_{1}, n_{2}}

-NM. These three test statistics again outperform FMS’s original statistics in terms of size-adjusted powers, and

T_{n_{1}, n_{2}}

-NM appears to have the best power properties among the three. All in all, Monte Carlo results indicate superior size and power properties of the SSST with the GG kernels plugged in.

6. Conclusions

The SSST developed by FMS is built on the idea of gauging the closeness between right and left sides of the axis of symmetry of an unknown pdf. To implement the test, we split the entire sample into two sub-samples and estimate both sides of the pdf nonparametrically using asymmetric kernels with support on

[0, \infty)

. This paper has improved the SSST by combining it with the newly proposed GG kernels. The test statistic can be interpreted as a standardized version of a degenerate U-statistic. We deliver convergence properties of the test statistic and provide the asymptotic variance formulae for the cases of two sub-samples with equal and unequal sample sizes separately. It is demonstrated that the SSST smoothed by the GG kernels has a normal limit under the null of symmetry and is consistent under the alternative. As a part of the implementation method we also propose to select the smoothing parameter by a power-optimality criterion. Monte Carlo simulations indicate that the GG kernel-smoothed SSST with the power-maximized smoothing parameter value plugged in enjoys superior finite-sample properties. It should be stressed that the good performance of the SSST is grounded on the first-order normal limit and a small number of observations, despite its nonparametric convergence rate and sample-splitting procedure.

Appendix A. Appendix

Appendix A.1. Proof of Lemma 1

Because the proof for the MG kernel is basically the same as those for Lemmata 1(e) and 2 of FMS, we prove the case of the NM kernel. Among all statements, we concentrate on demonstrating that

\begin{matrix} E \{K_{X_{2}} (X_{1}) K_{Y_{2}} (X_{1})\} & \sim E \{f (X) g (X)\}, and \end{matrix}

(A1)

\begin{matrix} E \{K_{X_{2}} (Y_{1}) K_{Y_{1}} (X_{2})\} & \sim b^{- 1 / 2} V_{I} (2) E \{X^{- 1 / 2} g (X)\} . \end{matrix}

(A2)

All the remaining statements can be shown in the same manner. To approximate the gamma function, we frequently refer to the following well-known formulae:

Stirling’s formula (“SF”):

$Γ (z + 1) = \sqrt{2 π} z^{z + 1 / 2} e^{- z} \{1 + \frac{1}{12 z} + \frac{1}{288 z^{2}} + O (z^{- 3})\} as z \to \infty .$
Legendre’s duplication formula (“LDF”):

$Γ (z) Γ (z + \frac{1}{2}) = \frac{\sqrt{π}}{2^{2 z - 1}} Γ (2 z) for z > 0 .$
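Both formulae can be checked numerically with Python's standard library. The sketch below is only a sanity check; the evaluation points z = 20 and w = 3.7 are arbitrary choices and the function name `stirling` is illustrative.

```python
import math

def stirling(z):
    """Right-hand side of SF truncated after the 1/(288 z^2) term."""
    series = 1 + 1 / (12 * z) + 1 / (288 * z ** 2)
    return math.sqrt(2 * math.pi) * z ** (z + 0.5) * math.exp(-z) * series

# SF: relative error against Gamma(z + 1) should be of order z^{-3}
z = 20.0
sf_rel_err = abs(stirling(z) / math.gamma(z + 1) - 1)

# LDF: both sides should agree up to floating-point rounding
w = 3.7
ldf_lhs = math.gamma(w) * math.gamma(w + 0.5)
ldf_rhs = math.sqrt(math.pi) / 2 ** (2 * w - 1) * math.gamma(2 * w)
```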

In addition, proofs of the above statements require the following lemma. Its proof is virtually the same as those for Lemmata A.1 and A.2 of Fernandes and Monteiro [46], and thus it is omitted.

Lemma A1.

For a constant

D > 0

and two numbers

x, y > 0

,

exp \{- \frac{{(y - x)}^{2}}{D x}\} \leq {(\frac{x}{y})}^{\frac{y - x}{D}} \leq exp \{- \frac{{(y - x)}^{2}}{D x} + \frac{{(y - x)}^{3}}{2 D x^{2}}\}

(A3)

if

x \leq y

, and

exp \{- \frac{{(y - x)}^{2}}{D x} + \frac{{(y - x)}^{3}}{2 D y^{2}}\} \leq {(\frac{x}{y})}^{\frac{y - x}{D}} \leq exp \{- \frac{{(y - x)}^{2}}{D x}\}

(A4)

if

y < x

.
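The two-sided bounds (A3) and (A4) can be checked numerically on the logarithmic scale, which avoids overflow for small D. The following Python sketch does so on an arbitrary grid; the function name `exponents`, the grid values and the tolerance are illustrative assumptions.

```python
import itertools
import math

def exponents(x, y, D):
    """Return (lower, middle, upper) exponents of the bounds in (A3)/(A4)."""
    d = y - x
    middle = (d / D) * math.log(x / y)        # log of (x / y)^{(y - x) / D}
    base = -d ** 2 / (D * x)
    if x <= y:                                 # case (A3)
        lower, upper = base, base + d ** 3 / (2 * D * x ** 2)
    else:                                      # case (A4)
        lower, upper = base + d ** 3 / (2 * D * y ** 2), base
    return lower, middle, upper

grid = [0.1, 0.5, 1.0, 2.0, 7.5]
tol = 1e-9                                     # slack for floating-point rounding
ok = all(
    lo <= mid + tol and mid <= up + tol
    for x, y, D in itertools.product(grid, grid, [0.01, 0.5, 3.0])
    for lo, mid, up in [exponents(x, y, D)]
)
```

In the proofs, the lemma is applied with D = 2b and D = b, so the small-D cases on the grid are the relevant ones.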

Proof of (A1).

We apply the trimming argument as on p. 476 of Chen [30]. For some

ϵ \in (0, 1 / 2)

,

\begin{matrix} E \{K_{X_{2}} (X_{1}) K_{Y_{2}} (X_{1})\} \\ = \int_{b^{1 - ϵ}}^{\infty} [\int_{b^{1 - ϵ}}^{\infty} \{\int_{0}^{\infty} L_{b} (u; x, y) f (u) d u\} g (y) d y] f (x) d x + O (b^{1 - ϵ}), \end{matrix}

where

L_{b} (u; x, y) : = K_{N M} (u; x, b) K_{N M} (u; y, b)

for interior

x, y

. Then, the proof takes a multi-step approach including the following steps:

Step 1: approximating

J (x, y) : = \int_{0}^{\infty} L_{b} (u; x, y) f (u) d u .

Step 2: approximating

J : = \int_{b^{1 - ϵ}}^{\infty} \{\int_{b^{1 - ϵ}}^{\infty} J (x, y) g (y) d y\} f (x) d x .

Step 1: Define

P_{z} (b) : = \frac{z Γ (\frac{z}{2 b})}{Γ (\frac{z}{2 b} + \frac{1}{2})}

(A5)

for

z = x, y, x + y

. Then,

\begin{matrix} L_{b} (u; x, y) & = \frac{2 {\{P_{x} (b) P_{y} (b) / \sqrt{P_{x}^{2} (b) + P_{y}^{2} (b)}\}}^{\frac{x + y}{b} - 1} Γ (\frac{x + y}{2 b} - \frac{1}{2})}{P_{x}^{x / b} (b) Γ (\frac{x}{2 b}) P_{y}^{y / b} (b) Γ (\frac{y}{2 b})} \\ \times \frac{2 u^{(\frac{x + y}{b} - 1) - 1} exp [- {\{\frac{u}{P_{x} (b) P_{y} (b) / \sqrt{P_{x}^{2} (b) + P_{y}^{2} (b)}}\}}^{2}]}{{\{P_{x} (b) P_{y} (b) / \sqrt{P_{x}^{2} (b) + P_{y}^{2} (b)}\}}^{\frac{x + y}{b} - 1} Γ (\frac{x + y}{2 b} - \frac{1}{2})}, \end{matrix}

(A6)

where the first term is denoted as

B_{b} (x, y)

, and the second term can be viewed as the pdf of

G G ((x + y) / b - 1, P_{x} (b) P_{y} (b) / \sqrt{P_{x}^{2} (b) + P_{y}^{2} (b)}, 2)

. Moreover,

B_{b} (x, y)

can be further rewritten as

B_{1 b} B_{2 b} B_{3 b} : = \frac{2 Γ (\frac{x + y}{2 b} - \frac{1}{2})}{Γ (\frac{x}{2 b}) Γ (\frac{y}{2 b})} \frac{{\{P_{x}^{2} (b) + P_{y}^{2} (b)\}}^{1 / 2}}{P_{x} (b) P_{y} (b)} \frac{P_{x}^{y / b} (b) P_{y}^{x / b} (b)}{{\{P_{x}^{2} (b) + P_{y}^{2} (b)\}}^{\frac{x + y}{2 b}}},

(A7)

and an approximation to each of

B_{1 b}

,

B_{2 b}

and

B_{3 b}

is provided separately.

By LDF,

B_{1 b}

becomes

B_{1 b} = (\frac{2}{\frac{x + y}{2 b} - \frac{1}{2}}) \frac{Γ (\frac{x + y}{2 b} + \frac{1}{2})}{Γ (\frac{x}{2 b}) Γ (\frac{y}{2 b})} = \frac{b \sqrt{π}}{2^{\frac{x + y}{b} - 3} (x + y - b)} \frac{Γ (\frac{x + y}{b})}{Γ (\frac{x}{2 b}) Γ (\frac{y}{2 b}) Γ (\frac{x + y}{2 b})} .

Then, by SF, an approximation to

B_{1 b}

is given by

B_{1 b} = \sqrt{\frac{2}{π}} (\frac{\sqrt{x y}}{x + y}) {(\frac{x + y}{x})}^{\frac{x}{2 b}} {(\frac{x + y}{y})}^{\frac{y}{2 b}} \{1 + o (1)\} .

(A8)

Next, it follows from LDF and SF that

P_{z} (b) = \frac{2^{z / b - 1}}{\sqrt{π}} \frac{z Γ^{2} (\frac{z}{2 b})}{Γ (\frac{z}{b})} = {(2 b z)}^{1 / 2} \{1 + \frac{b}{4 z} + O (b^{2})\} .

(A9)

Hence,

B_{2 b}^{2} = \frac{1}{P_{x}^{2} (b)} + \frac{1}{P_{y}^{2} (b)} = \frac{b^{- 1}}{2} (\frac{x + y}{x y}) \{1 + o (1)\},

and thus

B_{2 b} = \frac{b^{- 1 / 2}}{\sqrt{2}} (\frac{\sqrt{x + y}}{\sqrt{x y}}) \{1 + o (1)\} .

(A10)

Furthermore, (A9) also implies that

\begin{matrix} P_{x}^{y / b} (b) & = {(2 b x)}^{\frac{y}{2 b}} exp (\frac{y}{4 x}) \{1 + o (1)\}; \\ P_{y}^{x / b} (b) & = {(2 b y)}^{\frac{x}{2 b}} exp (\frac{x}{4 y}) \{1 + o (1)\}; and \\ {\{P_{x}^{2} (b) + P_{y}^{2} (b)\}}^{\frac{x + y}{2 b}} & = {\{2 b (x + y)\}}^{\frac{x + y}{2 b}} exp (\frac{1}{2}) \{1 + o (1)\} . \end{matrix}

Then,

B_{3 b} = x^{\frac{y}{2 b}} y^{\frac{x}{2 b}} {(x + y)}^{- \frac{x + y}{2 b}} exp \{\frac{1}{4} (\frac{y}{x} - 1)\} exp \{\frac{1}{4} (\frac{x}{y} - 1)\} \{1 + o (1)\} .

(A11)

Substituting (A8), (A10) and (A11) into (A7) finally yields

B_{b} (x, y) : = b^{- 1 / 2} {\tilde{B}}_{b} (x, y) {(\frac{x}{y})}^{\frac{y - x}{2 b}} \{1 + o (1)\},

where

{\tilde{B}}_{b} (x, y) = \frac{1}{\sqrt{π} \sqrt{x + y}} exp \{\frac{1}{4} (\frac{y}{x} - 1)\} exp \{\frac{1}{4} (\frac{x}{y} - 1)\} .

Then, for a random variable

ζ_{x} \overset{d}{=} G G ((x + y) / b - 1, P_{x} (b) P_{y} (b) / \sqrt{P_{x}^{2} (b) + P_{y}^{2} (b)}, 2)

,

\begin{matrix} J (x, y) & = \int_{0}^{\infty} L_{b} (u; x, y) f (u) d u \{1 + o (1)\} \\ = b^{- 1 / 2} {\tilde{B}}_{b} (x, y) {(\frac{x}{y})}^{\frac{y - x}{2 b}} E \{f (ζ_{x})\} \{1 + o (1)\} . \end{matrix}

By the property of GG random variables, (A5), (A9), and (A10),

E (ζ_{x}) = B_{2 b}^{- 1} \frac{Γ (\frac{x + y}{2 b})}{Γ (\frac{x + y}{2 b} - \frac{1}{2})} = \sqrt{x y} \{1 + O (b)\} .

In the end, a first-order Taylor expansion of

f (ζ_{x})

around

ζ_{x} = \sqrt{x y}

gives

J (x, y) = b^{- 1 / 2} {\tilde{B}}_{b} (x, y) f (\sqrt{x y}) {(\frac{x}{y})}^{\frac{y - x}{2 b}} \{1 + o (1)\},

which completes Step 1.
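Two of the approximations used in Step 1, namely (A9) for P_z(b) and the mean of ζ_x, can be verified numerically. The Python sketch below evaluates the gamma ratios through log-gamma to avoid overflow; the function names and the values x = 1, y = 2 and b = 0.001 are arbitrary illustrative choices.

```python
import math

def P(z, b):
    """P_z(b) = z * Gamma(z / (2b)) / Gamma(z / (2b) + 1/2), as in (A5)."""
    w = z / (2 * b)
    return z * math.exp(math.lgamma(w) - math.lgamma(w + 0.5))

def mean_zeta(x, y, b):
    """Mean of the GG variable zeta_x: scale times a gamma-function ratio."""
    px, py = P(x, b), P(y, b)
    scale = px * py / math.sqrt(px ** 2 + py ** 2)    # equals 1 / B_{2b}
    w = (x + y) / (2 * b)
    return scale * math.exp(math.lgamma(w) - math.lgamma(w - 0.5))

b = 0.001
p_exact = P(1.0, b)
p_approx = math.sqrt(2 * b * 1.0) * (1 + b / 4)       # leading terms of (A9) at z = 1
m = mean_zeta(1.0, 2.0, b)                            # should be close to sqrt(1 * 2)
```

For small b, the discrepancy in (A9) is of order b squared and the discrepancy in the mean is of order b, in line with the stated remainder terms.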

Step 2: For some

t \in (0, 1)

, we split the interval for y into four subintervals as follows:

\begin{matrix} J & = \int_{b^{1 - ϵ}}^{\infty} \{\int_{b^{1 - ϵ}}^{(1 - t) x} + \int_{(1 - t) x}^{x} + \int_{x}^{(1 + t) x} + \int_{(1 + t) x}^{\infty} J (x, y) g (y) d y\} f (x) d x \\ = J_{1} + J_{2} + J_{3} + J_{4} (say) . \end{matrix}

Also denote

h (x, y) : = {\tilde{B}}_{b} (x, y) f (\sqrt{x y}) g (y)

. Then, by (A4) and the change of variable

v : = (y - x) / \sqrt{2 b x}

,

\begin{matrix} J_{1} & \leq \int_{b^{1 - ϵ}}^{\infty} [\int_{0}^{(1 - t) x} b^{- 1 / 2} h (x, y) exp \{- \frac{{(y - x)}^{2}}{2 b x}\} \{1 + o (1)\} d y] f (x) d x \\ \leq \int_{b^{1 - ϵ}}^{\infty} \sqrt{2 x} [\int_{- \frac{1}{\sqrt{2}} \sqrt{\frac{x}{b}}}^{- \frac{t}{\sqrt{2}} \sqrt{\frac{x}{b}}} h (x, x + v \sqrt{2 b x}) e^{- v^{2}} \{1 + o (1)\} d v] f (x) d x \to 0 \end{matrix}

as

b \to 0

.

Next, it follows from (A4) that

\begin{matrix} J_{2} & \geq \int_{b^{1 - ϵ}}^{\infty} [\int_{(1 - t) x}^{x} b^{- 1 / 2} h (x, y) exp \{- \frac{{(y - x)}^{2}}{2 b x} + \frac{{(y - x)}^{3}}{4 b y^{2}}\} \{1 + o (1)\} d y] f (x) d x \\ \geq \int_{b^{1 - ϵ}}^{\infty} [\int_{(1 - t) x}^{x} b^{- 1 / 2} h (x, y) exp \{- \frac{{(y - x)}^{2}}{2 b x} (1 + τ_{1})\} \{1 + o (1)\} d y] f (x) d x, \end{matrix}

where

τ_{1} : = t (2 - t) / \{2 {(1 - t)}^{2}\}

. By the change of variable

w : = (y - x) \sqrt{(1 + τ_{1}) / (2 b x)}

, the right-hand side becomes

\int_{b^{1 - ϵ}}^{\infty} \sqrt{\frac{2 x}{1 + τ_{1}}} [\int_{- t \sqrt{\frac{1 + τ_{1}}{2}} \sqrt{\frac{x}{b}}}^{0} h (x, x + w \sqrt{\frac{2 b x}{1 + τ_{1}}}) e^{- w^{2}} \{1 + o (1)\} d w] f (x) d x .

Because

\int_{- \infty}^{0} e^{- v^{2}} d v = \sqrt{π} / 2

and

h (x, x) = f (x) g (x) / \sqrt{2 π x}

, we have

lim inf_{b \to 0} J_{2} = \frac{1}{2} \sqrt{\frac{1}{1 + τ_{1}}} \int_{0}^{\infty} f (x) g (x) d F (x) \to \frac{1}{2} E \{f (X) g (X)\}

by letting t shrink toward zero. On the other hand, again by (A4) and the change of variable

v = (y - x) / \sqrt{2 b x}

,

\begin{matrix} J_{2} & \leq \int_{b^{1 - ϵ}}^{\infty} [\int_{(1 - t) x}^{x} b^{- 1 / 2} h (x, y) exp \{- \frac{{(y - x)}^{2}}{2 b x}\} \{1 + o (1)\} d y] f (x) d x \\ \leq \int_{b^{1 - ϵ}}^{\infty} \sqrt{2 x} [\int_{- \frac{t}{\sqrt{2}} \sqrt{\frac{x}{b}}}^{0} h (x, x + v \sqrt{2 b x}) e^{- v^{2}} \{1 + o (1)\} d v] f (x) d x, \end{matrix}

so that

lim sup_{b \to 0} J_{2} = \frac{1}{2} \int_{0}^{\infty} f (x) g (x) d F (x) = \frac{1}{2} E \{f (X) g (X)\} .

Hence, we can conclude that

J_{2} \to (1 / 2) E \{f (X) g (X)\}

.

It can be also demonstrated that

J_{3} \to (1 / 2) E \{f (X) g (X)\}

and

J_{4} \to 0

with the assistance of (A3). Therefore,

J \to E \{f (X) g (X)\}

, and thus (A1) is established. □

Proof of (A2). Again for some

ϵ \in (0, 1 / 2)

,

E \{K_{X_{2}} (Y_{1}) K_{Y_{1}} (X_{2})\} = \int_{b^{1 - ϵ}}^{\infty} [\int_{b^{1 - ϵ}}^{\infty} Λ_{b} (x, y) g (y) d y] f (x) d x + O (b^{- ϵ}),

where

Λ_{b} (x, y) : = K_{N M} (y; x, b) K_{N M} (x; y, b)

for interior

x, y

and the order of the remainder term is

O (b^{- ϵ}) = o (b^{- 1 / 2})

by construction. Observe that

Λ_{b} (x, y) = \frac{4 y^{x / b - 1} x^{y / b - 1}}{P_{x}^{x / b} (b) P_{y}^{y / b} (b) Γ (\frac{x}{2 b}) Γ (\frac{y}{2 b})} exp \{- \frac{y^{2}}{P_{x}^{2} (b)}\} exp \{- \frac{x^{2}}{P_{y}^{2} (b)}\} .

(A12)

It follows from (A9) that

P_{z}^{z / b} (b) = {(2 b z)}^{\frac{z}{2 b}} exp (\frac{1}{4}) \{1 + o (1)\}

(A13)

for

z = x, y

. Similarly,

\begin{matrix} exp \{- \frac{y^{2}}{P_{x}^{2} (b)}\} & = exp (- \frac{y^{2}}{2 b x}) exp \{\frac{1}{4} {(\frac{y}{x})}^{2}\} \{1 + o (1)\}, and \end{matrix}

(A14)

\begin{matrix} exp \{- \frac{x^{2}}{P_{y}^{2} (b)}\} & = exp (- \frac{x^{2}}{2 b y}) exp \{\frac{1}{4} {(\frac{x}{y})}^{2}\} \{1 + o (1)\} . \end{matrix}

(A15)

Substituting (A13)–(A15) into (A12) and using SF, we have

\begin{matrix} Λ_{b} (x, y) & = \frac{b^{- 1}}{π \sqrt{x} \sqrt{y}} {(\frac{x}{y})}^{(y - x) / b} exp \{- \frac{(y + x) {(y - x)}^{2}}{2 b x y}\} \\ \times exp [\frac{1}{4} \{{(\frac{y}{x})}^{2} - 1\}] exp [\frac{1}{4} \{{(\frac{x}{y})}^{2} - 1\}] \{1 + o (1)\} . \end{matrix}

As before, for some

t \in (0, 1)

, consider

\begin{matrix} Λ & = \int_{b^{1 - ϵ}}^{\infty} \{\int_{b^{1 - ϵ}}^{(1 - t) x} + \int_{(1 - t) x}^{x} + \int_{x}^{(1 + t) x} + \int_{(1 + t) x}^{\infty} Λ_{b} (x, y) g (y) d y\} f (x) d x \\ = Λ_{1} + Λ_{2} + Λ_{3} + Λ_{4} (say) . \end{matrix}

It follows from (A4) that

\begin{matrix} Λ_{1} & \leq \int_{b^{1 - ϵ}}^{\infty} [\int_{0}^{(1 - t) x} \frac{b^{- 1}}{π \sqrt{x} \sqrt{y}} exp \{- \frac{{(y - x)}^{2}}{b x} - \frac{(y + x) {(y - x)}^{2}}{2 b x y}\} \\ \times exp \{\frac{1}{4} ({(\frac{y}{x})}^{2} - 1)\} exp \{\frac{1}{4} ({(\frac{x}{y})}^{2} - 1)\} \{1 + o (1)\} g (y) d y] f (x) d x \\ \leq \int_{b^{1 - ϵ}}^{\infty} [\int_{0}^{(1 - t) x} \frac{b^{- 1}}{π \sqrt{x} \sqrt{y}} exp \{- \frac{τ_{2} {(y - x)}^{2}}{2 b x}\} \\ \times exp \{\frac{1}{4} ({(\frac{y}{x})}^{2} - 1)\} exp \{\frac{1}{4} ({(\frac{x}{y})}^{2} - 1)\} \{1 + o (1)\} g (y) d y] f (x) d x, \end{matrix}

where

τ_{2} : = (4 - 3 t) / (1 - t)

. Then, by the change of variable

η : = (y - x) \sqrt{τ_{2} / (2 b x)}

,

\begin{matrix} b^{1 / 2} Λ_{1} & \leq \int_{b^{1 - ϵ}}^{\infty} \frac{1}{π} \sqrt{\frac{2}{τ_{2}}} [\int_{- \sqrt{\frac{τ_{2}}{2}} \sqrt{\frac{x}{b}}}^{- t \sqrt{\frac{τ_{2}}{2}} \sqrt{\frac{x}{b}}} \frac{1}{\sqrt{x + η \sqrt{\frac{2 b x}{τ_{2}}}}} e^{- η^{2}} exp \{\frac{1}{4} ({(\frac{x + η \sqrt{\frac{2 b x}{τ_{2}}}}{x})}^{2} - 1)\} \\ \times exp \{\frac{1}{4} ({(\frac{x}{x + η \sqrt{\frac{2 b x}{τ_{2}}}})}^{2} - 1)\} \{1 + o (1)\} g (x + η \sqrt{\frac{2 b x}{τ_{2}}}) d η] f (x) d x \\ \to 0, \end{matrix}

or

Λ_{1} = o (b^{- 1 / 2})

.

Next, (A4) implies that

\begin{matrix} Λ_{2} & \geq \int_{b^{1 - ϵ}}^{\infty} [\int_{(1 - t) x}^{x} \frac{b^{- 1}}{π \sqrt{x} \sqrt{y}} exp \{- \frac{{(y - x)}^{2}}{b x} + \frac{{(y - x)}^{3}}{2 b y^{2}} - \frac{(y + x) {(y - x)}^{2}}{2 b x y}\} \\ \times exp \{\frac{1}{4} ({(\frac{y}{x})}^{2} - 1)\} exp \{\frac{1}{4} ({(\frac{x}{y})}^{2} - 1)\} \{1 + o (1)\} g (y) d y] f (x) d x \\ \geq \int_{b^{1 - ϵ}}^{\infty} [\int_{(1 - t) x}^{x} \frac{b^{- 1}}{π \sqrt{x} \sqrt{y}} exp \{- \frac{{(y - x)}^{2}}{2 b x} (3 + τ_{3})\} \\ \times exp \{\frac{1}{4} ({(\frac{y}{x})}^{2} - 1)\} exp \{\frac{1}{4} ({(\frac{x}{y})}^{2} - 1)\} \{1 + o (1)\} g (y) d y] f (x) d x, \end{matrix}

where

τ_{3} : = {(1 - t)}^{- 2}

. By the change of variable

μ : = (y - x) \sqrt{(3 + τ_{3}) / (2 b x)}

, the right-hand side becomes

\begin{matrix} \int_{b^{1 - ϵ}}^{\infty} \frac{b^{- 1 / 2}}{π} \sqrt{\frac{2}{3 + τ_{3}}} [\int_{- t \sqrt{\frac{3 + τ_{3}}{2}} \sqrt{\frac{x}{b}}}^{0} \frac{1}{\sqrt{x + μ \sqrt{\frac{2 b x}{3 + τ_{3}}}}} e^{- μ^{2}} exp \{\frac{1}{4} ({(\frac{x + μ \sqrt{\frac{2 b x}{3 + τ_{3}}}}{x})}^{2} - 1)\} \\ \times exp \{\frac{1}{4} ({(\frac{x}{x + μ \sqrt{\frac{2 b x}{3 + τ_{3}}}})}^{2} - 1)\} \{1 + o (1)\} g (x + μ \sqrt{\frac{2 b x}{3 + τ_{3}}}) d μ] f (x) d x, \end{matrix}

so that

lim inf_{b \to 0} b^{1 / 2} Λ_{2} = \frac{1}{2 \sqrt{π}} \sqrt{\frac{2}{3 + τ_{3}}} \int_{0}^{\infty} x^{- 1 / 2} g (x) d F (x) \to \frac{1}{2} V_{I, N M} (2) E \{X^{- 1 / 2} g (X)\}

by letting t shrink toward zero, where

V_{I, N M} (2) : = 1 / \sqrt{2 π}

. Notice that we may safely assume that

E \{X^{- 1 / 2} g (X)\} < \infty

: Assumption 2 ensures that f and g are bounded, and thus it must be the case that

x^{- 1 / 2} f (x) g (x) \leq c x^{- 1 / 2}

in the vicinity of the origin. On the other hand, (A4) also yields

\begin{matrix} Λ_{2} & \leq \int_{b^{1 - ϵ}}^{\infty} [\int_{(1 - t) x}^{x} \frac{b^{- 1}}{π \sqrt{x} \sqrt{y}} exp \{- \frac{{(y - x)}^{2}}{b x} - \frac{(y + x) {(y - x)}^{2}}{2 b x y}\} \\ \times exp \{\frac{1}{4} ({(\frac{y}{x})}^{2} - 1)\} exp \{\frac{1}{4} ({(\frac{x}{y})}^{2} - 1)\} \{1 + o (1)\} g (y) d y] f (x) d x \\ \leq \int_{b^{1 - ϵ}}^{\infty} [\int_{(1 - t) x}^{x} \frac{b^{- 1}}{π \sqrt{x} \sqrt{y}} exp \{- \frac{2 {(y - x)}^{2}}{b x}\} \\ \times exp \{\frac{1}{4} ({(\frac{y}{x})}^{2} - 1)\} exp \{\frac{1}{4} ({(\frac{x}{y})}^{2} - 1)\} \{1 + o (1)\} g (y) d y] f (x) d x . \end{matrix}

By the change of variable

ω : = (y - x) \sqrt{2 / (b x)}

,

\begin{matrix} Λ_{2} & \leq \int_{b^{1 - ϵ}}^{\infty} \frac{b^{- 1 / 2}}{\sqrt{2} π} [\int_{- t \sqrt{2} \sqrt{\frac{x}{b}}}^{0} \frac{1}{\sqrt{x + ω \sqrt{\frac{b x}{2}}}} e^{- ω^{2}} exp \{\frac{1}{4} ({(\frac{x + ω \sqrt{\frac{b x}{2}}}{x})}^{2} - 1)\} \\ \times exp \{\frac{1}{4} ({(\frac{x}{x + ω \sqrt{\frac{b x}{2}}})}^{2} - 1)\} \{1 + o (1)\} g (x + ω \sqrt{\frac{b x}{2}}) d ω] f (x) d x, \end{matrix}

and thus

lim sup_{b \to 0} b^{1 / 2} Λ_{2} = \frac{1}{2 \sqrt{2 π}} \int_{0}^{\infty} x^{- 1 / 2} g (x) d F (x) = \frac{1}{2} V_{I, N M} (2) E \{X^{- 1 / 2} g (X)\} .

Hence, we can conclude that

Λ_{2} \sim b^{- 1 / 2} (1 / 2) V_{I, N M} (2) E \{X^{- 1 / 2} g (X)\}

.

It also follows from (A3) that

Λ_{3} \sim b^{- 1 / 2} V_{I, N M} (2) (1 / 2) E \{X^{- 1 / 2} g (X)\}

and

Λ_{4} = o (b^{- 1 / 2})

. Therefore,

Λ \sim b^{- 1 / 2} V_{I, N M} (2) E \{X^{- 1 / 2} g (X)\}

, and thus (A2) is also established. □

Appendix A.2. Proof of Theorem 1

Because (ii) is obvious given that (i) is true, we concentrate only on (i). The proof strategy for (i) largely follows the one for Theorem 1.1 of Fernandes and Monteiro [46]. The proof of (i) also requires three lemmata below.

Lemma A2.

Let

(X_{1}, X_{2})

and

(Y_{1}, Y_{2})

be two independent copies of X and Y, respectively. Then, under Assumptions 1–3, the following hold:

(a): $E \{K_{X_{2}}^{2} (X_{1})\} \sim b^{- 1 / 2} V_{I} (2) E \{X^{- 1 / 2} f (X)\}$ ; $E \{K_{X_{2}}^{2} (Y_{1})\} \sim b^{- 1 / 2} V_{I} (2) E \{X^{- 1 / 2} g (X)\}$ ;
$E \{K_{Y_{2}}^{2} (X_{1})\} \sim b^{- 1 / 2} V_{I} (2) E \{Y^{- 1 / 2} f (Y)\}$ ; and $E \{K_{Y_{2}}^{2} (Y_{1})\} \sim b^{- 1 / 2} V_{I} (2) E \{Y^{- 1 / 2} g (Y)\},$ where $V_{I} (2)$ is given in Condition 5 of Definition 1.
(b): $E \{K_{X_{2}} (X_{1}) K_{Y_{2}} (Y_{1})\} \sim E \{f (X)\} E \{g (Y)\}$ ; $E \{K_{X_{2}} (Y_{1}) K_{Y_{2}} (X_{1})\} \sim E \{g (X)\} E \{f (Y)\}$ ;
$E \{K_{X_{2}} (Y_{1}) K_{X_{1}} (Y_{2})\} \sim E^{2} \{g (X)\}$ ; and $E \{K_{Y_{2}} (X_{1}) K_{Y_{1}} (X_{2})\} \sim E^{2} \{f (Y)\}$ .
(c): $E \{K_{X_{2}} (X_{1}) K_{X_{2}} (Y_{1})\} \sim E \{f (X) g (X)\}$ ; and $E \{K_{Y_{2}} (Y_{1}) K_{Y_{2}} (X_{1})\} \sim E \{g (Y) f (Y)\}$ .
(d): $E \{K_{X_{2}} (X_{1}) K_{X_{1}} (Y_{2})\} \sim E \{f (X) g (X)\}$ ; $E \{K_{Y_{1}} (X_{2}) K_{X_{2}} (X_{1})\} \sim E \{f^{2} (Y)\}$ ;
$E \{K_{X_{1}} (Y_{2}) K_{Y_{2}} (Y_{1})\} \sim E \{g^{2} (X)\}$ ; and $E \{K_{Y_{2}} (Y_{1}) K_{Y_{1}} (X_{2})\} \sim E \{f (Y) g (Y)\}$ .

Lemma A3.

If Assumptions 1–4 and

n_{1} = n_{2} = n

hold, then

E \{Φ_{n}^{2} (Z_{1}, Z_{2})\} \sim \frac{4 V_{I} (2)}{n^{4} b^{1 / 2}} E [X^{- 1 / 2} \{f (X) + g (X)\} + Y^{- 1 / 2} \{f (Y) + g (Y)\}] .

Lemma A4.

If Assumptions 1–4 and

n_{1} = n_{2} = n

hold, then

E \{Φ_{n}^{2 k} (Z_{1}, Z_{2})\} < \infty

and

\frac{E \{Υ_{n}^{k} (Z_{1}, Z_{2})\} + n^{1 - k} E \{Φ_{n}^{2 k} (Z_{1}, Z_{2})\}}{E^{k} \{Φ_{n}^{2} (Z_{1}, Z_{2})\}} \to 0

for some

k \in (1, 3 / 2)

, where

Υ_{n} (x, y) : = E \{Φ_{n} (Z_{1}, x) Φ_{n} (y, Z_{1})\}

.

Appendix A.2.1. Proof of Lemma A2

The variance approximation in Theorem 1 of Hirukawa and Sakudo [32] and the trimming argument on p.476 of Chen [30] yield (a). On the other hand, the bias approximation in Theorem 1 of Hirukawa and Sakudo [32] is applied to (b)–(d). As a consequence, (b) can be established by recognizing that

E \{K_{X_{2}} (X_{1}) K_{Y_{2}} (Y_{1})\} = E \{K_{X_{2}} (X_{1})\} E \{K_{Y_{2}} (Y_{1})\}

, for instance. Moreover, (c) and (d) follow from the proofs for (d) and (f) in Lemma A1 of FMS. □
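For the first statement in (b), the argument can be written out as follows. This is a sketch only: it assumes that $K_{x} (\cdot)$ denotes the kernel with design point x, so that the conditional mean of $K_{X_{2}} (X_{1})$ given $X_{2}$ equals the expected density estimate at $X_{2}$, and it uses the independence of the X- and Y-samples.

```latex
E\{K_{X_{2}}(X_{1}) K_{Y_{2}}(Y_{1})\}
  = E\{K_{X_{2}}(X_{1})\} \, E\{K_{Y_{2}}(Y_{1})\},
\qquad
E\{K_{X_{2}}(X_{1})\}
  = E\bigl[ E\{K_{X_{2}}(X_{1}) \mid X_{2}\} \bigr]
  = E\{f(X_{2})\} + o(1),
```

and likewise $E \{K_{Y_{2}} (Y_{1})\} = E \{g (Y_{2})\} + o (1)$, which gives the limit $E \{f (X)\} E \{g (Y)\}$.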

Appendix A.2.2. Proof of Lemma A3

Because

ϕ_{n} (Z_{1}, Z_{2}) \overset{d}{=} ϕ_{n} (Z_{2}, Z_{1})

, we have

E \{Φ_{n}^{2} (Z_{1}, Z_{2})\} = \frac{2}{n^{4}} [E \{ϕ_{n}^{2} (Z_{1}, Z_{2})\} + E \{ϕ_{n} (Z_{1}, Z_{2}) ϕ_{n} (Z_{2}, Z_{1})\}] .

With the assistance of Assumption 4 and Lemma A2, we can pick out the leading terms of

E \{ϕ_{n}^{2} (Z_{1}, Z_{2})\}

and

E \{ϕ_{n} (Z_{1}, Z_{2}) ϕ_{n} (Z_{2}, Z_{1})\}

as:

\begin{matrix} E \{ϕ_{n}^{2} (Z_{1}, Z_{2})\} \\ = E \{K_{X_{2}}^{2} (X_{1})\} + E \{K_{X_{2}}^{2} (Y_{1})\} + E \{K_{Y_{2}}^{2} (X_{1})\} + E \{K_{Y_{2}}^{2} (Y_{1})\} + O (1) \\ = b^{- 1 / 2} V_{I} (2) E [X^{- 1 / 2} \{f (X) + g (X)\} + Y^{- 1 / 2} \{f (Y) + g (Y)\}] + o (b^{- 1 / 2}); and \\ E \{ϕ_{n} (Z_{1}, Z_{2}) ϕ_{n} (Z_{2}, Z_{1})\} \\ = E \{K_{X_{2}} (X_{1}) K_{X_{1}} (X_{2})\} + E \{K_{X_{2}} (Y_{1}) K_{Y_{1}} (X_{2})\} \\ + E \{K_{Y_{2}} (X_{1}) K_{X_{1}} (Y_{2})\} + E \{K_{Y_{2}} (Y_{1}) K_{Y_{1}} (Y_{2})\} + O (1) \\ = b^{- 1 / 2} V_{I} (2) E [X^{- 1 / 2} \{f (X) + g (X)\} + Y^{- 1 / 2} \{f (Y) + g (Y)\}] + o (b^{- 1 / 2}) . \end{matrix}

The result immediately follows. □
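The final step is a matter of collecting constants. As a sketch, write $A : = E [X^{- 1 / 2} \{f (X) + g (X)\} + Y^{- 1 / 2} \{f (Y) + g (Y)\}]$ (a shorthand introduced here only for illustration) and use the order $b^{- 1 / 2}$ from Lemma A2(a) for both leading terms:

```latex
E\{\Phi_{n}^{2}(Z_{1}, Z_{2})\}
  = \frac{2}{n^{4}} \Bigl[ b^{-1/2} V_{I}(2) A + b^{-1/2} V_{I}(2) A + o(b^{-1/2}) \Bigr]
  \sim \frac{4 V_{I}(2)}{n^{4} b^{1/2}} \, A ,
```

which is the statement of Lemma A3.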

Appendix A.2.3. Proof of Lemma A4

It follows from Lemma A3 that

E^{k} \{Φ_{n}^{2} (Z_{1}, Z_{2})\} = O (n^{- 4 k} b^{- k / 2}) .

(A16)

Next, by Jensen’s and

C_{r}

-inequalities,

\begin{matrix} E \{Υ_{n}^{k} (Z_{1}, Z_{2})\} \\ = E_{Z_{1}, Z_{2}} [E_{Z_{3}}^{k} \{Φ_{n} (Z_{3}, Z_{1}) Φ_{n} (Z_{2}, Z_{3})\}] \\ \leq E_{Z_{1}, Z_{2}} [E_{Z_{3}} {\{Φ_{n} (Z_{3}, Z_{1}) Φ_{n} (Z_{2}, Z_{3})\}}^{k}] \\ \leq n^{- 4 k} E_{Z_{1}, Z_{2}} \{E_{Z_{3}} |ϕ_{n} (Z_{3}, Z_{1}) ϕ_{n} (Z_{2}, Z_{3}) + ϕ_{n} (Z_{3}, Z_{1}) ϕ_{n} (Z_{3}, Z_{2}) \\ + {ϕ_{n} (Z_{1}, Z_{3}) ϕ_{n} (Z_{2}, Z_{3}) + ϕ_{n} (Z_{1}, Z_{3}) ϕ_{n} (Z_{3}, Z_{2})|}^{k}\} \\ \leq n^{- 4 k} 2^{2 (k - 1)} E_{Z_{1}, Z_{2}} [E_{Z_{3}} {|ϕ_{n} (Z_{3}, Z_{1}) ϕ_{n} (Z_{2}, Z_{3})|}^{k} + E_{Z_{3}} {|ϕ_{n} (Z_{3}, Z_{1}) ϕ_{n} (Z_{3}, Z_{2})|}^{k} \\ + E_{Z_{3}} {|ϕ_{n} (Z_{1}, Z_{3}) ϕ_{n} (Z_{2}, Z_{3})|}^{k} + E_{Z_{3}} {|ϕ_{n} (Z_{1}, Z_{3}) ϕ_{n} (Z_{3}, Z_{2})|}^{k}] \\ = n^{- 4 k} 2^{2 (k - 1)} (Υ_{1} + Υ_{2} + Υ_{3} + Υ_{4}) (say) . \end{matrix}

Furthermore, applying

C_{r}

-inequality repeatedly yields

Υ_{1} \leq 2^{4 (k - 1)} \cdot 8 [E \{K_{X_{1}}^{k} (X_{3}) K_{X_{3}}^{k} (X_{2})\} + E^{2} \{K_{X_{1}}^{k} (X_{3})\}]

under

H_{0}

. Essentially the same arguments as in the proofs of Lemmata 1 and A2 establish that

E \{K_{X_{1}}^{k} (X_{3}) K_{X_{3}}^{k} (X_{2})\}

is bounded by

c b^{1 - k} \int_{0}^{\infty} x^{1 - k} f^{3} (x) d x

. It follows from

k < 3 / 2

that

x^{1 - k} f^{3} (x) \leq c x^{- 1 / 2}

in the neighborhood of the origin, and thus

\int_{0}^{\infty} x^{1 - k} f^{3} (x) d x < \infty

holds. Hence,

E \{K_{X_{1}}^{k} (X_{3}) K_{X_{3}}^{k} (X_{2})\} \leq O (b^{1 - k})

. Similarly,

E^{2} \{K_{X_{1}}^{k} (X_{3})\} \leq O \{{(b^{(1 - k) / 2})}^{2}\} = O (b^{1 - k})

, and thus

Υ_{1} \leq O (b^{1 - k})

. It can also be shown that each of

Υ_{2}

,

Υ_{3}

and

Υ_{4}

is bounded by

O (b^{1 - k})

. As a result,

E \{Υ_{n}^{k} (Z_{1}, Z_{2})\} \leq O (n^{- 4 k} b^{1 - k}) .

(A17)

Using

C_{r}

-inequality and

ϕ_{n} (Z_{1}, Z_{2}) \overset{d}{=} ϕ_{n} (Z_{2}, Z_{1})

, we also have

\begin{matrix} E \{Φ_{n}^{2 k} (Z_{1}, Z_{2})\} & \leq n^{- 4 k} E {|ϕ_{n} (Z_{1}, Z_{2}) + ϕ_{n} (Z_{2}, Z_{1})|}^{2 k} \\ \leq n^{- 4 k} 2^{2 k - 1} \{E {|ϕ_{n} (Z_{1}, Z_{2})|}^{2 k} + E {|ϕ_{n} (Z_{2}, Z_{1})|}^{2 k}\} \\ = n^{- 4 k} 2^{2 k} E {|ϕ_{n} (Z_{1}, Z_{2})|}^{2 k} . \end{matrix}

Again, by

C_{r}

-inequality,

E {|ϕ_{n} (Z_{1}, Z_{2})|}^{2 k} \leq 4 \cdot 2^{2 (2 k - 1)} E \{K_{X_{2}}^{2 k} (X_{1})\} \leq c b^{\frac{1 - 2 k}{2}} \int_{0}^{\infty} x^{\frac{1 - 2 k}{2}} f^{2} (x) d x,

where

x^{(1 - 2 k) / 2} f^{2} (x) \leq c x^{- (1 - ε)}

for some

ε \in (0, 1 / 2)

as

x \to 0

so that

\int_{0}^{\infty} x^{(1 - 2 k) / 2} f^{2} (x) d x < \infty

is ensured. Therefore,

E \{Φ_{n}^{2 k} (Z_{1}, Z_{2})\} \leq O (n^{- 4 k} b^{\frac{1 - 2 k}{2}}) = o (1)

(A18)

and thus

E \{Φ_{n}^{2 k} (Z_{1}, Z_{2})\} < \infty

is demonstrated.

In the end, by (A16)–(A18),

\begin{matrix} \frac{E \{Υ_{n}^{k} (Z_{1}, Z_{2})\}}{E^{k} \{Φ_{n}^{2} (Z_{1}, Z_{2})\}} & = O (b^{1 - k / 2}) \to 0, and \\ \frac{n^{1 - k} E \{Φ_{n}^{2 k} (Z_{1}, Z_{2})\}}{E^{k} \{Φ_{n}^{2} (Z_{1}, Z_{2})\}} & = O \{{(n b^{1 / 2})}^{1 - k}\} \to 0, \end{matrix}

as long as

1 < k < 3 / 2

. This completes the proof. □
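The two rates above can be checked with exact exponent arithmetic. The sketch below is illustrative only: it assumes a smoothing parameter of the form $b = n^{- q}$ with $0 < q < 2$ (so that $n b^{1 / 2} \to \infty$), and the function name `lemma_a4_rate_exponents` is hypothetical. It returns the powers of n governing the two ratios; both must be negative for the ratios to vanish.

```python
from fractions import Fraction


def lemma_a4_rate_exponents(k, q):
    """Exponents (e1, e2) with ratio1 = O(n**e1) and ratio2 = O(n**e2).

    Assumes b = n**(-q); k is the moment order in Lemma A4.
    """
    k, q = Fraction(k), Fraction(q)
    # First ratio: O(b^{1 - k/2}) = O(n^{-q (1 - k/2)}).
    e1 = -q * (1 - k / 2)
    # Second ratio: O((n b^{1/2})^{1 - k}) = O(n^{(1 - q/2)(1 - k)}).
    e2 = (1 - q / 2) * (1 - k)
    return e1, e2


if __name__ == "__main__":
    # Example: k = 5/4 lies in (1, 3/2); q = 2/5 is an arbitrary choice.
    e1, e2 = lemma_a4_rate_exponents(Fraction(5, 4), Fraction(2, 5))
    print(e1, e2)
```

Exact fractions are used instead of floats so that boundary cases such as $k = 1$ (where the second exponent is exactly zero) are not blurred by rounding.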

Appendix A.2.4. Proof of Theorem 1

It follows from Lemma A4 that a martingale central limit theorem for a degenerate U-statistic (Theorem 4.7.3 of Koroljuk and Borovskich [41], to be precise) applies. Moreover, by Lemma A3, the asymptotic variance of the normal limit becomes

\begin{matrix} σ^{2} & = lim_{n \to \infty} n^{2} b^{1 / 2} V a r (I_{n}) \\ = lim_{n \to \infty} n^{2} b^{1 / 2} \frac{n (n - 1)}{2} E \{Φ_{n}^{2} (Z_{1}, Z_{2})\} \\ = 2 V_{I} (2) E [X^{- 1 / 2} \{f (X) + g (X)\} + Y^{- 1 / 2} \{f (Y) + g (Y)\}] \\ = 8 V_{I} (2) E \{X^{- 1 / 2} f (X)\} under H_{0} . \end{matrix}

□
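The passage from the second to the third line of the display can be spelled out by substituting Lemma A3. In this sketch, A is shorthand (introduced here for illustration) for the expectation $E [X^{- 1 / 2} \{f (X) + g (X)\} + Y^{- 1 / 2} \{f (Y) + g (Y)\}]$:

```latex
n^{2} b^{1/2} \cdot \frac{n(n-1)}{2} \cdot \frac{4 V_{I}(2)}{n^{4} b^{1/2}} \, A
  = 2 V_{I}(2) \, \frac{n-1}{n} \, A
  \;\to\; 2 V_{I}(2) \, A .
```

Under $H_{0}$, $f = g$ and X and Y share the same distribution, so $A = 4 E \{X^{- 1 / 2} f (X)\}$ and the limit equals $8 V_{I} (2) E \{X^{- 1 / 2} f (X)\}$.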

Appendix A.3. Proof of Proposition 1

The proof closely follows the one for Theorem 2.2 of Fan and Ullah [19]. Under

H_{1}

,

E (I_{n}) = \int_{0}^{\infty} {\{f (u) - g (u)\}}^{2} d u + O (b) = I + O (b)

. Moreover,

V a r (I_{n}) = O (n^{- 2} b^{- 1 / 2})

and

{\hat{σ}}^{2} \overset{p}{\to} σ^{2}

, regardless of whether

H_{0}

or

H_{1}

may be true. Therefore,

I_{n} = I + O (b) + O_{p} (n^{- 1} b^{- 1 / 4}) \overset{p}{\to} I > 0

, and thus

n b^{1 / 4} I_{n} / \hat{σ}

is a divergent stochastic sequence with an expansion rate of

n b^{1 / 4}

. The result immediately follows. □

Appendix A.4. Proof of Theorem 3

For brevity, we focus only on the case of equal sample sizes in two sub-samples. The proof largely follows the one for Proposition 5 of FMS. FMS consider the Taylor expansion

I_{n} (\hat{ξ}, \hat{θ}) - I_{n} (ξ, θ_{0}) = Δ_{1} (ξ, θ_{0}) (\hat{ξ} - ξ) + Δ_{2} (ξ, θ_{0}) (\hat{θ} - θ_{0}) + R_{n},

where

Δ_{1} (ξ, θ_{0})

and

Δ_{2} (ξ, θ_{0})

are partial derivatives of

I_{n}

with respect to the first and second arguments evaluated at

(ξ, θ_{0})

, respectively, and

R_{n}

is the remainder term of a smaller order. The only difference between their proof and ours is that we derive the range of

(q, r)

within which

n b^{1 / 4} \{Δ_{1} (ξ, θ_{0}) (\hat{ξ} - ξ) + Δ_{2} (ξ, θ_{0}) (\hat{θ} - θ_{0})\} = o_{p} (1)

is the case. Because each of

Δ_{1}

and

Δ_{2}

is

O (b) + O_{p} (n^{- 1} b^{- 3 / 4})

, the left-hand side is bounded by

O (n^{1 - r} b^{5 / 4}) + O_{p} (n^{- r} b^{- 1 / 2}) = O (n^{1 - r - 5 q / 4}) + O_{p} (n^{- r + q / 2}) .

This becomes

o_{p} (1)

if

(q, r)

satisfy

r > - 5 q / 4 + 1

,

r > q / 2

and

r \leq 1 / 2

. □
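The admissible range of $(q, r)$ can be checked mechanically. The sketch below is illustrative only: it assumes $b = n^{- q}$ and estimation errors of order $n^{- r}$, as implied by the bound above, and the function names are hypothetical. It computes the two exponents of n in the bound and tests the three conditions.

```python
from fractions import Fraction


def theorem3_orders(q, r):
    """Exponents of n in O(n^{1 - r - 5q/4}) and O_p(n^{-r + q/2})."""
    q, r = Fraction(q), Fraction(r)
    deterministic = 1 - r - Fraction(5, 4) * q
    stochastic = -r + q / 2
    return deterministic, stochastic


def is_admissible(q, r):
    """True if r > 1 - 5q/4, r > q/2 and r <= 1/2 all hold."""
    deterministic, stochastic = theorem3_orders(q, r)
    return deterministic < 0 and stochastic < 0 and Fraction(r) <= Fraction(1, 2)


if __name__ == "__main__":
    # With r = 1/2 the conditions reduce to 2/5 < q < 1.
    print(is_admissible(Fraction(4, 9), Fraction(1, 2)))  # inside the range
    print(is_admissible(Fraction(1, 5), Fraction(1, 2)))  # q too small
```

For instance, with $r = 1 / 2$ the first two conditions reduce to $2 / 5 < q < 1$.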

Acknowledgments

We would like to thank the editor Kerry Patterson, four anonymous referees, Yohei Yamamoto, and the participants of seminars at Hitotsubashi University and the Development Bank of Japan for their constructive comments and suggestions. We are also grateful to Marcelo Fernandes, Eduardo Mendes and Olivier Scaillet for providing us with the computer codes used for Monte Carlo simulations in Fernandes, Mendes and Scaillet [29]. This research was supported, in part, by a grant from the Japan Society for the Promotion of Science (grant number 15K03405). The views expressed herein are those of the authors and do not necessarily reflect the views of the Development Bank of Japan.

Author Contributions

The authors contributed equally to the paper as a whole.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. R.W. Bacon. “Rockets and feathers: The asymmetric speed of adjustment of UK retail gasoline prices to cost changes.” Energy Econ. 13 (1991): 211–218.
2. J.Y. Campbell, and L. Hentschel. “No news is good news: An asymmetric model of changing volatility in stock returns.” J. Financ. Econ. 31 (1992): 281–318.
3. R. Clarida, and M. Gertler. “How the Bundesbank conducts monetary policy.” In Reducing Inflation: Motivation and Strategy. Edited by C.D. Romer and D.H. Romer. Chicago, IL, USA: University of Chicago Press, 1997, pp. 363–412.
4. G. Chamberlain. “A characterization of the distributions that imply mean-variance utility functions.” J. Econ. Theory 29 (1983): 185–201.
5. J. Owen, and R. Rabinovitch. “On the class of elliptical distributions and their applications to the theory of portfolio choice.” J. Finance 38 (1983): 745–752.
6. J.E. Ingersoll Jr. Theory of Financial Decision Making. Savage, MD, USA: Rowman & Littlefield, 1987.
7. P.J. Bickel. “On adaptive estimation.” Ann. Stat. 10 (1982): 647–671.
8. W.K. Newey. “Adaptive estimation of regression models via moment restrictions.” J. Econom. 38 (1988): 301–339.
9. R.J. Carroll, and A.H. Welsh. “A note on asymmetry and robustness in linear regression.” Am. Stat. 42 (1988): 285–287.
10. M.-J. Lee. “Mode regression.” J. Econom. 42 (1989): 337–349.
11. M.-J. Lee. “Quadratic mode regression.” J. Econom. 57 (1993): 1–19.
12. V. Zinde-Walsh. “Asymptotic theory for some high breakdown point estimators.” Econom. Theory 18 (2002): 1172–1196.
13. H.D. Bondell, and L.A. Stefanski. “Efficient robust regression via two-stage generalized empirical likelihood.” J. Am. Stat. Assoc. 108 (2013): 644–655.
14. M. Baldauf, and J.M.C. Santos Silva. “On the use of robust regression in econometrics.” Econ. Lett. 114 (2012): 124–127.
15. Y. Fan, and R. Gencay. “A consistent nonparametric test of symmetry in linear regression models.” J. Am. Stat. Assoc. 90 (1995): 551–557.
16. I.A. Ahmad, and Q. Li. “Testing symmetry of unknown density functions by kernel method.” J. Nonparametr. Stat. 7 (1997): 279–293.
17. J.X. Zheng. “Consistent specification testing for conditional symmetry.” Econom. Theory 14 (1998): 139–149.
18. C. Diks, and H. Tong. “A test for symmetries of multivariate probability distributions.” Biometrika 86 (1999): 605–614.
19. Y. Fan, and A. Ullah. “On goodness-of-fit tests for weakly dependent processes using kernel method.” J. Nonparametr. Stat. 11 (1999): 337–360.
20. R.H. Randles, M.A. Fligner, G.E. Policello II, and D.A. Wolfe. “An asymptotically distribution-free test for symmetry versus asymmetry.” J. Am. Stat. Assoc. 75 (1980): 168–172.
21. L.G. Godfrey, and C.D. Orme. “Testing for skewness of regression disturbances.” Econ. Lett. 37 (1991): 31–34.
22. J. Bai, and S. Ng. “Tests for skewness, kurtosis, and normality for time series data.” J. Bus. Econ. Stat. 23 (2005): 49–58.
23. G. Premaratne, and A. Bera. “A test for symmetry with leptokurtic financial data.” J. Financ. Econom. 3 (2005): 169–187.
24. W.K. Newey, and J.L. Powell. “Asymmetric least squares estimation and testing.” Econometrica 55 (1987): 819–847.
25. J. Bai, and S. Ng. “A consistent test for conditional symmetry in time series models.” J. Econom. 103 (2001): 225–258.
26. M.A. Delgado, and J.C. Escanciano. “Nonparametric tests for conditional symmetry in dynamic models.” J. Econom. 141 (2007): 652–682.
27. T. Chen, and G. Tripathi. “Testing conditional symmetry without smoothing.” J. Nonparametr. Stat. 25 (2013): 273–313.
28. Y. Fang, Q. Li, X. Wu, and D. Zhang. “A data-driven test of symmetry.” J. Econom. 188 (2015): 490–501.
29. M. Fernandes, E.F. Mendes, and O. Scaillet. “Testing for symmetry and conditional symmetry using asymmetric kernels.” Ann. Inst. Stat. Math. 67 (2015): 649–671.
30. S.X. Chen. “Probability density function estimation using gamma kernels.” Ann. Inst. Stat. Math. 52 (2000): 471–480.
31. N. Gospodinov, and M. Hirukawa. “Nonparametric estimation of scalar diffusion models of interest rates using asymmetric kernels.” J. Empir. Finance 19 (2012): 595–609.
32. M. Hirukawa, and M. Sakudo. “Family of the generalised gamma kernels: A generator of asymmetric kernels for nonnegative data.” J. Nonparametr. Stat. 27 (2015): 41–63.
33. M. Fernandes, and J. Grammig. “Nonparametric specification tests for conditional duration models.” J. Econom. 127 (2005): 35–68.
34. K.B. Kulasekera, and J. Wang. “Smoothing parameter selection for power optimality in testing of regression curves.” J. Am. Stat. Assoc. 92 (1997): 500–511.
35. K.B. Kulasekera, and J. Wang. “Bandwidth selection for power optimality in a test of equality of regression curves.” Stat. Probab. Lett. 37 (1998): 287–293.
36. E.W. Stacy. “A generalization of the gamma distribution.” Ann. Math. Stat. 33 (1962): 1187–1192.
37. I.S. Abramson. “On bandwidth variation in kernel estimates—A square root law.” Ann. Stat. 10 (1982): 1217–1223.
38. C.J. Stone. “Optimal rates of convergence for nonparametric estimators.” Ann. Stat. 8 (1980): 1348–1360.
39. N.H. Anderson, P. Hall, and D.M. Titterington. “Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates.” J. Multivar. Anal. 50 (1994): 41–54.
40. P. Hall. “Central limit theorem for integrated square error of multivariate nonparametric density estimators.” J. Multivar. Anal. 14 (1984): 1–16.
41. V.S. Koroljuk, and Y.V. Borovskich. Theory of U-Statistics. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1994.
42. B.E. Hansen. “Uniform convergence rates for kernel estimation with dependent data.” Econom. Theory 24 (2008): 726–748.
43. B.W. Silverman. Density Estimation for Statistics and Data Analysis. London, UK: Chapman & Hall, 1986.
44. J. Gao, and I. Gijbels. “Bandwidth selection in nonparametric kernel testing.” J. Am. Stat. Assoc. 103 (2008): 1584–1594.
45. J.S. Ramberg, and B.W. Schmeiser. “An approximate method for generating asymmetric random variables.” Commun. ACM 17 (1974): 78–82.
46. M. Fernandes, and P.K. Monteiro. “Central limit theorem for asymmetric kernel functionals.” Ann. Inst. Stat. Math. 57 (2005): 425–442.

Figure 1. Shapes of the GG Kernels When

b = 0.2

.

Table 1. Distributions of the Disturbance u in the Simulation Study.

	Distribution	Skewness	Kurtosis
S1	$N (0, 1)$	$0.00$	$3.00$
S2	$t_{10}$	$0.00$	$4.00$
S3	$D E (0, 1)$ or Standard Laplace	$0.00$	$6.00$
S4	$U [- 1, 1]$ or GLD with $(λ_{1}, λ_{2}, λ_{3}, λ_{4}) = (0, 1, 1, 1)$	$0.00$	$1.80$
A1	$L N (0, 1) - exp (1 / 2)$	$6.18$	$113.94$
A2	$χ_{3}^{2} - 3$	$1.63$	$7.00$
A3	GLD with $(λ_{1}, λ_{2}, λ_{3}, λ_{4}) = (12.601, - 0.00980045, - 0.11, - 0.0001)$	$- 2.92$	$19.52$
A4	GLD with $(λ_{1}, λ_{2}, λ_{3}, λ_{4}) = (- 9.7726, - 0.0151878, - 0.001, - 0.13)$	$3.16$	$23.75$

Note: “S” and “A” stand for symmetric and asymmetric distributions, respectively. “GLD” denotes the generalized lambda distribution by Ramberg and Schmeiser [45]. The distribution is defined in terms of the inverse of the cumulative distribution function

F^{- 1} (u) = λ_{1} + \{u^{λ_{3}} - {(1 - u)}^{λ_{4}}\} / λ_{2}

for

u \in [0, 1]

.
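Because the GLD is defined through its quantile function, draws can be generated by inverse-transform sampling. The sketch below is illustrative and not the authors' simulation code: it uses the Ramberg and Schmeiser form with a minus sign between the two power terms (which reproduces $U [- 1, 1]$ for $(0, 1, 1, 1)$), and the function names are hypothetical. The closed-form mean assumes $λ_{3} > - 1$ and $λ_{4} > - 1$.

```python
import random


def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Quantile function F^{-1}(u) = lam1 + (u**lam3 - (1-u)**lam4) / lam2."""
    return lam1 + (u ** lam3 - (1.0 - u) ** lam4) / lam2


def gld_mean(lam1, lam2, lam3, lam4):
    """Mean of the GLD; valid only when lam3 > -1 and lam4 > -1."""
    return lam1 + (1.0 / (1.0 + lam3) - 1.0 / (1.0 + lam4)) / lam2


def gld_sample(n, lams, seed=0):
    """Draw n values by plugging uniform draws into the quantile function."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        # Skip u = 0, where a negative exponent would divide by zero.
        if 0.0 < u < 1.0:
            draws.append(gld_quantile(u, *lams))
    return draws


if __name__ == "__main__":
    a3 = (12.601, -0.00980045, -0.11, -0.0001)   # A3 in Table 1
    a4 = (-9.7726, -0.0151878, -0.001, -0.13)    # A4 in Table 1
    print(gld_mean(*a3), gld_mean(*a4))          # both close to zero
    print(gld_sample(3, a3))
```

With the parameter values listed for A3 and A4, the closed-form mean is numerically close to zero.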

Table 2. Size and Power of the SSST.

(A) Size										(%)
N	Test	δ	Distribution
			S1		S2		S3		S4
			5%	10%	5%	10%	5%	10%	5%	10%
50	FMS-G-O	−	4.8	9.4	4.9	9.5	6.0	10.8	7.4	12.6
	FMS-G-AltVar	−	4.7	9.3	4.4	8.9	5.8	10.9	6.9	12.6
	$T_{n_{1}, n_{2}}$ -G	0.3	3.5	7.5	3.1	7.0	4.5	8.7	5.2	9.6
		0.5	3.8	6.8	3.0	7.3	4.4	7.9	5.4	10.2
		0.7	3.4	6.7	3.4	7.3	4.5	8.9	5.4	10.5
	$T_{n_{1}, n_{2}}$ -MG	0.3	3.6	7.7	3.9	7.5	4.9	9.5	5.3	9.7
		0.5	4.0	7.2	3.7	7.2	4.9	8.7	5.4	10.2
		0.7	3.3	6.8	3.8	7.3	4.7	9.1	5.3	10.7
	$T_{n_{1}, n_{2}}$ -NM	0.3	3.4	7.3	3.1	6.7	4.3	8.3	5.2	9.3
		0.5	3.9	6.7	3.1	6.4	4.2	7.9	5.2	9.8
		0.7	3.3	7.0	3.1	6.4	4.4	8.1	5.2	10.3
100	FMS-G-O	−	5.2	9.8	6.7	10.8	5.6	10.1	7.4	12.5
	FMS-G-AltVar	−	4.9	8.9	6.4	10.6	5.3	9.9	6.6	12.6
	$T_{n_{1}, n_{2}}$ -G	0.3	4.0	7.0	6.1	9.3	5.1	8.8	7.3	11.9
		0.5	3.6	7.3	5.8	9.1	5.2	8.9	7.1	11.4
		0.7	3.8	7.6	5.9	9.5	4.8	9.3	6.6	11.8
	$T_{n_{1}, n_{2}}$ -MG	0.3	4.5	7.4	6.2	9.5	5.8	9.6	7.3	12.1
		0.5	3.7	7.7	6.1	9.6	5.7	9.0	7.2	11.6
		0.7	3.8	8.1	5.8	9.6	5.1	9.4	6.7	12.1
	$T_{n_{1}, n_{2}}$ -NM	0.3	4.2	6.8	5.3	8.9	4.3	8.4	7.1	12.4
		0.5	4.0	6.6	5.3	8.7	4.9	8.7	7.2	11.6
		0.7	4.2	6.8	5.3	8.8	4.9	8.4	7.3	11.7
200	FMS-G-O	−	4.1	7.3	6.0	9.3	5.8	8.9	8.9	14.1
	FMS-G-AltVar	−	4.1	7.0	6.1	8.9	5.4	9.3	8.7	15.1
	$T_{n_{1}, n_{2}}$ -G	0.3	3.3	6.3	4.9	8.0	4.9	8.7	8.9	12.9
		0.5	3.2	6.4	4.9	8.6	5.3	8.7	8.7	13.3
		0.7	3.5	6.8	5.2	9.2	5.4	8.9	8.4	13.9
	$T_{n_{1}, n_{2}}$ -MG	0.3	3.5	6.7	5.5	8.4	5.5	9.4	9.0	13.4
		0.5	3.5	6.8	5.2	9.0	5.7	8.8	8.7	13.6
		0.7	3.5	7.0	5.5	9.8	5.7	9.1	8.6	13.7
	$T_{n_{1}, n_{2}}$ -NM	0.3	3.5	6.0	4.0	7.5	4.4	8.5	8.7	13.3
		0.5	3.6	5.9	4.4	7.6	4.4	8.4	8.7	13.3
		0.7	3.6	5.8	4.6	7.4	4.5	8.4	8.8	13.1
(B) Power										(%)
N	Test	δ	Distribution
			A1		A2		A3		A4
			5%	10%	5%	10%	5%	10%	5%	10%
50	FMS-G-O	−	42.0	51.6	21.4	30.3	26.3	37.2	28.8	40.8
			[43.7]	[52.4]	[22.7]	[31.0]	[27.6]	[38.1]	[30.5]	[42.0]
	FMS-G-AltVar	−	24.9	37.0	13.3	22.4	39.8	52.2	43.1	55.3
			[27.5]	[39.0]	[13.9]	[24.1]	[41.5]	[53.6]	[44.9]	[57.4]
	$T_{n_{1}, n_{2}}$ -G	0.3	31.3	41.5	17.3	24.7	30.8	41.3	33.8	44.3
			[35.4]	[46.6]	[20.8]	[29.5]	[35.1]	[46.3]	[38.7]	[50.8]
		0.5	29.5	39.7	16.0	23.8	29.4	39.1	31.4	42.9
		0.7	27.8	38.7	14.5	22.6	28.0	36.8	30.2	40.6
	$T_{n_{1}, n_{2}}$ -MG	0.3	32.1	42.2	17.9	25.8	31.6	41.8	35.5	45.5
			[35.2]	[45.7]	[20.9]	[29.3]	[35.0]	[45.8]	[39.1]	[49.5]
		0.5	30.1	40.8	16.5	24.5	30.2	40.3	32.2	43.1
		0.7	28.0	38.9	15.1	23.0	28.5	37.5	30.8	41.1
	$T_{n_{1}, n_{2}}$ -NM	0.3	33.8	42.5	18.2	25.2	32.9	42.5	35.2	46.5
			[36.6]	[48.8]	[20.7]	[30.4]	[36.8]	[48.5]	[39.2]	[53.5]
		0.5	33.5	42.2	17.7	24.9	32.4	42.2	34.6	45.8
		0.7	33.0	41.4	17.4	24.2	32.2	41.6	34.5	45.7
100	FMS-G-O	−	73.0	81.5	41.4	51.4	59.1	71.2	64.2	74.6
			[72.7]	[81.8]	[40.7]	[52.3]	[58.7]	[71.8]	[63.8]	[74.6]
	FMS-G-AltVar	−	56.4	70.2	30.3	43.4	73.7	80.7	77.2	83.7
			[58.1]	[71.6]	[32.2]	[44.4]	[74.4]	[81.8]	[78.1]	[84.4]
	$T_{n_{1}, n_{2}}$ -G	0.3	72.3	80.5	37.9	48.6	70.3	78.4	74.1	80.5
			[75.4]	[84.4]	[40.2]	[54.9]	[72.8]	[82.3]	[76.1]	[84.1]
		0.5	67.1	77.2	33.9	45.0	65.2	75.7	69.5	78.2
		0.7	62.9	73.6	31.9	42.5	61.7	72.3	65.2	75.8
	$T_{n_{1}, n_{2}}$ -MG	0.3	73.1	80.2	38.3	49.3	70.5	78.3	73.6	81.0
			[74.8]	[83.6]	[40.9]	[53.5]	[72.4]	[80.7]	[75.4]	[83.3]
		0.5	67.4	77.5	34.6	45.4	65.6	75.4	69.5	77.9
		0.7	62.9	73.5	32.0	42.3	62.2	72.5	65.3	75.2
	$T_{n_{1}, n_{2}}$ -NM	0.3	76.8	84.0	41.7	51.9	75.5	82.1	76.8	84.0
			[79.6]	[87.2]	[44.8]	[58.0]	[77.6]	[85.0]	[79.7]	[86.9]
		0.5	77.0	84.0	40.1	51.0	75.1	82.0	75.7	83.0
		0.7	76.4	83.7	39.6	50.1	74.9	81.9	75.6	82.9
200	FMS-G-O	−	97.4	98.3	71.5	80.7	95.6	98.1	97.1	97.8
			[97.8]	[98.6]	[75.0]	[84.2]	[96.9]	[98.6]	[97.4]	[98.4]
	FMS-G-AltVar	−	93.4	96.3	60.4	72.8	98.7	99.0	98.4	99.1
			[95.2]	[97.5]	[69.1]	[78.5]	[98.8]	[99.2]	[99.0]	[99.2]
	$T_{n_{1}, n_{2}}$ -G	0.3	97.7	99.1	77.0	84.8	98.6	99.1	98.6	99.2
			[98.7]	[99.4]	[82.7]	[90.3]	[98.8]	[99.4]	[98.9]	[99.2]
		0.5	97.2	98.1	71.3	80.9	97.7	98.9	97.9	98.7
		0.7	96.4	97.6	65.4	75.8	96.7	98.8	97.2	98.3
	$T_{n_{1}, n_{2}}$ -MG	0.3	97.9	99.1	76.2	85.3	98.6	99.1	98.5	99.1
			[98.6]	[99.4]	[81.8]	[89.8]	[98.7]	[99.4]	[98.7]	[99.2]
		0.5	97.3	98.1	71.4	80.3	97.6	98.9	97.9	98.6
		0.7	96.5	97.6	65.5	75.9	96.7	98.7	97.2	98.3
	$T_{n_{1}, n_{2}}$ -NM	0.3	98.9	99.0	84.8	91.1	98.8	99.2	98.6	98.9
			[99.0]	[99.2]	[91.1]	[95.8]	[99.4]	[99.7]	[99.5]	[99.7]
		0.5	96.8	97.0	81.2	88.2	98.3	98.6	98.4	98.6
		0.7	95.6	95.7	80.9	88.3	98.3	98.6	98.4	98.6

Note: Numbers in brackets are size-adjusted powers.
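A common way to obtain size-adjusted power is to replace the asymptotic critical value with an empirical one taken from replications generated under the null. The sketch below illustrates that generic recipe and is not necessarily the exact procedure behind Table 2: it assumes a one-sided test that rejects for large values of the statistic, and the function name `size_adjusted_power` is hypothetical.

```python
import math


def size_adjusted_power(null_stats, alt_stats, alpha=0.05):
    """Return (empirical critical value, rejection rate under the alternative).

    The critical value is chosen so that at most a fraction alpha of the
    null replications exceed it (upper-tail rejection assumed).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(null_stats)
    # Number of null replications allowed to reject; the small constant
    # guards against floating-point error in alpha * n.
    k = math.floor(alpha * n + 1e-9)
    if k < 1:
        raise ValueError("too few null replications for this alpha")
    ordered = sorted(null_stats)
    critical = ordered[n - k - 1]
    rejections = sum(1 for s in alt_stats if s > critical)
    return critical, rejections / len(alt_stats)


if __name__ == "__main__":
    # Toy numbers for illustration only, not simulation output.
    null_draws = list(range(1, 101))
    alt_draws = [90, 96, 100, 120]
    print(size_adjusted_power(null_draws, alt_draws, alpha=0.05))
```

By construction the test then has empirical size of at most alpha on the null replications, so power can be compared across tests with different size distortions.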

¹Hirukawa and Sakudo [32] present the Weibull kernel as yet another special case. However, it is not confirmed that this kernel satisfies Lemma 1 below, and thus the kernel is not investigated throughout.
²It is possible to use different asymmetric kernels and/or different smoothing parameters to estimate f and g. For convenience, however, we choose to employ the same asymmetric kernel function and a single smoothing parameter.
³Although the GLDs corresponding to A3 and A4 are used in Zheng [17] and FMS, they are found to have non-zero means. Therefore, we adjust the values of $λ_{1}$ and $λ_{2}$ with skewness and kurtosis maintained so that the resulting distributions have means of zero.

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license ( http://creativecommons.org/licenses/by/4.0/).
