Article

A Joint Specification Test for Response Probabilities in Unordered Multinomial Choice Models

by Masamune Iwasawa 1,2
1 Graduate School of Economics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto 606-8501, Japan
2 Japan Society for the Promotion of Science, Kojimachi Business Center Building, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
Econometrics 2015, 3(3), 667-697; https://doi.org/10.3390/econometrics3030667
Submission received: 4 June 2015 / Revised: 28 August 2015 / Accepted: 9 September 2015 / Published: 16 September 2015
(This article belongs to the Special Issue Recent Developments of Specification Testing)

Abstract: Estimation results obtained by parametric models may be seriously misleading when the model is misspecified or poorly approximates the true model. This study proposes a test that jointly tests the specifications of multiple response probabilities in unordered multinomial choice models. The test statistic is asymptotically chi-square distributed, consistent against a fixed alternative and able to detect a local alternative approaching the null at a rate slower than the parametric rate. We show that, when the sample size is small, rejection regions can be calculated by a simple parametric bootstrap procedure. The size and power of the tests are investigated by Monte Carlo experiments.

1. Introduction

Variables of interest in economic research are frequently discrete and unordered, as is the case for many variables that indicate the behavior or state of economic agents. Several econometric models have been developed to deal with such outcomes. Above all, parametric models, such as the multinomial logit (MNL) and probit (MNP) models proposed by [1] and [2], respectively, are widely employed, for example, in structural econometric analysis (e.g., the economic models of automobile sales in [3,4]) and as part of econometric methods (e.g., the selection bias corrections of [5,6]). However, results obtained by such parametric models may be seriously misleading when the model is misspecified or poorly approximates the true model. Thus, researchers need to examine the validity of parametric assumptions whenever the assumptions are refutable from data alone.
This study proposes a new specification test that is directly applicable to any multinomial choice models with unordered outcome variables. These models set parametric assumptions on response probabilities that an option is chosen from multiple alternatives, and identical assumptions are often set for all response probabilities. Problems occur when these models do not mimic the true models, because the response probabilities and partial effects of some variables on the probabilities cannot be properly predicted. Moreover, the parameter estimation results may be misleading and their interpretation confusing. The specification test proposed here can be utilized to justify the choice of parametric models and to avoid misspecification problems.
The novelty of the test provided in this study is that it allows us to test the specifications of response probabilities jointly for all choice alternatives. Multinomial choice models with unordered outcomes consist of multiple response probabilities, each of which may be parameterized differently. This implies that one needs to test multiple null hypotheses to justify the parametric assumptions of these models. A substantial number of specification tests has been developed to test a single null hypothesis. To our knowledge, however, no joint specification tests have so far been theoretically suggested for multiple null hypotheses.
The test proposed here is based on moment conditions. We show that the test statistic is asymptotically chi-square distributed, consistent against a fixed alternative and able to detect a local alternative approaching the null at the rate of $1/\sqrt{n h^{q/2}}$, where $q$ is the number of independent variables.
One eminent feature of our test is that a parametric bootstrap procedure works well for calculating the rejection region of the test statistic. Since the testing method involves nonparametric estimation, a sufficiently large sample size could be required for the chi-squared distribution to be a proper approximation of the distribution of the test statistic. Thus, a simple parametric bootstrap procedure for calculating rejection regions is a practical necessity.
A crucial point that makes the parametric bootstrap work is that the orthogonality condition holds with bootstrap sampling under both the null and alternative hypotheses. This differs from specification tests for the regression function, which require the wild bootstrap procedure to calculate the rejection region, as proven by [7]. It is also noteworthy that the parametric nature of the model leads to substantial savings in the computational cost of bootstrapping.
Methodologically, two different approaches have been developed to construct specification tests. One uses an empirical process and the other a smoothing technique. We call the first type empirical process-based tests and the second type smoothing-based tests. Most of the literature on specification tests can be categorized into one of these two types. Empirical process-based tests are proposed by [8,9,10,11,12,13,14,15,16,17,18], among others. Smoothing-based tests are proposed by [7,19,20,21,22,23,24,25,26,27,28,29,30,31,32], to mention only a few.
These two types of tests are complementary to each other, rather than substitutes, in terms of the power property. For Pitman local alternatives, empirical process-based tests are more powerful than smoothing-based tests: they can detect Pitman local alternatives approaching the null at the parametric rate $n^{-1/2}$, whereas smoothing-based tests can detect them only at a rate slower than the parametric rate. Smoothing-based tests are, however, more powerful against a singular local alternative that changes drastically or is of high frequency. Empirical process-based tests can be represented by a kernel-like weight function with a fixed smoothing parameter. Thus, it can be intuitively understood that empirical process-based tests oversmooth the true function and obscure the drastic changes of such alternatives. The work in [33] shows that smoothing-based tests can detect singular local alternatives at a rate faster than $n^{-1/2}$.
The test proposed in this study is most related to [30], which proposes a smoothing-based test for functional forms of the regression function. Most of the specification tests developed for functional forms of the regression function can be directly applied to test the parametric specifications of ordered choice models, such as parametric binary choice models, because ordered choice models have only a single response probability that is equal to the conditional expectation of the outcome. For example, [34] applied several specification tests, originally developed for regression functions, to some ordered discrete choice models for a comparison of their relative merits based on their asymptotic size and power. However, applying them to unordered multinomial choice models, as done in this study, is not a trivial task. Extending empirical process-based tests and rate-optimal tests to unordered multinomial choice models is a task left for future research.
This paper is organized as follows. Section 2 introduces unordered multinomial choice models and reveals the problems of parametric specification. The new test statistic is proposed in Section 3. The assumptions and asymptotic behavior are provided in Section 4. Section 5 shows how to bootstrap parametrically. We investigate the size and power of the test by conducting Monte Carlo experiments in Section 6. We conclude with Section 7. The proofs of the lemmas and propositions are provided in the Appendix.

2. Unordered Multinomial Choice Models

We have the observations $\{\{Y_{i,j}, X_{i,j}\}_{i=1}^{n}\}_{j=1}^{J}$, where $Y_{i,j} \in \{0,1\}$ is a binary response variable that takes one if individual $i$ chooses alternative $j$ and zero otherwise. Each individual chooses one of $J$ alternatives, which implies $Y_{i,m} = 0$ for all $m \ne j$ if $Y_{i,j} = 1$. $X_{i,j} \in \mathbb{R}^{k_j}$ is a vector of independent variables that affect the choice decision made by individual $i$. Throughout this paper, we assume that $\{X_{i,j}, Y_{i,j}\}_{i=1}^{n}$ is independent and identically distributed for each $j = 1, \ldots, J$. With $i$ remaining fixed, however, $\{X_{i,j}, Y_{i,j}\}_{j=1}^{J}$ is not necessarily independent or identically distributed.
Multinomial choice models with unordered response variables are constructed by introducing latent variables $y_{i,j}^*$, which may be interpreted as the utility or satisfaction that $i$ obtains by choosing alternative $j$. We assume each individual chooses the alternative that maximizes personal utility; that is, $Y_{i,j} = 1$ if $y_{i,j}^* > y_{i,m}^*$ for all $m \ne j$. Further, $y_{i,j}^*$ depends on a function $g_j(X_{i,j}, \theta)$ and an unobserved error $\epsilon_{i,j}$: $y_{i,j}^* = g_j(X_{i,j}, \theta) + \epsilon_{i,j}$, where $\epsilon_{i,j}$ is independent of $X_{i,j}$ and $\theta \in \Theta$ is a parameter in a subset $\Theta$ of a finite-dimensional space. Then, the response probability that $i$ chooses $j$ can be formulated as follows:
$$P(Y_{i,j}=1 \mid X_i) = P(y_{i,j}^* > y_{i,m}^* \;\forall m \ne j \mid X_i) = P\big(\epsilon_{i,j} - \epsilon_{i,m} > g_m(X_{i,m},\theta) - g_j(X_{i,j},\theta) \;\forall m \ne j \mid X_i\big), \quad (1)$$
where $X_i \in \mathbb{R}^q$ is a vector consisting of all independent variables. The dimension $q$ of $X_i$ is equal to $\sum_{j=1}^{J} k_j$ when all variables in $X_{i,j}$ are alternative-specific for all $j$; this occurs when no variable in $X_{i,j}$ is identical to any of those in $X_{i,m}$ whenever $j \ne m$.
A specification of the functional forms of $g(\cdot)$ and the distributions of ϵ leads to full parameterization of the model in the sense that the parameters and response probabilities can be estimated parametrically. For example, if we assume linearity, $g_j(X_{i,j},\theta) = X_{i,j}'\beta$, and the Type I extreme-value distribution for $\epsilon_{i,j}$ for all $j$, we have the MNL model, in which $P(Y_{i,j}=1 \mid X_i) = \exp(X_{i,j}'\beta) / \sum_{m=1}^{J} \exp(X_{i,m}'\beta)$. An alternative model, suggested by [2], is the MNP model, in which $\epsilon_{i,j}$ is assumed to be normally distributed. In both cases, the parameters can be inferred by maximum likelihood estimation, and the choice probabilities are obtained by plugging the estimated values into (1).
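As a concrete illustration (not from the original paper), the MNL response probabilities above are a softmax over the linear indices; `mnl_probabilities` and its arguments are hypothetical names for this sketch:

```python
import numpy as np

def mnl_probabilities(X, beta):
    """MNL response probabilities P(Y_ij = 1 | X_i) = exp(X_ij'b) / sum_m exp(X_im'b).

    X    : (n, J, k) array of alternative-specific covariates X_ij
    beta : (k,) coefficient vector, common across alternatives
    Returns an (n, J) matrix of probabilities whose rows sum to one.
    """
    v = X @ beta                           # linear indices g_j(X_ij, beta) = X_ij' beta
    v -= v.max(axis=1, keepdims=True)      # stabilize exp() against overflow
    e = np.exp(v)
    return e / e.sum(axis=1, keepdims=True)
```

Subtracting the row-wise maximum before exponentiating leaves the probabilities unchanged but avoids numerical overflow for large indices.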
Specifying the distribution of ϵ in (1) is less restrictive than specifying the distribution of an observable random variable, because the specification of ϵ is true as long as ϵ belongs to the assumed family of distributions. The strict inequality in (1) is preserved under any transformation of both sides by a strictly increasing function. For example, the distribution of $\epsilon_{i,j} - \epsilon_{i,m}$ in (1) could be transformed into a well-known one, such as the normal or Type I extreme-value distribution. In these special cases, the distributions of the ϵ's may not be an essential specification issue, provided we can specify the right-hand side of the inequality correctly. In other words, distributional assumptions on the error terms that simplify the estimation of parametric models could be justified by specifying the functional forms of $g_j(\cdot)$ prudently.
In empirical studies, however, the functional forms of $g_j(\cdot)$ and the distributions of $\epsilon_{i,j}$ are generally unknown for all $j$. Moreover, in unordered multinomial choice models, the functional forms of $g_j(\cdot)$ and the distributions of $\epsilon_{i,j}$ may be nonidentical across $j$. Thus, we need joint specification tests that indicate whether the parametric specifications provide a good approximation to the true models. The appropriate null and alternative hypotheses are as follows:
$$H_0: P[m_{\theta,j}(X_i) = P(Y_{i,j}=1 \mid X_i)] = 1, \quad \text{for some } \theta \in \Theta \text{ and for all } j,$$
$$H_1: P[m_{\theta,j}(X_i) = P(Y_{i,j}=1 \mid X_i)] < 1, \quad \text{for any } \theta \in \Theta \text{ and for some } j,$$
where $m_j(X_i)$ denotes the true response probabilities and $m_{\theta,j}(X_i)$ their parameterized variants.

3. Test Statistic

The test statistic proposed in this study is built on a feature of the response probabilities, namely moment conditions that are satisfied when the parametric response probabilities are true. This implies that we test the specifications of the functional forms of $g_j(\cdot)$ and the distributions of $\epsilon_{i,j}$ simultaneously for all $j$. Rejection of the null hypothesis thus indicates that at least one of the parametric specifications of $g_j(\cdot)$ and $\epsilon_{i,j}$ is misspecified.
Before presenting the test statistic, we introduce some notation. Let $f_h(x)$ be the nonparametric density estimator at a continuity point $x$ of the distribution of $X_i$:
$$f_h(x) = \frac{1}{n h^q} \sum_{i=1}^{n} K\!\left(\frac{X_i - x}{h}\right),$$
where $K(\cdot)$ is a kernel function and $h$ is a bandwidth depending on $n$. In addition, we define $K^{(2)}$ as the two-times convolution product of the kernel and $K^{(4)}$ as that of $K^{(2)}$.
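With a product kernel, this estimator takes only a few lines; the following sketch (hypothetical function names) uses the quartic kernel that appears later in Section 6:

```python
import numpy as np

def quartic(z):
    """Quartic kernel K(z) = (15/16)(1 - z^2)^2 1(|z| < 1), as in Section 6."""
    return (15 / 16) * (1 - z**2) ** 2 * (np.abs(z) < 1)

def f_h(x, X, h, kernel=quartic):
    """Kernel density estimate f_h(x) = (1/(n h^q)) sum_i K((X_i - x)/h).

    x : (q,) evaluation point;  X : (n, q) sample;  h : bandwidth.
    A product kernel over the q coordinates gives the h^q normalization.
    """
    n, q = X.shape
    K = np.prod(kernel((X - x) / h), axis=1)   # product kernel evaluations
    return K.sum() / (n * h**q)
```

At an interior point of a uniform sample on $[0,1]^2$, the estimate should be close to the true density value of one.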
The test statistic is based on
$$Z_j \equiv E[u_{\theta,i,j}\, E(u_{\theta,i,j} \mid X_i) f(X_i)],$$
where $u_{\theta,i,j} = Y_{i,j} - m_{\theta,j}(X_i)$ and $f(\cdot)$ is the marginal density of $X_i$. Under the null hypothesis, $Z_j = 0$, since $E(u_{\theta,i,j} \mid X_i) = 0$. Under the alternative hypothesis,
$$E[u_{\theta,i,j} E(u_{\theta,i,j} \mid X_i) f(X_i)] = E[E(u_{\theta,i,j} \mid X_i)^2 f(X_i)] \ge c\, E\{[P(Y_{i,j}=1 \mid X_i) - m_{\theta,j}(X_i)]^2\} > 0,$$
for some positive constant $c$, provided that $f(\cdot)$ is bounded away from zero.
The nonparametric estimate of $Z_j$, denoted $Z_{n,j}$, can be obtained as follows:
$$Z_{n,j} = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{l \ne i}^{n} \frac{1}{h^q} K\!\left(\frac{X_i - X_l}{h}\right) \hat{u}_{\theta,i,j} \hat{u}_{\theta,l,j},$$
where $\hat{u}_{\theta,i,j} = Y_{i,j} - m_{\hat\theta,j}(X_i)$ and $m_{\hat\theta,j}(X_i)$ is the estimate of $m_{\theta,j}(X_i)$. We denote the asymptotic variance of $Z_{n,j}$ by $V_{j,j}$ and the covariance between $Z_{n,j}$ and $Z_{n,m}$ by $V_{j,m}$.
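The double sum over pairs can be vectorized; a sketch (hypothetical names, with the quartic kernel standing in for $K$) computing $Z_{n,j}$ from the covariates and parametric residuals:

```python
import numpy as np

def quartic(z):
    """Quartic kernel, used here purely as an example of K."""
    return (15 / 16) * (1 - z**2) ** 2 * (np.abs(z) < 1)

def z_n_j(X, u_hat, h, kernel=quartic):
    """Z_{n,j} = (1/(n(n-1))) sum_i sum_{l != i} h^{-q} K((X_i - X_l)/h) u_i u_l.

    X     : (n, q) covariates
    u_hat : (n,) parametric residuals Y_ij - m_{theta_hat, j}(X_i)
    A product kernel is applied across the q coordinates.
    """
    n, q = X.shape
    diff = (X[:, None, :] - X[None, :, :]) / h       # (n, n, q) pairwise differences
    K = np.prod(kernel(diff), axis=2) / h**q         # (n, n) kernel weights
    np.fill_diagonal(K, 0.0)                         # drop the i == l terms
    return (u_hat @ K @ u_hat) / (n * (n - 1))
```

The vectorized version agrees with a brute-force double loop over all ordered pairs, which is a convenient correctness check for small $n$.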
We introduce some further notation to define the test statistic. Note that, because $\sum_{j=1}^{J} P(Y_{i,j}=1 \mid X_i) = 1$ for all $i$, testing the specification of any $J-1$ of the response probabilities is sufficient for the null hypothesis. For notational simplicity, we omit the $J$-th response probability from our test statistic. Let $Z_n \equiv (Z_{n,1}, \ldots, Z_{n,J-1})'$ be a $(J-1) \times 1$ vector and $\hat{V}$ a $(J-1) \times (J-1)$ variance-covariance matrix whose $(j,m)$ elements are estimates of $V_{j,m}$. Then, the test statistic is
$$C_n = n^2 h^q Z_n' \hat{V}^{-1} Z_n,$$
where
$$\hat{V}_{j,j} = \frac{2 K^{(2)}(0)}{n} \sum_{i=1}^{n} [\hat\sigma_j^2(X_i)]^2 f_h(X_i), \qquad \hat{V}_{j,m} = \frac{2 K^{(2)}(0)}{n} \sum_{i=1}^{n} [\hat\sigma_{j,m}(X_i)]^2 f_h(X_i),$$
for all $j = 1, \ldots, J-1$ and $j \ne m$. $\hat\sigma_j^2(\cdot)$ is the estimated conditional variance of $u_{i,j} \equiv Y_{i,j} - m_j(X_i)$, where $E(u_{i,j} \mid X_i) = 0$ and $m_j(x) \equiv P(Y_{i,j}=1 \mid X_i = x) = E(Y_{i,j} \mid X_i = x)$, and $\hat\sigma_{j,m}(\cdot)$ is the estimated covariance between $u_{i,j}$ and $u_{i,m}$.
Considering the nature of the model, $\hat\sigma_j^2(\cdot)$ and $\hat\sigma_{j,m}(\cdot)$ are easily obtained. Since $Y_{i,j}$ is a binary variable taking zero or one, $u_{i,j} = [1 - m_j(X_i)]\,\mathbf{1}(Y_{i,j}=1) - m_j(X_i)\,\mathbf{1}(Y_{i,j}=0)$, where $\mathbf{1}(\cdot)$ is an indicator function. The conditional variance of $u_{i,j}$ and the covariance between $u_{i,j}$ and $u_{i,m}$ can then be written straightforwardly as follows:
$$\sigma_j^2(X_i) \equiv E(u_{i,j}^2 \mid X_i) = m_j(X_i)[1 - m_j(X_i)],$$
$$\sigma_{j,m}(X_i) \equiv E(u_{i,j} u_{i,m} \mid X_i) = -m_j(X_i)\, m_m(X_i).$$
Thus, consistent parametric estimators of $\sigma_j^2(X_i)$ and $\sigma_{j,m}(X_i)$ under the null hypothesis are $\hat\sigma_j^2(x) = m_{\hat\theta,j}(x)[1 - m_{\hat\theta,j}(x)]$ and $\hat\sigma_{j,m}(x) = -m_{\hat\theta,j}(x)\, m_{\hat\theta,m}(x)$, respectively.
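Putting the pieces together, $\hat{V}$ and $C_n$ follow directly from the fitted probabilities and density estimates. In this sketch (hypothetical names), `K2_0` stands for $K^{(2)}(0)$; for a product quartic kernel in $q$ dimensions it equals $(5/7)^q$, since $\int K(u)^2 du = 5/7$ for the univariate quartic kernel:

```python
import numpy as np
from scipy.stats import chi2

def v_hat(P_hat, f_vals, K2_0):
    """(J-1) x (J-1) matrix with V_jj = (2 K2(0)/n) sum_i [m_j(1-m_j)]^2 f_h(X_i)
    and V_jm = (2 K2(0)/n) sum_i [m_j m_m]^2 f_h(X_i) for j != m.

    P_hat  : (n, J) fitted response probabilities (the J-th alternative is dropped)
    f_vals : (n,) kernel density estimates f_h(X_i)
    K2_0   : K^(2)(0) of the chosen kernel
    """
    n, J = P_hat.shape
    m = P_hat[:, :J - 1]
    V = np.empty((J - 1, J - 1))
    for j in range(J - 1):
        for k in range(J - 1):
            s = m[:, j] * (1 - m[:, j]) if j == k else -m[:, j] * m[:, k]
            V[j, k] = 2 * K2_0 / n * np.sum(s**2 * f_vals)
    return V

def c_n(Z_n, V, n, h, q):
    """Test statistic C_n = n^2 h^q Z_n' V^{-1} Z_n and its chi-square(J-1) p-value."""
    stat = n**2 * h**q * (Z_n @ np.linalg.solve(V, Z_n))
    return stat, chi2.sf(stat, df=Z_n.size)
```

The null is rejected when the statistic exceeds the $(1-\alpha)$ quantile of the $\chi^2_{(J-1)}$ distribution, or equivalently when the p-value falls below $\alpha$.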

4. The Asymptotic Behavior

First, we provide sufficient assumptions to show the asymptotic behavior of the test statistic. Asymptotic distributions under the null and alternative hypotheses are then given. Finally, we show the asymptotic behavior of the test statistic under Pitman local alternatives.

4.1. Assumptions

The following are sufficient assumptions to show the test statistic’s asymptotic behavior.
Assumption 1: 
$X$ lies on a compact set. The marginal density of $X_i$, denoted $f(\cdot)$, is continuously differentiable and bounded away from zero.
Assumption 2: 
$m(\cdot)$ is continuously differentiable on the support of $X$.
Assumption 3: 
$P(Y_{i,j}=1 \mid X_i) \ne 0$ and $P(Y_{i,j}=1 \mid X_i) \ne 1$, for all $i$ and $j$. None of the alternatives is a perfect substitute for any other.
Assumption 4: 
$m_{\theta,j}(X)$ is differentiable with respect to θ, the derivative $\nabla_\theta m_{\theta,j}(X)$ is continuous with respect to $X$ and θ, and $\sup_{\theta \in \Theta} \|\nabla_\theta m_{\theta,j}(x)\| < \infty$ for all $x$.
Assumption 5: 
There exists a unique value for θ, defined as $\theta_0 = \arg\max_\theta \sum_{i=1}^{n} \sum_{j=1}^{J} \mathbf{1}\{Y_{i,j}=1\} \log[m_{\theta,j}(X_i)]$. Letting $\theta_0 = \theta$, it satisfies $\hat\theta - \theta = O_p(1/\sqrt{n})$.
Assumption 1 establishes that the first-order derivative of $f(\cdot)$ is bounded. The assumption that $X$ lies on a compact set may be considered too strong, because it excludes some tractable distributions for $X$, such as the normal. However, it does not confine applications of the test in empirical studies, because, in general, observations rarely take an infinite value. The assumption that $f(\cdot)$ is bounded away from zero avoids the random denominator problem associated with nonparametric kernel estimation. It is also straightforward to see that the first-order derivative of $m(\cdot)$ is bounded under Assumptions 1 and 2.
Assumption 3 guarantees that $\sigma_j^2(X_i) \ne 0$ and $\sigma_{j,l}(X_i) \ne 0$ for any $j$ and $l \ne j$, because $\sigma_j^2(X_i) = P(Y_{i,j}=1 \mid X_i) P(Y_{i,j}=0 \mid X_i)$ and $\sigma_{j,l}(X_i) = -P(Y_{i,j}=1 \mid X_i) P(Y_{i,l}=1 \mid X_i)$. It is also clear that $\sigma_j^2(X_i)$ and $\sigma_{j,l}(X_i)$ never tend to infinity, owing to the nature of the model. The fact that no alternatives are perfect substitutes for each other ensures that the variance-covariance matrix $V$ is invertible.
We need Assumption 4 to show the asymptotic behavior of $C_n$. The $\sqrt{n}$-consistency of the parametric estimation given in Assumption 5 is obtained, for example, by maximum likelihood estimation of a multinomial probit or logit model.
The kernel function assumption is as follows:
Assumption 6: 
The kernel $K$ is a symmetric function and satisfies $\int K(u)\,du = 1$, $\int |K(u)|\,du < \infty$, $\sup |K(u)| < \infty$ and $|u K(u)| \to 0$ as $|u| \to \infty$.
Assumption 6 is satisfied by commonly-used second-order kernels, such as the Epanechnikov, Gaussian and quartic kernels, and the two-times convolution product of the kernel is bounded under this assumption. Furthermore, the nonparametric density estimator is consistent under Assumptions 1 and 6 (see, for example, Theorem 4.1 of [41]).

4.2. Asymptotic Distribution under the Null Hypothesis

We provide a proposition about the asymptotic distribution of C n under the null hypothesis. The proof of the proposition is provided in the Appendix.
Proposition 1. 
Let Assumptions 1–6 hold. Then, under the null hypothesis,
$$C_n \xrightarrow{d} \chi^2_{(J-1)},$$
as $h \to 0$ and $n h^q \to \infty$.
Proposition 1 indicates that the asymptotic distribution of the test statistic $C_n$ under the null hypothesis is a chi-squared distribution with $J-1$ degrees of freedom. Therefore, we reject the null hypothesis that the parametric specification of the response probabilities is identical to the true one with probability one if $C_n > t_\alpha$, where $t_\alpha$ is the $(1-\alpha)$ quantile of the chi-squared distribution with $J-1$ degrees of freedom.

4.3. Asymptotic Distribution under the Alternative Hypothesis

We show that the test statistic is consistent; that is, its asymptotic power is equal to one. The proof of the lemma is provided in the Appendix.
Lemma 1. 
Let Assumptions 1–6 hold. Then, under the alternative hypothesis,
$$\frac{1}{n h^{q/2}} \cdot \frac{n h^{q/2} Z_{n,j}}{\hat{V}_{j,j}} \;\xrightarrow{p}\; \frac{E\{[m_{\theta,j}(X_i) - m_j(X_i)]^2 f(X_i)\}}{2 K^{(2)}(0)\, E\{m_{\theta,j}(X_i)^2 [1 - m_{\theta,j}(X_i)]^2 f(X_i)\}} > 0,$$
for some $j$ as $n \to \infty$ and $h \to 0$.
The proof of Lemma 1 provided in the Appendix implies that $n h^{q/2} Z_{n,j}$ diverges for some $j$ as the sample size $n$ increases, while $\hat{V}_{j,j}$ converges to a constant that is strictly larger than zero. In addition, it is straightforward to see that the probability limit of $\hat{V}_{j,m}$ under the alternative hypothesis is
$$2 K^{(2)}(0)\, E[m_{\theta,j}(X_i)^2 m_{\theta,m}(X_i)^2 f(X_i)],$$
which is bounded above by Assumptions 1–3 and 5 for any $j \ne m$. Thus, the following proposition follows immediately.
Proposition 2. 
Let Assumptions 1–6 hold. Then, under the alternative hypothesis, $C_n$ diverges in probability, and thus, the asymptotic power of the test is one.
The proof of Proposition 2 follows immediately from Lemma 1 and the discussion above on the probability limit of $\hat{V}_{j,m}$ under the alternative hypothesis.

4.4. Asymptotic Distribution under the Pitman Local Alternative

We show that the test statistic $C_n$ has nontrivial power against Pitman local alternatives approaching the null at the rate of $1/\sqrt{n h^{q/2}}$. The proof of the lemma is provided in the Appendix. Let us consider a sequence of local alternatives:
$$H_{1n}: P(Y_{i,j}=1 \mid X_i) = m_{\theta,j}(X_i) + \delta_n l_j(X_i),$$
where $l_j(\cdot)$ is a known continuous function with $E[l_j(\cdot)^2] < \infty$ for all $j$ and $\delta_n \to 0$ at the rate of $1/\sqrt{n h^{q/2}}$.
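To see why this is the detectable rate, substitute the local alternative into the population quantity $Z_j$: under $H_{1n}$, $E(u_{\theta,i,j} \mid X_i) = \delta_n l_j(X_i)$, so (a one-line sketch, consistent with the mean appearing in Lemma 2 below)

```latex
Z_j = E\{[\delta_n l_j(X_i)]^2 f(X_i)\} = \delta_n^2\, E[l_j(X_i)^2 f(X_i)],
\qquad\text{hence}\qquad
n h^{q/2} Z_j = n h^{q/2}\,\delta_n^2\, E[l_j^2 f] = M_j
\quad\text{when } \delta_n = 1/\sqrt{n h^{q/2}}.
```

A faster-vanishing $\delta_n$ would drive the noncentrality to zero, while a slower one would make it diverge.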
Lemma 2. 
Let Assumptions 1–6 hold. Then, under the local alternative hypothesis,
$$n h^{q/2} Z_{n,j} \xrightarrow{d} N(M_j, V_{j,j}) \quad \text{for all } j,$$
where $M_j \equiv E[l_j(x)^2 f(x)]$.
Lemma 2 indicates that the limiting distribution of $n h^{q/2} Z_{n,j} / \sqrt{V_{j,j}}$ is the normal distribution with mean $M_j V_{j,j}^{-1/2}$ and variance one. The following proposition shows that the test statistic can detect the local alternative with nontrivial power.
Proposition 3. 
Let Assumptions 1–6 hold. Then, under the local alternative hypothesis, the test statistic $C_n$ converges to a noncentral chi-squared distribution with $J-1$ degrees of freedom:
$$C_n \xrightarrow{d} \chi^2_{(J-1)}(\tilde\lambda),$$
where $\tilde\lambda \equiv M' V^{-1} M$ is the noncentrality parameter.
The proof of Proposition 3 is straightforward from Lemma 2 and the discussion on the probability limits of $\hat{V}_{j,m}$ for $j \ne m$ in the proof of Proposition 1.

5. Bootstrap Methods

This section presents a bootstrap method that is useful for approximating the distribution of the test statistic when the sample size is small. Specification tests for the regression function usually require the wild bootstrap procedure to calculate the rejection region, as proven by [7]. In our case, however, the wild bootstrap does not work well; the intuitive reason is that it fails to generate a bootstrap sample for the binary response variable.
We show that the parametric bootstrap procedure works well for calculating the rejection region of the test statistic. Intuitively, this is because the binary bootstrap sample for the response variable, say $Y_i^*$, can be drawn according to the parametrically-generated response probabilities, and there are no specific conditions that must be satisfied by $Y_i^*$ in multinomial choice models. This is, for example, different from the case of the regression model, in which the conditional expectation of the error term must be zero. The proof of the proposition in this section is provided in the Appendix.
The response probability that person $i$ chooses alternative $j$ can be parametrically estimated under the null hypothesis for all $i$ and $j$ by using the observations $\{\{X_{i,j}, Y_{i,j}\}_{i=1}^{n}\}_{j=1}^{J}$. For each person, we randomly choose one of the $J$ alternatives (say, alternative $m_i$) with probabilities equal to the estimated response probabilities. Then, we derive bootstrap observations $Y_i^* \equiv \{Y_{i,1}^*, Y_{i,2}^*, \ldots, Y_{i,m_i}^*, \ldots, Y_{i,J}^*\}$ for each $i = 1, \ldots, n$, so that $P(Y_{i,j}^* = 1 \mid X_i) = m_{\hat\theta,j}(X_i)$, where $Y_{i,m_i}^* = 1$ and $Y_{i,j}^* = 0$ for $j \ne m_i$. We use $\{\{X_{i,j}, Y_{i,j}^*\}_{i=1}^{n}\}_{j=1}^{J}$ as the bootstrap observations.
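This sampling step amounts to one categorical draw per individual from the fitted probabilities; a minimal sketch (hypothetical names) using inverse-CDF sampling:

```python
import numpy as np

def draw_bootstrap_outcomes(P_hat, rng):
    """Draw one-hot bootstrap outcomes Y* with P(Y*_ij = 1 | X_i) = m_{theta_hat, j}(X_i).

    P_hat : (n, J) fitted response probabilities; each row sums to one.
    Returns an (n, J) matrix with Y*_{i, m_i} = 1 for the drawn alternative m_i.
    """
    n, J = P_hat.shape
    cum = P_hat.cumsum(axis=1)                            # row-wise CDFs
    m_i = (rng.uniform(size=(n, 1)) > cum).sum(axis=1)    # inverse-CDF draw of m_i
    Y_star = np.zeros((n, J))
    Y_star[np.arange(n), m_i] = 1.0
    return Y_star
```

Across many draws, the empirical choice frequencies converge to the fitted probabilities, which is exactly the property the parametric bootstrap exploits.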
Assumptions 3 and 5 can be rewritten by using the bootstrap observations as follows:
Assumption 3’: 
$P(Y_{i,j}^* = 1 \mid X_i) \ne 0$ and $P(Y_{i,j}^* = 1 \mid X_i) \ne 1$, for all $i$ and $j$. None of the alternatives is a perfect substitute for any other.
Assumption 5’: 
There exists a unique value for θ, defined as $\theta_0 = \arg\max_\theta \sum_{i=1}^{n} \sum_{j=1}^{J} \mathbf{1}\{Y_{i,j}^*=1\} \log[m_{\theta,j}(X_i)]$. Letting $\theta_0 = \theta$, it satisfies $\hat\theta^* - \theta = O_p(1/\sqrt{n})$,
where $\hat\theta^*$ is the estimate of θ obtained by using the bootstrap observations $\{\{X_{i,j}, Y_{i,j}^*\}_{i=1}^{n}\}_{j=1}^{J}$.
Since the bootstrap sample $Y_{i,j}^*$ is drawn in accordance with the parametrically-estimated response probabilities $m_{\hat\theta,j}(X_i)$, Assumption 3’ implies that these probabilities do not take the values zero or one; that is, $m_{\hat\theta,j}(X_i) \ne 0$ and $m_{\hat\theta,j}(X_i) \ne 1$, for all $i$ and $j$. Assumption 3’ holds whenever Assumption 3 holds and one applies a parametric model whose fitted probabilities never fall below zero or exceed one, such as the MNL or MNP model. Assumption 5’ requires that $\hat\theta^*$ be a consistent estimator of θ. Assumption 5’ is satisfied whenever Assumption 5 holds, because the true value underlying $\hat\theta^*$ is $\hat\theta$, which converges to θ in probability.

Bootstrap Methods for C n

The test statistic $C_n^*$ is constructed in the same way as $C_n$ by using the bootstrap observations $\{\{X_{i,j}, Y_{i,j}^*\}_{i=1}^{n}\}_{j=1}^{J}$. We obtain the $(1-\alpha)$ quantile $t_\alpha^*$ by Monte Carlo approximation of the distribution of $C_n^*$. The null hypothesis is rejected if $C_n > t_\alpha^*$. In the following proposition, we show that this parametric bootstrap procedure works: under both the null and alternative hypotheses, $C_n^*$ converges to the asymptotic null distribution of $C_n$.
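The quantile $t_\alpha^*$ is obtained by repeating the draw-refit-recompute cycle $B$ times. In this sketch, the model-specific steps are passed in as callables; `refit_and_predict` and `compute_Cn` are hypothetical placeholders for the user's estimation and test-statistic code:

```python
import numpy as np

def bootstrap_critical_value(P_hat, X, compute_Cn, refit_and_predict,
                             alpha=0.05, B=100, seed=0):
    """Monte Carlo approximation of the (1-alpha) quantile t*_alpha of C_n*.

    P_hat             : (n, J) fitted response probabilities under H0
    compute_Cn        : callable (Y, X, P) -> scalar test statistic
    refit_and_predict : callable (Y, X) -> (n, J) re-estimated probabilities,
                        i.e. the parametric fit on the bootstrap sample
    """
    rng = np.random.default_rng(seed)
    n, J = P_hat.shape
    stats = np.empty(B)
    for b in range(B):
        # draw one-hot bootstrap outcomes from the fitted probabilities
        cum = P_hat.cumsum(axis=1)
        m_i = (rng.uniform(size=(n, 1)) > cum).sum(axis=1)
        Y_star = np.zeros((n, J))
        Y_star[np.arange(n), m_i] = 1.0
        P_star = refit_and_predict(Y_star, X)     # re-estimate theta on Y*
        stats[b] = compute_Cn(Y_star, X, P_star)  # bootstrap statistic C_n*
    return np.quantile(stats, 1 - alpha)
```

The null is then rejected when the statistic computed on the original sample exceeds the returned quantile.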
Proposition 4. 
Let Assumptions 1–6 hold. Then, the test statistic obtained with the bootstrap observations converges in distribution to a chi-squared distribution with $J-1$ degrees of freedom:
$$C_n^* \xrightarrow{d} \chi^2_{(J-1)},$$
as $n \to \infty$ and $h \to 0$.

6. Monte Carlo Experiments

The size and power of the test are examined by Monte Carlo experiments. We consider a simple case in which each individual chooses one of three alternatives. To explore the power properties of the test, we consider three different true models.
The null hypothesis to be tested is the following:
$$H_0: P\left[ m_{\theta,j}(X_i) = \frac{\exp(\beta_0 + \beta_1 X_{i,j})}{\sum_{m=1}^{J} \exp(\beta_0 + \beta_1 X_{i,m})} \right] = 1,$$
for some $\beta_0, \beta_1 \in \mathbb{R}$ and for all $j = 1, 2, 3$. The null hypothesis is based on the assumptions that the function $g_j(X_{i,j}, \theta)$ is linear, specifically $\beta_0 + \beta_1 X_{i,j}$, and that $\epsilon_{i,j}$ follows the Type I extreme-value distribution for all $j$. For simplicity of calculation, $X_{i,j}$ is assumed to be one-dimensional.
We consider three different true models. Each has a specific form of $g_j(\cdot)$, which can be written generally as $g_j(X_{i,j}, \theta) = \gamma_j X_{i,j} + c_j (X_{i,j} - 1/2)^2 + d_j (2 X_{i,j} - 2/3)^3$. By choosing specific values of $\gamma \equiv (\gamma_1, \gamma_2, \gamma_3)$, $c \equiv (c_1, c_2, c_3)$ and $d \equiv (d_1, d_2, d_3)$, we define the three true models: Model 1: $\gamma = (1,1,1)$, $c = (0,0,1)$, $d = (0,0,0)$; Model 2: $\gamma = (1,1,5)$, $c = (0,3,5)$, $d = (0,0,0)$; and Model 3: $\gamma = (1,1,1)$, $c = (0,3,5)$, $d = (0,3,5)$. The true distribution of $\epsilon_{i,j}$ is the Type I extreme-value distribution for all $j$.
These true models allow us to investigate the power properties of the test under misspecification due to nonlinearity and to choice-specific coefficients. We introduce nonlinearity into the true function $g_j(\cdot)$ by setting $c_j$ and $d_j$ to nonzero values. Choice-specific coefficients are introduced in Model 2 by setting $\gamma_j$ to different values across $j$. In this experiment, we do not consider misspecification originating in the distribution of $\epsilon_{i,j}$ or in omitted variables.
We draw $\{\{X_{i,j}\}_{j=1}^{3}\}_{i=1}^{n}$ uniformly from $[0,1]$ and $\{\{\epsilon_{i,j}\}_{j=1}^{3}\}_{i=1}^{n}$ randomly from the Type I extreme-value distribution. Then, the latent variable is generated by each true model: $y_{i,j}^* = g_j(X_{i,j}, \theta) + \epsilon_{i,j}$. The binary outcome $Y_{i,j}$ is set to one if $y_{i,j}^* > y_{i,m}^*$ for all $m \ne j$, and zero otherwise. Sample sizes are $n = 50$ and $n = 100$. The critical value is computed by $B = 100$ repetitions of the parametric bootstrap, and all results are based on $M = 1000$ simulation runs.
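The data-generating step can be sketched as follows (illustrative names not from the paper); NumPy's `gumbel` draws standard Type I extreme-value errors:

```python
import numpy as np

def simulate_model(n, gamma, c, d, rng):
    """One Monte Carlo sample from the true models of the experiment.

    g_j(x) = gamma_j x + c_j (x - 1/2)^2 + d_j (2x - 2/3)^3, with Type I
    extreme-value (Gumbel) errors; Y_ij = 1 iff alternative j maximizes y*_ij.
    """
    J = len(gamma)
    X = rng.uniform(size=(n, J))                       # X_ij ~ U[0, 1]
    eps = rng.gumbel(size=(n, J))                      # Type I extreme-value errors
    g = (np.asarray(gamma) * X + np.asarray(c) * (X - 0.5) ** 2
         + np.asarray(d) * (2 * X - 2 / 3) ** 3)
    y_star = g + eps                                   # latent utilities
    idx = y_star.argmax(axis=1)                        # utility-maximizing choice
    Y = np.zeros_like(y_star)
    Y[np.arange(n), idx] = 1.0
    return X, Y

# e.g. Model 2: gamma = (1, 1, 5), c = (0, 3, 5), d = (0, 0, 0)
```

Each simulated row is a one-hot choice vector, so exactly one alternative is selected per individual.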
To calculate the test statistic, $X_{i,j}$ is treated as specific to each alternative, so that $q = 3$. The quartic kernel $K(z) = (15/16)(1 - z^2)^2 \mathbf{1}(|z| < 1)$ is used for nonparametric estimation. Bandwidths for the kernel estimator are chosen from $h \in \{0.30, 0.35, 0.40, 0.45\}$.
Table 1 reports the size of the test at the 5% significance level. The first and second rows of the table show the size of the test where the critical values are obtained by the parametric bootstrap ($t_{0.05}^*$) and by the asymptotic distribution of the test statistic ($t_{0.05}$), respectively. The first to fourth columns report the results obtained with a sample size of $n = 50$ and bandwidths $h$ of 0.30, 0.35, 0.40 and 0.45, respectively; the fifth to eighth columns show the results with $n = 100$. Overall, the test tends to over-reject the null hypothesis when the critical values are calculated by the parametric bootstrap; the probability of rejection is closest to the nominal size when $h = 0.35$ and $n = 50$. In contrast, the test tends to under-reject the null hypothesis when the critical value is the 95% quantile of the chi-squared distribution with two degrees of freedom; the probability of rejection is closest to the nominal size when $h = 0.30$ and $n = 50$.
Table 1. Monte Carlo estimates of the size.

Critical value       n = 50                          n = 100
                h =  0.30   0.35   0.40   0.45       0.30   0.35   0.40   0.45
$t_{0.05}^*$         0.069  0.047  0.062  0.058      0.058  0.064  0.063  0.077
$t_{0.05}$           0.049  0.040  0.042  0.043      0.032  0.051  0.050  0.046

The significance level is 0.05.
In comparing the power performance of the test, it would be possible to correct size distortions by using the bandwidths for which the empirical size equals the nominal size. In practice, however, this procedure cannot be employed, because we do not know the true model. Thus, we do not correct the size distortion in this experiment; rather, we show the power performance at each bandwidth level, since choosing an appropriate bandwidth in practice is outside the scope of this paper.
Before presenting the simulation results on the power performance of the test statistic, we illustrate the discrepancy between the true and parametric null models. The response probabilities in this simulation are mappings from the unit cube to the unit interval. For simplicity of illustration, however, we restrict the domain of the response probabilities to $\{X_i = (X_{i,1}, X_{i,2}, X_{i,3}) : X_{i,j} \in [0,1] \text{ for all } j \text{ and } X_{i,1} = X_{i,2} = X_{i,3}\}$. In this setting, the fitted values of the response probabilities of the parametric model under the null hypothesis are always $1/3$ for all $j$, because the model does not have any alternative-variant coefficients.
Figure 1 shows how the true and null response probabilities react to the covariates. A larger distance between the true and null models at a fixed x indicates that the parametric null model does not approximate the true model well. The parametric predictions of the response probabilities lie closer to the true response probabilities in Model 1 than in Models 2 and 3 for all j. For the second and third alternatives, the null response probability appears to lie closer to the true response probability in Model 3 than in Model 2; for the first alternative, however, the distance between the true and null models seems smaller in Model 2. In brief, the null model gives the best response probability predictions in Model 1, and the predictions are less accurate in Models 2 and 3. The prediction precision of the null model could be reflected in the power performance of the test statistic.
Table 2 reports the proportion of rejections of the null hypothesis at the 5% significance level. The first to third rows show the power of the test when the true model is Model 1, Model 2 and Model 3, respectively, with critical values obtained by the parametric bootstrap. Similarly, the fourth to sixth rows show the power when the critical values are given by the 95% quantile of the chi-squared distribution with two degrees of freedom. The first to fourth columns report the results for a sample size of n = 50 and bandwidths h of 0.30, 0.35, 0.40 and 0.45, respectively; the fifth to eighth columns show the corresponding results for n = 100.
The test does not have decidedly nontrivial power when Model 1 is true. Non-rejection of the null hypothesis does not imply that the null model is true. In fact, as the top three panels of Figure 1 show, the parametric model under the null hypothesis may provide a proper approximation to the response probabilities of Model 1, so the low power of the test statistic may be acceptable. In contrast, the test statistic has decidedly nontrivial power when Model 2 or 3 is true. The greater the sample size, the better the power performance, which also depends on the choice of bandwidth.
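The rejection frequencies above are driven by the kernel-weighted U-statistic of parametric residuals, $n h^{q/2}\hat{Z}_{n,j} = \frac{1}{h^{q/2}(n-1)}\sum_{i}\sum_{l\neq i} K((X_i - X_l)/h)\,u_{\hat\theta,i,j}\,u_{\hat\theta,l,j}$. A minimal sketch of its computation (the function name, the Gaussian product kernel and the array layout are our own choices for illustration, not the paper's):

```python
import numpy as np

def kernel_stat(X, U, h):
    """Compute n h^{q/2} * Z_hat_{n,j} for each alternative j: a degenerate
    U-statistic averaging kernel-weighted products of parametric residuals."""
    n, q = X.shape
    # product Gaussian kernel K((X_i - X_l)/h), with the diagonal i = l zeroed
    # so that the double sum runs over l != i only
    D = (X[:, None, :] - X[None, :, :]) / h
    K = np.exp(-0.5 * (D ** 2).sum(axis=2)) / (2 * np.pi) ** (q / 2)
    np.fill_diagonal(K, 0.0)
    # U has shape (n, J - 1): residuals Y_{i,j} - m_{theta_hat,j}(X_i)
    # sum_{i} sum_{l != i} K[i, l] * U[i, j] * U[l, j], for each j
    return np.einsum('il,ij,lj->j', K, U, U) / ((n - 1) * h ** (q / 2))
```

The returned $(J-1)$-vector is what enters the quadratic-form statistic $C_n$ together with an estimate of its asymptotic variance-covariance matrix.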
Figure 1. Discrepancy between true and estimated parametric response probabilities.
Table 2. Proportion of null hypothesis rejections based on Monte Carlo simulation.
| Critical value | Model\h | n = 50, h = 0.30 | h = 0.35 | h = 0.40 | h = 0.45 | n = 100, h = 0.30 | h = 0.35 | h = 0.40 | h = 0.45 |
|---|---|---|---|---|---|---|---|---|---|
| t*_0.05 | Model 1 | 0.061 | 0.058 | 0.055 | 0.070 | 0.065 | 0.053 | 0.063 | 0.076 |
| | Model 2 | 0.810 | 0.888 | 0.935 | 0.960 | 0.998 | 1.000 | 1.000 | 1.000 |
| | Model 3 | 0.236 | 0.330 | 0.397 | 0.411 | 0.672 | 0.709 | 0.791 | 0.838 |
| t_0.05 | Model 1 | 0.047 | 0.043 | 0.056 | 0.061 | 0.049 | 0.053 | 0.052 | 0.053 |
| | Model 2 | 0.807 | 0.907 | 0.940 | 0.970 | 0.995 | 1.000 | 0.999 | 1.000 |
| | Model 3 | 0.255 | 0.304 | 0.365 | 0.415 | 0.600 | 0.713 | 0.777 | 0.839 |
The significance level is 0.05.
Closer inspection of Table 2 reveals that the test performs better in terms of power when the critical values are obtained by the parametric bootstrap, especially when the sample size is n = 50. To see this, we compare the results for h = 0.35 under the bootstrap critical values t*_0.05 with those for h = 0.30 under the asymptotic critical values t_0.05; we compare results at different bandwidths because the size of the test is close to its nominal level at these bandwidths (0.047 and 0.049, respectively). When the true model is Model 2, the probability of rejecting the null hypothesis is 0.888 for t*_0.05 but 0.807 for t_0.05. Similarly, when the true model is Model 3, the probability is 0.330 for t*_0.05 and 0.255 for t_0.05. The performance of the test is not unreasonable even when the critical values are obtained from the asymptotic distribution. At least in this setting, however, the test has higher power with bootstrap critical values when the sample size is small.
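The bootstrap critical value t*_0.05 follows the procedure analyzed in Proposition 4: draw Y* from the response probabilities fitted under the null, re-estimate the parametric model on each bootstrap sample, recompute the statistic, and take the empirical 95% quantile. A minimal sketch (the function name and the generic `fit_fn`/`stat_fn` hooks are our own scaffolding, not the paper's code):

```python
import numpy as np

def bootstrap_critical_value(X, null_probs, stat_fn, fit_fn,
                             B=199, alpha=0.05, seed=0):
    """1 - alpha critical value by parametric bootstrap: simulate Y* from the
    fitted null response probabilities, re-fit the model on each draw, and
    recompute the test statistic."""
    rng = np.random.default_rng(seed)
    n, J = null_probs.shape
    cum = null_probs.cumsum(axis=1)          # row-wise CDF over alternatives
    stats = np.empty(B)
    for b in range(B):
        # draw one alternative per observation from the fitted null model
        y = np.minimum((rng.uniform(size=(n, 1)) > cum).sum(axis=1), J - 1)
        probs_star = fit_fn(X, y)            # re-estimated response probabilities
        Y = np.eye(J)[y]                     # one-hot indicators Y*_{i,j}
        U = (Y - probs_star)[:, :J - 1]      # bootstrap residuals u*_{i,j}
        stats[b] = stat_fn(X, U)
    return np.quantile(stats, 1 - alpha)
```

Here `fit_fn` plays the role of the parametric estimator and `stat_fn` computes the test statistic from the covariates and bootstrap residuals.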

7. Conclusions

This study proposes a consistent specification test for unordered multinomial choice models. It tests the specifications of multiple response probabilities jointly for all choice alternatives. The test statistic is asymptotically chi-square distributed with $J-1$ degrees of freedom, is consistent against a fixed alternative and has nontrivial power against local alternatives approaching the null at the rate $(n h^{q/2})^{-1/2}$. The rejection region for the test statistic can be calculated through a simple parametric bootstrap procedure when the sample size is small. In Monte Carlo experiments, we test the specification of the MNL model under three true models to examine the power performance of the test. We find that the test statistic does not have decidedly nontrivial power when the parametric model under the null hypothesis provides a proper approximation to the response probabilities of the true model; it has markedly more power when the approximation of the null model is less successful. In addition, we find that the test has higher power when the critical values are obtained by the parametric bootstrap than when they are obtained from the asymptotic distribution of the test statistic, and the difference in power is greater when the sample size is small. We can reduce size distortion by choosing an appropriate bandwidth, but this issue remains for future research.
The test proposed in this study can be applied to testing the parametric specifications of response probabilities for any unordered multinomial choice models, including the MNL and MNP models. However, the test is not able to detect local alternatives approaching the null hypothesis at the parametric rate, nor is it rate-optimal. Extending the testing procedure to incorporate such features is left for future research.

Acknowledgments

The author is grateful to Yoshihiko Nishiyama, Ryo Okui, and Naoya Sueishi for their helpful comments and guidance, and would also like to thank Kohtaro Hitomi, Yoon-Jae Whang, and two anonymous referees for constructive comments that improved the paper. This work was supported by JSPS KAKENHI Grant Number 13J06130.

Conflicts of Interest

The author declares no conflict of interest.

Appendix: Proofs

Proof of Proposition 1. 
We first prove the following:
\[
n h^{q/2}\,\hat{Z}_{n,j} \xrightarrow{d} N(0, V_{j,j}), \tag{4}
\]
\[
V_{j,j} - \hat{V}_{j,j} = o_p(1), \tag{5}
\]
\[
V_{j,m} - \hat{V}_{j,m} = o_p(1), \tag{6}
\]
where $V_{j,j}$ and $V_{j,m}$ are the asymptotic variance of $n h^{q/2}\hat{Z}_{n,j}$ and the asymptotic covariance between $n h^{q/2}\hat{Z}_{n,j}$ and $n h^{q/2}\hat{Z}_{n,m}$, respectively. We show that they can be written as follows:
\[
V_{j,j} \equiv 2 K^{(2)}(0)\, E\{[\sigma_j^2(x)]^2 f(x)\}, \qquad V_{j,m} \equiv 2 K^{(2)}(0)\, E\{[\sigma_{j,m}(x)]^2 f(x)\}.
\]
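As a side remark (not part of the original proof), the conditional variances and covariances appearing in these limits take the familiar multinomial form, because the $Y_{i,j}$ are mutually exclusive binary indicators with $E(Y_{i,j} \mid X_i = x) = m_j(x)$ and $Y_{i,j} Y_{i,m} = 0$ for $j \neq m$:

```latex
\begin{align*}
\sigma_j^2(x)   &= E(u_{i,j}^2 \mid X_i = x) = m_j(x)\,[1 - m_j(x)], \\
\sigma_{j,m}(x) &= E(u_{i,j}\, u_{i,m} \mid X_i = x) = -\,m_j(x)\, m_m(x), \qquad j \neq m.
\end{align*}
```

This is what justifies the plug-in estimator $\hat{\sigma}_j^2(x) = m_{\hat{\theta},j}(x)[1 - m_{\hat{\theta},j}(x)]$ used later in the proof of Lemma 1.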
Proof of (4). 
Under the null hypothesis, we have $m_j(\cdot) = m_{\theta,j}(\cdot)$. Thus, it follows that
\begin{align*}
n h^{q/2}\,\hat{Z}_{n,j}
&= \frac{1}{h^{q/2}(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right) u_{\hat{\theta},i,j}\, u_{\hat{\theta},l,j} \\
&= \frac{1}{h^{q/2}(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right) [m_{\theta,j}(X_i) - m_{\hat{\theta},j}(X_i) + u_{i,j}]\,[m_{\theta,j}(X_l) - m_{\hat{\theta},j}(X_l) + u_{l,j}] \\
&= \frac{1}{h^{q/2}(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right) [m_{\theta,j}(X_i) - m_{\hat{\theta},j}(X_i)]\,[m_{\theta,j}(X_l) - m_{\hat{\theta},j}(X_l)] \\
&\quad + \frac{1}{h^{q/2}(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right) u_{i,j}\, u_{l,j} \\
&\quad + \frac{1}{h^{q/2}(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right) [m_{\theta,j}(X_i) - m_{\hat{\theta},j}(X_i)]\, u_{l,j} \\
&\quad + \frac{1}{h^{q/2}(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right) [m_{\theta,j}(X_l) - m_{\hat{\theta},j}(X_l)]\, u_{i,j} \\
&\equiv Z_{1,j} + Z_{2,j} + Z_{3,j} + Z_{4,j}.
\end{align*}
We will prove the following:
\[
Z_{1,j} = o_p(1), \tag{7}
\]
\[
Z_{2,j} \xrightarrow{d} N(0, V_{j,j}), \tag{8}
\]
\[
Z_{3,j} + Z_{4,j} = o_p(1). \tag{9}
\]
Proof of (7). 
We show that $Z_{1,j} = o_p(1)$. By Assumption 4, there is an interior point $\tilde{\theta}$ between $\theta$ and $\hat{\theta}$, such that
\[
m_{\hat{\theta},j}(X_i) - m_{\theta,j}(X_i) = \frac{\partial m_{\tilde{\theta},j}(X_i)}{\partial\theta'}\,(\hat{\theta} - \theta). \tag{10}
\]
By using this, $Z_{1,j}$ can be represented as follows:
\begin{align*}
Z_{1,j}
&= \frac{1}{h^{q/2}(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right)[m_{\theta,j}(X_i) - m_{\hat{\theta},j}(X_i)]\,[m_{\theta,j}(X_l) - m_{\hat{\theta},j}(X_l)] \\
&= h^{q/2}\,\sqrt{n}(\hat{\theta} - \theta)'\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\frac{1}{h^{q}} K\!\left(\frac{X_i - X_l}{h}\right)\frac{\partial m_{\tilde{\theta},j}(X_i)}{\partial\theta}\,\frac{\partial m_{\tilde{\theta},j}(X_l)}{\partial\theta'}\right]\sqrt{n}(\hat{\theta} - \theta) \\
&\equiv h^{q/2}\,\sqrt{n}(\hat{\theta} - \theta)'\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{Z}_1(X_i, X_l)\right]\sqrt{n}(\hat{\theta} - \theta),
\end{align*}
where Z ¯ 1 ( X i , X l ) is a symmetric function. We apply Lemma 3.1 of [42] to the second-order U-statistic, 1 n ( n - 1 ) i = 1 n l i n Z ¯ 1 ( X i , X l ) . To do this, we need to show that E [ Z ¯ 1 ( X i , X l ) 2 ] = o ( n ) .
E [ Z ¯ 1 ( X i , X l ) 2 ] = 1 h 2 q E K X i - X l h 2 m θ ˜ , j ( X i ) θ 2 m θ ˜ , j ( X l ) θ 2 = 1 h 2 q K x - y h 2 m θ ˜ , j ( x ) θ 2 m θ ˜ , j ( y ) θ 2 f ( x ) f ( y ) d x d y = 1 h q K ( u ) 2 m θ ˜ , j ( x ) θ 2 m θ ˜ , j ( x - u h ) θ 2 f ( x ) f ( x - u h ) d x d u = 1 h q K ( 2 ) ( 0 ) m θ ˜ , j ( x ) θ 4 f ( x ) 2 d x + O ( h ) = O ( h - q ) + O [ n ( n h q ) - 1 ] = o ( n ) since n h q .
Applying Lemma 3.1 of [42], we obtain 1 n ( n - 1 ) i = 1 n l i n Z ¯ 1 ( X i , X l ) = E [ Z ¯ 1 ( X i , X l ) ] + o p ( 1 / n ) , where
E [ Z ¯ 1 ( X i , X l ) ] = 1 h q E K X i - X l h m θ ˜ , j ( X i ) θ m θ ˜ , j ( X l ) θ = 1 h q K x - y h m θ ˜ , j ( x ) θ m θ ˜ , j ( y ) θ f ( x ) f ( y ) d x d y = K ( u ) m θ ˜ , j ( x ) θ m θ ˜ , j ( x - u h ) θ f ( x ) f ( x - u h ) d x d u = m θ ˜ , j ( x ) θ m θ ˜ , j ( x ) θ f ( x ) 2 d x = O ( 1 ) .
Therefore, we obtain
\[
Z_{1,j} = h^{q/2}\,\sqrt{n}(\hat{\theta} - \theta)'\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{Z}_1(X_i, X_l)\right]\sqrt{n}(\hat{\theta} - \theta) = O_p(h^{q/2}) = o_p(1).
\]
Proof of (8). 
Note that $Z_{2,j}$ can be treated as a second-order degenerate U-statistic:
\[
\frac{h^{q/2}}{n}\, Z_{2,j} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right) u_{i,j}\, u_{l,j}.
\]
Define $G_n(Z_1, Z_2) = E_{Z_i}[\{K((X_1 - X_i)/h)\, u_{1,j} u_{i,j}\}\{K((X_2 - X_i)/h)\, u_{2,j} u_{i,j}\}]$, where $Z_i = \{X_i, u_i\}$. According to the central limit theorem for degenerate U-statistics proposed by [43],
\[
\frac{Z_{2,j}}{h^{-q/2}\sqrt{2\, E\{[u_{1,j} u_{2,j} K((X_1 - X_2)/h)]^2\}}} \xrightarrow{d} N(0, 1),
\]
if
\[
\frac{E[G_n^2(Z_1, Z_2)] + n^{-1} E\{[u_{1,j} u_{2,j} K((X_1 - X_2)/h)]^4\}}{\left(E\{[u_{1,j} u_{2,j} K((X_1 - X_2)/h)]^2\}\right)^2} \to 0 \quad \text{as } n \to \infty. \tag{12}
\]
Thus, it is enough to show that (12) and
\[
\frac{2}{h^{q}}\, E\left\{\left[u_{1,j} u_{2,j} K\!\left(\frac{X_1 - X_2}{h}\right)\right]^2\right\} \to V_{j,j} \tag{13}
\]
hold.
Proof of (12). 
First, straightforward calculation gives
\begin{align*}
E[G_n^2(Z_1, Z_2)]
&= E\left(E_{Z_i}\!\left[u_{1,j} u_{2,j} u_{i,j}^2\, K\!\left(\frac{X_1 - X_i}{h}\right) K\!\left(\frac{X_2 - X_i}{h}\right)\right]^2\right) \\
&= E\left(\sigma_j^2(X_1)\,\sigma_j^2(X_2)\left[\int \sigma_j^2(z)\, K\!\left(\frac{X_1 - z}{h}\right) K\!\left(\frac{X_2 - z}{h}\right) f(z)\,dz\right]^2\right) \\
&= h^{3q} K^{(4)}(0) \int [\sigma_j^2(x)]^4 f(x)^4\,dx + O(h^{3q+1}) = O(h^{3q}). \tag{14}
\end{align*}
Similarly, it can be shown that
\begin{align*}
\frac{1}{n}\, E\left\{\left[u_{1,j} u_{2,j} K\!\left(\frac{X_1 - X_2}{h}\right)\right]^4\right\}
&= \frac{1}{n}\int\!\!\int \sigma_j^4(x)\,\sigma_j^4(y)\, K\!\left(\frac{x - y}{h}\right)^4 f(x) f(y)\,dx\,dy \\
&= \frac{h^{q}}{n}\int [\sigma_j^4(x)]^2 f^2(x)\,dx \int K(u)^4\,du + O\!\left(\frac{h^{2q}}{n}\right) = O\!\left(\frac{h^{q}}{n}\right). \tag{15}
\end{align*}
Following some calculation, we obtain
\begin{align*}
E\left\{\left[u_{1,j} u_{2,j} K\!\left(\frac{X_1 - X_2}{h}\right)\right]^2\right\}^2
&= E\left[\sigma_j^2(X_1)\,\sigma_j^2(X_2)\, K\!\left(\frac{X_1 - X_2}{h}\right)^2\right]^2 \\
&= h^{2q}\left[K^{(2)}(0)\int [\sigma_j^2(x)]^2 f^2(x)\,dx + O(h)\right]^2 = O(h^{2q}). \tag{16}
\end{align*}
Finally, (14)–(16) indicate that (12) holds, because $\dfrac{O(h^{3q}) + O(h^{q}/n)}{O(h^{2q})} = O(h^{q}) + O\!\left(\dfrac{1}{n h^{q}}\right) \to 0$ as $h \to 0$ and $n h^{q} \to \infty$.
Proof of (13). 
From Equation (16), it is clear that
\begin{align*}
\frac{2}{h^{q}}\, E\left\{\left[u_{1,j} u_{2,j} K\!\left(\frac{X_1 - X_2}{h}\right)\right]^2\right\}
&= 2 K^{(2)}(0)\int [\sigma_j^2(x)]^2 f^2(x)\,dx + O(h) \\
&= 2 K^{(2)}(0)\, E\{[\sigma_j^2(x)]^2 f(x)\} + O(h) \to V_{j,j}. \tag{17}
\end{align*}
Proof of (9). 
We show that $Z_{3,j} + Z_{4,j} = o_p(1)$. By using (10), $Z_{3,j} + Z_{4,j}$ can be represented as follows:
\begin{align*}
Z_{3,j} + Z_{4,j}
&= \frac{1}{(n-1) h^{q/2}}\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right)\left\{[m_{\theta,j}(X_i) - m_{\hat{\theta},j}(X_i)]\, u_{l,j} + [m_{\theta,j}(X_l) - m_{\hat{\theta},j}(X_l)]\, u_{i,j}\right\} \\
&= \frac{n h^{q/2}}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\frac{1}{h^{q}} K\!\left(\frac{X_i - X_l}{h}\right)\left[u_{l,j}\frac{\partial m_{\tilde{\theta},j}(X_i)}{\partial\theta'} + u_{i,j}\frac{\partial m_{\tilde{\theta},j}(X_l)}{\partial\theta'}\right](\hat{\theta} - \theta) \\
&\equiv n h^{q/2}\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{Z}_3(X_i, X_l)\right](\hat{\theta} - \theta),
\end{align*}
where Z ¯ 3 ( X i , X l ) is a symmetric function. We apply Lemma 3.1 of [42] to the second-order U-statistic, 1 n ( n - 1 ) i = 1 n l i n Z ¯ 3 ( X i , X l ) . To do this, we need to show that: E [ Z ¯ 3 ( X i , X l ) 2 ] = o ( n ) .
E [ | Z ¯ 3 ( X i , X l ) | 2 ] 2 h 2 q E K X i - X l h 2 m θ ˜ , j ( X i ) θ 2 u l , j 2 + 2 h 2 q E K X i - X l h 2 m θ ˜ , j ( X l ) θ 2 u i , j 2 = 2 h 2 q K x - y h 2 m θ ˜ , j ( x ) θ 2 σ j 2 ( y ) f ( x ) f ( y ) d x d y + 2 h 2 q K x - y h 2 m θ ˜ , j ( y ) θ 2 σ j 2 ( x ) f ( x ) f ( y ) d x d y = 2 h q K ( u ) 2 m θ ˜ , j ( x ) θ 2 σ j 2 ( x - u h ) f ( x ) f ( x - u h ) d x d u + 2 h q K ( v ) 2 m θ ˜ , j ( y ) θ 2 σ j 2 ( y + v h ) f ( y + v h ) f ( y ) d v d y = 2 h q K ( 2 ) ( 0 ) m θ ˜ , j ( x ) θ 2 σ j 2 ( x ) f ( x ) 2 d x + 2 h q K ( 2 ) ( 0 ) m θ ˜ , j ( y ) θ 2 σ j 2 ( y ) f ( y ) 2 d y + O ( h ) = O ( h - q ) + O [ n ( n h q ) - 1 ] = o ( n ) since n h q .
Applying Lemma 3.1 of [42], we obtain $\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{Z}_3(X_i, X_l) = E[\bar{Z}_3(X_i, X_l)] + o_p(1/\sqrt{n})$, where $E[\bar{Z}_3(X_i, X_l)] = 0$. Therefore,
\[
Z_{3,j} + Z_{4,j} = n h^{q/2}\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{Z}_3(X_i, X_l)\right](\hat{\theta} - \theta) = n h^{q/2}\, o_p(1/\sqrt{n})\, O_p(1/\sqrt{n}) = o_p(h^{q/2}) = o_p(1).
\]
Proof of (5) and (6). 
Since the asymptotic variance is derived above, it remains to derive the asymptotic covariance between $n h^{q/2}\hat{Z}_{n,j}$ and $n h^{q/2}\hat{Z}_{n,m}$, which we denote by $V_{j,m}$. From the results of (7)–(9), it is clear that $E(Z_{2,j} Z_{2,m}) \to V_{j,m}$ as $n \to \infty$. Because $E(u_{i,j} u_{l,j}) = 0$ if $i \neq l$, and $E(u_{i,j} u_{i,m} \mid X_i) = \sigma_{j,m}(X_i)$ if $j \neq m$, it follows that
\begin{align*}
E(Z_{2,j} Z_{2,m})
&= \frac{1}{(n-1)^2 h^{q}}\, E\left[\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right) u_{i,j} u_{l,j} \sum_{s=1}^{n}\sum_{t\neq s}^{n} K\!\left(\frac{X_s - X_t}{h}\right) u_{s,m} u_{t,m}\right] \\
&= \frac{2}{(n-1)^2 h^{q}}\, E\left[\sum_{i=1}^{n}\sum_{l\neq i}^{n} u_{i,j} u_{l,j} u_{i,m} u_{l,m}\, K\!\left(\frac{X_i - X_l}{h}\right)^2\right] \\
&= \frac{2n}{(n-1) h^{q}}\int\!\!\int \sigma_{j,m}(x)\,\sigma_{j,m}(y)\, K\!\left(\frac{x - y}{h}\right)^2 f(x) f(y)\,dx\,dy \\
&= 2 K^{(2)}(0)\int [\sigma_{j,m}(x)]^2 f^2(x)\,dx + O(h) \to V_{j,m}. \tag{19}
\end{align*}
Thus, the proofs of (5) and (6) are straightforward from (17) and (19).
Let $Z_2 = (Z_{2,1}, Z_{2,2}, \ldots, Z_{2,J-1})'$. Similarly to the proof of (8), it can be straightforwardly shown that $t' Z_2 \xrightarrow{d} N(0, t' V t)$ for any $(J-1) \times 1$ vector $t$, where $V$ is the $(J-1) \times (J-1)$ variance-covariance matrix whose $(j,m)$ element is $V_{j,m}$. Then, by the Cramér–Wold device, $Z_2$ converges to a multivariate normal distribution with a $(J-1) \times 1$ zero mean vector and variance-covariance matrix $V$. Therefore, $C_n$, which is a quadratic form in $n h^{q/2}\hat{Z}_{n,j}$, converges to a chi-squared distribution with $J - 1$ degrees of freedom.  ☐
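The limiting behavior of this quadratic form is easy to check numerically. A quick simulation sketch (our own illustration: we draw $Z$ directly from $N(0, V)$ for an arbitrary covariance matrix $V$ instead of computing the U-statistics) confirms that $Z' V^{-1} Z$ behaves like a chi-squared variable with $J - 1$ degrees of freedom:

```python
import numpy as np

# Numerical check that the quadratic form of a mean-zero normal vector with
# covariance V is chi-squared distributed with J - 1 degrees of freedom.
rng = np.random.default_rng(42)
J = 3                                       # three alternatives -> J - 1 = 2
V = np.array([[1.0, 0.3], [0.3, 0.5]])      # an arbitrary covariance matrix
L = np.linalg.cholesky(V)
Z = rng.standard_normal((100_000, J - 1)) @ L.T       # draws of Z ~ N(0, V)
C = np.einsum('ij,jk,ik->i', Z, np.linalg.inv(V), Z)  # C = Z' V^{-1} Z
print(C.mean())   # close to 2, the mean of a chi-squared with 2 d.o.f.
```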
Proof of Lemma 1. 
Under the alternative hypothesis, $n h^{q/2}\hat{Z}_{n,j}$ can be represented as follows:
n h q / 2 Z ^ n , j = 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h u θ ^ , i , j u θ ^ , l , j = 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h [ m j ( X i ) - m θ ^ , j ( X i ) + u i , j ] [ m j ( X l ) - m θ ^ , j ( X l ) + u l , j ] = 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h [ m j ( X i ) - m θ , j ( X i ) + m θ , j ( X i ) - m θ ^ , j ( X i ) + u i , j ] [ m j ( X l ) - m θ , j ( X l ) + m θ , j ( X l ) - m θ ^ , j ( X l ) + u l , j ] = 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h [ m j ( X i ) - m θ , j ( X i ) ] [ m j ( X l ) - m θ , j ( X l ) ] + 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h [ m j ( X i ) - m θ , j ( X i ) ] [ m θ , j ( X l ) - m θ ^ , j ( X l ) ] + 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h [ m j ( X i ) - m θ , j ( X i ) ] u l , j + 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h [ m θ , j ( X i ) - m θ ^ , j ( X i ) ] [ m j ( X l ) - m θ , j ( X l ) ] + 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h [ m θ , j ( X i ) - m θ ^ , j ( X i ) ] [ m θ , j ( X l ) - m θ ^ , j ( X l ) ] + 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h [ m θ , j ( X i ) - m θ ^ , j ( X i ) ] u l , j + 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h [ m j ( X l ) - m θ , j ( X l ) ] u i , j + 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h [ m θ , j ( X l ) - m θ ^ , j ( X l ) ] u i , j + 1 h q / 2 ( n - 1 ) i = 1 n l i n K X i - X l h u i , j u l , j A 1 , j + A 2 , j + A 3 , j + A 4 , j + A 5 , j + A 6 , j + A 7 , j + A 8 , j + A 9 , j ,
where $A_{5,j} = Z_{1,j} = o_p(1)$ and $A_{6,j} + A_{8,j} = Z_{3,j} + Z_{4,j} = o_p(1)$. We show that
\[
\frac{1}{n h^{q/2}}\,(A_{2,j} + A_{3,j} + A_{4,j} + A_{7,j} + A_{9,j}) = o_p(1). \tag{20}
\]
Then, $\hat{Z}_{n,j} = \frac{1}{n h^{q/2}} A_{1,j} + o_p(1)$. Thus, it is enough to show that (20) and the following hold:
\[
\frac{1}{n h^{q/2}}\, A_{1,j} = E\{[m_{\theta,j}(X_i) - m_j(X_i)]^2 f(X_i)\} + o_p(1), \tag{21}
\]
\[
\hat{V}_{j,j} = 2 K^{(2)}(0)\, E\{m_{\theta,j}(X_i)^2\,[1 - m_{\theta,j}(X_i)]^2 f(X_i)\} + o_p(1). \tag{22}
\]
Since $\hat{\sigma}_j^2(x) = m_{\hat{\theta},j}(x)[1 - m_{\hat{\theta},j}(x)]$ converges in probability to $m_{\theta,j}(x)[1 - m_{\theta,j}(x)]$ under the alternative hypothesis, the proof of (22) is straightforward.
Proof of (20). 
First, we show that $\frac{1}{n h^{q/2}}(A_{2,j} + A_{4,j}) = o_p(1)$; this term can be represented as a second-order U-statistic as follows:
1 n h 2 / q ( A 2 , j + A 4 , j ) = 1 n ( n - 1 ) i = 1 n l i n 1 h q K X i - X l h [ m j ( X i ) - m θ , j ( X i ) ] [ m θ , j ( X l ) - m θ ^ , j ( X l ) ] + [ m θ , j ( X i ) - m θ ^ , j ( X i ) ] [ m j ( X l ) - m θ , j ( X l ) ] = 1 n ( n - 1 ) i = 1 n l i n 1 h q K X i - X l h [ m j ( X i ) - m θ , j ( X i ) ] θ m θ ˜ , j ( X l ) + [ m j ( X l ) - m θ , j ( X l ) ] θ m θ ˜ , j ( X i ) ( θ ^ - θ ) 1 n ( n - 1 ) i = 1 n l i n A ¯ 2 ( X i , X l ) ( θ ^ - θ ) ,
where A ¯ 2 ( X i , X l ) is a symmetric function. We show that E [ A ¯ 2 ( X i , X l ) 2 ] = o ( n ) .
E [ A ¯ 2 ( X i , X l ) 2 ] 2 h 2 q E K X i - X l h 2 | m j ( X i ) - m θ , j ( X i ) | 2 θ m θ ˜ , j ( X l ) 2 + 2 h 2 q E K X i - X l h 2 | m j ( X l ) - m θ , j ( X l ) | 2 θ m θ ˜ , j ( X i ) 2 = 2 h 2 q K x - y h 2 | m j ( x ) - m θ , j ( x ) | 2 θ m θ ˜ , j ( y ) 2 f ( x ) f ( y ) d x d y + 2 h 2 q K x - y h 2 | m j ( y ) - m θ , j ( y ) | 2 θ m θ ˜ , j ( x ) 2 f ( x ) f ( y ) d x d y = 2 h q K ( u ) 2 | m j ( x ) - m θ , j ( x ) | 2 θ m θ ˜ , j ( x - u h ) 2 f ( x ) f ( x - u h ) d x d u + 2 h q K ( v ) 2 | m j ( y ) - m θ , j ( y ) | 2 θ m θ ˜ , j ( y + v h ) 2 f ( y + v h ) f ( y ) d v d y = 2 h q K ( 2 ) ( 0 ) | m j ( x ) - m θ , j ( x ) | 2 θ m θ ˜ , j ( x ) 2 f ( x ) 2 d x + 2 h q K ( 2 ) ( 0 ) | m j ( y ) - m θ , j ( y ) | 2 θ m θ ˜ , j ( y ) 2 f ( y ) 2 d y + O ( h ) = O ( h - q ) + O [ n ( n h q ) - 1 ] = o ( n ) since n h q .
Applying Lemma 3.1 of [42], we obtain 1 n ( n - 1 ) i = 1 n l i n A ¯ 2 ( X i , X l ) = E [ A ¯ 2 ( X i , X l ) ] + o p ( 1 ) , where
E [ A ¯ 2 ( X i , X l ) ] = 1 h q K x - y h [ m j ( x ) - m θ , j ( x ) ] m θ ˜ , j ( y ) θ f ( x ) f ( y ) d x d y + 1 h q K x - y h [ m j ( y ) - m θ , j ( y ) ] m θ ˜ , j ( x ) θ f ( x ) f ( y ) d x d y = K ( u ) [ m j ( x ) - m θ , j ( x ) ] m θ ˜ , j ( x - u h ) θ f ( x ) f ( x - u h ) d x d u + K ( v ) [ m j ( y ) - m θ , j ( y ) ] m θ ˜ , j ( y + v h ) θ f ( y + v h ) f ( y ) d v d y = [ m j ( x ) - m θ , j ( x ) ] m θ ˜ , j ( x ) θ f ( x ) 2 d x + [ m j ( y ) - m θ , j ( y ) ] m θ ˜ , j ( y ) θ f ( y ) 2 d y + O ( h ) = O ( 1 ) .
Therefore, we obtain
\[
\frac{1}{n h^{q/2}}\,(A_{2,j} + A_{4,j}) = \left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{A}_2(X_i, X_l)\right](\hat{\theta} - \theta) = O_p(1/\sqrt{n}) = o_p(1).
\]
Next, we show that $\frac{1}{n h^{q/2}}(A_{3,j} + A_{7,j}) = o_p(1)$; this term can be represented as a second-order U-statistic as follows:
A 3 , j + A 7 , j n h 2 / q = 1 n ( n - 1 ) i = 1 n l i n 1 h q K X i - X l h [ m j ( X i ) - m θ , j ( X i ) ] u l , j + [ m j ( X l ) - m θ , j ( X l ) ] u i , j = 1 n ( n - 1 ) i = 1 n l i n A ¯ 3 ( X i , X j ) ,
where A ¯ 3 ( X i , X j ) is a symmetric function. We show that E [ A ¯ 3 ( X i , X l ) 2 ] = o ( n ) .
E [ A ¯ 3 ( X i , X l ) 2 ] 2 h 2 q E K X i - X l h 2 | m j ( X i ) - m θ , j ( X i ) | 2 u l , j 2 + 2 h 2 q E K X i - X l h 2 | m j ( X l ) - m θ , j ( X l ) | 2 u i , j 2 = 2 h 2 q K x - y h 2 | m j ( x ) - m θ , j ( x ) | 2 σ j 2 ( y ) f ( x ) f ( y ) d x d y + 2 h 2 q K x - y h 2 | m j ( y ) - m θ , j ( y ) | 2 σ j 2 ( x ) f ( x ) f ( y ) d x d y = 2 h q K ( u ) 2 | m j ( x ) - m θ , j ( x ) | 2 σ j 2 ( x - u h ) f ( x ) f ( x - u h ) d x d u + 2 h q K ( v ) 2 | m j ( y ) - m θ , j ( y ) | 2 σ j 2 ( y + v h ) f ( y + v h ) f ( y ) d v d y = 2 h q K ( 2 ) ( 0 ) | m j ( x ) - m θ , j ( x ) | 2 σ j 2 ( x ) f ( x ) 2 d x + 2 h q K ( 2 ) ( 0 ) | m j ( y ) - m θ , j ( y ) | 2 σ j 2 ( y ) f ( y ) 2 d y + O ( h ) = O ( h - q ) + O [ n ( n h q ) - 1 ] = o ( n ) since n h q .
Applying Lemma 3.1 of [42], we obtain $\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{A}_3(X_i, X_l) = E[\bar{A}_3(X_i, X_l)] + o_p(1)$, where $E[\bar{A}_3(X_i, X_l)] = 0$. Therefore,
\[
\frac{A_{3,j} + A_{7,j}}{n h^{q/2}} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{A}_3(X_i, X_l) = o_p(1).
\]
Finally, we show that $\frac{1}{n h^{q/2}} A_{9,j} = o_p(1)$. It is clear that $\frac{1}{n h^{q/2}} A_{9,j}$ is a second-order U-statistic. It satisfies the condition for Lemma 3.1 of [42] as follows:
E 1 h q K X i - X l h u i , j u l , j 2 = 1 h 2 q K x - y h 2 σ j 2 ( x ) σ j 2 ( y ) f ( x ) f ( y ) d x d y = 1 h q K ( u ) 2 [ σ j 2 ( x ) ] 2 f ( x ) 2 d x d u + O ( h ) = O ( h - q ) = O ( n ( n h q ) - 1 ) = o ( n ) since n h q .
Applying Lemma 3.1 of [42], we obtain $\frac{1}{n h^{q/2}}\, A_{9,j} = E\left[h^{-q}\, u_{i,j} u_{l,j}\, K\!\left(\frac{X_i - X_l}{h}\right)\right] + o_p(1)$, where $E\left[h^{-q}\, u_{i,j} u_{l,j}\, K\!\left(\frac{X_i - X_l}{h}\right)\right] = 0$.
Proof of (21). 
$\frac{1}{n h^{q/2}} A_{1,j}$ can be represented as a second-order U-statistic as follows:
\[
\frac{1}{n h^{q/2}}\, A_{1,j} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\frac{1}{h^{q}} K\!\left(\frac{X_i - X_l}{h}\right)[m_j(X_i) - m_{\theta,j}(X_i)]\,[m_j(X_l) - m_{\theta,j}(X_l)] \equiv \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{A}_1(X_i, X_l),
\]
where $\bar{A}_1(X_i, X_l)$ is a symmetric function. Similar to (11), it can be straightforwardly shown that $E[\bar{A}_1(X_i, X_l)^2] = o(n)$; the only difference from (11) is that the integrand contains $[m_j(x) - m_{\theta,j}(x)]^2$, which is uniformly bounded, instead of $\left\|\partial m_{\tilde{\theta},j}(x)/\partial\theta\right\|^2$. By applying Lemma 3.1 of [42], we obtain $\frac{1}{n h^{q/2}}\, A_{1,j} = E[\bar{A}_1(X_i, X_l)] + o_p(1)$, where
\begin{align*}
E[\bar{A}_1(X_i, X_l)]
&= \frac{1}{h^{q}}\int\!\!\int K\!\left(\frac{x - y}{h}\right)[m_j(x) - m_{\theta,j}(x)]\,[m_j(y) - m_{\theta,j}(y)]\, f(x) f(y)\,dx\,dy \\
&= \int\!\!\int K(u)\,[m_j(x) - m_{\theta,j}(x)]\,[m_j(x - uh) - m_{\theta,j}(x - uh)]\, f(x) f(x - uh)\,dx\,du \\
&= \int [m_j(x) - m_{\theta,j}(x)]^2 f(x)^2\,dx + O(h) = E\{[m_{\theta,j}(X_i) - m_j(X_i)]^2 f(X_i)\} + O(h).
\end{align*}
  ☐
Proof of Lemma 2. 
Under the local alternative hypothesis, $n h^{q/2}\hat{Z}_{n,j}$ can be written as follows:
n h q / 2 Z n , j = 1 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h u ^ θ , i , j u ^ θ , l , j = 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h [ m θ , j ( X i ) - m θ ^ , j ( X i ) + δ n l j ( X i ) + u i , j ] [ m θ , j ( X l ) - m θ ^ , j ( X l ) + δ n l j ( X l ) + u l , j ] = 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h [ m θ , j ( X i ) - m θ ^ , j ( X i ) ] [ m θ , j ( X l ) - m θ ^ , j ( X l ) ] + 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h [ m θ , j ( X i ) - m θ ^ , j ( X i ) ] δ n l j ( X l ) + 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h [ m θ , j ( X l ) - m θ ^ , j ( X l ) ] δ n l j ( X i ) + 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h [ m θ , j ( X i ) - m θ ^ , j ( X i ) ] u l , j + 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h [ m θ , j ( X l ) - m θ ^ , j ( X l ) ] u i , j + 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h δ n 2 l j ( X i ) l j ( X l ) + 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h δ n l j ( X i ) u l , j + 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h δ n l j ( X l ) u i , j + 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h u i , j u l , j B 1 , j + B 2 , j + B 3 , j + B 4 , j + B 5 , j + B 6 , j + B 7 , j + B 8 , j + B 9 , j ,
where $B_{1,j} = Z_{1,j} = o_p(1)$, $B_{4,j} + B_{5,j} = Z_{3,j} + Z_{4,j} = o_p(1)$ and $B_{9,j} = Z_{2,j} \xrightarrow{d} N(0, V_{j,j})$. It suffices to show the following:
\[
B_{2,j} + B_{3,j} = o_p(1), \tag{23}
\]
\[
B_{6,j} \xrightarrow{p} E[l_j(x)^2 f(x)], \tag{24}
\]
\[
B_{7,j} + B_{8,j} = o_p(1). \tag{25}
\]
Proof of (23). 
We show that B 2 , j + B 3 , j = o p ( 1 ) . B 2 , j + B 3 , j can be represented as follows:
B 2 , j + B 3 , j = 1 ( n - 1 ) h q / 2 i = 1 n l 1 n K X i - X l h { [ m θ , j ( X i ) - m θ ^ , j ( X i ) ] l j ( X l ) + [ m θ , j ( X l ) - m θ ^ , j ( X l ) ] l j ( X i ) } δ n = n h q / 2 n ( n - 1 ) i = 1 n l 1 n 1 h q K X i - X l h { [ m θ , j ( X i ) - m θ ^ , j ( X i ) ] l j ( X l ) + [ m θ , j ( X l ) - m θ ^ , j ( X l ) ] l j ( X i ) } δ n = n h q / 2 n ( n - 1 ) i = 1 n l 1 n 1 h q K X i - X l h m θ ˜ , j ( X i ) θ l j ( X l ) + m θ ˜ , j ( X l ) θ l j ( X i ) ( θ ^ - θ ) δ n n h q / 2 n ( n - 1 ) i = 1 n l 1 n B ¯ 2 ( X i , X l ) ( θ ^ - θ ) δ n ,
where $\bar{B}_2(X_i, X_l)$ is a symmetric function. Similar to (18), it can be straightforwardly shown that $E[\|\bar{B}_2(X_i, X_l)\|^2] = o(n)$; the only difference from (18) is that $l_j(\cdot)$ appears in the integrand instead of $\sigma_j^2(\cdot)$, where $E[l_j(\cdot)^2]$ is assumed to be bounded. By applying Lemma 3.1 of [42], we obtain $\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{B}_2(X_i, X_l) = E[\bar{B}_2(X_i, X_l)] + o_p(1)$, where
\begin{align*}
E[\bar{B}_2(X_i, X_l)]
&= \frac{1}{h^{q}}\int\!\!\int K\!\left(\frac{x - y}{h}\right)\left[\frac{\partial m_{\tilde{\theta},j}(x)}{\partial\theta}\, l_j(y) + \frac{\partial m_{\tilde{\theta},j}(y)}{\partial\theta}\, l_j(x)\right] f(x) f(y)\,dx\,dy \\
&= \int\!\!\int K(u)\left[\frac{\partial m_{\tilde{\theta},j}(x)}{\partial\theta}\, l_j(x - uh) + \frac{\partial m_{\tilde{\theta},j}(x - uh)}{\partial\theta}\, l_j(x)\right] f(x) f(x - uh)\,dx\,du \\
&= 2\int \frac{\partial m_{\tilde{\theta},j}(x)}{\partial\theta}\, l_j(x)\, f(x)^2\,dx + O(h) = O(1).
\end{align*}
Therefore,
\[
B_{2,j} + B_{3,j} = n h^{q/2}\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{B}_2(X_i, X_l)\right](\hat{\theta} - \theta)\,\delta_n = n h^{q/2}\, O(1)\, O_p(1/\sqrt{n})\, O\!\big((n h^{q/2})^{-1/2}\big) = O_p(h^{q/4}) = o_p(1).
\]
Proof of (24). 
We show that $B_{6,j}$ converges in probability to $E[l_j(x)^2 f(x)]$ as $n \to \infty$. $B_{6,j}$ can be represented as follows:
\[
B_{6,j} = n h^{q/2}\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\frac{1}{h^{q}} K\!\left(\frac{X_i - X_l}{h}\right) l_j(X_i)\, l_j(X_l)\right]\delta_n^2 \equiv n h^{q/2}\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{B}_6(X_i, X_l)\right]\delta_n^2,
\]
where $\bar{B}_6(X_i, X_l)$ is a symmetric function. Similar to (11), it can be straightforwardly shown that $E[|\bar{B}_6(X_i, X_l)|^2] = o(n)$; the only difference from (11) is that $l_j(\cdot)$ appears in the integrand instead of $\partial m_{\tilde{\theta},j}(\cdot)/\partial\theta$, where $E[l_j(\cdot)^2]$ is assumed to be bounded. By applying Lemma 3.1 of [42], we obtain $\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{B}_6(X_i, X_l) = E[\bar{B}_6(X_i, X_l)] + o_p(1)$, where
\[
E[\bar{B}_6(X_i, X_l)] = \frac{1}{h^{q}}\int\!\!\int K\!\left(\frac{x - y}{h}\right) l_j(x)\, l_j(y)\, f(x) f(y)\,dx\,dy = \int l_j^2(x) f(x)^2\,dx + O(h).
\]
Therefore,
\[
B_{6,j} = n h^{q/2}\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{B}_6(X_i, X_l)\right]\delta_n^2 = n h^{q/2}\,\{E[l_j^2(x) f(x)] + o_p(1)\}\,\delta_n^2 \xrightarrow{p} E[l_j^2(x) f(x)].
\]
Proof of (25). 
We show that $B_{7,j} + B_{8,j} = o_p(1)$. $B_{7,j} + B_{8,j}$ can be represented as follows:
\[
B_{7,j} + B_{8,j} = n h^{q/2}\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\frac{1}{h^{q}} K\!\left(\frac{X_i - X_l}{h}\right)[l_j(X_i)\, u_{l,j} + l_j(X_l)\, u_{i,j}]\right]\delta_n \equiv n h^{q/2}\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{B}_7(X_i, X_l)\right]\delta_n,
\]
where $\bar{B}_7(X_i, X_l)$ is a symmetric function. Similar to (18), it can be straightforwardly shown that $E[\|\bar{B}_7(X_i, X_l)\|^2] = o(n)$; the only difference from (18) is that $l_j(\cdot)$ appears in the integrand instead of $\partial m_{\tilde{\theta},j}(\cdot)/\partial\theta$, where $E[l_j(\cdot)^2]$ is assumed to be bounded. By applying Lemma 3.1 of [42], we obtain $\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{B}_7(X_i, X_l) = E[\bar{B}_7(X_i, X_l)] + o_p(1/\sqrt{n})$, where $E[\bar{B}_7(X_i, X_l)] = 0$. Therefore,
\[
B_{7,j} + B_{8,j} = n h^{q/2}\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n}\bar{B}_7(X_i, X_l)\right]\delta_n = n h^{q/2}\, o_p(1/\sqrt{n})\,\delta_n = o_p(1).
\]
  ☐
Proof of Proposition 4. 
Proposition 4 can be proven along the same lines as Proposition 1. Let $u^*_{i,j} = Y^*_{i,j} - m^*_j(X_i)$, where $m^*_j(X_i) \equiv E(Y^*_{i,j} \mid X_i) = m_{\hat{\theta},j}(X_i)$, and therefore $E(u^*_{i,j} \mid X_i) = 0$. Then, the boundedness of $\sigma^{*4}_j(x) \equiv E[u^{*4}_{i,j} \mid X_i = x]$, corresponding to (15), can be shown straightforwardly, because $Y^*_{i,j}$ is a binary variable taking the values zero and one, and $X$ lies on a compact set by Assumption 1.
We first prove the following:
\[
n h^{q/2}\,\hat{Z}^*_{n,j} \xrightarrow{d} N(0, V^*_{j,j}), \tag{26}
\]
\[
V^*_{j,j} - \hat{V}^*_{j,j} = o_p(1), \tag{27}
\]
\[
V^*_{j,m} - \hat{V}^*_{j,m} = o_p(1), \tag{28}
\]
where $V^*_{j,j}$ and $V^*_{j,m}$ are the asymptotic variance of $n h^{q/2}\hat{Z}^*_{n,j}$ and the asymptotic covariance between $n h^{q/2}\hat{Z}^*_{n,j}$ and $n h^{q/2}\hat{Z}^*_{n,m}$, respectively. We show that they can be written as follows:
\[
V^*_{j,j} \equiv 2 K^{(2)}(0)\, E\{[\sigma^{*2}_j(x)]^2 f(x)\}, \qquad V^*_{j,m} \equiv 2 K^{(2)}(0)\, E\{[\sigma^*_{j,m}(x)]^2 f(x)\},
\]
where $\sigma^{*2}_j(x)$ is the conditional variance of $u^*_{i,j}$ and $\sigma^*_{j,m}(x)$ is the conditional covariance between $u^*_{i,j}$ and $u^*_{i,m}$.
Proof of (26). 
Let $u^*_{\hat{\theta},i,j} = Y^*_{i,j} - m_{\hat{\theta}^*,j}(X_i)$ and $Y^*_{i,j} = m_{\hat{\theta},j}(X_i) + u^*_{i,j}$, where $E(u^*_{i,j} \mid X_i) = 0$ by definition. Then,
n h q / 2 Z ^ n , j * = 1 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h u θ ^ , i , j * u θ ^ , l , j * = 1 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h [ m θ ^ , j ( X i ) - m θ ^ * , j ( X i ) + u i , j * ] [ m θ ^ , j ( X l ) - m θ ^ * , j ( X l ) + u l , j * ] = 1 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h [ m θ ^ , j ( X i ) - m θ ^ * , j ( X i ) ] [ m θ ^ , j ( X l ) - m θ ^ * , j ( X l ) ] + 1 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h u i , j * u l , j * + 1 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h [ m θ ^ , j ( X i ) - m θ ^ * , j ( X i ) ] u l , j * + 1 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h [ m θ ^ , j ( X l ) - m θ ^ * , j ( X l ) ] u i , j * Z 1 , j * + Z 2 , j * + Z 3 , j * + Z 4 , j * .
We will prove that
\[
Z^*_{1,j} = o_p(1), \tag{29}
\]
\[
Z^*_{2,j} \xrightarrow{d} N(0, V^*_{j,j}), \tag{30}
\]
\[
Z^*_{3,j} + Z^*_{4,j} = o_p(1). \tag{31}
\]
Proof of (29). 
Z 1 , j * can be represented as follows:
Z 1 , j * = 1 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h [ m θ ^ , j ( X i ) - m θ , j ( X i ) + m θ , j ( X i ) - m θ ^ * , j ( X i ) ] [ m θ ^ , j ( X l ) - m θ , j ( X l ) + m θ , j ( X l ) - m θ ^ * , j ( X l ) ] = 1 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h [ m θ ^ , j ( X i ) - m θ , j ( X i ) ] [ m θ ^ , j ( X l ) - m θ , j ( X l ) ] + 1 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h [ m θ , j ( X i ) - m θ ^ * , j ( X i ) ] [ m θ , j ( X l ) - m θ ^ * , j ( X l ) ] + 2 ( n - 1 ) h q / 2 i = 1 n l i n K X i - X l h [ m θ ^ , j ( X i ) - m θ , j ( X i ) ] [ m θ , j ( X l ) - m θ ^ * , j ( X l ) ] Z 1 , j * + Z 1 , j * + Z 1 , j * ,
where Z 1 , j * = Z 1 , j = o p ( 1 ) . By Assumption 4, there is an interior point θ ˜ * between θ and θ ^ * , such that
\[
m_{\hat{\theta}^*,j}(X_i) - m_{\theta,j}(X_i) = \frac{\partial m_{\tilde{\theta}^*,j}(X_i)}{\partial\theta'}\,(\hat{\theta}^* - \theta). \tag{32}
\]
Therefore, the remaining two terms of $Z^*_{1,j}$ can also be shown to be $o_p(1)$, similar to the proof of $Z_{1,j} = o_p(1)$, by using the above mean value expansion instead of (10), because $\theta - \hat{\theta}^* = O_p(1/\sqrt{n})$ for all $j$ under appropriate parametric models and Assumption 5.
Proof of (30). 
It is clear that $Z^*_{2,j}$ can be treated as a second-order degenerate U-statistic:
\[
\frac{h^{q/2}}{n}\, Z^*_{2,j} = \frac{1}{n(n-1)}\sum_{i=1}^{n}\sum_{l\neq i}^{n} K\!\left(\frac{X_i - X_l}{h}\right) u^*_{i,j}\, u^*_{l,j}.
\]
Define $G^*_n(Z^*_1, Z^*_2) = E_{Z^*_i}[\{K((X_1 - X_i)/h)\, u^*_{1,j} u^*_{i,j}\}\{K((X_2 - X_i)/h)\, u^*_{2,j} u^*_{i,j}\}]$, where $Z^*_i = \{X_i, u^*_i\}$. According to the central limit theorem for degenerate U-statistics proposed by [43],
\[
\frac{Z^*_{2,j}}{h^{-q/2}\sqrt{2\, E\{[u^*_{1,j} u^*_{2,j} K((X_1 - X_2)/h)]^2\}}} \xrightarrow{d} N(0, 1),
\]
if
\[
\frac{E[G^{*2}_n(Z^*_1, Z^*_2)] + n^{-1} E\{[u^*_{1,j} u^*_{2,j} K((X_1 - X_2)/h)]^4\}}{\left(E\{[u^*_{1,j} u^*_{2,j} K((X_1 - X_2)/h)]^2\}\right)^2} \to 0 \quad \text{as } n \to \infty. \tag{33}
\]
Thus, it suffices to show that (33) and
\[
\frac{2}{h^{q}}\, E\{[u^*_{1,j} u^*_{2,j} K((X_1 - X_2)/h)]^2\} \to V^*_{j,j} \tag{34}
\]
hold.
Proof of (33). 
First, note that
E [ G n * 2 ( Z 1 , Z 2 ) ] = E E Z i u 1 , j * u 2 , j * u i , j * 2 K X 1 - X i h K X 2 - X i h 2 = E σ j * 2 ( X 1 ) σ j * 2 ( X 2 ) σ j * 2 ( z ) K X 1 - z h K X 2 - z h f ( z ) d z 2 = h 3 q K ( 4 ) ( 0 ) [ σ j * 2 ( x ) ] 4 f ( x ) 4 d x + O ( h 3 q + 1 ) + o ( h 3 q + 1 ) = O ( h 3 q ) .
In the same way as above, we obtain
n - 1 E u 1 , j * u 2 , j * K X 1 - X 2 h 4 = n - 1 σ j * 4 ( x ) σ j * 4 ( y ) K x - y h 4 f ( x ) f ( y ) d x d y = n - 1 h q [ σ j * 4 ( x ) ] 2 f 2 ( x ) d x K u 4 d u + O h 2 q n = O h q n .
Following some calculation, we can obtain
E u 1 , j * u 2 , j * K X 1 - X 2 h 2 2 = E σ * 2 ( X 1 ) σ * 2 ( X 2 ) K X 1 - X 2 h 2 2 = h 2 q K ( 2 ) ( 0 ) [ σ * 2 ( x ) ] 2 f 2 ( x ) d x + O ( h ) 2 = O ( h 2 q ) .
Thus, (33) holds by (35)–(37), because $\dfrac{O(h^{3q}) + O(h^{q}/n)}{O(h^{2q})} = O(h^{q}) + O\!\left(\dfrac{1}{n h^{q}}\right) \to 0$ as $h \to 0$ and $n h^{q} \to \infty$.
Proof of (34). 
From Equation (37), it is clear that
\begin{align*}
\frac{2}{h^{q}}\, E\left\{\left[u^*_{1,j} u^*_{2,j} K\!\left(\frac{X_1 - X_2}{h}\right)\right]^2\right\}
&= 2 K^{(2)}(0)\int [\sigma^{*2}_j(x)]^2 f^2(x)\,dx + O(h) \\
&= 2 K^{(2)}(0)\, E\{[\sigma^{*2}_j(x)]^2 f(x)\} + O(h) \to V^*_{j,j}.
\end{align*}
Thus, we have Z 2 , j * d N ( 0 , V j , j * ) .
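The normal limit in the proof of (30) can be illustrated with a small Monte Carlo sketch (an illustration of mine, not code from the paper): with $q = 1$, a Gaussian kernel, $X \sim U(0,1)$ and Rademacher errors $u$ (so $\sigma^2(x) = 1$), the limit variance is $V = 2 K^{(2)}(0) \int [\sigma^2(x)]^2 f^2(x)\, dx = 1/\sqrt{\pi} \approx 0.56$; the simulated draws of the normalized statistic should then have mean near zero and variance near (somewhat below, because of boundary and bandwidth effects) this limit.

```python
import numpy as np

rng = np.random.default_rng(0)

def z_stat(x, u, h):
    """Normalized degenerate second-order U-statistic, as in the proof of (30):
    Z = (1 / ((n-1) h^{q/2})) * sum_{i != l} K((x_i - x_l)/h) u_i u_l, with q = 1."""
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    kmat = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)  # Gaussian kernel K
    np.fill_diagonal(kmat, 0.0)                      # drop the i == l terms
    return (kmat * np.outer(u, u)).sum() / ((n - 1) * np.sqrt(h))

n, reps = 200, 300
h = n ** (-0.2)
draws = np.empty(reps)
for r in range(reps):
    x = rng.uniform(size=n)              # f = U(0,1)
    u = rng.choice([-1.0, 1.0], size=n)  # E(u|x) = 0, sigma^2(x) = 1
    draws[r] = z_stat(x, u, h)

# limit variance V = 2 K^(2)(0) * 1 = 1/sqrt(pi), with K^(2)(0) = 1/(2 sqrt(pi))
print(draws.mean(), draws.var(), 1 / np.sqrt(np.pi))
```

The finite-sample variance sits slightly below $1/\sqrt{\pi}$ because the uniform design truncates the kernel convolution at the boundary of $[0,1]$.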
Proof of (31). 
$Z_{3,j}^* + Z_{4,j}^*$ can be represented as follows:
$$Z_{3,j}^* + Z_{4,j}^* = \frac{1}{(n-1)h^{q/2}} \sum_{i=1}^{n} \sum_{l \neq i}^{n} K\left(\frac{X_i - X_l}{h}\right) [m_{\hat{\theta},j}(X_i) - m_{\hat{\theta}^*,j}(X_i)] u_{l,j}^* + \frac{1}{(n-1)h^{q/2}} \sum_{i=1}^{n} \sum_{l \neq i}^{n} K\left(\frac{X_i - X_l}{h}\right) [m_{\hat{\theta},j}(X_l) - m_{\hat{\theta}^*,j}(X_l)] u_{i,j}^*$$
$$= \frac{1}{(n-1)h^{q/2}} \sum_{i=1}^{n} \sum_{l \neq i}^{n} K\left(\frac{X_i - X_l}{h}\right) [m_{\hat{\theta},j}(X_i) - m_{\theta,j}(X_i)] u_{l,j}^* + \frac{1}{(n-1)h^{q/2}} \sum_{i=1}^{n} \sum_{l \neq i}^{n} K\left(\frac{X_i - X_l}{h}\right) [m_{\theta,j}(X_i) - m_{\hat{\theta}^*,j}(X_i)] u_{l,j}^* + \frac{1}{(n-1)h^{q/2}} \sum_{i=1}^{n} \sum_{l \neq i}^{n} K\left(\frac{X_i - X_l}{h}\right) [m_{\hat{\theta},j}(X_l) - m_{\theta,j}(X_l)] u_{i,j}^* + \frac{1}{(n-1)h^{q/2}} \sum_{i=1}^{n} \sum_{l \neq i}^{n} K\left(\frac{X_i - X_l}{h}\right) [m_{\theta,j}(X_l) - m_{\hat{\theta}^*,j}(X_l)] u_{i,j}^*$$
$$\equiv Z_{3,j}^{*a} + Z_{3,j}^{*b} + Z_{4,j}^{*a} + Z_{4,j}^{*b}.$$
The only difference between $Z_{3,j}^* + Z_{4,j}^*$ and $Z_{3,j} + Z_{4,j}$ in (9) is that the former contains $u_{l,j}^*$ and $u_{i,j}^*$ instead of $u_{l,j}$ and $u_{i,j}$, respectively. However, since $E(u_{i,j}^* | X_i) = 0$ and $E(u_{l,j}^* | X_l) = 0$ by definition, $Z_{3,j}^{*a} + Z_{4,j}^{*a} = o_p(1)$ can be proven as in the proof of (9). Moreover, $Z_{3,j}^{*b} + Z_{4,j}^{*b} = o_p(1)$ can also be proven as in the proof of (9) by using (32) instead of (10).
Proof of (27) and (28). 
Since the asymptotic variance is derived above, we now derive the asymptotic covariance between $n h^{q/2} \hat{Z}_{n,j}^*$ and $n h^{q/2} \hat{Z}_{n,m}^*$, which we denote $V_{j,m}^*$. From the results of (29)–(31), it is clear that $E(Z_{2,j}^* Z_{2,m}^*) \to V_{j,m}^*$ as $n \to \infty$. Because $E(u_{i,j}^* u_{l,j}^*) = 0$ if $i \neq l$ and $E(u_{i,j}^* u_{i,m}^* | X_i) = \sigma_{j,m}^*(X_i)$ if $j \neq m$, it follows that
$$E(Z_{2,j}^* Z_{2,m}^*) = \frac{1}{(n-1)^2 h^q} E\left[\sum_{i=1}^{n} \sum_{l \neq i}^{n} K\left(\frac{X_i - X_l}{h}\right) u_{i,j}^* u_{l,j}^* \sum_{s=1}^{n} \sum_{t \neq s}^{n} K\left(\frac{X_s - X_t}{h}\right) u_{s,m}^* u_{t,m}^*\right] = \frac{2}{(n-1)^2 h^q} E\left[\sum_{i=1}^{n} \sum_{l \neq i}^{n} u_{i,j}^* u_{l,j}^* u_{i,m}^* u_{l,m}^* K\left(\frac{X_i - X_l}{h}\right)^2\right] = \frac{2n}{(n-1) h^q} \iint \sigma_{j,m}^*(x)\, \sigma_{j,m}^*(y)\, K\left(\frac{x - y}{h}\right)^2 f(x) f(y)\, dx\, dy = 2 K^{(2)}(0) \int [\sigma_{j,m}^*(x)]^2 f^2(x)\, dx + O(h) \to V_{j,m}^*.$$
Thus, the proofs of (27) and (28) follow straightforwardly from (38) and (39).
By the Cramér–Wold device and a calculation similar to the proof of (30), it can be shown straightforwardly that the vector $(Z_{2,1}^*, \ldots, Z_{2,J-1}^*)'$ converges to a multivariate normal distribution with a $(J-1) \times 1$ zero mean vector and variance-covariance matrix $V^*$, where $V^*$ is the $(J-1) \times (J-1)$ matrix whose $(j,m)$ element is $V_{j,m}^*$. Therefore, $C_n^*$, which is a quadratic form in $n h^{q/2} \hat{Z}_{n,j}^*$, converges to a chi-squared distribution with $J-1$ degrees of freedom.  ☐
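To make the bootstrap concrete, the following is a rough sketch (not the paper's code: the Gaussian kernel, the bandwidth, the data-generating process, and the sample-analog variance estimator `vhat` are illustrative choices of mine) of the parametric bootstrap procedure that the result above justifies. One fits the MNL model, forms residuals $\hat{u}_{i,j} = 1\{Y_i = j\} - m_{\hat{\theta},j}(X_i)$, computes the quadratic-form statistic, and then repeatedly redraws $Y^*$ from the fitted response probabilities, re-estimates, and recomputes the statistic to obtain a bootstrap critical value.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
J = 3  # alternatives 0 (base), 1, 2

def probs(theta, x):
    # theta stacks (a_j, b_j) for j = 1, ..., J-1; category 0 is the base
    a, b = theta[: J - 1], theta[J - 1 :]
    e = np.exp(np.clip(a[None, :] + x[:, None] * b[None, :], -30, 30))
    denom = 1.0 + e.sum(axis=1, keepdims=True)
    return np.hstack([1.0 / denom, e / denom])  # n x J response probabilities

def fit_mnl(x, y):
    # maximum likelihood for the MNL model
    def nll(theta):
        p = probs(theta, x)
        return -np.log(p[np.arange(len(y)), y]).sum()
    return minimize(nll, np.zeros(2 * (J - 1)), method="BFGS").x

def c_stat(x, y, theta, h):
    # quadratic-form statistic built from the kernel-weighted U-statistics
    n = len(x)
    u = (y[:, None] == np.arange(1, J)[None, :]).astype(float) - probs(theta, x)[:, 1:]
    d = (x[:, None] - x[None, :]) / h
    kmat = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)  # Gaussian kernel, q = 1
    np.fill_diagonal(kmat, 0.0)
    z = np.array([(kmat * np.outer(u[:, j], u[:, j])).sum() for j in range(J - 1)])
    z /= (n - 1) * np.sqrt(h)
    # one natural sample analog of V* (my choice, not necessarily the paper's)
    vhat = np.empty((J - 1, J - 1))
    for j in range(J - 1):
        for m in range(J - 1):
            a = u[:, j] * u[:, m]
            vhat[j, m] = 2.0 * (kmat**2 * np.outer(a, a)).sum() / (n * (n - 1) * h)
    return z @ np.linalg.solve(vhat, z)

# data generated from a correctly specified MNL, so the test should rarely reject
n = 200
h = n ** (-0.3)
x = rng.normal(size=n)
p = probs(np.array([0.3, -0.2, 0.8, -0.5]), x)
y = np.array([rng.choice(J, p=pi) for pi in p])

theta_hat = fit_mnl(x, y)
c_obs = c_stat(x, y, theta_hat, h)

# parametric bootstrap: redraw Y* from fitted probabilities, refit, recompute
p_hat = probs(theta_hat, x)
c_boot = []
for _ in range(99):
    y_star = np.array([rng.choice(J, p=pi) for pi in p_hat])
    c_boot.append(c_stat(x, y_star, fit_mnl(x, y_star), h))
crit = np.quantile(c_boot, 0.95)
print(f"C_n = {c_obs:.2f}, bootstrap 5% critical value = {crit:.2f}")
```

The test rejects the parametric specification at the 5% level when `c_obs` exceeds `crit`; under the chi-squared limit derived above, `crit` should be close to the $\chi^2_{J-1}$ critical value in large samples.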

References

  1. D. McFadden. “Conditional Logit Analysis of Qualitative Choice Behavior.” In Frontiers in Econometrics. Edited by P. Zarembka. New York, NY, USA: Academic Press, 1974, pp. 105–142. [Google Scholar]
  2. J.A. Hausman, and D.A. Wise. “A Conditional Probit Model for Qualitative Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences.” Econometrica 46 (1978): 403–426. [Google Scholar] [CrossRef]
  3. S. Berry, J. Levinsohn, and A. Pakes. “Automobile Prices in Market Equilibrium.” Econometrica 63 (1995): 841–890. [Google Scholar] [CrossRef]
  4. P.K. Goldberg. “Product Differentiation and Oligopoly in International Markets: The Case of the US Automobile Industry.” Econometrica 63 (1995): 891–951. [Google Scholar] [CrossRef]
  5. J.J. Heckman. “The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models.” Ann. Econ. Soc. Meas. 5 (1976): 475–492. [Google Scholar]
  6. J.A. Dubin, and D.L. McFadden. “An Econometric Analysis of Residential Electric Appliance Holdings and Consumption.” Econometrica 52 (1984): 345–362. [Google Scholar] [CrossRef]
  7. W. Härdle, and E. Mammen. “Comparing Nonparametric versus Parametric Regression Fits.” Ann. Stat. 21 (1993): 1926–1947. [Google Scholar] [CrossRef]
  8. H.J. Bierens. “Consistent Model Specification Tests.” J. Econom. 20 (1982): 105–134. [Google Scholar] [CrossRef]
  9. H.J. Bierens. “Model Specification Testing of Time Series Regressions.” J. Econom. 26 (1984): 323–353. [Google Scholar] [CrossRef]
  10. H.J. Bierens. “A Consistent Conditional Moment Test of Functional Form.” Econometrica 58 (1990): 1443–1458. [Google Scholar] [CrossRef]
  11. M.A. Delgado. “Testing the Equality of Nonparametric Regression Curves.” Stat. Probab. Lett. 17 (1993): 199–204. [Google Scholar] [CrossRef]
  12. R.M. De Jong. “The Bierens Test under Data Dependence.” J. Econom. 72 (1996): 1–32. [Google Scholar] [CrossRef]
  13. D.W. Andrews. “A Conditional Kolmogorov Test.” Econometrica 65 (1997): 1097–1128. [Google Scholar] [CrossRef]
  14. H.J. Bierens, and W. Ploberger. “Asymptotic Theory of Integrated Conditional Moment Tests.” Econometrica 65 (1997): 1129–1151. [Google Scholar] [CrossRef]
  15. W. Stute. “Nonparametric Model Checks for Regression.” Ann. Stat. 25 (1997): 613–641. [Google Scholar] [CrossRef]
  16. M.B. Stinchcombe, and H. White. “Consistent Specification Testing with Nuisance Parameters Present Only under the Alternative.” Econom. Theory 14 (1998): 295–325. [Google Scholar] [CrossRef]
  17. X. Chen, and Y. Fan. “Consistent Hypothesis Testing in Semiparametric and Nonparametric Models for Econometric Time Series.” J. Econom. 91 (1999): 373–401. [Google Scholar] [CrossRef]
  18. Y.J. Whang. “Consistent Bootstrap Tests of Parametric Regression Functions.” J. Econom. 98 (2000): 27–46. [Google Scholar] [CrossRef]
  19. R.L. Eubank, and C.H. Spiegelman. “Testing the Goodness of fit of a Linear Model via Nonparametric Regression Techniques.” J. Am. Stat. Assoc. 85 (1990): 387–392. [Google Scholar] [CrossRef]
  20. S. Le Cessie, and J.C. van Houwelingen. “A Goodness-of-fit Test for Binary Regression Models, Based on Smoothing Methods.” Biometrics 47 (1991): 1267–1282. [Google Scholar] [CrossRef]
  21. J.M. Wooldridge. “A Test for Functional Form Against Nonparametric Alternatives.” Econom. Theory 8 (1992): 452–475. [Google Scholar] [CrossRef]
  22. A.J. Yatchew. “Nonparametric Regression Tests Based on an Infinite Dimensional Least Squares Procedure.” Econom. Theory 8 (1992): 435–451. [Google Scholar] [CrossRef]
  23. P.L. Gozalo. “A Consistent Model Specification Test for Nonparametric Estimation of Regression Function Models.” Econom. Theory 9 (1993): 451–477. [Google Scholar] [CrossRef]
  24. Y. Aït-Sahalia, P.J. Bickel, and T.M. Stoker. “Goodness-of-fit Tests for Kernel Regression with an Application to Option Implied Volatilities.” J. Econom. 105 (2001): 363–412. [Google Scholar] [CrossRef]
  25. M.A. Delgado, and T. Stengos. “Semiparametric Specification Testing of non-Nested Econometric Models.” Rev. Econ. Stud. 61 (1994): 291–303. [Google Scholar] [CrossRef]
  26. J.L. Horowitz, and W. Härdle. “Testing a Parametric Model Against a Semiparametric Alternative.” Econom. Theory 10 (1994): 821–848. [Google Scholar] [CrossRef]
  27. Y. Hong, and H. White. “Consistent Specification Testing via Nonparametric Series Regression.” Econometrica 63 (1995): 1133–1159. [Google Scholar] [CrossRef]
  28. Y. Fan, and Q. Li. “Consistent Model Specification Tests: Omitted Variables and Semiparametric Functional Forms.” Econometrica 64 (1996): 865–890. [Google Scholar] [CrossRef]
  29. P. Lavergne, and Q.H. Vuong. “Nonparametric Selection of Regressors: The Nonnested Case.” Econometrica 64 (1996): 207–219. [Google Scholar] [CrossRef]
  30. J.X. Zheng. “A Consistent Test of Functional Form via Nonparametric Estimation Techniques.” J. Econom. 75 (1996): 263–289. [Google Scholar] [CrossRef]
  31. Q. Li, and S. Wang. “A Simple Consistent Bootstrap Test for a Parametric Regression Function.” J. Econom. 87 (1998): 145–165. [Google Scholar] [CrossRef]
  32. P. Lavergne, and Q. Vuong. “Nonparametric Significance Testing.” Econom. Theory 16 (2000): 576–601. [Google Scholar] [CrossRef]
  33. Y. Fan, and Q. Li. “Consistent Model Specification Tests.” Econom. Theory 16 (2000): 1016–1041. [Google Scholar] [CrossRef]
  34. J. Mora, and A.I. Moro-Egido. “On Specification Testing of Ordered Discrete Choice Models.” J. Econom. 143 (2008): 191–205. [Google Scholar] [CrossRef]
  35. J. Fan, and L.S. Huang. “Goodness-of-fit Tests for Parametric Regression Models.” J. Am. Stat. Assoc. 96 (2001): 640–652. [Google Scholar]
  36. J.L. Horowitz, and V.G. Spokoiny. “An Adaptive, Rate-Optimal Test of a Parametric Mean-Regression Model Against a Nonparametric Alternative.” Econometrica 69 (2001): 599–631. [Google Scholar] [CrossRef]
  37. V. Spokoiny. “Data-Driven Testing the fit of Linear Models.” Math. Methods Stat. 10 (2001): 465–497. [Google Scholar]
  38. Y. Baraud, S. Huet, and B. Laurent. “Adaptive Tests of Linear Hypotheses by Model Selection.” Ann. Stat. 31 (2003): 225–251. [Google Scholar]
  39. C.M. Zhang. “Adaptive Tests of Regression Functions via Multiscale Generalized Likelihood Ratios.” Can. J. Stat. 31 (2003): 151–171. [Google Scholar] [CrossRef]
  40. E. Guerre, and P. Lavergne. “Data-Driven Rate-Optimal Specification Testing in Regression Models.” Ann. Stat. 33 (2005): 840–870. [Google Scholar]
  41. W. Härdle, M. Müller, S. Sperlich, and A. Werwatz. Nonparametric and Semiparametric Models. Berlin Heidelberg, Germany: Springer, 2004, p. 92. [Google Scholar]
  42. J.L. Powell, J.H. Stock, and T.M. Stoker. “Semiparametric Estimation of Index Coefficients.” Econometrica 57 (1989): 1403–1430. [Google Scholar] [CrossRef]
  43. P. Hall. “Central Limit Theorem for Integrated Square Error of Multivariate Nonparametric Density Estimators.” J. Multivar. Anal. 14 (1984): 1–16. [Google Scholar] [CrossRef]
  1. Rate-optimal tests are proposed by [36,37,38,39,40,43], among others.
  2. To be accurate, the MNL model consists of alternative-variant coefficients, with response probabilities given by $P(Y_j = 1 | X) = \exp(X\beta_j)/[1 + \sum_{m=1}^{J} \exp(X\beta_m)]$. However, a model with alternative-variant coefficients can be transformed into a model with alternative-invariant coefficients without loss of generality, which is sometimes called a conditional logit model. In this paper, we describe only the model with alternative-invariant coefficients.
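The transformation mentioned in note 2 can be checked numerically. The sketch below (with hypothetical coefficient values of my choosing) stacks the alternative-variant coefficients into a single vector and interacts the regressors with alternative dummies, which reproduces the same response probabilities in conditional-logit form.

```python
import numpy as np

beta = np.array([[0.8, -0.5], [0.3, 1.2]])  # hypothetical beta_j, j = 1, 2 (q = 2)
x = np.array([1.0, -0.7])
J = beta.shape[0] + 1  # alternatives 0 (base), 1, 2

# alternative-variant form: P(Y_j = 1 | X) = exp(X b_j) / (1 + sum_m exp(X b_m))
e = np.exp(beta @ x)
p_mnl = np.concatenate([[1.0], e]) / (1.0 + e.sum())

# conditional-logit form: gamma stacks the b_j, and the regressor for
# alternative j is x interacted with the alternative-j dummy (w_0 = 0)
gamma = beta.ravel()
w = np.zeros((J, gamma.size))
for j in range(1, J):
    w[j, (j - 1) * len(x) : j * len(x)] = x
s = np.exp(w @ gamma)
p_clogit = s / s.sum()

print(np.allclose(p_mnl, p_clogit))  # True: the two forms coincide
```

The alternative-invariant (conditional-logit) representation is thus fully general: nothing is lost by working with a single coefficient vector and alternative-specific regressors.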
