Interpretation and Semiparametric Efficiency in Quantile Regression under Misspecification

Allowing for misspecification in the linear conditional quantile function, this paper provides a new interpretation and the semiparametric efficiency bound for the quantile regression parameter β(τ) in Koenker and Bassett (1978). The first result on interpretation shows that, under a mean-squared loss function, the probability limit of the Koenker–Bassett estimator minimizes a weighted distribution approximation error, defined as F_Y(X'β(τ)|X) − τ, i.e., the deviation of the conditional distribution function, evaluated at the linear quantile approximation, from the quantile level. The second result implies that the Koenker–Bassett estimator semiparametrically efficiently estimates the quantile regression parameter that produces parsimonious descriptive statistics for the conditional distribution. Therefore, quantile regression shares the attractive features of ordinary least squares: interpretability and semiparametric efficiency under misspecification.


Introduction
This paper revisits the approximation properties of linear quantile regression under misspecification ([1–3]). The quantile regression estimator, introduced by the seminal paper of Koenker and Bassett [4], offers parsimonious summary statistics for the conditional quantile function and is computationally tractable. Since the development of the estimator, researchers have frequently used quantile regression, in conjunction with ordinary least squares regression, to analyse how the outcome variable responds to the explanatory variables. For example, to model the wage structure in labour economics, Angrist, Chernozhukov, and Fernández-Val [1] study returns to education at different points in the wage distribution and changes in inequality over time. A thorough review of recent developments in quantile regression can be found in [5]. The object of interest of this paper is the quantile regression (QR) parameter, that is, the probability limit of the Koenker–Bassett estimator without assuming the true conditional quantile function to be linear. Two results are presented: a new interpretation and the semiparametric efficiency bound for the QR parameter.
The topic of interest is the conditional distribution function (CDF) of a continuous response variable Y given the regressor vector X, denoted F_Y(y|X). An alternative to the CDF is the conditional quantile function (CQF) of Y given X, defined as Q_τ(Y|X) := inf{y : F_Y(y|X) ≥ τ} for any quantile index τ ∈ (0, 1). Assuming integrability, the CQF minimizes the check loss

\[ Q_\tau(Y|X) = \arg\min_{q(\cdot) \in \mathcal{Q}} E\left[\rho_\tau\big(Y - q(X)\big)\right], \]

where Q is the set of measurable functions of X, ρ_τ(u) = u(τ − 1{u ≤ 0}) is known as the check function, and 1{·} is the indicator function. A linear approximation to the CQF is provided by the QR parameter β(τ), which solves the population minimization problem

\[ \beta(\tau) := \arg\min_{\beta \in \mathbb{R}^d} E\left[\rho_\tau(Y - X'\beta)\right], \tag{1} \]

assuming integrability and the uniqueness of the solution, where d is the dimension of X. The QR parameter β(τ) provides a simple summary statistic for the CQF. The QR estimator introduced in [4] is the sample analogue for the random sample (Y_i, X_i), i ≤ n, on the random variables (Y, X):

\[ \hat{\beta}(\tau) := \arg\min_{\beta \in \mathbb{R}^d} \frac{1}{n}\sum_{i=1}^{n} \rho_\tau(Y_i - X_i'\beta). \tag{2} \]

By the equivalent first-order condition, this estimator β̂(τ) is also the generalized method of moments (GMM) estimator based on the unconditional moment restriction ([6,7])

\[ E\left[X\big(\tau - 1\{Y \le X'\beta(\tau)\}\big)\right] = 0. \tag{3} \]

This paper focuses on the population QR parameter defined by (1) or, equivalently, (3).
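As an illustration (my own, not part of the paper), the sample minimization in (2) can be sketched numerically. The helper names below are hypothetical, and a generic derivative-free optimizer stands in for the linear-programming algorithms used by quantile regression software in practice:

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    # rho_tau(u) = u * (tau - 1{u <= 0}), the check function from (1)
    return u * (tau - (u <= 0))

def qr_fit(y, X, tau):
    # Sample analogue of (2): minimize (1/n) sum rho_tau(Y_i - X_i'b).
    # Nelder-Mead handles the convex but nonsmooth objective.
    obj = lambda b: np.mean(check_loss(y - X @ b, tau))
    return minimize(obj, np.zeros(X.shape[1]), method="Nelder-Mead").x

rng = np.random.default_rng(0)
n = 5000
y = rng.normal(size=n)
X = np.ones((n, 1))  # intercept only: beta(tau) is then the tau-quantile of Y
beta_hat = qr_fit(y, X, tau=0.75)
```

With an intercept-only design, the minimizer of the check loss is the sample τ-quantile, which gives a simple sanity check on the fit.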
If the CQF is modelled as linear in the covariates, Q_τ(Y|X) = X'β(τ), or equivalently

\[ F_Y(X'\beta(\tau)\,|\,X) = \tau \quad \text{almost surely}, \tag{4} \]

the coefficient β(τ) satisfies the conditional moment restriction (4) almost surely. In the theoretical and applied econometrics literature, this linear QR model is often assumed to be correctly specified. Nevertheless, a well-known crossing problem arises: the fitted CQFs for different quantiles may cross at some values of X, except when β(τ) is the same for all τ. This violates the logical monotonicity requirement that Q_τ(Y|X), or its estimator, be weakly increasing in the probability index τ given X. The crossing problem for estimation can be treated by rearranging the estimator (for example, see [8] and the references therein). However, the crossing problem remains for the population CQF, suggesting that the linear QR model (4) is inherently misspecified. That is, there is no β(τ) ∈ R^d satisfying the conditional moment restriction (4) almost surely. Therefore, the parameter of interest in this paper is the QR parameter β(τ) defined by (1) or (3) without the linear CQF assumption in (4). We can view β(τ) as the pseudo-true value of the linear QR model under misspecification. As the Koenker–Bassett QR estimator is widely used, it is important to understand the approximation nature of the estimand.
For the mean regression counterpart, ordinary least squares (OLS) consistently estimates the linear conditional expectation and minimizes mean-squared error loss for fitting the conditional expectation under misspecification. Chamberlain [9] proves the semiparametric efficiency of the OLS estimator, which provides additional justification for the widespread use of OLS. The attractive features of OLS under misspecification, interpretability and semiparametric efficiency, motivate my investigation of parallel properties in QR. I study how this QR parameter approximates the CQF and the CDF and calculate its semiparametric efficiency bound.
The first contribution of this paper concerns how β(τ) minimizes the distribution approximation error, defined by F_Y(X'β(τ)|X) − τ, under a mean-squared loss function. The first-order condition (3) can be understood as the orthogonality condition between the covariates X and the distribution approximation error in the projection model. I show that the QR parameter β(τ) minimizes the mean-squared distribution approximation error, inversely weighted by the conditional density function f_Y(X'β(τ)|X). Angrist, Chernozhukov, and Fernández-Val [1] (henceforth ACF) show that β(τ) minimizes the mean-squared quantile specification error, defined by Q_τ(Y|X) − X'β(τ), using a weight primarily determined by the conditional density. ACF's results, as well as my own, suggest that QR approximates the CQF more accurately at points with more observations, but at those points the corresponding CDF evaluated at the approximation, F_Y(X'β(τ)|X), is more distant from the targeted quantile level τ. This trade-off is controlled by the conditional density; it is distinct from the OLS approximation of the conditional mean, because the distribution and quantile functions are generally nonlinear operators. This observation is novel and improves our understanding of how QR summarizes the outcome distribution. A numerical example in Figure 1 in Section 4 illustrates this finding. (On the crossing problem, Chernozhukov, Fernández-Val, and Galichon [8] rearrange an estimator Q̂_τ(Y|X) of the CQF to be monotonic, while the original estimator remains computationally tractable.)
The second result is the semiparametric efficiency bound for β(τ). Chamberlain's results in [9] on mean regression, based on differentiable moment restrictions, cannot be applied to semiparametric efficiency for QR because the moment function in (3) is not differentiable. Although Ai and Chen [10] provide general results for sequential moment restrictions containing unknown functions, which could cover the quantile regression setting, I calculate the efficiency bound under regularity conditions tailored to the QR parameter β(τ), using the method of Severini and Tripathi [11]. It follows that the misspecification-robust asymptotic variance of the QR estimator β̂(τ) in (2) attains this bound, which means that no regular estimator for (3) has a smaller asymptotic variance than β̂(τ). This result might be expected for an M-estimator, but, to my knowledge, the QR application has not been demonstrated and discussed rigorously in any publication. Furthermore, I calculate the efficiency bounds for jointly estimating QR parameters at a finite number of quantiles for both the linear projection model (3) and the linear QR model (4). Employing the widely used method of Newey [12], Newey and Powell [13] find the semiparametric efficiency bound for β(τ) under the correctly specified linear CQF in (4). Note that the efficiency bounds for (3) do not imply the bounds for (4); nor does the converse hold.
In Section 2, I discuss the interpretation of the misspecified QR model in terms of approximating the CDF and the CQF. The theorems for the semiparametric efficiency bounds are in Section 3. In Section 4, I discuss the parallel properties of QR and OLS. The paper concludes with a review of some existing efficient estimators for the linear projection model (3) and the linear QR model (4).

Interpreting QR under Misspecification
Let Y be a continuous response variable and X a d × 1 regressor vector. The quantile-specific residual is defined as the distance between the response variable and the CQF, ε_τ := Y − Q_τ(Y|X). This is a semiparametric problem in the sense that the distribution functions of ε_τ and X, as well as the CQF, are unspecified and unrestricted other than by the following assumptions, which are standard in QR models. I assume the following regularity conditions, based on the conditions of Theorem 3 in ACF.
(R1) (Y_i, X_i), i ≤ n, are independent and identically distributed on the probability space (Ω, F, P) for each n; (R2) the conditional density f_Y(y|X = x) exists and is bounded and uniformly continuous in y, uniformly in x over the support of X; (R3) J(τ) := E[f_Y(X'β(τ)|X)XX'] is positive definite for all τ ∈ (0, 1), where β(τ) is uniquely defined in (1); (R4) E‖X‖^{2+ε} < ∞ for some ε > 0; (R5) f_Y(X'β(τ)|X) is bounded away from zero.
The identification of the pseudo-true parameter β(τ) is assumed in (R3). The bounded conditional density of the continuous response variable Y given X in (R2) is needed for the existence of the CQF for any τ ∈ (0, 1). The uniform continuity guarantees the existence and differentiability of the distribution function, i.e., dF_Y(y|X)/dy = f_Y(y|X). The covariates X are allowed to contain discrete components. (R5) guarantees that the objective function defined below in Equation (6) is finite for all β ∈ R^d, where β(τ) is the parameter of interest uniquely defined by Equation (1).
By applying the law of iterated expectations to Equation (3), the parameter of interest β(τ) equivalently solves

\[ E\left[X\big(F_Y(X'\beta(\tau)\,|\,X) - \tau\big)\right] = 0. \tag{5} \]

Equation (5) states that X is orthogonal to the distribution approximation error F_Y(X'β(τ)|X) − τ. The following theorem interprets QR via a weighted mean-squared loss function on the distribution approximation error.
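To make the orthogonality condition (5) concrete, the following sketch (my own illustration, not from the paper) uses the data-generating process of Figure 1 in Section 4, where the conditional CDF is known in closed form. It estimates β(τ) by minimizing the sample check loss and then verifies by Monte Carlo that E[X(F_Y(X'β(τ)|X) − τ)] is approximately zero; the variable names are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

# DGP from Figure 1: X ~ Uniform[1,2], e|X ~ Uniform[0,X], Y = cos(2X) + e,
# so F_Y(y|X) = (y - cos(2X)) / X on the support of Y given X.
rng = np.random.default_rng(1)
tau, n = 0.5, 20000
x1 = rng.uniform(1.0, 2.0, size=n)
e = rng.uniform(0.0, x1)
y = np.cos(2 * x1) + e
X = np.column_stack([np.ones(n), x1])

# Estimate beta(tau) by minimizing the sample check loss in (1).
check = lambda u: u * (tau - (u <= 0))
beta = minimize(lambda b: np.mean(check(y - X @ b)), np.zeros(2),
                method="Nelder-Mead",
                options={"xatol": 1e-6, "fatol": 1e-8, "maxiter": 2000}).x

# Orthogonality (5): E[X (F_Y(X'beta|X) - tau)] should be close to zero.
F = np.clip((X @ beta - np.cos(2 * x1)) / x1, 0.0, 1.0)
orth = X.T @ (F - tau) / n
```

Both components of `orth` shrink toward zero as n grows, illustrating that the check-loss minimizer satisfies the projection-model orthogonality (5) even though the linear quantile model is misspecified here.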
Theorem 1. Suppose (R1)–(R5) hold. Then the QR parameter β(τ) uniquely defined in (1) is a fixed point of the weighted minimum distance problem

\[ \beta(\tau) = \arg\min_{b \in \mathbb{R}^d} E\left[\frac{\big(F_Y(X'b\,|\,X) - \tau\big)^2}{f_Y(X'\beta(\tau)\,|\,X)}\right]. \tag{6} \]

Proof of Theorem 1. The objective function in (6) is finite by the assumptions. Any fixed point b = β(τ) solves the first-order condition E[X(F_Y(X'b|X) − τ)] = 0. By the law of iterated expectations, (3) implies this first-order condition. Therefore, β(τ) solves (6).

When the second-order condition holds, Theorem 1 states that β(τ) is the unique fixed point of an iterated minimum distance approximation, with a weight that is a function of X only. The mean-squared loss makes clear how the linear function matches the CDF to the targeted probability of interest. The loss function puts more weight on points where the conditional density f_Y(X'β(τ)|X) is small. As a result, the distribution approximation error is smaller at points with smaller conditional density.

Now, I discuss the approximation nature of QR through the distribution approximation error and the quantile specification error. ACF interpret QR as the minimizer of a weighted mean-squared error loss function for the quantile specification error, defined as the deviation between the approximation point X'β(τ) and the true CQF:

\[ \beta(\tau) = \arg\min_{\beta \in \mathbb{R}^d} E\left[w_\tau(X, \beta)\big(Q_\tau(Y|X) - X'\beta\big)^2\right]. \tag{7} \]

ACF define w_τ(X, β(τ)) in (8) to be the importance weights, which are averages of the conditional density over a line connecting the approximation point X'β(τ) and the true CQF. (For estimation, [14] studies different approaches based on distribution regression and quantile regression.) ACF note that
the regressors contribute disproportionately to the QR estimate and that the primary determinant of the importance weight is the conditional density. Moreover, the first-order condition implied by (7),

\[ E\left[w_\tau(X, \beta(\tau))\,X\big(X'\beta(\tau) - Q_\tau(Y|X)\big)\right] = 0, \]

is a weighted orthogonality condition on the quantile specification error. A Taylor expansion provides intuition connecting the distribution approximation error and the quantile specification error:

\[ F_Y(X'\beta(\tau)\,|\,X) - \tau \approx f_Y\big(Q_\tau(Y|X)\,|\,X\big)\big(X'\beta(\tau) - Q_\tau(Y|X)\big). \]

This observation implies that the quantile specification error is smaller at points where the conditional density f_Y(X'β|X) is larger. On the other hand, the distribution approximation error is larger at points with larger f_Y(X'β|X). In contrast to OLS, where the mean operator is linear, the CDF and its inverse operator, the CQF, are generally nonlinear. The distribution approximation error can be interpreted as the distance after a nonlinear transformation by the CDF; a Taylor expansion linearizes the distribution function to the quantile specification error multiplied by the conditional density. The conditional density plays a crucial role in weighting the distribution approximation error and the quantile specification error. The above discussion provides additional insight into how the QR parameter approximates the CQF and fits the CDF to the targeted quantile level.
Remark 1 (Mean-squared loss under misspecification). The linear function X'β(τ) is the best linear approximation under the check loss in (1). While β(0.5) corresponds to least absolute deviations estimation, the QR parameter β(τ) for τ ≠ 0.5 is the best linear predictor for the response variable under the asymmetric loss function ρ_τ(·) in (1). ACF note that prediction under the asymmetric check loss is often not the object of interest in empirical work, with the exception of the forecasting literature; see, for example, [15]. For the mean regression counterpart, OLS consistently estimates the linear conditional expectation and minimizes mean-squared error loss for fitting the conditional expectation under misspecification. The robust nature of OLS also motivates research on misspecification in panel data models. For example, Galvao and Kato [16] investigate linear panel data models under misspecification. The pseudo-true value of the fixed effects estimator provides the best partial linear approximation to the conditional mean given the explanatory variables and the unobservable individual effect.

The Semiparametric Efficiency Bounds
Section 3.1 presents the semiparametric efficiency bound for the unconditional moment restriction (3). Section 3.2 discusses the existing results on the semiparametric efficiency bound for the conditional moment restriction (4).

QR under Misspecification
I calculate the semiparametric efficiency bound for the unconditional moment restriction (3) by the approach of Severini and Tripathi [11].

Theorem 2. Suppose (R1)–(R5) hold. Then the semiparametric efficiency bound for the QR parameter β(τ) defined by the unconditional moment restriction (3) is V(τ) := J(τ)^{-1} E[(τ − 1{Y ≤ X'β(τ)})² XX'] J(τ)^{-1}, where J(τ) := E[f_Y(X'β(τ)|X)XX'].

Proof of Theorem 2. See the Appendix.
My proof accommodates the regularity assumptions for quantile regression and modifies Section 9 of [11]. For example, the covariate vector X may contain discrete components, which I handle by constructing two tangent spaces, for the conditional density of Y given X and for the marginal density of X, respectively. In the efficiency bound, J(τ) := E[f_Y(X'β(τ)|X)XX'] is obtained by assuming the interchangeability of integration and differentiation for the nonsmooth check function. (Severini and Tripathi construct the tangent space for the continuous and bounded joint density f(X, Y) in Section 9 of [11]; additionally, they define J through the derivative of the moment restriction.) The method in [11] has been used, for example, in the monotone binary model in [18], the Lewbel [19] latent variable model in [20], and the partial linear single index model in [21]. I work in the Hilbert space of tangent vectors of the square-root density functions and use the Riesz–Fréchet representation theorem. Another, equivalent approach in [12] works in a Hilbert space of random variables and projects onto the linear space spanned by the scores of one-dimensional subproblems to find the efficient influence function. The efficiency bound is then the second moment of the efficient influence function, J(τ)^{-1}X(τ − 1{Y ≤ X'β(τ)}). Newey's efficient influence function is the score function evaluated at the unique representers given by the Riesz–Fréchet theorem used in [11]; a more detailed comparison of the two approaches is given in [11].
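For concreteness, the bound described above can be written out as the second moment of the efficient influence function; the following display is my own sketch, consistent with the definitions in the text (the label V(τ) for the bound is my notation):

```latex
\psi_\tau(Y, X) = J(\tau)^{-1} X\big(\tau - 1\{Y \le X'\beta(\tau)\}\big),
\qquad
J(\tau) := E\big[f_Y(X'\beta(\tau)\,|\,X)\,XX'\big],
\]
\[
V(\tau) = E\big[\psi_\tau \psi_\tau'\big]
        = J(\tau)^{-1}\, E\big[(\tau - 1\{Y \le X'\beta(\tau)\})^2 XX'\big]\, J(\tau)^{-1}.
```

This is exactly the misspecification-robust sandwich variance of the Koenker–Bassett estimator, which is why the estimator attains the bound.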
The semiparametric efficiency bound for the correctly specified quantile regression model (4) is

\[ \tau(1-\tau)\left(E\big[f_{\varepsilon_\tau}(0|X)^2\,XX'\big]\right)^{-1}, \]

where the matrix E[f_{ε_τ}(0|X)² XX'] is assumed to be finite and nonsingular. This bound is first calculated in [13] using the method developed in [12]. If, in addition, the conditional density of ε_τ given X at zero is independent of X, i.e., f_{ε_τ}(0|X) = f_{ε_τ}(0), the bound reduces to τ(1 − τ) f_{ε_τ}(0)^{-2} (E[XX'])^{-1}. This asymptotic covariance is attained by β̂(τ), as first shown in [4]. This has an interesting resemblance to the fact that the OLS estimator is semiparametrically efficient in a homoskedastic regression model, i.e., e = Y − X'β, E[e|X] = 0, and E[e²|X] constant. I further show, in general, the semiparametrically efficient joint asymptotic covariance, given in (9), of the estimators of (β'(τ_1), ..., β'(τ_m))' for any τ_i, τ_j ∈ T, i, j = 1, 2, ..., m, and any finite integer m ≥ 1. The regularity conditions imposed, (R1), (R2), and (R4), are weaker than the assumptions in [13]; for example, they assume f(ε, X) is absolutely continuous in ε, which implies the uniform continuity in (R2). See the Appendix for the detailed proof of (9).

Discussion and Conclusions
Misspecification is a generic phenomenon; especially in quantile regression (QR), the true conditional quantile function (CQF) might be nonlinear or involve different functions of the covariates at different quantiles. Table 1 summarizes the parallel properties of QR and OLS. Under misspecification, the pseudo-true OLS coefficient can be interpreted as the best linear predictor of the conditional mean function E[Y|X], in the sense that the coefficient minimizes the mean-squared error of the linear approximation to the conditional mean. The approximation properties of OLS have been well studied (see, for example, [22]). For the QR counterpart, I present the inverse density-weighted mean-squared error loss function based on the distribution approximation error F_Y(X'β|X) − τ. This result complements the interpretation based on the quantile specification error in [1]. My results imply that the Koenker–Bassett estimator is semiparametrically efficient for the misspecified linear projection model and for the correctly specified linear quantile regression model when f_Y(Q_τ(Y|X)|X) = f_{ε_τ}(0|X) does not depend on X. Alternatively, the smoothed empirical likelihood estimator using the unconditional moment restriction in [23] has the same asymptotic distribution as the Koenker–Bassett estimator and hence attains the efficiency bound.

Notes to Table 1 (linear regression model): e = Y − X'β; † the feasible generalized least squares estimator, for example, is semiparametrically efficient.
Under the correctly specified linear quantile regression model, the Koenker–Bassett estimator consistently estimates the true β(τ), although it is not semiparametrically efficient under heteroskedasticity. Researchers have proposed many efficient estimators for the correctly specified linear quantile regression parameter, for example, the one-step score estimator in [13], the smoothed conditional empirical likelihood estimator in [24], and the sieve minimum distance (SMD) estimator in [25,26]. However, the pseudo-true values of all of these estimators under misspecification are different, and their interpretations have not been thoroughly studied. Accordingly, the semiparametric efficiency bounds of these pseudo-true values also differ. For example, an unweighted SMD estimator converges to a pseudo-true value β_SMD that minimizes E[(F_Y(X'β|X) − τ)²]. The first-order condition is

\[ E\left[f_Y(X'\beta\,|\,X)\,X\big(F_Y(X'\beta\,|\,X) - \tau\big)\right] = 0, \]

which, under correct specification, coincides with the unconditional moment used in [13] for the semiparametrically efficient GMM estimator. The conditional density weight is similar to generalized least squares in mean regression, which uses a weight based on the conditional variance to construct an efficient estimator.
It is interesting to note that the pseudo-true value of the SMD estimator minimizes the unweighted criterion E[(F_Y(X'β|X) − τ)²]: the distribution approximation error is weighted evenly over the support of X for β_SMD, in contrast to the QR parameter, for which the error is weighted more heavily at points with smaller conditional density by Theorem 1. Therefore, the SMD estimator might have more desirable approximation properties than QR. Nevertheless, the SMD estimator is computationally more demanding than the Koenker–Bassett estimator. A numerical example in Figure 1 illustrates how the Koenker–Bassett (KB) and SMD estimators approximate the CQF and the CDF.
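The contrast between the two pseudo-true values can be reproduced numerically. The sketch below (my own, with hypothetical variable names) uses the known conditional CDF of the Figure 1 design to minimize each population criterion by Monte Carlo; each pseudo-true value then wins under its own criterion:

```python
import numpy as np
from scipy.optimize import minimize

# Same design as Figure 1: X ~ Uniform[1,2], e|X ~ Uniform[0,X], Y = cos(2X) + e.
rng = np.random.default_rng(2)
tau, n = 0.5, 20000
x1 = rng.uniform(1.0, 2.0, size=n)
e = rng.uniform(0.0, x1)
y = np.cos(2 * x1) + e
X = np.column_stack([np.ones(n), x1])

def cdf_at(b):
    # Known conditional CDF F_Y(X'b|X) of this DGP, clipped to [0, 1].
    return np.clip((X @ b - np.cos(2 * x1)) / x1, 0.0, 1.0)

kb_obj = lambda b: np.mean((y - X @ b) * (tau - (y - X @ b <= 0)))  # check loss (1)
smd_obj = lambda b: np.mean((cdf_at(b) - tau) ** 2)                 # unweighted SMD

b_kb = minimize(kb_obj, np.zeros(2), method="Nelder-Mead").x
b_smd = minimize(smd_obj, np.zeros(2), method="Nelder-Mead").x
```

Because the two criteria weight the distribution approximation error differently, `b_kb` and `b_smd` are distinct linear approximations, mirroring the two fitted lines in Figure 1.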
This discussion leads to open-ended questions. What is an appropriate linear approximation or a meaningful summary statistic for the nonlinear CQF? How should economists measure the marginal effect of the covariates on the CQF? (The conditional moment restriction in (4) can be expressed as m(X, β) = τ − F_Y(X'β|X) = 0; in [26], an unweighted penalized sieve minimum distance estimator minimizes a possibly penalized consistent estimate of the minimum distance criterion E[m(X, β)²].) An approach that circumvents this problem is to measure the average marginal response of the covariates on the CQF directly. The average quantile derivative, defined as E[W(X)∇Q_τ(Y|X)], where W(X) is a weight function, offers such a succinct summary statistic ([27]). Sasaki [28] investigates the concern that quantile regressions may misspecify true structural functions. He provides a causal interpretation of the derivative of the CQF, which identifies a weighted average of heterogeneous structural partial effects among the subpopulation of individuals at the conditional quantile of interest. Sasaki's work adds economic content to this misspecification question. This paper complements the prior literature on understanding how QR statistically summarizes the outcome distribution.

Appendix: Proof of Theorem 2

Consider estimating the true parameters ξ_0 = (ψ_0, φ_0) ∈ A and β_0 through one-dimensional subproblems. For some t_0 > 0, let t ↦ (ξ_t, β_t) be a curve from [0, t_0] into A × R^d that passes through (ξ_0, β_0) at t = 0. That is, estimating η(ξ_t) = c'β_t = t at the true parameter t = 0 is equivalent to estimating t = 0. The likelihood of estimating t using a single observation (Y, X) is ψ_t²(Y|X)φ_t²(X). Therefore, the score function for estimating t = 0 is

\[ \frac{2\dot{\psi}_0(Y|X)}{\psi_0(Y|X)} + \frac{2\dot{\varphi}_0(X)}{\varphi_0(X)}. \]
Then, the Fisher information at t = 0 can be written as

\[ i_F = E\left[\left(\frac{2\dot{\psi}_0(Y|X)}{\psi_0(Y|X)} + \frac{2\dot{\varphi}_0(X)}{\varphi_0(X)}\right)^2\right] = \|\dot{\xi}_0\|_F^2, \]

where the last step uses ξ̇_0 = (ψ̇_0, φ̇_0) ∈ lin T(A, ξ_0), and E_X denotes integration with respect to the distribution of X. Accordingly, the Fisher information inner product ⟨·,·⟩_F and the corresponding norm ‖·‖_F are defined for any ξ̇_1, ξ̇_2 ∈ lin T(A, ξ_0), which is a closed subset of L²(Ω; P). Hence, I have constructed the Hilbert space (lin T(A, ξ_0), ⟨·,·⟩_F). Now, I am ready to derive the efficiency bound. The information inequality holds for all regular estimators, i.e., the asymptotic variance of any regular estimator is at least 1/i_F = ‖ξ̇_0‖_F^{-2}. The semiparametric bound can be interpreted as the supremum of this lower bound over the parametric submodels. By [11], the lower bound is

\[ \sup_{\dot{\xi}_0 \in \mathrm{lin}\, T(A, \xi_0)} \frac{|\nabla\eta(\dot{\xi}_0)|^2}{\|\dot{\xi}_0\|_F^2} = \|\nabla\eta\|^2 = \|\xi^*\|_F^2. \tag{10} \]

The first equality is the norm of the continuous linear functional ∇η, the path-wise derivative of η (p. 105 in [29]). The second equality follows from the Riesz–Fréchet theorem: there exists a unique ξ* ∈ lin T(A, ξ_0) representing the continuous linear functional ∇η on the Hilbert space (lin T(A, ξ_0), ⟨·,·⟩_F). Therefore, to find the lower bound by (10), I need to find ξ*, known as the representer of the continuous linear functional ∇η.

Figure 1. This numerical example is constructed with X ∼ Uniform[1, 2], e|X = x ∼ Uniform[0, x], and Y = cos(2X) + e. Therefore, f_Y(y|X) = 1/X, F_Y(y|X) = (y − cos(2X))/X, and Q_τ(Y|X) = τX + cos(2X). Set τ = 0.5 for the median. The red solid line is the QR parameter β_KB defined in (3) and estimated by the Koenker–Bassett (KB) estimator. The blue dashed line is the approximation by the SMD estimator β̂_SMD minimizing E[(F_Y(X'β|X) − τ)²]. The approximations are X'β_KB = −0.324 + 0.161X and X'β_SMD = −0.204 + 0.078X. The left panel shows the linear approximations X'β_KB and X'β_SMD and the true CQF. The green circles are 300 random draws from the DGP. The right panel shows the corresponding CDFs F_Y(X'β_KB|X) and F_Y(X'β_SMD|X). For smaller x, where the conditional density is larger, the quantile specification error of SMD is smaller than that of KB in the left panel. For the distribution approximation error in the right panel, SMD weights more evenly over the support of X, while KB has a smaller distribution approximation error at larger x, where the density is smaller.

Table 1. Summary properties of OLS and quantile regression (QR).