A Robust Covariate-Dependent Kink Threshold Regression Model for Panel Data

Ma, Ding; Hong, Hengzhao; Li, Yi; Wan, Chuang; Wang, Yutong

doi:10.3390/axioms15050319


by Ding Ma ¹, Hengzhao Hong ², Yi Li ³, Chuang Wan ⁴ and Yutong Wang ⁴,*

¹ School of International Business and Trade, Fujian Business University, Fuzhou 350012, China

² School of Economic, Xiamen University, Xiamen 361005, China

³ College of Tourism, Hunan Normal University, Changsha 410081, China

⁴ School of Economic, Jinan University, Guangzhou 510632, China

* Author to whom correspondence should be addressed.

Axioms 2026, 15(5), 319; https://doi.org/10.3390/axioms15050319

Submission received: 10 March 2026 / Revised: 11 April 2026 / Accepted: 26 April 2026 / Published: 28 April 2026

(This article belongs to the Special Issue Probability, Statistics and Estimations, 2nd Edition)


Abstract

This paper introduces a rank-based panel kink threshold regression model with a covariate-dependent threshold, where the threshold is specified as a function of informative covariates. To estimate the model parameters, we propose a profile estimation procedure for both the threshold parameters and regression coefficients. Additionally, we develop a Wald test statistic to examine the constancy of the threshold and a sup-score test to detect the presence of the kink effect. Through simulation studies and an empirical analysis, we demonstrate that the proposed methods exhibit robustness against outliers and heavy-tailed errors in both parameter estimation and hypothesis testing.

Keywords: panel kink model; robust estimation; rank-based regression; rank score test

MSC: 62J02; 62F03; 62F35; 62F10

1. Introduction

Threshold regression models provide a powerful analytical framework for identifying heterogeneous effects, as they partition the dataset into two or more subgroups characterized by distinct regression functions based on a continuous threshold variable. In particular, the majority of existing literature focuses on jump threshold regression [1,2], where the effects of covariates are expressed as piecewise functions of the threshold variable, with discrete jumps occurring at unknown change points. However, assuming an abrupt, discontinuous shift in the regression relationship at the threshold may be unrealistic in certain practical contexts, given that covariates often induce a more gradual transition. To address this limitation, the kink threshold regression model was developed as an extension of the regression discontinuity design (e.g., [3]). This model ensures the continuity of the regression function across all points while allowing for a “kink”, a discrete change in slope at the threshold. Since its introduction, kink threshold regression has attracted significant attention in econometrics, biostatistics, and related fields and has been adapted to diverse data types, including cross-sectional data [4,5], time series data [3,6], longitudinal data [7,8], and panel data [9,10]. Motivated by its wide applicability, in this paper, we focus on panel data, which is characterized by a large cross-section of individuals or entities observed repeatedly over a relatively small, fixed number of time periods, due to its prevalence in many research domains.

A substantial body of literature has explored kink regression under the assumption of a constant threshold parameter. For instance, Ref. [3] examined the kink effect of debt on economic growth, using long-span U.S. time series data covering the period 1791–2009. The study found that elevated debt ratios would lead to a moderate slowdown in average GDP growth rates, with the estimated constant kink threshold falling in the range of 43–44%. However, the assumption of a constant, unknown threshold is often insufficient to capture the varying kink effect. For example, when studying the nonlinear effect of public debt on economic growth, inflation exerts two opposing influences: it can erode the real value of public debt and mitigate its negative impact on growth, yet it can also act as a form of sovereign default, raising risk premiums and amplifying debt’s drag on growth. This duality implies that the kink threshold for debt is not fixed but instead depends on inflation and other covariates. To accommodate such heterogeneity, Ref. [11] extended the constant kink threshold regression to a covariate-dependent kink threshold model, where the threshold is modeled as a function of informative covariates. This new model is capable of capturing such covariate-driven shifts in the debt-growth relationship. Building upon this framework, Ref. [12] further extended the model to the panel data setting, establishing the asymptotic properties of the estimators and developing F-type test statistics to examine both the constancy of the threshold and the existence of the kink effect.

All the aforementioned methods are rooted in the least squares estimation framework, which performs reasonably well under the assumption of normality. Nevertheless, in many real-world applications, outliers and heavy-tailed errors are prevalent, and neglecting such departures from normality can severely bias parameter estimates and undermine the reliability of threshold inference. Consequently, a robust estimation procedure is desirable to ensure valid and efficient statistical inference. In the context of constant kink threshold regression, Ref. [13] proposed a rank-based estimator that exhibits robustness to outliers and heavy-tailed errors while retaining high efficiency. Despite its advantages, their estimation and inference methodology cannot be directly extended to the panel covariate-dependent kink threshold regression model, thereby leaving a critical gap in the literature.

In this paper, we implement a robust statistical inference procedure for the panel kink regression model with a covariate-dependent threshold, and our contributions are threefold. First, we develop a rank-based estimation procedure by substituting the residual sum of squares in [12] with the rank dispersion function [14]. To account for unobserved heterogeneity, we draw on the approach of [15] and adopt the within-group transformation to eliminate individual fixed effects. Given that the objective function is non-differentiable and non-convex with respect to the threshold parameters, we adapt the widely used profile estimation strategy to jointly estimate the threshold parameters and regression coefficients. We further demonstrate that the slope and threshold estimators are jointly asymptotically normal with a root-n convergence rate, owing to the continuity of the regression function with respect to the threshold parameter. Second, we design a formal testing procedure to examine whether the kink threshold is constant or covariate-dependent. Leveraging the asymptotic properties of the proposed estimators, we construct a standard Wald statistic to test the null hypothesis of threshold constancy. Third, we develop a testing procedure for the existence of the kink threshold effect, which is based on a weighted CUSUM-type statistic of subgradients. The asymptotic properties of the proposed test statistic are rigorously established under both the null and alternative hypotheses, and a simulation-based implementation procedure is outlined to facilitate the practical application of the test.

The remainder of this paper is organized as follows. Section 2 introduces the covariate-dependent panel kink threshold regression model, elaborates on the rank-based estimation for model parameters, and proposes test statistics for both threshold constancy and the presence of the threshold effect. Section 3 presents Monte Carlo simulation results aimed at evaluating the finite-sample performance of the proposed inference procedures. Section 4 provides an empirical application using a panel wage dataset to illustrate the practical utility of the proposed methodologies. Finally, Section 5 concludes the paper and discusses potential avenues for future research. All technical proofs are relegated to Appendix A.

2. Methodology

2.1. Covariate-Dependent Panel Kink Threshold Regression Model

Consider the following panel data regression model featuring a kink (i.e., a slope change), where the threshold point is not constant but instead varies with other explanatory variables:

$$y_{i t} = β_{0} x_{i t} + β_{1} {(x_{i t} - γ_{i t})}_{+} + β_{2}^{T} z_{i t} + μ_{i} + ε_{i t}, \tag{1}$$

for units $i = 1, 2, \dots, n$ and time periods $t = 1, 2, \dots, T$, where $y_{i t}$ is the dependent variable, $x_{i t}$ is the primary regressor of interest, and $ε_{i t}$ is the idiosyncratic error term. The vector $z_{i t}$ contains a set of $l$ control variables and incorporates the covariates $q_{i t}$ (defined in Equation (2)), which determine the threshold. The term $μ_{i}$ captures unobserved, time-invariant individual effects that may be correlated with other regressors. The expression ${(x_{i t} - γ_{i t})}_{+} = \max (x_{i t} - γ_{i t}, 0) = (x_{i t} - γ_{i t}) I (x_{i t} > γ_{i t})$ denotes the positive part of $(x_{i t} - γ_{i t})$. This formulation implies that the slope of $x_{i t}$ is $β_{0}$ when $x_{i t}$ is below the threshold $γ_{i t}$, and changes to $β_{0} + β_{1}$ when $x_{i t}$ exceeds it, producing a kink at the point $x_{i t} = γ_{i t}$. Importantly, the threshold $γ_{i t}$ is modeled as a linear function of observable covariates $q_{i t} = {(q_{1, i t}, \dots, q_{k, i t})}^{T}$, thereby accommodating heterogeneity in the threshold across individuals and/or over time:

$$γ_{i t} = γ_{0} + γ_{1}^{T} q_{i t}, \tag{2}$$

where $γ_{0}$ is an intercept and $γ_{1}$ is a $k \times 1$ vector of coefficients. Note that $q_{i t}$ cannot include $x_{i t}$ itself, as doing so would induce perfect multicollinearity.
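To make Equations (1) and (2) concrete, the following minimal Python sketch evaluates the covariate-dependent threshold and the resulting kink regressor for stacked observations. The function name `kink_term` and the toy numbers are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def kink_term(x, q, gamma0, gamma1):
    """Return (x_it - gamma_it)_+ with gamma_it = gamma0 + gamma1' q_it."""
    gamma_it = gamma0 + q @ gamma1           # covariate-dependent threshold, Eq. (2)
    return np.maximum(x - gamma_it, 0.0)     # zero below the threshold, linear above it

# Hypothetical toy values: four stacked observations and a single covariate (k = 1)
x = np.array([0.0, 1.0, 2.0, 3.0])
q = np.array([[0.0], [1.0], [0.0], [1.0]])
kink = kink_term(x, q, gamma0=1.0, gamma1=np.array([0.5]))
print(kink)
```

Observations with the same value of $x_{i t}$ can fall in different regimes here, because the threshold shifts with $q_{i t}$.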

The covariate-dependent kink threshold regression model can be viewed as a generalization of the kink regression models proposed by [3,9], as it allows the kink threshold to vary across observations. This model has also been studied by [12,16] using the least squares (LS) estimator, which relies on the assumptions of zero-mean and finite-variance errors. However, the LS estimator is well known to be highly sensitive to outliers, and its validity is compromised when the error follows a heavy-tailed distribution (e.g., the Cauchy distribution), as such distributions violate the LS assumptions and lead to unreliable estimates. This fundamental limitation motivates the need for a more robust alternative, prompting us to explore a rank-based regression framework.

2.2. Rank-Based Estimator

To enhance the robustness of the covariate-dependent kink threshold regression model (1), we adopt a rank-based estimation approach grounded in Jaeckel’s dispersion function, originally developed by [14,17].

For clarity, we first rewrite the model in a more compact form. Denote $β = {(β_{0}, β_{1}, β_{2}^{T})}^{T}$, $γ = {(γ_{0}, γ_{1}^{T})}^{T}$, and $x_{i t} (γ) = {(x_{i t}, {(x_{i t} - γ_{i t})}_{+}, z_{i t}^{T})}^{T}$. Model (1) can then be expressed as

$$y_{i t} = β^{T} x_{i t} (γ) + μ_{i} + ε_{i t} . \tag{3}$$

Since the individual effects $μ_{i}$ are not of main interest, we eliminate them via within-individual centering, yielding

$${\ddot{y}}_{i t} = β^{T} {\ddot{x}}_{i t} (γ) + {\ddot{ε}}_{i t}, \tag{4}$$

where, for example, ${\ddot{x}}_{i t} (γ) = x_{i t} (γ) - T^{- 1} \sum_{t = 1}^{T} x_{i t} (γ)$. To estimate the unknown parameter vector $θ \equiv {(β^{T}, γ^{T})}^{T}$, one can minimize the objective function

$$Q_{n} (θ) = \frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} ϕ \left(\frac{R ({\ddot{ε}}_{i t})}{n T + 1}\right) {\ddot{ε}}_{i t}, \tag{5}$$

where $R ({\ddot{ε}}_{i t})$ is the rank of ${\ddot{ε}}_{i t} = {\ddot{y}}_{i t} - β^{T} {\ddot{x}}_{i t} (γ)$ among $\{{\ddot{ε}}_{11}, \dots, {\ddot{ε}}_{n T}\}$, and $ϕ (\cdot)$ is a non-decreasing, square-integrable score function defined on $(0, 1)$, standardized such that $\int ϕ (u) d u = 0$ and $\int ϕ {(u)}^{2} d u = 1$. The choice of score function depends on the shape of the error distribution [18]. Commonly used options include the Wilcoxon score, $ϕ (t) = \sqrt{12} (t - 0.5)$, and the sign score, $ϕ (t) = \operatorname{sgn} (t - 0.5)$. The Wilcoxon score is particularly effective for symmetric, moderately heavy-tailed distributions, offering both robustness and relatively high efficiency. Therefore, we adopt the Wilcoxon score function in this study.
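The within transformation in (4) and the dispersion in (5) can be sketched in a few lines of Python. This is a minimal illustration that assumes the observations are stacked unit by unit (T consecutive rows per unit). The helper names `within` and `wilcoxon_dispersion` are hypothetical, and ties are handled by SciPy's default average ranks.

```python
import numpy as np
from scipy.stats import rankdata

def within(a, n, T):
    """Subtract each unit's time average; rows are stacked unit by unit."""
    a = np.asarray(a, dtype=float).reshape(n, T, -1)
    return (a - a.mean(axis=1, keepdims=True)).reshape(n * T, -1)

def wilcoxon_dispersion(resid, n):
    """Rank dispersion (5) with Wilcoxon scores phi(u) = sqrt(12) * (u - 0.5)."""
    resid = np.asarray(resid, dtype=float)
    N = resid.size                                    # N = n * T stacked residuals
    scores = np.sqrt(12.0) * (rankdata(resid) / (N + 1) - 0.5)
    return float(scores @ resid) / n

# Hypothetical toy residuals for n = 3 units
r = np.array([-1.0, 0.0, 1.0])
d = wilcoxon_dispersion(r, n=3)
centered = within(np.arange(6.0), n=2, T=3)
```

Because the scores sum to zero, the dispersion is unchanged when a constant is added to every residual. This is why no intercept appears in the objective.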

However, estimating the parameter vector $θ$ is challenging because the objective function $Q_{n} (θ)$ is convex in $β$ but non-convex in $γ$. Consequently, the joint minimizer of (5) cannot be obtained by standard convex optimization. To address this issue, we adopt the profile estimation strategy, a widely used approach for threshold-based models [2,3,6,8,12]. Specifically, with the Wilcoxon score, we can express the objective function (5) in the form

$$Q_{n} (β, γ) = \frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} \left(\frac{R ({\ddot{y}}_{i t} - β^{T} {\ddot{x}}_{i t} (γ))}{n T + 1} - 0.5\right) ({\ddot{y}}_{i t} - β^{T} {\ddot{x}}_{i t} (γ)) . \tag{6}$$

The minimization proceeds in two stages:

(1): For each $γ \in Γ$ , where $Γ$ is a compact set of feasible $γ$ values, we compute the profile estimate of $β$ as $\hat{β} (γ) = \arg \min_{β} Q_{n} (β, γ)$ .
(2): We then estimate $γ$ by $\hat{γ} = \arg \min_{γ \in Γ} Q_{n} (\hat{β} (γ), γ)$ . The final profiled estimator for $θ$ is thus defined as $\hat{θ} = {(\hat{β} {(\hat{γ})}^{T}, {\hat{γ}}^{T})}^{T}$ .
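The two stages above can be sketched as a nested search in Python. This is a rough illustration under several assumptions: a scalar threshold covariate ($k = 1$), a coarse toy grid, and a generic Nelder-Mead minimizer from SciPy standing in for a dedicated rank-regression routine. The function names and simulated values are hypothetical and are not taken from the paper.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize
from scipy.stats import rankdata

def within(a, n, T):
    """Within-unit centering; rows are stacked unit by unit (T rows per unit)."""
    a = np.asarray(a, dtype=float).reshape(n, T, -1)
    return (a - a.mean(axis=1, keepdims=True)).reshape(n * T, -1)

def dispersion(beta, y_dd, X_dd, n):
    """Wilcoxon-score rank dispersion of the centered residuals."""
    e = y_dd - X_dd @ beta
    scores = np.sqrt(12.0) * (rankdata(e) / (e.size + 1) - 0.5)
    return float(scores @ e) / n

def profile_fit(y, x, z, q, n, T, grid0, grid1):
    """Grid search over (gamma0, gamma1) for a scalar q; beta is profiled out."""
    y_dd = within(y, n, T).ravel()
    best_obj, best_beta, best_gamma = np.inf, None, None
    for g0, g1 in product(grid0, grid1):
        kink = np.maximum(x - (g0 + g1 * q), 0.0)             # (x - gamma_it)_+
        X_dd = within(np.column_stack([x, kink, z]), n, T)
        start = np.linalg.lstsq(X_dd, y_dd, rcond=None)[0]    # least-squares start
        res = minimize(dispersion, start, args=(y_dd, X_dd, n), method="Nelder-Mead")
        if res.fun < best_obj:                                # stage (2): best gamma so far
            best_obj, best_beta, best_gamma = res.fun, res.x, (g0, g1)
    return best_obj, best_beta, best_gamma

# Hypothetical toy panel (illustrative values only)
rng = np.random.default_rng(0)
n, T = 40, 3
x = rng.normal(size=n * T)
q = rng.uniform(size=n * T)
mu = np.repeat(rng.normal(size=n), T)            # individual effects
eps = rng.standard_t(df=3, size=n * T)           # heavy-tailed errors
y = x + 2.0 * np.maximum(x - (0.2 + 0.5 * q), 0.0) + q + mu + eps
grid0 = np.linspace(*np.quantile(x, [0.15, 0.85]), 7)
grid1 = np.linspace(-1.0, 1.0, 5)
obj, beta_hat, gamma_hat = profile_fit(y, x, q.reshape(-1, 1), q, n, T, grid0, grid1)
```

The inner problem is convex in $β$, so any reliable convex or derivative-free solver could replace Nelder-Mead. The outer loop is a plain exhaustive search, so its cost grows with the number of grid points in each dimension of $γ$.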

2.3. Computational Details

In this subsection, we provide additional details on the numerical implementation of the proposed profile estimation procedure, which are essential for replication and practical application.

Grid construction and search region. In Step (1), we need to specify $Γ = Γ_{0} \times Γ_{1}$, where $Γ_{0}$ and $Γ_{1} = Γ_{11} \times Γ_{12} \times \dots \times Γ_{1 k}$ are the parameter spaces for $γ_{0}$ and $γ_{1}$, which are assumed to be compact. In applications, following [12], we define $Γ_{0}$ as $[x_{(0.15 N)}, x_{(0.85 N)}]$ with $x_{(η)}$ being the $η$th order statistic of $x_{i t}$, and set $Γ_{1 j}$ as $[- r_{\max}, r_{\max}]$, in which $r_{\max} = \max \{| x_{(0.15 N)} |, | x_{(0.85 N)} |\}$, for $j = 1, \dots, k$. For each component of $γ$, we construct an evenly spaced grid over its admissible range. The resolution of the grid is chosen to balance computational feasibility and estimation accuracy. In our implementation, we use a moderately fine grid (e.g., 100 grid points per dimension), which we find sufficient to achieve stable results. As suggested by [15], selecting a threshold parameter $γ$ that assigns an excessively small number of observations to either regime can be suboptimal. To mitigate this issue, we recommend constraining the grid search to satisfy a minimum regime size requirement, ensuring that each regime contains at least a specified percentage (e.g., 10% or 15%) of the total observations. Additionally, a robustness analysis using different percentage thresholds is advised to assess the stability and consistency of the estimation results.
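A minimal Python sketch of the grid construction and the minimum regime size check follows. It approximates the order statistics by empirical quantiles of the stacked $x_{i t}$, which is an assumption of this sketch, and the helper names `search_grids` and `regime_ok` are hypothetical.

```python
import numpy as np

def search_grids(x, num=100, lower=0.15, upper=0.85):
    """Evenly spaced grids for gamma0 and for each component of gamma1."""
    lo, hi = np.quantile(x, [lower, upper])       # approximates x_(0.15N), x_(0.85N)
    r_max = max(abs(lo), abs(hi))
    return np.linspace(lo, hi, num), np.linspace(-r_max, r_max, num)

def regime_ok(x, q, g0, g1, min_share=0.10):
    """True if both regimes contain at least min_share of the observations."""
    share_above = np.mean(x > g0 + q @ g1)
    return min_share <= share_above <= 1.0 - min_share

# Hypothetical toy data: 101 equally spaced values of x, one covariate fixed at zero
x = np.arange(101.0)
q = np.zeros((101, 1))
grid0, grid1 = search_grids(x)
keep_mid = regime_ok(x, q, 50.0, np.array([0.0]))
keep_edge = regime_ok(x, q, 99.0, np.array([0.0]))
```

Candidate values of $γ$ that fail the check would simply be skipped in the grid search, and rerunning with a different `min_share` gives the robustness analysis mentioned above.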

Profile estimation procedure. Given a candidate value of $γ$, the slope parameter $β$ is obtained by minimizing the rank-based objective function $Q_{n} (β, γ)$, which is convex in $β$. This estimation procedure can be readily implemented via conventional rank-based regression methods, and all computations can be performed using widely available software packages, including the R package Rfit.

Initialization and local minima. Although the objective function is generally non-convex in $γ$, the grid search approach ensures a global solution within the discretized parameter space. To further guard against potential local irregularities, we conduct sensitivity checks using alternative grid initializations and confirm that the resulting estimates are stable.

Computational complexity. The computational cost of the proposed method is primarily driven by the grid search over $γ$ and the repeated evaluation of the rank-based objective function. While the procedure is computationally efficient for low-dimensional threshold parameters, the cost increases with the dimension of $q_{i t}$. Our implementation demonstrates that the method remains feasible for moderate sample sizes and low-to-moderate dimensional covariates. When the dimension of $q_{i t}$ increases, the iterative estimation scheme proposed by [16] provides an effective solution, particularly when initialized with a suitable starting value.

Overall, these implementation details ensure that the proposed estimation procedure is both reproducible and practically feasible while maintaining robustness to outliers and heavy-tailed errors.

2.4. Asymptotic Properties

Adopting the asymptotic framework of [12], our theoretical analysis is conducted under the setting where $n \to \infty$ and $T$ is fixed. To derive the asymptotic distribution of the proposed rank-based estimator $\hat{θ}$, we introduce the following notation. Let $F (\cdot)$ and $f (\cdot)$ denote the cumulative distribution function and probability density function of the error term ${\ddot{ε}}_{i t}$, respectively. Define the scale parameter $c_{ϕ} = {\left\{\int ϕ^{'} (F (u)) f (u) d F (u)\right\}}^{- 1}$. Next, let $1_{i t}^{+} (γ) = 1 (x_{i t} > γ_{i t})$, ${\ddot{1}}_{i t}^{+} (γ) = 1_{i t}^{+} (γ) - \frac{1}{T} \sum_{t = 1}^{T} 1_{i t}^{+} (γ)$, and ${\ddot{q}}_{i t}^{+} (γ) = q_{i t} 1_{i t}^{+} (γ) - \frac{1}{T} \sum_{t = 1}^{T} q_{i t} 1_{i t}^{+} (γ)$. We define the centered error term ${\ddot{ε}}_{i t} (θ) = {\ddot{y}}_{i t} - β^{T} {\ddot{x}}_{i t} (γ)$, and the gradient vector $h_{i t} (θ) = - \frac{\partial {\ddot{ε}}_{i t} (θ)}{\partial θ} = {({\ddot{x}}_{i t} {(γ)}^{T}, - β_{1} {\ddot{1}}_{i t}^{+} (γ), - β_{1} {\ddot{q}}_{i t}^{+} (γ))}^{T}$. Furthermore, define the matrices

$$\begin{aligned} G (θ) &= - \frac{1}{c_{ϕ}} E \left[\sum_{t = 1}^{T} h_{i t} (θ) h_{i t} {(θ)}^{T}\right], \\ Σ (θ) &= \lim_{n \to \infty} Var \left\{\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} \left(\frac{R ({\ddot{ε}}_{i t} (θ))}{n T + 1} - 0.5\right) h_{i t} (θ)\right\} . \end{aligned}$$

For brevity, we denote ${\ddot{ε}}_{i t} = {\ddot{ε}}_{i t} (θ^{★})$, $h_{i t} = h_{i t} (θ^{★})$, $G = G (θ^{★})$, and $Σ = Σ (θ^{★})$, where $θ^{★}$ is the true parameter value. To establish the asymptotic distribution of $\hat{θ}$, we impose the following regularity conditions.

(A1): (i) For each t, $v_{i t} \equiv {(y_{i t}, x_{i t}, z_{i t}^{T}, q_{i t}^{T})}^{T}$ are independently and identically distributed (i.i.d.) across i; (ii) For some $r > 0$ , $E | y_{i t} |^{4 + r} < \infty$ , $E | x_{i t} |^{4 + r} < \infty$ , $E | z_{i t} |^{4 + r} < \infty$ , and $E | q_{i t} |^{4 + r} < \infty$ ; (iii) $E [ε_{i t} | (x_{i s}, z_{i s}^{T}, q_{i s}^{T}, μ_{i} : 1 \leq s \leq T)] = 0$ .
(A2): The variable $x_{i t}$ has a conditional probability density function given $q_{i t} = q$ , denoted by $f_{q, t} (x | q)$ , satisfying ${max}_{1 \leq t \leq T} f_{q, t} (x | q) \leq {\bar{f}}_{q} < \infty$ .
(A3): The random error ${\ddot{ε}}_{i t}$ has a continuous density function $f (\cdot)$ with a bounded first derivative and finite Fisher information.
(A4): The true parameter $θ^{★} = arg {min}_{(β, γ) \in B \times Γ} E [Q_{n} (θ)]$ exists and is unique, where $Θ = B \times Γ$ is a compact subset of $R^{k + l + 3}$ containing $θ^{★}$ .
(A5): $β_{1}^{★} \neq 0$ .
(A6): $G (θ)$ and $Σ (θ)$ are positive definite in a neighborhood of $θ^{★}$ .

Condition (A1) imposes finite moment conditions on the response and the explanatory variables and assumes strict exogeneity of the regressors and covariates influencing the threshold. Conditions (A2)–(A5) are standard in the threshold regression literature. Condition (A2) ensures that the threshold variable $x_{i t}$ has a bounded and continuous density given $q_{i t}$. Condition (A3) is a common assumption in rank estimation, guaranteeing smoothness and identifiability of the score function. Condition (A4) ensures that the population objective function attains a unique minimum and that the parameter space $Θ$ is compact. Condition (A5) serves as the identification condition required for consistent estimation of the threshold parameter $γ$, which is not identified when $β_{1} = 0$. Condition (A6) guarantees that the Hessian and variance matrices are invertible near $θ^{★}$, enabling derivation of the asymptotic distribution.

Theorem 1.

Under regularity conditions (A1)–(A6), as $n \to \infty$, we have

(i): $\hat{θ} \overset{p}{\to} θ^{★}$ .
(ii): $\sqrt{n} (\hat{θ} - θ^{★})$ is asymptotically normally distributed with mean zero and covariance matrix $G^{- 1} Σ G^{- 1}$ , i.e., $\sqrt{n} (\hat{θ} - θ^{★}) \overset{D}{\to} N (0, G^{- 1} Σ G^{- 1})$ .

It is important to emphasize that, in our model framework, the regression coefficients and threshold estimators ${({\hat{β}}^{T}, {\hat{γ}}^{T})}^{T}$ are jointly asymptotically normal with a convergence rate of $\sqrt{n}$. This property sets our model apart from conventional threshold regression models featuring a discontinuous jump, such as those considered in [19,20,21,22], where the regression coefficient estimator $\hat{β}$ remains $\sqrt{n}$-consistent, but the threshold estimator $\hat{γ}$ is n-consistent and follows a non-standard asymptotic distribution. In contrast, the $\sqrt{n}$-convergence rate of $\hat{γ}$ in our case arises from the continuity of $Q_{n} (θ)$ in $γ$. This result is consistent with the behavior observed in conventional kink threshold regression models, as studied, for example, in [3,5,8,10].

2.5. Testing for the Threshold Constancy

The result in Theorem 1 enables valid statistical inference by providing consistent estimators of the asymptotic covariance matrix. For instance, we can test whether the kink threshold effect is constant across covariates. To this end, we consider the following hypotheses:

$$H_{0}^{c} : γ_{1} = 0 \quad \text{vs.} \quad H_{1}^{c} : γ_{1} \neq 0 . \tag{7}$$

A natural test for distinguishing between a constant threshold and the covariate-dependent threshold model (1) is based on the Wald statistic:

$$W_{n} = n {\hat{θ}}^{T} R^{T} {(R {\hat{G}}^{- 1} \hat{Σ} {\hat{G}}^{- 1} R^{T})}^{- 1} R \hat{θ}, \tag{8}$$

where $R = (0_{k \times (l + 3)}, I_{k})$ is the selection matrix that extracts $γ_{1}$ from $θ$, and $\hat{G} = \hat{G} (\hat{θ})$ and $\hat{Σ} = \hat{Σ} (\hat{θ})$ are consistent estimators of $G$ and $Σ$, respectively. In practice, these matrices can be approximated by:

$$\hat{G} = - \frac{1}{{\hat{c}}_{ϕ}} \frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} h_{i t} (\hat{θ}) h_{i t} {(\hat{θ})}^{T}, \quad \text{and} \quad \hat{Σ} = \frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} 12 {\left(\frac{R ({\ddot{ε}}_{i t} (\hat{θ}))}{n T + 1} - 0.5\right)}^{2} h_{i t} (\hat{θ}) h_{i t} {(\hat{θ})}^{T} . \tag{9}$$

Here, $c_{ϕ}$ is a scale parameter that depends on the density function $f$ and the score function $ϕ$. A consistent estimator of $c_{ϕ}$ is required for valid inference, and we adopt the estimator proposed by [23]. The asymptotic distribution of $W_{n}$ under the null hypothesis is given below.

Theorem 2.

Suppose that the conditions in Theorem 1 hold. Then, under $H_{0}^{c}$, $W_{n} \overset{D}{\to} χ_{k}^{2}$.
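As a hedged illustration of how the Wald test might be computed once $\hat{θ}$, $\hat{G}$ and $\hat{Σ}$ are available, the Python sketch below forms the quadratic form in $\hat{γ}_{1}$ and compares it with the $χ_{k}^{2}$ distribution. It assumes that $γ_{1}$ occupies the last $k$ entries of $θ$ and takes $R$ to be the $k \times (k + l + 3)$ matrix that selects those entries. The function name and the toy inputs are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def wald_constancy(theta_hat, G_hat, Sigma_hat, n, k):
    """Wald statistic and p-value for H0: gamma_1 = 0 (last k entries of theta)."""
    p = theta_hat.size
    R = np.hstack([np.zeros((k, p - k)), np.eye(k)])   # selects gamma_1 from theta
    G_inv = np.linalg.inv(G_hat)
    V = G_inv @ Sigma_hat @ G_inv                      # sandwich covariance estimate
    r = R @ theta_hat                                  # estimated gamma_1
    W = float(n * r @ np.linalg.solve(R @ V @ R.T, r))
    return W, float(chi2.sf(W, df=k))                  # upper-tail chi-square p-value

# Hypothetical toy inputs: four parameters in total, with k = 1
theta_hat = np.array([1.0, 2.0, 3.0, 0.2])
W, p_value = wald_constancy(theta_hat, -np.eye(4), np.eye(4), n=100, k=1)
```

A small p-value is evidence against a constant threshold. The sign of $\hat{G}$ is irrelevant because it enters the sandwich form twice.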

2.6. Testing for the Kink Threshold Effect

Note that the proposed estimation procedure relies on the presence of a threshold effect (i.e., $β_{1} \neq 0$). Thus, another important problem is whether such a threshold effect exists in the regression model (1). To address this, we consider the following null and alternative hypotheses:

$$H_{0}^{l} : β_{1} = 0 \text{ for any } γ \in Γ \quad \text{vs.} \quad H_{1}^{l} : β_{1} \neq 0 \text{ for some } γ \in Γ . \tag{10}$$

Under $H_{0}^{l}$, the model reduces to a standard linear specification, and the threshold parameter $γ$ becomes unidentifiable. Existing tests for this linearity hypothesis typically rely on either a Wald-type statistic [12] or a likelihood ratio-type statistic [24]. However, both approaches require fitting the full alternative kink threshold model under $H_{1}^{l}$, which can be computationally intensive, especially when the dimension of $γ$ is large. To avoid this cost, we propose a test based on the Lagrange multiplier principle, utilizing the score process to construct a more efficient testing procedure.

The test is constructed by sequentially evaluating the subgradients of the objective function under $H_{0}^{l}$ over a subsample, in a manner analogous to the CUSUM statistic. We define the score-based test statistic as

$$R_{n} (γ) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} \left(\frac{R ({\ddot{y}}_{i t} - {\hat{ξ}}^{T} {\ddot{w}}_{i t})}{n T + 1} - 0.5\right) ({\ddot{x}}_{i t} (γ) - {\hat{c}}_{ϕ} {\hat{S}}_{2 n} {(γ)}^{T} {\hat{S}}_{1 n}^{- 1} {\ddot{w}}_{i t}),$$

where, with a slight abuse of notation, ${\ddot{x}}_{i t} (γ) = {(x_{i t} - γ_{i t})}_{+} - T^{- 1} \sum_{t = 1}^{T} {(x_{i t} - γ_{i t})}_{+}$ now denotes the centered kink term only, ${\ddot{w}}_{i t} = w_{i t} - T^{- 1} \sum_{t = 1}^{T} w_{i t}$ with $w_{i t} = {(x_{i t}, z_{i t}^{T})}^{T}$,

$${\hat{S}}_{2 n} (γ) = n^{- 1} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} \hat{f} ({\hat{\ddot{ε}}}_{i t}^{l}) {\ddot{w}}_{i t} {\ddot{x}}_{i t} (γ) \quad \text{and} \quad {\hat{S}}_{1 n} = n^{- 1} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} \hat{f} ({\hat{\ddot{ε}}}_{i t}^{l}) {\ddot{w}}_{i t} {\ddot{w}}_{i t}^{T} .$$

Here, $\hat{ξ} \equiv {({\hat{β}}_{0}, {\hat{β}}_{2}^{T})}^{T}$ denotes the estimator of $ξ \equiv {(β_{0}, β_{2}^{T})}^{T}$ under the null hypothesis $H_{0}^{l}$, obtained via

$$\hat{ξ} = \arg \min_{ξ} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} \left(\frac{R ({\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t})}{n T + 1} - 0.5\right) ({\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t}), \tag{11}$$

where $R ({\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t})$ is the rank of the residual ${\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t}$ among all residuals $\{{\ddot{y}}_{11} - ξ^{T} {\ddot{w}}_{11}, \dots, {\ddot{y}}_{n T} - ξ^{T} {\ddot{w}}_{n T}\}$. Correspondingly, ${\hat{\ddot{ε}}}_{i t}^{l} = {\ddot{y}}_{i t} - {\hat{ξ}}^{T} {\ddot{w}}_{i t}$ are the estimated residuals under $H_{0}^{l}$, and $\hat{f} (\cdot)$ is a kernel-based estimator of the error density $f (\cdot)$. Importantly, our test statistic $R_{n} (γ)$ only requires estimation of the null model, as specified in (11), making it substantially more computationally efficient than alternative approaches that require fitting the full alternative model under $H_{1}^{l}$.

Since $γ$ is not identified under $H_{0}^{l}$, we follow the union-intersection principle [25] and take the supremum of $| R_{n} (γ) |$ over the compact set $Γ$. Therefore, we propose the following test statistic

$$L_{n} = \sup_{γ \in Γ} | R_{n} (γ) | .$$

Intuitively, under $H_{0}^{l}$, $\hat{ξ}$ is a consistent estimate of the true parameter value, and the estimated residuals ${\hat{\ddot{ε}}}_{i t}^{l}$ fluctuate randomly around zero. As a result, $| R_{n} (γ) |$ tends to be small across all $γ \in Γ$. In contrast, under $H_{1}^{l}$, $\hat{ξ}$ deviates substantially from the true value, and the residuals ${\hat{\ddot{ε}}}_{i t}^{l}$ contain systematic bias, leading to large values of $| R_{n} (γ) |$ for some $γ$. Hence, a large value of $L_{n}$ provides strong evidence against $H_{0}^{l}$.

2.6.1. Limiting Distribution of the Test Statistic

To evaluate the power of $L_{n}$, we consider the following local alternative model:

$$y_{i t} = β_{0}^{★} x_{i t} + n^{- 1 / 2} β_{1}^{★} {(x_{i t} - γ_{i t})}_{+} + β_{2}^{★ T} z_{i t} + μ_{i} + ε_{i t}, \tag{12}$$

where $β_{1}^{★} \neq 0$. To characterize the limiting distribution of $L_{n}$, we define

$$\begin{aligned} S_{1 n} &= \frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} f ({\ddot{ε}}_{i t}) {\ddot{w}}_{i t} {\ddot{w}}_{i t}^{T}, & S_{1} &= \sum_{t = 1}^{T} E [\sqrt{12} f ({\ddot{ε}}_{i t}) {\ddot{w}}_{i t} {\ddot{w}}_{i t}^{T}], \\ S_{2 n} (γ) &= \frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} f ({\ddot{ε}}_{i t}) {\ddot{w}}_{i t} {\ddot{x}}_{i t} (γ), & S_{2} (γ) &= \sum_{t = 1}^{T} E [\sqrt{12} f ({\ddot{ε}}_{i t}) {\ddot{w}}_{i t} {\ddot{x}}_{i t} (γ)], \\ S_{n} (γ \land γ^{★}) &= \frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} f ({\ddot{ε}}_{i t}) {\ddot{x}}_{i t} (γ) {\ddot{x}}_{i t} (γ^{★}), & S (γ \land γ^{★}) &= \sum_{t = 1}^{T} E [\sqrt{12} f ({\ddot{ε}}_{i t}) {\ddot{x}}_{i t} (γ) {\ddot{x}}_{i t} (γ^{★})], \end{aligned}$$

and $κ (γ) = [S (γ \land γ^{★}) - S_{2} {(γ)}^{T} S_{1}^{- 1} S_{2} (γ^{★})] β_{1}^{★}$. The following theorem describes the large-sample behavior of $L_{n}$ under the local alternative model (12).

Theorem 3.

Suppose that the regularity conditions (A1)–(A6) hold. Under the local alternative model (12), $R_{n} (γ)$ admits the asymptotic representation

$$R_{n} (γ) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5] ({\ddot{x}}_{i t} (γ) - c_{ϕ} S_{2} {(γ)}^{T} S_{1}^{- 1} {\ddot{w}}_{i t}) + κ (γ) + o_{p} (1) . \tag{13}$$

Furthermore, as $n \to \infty$, the test statistic $L_{n}$ converges weakly to the process $\sup_{γ \in Γ} | R (γ) + κ (γ) |$, where $R (γ)$ is a zero-mean Gaussian process with covariance function

$$\sum_{t = 1}^{T} E [({\ddot{x}}_{i t} (γ) - c_{ϕ} S_{2} {(γ)}^{T} S_{1}^{- 1} {\ddot{w}}_{i t}) ({\ddot{x}}_{i t} (\tilde{γ}) - c_{ϕ} S_{2} {(\tilde{γ})}^{T} S_{1}^{- 1} {\ddot{w}}_{i t})]$$

for any $(γ, \tilde{γ}) \in Γ \times Γ$.

Under the local alternative model (12), $κ (γ)$ is generally nonzero for some $γ$, whereas it is identically zero for all $γ$ under $H_{0}^{l}$. As implied by Theorem 3, when $H_{0}^{l}$ holds (i.e., $β_{1}^{★} = 0$), $R_{n} (γ)$ converges to the mean-zero Gaussian process $R (γ)$. In contrast, under $H_{1}^{l}$ (i.e., $β_{1}^{★} \neq 0$), $R_{n} (γ)$ includes an additional drift term $κ (γ)$, shifting its distribution away from zero. This distinction corroborates our earlier intuition regarding the behavior of $R_{n} (γ)$. Consequently, the proposed test statistic $L_{n}$ can distinguish $H_{1}^{l}$ with a covariate-dependent kink threshold effect from $H_{0}^{l}$ with no threshold. Moreover, the power of $L_{n}$ approaches one when the magnitude of the threshold effect under $H_{1}^{l}$ is of order greater than (though it can be arbitrarily close to) $n^{- 1 / 2}$, as stated in the following corollary.

Corollary 1.

Suppose that the conditions in Theorem 3 hold. Under the local alternative model $y_{i t} = β_{0}^{★} x_{i t} + n^{- 1 / 2} a_{n} β_{1}^{★} {(x_{i t} - γ_{i t})}_{+} + β_{2}^{★ T} z_{i t} + μ_{i} + ε_{i t}$, for any increasing positive sequence $a_{n} \to \infty$, $\lim_{n \to \infty} Pr (| L_{n} | \geq b) = 1$ holds for any $b > 0$.

2.6.2. A Bootstrap Approach to Compute the p-Value

The limiting null distribution of $L_{n}$ is nonstandard because it depends on the nuisance parameter $γ$. To conduct valid inference, we propose a simulation-based method for computing critical values, which leverages the asymptotic representation of $R_{n} (γ)$ in (13). Additionally, the covariance function of the limiting process involves both the CDF $F (\cdot)$ and the density function $f (\cdot)$ of the errors, which must be estimated and which complicates the analysis. Following the approach in [20] for quantile regression, we employ a kernel method to estimate $f ({\hat{\ddot{ε}}}_{i t}^{l})$ by $\hat{f} ({\hat{\ddot{ε}}}_{i t}^{l}) = {(n T)}^{- 1} \sum_{i^{'} = 1}^{n} \sum_{t^{'} = 1}^{T} K_{h} ({\hat{\ddot{ε}}}_{i t}^{l} - {\hat{\ddot{ε}}}_{i^{'} t^{'}}^{l})$, where $K_{h} (\cdot) = K (\cdot / h) / h$, $K (\cdot)$ is a symmetric kernel function, and $h > 0$ is the bandwidth. We impose an additional regularity condition to justify this estimation procedure.

(A7): The symmetric kernel function $K (\cdot)$ satisfies $\int K (u) d u = 1$ and has a bounded first derivative. The bandwidth h satisfies $h \to 0$ and $n h \to \infty$ as $n \to \infty$ .

In practical applications, the statistic $R_{n}^{*} (γ)$ defined in Algorithm 1 depends on the bandwidth $h$ through the kernel-based estimators ${\hat{S}}_{2 n} (γ)$ and ${\hat{S}}_{1 n}$. For bandwidth selection, we use Silverman’s rule of thumb [26], $h = 1.06 \hat{σ} n^{- 1 / 5}$, where $\hat{σ}$ is the standard deviation of the residuals ${\hat{\ddot{ε}}}_{i t}^{l}$ under $H_{0}^{l}$. We summarize the bootstrap-based testing procedure in Algorithm 1.

Algorithm 1 The bootstrap-based test of $L_{n}$
1:	Generate iid random variables $\{u_{1}, \dots, u_{n}\}$ with $u_{i} = v_{i} w_{i}$ , where $v_{i}$ is drawn from $N (0, 1)$ , and $w_{i}$ (independent of all $v_{i}$ ’s) satisfies $Pr (w_{i} = 1) = Pr (w_{i} = - 1) = 0.5$ .
2:	Calculate the test statistic $R_{n}^{*} (γ) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} u_{i} \sum_{t = 1}^{T} \sqrt{12} [\hat{F} ({\hat{\ddot{ε}}}_{i t}) - 0.5] ({\ddot{x}}_{i t} (γ) - {\hat{c}}_{ϕ} {\hat{S}}_{2 n} {(γ)}^{T} {\hat{S}}_{1 n}^{- 1} {\ddot{w}}_{i t}),$ where $\hat{F} (\cdot)$ is the empirical distribution function of ${\hat{\ddot{ε}}}_{i t}^{l}$ under the null hypothesis, ${\hat{S}}_{2 n} (γ) = n^{- 1} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} \hat{f} ({\hat{\ddot{ε}}}_{i t}^{l}) {\ddot{w}}_{i t} {\ddot{x}}_{i t} (γ)$ , and ${\hat{S}}_{1 n} = n^{- 1} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} \hat{f} ({\hat{\ddot{ε}}}_{i t}^{l}) {\ddot{w}}_{i t} {\ddot{w}}_{i t}^{T}$ .
3:	Repeat Steps 1–2 NB times to obtain $L_{n}^{* (1)}, \dots, L_{n}^{* (NB)}$ . The p-value is calculated by ${\hat{p}}_{n} = {NB}^{- 1} \sum_{j = 1}^{NB} 1 \{L_{n}^{* (j)} \geq L_{n}\}$ .
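The resampling loop in Algorithm 1 can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation. It assumes that the per-individual summands of the score process have already been computed on a finite grid of candidate thresholds and stored in a hypothetical nested list `contrib`, and that the test statistic is the maximum absolute value of the process over that grid (an assumed form of $L_{n}$; the exact definition should be taken from Section 2.6). The function names are illustrative.

```python
import math
import random


def sup_stat(contrib):
    """Sup-type statistic over a threshold grid.

    contrib[i][g] is individual i's summand at grid point g, so the process
    at grid point g is n^{-1/2} * sum_i contrib[i][g]. Taking the maximum
    absolute value over g is an assumed form of L_n.
    """
    n = len(contrib)
    n_grid = len(contrib[0])
    return max(
        abs(sum(row[g] for row in contrib)) / math.sqrt(n) for g in range(n_grid)
    )


def multiplier_bootstrap_pvalue(contrib, n_boot=500, seed=0):
    """Bootstrap p-value: share of perturbed statistics >= the observed one."""
    rng = random.Random(seed)
    n = len(contrib)
    observed = sup_stat(contrib)
    exceed = 0
    for _ in range(n_boot):
        # Step 1: u_i = v_i * w_i with v_i ~ N(0, 1) and w_i = +/-1, each w.p. 0.5.
        u = [rng.gauss(0.0, 1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]
        # Step 2: perturb every individual's summands by its own multiplier.
        perturbed = [[u[i] * c for c in contrib[i]] for i in range(n)]
        # Step 3: count bootstrap draws at least as large as the observed value.
        if sup_stat(perturbed) >= observed:
            exceed += 1
    return exceed / n_boot
```

Because the multipliers are drawn once per individual rather than once per observation, the within-individual dependence across the T periods is preserved in each bootstrap draw.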

The following result establishes the validity of the bootstrap resampling scheme, whose proof is given in Appendix A.

Theorem 4.

Suppose that regularity conditions (A1)–(A7) hold. Then, under both the null and the local alternative hypotheses,

R_{n}^{*} (γ)

defined in Algorithm 1 converges weakly to the Gaussian process

R_{n} (γ)

as

n \to \infty

.

3. Simulation Studies

In this section, we conduct simulation studies to examine the finite sample performances of the proposed estimation method and testing procedures. In particular, we evaluate the accuracy of parameter estimation, type I error, and power of the test for the presence of a kink threshold effect and the performance of the test for threshold constancy. Similar to [12], we generate data using the following data-generating processes (DGPs):

\begin{matrix} DGP 1 : & y_{i t} = x_{i t} - {(x_{i t} - γ_{i t})}_{+} + 2 z_{i t} + μ_{i} + ε_{i t}, \\ DGP 2 : & y_{i t} = x_{i t} - {(x_{i t} - γ)}_{+} + 2 z_{i t} + μ_{i} + ε_{i t}, \\ DGP 3 : & y_{i t} = x_{i t} + 2 z_{i t} + μ_{i} + ε_{i t}, \end{matrix}

where

x_{i t} = 0.25 μ_{i} + u_{q, i t} + u_{x, i t}

,

z_{i t} = 0.5 μ_{i} + u_{q, i t} + u_{z, i t}

,

γ_{i t} = γ_{0} + γ_{1} q_{i t}

,

q_{i t} = z_{i t}

,

u_{q, i t} \overset{i . i . d}{\sim} N (0.5, 1)

,

u_{x, i t}

and

u_{z, i t}

both follow

i . i . d . N (0, 1)

, and

μ_{i} \overset{i . i . d}{\sim} N (0, 1)

. The innovation terms

ε_{i t}, u_{x, i t}, u_{z, i t}

and

u_{q, i t}

are mutually independent. DGP 1 corresponds to the covariate-dependent kink threshold regression model with true parameters

(β_{0}, β_{1}, β_{2}) = (1, - 1, 2)

and

(γ_{0}, γ_{1}) = (0, 0.5)

. DGP 2 specifies a constant kink threshold regression model, while DGP 3 represents a standard linear regression model. DGPs 2 and 3 serve as benchmark specifications for evaluating the proposed tests. We consider four error distributions for

ε_{i t}

, all standardized to have mean zero and unit variance: (i)

N (0, 1)

; (ii) Student’s t-distribution with three degrees of freedom

t (3)

; (iii) Tukey contaminated normal

T (0.1, 10)

(

0.9 N (0, 1) + 0.1 N (0, 10)

); and (iv) Lognormal distribution

L N (0, 1)

. We use sample sizes n = 50, 100 and time spans T = 10, 20. For each case, we conduct

NS = 500

replications.
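To make the simulation design concrete, the sketch below generates one panel from DGP 1 using only the Python standard library. It is an illustration under stated assumptions rather than the authors' code: the function names and the row layout are hypothetical, and the contaminated normal case is omitted for brevity. The t(3) and lognormal draws are standardized using their known moments (variance 3 for t(3); mean $e^{1/2}$ and variance $(e - 1) e$ for $L N (0, 1)$).

```python
import math
import random


def draw_error(dist, rng):
    """One error draw, standardized to mean zero and unit variance."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "t3":
        # t(3) = Z / sqrt(chi2_3 / 3); its variance is 3, so divide by sqrt(3).
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
        return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3.0) / math.sqrt(3.0)
    if dist == "lognormal":
        # LN(0, 1) has mean exp(0.5) and variance (e - 1) * e.
        mean = math.exp(0.5)
        sd = math.sqrt((math.e - 1.0) * math.e)
        return (rng.lognormvariate(0.0, 1.0) - mean) / sd
    raise ValueError(f"unknown error distribution: {dist}")


def simulate_dgp1(n, T, gamma0=0.0, gamma1=0.5, dist="normal", seed=0):
    """Generate n*T rows from DGP 1 with threshold gamma_it = gamma0 + gamma1 * q_it."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        mu = rng.gauss(0.0, 1.0)  # individual fixed effect
        for t in range(T):
            u_q = rng.gauss(0.5, 1.0)  # common component shared by x and z
            x = 0.25 * mu + u_q + rng.gauss(0.0, 1.0)
            z = 0.5 * mu + u_q + rng.gauss(0.0, 1.0)
            q = z  # threshold covariate, q_it = z_it
            kink = max(x - (gamma0 + gamma1 * q), 0.0)  # (x - gamma_it)_+
            y = x - kink + 2.0 * z + mu + draw_error(dist, rng)
            rows.append({"i": i, "t": t, "y": y, "x": x, "z": z, "q": q})
    return rows
```

Setting `gamma1=0.0` gives DGP 2 with a constant threshold equal to `gamma0`, and dropping the `kink` term gives DGP 3.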

3.1. Estimation Accuracy

To evaluate the performance of the proposed estimation method, we compute several metrics for each parameter estimator: bias, standard deviation (SD), average estimated standard error (ESE), mean squared error (MSE), and empirical coverage probability (ECP). These results are compared with those obtained from the least squares (LS) estimator proposed by [12], abbreviated as Yang. Specifically, for the jth component of

θ

, denoted as

θ_{j}

,

Bias (θ_{j}) = {NS}^{- 1} \sum_{m = 1}^{NS} ({\hat{θ}}_{j}^{(m)} - θ_{j}^{★})

and

SD (θ_{j}) = \sqrt{{NS}^{- 1} \sum_{m = 1}^{NS} {({\hat{θ}}_{j}^{(m)} - {\bar{θ}}_{j})}^{2}}

, where

{\bar{θ}}_{j} = {NS}^{- 1} \sum_{m = 1}^{NS} {\hat{θ}}_{j}^{(m)}

is the average of

{\hat{θ}}_{j}

,

{\hat{θ}}_{j}^{(m)}

is the estimate in the mth replication, and

θ_{j}^{★}

is the true parameter value. The ESE is defined as

ESE ({\hat{θ}}_{j}) = {NS}^{- 1} \sum_{m = 1}^{NS} {\hat{σ}}_{j}^{(m)}

, where

{\hat{σ}}_{j}^{2}

is the jth diagonal element of

{\hat{G}}^{- 1} \hat{Σ} {\hat{G}}^{- 1}

, as defined in (9). The ECP for

{\hat{θ}}_{j}

is computed based on the

1 - α

Wald confidence interval,

{Wald}_{1 - α} (θ_{j}) = [{\hat{θ}}_{j} - N^{- 1 / 2} z_{1 - α / 2} {\hat{σ}}_{j}, {\hat{θ}}_{j} + N^{- 1 / 2} z_{1 - α / 2} {\hat{σ}}_{j}],

where

z_{1 - α / 2}

is the

(1 - α / 2)

-quantile of the standard normal distribution. We set

α = 0.05

. The ECP is then defined as the proportion of simulations in which the true parameter value lies within the Wald-type confidence interval.
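The summary metrics above can be computed from the replication output as in the following sketch. It assumes two hypothetical lists of equal length: `estimates`, holding the estimate of one parameter in each replication, and `std_errors`, holding the corresponding standard errors already expressed on the scale of the estimator (that is, with any $N^{- 1 / 2}$ factor absorbed). The MSE is taken to be the average squared estimation error, which is the usual definition but is not spelled out in the text.

```python
import math
from statistics import NormalDist


def summarize(estimates, std_errors, truth, alpha=0.05):
    """Bias, SD, ESE, MSE and empirical coverage for one parameter."""
    ns = len(estimates)
    mean_est = sum(estimates) / ns
    bias = mean_est - truth
    # SD uses the divisor NS, matching the definition in the text.
    sd = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / ns)
    ese = sum(std_errors) / ns
    mse = sum((e - truth) ** 2 for e in estimates) / ns
    # (1 - alpha/2)-quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    covered = sum(
        1 for e, s in zip(estimates, std_errors) if e - z * s <= truth <= e + z * s
    )
    return {"bias": bias, "sd": sd, "ese": ese, "mse": mse, "ecp": covered / ns}
```

With these definitions, the MSE equals the squared bias plus the squared SD, which gives a quick consistency check on a results table.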

Table 1 and Table 2 summarize the estimation results, reporting the bias, SD, ESE, MSE, and ECP for each estimator. Several key findings emerge. (i) When the error term follows a standard normal distribution, both estimators perform well and exhibit comparable accuracy. Both yield estimates close to the true parameter values with negligible bias, indicating that they are effectively unbiased. Furthermore, the ESEs closely match the empirical SDs. The MSEs of Yang’s estimators are slightly smaller than those of the proposed estimators, which is expected since rank-based estimators in linear regression with normal errors achieve approximately 95% relative efficiency compared to the LS estimator [18]. (ii) Under non-normal error distributions, the proposed estimators significantly outperform the LS estimators, as evidenced by smaller SDs and MSEs. Although Yang’s method shows some improvement with increasing n or T, the gains are considerably more modest than those achieved by our method. (iii) In terms of ECP, the proposed method generally yields coverage rates closer to the nominal 95% level than Yang’s method across most settings. As n or T increases, the ECPs for the regression coefficients from both methods approach the 95% target. However, under heavy-tailed error distributions, our estimator demonstrates greater stability in maintaining valid coverage. (iv) The ECPs for the threshold parameters obtained from Yang’s method are consistently lower than those for the regression coefficients, often falling below 90% and dropping below 60% under heavy-tailed errors. This indicates that threshold parameter estimation is more sensitive to the error distribution, leading to higher variability. In contrast, the proposed method yields more stable and reliable coverage for both regression and threshold parameters. 
In summary, compared with Yang’s estimators, the proposed rank-based estimators offer superior robustness to outliers and heavy-tailed errors while maintaining competitive performance under normality.
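For reference, the 95% figure quoted in finding (i) is the classical asymptotic relative efficiency (ARE) of a Wilcoxon-type rank estimator relative to least squares, evaluated at a normal error density. The short calculation below is a standard textbook result rather than something derived in this paper; $f$ denotes the error density and $σ^{2}$ its variance.

```latex
\mathrm{ARE} = 12\,\sigma^{2}\Bigl(\int f^{2}(u)\,du\Bigr)^{2},
\qquad
f = N(0,\sigma^{2}) \;\Rightarrow\; \int f^{2}(u)\,du = \frac{1}{2\sigma\sqrt{\pi}},
\qquad
\mathrm{ARE} = \frac{12\,\sigma^{2}}{4\,\sigma^{2}\,\pi} = \frac{3}{\pi} \approx 0.955 .
```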

3.2. Type I Error and Power Analysis

We next evaluate the finite sample size and power performance of the proposed test statistic

W_{n}

for threshold constancy in Section 2.5. We focus on DGP 1 and DGP 2, accompanied by the covariate-dependent kink threshold (i.e.,

H_{1}^{c}

holds) and the constant kink threshold (i.e.,

H_{0}^{c}

holds), respectively. For comparison, we also include the Wald test based on the asymptotic theory of the LS estimator proposed by [12]. Table 3 reports the empirical rejection frequencies for DGPs 1 and 2 at the 5% nominal significance level. First, when the error term follows a standard normal distribution, our test maintains a Type I error rate close to the nominal level, whereas Yang’s Wald test is slightly oversized. The power of both tests is reasonably high in this setting. Second, under heavy-tailed or contaminated normal error distributions, Yang’s test becomes anti-conservative, exhibiting both inflated Type I error rates and high power. This is primarily because Yang’s method relies on the LS framework, which lacks robustness to outliers and deviations from normality. In contrast, our test preserves the nominal Type I error rate across all error distributions and maintains satisfactory power, demonstrating its superior robustness.

We also conduct a simulation study to evaluate the finite-sample performance of the test statistic

L_{n}

for testing the presence of a kink threshold effect. For this purpose, we consider DGP 1 and DGP 3, which correspond to a kink threshold regression model with a covariate-dependent threshold and a standard linear regression model, respectively. To assess the performance of our proposed test, we compare it with two existing approaches: the sup-Wald test statistic proposed by [12] and the score-based test statistic introduced by [16]. Both of these tests are based on the LS estimation framework and are referred to as Yang and Zhou, respectively. Our test is the rank-based procedure developed in Section 2.6. In implementing our test, we use the Epanechnikov kernel

K (u) = 3 / 4 (1 - u^{2}) I (| u | \leq 1)

, and select the bandwidth

h = 1.06 \hat{σ} {(n T)}^{- 1 / 5}

.
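The kernel density step used in implementing the test can be sketched as follows. This is a minimal illustration assuming the pooled null-model residuals are available in a hypothetical list `residuals` of length nT; it evaluates the Epanechnikov kernel estimate at every residual, with the rule-of-thumb bandwidth computed from the sample standard deviation. The function names are illustrative, and the quadratic-time double loop is chosen for clarity rather than speed.

```python
from statistics import stdev


def epanechnikov(u):
    """K(u) = 3/4 * (1 - u^2) on |u| <= 1, and zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def rule_of_thumb_bandwidth(residuals):
    """h = 1.06 * sigma_hat * (number of pooled residuals)^(-1/5)."""
    return 1.06 * stdev(residuals) * len(residuals) ** (-0.2)


def kernel_density_at_residuals(residuals, h=None):
    """Kernel density estimate evaluated at each residual in the pooled sample."""
    if h is None:
        h = rule_of_thumb_bandwidth(residuals)
    n_obs = len(residuals)
    return [
        sum(epanechnikov((e - other) / h) for other in residuals) / (n_obs * h)
        for e in residuals
    ]
```

Each evaluation includes the residual's own kernel contribution, as in the double-sum formula for the estimator, so every returned value is strictly positive.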

Table 4 summarizes the empirical rejection rates of the three test statistics. The following key findings emerge. First, the size (i.e., the rejection frequency under DGP 3, the null linear model) of both our method and Zhou’s method is close to the nominal 5% level, whereas Yang’s method exhibits substantial undersizing, with rejection rates far below the nominal level. This indicates that the Wald-type test statistic proposed by Yang is highly conservative in detecting the presence of a threshold effect. Second, when the kink threshold effect is present (i.e., under DGP 1), all three testing methods perform reasonably well. When n and T are small, for example,

n = 50

and

T = 10

, the power of both our method and Yang’s method is close to 1, substantially exceeding that of Zhou’s method. As expected, as n or T increases, the power of Zhou’s method also approaches 1. In summary, the proposed test statistic

L_{n}

demonstrates satisfactory size control and competitive power across various sample configurations, confirming its effectiveness in finite-sample settings.

4. An Empirical Application

4.1. Data and Model Specification

Understanding the relationship between female wage income and working hours is a long-standing topic in labor economics [27,28,29,30], among many others. Empirical studies in this field have suggested that the relationship between female wage income and working hours may be nonlinear. For example, Ref. [31] employed a panel threshold regression model to document a positive but nonlinear relationship between weekly hours worked and hourly wage growth for women. Ref. [32] developed a dynamic model showing that the wage–hour relationship is nonlinear due to occupational sorting and labor market constraints, which result in heterogeneous returns across the hours distribution. This nonlinearity is also found to differ significantly by gender, with women generally experiencing lower returns than men in the upper range of the hours distribution.

In this paper, we apply the proposed robust covariate-dependent kink threshold regression model to capture the nonlinear relationship between female wage income and working hours. The panel wage dataset we use is sourced from [33], originally collected through the National Longitudinal Surveys (NLS) conducted by the U.S. Department of Labor. This public dataset is available in the R package PoEdata as the data file “nls_panel.dat”. Our sample consists of 716 female respondents who were surveyed over five waves. Notably, Ref. [16] analyzed the same dataset using the LS estimator and confirmed the presence of a covariate-dependent kink effect. Following their work, we specify the following econometric model:

\begin{matrix} {lwage}_{i t} & = & μ_{i} + β_{0} {exper}_{i t} + β_{1} {({exper}_{i t} - γ_{i t})}_{+} + β_{2} {hours}_{i t} + ε_{i t}, \\ γ_{i t} & = & γ_{0} + γ_{1} {hours}_{i t}, \quad i = 1, \dots, 716; \; t = 1, \dots, 5, \end{matrix}

(14)

where

{lwage}_{i t}

is the log-transformed hourly wage (the dependent variable),

{exper}_{i t}

denotes the total labor force experience, and

{hours}_{i t}

denotes the usual weekly working hours. For estimation purposes, both

{exper}_{i t}

and

{hours}_{i t}

are standardized to lie in the unit interval

[0, 1]

. The tipping point

γ_{i t}

is modeled as a linear function of the informative covariate

{hours}_{i t}

.
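The variable construction behind model (14) can be sketched as follows. This is a hedged illustration, not the authors' preprocessing: it assumes that "standardized to lie in the unit interval" means min-max scaling, and the helper names are hypothetical. The within transformation subtracts each individual's time average, which is how the fixed effect $μ_{i}$ is removed before estimation.

```python
def min_max_scale(values):
    """Rescale to [0, 1]; an assumed reading of the standardization step."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def kink_term(exper, hours, gamma0, gamma1):
    """(exper_it - gamma_it)_+ with gamma_it = gamma0 + gamma1 * hours_it."""
    return [max(e - (gamma0 + gamma1 * h), 0.0) for e, h in zip(exper, hours)]


def within_demean(values, ids):
    """Subtract each individual's time average, which removes mu_i."""
    totals, counts = {}, {}
    for v, i in zip(values, ids):
        totals[i] = totals.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [v - totals[i] / counts[i] for v, i in zip(values, ids)]
```

Note that the kink term must be built from the scaled levels first and demeaned afterwards, because the positive-part function does not commute with the within transformation.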

4.2. Estimation Results

Table 5 presents the estimation and testing results for the working model (14), obtained using both the mean regression method of [12] and the proposed rank regression approach. We begin by testing whether the kink threshold location

γ_{i t}

depends on

{hours}_{i t}

, i.e.,

H_{0}^{c} : γ_{1} = 0

. The p-values from both Yang’s test and our proposed test are below 0.1, providing evidence against the null hypothesis and supporting the presence of a covariate-dependent kink threshold effect at the 10% significance level. Next, we test for the existence of a kink effect, i.e.,

H_{0}^{l} : β_{1} = 0

. Using the sup-Wald test statistic based on mean regression [12] and our proposed test procedure in Algorithm 1 with

NB = 1000

bootstrap replications, we obtain p-values of 0.000 and 0.010, respectively. Both tests decisively reject the null hypothesis of linearity in favor of the kink threshold regression specification at the 5% significance level. Taken together, these results support the use of the panel kink threshold regression model (14) with a covariate-dependent threshold.

We now examine the parameter estimation results obtained from the LS estimator and the proposed robust rank-based estimator. Several noteworthy findings emerge. First, while work experience in years (

{exper}_{i t}

) exerts a positive effect on the log wage (

{lwage}_{i t}

), as indicated by

{\hat{β}}_{0} > 0

and

{\hat{β}}_{0} + {\hat{β}}_{1} > 0

, the relationship between

{exper}_{i t}

and

{lwage}_{i t}

is nonlinear and varies across the two regimes defined by the threshold:

{exper}_{i t} \leq γ_{i t}

and

{exper}_{i t} > γ_{i t}

. Specifically, the regression function is steeper in the first regime, with a larger marginal return

{\hat{β}}_{0}

, and becomes flatter in the second regime, with a reduced marginal return

{\hat{β}}_{0} + {\hat{β}}_{1}

. This pattern is consistent with the findings of [16]. Second, our estimation results reveal a similar qualitative pattern to that reported by [12]. However, compared with the LS estimator, the standard errors of our robust estimator are noticeably smaller, indicating improved efficiency. This gain can be attributed to the rank-based estimation framework, which reduces sensitivity to outliers and heavy-tailed disturbances. In summary, the panel kink threshold regression model with a covariate-dependent threshold provides an effective tool for capturing nonlinear wage dynamics. Moreover, the proposed rank estimator enhances robustness while maintaining efficiency, making it well-suited for empirical analysis in labor economics.

4.3. Influence of Outliers and Sensitivity Analysis

To assess whether robustness is empirically relevant, we first examine the distributional properties of the residuals from the LS estimator. Figure 1 shows that the residuals deviate substantially from normality, exhibiting pronounced heavy tails and moderate right-skewness. This evidence is reinforced by the summary statistics in Table 6. The skewness coefficient is 0.480, indicating asymmetry, while the kurtosis reaches 17.396, far exceeding the Gaussian benchmark of 3. Moreover, both the Jarque–Bera and Shapiro–Wilk tests strongly reject normality. These findings indicate that the error distribution departs markedly from the classical assumptions underlying LS estimation, suggesting that LS may be highly sensitive to extreme observations and may yield unreliable inference.
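The moment-based diagnostics reported here are straightforward to reproduce. The sketch below computes the sample skewness, the (non-excess) kurtosis, and the Jarque–Bera statistic from a hypothetical list `residuals`; it is illustrative only and uses the standard formulas rather than any code from the paper. Under normality the Jarque–Bera statistic is asymptotically chi-squared with two degrees of freedom, so values above roughly 5.99 reject at the 5% level.

```python
def moment_diagnostics(residuals):
    """Return (skewness, kurtosis, Jarque-Bera statistic) for a sample."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central sample moments with divisor n.
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb
```

Because the kurtosis enters the statistic through its squared deviation from 3, a value as large as the one in Table 6 dominates the Jarque–Bera statistic even when the skewness is moderate.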

To further investigate the impact of such observations, we conduct an influence diagnostic analysis. Figure 2 reports Cook’s distance and the residual–leverage relationship. The Cook’s distance plot shows that while most observations have negligible influence, a nontrivial subset exhibits relatively large values, indicating that influence is unevenly distributed across the sample. The residual-leverage plot further reveals that several observations combine relatively high leverage with sizable residuals, suggesting that they may exert a disproportionate effect on the estimated regression function.

To evaluate the impact of influential observations, we conduct a sensitivity analysis by re-estimating the model after trimming extreme observations. The results in Table 7 show that LS estimates vary noticeably across samples, particularly for the kink and threshold parameters, indicating substantial sensitivity to extreme data points. In contrast, the rank-based estimates remain highly stable across specifications, with only minimal changes in key parameters. This stability suggests that the proposed estimator is considerably less affected by influential observations. Taken together, these results provide strong empirical evidence that the proposed method delivers more reliable inference in the presence of heavy-tailed errors and outliers.

5. Conclusions

In this paper, we develop a rank-based estimation procedure for the panel kink threshold regression model with a covariate-dependent threshold. To address the non-differentiability of the objective function, we propose a two-stage estimation strategy for simultaneously estimating the regression coefficients and threshold parameters. We establish the joint asymptotic normality of the slope and threshold estimators, which facilitates the construction of a standard Wald-type test for threshold constancy. Additionally, we introduce a rank score-based statistic to test for the presence of a threshold effect. Extensive numerical studies demonstrate that the proposed estimators and tests perform well in finite samples, offering both robustness and reliable inference.

The asymptotic theory developed in this paper relies on a set of standard regularity conditions commonly adopted in the panel threshold literature. While these assumptions facilitate tractable theoretical analysis, it is important to clarify their scope and practical implications. First, our framework is developed under the asymptotic regime where the cross-sectional dimension

n \to \infty

while the time dimension T remains fixed. This setting is appropriate for typical micro-panel datasets, where a large number of individuals are observed over a relatively small number of periods. However, the proposed method is not directly designed for panels with large T, where time series dependence may play a more prominent role. Extending the framework to accommodate large-T asymptotics would require additional techniques and is left for future research. Second, the model assumes cross-sectional independence across individuals. This assumption simplifies the derivation of the asymptotic distribution but may be restrictive in empirical applications where common shocks or cluster-level dependence are present. In such cases, the variance estimation procedure may need to be adjusted, for example, by incorporating cluster-robust covariance estimators. Developing a fully robust inference procedure under cross-sectional dependence remains an important direction for future work. Third, we impose a strict exogeneity condition on the regressors and threshold covariates. In particular, the explanatory variables and covariates determining the threshold are assumed to be uncorrelated with the idiosyncratic error term. While this assumption is standard in the panel threshold literature, it may be violated in applications where regressors are endogenous or only weakly exogenous. Addressing endogeneity would require extending the current framework to incorporate instrumental variable techniques or control function approaches within a rank-based setting, which poses nontrivial challenges. Fourth, the theory assumes that the conditional density of the threshold variable is bounded and continuous. This condition ensures identification and stable estimation of the threshold parameters. In practice, this assumption is generally mild, but it may be violated in cases with discrete or highly concentrated covariates, in which case the performance of the estimator may be affected.

Author Contributions

Conceptualization, D.M. and Y.W.; methodology, C.W.; software, H.H.; validation, Y.L., Y.W. and H.H.; formal analysis, C.W.; investigation, D.M.; resources, C.W.; data curation, D.M.; writing—original draft preparation, D.M.; writing—review and editing, Y.W.; visualization, Y.L.; supervision, Y.W.; project administration, C.W.; funding acquisition, D.M. and C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Youth Project of the Fujian Provincial Social Science Foundation (Grant No. FJ2024C030), the Fujian Provincial Natural Science Foundation Youth Innovation Project (Grant No. 2026J008253), the Natural Science Foundation of Hunan Province (2026JJ50259), the Scientific Research Project of Education Department of Hunan Province (25A0059), the National Natural Science Foundation of China (12301344, 12571284, 72373074), and China Postdoctoral Science Foundation (2024M761153).

Data Availability Statement

All empirical analyses presented in this manuscript are conducted using the publicly available NLS-based panel wage dataset. The dataset is distributed with the R package PoEdata under the filename “nls_panel.dat”, and is freely accessible to the research community.

Conflicts of Interest

The writers affirm that they have no conflicting interests that could appear to influence the present work.

Appendix A

This appendix contains the proofs of the main theorems presented in the paper. Throughout, we denote a generic positive constant by C, which may take different values at different occurrences.

Appendix A.1. Proofs for Section 2.2

Proof of Theorem 1.

(i) We first show the consistency of

\hat{θ}

. We use empirical process notation. Denote

P_{n}

as the empirical measure, and

P

as the probability measure. That is,

P g = E g (X)

and

P_{n} g = n^{- 1} \sum_{i = 1}^{n} g (X_{i})

for any measurable function

g (\cdot)

.

Define

e_{i t} (θ) = {\ddot{y}}_{i t} - β^{T} {\ddot{x}}_{i t} (γ) = {\ddot{ε}}_{i t} + {(β^{★} - β)}^{T} {\ddot{x}}_{i t} (γ) + β^{★ T} [{\ddot{x}}_{i t} (γ^{★}) - {\ddot{x}}_{i t} (γ)]

. The minimizer of

Q_{n} (θ)

is equivalent to that of

P_{n} g (θ)

with respect to

θ

, where

g (θ) = \sum_{t = 1}^{T} a (e_{i t} (θ)) e_{i t} (θ),

and

a (e_{i t} (θ)) = \sqrt{12} (\frac{R (e_{i t} (θ))}{n T + 1} - 0.5)

. By Condition (A4),

P g (θ)

is continuous in

θ

and is uniquely minimized at

θ^{★}

. We need to show that the class of functions

{g (θ) : θ \in Θ}

is Glivenko–Cantelli, that is,

{sup}_{θ \in Θ} | P_{n} g (θ) - P g (θ) | \overset{P}{\to} 0

, as n goes to infinity. Since

Θ

is compact, it is easy to verify that both

P_{n} g (θ)

and

P g (θ)

are continuous in

θ

. By the weak law of large numbers, we have

P_{n} g (θ) \overset{P}{\to} P g (θ)

pointwise in

θ

. It remains to show that

g (θ)

is Lipschitz continuous in probability. Note that for any given

x_{i t}

,

x_{i t} (γ)

is continuous in

γ

and hence

{\ddot{x}}_{i t} (γ) = x_{i t} (γ) - T^{- 1} \sum_{s = 1}^{T} x_{i s} (γ)

is also continuous in

γ

. Furthermore,

e_{i t} (θ)

is continuous in

θ

. By the triangle inequality, we have the following bound

\begin{matrix} ∥ {\ddot{x}}_{i t} (γ) ∥ & \leq & ∥ x_{i t} (γ) ∥ + \frac{1}{T} \sum_{s = 1}^{T} ∥ x_{i s} (γ) ∥ \\ & \leq & ∥ z_{i t} ∥ + | x_{i t} | + {\bar{C}}_{Γ_{0}} + ∥ q_{i t} ∥ {\bar{C}}_{Γ_{1}} + \frac{1}{T} \sum_{s = 1}^{T} \{∥ z_{i s} ∥ + | x_{i s} | + {\bar{C}}_{Γ_{0}} + ∥ q_{i s} ∥ {\bar{C}}_{Γ_{1}}\}, \end{matrix}

where

{\bar{C}}_{Γ_{0}} = sup {| γ_{0} | : γ_{0} \in Γ_{0}}

and

{\bar{C}}_{Γ_{1}} = sup {∥ γ_{1} ∥ : γ_{1} \in Γ_{1}}

. Thus, by Condition (A1),

\begin{matrix} | P g (θ) | & \leq & \sum_{t = 1}^{T} \sqrt{12} |\frac{R (e_{i t} (θ))}{n T + 1} - 0.5| \cdot | e_{i t} (θ) | \\ & \leq & \sum_{t = 1}^{T} \sqrt{12} |\frac{R (e_{i t} (θ))}{n T + 1} - 0.5| \cdot (| {\ddot{y}}_{i t} | + \bar{β} ∥ {\ddot{x}}_{i t} (γ) ∥) < \infty, \end{matrix}

where

\bar{β} = sup {∥ β ∥ : β \in B}

. Moreover, for all

\tilde{θ}

and

θ

in

Θ

, we have

\begin{matrix} | P_{n} g (\tilde{θ}) - P g (θ) | & \leq & C \sum_{t = 1}^{T} \{∥ \tilde{β} ∥ \cdot ∥ {\ddot{x}}_{i t} (\tilde{γ}) - {\ddot{x}}_{i t} (γ) ∥ + ∥ \tilde{β} - β ∥ \cdot ∥ {\ddot{x}}_{i t} (γ) ∥\} \\ & \leq & C \sum_{t = 1}^{T} \{∥ \tilde{β} ∥ \cdot ∥ \tilde{γ} - γ ∥ + ∥ \tilde{β} - β ∥ \cdot ∥ {\ddot{x}}_{i t} (γ) ∥\} \end{matrix}

for some positive constant C. Thus, by Conditions (A1) and (A4),

∥ θ ∥

and

E ∥ {\ddot{x}}_{i t} (γ) ∥

are bounded. Then, there exists a positive constant

B_{n} = O_{p} (1)

, such that

| P_{n} g (\tilde{θ}) - P g (θ) | \leq B_{n} ∥ \tilde{θ} - θ ∥

, for all

\tilde{θ}

and

θ

in

Θ

. Hence, the empirical process

θ \to P_{n} g (θ)

is stochastically equicontinuous, which implies the uniform convergence of

P_{n} g (θ)

to

P g (θ)

.

Given the compactness of

Γ \times B

, and the uniqueness of the minimizer

θ^{★}

by assumption, by Theorem 2.1 of [34], we have

\hat{θ} \overset{P}{\to} θ^{★}

.

(ii) We next prove the asymptotic normality. Following the proof of [3], we need to verify that conditions (i)–(iv) in [35] (Section 3.2) also hold for the panel data in rank estimation. The consistency of

\hat{θ}

has been established. It remains to verify the following conditions:

Condition 2: $\frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} a ({\ddot{ε}}_{i t}) h_{i t} \overset{D}{⟶} N (0, Σ)$ ;
Condition 3: $G (θ)$ is continuous in $θ$ , and $G (θ^{★}) = G$ ;
Condition 4: $ν (θ) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} [a ({\ddot{ε}}_{i t} (θ)) h_{i t} (θ) - E \{a ({\ddot{ε}}_{i t} (θ)) h_{i t} (θ)\}]$ is stochastically equicontinuous.

By the central limit theorem, Condition 2 follows. From the expression of

G (θ) = \sum_{t = 1}^{T} E [h_{i t} (θ) h_{i t} {(θ)}^{T}]

, we find that the elements of

G (θ)

are quadratic in

β

. Hence,

G (θ)

is continuous in

β

. Moreover,

γ

enters

G (θ)

through one of the following forms:

E [x_{i t} (γ) x_{i s} (γ)]

,

E [x_{i t} (γ) z_{i s}^{T}]

,

E [1_{i t} (γ) 1_{i s} (γ)]

,

E [w_{i t} 1_{i s} (γ)]

, and

E [q_{i t}^{T} w_{i t} 1_{i s} (γ)]

. By Condition (A1), there exists a constant C satisfying

{(E ∥ w_{i t} ∥^{2 + r / 2})}^{4 / (4 + r)} \leq C < \infty

. Thus, by Lemma A.1 of [12] and Hölder’s inequality, we have

\begin{matrix} E ∥ w_{i t} 1 (γ_{1, i t} \leq x_{i t} \leq γ_{2, i t}) ∥^{2} & \leq & {(E ∥ w_{i t} ∥^{2 + r / 2})}^{4 / (4 + r)} {(E | 1 (γ_{1, i t} \leq x_{i t} \leq γ_{2, i t}) |)}^{1 / τ} \\ & \leq & C {({\bar{f}}_{q} | γ_{02} - γ_{01} | + C {\bar{f}}_{q} ∥ γ_{12} - γ_{11} ∥)}^{1 / τ}, \end{matrix}

where

τ = \frac{4 + r}{4}

. Therefore,

E [w_{i t} 1_{i s} (γ)]

is continuous in

γ

. Likewise, we can show that

E [x_{i t} (γ) x_{i s} (γ)]

,

E [x_{i t} (γ) z_{i s}^{T}]

,

E [1_{i t} (γ) 1_{i s} (γ)]

, and

E [q_{i t}^{T} w_{i t} 1_{i s} (γ)]

are also continuous in

γ

. Thus,

G (θ)

is continuous in

θ

. By evaluating

θ = θ^{★}

, we obtain

G (θ^{★}) = G

. Condition 3 holds.

We next establish Condition 4. Denote

m_{i t} (θ) = {(m_{i t}^{1} {(θ)}^{T}, m_{i t}^{2} {(θ)}^{T}, m_{i t}^{3} {(θ)}^{T})}^{T}

, in which

m_{i t} (θ) = a ({\ddot{ε}}_{i t} (θ)) h_{i t}

,

m_{i t}^{1} {(θ)}^{T} = {\ddot{x}}_{i t} (γ) a ({\ddot{ε}}_{i t} (θ))

,

m_{i t}^{2} {(θ)}^{T} = - β_{1} {\ddot{1}}_{i t}^{+} (γ) a ({\ddot{ε}}_{i t} (θ))

, and

m_{i t}^{3} {(θ)}^{T} = - β_{1} {\ddot{q}}_{i t}^{+} (γ) a ({\ddot{ε}}_{i t} (θ))

.

Note that the first part is linear in

β

, and the second and third terms are quadratic in

β

. It therefore suffices to establish stochastic equicontinuity with respect to

γ

. We thus simplify notation by writing

m_{i t} (θ) = m_{i t}^{*} (γ)

. Under Condition (A1),

m_{i t}^{*} (γ)

has a bounded

2 + \frac{r}{2}

-th moment and the envelope condition holds. For any

δ

, set

N_{δ} = δ^{- 2 τ}

and

γ_{\cdot, k} = {[γ_{0, k}, γ_{1, k}^{T}]}^{T}

,

k = 1, \dots, N_{δ}

, to be an equally spaced grid on

Γ

. Notice that the distance between the grid points is

O (\frac{1}{N_{δ}})

. Define

m_{i t k}^{*} = min [m_{i t}^{*} (γ_{\cdot, k - 1}), m_{i t}^{*} (γ_{\cdot, k})]

and

m_{i t k}^{* *} = max [m_{i t}^{*} (γ_{\cdot, k - 1}), m_{i t}^{*} (γ_{\cdot, k})]

. Then for each

γ

, there exists

γ_{\cdot, k} = {[γ_{0, k}, γ_{1, k}^{T}]}^{T}

such that

m_{i t k}^{*} \leq m_{i t}^{*} (γ) \leq m_{i t k}^{* *}

. Thus

[m_{i t k}^{*}, m_{i t k}^{* *}]

brackets

m_{i t}^{*} (γ)

. Using the bound of

E ∥ w_{i t} 1 (γ_{i t, k - 1} \leq x_{i t} \leq γ_{i t, k}) ∥^{2}

, we thus have

\begin{matrix} E ∥ m_{i t k}^{* *} - m_{i t k}^{*} ∥^{2} & = & E ∥ m_{i t}^{*} (γ_{\cdot, k}) - m_{i t}^{*} (γ_{\cdot, k - 1}) ∥^{2} \\ & \leq & C {({\bar{f}}_{q} | γ_{02} - γ_{01} | + C {\bar{f}}_{q} ∥ γ_{12} - γ_{11} ∥)}^{1 / τ} \\ & \leq & O (N_{δ}^{- 1 / τ}) = O (δ^{2}) . \end{matrix}

It follows that

N_{δ} = δ^{- 2 τ}

is the

L^{2}

bracketing number and

ln N_{δ} = O (| ln δ |)

is the metric entropy with bracketing for the class

{m_{i t}^{*} (γ) : γ \in Γ}

. Hence, Condition 4 holds by (2.17) of [36]. Combining these facts together, the asymptotic normality in Theorem 1 holds. □

Appendix A.2. Proofs for Section 2.5 and Section 2.6

Theorem 2 follows directly from the asymptotic normality established in Theorem 1, so its proof is omitted.

Recall that we consider the local alternative model

y_{i t} = β_{0}^{★} x_{i t} + n^{- 1 / 2} β_{1}^{★} {(x_{i t} - γ_{i t})}_{+} + β_{2}^{★ T} z_{i t} + μ_{i} + ε_{i t}

, where

β_{1}^{★} = 0

corresponds to the null hypothesis. By concentrating out

μ_{i}

, we have

{\ddot{y}}_{i t} = β_{0}^{★} {\ddot{x}}_{i t} + n^{- 1 / 2} β_{1}^{★} {\ddot{x}}_{i t} (γ^{★}) + β_{2}^{★ T} {\ddot{z}}_{i t} + {\ddot{ε}}_{i t} .

(A1)

To prove Theorem 3, we need the following convergence results.

Lemma A1.

Under Conditions (A1)–(A3), as

n \to \infty

, we have

(i): ${\hat{S}}_{1 n} \overset{p}{\to} S_{1}$ ;
(ii): ${sup}_{γ \in Γ} | S_{2 n} (γ) - S_{2} (γ) | \overset{p}{\to} 0$ ;
(iii): ${sup}_{γ \in Γ} | {\hat{S}}_{2 n} (γ) - S_{2} (γ) | \overset{p}{\to} 0$ ;
(iv): ${sup}_{γ \in Γ} | S_{n} (γ \land γ^{★}) - S (γ \land γ^{★}) | \overset{p}{\to} 0$ .

Proof.

It is easy to prove (i) by applying the weak law of large numbers. For (ii), we can show that

S_{2 n} (γ) \overset{p}{\to} E [S_{2 n} (γ)] = S_{2} (γ)

for each given

γ

. The uniform convergence then follows by arguments similar to those in Lemma 5 of [19]. For (iii), it is sufficient to show that

{sup}_{γ \in Γ} | {\hat{S}}_{2 n} (γ) - S_{2 n} (γ) | = o_{p} (1)

. We can write

\begin{matrix} {\hat{S}}_{2 n} (γ) - S_{2} (γ) & = & \frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [\hat{f} ({\hat{\ddot{ε}}}_{i t}) - f ({\hat{\ddot{ε}}}_{i t})] {\ddot{w}}_{i t} {\ddot{x}}_{i t} (γ) \\ & & + \frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [f ({\hat{\ddot{ε}}}_{i t}) - f ({\ddot{ε}}_{i t})] {\ddot{w}}_{i t} {\ddot{x}}_{i t} (γ) + S_{2 n} (γ) - S_{2} (γ) \\ & \equiv & I_{1} + I_{2} + I_{3} . \end{matrix}

By the uniform convergence of the kernel density estimator, we have

{sup}_{γ} | I_{1} | = o_{p} (1)

. For

I_{2}

,

| I_{2} | \leq \frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} |{\ddot{w}}_{i t} {\ddot{x}}_{i t} (γ)| \cdot max_{i, t} |f ({\ddot{y}}_{i t} - {\hat{ξ}}^{T} {\ddot{w}}_{i t}) - f ({\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t})| .

By Condition (A3), the rate

∥ \hat{ξ} - ξ ∥ = O_{p} (n^{- 1 / 2})

established in the proof of Theorem 3, and the mean-value theorem, we have

max_{i, t} |f ({\ddot{y}}_{i t} - {\hat{ξ}}^{T} {\ddot{w}}_{i t}) - f ({\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t})| \leq max_{i, t} ∥ {\ddot{w}}_{i t} ∥ \cdot ∥ \hat{ξ} - ξ ∥ \cdot | f^{'} ({\ddot{y}}_{i t} - ζ^{T} {\ddot{w}}_{i t}) | = o_{p} (1),

where

ζ

lies in the segment between

\hat{ξ}

and

ξ

. Thus,

{sup}_{γ} | I_{2} | = o_{p} (1)

. Furthermore,

{sup}_{γ} | I_{3} | = o_{p} (1)

follows from (ii), and hence (iii) holds.

The proof of part (iv) follows a similar argument as that of part (ii) and is therefore omitted. □
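To make the plug-in quantity in part (iii) concrete, the following is a minimal Python sketch of ${\hat{S}}_{2 n} (γ) = \frac{1}{n} \sum_{i} \sum_{t} \sqrt{12} \hat{f} ({\hat{\ddot{ε}}}_{i t}) {\ddot{w}}_{i t} {\ddot{x}}_{i t} (γ)$, as implied by the decomposition above. It assumes a Gaussian kernel with a rule-of-thumb bandwidth, a choice this appendix does not specify. The function names `gaussian_kde_at` and `s2_hat` and the flattened input layout are hypothetical.

```python
import math
import statistics


def gaussian_kde_at(points, sample, bandwidth=None):
    """Gaussian kernel density estimate of `sample`, evaluated at `points`."""
    n_obs = len(sample)
    if bandwidth is None:
        # Assumed rule-of-thumb bandwidth; needs at least two distinct values.
        bandwidth = 1.06 * statistics.stdev(sample) * n_obs ** (-0.2)
    norm = n_obs * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(math.exp(-0.5 * ((p - s) / bandwidth) ** 2) for s in sample) / norm
        for p in points
    ]


def s2_hat(resid, w, x_gamma, n_individuals, bandwidth=None):
    """Plug-in vector (1/n) * sum sqrt(12) * f_hat(resid) * w * x(gamma).

    resid, w, x_gamma are flattened over all (i, t) pairs and are assumed to
    be within-transformed already; n_individuals is the cross-section size n.
    """
    f_hat = gaussian_kde_at(resid, resid, bandwidth)
    dim = len(w[0])
    return [
        math.sqrt(12.0)
        * sum(f * w_k[j] * x_k for f, w_k, x_k in zip(f_hat, w, x_gamma))
        / n_individuals
        for j in range(dim)
    ]
```

Evaluating `s2_hat` over a grid of threshold parameter values gives a numerical counterpart of the function whose uniform convergence is claimed in (iii).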

Proof of Theorem 3.

Recall that

\hat{ξ} = {({\hat{β}}_{0}, {\hat{β}}_{2}^{T})}^{T} = arg min_{ξ} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} (\frac{R ({\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t})}{n T + 1} - 0.5) ({\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t}),

which is equivalent to solving the estimating equation $M_{n} (ξ) = 0$, where

\begin{matrix} M_{n} (ξ) & = & - \frac{d}{d ξ} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} (\frac{R ({\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t})}{n T + 1} - 0.5) ({\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t}) \\ & = & \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} (\frac{R ({\ddot{y}}_{i t} - ξ^{T} {\ddot{w}}_{i t})}{n T + 1} - 0.5) {\ddot{w}}_{i t} . \end{matrix}
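As a rough illustration of the rank-based objective and the estimating function above, the following Python sketch evaluates both on flattened data that are assumed to be within-transformed already. The function names, the list-based data layout, and the tie-breaking rule for ranks are illustrative assumptions, not part of the paper.

```python
import math


def wilcoxon_scores(residuals):
    """Return sqrt(12) * (R(e_k) / (N + 1) - 0.5) for each residual e_k."""
    n_obs = len(residuals)
    order = sorted(range(n_obs), key=lambda k: residuals[k])
    ranks = [0] * n_obs
    for position, k in enumerate(order, start=1):
        ranks[k] = position  # ties are broken by order of appearance
    return [math.sqrt(12.0) * (r / (n_obs + 1) - 0.5) for r in ranks]


def residuals_for(xi, y, w):
    """Residuals y_k - xi^T w_k for flattened observations k = (i, t)."""
    return [
        y_k - sum(c * w_kj for c, w_kj in zip(xi, w_k))
        for y_k, w_k in zip(y, w)
    ]


def dispersion(xi, y, w):
    """Rank dispersion: sum over k of score_k * residual_k."""
    e = residuals_for(xi, y, w)
    return sum(a_k * e_k for a_k, e_k in zip(wilcoxon_scores(e), e))


def estimating_function(xi, y, w):
    """M_n(xi): sum over k of score_k * w_k, returned as a list."""
    a = wilcoxon_scores(residuals_for(xi, y, w))
    return [sum(a_k * w_k[j] for a_k, w_k in zip(a, w)) for j in range(len(xi))]
```

Because ranks are piecewise constant in the coefficient vector, `estimating_function` is a step function, so in practice one would minimize `dispersion` numerically rather than look for an exact root.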

Under the local alternative model (A1), we have

\begin{matrix} M_{n} (ξ) & = & \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} (\frac{R \{{\ddot{ε}}_{i t} + n^{- 1 / 2} β_{1}^{★} {\ddot{x}}_{i t} (γ^{★})\}}{n T + 1} - 0.5) {\ddot{w}}_{i t} \\ & = & \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [\frac{n T}{n T + 1} \hat{F} \{{\ddot{ε}}_{i t} + n^{- 1 / 2} β_{1}^{★} {\ddot{x}}_{i t} (γ^{★})\} - 0.5] {\ddot{w}}_{i t} + o_{p} (1) \\ & = & \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5 + f ({\ddot{ε}}_{i t}) n^{- 1 / 2} β_{1}^{★} {\ddot{x}}_{i t} (γ^{★})] {\ddot{w}}_{i t}, \end{matrix}

(A2)

where

F (\cdot)

is the cumulative distribution function of

{\ddot{ε}}_{i t}

and

\hat{F} (\cdot)

is the corresponding estimator. The last equality in (A2) follows from Taylor expansion.

Next, Theorem A.3.8 in [18] yields

n^{- 1 / 2} M_{n} (\hat{ξ}) = n^{- 1 / 2} M_{n} (ξ) - c_{ϕ} (\frac{1}{n} \sum_{i = 1}^{n} \sum_{t = 1}^{T} {\ddot{w}}_{i t} {\ddot{w}}_{i t}^{T}) \sqrt{n} (\hat{ξ} - ξ) + o_{p} (1) .

Since

n^{- 1 / 2} M_{n} (\hat{ξ}) = 0

, combining this with Lemma A1 gives

\begin{matrix} \sqrt{n} (\hat{ξ} - ξ^{★}) & = & c_{ϕ} S_{1}^{- 1} n^{- 1 / 2} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5] {\ddot{w}}_{i t} \\ & & + c_{ϕ} S_{1}^{- 1} n^{- 1 / 2} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} f ({\ddot{ε}}_{i t}) n^{- 1 / 2} β_{1}^{★} {\ddot{x}}_{i t} (γ^{★}) {\ddot{w}}_{i t} + o_{p} (1), \end{matrix}

where

ξ^{★} = {(β_{0}^{★}, β_{2}^{★ T})}^{T}

is the true value of

ξ

.

Under the local alternative model (A1), we can rewrite

R_{n} (γ)

as

\begin{matrix} R_{n} (γ) & = & \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} (\frac{R ({\ddot{y}}_{i t} - {\hat{ξ}}^{T} {\ddot{w}}_{i t})}{n T + 1} - 0.5) ({\ddot{x}}_{i t} (γ) - c_{ϕ} {\hat{S}}_{2 n} {(γ)}^{T} {\hat{S}}_{1 n}^{- 1} {\ddot{w}}_{i t}) \\ & = & \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5 - f ({\ddot{ε}}_{i t}) {(\hat{ξ} - ξ^{★})}^{T} {\ddot{w}}_{i t} + n^{- 1 / 2} f ({\ddot{ε}}_{i t}) β_{1}^{★} {\ddot{x}}_{i t} (γ^{★})] {\ddot{x}}_{i t} (γ) \\ & & - \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5 - f ({\ddot{ε}}_{i t}) {(\hat{ξ} - ξ^{★})}^{T} {\ddot{w}}_{i t} + n^{- 1 / 2} f ({\ddot{ε}}_{i t}) β_{1}^{★} {\ddot{x}}_{i t} (γ^{★})] c_{ϕ} {\hat{S}}_{2 n} {(γ)}^{T} {\hat{S}}_{1 n}^{- 1} {\ddot{w}}_{i t} \\ & \equiv & A_{1} - A_{2} . \end{matrix}

For

A_{1}

, note that

\begin{matrix} A_{1} & = & \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5] {\ddot{x}}_{i t} (γ) - {(\hat{ξ} - ξ^{★})}^{T} \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [f ({\ddot{ε}}_{i t}) {\ddot{w}}_{i t} {\ddot{x}}_{i t} (γ)] \\ & & + \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [n^{- 1 / 2} f ({\ddot{ε}}_{i t}) β_{1}^{★} {\ddot{x}}_{i t} (γ^{★}) {\ddot{x}}_{i t} (γ)] \\ & = & \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5] {\ddot{x}}_{i t} (γ) - \sqrt{n} {(\hat{ξ} - ξ^{★})}^{T} S_{2} (γ) + β_{1}^{★} S (γ \land γ^{★}) + o_{p} (1) . \end{matrix}

(A3)

For

A_{2}

, by applying Lemma A1, we get

\begin{matrix} A_{2} & = & c_{ϕ} {\hat{S}}_{2 n} {(γ)}^{T} {\hat{S}}_{1 n}^{- 1} \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5] {\ddot{w}}_{i t} \\ & & - c_{ϕ} {\hat{S}}_{2 n} {(γ)}^{T} {\hat{S}}_{1 n}^{- 1} \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [f ({\ddot{ε}}_{i t}) {(\hat{ξ} - ξ^{★})}^{T} {\ddot{w}}_{i t}] {\ddot{w}}_{i t} \\ & & + c_{ϕ} {\hat{S}}_{2 n} {(γ)}^{T} {\hat{S}}_{1 n}^{- 1} \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [n^{- 1 / 2} f ({\ddot{ε}}_{i t}) β_{1}^{★} {\ddot{x}}_{i t} (γ^{★})] {\ddot{w}}_{i t} \\ & = & c_{ϕ} S_{2} {(γ)}^{T} S_{1}^{- 1} \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5] {\ddot{w}}_{i t} - \sqrt{n} {(\hat{ξ} - ξ^{★})}^{T} S_{2} (γ) \\ & & + β_{1}^{★} c_{ϕ} S_{2} {(γ)}^{T} S_{1}^{- 1} S_{2} (γ^{★}) + o_{p} (1) . \end{matrix}

(A4)

Combining (A3) and (A4), we have

\begin{matrix} R_{n} (γ) & = & \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5] ({\ddot{x}}_{i t} (γ) - c_{ϕ} S_{2} {(γ)}^{T} S_{1}^{- 1} {\ddot{w}}_{i t}) + κ (γ) + o_{p} (1) \\ & \overset{D}{\to} & R (γ) + β_{1}^{★} (S (γ \land γ^{★}) - c_{ϕ} S_{2} {(γ)}^{T} S_{1}^{- 1} S_{2} (γ^{★})), \end{matrix}

where the weak convergence of

R_{n} (γ)

can be obtained by following the proofs in [37]. □

Proof of Corollary 1.

By arguments analogous to those used in the proof of Theorem 3, the desired result follows immediately. □

Proof of Theorem 4.

We divide the proof into three steps. First, we show that the covariance function of

R_{n}^{*} (γ)

converges to that of

R (γ)

. Define

R_{n}^{* *} (γ) = \frac{1}{\sqrt{n}} \sum_{i = 1}^{n} u_{i} \sum_{t = 1}^{T} \sqrt{12} [F ({\ddot{ε}}_{i t}) - 0.5] \{{\ddot{x}}_{i t} (γ) - c_{ϕ} S_{2} {(γ)}^{T} S_{1}^{- 1} {\ddot{w}}_{i t}\} .

By leveraging the uniform convergence of

\hat{F} - F

and

{\hat{c}}_{ϕ} - c_{ϕ}

, along with the uniform convergence of

{\hat{S}}_{2 n} (γ) - S_{2} (γ)

established in Lemma A1, we can readily show that

R_{n}^{*} (γ)

and

R_{n}^{* *} (γ)

are asymptotically equivalent in the sense that

{sup}_{γ} ∥ R_{n}^{*} (γ) - R_{n}^{* *} (γ) ∥ = o_{p} (1)

. Note that

u_{i}

’s are independent of

(y_{i t}, x_{i t}, z_{i t}, q_{i t})

, and

E u_{i} = 0

,

Var (u_{i}) = 1

. Then, for any

γ

and

\tilde{γ}

, the covariance function of

R_{n}^{* *} (γ)

is

\begin{matrix} Cov (R_{n}^{* *} (γ), R_{n}^{* *} (\tilde{γ})) & = & \frac{1}{n} \sum_{i = 1}^{n} E (u_{i}^{2} \sum_{t = 1}^{T} 12 {[F ({\ddot{ε}}_{i t}) - 0.5]}^{2} \{{\ddot{x}}_{i t} (γ) - c_{ϕ} S_{2} {(γ)}^{T} S_{1}^{- 1} {\ddot{w}}_{i t}\} \\ & & \times \{{\ddot{x}}_{i t} (\tilde{γ}) - c_{ϕ} S_{2} {(\tilde{γ})}^{T} S_{1}^{- 1} {\ddot{w}}_{i t}\}) \\ & = & \sum_{t = 1}^{T} E [\{{\ddot{x}}_{i t} (γ) - c_{ϕ} S_{2} {(γ)}^{T} S_{1}^{- 1} {\ddot{w}}_{i t}\} \cdot \{{\ddot{x}}_{i t} (\tilde{γ}) - c_{ϕ} S_{2} {(\tilde{γ})}^{T} S_{1}^{- 1} {\ddot{w}}_{i t}\}], \end{matrix}

which is the same as the covariance of

R (γ)

.
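The last equality in the covariance display drops the factors $u_{i}^{2}$ and $12 {[F ({\ddot{ε}}_{i t}) - 0.5]}^{2}$. A short worked step shows why, under the assumptions (not restated in this appendix) that $F$ is continuous and that ${\ddot{ε}}_{i t}$ and $u_{i}$ are independent of the regressors:

```latex
U_{it} = F(\ddot{\varepsilon}_{it}) \sim \mathrm{Uniform}(0, 1)
\;\Longrightarrow\;
E\{12\,[F(\ddot{\varepsilon}_{it}) - 0.5]^{2}\}
  = 12\,\mathrm{Var}(U_{it})
  = 12 \cdot \tfrac{1}{12}
  = 1,
\qquad
E(u_{i}^{2}) = \mathrm{Var}(u_{i}) + (E u_{i})^{2} = 1 .
```

Both factors therefore have unit expectation, which is also the reason for the $\sqrt{12}$ normalization of the Wilcoxon score.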

Second, it is straightforward to show that any finite-dimensional projection of

R_{n}^{*} (γ)

converges to that of

R (γ)

, by the central limit theorem.

Third,

R_{n}^{*} (γ)

is uniformly tight. Note that the class of all indicator functions

I (x_{i t} \leq γ_{i t})

constitutes a Vapnik-Chervonenkis (VC) class of functions. Consequently, the class of functions

F_{n} = [\sum_{t = 1}^{T} \{{\ddot{x}}_{i t} (γ) - c_{ϕ} S_{2} {(γ)}^{T} S_{1}^{- 1} {\ddot{w}}_{i t}\} : γ \in Γ]

is also a VC class. Thus, by appealing to the equicontinuity lemma (Lemma 15) from [38], one can establish that

R_{n}^{*} (γ)

is uniformly tight. Finally, by the Cramér-Wold device, the proof of Theorem 4 is completed. □
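The multiplier construction used in this proof can be sketched numerically. The following Python sketch assumes that the per-individual score contributions have already been computed on a finite grid of threshold parameter values, that the multipliers are standard normal (the proof only requires mean 0 and variance 1), and that the test statistic is the supremum of the absolute process over the grid. The function name and input layout are hypothetical.

```python
import math
import random


def multiplier_bootstrap_pvalue(contrib, observed_sup, n_boot=500, seed=0):
    """Approximate p-value of a sup-type statistic by multiplier bootstrap.

    contrib[i][k] is individual i's score contribution, summed over t,
    at the k-th grid value of the threshold parameter.
    """
    rng = random.Random(seed)
    n = len(contrib)
    n_grid = len(contrib[0])
    exceed = 0
    for _ in range(n_boot):
        # One multiplier per individual, shared across all grid points.
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sup_stat = max(
            abs(sum(u[i] * contrib[i][k] for i in range(n))) / math.sqrt(n)
            for k in range(n_grid)
        )
        if sup_stat >= observed_sup:
            exceed += 1
    return exceed / n_boot
```

Drawing one multiplier per individual, rather than one per observation, keeps each individual's contributions across time periods and grid points together, which is what the covariance calculation above relies on.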

References

1. Tong, H. Non-Linear Time Series: A Dynamical System Approach; Oxford University Press: Oxford, UK, 1990.
2. Hansen, B.E. Sample Splitting and Threshold Estimation. Econometrica 2000, 68, 575–603.
3. Hansen, B.E. Regression kink with an unknown threshold. J. Bus. Econ. Stat. 2017, 35, 228–240.
4. Card, D.; Mas, A.; Rothstein, J. Tipping and the Dynamics of Segregation. Q. J. Econ. 2008, 123, 177–218.
5. Zhong, W.; Wan, C.; Zhang, W. Estimation and inference for multi-kink quantile regression. J. Bus. Econ. Stat. 2022, 40, 1123–1139.
6. Zhang, F.; Xie, R.; Xiao, Z. Time series quantile regression kink with an unknown threshold. Econom. Rev. 2025, 44, 1275–1320.
7. Das, R.; Banerjee, M.; Nan, B.; Zheng, H. Fast estimation of regression parameters in a broken-stick model for longitudinal data. J. Am. Stat. Assoc. 2016, 111, 1132–1143.
8. Wan, C.; Zhong, W.; Zhang, W.; Zou, C. Multikink quantile regression for longitudinal data with application to progesterone data analysis. Biometrics 2023, 79, 747–760.
9. Zhang, Y.; Zhou, Q.; Jiang, L. Panel kink regression with an unknown threshold. Econ. Lett. 2017, 157, 116–121.
10. Sun, Y.; Wan, C.; Zhang, W.; Zhong, W. A Multi-Kink quantile regression model with common structure for panel data analysis. J. Econom. 2024, 239, 105304.
11. Yang, L.; Su, J.J. Debt and growth: Is there a constant tipping point? J. Int. Money Financ. 2018, 87, 133–143.
12. Yang, L.; Zhang, C.; Lee, C.; Chen, I.P. Panel kink threshold regression model with a covariate-dependent threshold. Econom. J. 2021, 24, 462–481.
13. Zhang, F.; Li, Q. Robust bent line regression. J. Stat. Plan. Inference 2017, 185, 41–55.
14. Jaeckel, L.A. Estimating Regression Coefficients by Minimizing the Dispersion of the Residuals. Ann. Math. Stat. 1972, 43, 1449–1458.
15. Hansen, B.E. Threshold effects in non-dynamic panels: Estimation, testing, and inference. J. Econom. 1999, 93, 345–368.
16. Zhou, M.; Ye, F.; Li, Y.; Liu, F.; Wan, C. A note on the covariate-dependent kink threshold regression model for panel data. Commun. Stat. Theory Methods 2025, 54, 908–920.
17. Jureckova, J. Nonparametric Estimate of Regression Coefficients. Ann. Math. Stat. 1971, 42, 1328–1338.
18. Hettmansperger, T.; McKean, J. Robust Nonparametric Statistical Methods, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2011.
19. Yu, P.; Fan, X. Threshold regression with a threshold boundary. J. Bus. Econ. Stat. 2021, 39, 953–971.
20. Zhang, Y.; Wang, H.J.; Zhu, Z. Single-index thresholding in quantile regression. J. Am. Stat. Assoc. 2022, 117, 2222–2237.
21. Wei, K.; Zhu, H.; Qin, G.; Zhu, Z.; Tu, D. Multiply robust subgroup analysis based on a single-index threshold linear marginal model for longitudinal data with dropouts. Stat. Med. 2022, 41, 2822–2839.
22. Wan, C.; Zeng, H.; Zhang, W.; Zhong, W.; Zou, C. Data-driven estimation for multithreshold accelerated failure time model. Scand. J. Stat. 2025, 52, 447–468.
23. Koul, H.L.; Sievers, G.L.; McKean, J. An estimator of the scale parameter for the rank analysis of linear models under general score functions. Scand. J. Stat. 1987, 14, 131–141.
24. Lee, S.; Seo, M.H.; Shin, Y. Testing for threshold effects in regression models. J. Am. Stat. Assoc. 2011, 106, 220–231.
25. Roy, S.N. On a heuristic method of test construction and its use in multivariate analysis. Ann. Math. Stat. 1953, 24, 220–238.
26. Silverman, B.W. Density Estimation for Statistics and Data Analysis; Routledge: London, UK, 1986.
27. Mincer, J. Labor force participation of married women: A study of labor supply. In Aspects of Labor Economics; Princeton University Press: Princeton, NJ, USA, 1962; pp. 63–105.
28. Heckman, J. Shadow prices, market wages, and labor supply. Econometrica 1974, 42, 679–694.
29. Blau, F.D.; Kahn, L.M. Changes in the labor supply behavior of married women: 1980–2000. J. Labor Econ. 2007, 25, 393–438.
30. Bick, A.; Blandin, A.; Rogerson, R. Hours and wages. Q. J. Econ. 2022, 137, 1901–1962.
31. Gicheva, D. Working long hours and early career outcomes in the high-end labor market. J. Labor Econ. 2013, 31, 785–824.
32. Liu, K. Explaining the gender wage gap: Estimates from a dynamic model of job changes and hours changes. Quant. Econ. 2016, 7, 411–447.
33. Hill, R.C.; Griffiths, W.E.; Lim, G.C. Principles of Econometrics; John Wiley & Sons: Hoboken, NJ, USA, 2018.
34. Newey, W.K.; McFadden, D. Large sample estimation and hypothesis testing. Handb. Econom. 1994, 4, 2111–2245.
35. Andrews, D.W. Empirical process methods in econometrics. Handb. Econom. 1994, 4, 2247–2294.
36. Doukhan, P.; Massart, P.; Rio, E. Invariance principles for absolutely regular empirical processes. Annales de l’IHP Probabilités et Statistiques 1995, 31, 393–427.
37. Stute, W. Nonparametric model checks for regression. Ann. Stat. 1997, 25, 613–641.
38. Pollard, D. Convergence of Stochastic Processes; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.

Figure 1. The Q–Q plot against the normal distribution. The red 45-degree line is the ideal reference line, used to judge whether the sample data follow the theoretical normal distribution.

Figure 2. Diagnostic plots for influential observations. The vertical solid lines represent the Cook’s distance for each observation, measuring the influence of each data point on the regression model. The horizontal dashed red line indicates the commonly used threshold $4 / n$ for identifying influential observations.

Table 1. Performance comparison between the proposed estimator and Yang’s estimator, based on 500 simulated samples generated from DGP 1 with $n = 50, 100$ and $T = 10$, for the four error distributions.

Errors		Yang					Proposed
Errors		$β_{0}$	$β_{1}$	$β_{2}$	$γ_{0}$	$γ_{1}$	$β_{0}$	$β_{1}$	$β_{2}$	$γ_{0}$	$γ_{1}$
$n = 50, T = 10$
$N (0, 1)$	Bias	0.015	−0.028	−0.012	0.000	0.005	0.014	−0.028	−0.011	0.001	0.004
	SD	0.130	0.134	0.087	0.256	0.080	0.133	0.137	0.090	0.260	0.082
	ESE	0.102	0.128	0.071	0.154	0.061	0.140	0.177	0.097	0.213	0.085
	MSE	0.017 ¹	0.019	0.008	0.065	0.006	0.018	0.020	0.008	0.067	0.007
	ECP	0.874	0.924	0.884	0.806	0.908	0.956	0.988	0.962	0.902	0.940
$t (3)$	Bias	0.015	−0.035	0.002	0.007	−0.010	0.006	−0.014	−0.003	0.002	0.001
	SD	0.157	0.262	0.093	0.265	0.089	0.095	0.103	0.070	0.173	0.066
	ESE	0.098	0.132	0.066	0.152	0.060	0.106	0.136	0.073	0.170	0.067
	MSE	0.025	0.070	0.009	0.070	0.008	0.009	0.011	0.005	0.030	0.004
	ECP	0.898	0.930	0.868	0.822	0.924	0.954	0.980	0.948	0.926	0.976
$T (0.1, 10)$	Bias	0.028	−0.027	−0.015	−0.034	0.001	0.003	0.000	−0.006	−0.009	0.006
	SD	0.127	0.133	0.090	0.237	0.083	0.055	0.060	0.048	0.098	0.059
	ESE	0.101	0.126	0.070	0.153	0.061	0.074	0.094	0.052	0.117	0.047
	MSE	0.017	0.018	0.008	0.057	0.007	0.003	0.004	0.002	0.010	0.004
	ECP	0.910	0.952	0.898	0.838	0.910	0.976	0.988	0.948	0.968	0.922
$L N (0, 1)$	Bias	0.013	−0.022	−0.016	−0.013	0.001	−0.002	0.002	−0.003	−0.006	0.008
	SD	0.176	0.195	0.125	0.292	0.111	0.061	0.070	0.050	0.112	0.059
	ESE	0.102	0.129	0.072	0.160	0.061	0.080	0.102	0.056	0.130	0.051
	MSE	0.031	0.039	0.016	0.085	0.012	0.004	0.005	0.002	0.013	0.004
	ECP	0.892	0.946	0.864	0.834	0.904	0.972	1.000	0.932	0.960	0.938
$n = 100, T = 10$
$N (0, 1)$	Bias	0.013	−0.017	−0.009	0.001	0.001	0.013	−0.017	−0.009	0.002	0.001
	SD	0.086	0.095	0.063	0.145	0.064	0.087	0.097	0.064	0.145	0.065
	ESE	0.071	0.089	0.049	0.108	0.043	0.099	0.124	0.069	0.152	0.060
	MSE	0.008	0.009	0.004	0.021	0.004	0.008	0.010	0.004	0.021	0.004
	ECP	0.892	0.922	0.878	0.884	0.960	0.964	0.984	0.970	0.958	0.982
$t (3)$	Bias	0.011	−0.011	−0.008	−0.006	0.001	0.001	0.002	−0.002	−0.005	0.003
	SD	0.186	0.183	0.130	0.170	0.066	0.066	0.075	0.051	0.117	0.060
	ESE	0.076	0.095	0.053	0.109	0.043	0.080	0.102	0.055	0.126	0.049
	MSE	0.035	0.034	0.017	0.029	0.004	0.004	0.006	0.003	0.014	0.004
	ECP	0.904	0.914	0.880	0.860	0.900	0.964	0.982	0.952	0.962	0.984
$T (0.1, 10)$	Bias	0.007	−0.008	−0.004	−0.004	0.003	−0.002	0.005	−0.005	−0.008	0.011
	SD	0.087	0.098	0.063	0.159	0.066	0.039	0.043	0.040	0.072	0.057
	ESE	0.070	0.088	0.049	0.109	0.043	0.054	0.068	0.037	0.084	0.053
	MSE	0.008	0.010	0.004	0.025	0.004	0.001	0.002	0.002	0.005	0.003
	ECP	0.916	0.930	0.888	0.846	0.920	0.990	0.996	0.942	0.958	0.948
$L N (0, 1)$	Bias	0.010	−0.014	−0.008	−0.007	0.003	0.000	0.002	−0.006	−0.006	0.011
	SD	0.081	0.092	0.063	0.154	0.067	0.042	0.047	0.044	0.083	0.058
	ESE	0.070	0.088	0.049	0.110	0.043	0.058	0.073	0.041	0.093	0.056
	MSE	0.007	0.009	0.004	0.024	0.005	0.002	0.002	0.002	0.007	0.003
	ECP	0.908	0.944	0.880	0.866	0.900	0.978	0.992	0.898	0.958	0.942

¹ The smaller MSE values are highlighted in black.

Table 2. Performance comparison between the proposed estimator and Yang’s estimator, based on 500 simulated samples generated from DGP 1 with $n = 50, 100$ and $T = 20$, for the four error distributions.

Errors		Yang					Proposed
Errors		$β_{0}$	$β_{1}$	$β_{2}$	$γ_{0}$	$γ_{1}$	$β_{0}$	$β_{1}$	$β_{2}$	$γ_{0}$	$γ_{1}$
$n = 50, T = 20$
$N (0, 1)$	Bias	0.005	−0.002	0.000	−0.007	−0.004	0.004	−0.002	0.001	−0.003	−0.004
	SD	0.078	0.088	0.060	0.144	0.065	0.080	0.091	0.062	0.147	0.066
	ESE	0.072	0.089	0.049	0.110	0.044	0.103	0.128	0.070	0.159	0.063
	MSE	0.006 ¹	0.008	0.004	0.021	0.004	0.006	0.008	0.004	0.022	0.004
	ECP	0.932	0.954	0.906	0.896	0.964	0.992	0.992	0.976	0.954	0.944
$t (3)$	Bias	0.001	−0.005	−0.002	0.004	0.003	−0.003	0.003	−0.001	0.001	0.004
	SD	0.087	0.097	0.068	0.153	0.066	0.062	0.069	0.051	0.109	0.059
	ESE	0.071	0.090	0.048	0.109	0.042	0.080	0.101	0.056	0.124	0.049
	MSE	0.008	0.009	0.005	0.023	0.004	0.004	0.005	0.003	0.012	0.004
	ECP	0.918	0.936	0.870	0.870	0.916	0.980	0.990	0.960	0.966	0.920
$T (0.1, 10)$	Bias	0.011	−0.006	−0.006	−0.014	0.001	0.001	0.003	−0.009	−0.012	0.014
	SD	0.080	0.089	0.060	0.143	0.066	0.034	0.038	0.039	0.068	0.056
	ESE	0.071	0.089	0.049	0.110	0.043	0.051	0.064	0.035	0.079	0.052
	MSE	0.007	0.008	0.004	0.021	0.004	0.001	0.001	0.002	0.005	0.003
	ECP	0.936	0.956	0.910	0.888	0.928	0.996	0.996	0.926	0.968	0.942
$L N (0, 1)$	Bias	0.007	−0.012	−0.002	−0.002	0.001	−0.004	0.006	−0.005	−0.008	0.013
	SD	0.080	0.091	0.061	0.146	0.064	0.037	0.043	0.039	0.072	0.057
	ESE	0.070	0.088	0.048	0.110	0.043	0.056	0.071	0.038	0.088	0.049
	MSE	0.006	0.008	0.004	0.021	0.004	0.001	0.002	0.002	0.005	0.003
	ECP	0.946	0.962	0.886	0.908	0.908	0.994	0.996	0.906	0.974	0.912
$n = 100, T = 20$
$N (0, 1)$	Bias	0.003	−0.004	−0.001	−0.002	0.000	0.004	−0.004	−0.002	−0.006	0.000
	SD	0.059	0.066	0.049	0.098	0.060	0.059	0.065	0.050	0.102	0.060
	ESE	0.050	0.062	0.034	0.077	0.030	0.072	0.090	0.050	0.111	0.044
	MSE	0.003	0.004	0.002	0.010	0.004	0.004	0.004	0.002	0.010	0.004
	ECP	0.904	0.928	0.810	0.834	0.576	0.986	0.990	0.962	0.988	0.996
$t (3)$	Bias	−0.002	−0.003	−0.003	0.002	0.006	−0.003	−0.001	−0.005	0.000	0.011
	SD	0.058	0.066	0.050	0.103	0.059	0.046	0.052	0.045	0.084	0.058
	ESE	0.049	0.061	0.034	0.077	0.030	0.056	0.071	0.039	0.088	0.045
	MSE	0.003	0.004	0.002	0.011	0.004	0.002	0.003	0.002	0.007	0.003
	ECP	0.896	0.942	0.828	0.816	0.584	0.982	0.994	0.916	0.922	0.870
$T (0.1, 10)$	Bias	0.003	0.000	−0.003	−0.009	0.004	−0.002	0.005	−0.013	−0.015	0.024
	SD	0.059	0.065	0.049	0.105	0.060	0.027	0.028	0.035	0.060	0.051
	ESE	0.050	0.062	0.035	0.077	0.031	0.036	0.045	0.025	0.056	0.042
	MSE	0.003	0.004	0.002	0.011	0.004	0.001	0.001	0.001	0.004	0.003
	ECP	0.918	0.930	0.824	0.798	0.550	0.984	0.996	0.884	0.996	0.866
$L N (0, 1)$	Bias	0.001	0.000	−0.005	−0.003	0.004	−0.002	0.005	−0.012	−0.016	0.020
	SD	0.056	0.065	0.049	0.100	0.060	0.029	0.031	0.037	0.063	0.053
	ESE	0.049	0.062	0.034	0.078	0.030	0.040	0.050	0.028	0.063	0.044
	MSE	0.003	0.004	0.002	0.010	0.004	0.001	0.001	0.001	0.004	0.003
	ECP	0.930	0.960	0.830	0.844	0.552	0.986	0.996	0.824	0.986	0.912

¹ The smaller MSE values are highlighted in black.

Table 3. Empirical sizes (DGP 2) and powers (DGP 1) for the test of the threshold constancy.

n	T	Methods	$N (0, 1)$		$t (3)$		$T (0.1, 10)$		$LN (0, 1)$
n	T	Methods	Size	Power	Size	Power	Size	Power	Size	Power
50	10	Yang	0.096	1.000	0.132	0.996	0.142	0.996	0.124	0.992
		Proposed	0.054	0.996	0.046	0.998	0.042	1.000	0.052	1.000
	20	Yang	0.072	1.000	0.068	1.000	0.064	1.000	0.114	0.998
		Proposed	0.048	1.000	0.048	1.000	0.052	1.000	0.054	1.000
100	10	Yang	0.058	1.000	0.064	0.996	0.100	1.000	0.092	1.000
		Proposed	0.048	1.000	0.048	1.000	0.044	1.000	0.042	1.000
	20	Yang	0.052	1.000	0.050	1.000	0.064	1.000	0.072	1.000
		Proposed	0.052	1.000	0.046	1.000	0.048	1.000	0.046	1.000

Table 4. Empirical sizes (DGP 3) and powers (DGP 1) for the test of the presence of a threshold effect.

n	T	Methods	$N (0, 1)$		$t (3)$		$T (0.1, 10)$		$LN (0, 1)$
n	T	Methods	Size	Power	Size	Power	Size	Power	Size	Power
50	10	Yang	0.012	1.000	0.012	0.996	0.006	1.000	0.004	0.998
		Zhou	0.034	0.700	0.034	0.732	0.038	0.656	0.034	0.722
		Proposed	0.040	0.912	0.050	0.992	0.056	1.000	0.044	0.996
50	20	Yang	0.008	1.000	0.010	1.000	0.004	1.000	0.006	1.000
		Zhou	0.048	1.000	0.032	0.984	0.028	0.996	0.030	0.988
		Proposed	0.048	1.000	0.042	1.000	0.038	1.000	0.056	1.000
100	10	Yang	0.006	1.000	0.010	1.000	0.004	1.000	0.010	1.000
		Zhou	0.052	0.998	0.036	0.986	0.040	0.996	0.040	0.995
		Proposed	0.054	0.998	0.038	1.000	0.046	1.000	0.050	1.000
100	20	Yang	0.006	1.000	0.008	1.000	0.006	1.000	0.004	1.000
		Zhou	0.038	1.000	0.034	1.000	0.018	1.000	0.020	0.998
		Proposed	0.042	1.000	0.042	1.000	0.044	1.000	0.056	1.000

Table 5. Empirical analysis results for the wage dataset.

	Yang			Proposed
	Est.	s.e.	Conf.int.	Est.	s.e.	Conf.int.
$β_{0}$	1.292	0.285	[0.733, 1.850]	1.142	0.051	[1.043, 1.241]
$β_{1}$	−0.399	0.085	[−0.565, −0.232]	−0.276	0.018	[−0.311, −0.242]
$β_{2}$	−0.117	0.189	[−0.486, 0.253]	−0.182	0.033	[−0.247, −0.116]
$γ_{0}$	0.455	0.054	[0.349, 0.560]	0.636	0.021	[0.596, 0.677]
$γ_{1}$	−0.212	0.125	[−0.456, 0.032]	−0.333	0.044	[−0.419, −0.248]
Testing	Statistic		p-value	Statistic		p-value
$H_{0}^{c} : γ_{1} = 0$	2.891		0.089	58.520		0.000
$H_{0}^{l} : β_{1} = 0$	6.072		0.000	1.732		0.010

Table 6. Summary statistics for the estimated residuals obtained by the LS estimator.

Kurtosis	Skewness	Jarque–Bera Test (p-Value)	Shapiro–Wilk Test (p-Value)
17.396	0.480	0.000	0.000
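The skewness, kurtosis, and Jarque–Bera entries in Table 6 are moment-based diagnostics. As a rough illustration of how such figures can be computed from a list of residuals, the sketch below uses the population-moment definitions and reports raw kurtosis, which equals 3 under normality. Whether Table 6 reports raw or excess kurtosis is not stated here, so that choice is an assumption, and the function name is hypothetical. The Shapiro–Wilk test is omitted because it requires tabulated coefficients.

```python
import math


def jarque_bera(residuals):
    """Return (skewness, raw kurtosis, JB statistic, asymptotic p-value)."""
    n_obs = len(residuals)
    mean = sum(residuals) / n_obs
    m2 = sum((e - mean) ** 2 for e in residuals) / n_obs
    m3 = sum((e - mean) ** 3 for e in residuals) / n_obs
    m4 = sum((e - mean) ** 4 for e in residuals) / n_obs
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n_obs / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return skew, kurt, jb, p_value
```

A raw kurtosis far above 3, together with a p-value near zero, points to heavy-tailed residuals, which is the situation the rank-based estimator is designed for.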

Table 7. Sensitivity of estimates to influential observations.

	Full Sample		Trimmed Sample
	LS	Robust	LS	Robust
$β_{0}$	1.292	1.142	1.673	1.149
	(0.285)	(0.051)	(0.267)	(0.041)
$β_{1}$	−0.399	−0.276	−0.483	−0.273
	(0.085)	(0.018)	(0.092)	(0.013)
$β_{2}$	−0.117	−0.182	−0.214	−0.191
	(0.189)	(0.033)	(0.145)	(0.027)
$γ_{0}$	0.455	0.636	0.293	0.676
	(0.054)	(0.021)	(0.040)	(0.019)
$γ_{1}$	−0.212	−0.333	−0.151	−0.343
	(0.125)	(0.044)	(0.096)	(0.036)
No. of individuals	716	716	688	688

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ma, D.; Hong, H.; Li, Y.; Wan, C.; Wang, Y. A Robust Covariate-Dependent Kink Threshold Regression Model for Panel Data. Axioms 2026, 15, 319. https://doi.org/10.3390/axioms15050319

