A Penalized Profile Quasi-Likelihood Method for a Semiparametric Varying Coefficient Spatial Autoregressive Panel Model with Fixed Effects

Ruiqin Tian; Miaojie Xia; Dengke Xu

doi:10.3390/axioms14020121

Abstract

This paper proposes a variable selection method for a semiparametric varying coefficient spatial autoregressive panel model with fixed effects based on a penalized profile quasi-likelihood method, which can simultaneously select significant variables in parametric components and nonparametric components without estimating fixed effects. With an appropriate selection of the tuning parameters and some mild assumptions, the consistency of this procedure and the oracle property of the obtained estimators are established. Then, we conduct some Monte Carlo simulations to assess the finite sample performance of the proposed variable selection method, and finally, we analyze a real dataset for further illustration.

Keywords:

semiparametric varying coefficient; spatial autoregressive panel model; fixed effects; variable selection; profile quasi-likelihood

MSC:

62G05; 62F12

1. Introduction

Recently, there has been a surge in focus on spatial panel data model research. These models not only account for the spatial interdependencies of economic phenomena but also allow investigators to manage the unobservable heterogeneity among geographical units [1,2,3,4,5]. The basic spatial panel model is specified as follows:

y_{i t} = ρ_{0} \sum_{j = 1}^{N} w_{i j} y_{j t} + X_{i t}^{τ} β_{0} + μ_{0 i} + ϵ_{i t}, i = 1, 2, \dots, N; t = 1, 2, \dots, T,

(1)

where

{y_{i t}, X_{i t}}

represents the observations of an individual i in period t,

w_{i j}

denotes the spatial weight between an individual i and j,

μ_{0 i}

denotes the unobserved and time-invariant individual effect,

ϵ_{i t}

denotes random disturbance, and

{ρ_{0}, β_{0}}

denotes the unknown true parameter value. However, model (1) adopts a linear specification. Parametric statistical inference inherently requires a set of model assumptions, and linearity is one of the most convenient. Despite their robust theoretical foundations, linear models frequently fall short in practical applications. Furthermore, when a linear model is misspecified relative to the data-generating process, it may result in significant modeling biases and misleading conclusions. Therefore, more flexible spatial panel models are required. Ai and Zhang [6] extended model (1) to a partially linear spatial panel model with fixed effects by adding an unknown function, and they proposed a sieve two-stage least squares regression to consistently estimate their model. Zhang and Sun [7] considered a partially specified dynamic spatial panel model, which takes into account past information of the dependent variable. Furthermore, the semiparametric varying coefficient spatial autoregressive (hereafter, SVCSAR) panel model, which strikes a balance between flexibility and interpretability, has been widely studied; see [8,9] for more details. The model is specified as follows:

y_{i t} = ρ_{0} \sum_{j = 1}^{N} w_{i j} y_{j t} + X_{i t}^{τ} β_{0} + Z_{i t}^{τ} α_{0} (u_{i t}) + μ_{0 i} + ϵ_{i t}, i = 1, 2, \dots, N; t = 1, 2, \dots, T,

(2)

where

{y_{i t}, u_{i t}, X_{i t}, Z_{i t}}

represents the observations of an individual i in period t.

α_{0} (u)

is a vector of unknown functions. Model (2) is a general model that can be reduced to some existing panel models. For instance, if

ρ_{0} = 0

, this model reduces to the semiparametric varying coefficient partially linear panel model studied by [10,11,12,13]. If

ρ_{0} = 0

and

β_{0} = 0

, this model is reduced to a varying coefficient panel model [14]. If

α_{0} (\cdot) \equiv 0

, this model reduces to the spatial autoregressive panel model [5]. Moreover, if

ρ_{0} = 0

,

β_{0} = 0

and

α_{0} (\cdot) \equiv 0

, this model becomes the classical panel model. However, while model (2) can be reduced to various panel models, it also increases the risk of model misspecification in practical applications. Specifying the model form becomes an inevitable issue, which is equivalent to detecting the zero components of

{ρ_{0}, β_{0}, α_{0} (\cdot)}

. In other words, a variable selection method for model (2) is required. Additionally, when the number of covariates in model (2) is large, selecting important variables is also a purpose of the variable selection method.

Variable selection constitutes a crucial aspect of contemporary statistical inference. Over the years, numerous variable selection techniques have emerged for parametric models. LASSO [15], SCAD [16], and ALASSO [17] are the most popular methods among them. Based on those methods, variable selection methods for nonparametric or semiparametric models have been established in recent years. Wang et al. [18] considered variable selection for varying coefficient models using the SCAD penalty. Li and Liang [19] utilized the SCAD penalty to identify significant variables within the parametric components of a semiparametric varying coefficient partially linear model. Wang et al. [20] and Zhao and Xue [21] presented a variable selection procedure by combining basis function approximations with SCAD penalty for semiparametric varying coefficient partially linear models. The proposed procedure simultaneously selects significant variables in parametric components and nonparametric components. Tian et al. [22] introduced a novel method for variable selection, which integrates basis function approximations with quadratic inference functions. This approach enables the simultaneous identification of significant variables in both parametric and nonparametric components. For further development of the variable selection methods for nonparametric or semiparametric models, see [23,24,25,26], among others. However, there are few studies on the variable selection of panel model (2), in which spatial components and nonparametric components are simultaneously included.

In this paper, we propose a variable selection procedure for model (2). In order to avoid incidental parameter problems [27] brought by unknown fixed effects, this variable selection procedure combines a profile quasi-likelihood method with the basis function approximation and SCAD penalty to achieve estimation and variable selection simultaneously. The proposed procedure can shrink the spatial, linear, and functional coefficients of irrelevant covariates automatically to achieve variable selection. Moreover, by selecting appropriate tuning parameters, we demonstrate the consistency of our variable selection method. The regression coefficient estimators exhibit the oracle property, which implies that the nonparametric component estimators converge optimally, while the parametric component estimators share the same asymptotic behavior as those derived from the true submodel. This suggests that our penalized estimators perform as effectively as if the true zero coefficients were known. Compared with Liu et al. [28] and Xie et al. [29], we consider panel data and varying coefficient components. In comparison, although Luo and Wu [30] took into account variable selection for the SVCSAR model, the model they analyzed was confined to cross-sectional data. Furthermore, they only selected parametric coefficients. However, our method enables the simultaneous selection of significant variables in both parametric and nonparametric components under panel data.

The rest of this paper is structured as follows. In Section 2, we introduce a variable selection procedure designed for the SVCSAR panel model with fixed effects. In Section 3, we establish the asymptotic properties of the resulting estimators. In Section 4, we detail the computational steps for obtaining these estimators and discuss the selection of tuning parameters. In Section 5, we conduct simulations to assess the finite sample performance of our method. In Section 6, we provide a real-data analysis to further demonstrate the application of our proposed methodology. Finally, in Section 7, we conclude the article with a concise discussion. All technical proofs supporting the asymptotic results are included in Appendix A.

2. Penalized Profile Quasi-Likelihood Method

We first consider the SVCSAR panel model described in (2). Specifically,

y_{i t}

and

u_{i t}

are scalars, while

X_{i t}

is a

p \times 1

vector and

Z_{i t}

is a

q \times 1

vector, respectively;

β_{0} = {(β_{01}, β_{02}, \dots, β_{0 p})}^{τ}

is an unknown vector that reflects the linear effect of

X_{i t}

on

y_{i t}

, and it is assumed that only a subset of its elements are non-zero. Without loss of generality, we assume that the first s elements are non-zero.

α_{0} (u) \equiv {(α_{01} (u), \dots, α_{0 k} (u), \dots, α_{0 q} (u))}^{τ}

is a vector of unknown functions. Similarly, we assume that the first d functions are non-zero.

ϵ_{i t}

is an i.i.d. disturbance with zero mean and finite unknown variance

σ_{0}^{2}

; thus,

{σ_{0}^{2}, ρ_{0}, β_{0}, α_{0}, μ_{01}, \dots, μ_{0 N}}

are unknown components. Next, we show a penalized quasi-likelihood method that can skip the estimation of

μ_{0 i} (i = 1, \dots, N)

and shrink the remaining estimators.

Let

n = N \times T, Y_{n} \equiv {(Y_{1}^{τ}, \dots, Y_{t}^{τ}, \dots, Y_{T}^{τ})}^{τ}

with

Y_{t} = {(y_{1 t}, \dots, y_{N t})}^{τ}; W_{n} \equiv I_{T} \otimes W_{N}

, where “⊗” denotes the Kronecker product and

I_{T}

denotes a T-dimensional identity matrix;

X_{n} \equiv {(X_{1}^{τ}, \dots, X_{t}^{τ}, \dots, X_{T}^{τ})}^{τ}

with

X_{t} = {(X_{1 t}, \dots, X_{N t})}^{τ}

;

A_{0} \equiv {(A_{01}^{τ}, \dots, A_{0 t}^{τ}, \dots, A_{0 T}^{τ})}^{τ}

with

A_{0 t} = {(Z_{1 t}^{τ} α_{0} (u_{1 t}), \dots, Z_{N t}^{τ} α_{0} (u_{N t}))}^{τ}

;

V_{n} \equiv {(V_{1}^{τ}, \dots, V_{t}^{τ}, \dots, V_{T}^{τ})}^{τ}

with

V_{t} = {(ϵ_{1 t}, \dots, ϵ_{N t})}^{τ}

;

μ_{0} = {(μ_{01}, \dots, μ_{0 N})}^{τ}

denotes the fixed effects vector;

D_{n} \equiv ι_{T} \otimes I_{N}

, where

ι_{T}

represents a T-dimensional vector of ones. Then, model (2) can be expressed in matrix form:

Y_{n} = ρ_{0} W_{n} Y_{n} + X_{n} β_{0} + A_{0} + D_{n} μ_{0} + V_{n},

(3)

where

E (V_{n}) = 0, var (V_{n}) = σ_{0}^{2} I_{n}

. Thus, the unknown parametric component is

(σ_{0}^{2}, ρ_{0}, β_{0}, μ_{0})

, and the unknown nonparametric component is

α_{0} (u)

.
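The stacking behind (3) can be sketched numerically. The snippet below is only an illustration under assumed toy sizes and a hypothetical ring-shaped weight matrix (neither comes from the paper); it builds W_n = I_T ⊗ W_N and D_n = ι_T ⊗ I_N and shows that D_n μ_0 repeats the fixed effects in every period.

```python
import numpy as np

N, T = 4, 3                      # toy sizes, chosen only for illustration
n = N * T

# Hypothetical row-normalized N x N weight matrix: two ring neighbours per unit.
W_N = np.zeros((N, N))
for i in range(N):
    W_N[i, (i - 1) % N] = 0.5
    W_N[i, (i + 1) % N] = 0.5

W_n = np.kron(np.eye(T), W_N)               # I_T (x) W_N: block diagonal over periods
D_n = np.kron(np.ones((T, 1)), np.eye(N))   # iota_T (x) I_N: maps mu_0 to every period

mu = np.arange(1.0, N + 1)                  # toy fixed effects mu_01, ..., mu_0N
print(W_n.shape, D_n.shape)                 # (12, 12) (12, 4)
print((D_n @ mu).reshape(T, N))             # each row repeats mu
```

Because observations are stacked period by period, the same W_N acts within each period and never links different periods.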

Let

θ = {(σ^{2}, ρ, β^{τ})}^{τ}

,

θ_{0} = {(σ_{0}^{2}, ρ_{0}, β_{0}^{τ})}^{τ}

, and

M_{n} (ρ) = I_{n} - ρ W_{n}

for any

ρ

and

M_{n} \equiv M_{n} (ρ_{0})

. Subsequently, we suggest optimizing the log-Gaussian quasi-likelihood, following the approach taken by [31,32], and the log-Gaussian quasi-likelihood of (3) is

\begin{matrix} ln \tilde{L} (θ, μ, α (u)) = & - \frac{n}{2} ln 2 π - \frac{n}{2} ln σ^{2} + ln | M_{n} (ρ) | \\ - \frac{1}{2 σ^{2}} {(M_{n} (ρ) Y_{n} - X_{n} β - A - D_{n} μ)}^{τ} (M_{n} (ρ) Y_{n} - X_{n} β - A - D_{n} μ), \end{matrix}

(4)

where

A = {(A_{1}^{τ}, \dots, A_{t}^{τ}, \dots, A_{T}^{τ})}^{τ}

,

A_{t} = {[Z_{1 t}^{τ} α (u_{1 t}), \dots, Z_{N t}^{τ} α (u_{N t})]}^{τ}

, and

α (u) = {[α_{1} (u), \dots, α_{q} (u)]}^{τ}

. When maximizing Equation (4), we encounter two main challenges: (i) Directly estimating

μ_{0}

can lead to the incidental parameter problem [27], especially when

μ_{0}

becomes high-dimensional as N increases. (ii) Estimating

α_{0}

is difficult due to its infinite dimensionality. To address these issues, we follow the approach in Tian et al. [32] and let

B (u) = {(B_{1} (u), B_{2} (u), \dots, B_{K_{n} + l + 1} (u))}^{τ}

denote a vector comprising normalized B-spline basis functions of order l with

K_{n}

internal knots. Subsequently, we approximate

α_{0 k} (u)

as a linear combination of

B_{1} (u), B_{2} (u), \dots, B_{K_{n} + l + 1} (u)

, i.e.,

α_{0 k} (u) \approx B^{τ} (u) γ_{0 k}

for

k = 1, 2, \dots, q

. Consequently, model (3) can be written as follows:

Y_{n} \approx ρ_{0} W_{n} Y_{n} + X_{n} β_{0} + S_{n} γ_{0} + D_{n} μ_{0} + V_{n},

(5)

where

S_{n} = {(S_{1}^{τ}, \dots, S_{t}^{τ}, \dots, S_{T}^{τ})}^{τ}

,

S_{t} = {[(I_{q} \otimes B (u_{1 t})) Z_{1 t}, \dots, (I_{q} \otimes B (u_{N t})) Z_{N t}]}^{τ}

, and

γ_{0} = {(γ_{01}^{τ}, \dots, γ_{0 q}^{τ})}^{τ}

. The log-Gaussian quasi-likelihood of (5) is

\begin{matrix} ln \hat{L} (θ, μ, γ) = & - \frac{n}{2} ln 2 π - \frac{n}{2} ln σ^{2} + ln | M_{n} (ρ) | \\ - \frac{1}{2 σ^{2}} {(M_{n} (ρ) Y_{n} - X_{n} β - S_{n} γ - D_{n} μ)}^{τ} (M_{n} (ρ) Y_{n} - X_{n} β - S_{n} γ - D_{n} μ) . \end{matrix}

(6)
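As a rough sketch of how the spline design matrix S_n in (5) could be assembled, the code below evaluates a clamped B-spline basis on [0, 1] with the Cox-de Boor recursion and forms each row of S_n as the Kronecker product of Z_{it} and B(u_{it}). The function names, the equally spaced interior knots, and the reading of l as the polynomial degree (so that the basis has K_n + l + 1 elements) are assumptions made for illustration.

```python
import numpy as np

def bspline_basis(u, n_interior, degree=3):
    """Evaluate the n_interior + degree + 1 B-spline basis functions at points u in [0, 1]."""
    u = np.asarray(u, dtype=float)
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]   # equally spaced knots (assumption)
    t = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    # Degree-0 functions: indicators of the non-empty knot spans.
    B = np.zeros((u.size, len(t) - 1))
    for j in range(len(t) - 1):
        if t[j] < t[j + 1]:
            B[:, j] = (u >= t[j]) & (u < t[j + 1])
    B[u == 1.0, degree + n_interior] = 1.0    # close the last span at u = 1
    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, degree + 1):
        B_new = np.zeros((u.size, len(t) - 1 - d))
        for j in range(len(t) - 1 - d):
            left = t[j + d] - t[j]
            right = t[j + d + 1] - t[j + 1]
            if left > 0:
                B_new[:, j] += (u - t[j]) / left * B[:, j]
            if right > 0:
                B_new[:, j] += (t[j + d + 1] - u) / right * B[:, j + 1]
        B = B_new
    return B

def design_S(Z, u, n_interior, degree=3):
    """Row (i, t) equals kron(Z_it, B(u_it)), i.e. ((I_q (x) B(u_it)) Z_it) transposed."""
    Bmat = bspline_basis(u, n_interior, degree)                 # shape (n, L)
    return (Z[:, :, None] * Bmat[:, None, :]).reshape(Z.shape[0], -1)

rng = np.random.default_rng(0)
n, q, K = 50, 2, 3
u = rng.uniform(size=n)
Z = rng.normal(size=(n, q))
S = design_S(Z, u, K)            # shape (50, 2 * 7)
```

With this layout, S_n γ reproduces the sum over k of Z_{itk} B^τ(u_{it}) γ_k row by row, which is the spline approximation of Z_{it}^τ α(u_{it}).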

To avoid the incidental parameter problem [27] caused by

μ

, we first concentrate

μ

out and obtain the profile quasi-likelihood. Let

η = {(θ^{τ}, γ^{τ})}^{τ}

. For given

η

, from (6), we derive

\hat{μ} (η) = {(D_{n}^{τ} D_{n})}^{- 1} D_{n}^{τ} (M_{n} (ρ) Y_{n} - X_{n} β - S_{n} γ)

and substitute this into (6). Then,

\begin{matrix} ln L_{n} (η) = & - \frac{n}{2} ln 2 π - \frac{n}{2} ln σ^{2} + ln | M_{n} (ρ) | \\ - \frac{1}{2 σ^{2}} {(M_{n} (ρ) Y_{n} - X_{n} β - S_{n} γ)}^{τ} J_{n} (M_{n} (ρ) Y_{n} - X_{n} β - S_{n} γ), \end{matrix}

(7)

where

J_{n} = {(I_{n} - Q_{n})}^{τ} (I_{n} - Q_{n})

and

Q_{n} = D_{n} {(D_{n}^{τ} D_{n})}^{- 1} D_{n}^{τ}

.
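A small numerical sketch may help to see why (7) no longer involves the fixed effects. Under the period-by-period stacking used above, I_n - Q_n subtracts each unit's time average, so any term of the form D_n μ is annihilated. The function names below are hypothetical and the code is an illustration, not the authors' implementation.

```python
import numpy as np

def within_projector(N, T):
    """Return I_n - Q_n; it is symmetric and idempotent, so it equals J_n."""
    D = np.kron(np.ones((T, 1)), np.eye(N))       # D_n = iota_T (x) I_N
    Q = D @ np.linalg.solve(D.T @ D, D.T)         # Q_n: projection onto the columns of D_n
    return np.eye(N * T) - Q

def profile_loglik(sigma2, rho, beta, gamma, Y, W, X, S, J):
    """Profile log quasi-likelihood (7) evaluated at eta = (sigma2, rho, beta, gamma)."""
    n = Y.size
    M = np.eye(n) - rho * W
    _, logabsdet = np.linalg.slogdet(M)           # ln |det M_n(rho)|
    resid = M @ Y - X @ beta - S @ gamma
    return (-0.5 * n * np.log(2 * np.pi) - 0.5 * n * np.log(sigma2)
            + logabsdet - resid @ J @ resid / (2 * sigma2))

N, T = 5, 4
J = within_projector(N, T)
rng = np.random.default_rng(0)
v = rng.normal(size=N * T)
# Subtracting unit-specific time means gives the same vector as J @ v.
demeaned = (v.reshape(T, N) - v.reshape(T, N).mean(axis=0)).ravel()
print(np.allclose(J @ v, demeaned))               # True
```

The trace of J_n is N(T - 1) rather than n, which reflects the degrees of freedom used up by removing the N fixed effects.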

Inspired by the concept of variable selection in semiparametric varying coefficient partially linear models [21], we introduce a penalized profile quasi-likelihood function defined as follows:

Q_{n} (η) = ln L_{n} (η) - n \sum_{j = 2}^{p + 2} p_{λ_{1, n}} (| θ_{j} |) - n \sum_{k = 1}^{q} p_{λ_{2, n}} (∥ B^{τ} (\cdot) γ_{k} ∥),

(8)

where

∥ B^{τ} (\cdot) γ_{k} ∥ = {(\int {(B^{τ} (u) γ_{k})}^{2} d u)}^{1 / 2}

, and

p_{λ} (\cdot)

is a SCAD penalty function [16] defined by

p_{λ}^{'} (ω) = λ \{I (ω \leq λ) + \frac{{(a λ - ω)}_{+}}{(a - 1) λ} I (ω > λ)\},

with

a > 2, ω > 0

and

p_{λ} (0) = 0

. Throughout this paper, we adopt the suggestion by Fan and Li [16] that the choice of

a = 3.7

performs well in a variety of situations. The tuning parameter

λ

can be different for all

θ_{j}

and

B^{τ} (\cdot) γ_{k}

. Note that

∥ B^{τ} (\cdot) γ_{k} ∥ = {(\int {(B^{τ} (u) γ_{k})}^{2} d u)}^{1 / 2} = {(γ_{k}^{τ} H γ_{k})}^{1 / 2} \equiv {∥ γ_{k} ∥}_{H}

, where

H = \int B (u) B^{τ} (u) d u

. Then, the penalized profile quasi-likelihood function can be written as follows:

Q_{n} (η) = ln L_{n} (η) - n \sum_{j = 2}^{p + 2} p_{λ_{1, n}} (| θ_{j} |) - n \sum_{k = 1}^{q} p_{λ_{2, n}} (∥ γ_{k} ∥_{H}) .

(9)
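For concreteness, the SCAD penalty entering (9) can be written out as follows. This is a minimal sketch: the function names are hypothetical, the closed form of p_λ is the standard piecewise expression obtained by integrating the derivative given by Fan and Li [16], and the second indicator is taken as strict (ω > λ) so that the two pieces do not overlap at ω = λ.

```python
import numpy as np

def scad_deriv(w, lam, a=3.7):
    """First derivative p'_lambda(w) of the SCAD penalty for w >= 0."""
    w = np.asarray(w, dtype=float)
    return lam * ((w <= lam)
                  + np.maximum(a * lam - w, 0.0) / ((a - 1) * lam) * (w > lam))

def scad_penalty(w, lam, a=3.7):
    """SCAD penalty p_lambda(|w|): linear, then quadratic, then constant."""
    w = np.abs(np.asarray(w, dtype=float))
    linear = lam * w
    quad = (2 * a * lam * w - w ** 2 - lam ** 2) / (2 * (a - 1))
    const = (a + 1) * lam ** 2 / 2
    return np.where(w <= lam, linear, np.where(w <= a * lam, quad, const))

def h_norm(gamma_k, H):
    """Group norm ||gamma_k||_H = sqrt(gamma_k' H gamma_k) used for the functional part."""
    return float(np.sqrt(gamma_k @ H @ gamma_k))
```

The derivative equals λ near the origin, which produces sparsity, and vanishes beyond aλ, so large coefficients are left essentially unpenalized.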

Let

\hat{η} = {({\hat{θ}}^{τ}, {\hat{γ}}^{τ})}^{τ}

be the maximizer of (9). Then,

\hat{θ}

is the penalized profile quasi-likelihood estimator (penalized profile QMLE) of

θ

, and the estimator of

α_{k} (u)

can be obtained by

{\hat{α}}_{k} (u) = B^{τ} (u) {\hat{γ}}_{k}

. Next, we study the asymptotic properties of the resulting penalized profile likelihood estimators. Without loss of generality, we assume that

θ_{0 j} = 0

for

j = s + 1, \dots, p + 2

, with the remaining

θ_{0 j} (j = 1, \dots, s)

being the non-zero components of

θ_{0}

. Similarly, we assume that

α_{0 k} (\cdot) \equiv 0

for

k = d + 1, \dots, q

and that the

α_{0 k} (\cdot), (k = 1, \dots, d)

constitute all the non-zero components of

α_{0} (\cdot)

.

3. Asymptotic Results

Denote

G_{n} = W_{n} M_{n}^{- 1}

,

R_{n} = G_{n} (X_{n} β_{0} + A_{0} + D_{n} μ_{0})

. The following assumptions are necessary prior to establishing the asymptotic properties.

C1: T is finite and greater than 2, and N is large.
C2: The disturbances ${ϵ_{i t}}$ for $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$ are independently and identically distributed with zero mean and finite variance $σ_{0}^{2}$ . Additionally, for some $v > 0$ , $E | ϵ_{i t} |^{4 + v}$ exists.
C3: The entries ${w_{i j}}$ of $W_{n}$ satisfy $w_{i i} = 0$ and $w_{i j} = O (1 / h_{n})$ , where $h_{n} / n \to 0$ as $n \to \infty$ .
C4: The matrix $M_{n} (ρ)$ is nonsingular for all $ρ$ in a compact parameter space $Λ$ . The sequences ${M_{n}^{- 1} (ρ)}$ are uniformly bounded in either row or column sums for all $ρ \in Λ$ . The true $ρ_{0}$ is in the interior of $Λ$ .
C5: The sequences of matrices ${W_{n}}$ and ${M_{n}^{- 1}}$ are uniformly bounded in both row and column sums.
C6: The elements of $X_{n}$ and $S_{n}$ are uniformly bounded for all n, and the limit ${lim}_{n \to \infty} n^{- 1} {(X_{n}, S_{n}, R_{n})}^{τ} J_{n} (X_{n}, S_{n}, R_{n})$ exists and is nonsingular.
C7: There exists a constant $λ_{c}$ , such that $λ_{c} I_{n} - Γ_{n} Γ_{n}^{τ}$ is positive semidefinite for all n, where $Γ_{n} = R_{n}, X_{n}, G_{n}$ .
C8: The limits ${lim}_{n \to \infty} E [n^{- 1} \partial^{2} ln L_{n} (η_{T}) / \partial η \partial η^{τ}]$ exist.
C9: Third derivatives $\partial^{3} ln L_{n} (η) / (\partial η_{i} \partial η_{j} \partial η_{k})$ exist for all $η$ in an open set $H$ that contains the parameter point $η_{T}$ . Furthermore, there exist functions $M_{i j k}$ , such that $| \partial^{3} ln L_{n} (η) / (\partial η_{i} \partial η_{j} \partial η_{k}) | \leq M_{i j k}$ for all $η \in H$ , where $E (M_{i j k}) < \infty$ for all $i, j, k$ .
C10: For $k = 1, \dots, q$ , $α_{k} (u) \in C^{r} (0, 1)$ , where $r \geq 2$ . The distribution of $u_{i t}$ is absolutely continuous, and its density is bounded away from zero and infinity on $[0, 1]$ .
C11: Let $π_{1}, \dots, π_{K_{n}}$ denote the interior knots within the interval $[0, 1]$ and $π_{0} = 0$ , $π_{K_{n} + 1} = 1$ . Define $ϱ_{i} = π_{i} - π_{i - 1}$ . There exists a constant C, such that $max ϱ_{i} / min ϱ_{i} \leq C$ and $max {ϱ_{i}} = O (K_{n}^{- 1})$ .
C12: The knot number $K_{n}$ is assumed to satisfy $K_{n} = n^{\frac{1}{2 r + 1}}$ .
C13: Let $b_{n} = {max}_{j, k} \{p_{λ_{1 j}}^{″} (| θ_{0 j} |), p_{λ_{2 k}}^{″} (∥ γ_{0 k} ∥_{H}) : | θ_{0 j} | \neq 0, ∥ γ_{0 k} ∥_{H} \neq 0\}$ . Then, $b_{n} \to 0$ , as $n \to \infty$ .
C14: For $j = s + 1, \dots, p$ and $k = d + 1, \dots, q$ , $lim {inf}_{n \to \infty} lim {inf}_{θ_{j} \to 0^{+}} λ_{1 j}^{- 1} p_{λ_{1 j}}^{'} (| θ_{j} |) > 0$ and $lim {inf}_{n \to \infty} lim {inf}_{∥ γ_{k} ∥_{H} \to 0^{+}} λ_{2 k}^{- 1} p_{λ_{2 k}}^{'} (∥ γ_{k} ∥_{H}) > 0$ hold.

Remark 1.

C1 excludes the case where T approaches infinity. Essentially, this assumption implies a setting (large N, small T) that aligns with many spatial data studies. The scenarios in which only T increases indefinitely, or in which both N and T go to infinity, closely resemble the scenario where only N goes to infinity. C2 is needed to apply the central limit theorem in Kelejian and Prucha [33]. C3–C5 are analogous to Assumption 2 in Lee [31], which focuses on the properties of the spatial weight matrix

W_{n}

and is essential for the identifiability of ρ. Specifically,

Λ = (- 1, 1)

when

W_{n}

satisfies

\sum_{j = 1}^{n} w_{i j} = 1

for all i. C6–C9 are applied for asymptotic normality of the profile QMLE. C10–C12 facilitate achieving the optimal convergence rate for

{\hat{α}}_{k} (u)

. He et al. [34] suggested that cubic B-splines are adequate for accurately approximating nonparametric functions, with the number of interior knots set to the integer part of

n^{1 / 5}

. Meanwhile, C13 and C14 present assumptions about the penalty function, which are comparable to those utilized in [16,18,19].

Due to the projection matrix

J_{n}

, a portion of the degrees of freedom is lost; therefore, the estimator

{\hat{σ}}^{2}

derived from (9) is not a consistent estimator of

σ_{0}^{2}

. A correction is needed. Let

σ_{T}^{2} = (T - 1) σ_{0}^{2} / T, θ_{T} = {(σ_{T}^{2}, ρ_{0}, β_{0}^{τ})}^{τ}

and

η_{T} = {(θ_{T}^{τ}, γ_{0}^{τ})}^{τ}

. Under the assumptions, the subsequent theorem establishes the consistency property of the penalized profile QMLE.

Theorem 1.

Suppose that Assumptions C1–C12 hold; then, we have

∥ {\hat{α}}_{k} (\cdot) - α_{0 k} (\cdot) ∥ = O_{p} (n^{- r / (2 r + 1)} + a_{n}), k = 1, \dots, q,

where

a_{n} = {max}_{j, k} \{p_{λ_{1 j}}^{'} (| θ_{0 j} |), p_{λ_{2 k}}^{'} (∥ γ_{0 k} ∥_{H}) : | θ_{0 j} | \neq 0, ∥ γ_{0 k} ∥_{H} \neq 0\} .

Furthermore, under some conditions, we show that such consistent estimators must possess the sparsity property, which is stated as follows.

Theorem 2.

Suppose that Assumptions C1–C14 hold, and let

λ_{max} = {max}_{j, k} \{λ_{1 j}, λ_{2 k}\}

and

λ_{min} = {min}_{j, k} \{λ_{1 j}, λ_{2 k}\}

. If

λ_{max} \to 0

and

n^{r / (2 r + 1)} λ_{min} \to \infty

as

n \to \infty

, then, with probability tending to 1,

\hat{β}

and

\hat{α} (\cdot)

must satisfy

(i): ${\hat{β}}_{j} = 0, j = s + 1, \dots, p .$
(ii): ${\hat{α}}_{k} (\cdot) \equiv 0, k = d + 1, \dots, q .$

According to Remark 1 in Fan and Li [16], if

λ_{max} \to 0

as

n \to \infty

, then

a_{n} = 0

. Consequently, based on Theorems 1 and 2, it becomes evident that by selecting appropriate tuning parameters, our variable selection approach is consistent. Furthermore, the estimators of the nonparametric components attain the optimal convergence rate as if the subset of true zero coefficients were already known [35]. Subsequently, we will demonstrate that the estimators for the non-zero coefficients in the parametric components share the same asymptotic distribution as those derived from the correct submodel. Let

{\hat{θ}}^{*} = {({\hat{θ}}_{1}, \dots, {\hat{θ}}_{s})}^{τ},

θ_{T}^{*} = {(θ_{T 1}, \dots, θ_{T s})}^{τ}

. The following result states the asymptotic normality of

{\hat{θ}}^{*}

.

Theorem 3.

Suppose that Assumptions C1–C14 and the conditions in Theorem 2 hold, then

\sqrt{n} ({\hat{θ}}^{*} - θ_{T}^{*}) \to N (0, \frac{T}{T - 1} \{Ω^{- 1} + Ω^{- 1} [Ψ_{θ}^{*} - 2 Σ_{θ γ}^{* τ} {(Σ_{γ}^{*})}^{- 1} Ψ_{θ}^{*}] Ω^{- 1}\}),

where

Ω, Ψ_{θ}^{*}, Σ_{γ}^{*}, Σ_{θ γ}^{*}

are defined in Appendix A.

4. Some Issues in Practice

In the practical situation, we have to choose the proper tuning parameters

(λ_{1}, λ_{2})

and provide an effective calculation algorithm. In this section, we discuss these practical issues.

4.1. Selection of Tuning Parameters

The tuning parameters

λ_{1 j}

and

λ_{2 k}

must be chosen. In practice, we suggest taking

λ_{1 j} = λ_{1} / | β_{j}^{(0)} |

and

λ_{2 k} = λ_{2} / {∥ γ_{k}^{(0)} ∥}_{H}

, and the pair

(λ_{1}, λ_{2})

is derived by minimizing the following BIC-type criterion:

B I C (λ_{1}, λ_{2}) = - 2 ln L_{n} ({\hat{η}}_{λ_{1}, λ_{2}}) + d f_{λ_{1}, λ_{2}} \times ln n,

where

{\hat{η}}_{λ_{1}, λ_{2}}

is the

\hat{η}

derived for given

λ_{1}, λ_{2}

, and

d f_{λ_{1}, λ_{2}}

is the number of non-zero elements of both

\hat{θ}

and

(∥ {\hat{γ}}_{1} ∥_{H}, \dots, ∥ {\hat{γ}}_{q} ∥_{H})

.
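The BIC-type criterion is straightforward to evaluate once a penalized fit is available. The helper below is a sketch with hypothetical names; it assumes the profile log quasi-likelihood value and the fitted coefficients are supplied by some fitting routine, and it treats entries below a small tolerance as zero.

```python
import numpy as np

def bic(loglik_value, theta_hat, gamma_hat_blocks, H, n, tol=1e-8):
    """Return (BIC, df) for one pair (lambda_1, lambda_2).

    df counts the non-zero entries of theta_hat plus the number of spline
    blocks gamma_k whose H-norm is non-zero.
    """
    n_theta = int(np.sum(np.abs(theta_hat) > tol))
    norms = np.array([np.sqrt(g @ H @ g) for g in gamma_hat_blocks])
    n_func = int(np.sum(norms > tol))
    df = n_theta + n_func
    return -2.0 * loglik_value + df * np.log(n), df

# Toy inputs, purely illustrative: three non-zero parameters and one non-zero function.
theta_hat = np.array([0.8, 0.3, 2.0, 0.0])
gamma_blocks = [np.ones(3), np.zeros(3)]
value, df = bic(-100.0, theta_hat, gamma_blocks, np.eye(3), n=200)
```

In practice one would evaluate this over a grid of (λ_1, λ_2) pairs and keep the pair with the smallest value.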

4.2. Computational Algorithm

Given that

Q_{n} (η)

is not differentiable at the origin, the standard gradient method cannot be employed. Therefore, we have devised an iterative algorithm that relies on a local quadratic approximation of the penalty function

p_{λ} (\cdot)

, similar to the approach taken by Fan and Li [16]. Let

\begin{cases} f (η) = \frac{\partial ln L_{n} (η)}{\partial η}, \\ Γ (η) = diag \{0, \frac{p_{λ_{12}}^{'} (| θ_{2} |)}{| θ_{2} |}, \dots, \frac{p_{λ_{1, p + 2}}^{'} (| θ_{p + 2} |)}{| θ_{p + 2} |}, \frac{p_{λ_{21}}^{'} (∥ γ_{1} ∥_{H})}{∥ γ_{1} ∥_{H}} H, \dots, \frac{p_{λ_{2 q}}^{'} (∥ γ_{q} ∥_{H})}{∥ γ_{q} ∥_{H}} H\}, \\ U (η) = Γ (η) η, \\ Σ (η) = \frac{\partial^{2} ln L_{n} (η)}{\partial η \partial η^{τ}} . \end{cases}

Then, a feasible algorithm is as follows:

Step 1: Initialize $η^{(0)}$ .
Step 2: Update $η^{(m + 1)} = η^{(m)} - {[Σ (η^{(m)}) - n Γ (η^{(m)})]}^{- 1} [f (η^{(m)}) - n U (η^{(m)})]$ .
Step 3: Iterate Step 2 until convergence, and denote the final estimators as the penalized profile quasi-likelihood estimators.

The initial value

η^{(0)}

in Step 1 is obtained from the profile QMLE, i.e.,

\partial ln L_{n} (η^{(0)}) / \partial η = 0

.
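The local quadratic approximation idea can be illustrated on a deliberately simpler problem. The sketch below replaces the profile quasi-likelihood by an ordinary least-squares loss (a stand-in chosen for brevity, not the paper's objective) and applies SCAD to each coefficient. With a quadratic loss, the Newton step has the closed form β ← (X^τ X + n Γ)^{-1} X^τ y on the active set. All names, the zero threshold `eps`, and the toy data are assumptions.

```python
import numpy as np

def scad_deriv(w, lam, a=3.7):
    """SCAD derivative p'_lambda(w) for w >= 0."""
    return lam * ((w <= lam) + np.maximum(a * lam - w, 0.0) / ((a - 1) * lam) * (w > lam))

def lqa_scad_ls(X, y, lam, a=3.7, eps=1e-6, tol=1e-8, max_iter=200):
    """LQA iterations for 0.5 * ||y - X b||^2 + n * sum_j p_lambda(|b_j|)."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # unpenalized start, as in Step 1
    for _ in range(max_iter):
        active = np.abs(beta) > eps                    # coefficients not yet set to zero
        beta_new = np.zeros(p)
        if active.any():
            Xa = X[:, active]
            ba = np.abs(beta[active])
            Gamma = np.diag(scad_deriv(ba, lam, a) / ba)
            beta_new[active] = np.linalg.solve(Xa.T @ Xa + n * Gamma, Xa.T @ y)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    beta[np.abs(beta) <= eps] = 0.0
    return beta

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 5))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0])
y = X @ beta_true + 0.5 * rng.normal(size=n)
beta_hat = lqa_scad_ls(X, y, lam=0.2)
```

Coefficients that fall below the threshold are fixed at zero and dropped from later iterations, which is the usual way to handle the singularity of the penalty at the origin in this type of algorithm.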

5. Monte Carlo Simulations

In this section, we conduct Monte Carlo simulations to assess the finite sample performance of our proposed method. Following Li and Liang [19], we evaluate the estimator

\hat{θ}

using the generalized mean square error (GMSE), which is defined as follows:

GMSE = \frac{1}{n} {(\hat{θ} - θ_{T})}^{τ} [{(1, W_{n} Y, X)}^{τ} (1, W_{n} Y, X)] (\hat{θ} - θ_{T}) .

The performance of estimator

\hat{α} (\cdot)

is assessed using average square errors (ASE):

ASE = \{\frac{1}{S} \sum_{s = 1}^{S} \sum_{k = 1}^{q} {[{\hat{α}}_{k} (u_{s}) - α_{0 k} (u_{s})]}^{2}\},

where

u_{s}, s = 1, \dots, S

are the grid points at which the function

\hat{α} (\cdot)

is evaluated. In our simulation,

S = 200

is used.
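Both criteria are simple to compute. The following sketch uses hypothetical function names and assumes an equally spaced grid of S = 200 points on [0, 1] (the grid layout is not specified in the text); `design` stands for the matrix with columns (1, W_n Y, X).

```python
import numpy as np

def gmse(theta_hat, theta_T, design):
    """GMSE = (1/n) (theta_hat - theta_T)' design' design (theta_hat - theta_T)."""
    diff = theta_hat - theta_T
    return float(diff @ (design.T @ design) @ diff / design.shape[0])

def ase(alpha_hat, alpha_true, grid):
    """Average over grid points of the summed squared errors of the q functions."""
    A_hat = np.column_stack([f(grid) for f in alpha_hat])
    A_0 = np.column_stack([f(grid) for f in alpha_true])
    return float(np.mean(np.sum((A_hat - A_0) ** 2, axis=1)))

grid = np.linspace(0.0, 1.0, 200)
alpha_true = [lambda u: 2 * np.sin(2 * np.pi * u), lambda u: np.zeros_like(u)]
# Toy "estimate": the first function is off by a constant 0.1, the second is exact.
alpha_hat = [lambda u: 2 * np.sin(2 * np.pi * u) + 0.1, lambda u: np.zeros_like(u)]
print(ase(alpha_hat, alpha_true, grid))   # approximately 0.01
```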

The spatial matrix is generated from the following procedure. (i) Calculate

G_{N} = round (N^{0.8})

as the number of “groups” and

m = N^{0.2}

as the average number of individuals in each group. (ii) Generate the “group” size

n_{i} \sim U n i f o r m (0.8 m, 1.2 m) (i = 1, \dots, G_{N})

and adjust

n_{i}

so that it satisfies

\sum_{i = 1}^{G_{N}} n_{i} = N

. (iii) Construct the row-normalized matrices

W_{i} (i = 1, \dots, G_{N})

with zero for diagonal elements and

1 / (n_{i} - 1)

for other elements. (iv) Generate the final spatial matrix

W_{N} = diag {W_{1}, \dots, W_{G_{N}}}

.
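Steps (i) to (iv) can be sketched as below. The text does not say how the group sizes are rounded or adjusted, so the rounding to integers, the minimum group size of 2 (needed for 1/(n_i - 1) to be defined), and the adjustment loops are assumptions of this illustration.

```python
import numpy as np

def make_weight_matrix(N, rng):
    """Block-diagonal, row-normalized W_N following steps (i)-(iv); assumes N >= 2."""
    G = int(round(N ** 0.8))                 # (i) number of groups
    m = N ** 0.2                             #     average group size, as stated in the text
    # (ii) draw group sizes; rounding and the floor of 2 are assumptions of this sketch
    sizes = [max(2, int(s)) for s in np.rint(rng.uniform(0.8 * m, 1.2 * m, size=G))]
    while sum(sizes) > N:                    # shrink until the sizes add up to N
        i = int(np.argmax(sizes))
        if sizes[i] > 2:
            sizes[i] -= 1
        else:
            sizes.pop()                      # every group already has size 2: drop one
    while sum(sizes) < N:                    # otherwise grow the smallest group
        sizes[int(np.argmin(sizes))] += 1
    W = np.zeros((N, N))
    start = 0
    for s in sizes:                          # (iii)-(iv) fill the diagonal blocks
        W[start:start + s, start:start + s] = (np.ones((s, s)) - np.eye(s)) / (s - 1)
        start += s
    return W

W_N = make_weight_matrix(30, np.random.default_rng(0))
```

Each block has a zero diagonal and equal off-diagonal weights, so every row of W_N sums to one by construction.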

Data are generated from model (2), and we construct two examples to simulate different scenarios. Example 1 is a regular example that is primarily used to verify the asymptotic properties of penalized profile likelihood estimators. Various sample sizes, degrees of spatial dependence, distributions of disturbances, and variances in disturbances are considered. Example 2 focuses on examining the performance of the proposed method in cases where the sample size is large, and the dimensions of parametric and nonparametric components are high. Unlike Example 1, Example 2 includes covariates with AR structure and different functions. All simulation results are obtained based on 500 repetitions.

5.1. Example 1: Regular Scenario

Let

β_{0} = {(2, - 1, 0_{5})}^{τ}

,

α_{0} (u) = {(2 sin (2 π u), 2 cos (2 π u), 0_{5})}^{τ}

. To simulate different spatial degrees, we choose different

ρ_{0} \in {0, 0.3, 0.7}

. We take two distribution settings for

ε_{i t}

: (i)

ε_{i t} \sim N (0, σ_{0}^{2})

and (ii)

ε_{i t} \sim \sqrt{σ_{0}^{2} / 1.5} \cdot t (6)

, with

σ_{0}^{2} \in {1, 2}

. To perform this simulation, we take the covariates

X_{i t} \sim N (0, I_{7}), Z_{i t} \sim N (0, I_{7})

and

u_{i t} \sim U (0, 1)

. The fixed effect

μ_{0 i}, i = 1, \dots, N

is generated from

U (0, 1)

. We use cubic B-splines in all simulations and consider

N = 30, 60; T = 10, 15

, respectively.
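One replication of this design could be generated along the following lines. The function name and the ring-shaped weight matrix in the usage lines are hypothetical; the response is obtained from the reduced form Y_n = M_n^{-1}(X_n β_0 + A_0 + D_n μ_0 + V_n) implied by (3).

```python
import numpy as np

def simulate_example1(N, T, rho0, sigma2, W_N, rng, t_errors=False):
    """Draw one panel from model (2) under the Example 1 settings."""
    n = N * T
    beta0 = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    X = rng.normal(size=(n, 7))
    Z = rng.normal(size=(n, 7))
    u = rng.uniform(size=n)
    alpha = np.zeros((n, 7))                       # alpha_0(u_it); last five functions are zero
    alpha[:, 0] = 2 * np.sin(2 * np.pi * u)
    alpha[:, 1] = 2 * np.cos(2 * np.pi * u)
    mu = rng.uniform(size=N)                       # fixed effects mu_0i ~ U(0, 1)
    if t_errors:                                   # t(6) has variance 1.5, hence the rescaling
        V = np.sqrt(sigma2 / 1.5) * rng.standard_t(6, size=n)
    else:
        V = rng.normal(scale=np.sqrt(sigma2), size=n)
    W_n = np.kron(np.eye(T), W_N)
    rhs = X @ beta0 + np.sum(Z * alpha, axis=1) + np.tile(mu, T) + V
    Y = np.linalg.solve(np.eye(n) - rho0 * W_n, rhs)
    return Y, X, Z, u, W_n, V, mu

# Usage with a hypothetical ring-shaped weight matrix.
N, T = 30, 10
W_N = 0.5 * (np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1))
Y, X, Z, u, W_n, V, mu = simulate_example1(N, T, 0.3, 1.0, W_N, np.random.default_rng(0))
```

Here `np.tile(mu, T)` plays the role of D_n μ_0 because observations are stacked period by period.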

We compare the performance of the variable selection procedure based on the SCAD penalty (SCAD) proposed by this paper with that based on the adaptive LASSO (ALASSO) penalty [17]. Table 1 and Table 2 report the effects of variable selection under normal disturbance. The column labeled “C” indicates the average number of true zeros correctly set to zero, while the column labeled “I” shows the average number of true non-zeros incorrectly set to zero. The row labeled “Oracle” refers to the oracle estimators computed using the true model when the zero coefficients are known.

Table 1. Variable selection under

ϵ_{i t} \sim N (0, σ_{0}^{2})

in Example 1

(T = 10)

.

Table 2. Variable selection under

ϵ_{i t} \sim N (0, σ_{0}^{2})

in Example 1

(T = 15)

.

From Table 1 and Table 2, we can draw the following conclusions. (i) As

n = N \times T

increases, the performance of all variable selection methods, both parametric and nonparametric, converges towards the oracle procedure in terms of model error and complexity. (ii) For the parametric component, SCAD-based variable selection outperforms ALASSO-based methods, while for the nonparametric component, the reverse is true. However, overall, the effects of variable selection based on these two penalty functions are satisfactory and comparable. (iii) When the true parameter

ρ_{0} = 0

, reducing the model to a non-spatial varying coefficient panel model, the proposed variable selection methods can accurately identify the true model.

Table A1 in Appendix B reports the effect of variable selection under t disturbance. We can see that the conclusions drawn from Table 1 and Table 2 also hold for Table A1. Comparing Tables 1 and 2 with Table A1, the GMSE and ASE under t disturbance are slightly larger than those under normal disturbance, but the difference is negligible, which suggests that the proposed variable selection procedure is robust to the error distribution.

5.2. Example 2: High-Dimension Scenario

Let

N \times T = 100 \times 20

,

β_{0} = {(3, - 1.5, 2, 1, 0_{46})}^{τ}

, and

α_{0} (u) = {(2 sin (2 π u), 4 u (1 - u) (u - 3), ln (16 u^{2} - 1), 0_{12})}^{τ}

. The covariates

{(X_{i t}^{τ}, Z_{i t}^{τ})}^{τ}

are generated from

N (0, {AR}_{0.5})

, where

{AR}_{0.5} = [\begin{matrix} 1 & 0.5 & \dots & 0.5^{64} \\ 0.5 & 1 & \dots & 0.5^{63} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 0.5^{64} & 0.5^{63} & \dots & 1 \end{matrix}] .

To save space, we only considered the case where the disturbance term follows a normal distribution in this example. The remaining settings are identical to those in Example 1.
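The AR-type covariance has (i, j) entry 0.5^{|i - j|}, so it is easy to build directly. The sketch below assumes the covariates are stacked as 50 linear and 15 varying coefficient covariates, which is the dimension implied by β_0 and α_0(u) above; the sample size of 2000 corresponds to N × T = 100 × 20.

```python
import numpy as np

dim = 50 + 15                                        # dim(X_it) + dim(Z_it), implied by beta_0 and alpha_0
idx = np.arange(dim)
AR = 0.5 ** np.abs(idx[:, None] - idx[None, :])      # entry (i, j) is 0.5^{|i - j|}

rng = np.random.default_rng(0)
XZ = rng.multivariate_normal(np.zeros(dim), AR, size=2000)
X, Z = XZ[:, :50], XZ[:, 50:]                        # split into linear and varying parts
```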

Table A2 in Appendix B presents the outcomes of Example 2. As depicted in Table A2, the proposed procedures demonstrate robust performance despite the substantial dimensions of both the parametric and nonparametric components.

6. A Real Example

In this section, we use China’s provincial carbon emission panel dataset to illustrate our proposed methods. The dataset contains 14 annual variables of 30 provinces in China from 2007 to 2019. The model used is an extension of the STIRPAT model of Dietz and Rosa [36], which is specified as

ln y_{i t} = ρ_{0} \sum_{j = 1}^{30} w_{i j} ln y_{j t} + \sum_{j = 1}^{8} β_{0 j} ln x_{i t j} + \sum_{k = 1}^{6} α_{0 k} (u_{i t}) ln z_{i t k} + μ_{0 i} + ϵ_{i t}

(10)

for

i = 1, \dots, 30; t = 1, \dots, 13

. The corresponding variables are described in Table 3. Such a model aims to analyze the social factors that have an impact on carbon dioxide emissions, especially the affluence factor. The spatial weight matrix

W_{N} = (w_{i j})

is specified by contiguity rules; that is, let

w_{i j} = \begin{cases} 1, & \text{provinces } i \text{ and } j \text{ share a common geographic boundary}, \\ 0, & \text{otherwise}. \end{cases}

Table 3. Description of related variables.

Then, adjust

w_{i j}

so that the diagonal elements of

W_{N}

are 0 and the row sums are 1.
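The row normalization step amounts to dividing each row of the binary contiguity matrix by its number of neighbours. The four-region adjacency pattern below is hypothetical and serves only to illustrate the operation.

```python
import numpy as np

# Hypothetical contiguity pattern for four regions arranged in a line: 0-1-2-3.
C = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
np.fill_diagonal(C, 0.0)                         # no region is its own neighbour
row_sums = C.sum(axis=1, keepdims=True)
# Divide each row by its neighbour count; rows without neighbours stay at zero.
W_N = np.divide(C, row_sums, out=np.zeros_like(C), where=row_sums > 0)
```

Regions without neighbours would keep a zero row under this sketch; how such cases are treated in the actual dataset is not stated in the text.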

In previous studies, the affluence factor has consistently shown significantly positive effects on carbon dioxide emissions, and these effects have been treated as constant; see [37,38]. This phenomenon can be described by the following mechanism: among the three major industries, the secondary industry exhibits the highest dependence on energy consumption, followed by the tertiary industry. However, these two industries are also the pillar industries that drive economic development, implying a significant reliance of economic growth on energy consumption. Currently, fossil fuels constitute the primary energy source in the world, and their consumption leads to a substantial release of carbon dioxide. As indicated by existing research analyses, economic development contributes significantly to carbon emissions. It is well known that, among fossil fuels, coal consumption generates the most carbon dioxide, suggesting that the energy structure may directly influence the impact of economic development on carbon emissions. However, considering the significant variations in energy structures across provinces and over different years, it may be unreasonable to measure this impact using a constant. Therefore, we wish to capture the potential heterogeneity of this impact using the coal consumption proportion (CCP), which is why we set the CCP as

u_{i t}

and six variables describing affluence as

z_{i t 1}, \dots, z_{i t 6}

.

Table 4 reports the estimated coefficient parameters, and Figure 1 depicts the estimated coefficient functions. There is not much difference between the results for SCAD and ALASSO, so we focus on the SCAD results in the following analysis. The spatial coefficient

\hat{ρ}

is not shrunk to zero and is positive, which means that neighboring provinces exhibit a substantial positive spatial correlation. The constant coefficients

{\hat{β}}_{2}, {\hat{β}}_{3}, {\hat{β}}_{4}

, and

{\hat{β}}_{6}

are positive: on average, when EI, RDF, FIR, and FV each change by 1%, CE changes by 0.35%, 0.087%, 0.245%, and 0.1%, respectively.

{\hat{β}}_{7}

is the only negative constant coefficient, which implies that carbon emissions tend to decrease as the public transportation passenger volume increases. Public transportation is a low-carbon mode of travel: the more people choose it, the fewer use private cars, which reduces carbon emissions. This explains the negative coefficient

{\hat{β}}_{7}

. The remaining coefficients are estimated to be zero, which means that the corresponding variables are excluded through variable selection. In addition, most of the coefficient functions in Figure 1 show an upward trend, which means that as the coal consumption proportion increases, the contribution of the affluence factors to carbon emissions increases. Meanwhile, two coefficient functions are removed through variable selection. This is generally consistent with the previous analysis in this section.

Table 4. Penalized estimators for the parametric components.

Figure 1. The estimated varying coefficient functions based on SCAD and ALASSO. (a) Varying coefficient derived from SCAD; (b) varying coefficient derived from ALASSO.

7. Discussion and Conclusions

Within the context of SVCSAR panel models with fixed effects, we developed a variable selection procedure that combines basis function approximations with the profile quasi-likelihood method. This approach allows for the simultaneous selection of significant variables in both the parametric and nonparametric components, as well as the estimation of the unknown coefficients. With appropriately chosen tuning parameters, we demonstrated that the selection procedure is consistent and that the estimators of the constant coefficients have the oracle property. Simulation results highlight the effectiveness of the proposed method in selecting variables and estimating both constant and varying coefficients. In this study, we assume that the dimensions of the covariates X and Z are fixed, although the simulations in Section 5 also yielded satisfactory results when the dimensions p and q were large. However, it is worth noting that our variable selection procedure may not be applicable to scenarios involving ultra-high-dimensional covariates. Exploring variable selection techniques for SVCSAR panel models with ultra-high-dimensional covariates is therefore an interesting direction for future research.

Author Contributions

Conceptualization, R.T. and M.X.; Methodology, R.T., M.X. and D.X.; Software, M.X.; Formal analysis, R.T. and M.X.; Writing—original draft, R.T., M.X. and D.X.; Supervision, R.T. All authors have read and agreed to the published version of the manuscript.

Funding

R.T.’s work is supported by the Zhejiang Provincial Philosophy and Social Sciences Planning Project (No. 24NDJC014YB). D.X.’s work is supported by the Zhejiang Provincial Natural Science Foundation of China (No. LY23A010013).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of Theorems

The following list summarizes some frequently used facts in the appendix.

Fact A1.

Let

M_{1}, M_{2}

be

n \times n

symmetric matrices, with

M_{2}

being positive semidefinite. Then,

λ_{i} (M_{1}) \leq λ_{i} (M_{1} + M_{2})

for

i = 1, \dots, n

, where

λ_{i}

denotes the i-th eigenvalue.
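Fact A1 can be checked numerically on a small example. The sketch below assumes that the eigenvalues of both matrices are sorted in the same (here ascending) order, which is how the index i is interpreted; the matrices are randomly generated and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# M1: arbitrary symmetric matrix; M2: positive semidefinite (B B^T).
A = rng.standard_normal((n, n))
M1 = (A + A.T) / 2.0
B = rng.standard_normal((n, 3))
M2 = B @ B.T

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order,
# so the i-th entries of the two arrays are directly comparable.
eig_m1 = np.linalg.eigvalsh(M1)
eig_sum = np.linalg.eigvalsh(M1 + M2)

gaps = eig_sum - eig_m1  # Fact A1 states that every gap is nonnegative
```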

Fact A2.

Let

M_{1}

and

M_{2}

be

n \times n

matrices that are uniformly bounded in either row or column sums; then, the entries of their product

M_{1} M_{2}

are also uniformly bounded.

Before proving the main theorems, we supplement several lemmas.

Lemma A1.

Let

δ_{n} = A_{0} - S_{n} γ_{0}

, supposing that Assumptions C1–C12 hold. Then,

\frac{1}{\sqrt{n}} H_{n}^{τ} J_{n} δ_{n} = o_{p} (1),

for

H_{n} = V_{n}, A_{0}, X_{n}

and

W_{n} Y_{n}

.

Proof.

The proof is similar to that of Lemma 1 in Tian et al. [32]. □

Lemma A2.

Suppose that Assumptions C1–C4 hold, then

(I) G_{n} is uniformly bounded in either row or column sums; (II) \frac{tr G_{n}}{n} = O (1); (III) \frac{tr (G_{n}^{τ} G_{n})}{n} = O (1); (IV) \frac{tr J_{n}}{n} = \frac{T - 1}{T}; (V) \frac{tr (G_{n}^{τ} J_{n})}{n} = \frac{T - 1}{T} \frac{tr G_{n}}{n}; (VI) \frac{tr (G_{n}^{τ} J_{n} G_{n})}{n} = \frac{T - 1}{T} \frac{tr (G_{n}^{τ} G_{n})}{n} .

Proof.

The proof is similar to that of Lemma 3 in Tian et al. [32] and is therefore omitted. □
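The trace identities (IV)–(VI) of Lemma A2 can be verified numerically on a toy example. The sketch below rests on assumptions that are not restated in this appendix: the observations are stacked period by period, the within transformation has the Kronecker form J_n = (I_T - ι ι^τ / T) ⊗ I_N, and G_n = W_n (I_n - ρ W_n)^{-1} with W_n = I_T ⊗ W_N. The circular weight matrix and the values of N, T, and ρ are illustrative only.

```python
import numpy as np

N, T, rho = 5, 4, 0.3
n = N * T

# Row-normalized "circular neighbors" weight matrix (toy choice).
W_N = np.zeros((N, N))
for i in range(N):
    W_N[i, (i - 1) % N] = 0.5
    W_N[i, (i + 1) % N] = 0.5

# Assumed stacking: observations ordered by period, then by unit.
W_n = np.kron(np.eye(T), W_N)
J_n = np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))  # within transform
G_n = W_n @ np.linalg.inv(np.eye(n) - rho * W_n)

ratio = (T - 1) / T
lhs_iv = np.trace(J_n) / n                    # left side of (IV)
lhs_v = np.trace(G_n.T @ J_n) / n             # left side of (V)
lhs_vi = np.trace(G_n.T @ J_n @ G_n) / n      # left side of (VI)
rhs_v = ratio * np.trace(G_n) / n
rhs_vi = ratio * np.trace(G_n.T @ G_n) / n
```

Under these assumptions, the identities hold exactly rather than asymptotically, because the trace of a Kronecker product factorizes into the product of the traces.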

Lemma A3.

Suppose that Assumptions C1–C12 hold, then

\frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial η} = O_{p} (1) .

Proof.

The first-order partial derivatives of the profile likelihood function are

\begin{matrix} \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial σ^{2}} & = - \frac{\sqrt{n}}{2 σ_{T}^{2}} + \frac{1}{2 σ_{T}^{4} \sqrt{n}} V_{n}^{τ} J_{n} V_{n} + o_{p} (1) \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial ρ} & = - \frac{1}{\sqrt{n}} tr G_{n} + \frac{1}{σ_{T}^{2} \sqrt{n}} R_{n}^{τ} J_{n} V_{n} + \frac{1}{σ_{T}^{2} \sqrt{n}} V_{n}^{τ} G_{n}^{τ} J_{n} V_{n} + o_{p} (1) \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial β} & = \frac{1}{σ_{T}^{2} \sqrt{n}} X_{n}^{τ} J_{n} V_{n} + o_{p} (1) \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial γ} & = \frac{1}{σ_{T}^{2} \sqrt{n}} S_{n}^{τ} J_{n} V_{n} + o_{p} (1) . \end{matrix}

(A1)

The variance of

n^{- 1 / 2} \partial ln L_{n} (η_{T}) / \partial σ^{2}

is

\begin{matrix} var (\frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial σ^{2}}) & = \frac{1}{4 σ_{T}^{8} n} var (V_{n}^{τ} J_{n} V_{n}) + o (1) \\ = \frac{1}{4 σ_{T}^{8}} \{\frac{{(T - 1)}^{2} (μ_{4}^{*} - 3 σ_{0}^{4})}{T^{2}} + \frac{2 (T - 1) σ_{0}^{4}}{T}\} + o (1) \\ = O (1) \end{matrix} .

Therefore,

n^{- 1 / 2} \partial ln L_{n} (η_{T}) / \partial σ^{2} = O_{p} (1)

. The variance of

n^{- 1 / 2} \partial ln L_{n} (η_{T}) / \partial ρ

is

\begin{matrix} var (\frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial ρ}) & \leq \frac{2}{σ_{T}^{4} n} [var (R_{n}^{τ} J_{n} V_{n}) + var (V_{n}^{τ} G_{n}^{τ} J_{n} V_{n})] + o (1) \\ & \leq \frac{2}{σ_{T}^{4}} [σ_{0}^{2} λ_{c} + \frac{{(T - 1)}^{2} (μ_{4}^{*} - 3 σ_{0}^{4})}{T^{2}} + \frac{(T - 1) σ_{0}^{4} tr [(G_{n} + G_{n}^{τ}) G_{n}]}{T}] \\ & = O (1) . \end{matrix}

Thus,

n^{- 1 / 2} \partial ln L_{n} (η_{T}) / \partial ρ = O_{p} (1)

. Since the elements of

X_{n}

and

S_{n}

are uniformly bounded for all n, as stated in C6, it follows that

\frac{1}{σ_{T}^{2} \sqrt{n}} X_{n}^{τ} J_{n} V_{n} = O_{p} (1), \frac{1}{σ_{T}^{2} \sqrt{n}} S_{n}^{τ} J_{n} V_{n} = O_{p} (1) .

□
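The bracketed term in the variance of the score with respect to σ^{2} can be reproduced from the standard variance formula for quadratic forms in i.i.d. errors. The sketch below is illustrative: it assumes the Kronecker form J_n = (I_T - ι ι^τ / T) ⊗ I_N for the within transformation, the moment values are arbitrary, the function name is hypothetical, and the leading factor 1 / (4 σ_{T}^{8}) is omitted.

```python
import numpy as np

def quad_form_variance(A, sigma2, mu4):
    """Variance of v' A v for i.i.d. mean-zero v_i with variance sigma2
    and fourth moment mu4 (standard result for quadratic forms)."""
    diag_sq = np.sum(np.diag(A) ** 2)
    return (mu4 - 3.0 * sigma2**2) * diag_sq + sigma2**2 * np.trace(A @ A.T + A @ A)

N, T = 5, 4
n = N * T
J_n = np.kron(np.eye(T) - np.ones((T, T)) / T, np.eye(N))  # assumed form of J_n

sigma2, mu4 = 1.5, 9.0  # illustrative moments, not taken from the paper
var_exact = quad_form_variance(J_n, sigma2, mu4) / n
var_paper = (T - 1) ** 2 * (mu4 - 3 * sigma2**2) / T**2 + 2 * (T - 1) * sigma2**2 / T
```

The two quantities agree because J_n is symmetric and idempotent with constant diagonal entries (T - 1) / T under the assumed form.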

Lemma A4.

Suppose that Assumptions C1–C12 hold, then

\frac{1}{n} \{\frac{\partial^{2} ln L_{n} (η_{T})}{\partial η \partial η^{τ}} - E [\frac{\partial^{2} ln L_{n} (η_{T})}{\partial η \partial η^{τ}}]\} = o_{p} (1) .

Proof.

The result follows from an argument similar to that used for Theorem 3.2 of Lee [31]. □

Proof of Theorem 1.

Let

Δ_{n} = a_{n} + n^{- r / (2 r + 1)}, θ = θ_{T} + Δ_{n} ζ_{1}, γ = γ_{0} + Δ_{n} ζ_{2}

and

ζ = {(ζ_{1}^{τ}, ζ_{2}^{τ})}^{τ}

. It is sufficient to show that, for any given

ε > 0

, there exists a large constant C such that

P \{sup_{∥ ζ ∥ = C} Q (θ, γ) < Q (θ_{T}, γ_{0})\} > 1 - ε .

Let

D (ζ) = Q (θ, γ) - Q (θ_{T}, γ_{0})

; then, with a simple calculation, we see that

\begin{matrix} D (ζ) = & Δ_{n} {(\frac{\partial ln L_{n} (η_{T})}{\partial η})}^{τ} ζ + \frac{1}{2} Δ_{n}^{2} ζ^{τ} \frac{\partial^{2} ln L_{n} (η_{T})}{\partial η \partial η^{τ}} ζ (1 + o_{p} (1)) \\ & + n \sum_{j = 2}^{p + 2} [p_{λ_{1 j}} (| θ_{j} |) - p_{λ_{1 j}} (| θ_{T j} |)] + n \sum_{k = 1}^{q} [p_{λ_{2 k}} (∥ γ_{k} ∥_{H}) - p_{λ_{2 k}} (∥ γ_{0 k} ∥_{H})] \\ \equiv & D_{1} + D_{2} + D_{3} + D_{4} . \end{matrix}

It follows by Lemma A3 and the Cauchy inequality that

\begin{matrix} | D_{1} | = |Δ_{n} {(\frac{\partial ln L_{n} (η_{T})}{\partial η})}^{τ} ζ| & \leq Δ_{n} \sqrt{n} ∥\frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial η}∥ \cdot ∥ ζ ∥ \\ = ∥ ζ ∥ O_{p} (\sqrt{n K_{n}} Δ_{n}) = ∥ ζ ∥ O_{p} (n Δ_{n}^{2}) . \end{matrix}

According to Lemma A4, a simple calculation yields

D_{2} = \frac{1}{2} Δ_{n}^{2} ζ^{τ} \frac{\partial^{2} ln L_{n} (η_{T})}{\partial η \partial η^{τ}} ζ (1 + o_{p} (1)) = - n Δ_{n}^{2} ζ^{τ} Σ_{n, η} ζ (1 + o_{p} (1)) .

Hence, by choosing a sufficiently large C,

D_{2}

dominates

D_{1}

uniformly in

∥ ζ ∥ = C

. Furthermore, invoking C13 and

p_{λ_{1 j}} (0) = 0

, and by the standard argument of the Taylor expansion, we obtain that

\begin{matrix} D_{3} & \leq n \sum_{j = 2}^{s} [p_{λ_{1 j}} (| θ_{j} |) - p_{λ_{1 j}} (| θ_{T j} |)] \\ \leq n \sum_{j = 2}^{s} [n Δ_{n} p_{λ_{1 j}}^{'} (| θ_{T j} |) sgn (θ_{T j}) | ζ_{1 j} | + n Δ_{n}^{2} p_{λ_{1 j}}^{″} (| θ_{T j} |) | ζ_{1 j} |^{2} (1 + o_{p} (1))] \\ \leq \sqrt{s} n Δ_{n} a_{n} ∥ ζ ∥ + n Δ_{n}^{2} b_{n} {∥ ζ ∥}^{2} \\ = ∥ ζ ∥ O_{p} (n Δ_{n}^{2}) + ∥ ζ ∥ o_{p} (n Δ_{n}^{2}) . \end{matrix}

Then, it is easy to show that

D_{3}

is dominated by

D_{2}

uniformly in

∥ ζ ∥ = C

. With the same argument, we can prove that

D_{4}

is dominated by

D_{2}

. Hence, by choosing a sufficiently large C, the probability inequality stated at the beginning of the proof holds. This implies that a local maximizer

\hat{γ}

exists such that

∥ \hat{γ} - γ_{0} ∥ = O_{p} (Δ_{n}) = O_{p} (a_{n} + n^{- r / (2 r + 1)})

. Then,

∥ {\hat{γ}}_{k} - γ_{0 k} ∥ = O_{p} (a_{n} + n^{- r / (2 r + 1)})

. Let

δ_{k} = α_{0 k} (u) - B {(u)}^{τ} γ_{0 k}, k = 1, \dots, q

. Note that

\begin{matrix} {∥ {\hat{α}}_{k} (u) - α_{0 k} (u) ∥}^{2} & = \int_{0}^{1} {[{\hat{α}}_{k} (u) - α_{0 k} (u)]}^{2} d u \\ & = \int_{0}^{1} {[B^{τ} (u) {\hat{γ}}_{k} - B^{τ} (u) γ_{0 k} - δ_{k}]}^{2} d u \\ & \leq 2 \int_{0}^{1} {[B^{τ} (u) ({\hat{γ}}_{k} - γ_{0 k})]}^{2} d u + 2 \int_{0}^{1} δ_{k}^{2} d u \\ & = 2 {({\hat{γ}}_{k} - γ_{0 k})}^{τ} H ({\hat{γ}}_{k} - γ_{0 k}) + 2 \int_{0}^{1} δ_{k}^{2} d u, \end{matrix}

and by invoking

∥ H ∥ = O (1)

, a simple calculation yields

{({\hat{γ}}_{k} - γ_{0 k})}^{τ} H ({\hat{γ}}_{k} - γ_{0 k}) = O_{p} (a_{n}^{2} + n^{- 2 r / (2 r + 1)}) .

Together with

\int_{0}^{1} δ_{k}^{2} d u = O_{p} (n^{- 2 r / (2 r + 1)}),

the proof is completed. □

Proof of Theorem 2.

We begin by proving part (i). Given

λ_{max} \to 0

, it follows easily that

a_{n} = 0

for large n. Then, according to Theorem 1, it suffices to show that for any

θ_{j}

satisfying

| θ_{j} - θ_{0 j} | = O_{p} (n^{- r / (2 r + 1)}), j = 1, \dots, s

and any

γ

satisfying

∥ γ - γ_{0} ∥ = O_{p} (n^{- r / (2 r + 1)})

, with some given small

ε = C n^{- r / (2 r + 1)}

, as

n \to \infty

, the probability tends to 1 that

\frac{\partial Q_{n} (η)}{\partial θ_{j}} < 0, for 0 < θ_{j} < ε, j = s + 1, \dots, p + 2,

(A2)

and

\frac{\partial Q_{n} (η)}{\partial θ_{j}} > 0, for - ε < θ_{j} < 0, j = s + 1, \dots, p + 2 .

(A3)

Thus, (A2) and (A3) imply that the maximizer of

Q_{n} (η)

is attained at

θ_{j} = 0, j = s + 1, \dots, p + 2

.

Using a similar argument to the proof of Theorem 1, we obtain that

\begin{matrix} \frac{\partial Q_{n} (η)}{\partial θ_{j}} = & \frac{\partial ln L_{n} (η)}{\partial θ_{j}} - n p_{λ_{1 j}}^{'} (| θ_{j} |) sgn (θ_{j}) \\ = & \frac{\partial ln L_{n} (η_{T})}{\partial θ_{j}} + \sum_{k = 1} \frac{\partial^{2} ln L_{n} (η_{T})}{\partial θ_{j} \partial θ_{k}} (η_{k} - η_{T k}) \\ & + \sum_{k = 1} \sum_{l = 1} \frac{\partial^{3} ln L_{n} (η^{*})}{\partial θ_{j} \partial θ_{k} \partial θ_{l}} (η_{k} - η_{T k}) (η_{l} - η_{T l}) - n p_{λ_{1 j}}^{'} (| θ_{j} |) sgn (θ_{j}), \end{matrix}

where

η^{*}

lies between

η

and

η_{T}

. From Lemmas A1 and A2 and assumption C9, we have

\begin{matrix} \frac{1}{n} \frac{\partial ln L_{n} (η_{T})}{\partial θ_{j}} = O_{p} (n^{- \frac{1}{2}}), \\ \frac{1}{n} \frac{\partial^{2} ln L_{n} (η_{T})}{\partial η \partial η^{τ}} = E [\frac{1}{n} \frac{\partial^{2} ln L_{n} (η_{T})}{\partial η \partial η^{τ}}] + o_{p} (1), \\ \frac{1}{n} \frac{\partial^{3} ln L_{n} (η_{T})}{\partial θ_{j} \partial η_{k} \partial η_{l}} = O_{p} (1) . \end{matrix}

Then,

\begin{matrix} \frac{\partial Q_{n} (η)}{\partial θ_{j}} = & n λ_{1 j} \{λ_{1 j}^{- 1} \frac{1}{n} \frac{\partial ln L_{n} (η_{T})}{\partial θ_{j}} + λ_{1 j}^{- 1} \frac{1}{n} \sum_{k = 1} \frac{\partial^{2} ln L_{n} (η_{T})}{\partial θ_{j} \partial θ_{k}} (η_{k} - η_{T k}) \\ + λ_{1 j}^{- 1} \frac{1}{n} \sum_{k = 1} \sum_{l = 1} \frac{\partial^{3} ln L_{n} (η^{*})}{\partial θ_{j} \partial θ_{k} \partial θ_{l}} (η_{k} - η_{T k}) (η_{l} - η_{T l}) - λ_{1 j}^{- 1} p_{λ_{1 j}}^{'} (| θ_{j} |) sgn (θ_{j})\} \\ = & n λ_{1 j} \{λ_{1 j}^{- 1} O_{p} (n^{\frac{- r}{2 r + 1}}) - λ_{1 j}^{- 1} p_{λ_{1 j}}^{'} (| θ_{j} |) sgn (θ_{j})\} . \end{matrix}

Since

{lim}_{n \to \infty} lim {inf}_{θ_{j} \to 0} λ_{1 j}^{- 1} p_{λ_{1 j}}^{'} (| θ_{j} |) > 0

and

n^{r / (2 r + 1)} λ_{1 j} > n^{r / (2 r + 1)} λ_{min} \to \infty

, the sign of the derivative is completely determined by that of

θ_{j}

; then, (A2) and (A3) hold.
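The condition on λ_{1 j}^{- 1} p_{λ_{1 j}}^{'} used in this step can be checked directly for the SCAD penalty. The following minimal sketch uses the standard SCAD derivative of Fan and Li [16]; the value a = 3.7 is the one suggested there and is an assumption here, since the value used in the computations of this paper is not restated in the appendix. The function name and the numerical inputs are illustrative.

```python
import numpy as np

def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty of Fan and Li [16] for theta >= 0."""
    theta = np.asarray(theta, dtype=float)
    inner = lam * (theta <= lam)  # constant slope lam on [0, lam]
    # Linearly decaying slope on (lam, a * lam], zero beyond a * lam.
    outer = np.maximum(a * lam - theta, 0.0) / (a - 1.0) * (theta > lam)
    return inner + outer

lam = 0.1
near_zero = scad_derivative(np.array([1e-6, 0.05, 0.1]), lam) / lam
far_from_zero = scad_derivative(np.array([0.5, 1.0, 2.0]), lam)
```

Near the origin, the ratio of the derivative to λ equals one, so the limit inferior condition holds, whereas the derivative vanishes for arguments larger than aλ, so nonzero coefficients are not penalized once λ is small enough.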

Applying arguments similar to those used for part (i), we find that, with probability tending to 1,

{\hat{γ}}_{k} = 0, k = d + 1, \dots, q

. Then, using the fact that

{sup}_{u} ∥ B (u) ∥ = O (1)

, the result of this theorem follows immediately from

{\hat{α}}_{k} (u) = B^{τ} (u) {\hat{γ}}_{k}

. □

Proof of Theorem 3.

Let

γ^{*} = {(γ_{1}^{τ}, \dots, γ_{d}^{τ})}^{τ}

,

γ_{0}^{*} = {(γ_{01}^{τ}, \dots, γ_{0 d}^{τ})}^{τ}

,

η^{*} = (θ^{*}, γ^{*})

, and

η_{T}^{*} = (θ_{T}^{*}, γ_{0}^{*})

. In order to obtain the asymptotic distribution, we first write the components of

n^{- 1 / 2} \partial ln L_{n} (η_{T}) / \partial η

as follows:

\begin{matrix} \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial σ^{2}} & = - \frac{\sqrt{n}}{2 σ_{T}^{2}} + \frac{1}{2 σ_{T}^{4} \sqrt{n}} {(δ_{n} + V_{n})}^{τ} J_{n} (δ_{n} + V_{n}) \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial ρ} & = - \frac{1}{\sqrt{n}} tr G_{n} + \frac{1}{σ_{T}^{2} \sqrt{n}} (R_{n}^{τ} J_{n} V_{n} + V_{n}^{τ} G_{n}^{τ} J_{n} V_{n} + {(W_{n} Y_{n})}^{τ} J_{n} δ_{n}) \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial β} & = \frac{1}{σ_{T}^{2} \sqrt{n}} X_{n}^{τ} J_{n} (δ_{n} + V_{n}) \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial γ} & = \frac{1}{σ_{T}^{2} \sqrt{n}} S_{n}^{τ} J_{n} (δ_{n} + V_{n}), \end{matrix}

where

δ_{n} = A_{0} - S_{n} γ_{0}

. By Lemma A1,

\frac{1}{\sqrt{n}} H_{n}^{τ} J_{n} δ_{n} = o_{p} (1)

for

H_{n} = V_{n}, δ_{n}, X_{n}

and

W_{n} Y_{n}

, and the formula above is rewritten as follows:

\begin{matrix} \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial σ^{2}} & = - \frac{\sqrt{n}}{2 σ_{T}^{2}} + \frac{1}{2 σ_{T}^{4} \sqrt{n}} V_{n}^{τ} J_{n} V_{n} + o_{p} (1) \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial ρ} & = - \frac{1}{\sqrt{n}} tr G_{n} + \frac{1}{σ_{T}^{2} \sqrt{n}} (R_{n}^{τ} J_{n} V_{n} + V_{n}^{τ} G_{n}^{τ} J_{n} V_{n}) + o_{p} (1) \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial β} & = \frac{1}{σ_{T}^{2} \sqrt{n}} X_{n}^{τ} J_{n} V_{n} + o_{p} (1) \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial γ} & = \frac{1}{σ_{T}^{2} \sqrt{n}} S_{n}^{τ} J_{n} V_{n} + o_{p} (1) . \end{matrix}

(A4)

Let

μ_{3}^{*} = E {| ϵ_{i t} |}^{3}

and

μ_{4}^{*} = E {| ϵ_{i t} |}^{4}

,

T_{n}^{τ} J_{n} T_{n} / n \equiv Φ_{T T}

for

T_{n} = X_{n}, R_{n}

or

S_{n}

. Then, using Lemma A2 we write the following:

\begin{array}{l} E [- \frac{1}{n} \frac{\partial^{2} ln L_{n} (η_{T})}{\partial η \partial η^{τ}}] & = [\begin{matrix} \frac{σ_{0}^{2} tr J_{n}}{n σ_{T}^{6}} - \frac{1}{2 σ_{T}^{4}} + \frac{δ_{n}^{τ} J_{n} δ_{n}}{n σ_{T}^{6}} & * & * & * \\ \frac{R_{n}^{τ} J_{n} δ_{n}}{n σ_{T}^{4}} + \frac{σ_{0}^{2} tr (G_{n}^{τ} J_{n})}{n σ_{T}^{4}} & \frac{R_{n}^{τ} J_{n} R_{n}}{n σ_{T}^{2}} + \frac{tr G_{n}^{2}}{n} + \frac{σ_{0}^{2} tr (G_{n}^{τ} J_{n} G_{n})}{n σ_{T}^{2}} & * & * \\ \frac{X_{n}^{τ} J_{n} δ_{n}}{n σ_{T}^{4}} & \frac{X_{n}^{τ} J_{n} R_{n}}{n σ_{T}^{2}} & \frac{X_{n}^{τ} J_{n} X_{n}}{n σ_{T}^{2}} & * \\ \frac{S_{n}^{τ} J_{n} δ_{n}}{n σ_{T}^{4}} & \frac{S_{n}^{τ} J_{n} R_{n}}{n σ_{T}^{2}} & \frac{S_{n}^{τ} J_{n} X_{n}}{n σ_{T}^{2}} & \frac{S_{n}^{τ} J_{n} S_{n}}{n σ_{T}^{2}} \end{matrix}] \\ = [\begin{matrix} \frac{1}{2 σ_{T}^{4}} & * & * & * \\ \frac{tr G_{n}}{n σ_{T}^{2}} & \frac{Φ_{R R}}{σ_{T}^{2}} + \frac{tr (G_{n}^{2} + G_{n}^{τ} G_{n})}{n} & * & * \\ 0 & \frac{Φ_{X R}}{σ_{T}^{2}} & \frac{Φ_{X X}}{σ_{T}^{2}} & * \\ 0 & \frac{Φ_{S R}}{σ_{T}^{2}} & \frac{Φ_{S X}}{σ_{T}^{2}} & \frac{Φ_{S S}}{σ_{T}^{2}} \end{matrix}] + o (1) \equiv Σ_{n, η} + o (1) \end{array}

(A5)

\begin{matrix} E [\frac{1}{n} \frac{\partial ln L_{n} (η_{T})}{\partial η} \frac{\partial ln L_{n} (η_{T})}{\partial η^{τ}}] \\ = [\begin{matrix} \frac{T}{T - 1} \frac{1}{2 σ_{T}^{4}} + \frac{μ_{4}^{*} - 3 σ_{0}^{4}}{4 σ_{T}^{4} σ_{0}^{4}} & * & * & * \\ \begin{matrix} \frac{T}{T - 1} \frac{tr G_{n}}{n σ_{T}^{2}} + \frac{(μ_{4}^{*} - 3 σ_{0}^{4}) tr G_{n}}{2 n σ_{T}^{2} σ_{0}^{4}} \\ + \frac{μ_{3}^{*} R_{n}^{τ} J_{n} diag (J_{n})}{2 n σ_{T}^{6}} \end{matrix} & \begin{matrix} \frac{T}{T - 1} (\frac{Φ_{R R}}{σ_{T}^{2}} + \frac{tr (G_{n} G_{n} + G_{n}^{τ} G_{n})}{n}) \\ + \frac{(μ_{4}^{*} - 3 σ_{0}^{4}) \sum g_{n, i i}^{2}}{4 σ_{T}^{4} σ_{0}^{4}} + \frac{2 μ_{3}^{*} R_{n}^{τ} J_{n} diag (G_{n}^{τ} J_{n})}{n σ_{T}^{4}} \end{matrix} & * & * \\ \frac{μ_{3}^{*} X_{n}^{τ} J_{n} diag (J_{n})}{2 n σ_{T}^{6}} & \frac{T}{T - 1} \frac{Φ_{X R}}{σ_{T}^{2}} + \frac{μ_{3}^{*} X_{n}^{τ} J_{n} diag (G_{n}^{τ} J_{n})}{n σ_{T}^{4}} & \frac{T}{T - 1} \frac{Φ_{X X}}{σ_{T}^{2}} & * \\ \frac{μ_{3}^{*} S_{n}^{τ} J_{n} diag (J_{n})}{2 n σ_{T}^{6}} & \frac{T}{T - 1} \frac{Φ_{S R}}{σ_{T}^{2}} + \frac{μ_{3}^{*} S_{n}^{τ} J_{n} diag (G_{n}^{τ} J_{n})}{n σ_{T}^{4}} & \frac{T}{T - 1} \frac{Φ_{S X}}{σ_{T}^{2}} & \frac{T}{T - 1} \frac{Φ_{S S}}{σ_{T}^{2}} \end{matrix}] + o (1) \\ = \frac{T}{T - 1} (Σ_{n, η} + Ψ_{n, η}) + o (1), \end{matrix}

(A6)

where

g_{n, i i}

is the i-th diagonal element of

G_{n}

and

Ψ_{n, η} = \frac{T - 1}{T} [\begin{matrix} \frac{μ_{4}^{*} - 3 σ_{0}^{4}}{4 σ_{T}^{4} σ_{0}^{4}} & * & * & * \\ \frac{(μ_{4}^{*} - 3 σ_{0}^{4}) tr G_{n}}{2 n σ_{T}^{2} σ_{0}^{4}} + \frac{μ_{3}^{*} R_{n}^{τ} J_{n} diag (J_{n})}{2 n σ_{T}^{6}} & \frac{(μ_{4}^{*} - 3 σ_{0}^{4}) \sum g_{n, i i}^{2}}{4 σ_{T}^{4} σ_{0}^{4}} + \frac{2 μ_{3}^{*} R_{n}^{τ} J_{n} diag (G_{n}^{τ} J_{n})}{n σ_{T}^{4}} & * & * \\ \frac{μ_{3}^{*} X_{n}^{τ} J_{n} diag (J_{n})}{2 n σ_{T}^{6}} & \frac{μ_{3}^{*} X_{n}^{τ} J_{n} diag (G_{n}^{τ} J_{n})}{n σ_{T}^{4}} & 0 & * \\ \frac{μ_{3}^{*} S_{n}^{τ} J_{n} diag (J_{n})}{2 n σ_{T}^{6}} & \frac{μ_{3}^{*} S_{n}^{τ} J_{n} diag (G_{n}^{τ} J_{n})}{n σ_{T}^{4}} & 0 & 0 \end{matrix}] .

(A7)

To derive the asymptotic distribution of

\hat{θ}

, we divide

Σ_{n, η}

into four block matrices, which correspond to the second-order derivatives of the likelihood function with respect to

θ

, the cross-partial derivatives of

θ

and

γ

, and the second-order derivatives of

γ

. The matrices are as follows:

Σ_{n, η} \equiv [\begin{matrix} Σ_{n, θ} & Σ_{n, θ γ}^{τ} \\ Σ_{n, θ γ} & Σ_{n, γ} \end{matrix}],

where

\begin{matrix} Σ_{n, θ} \equiv [\begin{matrix} \frac{1}{2 σ_{T}^{4}} & * & * \\ \frac{tr G_{n}}{n σ_{T}^{2}} & \frac{Φ_{R R}}{σ_{T}^{2}} + \frac{tr (G_{n}^{2} + G_{n}^{τ} G_{n})}{n} & * \\ 0 & \frac{Φ_{X R}}{σ_{T}^{2}} & \frac{Φ_{X X}}{σ_{T}^{2}} \end{matrix}], Σ_{n, θ γ} \equiv [\begin{matrix} 0 & \frac{Φ_{S R}}{σ_{T}^{2}} & \frac{Φ_{S X}}{σ_{T}^{2}} \end{matrix}], Σ_{n, γ} \equiv [\begin{matrix} \frac{Φ_{S S}}{σ_{T}^{2}} \end{matrix}] . \end{matrix}

Let

Σ_{n, θ}^{*}

be the first

s \times s

upper-left submatrix of

Σ_{n, θ}

,

Σ_{n, θ γ}^{*}

be the first

(K_{n} + l + 1) d \times s

upper-left submatrix of

Σ_{n, θ γ}

, and

Σ_{n, γ}^{*}

be the first

(K_{n} + l + 1) d \times (K_{n} + l + 1) d

upper-left submatrix of

Σ_{n, γ}

. Using the same argument, we partition

Ψ_{n, η}

into four block matrices, that is

Ψ_{n, η} \equiv [\begin{matrix} Ψ_{n, θ} & Ψ_{n, θ γ}^{τ} \\ Ψ_{n, θ γ} & Ψ_{n, γ} \end{matrix}],

(A8)

and obtain submatrices

Ψ_{n, θ}^{*}, Ψ_{n, θ γ}^{*}

and

Ψ_{n, γ}^{*}

. Let the notation of matrices without subscripts “n” represent the limiting version of the matrices. For example,

Σ_{θ}^{*} = {lim}_{n \to \infty} Σ_{n, θ}^{*}

. Let

Ω = Σ_{θ}^{*} - Σ_{θ γ}^{* τ} {(Σ_{γ}^{*})}^{- 1} Σ_{θ γ}^{*}

, and we assume that

Ω

is non-singular.

According to Theorems 1 and 2, as

n \to \infty

, with probability tending to 1,

Q_{n} (η)

achieves its maximal value at

{({\hat{θ}}^{* τ}, 0)}^{τ}

and

{({\hat{γ}}^{* τ}, 0)}^{τ}

; then,

{({\hat{θ}}^{* τ}, 0)}^{τ}

and

{({\hat{γ}}^{* τ}, 0)}^{τ}

must satisfy the following:

\frac{1}{\sqrt{n}} \frac{\partial Q_{n} ({({\hat{θ}}^{* τ}, 0)}^{τ}, {({\hat{γ}}^{* τ}, 0)}^{τ})}{\partial η} = 0 .

Applying the Taylor expansion, we have

\frac{1}{\sqrt{n}} (\begin{matrix} \frac{\partial ln L_{n} (η_{T})}{\partial θ^{*}} \\ \frac{\partial ln L_{n} (η_{T})}{\partial γ^{*}} \end{matrix}) - \{\frac{1}{n} (\begin{matrix} \frac{\partial^{2} ln L_{n} (η_{T})}{\partial θ^{*} \partial θ^{* τ}} & \frac{\partial^{2} ln L_{n} (η_{T})}{\partial γ^{* τ} \partial θ^{*}} \\ \frac{\partial^{2} ln L_{n} (η_{T})}{\partial γ^{*} \partial θ^{* τ}} & \frac{\partial^{2} ln L_{n} (η_{T})}{\partial γ^{*} \partial γ^{* τ}} \end{matrix}) + o_{p} (1)\} \sqrt{n} (\begin{matrix} \hat{θ} - θ_{T} \\ \hat{γ} - γ_{0} \end{matrix}) + (\begin{matrix} P_{θ} \\ P_{γ} \end{matrix}) = 0,

(A9)

where

P_{θ} = (\begin{matrix} \sqrt{n} p_{λ_{11}}^{'} (| {\hat{θ}}_{1} |) sgn ({\hat{θ}}_{1}) \\ ⋮ \\ \sqrt{n} p_{λ_{1 s}}^{'} (| {\hat{θ}}_{s} |) sgn ({\hat{θ}}_{s}) \end{matrix}), P_{γ} = (\begin{matrix} \sqrt{n} p_{λ_{21}}^{'} (∥ {\hat{γ}}_{1} ∥_{H}) \frac{H {\hat{γ}}_{1}}{∥ {\hat{γ}}_{1} ∥_{H}} \\ ⋮ \\ \sqrt{n} p_{λ_{2 d}}^{'} (∥ {\hat{γ}}_{d} ∥_{H}) \frac{H {\hat{γ}}_{d}}{∥ {\hat{γ}}_{d} ∥_{H}} \end{matrix}) .

Applying the Taylor expansion to

p_{λ_{1 j}}^{'} (| {\hat{θ}}_{j} |)

, we obtain that

p_{λ_{1 j}}^{'} (| {\hat{θ}}_{j} |) = p_{λ_{1 j}}^{'} (| θ_{0 j} |) + \{p_{λ_{1 j}}^{″} (| θ_{0 j} |) + o_{p} (1)\} ({\hat{θ}}_{j} - θ_{0 j}) .

Furthermore, Assumption C13 implies that

p_{λ_{1 j}}^{″} (| θ_{0 j} |) = o_{p} (1)

, and note that

p_{λ_{1 j}}^{'} (| θ_{0 j} |) = 0

as

λ_{max} \to 0

; hence,

P_{θ} = o_{p} (\sqrt{n} ({\hat{θ}}^{*} - θ_{T}^{*}))

. Using similar arguments, we can prove that

P_{γ} = o_{p} (\sqrt{n} ({\hat{γ}}^{*} - γ_{0}^{*}))

. Together with Lemma A4 and Equation (A9), a simple calculation yields the following:

\begin{matrix} \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial θ^{*}} - [Σ_{n, θ}^{*} + o_{p} (1)] \sqrt{n} ({\hat{θ}}^{*} - θ_{T}^{*}) - [Σ_{n, θ γ}^{* τ} + o_{p} (1)] \sqrt{n} ({\hat{γ}}^{*} - γ_{0}^{*}) = 0 \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial γ^{*}} - [Σ_{n, θ γ}^{*} + o_{p} (1)] \sqrt{n} ({\hat{θ}}^{*} - θ_{T}^{*}) - [Σ_{n, γ}^{*} + o_{p} (1)] \sqrt{n} ({\hat{γ}}^{*} - γ_{0}^{*}) = 0 . \end{matrix}

By substitution, the term

\sqrt{n} ({\hat{γ}}^{*} - γ_{0}^{*})

is eliminated, yielding

\sqrt{n} ({\hat{θ}}^{*} - θ_{T}^{*}) = {[Σ_{n, θ}^{*} - Σ_{n, θ γ}^{* τ} {(Σ_{n, γ}^{*})}^{- 1} Σ_{n, θ γ}^{*} + o_{p} (1)]}^{- 1} [I_{s}, - Σ_{n, θ γ}^{* τ} {(Σ_{n, γ}^{*})}^{- 1}] (\begin{matrix} \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial θ^{*}} \\ \frac{1}{\sqrt{n}} \frac{\partial ln L_{n} (η_{T})}{\partial γ^{*}} \end{matrix}) + o_{p} (1) .

(A10)
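The elimination step can be illustrated numerically. The sketch below ignores the o_{p} (1) terms and replaces the Σ blocks and the score vector with randomly generated stand-ins of illustrative size, so all variable names and dimensions are assumptions. It solves the two-equation block system for the θ part by inverting the γ block and forming the Schur complement, and compares the result with a direct solve of the full system.

```python
import numpy as np

rng = np.random.default_rng(1)
s, m = 3, 4  # illustrative sizes of the theta and gamma blocks

# Random symmetric positive definite matrix standing in for the full Sigma matrix.
A = rng.standard_normal((s + m, s + m))
Sigma = A @ A.T + (s + m) * np.eye(s + m)
S_tt = Sigma[:s, :s]   # theta block
S_gt = Sigma[s:, :s]   # cross block (gamma rows, theta columns)
S_gg = Sigma[s:, s:]   # gamma block

score = rng.standard_normal(s + m)  # stand-in for the scaled score vector
score_t, score_g = score[:s], score[s:]

# Eliminate the gamma block: Schur complement and projected score.
S_gg_inv = np.linalg.inv(S_gg)
schur = S_tt - S_gt.T @ S_gg_inv @ S_gt
theta_part = np.linalg.solve(schur, score_t - S_gt.T @ S_gg_inv @ score_g)

# Reference: solve the full block system and keep the theta coordinates.
full_solution = np.linalg.solve(Sigma, score)
```

The two solutions coincide, which is why the limiting matrix Ω takes the form of a Schur complement of the γ block.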

Furthermore, the central limit theorem for linear–quadratic forms [33] shows that

\begin{matrix} \frac{1}{\sqrt{n}} (\begin{matrix} \frac{\partial ln L_{n} (η_{T})}{\partial θ^{*}} \\ \frac{\partial ln L_{n} (η_{T})}{\partial γ^{*}} \end{matrix}) \overset{D}{\to} N (0, \frac{T}{T - 1} [\begin{matrix} Σ_{θ}^{*} + Ψ_{θ}^{*} & Σ_{θ γ}^{* τ} + Ψ_{θ γ}^{* τ} \\ Σ_{θ γ}^{*} + Ψ_{θ γ}^{*} & Σ_{γ}^{*} + Ψ_{γ}^{*} \end{matrix}]), \end{matrix}

(A11)

where the symbols in the covariance matrix are defined in the discussion surrounding (A8). Then, invoking (A10) and (A11) and using the Slutsky theorem, we have

\sqrt{n} ({\hat{θ}}^{*} - θ_{T}^{*}) \to N (0, \frac{T}{T - 1} \{Ω^{- 1} + Ω^{- 1} [Ψ_{θ}^{*} - 2 Σ_{θ γ}^{* τ} {(Σ_{γ}^{*})}^{- 1} Ψ_{θ}^{*}] Ω^{- 1}\}) .

□

Appendix B. Supplementary Simulation Results

Table A1. Variable selection under

ϵ_{i t} \sim \sqrt{σ_{0}^{2} / 1.5} \cdot t (6)

in Example 1.

			$σ_{0}^{2} = 1$						$σ_{0}^{2} = 2$
$N \times T$	$ρ_{0}$	Method	$\hat{θ}$			$\hat{α} (\cdot)$			$\hat{θ}$			$\hat{α} (\cdot)$
			GMSE	C	I	ASE	C	I	GMSE	C	I	ASE	C	I
$30 \times 10$	0	SCAD	0.027	5.944	0	0.161	4.944	0	0.076	5.938	0	0.234	4.716	0
		ALASSO	0.032	5.748	0	0.160	4.960	0	0.086	5.630	0	0.211	4.914	0
		Oracle	0.027	6.000	0	0.157	5.000	0	0.071	6.000	0	0.206	5.000	0
	0.3	SCAD	0.032	4.846	0	0.160	4.962	0	0.084	4.798	0	0.237	4.734	0
		ALASSO	0.032	4.902	0	0.158	4.968	0	0.086	4.830	0	0.213	4.944	0
		Oracle	0.029	5.000	0	0.157	5.000	0	0.074	5.000	0	0.207	5.000	0
	0.7	SCAD	0.032	4.970	0	0.158	4.954	0	0.092	4.966	0	0.236	4.750	0
		ALASSO	0.036	4.904	0	0.158	4.960	0	0.102	4.798	0	0.218	4.918	0
		Oracle	0.031	5.000	0	0.155	5.000	0	0.086	5.000	0	0.211	5.000	0
$30 \times 15$	0	SCAD	0.021	5.972	0	0.140	4.918	0	0.052	5.966	0	0.188	4.712	0
		ALASSO	0.025	5.836	0	0.138	4.980	0	0.066	5.752	0	0.172	4.946	0
		Oracle	0.022	6.000	0	0.137	5.000	0	0.051	6.000	0	0.170	5.000	0
	0.3	SCAD	0.021	4.890	0	0.138	4.932	0	0.060	4.866	0	0.185	4.736	0
		ALASSO	0.023	4.944	0	0.135	4.998	0	0.069	4.896	0	0.170	4.982	0
		Oracle	0.020	5.000	0	0.135	5.000	0	0.059	5.000	0	0.167	5.000	0
	0.7	SCAD	0.025	4.994	0	0.139	4.922	0	0.063	4.966	0	0.183	4.762	0
		ALASSO	0.030	4.920	0	0.138	4.994	0	0.076	4.858	0	0.174	4.956	0
		Oracle	0.025	5.000	0	0.136	5.000	0	0.060	5.000	0	0.167	5.000	0
$60 \times 10$	0	SCAD	0.015	5.992	0	0.128	5.000	0	0.039	5.976	0	0.152	4.988	0
		ALASSO	0.018	5.832	0	0.129	4.996	0	0.053	5.812	0	0.153	4.992	0
		Oracle	0.016	6.000	0	0.128	5.000	0	0.040	6.000	0	0.152	5.000	0
	0.3	SCAD	0.020	4.932	0	0.129	5.000	0	0.038	4.894	0	0.151	4.998	0
		ALASSO	0.022	4.966	0	0.129	4.998	0	0.043	4.928	0	0.152	4.992	0
		Oracle	0.020	5.000	0	0.129	5.000	0	0.036	5.000	0	0.151	5.000	0
	0.7	SCAD	0.020	4.988	0	0.128	5.000	0	0.049	4.990	0	0.151	4.998	0
		ALASSO	0.024	4.954	0	0.129	4.998	0	0.060	4.926	0	0.154	4.996	0
		Oracle	0.020	5.000	0	0.128	5.000	0	0.048	5.000	0	0.151	5.000	0
$60 \times 15$	0	SCAD	0.014	6.000	0	0.118	4.994	0	0.028	5.998	0	0.139	4.888	0
		ALASSO	0.016	5.906	0	0.119	5.000	0	0.036	5.878	0	0.136	4.998	0
		Oracle	0.014	6.000	0	0.118	5.000	0	0.030	6.000	0	0.135	5.000	0
	0.3	SCAD	0.015	4.930	0	0.119	4.978	0	0.037	4.958	0	0.139	4.854	0
		ALASSO	0.017	4.934	0	0.120	4.984	0	0.043	4.938	0	0.135	5.000	0
		Oracle	0.015	5.000	0	0.119	5.000	0	0.037	5.000	0	0.134	5.000	0
	0.7	SCAD	0.015	4.998	0	0.119	4.998	0	0.030	4.990	0	0.136	4.920	0
		ALASSO	0.018	4.970	0	0.119	4.998	0	0.040	4.940	0	0.136	4.994	0
		Oracle	0.016	5.000	0	0.119	5.000	0	0.031	5.000	0	0.134	5.000	0

Table A2. Variable selection for

n = 100 \times 20

in Example 2.

		$σ_{0}^{2} = 1$						$σ_{0}^{2} = 2$
$ρ_{0}$	Method	$\hat{θ}$			$\hat{α} (\cdot)$			$\hat{θ}$			$\hat{α} (\cdot)$
		GMSE	C	I	ASE	C	I	GMSE	C	I	ASE	C	I
0	SCAD	0.004	46.986	0	0.037	11.670	0	0.010	46.982	0	0.049	11.508	0
	ALASSO	0.005	46.388	0	0.036	11.996	0	0.018	45.818	0	0.048	11.982	0
	Oracle	0.003	47.000	0	0.036	12.000	0	0.009	47.000	0	0.046	12.000	0
0.3	SCAD	0.004	45.814	0	0.039	10.972	0	0.011	45.602	0	0.053	11.110	0
	ALASSO	0.006	45.506	0	0.037	11.914	0	0.018	44.648	0	0.049	11.672	0
	Oracle	0.004	46.000	0	0.036	12.000	0	0.009	46.000	0	0.046	12.000	0
0.7	SCAD	0.004	45.986	0	0.038	11.450	0	0.011	45.964	0	0.049	11.420	0
	ALASSO	0.006	45.568	0	0.037	11.992	0	0.021	44.846	0	0.048	11.964	0
	Oracle	0.004	46.000	0	0.036	12.000	0	0.010	46.000	0	0.046	12.000	0

References

Anselin, L.; Hudak, S. Spatial econometrics in practice: A review of software options. Reg. Sci. Urban Econ. 1992, 22, 509–536.
Baltagi, B.H.; Heun Song, S.; Cheol Jung, B.; Koh, W. Testing for serial correlation, spatial autocorrelation and random effects using panel data. J. Econom. 2007, 140, 5–51.
Kapoor, M.; Kelejian, H.H.; Prucha, I.R. Panel data models with spatially correlated error components. J. Econom. 2007, 140, 97–130.
Baltagi, B.H.; Kao, C.; Liu, L. Asymptotic properties of estimators for the linear panel regression model with random individual effects and serially correlated errors: The case of stationary and non-stationary regressors and residuals. Econom. J. 2008, 11, 554–572.
Lee, L.F.; Yu, J. Estimation of spatial autoregressive panel data models with fixed effects. J. Econom. 2010, 154, 165–185.
Ai, C.; Zhang, Y. Estimation of partially specified spatial panel data models with fixed-effects. Econom. Rev. 2017, 36, 6–22.
Zhang, Y.; Sun, Y. Estimation of partially specified dynamic spatial panel data models with fixed-effects. Reg. Sci. Urban Econ. 2015, 51, 37–46.
Zhang, Y.; Shen, D. Estimation of semi-parametric varying-coefficient spatial panel data models with random-effects. J. Stat. Plan. Inference 2015, 159, 64–80.
Feng, S.; Tong, T.; Chiu, S.N. Statistical Inference for Partially Linear Varying Coefficient Spatial Autoregressive Panel Data Model. Mathematics 2023, 11, 4606.
Hu, X. Estimation in a semi-varying coefficient model for panel data with fixed effects. J. Syst. Sci. Complex. 2014, 27, 594–604.
He, B.; Hong, X.; Fan, G. Empirical likelihood for semi-varying coefficient models for panel data with fixed effects. J. Korean Stat. Soc. 2016, 45, 395–408.
Feng, S.; He, W.; Li, F. Model detection and estimation for varying coefficient panel data models with fixed effects. Comput. Stat. Data Anal. 2020, 152, 107054.
Feng, S.; Li, G.; Peng, H.; Tong, T. Varying coefficient panel data model with interactive fixed effects. Stat. Sin. 2021, 31, 935–957.
Sun, Y.; Carroll, R.; Li, D. Semiparametric estimation of fixed-effects panel data varying coefficient models. Adv. Econom. 2009, 25, 101–129.
Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
Fan, J.; Li, R. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
Zou, H. The Adaptive Lasso and Its Oracle Properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
Wang, L.; Li, H.; Huang, J.Z. Variable Selection in Nonparametric Varying-Coefficient Models for Analysis of Repeated Measurements. J. Am. Stat. Assoc. 2008, 103, 1556–1569.
Li, R.; Liang, H. Variable selection in semiparametric regression modeling. Ann. Stat. 2008, 36, 261–286.
Wang, H.J.; Zhu, Z.; Zhou, J. Quantile regression in partially linear varying coefficient models. Ann. Stat. 2009, 37, 3841–3866.
Zhao, P.; Xue, L. Variable selection for semiparametric varying coefficient partially linear models. Stat. Probab. Lett. 2009, 79, 2148–2157.
Tian, R.; Xue, L.; Liu, C. Penalized quadratic inference functions for semiparametric varying coefficient partially linear models with longitudinal data. J. Multivar. Anal. 2014, 132, 94–110.
Tian, R.; Xue, L.; Hu, Y. Smooth-threshold GEE variable selection for varying coefficient partially linear models with longitudinal data. J. Korean Stat. Soc. 2015, 44, 419–431.
Li, R.; Mu, S.; Hao, R. Estimation and variable selection for partially linear additive models with measurement errors. Commun. Stat. Theory Methods 2021, 50, 1416–1445.
Ma, X.; Du, Y.; Wang, J. Model detection and variable selection for mode varying coefficient model. Stat. Methods Appl. 2022, 31, 321–341.
Liu, Y.; Wang, Z.; Tian, M.; Yu, K. Estimation and variable selection for generalized functional partially varying coefficient hybrid models. Stat. Pap. 2024, 65, 93–119.
Neyman, J.; Scott, E.L. Consistent Estimates Based on Partially Consistent Observations. Econometrica 1948, 16, 1–32.
Liu, X.; Chen, J.; Cheng, S. A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat. Stat. 2018, 25, 86–104.
Xie, T.; Cao, R.; Du, J. Variable selection for spatial autoregressive models with a diverging number of parameters. Stat. Pap. 2020, 61, 1125–1145.
Luo, G.; Wu, M. Variable selection for semiparametric varying-coefficient spatial autoregressive models with a diverging number of parameters. Commun. Stat. Theory Methods 2021, 50, 2062–2079.
Lee, L.F. Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autoregressive Models. Econometrica 2004, 72, 1899–1925.
Tian, R.; Xia, M.; Xu, D. Profile quasi-maximum likelihood estimation for semiparametric varying-coefficient spatial autoregressive panel models with fixed effects. Stat. Pap. 2024, 65, 5109–5143.
Kelejian, H.H.; Prucha, I.R. On the asymptotic distribution of the Moran I test statistic with applications. J. Econom. 2001, 104, 219–257. [Google Scholar] [CrossRef]
He, X.; Fung, W.K.; Zhu, Z. Robust Estimation in Generalized Partial Linear Models for Clustered Data. J. Am. Stat. Assoc. 2005, 100, 1176–1184. [Google Scholar] [CrossRef]
Stone, C.J. Optimal Global Rates of Convergence for Nonparametric Regression. Ann. Stat. 1982, 10, 1040–1053. [Google Scholar] [CrossRef]
Dietz, T.; Rosa, E. Effects of population and affluence on CO₂ emissions. Proc. Natl. Acad. Sci. USA 1997, 94, 175–179. [Google Scholar] [CrossRef]
Wen, L.; Shao, H. Analysis of influencing factors of the carbon dioxide emissions in China’s commercial department based on the STIRPAT model and ridge regression. Environ. Sci. Pollut. Res. 2019, 26, 27138–27147. [Google Scholar] [CrossRef] [PubMed]
Li, W.; Wang, W.; Wang, Y.; Qin, Y. Industrial structure, technological progress and CO₂ emissions in China: Analysis based on the STIRPAT framework. Nat. Hazards 2017, 88, 1545–1564. [Google Scholar] [CrossRef]

Figure 1. The estimated varying coefficient functions based on SCAD and ALASSO. (a) Varying coefficient derived from SCAD; (b) varying coefficient derived from ALASSO.

Table 1. Variable selection under $ϵ_{i t} \sim N (0, σ_{0}^{2})$ in Example 1 ($T = 10$).

			$σ_{0}^{2} = 1$						$σ_{0}^{2} = 2$
N	$ρ_{0}$	Method	$\hat{θ}$			$\hat{α} (\cdot)$			$\hat{θ}$			$\hat{α} (\cdot)$
			GMSE	C	I	ASE	C	I	GMSE	C	I	ASE	C	I
30	0	SCAD	0.020	5.970	0	0.158	4.950	0	0.049	5.966	0	0.234	4.722	0
		ALASSO	0.025	5.772	0	0.158	4.958	0	0.062	5.654	0	0.210	4.924	0
		Oracle	0.020	6.000	0	0.155	5.000	0	0.044	6.000	0	0.207	5.000	0
	0.3	SCAD	0.030	4.822	0	0.160	4.940	0	0.064	4.802	0	0.235	4.726	0
		ALASSO	0.028	4.880	0	0.157	4.950	0	0.064	4.812	0	0.214	4.920	0
		Oracle	0.025	5.000	0	0.156	5.000	0	0.052	5.000	0	0.206	5.000	0
	0.7	SCAD	0.026	4.984	0	0.159	4.962	0	0.069	4.966	0	0.233	4.758	0
		ALASSO	0.030	4.916	0	0.158	4.972	0	0.081	4.828	0	0.218	4.934	0
		Oracle	0.025	5.000	0	0.156	5.000	0	0.063	5.000	0	0.209	5.000	0
60	0	SCAD	0.011	5.990	0	0.128	4.992	0	0.023	5.986	0	0.160	4.826	0
		ALASSO	0.014	5.834	0	0.128	4.998	0	0.032	5.742	0	0.154	4.990	0
		Oracle	0.012	6.000	0	0.128	5.000	0	0.024	6.000	0	0.152	5.000	0
	0.3	SCAD	0.015	4.944	0	0.129	4.980	0	0.027	4.896	0	0.162	4.778	0
		ALASSO	0.016	4.958	0	0.129	5.000	0	0.030	4.906	0	0.154	4.990	0
		Oracle	0.014	5.000	0	0.128	5.000	0	0.026	5.000	0	0.152	5.000	0
	0.7	SCAD	0.015	4.988	0	0.129	4.978	0	0.029	4.990	0	0.159	4.848	0
		ALASSO	0.018	4.956	0	0.129	4.996	0	0.036	4.890	0	0.155	4.992	0
		Oracle	0.015	5.000	0	0.128	5.000	0	0.029	5.000	0	0.152	5.000	0

Table 2. Variable selection under $ϵ_{i t} \sim N (0, σ_{0}^{2})$ in Example 1 ($T = 15$).

			$σ_{0}^{2} = 1$						$σ_{0}^{2} = 2$
N	$ρ_{0}$	Method	$\hat{θ}$			$\hat{α} (\cdot)$			$\hat{θ}$			$\hat{α} (\cdot)$
			GMSE	C	I	ASE	C	I	GMSE	C	I	ASE	C	I
30	0	SCAD	0.014	5.974	0	0.138	4.930	0	0.031	5.960	0	0.179	4.796	0
		ALASSO	0.018	5.810	0	0.136	4.990	0	0.045	5.730	0	0.168	4.984	0
		Oracle	0.015	6.000	0	0.135	5.000	0	0.030	6.000	0	0.166	5.000	0
	0.3	SCAD	0.017	4.918	0	0.137	4.944	0	0.038	4.874	0	0.187	4.702	0
		ALASSO	0.019	4.952	0	0.135	4.998	0	0.046	4.902	0	0.172	4.962	0
		Oracle	0.016	5.000	0	0.135	5.000	0	0.035	5.000	0	0.168	5.000	0
	0.7	SCAD	0.019	4.980	0	0.139	4.934	0	0.045	4.978	0	0.185	4.740	0
		ALASSO	0.023	4.930	0	0.137	4.988	0	0.058	4.854	0	0.172	4.964	0
		Oracle	0.019	5.000	0	0.136	5.000	0	0.042	5.000	0	0.168	5.000	0
60	0	SCAD	0.011	5.994	0	0.118	4.988	0	0.018	5.984	0	0.137	4.910	0
		ALASSO	0.013	5.864	0	0.119	4.988	0	0.025	5.836	0	0.135	5.000	0
		Oracle	0.012	6.000	0	0.118	5.000	0	0.019	6.000	0	0.134	5.000	0
	0.3	SCAD	0.012	4.974	0	0.119	4.994	0	0.020	4.942	0	0.137	4.892	0
		ALASSO	0.013	4.978	0	0.119	5.000	0	0.024	4.950	0	0.136	4.996	0
		Oracle	0.012	5.000	0	0.119	5.000	0	0.020	5.000	0	0.134	5.000	0
	0.7	SCAD	0.013	4.998	0	0.120	4.994	0	0.022	4.996	0	0.138	4.904	0
		ALASSO	0.016	4.976	0	0.120	5.000	0	0.031	4.954	0	0.138	4.994	0
		Oracle	0.013	5.000	0	0.120	5.000	0	0.023	5.000	0	0.135	5.000	0

Table 3. Description of related variables.

First-Tier Indicators	Second-Tier Indicators	Abbreviation	Symbol
Environmental impact	Carbon emissions per capita	CE	$y_{i t}$
Population	Urbanization rate	UR	$x_{i t 1}$
Science and technology	Energy intensity	EI	$x_{i t 2}$
Science and technology	R&D funding intensity	RDF	$x_{i t 3}$
Finance	Financial Interrelation Ratio	FIR	$x_{i t 4}$
Finance	Financial Efficiency	FE	$x_{i t 5}$
Transport	Freight volume per capita	FV	$x_{i t 6}$
Transport	Public transport passenger volume per capita	PTPV	$x_{i t 7}$
Ecology	Percentage of forest cover	PF	$x_{i t 8}$
Energy Structure	Coal consumption proportion	CCP	$u_{i t}$
Affluence	GDP per capita	GDP	$z_{i t 1}$
Affluence	Fiscal revenue per capita	FR	$z_{i t 2}$
Affluence	Residents consumption level	RCL	$z_{i t 3}$
Affluence	Disposable income per capita	DI	$z_{i t 4}$
Affluence	Import and export amount per capita	IEA	$z_{i t 5}$
Affluence	Fixed investment per capita	FINV	$z_{i t 6}$

Table 4. Penalized estimators for the parametric components.

Method	$\hat{ρ}$	${\hat{β}}_{1}$	${\hat{β}}_{2}$	${\hat{β}}_{3}$	${\hat{β}}_{4}$	${\hat{β}}_{5}$	${\hat{β}}_{6}$	${\hat{β}}_{7}$	${\hat{β}}_{8}$	${\hat{σ}}^{2}$
SCAD	0.154	0	0.350	0.087	0.245	0	0.100	−0.014	0	0.005
ALASSO	0.167	0	0.355	0.120	0.193	0	0.113	−0.017	0	0.005

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
