Article

A Penalized Profile Quasi-Likelihood Method for a Semiparametric Varying Coefficient Spatial Autoregressive Panel Model with Fixed Effects

Ruiqin Tian, Miaojie Xia and Dengke Xu *
1 School of Mathematics, Hangzhou Normal University, Hangzhou 311121, China
2 School of Mathematics, Statistics and Mechanics, Beijing University of Technology, Beijing 100124, China
3 School of Economics, Hangzhou Dianzi University, Hangzhou 310018, China
* Author to whom correspondence should be addressed.
Axioms 2025, 14(2), 121; https://doi.org/10.3390/axioms14020121
Submission received: 11 January 2025 / Revised: 3 February 2025 / Accepted: 5 February 2025 / Published: 7 February 2025

Abstract

This paper proposes a variable selection method for a semiparametric varying coefficient spatial autoregressive panel model with fixed effects based on a penalized profile quasi-likelihood method, which can simultaneously select significant variables in parametric components and nonparametric components without estimating fixed effects. With an appropriate selection of the tuning parameters and some mild assumptions, the consistency of this procedure and the oracle property of the obtained estimators are established. Then, we conduct some Monte Carlo simulations to assess the finite sample performance of the proposed variable selection method, and finally, we analyze a real dataset for further illustration.

1. Introduction

Recently, there has been a surge in focus on spatial panel data model research. These models not only account for the spatial interdependencies of economic phenomena but also allow investigators to manage the unobservable heterogeneity among geographical units [1,2,3,4,5]. The basic spatial panel model is specified as follows:
$$y_{it} = \rho_0\sum_{j=1}^{N}w_{ij}y_{jt} + X_{it}^{\tau}\beta_0 + \mu_{0i} + \epsilon_{it}, \quad i = 1, 2, \dots, N;\ t = 1, 2, \dots, T, \qquad (1)$$
where $\{y_{it}, X_{it}\}$ represents the observations of individual $i$ in period $t$, $w_{ij}$ denotes the spatial weight between individuals $i$ and $j$, $\mu_{0i}$ denotes the unobserved, time-invariant individual effect, $\epsilon_{it}$ denotes the random disturbance, and $\{\rho_0, \beta_0\}$ denotes the unknown true parameter values. Model (1), however, adopts a linear specification: parametric statistical inference inherently requires a set of model assumptions, and linearity is simply one of the most convenient choices. Despite their solid theoretical foundations, linear models frequently fall short in practical applications, and when a linear specification is wrong for the data at hand, it may result in significant modeling biases and misleading conclusions. Therefore, more flexible spatial panel models are required. Ai and Zhang [6] extended model (1) to a partially linear spatial panel model with fixed effects by adding an unknown function, and they proposed a sieve two-stage least squares regression to estimate their model consistently. Zhang and Sun [7] considered a partially specified dynamic spatial panel model, which takes into account past information on the dependent variable. Furthermore, the semiparametric varying coefficient spatial autoregressive (hereafter, SVCSAR) panel model, which strikes a balance between flexibility and interpretability, has been studied intensively; see [8,9] for more details. The model is specified as follows:
$$y_{it} = \rho_0\sum_{j=1}^{N}w_{ij}y_{jt} + X_{it}^{\tau}\beta_0 + Z_{it}^{\tau}\alpha_0(u_{it}) + \mu_{0i} + \epsilon_{it}, \quad i = 1, 2, \dots, N;\ t = 1, 2, \dots, T, \qquad (2)$$
where $\{y_{it}, u_{it}, X_{it}, Z_{it}\}$ represents the observations of individual $i$ in period $t$, and $\alpha_0(u)$ is a vector of unknown functions. Model (2) is a general specification that nests several existing panel models. For instance, if $\rho_0 = 0$, it reduces to the semi-varying coefficient panel model studied by [10,11,12,13]; if $\rho_0 = 0$ and $\beta_0 = 0$, it reduces to a varying coefficient panel model [14]; if $\alpha_0(\cdot) \equiv 0$, it reduces to the spatial autoregressive panel model [5]; and if $\rho_0 = 0$, $\beta_0 = 0$ and $\alpha_0(\cdot) \equiv 0$, it becomes the classical panel model. However, while model (2) nests various panel models, this generality also increases the risk of model misspecification in practical applications. Specifying the model form becomes an unavoidable issue, which is equivalent to detecting the zero components of $\{\rho_0, \beta_0, \alpha_0(\cdot)\}$; in other words, a variable selection method for model (2) is required. Additionally, when the number of covariates in model (2) is large, selecting the important variables is another purpose of variable selection.
Variable selection constitutes a crucial aspect of contemporary statistical inference. Over the years, numerous variable selection techniques have emerged for parametric models. LASSO [15], SCAD [16], and ALASSO [17] are the most popular methods among them. Based on those methods, variable selection methods for nonparametric or semiparametric models have been established in recent years. Wang et al. [18] considered variable selection for varying coefficient models using the SCAD penalty. Li and Liang [19] utilized the SCAD penalty to identify significant variables within the parametric components of a semiparametric varying coefficient partially linear model. Wang et al. [20] and Zhao and Xue [21] presented a variable selection procedure by combining basis function approximations with SCAD penalty for semiparametric varying coefficient partially linear models. The proposed procedure simultaneously selects significant variables in parametric components and nonparametric components. Tian et al. [22] introduced a novel method for variable selection, which integrates basis function approximations with quadratic inference functions. This approach enables the simultaneous identification of significant variables in both parametric and nonparametric components. For further development of the variable selection methods for nonparametric or semiparametric models, see [23,24,25,26], among others. However, there are few studies on the variable selection of panel model (2), in which spatial components and nonparametric components are simultaneously included.
In this paper, we propose a variable selection procedure for model (2). In order to avoid incidental parameter problems [27] brought by unknown fixed effects, this variable selection procedure combines a profile quasi-likelihood method with the basis function approximation and SCAD penalty to achieve estimation and variable selection simultaneously. The proposed procedure can shrink the spatial, linear, and functional coefficients of irrelevant covariates automatically to achieve variable selection. Moreover, by selecting appropriate tuning parameters, we demonstrate the consistency of our variable selection method. The regression coefficient estimators exhibit the oracle property, which implies that the nonparametric component estimators converge optimally, while the parametric component estimators share the same asymptotic behavior as those derived from the true submodel. This suggests that our penalized estimators perform as effectively as if the true zero coefficients were known. Compared with Liu et al. [28] and Xie et al. [29], we consider panel data and varying coefficient components. In comparison, although Luo and Wu [30] took into account variable selection for the SVCSAR model, the model they analyzed was confined to cross-sectional data. Furthermore, they only selected parametric coefficients. However, our method enables the simultaneous selection of significant variables in both parametric and nonparametric components under panel data.
The rest of this paper is structured as follows. In Section 2, we introduce a variable selection procedure designed for the SVCSAR panel model with fixed effects. In Section 3, we establish the asymptotic properties of the resulting estimators. In Section 4, we detail the computational steps for obtaining these estimators and discuss the selection of tuning parameters. In Section 5, we conduct simulations to assess the performance of our method with finite samples. In Section 6, we provide a real-data analysis to further demonstrate the application of the proposed methodology. Finally, in Section 7, we conclude the article with a concise discussion. All technical proofs supporting the asymptotic results are included in Appendix A.

2. Penalized Profile Quasi-Likelihood Method

We first consider the SVCSAR panel model described in (2). Specifically, $y_{it}$ and $u_{it}$ are scalars, while $X_{it}$ is a $p \times 1$ vector and $Z_{it}$ is a $q \times 1$ vector; $\beta_0 = (\beta_{01}, \beta_{02}, \dots, \beta_{0p})^{\tau}$ is an unknown vector that reflects the linear effect of $X_{it}$ on $y_{it}$, and it is assumed that only a subset of its elements are non-zero. Without loss of generality, we assume that the first $s$ elements are the non-zero ones. $\alpha_0(u) \equiv (\alpha_{01}(u), \dots, \alpha_{0k}(u), \dots, \alpha_{0q}(u))^{\tau}$ is a vector of unknown functions; similarly, we assume that the first $d$ functions are non-zero. $\epsilon_{it}$ is an i.i.d. disturbance with zero mean and finite unknown variance $\sigma_0^2$; thus, $\{\sigma_0^2, \rho_0, \beta_0, \alpha_0, \mu_{01}, \dots, \mu_{0N}\}$ are the unknown components. Next, we present a penalized quasi-likelihood method that avoids estimating $\mu_{0i}$ ($i = 1, \dots, N$) and shrinks the remaining estimators.
Let $n = N \times T$, $Y_n \equiv (Y_1^{\tau}, \dots, Y_t^{\tau}, \dots, Y_T^{\tau})^{\tau}$ with $Y_t = (y_{1t}, \dots, y_{Nt})^{\tau}$; $W_n \equiv I_T \otimes W_N$, where "$\otimes$" denotes the Kronecker product and $I_T$ denotes a $T$-dimensional identity matrix; $X_n \equiv (X_1^{\tau}, \dots, X_t^{\tau}, \dots, X_T^{\tau})^{\tau}$ with $X_t = (X_{1t}, \dots, X_{Nt})^{\tau}$; $A_0 \equiv (A_{01}^{\tau}, \dots, A_{0t}^{\tau}, \dots, A_{0T}^{\tau})^{\tau}$ with $A_{0t} = (Z_{1t}^{\tau}\alpha_0(u_{1t}), \dots, Z_{Nt}^{\tau}\alpha_0(u_{Nt}))^{\tau}$; $V_n \equiv (V_1^{\tau}, \dots, V_t^{\tau}, \dots, V_T^{\tau})^{\tau}$ with $V_t = (\epsilon_{1t}, \dots, \epsilon_{Nt})^{\tau}$; $\mu_0 = (\mu_{01}, \dots, \mu_{0N})^{\tau}$ denotes the vector of fixed effects; and $D_n \equiv \iota_T \otimes I_N$, where $\iota_T$ represents a $T$-dimensional vector of ones. Then, model (2) can be expressed in matrix form:
$$Y_n = \rho_0 W_n Y_n + X_n\beta_0 + A_0 + D_n\mu_0 + V_n, \qquad (3)$$
where E ( V n ) = 0 , var ( V n ) = σ 0 2 I n . Thus, the unknown parametric component is ( σ 0 2 , ρ 0 , β 0 , μ 0 ) , and the unknown nonparametric component is α 0 ( u ) .
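For implementation, the matrix form (3) can be assembled directly with Kronecker products. Below is a minimal numpy sketch under our own naming conventions (stack_panel, W_N, etc. are illustrative and not part of the paper); it builds $W_n = I_T \otimes W_N$ and $D_n = \iota_T \otimes I_N$ and generates $Y_n$ from the reduced form of (3):

```python
import numpy as np

def stack_panel(W_N, X, mu0, A0, V, rho0, beta0):
    """Assemble model (3): Y_n = rho0*W_n*Y_n + X_n*beta0 + A0 + D_n*mu0 + V_n.

    W_N : (N, N) spatial weight matrix
    X   : (T, N, p) covariates, X[t, i, :] = X_it
    mu0 : (N,) fixed effects
    A0  : (n,) stacked nonparametric part, n = N*T
    V   : (n,) stacked disturbances
    """
    T, N, p = X.shape
    n = N * T
    W_n = np.kron(np.eye(T), W_N)               # W_n = I_T (x) W_N
    D_n = np.kron(np.ones((T, 1)), np.eye(N))   # D_n = iota_T (x) I_N
    X_n = X.reshape(n, p)                       # rows ordered (t, i), matching Y_n
    M_n = np.eye(n) - rho0 * W_n                # M_n(rho0) = I_n - rho0 * W_n
    # Reduced form: Y_n = M_n^{-1} (X_n beta0 + A0 + D_n mu0 + V_n)
    Y_n = np.linalg.solve(M_n, X_n @ beta0 + A0 + D_n @ mu0 + V)
    return Y_n, W_n, X_n, D_n
```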
Let $\theta = (\sigma^2, \rho, \beta^{\tau})^{\tau}$, $\theta_0 = (\sigma_0^2, \rho_0, \beta_0^{\tau})^{\tau}$, and $M_n(\rho) = I_n - \rho W_n$ for any $\rho$, with $M_n \equiv M_n(\rho_0)$. Subsequently, we suggest optimizing the log-Gaussian quasi-likelihood, following the approach taken by [31,32]; the log-Gaussian quasi-likelihood of (3) is
$$\ln\tilde{L}(\theta, \mu, \alpha(u)) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 + \ln|M_n(\rho)| - \frac{1}{2\sigma^2}\left[M_n(\rho)Y_n - X_n\beta - A - D_n\mu\right]^{\tau}\left[M_n(\rho)Y_n - X_n\beta - A - D_n\mu\right], \qquad (4)$$
where $A = (A_1^{\tau}, \dots, A_t^{\tau}, \dots, A_T^{\tau})^{\tau}$, $A_t = (Z_{1t}^{\tau}\alpha(u_{1t}), \dots, Z_{Nt}^{\tau}\alpha(u_{Nt}))^{\tau}$, and $\alpha(u) = (\alpha_1(u), \dots, \alpha_q(u))^{\tau}$. When maximizing Equation (4), we encounter two main challenges: (i) directly estimating $\mu_0$ can lead to the incidental parameter problem [27], especially since $\mu_0$ becomes high-dimensional as $N$ increases; (ii) estimating $\alpha_0$ is difficult due to its infinite dimensionality. To address these issues, we follow the approach in Tian et al. [32] and let
$$B(u) = \left(B_1(u), B_2(u), \dots, B_{K_n+l+1}(u)\right)^{\tau}$$
denote a vector of normalized B-spline basis functions of order $l$ with $K_n$ internal knots. Subsequently, we approximate $\alpha_{0k}(u)$ by a linear combination of $B_1(u), B_2(u), \dots, B_{K_n+l+1}(u)$, i.e., $\alpha_{0k}(u) \approx B^{\tau}(u)\gamma_{0k}$ for $k = 1, 2, \dots, q$. Consequently, model (3) can be written as follows:
$$Y_n \approx \rho_0 W_n Y_n + X_n\beta_0 + S_n\gamma_0 + D_n\mu_0 + V_n, \qquad (5)$$
where $S_n = (S_1^{\tau}, \dots, S_t^{\tau}, \dots, S_T^{\tau})^{\tau}$ with $S_t = \left((I_q \otimes B(u_{1t}))Z_{1t}, \dots, (I_q \otimes B(u_{Nt}))Z_{Nt}\right)^{\tau}$, and $\gamma_0 = (\gamma_{01}^{\tau}, \dots, \gamma_{0q}^{\tau})^{\tau}$. The log-Gaussian quasi-likelihood of (5) is
$$\ln\hat{L}(\theta, \mu, \gamma) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 + \ln|M_n(\rho)| - \frac{1}{2\sigma^2}\left[M_n(\rho)Y_n - X_n\beta - S_n\gamma - D_n\mu\right]^{\tau}\left[M_n(\rho)Y_n - X_n\beta - S_n\gamma - D_n\mu\right]. \qquad (6)$$
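As a rough illustration of the basis expansion, the following sketch evaluates a cubic B-spline basis $B(u)$ with $K_n$ interior knots using scipy and assembles one row of $S_n$ as $(I_q \otimes B(u_{it}))Z_{it}$, written equivalently as the Kronecker product $Z_{it} \otimes B(u_{it})$. The helper names are ours, and equally spaced interior knots are assumed for simplicity:

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(u, n_knots, degree=3):
    """Normalized B-spline basis of degree `degree` with `n_knots` interior knots on [0, 1],
    evaluated at a scalar u; returns a vector of length n_knots + degree + 1."""
    interior = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    knots = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]  # clamped knot vector
    n_basis = len(knots) - degree - 1
    eye = np.eye(n_basis)
    return np.array([BSpline(knots, eye[j], degree)(u) for j in range(n_basis)])

def build_S_row(Z_it, u_it, n_knots, degree=3):
    """One row of S_n: Z_it' alpha(u_it) is approximated by this row times
    gamma = (gamma_1', ..., gamma_q')'."""
    B_u = bspline_basis(u_it, n_knots, degree)
    return np.kron(Z_it, B_u)
```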
To avoid the incidental parameter problem [27] caused by μ , we first concentrate μ out and obtain the profile quasi-likelihood. Let η = ( θ τ , γ τ ) τ . For given η , from (6), we derive
$$\hat{\mu}(\eta) = (D_n^{\tau}D_n)^{-1}D_n^{\tau}\left(M_n(\rho)Y_n - X_n\beta - S_n\gamma\right)$$
and substitute this into (6). Then,
$$\ln L_n(\eta) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 + \ln|M_n(\rho)| - \frac{1}{2\sigma^2}\left[M_n(\rho)Y_n - X_n\beta - S_n\gamma\right]^{\tau}J_n\left[M_n(\rho)Y_n - X_n\beta - S_n\gamma\right],$$
where $J_n = (I_n - Q_n)^{\tau}(I_n - Q_n)$ and $Q_n = D_n(D_n^{\tau}D_n)^{-1}D_n^{\tau}$.
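Because $D_n = \iota_T \otimes I_N$, the projection $Q_n$ simply averages each individual's observations over time, so applying $J_n = I_n - Q_n$ amounts to the usual within (time-demeaning) transformation. A minimal sketch of this step (our own helper, assuming observations are stacked by period as in $Y_n$):

```python
import numpy as np

def within_transform(v, N, T):
    """Apply J_n = I_n - D_n (D_n' D_n)^{-1} D_n' to a stacked vector or matrix v.

    With D_n = iota_T (x) I_N this subtracts, for each individual i,
    its time average from every period's observation.
    """
    v = np.asarray(v, dtype=float)
    panel = v.reshape(T, N, -1)                       # (t, i, columns)
    demeaned = panel - panel.mean(axis=0, keepdims=True)
    return demeaned.reshape(v.shape)
```

The concentrated quasi-likelihood then only needs the $J_n$-transformed versions of $M_n(\rho)Y_n$, $X_n$ and $S_n$, so $\mu$ never has to be estimated explicitly.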
Inspired by the concept of variable selection in semiparametric varying coefficient partially linear models [21], we introduce a penalized profile quasi-likelihood function defined as follows:
$$Q_n(\eta) = \ln L_n(\eta) - n\sum_{j=2}^{p+2}p_{\lambda_{1,n}}(|\theta_j|) - n\sum_{k=1}^{q}p_{\lambda_{2,n}}\left(\|B^{\tau}(\cdot)\gamma_k\|\right),$$
where $\|B^{\tau}(\cdot)\gamma_k\| = \left(\int(B^{\tau}(u)\gamma_k)^2\,du\right)^{1/2}$, and $p_{\lambda}(\cdot)$ is the SCAD penalty function [16], whose first derivative is
$$p'_{\lambda}(\omega) = \lambda\left\{I(\omega \le \lambda) + \frac{(a\lambda - \omega)_+}{(a-1)\lambda}I(\omega > \lambda)\right\},$$
with $a > 2$, $\omega > 0$ and $p_{\lambda}(0) = 0$. Throughout this paper, we adopt the suggestion of Fan and Li [16] that the choice $a = 3.7$ performs well in a variety of situations. The tuning parameter $\lambda$ may differ across the $\theta_j$ and the $B^{\tau}(\cdot)\gamma_k$. Note that $\|B^{\tau}(\cdot)\gamma_k\| = \left(\int(B^{\tau}(u)\gamma_k)^2\,du\right)^{1/2} = (\gamma_k^{\tau}H\gamma_k)^{1/2} \equiv \|\gamma_k\|_H$, where $H = \int B(u)B^{\tau}(u)\,du$. Then, the penalized profile quasi-likelihood function can be written as follows:
$$Q_n(\eta) = \ln L_n(\eta) - n\sum_{j=2}^{p+2}p_{\lambda_{1,n}}(|\theta_j|) - n\sum_{k=1}^{q}p_{\lambda_{2,n}}\left(\|\gamma_k\|_H\right). \qquad (9)$$
Let $\hat{\eta} = (\hat{\theta}^{\tau}, \hat{\gamma}^{\tau})^{\tau}$ be the solution obtained by maximizing (9). Then, $\hat{\theta}$ is the penalized profile quasi-likelihood estimator (penalized profile QMLE) of $\theta$, and the estimator of $\alpha_k(u)$ is obtained as $\hat{\alpha}_k(u) = B^{\tau}(u)\hat{\gamma}_k$. Next, we study the asymptotic properties of the resulting penalized estimators. Without loss of generality, we assume that $\theta_{0j} = 0$ for $j = s+1, \dots, p+2$, with the remaining $\theta_{0j}$ ($j = 1, \dots, s$) being the non-zero components of $\theta_0$. Similarly, we assume that $\alpha_{0k}(\cdot) \equiv 0$ for $k = d+1, \dots, q$ and that the $\alpha_{0k}(\cdot)$ ($k = 1, \dots, d$) constitute all the non-zero components of $\alpha_0(\cdot)$.
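Before moving to the asymptotic results, we note that the SCAD derivative used in the penalty above is straightforward to evaluate numerically. A minimal sketch (the function name is ours), using the default $a = 3.7$ recommended by Fan and Li [16]:

```python
import numpy as np

def scad_derivative(omega, lam, a=3.7):
    """p'_lambda(omega) for the SCAD penalty, evaluated at omega >= 0."""
    omega = np.asarray(omega, dtype=float)
    return lam * np.where(omega <= lam, 1.0,
                          np.maximum(a * lam - omega, 0.0) / ((a - 1.0) * lam))
```

For example, `scad_derivative(np.array([0.1, 0.5, 2.0]), lam=0.5)` returns the three derivative values that would enter the diagonal weight matrix used by the algorithm in Section 4.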

3. Asymptotic Results

Denote $G_n = W_n M_n^{-1}$ and $R_n = G_n(X_n\beta_0 + A_0 + D_n\mu_0)$. The following assumptions are needed before establishing the asymptotic properties.
C1
T is finite and greater than 2, and N is large.
C2
The disturbances $\{\epsilon_{it}\}$, $i = 1, \dots, N$, $t = 1, \dots, T$, are independently and identically distributed with zero mean and finite variance $\sigma_0^2$. Additionally, $E|\epsilon_{it}|^{4+v}$ exists for some $v > 0$.
C3
The entries $\{w_{ij}\}$ of $W_n$ satisfy $w_{ii} = 0$ and $w_{ij} = O(1/h_n)$, where $h_n/n \to 0$ as $n \to \infty$.
C4
The matrix M n ( ρ ) is nonsingular for all ρ in a compact parameter space Λ . The sequences { M n 1 ( ρ ) } are uniformly bounded in either row or column sums for all ρ Λ . The true ρ 0 is in the interior of Λ .
C5
The sequences of matrices { W n } and { M n 1 } are uniformly bounded in both row and column sums.
C6
The elements of $X_n$ and $S_n$ are uniformly bounded for all $n$, and the limit $\lim_{n\to\infty} n^{-1}(X_n, S_n, R_n)^{\tau}J_n(X_n, S_n, R_n)$ exists and is nonsingular.
C7
There exists a constant $\lambda_c$ such that $\lambda_c I_n - \Gamma_n\Gamma_n^{\tau}$ is positive semidefinite for all $n$, where $\Gamma_n = (R_n, X_n, G_n)$.
C8
The limits $\lim_{n\to\infty} E\left[n^{-1}\,\partial^2\ln L_n(\eta_T)/\partial\eta\,\partial\eta^{\tau}\right]$ exist.
C9
The third derivatives $\partial^3\ln L_n(\eta)/(\partial\eta_i\partial\eta_j\partial\eta_k)$ exist for all $\eta$ in an open set $H$ that contains the parameter point $\eta_T$. Furthermore, there exist functions $M_{ijk}$ such that $|\partial^3\ln L_n(\eta)/(\partial\eta_i\partial\eta_j\partial\eta_k)| \le M_{ijk}$ for all $\eta \in H$, where $E(M_{ijk}) < \infty$ for all $i, j, k$.
C10
For $k = 1, \dots, q$, $\alpha_k(u) \in C^r(0,1)$ with $r \ge 2$. The distribution of $u_{it}$ is absolutely continuous, and its density is bounded away from zero and infinity on $[0, 1]$.
C11
Let $\pi_1, \dots, \pi_{K_n}$ denote the interior knots within the interval $[0, 1]$, and let $\pi_0 = 0$, $\pi_{K_n+1} = 1$. Define $\varrho_i = \pi_i - \pi_{i-1}$. There exists a constant $C$ such that $\max_i\varrho_i/\min_i\varrho_i \le C$ and $\max_i\{\varrho_i\} = o(K_n^{-1})$.
C12
The knot number $K_n$ is assumed to satisfy $K_n = n^{1/(2r+1)}$.
C13
Let $b_n = \max_{j,k}\left\{|p''_{\lambda_{1j}}(|\theta_{0j}|)|,\ |p''_{\lambda_{2k}}(\|\gamma_{0k}\|_H)|\ :\ |\theta_{0j}| \ne 0,\ \|\gamma_{0k}\|_H \ne 0\right\}$. Then, $b_n \to 0$ as $n \to \infty$.
C14
For $j = s+1, \dots, p+2$ and $k = d+1, \dots, q$, $\liminf_{n\to\infty}\liminf_{\theta_j\to 0^+}\lambda_{1j}^{-1}p'_{\lambda_{1j}}(|\theta_j|) > 0$ and $\liminf_{n\to\infty}\liminf_{\|\gamma_k\|_H\to 0^+}\lambda_{2k}^{-1}p'_{\lambda_{2k}}(\|\gamma_k\|_H) > 0$ hold.
Remark 1.
C1 excludes the scenario where T approaches infinity. Essentially, this assumption describes a setting (large N, small T) that aligns with many spatial data studies. Conversely, the scenario in which only T increases indefinitely and that in which both N and T go to infinity closely resemble the scenario in which only N goes to infinity. C2 is needed to apply the central limit theorem of Kelejian and Prucha [33]. C3–C5 are analogous to Assumption 2 in Lee [31], which concerns the properties of the spatial weight matrix $W_n$ and is essential for the identifiability of $\rho$. Specifically, $\Lambda = (-1, 1)$ when $W_n$ satisfies $\sum_{j=1}^{n}w_{ij} = 1$ for all $i$. C6–C9 are used for the asymptotic normality of the profile QMLE. C10–C12 facilitate achieving the optimal convergence rate for $\hat{\alpha}_k(u)$. He et al. [34] suggested that cubic B-splines are adequate for accurately approximating nonparametric functions, with the number of interior knots set to the integer part of $n^{1/5}$. Meanwhile, C13 and C14 impose assumptions on the penalty function that are comparable to those used in [16,18,19].
Due to the projection matrix $J_n$, a portion of the degrees of freedom is lost; therefore, the estimator $\hat{\sigma}^2$ derived from (9) is not a consistent estimator of $\sigma_0^2$, and a correction is needed. Let $\sigma_T^2 = (T-1)\sigma_0^2/T$, $\theta_T = (\sigma_T^2, \rho_0, \beta_0^{\tau})^{\tau}$ and $\eta_T = (\theta_T^{\tau}, \gamma_0^{\tau})^{\tau}$; a consistent estimator of $\sigma_0^2$ can then be recovered as $T\hat{\sigma}^2/(T-1)$. Under the assumptions above, the following theorem establishes the consistency of the penalized profile QMLE.
Theorem 1.
Suppose that Assumptions C1–C12 hold; then, we have
$$\left\|\hat{\alpha}_k(\cdot) - \alpha_{0k}(\cdot)\right\| = O_p\left(n^{-r/(2r+1)} + a_n\right), \quad k = 1, \dots, q,$$
where $a_n = \max_{j,k}\left\{p'_{\lambda_{1j}}(|\theta_{0j}|),\ p'_{\lambda_{2k}}(\|\gamma_{0k}\|_H)\ :\ |\theta_{0j}| \ne 0,\ \|\gamma_{0k}\|_H \ne 0\right\}$.
Furthermore, under some conditions, we show that such consistent estimators must possess the sparsity property, which is stated as follows.
Theorem 2.
Suppose that Assumptions C1–C14 hold, and let $\lambda_{\max} = \max_{j,k}\{\lambda_{1j}, \lambda_{2k}\}$ and $\lambda_{\min} = \min_{j,k}\{\lambda_{1j}, \lambda_{2k}\}$. If $\lambda_{\max} \to 0$ and $n^{r/(2r+1)}\lambda_{\min} \to \infty$ as $n \to \infty$, then, with probability tending to 1, $\hat{\beta}$ and $\hat{\alpha}(\cdot)$ must satisfy
(i) 
$\hat{\beta}_j = 0$, $j = s+1, \dots, p$.
(ii) 
$\hat{\alpha}_k(\cdot) \equiv 0$, $k = d+1, \dots, q$.
According to Remark 1 in Fan and Li [16], if $\lambda_{\max} \to 0$ as $n \to \infty$, then $a_n = 0$ for $n$ large enough. Consequently, based on Theorems 1 and 2, it becomes evident that, by selecting appropriate tuning parameters, our variable selection approach is consistent. Furthermore, the estimators of the nonparametric components attain the optimal convergence rate, as if the subset of true zero coefficients were already known [35]. Subsequently, we demonstrate that the estimators of the non-zero coefficients in the parametric components share the same asymptotic distribution as those derived from the correct submodel. Let $\hat{\theta} = (\hat{\theta}_1, \dots, \hat{\theta}_s)^{\tau}$ and $\theta_T = (\theta_{T1}, \dots, \theta_{Ts})^{\tau}$ collect the first $s$ components; the following result states the asymptotic normality of $\hat{\theta}$.
Theorem 3.
Suppose that Assumptions C1–C14 and the conditions in Theorem 2 hold, then
$$\sqrt{n}\left(\hat{\theta} - \theta_T\right) \to N\left(0,\ \frac{T}{T-1}\Omega^{-1} + \Omega^{-1}\left[\Psi_{\theta} - 2\Sigma_{\theta\gamma}^{\tau}\Sigma_{\gamma}^{-1}\Psi_{\theta\gamma}\right]\Omega^{-1}\right),$$
where $\Omega$, $\Psi_{\theta}$, $\Psi_{\theta\gamma}$, $\Sigma_{\gamma}$, and $\Sigma_{\theta\gamma}$ are defined in Appendix A.

4. Some Issues in Practice

In practice, we have to choose proper tuning parameters $(\lambda_1, \lambda_2)$ and provide an effective computational algorithm. In this section, we discuss these practical issues.

4.1. Selection of Tuning Parameters

The tuning parameters $\lambda_{1j}$ and $\lambda_{2k}$ need to be chosen. In practice, we suggest taking $\lambda_{1j} = \lambda_1/|\beta_j^{(0)}|$ and $\lambda_{2k} = \lambda_2/\|\gamma_k^{(0)}\|_H$, where $\beta_j^{(0)}$ and $\gamma_k^{(0)}$ are initial (unpenalized) estimates, and the pair $(\lambda_1, \lambda_2)$ is obtained by minimizing the following BIC-type criterion:
$$\mathrm{BIC}(\lambda_1, \lambda_2) = -2\ln L_n(\hat{\eta}_{\lambda_1,\lambda_2}) + df_{\lambda_1,\lambda_2}\times\ln n,$$
where $\hat{\eta}_{\lambda_1,\lambda_2}$ is the estimator $\hat{\eta}$ obtained for given $(\lambda_1, \lambda_2)$, and $df_{\lambda_1,\lambda_2}$ is the number of non-zero elements of $\hat{\theta}$ and of $(\|\hat{\gamma}_1\|_H, \dots, \|\hat{\gamma}_q\|_H)$.
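In practice, the pair $(\lambda_1, \lambda_2)$ can be found by a simple two-dimensional grid search over the BIC criterion above. The sketch below is schematic: it assumes a user-supplied fit_penalized(lam1, lam2) returning the maximized profile quasi-likelihood together with the fitted coefficients; all names are illustrative and not from the paper.

```python
import numpy as np

def select_tuning(fit_penalized, lam1_grid, lam2_grid, n, tol=1e-8):
    """Pick (lam1, lam2) minimizing BIC = -2 ln L_n(eta_hat) + df * ln(n).

    fit_penalized(lam1, lam2) is assumed to return (loglik, theta_hat, gamma_norms),
    where gamma_norms contains ||gamma_k_hat||_H for k = 1, ..., q.
    """
    best = (np.inf, None)
    for lam1 in lam1_grid:
        for lam2 in lam2_grid:
            loglik, theta_hat, gamma_norms = fit_penalized(lam1, lam2)
            df = np.sum(np.abs(theta_hat) > tol) + np.sum(np.abs(gamma_norms) > tol)
            bic = -2.0 * loglik + df * np.log(n)
            if bic < best[0]:
                best = (bic, (lam1, lam2))
    return best[1]
```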

4.2. Computational Algorithm

Given that Q n ( η ) is not differentiable at the origin, the standard gradient method cannot be employed. Therefore, we have devised an iterative algorithm that relies on a local quadratic approximation of the penalty function p λ ( · ) , similar to the approach taken by Fan and Li [16]. Let
$$f(\eta) = \frac{\partial\ln L_n(\eta)}{\partial\eta}, \qquad \Gamma(\eta) = \mathrm{diag}\left\{0,\ \frac{p'_{\lambda_{12}}(|\theta_2|)}{|\theta_2|},\ \dots,\ \frac{p'_{\lambda_{1,p+2}}(|\theta_{p+2}|)}{|\theta_{p+2}|},\ \frac{p'_{\lambda_{21}}(\|\gamma_1\|_H)}{\|\gamma_1\|_H}H,\ \dots,\ \frac{p'_{\lambda_{2q}}(\|\gamma_q\|_H)}{\|\gamma_q\|_H}H\right\},$$
$$U(\eta) = \Gamma(\eta)\eta, \qquad \Sigma(\eta) = -\frac{\partial^2\ln L_n(\eta)}{\partial\eta\,\partial\eta^{\tau}}.$$
Then, a feasible algorithm is as follows:
Step 1
Initialize η ( 0 ) .
Step 2
Update $\eta^{(m+1)} = \eta^{(m)} + \left[\Sigma(\eta^{(m)}) + \Gamma(\eta^{(m)})\right]^{-1}\left[f(\eta^{(m)}) - U(\eta^{(m)})\right]$.
Step 3
Iterate Step 2 until convergence, and denote the final estimators as the penalized profile quasi-likelihood estimators.
The initial value $\eta^{(0)}$ in Step 1 is obtained from the profile QMLE, i.e., from solving $\partial\ln L_n(\eta^{(0)})/\partial\eta = 0$.
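The iteration in Steps 1–3 can be coded as a short fixed-point loop. The sketch below is schematic and follows the sign convention used above ($\Sigma(\eta)$ denoting the negative Hessian of $\ln L_n$); score, neg_hessian and penalty_weight_matrix stand for user-supplied functions returning $f(\eta)$, $\Sigma(\eta)$ and $\Gamma(\eta)$ and are not part of the paper.

```python
import numpy as np

def lqa_iterate(eta0, score, neg_hessian, penalty_weight_matrix,
                max_iter=100, tol=1e-6):
    """Local quadratic approximation iteration:
    eta <- eta + [Sigma(eta) + Gamma(eta)]^{-1} [f(eta) - Gamma(eta) eta]."""
    eta = np.asarray(eta0, dtype=float).copy()
    for _ in range(max_iter):
        f = score(eta)
        Sigma = neg_hessian(eta)            # minus the Hessian of ln L_n
        Gamma = penalty_weight_matrix(eta)  # diag{0, p'(|theta_j|)/|theta_j|, ...}
        step = np.linalg.solve(Sigma + Gamma, f - Gamma @ eta)
        eta_new = eta + step
        if np.max(np.abs(eta_new - eta)) < tol:
            return eta_new
        eta = eta_new
    return eta
```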

5. Monte Carlo Simulations

In this section, we conduct Monte Carlo simulations to assess the finite sample performance of our proposed method. Following the methodology employed by Li and Liang in [19], we evaluate the estimator θ ^ using the generalized mean square error (GMSE), which is defined as follows:
$$\mathrm{GMSE} = \frac{1}{n}\left(\hat{\theta} - \theta_T\right)^{\tau}\left(\mathbf{1}, W_nY_n, X_n\right)^{\tau}\left(\mathbf{1}, W_nY_n, X_n\right)\left(\hat{\theta} - \theta_T\right).$$
The performance of estimator α ^ ( · ) is assessed using average square errors (ASE):
$$\mathrm{ASE} = \frac{1}{S}\sum_{s=1}^{S}\sum_{k=1}^{q}\left[\hat{\alpha}_k(u_s) - \alpha_{0k}(u_s)\right]^2,$$
where u s , s = 1 , , S are the grid points at which the function α ^ ( · ) is evaluated. In our simulation, S = 200 is used.
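For completeness, both evaluation criteria are easy to compute once the fitted quantities are available; a small sketch with illustrative function names (not the authors' code) follows.

```python
import numpy as np

def gmse(theta_hat, theta_true, design):
    """Generalized MSE: (1/n) d' (D'D) d with d = theta_hat - theta and D = (1, W_n Y, X)."""
    d = theta_hat - theta_true
    return float(d @ (design.T @ design) @ d) / design.shape[0]

def ase(alpha_hat, alpha_true, grid):
    """Average squared error of the fitted coefficient functions over S grid points."""
    diffs = np.array([[a_hat(u) - a0(u) for u in grid]
                      for a_hat, a0 in zip(alpha_hat, alpha_true)])
    return float(np.mean(np.sum(diffs ** 2, axis=0)))
```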
The spatial weight matrix is generated by the following procedure: (i) calculate $G_N = \mathrm{round}(N^{0.8})$ as the number of "groups" and $m = N^{0.2}$ as the average number of individuals in each group; (ii) generate the group sizes $n_i \sim \mathrm{Uniform}(0.8m, 1.2m)$ ($i = 1, \dots, G_N$) and adjust the $n_i$ so that $\sum_{i=1}^{G_N}n_i = N$; (iii) form the matrices $W_i$ ($i = 1, \dots, G_N$) with zero diagonal elements and $1/(n_i - 1)$ for the off-diagonal elements; (iv) generate the final spatial matrix $W_N = \mathrm{diag}\{W_1, \dots, W_{G_N}\}$.
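A hedged numpy sketch of steps (i)–(iv) above (function and variable names are ours):

```python
import numpy as np
from scipy.linalg import block_diag

def make_group_weights(N, rng=None):
    """Block-diagonal spatial weight matrix with equal weights within each group."""
    rng = np.random.default_rng(rng)
    G = int(round(N ** 0.8))            # (i) number of groups
    m = N ** 0.2                        # (i) average group size
    sizes = np.maximum(rng.uniform(0.8 * m, 1.2 * m, size=G).round().astype(int), 2)
    while sizes.sum() != N:             # (ii) adjust group sizes to sum to N
        idx = rng.integers(G)
        if sizes.sum() < N:
            sizes[idx] += 1
        elif sizes[idx] > 2:
            sizes[idx] -= 1
    blocks = []
    for n_i in sizes:                   # (iii) zero diagonal, 1/(n_i - 1) elsewhere
        W_i = np.full((n_i, n_i), 1.0 / (n_i - 1))
        np.fill_diagonal(W_i, 0.0)
        blocks.append(W_i)
    return sizes, block_diag(*blocks)   # (iv) W_N = diag{W_1, ..., W_GN}
```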
Data are generated from model (2), and we construct two examples to simulate different scenarios. Example 1 is a regular example that is primarily used to verify the asymptotic properties of penalized profile likelihood estimators. Various sample sizes, degrees of spatial dependence, distributions of disturbances, and variances in disturbances are considered. Example 2 focuses on examining the performance of the proposed method in cases where the sample size is large, and the dimensions of parametric and nonparametric components are high. Unlike Example 1, Example 2 includes covariates with AR structure and different functions. All simulation results are obtained based on 500 repetitions.

5.1. Example 1: Regular Scenario

Let $\beta_0 = (2, 1, \mathbf{0}_5)^{\tau}$ and $\alpha_0(u) = (2\sin(2\pi u), 2\cos(2\pi u), \mathbf{0}_5)^{\tau}$, where $\mathbf{0}_5$ denotes five zero components (or zero functions). To simulate different degrees of spatial dependence, we choose $\rho_0 \in \{0, 0.3, 0.7\}$. We consider two distributions for $\epsilon_{it}$: (i) $\epsilon_{it} \sim N(0, \sigma_0^2)$ and (ii) $\epsilon_{it} \sim \sqrt{\sigma_0^2/1.5}\cdot t(6)$, with $\sigma_0^2 \in \{1, 2\}$. The covariates are generated as $X_{it} \sim N(0, I_7)$, $Z_{it} \sim N(0, I_7)$ and $u_{it} \sim U(0, 1)$, and the fixed effects $\mu_{0i}$, $i = 1, \dots, N$, are generated from $U(0, 1)$. We use cubic B-splines in all simulations and set $N = 30, 60$ and $T = 10, 15$, respectively.
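To make the data-generating process concrete, here is a sketch of how one replication of Example 1 could be simulated; it reuses the illustrative helpers sketched earlier (make_group_weights and stack_panel), which are our own constructions rather than the authors' code.

```python
import numpy as np

def simulate_example1(N=30, T=10, rho0=0.3, sigma2=1.0, t_noise=False, rng=None):
    rng = np.random.default_rng(rng)
    _, W_N = make_group_weights(N, rng)
    beta0 = np.r_[2.0, 1.0, np.zeros(5)]
    X = rng.normal(size=(T, N, 7))
    Z = rng.normal(size=(T, N, 7))
    u = rng.uniform(size=(T, N))
    mu0 = rng.uniform(size=N)
    # varying coefficients: alpha_01 = 2 sin(2 pi u), alpha_02 = 2 cos(2 pi u), rest zero
    alpha = np.zeros((T, N, 7))
    alpha[..., 0] = 2.0 * np.sin(2.0 * np.pi * u)
    alpha[..., 1] = 2.0 * np.cos(2.0 * np.pi * u)
    A0 = np.sum(Z * alpha, axis=-1).reshape(N * T)
    if t_noise:                      # scaled t(6) disturbance with variance sigma2
        V = np.sqrt(sigma2 / 1.5) * rng.standard_t(6, size=N * T)
    else:
        V = rng.normal(scale=np.sqrt(sigma2), size=N * T)
    Y_n, W_n, X_n, D_n = stack_panel(W_N, X, mu0, A0, V, rho0, beta0)
    return Y_n, W_n, X_n, Z, u
```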
We compare the performance of the variable selection procedure based on the SCAD penalty (SCAD) proposed by this paper with that based on the adaptive LASSO (ALASSO) penalty [17]. Table 1 and Table 2 report the effects of variable selection under normal disturbance. The column labeled “C” indicates the average number of true zeros correctly set to zero, while the column labeled “I” shows the average number of true non-zeros incorrectly set to zero. The row labeled “Oracle” refers to the oracle estimators computed using the true model when the zero coefficients are known.
From Table 1 and Table 2, we can infer the following consequences. (i) As n = N × T increases, the performance of all variable selection methods, both parametric and nonparametric, converges towards the oracle procedure in terms of model error and complexity. (ii) For the parametric component, SCAD-based variable selection outperforms ALASSO-based methods, while for the nonparametric component, the reverse is true. However, overall, the effects of variable selection based on these two penalty functions are satisfactory and comparable. (iii) When the true parameter ρ 0 = 0 , reducing the model to a non-spatial varying coefficient panel model, the proposed variable selection methods can accurately identify the true model.
Table A1 in Appendix B reports the variable selection results under the t disturbance. The conclusions drawn from Tables 1 and 2 also hold for Table A1. Comparing Tables 1, 2 and A1, the GMSE and ASE under the t disturbance are slightly larger than those under the normal disturbance, but the difference is small enough to be ignored, which suggests that the proposed variable selection procedure is robust.

5.2. Example 2: High-Dimension Scenario

Let $N \times T = 100 \times 20$, $\beta_0 = (3, 1.5, 2, 1, \mathbf{0}_{46})^{\tau}$, and $\alpha_0(u) = \left(2\sin(2\pi u),\ 4u(1-u)(u-3),\ \ln(16u^2-1),\ \mathbf{0}_{12}\right)^{\tau}$. The covariates $(X_{it}^{\tau}, Z_{it}^{\tau})^{\tau}$ are generated from $N(0, \mathrm{AR}_{0.5})$, where
$$\mathrm{AR}_{0.5} = \begin{pmatrix} 1 & 0.5 & \cdots & 0.5^{65} \\ 0.5 & 1 & \cdots & 0.5^{64} \\ \vdots & \vdots & \ddots & \vdots \\ 0.5^{65} & 0.5^{64} & \cdots & 1 \end{pmatrix}.$$
To save space, we only considered the case where the disturbance term follows a normal distribution in this example. The remaining settings are identical to those in Example 1.
Table A2 in Appendix B presents the outcomes of Example 2. As depicted in Table A2, the proposed procedures demonstrate robust performance despite the substantial dimensions of both the parametric and nonparametric components.

6. A Real Example

In this section, we use China's provincial carbon emission panel dataset to illustrate the proposed methods. The dataset contains 14 annual variables for 30 provinces in China from 2007 to 2019. The model used is an extension of the STIRPAT model of Dietz and Rosa [36], which is specified as
$$\ln y_{it} = \rho_0\sum_{j=1}^{30}w_{ij}\ln y_{jt} + \sum_{j=1}^{8}\beta_{0j}\ln x_{itj} + \sum_{k=1}^{6}\alpha_{0k}(u_{it})\ln z_{itk} + \mu_{0i} + \epsilon_{it}$$
for $i = 1, \dots, 30$; $t = 1, \dots, 13$. The corresponding variables are described in Table 3. The model aims to analyze the social factors that affect carbon dioxide emissions, especially the affluence factors. The spatial weight matrix $W_N = (w_{ij})$ is specified by a contiguity rule; that is, let
$$w_{ij} = \begin{cases} 1, & \text{provinces } i \text{ and } j \text{ share a common geographic boundary}, \\ 0, & \text{otherwise}. \end{cases}$$
Then, adjust w i j so that the diagonal elements of W N are 0 and the row sums are 1.
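As a small illustration, the contiguity weight matrix can be built from a binary adjacency matrix as follows (a sketch with an assumed adjacency input, not the authors' code):

```python
import numpy as np

def contiguity_weights(adjacency):
    """Row-normalized contiguity matrix: w_ij = 1 if provinces i and j share a border."""
    W = np.asarray(adjacency, dtype=float)
    np.fill_diagonal(W, 0.0)
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
```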
In previous studies, the affluence factor always had significantly positive effects on carbon dioxide emissions, and these effects were always set as constants; see [37,38]. This phenomenon can be explained by the following mechanism: among the three major industries, the secondary industry exhibits the highest dependence on energy consumption, followed by the tertiary industry. These two industries are also the pillar industries that drive economic development, which implies a significant reliance of economic growth on energy consumption. Currently, fossil fuels constitute the primary energy source in the world, and their consumption releases a substantial amount of carbon dioxide. As existing research indicates, economic development therefore contributes significantly to carbon emissions. It is well known that, among fossil fuels, coal consumption generates the most carbon dioxide, suggesting that the energy structure may directly influence the impact of economic development on carbon emissions. However, considering the significant variations in energy structure across provinces and cities and over different years, it may be unreasonable to measure this impact with a constant. Therefore, we wish to capture the potential heterogeneity of this impact using the coal consumption proportion (CCP), which is why we set the CCP as $u_{it}$ and the six variables describing affluence as $z_{it1}, \dots, z_{it6}$.
Table 4 reports the estimated constant coefficients, and Figure 1 depicts the estimated coefficient functions. There is not much difference between the results for SCAD and ALASSO, so we focus on the SCAD results in the following analysis. The spatial coefficient $\hat{\rho}$ is not shrunk to zero and is positive, which means that neighboring provinces exhibit a substantial positive spatial correlation. The constant coefficients $\hat{\beta}_2$, $\hat{\beta}_3$, $\hat{\beta}_4$, and $\hat{\beta}_6$ are positive: the estimates imply that when EI, RDF, FIR, and FV each increase by 1%, CE increases on average by about 0.35%, 0.087%, 0.245%, and 0.1%, respectively. $\hat{\beta}_7$ is the only negative constant coefficient, which implies that as the public transport passenger volume increases, carbon emissions tend to decrease. Public transportation is a low-carbon mode of transportation: the more people choose to travel by public transportation, the fewer people use private cars, leading to a reduction in carbon emissions. This explains the negative coefficient $\hat{\beta}_7$. The remaining coefficients are estimated to be zero, which means that they are excluded by the variable selection. In addition, most of the coefficient functions in Figure 1 show an upward trend, which means that as the coal consumption proportion increases, the contribution of the affluence factors to carbon emissions increases. Meanwhile, two coefficient functions are removed by the variable selection. This is generally consistent with the earlier analysis in this section.

7. Discussion and Conclusions

Within the context of SVCSAR panel models with fixed effects, we developed a variable selection process that utilizes basis function approximations alongside the profile quasi-likelihood method. This approach allows for the simultaneous selection of significant variables in both parametric and nonparametric components, as well as the estimation of unknown coefficients. By selecting appropriate tuning parameters, we demonstrated that this selection process is consistent, and the estimators of constant coefficients exhibit the oracle property. Simulation results highlight the effectiveness of our proposed method in selecting variables and estimating both constant and varying coefficients. In this study, we assume that the dimensions of covariates X and Z remain fixed. Additionally, we also presented simulations in Section 5 and obtained desired results when the dimensions p and q were large. However, it is worth noting that our variable selection process may not be applicable to scenarios involving ultra-high-dimensional covariates. As a potential area for future research, exploring variable selection techniques for SVCSAR panel models with ultra-high-dimensional covariates would be of great interest.

Author Contributions

Conceptualization, R.T. and M.X.; Methodology, R.T., M.X. and D.X.; Software, M.X.; Formal analysis, R.T. and M.X.; Writing—original draft, R.T., M.X. and D.X.; Supervision, R.T. All authors have read and agreed to the published version of the manuscript.

Funding

R.T.’s work is supported by the Zhejiang Provincial Philosophy and Social Sciences Planning Project (No. 24NDJC014YB). D.X.’s work is supported by the Zhejiang Provincial Natural Science Foundation of China (No. LY23A010013).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of Theorems

The following list summarizes some frequently used facts in the appendix.
Fact A1.
Let $M_1$, $M_2$ be $n \times n$ symmetric matrices, with $M_2$ being positive semidefinite. Then, $\lambda_i(M_1) \le \lambda_i(M_1 + M_2)$ for $i = 1, \dots, n$, where $\lambda_i$ denotes the $i$-th eigenvalue.
Fact A2.
Let $M_1$ and $M_2$ be $n \times n$ matrices that are uniformly bounded in either row or column sums; then, the entries of their product $M_1M_2$ are also uniformly bounded.
Before proving the main theorems, we supplement several lemmas.
Lemma A1.
Let $\delta_n = A_0 - S_n\gamma_0$ and suppose that Assumptions C1–C12 hold. Then,
$$\frac{1}{\sqrt{n}}H_n^{\tau}J_n\delta_n = o_p(1),$$
for H n = V n , A 0 , X n and W n Y n .
Proof. 
The proof is similar to that of Lemma 1 in Tian et al. [32]. □
Lemma A2.
Suppose that Assumptions C1–C4 hold, then
(I) $G_n$ is uniformly bounded in either row or column sums; (II) $\mathrm{tr}(G_n)/n = O(1)$; (III) $\mathrm{tr}(G_n^{\tau}G_n)/n = O(1)$; (IV) $\mathrm{tr}(J_n)/n = \frac{T-1}{T}$; (V) $\mathrm{tr}(G_n^{\tau}J_n)/n = \frac{T-1}{T}\cdot\frac{\mathrm{tr}(G_n)}{n}$; (VI) $\mathrm{tr}(G_n^{\tau}J_nG_n)/n = \frac{T-1}{T}\cdot\frac{\mathrm{tr}(G_n^{\tau}G_n)}{n}$.
Proof. 
The proof method is similar to that of Lemma 3 in Tian et al. [32]; therefore, it will not be elaborated here. □
Lemma A3.
Suppose that Assumptions C1–C12 hold, then
$$\frac{1}{\sqrt{n}}\frac{\partial\ln L_n(\eta_T)}{\partial\eta} = O_p(1).$$
Proof. 
The first order partial derivatives of the profile likelihood function is
1 n ln L n ( η T ) σ 2 = n 2 σ T 2 + 1 2 σ T 4 n V n τ J n V n + o p ( 1 ) 1 n ln L n ( η T ) ρ = 1 n tr G n + 1 σ T 2 n R n τ J n V n + 1 σ T 2 n V n τ G n τ J n V n + o p ( 1 ) 1 n ln L n ( η T ) β = 1 σ T 2 n X n τ J n V n + o p ( 1 ) . 1 n ln L n ( η T ) γ = 1 σ T 2 n S n τ J n V n + o p ( 1 ) .
The variance of n 1 / 2 ln L n ( η T ) / σ 2 is
var 1 n ln L n ( η T ) σ 2 = 1 4 σ T 8 n var V n τ J n V n + o ( 1 ) = 1 4 σ T 8 ( T 1 ) 2 ( μ 4 3 σ 0 4 ) T 2 + 2 ( T 1 ) σ 0 4 T + o ( 1 ) = O ( 1 ) .
Therefore, n 1 / 2 ln L n ( η T ) / σ 2 = O p ( 1 ) . The variance of n 1 / 2 ln L n ( η T ) / ρ is
var 1 n ln L n ( η T ) ρ 2 σ T 4 n var R n τ J n V n + var V n τ G n τ J n V n + o ( 1 ) 2 σ T 4 σ 0 2 λ c + ( T 1 ) 2 ( μ 4 3 σ 0 4 ) T 2 + ( T 1 ) σ 0 4 tr [ ( G n + G n τ ) G n ) ] T = O ( 1 ) . .
Thus, n 1 / 2 ln L n ( η T ) / ρ = O p ( 1 ) . Given the uniform boundedness of the elements of X n and S n are uniformly bound for all n, as stated in C6, it is evident that
$$\frac{1}{\sigma_T^2\sqrt{n}}X_n^{\tau}J_nV_n = O_p(1), \qquad \frac{1}{\sigma_T^2\sqrt{n}}S_n^{\tau}J_nV_n = O_p(1). \qquad \square$$
Lemma A4.
Suppose that Assumptions C1–C12 hold, then
$$\frac{1}{n}\left[\frac{\partial^2\ln L_n(\eta_T)}{\partial\eta\,\partial\eta^{\tau}} - E\left(\frac{\partial^2\ln L_n(\eta_T)}{\partial\eta\,\partial\eta^{\tau}}\right)\right] = o_p(1).$$
Proof. 
Utilizing a similar argument as in Theorem 3.2 of Lee [31], Lemma A4 is proven. □
Proof of Theorem 1.
Let $\Delta_n = a_n + n^{-r/(2r+1)}$, $\theta = \theta_T + \Delta_n\zeta_1$, $\gamma = \gamma_0 + \Delta_n\zeta_2$, and $\zeta = (\zeta_1^{\tau}, \zeta_2^{\tau})^{\tau}$. It suffices to show that, for any given $\varepsilon > 0$, there exists a large constant $C$ such that
$$P\left\{\sup_{\|\zeta\| = C}Q(\theta, \gamma) < Q(\theta_T, \gamma_0)\right\} \ge 1 - \varepsilon. \qquad \mathrm{(A1)}$$
Let D ( ζ ) = Q ( θ , γ ) Q ( θ T , γ 0 ) ; then, with a simple calculation, we see that
D ( ζ ) = Δ n ln L n ( η T ) η τ ζ + 1 2 Δ n 2 ζ τ 2 ln L n ( η T ) η η τ ζ ( 1 + o p ( 1 ) ) + n j = 2 p + 2 [ p λ 1 j ( | θ j | ) p λ 1 j ( | θ T j | ) ] + n k = 1 q [ p λ 2 k ( γ k H ) p λ 2 k ( γ 0 j H ) ] D 1 + D 2 + D 3 + D 4 .
It follows by Lemma A3 and the Cauchy inequality that
| D 1 | = Δ n ln L n ( η T ) η τ ζ Δ n n 1 n ln L n ( η T ) η · ζ = ζ O p ( n K n Δ n ) = ζ O p ( n Δ n 2 ) .
According to Lemma A4, a simple calculation yields
D 2 = 1 2 Δ n 2 ζ τ 2 ln L n ( η T ) η η τ ζ ( 1 + o p ( 1 ) ) = n Δ n 2 ζ τ Σ n , η ζ ( 1 + o p ( 1 ) ) .
Hence, by choosing a sufficiently large C, D 2 dominates D 1 uniformly in ζ = C . Furthermore, invoking C13 and p λ 1 j ( 0 ) = 0 , and by the standard argument of the Taylor expansion, we obtain that
D 3 n j = 2 s [ p λ 1 j ( | θ j | ) p λ 1 j ( | θ T j | ) ] n j = 2 s [ n Δ n p λ 1 j ( | θ T j | ) sgn ( θ T j ) | ζ 1 j | + n Δ n 2 p λ 1 j ( | θ T j | ) | ζ 1 j | 2 ( 1 + o p ( 1 ) ) ] s n Δ n a n ζ + n Δ n 2 b n ζ 2 = ζ O p ( n Δ n 2 ) + ζ o p ( n Δ n 2 ) .
Then, it is easy to show that $D_3$ is dominated by $D_2$ uniformly in $\|\zeta\| = C$. By the same argument, we can prove that $D_4$ is also dominated by $D_2$. Hence, by choosing a sufficiently large $C$, (A1) holds. This implies that a local maximizer $\hat{\gamma}$ exists such that $\|\hat{\gamma} - \gamma_0\| = O_p(\Delta_n) = O_p(a_n + n^{-r/(2r+1)})$, and hence $\|\hat{\gamma}_k - \gamma_{0k}\| = O_p(a_n + n^{-r/(2r+1)})$. Let $\delta_k = \alpha_k(u) - B^{\tau}(u)\gamma_{0k}$, $k = 1, \dots, q$. Note that
$$\begin{aligned}\left\|\hat{\alpha}_k(u) - \alpha_{0k}(u)\right\|^2 &= \int_0^1\left[\hat{\alpha}_k(u) - \alpha_{0k}(u)\right]^2du = \int_0^1\left[B^{\tau}(u)\hat{\gamma}_k - B^{\tau}(u)\gamma_{0k} - \delta_k\right]^2du\\ &\le 2\int_0^1\left[B^{\tau}(u)(\hat{\gamma}_k - \gamma_{0k})\right]^2du + 2\int_0^1\delta_k^2\,du = 2(\hat{\gamma}_k - \gamma_{0k})^{\tau}H(\hat{\gamma}_k - \gamma_{0k}) + 2\int_0^1\delta_k^2\,du,\end{aligned}$$
and by invoking H = O ( 1 ) , a simple calculation yields
$$(\hat{\gamma}_k - \gamma_{0k})^{\tau}H(\hat{\gamma}_k - \gamma_{0k}) = O_p\left(a_n^2 + n^{-2r/(2r+1)}\right).$$
Together with
$$\int_0^1\delta_k^2\,du = O_p\left(n^{-2r/(2r+1)}\right),$$
the proof is completed. □
Proof of Theorem 2.
We begin by proving part (i). Given $\lambda_{\max} \to 0$, it follows easily that $a_n = 0$ for large $n$. Then, according to Theorem 1, it suffices to show that, for any $\theta_j$ satisfying $|\theta_j - \theta_{0j}| = O_p(n^{-r/(2r+1)})$, $j = 1, \dots, s$, and any $\gamma$ satisfying $\|\gamma - \gamma_0\| = O_p(n^{-r/(2r+1)})$, with some given small $\varepsilon = Cn^{-r/(2r+1)}$, as $n \to \infty$ the probability tends to 1 that
$$\frac{\partial Q_n(\eta)}{\partial\theta_j} < 0, \quad \text{for } 0 < \theta_j < \varepsilon,\ j = s+1, \dots, p+2, \qquad \mathrm{(A2)}$$
and
$$\frac{\partial Q_n(\eta)}{\partial\theta_j} > 0, \quad \text{for } -\varepsilon < \theta_j < 0,\ j = s+1, \dots, p+2. \qquad \mathrm{(A3)}$$
Thus, (A2) and (A3) imply that the maximizer of Q n ( η ) attains θ j = 0 , j = s + 1 , , p + 2 .
Using a similar argument to the proof of Theorem 1, we obtain that
Q n ( η ) θ j = ln L n ( η ) θ j n p λ 1 j ( | θ 0 j | ) sgn ( θ 0 j ) = ln L n ( η T ) θ j + k = 1 2 ln L n ( η T ) θ j θ k ( η k η T k ) + k = 1 l = 1 3 ln L n ( η ) θ j θ k θ l ( η k η T k ) ( η l η T l ) n p λ 1 j ( | θ j | ) sgn ( θ j ) ,
where the intermediate point $\eta^*$ lies between $\eta$ and $\eta_T$. From Lemmas A1 and A2 and Assumption C9, we have
$$\frac{1}{n}\frac{\partial\ln L_n(\eta_T)}{\partial\theta_j} = O_p\left(n^{-1/2}\right), \quad \frac{1}{n}\frac{\partial^2\ln L_n(\eta_T)}{\partial\eta\,\partial\eta^{\tau}} = E\left[\frac{1}{n}\frac{\partial^2\ln L_n(\eta_T)}{\partial\eta\,\partial\eta^{\tau}}\right] + o_p(1), \quad \frac{1}{n}\frac{\partial^3\ln L_n(\eta^*)}{\partial\theta_j\partial\eta_k\partial\eta_l} = O_p(1).$$
Then,
Q n ( η ) θ j = n λ 1 j λ 1 j 1 1 n ln L n ( η T ) θ j + λ 1 j 1 1 n k = 1 2 ln L n ( η T ) θ j θ k ( η k η T k ) + λ 1 j 1 1 n k = 1 l = 1 3 ln L n ( η ) θ j θ k θ l ( η k η T k ) ( η l η T l ) λ 1 j 1 p λ 1 j ( | θ j | ) sgn ( θ j ) = n λ 1 j λ 1 j 1 O p n r 2 r + 1 λ 1 j 1 p λ 1 j ( | θ j | ) sgn ( θ j ) .
Since $\liminf_{n\to\infty}\liminf_{\theta_j\to 0^+}\lambda_{1j}^{-1}p'_{\lambda_{1j}}(|\theta_j|) > 0$ and $n^{r/(2r+1)}\lambda_{1j} \ge n^{r/(2r+1)}\lambda_{\min} \to \infty$, the sign of the derivative is completely determined by that of $\theta_j$; then, (A2) and (A3) hold.
By applying similar techniques to our analysis of part (i) in this theorem, we find that with probability tending to 1, γ ^ k = 0 , k = d + 1 , , q . Then, using the fact that sup u B ( u ) = O ( 1 ) , the result of this theorem follows immediately from α ^ k ( u ) = B τ ( u ) γ ^ k . □
Proof of Theorem 3.
Let γ = ( γ 1 τ , , γ d τ ) τ , γ 0 = ( γ 01 τ , , γ 0 d τ ) τ , η = ( θ , γ ) , and η T = ( θ T , γ 0 ) . In order to obtain the asymptotic distribution, we first write the components of n 1 / 2 ln L n ( η T ) / η as follows:
1 n ln L n ( η T ) σ 2 = n 2 σ T 2 + 1 2 σ T 4 n ( δ n + V n ) τ J n ( δ n + V n ) 1 n ln L n ( η T ) ρ = 1 n tr G n + 1 σ T 2 n ( R n τ J n V n + V n τ G n τ J n V n + ( W n Y n ) τ J n δ n ) 1 n ln L n ( η T ) β = 1 σ T 2 n X n τ J n ( δ n + V n ) 1 n ln L n ( η T ) γ = 1 σ T 2 n S n τ J n ( δ n + V n ) ,
where δ n = A 0 S n γ 0 . By Lemma A1, 1 n H n τ J n δ n = o p ( 1 ) for H n = V n , δ n , X n and W n Y n , and the formula above is rewritten as follows:
1 n ln L n ( η T ) σ 2 = n 2 σ T 2 + 1 2 σ T 4 n V n τ J n V n + o p ( 1 ) 1 n ln L n ( η T ) ρ = 1 n tr G n + 1 σ T 2 n ( R n τ J n V n + V n τ G n τ J n V n ) + o p ( 1 ) 1 n ln L n ( η T ) β = 1 σ T 2 n X n τ J n V n + o p ( 1 ) 1 n ln L n ( η T ) γ = 1 σ T 2 n S n τ J n V n + o p ( 1 ) .
Let μ 3 = E | ϵ i t | 3 and μ 4 = E | ϵ i t | 4 , T n τ J n T n / n Φ T T for T n = X n , R n or S n . Then, using Lemma A2 we write the following:
E 1 n 2 ln L n ( η T ) η η τ = σ 0 2 tr J n n σ T 6 1 2 σ T 4 + δ n τ J n δ n n σ T 6 R n τ J n δ n n σ T 4 + σ 0 2 tr ( G n τ J n ) n σ T 4 R n τ J n R n n σ T 2 + tr G n 2 n + σ 0 2 tr ( G n τ J n G n ) n σ T 2 X n τ J n δ n n σ T 4 X n τ J n R n n σ T 2 X n τ J n X n n σ T 2 S n τ J n δ n n σ T 4 S n τ J n R n n σ T 2 S n τ J n X n n σ T 2 S n τ J n S n n σ T 2 = 1 2 σ T 4 tr G n n σ T 2 Φ R R σ T 2 + tr ( G n 2 + G n τ G n ) n 0 Φ X R σ T 2 Φ X X σ T 2 0 Φ S R σ T 2 Φ S X σ T 2 Φ S S σ T 2 + o ( 1 ) Σ n , η + o ( 1 )
E 1 n ln L n ( η T ) η ln L n ( η T ) η τ = T T 1 1 2 σ T 4 + μ 4 3 σ 0 4 4 σ T 4 σ 0 4 T T 1 tr G n n σ T 2 + ( μ 4 3 σ 0 4 ) tr G n 2 n σ T 2 σ 0 4 + μ 3 R n τ J n diag ( J n ) 2 n σ T 6 T T 1 Φ R R σ T 2 + tr ( G n G n + G n τ G n ) n + ( μ 4 3 σ 0 4 ) g n , i i 2 4 σ T 4 σ 0 4 + 2 μ 3 R n τ J n diag ( G n τ J n ) n σ T 4 μ 3 X n τ J n diag ( J n ) 2 n σ T 6 T T 1 Φ X R σ T 2 + μ 3 X n τ J n diag ( G n τ J n ) n σ T 4 T T 1 Φ X X σ T 2 μ 3 S n τ J n diag ( J n ) 2 n σ T 6 T T 1 Φ S R σ T 2 + μ 3 X n τ J n diag ( G n τ J n ) n σ T 4 T T 1 Φ S X σ T 2 T T 1 Φ S S σ T 2 + o ( 1 ) = T T 1 ( Σ n , η + Ψ n , η ) + o ( 1 ) ,
where g n , i i is the i-th diagonal element of G n and
Ψ n , θ = T 1 T μ 4 3 σ 0 4 4 σ T 4 σ 0 4 ( μ 4 3 σ 0 4 ) tr G n 2 n σ T 2 σ 0 4 + μ 3 R n τ J n diag ( J n ) 2 n σ T 6 ( μ 4 3 σ 0 4 ) g n , i i 2 4 σ T 4 σ 0 4 + 2 μ 3 R n τ J n diag ( G n τ J n ) n σ T 4 μ 3 X n τ J n diag ( J n ) 2 n σ T 6 μ 3 X n τ J n diag ( G n τ J n ) n σ T 4 0 μ 3 S n τ J n diag ( J n ) 2 n σ T 6 μ 3 S n τ J n diag ( G n τ J n ) n σ T 4 0 0 .
To derive the asymptotic distribution of θ ^ , we divide Σ n , η into four block matrices, which correspond to the second-order derivatives of the likelihood function with respect to θ , the cross-partial derivatives of θ and γ , and the second-order derivatives of γ . The matrices are as follows:
Σ n , η Σ n , θ Σ n , θ γ τ Σ n , θ γ Σ n , γ ,
where
Σ n , θ 1 2 σ T 4 tr G n n σ T 2 Φ R R σ T 2 + tr ( G n 2 + G n τ G n ) n 0 Φ X R σ T 2 Φ X X σ T 2 , Σ n , θ γ 0 , Φ S R σ T 2 Φ S X σ T 2 , Σ n , γ Φ X X σ T 2 .
Let Σ n , θ be the first s × s upper-left submatrix of Σ n , θ , Σ n , θ γ be the first ( K n + l + 1 ) d × s upper-left submatrix of Σ n , θ γ , and Σ n , γ be the first ( K n + l + 1 ) d × ( K n + l + 1 ) d upper-left submatrix of Σ n , γ . Using the same argument, we partition Ψ n , η into four block matrices, that is
Ψ n , η Ψ n , θ Ψ n , θ γ τ Ψ n , θ γ Ψ n , γ ,
and obtain submatrices Ψ n , θ , Ψ n , θ γ and Ψ n , γ . Let the notation of matrices without subscripts “n” represent the limiting version of the matrices. For example, Σ θ = lim n Σ n , θ . Let Ω = Σ θ Σ θ γ τ ( Σ γ ) 1 Σ θ γ , and we assume that Ω is non-singular.
According to Theorems 1 and 2, as n , with probability tending to 1, Q n ( η ) achieves its maximal value at ( θ ^ τ , 0 ) τ and ( γ ^ τ , 0 ) τ ; then, ( θ ^ τ , 0 ) τ and ( γ ^ τ , 0 ) τ must satisfy the following:
1 n Q n ( ( θ ^ τ , 0 ) τ , ( γ ^ τ , 0 ) τ ) η = 0 .
Applying the Taylor expansion, we have
1 n ln L n ( η T ) θ ln L n ( η T ) γ 1 n 2 ln L n ( η T ) θ θ τ 2 ln L n ( η T ) γ τ θ 2 ln L n ( η T ) γ θ τ 2 ln L n ( η T ) γ γ τ + o p ( 1 ) n θ ^ θ T γ ^ γ 0 + P θ P γ = 0 ,
where
P θ = n p λ 11 ( | θ ^ 1 | ) sgn ( θ ^ 1 ) n p λ 1 s ( | θ ^ s | ) sgn ( θ ^ s ) , P γ = n p λ 21 ( γ ^ 1 H ) H γ ^ 1 γ ^ 1 H n p λ 2 d ( γ ^ d H ) H γ ^ d γ ^ d H .
Applying the Taylor expansion to p λ 1 j ( | θ ^ j | ) , we obtain that
$$p'_{\lambda_{1j}}(|\hat{\theta}_j|) = p'_{\lambda_{1j}}(|\theta_{0j}|) + \left\{p''_{\lambda_{1j}}(|\theta_{0j}|) + o_p(1)\right\}(\hat{\theta}_j - \theta_{0j}).$$
Furthermore, Assumption C13 implies that $p''_{\lambda_{1j}}(|\theta_{0j}|) = o_p(1)$, and note that $p'_{\lambda_{1j}}(|\theta_{0j}|) = 0$ once $\lambda_{\max} \to 0$; hence $P_{\theta} = o_p(\sqrt{n}(\hat{\theta} - \theta_T))$. Using similar arguments, we can prove that $P_{\gamma} = o_p(\sqrt{n}(\hat{\gamma} - \gamma_0))$. Together with Lemma A4 and Equation (A9), a simple calculation yields the following:
1 n ln L n ( η T ) θ [ Σ n , θ + o p ( 1 ) ] n ( θ ^ θ T ) [ Σ n , θ γ τ + o p ( 1 ) ] n ( γ ^ γ 0 ) = 0 1 n ln L n ( η T ) γ [ Σ n , θ γ + o p ( 1 ) ] n ( θ ^ θ T ) [ Σ n , γ + o p ( 1 ) ] n ( γ ^ γ 0 ) = 0 .
By substitution, the term n ( γ ^ γ 0 ) is eliminated, yielding
n ( θ ^ θ T ) = [ Σ n , θ Σ n , θ γ τ ( Σ n , θ γ ) 1 Σ n , θ γ + o p ( 1 ) ] 1 [ I s , Σ n , θ γ τ ( Σ n , θ γ ) 1 ] 1 n ln L n ( η T ) θ 1 n ln L n ( η T ) γ + o p ( 1 ) .
Furthermore, the central limit theorem for linear–quadratic forms [33] shows that
1 n ln L n ( η T ) θ ln L n ( η T ) γ D N 0 , T T 1 Σ θ + Ψ θ Σ θ γ τ + Ψ θ γ τ Σ θ γ + Ψ θ γ Σ γ + Ψ γ ,
where the definition of the symbols in the covariance matrix is located in the context of (A8). Then, invoking (A10) and (A11), and using the Slutsky theorem, we have
$$\sqrt{n}\left(\hat{\theta} - \theta_T\right) \to N\left(0,\ \frac{T}{T-1}\Omega^{-1} + \Omega^{-1}\left[\Psi_{\theta} - 2\Sigma_{\theta\gamma}^{\tau}\Sigma_{\gamma}^{-1}\Psi_{\theta\gamma}\right]\Omega^{-1}\right). \qquad \square$$

Appendix B. Supplementary Simulation Results

Table A1. Variable selection under $\epsilon_{it} \sim \sqrt{\sigma_0^2/1.5}\cdot t(6)$ in Example 1. Within each block, GMSE, C and I refer to $\hat{\theta}$ and ASE, C and I refer to $\hat{\alpha}(\cdot)$; the left block corresponds to $\sigma_0^2 = 1$ and the right block to $\sigma_0^2 = 2$.

| N × T | ρ0 | Method | GMSE | C | I | ASE | C | I | GMSE | C | I | ASE | C | I |
| 30 × 10 | 0 | SCAD | 0.027 | 5.944 | 0 | 0.161 | 4.944 | 0 | 0.076 | 5.938 | 0 | 0.234 | 4.716 | 0 |
| | | ALASSO | 0.032 | 5.748 | 0 | 0.160 | 4.960 | 0 | 0.086 | 5.630 | 0 | 0.211 | 4.914 | 0 |
| | | Oracle | 0.027 | 6.000 | 0 | 0.157 | 5.000 | 0 | 0.071 | 6.000 | 0 | 0.206 | 5.000 | 0 |
| | 0.3 | SCAD | 0.032 | 4.846 | 0 | 0.160 | 4.962 | 0 | 0.084 | 4.798 | 0 | 0.237 | 4.734 | 0 |
| | | ALASSO | 0.032 | 4.902 | 0 | 0.158 | 4.968 | 0 | 0.086 | 4.830 | 0 | 0.213 | 4.944 | 0 |
| | | Oracle | 0.029 | 5.000 | 0 | 0.157 | 5.000 | 0 | 0.074 | 5.000 | 0 | 0.207 | 5.000 | 0 |
| | 0.7 | SCAD | 0.032 | 4.970 | 0 | 0.158 | 4.954 | 0 | 0.092 | 4.966 | 0 | 0.236 | 4.750 | 0 |
| | | ALASSO | 0.036 | 4.904 | 0 | 0.158 | 4.960 | 0 | 0.102 | 4.798 | 0 | 0.218 | 4.918 | 0 |
| | | Oracle | 0.031 | 5.000 | 0 | 0.155 | 5.000 | 0 | 0.086 | 5.000 | 0 | 0.211 | 5.000 | 0 |
| 30 × 15 | 0 | SCAD | 0.021 | 5.972 | 0 | 0.140 | 4.918 | 0 | 0.052 | 5.966 | 0 | 0.188 | 4.712 | 0 |
| | | ALASSO | 0.025 | 5.836 | 0 | 0.138 | 4.980 | 0 | 0.066 | 5.752 | 0 | 0.172 | 4.946 | 0 |
| | | Oracle | 0.022 | 6.000 | 0 | 0.137 | 5.000 | 0 | 0.051 | 6.000 | 0 | 0.170 | 5.000 | 0 |
| | 0.3 | SCAD | 0.021 | 4.890 | 0 | 0.138 | 4.932 | 0 | 0.060 | 4.866 | 0 | 0.185 | 4.736 | 0 |
| | | ALASSO | 0.023 | 4.944 | 0 | 0.135 | 4.998 | 0 | 0.069 | 4.896 | 0 | 0.170 | 4.982 | 0 |
| | | Oracle | 0.020 | 5.000 | 0 | 0.135 | 5.000 | 0 | 0.059 | 5.000 | 0 | 0.167 | 5.000 | 0 |
| | 0.7 | SCAD | 0.025 | 4.994 | 0 | 0.139 | 4.922 | 0 | 0.063 | 4.966 | 0 | 0.183 | 4.762 | 0 |
| | | ALASSO | 0.030 | 4.920 | 0 | 0.138 | 4.994 | 0 | 0.076 | 4.858 | 0 | 0.174 | 4.956 | 0 |
| | | Oracle | 0.025 | 5.000 | 0 | 0.136 | 5.000 | 0 | 0.060 | 5.000 | 0 | 0.167 | 5.000 | 0 |
| 60 × 10 | 0 | SCAD | 0.015 | 5.992 | 0 | 0.128 | 5.000 | 0 | 0.039 | 5.976 | 0 | 0.152 | 4.988 | 0 |
| | | ALASSO | 0.018 | 5.832 | 0 | 0.129 | 4.996 | 0 | 0.053 | 5.812 | 0 | 0.153 | 4.992 | 0 |
| | | Oracle | 0.016 | 6.000 | 0 | 0.128 | 5.000 | 0 | 0.040 | 6.000 | 0 | 0.152 | 5.000 | 0 |
| | 0.3 | SCAD | 0.020 | 4.932 | 0 | 0.129 | 5.000 | 0 | 0.038 | 4.894 | 0 | 0.151 | 4.998 | 0 |
| | | ALASSO | 0.022 | 4.966 | 0 | 0.129 | 4.998 | 0 | 0.043 | 4.928 | 0 | 0.152 | 4.992 | 0 |
| | | Oracle | 0.020 | 5.000 | 0 | 0.129 | 5.000 | 0 | 0.036 | 5.000 | 0 | 0.151 | 5.000 | 0 |
| | 0.7 | SCAD | 0.020 | 4.988 | 0 | 0.128 | 5.000 | 0 | 0.049 | 4.990 | 0 | 0.151 | 4.998 | 0 |
| | | ALASSO | 0.024 | 4.954 | 0 | 0.129 | 4.998 | 0 | 0.060 | 4.926 | 0 | 0.154 | 4.996 | 0 |
| | | Oracle | 0.020 | 5.000 | 0 | 0.128 | 5.000 | 0 | 0.048 | 5.000 | 0 | 0.151 | 5.000 | 0 |
| 60 × 15 | 0 | SCAD | 0.014 | 6.000 | 0 | 0.118 | 4.994 | 0 | 0.028 | 5.998 | 0 | 0.139 | 4.888 | 0 |
| | | ALASSO | 0.016 | 5.906 | 0 | 0.119 | 5.000 | 0 | 0.036 | 5.878 | 0 | 0.136 | 4.998 | 0 |
| | | Oracle | 0.014 | 6.000 | 0 | 0.118 | 5.000 | 0 | 0.030 | 6.000 | 0 | 0.135 | 5.000 | 0 |
| | 0.3 | SCAD | 0.015 | 4.930 | 0 | 0.119 | 4.978 | 0 | 0.037 | 4.958 | 0 | 0.139 | 4.854 | 0 |
| | | ALASSO | 0.017 | 4.934 | 0 | 0.120 | 4.984 | 0 | 0.043 | 4.938 | 0 | 0.135 | 5.000 | 0 |
| | | Oracle | 0.015 | 5.000 | 0 | 0.119 | 5.000 | 0 | 0.037 | 5.000 | 0 | 0.134 | 5.000 | 0 |
| | 0.7 | SCAD | 0.015 | 4.998 | 0 | 0.119 | 4.998 | 0 | 0.030 | 4.990 | 0 | 0.136 | 4.920 | 0 |
| | | ALASSO | 0.018 | 4.970 | 0 | 0.119 | 4.998 | 0 | 0.040 | 4.940 | 0 | 0.136 | 4.994 | 0 |
| | | Oracle | 0.016 | 5.000 | 0 | 0.119 | 5.000 | 0 | 0.031 | 5.000 | 0 | 0.134 | 5.000 | 0 |
Table A2. Variable selection for $n = 100 \times 20$ in Example 2. Within each block, GMSE, C and I refer to $\hat{\theta}$ and ASE, C and I refer to $\hat{\alpha}(\cdot)$; the left block corresponds to $\sigma_0^2 = 1$ and the right block to $\sigma_0^2 = 2$.

| ρ0 | Method | GMSE | C | I | ASE | C | I | GMSE | C | I | ASE | C | I |
| 0 | SCAD | 0.004 | 46.986 | 0 | 0.037 | 11.670 | 0 | 0.010 | 46.982 | 0 | 0.049 | 11.508 | 0 |
| | ALASSO | 0.005 | 46.388 | 0 | 0.036 | 11.996 | 0 | 0.018 | 45.818 | 0 | 0.048 | 11.982 | 0 |
| | Oracle | 0.003 | 47.000 | 0 | 0.036 | 12.000 | 0 | 0.009 | 47.000 | 0 | 0.046 | 12.000 | 0 |
| 0.3 | SCAD | 0.004 | 45.814 | 0 | 0.039 | 10.972 | 0 | 0.011 | 45.602 | 0 | 0.053 | 11.110 | 0 |
| | ALASSO | 0.006 | 45.506 | 0 | 0.037 | 11.914 | 0 | 0.018 | 44.648 | 0 | 0.049 | 11.672 | 0 |
| | Oracle | 0.004 | 46.000 | 0 | 0.036 | 12.000 | 0 | 0.009 | 46.000 | 0 | 0.046 | 12.000 | 0 |
| 0.7 | SCAD | 0.004 | 45.986 | 0 | 0.038 | 11.450 | 0 | 0.011 | 45.964 | 0 | 0.049 | 11.420 | 0 |
| | ALASSO | 0.006 | 45.568 | 0 | 0.037 | 11.992 | 0 | 0.021 | 44.846 | 0 | 0.048 | 11.964 | 0 |
| | Oracle | 0.004 | 46.000 | 0 | 0.036 | 12.000 | 0 | 0.010 | 46.000 | 0 | 0.046 | 12.000 | 0 |

References

  1. Anselin, L.; Hudak, S. Spatial econometrics in practice: A review of software options. Reg. Sci. Urban Econ. 1992, 22, 509–536. [Google Scholar] [CrossRef]
  2. Baltagi, B.H.; Heun Song, S.; Cheol Jung, B.; Koh, W. Testing for serial correlation, spatial autocorrelation and random effects using panel data. J. Econom. 2007, 140, 5–51. [Google Scholar] [CrossRef]
  3. Kapoor, M.; Kelejian, H.H.; Prucha, I.R. Panel data models with spatially correlated error components. J. Econom. 2007, 140, 97–130. [Google Scholar] [CrossRef]
  4. Baltagi, B.H.; Kao, C.; Liu, L. Asymptotic properties of estimators for the linear panel regression model with random individual effects and serially correlated errors: The case of stationary and non-stationary regressors and residuals. Econom. J. 2008, 11, 554–572. [Google Scholar] [CrossRef]
  5. Lee, L.F.; Yu, J. Estimation of spatial autoregressive panel data models with fixed effects. J. Econom. 2010, 154, 165–185. [Google Scholar] [CrossRef]
  6. Ai, C.; Zhang, Y. Estimation of partially specified spatial panel data models with fixed-effects. Econom. Rev. 2017, 36, 6–22. [Google Scholar] [CrossRef]
  7. Zhang, Y.; Sun, Y. Estimation of partially specified dynamic spatial panel data models with fixed-effects. Reg. Sci. Urban Econ. 2015, 51, 37–46. [Google Scholar] [CrossRef] [PubMed]
  8. Zhang, Y.; Shen, D. Estimation of semi-parametric varying-coefficient spatial panel data models with random-effects. J. Stat. Plan. Inference 2015, 159, 64–80. [Google Scholar] [CrossRef]
  9. Feng, S.; Tong, T.; Chiu, S.N. Statistical Inference for Partially Linear Varying Coefficient Spatial Autoregressive Panel Data Model. Mathematics 2023, 11, 4606. [Google Scholar] [CrossRef]
  10. Hu, X. Estimation in a semi-varying coefficient model for panel data with fixed effects. J. Syst. Sci. Complex. 2014, 27, 594–604. [Google Scholar] [CrossRef]
  11. He, B.; Hong, X.; Fan, G. Empirical likelihood for semi-varying coefficient models for panel data with fixed effects. J. Korean Stat. Soc. 2016, 45, 395–408. [Google Scholar] [CrossRef]
  12. Feng, S.; He, W.; Li, F. Model detection and estimation for varying coefficient panel data models with fixed effects. Comput. Stat. Data Anal. 2020, 152, 107054. [Google Scholar] [CrossRef]
  13. Feng, S.; Li, G.; Peng, H.; Tong, T. Varying coefficient panel data model with interactive fixed effects. Stat. Sin. 2021, 31, 935–957. [Google Scholar] [CrossRef]
  14. Sun, Y.; Carroll, R.; Li, D. Semiparametric estimation of fixed-effects panel data varying coefficient models. Adv. Econom. 2009, 25, 101–129. [Google Scholar]
  15. Tibshirani, R. Regression Shrinkage and Selection Via the Lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288. [Google Scholar] [CrossRef]
  16. Fan, J.; Li, R. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  17. Zou, H. The Adaptive Lasso and Its Oracle Properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
  18. Wang, L.; Li, H.; Huang, J.Z. Variable Selection in Nonparametric Varying-Coefficient Models for Analysis of Repeated Measurements. J. Am. Stat. Assoc. 2008, 103, 1556–1569. [Google Scholar] [CrossRef]
  19. Li, R.; Liang, H. Variable selection in semiparametric regression modeling. Ann. Stat. 2008, 36, 261–286. [Google Scholar] [CrossRef]
  20. Wang, H.J.; Zhu, Z.; Zhou, J. Quantile regression in partially linear varying coefficient models. Ann. Stat. 2009, 37, 3841–3866. [Google Scholar] [CrossRef]
  21. Zhao, P.; Xue, L. Variable selection for semiparametric varying coefficient partially linear models. Stat. Probab. Lett. 2009, 79, 2148–2157. [Google Scholar] [CrossRef]
  22. Tian, R.; Xue, L.; Liu, C. Penalized quadratic inference functions for semiparametric varying coefficient partially linear models with longitudinal data. J. Multivar. Anal. 2014, 132, 94–110. [Google Scholar] [CrossRef]
  23. Tian, R.; Xue, L.; Hu, Y. Smooth-threshold GEE variable selection for varying coefficient partially linear models with longitudinal data. J. Korean Stat. Soc. 2015, 44, 419–431. [Google Scholar] [CrossRef]
  24. Li, R.; Mu, S.; Hao, R. Estimation and variable selection for partially linear additive models with measurement errors. Commun. Stat. Theory Methods 2021, 50, 1416–1445. [Google Scholar] [CrossRef]
  25. Ma, X.; Du, Y.; Wang, J. Model detection and variable selection for mode varying coefficient model. Stat. Methods Appl. 2022, 31, 321–341. [Google Scholar] [CrossRef]
  26. Liu, Y.; Wang, Z.; Tian, M.; Yu, K. Estimation and variable selection for generalized functional partially varying coefficient hybrid models. Stat. Pap. 2024, 65, 93–119. [Google Scholar] [CrossRef]
  27. Neyman, J.; Scott, E.L. Consistent Estimates Based on Partially Consistent Observations. Econometrica 1948, 16, 1–32. [Google Scholar] [CrossRef]
  28. Liu, X.; Chen, J.; Cheng, S. A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat. Stat. 2018, 25, 86–104. [Google Scholar] [CrossRef]
  29. Xie, T.; Cao, R.; Du, J. Variable selection for spatial autoregressive models with a diverging number of parameters. Stat. Pap. 2020, 61, 1125–1145. [Google Scholar] [CrossRef]
  30. Luo, G.; Wu, M. Variable selection for semiparametric varying-coefficient spatial autoregressive models with a diverging number of parameters. Commun. Stat. Theory Methods 2021, 50, 2062–2079. [Google Scholar] [CrossRef]
  31. Lee, L.F. Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autoregressive Models. Econometrica 2004, 72, 1899–1925. [Google Scholar] [CrossRef]
  32. Tian, R.; Xia, M.; Xu, D. Profile quasi-maximum likelihood estimation for semiparametric varying-coefficient spatial autoregressive panel models with fixed effects. Stat. Pap. 2024, 65, 5109–5143. [Google Scholar] [CrossRef]
  33. Kelejian, H.H.; Prucha, I.R. On the asymptotic distribution of the Moran I test statistic with applications. J. Econom. 2001, 104, 219–257. [Google Scholar] [CrossRef]
  34. He, X.; Fung, W.K.; Zhu, Z. Robust Estimation in Generalized Partial Linear Models for Clustered Data. J. Am. Stat. Assoc. 2005, 100, 1176–1184. [Google Scholar] [CrossRef]
  35. Stone, C.J. Optimal Global Rates of Convergence for Nonparametric Regression. Ann. Stat. 1982, 10, 1040–1053. [Google Scholar] [CrossRef]
  36. Dietz, T.; Rosa, E. Effects of population and affluence on CO2 emissions. Proc. Natl. Acad. Sci. USA 1997, 94, 175–179. [Google Scholar] [CrossRef]
  37. Wen, L.; Shao, H. Analysis of influencing factors of the carbon dioxide emissions in China’s commercial department based on the STIRPAT model and ridge regression. Environ. Sci. Pollut. Res. 2019, 26, 27138–27147. [Google Scholar] [CrossRef] [PubMed]
  38. Li, W.; Wang, W.; Wang, Y.; Qin, Y. Industrial structure, technological progress and CO2 emissions in China: Analysis based on the STIRPAT framework. Nat. Hazards 2017, 88, 1545–1564. [Google Scholar] [CrossRef]
Figure 1. The estimated varying coefficient functions based on SCAD and ALASSO. (a) Varying coefficients derived from SCAD; (b) varying coefficients derived from ALASSO.
Table 1. Variable selection under $\epsilon_{it} \sim N(0, \sigma_0^2)$ in Example 1 ($T = 10$). Within each block, GMSE, C and I refer to $\hat{\theta}$ and ASE, C and I refer to $\hat{\alpha}(\cdot)$; the left block corresponds to $\sigma_0^2 = 1$ and the right block to $\sigma_0^2 = 2$.

| N | ρ0 | Method | GMSE | C | I | ASE | C | I | GMSE | C | I | ASE | C | I |
| 30 | 0 | SCAD | 0.020 | 5.970 | 0 | 0.158 | 4.950 | 0 | 0.049 | 5.966 | 0 | 0.234 | 4.722 | 0 |
| | | ALASSO | 0.025 | 5.772 | 0 | 0.158 | 4.958 | 0 | 0.062 | 5.654 | 0 | 0.210 | 4.924 | 0 |
| | | Oracle | 0.020 | 6.000 | 0 | 0.155 | 5.000 | 0 | 0.044 | 6.000 | 0 | 0.207 | 5.000 | 0 |
| | 0.3 | SCAD | 0.030 | 4.822 | 0 | 0.160 | 4.940 | 0 | 0.064 | 4.802 | 0 | 0.235 | 4.726 | 0 |
| | | ALASSO | 0.028 | 4.880 | 0 | 0.157 | 4.950 | 0 | 0.064 | 4.812 | 0 | 0.214 | 4.920 | 0 |
| | | Oracle | 0.025 | 5.000 | 0 | 0.156 | 5.000 | 0 | 0.052 | 5.000 | 0 | 0.206 | 5.000 | 0 |
| | 0.7 | SCAD | 0.026 | 4.984 | 0 | 0.159 | 4.962 | 0 | 0.069 | 4.966 | 0 | 0.233 | 4.758 | 0 |
| | | ALASSO | 0.030 | 4.916 | 0 | 0.158 | 4.972 | 0 | 0.081 | 4.828 | 0 | 0.218 | 4.934 | 0 |
| | | Oracle | 0.025 | 5.000 | 0 | 0.156 | 5.000 | 0 | 0.063 | 5.000 | 0 | 0.209 | 5.000 | 0 |
| 60 | 0 | SCAD | 0.011 | 5.990 | 0 | 0.128 | 4.992 | 0 | 0.023 | 5.986 | 0 | 0.160 | 4.826 | 0 |
| | | ALASSO | 0.014 | 5.834 | 0 | 0.128 | 4.998 | 0 | 0.032 | 5.742 | 0 | 0.154 | 4.990 | 0 |
| | | Oracle | 0.012 | 6.000 | 0 | 0.128 | 5.000 | 0 | 0.024 | 6.000 | 0 | 0.152 | 5.000 | 0 |
| | 0.3 | SCAD | 0.015 | 4.944 | 0 | 0.129 | 4.980 | 0 | 0.027 | 4.896 | 0 | 0.162 | 4.778 | 0 |
| | | ALASSO | 0.016 | 4.958 | 0 | 0.129 | 5.000 | 0 | 0.030 | 4.906 | 0 | 0.154 | 4.990 | 0 |
| | | Oracle | 0.014 | 5.000 | 0 | 0.128 | 5.000 | 0 | 0.026 | 5.000 | 0 | 0.152 | 5.000 | 0 |
| | 0.7 | SCAD | 0.015 | 4.988 | 0 | 0.129 | 4.978 | 0 | 0.029 | 4.990 | 0 | 0.159 | 4.848 | 0 |
| | | ALASSO | 0.018 | 4.956 | 0 | 0.129 | 4.996 | 0 | 0.036 | 4.890 | 0 | 0.155 | 4.992 | 0 |
| | | Oracle | 0.015 | 5.000 | 0 | 0.128 | 5.000 | 0 | 0.029 | 5.000 | 0 | 0.152 | 5.000 | 0 |
Table 2. Variable selection under $\epsilon_{it} \sim N(0, \sigma_0^2)$ in Example 1 ($T = 15$). Within each block, GMSE, C and I refer to $\hat{\theta}$ and ASE, C and I refer to $\hat{\alpha}(\cdot)$; the left block corresponds to $\sigma_0^2 = 1$ and the right block to $\sigma_0^2 = 2$.

| N | ρ0 | Method | GMSE | C | I | ASE | C | I | GMSE | C | I | ASE | C | I |
| 30 | 0 | SCAD | 0.014 | 5.974 | 0 | 0.138 | 4.930 | 0 | 0.031 | 5.960 | 0 | 0.179 | 4.796 | 0 |
| | | ALASSO | 0.018 | 5.810 | 0 | 0.136 | 4.990 | 0 | 0.045 | 5.730 | 0 | 0.168 | 4.984 | 0 |
| | | Oracle | 0.015 | 6.000 | 0 | 0.135 | 5.000 | 0 | 0.030 | 6.000 | 0 | 0.166 | 5.000 | 0 |
| | 0.3 | SCAD | 0.017 | 4.918 | 0 | 0.137 | 4.944 | 0 | 0.038 | 4.874 | 0 | 0.187 | 4.702 | 0 |
| | | ALASSO | 0.019 | 4.952 | 0 | 0.135 | 4.998 | 0 | 0.046 | 4.902 | 0 | 0.172 | 4.962 | 0 |
| | | Oracle | 0.016 | 5.000 | 0 | 0.135 | 5.000 | 0 | 0.035 | 5.000 | 0 | 0.168 | 5.000 | 0 |
| | 0.7 | SCAD | 0.019 | 4.980 | 0 | 0.139 | 4.934 | 0 | 0.045 | 4.978 | 0 | 0.185 | 4.740 | 0 |
| | | ALASSO | 0.023 | 4.930 | 0 | 0.137 | 4.988 | 0 | 0.058 | 4.854 | 0 | 0.172 | 4.964 | 0 |
| | | Oracle | 0.019 | 5.000 | 0 | 0.136 | 5.000 | 0 | 0.042 | 5.000 | 0 | 0.168 | 5.000 | 0 |
| 60 | 0 | SCAD | 0.011 | 5.994 | 0 | 0.118 | 4.988 | 0 | 0.018 | 5.984 | 0 | 0.137 | 4.910 | 0 |
| | | ALASSO | 0.013 | 5.864 | 0 | 0.119 | 4.988 | 0 | 0.025 | 5.836 | 0 | 0.135 | 5.000 | 0 |
| | | Oracle | 0.012 | 6.000 | 0 | 0.118 | 5.000 | 0 | 0.019 | 6.000 | 0 | 0.134 | 5.000 | 0 |
| | 0.3 | SCAD | 0.012 | 4.974 | 0 | 0.119 | 4.994 | 0 | 0.020 | 4.942 | 0 | 0.137 | 4.892 | 0 |
| | | ALASSO | 0.013 | 4.978 | 0 | 0.119 | 5.000 | 0 | 0.024 | 4.950 | 0 | 0.136 | 4.996 | 0 |
| | | Oracle | 0.012 | 5.000 | 0 | 0.119 | 5.000 | 0 | 0.020 | 5.000 | 0 | 0.134 | 5.000 | 0 |
| | 0.7 | SCAD | 0.013 | 4.998 | 0 | 0.120 | 4.994 | 0 | 0.022 | 4.996 | 0 | 0.138 | 4.904 | 0 |
| | | ALASSO | 0.016 | 4.976 | 0 | 0.120 | 5.000 | 0 | 0.031 | 4.954 | 0 | 0.138 | 4.994 | 0 |
| | | Oracle | 0.013 | 5.000 | 0 | 0.120 | 5.000 | 0 | 0.023 | 5.000 | 0 | 0.135 | 5.000 | 0 |
Table 3. Description of related variables.

| First-Tier Indicator | Second-Tier Indicator | Abbreviation | Symbol |
| Environmental impact | Carbon emissions per capita | CE | $y_{it}$ |
| Population | Urbanization rate | UR | $x_{it1}$ |
| Science and technology | Energy intensity | EI | $x_{it2}$ |
| | R&D funding intensity | RDF | $x_{it3}$ |
| Finance | Financial Interrelation Ratio | FIR | $x_{it4}$ |
| | Financial Efficiency | FE | $x_{it5}$ |
| Transport | Freight volume per capita | FV | $x_{it6}$ |
| | Public transport passenger volume per capita | PTPV | $x_{it7}$ |
| Ecology | Percentage of forest cover | PF | $x_{it8}$ |
| Energy Structure | Coal consumption proportion | CCP | $u_{it}$ |
| Affluence | GDP per capita | GDP | $z_{it1}$ |
| | Fiscal revenue per capita | FR | $z_{it2}$ |
| | Residents consumption level | RCL | $z_{it3}$ |
| | Disposable income per capita | DI | $z_{it4}$ |
| | Import and export amount per capita | IEA | $z_{it5}$ |
| | Fixed investment per capita | FINV | $z_{it6}$ |
Table 4. Penalized estimators for the parametric components.

| Method | $\hat{\rho}$ | $\hat{\beta}_1$ | $\hat{\beta}_2$ | $\hat{\beta}_3$ | $\hat{\beta}_4$ | $\hat{\beta}_5$ | $\hat{\beta}_6$ | $\hat{\beta}_7$ | $\hat{\beta}_8$ | $\hat{\sigma}^2$ |
| SCAD | 0.154 | 0 | 0.350 | 0.087 | 0.245 | 0 | 0.100 | −0.014 | 0 | 0.005 |
| ALASSO | 0.167 | 0 | 0.355 | 0.120 | 0.193 | 0 | 0.113 | −0.017 | 0 | 0.005 |
