1. Introduction
By tracking each individual in a predetermined sample over time, a panel dataset with multiple observations can be collected. In contrast to cross-sectional data (Cheng and Chen [
1]), the panel data structure is sufficiently abundant to permit estimation and testing of regression models that include not only individual characteristics observed by the researchers but also unobserved characteristics that vary across individuals or/and time periods, simply referred to as “individual or/and time effects”. In recent decades, many studies have focused on parametric panel data models (Chamberlain [
2], Chamberlain [
3], Hsiao [
4], Baltagi [
5], Arellano [
6], Wooldridge [
7], Thapa et al. [
8], and Rehfeldt and Weiss [
9]).
In practice, a lot of spatial panel datasets are collected from “locations” and require further studying. Therefore, based on classic parametric panel data models, different types of parametric panel data models with spatial correlations at different locations are presented. There are three fundamental types of spatial parametric panel data model (Elhorst [
10]): spatial autoregressive (SAR) model (Brueckner [
11]), spatial Durbin model (SDM) (LeSage [
12], Elhorst [
13]), and spatial error model (SEM) (Allers and Elhorst [
14]). A more detailed discussion can be found in Elhorst [
15], Baltagi and Li [
16], Pesaran [
17], Kapoor et al. [
18], Lee and Yu [
19], Mutl and Pfaffermayr [
20], and Baltagi et al. [
21]. Among these, the SEM introduced by Anselin [
22] accounts for the interaction effects in spatially correlated determinants of the dependent variable that is excluded from the model or in unobserved shocks with a spatial mode (Elhorst [
13]). The development of testing and estimation of SEM has been summarized in books by Baltagi [
5], Elhorst [
10], and Anselin [
22] as well as in surveys by Baltagi et al. [
23], Bordignon et al. [
24], and Kelejian and Prucha [
25], among others. However, the aforementioned SEM assumes that the only correlation over time results from the same regional impact presented across the panel. In spatial panel data analysis, this assumption may be too restrictive for real situations. For example, if an undetected shock influences investments across regions, behavioral relationships are affected, at least in the next few stages. Thus, ignoring serially correlated errors may lead to inefficient regression coefficient estimators (Baltagi [
5]).
By adding serially correlated errors to SEM, parametric panel data models with serially and spatially correlated errors can be established. The extended models consider not only serial correlations through spatial units but also spatial correlations at every point temporally. Researchers have established two types of parametric panel data models with serially and spatially correlated separable and nonseparable errors. The model with the separable error was used to analyze the effects of public infrastructure investment on the costs and productivity of private enterprises (Cohen and Paul [
26]). The MLE and corresponding asymptotic properties of spatial panel data model with serially and spatially correlated separable error were investigated by Lee and Yu [
27]. A random effects parametric panel data model with serially and spatially correlated error was introduced by Parent and LeSage [
28], and the estimators were obtained using the Markov chain Monte Carlo (MCMC) method. Elhorst [
29] constructed the MLE of a parametric panel data model with serially and spatially correlated nonseparable error without establishing the asymptotic properties of the estimators. Lee and Yu [
30] established the QMLE and corresponding asymptotic properties for both linear panel data models with serially and spatially correlated separable and nonseparable errors.
Although the theories and applications of the aforementioned linear parametric panel data models with serially and spatially correlated separable/nonseparable errors have been thoroughly explored, their usefulness in applications is often impractical because they lack flexibility and are limited in their ability to detect complicated structures (e.g., nonlinearity). Furthermore, a misspecification of the data generating process by the linear parametric panel data model may result in significant modeling bias and can lead to incorrect findings. For these reasons, Zhao et al. [
31] investigated the estimation and testing issues of a partially linear single-index panel data model with serially and spatially correlated separable errors using the semiparametric minimum average variance method and a F-type test statistic. Li et al. [
32] obtained the PQMLE, the generalized F-test, and the asymptotic properties for a partially linear nonparametric regression model with serially and spatially correlated separable error. Li et al. [
33] studied the PQMLEs of the unknowns for the fixed effects partially linear varying coefficient panel data model with serially and spatially correlated nonseparable error, thereby deriving the consistency and asymptotic normality. Bai et al. [
34] constructed the estimators of the parametric and nonparametric components of a partially linear varying-coefficient panel data model with serially and spatially correlated separable errors using weighted semiparametric least squares and weighted polynomial spline series methods, respectively, thereby proving their asymptotic normality.
By adding a nonparametric component to a random effects parametric model with serially and spatially correlated nonseparable errors, a new random effects semiparametric model (RESPM) can be established to concurrently capture the linearity and nonlinearity of the covariates, spatial correlation and serial correlation of errors, and individual random effects. This study explores its PQMLE and hypothesis testing issues from theoretical, simulation, and application perspectives. Our proposed model is different from the model proposed by Li et al. [
32], which assumes spatially and serially correlated errors are separable. The difference in model structure leads to differences in the following aspects: (1) The parameter spaces (i.e, stationarity conditions) of the two models are different. (2) There are differences in the estimation process, assumption conditions and theorem proofs.
The remainder of this paper is organized as follows: In
Section 2, a RESPM with serially and spatially correlated nonseparable error is introduced, and the estimation and testing procedures are established.
Section 3 lists several regular assumptions and theorems for the estimators and test statistic. In
Section 4 and
Section 5, numerical simulations and real data analyses are presented. Finally,
Section 6 summarizes the results. The
Appendix contains the proofs of the lemmas and theorems.
Notation. Some important symbols and their definitions that appear throughout the paper are given in
Table 1.
5. Real Data Analysis
In this section, the proposed estimation and testing methods are demonstrated on the Indonesian rice farm dataset. This dataset is collected by the Agricultural Economic Research Center of the Ministry of Agriculture of Indonesia. It covers 171 farms over six growing seasons (three wet and three dry seasons) and has been widely used in random effects models (Feng and Horrace [
44]).
The dataset includes observations of five variables which are rice yield, high-yield varieties, mixed-yield varieties, seed weight, and land area, respectively. Rice yield is taken as response variable and others are taken as covariates whose definitions are given in
Table 9.
The testing method proposed in
Section 2.3 is used to determine whether a nonlinear connection exists between the covariates and response variable. The outcomes of the F-test are listed in
Table 10.
Table 10 shows that
(other covariates) exhibits a significant nonlinear (linear) relationship with
at a significance level of
.
Therefore, the following model is established for the data analysis:
where
,
, and
denote the denotes the
ith observation of log(rice), high- and mixed-yield varieties, seed weight, and land area during the
tth growing season, respectively. Furthermore, the
is determined using the following method (Druska and Horrace [
45]):
where
denote the
i-th and
j-th village, respectively. Moreover, the weights matrix is normalized to ensure the elements of each row sum to 1. Additionally, the weights matrix is assumed time invariant, so the
t subscript can be dropped.
Table 11 shows the parametric estimation results, which reveals that (1) All linear covariates
have a positive impact on rice yield. (2)
demonstrates that the rice yield is generally steady and less susceptible to exogenous perturbations.
Figure 6 presents
and its 95% confidence interval, where the blue short dashed curve denotes
, and the red and green solid curves correspond to the 95% confidence bands. Clearly,
has a nonlinear effect on rice yield.