PQMLE of a Partially Linear Varying Coefficient Spatial Autoregressive Panel Model with Random Effects

This article deals with asymmetrical spatial data which can be modeled by a partially linear varying coefficient spatial autoregressive panel model (PLVCSARPM) with random effects. We constructed its profile quasi-maximum likelihood estimators (PQMLE). The consistency and asymptotic normality of the estimators were proved under some regular conditions. Monte Carlo simulations implied our estimators have good finite sample performance. Finally, a set of asymmetric real data applications was analyzed for illustrating the performance of the provided method.


Introduction
Spatial regression models are used to deal with spatially dependent data which widely exist in many fields, such as economics, environmental science and geography. According to the different types of spatial interaction effects, spatial regression models can be sorted into three basic categories (see [1]). The first category is spatial autoregressive (SAR) models, which include endogenous interaction effects among observations of response variables at different spatial scales (see [2]). The second category is spatial durbin models (SDM), which include exogenous interaction effects among observations of covariates and endogenous interaction effects among observations of response variables at different spatial scales (see [3,4]). The third category is spatial error models (SEM), which include interaction effects among disturbance terms of different spatial scales (see [5]). Among them, the SAR models proposed by [6] may be the most popular. The developments in the testing and estimation of SAR models for sectional data were summarized in the books by [7][8][9] and the surveys by [10][11][12][13][14][15][16], among others. Compared to sectional data models, panel data models exhibit an excellent capacity for capturing the complex situations by using abundant datasets built up over time and adding individual-specific or time-specific fixed or random effects. Their theories, methods and applications can be found in the books by [17][18][19][20] and the surveys by [21][22][23][24][25][26], among others.
The above mentioned research literature mainly focuses on linear parametric models. Although the estimations and properties of these models have been well established, they are often unrealistic in application, for the reason that they are unable to accommodate sufficient flexibility to accommodate complex structures (e.g., nonlinearity). Moreover, mis-specification of the data generation mechanism by a linear parametric model could lead to excessive modeling biases or even erroneous conclusions.
Ref. [27] pointed out that the relationship between variables in space usually exhibits highly complexity in reality. Therefore, researches have proposed a number of solution methods. In video coding systems, transmission problems are usually dealt with wavelet-based methods. Ref. [28] proposed a low-band-shift method which avoids the shift-variant property of the wavelet transform and performs the motion compensation more precisely and efficiently. Ref. [29] obtained a wavelet-based lossless video coding scheme that switches between two operations based on the different amount of motion activities between two consecutive frames. More theories and applications can be found in [30][31][32]. In econometrics, some nonparametric and semiparametric spatial regression models have been developed to relax the linear parametric model settings. For sectional spatial data, ref. [33] studied the GMM estimation of a nonparametric SAR model; Ref. [34] investigated the PQMLE of a partially linear nonparametric SAR model. However, such nonparametric SAR models may cause the "curse of dimensionality" when the dimension of covariates is higher. In order to overcome this drawback, ref. [35] studied two-step SGMM estimators and their asymptotic properties for spatial models with space-varying coefficients. Ref. [36] proposed a semiparametric series-based least squares estimating procedure for a semiparametric varying coefficient mixed regressive SAR model and derived the asymptotical normality of the estimators. Some other related research works can be found in [37][38][39][40][41][42]. For panel spatial data, ref. [43] obtained PMLE and its asymptotical normality of varying the coefficient SAR panel model with random effects; Ref. [44] applied instrumental variable estimation to a semiparametric varying coefficient spatial panel data model with random effects and the investigated asymptotical normality of the estimators.
In this paper, we extend the varying coefficient spatial panel model with random effects given in [43] to a PLVCSARPM with random effects. By adding a linear part to the model of [43], we can simultaneously capture linearity, non-linearity and the spatial correlation relationships of exogenous variables in a response variable. By using the profile quasimaximum likelihood method to estimate PLVCSARPM with random effects, we proved the consistency and asymptotic normality of the estimators under some regular conditions. Monte Carlo simulations and real data analysis show that our estimators perform well. This paper is organized as follows: Section 2 introduces the PLVCSARPM with random effects and constructs its PQMLE. Section 3 proves the asymptotic properties of estimators. Section 4 presents the small sample estimates using Monte Carlo simulations. Section 5 analyzes a set of asymmetric real data applications for illustrating the performance of the proposed method. A summary is given in Section 6. The proofs of some important theorems and lemmas are given in Appendix A.
The model (1) can be simplified as the following matrix form: where Y = (y 11 , y 12 , · · · , y NT ) ; X = (x 11 , x 12 , · · · , x NT ) ; Z = (z 11 , z 12 , · · · , z NT ) ; ε = (ε 11 , ε 12 , · · · , ε NT ) ; W = W 0 ⊗ I T ; b = (b 1 , b 2 , · · · , b N ) ; U = I N ⊗ e T ; I N is an N × N identity matrix; I T is a T × T identity matrix; ⊗ denotes the Kronecker product; e T is a T × 1 vector consisting of 1. Define A(ρ) = I − ρW; then the model (2) can be rewritten as: where I is an NT × NT unit matrix. For the model (3), it is easy to get the following facts: According to [12], the quasi-log-likelihood function of the model (3) can be written as follows: where H = I N ⊗ (T −1 e T e T ), and c is a constant. By maximizing the above quasi-log-likelihood function with respect to θ, the quasimaximum likelihood estimators of α, σ 2 b and σ 2 ε can be easily obtained as By substituting (5)-(7) into (4), we have the concentrated quasi-log-likelihood function of ρ as It is obvious that we cannot directly obtain the quasi-maximum likelihood estimator of ρ by maximizing the above formula because β(u it ) is an unknown function. In order to overcome this problem, we use the PQMLE method and working independence theory ( [45,46]) to estimate the unknown parameters and varying coefficient functions of the model (1). The main steps are as follows: Step 1 Suppose that θ is known. β s (·) can be approximated by the first-order Taylor expansion β s (u it ) ≈ β s (u) + (u it − u)β s (u)=a 1s + a 2s (u it − u), s = 1, 2, · · · , p for u it in a neighborhood of u, where a 2s is the first order derivative of β(u). Let a 1 = (a 11 , a 12 , · · · , a 1p ) and a 2 = (a 21 , a 22 , · · · , a 2p ) ; then estimators of a 1 and a 2 can be obtained by (â 1 ,â 2 ) = arg min is a kernel function and h is the bandwidth. Therefore, the feasible initial estimator of β(u) can be obtained byβ I N (u) =â 1 = (â 11 ,â 12 , · · · ,â 1p ) . Denote y NT ) , δ = (a 11 , · · · , a 1p , ha 21 , · · · , ha 2p ) and D u = Therefore, we obtainδ = (D u W u D u ) −1 D u W uỸ .
Let S u = 1 NT D u W u D u , T u = 1 NT D u W uỸ and e 0 = (I p , 0 p ). It is easy to know that where e 0 S u = s(u). Consequently, the initial estimator of Xβ(u) is given by Step 2 Replacing β(u) of (4) withβ I N (u), the estimatorθ of θ can be obtained by maximizing approximate quasi-log-likelihood function: In the real estimation of θ, the procedure is realized by following steps: Firstly, assume ρ is known. The initial estimators of σ 2 b , σ 2 ε and α are obtained by maximizing (9). Then, we find Secondly, with the estimatedσ 2 εI N ,σ 2 bI N andα I N , updateρ by maximizing the concentrated quasi-log-likelihood function of ρ: Therefore, the estimator of ρ is obtained by: Step 3 By substitutingρ into (10), (11) and (12), respectively, the final estimators of σ 2 b , σ 2 ε and α are computed as follows: Step 4 By replacing ρ withρ in (8), we get the ultimate estimator of β(u):

Asymptotic Properties for the Estimators
In this section, we focus on studying consistency and asymptotic normality of the PQMLEs given in Section 2. To prove these asymptotic properties, we need the following assumptions to hold.
To provide a rigorous analysis, we make the following assumptions.
The density function f t (u) of u it is nonzero, uniformly bounded and second-order continuously differentiable on U, where U is the supporting set of K(u). Ω 11 (u) = E(x it x it |u it = u) < ∞ exists, and it has a second-order continuous derivative; (iii) Real valued function β s (u)(s = 1, 2, · · · , p) is second-order, continuously differentiable and satisfies the first order Lipschitz condition, |x it β s (u)| ≤ m β at any u, |z it α| ≤ m α , where m β and m α are positive and constant.  (iii) A(ρ) is nonsingular for any |ρ| < 1.

Assumption 5.
There is an unique θ 0 to make the model (1) tenable.

Remark 1.
Assumption 1 provides the essential features of the regressors and disturbances for the model (see [47]). Assumption 2 concerns the basic features of the spatial weights matrix and the parallel Assumptions 2-5 of [12]. Assumption 2(i) is always satisfied if l N is a bounded sequence. We allow l N to be divergent, but at a rate smaller than N, as specified in Assumption 2(ii). Assumption 2(iii) guarantees that model (2) has an equilibrium given by (3). Assumption 2(iv) is also assumed in [12], and it limits the spatial correlation to some degree but facilitates the study of the asymptotic properties of the spatial parameter estimators. Assumptions 3 and 4 concern the kernel function and bandwidth sequence. Assumption 5 offers a unique identification condition, and Assumption 6 is necessary for proof of asymptotic normality.
In order to prove large sample properties of estimators, we need introduce the following useful lemmas. Before that, we simplify the model (2) and obtain reduced form equation of Y as follows: . The above equations are frequently used in a later derivation.
where Y i are scalar random variables. Further, assume that E|y| r f (x, y)dy < ∞, where f denotes the joint density of (X, Y). Let K be a bounded positive function with a bounded support, satisfying a Lipschitz condition. Given that The proof can be found in [48].
Proof See the Appendix A.
Proof See the Appendix A.

Lemma 4.
Under Assumptions 1-4, we have The proof can be found in [37].
] is a positive definite matrix and Proof See the Appendix A.
With the above lemmas, we state main results as follows. Their detailed proofs are given in the Appendix A.

Monte Carlo Simulation
In this section, Monte Carlo simulations are presented, which were carried out to investigate the finite sample performance of PQMLEs. The sample standard deviation (SD) and two root mean square errors (RMSE1 and RMSE2) were used to measure the estimation performance. RMSE1 and RMSE2 are defined as: where mcn is the number of iterations;θ i (i = 1, 2, · · · , mcn) are estimates of θ for each iteration, θ 0 is the true value of θ;θ 0.25 ,θ 0.5 andθ 0.75 are the upper quartile, median and lower quartile of parametric estimates, respectively. For the nonparametric estimates, we took the mean absolute deviation error (MADE) as the evaluation criterion: where {u q } Q q=1 are Q fixed grid points in the support set of u. In the simulations, we applied the rule of thumb method of [48] to choose the optimal width and let kernel function be an Epanechnikov kernel K(u) = ( 3 [33]). We ran a small simulation experiment with mcn = 500 and generated the simulation data from following model: it + 0.5u it , (α 1 , α 2 ) = (1, 1.5) and ρ = 0.25, 0.5 and 0.75, respectively. Furthermore, we chose the Rook weight matrix (see [7]) to investigate the influence of the spatial weight matrix on the estimates. Our simulation results for both cases T = 10 and T = 15 are presented in Tables 1-6.       Tables 1-6, one can obtain the following findings: (1) The RMSE1s and RMSE2s forρ,σ 2 ε ,α 1 andα 2 were fairly small for almost all cases, and they decreased as N increased. The SDs and RMSEs forσ 2 b are not negligible for small sample sizes, but they decreased as N increased. (2) For fixed T, as N increased, the SDs for ρ,σ 2 ε ,σ 2 b ,α 1 andα 2 decreased for all cases. RMSE1s and RMSE2s forα 1 andα 2 decreased rapidly, whereas the RMSEs for estimates of the others parameters did not change much. For fixed N, as T increased, the behavior of the estimates of parameters was similar to the case where N changed under the fixed T. (3) The SDs for MADEs of varying coefficient functions β 1 (u) and β 2 (u) decreased as T or N increased. Combined with the above three findings, we conclude that the estimates of the parameter and unknown varying coefficient functions were convergent. Figures 1 and 2 present the fitting results and 95% confidence intervals of β 1 (u) and β 2 (u) under ρ = 0.5, respectively, where the short dashed curves are the average fits over 500 simulationsβ s (u) (s = 1, 2) by PQMLE, the solid curves are the true values of β s (u) (s = 1, 2) and the two long dashed curves are the corresponding 95% confidence bands. By observing every subgraph in Figures 1 and 2, we can see that the short dashed curve is fairly close to solid curve and the corresponding confidence bandwidth is narrow. This illustrates that the nonparametric estimation procedure works well for small samples. To save space, we do not present the cases ρ = 0.25 and ρ = 0.75 because they had similar results as the case ρ = 0.5.

Real Data Analysis
We applied the housing prices of Chinese city data to analyze the proposed model with real data. The data were obtained from the China Statistical Yearbook, the China City Statistical Yearbook and the China Statistical Yearbook for Regional Economies. Based on the panel data of related variables of 287 cities at/above the prefecture level (except the cities in Taiwan, Hong Kong and Macau) in China from 2011 to 2018, we explored the influencing factors of housing prices of Chinese cities by PLVCSARPM with random effects.
Taking [49][50][51] as references, we collected nine variables related to housing prices of China cities, including each city's average selling price of residential houses (denoted by HP, yuan/sq.m), the expectation of housing price trends (EHP, %), population density (POD, person), annual per capita disposable income of urban households (ADI, yuan), loan-to-GDP ratio (MON, %), natural growth rate of population (NGR, ), sulphur dioxide emission (SDE, 10,000 tons), area of grassland (AOG, 10,000 hectares) and the value of buildings completed (VBC, yuan/sq.m). According to result of non-linear regression, the established model is given by where y it represents the ith observation of ln(HP) at time t; z it represents the ith observation of EHP at time t; x it means the ith observation of POD, ln(ADI), MON, NGR, SDE and AOG at time t, respectively, u it represents the ith observation of ln(VBC) at time t.
In order to transfer the asymmetric distribution of POD to nearly uniform distribution on (0,1), we set The spatial weight matrix W 0 we adopted is calculated as follows:  Figure 3. The estimated functions β s (u) and corresponding 95% confidence intervals for the housing prices of Chinese cities.
The estimation results of parameters in the model (14) are reported in Table 7. It can be seen from Table 7: (1) all estimates of parameters are all significant; (2) spatial correlation coefficientρ = 0.3820 > 0, which means there exists a spatial spill over effect for housing prices of Chinese cities; (3)α = 0.9869 > 0 indicates that the expectation of housing price trends (EHP) has a promotional effect on housing prices of Chinese cities; (4)σ 2 ε = 0.0163 shows that the growth of housing prices in different regions is relatively stable and is less affected by external fluctuations. Notes: ** and *** are significant at the significance levels of 5% and 1%, respectively.

Conclusions
In this paper, we proposed PQMLE of PLVCSARPM with random effects. Our model has the following advantages: (1) It can overcome the "curse of dimensionality" in the nonparametric spatial regression model effectively. (2) It can simultaneously study the linear and non-linear effects of coveriates. (3) It can investigate the spatial correlation effect of response variables. Under some regular conditions, consistency and asymptotic normality of the estimators for parameters and varying coefficient functions were derived. Monte Carlo simulations showed the proposed estimators are well behaved with finite samples. Furthermore, the performance of the proposed method was also assessed on a set of asymmetric real data.
This paper only focused on the PQMLE of PLVCSARPM with random effects. In future research, we may try to extend our method to more general models, such as a partially linear varying coefficient spatial autoregressive model with autoregressive disturbances. In addition, we also need study the issues of Bayesian analysis, variable selection and quantile regression in these models.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are openly available in Reference [50].

Acknowledgments:
The authors are deeply grateful to editors and anonymous referees for their careful reading and insightful comments. Their comments led us to significantly improve the paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
To proceed with the proofs of main lemmas and theorems, we first provide two frequently used evident facts (see [14]). Fact 1. If the row and column sums of the n × n matrices B 1n and B 2n are uniformly bounded in absolute value, then the row and column sums of B 1n B 2n are also uniformly bounded in absolute value. Fact 2. If the row (resp. column) sums of B 1n are uniformly bounded in absolute value and B 2n is a conformable matrix whose elements are uniformly O(o n ), then so are the elements of B 1n B 2n (resp. B 2n B 1n ).
Proof of Lemma 2. Recall S u = 1 NT D u W u D u . By straightforward calculation, it is not difficult to get By Assumptions 1-3, we know where µ l = υ l k(υ)dυ and υ = u it −u h , D it (i = 1, 2, · · · , N) are i.i.d for any fixed t and l. According to Khinchine law of large numbers, we have C tl p −→ EC tl . Therefore, it is not hard to get that

Proof of Lemma 3.
Recall T u = 1 NT D u W uỸ . By straightforward calculation, we obtain for l = 0, 1. Then, it is not hard to obtain For the first term of above equality in (A1), by similar proof procedures of Lemma 2, it is easily to get that For the second term of above equality in (A1), we obtain Therefore, Furthermore, by using (8) and Lemma 2, we easily knowβ I N (u) p −→ β 0 (u). Proof of Lemma 5. Its proof is analogous to Lemma 4.2 in [34] and Theorem 3.2 in [12]. The major difference is that our model is panel model with random effects instead of sectional model.

Proof of Theorem 4. By Taylor expansion of
whereθ = (ρ,α ,σ 2 b ,σ 2 ε ) andθ lies betweenθ and θ 0 . By Theorem 1 and Theorem 3, we easily know thatθ converges to θ 0 in probability. Denote Next, we prove 1 NT To prove (A8), we need to show that each element of 1 converges to 0 in probability. It is not difficult to get 1 NT By mean value theorem, Assumption 2 (iv) and Lemma 5, we have whereρ * lies betweenρ and ρ 0 . Similarly, we have 1 NT (1).
Denote R u = 1 NT D u W u (Ub + ε). By using Lemma 1 and Taylor expansion of β 0 (u it ) at u, it is not difficult to get According to the proof procedure of Lemma 2, we know = µ 2 f (u)Ω 11 (u)β 0 (u).