Article

# Lagrange Multiplier Test for Spatial Autoregressive Model with Latent Variables

by Anik Anekawati 1,2, Bambang Widjanarko Otok 1,*, P.P. 1 and Sutikno Sutikno 1

1 Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
2 Faculty of Teacher Training and Education, Universitas Wiraraja, Sumenep 69451, Indonesia
* Author to whom correspondence should be addressed.
Symmetry 2020, 12(8), 1375; https://doi.org/10.3390/sym12081375
Submission received: 15 July 2020 / Revised: 16 August 2020 / Accepted: 17 August 2020 / Published: 18 August 2020

## Abstract

This research develops a Lagrange multiplier (LM) test of spatial dependence for the spatial autoregressive model (SAR) with latent variables (LVs). The SAR-LVs model is built from the standard SAR model by replacing both the independent and the dependent variables with factor scores of the exogenous and endogenous latent variables obtained from a measurement model (in structural equation modeling). As a result, the error of the SAR-LVs model should have a different distribution from that of the standard SAR model, and the LM test for the SAR-LVs model is based on this new distribution. The latent variables are estimated with the weighted least squares (WLS) method, and the parameters of the SAR-LVs model are estimated with the two-stage least squares (2SLS) method. The SAR-LVs model is applied to two cases, one with a positive and one with a negative spatial autoregressive coefficient, to illustrate how the coefficient is interpreted.

## 1. Introduction

Researchers have often faced models involving latent variables and analyzed the relationships between two or more of those latent variables simultaneously. Latent variables are unobserved or unmeasured variables [1] and measured by connecting to the observed variables because they cannot be directly measured [2]. The statistical methodology that is able to accommodate these two objectives is structural equation modeling (SEM). SEM is a statistical method used to test the relationships between latent variables (path models) and between its observed variables (confirmatory factor models) [2]. In general, SEM has two submodels, namely the measurement model and the structural model. The structural model describes the relationship between latent variables, while the measurement model is the relationship between indicators and the latent variables that construct it.
In social research, analyses involving latent variables and at the same time having a spatial effect have often been found. Spatial dependence may be caused by different kinds of spatial spillover effects. There are two frameworks that involve spatial data in the SEM model, namely at the level of the measurement model or the structural model. The involvement of spatial data at the level of the measurement model is commonly used to analyze multivariate spatial data [3], i.e., when several variables are measured at the same locations over a spatial area, and they are often correlated with each other. Each of them might also be correlated across the locations because of geographic similarities of the different locations.
Christensen and Amemiya [4] suggested a model with a latent variable that is distribution-free to analyze multivariate spatial data. However, it was limited by the assumption of a linear relationship between observed and latent variables that might not apply to Poisson and binomial data. The parameters of the model were estimated by means of a moment method. Wang and Wall [3] proposed the generalized common spatial factor, which was an extension of the traditional factor analysis model. In this model, it was assumed that the common factors were spatially correlated and extended to handle more types of observed data from exponential families, especially Poisson and binomial data.
Hogan and Tchernis [5] proposed the method of a Bayesian hierarchical model for analysis factors of spatially correlated multivariate data. At the first level, the distribution of a vector of manifest variables was conditional on an underlying latent factor in each location, whereas at the second level, the area-specific latent factors had a joint distribution that combined spatial correlation.
In contrast to previous researchers that only analyzed multivariate spatial data in measurement models in the SEM model, Liu et al. [6] developed a generalized spatial structural equation model (GSSEM). They joined the generalized common spatial factor model proposed by Wang and Wall in [3] and SEM that calculated spatial correlations. The GSSEM can also be extended to spatial correlations that can be added to the measurement model. Congdon [7] used the factor analysis on the measurement model. In this model, the construct was observed through indicators. Indicators allowed both spatial correlation and correlation with one another. The relation among constructs that are nonlinear was approached using a spline regression.
Oud and Folmer [8] proposed a SEM approach to the spatial dependence model. They combined the standard spatial model in [9] with the multiple indicators multiple causes (MIMIC) model in [10]. In this approach, the spatial weight that described the spatial spillover effects was located in the structural model. This approach was more flexible and informative compared to modeling that gave the spatial weight to the measurement model. The parameters of this model were estimated using Full Information Maximum Likelihood (FIML) and resulted in an estimator to control the bias of endogeneity due to the interaction between the dependent variable and its lag.
Anekawati et al. [11] conducted modeling of education quality in the senior high school level using the spatial SEM approach. Although they allocated the spatial weight on the structural model, their work had a different perspective on the model in [8]. They developed the spatial SEM model from the standard spatial model by Anselin in [9] but replaced the dependent variables by endogenous latent variables, as well as independent variables. The latent variables were estimated as the factor scores using the partial least squares (PLS) method through iterative estimation developed by Trujillo in [12]. The factor scores were modeled by involving the spatial effects, and the spatial dependence of this model was tested using the Lagrange multiplier (LM) test. The results of the spatial dependence test led to the spatial autoregressive (SAR) model. Furthermore, this model was called the spatial autoregressive model with latent variables (SAR-LVs). The parameters of the SAR-LVs model were estimated using maximum likelihood estimation (MLE).
Anekawati et al. [13] estimated the parameters of the SAR-LVs model from Anekawati et al. in [11] using the generalized method of moment (GMM), which was developed by Kelejian and Prucha in [14,15]. The SAR-LVs model in [13] indicated a better fit for the model than the MLE method since it produced a higher R-square. Additionally, the GMM was computationally easier than MLE.
This idea offers an alternative way to model latent variables and spatial data simultaneously, which addresses the research limitation noted in [16]. The purpose of the research in [16] was to identify the relationship between vulnerability factors related to social, economic, and environmental aspects and economic losses from natural disasters in 230 local communities in South Korea. The social, economic, and environmental aspects were latent variables measured through observed variables. For example, the social aspect was constructed from two indicators, namely the percentage of the population over age 15 without elementary school completion and the percentage of foreigners as a minority. However, the relationship was modeled on the indicators themselves rather than on the latent variables, which made it less precise with respect to the research purpose. That study itself listed the indicator-based approach to identifying vulnerability factors as a limitation. Therefore, the method in this study provides an alternative solution for spatial models that involve latent variables, focusing in particular on the spatial dependence test using the LM test.
Oud and Folmer [8] did not perform a diagnostic test of spatial dependence, so there was no direction in determining whether the model led to the spatial autoregressive model or the spatial error model. Anekawati's research works in [11,13] diagnosed spatial dependence using the Lagrange multiplier (LM) test.
The LM test is one of the constructions for testing parametric hypotheses based on asymptotic theory [17]. Anselin [18] developed diagnostics for spatial dependence using the LM test. The LM approach is reasonable and relatively easy to apply because, in its simplest form, it only requires estimation under the null hypothesis [17]. Yang [19] introduced a residual-based bootstrap method for asymptotically refined approximations to the finite sample critical values of the LM statistics.
The LM tests of spatial dependence in [18,19] were developed for the standard SAR model and did not involve latent variables. Anekawati's work [11,13] used the LM test of spatial dependence for the SAR-LVs model, but the error distribution of the model was assumed to be the same as that of the standard SAR model in [9,18]. The LM test for the standard SAR model in [9,18] assumes that the error is normally distributed, $\varepsilon \sim N(0, \sigma^2 I)$. Meanwhile, the SAR-LVs model was built on the standard SAR model, with factor scores replacing the independent and dependent variables. The factor scores were the estimates of the latent variables in one of the SEM submodels, namely the measurement model. The measurement model assumes that the error is normally distributed, namely $\delta \sim N(0, \Theta_\delta)$ for the exogenous and $\varepsilon^* \sim N(0, \Theta_{\varepsilon^*})$ for the endogenous variables, while the error distribution in the standard SAR model is $\varepsilon \sim N(0, \sigma^2 I)$. As a result, the error of the SAR-LVs model should have a different distribution from the error of the standard SAR model. In this paper, an attempt is made to fill this gap. The focus of this study is to develop a Lagrange multiplier test of spatial dependence for the spatial autoregressive model (SAR) with latent variables (LVs).
To complete this paper, the estimation of latent variables into factor scores uses the weighted least squares (WLS) method, so that the error distribution of the SAR-LVs model is constructed from the result of this estimation. The estimation of parameters of the SAR-LVs model uses the two-stage least squares (2SLS) method. In the last section, the SAR-LVs model is applied for cases of the positive and negative spatial autoregressive coefficient to provide an interpretation of the spatial autoregressive coefficient.

## 2. Materials and Methods

#### 2.1. Spatial Autoregressive Model with Latent Variable (SAR-LVs Model)

SEM consists of two basic components, the structural model and the measurement model [2]. The measurement model represents the relationship between the manifest variables and the exogenous latent variables (1) or the endogenous latent variables (2), while the structural model describes the relationship among the latent variables (3). Bollen [1] wrote the measurement and structural models as Equations (1)–(3):
$x = \Lambda_x \xi + \delta, \qquad (1)$
$y = \Lambda_y \eta + \varepsilon^*, \qquad (2)$
$\eta = \mathbf{B} \eta + \Gamma \xi + \zeta, \qquad (3)$
where $\eta$ is the $(q \times 1)$ endogenous random vector, $q$ is the number of endogenous variables, $\xi$ is the $(p \times 1)$ exogenous random vector, $p$ is the number of exogenous variables, $\mathbf{B}$ is the $(q \times q)$ coefficient matrix that shows the effect of one endogenous latent variable on another, $\Gamma$ is the $(q \times p)$ coefficient matrix that shows the effect of $\xi$ on $\eta$, $\zeta$ is the $(q \times 1)$ random error vector, $y$ is the $(B \times 1)$ observed vector of the endogenous variables, $B$ (a scalar, distinct from the matrix $\mathbf{B}$) is the total number of indicators of the endogenous variables, $x$ is the $(A \times 1)$ observed vector of the exogenous variables, $A$ is the total number of indicators of the exogenous variables, $\Lambda_y$ is the $(B \times q)$ coefficient matrix that relates $y$ to $\eta$, $\Lambda_x$ is the $(A \times p)$ coefficient matrix that relates $x$ to $\xi$, $\varepsilon^*$ is the $(B \times 1)$ measurement error vector of $y$, and $\delta$ is the $(A \times 1)$ measurement error vector of $x$.
The assumptions that must be fulfilled are that $(I - \mathbf{B})$ is nonsingular and that the elements of the measurement error vectors, namely $\delta_i$ and $\varepsilon^*_i$, are homoscedastic and nonautocorrelated across observations (see [1]).
Anselin [9] wrote the standard SAR model as
$y^* = \lambda W y^* + X \beta + \varepsilon, \qquad (4)$
where $y^*$ is the $(T \times 1)$ dependent vector, whose spatial lag is $W y^*$, $T$ is the number of observed units, $X$ is the $(T \times (p + 1))$ exogenous matrix, $\lambda$ is the coefficient of the spatial lag, $\beta$ is the $((p + 1) \times 1)$ parameter vector of the exogenous variables, $W$ is the $(T \times T)$ spatial weight matrix whose main diagonal elements are zero, and $\varepsilon$ is the disturbance vector with $\varepsilon \sim N(0, \sigma^2 I)$, the classic homoscedastic situation. The queen contiguity method was used for spatial weighting. In queen contiguity, $W_{ij}$ is set to 1 when region $j$ shares a common side or a common vertex with the region of concern $i$, and to 0 otherwise [20].
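As an illustration of the queen contiguity rule, the following minimal sketch builds $W$ for cells arranged on a regular grid. The grid layout and the function name `queen_weights` are assumptions made only for this example; real districts are irregular polygons, and their contiguity would come from a map rather than from grid indices.

```python
import numpy as np

def queen_weights(n_rows, n_cols, row_standardize=False):
    """Queen-contiguity weights for the cells of an n_rows x n_cols grid.

    Cell (r, c) has index r * n_cols + c. Two cells are neighbors when they
    share a side or a vertex, so an interior cell has eight neighbors.
    """
    T = n_rows * n_cols
    W = np.zeros((T, T))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # main diagonal stays zero
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        W[i, rr * n_cols + cc] = 1.0
    if row_standardize:
        # Assumes every cell has at least one neighbor (grid larger than 1 x 1).
        W = W / W.sum(axis=1, keepdims=True)
    return W

W = queen_weights(3, 3)
print(W[4].sum())  # center cell of a 3 x 3 grid: 8 neighbors
print(W[0].sum())  # corner cell: 3 neighbors
```

Row standardization is optional here; the text only states the binary 0/1 definition.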
Oud and Folmer in [8] wrote the SAR-LVs model in the form of the MIMIC. The proposed model was $η ~ = λ W η ~ + X ~ γ + ζ ~$, where $λ$ was the spatial lag coefficient of the endogenous variable, $W$ was the contiguity matrix, and $X ~$ was the observation matrix of the explanatory variable.
The SAR-LVs model is a model that involves latent variables and in which the unit of observation is a location. It is a standard SAR model in which the independent and dependent variables are latent variables. In the SEM model, latent variables cannot be measured directly for a sample unit. Therefore, in this work, each latent variable in the standard SAR model in Equation (4) is represented by its factor score, obtained from the measurement model in Equations (1) and (2), which serves as a measured, random value for each sample unit. In Equation (4), the dependent variable $y^*$ is replaced by the endogenous latent variable ($\eta$), and the exogenous variable ($X$) is replaced by the exogenous latent variable ($\xi$), both previously estimated using the WLS method. The estimates of the latent variables are denoted by $\hat{\eta} = l$ and $\hat{\xi} = K$. This SAR-LVs model does not use the MIMIC model, since none of the exogenous or endogenous variables is an observed variable, and the number of endogenous variables is limited to one.
Thus, the SAR-LVs model in Equation (4) changes to:
$l = \lambda W l + K \beta + \varepsilon, \qquad (5)$
where $l$ is the $(T \times 1)$ endogenous factor score vector, $K$ is the $(T \times (p + 1))$ exogenous factor score matrix, and $\beta$ is the $((p + 1) \times 1)$ regression coefficient vector.

#### 2.2. Estimation of Score of Latent Variable

The factor score is the estimate of a latent variable, either endogenous or exogenous, in the measurement model. The method used is WLS, which minimizes the sum of squared errors weighted by the error variance matrix. In the estimation process, the factor loadings and the error variance matrix are assumed to be constant.
In the measurement model of the exogenous latent variables, Equation (1), $p$ is the number of exogenous latent variables, $a_i$ is the number of indicators of the $i$th exogenous latent variable, and $\delta$ is the error, where $\delta \sim N(0, \Theta_\delta)$ and $\Theta_\delta$ is the variance-covariance matrix of the measurement error of the observed variable $x$, namely
$\Theta_\delta = \mathrm{diag}\left( \sigma^2_{\delta(1)1}, \sigma^2_{\delta(2)1}, \cdots, \sigma^2_{\delta(a_1)1}, \sigma^2_{\delta(1)2}, \sigma^2_{\delta(2)2}, \cdots, \sigma^2_{\delta(a_2)2}, \cdots, \sigma^2_{\delta(1)p}, \sigma^2_{\delta(2)p}, \cdots, \sigma^2_{\delta(a_p)p} \right).$
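The distribution of $x$ is obtained through the properties of the expected value and variance of a random variable. It is assumed that $\Lambda_x$ and $\Theta_\delta$ are constant, so that $E(x) = \Lambda_x \xi$ and $\mathrm{Var}(x) = \Theta_\delta$. Therefore, if $\delta \sim N(0, \Theta_\delta)$, then the distribution of $x$ is $x \sim N_A(\Lambda_x \xi, \Theta_\delta)$. Suppose that a random sample of size $T$ is given from the random variable $x$,
$( x_{(1)11}, \cdots, x_{(a_1)1T}, x_{(1)21}, \cdots, x_{(a_2)2T}, \cdots, x_{(1)p1}, \cdots, x_{(a_p)pT}, \xi_{11}, \xi_{12}, \cdots, \xi_{1T}, \xi_{21}, \xi_{22}, \cdots, \xi_{2T}, \cdots, \xi_{p1}, \xi_{p2}, \cdots, \xi_{pT} )$
with $t = 1, 2, \ldots, T$, so $x_t \sim N_A(\Lambda_x \xi_t, \Theta_\delta)$. The probability density function of $x_t$ is
$f(x_t) = (2\pi)^{-A/2} |\Theta_\delta|^{-1/2} \exp\left( -\tfrac{1}{2} (x_t - \Lambda_x \xi_t)' \Theta_\delta^{-1} (x_t - \Lambda_x \xi_t) \right).$
The likelihood function is
$L(\xi, \Theta_\delta) = \prod_{t=1}^{T} f(x_t) = (2\pi)^{-AT/2} |\Theta_\delta|^{-T/2} \exp\left( -\tfrac{1}{2} \sum_{t=1}^{T} (x_t - \Lambda_x \xi_t)' \Theta_\delta^{-1} (x_t - \Lambda_x \xi_t) \right).$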
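The latent variable $\xi$ is estimated using the WLS method by optimizing $L(\xi, \Theta_\delta)$. Maximizing this function is equivalent to minimizing the sum of squared errors weighted by the inverse of the error variance matrix $\Theta_\delta$, which gives $\sum_{t=1}^{T} \hat{\xi}_t = (\Lambda_x' \Theta_\delta^{-1} \Lambda_x)^{-1} \Lambda_x' \Theta_\delta^{-1} \sum_{t=1}^{T} x_t$. This can be written in matrix form.
Let $K' = (\hat{\xi}_{jt})$ be the $(p \times T)$ matrix of estimated factor scores and let $X = \begin{pmatrix} x_{(1)11} & x_{(1)12} & \cdots & x_{(1)1T} \\ x_{(2)11} & x_{(2)12} & \cdots & x_{(2)1T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{(a_1)11} & x_{(a_1)12} & \cdots & x_{(a_1)1T} \\ x_{(1)21} & x_{(1)22} & \cdots & x_{(1)2T} \\ x_{(2)21} & x_{(2)22} & \cdots & x_{(2)2T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{(a_2)21} & x_{(a_2)22} & \cdots & x_{(a_2)2T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{(1)p1} & x_{(1)p2} & \cdots & x_{(1)pT} \\ x_{(2)p1} & x_{(2)p2} & \cdots & x_{(2)pT} \\ \vdots & \vdots & \ddots & \vdots \\ x_{(a_p)p1} & x_{(a_p)p2} & \cdots & x_{(a_p)pT} \end{pmatrix}$ be the $(A \times T)$ matrix of observed indicators, so
$K' = (\Lambda_x' \Theta_\delta^{-1} \Lambda_x)^{-1} \Lambda_x' \Theta_\delta^{-1} X. \qquad (6)$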
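The factor score computation can be sketched numerically as follows. This is an illustration only: it uses the standard WLS solution, which includes the inverse of $\Lambda_x' \Theta_\delta^{-1} \Lambda_x$, and the loadings, error variances, and the function name `wls_factor_scores` are hypothetical values chosen for the example, not estimates from the paper.

```python
import numpy as np

def wls_factor_scores(X, Lambda_x, Theta_delta):
    """Return K' (p x T), the WLS factor scores for indicators X (A x T).

    Lambda_x is the A x p loading matrix and Theta_delta the A x A error
    covariance matrix; both are treated as known constants.
    """
    Theta_inv = np.linalg.inv(Theta_delta)
    normal_matrix = Lambda_x.T @ Theta_inv @ Lambda_x   # p x p
    weighted_loadings = Lambda_x.T @ Theta_inv          # p x A
    P = np.linalg.solve(normal_matrix, weighted_loadings)
    return P @ X

# Hypothetical setup: 2 latent variables with 3 and 2 indicators, 27 units.
Lambda_x = np.array([[1.0, 0.0],
                     [0.8, 0.0],
                     [0.6, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.7]])
Theta_delta = np.diag([0.5, 0.4, 0.6, 0.3, 0.5])

rng = np.random.default_rng(0)
xi = rng.normal(size=(2, 27))                           # true latent scores
delta = rng.normal(size=(5, 27)) * np.sqrt(np.diag(Theta_delta))[:, None]
X = Lambda_x @ xi + delta                               # x = Lambda_x xi + delta

K_t = wls_factor_scores(X, Lambda_x, Theta_delta)
print(K_t.shape)  # (2, 27)
```

With noise-free indicators the function returns the latent scores exactly, which is a quick way to check that the weights satisfy $P \Lambda_x = I$.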
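$X$ in Equation (6) is a random observation matrix. Based on Definition A.1 in [21] (Appendix A), $X$ has the matrix variate normal distribution $X \sim N_{A,T}(\Lambda_x \xi_t e', \Theta_\delta \otimes I_T)$, where $e_{T \times 1} = (1, \ldots, 1)'$.
The characteristic function of the random matrix $X$ is defined as $\phi_X(Z) = E[\mathrm{etr}(\iota Z' X)]$, where $\iota = \sqrt{-1}$. Writing $P = (\Lambda_x' \Theta_\delta^{-1} \Lambda_x)^{-1} \Lambda_x' \Theta_\delta^{-1}$, Equation (6) simplifies to $K' = P X$, so the characteristic function of $K'$ is $\phi_{K'}(Z) = E[\mathrm{etr}(\iota Z' P X)]$. Based on Theorem B.1, the factors inside the trace can be reordered cyclically. If $Z_1' = Z' P$, then the characteristic function of $K'$ becomes $\phi_{K'}(Z) = E[\mathrm{etr}(\iota Z_1' X)] = \phi_X(Z_1)$.
Based on Theorem A.2 and the distribution of $X$, $\phi_X(Z_1) = \mathrm{etr}\left( \iota Z_1' \Lambda_x \xi_t e' - \tfrac{1}{2} Z_1' \Theta_\delta Z_1 \right)$. Substituting $Z_1' = Z' P$ and using $P \Lambda_x = I_p$ and $P \Theta_\delta P' = (\Lambda_x' \Theta_\delta^{-1} \Lambda_x)^{-1}$, the characteristic function of $K'$ is
$\phi_{K'}(Z) = \mathrm{etr}\left( \iota Z' \xi_t e' - \tfrac{1}{2} Z' (\Lambda_x' \Theta_\delta^{-1} \Lambda_x)^{-1} Z \right). \qquad (7)$
Based on Theorem A.2 and Equation (7), $K'$ has the matrix variate normal distribution with mean $\xi_t e'$ and covariance matrix $(\Lambda_x' \Theta_\delta^{-1} \Lambda_x)^{-1} \otimes I_T$, denoted by $K' \sim N_{p,T}\left( \xi_t e', (\Lambda_x' \Theta_\delta^{-1} \Lambda_x)^{-1} \otimes I_T \right)$. Based on Theorem A.1, $K$ has the matrix variate normal distribution denoted by:
$K \sim N_{T,p}\left( e \xi_t', I_T \otimes (\Lambda_x' \Theta_\delta^{-1} \Lambda_x)^{-1} \right). \qquad (8)$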
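In this paper, the number of endogenous latent variables is limited to one. The covariance matrix of the measurement error of the observed variable $y$ is $\Theta_{\varepsilon^*}$, namely $\Theta_{\varepsilon^*} = \mathrm{diag}\left( \sigma^2_{\varepsilon^*_1}, \sigma^2_{\varepsilon^*_2}, \cdots, \sigma^2_{\varepsilon^*_B} \right)$, where $B$ is the number of indicators of the endogenous latent variable. In the same way as for the exogenous latent variables, letting $l' = (l_1, l_2, \cdots, l_T)$ be the vector of estimated scores and $Y = \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1T} \\ y_{21} & y_{22} & \cdots & y_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ y_{B1} & y_{B2} & \cdots & y_{BT} \end{pmatrix}$, it is obtained
$l' = (\Lambda_y' \Theta_{\varepsilon^*}^{-1} \Lambda_y)^{-1} \Lambda_y' \Theta_{\varepsilon^*}^{-1} Y, \qquad (9)$
and its distribution is
$l \sim N_T\left( \eta_t e, (\Lambda_y' \Theta_{\varepsilon^*}^{-1} \Lambda_y)^{-1} I_T \right). \qquad (10)$
Theorems A.1, A.2, and B.1 are provided in Appendix A.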

#### 2.3. The Error Distribution of The SAR-LVs Model

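With the factor scores estimated as above, the SAR-LVs model with its error term can be written as
$l = \lambda W l + K \beta + \varepsilon, \qquad (11)$
or, in matrix form,
$\begin{pmatrix} l_1 \\ l_2 \\ \vdots \\ l_T \end{pmatrix} = \lambda W \begin{pmatrix} l_1 \\ l_2 \\ \vdots \\ l_T \end{pmatrix} + \begin{pmatrix} 1 & k_{11} & k_{12} & \cdots & k_{1p} \\ 1 & k_{21} & k_{22} & \cdots & k_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & k_{T1} & k_{T2} & \cdots & k_{Tp} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{pmatrix}$
where $K$ is as in Equation (6) and $l$ is as in Equation (10).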
The SAR-LVs model (11) is a spatial regression model with $l$ as the response variable and $K$ as the predictor variable, both of which are random. The model is therefore treated through the conditional density $f(l \mid K)$, in which $K$ is no longer random but fixed. As a result, the error in Equation (11) is $\varepsilon = (I - \lambda W) l - K \beta$, where $l$ is a random variable, $K$ is fixed and assumed to be uncorrelated with $\varepsilon$, and $\mathrm{cov}(l, K) \neq 0$.
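The relation between $l$, $K$, and $\varepsilon$ can be checked numerically with a small simulation. The sketch below assumes a row-standardized ring-shaped weight matrix and arbitrary parameter values; the helper names `ring_weights`, `simulate_sar`, and `sar_residual` are hypothetical and exist only for this example.

```python
import numpy as np

def ring_weights(T):
    """Row-standardized weights where unit i neighbors units i-1 and i+1."""
    W = np.zeros((T, T))
    for i in range(T):
        W[i, (i - 1) % T] = 0.5
        W[i, (i + 1) % T] = 0.5
    return W

def simulate_sar(K, beta, lam, W, eps):
    """Solve l = lam * W l + K beta + eps for l (reduced form)."""
    T = W.shape[0]
    return np.linalg.solve(np.eye(T) - lam * W, K @ beta + eps)

def sar_residual(l, K, beta, lam, W):
    """Error implied by the model: eps = (I - lam W) l - K beta."""
    return l - lam * (W @ l) - K @ beta

rng = np.random.default_rng(0)
T = 27
W = ring_weights(T)
K = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])  # intercept + 2 scores
beta = np.array([0.2, 0.5, -0.3])
lam = -0.4                                                  # negative spatial coefficient
eps = rng.normal(scale=0.3, size=T)

l = simulate_sar(K, beta, lam, W, eps)
print(np.allclose(sar_residual(l, K, beta, lam, W), eps))
```

Because $W$ is row-standardized and $|\lambda| < 1$, the matrix $I - \lambda W$ is invertible, so the reduced form is well defined in this example.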
The expectation value of $ε$ is , and the variance of $ε$ is , so the error distribution of $ε$ is
where
The error distribution of the standard SAR model is $\varepsilon \sim N(0, \sigma^2 I)$, while the error distribution of the SAR-LVs model is given by Equation (12). This error distribution is used to construct the LM test.

#### 2.4. Test of Spatial Dependence

The Lagrange multiplier (LM) test of the SAR-LVs model in Equation (11) is a test based on estimation under the null hypothesis. The likelihood function of the SAR-LVs model is obtained by substituting $\varepsilon$ into the Gaussian density and multiplying by the Jacobian $|C|$ of the transformation from $\varepsilon$ to $l$, which gives $L(\lambda, \beta, \Theta; l) = \pi^{-T/2} |\Theta|^{-1/2} |C| \exp\left( -\tfrac{1}{2} \varepsilon' \Theta^{-1} \varepsilon \right)$, where $C = (I - \lambda W)$.
The log-likelihood function for the SAR-LVs model is $\mathcal{L} = -\tfrac{T}{2} \ln \pi - \tfrac{1}{2} \ln |\Theta| + \ln |C| - \tfrac{1}{2} \varepsilon' \Theta^{-1} \varepsilon$, where the value of $\Theta$ is as in Equation (13), and $\varepsilon = (I - \lambda W) l - K \beta$.
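The log-likelihood can be evaluated directly from the likelihood stated above. The sketch below is a transcription of that expression, with the constant $\pi^{-T/2}$ kept as written in the text; the error covariance $\Theta$ is passed in as an argument because its form (Equation (13)) is not reproduced here, and the function name `sar_lv_loglik` and the numerical values are hypothetical.

```python
import numpy as np

def sar_lv_loglik(lam, beta, Theta, l, K, W):
    """Log-likelihood ln L(lam, beta, Theta; l) for eps = (I - lam W) l - K beta.

    Uses the constant pi^(-T/2) as in the text. Theta (T x T) must be supplied
    by the caller.
    """
    T = len(l)
    C = np.eye(T) - lam * W
    eps = C @ l - K @ beta
    _, logabsdet_C = np.linalg.slogdet(C)          # ln |det C|, the Jacobian term
    _, logdet_Theta = np.linalg.slogdet(Theta)
    quad = eps @ np.linalg.solve(Theta, eps)       # eps' Theta^{-1} eps
    return (-0.5 * T * np.log(np.pi) - 0.5 * logdet_Theta
            + logabsdet_C - 0.5 * quad)

# Small example with a hypothetical diagonal Theta.
rng = np.random.default_rng(0)
T = 10
W = np.zeros((T, T))
for i in range(T):
    W[i, (i - 1) % T] = 0.5
    W[i, (i + 1) % T] = 0.5
K = np.column_stack([np.ones(T), rng.normal(size=T)])
beta = np.array([1.0, 0.5])
l = rng.normal(size=T)
print(sar_lv_loglik(0.3, beta, 0.5 * np.eye(T), l, K, W))
```

Using `slogdet` rather than `det` avoids overflow or underflow of the determinants when $T$ is large.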
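Breusch and Pagan [17] defined the LM test statistic as $LM_\lambda = \hat{D}_\lambda' \hat{\Psi}_{\lambda\lambda}^{-1} \hat{D}_\lambda$, where $\hat{\Psi}_{\lambda\lambda}^{-1}$ is the element corresponding to $\lambda$ in the inverse of the $k \times k$ information matrix, whose elements are the expected negative second derivatives of the log-likelihood with respect to the estimated parameters, $\hat{\psi}_\theta = E\left[ -\frac{\partial^2 \mathcal{L}(\theta)}{\partial \theta \, \partial \theta'} \right]$. The test is conducted under the null hypothesis, so $\hat{D}_\lambda$ is the first derivative of the log-likelihood function with respect to $\lambda$, evaluated at $\lambda = 0$.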
The value of $Ψ ^ λ λ − 1$ and $D ^ λ$ were decomposed in Appendix A, which obtained so where $( l − K β )$ is an error of the OLS regression model, $l − K β = ε ~$, so $D ^ λ = p ( W K β ) ′ ε ~$. The LM statistic test is
where and , so the value of test statistic LM becomes
$LM_\lambda = -\frac{\left( p \, (W K \beta)' \tilde{\varepsilon} \right)^2}{pd}$
The LM statistic $LM_\lambda$ asymptotically follows the $\chi^2_{(1)}$ distribution.
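For comparison, the sketch below computes the classical LM test for a spatial lag in the standard SAR model with homoscedastic normal errors, as in [18]. It is not the SAR-LVs statistic derived in this section, whose quantities $p$ and $d$ depend on the error distribution in Equation (12). The function name `lm_lag` and the simulated data are assumptions for illustration.

```python
import math
import numpy as np

def lm_lag(y, X, W):
    """Classical LM statistic for a spatial lag (standard SAR, errors N(0, s2 I)).

    Returns the statistic and its chi-square(1) p-value.
    """
    n = len(y)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)       # OLS under the null lam = 0
    e = y - X @ b
    s2 = e @ e / n
    WXb = W @ X @ b
    g, *_ = np.linalg.lstsq(X, WXb, rcond=None)
    MWXb = WXb - X @ g                              # M (W X b), M = I - X (X'X)^{-1} X'
    T1 = np.trace(W.T @ W + W @ W)
    score = (e @ W @ y) / s2
    info = (MWXb @ MWXb) / s2 + T1
    lm = score ** 2 / info
    p_value = math.erfc(math.sqrt(lm / 2.0))        # survival function of chi-square(1)
    return lm, p_value

# Simulated data from a SAR process on a ring of 30 units (hypothetical values).
rng = np.random.default_rng(1)
T = 30
W = np.zeros((T, T))
for i in range(T):
    W[i, (i - 1) % T] = 0.5
    W[i, (i + 1) % T] = 0.5
K = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
beta = np.array([1.0, 0.5, -0.5])
l = np.linalg.solve(np.eye(T) - 0.6 * W, K @ beta + rng.normal(scale=0.5, size=T))

lm, p_value = lm_lag(l, K, W)
print(f"LM = {lm:.3f}, p = {p_value:.4f}")
```

The null hypothesis of no spatial lag dependence is rejected at the 5% level when the statistic exceeds the $\chi^2_{(1)}$ critical value of about 3.841, or equivalently when the p-value is below 0.05.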

#### 2.5. Estimation of the Parameters of the SAR-LVs Model

If the parameters of the SAR-LVs model in Equation (11), with the error distribution in Equation (12), are estimated by OLS, then the estimator is biased and inconsistent, since the regressor $(W l)$ correlates with the error $\varepsilon$, that is, $E[(W l)' \varepsilon] \neq 0$. If the model is estimated by the moment method, the overidentified condition is obtained. The following explains the overidentified condition.
Equation (11) can be simplified as follows:
$l = Z \alpha + \varepsilon, \qquad (15)$
where $Z = (K \mid W l)$ has size $(T \times (p + 2))$ and $\alpha = (\beta' \mid \lambda)'$ has size $((p + 2) \times 1)$.
In this work, $\alpha$ in Equation (15) was estimated by the 2SLS method as performed by [14,15], namely by two successive ordinary least squares (OLS) regressions, as follows: (i). The 2SLS method requires an instrument matrix $H$, which joins the $K$ matrix and the $W K$ matrix, written as $H = (K \mid W K)$. The instrument matrix $H$ is valid because it does not correlate with $\varepsilon$ and correlates with the regressor $W l$; (ii). Regress $W l$ on the instrument matrix $H$ to obtain $\widehat{W l} = H (H' H)^{-1} H' (W l)$; (iii). Regress $l$ on $\hat{Z}$ to obtain
$\hat{\alpha} = (\hat{Z}' \hat{Z})^{-1} \hat{Z}' l, \qquad (16)$
where $\hat{Z} = (K \mid \widehat{W l})$ and $\hat{\alpha}$ contains $\hat{\beta}$ and $\hat{\lambda}$.
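Steps (i) to (iii) can be sketched as follows. This is a minimal illustration rather than the Matlab code used in the study: the function name `two_sls`, the ring-shaped weight matrix, and the parameter values are assumptions. Least squares fits are used in place of the explicit inverse $(H'H)^{-1}$ because, when $K$ contains an intercept and $W$ is row-standardized, $W K$ repeats the column of ones and $H'H$ is singular; the fitted values are unaffected.

```python
import numpy as np

def two_sls(l, K, W):
    """Two-stage least squares for l = lam * W l + K beta + eps.

    Returns (beta_hat, lam_hat) using the instruments H = (K | W K).
    """
    Wl = W @ l
    H = np.column_stack([K, W @ K])                      # (i) instruments
    coef, *_ = np.linalg.lstsq(H, Wl, rcond=None)        # (ii) regress W l on H
    Wl_hat = H @ coef
    Z_hat = np.column_stack([K, Wl_hat])
    alpha, *_ = np.linalg.lstsq(Z_hat, l, rcond=None)    # (iii) regress l on Z_hat
    return alpha[:-1], alpha[-1]

# Simulated example (hypothetical values).
rng = np.random.default_rng(0)
T = 40
W = np.zeros((T, T))
for i in range(T):
    W[i, (i - 1) % T] = 0.5
    W[i, (i + 1) % T] = 0.5
K = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
beta = np.array([1.0, 2.0, -1.0])
lam = 0.5
eps = rng.normal(scale=0.3, size=T)
l = np.linalg.solve(np.eye(T) - lam * W, K @ beta + eps)

beta_hat, lam_hat = two_sls(l, K, W)
print(beta_hat, lam_hat)
```

In the noise-free case ($\varepsilon = 0$) the estimator reproduces $\beta$ and $\lambda$ exactly, which gives a simple sanity check of an implementation.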

## 3. Results and Discussion

This study examined two cases, one with a negative and one with a positive spatial autoregressive coefficient, to provide an interpretation of the spatial autoregressive coefficient. The first case was the education quality model developed in [13], updated with 2018 data, which showed a negative spatial autoregressive coefficient. The second case was the poverty model in [22], which produced a positive spatial autoregressive coefficient.
The education quality model for senior high schools in Sumenep Regency involved 27 observation units, one endogenous latent variable, and two exogenous latent variables. The endogenous latent variable was the education quality, with three indicators: the ratio of the gross enrolled number of senior high school students to the number of children aged between 15 and 18 years in each district (Y11), the ratio of the number of senior high schools accredited at level B or above to the total number of senior high schools in each district (Y12), and the average of national exam scores of senior high school students in each district (Y13).
Exogenous latent variables were school infrastructure and socioeconomic conditions. Indicators of school infrastructure were the proportion of the number of schools with a minimum classroom space according to the regulations of the national education ministry (X11), the proportion of the numbers of schools with laboratories according to the regulations of the national education ministry (X12), and the proportion of the number of schools with libraries according to the regulations of the national education ministry (X13).
Indicators of socioeconomic conditions were the ratio of the number of households running a home industry or with a shop at home to the total number of households in each district (X21) and the ratio of the number of households using clean water to the total number of households in each district (X22). The model is shown in Figure 1.
Latent variables were estimated as in Equations (6) and (9) and then modeled as in Equation (11). The model parameters were estimated with Equation (16) using Matlab, and the results are shown in Table 1. The LM statistic in Equation (14), also computed in Matlab, was $LM_\lambda = -2.3272$. The value of $LM_\lambda$ was compared to the $\chi^2$ distribution with one degree of freedom at $\alpha = 5\%$, and the result was significant in favor of the SAR-LVs model. Additionally, the spatial autoregressive coefficient was negative.
In general, the SAR-LVs model for the education quality of the senior high school is: , where $l i$ is the education quality in the i-th district, $k 1$ is the infrastructure, and $k 2$ is socioeconomic condition.
The spatial dependence test result was significant, which means that the education quality of the senior high schools in one district was correlated with that in contiguous districts. The negative spatial autoregressive coefficient is interpreted as the opposite of the common spillover effect, i.e., a district was supported by, or gained a spillover effect from, the education quality of the neighboring districts.
This spillover effect was generally due to the migration of students looking for high-quality schools in neighboring districts. As a result, the districts that had high-quality schools gained the spillover effect of quality education through high-achieving students from the neighboring districts. In general, high-quality schools in Sumenep Regency are public schools. Figure 2 shows the distribution of districts with and without public schools. Table 2 shows the number of junior high school graduates and new senior high school students for seven districts in the same year, 2018, based on [23], and gives an overview of the migration of students entering senior high school among districts. The Gapura District is used as an example to illustrate student migration (see Figure 2 and Table 2). It has one public school and six neighboring districts (queen contiguity). Based on Table 2, the Gapura District had 485 junior high school graduates and 622 new senior high school students, which means that students migrated into the Gapura District.
The Gapura District received the spillover effect of quality education from neighboring districts, especially those with no public schools. The quality spillover arose because schools in the Gapura District had the opportunity to select the best students from the district itself and from the neighboring districts.
The second case was the poverty model in East Java province, with an observation unit of 38 regencies. This model had one endogenous latent variable, namely poverty, and three exogenous latent variables, namely Economy, Human Resource, and Health. Poverty indicators were the percentage of the poor population (Y1), the index of poverty depth (Y2), and the index of poverty severity (Y3). Indicators of economics were the percentage of poor people around 15 years old or more who were unemployed (X1), the percentage of poor people aged 15 years old or more who were working in agriculture (X2), and the percentage of households gaining Raskin (X3). Raskin is an Indonesian subsidy program to provide rice for people who live under the poverty line. Indicators of Human Resources were the percentage of poor people aged 15 years old and over who did not complete elementary education (X4), the literacy rate of the poor aged from 15 to 55 years (X5), and the participation rates in schools for the poor aged from 13 to 15 years (X6). Health indicators were the percentage of women using KB (Family planning program) devices in poor households (X7), the percentage of children under five in poorly immunized households (X8), the percentage of poor households using drinking water (X9), and the percentage of poor households using private/together latrines (X10). The model is shown in Figure 3.
Latent variables were estimated as in Equations (6) and (9) and then modeled as in Equation (11). The model parameters were estimated with Equation (16) using Matlab, and the results are shown in Table 3. The LM statistic in Equation (14), also computed in Matlab, was $LM_\lambda = -4965$. The value of $LM_\lambda$ was compared to the $\chi^2$ distribution with one degree of freedom at $\alpha = 5\%$, and the result was significant in favor of the SAR-LVs model. Meanwhile, the spatial autoregressive coefficient was positive.
In general, the SAR-LVs model for the poverty is , where $l_i$ is poverty in the i-th regency, $k_1$ is Economy, $k_2$ is Human Resource, and $k_3$ is Health.
The spatial dependence test result was significant, which means that poverty in one regency was correlated with poverty in contiguous regencies. The positive spatial autoregressive coefficient is interpreted as the common spillover effect, i.e., a regency gives a poverty spillover effect to the neighboring regencies. Figure 4 shows the distribution of poor people in East Java province, measured in percent. The percentage of poor people was based on the poverty data in [24] and was clustered into four quartiles. The first quartile (red zone) contains the regencies with the highest percentage of poor people, 20.71–13.01%, and so on down to the fourth quartile (green zone), which contains the regencies with the lowest percentage, 7.13–3.80%. According to Figure 4, the regencies in the first quartile were always located close to regencies in the first and second quartiles. This visualization reinforces the result of the Lagrange multiplier test that there is a spatial effect in which one regency influences poverty in its neighboring regencies.
The findings of this method provide value for policymakers dealing with existing problems. The first case was the negative spatial autoregressive coefficient for the education quality model. It is interpreted to mean that a district was supported by, or gained a spillover effect from, the education quality of the neighboring districts through student migration. This case calls for a policy that strives for quality standardization across all schools. The second case was the positive spatial autoregressive coefficient for the poverty model. It is interpreted to mean that a regency gives a poverty spillover effect to the neighboring regencies. Policymakers need this information to target assistance at the locus of poverty appropriately.

## 4. Conclusions

The SAR-LVs model is a standard SAR model in which the independent and dependent variables are latent variables. The standard SAR model is $y^* = \lambda W y^* + X \beta + \varepsilon$. The variable $y^*$ is replaced by the endogenous factor score ($\hat{\eta}$), and $X$ is replaced by the exogenous factor score ($\hat{\xi}$). The estimation of the latent variables uses the WLS method and assumes that $\Lambda_x$ and $\Theta_\delta$ are constant. Therefore, the SAR-LVs model can be written as $l = \lambda W l + K \beta + \varepsilon$, where $\hat{\eta} = l$ and $\hat{\xi} = K$. The distributions of $K$ and $l$ are given in Equations (8) and (10). The variables $l$ and $K$ are both random, so the model is treated through the conditional density $f(l \mid K)$, in which $K$ is no longer random but fixed. The error distribution is obtained through the properties of the expected value and the variance of the error in the SAR-LVs model and is given in Equation (12), with $\Theta$ as in Equation (13). Based on this error distribution and under the null hypothesis, the LM statistic is $LM_\lambda = -\frac{\left( p \, (W K \beta)' \tilde{\varepsilon} \right)^2}{pd}$, which asymptotically follows the $\chi^2_{(1)}$ distribution.
Some significant limitations of this study need to be considered. First, the model has only one endogenous latent variable; future studies can extend it to a higher number of endogenous latent variables. Second, the LM test was developed only for the SAR-LVs; future studies can develop it for a spatial error model with latent variables (SEM-LVs).

## Author Contributions

Conceptualization, A.A., B.W.O., P.P., and S.S.; data curation, B.W.O.; formal analysis, A.A. and P.P.; funding acquisition, A.A.; methodology, A.A., B.W.O., P.P., and S.S.; project administration, A.A.; resources, S.S.; software, A.A.; supervision, B.W.O.; validation, P.P.; visualization, S.S.; writing (original draft), A.A.; writing (review and editing), B.W.O. All authors have read and agreed to the published version of the manuscript.

## Funding

The authors would like to thank the Ministry of Research, Technology, and Higher Education of Indonesia (Kemenristekdikti) for providing a research grant to carry out preliminary research with the contract number 046/SP2H/K7/KM/2017. The first author would also like to thank Universitas Wiraraja for its general financial support.

## Conflicts of Interest

The authors declare no conflict of interest.

## Appendix A

#### Appendix A.1. Matrix Variate Normal Distribution

Definition A.1 based on [21] page 56.
The matrix variate normal distribution arises when sampling from a multivariate normal population. Let $x_1, \ldots, x_N$ be a random sample of size $N$ from $N_p(\mu, \Sigma)$. Define the observation random matrix as
where $e_{N \times 1} = (1, \ldots, 1)'$
Theorem A.1 based on [21] page 56.
If , then
Theorem A.2 based on [21] page 56.
If , then the characteristic function of $X$ is $\phi_X(Z) = \operatorname{etr}\left(\iota Z' M - \tfrac{1}{2} Z' \Sigma Z \Psi\right)$, where $\iota = \sqrt{-1}$

#### Appendix A.2. Properties of Matrix and Derivative of Matrix/Vector

Theorem B.1 based on [21] page 56.
$\operatorname{tr}(AB) = \operatorname{tr}(BA)$ with $A_{p \times q}$ and $B_{q \times p}$
Properties of derivative matrix/vector
B.2.
B.3. $\partial (X^{-1}) = -X^{-1} (\partial X) X^{-1}$
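The two properties that survive in this list, Theorem B.1 and property B.3, can be checked numerically. The sketch below is only an illustration on random matrices: the dimensions, the seed, the step size `h`, and the positive definite construction of `X0` are arbitrary choices, and the derivative in B.3 is approximated with a central finite difference along a direction `D`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Theorem B.1: tr(AB) = tr(BA) for A (p x q) and B (q x p).
p, q = 3, 5
A = rng.normal(size=(p, q))
B = rng.normal(size=(q, p))
trace_gap = abs(np.trace(A @ B) - np.trace(B @ A))

# Property B.3: d(X^{-1}) = -X^{-1} (dX) X^{-1}, checked along X(t) = X0 + t * D.
M = rng.normal(size=(4, 4))
X0 = M @ M.T + np.eye(4)          # symmetric positive definite, hence invertible
D = rng.normal(size=(4, 4))       # direction of the perturbation dX
h = 1e-6                          # finite-difference step

# Central difference approximation of the derivative of X^{-1} in direction D.
numeric = (np.linalg.inv(X0 + h * D) - np.linalg.inv(X0 - h * D)) / (2 * h)

# Analytic expression from B.3.
X0_inv = np.linalg.inv(X0)
analytic = -X0_inv @ D @ X0_inv

inverse_gap = np.max(np.abs(numeric - analytic))
print(trace_gap, inverse_gap)
```

Both gaps should be close to zero, up to floating-point and finite-difference error.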

#### Appendix A.3. Derivative of the Element of the Information Matrix for the SAR-LVs Model

• The first partial derivative of the log-likelihood function $L(\lambda, \beta, \Theta; l)$ to $\lambda$ based on the error distribution in Equation (12) where .
• The first partial derivative of $\Theta$ to $\lambda$
$\frac{\partial \Theta}{\partial \lambda} = -p^{-1} (W A' + A W')$
• The first partial derivative of $\ln|\Theta|$ to $\lambda$
Based on B.2:
$\frac{\partial \ln|\Theta|}{\partial \lambda} = \operatorname{Tr}\left( (p^{-1} A A')^{-1} \frac{\partial \Theta}{\partial \lambda} \right) = \operatorname{Tr}\left( p (A A')^{-1} \left( -p^{-1} (W A' + A W') \right) \right) = -2 \operatorname{Tr}(A^{-1} W)$
• The first partial derivative of $\ln|A|$ to $\lambda$
Based on B.2:
• The first partial derivative of to $\lambda$
Based on B.3:
• The first partial derivative of $\varepsilon' \Theta^{-1} \varepsilon$ to $\lambda$
Based on B.4:
$\frac{\partial (\varepsilon' \Theta^{-1} \varepsilon)}{\partial \lambda} = -2p (A^{-1} W A^{-1} K \beta)' (l - A^{-1} K \beta)$
The first partial derivative of the log-likelihood function $L(\lambda, \beta, \Theta; l)$ to $\lambda$ based on the error distribution in Equation (12), points b, c, and d
$\frac{\partial L(\lambda, \beta, \Theta; l)}{\partial \lambda} = p (A^{-1} W A^{-1} K \beta)' (l - A^{-1} K \beta)$
• The first partial derivative of the log-likelihood function $L(\lambda, \beta, \Theta; l)$ to $\beta$ based on the error distribution in Equation (12)
Based on B.5:
$\frac{\partial L(\lambda, \beta, \Theta; l)}{\partial \beta} = p (A^{-1} K)' (l - A^{-1} K \beta)$
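• The second partial derivative $\frac{\partial^{2} L(\lambda, \beta, \Theta; l)}{\partial \lambda^{2}}$
Based on B.3:
$\frac{\partial^{2} L(\lambda, \beta, \Theta; l)}{\partial \lambda^{2}} = p \left[ 2 (K \beta)' \left( (A^{-1} W A^{-1}) A^{-1} (A^{-1} W A^{-1}) \right)' (l - A^{-1} K \beta) - (A^{-1} W A^{-1} K \beta)' (A^{-1} W A^{-1}) (K \beta) \right]$
• The second partial derivative $\frac{\partial^{2} L(\lambda, \beta, \Theta; l)}{\partial \beta \, \partial \beta'}$
$\frac{\partial^{2} L(\lambda, \beta, \Theta; l)}{\partial \beta \, \partial \beta'} = -p (A^{-1} K)' (A^{-1} K)$
• The second partial derivative $\frac{\partial^{2} L(\lambda, \beta, \Theta; l)}{\partial \beta \, \partial \lambda}$
$\frac{\partial^{2} L(\lambda, \beta, \Theta; l)}{\partial \beta \, \partial \lambda} = p \left[ (l - A^{-1} K \beta)' (A^{-1} W A^{-1} K) - (A^{-1} W A^{-1} K \beta)' (A^{-1} K) \right]$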
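• The elements of the information matrix
• The element (1,1), namely $\tilde{\Psi}_{\lambda\lambda}$
$\tilde{\Psi}_{\lambda\lambda} = p \left[ (A^{-1} W A^{-1} K \beta)' (A^{-1} W A^{-1}) (K \beta) - 2 (K \beta)' \left( (A^{-1} W A^{-1}) A^{-1} (A^{-1} W A^{-1}) \right)' (e_{\eta t} - A^{-1} K \beta) \right]$
if $\lambda = 0$ then
$\tilde{\Psi}_{\lambda\lambda} = p \left[ (W K \beta)' (W K \beta) - 2 (W K \beta)' W' (e_{\eta t} - K \beta) \right]$
• The element (2,2), namely $\tilde{\Psi}_{\beta\beta}$
if $\lambda = 0$ then
• The element (1,2), namely $\tilde{\Psi}_{\lambda\beta}$
$\tilde{\Psi}_{\lambda\beta} = E\left( -\frac{\partial^{2} L(\lambda, \beta, \Theta; l)}{\partial \lambda \, \partial \beta'} \right)$
$\tilde{\Psi}_{\lambda\beta} = p \left[ (A^{-1} W A^{-1} K \beta)' (A^{-1} K) - (e_{\eta t} - A^{-1} K \beta)' (A^{-1} W A^{-1} K) \right]$
if $\lambda = 0$ then $\tilde{\Psi}_{\lambda\beta} = p \left[ (W K \beta)' K - (e_{\eta t} - K \beta)' (W K) \right]$
• The element (2,1), namely
if $\lambda = 0$ then
• The information matrix
if $\lambda = 0$ then the information matrix is $\tilde{\Psi}_{\theta} = \begin{bmatrix} \tilde{\Psi}_{\lambda\lambda} & \tilde{\Psi}_{\lambda\beta} \\ \tilde{\Psi}_{\beta\lambda} & \tilde{\Psi}_{\beta\beta} \end{bmatrix}$
• Inverse of the information matrix when $\lambda = 0$
If the partitioned matrix is $C = \begin{bmatrix} C_{1} & C_{2} \\ C_{3} & C_{4} \end{bmatrix}$ then the inverse matrix is $C^{-1} = \begin{bmatrix} C_{E1} & C_{E2} \\ C_{E3} & C_{E4} \end{bmatrix}$ where
$C_{E1} = (C_{1} - C_{2} C_{4}^{-1} C_{3})^{-1}$; $C_{E2} = -C_{E1} C_{2} C_{4}^{-1}$; $C_{E3} = -C_{4}^{-1} C_{3} C_{E1}$; and $C_{E4} = C_{4}^{-1} - C_{4}^{-1} C_{3} C_{E2}$
The element (1,1) of the inverse of the information matrix is $\tilde{\Psi}_{\lambda\lambda}^{-1} = \left( \tilde{\Psi}_{\lambda\lambda} - \tilde{\Psi}_{\lambda\beta} (\tilde{\Psi}_{\beta\beta})^{-1} \tilde{\Psi}_{\beta\lambda} \right)^{-1}$
$\tilde{\Psi}_{\lambda\lambda}^{-1} = ( p (W K \beta)' (W K \beta) - 2 (W K \beta)' W' (e_{\eta t} - K \beta) - p ( (W K \beta)' (W K \beta) - (e_{\eta t} - K \beta)'$
$\tilde{\Psi}_{\lambda\lambda}^{-1} = p^{-1} \left( -(e_{\eta t} - K \beta)' W W' (e_{\eta t} - K \beta) \right)^{-1}$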
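Two steps in this appendix lend themselves to a quick numerical check: the derivative $\partial \ln|\Theta| / \partial \lambda = -2\operatorname{Tr}(A^{-1}W)$ and the partitioned-inverse formulas for $C^{-1}$. The sketch below is illustrative only. It assumes $A = I - \lambda W$ and $\Theta = p^{-1} A A'$ (the form implied by the derivative of $\Theta$ given above), and the matrix sizes, the value of `p`, the random weights, and the seed are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 5, 3.0                          # hypothetical matrix size and scalar p

W = rng.random((n, n))
np.fill_diagonal(W, 0.0)               # no region is its own neighbour
W = W / W.sum(axis=1, keepdims=True)   # row-standardised weights (assumption)


def log_det_theta(lam):
    """ln|Theta| with Theta = p^{-1} A A' and the assumed A = I - lam * W."""
    A = np.eye(n) - lam * W
    theta = A @ A.T / p
    _, logdet = np.linalg.slogdet(theta)
    return logdet


# Check 1: central finite difference against -2 Tr(A^{-1} W).
lam, h = 0.2, 1e-6
numeric = (log_det_theta(lam + h) - log_det_theta(lam - h)) / (2 * h)
A = np.eye(n) - lam * W
analytic = -2.0 * np.trace(np.linalg.solve(A, W))   # solve(A, W) equals A^{-1} W
logdet_gap = abs(numeric - analytic)

# Check 2: partitioned inverse with a 1 x 1 leading block, mirroring a scalar
# lambda and a vector beta. C is positive definite so every block inverse exists.
M = rng.normal(size=(4, 4))
C = M @ M.T + np.eye(4)
C1, C2, C3, C4 = C[:1, :1], C[:1, 1:], C[1:, :1], C[1:, 1:]
C4_inv = np.linalg.inv(C4)
CE1 = np.linalg.inv(C1 - C2 @ C4_inv @ C3)
CE2 = -CE1 @ C2 @ C4_inv
CE3 = -C4_inv @ C3 @ CE1
CE4 = C4_inv - C4_inv @ C3 @ CE2
C_inv = np.block([[CE1, CE2], [CE3, CE4]])
block_gap = np.max(np.abs(C_inv @ C - np.eye(4)))

print(logdet_gap, block_gap)
```

Both gaps should be near zero. The block `CE1` is the quantity that plays the role of the (1,1) element of the inverse information matrix in the LM statistic.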

## References

1. Bollen, K.A. Structural Equations with Latent Variables; John Wiley & Sons: New York, NY, USA, 1989.
2. Civelek, M.E. Essentials of Structural Equation Modeling; The University of Nebraska: Lincoln, NE, USA, 2018; ISBN 978-1-60962-129-2.
3. Wang, F.; Wall, M.M. Generalized common spatial factor model. Biostatistics 2003, 4, 569–582.
4. Christensen, W.F.; Amemiya, Y. Latent variable analysis of multivariate spatial data. J. Am. Stat. Assoc. 2002, 97, 302–317.
5. Hogan, J.W.; Tchernis, R. Bayesian Factor Analysis for Spatially Correlated Data, With Application to Summarizing Area-Level Material Deprivation from Census Data. J. Am. Stat. Assoc. 2004, 99, 314–324.
6. Liu, X.; Wall, M.M.; Hodges, J.S. Generalized spatial structural equation models. Biostatistics 2005, 6, 539–557.
7. Congdon, P. A spatial structural equation model for health outcomes. J. Stat. Plan. Inference 2008, 138, 2090–2105.
8. Oud, J.H.L.; Folmer, H. A Structural Equation Approach to Models with Spatial Dependence. Geogr. Anal. 2008, 40, 152–166.
9. Anselin, L. Spatial Econometrics: Methods and Models; Kluwer Academic Publisher: Dordrecht, The Netherlands, 1988; Volume 4, ISBN 978-90-481-8311-1.
10. Joreskog, K.G.; Sorbom, D. Lisrel 8: User’s Reference Guide; Scientific Software: Chicago, IL, USA, 1997.
11. Anekawati, A.; Widjanarko Otok, B. Modelling of the education quality of a high schools in Sumenep Regency using spatial structural equation modelling. J. Phys. Conf. Ser. 2017, 890, 012094.
12. Trujillo, G.S. Pathmox Approach: Segmentation Trees in Partial Least Squares Path Modeling; Universitat Politecnica de Catalunya: Barcelona, Spain, 2009.
13. Anekawati, A.; Otok, B.W.; Sutikno, P. Generalized method of moments approach to spatial structural equation modeling. FJMS 2018, 103, 1057–1076.
14. Kelejian, H.H.; Prucha, I.R. A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances. J. Real Estate Financ. Econ. 1998, 17, 99–121.
15. Kelejian, H.H.; Prucha, I.R. A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model. Int. Econ. Rev. 1999, 40, 509–533.
16. Jeong, S.; Yoon, D. Examining Vulnerability Factors to Natural Disasters with a Spatial Autoregressive Model: The Case of South Korea. Sustainability 2018, 10, 1651.
17. Breusch, T.S.; Pagan, A.R. The Lagrange Multiplier Test and its Applications to Model Specification in Econometrics. Rev. Econ. Stud. 1980, 47, 239–253.
18. Anselin, L. Lagrange Multiplier Test Diagnostics for Spatial Dependence and Spatial Heterogeneity. Geogr. Anal. 1988, 20, 1–17.
19. Yang, Z. LM tests of spatial dependence based on bootstrap critical values. J. Econom. 2015, 185, 33–59.
20. LeSage, J.P. The Theory and Practice of Spatial Econometrics, 1st ed.; The University of Toledo: Toledo, OH, USA, 1999.
21. Gupta, A.K.; Nagar, D.K. Matrix Variate Distributions; Monographs and Surveys in Pure and Applied Mathematics; Chapman and Hall/CRC: New York, NY, USA, 2000; ISBN 1-58488-046-5.
22. Otok, B.W.; Standsyah, R.E.; Suharsono, A.; Purhadi. Development of Model Poverty in Java Using Meta-Analysis Structural Equation Modeling (MASEM). In Proceedings of the 2nd International Conference on Science, Mathematics, Environment, and Education, Surakarta, Indonesia, 26–28 July 2019; AIP Conference Proceedings Volume 2194. p. 020078.
23. BPS. Sumenep in Figures 2018; BPS-Statistics of Sumenep Regency: Sumenep, Indonesia, 2018; ISBN 0215.2193.
24. BPS. Data dan Informasi Kemiskinan Kabupaten/Kota tahun 2019; BPS-Statistics Indonesia: Jakarta, Indonesia, 2019.
Figure 1. The education quality model.
Figure 2. Distribution of public schools in Sumenep Regency.
Figure 3. Model of poverty.
Figure 4. A map of the distribution of poor people in the East Java province (in percent).
Table 1. The estimation result of parameter and spatial autoregressive coefficient for the education model.

| Variable | Coefficient |
|---|---|
| School Infrastructure (b1) | 2.3121 |
| Socioeconomic condition (b2) | 0.1286 |
| Constant (b0) | 9.6604 |
| Spatial Autoregressive Coefficient (λ) | −0.002 |

Table 2. Number of students and public schools for the neighboring of Gapura district.

| District | Number of Public Senior High Schools | Number of Junior High School Graduate Students | Number of New Senior High School Students |
|---|---|---|---|
| Kalianget | 1 | 488 | 744 |
| Kota Sumenep | 3 | 1582 | 2041 |
| Manding | 0 | 201 | 93 |
| Batuputih | 0 | 226 | 143 |
| Gapura | 1 | 485 | 622 |
| Batang Batang | 0 | 552 | 360 |
| Dungkek | 0 | 409 | 156 |

Table 3. The estimation result of parameter and spatial autoregressive coefficient for the poverty model.

| Variable | Coefficient |
|---|---|
| Economy (b1) | 0.0742 |
| Human Resource (b2) | −0.0722 |
| Health (b3) | 0.0155 |
| Constant (b0) | 7.0881 |
| Spatial Autoregressive Coefficient (λ) | 0.2345 |

## Share and Cite

MDPI and ACS Style

Anekawati, A.; Otok, B.W.; Purhadi, P.; Sutikno, S. Lagrange Multiplier Test for Spatial Autoregressive Model with Latent Variables. Symmetry 2020, 12, 1375. https://doi.org/10.3390/sym12081375
