Ridge-Type Pretest and Shrinkage Estimation Strategies in Spatial Error Models with an Application to a Real Data Example

Al-Momani, Marwan; Arashi, Mohammad

doi:10.3390/math12030390


by Marwan Al-Momani ¹,* and Mohammad Arashi ²,³

¹ Department of Mathematics, College of Sciences, University of Sharjah, Sharjah P.O. Box 27272, United Arab Emirates

² Department of Statistics, Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad 9177948974, Iran

³ Department of Statistics, Faculty of Natural and Agricultural Sciences, University of Pretoria, Pretoria 0028, South Africa

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(3), 390; https://doi.org/10.3390/math12030390

Submission received: 6 December 2023 / Revised: 16 January 2024 / Accepted: 23 January 2024 / Published: 25 January 2024

(This article belongs to the Special Issue Bayesian Inference, Prediction and Model Selection)


Abstract

Spatial regression models are widely used across several disciplines, such as functional magnetic resonance imaging analysis, econometrics, and house price analysis. In nature, sparsity occurs when a limited number of factors strongly impact overall variation. Sparse covariance structures are common in spatial regression models. The spatial error model is an important spatial regression model that focuses on the geographical dependence present in the error terms rather than in the response variable. This study proposes an effective approach using pretest and shrinkage ridge estimators for estimating the vector of regression coefficients in the spatial error model, accounting for insignificant coefficients and multicollinearity among regressors. The study compares the performance of the proposed estimators with that of the maximum likelihood estimator and assesses their efficacy using real-world data and bootstrapping techniques.

Keywords:

spatial error model; asymptotic performance; bootstrapping; pretest; ridge estimator; shrinkage

MSC:

62J07; 62H12; 91B72

1. Introduction

Data collected at nearby locations within a geographic region tend to be more similar than data collected far apart. This phenomenon can be modeled using a covariance structure in conventional statistical models. Spatial regression models, incorporating various spatial dependencies, are increasingly being used in geology, epidemiology, disease monitoring, urban planning, and econometrics.

Time series autoregressive models represent data at time t as linear combinations of the most recent observations. Similarly, in the spatial framework, these models represent data at a specific location as a function of the data at neighboring locations. Data are typically collected from a geographical location known as a site, and proximity is defined by a distance metric.

One of the most widely used autoregressive models is the spatial error (SE) model, in which a linear regression with a spatially lagged autoregressive error component is used to model the mean of the spatial response variable. Ref. [1] investigated quantile regression estimation for the SE model with potentially variable coefficients and established the asymptotic properties of the proposed estimators. Ref. [2] applied the SE model to examine the existence of spatial clustering and correlation between neighboring counties using data from Egypt’s 2006 census. Ref. [3] used the SE model to evaluate the social disorganization theory. Ref. [4] applied the SE model together with the spatial lag model to cross-sectional data from 20 districts in Chengdu and found that haze had a negative impact on both the selling and rental prices of houses. Ref. [5] proposed a robust estimation method based on SE models, demonstrating reduced bias, a more stable empirical influence function, and robustness to outliers through simulation. More information about spatial autoregressive models can be found in [6,7,8], among others.

In frequentist statistics, sample information is used to establish inferences about unknown parameters, while Bayesian statistics combines sample information with uncertain prior information (UPI) to draw conclusions. Subjective UPI may not always be available. However, model selection procedures such as Akaike’s information criterion (AIC), the Bayesian information criterion (BIC), or other model selection techniques can still provide UPI.

An initial trial to estimate the regression parameters using sample information and UPI is called a pretest estimation. The pretest estimator selects significant regression coefficients and chooses between the full model estimator or the revised model estimator, which contains fewer coefficients based on a binary weight. A new modification of the pretest estimator, known as the shrinkage estimator, uses smooth weights between full and sub-model estimators to adjust regression coefficient estimates toward a target value impacted by UPI. Nevertheless, the modified shrinkage estimator suffers sometimes from an over-shrinkage phenomenon. Later, an improved version of this estimator, known as positive shrinkage, controls the over-shrinkage issue.

The concept of using pretest and shrinkage estimating methodologies has received considerable attention from many researchers; for example, ref. [9] introduced an efficient estimation using pretest and shrinkage methods to estimate the regression coefficient vector of the marginal model in the case of multinomial responses. Ref. [10] developed different shrinkage and penalty estimation strategies for the negative binomial regression model when over-fitting and uncertain information exists about the subspace. Ref. [11] proposed shrinkage estimation for the parameter vector of the linear regression model with heteroscedastic errors, and extended their study to the high-dimensional heteroscedastic regression model.

Multicollinearity is a major issue when fitting a multiple linear regression model using the ordinary least squares (OLS) method, which arises when some regressor variables are correlated, especially when the correlation between any two is high. There are several techniques discussed in the literature to reduce the risk of this issue. Ref. [12] introduced the concept of ridge regression as a solution to nonorthogonal problems. They showed that the estimator improves the mean square error of estimation. Ref. [13] introduced a new biased estimator and demonstrated, both theoretically and numerically, the improvements of the new one. Ref. [14] proposed a new version of the Liu estimator for the vector of parameters in a linear regression model based on some prior information.

Using the idea of shrinkage, ref. [15] introduced an improved form of the Liu-type estimator. Analytical and numerical results were used to demonstrate the proposed method’s superiority. Ref. [16] suggested the use of the ridge estimator as a suitable approach for handling high-dimensional multicollinearity data. Recently, ref. [17] proposed a novel pretest and shrinkage estimate technique, known as the Liu-type approach, developed for the conditional autoregressive model.

In this article, we propose the ridge-type pretest and shrinkage estimation strategy for the $p \times 1$ regression coefficients vector in the SE model when some prior information is available about the irrelevant coefficients. We will partition the vector $β$ as ${(β_{1}^{T}, β_{2}^{T})}^{T}$, where $β_{1}$ is a $p_{1} \times 1$ vector that contains the coefficients of the main effect, and $β_{2}$ is a $p_{2} \times 1$ vector of irrelevant coefficients, with $p_{1} + p_{2} = p$. Mainly, we focus on estimating the vector $β_{1}$ when the UPI indicates that $β_{2}$ is ineffective, which can be achieved by testing a statistical hypothesis of the form $H_{0} : β_{2} = 0$. In some instances, the estimator of the full model may exhibit considerable variability and be difficult to interpret. Conversely, the estimator of the sub-model may yield a significantly biased and under-fitted estimate. To tackle this matter, we consider the pretest, shrinkage, and positive shrinkage ridge estimators for the vector $β_{1}$.

In accordance with our goal, this paper is organized as follows. Section 2 offers an overview of the SE model. A discussion about the maximum likelihood estimators for the parameters of the SE model is presented in Section 3. In Section 4, we propose the pretest and shrinkage ridge estimators. Asymptotic analysis of the proposed estimators and some theoretical results are presented in Section 5. The estimators are compared numerically using simulated and real data examples in Section 6. Some concluding remarks are presented in Section 7. An appendix containing some proofs is presented at the end of this manuscript.

2. Spatial Error Model

Let $s = \{s_{1}, s_{2}, \dots, s_{n}\}$ represent a set of $n$ spatial sites (frequently known as locations, regions, etc.). The set $s$ forms what is commonly referred to as a lattice, and the set of nearby sites for $s_{j}$, denoted by $K (s_{j})$, is defined as $K (s_{j}) = \{s_{i} : s_{i} \text{ is a neighbor of } s_{j}\}$, $i = 1, 2, \dots, n$. A neighborhood structure can be determined using a predefined adjacency metric. In regular lattices, two sites that share an edge are rook-based neighbors; two sites that share an edge or a corner are queen-based neighbors.

Let $Y_{n} (s) = {(Y (s_{1}), Y (s_{2}), \dots, Y (s_{n}))}^{T}$ be a vector of observations collected at sites $\{s_{1}, s_{2}, \dots, s_{n}\}$, $X (s) = {(X (s_{1}), X (s_{2}), \dots, X (s_{n}))}^{T}$ be the $(n \times p)$ matrix of covariates, and $β = {(β_{1}, β_{2}, \dots, β_{p})}^{T}$ be the $(p \times 1)$ vector of unknown regression parameters, known as the large-scale effect. Following Cressie and Wikle [7], the SE model describes the response $Y$ at the $j$th site $s_{j}$ as follows:

$$Y (s_{j}) = X^{T} (s_{j}) β + ϵ (s_{j}), \quad j = 1, 2, \dots, n, \tag{1}$$

$$\text{with} \quad ϵ (s_{j}) = \sum_{i \neq j}^{n} λ_{j i} ϵ (s_{i}) + e (s_{j}), \quad j = 1, 2, \dots, n, \tag{2}$$

where $e (s) = {(e (s_{1}), e (s_{2}), \dots, e (s_{n}))}^{T}$ is the noise vector, which has a Gaussian distribution with mean $0$ and covariance matrix $Ω = \mathrm{diag} \{σ_{j}^{2}, j = 1, 2, \dots, n\}$. The parameters $λ_{j i}$ model the spatial dependencies among the errors $ϵ (s_{j}), j = 1, 2, \dots, n$, with $λ_{j j} = 0$. Let $Λ = \{λ_{j i}\}_{j, i = 1}^{n}$, and assume that $(I - Λ)$ is invertible, where $I$ is the $(n \times n)$ identity matrix. Then, suppressing the spatial indices, the SE model in (1) and (2) can be rewritten in matrix form as follows:

$$Y = X β + ϵ \quad \text{with} \quad ϵ \sim N (0, {(I - Λ)}^{- 1} Ω {(I - Λ^{T})}^{- 1}). \tag{3}$$

Nature exhibits sparsity in many situations, which means that a small number of factors can account for the majority of the observed variability. In the context of regression analysis, sparsity means that only a few of the coefficients are significantly different from zero, while the bulk of the coefficients are insignificant and can be set to zero. Sparsity is frequently used in spatial regression models to imply covariance structures that are easier to compute. Consequently, a straightforward and frequently used version of the preceding model is obtained by setting $Ω = σ^{2} I$ and $Λ = ρ W$, where $σ^{2}$ is the variance component, $ρ$ is the spatial dependence parameter, and $W$ is the known weight (proximity) matrix with a main diagonal of zeros and off-diagonal entries $w_{j i} = 1$ if location $j$ is a neighbor of location $i$, and $w_{j i} = 0$ otherwise, for $j \neq i$. Usually, the weight matrix is row-normalized as $W^{*} = \{w_{j i} / w_{j +}\}_{j, i = 1}^{n}$, where $w_{j +} = \sum_{i} w_{j i}$ is the $j$th row sum of $W$. So, the SE regression model can be rewritten as follows:

$$Y = X β + ϵ, \quad \text{where} \quad ϵ \sim N (0, σ^{2} V_{n}) \tag{4}$$

$$\text{and} \quad V_{n} = {(I - ρ W^{*})}^{- 1} {(I - ρ W^{* T})}^{- 1}. \tag{5}$$
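As an illustration of (4) and (5), the following sketch builds a queen-contiguity weight matrix on a regular $N \times N$ lattice, row-normalizes it, and forms $V_{n}$. The function names `queen_weights`, `row_normalize`, and `se_covariance` are hypothetical, and the row-major ordering of the sites is an assumption made only for this example.

```python
import numpy as np

def queen_weights(N):
    """Binary queen-contiguity matrix W for an N x N lattice (row-major sites)."""
    n = N * N
    W = np.zeros((n, n))
    for r in range(N):
        for c in range(N):
            j = r * N + c
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Neighbors share an edge or a corner; a site is not its own neighbor.
                    if (dr, dc) != (0, 0) and 0 <= rr < N and 0 <= cc < N:
                        W[j, rr * N + cc] = 1.0
    return W

def row_normalize(W):
    """W* with entries w_ji / w_j+, so every row sums to one."""
    return W / W.sum(axis=1, keepdims=True)

def se_covariance(W_star, rho):
    """V_n = (I - rho W*)^{-1} (I - rho W*^T)^{-1}, as in (5)."""
    n = W_star.shape[0]
    B_inv = np.linalg.inv(np.eye(n) - rho * W_star)
    return B_inv @ B_inv.T

W = queen_weights(7)          # n = 49 sites
W_star = row_normalize(W)
V = se_covariance(W_star, 0.6)
```

On this lattice, corner sites have three queen neighbors and interior sites have eight, which is why row normalization is needed to put all sites on a comparable scale.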

3. Maximum Likelihood Estimation

Let $θ = (β, σ^{2}, ρ)$; the maximum likelihood estimator (MLE) of $θ$ may be obtained using a two-step profile-likelihood method; see [6]. At first, we fix $ρ$ and find the MLEs of $β$ and $σ^{2}$ as functions of $ρ$, which are given as follows:

$$\hat{β} (ρ) = {(X^{T} V_{n}^{- 1} X)}^{- 1} X^{T} V_{n}^{- 1} Y, \tag{6}$$

$${\hat{σ}}^{2} (ρ) = \frac{{(Y - X \hat{β} (ρ))}^{T} V_{n}^{- 1} (Y - X \hat{β} (ρ))}{n}. \tag{7}$$
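For a fixed $ρ$, Equations (6) and (7) amount to a generalized least squares fit. The sketch below computes both quantities and also evaluates the Gaussian log-likelihood profiled over $β$ and $σ^{2}$ (up to an additive constant), so that $ρ$ can be chosen by a simple grid search. The grid search, the toy ring-shaped neighborhood, and the names `profile_fit` and `fit_rho_grid` are assumptions made for illustration; they are not necessarily the optimizer or the lattice used by the authors.

```python
import numpy as np

def profile_fit(Y, X, W_star, rho):
    """Return beta_hat(rho), sigma2_hat(rho), and the profile log-likelihood."""
    n = len(Y)
    B = np.eye(n) - rho * W_star          # V_n^{-1} = B^T B
    Xt, Yt = B @ X, B @ Y                 # spatially filtered data
    beta = np.linalg.solve(Xt.T @ Xt, Xt.T @ Yt)      # Equation (6)
    resid = Yt - Xt @ beta
    sigma2 = resid @ resid / n                        # Equation (7)
    _, logabsdet_B = np.linalg.slogdet(B)
    # -(1/2) log|V_n| equals log|det B|; constants are dropped.
    loglik = -0.5 * n * np.log(sigma2) + logabsdet_B
    return beta, sigma2, loglik

def fit_rho_grid(Y, X, W_star, grid=np.linspace(-0.95, 0.95, 39)):
    """Pick rho on a grid by maximizing the profile log-likelihood."""
    fits = [(rho,) + profile_fit(Y, X, W_star, rho) for rho in grid]
    return max(fits, key=lambda f: f[3])  # (rho, beta, sigma2, loglik)

# Toy data: each site has two neighbors on a ring (illustrative only).
rng = np.random.default_rng(0)
n, p = 60, 3
W = np.zeros((n, n))
for j in range(n):
    W[j, (j - 1) % n] = W[j, (j + 1) % n] = 1.0
W_star = W / W.sum(axis=1, keepdims=True)
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
eps = np.linalg.solve(np.eye(n) - 0.5 * W_star, rng.normal(size=n))
Y = X @ beta_true + eps

rho_hat, beta_hat, sigma2_hat, loglik_hat = fit_rho_grid(Y, X, W_star)
```

Working with the filtered data $(I - ρ W^{*}) X$ and $(I - ρ W^{*}) Y$ avoids forming $V_{n}^{- 1}$ explicitly; at $ρ = 0$ the fit reduces to ordinary least squares.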

Then, we plug $\hat{β}$ and ${\hat{σ}}^{2}$ into the log-likelihood and obtain the MLE of $ρ$ by maximizing the profile log-likelihood function. Finally, the MLEs of $β$ and $σ^{2}$ are computed by replacing $ρ$ with $\hat{ρ}$ in Equations (6) and (7), respectively. Ref. [18] proved that $\hat{θ}$ is a consistent estimator of $θ$ and is asymptotically normally distributed. This finding makes it simple to demonstrate that $\hat{β}$ is consistent and asymptotically normal.

In various situations, the significance of regression coefficients can be determined subjectively or through certain model selection techniques. Based on this information, the $(p \times 1)$ regression coefficient vector $β$ is divided into two sub-vectors, as $β = {(β_{1}^{T}, β_{2}^{T})}^{T}$, where $β_{1}$ is a $p_{1} \times 1$ vector of important coefficients and $β_{2}$ is a $p_{2} \times 1$ vector of unimportant coefficients with $p_{1} + p_{2} = p$. Similarly, the matrix of covariates $X$ is partitioned as $X = (X_{1}, X_{2})$, where $X_{1}$ and $X_{2}$ consist of the first $p_{1}$ and the last $p_{2}$ columns of the design matrix $X$, of dimensions $n \times p_{1}$ and $n \times p_{2}$, respectively. Consequently, the SE full model in (3) can be rewritten as follows:

$$Y = X_{1} β_{1} + X_{2} β_{2} + ϵ. \tag{8}$$

For the full model in (8), we can obtain the MLEs of $(β_{1}, β_{2})$ using the same technique employed in model (4); see [19]. The MLEs are as follows:

$${\hat{β}}_{1} = {(X_{1}^{T} A_{X_{2}} X_{1})}^{- 1} X_{1}^{T} A_{X_{2}} Y, \tag{9}$$

where

$$A_{X_{2}} = {\hat{V}}_{n}^{- 1} - {\hat{V}}_{n}^{- 1} X_{2} {(X_{2}^{T} {\hat{V}}_{n}^{- 1} X_{2})}^{- 1} X_{2}^{T} {\hat{V}}_{n}^{- 1},$$

and ${\hat{β}}_{2}$ has an identical formula to ${\hat{β}}_{1}$, obtained by exchanging indices 1 and 2 in the above two equations. The full model estimator may be prone to significant variability and may be difficult to interpret. Our primary goal is to estimate $β_{1}$ when the set of regressors included in the partitioned matrix $X_{2}$ does not sufficiently explain the variability in the response variable, which can be examined by formulating a linear hypothesis as follows:

$$H_{0} : β_{2} = 0. \tag{10}$$

Assuming the null hypothesis in (10) is true, the model in (8) reduces to

$$Y = X_{1} β_{1} + ϵ. \tag{11}$$

We will refer to the model in (11) as the restricted SE model. Let ${\hat{β}}_{1}^{S}$ be the MLE of $β_{1}$ for the model in (11); then

$${\hat{β}}_{1}^{S} = {(X_{1}^{T} {\hat{V}}_{n}^{- 1} X_{1})}^{- 1} X_{1}^{T} {\hat{V}}_{n}^{- 1} Y. \tag{12}$$

Obviously, ${\hat{β}}_{1}^{S}$ will perform better than ${\hat{β}}_{1}$ if the null hypothesis in (10) is true, while the opposite occurs when $β_{2}$ begins to move away from the null space. However, the restricted strategy can produce an under-fitted and highly biased model. One goal of this research is to overcome the problem of significant bias in spatial error models when multicollinearity is present among the regressors. To address this issue, we propose ridge-type estimators for both the full and reduced models, and then improve these two estimators using the pretest and shrinkage estimation approaches, with the aim of reducing bias and improving the overall accuracy of the estimators.

4. Materials and Methods: Developing Pretest and Shrinkage Ridge Estimation Strategies

In this section, we propose a set of estimators for the SE model parameter vector $β_{1}$ in (11). Following [12], the ridge estimator of $β$ for the model given in (4) is defined as

$${\hat{β}}^{R F} = {(X^{T} {\hat{V}}_{n}^{- 1} X + k I_{p})}^{- 1} X^{T} {\hat{V}}_{n}^{- 1} Y, \tag{13}$$

where $k \geq 0$ is known as the ridge parameter. Clearly, when $k = 0$, the ridge estimator reduces to the MLE of $β$, whereas as $k ⟶ \infty$, the ridge estimator ${\hat{β}}^{R F}$ tends to $0$.
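The two limiting cases of (13) can be checked numerically. The sketch below is a minimal illustration on toy data with strongly correlated columns; the function name `ridge_gls` is hypothetical, and the identity matrix is used as a placeholder for the estimated ${\hat{V}}_{n}^{- 1}$ purely to keep the example short.

```python
import numpy as np

def ridge_gls(Y, X, V_inv, k):
    """Ridge estimator (13): (X' V^{-1} X + k I_p)^{-1} X' V^{-1} Y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ V_inv @ X + k * np.eye(p), X.T @ V_inv @ Y)

# Toy data: a shared factor z makes the columns of X strongly correlated.
rng = np.random.default_rng(3)
n, p = 50, 4
z = rng.normal(size=(n, 1))
X = 0.95 * z + 0.3 * rng.normal(size=(n, p))
Y = X @ np.ones(p) + rng.normal(size=n)
V_inv = np.eye(n)   # placeholder (assumption) for the estimated V_n^{-1}

# Euclidean norm of the estimate for increasing ridge parameters.
norms = [np.linalg.norm(ridge_gls(Y, X, V_inv, k)) for k in (0.0, 1.0, 10.0, 1e8)]
```

With $k = 0$ the function returns the generalized least squares solution, and the norm of the estimate decreases toward zero as $k$ grows.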

4.1. Full and Reduced Model Ridge Estimators

The unrestricted full model ridge estimator of $β_{1}$, denoted by ${\hat{β}}_{1}^{UR}$, is defined as follows:

$${\hat{β}}_{1}^{UR} = {(X_{1}^{T} A_{X_{2}} X_{1} + k_{f} I_{p_{1}})}^{- 1} X_{1}^{T} A_{X_{2}} Y, \tag{14}$$

where $k_{f}$ is the ridge parameter for the unrestricted full model estimator ${\hat{β}}_{1}^{UR}$. Assuming the null hypothesis in (10) is true, the restricted ridge estimator of $β_{1}$ for the model in (11), denoted by ${\hat{β}}_{1}^{RR}$, is given by

$${\hat{β}}_{1}^{RR} = {(X_{1}^{T} {\hat{V}}_{n}^{- 1} X_{1} + k_{r} I_{p_{1}})}^{- 1} X_{1}^{T} {\hat{V}}_{n}^{- 1} Y, \tag{15}$$

where $k_{r}$ is the ridge parameter for the restricted model estimator ${\hat{β}}_{1}^{RR}$. When the null hypothesis in (10) holds exactly or approximately (i.e., when $β_{2}$ is close to zero), ${\hat{β}}_{1}^{RR}$ is generally a more efficient estimator than ${\hat{β}}_{1}^{UR}$. Nevertheless, as $β_{2}$ deviates from the null space, ${\hat{β}}_{1}^{RR}$ becomes inefficient compared with the unrestricted estimator ${\hat{β}}_{1}^{UR}$. In addition to the gain obtained by applying the ridge estimation idea to the MLE of $β_{1}$, we also aim to find estimators that are functions of ${\hat{β}}_{1}^{UR}$ and ${\hat{β}}_{1}^{RR}$ and that reduce the risks associated with either of these two estimators over the majority of the parameter space. The pretest and shrinkage estimators, which are constructed in the following subsection, serve this purpose.
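Equations (14) and (15) can be sketched as follows. The helper names (`projection_complement`, `ridge_unrestricted`, `ridge_restricted`), the toy ring-shaped neighborhood, and the assumption that the estimated ${\hat{V}}_{n}^{- 1}$ is available as a symmetric matrix `V_inv` are all illustrative choices, not the authors' implementation.

```python
import numpy as np

def projection_complement(V_inv, Z):
    """A_Z = V^{-1} - V^{-1} Z (Z' V^{-1} Z)^{-1} Z' V^{-1}; V_inv must be symmetric."""
    VZ = V_inv @ Z
    return V_inv - VZ @ np.linalg.solve(Z.T @ VZ, VZ.T)

def ridge_unrestricted(Y, X1, X2, V_inv, k_f):
    """Full-model ridge estimator of beta_1, Equation (14)."""
    A = projection_complement(V_inv, X2)
    p1 = X1.shape[1]
    return np.linalg.solve(X1.T @ A @ X1 + k_f * np.eye(p1), X1.T @ A @ Y)

def ridge_restricted(Y, X1, V_inv, k_r):
    """Restricted ridge estimator of beta_1 under H0: beta_2 = 0, Equation (15)."""
    p1 = X1.shape[1]
    return np.linalg.solve(X1.T @ V_inv @ X1 + k_r * np.eye(p1), X1.T @ V_inv @ Y)

# Toy data on a ring neighborhood (illustrative only, not a lattice).
rng = np.random.default_rng(4)
n, p1, p2 = 60, 3, 4
X = rng.normal(size=(n, p1 + p2))
X1, X2 = X[:, :p1], X[:, p1:]
W = np.zeros((n, n))
for j in range(n):
    W[j, (j - 1) % n] = W[j, (j + 1) % n] = 0.5
B = np.eye(n) - 0.5 * W
V_inv = B.T @ B                      # stands in for the estimated V_n^{-1}
Y = X1 @ np.ones(p1) + np.linalg.solve(B, rng.normal(size=n))

b_ur = ridge_unrestricted(Y, X1, X2, V_inv, k_f=1.0)
b_rr = ridge_restricted(Y, X1, V_inv, k_r=1.0)
```

With $k_{f} = 0$, the unrestricted estimator coincides with the first $p_{1}$ components of the generalized least squares fit on the full design matrix, which gives a convenient check of the $A_{X_{2}}$ construction.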

4.2. Pretest, Shrinkage, and Positive Shrinkage Ridge Estimators

In line with testing the null hypothesis in (10), the pretest estimator selects the full model estimator ${\hat{β}}_{1}^{UR}$ if $H_{0}$ is rejected and the restricted ridge estimator ${\hat{β}}_{1}^{RR}$ otherwise. An appropriate test statistic for the hypothesis in (10) is

$$T_{n} = \frac{{({\hat{β}}_{2}^{UR})}^{T} (X_{2}^{T} A_{X_{1}} X_{2}) ({\hat{β}}_{2}^{UR})}{s^{2}},$$

where $A_{X_{1}}$ is defined analogously to $A_{X_{2}}$, ${\hat{β}}_{2}^{UR} = {(X_{2}^{T} A_{X_{1}} X_{2})}^{- 1} X_{2}^{T} A_{X_{1}} Y$, and $s^{2} = {(Y - X {\hat{β}}^{R F})}^{T} (Y - X {\hat{β}}^{R F}) / (n - p)$ is a consistent estimator of $σ^{2}$. Under the null hypothesis, the statistic $T_{n}$ asymptotically follows a chi-square distribution with $p_{2}$ degrees of freedom. Hence, the pretest estimator, denoted by ${\hat{β}}_{1}^{PTR}$, is given by

$${\hat{β}}_{1}^{PTR} = {\hat{β}}_{1}^{UR} - ({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) I (T_{n} \leq χ_{α, p_{2}}^{2}), \tag{16}$$

where $I (\cdot)$ is an indicator function and $χ_{α, p_{2}}^{2}$ is the upper $α$th quantile of the chi-square distribution with $p_{2}$ degrees of freedom. The pretest estimator depends on the level of significance $α$ and, based on binary weights, selects ${\hat{β}}_{1}^{UR}$ if the null hypothesis is rejected and ${\hat{β}}_{1}^{RR}$ otherwise. This drawback can be mitigated by using smoother weights for the two estimators ${\hat{β}}_{1}^{UR}$ and ${\hat{β}}_{1}^{RR}$, which gives the shrinkage estimator. It is denoted by ${\hat{β}}_{1}^{SR}$ and given by

$${\hat{β}}_{1}^{SR} = {\hat{β}}_{1}^{RR} + ({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) \{1 - (p_{2} - 2) T_{n}^{- 1}\}, \quad p_{2} \geq 3. \tag{17}$$

The shrinkage estimator may suffer from over-shrinkage, in which the weight $1 - (p_{2} - 2) T_{n}^{- 1}$ becomes negative, whenever $T_{n} < p_{2} - 2$. The positive shrinkage estimator, a modified version of ${\hat{β}}_{1}^{SR}$, resolves this issue. It is denoted by ${\hat{β}}_{1}^{PSR}$ and given by

$${\hat{β}}_{1}^{PSR} = {\hat{β}}_{1}^{RR} + ({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) {\{1 - (p_{2} - 2) T_{n}^{- 1}\}}^{+}, \tag{18}$$

where $x^{+} = \max (x, 0)$. It is easy to see that all of the above estimators satisfy the following general form:

$${\hat{β}}_{1}^{Shrinkage} = {\hat{β}}_{1}^{UR} - ({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) g (T_{n}). \tag{19}$$

Specifically, for ${\hat{β}}_{1}^{PTR}$, ${\hat{β}}_{1}^{SR}$, and ${\hat{β}}_{1}^{PSR}$, the corresponding $g (\cdot)$ functions are given by $I (T_{n} \leq χ_{α, p_{2}}^{2})$, $(p_{2} - 2) T_{n}^{- 1}$, and $1 - {\{1 - (p_{2} - 2) T_{n}^{- 1}\}}^{+} = \min \{1, (p_{2} - 2) T_{n}^{- 1}\}$, respectively.
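The general form (19) suggests a compact implementation in which the three estimators differ only in the weight $g (T_{n})$. The sketch below is a minimal illustration under that reading: the function names are hypothetical, the value of $s^{2}$ is assumed to be supplied by the caller, and the positive shrinkage weight is written as $\min \{1, (p_{2} - 2) T_{n}^{- 1}\}$, which is what (18) implies when it is rearranged into the form (19).

```python
import numpy as np
from scipy.stats import chi2

def t_n_statistic(Y, X1, X2, V_inv, s2):
    """T_n = b2' (X2' A_{X1} X2) b2 / s2, with b2 the full-model estimate of beta_2."""
    VX1 = V_inv @ X1
    A1 = V_inv - VX1 @ np.linalg.solve(X1.T @ VX1, VX1.T)   # A_{X_1}
    Q = X2.T @ A1 @ X2
    b2 = np.linalg.solve(Q, X2.T @ A1 @ Y)
    return float(b2 @ Q @ b2 / s2)

def g_pretest(T, p2, alpha=0.05):
    """Binary weight: 1 if H0 is not rejected (upper-alpha chi-square quantile)."""
    return float(T <= chi2.ppf(1 - alpha, p2))

def g_shrinkage(T, p2):
    return (p2 - 2) / T

def g_positive(T, p2):
    return min(1.0, (p2 - 2) / T)

def combine(b_ur, b_rr, g):
    """General form (19): b_ur - (b_ur - b_rr) * g(T_n)."""
    return b_ur - (b_ur - b_rr) * g

# Hypothetical inputs: two candidate estimates and two values of T_n.
b_ur = np.array([1.2, 0.8, 1.1])
b_rr = np.array([1.0, 1.0, 1.0])
p2 = 10
for T in (20.0, 4.0):
    ptr = combine(b_ur, b_rr, g_pretest(T, p2))
    sr = combine(b_ur, b_rr, g_shrinkage(T, p2))
    psr = combine(b_ur, b_rr, g_positive(T, p2))

# Small example of the statistic itself on simulated data (V_inv = I as a placeholder).
rng = np.random.default_rng(5)
n, p1 = 40, 3
X1, X2 = rng.normal(size=(n, p1)), rng.normal(size=(n, p2))
Y = X1 @ np.ones(p1) + rng.normal(size=n)
T_n = t_n_statistic(Y, X1, X2, np.eye(n), s2=1.0)
```

For $T_{n} = 4 < p_{2} - 2 = 8$, the shrinkage weight exceeds one and the estimate overshoots ${\hat{β}}_{1}^{RR}$, whereas the positive shrinkage estimator returns ${\hat{β}}_{1}^{RR}$ exactly.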

5. Asymptotic Analysis

In this section, we study the asymptotic performance of all estimators based on their asymptotic quadratic risks. Our goal is to investigate the behavior of the estimators near the null space, so we consider a sequence of local alternatives given by

$$H_{(n)} : β_{2 (n)} = \frac{ξ}{\sqrt{n}}, \quad ξ \in ℜ^{p_{2}}, \text{ with } ξ \neq 0. \tag{20}$$

In the limiting case $ξ = 0$, the local alternatives in (20) reduce to the null hypothesis given in (10). Let $K (x)$ denote the asymptotic cumulative distribution function of any estimator of $β_{1}$, say ${\hat{β}}_{1}^{*}$, after centering and scaling; that is, $K (x) = \lim_{n ⟶ \infty} P_{H_{(n)}} (\sqrt{n} ({\hat{β}}_{1}^{*} - β_{1}) \leq x)$. Then, for any $(p_{1} \times p_{1})$ positive definite matrix $M$, the weighted quadratic loss function is defined as

$$W ({\hat{β}}_{1}^{*}, β_{1}) = n {({\hat{β}}_{1}^{*} - β_{1})}^{T} M ({\hat{β}}_{1}^{*} - β_{1}) = \mathrm{tr} [M \{n ({\hat{β}}_{1}^{*} - β_{1}) {({\hat{β}}_{1}^{*} - β_{1})}^{T}\}],$$

where $\mathrm{tr} (A)$ is the trace of the matrix $A$. Define $ϑ_{n}^{*} = \sqrt{n} ({\hat{β}}_{1}^{*} - β_{1})$. If $ϑ_{n}^{*} \overset{D}{⟶} ϑ^{*}$, where $\overset{D}{⟶}$ denotes convergence in distribution, then the asymptotic (distributional) quadratic risk (ADQR) of ${\hat{β}}_{1}^{*}$, denoted by $Γ ({\hat{β}}_{1}^{*}, M)$, is given by

$$Γ ({\hat{β}}_{1}^{*}, M) = E (ϑ^{* T} M ϑ^{*}) = \int (x_{1}^{T} M x_{1}) \, d K (x_{1}). \tag{21}$$

The asymptotic (distributional) bias (ADB) of ${\hat{β}}_{1}^{*}$ is given by

$$ADB ({\hat{β}}_{1}^{*}) = E (\lim_{n \to \infty} \sqrt{n} ({\hat{β}}_{1}^{*} - β_{1})). \tag{22}$$
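When the limiting variable is multivariate normal with mean $μ$ and covariance $Σ$, the expectation in (21) has the closed form $μ^{T} M μ + \mathrm{tr} (M Σ)$, a standard result for quadratic forms. The sketch below checks this numerically by Monte Carlo and also verifies the trace form of the quadratic loss; the values of `mu`, `Sigma`, and `M` are arbitrary illustrative choices and do not come from the paper.

```python
import numpy as np

def quadratic_loss(theta, M):
    """theta' M theta for a single (scaled) estimation error theta."""
    return float(theta @ M @ theta)

def normal_quadratic_risk(mu, Sigma, M):
    """E[theta' M theta] when theta ~ N(mu, Sigma): mu' M mu + tr(M Sigma)."""
    return float(mu @ M @ mu + np.trace(M @ Sigma))

rng = np.random.default_rng(2)
mu = np.array([0.5, -0.2, 0.1])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.4],
                  [0.0, 0.4, 1.5]])
M = np.eye(3)

draws = rng.multivariate_normal(mu, Sigma, size=200_000)
mc_risk = float(np.mean(np.einsum("ij,jk,ik->i", draws, M, draws)))
exact_risk = normal_quadratic_risk(mu, Sigma, M)

# Trace form of the loss for one draw: theta' M theta = tr(M theta theta').
theta = draws[0]
trace_form = float(np.trace(M @ np.outer(theta, theta)))
```

The squared-bias term $μ^{T} M μ$ and the variance term $\mathrm{tr} (M Σ)$ are the two ingredients that any estimator trading bias for variance has to balance.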

To derive asymptotic distributional properties, in addition to the first four assumptions of [18], we set the following regularity conditions:

(A1): ${max}_{1 \leq i \leq n} \frac{1}{n} x_{i}^{T} {(X^{T} {\hat{V}}_{n}^{- 1} X)}^{- 1} x_{i} \to 0$ , as $n \to \infty$ , where $x_{i}$ is the ith row of $X$ .
(A2): Let $C_{n} = X^{T} {\hat{V}}_{n}^{- 1} X$ . Then, $lim_{n \to \infty} \frac{1}{n} C_{n} = C$ , where $C$ is the $(p \times p)$ positive definite matrix.
(A3): Let

$\begin{matrix} C_{n}^{- 1} & = & {(\begin{matrix} X_{1}^{T} {\hat{V}}_{n}^{- 1} X_{1} & X_{1}^{T} {\hat{V}}_{n}^{- 1} X_{2} \\ X_{2}^{T} {\hat{V}}_{n}^{- 1} X_{1} & X_{2}^{T} {\hat{V}}_{n}^{- 1} X_{2} \end{matrix})}^{- 1} and \\ C^{- 1} & = & {(\begin{matrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{matrix})}^{- 1} = (\begin{matrix} C_{11.2}^{- 1} & - C_{11}^{- 1} C_{12} C_{22.1}^{- 1} \\ - C_{22}^{- 1} C_{21} C_{11.2}^{- 1} & C_{22.1}^{- 1} \end{matrix}), \end{matrix}$

Then, $lim_{n \to \infty} {(\frac{1}{n} C_{n})}^{- 1} = C^{- 1}$ , where $C_{i i . j} = C_{i i} - C_{i j} C_{j j}^{- 1} C_{j i}$ for $i, j = 1, 2$ .

In the following, we refer to the above assumptions as the “named regularity condition (NRC)”.
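The partitioned inverse in (A3) can be verified numerically for any positive definite $C$. The following sketch does so for a randomly generated matrix; the function name `schur_blocks` and the test matrix are assumptions made for illustration.

```python
import numpy as np

def schur_blocks(C, p1):
    """Return the blocks of C and the Schur complements C_{11.2} and C_{22.1}."""
    C11, C12 = C[:p1, :p1], C[:p1, p1:]
    C21, C22 = C[p1:, :p1], C[p1:, p1:]
    C11_2 = C11 - C12 @ np.linalg.solve(C22, C21)
    C22_1 = C22 - C21 @ np.linalg.solve(C11, C12)
    return C11, C12, C21, C22, C11_2, C22_1

rng = np.random.default_rng(6)
p1, p2 = 3, 4
G = rng.normal(size=(p1 + p2, p1 + p2))
C = G @ G.T + (p1 + p2) * np.eye(p1 + p2)     # positive definite by construction

C11, C12, C21, C22, C11_2, C22_1 = schur_blocks(C, p1)
inv11_2, inv22_1 = np.linalg.inv(C11_2), np.linalg.inv(C22_1)

# Assemble C^{-1} block by block, exactly as written in (A3).
C_inv_blocks = np.block([
    [inv11_2, -np.linalg.solve(C11, C12) @ inv22_1],
    [-np.linalg.solve(C22, C21) @ inv11_2, inv22_1],
])
```

The upper-left block $C_{11.2}^{- 1}$ is never smaller (in the positive semidefinite ordering) than $C_{11}^{- 1}$, which is the source of the efficiency gain of the restricted estimator under the null hypothesis.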

The primary tool for deriving expressions for the asymptotic quadratic risks of the proposed estimators is the asymptotic distribution of the unrestricted full model ridge estimator ${\hat{β}}_{1}^{UR}$ and the restricted ridge estimator ${\hat{β}}_{1}^{RR}$. To this end, we make use of the following lemma. The proof is provided in Appendix A.

Lemma 1.

Assume the NRC. If $k / \sqrt{n} \to k_{o} \geq 0$, then

$$\sqrt{n} ({\hat{β}}^{RF} - β) \overset{D}{\to} N_{p} (- k_{o} C^{- 1} β, σ^{2} C^{- 1}),$$

where $\overset{D}{\to}$ denotes convergence in distribution.

Indeed, Lemma 1 enables us to provide some asymptotic distributional results about the estimators ${\hat{β}}_{1}^{UR}$ and ${\hat{β}}_{1}^{RR}$, as presented in the following theorem, which is easy to prove.

Theorem 1.

Let $ϑ_{n}^{(1)} = \sqrt{n} ({\hat{β}}_{1}^{UR} - β_{1})$, $ϑ_{n}^{(2)} = \sqrt{n} ({\hat{β}}_{1}^{RR} - β_{1})$, and $ϑ_{n}^{(3)} = \sqrt{n} ({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR})$. Assume the local alternatives in (20) and the NRC. Then, as $n \to \infty$, we have

1.: $ϑ_{n}^{(1)} \sim N_{p_{1}} (- η_{11.2}, σ^{2} C_{11.2}^{- 1})$
2.: $ϑ_{n}^{(2)} \sim N_{p_{1}} (δ - η_{11.2}, σ^{2} C_{11}^{- 1})$
3.: $ϑ_{n}^{(3)} \sim N_{p_{1}} (δ, σ^{2} (C_{11.2}^{- 1} - C_{11}^{- 1}))$
4.: $(\begin{matrix} ϑ_{n}^{(1)} \\ ϑ_{n}^{(3)} \end{matrix}) \sim N_{2 p_{1}} ((\begin{matrix} - η_{11.2} \\ δ \end{matrix}), σ^{2} (\begin{matrix} C_{11.2}^{- 1} & C_{11.2}^{- 1} - C_{11}^{- 1} \\ C_{11.2}^{- 1} - C_{11}^{- 1} & C_{11.2}^{- 1} - C_{11}^{- 1} \end{matrix}))$
5.: $(\begin{matrix} ϑ_{n}^{(2)} \\ ϑ_{n}^{(3)} \end{matrix}) \sim N_{2 p_{1}} ((\begin{matrix} δ - η_{11.2} \\ δ \end{matrix}), σ^{2} (\begin{matrix} C_{11}^{- 1} & 0 \\ 0 & C_{11.2}^{- 1} - C_{11}^{- 1} \end{matrix}))$
6.: E $[ϑ_{n}^{(1)} | ϑ_{n}^{(3)}] = - η_{11.2} + ϑ_{n}^{(3)} - δ$
7.: $P r (T_{n} \leq x) = H_{q} (x; Δ)$ , where $H_{q} (x; Δ)$ is the cumulative distribution function of a non-central chi-square distribution with q degrees of freedom and the Δ non-centrality parameter.
Here, $η = {(η_{1}^{T}, η_{2}^{T})}^{T} = - k_{o} C^{- 1} β$ , $η_{11.2} = η_{1} - C_{12} C_{22}^{- 1} ((β_{2} - ξ) - η_{2})$ , and $δ = C_{11}^{- 1} C_{12} ξ$ .

With Lemma 1 in hand, it is straightforward to derive the asymptotic distributional properties of the shrinkage estimators. In the subsequent theorems, we provide their asymptotic bias and weighted quadratic risk functions.

Theorem 2.

Under the assumptions of Lemma 1, the asymptotic distributional biases of the shrinkage estimators are given by

1.: $ADB ({\hat{β}}_{1}^{PTR}) = - η_{11.2} - δ H_{p_{2} + 2} (χ_{α, p_{2}}^{2}; Δ)$
2.: $ADB ({\hat{β}}_{1}^{SR}) = - η_{11.2} - (p_{2} - 2) δ E (χ_{p_{2} + 2}^{- 2} (Δ))$
3.: $ADB ({\hat{β}}_{1}^{PSR}) = - η_{11.2} - δ H_{p_{2} + 2} (χ_{α, p_{2}}^{2}; Δ) + (p_{2} - 2) δ E [χ_{p_{2} + 2}^{- 2} (Δ) I (χ_{p_{2} + 2}^{2} (Δ) \leq p_{2} - 2)],$
where

$E (χ_{q}^{- 2 i} (Δ)) = \int_{x = 0}^{x = \infty} x^{- 2 i} d H_{q} (x; Δ), i = 1, 2 .$

For the proof, refer to Appendix A.
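The inverse moments $E (χ_{q}^{- 2 i} (Δ))$ that appear in these bias expressions can be evaluated without numerical integration by writing the non-central chi-square distribution as a Poisson mixture of central chi-square distributions. The sketch below assumes the convention in which $Δ$ is the usual non-centrality parameter (so the Poisson mean is $Δ / 2$); if a different convention is intended, the first line of the function must be adjusted. The function name `inv_chi2_moment` and the truncation at 200 terms are illustrative choices.

```python
import math

def inv_chi2_moment(q, delta, i=1, terms=200):
    """E[(chi^2_q(delta))^{-i}] for i in {1, 2}, via the Poisson(delta/2) mixture.

    Uses the central-chi-square facts E[1/chi^2_v] = 1/(v - 2) for v > 2 and
    E[1/(chi^2_v)^2] = 1/((v - 2)(v - 4)) for v > 4.
    """
    if i not in (1, 2) or q <= 2 * i:
        raise ValueError("need i in {1, 2} and q > 2 * i")
    lam = delta / 2.0
    weight = math.exp(-lam)          # Poisson probability of k = 0
    total = 0.0
    for k in range(terms):
        df = q + 2 * k
        central = 1.0 / (df - 2) if i == 1 else 1.0 / ((df - 2) * (df - 4))
        total += weight * central
        weight *= lam / (k + 1)      # Poisson recursion to k + 1
    return total

# Example with p2 = 10, so the relevant degrees of freedom are p2 + 2 = 12.
m1_null = inv_chi2_moment(12, 0.0, i=1)
m2_null = inv_chi2_moment(12, 0.0, i=2)
m1_alt = inv_chi2_moment(12, 4.0, i=1)
```

Because these moments decrease as $Δ$ grows, the bias contribution of the shrinkage term in $ADB ({\hat{β}}_{1}^{SR})$ is damped when the data move away from the null hypothesis.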

The following results reveal the expressions for the ADQR of the proposed shrinkage estimators.

Theorem 3.

Under the assumptions of Lemma 1, the asymptotic distributional quadratic risks of the shrinkage estimators are given by

1.: $\begin{matrix} Γ ({\hat{β}}_{1}^{PTR}, M) & = & Γ ({\hat{β}}_{1}^{UR}, M) - 2 η_{11.2}^{T} M δ H_{p_{2} + 2} (χ_{α, p_{2}}^{2}; Δ) \\ & & - σ^{2} \mathrm{tr} [M (C_{11.2}^{- 1} - C_{11}^{- 1})] H_{p_{2} + 2} (χ_{α, p_{2}}^{2}; Δ) \\ & & + δ^{T} M δ [2 H_{p_{2} + 2} (χ_{α, p_{2}}^{2}; Δ) - H_{p_{2} + 4} (χ_{α, p_{2}}^{2}; Δ)] \end{matrix}$
2.: $\begin{matrix} Γ ({\hat{β}}_{1}^{SR}, M) & = & Γ ({\hat{β}}_{1}^{UR}, M) + 2 (p_{2} - 2) η_{11.2}^{T} M δ E (χ_{p_{2} + 2}^{- 2} (Δ)) \\ - (p_{2} - 2) σ^{2} t r (M C_{11}^{- 1} C_{12} C_{22.1}^{- 1} C_{21} C_{11}^{- 1}) \\ \{2 E (χ_{p_{2} + 2}^{- 2} (Δ)) - (p_{2} - 2) E (χ_{p_{2} + 2}^{- 4} (Δ))\} \\ + & (p_{2} - 2) δ^{T} M δ \\ \times \{2 E (χ_{p_{2} + 2}^{- 2} (Δ)) - 2 E (χ_{p_{2} + 4}^{- 2} (Δ)) - (p_{2} - 2) E (χ_{p_{2} + 4}^{- 4} (Δ))\} . \end{matrix}$
3.: $\begin{matrix} Γ ({\hat{β}}_{1}^{PSR}, M) & = & Γ ({\hat{β}}_{1}^{SR}, M) - 2 η_{11.2}^{T} M δ A_{1} \\ + (p_{2} - 2) σ^{2} t r (M C_{11}^{- 1} C_{12} C_{22.1}^{- 1} C_{21} C_{11}^{- 1}) A_{2} \\ - σ^{2} t r (M C_{11}^{- 1} C_{12} C_{22.1}^{- 1} C_{21} C_{11}^{- 1}) H_{p_{2} + 2} (χ_{α, p_{2}}^{2}; Δ) \\ + δ^{T} M δ [2 H_{p_{2} + 2} (χ_{α, p_{2}}^{2}; Δ) - H_{p_{2} + 4} (χ_{α, p_{2}}^{2}; Δ)] \\ - (p_{2} - 2) δ^{T} M δ A_{3}, \end{matrix}$

where $M$ is a positive definite weight matrix,

$$Γ ({\hat{β}}_{1}^{UR}, M) = η_{11.2}^{T} M η_{11.2} + σ^{2} \mathrm{tr} (M C_{11.2}^{- 1}),$$

and

$$\begin{matrix} A_{1} & = & E [\{1 - (p_{2} - 2) χ_{p_{2} + 2}^{- 2} (Δ)\} I (χ_{p_{2} + 2}^{2} (Δ) \leq p_{2} - 2)], \\ A_{2} & = & 2 E \{χ_{p_{2} + 2}^{- 2} (Δ) I (χ_{p_{2} + 2}^{2} (Δ) \leq p_{2} - 2)\} \\ & & - (p_{2} - 2) E \{χ_{p_{2} + 2}^{- 4} (Δ) I (χ_{p_{2} + 2}^{2} (Δ) \leq p_{2} - 2)\}, \\ A_{3} & = & 2 E \{χ_{p_{2} + 2}^{- 2} (Δ) I (χ_{p_{2} + 2}^{2} (Δ) \leq p_{2} - 2)\} \\ & & - 2 E \{χ_{p_{2} + 4}^{- 2} (Δ) I (χ_{p_{2} + 4}^{2} (Δ) \leq p_{2} - 2)\} \\ & & + (p_{2} - 2) E \{χ_{p_{2} + 2}^{- 4} (Δ) I (χ_{p_{2} + 2}^{2} (Δ) \leq p_{2} - 2)\} . \end{matrix}$$

For the proof, refer to Appendix A.

6. Numerical Analysis

To demonstrate our theoretical findings, we first use Monte Carlo simulation experiments and then apply the set of proposed estimators to a real dataset. The Monte Carlo simulation is used to investigate the performance of the ridge-type estimators in comparison with the MLE ${\hat{β}}_{1}$ given in (9) via the simulated mean squared error of each estimator.

6.1. Simulation Experiments

In this section, we compare the set of ridge-type estimators with respect to the MLE using Monte Carlo simulation experiments based on their simulated mean squared errors. In each of these experiments, we consider

(N \times N)

square lattices using

N = 7, 10

with the corresponding sample sizes of

n = N^{2} = 49, 100

, respectively. To show the performance of the proposed estimators when multicollinearity exists, we generate the design matrix

X

from the multivariate normal distribution with a mean of

0

, and a variance-covariance matrix with a first-order autoregressive structure, in which

c o v (X_{i}, X_{j}) = {\begin{matrix} ρ_{x}^{| i - j |} & i \neq j \\ 1 & i = j \end{matrix}

and apply it to

ρ_{x} \in {0.3, 0.6, 0.9}

, while the error term

ϵ

is generated from another multivariate normal with a mean of

0

and an SE variance matrix with

V_{n} = σ^{2} {(I - ρ W^{*})}^{- 1} {(I - ρ W^{*'})}^{- 1}

. We set

σ^{2} = 1

. For the weight matrix

W^{*}

, a queen-based contiguity neighborhood was used. The set of values for

ρ

is

{0.3, 0.6, 0.9}

. We partitioned the vector of coefficients

β

as

β = (β_{1}, β_{2})

, where

β_{1} = 1_{p 1}

is a

p_{1} \times 1

vector of ones, and

β_{2} = (Δ, 0_{p_{2} - 1})

,

0_{p_{2} - 1}

is a zero vector of dimension

(p_{2} - 1 \times 1)

, and

Δ = ∥ β - β_{0} ∥

, where

∥ A ∥

is the Euclidean norm of

A

, and

Δ

represents the non-centrality parameter. The range of values for

Δ

is set to be from 0 to 2. Then we fit the model in (8) using the spautolm function within the R-package spdep [20], obtain the values of all estimators considered in our study, and compute the simulated mean square error (SMSE) of each estimator as

S M S E (\hat{β_{1}^{*}}) = \sum_{i = 1}^{p_{1}} {(\hat{β_{1 i}^{*}} - β_{1 i})}^{2}

. The simulated relative efficiency (SRE) of any estimator, say

{\hat{β_{1}}}^{\circ}

, with respect to the MLE

(\hat{β_{1}})

, is calculated as follows:

\begin{matrix} \mathrm{SRE}({\hat{β}}_{1}^{\circ}) & = & \frac{\mathrm{SMSE}({\hat{β}}_{1})}{\mathrm{SMSE}({\hat{β}}_{1}^{\circ})}, \end{matrix}

(23)

where

{\hat{β_{1}}}^{\circ}

is any of the estimators

{{\hat{β}}_{1}^{UR}, {\hat{β}}_{1}^{RR}, {\hat{β}}_{1}^{PTR}, {\hat{β}}_{1}^{SR}, {\hat{β}}_{1}^{PSR}}

. When

\mathrm{SRE}({\hat{β}}_{1}^{\circ})

is greater than one, the estimator outperforms the MLE of the full model, and vice versa. We run the simulation for

(p_{1}, p_{2}) \in {(5, 10), (5, 20), (5, 30)}

, and use

α = 0.05

to test the hypothesis in (20). The results change little when the spatial dependence parameter is varied, so we show the graphs only for

ρ = 0.90

. Figure A1, Figure A2 and Figure A3 in Appendix A show the results of the SRE against various values of

Δ

. The findings support the following conclusions:

(i): Across all values of $Δ$ , the ridge-type full model estimator $({\hat{β}}_{1}^{UR})$ consistently outperforms the MLE. Furthermore, its efficiency increases with $p_{2}$ for fixed values of $ρ$ and $ρ_{x}$ . Additionally, as the multicollinearity among the explanatory variables in the design matrix becomes stronger, the efficiency of ${\hat{β}}_{1}^{UR}$ increases, as expected.
(ii): The ridge-type sub-model estimator $({\hat{β}}_{1}^{RR})$ outperforms all other estimators when $Δ = 0$ . This is expected because the null hypothesis holds. However, once $Δ$ begins to depart from the null space, the estimator’s SRE drops sharply and approaches zero, making it less efficient than the other estimators.
(iii): Holding the other parameters constant, the SRE values grow as the correlation coefficient $ρ_{x}$ among the explanatory variables increases.
(iv): As the number of zero coefficients $(p_{2})$ increases, the SREs of all estimators also increase.
(v): The ridge-type positive shrinkage estimator $({\hat{β}}_{1}^{PSR})$ uniformly prevails over the competing estimators.
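The data-generating design described above can be sketched in a few lines. This is only an illustrative sketch in Python with NumPy, not the R code used for the study: the function names `ar1_cov`, `queen_weights`, and `simulate_se` are hypothetical, and row-standardizing the queen-contiguity weights is an assumption, since the text does not state how $W^{*}$ was scaled.

```python
import numpy as np

def ar1_cov(p, rho_x):
    """AR(1) covariance: entry (i, j) equals rho_x ** |i - j| (1 on the diagonal)."""
    idx = np.arange(p)
    return rho_x ** np.abs(idx[:, None] - idx[None, :])

def queen_weights(N):
    """Queen-contiguity weights on an N x N lattice (row-standardized: an assumption)."""
    n = N * N
    W = np.zeros((n, n))
    for r in range(N):
        for c in range(N):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # every cell sharing an edge or a corner is a neighbor
                    if (dr or dc) and 0 <= rr < N and 0 <= cc < N:
                        W[r * N + c, rr * N + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def simulate_se(N, p1, p2, rho, rho_x, delta, sigma2=1.0, seed=0):
    """Draw one data set (X, y) from the SE model with beta = (1_{p1}, delta, 0, ..., 0)."""
    rng = np.random.default_rng(seed)
    n, p = N * N, p1 + p2
    X = rng.multivariate_normal(np.zeros(p), ar1_cov(p, rho_x), size=n)
    beta = np.concatenate([np.ones(p1), [delta], np.zeros(p2 - 1)])
    W = queen_weights(N)
    B = np.linalg.inv(np.eye(n) - rho * W)          # (I - rho W)^{-1}
    V = sigma2 * B @ B.T                            # V_n as defined in the text
    eps = np.sqrt(sigma2) * B @ rng.standard_normal(n)  # errors with covariance V_n
    return X, X @ beta + eps, beta, V
```

For example, `simulate_se(7, 5, 10, rho=0.9, rho_x=0.6, delta=0.5)` returns one sample of size 49 with 15 regressors; repeating the call with different seeds and fitting the estimators to each sample would give the replications from which the SMSE and SRE are computed.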

6.2. Data Example

Ref. [21] used 1970 housing market data for census tracts in the Boston Standard Metropolitan Statistical Area. The authors’ major objective was to establish a relationship between a set of 15 variables and the median price of owner-occupied residences in Boston. Ref. [22] offered a corrected version of the dataset along with new spatial data. The dataset is accessible through the R-Package spdep version 1.3-1. There are 506 observations in the data, each of which relates to a single census tract. The variables in the data include the tract identification number (TRACT), median owner-occupied housing prices in US dollars (MEDV), corrected median owner-occupied housing prices in US dollars (CMEDV), percentages of residential land zoned for lots larger than 25,000 square feet per town (constant for all Boston tracts) (ZN), percentages of non-retail business areas per town (INDUS), the average number of rooms per dwelling (RM), the percentage of owner-occupied homes built before 1940 (AGE), a dummy variable with two levels, which is 1 if the tract borders the Charles River and 0 otherwise (CHAS), the crime rate per capita (CRIM), the weighted distance to main employment centers (DIS), nitrogen oxide concentration (parts per 10 million) per town (NOX), an accessibility index to radial highways per town (constant for all Boston tracts) (RAD), the property tax rate per $10,000 per town (constant for all Boston tracts) (TAX), the percentage of the lower-class population (LSTAT), pupil–teacher ratios per town (constant for all Boston tracts) (PTRATIO), and the variable

1000 {(b - 0.63)}^{2}

, where b is the proportion of Black residents (B). Ref. [23] added the location of each tract as latitude (LAT) and longitude (LON) variables. Assuming an SE model, we can predict the response variable log(CMEDV) using all available variables; we refer to this as the full SE model. For these data, a variety of selection techniques have been used to determine the sub-model. One sub-model, used by [19], is the model obtained by the adaptive LASSO algorithm; we refer to it as our SE sub-model. The two models are summarized in Table 1.

Figure 1 displays a colored plot of the correlation coefficients for each pair of variables. The notation (***) indicates high significance with a p-value of less than 0.001. The notation (**) indicates significance at the 1% level, while (*) indicates significance at the 5% level. If none of these symbols is present, the correlation coefficient between the two variables is not statistically significant. As seen in the plot, CMEDV has a strong linear relationship with a few of the other variables. The plot is useful for examining the strength of any linear relationship between the original response CMEDV and each of the other variables. The variables selected by the adaptive LASSO algorithm appear to have strong, medium, and weak relationships with the response variable. Moreover, some variables exhibit collinearity, a setting in which ridge-type estimators are expected to perform well compared with the MLE.

Table 2 presents the coefficient estimates obtained from the proposed ridge-type estimators.

To assess the effectiveness of the suggested estimators, we will use two different methods, aiming to provide a reliable and valid evaluation of our results. The first method employs the bootstrapping methodology, whereas the second one involves validation using out-of-sample data.

The bootstrapping technique suggested by [24] computes the mean squared prediction error (MSPE) for any estimator as follows:

(1) Fit the SE full and sub-models as they appear in Table 1 using the spautolm function and obtain the MLEs of $β_{1}$ , $σ^{2}$ , the spatial dependence parameter $ρ$ , and the covariance matrix $V_{n}$ .
(2) As the columns of the matrices ${(X_{1}^{T} A_{X_{2}} X_{1})}^{- 1}$ and ${(X_{1}^{T} {\hat{V}}_{n}^{- 1} X_{1})}^{- 1}$ are not orthogonal, and the sample size is large, we follow [25] to estimate the ridge tuning parameters for the two estimators, ${\hat{β}}_{1}^{UR}$ and ${\hat{β}}_{1}^{RR}$ , which are, respectively, given by
$k_{f} = \frac{{\hat{σ}}^{2} \mathrm{tr} [{(X_{1}^{T} A_{X_{2}} X_{1})}^{- 1}]}{{\hat{β}}_{1}^{T} {(X_{1}^{T} A_{X_{2}} X_{1})}^{- 1} {\hat{β}}_{1}}$ , and $k_{r} = \frac{{\hat{σ}}^{2} \mathrm{tr} [{(X_{1}^{T} {\hat{V}}_{n}^{- 1} X_{1})}^{- 1}]}{{({\hat{β}}_{1}^{S})}^{T} {(X_{1}^{T} {\hat{V}}_{n}^{- 1} X_{1})}^{- 1} {\hat{β}}_{1}^{S}}$ .
(3) Use the Cholesky decomposition to write the matrix ${\hat{V}}_{n}$ as ${\hat{V}}_{n} = U U^{T}$ , where $U$ is an $(n \times n)$ lower triangular matrix.
(4) Let $\hat{ϵ} = U^{- 1} (Y - X \hat{β})$ , where $\hat{ϵ} = ({\hat{ϵ}}_{1}, {\hat{ϵ}}_{2}, \dots, {\hat{ϵ}}_{n})$ ; define the centered residuals as $ϵ_{i}^{c} = {\hat{ϵ}}_{i} - \frac{1}{n} \sum_{j = 1}^{n} {\hat{ϵ}}_{j}$ , and then draw a sample of size $n$ with replacement from $(ϵ_{1}^{c}, ϵ_{2}^{c}, \dots, ϵ_{n}^{c})$ to obtain $ϵ^{★} = (ϵ_{1}^{★}, ϵ_{2}^{★}, \dots, ϵ_{n}^{★})$ .
(5) Calculate the bootstrap response as $Y^{★} = X \hat{β} + U ϵ^{★}$ , which restores the spatial correlation removed by $U^{- 1}$ in the previous step, and then use it to fit the full and sub-models and obtain the bootstrap values of all estimators.
(6) Calculate the predicted value of the response variable using each estimator as follows: ${\hat{y}}_{k i}^{★} = X_{1} {\hat{β}}_{1}^{*} + {\hat{ρ}}^{★} \sum_{j = 1}^{n} W_{i j}^{*} ({\hat{y}}_{k j}^{★} - X_{j} {\hat{β}}_{1}^{*})$ , where ${\hat{β}}_{1}^{*}$ represents any of the estimators in the set $\{{\hat{β}}_{1}, {\hat{β}}_{1}^{UR}, {\hat{β}}_{1}^{RR}, {\hat{β}}_{1}^{PTR}, {\hat{β}}_{1}^{SR}, {\hat{β}}_{1}^{PSR}\}$ .
(7) For the $k$th bootstrap sample, calculate the square root of the mean squared prediction error (MSPE) as

$\begin{matrix} \mathrm{MSPE}_{k} ({\hat{β}}_{1}^{*}) & = & \sqrt{\frac{\sum_{i = 1}^{n} {({\hat{y}}_{i}^{*} - y_{i})}^{2}}{n}}, \quad k = 1, 2, \dots, B, \end{matrix}$

(24)

where B is the number of bootstrap samples.
(8) Calculate the relative efficiency (RE) of any estimator with respect to the MLE ${\hat{β}}_{1}$ as follows:

$\begin{matrix} \mathrm{RE} ({\hat{β}}_{1}^{•}) & = & \frac{\mathrm{MSPE} ({\hat{β}}_{1})}{\mathrm{MSPE} ({\hat{β}}_{1}^{•})}, \end{matrix}$

(25)

where ${\hat{β}}_{1}^{•}$ is any of the proposed ridge-type estimators. We apply the bootstrapping technique $B = 2000$ times.
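The residual-resampling part of the procedure above can be sketched as follows. This is a minimal illustration in Python with NumPy rather than the R workflow used in the paper; the helper names `bootstrap_responses`, `rmspe`, and `relative_efficiency` are hypothetical. The sketch whitens the residuals with $U^{-1}$, resamples them, and multiplies by $U$ to put the spatial correlation back, which is the inverse of the whitening step. Averaging the per-sample MSPE values before taking the ratio in (25) is an assumption.

```python
import numpy as np

def bootstrap_responses(y, X, beta_hat, V_hat, B=200, seed=0):
    """Yield B bootstrap response vectors by resampling whitened, centered residuals."""
    rng = np.random.default_rng(seed)
    n = len(y)
    U = np.linalg.cholesky(V_hat)            # V_hat = U U^T with U lower triangular
    fitted = X @ beta_hat
    e = np.linalg.solve(U, y - fitted)       # whitened residuals U^{-1}(y - X beta_hat)
    e_c = e - e.mean()                       # centered residuals
    for _ in range(B):
        e_star = rng.choice(e_c, size=n, replace=True)
        yield fitted + U @ e_star            # recolor the resampled residuals with U

def rmspe(y_pred, y_obs):
    """Square root of the mean squared prediction error, as in Equation (24)."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_obs, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def relative_efficiency(mspe_mle, mspe_other):
    """Ratio of average MSPE values, MLE in the numerator, as in Equation (25)."""
    return float(np.mean(mspe_mle) / np.mean(mspe_other))
```

In use, each yielded response vector would be refitted with every estimator, `rmspe` evaluated per bootstrap sample, and `relative_efficiency` applied to the two resulting lists of MSPE values.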

Table 3 summarizes the results of the relative efficiencies, where a relative efficiency value exceeding one indicates the superior performance of the estimator in the denominator.

The second approach is based on out-of-sample data. In general, when using out-of-sample data for non-spatial regression models, it is assumed that the errors are independent. However, the errors in the SE regression model are not independent. Nonetheless, by employing a transformation, we may overcome this challenge. We suggest modifying the SE model to ensure the errors are independent while keeping

σ^{2}

and

ρ

constant. A related transformation technique in spatial models can be found in [6]. Note that the

(n \times n)

covariance matrix

V_{n}

is positive definite, so

V_{n}

can be rewritten as

V_{n} = A^{T} A

, where

A

is an upper triangular matrix with positive entries on the diagonal; see ([26], p. 338). By multiplying the model in (4) by

{(A^{- 1})}^{T}

, we obtain

\begin{matrix} Y^{★} & = & X^{★} β + ϵ^{★}, \end{matrix}

(26)

where

Y^{★} = {(A^{- 1})}^{T} Y

,

X^{★} = {(A^{- 1})}^{T} X

, and

ϵ^{★} = {(A^{- 1})}^{T} ϵ

, with

ϵ^{★} \sim N (0, σ^{2} I_{n})

. Practically, we obtain the MLEs of

σ^{2}

, and

ρ

, and use these estimates to find the estimated matrix

\hat{A}

. The steps of using out-of-sample data are as follows:

(1) Create a data frame containing the columns of $X^{★}$ and $Y^{★}$ .
(2) Divide the data frame into training and testing subsets. The testing subset is the out-of-sample dataset.
(3) Using the training dataset, follow the same procedure discussed in Section 3. That is, divide the training data into two subsets, as $X_{train}^{★} = (X_{1, train}^{★}, X_{2, train}^{★})$ , fit the full and sub-models, and obtain the array of estimators, which are denoted by ${\hat{β}}_{1, train}$ , ${\hat{β}}_{1, train}^{UR}$ , ${\hat{β}}_{1, train}^{RR}$ , ${\hat{β}}_{1, train}^{PTR}$ , ${\hat{β}}_{1, train}^{SR}$ , ${\hat{β}}_{1, train}^{PSR}$ .
(4) Divide the testing data into two subsets, as $X_{test}^{★} = (X_{1, test}^{★}, X_{2, test}^{★})$ , and calculate the predicted response values as follows:

$\begin{matrix} {\hat{Y}}^{•} & = & X_{1, test}^{★} {\hat{β}}_{1, train}^{•}, \end{matrix}$

where ${\hat{β}}_{1, train}^{•}$ is any of the estimators obtained in step (3).
(5) Compute the average of the MSPE using Equation (24), replacing ${\hat{y}}_{i}^{*}$ with ${\hat{y}}_{i}^{•}$ , and $y_{i}$ with $y_{i, test}^{★}$ .
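The transformation that precedes these steps can be checked numerically. The sketch below, in Python with NumPy, is illustrative only and the function name `whiten` is hypothetical. It uses the fact that if ${\hat{V}}_{n} = L L^{T}$ with $L$ lower triangular, then $A = L^{T}$ is the upper triangular factor in $V_{n} = A^{T} A$ and ${(A^{-1})}^{T} = L^{-1}$ .

```python
import numpy as np

def whiten(y, X, V_hat):
    """Return (A^{-1})^T y and (A^{-1})^T X, where V_hat = A^T A with A upper triangular."""
    L = np.linalg.cholesky(V_hat)   # lower triangular, V_hat = L L^T, hence A = L^T
    # (A^{-1})^T equals L^{-1}; solve linear systems instead of forming the inverse
    return np.linalg.solve(L, y), np.linalg.solve(L, X)
```

One consequence worth noting is that ordinary least squares on the transformed data reproduces generalized least squares on the original data, which is why the transformed observations can be split into training and testing sets as if the errors were independent.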

We divide the dataset into

80 %

for the training set and

20 %

for the testing set. We repeat steps (2–5) 2000 times and then obtain the relative efficiency as in Equation (25). Table 4 shows the relative efficiency results based on out-of-sample data.
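The repeated 80/20 validation loop can be sketched as below. This is a hedged illustration in Python with NumPy: `out_of_sample_rmspe`, `ols`, and `ridge` are hypothetical names, and the two fitting functions are simple stand-ins (least squares and a plain ridge fit with a fixed $k$), not the pretest or shrinkage estimators of the paper. The inputs are assumed to be the already transformed $Y^{★}$ and the sub-model columns of $X^{★}$ .

```python
import numpy as np

def out_of_sample_rmspe(y_w, X_w, fit, n_rep=200, test_frac=0.2, seed=0):
    """Average root-MSPE of `fit` over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y_w)
    n_test = int(round(test_frac * n))
    scores = []
    for _ in range(n_rep):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        beta = fit(X_w[train], y_w[train])       # estimate on the training part only
        pred = X_w[test] @ beta                  # predict the held-out responses
        scores.append(np.sqrt(np.mean((pred - y_w[test]) ** 2)))
    return float(np.mean(scores))

def ols(X, y):
    """Least-squares fit, standing in for the MLE on transformed data."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(k):
    """Return a plain ridge fitting function with fixed tuning parameter k."""
    def fit(X, y):
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
    return fit
```

Calling `out_of_sample_rmspe` with the same seed for two fitting functions evaluates them on identical splits, so the ratio of the two returned values plays the role of the relative efficiency in Equation (25).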

Table 3 and Table 4 illustrate the better performance of the sub-model ridge-type estimator

({\hat{β}}_{1}^{RR})

compared to all other estimators. In Table 3, it is followed by the pretest estimator

({\hat{β}}_{1}^{PTR})

, which supports the sub-model that was selected, whereas in Table 4 the two shrinkage estimators rank second. The positive shrinkage ridge estimator performs at least as well as the shrinkage one; the two coincide in Table 4. Furthermore, all ridge-type estimators outperformed the MLE of

β_{1}

.

7. Conclusions

This paper discusses the pretest, shrinkage, and positive shrinkage ridge-type estimators of the parameter vector

(β)

for the SE model when there is a previous suspicion that certain coefficients are insignificant, and multicollinearity exists between two or more regressor variables. To obtain the proposed set of estimators for the main effect vector of coefficients

(β_{1})

, we test the hypothesis

H_{0} : β_{2} = 0

. The proposed estimators were compared analytically via their asymptotic distributional quadratic risks, and numerically through simulation experiments and a real data example.

Our results showed that the spatial dependence parameter

(ρ)

has no appreciable effect on the relative efficiencies, while the performance of the ridge estimators improves as the correlation among the regressor variables increases. Moreover, the full-model ridge estimator consistently performs better than the MLE. In addition, the estimator

({\hat{β}}_{1}^{RR})

dominates all estimators under the null hypothesis

H_{0} : β_{2} = 0

, or when near the null space, and delivers higher efficiency than the other estimators. However, the proposed positive shrinkage ridge estimator

({\hat{β}}_{1}^{PSR})

performs better than the MLE in all scenarios. Further, we applied the set of estimators to a real data example and used bootstrapping and out-of-sample validation to evaluate their performance based on the relative efficiency of the square root of the mean squared prediction error.

In the data example, the ridge-type pretest and shrinkage estimators substantially reduced the MSPE relative to the MLE. The lower MSPE indicates that the proposed estimation strategy yields more accurate predictions than the MLE, which suggests that using these estimators improves predictive accuracy and overall model performance in spatial error modeling.

The idea of a ridge-type pretest and shrinkage estimation strategy applies to a wide range of spatial regression models. For continuous data, the method can be used with different models, such as the conditional autoregressive model, the simultaneous autoregressive model, and the spatial autoregressive moving average model. For discrete spatial data, it can be applied to generalized linear models with a conditional autoregressive covariance structure, auto-logistic models for binary data, and auto-Poisson and negative binomial models for count data, among others. Lee [27] considered the estimation of a spatial model that includes a spatial lag of the dependent variable and spatially autoregressive disturbances, and derived the best spatial two-stage least squares (2SLS) estimators, which are instrumental variable estimators that are optimal in the asymptotic sense. Such a method may also benefit from the pretest and shrinkage estimation strategy to improve the estimators of several spatial regression models. Liu and Yang [28] discussed the impact of spatial dependence on the convergence rate of quasi-maximum likelihood (QML) estimators and showed how to correct the finite sample bias in the spatial error dependence model. Based on our findings, we expect that employing the ridge-type pretest and shrinkage estimation for the spatial error dependence model will be a beneficial addition and provide better results in terms of the estimators’ biases.

Author Contributions

M.A.-M. initiated the research, designed the study, proposed the estimators, established the methodology, carried out the numerical studies, including the simulation and the data example, analyzed the findings, wrote the manuscript, and revised the final version. M.A. guided the research from design to final approval, contributed to the methodology, especially the theory, and to the writing of the original manuscript, and reviewed and edited it to ensure the accuracy, clarity, and coherence of the findings. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset is accessible through the R-Package “spdep”.

Acknowledgments

We express our heartfelt gratitude to the four anonymous reviewers for their valuable feedback, which prompted us to include several details in the work and enhance its presentation.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of the Main Results

Proof of Lemma 1.

For the proof, we follow the approach of Yuzbasi et al. [29], with a slight modification. Let

W \sim N_{p} (0, σ^{2} C)

and define

\begin{matrix} V_{n} (u) & = & \sum_{i = 1}^{n} [{(ϵ_{i} - u^{T} x_{i} / \sqrt{n})}^{2} - ϵ_{i}^{2}] + k \sum_{j = 1}^{p} [{(β_{j} + u_{j} / \sqrt{n})}^{2} - β_{j}^{2}] \\ V (u) & = & - 2 u^{T} W + u^{T} C u + 2 k_{o} u^{T} β, \end{matrix}

where

u = {(u_{1}, \dots, u_{p})}^{T}

. Following [30],

\begin{matrix} \sum_{i = 1}^{n} [{(ϵ_{i} - u^{T} x_{i} / \sqrt{n})}^{2} - ϵ_{i}^{2}] \overset{D}{\to} - 2 u^{T} W + u^{T} C u \end{matrix}

with finite-dimensional convergence holding trivially. Also

\begin{matrix} k \sum_{j = 1}^{p} [{(β_{j} + u_{j} / \sqrt{n})}^{2} - β_{j}^{2}] \overset{D}{\to} 2 k_{o} \sum_{j = 1}^{p} u_{j} β_{j} . \end{matrix}

Thus

V_{n} (u) \overset{D}{\to} V (u)

with the finite-dimensional convergence holding trivially. Since

V_{n} (u)

is convex and

V (u)

has a unique minimum, it follows that

\begin{matrix} \sqrt{n} ({\hat{β}}^{RF} - β) = \arg\min V_{n} (u) \overset{D}{\to} \arg\min V (u) = C^{- 1} (W - k_{o} β) \sim N_{p} (- k_{o} C^{- 1} β, σ^{2} C^{- 1}) . \end{matrix}

We conclude that

\begin{matrix} \sqrt{n} ({\hat{β}}^{RF} - β) \overset{D}{\to} N_{p} (- k_{o} C^{- 1} β, σ^{2} C^{- 1}) . \end{matrix}

□
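For readers who want the intermediate algebra, the two steps used in the proof can be written out as follows. This is a sketch under the assumption, implied by the form of $V (u)$ , that $k / \sqrt{n} \to k_{o}$ as $n \to \infty$ .

```latex
\begin{aligned}
k \sum_{j=1}^{p} \Big[ \big( β_{j} + u_{j}/\sqrt{n} \big)^{2} - β_{j}^{2} \Big]
  &= \frac{2k}{\sqrt{n}} \sum_{j=1}^{p} u_{j} β_{j} + \frac{k}{n} \sum_{j=1}^{p} u_{j}^{2}
  \;\longrightarrow\; 2 k_{o} u^{T} β , \\
\frac{\partial V(u)}{\partial u} &= -2W + 2Cu + 2 k_{o} β = 0
  \;\Longrightarrow\; u = C^{-1} (W - k_{o} β) .
\end{aligned}
```

The second term in the first line vanishes because $k/n = (k/\sqrt{n})/\sqrt{n} \to 0$ . Since $W \sim N_{p} (0, σ^{2} C)$ and $C$ is symmetric, the minimizer has mean $- k_{o} C^{-1} β$ and covariance $C^{-1} (σ^{2} C) C^{-1} = σ^{2} C^{-1}$ , which is the stated limit.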

Proof of Theorem 2.

Because all of the proposed estimators are special cases of

{\hat{β}}_{1}^{Shrinkage}

, we give the bias of this estimator here. The proof then follows by applying the relevant

g (\cdot)

function in each estimator. Hence, we have

\begin{matrix} ADB ({\hat{β}}_{1}^{Shrinkage}) & = & ADB ({\hat{β}}_{1}^{UR}) - lim_{n \to \infty} \sqrt{n} E [({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) g (T_{n})] . \end{matrix}

Using part one of Lemma 1,

ADB ({\hat{β}}_{1}^{UR}) = - η_{11.2}

. Further using part three of Lemma 1 along with Theorem 1 in Appendix B of [31], we have

\begin{matrix} lim_{n \to \infty} \sqrt{n} E [({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) g (T_{n})] = δ E [g (χ_{p_{2} + 2}^{2} (Δ))] . \end{matrix}

Therefore, the asymptotic bias of the general shrinkage estimator is given by

\begin{matrix} ADB ({\hat{β}}_{1}^{Shrinkage}) = - η_{11.2} - δ E [g (χ_{p_{2} + 2}^{2} (Δ))] . \end{matrix}

The proof is complete considering the expressions for

E [g (χ_{p_{2} + 2}^{2} (Δ))]

given in Table A1. □

Table A1. Expressions for the corresponding

g (\cdot)

functions in the proposed shrinkage estimators.


Shrinkage Estimator	$g (\cdot)$ Function	$E [g (χ_{p_{2} + 2}^{2} (Δ))]$
${\hat{β}}_{1}^{PTR}$	$I (T_{n} \leq χ_{α, p_{2}}^{2})$	$H_{p_{2} + 2} (χ_{α, p_{2}}^{2}; Δ)$
${\hat{β}}_{1}^{SR}$	$(p_{2} - 2) T_{n}^{- 1}$	$(p_{2} - 2) E (χ_{p_{2} + 2}^{- 2} (Δ))$
${\hat{β}}_{1}^{PSR}$	$(1 - (p_{2} - 2) T_{n}^{- 1}) I (T_{n} \leq χ_{α, p_{2}}^{2})$	$H_{p_{2} + 2} (χ_{α, p_{2}}^{2}; Δ) - (p_{2} - 2) E [χ_{p_{2} + 2}^{- 2} (Δ) I (χ_{p_{2} + 2}^{2} (Δ) \leq p_{2} - 2)]$
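The expectations in the first two rows of Table A1 have no simple closed form for $Δ > 0$ , but they are easy to approximate by simulation. The following Python sketch with NumPy is illustrative only: the function names are hypothetical, $H_{ν} (x; Δ)$ is taken to be the distribution function of a noncentral chi-square variable with $ν$ degrees of freedom, and $Δ$ is assumed to follow NumPy's noncentrality convention (the sum of squared means).

```python
import numpy as np

def noncentral_chi2(rng, df, delta, size):
    """Draw chi-square variates with df degrees of freedom and noncentrality delta."""
    if delta == 0:
        return rng.chisquare(df, size)
    return rng.noncentral_chisquare(df, delta, size)

def table_a1_expectations(p2, delta, alpha=0.05, n_draws=200_000, seed=0):
    """Monte Carlo approximations of E[g(chi2_{p2+2}(Delta))] for the PTR and SR rows."""
    rng = np.random.default_rng(seed)
    # upper-alpha critical value of the central chi-square with p2 df, by simulation
    crit = np.quantile(rng.chisquare(p2, n_draws), 1.0 - alpha)
    x = noncentral_chi2(rng, p2 + 2, delta, n_draws)
    ptr = np.mean(x <= crit)             # H_{p2+2}(chi2_{alpha, p2}; Delta)
    sr = (p2 - 2) * np.mean(1.0 / x)     # (p2 - 2) E[chi^{-2}_{p2+2}(Delta)]
    return float(ptr), float(sr)
```

At $Δ = 0$ the second quantity has the exact value $(p_{2} - 2) / p_{2}$ , because the mean of an inverse central chi-square variable with $ν$ degrees of freedom is $1 / (ν - 2)$ ; this gives a quick check of the simulation.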

Proof of Theorem 3.

Similar to the proof of Theorem 2, we provide the ADQR of the shrinkage estimator

{\hat{β}}_{1}^{Shrinkage}

here. The proof then follows by applying the relevant

g (\cdot)

function in each estimator. Hence, we have

\begin{matrix}
Γ ({\hat{β}}_{1}^{Shrinkage}, M) & = & Γ ({\hat{β}}_{1}^{UR}, M) - 2 \lim_{n \to \infty} n E [{({\hat{β}}_{1}^{UR} - β_{1})}^{T} M ({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) g (T_{n})] \\
& & + \lim_{n \to \infty} n E [{({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR})}^{T} M ({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) g^{2} (T_{n})] .
\end{matrix}

From Lemma 1, we have

\begin{matrix}
Γ ({\hat{β}}_{1}^{UR}, M) & = & \mathrm{tr} \{M [\mathrm{cov} ({\hat{β}}_{1}^{UR})]\} \\
& = & \mathrm{tr} \{M [\lim_{n \to \infty} n E [({\hat{β}}_{1}^{UR} - β_{1}) {({\hat{β}}_{1}^{UR} - β_{1})}^{T}]]\} \\
& = & \mathrm{tr} \{M [\lim_{n \to \infty} E (ϑ_{n}^{(1)} {ϑ_{n}^{(1)}}^{T})]\} \\
& = & \mathrm{tr} \{M [\lim_{n \to \infty} \mathrm{cov} (ϑ_{n}^{(1)}) + E (ϑ_{n}^{(1)}) E {(ϑ_{n}^{(1)})}^{T}]\} \\
& = & \mathrm{tr} \{M [σ^{2} C_{11.2}^{- 1} + η_{11.2} η_{11.2}^{T}]\} \\
& = & η_{11.2}^{T} M η_{11.2} + σ^{2} \mathrm{tr} (M C_{11.2}^{- 1}) .
\end{matrix}

From Lemma 1

\begin{matrix} \lim_{n \to \infty} n E [{({\hat{β}}_{1}^{UR} - β_{1})}^{T} M ({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) g (T_{n})] & = & \mathrm{tr} \{M [\lim_{n \to \infty} E (ϑ_{n}^{(3)} {ϑ_{n}^{(1)}}^{T} g (T_{n}))]\} . \end{matrix}

Using double expectation, parts three and six of Lemma 1, and Theorems 1 and 3 in Appendix B of [31], we have

\begin{matrix}
\lim_{n \to \infty} E (ϑ_{n}^{(3)} {ϑ_{n}^{(1)}}^{T} g (T_{n})) & = & \lim_{n \to \infty} E [E (ϑ_{n}^{(3)} {ϑ_{n}^{(1)}}^{T} g (T_{n}) | ϑ_{n}^{(3)})] \\
& = & \lim_{n \to \infty} E [ϑ_{n}^{(3)} E ({ϑ_{n}^{(1)}}^{T} g (T_{n}) | ϑ_{n}^{(3)})] \\
& = & \lim_{n \to \infty} E [ϑ_{n}^{(3)} {(- η_{11.2} + ϑ_{n}^{(3)} - δ)}^{T} g (T_{n})] \\
& = & - \lim_{n \to \infty} E [ϑ_{n}^{(3)} η_{11.2}^{T} g (T_{n})] + \lim_{n \to \infty} E [ϑ_{n}^{(3)} {(ϑ_{n}^{(3)} - δ)}^{T} g (T_{n})] \\
& = & - \lim_{n \to \infty} E [ϑ_{n}^{(3)} g (T_{n})] η_{11.2}^{T} + \lim_{n \to \infty} E [ϑ_{n}^{(3)} {ϑ_{n}^{(3)}}^{T} g (T_{n})] - \lim_{n \to \infty} E [ϑ_{n}^{(3)} g (T_{n})] δ^{T} \\
& = & - δ η_{11.2}^{T} E [g (χ_{p_{2} + 2}^{2} (Δ))] + σ^{2} (C_{11.2}^{- 1} - C_{11}^{- 1}) E [g (χ_{p_{2} + 2}^{2} (Δ))] \\
& & + δ δ^{T} E [g (χ_{p_{2} + 4}^{2} (Δ))] - δ δ^{T} E [g (χ_{p_{2} + 2}^{2} (Δ))] .
\end{matrix}

Thus, it yields

\begin{matrix}
\lim_{n \to \infty} n E [{({\hat{β}}_{1}^{UR} - β_{1})}^{T} M ({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) g (T_{n})] & = & - η_{11.2}^{T} M δ E [g (χ_{p_{2} + 2}^{2} (Δ))] \\
& & + σ^{2} \mathrm{tr} [M (C_{11.2}^{- 1} - C_{11}^{- 1})] E [g (χ_{p_{2} + 2}^{2} (Δ))] \\
& & + δ^{T} M δ E [g (χ_{p_{2} + 4}^{2} (Δ))] - δ^{T} M δ E [g (χ_{p_{2} + 2}^{2} (Δ))] .
\end{matrix}

In a similar manner, we have

\begin{matrix}
\lim_{n \to \infty} n E [{({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR})}^{T} M ({\hat{β}}_{1}^{UR} - {\hat{β}}_{1}^{RR}) g^{2} (T_{n})] & = & \mathrm{tr} \{M [\lim_{n \to \infty} E (ϑ_{n}^{(3)} {ϑ_{n}^{(3)}}^{T} g^{2} (T_{n}))]\} \\
& = & σ^{2} \mathrm{tr} [M (C_{11.2}^{- 1} - C_{11}^{- 1})] E [g^{2} (χ_{p_{2} + 2}^{2} (Δ))] + δ^{T} M δ E [g^{2} (χ_{p_{2} + 4}^{2} (Δ))] .
\end{matrix}

Gathering all required expressions, we finally have

\begin{matrix}
Γ ({\hat{β}}_{1}^{Shrinkage}, M) & = & η_{11.2}^{T} M η_{11.2} + σ^{2} \mathrm{tr} (M C_{11.2}^{- 1}) \\
& & - 2 \{- η_{11.2}^{T} M δ E [g (χ_{p_{2} + 2}^{2} (Δ))] + σ^{2} \mathrm{tr} [M (C_{11.2}^{- 1} - C_{11}^{- 1})] E [g (χ_{p_{2} + 2}^{2} (Δ))] \\
& & \quad + δ^{T} M δ E [g (χ_{p_{2} + 4}^{2} (Δ))] - δ^{T} M δ E [g (χ_{p_{2} + 2}^{2} (Δ))]\} \\
& & + σ^{2} \mathrm{tr} [M (C_{11.2}^{- 1} - C_{11}^{- 1})] E [g^{2} (χ_{p_{2} + 2}^{2} (Δ))] + δ^{T} M δ E [g^{2} (χ_{p_{2} + 4}^{2} (Δ))] \\
& = & η_{11.2}^{T} M η_{11.2} + σ^{2} \mathrm{tr} (M C_{11.2}^{- 1}) + 2 η_{11.2}^{T} M δ E [g (χ_{p_{2} + 2}^{2} (Δ))] \\
& & + σ^{2} \mathrm{tr} [M (C_{11.2}^{- 1} - C_{11}^{- 1})] \{- 2 E [g (χ_{p_{2} + 2}^{2} (Δ))] + E [g^{2} (χ_{p_{2} + 2}^{2} (Δ))]\} \\
& & + δ^{T} M δ \{- 2 E [g (χ_{p_{2} + 4}^{2} (Δ))] + 2 E [g (χ_{p_{2} + 2}^{2} (Δ))] + E [g^{2} (χ_{p_{2} + 4}^{2} (Δ))]\} .
\end{matrix}

The proof is complete using Table A1. □
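The final risk expression can be evaluated numerically once the four expectations of $g$ are available. The Python sketch below (with NumPy) is a hypothetical helper, not code from the paper; the argument names are illustrative, and the quadratic term multiplying $δ^{T} M δ$ is taken with $p_{2} + 4$ degrees of freedom, following the display for the $g^{2}$ limit above.

```python
import numpy as np

def adqr_shrinkage(M, C11_inv, C112_inv, eta, delta, sigma2,
                   Eg2, Eg4, Eg2_sq, Eg4_sq):
    """Asymptotic quadratic risk of the general shrinkage ridge estimator.

    Eg2, Eg4       : E[g(chi2_{p2+2}(Delta))] and E[g(chi2_{p2+4}(Delta))]
    Eg2_sq, Eg4_sq : the same expectations with g replaced by g^2
    """
    diff = C112_inv - C11_inv
    # risk of the full-model ridge estimator (squared bias plus variance)
    base = eta @ M @ eta + sigma2 * np.trace(M @ C112_inv)
    cross = 2.0 * (eta @ M @ delta) * Eg2
    var_part = sigma2 * np.trace(M @ diff) * (-2.0 * Eg2 + Eg2_sq)
    bias_part = (delta @ M @ delta) * (-2.0 * Eg4 + 2.0 * Eg2 + Eg4_sq)
    return float(base + cross + var_part + bias_part)
```

Two limiting cases give a sanity check: setting all four expectations to zero ( $g \equiv 0$ ) returns the risk of ${\hat{β}}_{1}^{UR}$ , and setting them to one ( $g \equiv 1$ ) returns ${(η_{11.2} + δ)}^{T} M (η_{11.2} + δ) + σ^{2} \mathrm{tr} (M C_{11}^{- 1})$ for symmetric $M$ , which is the risk one would expect for ${\hat{β}}_{1}^{RR}$ .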

Figure A1. SRE of the suggested estimators with respect to the MLE ($\hat{β}_{1}$) for $n = 49, 100$, $ρ_{x} \in \{0.3, 0.6, 0.9\}$, $ρ = 0.90$, and $(p_{1}, p_{2}) = (5, 10)$.

Figure A2. SRE of the suggested estimators with respect to the MLE ($\hat{β}_{1}$) for $n = 49, 100$, $ρ_{x} \in \{0.3, 0.6, 0.9\}$, $ρ = 0.90$, and $(p_{1}, p_{2}) = (5, 20)$.

Figure A3. SRE of the suggested estimators with respect to the MLE ($\hat{β}_{1}$) for $n = 49, 100$, $ρ_{x} \in \{0.3, 0.6, 0.9\}$, $ρ = 0.90$, and $(p_{1}, p_{2}) = (5, 30)$.

References

1. Dai, X.; Li, E.; Tian, M. Quantile regression for varying coefficient spatial error models. Commun. Stat. Theory Methods 2019, 50, 2382–2397.
2. Higazi, S.F.; Abdel-Hady, D.H.; Al-Oulfi, S.A. Application of spatial regression models to income poverty ratios in Middle Delta contiguous counties in Egypt. Pak. J. Stat. Oper. Res. 2013, 9, 93.
3. Piscitelli, A. Spatial Regression of Juvenile Delinquency: Revisiting Shaw and McKay. Int. J. Crim. Justice Sci. 2019, 14, 132–147.
4. Liu, R.; Yu, C.; Liu, C.; Jiang, J.; Xu, J. Impacts of haze on housing prices: An empirical analysis based on data from Chengdu (China). Int. J. Environ. Res. Public Health 2018, 15, 1161.
5. Yildirim, V.; Mert, K.Y. Robust estimation approach for spatial error model. J. Stat. Comput. Simul. 2020, 90, 1618–1638.
6. Cressie, N. Statistics for Spatial Data; John Wiley & Sons: Nashville, TN, USA, 1993.
7. Cressie, N.; Wikle, C.K. Statistics for Spatio-Temporal Data; Wiley-Blackwell: Chichester, UK, 2011.
8. Haining, R. Spatial Data Analysis: Theory and Practice; Cambridge University Press: Cambridge, UK, 2003.
9. Al-Momani, M.; Riaz, M.; Saleh, M.F. Pretest and shrinkage estimation of the regression parameter vector of the marginal model with multinomial responses. Stat. Pap. 2022, 64, 2101–2117.
10. Lisawadi, S.; Ahmed, S.E.; Reangsephet, O. Post estimation and prediction strategies in negative binomial regression model. Int. J. Model. Simul. 2020, 41, 463–477.
11. Nkurunziza, S.; Al-Momani, M.; Lin, E.Y. Shrinkage and lasso strategies in high-dimensional heteroscedastic models. Commun. Stat. Theory Methods 2016, 45, 4454–4470.
12. Hoerl, A.E.; Kennard, R.W. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 1970, 12, 55–67.
13. Kejian, L. A new class of biased estimate in linear regression. Commun. Stat. Theory Methods 1993, 22, 393–402.
14. Li, Y.; Yang, H. A new Liu-type estimator in linear regression model. Stat. Pap. 2010, 53, 427–437.
15. Arashi, M.; Kibria, B.M.G.; Norouzirad, M.; Nadarajah, S. Improved preliminary test and Stein-rule Liu estimators for the ill-conditioned elliptical linear regression model. J. Multivar. Anal. 2014, 126, 53–74.
16. Arashi, M.; Norouzirad, M.; Roozbeh, M.; Khan, N.M. A high-dimensional counterpart for the ridge estimator in multicollinear situations. Mathematics 2021, 9, 3057.
17. Al-Momani, M. Liu-type pretest and shrinkage estimation for the conditional autoregressive model. PLoS ONE 2023, 18, e0283339.
18. Mardia, K.V.; Marshall, R.J. Maximum likelihood estimation of models for residual covariance in spatial regression. Biometrika 1984, 71, 135–146.
19. Al-Momani, M.; Hussein, A.A.; Ahmed, S.E. Penalty and related estimation strategies in the spatial error model. Stat. Neerl. 2016, 71, 4–30.
20. Bivand, R. R packages for Analyzing Spatial Data: A comparative case study with Areal Data. Geogr. Anal. 2022, 54, 488–518.
21. Harrison, D.; Rubinfeld, D.L. Hedonic housing prices and the demand for Clean Air. J. Environ. Econ. Manag. 1978, 5, 81–102.
22. Gilley, O.W.; Pace, R.K. On the Harrison and Rubinfeld data. J. Environ. Econ. Manag. 1996, 31, 403–405.
23. Pace, R.K.; Gilley, O.W. Using the Spatial Configuration of the Data to Improve Estimation. J. Real Estate Financ. Econ. 1997, 14, 333–340.
24. Solow, A.R. Bootstrapping correlated data. J. Int. Assoc. Math. Geol. 1985, 17, 769–775.
25. Boonstra, P.S.; Mukherjee, B.; Taylor, J.M. A small-sample choice of the tuning parameter in ridge regression. Stat. Sin. 2015, 23, 1185.
26. Seber, G.A.F. A Matrix Handbook for Statisticians; John Wiley & Sons: Hoboken, NJ, USA, 2008.
27. Lee, L. Best Spatial Two-Stage Least Squares Estimators for a Spatial Autoregressive Model with Autoregressive Disturbances. Econom. Rev. 2003, 22, 307–335.
28. Liu, S.F.; Yang, Z. Asymptotic Distribution and Finite Sample Bias Correction of QML Estimators for Spatial Error Dependence Model. Econometrics 2015, 3, 376–411.
29. Yuzbasi, B.; Arashi, M.; Ahmed, S.E. Shrinkage estimation strategies in generalized ridge regression models under low/high-dimension regime. Int. Stat. Rev. 2020, 88, 229–251.
30. Knight, K.; Fu, W. Asymptotics for lasso-type estimators. Ann. Stat. 2000, 28, 1356–1378.
31. Judge, G.G.; Bock, M.E. The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics; North-Holland Pub. Co.: Amsterdam, The Netherlands, 1978.

Figure 1. Correlation matrix for the Boston housing data.

Table 1. Full and sub-model.

Selection Criterion	Model
Full	`log(CMEDV) = log(LSTAT) + I(RM^2) + TAX + B + log(RAD) + CHAS + CRIM + PTRATIO + AGE + LAT + LON + I(NOX^2) + log(DIS) + ZN + INDUS`
Sub-model	`log(CMEDV) = log(LSTAT) + I(RM^2) + TAX + B + CRIM + PTRATIO`

Table 2. Estimated values.

Coefficient	${\hat{β}}_{1}^{UR}$	${\hat{β}}_{1}^{RR}$	${\hat{β}}_{1}^{PTR}$	${\hat{β}}_{1}^{SR}$	${\hat{β}}_{1}^{PSR}$
`Intercept`	3.8310	3.6490	3.8310	3.7882	3.7882
`log(LSTAT)`	−0.2635	−0.2872	−0.2635	−0.2691	−0.2691
`I(RM^2)`	0.0081	0.0077	0.0081	0.0080	0.0080
`TAX`	−0.0005	−0.0003	−0.0005	−0.0005	−0.0005
`B`	0.0006	0.0005	0.0006	0.0006	0.0006
`CRIM`	−0.0053	−0.0047	−0.0053	−0.0051	−0.0051
`PTRATIO`	−0.0168	−0.0157	−0.0168	−0.0165	−0.0165

Table 3. RE of the proposed estimators.

Estimator	${\hat{β}}_{1}^{UR}$	${\hat{β}}_{1}^{RR}$	${\hat{β}}_{1}^{PTR}$	${\hat{β}}_{1}^{SR}$	${\hat{β}}_{1}^{PSR}$
$R E$	1.0198	2.9468	2.8624	2.3070	2.3287

Table 4. The RE of the proposed estimators using out-of-sample data.

Estimator	${\hat{β}}_{1}^{UR}$	${\hat{β}}_{1}^{RR}$	${\hat{β}}_{1}^{PTR}$	${\hat{β}}_{1}^{SR}$	${\hat{β}}_{1}^{PSR}$
$R E$	1.0447	5.1399	1.2816	1.5649	1.5649

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
