# Spatial Outlier Accommodation Using a Spatial Variance Shift Outlier Model

## Abstract

## 1. Introduction

## 2. Variance Shift Outlier Model in the Classical Regression Model

## 3. The Proposed VSOM in Spatial Regression Model (SVSOM)

#### 3.1. Identification of Inflation in Variance

#### 3.2. Proposed Asymptotic Distribution of the ${t}_{si}^{2}$

- The GSM in Equation (9) is fitted to obtain $\widehat{\rho}$, $\widehat{\lambda}$, $\widehat{\beta}$ and ${\widehat{\sigma}}_{s}^{2}$.
- Using the estimated $\widehat{\rho}$, $\widehat{\lambda}$, $\widehat{\beta}$ and X, generate a new dependent variable vector, $\tilde{y}$, such that $$\tilde{y}=\widehat{\rho}{W}_{1}\tilde{y}+X\widehat{\beta}+{(I-\widehat{\lambda}{W}_{2})}^{-1}\tilde{\epsilon}.$$
- With the newly generated $\tilde{y}$, fit Equation (9) and obtain the squared spatial studentized residuals (${t}_{si}^{2}$). Compute the $100(1-\alpha)$th percentile of the ${t}_{si}^{2}$ for any suitable $\alpha $.
- Repeat the two preceding steps 10,000 times and save the percentile from each replication in a vector $ts$, which gives the empirical distribution of the percentile of the ${t}_{si}^{2}$.
- Calculate the median of the vector $ts$ and use it as the threshold for ${t}_{si}^{2}$.
- Declare any $i$th observation whose ${t}_{si}^{2}$ exceeds the threshold an outlier.
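The procedure above can be sketched in Python as follows. This is only an illustrative outline under stated assumptions, not the authors' implementation: the function names are hypothetical, $\tilde{\epsilon}$ is assumed to be drawn from a normal distribution with standard deviation ${\widehat{\sigma}}_{s}$ (the text does not specify its distribution), and `sq_resid_fn` is a user-supplied placeholder that must refit Equation (9) and return the ${t}_{si}^{2}$ values. The data generation uses the reduced form $\tilde{y}={(I-\widehat{\rho}{W}_{1})}^{-1}\left[X\widehat{\beta}+{(I-\widehat{\lambda}{W}_{2})}^{-1}\tilde{\epsilon}\right]$, which follows directly from the generating equation.

```python
import numpy as np


def simulate_y(rho, lam, beta, X, W1, W2, sigma, rng):
    """Draw y~ from y~ = rho*W1*y~ + X*beta + (I - lam*W2)^{-1} eps~.

    Assumption: eps~ is i.i.d. N(0, sigma^2); the source does not state this.
    """
    n = X.shape[0]
    identity = np.eye(n)
    eps = rng.normal(0.0, sigma, size=n)
    # Spatially autocorrelated disturbance: (I - lam*W2)^{-1} eps~
    u = np.linalg.solve(identity - lam * W2, eps)
    # Reduced form: y~ = (I - rho*W1)^{-1} (X*beta + u)
    return np.linalg.solve(identity - rho * W1, X @ beta + u)


def bootstrap_threshold(rho, lam, beta, X, W1, W2, sigma, sq_resid_fn,
                        alpha=0.05, n_rep=10_000, seed=0):
    """Median over replications of the 100(1 - alpha)th percentile of t_si^2.

    sq_resid_fn(y, X, W1, W2) is a hypothetical callable that refits the
    model and returns the vector of squared spatial studentized residuals.
    """
    rng = np.random.default_rng(seed)
    ts = np.empty(n_rep)
    for b in range(n_rep):
        y_new = simulate_y(rho, lam, beta, X, W1, W2, sigma, rng)
        t2 = np.asarray(sq_resid_fn(y_new, X, W1, W2))
        ts[b] = np.quantile(t2, 1.0 - alpha)
    return float(np.median(ts))


def flag_outliers(t2_observed, threshold):
    """Indices of observations whose t_si^2 exceeds the threshold."""
    return np.flatnonzero(np.asarray(t2_observed) > threshold)
```

In use, the fitted estimates from the first step are passed to `bootstrap_threshold`, and the resulting threshold is compared with the observed ${t}_{si}^{2}$ through `flag_outliers`. A smaller `n_rep` is convenient while experimenting, since each replication requires a full model refit.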

## 4. Spatial Outlier Accommodation

## 5. Simulation Experiment

- Because contamination of the dependent variable, y, influences the model fit, the SVSOM accurately identifies and classifies the outliers.
- As the simulation results show, an outlier in the residual term does not mask other locations and therefore yields a high power under the contamination criteria. Because the SVSOM is robust, the effect of the contamination is neutralized, and a better estimate of the true parameter is obtained.
- Contamination of x and $\epsilon $ has little influence on the fit, as the simulation study shows. The power statistics indicate that this combination masks other locations, which reduces the power.
- Contamination of y and $\epsilon $ yields results similar to those for contamination of y alone. Although some contaminated and masked locations are flagged as outliers as a result of the contamination of $\epsilon $, contamination of y is almost always detected because of its influence on the fitted model.
- As with contamination of y and $\epsilon $, contamination of y, $\epsilon $ and x masks other locations, which lowers the power.
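For readers who want to reproduce this kind of summary, the short sketch below computes a power and a false-positive rate from a set of flagged locations. The definitions used here are the conventional ones and are an assumption, since this section does not restate them: power is the share of contaminated locations that are flagged, and the false-positive rate is the share of clean locations that are flagged. The function name is hypothetical.

```python
def detection_rates(flagged, contaminated, n):
    """Return (power, false_positive_rate) for one simulated data set.

    Assumed definitions (not taken from the source):
      power               = flagged contaminated locations / contaminated locations
      false_positive_rate = flagged clean locations / clean locations
    """
    flagged = set(flagged)
    contaminated = set(contaminated)
    clean = set(range(n)) - contaminated
    power = len(flagged & contaminated) / len(contaminated)
    false_positive_rate = len(flagged & clean) / len(clean)
    return power, false_positive_rate


# Example: 20 locations, locations 3 and 7 contaminated, three locations flagged.
power, fpr = detection_rates(flagged=[3, 7, 12], contaminated=[3, 7], n=20)
```

Averaging these two quantities over many simulated data sets gives summary figures comparable in kind to those reported for each contamination source.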

## 6. Numerical Example

#### 6.1. Artificial Data

#### 6.2. Georgia State COVID-19 Data

## 7. Conclusions

**Figure 1.** The graph of ${\widehat{\sigma}}_{s}^{2}$ against ${\widehat{\beta}}_{1}$ for $\rho =0.4$ and $\lambda =0.5$ from different sources of contamination.

**Figure 2.** The graph of ${\widehat{\sigma}}_{s}^{2}$ against ${\widehat{\beta}}_{1}$ for $\rho =0.7$ and $\lambda =0.8$ from different sources of contamination.

**Figure 3.** The graph of ${\widehat{\sigma}}_{s}^{2}$ against $MSE\left({\widehat{\beta}}_{1}\right)$ for $\rho =0.4$ and $\lambda =0.5$ from different sources of contamination.

**Figure 4.** The graph of ${\widehat{\sigma}}_{s}^{2}$ against $MSE\left({\widehat{\beta}}_{1}\right)$ for $\rho =0.7$ and $\lambda =0.8$ from different sources of contamination.

**Figure 6.** Scatter plot of (**a**) the dependent variable against its average neighbour and (**b**) the autocorrelated residual against its average neighbour in the simulated data for $\rho =0.4$ and $\lambda =0.5$, with two contaminated observations (red dots) in the dependent variable, y. (**a**) y against $Wy$. (**b**) $\xi $ against $W\xi $.

**Figure 9.** Index plot of (**a**) ${t}_{si}^{2}$ and (**b**) ${\widehat{\omega}}_{si}$ for the Georgia COVID-19 data.

**Figure 11.** Choropleth map of (**a**) the number of COVID-19 cases and (**b**) the squared spatial studentized prediction residual, showing outliers in the Georgia counties as detected by the SVSOM.

**Table 1.** The estimates of ${\beta}_{0}$, ${\beta}_{1}$ and $MSE\left(\widehat{\beta}\right)$ in the GSM and SVSOM.

| Contamination Source | ${\sigma}_{s}^{2}$ | $(\rho ,\lambda )$ | GSM ${\overline{\widehat{\beta}}}_{0}$ | GSM ${\overline{\widehat{\beta}}}_{1}$ | GSM $MSE\left(\widehat{\beta}\right)$ | SVSOM ${\overline{\widehat{\beta}}}_{0}$ | SVSOM ${\overline{\widehat{\beta}}}_{1}$ | SVSOM $MSE\left(\widehat{\beta}\right)$ |
|---|---|---|---|---|---|---|---|---|
| y | 0.01 | (0.4, 0.5) | 1.241 | 9.952 | 0.300 | 0.993 | 10.001 | 0.149 |
| $\epsilon $ | 0.01 | (0.4, 0.5) | 1.005 | 10.000 | 0.033 | 1.000 | 10.000 | 0.001 |
| y and $\epsilon $ | 0.01 | (0.4, 0.5) | 1.263 | 9.949 | 0.311 | 1.004 | 10.000 | 0.147 |
| x and $\epsilon $ | 0.01 | (0.4, 0.5) | 1.016 | 9.998 | 0.016 | 1.000 | 10.000 | 0.001 |
| y, $\epsilon $ and x | 0.01 | (0.4, 0.5) | 1.253 | 9.950 | 0.300 | 1.017 | 9.998 | 0.150 |
| y | 0.01 | (0.7, 0.8) | 1.271 | 9.949 | 0.621 | 0.884 | 9.997 | 0.505 |
| $\epsilon $ | 0.01 | (0.7, 0.8) | 1.013 | 10.000 | 0.015 | 0.999 | 10.000 | 0.001 |
| y and $\epsilon $ | 0.01 | (0.7, 0.8) | 1.256 | 9.954 | 0.594 | 0.872 | 10.001 | 0.509 |
| x and $\epsilon $ | 0.01 | (0.7, 0.8) | 1.030 | 9.997 | 0.015 | 1.000 | 10.000 | 0.001 |
| y, $\epsilon $ and x | 0.01 | (0.7, 0.8) | 1.262 | 9.952 | 0.607 | 0.920 | 9.998 | 0.508 |
| y | 0.10 | (0.4, 0.5) | 1.255 | 9.950 | 0.312 | 1.001 | 9.999 | 0.149 |
| $\epsilon $ | 0.10 | (0.4, 0.5) | 1.005 | 10.000 | 0.018 | 1.000 | 10.000 | 0.007 |
| y and $\epsilon $ | 0.10 | (0.4, 0.5) | 1.252 | 9.951 | 0.303 | 1.007 | 9.999 | 0.146 |
| x and $\epsilon $ | 0.10 | (0.4, 0.5) | 1.017 | 9.998 | 0.017 | 1.001 | 10.000 | 0.007 |
| y, $\epsilon $ and x | 0.10 | (0.4, 0.5) | 1.252 | 9.951 | 0.298 | 1.017 | 9.998 | 0.146 |
| y | 0.10 | (0.7, 0.8) | 1.227 | 9.952 | 0.544 | 1.136 | 9.999 | 0.498 |
| $\epsilon $ | 0.10 | (0.7, 0.8) | 1.0125 | 10.000 | 0.021 | 1.000 | 10.000 | 0.014 |
| y and $\epsilon $ | 0.10 | (0.7, 0.8) | 1.217 | 9.956 | 0.539 | 1.139 | 10.001 | 0.494 |
| x and $\epsilon $ | 0.10 | (0.7, 0.8) | 1.027 | 9.997 | 0.020 | 1.000 | 10.000 | 0.014 |
| y, $\epsilon $ and x | 0.10 | (0.7, 0.8) | 1.235 | 9.952 | 0.531 | 1.183 | 9.998 | 0.488 |
| y | 1.00 | (0.4, 0.5) | 1.250 | 9.950 | 0.312 | 1.007 | 9.999 | 0.166 |
| $\epsilon $ | 1.00 | (0.4, 0.5) | 1.004 | 10.000 | 0.076 | 0.997 | 10.000 | 0.076 |
| y and $\epsilon $ | 1.00 | (0.4, 0.5) | 1.252 | 9.951 | 0.314 | 1.008 | 10.000 | 0.168 |
| x and $\epsilon $ | 1.00 | (0.4, 0.5) | 1.016 | 9.998 | 0.074 | 0.998 | 10.000 | 0.075 |
| y, $\epsilon $ and x | 1.00 | (0.4, 0.5) | 1.253 | 9.951 | 0.310 | 1.007 | 9.997 | 0.167 |
| y | 1.00 | (0.7, 0.8) | 1.245 | 9.948 | 0.579 | 1.115 | 10.000 | 0.523 |
| $\epsilon $ | 1.00 | (0.7, 0.8) | 1.013 | 10.000 | 0.015 | 0.999 | 10.000 | 0.001 |
| y and $\epsilon $ | 1.00 | (0.7, 0.8) | 1.217 | 9.955 | 0.562 | 1.174 | 10.001 | 0.529 |
| x and $\epsilon $ | 1.00 | (0.7, 0.8) | 1.041 | 9.996 | 0.137 | 1.000 | 9.998 | 0.142 |
| y, $\epsilon $ and x | 1.00 | (0.7, 0.8) | 1.201 | 9.957 | 0.554 | 1.200 | 9.999 | 0.519 |

| Source of Contamination | y | $\epsilon $ | y and $\epsilon $ | x and $\epsilon $ | y, $\epsilon $ and x |
|---|---|---|---|---|---|
| Power of test | 0.943 | 1.000 | 0.527 | 0.750 | 0.353 |
| False positive | 0.000 | 0.018 | 0.000 | 0.025 | 0.000 |

| Parameter Estimate | OLS | S.E. | GSM | S.E. | SVSOM | S.E. |
|---|---|---|---|---|---|---|
| $\widehat{\rho}$ | - | - | 0.540 | 0.163 | 0.540 | 0.097 |
| ${\widehat{\beta}}_{0}$ | 2628.392 | 2127.713 | 1087.739 | 1830.686 | 2853.433 | 1196.100 |
| ${\widehat{\beta}}_{1}$ | 14.311 | 2.982 | 9.783 | 2.566 | 6.192 | 1.605 |
| ${\widehat{\beta}}_{2}$ | −56.245 | 34.539 | −54.140 | 29.718 | −12.421 | 18.461 |
| ${\widehat{\beta}}_{3}$ | 45.318 | 9.902 | 40.332 | 8.520 | 28.747 | 5.603 |
| ${\widehat{\beta}}_{4}$ | 4.173 | 12.791 | −6.221 | 11.006 | −0.237 | 6.823 |
| ${\widehat{\beta}}_{5}$ | −34.673 | 16.275 | −28.587 | 14.003 | −19.267 | 9.744 |
| ${\widehat{\beta}}_{6}$ | −9.718 | 24.979 | 4.829 | 21.492 | −25.474 | 13.556 |
| ${\widehat{\omega}}_{s3}$ | - | - | - | - | 4.185 | - |
| ${\widehat{\omega}}_{s26}$ | - | - | - | - | 27.581 | - |
| ${\widehat{\omega}}_{s49}$ | - | - | - | - | 7.202 | - |
| ${\widehat{\omega}}_{s50}$ | - | - | - | - | 50.309 | - |
| ${\widehat{\omega}}_{s70}$ | - | - | - | - | 10.267 | - |
| ${\widehat{\omega}}_{s120}$ | - | - | - | - | 9.083 | - |
| ${\widehat{\omega}}_{s128}$ | - | - | - | - | 4.578 | - |
| ${\widehat{\omega}}_{s135}$ | - | - | - | - | 3.944 | - |
| ${\widehat{\omega}}_{s141}$ | - | - | - | - | 6.549 | - |
| ${\widehat{\omega}}_{s142}$ | - | - | - | - | 3.943 | - |

