# Spatial Outlier Accommodation Using a Spatial Variance Shift Outlier Model

## Abstract

## 1. Introduction

## 2. Variance Shift Outlier Model in the Classical Regression Model

## 3. The Proposed VSOM in Spatial Regression Model (SVSOM)

#### 3.1. Identification of Inflation in Variance

#### 3.2. Proposed Asymptotic Distribution of the ${t}_{si}^{2}$

1. Fit the GSM in Equation (9) to obtain $\widehat{\rho}$, $\widehat{\lambda}$, $\widehat{\beta}$ and ${\widehat{\sigma}}_{s}^{2}$.
2. Using the estimated $\widehat{\rho}$, $\widehat{\lambda}$, $\widehat{\beta}$ and $X$, generate a new dependent variable vector, $\tilde{y}$, such that $$\tilde{y}=\widehat{\rho}{W}_{1}\tilde{y}+X\widehat{\beta}+{(I-\widehat{\lambda}{W}_{2})}^{-1}\tilde{\epsilon}.$$
3. Fit Equation (9) to the newly generated $\tilde{y}$ and obtain the squared spatial studentized residuals (${t}_{si}^{2}$). Compute the $100(1-\alpha)$th percentile of the ${t}_{si}^{2}$ for a suitable $\alpha $.
4. Repeat steps 2 and 3 10,000 times and save the resulting percentiles in a vector $ts$, which gives their empirical distribution.
5. Take the median of the vector $ts$ from step 4 as the threshold for ${t}_{si}^{2}$.
6. Declare the ith observation an outlier if its ${t}_{si}^{2}$ exceeds the threshold.
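The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the errors $\tilde{\epsilon}$ are assumed Gaussian with variance ${\widehat{\sigma}}_{s}^{2}$, the function names (`simulate_gsm`, `filtered_t2`, `bootstrap_threshold`) are hypothetical, and `filtered_t2` is a simplified stand-in for ${t}_{si}^{2}$ that holds $\widehat{\rho}$ and $\widehat{\lambda}$ fixed and studentizes OLS residuals of the spatially filtered data instead of refitting Equation (9) on every replicate.

```python
import numpy as np


def simulate_gsm(X, beta, rho, lam, sigma2, W1, W2, rng):
    """Draw y~ from y~ = rho*W1*y~ + X*beta + (I - lam*W2)^{-1} eps."""
    n = X.shape[0]
    eye = np.eye(n)
    # Assumption: Gaussian errors with the estimated variance.
    eps = rng.normal(0.0, np.sqrt(sigma2), size=n)
    u = np.linalg.solve(eye - lam * W2, eps)        # spatially correlated error
    return np.linalg.solve(eye - rho * W1, X @ beta + u)


def filtered_t2(y, X, W1, W2, rho, lam):
    """Stand-in for t_si^2 (hypothetical): squared internally studentized
    OLS residuals after filtering y and X with fixed rho and lam."""
    n, p = X.shape
    B = np.eye(n) - lam * W2
    ys = B @ (y - rho * (W1 @ y))                   # filtered response
    Xs = B @ X                                      # filtered design matrix
    Q, _ = np.linalg.qr(Xs)                         # reduced QR, Q is n x p
    h = np.sum(Q ** 2, axis=1)                      # leverages, diag of hat matrix
    resid = ys - Q @ (Q.T @ ys)
    s2 = resid @ resid / (n - p)
    return resid ** 2 / (s2 * (1.0 - h))


def bootstrap_threshold(X, beta, rho, lam, sigma2, W1, W2,
                        alpha=0.05, n_rep=10_000, seed=0):
    """Median over replicates of the 100(1 - alpha)th percentile of t^2."""
    rng = np.random.default_rng(seed)
    ts = np.empty(n_rep)
    for r in range(n_rep):
        y_tilde = simulate_gsm(X, beta, rho, lam, sigma2, W1, W2, rng)
        t2 = filtered_t2(y_tilde, X, W1, W2, rho, lam)
        ts[r] = np.quantile(t2, 1.0 - alpha)
    return float(np.median(ts))
```

Given the squared residuals of the observed data in an array `t2_obs`, the flagged locations would then be `np.flatnonzero(t2_obs > threshold)`. A faithful version would replace `filtered_t2` with a routine that re-estimates the GSM for each simulated $\tilde{y}$.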

## 4. Spatial Outlier Accommodation

## 5. Simulation Experiment

- Because contamination in the dependent variable, y, influences the model fit, the SVSOM accurately detects and classifies these outliers.
- As the simulation results show, an outlier in the residual term does not mask other locations and therefore yields high power under the contamination criteria. Because the SVSOM is robust, the effect of the contamination is neutralized, and an estimate closer to the true parameter is generally obtained.
- Contamination in x and $\epsilon $ has little influence on the fit, as the simulation study shows. The power statistics indicate that this combination masks other locations, which reduces the power.
- Contamination in y and $\epsilon $ yields results similar to those of contamination in y alone. Although some contaminated and masked locations are flagged as outliers because of the contamination in $\epsilon $, contamination in y is almost always detected because of its influence on the fitted model.
- As with contamination in y and $\epsilon $, contamination in y, $\epsilon $ and x masks other locations, which lowers the power.
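To make the power and false-positive statistics discussed above concrete, the following sketch computes both for a single simulated data set. It assumes that power is the fraction of contaminated locations that are flagged and that the false-positive rate is the fraction of clean locations that are flagged; these definitions and the function name `detection_rates` are assumptions for illustration, since the exact contamination criteria are not restated here.

```python
def detection_rates(flagged, contaminated, n):
    """Return (power, false_positive_rate) for one data set of n locations.

    flagged      -- indices declared outliers by the detection rule
    contaminated -- indices that were actually contaminated
    """
    flagged, contaminated = set(flagged), set(contaminated)
    n_clean = n - len(contaminated)
    # Assumed definition: share of contaminated locations that were flagged.
    power = (len(flagged & contaminated) / len(contaminated)
             if contaminated else float("nan"))
    # Assumed definition: share of clean locations that were flagged.
    false_pos = (len(flagged - contaminated) / n_clean
                 if n_clean else float("nan"))
    return power, false_pos


# Two contaminated locations, both detected, plus one clean location flagged.
power, false_pos = detection_rates([3, 7, 12], [3, 7], n=20)
```

Averaging these two quantities over the simulation replicates would give summary figures of the kind reported for each contamination source.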

## 6. Numerical Example

#### 6.1. Artificial Data

#### 6.2. Georgia State COVID-19 Data

## 7. Conclusions

## Author Contributions

## Funding

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest


**Figure 1.** The graph of ${\widehat{\sigma}}_{s}^{2}$ against ${\widehat{\beta}}_{1}$ for $\rho =0.4$ and $\lambda =0.5$ from different sources of contamination.

**Figure 2.** The graph of ${\widehat{\sigma}}_{s}^{2}$ against ${\widehat{\beta}}_{1}$ for $\rho =0.7$ and $\lambda =0.8$ from different sources of contamination.

**Figure 3.** The graph of ${\widehat{\sigma}}_{s}^{2}$ against $MSE\left({\widehat{\beta}}_{1}\right)$ for $\rho =0.4$ and $\lambda =0.5$ from different sources of contamination.

**Figure 4.** The graph of ${\widehat{\sigma}}_{s}^{2}$ against $MSE\left({\widehat{\beta}}_{1}\right)$ for $\rho =0.7$ and $\lambda =0.8$ from different sources of contamination.

**Figure 6.** Scatter plot of (**a**) the dependent variable against its neighbourhood average ($y$ against $Wy$) and (**b**) the autocorrelated residual against its neighbourhood average ($\xi $ against $W\xi $) in the simulated data for $\rho =0.4$ and $\lambda =0.5$, with two contaminated observations (red dots) in the dependent variable, y.

**Figure 9.** Index plot of (**a**) ${t}_{si}^{2}$ and (**b**) ${\widehat{\omega}}_{si}$ for the Georgia COVID-19 data.

**Figure 11.** Choropleth map of (**a**) the number of COVID-19 cases and (**b**) the squared spatial studentized prediction residuals, showing outliers in the Georgia counties as detected by the SVSOM.

**Table 1.** The estimates of ${\beta}_{0}$, ${\beta}_{1}$ and $MSE\left(\widehat{\beta}\right)$ in the GSM and SVSOM.

Contamination Source | ${\sigma}_{s}^{2}$ | $(\rho,\lambda)$ | GSM ${\overline{\widehat{\beta}}}_{0}$ | GSM ${\overline{\widehat{\beta}}}_{1}$ | GSM $MSE\left(\widehat{\beta}\right)$ | SVSOM ${\overline{\widehat{\beta}}}_{0}$ | SVSOM ${\overline{\widehat{\beta}}}_{1}$ | SVSOM $MSE\left(\widehat{\beta}\right)$ |
---|---|---|---|---|---|---|---|---|
y | 0.01 | (0.4, 0.5) | 1.241 | 9.952 | 0.300 | 0.993 | 10.001 | 0.149 |
$\epsilon $ | 0.01 | (0.4, 0.5) | 1.005 | 10.000 | 0.033 | 1.000 | 10.000 | 0.001 |
y and $\epsilon $ | 0.01 | (0.4, 0.5) | 1.263 | 9.949 | 0.311 | 1.004 | 10.000 | 0.147 |
x and $\epsilon $ | 0.01 | (0.4, 0.5) | 1.016 | 9.998 | 0.016 | 1.000 | 10.000 | 0.001 |
y, $\epsilon $ and x | 0.01 | (0.4, 0.5) | 1.253 | 9.950 | 0.300 | 1.017 | 9.998 | 0.150 |
y | 0.01 | (0.7, 0.8) | 1.271 | 9.949 | 0.621 | 0.884 | 9.997 | 0.505 |
$\epsilon $ | 0.01 | (0.7, 0.8) | 1.013 | 10.000 | 0.015 | 0.999 | 10.000 | 0.001 |
y and $\epsilon $ | 0.01 | (0.7, 0.8) | 1.256 | 9.954 | 0.594 | 0.872 | 10.001 | 0.509 |
x and $\epsilon $ | 0.01 | (0.7, 0.8) | 1.030 | 9.997 | 0.015 | 1.000 | 10.000 | 0.001 |
y, $\epsilon $ and x | 0.01 | (0.7, 0.8) | 1.262 | 9.952 | 0.607 | 0.920 | 9.998 | 0.508 |
y | 0.10 | (0.4, 0.5) | 1.255 | 9.950 | 0.312 | 1.001 | 9.999 | 0.149 |
$\epsilon $ | 0.10 | (0.4, 0.5) | 1.005 | 10.000 | 0.018 | 1.000 | 10.000 | 0.007 |
y and $\epsilon $ | 0.10 | (0.4, 0.5) | 1.252 | 9.951 | 0.303 | 1.007 | 9.999 | 0.146 |
x and $\epsilon $ | 0.10 | (0.4, 0.5) | 1.017 | 9.998 | 0.017 | 1.001 | 10.000 | 0.007 |
y, $\epsilon $ and x | 0.10 | (0.4, 0.5) | 1.252 | 9.951 | 0.298 | 1.017 | 9.998 | 0.146 |
y | 0.10 | (0.7, 0.8) | 1.227 | 9.952 | 0.544 | 1.136 | 9.999 | 0.498 |
$\epsilon $ | 0.10 | (0.7, 0.8) | 1.0125 | 10.000 | 0.021 | 1.000 | 10.000 | 0.014 |
y and $\epsilon $ | 0.10 | (0.7, 0.8) | 1.217 | 9.956 | 0.539 | 1.139 | 10.001 | 0.494 |
x and $\epsilon $ | 0.10 | (0.7, 0.8) | 1.027 | 9.997 | 0.020 | 1.000 | 10.000 | 0.014 |
y, $\epsilon $ and x | 0.10 | (0.7, 0.8) | 1.235 | 9.952 | 0.531 | 1.183 | 9.998 | 0.488 |
y | 1.00 | (0.4, 0.5) | 1.250 | 9.950 | 0.312 | 1.007 | 9.999 | 0.166 |
$\epsilon $ | 1.00 | (0.4, 0.5) | 1.004 | 10.000 | 0.076 | 0.997 | 10.000 | 0.076 |
y and $\epsilon $ | 1.00 | (0.4, 0.5) | 1.252 | 9.951 | 0.314 | 1.008 | 10.000 | 0.168 |
x and $\epsilon $ | 1.00 | (0.4, 0.5) | 1.016 | 9.998 | 0.074 | 0.998 | 10.000 | 0.075 |
y, $\epsilon $ and x | 1.00 | (0.4, 0.5) | 1.253 | 9.951 | 0.310 | 1.007 | 9.997 | 0.167 |
y | 1.00 | (0.7, 0.8) | 1.245 | 9.948 | 0.579 | 1.115 | 10.000 | 0.523 |
$\epsilon $ | 1.00 | (0.7, 0.8) | 1.013 | 10.000 | 0.015 | 0.999 | 10.000 | 0.001 |
y and $\epsilon $ | 1.00 | (0.7, 0.8) | 1.217 | 9.955 | 0.562 | 1.174 | 10.001 | 0.529 |
x and $\epsilon $ | 1.00 | (0.7, 0.8) | 1.041 | 9.996 | 0.137 | 1.000 | 9.998 | 0.142 |
y, $\epsilon $ and x | 1.00 | (0.7, 0.8) | 1.201 | 9.957 | 0.554 | 1.200 | 9.999 | 0.519 |

Source of Contamination | y | $\epsilon $ | y and $\epsilon $ | x and $\epsilon $ | y, $\epsilon $ and x |
---|---|---|---|---|---|
Power of test | 0.943 | 1.000 | 0.527 | 0.750 | 0.353 |
False positive | 0.000 | 0.018 | 0.000 | 0.025 | 0.000 |

Parameter Estimate | OLS | S.E. (OLS) | GSM | S.E. (GSM) | SVSOM | S.E. (SVSOM) |
---|---|---|---|---|---|---|
$\widehat{\rho}$ | - | - | 0.540 | 0.163 | 0.540 | 0.097 |
${\widehat{\beta}}_{0}$ | 2628.392 | 2127.713 | 1087.739 | 1830.686 | 2853.433 | 1196.100 |
${\widehat{\beta}}_{1}$ | 14.311 | 2.982 | 9.783 | 2.566 | 6.192 | 1.605 |
${\widehat{\beta}}_{2}$ | −56.245 | 34.539 | −54.140 | 29.718 | −12.421 | 18.461 |
${\widehat{\beta}}_{3}$ | 45.318 | 9.902 | 40.332 | 8.520 | 28.747 | 5.603 |
${\widehat{\beta}}_{4}$ | 4.173 | 12.791 | −6.221 | 11.006 | −0.237 | 6.823 |
${\widehat{\beta}}_{5}$ | −34.673 | 16.275 | −28.587 | 14.003 | −19.267 | 9.744 |
${\widehat{\beta}}_{6}$ | −9.718 | 24.979 | 4.829 | 21.492 | −25.474 | 13.556 |
${\widehat{\omega}}_{s3}$ | - | - | - | - | 4.185 | - |
${\widehat{\omega}}_{s26}$ | - | - | - | - | 27.581 | - |
${\widehat{\omega}}_{s49}$ | - | - | - | - | 7.202 | - |
${\widehat{\omega}}_{s50}$ | - | - | - | - | 50.309 | - |
${\widehat{\omega}}_{s70}$ | - | - | - | - | 10.267 | - |
${\widehat{\omega}}_{s120}$ | - | - | - | - | 9.083 | - |
${\widehat{\omega}}_{s128}$ | - | - | - | - | 4.578 | - |
${\widehat{\omega}}_{s135}$ | - | - | - | - | 3.944 | - |
${\widehat{\omega}}_{s141}$ | - | - | - | - | 6.549 | - |
${\widehat{\omega}}_{s142}$ | - | - | - | - | 3.943 | - |

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Baba, A.M.; Midi, H.; Abd Rahman, N.H.
Spatial Outlier Accommodation Using a Spatial Variance Shift Outlier Model. *Mathematics* **2022**, *10*, 3182.
https://doi.org/10.3390/math10173182
