# Transdisciplinary Foundations of Geospatial Data Science

## Abstract

## 1. Introduction

#### 1.1. Motivation

#### 1.2. Transdisciplinary Foundations: Mathematics, Statistics and Computer Science

#### 1.3. Geospatial Techniques

#### 1.4. Scope and Outline

## 2. Foundations of Hotspot Detection

#### 2.1. Mathematical Foundation of Hotspot Detection

#### 2.2. Statistical Foundation of Hotspot Detection

#### 2.3. Computer Science Foundation of Hotspot Detection

$O(N^{2})$, where $N$ represents the number of points in the dataset. If we improve completeness by considering three-point circles (i.e., circles with three points on the circumference), this cost increases polynomially to $O(N^{3})$. If we consider all possible circular regions for total completeness, the enumeration cost becomes prohibitive. Finally, for statistical significance testing, the exact distribution of interest measure values (e.g., the likelihood ratio) is unknown except for very simple density measures [23], so a p-value often cannot be computed in closed form. Thus, Monte Carlo simulation is commonly used in practice to simulate the null hypothesis through a large number of trials (e.g., 10,000), which requires repeated execution of the detection algorithm and multiplies the computational cost.
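The cost argument above can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the text: a unit-square study area with edge effects ignored, candidate circles centred on one point and passing through another, a Poisson log likelihood ratio as the interest measure, a 50% cap on circle area, and a small default number of trials. All function names are hypothetical.

```python
import math
import random


def log_likelihood_ratio(c, e, n):
    """Poisson log likelihood ratio for c observed vs. e expected points inside."""
    if c <= e:
        return 0.0  # only denser-than-expected circles are hotspot candidates
    inside = c * math.log(c / e)
    outside = (n - c) * math.log((n - c) / (n - e)) if c < n else 0.0
    return inside + outside


def best_circle_score(points, max_area_fraction=0.5):
    """Best score over circles centred on one point and passing through another.

    There are N * (N - 1) candidate circles, i.e. O(N^2); the naive count of
    points inside each circle makes this sketch O(N^3) overall.
    """
    n = len(points)
    best = 0.0
    for cx, cy in points:
        for px, py in points:
            r2 = (px - cx) ** 2 + (py - cy) ** 2
            area = math.pi * r2
            if r2 == 0.0 or area > max_area_fraction:
                continue  # skip degenerate and overly large circles
            c = sum((x - cx) ** 2 + (y - cy) ** 2 <= r2 for x, y in points)
            e = n * area  # expected count under uniformity (assumption: unit square)
            best = max(best, log_likelihood_ratio(c, e, n))
    return best


def monte_carlo_p_value(points, trials=99, seed=0):
    """Estimate a p-value by rerunning detection on uniformly random datasets."""
    rng = random.Random(seed)
    observed = best_circle_score(points)
    n = len(points)
    exceed = 0
    for _ in range(trials):
        simulated = [(rng.random(), rng.random()) for _ in range(n)]
        if best_circle_score(simulated) >= observed:
            exceed += 1
    # Standard Monte Carlo estimate: rank of the observed score among all runs.
    return observed, (exceed + 1) / (trials + 1)
```

Each trial reruns the full enumeration, so the total work is roughly (trials + 1) times the cost of a single detection run, which is why a setting such as 10,000 trials quickly dominates the running time.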

## 3. Foundations of Colocation Detection

#### 3.1. Mathematical Foundation of Colocation Detection

#### 3.2. Statistical Foundation of Colocation Detection

#### 3.3. Computer Science Foundation of Colocation Detection

## 4. Foundations of Spatial Prediction

#### 4.1. Mathematical Foundation of Spatial Prediction

#### 4.2. Statistical Foundation of Spatial Prediction

#### 4.3. Computer Science Foundation of Spatial Prediction

## 5. Foundations of Spatial Outlier Detection

#### 5.1. Mathematical Foundation of Spatial Outlier Detection

#### 5.2. Statistical Foundation of Spatial Outlier Detection

#### 5.3. Computer Science Foundation of Spatial Outlier Detection

## 6. Foundations of Teleconnection Discovery

#### 6.1. Mathematical Foundation of Teleconnection Discovery

#### 6.2. Statistical Foundation of Teleconnection Discovery

#### 6.3. Computer Science Foundation of Teleconnection

## 7. Discussion

#### 7.1. Gaps and Opportunities

## 8. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix A. Transdisciplinary Foundations: Mathematics, Statistics and Computer Science

#### A.1. Mathematics

#### A.2. Statistics

**Figure A1.** Example of spatial statistics. (**a**) Shows the distribution of three types of events, colored green, orange and yellow. (**b**) Shows a grid partition that splits the original study area into four cells. (**c**) Shows a neighbor graph generated from the point distribution in (**a**).

Event Pairs | Pearson’s Correlation Coefficient | Participation Index |
---|---|---|
(Green, Orange) | −0.90 | 0.67 |
(Yellow, Orange) | 1 | 1 |
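The two measures in the table can be contrasted with a small Python sketch. It assumes that Pearson's correlation is computed on per-cell counts of a grid partition and that the participation index is the smaller of the two participation ratios under a distance-based neighbor relation. The coordinates, the neighbor radius, and the function names are hypothetical and do not reproduce the data behind Figure A1 or the values in the table.

```python
import math


def participation_index(a_points, b_points, radius):
    """Participation index of the pair (A, B): the smaller participation ratio."""
    def has_neighbor(p, others):
        return any(math.dist(p, q) <= radius for q in others)

    ratio_a = sum(has_neighbor(a, b_points) for a in a_points) / len(a_points)
    ratio_b = sum(has_neighbor(b, a_points) for b in b_points) / len(b_points)
    return min(ratio_a, ratio_b)


def grid_counts(points, cells_per_side=2):
    """Count points per cell of a regular grid over the unit square (row-major)."""
    counts = [0] * (cells_per_side * cells_per_side)
    for x, y in points:
        col = min(int(x * cells_per_side), cells_per_side - 1)
        row = min(int(y * cells_per_side), cells_per_side - 1)
        counts[row * cells_per_side + col] += 1
    return counts


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical coordinates in the unit square (not the data behind Figure A1).
green = [(0.1, 0.1), (0.2, 0.2), (0.8, 0.8)]
orange = [(0.15, 0.12), (0.85, 0.2), (0.9, 0.25)]

r = pearson(grid_counts(green), grid_counts(orange))
pi_value = participation_index(green, orange, radius=0.1)
```

The correlation depends on how the study area is partitioned into cells, whereas the participation index depends only on the neighbor relation between individual points.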

#### A.3. Computer Science

**Figure 1.** Crime hotspots detected by SaTScan. The point dataset contains 3781 cases of assault crimes in San Francisco, 2017. The map shows three statistically significant hotspots in blue.

**Figure 4.** Spatial Classification Problem [6]. (**a**) input high-resolution aerial imagery, Chanhassen, Minnesota, USA; (**b**) decision tree prediction with salt-and-pepper errors highlighted in white circle; (**c**) ground truth class map: red is dry land and green is wetland.

**Figure 5.** Examples of spatial statistics for spatial outlier detection [62]. (**a**) Original input data points; (**b**) Spatial outliers found by variogram cloud; (**c**) Spatial outliers found by Moran scatter plot; (**d**) Spatial outliers found by scatter plot; (**e**) Spatial outliers found by spatial statistic test.

**Table 1.** Approaches for addressing challenges of statistical significance testing of teleconnection patterns.

Challenge | Approach |
---|---|
Spatial and temporal autocorrelation. | Spatio-temporal dependencies are captured by using a time series decomposition that requires each end of the dipole to share the same global component and by using an auto-regressive term to capture time dependencies. |
Seasonality and trends. | The time series decomposition captures the seasonality and trends by extracting the underlying governing time series against local noise variations. |
Generating random samples under the null hypothesis. | A “wild bootstrap” approach generates samples by multiplying the residuals by random noise. |
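The last row of Table 1 can be illustrated with a minimal Python sketch of a wild bootstrap. It assumes the fitted values and residuals of some model are already available and uses random signs (Rademacher weights) as the noise, which is one common choice. The table does not specify the noise distribution, and the time series decomposition and auto-regressive term are not reproduced here. The function name and parameters are hypothetical.

```python
import random


def wild_bootstrap_samples(fitted, residuals, n_samples, seed=0):
    """Generate resampled series by multiplying each residual by random noise.

    Assumption: the noise is a random sign (+1 or -1) drawn independently
    per observation; other zero-mean, unit-variance weights are also used.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        signs = [rng.choice((-1.0, 1.0)) for _ in residuals]
        samples.append([f + r * s for f, r, s in zip(fitted, residuals, signs)])
    return samples
```

Because each residual keeps its own magnitude and only its sign is randomized, the resampled series preserve observation-specific variance.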

