# pyjeo: A Python Package for the Analysis of Geospatial Data

## Abstract

## 1. Introduction

## 2. Design of Pyjeo

#### 2.1. From C/C++ to Python

The **properties** module is a general module for getting and setting properties of the basic classes Jim (for raster datasets) and JimVect (for vector datasets). For example, the method properties.getBBox() returns the bounding box of a Jim instance in its spatial reference system. Input and output operations, such as reading datasets from disk and writing them to disk, are covered in the **io** module. The **geometry** module groups operations that relate to the geometry of a dataset, for example subsetting data cubes, aggregating raster pixels based on vector feature overlays (i.e., zonal statistics), and warping. Statistical functions such as minimum, maximum, mean, and histogram are part of the **stats** module. Algorithmic functions based on pixel operations and neighborhood operations are grouped in the modules **pixops** and **ngbops**, respectively. Typical pixel operations are data type conversion, thresholding and masking, and pixel-wise arithmetic. Many of these functions can also be called directly on the raster object (Jim) using operator overloading.
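To illustrate the operator-overloading idea, the sketch below defines a small stand-in raster class in plain Python. The class `ToyRaster` and its helper `_apply` are hypothetical and are not part of pyjeo. They only show how comparison and arithmetic operators can be mapped to pixel-wise thresholding, masking, and arithmetic.

```python
from typing import Callable, List, Union

Number = Union[int, float]


class ToyRaster:
    """Hypothetical single-band raster for illustration; NOT the pyjeo Jim class."""

    def __init__(self, rows: List[List[Number]]):
        self.rows = [list(r) for r in rows]

    def _apply(self, other, op: Callable[[Number, Number], Number]) -> "ToyRaster":
        # Pair up pixels of an equally sized raster, or broadcast a scalar.
        if isinstance(other, ToyRaster):
            return ToyRaster([[op(a, b) for a, b in zip(ra, rb)]
                              for ra, rb in zip(self.rows, other.rows)])
        return ToyRaster([[op(a, other) for a in ra] for ra in self.rows])

    def __add__(self, other):
        # Pixel-wise addition.
        return self._apply(other, lambda a, b: a + b)

    def __mul__(self, other):
        # Pixel-wise multiplication; multiplying by a 0/1 raster acts as a mask.
        return self._apply(other, lambda a, b: a * b)

    def __gt__(self, other):
        # Thresholding: 1 where the pixel exceeds the threshold, 0 elsewhere.
        return self._apply(other, lambda a, b: int(a > b))


dn = ToyRaster([[10, 200], [35, 90]])
mask = dn > 50        # threshold
masked = dn * mask    # mask out pixels at or below the threshold
shifted = dn + 5      # pixel-wise arithmetic with a scalar
```

The design choice shown here is that each operator returns a new raster object, so expressions such as `dn * (dn > 50)` can be chained like ordinary arithmetic.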

Algorithms in the **classify** module include multilayer artificial neural networks, in both fully connected and sparsely connected form, as well as support vector machines and symbolic machine learning (SML [14]). Other classification algorithms can be integrated through the bridge to other data models (see also Section 2.2); the same holds for algorithms not related to classification. Operations that are specific to digital elevation models can be called from the **demops** module, e.g., calculation of slope (directions), pit removal [15,16], and contribution of drainage areas [17]. Finally, the **ccops** module supports several connected-component operations, including image segmentation algorithms such as watersheds and constrained connectivity [18], and the calculation of different distance measures (e.g., Euclidean and geodesic).
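As a rough illustration of what a connected-component operation computes, the following sketch labels 4-connected foreground regions of a small binary image using a breadth-first search. It is a generic textbook procedure written in plain Python, not the ccops implementation, and it covers neither watersheds nor constrained connectivity. The function name `label_components` is hypothetical.

```python
from collections import deque
from typing import List


def label_components(binary: List[List[int]]) -> List[List[int]]:
    """Label 4-connected foreground regions; background pixels keep label 0."""
    nrow, ncol = len(binary), len(binary[0])
    labels = [[0] * ncol for _ in range(nrow)]
    current = 0
    for r in range(nrow):
        for c in range(ncol):
            # Skip background pixels and pixels that already have a label.
            if binary[r][c] == 0 or labels[r][c] != 0:
                continue
            current += 1
            labels[r][c] = current
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                # Visit the four edge-adjacent neighbours.
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < nrow and 0 <= nx < ncol
                            and binary[ny][nx] != 0 and labels[ny][nx] == 0):
                        labels[ny][nx] = current
                        queue.append((ny, nx))
    return labels


image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
labels = label_components(image)
```

In this example the three pixels in the upper left form one component and the two pixels on the right edge form a second one.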

#### 2.2. Data Model

#### 2.3. Integrating Pyjeo for Big Data Analytics

## 3. Use Cases

#### 3.1. Large-Scale Processing with Pyjeo in JEO-Batch

#### 3.2. Interactive Processing with Pyjeo in JEO-Lab

## 4. Conclusions

**Figure 3.** Using SWIG to build a Python interface from C++. Header files are shown in white, C/C++ source files in gray, object files in green, the dynamic library in cyan, and Python modules in yellow.

**Figure 4.** The pyjeo raster data model is a multi-band 3D dataset, represented by the Python class Jim. The function stackPlane adds a plane to the dataset, whereas stackBand adds a new band with the same dimensions as the existing bands.

**Figure 5.** Sentinel-2 global composite based on a minimum set of overlapping images, visualized from a JupyterLab notebook in JEO-lab.

**Figure 6.** Interactive analysis and visualization in JEO-lab. The composition function is defined in Python using pyjeo functions (left). Execution is deferred: each tile is processed in parallel at the extent and scale the user is viewing in the interactive map viewer (right).

| Rank | Description | SCL Code |
|---|---|---|
| 0 | Vegetation | 4 |
| 1 | Bare Soils | 5 |
| 2 | Water | 6 |
| 3 | Dark Area Pixels | 2 |
| 4 | Snow/Ice | 11 |
| 5 | Cirrus | 10 |
| 6 | Cloud Shadows | 3 |
| 7 | Clouds low probability/Unclassified | 7 |
| 8 | Clouds medium probability | 8 |
| 9 | Clouds high probability | 9 |
| 10 | Saturated/Defective | 1 |
| 11 | No Data | 0 |
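One way to read the ranking above is as a priority order for choosing among several observations of the same pixel. The sketch below assumes that a lower rank is preferred, which the ordering suggests (vegetation first, no data last) but which is an assumption here. The function name `best_observation`, the data layout, and the sample values are hypothetical and are not taken from pyjeo or from the processing chain itself.

```python
from typing import Dict, Sequence, Tuple

# SCL code -> rank, transcribed from the table above.
SCL_RANK: Dict[int, int] = {
    4: 0, 5: 1, 6: 2, 2: 3, 11: 4, 10: 5,
    3: 6, 7: 7, 8: 8, 9: 9, 1: 10, 0: 11,
}


def best_observation(observations: Sequence[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the (scl_code, value) pair with the lowest rank.

    Assumption: lower rank means higher priority. Ties are resolved in
    favour of the observation that comes first in the sequence.
    """
    return min(observations, key=lambda obs: SCL_RANK[obs[0]])


# Illustrative time series for one pixel: (SCL code, reflectance value).
pixel_series = [(9, 0.81), (3, 0.12), (4, 0.34), (5, 0.29)]
chosen = best_observation(pixel_series)
```

With these made-up values, the vegetation observation (SCL code 4) is selected over the bare-soil, cloud-shadow, and high-probability cloud observations.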

| Task | Throughput (tiles/hour/core) | Total Processing Time (h) |
|---|---|---|
| selection | 1.5 | 20 |
| Sen2Cor | 0.6 | 50 |
| compositing | 2 | 15 |
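The figures in the table above can be cross-checked with a short calculation. Multiplying the per-core throughput by the total processing time gives the number of tiles handled per core, under the assumption that the total processing time is wall-clock time with all cores busy. The number of cores and the number of tiles are not given in the table, so the sketch does not derive them.

```python
import math

# (throughput in tiles/hour/core, total processing time in hours), from the table.
tasks = {
    "selection": (1.5, 20.0),
    "Sen2Cor": (0.6, 50.0),
    "compositing": (2.0, 15.0),
}

# tiles per core = throughput * time (assumes every core is busy throughout).
tiles_per_core = {name: rate * hours for name, (rate, hours) in tasks.items()}

for name, tiles in tiles_per_core.items():
    print(f"{name}: {tiles:.1f} tiles per core")
```

All three tasks come out at about 30 tiles per core, so the throughput and time columns are mutually consistent under that assumption.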

