# Multidimensional Arrays for Analysing Geoscientific Data


## Abstract


## 1. Introduction

## 2. What Are Multidimensional Arrays?

#### 2.1. Types of Arrays

#### 2.2. Array Abstraction of Space and Time

#### 2.3. What Array Cell Values Refer to

## 3. Array Operations

#### 3.1. Select Operations

#### 3.2. Scale Operations

#### 3.3. Reduce Operations

#### 3.4. Rearrange Operations

#### 3.5. Compute Operations

## 4. Implementations

## 5. Methods on Arrays

#### 5.1. Regridding and Change of Support

#### 5.2. Dimension Reduction

## 6. Study Cases of Regridding and Dimension Reduction

#### 6.1. Study Case: Regridding

#### 6.2. Study Case: Dimension Reduction

- bands are variables (columns), temporal-spatial points are observations ($Mb_{11137\times 6}$)
- times are variables, spectral-spatial points are observations ($Mt_{1554\times 43}$)
- spectral time series are variables, spatial points are observations ($Mbt_{259\times 258}$)
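The three matrix sizes above are consistent with an array of 259 spatial points, 6 bands and 43 time steps (259 × 43 = 11137, 259 × 6 = 1554, 6 × 43 = 258). The following is a minimal sketch of how such an array could be unfolded into the three matrices and passed to a PCA, assuming NumPy, a hypothetical dimension order (space, band, time), and random placeholder values instead of the study data. The helper `pca_loadings` is illustrative and is not taken from the study.

```python
import numpy as np

# Assumed sizes, inferred from the matrix dimensions in the list above.
n_space, n_band, n_time = 259, 6, 43

# Placeholder data with the hypothetical dimension order (space, band, time).
rng = np.random.default_rng(0)
cube = rng.normal(size=(n_space, n_band, n_time))

# Mb: bands are variables; each row is one (time, space) combination.
Mb = cube.transpose(2, 0, 1).reshape(n_time * n_space, n_band)

# Mt: times are variables; each row is one (band, space) combination.
Mt = cube.transpose(1, 0, 2).reshape(n_band * n_space, n_time)

# Mbt: spectral time series are variables; each row is one spatial point.
Mbt = cube.reshape(n_space, n_band * n_time)


def pca_loadings(matrix, k=4):
    """Return the first k PC loadings (one column per component) via SVD."""
    centred = matrix - matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[:k].T


loadings_b = pca_loadings(Mb)    # shape (6, 4)
loadings_t = pca_loadings(Mt)    # shape (43, 4)
loadings_bt = pca_loadings(Mbt)  # shape (258, 4)
```

Which dimension is treated as the variable decides what the loadings describe: spectral patterns for `Mb`, temporal patterns for `Mt`, and joint spectral-temporal patterns for `Mbt`.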

## 7. Discussion

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Meaning |
---|---|
DGGs | Discrete Global Grids |
PCA | Principal Component Analysis |
EOF | Empirical Orthogonal Functions |
MNF | Maximum Noise Fraction |
CCA | Canonical Correlation Analysis |
Fmask | Function of mask |
SVD | Singular Value Decomposition |
UTC | Coordinated Universal Time |


**Figure 2.** Array operations, from top to bottom: select, scale, reduce, rearrange and compute operations. “A” indicates original arrays, “B” indicates result arrays after certain array operations are applied. Applying “reduce” functions changes array cardinality; applying “rearrange” functions alters array dimensions.

**Figure 3.** Regridding: original values are available for the grid indicated by grey lines, and new values are required for the black-lined grid (e.g., the red cell), or vice versa (e.g., the green cell). New cell values can be calculated from the intersecting grid areas (lower left), from intersecting grid cell centre points (lower right), or by interpolation (e.g., from black cells or cell centre points to the green cell).

**Figure 4.** Comparison of bilinear resampling with projectRaster and with gdalwarp for reprojecting and resampling a Landsat 8 image to the grid of the MODIS 09Q1 image.

**Figure 5.** A comparison between using the bilinear and the nearest neighbour methods to align the Landsat TM band to the MODIS grid.

**Figure 6.** PC loadings (1–4) of $Mb$ (**a**), $Mt$ (**b**) and $Mbt$ (**c**). The brown vertical line indicates the time of the harvesting event. The points between two grey vertical lines are spatial points of a spectral band.

**Figure 7.** The procedure of the study case and the corresponding array operations, R and SciDB functions.

**Table 1.** Comparison of natively supported array operations in open-source data analysis software. UDF indicates user-defined functions. Support of specific features is encoded as single letters, where `0` and `I` indicate support of sparse storage and irregular array dimensions, respectively; `G` denotes whether geographic dimensions (space, time) are handled explicitly; and `S` represents whether the support of cells is considered appropriately. ${D}_{n}$ and ${V}_{m}$ are the maximum numbers of dimensions and attributes, with $n$ and $m$ the limits; absence indicates no limits. `LA` indicates that linear algebra routines for vectors and matrices are available.

Software | Features | Select Operations | Scale Operations | Reduce Operations | Rearrange Operations | Compute Operations |
---|---|---|---|---|---|---|
R (base) | ${V}_{1}$, LA | `[]`, `which` | `apply` | `apply` | `aperm`, `dim<-`, `t`, `as` | `apply`, `+`, `-`, `*`, `/`, `!`, `&&`, `\|\|`, ... |
R (raster) | G, ${D}_{3}$, ${V}_{1}$ | `[]`, `crop`, `which` | `aggregate`, `disaggregate`, `calc`, `resample` | `calc` | `as` | `focal`, `calc`, `+`, `-`, `*`, `/`, `!`, `&&`, `\|\|`, ... |
R (spacetime) | S, 0, G, I, ${D}_{4}$ | `[]` | `aggregate` | `apply` | `as` | `+`, `-`, `*`, `/`, `!`, `&&`, `\|\|`, ... |
GNU Octave | ${V}_{1}$, LA | `,[]` | | `accumarray`, `sum`, `prod`, `sumsq` | `reshape`, `permute`, `ipermute`, `vertcat`, `horzcat`, `flipud`, `flip`, `rot90`, `rotdim`, `vec` | `arrayfun`, `cumsum`, `cumprod`, `+`, `-`, `*`, `/`, `not`, `and`, `or`, ... |
Python (NumPy) | ${V}_{1}$, LA | `[]`, `take`, `clip` | `apply_over_axes`, `apply_along_axis` | `min`, `max`, `trace`, `prod`, `sum`, `mean`, `var`, `std`, `apply_over_axes`, `apply_along_axis` | `flatten`, `ravel`, `swapaxes`, `transpose`, `reshape`, `resize`, `squeeze` | `cumsum`, `cumprod`, `+`, `-`, `*`, `/`, `not`, `and`, `or`, ... |
Rasdaman CE (RASQL) | G | `select`, `where`, `[]`, `.` | `scale` | `condense` | | `+`, `-`, `*`, `/`, `not`, `and`, `or`, ... |
SciDB CE (AFL) | 0, G, LA | `subarray`, `between`, `project`, `filter`, `slice` | `regrid`, `xgrid` | `aggregate` | `redimension`, `reshape`, `transpose` | `window`, `cumulate`, `+`, `-`, `*`, `/`, `apply`, `not`, `and`, `or`, ... |
ArrayUDF | G | UDF | UDF | UDF | UDF | UDF |
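To make the five operation categories of Table 1 concrete, the sketch below applies one example of each to a small array, assuming NumPy and a hypothetical dimension order (time, y, x). The array contents and the 2 × 2 block size are illustrative choices and do not come from the article.

```python
import numpy as np

# Hypothetical array with dimensions (time, y, x) = (2, 4, 6).
a = np.arange(2 * 4 * 6, dtype=float).reshape(2, 4, 6)

# Select: subset the array by index ranges along y and x.
selected = a[:, 1:3, 2:5]                              # shape (2, 2, 3)

# Scale: coarsen the spatial resolution by averaging 2 x 2 blocks of cells.
scaled = a.reshape(2, 2, 2, 3, 2).mean(axis=(2, 4))    # shape (2, 2, 3)

# Reduce: collapse the time dimension to its mean, which removes a dimension.
reduced = a.mean(axis=0)                               # shape (4, 6)

# Rearrange: reorder the dimensions to (y, x, time) without changing values.
rearranged = a.transpose(1, 2, 0)                      # shape (4, 6, 2)

# Compute: cell-wise arithmetic followed by a cumulative sum along time.
computed = np.cumsum(a * 2.0, axis=0)                  # shape (2, 4, 6)
```

Only the select, scale and reduce examples change the number of cells; rearrange and compute keep every cell and change either the layout or the values.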

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Lu, M.; Appel, M.; Pebesma, E. Multidimensional Arrays for Analysing Geoscientific Data. *ISPRS Int. J. Geo-Inf.* **2018**, *7*, 313.
https://doi.org/10.3390/ijgi7080313
