# sgdm: An R Package for Performing Sparse Generalized Dissimilarity Modelling with Tools for gdm

## Abstract

## 1. Introduction

## 2. General sgdm Package Description

- R> # Package installation from GitHub
- R> devtools::install_github("sparsegdm/sgdm_package")
- R> # Loading package
- R> library(sgdm)

## 3. Running a SGDM Model in the sgdm Package

- R> # Parameterize SGDM
- R> sgdm.gs <- sgdm.param(predData = spectra, bioData = trees, k = 30)

- R> # Retrieving and building the best SGDM model
- R> sgdm.model <- sgdm.best(perf.matrix = sgdm.gs, predData = spectra, bioData = trees, output = "m", k = 30)

- R> # Retrieving the sparse canonical components corresponding to the best GDM model
- R> sgdm.sccbest <- sgdm.best(perf.matrix = sgdm.gs, predData = spectra, bioData = trees, output = "c", k = 30)
- R> # Retrieving the sparse canonical vectors corresponding to the best GDM model
- R> sgdm.vbest <- sgdm.best(perf.matrix = sgdm.gs, predData = spectra, bioData = trees, output = "v", k = 30)

- R> # Applying SCCA transformation onto the prediction map
- R> component.image <- predData.transform(predData = spectral.image, v = sgdm.vbest)

## 4. Additional Tools Useful for GDM and SGDM

- R> # Combining pair site data for GDM model variable contribution check
- R> spData.sccbest <- gdm::formatsitepair(bioData = trees, bioFormat = 1, dist = "bray", abundance = TRUE, siteColumn = "Plot_ID", XColumn = "X", YColumn = "Y", predData = sgdm.sccbest)
- R> # Checking SGDM variable drop contribution
- R> gdm.varcont(spData = spData.sccbest)
- R> # Checking significance of variable contributions
- R> sigtest.sgdm <- gdm.varsig(predData = sgdm.sccbest, bioData = trees)
- R> # Excluding non-significant variables
- R> sgdm.sccbest.red <- data.reduce(data = sgdm.sccbest, datatype = "pred", sigtest = sigtest.sgdm)
- R> # Combining pair site data for input in GDM
- R> spData.sccbest.red <- gdm::formatsitepair(bioData = trees, bioFormat = 1, dist = "bray", abundance = TRUE, siteColumn = "Plot_ID", XColumn = "X", YColumn = "Y", predData = sgdm.sccbest.red)
- R> # Final SGDM model
- R> sgdm.model.red <- gdm::gdm(data = spData.sccbest.red)

- R> # 10-fold cross-validation of the final SGDM model
- R> gdm.cv(spData = spData.sccbest.red, nfolds = 10, performance = "r2")

- R> # Mapping community composition
- R> map.sitepairs <- gdm.map(spData = spData.sccbest.red, model = sgdm.model.red)

- R> # Reducing canonical component map to significant components
- R> component.image.red <- data.reduce(data = component.image, datatype = "pred", sigtest = sigtest.sgdm)
- R> # Mapping community composition in space
- R> map.image <- gdm.map(spData = spData.sccbest.red, predMap = component.image.red, model = sgdm.model.red, k = 8)

## 5. Discussion

## 6. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

**Figure 1.** Illustrative workflow for deriving a raster map of Cerrado tree communities within the sgdm package.

**Figure 2.** False colour RGB composite of the spectral.image predictor map: a subset of the hyperspectral Hyperion image covering an area of natural vegetation in the Brazilian Cerrado, acquired on 27 June 2014 (DOY 178). The overlaid yellow triangles represent the sample locations, for which both the biological (trees) and predictor (spectra) datasets were derived.

**Figure 3.** Performance matrix showing the model root mean square error (RMSE) for each pair of penalization parameters. The x axis shows the penalization values for the environmental matrix (from 0.6 to 1), and the y axis shows those for the biological matrix (also from 0.6 to 1).

**Figure 4.** Plot of the fitted generalized dissimilarity modelling (GDM) model and respective I-splines.

**Figure 6.** Plots of the eight non-metric multidimensional scaling (NMDS) axes representing the tree community transitions in the study area, according to the sparse generalized dissimilarity modelling (SGDM) model predictions.

