Replicated Maxent model for Pseudo_nitzchia_seriata


This page summarizes the results of 5 split-sample models for Pseudo_nitzchia_seriata, created Mon May 30 09:33:21 BST 2022 using Maxent version 3.3.3a. The individual models are here: [0] [1] [2] [3] [4]


Analysis of omission/commission

The following picture shows the test omission rate and predicted area as a function of the cumulative threshold, averaged over the replicate runs. The omission rate should be close to the predicted omission, because of the definition of the cumulative threshold.
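The quantities plotted against the cumulative threshold can be sketched as follows. This is a minimal illustration, not Maxent's own code: the function names are hypothetical, the inputs are assumed to be cumulative output values on a 0 to 100 scale, and the choice of strict versus non-strict comparison at the threshold is an assumption.

```python
def omission_rate(test_cumulative, threshold):
    """Fraction of test presence records whose cumulative value falls
    below the threshold, i.e. presences the thresholded model misses."""
    return sum(v < threshold for v in test_cumulative) / len(test_cumulative)


def predicted_area(background_cumulative, threshold):
    """Fraction of background cells at or above the threshold, i.e. the
    share of the study area predicted as suitable."""
    return sum(v >= threshold for v in background_cumulative) / len(background_cumulative)


def predicted_omission(threshold):
    """Omission expected from the definition of the cumulative output:
    a cumulative threshold of t should omit about t percent of presences."""
    return threshold / 100.0
```

Comparing `omission_rate` with `predicted_omission` across thresholds reproduces the check described above: a test omission curve far from the predicted line suggests the test and training data differ.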


The next picture is the receiver operating characteristic (ROC) curve for the same data, again averaged over the replicate runs. Note that the specificity is defined using predicted area, rather than true commission (see the paper by Phillips, Anderson and Schapire cited on the help page for discussion of what this means). The average test AUC for the replicate runs is 0.785, and the standard deviation is 0.009.
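Because specificity is measured against predicted area, the AUC here can be read as the probability that a randomly chosen presence site scores higher than a randomly chosen background site. The sketch below illustrates that reading and the averaging over replicates. The function names are hypothetical, and whether Maxent reports the sample or the population standard deviation is an assumption (the sample form is used here).

```python
from statistics import mean, stdev


def auc_presence_vs_background(presence_scores, background_scores):
    """Probability that a random presence score exceeds a random
    background score, counting ties as one half (rank-based AUC)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def summarize_replicates(auc_values):
    """Mean and sample standard deviation of per-replicate test AUCs.
    Assumption: sample SD; use statistics.pstdev for the population form."""
    return mean(auc_values), stdev(auc_values)
```

Passing the five per-replicate test AUC values to `summarize_replicates` would give a mean and spread comparable to the figures quoted above, subject to the standard deviation convention.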



Pictures of the model



Analysis of variable contributions


The following table gives estimates of relative contributions of the environmental variables to the Maxent model. To determine the first estimate, in each iteration of the training algorithm, the increase in regularized gain is added to the contribution of the corresponding variable, or subtracted from it if the change to the absolute value of lambda is negative. For the second estimate, for each environmental variable in turn, the values of that variable on training presence and background data are randomly permuted. The model is reevaluated on the permuted data, and the resulting drop in training AUC is shown in the table, normalized to percentages. As with the variable jackknife, variable contributions should be interpreted with caution when the predictor variables are correlated. Values shown are averages over replicate runs.

Variable     Percent contribution   Permutation importance
Bathy        81.9                   74.1
TempMin      9.5                    9.6
TempRange    4.7                    6.3
TempMean     0.9                    2.1
SalMax       0.8                    0.8
TempMax      0.6                    1.5
SalMean      0.3                    0.9
SalRange     0.3                    0.7
CVMin        0.2                    0.7
CVMax        0.2                    1.4
CVRange      0.2                    0.6
SalMin       0.2                    0.5
CVMean       0                      0.8
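The permutation importance procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not Maxent's implementation: `score` is a hypothetical stand-in for the fitted model, rows are plain lists of variable values, and clamping negative AUC drops to zero before normalizing is an assumption.

```python
import random


def auc(pos, neg):
    """Rank-based AUC of presence scores against background scores."""
    wins = sum(1.0 if p > b else 0.5 if p == b else 0.0 for p in pos for b in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(score, presence, background, n_vars, seed=0):
    """Percent permutation importance for each of n_vars variables.

    score      -- hypothetical callable mapping one row to a suitability value
    presence   -- list of rows (lists) at training presence sites
    background -- list of rows (lists) at background sites
    """
    rng = random.Random(seed)
    base = auc([score(r) for r in presence], [score(r) for r in background])
    rows = presence + background
    drops = []
    for j in range(n_vars):
        # Permute variable j across the pooled presence and background rows.
        column = [r[j] for r in rows]
        rng.shuffle(column)
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        p_rows, b_rows = permuted[:len(presence)], permuted[len(presence):]
        new_auc = auc([score(r) for r in p_rows], [score(r) for r in b_rows])
        # Assumption: a permutation that happens to raise AUC counts as zero drop.
        drops.append(max(base - new_auc, 0.0))
    total = sum(drops)
    # Normalize the drops so they sum to 100 percent.
    return [100.0 * (d / total) if total > 0 else 0.0 for d in drops]
```

A variable the model ignores yields a drop of exactly zero, which is why correlated predictors can look unimportant here even when they carry real signal.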


The following picture shows the results of the jackknife test of variable importance. The environmental variable with highest gain when used in isolation is Bathy, which therefore appears to have the most useful information by itself. The environmental variable that decreases the gain the most when it is omitted is Bathy, which therefore appears to have the most information that isn't present in the other variables. Values shown are averages over replicate runs.
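The two jackknife conclusions above follow from simple comparisons of gains. The sketch below assumes the gains have already been collected into dictionaries keyed by variable name; the function name and the example values are hypothetical and are not taken from this report.

```python
def jackknife_summary(full_gain, gain_only, gain_without):
    """Summarize a jackknife test of variable importance.

    full_gain    -- gain of the model trained on all variables
    gain_only    -- {variable: gain of a model using only that variable}
    gain_without -- {variable: gain of a model omitting that variable}
    """
    # Highest gain in isolation: most useful information by itself.
    best_alone = max(gain_only, key=gain_only.get)
    # Lowest gain when omitted: most information not present elsewhere.
    most_unique = min(gain_without, key=gain_without.get)
    # Gain lost by leaving each variable out.
    drops = {name: full_gain - g for name, g in gain_without.items()}
    return best_alone, most_unique, drops


# Hypothetical gains for illustration only.
example = jackknife_summary(
    full_gain=1.0,
    gain_only={"var_a": 0.8, "var_b": 0.3},
    gain_without={"var_a": 0.4, "var_b": 0.9},
)
```

The same function applies unchanged to test gain or test AUC, which is how the later jackknife plots differ from this one.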



The next picture shows the same jackknife test, using test gain instead of training gain. Note that conclusions about which variables are most important can change, now that we're looking at test data.


Lastly, we have the same jackknife test, using AUC on test data.




Command line to repeat this species model: java density.MaxEnt nowarnings noprefixes -E "" -E Pseudo_nitzchia_seriata jackknife outputdirectory=models/Pseudo_nitzchia_seriata samplesfile=occurrences/Pseudo_nitzchia_seriata.csv environmentallayers=backgrounds/Gymnodinium_catenatum_background.csv randomseed nowarnings noaskoverwrite randomtestpoints=20 replicates=5 replicatetype=subsample autorun writeplotdata
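The options `randomtestpoints=20 replicates=5 replicatetype=subsample` in the command above describe five runs that each hold out a random 20 percent of the occurrence records for testing. A minimal sketch of that split-sample scheme follows; the function name is hypothetical, and how Maxent rounds the number of test points is an assumption.

```python
import random


def subsample_splits(records, test_percent=20, replicates=5, seed=None):
    """Yield (train, test) pairs, one per replicate.

    Each replicate draws a fresh random test set of about test_percent
    percent of the records; the remainder is used for training.
    """
    rng = random.Random(seed)
    # Assumption: the test count is rounded to the nearest integer.
    n_test = round(len(records) * test_percent / 100)
    for _ in range(replicates):
        shuffled = list(records)
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]
```

Because each replicate is drawn independently, a record can appear in the test set of several replicates, unlike a cross-validation scheme where every record is tested exactly once.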