Maxent model for Pseudo_nitzchia_fraudulenta_2


This page contains some analysis of the Maxent model for Pseudo_nitzchia_fraudulenta_2, created Mon May 30 09:25:56 BST 2022 using Maxent version 3.3.3a. If you would like to do further analyses, the raw data used here is linked to at the end of this page.


Analysis of omission/commission

The following picture shows the omission rate and predicted area as a function of the cumulative threshold. The omission rate is calculated both on the training presence records and (if test data are used) on the test records. The omission rate should be close to the predicted omission, because of the definition of the cumulative threshold.
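
As a minimal sketch of how the two curves can be computed, the Python below assumes hypothetical lists of cumulative output values (on the 0 to 100 scale) at presence records and at background points. The function names and sample values are illustrative assumptions and are not taken from Maxent's output files.

```python
def omission_rate(presence_values, threshold):
    """Fraction of presence records whose cumulative value falls below the threshold."""
    omitted = sum(1 for v in presence_values if v < threshold)
    return omitted / len(presence_values)


def fractional_predicted_area(background_values, threshold):
    """Fraction of background points predicted suitable at this threshold."""
    predicted = sum(1 for v in background_values if v >= threshold)
    return predicted / len(background_values)


# Hypothetical cumulative values, for illustration only.
presence = [4.0, 12.5, 33.0, 58.0, 91.0]
background = [0.1, 0.5, 2.0, 6.0, 9.0, 15.0, 40.0, 75.0]

for t in (1, 5, 10):
    print(t, omission_rate(presence, t), fractional_predicted_area(background, t))
```

Raising the threshold can only increase the omission rate and decrease the predicted area, which is the trade-off the plot displays.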


The next picture is the receiver operating characteristic (ROC) curve for the same data. Note that the specificity is defined using predicted area, rather than true commission (see the paper by Phillips, Anderson and Schapire cited on the help page for discussion of what this means). This implies that the maximum achievable AUC is less than 1. If test data are drawn from the Maxent distribution itself, then the maximum possible test AUC would be 0.935 rather than 1; in practice the test AUC may exceed this bound.
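
To illustrate what an AUC based on predicted area measures, the sketch below treats background points as the negative class and computes the chance that a presence record scores higher than a background point. The scores are hypothetical and the function is a plain rank-based AUC, not Maxent's own code.

```python
def auc(presence_scores, background_scores):
    """Chance that a random presence score outranks a random background score.

    Ties count as half a win. Background points stand in for absences here,
    which is an assumption of this sketch.
    """
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Hypothetical model outputs, for illustration only.
presence = [0.9, 0.7, 0.4]
background = [0.8, 0.3, 0.2, 0.1]
print(auc(presence, background))
```

Because some background points lie in genuinely suitable habitat, they can legitimately outscore presence records, which is one way to see why an AUC defined this way cannot reach 1.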



Some common thresholds and corresponding omission rates are as follows. If test data are available, binomial probabilities are calculated exactly if the number of test samples is at most 25, otherwise using a normal approximation to the binomial. These are 1-sided p-values for the null hypothesis that test points are predicted no better than by a random prediction with the same fractional predicted area. The "Balance" threshold minimizes 6 * training omission rate + .04 * cumulative threshold + 1.6 * fractional predicted area.

Cumulative threshold | Logistic threshold | Description | Fractional predicted area | Training omission rate | Test omission rate | P-value
1.000 | 0.028 | Fixed cumulative value 1 | 0.300 | 0.000 | 0.000 | 2.899E-13
5.000 | 0.114 | Fixed cumulative value 5 | 0.201 | 0.030 | 0.083 | 8.713E-14
10.000 | 0.175 | Fixed cumulative value 10 | 0.151 | 0.050 | 0.250 | 8.471E-11
3.760 | 0.097 | Minimum training presence | 0.218 | 0.000 | 0.083 | 5.07E-13
17.718 | 0.302 | 10 percentile training presence | 0.106 | 0.100 | 0.417 | 1.643E-8
17.929 | 0.306 | Equal training sensitivity and specificity | 0.106 | 0.110 | 0.417 | 1.643E-8
13.943 | 0.232 | Maximum training sensitivity plus specificity | 0.124 | 0.070 | 0.333 | 8.935E-10
6.488 | 0.133 | Equal test sensitivity and specificity | 0.183 | 0.040 | 0.167 | 9.055E-12
2.286 | 0.061 | Maximum test sensitivity plus specificity | 0.250 | 0.000 | 0.000 | 3.416E-15
2.570 | 0.069 | Balance training omission, predicted area and threshold value | 0.242 | 0.000 | 0.042 | 1.266E-13
7.681 | 0.153 | Equate entropy of thresholded and original distributions | 0.171 | 0.040 | 0.250 | 7.319E-10
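
The two calculations described above the table can be sketched in Python. The first function is one plausible reading of the exact one-sided binomial test: the probability that a random prediction covering the same fractional area would capture at least as many test points. The second evaluates the "Balance" objective for each listed row. Function and variable names are assumptions of this sketch, and because the table reports rounded areas, the computed p-value only approximates the tabulated one.

```python
from math import comb


def binomial_p_value(n_test, n_predicted, area):
    """One-sided P(X >= n_predicted) for X ~ Binomial(n_test, area)."""
    return sum(
        comb(n_test, j) * area**j * (1.0 - area) ** (n_test - j)
        for j in range(n_predicted, n_test + 1)
    )


def balance_objective(training_omission, cumulative_threshold, area):
    """Objective quoted in the text for the "Balance" threshold."""
    return 6 * training_omission + 0.04 * cumulative_threshold + 1.6 * area


# Row "Fixed cumulative value 1": 24 test records, none omitted, area 0.300.
p = binomial_p_value(24, 24, 0.300)
print(f"{p:.3e}")  # near the tabulated 2.899E-13; the area used here is rounded

# (training omission rate, cumulative threshold, fractional predicted area)
rows = {
    "Fixed cumulative value 1": (0.000, 1.000, 0.300),
    "Fixed cumulative value 5": (0.030, 5.000, 0.201),
    "Fixed cumulative value 10": (0.050, 10.000, 0.151),
    "Minimum training presence": (0.000, 3.760, 0.218),
    "10 percentile training presence": (0.100, 17.718, 0.106),
    "Equal training sensitivity and specificity": (0.110, 17.929, 0.106),
    "Maximum training sensitivity plus specificity": (0.070, 13.943, 0.124),
    "Equal test sensitivity and specificity": (0.040, 6.488, 0.183),
    "Maximum test sensitivity plus specificity": (0.000, 2.286, 0.250),
    "Balance training omission, predicted area and threshold value": (0.000, 2.570, 0.242),
    "Equate entropy of thresholded and original distributions": (0.040, 7.681, 0.171),
}
best = min(rows, key=lambda name: balance_objective(*rows[name]))
print(best)
```

Maxent minimizes the objective over all candidate thresholds, whereas this sketch compares only the rows listed in the table; among those rows the "Balance" row has the smallest value.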


(A link to the Explain tool was not made for this model. The model uses product features, while the Explain tool can only be used for additive models.)



Analysis of variable contributions


The following table gives estimates of relative contributions of the environmental variables to the Maxent model. To determine the first estimate, in each iteration of the training algorithm, the increase in regularized gain is added to the contribution of the corresponding variable, or subtracted from it if the change to the absolute value of lambda is negative. For the second estimate, for each environmental variable in turn, the values of that variable on training presence and background data are randomly permuted. The model is reevaluated on the permuted data, and the resulting drop in training AUC is shown in the table, normalized to percentages. As with the variable jackknife, variable contributions should be interpreted with caution when the predictor variables are correlated.

Variable | Percent contribution | Permutation importance
Bathy | 61.2 | 62.7
SalMin | 10.9 | 0.4
SalMax | 8 | 1
TempMean | 4.8 | 1.5
TempMax | 3.7 | 0
SalMean | 3.3 | 0.6
TempMin | 2.5 | 9
SalRange | 2.3 | 4.7
TempRange | 1 | 0.4
CVMax | 0.8 | 9.9
CVMin | 0.7 | 1.3
CVRange | 0.5 | 7.8
CVMean | 0.2 | 0.7
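
The permutation importance estimate can be sketched as follows. This is a simplified illustration under stated assumptions: the scoring function stands in for a fitted model, the data rows are hypothetical `(depth, noise)` tuples, and negative AUC drops are clamped to zero before normalizing, which is a choice made for this sketch rather than documented Maxent behavior.

```python
import random


def auc(presence_scores, background_scores):
    """Chance that a presence score outranks a background score (ties count half)."""
    wins = sum(
        1.0 if p > b else 0.5 if p == b else 0.0
        for p in presence_scores
        for b in background_scores
    )
    return wins / (len(presence_scores) * len(background_scores))


def permutation_importance(score, presence, background, names, seed=0):
    """Percent share of the AUC drop caused by permuting each variable in turn."""
    rng = random.Random(seed)
    rows = presence + background
    base = auc([score(r) for r in presence], [score(r) for r in background])
    drops = {}
    for i, name in enumerate(names):
        values = [r[i] for r in rows]
        rng.shuffle(values)  # permute this variable across presence and background
        permuted = [r[:i] + (v,) + r[i + 1:] for r, v in zip(rows, values)]
        perm_auc = auc(
            [score(r) for r in permuted[: len(presence)]],
            [score(r) for r in permuted[len(presence):]],
        )
        drops[name] = max(base - perm_auc, 0.0)  # assumption: clamp negative drops
    total = sum(drops.values())
    return {n: (100.0 * d / total if total else 0.0) for n, d in drops.items()}


# Hypothetical data: rows are (depth, noise); the toy model scores on depth alone.
presence = [(10.0 + i, float(i * 7 % 5)) for i in range(10)]
background = [(100.0 + i, float(i * 3 % 5)) for i in range(10)]
importance = permutation_importance(
    lambda r: -r[0], presence, background, ["depth", "noise"]
)
print(importance)
```

Because the toy model ignores the second variable, permuting it leaves the AUC unchanged and its importance is zero, while all of the normalized drop is assigned to the variable the model actually uses.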


The following picture shows the results of the jackknife test of variable importance. The environmental variable with highest gain when used in isolation is Bathy, which therefore appears to have the most useful information by itself. The environmental variable that decreases the gain the most when it is omitted is Bathy, which therefore appears to have the most information that isn't present in the other variables.



The next picture shows the same jackknife test, using test gain instead of training gain. Note that conclusions about which variables are most important can change, now that we're looking at test data.


Lastly, we have the same jackknife test, using AUC on test data.
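
The structure of the jackknife test can be sketched in a few lines. In a real run each subset requires retraining the model; here a hypothetical `toy_gain` function with invented numbers stands in for that step, so only the bookkeeping is illustrated, not the values reported on this page.

```python
def jackknife(variables, gain_of):
    """Gain with each variable alone and with each variable left out."""
    results = {}
    for v in variables:
        others = tuple(u for u in variables if u != v)
        results[v] = {"only": gain_of((v,)), "without": gain_of(others)}
    return results


# Hypothetical gains: each variable carries some unique information, and the
# two non-bathymetric variables also share a common component.
UNIQUE = {"Bathy": 1.0, "TempMean": 0.1, "SalMean": 0.1}
SHARED = 0.3  # obtainable from either TempMean or SalMean


def toy_gain(subset):
    """Stand-in for training a model on `subset` and reporting its gain."""
    gain = sum(UNIQUE[v] for v in subset)
    if "TempMean" in subset or "SalMean" in subset:
        gain += SHARED
    return gain


results = jackknife(list(UNIQUE), toy_gain)
best_alone = max(results, key=lambda v: results[v]["only"])
most_missed = min(results, key=lambda v: results[v]["without"])
print(best_alone, most_missed)
```

In this toy setup, omitting one of the two correlated variables costs little because the other still supplies the shared component, which is why jackknife results need care when predictors are correlated.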



Raw data outputs and control parameters


The data used in the above analysis are contained in the following links. Please see the Help button for more information on these.
The model applied to the training environmental layers
The coefficients of the model
The omission and predicted area for varying cumulative and raw thresholds
The prediction strength at the training and (optionally) test presence sites
Results for all species modeled in the same Maxent run, with summary statistics and (optionally) jackknife results


Regularized training gain is 1.779, training AUC is 0.946, unregularized training gain is 1.921.
Unregularized test gain is 1.376.
Test AUC is 0.912, standard deviation is 0.017 (calculated as in DeLong, DeLong & Clarke-Pearson 1988, equation 2).
Algorithm terminated after 500 iterations (7 seconds).

The following settings were used during the run:
100 presence records used for training, 24 for testing.
1222 points used to determine the Maxent distribution (background points and presence points).
Environmental layers used (all continuous): Bathy CVMax CVMean CVMin CVRange SalMax SalMean SalMin SalRange TempMax TempMean TempMin TempRange
Regularization values: linear/quadratic/product: 0.050, categorical: 0.250, threshold: 1.000, hinge: 0.500
Feature types used: hinge product linear threshold quadratic
jackknife: true
outputdirectory: models/Pseudo_nitzchia_fraudulenta
samplesfile: occurrences/Pseudo_nitzchia_fraudulenta.csv
environmentallayers: backgrounds/Alexandrium_minutum_background.csv
randomseed: true
warnings: false
askoverwrite: false
randomtestpoints: 20
replicates: 5
replicatetype: subsample
autorun: true
writeplotdata: true
Command line used: -e backgrounds/Alexandrium_minutum_background.csv -s occurrences/Pseudo_nitzchia_fraudulenta.csv -J -o models/Pseudo_nitzchia_fraudulenta noaskoverwrite logistic threshold -X 20 replicates=5 betamultiplier=1 writeclampgrid=true writemess=true nowarnings writeplotdata=true -a Subsample linear=true quadratic=true product=true threshold=true hinge=true togglelayertype=NA
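
For checking the run settings programmatically, the sketch below splits the recorded command line into tokens and collects the `key=value` arguments into a dictionary. It makes no assumption about the meaning or arity of the dash-prefixed flags, which are simply kept in a separate list; the variable names are illustrative.

```python
import shlex

# Command line copied from the report above.
COMMAND_LINE = (
    "-e backgrounds/Alexandrium_minutum_background.csv "
    "-s occurrences/Pseudo_nitzchia_fraudulenta.csv "
    "-J -o models/Pseudo_nitzchia_fraudulenta "
    "noaskoverwrite logistic threshold -X 20 replicates=5 betamultiplier=1 "
    "writeclampgrid=true writemess=true nowarnings writeplotdata=true "
    "-a Subsample linear=true quadratic=true product=true threshold=true "
    "hinge=true togglelayertype=NA"
)

tokens = shlex.split(COMMAND_LINE)

# Arguments written as key=value become dictionary entries.
settings = dict(t.split("=", 1) for t in tokens if "=" in t)

# Everything else (flags, their values, bare keywords) is left uninterpreted.
other_tokens = [t for t in tokens if "=" not in t]

for key, value in settings.items():
    print(f"{key}: {value}")
```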