# A Suite of Tools for ROC Analysis of Spatial Models


## Abstract


## 1. Introduction

**Figure 1.** (**a**) Map of probability and (**b**) binary map of event, for 100 grid cells. Grid cells with high to medium probability (black and dark grey cells) tend to coincide with the 11 event grid cells (black).

**Table 1.** Contingency table used to compute a threshold point on the ROC curve. H_{t}, F_{t}, M_{t}, and C_{t} are, respectively, the proportions of grid cells corresponding to hits, false alarms, misses and correct rejections (modified from Pontius and Parmentier [6]).

Threshold Map | Event Map: 1 (Event) | Event Map: 0 (No event) | Threshold Total |
---|---|---|---|
1 (Modeled as event) | H_{t} | F_{t} | H_{t} + F_{t} |
0 (Modeled as no event) | M_{t} | C_{t} | M_{t} + C_{t} |
Event total | H_{t} + M_{t} | F_{t} + C_{t} | 1 |

The horizontal axis of the ROC curve is the false positive rate (proportion of the no-event cells modeled as event, that is F_{t}/(F_{t} + C_{t})) and the vertical axis is the true positive rate (proportion of the true event cells modeled as event, that is H_{t}/(H_{t} + M_{t})). A popular summary metric is the area under the curve (AUC) that connects the points obtained by the various thresholds. If the true events coincide perfectly with the highest ranked probabilities, then the AUC is equal to one because the curve begins at the point (0,0), goes up the vertical axis to the point (0,1), and then to the right to the point (1,1). A random probability map produces a diagonal ROC curve in which the true positive rate equals the false positive rate at all threshold points. Any probability map whose ROC curve lies below the diagonal has less predictive power than a random map. In the literature, the false and true positive rates are also referred to as (1 − specificity) and sensitivity, respectively (Figure 2).
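The construction of the curve and of its AUC can be sketched in a few lines. This is a minimal illustration and not the implementation used by the tools described in this paper; the function names `roc_points` and `trapezoidal_auc` and the toy data are assumptions made for the example, and a cell is treated as "modeled as event" when its probability is greater than or equal to the threshold.

```python
def roc_points(probabilities, events, thresholds):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    probabilities: modeled probability of each cell
    events: 1 where the event occurred, 0 otherwise
    """
    n_event = sum(events)
    n_no_event = len(events) - n_event
    points = [(0.0, 0.0)]  # the curve starts at the origin
    # Lower the threshold step by step so that more cells are modeled as event.
    for t in sorted(thresholds, reverse=True):
        hits = sum(1 for p, e in zip(probabilities, events) if p >= t and e == 1)
        false_alarms = sum(1 for p, e in zip(probabilities, events) if p >= t and e == 0)
        points.append((false_alarms / n_no_event, hits / n_event))
    points.append((1.0, 1.0))  # the curve ends in the upper right corner
    return points


def trapezoidal_auc(points):
    """Area under the straight segments that connect consecutive points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy case: the two event cells hold the two highest probabilities.
perfect = roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], [0.5])
perfect_auc = trapezoidal_auc(perfect)
```

In the toy case the curve passes through (0,0), (0,1) and (1,1), which corresponds to the perfect model described above.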

**Figure 2.** The ROC curve for the maps of Figure 1. True and false positive rates are computed for each threshold applied to the probability map. To define the first point in the red square, we observe that the first bin has cells coded 1 in a threshold map that captures the 10 highest-probability (darkest) cells. Four of them coincide with the 11 event cells, which generates a true positive rate of 4/11. The other six cells coincide with the 89 no-event cells, which generates a false positive rate of 6/89. The next point on the ROC curve is defined by taking into account all the cells above the next lower probability threshold.
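The arithmetic behind this first threshold point can be checked by filling in the contingency table of Table 1 with the counts given in the caption. The sketch below uses exact fractions; the variable names are chosen for this example only.

```python
from fractions import Fraction

total_cells = 100           # grid cells in Figure 1
event_cells = 11            # cells where the event occurred
cells_above_threshold = 10  # cells coded 1 in the first threshold map
hit_cells = 4               # cells coded 1 that are also event cells

# Proportions of the contingency table (Table 1) for this threshold.
H = Fraction(hit_cells, total_cells)                          # hits
F = Fraction(cells_above_threshold - hit_cells, total_cells)  # false alarms
M = Fraction(event_cells - hit_cells, total_cells)            # misses
C = 1 - H - F - M                                             # correct rejections

true_positive_rate = H / (H + M)
false_positive_rate = F / (F + C)
print(true_positive_rate, false_positive_rate)  # prints 4/11 6/89
```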

## 2. Dinamica EGO

## 3. Implementation of ROC Analysis for Raster Maps

[…] (H_{t} + F_{t}), instead of the false positive rate. In fact, this change of horizontal axis does not induce a large change in the ROC curve when the number of hits is much less than the number of false alarms (H_{t} << F_{t}) and the number of presence cells (points of occurrence) is much smaller than the number of pseudo-absences (H_{t} + M_{t} << F_{t} + C_{t}), but the alternative horizontal axis can lead to additional insights concerning the ROC curve.
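A small numerical check illustrates why the two horizontal axes stay close under these conditions. The sketch below computes both quantities from one contingency table; the proportions are invented for illustration and the function name `horizontal_axes` is an assumption of this example.

```python
def horizontal_axes(H, F, M, C):
    """Return (false positive rate, proportion modeled as event) for one threshold."""
    false_positive_rate = F / (F + C)
    proportion_modeled_as_event = H + F
    return false_positive_rate, proportion_modeled_as_event


# Hypothetical proportions: events cover 1% of the cells (H + M = 0.01)
# and hits are far fewer than false alarms (H << F).
fpr, modeled = horizontal_axes(H=0.005, F=0.200, M=0.005, C=0.790)
difference = abs(fpr - modeled)
```

With these made-up values the two coordinates differ by less than 0.01, so the threshold point would barely move along the horizontal axis.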

#### 3.1. AUC and pAUC Estimation

**Figure 3.** Partial area under the curve (pAUC) for a range on the horizontal axis. pAUC corresponds to the area AEFD. Its value is standardized using the pAUC of a random model (area ABCD) and a perfect model (area AGHD).
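The standardization described in the caption can be sketched as follows. This example assumes the rescaling proposed by McClish [11], in which a random model scores 0.5 and a perfect model scores 1; whether the tools use exactly this expression is an assumption. The function names are hypothetical, and the curve is assumed to be given as points with strictly increasing horizontal coordinates.

```python
def interpolate(points, x):
    """Linearly interpolate the curve height at horizontal position x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x is outside the range of the curve")


def partial_auc(points, x_min, x_max):
    """Trapezoidal area under the curve between x_min and x_max (area AEFD)."""
    inner = [(x, y) for x, y in points if x_min < x < x_max]
    clipped = ([(x_min, interpolate(points, x_min))] + inner
               + [(x_max, interpolate(points, x_max))])
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(clipped, clipped[1:]))


def standardized_pauc(pauc, x_min, x_max):
    """Rescale a pAUC so that random = 0.5 and perfect = 1 (McClish-style)."""
    random_area = (x_max ** 2 - x_min ** 2) / 2.0  # area under the diagonal (ABCD)
    perfect_area = x_max - x_min                   # full-height rectangle (AGHD)
    return 0.5 * (1.0 + (pauc - random_area) / (perfect_area - random_area))


# Hypothetical curve evaluated on the range 0 to 0.25 of the horizontal axis.
curve = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]
raw = partial_auc(curve, 0.0, 0.25)
scaled = standardized_pauc(raw, 0.0, 0.25)
```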

#### 3.2. Confidence Intervals

[…]_{k} for a cell to be selected k times in a bootstrapped replicate is calculated by Equation (2):

[…]_{k} is the probability for a cell to be selected k times in a bootstrap replicate, where n is the number of cells in the stratum to which the cell belongs.
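Equation (2) is not reproduced here. As an illustration only, the sketch below assumes that a bootstrap replicate draws n cells with replacement from a stratum of n cells, in which case the number of times a given cell is selected follows a binomial distribution with n trials and success probability 1/n. The function name `selection_probability` is hypothetical.

```python
from math import comb


def selection_probability(k, n):
    """Probability that one cell is drawn exactly k times in n draws with replacement.

    Assumes every draw picks each of the n cells of the stratum with
    probability 1/n. Intended for small illustrative values of n.
    """
    return comb(n, k) * (1.0 / n) ** k * (1.0 - 1.0 / n) ** (n - k)


# Probabilities for a stratum of 10 cells: the values for k = 0..10 sum to one.
stratum_size = 10
probabilities = [selection_probability(k, stratum_size) for k in range(stratum_size + 1)]
total = sum(probabilities)
```

Under this assumption, the probability that a cell is left out of a replicate (k = 0) approaches 1/e, about 0.368, as the stratum grows.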

#### 3.3. Comparison of Two ROC Curves

[…] AUC_{1} and AUC_{2} are the two AUCs and sd(AUC_{2} − AUC_{1}) is the standard deviation of the difference between the two AUCs computed with numerous replicates. As Z approximately follows a normal distribution, one- or two-tailed p-values are computed to carry out one- or two-tailed tests, respectively. The same concepts apply to partial AUCs.
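The test can be sketched as below. The equation for Z is not reproduced in this text, so the sketch assumes Z = (AUC_{2} − AUC_{1}) / sd(AUC_{2} − AUC_{1}), with the standard deviation taken over paired bootstrap replicates and a standard normal reference distribution. The function name, the pairing of replicates and the direction of the one-tailed alternative (AUC_{2} greater than AUC_{1}) are all assumptions of this example, and the replicate values are invented.

```python
from statistics import NormalDist, stdev


def compare_aucs(auc_1, auc_2, replicates_1, replicates_2, two_tailed=True):
    """Return (Z, p-value) for the difference between two AUCs.

    replicates_1 and replicates_2 hold the AUCs of the two models computed
    on the same bootstrap replicates, in the same order.
    """
    differences = [b - a for a, b in zip(replicates_1, replicates_2)]
    z = (auc_2 - auc_1) / stdev(differences)
    if two_tailed:
        p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    else:
        # One-tailed alternative assumed here: AUC_2 is greater than AUC_1.
        p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Invented replicate AUCs for two models, for illustration only.
model_1 = [0.70, 0.72, 0.74, 0.71, 0.73]
model_2 = [0.80, 0.79, 0.83, 0.82, 0.81]
z, p = compare_aucs(0.72, 0.81, model_1, model_2)
```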

#### 3.4. Improvements in the Use and Interpretation of ROC Curves

[…] H_{t} + M_{t} equals H_{t} + F_{t}. In order to highlight important threshold points on the ROC curve, a tool was designed to show the threshold’s corresponding probability and the proportion of the study area that has a probability below the threshold. Finally, the density of the event occurrence in each bin of the ROC curve was computed as the ratio between the occurrence cells and the candidate cells of a given bin (Equation (5)). The result can be represented by a bar plot or a map.

[…]_{t} is the density of occurrence cells in bin t, H_{t} and H_{t + 1} are hits at thresholds t and t + 1, respectively, and M_{t} and M_{t + 1} are misses at thresholds t and t + 1, respectively.
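The density per bin can also be obtained by counting cells directly. The sketch below does not reproduce Equation (5), which is written in terms of hits and misses at successive thresholds; it simply applies the verbal definition given above (occurrence cells divided by candidate cells in each bin). The function name, the bin edges and the data are assumptions of this example.

```python
def density_per_bin(probabilities, events, bin_edges):
    """Ratio of occurrence cells to candidate cells for each probability bin.

    Bins are [low, high), except the last one, which also includes its upper
    edge. Returns None for a bin that contains no candidate cell.
    """
    densities = []
    last_index = len(bin_edges) - 2
    for i, (low, high) in enumerate(zip(bin_edges, bin_edges[1:])):
        in_bin = [e for p, e in zip(probabilities, events)
                  if low <= p < high or (i == last_index and p == high)]
        densities.append(sum(in_bin) / len(in_bin) if in_bin else None)
    return densities


# Invented data: six candidate cells and two bins of equal probability width.
cell_probabilities = [0.1, 0.2, 0.6, 0.7, 0.9, 1.0]
cell_events = [0, 0, 0, 1, 1, 1]
densities = density_per_bin(cell_probabilities, cell_events, [0.0, 0.5, 1.0])
```

Here the lower bin holds no occurrence cell and the upper bin holds three occurrences among four candidate cells, so the densities are 0 and 0.75.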

#### 3.5. Decreasing Computing Time

**Figure 4.** Sampling procedure. The original image is read line by line and selected cells are sorted into a one-line resampled map.

## 4. Applications

#### 4.1. Land Use/Cover Change (LUCC) Model

**Figure 5.** (**a**) Map of observed forest cover change during 1994–1999 and (**b**) probability of post-1994 deforestation. Areas that were not forest in 1994 (white) are excluded from the analysis.

**Figure 6.** ROC curve obtained by comparing the map of probability of post-1994 deforestation with the deforestation observed between 1994 and 1999, using 100 bins and the equal probability increment method. The point identified on the ROC curve corresponds to the area expected to be deforested during 1994–1999, assuming pre-1994 trends were to continue beyond 1994. The blue area corresponds to the partial AUC focused on high probability values, that is, 0–0.25 on the False Positive Rate axis.

^{−89}.

#### 4.2. Models of Species Distribution

**Figure 7.** Maps of probability of presence of B. variegatus obtained by the Weights of Evidence (WofE) and MaxEnt methods.

**Figure 8.** Cumulative distribution functions (CDFs) for the probability maps from WofE and MaxEnt. The vertical axis is the proportion of the candidate region that has probability values less than or equal to the value on the horizontal axis.

**Figure 9.** ROC curves obtained by the WofE and MaxEnt methods. The grey shaded area represents the partial AUC of the WofE model between 0.95 and 1 on the True Positive Rate axis. The pAUCs are similar for WofE and MaxEnt, which indicates that the probability maps are similar concerning where the relatively lower probabilities are allocated.

AUC (by number of bins) | Entire data: 100 | Entire data: 20 | Entire data: 10 | Entire data: 5 | Resampled data: 100 | Resampled data: 20 | Resampled data: 10 | Resampled data: 5 |
---|---|---|---|---|---|---|---|---|
WofE | 0.746 (−0.3) | 0.739 (−1.2) | 0.734 (−1.8) | 0.709 (−5.3) | 0.746 (−0.3) | 0.738 (−1.3) | 0.734 (−1.9) | 0.709 (−5.2) |
MaxEnt | 0.806 (−0.6) | 0.800 (−1.3) | 0.782 (−3.6) | 0.737 (−9.2) | 0.805 (−0.7) | 0.800 (−1.4) | 0.781 (−3.7) | 0.736 (−9.3) |

AUC (by number of bins) | Entire data: 100 | Entire data: 20 | Entire data: 10 | Entire data: 5 | Resampled data: 100 | Resampled data: 20 | Resampled data: 10 | Resampled data: 5 |
---|---|---|---|---|---|---|---|---|
WofE | 0.704 (−5.9) | 0.687 (−8.1) | 0.665 (−11.1) | 0.656 (−12.3) | 0.703 (−6.0) | 0.687 (−8.1) | 0.665 (−11.1) | 0.657 (−12.2) |
MaxEnt | 0.71 (−11.8) | 0.674 (−16.9) | 0.636 (−21.5) | 0.611 (−24.6) | 0.715 (−11.9) | 0.674 (−16.9) | 0.636 (−21.6) | 0.611 (−24.6) |

**Table 3.** Values of upper, trapezoidal, and lower AUC at various numbers of bins for the equal probability increment method.

Number of bins | 100 | 20 | 10 | 5 |
---|---|---|---|---|
AUC upper | 0.7617 | 0.7780 | 0.8006 | 0.8218 |
AUC (trapezoidal) | 0.7458 | 0.7385 | 0.7341 | 0.7085 |
AUC lower | 0.7299 | 0.6990 | 0.6676 | 0.5952 |

**Figure 10.** Trapezoidal, lower and upper ROC curves from the same probability map with 0.05 (**Left**) and 0.2 (**Right**) slicing increments. When the threshold increment is 0.2, the number of bins is 5. When the threshold increment is 0.05, the number of bins is 20.
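The relation between the three areas can be illustrated with a short sketch. It assumes that the lower and upper curves are staircases that pass through the same threshold points as the trapezoidal curve, holding the smaller or the larger of the two heights across each bin. This assumption is consistent with Table 3, where each trapezoidal AUC equals the mean of the corresponding upper and lower values, for example (0.7617 + 0.7299)/2 = 0.7458. The function name and the example points are hypothetical.

```python
def auc_bounds(points):
    """Return (lower, trapezoidal, upper) areas for a list of ROC points.

    points: (horizontal, vertical) coordinates sorted along the horizontal axis.
    """
    lower = 0.0
    upper = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        width = x1 - x0
        lower += width * min(y0, y1)  # staircase below the threshold points
        upper += width * max(y0, y1)  # staircase above the threshold points
    # Each trapezoid is the average of the two rectangles on the same bin.
    return lower, (lower + upper) / 2.0, upper


# Invented ROC points with three bins.
lower, trapezoidal, upper = auc_bounds([(0.0, 0.0), (0.2, 0.6), (0.5, 0.9), (1.0, 1.0)])
```

With fewer bins the rectangles become wider, so the gap between the lower and upper areas grows, which matches the trend across the columns of Table 3.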

**Table 4.** AUC and partial AUC values, along with their confidence intervals at alpha = 0.05, obtained using WofE and MaxEnt. Partial AUC was calculated between 0.95 and 1 on the True Positive Rate (vertical) axis; the reported values are normalized.

Software | Index | Lower bound | Index value | Upper bound |
---|---|---|---|---|
WofE | AUC | 0.6618 | 0.7382 | 0.8055 |
MaxEnt | AUC | 0.7231 | 0.7996 | 0.8706 |
WofE | pAUC | 0.7798 | 0.9051 | 0.9979 |
MaxEnt | pAUC | 0.8352 | 0.9179 | 0.9990 |

**Figure 11.** Density of species occurrence expressed as a proportion (%) in each bin (Equation (5)). Bins are ordered with lower probabilities on the left and higher probabilities on the right using the equal probability increment method.

## 5. Discussion

## 6. Conclusion

## Acknowledgments

## Conflict of Interest

## References

1. Swets, J.A. Signal Detection Theory and ROC Analysis in Psychology and Diagnostics, 1st ed.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 1996.
2. Satchell, S.; Xia, W. Analytic Models of the ROC Curve: Applications to Credit Rating Model Validation. In The Analytics of Risk Model Validation, 1st ed.; Christodoulakis, G., Satchell, S., Eds.; Elsevier: London, UK, 2008.
3. Sonego, P.; Kocsor, A.; Pongor, S. ROC analysis: Applications to the classification of biological sequences and 3D structures. Brief. Bioinform. **2008**, 9, 198–209.
4. Li, R.; Guan, Q.; Merchant, J. A geospatial modeling framework for assessing biofuels-related land-use and land-cover change. Agr. Ecosyst. Environ. **2012**, 161, 17–26.
5. Pontius, R.G., Jr.; Batchu, K. Using the relative operating characteristic to quantify certainty in prediction of location of land cover change in India. Trans. GIS **2003**, 7, 467–484.
6. Pontius, R.G., Jr.; Parmentier, B. Recommendations for using the relative operating characteristic (ROC). Landsc. Ecol. **2013**. submitted for publication.
7. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. **2006**, 27, 861–874.
8. Robin, X.; Turck, N.; Hainard, A.; Tiberti, N.; Lisacek, F.; Sanchez, J.C.; Müller, M. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinforma. **2011**, 12.
9. Soares-Filho, B.S.; Rodrigues, H.O.; Follador, M. A hybrid analytical-heuristic method for calibrating land-use change models. Environ. Model. Soft. **2013**, 43, 80–87.
10. Peterson, A.T.; Papeş, M.; Soberón, J. Rethinking receiver operating characteristic analysis applications in ecological niche modelling. Ecol. Model. **2008**, 213, 63–72.
11. McClish, D.K. Analyzing a portion of the ROC curve. Med. Decis. Making **1989**, 9, 190–195.
12. Santini, S. Computing the Binomial Coefficients. 2007. Available online: http://arantxa.ii.uam.es/~ssantini/writing/notes/s667_binomial.pdf (accessed on 21 June 2013).
13. Pontius, R.G., Jr.; Schneider, L.C. Land-cover change model validation by an ROC method for the Ipswich Watershed, Massachusetts, USA. Agr. Ecosyst. Environ. **2001**, 85, 239–248.
14. Lobo, J.M.; Jiménez-Valverde, A.; Real, R. AUC: A misleading measure of the performance of predictive distribution models. Glob. Ecol. Biogeogr. **2008**, 17, 145–151.
15. Phillips, S.J.; Anderson, R.P.; Schapire, R.E. Maximum entropy modeling of species geographic distributions. Ecol. Model. **2006**, 190, 231–259.
16. Mas, J.F.; Farfán, M.; Ghilen, C.; Lima, T.; Soares Filho, B. Una Comparación de dos Enfoques de Modelación de Nicho Ecológico. In Proceedings of Memorias de la XX Reunión SELPER, San Luis Potosí, México, 21–25 October 2013.
17. Soares-Filho, B.S.; Alencar, A.; Nepstad, D.; Cerqueira, G.; Vera Diaz, M.; Rivero, S.; Solorzano, L.; Voll, E. Simulating the response of land-cover changes to road paving and governance along a major Amazon highway: The Santarém-Cuiabá Corridor. Glob. Change Biol. **2004**, 10, 745–764.
18. Mas, J.F.; Pérez-Vega, A.; Clarke, K.C. Assessing simulated land use/cover maps using similarity and fragmentation indices. Ecol. Complex. **2012**, 11, 38–45.
19. Pontius, R.G., Jr.; Pacheco, P. Calibration and validation of a model of forest disturbance in the western Ghats, India 1920–1990. GeoJournal **2004**, 61, 325–334.
20. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013.

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Mas, J.-F.; Soares Filho, B.; Pontius, R.G.; Farfán Gutiérrez, M.; Rodrigues, H. A Suite of Tools for ROC Analysis of Spatial Models. *ISPRS Int. J. Geo-Inf.* **2013**, *2*, 869-887.
https://doi.org/10.3390/ijgi2030869
