Open Access Article

Analysis of Thematic Similarity Using Confusion Matrices

Departamento de Ingeniería Cartográfica, Geodésica y Fotogrametría, Universidad de Jaén, 23071 Jaén, Spain
Departamento de Estadística e Investigación Operativa, Universidad de Jaén, 23071 Jaén, Spain
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2018, 7(6), 233;
Received: 8 May 2018 / Revised: 13 June 2018 / Accepted: 18 June 2018 / Published: 20 June 2018


The confusion matrix is the standard way to report the thematic accuracy of geographic data (spatial databases, topographic maps, thematic maps, classified images, remote sensing products, etc.). Two widely adopted indices for assessing thematic quality are derived from it: overall accuracy (OA) and the Kappa coefficient (κ), both of which have drawn criticism from some authors. Either index can be used to test the similarity of two independent classifications by means of a simple statistical hypothesis test, and this is the usual practice. Nevertheless, it is not recommended, because computing these indices aggregates the data, so different combinations of cell values in the matrix can yield the same value of OA or κ. Thus, failing to reject the hypothesis of equality between two index values does not necessarily mean that the two matrices are similar. We therefore present a new statistical tool to evaluate the similarity between two confusion matrices. It takes into account that the numbers of sample units correctly and incorrectly classified can be modeled by a multinomial distribution, and so it uses the individual cell values of the matrices rather than aggregated information such as the OA or κ values. For this purpose, we consider a test function based on the discrete squared Hellinger distance, which is a measure of similarity between probability distributions. Because the asymptotic approximation of the null distribution of the test statistic is rather poor for small and moderate sample sizes, we use a bootstrap estimator. To explore how the p-value evolves, we apply the proposed method to several predefined matrices that are perturbed within a specified range. Finally, a complete numerical example of the comparison of two matrices is presented.
Keywords: thematic accuracy; confusion matrix; multinomial distribution; similarity; Hellinger distance; bootstrapping
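
The idea summarized in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the function names, the two example matrices, the 1/2 normalization of the squared Hellinger distance, the nm/(n+m) scaling of the statistic, and the pooled parametric bootstrap are all assumptions made here for illustration. The two matrices are chosen so that OA and κ coincide while the individual cells differ.

```python
import math
import random


def overall_accuracy(matrix):
    """Share of sample units on the main diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def kappa(matrix):
    """Kappa coefficient of a square confusion matrix."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    # Chance agreement from the products of row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


def squared_hellinger(p, q):
    """Discrete squared Hellinger distance (assumed 1/2 convention, range 0..1)."""
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def statistic(counts_a, counts_b):
    """Scaled distance between the two estimated cell-probability vectors.

    The nm/(n+m) factor is an assumed scaling; a different constant factor
    would not change the bootstrap p-value computed below.
    """
    n, m = sum(counts_a), sum(counts_b)
    p = [c / n for c in counts_a]
    q = [c / m for c in counts_b]
    return (n * m / (n + m)) * squared_hellinger(p, q)


def draw_multinomial(rng, size, probs):
    """Draw one multinomial sample of the given size as a list of cell counts."""
    counts = [0] * len(probs)
    for cell in rng.choices(range(len(probs)), weights=probs, k=size):
        counts[cell] += 1
    return counts


def bootstrap_p_value(matrix_a, matrix_b, replicates=1000, seed=0):
    """Parametric bootstrap p-value for H0: both matrices share cell probabilities."""
    counts_a = [c for row in matrix_a for c in row]
    counts_b = [c for row in matrix_b for c in row]
    n, m = sum(counts_a), sum(counts_b)
    # Under H0 the common cell probabilities are estimated from the pooled counts.
    pooled = [(a + b) / (n + m) for a, b in zip(counts_a, counts_b)]
    observed = statistic(counts_a, counts_b)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(replicates):
        resampled = statistic(
            draw_multinomial(rng, n, pooled), draw_multinomial(rng, m, pooled)
        )
        if resampled >= observed:
            exceed += 1
    return exceed / replicates


# Hypothetical matrices: same diagonal and same marginals, different error cells.
MATRIX_A = [[40, 5, 5], [5, 40, 5], [5, 5, 40]]
MATRIX_B = [[40, 10, 0], [0, 40, 10], [10, 0, 40]]

print("OA:", overall_accuracy(MATRIX_A), overall_accuracy(MATRIX_B))
print("Kappa:", kappa(MATRIX_A), kappa(MATRIX_B))
print("bootstrap p-value:", bootstrap_p_value(MATRIX_A, MATRIX_B))
```

Because both matrices have the same diagonal and the same row and column totals, a test on OA or κ alone cannot tell them apart, whereas the cell-level statistic can. The number of bootstrap replicates and the random seed are arbitrary choices in this sketch.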

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).

MDPI and ACS Style

García-Balboa, J.L.; Alba-Fernández, M.V.; Ariza-López, F.J.; Rodríguez-Avi, J. Analysis of Thematic Similarity Using Confusion Matrices. ISPRS Int. J. Geo-Inf. 2018, 7, 233.

Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

ISPRS Int. J. Geo-Inf., EISSN 2220-9964. Published by MDPI AG, Basel, Switzerland.