Convolutional neural network (CNN)-based deep learning (DL) is a powerful, recently developed image classification approach. With origins in the computer vision and image processing communities, the accuracy assessment methods developed for CNN-based DL use a wide range of metrics that may be unfamiliar to the remote sensing (RS) community. To explore the differences between traditional RS and DL RS methods, we surveyed a random selection of 100 papers from the RS DL literature. The results show that RS DL studies have largely abandoned traditional RS accuracy assessment terminology, though some of the accuracy measures typically used in DL papers, most notably precision and recall, have direct equivalents in traditional RS terminology. Some of the DL accuracy terms have multiple names, or are equivalent to another measure. In our sample, DL studies only rarely reported a complete confusion matrix, and when they did so, it was even more rare that the confusion matrix estimated population properties. On the other hand, some DL studies are increasingly paying attention to the role of class prevalence in designing accuracy assessment approaches. DL studies that evaluate the decision boundary threshold over a range of values tend to use the precision-recall (P-R) curve, the associated area under the curve (AUC) measures of average precision (AP) and mean average precision (mAP), rather than the traditional receiver operating characteristic (ROC) curve and its AUC. DL studies are also notable for testing the generalization of their models on entirely new datasets, including data from new areas, new acquisition times, or even new sensors.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited