4.1. Inflation of Map Accuracy
To our knowledge, the strategy we have described here has not been tested prior to this study, though we identified several studies that relied on regression modeling approaches for distilling MF and infeasibility values into meaningful results. For example, as a component of a study on Eastern hemlock (Tsuga canadensis) decline, Pontius et al. [79] used MF and infeasibility from MTMF to calibrate the field-estimated percent basal area. The final regression model was extended to all MTMF output pixels to estimate species abundance. Pontius et al. [80] used a similar approach to map ash trees (Fraxinus L.) in an urban context, employing MF and infeasibility as predictive variables in logistic regression. Their approach produced a map illustrating the probability that a given pixel contains ash. Gudex-Cross et al. [63] used MF and infeasibility to predict the percent basal area of different tree species at reference plots using a form of stepwise regression. As part of a hierarchical mapping strategy, the resulting percent basal area rasters served as input to an object-based image analysis that ultimately led to hard classification. These studies support our work by indicating a broader interest in improving traditional MTMF post-processing by employing replicable modeling strategies for integrating MF and infeasibility values.
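As a rough sketch of the regression strategy these studies share, the fragment below fits an ordinary least squares model of percent basal area on MF and infeasibility via the normal equations. All plot values and coefficients are fabricated for illustration; they do not come from [79], [80], or [63]:

```python
# Hypothetical calibration plots: (MF, infeasibility) per reference plot.
# The response is generated from known coefficients so the fit is checkable.
plots = [(0.10, 0.20), (0.30, 0.10), (0.50, 0.40),
         (0.70, 0.20), (0.90, 0.50), (0.20, 0.30)]
b_true = (10.0, 50.0, -20.0)  # intercept, MF slope, infeasibility slope
y = [b_true[0] + b_true[1] * mf + b_true[2] * inf for mf, inf in plots]

# Design matrix with an intercept column.
X = [(1.0, mf, inf) for mf, inf in plots]

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        s = sum(m[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (m[i][3] - s) / m[i][i]
    return x

# Normal equations: (X^T X) b = X^T y.
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
b_hat = solve3(xtx, xty)

print("fitted coefficients:", [round(v, 4) for v in b_hat])
```

Because the synthetic response here is noiseless, the fit recovers the generating coefficients exactly; with real field estimates, the same machinery yields the calibration surface that is then applied to every MTMF pixel.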
Recent use of supervised machine learning algorithms in image classification also lends support to our use of these models, particularly considering their strong performance. Rodriguez-Galiano et al. [48], for example, found that pixel-wise random forests classification of multi-season, multi-texture Landsat TM imagery improved overall classification accuracy by 31% over traditional maximum likelihood classification. Pal and Mather [81] found that SVM improved the overall accuracy of classified Landsat ETM+ data by 5% over maximum likelihood and 2.8% over neural network algorithms. Duro et al. [33] compared three supervised learning methods applied to SPOT-5 imagery: random forests, SVM, and decision trees. They reported pixel-wise overall accuracies of 89.7%, 89.3%, and 87.6%, respectively, with even higher accuracies achieved using object-based image analysis. Brenning [82] evaluated eleven algorithms in the classification of Enhanced Thematic Mapper Plus (ETM+) imagery and elevation model terrain derivatives, finding that penalized linear discriminant analysis produced significantly lower error rates than the other algorithms. Huang et al. [34] compared SVM, neural networks, decision trees, and maximum likelihood classifiers. They found that SVM and neural networks generally yielded superior accuracies relative to decision trees and maximum likelihood, though model superiority varied with the way each model was trained. That the best-performing algorithm varies among these studies, and differs again in the results we present above, indicates that data dimensionality, image heterogeneity, analytical framework (e.g., multi-temporal versus high resolution versus MTMF), and other landscape attributes influence classification accuracy.
With respect to the traditional strategy of manually drawing a region of properly classified cells on a two-dimensional scatterplot of MF by infeasibility, we achieved comparable or slightly lower overall map accuracies. Mundt et al. [8], for instance, computed an overall accuracy of 82% using a presence/absence classification when applying their “iterative” approach to HyMap scenes at 3.5 m pixel resolution. Mitchell and Glenn [13] also classified HyMap imagery, varying from 3.2 m to 3.3 m resolution, and produced overall accuracy values ranging from 67% to 85% via their “interactive” scatterplot approach. Parker-Williams and Hunt [27] achieved an overall accuracy of 95% when classifying leafy spurge for presence/absence using custom-flown Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data at 20 m resolution. Depending on the supervised learning algorithm used, our protocol suggests potential overall accuracy inflation of between 18.4% and 30.8% without cross-validation, that is, the differences between the overall accuracy from our manual drawing method and that from the cross-validated models.
Importantly, similar MTMF studies (e.g., [3]) do not report one or more of kappa, user’s accuracy, producer’s accuracy, or the results of cross-validation procedures alongside their classification accuracies. Thus, comparing results across studies may not be appropriate. Indeed, we too achieved overall accuracies as great as 87.1% using the traditional iterative “drawing” approach applied to binary delineation (Table 1). However, our kappa of 0.23 indicates that this classification is quite weak and inflated by random chance. When overfitting a presence-absence model, we still achieved an overall accuracy of 77.1%, though in this case, we also obtained a kappa of 0.43, which together suggest a more successful, albeit mediocre, classification. If we compare this result with the cross-validated supervised learning methods, we see that the overfit model’s accuracy figures exceed all other presence-absence trials. This is because one-time accuracy assessment can lead to biased results [78], which is why we recommend using cross-validation. Other studies of image classification have made use of cross-validation strategies [33], indicating support for the extension of these methods to MTMF.
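The inflation described above can be illustrated with a minimal, self-contained sketch. The data and the single-threshold classifier below are hypothetical, not our actual MTMF workflow: a cutoff on a simulated MF-like score is tuned on the full data set (resubstitution, analogous to one-time assessment) and then re-estimated under 5-fold cross-validation.

```python
import random

random.seed(42)

# Hypothetical training data: an MF-like score for 100 presence and
# 100 absence pixels, with deliberately overlapping distributions.
presence = [random.gauss(0.55, 0.20) for _ in range(100)]
absence = [random.gauss(0.35, 0.20) for _ in range(100)]
data = [(x, 1) for x in presence] + [(x, 0) for x in absence]
random.shuffle(data)

def accuracy(samples, t):
    """Fraction of samples correctly labeled by the rule score >= t."""
    return sum((x >= t) == bool(y) for x, y in samples) / len(samples)

def best_threshold(samples):
    """Pick the cutoff that maximizes accuracy on `samples`
    (analogous to drawing a boundary around well-classified pixels)."""
    best_t, best_acc = None, -1.0
    for t in sorted(x for x, _ in samples):
        acc = accuracy(samples, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Resubstitution: tune and evaluate on the same pixels (optimistic).
resub_acc = accuracy(data, best_threshold(data))

# 5-fold cross-validation: tune on 4 folds, evaluate on the held-out fold.
k = 5
fold_size = len(data) // k
cv_accs = []
for i in range(k):
    test = data[i * fold_size:(i + 1) * fold_size]
    train = data[:i * fold_size] + data[(i + 1) * fold_size:]
    cv_accs.append(accuracy(test, best_threshold(train)))
cv_acc = sum(cv_accs) / k

print(f"resubstitution accuracy: {resub_acc:.3f}")
print(f"cross-validated accuracy: {cv_acc:.3f}")
```

Because this toy classifier has only one free parameter, the optimism is modest; with flexible learners (random forests, SVM) tuned on many features, resubstitution estimates inflate far more, which is precisely the bias cross-validation guards against.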
However, the more significant results of our models are the kappa values (kappa ≤ 0.35), all of which consistently demonstrate that these models offer only modest predictive ability beyond random assignment according to class proportions. This is what we might expect to see, as the cross-validated supervised learning methods help to correct the systematic bias that appears in the drawn and overfit models, yielding instead more realistic (and less misleading) classifications. Table 1 illustrates how supervised learning algorithms can be used to maximize accuracy while simultaneously minimizing artificial inflation through cross-validation. For our case study, our classified map is not 87.1% accurate, as basic “drawing” would lead us to believe, nor is it 77.1% accurate, as an overfit model might lead us to believe. In the case of presence-absence binned data, our mapped results are ~35% better than chance, with moderate user’s and producer’s accuracies. Some might generously consider these figures to reflect ‘fair’ agreement between the field data and classified map [83].
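The distinction between overall accuracy and kappa can be made concrete with a small worked confusion matrix. The counts below are invented for demonstration and are not our field data; they simply show how, when one class dominates, a classifier can post high overall accuracy while barely beating chance.

```python
# Hypothetical presence/absence confusion matrix (rows = reference,
# columns = predicted); absence dominates the scene.
tp, fn = 3, 10   # reference presence: 13 pixels
fp, tn = 3, 84   # reference absence: 87 pixels

n = tp + fn + fp + tn                    # 100 pixels total
overall = (tp + tn) / n                  # observed agreement p_o

# Expected chance agreement p_e from the row/column marginals.
ref_pres, ref_abs = tp + fn, fp + tn     # reference marginals
prd_pres, prd_abs = tp + fp, fn + tn     # prediction marginals
expected = (ref_pres * prd_pres + ref_abs * prd_abs) / n**2

# Cohen's kappa: agreement beyond chance, rescaled to [p_e-adjusted] range.
kappa = (overall - expected) / (1 - expected)
print(f"overall accuracy = {overall:.3f}, kappa = {kappa:.3f}")
```

Despite 87% overall accuracy, kappa is only about 0.25, because predicting the dominant absence class is nearly as good by chance alone; this mirrors why we report kappa alongside overall accuracy throughout.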
We present the overfit modeling approach as an extension of the manual, iterative drawing approach that has been used in the literature to date. Manual methods of this nature generally rely on a single, complete training set to obtain the classification threshold(s), and are therefore not conducive to rigorous cross-validation. This makes a classification threshold obtained through a complex drawing approach less applicable to other scenes. Yet, recent examples from the literature illustrate that the traditional “scatterplot drawing” approach to MTMF classification threshold determination is still very much in use [14]. Such projects have limited replicability, may present overly optimistic results, and generally show few signs of optimizing the results beyond comparing a single classification product to reference data (i.e., basic accuracy assessment). Projects of this nature might be strengthened by the framework we have outlined above.
To reiterate, it is important to recognize that MTMF is not limited to cases of hard classification, where the analyst seeks to obtain mutually exclusive class assignments. While we have presented an example of a hard classification, MTMF is also employed in fuzzy, continuous, or otherwise unique frameworks in which the MF and infeasibility images are never integrated into discrete classes. For example, Franke et al. [84] used a fusion of three MF images of Brazilian cerrado to produce a continuous-scaled map of fire fuel load conditions that was calibrated against field-estimated biomass. Mikheeva et al. [64] developed abundance classes for tundra-taiga ecotone vegetation using MTMF and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) imagery. They relied on high-resolution classified QuickBird imagery for accuracy assessment, and performed the binning of abundance data not during the MTMF post-processing workflow but as a separate step, based on errors in the relationship between the two classified images. With the exception of thresholding MF values to those with infeasibility values less than 10, Ayoobi and Tangestani [85] relied solely on their MF image to map copper abundance. As noted above, Pontius et al. [79] and Gudex-Cross et al. [63] also employed MTMF classification in frameworks that did not begin as hard classification.
Non-hard classification aside, we argue that machine learning algorithms provide an efficient, semi-automated approach to classification, and that cross-validation should be considered a critical component when computing the accuracy of any classified map product. Other studies within the remote sensing and geospatial literature have already demonstrated the importance of cross-validation (e.g., [33]), and our study strongly suggests extending this standard procedure to future MTMF algorithm use.
4.2. Hyperion, Automation
Our results do not demonstrate a highly successful use of Hyperion data for the classification of leafy spurge. This is visible in our computed accuracies and in Figure 5. When considering the relatively low estimates of kappa in conjunction with the variability visible in Figure 5 and Table 1, we might question the spectral fidelity of our target endmember (compared to the background) or the quality of the imagery. The many previous studies focused on this particular plant species [3] collectively suggest that spectral fidelity of the endmember relative to the scene background signatures is not the issue. With respect to image quality, the Hyperion sensor is among those known to exhibit variable signal-to-noise ratios. Kruse et al. [90] and Ayoobi and Tangestani [85] note that the noise levels for a given sensor are generally fixed, but that the strength of the signal depends on external factors such as solar zenith angle, atmospheric interference, and surface reflectance, among others. Kruse et al. [90] demonstrate that the signal in Hyperion imagery is sensitive to acquisition conditions, and that superior signal-to-noise ratios are obtained during periods with high solar zenith angles. Our imagery was collected during optimal conditions: within roughly one month of the summer solstice (Northern hemisphere), at mid-day, and with low cloud cover (NASA ratings of 0–9% and 20–29%). When paired with our methods for identifying shift difference regions and applying noise reduction transformations (MNF), we feel confident that the signal-to-noise ratios for our two images could not be markedly improved. This may indirectly indicate that Hyperion imagery is not suitable for making estimates of low-abundance leafy spurge, or that our ocular field estimates were not reliable. Additional studies are needed to explore the application of unmixing algorithms to Hyperion scenes of heterogeneously vegetated landscapes.
While our results do not demonstrate a highly successful use of Hyperion data for leafy spurge, they highlight precisely why reliable threshold selection and cross-validation are so essential. Our one-time manual accuracy assessment led us to an inflated estimate, and the cross-validation methods we employed provided the rigor to challenge these inflated accuracies and improve the reliability of our results. In general, automating as many aspects of image classification and other relevant protocols as possible (e.g., [36]) will help to ensure that the remote sensing and land-management communities maintain common ground in dialogues concerning their respective disciplines. Automation may also help in sidestepping limitations in the underlying theoretical bases of various imagery analysis protocols. In particular, the consistently underestimated subpixel abundance estimates illuminated by Mitchell and Glenn [13] detract from the interpretability and reliability of MTMF outputs. As an alternative, investigators might consider coarser metrics for assessing subpixel abundance (e.g., discretizing the continuous abundance values into bins using a clustering algorithm) that offer accuracy at scales that land managers will still find applicable to their needs.
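As one concrete possibility, such binning could use a simple one-dimensional k-means pass over the continuous abundance values. The values and the three-bin choice below are illustrative assumptions, not results from our study:

```python
# Hypothetical per-pixel fractional abundance values (0-1 scale).
abundance = [0.02, 0.05, 0.08, 0.45, 0.50, 0.55, 0.88, 0.90, 0.95]
k = 3  # e.g., low / moderate / high abundance bins

# Deterministic initialization: spread centroids across the value range.
srt = sorted(abundance)
centroids = [srt[0], srt[len(srt) // 2], srt[-1]]

for _ in range(100):  # Lloyd's algorithm; converges quickly in 1-D
    # Assign each value to its nearest centroid.
    labels = [min(range(k), key=lambda c: abs(x - centroids[c]))
              for x in abundance]
    # Recompute each centroid as the mean of its assigned members.
    new = [
        sum(x for x, l in zip(abundance, labels) if l == c) /
        max(1, sum(1 for l in labels if l == c))
        for c in range(k)
    ]
    if new == centroids:
        break
    centroids = new

print("bin labels:", labels)
print("bin centers:", [round(c, 3) for c in centroids])
```

The resulting low/moderate/high labels trade subpixel precision for categories that are robust to the abundance underestimation noted above and remain directly interpretable by land managers.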