Application of Abundance Map Reference Data for Spectral Unmixing

Reference data (“ground truth”) maps have traditionally been used to assess the accuracy of classification algorithms. These maps typically classify pixels or areas of imagery as belonging to a finite number of ground cover classes, but do not include sub-pixel abundance estimates; therefore, they are not sufficiently detailed to directly assess the performance of spectral unmixing algorithms. Our research aims to efficiently generate, validate, and apply abundance map reference data (AMRD) to airborne remote sensing scenes. Scene-wide AMRD for this study were generated using the remotely sensed reference data (RSRD) technique, which spatially aggregates classification or unmixing results from fine scale imagery (e.g., 1-m GSD) to co-located coarse scale imagery (e.g., 10-m GSD or larger). Validation of the accuracy of these methods was previously performed for generic 10 m × 10 m coarse scale imagery, resulting in AMRD with known accuracy. The purpose of this paper was to apply this previously validated AMRD to specific examples of airborne coarse scale imagery. Application of AMRD involved three main parts: (1) spatial alignment of coarse and fine scale imagery; (2) aggregation of fine scale abundances to produce coarse scale imagery specific AMRD; and (3) demonstration of comparisons between coarse scale unmixing abundances and AMRD. Spatial alignment was performed using our new scene-wide spectral comparison (SWSC) algorithm, which aligned imagery with accuracy approaching the distance of a single fine scale pixel. We compared simple rectangular aggregation to coarse sensor point-spread function (PSF) aggregation, and found that PSF returned lower error, but that rectangular aggregation more accurately estimated true AMRD at ground level. We demonstrated various metrics for comparing unmixing results to AMRD, including several new techniques which adjust for known error in the reference data itself. 
These metrics indicated that fully constrained linear unmixing of AVIRIS imagery across all three scenes returned an average error of 10.83% per class and pixel. Our reference data research has demonstrated a viable methodology to efficiently generate, validate, and apply AMRD to specific examples of airborne remote sensing imagery, thereby enabling direct quantitative assessment of spectral unmixing performance.

Fig. 1.2 is an example of early aerial photography taken from a hot air balloon. Aerial imagery and reconnaissance technologies progressed quickly in the 20th century [8]. Commercial availability of satellite imagery in the early 21st century, in turn, supported the development of modern digital maps. Ideally, aerial imagery and corresponding maps are aligned perfectly and the map is simply a summarization of the nearly infinite information in imagery into a format that is easier for a human or computer to understand. In the late 20th century, a new type of aerial imaging technology emerged, called imaging spectroscopy or hyperspectral imagery [9]. Previously, each pixel in an image had contained a single brightness intensity (black and white image), three brightness intensities for red, green, and blue colors (color image), or multiple brightness intensities for red, green, blue, short-wave infrared, long-wave infrared, etc. (multispectral image). With the advent of imaging spectroscopy, however, each pixel may contain brightness information for hundreds of contiguous color bands from the visible, near infrared, and short-wave infrared spectrum [10]. Such data can be visualized as a 3-D cube, shown in Fig. 1.3, where the x and y coordinates represent spatial dimensions and the z coordinate represents spectral dimensions. These imaging spectrometer data add little value in their raw format, but are well suited to imagery analysis by computer algorithms, resulting in useful data products that enhance our understanding of remote sensing scenes [11]. Reference data maps are a special type of map designed to assess the performance of computer algorithms that automatically categorize overhead imagery into a finite number of ground classes [12]. For example, a reference data map for a typical suburban area would accurately categorize all areas of the map into several ground cover classes including: roof top, sidewalk, road, vehicle, vegetation, soil, etc.
The resulting reference data map could then be used to quantitatively assess the accuracy of a new computer algorithm that attempts to categorize the image into the same ground cover classes. Over the years, several reference data maps have become well-known in the field of remote sensing imaging spectroscopy, including maps known as Cuprite (Nevada, USA) [13,14,15,16,17], Indian Pines (Indiana, USA) [18,19,20,21,22], and Pavia (Italy) [18,21,23,24,25]. Numerous new imaging spectrometer processing algorithms have been published based on quantitative comparisons with these reference data. Example reference data for Indian Pines is shown in Fig. 1.4. In the field of remote sensing, these reference data or reference data maps are often called "ground truth" maps, referring to the traditional method of creating reference data by conducting field surveys. However, reference data terminology is preferred because ground truth is an overused term in remote sensing that can mean any type of data collected on the ground, including spectral reflectance of calibration panels, weather data, labeled class maps, etc. [26]. Ground truth also implies that reference data maps are perfectly accurate and that ground measurements are the best way to create the maps. Our research has demonstrated that reference data maps are not perfectly accurate and that ground measurements may not be the best way to create such maps [27,5,28].
Reference data maps are typically generated by field surveys and imagery analysis; however, both these methods can be prohibitively labor intensive and are prone to various sources of error, including human bias, mistakes, fatigue, view angle differences, and spatial alignment differences [12]. In addition to the labor required to produce reference data maps, reference data itself should be validated in order to characterize uncertainty and error.
Published validation reports are not available for even the most widely used reference data maps.
A key attribute of reference data is the level of spatial detail in the map. Spatial detail can be grouped into three categories: area, pixel, and abundance level detail. Area detail means that multiple pixels are grouped together and labeled as belonging to a certain ground cover class. Pixel detail means that each pixel is independently labeled as belonging to a certain class. Abundance detail means that each pixel can be a fractional mix of multiple classes. The well-known reference data maps mentioned previously are provided at area or pixel detail, with no widespread use of reference data at abundance map detail. In summary, the current state of reference data is deficient in the following respects: (1) existing reference data products are limited in number because generating new reference data using traditional methods can be prohibitively expensive; (2) existing reference data sets lack published validation reports that characterize error and uncertainty; and (3) existing reference data sets do not include spatial detail at the abundance level.
This research demonstrates methods through which the deficiencies of existing reference data can be addressed and makes several examples of improved reference data publicly available. We have defined a set of research objectives to address these identified shortcomings of current reference data maps. This chapter presents a journal paper that was published in Remote Sensing in May 2017 [5].

OBJECTIVES
This objective focuses on the validation of AMRD produced using the RSRD methodology. RSRD-derived data are compared with other versions of AMRD collected through field work and imagery analysis to characterize expected error and uncertainty.

Chapter 6: Summary
This chapter summarizes the work that has been performed, presents final conclusions, and recommends future work.

CONTRIBUTIONS
The novel contributions of this research include the following items: (1) generation, validation, and application of AMRD for three independent remote sensing scenes, with data made available for use by the research community; (2) demonstration of a methodology to efficiently produce scene-wide AMRD; (3) execution of an extensive data collection campaign in support of AMRD validation, including field surveys and imagery analysis by two observers, and image processing by three algorithms; (4) introduction of a methodology to validate the accuracy of reference data; (5) development of an image registration algorithm capable of aligning coarse and fine scale imaging spectrometer data, with accuracy approaching that of a single fine scale pixel; and (6) introduction of methodologies to quantitatively compare spectral unmixing results with AMRD, including accounting for known error in AMRD itself.

FOREWORD
This dissertation is organized in the modern format, meaning that each of the primary chapters is a complete paper, including the standard background section. This additional background chapter is provided as a more in-depth examination of traditional imaging spectrometer reference data. We also include a section discussing the complexity of remotely sensed imaging spectrometer data and accompanying reference data.

KEY ATTRIBUTES OF REFERENCE DATA
Through the experience of generating reference data products for imaging spectrometer data [27,5,28], we have identified a number of key attributes of reference data. The following sections briefly describe these key attributes, which are used later to analyze the past and present state of reference data products commonly implemented by the remote sensing community.

Imaging Sensor
Reference data maps are generated for specific imaging spectrometer scenes. Details of the imaging spectrometer, including its name, the number of spectral bands, calibration, post processing, and any other pertinent information are important for the combined use of imagery and accompanying reference data.

Image Size
The imagery and corresponding reference data have a total number of pixels, based on the number of rows and columns in the data. The number of pixels is important in remote sensing for ensuring that enough pixels from each class are available to both train and test algorithms.

GSD of Pixels
Ground sample distance (GSD) refers to the size of the pixels at ground location. For example, pixels with 15 m GSD each occupy approximately a 15 m × 15 m square area on the ground. Large GSD pixels are more likely to be made up of a mixture of ground cover classes when compared to small GSD imaging spectrometer data.

Number of Classes
A small number of ground cover classes provides a simple map, but generally results in more pixels being significantly different from the closest class. Increasing the number of classes reduces this error, but also introduces more complexity to the reference data product.

Generation Method
Reference data have typically been generated using field surveys, human imagery analysis, computer algorithms, or some mixture of these methods.

Spatial Detail
Spatial detail can be broken into three categories or scales: area, pixel, and abundance level detail. Area detail implies that multiple pixels are grouped together and labeled as belonging to a certain ground cover class. Pixel detail means that each pixel is independently labeled as belonging to a certain class. Abundance detail means that each pixel can be a fractional mix of multiple classes.
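The difference between pixel and abundance detail can be made concrete with a small sketch (NumPy; the scene, classes, and values below are invented for illustration):

```python
import numpy as np

# Illustrative 2x2 scene with three invented classes (0=roof, 1=vegetation, 2=soil).
# Pixel-level detail: one integer class label per pixel.
pixel_labels = np.array([[0, 1],
                         [2, 1]])

# Abundance-level detail: per-pixel class fractions that sum to one.
abundances = np.array([[[1.0, 0.0, 0.0], [0.2, 0.7, 0.1]],
                       [[0.0, 0.3, 0.7], [0.0, 1.0, 0.0]]])
assert np.allclose(abundances.sum(axis=-1), 1.0)

# Pixel labels can always be derived from abundances by hardening each pixel
# to its dominant class, but hardening discards the sub-pixel mixture information.
dominant = abundances.argmax(axis=-1)
```

Note that the conversion only runs one way: abundance detail contains pixel detail as a special case, while pixel detail cannot recover the fractional mixtures.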

Validation Availability
Before using a reference data product to quantitatively assess the performance of a new algorithm, a researcher should have an idea how much error or uncertainty is present in the reference data. Ideally, a validation report or similar document should be available for widely used reference data.

Accuracy
The accuracy of reference data maps should be known before they are used as the benchmark for assessing performance of new image processing algorithms.

DATA
The following sections provide an introduction to five of the most widely used reference data sets in the remote sensing community. While they are not a comprehensive list of available reference data, these data sets were chosen because of their widespread use and historical importance.

Cuprite (Nevada, USA)
Cuprite is a mining district in Nevada, USA, that was studied extensively by geologists in the 1990s [3]. Perhaps the forerunner of all imaging spectrometer reference data, the Cuprite scene and accompanying mineral maps are shown in Fig. 2.1. Swayze et al. [30,31] conducted field surveys in the area and tied field results to an algorithm called Tetracorder, which generated scene wide mineral maps. Several versions of maps are available, with separate maps and minerals for AVIRIS' VNIR and SWIR bands. Cuprite map publication coincided with the initiation of NASA's AVIRIS program, and the maps gained prominence along with the sensor itself. Tetracorder outputs were continuous rather than binary, resulting in mineral maps with semi-abundance qualities. In practice, researchers tended to use Cuprite results in a quantitative fashion when assessing classification results and in a qualitative fashion when assessing unmixing results [13,14,15,16,17].

Indian Pines (Indiana, USA)
Indian Pines reference data coincided with the rise of the AVIRIS sensor. Baumgardner et al. [32] conducted field surveys and questionnaires in the area and created the reference data map by classifying broad areas of pixels into the various classes. While use of Cuprite has faded over time, Indian Pines continues to be used as reference data [18,19,20,21,22,33,34].

Salinas (California, USA)
Salinas Valley is an agricultural area in California, USA, that was imaged in 1998 by AVIRIS flying at a low altitude [3]. The Salinas scene and reference data are shown in Fig. 2.3. This scene is frequently used as reference data, including numerous recent publications [35,36,33,37,34]. Despite its widespread use, information regarding its creation and validation is difficult to find; thus far we have not been able to locate any documentation other than brief summaries in secondary sources.

Pavia (Italy)
Pavia is an urban area in northern Italy that was imaged by the ROSIS sensor. The Pavia scenes are used often in recent publications [18,21,23,24,25,36,33,20,37]; nevertheless, we have not been able to locate detailed information about how the reference data were generated. Gamba [38] mentions that the reference data maps were created from regional maps, ground surveys, and manual photo interpretation, implying a mixed methodology.

The first notable trend is the decrease in both GSD and number of classes over time.
The Pavia and Salinas data sets, which are commonly used in recent research, boast higher spatial resolution, but summarize the imagery into a smaller number of ground cover classes.
This trend is perhaps the opposite of what one would expect, given that higher resolution imagery should reveal more detail on the ground. As such, this trend is probably driven by considerations of practicality, rather than from a pure information content perspective.
The second trend is that these reference data provide information at the pixel or area spatial detail level, with no widely used reference data that contains abundance information.
Cuprite does have some abundance-like properties, as Tetracorder results for each pixel are continuous, but the maps are not presented in a manner supporting quantitative abundance analysis.
The third trend is that all these data appear to have been created through mixed methodologies of field surveys, imagery analysis, existing map products, and in the case of Cuprite, algorithms. Field surveys are often considered the gold standard, but practical constraints make it difficult to produce comprehensive reference data through a pure field survey methodology. Indian Pines is the data set that relies most heavily on field surveys and questionnaires and not surprisingly, it is provided at the area level of detail where entire fields are combined into the same class, rather than taking into account intra-field variation as would be done for pixel level detail.
The fourth, and perhaps the most important, trend is the surprising lack of validation reports and expected accuracy for reference data. Meaningful documentation is available for Cuprite and Indian Pines, and these reports speak to the dedication and professionalism of the researchers who developed the reference data maps. However, the reports focus on documenting how the data were created rather than on estimating accuracy, as would be done in a validation report. Salinas and Pavia were also likely compiled by professional researchers using careful methodology, but documentation is not readily available.

PREVIOUS SUB-PIXEL REFERENCE DATA
Prior to our introduction of the remotely sensed reference data (RSRD) methodology to produce abundance map reference data (AMRD), several studies produced sub-pixel reference data with techniques similar to RSRD, i.e., aggregation of fine spatial scale imagery products to generate AMRD for co-located coarse scale imagery. Specifically, sub-pixel reference data were created using high-resolution RGB videography [39], RGB imagery [40], multi-spectral imagery [41] and imaging spectrometer data [42]. However, although several of these studies alluded to the need for the validation of sub-pixel reference data, none implemented an assessment approach of reference data, nor did they expand on which reference data development approach is best suited to this challenge. Our research therefore further explores the topic of sub-pixel reference data generated via high-resolution imaging spectrometer data, specifically in terms of a methodology to validate reference data, spatial alignment of image data, and novel error metrics.

COMPLEXITY OF REFERENCE DATA
Remotely sensed imaging spectrometer data and accompanying reference data are complex products that capture information from complex scenes. Producing these data requires making assumptions which are often overlooked by end users. For example, in the creation of imagery, each link in the imaging chain contributes non-zero error to the final data product. This imaging chain includes key links such as the radiative transfer of electromagnetic energy from ground level to the imaging sensor, camera optics and digital sensors, image processing, etc. [43,26]. Creation of reference data further contributes error, as producing ground cover class maps requires the loss of much information while complex scenes are simplified into a finite set of classes [44]. The selection of appropriate ground cover classes and the eventual loss of intra-class variability have been shown to significantly affect image processing accuracy [45]. Furthermore, numerous additional sources of error exist whether reference data are produced via field surveys, imagery analysis, image processing, or mixed methods [12,5]. In addition to these standard sources of error in imagery and reference data, the methodology described in this research uses high spatial resolution imaging spectrometer data to produce abundance level reference data for low spatial resolution imaging spectrometer data. As a result, there are differences between the two imaging systems, including collection altitude and angle, calibration and post-processing, etc. Even the basic interpretation of an abundance is complex when it comes to assigning physical meaning to the theoretical definition of an abundance being a scalar multiplier of an endmember spectrum. As such, the work contained in this dissertation comes with a number of assumptions and limitations which are outlined below. 
Assumptions:
1. Imagery data, image processing results, and reference data all contain error.
2. Through validation, error in reference data can be accurately estimated.
3. Through quantitative assessment of image processing results using validated reference data, error from imagery data and image processing can be accurately estimated.
4. The chosen classification schemes broadly characterize the major ground materials of the scenes.
5. The endmember spectra of ground cover classes approximate the mean of the spectra they are intended to represent.
6. The ground cover classes are distinct enough to be separable using only red, green, and blue colors during field surveys and imagery analysis.
7. Spatial alignment of high and low resolution imagery products is accurate enough so that the accuracy of validated reference data is not significantly degraded.
8. The physical interpretation of theoretical abundance scalars corresponds to sensor line of sight surface area fractions on the ground.

Limitations:
1. Validated reference data have both mean and standard deviation error, of which only mean error can be removed from analysis, leaving standard deviation error as a persistent form of error between image processing results and reference data.
2. Utilizing reference data to assess image processing results requires using the same classification scheme for both image processing and reference data; ground cover classes can't be added or subtracted without re-generating and re-validating reference data.
3. Reference data validation is accomplished using samples from multiple scenes together, requiring image processing assessments to also be conducted together.
4. Reference data validation is tied to ground level and high resolution imagery abundances, meaning that we validate line of sight surface area fractions falling within nominal low resolution imagery pixel squares; however, the point spread function of low resolution sensors overemphasizes information from the center of pixels and includes contributions from outside the nominal pixel square on the ground, creating another persistent form of error between image processing results and reference data.

CONCLUSIONS
We have introduced and analyzed five of the most widely used aerial imaging spectrometer reference data maps. We commend the professionals who compiled these and other data sets, which have supported several decades of remote sensing research. Despite the usefulness of these well-known reference data, they also have weaknesses that are not often acknowledged.
Specifically, none of these data have been validated, and only two of the five have readily accessible documentation describing how the data were generated. Furthermore, these well-known scenes were all estimated at the area- or pixel-level of spatial detail, lacking the abundance level detail needed to assess spectral unmixing algorithms. We identified several lesser-known studies that used sub-pixel reference data, but these data sets lack validation and have not been adopted widely. We therefore recommend that the remote sensing community develop methodologies to efficiently generate validated abundance level reference data. Finally, we acknowledged the complexity of remote sensing scenes, imagery, image processing algorithms, and reference data, and we therefore encourage the responsible use of these data products.

Chapter 3
Remotely Sensed Reference Data

FOREWORD
This paper was presented and published in the SPIE Defense and Security 2016 proceedings [27]. It introduces and demonstrates remotely sensed reference data (RSRD), which is a new method of efficiently generating abundance map reference data (AMRD). It fulfills our first research objective, namely, "Demonstrate a new method of efficiently generating abundance map reference data (AMRD)."

ABSTRACT
Exploitation of imaging spectrometer (IS) data (hyperspectral imagery) using classification and spectral unmixing algorithms is a major research area in remote sensing, with reference data required to assess algorithm performance. However, we are limited by our inability to generate rapid, accurate, and consistent reference data, thus making quantitative algorithm analysis difficult. As a result, many investigators present either limited quantitative results, use synthetic imagery, or provide qualitative results using real imagery. Existing reference data typically classify large swaths of imagery pixel-by-pixel, per cover type. While this type of mapping provides a first order understanding of scene composition, it is not detailed enough to include the complexity of mixed pixels. Accounting for mixed pixels requires estimating sub-pixel abundances, i.e., abundance map reference data (AMRD). Our new methodology for generating large scale AMRD is called remotely sensed reference data (RSRD). This paper demonstrates the process of using the RSRD methodology to produce AMRD using NEON and AVIRIS imagery. It also addresses challenges related to the fusion of multiple remote sensing modalities (e.g., different sensors, sensor look angles, spatial registration, varying scene illumination, etc.). A new algorithm for spatial registration of IS imagery with disparate resolutions is also presented. Several versions of AMRD results were compared to each other, with total cover area differences of 0.7-6.4%, depending on the ground cover class. Differences between individual pixels, on the other hand, had means of 1.0-8.8%. These versions of AMRD differed less than when AMRD were compared to directly unmixed AVIRIS data, which had total cover area differences of 0.9-13.0% and mean individual pixel differences of 2.8-21.3%. These AMRD results are promising, with future field survey and imagery analysis required to quantify the accuracy of AMRD produced using the RSRD technique.

INTRODUCTION
Imaging spectrometers were conceived in the 1980s [46] and imaging spectrometer (IS) data (hyperspectral imagery) became widely available in the 1990s when the Jet Propulsion Laboratory (JPL) fielded the Airborne Visible / Infrared Imaging Spectrometer (AVIRIS) [47].
Classification refers to algorithms that label each pixel as belonging to a certain ground cover class of materials according to the chosen classification scheme [10]. For example, in this paper the classification scheme consists of the following classes: roof, photosynthetic vegetation (PV), non-photosynthetic vegetation (NPV), bare soil (BS), and pavement [48,49]. The representative spectral signature for each class is often called an endmember if it represents a pure composition [50]. A classification algorithm would assign each pixel in the image to one of these five ground cover classes based on which endmember most closely resembles the spectrum of the given pixel.
Spectral unmixing, on the other hand, refers to a variety of algorithms wherein the spectrum of each pixel is assumed to come from a mixture of endmembers within the ground footprint of the pixel. Spectral unmixing algorithms are designed to estimate the fraction or abundance of each endmember in each pixel. For example, a given pixel may be made up of 50% PV, 25% NPV, and 25% BS. Spectral unmixing has been exhaustively studied [50], and since the problem is challenging the effort continues into the present [51].
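As an illustrative sketch of such a mixed pixel (the three-band endmember spectra below are invented), the linear mixture and its recovery by ordinary least squares in the noise-free case look like this:

```python
import numpy as np

# Toy illustration of the linear mixing model: a mixed pixel's spectrum is a
# weighted sum of endmember spectra. All endmember values are invented.
endmembers = np.array([
    [0.05, 0.45, 0.30],   # PV  endmember across 3 hypothetical bands
    [0.20, 0.25, 0.35],   # NPV endmember
    [0.30, 0.28, 0.33],   # BS  endmember
]).T                      # shape: (bands, endmembers)

true_abund = np.array([0.50, 0.25, 0.25])  # 50% PV, 25% NPV, 25% BS

pixel = endmembers @ true_abund  # the mixed-pixel spectrum under the model

# In the noise-free, full-rank case, least squares recovers the abundances.
est, *_ = np.linalg.lstsq(endmembers, pixel, rcond=None)
```

Real imagery adds noise and endmember variability, which is why constrained algorithms such as NNLS (discussed later in this paper) are used in practice.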
One of the challenges involved in assessing the performance of classification or unmixing algorithms, is knowing the actual abundances per class for each pixel. Traditionally, a map that provides the true composition of classes or abundances on the ground has been called a "ground truth" map or a reference data map [26]. Unfortunately, there is a lack of quality reference data for remotely collected imaging spectrometer data. Collection of reference data is challenging for many reasons, not the least of which being the large spatial scale of aerial imaging spectrometer data. Field survey derived reference data maps are time consuming and costly; and while often considered the gold standard, they are still prone to positional and thematic errors [12]. Existing reference data typically classify area-by-area, or at best pixel-by-pixel, without accounting for the mixed pixel abundance information necessary for assessing spectral unmixing.
Without quality reference data, it is not possible to quantitatively assess and compare algorithm performance on imaging spectrometer data. This fact, along with the lack of reference data maps, has forced researchers to present quantitative results using synthetic imagery and qualitative results using a small number of well-known imaging spectrometer scenes to show that their results visually approximate previous results. Large area, high quality, abundance map reference data (AMRD) would enable quantitative assessment of spectral unmixing algorithms on real imaging spectrometer data. This paper introduces a methodology for generating large area, high quality AMRD, which we call remotely sensed reference data (RSRD). Demonstration of this methodology, including initial results, is presented using the imagery shown in Table 3.1 [53]. The main purpose of this paper is to demonstrate the use of low altitude, high spatial resolution NEON data to generate AMRD for AVIRIS imagery.

Data Preparation
Since NEON and AVIRIS data were collected by different sensors at different altitudes and times of day, data preparation attempts to align data spectrally, spatially, and radiometrically. Both NEON and AVIRIS imaging spectrometer data were provided as orthorectified reflectance imagery. NEON's imaging spectrometer is designed by NASA JPL [54] and is considered to be a next generation version of AVIRIS. NEON used a CIMEL sun photometer for atmospheric characterization and verified atmospheric compensation using tracor tarps [53]. AVIRIS used a proprietary variant of the Atmospheric Removal Program (ATREM) for atmospheric compensation [55].
It is worth noting here that the digital elevation maps (DEM) used for orthorectification by NEON and AVIRIS did not include small scale elevation deviations from ground surface, such as buildings, trees, etc. Both NEON and AVIRIS sensors look side to side +/-15 degrees, but AVIRIS flies at a higher altitude than NEON and its swath width is wide compared to NEON (see Table 3.1). Therefore, the sensor look angle for AVIRIS remains relatively fixed throughout NEON's swath width. Tall objects such as trees, that depart significantly from the DEM, are imaged in slightly different ground locations by the two sensors due to parallax effects. Since the flight patterns for these collections were oriented north to south, parallax differences are amplified towards the east and west edges of the NEON data. Accounting for parallax distortions is an area of future work.
NEON data contain 428 contiguous spectral bands from 380-2510 nm with the full-width half max (FWHM) of spectral channels being roughly 5 nm [52]. AVIRIS data contain 224 contiguous spectral bands from 400-2500 nm with a FWHM of roughly 10 nm [9]. ENVI's spectral resampling tool was used to resample NEON data such that the new bands matched AVIRIS bands. ENVI uses a Gaussian model with instrument-specific FWHM information to accomplish resampling. NEON and AVIRIS data also contained noisy bands due to atmospheric absorption and other factors and these bands (1-10, 104-114, 153-168, 215-224) were removed prior to further processing.
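A minimal sketch of Gaussian-weighted spectral resampling in the spirit of ENVI's tool (the function and variable names are our own, not ENVI's API, and the wavelengths below are invented for the example):

```python
import numpy as np

def gaussian_srf_resample(spectra, fine_wl, coarse_wl, coarse_fwhm):
    """Resample fine-band spectra to coarse bands by weighting fine bands with a
    Gaussian spectral response function centered on each coarse band.
    spectra: (n_pixels, n_fine_bands); wavelengths and FWHM in nm."""
    # Convert FWHM to Gaussian standard deviation: sigma = FWHM / (2*sqrt(2*ln 2)).
    sigma = coarse_fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    out = np.empty((spectra.shape[0], len(coarse_wl)))
    for j, (wl, s) in enumerate(zip(coarse_wl, sigma)):
        w = np.exp(-0.5 * ((fine_wl - wl) / s) ** 2)
        out[:, j] = spectra @ (w / w.sum())  # normalized weighted average
    return out

# Toy usage: 5 nm fine bands resampled to 10 nm coarse bands.
fine_wl = np.arange(400.0, 500.0, 5.0)
coarse_wl = np.arange(410.0, 490.0, 10.0)
coarse_fwhm = np.full(len(coarse_wl), 10.0)
flat = np.ones((4, len(fine_wl)))          # spectrally flat test spectra
res = gaussian_srf_resample(flat, fine_wl, coarse_wl, coarse_fwhm)
```

Because the weights are normalized, a spectrally flat input remains flat after resampling, which is a quick sanity check on any resampling implementation.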
When the data sets were loaded into ENVI, it was visually apparent that the georeferencing of the data sets was off by roughly 1-2 AVIRIS pixels, or 15-30 m. Since NEON data were to be aggregated according to the corresponding AVIRIS pixel to produce AMRD, it was important for the two data sets to be spatially aligned. A new algorithm was developed for spatial alignment with the following steps:
1. Extract a spatial subset of NEON imaging spectrometer data corresponding to the desired study area.
2. Extract a spatial subset of AVIRIS imaging spectrometer data that is larger than the study area by enough margin (in this case, three AVIRIS pixels in all directions) to accommodate all candidate alignment shifts.
3. For each candidate shift, aggregate the NEON pixels falling within each AVIRIS pixel and compute the mean spectral angle (SAM) between the aggregated NEON spectra and the corresponding AVIRIS spectra.
4. The shift location with the lowest mean SAM is the position of optimal spatial alignment.
5. Trim the aligned data to the nearest whole AVIRIS and NEON pixels.
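The alignment search can be sketched as a brute-force scan over integer fine-pixel shifts. This is a simplified illustration of the mean-SAM criterion described above (all names are hypothetical; the published SWSC implementation may differ in detail):

```python
import numpy as np

def sam(a, b):
    """Spectral angle (radians) between two spectra."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def align_by_mean_sam(fine, coarse, ratio, max_shift):
    """Find the (row, col) fine-pixel shift minimizing mean spectral angle
    between block-averaged fine pixels and coarse pixels.
    fine: (Hf, Wf, bands) cube extending max_shift fine pixels beyond the
    coarse footprint on all sides; coarse: (Hc, Wc, bands); ratio = GSD ratio."""
    Hc, Wc, _ = coarse.shape
    best_shift, best_err = None, np.inf
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            r0, c0 = max_shift + dr, max_shift + dc
            sub = fine[r0:r0 + Hc * ratio, c0:c0 + Wc * ratio]
            # aggregate fine pixels to the coarse grid (block means)
            agg = sub.reshape(Hc, ratio, Wc, ratio, -1).mean(axis=(1, 3))
            err = np.mean([sam(agg[i, j], coarse[i, j])
                           for i in range(Hc) for j in range(Wc)])
            if err < best_err:
                best_shift, best_err = (dr, dc), err
    return best_shift, best_err
```

With a synthetic coarse image embedded in the fine cube at a known offset, the scan recovers that offset with near-zero mean SAM.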
At this point, our data were atmospherically compensated, orthorectified, spectrally resampled, and spatially aligned. The effectiveness of the data preparation process can be assessed in at least two ways. The first is to take the spectral average of all pixels in the aligned images and plot NEON against AVIRIS. This result is shown in Figure 3.3, which indicates good concurrence between the two data sets, with the exception of a few noisy regions in the AVIRIS data. The second is to look back at Figure 3.2: when optimally aligned, the mean spectral angle between AVIRIS and integrated NEON pixels decreased from 0.13 radians before alignment to 0.085 radians after alignment.

Classification Scheme & NEON Endmember Determination
For any type of reference data map, a classification scheme must be developed, choosing ground cover classes and representative endmembers that broadly summarize the ground targets in the study area. The classification scheme for this study area was designed based on studying the NEON RGB imagery; initial classification of pixels used the distance measure of Eqn. 3.3 with β = 0.2. Endmember spectra were then updated based on a 50% contribution from the original endmember pixels and a 50% contribution from the average of all pixels assigned to each class. This process was repeated until the endmember spectra representing each class remained stable. Through this method, the original intent/definition of the classes was preserved, while ensuring that endmember spectra were representative of the entire class.
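The refinement loop described above can be sketched as follows. Euclidean distance is used here as a stand-in for the Eqn. 3.3 distance measure, and the 50/50 blend always refers back to the original seed spectra, as in the text (all function and variable names are illustrative):

```python
import numpy as np

def refine_endmembers(pixels, seeds, max_iter=50, tol=1e-6):
    """Iteratively refine class endmembers: assign each pixel to its
    nearest endmember, then blend 50% of the original seed spectrum with
    50% of the current class-mean spectrum, until spectra stabilize.
    pixels: (n_pixels, bands); seeds: (n_classes, bands)."""
    ems = seeds.copy()
    for _ in range(max_iter):
        # distance from every pixel to every current endmember
        d = np.linalg.norm(pixels[:, None, :] - ems[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = ems.copy()
        for k in range(len(seeds)):
            members = pixels[labels == k]
            if len(members):
                new[k] = 0.5 * seeds[k] + 0.5 * members.mean(axis=0)
        if np.max(np.abs(new - ems)) < tol:   # spectra stable: converged
            return new, labels
        ems = new
    return ems, labels
```

Because the blend anchors on the original seeds, the refined spectra cannot drift arbitrarily far from the analyst's intended class definitions.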

Classification & Spectral Unmixing of NEON Data
Classification of NEON data was performed by assigning each pixel to the class whose endmember minimizes Eqn. 3.3. Once each pixel was assigned to a class, similar classes were combined, thus resulting in the final classes.
Spectral unmixing was performed using a non-negative least squares (NNLS) algorithm [56]. When applied to spectral unmixing, this algorithm performs linear unmixing while guarding against negative contributions from endmembers. The linear mixing model (LMM) [10] is given by

X = EA + N,

where X and N are k × n matrices, E is k × m, and A is m × n; k is the number of spectral bands, n is the number of pixels, and m is the number of endmembers. Unconstrained unmixing can be performed using the standard normal equations, i.e.,

A = (E^T E)^(-1) E^T X.

However, unconstrained unmixing allows a_i < 0 and Σ a_i ≠ 1, which have no physical meaning in the LMM. The NNLS algorithm used in this paper therefore ensured that a_i ≥ 0 [56]. No sum-to-one constraint was enforced during initial unmixing, because we use class means as endmembers rather than the most extreme pixels in the image, naturally resulting in mixtures above and below the ideal sum-to-one constraint. Our solution was to perform NNLS and scale abundances afterward, such that Σ a_i = 1 for each pixel. After unmixing, abundance maps of similar endmembers were combined to form the final abundance maps.
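A minimal sketch of this two-step procedure, using SciPy's NNLS solver followed by the per-pixel sum-to-one rescaling described above (an illustration of the approach, not the authors' exact code):

```python
import numpy as np
from scipy.optimize import nnls

def unmix_nnls(X, E):
    """Linear unmixing X ≈ E A with a_i >= 0 via NNLS, then per-pixel
    rescaling so abundances sum to one.
    X: (bands, pixels); E: (bands, endmembers). Returns A: (m, n)."""
    m, n = E.shape[1], X.shape[1]
    A = np.empty((m, n))
    for j in range(n):
        A[:, j], _ = nnls(E, X[:, j])   # non-negative least squares per pixel
    s = A.sum(axis=0)
    s[s == 0] = 1.0                     # avoid divide-by-zero for empty pixels
    return A / s
```

For a pixel that is an exact non-negative mixture of the endmembers, NNLS recovers the true fractions and the rescaling is a no-op.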
Whether performing classification or unmixing, the objective of the algorithm is to estimate A, the abundance matrix. Abundance maps are simply the slices of A belonging to each class. For example, the "roof" slice of A is a two-dimensional matrix whose pixels align with the imagery pixels. Pixel values fell in the range 0 to 1, with 0 representing 0% roof and 1 representing 100% roof.

Generation of Abundance Map Reference Data for AVIRIS
Once a classification scheme was selected for the study area and NEON pixels were either classified or unmixed, the next step aggregated NEON pixels that fit within corresponding AVIRIS pixels to produce AVIRIS scale abundance maps. Since the edges of NEON and AVIRIS pixels do not align perfectly, partial NEON pixels were assigned to the AVIRIS pixel in which the majority of their area resides. The result of this process is our desired AMRD for AVIRIS imagery.
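For square, axis-aligned pixel grids, majority-area assignment reduces (up to boundary ties) to assigning each fine pixel to the coarse cell containing its center. A minimal sketch of the aggregation step, with names and coordinate conventions as assumptions (coarse_origin is taken as the minimum-x, minimum-y corner of the coarse grid):

```python
import numpy as np

def aggregate_abundances(fine_ab, fine_xy, coarse_origin, coarse_gsd, shape):
    """Aggregate fine-scale abundances (n_pix, n_classes) onto a coarse
    grid by assigning each fine pixel to the coarse cell containing its
    center, then averaging. fine_xy: (n_pix, 2) map coords of centers."""
    cols = ((fine_xy[:, 0] - coarse_origin[0]) // coarse_gsd).astype(int)
    rows = ((fine_xy[:, 1] - coarse_origin[1]) // coarse_gsd).astype(int)
    Hc, Wc = shape
    amrd = np.zeros((Hc, Wc, fine_ab.shape[1]))
    count = np.zeros((Hc, Wc))
    for r, c, ab in zip(rows, cols, fine_ab):
        if 0 <= r < Hc and 0 <= c < Wc:   # drop pixels outside the grid
            amrd[r, c] += ab
            count[r, c] += 1
    filled = count > 0
    amrd[filled] /= count[filled][:, None]
    return amrd
```

Each coarse cell's AMRD value is then the mean abundance of the roughly 225 fine (1-m) pixels falling inside a 15-m AVIRIS pixel.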

Direct Spectral Unmixing of AVIRIS Data
For purposes of comparison with RSRD, AVIRIS data were unmixed directly using NNLS unmixing. Endmember selection for AVIRIS was challenging, because relatively few AVIRIS pixels cover the spatial extent of the study area and most pixels were highly mixed. Endmember determination is often the most difficult aspect of spectral unmixing, especially as resolution decreases or ground complexity increases. Various endmember determination strategies were attempted; we settled on extracting AVIRIS endmember spectra directly from single AVIRIS pixels, using RGB imagery and the NEON abundance maps for each class to guide pixel selection.

Comparison of Classification and Unmixing
Either classification or spectral unmixing can be used on NEON imaging spectrometer data to ultimately generate AMRD for AVIRIS data using the RSRD process. Classification has the advantage that aggregating the many classified NEON pixels within each AVIRIS pixel provides good fractional quantization for AVIRIS pixels. Classification also avoids the circular argument of creating reference data to assess unmixing performance using other unmixing results. Unmixing, on the other hand, improves fractional quantization compared to classification: even 1-m² pixels are often mixed, and classification ignores this sub-pixel complexity. This is particularly relevant for the NPV and BS classes in this study area, where senescent grass is often highly mixed with the background soil spectral response. In other words, a significant portion of the study area is classified as NPV; however, many of these NPV pixels also contain a small but significant portion of BS, and that complexity is ignored by classification.

Abundance Map Reference Data
Larger differences again correspond to more abundant classes. Differences between individual pixels had a mean of 2.8-21.3%. These differences were larger than those in Table 3.2. This result highlights the difficulty of endmember determination and spectral unmixing for high-altitude, low-spatial-resolution data sets such as AVIRIS.

CONCLUSIONS
This paper presents a new methodology for generating abundance map reference data (AMRD). This methodology, called remotely sensed reference data (RSRD), uses high spatial resolution imaging spectrometer (IS) data to produce AMRD for low spatial resolution IS data. The RSRD process is demonstrated in this paper using NEON (1-m GSD) and AVIRIS (15-m GSD) IS data over NEON research sites near Fresno, CA. Initial results are promising, based on human analyst comparison of AMRD and high resolution RGB imagery.
Concurrence between AMRD generated via separate classification and unmixing strategies also suggests that the AMRD are representative of what is actually on the ground. Specifically, the total area coverage differences between these AMRD were 0.7-6.4%, depending on ground cover class, while the differences between individual pixels were 1.0-8.8%. These differences compare favorably with the differences between directly unmixed AVIRIS data and AMRD, which were 0.9-13.0% and 2.8-21.3%, respectively. Ground based surveys and imagery analysis will be conducted in subsequent efforts to further quantify the accuracy of AMRD generated via RSRD in this paper. Pending confirmation of the validity of this approach, RSRD has the potential to greatly improve the creation of reference data, because it produces high quality, low cost AMRD for low spatial resolution imaging spectrometer data at vast spatial scale.

Validation of Abundance Map Reference Data

FOREWORD
This chapter was published as a peer-reviewed journal article in Remote Sensing in May 2017 [5]. It validates the accuracy of abundance map reference data (AMRD) generated using the remotely sensed reference data (RSRD) technique for three remote sensing scenes near Fresno, CA. It fulfills our second research objective, namely, "validate the new method of generating reference data to characterize the expected error and uncertainty."

ABSTRACT
The purpose of this study is to validate the accuracy of abundance map reference data (AMRD) for three airborne imaging spectrometer (IS) scenes. AMRD refers to reference data maps ("ground truth") that are specifically designed to quantitatively assess the performance of spectral unmixing algorithms. While classification algorithms typically label whole pixels as belonging to certain ground cover classes, spectral unmixing allows pixels to be composed of fractions or abundances of each class. The AMRD validated in this paper were generated using our previously-proposed remotely-sensed reference data (RSRD) technique, which spatially aggregates the results of standard classification or unmixing algorithms from fine spatial-scale IS data to produce AMRD for co-located coarse-scale IS data. Validation of the three scenes was accomplished by estimating AMRD in 51 randomly-selected 10 m×10 m plots, using seven independent methods and observers. These independent estimates included field surveys by two observers, imagery analysis by two observers, and RSRD by three algorithms. Results indicated statistically-significant differences between all versions of AMRD. Even AMRD from our two field surveys were significantly different for two of the four ground cover classes. These results suggest that all forms of reference data require validation prior to use in assessing the performance of classification and/or unmixing algorithms. Given the significant differences between the independent versions of AMRD, we propose that the mean of all (MOA) versions of reference data for each plot and class is most likely to represent true abundances. Our independent versions of AMRD were compared to MOA to characterize error and uncertainty. Best case results were achieved by a version of imagery analysis, which had mean coverage area differences of 2.0%, with a standard deviation of 5.6%.
One of the RSRD algorithms was nearly as accurate, achieving mean differences of 3.0%, with a standard deviation of 6.3%. Further analysis of statistical equivalence yielded an overall zone of equivalence between [−7.0%, 7.2%] for this version of RSRD. The relative accuracy of RSRD methods is promising, given their potential to efficiently generate scene-wide AMRD. These results provide the first known validated abundance level reference data for airborne IS data.

INTRODUCTION
Since the widespread introduction of aerial imaging spectrometer (IS) data (hyperspectral imagery) in the 1990s [9,47], researchers have been developing processing algorithms to analyze and interpret the vast data produced by IS sensors [10,46]. Often, these algorithms are designed to produce labeled maps where each pixel is assigned to one of a finite number of ground cover classes. Common ground cover classes for ecological research are photosynthetic vegetation (PV), non-photosynthetic vegetation (NPV), bare soil (BS), rock, etc. This type of processing, where each whole pixel is assigned to a class, is called classification [10,57].
Spectral unmixing is another method of processing imaging spectrometer data that is similar to classification, except that each pixel is allowed to be a mixture of the pure classes [50,51]. Instead of classifying whole pixels, each pixel is assigned to be a fraction or abundance of each class. For example, a given pixel could be 15% PV, 50% NPV and 35% BS. Since the ground sample distance (GSD) of imaging spectrometer data is often 10 m or greater [9,58], most of the pixels in a scene are composed of a mixture of classes. Spectral unmixing accounts for these mixed pixels and produces abundance maps, where each class has a corresponding abundance map that estimates the fraction of that class in each pixel.
An example abundance map is shown in the accompanying figure. Reference data, or reference data maps, represent a specific type of map that is used to quantitatively assess how well a classification or spectral unmixing algorithm performs [12].
Reference data maps are usually created through field surveys, imagery analysis, algorithms or a mixture of these methods [31,32,38]. Reference data maps have often been called "ground truth" maps within the remote sensing community, but recently, there has been an effort to use reference data terminology [12]. We strongly support this change, because the term ground truth is overused and implies that the maps are free of error.
Due to the large spatial extent of aerial imaging spectrometer scenes, generating new reference data maps is expensive and time consuming [30,32]. As a result, few reference data maps have been created, even though imaging spectrometers have been operational for several decades. Cuprite [31], Indian Pines [32], Salinas [3] and Pavia [38] are examples of commonly-used reference data products. Despite being widely used for several decades to assess the performance of new classification algorithms, we have not been able to locate validation reports that characterize the error or uncertainty in these reference data. Additionally, these products label whole pixels as belonging to classes, making them of little use for assessing the performance of spectral unmixing algorithms.
Researchers have used various methods to assess the performance of spectral unmixing algorithms. Qualitative assessments have visually compared unmixing results to well-known reference data, such as Cuprite, Indian Pines and Pavia [59,60,51]. Quantitative assessments have used synthetic data [59,60,51], average per-pixel residual error [61,51], taken the mean of several algorithms as the ideal and computed residuals from the ideal [61], etc. Furthermore, there are several studies that have created sub-pixel reference data using high-resolution RGB videography [39], RGB imagery [40], multi-spectral imagery [41] and IS imagery [42]. However, although many of these studies alluded to the need for the validation of sub-pixel reference data, none implemented an assessment approach of reference data, nor did they expand on which reference data development approach is best suited to this challenge. This research therefore further explores the topic of sub-pixel reference data generated via high-resolution imagery.
In summary, the main challenges with reference data are that new reference data maps are prohibitively expensive to produce; existing reference data products have not been validated for accuracy; and we are not aware of any commonly-used reference datasets that contain abundance level detail.
Our previous work [27] proposed a new technique for efficiently generating abundance map reference data (AMRD). This technique is called remotely-sensed reference data (RSRD).
RSRD aggregates the results of standard classification or spectral unmixing algorithms on fine-scale imaging spectrometer data (e.g., 1-m GSD National Ecological Observatory Network (NEON) IS data) to generate scene-wide AMRD for co-located coarse-scale imaging spectrometer data (e.g., 15-m GSD Airborne Visible-Infrared Imaging Spectrometer (AVIRIS) IS data), thereby enabling quantitative assessment of spectral unmixing algorithms. An example AVIRIS pixel grid pattern, overlaid on NEON imagery, is shown in the accompanying figure. The primary objective of this paper is to validate three AMRD scenes that were produced using the RSRD technique. The secondary objective of this paper is to promote the understanding that every method of generating reference data is vulnerable to certain types of error; reference data should not be used to assess classification or unmixing performance without a validation report that characterizes error and uncertainty in the reference data itself.

Airborne Data and Cover Classes
The three scenes used throughout this paper are located on or adjacent to NEON D17 research sites near Fresno, CA [52]. Overhead images of the three scenes, collected by NEON's co-mounted framing RGB camera, are shown in Figure 4.3. Imagery over NEON D17 research sites was chosen for this study because NEON and AVIRIS conducted a joint campaign in June 2013, collecting imagery over the same areas on the same days [53].
The three specific scenes were chosen to be representative of a wide variety of remote sensing scenes. The selected scenes include asphalt roads, dirt roads, concrete, buildings, grass fields, dry valley grasslands, high mountain forests, large rock outcroppings, etc.
As such, the specific validation methods and conclusions in this paper should be generally applicable to many remote sensing scenes, especially environments dominated by rural or suburban features. It is worth noting that the intent of this research was to introduce the fine spatial resolution approach to coarser spatial resolution imaging spectroscopy pixel unmixing. Although we attempted to include both natural and man-made environments, we would not expect these scenes to be representative of dense urban environments. The selected scenes are summarized in Table 4.1. It is important to note that while we intend to produce AMRD specifically for AVIRIS data, the validation study in this paper was done for a generic 10-m GSD coarse-scale grid, allowing specific application for any imagery product with GSD larger than 10 m (i.e., AVIRIS, HyspIRI, Hyperion, EnMAP, Landsat, etc.). Through analysis of RGB imagery of the three scenes, we chose a classification scheme similar to common ecological studies [48,49]. The following ground cover classes were chosen for our three scenes:
- Photosynthetic vegetation (PV): live green trees, bushes, weeds, grass, etc.
- Non-photosynthetic vegetation (NPV): senescent grass and other dead vegetation.
- Bare soil (BS): exposed soil, such as dirt roads and bare ground.
- Rock: exposed rock, such as large rock outcroppings.
These classes represent the main ground constituents of our scenes, without over-complicating reference data generation and validation, i.e., they serve to illustrate the approach in this ecological context.
Historically, reference data maps have been generated using field surveys, imagery analysis, algorithms or a combination of multiple methods [3]. Each of these methods for generating reference data has strengths and weaknesses. For example, field surveys provide the best situational awareness and ability to closely examine ground cover. However, they are subject to human error, such as bias, fatigue, mistakes, etc. Field surveys are also subject to perspective and positional error [12]. Imagery analysis eliminates perspective and positional error, but reduces situational awareness and retains human error. Algorithms eliminate perspective and positional error, and reduce human error, but remove human intuition and situational awareness. Taking this information into account, the optimal method of generating reference data is not clear and will be further explored in this paper.
An important question for our study is how to validate and characterize the error in reference data, when reference data are already the most accurate estimates available. Our approach is to produce multiple independent versions of reference data for the same plots on the ground using traditional methods and RSRD and then to compare the results. To our knowledge, this is the first validated reference data presented at abundance-level spatial detail.
Time and budget constraints permitted the compilation of seven independent versions of reference data for 51 10 m×10 m plots, spread across the three scenes. Sample plots were selected using a pseudo-random number generator. The reference data methods and observers were: field surveys by two observers (Field-A and Field-B); imagery analysis by two observers (Analyst-A and Analyst-B); and RSRD by three algorithms (RSRD-ED, RSRD-NNLS and RSRD-ML). The observers were both graduate students in imaging science, i.e., a degree of familiarity with imaging products and image interpretation was assured.
Validation and characterization of error was accomplished by comparing each of these data to each other, as well as comparing each version of reference data to the mean of all (MOA) data. Through this analysis, we examined whether certain datasets were statistically different from one another, estimated the mean and standard deviation of differences from MOA and used equivalence hypothesis tests to determine statistical similarity to MOA.

Reference Data Collection
Validation of RSRD as a credible technique for producing AMRD is dependent on compiling multiple independent versions of reference data that are each of the highest quality. As such, our team compiled seven independent versions of reference data for 51 of the selected 10 m×10 m plots shown in Figure 4.3. All but eight of the 51 plots were randomly selected, discarding plots that were located on busy roads, rooftops, etc. Eight non-randomly selected plots targeted under-represented cover types, such as asphalt, sidewalks and dirt roads. Due to time and budget constraints, private property issues, etc., the 51 plots were a subset of the original randomly selected plots.
The goal of reference data collection was to carefully estimate the abundance of ground cover classes in each of the 51 10 m×10 m plots, using field surveys, imagery analysis and RSRD methodologies. The following sections explain the details for each of these collection methodologies.

Field Surveys
Observers navigated to each plot on the ground; we assumed that this approach limited geospatial inaccuracies.
In order to more accurately estimate plot level abundances, a rope grid was positioned on top of the 10 m×10 m plot, dividing the plot into 25 2 m×2 m samples. This is illustrated in Figure 4.4a. With the grid in place, the two observers independently estimated the abundance of ground cover classes in each of the 2 m×2 m samples. If the observers' estimates differed by more than 15% cover area for a given class, the sample was re-examined by both researchers to reduce errors. Estimates were recorded on data collection sheets, as shown in Figure 4.4b. In addition to abundance estimates for each 2 m×2 m sample, photos were captured showing the entire plot, near nadir photos were captured of each 2 m×2 m sample, GPS coordinates of plot center locations were measured, and a library of spectral samples was compiled using a Spectra Vista Corporation (SVC) field portable spectrometer.
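The in-field reconciliation rule (re-examine any 2 m×2 m sample where the two observers' estimates differ by more than 15% for any class) can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def flag_disagreements(est_a, est_b, threshold=0.15):
    """Return indices of samples whose two independent abundance
    estimates (n_samples, n_classes, as fractions) differ by more than
    the threshold for any class, marking them for re-examination."""
    diff = np.abs(np.asarray(est_a) - np.asarray(est_b))
    return np.flatnonzero((diff > threshold).any(axis=1))
```

Running this check every few samples mirrors the procedure of comparing estimates in the field before moving on.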
Paper data collection sheets and other data were compiled and recorded in a spreadsheet each evening, so that errors and ambiguities could be resolved while the day's collection was still fresh in memory.

Remotely-Sensed Reference Data
Euclidean distance (ED) classification [10], NNLS unmixing [56] and maximum likelihood (ML) classification [10] were selected to estimate scene-wide AMRD using the RSRD technique. We chose these algorithms because of their widespread adoption, established reputation, and relative simplicity. This study was not intended to identify the best possible algorithm, i.e., we opted to use established methods to evaluate reference data outcomes.
Endmembers were extracted for each scene separately by selecting ten exemplar pixels for each cover class and taking their spectral mean. We opted for this approach, rather than algorithmic endmember extraction techniques, in order to maintain human analyst control and interpretive intent over class-representative spectra. The same endmembers were used for ED, NNLS and ML within each scene.
For ED, which is equivalent to a nearest-neighbor classifier, NEON IS pixels were evaluated using Equation (4.1),

ED(x, s) = sqrt( Σ_{i=1}^{n} (x_i − s_i)^2 ),    (4.1)

where x represents the spectrum of a pixel, s is an endmember, and n is the number of spectral bands. Pixels were assigned to the class whose endmember minimized Equation (4.1).

Aggregating data and comparing results allowed us to build confidence that the RSRD methodology can be utilized to provide quality AMRD for any medium to large GSD imagery. As individual samples were aggregated to the 2-m, 6-m and 10-m scales, the variance of abundance estimates decreased, showing that the RSRD methodology, applied to 1-m NEON IS data, can be used to generate AMRD for any imagery with GSD greater than or equal to 10 m.
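A minimal sketch of the ED classifier, with the Euclidean distance of Equation (4.1) written in its standard form (names are illustrative):

```python
import numpy as np

def ed_classify(pixels, endmembers):
    """Nearest-endmember classification by Euclidean distance:
    ED(x, s) = sqrt(sum_i (x_i - s_i)^2).
    pixels: (n_pix, bands); endmembers: (m, bands). Returns labels (n_pix,)."""
    # pairwise distances between every pixel and every endmember
    d = np.linalg.norm(pixels[:, None, :] - endmembers[None, :, :], axis=2)
    return d.argmin(axis=1)
```

Each 1-m NEON pixel is labeled with its nearest class; aggregating labels within a coarse cell then yields fractional abundances.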

Methods of Data Comparison
Abundance map reference data estimates from the seven methods discussed above (Field-A, Field-B, Analyst-A, Analyst-B, RSRD-ED, RSRD-NNLS and RSRD-ML) were compared to one another to produce difference data. Difference data were evaluated using histograms, statistical measures such as the mean and standard deviation of differences, t-tests and equivalence tests.
When using a t-test, the intention is to reject the null hypothesis, with the null hypothesis being that all data or treatments are the same. In other words, t-tests are intended to demonstrate differences between datasets or treatments [63]. The purpose of using t-tests in this paper was to show that independent versions of reference data exhibit statistically significant differences from one another and, as such, need to be validated rather than simply assumed to be "ground truth".
Failing to reject the null hypothesis does not prove similarity, but provides inconclusive results. Equivalence tests are similar to t-tests, but are used to prove similarity. When using an equivalence test, the intention is still to reject the null hypothesis, but in this case, the null hypothesis is that the data are different. Rejecting the null hypothesis in this case proves statistical equivalence. Equivalence tests require establishing a zone of equivalence, [θ_0 − δ, θ_0 + δ]. An equivalence test then finds the probability p that µ ∉ [θ_0 − δ, θ_0 + δ].
If p < α, the data are proven equivalent with 100(1 − 2α)% confidence [64]. The purpose of using equivalence tests in this paper was to define equivalence zones wherein versions of reference data were statistically equivalent [64] to MOA.
In practice, equivalence tests are performed by calculating confidence intervals for the difference between two datasets; if the entire confidence interval falls within the zone of equivalence, the two datasets are deemed statistically equivalent. It should also be noted that we want reference data to have both small mean and small standard deviation differences from one another; however, it is easier to "pass" statistical tests when data have a large standard deviation. For this reason, the mean and standard deviation of difference data are presented along with t-tests and equivalence tests.

Comparison of Field Survey Data
Histograms of the difference between Field-A and Field-B data at 2-m aggregation are presented in Figure 4.8. These histograms highlight the frequency with which the observers differed by more than 15%, despite the careful collection procedures, which mandated reassessment of any such samples. These histograms also demonstrate that difference data between the two field surveys were relatively Gaussian in shape, allowing for comparison using metrics such as the mean and standard deviation.
A t-test rejects the null hypothesis when p < α, where α, representing type I error, is typically set to 0.05 [65]. In this case, the null hypothesis was that the difference data had a zero mean. Table 4.2 displays paired t-test results for Field-A and Field-B data, along with the standard deviation of differences (σ). The results indicate that statistically significant differences exist between the field observers' estimates for NPV and BS at all spatial scales, and for PV at the 2-m scale.

Pairwise Comparisons of Reference Data Versions
In the previous section, Field-A and Field-B data were compared in some detail. Similar pairwise comparisons were performed between all versions of reference data. Each reference data version was compared to the six other versions across four ground cover classes, resulting in 24 t-tests per version. The null hypothesis was rejected for at least half of the 24 t-tests for every reference data version. Comparing field survey and imagery analysis methods only, the null hypothesis was rejected for 13 of the 24 t-tests. These results showed that our independent versions of reference data were significantly different from one another and that no single method stood out as the clear winner.

Comparison of Reference Data Versions to the Mean of All
Results from pairwise comparisons showed that various forms of carefully estimated reference data were significantly different. Furthermore, it was not clear which version of reference data was superior. Given these findings, we propose that for each plot and class, the mean of all (MOA) versions of reference data is more likely to represent true ground cover class abundances than any single method or observer. In this section, all reference data versions are compared to MOA, in order to determine which reference data approach was closest to the best available estimate of true abundances.
In order to better visualize the MOA concept, consider how each reference data version differs from MOA across plots. One way to explore which reference data method is closest to MOA would be to display difference data histograms, similar to Figure 4.8, for each combination of reference data methods and aggregation scales. However, much of the information in a histogram can be represented by the center (mean) and spread (standard deviation) of the difference data. To further streamline data presentation, mean and standard deviation data can be combined across classes. These data are presented in Figure 4.11.
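Computing MOA and each version's difference statistics can be sketched as follows (array shapes and names are assumptions about how the estimates might be stored):

```python
import numpy as np

def moa_differences(versions):
    """Given abundance estimates from several independent reference data
    versions, stacked as (n_versions, n_plots, n_classes), compute the
    mean of all (MOA) per plot and class, plus each version's mean and
    standard deviation of differences from MOA."""
    versions = np.asarray(versions, dtype=float)
    moa = versions.mean(axis=0)                    # (n_plots, n_classes)
    diff = versions - moa[None, ...]               # per-version differences
    return moa, diff.mean(axis=(1, 2)), diff.std(axis=(1, 2), ddof=1)
```

The mean difference then plays the role of a version's bias (accuracy), and the standard deviation its reliability (precision), as discussed later.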

Statistical Equivalence Tests
MATLAB R2016a's ttest function provides confidence intervals that could be used for equivalence testing; however, we modeled all versions of our reference data within the Statistical Analysis Software (SAS OnDemand) general linear model (GLM) environment to produce more robust confidence intervals. The results of this analysis are provided in Figure 4.12.
It should be noted that for modeling purposes in the SAS GLM environment, we compared each reference data version to a linear combination of all other versions. This "leave-one-out" approach is slightly different than comparison to MOA, as MOA includes the data being compared. For this reason, SAS GLM confidence intervals are not perfectly centered on the mean distance from MOA results displayed in subsequent tables.
The ideal confidence interval result would be narrow intervals centered on zero. Our results showed that the accuracy of reference data versions varied considerably across classes, e.g., Analyst-A data were the best version of reference data for rock, but were the worst version for NPV and BS. Among RSRD versions, RSRD-NNLS had the overall tightest zone of equivalence, with RSRD-ED a close second. Tabulated confidence interval data and the resulting equivalence zone are provided for RSRD-NNLS data in Table 4.4. Note that the overall equivalence zone is simply the extent of extreme lower and upper CI values.

DISCUSSION
As noted before, both field observers were graduate students in imaging science, with an emphasis in remote sensing, who carefully and independently estimated reference data for 1275 2 m×2 m samples, within 51 10 m×10 m plots. Their estimates were collected at the same time and compared in the field after every fifth sample to reduce the chances of human error. If any percent coverage estimate varied by 15% or more for any class, they each examined the 2 m×2 m sample again. Data were transferred to electronic form each evening, with errors being corrected on the same day as data collection. Our field surveys were assumed to be of the highest quality that can reasonably be collected, and procedures were in place to reduce human error. Yet, despite this careful estimation of reference data, statistically significant differences existed between Field-A and Field-B data at the 2-m, 6-m and 10-m spatial aggregation scales. Further examination revealed statistically significant differences between all of our versions of reference data, and the traditional methods of field surveys and imagery analysis fared no better than the RSRD methods.
It should be noted that the standard deviation of differences between Field-A and Field-B data were smaller than for the other pairwise comparisons. This may indicate that field surveys were the most dependable way to produce accurate reference data. However, it should also be noted that the field survey data may have been less independent than the other versions. The observers worked in tandem throughout the field surveys. While they each independently estimated all 1275 2 m×2 m samples, the collection procedures included comparing results every five samples in an effort to reduce errors and mistakes. This process of checking answers inevitably had some effect on abundance estimation decisions.
The t-test results in this paper are a reminder that even carefully generated reference data are not perfectly accurate and should be validated prior to use in assessing other algorithms. In the past, remote sensing algorithm papers generally have not discussed the accuracy of reference data and often have not cited how reference data were created. Given the results in this paper, we encourage a more robust treatment of reference data in the remote sensing community.
Having used t-tests to demonstrate significant differences between versions of reference data, in order to reinforce the importance of accuracy validation, we now turn our attention toward determining the degree of similarity among reference data. We accomplished this by examining the mean and standard deviation of differences, as well as confidence intervals and resulting zones of equivalence.
We interpreted mean differences as absolute bias between different forms of reference data, i.e., Observer-A "saw" less NPV on average than Observer-B. These biases between reference data versions and MOA were relatively similar across reference data versions.
Standard deviation of differences from MOA, on the other hand, were interpreted as the steadiness or reliability of each reference data estimate, i.e., Observer-B's estimates were reliably close to MOA and there were few outliers. These mean and standard deviation differences can be equated to accuracy and precision, respectively. Spatial aggregation of reference data from 2 m to 10 m did not decrease mean differences; however, aggregation significantly and predictably reduced standard deviation differences.
This reduction in variance with spatial aggregation was a positive trend, showing the power of the RSRD concept of aggregating many fine-scale pixels to produce AMRD for coarse-scale imagery. We expect that aggregation to coarse imagery with GSD > 10 m would result in little change to mean differences, but would further reduce standard deviation differences. As such, we consider the validation in this paper to be valid for any coarse imagery with GSD > 10 m.
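The accuracy/precision behavior just described can be sketched numerically. The sketch below is illustrative only: the bias and noise magnitudes are invented, not the study's measured values. A constant bias (accuracy) survives aggregation, while zero-mean noise (precision) shrinks as many fine-scale samples are averaged into one coarse estimate.

```python
import random
import statistics

# Illustrative per-sample abundance differences from MOA at the 2-m scale:
# a constant bias (accuracy) plus zero-mean noise (precision).
# The magnitudes are invented, not the study's measured values.
random.seed(0)
bias, noise_sd = 0.02, 0.10
diffs_2m = [bias + random.gauss(0.0, noise_sd) for _ in range(2500)]

# Aggregating 25 fine-scale (2-m) samples into one 10-m estimate
# averages away noise but leaves the bias untouched.
diffs_10m = [statistics.mean(diffs_2m[i:i + 25]) for i in range(0, 2500, 25)]

print(round(statistics.mean(diffs_2m), 3), round(statistics.stdev(diffs_2m), 3))
print(round(statistics.mean(diffs_10m), 3), round(statistics.stdev(diffs_10m), 3))
```

Both scales report roughly the same mean difference, while the standard deviation drops by approximately the square root of the number of aggregated samples, matching the trend described above.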

Statistical analysis of difference data revealed that Field-B, Analyst-B and RSRD-NNLS
versions of reference data were the closest versions to MOA. Analyst-B appeared to represent the best overall version of reference data when compared to MOA, with a mean difference of 2.0% and a standard deviation of 5.6% at 10-m aggregation. RSRD-NNLS was nearly as accurate and precise, with a mean difference of 3.0% and standard deviation of 6.3%.
It should be noted here that the generation of MOA required significant effort for just 51 plots of data and would be unrealistic across large spatial extents. Similarly, the generation of scene-wide AMRD using traditional field survey or imagery analysis methods would be unsustainable. As such, the MOA concept is used to validate the accuracy of scene-wide AMRD generated using the RSRD technique. Based on standard deviation differences and zones of equivalence, we selected RSRD-NNLS to serve as validated scene-wide AMRD.
We considered using the best case version of RSRD for each ground cover class in an effort to reduce the expected error and uncertainty in the final reference data. However, constructing scene-wide AMRD from different algorithms would violate the sum-to-one constraint on the final abundances (an assumption in many unmixing algorithms), and correcting this violation would introduce another source of error.
Using the NNLS implementation of RSRD to produce scene-wide AMRD for our three scenes yields Figures 4.13 to 4.15. The accuracy of these reference data has been validated in this paper, resulting in the data provided in Table 4.5.

Table 4.5: Accuracy validation data for scene-wide AMRD generated using RSRD-NNLS. All data are expressed in terms of percent coverage area differences.

CONCLUSIONS
The purpose of this paper was to validate abundance map reference data (AMRD) generated using the remotely-sensed reference data (RSRD) technique, which aggregates classification or unmixing results from fine-scale imaging spectrometer (IS) data to produce AMRD for co-located coarse-scale IS data. This validation effort included three separate remote sensing scenes. The scenes were specifically chosen to be representative of many remote sensing environments. They contain a variety of common ground cover, including asphalt roads, dirt roads, concrete, buildings, grass fields, dry valley grasslands, high mountain forests, large rock outcroppings, etc. Therefore, we expect the conclusions of this paper to be generally applicable to similar rural and suburban scenes. We recommend that additional validation studies focus on more diverse and even densely-mixed land cover types.
Validation was accomplished by estimating AMRD in 51 randomly selected 10 m×10 m plots, using seven different methods or observers, and comparing the results. These independent versions of reference data included field surveys by two observers, imagery analysis by two observers and RSRD by three algorithms. Given that t-test comparisons showed statistically significant differences between all seven versions of reference data, we proposed that the best estimate of actual ground cover abundance fractions within the 51 plots is found by taking the mean of all (MOA) independent versions of reference data for each plot and class. Generating MOA for a limited number of random plots is labor intensive, but once generated, MOA can then be used to validate AMRD generated using the RSRD technique, which efficiently produces scene-wide reference data and can serve as a validated baseline for all future algorithm developments.
At the 10-m GSD aggregation scale, mean differences between versions of reference data and MOA were 1.6 to 6.1%, with standard deviations of 5.6 to 9.2%. The closest version of reference data to MOA was a version of imagery analysis, with mean differences of 2.0% and a standard deviation of 5.6%. The RSRD algorithm based on NNLS spectral unmixing was nearly as close to MOA, achieving mean differences of 3.0% and a standard deviation of 6.3%. Equivalence testing yielded a zone of equivalence between [−7.0%, 7.2%] for RSRD-NNLS. Considering the efficiency of RSRD in producing scene-wide AMRD, these results are promising. Validated scene-wide AMRD generated using RSRD-NNLS are available for use by the remote sensing community.
The novel contributions to the pixel unmixing branch of remote sensing research include: (1) a documented validation of the accuracy of abundance map reference data (AMRD) itself; (2) the inclusion of five reference data classes, which is similar to many contemporary classification/unmixing efforts; and (3) the creation and validation of AMRD for three new study scenes using different IS sensors. Secondary goals are to make the larger remote sensing community more aware of the need to validate reference data itself and to establish methods to quantitatively assess the performance of spectral unmixing algorithms.
This validation effort centered around using generic 10-m GSD coarse-scale imagery, which was designed such that the reference data accuracy validation in this paper would be applicable to any co-located coarse scale imagery with GSD > 10 m.

ABSTRACT
Reference data ("ground truth") maps have traditionally been used to assess the accuracy of classification algorithms. These maps typically classify pixels or areas of imagery as belonging to a finite number of ground cover classes, but do not include sub-pixel abundance estimates; therefore, they are not sufficiently detailed to directly assess the performance of spectral unmixing algorithms. Our research aims to efficiently generate, validate, and apply abundance map reference data (AMRD) to airborne remote sensing scenes. Scene-wide AMRD for this study were generated using the remotely sensed reference data (RSRD) technique, which spatially aggregates classification or unmixing results from fine scale imagery (e.g., 1-m GSD) to co-located coarse scale imagery (e.g., 10-m GSD or larger). Validation of the accuracy of these methods was previously performed for generic 10 m × 10 m coarse scale imagery, resulting in AMRD with known accuracy. The purpose of this paper was to apply this previously validated AMRD to specific examples of airborne coarse scale imagery. Application of AMRD involved three main parts: (1) spatial alignment of coarse and fine scale imagery; (2) aggregation of fine scale abundances to produce coarse scale imagery specific AMRD; and (3) demonstration of comparisons between coarse scale unmixing abundances and AMRD. Spatial alignment was performed using our new scene-wide spectral comparison (SWSC) algorithm, which aligned imagery with accuracy approaching the distance of a single fine scale pixel. We compared simple rectangular aggregation to coarse sensor point-spread function (PSF) aggregation, and found that PSF returned lower error, but that rectangular aggregation more accurately estimated true AMRD at ground level. We demonstrated various metrics for comparing unmixing results to AMRD, including several new techniques which adjust for known error in the reference data itself.

INTRODUCTION
Reference data maps, which are often called "ground truth" maps in remote sensing literature [10,26], are maps that accurately label areas or pixels of the map as belonging to a finite number of ground cover classes. Such maps have traditionally been used to assess the accuracy of classification algorithms. Well known examples of reference data (RD) include Cuprite [31], Indian Pines [32], Salinas [3], and Pavia [38]. These pixel-wise RD have been used extensively to quantitatively evaluate the performance of classification algorithms. Abundance map reference data (AMRD) are a special type of RD, where each pixel is allowed to be a mixture of the various ground cover classes. AMRD are designed to quantitatively evaluate the performance of spectral unmixing algorithms [27,5]. However, we are not aware of any similarly well-known and widely-available AMRD scenes for airborne imaging spectrometer (IS) data.
Traditional methods of generating RD, such as field surveys and imagery analysis, have not been undertaken frequently because of the time and expense required to accurately generate such data [12]. The added complexity of allowing each RD pixel to be a mixture of the various ground cover classes further complicates the problem. In order to efficiently generate scene-wide AMRD, we proposed a technique called remotely sensed reference data (RSRD) [27], which creates AMRD for coarse scale imagery using co-located fine scale imagery. Specifically, RSRD performs standard classification or unmixing on fine scale imagery (e.g., 1-m GSD National Ecological Observatory Network (NEON) IS), then aggregates fine scale results to co-registered coarse scale pixels (e.g., 15-m GSD Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) IS), thereby creating AMRD for the coarse scale imagery.
While RD frequently have been used to assess the performance of classification algorithms and occasionally have been used to assess unmixing algorithms, many studies neglect to account for the accuracy of the RD itself. This means that RD of unknown accuracy often have been used as the benchmark for assessing the accuracy of other algorithms. Indeed, we are not aware of any RD scenes with published validation studies of the RD itself.
In order to determine the accuracy of AMRD generated using the RSRD technique, we conducted an extensive validation study of AMRD for three remote sensing scenes [5]. This paper continues our work with AMRD by applying our previously generated and validated AMRD to specific coarse scale airborne IS imagery. The main concepts involved in applying validated AMRD to coarse scale imagery are listed below.
1. Spatial alignment of fine and coarse scale imagery.
2. Aggregation of fine scale abundances to produce specific coarse scale AMRD.
3. Comparison of spectral unmixing results and AMRD.
The objective of this study, therefore, is to implement these main concepts using several versions of coarse scale imagery and spectral unmixing. The subsequent sections introduce these important concepts in more detail.

Spatial Alignment
Accurate spatial alignment between fine and coarse scale imagery is necessary to properly use validated AMRD. Misalignment by only half a coarse scale pixel would significantly increase the error in AMRD itself. Imprecise spatial alignment would necessitate scaling analysis to multiple pixel windows, for example, one previous study using RSRD-like RD recommended scaling analysis to 9 × 9 Landsat pixel windows (270 m × 270 m on the ground) due to inaccurate alignment between fine and coarse imagery [39].
Airborne imagery georegistration is often accurate to 0.5-1.0 pixels. For example, AVIRIS does not claim sub-pixel accuracy from their improved (2010 version) orthorectification and georeferencing processing [68]. We have observed registration errors of 15+ m between NEON and AVIRIS imagery [27]. Lack of georeferencing accuracy necessitates image-to-image spatial alignment, rather than relying on imaging system geocoding.
Image registration is a vast field of research where numerous approaches and algorithms have been developed for a wide variety of image processing tasks. Common approaches include feature matching and intensity correlation, with comparisons made in both the spatial and frequency domains, while post registration resampling and alignment of data are carried out with affine transforms or warping [69]. Remote sensing image registration has often relied on feature matching methods and warping [70,71]. It should be noted that image registration for RSRD is challenging due to highly dissimilar spatial resolution, the need for sub-pixel registration accuracy, and the use of different imaging sensors.

Spatial Fidelity of Individual Coarse Pixels
In addition to aligning fine and coarse scale imagery at the whole scene level, we must also concern ourselves with the spatial fidelity of individual coarse pixels. Much processing occurs between initial airborne collection of radiometric information and final output of imagery, which often includes atmospheric compensation, ortho-rectification, resampling to a north-south oriented grid, etc. [26,10,43]. The method used to spatially resample imagery is of special importance to the spatial fidelity of individual coarse pixels. Nearest-neighbor resampling is often employed in the ortho-rectification and resampling to a north-south grid process [72,73], prioritizing the radiometric fidelity of each collected sample, rather than the spatial fidelity of radiometric information. In the best case, nearest-neighbor resampling results in some final pixels whose reflectance values are a result of initial collections at pixel edge rather than at pixel center. In the worst case, nearest-neighbor resampling results in significant pixel re-use, where adjacent pixels are identical, because one airborne collection sample was spatially closest to two or more final pixels.
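A simple diagnostic for this kind of pixel re-use is to count identical adjacent pixels along a row of coarse imagery. The single-band row of pixel values below is hypothetical.

```python
# Hypothetical single-band row of coarse pixel values; nearest-neighbor
# resampling can leave adjacent pixels with identical values when one
# collected sample was closest to two or more output pixels.
row = [0.31, 0.31, 0.42, 0.42, 0.42, 0.55, 0.61, 0.61]

# Count adjacent duplicates as a crude re-use indicator.
reused = sum(1 for a, b in zip(row, row[1:]) if a == b)
print("adjacent duplicates:", reused)
```

A high duplicate count relative to row length suggests significant pixel re-use, as observed in the AVIRIS Ortho data discussed later.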

Spatial Aggregation
Aggregation is the term we use to describe averaging together the spectra of many fine pixels that are co-located within a single coarse scale pixel. Aggregation can be implemented using a simple averaging filter covering the rectangular area conceptualized by each coarse scale pixel, or it can take into account the point spread function (PSF) of a pixel, in which reflected light from the center of a pixel contributes more to its final reflectance value than light from the pixel edges, while each pixel is also affected by reflected light originating from outside its rectangular outline [74,75]. The rectangular method is more representative of cover class mixtures on the ground, while the latter is more representative of the radiometric information that would have been collected by the imaging sensor. Both methods are used in this study for purposes of comparison.
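The two aggregation strategies can be sketched as follows. The fine-scale abundance grid (a vertical cover boundary) and the border width are illustrative, not taken from the study scenes.

```python
import math

# Illustrative comparison of rectangular vs. PSF aggregation for one
# 10 m x 10 m coarse pixel over a 1-m GSD abundance grid. The grid
# (a vertical cover boundary) is made up, not from the study scenes.
n, border = 10, 2                       # coarse pixel size, fine-pixel border
size = n + 2 * border
fine = [[1.0 if x < 4 else 0.0 for x in range(size)] for _ in range(size)]

# Rectangular aggregation: plain mean over the coarse pixel footprint.
rect = sum(fine[y][x] for y in range(border, border + n)
                      for x in range(border, border + n)) / (n * n)

# PSF aggregation: Gaussian weights with FWHM equal to the coarse GSD,
# so light from outside the rectangular footprint also contributes.
sigma = n / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # FWHM -> sigma
cx = cy = (size - 1) / 2.0
wsum = psf = 0.0
for y in range(size):
    for x in range(size):
        w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
        psf += w * fine[y][x]
        wsum += w
psf /= wsum
print(round(rect, 3), round(psf, 3))
```

Here the PSF estimate differs from the rectangular estimate because cover lying just outside the pixel footprint still contributes under the Gaussian weighting, which is exactly the distinction drawn in the text.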

Comparison of Unmixing Results and AMRD
In most cases, AMRD have not been available to quantitatively assess the performance of spectral unmixing algorithms. In the absence of AMRD, researchers have used various methods to assess performance. Qualitative assessments have visually compared unmixing results to well-known RD, such as Cuprite, Indian Pines, and Pavia [59,60,51]. Quantitative assessments have used synthetic data [59,60,51], average per-pixel residual error [61,51], and the mean of several algorithms as an ideal against which residuals are computed [61].
Confusion matrices have been the standard method of quantitative comparison between RD maps and classification algorithms [12,26,10]. A confusion matrix is an n × n matrix for an n-class problem. Rows of the matrix typically represent RD, while columns represent classification data, where the (i, j)th entry represents the number of pixels in the image belonging to class-i that were classified to class-j. Confusion matrices allow the evaluation of overall classification accuracy, as well as providing insight into which classes tend to be confused with one another. Unfortunately, confusion matrices are not readily applicable to spectral unmixing and AMRD. Methods have been proposed to generalize confusion matrices for use with AMRD [76], but these methods do not allow straightforward comparison of individual matrix elements, and essentially amount to comparing total RD area to total unmixed area per class.
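A minimal confusion-matrix computation, using hypothetical reference and classification labels for a three-class problem, illustrates the structure described above:

```python
# Hypothetical pixel-wise labels for a 3-class problem; entry (i, j) of
# the matrix counts pixels of true class i assigned to class j.
classes = ["PV", "NPV", "BS"]
reference  = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
classified = [0, 1, 1, 1, 2, 2, 2, 0, 1, 0]

n = len(classes)
cm = [[0] * n for _ in range(n)]
for r, c in zip(reference, classified):
    cm[r][c] += 1

# Overall accuracy is the trace divided by the total pixel count.
overall = sum(cm[i][i] for i in range(n)) / len(reference)
for name, row in zip(classes, cm):
    print(name, row)
print("overall accuracy:", overall)
```

Off-diagonal entries reveal which classes are confused with one another; as the text notes, no such per-element reading carries over cleanly to fractional abundances.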
As mentioned previously, there are a number of studies that have produced sub-pixel RD through methodologies similar to our RSRD process. These studies also implemented methods to compare AMRD with spectral unmixing results, with the most common strategy being calculation of mean absolute error (MAE) and/or root-mean square error (RMSE) [41,40,42]. Several studies plotted unmixed fraction versus RD fraction and computed a linear regression of the result. Using this technique, perfect unmixing would result in an R² and slope equal to unity, with the intercept equal to zero [39,42].
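These comparison metrics are straightforward to compute. The sketch below derives MAE, RMSE, and the regression slope, intercept, and R² for hypothetical unmixed and reference fractions of one class.

```python
import math

# Hypothetical unmixed abundances vs. reference (AMRD) fractions
# for a single ground cover class.
amrd    = [0.00, 0.10, 0.25, 0.40, 0.60, 0.80, 1.00]
unmixed = [0.05, 0.12, 0.20, 0.45, 0.55, 0.85, 0.95]

n = len(amrd)
mae  = sum(abs(u - a) for u, a in zip(unmixed, amrd)) / n
rmse = math.sqrt(sum((u - a) ** 2 for u, a in zip(unmixed, amrd)) / n)

# Ordinary least squares: unmixed = b1 * amrd + b0.
# Perfect unmixing gives slope 1, intercept 0, and R^2 = 1.
xm, ym = sum(amrd) / n, sum(unmixed) / n
sxx = sum((x - xm) ** 2 for x in amrd)
sxy = sum((x - xm) * (y - ym) for x, y in zip(amrd, unmixed))
b1 = sxy / sxx
b0 = ym - b1 * xm
ss_res = sum((y - (b1 * x + b0)) ** 2 for x, y in zip(amrd, unmixed))
ss_tot = sum((y - ym) ** 2 for y in unmixed)
r2 = 1.0 - ss_res / ss_tot
print(round(mae, 3), round(rmse, 3), round(b1, 3), round(b0, 3), round(r2, 3))
```

For these toy data the slope falls slightly below unity with a small positive intercept, the kind of departure from the ideal line that the regression diagnostics in later sections are designed to expose.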

Data
The three study scenes used in this paper are located on or near NEON Domain 17 (D17) research sites near Fresno, CA, USA. We selected these specific areas because they contained significant ground cover variation. The three scenes contain mostly natural landscapes, including high mountain forests, large rock outcroppings, dry valley grasslands, and oak savannah. We also included a developed area with buildings, roads, grass fields, etc. These cover types are representative of many remote sensing scenes and the results should be generally applicable to other similar studies. Extension of our methodologies to more complex urban landscapes should be possible, but we caution that each additional ground cover class increases the complexity of reference data generation and validation, while likely lowering the accuracy of resulting AMRD. Further information for each site is listed below, and RGB imagery of the study areas is shown in Figure 5.1. NEON RGB data were not used in processing tasks, but were useful for enhancing human analyst understanding of the study areas. Note that we used both orthorectified and unorthorectified AVIRIS data in this study, because AVIRIS' orthorectification processing resulted in many replicated pixels due to nearest-neighbor resampling. AVIRIS Ortho data were generated as part of the NASA HyspIRI preparatory campaign, which spatially resampled data to a 15-m GSD grid, rather than the standard 18-m GSD grid. This finer sample spacing resulted in higher than normal pixel reuse from nearest-neighbor resampling.

Validated Abundance Map Reference Data
As mentioned previously, the primary purpose of this paper is to demonstrate the application of previously validated AMRD to specific examples of airborne remotely sensed imagery, such as the AVIRIS imagery described in Table 5.1. We recommend reviewing the validation paper [5] for detailed information on the validation process; however, we will mention the effort briefly here.
Let â_p,c,m be an estimate of a*_p,c, using method m, where a*_p,c is the "true" abundance fraction of ground cover class c in coarse pixel p. We generated â_p,c,m for P = 51 plots of 10 m × 10 m, randomly chosen from the three study scenes, C = 4 ground cover classes, and M = 7 independent methods of abundance estimation. Details of C and M are listed below.
As discussed previously, RSRD aggregates fine scale classification or unmixing results to co-located coarse scale pixels, thereby estimating abundance fractions. We performed the underlying classification/unmixing using a Euclidean distance (ED) classifier, non-negative least squares (NNLS) spectral unmixing algorithm, and by taking the posterior probabilities from a maximum likelihood (ML) estimator, thereby estimating coarse scale abundances using independent algorithms employed within the RSRD framework. We found statistically significant differences between â_c,1, â_c,2, ..., â_c,7, and concluded that no single estimation method was demonstrably superior, based on pairwise comparisons.
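As one concrete piece of this pipeline, NNLS unmixing of a single pixel can be sketched as below. This is a minimal projected-gradient solver with hypothetical endmember spectra, not the study's implementation; a production system would use a dedicated NNLS solver such as scipy.optimize.nnls.

```python
# Minimal sketch of non-negative least squares (NNLS) unmixing by
# projected gradient descent. Endmember spectra and the mixed pixel
# are hypothetical (3 bands, 2 endmembers).
E = [[0.8, 0.1],   # band reflectances of endmembers (e.g., PV, BS)
     [0.3, 0.6],
     [0.2, 0.9]]
s = [0.45, 0.45, 0.55]  # mixed pixel: an exact 50/50 combination

def nnls(E, s, steps=5000, lr=0.1):
    """Minimize ||E x - s||^2 subject to x >= 0 via projected gradient."""
    m, n = len(E), len(E[0])
    x = [0.0] * n
    for _ in range(steps):
        resid = [sum(E[i][j] * x[j] for j in range(n)) - s[i] for i in range(m)]
        grad = [sum(E[i][j] * resid[i] for i in range(m)) for j in range(n)]
        x = [max(0.0, x[j] - lr * grad[j]) for j in range(n)]  # project onto x >= 0
    return x

abund = nnls(E, s)
print([round(a, 3) for a in abund])
```

Because the pixel here is an exact 50/50 mixture of the two endmembers, the solver recovers abundances of 0.5 each; note that plain NNLS does not enforce sum-to-one, which is why the coarse AMRD pixels required a separate constraint, as described later.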
Given that a * p,c was unknown, we concluded that the best estimate of true abundance fractions within each 10 m × 10 m plot was the mean of all (MOA) independent versions.
We then compared each estimation method to MOA using Equations (5.2)-(5.4), and found that â_p,c,4 (Analyst-B) was the closest method to MOA; however, â_p,c,6 (RSRD-NNLS) was nearly as close to MOA, and provided the opportunity to efficiently generate scene-wide AMRD. As such, we selected RSRD-NNLS to serve as AMRD for our three scenes, with validated accuracy documented in Table 5.2. Note that we used our NNLS unmixing algorithm on fine scale imagery, but we later enforced a sum-to-one constraint on coarse scale AMRD pixels. This validation was performed for generic 10 m × 10 m coarse scale imagery, such that the validated scenes could be applied to any co-located coarse scale imagery with GSD > 10 m, such as the AVIRIS data used in this study. The validation study suggested that further aggregation beyond 10 m GSD would decrease the expected standard deviation differences from MOA, but would have no expected effect on mean differences from MOA. Since the coarse data used in this study have GSD ≥ 15 m, we expect the resulting AMRD accuracy to be similar to that listed in Table 5.2, with slightly reduced standard deviation.
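Enforcing a sum-to-one constraint on a coarse AMRD pixel can be done by simple renormalization. The abundance values below are hypothetical, and renormalization is one plausible approach; the paper does not spell out its exact method in this excerpt.

```python
# Hypothetical coarse AMRD pixel whose class abundances sum to 0.96;
# renormalizing enforces the sum-to-one constraint assumed by many
# unmixing algorithms.
abund = {"PV": 0.42, "NPV": 0.31, "BS": 0.18, "Rock": 0.05}
total = sum(abund.values())
abund = {c: a / total for c, a in abund.items()}
print({c: round(a, 3) for c, a in abund.items()})
```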
The three scenes and corresponding validated AMRD are shown in Figures 5.3-5.5.
Note that a fifth class, other (roof), was included in the abundance maps to account for the buildings in SJERHS. However, this class was excluded from analysis in the validation because we were not able to collect field survey validation samples on roof tops. We have also excluded it from analysis in this paper, since its accuracy was not validated.

Spatial Alignment
Spatially aligning fine scale NEON IS data and coarse scale AVIRIS IS data was an important step in applying validated AMRD to specific coarse scale imagery. We first attempted the image registration task using well known feature matching algorithms, including various configurations of ENVI's image registration workflow [71] and a Scale-Invariant Feature Transform (SIFT) based algorithm [77]. In all cases, it was difficult to accurately identify matching features between fine and coarse scale imagery, leading to badly warped and misaligned output data. Furthermore, these algorithms resampled the warp image to coincide with the base image coordinate system, and did not provide a method to estimate registration accuracy in terms of distance on the ground. It is worth noting that the purpose of this study was not to optimize a state-of-the-art image registration algorithm; we simply needed a method capable of aligning fine and coarse scale imagery within a fraction of coarse scale pixels, thereby preserving the accuracy of validated AMRD.
As such, we used an enhanced version of our own image alignment algorithm [27] to align NEON IS imagery with AVIRIS Ortho, AVIRIS Unortho, NEON 15 m, and NEON 29 m data. We coined the term "Scene-Wide Spectral Comparison" (SWSC) for this algorithm. SWSC is an intensity-comparison, spatial-domain, affine-transform-class alignment algorithm that uses the full spectrum of IS data for alignment comparisons. Specifically, the algorithm iteratively rotates, scales, and translates the fine scale imagery, and at each iteration, compares coarse scale pixel spectra to an aggregation of the underlying fine scale pixel spectra. Comparisons were made using Spectral Angle Mapper (SAM) [10], which is described in Equation (5.5), where s represents an AVIRIS pixel spectrum and x represents an aggregation of underlying NEON pixel spectra.
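The SAM comparison of Equation (5.5) can be sketched directly; the four-band spectra below are hypothetical.

```python
import math

# Spectral Angle Mapper (SAM): the angle, in radians, between a coarse
# pixel spectrum s and an aggregated fine pixel spectrum x. The 4-band
# reflectance spectra below are hypothetical.
def sam(s, x):
    dot = sum(a * b for a, b in zip(s, x))
    ns = math.sqrt(sum(a * a for a in s))
    nx = math.sqrt(sum(b * b for b in x))
    # Clamp for floating-point safety before taking the arccosine.
    return math.acos(max(-1.0, min(1.0, dot / (ns * nx))))

s = [0.21, 0.33, 0.45, 0.52]
x = [0.20, 0.35, 0.44, 0.50]
print(round(sam(s, x), 4))                       # small angle: similar shapes
print(round(sam(s, [v * 2.0 for v in s]), 4))    # 0.0: SAM ignores brightness
```

Because SAM depends only on spectral shape, not magnitude, it tolerates brightness differences between the NEON and AVIRIS sensors, which makes it a reasonable choice for cross-sensor alignment comparisons.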
The following steps detail our use of the SWSC algorithm to align NEON and AVIRIS imagery:

1. Extract study scene imagery from flight-line imagery in such a way that AVIRIS scenes are larger than NEON scenes in terms of real distance on the ground.
2. Spectrally resample NEON data to match AVIRIS spectral sampling, using a Gaussian model with FWHM equal to band spacing.
3. Trim NEON and AVIRIS data, resulting in square study scenes with an odd number of pixels along each edge. AVIRIS should still be larger than NEON in real ground distance.

4. Construct an x-y coordinate system for each image, where the image center is located at the origin and each pixel is represented by an integer value.
6. At each rotate, scale, and translate iteration, compare each AVIRIS pixel spectrum to the underlying aggregated NEON pixel spectra using Equation (5.5) (see Figure 5.6).
7. Compute the mean r_SAM of all comparisons in the previous step.
8. Cycle through all rotation, scale, and translation iterations.
9. Examine the mean r_SAM graphs to identify the rotation, scale, and translation position resulting in the minimum mean r_SAM; this is the location of optimal alignment.
10. Based on the optimal alignment location, trim excess AVIRIS and NEON imagery, resulting in final aligned images.
11. Based on the optimal alignment location, aggregate validated AMRD corresponding to final aligned AVIRIS imagery, resulting in final aligned AVIRIS specific AMRD.
12. Confirm alignment via visual inspection of imagery and AMRD.
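The search at the heart of the steps above can be sketched in one dimension, translation only, with a two-band toy image; the real SWSC algorithm additionally iterates rotation and scale, in two dimensions, over full IS spectra.

```python
import math

def sam(s, x):
    """Spectral angle (radians) between two spectra."""
    dot = sum(a * b for a, b in zip(s, x))
    ns = math.sqrt(sum(a * a for a in s))
    nx = math.sqrt(sum(b * b for b in x))
    return math.acos(max(-1.0, min(1.0, dot / (ns * nx))))

# Toy fine "image": 60 pixels x 2 bands with a cover boundary at
# pixels 20-39; the coarse GSD spans 10 fine pixels.
fine = [[0.1 + 0.8 * (20 <= i < 40), 0.9 - 0.8 * (20 <= i < 40)]
        for i in range(60)]
gsd, true_shift = 10, 3   # the coarse image actually starts at fine pixel 3

def aggregate(off, p):
    """Mean spectrum of the fine pixels under coarse pixel p at offset off."""
    return [sum(fine[off + p * gsd + k][b] for k in range(gsd)) / gsd
            for b in range(2)]

coarse = [aggregate(true_shift, p) for p in range(5)]

# SWSC-style search: try every candidate offset and keep the one with
# the minimum mean spectral angle across all coarse pixels.
best = min(range(gsd),
           key=lambda off: sum(sam(coarse[p], aggregate(off, p))
                               for p in range(5)) / 5)
print("recovered offset:", best)
```

At the correct offset the aggregated fine spectra reproduce the coarse spectra exactly, so the mean angle drops to zero; this is the one-dimensional analogue of the mean r_SAM trough described in step 9.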

Spatial Aggregation
Aggregation of fine scale pixels corresponding to coarse scale pixels was used in both the SWSC algorithm and in subsequent generation of final coarse scale AMRD. We performed both of these tasks using two versions of aggregation: a simple rectangular averaging filter the size of coarse scale pixels, and a Gaussian filter with full width at half maximum (FWHM) equal to the GSD of coarse scale pixels. Gaussian aggregation of fine scale pixels was representative of the coarse scale sensor PSF [43]. Rectangular aggregation was performed via averaging pixel spectra in the spatial domain, while PSF aggregation was performed via multiplication in the frequency domain and downsampling in the spatial domain.
We re-used the same endmembers for coarse scale unmixing that we had used to produce our validated AMRD. In the case of AVIRIS Ortho/Unortho unmixing, NEON derived endmembers were spectrally resampled to match AVIRIS bands. Endmembers were obtained from NEON IS imagery by manual inspection and selection, including choosing ten exemplar pixels for each initial ground cover class, whose spectral mean served as the class endmember (s c ). While this method was somewhat simplistic compared to state of the art endmember generation and variability research, its accuracy compared to field surveys and imagery analysis was carefully validated in our previous work. Endmembers were selected independently for each of the three scenes and remained constant throughout validation and application. Unmixing was performed using a larger number of initial ground cover classes, which were subsequently combined into our final ground cover classes of PV, NPV, BS, and Rock. For example, in the SJERHS scene, initial ground cover classes were: white roof, grey roof, grass vegetation, tree vegetation, dry vegetation, dry grass, dirt road, dirt trail, concrete, and blacktop. We selected these initial ground cover classes to characterize the major spectral variation within the scene. Unmixing was performed using these initial ground cover classes, with resulting abundances being combined to produce abundances for our final cover classes. Further details are available in our AMRD validation paper [5].
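Combining the unmixed abundances of the initial classes into the final cover classes amounts to summing per group. The grouping and abundance values below are assumptions for illustration only; the paper's exact class mapping is not reproduced in this excerpt.

```python
# Hypothetical per-pixel abundances for some of the SJERHS initial
# classes, and an assumed (illustrative) grouping into final classes.
initial = {"grass vegetation": 0.20, "tree vegetation": 0.15,
           "dry vegetation": 0.10, "dry grass": 0.25,
           "dirt road": 0.20, "dirt trail": 0.10}
groups = {"PV":  ["grass vegetation", "tree vegetation"],
          "NPV": ["dry vegetation", "dry grass"],
          "BS":  ["dirt road", "dirt trail"]}

# Final-class abundance is the sum of its member initial-class abundances.
final = {g: sum(initial[c] for c in members) for g, members in groups.items()}
print(final)
```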
The calculated abundances, á_p,c, are found using Equation (5.18), and the closeness of fit, R²_c, is found using Equation (5.19), where ā_c represents the mean unmixing fraction of class c.

Surprisingly, mean r_SAM values for AVIRIS Ortho were lower than for AVIRIS Unortho, despite the high rate of pixel reuse from nearest-neighbor resampling. Mean r_SAM values were highest for the SOAP299 scene. We attributed these differences in alignment accuracy to steep elevation gradients and shade shifting throughout the day in the SOAP299 scene. PSF aggregation returned slightly lower mean r_SAM results than the rectangular case. Similar to the results displayed in Figure 5.8, alignment between NEON IS and semi-synthetic coarse imagery returned much lower mean r_SAM values than alignment with AVIRIS imagery. Occasionally, PSF and rectangular aggregation methods yielded slightly different rotation, scale, and/or translation parameters, which amounted to alignment differences of less than 1 m. In these cases, we split the difference between the results from the two methods.

Spatial Alignment Accuracy
Error metrics comparing unmixing results to AMRD are summarized in Table 5.4. In this analysis, we combined results from the three study scenes and four ground cover classes, due to the large amount of available data. Note that errors for certain scenes or classes may depart significantly from the mean, as shown in Figure 5.9. Large errors in unconstrained LS unmixing disproportionately affected the AVIRIS cases, especially AVIRIS Unortho. Given that the discrepancies between NEON and AVIRIS coarse imagery were also significant, we also recomputed unmixing and reference type error metrics, while including LS unmixing and excluding AVIRIS imagery, resulting in Table 5.6.

Assessing the Performance of Unmixing Using Regression
Linear regression between unmixing abundances and AMRD provided another method for analyzing the performance of unmixing algorithms. Figure 5.9 shows scatter plots of unmixing abundances versus AMRD for the same data analyzed above.

The semi-synthetic NEON coarse imagery demonstrated what well-aligned results look like, and provided a baseline against which to compare AVIRIS results. In this context, AVIRIS alignment results were far from the perfect test case, but ultimately resulted in well-defined troughs with the precision required to achieve single fine scale pixel accuracy per parameter. We interpreted these results as indicating successful spatial alignment between coarse and fine imagery. In making this interpretation, however, we acknowledge some level of misalignment, which lowers the accuracy of validated AMRD.
We used both orthorectified and unorthorectified versions of AVIRIS data in this study, due to high pixel replication rates from nearest-neighbor resampling during the orthorectification process. As such, we expected unorthorectified AVIRIS data to align more closely with NEON imagery, but our results indicated little difference between the two data sets, with orthorectified data slightly outperforming unorthorectified data in spatial alignment. Minimum mean r_SAM values for AVIRIS Ortho ranged from 0.07-0.14 radians, while AVIRIS Unortho values ranged from 0.09-0.13 radians. The ideal coarse imagery for this study likely would have been orthorectified, but used interpolation, rather than nearest-neighbor resampling, during the orthorectification process.
We compared PSF and rectangular aggregation strategies for both spatial alignment and coarse imagery specific AMRD generation. As expected, PSF based spatial alignment returned lower minimum mean r_SAM values than the rectangular case, by a factor of approximately 10%. Similarly, unmixing assessments using PSF-aggregated AMRD resulted in slightly lower error than rectangular-aggregated AMRD, by a factor of approximately 5%.
Nevertheless, we considered rectangular-aggregated AMRD to more accurately represent true ground cover per pixel. PSF aggregation more accurately represented sensor reaching radiance in our imagery, but our goal was to assess ground level abundances, which were better represented by rectangular aggregation. The difference between PSF and rectangular AMRD represents a persistent form of error in unmixing; i.e., even if we could perfectly unmix the signal measured by an imaging sensor, perfect unmixing of ground abundances would require undoing sensor PSF blurring. We recommend using PSF aggregation in spatial alignment and rectangular aggregation when generating imagery-specific AMRD.
Mean absolute error (MAE) and linear regression provided concise methods for evaluating unmixing performance. Furthermore, reference data mean adjusted MAE (MA-MAE) and confidence interval adjusted MAE (CIA-MAE) gave us the ability to factor in the known error of AMRD data itself. These metrics allowed us to bring the comparison back to MOA, which we determined to be the best available representation of true ground cover fractions. As such, we interpreted the MA-MAE metric as being the most likely error from true ground cover, with CIA-MAE providing bounds. Basic MAE turned out to have lower error for the cases in this study, but lower does not always mean better, as our goal was to assess the true error in unmixed abundances.
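The exact MA-MAE and CIA-MAE definitions live in the paper's equations, which are not reproduced in this excerpt; the sketch below illustrates one plausible reading, in which AMRD are shifted by their validated mean difference from MOA (and by the confidence interval endpoints) before computing MAE. All numbers are hypothetical.

```python
# Hedged sketch of reference-data-adjusted error metrics: shift AMRD by
# its validated mean difference from MOA (bias), or by the confidence
# interval endpoints, before computing MAE. This is an illustrative
# reading, not the paper's exact formulation; all values are invented.
unmixed = [0.22, 0.53, 0.66, 0.18]
amrd    = [0.25, 0.55, 0.70, 0.20]
bias, ci = 0.03, (0.01, 0.05)   # illustrative validated AMRD statistics

def mae(u, a, shift=0.0):
    """MAE between unmixed fractions and (optionally shifted) AMRD."""
    return sum(abs(x - (y + shift)) for x, y in zip(u, a)) / len(u)

print(round(mae(unmixed, amrd), 4))         # basic MAE
print(round(mae(unmixed, amrd, bias), 4))   # mean-adjusted (MA-MAE) reading
print(round(mae(unmixed, amrd, ci[0]), 4),
      round(mae(unmixed, amrd, ci[1]), 4))  # CI-adjusted (CIA-MAE) bounds
```

Consistent with the discussion above, the basic MAE in this toy case is lower than the adjusted value: accounting for known reference-data error can raise the estimated true error rather than lower it.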
Based on MAE analysis alone, we might have concluded that BS was the overall most accurate class in our study. However, the regression results suggested that low MAE in BS may have been due to overall lower fractional abundances, rather than superior accuracy. While MAE answered the basic question of how far the average unmixed abundance was from its true abundance, regression scatter plots provided the ability to visualize and correctly interpret the differences. Scene-specific coloring in the regression figures allowed us to examine the influence of various ground cover classes from each scene. For example, Figure 5.10a-d indicate that a large number of SOAP299 (mountain forest) data points fell near 1 in AMRD, but significantly lower than 1 in unmixing abundances.
Further exploration of this phenomenon revealed that these data were primarily from the PV class. It appears that our selection of endmember spectra for PV in the SOAP299 scene emphasized the brightest areas of the image, and as such, the less bright PV areas were underestimated in LS and NNLS unmixing. FCLS unmixing compensated for some of this error. These findings emphasized the importance of endmember estimation and the strong effect of intra-class variability in the unmixing process. Combined use of MAE and regression analysis thus enhanced our understanding of unmixing accuracy.
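The regression diagnostic described above can be sketched in a few lines. A fitted slope well below 1, as we observed for PV in SOAP299, signals systematic underestimation of high-abundance pixels even when MAE alone looks acceptable; the helper below is an illustrative implementation, not the exact analysis code.

```python
import numpy as np

def regress_abundances(amrd, unmix):
    """Fit unmixed abundance as a linear function of reference abundance.
    A slope near 1 and intercept near 0 indicate unbiased unmixing; a
    slope well below 1 indicates systematic underestimation of
    high-abundance pixels."""
    slope, intercept = np.polyfit(amrd, unmix, 1)
    pred = slope * amrd + intercept
    ss_res = np.sum((unmix - pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((unmix - unmix.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                     # coefficient of determination
    return slope, intercept, r2
```

For example, unmixed abundances that track the reference with slope 0.8 would yield a near-perfect R2 yet still understate every high-abundance pixel, which is exactly the pattern MAE alone can mask.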

CONCLUSIONS
The purpose of this paper was to apply our previously validated abundance map reference data (AMRD) to specific examples of airborne coarse scale imagery. Spatial alignment between coarse and fine imagery was perhaps the most challenging aspect of this task. After unsuccessfully attempting spatial alignment using standard remote sensing tools, we opted to implement our own spatial alignment algorithm, namely, scene-wide spectral comparison (SWSC), which was designed to use all of the spatial and spectral information from both scenes simultaneously. We determined that this approach yielded alignment accuracy approaching that of a single fine scale pixel.
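The core idea behind SWSC, scoring candidate alignments by the scene-wide mean spectral angle between the coarse image and the aggregated fine image, can be sketched as follows. This is a deliberately simplified version, assuming an exhaustive search over integer fine-pixel shifts and rectangular aggregation; the actual SWSC implementation is more involved.

```python
import numpy as np

def swsc_align(coarse, fine, factor, max_shift=5):
    """Simplified sketch of scene-wide spectral comparison: shift the
    fine image in fine-pixel steps, block-average it to the coarse grid,
    and keep the shift minimizing the scene-wide mean spectral angle
    against the coarse image.

    coarse: (H, W, B) array of coarse pixels with B spectral bands.
    fine:   (H*factor + 2*max_shift, W*factor + 2*max_shift, B) array,
            i.e., the fine image with a margin for shifting."""
    H, W, B = coarse.shape
    best_shift, best_sam = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            oy, ox = max_shift + dy, max_shift + dx
            sub = fine[oy:oy + H * factor, ox:ox + W * factor]
            # Rectangular aggregation of the shifted fine image.
            agg = sub.reshape(H, factor, W, factor, B).mean(axis=(1, 3))
            # Per-pixel spectral angle, averaged over the whole scene.
            cos = (agg * coarse).sum(-1) / (
                np.linalg.norm(agg, axis=-1) * np.linalg.norm(coarse, axis=-1))
            mean_sam = np.arccos(np.clip(cos, -1.0, 1.0)).mean()
            if mean_sam < best_sam:
                best_shift, best_sam = (dy, dx), mean_sam
    return best_shift, best_sam
```

Because every coarse pixel and every band contributes to the score, the search is robust to the scale mismatch that defeated tie-point-based registration; alignment accuracy is naturally limited to the fine-pixel step size of the search grid.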
We generated semi-synthetic downsampled coarse scale imagery at two GSDs, NEON 15 m and NEON 29 m. Unmixing errors for NEON 29 m were similar to those of NEON 15 m, indicating that larger GSD coarse imagery could be used with our methodologies, provided that the spatial and spectral attributes of such data allow for accurate spatial alignment. The significant differences in error between NEON 15 m/29 m and AVIRIS Ortho/Unortho were a reminder that synthetic imagery can be highly useful in research studies, but that we should not underestimate the differences between synthetic and real imagery.
We compared unmixing results to AMRD using mean absolute error (MAE) and linear regression, and introduced modified metrics that account for the known error in the reference data itself.

Our reference data research has demonstrated a viable methodology to efficiently generate, validate, and apply AMRD to specific examples of airborne remote sensing imagery, thereby enabling direct quantitative assessment of spectral unmixing performance. We encourage the remote sensing community to adopt a similarly robust treatment of reference data in future research efforts.

Summary
Chapter 1 introduced maps from a historical perspective, from ancient world maps to reference data maps, which are the main focus of this dissertation. Abundance map reference data (AMRD) were introduced in this context as specialized maps used to quantitatively evaluate the performance of spectral unmixing algorithms. Research objectives also were introduced, including: 1) demonstration of remotely sensed reference data (RSRD), a new technique for efficiently generating AMRD, 2) validation of error and uncertainty in AMRD generated using RSRD, and 3) application of RSRD to examples of real imagery in order to provide validated AMRD to the remote sensing community.
Since this thesis was written in the modern format, with the main chapters being complete research papers, Chapter 2 contains additional background information specific to reference data. Key attributes of reference data sets were identified, and well-known reference data were introduced and analyzed with respect to key attributes. We also mentioned several lesser-known research studies that produced sub-pixel reference data using techniques similar to RSRD. Finally, we discussed the complexity of imaging spectroscopy data and accompanying reference data, which contributed various forms of error to unmixing results and the AMRD itself. Persistent forms of error included the standard deviation error in validated AMRD, and the difference between ground level abundances and coarse sensor PSF abundances. Other limitations included the fixed nature of the classification scheme, whereby adding or subtracting classes would require re-validation of AMRD, and the fact that our validation was accomplished for the three scenes together, making it necessary to compare unmixing results together as well.
Chapter 3 introduced RSRD as a potential technique to efficiently produce scene-wide AMRD. This chapter is a slightly modified version of a conference paper that we presented and published at the SPIE Defense and Security Conference in April 2016. The RSRD technique was described in detail and applied to a scene from the San Joaquin Experimental Range (SJER), near Fresno, CA. Initial results were promising, but lacked validation to quantify error and uncertainty in the data.
Chapter 4 validated three AMRD scenes produced using the RSRD technique. This chapter represents a journal paper that we published in Remote Sensing in May 2017. Validation was accomplished by estimating AMRD in 51 randomly selected 10 m × 10 m plots using seven independent methods and observers. Various tools were used to compare data, including histograms, mean and standard deviation of differences, t-tests, confidence intervals, and statistical equivalence tests. Results indicated statistically-significant differences between all seven versions of reference data, reinforcing the need to validate reference data, regardless of the method of generation. The mean of all (MOA) versions of reference data for each plot and class were proposed as the most likely estimate of true abundances. Each reference data version was then compared to MOA, resulting in mean and standard deviation differences. We found that one of the RSRD algorithms was nearly as accurate as the best traditional method, showing the potential of RSRD to efficiently generate accurate scene-wide AMRD.
Chapter 5 applied our previously validated AMRD to specific examples of coarse scale imagery over our three study scenes. This chapter represents a journal paper that we published in Remote Sensing in August 2017. Application of AMRD to specific coarse scale imagery involved three main parts: 1) spatial alignment of coarse and fine scale imagery, 2) aggregation of fine scale abundance to produce coarse scale imagery specific AMRD, and 3) demonstration of comparisons between coarse scale unmixing abundances and AMRD. Results indicated that our scene-wide spectral comparison (SWSC) algorithm aligned coarse and fine scale imagery with accuracy approaching the distance of a single fine scale pixel. Coarse sensor point-spread function (PSF) aggregation yielded lower error than simple rectangular aggregation, but we concluded that rectangular aggregation more accurately represented true ground abundances. We introduced several modifications to standard comparison methods that allowed us to account for known error in the reference data itself. Finally, several versions of validated AMRD for specific coarse scale imagery are available for our three study scenes, enabling direct quantitative assessment of spectral unmixing performance.

CONCLUSIONS
Generation of scene-wide abundance map reference data (AMRD) using traditional methods, such as field surveys and imagery analysis, is simply not practical, as evidenced by our own field surveys, which required two man-weeks of work to estimate abundances in 51 coarse scale plots. At this pace, it would have taken 70 man-weeks to estimate AMRD for AVIRIS imagery in the three study scenes. Imagery analysis was not as labor intensive as field surveys, but still would have required approximately 15 man-weeks to estimate AMRD in the three study scenes. Furthermore, this type of estimation approach is exceedingly tedious, and data from our validation process suggested that estimation accuracy decreased with the number of plots estimated per day. Given these challenges, the remotely sensed reference data (RSRD) technique provides a viable alternative to traditional methods, enabling efficient scene-wide generation of AMRD. The National Ecological Observatory Network (NEON) regularly collects fine scale imaging spectrometer data over a wide variety of landscapes, providing an opportunity to generate a large set of standard AMRD for the remote sensing community using RSRD techniques.
Validation of AMRD, generated using RSRD techniques, is necessary in order to estimate the error in the reference data itself. Typically, we assess accuracy against some measure that is assumed to be "true," but in the case of AMRD, no such data exist. As such, we generated independent estimates of AMRD for a limited number of randomly selected plots, using field surveys, imagery analysis, and RSRD. We compared each version of reference data against the other versions and found statistically-significant differences between all versions of reference data, including field surveys and imagery analysis by our different human observers. This finding reinforced the need to validate reference data, no matter the generation method. We assumed that the mean of all (MOA) of these independent estimates per plot and class was the most accurate representation of truth that we could reasonably obtain. Individual versions of AMRD were then compared to MOA, thereby ascertaining accuracy. Using this methodology, we found that best case accuracy was achieved by a version of imagery analysis, with a mean coverage area error of 2.0% and a standard deviation of 5.6%. Fortunately, one of the RSRD versions was nearly as accurate, achieving a mean error of 3.0%, with a standard deviation of 6.3%. We selected this version of RSRD to provide scene-wide AMRD, with known validated accuracy for each ground cover class. This effort validated the accuracy of AMRD for our three study scenes, along with demonstrating a methodology that could be used broadly to validate the accuracy of other reference data products.
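The MOA comparison logic described above can be sketched as follows. Whether the reported mean coverage area errors use signed or absolute differences is not restated here, so this illustrative version reports the signed mean and standard deviation of each version's differences from MOA.

```python
import numpy as np

def version_accuracy(estimates):
    """estimates: array of shape (n_versions, n_plots, n_classes) holding
    each independent method's abundance estimates for the validation
    plots. MOA, the mean across versions, stands in for truth; each
    version is scored by the mean and standard deviation of its
    differences from MOA."""
    estimates = np.asarray(estimates, dtype=float)
    moa = estimates.mean(axis=0)                         # mean of all versions
    diffs = (estimates - moa).reshape(len(estimates), -1)
    return diffs.mean(axis=1), diffs.std(axis=1)         # per-version stats
```

With signed differences, the per-version means sum to zero by construction, so the standard deviation carries most of the discriminating information between versions; an absolute-difference variant would instead rank versions directly by mean error magnitude.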
Application of our previously validated AMRD to specific examples of coarse scale imagery required spatial alignment between coarse and fine scale imagery, aggregation of fine scale abundances to co-located coarse scale pixels, and demonstration of comparisons between coarse imagery unmixing abundances and AMRD. Spatial alignment was an especially challenging aspect of this process. We attempted spatial alignment with standard remote sensing tools, e.g., ENVI's image registration workflow, without success. These image registration algorithms typically generate a number of tie points, which are used to warp one image onto the other; however, we found that the standard algorithms could not identify tie points accurately enough, due to the large difference in GSD between our coarse and fine scale imagery, resulting in badly warped images. We therefore developed our own spatial alignment algorithm to accomplish the task, namely scene-wide spectral comparison (SWSC). This algorithm takes advantage of the information from the entire scene and all available spectral bands simultaneously to estimate optimal alignment between images.
Using this algorithm, we were able to align NEON (1-m GSD) and AVIRIS (15- or 18-m GSD) imagery with accuracy approaching the distance of a single NEON pixel. Aggregation using a filter that approximates the point-spread function (PSF) of the coarse imaging sensor more closely matched the coarse scale imagery, but we found that a simple rectangular filter produced AMRD that more closely matched true abundances at ground level. Researchers have previously used mean absolute error (MAE) and linear regression (LR) to measure the difference between unmixing abundances and AMRD. We introduced modified metrics that take into account the known error in the reference data itself, namely reference data mean adjusted MAE (MA-MAE) and confidence interval adjusted MAE (CIA-MAE).

The research presented in this dissertation has generated, validated, and applied scene-wide AMRD for three remote sensing scenes, thereby making robust AMRD available to the remote sensing community for perhaps the first time. Furthermore, this work has demonstrated viable methodologies to efficiently generate, validate, and apply AMRD for new examples of airborne remote sensing imagery, thereby enabling direct quantitative assessment of spectral unmixing performance. We encourage the remote sensing community to adopt a similarly robust treatment of reference data in future research efforts.

FUTURE WORK
While conducting the research for this dissertation, we identified ways in which our methods could be improved, expanded, or applied to future research. The following paragraphs document several of these opportunities. We intended to apply our validated AMRD to both AVIRIS and Landsat imagery for the three study scenes, but we found that spatial alignment between NEON and Landsat imagery was especially difficult, and time did not permit us to solve this problem. This challenge is likely due to the significantly different spectral resolution of hyperspectral NEON and multispectral Landsat data. Solving the alignment problem between NEON and Landsat data would open up a vast catalog of data over the next several decades that could be used to generate a large set of standard AMRD for multispectral satellite-based imagery.
The methodologies presented here would be especially effective if an aerial campaign were specifically designed to produce accurate AMRD for coarse scale imagery. Key elements of such a campaign would be dual flights by the same imaging sensor, at low and high altitude, providing fine and coarse imagery, respectively. Given that georeferencing of the two imagery sets would use the same processing chain, this would likely result in data with more accurate spatial alignment. Furthermore, field survey validation sampling could be performed at or near the time of the aerial campaign. Such an experiment would likely result in even higher-accuracy AMRD.
Artificial neural networks have shown potential in various remote sensing applications; however, they are limited by insufficient training data. Our methodologies could be used to produce a large amount of AMRD with the purpose of training artificial neural networks to perform remote sensing tasks; such data could be used to supplement synthetically generated training data.