Exploring Subpixel Learning Algorithms for Estimating Global Land Cover Fractions from Satellite Data Using High Performance Computing

Land cover (LC) refers to the physical and biological cover present over the Earth’s surface in terms of the natural environment such as vegetation, water, bare soil, etc. Most LC features occur at finer spatial scales compared to the resolution of primary remote sensing satellites. Therefore, observed data are a mixture of spectral signatures of two or more LC features resulting in mixed pixels. One solution to the mixed pixel problem is the use of subpixel learning algorithms to disintegrate the pixel spectrum into its constituent spectra. Despite the popularity and existing research conducted on the topic, the most appropriate approach is still under debate. As an attempt to address this question, we compared the performance of several subpixel learning algorithms based on least squares, sparse regression, signal–subspace and geometrical methods. Analysis of the results obtained through computer-simulated and Landsat data indicated that fully constrained least squares (FCLS) outperformed the other techniques. Further, FCLS was used to unmix global Web-Enabled Landsat Data to obtain abundances of substrate (S), vegetation (V) and dark object (D) classes. Due to the sheer nature of data and computational needs, we leveraged the NASA Earth Exchange (NEX) high-performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into four classes, namely forest, farmland, water and urban areas (in conjunction with nighttime lights data) over California, USA using a random forest classifier. Validation of these LC maps with the National Land Cover Database 2011 products and North American Forest Dynamics static forest map shows a 6% improvement in unmixing-based classification relative to per-pixel classification. As such, abundance maps continue to offer a useful alternative to high-spatial-resolution classified maps for forest inventory analysis, multi-class mapping, multi-temporal trend analysis, etc.


Introduction
During the past two and a half decades, various subpixel learning algorithms have been developed to disintegrate a pixel spectrum into its constituent spectra in medium to coarse spatial resolution multispectral data [1].The aim of these learning algorithms is to characterize mixed pixels through a mixture model assuming that observed data constitutes a mixture of two or more objects [2].Defining a direct observation model that links these object's quantities to the observed data is a non-trivial issue and requires an understanding of complex physical phenomena.A radiative transfer model (RTM) can accurately describe the light scattered by the objects in the observed scene [3].Theoretically, radiative transfer is the physical phenomenon of energy transfer and describes the propagation of radiation on the Earth's surface affected by interaction processes between radiation, atmospheric constituents and the Earth's physical surface [4].RTM solves the radiative transfer equation that describes these interaction processes in a mathematical way to develop numerical RTM, but would lead to very complex unmixing problems.Fortunately, imposing a few assumptions can lead to exploitable mixing models.
So far, two types of models have been proposed in the literature to unmix mixed pixels.First, a macroscopic mixture model where the incident light interacts with just one object (example, checkerboard type scenes) [5], because the spatial resolution of the instrument is not fine enough.Second, a microscopic mixture model (also called intimate spectral mixture), which is a nonlinear mixing of objects.This usually happens due to physical interaction between the light scattered by multiple objects in the scene, where the two objects are homogeneously mixed.Although researchers are debating the use of linear unmixing versus nonlinear unmixing (see [5] for an overview), the nonlinear mixture model is still immature compared to its linear counterpart.Consequently, in this paper, we will focus on the linear mixture model (LMM).Apart from its inherent simplicity, LMM is an acceptable approximation of the light scattering mechanism in many real scenarios [5][6][7][8][9].LMM infers a set of pure object's spectral signatures (called endmembers), and fractions of these endmembers (abundances) in each pixel of the image [10], i.e., the objects are present with relative concentrations weighted by their corresponding abundances.The endmembers can be either derived using endmember extraction algorithms from the image pixels or obtained from an endmember spectral library available a priori.
Various theories and methods have evolved to solve the mixed pixel problem, such as linear spectral unmixing [11], Gaussian mixture discriminant analysis [12], linear regression and regression tree [13], regression models based on random forest [14], spatial-correlation-based unmixing [15], unmixing based on distance geometry [16], and normal composite model [17].The approach to mixed pixel problem ranges from modeling the component mixtures to solving the linear combinations to obtain abundances through geometrical, statistical and sparse-regression-based techniques; a lengthy discussion on the review of these techniques can be continued ad nauseum [5,18].LMM renders an optimal solution that is in unconstrained or partially constrained or fully constrained form.A partially constrained model imposes either the abundance non-negativity constraint (ANC) or abundance sum-to-one constraint (ASC) while a fully constrained model imposes both.ANC restricts the abundance values from being negative and ASC confines the sum of abundances of all the classes to unity.Unconstrained least squares [18], orthogonal subspace projection [19], singular value decomposition [20], mixed tuned matched filter [21], and constrained energy minimization [18], etc. are examples of unconstrained models.Sum-to-one constrained least squares (SCLS) and non-negative constrained least squares (NCLS) are cases of partially constrained solutions, which are generally normalized to obtain normalized SCLS and normalized NCLS solutions [22].More often, unconstrained and partially constrained algorithms are appropriate for applications seeking target detection, identification and discrimination, while constrained models are more suitable for target quantification and abundance estimation, which can be supervised or unsupervised or automatic (used for anomaly detection without the need for target information) [18].
Despite previous research conducted on subpixel learning algorithm development, the most flexible, appropriate and robust approach for large scale classification in different types of landscape scenarios is still not recognized and remains debatable.Most published algorithms are restricted to the standard AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) cuprite dataset obtained over Nevada, USA [23], the hyperspectral image data collection experiment (HYDICE) distributed by the Army Geospatial Center, US Army Corps of Engineers [24], or datasets acquired by researchers using specialized sensors [25], with 1300 mineral signatures from the United States Geological Survey (USGS) digital spectral library [26] or the NASA (National Aeronautics and Space Administration) Jet Propulsion Laboratory ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) spectral library [27].There are some specific case studies limited to a few spatio-temporal snapshots of multispectral and hyperspectral data [28][29][30].Although the usage of these ready-to-use data and spectral libraries in testing algorithms is non-questionable, the performance of these techniques on real data and their application on large datasets are definitely of great importance.As an attempt to address this issue, the following two objectives were formulated: 1.
Perform a comparative analysis of the least squares, sparse regression, signal-subspace and geometrical methods for the subpixel classification of different datasets.

2.
Develop a method to utilize the abundance maps obtained from subpixel learning algorithms to retrieve fractional land cover (LC) classes representing forest, farmland (including agricultural and herbaceous lands), water, and urban areas.
We first present the qualitative and quantitative analysis of the subpixel learning algorithms to find out the best performing technique that renders fractional abundances of different classes with the highest accuracy through both computer-simulated and real-world data of an agricultural landscape and an urban scenario.These data were analyzed by deriving vegetation, substrate and dark objects (such as shadows and deep water) endmember fractions followed by comparison with the ground truth data for accuracy assessment using various measures.Subsequently, in the second part of this study, we show the scope of the best unmixing technique in an operational context to obtain global abundance maps from WELD (Web-Enabled Landsat Data) of the year 2011 using the global endmembers of substrate, vegetation and dark objects.The role of abundance maps in obtaining the fractional spatial extent of four LC classes such as forest, farmland, water bodies and urban areas is shown for the state of California, USA.
The paper is organized as follows: Section 2 briefly reviews the subpixel learning algorithms, Section 3 details the data and endmember generation, Section 4 discusses the LC classification methods and validation approaches, Section 5 presents the results and discussion, followed by concluding remarks in Section 6.

Review of Subpixel Learning Algorithms
The notion behind LMM is introduced below with a brief review of seven state-of-the-art subpixel learning algorithms.For each pixel, the observation vector y is related to endmember signature E by a linear model as where, each pixel is a M-dimensional vector y whose components are the digital numbers corresponding to the M spectral bands.E = [e 1 , . . .e n-1 , e n , e n+1 ..., e N ] is a M × N matrix, where N is the number of classes, {e n } is a column vector representing the spectral signature of the nth target material and η accounts for the measurement noise.For a given pixel, the abundance of the nth target material present in the pixel is denoted by α n , and these values are the components of the N-dimensional abundance vector α.We further assume that the components of the noise vector η are zero-mean random variables that are independent and identically distributed (i.i.d.).Therefore, the covariance matrix of the noise vector is σ 2 I, where σ 2 is the variance and I is M × M identity matrix.

Unconstrained Least Squares (UCLS)
The conventional approach to extract the abundance values is to minimize y − Eα as in (2): which is termed a UCLS estimate of the abundance.UCLS with full additivity is a non-statistical, non-parametric algorithm that optimizes a squared-error criterion but does not enforce the non-negativity and unity conditions.Our previous study [31] compared the performance evaluation of UCLS, orthogonal subspace projection and singular value decomposition, which indicated similar unmixing results on different datasets.UCLS was faster in terms of execution time, therefore it is being considered here for comparison with constrained algorithms [32].

Fully Constrained Least Squares (FCLS)
To avoid deviation of the estimated abundance fractions, the ANC given in (3) and the ASC given in ( 4) are imposed on the model: The ANC and ASC constrain the value of abundance in any given pixel between 0 and 1.When only ASC is imposed on the solution, the SCLS estimate of the abundance is where The SCLS solution may have negative abundance values but they add to unity.FCLS [18,33] extends the NNLS (non-negative least squares) algorithm [34] to minimize Eα − y subject to α ≥ 0 by including ASC in the signature matrix E by a new signature matrix (SME) defined by with θ in (7) and ( 8) regulates the ASC.Using these two equations, the FCLS algorithm is directly obtained from NNLS by replacing the signature matrix E with SME and pixel vector y with s.Further, the original NNLS algorithm was found to be computationally very slow in our experiments.Therefore, we modified the original FCLS by incorporating the faster NNLS proposed by [35].

Modified Fully Constrained Least Squares (MFCLS)
The ANC is a major difficulty in solving constrained linear unmixing problems as it forbids the use of Lagrange multiplier.Chang (2003) [18] proposed the replacement of The AASC allows usage of Lagrange multiplier along with the exclusion of negative abundance values.This leads to an optimal constrained least squares solution satisfying both the ASC and AASC, which is called MFCLS and is expressed as It turns out that the solution to ( 9) is where αUCLS = (E T E) −1 E T y as in (2).The ASC and AASC constraints are used to compute λ 1 and λ 2 by replacing α with αUCLS with the following constraints: MFCLS utilizes the SCLS solution and the algorithm terminates with all non-negative components.

Simplex Projection (SP)
SP finds the projection of a point onto a generic simplex and minimizes the least squares error while imposing the ANC and ASC.It reduces the computational complexity without any optimization, while recursively reducing the dimensionality of the problem to obtain a suitable abundance vector [2].At each run, the algorithm identifies an endmember that has zero abundance and orthogonally projects on a hyperplane of a dimension less than the previous one.Considering P points y p R M , p = 1, . . ., P with N endmembers {e 1 , . . .e n-1 , e n , e n+1 , ... , e N }, all points y p are projected on to a simplex S I spanned by N in the set I = {e 1 , . . .e n-1 , e n , e n+1 , ... , e N } producing the projected points y p and corresponding abundance vectors αp = α p1 , . . ., α pN .The projected points y p are determined through y p = E αp [36].

Sparse Unmixing via Variable Splitting and Augmented Lagrangian (SUnSAL)
Sparse regression [37] is related to both statistical and geometrical frameworks.The endmember search is conducted in a large library E ∈ R M x N , where M < N and α ∈ R N .Only a few of the signatures contained in E are involved in the mixed pixel spectrum, therefore α contains many values of zero and is a sparse vector.The sparse regression problem is expressed as where α 0 denotes the number of nonzero components of α and δ ≥ 0 is the noise and modeling error tolerance.α ≥ 0 and 1 T α = 1 refers to the ANC and ASC, respectively.A set of sparsest signals belonging to the (N-1) probability simplex and satisfying error tolerance inequality defines the solution of (12).When the fractional abundances follow the ANC and ASC, the problem is referred to as constrained sparse regression given by (13) [38]: where α 2 and α 1 are the l 2 and l 1 norms and λ ≥ 0 is a weighing factor.SUnSAL is based on the alternating direction method of multipliers (ADMM) and is derived as a variable splitting procedure followed by the adoption of an augmented Lagrangian method [39].

SUnSAL and Total Variation (SUnSAL TV)
Sparse unmixing techniques do not deal with the neighboring pixels and tend to ignore the spatial context.SUnSAL TV takes into account spatial information (the relationship between each pixel vector and its neighbors) by means of the TV regularizer with an assumption that two neighboring pixels will very likely have similar fractional abundances for the same endmember [40].The TV regularizer acts as a priori information and unmixing is achieved by a large non-smooth convex optimization problem.

Collaborative SUnSAL (CL SUnSAL)
Generally, the performance of sparse unmixing solution suffers from the limitation of a high degree of coherence between the signatures of the endmembers affecting the uniqueness of the solution.Collaborative sparse unmixing removes the above limitation where coherence has a weaker impact on unmixing since the pixels are constrained to share a small set of endmembers [10].
Here, the unmixing result is refined by solving a joint sparse regression problem in which the sparsity is simultaneously imposed on all the pixels [41].Consider (1) while assuming P data points in matrix Y and η = [η 1 , . . ., η P ] as the noise matrix.Assume α F ≡ trace{αα} T is the Frobenius norm of α, and let λ > 0 be the regularization parameter, then the optimization problem is where α k denotes the kth line of α. ∑ N k=1 α k 2 is the l 2,1 mixed norm that supports a small number of nonzero lines and sparsity in α among all the pixels, leading to the solution of ( 14) through the extended SUnSAL algorithm.CL SUnSAL solves the l 2 + l 2,1 optimization problem in addition to ANC.If α 2,1 = ∑ N k=1 α k 2 denotes the l 2,1 norm, then ( 14) can be rewritten as where ι R+ (α) = ∑ P i=1 ι R+ (α i ) is the indicator function.Equation ( 15) can be expressed in a constrained form and can be solved using the methods given in [42] to obtain the abundances.The abundance map allows a proportion of each pixel to be partitioned between classes.The value of abundance in any given pixel ranges from 0 to 1 (in an abundance map) with the number of abundance maps the same as the number of classes.The value 0 indicates absence of a particular class and 1 indicates the presence of only that class in a particular pixel.Intermediate values between 0 and 1 represent a proportion of that class.

Data
In this section, details about different experimental datasets, global Web-Enabled Landsat Data, and endmember generation are discussed.

Computer Simulated Data
One of the major problems in analyzing the quality of fractional estimation methods is that the exact ground truth information about the real abundances at subpixel level for all classes is difficult to obtain.Therefore, simulation of imagery is carried out as an intuitive way to perform evaluation of the techniques, for example, the simulated but realistic fully-calibrated citrus orchard virtual scene [43,44].Since all the details of the simulated images are known, the algorithm's performance can be examined in a controlled manner.For generating the synthetic multispectral data y of six bands with different levels of noise η, the linear mixture model y = Eα + η was used with a simulated abundance map α.The noise was drawn from an uncorrelated normal distribution with zero mean and variance σ 2 (noise level).The endmembers E were taken from a set of global spectra of the three-endmember libraries [45].At each pixel's location in the simulated abundance map α, one specific endmember among the three was made dominant (abundance ≥ 0.77), and the remaining two abundance values were chosen at random such that all the values were positive and add to one.To have synthetic data closer to reality, the dominant endmembers had spatial correlation among the neighboring pixels for the same endmember except at region boundaries.In a separate set of experiments, σ 2 was increased in multiples of two (i.e., σ 2 was set to 2, 4, 8, . . ., 256) in order to analyze the performance of various learning algorithms on subpixel classification with variable noise.

Landsat Data-An Agricultural Landscape and an Urban Scenario
A spectrally diverse collection of 11 time-series scenes of Level 1 (terrain-corrected), cloud-free Landsat-5 (obtained from the Web-Enabled Landsat Data (WELD) version 3.0 product) 16-bit data for Fresno, California, USA (WRS path 43, row 35) were used.These data were captured on 4 and 20 April, 22 May, 7 and 23 June, 9 and 25 July, 26 August, 11 and 27 September, and 13 October of the year 2008 and were calibrated to atmospheric reflectance [46].The atmospheric reflectance was converted to surface reflectance by means of the 6S code implementation in LEDAPS (Landsat Ecosystem Disturbance Adaptive Processing System).For the ground truth data corresponding to the above scenes, a coincidental set of ground canopy covers were collected for a number of surveyed fields that were located within an area of about 25 × 35 km 2 southwest of the city of Fresno (Figure 1).A total of 74 polygons of fractional vegetation cover were generated from digital photographs taken with a multispectral camera mounted on a frame at nadir view pointed 2.3 m above the ground.These photographs were acquired at the commercial agricultural fields of the San Joaquin Valley (in central California) on the 11 dates mentioned above, except for one date when the Landsat acquisition preceded the ground observation by one day.For each date, 2-4 evenly spaced pictures were taken for an area of 100 × 100 m 2 with center location marked by GPS [47].These fractional measurements belonged to a diverse set of seasonal and perennial crops in various developmental stages, from emergence to full canopy, that represented an agricultural environment in the real-world scenario.
A second set consisting of a pair of coincident clear sky Landsat TM-5 data and WV-2 data (World View-2 of 2 m spatial resolution) for an area of San Francisco, California, USA were used to assess the algorithms.San Francisco was chosen for the test site because of its urbanized landscape, with a mix of building architectures, vegetation and substrate.WV-2 data were acquired a few minutes after the Landsat-5 TM data acquisition on 1 May 2010 for an area near the Golden Gate Bridge, San Francisco (Figure 1).The spectral range of the first four bands of Landsat data correspond to the WV-2 bands 2, 3, 5 and 7, so they have similar spectral response functions.WV-2 data were converted to top of atmosphere reflectance values.The Landsat unmixed images were compared to the corresponding WV-2 fraction images for accuracy assessment.with a multispectral camera mounted on a frame at nadir view pointed 2.3 m above the ground.These photographs were acquired at the commercial agricultural fields of the San Joaquin Valley (in central California) on the 11 dates mentioned above, except for one date when the Landsat acquisition preceded the ground observation by one day.For each date, 2-4 evenly spaced pictures were taken for an area of 100 × 100 m 2 with center location marked by GPS [47].These fractional measurements belonged to a diverse set of seasonal and perennial crops in various developmental stages, from emergence to full canopy, that represented an agricultural environment in the real-world scenario.
A second set consisting of a pair of coincident clear sky Landsat TM-5 data and WV-2 data (World View-2 of 2 m spatial resolution) for an area of San Francisco, California, USA were used to assess the algorithms.San Francisco was chosen for the test site because of its urbanized landscape, with a mix of building architectures, vegetation and substrate.WV-2 data were acquired a few minutes after the Landsat-5 TM data acquisition on 1 May 2010 for an area near the Golden Gate Bridge, San Francisco (Figure 1).The spectral range of the first four bands of Landsat data correspond to the WV-2 bands 2, 3, 5 and 7, so they have similar spectral response functions.WV-2 data were converted to top of atmosphere reflectance values.The Landsat unmixed images were compared to the corresponding WV-2 fraction images for accuracy assessment.

Global Web-Enabled Landsat Data (WELD)
The NASA-funded WELD, through the Making Earth System Data Records for Use in Research Environments (MEaSUREs) project, has been systematically generating 30 m monthly and annual global composite products from Landsat 7 ETM+ and Landsat 5 TM data for all non-Antarctic land surface mosaics [48].The entire globe is covered with around 8000 scenes/month and the version 3.0

Global Web-Enabled Landsat Data (WELD)
The NASA-funded WELD, through the Making Earth System Data Records for Use in Research Environments (MEaSUREs) project, has been systematically generating 30 m monthly and annual global composite products from Landsat 7 ETM+ and Landsat 5 TM data for all non-Antarctic land surface mosaics [48].The entire globe is covered with around 8000 scenes/month and the version 3.0 global WELD (shown in Figure 2) are available in public domain for a three year period from 2008 to 2011 (~0.3 million scenes), spectrally calibrated and converted to surface reflectance and brightness temperature.

Endmember Generation
Global mixing spaces using a spectrally diverse LC and a diversity of biomes with 100 Landsat ETM+ subscenes were used to define a standardized set of spectral endmembers of substrate ("S"endmember 1), vegetation ("V"-endmember 2), and dark objects ("D"-endmember 3) that spans all terrestrial biomes determined by mean annual temperature and precipitation in proportion to land area [49].The geographical locations of the 100 Landsat scenes showing spectral diversity resulting from LC variety across biomes was established in an earlier study [50].The subscenes included within-scene spectral variability, LC transition and global land area distribution.With a linear stretch of 2% applied to the bands within a subscene, substrate, vegetation and water were apparent as brown, green and black color, respectively.The substrate includes urban structures, soils, sediments, rocks, and non-photosynthetic vegetation.Vegetation refers to green photosynthetic plants, and dark objects encompasses absorptive substrate materials, clear water, deep shadows, etc.The S-V-D endmember coefficients with dates and locations for each subscene are available in [45] and a plot of the endmembers is provided in [50].The estimates obtained from the global endmembers have been compared to the fractional vegetation cover derived vicariously by linearly unmixing nearcoincidental WV-2 acquisitions over a set of diverse coastal environments, using both global endmembers and image-specific endmembers to unmix the WV-2 images.The strong 1:1 linear correlation between the fractions obtained from the two types of image indicate that the mixture model fractions scale linearly from 2 m to 30 m over a wide range of terrains.When endmembers are derived from a large-enough sample of radiometric responses to encompass the Landsat spectral mixing space, they can be used to build a standardized spectral mixture model for global mapping applications.

Classification Methods and Validation Approaches
In this section, classification methods, validation datasets and validation approaches, along with the algorithm's parameter settings and computational requirements are discussed.

Land Cover Classification from Abundance Maps
Subsequent levels of classification can be carried from the S-V-D abundance maps to obtain fractional maps of forest, farmland/grassland, water bodies and urban areas, etc. for numerous other

Endmember Generation
Global mixing spaces using a spectrally diverse LC and a diversity of biomes with 100 Landsat ETM+ subscenes were used to define a standardized set of spectral endmembers of substrate ("S"-endmember 1), vegetation ("V"-endmember 2), and dark objects ("D"-endmember 3) that spans all terrestrial biomes determined by mean annual temperature and precipitation in proportion to land area [49].The geographical locations of the 100 Landsat scenes showing spectral diversity resulting from LC variety across biomes was established in an earlier study [50].The subscenes included within-scene spectral variability, LC transition and global land area distribution.With a linear stretch of 2% applied to the bands within a subscene, substrate, vegetation and water were apparent as brown, green and black color, respectively.The substrate includes urban structures, soils, sediments, rocks, and non-photosynthetic vegetation.Vegetation refers to green photosynthetic plants, and dark objects encompasses absorptive substrate materials, clear water, deep shadows, etc.The S-V-D endmember coefficients with dates and locations for each subscene are available in [45] and a plot of the endmembers is provided in [50].The estimates obtained from the global endmembers have been compared to the fractional vegetation cover derived vicariously by linearly unmixing near-coincidental WV-2 acquisitions over a set of diverse coastal environments, using both global endmembers and image-specific endmembers to unmix the WV-2 images.The strong 1:1 linear correlation between the fractions obtained from the two types of image indicate that the mixture model fractions scale linearly from 2 m to 30 m over a wide range of terrains.When endmembers are derived from a large-enough sample of radiometric responses to encompass the Landsat spectral mixing space, they can be used to build a standardized spectral mixture model for global mapping applications.

Classification Methods and Validation Approaches
In this section, classification methods, validation datasets and validation approaches, along with the algorithm's parameter settings and computational requirements are discussed.

Land Cover Classification from Abundance Maps
Subsequent levels of classification can be carried from the S-V-D abundance maps to obtain fractional maps of forest, farmland/grassland, water bodies and urban areas, etc. for numerous other applications.In the S-V-D abundance maps, dense forest pixels occur on and near the binary mixing trend between vegetation and dark (i.e., shadow) endmembers.The vegetation endmembers generally correspond to illuminated foliage.The higher the fraction of vegetation endmember, the less canopy shadow and illuminated substrate contribute to the pixel's reflectance.Herbaceous vegetation (grass) typically has less canopy shadow than closed canopy forest, so they occur closer to vegetation endmember (higher vegetation fractions).A continuum may exist between dense forest and herbaceous vegetation using a trade-off between vegetation and shadow fractions.Dense forest can be distinguished from farmland/grassland by assigning a threshold to both the vegetation and dark endmember fractions.Additionally, NDVI (Normalized Difference Vegetation Index) values can also be used as a supplement layer for discriminating dense forest by applying a suitable threshold.Sampled forest patches for training data revealed that vegetation abundance >0.2, dark objects abundance >0.6 and NDVI >0.7 represented forest areas.
In this work, the above approach is tested for the state of California in the United States for which training, test and relevant ancillary data were available.For classification of the three LC classes, 200,000 pixels belonging to homogeneous dense forest, homogeneous herbaceous vegetation/grassland, and water bodies (clear water, turbid water and green water caused by eutrophication) were selected as training samples using a stratified random sampling on the basis of administrative boundaries.The forest and grassland pixels were also confirmed by following the method in Huang et al., (2008) [51] and by visual interpretation of the high-resolution Google Earth TM images.A supervised classification was performed on S-V-D abundances using the random forest (RF) classifier [52] into dense forest, farmland and water bodies.Figure 3 shows the flowchart of the overall methodology.The same set of training polygons were also used to generate training samples to classify the original WELD data (bands 1-5 and 7) to obtain per-pixel classified maps in order to assess the advantages of fractional maps over per-pixel classification.Here, the number of trees was set to 250, the node size was set to five and the maximum number of terminal nodes was set to 500.When further parameter variations did not increase the accuracy, the final values of the parameters were obtained empirically depending on the minimum execution time and memory requirements.
For the classification of built and constructed impervious surfaces (such as buildings, driveways, sidewalks, roads, parking lots and other man-made surfaces), the common sources of error include underestimation in areas with extensive tree cover and overestimation in areas with extensive bare ground.In this regard, Defense Meteorological Satellite Program Operational Line Scanner (DMSP OLS) nighttime light data and the Visible Infrared Imaging Radiometer Suite (VIIRS) day-night band (DNB) carried by the Suomi National Polar-Orbiting Partnership (NPP) satellite are very useful in monitoring and analyzing human activities and natural phenomena.DMSP OLS and NPP-VIIRS pixels in the nighttime light images aid in classifying urban built-up areas since they capture illuminated surfaces than the surrounding dark areas at night [53].
Pixels with radiance values equal to or larger than a threshold that produces minimum difference between image-derived value and reference data are considered part of an urban built-up area [54].In this study, the NPP-VIIRS data at a resolution of 15 arcsec grids obtained from the NOAA (National Oceanic and Atmospheric Administration) National Geophysical Data Center were resampled to 30 m.A threshold of the minimum value radiance was used to segment the images into urban and non-urban areas for different cities.The absolute difference between the extracted area and the reference data was recorded and the process was repeated by varying the threshold to reach the maximum pixel value of the image.The threshold value that produced minimum difference was selected for urban built-up area extraction.For each city, FCC (false color composite) of urban areas was also adopted to analyze the spatial coherence of the extracted results.

Validation Methods
For the computer-simulated data, the estimated class abundance maps were first compared with the simulated true abundance maps using visual checks.Performance discriminators such as range of fractional estimates (minimum and maximum abundance values), correlation coefficient (r), RMSE (root mean square error), signal-to-reconstruction error (SRE), probability of success (p s ), and bivariate distribution function (BDF) were used for validation.A smaller RMSE indicates a better unmixing result i.e., higher accuracy.SRE and p s were computed for each algorithm for various noise levels [37].The quality of the reconstruction of a spectral mixture is measured using SRE ≡ E[ α ] 2  2 /E[ α − α 2 2 ] in decibels, where SRE(dB) ≡ 10 log 10 (SRE).It gives information by relating the power of the error to the power of the signal.It is to be noted that the higher the SRE(dB), the better the unmixing performance.p s ≡ P α−α 2 α 2 ≤ threshold , where p s is an estimate of the probability that the relative error power is smaller than a certain threshold and is also commonly used in the sparse regression community.It indicates the stability of the estimation, which is a complimentary measure to SRE (that is an average).BDF is used to visualize the accuracy of prediction by mixture models.Points along a 1:1 line on the BDF graph indicate predictions that match completely with the real abundances.Additionally, for each of the 74 surveyed field locations in the 11 Landsat scenes of the agricultural landscape, mean absolute error (MAE) was computed for all the subpixel learning algorithms.
Comprehensive validation of the entire fractional LC maps of the forest, farmland, water and urban areas was found to be extremely challenging as there are still limited datasets that can be used as a reference.Therefore, we focused on the National Land Cover Database 2011 (NLCD 2011) [55], NLCD 2011 percent tree canopy cover, NLCD 2011 percent developed imperviousness [56], North American Forest Dynamics (NAFD) 2010 product [56][57][58], and Google Earth TM imagery for both qualitative and quantitative methods.Qualitative validation included comprehensive visual assessment with local reference of high spatial resolution Google Earth TM imagery, whereas quantitative methods included a design-based accuracy assessment with NLCD 2011 products.For the quantitative assessment, the validation points were stratified into four groups based on NLCD 2011 LC classes: (1) Forest (NLCD classes: deciduous forest, evergreen forest, mixed forest, and woody wetland); (2) Agriculture/farmland (NLCD classes: grassland/herbaceous, pasture/hay, and cultivated crops); (3) Water (NLCD class: open water); and urban (NLCD classes: developed, open space, low intensity, medium intensity, and high intensity).Additionally, a separate per-pixel classified map from WELD was used for comparison with the fractional class maps.A total of 200,000 sampling pixels were chosen at random from 1000 validation sites for each LC type.Pixels with more than 50% presence of any class in the fractional maps were discretized to the respective class, although this threshold can be reduced if a smaller fraction of LC class is to be accounted.For urban areas, pixels with more than 30% fractions were considered to be urbanized, since urban areas have a mix of houses, parking spaces, walkways, lawns and roads.The per class producer's accuracy (PA), user's accuracy (UA), overall accuracy (OA), and kappa coefficient were calculated by comparing fractional maps with NLCD 2011 LC map and averaged over 10 iterations.

Parameter Settings and Computational Requirements
All the above subpixel learning algorithms (discussed in Section 2) were implemented in MATLAB ® .In FCLS, the tolerance was set to 1 × 10 −5 (the algorithms will converge at this value without entering an infinite loop) and the value of λ (weightage for the sum to one constraint) was set to 20, which was determined empirically.The tolerance in MFCLS was set to 1 × 10 −7 .The parameter λ in SUnSAL was set to 10 −5 , δ = 10 −4 ; λ in SUnSAL TV was set to 5 × 10 −4 and λ TV in SUnSAL TV was set to 5 × 10 −3 for computer-simulated data and 10 −3 for the Landsat data.λ in CL SUnSAL was set to 10 −2 for both without and with the ASC and ANC activated.Here, the optimal parameter values were obtained after several experiments with various values of λ and λ TV (see Tables IV-V in [37]; Tables I-II in [10,40]) and were finalized based on the higher SRE.It was observed that our observations of the best parameter values coincided with the values in [37,39].The maximum number of iterations was set to 100 and all other parameters were set to default.UCLS and SPU do not require any parameter settings.
In order to provide an open source solution and attain high execution speed for the global WELD classification, FCLS MATLAB ® implementation was converted to C++ syntax along with other open source packages such as OpenCV (Open Source Computer Vision) package, boost C++ libraries, gdal, and GRASS (Geographic Resources Analysis Support System) GIS on the NASA Earth Exchange (NEX) platform.The NEX framework combines state-of-the-art supercomputing (Pleiades supercomputer), Earth system modeling, remote sensing (RS) data from NASA and other agencies, workflow provenance, and a scientific social networking platform to deliver a complete end-to-end work environment.Since NEX is built upon the terrestrial observation and prediction system (TOPS) [59], a data assimilation and modeling framework developed at the NASA Ames Research Center, it provides software libraries and tools required by NEX to automate the processing of satellite and climate data, and manages workflows for data analysis and modeling studies.Each node in the Pleiades Harpertown compute cluster have CPUs consisting of 8 GB of memory and eight cores with 3-GHz processors per node.The deployment of the algorithm was done through the QSub routine and the message passing interface.Each WELD tile was fed to a separate core in parallel in the NEX high performance computing platform.

Computer Simulations
Figure 4a-c shows noise-free synthetic abundance maps for endmember 1, 2 and 3 and Figure 4d-f shows estimated abundance maps obtained for each signature class from FCLS, with the range of abundance fraction values underneath each panel.Note that the range of abundance values obtained from FCLS is exactly the same as in the synthetic abundance maps indicating that FCLS has been able to accurately perform subpixel classification.Results from the other algorithms are not presented due to space considerations.Visual examination of the abundance maps obtained from the seven algorithms revealed that they were similar in terms of the relative fractions and from the detection point of view.

Computer Simulations
Figure 4a-c shows noise-free synthetic abundance maps for endmember 1, 2 and 3 and Figure 4d-f shows estimated abundance maps obtained for each signature class from FCLS, with the range of abundance fraction values underneath each panel.Note that the range of abundance values obtained from FCLS is exactly the same as in the synthetic abundance maps indicating that FCLS has been able to accurately perform subpixel classification.Results from the other algorithms are not presented due to space considerations.Visual examination of the abundance maps obtained from the seven algorithms revealed that they were similar in terms of the relative fractions and from the detection point of view.Figure 5a-c shows r (statistically significant at 0.99 confidence level, p-value < 2.2 × 10 −16 ) and RMSE between real abundance and estimated abundance obtained from the seven algorithms for all the three endmembers corresponding to different levels of noise.For SUnSAL, SUnSAL TV and CL SUnSAL algorithms, three different cases-without any constraints, with ANC, and with both ANC and ASC imposed on the optimal solution were evaluated.For the noise-free data, most of the models had high r (close to 1) and low RMSE.All the models were robust until noise variance 32, beyond which r gradually decreased and reached a minimum of 0.12, producing higher RMSE following a hyperbolic curve.FCLS was robust until noise variance 128 for all the three endmembers and it showed highest r for endmember one at noise variance 256.For endmembers two and three, FCLS, MFCLS, SPU, SUnSAL (ANC + ASC) and CL SUnSAL (ANC + ASC) produced the highest r with lower RMSE.Figure 5d shows the plot of p s and SRE(dB) against various noise levels.Abundance values obtained from the unmixing algorithms were accepted when α − α 2 / α 2 ≤ threshold, where threshold (0.0005) is the 99th percentile of α − α 2 / α 2 of FCLS abundance at noise variance 32, where most of the methods rendered good performance.Figure 5d reveals that until noise variance 32, FCLS, MFCLS, SPU, fully-constrained SUnSAL and CL SUnSAL had p s close to 1, beyond which they decreased gradually.FCLS was robust even at noise variance 128 with p s = 0.6.With respect to SRE(dB), FCLS was better among all the algorithms at different noise levels.On the other hand, UCLS, SUnSAL (no constraints), CL SUnSAL (no constraints), along with SUnSAL TV (without and with constraints) had poor performance with low r, high RMSE, low p s and SRE.The plot of the execution time taken by each of the algorithms for unmixing computer-simulated data for different levels of noise on a 2.53 GHz Intel Core i5 processor with 8 GB RAM revealed that UCLS, FCLS, MFCLS, SPU and SUnSAL took much less execution time (<3 s), whereas SUnSAL TV and CL SUnSAL took higher execution time (15 s and 7 s respectively).The points on the BDF plots (not shown here) fell almost in a 1:1 line for all the three endmembers.As the noise variance increased, the estimated abundance deviated from the real abundance values.
plot of the execution time taken by each of the algorithms for unmixing computer-simulated data for different levels of noise on a 2.53 GHz Intel Core i5 processor with 8 GB RAM revealed that UCLS, FCLS, MFCLS, SPU and SUnSAL took much less execution time (<3 s), whereas SUnSAL TV and CL SUnSAL took higher execution time (15 s and 7 s respectively).The points on the BDF plots (not shown here) fell almost in a 1:1 line for all the three endmembers.As the noise variance increased, the estimated abundance deviated from the real abundance values.Considering the various measures of performance discriminators and execution time in the above analysis, it was found that overall, FCLS outperformed, followed by MFCLS, SPU and SUnSAL with marginally lower accuracies.Even though SUnSAL TV introduces regularization to enforce continuity of abundances among neighboring pixels, it performed the worst of all, with poor accuracy measures and higher execution times.In the next section, the implementation of the algorithms on real-world data for different landscapes are discussed.

Landsat Data-An Agritultural Landscape
Each of the 11 Landsat scenes was unmixed with S-V-D endmembers using different models to obtain the abundance estimates.For each scene, the vegetation abundances were compared with ground-based measurements.FCLS produced the least MAE of 0.03 and highest r of 0.99.All other methods except fully-constrained SUnSAL TV produced a lower MAE of 0.08 and r of 0.98.SUnSAL TV (ANC+ASC) rendered high MAE (0.3) and low r (0.21).FCLS was slightly better when compared to the other methods.On an average, FCLS took the minimum execution time (20 min) to process each Landsat scene (7321 rows × 8367 columns).SUnSAL and CL SUnSAL rendered good classification accuracies with high execution time (7200 s (2 h)/scene).Although time exhaustive (the algorithm took 118,800 s (33 h)/scene), SUnSAL TV did not produce satisfactory results.

Landsat Data-An Urban Scenario
Unmixed Landsat and WV-2 data of San Francisco with the S-V-D endmembers were compared for accuracy where each 2 m WV-2 pixel is less than 0.5% of the area within the 30 m full-width, halfmaximum of the Landsat point spread function.The WV-2 fractions were convolved with a Gaussian low pass filter with the point spread function of the Landsat sensor and resampled to 30 m. Coordinate comparison of the WV-2 and Landsat datasets at many random pixels did not reveal any systematic image registration error.Validation revealed that the unconstrained and partially constrained algorithms viz.UCLS, SUnSAL, SUnSAL TV and CL SUnSAL for the S, V and D classes produced MAE of 0.11, 0.07 and 1.99, respectively, and r of 0.86, 0.88 and −0.03, respectively.FCLS Considering the various measures of performance discriminators and execution time in the above analysis, it was found that overall, FCLS outperformed, followed by MFCLS, SPU and SUnSAL with marginally lower accuracies.Even though SUnSAL TV introduces regularization to enforce continuity of abundances among neighboring pixels, it performed the worst of all, with poor accuracy measures and higher execution times.In the next section, the implementation of the algorithms on real-world data for different landscapes are discussed.

Landsat Data-An Agritultural Landscape
Each of the 11 Landsat scenes was unmixed with S-V-D endmembers using different models to obtain the abundance estimates.For each scene, the vegetation abundances were compared with ground-based measurements.FCLS produced the least MAE of 0.03 and highest r of 0.99.All other methods except fully-constrained SUnSAL TV produced a lower MAE of 0.08 and r of 0.98.SUnSAL TV (ANC+ASC) rendered high MAE (0.3) and low r (0.21).FCLS was slightly better when compared to the other methods.On an average, FCLS took the minimum execution time (20 min) to process each Landsat scene (7321 rows × 8367 columns).SUnSAL and CL SUnSAL rendered good classification accuracies with high execution time (7200 s (2 h)/scene).Although time exhaustive (the algorithm took 118,800 s (33 h)/scene), SUnSAL TV did not produce satisfactory results.

Landsat Data-An Urban Scenario
Unmixed Landsat and WV-2 data of San Francisco with the S-V-D endmembers were compared for accuracy where each 2 m WV-2 pixel is less than 0.5% of the area within the 30 m full-width, half-maximum of the Landsat point spread function.The WV-2 fractions were convolved with a Gaussian low pass filter with the point spread function of the Landsat sensor and resampled to 30 m. Coordinate comparison of the WV-2 and Landsat datasets at many random pixels did not reveal any systematic image registration error.Validation revealed that the unconstrained and partially constrained algorithms viz.UCLS, SUnSAL, SUnSAL TV and CL SUnSAL for the S, V and D classes produced MAE of 0.11, 0.07 and 1.99, respectively, and r of 0.86, 0.88 and −0.03, respectively.FCLS for S, V and D classes showed MAE of 0.03, 0.02 and 0.02, respectively, and r of 0.94, 0.97 and 0.97, respectively, producing the best classification results with the smallest execution time (19 min) to process each Landsat scene (7151 rows × 8241 columns).The SUnSAL family of algorithms was the most time-exhaustive (SUnSAL and CL SUnSAL took 7200 s (2 h)/scene), while fully-constrained SUnSAL TV (for which the execution time was 118,800 s (33 h)/scene) did not produce reasonable results.Note the correspondence between the vegetation abundance and NDVI, which clearly indicates that the vegetation endmember was able to extract the vegetation component from the mixed pixels when investigated visually and spatially.To our best knowledge, this is the first attempt to produce time series global abundance maps at the native 30 m spatial resolution.With the extended WELD project, monthly and annual global products for six three-year periods spaced every five years (1985, 1990, 1995, 2000, 2005 and 2010) are planned in reverse chronological order (i.e., 2010, ..., 1985).

Subpixel Land Cover Classification
An elementary implementation of our framework produced reasonable results.Figure 7a,b shows the fractional forest cover and farmland/grassland maps of California. Figure 7c shows FCC (bands 3-5) for a small region from south-central California where dense forest pixels are highlighted in dark green, farmland as light green and water bodies in black.Figure 7d-f are the fractional LC maps of the three classes, (g) is the classified output from RF classifier and (h) is the forest cover map from NAFD.By comparing Figure 7d, g and h through visual inspection, it is clear that although RF Note the correspondence between the vegetation abundance and NDVI, which clearly indicates that the vegetation endmember was able to extract the vegetation component from the mixed pixels when investigated visually and spatially.To our best knowledge, this is the first attempt to produce time series global abundance maps at the native 30 m spatial resolution.With the extended WELD project, monthly and annual global products for six three-year periods spaced every five years (1985, 1990, 1995, 2000, 2005 and 2010) are planned in reverse chronological order (i.e., 2010, ..., 1985).

Subpixel Land Cover Classification
An elementary implementation of our framework produced reasonable results.Figure 7a,b shows the fractional forest cover and farmland/grassland maps of California. Figure 7c shows FCC (bands 3-5) for a small region from south-central California where dense forest pixels are highlighted in dark green, farmland as light green and water bodies in black.Figure 7d-f is the fractional LC maps of the three classes, Figure 7g is the classified output from RF classifier and Figure 7h is the forest cover map from NAFD.By comparing Figure 7d,g,h through visual inspection, it is clear that although RF seems to correctly classify the pixels, it has overestimated the classes because of per-pixel classification, while the NAFD product has overestimated forest areas over farmland.Figure 8a shows the spatial distribution of fractional water bodies and Figure 8b is a FCC of the San Luis Reservoir, with canals and highways along the water body.Figure 8c is the fractional water map, Figure 8d shows the classified map from RF and Figure 8e is the water body as depicted in the NAFD map.Linear canals have also been detected along with the reservoir in the unmixing based classification, unlike the output from the RF and NAFD product.It is clear that RF has overestimated the urban built-up areas since it also classifies open areas as urban within and outside the urban extent, causing high commission errors.If large open areas are separated from the urban areas during classification, it will decrease the total urban extent, especially when a definite urban boundary is not defined, as in the case of the SF Bay Area.
The minimum fractional forest cover estimated was 0.0016 ha (16 m²) and the total fractional area of forest cover was found to be 8,324,716 ha (19.7%).RF produced 8,978,601 ha (21.18%),NAFD showed 11,672,365 ha (27.53%), the NLCD 2011 LC map indicated 9,550,523 ha (22.54%) and the NLCD forest canopy percent indicated 7,456,588 ha (17.56%).For the farmland/grassland fraction map, the minimum fractional cover was 0.002 ha (20 m²) and the total fractional area was found to be 2,439,087 ha (5.8%), RF had 2,937,550 ha (6.9%), and NLCD showed 3,337,367 ha (7.8%).For farmland, the reason for the difference among NLCD, unmixing and RF could be the seasonal growth cycles in croplands.NLCD considers these classes as cultivated crops, which appear similar to farmland/herbaceous vegetation.The minimum fraction of water detected was 0.0036 ha (36 m²), which may correspond to small pools in residential areas.Unmixing-based classification estimated 1,888,727 ha (4.46%), RF yielded 1,948,956 ha (4.60%), NAFD indicated 1,769,822 ha (4.17%), and the It is clear that RF has overestimated the urban built-up areas since it also classifies open areas as urban within and outside the urban extent, causing high commission errors.If large open areas are separated from the urban areas during classification, it will decrease the total urban extent, especially when a definite urban boundary is not defined, as in the case of the SF Bay Area.
The minimum fractional forest cover estimated was 0.0016 ha (16 m 2 ) and the total fractional area of forest cover was found to be 8,324,716 ha (19.7%).RF produced 8,978,601 ha (21.18%),NAFD showed 11,672,365 ha (27.53%), the NLCD 2011 LC map indicated 9,550,523 ha (22.54%) and the NLCD forest canopy percent indicated 7,456,588 ha (17.56%).For the farmland/grassland fraction map, the minimum fractional cover was 0.002 ha (20 m 2 ) and the total fractional area was found to be 2,439,087 ha (5.8%), RF had 2,937,550 ha (6.9%), and NLCD showed 3,337,367 ha (7.8%).For farmland, the reason for the difference among NLCD, unmixing and RF could be the seasonal growth cycles in croplands.NLCD considers these classes as cultivated crops, which appear similar to farmland/herbaceous vegetation.The minimum fraction of water detected was 0.0036 ha (36 m 2 ), which may correspond to small pools in residential areas.Unmixing-based classification estimated 1,888,727 ha (4.46%), RF yielded 1,948,956 ha (4.60%), NAFD indicated 1,769,822 ha (4.17%), and the NLCD 2011 LC map had 1,938,462 ha (4.57%) as the water spread area.NAFD underestimated water bodies, as also seen in Figure 8e, while RF and NLCD had closer values.
For the SF Bay Area, unmixing-based classification using VIIRS data showed 139,152 ha as urban areas (most of the pixels have impervious surface fractions between 15 55%), RF gave a much higher estimate of 235,471 ha, NLCD indicated 231,179 ha, and the NLCD percent developed imperviousness layer had 141,522.73ha (most of the pixels have impervious surface fractions between 40 and 70%).One reason for the difference in the range of impervious surfaces between unmixing and NLCD impervious product could be the underlying methods: FCLS (used in unmixing) versus regression tree (used in NLCD).Also, FCLS imposes a sum-to-one constraint on the endmembers, whereas NLCD method does not.For the urban impervious surface in the LA area, unmixing-based classification yielded 189,179 ha, RF gave 315,496 ha, NLCD LC map resulted in 313,622 ha and the NLCD percent developed impervious layer indicated 191,134 ha.It can be seen that the unmixing and NLCD impervious percent values were very close.However, hard classification almost always overestimated the LC area compared to the fractional maps.These results also show that unmixing provides a physical basis to quantify the spectral characteristics of LC and distinguish spectrally heterogeneous areas from more spectrally homogeneous LC.One benefit of defining urban extent on the basis of spectral heterogeneity is the ability to generate a range of verifiable extent estimates that encompass a range of different definitions of urban areas, given that a unique spectral signature for urban LC is difficult to obtain.The above approach attempts to eliminate ambiguity resulting from varying administrative and political definitions of urban areas, as in many cases the exact definition of urban/non-urban remains ambiguous.As such, whether a park near the Golden Gate Bridge in San Francisco is urban or non-urban remains a question.It is to be noted that the park was classified as non-built-up in the original WELD data using RF.Beyond this, the lights in the park also contribute to the light emissions of the city.
Validation-For each of the 1000 validation sites as shown in Figure 10, (each point represents ~10 validation sites) reference datasets were derived for each LC type to create a confusion matrix.The results are tabulated in Table 1, which indicates that unmixing-based classification produced a higher overall TPR (true positive rate) of 91%, which is more than the per-pixel classification using RF (85%).Table 2 indicates PA and UA.The OA and kappa for unmixing-based classification (91.30% and 0.89) was higher than the RF classifier (85.31% and 0.83%).While the RF class areas are close to NLCD maps, it has wrongly classified many pixels belonging to barren/open land as built-up.Few water pixels were wrongly classified using unmixing-based classification, therefore, the PA decreased marginally.This is because some confusion between water and shadow might appear since both classes have pixels which are very dark in all the spectral channels.However, given the same geo-coordinates of training pixels for classification, the UA increased from 87.43 to 95%.The PA increased for forest (3.2%), farmland (9.76%), urban built-up (SF) (1.09%), urban built-up (LA) (12.15%), and the UA increased for forest (6.6%), water bodies (7.7%), urban built-up (SF) (11%), urban built-up (LA) (9.6%) in the unmixing based classification output.On the other hand, the UA decreased (∼1%) for farmland.
Unmixing was intended to improve classification accuracies by correctly classifying pixels, which were likely to be misclassified by RF classifier.Therefore, a cross-comparison of the two classified images located the pixels that were assigned different class labels at the same location.These wrongly classified pixels when validated revealed a 6% (29,830,877 pixels) improvement in unmixing-based classification.Had the WELD data been classified using a hard classification technique such as RF, the errors would accumulate for all the pixels.
Unmixing was intended to improve classification accuracies by correctly classifying pixels, which were likely to be misclassified by RF classifier.Therefore, a cross-comparison of the two classified images located the pixels that were assigned different class labels at the same location.These wrongly classified pixels when validated revealed a 6% (29,830,877 pixels) improvement in unmixing-based classification.Had the WELD data been classified using a hard classification technique such as RF, the errors would accumulate for all the pixels.Pixel to pixel correlation between the unmixed forest fraction and NLCD forest cover fraction was found to be 0.89.The SF and LA built-up fraction obtained through NPP-VIIRS showed a positive correlation of 0.87 and 0.91, respectively, with the NLCD percent developed imperviousness surface layer.The above validation approach was able to detect more than three-fourth of the class fractions with relatively low levels (i.e., 15-41% for the validation sites) of false alarm.
Along the sampling points shown in Figure 10, 1000 patches of size 10 × 10 pixels (300 m × 300 m on the ground) were selected for each LC type.The percentage of a class in each patch was computed as the ratio of the area of the class within the patch over the total area of that patch for the unmixed classified map, RF based framework, NLCD and NAFD maps. Figure 11 shows class fractions for 100 of the 1000 patches (for visual clarity) with the class means indicated by straight lines to visualize the error rates.Unmixing-based classification showed a 15%, 0% and 20% error rate, r of 0.84, 0.99 and 0.76, and RMSE of 0.23, 0.03 and 0.28 with NLCD, NLCD fractional forest cover and NAFD product, respectively.Obviously, the unmixing-based output was closest to the NLCD fractional forest cover with the maximum overestimation of 5% among the 1000 patches.Understanding the degree of underestimation/overestimation can lead to better estimates of tree cover and a better understanding of the potential limitations associated with the current classification estimates.RF was more positively correlated to the NLCD map (0.84) than NAFD (0.76), with RMSE of 0.21 and 0.26, and error rate of 10% and 15%.It is to be noted that the agreement between the NLCD and NAFD products was found to be only 74% for the 1000 patches.For the farmland class, unmixing-based classification was highly correlated (r = 0.94) to NLCD with a RMSE of 3, and a maximum underestimation of 6% was observed between fractions of both the maps with a mean error rate of less than 1% (as evident from Figure 11b).This difference was attributed to fine-scale variations in farmland (small patches) that were not detected by the unmixing method [60].For RF, r of 0.76 was found with the NLCD map, RMSE was 35.17, and a maximum difference of 85% was found between the fractions of both the patches with an error rate of ~30% (RF had highly overestimated farmland).
With the water class, a very high r of 0.99, RMSE of 0.98, and a maximum underestimation of 3% were observed between patches of unmixing-based classification and NLCD.With the NAFD product, r was 0.92, RMSE was 11.23, and a maximum difference of 85% between patches was observed.The overall performance of RF was similar to the unmixing-based classification with very small mean error rates among the different products.Figure 11d,e enumerates the fractions of urban region against the corresponding 100 patches for the SF Bay Area and LA area.NLCD and RF seem to overestimate both the urban areas.The r between unmixing-based classification and NLCD percent developed imperviousness was higher (0.89 for the SF Bay Area and 0.91 for LA) with lower RMSE (3.16 for the SF Bay Area and 3.07 for LA) than the NLCD map (r of 0.79 for the SF Bay Area and 0.85 for LA and RMSE of 23 and 16, respectively), with a maximum underestimation of ~8% observed between the fractional maps and the NLCD fractional maps for the 1000 patches.For urban areas, the potential for greater underestimation may be exacerbated as the urban surfaces become more fragmented or unevenly distributed.On the other hand, RF produced significantly less accurate results, with a higher average error rate of around 10% (see Figure 11d,e), lower r and high RMSE with both NLCD products for the SF Bay and LA areas.The poor performance of RF maps can be attributed to hard classification, where an urban pixel is classified in its entirety and approximated as either an urban or non-urban region based on whether most of the pixel area dominated by impervious surfaces are urban or not.
LC studies have important climatic, hydrologic, biophysical, ecological and socio-economic impacts on the environment.To date, most studies involve a simple per-pixel classification of RS data for LC maps, since an automated characterization of large-scale subpixel LC classification remains a challenge due to the inherent complexity and variability in vegetation dynamics and urban environment.In per-pixel classification, objects are classified based on their spectral characteristics, however, in most practical applications, each image pixel is usually composed of different (multiple) objects, which cannot be analyzed by per-pixel classification methods.The purpose of this study was to examine (i) how well the unmixing models perform towards processing Landsat data with global endmembers? and (ii) how to obtain different LC class fractions from the S-V-D abundance maps?
The relative comparison of one algorithm to the others was challenging due to lack of standardized data and the absence of defined rules [25].
for LC maps, since an automated characterization of large-scale subpixel LC classification remains a challenge due to the inherent complexity and variability in vegetation dynamics and urban environment.In per-pixel classification, objects are classified based on their spectral characteristics, however, in most practical applications, each image pixel is usually composed of different (multiple) objects, which cannot be analyzed by per-pixel classification methods.The purpose of this study was to examine (i) how well the unmixing models perform towards processing Landsat data with global endmembers? and (ii) how to obtain different LC class fractions from the S-V-D abundance maps?
The relative comparison of one algorithm to the others was challenging due to lack of standardized data and the absence of defined rules [25].In the Fresno area of central California, representing an agricultural scenario, the acquisitions were taken during clear sky conditions (except for three days) that also coincided with the approximate time of the satellite overpass which took into account the illumination effect.To avoid geolocation errors caused due to misregistration, atmospheric effects, presence of background mixed with substrate, etc., a matrix of 3 × 3 pixels centered over the GPS location was used.Thus, estimates of ground fractional cover from digital photographs obtained using image segmentation represented the field conditions well within the Landsat IFOV (instantaneous field of view).Since field data were In the Fresno area of central California, representing an agricultural scenario, the acquisitions were taken during clear sky conditions (except for three days) that also coincided with the approximate time of the satellite overpass which took into account the illumination effect.To avoid geolocation errors caused due to misregistration, atmospheric effects, presence of background mixed with substrate, etc., a matrix of 3 × 3 pixels centered over the GPS location was used.Thus, estimates of ground fractional cover from digital photographs obtained using image segmentation represented the field conditions well within the Landsat IFOV (instantaneous field of view).Since field data were gathered in the absence of topography, soils from two different field conditions may differ, causing minor errors in abundance estimates of substrate and dark objects.This difference is anticipated to be greater with a lower vegetation fraction cover than at dense vegetation sites.The image-derived fraction estimates closely matched the ground observations of sparse vegetation conditions, appreciating the fact that the vegetation fraction from the image is modeled only for the portion that is illuminated by sunlight and the shaded portions of the canopy are likely to be assigned to the dark fractions.Shadows are not a LC type but a result of the shaded portions of the vegetation canopy above soil or the consequence of tall buildings in urban areas and hillocks in mountainous terrain.Dark shadows often appear similar to tarred roads, deep water bodies, etc. in RS imageries and are therefore difficult to separate.Although shadows are implicitly modeled in RTM, they cannot be easily separated from other dark objects in the scene using LMM.Therefore, in this work shadows have been placed in the dark objects category that also encompasses absorptive substrate materials and clear deep water.The algorithms with the available global endmembers accounted for the variance in the soil by substrate and dark object fractions, given the fact that overall the crop conditions were very uniform.
For the SF area, which depicted an urban scenario, most of the urban pixels were mixed with vegetation (urban trees), roads, shadows or appear as dark objects due to the varied materials used in the construction of the terraces.Nevertheless, this study showed that urban reflectance could be adequately modeled with a three-endmember mixture model using Landsat and WV-2 data.The S-V-D endmember model characterized the fraction of illuminated vegetation, substrate or impervious materials and the shadowed or non-reflective surfaces such as water, roofing tar, etc. High substrate fractions are rational estimates of the impervious surface in developed land in temperate and tropical regions, as pervious surfaces are mostly covered by some kind of vegetation.Exposed substrates are also most likely to be impervious.In such cases, the vegetation fraction can be used as a proxy for fractional pervious surfaces because vegetation cannot thrive on impervious surfaces.Therefore, the presence of vegetation implies the presence of some amount of pervious surface.Hence, using detectable vegetation as an indicator of the permeable surface can account for the range of different natural and manmade surfaces [61].
The abundance estimation approach is very different from the classical per-pixel classification approach where each pure pixel is assigned to one and only one class.Owing to the consistency in the accuracy estimation procedures, standards and results, this study gives a clear vision of the accuracies of the different algorithms.Overall, FCLS performed better, was more robust and computationally faster compared to the other techniques.Nevertheless, MFCLS, SPU, fully-constrained SUnSAL and CL SUnSAL also proved to fit the data very well with equally high performances, whereas SUnSAL TV performed poorly.Despite our effort to conduct comprehensive and rigorous comparative analyses of various constrained unmixing algorithms, there are certain limitations of this study.The first limitation is that the number of endmembers used in the library was three.Therefore, it is acknowledged that fractional errors can occur occasionally in cases when few endmembers are used, resulting in spectral information that cannot be accounted for by the existing endmembers.Fractional errors can also occur when too many endmembers are present, in which case minor departures between the measured and modeled spectra are often assigned to an endmember that is used in the model, but not actually present.However, since the set of endmembers was derived for global Landsat data, they have wider usage than image or region-specific endmembers.In a real scenario, any landscape is likely to have mixture of the S-V-D endmembers.They can be easily used to assess the state, distribution and quantification of first level LC classes for obtaining spatio-temporal information from the large repository of Landsat data.Global endmembers have diverse applications from continental to global LC mapping, where local endmembers are restricted to a particular geographical location and may not be easily available for an unapproachable terrain.Therefore, the global endmembers can be used with monthly WELD to study the changes in vegetation and urban areas.The second limitation is that the fractional maps do not account for the endmember variability [62]; the discussion of this topic is beyond the scope of this work.
Although global LC data exists at spatial resolutions of 300 m, 1000 m, and also at 30 m resolution at a limited frequency, the subpixel mapping approach is now a feasible option for the next generation of global LC products.As a case study, a consistent, robust and highly scalable methodology was presented for the characterization of different LC classes.Applying this method to the WELD database now provides a robust automation module for large scale mapping of changes in LC.While there are commercial GIS software packages (such as ENVI ® , ERDAS IMAGINE ® , etc.) for unmixing/classification, they are not scalable across millions of scenes in an automated manner.The limited requirements of the presented approach make it suitable for estimating vegetation, water and urban fractions globally.The future extension of this study will include (i) analysis of the models with MODIS data that may reveal additional differences between the algorithm's performance; and (ii) use of S-V-D abundance maps for the classification of other spatial features such as buildings, highways, crop types, etc.The fractional LC maps can also be used as an input to ensemble classifiers such as deep learning frameworks [63,64] to further improve the classification accuracy.

Conclusions
In this paper, an evaluation of six state-of-the-art constrained subpixel learning algorithms was performed on computer-simulated data and Landsat data of both an agricultural landscape and an urban scenario.The analysis revealed that fully constrained least squares (FCLS) outperformed all the other techniques (such as sparse regression, signal-subspace and geometrical methods) with the highest classification accuracy and the smallest execution time.FCLS was used to produce global abundance maps of substrate, vegetation and dark objects (S-V-D) endmembers with global Web-Enabled Landsat Data.Further, the S-V-D abundance maps were classified into dense forest, farmland/grassland, water bodies and urban areas over California with the random forest classifier.Validation of these fractional land cover maps with the National Land Cover Database (NLCD) and North American Forest Dynamics (NAFD) products revealed 91% accuracy and showed 6% improvement over a per-pixel classified map, making this approach feasible for land cover mapping on a global scale.The current study may be a pathfinder in recognizing the mapping of land cover classes from abundance estimates.LC mapping from regional to global levels will continue to improve over time and would serve as a key data layer for various scientific studies.

Figure 1 .
Figure 1.Field data collection site in the San Joaquin Valley, central California, with surveyed boundaries (black polygons) from which ground fractional cover were derived for validation (left).Part of San Francisco city with a mix of substrate, vegetation, roads, shadow and dark objects (right).

Figure 1 .
Figure 1.Field data collection site in the San Joaquin Valley, central California, with surveyed boundaries (black polygons) from which ground fractional cover were derived for validation (left).Part of San Francisco city with a mix of substrate, vegetation, roads, shadow and dark objects (right).
ground.In this regard, Defense Meteorological Satellite Program Operational Line Scanner (DMSP OLS) nighttime light data and the Visible Infrared Imaging Radiometer Suite (VIIRS) day-night band (DNB) carried by the Suomi National Polar-Orbiting Partnership (NPP) satellite are very useful in monitoring and analyzing human activities and natural phenomena.DMSP OLS and NPP-VIIRS pixels in the nighttime light images aid in classifying urban built-up areas since they capture illuminated surfaces than the surrounding dark areas at night[53].

Figure 3 .
Figure 3. Overall methodology of classification.Figure 3. Overall methodology of classification.

Figure 3 .
Figure 3. Overall methodology of classification.Figure 3. Overall methodology of classification.

Figure 4 .
Figure 4. (a-c) shows synthetic abundance maps for endmember 1, 2 and 3; (d-f) shows abundance maps obtained from FCLS (Fully Constrained Least Squares).Black indicates the absence of a particular class (the minimum abundance value) and white indicates the full presence of that class in a pixel (the maximum abundance value).Intermediate values of the shades of gray represent a mixture of more than one class in a pixel.

Figure 5 .
Figure 5. Correlation coefficient (r) and RMSE (root mean square error) between abundance values obtained from the seven algorithms and the true abundance for (a) endmember 1, (b) endmember 2, (c) endmember 3, and (d) plot of probability of success ( s ) and SRE (signal-to-reconstruction error in dB) at different levels of noise.

Figure 5 .
Figure 5. Correlation coefficient (r) and RMSE (root mean square error) between abundance values obtained from the seven algorithms and the true abundance for (a) endmember 1, (b) endmember 2, (c) endmember 3, and (d) plot of probability of success (p s ) and SRE (signal-to-reconstruction error in dB) at different levels of noise.

Figure 6
Figure 6 is a mosaic of 8003 scenes showing the global (a) substrate, (b) dark objects, (c) vegetation abundance map and (d) NDVI generated from 2011 annual WELD.Each WELD scene is composed of 5295 rows and 5295 columns, therefore, a single snapshot of global data consists of 224.3 billion pixels with six spectral dimensions, processed in 29 min utilizing 200 cores.

Figure 6 .
Figure 6.Global abundance maps of (a) substrate, (b) dark objects, (c) vegetation and (d) NDVI-Normalized Difference Vegetation Index obtained from the 2011 annual WELD.

Figure 6 .
Figure 6.Global abundance maps of (a) substrate, (b) dark objects, (c) vegetation and (d) NDVI-Normalized Difference Vegetation Index obtained from the 2011 annual WELD.

Figure 7 .
Figure 7. (a) Forest fractional map and (b) farmland/grassland fractional map for the state of California.(c) FCC (false color composite) from Landsat (bands 5-4-3 as Red-Green-Blue) showing forest patch, grassland and water bodies; (d) forest fractional map for the corresponding area in (c); (e) farmland/grassland fractional map; (f) water fractional map; (g) classification of original WELD data by RF (Random Forest); and (h) forest cover map from NAFD (North American Forest Dynamics) product.

Figure
Figure 9a,b shows the RGB composites (bands 3-5) for the San Francisco (SF) Bay Area, popularly known as Silicon Valley, and the Los Angeles (LA) area, (c) and (d) are corresponding NPP-VIIRS nighttime data, (e) and (f) are the fractional urban built-up areas obtained using the combination of substrate and nighttime light data, and (g) and (h) shows the corresponding built-up area obtained from the classification of original WELD bands using RF.

Figure 7 .
Figure 7. (a) Forest fractional map and (b) farmland/grassland fractional map for the state of California.(c) FCC (false color composite) from Landsat (bands 5-4-3 as Red-Green-Blue) showing forest patch, grassland and water bodies; (d) forest fractional map for the corresponding area in (c); (e) farmland/grassland fractional map; (f) water fractional map; (g) classification of original WELD data by RF (Random Forest); and (h) forest cover map from NAFD (North American Forest Dynamics) product.

Figure
Figure 9a,b shows the RGB composites (bands 3-5) for the San Francisco (SF) Bay Area, popularly known as Silicon Valley, and the Los Angeles (LA) area, Figure 9c,d is corresponding NPP-VIIRS nighttime data, Figure9e,f is the fractional urban built-up areas obtained using the combination of substrate and nighttime light data, and Figure9g,h shows the corresponding built-up area obtained from the classification of original WELD bands using RF.

Figure
Figure9a,b shows the RGB composites (bands 3-5) for the San Francisco (SF) Bay Area, popularly known as Silicon Valley, and the Los Angeles (LA) area, (c) and (d) are corresponding NPP-VIIRS nighttime data, (e) and (f) are the fractional urban built-up areas obtained using the combination of substrate and nighttime light data, and (g) and (h) shows the corresponding built-up area obtained from the classification of original WELD bands using RF.

Figure 8 .
Figure 8.(a) Spatial distribution of the fractional water bodies in California; (b) RGB composite of the Landsat bands (3-5) showing the San Luis Reservoir in central California; (c) fractional map of the San Luis reservoir; (d) classification of original WELD data by RF classifier; and (e) San Luis reservoir from the NAFD product.

Figure 8 .
Figure 8.(a) Spatial distribution of the fractional water bodies in California; (b) RGB composite of the Landsat bands (3-5) showing the San Luis Reservoir in central California; (c) fractional map of the San Luis reservoir; (d) classification of original WELD data by RF classifier; and (e) San Luis reservoir from the NAFD product.Remote Sens. 2017, 9, 1105 16 of 25

Figure 9 .
Figure 9.The urban built-up areas extracted from substrate fractional map and NPP-VIIRS data of the SF Bay Area and LA area.Note: (a,b) are Landsat RGB composite (bands 3-5); (c,d) are NPP-VIIRS data; (e,f) are the extracted urban areas; (g,h) are the urban settlement obtained from classification of original WELD data by RF classifier.

Figure 9 .
Figure 9.The urban built-up areas extracted from substrate fractional map and NPP-VIIRS data of the SF Bay Area and LA area.Note: (a,b) are Landsat RGB composite (bands 3-5); (c,d) are NPP-VIIRS data; (e,f) are the extracted urban areas; (g,h) are urban settlement obtained from classification of original WELD data by RF classifier.

Figure 10 .
Figure 10.Map showing 100 validation points chosen over California for forest (dark green circles), farmland (light green circles), water bodies (blue boxes) and urban built-up classes (red boxes), each representing multiple (~10) points.

Figure 10 .
Figure 10.Map showing 100 validation points chosen over California for forest (dark green circles), farmland (light green circles), water bodies (blue boxes) and urban built-up classes (red boxes), each representing multiple (~10) points.

25 Figure 11 .
Figure 11.(a) Fraction of forest cover, (b) farmland, (c) water bodies, (d) urban built-up in the SF Bay Area and (e) LA area from unmixing, RF, NLCD and NAFD.The straight lines indicate the mean of the corresponding classes.

Figure 11 .
Figure 11.(a) Fraction of forest cover, (b) farmland, (c) water bodies, (d) urban built-up in the SF Bay Area and (e) LA area from unmixing, RF, NLCD and NAFD.The straight lines indicate the mean of the corresponding classes.