1. Introduction
The standardized spectral mixture model (SSMM) combines the specificity of a physically based representation of a spectrally mixed pixel with the generality and portability of a standard spectral index. However, unlike spectral indices, the spectral mixture model provides quantitative estimates of areal abundance of all endmember (EM) reflectances within the pixel Instantaneous Field of View (IFOV). By using a standard set of spectral EMs chosen to span a large, spectrally diverse, feature space, the resulting EM fraction estimates can be compared across space and time. This standardization makes the SSMM a potentially valuable complement to more complex spectral mixture models that may use local EMs or spectral libraries. An additional benefit of spectral mixture models more generally is the ability to easily quantify wavelength-specific model misfit by using the EM fraction estimates obtained from the model inversion to “remix” the endmember spectra for comparison to the mixed spectrum being modeled [1,2,3]. When applied to imaging spectrometer measurements, the SSMM can provide a basis for representing high-dimensional data volumes using a conceptually simple low-dimensional model. Furthermore, once the low-dimensional model has been computed, it can be subtracted from the high-dimensional observations to yield a spectral mixture residual that can reveal subtle spectral features (e.g., absorptions) that may be deemphasized in the full spectra [4]. Standardizing the linear mixture model using a parsimonious set of canonical endmembers allows it to encompass location- and application-specific mixture models by providing a common basis within which all linear models may be represented.
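The inversion, remixing and residual steps described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares inversion in which the unit-sum constraint is appended as one extra weighted equation; the function names, the weight `sum_weight` and the synthetic endmember spectra are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def unmix(spectra, endmembers, sum_weight=1.0):
    """Least-squares fraction estimates with a soft unit-sum constraint.

    spectra:    (n_pixels, n_bands) reflectance spectra
    endmembers: (n_em, n_bands) endmember reflectance spectra
    sum_weight: weight of the extra equation sum(fractions) = 1 (assumed value)
    """
    n_pixels = spectra.shape[0]
    n_em = endmembers.shape[0]
    # Append one row to the design matrix so that the fractions are pushed to sum to 1.
    design = np.vstack([endmembers.T, sum_weight * np.ones((1, n_em))])
    target = np.hstack([spectra, sum_weight * np.ones((n_pixels, 1))]).T
    fractions, *_ = np.linalg.lstsq(design, target, rcond=None)
    return fractions.T  # (n_pixels, n_em)

def mixture_residual(spectra, fractions, endmembers):
    """Remix the endmembers and return the wavelength-explicit residual and RMS misfit."""
    modeled = fractions @ endmembers          # "remixed" spectra
    residual = spectra - modeled              # spectral mixture residual
    rms = np.sqrt(np.mean(residual ** 2, axis=1))
    return residual, rms

# Synthetic example: three random endmembers and five exact linear mixtures.
rng = np.random.default_rng(0)
endmembers = rng.uniform(0.0, 0.6, size=(3, 50))
true_fractions = rng.dirichlet(np.ones(3), size=5)
spectra = true_fractions @ endmembers

fractions = unmix(spectra, endmembers)
residual, rms = mixture_residual(spectra, fractions, endmembers)
```

Because the synthetic spectra are exact mixtures, the estimated fractions match the true ones and the residual vanishes; with real spectra the residual retains whatever the endmembers cannot represent.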
Spectrally and geographically diverse collections of broadband imagery have been used to show that the aggregate reflectance of the vast majority of ice-free landscapes on Earth can be represented as linear mixtures of rock and soil substrates (S), photosynthetic vegetation (V) and dark targets (D) composed of shadow and spectrally absorptive/transmissive materials (e.g., deep clear water, ferromagnesian rock, organic-rich soil). As such, dark fraction estimates can also be used to quantify variations in illumination, geometry (i.e., slope and aspect) and surface roughness [5]. The spectral feature space bounded by these SVD EMs is also referred to as a spectral mixing space in acknowledgement of the spectral mixing that generally occurs at or below the subpixel scale of the IFOV. Compilations of Landsat, Sentinel and MODIS imagery have been used to demonstrate the consistency of the triangular SVD mixing space across landscapes [6,7,8,9,10,11]. In addition, distinct mixing continua are observed for sands and evaporite substrates, shallow marine substrates (e.g., rock, sediments and coral) and the snow/firn/ice continuum [7,10,12]. The widespread applicability of the SVD model and remarkable stability of its inversion for broadband spectra are noteworthy, given its generally small (<~5%) misfit and linear scaling over orders of magnitude in sensor resolution [7,9,13]. This raises the question of whether similar characteristics may extend to spectrally mixed pixels collected by imaging spectrometers. The fact that application-specific linear mixture models have been used to represent hyperspectral data for years suggests that this may be the case (e.g., [2,14,15,16,17,18]).
Pilot studies of spectra collected by the AVIRIS and EMIT spectrometers suggest that the broadband SVD model may be extended to higher spatial and spectral resolution. Sousa and Small (2017) used a diverse collection of 3–9 m AVIRIS imagery from a variety of landscapes in California to demonstrate similarity of mixing space topology and EMs to near-simultaneous acquisitions of 30 m Landsat imagery [13]. More recently, Sousa and Small (2023) used a smaller but diverse set of 20 early-release EMIT granules to verify the SVD mixing space topology and EMs at 40–60 m and 10 nm resolutions [19]. In addition to using the singular value decomposition to quantify spectral dimensionality and mixing space topology, this study also used nonlinear dimensionality reduction [20] to characterize local mixing space topology as well as that of the feature space of the SVD mixture residual [19]. The much greater spectral and geographic diversity of the current EMIT archive allows for this characterization to be extended to a more globally representative diversity of landscapes than was possible with the 20 early release EMIT granules.
Both broadband and spectroscopic studies of the spectral mixing spaces raise questions about the completeness and generality of the SVD model for ice-free land cover, specifically regarding the plane of substrates revealed by the earlier studies referenced above. Variance partition from singular value decomposition of image compilations consistently shows both broadband and spectroscopic mixing spaces to be statistically three-dimensional (3D) for >95% of total variance [21,22,23]. The third dimension generally accounts for only a few percent of total variance of the 3D space, but this thin third dimension consistently corresponds to the plane of substrates and often reveals additional potential substrate EMs. The most notable of these is non-photosynthetic vegetation (NPV), which is abundant in many landscapes and often considered a distinct endmember [16,24,25,26]. Previous studies have also shown that the high albedo substrate EM most often corresponds to a sand reflectance which is generally distinct from lower albedo soil reflectances. SVD models using a sand substrate EM can represent reflectances of a diversity of landscapes accurately (<~6% RMS misfit), but the significantly higher amplitude of a sand substrate EM may also underestimate the true fraction of lower albedo soils present in most landscapes with exposed substrate. In addition, the lack of an NPV EM omits an important compositional component of the vegetation–substrate continuum in many landscapes. Extending the SVD model by including an NPV EM would effectively extend the 2D planar triangular model to a 3D tetrahedral model, perhaps better representing the true 3D topology of the spectral mixing space. However, adding another degree of freedom to a stable model may incur costs in terms of model stability.
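The variance partition referred to above can be sketched as follows. This is a minimal illustration that assumes the spectra are mean-centered before the singular value decomposition (the preprocessing used in the cited studies is not stated here); the function name and the synthetic four-endmember data are hypothetical.

```python
import numpy as np

def variance_partition(spectra):
    """Fraction of total variance carried by each singular dimension.

    spectra: (n_pixels, n_bands) array; mean-centering is an assumption of this sketch.
    """
    centered = spectra - spectra.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return variance / variance.sum()

# Synthetic mixtures of four endmembers (unit-sum fractions) plus weak noise.
rng = np.random.default_rng(1)
endmembers = rng.uniform(0.0, 0.6, size=(4, 100))
fractions = rng.dirichlet(np.ones(4), size=2000)
spectra = fractions @ endmembers + rng.normal(0.0, 0.001, size=(2000, 100))

partition = variance_partition(spectra)
cumulative = np.cumsum(partition)  # cumulative[2] is the variance in the first 3 dimensions
```

Unit-sum mixtures of four endmembers occupy a three-dimensional (tetrahedral) subspace, so nearly all of the variance in this synthetic example falls in the first three dimensions, with the remainder attributable to the added noise.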
The objectives of this study are fourfold: (1) Verify the generality and stability of the spectroscopic SVD model in a larger, more spectrally diverse range of landscapes than previous studies have used. (2) Characterize the SVD topology and plane of substrates to assess linearity of spectral mixing. (3) Identify additional potential endmembers for non-sandy soil and NPV to extend the SVD model. (4) Quantify fraction estimate plausibility, EM sensitivity and linearity of spatial scaling for the spectroscopic linear mixture model. The ultimate objective is to develop an effective low-dimensional model to represent a high dimensional mixing space, thereby simplifying the use of spectroscopic imagery for a variety of applications.
3. Results
The spectral mixing space rendered from the low-order PCs reveals the expected triangular topology with clearly defined tapering apexes for vegetation and dark endmembers (Figure 3). The plane of substrates, accounting for ~2% of total variance, is characterized by a high albedo sand apex with multiple lower albedo sand apexes and a distinct mixing continuum extending to a soil endmember. Opposite the dark-to-soil continuum, a clear convex bulge reveals an NPV endmember with amplitude comparable to, but somewhat lower than, the soil endmember. Both sand and NPV endmembers form mixing continua converging to the vegetation and dark endmembers, with the plane of substrates forming the base of the SVDN tetrahedron.
In contrast, the UMAP embeddings show distinct mixing continua for vegetation and substrates, with NPV connecting both. Figure 4 shows the UMAP embedding for a near_neighbor scaling of 30. The 1-2 projection shows a single continuum surrounded by a constellation of distinct clusters, while the 3-2 projection clearly distinguishes the vegetation and substrate continua. Almost all of the isolated clusters correspond to spectrally distinct water bodies, or water masses within the larger coastal water bodies. There are two distinct clusters corresponding to sand bodies in the Levant and Salton granules. However, almost all of the sands present in the several desert granules form continua within or connected to the larger substrate continuum. The joint characterization combining PCs 1 & 2 with UMAP dimension 3 shows these continua more clearly, as they all span a range of amplitudes extending from the dark endmember to each of the distinct sand reflectances present in the mosaic (Figure 4). At least nine distinct continua can be identified for this UMAP embedding. Higher near_neighbor settings collapse these distinct continua onto the larger substrate continuum.
The joint characterization reveals the presence of two distinct limbs of lower amplitude NPV embedded within the substrate continuum. Differences in the VNIR and SWIR2 (2000–2500 nm) suggest that N3 may be more vegetation-dominant with deeper chlorophyll absorptions in the visible and more prominent lignin absorptions in the SWIR2 (Figure 5). In contrast, N4 has more nearly uniform VNIR reflectance reminiscent of sandy soil with no prominent absorptions in the SWIR2. These two limbs merge to form a single continuum that increases in amplitude to a branch point between higher amplitude NPV and a purely soil continuum at N2. The higher amplitude NPV (N1) and soil (S1) endmembers illustrate the NIR peak (~1400 nm) and deep lignin absorptions in SWIR2 of the NPV, in contrast to the more continuous soil spectrum peaking at SWIR1 wavelengths. This soil EM contrasts strongly with the VNIR shoulders and varying SWIR2 absorptions of the high albedo EMs of the sand continua.
Inversion of the SVDN mixture model yields the SVD composites shown in Figure 6A and the NPV and RMS misfit composites in Figure 6B. Aside from varying densities of vegetation canopy, the most prominent difference among the sample sites in the SVD composite is the contrast between higher albedo sandy soils and unmodeled sands (red) and lower albedo soils (blue to magenta). The relatively fine scale spectral diversity of the San Joaquin valley soils is especially apparent in this composite. The prominence of NPV (yellow) in most of the sample sites is apparent from Figure 6B. Because of the generally low misfit of the SVDN model, areas with relatively higher misfit (blue) are actually areas with relatively little exposed NPV. Inversions run without the unit sum constraint produced wildly divergent results for both the SVD and SVDN models, with implausible dark fraction estimates and fraction sums ranging from −2 to 10.
Comparing bivariate fraction distributions highlights the most prominent differences between the SVD, SVDN and NVD models. Figure 7 shows different models by column and corresponding fraction distributions across rows. Most immediately apparent is the difference in fraction distribution ranges among models. The SVD model (left) is well-bounded within [0, 1] with all exceedances <0.1, while the SVDN and NVD models have both larger exceedances and much greater numbers of spectra out of range. This is particularly true for both substrate and vegetation in the SVDN model. For the SVDN model, 29% of substrate fractions are <0, while only 3% are <0 for the SVD model. For vegetation fractions, both models perform similarly with <5% of estimates <0, although the magnitude of exceedance is significantly greater for the SVDN model. For the SVDN model, most of the spectra with significantly negative substrate fractions have intermediate vegetation and dark fractions, corresponding to forests and other closed canopy vegetation. Notably, 29% of NPV fractions are <0 with exceedances reaching −1 for the SVDN model. The increasingly negative values of NPV with increasing substrate fractions >1 indicate that these nearly collinear (ρ = 0.9) endmembers together minimize misfit through destructive interference. In contrast, the NVD model shows similar tradeoffs of NPV with both the dark and vegetation fractions. Without a substrate endmember, the moderate collinearity (ρ = 0.53) of the NPV and vegetation endmembers results in similar interference effects. While the NVD model did yield plausible fractions for a significant number of modeled spectra, and may therefore be viable in some landscapes where NPV is more prominent than exposed soil, the large number of implausible fractions makes it unsuitable as a general model for landscape reflectance.
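The two diagnostics used in this comparison, endmember collinearity and the percentage of fractions outside the plausible range, can be sketched as below. The sketch assumes that ρ denotes the Pearson correlation between endmember spectra, which the text does not state explicitly; the function names and the toy arrays are hypothetical.

```python
import numpy as np

def endmember_correlations(endmembers):
    """Pairwise Pearson correlation between endmember spectra, shape (n_em, n_em)."""
    return np.corrcoef(endmembers)

def out_of_range_percent(fractions, lo=0.0, hi=1.0):
    """Percentage of fraction estimates below lo and above hi, per endmember."""
    below = 100.0 * np.mean(fractions < lo, axis=0)
    above = 100.0 * np.mean(fractions > hi, axis=0)
    return below, above

# Toy endmembers: the second is a scaled copy of the first (perfectly collinear),
# the third is its mirror image (perfectly anti-correlated).
toy_endmembers = np.array([
    [0.1, 0.2, 0.3, 0.4],
    [0.2, 0.4, 0.6, 0.8],
    [0.4, 0.3, 0.2, 0.1],
])
rho = endmember_correlations(toy_endmembers)

# Toy fractions for two endmembers over four pixels.
toy_fractions = np.array([
    [0.2, 0.8],
    [-0.1, 1.1],
    [0.5, 0.5],
    [0.0, 1.0],
])
below, above = out_of_range_percent(toy_fractions)
```

In the toy fractions, one of four estimates for the first endmember is negative and one of four for the second exceeds 1, so each percentage is 25% for that endmember and 0% for the other.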
Comparing corresponding fraction distributions for the SVD and SVDN models reveals the effect of model dimensionality on the fraction estimates themselves. Figure 8 shows corresponding S, V and D fractions for both models, along with misfit distributions. As should be expected, the substrate fraction estimates are most strongly impacted by the presence or absence of the NPV endmember, with a strong negative bias for all but the highest substrate fractions in the SVDN model. This is also true for the dark fractions, but to a much lesser extent than for substrates. It is noteworthy that vegetation fractions are almost identical for both models, although with a slight positive bias for the lowest vegetation fractions in the SVD model. As expected, the SVD model has higher RMS misfit than the SVDN, although both are quite small (<0.04) for all spectra except clouds. Although misfit is somewhat greater for the SVD model, it is still <0.04 for 98% and <0.02 for 68% of all spectra.
Comparing the observed spectra for both models reveals the nature of the misfits for each. Figure 9 shows the same RMS misfit comparison as Figure 8, but at an enlarged scale. Aside from cloud, both models show the largest misfits for sands (example 2). As expected, the SVDN model achieves much better fits for spectra with high NPV fractions. Both models have comparable misfits for example 7 because none of the EMs can accommodate the anomalously high visible reflectance of this mixture of NPV and vegetation.
The endmember sensitivity analysis confirms that the SVD model is quite robust to variations in all three endmembers. Comparing all permutations of three peripheral (outlier) spectra for each endmember quantifies the worst case scenarios using combinations of anomalous endmember spectra. Pairwise sensitivities between individual endmember spectra are highlighted by correlations between corresponding fractions for each endmember combination as shown in the inset correlation matrices in Figure 10. The consistently high (>0.98) linear correlations for all endmember combinations highlight the extremely stable nature of the SVD model inversion that results from the near orthogonality of its endmembers. The large numbers of implausible fraction estimates produced by the SVDN and NVD models reveal the relative instability of these models, thereby precluding the utility of sensitivity analysis for either model.
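The structure of such a sensitivity analysis can be sketched as follows: every combination of candidate spectra (three per endmember, hence 27 models) is inverted, and the resulting fraction estimates are correlated between models. This is only an illustration under stated assumptions: the soft unit-sum least-squares inversion, the function names and the synthetic candidates (scaled copies of random spectra) are all hypothetical and do not reproduce the study's procedure or its endmembers.

```python
import itertools
import numpy as np

def unmix(spectra, endmembers, sum_weight=1.0):
    """Least-squares fractions with a soft unit-sum constraint (assumed formulation)."""
    n_pixels, n_em = spectra.shape[0], endmembers.shape[0]
    design = np.vstack([endmembers.T, sum_weight * np.ones((1, n_em))])
    target = np.hstack([spectra, sum_weight * np.ones((n_pixels, 1))]).T
    fractions, *_ = np.linalg.lstsq(design, target, rcond=None)
    return fractions.T

def sensitivity_runs(spectra, candidates):
    """Invert the model for every combination of candidate endmember spectra.

    candidates: list with one (n_candidates, n_bands) array per endmember.
    Returns an array of shape (n_models, n_pixels, n_em).
    """
    index_ranges = [range(len(c)) for c in candidates]
    runs = []
    for combo in itertools.product(*index_ranges):
        endmembers = np.stack([c[i] for c, i in zip(candidates, combo)])
        runs.append(unmix(spectra, endmembers))
    return np.stack(runs)

def pairwise_fraction_correlations(runs, em_index):
    """Correlation of one endmember's fractions between all pairs of models."""
    return np.corrcoef(runs[:, :, em_index])  # (n_models, n_models)

# Synthetic example: three base spectra, each with three scaled variants as candidates.
rng = np.random.default_rng(2)
base = rng.uniform(0.0, 0.6, size=(3, 50))
candidates = [np.stack([b * s for s in (0.9, 1.0, 1.1)]) for b in base]
spectra = rng.dirichlet(np.ones(3), size=200) @ base

runs = sensitivity_runs(spectra, candidates)
corr_matrices = [pairwise_fraction_correlations(runs, k) for k in range(3)]
```

The minimum off-diagonal value of each correlation matrix then summarizes the worst-case agreement between any two endmember combinations for that fraction.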
The linearity of mixing for both SVD and SVDN models is confirmed by comparing fraction estimate distributions for a 40 × 60 m resolution EMIT acquisition with near coincident 4.4 m resolution AVIRIS-3 acquisition from an agricultural region on the Sacramento delta in California. Despite the difference in spatial and spectral resolution, the spectral mixing spaces of the EMIT granule and AVIRIS-3 line are virtually identical and both mixture model inversions yield comparable RMS misfit distributions. Figure 11 shows SVD fraction composites for SVDN models using the same EMIT-derived endmember spectra for both instruments. Scaling is strongly linear across an order of magnitude difference in resolution for all fraction estimate distributions. The slight positive biases for the S, V and N fractions and slight negative bias of the D fractions of the AVIRIS spectra are consistent with its collection under higher solar elevation conditions compared to the EMIT acquisition. Much of the dispersion about the 1:1 lines is a result of significant identifiable orthographic displacements between the AVIRIS line and the EMIT granule.
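The expectation behind this comparison can be illustrated with a minimal sketch. Under purely linear mixing and a linear inversion, averaging fine-resolution fraction estimates over a block of pixels gives the same result as inverting the block-averaged spectrum. The integer aggregation factor, the synthetic cube, the soft unit-sum inversion and the function names are assumptions made for illustration; the actual EMIT and AVIRIS-3 comparison involves different sensors, non-integer resolution ratios and coregistration.

```python
import numpy as np

def unmix_cube(cube, endmembers, sum_weight=1.0):
    """Soft unit-sum least-squares unmixing of a (rows, cols, bands) cube."""
    rows, cols, n_bands = cube.shape
    n_em = endmembers.shape[0]
    spectra = cube.reshape(-1, n_bands)
    design = np.vstack([endmembers.T, sum_weight * np.ones((1, n_em))])
    target = np.hstack([spectra, sum_weight * np.ones((spectra.shape[0], 1))]).T
    fractions, *_ = np.linalg.lstsq(design, target, rcond=None)
    return fractions.T.reshape(rows, cols, n_em)

def block_mean(cube, factor):
    """Average a (rows, cols, channels) cube over non-overlapping factor x factor blocks."""
    rows, cols, channels = cube.shape
    r2, c2 = rows // factor, cols // factor
    trimmed = cube[:r2 * factor, :c2 * factor]
    return trimmed.reshape(r2, factor, c2, factor, channels).mean(axis=(1, 3))

# Synthetic fine-resolution scene: linear mixtures of three endmembers plus noise.
rng = np.random.default_rng(3)
endmembers = rng.uniform(0.0, 0.6, size=(3, 40))
fine_true = rng.dirichlet(np.ones(3), size=(12, 12))           # (12, 12, 3)
fine_cube = fine_true @ endmembers + rng.normal(0.0, 0.005, size=(12, 12, 40))

# Route 1: unmix at fine resolution, then aggregate the fractions.
coarse_from_fractions = block_mean(unmix_cube(fine_cube, endmembers), 4)
# Route 2: aggregate the spectra, then unmix at coarse resolution.
coarse_from_spectra = unmix_cube(block_mean(fine_cube, 4), endmembers)
```

The two routes agree to numerical precision in this idealized case, so departures from the 1:1 line in real data point to factors outside the linear model, such as differing illumination, nonlinear mixing or spatial misregistration.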
4. Discussion
4.1. The Spectroscopic Mixing Space
The collection of 40 EMIT granules from a diversity of agricultural basins worldwide yields a low-order mixing space topology very similar to that obtained from much larger areas of Landsat, Sentinel 2 and MODIS imagery collections, as well as smaller areas of AVIRIS collections, used in previous studies. The most obvious differences are related to the exclusion of evaporites, submarine substrates (i.e., reefs) and the cryospheric continuum of snow/firn/ice, which all form distinct mixing continua not found in the majority of ice-free landscapes. In comparison to our earlier analysis of 20 early-release EMIT granules, this collection contains a much greater diversity of both soil and vegetation types and exposures. As a result, the structure of the plane of substrates is more clearly resolved. Notably, soil and sand mixing continua are clearly distinguished and an NPV apex emerges. The joint characterization of the mixing space, combining the UMAP and PC embeddings, shows distinct substrate and vegetation continua, topologically connected by NPV. This is physically consistent with NPV being a compositional intermediary between photosynthetic vegetation and soil. The joint characterization also reveals at least seven distinct mixing continua between sands and sandy soils. This is also consistent with the fact that sands are often mineralogically distinct as a result of source rock provenance and the sedimentological processes by which they are segregated from finer-grained sediments. One notable difference from the broadband mixing spaces of earlier studies is the prominent continuum of water body reflectances extending from the dark endmember. This is consistent with EMIT’s high signal-to-noise ratio, and thus its ability to resolve more distinct reflectances in the visible spectrum, as well as with the importance of spectral curvature for aqueous targets. It is also consistent with the absence of evaporite and snow/firn/ice continua in this image compilation, which would otherwise compress the 3rd dimension of the PC space because of the very high reflectance amplitude of dry evaporites and snow.
4.2. The SVD Model—Why It Works
Identification of a pure soil endmember, distinct from sands, allows the new soil-based spectroscopic SVD model to better represent a wider diversity of landscapes. Because the sand mixing continuum forms one edge of the plane of substrates while the NPV continuum forms the opposite, the intermediary soil endmember better accommodates both non-sandy soils as well as more organic-rich soils nearer the NPV continuum. This more representative endmember therefore reduces the misfit for most of the plane of substrates, as well as for unmodeled NPV. In addition, the reduced amplitude of the new soil endmember (relative to sand) reduces the underestimation of soil fraction estimates obtained using a high amplitude sand endmember for all substrates. The slightly negative vegetation fractions are limited to low albedo sands in the Gobi desert (Huang He & Hexi). The slightly negative dark fractions are limited to high albedo sands in the Negev (Levant) and Anza-Borrego (Salton) deserts. Relative to earlier studies, the incorporation of more soil-rich and fewer sand-dominant landscapes in the EMIT mosaic reduces the variance partition of the plane of substrates from 5–6% to 2–3% of total mixing space variance, with a reduction of RMS misfit to < 0.03 for 91% of modeled spectra. A sand-based substrate endmember may sometimes be preferred for modeling some arid landscapes, but the new soil-based substrate endmember may better represent a wider variety of non-arid landscapes worldwide.
The primary reason why the planar triangular SVD model is so effective as a general model of land surface reflectance is related to the nearly planar triangular topology of the low-order PC mixing space itself. Without the high amplitude evaporite, reef and cryospheric continua dominating the 3rd dimension of the space, the plane of substrates itself represents <3% of total variance. Using a single substrate endmember effectively neglects this off-plane variance. However, the near orthogonality (|ρ| < 0.3) of the S, V and D endmembers stabilizes the model inversion without driving the SVD fractions out of the [0, 1] plausibility range. This stability comes at the cost of slightly higher RMS misfit (compared to SVDN), but the near planar topology of the mixing space still allows for a remarkably low misfit overall, particularly given the presence of high amplitude unmodeled sands and clouds in the EMIT mosaic.
The primary limitation of the SVD model remains its requisite projection of a 3D mixing space onto a 2D model plane. The new, more representative soil endmember partially resolves the misfit that results from the orthogonal plane of substrates; however, sandy and NPV-rich soils still lie outside the model. In addition, nonlinear mixing resulting from multiple scattering produces convexities in parts of the mixing space that cannot be accommodated by any linear mixture model. Nonetheless, it is remarkable that nonlinearities associated with soil moisture effects and volume scattering within vegetation canopies are so well represented as varying mixtures of the substrate and vegetation endmembers mixing with the dark endmember. While moisture absorption and partial canopy transmission are certainly not linear effects, modulation by the dark endmember seems to be an effective way to represent them at meter to kilometer scales of spectral mixing.
4.3. The SVDN Model—Why It Does Not Work
Extending the SVD model with an NPV endmember reduces its already small RMS misfit significantly in many landscapes. However, this reduction comes at the cost of a much larger percentage of spectra being represented with implausible fraction estimates outside the [0, 1] range. The primary reason for this destabilization of the model inversion is the combination of the near collinearity (ρ~0.9) of the NPV and substrate endmembers as well as the additional degree of freedom of the SVDN model relative to the SVD. This additional degree of freedom, combined with the near collinearity of the substrate and NPV endmembers, allows the inversion to exploit destructive interference in the form of negative fractions to minimize model misfit. The moderate collinearity (ρ~0.5) of the NPV and vegetation endmembers also likely contributes to the implausible fraction distributions of both SVDN and NVD models. The identification of the NPV mixing continuum certainly better characterizes the spectroscopic mixing space from a physical perspective, and the NPV endmember spectrum may be useful for applications where NPV is a prominent component of the mixing space, but the overall costs clearly outweigh the benefits of the SVDN model as a general, parsimonious representation for landscape reflectance.
4.4. Why Use Standardized Spectral Mixture Models?
By combining the benefits of application-specific spectral mixture models with standardized spectral indices, the SSMM offers consistency, simplicity, inclusivity and diversity. While the benefits of diversity are often overstated, or even taken as axiomatic, the ability of a single model for mixed spectral reflectance to represent a wide range of landscape components is nonetheless potentially valuable for many applications.
The existence of an SSMM for spectroscopic mixture modeling in no way precludes the use of application-specific mixture models with local or otherwise optimized spectral endmembers. Given the stability and negligible computational cost of the SVD model inversion, the SSMM can complement application-specific mixture models by allowing their resulting fraction distributions to be compared across space and time by projecting them onto the SVD basis. Mathematically, the standardized SVD model provides a parsimonious representation of both the global spectral mixing space for a wide range of ice-free landscapes, as well as for spectral libraries. One potential application for the spectroscopic SSMM could therefore be the ability to easily project a given spectral library onto a simple ternary space spanning the three most physically and spectrally distinct components of most terrestrial landscapes. As such, the spectroscopic SSMM could allow for direct comparisons of different spectral libraries in the form of ternary diagrams. Further, an SSMM rooted in high SNR spectroscopic data facilitates straightforward standardization and cross-calibration of models across several multispectral sensors, since sensor-specific EMs can be trivially computed via convolution for any arbitrary multispectral sensor for which a spectral response function is available.
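The convolution of spectroscopic EMs to a multispectral sensor mentioned above can be sketched as an SRF-weighted average per band. The sketch assumes that the endmember spectra and the spectral response functions are sampled on the same uniform wavelength grid; the function name, the boxcar and Gaussian response functions and the two test spectra are hypothetical and do not correspond to any real sensor.

```python
import numpy as np

def convolve_endmembers(endmembers, srfs):
    """Band-equivalent endmember reflectance for a multispectral sensor.

    endmembers: (n_em, n_wavelengths) spectroscopic endmember spectra
    srfs:       (n_bands, n_wavelengths) spectral response functions on the same grid
    Returns (n_em, n_bands): SRF-weighted mean reflectance per band.
    """
    weights = srfs / srfs.sum(axis=1, keepdims=True)  # normalize each band's response
    return endmembers @ weights.T

# Uniform 10 nm grid (an assumption of this sketch).
wavelengths = np.arange(400.0, 2501.0, 10.0)

# Two test spectra: spectrally flat, and a linear ramp in wavelength.
flat = np.full(wavelengths.shape, 0.3)
ramp = wavelengths / 1000.0
endmembers = np.stack([flat, ramp])

# Two hypothetical bands: a 600-700 nm boxcar and a Gaussian centered at 1600 nm.
boxcar = ((wavelengths >= 600.0) & (wavelengths <= 700.0)).astype(float)
gaussian = np.exp(-0.5 * ((wavelengths - 1600.0) / 40.0) ** 2)
srfs = np.stack([boxcar, gaussian])

broadband = convolve_endmembers(endmembers, srfs)
```

A flat spectrum is unchanged by the convolution, and the ramp takes the value at each band's effective center, which gives a quick sanity check before applying the same function to real endmembers and published response functions.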
The information captured by a standardized spectroscopic mixture model can also be understood in the context of the wavelength-explicit misfit of the model, the spectral mixture residual [3,4]. The estimation of EM abundances using spectral unmixing involves minimization of a cost function, which is often but not always the root-mean-square misfit between the actual observed reflectance spectrum and the spectrum generated by area-weighted linear combination of EM reflectances. Important spectroscopic information can exist within this model misfit, for instance, absorption features which are not represented in the EM spectra. Viewed in this way, the mixture residual of a standardized spectroscopic mixture model has conceptual parallels to Continuum Removal (e.g., [26]). Evaluation and refinement of the standardized mixture model are thus important for understanding and improving the utility and generality of the spectral mixture residual in spaceborne imaging spectroscopy data.
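For reference, the least-squares case of the cost function described above can be written out explicitly. The notation is introduced here for illustration only: x is the observed reflectance spectrum with B bands, E is the matrix whose n columns are the EM spectra, f is the vector of EM fractions, and r is the spectral mixture residual. The unit-sum condition is shown as a constraint on the minimization, which is one common formulation and not necessarily the one used in this study.

```latex
\begin{aligned}
\mathbf{x} &= \mathbf{E}\,\mathbf{f} + \boldsymbol{\varepsilon},
  \qquad \sum_{i=1}^{n} f_i = 1, \\
\hat{\mathbf{f}} &= \arg\min_{\mathbf{f}} \;
  \lVert \mathbf{x} - \mathbf{E}\,\mathbf{f} \rVert_2^2
  \quad \text{subject to} \quad \sum_{i=1}^{n} f_i = 1, \\
\mathbf{r} &= \mathbf{x} - \mathbf{E}\,\hat{\mathbf{f}},
  \qquad
  \mathrm{RMS} = \sqrt{\frac{1}{B} \sum_{b=1}^{B} r_b^{2}} .
\end{aligned}
```

Minimizing the squared misfit is equivalent to minimizing the RMS, since the two differ only by a monotonic transformation; the residual r retains the wavelength-explicit structure that the scalar RMS summarizes.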
4.5. Limitations and Future Work
The scaling analysis included in this study provides an important form of vicarious validation, indicating that the decameter resolution linear mixture model provides a reasonable approximation of spectrally distinct meter scale land cover components. However, it will be important to supplement this vicarious validation with in situ field validation taking into account factors like varying illumination geometry, soil moisture and roughness variations, canopy closure and leaf area index (e.g., [42,43]). The role of NPV spanning the substrate and vegetation mixing continua could also be the focus of a dedicated field validation campaign. Such field validations might be best constrained by collecting multitemporal observations of a seasonally variable validation site, perhaps with in situ monitors to provide more detailed context for spatiotemporal changes in atmospheric opacity, illumination geometry, vegetation phenology and soil moisture content.
The geographic coverage provided by the EMIT mission limits the availability of cloud-free imagery for the tropics. This is particularly true in Africa. It also precludes inclusion of high-latitude boreal environments. Based on earlier studies using more globally representative collections of broadband multispectral data, we do not expect the spectral mixing space to change significantly with extension to higher and lower latitudes, but we do acknowledge the potential for a more globally inclusive data compilation. Future studies will extend this analysis to a wider range of environments when global coverage from the NASA SBG mission becomes available after its anticipated 2028 launch.