
Reproduction is permitted for noncommercial purposes.

Weak gaseous plume detection in hyperspectral imagery requires that background clutter consisting of a mixture of components such as water, grass, and asphalt be well characterized. The appropriate characterization depends on analysis goals. Although we almost never see clutter as a single-component multivariate Gaussian (SCMG), alternatives such as various mixture distributions that have been proposed might not be necessary for modeling clutter in the context of plume detection when the chemical targets that could be present are known at least approximately. Our goal is to show to what extent the generalized least squares (GLS) approach applied to real data to look for evidence of known chemical targets leads to chemical concentration estimates and to chemical probability estimates (arising from repeated application of the GLS approach) that are similar to corresponding estimates arising from simulated SCMG data. In some cases, approximations to decision thresholds or confidence estimates based on assuming the clutter has a SCMG distribution will not be sufficiently accurate. Therefore, we also describe a strategy that uses a scene-specific reference distribution to estimate decision thresholds for plume detection and associated confidence measures.

Remote detection and identification of weak chemical plumes using passive hyperspectral sensors is a challenging problem. Here we consider an airborne visible and near infrared (NIR) sensor (AVIRIS [^{-8}

There are many ways to characterize background clutter; the appropriate characterization depends on analysis goals. We focus on ways that are most relevant for a particular method (generalized least squares, GLS) of plume detection. Plume-like pixels are those thought to have a gas plume influencing the signal; background pixels are those thought not to have a plume influencing the signal. One common set of simplifying assumptions leads to a GLS problem [

When we look at real scenes, we tend to see them as a mixture of components, or “clutter,” rather than as, for example, a single-component multivariate Gaussian (SCMG). Whether formal mixture models are effective for analysis depends on the goals, and although modeling clutter as SCMG may seem simplistic, it has proven to be surprisingly effective [

It is not our purpose here to recommend a particular detection method, but because GLS has been shown to be effective, we will compare GLS-based results from real scenes to those from simulated SCMG scenes, in order to assess the adequacy of the SCMG approximation in this context. Therefore, our main goal is to show to what extent the popular GLS approach applied to real data to look for evidence of known chemical targets leads to chemical concentration estimates and to chemical probability estimates (arising from repeated application of the GLS approach) that are similar to corresponding estimates arising from simulated SCMG data.

In order to perform gas plume detection using GLS it is necessary to characterize the distribution of GLS values in real scenes that do not contain plumes so that thresholds corresponding to false positive rates can be estimated. Recent results suggest that scalar-valued GLS estimates, which arise from a linear transformation applied to scenes consisting of a mixture of background emissivities (but no or negligible plumes) are surprisingly close to Gaussian in distribution [

In some cases, approximations to decision thresholds or confidence estimates based on assuming the clutter has a SCMG distribution will not be sufficiently accurate. Therefore, we also describe a strategy that uses a scene-specific reference distribution to estimate decision thresholds for plume detection and associated confidence measures. To simplify the discussion, we will use the informal expressions “more Gaussian” and “less Gaussian” to describe the extent to which a particular estimate based on a cluttered scene behaves like an estimate from a simulated SCMG scene.

Section 2 gives additional background. Section 3 describes the radiance data model and simplifications leading to the GLS approach and/or to Bayesian model averaging (BMA) approaches that rely on GLS. Section 4 describes tests to compare GLS-based results from real data to those from simulated SCMG data. Because in many cases the GLS-based result applied to SCMG data will be Gaussian, the tests are checks for Gaussian behavior, such as skewness, kurtosis, and quantiles. Example results are also presented in Section 4. Section 5 explores why the GLS transform improves the Gaussian approximation. Section 6 describes the notion of a scene-specific reference distribution for cases where approximation based on the SCMG assumption is not adequate. Section 7 is a summary.

Our goal is to characterize clutter in the context of gas plume detection in cluttered scenes that contain a mixture of background emissivities, such as water, vegetation, asphalt, concrete, buildings, etc., for which various models and approaches have been proposed. A typical scene consists of approximately 128 to 256 spectral channels in each of approximately 500

We will use data from this same scene throughout and refer to it as scene A. We assume that scene A contains no plumes and will evaluate to what extent the collection of 224-dimensional radiance vectors at each pixel behaves in the context of GLS-based plume detection as a collection of 224-dimensional Gaussians having a single 224-dimensional mean and 224-by-224 dimensional covariance matrix (described in Section 3). In practice, it is not known whether a real scene contains any plumes, so there is typically an iterative procedure that first assumes there are no plumes, then looks for plumes using the covariance estimate from the “off-plume” pixels, then removes “plume-like” pixels and re-estimates the “off-plume” covariance matrix and mean vector. Because we focus entirely on characterizing the background in cases having weak and rare plumes, we will not consider such an iterative procedure. However, plume-detection performance is expected to degrade when on-plume pixels are used to estimate the background covariance matrix. Performance also degrades if the chemical target directions are similar to the clutter directions (as defined, for example, by the eigenvector directions in the spectral decomposition of the background covariance matrix [

In the scenes we consider, small (up to a few or a few tens of pixels) and weak (both in terms of temperature difference between the ground and plume and in terms of chemical strength, see Section 3) plumes from a library of possible chemicals might be present. These chemicals have effects on the measured at-sensor radiance that we refer to as “chemical signatures,” which are based on “known” spectra (the spectral library values are not known perfectly) that must be transformed (introducing an error source). Therefore, the “chemical signatures” are also not known perfectly. However, our focus is on characterizing the background clutter in the context of this situation, so we will ignore errors in the chemical signatures and deal strictly with real and simulated scenes that have (or are assumed to have) no plume.

We focus on the GLS values or BMA probability estimates and associated thresholds for deciding whether a plume is present in a pixel in scenes having no plumes. This is an effective way to characterize background clutter in the context of plume detection because decision thresholds impact performance as defined by false positive and negative rates. Therefore, we compare thresholds estimated from one real cluttered scene and from several simulated cluttered scenes to thresholds from corresponding uncluttered, SCMG scenes. For completeness, we also compute other aspects of the distribution of GLS and BMA values, such as kurtosis and skewness, so that the Gaussian approximation can be more fully assessed.

One caveat is that approaches other than GLS are necessary if we cannot assume we have an exhaustive list of all possible chemical targets in the scene (“chemical target” is described in Section 3); in such cases, there might be more relevant ways to characterize the background clutter. In addition, other approaches become available if the ground pixels are viewed twice or if other regions of the spectrum are used. For example, mid-wave IR data might allow the opportunity to observe plume effects as a shadow effect on ground pixels and also, as we do here for NIR data, as a signal effect when the plume lies in the line of sight between the sensor and the corresponding ground pixels.

There are several implications of concluding that an SCMG model is effective in our context. First, it is a simple model-based summary of complicated clutter. Second, decision thresholds could be based on SCMG data, and therefore could be computed analytically (analogous, for example, to claiming that a decision threshold of ±2 standard deviations corresponds to a 5% false alarm probability by appealing to a Gaussian approximation). Third, it would suggest that it might be difficult to find robust methods that improve the GLS plume-detection performance (defined as the false negative rate for a given small fixed false positive rate).

The photons detected by a visible/NIR hyperspectral detector associated with background pixel _{j} (all terms depend on frequency _{j}) can be modeled as
_{i}_{i}

The signal from plume pixel _{p}_{p}_{i}

Using the approximation
^{−}^{x}_{k}_{1}_{2}…_{nc}_{c} matrix of chemical spectra with _{k}_{1}_{2}…_{nc}_{c}_{i} as

If any of the estimated components in the β parameter is large, this is evidence of a plume at pixel ^{2} sr). The temperature difference term arises from evaluating Planck's function at ground and plume temperature. Note that for _{i}_{i}

To summarize, we can write _{i}

A common alternative to

In _{i}_{i}_{i}

The approach considered here uses all of the image's pixels to characterize the background by computing Σ̂ (recall that there is no iteration to remove plume-like pixels). For each frequency _{j}

The GLS solution to

In ^{−1} prior to the multiplication by ^{T}: (1) in contrast to the often-used principal components, the transformation to Σ̂ ^{−1}^{−1}^{−1}; (2) in the case that Σ̂ ^{−1} has nonzero off-diagonal entries, note that Σ̂ ^{−1}^{−1} is diagonal (all off-diagonal entries are zero), note that Σ̂ ^{−1} _{i} by _{i}

BMA involves repetitive application of the GLS for an exhaustive list of possible chemical subsets, as described below.

In order to decide which chemicals are present in a candidate plume, we apply the GLS to all subsets of 1, 2, or 3 chemicals [

For a given data set D, the posterior probability of model M_{1} satisfies P(M_{1} ∣ D) ∝ P(D ∣ M_{1}) P(M_{1}), where P(M_{1}) is the prior probability for model (subset) M_{1}. This requires evaluating the expression P(D ∣ M_{1}) = ∫ P(D ∣ β_{1}, M_{1}) P(β_{1} ∣ M_{1}) dβ_{1}, where β_{1} is the coefficient vector for the chemicals in model M_{1}. Such integrals are notoriously difficult in most real problems, requiring either numerical integration, analytical approximation, or Markov Chain Monte Carlo methods [

Following [_{j}_{j}^{−}^{BICj}^{/2} where BIC_{j}_{j}_{j}_{j}_{j}_{j}_{j}_{j}

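The probability that chemical C is present is the sum of the posterior model probabilities P(M_{j} ∣ D) over all models (subsets) M_{j} that include chemical C.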
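As an illustration only, the following Python sketch applies a BIC approximation to every subset of 1, 2, or 3 chemicals for a single pixel and sums the resulting model weights for each chemical. The names `bma_chemical_probabilities`, `r`, `A`, and `Sigma` are hypothetical, equal prior probabilities are assumed for all subsets, no empty (no-chemical) model is included, and the BIC form used (n log(RSS/n) + p log n on whitened data) is a common textbook choice that may differ from the one used for the results reported here.

```python
import itertools
import numpy as np

def bma_chemical_probabilities(r, A, Sigma, max_size=3):
    """Approximate chemical probabilities for one pixel (illustrative sketch).

    r     : (n,) mean-centered radiance for the pixel
    A     : (n, n_chem) matrix of chemical signatures
    Sigma : (n, n) background covariance estimate (assumed positive definite)
    """
    n, n_chem = A.shape
    # Whiten with the Cholesky factor so that GLS reduces to ordinary least squares.
    L = np.linalg.cholesky(Sigma)
    rw = np.linalg.solve(L, r)
    Aw = np.linalg.solve(L, A)

    models, bics = [], []
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(range(n_chem), size):
            X = Aw[:, list(subset)]
            beta, *_ = np.linalg.lstsq(X, rw, rcond=None)
            rss = float(np.sum((rw - X @ beta) ** 2))
            # Assumed BIC form for a Gaussian linear model.
            bics.append(n * np.log(rss / n) + size * np.log(n))
            models.append(subset)

    bics = np.array(bics)
    # Model weights proportional to exp(-BIC/2); subtract the minimum for stability.
    weights = np.exp(-(bics - bics.min()) / 2.0)
    weights /= weights.sum()

    # Probability of each chemical: sum of weights of models that contain it.
    probs = np.zeros(n_chem)
    for subset, w in zip(models, weights):
        probs[list(subset)] += w
    return probs
```

The number of subsets grows quickly with library size (roughly 4.3 million subsets of size 3 for 296 chemicals), so a practical implementation would likely need to prune candidates first.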

Kurtosis and skewness are two of many measures used to gauge how close a distribution is to Gaussian [

Here we will include the following comparisons of real data to corresponding Gaussian data: (a) kurtosis; (b) skewness; (c) the quantiles of each element of

To simplify notation, we will continue to use

The next subsection describes mixture distributions and gives example quantiles from mixtures of scalar-valued Gaussian distributions. The following two subsections define skewness and kurtosis, and give examples with scalar mixtures. Following subsections describe simulated scenes and give results of our comparisons to SCMG data for a few cases.

Kurtosis is defined as the ratio of the fourth moment to the square of the second moment,
^{T}

We will also evaluate skewness, a common measure of symmetry, defined as

Throughout, we will use ^{T}^{−1} ^{−1} ^{T}^{−1} ^{T} A^{−1} ^{T} r

Because real data ^{T}^{−1} ^{−1} ^{T}^{−1}

A random variable _{comp} scalar-valued Gaussian random variables can be expressed as
_{i}_{i}_{i}_{i}_{i}_{i}

It is sufficient here to consider the case where the means _{i}_{i}

Perhaps surprisingly, for many mixtures, these probabilities are smaller than those of the corresponding reference distribution, which is a single-component univariate Gaussian having the same standard deviation (σ_{mix}) as the mixture. Therefore, the commonly-observed tendency for GLS values in hyperspectral IR image analysis to have fatter tails (higher probabilities of mean-centered values exceeding kσ) is not necessarily expected. However, for many other mixtures, particularly those having very unequal _{i}_{1} = 0.083, _{2} = 0.083, and _{3} = 0.083 and component means _{1} = −3, _{2} = 0 and _{3} = 5, and with _{mix}|>_{mix})=0.079, 0.022, and 0.0006 for _{1} = 0.25, _{2} = 0.5, and _{3} = 0.25 and _{1} = −3, _{2} = 0, and _{3} = 3 with _{mix} |> _{mix}) = 0.023, 0.000014, and 4.4 × 10^{-11} for

Although kurtosis measures tail behavior fairly well, it is possible for a distribution to have less-than-Gaussian kurtosis, yet have higher probability of exceeding some of these thresholds, and vice versa. For example, one 10-component Gaussian mixture having very different component fractions has kurtosis -0.333 but _{mix} |> _{mix}) = 0.0002. This is significantly larger than the corresponding probability for a single-component Gaussian, which is 0.00006.
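The tail probabilities of a scalar Gaussian mixture can be evaluated in closed form, which makes the comparison with a single Gaussian easy to check. The sketch below uses only the Python standard library; the function names are hypothetical, and unit component standard deviations are an assumption (chosen because, for the symmetric three-component example with fractions 0.25, 0.5, 0.25 and means −3, 0, 3, it reproduces the quoted values when k = 2, 3, and 4).

```python
import math

def normal_sf(z):
    """Upper-tail probability P(Z > z) for a standard Gaussian."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def mixture_summary(fractions, means, sd=1.0, ks=(2, 3, 4)):
    """Return (sigma_mix, excess kurtosis, two-sided tail probabilities).

    Each component is Gaussian with the given mean and a common standard
    deviation `sd` (an assumption of this sketch).
    """
    mu = sum(p * m for p, m in zip(fractions, means))
    var = sum(p * (sd ** 2 + (m - mu) ** 2) for p, m in zip(fractions, means))
    sigma = math.sqrt(var)
    # Fourth central moment of a Gaussian component about the mixture mean.
    m4 = sum(
        p * ((m - mu) ** 4 + 6 * (m - mu) ** 2 * sd ** 2 + 3 * sd ** 4)
        for p, m in zip(fractions, means)
    )
    excess_kurtosis = m4 / var ** 2 - 3.0
    tails = []
    for k in ks:
        hi, lo = mu + k * sigma, mu - k * sigma
        tails.append(sum(
            p * (normal_sf((hi - m) / sd) + normal_sf((m - lo) / sd))
            for p, m in zip(fractions, means)
        ))
    return sigma, excess_kurtosis, tails

sigma_mix, kurt, tails = mixture_summary([0.25, 0.5, 0.25], [-3.0, 0.0, 3.0])
gaussian_tails = [2 * normal_sf(k) for k in (2, 3, 4)]
```

Under these assumptions the mixture tail probabilities round to 0.023, 0.000014, and 4.4 × 10^{-11}, compared with roughly 0.046, 0.0027, and 0.00006 for a single Gaussian, and the excess kurtosis is about −0.67, so this mixture has lighter tails than its Gaussian reference.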

The simulated mixture scenes will be motivated by real data examples such as shown in

_{i} from a lognormal distribution, which results in quite unequal fractions, such as those in Scene A (this is important for our data). For example, there is a large fraction of mountainous vegetation in Scene A, a moderate fraction of water, and a small amount of asphalt.

We will describe CLS and GLS results for several simulated scenes and for subsets of pixels from scene A. The simulated scenes include a 7-component Gaussian mixture example with randomly generated mean vectors; another 7-component Gaussian mixture example, but with the component mean vectors mimicking those in

In all cases, we can arrange for the simulated reference SCMG to have either exactly or approximately the same sample Σ̂ as the scene of interest. Because the ranked eigenvalues of Σ̂ from real scenes typically decay to nearly zero after the largest 2 to 3 eigenvalues, we anticipated that simulated reference scenes should have exactly the same Σ̂ (and eigenvalues) as the corresponding scene of interest (real or simulated). Therefore, results reported here have forced the simulated reference Gaussian to have exactly the same (to within machine accuracy) Σ̂ as the corresponding scene of interest. Also, in computing Σ̂ and/or Σ̂^{−1} it is sometimes advisable to discard very small eigenvalues or use some other type of regularization. We do not report results here arising from any type of regularization. In our context of characterizing clutter by comparing results to a corresponding simulated reference SCMG, it is important to do the same calculations on both data sources, but less important to experiment with options to improve some type of performance.
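One way to force a simulated reference Gaussian to have exactly the same sample mean and Σ̂ as a scene is to whiten a Gaussian draw with its own sample covariance and then re-color it with the scene's Σ̂. The following is a minimal sketch of that idea, not necessarily the procedure used for the reported results; `simulate_matched_scmg` is a hypothetical name, and the sketch assumes more pixels than channels.

```python
import numpy as np

def simulate_matched_scmg(X, rng):
    """Simulate Gaussian data whose sample mean and covariance match those of X.

    X   : (n_pixels, n_channels) scene of interest, with n_pixels > n_channels
    rng : numpy.random.Generator
    """
    n, p = X.shape
    mean = X.mean(axis=0)
    target = np.cov(X, rowvar=False)

    # Draw Gaussian data, center it, and whiten it with its own sample covariance
    # so that the whitened sample covariance is the identity (to machine accuracy).
    Z = rng.standard_normal((n, p))
    Z -= Z.mean(axis=0)
    Lz = np.linalg.cholesky(np.cov(Z, rowvar=False))
    W = np.linalg.solve(Lz, Z.T).T

    # Color with the symmetric square root of the target covariance. The
    # eigen-decomposition tolerates eigenvalues that are very close to zero.
    vals, vecs = np.linalg.eigh(target)
    root = (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    return W @ root + mean
```

Because the whitening matrix depends on the draw, the rows are only approximately independent Gaussian vectors, but the effect is small when the number of pixels greatly exceeds the number of channels.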

A secondary consideration is whether the 224 channel variances should be equal. Recall from

Gaussian mixture 1 was generated by randomly selecting 7 components of 224 means to represent 7 mixture components, each having a randomly-generated mean for each of the 224 spectral channels. The component fractions (_{i}_{i}

Recall from

Gaussian mixture 3 was generated in the same manner as mixture 1, except the 7 component means were randomly chosen from a library of 40 common background albedos.
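A simulated mixture scene of this kind can be sketched as follows. The function name, the lognormal parameters, the use of pure (single-component) pixels, and the noise model (Gaussian noise whose standard deviation is a fixed fraction of the component mean in each channel) are all assumptions made for illustration.

```python
import numpy as np

def simulate_mixture_scene(component_means, n_pixels, rng,
                           cov_noise=0.10, lognormal_sigma=1.0):
    """Simulate a cluttered scene as a Gaussian mixture (illustrative sketch).

    component_means : (n_components, n_channels) mean spectrum of each component
    n_pixels        : number of pixels to simulate
    rng             : numpy.random.Generator
    """
    k, n_chan = component_means.shape
    # Lognormal draws, normalized to sum to 1, give quite unequal fractions.
    raw = rng.lognormal(mean=0.0, sigma=lognormal_sigma, size=k)
    fractions = raw / raw.sum()
    # Assign each pixel to exactly one component.
    labels = rng.choice(k, size=n_pixels, p=fractions)
    clean = component_means[labels]
    # Noise with a constant coefficient of variation in each channel.
    noise = rng.standard_normal((n_pixels, n_chan)) * cov_noise * np.abs(clean)
    return clean + noise, labels, fractions
```

For example, `simulate_mixture_scene(means, 16384, np.random.default_rng(0))` with a 7-by-224 array `means` returns a 16,384-by-224 scene together with the component label of each pixel and the component fractions.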

The multivariate-

Randomly selected subsets of 16,384 pixels from real scene A are also used for our real data examples.

The case that

We will compute the scalar GLS values for one chemical at a time (and report average results over all 296 chemicals) for simulated scenes, the real scene, and the corresponding simulated SCMG scenes.
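For a single chemical direction, the scalar CLS and GLS values for every pixel can be computed with a few matrix operations. The sketch below is a minimal illustration under stated assumptions: `R` (pixels by channels) and `a` (one chemical signature) are hypothetical names, the scene mean is removed first, Σ̂ is estimated from all pixels and assumed invertible, and both outputs are scaled to unit variance.

```python
import numpy as np

def scalar_cls_gls(R, a):
    """Return unit-variance scalar CLS and GLS values for each pixel.

    R : (n_pixels, n_channels) radiance, with n_pixels > n_channels
    a : (n_channels,) chemical signature (or a random direction)
    """
    Rc = R - R.mean(axis=0)
    Sigma_hat = np.cov(Rc, rowvar=False)

    # CLS: least squares projection onto a, ignoring the background covariance.
    cls = Rc @ a / (a @ a)

    # GLS: weight by the inverse covariance; solve() avoids an explicit inverse.
    w = np.linalg.solve(Sigma_hat, a)
    gls = Rc @ w / (a @ w)

    def standardize(v):
        return (v - v.mean()) / v.std()

    return standardize(cls), standardize(gls)
```

Averaging a summary statistic over a chemical library then amounts to calling this function once per signature, for the scene and for its simulated SCMG reference.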

For each of these cases,

As an important aside, evaluation of detection probabilities for injected signals is beyond our scope here because our focus is on the Gaussian approximation; however in all cases in

From

The Gaussian Mixture 2 results are qualitatively more typical of those from real scenes: there is a significantly larger fraction of large GLS values (exceeding 3, 4, or 5) than in the corresponding reference scene. However, this particular mixture distribution appears to be “worse” than our real data in terms of the corresponding GLS's distance from normality (see the low correlations in the normal probability plot for example). Interestingly, the CLS values often have large kurtosis, but more because of the fraction of CLS values exceeding modest values such as 2 than because of CLS values exceeding 3, 4, or 5. Recall that kurtosis values should be near 0 for the Gaussian distribution. There is also a tendency for the GLS values to have larger-than-Gaussian kurtosis, mostly due to the fraction of values exceeding the larger thresholds such as 3, 4, and 5. Also, note that the GLS values have larger kurtosis than do the CLS values for Gaussian Mixture 2; in this respect, the CLS values are more Gaussian than the GLS values in this case. Although not shown here, there was a tendency for the kurtosis of the GLS values to be more Gaussian for random directions than for actual chemical directions randomly chosen from the library of 296 chemical spectra.

The multivariate-

In the three randomly-chosen subsets of pixels from real scene A, note that again the GLS values are closer to Gaussian than are the CLS values. Also note that the fractions of GLS values exceeding 3, 4, or 5 are distinctly larger than the corresponding SCMG values. Therefore, while the GLS transform is generally more Gaussian than is the CLS transform, it is not necessarily adequate to rely on Gaussian approximations (see Section 6).
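The diagnostics compared in this subsection (fractions exceeding fixed thresholds, skewness, kurtosis, and the correlation in the normal probability plot) can be computed along the following lines. This is a sketch with a hypothetical function name. It uses excess kurtosis (0 for a Gaussian) and one-sided exceedance fractions; the one-sided reading is an assumption, made because it matches reference entries of about 2% at a threshold of 2.

```python
import numpy as np
from statistics import NormalDist

def gaussian_diagnostics(values, thresholds=(2, 3, 4, 5)):
    """Summaries of how close a sample of scalar values is to Gaussian."""
    z = (values - values.mean()) / values.std()
    n = z.size

    # One-sided exceedance fractions P(Z > k) for standardized values.
    exceed = {k: float(np.mean(z > k)) for k in thresholds}

    skewness = float(np.mean(z ** 3))
    excess_kurtosis = float(np.mean(z ** 4) - 3.0)

    # Normal probability plot: sorted data against Gaussian quantiles.
    probs = (np.arange(1, n + 1) - 0.5) / n
    quantiles = np.array([NormalDist().inv_cdf(p) for p in probs])
    plot_correlation = float(np.corrcoef(np.sort(z), quantiles)[0, 1])

    return exceed, skewness, excess_kurtosis, plot_correlation
```

Applying the same function to the scene values and to the values from the simulated reference SCMG keeps the two sets of results directly comparable.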

Finally, note that in all cases, the normal probability plots of the GLS values are more Gaussian than those of the CLS values, as indicated by their significantly higher correlations (last column in

In this subsection, we still consider a scalar value for each pixel, but the scalar is the value of the maximum GLS value from the selected library of 296 chemicals.

Again, Gaussian Mixture 1 is almost indistinguishable from its corresponding simulated reference SCMG. There is a hint of the typical pattern in the extreme tails for P(Z>5): the mixture has slightly heavier than Gaussian tails. This pattern is more pronounced for the other four cases (three simulated and one real scene A with several subsets examined separately). And, that pattern is the key observation from

In this section we consider the vector-valued GLS estimate. We consider the case where ^{T}^{−1} ^{−1} ^{T}^{−1}

It appears from

Although exact calculations can be made for the probability that a given chemical from the library is present, these calculations rely strongly on all the assumptions made, including all the relevant probability distributions. Therefore, it is common to use approximate calculations that are typically in practice as effective as the “exact” calculations, as described in the BMA subsection. By “effective,” it is meant that estimated chemical probabilities are well calibrated. For example, if one records the estimated probability of chemical C in repeated experiments, then among all those instances in which P(C) lies in, say, the interval (0.65,0.74), the fraction of instances in which chemical C is present is approximately 0.70.
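The notion of calibration described above can be checked numerically by binning the estimated probabilities and comparing each bin's average estimate with the observed fraction of cases in which the chemical was present. The sketch below is illustrative: the function name, the ten equal-width bins, and the unweighted averaging across bins are assumptions, not necessarily the choices behind the reported agreement measures.

```python
import numpy as np

def calibration_table(p_est, present, n_bins=10):
    """Compare estimated probabilities with observed frequencies.

    p_est   : (n,) estimated probabilities that the chemical is present
    present : (n,) boolean array, True where the chemical was actually present
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p_est, edges) - 1, 0, n_bins - 1)

    rows = []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            rows.append((p_est[mask].mean(), present[mask].mean(), int(mask.sum())))
    est, obs, counts = map(np.array, zip(*rows))

    # Agreement measures: average absolute difference and R^2 of a linear fit.
    avg_abs_diff = float(np.mean(np.abs(est - obs)))
    r_squared = float(np.corrcoef(est, obs)[0, 1] ** 2)
    return est, obs, counts, avg_abs_diff, r_squared
```

For well calibrated probabilities, the observed fraction in each bin is close to the bin's average estimate, so the average absolute difference is near 0 and R^{2} is near 1.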

It has been shown [

Because all entries are repeatable to within approximately 0.01, R^{2} (which measures the percent of variance in the response that is explained by a linear fit to the predictor) as in

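To summarize, the averages over the four chemicals of the R^{2} measure are (0.42, 0.79), (0.41, 0.78), and (0.90, 0.94) for the real data, the reference Gaussian data having the same Σ as scene A, and the reference Gaussian data having diagonal Σ, respectively. We empirically verified that these average results are repeatable to within approximately ±0.01 or less. Thus, when averaged over the four chemicals, there is a slight tendency toward lower agreement in the real data both for the average absolute difference and for the R^{2} measure.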

This is consistent with earlier findings using a different sensor (SEBASS, which has 128 spectral channels) [

We have demonstrated that GLS values are generally but not always closer to Gaussian than are CLS values using real and simulated data. This has also been demonstrated for a long-wave IR sensor having 128 channels [

Both the CLS and GLS transforms involve linear combinations of 224 variables, each of which has a mixture distribution in real scenes. The central limit theorem (CLT) suggests that, provided no terms dominate, a linear combination will be more Gaussian than any of the individual variables. There are many versions of the CLT [

The term Σ̂^{−1} that appears in the GLS transform but not in the CLS transform typically has approximately 50% of its 224 entries greater than zero in a given row, with some terms being considerably larger than the others. These large terms have the potential to dominate the sum, mitigating the central limit effect. Therefore, we do not anticipate providing a theoretical result but will rely on the following additional evidence.
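The effect of dominant terms can be quantified in an idealized setting. If the summed variables are independent with equal variances and a common excess kurtosis κ, then the excess kurtosis of the weighted sum is κ Σw_{i}^{4} / (Σw_{i}^{2})^{2}. Independence is an assumption of this sketch that the 224 channels of a real scene do not satisfy, so the numbers are only indicative; the function name is hypothetical.

```python
def excess_kurtosis_of_sum(weights, kappa):
    """Excess kurtosis of sum(w_i * X_i) for independent X_i that share the
    same variance and the same excess kurtosis `kappa` (idealized setting)."""
    s2 = sum(w * w for w in weights)
    s4 = sum(w ** 4 for w in weights)
    return kappa * s4 / s2 ** 2

# Equal weights on 224 terms: the kurtosis shrinks by a factor of 1/224.
equal = excess_kurtosis_of_sum([1.0] * 224, kappa=-1.5)

# One weight ten times larger than the rest: the shrinkage is far weaker.
dominated = excess_kurtosis_of_sum([10.0] + [1.0] * 223, kappa=-1.5)
```

With equal weights the factor is 1/224 (about 0.0045), whereas a single weight ten times larger than the others gives a factor of about 0.098, roughly 22 times less reduction.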

^{−1} The top center (right) plot is a histogram of the first (second) transformed coordinate values. Notice that the low variance directions in the top left plot have become the high variance directions in new coordinates (the middle plot has a much wider range than the right plot). Also notice that the two components still appear, in the right plot, but that their separation is less than in the left plot. The three bottom plots are the same, except there is no multiplication by Σ̂^{−1}. In the bottom left plot, the 2-component Gaussian is identical to that in the corresponding top plot, but because the scale is smaller, both components are long and thin bivariate Gaussians. The bottom center plot shows the first coordinate (without a transform) and the bottom right plot shows the second coordinate (without a transform). The bottom right plot is more spread out than the corresponding top right plot, by a factor of approximately 2. Multiplication by Σ̂^{−1} tends to reduce the spread among component distributions; this makes the central limit effect have less work to do to approach Gaussian behavior.

^{−1} are shown in the top right plot. For each of the 224 channels, multiplication by Σ̂^{−1} has resulted in bringing the mixture components much closer, to the extent that the ordering of the 4 component values varies much more across channels. This behavior is present even if we use only 2 channels, and has occurred for any number of channels that we have experimented with to date (2, 3, 4, 16, 32, 64, 128, and 224).

The bottom plots are corresponding normal probability plots that should be very nearly linear if the CLS (bottom left) or the GLS (bottom right) values are Gaussian. The CLS and the GLS values are computed by first generating a synthetic scene consisting of 16,384 observations from a 4-component mixture, using the 224-channel response for each of the 4 pixels shown in the top left plot as the component means. The lognormal-generated mixture fractions were 0.07, 0.25, 0.60, and 0.08, respectively, for the four components. Random noise having 10% COV was added to each observation. Then, the CLS and GLS transforms were computed in a randomly-chosen chemical direction from the 296-chemical library. We see that the CLS values are not close to Gaussian (the correlation in the normal probability plot is 0.910), but the GLS values are very close to Gaussian (the correlation in the normal probability plot is 0.998) except for the extreme tails. The rapid rise in the normal probability plot of the CLS values near 0 indicates a mixture of two components; although the data were generated from a four-component mixture plus noise, only two components remain recognizable after the CLS transform.

As seen here and in [

Regardless of whether other estimators will outperform the GLS, it is important to assess the confidence in each prediction. If the Gaussian approximation were sufficiently accurate, then in the case of using BMA to predict which chemicals are present in a given plume, the estimated chemical probabilities could be used directly in the natural fashion. We then select a tunable threshold, such as T = 0.99, and predict, for example, that chemical C is present if P(C) exceeds T. Conditional on observed data D, the false positive probability for detecting chemical C is estimated to be 1 – T. The actual false positive probability is likely to differ from 1 – T because the real data is non-Gaussian. Even if the data were Gaussian, because of using the approximate result P(D ∣ M_{j}) ≈ e^{−BIC_{j}/2} and because of choosing the maximum or top few chemical probabilities from the entire chemical library, the actual false positive probability will differ from 1 – T [

To characterize the behavior of estimated chemical probabilities, [

An example of the reference distribution using BMA for a real scene, using a library of

We have demonstrated that the GLS transform using either random or chemical directions applied to real and simulated data leads to surprisingly close-to-Gaussian values, in agreement with [

If GLS-related and/or BMA-related results for simulated SCMG data are not sufficiently accurate for a scene of interest, we described a computationally demanding scene-specific reference distribution approach. Using the scene-specific reference distribution avoids the SCMG-based approximation which we have shown has good, but varying accuracy.
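The general idea of replacing a Gaussian threshold with one read from an empirical, scene-specific distribution can be illustrated as follows. This sketch covers only the final step, taking a threshold from the empirical distribution of standardized plume-free GLS values, and is not the full reference distribution procedure; the function name and the false positive rate of 0.001 are assumptions.

```python
import numpy as np
from statistics import NormalDist

def scene_vs_gaussian_threshold(gls_values, false_positive_rate=1e-3):
    """Return (empirical, Gaussian) thresholds for a one-sided test.

    gls_values : scalar GLS values from pixels assumed to contain no plume
    """
    z = (gls_values - gls_values.mean()) / gls_values.std()
    # Scene-specific threshold: empirical upper quantile of plume-free values.
    empirical = float(np.quantile(z, 1.0 - false_positive_rate))
    # Threshold implied by the Gaussian approximation.
    gaussian = NormalDist().inv_cdf(1.0 - false_positive_rate)
    return empirical, gaussian
```

When the plume-free values have heavier-than-Gaussian tails, the empirical threshold exceeds the Gaussian one, so using the Gaussian threshold would give a false positive rate above the nominal value.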

Performance (false negative rate in plume detection for a given false positive rate) comparisons were beyond our scope; we note however, that GLS-based approaches have remained competitive among the several other options reported [

The authors gratefully acknowledge support from the U.S. Department of Energy and from the three reviewers whose comments led to substantial improvements.

(top) Broadband image of scene A, which is cluttered because it includes mountains, buildings, and water, but to our knowledge contains no plumes.

(top) Example radiance from selected pixels from Scene A; (bottom) Example mean-centered radiance from selected pixels from Scene A.

Coefficient of variation of mean-centered spectra from selected pixels from Scene A.

Normal probability plots of F and GLS values in two different chemical directions (the top row is one direction; the bottom row is another direction) from 16,384 randomly selected pixels from scene A.

Hierarchical clustering result showing that the GLS values are closer to Gaussian (G) than are the CLS values. The subcases are denoted 1, 2, and 3. Subcase 1 is in chemical directions chosen from the library of 296 chemicals to be mutually distinct. Subcase 2 is randomly chosen chemical directions from the library of 296 chemicals. Subcase 3 is in random directions. The plot depicts the average result of 45 pairs of chemical directions for each of 10 randomly chosen sets of 16,384 pixels per set.

(top left) A 2-component Gaussian (a mixture) characterized by two small circular point clouds; (top center) the first component of the GLS transform; (top right) the second component of the GLS transform. The bottom left plot is the same as the top left plot except the horizontal axis is scaled differently. The bottom center and right plots are the same as the corresponding plots in the top row, except are for the CLS transform.

(top left) The well-separated 224-channel response from each of 4 pixels; (top right) The response in the top left plot, multiplied by Σ̂^{−1} and then scaled to unit variance, corresponding to the GLS operation; (bottom left) normal probability plot of corresponding CLS values in a random chemical direction using a random mixture of the 4 pixels in the top left plot; (bottom right) same as bottom left, except for GLS values.

Cell entries in columns 1 to 4 are 100 times the fraction of (CLS,GLS) values (averaged over 296 chemicals, each scaled-to-unit-variance), denoted Z, that exceed 2, 3, 4, or 5. Entries for the fractions exceeding thresholds are within approximately ±0.05 or less of results in hypothetical repeats of the same procedure. Entries for skewness and kurtosis are within approximately ±0.01 of results in hypothetical repeats of the same procedure. Zero entries are less than 10^{-5}.

Case | P(Z>2) | P(Z>3) | P(Z>4) | P(Z>5) | kurtosis | skewness | Correlation in the normal probability plot |
---|---|---|---|---|---|---|---|
Gauss. Mix 1 | 2,2 | 0.08,0.1 | 0.01,3e-3 | 0,6e-5 | -0.13,-4e-3 | -0.07,-0.01 | 0.999,1 |
Gauss. Mix 1 (reference) | 2,2 | 0.1,0.1 | 3e-3,3e-3 | 4e-5,0 | 0.004,-2e-3 | 0.1e-3,-2e-3 | 1,1 |
Gauss. Mix 2 | 4,3 | 4,1 | 1,1 | 0.03,0.3 | 6.7,18.2 | 2.48,-0.12 | 0.78,0.86 |
Gauss. Mix 2 (reference) | 2,2 | 0.09,0.01 | 8e-5,3e-3 | 0,2e-5 | -0.07,-3e-3 | 2e-3,3e-4 | 1,1 |
Gauss. Mix 3 | 3,2 | 0.2,0,3 | 0.02,0.4 | 0,0.01 | -0.25,1.05 | 0.33,-0.16 | 0.984,0.991 |
Gauss. Mix 3 (reference) | 2,2 | 0.2,0.1 | 3e-3,3e-3 | 2e-5,4e-5 | -0.017,0.01 | -5e-35,7e-3 | 1,1 |
Multi- | 2,2 | 0.1,0.4 | 4e-3,0.1 | 2e-4,2e-4 | 0.02,2.25 | -0.02,0.02 | 1,0.990 |
Multi- (reference) | 2,2 | 0.1,0.1 | 2e-3,2e-3 | 0,4e-5 | 0.02,2e-3 | -0.01,-0.01 | 1,1 |
Scene A: pixel subset 1 | 3,2 | 0.4,0.2 | 0.05,0.02 | 0.02,6e-3 | 1.81,0.47 | 0.41,4e-3 | 0.987,0.989 |
Scene A: pixel subset 1 (reference) | 2,2 | 0.2,0.1 | 0.01,4e-3 | 0,6e-5 | 0.01,2e-3 | 0.02,-0.002 | 1,1 |
Scene A: pixel subset 2 | 2,2 | 0.3,0.2 | 0.2,0.02 | 0.1,4e-3 | 16.1,0.43 | 1.26,7e-3 | 0.95,0.999 |
Scene A: pixel subset 2 (reference) | 2,2 | 0.1,0.1 | 1e-3,3e-3 | 0,0 | -0.04,2e-3 | 9e-4,-4e-3 | 1,1 |
Scene A: pixel subset 3 | 0.5,2 | 0.1,0.2 | 0.03,0.02 | 0.01,5e-3 | 4.4,0.49 | -0.14,7e-3 | 0.91,0.99 |
Scene A: pixel subset 3 (reference) | 2,2 | 0.2,0.1 | 4e-3,3e-3 | 0,2e-5 | -6e-3,-4e-3 | 0.03,2e-3 | 1,1 |

Cell entries in columns 1 to 4 are 100 times the fraction of the maximum over 296 (CLS,GLS) values that exceed 2, 3, 4, or 5, and the kurtosis. These (CLS, GLS) entries are given twice per cell: once for the scene and once for a corresponding simulated Gaussian reference scene. Entries for the fractions exceeding thresholds are within approximately ±0.05 or less of results in hypothetical repeats of the same procedure. Entries for kurtosis are within approximately ±0.01 of results in hypothetical repeats of the same procedure. The boldface entries are the only two examples in which the kurtosis of the GLS values was less Gaussian than the kurtosis of the CLS values.

Case | P(Z>2) | P(Z>3) | P(Z>4) | P(Z>5) | kurtosis |
---|---|---|---|---|---|
Gauss. Mixture 1 | 78,75 | 11,14 | 0.3,0.5 | 0,0.01 | -0.16,-4e-3 |
Gauss. Mixture 1 (reference) | 74,76 | 14,15 | 0.6,0.6 | 0.01,0 | 0.01,0.03 |
Gauss. Mixture 2 | 12,24 | 5,22 | 4,16 | 3,6 | -0.76,-0.57 |
Gauss. Mixture 2 (reference) | 7,88 | 0.5,17 | 0.02,0.6 | 0,6e-3 | -0.14,-1e-3 |
Gauss. Mixture 3 | 89,89 | 40,36 | 4,10 | 0,4 | |
Gauss. Mixture 3 (reference) | 72,69 | 12,11 | 0.4,0.5 | 6e-3,0.01 | -0.02,0.02 |
Multi- | 18,66 | 3,27 | 0.4,10 | 0.05,4 | |
Multi- (reference) | 15,81 | 1,4 | 0.04,0.5 | 0,0.01 | 0.18,0.01 |
Scene A: pixel subset 1 | 14,92 | 3,22 | 0.5,2 | 0.3,0.5 | 1.59,0.24 |
Scene A: pixel subset 1 (reference) | 14,94 | 1,21 | 0.05,0.7 | 0,6e-3 | -6e-3,-1e-3 |
Scene A: pixel subset 2 | 11,91 | 1,22 | 0.5,2 | 0.4,0.5 | 21.6,0.54 |
Scene A: pixel subset 2 (reference) | 14,94 | 1,20 | 0.04,0.8 | 0,0 | -0.05,2e-3 |
Scene A: pixel subset 3 | 4,89 | 1,23 | 0.4,2 | 0.3,0.6 | 4.46,0.37 |
Scene A: pixel subset 3 (reference) | 10,94 | 0.9,21 | 0.03,0.7 | 0,6e-3 | -5e-3,7e-3 |

Results summary (averages over the results for 2 chemicals for multiple cases) for the distribution of each entry in the ^{*} directions case is chemicals chosen from distinct chemical groups rather than completely at random from the 296-chemical library.

Direction | P(Z>2) | P(Z>3) | P(Z>4) | kurtosis | skewness |
---|---|---|---|---|---|
Chemical^{*} | 1.9,2.3,2.2 | 0.3,0.2,0.1 | 0.09,0.03,2e-3 | 1.82,0.77,-5e-3 | 0.46,4e-3,5e-3 |
Chemical | 2.2,2.3,2.3 | 0.3,0.2,0.2 | 0.1,0.03,0.03 | 14.7,0.47,0.47 | 1.0,0.02,0.02 |
Random | 2.8,2.3,2.3 | 0.5,0.2,0.1 | 0.1,5e-3,3e-3 | 17.3,4e-3,-1e-3 | 1.46,0.02,2e-3 |

The average absolute difference (and the R^{2} describing the fit) between the estimated and observed probability of a chemical being present, for each of four chemicals, denoted as chemical 1, 2, 3, and 4. The first entry in each cell is for the case of four randomly-selected chemicals from the chemical library. The second entry is for the case of four random directions simulated from a Gaussian distribution. There are two simulated reference distributions. The first has the same mean and Σ as the real data, but is a SCMG; the second has a diagonal Σ with equal variances on the diagonal. Simulated data entries are within approximately 0.01 of a repeat of the 10 sets of 16,384 pixels. Real data entries would not vary if the 10 sets of 16,384 were analyzed again.

Data Source | Agreement Measure | Chemical 1 | Chemical 2 | Chemical 3 | Chemical 4 |
---|---|---|---|---|---|
Real Data: 10 random subsets of 16,384 pixels from scene A | Avg Abs Diff | 0.50, 0.07 | 0.20, 0.08 | 0.19, 0.08 | 0.43, 0.07 |
Real Data: 10 random subsets of 16,384 pixels from scene A | R^{2} | 0.09, 0.80 | 0.60, 0.78 | 0.46, 0.78 | 0.54, 0.79 |
Reference Gaussian Data having same Σ as Scene A | Avg Abs Diff | 0.46, 0.07 | 0.20, 0.07 | 0.20, 0.07 | 0.43, 0.07 |
Reference Gaussian Data having same Σ as Scene A | R^{2} | 0.06, 0.78 | 0.57, 0.78 | 0.45, 0.78 | 0.57, 0.77 |
Reference Gaussian Data having Diagonal Σ | Avg Abs Diff | 0.11, 0.04 | 0.04, 0.04 | 0.04, 0.04 | 0.06, 0.04 |
Reference Gaussian Data having Diagonal Σ | R^{2} | 0.91, 0.93 | 0.96, 0.94 | 0.92, 0.94 | 0.79, 0.94 |