
This paper presents a nonlinear Bayesian regression algorithm for detecting and estimating gas plume content from hyper-spectral data. Remote sensing data, by its very nature, is collected under less controlled conditions than laboratory data. As a result, the physics-based model that describes the relationship between the observed remote-sensing spectra and the terrestrial (or atmospheric) parameters being estimated is typically littered with unknown "nuisance" parameters. Bayesian methods are well suited to this context because they automatically incorporate the uncertainties associated with all nuisance parameters into the error estimates of the parameters of interest. The nonlinear Bayesian regression methodology is illustrated on simulated data from a three-layer model for longwave infrared (LWIR) measurements from a passive instrument. The generated LWIR scenes contain plumes of varying intensities, which allows estimation uncertainty and probability of detection to be quantified. The results show that this approach should permit more accurate estimation as well as a more reasonable description of estimate uncertainty. Specifically, the methodology produces a standard error that is more realistic than that produced by matched-filter estimation.

Estimating the constituent concentrations of industrial gas plumes from hyper-spectral data has received much attention in recent years; see, e.g., […].

Often the nuisance parameters are not completely unknown. Some information exists, usually from previous measurements or from a generic mathematical model. Many remote sensing estimation techniques leverage this external information by (1) estimating the nuisance parameters, (2) plugging the estimates into the physics-based formulas, and (3) mathematically solving for the desired parameters. One nuisance-parameter estimation technique that has become popular consists of first identifying a "best" scene from large look-up tables that may contain thousands of real scenes or more, and then deriving the necessary parameters from this scene […].

Bayesian methods, on the other hand, are better suited for dealing with over-parameterized models. They explicitly make use of the nuisance parameter uncertainties when constructing the estimates of the parameters of interest. Moreover, under general conditions, Bayesian methods still provide consistent estimates of parameters of interest even when the number of nuisance parameters grows with the sample size […].

The typical radiance models used in hyper-spectral data analyses are nonlinear in nature. State-of-the-art methods linearize these models using physics-based considerations such as focusing on optically thin plumes, assuming a known ground emissivity, ignoring the down-welling sky radiance, ignoring the nonlinear variation of ground emissivity, and/or linearizing the plume/background temperature difference.

This paper introduces efficient estimation methods that do not require such simplifications. First, a computationally fast algorithm (comparable in speed to matched filter estimators) for point estimates is described. This iterative procedure, called the Nonlinear Maximum Posterior Density (NLMPD) estimator, produces the traditional Bayesian point estimate, namely the maximum posterior density value. This estimator, however, does not provide the full posterior and, consequently, cannot always accurately quantify uncertainty. In order to produce these uncertainties, Markov Chain Monte Carlo (MCMC) algorithms are introduced […].

The rest of this paper is organized as follows. Section 2 describes the radiance model used to illustrate the Bayesian methodology presented in Section 3. Section 4 addresses the parameter estimation of the nonlinear Bayesian regression model. Applications on realistic simulated data are shown in Section 5.

A complete derivation of the radiative transfer model in the thermal hyper-spectral regime is complicated; see, e.g., […].

Let ν denote the wavenumber (cm^{−1}) of the electro-magnetic radiation, and let T_{g} and T_{p} denote the ground and plume temperatures. Write τ_{a}(ν) for the atmospheric transmittance, ε_{g}(ν) for the ground emissivity, B(ν, T) for the Planck blackbody radiance at temperature T, L_{u}(ν) and L_{d}(ν) for the up-welling and down-welling atmospheric radiances, and τ_{p}(ν) for the plume transmittance, modeled as τ_{p}(ν) = exp(−Σ_{j} γ_{j} α_{j}(ν)), where γ_{j} is the burden of the j-th gas and α_{j}(ν) its absorption coefficient. The three-layer model for the at-sensor radiance is then

L(ν) = τ_{a}(ν) { (1 − τ_{p}(ν)) B(ν, T_{p}) + τ_{p}(ν) [ ε_{g}(ν) B(ν, T_{g}) + (1 − ε_{g}(ν)) L_{d}(ν) ] } + L_{u}(ν).

Finally, instrument error adds noise, so that the observed spectrum is the model radiance plus a noise term, L_{obs}(ν) = L(ν) + ε(ν).
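The three-layer computation can be sketched numerically. The snippet below assumes the standard form of the model (Planck terms for the ground and plume layers, exponential plume transmittance); the function names, channel grid, and the units of the burdens and absorption coefficients are illustrative, not the paper's actual implementation:

```python
import numpy as np

# Physical constants (SI units)
H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light (m/s)
KB = 1.380649e-23    # Boltzmann constant (J/K)

def planck(nu_cm, T):
    """Blackbody spectral radiance B(nu, T) for wavenumbers nu_cm (cm^-1)."""
    nu = nu_cm * 100.0  # cm^-1 -> m^-1
    return 2.0 * H * C**2 * nu**3 / np.expm1(H * C * nu / (KB * T))

def three_layer_radiance(nu_cm, burdens, alpha, eps_g, T_g, T_p,
                         tau_a, L_u, L_d):
    """At-sensor radiance under an assumed standard three-layer model:

        L = tau_a * [ (1 - tau_p) B(T_p)
                      + tau_p * (eps_g B(T_g) + (1 - eps_g) L_d) ] + L_u

    with plume transmittance tau_p = exp(-sum_j gamma_j alpha_j(nu)).
    burdens has shape (n_gas,) and alpha shape (n_gas, n_channels)."""
    tau_p = np.exp(-(burdens @ alpha))
    ground = eps_g * planck(nu_cm, T_g) + (1.0 - eps_g) * L_d
    return tau_a * ((1.0 - tau_p) * planck(nu_cm, T_p) + tau_p * ground) + L_u
```

With zero burdens the plume layer is transparent (τ_{p} = 1) and the expression reduces to the attenuated ground term plus the atmospheric paths, which is a convenient sanity check.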

As mentioned earlier, the three-layer model makes some simplifications:

The atmospheric terms τ_{a}, L_{u}, and L_{d} are treated as known, and the ground is described by a single temperature T_{g} and a single emissivity curve ε_{g}.

The Bayesian algorithm introduced in the next section can be extended so that these restrictions can be relaxed.

Given the observed radiation L_{obs} = (L_{obs}(ν_{1}),…, L_{obs}(ν_{n})) recorded at n spectral channels, the data can be written as a nonlinear regression of the observed spectrum on the model radiance evaluated at those channels.

The above model is nonlinear in the sense that the expected radiance depends nonlinearly on the unknown parameters: the gas burdens enter through the exponential plume transmittance, and the temperatures enter through the Planck function.

Which parameters are nuisance parameters and which are parameters of interest depends on the particular application. For the temperature/emissivity separation problem, the ground temperature and emissivity are the parameters of interest and the remaining quantities are nuisance parameters; for plume quantification, the gas burdens are of interest and the other parameters are nuisances.

No matter what the application is, a typical problem of physics-based models for hyper-spectral data is that the unknown parameters outnumber the spectral channels recorded, so if the model is used to formulate a classic (non-Bayesian) regression model, it will represent an under-determined set of equations. One could say that the spectrum is "under-sampled," but it is important to note that the problem cannot be solved by increasing the sampling. The number of nuisance parameters increases as the sampling rate increases […].

"Under-sampling" is usually addressed by including additional information about the nuisance parameters in the problem formulation. The non-Bayesian way to do this is to plug in estimates for the parameters, while the Bayesian formulation considers parameters as random variables and uses a prior distribution to supply this information. The chief advantage a prior distribution has over the plug-in approach is that estimate variabilities, or uncertainties, are automatically included.

A secondary advantage is that a prior distribution provides a richer framework for describing what is known about the parameters. In the plume gas concentration estimation problem, the component of the prior that describes the temperatures and the ground emissivity (the nuisance parameters) may include all information known about these parameters. For example, the background emissivity must be constrained to the interval [0,1] and must also be smooth across energy channels; this information can easily be built into the model prior. The background also typically has a strong spatial structure, and it might be very useful to include this information in the prior.

The parameters of interest must also be given a prior. If nothing is known about them, one would apply a non-informative or weakly informative prior to these parameters. However, if some information concerning them is available, their priors can be modified to include such information. For example, it is obvious that the effluent burdens reside within an interval with lower bound 0 and some maximum plausible burden.

Let θ = (T_{g}, T_{p}, ε_{g}, γ_{1},…, γ_{J}) denote the vector of unknown parameters.

The generalized Beta distribution and the truncated Gaussian distribution with large variance are commonly used to define weakly informative priors on large intervals […]. Such priors involve a normalization constant c_{0} that makes the density integrate to one over the constrained burden intervals 0 ≤ γ_{j} ≤ c_{j}, j = 1,…, J; conveniently, c_{0} does not need to be known for the Bayesian estimation of the state parameters. Choosing the bounds c_{j} large relative to any plausible burden keeps the prior weakly informative.

External information regarding the nuisance parameters is often available. For example, the plume and ground temperatures T_{p} and T_{g} can be given informative priors centered at values obtained from previous measurements or atmospheric models.

Defining priors for the background emissivity is more challenging. The ground emissivity has two important properties that need to be considered in the model and the prior: smoothness and the constraint of lying in the interval [0, 1]. On the other hand, there is a lot of available information on ground emissivity. For example, the NEFDS database contains pre-measured surface reflection parameters for over 400 materials corresponding to a wide variety of objects ranging from camouflage to white paint. This context motivates the following approach for building a prior distribution for the ground emissivity: (1) model the emissivity as a smooth function of the wavenumber (cm^{−1}) using, for example, a spline function […], and (2) use the library measurements to inform the prior on the coefficients of that function.

A simpler use of the NEFDS library for defining a prior on ε_{g} is to expand the emissivity on the principal components of the library: ε_{g} is modeled as the library mean plus a linear combination b_{1}u_{1} + … + b_{m}u_{m} of the leading components, where the coefficient vector (b_{1},…, b_{m}) has a Gaussian distribution with mean 0 and covariance equal to λ times the square of the diagonal matrix of eigenvalues resulting from the singular value decomposition, and λ is a constant we set to 4 in order to generate a diffuse prior. For our experiments we used the first few principal components.
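A minimal sketch of such a principal-component emissivity prior follows. Because the NEFDS data are not reproduced here, a synthetic stand-in library of smooth curves is used, and the scaling of the coefficient variances is one plausible reading of the construction above (both are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a library of measured emissivities
# (NEFDS itself is not distributed here): smooth curves clipped to [0, 1].
n_channels, n_materials = 128, 40
grid = np.linspace(0.0, 1.0, n_channels)
library = np.clip(
    0.9 - 0.2 * rng.random((n_materials, 1)) * np.sin(
        2.0 * np.pi * (rng.random((n_materials, 1)) + grid)), 0.0, 1.0)

# Principal components of the centered library via SVD.
mean_eps = library.mean(axis=0)
U, s, Vt = np.linalg.svd(library - mean_eps, full_matrices=False)

m, lam = 5, 4.0  # number of retained components; diffuse-prior scale

def sample_emissivity():
    """Draw one emissivity curve from the diffuse PCA prior: Gaussian
    coefficients on the first m components, scaled by the singular values."""
    b = rng.normal(0.0, lam * s[:m] / np.sqrt(n_materials))
    return np.clip(mean_eps + b @ Vt[:m], 0.0, 1.0)
```

The final clipping enforces the physical [0, 1] constraint on each draw, while the PCA representation keeps the sampled curves smooth across channels.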

Bayesian inference about the model parameters relies on their posterior distribution. Given the data model (6) and the parameter prior distribution (7), the posterior distribution is p(θ | L_{obs}) = c_{1} p(L_{obs} | θ) p(θ), where c_{1} is a normalization constant. Two approaches are commonly used to construct parameter estimates and their associated uncertainties: one can use either the posterior mean and posterior covariance, or the posterior mode and highest posterior density interval.

In order to produce the mode of the posterior, the NLMPD estimator iteratively maximizes the log-posterior. With Gaussian noise and a Gaussian prior this is a penalized nonlinear least-squares problem, in which Σ_{prior}, the covariance matrix of the prior distribution, determines the penalty on departures from the prior mean.
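As an illustration of the posterior-mode computation, here is a damped Gauss-Newton iteration on a deliberately tiny toy problem (a one-parameter exponential model with a Gaussian prior). It is a hypothetical stand-in for the NLMPD iteration, not the paper's algorithm:

```python
import numpy as np

def nlmpd_1d(y, a, theta0, prior_mean, prior_var, noise_var=1e-4,
             n_iter=300):
    """MAP estimate for the toy model y_i = exp(-theta * a_i) + noise,
    with a Gaussian prior on theta, via damped Gauss-Newton on the
    penalized least-squares objective."""
    theta = float(theta0)
    for _ in range(n_iter):
        f = np.exp(-theta * a)
        J = -a * f                                    # d f / d theta
        # Gradient and (Gauss-Newton) curvature of the log-posterior.
        g = J @ (y - f) / noise_var - (theta - prior_mean) / prior_var
        h = J @ J / noise_var + 1.0 / prior_var
        theta += 0.5 * g / h                          # damped Newton step
    return theta
```

The prior term shrinks the estimate toward the prior mean, but with a diffuse prior and informative data the data term dominates, as in the full model.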

Let θ_{0}, θ_{1}, θ_{2},… denote the successive states of the Markov chain; after a sufficiently long burn-in, these states form a (dependent) sample from the posterior distribution.

The simplest MCMC is the Metropolis-Hastings algorithm […]. It generates θ_{t+1} by first sampling a candidate point θ* from a proposal distribution q(θ* | θ_{t}), and then accepting the candidate with probability min{1, [p(θ* | L_{obs}) q(θ_{t} | θ*)] / [p(θ_{t} | L_{obs}) q(θ* | θ_{t})]}.

If the candidate point is accepted, the next state becomes θ_{t+1} = θ*; otherwise θ_{t+1} = θ_{t}. Note that the acceptance probability does not involve the normalization constant c_{1}: it cancels in the ratio, which is fortunate because c_{1} cannot be reliably computed, as it requires a high-dimensional integration.

Remarkably, the proposal distribution can be chosen almost arbitrarily: under mild conditions the chain converges to the posterior regardless of the exact form of q(· | θ_{t}), although the choice strongly affects how quickly the chain mixes.

Due to the typically high correlations among the effluent concentrations within the plume, the mixing remains in many cases very low. This forces us to generate very long MCMC sequences and thin them; thinning consists of subsampling the sequences by retaining only every k-th draw.
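A random-walk Metropolis sampler with burn-in and thinning can be sketched as follows; the target is supplied as a log-posterior, so the unknown normalization constant never appears:

```python
import numpy as np

def metropolis(log_post, x0, n_samples, step=0.5, burn_in=1000, thin=10,
               rng=None):
    """Random-walk Metropolis sampler with burn-in and thinning (sketch)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = log_post(x)
    out = []
    for t in range(burn_in + n_samples * thin):
        cand = x + step * rng.standard_normal(x.shape)
        lp_cand = log_post(cand)
        # Accept with probability min(1, exp(lp_cand - lp)); the posterior's
        # normalization constant cancels in this ratio.
        if np.log(rng.random()) < lp_cand - lp:
            x, lp = cand, lp_cand
        if t >= burn_in and (t - burn_in) % thin == 0:
            out.append(x.copy())  # keep every thin-th post-burn-in draw
    return np.asarray(out)
```

Thinning discards correlated intermediate states, which is exactly why highly correlated burdens force long raw chains: most of the computation is spent on draws that are ultimately thrown away.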

Recently, Hamiltonian paths have been proposed to both increase the mixing of the MCMC sequence and reduce the number of rejections […]. The parameters θ = (θ_{1},…, θ_{d}) are augmented with auxiliary momentum variables p = (p_{1},…, p_{d}), and a Hamiltonian H(θ, p) = U(θ) + K(p) is defined, where U(θ) is the negative log-posterior and K(p) = p′p/2 is a kinetic energy term.

Consequently, if one can sample (θ, p) from the joint distribution proportional to exp(−H(θ, p)), then the θ-component is, marginally, a draw from the posterior.

Hamiltonian dynamics allows one to move along trajectories of constant H, producing distant candidate points that would, in exact arithmetic, always be accepted. In practice the dynamics are discretized using the leapfrog technique, starting from the current state θ_{t} with a freshly sampled momentum.

Because the leapfrog technique is only an approximation, the resulting candidate (θ*, p*) is accepted with probability min{1, exp(H(θ_{t}, p_{t}) − H(θ*, p*))}; if it is accepted, θ_{t+1} = θ*, and otherwise θ_{t+1} = θ_{t}.

The efficiency of the Hamiltonian algorithm depends on the choices for the leapfrog step size and the number of leapfrog steps; in our experience, step sizes scaled to the posterior spread produce the desired sequences.
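One Hamiltonian update with leapfrog integration might look like the following sketch; the step size `eps` and the number of leapfrog steps are the tuning constants discussed above:

```python
import numpy as np

def hmc_step(log_post, grad_log_post, x, eps=0.1, n_leapfrog=20, rng=None):
    """One Hamiltonian Monte Carlo update (sketch). The Hamiltonian is
    H(x, p) = -log_post(x) + p'p/2; leapfrog integration follows a
    near-constant-H trajectory, and a Metropolis test corrects the
    discretization error."""
    rng = rng if rng is not None else np.random.default_rng(0)
    p0 = rng.standard_normal(x.shape)        # fresh momentum each update
    xq, p = x.astype(float).copy(), p0.copy()
    p = p + 0.5 * eps * grad_log_post(xq)    # initial half step (momentum)
    for _ in range(n_leapfrog - 1):
        xq = xq + eps * p                    # full step (position)
        p = p + eps * grad_log_post(xq)      # full step (momentum)
    xq = xq + eps * p                        # last position step
    p = p + 0.5 * eps * grad_log_post(xq)    # final half step (momentum)
    h0 = -log_post(x) + 0.5 * float(p0 @ p0)
    h1 = -log_post(xq) + 0.5 * float(p @ p)
    # Accept with probability min(1, exp(h0 - h1)).
    if np.log(rng.random()) < h0 - h1:
        return xq
    return x
```

With a well-chosen step size the energy error h1 − h0 stays small, so almost every distant candidate is accepted, which is precisely the mixing advantage over the random-walk proposal.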

Finally, in our context one can often expect to find parameter estimates on the distribution boundaries; for example, some gases that are part of the model are not present in the plume or are present in very small concentrations. MCMC techniques, Metropolis and Hamiltonian alike, have difficulty generating legitimate candidates that lie within the boundaries of the parameter space, leading to many rejections before a suitable candidate can be proposed. To solve this problem, we present a bouncing algorithm that conserves the equilibrium property central to the MCMC algorithm.

Let θ* denote the candidate coordinate produced by the proposal (a Metropolis draw or a leapfrog trajectory) from the current state θ_{t}, and suppose the corresponding parameter is constrained to the interval [l, u] with width w = u − l.

The bouncing algorithm determines the reflected candidate as follows: compute the unfolded displacement y = (θ* − l) modulo 2w; if y ≤ w, the bounced value is l + y, and otherwise it is u − (y − w). Applied coordinate by coordinate, this maps any proposal back into the feasible region; in the Hamiltonian scheme each reflection also negates the corresponding momentum component, so the equilibrium property is conserved.
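The reflection rule can be written compactly for a single coordinate; this is a sketch of the bouncing step (unfold the excursion onto a period of length 2w, then fold it back), with the interval bounds as illustrative arguments:

```python
def reflect(x, lower, upper):
    """Bounce a proposed coordinate back into [lower, upper].

    The excursion is 'unfolded' onto a period of length 2*(upper - lower);
    positions in the second half of the period correspond to motion
    reflected off the upper boundary. In the Hamiltonian scheme each
    reflection also negates the matching momentum component."""
    w = upper - lower
    y = (x - lower) % (2.0 * w)
    return lower + y if y <= w else upper - (y - w)
```

For example, a proposal of 1.3 against the interval [0, 1] bounces off the upper boundary and lands at 0.7, and a proposal of −0.2 bounces off the lower boundary and lands at 0.2, so no proposal is wasted on an infeasible point.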

To test the methodology, we simulated spectra with known gas burdens and compared the NLBR approach with a matched filter, perhaps the most popular algorithm currently in use for gas plume detection. We considered the case of a plume with 3 uncorrelated gases (NH3, Ethylene, and Butyl Acetate) at various concentration levels: 0, 10, 20, 30, 50, 70, 90, and 110 ppm-M. For each concentration level, 100 spectra were generated. For each simulation, the background emissivity was randomly selected from the 6 emissivities shown in the figure of material emissivities.

Because of this problem, the matched-filter estimates are biased, and the actual error in the estimates (matched-filter root mean squared error, RMSE = 36 ppm-M) is much greater than the error predicted by the algorithm (matched-filter predicted standard error = 1 ppm-M). These problems are absent from the NLBR estimator: it has no detectable bias, and the observed and predicted estimation errors agree.

The fact that the NLBR algorithm produces realistic standard errors is of great benefit when the estimates are used for detection. The usual statistic for testing the presence or absence (0 ppm-M) of a gas is the t-statistic, i.e., the algorithm's estimate divided by the algorithm's predicted standard error.

This means that detection thresholds for NLBR estimates and the associated false call rate can be theoretically calculated and results will correspond to theory.
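For example, the threshold corresponding to a desired false-call rate follows directly from the standard normal quantile when the t-statistic behaves as theory predicts; Python's standard library suffices for this sketch (the function name is illustrative):

```python
from statistics import NormalDist

def detection_threshold(false_call_rate):
    """Threshold on the t-statistic (estimate / predicted standard error)
    so that, when no gas is present and the predicted standard error is
    realistic, the expected false-call rate equals the requested value."""
    return NormalDist().inv_cdf(1.0 - false_call_rate)
```

A one-in-a-thousand false-call rate thus corresponds to declaring a detection whenever the t-statistic exceeds roughly 3.09; this correspondence only holds because the NLBR standard errors are realistic.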

Fits of the NLBR model to actual hyperspectral cubes have shown that the performance observed in these simulations is not always fully attained on real data.

The results presented up to this point utilize the “quick” Bayesian solution provided by the NLMPD approach.

The two histograms in the MCMC posterior figure illustrate two contrasting cases.

The second posterior distribution illustrates the problem that occurs when the plume contains a large amount of the gases (100 ppm-M of each gas). In this case, the shape of the distribution is dramatically different: between 0 and 100 ppm-M it is roughly uniform, and above 100 ppm-M it becomes roughly normal (or, more precisely, half-normal). The uniform portion below 100 ppm-M arises because the Butyl Acetate signal is equivalent to a linear combination of the other gases. The normal half of the distribution is caused by the constraints on the linear combinations (burdens are non-negative), which become active at higher burdens. Note that the exact nature of the correlations among the plume gases can create posteriors with quite complex shapes; it is possible to have several modes in the posterior. As one can see, the MCMC output can be very valuable for evaluating plumes that may contain several gases.

The above histograms were constructed from a sample of 5000 data points generated by the Hamiltonian MCMC, after a burn-in phase of size 1000. Similar outputs could be obtained with the "traditional" MCMC algorithm described in Section 4.2, using a substantial thinning of a much longer sequence.

The nonlinear Bayesian regression shows definite promise for producing better estimates from hyper-spectral data. To be most useful, NLBR should not be considered to be a specific algorithm, derived from a specific model, but a general framework for producing estimates that can be tailored to the problem at hand.

The effectiveness of these regression models depends heavily on the prior information supplied to them. The methodology does require that realistic prior distributions be constructed on the nuisance parameters. However, the fact that uncertainties can be incorporated into the prior distributions in a straightforward manner means that the Bayesian methodology incorporates these uncertainties in a fairly automatic way. In fact, one of the strongest motivations for using a Bayesian model may be that its statements of estimate uncertainty are more believable than those from other methodologies.

Our simulations involving NLBR have produced results that correspond with theory. The application of the present NLBR model to real hyper-spectral data has produced less optimal performance; it is clear that real hyper-spectral data contain more variability than our NLBR currently describes, most prominently atmospheric variability […].

The authors would like to acknowledge the helpful advice of Herb Fry, Tom Burr, Bernie Foy, and Brian McVey of Los Alamos National Laboratory. This work was supported by the Computational and Information Science Initiative at the US Department of Energy's Pacific Northwest National Laboratory, Richland WA 99352. Pacific Northwest National Laboratory is operated by Battelle Memorial Institute for the US Department of Energy under Contract DE-AC05-76RL01830.

Emissivities of materials used in the simulations.

Matched-Filter Estimator Compared to NLBR Estimator.

Q-normal Plots of the T-statistic for NH3 and Ethylene, When the True Gas Burden is 0.

POD Curves for NH3 and Ethylene.

MCMC Derived Posterior for a Plume with 11 Highly Correlated Gases.