Hyperspectral Unmixing from Incomplete and Noisy Data

In hyperspectral images, once the pure spectra of the materials are known, hyperspectral unmixing seeks to find their relative abundances throughout the scene. We present a novel variational model for hyperspectral unmixing from incomplete noisy data, which combines a spatial regularity prior with the knowledge of the pure spectra. The material abundances are found by minimizing the resulting convex functional with a primal dual algorithm. This extends least squares unmixing to the case of incomplete data, by using total variation regularization and masking of unknown data. Numerical tests with artificial and real-world data demonstrate that our method successfully recovers the true mixture coefficients from heavily-corrupted data.


Hyperspectral Unmixing
Hyperspectral imaging is continuously being suggested for new industrial applications.While it has been used for decades in remote sensing (see, e.g., [1]), it is now being applied in various industrial sorting applications, such as recycling of polymers, paper and quality control of crops and fruit; see the references in [2].It has also been suggested for controlling glue coverage in manufacturing of wood strand boards [3] or even gas plume detection [4].
In hyperspectral imaging, the electromagnetic reflectance of an object or scene is measured in several hundred wavelength ranges at once, so that the spectrum measured at one spatial pixel acts as a fingerprint of the mixture of materials at the corresponding point in the scene.A hyperspectral image of size m × n with L wavelength ranges can be represented by a hypercube Y cube ∈ R m×n×L or, alternatively, by column-wise spatial reshaping, as two-dimensional array Y ∈ R L×N , where N = mn.In the following, we focus on the latter representation.
Mixing of materials can be macroscopic, when, due to low resolution, several materials are measured at one spatial pixel, or microscopic, when the surface layer is an intimate mixture of the pure materials at different concentrations.Usually, hyperspectral unmixing is based on the linear mixing model.The model comes from remote sensing, where the ground area corresponding to one pixel is large, e.g., 30 × 30 meters, and is often made up of several kinds of ground coverage, such as grass, soil or asphalt.The intensities of light reflected by the partial areas over the measured electromagnetic range are collected in the sensor.If the partial areas of different coverage are disjoint, it is reasonable to assume that the spectral distribution of light collected in the sensor is the sum of the spectral distributions reflected by the individual types of coverage, multiplied by their respective partial areas.For the limitations of the model and algorithmic approaches, we refer to the survey by Bioucas-Dias et al. [5].
Hyperspectral unmixing assumes the existence of pure spectral signatures, called endmembers, and seeks to infer the fractions of these endmembers, called abundances, in each pixel of the scene.As the number of materials, here denoted p, is usually much smaller than the number of wavelengths, hyperspectral unmixing is also an important data reduction step.Denote by k 1 , k 2 , . . . ,k p ∈ R L the endmembers, corresponding to pure (average) reflectance spectra of the individual coverage types and measured at L wavelengths.Let the respective relative areas that they occupy in the ground region corresponding to the j-th camera pixel, the abundances, be denoted by x 1 (j), x 2 (j), . . ., x p (j) ∈ [0, 1].The linear mixing model says that the spectrum y(j) ∈ R L observed in the j-th position is the convex weighted sum of k 1 , k 2 , . . . ,k p , weighted by the abundances x 1 (j), x 2 (j), . . ., x p (j) plus some noise vector w(j) ∈ R L : where: x r (j) ≥ 0, r = 1, . . ., p, and These two conditions are referred to as abundance non-negativity constraint (ANC) and abundance sum-to-one constraint (ASC) in the literature; see, e.g., [5].Setting x(j) := (x 1 (j), x 2 (j), . . ., x p (j)) ∈ R p , both conditions can jointly be written as simplex constraint: Using the vectors in Equation ( 1) for j = 1, . . ., N as columns of corresponding matrices: the linear mixing model becomes:

Contribution
In this paper, we make the additional assumption that only part of the entries of the data matrix are known; this can be specified by a mask M ∈ {0, 1} L×N , where: Then, restricting to the known entries, the linear mixing model Equation (3) with missing pixels reads: where the Hadamard matrix product is defined by Hyperspectral unmixing refers to the estimation of the abundance matrix X from the given data Y; the endmember matrix K is either known or has to be found, possibly from a larger dictionary.We assume that K is known.This is a realistic assumption, because in many applications, such as sorting of materials, manufacturing control or remote sensing, reference spectra are available (see, for example, [6] or glue coverage control in [3]).
In this paper, we propose a novel joint unmixing and inpainting approach.Unmixing directly the incomplete hypercube, our approach replaces the two-step procedure of inpainting followed by unmixing, where the inpainting step typically introduces artifacts.These artifacts are avoided in the joint approach, because the knowledge of the signal subspace is used from the beginning, and we successfully recover the abundances knowing only a small part of the data cube.
Our presentation is organized as follows: In Section 2, we introduce our novel variational unmixing model based on the linear mixing model with missing pixels and encourage spatial regularity by an edge-preserving discrete total variation regularizer.The minimizer of the corresponding functional is computed by primal-dual optimization methods in Section 3.Both on real and simulated data, our approach yields very good results, as can be seen in Section 4.

Related Work
The incompleteness of hyperspectral data cubes has different reasons and comes in different forms.In the first "traditional inpainting" case, the full spectrum has been measured correctly at most object locations, while at the other locations, nothing is known.A recent inpainting approach in this setting is [7,8], where missing lines from airborne push-broom scanning are inpainted by anisotropic diffusion.As seems inherent to the PDE approach, the information from other channels enters the restoration of some channel, apart from a normalization step, only in the strength coefficient for the diffusion.Another approach, suggested for example in the survey [5] by Bioucas-Dias et al., is to perform spectral unmixing first and then perform inpainting on the lower dimensional signal subspace.This reduces the computational cost, and the noise level is lower after the dimension reduction.Chen's survey [9] combines PCA with diffusion or TV inpainting methods, successfully inpainting small missing regions.
In the second case, there are few object locations where all of the information is lost, but for many object pixels, only part of the spectral information is available.A typical application are hypercubes acquired by so-called push-broom scanning or line cameras with missing information due to faulty sensor pixels in the area sensor.One can attempt to use the methods from the traditional inpainting case.Furthermore, channel-wise inpainting works well if only a few neighboring lines of pixels are missing.If, however, missing regions become large, its performance degrades.There are inherent problems to channel-wise methods: The affected region may contain spectra that are not in the boundary of the missing region; it may contain intensities smaller than the minimum or larger than the maximum occurring on the boundary of the missing region in that channel or the missing region may have contour lines not meeting the boundary.If we are in one of these settings, channel-wise diffusion-based inpainting cannot be successful.
In the context of accelerated hyperspectral image acquisition, the authors of [10] try to recover the full cube from a partially-measured cube with a DCT sparsity prior on the spectra.As spatial regularization, they impose spatial sparsity w.r.t. a shift-invariant redundant wavelet basis.
So far, all approaches concentrate only on inpainting of corrupted hyperspectral images.To obtain the abundances, a second unmixing step would be required.An attempt for a joint unmixing and inpainting using beta priors for the coefficient vectors is [11].They start from a larger endmember dictionary found by other methods (using Hysime [12]) and encourage spatial redundancy by learning a basis of atom cubelets for the abundance cube.

Novel Mathematical Model
In this section, we derive our unmixing model from incomplete data, combining spatial regularization with the masked unmixing data term.The model allows for the inpainting of missing data during the unmixing.

Unmixing Model with Spatial Regularization and Proposed Model
We start from Model (4) and add spatial regularization.If the noise W is independently Gaussian, the maximization of likelihood in Equation ( 4) leads us to seek X as: argmin where for a matrix Y, the square of its Frobenius norm is Y 2 F := trace(YY ).In real-world images, neighboring pixels are likely to contain the same mixture of materials, i.e., the same abundances.Therefore, the recovery of abundances can be improved by adding a spatial regularity prior, penalizing too much variation of the abundances.This is of particular importance for noisy data.Hence, we now introduce a discrete total variation (TV) functional, see [13], which has become one of the most widely-used regularizers in image restoration models, because it preserves sharp edges.We define the discrete gradient operator, as in [14], by: Here, I n denotes the n × n identity matrix; the Kronecker product of two matrices A ∈ R α 1 ×α 2 and B ∈ R β 1 ×β 2 is given by the matrix: and: denotes the forward difference matrix.Note that the zero row corresponds to mirrored boundary conditions.For an image F ∈ R m×n , column-wise reshaped into a vector f ∈ R N , let: Throughout this paper, | • | operates on a matrix by replacing, at each pixel, the vector of partial derivatives by its euclidean norm.
Recall that the reshaped abundance image of the r-th material, is contained in the r-th row xr of Summing over all materials, we add the regularization term ∑ p r=1 TV( x r ) to Equation ( 5) and arrive at the following unmixing model for incomplete data: for regularization parameters ν ≥ 0 and λ > 0. This corresponds to the isotropic spatial TV-regularization with no coupling between the channels.As already mentioned, the simplex constraint is also known as ANC + ASC.
Remark 1.A related functional has been investigated in [15].Their model reads, for κ ≥ 0 and λ ≥ 0, argmin TV aniso ( x r ) subject to X ≥ 0 where X 1,1 := ∑ N j=1 x(j) 1 and: In contrast to our functional, this functional contains no mask, uses anisotropic TV, and the additional term X 1,1 encourages sparsity.In [15], sparsity is important, because the authors use a larger K with columns corresponding to spectra from a library.In the presence of the ASC, as in our model, this term is constant, We have also performed experiments with anisotropic TV, i.e., replacing TV in Equation ( 6) with TV ansio , and we comment on this briefly in Section 4.3.If not mentioned otherwise, all of our results are obtained for isotropic TV.
Lemma 1.The proposed Model (6) is a convex function of X.If ν > 0, then it is strictly convex in X, and therefore, has a unique minimizer in X ∈ ( p ) N .

Proof. The first term,
2 is convex, and for each r ∈ {1, . . ., p}, TV( x r ) is a convex function of x r .Therefore, the functional without the term ν 2 X 2 F is convex.The term 1 2 X 2 F is strictly convex; thus, for ν > 0, the functional is strictly convex.The set p and, hence, also the set ( p ) N , which incorporates the ANC and ASC constraints is closed, bounded and convex.The existence of a minimizer is ensured because a continuous function on a compact set attains its minimum.The minimizer is unique for ν > 0, because a strictly convex function has at most one minimizer on a convex set.

Model Reformulation
In this section, we rewrite the proposed Model (6) in a sound matrix-vector form to better apply algorithms from convex analysis.Reshaping the matrices in Equation (4) appropriately into vectors, i.e., the abundance matrix X = x(1) . . .x(N) , the data matrix Y = y(1) . . .y(N) ∈ R L×N and the mask M ∈ {0, 1} L×N into: . . .
we can write the model with related Kronecker products.We make repeated use of the relation: First note that: Next, from vec X∇ = (∇ ⊗ I p )x, we have: Inserting these equations in Equation ( 6) and writing the constraint with the help of an indicator function: we obtain the vectorized unmixing model for incomplete data: This can equivalently be written as: )

Algorithm
We use the primal-dual hybrid gradient algorithm (PDHGMp) [16,17] to solve Equation (6") by finding a saddle point of the associated Lagrangian: This leads to Algorithm 1, which is guaranteed to converge to a saddle point of the Lagrangian by [18] [Theorem 6.4].All steps can be computed very efficiently.In Step 1, we compute an orthogonal projection of x (r) /(1 + ντ) to ( p ) N .In Step 2, for v (r+1) , we compute a coupled-shrinkage of (∇ ⊗ I p )x (r+1) + b (r) v , with coupling within each pair of entries corresponding to the forward differences in both spatial directions at one pixel.The "Kronecker product-vector" multiplications in Algorithm 1 are implemented in "matrix-matrix" form, using Relation ( 7), e.g., for (∇ ⊗ I p )x, we compute X∇ .
The number of elementary operations required for one iteration of the algorithm is linear in the number of entries of the hypercube, for constant p.In more detail, each iteration requires: N vector projections to p , which are computable with O(p 2 ) operations each; the coupled shrinkage of N p pairs of entries, computable in constant time; further, (2p + 1)NL multiplications and the same order of additions.
Remark 2. (Parameters and initialization) As mentioned above, the term ν 2 X 2 F ensures strict convexity of the model.We set ν = 10 −3 in all experiments below.In practice, setting ν = 0 works equally well, e.g., setting ν = 0 changes the recovery percentages in Section 4.3, Table 1, by less than 0.1%, where more than 0.3% of the pixels are known.
The parameters τ and σ influence only the convergence speed of the algorithm.They could be chosen adaptively [19]; here, we fix them both to (max , which ensures the required bound on the product τσ.
The remaining regularization parameter λ is chosen heuristically taking the strength of the noise and the size of the image features into account.
As initialization for X, we start with a uniformly random matrix and then normalize, so that the columns sum to one.Finally, we always use zero initialization for u (0) , v (0) , b v .

Numerical Results
We start with a brief introduction to hyperspectral image acquisition by a hyperspectral line camera in Section 4.1, before giving results on real data in Section 4.2 and on simulated data in Sections 4.3-4.5.

Hyperspectral Line Camera
A standard approach in hyperspectral image acquisition is to measure lines of the image one by one simultaneously at all wavelengths.In Earth observation, this is referred to as push-broom scanning.The method is also used in industrial applications, such as plastics sorting for recycling purposes.A typical camera is shown in Figure 1 (left).
At any time, one line of the image is measured at all wavelengths simultaneously.The incoming light from the current line is diffracted by an optical grid onto the area sensor of the camera (see Figure 1 (right)) and recorded as one "sensor frame".The full 2D object is measured by moving the object relative to the camera.While the lines add up to the full 2D object, the sensor frames with spectral and along-the-line directions are stacked along the second image direction to form the full 3D hypercube.
The snapshot of the area sensor taken for the line y = y j of the object yields a section Y cube :,y j ,: of the hypercube Y cube ∈ R m,n,L as depicted in Figure 2 (right), where the z-axis is the spectral direction.The manufacturing process of the sensor commonly produces some defect pixels.For push-broom scanning, each sensor frame Y cube :,y j ,: has the same pattern of missing pixels.One missing pixel thus creates a missing line, and a cluster of missing pixels creates a cylinder of missing entries in the hypercube.

Numerical Results for Real Data
In this section, we present results for a hyperspectral image measured at the Fraunhofer Institute for Industrial Mathematics ITWM with the line camera shown in Figure 1, comparing the restoration obtained as a by-product of our unmixing to a traditional inpainting.
We have measured the marked region in Figure 2 (left), in the wavelength range 1073-2300 nm, at a spectral resolution of L = 256 channels.The four regions contain plastic mixtures with different spectral signatures, and the assembly has been covered with a plastic foil.
Figure 3c shows the mask of working pixels with a few circular regions of corruption artificially added, marked in violet as "simulated defects".The small circle marked in red is a real defect, and a camera with such a defect is sold at a considerably reduced price.Slices Y cube :,:,z , z = 1, 2, . . ., L of the hypercube corresponding to a particular wavelength are called channels.Figure 5a shows Channels 10, 70, 90.Here, Channel 10 is noisy and affected by individual broken sensor pixels, each creating one missing line.Figure 5b shows the same channels, after the artificial defects have been added, i.e., masked, and while Channel 10 stays unchanged, we see that not much remains of Channels 70 and 90.
For the unmixing, we give to Algorithm 1 the hypercube together with the mask and the four pure spectra present in the scene, i.e., p = 4, as columns of K.These have been obtained from averaging spectra over manually selected rectangles and are shown in Figure 6a.For the comparison, we take the following approach: Starting from the unmixing coefficients X obtained by minimizing Equation ( 6) with Algorithm 1, we form the product KX, which is an approximating restoration of the data matrix Y.After reshaping, we compare this restoration to an inpainting of the hypercube Y cube by [20].
We have chosen the Navier-Stokes-based inpainting [20] as representative of neighborhood-based inpainting methods.The Navier-Stokes inpainting is performed in each x-z-plane of the hypercube and estimates missing data looking at surrounding pixels in that plane.In contrast to the proposed method, such neighborhood-based inpainting of the data cube lacks the capacity of utilizing the provided information about pure spectra.Runtime is several minutes.For Algorithm 1, we used the parameters from Remark 2 and λ = 0.01.The relative primal step x (r+1) − x (r) 2 / x (r) 2 fell below 10 −3 after 111 iterations, which took 7 s on an Intel Core i7 with 2.93 GHz.The graphics shown are after 1000 iterations.
In Figures 4 and 5, we compare the performance of Algorithm 1 to a Navier-Stokes inpainting of each x-z-plane of the hypercube.
For larger clusters of defect pixels, the inpainting of the hypercube by Navier-Stokes is not satisfactory: looking at Figure 4, the large masked region could still be guessed from the images inpainted with Navier-Stokes.On the other hand, the inpainted sensor images obtained from our method in Figure 4c,f agree well with the measured sensor frames in Figure 3a,b, also removing some noise, which was not contained in the mask.
In Figure 5, we see that the broader missing stripes in Channels 70 and 90 cannot be restored by Navier-Stokes inpainting and, hence, would introduce errors to any following unmixing step, whereas our joint unmixing remains unaffected and gives a satisfactory (denoised) restoration of the original, because the information from all intact channels is being used.

Numerical Results for Artificial Data (Pure Regions)
We have seen that our unmixing model performs well even if large parts of the hypercube are missing.To analyze which percentage of sensor pixels can be missing, i.e., to quantify the unmixing results in some way, in this section, we use an artificial input image with a small level of noise added.
The artificial image is constructed as follows.The j-th of the four regions of the piecewise constant 240 × 148 ground truth image in Figure 7 (left), is filled with copies of the j-th endmember spectrum.The four endmember spectra, which form the p = 4 columns of the matrix K, are shown in Figure 7 (right).The spectra come from our measurements of common plastic blends.The resulting hypercube Y pure belongs to R 240×148×256 .We add independent zero-mean Gaussian noise to each entry Y pure ijr , obtaining the hypercube Y low .Here, the standard deviation was taken slightly larger than 1% of the maximal entry of Y pure .For several percentages from 100% down to 0.1%, we randomly generate sensor masks of size 240 × 256 with approximately this percentage of working pixels, by flipping a biased coin for each pixel.Figure 8a shows the upper left quarter of the sensor mask for 3% working pixels.Note that the percentage of known sensor pixels and the percentage of entries of Y cube which are known to the unmixing algorithm, are the same, because each masked sensor pixel corresponds to one line in the hypercube along the y-axis.
For each percentage and corresponding mask applied to Y low , we find the minimizer X of Equation ( 6) by Algorithm 1 with λ = 0.1 and other parameters as in Remark 2; running 300 iterations took an average of 110 s per image.Then, we reshape X into the cube X low of unmixing coefficients.To quantify the quality of X low , we assign to each image pixel (i, j) the endmember corresponding to the largest of the four unmixing coefficients X low ij1 , . . ., X low ij4 at that image pixel.Some of the resulting label images are shown in Figure 8b.Table 1 lists the percentages of correctly assigned pixels depending on the percentage of known sensor pixels.For more than 10% working sensor pixels, the algorithm found the largest material abundance at the correct material for 100% of the image pixels.Furthermore, in Table 1, we give the results from minimizing the model obtained by replacing isotropic TV in Model (6) with anisotropic TV, as introduced in Remark 1.The results are slightly worse, though not much.Next, we construct a test image comprising patches of linear mixtures of the endmembers, to meet the real-world scenario of mixed pixels.We take again the four endmembers shown in Figure 7, hence again p = 4.As shown in the false color visualization of the ground truth in Figure 9 (left), the corners are filled with pure spectra, and along the sides of the image, we form linear mixtures.Then, a larger amount of noise, with a standard deviation of about 10% of the maximum, is added to the hypercube, and only about 10% of the pixels are retained.As in the previous section, we do this by first sampling a sensor mask under the assumption that each sensor pixel works with a probability of 10% and then simulating the line camera measurement.In Figure 9, we plot Channel 40 of the hypercube, in the middle before masking, and on the right, as known to the algorithm.

Numerical Results for Non-Occurring Endmembers
In applications, some of the expected materials might not be present in the image.We proceed as in the previous Section 4. 4, except that the endmember matrix K now contains the eight spectra shown in Figure 6b (p = 8), of which four are chosen and mixed using the same abundances as in Section 4.4.The abundances obtained from unmixing 10% of the hypercube are shown in Figure 11.Again, the true abundances are perfectly recovered.

Conclusions
In this paper, we have introduced a novel model for hyperspectral unmixing from incomplete and noisy data, provided that an estimate of the signal subspace is available.The model applies if an arbitrary part of the hypercube is known, e.g., a random selection of entries.
The model allows unmixing in one step from incomplete data cubes, having larger regions of missing data than could be restored by preprocessing methods unaware of the signal subspace.
We demonstrated results from line cameras, for which the mask of known entries is structured: lines along the second spatial direction of the hypercube are either fully known or fully unknown, which leads to a more difficult inpainting problem compared to, say, a random distribution of known entries in the hypercube.For large missing regions, where traditional Navier-Stokes inpainting failed, the data term approximation, obtained by our unmixing, provided a good inpainting.
We simulated artificial data with this special structure of the mask.For an image composed of pure materials, knowing only 3% of the sensor pixels in the simulated measurement, the rounded unmixing is a 99.5% correct assignment of pixels to materials.For a mixed and noisy image, knowing 10% of the sensor pixels, we obtain a visually perfect recovery of the abundances.
Our results show on real data that unmixing results can be perfect in spite of missing regions in the hypercube, which are orders of magnitude larger than for current line cameras.This shows the potential for further applications, such as those arising from novel image acquisition techniques.That the variational model is simple and can be easily extended is certainly a further benefit.

O b j e c tFigure 1 .Figure 2 .
Figure 1.Hyperspectral line camera (left) and principle of measuring a line simultaneously at all wavelengths (right).

Figure 3 .
Figure 3. (a,b) Two sensor frames y = y 0 with the spectral direction along the z-axis and (c) the mask of working sensor pixels.

Figure
Figure 3a,b shows two sensor frames y = 30, 60 of the measured hypercube.Being snapshots of the area sensor, they have the same pattern of missing pixels.Figure 4a,d shows the same sections with the artificially added circular defects.Slices Y cube :,:,z , z = 1, 2, . . ., L of the hypercube corresponding to a particular wavelength are called channels.Figure5ashows Channels 10, 70, 90.Here, Channel 10 is noisy and affected by individual broken sensor pixels, each creating one missing line.Figure5bshows the same channels, after the artificial defects have been added, i.e., masked, and while Channel 10 stays unchanged, we see that not much remains of Channels 70 and 90.For the unmixing, we give to Algorithm 1 the hypercube together with the mask and the four pure spectra present in the scene, i.e., p = 4, as columns of K.These have been obtained from averaging spectra over manually selected rectangles and are shown in Figure6a.

Figure 5 .
Figure 5. Channels 10, 70, 90 of: (a) the noisy original hypercube; (b) the masked original known to the algorithm; (c) the restoration by Navier-Stokes; and (d) the restoration by our method.

Figure 7 .
Figure 7.The artificial image is constructed by filling region j of the image on the left with the j-th endmember spectrum on the right (p = 4).

Figure 8 .
Figure 8.(a) Upper left corner of the 240 × 256 mask for 3% working sensor pixels; (b) label maps obtained from the unmixing by assigning to pixel (i, j) the index r of the largest coefficient X low ijr ∈ {X low ij1 , . . ., X low ij4 } at that location.

Figure 10 Figure 10 .
Figure10shows the obtained abundances, for parameters as in Remark 2 and λ = 0.3 after 918 iterations, which took 447 s on an Intel Core i7 with 2.93 GHz.The result is visually identical to the ground truth.

Figure 11 .
Figure 11.Unmixing result with eight endmembers of which four occur in the image; with 10% of the noisy hypercube known.The images show the estimated abundances X cube ij1 , . . ., X cube ij8 corresponding to the eight pure spectra.

Table 1 .
Percentage of image pixels, for which the largest of the material abundances found by the unmixing, correctly identifies the material at that pixel.