Variational Destriping in Remote Sensing Imagery: Total Variation with L 1 Fidelity

This paper introduces a variational method for destriping data acquired by pushbroom-type satellite imaging systems. The model leverages sparsity in signals and is based on current research in sparse optimization and compressed sensing. It is based on the basic principles of regularization and data ﬁdelity with certain constraints using modern methods in variational optimization - namely total variation (TV), both L 1 and L 2 ﬁdelity, and the alternating direction method of multipliers (ADMM). The main algorithm in this paper, TV- L 1 , uses sparsity promoting energy functionals to achieve two important imaging effects. The TV term maintains boundary sharpness of content in the underlying clean image, while the L 1 ﬁdelity allows for the equitable removal of stripes without over- or under-penalization, providing a more accurate model of presumably independent sensors with unspeciﬁed and unrestricted bias distribution. A comparison is made between the TV- L 1 and TV- L 2 models to exemplify the qualitative efﬁcacy of an L 1 striping penalty. The model makes use of novel minimization splittings and proximal mapping operators, successfully yielding more realistic destriped images in very few iterations.


Introduction
Image striping is a well-known phenomenon that arises in multi-detector imaging systems ranging from pushbroom-type instruments, such as the Airborne Multi-angle Spectro Polarimetric Imager (AirMSPI), to atomic force mircroscopy (AFM).Biases in lateral detection occur as a result of response variation in spatial detectors, such as in satellite imaging systems, or temporal changes, such as in raster scans.Although these systems are optimally precalibrated, post-processing, such as destriping, of data is a prerequisite for accurate and valid analyses.Striping removal has been traditionally performed using either statistically based methods [1,2] or low-pass filtering in the frequency domain [3][4][5][6][7].These methods, however, do not remove stripes completely and have effects of blurring the image.More recently, wavelet-based filtering methods have been proposed [8][9][10].However, such methods also blur the images and produce ringing effects in reconstruction.
In [11], the authors introduced a robust destriping algorithm with a spatially adaptive unidirectional total variation (TV) model.The authors developed a new destriping method that combines the TV-Stokes model and unidirectional TV model in [12].In [13], the authors introduced a TV and framelet regularization model for destriping.The same authors proposed an anisotropic spectral-spatial TV regularization to enhance the smoothness of the solution along both the spectral and spatial dimension in [14].In [13,14], the authors employed the L 2 norm as the fidelity term, and we refer to these models as TV-L 2 in this paper.
We follow the pedigree of variational and partial differential equation (PDE)-based methods applied to images [15,16] in order to construct a well-defined, optimizable model yielding fast and high-quality destriping.The focus of our paper is not on creating a sparse wavelet representation of the destriped image, but rather on how to remove the optimal striping mask while preserving high image fidelity.In this paper, we propose using the L 1 penalty for striping size.
Our research is robust to both isotropic and anisotropic versions of TV, whereas [13] argue that the anisotropic case is the only appropriate case.While it is true that the anisotropic case uses decoupled energy for the measure of smoothness and is therefore easier to minimize, isotropic TV is not selective in which direction smoothness is penalized.Image content smoothness (or lack thereof) is not known a priori, and thus no preference should immediately be given to certain directions for evaluating smoothness.Using the L 1 penalty, and depending on the data, the isotropic TV, which theoretically uses more local information, allows for qualitatively better, less invasive and more intelligent destriping.We compare the L 1 and L 2 penalties, ultimately favoring the L 1 as a result of a wider yet tighter distribution of the striping mask.We include detailed derivations and a motivated evolution of the optimization problem with pedagogy in mind so that the proposed TV-L 1 method, along with its alternating direction method of multipliers (ADMM) (split Bregman) optimization, can be accessible to all academic disciplines involved with image processing.
We construct a variational model that is well-defined, qualitatively motivated, and easily minimized.The constructed energy uses sparsity-promoting energy functionals, on the basis of TV and L 1 energy, to achieve minimally invasive destriping.Both isotropic and anisotropic TV, along with L 1 energy, are considered in our variational model.The ADMM is used in conjunction with nonlinear proximal operators to efficiently optimize the energy, yielding quick and high-quality results.

Materials and Methods
In this section, we describe the striping problem, provide the background and motivations, discuss tools from functional analysis to be used in describing the model, develop the proposed TV-L 1 model, and provide the algorithm.

Striping Structure
Let U(x, y) be a stripe-free image of size R by C, and let G(y) be a multiplicative stripe noise of length R. Then the observed image, F, can be written as Taking logarithms of both sides of Equation ( 1) yields an additive structure more suitable for energy minimization methods.The model can now be written as where f (x, y) = ln(F(x, y)), g(x, y) = ln(G(x, y)), and u(x, y) = ln(U(x, y)).
Striping in images can be viewed as structured noise, of which variations are mainly concentrated along one axis.This can be mathematically encoded as ∇ x G ∇ y G , or with the logarithmic terms, as ∇ x g ∇ y g .

Tikhonov Minimization
A classical Tikhonov minimization problem would consist of a smoothness regularizer and a data fidelity term, both easily differentiable, with the striping constraint [17]: This constraint can be simplified by the approximation that ∇ x G(x, y) = 0 ∀(x, y), which would make G(x, y) = G(y), and g(x, y) = g(y) functions of only one variable.Using the additive identity between f , g, and u along with the constraint approximation, the new unconstrained minimization problem is By taking the first variation of the energy and setting it to zero, the closed-form solution to this minimization problem is However, this solution would cause g to become bivariate, in contradiction with the constraint.Instead, using the Cartesian regularity of our rectangular domain Ω = I x × I y and using g(x, y) = g(y), we can come to a solution that is in agreement with the constraint by integrating with respect to x: Ω g(x, y)dx = I x g(y)dx = g(y)

Fourier Interpretation
Utilizing the Plancherel Fourier isometry, the solution can be interpreted in spectral form as ĝ(ω y ) = 1 For a specific x, the stripping g(y) is constant and is of higher frequency, whereas the underlying clean image varies more slowly (has more low-frequency content) and for each x has somewhat different frequencies.Therefore, the average frequencies of the clean image are low in magnitude and are of lower frequency, while the average frequencies of the stripping are high in magnitude and are of higher frequency.Therefore, the average frequencies (averaged over ω x ) of the cleaned image are simply the average frequencies of the original image multiplied by a one-dimensional low-pass filter λ λ+z 2 .Likewise, the striping mask on the spectral side, ĝ, is obtained analogously to a one-dimensional high-pass filter z 2 λ+z 2 .Although this minimization problem is readily solvable in closed form and has a motivated physical interpretation, we must abandon the quadratic energy terms so that we may have less penalization for heavier striping and to allow for less smooth solutions.Although the differentiability of terms is easy to work with, enough optimization machinery has been developed that we may tread forward without it.We now investigate and outline some tools from signal processing in order to refine our model.Stripe and ring artifact removal from this frequency perspective has been accomplished using wavelet and Fourier filtering [18].

Tools from Functional Analysis
We now introduce the notations and tools from functional analysis that are used in Section 2.5.

Total Variation: Anisotropic vs. Isotropic
The idea of using TV as a regularizer and denoiser that promotes sparsity and piecewise constant smoothness can be traced back to Rudin, Osher, and Fatemi [19,20].We begin with defining the notion of TV, which is used as a regularizer in the model.
For a differentiable function f ∈ Ω, with Ω ⊆ R n , the TV of f can be written as The choice of vectorial norm inside the integral yields the two different types of TV.
The isotropic and anisotropic cases differ in terms of the geometries they each preserve.While the decoupled anisotropic TV preserves piecewise constant orthogonal structures, such as rectangular roofs, the coupled isotropic TV preserves piecewise constant radial structures, such as silos.Our model is robust with respect to either choice of TV, and dual derivations of variable updates are shown; however, the experiments and results are based on the anisotropic definition.

Shrinkage Proximal Operator
We introduce a splitting variable and quadratic penalty into the model.The solution to the l 1 -regularized least-squares problem: is given by the soft threshold proximal mapping operator, shrinkage [21,22]: , as in the anisotropic case of TV, the shrinkage is decoupled and done component wise.On the other hand, if as in the isotropic case, the terms are coupled and both components are updated simultaneously.Both variants have their merits; while the former is computationally simpler, the latter has the advantage of using more local information and may be more conformant to certain image processing applications.

TV-L 1 ADMM Model
We now describe our model using the motivations and tools introduced earlier.The two energy components of the minimization problem are the smoothness regularizer and the data fidelity term.The energy of the data fidelity term, λ , can be interpreted as the magnitude of the striping mask.The L 2 fidelity overly penalizes stripes of large magnitude and likewise underexaggerates the significance of stripes of small magnitude.In areas of no striping, we intend our (logarithm of the) striping mask to be very close to zero, while in areas of heavy striping, we wish to remove such striping and thus yield a larger magnitude of our striping mask in that region.The L 1 fidelity gives a smaller striping mask in areas of no striping, leaving enough energy to remove the heavier striping in localized areas of the image.Because there is no prior knowledge of the distribution of the stripes and qualitatively we may wish to remove deep striping effects while preserving sharp geometry, we believe it is better to update the model with an L 1 striping penalty, g 1 .
An L 2 gradient term would cause oversmoothing of the retrieved clean image u(x, y).This could cause a loss in boundary sharpness of elements in the image (e.g., lakes, rooftops, etc.), which seems important in the pursuit and usage of destriped images.Implementing a TV-based regularizer would give performance similar to the L 2 gradient but maintains boundary sharpness more natural to the underlying image.Although these terms are not differentiable, impeding a closed-form solution, state-of-the-art nonlinear optimization algorithms are available for fast convergence to qualitatively meaningful minimizers.The unconstrained TV L 1 model (TV-L 1 ) is min u V(u(x, y), I x × I y ) + λ||u(x, y) − f (x, y) 1 or equivalently, minimizing with respect to the striping mask g, For the purpose of application and computation, we now move the problem into a discrete setting.
Let Ω = {x 1 , ..., x C } × {y 1 , ..., y R } be an R × C image.First variations are approximated via forward differences, so that We take Neumann boundary conditions, so that on the forward boundary (i = C or j = R), the derivative is set to zero.
The isotropic TV is The anisotropic TV is With these discrete operators defined, the discrete unconstrained TV-L 1 minimization problem is The two types of the minimization problem are the following: Anisotropic:

. Augmented Lagrangian
Anisotropic: With the discrete forward difference approximation matrix defined above, we can rewrite the minimization problem as follows: To render the constrained minimization problem unconstrained, we introduce auxiliary variables, Lagrangian multipliers (split Bregman), and quadratic penalty terms, so that the augmented Lagrangian is defined as follows: We now solve the unconstrained saddle point problem: The solution to the original constrained minimization problem is now found as the saddle point of the augmented Lagrangian L in a sequence of iterative suboptimizations called the ADMM [23][24][25][26][27][28].
The splitting variables b i and h are updated by the proximal mapping operator: Because of the introduction of the splitting variables, g is only contained in quadratic terms and thus is easily solved for The Langrangian multipliers (split Bregman variables) are updated through gradient ascent: Isotropic: As a result of the coupling of the terms in this version of the minimization problem, we cannot compactly write the problem with matrices as above; however, the solution is just as readily available.Here the denotes the Hadamard matrix power operator, which acts pointwise on the matrix.
Point Form: Matrix Form: Just as before, we introduce splitting variables and Lagrangian multipliers to form the augmented Lagrangian: L α,λ (a i,j , b i,j , h, g, p i,j , q i,j , r) = ∑ i,j The splitting variables a i,j and b i,j are updated by the vectorial proximal mapping operator: = arg min Each component of the vector is updated via shrinkage as follows: The splitting variable h, the striping mask g, and the Lagrangian multipliers are updated as before due to the common structure between the two models: )) We now describe the algorithm for the TV-L 1 model (see Algorithm 1).The first algorithm (anisotropic) is presented in vector form.The second algorithm (isotropic) is presented in matrix form.

Algorithm
Algorithm 1 Complete ADMM optimization of TV-L 1 .
1: Initialize: case Anisotropic: 6: Update splitting variable for smoothness regularizer term via shrinkage: Update Lagrangian multiplier for regularizer term via dual ascent: end for 10: case Isotropic: 11: Update splitting variables for smoothness regularizer term via vectorial shrinkage: Update Lagrangian multipliers for regularizer term via dual ascent: )) q n+1 i,j ← q n i,j + τα(b n+1 i,j − δ y (g n j − f i,j )) 14: end for 15: Update splitting variable for data fidelity term via shrinkage: Update striping mask: Update Lagrangian multiplier for data fidelity term via dual ascent: 18: Update energy terms: 19: until convergence:

Results
In our experiments, we used data acquired by the AirMSPI.The AirMSPI is an airborne prototype instrument similar to that of the future satellite-borne MSPI instrument for obtaining multi-angle polarization imagery [29].The instrument was built for NASA by the Jet Propulsion Laboratory in Pasadena, California and has been flying aboard the NASA ER-2 high altitude aircraft since October 2010.
The AirMSPI is an eight-band (355, 380, 445, 470, 555, 660, 865, and 935 nm) pushbroom camera, measuring polarization in the 470, 660, and 865 nm bands, mounted on a gimbal to acquire multi-angular observations over a ±67 • along-track range.Two principal observing modes are employed: step-and-stare, in which 11 km × 11 km targets are observed at a discrete set of view angles with a spatial resolution of ∼10 m; and continuous sweep, in which the camera slews back and forth along the flight track within ±67 • to acquire wide-area coverage (11 km swath at nadir; target length of 108 km) with a ∼25 m spatial resolution.
Step-and-stare provides more angles, but continuous sweep gives greater coverage.Multiple observing modes can be programmed into the instrument and activated under cockpit control.Multi-angle radiance and polarization imagery from the AirMSPI will provide 3D scene context where clouds and aerosol plumes are present.It will also enable the retrieval of aerosol and cloud macrophysical properties (distribution and height), microphysical properties (size distribution, single scattering albedo, and shape), and optical depth.
We first compare destriping results generated using TV-L 2 and the proposed TV-L 1 models.Figure 1 shows the 355 nm ultraviolet (UV) channel image with stripes captured by the AirMSPI instrument from the nadir angle at Mojave, California.The image has been destriped using the TV-L 2 and TV-L 1 models.As we can see from the destriped images and corresponding differences between the captured and destriped images, the TV-L 2 model does not preserve radiometric intensities in regions where no stripes are present.Figure 2 (left) shows plots of the recovered function G for the TV-L 2 and TV-L 1 destriped images.The TV-L 1 recovered function G is closer to the identity, particularly for the rows with no stripes, suggesting the TV-L 1 reconstruction is more accurate than the TV-L 2 reconstruction.Figure 2 (right) shows plots of sums over all rows of the original image with stripes (from Figure 1), as well as sums of rows for the TV-L 2 and TV-L 1 destriped images.These plots indicate that the TV-L 1 model preserves radiometric intensities of the original images better, and the TV-L 2 model produces more artificial smoothing throughout the image.Figure 3 shows histograms of the function G for the TV-L 2 and TV-L 1 reconstructions.The TV-L 1 reconstruction is pointier than the TV-L 2 reconstruction.It is also centered at 1, as opposed to the TV-L 2 reconstruction, which further indicates better accuracy of the TV-L 1 model.We note that the true stripes, at around G ≈ 0.95, are represented in the histograms by small bumps.

Discussion
We have provided detailed derivations and a motivated evolution of the optimization problem so that the proposed TV-L 1 method can be accessible to all academic disciplines involved with image processing.We have presented a variational model that is well-defined, qualitatively motivated, and easily minimized.The constructed energy uses sparsity-promoting energy functionals, on the basis of TV and L 1 energy, to achieve minimally invasive destriping.Both isotropic and anisotropic TV, along with L 1 energy, are considered in our variational model.The ADMM is used in conjunction with nonlinear proximal operators to efficiently optimize the energy, yielding quick and high-quality results.
The L 2 fidelity overly penalizes stripes of large magnitude and likewise underexaggerates the significance of stripes of small magnitude.In areas of no striping, we intend our striping mask to be very close to the identity, while in areas of heavy striping, we wish to remove such striping and thus yield a larger magnitude of our striping mask in that region.The L 1 fidelity gives a smaller striping mask in areas of no striping, leaving enough energy to remove the heavier striping in localized areas of the image.Because there is no prior knowledge of the distribution of the stripes and qualitatively we may wish to remove deep striping effects while preserving sharp geometry, we conclude that the model with an L 1 striping penalty is more effective.
The TV-L 1 model preserves radiometric intensities of the original images better, and the TV-L 2 model produces more artificial smoothing throughout the image.TV-L 1 recovers the function G that is closer to the identity, particularly for the rows with no stripes, suggesting that the TV-L 1 reconstruction is more accurate than the TV-L 2 reconstruction.The histograms of the function G for the TV-L 1 reconstruction are more compact around 1 than those for the TV-L 2 reconstruction.They are also centered at 1, as opposed to the TV-L 2 reconstruction, which further indicates the better accuracy of the TV-L 1 model.

Conclusions
In this paper, we have presented a novel variational method for image destriping through fast minimization techniques of appropriately modeled energy functionals, namely TV and the L 1 data fidelity term.Our destriping model solves the inverse problem as follows: it minimally removes a univariate multiplicative striping mask from the data, such that the clean image is somewhat smooth and the removed stripe has low energy.We assess the smoothness of the clean image using the TV, which maintains sharp image features and preserves definition and contrast.We address both isotropic and anisotropic TV in this paper, each having their respective strengths and weaknesses.We use L 1 and, for comparison, L 2 energy to measure the removed striping, ensuring minimal data removal and thus a clean image of high fidelity.
The variational problem is solved efficiently using an ADMM approach: it introduces splitting variables and quadratic penalties for deviations from the splitting variables to allow for efficient optimization via proximal shrinkage operators, explicit quadratic solutions, and simple gradient ascent for the Lagrangian multipliers.In our experiments, we have shown that the proposed TV-L 1 method yields qualitatively good results and removes minimal masking, and it does so quickly in terms of both iterations and time.From the histogram distributions of G, we observe a narrower spread around 1, yet a wider, more equidistributed support, suggesting that most of the time, there is minimal masking removal (multiplier close to 1); however, in areas of heavy striping, the destriping effect is more prevalent and is of greater magnitude.
Applications of this algorithm are not limited to satellite imagery, and they may be analogized to other fields, such as raster scans in microscopy.Any scientific measurements (of images) made mostly along a curve-parameterizable by a single dimension-may be susceptible to such striping biases and may be a candidate for similar destriping.Future work will expand this model to multi-modal images and color images, and it may incorporate other specific priors on the data.

Figure 1 . 1 Figure 2 .Figure 3 .
Figure 1.(a) The 355 nm channel image with stripes captured by Airborne Multi-angle Spectro Polarimetric Imager (AirMSPI) instrument from nadir angle at Mojave, California.(b) Destriped image using total variation TV-L 2 model.(c) Difference between captured image from (a) and TV-L 2 destriped image from (b).(d) Destriped image using TV-L 1 model.(e) Difference between captured image from (a) and TV-L 1 destriped image from (d).

Figures 4 -
Figures 4-6 display similar results to those shown in Figures 1-3 for the 355 nm channel image with stripes depicting clouds over the Pacific Ocean captured by the AirMSPI instrument from the 66.0 • F angle.

Figure 4 .
Figure 4. (a) The 355 nm channel image with stripes depicting clouds over the Pacific Ocean captured by Airborne Multi-angle Spectro Polarimetric Imager (AirMSPI) instrument from 66.0 • F angle.(b) Destriped image using total variation TV-L 2 model.(c) Difference between captured image from (a) and TV-L 2 destriped image from (b).(d) Destriped image using TV-L 1 model.(e) Difference between captured image from (a) and TV-L 1 destriped image from (d).

Figure 5 .Figure 6 .Figure 7 .Figure 8 .Figure 9 .Figure 10 .Figure 11 .
Figure 5. Comparisons of total variation TV-L 2 and TV-L 1 destriping for the results shown in Figure 4. On the left, plots of recovered function G for TV-L 2 destriped image (blue) and TV-L 1 destriped image (red) are shown.TV-L 1 recovered function G is closer to the identity, particularly for the rows with no stripes, suggesting TV-L 1 reconstruction is more accurate than TV-L 2 reconstruction.On the right, plots of sums over all rows of original image with stripes (black), TV-L 2 destriped image (blue), and TV-L 1 destriped image (red) are shown.These plots indicate that TV-L 1 model preserves radiometric intensities of the original images better, and TV-L 2 model produces more artificial smoothing throughout the image.