An Investigation of Smooth TV-Like Regularization in the Context of the Optical Flow Problem

Total variation (TV) is widely used in many image processing problems including the regularization of optical flow estimation. In order to deal with non differentiability of the TV regularization term, smooth approximations have been considered in the literature. In this paper, we investigate the use of three known smooth TV approximations, namely: the Charbonnier, Huber and Green functions. We establish the maximum theoretical error of these approximations and discuss their performance evaluation when applied to the optical flow problem.


Introduction
TV-based optical flow can be cast into the following form: min w D(w) + α ∇w 1 , where w ∈ R N is the optical flow, D : R N → R is a data energy function, . 1 is the l 1 norm, and α > 0 is a regularization parameter weighting the relative importance of data and smoothing terms.Although the TV semi-norm has been useful for performing edge-preserving regularization [1][2][3][4][5][6][7][8][9], it is known to be numerically difficult to handle.Despite its convexity, it is not linear, quadratic or even everywhere-differentiable. Thus, the non-smoothness of this term prevents a straightforward application of gradient based optimization methods.
To remedy the problem of non-differentiability of the l 1 norm function x → x 1 , there are two solutions.First, we can split x = y − z where y i = max{x i , 0} and z i = max{−x i , 0} for i = 1, . . ., N. The l 1 norm of x will be then equal to ∑ N i=1 (y i + z i ), which will remove non-differentiability at zero but unfortunately the problem dimension will be doubled and the optimization will become constrained since y and z should be positive.The second remedy, which we are investigating in this paper, is to replace the l 1 norm by a smooth approximation.
Any smooth TV regularization should address two issues: (1) It should remove the singularities that are caused by the use of TV regularization; (2) It should maintain the preservation of motion boundaries.Several smooth approximations of the l 1 norm have been established in the literature for the regularization of a wide variety of problems [10][11][12][13][14][15][16][17], as well as for the optical flow problem [2][3][4][5].
To our knowledge, there is no theory that establishes optimality of any of these approximations; the best choice is application dependent.For instance, Nikolova and Ng [17] have considered different smooth TV approximations in the context of restoration and reconstruction of images and signals using half quadratic minimization.Our objective in this paper is to investigate the use of three known smooth TV approximations, namely: the Charbonnier, Huber and Green functions for the case of optical flow computation.
The outline of this paper is as follows.Section 2 describes the variational formulation of the optical flow problem.In particular, we present the TV regularization model for dense optical flow estimation.In Section 3, we consider three smooth approximations of the TV regularization term and discuss their maximum theoretical error of approximation.Section 4 concerns the performance evaluation of the three approximations in terms of the quality of the estimated optical flow and the speed of convergence by using the Middlebury datasets.Finally, we conclude our work in Section 5.

Variational Formulation
Let us consider a sequence of gray level images I(t, x, y), t ∈ [0, T], (x, y) ∈ Ω, where [0, T] is the temporal domain and Ω denotes the image spatial domain.We will use both continuous and disc in time at frame numbers t k = k∆t, k = 0, . . ., K and in space at pixel coordinates p = (m, n), with m (respectively n) corresponds to the discrete column (respectively row) of the image, being the coordinate origin located in the top-left corner of the image.With this notation, I[k, m, n] denotes a discrete representation of I(t, x, y).We will also use both continuous and quantized image intensities and the same symbol I will be used for both of them.
Assuming that the gray level of a point does not change over time we may write the constraint where (x(t), y(t)) is the apparent trajectory of the point (x(0), y(0)) = (x, y).Taking the derivative with respect to t and denoting (u(t, x, y), v(t, x, y)) = (x (t), y (t)), we obtain the linear optical flow constraint The vector field w(t, x, y) := (u(t, x, y), v(t, x, y)) is called optical flow and I t , ∇I = (I x , I y ) denote the temporal and spatial partial derivatives of I, which are computed using high-pass gradient filters for discrete images.Clearly, the single constraint (2) is not sufficient to uniquely compute the two components (u, v) of the optical flow (this is called the aperture problem) and only gives the component of the flow normal to the image gradient, i.e., to the level lines of the image.As it is usual, in order to recover a unique flow field, some prior knowledge about it should be added.For that, we assume that the optical flow varies smoothly in space, or better, that is piecewise smooth in Ω.This can be achieved by including a smoothness term of the form where G : R 2 × R 4 → R is a suitable function.
Both data attachment (1) and regularization term (3) can be combined into a single energy functional where the data functional D is either equal to the linear term or the nonlinear term and α > 0 is a regularization parameter.When using the linear data term (5), the case G(∇I, ∇w) = ∇w 2 = ∇u 2 + ∇v 2 corresponds to the Horn-Schunck model [18] and the case G(∇I, ∇u, ∇v) = trace((∇w) T M(∇I)∇w), where M(∇I) = 1 ∇I 2 +δ (∇I) ⊥ (∇I) ⊥T + δ1 d and δ ≥ 0, corresponds to the Nagel-Enkelmann model [19].On the other hand, the TV regularization [1] became the most used in image processing because it allows for discontinuities preserving.In this case, for w ∈ L 1 (Ω), we have where C ∞ c (Ω, R 4 ) is the space of infinitely differentiable vector-valued functions with compact support.Note that when TV(w) < ∞, the distributional derivative dw of w is a vector-valued Radon measure with total variation |dw|(Ω) = TV(w).When w ∈ W 1,1 (Ω, R 2 ), the TV semi-norm reduces to the L 1 -norm of the gradient Ω ∇w dxdy so that G(∇I, ∇w) = ∇w , where we can have either ∇w = ∇u 2 + ∇v 2 or ∇w = ∇u + ∇v .We have chosen the first one because the Euclidean norm is known to be rotationally invariant.
For a full account of regularization techniques for the optical flow problem and the associated taxonomy, we refer to [20,21].
In this paper, we will consider a TV regularization model which is written in the discrete form as follows: where w p,q = (u q − u p ) 2 + (v q − v p ) 2 and N p is a set of neighbors of the pixel p.We will also combine the TV regularization with the nonlinear data term ( 6) used with a robust function ψ in order to remove outliers: The robust function used in this paper is where γ is a given threshold.

Smooth TV Regularization
In this section, we focus on approximating the non-smooth TV semi-norm ( 7) by a smooth function: where φ ε : R → R + is a smooth approximation of the absolute value function and ε > 0 is a small parameter adjusting the accuracy of this approximation.In this paper, we will consider variants choices of φ ε as illustrated in Table 1.
Notice that the regularization term in (9) with a function φ ε from Table 1 is a hybrid between the TV regularization (7) and the standard quadratic regularization [18].It takes the form of a quadratic or nearly quadratic for small values of the optical flow gradient and becomes linear or sublinear for large values of the optical flow gradient.In this way, this smooth regularization will retain the fast Laplacian diffusion inside homogeneous motion regions and its effect is substantially reduced near motion boundaries helping the preservation of these boundaries' edges.We should also note that the smaller the parameter ε is, the better the function φ ε approximates the absolute value function; and henceforth the better the smooth regularization (9) approximates the TV regularization (7).In practice, a very small parameter ε might cause numerical instabilities but such a choice is not really needed as the quadratic regularization is preferred inside homogeneous regions.
Table 1.Smooth approximations of the absolute value function and their derivatives.The graphs of these approximations are plotted versus the absolute value function near zero for ε = 0.01.

Charbonnier
Huber Green According to the discussion above, there are minimum conditions that should satisfy any approximation φ ε (see [10]): By simple calculus, it can be shown that the approximations we are considering in this paper, which are given in Table 1, satisfy the conditions in (10).Notice also that all these approximations are suitable for 1st order numerical convex optimization algorithms since they are all convex and differentiable.However, for 2nd order numerical optimization algorithms, the Charbonnier and Green functions are twice differentiable but the Huber function is not.In this case, the latter approximation is normally replaced by a twice differentiable function called the pseudo Huber approximation [12], which is the same as the Charbonnier function except for a vertical translation by ε.

Charbonnier Approximation
The first approximation is referred to as the Charbonnier penalty function [11].It was first used for optical flow in [5].This function is clearly strictly convex and infinitely differentiable.Moreover, we can easily prove that it approximates the absolute value function with an error at most equal to ε.
The Charbonnier TV regularization is therefore an approximation of the TV regularization of order ε.Let |Ω| be the total number of image pixels and |N | be the fixed size of each neighborhood N p which is used for the finite difference approximation of the optical flow gradient.(11).
Proof.Using the previous lemma, we get

Huber Approximation
The Huber function was initially used by Huber (see [13]) as an M-estimator in the field of robust statistics.Its use for optical flow computation was first discussed in [2].Later, it was used as a smooth approximation of the l 1 norm as in [16].The Huber function is clearly convex and continuously differentiable.We want to relate the Huber regularization in (9) to the TV and quadratic regularization.First, the following lemma shows that the Huber function approximates the absolute function with an error of order ε 2 .
This shows that the Huber TV regularization has a maximum theoretical error twice better than that of the Charbonnier TV regularization.

Green Approximation
The Green penalty function was originally used in [14] for the maximum likelihood reconstruction from emission tomography data as a convex extension of the Geman and McClure function [15].This penalty function was introduced for optical flow computation in [3,4].Again, this function is strictly convex and infinitely differentiable inheriting these properties from the log cosh function.Notice that we have translated the original Green function by a factor ε log 2, which is the maximum approximation error as shown by the following lemma.
Proof.Let s ∈ R. First, we have Therefore, whatever the sign of s, we get The Green TV regularization approximates the TV regularization with an order ε as well but the maximum error is slightly greater than that of the Huber TV regularization.

Experimental Results
We want to minimize the energy functional (4) where D is given by ( 8), R is given by ( 9) and φ ε is one of the smooth approximations presented in the previous section.We choose to adopt the discretize-optimize approach by applying a numerical optimization algorithm to this discrete version of the optical flow minimization problem.The problem is of a large-scale type and therefore we solve it using a multiresolution line search truncated Newton method as developed in [22,23].The method first builds a pyramid of images at different levels of resolution.It starts then at the coarsest level with a zero flow field and applies a number of iterations of the line search truncated Newton (LSTN) algorithm.Afterwards, the obtained coarse estimation is taken to the next fine level by bilinear interpolation.This process is repeated until reaching the finest level where a good initial estimate of the optical flow is obtained and henceforth refined by the LSTN algorithm until convergence is reached.
The parameter γ, present in the data term and which is shared by the three functionals, was fixed to a value between ten and twenty depending on the nature of the image sequence.However, in order to have a fair comparison, the parameters α and ε involved in the regularization term yielding a different energy functional, are tuned for each functional to have the best results.From the experiments, we have noticed that functionals with the Charbonnier and Huber approximations will share in general the same set of optimal parameters.As expected the set of optimal parameters for the Green function is different, especially for the value of ε since the log cosh function has a different transition level between its quadratic and linear parts.
In Figure 1, we show the colored based representation of the ground truth and the best optical flow estimates obtained using Charbonnier, Huber and Green smooth TV regularizations for the Middlebury training benchmark [24] using the best parameters.Notice first how the motion boundaries are preserved for all images in Figure 1.This is indeed a famous property of the TV regularization that has been inherited by its three smooth approximations.In Figure 2, we present other tested image sequences that have different types of movement.The first three images have a translation of different sizes: half, one and ten for the spiral, peppers and band sequences, respectively.The baboon sequence has a rotation movement and the Lena sequence has a homography mapping.These five images are standard test images in image processing that have been used as the first frames and the second frames have been generated by applying the movements described above.The Marble blocks sequence, which has a zoom transformation, was obtained from the Image Sequence Server, Institut für Algorithmen und Kognitive Systeme, (Group Prof. Dr. H.-H. Nagel), University of Karlsruhe, Germany and was first used in [25].Finally, the rotating sphere sequence was generated by the Computer Vision Research Group at the University of Otago, New Zealandand the book sequence by the Computer Laboratory at Cambridge University.

Sequence
Ground truth Charbonnier Huber Green Figure 1.Optical flow estimates with best parameters using Charbonnier, Huber and Green TV regularizations for the Middlebury images [24] ordered as in Table 2.
Tables 2 and 3 present the performance comparison of the three approximations in terms of the quality of the optical flow estimation measured by the average angular error (AAE) and the average endpoint error (AEE).In Table 4, we give also the interpolation error measured by the displaced frame difference (DFD), which corresponds to the data term in the energy functional (4).Then in Table 5, we provide a comparison with respect to the speed of convergence given by the number of gradient evaluations (Ng).Table 2. Average Angular Error (AAE) using Charbonnier, Huber and Green TV regularizations for the Middlebury datasets (top) and the image sequences given in Figure 2 in the same order (bottom).

Dimetrodon Grove2 Grove3
Hydrangea Rubberwhale Urban2 Urban3 Venus First, we remark that the Charbonnier and Huber approximations lead to similar results with a slight preference for the latter.Globally, these two approximations perform better than the Green TV regularization in terms of both the average angular error and the average endpoint error of the estimated optical flow solution, and the speed of convergence as shown in Table 6.On a total of sixteen image sequences, Huber method has performed better half of the time in terms of AEE with an average of 3.836 per image sequence.It has also 9 times a better AEE with an average of 0.468.The method needs an average of 466 gradient evaluations to reach the estimated solution.This is slightly better than Charbonnier method, which has averages of 3.849, 0.469 and 473 for AAE, AEE and Ng, respectively.Nevertheless, the Green approximation has the best performance with respect to the interpolation error.The method has performed better on thirteen image sequences out of sixteen with an average DFD of 0.580 per sequence; while Charbonnier and Huber approximations have an average DFD of 0.670 and 0.678, respectively.On the other hand, Green method has better AAE and three different sequences, better AEE for two sequences, and better Ng for four sequences.We have noticed also that the Green method is very sensitive to the parameter ε, which is due to the sensitivity of the hyperbolic function cosh to roundoff errors.The Charbonnier and Huber approximations suffer less from this problem.This might explain their wide use as smooth TV approximations in image processing.In Figure 3, we show the dependence of the estimated solution on the parameter ε for these two approximations using the Yosemite sequence, which was created by Lynn Quam at SRI and first used for optical flow in [26].The dependence is shown in terms of AAE, AEE and Ng.For the Yosemite sequence with clouds, we can see that both the Charbonnier and Huber TV approximations give similar results for values of ε between 10 −6 and 0.1.For ε < 10 −5 , Charbonnier approximation is slightly better than Huber approximation but the latter is performing better for values of ε around 10 −4 .Otherwise, the two approximations are performing almost the same except for large values of ε near 0.1 where Charbonnier gives slightly better AEE but Huber has better AAE.In Figure 4 and Table 7, the results are shown using the best parameters.

Conclusions
We have investigated the use of three smooth approximations of the TV regularization in the context of the optical flow problem.We have used the same non linear data optical flow term and the same multilevel truncated Newton algorithm for the three approximations.On sixteen tested image sequences, the Huber function has confirmed its best theoretical approximation with an overall better performance in terms of both the quality of the estimated optical flow and the speed of convergence.Although the Charbonnier function has the worst theoretical approximation, it has performed almost the same as the Huber function and better than the Green function.On the other hand, in terms of the interpolation error, the Green function appears to be the best method.It has performed better on thirteen images out of sixteen.

Figure 2 .
Figure 2. Other image sequences that are used in the comparison tables.

Figure 3 .
Figure 3. Sensitivity of Charbonnier and Huber TV regularizations for the Yosemite sequence with clouds (top) and without clouds (bottom) with respect to the choice of ε.On the left, we show the quality of the estimated optical flow in terms of AAE and AEE .On the right, we compare the number of gradient evaluations.α = 35 and γ = 10.The x-axis is log scaled.

Figure 4 .
Figure 4. Optical flow estimates with best parameters using Charbonnier and Huber TV regularizations for the Yosemite sequence.

Table 3 .
Average Endpoint Error (AEE) using Charbonnier, Huber and Green TV regularizations for the Middlebury datasets (top) and the image sequences given in Figure2in the same order (bottom).

Table 4 .
Displaced Frame Difference (DFD) using Charbonnier, Huber and Green TV regularizations for the Middlebury datasets (top) and the image sequences given in Figure2in the same order (bottom).

Table 5 .
Number of gradient evaluations (Ng) using Charbonnier, Huber and Green TV regularizations for the Middlebury datasets (top) and the image sequences given in Figure2in the same order (bottom).

Table 6 .
Overall performance of Charbonnier, Huber and Green TV regularizations for the sixteen tested images.

Table 7 .
Average Angular Error (AAE), Average Endpoint Error (AEE) and number of gradient evaluations (Ng) using best parameters for Charbonnier and Huber TV regularizations on the Yosemite sequence with clouds (Yosemitec) and without clouds (Yosemite).