Article

Mitigating the Drawbacks of the L0 Norm and the Total Variation Norm

by
Gengsheng L. Zeng
Department of Computer Science, Utah Valley University, Orem, UT 84058, USA
Axioms 2025, 14(8), 605; https://doi.org/10.3390/axioms14080605
Submission received: 27 June 2025 / Revised: 23 July 2025 / Accepted: 31 July 2025 / Published: 4 August 2025

Abstract

In compressed sensing, it is believed that the L0 norm minimization is the best way to enforce a sparse solution. However, the L0 norm is difficult to implement in a gradient-based iterative image reconstruction algorithm. The total variation (TV) norm minimization is considered a proper substitute for the L0 norm minimization. This paper points out that the TV norm is not powerful enough to enforce a piecewise-constant image. This paper uses limited-angle tomography to illustrate the possibility of using the L0 norm to encourage a piecewise-constant image. However, one of the drawbacks of the L0 norm is that its derivative is zero almost everywhere, making a gradient-based algorithm useless. Our novel idea is to replace the zero value of the L0 norm derivative with a zero-mean random variable. Computer simulations show that the proposed L0 norm minimization outperforms the TV minimization. The novelty of this paper is the introduction of some randomness into the gradient of the objective function when the gradient is zero. The quantitative evaluations indicate the improvement of the proposed method in terms of the structural similarity (SSIM) and the peak signal-to-noise ratio (PSNR).
MSC:
49N30; 49Q05; 65K10; 68U10; 68W20; 68W25; 68W40; 90C23

1. Introduction

The focus of this paper is on the L0 norm minimization. If x is a scalar, the L0 norm of x is defined as [1,2,3].
$$\|x\|_{L_0} = \begin{cases} 0, & \text{if } x = 0,\\ 1, & \text{if } x \neq 0. \end{cases} \tag{1}$$
If X is a vector or a matrix, the L0 norm of X is defined as
$$\|X\|_{L_0} = \text{the total count of the nonzero elements in } X. \tag{2}$$
Here we use the term ‘norm’ even though the L0 norm is not a true norm in the mathematical sense. It is better to refer to the L0 norm as the cardinality function or sparsity measure. A norm must have the positive homogeneity property
$$\|kx\| = k\,\|x\|, \tag{3}$$
for all x in its domain and k > 0 . Clearly, the L0 norm does not satisfy the positive homogeneity property (3). In fact, according to (1), we have
$$\|kx\|_{L_0} = \|x\|_{L_0}, \tag{4}$$
for all x in its domain and k > 0 .
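As a small illustration of the counting definition (2) and of why the homogeneity property (3) fails, the following Python sketch may help. It is illustrative only: the function name l0_norm and the optional tolerance tol are placeholders and are not part of the definitions above; with tol = 0 the function reproduces the exact count of nonzero elements.

```python
import numpy as np

def l0_norm(x, tol=0.0):
    """Count the nonzero entries of a vector or matrix, as in definition (2).
    `tol` is an optional tolerance of our own; tol=0 gives the exact count."""
    x = np.asarray(x, dtype=float)
    return int(np.count_nonzero(np.abs(x) > tol))

# A sparse vector has a small L0 "norm"; scaling it does not change the count,
# which is exactly why the positive homogeneity property (3) fails, cf. (4).
x = np.array([0.0, 3.0, 0.0, 0.0, -1.5])
print(l0_norm(x), l0_norm(10.0 * x))  # 2 2
```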
One important application of the L0 norm is in compressed sensing, which was introduced in the first decade of this century [4,5,6,7,8,9,10]. The principle of compressed sensing is to exactly reconstruct a signal using a small number of samples if the signal is sparse. The problem can be described as a system of linear equations:
$$AX = P, \tag{5}$$
where A is the system matrix, P is the measurement vector, and X is the unknown vector. In a compressed sensing problem, the number of measurements in P is much smaller than the number of unknowns in X . In other words, the system matrix A has more columns than rows. We call the solution X sparse if most of the elements in X are zero.
Another situation in compressed sensing is that the solution X of the system (5) is not sparse; however, a transformed version of X is sparse. If such a sparsification transformation operator is denoted by ψ , and the elements in vector Y
$$Y = \psi X \tag{6}$$
are dominated by zeros, then Y is sparse. The total count of non-zero elements in Y is the L0 norm of Y. In theory, the solution X can be obtained by minimizing the following objective function F:
$$F = \|AX - P\|_{L_2}^2 + \alpha\,\|\psi X\|_{L_0}, \tag{7}$$
where α is a tuning parameter.
Let us explain (5) and (7) further. Many real-world problems can be modeled as a system of linear equations (5), where the measurements are represented as a vector P and the object being measured is represented by a vector X. The measurements P usually contain noise. The vector X is a discretized version of the real-world object to be estimated. The system matrix A describes the first-order approximation of the measurement physics. The matrix A is assumed to be known. In compressed sensing, the object is under-sampled. Thus, the size of P is smaller than the size of X, and the matrix A has more columns than rows.
The first term in (7), $\|AX - P\|_{L_2}^2$, is the data fidelity term. Minimizing this term is equivalent to solving (5). When the system (5) is under-determined, the solution of (5) is not unique. The second term in (7), $\alpha\,\|\psi X\|_{L_0}$, is the Bayesian term. Minimizing this term enforces $\psi X$ to be sparse, or enforces X to be piecewise constant.
Since the L0 norm counts the total number of non-zero elements, it is difficult to use a gradient-based algorithm to minimize the objective function F defined in (7). In fact, minimizing the L0 norm is an NP-hard problem [11,12,13,14,15], making it computationally infeasible for large-scale problems.
Many researchers have attempted to minimize the L0 norm and found it difficult to deal with directly [2,3,16,17,18,19,20,21,22,23,24,25]. One approach is to approximate the L0 norm by other norms [2,3,16,17,18,19]. Another is to convert an L0-norm minimization problem into an integer programming problem [20,21]. Still another is to approximate the L0 norm by a smooth function [22,23,24,25]. The L0-norm minimization problem can also be decomposed into sub-problems by using thresholds [26,27]. Robitzsch compared an Lp-norm method with an approximate L0-norm method and showed that the L0-norm method is slightly better [28].
The most popular method to replace the L0 norm is the total variation (TV) norm [29,30,31]. In the TV methods, the TV norm of the unknown X is used to replace the second term in (7). The TV-norm minimization method uses the finite difference as the sparsification operator ψ and the L1 norm as an approximation of the L0 norm. If X is piecewise constant, its finite-difference version is sparse. The L1 norm for a vector X of n elements is given by
$$\|X\|_{L_1} = \sum_{i=1}^{n} |x_i|. \tag{8}$$
Thus, the TV norm of a vector X of n elements is
$$\|X\|_{TV} = \sum_{i=1}^{n-1} |x_{i+1} - x_i|. \tag{9}$$
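As a small illustration of (8) and (9), the following Python sketch computes the L1 and TV norms of a one-dimensional signal. The helper names are placeholders and not part of the paper. Note that a piecewise-constant signal has only a few nonzero finite differences, so its TV value remains small even though the signal itself is not sparse.

```python
import numpy as np

def l1_norm(x):
    """L1 norm of a vector, as in (8)."""
    return float(np.sum(np.abs(x)))

def tv_norm_1d(x):
    """Total variation of a vector: the sum of absolute finite differences, as in (9)."""
    return float(np.sum(np.abs(np.diff(x))))

x = np.array([2.0, 2.0, 2.0, 5.0, 5.0, 1.0, 1.0, 1.0])  # piecewise constant
print(l1_norm(x), tv_norm_1d(x))  # 19.0 7.0
```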
Using the TV norm optimization, the objective function (7) for the vector case becomes
$$F = \|AX - P\|_{L_2}^2 + \alpha\,\|X\|_{TV} = \|AX - P\|_{L_2}^2 + \alpha \sum_{i=1}^{n-1} |x_{i+1} - x_i|. \tag{10}$$
The TV norm of a matrix is not a direct extension of (9) to a higher dimension. We have two different definitions of the TV norm for a two-dimensional matrix (that is, a two-dimensional image) based on the two different ways to define the finite difference [1]. One TV norm is referred to as the isotropic TV norm and the other TV norm is referred to as the anisotropic TV norm. We use double indices for each element of the matrix X .
The isotropic TV norm is defined as [1]
$$TV_{iso}(X) = \sum_{i,j} \sqrt{(x_{i+1,j} - x_{i,j})^2 + (x_{i,j+1} - x_{i,j})^2}, \tag{11}$$
and the anisotropic TV norm is defined by [1]
$$TV_{aniso}(X) = \sum_{i,j} \bigl( |x_{i+1,j} - x_{i,j}| + |x_{i,j+1} - x_{i,j}| \bigr). \tag{12}$$
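A minimal Python sketch of the two-dimensional definitions (11) and (12) is given below. It is illustrative only; the handling of the image border (simply omitting the out-of-range differences) is an implementation choice that is not specified in the text.

```python
import numpy as np

def tv_iso(X):
    """Isotropic TV of a 2D image, following (11)."""
    dx = np.diff(X, axis=0)[:, :-1]   # x[i+1, j] - x[i, j]
    dy = np.diff(X, axis=1)[:-1, :]   # x[i, j+1] - x[i, j]
    return float(np.sum(np.sqrt(dx**2 + dy**2)))

def tv_aniso(X):
    """Anisotropic TV of a 2D image, following (12)."""
    return float(np.sum(np.abs(np.diff(X, axis=0))) + np.sum(np.abs(np.diff(X, axis=1))))
```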
For the two-dimensional image case, the objective function (10) can then be written as
$$F = \|AX - P\|_{L_2}^2 + \alpha\, TV_{iso}(X) = \|AX - P\|_{L_2}^2 + \alpha \sum_{i,j} \sqrt{(x_{i+1,j} - x_{i,j})^2 + (x_{i,j+1} - x_{i,j})^2} \tag{13}$$
and
$$F = \|AX - P\|_{L_2}^2 + \alpha\, TV_{aniso}(X) = \|AX - P\|_{L_2}^2 + \alpha \sum_{i,j} \bigl( |x_{i+1,j} - x_{i,j}| + |x_{i,j+1} - x_{i,j}| \bigr), \tag{14}$$
respectively.
The justification for using the L1 norm to approximate the L0 norm is that both prefer a solution with more zeros. Figure 1 shows an exemplary solution line $x_2 = m x_1 + b$; any point $(x_1, x_2)$ on this line is a solution to (5). This solution line may intersect the coordinate axes at two points, $p_1 = (-b/m, 0)$ and $p_2 = (0, b)$. The L0 minimization method will select either $p_1$ or $p_2$ as the solution, because $\|p_1\|_{L_0} = \|p_2\|_{L_0} = 1$, while any other point on the line has the larger L0 norm $\|(x_1, x_2)\|_{L_0} = 2$. As for the L1 norm, $\|p_1\|_{L_1} = |b/m|$ and $\|p_2\|_{L_1} = |b|$. The L1 minimization method will select $p_1$ if $|m| > 1$, $p_2$ if $|m| < 1$, and either $p_1$ or $p_2$ if $|m| = 1$. Therefore, L0 minimization and L1 minimization may select the same solutions. The justification illustrated in our toy example is unrealistic, however, because in practice we do not have the luxury of obtaining this solution line to search along.
In the objective function (10), the first term, involving the L2 norm, dominates. Thus, when the parameter α is small, the optimal solution of (10) may not make $\psi X$ sparse.
A drawback of the TV norm is that it cannot tell the difference between a smooth monotonic transition and a sharp monotonic transition (see Figure 2).
Therefore, using the L0 norm in the objective function (7) may work better than the objective functions (13) and (14), in the sense that the L0 norm can tell the difference between the two curves shown in Figure 2. A drawback of applying the L0 norm to the finite differences is that we do not have an effective and efficient way to minimize the L0 norm. The main goal of this paper is to directly minimize the objective function (7) with the L0 norm and to find an innovative way of handling it.

2. Methods

In Section 1, we analyzed the drawbacks of the TV norm and the L0 norm. In this section, we develop a method that replaces the TV norm with the L0 norm in a practical iterative algorithm.
We notice from the traditional definition of the L0 norm (see Figure 3 Top)
$$\|x\|_{L_0} = \begin{cases} 0, & \text{if } x = 0,\\ 1, & \text{if } x \neq 0, \end{cases}$$
that the condition “if x = 0” is hardly ever satisfied exactly in a practical computer algorithm, because in practice a very small value (for example, an image pixel value of 0.000000000001) should be treated as zero even though it is not exactly zero. For the scalar case, it is reasonable to replace the definition (1) by (15) below:
$$f_0(x) = \begin{cases} |x|/c, & \text{if } |x| \leq c,\\ 1, & \text{if } |x| > c, \end{cases} \tag{15}$$
for a chosen $c > 0$. In (15), c is a tuning parameter, determined by trial-and-error. This modification is illustrated in Figure 3 (Middle). The piecewise-linear function $f_0(x)$ defined in (15) is continuous and is a good approximation of $\|x\|_{L_0}$ when $c \to 0$.
In fact, we can have many versions of the L0 norm. For example, the version shown in Figure 3 (Bottom) is a smooth function; the derivative of the curve exists everywhere. One such smooth function is
$$g_0(x) = 1 - e^{-x^2/c}$$
for a small c > 0 .
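Both surrogates can be coded directly. The sketch below is illustrative only; the function names are placeholders, and the default c = 0.1 simply anticipates the value selected later by trial-and-error.

```python
import numpy as np

def f0(x, c=0.1):
    """Piecewise-linear surrogate of the scalar L0 norm, as in (15)."""
    ax = np.abs(x)
    return np.where(ax <= c, ax / c, 1.0)

def g0(x, c=0.1):
    """Smooth surrogate 1 - exp(-x^2 / c) (Figure 3, Bottom)."""
    return 1.0 - np.exp(-np.square(x) / c)
```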
The requirement of differentiability everywhere is, in fact, not necessary. We believe that a gradient-based optimization algorithm only requires the existence of the left and right derivatives everywhere. Let us consider a toy example of
$$f(x) = |x|.$$
We want to find the minimum of this function. It is obvious that the solution is
$$x = 0.$$
We notice that the function $f(x) = |x|$ is not differentiable at $x = 0$. At $x = 0$, the left and right derivatives of $f(x)$ are $f'_{left}(0) = -1$ and $f'_{right}(0) = 1$, respectively. A gradient-based optimization algorithm can be crafted as
$$x_{next} = x_{current} - \lambda\,\frac{f'_{left}(x_{current}) + f'_{right}(x_{current})}{2}, \tag{18}$$
where the parameter $\lambda > 0$ controls the step size of the iterative optimization algorithm. Algorithm (18) is nothing but the commonly used gradient descent algorithm, except that the gradient is replaced by the average of the left and right derivatives. Therefore, our definition of the L0 norm is friendly to gradient-based iterative optimization algorithms, and using a smooth function $g_0(x)$, as shown in Figure 3 (Bottom), to approximate the L0 norm is not necessary.
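The toy example can be checked numerically. The following sketch is illustrative only (the step size and iteration count are arbitrary); it applies the update (18) to f(x) = |x|, and the iterate decreases toward zero and then oscillates within one step size of it, as expected for a fixed-step method of this kind.

```python
def minimize_abs(x0, lam=0.2, n_iter=50):
    """Gradient descent on f(x) = |x| using the average of the one-sided
    derivatives, as in (18)."""
    x = x0
    for _ in range(n_iter):
        if x > 0:
            grad = 1.0
        elif x < 0:
            grad = -1.0
        else:
            grad = 0.0  # (f'_left(0) + f'_right(0)) / 2 = (-1 + 1) / 2
        x -= lam * grad
    return x

print(minimize_abs(3.7))  # ends near 0 (within one step size)
```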
To extend the revised L0 definition from the scalar case (15) to a matrix X, whose elements are denoted $x_{i,j}$, we have
$$f_0(X) = \sum_{i,j} f_0(x_{i,j}), \tag{19}$$
where $f_0(x_{i,j})$ is defined in (15). The definition (19) is still not effective in a gradient-based iterative optimization algorithm. The partial derivative of $f_0(X)$ with respect to $x_{i,j}$ is calculated as
$$\frac{\partial f_0(X)}{\partial x_{i,j}} = \frac{d f_0(x_{i,j})}{d x_{i,j}} = \begin{cases} \dfrac{1}{c}\,\mathrm{sgn}(x_{i,j}), & \text{if } |x_{i,j}| \leq c,\\[4pt] 0, & \text{if } |x_{i,j}| > c. \end{cases} \tag{20}$$
Let
$$\mathrm{step}(x) = \begin{cases} 1, & \text{if } x \geq 0,\\ 0, & \text{if } x < 0. \end{cases} \tag{21}$$
Then (20) can be expressed as
$$\frac{\partial f_0(X)}{\partial x_{i,j}} = \frac{d f_0(x_{i,j})}{d x_{i,j}} = \frac{1}{c}\,\mathrm{sgn}(x_{i,j})\,\mathrm{step}\bigl(c - |x_{i,j}|\bigr). \tag{22}$$
The plot of (22) is shown in Figure 4. It is observed that when $|x| > c$ the derivative is 0. In other words, there will be no update action in the iterative algorithm for most of the image pixels. This makes the optimization algorithm almost inactive and ineffective.
In order to obtain more action in the iterative algorithm, our next innovation is to replace the zero values in $df_0(x)/dx$ with small zero-mean random variables. Thus, Figure 4 becomes Figure 5. Since the optimization algorithm only needs the expression of $df_0(x)/dx$ and does not care whether a corresponding expression for $f_0(x)$ exists, we do not bother to investigate which definition of $f_0(x)$ corresponds to the $df_0(x)/dx$ shown in Figure 5.
The mathematical expression for the revised derivative shown in Figure 5 is given as
$$\frac{\partial f_0(X)}{\partial x_{i,j}} = \frac{d f_0(x_{i,j})}{d x_{i,j}} = \frac{1}{c}\,\mathrm{sgn}(x_{i,j})\,\mathrm{step}\bigl(c - |x_{i,j}|\bigr) + \mathrm{rand}\times\mathrm{step}\bigl(|x_{i,j}| - c\bigr), \tag{23}$$
where rand is a small zero-mean random variable. The tuning parameter c is selected by trial-and-error, depending on the application.
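A short Python sketch of the plain derivative (22) and its randomized replacement (23) follows. It is illustrative only; the uniform [-1, 1] noise and c = 0.1 anticipate the values reported later in this section, and the helper names are placeholders.

```python
import numpy as np

def step(x):
    """Unit step function, as in (21)."""
    return (np.asarray(x) >= 0).astype(float)

def d_f0(x, c=0.1):
    """Derivative of the piecewise-linear L0 surrogate, as in (22).
    It is zero wherever |x| > c, which stalls a gradient-based update."""
    x = np.asarray(x, dtype=float)
    return (1.0 / c) * np.sign(x) * step(c - np.abs(x))

def d_f0_randomized(x, c=0.1, rng=None):
    """Revised derivative, as in (23): the zero region is replaced by a
    zero-mean random value uniformly distributed in [-1, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    rand = rng.uniform(-1.0, 1.0, size=x.shape)
    return (1.0 / c) * np.sign(x) * step(c - np.abs(x)) + rand * step(np.abs(x) - c)
```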
We remind the reader that our images are not sparse, but piecewise constant. We need a sparsification transformation operator ψ to convert a piecewise-constant image into a sparse image. We chose the finite difference operator as the sparsification transformation operator ψ. Just like the isotropic (11) and anisotropic (12) TV definitions, we can have two definitions of $\partial\|X\|_{L_0}/\partial x_{i,j}$: an isotropic version and an anisotropic version.
The isotropic version is defined as
$$
\begin{aligned}
\frac{\partial \|X\|_{L_0}}{\partial x_{i,j}}
&= \frac{\partial}{\partial x_{i,j}} \sum_{n,m}\Bigl\| \sqrt{(x_{n+1,m}-x_{n,m})^2 + (x_{n,m+1}-x_{n,m})^2}\,\Bigr\|_{L_0}
 = \sum_{n,m} g(n,m)\,\frac{\partial u(n,m)}{\partial x_{i,j}} \\
&= g(i-1,j)\,\frac{x_{i,j}-x_{i-1,j}}{u(i-1,j)}
 + g(i,j-1)\,\frac{x_{i,j}-x_{i,j-1}}{u(i,j-1)}
 - g(i,j)\,\frac{x_{i+1,j}+x_{i,j+1}-2x_{i,j}}{u(i,j)},
\end{aligned}
\tag{24}
$$
with
$$u(n,m) = \sqrt{(x_{n+1,m}-x_{n,m})^2 + (x_{n,m+1}-x_{n,m})^2} \tag{25}$$
and
$$g(n,m) = \frac{\partial\,\bigl\|u(n,m)\bigr\|_{L_0}}{\partial\,u(n,m)} = \frac{1}{c}\,\mathrm{step}\bigl(c - u(n,m)\bigr) + \mathrm{rand}\times\mathrm{step}\bigl(u(n,m) - c\bigr). \tag{26}$$
The anisotropic version is defined by
$$
\begin{aligned}
\frac{\partial \|X\|_{L_0}}{\partial x_{i,j}}
&= \frac{\partial}{\partial x_{i,j}} \sum_{n,m}\Bigl( \|x_{n+1,m}-x_{n,m}\|_{L_0} + \|x_{n,m+1}-x_{n,m}\|_{L_0} \Bigr) \\
&= \frac{\partial \|x_{i,j}-x_{i-1,j}\|_{L_0}}{\partial x_{i,j}}
 + \frac{\partial \|x_{i+1,j}-x_{i,j}\|_{L_0}}{\partial x_{i,j}}
 + \frac{\partial \|x_{i,j+1}-x_{i,j}\|_{L_0}}{\partial x_{i,j}}
 + \frac{\partial \|x_{i,j}-x_{i,j-1}\|_{L_0}}{\partial x_{i,j}} \\
&= \frac{1}{c}\,\mathrm{sgn}(x_{i,j}-x_{i-1,j})\,\mathrm{step}\bigl(c-|x_{i,j}-x_{i-1,j}|\bigr) + \mathrm{rand}_1\times\mathrm{step}\bigl(|x_{i,j}-x_{i-1,j}|-c\bigr) \\
&\quad + \frac{1}{c}\,\mathrm{sgn}(x_{i+1,j}-x_{i,j})\,\mathrm{step}\bigl(c-|x_{i+1,j}-x_{i,j}|\bigr) + \mathrm{rand}_2\times\mathrm{step}\bigl(|x_{i+1,j}-x_{i,j}|-c\bigr) \\
&\quad + \frac{1}{c}\,\mathrm{sgn}(x_{i,j+1}-x_{i,j})\,\mathrm{step}\bigl(c-|x_{i,j+1}-x_{i,j}|\bigr) + \mathrm{rand}_3\times\mathrm{step}\bigl(|x_{i,j+1}-x_{i,j}|-c\bigr) \\
&\quad + \frac{1}{c}\,\mathrm{sgn}(x_{i,j}-x_{i,j-1})\,\mathrm{step}\bigl(c-|x_{i,j}-x_{i,j-1}|\bigr) + \mathrm{rand}_4\times\mathrm{step}\bigl(|x_{i,j}-x_{i,j-1}|-c\bigr).
\end{aligned}
\tag{27}
$$
If a gradient-based iterative algorithm is used to minimize an objective function, the algorithm does not update the image pixel $x_{i,j}$ when the partial derivative of the objective function with respect to $x_{i,j}$ is zero. More often than not, the L0 Bayesian term is ‘silent’ and contributes little toward finding a piecewise-constant solution. If we replace zero with zero-mean random variables, we deliberately disturb the algorithm so that it is no longer ‘silent.’
This strategy is similar to the simulated annealing algorithm, which generates random solutions and gradually rejects less optimal solutions [32]. However, there is no theoretical guarantee that random solutions are better solutions.
Replacing zero with zero-mean random variables does not change the fact that the L0 norm is not convex. To obtain the global minimum, one must perform an exhaustive search over the entire solution space. In other words, L0 minimization, even with the current modification, is still NP-hard. The proposed algorithm is gradient based, does not perform an exhaustive search, and usually does not reach the global minimum.
A drawback of the objective function (7) is the difficulty in selecting the control parameter α. Instead of minimizing the objective function (7) directly, we propose to alternately minimize the data fidelity term $\|AX - P\|_{L_2}^2$ and the Bayesian term $\|\psi X\|_{L_0}$. In this way, the value of the tuning parameter α is no longer important. In other words, we use an iterative Projection onto Convex Sets (POCS) algorithm.
There are many algorithms to minimize the first term (i.e., the data fidelity term). We chose the maximum likelihood expectation maximization (MLEM) algorithm in our implementation [33,34,35,36,37]. On the other hand, the Bayesian term is minimized by a gradient descent algorithm. For an image reconstruction task, the gradient at the pixel $x_{i,j}$ is given by (27).
Now, we use a different way to explain the algorithm that enforces the L0 norm of the gradient image. For an image reconstruction task, we want to update the pixel $x_{i,j}$, as shown in Figure 6. The horizontal and vertical gradients at pixel $x_{i,j}$ are $x_{i,j}-x_{i-1,j}$, $x_{i,j}-x_{i+1,j}$ and $x_{i,j}-x_{i,j-1}$, $x_{i,j}-x_{i,j+1}$, respectively. We want to minimize the L0 norms of these four differences. The derivatives of these four L0 norms share a common expression:
$$D(x_{i,j},\, x_{\mathrm{neighbor}}) = a \times \mathrm{sgn}(x_{i,j} - x_{\mathrm{neighbor}}), \tag{28}$$
where
$$a = \begin{cases} 1/c, & \text{when } |x_{i,j} - x_{\mathrm{neighbor}}| < c,\\ \mathrm{rand}, & \text{when } |x_{i,j} - x_{\mathrm{neighbor}}| \geq c. \end{cases} \tag{29}$$
A gradient descent algorithm to update $x_{i,j}$ is given as
$$x_{i,j}^{\,next} = x_{i,j} - \mu\bigl[D(x_{i,j}, x_{i-1,j}) + D(x_{i,j}, x_{i+1,j}) + D(x_{i,j}, x_{i,j-1}) + D(x_{i,j}, x_{i,j+1})\bigr]. \tag{30}$$
Here, the zero-mean random noise, rand, is uniformly distributed in $[-1, 1]$. The tuning parameter c was chosen as 0.1 by trial-and-error in our implementation.
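To make the update concrete, here is a minimal Python sketch of (28)–(30) for one sweep over the image. It is illustrative only: it is restricted to interior pixels (boundary handling is not specified here), c = 0.1 and the uniform [-1, 1] noise follow the values stated above, and the step size mu is a placeholder default.

```python
import numpy as np

def D(x_ij, x_neighbor, c=0.1, rng=None):
    """Common derivative term for one neighbor difference, as in (28) and (29)."""
    rng = np.random.default_rng() if rng is None else rng
    diff = x_ij - x_neighbor
    a = 1.0 / c if abs(diff) < c else rng.uniform(-1.0, 1.0)
    return a * np.sign(diff)

def l0_smoothing_sweep(X, mu=0.1, c=0.1, rng=None):
    """One gradient-descent sweep of the update (30) over the interior pixels."""
    rng = np.random.default_rng() if rng is None else rng
    X_new = X.copy()
    for i in range(1, X.shape[0] - 1):
        for j in range(1, X.shape[1] - 1):
            grad = (D(X[i, j], X[i - 1, j], c, rng) + D(X[i, j], X[i + 1, j], c, rng)
                    + D(X[i, j], X[i, j - 1], c, rng) + D(X[i, j], X[i, j + 1], c, rng))
            X_new[i, j] = X[i, j] - mu * grad
    return X_new
```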
In our computer simulations, we used 100,000 iterations of the POCS algorithm. At each POCS iteration, we first ran 10 iterations of the MLEM algorithm to minimize the data fidelity term and then we ran 10 iterations of the gradient descent algorithm to minimize the L0 term with a step size of 0.1.
We summarize the main steps in the development of the proposed algorithm as follows. We started with the original L0 definition (1). Then, this expression was replaced by a piecewise-linear continuous approximation (15). Next, the partial derivative (22) was replaced by (23). The proposed POCS algorithm is summarized as a flowchart in Figure 7.
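The overall structure of the alternating POCS loop can be sketched as follows. This is an illustrative outline only: the MLEM update appears merely as a placeholder callable (mlem_update), since its details are standard and not repeated here; l0_smoothing_sweep refers to the sketch in the previous listing; and the iteration counts follow the text (10 MLEM iterations and 10 gradient-descent iterations per outer POCS iteration).

```python
def pocs_reconstruction(X0, mlem_update, projections, n_outer=400,
                        n_mlem=10, n_grad=10, mu=0.1, c=0.1):
    """Alternate between data-fidelity enforcement (MLEM) and the revised
    L0-norm gradient descent, as described in the text and in Figure 7."""
    X = X0.copy()
    for _ in range(n_outer):
        for _ in range(n_mlem):      # enforce AX = P (data fidelity)
            X = mlem_update(X, projections)
        for _ in range(n_grad):      # encourage a piecewise-constant image
            X = l0_smoothing_sweep(X, mu=mu, c=c)
    return X
```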

3. Results

We applied the proposed POCS algorithm to a limited-angle two-dimensional image reconstruction problem. Two computer-generated phantoms were considered. The first phantom had a large, uniform disk as the background and 13 smaller, uniform squares and disks. The sizes, locations, and intensities of each square and disk are listed in Table 1. The second phantom was the famous Shepp–Logan head phantom [38]. No noise was added to the phantom projection data.
In the computer simulations, the parallel-beam imaging geometry was considered. The scanning angular range was 40° for the first phantom and 90° for the second phantom. The image size was 256 × 256. For these two phantom studies, three image reconstruction algorithms were compared in Figure 8 and Figure 9, respectively: the well-known MLEM algorithm [34], the MLEM-TV algorithm [33], and the proposed POCS revised L0-norm minimization algorithm.
The iteration number was 400 in the POCS algorithm. Within each iteration, there were 10 iterations of the MLEM algorithm for data fidelity enforcement and 10 iterations of the gradient descent algorithm for piecewise-constant enforcement.
The only tuning parameter in the MLEM algorithm is the number of iterations. However, the gradient descent algorithm has two tuning parameters: the number of iterations and the step size. The step size was chosen as 0.0001 in the first phantom reconstruction and was 0.00001 in the second phantom reconstruction.
The MLEM reconstruction exhibits the most severe limited-angle artifacts, and the shapes of the small objects are not well defined. The TV reconstruction slightly improves the boundaries of the small objects in the image. The most significant improvement is achieved by the proposed POCS L0 minimization algorithm.
Table 2 and Table 3 show the quantitative evaluation results with the structural similarity (SSIM) [39] and the peak signal-to-noise ratio (PSNR) [40] for the two phantom studies, respectively. An SSIM value closer to 1 indicates better image quality. A greater PSNR value indicates better image quality.
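For reference, SSIM and PSNR can be computed with scikit-image as sketched below. This snippet is illustrative only; the exact settings (data range, windowing) used for Table 2 and Table 3 are not stated here, so the defaults are an assumption.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(reconstruction, phantom):
    """Return (PSNR, SSIM) of a reconstruction against the true phantom."""
    data_range = float(phantom.max() - phantom.min())
    psnr = peak_signal_noise_ratio(phantom, reconstruction, data_range=data_range)
    ssim = structural_similarity(phantom, reconstruction, data_range=data_range)
    return psnr, ssim
```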
We observe that the reconstruction artifacts depend heavily on the size and the orientation of an object. A larger object tends to have more severe artifacts. If there are many smaller objects inside a larger object, the artifacts from each smaller object interact. Therefore, the distance between objects affects the overall artifacts. In the first phantom, the small objects are isolated from each other. In the second phantom, the small objects are close to each other. The second phantom is more difficult to reconstruct than the first phantom.

4. Conclusions

The TV norm has the drawback that it cannot distinguish a smooth function from a piecewise-constant function. The TV Bayesian objective function may therefore not be effective in promoting a piecewise-constant solution. The L0 norm, on the other hand, is difficult to implement in a gradient-based optimization algorithm. This paper aims to address these drawbacks.
The difficulty of using the L0 norm in an optimization algorithm is well known. Remedies have been proposed and tested by many researchers. The efforts can be classified into two categories: approximating the L0 norm by a different norm and approximating the L0 norm itself by a continuous functional. The well-known TV-norm optimization belongs to the first category. Our paper belongs to the second category. A unique feature of our method lies in the region where the signal is not zero. In this region, the traditional L0 norm has $\partial\|X\|_{L_0}/\partial x_{i,j} = 0$. We replace this 0 with a zero-mean random variable. There are many methods that combine L0, L1, and TV models; the unique feature of our method is the introduction of randomness in the derivative.
The L0 norm is not convex. The gradient-based algorithm usually does not converge to the global minimum. We introduce a zero-mean random perturbation to the algorithm; this random perturbation gives the algorithm a chance to ‘jump out’ from a local minimum to another local minimum with a smaller objective function value.
In this paper, we replace the Bayesian algorithm with a POCS algorithm and revise the derivative of the L0 norm so that it does not have a constant zero in most cases. As an application to limited-angle tomography, the proposed algorithm outperforms the MLEM-TV algorithm when the scanning angular range is small.
It is difficult to compare any two algorithms in general, because the performance of the algorithms is application-dependent. As shown in our two phantom studies, different applications may require different minimal scanning angular ranges.

Funding

This research was funded by NIH, grant number 2R15EB024283-03.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MLEM    maximum-likelihood expectation-maximization
POCS    projection onto convex sets
PSNR    peak signal-to-noise ratio
SSIM    structural similarity
TV      total variation

References

  1. Zeng, G.L.; Li, Y. Morphing from the TV-norm to the l0-norm. Biomed. J. Sci. Tech. Res. 2024, 55, 46741–46747. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  2. Sun, Y.; Schaefer, S.; Wang, W. Denoising point sets via L0 minimization. Comput. Aided Geom. Des. 2015, 35, 2–15. [Google Scholar] [CrossRef]
  3. Nguyen, R.M.; Brown, M.S. Fast and effective L0 gradient minimization by region fusion. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 208–216. [Google Scholar]
  4. Baraniuk, R.G. Compressive Sensing [Lecture Notes]. IEEE Signal Process. Mag. 2007, 24, 118–121. [Google Scholar] [CrossRef]
  5. Romberg, J. Imaging via Compressive Sampling. IEEE Signal Process. Mag. 2008, 25, 14–20. [Google Scholar] [CrossRef]
  6. Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306. [Google Scholar] [CrossRef]
  7. Tsaig, Y.; Donoho, D.L. Extensions of compressed sensing. Signal Process. 2006, 86, 549–571. [Google Scholar] [CrossRef]
  8. Tanner, J.; Vary, S. Compressed sensing of low-rank plus sparse matrices. Appl. Comput. Harmon. Anal. 2023, 64, 254–293. [Google Scholar] [CrossRef]
  9. Donoho, D.L.; Tanner, J. Exponential bounds implying construction of compressed sensing matrices, error-correcting codes, and neighborly polytopes by random sampling. IEEE Trans. Inf. Theory 2010, 56, 2002–2016. [Google Scholar] [CrossRef]
  10. Candes, E.J.; Romberg, J.; Tao, T. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 2006, 52, 489–509. [Google Scholar] [CrossRef]
  11. Woeginger, G.J. Exact algorithms for NP-hard problems: A survey. In Combinatorial Optimization—Eureka, You Shrink! Papers Dedicated to Jack Edmonds, 5th International Workshop, Aussois, France, March 5–9, 2001, Revised Papers; Springer: Berlin/Heidelberg, Germany, 2003; pp. 185–207. [Google Scholar]
  12. Paschos, V.T. An overview on polynomial approximation of NP-hard problems. Yugosl. J. Oper. Res. 2009, 19, 3–40. [Google Scholar] [CrossRef]
  13. Tkatek, S.; Bahti, O.; Lmzouari, Y.; Abouchabaka, J. Artificial intelligence for improving the optimization of NP-hard problems: A review. Int. J. Adv. Trends Comput. Sci. Appl. 2020, 9, 7411–7420. [Google Scholar]
  14. Li, W.; Ding, Y.; Yang, Y.; Sherratt, R.S.; Park, J.H.; Wang, J. Parameterized algorithms of fundamental NP-hard problems: A survey. Hum. Centric Comput. Inf. Sci. 2020, 10, 29. [Google Scholar] [CrossRef]
  15. Lin, F.T.; Kao, C.Y.; Hsu, C.C. Applying the genetic approach to simulated annealing in solving some NP-hard problems. IEEE Trans. Syst. Man Cybern. 1993, 23, 1752–1767. [Google Scholar]
  16. Zhang, J.; Zhao, C.; Zhao, D.; Gao, W. Image compressive sensing recovery using adaptively learned sparsifying basis via L0 minimization. Signal Process. 2014, 103, 114–126. [Google Scholar] [CrossRef]
  17. Brandt, C.; Seidel, H.P.; Hildebrandt, K. Optimal spline approximation via ℓ0-minimization. In Computer Graphics Forum; John Wiley & Sons Ltd.: Hoboken, NJ, USA, 2015; Volume 34, pp. 617–626. [Google Scholar]
  18. Lu, Z.; Zhang, Y. Penalty decomposition methods for L0-norm minimization. arXiv 2010, arXiv:1008.5372. [Google Scholar]
  19. Sun, Y.; Schaefer, S.; Wang, W. Image structure retrieval via L0 minimization. IEEE Trans. Vis. Comput. Graph. 2017, 24, 2129–2139. [Google Scholar] [CrossRef]
  20. Delle Donne, D.; Kowalski, M.; Liberti, L. A novel integer linear programming approach for global L0 minimization. J. Mach. Learn. Res. 2023, 24, 18322–18349. [Google Scholar]
  21. Atamturk, A.; Gómez, A.; Han, S. Sparse and smooth signal estimation: Convexification of l0-formulations. J. Mach. Learn. Res. 2021, 22, 1–43. [Google Scholar]
  22. Hyder, M.; Mahata, K. An approximate l0 norm minimization algorithm for compressed sensing. In Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, China, 12 May 2008; pp. 3365–3368. [Google Scholar]
  23. Wang, L.; Yin, X.; Yue, H.; Xiang, J. A regularized weighted smoothed L0 norm minimization method for underdetermined blind source separation. Sensors 2018, 18, 4260. [Google Scholar] [CrossRef]
  24. Robitzsch, A. Computational aspects of L0 linking in the Rasch model. Algorithms 2025, 18, 213. [Google Scholar] [CrossRef]
  25. O’Neill, M.; Burke, K. Variable selection using a smooth information criterion for distributional regression models. Stat. Comput. 2023, 33, 71. [Google Scholar] [CrossRef]
  26. Zhang, Y.; Dong, B.; Lu, Z. ℓ0 Minimization for wavelet frame based image restoration. Math. Comput. 2013, 82, 995–1015. [Google Scholar] [CrossRef]
  27. Cheng, X.; Zeng, M.; Liu, X. Feature-preserving filtering with L0 gradient minimization. Comput. Graph. 2014, 38, 150–157. [Google Scholar] [CrossRef]
  28. Robitzsch, A. L0 and Lp loss functions in model-robust estimation of structural equation models. Psych 2023, 5, 1122–1139. [Google Scholar] [CrossRef]
  29. Needell, D.; Ward, R. Stable image reconstruction using total variation minimization. SIAM J. Imaging Sci. 2013, 6, 1035–1058. [Google Scholar] [CrossRef]
  30. Yang, J.; Yu, H.; Jiang, M.; Wang, G. High-order total variation minimization for interior tomography. Inverse Probl. 2010, 26, 035013. [Google Scholar] [CrossRef]
  31. Sidky, E.Y.; Pan, X. Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Phys. Med. Biol. 2008, 53, 4777. [Google Scholar] [CrossRef]
  32. Van Laarhoven, P.J.; Aarts, E.H. Simulated annealing. In Simulated Annealing: Theory and Applications; Springer: Dordrecht, The Netherlands, 1987; pp. 7–15. [Google Scholar]
  33. Panin, V.Y.; Zeng, G.L.; Gullberg, G.T. Total variation regulated EM algorithm. IEEE Trans. Nucl. Sci. 1999, 46, 2202–2210. [Google Scholar] [CrossRef]
  34. Shepp, L.A.; Vardi, Y. Maximum likelihood reconstruction for emission tomography. IEEE Trans. Med. Imaging 2007, 1, 113–122. [Google Scholar] [CrossRef]
  35. Snyder, D.L.; Miller, M.I.; Thomas, L.J.; Politte, D.G. Noise and edge artifacts in maximum-likelihood reconstructions for emission tomography. IEEE Trans. Med. Imaging 1987, 6, 228–238. [Google Scholar] [CrossRef]
  36. Levitan, E.; Herman, G.T. A maximum a posteriori probability expectation maximization algorithm for image reconstruction in emission tomography. IEEE Trans. Med. Imaging 1987, 6, 185–192. [Google Scholar] [CrossRef]
  37. Ollinger, J.M. Maximum-likelihood reconstruction of transmission images in emission computed tomography via the EM algorithm. IEEE Trans. Med. Imaging 1994, 13, 89–101. [Google Scholar] [CrossRef]
  38. Shepp, L.A.; Logan, B.F. The Fourier reconstruction of a head section. IEEE Trans. Nucl. Sci. 1974, 21, 21–43. [Google Scholar] [CrossRef]
  39. Zhou, W.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  40. Jain, A.K. Fundamentals of Digital Image Processing; Prentice Hall: Hoboken, NJ, USA, 1989. [Google Scholar]
Figure 1. Points $(x_1, x_2)$ on the line $x_2 = m x_1 + b$ are solutions to (5).
Figure 2. Two curves with the same TV value but different finite-difference plus L0 values.
Figure 3. Three ways to re-define the L0 norm for a scalar x. Top: The traditional definition (1). Middle: The proposed definition (15). Bottom: A smooth version of (15).
Figure 4. The curve of $df_0(x)/dx$ according to the new definition (15).
Figure 5. The curve of $df_0(x)/dx$ according to the new definition (15), with the zeros replaced by zero-mean random variables.
Figure 6. We consider updating a pixel $x_{i,j}$ by calculating the gradients with its horizontal and vertical neighbors.
Figure 7. The flowchart of the proposed POCS algorithm.
Figure 8. Results of the first phantom study. (A) True phantom; (B) MLEM reconstruction; (C) TV reconstruction; (D1, D2) Proposed revised L0 norm reconstruction.
Figure 9. Results of the second phantom study. (A) True phantom; (B) MLEM reconstruction; (C) TV reconstruction; (D) Proposed revised L0 norm reconstruction.
Table 1. Parameters for the first phantom.

Type      Center x    Center y    Diameter or Side Length    Rotation Angle    Density
Circle    0           0           230.40                     0                 0.5
Square    89.60       0           25.60                      0                 1.0
Square    0           89.60       23.04                      0                 0.5
Square    −89.60      0           20.48                      0                 1.0
Square    0           −89.60      17.92                      0                 1.0
Square    64.00       64.00       15.36                      0                 0.5
Square    −64.00      −64.00      12.80                      0                 1.0
Square    −64.00      64.00       10.24                      0                 1.0
Square    64.00       −64.00      7.68                       0                 0.5
Square    0           0           5.12                       0                 1.0
Circle    44.80       0           23.04                      0                 1.0
Circle    0           44.80       17.92                      0                 0.5
Circle    −44.80      0           12.80                      0                 1.0
Circle    0           −44.80      7.68                       0                 0.5
Table 2. Quantitative evaluation results for the first phantom.

Method             PSNR       SSIM
MLEM (B)           9.3822     0.4001
TV (C)             13.4989    0.6341
Proposed 1 (D1)    14.6675    0.6761
Proposed 2 (D2)    17.0831    0.7144
Table 3. Quantitative evaluation results for the second phantom.

Method          PSNR       SSIM
MLEM (B)        17.2455    0.7067
TV (C)          17.2679    0.7300
Proposed (D)    22.0501    0.8547