Spatially Adaptive Regularization in Image Segmentation

We modify the total-variation-regularized image segmentation model proposed by Chan, Esedoglu and Nikolova [SIAM Journal on Applied Mathematics 66, 2006] by introducing local regularization that takes into account spatial image information. We propose some techniques for defining local regularization parameters, based on the cartoon-texture decomposition of the given image, on the mean and median filters, and on a thresholding technique, with the aim of preventing excessive regularization in piecewise-constant or smooth regions and preserving spatial features in nonsmooth regions. We solve the modified model by using split Bregman iterations. Numerical experiments show the effectiveness of our approach.


Introduction
Image segmentation is a fundamental problem in image processing and computer vision. Its goal is to divide the given image into regions that represent different objects in the scene. Variational models for image segmentation have been widely investigated, proving to be very effective in many applications -see, e.g., [1] and the references therein. Roughly speaking, the segmentation may be obtained by minimizing a cost functional which linearly combines regularization and data fidelity terms: where u : Ω ⊂ R 2 → R provides a representation of the segmentation and λ > 0 is a parameter that controls the weight of the fidelity term G versus the regularization term F. A widely-used segmentation model is the two-phase partitioning model introduced by Chan, Esedoḡlu and Nikolova [2], which we refer to as CEN model: minimize u,c 1 ,c 2 Ω |∇u| dx + λ Ω (c 1 −ū(x)) 2 u(x) + (c 2 −ū(x)) 2 (1 − u(x)) dx, s.t. 0 ≤ u ≤ 1, c 1 , c 2 > 0. (2) arXiv:2008.02168v1 [math.NA] 5 Aug 2020 Hereū denotes the image to be segmented, which is assumed to take its values in [0, 1]. The CEN model allows us to obtain one of the two domains defining the segmentation, Σ and Ω\Σ, by setting Σ = {x : u(x) > α} for a.e. α ∈ (0, 1), where u is the solution of problem (2). Note that (2) is the result of a suitable relaxation of the Chan-Vese model [3] leading to a convex formulation of that problem for any given (c 1 , c 2 ).
Here we start from a discrete version of the CEN model. Let be a discretization of Ω consisting of an m × n grid of pixels and let |∇ x u| i,j = |δ + x u| i,j , |∇ y u| i,j = |δ + y u| i,j where δ + x and δ + y are the forward finite-difference operators in the xand y-directions, with unit spacing, and the values u i,j with indices outside Θ are defined by replication. We consider the following discrete version of (2) with anisotropic discrete total variation (TV): After the minimization problem (3) is solved, the segmentation is obtained by taking for some α ∈ (0, 1). Problem (3) is usually solved by alternating the minimization with respect to u and the minimization with respect to c 1 and c 2 . In the sequel, we denote E(u, c 1 , c 2 ) the objective function in (3), and F(u) and G(u, c 1 , c 2 ) its discrete regularization and fidelity terms, respectively. The selection of a parameter λ able to balance F(u) and G(u, c 1 , c 2 ) and to produce a meaningful solution is a critical issue. Too large values of λ may produce oversegmentation, while too small values of λ may produce undersegmentation [4]. Furthermore, a constant value of λ may not be suitable for the whole image, i.e., different regions of the image may need different values. In recent years, spatially adaptive techniques have been proposed, focusing on local information, such as gradient, curvature, texture and noise estimation cues -see, e.g., [5]. Space-variant regularization has been also widely investigated in the context of image restoration, using TV and its generalizations -see, e.g., [6][7][8][9][10].
In this work, we propose some techniques for setting the parameter λ in an adaptive way based on spatial information, in order to prevent excessive regularization of smooth regions while preserving spatial features in nonsmooth areas. Our techniques are based on the so-called cartoon-texture decomposition of the given image, on the mean and median filters, and on thresholding techniques. The resulting locally adaptive segmentation model can be solved either by smoothing the discrete TV term -see, e.g., [11,12] and applying optimization solvers for differentiable problems such as spectral gradient methods [13][14][15][16] or by using directly optimization solvers for nondifferentiable problems, such as Bregman, proximal and ADMM methods [17][18][19][20][21][22][23]. In this work we use an alternating minimization procedure exploiting the split Bregman (SB) method proposed in [19]. The results of numerical experiments on images with different characteristics show the effectiveness of our approach and the advantages coming from using local regularization.
The rest of this paper is organized as follows. In Section 2 we propose our spatially adaptive techniques. In Section 3 we describe the solution of the segmentation model using those techniques by the aforementioned SB-based alternating minimization method. In Section 4 we discuss the results obtained with our approaches on several test images, performing also a comparison with the CEN model and a segmentation model developed for images with texture. The results show the effectiveness of our approach and the advantages coming from the use of local regularization. Some conclusions are provided in Section 5.

Defining local regularization by exploiting spatial information
The regularization parameter λ plays an important role in the segmentation process, because it controls the tradeoff between data fidelity and regularization. In general, the smaller the parameter in (3) the smoother the image content, i.e., image details are flattened or blurred. Conversely, the larger the parameter the more the enhancement of image details, and hence noise may be retained or amplified. Therefore, λ should be selected according to local spatial information. A small value of λ should be used in the smooth regions of the image to suppress noise, while a large value of λ should be considered to preserve spatial features in the nonsmooth regions. In other words, a matrix Λ = (λ i,j ) should be associated with the image, where λ i,j weighs pixel (i, j), as follows: Furthermore, in order to avoid oversegmentation or undersegmentation, it is convenient to fix a minimum and a maximum value for the entries of Λ, so as to drive the level of regularization in a reasonable range, depending on the image to be segmented.
We define Λ as a function of the imageū to be segmented: where 0 < λ min < λ max < ∞. We propose three choices of f , detailed in the next subsections. We note that problem (5) still has a unique solution for any fixed (c 1 , c 2 ), as a consequence of the next proposition.
Proof. Since the CEN model is convex for any fixed (c 1 , c 2 ), the thesis immediately follows from the fact that the parameters λ i,j are constant with respect to u.

Regularization based on the cartoon-texture decomposition
We define f (ū i,j ) by using the Cartoon-Texture Decomposition (CTD) of the image discussed in [24]. CTD splits any image u into the sum of two images, w and v, such that w represents the cartoon or geometric (piecewise-smooth) component of u, while v represents the oscillatory or textured component, i.e., v contains essentially textures, oscillating patterns, fine details and noise. The algorithm for computing CTD acts as described next. For each image pixel, we decide whether it belongs to the cartoon or the textural part by computing a local indicator associated with an image window around the pixel. The main feature of a textured region is its high TV, which decreases very fast under low-pass filtering. This leads to the following definition of local total variation (LTV) at a pixel (i, j): where L σ is a low-pass filter, σ is a scale parameter, |∇u| i,j = (∇ x u) 2 i,j + (∇ y u) 2 i,j and * denotes the convolution product. The relative reduction rate of LTV, gives the local oscillatory behavior of the image. A value of (ρ σ ) i,j close to 0 means that there is little relative reduction of LTV by the low-pass filter, thus the pixel (i, j) belongs to the cartoon. Conversely, (ρ σ ) i,j close to 1 indicates that the relative reduction is large and hence the pixel belongs to a textured region.
We use (8) for defining the weights λ i,j . The basic idea is that a large regularization parameter is needed if the pixel (i, j) belongs to the cartoon, while the parameter must be reduced in texture regions. Therefore, we set the function f in (6) as and (ρ σ ) i,j is defined by using the given imageū. Following [25], we set L σ equal to the Gaussian filter.

Regularization based on the mean and median filters
We define a technique based on spatial filters that are commonly used to enhance low-frequency details or to preserve edges [26,27]. More precisely, we combine the mean and median filters; the former aims at identifying smooth regions, where the regularization parameter can take small values, the latter aims at identifying edges, where the parameter must remain large. Mean filtering is a simple and easy-to-implement method for smoothing images, i.e., for reducing the intensity variation between a pixel and its neighbors. It also removes high-frequencies components due to the noise and the edges in the image, so the mean filter is a low-pass filter. The median filter preserves edges and useful details in the image.
Based on these considerations, we define a weight function as follows: where h 1 is the window size of the mean filter L h1 , h 2 is the window size of the median filter M h2 , and t is a threshold value acting as a cutoff between the two filters. Note that 0 ≤ ω i,j ≤ 1 and the pixels in homogeneous regions have ω i,j close to 1. The function f in (6) is set as where MM stands for "Mean and Median".

Regularization based on thresholding
This approach implicitly exploits the definition of Σ in (4) to set λ i,j . The idea is to use large values of λ i,j when u i,j is close to 1 and small values when u i,j is close to 0. Therefore, the parameters λ i,j are not defined in terms of the given imageū only. If the function u identifying the segmentation were known a priori, we could define λ i,j = f (u i,j ) as follows: where , emax = log 10 λ max and emin = log 10 λ min .
Since the function u must be computed by minimizing (3) and this is done by using an iterative procedure, we decided to update λ i,j at each iteration, using the current value of u i,j . On the other hand, in this case evaluating f is computationally cheaper than in the previous approaches, which apply two-dimensional convolution operators; thus the cost of the iterative update of λ i,j is practically negligible.

Solution by split Bregman iterations
As mentioned in Section 1, we use an alternating minimization method to solve problem (5). Given u k−1 i,j , by imposing the first-order optimality conditions with respect to c 1 and c 2 , we get For the solution of (5) with respect to u we use the split Bregman (SB) method [19]. Let where Following [28], we reformulate the minimization problem as follows: Given c k 1 and c k 2 , the SB method applied to (15) reads: where µ > 0. Closed-form solutions of the minimization problems with respect to d x and d y can be computed using the soft-thresholding operator: where, for any v = (v i,j ) and any scalar γ > 0, Finally, an approximate solution to the minimization problem with respect to u can be obtained by applying Gauss-Seidel (GS) iterations to the following system, as explained in [28]: where ∆ is the classical finite-difference discretization of the Laplacian. If the solution to (17) lies outside [0, 1] m×n , then it is projected onto that set. We denote P [0,1] the corresponding projection operator. The overall solution method is outlined in Algorithm 1. Note that when the approach in Section 2.3 is used, the values λ i,j must be updated at each iteration k, using (12) with u = u k .

Results and comparisons
The three spatially adaptive regularization techniques were implemented in MATLAB, using the Image Processing Toolbox. Algorithm 1 was implemented by modifying the Fast Global Minimization Algorithm for Active Contour Models by X. Bresson, available from http://htmlpreview.github.io/?https://github.com/xbresson/old_codes/blob/master/codes.html. This is a C code with a MATLAB MEX interface.
L σ in (7) is defined as a rotationally symmetric Gaussian filter with size 3 and standard deviation σ = 2. The mean and median filters use windows of size 3 and 7, respectively, and the parameter t in (10) is set as t = 0.5. The parameter α = 0.5 is used to identify the domain Σ according to (4).
In the original and modified codes, the SB iterations are stopped as follows: Algorithm 1: SB-based method for spatially adaptive segmentation Input :ū, λ min , λ max , f , µ, α (with f defined in (9) or (11) or (12)) Output : u, c 1 , c 2 Set u 0 =ū, d 0 Compute c k 1 and c k 2 by (13) Compute u k by applying GS iterations to system (17) tol is a given tolerance, and maxit denote the maximum number of outer iterations. The stopping criterion for the GS iterations is and tol GS and maxit GS are the tolerance and the maximum number of iterations for the GS method, respectively. In our experiments we set tol = 10 −6 , maxit = 30, tol GS = 10 −2 and maxit GS = 50. The adaptive models are compared with the original CEN model on different images widely used in image processing tests, listed in Table 1 and shown in Figures 1 to 4. In particular, the images bacteria, bacteria2, brain, cameraman, squirrel and tiger are included in Bresson's code We note that tiger is included in Bresson's code as a test problem for a segmentation model specifically developed for images with texture [30], which is also implemented in the code. This model uses the well-known Kullback-Leibler divergence function regularized with a TV term. The model is solved with the SB method. We perform the segmentation of tiger with the CEN model, the textural segmentation model and our approaches, to investigate whether our locally adaptive model can be also suitable for textural images. The cameraman image is perturbed with additive Gaussian noise, with zero mean and two values of standard deviation, σ = 15, 25, with the aim of evaluating the behavior of our adaptive approaches on noisy images. The noisy images are called cameraman15 and cameraman25.  Tables 2 to 4. The values of λ i,j in the adaptive models are initialized as specified in (9), (11) and (12). As described in Section 2.3, in the strategy based on thresholding those values change at each iteration k of Algorithm 1. It is worth noting that our approach also simplifies the choice of the regularization parameter, which may be a time-consuming task.
The initial approximation u 0 is set equal to the imageū i,j , which takes its values in [0, 1], as specified in Section 1. This is used to compute the starting values of c 1 and c 2 , and s i, In the original non-adaptive code, the value of λ is scaled using the difference between the maximum and the minimum value of s i,j ; the same scaling is applied to λ min and λ max in the implementations of the adaptive approaches. We run the tests on an Intel Core i7 processor with clock frequency of 2.6 GHz, 8 GB of RAM, and a 64-bit Linux system.
For each of the six images bacteria, bacteria2, brain, flowerbed, ninetyeight and squirrel we show the results obtained with the spatially adaptive strategy yielding the best segmentation for that image. The corresponding (unscaled) values of λ, λ min , λ max , the value of µ, as well as the number of outer iterations and the mean number of GS iterations per outer iteration are reported in Table 2. Note that for squirrel we use two values of λ min , one equal to the value of λ in the CEN model and the other greater than that, obtained by trial and error. Both values of λ min produce the same segmentation, but the larger value of λ max reduces the number of outer and GS iterations. The segmentations corresponding to the data in Table 2 are shown in Figures 1 to 3. For squirrel we display the CTD segmentation computed by using the larger value of λ min .
We see that on the selected problems CTD reduces the number of outer and GS iterations with respect to the CEN model; on the other hand, the setup of the regularization parameters is computationally more expensive. In terms of iterations, there is no clear winner between CEN and MM and between CEN and THR. The models based on the spatially adaptive techniques are slightly more expensive than the CEN model in this case too. A significant result is that the segmentations obtained with the adaptive techniques appear better than those obtained with the non-adaptive model, i.e., the spatially adaptive models can better outline boundaries between objects and foreground. This is clearly visible by looking at the segmentations of brain, bacteria2 (see the upper right corner), ninetyeight, bacteria (see the shape of the bacteria). It is also worth noting that the adaptive model based on THR removes textural details that do not belong to the flowerbed in the homonym image. The latter observation is confirmed by the experiments performed on tiger. The corresponding model and algorithmic details are reported in Table 3, while the segmentations are shown in Figure 4 along with (visual) information on quantities used to define λ i,j (see (8) and (10)). We see that the CTD and MM strategies produce satisfactory results, although they have been obtained by generalizing a non-textural model.
Finally, we show the results obtained on cameraman and its noisy versions by using CEN, CTD, MM, and THR. The methods perform comparable numbers of inner and GS iterations (see Table 4), but the spatially adaptive model THR yields some improvement over the CEN model on the noisy images ( Figure 5).

Conclusions
We introduced spatially adaptive regularization in a well-established variational segmentation model with the aim of improving the segmentation of images by suitably taking into account their smooth and nonsmooth regions. To this aim, we introduced three techniques, based on the application of suitable spatial filters or thresholding. The locally adaptive models, solved via an alternating minimization method using split Bregman iterations, showed the effectiveness of our approaches on several images, including also textural and noisy images. We also believe that the proposed models may simplify the setup of the regularization parameter.
Future work can include the extension of our spatially adaptive strategies to other segmentation models and the development of further adaptive techniques.  . Segmentations of tiger obtained by using the textural, CEN, CTD and MM models. A representation of quantities used to define λ i,j (see (8) and (10)) is also provided.