A Characterization of the Domain of Beta-Divergence and Its Connection to Bregman Variational Model

In image and signal processing, the beta-divergence is well known as a similarity measure between two positive objects. However, it is unclear whether the distance-like structure of the beta-divergence is preserved if we extend its domain to the negative region. In this article, we study the domain of the beta-divergence and its connection to the Bregman-divergence associated with the convex function of Legendre type. In fact, we show that the domain of the beta-divergence (and the corresponding Bregman-divergence) includes a negative region under a mild condition on the beta value. Additionally, through the relation between the beta-divergence and the Bregman-divergence, we can reformulate various variational models appearing in image processing problems into a unified framework, namely the Bregman variational model. This model has a strong advantage over the beta-divergence-based model due to the dual structure of the Bregman-divergence. As an example, we demonstrate how to build a convex reformulated variational model with a negative domain for the classic nonconvex problem that usually appears in synthetic aperture radar image processing.


Introduction
In general, the domain of a divergence [1,2] is confined not by the positiveness of its variables but by the positiveness of the divergence itself (i.e., D(b|u) ≥ 0). Therefore, the domain of a divergence can be defined to include a negative region while keeping the divergence nonnegative. To the best of our knowledge, it is unclear when the domain of the β-divergence (and the corresponding Bregman-divergence) includes the negative region. In this article, we systematically explore the domains of the β-divergence [2] and the corresponding Bregman-divergence associated with the convex function of Legendre type [3].
The β-divergence [2,4-7] is a general framework of similarity measures induced from various statistical models, such as the Poisson, Gamma, Gaussian, Inverse Gaussian, compound Poisson, and Tweedie distributions. For the connection between the β-divergence and the various statistical distributions, see [8]. Among these distributions, the Tweedie distribution has a unique feature: its unit deviance [8] corresponds to the β-divergence with β ∈ R \ (1, 2). Interestingly, (1, 2) is also a key range of β when defining a convex right Bregman proximity operator [9,10]. We address this issue in more detail in Section 4. In addition, the β-divergence is used as a distance-like measure in diverse areas, for instance, synthetic aperture radar (SAR) image processing [11,12], audio spectrogram comparison [6,13], and brain EEG signal processing [7].
We note that the authors in [7] show the usefulness of the β-divergence with β > 1 as a similarity measure between two probability distributions that is robust against outliers. Here, outliers (rare events) are events that have extremely low probability and thus lie near zero probability. However, the (generalized) Kullback-Leibler-divergence (i.e., the β-divergence $D_{\beta=1}(b|u)$), which is a commonly used similarity measure for probability distributions, is undefined at zero (u = 0). See Figure 2a. Therefore, it is not easy to obtain robustness against outliers through the (generalized) Kullback-Leibler-divergence. In contrast, the β-divergence with β > 1 (i.e., $D_{\beta>1}(b|u)$) is well defined at zero (u = 0) and thus it is more robust to outliers than the Kullback-Leibler-divergence. For more details, see [4,5,7]. We also note that if the variables of the β-divergence are not probability distributions (i.e., they are unnormalized), then outliers correspond to the variables that have extremely large values (≫ 1) [14]. To detect this kind of outlier under the Gamma distribution assumption, the β-divergence with β ∈ [−1, 0] is used as a distance-like measure in [11]. See also Figure 2c.
In the case of SAR image data processing, speckle noise is modeled with the Gamma distribution and thus the negative log-likelihood function, which appears in the speckle reduction problem, corresponds to the β-divergence with β = 0, i.e., the Itakura-Saito-divergence. This model is highly nonconvex [15]. Therefore, various transforms have been introduced to relax the nonconvexity of the Gamma-distribution-related speckle reduction model [16-21]. Recently, we have shown that the β-divergence with β ∈ (0, 1) can be used as a transform-less convex relaxation model for the SAR speckle reduction problem [12]. Generally, the data captured via a SAR system has an extremely high dynamic range [22,23]. In this harsh environment, the β-divergence with β ∈ (−1, 0) is successfully used as a similarity measure for the separation of strong scatterers in SAR data [11]. In addition, the β-divergence is also used for the decomposition of magnitude data of audio spectrograms [6]. In these applications, the domains of the data are generally assumed to be positive. However, the domain of the β-divergence can be extended to the negative region. In fact, if β = 2, then the β-divergence is the squared ℓ2-distance (scaled by 1/2), the domain of which naturally includes a negative region. Surprisingly, in this article, we show that, under a mild condition on β, there are infinitely many β-divergences that have a negative domain.
It is known that the β-divergence can be reformulated with the Bregman-divergence [2,6]. However, if we restrict the base function of the Bregman-divergence to be a convex function of Legendre type, then some part of the β-divergence cannot be expressed through the Bregman-divergence (see Table 3). Although the Bregman-divergence associated with a convex function of Legendre type does not exactly match the β-divergence, the rich mathematical structure of the convex function of Legendre type gives the associated Bregman-divergence many useful properties. For instance, the dual formulation of the Bregman-divergence associated with the convex function of Legendre type can be used as a convex reformulation of some nonconvex problems under a certain condition on its domain [24]. In this article, we demonstrate that, by using the dual Bregman-divergence with the negative convex domain, we can build a convex reformulated Bregman variational model for the classic nonconvex problem that appears in the SAR image noise reduction problem [15]. We also show that we can unify the various variational models appearing in image processing problems as the Bregman variational model with sparsity constraints, e.g., total variation [25,26] (we call it Bregman-TV). The Bregman variational model corresponds to the right Bregman proximity operator [9,10]. See also [9,10,24,27-30] for theoretical analysis of the Bregman-divergence and its related properties.

Background
In this section, we review typical examples of the β-divergence, i.e., the Itakura-Saito-divergence, the generalized Kullback-Leibler-divergence (I-divergence), and the ℓ2-norm distance. In addition, we introduce the Bregman-divergence and the corresponding Bregman variational model with sparsity constraints.

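Let us start with the β-divergence $D_\beta : \Omega_L \times \Omega_R \to \mathbb{R}_+$:
$D_\beta(b|u) = \left\langle \int_u^b x^{\beta-2}(b - x)\,dx,\ 1 \right\rangle$ (1)
where $\Omega_L \times \Omega_R$ is the set of pairs $(b, u) \in \mathbb{R}^n \times \mathbb{R}^n$ on which $D_\beta(b|u)$ is well defined, finite, and nonnegative. The domain $\Omega_L \times \Omega_R$ corresponds to the effective domain in optimization [3,31]. We call $\Omega_L$ and $\Omega_R$ the left and right domains of the β-divergence. In addition, we assume that the left and right domains, $\Omega_L$ and $\Omega_R$, are each convex sets. That is, if $a, b \in \Omega_L$ (or $\Omega_R$), then the line segment between the two points also lies in the set, i.e., $\theta a + (1-\theta) b \in \Omega_L$ (or $\Omega_R$) for all $\theta \in [0, 1]$. In addition, integration, multiplication, and division are performed component-wise.
Based on the selection of β, we can recover the best-known representatives of the β-divergence, i.e., the Itakura-Saito-divergence [4,5,13], the I-divergence (or generalized Kullback-Leibler-divergence) [20,32], and the ℓ2-norm distance [25,26]. These three divergences are important examples of the β-divergence, since they show three different types of domains of the β-divergence. We summarize them in the following.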
• Itakura-Saito-divergence (β = 0): $D_0(b|u) = \langle b/u - \ln(b/u) - 1,\ 1 \rangle$. Usually, the left and right domains of the Itakura-Saito-divergence, i.e., $\Omega_L$ and $\Omega_R$, are defined as positive with $\Omega_L = \Omega_R$ [12,13]. However, because the divergence depends only on the ratio b/u (its scale invariance property), the variables b and u can be negative at the same time, even within the logarithmic function, i.e., $\Omega_L = \Omega_R = \mathbb{R}^n_{--}$. Based on this observation, in this article, we develop a new methodology that systematically detects a domain having a negative region. The Itakura-Saito-divergence is a typical example that can be expressed by the β-divergence and the Bregman-divergence at the same time. However, it has the negative domain in the β-divergence framework, but not in the Bregman-divergence framework (see Table 3).
• Generalized Kullback-Leibler-divergence (I-divergence) (β = 1): $D_1(b|u) = \langle b \ln(b/u) - b + u,\ 1 \rangle$, where we naturally assume that 0 ln 0 = 0. Interestingly, it has different left and right domains, i.e., $\Omega_L \neq \Omega_R$: the left variable b may be zero, whereas the right variable u may not. Due to the asymmetric structure of the domain of the I-divergence, we need to carefully handle the β-divergence at the boundary of each domain. We categorize the class of β-divergences that have the asymmetric domain structure in Section 2.
• ℓ2-norm distance (β = 2): $D_2(b|u) = \frac{1}{2}\|b - u\|_2^2$. This divergence is preferable to other divergences, since it has $\mathbb{R}^n$ as its domain for each variable. Unlike the previous two divergences, its domain naturally includes the negative region $\mathbb{R}^n_-$. Surprisingly, there are infinitely many β-divergences having $\mathbb{R}^n$ as their domain. We show this in Section 2.
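The three representatives above can be checked numerically. The following is a minimal sketch, assuming the component-wise closed form of the β-divergence obtained by integrating the power function x^(β−2)(b − x) from u to b; the function name `beta_divergence` and the sample vectors are illustrative and do not come from the article.

```python
import math

def beta_divergence(b, u, beta):
    """Sum over components of the scalar beta-divergence d_beta(b_i | u_i).

    Assumed closed form (from integrating x**(beta-2) * (b - x) from u to b):
      beta = 0 : b/u - ln(b/u) - 1            (Itakura-Saito)
      beta = 1 : b ln(b/u) - b + u            (I-divergence, with 0 ln 0 = 0)
      else     : b^beta / (beta (beta-1)) - b u^(beta-1) / (beta-1) + u^beta / beta
    """
    total = 0.0
    for bi, ui in zip(b, u):
        if beta == 0:
            r = bi / ui  # only the ratio matters, so b and u may both be negative
            total += r - math.log(r) - 1.0
        elif beta == 1:
            log_term = bi * math.log(bi / ui) if bi != 0 else 0.0
            total += log_term - bi + ui
        else:
            total += (bi ** beta / (beta * (beta - 1))
                      - bi * ui ** (beta - 1) / (beta - 1)
                      + ui ** beta / beta)
    return total

# beta = 2 accepts mixed signs and equals half the squared Euclidean distance.
l2_value = beta_divergence([1.0, -2.0], [-1.0, 3.0], 2)
# beta = 0 accepts two negative vectors because of scale invariance.
is_negative_domain = beta_divergence([-1.0, -2.0], [-3.0, -0.5], 0)
# beta = 1 accepts b = 0 on the left but needs u > 0 on the right.
kl_at_zero = beta_divergence([0.0], [2.0], 1)
```

The integer exponent β = 2 is passed as an `int` on purpose: Python returns a complex number for a negative base raised to a non-integer power, which mirrors the domain restrictions studied in Section 2.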
Additionally, we introduce the Bregman-divergence associated with the convex function of Legendre type [3]. The Bregman-divergence $D_\Phi : \Omega \times \mathrm{int}(\Omega) \to \mathbb{R}_+$ is formulated as
$D_\Phi(b|u) = \Phi(b) - \Phi(u) - \langle b - u, \nabla\Phi(u) \rangle$ (2)
where the base function Φ is the convex function of Legendre type [3], $\Omega = \mathrm{dom}\Phi = \{x \in \mathbb{R}^n \mid \Phi(x) \in \mathbb{R}\}$, and int(Ω) is the interior of Ω. Strictly speaking, it is the relative interior of Ω, i.e., ri(Ω). Note that ri(Ω) is the interior of Ω relative to its affine hull, which is the smallest affine set including Ω. Therefore, the relative interior ri(Ω) coincides with the interior int(Ω) when the affine hull of Ω is $\mathbb{R}^n$. For more details, see Chapter 2.H in [31]. In this article, since the β-divergence (1) is separable in terms of dimension, the affine hull of Ω is always $\mathbb{R}^n$ and thus we simply use int(Ω) instead of ri(Ω).
Note that the typical examples of the β-divergence above (Itakura-Saito-divergence, I-divergence, and ℓ2-norm distance) can be reformulated with the Bregman-divergence (2) by using the convex function of Legendre type Φ and the associated domain Ω:
• Itakura-Saito-divergence: $\Phi(x) = \langle -\ln x, 1 \rangle$ with $\Omega = \mathbb{R}^n_{++}$.
• I-divergence: $\Phi(x) = \langle x \ln x, 1 \rangle$ with $\Omega = \mathbb{R}^n_{+}$.
• ℓ2-norm distance: $\Phi(x) = \frac{1}{2}\|x\|_2^2$ with $\Omega = \mathbb{R}^n$.
The domain of the second variable of the Bregman-divergence (2) is always the open set int(domΦ). However, the right domain $\Omega_R$ of the second variable of the β-divergence (1) could be a closed set. In the coming section, we thoroughly analyze the relation between the Bregman-divergence and the β-divergence with regard to its domain.
Based on the Bregman-divergence (2), we introduce the Bregman variational model that unifies the various minimization problems appearing in image processing:
$\min_{u}\ D_\Phi(b|u) + \lambda R(u)$ (3)
where b is the observed data, λ > 0 weights the regularization, and R(u) is the sparsity-enforcing regularization term, such as total variation [26]. In image processing, (3) corresponds to the denoising problem under various noise distributions: Poisson, speckle, Gaussian noise, etc. In optimization, however, it is known as the (nonconvex) right Bregman proximity operator under mild conditions. See [9,10,24,30] for more details on the Bregman operator.
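As an illustration of how the three base functions reproduce the three divergences, here is a small sketch of the Bregman-divergence Φ(b) − Φ(u) − ⟨b − u, ∇Φ(u)⟩ evaluated component-wise. The helper `bregman`, the tuple names, and the sample data are assumptions made for this example only.

```python
import math

def bregman(phi, grad_phi, b, u):
    """Separable Bregman-divergence: sum of phi(b_i) - phi(u_i) - (b_i - u_i) * phi'(u_i)."""
    return sum(phi(bi) - phi(ui) - (bi - ui) * grad_phi(ui) for bi, ui in zip(b, u))

# Base functions and their derivatives (scalar versions of the separable Phi).
burg = (lambda x: -math.log(x), lambda x: -1.0 / x)              # Itakura-Saito
shannon = (lambda x: x * math.log(x), lambda x: math.log(x) + 1.0)  # I-divergence
half_square = (lambda x: 0.5 * x * x, lambda x: x)               # l2-norm distance

b = [1.0, 4.0]
u = [2.0, 3.0]
is_div = bregman(*burg, b, u)
kl_div = bregman(*shannon, b, u)
l2_div = bregman(*half_square, b, u)
```

Only the derivative at the second argument u is evaluated, which is why the second variable has to stay in the interior of the domain while the first one may sit on the boundary.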

Overview
The article is organized as follows. In Section 2, we analyze the structure of the domain of the β-divergence. In Section 3, we study various mathematical structures of the β-divergence through the Bregman-divergence associated with the convex function of Legendre type. In Section 4, we introduce the Bregman variational model and its dual formulation for convex reformulation of the classic nonconvex problem that appears in the SAR speckle reduction problem. In addition, we introduce the right and left Bregman proximity operators. We give our conclusions in Section 5.

A Characterization of the Domain
In this section, we analyze the structure of the β-divergence and the associated domain Ω L × Ω R based on the so-called extended logarithmic function.
Let us start with a definition of the extended logarithmic function that is essential in characterizing the domain of the β-divergence. We note that it corresponds to an equivalence class of Tsallis's generalized logarithmic function [1,33] with an extension to the negative domain.
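Definition 1. Let α ∈ R and let c range over the points of $\mathbb{R}^n$ that have the same sign as u in every component and satisfy $c_i \neq u_i$, i = 1, ..., n. Then, the extended logarithmic function is defined as the equivalence class
$[\ln_\alpha(u)]_c = \left\{ \ln_{\alpha,c}(u) = \int_c^u x^{-\alpha}\,dx \right\}$
taken over all such c. For simplicity, we leave out all constants after integration and then we attain
$\ln_\alpha(u) = \ln|u|$ if α = 1, and $\ln_\alpha(u) = \frac{1}{1-\alpha}u^{1-\alpha}$ otherwise, (4)
where $\mathrm{dom}(\ln_\alpha) = \{x \in \mathbb{R}^n \mid \ln_\alpha(x) \in \mathbb{R}^n\}$. We refer to (4) as the extended logarithmic function instead of the equivalence class $[\ln_\alpha(u)]_c$, unless otherwise specified.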
Note that the domain and range of $\ln_\alpha(u)$ (4) are given in Table 1. In addition, we illustrate the structure of the extended logarithmic function in Figure 1. As noted in Definition 1, the extended logarithmic function is defined as an equivalence class $[\ln_\alpha(u)]_c$ with respect to c. If we set c = 1, then we recover Tsallis' generalized logarithmic function [1,33] on its positive domain $\mathrm{dom}(\ln_\alpha) \subseteq \mathbb{R}_+$. See Figure 1a. However, we cannot use the generalized logarithm (i.e., $\ln_{\alpha,c=1}(u)$) in the negative domain. In fact, if α > 1 and $\mathbb{R}_- \subset \mathrm{dom}(\ln_\alpha)$, then Tsallis' generalized logarithmic function is undefined, e.g., $\ln_{4,c=1}(-1) = \int_1^{-1} x^{-4}\,dx \notin \mathbb{R}$, because the integration path crosses the singularity at zero. On the other hand, the proposed extended logarithmic function (4) is well defined on R for all α, since we can choose an appropriate c having the same sign as u even if α > 1, e.g., $\ln_{4,c=-2}(-1) = \int_{-2}^{-1} x^{-4}\,dx \in \mathbb{R}$. See Figure 1d and Table 1.
The extended logarithmic function is useful for simplifying the complicated structure of the β-divergence. As described in the following Definition 2, the β-divergence is defined based on the difference of two extended logarithmic functions. In other words, the β-divergence is invariant with respect to a constant function in the extended logarithmic function (4). It is interesting that the Bregman-divergence (2) also has a similar invariance property with respect to an affine function in the base function Φ (see Proposition 1).
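The role of the reference point c can be made concrete with a short sketch. It assumes the integral form ln_{α,c}(u) = ∫_c^u x^(−α) dx discussed above and is restricted to integer α so that negative bases stay real in Python; the function name `ext_log` is hypothetical.

```python
import math

def ext_log(u, alpha, c):
    """Closed form of the integral of x**(-alpha) from c to u (integer alpha).

    The reference point c must have the same sign as u; otherwise the
    integration path would cross zero, as in the c = 1, u = -1 example.
    """
    if u * c <= 0:
        raise ValueError("c must have the same sign as u")
    if alpha == 1:
        return math.log(abs(u)) - math.log(abs(c))
    return (u ** (1 - alpha) - c ** (1 - alpha)) / (1 - alpha)

# ln_{4, c=-2}(-1): finite, because c and u are both negative.
value = ext_log(-1.0, 4, -2.0)

# A difference of two extended logarithms does not depend on c,
# which is the invariance the beta-divergence relies on.
diff_c2 = ext_log(-1.0, 4, -2.0) - ext_log(-3.0, 4, -2.0)
diff_c5 = ext_log(-1.0, 4, -5.0) - ext_log(-3.0, 4, -5.0)

# ln_{4, c=1}(-1) is rejected: c and u have opposite signs.
try:
    ext_log(-1.0, 4, 1.0)
    crossed_zero = False
except ValueError:
    crossed_zero = True
```

Choosing a different c only shifts the value by a constant, so any quantity built from differences of extended logarithms is unaffected by that choice.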
Table 1.The domain and range of the extended logarithmic function ln α (x) defined in (4).
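After integration, we get the well-known formula of the β-divergence:
$D_\beta(b|u) = \left\langle \frac{1}{\beta(\beta-1)} b^{\beta} - \frac{1}{\beta-1} b\,u^{\beta-1} + \frac{1}{\beta} u^{\beta},\ 1 \right\rangle$ if β ≠ 0, 1; $\quad D_0(b|u) = \langle b/u - \ln(b/u) - 1,\ 1 \rangle$; $\quad D_1(b|u) = \langle b\ln(b/u) - b + u,\ 1 \rangle$. (5)
Although the β-divergence has a unified formula (5) via the extended logarithm (4), the determination of the domain $\Omega_L \times \Omega_R$ of the β-divergence unfortunately depends heavily on β. Before we go any further, let us introduce the most important equivalence classes in this article. They simplify the complicated notation appearing in the β-divergence and the Bregman-divergence.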
Note that $R_e$ and $R_o$ are subsets of the rational numbers and satisfy $R_e \cap R_o = \emptyset$, while $R_x$ is composed of all irrational numbers together with the rational numbers that are not in $R_e \cup R_o$. Since the β-divergence (6) is developed based on the extended logarithmic function (i.e., power functions), we inherently have to quantify the domain of a power function $p(x) = x^\alpha$ and its inverse function $p^{-1}(y) = y^{1/\alpha}$. If x is positive, then a power function p(x) and the corresponding inverse function are well defined, irrespective of the choice of the exponent α ∈ R \ {0}. On the other hand, in the case of a negative domain, e.g., x < 0, the domain of a power function p(x) depends strongly on the choice of the exponent α. With the newly introduced equivalence classes in (7), we can easily categorize the domain of a power function $p(x) = x^\alpha$ and its inverse function $p^{-1}(y) = y^{1/\alpha}$, α ≠ 0. We summarize this in the following Lemma.
and the corresponding range In addition, if α ∈ R o , then the inverse function of p is well defined and transparent on dom Proof.For any x ∈ R −− , let x = (−1)|x|, then the power function p is expressed as p(x) = (−1) α |x| α , ∀α ∈ R. We note that the negative real domain of p(x) is well-defined only if (−1) α ∈ {−1, +1}.To clarify the evaluation of (−1) α , let us express it in a polar form: Then, we get where δ ∈ C \ R. Regarding the inverse function p −1 (y) = y 1/α , we have (−1) Now, with the equivalence classes (7) and Lemma 1, we classify domains of the β-divergence.The details are given in the following Theorem and Table 2. See also Figure 2 for the overall structure of the β-divergence on its domain.
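The condition (−1)^α ∈ {−1, +1} from the proof of Lemma 1 can be illustrated with exact rational arithmetic. The sketch below rests on an assumption that is not confirmed by the text reproduced here: that $R_e$ collects the fractions with even numerator and odd denominator (in lowest terms), $R_o$ those with odd numerator and odd denominator, and $R_x$ everything else. The function names are hypothetical.

```python
from fractions import Fraction

def classify_exponent(alpha):
    """Classify a rational exponent as 'Re', 'Ro', or 'Rx' (assumed parity convention)."""
    q = Fraction(alpha)  # Fraction is always stored in lowest terms
    if q.denominator % 2 == 0:
        return "Rx"  # an even root of a negative number is not real
    return "Re" if q.numerator % 2 == 0 else "Ro"

def real_power(x, alpha):
    """Real value of x**alpha for x != 0, or None when no real value exists."""
    q = Fraction(alpha)
    if x == 0:
        raise ValueError("x must be nonzero")
    if x > 0:
        return x ** float(q)
    kind = classify_exponent(q)
    if kind == "Rx":
        return None
    sign = 1.0 if kind == "Re" else -1.0  # value of (-1)**alpha under the assumption
    return sign * abs(x) ** float(q)

cube_root = real_power(-8.0, Fraction(1, 3))      # odd/odd exponent keeps the sign
two_thirds = real_power(-8.0, Fraction(2, 3))     # even/odd exponent maps to positive
square_root = real_power(-8.0, Fraction(1, 2))    # even denominator: no real value
```

Under this convention an exponent in $R_o$ maps the negative half-line onto itself, which is consistent with the statement that the inverse of p is well defined there.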

Table 2. A classification of the domain
Theorem 1.Let us consider the domain of the β-divergence In addition, let us assume that the minimum value of G(x) on its domain is nonnegative, i.e., where (b, u) Proof.Due to the assumption M G ≥ 0, we can easily obtain the positiveness of the β-divergence by the following inequality Consequently, we only need to fulfill the following two conditions: (1) Is the domain of the β-divergence determined to satisfy ( 9)? ( 2) Is the β-divergence well-defined on its domain?
Based on β ∈ R, we have two different cases regarding the sign of the domain:
- If $\beta \notin R_e$, then, due to (9), the negative region cannot be included in the domain of the β-divergence, so the domain is restricted to the nonnegative region.
- If $\beta \in R_e$, then, due to (9), the domain of the β-divergence can be defined to include the negative region.
The β-divergence (1) is based on integrations of power functions of the real variables u and b in $\mathbb{R}^n$. Therefore, we only need to see whether or not the integration in $D_\beta(b|u)$ is well defined at {0}. We note that, after integration, the exponents of $b \in \Omega_L$ and $u \in \Omega_R$ are different and thus the corresponding domains $\Omega_L$ and $\Omega_R$ could be different as well. Hence, we should consider the following three different cases:
- β > 1: We do not have any singularity at {0} with respect to $b \in \Omega_L$ and $u \in \Omega_R$. Therefore, we have $\{0\} \subset \Omega_L = \Omega_R$.
- 0 < β ≤ 1: After integration, $b \in \Omega_L$ does not have any singularity at {0}. However, $u \in \Omega_R$ has a singularity at {0}. Therefore, we have $\{0\} \subset \Omega_L$ but $\{0\} \not\subset \Omega_R$, and thus $\Omega_L \neq \Omega_R$.
- β ≤ 0: Both $b \in \Omega_L$ and $u \in \Omega_R$ have a singularity at {0}. Therefore, we have $\{0\} \not\subset \Omega_L = \Omega_R$.
Based upon the analysis in Cases 1 and 2, we have six different choices of the domain $\Omega_L \times \Omega_R$ for the β-divergence. They are summarized in Table 2 and illustrated in Figure 2. Since we only consider a convex domain, $\Omega_L$ and $\Omega_R$ should each be selected as a convex set. In addition, due to the inherent integral formulation of the β-divergence between b and u, the domains of both variables should be chosen to have the same sign.
As observed in Table 2, if β ∈ R e ∩ R − , then there is a symmetry in the selection of the domain of the β-divergence.That is, The positive domain is generally preferable, since it is related to real applications, e.g., intensity data type in the SAR system [11,12].However, if we reformulate the β-divergence with the Bregman-divergence, then the negative domain commonly appears in the dual Bregman-divergence.In addition, we note that, due to Theorem 1, the β-divergence with the domain defined in Table 2 satisfies the following distance-like properties of the generic divergence [1,2]: where ( 11) is followed from the Definition of the domain of the β-divergence and (10).Note that ( 12) is satisfied, if we restrict the domain of the β-divergence as Ω R × Ω R .In fact, let us assume that (b, u) . Therefore, we only need to show (10).Since the β-divergence (5) with its domain defined in Table 2 has the distance-like properties, (11) and (12), we can make a variational model with the β-divergence and a regularization term for smoothness constraint of the given data b.The following is an example of a variational model based on the β-divergence [12] where λ > 0 and B ⊆ Ω R is a domain of F β for a given b ∈ B. 
Note that B is an open convex set induced from the physical constraints of the observed data b. The prior R(u) can be a sparsity-enforcing function such as total variation (TV), $\mathrm{TV}(u) = \|\nabla u\|_1$ [25,26], or a frame [34]. We call (13) the β-sparse model, or β-TV [12] if TV is used as the prior. Under the domain restriction in Table 2, we have considerable freedom in choosing β ∈ R for the β-sparse model (13). However, if we add additional constraints, such as convexity of the β-divergence, then the possible choice of β is dramatically reduced to a small set. For example, $D_\beta(b|u)$ is convex with respect to u on its whole domain only if β ∈ [1, 2]; otherwise, its convexity depends on the given data b [12]. In Section 4.1, we analyze the convexity of the β-divergence via the right Bregman proximity operator [28].
Although the proposed β-sparse model F β (u) in ( 13) is not convex in general, F β (u) has an interesting global optimum property [11,12], in case λ = 0. See also [35,36].For completeness of the article, we add it below.
Theorem 2 ([11,36]).For a given observed data {b 1 is always satisfied, regardless of the choice of u ∈ B.
Note that µ in ( 14) corresponds to the β-centroid in segmentation problem and is related to the Bregman centroid, which is extensively studied in [37].In SAR image processing, if β = 0, then (14) corresponds to the multi-looking process, which is commonly used to reduce speckle noise in SAR data [12,22,23].
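The global optimum property can be observed numerically for scalar data. The sketch below assumes the closed form of the scalar β-divergence and searches a grid for the minimizer of the sum of divergences to a common point u; the data values, the grid, and the helper names are illustrative.

```python
import math

def d_beta(b, u, beta):
    """Scalar beta-divergence for positive b and u (assumed closed form)."""
    if beta == 0:
        r = b / u
        return r - math.log(r) - 1.0
    if beta == 1:
        return b * math.log(b / u) - b + u
    return (b ** beta / (beta * (beta - 1))
            - b * u ** (beta - 1) / (beta - 1)
            + u ** beta / beta)

def total_divergence(data, u, beta):
    """Sum of D_beta(b_i | u) over all observations, i.e., the data term with lambda = 0."""
    return sum(d_beta(b, u, beta) for b in data)

data = [0.5, 1.0, 4.0, 6.5]
mu = sum(data) / len(data)  # arithmetic mean, the candidate beta-centroid

# Grid search over u in (0, 10]; the same minimizer should appear for every beta.
grid = [0.01 * k for k in range(1, 1001)]
minimizers = {
    beta: min(grid, key=lambda u: total_divergence(data, u, beta))
    for beta in (-1, 0, 0.5, 1, 2, 3)
}
```

The reason is visible in the derivative with respect to u, which is u^(β−2)(u − b) per observation: summed over the data it vanishes exactly at the arithmetic mean, whatever the value of β.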

The Bregman-Divergence Associated with the Convex Function of Legendre Type for the β-Divergence
In this section, we study the Bregman-divergence associated with the convex function of Legendre type and its connection to the β-divergence.Although there is partial equivalence between two divergences, the Bregman-divergence has an important mathematical dual formulation.With a negative domain, the dual Bregman-divergence is unambiguously useful for convex reformulation of the nonconvex β-sparse model ( 13) (see Section 4).
The main advantage of the convex function of Legendre type is that the inverse of its gradient coincides with the gradient of its conjugate function, as described below. This is a useful property when we characterize the dual structure of the Bregman-divergence associated with the convex function of Legendre type.
Theorem 3 ([3,27]). Let Ω = domΦ and Ω* = domΦ*, where Φ* is the conjugate of Φ. Then, the function Φ is the convex function of Legendre type if and only if its conjugate Φ* is the convex function of Legendre type. In this case, the gradient mapping ∇Φ : int(Ω) → int(Ω*) is an isomorphism with its inverse mapping $(\nabla\Phi)^{-1} = \nabla\Phi^*$.
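The inverse relation in Theorem 3 can be sanity-checked on scalar examples. The sketch below works out the conjugates by hand for the Shannon entropy x ln x and for the power function x^β/(β(β−1)) with β = −1 on the positive half-line; these closed forms are derived for this illustration and the variable names are assumptions.

```python
import math

# Shannon entropy: Phi(x) = x ln x on x > 0, with conjugate Phi*(y) = exp(y - 1).
grad_shannon = lambda x: math.log(x) + 1.0
grad_shannon_conj = lambda y: math.exp(y - 1.0)

# Power function with beta = -1: Phi(x) = x**beta / (beta * (beta - 1)) on x > 0.
beta = -1.0
phi_pow = lambda x: x ** beta / (beta * (beta - 1.0))
grad_pow = lambda x: x ** (beta - 1.0) / (beta - 1.0)
# Conjugate obtained by solving y = grad_pow(x) for x (valid for y < 0 here).
phi_pow_conj = lambda y: ((beta - 1.0) * y) ** (beta / (beta - 1.0)) / beta
grad_pow_conj = lambda y: ((beta - 1.0) * y) ** (1.0 / (beta - 1.0))

x = 0.5
roundtrip_shannon = grad_shannon_conj(grad_shannon(x))
roundtrip_power = grad_pow_conj(grad_pow(x))

# Fenchel-Young equality Phi(x) + Phi*(y) = x * y holds at y = grad Phi(x).
y = grad_pow(x)
fenchel_gap = phi_pow(x) + phi_pow_conj(y) - x * y
```

For β = −1 the gradient maps the positive half-line onto the negative half-line, so the conjugate lives on a negative domain, which is the situation exploited later for the dual Bregman-divergence.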
For more details on Theorem 3, see Theorem 26.5 in [3] and Fact 2.9 in [27]. Let us assume that Φ is the convex function of Legendre type. Then, we can define the Bregman-divergence $D_\Phi : \Omega \times \mathrm{int}(\Omega) \to \mathbb{R}_+$ associated with Legendre Φ:
$D_\Phi(b|u) = \Phi(b) - \Phi(u) - \langle b - u, \nabla\Phi(u) \rangle$ (16)
where b ∈ Ω and u ∈ int(Ω). Several functions we are interested in belong to the category of the convex function of Legendre type. For instance, the Shannon entropy function $\Phi(x) = \langle x \log x, 1 \rangle$ is a typical example of Legendre. The Bregman-divergence associated with it corresponds to the β-divergence with β = 1, i.e., the generalized Kullback-Leibler-divergence. We note that there are convex functions of Legendre type that do not have a corresponding β-divergence, for instance, the Fermi-Dirac entropy function $\Phi(x) = \langle x \ln x + (1-x)\ln(1-x), 1 \rangle$. See [27,36] for more details on the Bregman-divergence. In the following, we summarize various useful features of the Bregman-divergence associated with Legendre Φ.
In the above Theorem, the dual formulation (17) is a unique feature of the Bregman-divergence with Legendre. Unfortunately, the β-divergence does not have a corresponding dual concept. Later, we will show how to use the dual Bregman-divergence (17) to build a convex reformulated model of the nonconvex β-sparse model (13). In addition, we note that the β-divergence (5) is established based on the extended logarithmic function (4), which is an equivalence class in terms of a constant function. Therefore, we can say that the β-divergence is invariant with respect to a constant function in the extended logarithmic function. Interestingly, the base function Φ of the Bregman-divergence has the same kind of invariance property with respect to an affine function. For this, Φ does not need to be Legendre. However, for simplicity, we assume that Φ is Legendre. The details are as follows.
Proposition 1. Let us define an equivalence class of the convex function of Legendre type Φ in terms of an affine function as follows: $[\Phi]_A = \{\Phi(x) + A(x)\}$, where $A(x) = \langle a, x \rangle + c$ is an arbitrary affine function. Then, $D_{[\Phi]_A}(b|u) = D_\Phi(b|u)$.
Proof. We have the following equivalence with respect to an arbitrary affine function A:
$D_{\Phi + A}(b|u) = \Phi(b) + A(b) - \Phi(u) - A(u) - \langle b - u, \nabla\Phi(u) + \nabla A(u) \rangle = \Phi(b) - \Phi(u) - \langle b - u, \nabla\Phi(u) \rangle = D_\Phi(b|u)$ (18)
where (b, u) ∈ Ω × int(Ω), Ω = domΦ, and we use $A(b) - A(u) = \langle b - u, \nabla A(u) \rangle$ for an affine A. Therefore, adding an arbitrary affine function A to Φ (i.e., taking the equivalence class $[\Phi]_A$ in terms of the affine function A) does not change the structure of the Bregman-divergence $D_\Phi(b|u)$ at all.
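The affine invariance in Proposition 1 is easy to confirm numerically. This is a minimal scalar sketch using the Burg entropy as the base function; the affine coefficients and the helper name `bregman_1d` are arbitrary choices for the example.

```python
import math

def bregman_1d(phi, dphi, b, u):
    """Scalar Bregman-divergence phi(b) - phi(u) - (b - u) * phi'(u)."""
    return phi(b) - phi(u) - (b - u) * dphi(u)

# Base function: Burg entropy on the positive half-line.
phi = lambda x: -math.log(x)
dphi = lambda x: -1.0 / x

# The same function shifted by an arbitrary affine term A(x) = slope * x + offset.
slope, offset = 3.0, -7.0
phi_shifted = lambda x: phi(x) + slope * x + offset
dphi_shifted = lambda x: dphi(x) + slope

b, u = 0.8, 2.5
plain = bregman_1d(phi, dphi, b, u)
shifted = bregman_1d(phi_shifted, dphi_shifted, b, u)
```

The affine part contributes slope·(b − u) to the function difference and the same amount to the linear term, so the two cancel and `plain` and `shifted` agree up to rounding.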
To connect the β-divergence and the Bregman-divergence associated with Legendre, we need to find a specialized convex function of Legendre type. Based on the comments in [2], we use an integral formula of the extended logarithmic function for the special convex function of Legendre type. Through this connection, we can reformulate the β-divergence into the Bregman-divergence associated with the convex function of Legendre type. The details are as follows.
Theorem 5. Let
$\Phi(x) = \left\langle \int_d^x \ln_{2-\beta}(t)\,dt,\ 1 \right\rangle$ (19)
where $\ln_{2-\beta}$ is the extended logarithmic function in (4) and d is an arbitrary constant vector in $\mathbb{R}^n$ selected to satisfy the condition Φ(x) ∈ R. For simplicity, by Proposition 1, we drop all affine functions in Φ(x). Then, Φ in (19) is the convex function of Legendre type with the domain Ω given in (20).
Proof. For simplicity, all affine functions are left out based on Proposition 1. In addition, it is trivial to show that Φ(x) is the Burg entropy ($\langle -\ln x, 1 \rangle$ with $\Omega = \mathbb{R}^n_{++}$) if β = 0 and the Shannon entropy ($\langle x \ln x, 1 \rangle$ with $\Omega = \mathbb{R}^n_{+}$) if β = 1. They are well-known examples of Legendre. As noted in (2), the corresponding Bregman-divergences are the Itakura-Saito-divergence and the generalized Kullback-Leibler-divergence. Now, we only need to check whether $\Phi(x) = \frac{1}{\beta(\beta-1)}\langle x^\beta, 1 \rangle$ (β ≠ 0, 1) is Legendre or not. Among the four conditions in Definition 3, it is trivial to show that Φ(x) satisfies conditions 1 and 2. What remains is to check the two Legendre conditions 3 and 4 for Φ(x) with β ≠ 0, 1.

II. Condition 4 in Definition 3:
It can be easily checked by the fact that Φ is strictly convex on int(Ω) if and only if ∇Φ is strictly monotone, that is, the following is satisfied [38] : Since ∇Φ is separable in terms of dimension, we only need to show that ∇Φ(x) is a strictly increasing function on int(Ω) ∩ R. Note that if ∇ 2 Φ(x) = x β−2 > 0 on an open region, then Φ is strictly convex (i.e., ∇Φ is strictly increasing) in that region: Note that at {0}, we need to directly show that ∇Φ is strictly increasing.Now, we integrate the information in the above Legendre conditions 3 and 4 for the decision of the domain Ω = domΦ based on β.The details are the following: but Φ is an odd function with respect to zero and thus it is not a convex function.
-Condition 3 is satisfied on the above selected domain.
-Condition 3 is satisfied on the above selected domain.
Remark 1.Note that Φ in (19) should be an equivalence class Here, [ln 2−β (t)] c is an equivalence class of an extended logarithmic function in (4).As observed in (18), we have D [Φ] A (b|u) = D Φ (b|u) for any affine function A. Therefore, we can drop all affine function in [Φ] A .
Since Φ in ( 19) is Legendre under the domain condition (20), we can establish a new Bregman-divergence associated with Φ in (19).Interestingly, it corresponds to the β-divergence (1) [2].However, there is a mismatch between the domain of the Bregman-divergence in (20) and the domain of the β-divergence in Table 2.We summarize it in Table 3.As a matter of fact, the positive domain with β > 1 is not defined in Φ (19) due to the Legendre condition.In addition, in the case of β = 0, the negative domain R n −− × R n −− is not defined in the Bregman-divergence with Φ (19).In the following Theorem, we show that under the restriction of the domain of the β-divergence to the domain of the Bregman-divergence, we can get an equivalence between the β-divergence and the Bregman-divergence associated with Legendre Φ (19).
Table 3.We compare the domain of the β-divergence and the domain of the Bregman-divergence associated with the convex function of Legendre type in (19).We note that the domain R n + × R n + (β > 1) and the domain R n −− × R n −− (β = 0) do not exist in the Bregman-divergence.* If we relax the Legendre condition of Φ as a convex and smooth function, then the Bregman-divergence D Φ also exists in the region β > 1 with domΦ = R n + .
Proof.Since the domain of the β-divergence Ω L × Ω R is set up with domΦ × int(domΦ), the β-divergence is well defined with the restricted domain.In the following, we show an equivalence between the β-divergence and the Bregman-divergence under the domain condition of the Bregman-divergence: Note that we do not use any ∇Φ(b) information in the above derivation and thus the above equivalence is well satisfied within the domain of the Bregman-divergence associated with Legendre Φ (19).
In the following Theorem, we calculate the conjugate function Φ* and the corresponding domain domΦ* of the convex function of Legendre type Φ defined in (19). The computation of the domain of Φ* is useful in determining the structure of $D_\Phi(b|u)$. For instance, as noted in Theorem 4 (3), if domΦ* is open, then the corresponding Bregman-divergence $D_\Phi(b|u)$ is coercive with respect to u ∈ int(domΦ). Surprisingly, when β ∈ [0, 1), $D_\Phi(b|u)$ is not convex but coercive with respect to u. This fact plays an important role in SAR speckle reduction problems [12,15,21].
Theorem 7. Let Φ (19) be the convex function of Legendre type with domΦ (20).Then, Φ * , the conjugate of Φ, and the corresponding domain domΦ * is calculated as follows: • β ∈ {0, 1} : In this case, domΦ * depends on β: Proof.Since Φ is Legendre, from Theorem 3, if x ∈ int(domΦ * ), then we have As noticed in (20), the domain of Φ (19) depends on β and thus the domain of its conjugate function Φ * also depends on β.We categorize domΦ * below, by using (22): 22), the conjugate function Φ * is calculated as Therefore, the domain of 22), the conjugate function Φ * is calculated as x β , 1 and domΦ is given in (20).By simple calculation, we get and from (22), the conjugate function Φ * is derived as follows: Now, we need to decide the domain While we identify domΦ * , it should be selected based on the following isomorphism (in Theorem 3) where ∇Φ(x) = 1 β−1 x β−1 = (∇Φ * ) −1 and the following estimation With the above information and the classification of domΦ in (20), we are going to decide domΦ * based on β.
- β < 0 and $\mathrm{dom}\Phi = \mathbb{R}^n_{++}$: In this case, 0 < β/(β − 1) < 1. Therefore, we have $\mathrm{dom}\Phi^* = \mathbb{R}^n_{-}$. This domain matches the bijective mapping $\nabla\Phi : \mathbb{R}^n_{++} \to \mathbb{R}^n_{--}$.
In the following, the global minimization property of the β-divergence in Theorem 2 is reformulated with the Bregman-divergence. For more details, see [35,36].
Theorem 8 ([36]). For all $b_i \in \mathrm{int}(\mathrm{dom}\Phi)$ with an index set i ∈ N, the following inequality is always satisfied, irrespective of the choice of u ∈ int(domΦ):
$\sum_{i \in N} D_\Phi(b_i|\mu) \le \sum_{i \in N} D_\Phi(b_i|u)$ (25)
where $\mu = \frac{1}{|N|}\sum_{i \in N} b_i$ and |N| is the cardinality of the set N.
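Proof. Let us start with the generalized Pythagoras Theorem [35] of the Bregman-divergence:
$D_\Phi(b|u) = D_\Phi(b|\mu) + D_\Phi(\mu|u) + \langle b - \mu, \nabla\Phi(\mu) - \nabla\Phi(u) \rangle$. (26)
For all $b_i \in \mathrm{int}(\mathrm{dom}\Phi)$ with the index set N, let $\mu = \frac{1}{|N|}\sum_{i \in N} b_i$. Summing (26) over i ∈ N, the inner-product terms cancel because $\sum_{i \in N}(b_i - \mu) = 0$. Then, from (26), we get
$\sum_{i \in N} D_\Phi(b_i|u) = \sum_{i \in N} D_\Phi(b_i|\mu) + |N|\,D_\Phi(\mu|u) \ge \sum_{i \in N} D_\Phi(b_i|\mu)$,
irrespective of the choice of u ∈ int(domΦ).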
Note that µ in (25) corresponds to the Bregman centroid, which is extensively studied in [37]. Now, we are ready to move on to the variational model having the Bregman-divergence as its fitting term. Many important variational models induced from statistical distributions fall into this category.

Bregman Variational Model-Bregman-TV
In this section, we study the β-sparse model (13) with TV regularization via Bregman divergence (16) associated with Legendre (19) under the domain condition in (20).First, we introduce Bregman proximity operators in Section 4.1 and then we demonstrate how to use dual Bregman-divergence with the negative domain for convex reformulation of the nonconvex β-TV model [12] in Section 4.2.
Image data is generally observed as a 2D array and has a limited dynamic range, due to the physical constraints of the image capturing system. Therefore, let us assume that the observed image data is bounded and also column-wise stacked. That is, $b \in B \subset \mathbb{R}^n_+$, where B is an open and bounded convex set. Now, we start with the following Bregman variational model with total variation, i.e., Bregman-TV:
$\min_{u \in B}\ D_\Phi(b|Lu) + \lambda\,\mathrm{TV}(u)$ (27)
where $\mathrm{TV}(u) = \sum_{i=1}^n |(\nabla u)_i|$ is a typical sparsity constraint in image processing, $L : \mathbb{R}^n_+ \to \mathbb{R}^n_+$ is a linear mapping, λ > 0, and $B \subseteq \mathrm{int}(\mathrm{dom}\Phi) \subset \mathbb{R}^n_+$. Although the domain B is nonnegative in real applications, through the dual formulation of the Bregman-divergence (17), the nonpositive domain is very common and is sometimes useful for convex reformulation of nonconvex variational models appearing in SAR image enhancement problems. See Theorem 7 for the negative domain of the conjugate function Φ*.
We note that L is a matrix with nonnegative entries, designed according to the application: for the image deblurring problem, L is a blur (convolution) matrix; for the image inpainting problem, L is a binary mask matrix; for the image denoising problem, L is the identity matrix. See [39,40] for more details on image denoising, deblurring, and inpainting problems with total variation or other sparsity constraints such as wavelet frames. The following are typical examples of the Bregman-TV induced from various physical noise sources, e.g., Gaussian, Poisson, and speckle noise:
• β = 2: Image restoration problems (e.g., denoising, deblurring, inpainting) under Gaussian noise [25,26,41] min
• β = 1: Image restoration problems (e.g., denoising, deblurring, inpainting) under Poisson noise [32] min
• β = 0: Image restoration problems (e.g., denoising, deblurring, inpainting) under Gamma multiplicative noise (or speckle noise) [15] min
• β ∈ (0, 1): A convex relaxed model [12] for the above SAR image restoration model (30). Additionally, this region is related to the compound Poisson distribution [8] min
For the remainder of this article, we only consider the image denoising problem (L = I). It is also known as the (nonconvex) right Bregman proximity operator [10].
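To make the denoising case (L = I) concrete, the sketch below evaluates a Bregman-TV-type objective for a 1D signal. It rests on several assumptions that are not taken from the article: the objective is taken to be the data-fitting divergence plus λ·TV(u), the fitting term is the standard elementwise β-divergence from the literature (least squares for β = 2, generalized Kullback-Leibler for β = 1, Itakura-Saito for β = 0), and TV is the anisotropic 1D version. All function names are hypothetical.

```python
import math

def beta_divergence(b, u, beta):
    """Elementwise beta-divergence d_beta(b|u) for b, u > 0 (standard form, assumed)."""
    if beta == 1:  # generalized Kullback-Leibler divergence
        return b * math.log(b / u) - b + u
    if beta == 0:  # Itakura-Saito divergence
        return b / u - math.log(b / u) - 1.0
    return (b ** beta + (beta - 1.0) * u ** beta
            - beta * b * u ** (beta - 1.0)) / (beta * (beta - 1.0))

def tv_1d(u):
    """Anisotropic total variation of a 1D signal: sum of absolute differences."""
    return sum(abs(u[i + 1] - u[i]) for i in range(len(u) - 1))

def bregman_tv_objective(b, u, beta, lam):
    """Assumed denoising objective: sum_i d_beta(b_i|u_i) + lam * TV(u)."""
    fit = sum(beta_divergence(bi, ui, beta) for bi, ui in zip(b, u))
    return fit + lam * tv_1d(u)

b = [1.0, 1.2, 3.0, 2.9]               # observed (noisy) signal
flat = [sum(b) / len(b)] * len(b)      # constant candidate: zero TV, nonzero fit
for beta in (2, 1, 0):
    print(beta,
          bregman_tv_objective(b, b, beta, lam=0.5),
          bregman_tv_objective(b, flat, beta, lam=0.5))
```

At u = b the fitting term vanishes and only the TV penalty remains, while the constant candidate pays only the fitting term; the minimizer trades these two off through λ.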

Bregman Proximity Operators
In this section, we introduce the right and left Bregman proximity operators [9,10,30] based on the Bregman-divergence associated with Φ (19). Here, we assume that Φ (19) is a convex, smooth, and (dimensionally) separable function (not necessarily Legendre). That is, the Bregman-divergence D_Φ(b|u) associated with Φ (19) also exists in the positive domain domΦ = R^n_+ with β > 1. See Table 3. We note that the Bregman-divergence D_Φ(b|u) associated with Φ (19) is strictly convex with respect to b (see Theorem 4). On the other hand, convexity of D_Φ(b|u) with respect to u strongly depends on the observed data b and β in Φ (19). Based on [9,12,28], we present three different convexities of the Bregman-divergence associated with Φ. Let Ω = domΦ. Then, we have the following:

We note that the conditional convexity is first introduced in this article based on the previous analysis of the β-TV model [12]. We are interested in conditional convexity because, in real applications, the dynamic range of the observed data is very limited. For instance, image data observed via an optical camera have 8-bit resolution (i.e., b ∈ [0, 2^8]) [41], and the intensity level of the backscattered radar signal in a SAR system has at most 32-bit resolution (i.e., b ∈ [0, 2^32]) [21]. Therefore, it is natural to consider convexity depending on the given data b.
The following theorem, mostly based on Theorem 3.3 in [28], is useful in characterizing convexity of the Bregman-divergence D_Φ(b|u) associated with Φ (19). We note that Diag(A) is a vector with diagonal entries of a matrix (or tensor) A. Also, a function f is concave if and only if f(αu + (1 − α)v) ≥ αf(u) + (1 − α)f(v) for all u, v ∈ dom f and all α ∈ [0, 1].

Theorem 9. Let D_Φ(b|u) be the Bregman-divergence associated with a convex, smooth, and (dimensionally) separable function Φ. In addition, we assume that h = Diag(∇²Φ) > 0. Then, we have the following useful criterion for the convexity of D_Φ(b|u). Here, Ω = domΦ.
(i) D_Φ(b|u) is jointly convex if and only if 1/h is concave. Note that, since h = (h_1, ..., h_n) is (dimensionally) separable, 1/h is defined as 1/h(u) = (1/h_1(u), ..., 1/h_n(u)); it is concave if and only if h satisfies the following inequality: Moreover, if ∇²h exists, then D_Φ(b|u) is jointly convex if and only if where B is an open convex set in int(Ω) and depends on b [12,42].
Proof. The proofs of the first two convexities are given in Theorem 3.3 in [28]. For the conditional convexity, let us take the second derivative of D_Φ(b|u) with respect to u. Then, we get For each b ∈ B_b ⊂ int(domΦ), we can find the domain of u ∈ B_u ⊂ int(domΦ) satisfying the above condition. Let B be a convex and open set satisfying B ⊆ B_b ∩ B_u. Then, we have the conditional convexity condition in (35).
The following theorem shows an interesting result: D_Φ(b|u) associated with Φ (19) is convex with respect to u on its whole domain only in the very limited region β ∈ [1,2]. From a statistical point of view, this region is somewhat curious. In fact, if β ∈ (1, 2), then the Bregman-divergence D_Φ(b|u) does not have a corresponding Tweedie distribution [8].
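The role of the range β ∈ [1, 2] can be illustrated with a short numerical check. This is a sketch under an assumption, not the article's derivation: it uses the standard scalar β-divergence from the literature, whose first derivative in u is u^(β−2)(u − b), so that the second derivative is u^(β−3)[(β − 1)u − (β − 2)b]. The function names are hypothetical. For β < 1 the sign condition reduces to u ≤ αb with α = (β − 2)/(β − 1).

```python
def d2_beta_divergence_du2(b, u, beta):
    """Second derivative in u of the standard scalar beta-divergence (assumed form)."""
    return u ** (beta - 3.0) * ((beta - 1.0) * u - (beta - 2.0) * b)

def convex_in_u_on_grid(b, beta, grid):
    """True if the second derivative is nonnegative at every grid point."""
    return all(d2_beta_divergence_du2(b, u, beta) >= 0.0 for u in grid)

def alpha(beta):
    """Threshold ratio alpha = (beta - 2) / (beta - 1), defined for beta != 1."""
    return (beta - 2.0) / (beta - 1.0)

grid = [0.01 * k for k in range(1, 2001)]  # u in (0, 20]
b = 1.0

for beta in (1.0, 1.5, 2.0):
    print("beta =", beta, "convex on whole grid:", convex_in_u_on_grid(b, beta, grid))

# beta = 0 (Itakura-Saito): convexity in u holds only up to u = alpha * b = 2 * b.
inside = [u for u in grid if u <= 0.99 * alpha(0.0) * b]
print("beta = 0, whole grid:", convex_in_u_on_grid(b, 0.0, grid))
print("beta = 0, u <= 0.99 * alpha * b:", convex_in_u_on_grid(b, 0.0, inside))
```

For β in [1, 2] both terms of the bracket are nonnegative for all positive u and b, which is why no restriction on the data is needed there. Outside that range the convex region depends on b, which matches the motivation for conditional convexity.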
Theorem 10. Let Φ be a convex and smooth function in (19) (not necessarily Legendre). Then, D_Φ(b|u) is separately convex (and also jointly convex) with the following domain conditions: Due to the physical constraints of the observed data b, if we further restrict the domain of b, then we have conditional convexity of D_Φ(b|u). Let α = (β − 2)/(β − 1), if β ≠ 1, and let B^±_{m/M} be a constant vector in R^n representing B^±_{m/M} 1. Then, we have the following:
• Case I: Let us assume that the given data b is positive and has the following limitation in measurement Then, D_Φ(b|u) is conditionally convex on B, which is given below:

Remark 2. We could use the β-divergence to define proximity operators. For instance, the right β-divergence proximity operator can be defined as Instead of TV in (44), if we use an indicator function for a convex set S, then we get the right β-divergence projection operator for S as follows: It is interesting that the robustness of the β-divergence [7] can be explained through the right β-divergence projection operator (45). Let us assume that b, u in (45) are probability distributions (i.e., ∑_i b_i = 1 and ∑_i u_i = 1) and S is a set of Gaussian distributions. Here, the notation is slightly abused, since the Gaussian distribution is a continuous probability distribution and the set of Gaussian distributions is not convex. We note that the (generalized) Kullback-Leibler (KL) divergence (i.e., D_β(b|u) with β = 1), which is a commonly used similarity measure between two probability distributions, is undefined at zero probability (u = 0). See Figure 2a and Table 3. However, outliers (i.e., rare events) have extremely low probability and thus lie near zero probability. In this case, the KL-divergence amplifies the value near zero, i.e., lim_{u→0} D_{β=1}(b|u) = +∞. In contrast, when β > 1, as seen in Figure 2a and Table 3, lim_{u→0} D_{β>1}(b|u) < +∞. Thus, outliers near zero are not weighted too heavily. Hence, the right β-divergence projection operator (45) with β > 1 is more robust to outliers than the KL-divergence-based operator. For more details, see [4,5,7]. Note that we can also define the left β-divergence proximity operator as

In this section, we introduce a convex reformulation of the nonconvex Bregman-TV (27) (β < 1 and L = I) associated with Φ (19), which is the convex function of Legendre type whose domain is given in Table 3. Note that the problems we study in this section are related to the speckle reduction problem [12,15].
Proof. Let f(w) = ∇Φ*(w). Then, since β < 1 and w ∈ R^n_{--}, we have Since Φ* is Legendre, ∇Φ* is strictly increasing on its domain. We note that, although ∇Φ* is strictly convex and strictly increasing, it is not coercive but bounded below, i.e., lim Finally, by using the strict convexity and the strictly increasing property of ∇Φ*, we have a unique solution of the minimization problem in (48).
Theorem 11. Let Φ (19) be the convex function of Legendre type whose domain is given in Table 3. In addition, we assume that β < 1 and B Then, for the given data b ∈ B, the left Bregman proximity operator (48) associated with the dual Bregman-divergence is well-defined. That is, there is a unique solution w* = arg min_{w∈∇Φ(B)} F(w).
Proof. Since Φ (19) is Legendre, Φ* is also Legendre on the domain R^n_{--} (= int(domΦ*)). Therefore, D_Φ*(w|x) is strictly convex and coercive in terms of w by Theorem 4 (2). In addition, TV is the composition ‖·‖_1 ∘ D, where D is a linear operator (i.e., the first-order difference matrix) [44] and ‖a‖_1 = ∑_i |a_i|. Hence, we have where ∇Φ*(w) is strictly convex (Lemma 2), D is a linear operator, and ‖·‖_1 is a simple metric (convex and increasing). Therefore, TV(∇Φ*(w)) is also a convex function (see Section IV.2.1 in [38]). In addition, since ∇Φ*(w) is bounded below (Lemma 2), TV(∇Φ*(w)) is also bounded below. Then, the objective function F(w) in (48) is coercive (see Lemma 2.12 in [10]) and strictly convex. In the end, the left Bregman proximity operator associated with the dual Bregman-divergence has a unique solution (see Proposition 3.5 in [10]) as (51)

However, in general, due to the severe nonlinearity of ∇Φ* within the non-smooth regularizer, i.e., TV(∇Φ*(w)), it is not easy to design a stable numerical algorithm to find a solution u* in (51). To overcome this drawback, we can directly modify (47). Since w = ∇Φ(u) is a nonlinear constraint, we cannot directly apply sophisticated augmented Lagrangian-based optimization algorithms. As a heuristic, to remedy this nonlinear constraint, we may consider the following penalty method [45]: This model is convex in terms of w and u, respectively. However, it is not convex with respect to (u, w). In the case of speckle reduction problems (30), the nonlinearity of ∇Φ could be reduced by using the shifting technique in [42].
In the following example, we show how (51) can be applied to relax nonconvex speckle reduction problems (30) with L = I. is the Bregman-divergence associated with the Burg entropy function Φ. This model is known as the AA-model [15]. It is well known that it is not easy to find a global minimizer of (55), due to the severe nonconvexity of D_Φ(b|u) in terms of u [15,21]. Therefore, various transform-based convex relaxation approaches have been introduced [12,[16][17][18][19][20][21]42]. In this example, we use the dual Bregman-divergence to find a solution of (55). We note that Φ(u) = ⟨−ln u, 1⟩ is the convex function of Legendre type on its domain int(domΦ) = R^n_{++}. Hence, by Theorem 7, we get the following corresponding conjugate function: We note that B_w = ∇Φ(B) is a convex set and TV(−1/w) is also convex for all w ∈ R^n_{--}. Therefore, the objective function F(w) in (56) is strictly convex on its domain B_w. In addition, due to Theorem 4 (2), F(w) is coercive in the domain R^n_{--}. Therefore, we have a unique solution u* of (55). A similar inverse transformation on the positive domain R^n_{++} itself is introduced in [45,46].
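The change of variable in this example can be illustrated in the scalar case. The sketch below is an assumption-laden illustration rather than the article's formulation: for the Burg entropy Φ(u) = −ln u, the conjugate is computed directly from its definition as Φ*(w) = sup_u {wu + ln u} = −1 − ln(−w) for w < 0, so that ∇Φ(u) = −1/u maps the positive axis onto the negative axis and ∇Φ*(w) = −1/w maps it back. The duality relation D_Φ(b|u) = D_Φ*(∇Φ(u)|∇Φ(b)) used in the check is the standard one for Legendre functions. All function names are hypothetical.

```python
import math

def phi(u):
    """Burg entropy, Phi(u) = -ln(u), for u > 0."""
    return -math.log(u)

def grad_phi(u):
    """Gradient map u -> w = -1/u, sending positive u to negative w."""
    return -1.0 / u

def phi_conj(w):
    """Conjugate of the Burg entropy, Phi*(w) = -1 - ln(-w), for w < 0."""
    return -1.0 - math.log(-w)

def grad_phi_conj(w):
    """Inverse gradient map w -> u = -1/w, sending negative w to positive u."""
    return -1.0 / w

def bregman_primal(b, u):
    """D_Phi(b|u) on the positive domain (Itakura-Saito form)."""
    return phi(b) - phi(u) - grad_phi(u) * (b - u)

def bregman_dual(w, x):
    """D_Phi*(w|x) on the negative domain."""
    return phi_conj(w) - phi_conj(x) - grad_phi_conj(x) * (w - x)

pairs = [(1.0, 2.0), (3.5, 0.7), (0.2, 0.25), (10.0, 4.0)]
for b, u in pairs:
    w, x = grad_phi(u), grad_phi(b)   # dual variables, both negative
    print(b, u, bregman_primal(b, u), bregman_dual(w, x))
```

The two columns of divergence values agree, so minimizing over the negative dual variable w = −1/u carries the same fitting information as the original problem in u, while the dual divergence is convex in its first argument w.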

Conclusions
In this article, we introduced the extended logarithmic function and, based on it, redefined the domain of the β-divergence. In fact, we have found that if β is in the class R_e = {x ∈ R | x = 2k/(2l + 1), k, l ∈ Z}, then the negative region R^n_{--} is included in the domain of the β-divergence. In addition, if we use the integral of the extended logarithmic function as a base function of the Bregman-divergence, then we have a partial equivalence between the β-divergence and the Bregman-divergence associated with the Legendre base function. Last but not least, by using the dual formulation of the Bregman-divergence associated with the convex function of Legendre type and its negative domain, we have shown that we can build a convex reformulation of the nonconvex variational model that appears in the SAR speckle reduction problem. The approaches in this article could be extended to other divergences, such as the α- and γ-divergences [2]. In addition, we could plug the presented model into various segmentation problems [11,47,48].

Theorem 4.
Let Ω = domΦ and Ω* = domΦ*. Then, the Bregman-divergence associated with the convex function of Legendre type Φ satisfies the following.
1. D_Φ(b|u) is strictly convex with respect to b on int(Ω).
2. For any u ∈ int(Ω), D_Φ(b|u) is coercive with respect to b, i.e., lim_{‖b‖→∞} D_Φ(b|u) = ∞.
3. For some b ∈ int(Ω), D_Φ(b|u) is coercive with respect to u if and only if

Then, the Bregman-divergence D_{[Φ]_A} associated with an equivalence class [Φ]_A is equal to the Bregman-divergence D_Φ associated with Φ, irrespective of the choice of an affine function A.
is convex with respect to u ∈ B for all b ∈ B, where B ⊆ int(Ω) is an open convex set and depends on b.

4.2. Dual Bregman-Divergence-Based Left Bregman Operator for a Convex Reformulation of the Bregman-TV with β < 1