## Appendix A

This appendix provides background on the f-divergence, the $\alpha$-divergence, and the convex conjugate function, highlighting the key properties required for our derivations.

The f-divergence [26,49,50] generalizes many similarity measures between probability distributions [32]. For two distributions $\pi$ and $q$ on a finite set $\mathcal{A}$, the f-divergence is defined as

$$D_f(\pi \,\|\, q) = \sum_{a \in \mathcal{A}} q(a)\, f\!\left(\frac{\pi(a)}{q(a)}\right),$$

where $f$ is a convex function on $(0,\infty)$ such that $f(1)=0$. For example, the KL divergence corresponds to $f_{KL}(x)=x\log x$. Note that $\pi$ must be absolutely continuous with respect to $q$ to avoid division by zero, i.e., $q(a)=0$ implies $\pi(a)=0$ for all $a\in\mathcal{A}$. We additionally assume $f$ to be continuously differentiable, which covers all cases of interest to us. The f-divergence can be generalized to unnormalized distributions; for example, the generalized KL divergence [27] corresponds to $f_1(x)=x\log x-(x-1)$. The derivations in this paper benefit from employing unnormalized distributions and subsequently imposing the normalization condition as a constraint.
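As a concrete illustration of the definition (our own sketch, not code from the paper; the helper names `f_divergence` and `f_kl` are ours), the sum can be evaluated directly for discrete distributions:

```python
import math

def f_divergence(pi, q, f):
    """D_f(pi || q) = sum over a of q(a) * f(pi(a) / q(a)) on a finite set.

    Requires pi to be absolutely continuous w.r.t. q:
    q(a) == 0 must imply pi(a) == 0.
    """
    total = 0.0
    for p_a, q_a in zip(pi, q):
        if q_a == 0.0:
            assert p_a == 0.0, "pi must be absolutely continuous w.r.t. q"
            continue  # the a-th term contributes 0 by convention
        total += q_a * f(p_a / q_a)
    return total

# Generator of the KL divergence; x*log(x) -> 0 as x -> 0+.
def f_kl(x):
    return x * math.log(x) if x > 0 else 0.0

pi = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Agrees with the direct KL formula sum_a pi(a) * log(pi(a)/q(a)).
kl_direct = sum(p * math.log(p / s) for p, s in zip(pi, q) if p > 0)
assert abs(f_divergence(pi, q, f_kl) - kl_direct) < 1e-12
```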

The $\alpha$-divergence [19,20] is a one-parameter family of f-divergences generated by the $\alpha$-function $f_\alpha(x)$ with $\alpha\in\mathbb{R}$. The particular choice of the family of functions $f_\alpha$ is motivated by a generalization of the natural logarithm [21]. The $\alpha$-logarithm $\log_\alpha(x)=(x^{\alpha-1}-1)/(\alpha-1)$ is a power function for $\alpha\ne 1$ that turns into the natural logarithm for $\alpha\to 1$. Replacing the natural logarithm in the derivative of the KL divergence, $f_1'(x)=\log x$, by the $\alpha$-logarithm and integrating $f_\alpha'$ under the condition $f_\alpha(1)=0$ yields the $\alpha$-function

$$f_\alpha(x) = \frac{x^\alpha - \alpha x + \alpha - 1}{\alpha(\alpha-1)}. \tag{A1}$$

The $\alpha$-divergence generalizes the KL divergence, reverse KL divergence, Hellinger distance, Pearson $\chi^2$-divergence, and Neyman (reverse Pearson) $\chi^2$-divergence. Figure A1 displays well-known $\alpha$-divergences as points on the parabola $y=\alpha(\alpha-1)$. For every divergence, there is a reverse divergence symmetric with respect to the point $\alpha=0.5$, which corresponds to the Hellinger distance.
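A quick numerical check (our own sketch, with the hypothetical helper `f_alpha`) confirms that the $\alpha$-function reproduces the familiar generators at $\alpha = 2$, $-1$, and $1/2$, and approaches the generalized KL generator as $\alpha \to 1$:

```python
import math

def f_alpha(x, alpha):
    """f_alpha(x) = (x**alpha - alpha*x + alpha - 1) / (alpha * (alpha - 1)).

    Valid for alpha not in {0, 1}; those two values are obtained as limits
    (reverse KL and KL generators, respectively).
    """
    return (x**alpha - alpha * x + alpha - 1) / (alpha * (alpha - 1))

x = 1.7
# alpha = 2: Pearson chi^2 generator (1/2)(x-1)^2
assert abs(f_alpha(x, 2) - 0.5 * (x - 1)**2) < 1e-12
# alpha = -1: Neyman chi^2 generator (x-1)^2 / (2x)
assert abs(f_alpha(x, -1) - (x - 1)**2 / (2 * x)) < 1e-12
# alpha = 1/2: squared Hellinger generator 2*(sqrt(x)-1)^2
assert abs(f_alpha(x, 0.5) - 2 * (math.sqrt(x) - 1)**2) < 1e-12
# alpha -> 1: generalized KL generator x*log(x) - (x - 1)
assert abs(f_alpha(x, 1 + 1e-7) - (x * math.log(x) - (x - 1))) < 1e-4
```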

**Figure A1.**
The $\alpha $-divergence smoothly connects several prominent divergences.


The convex conjugate of $f(x)$ is defined as $f^*(y)=\sup_{x\in\operatorname{dom}f}\{\langle y,x\rangle-f(x)\}$, where the angle brackets $\langle y,x\rangle$ denote the dot product [51]. The key property $(f^*)'=(f')^{-1}$, relating the derivatives of $f^*$ and $f$, yields Table A1, which lists common functions $f_\alpha$ together with their convex conjugates and derivatives. In the general case (A1), the convex conjugate and its derivative are given by

$$f_\alpha^*(y) = \frac{\big((\alpha-1)y+1\big)^{\frac{\alpha}{\alpha-1}}-1}{\alpha}, \qquad \left(f_\alpha^*\right)'(y) = \big((\alpha-1)y+1\big)^{\frac{1}{\alpha-1}}, \qquad (\alpha-1)y+1 > 0. \tag{A2}$$

The function $f_\alpha$ is convex, non-negative, and attains its minimum at $x=1$ with $f_\alpha(1)=0$. The function $(f_\alpha^*)'$ is positive on its domain with $(f_\alpha^*)'(0)=1$, and $f_\alpha^*$ has the property $f_\alpha^*(0)=0$. The linear inequality constraint (A2) on $\operatorname{dom}f_\alpha^*$ follows from the requirement $\operatorname{dom}f_\alpha=(0,\infty)$. Another result from convex analysis crucial to our derivations is Fenchel's equality

$$f\big(x^\star(y)\big) + f^*(y) = \big\langle y,\, x^\star(y)\big\rangle,$$

where $x^\star(y)=\arg\sup_{x\in\operatorname{dom}f}\{\langle y,x\rangle-f(x)\}$. We will occasionally put the conjugation symbol at the bottom, especially for the derivative of the conjugate function: $f_*'=(f^*)'$.
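These identities are easy to sanity-check numerically. The sketch below (our own, using the Pearson $\chi^2$ row of Table A1) verifies that $(f^*)'$ inverts $f'$ and that Fenchel's equality holds:

```python
# Pearson chi^2 row of Table A1 (alpha = 2).
f = lambda x: 0.5 * (x - 1)**2             # f(x)
f_prime = lambda x: x - 1                  # f'(x)
f_star = lambda y: 0.5 * (y + 1)**2 - 0.5  # f*(y)
f_star_prime = lambda y: y + 1             # (f*)'(y), defined for y > -1

y = 0.3
# (f*)' = (f')^{-1}: applying f' after (f*)' recovers y.
assert abs(f_prime(f_star_prime(y)) - y) < 1e-12

# Fenchel's equality: f(x*(y)) + f*(y) = y * x*(y), with x*(y) = (f*)'(y).
x_star = f_star_prime(y)
assert abs(f(x_star) + f_star(y) - y * x_star) < 1e-12

# Boundary properties stated above: f*(0) = 0 and (f*)'(0) = 1.
assert abs(f_star(0.0)) < 1e-12
assert abs(f_star_prime(0.0) - 1.0) < 1e-12
```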

**Table A1.**
Function ${f}_{\alpha}$, its convex conjugate ${f}_{\alpha}^{*}$, and their derivatives for some values of $\alpha $.


| Divergence | $\alpha$ | $f(x)$ | $f'(x)$ | $(f^*)'(y)$ | $f^*(y)$ | $\operatorname{dom}f^*$ |
|---|---|---|---|---|---|---|
| KL | $1$ | $x\log x-(x-1)$ | $\log x$ | $e^{y}$ | $e^{y}-1$ | $\mathbb{R}$ |
| Reverse KL | $0$ | $-\log x+(x-1)$ | $-\frac{1}{x}+1$ | $\frac{1}{1-y}$ | $-\log(1-y)$ | $y<1$ |
| Pearson $\chi^2$ | $2$ | $\frac{1}{2}(x-1)^2$ | $x-1$ | $y+1$ | $\frac{1}{2}(y+1)^2-\frac{1}{2}$ | $y>-1$ |
| Neyman $\chi^2$ | $-1$ | $\frac{(x-1)^2}{2x}$ | $-\frac{1}{2x^2}+\frac{1}{2}$ | $\frac{1}{\sqrt{1-2y}}$ | $-\sqrt{1-2y}+1$ | $y<\frac{1}{2}$ |
| Hellinger | $\frac{1}{2}$ | $2\left(\sqrt{x}-1\right)^2$ | $2-\frac{2}{\sqrt{x}}$ | $\frac{4}{(2-y)^2}$ | $\frac{2y}{2-y}$ | $y<2$ |
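The table rows can be cross-checked numerically; the sketch below (our own helpers `f_star_alpha` and `f_star_prime_alpha`, implementing the general-case closed forms) recovers the Pearson, Neyman, and Hellinger columns:

```python
def f_star_alpha(y, alpha):
    """f_alpha^*(y) = (((alpha-1)*y + 1)**(alpha/(alpha-1)) - 1) / alpha,
    valid for alpha not in {0, 1} and (alpha-1)*y + 1 > 0."""
    return (((alpha - 1) * y + 1) ** (alpha / (alpha - 1)) - 1) / alpha

def f_star_prime_alpha(y, alpha):
    """(f_alpha^*)'(y) = ((alpha-1)*y + 1)**(1/(alpha-1))."""
    return ((alpha - 1) * y + 1) ** (1.0 / (alpha - 1))

y = 0.25
# Pearson (alpha = 2): f*(y) = (y+1)^2/2 - 1/2, (f*)'(y) = y + 1.
assert abs(f_star_alpha(y, 2) - (0.5 * (y + 1)**2 - 0.5)) < 1e-12
assert abs(f_star_prime_alpha(y, 2) - (y + 1)) < 1e-12
# Neyman (alpha = -1): f*(y) = 1 - sqrt(1-2y), (f*)'(y) = 1/sqrt(1-2y).
assert abs(f_star_alpha(y, -1) - (1 - (1 - 2*y)**0.5)) < 1e-12
assert abs(f_star_prime_alpha(y, -1) - (1 - 2*y)**-0.5) < 1e-12
# Hellinger (alpha = 1/2): f*(y) = 2y/(2-y), (f*)'(y) = 4/(2-y)^2.
assert abs(f_star_alpha(y, 0.5) - 2*y / (2 - y)) < 1e-12
assert abs(f_star_prime_alpha(y, 0.5) - 4 / (2 - y)**2) < 1e-12
```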