Article

Bounded Variation Separates Weak and Strong Average Lipschitz

Computer Science Department, Ben-Gurion University, Beer Sheva 84105, Israel
* Author to whom correspondence should be addressed.
Entropy 2025, 27(9), 974; https://doi.org/10.3390/e27090974
Submission received: 22 August 2025 / Revised: 16 September 2025 / Accepted: 17 September 2025 / Published: 18 September 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

We closely examine a recently introduced notion of average smoothness, which defines weak and strong average-Lipschitz seminorms for real-valued functions on general metric spaces. Specializing to the standard metric on the real line, we compare these notions to bounded variation (BV) and find that the weak notion is strictly weaker than BV, while the strong notion is strictly stronger. Along the way, we show that the weak average-smooth class is also considerably larger in a certain combinatorial sense, made precise by the fat-shattering dimension.

1. Introduction

A function $f : [0,1] \to \mathbb{R}$ is $L$-Lipschitz if $|f(x) - f(x')| \leq L|x - x'|$ for all $x, x' \in [0,1]$, and $\|f\|_{\mathrm{Lip}}$ is the smallest $L$ for which this holds. If $f$ has an integrable derivative, its variation $V(f)$ is given by $V(f) = \int_0^1 |f'(x)|\,dx$ (the more general definition is given in (2)). Since $|f'(x)| \leq \|f\|_{\mathrm{Lip}}$, we have the obvious relation $V(f) \leq \|f\|_{\mathrm{Lip}}$. No reverse inequality is possible: since for monotone $f$ we have $V(f) = |f(0) - f(1)|$ [1], a function whose value increases from $0$ to $\varepsilon$ with a sharp "jump" in the middle can have an arbitrarily large Lipschitz constant and arbitrarily small variation.
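For a concrete instance of the last point (an illustration of ours): fix $\varepsilon, \delta \in (0, 1/2)$ and let $f_{\varepsilon,\delta}$ be $0$ on $[0, 1/2]$, rise linearly to $\varepsilon$ on $[1/2, 1/2 + \delta]$, and remain constant thereafter. Then
$$V(f_{\varepsilon,\delta}) = \varepsilon, \qquad \|f_{\varepsilon,\delta}\|_{\mathrm{Lip}} = \frac{\varepsilon}{\delta},$$
so letting $\delta \to 0$ with $\varepsilon$ fixed makes the Lipschitz constant blow up while the variation stays $\varepsilon$.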
Motivated by questions in machine learning and statistics, Ashlagi et al. [2] introduced two notions of "average Lipschitz" in general metric probability spaces: a weak one and a strong one (follow-up works extended these results to average Hölder smoothness [3,4]). For the special case of the metric space $\Omega = [0,1]$ equipped with the standard metric $\rho(x, x') = |x - x'|$ and the uniform distribution $U$, their definitions are as follows. Both notions rely on the local slope of $f : [0,1] \to \mathbb{R}$ at a point $x$, which is defined (and denoted) as follows:
$$\Lambda_f(x) = \sup_{x' \in [0,1] \setminus \{x\}} \frac{|f(x) - f(x')|}{|x - x'|}, \qquad x \in [0,1]. \tag{1}$$
The strong and weak average smoothness of f are defined, respectively, by
$$\|f\|_S = \mathbb{E}\,\Lambda_f(X), \qquad \|f\|_W = W[\Lambda_f(X)] = \sup_{t > 0}\, t\, U\big(\{x \in \Omega : \Lambda_f(x) \geq t\}\big),$$
where $X$ is a random variable distributed according to $U$ on $[0,1]$, $\mathbb{E}$ is the usual expectation, and $W[Z]$ is the weak $L^1$ norm of the random variable $Z$:
$$W[Z] = \sup_{t > 0}\, t\, \mathbb{P}(|Z| \geq t).$$
Both $\|\cdot\|_S$ and $\|\cdot\|_W$ satisfy the homogeneity axiom of seminorms (meaning that $\|\alpha f\| = |\alpha| \cdot \|f\|$), and $\|\cdot\|_S$ additionally satisfies the triangle inequality and hence is a true seminorm. The weak $L^1$ norm satisfies only the weaker inequality $W[X + Y] \leq 2(W[X] + W[Y])$ [5], which $\|\cdot\|_W$ inherits.
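To make these definitions concrete, here is a minimal numerical sketch (ours, not from [2]); it approximates $\Lambda_f$, $\|f\|_S$, and $\|f\|_W$ on a uniform grid, under the assumption that the grid is fine enough for the suprema and the uniform measure to be well approximated:

import numpy as np

def local_slope(f_vals, xs):
    # Grid version of Lambda_f: for each x_i, the largest difference
    # quotient |f(x_i) - f(x_j)| / |x_i - x_j| over all j != i.
    num = np.abs(f_vals[:, None] - f_vals[None, :])
    den = np.abs(xs[:, None] - xs[None, :])
    np.fill_diagonal(den, np.inf)  # exclude x_j = x_i
    return (num / den).max(axis=1)

xs = np.linspace(0.0, 1.0, 2001)
f = xs**2                            # test function f(x) = x^2
lam = local_slope(f, xs)

strong = lam.mean()                  # ||f||_S = E[Lambda_f(X)], X ~ U[0,1]
ts = np.linspace(1e-3, lam.max(), 500)
weak = max(t * (lam >= t).mean() for t in ts)  # sup_t t P(Lambda_f >= t)

# For f(x) = x^2 one has Lambda_f(x) = 1 + x, so ||f||_S = 3/2 and
# ||f||_W = sup_t t * min(1, 2 - t) = 1; the printed values approximate these.
print(strong, weak)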
We now recall the definition of the variation of $f : [a,b] \to \mathbb{R}$:
$$V_a^b(f) = \sup_{a = x_0 < x_1 < x_2 < \cdots < x_n \leq b}\; \sum_{i=1}^n |f(x_i) - f(x_{i-1})| \tag{2}$$
(when $a = 0$ and $b = 1$, we omit these), as well as the Lipschitz and bounded variation function classes:
$$\mathrm{Lip} = \{f : [0,1] \to \mathbb{R} \;;\; \|f\|_{\mathrm{Lip}} < \infty\}, \qquad \mathrm{BV} = \{f : [0,1] \to \mathbb{R} \;;\; V(f) < \infty\}.$$
The discussion above implies the (well-known) strict containment
$$\mathrm{Lip} \subsetneq \mathrm{BV}. \tag{3}$$
In addition, we define the strong and weak average smoothness classes
$$\overline{\mathrm{Lip}}^{\,s} = \{f : [0,1] \to \mathbb{R} \;;\; \|f\|_S < \infty\}, \qquad \overline{\mathrm{Lip}}^{\,w} = \{f : [0,1] \to \mathbb{R} \;;\; \|f\|_W < \infty\}.$$
By Markov’s inequality and the fact that the expectation is bounded by the supremum, we have
$$\|f\|_W \leq \|f\|_S \leq \sup_{x \in \Omega} \Lambda_f(x) = \|f\|_{\mathrm{Lip}},$$
whence
$$\mathrm{Lip} \subsetneq \overline{\mathrm{Lip}}^{\,s} \subsetneq \overline{\mathrm{Lip}}^{\,w}; \tag{4}$$
all of these containments were shown to be strict in [2]. The containments in (3) and (4) leave open the relation between $\mathrm{BV}$ and $\overline{\mathrm{Lip}}^{\,s}, \overline{\mathrm{Lip}}^{\,w}$, which we resolve in this work:
Theorem 1. 
$$\overline{\mathrm{Lip}}^{\,s} \subsetneq \mathrm{BV} \subsetneq \overline{\mathrm{Lip}}^{\,w}.$$
We also provide a quantitative, finitary relation between these classes:
Theorem 2. 
For any $f : [0,1] \to \mathbb{R}$, we have $\frac{1}{2}\|f\|_W \leq V(f) \leq \|f\|_S$.
Finally, we recall the definition of the fat-shattering dimension, a combinatorial complexity measure of function classes of central importance in statistics, empirical processes, and machine learning [6,7]. Let $\mathcal{F}$ be a collection of functions mapping $[0,1]$ to $\mathbb{R}$. For $\gamma > 0$, a set $S = \{x_1, \ldots, x_m\} \subseteq [0,1]$ is said to be $\gamma$-shattered by $\mathcal{F}$ if
$$\sup_{r \in \mathbb{R}^m}\; \min_{y \in \{-1,1\}^m}\; \sup_{f \in \mathcal{F}}\; \min_{i \in [m]}\; y_i\,(f(x_i) - r_i) \;\geq\; \gamma.$$
The $\gamma$-fat-shattering dimension, denoted by $\mathrm{fat}_\gamma(\mathcal{F})$, is the size of the largest $\gamma$-shattered set (possibly $\infty$). It is known [8] that for $\mathcal{F} = \{f : [0,1] \to \mathbb{R} \;;\; V(f) \leq L\}$, we have $\mathrm{fat}_\gamma(\mathcal{F}) = 1 + \left\lfloor \frac{L}{2\gamma} \right\rfloor$. The same bound holds for $\mathcal{F} = \{f : [0,1] \to \mathbb{R} \;;\; \|f\|_{\mathrm{Lip}} \leq L\}$.
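To unpack the definition in the smallest nontrivial case (an illustration of ours): a pair $S = \{x_1, x_2\}$ is $\gamma$-shattered if there are thresholds $r_1, r_2 \in \mathbb{R}$ such that for each of the four sign patterns $(y_1, y_2) \in \{-1,1\}^2$, some $f \in \mathcal{F}$ satisfies $f(x_i) \geq r_i + \gamma$ whenever $y_i = 1$ and $f(x_i) \leq r_i - \gamma$ whenever $y_i = -1$.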
Although the strong smoothness class has the same combinatorial complexity as the BV and Lipschitz classes, for weak average smoothness, this quantity turns out to be considerably greater:
Theorem 3. 
For $L > 0$, let $\mathcal{F}_W = \{f : [0,1] \to \mathbb{R} \;;\; \|f\|_W \leq L\}$ and $\mathcal{F}_S = \{f : [0,1] \to \mathbb{R} \;;\; \|f\|_S \leq L\}$. Then,
1. $\mathrm{fat}_\gamma(\mathcal{F}_W) = \infty$ whenever $\gamma \leq L/6$;
2. $\mathrm{fat}_\gamma(\mathcal{F}_S) = 1 + \left\lfloor \frac{L}{2\gamma} \right\rfloor$ for $\gamma > 0$.
Notation.
We write $[n] := \{1, \ldots, n\}$ and use $m(\cdot)$ to denote the Lebesgue measure (length) of subsets of $\mathbb{R}$.

2. Proofs

We begin with a variant of the standard covering lemma.
Lemma 1. 
For any sequence $s_1, \ldots, s_n$ of closed segments in $\mathbb{R}$, there is a subsequence indexed by $I \subseteq [n]$ such that for all distinct $i, j \in I$ we have $s_i \cap s_j = \emptyset$ and $\sum_{i \in I} m(s_i) \geq \frac{1}{2}\, m\!\left(\bigcup_{i=1}^n s_i\right)$.
Proof. 
We proceed by induction on $n$. Let $G = ([n], E)$ denote the intersection graph of the $s_i$: the vertices correspond to the segments, and $(i, j) \in E$ if $s_i \cap s_j \neq \emptyset$.
Suppose that $G$ contains a cycle, and let $s_1 = [a_1, b_1], \ldots, s_k = [a_k, b_k]$ be the segments in the cycle, sorted by their right endpoints so that $b_1 \leq \cdots \leq b_k$. Since $s_1 \cap s_k \neq \emptyset$, we have $a_k \leq b_1$. If $a_{k-1} \geq a_1$, then $s_{k-1} \subseteq s_1 \cup s_k$. Otherwise, $a_{k-1} < a_1$ and $s_1 \subseteq s_{k-1}$. Either way, we have found a segment that is completely covered by the other vertices of $G$. After removing it, we obtain $I' \subseteq [n]$ of size $n - 1$ with $\bigcup_{i \in I'} s_i = \bigcup_{i=1}^n s_i$, so applying the inductive hypothesis to the segments indexed by $I'$ yields the desired result. If $G$ does not contain a cycle, then $G$ is a forest and hence bipartite, with parts $A, B \subseteq [n]$ disjoint and nonempty. Clearly, $m\!\left(\bigcup_{i=1}^n s_i\right) \leq m\!\left(\bigcup_{i \in A} s_i\right) + m\!\left(\bigcup_{i \in B} s_i\right)$, and thus $\max\!\left\{ m\!\left(\bigcup_{i \in A} s_i\right),\; m\!\left(\bigcup_{i \in B} s_i\right) \right\} \geq \frac{1}{2}\, m\!\left(\bigcup_{i=1}^n s_i\right)$, so taking either $I = A$ or $I = B$ (which is possible since the segments inside each part are pairwise disjoint) yields the desired result. □
Next, we reduce the proof of Theorem 2 to the case of right-continuous monotone functions.
Lemma 2. 
If $\|f\|_W \leq 2V(f)$ holds for every right-continuous monotone function $f : [0,1] \to \mathbb{R}$, then the bound holds for all $f : [0,1] \to \mathbb{R}$. Furthermore, both inequalities of Theorem 2 are tight.
Proof. 
We begin by observing that we can restrict our attention to monotone functions, since $T_f(x) = V_f([0, x])$ is monotone and has the same variation as $f$, but
$$\Lambda_{T_f}(x) = \sup_{x' \neq x} \frac{|T_f(x) - T_f(x')|}{|x - x'|} \;\geq\; \sup_{x' \neq x} \frac{|f(x) - f(x')|}{|x - x'|} = \Lambda_f(x),$$
which means $\|T_f\|_W \geq \|f\|_W$. Thus, the inequality $\|T_f\|_W \leq 2V(T_f)$ immediately implies that $\|f\|_W \leq \|T_f\|_W \leq 2V(T_f) = 2V(f)$. If $f$ is monotone, it can only have jump discontinuities. Let $I \subseteq [0,1]$ denote the set of points at which $f$ is not right-continuous; since $f$ is monotone, $I$ is at most countable. Define the modified version of $f$ by
$$\tilde{f}(x) = \begin{cases} f(x), & x \notin I, \\ \lim_{\varepsilon \to 0^+} f(x + \varepsilon), & x \in I. \end{cases}$$
Note that $\tilde{f}$ is monotone and right-continuous. It is not hard to see that if $0 \notin I$, then $V(\tilde{f}) = V(f)$ and $\Lambda_{\tilde{f}}(x) = \Lambda_f(x)$ for all $x \notin I$, which implies that $\|f\|_W = \|\tilde{f}\|_W$ and allows us to restrict our discussion to right-continuous functions. If $0 \in I$, then we can extend the domain of $\tilde{f}$ to $[-\varepsilon, 1]$ for all $\varepsilon > 0$ by setting $\tilde{f}(x) = f(0)$ for all $x < 0$. Denoting the extended function by $\tilde{f}_\varepsilon$, we have $\|f\|_W = \lim_{\varepsilon \to 0} \|\tilde{f}_\varepsilon\|_W$ and $V(\tilde{f}_\varepsilon) = V(f)$ for all $\varepsilon > 0$, and we conclude that
$$\|f\|_W = \lim_{\varepsilon \to 0} \|\tilde{f}_\varepsilon\|_W \;\leq\; \lim_{\varepsilon \to 0} 2V(\tilde{f}_\varepsilon) = 2V(f). \qquad \square$$

2.1. Proof of Theorem 2

We first show that $\|f\|_W \leq 2V(f)$. We may assume without loss of generality that $V(f) < \infty$. We will use the notation $V_f([a, b])$ for the variation of $f$ restricted to the segment $[a, b]$. Since $f$ is of bounded variation, the function $T_f(x) = V_f([0, x])$ is well defined for $x > 0$. By Lemma 2, we may assume without loss of generality that $f$ is right-continuous. Thus, $T_f : [0,1] \to \mathbb{R}$ is monotone and right-continuous, and therefore induces a Lebesgue–Stieltjes measure on $[0,1]$, which we denote by $\mu_f$. We now define the maximal function $M_f : [0,1] \to \mathbb{R}$ as follows:
$$M_f(x) = \sup_{r_1, r_2 > 0} \frac{\mu_f([x - r_1,\, x + r_2])}{r_1 + r_2} = \sup_{r_1, r_2 > 0} \frac{V_f([x - r_1,\, x + r_2])}{r_1 + r_2},$$
where the segments $[a, x], [x, b]$ are taken to be $[0, x], [x, 1]$, respectively, whenever $a = x - r_1 < 0$ or $b = x + r_2 > 1$. A standard argument shows that $M_f^{-1}((t, \infty))$ is open, whence $M_f$ is measurable.
We now observe that $M_f \geq \Lambda_f$ everywhere on $[0,1]$. Indeed, if $x' > x$, then
$$M_f(x) \;\geq\; \frac{V_f([x - \varepsilon,\, x'])}{\varepsilon + (x' - x)} \;\geq\; \frac{|f(x) - f(x')|}{x' - x + \varepsilon}$$
holds for every $\varepsilon > 0$, and hence $M_f(x) \geq \sup_{x' > x} \frac{|f(x) - f(x')|}{x' - x}$. The case $x' < x$ is completely analogous, whence $M_f(x) \geq \sup_{x' \neq x} \frac{|f(x) - f(x')|}{|x - x'|} = \Lambda_f(x)$. For $X$ uniformly distributed over $[0,1]$, we have $\mathbb{P}(\Lambda_f(X) \geq t) \leq \mathbb{P}(M_f(X) \geq t)$, and showing $\|f\|_W \leq 2V(f)$ reduces to bounding the latter probability by $2V(f)/t$.
We now closely follow the proof of Theorem 7.4 in [9] and bound $m(M_f^{-1}((t, \infty)))$ by bounding $m(K)$ for an arbitrary compact $K \subseteq M_f^{-1}((t, \infty))$. For $x \in K$, denote by $r_1(x), r_2(x)$ lengths such that $\frac{\mu_f([x - r_1(x),\, x + r_2(x)])}{r_1(x) + r_2(x)} \geq t$, and denote by $S_x$ the open interval $(x - r_1(x),\, x + r_2(x))$. Then, clearly, $K \subseteq \bigcup_{x \in K} S_x$. Since $K$ is compact, a finite subcover $S_{x_1}, \ldots, S_{x_n}$ exists. By Lemma 1, there exists $I \subseteq [n]$ such that for all distinct $i, j \in I$, we have $S_{x_i} \cap S_{x_j} = \emptyset$ and $\sum_{i \in I} m(S_{x_i}) \geq \frac{1}{2}\, m\!\left(\bigcup_{i=1}^n S_{x_i}\right)$. Finally, by the definition of the $S_x$'s, for each $i \in [n]$, it holds that $m(S_{x_i}) \leq \frac{\mu_f(S_{x_i})}{t}$. We can now write
$$m(K) \;\leq\; m\!\left(\bigcup_{i=1}^n S_{x_i}\right) \;\leq\; 2 \sum_{i \in I} m(S_{x_i}) \;\leq\; \frac{2}{t} \sum_{i \in I} \mu_f(S_{x_i}) \;\leq\; \frac{2}{t}\, \mu_f([0,1]),$$
where the last inequality holds since the intervals indexed by $I$ are pairwise disjoint. Since $\mu_f([0,1]) = V(f)$, it immediately follows that $\|f\|_W \leq 2V(f)$.
It remains to show that $V(f) \leq \|f\|_S$. Let us denote by $P_n$ a partition $0 \leq x_1 < x_2 < \cdots < x_n \leq 1$ of $[0,1]$, and let $V(P_n) = \sum_{i=1}^{n-1} |f(x_{i+1}) - f(x_i)|$ denote the variation of $f$ relative to $P_n$. It suffices to show that for any such partition $P_n$, we have $\|f\|_S \geq V(P_n)$. Now,
$$\|f\|_S = \mathbb{E}\,\Lambda_f(X) \;\geq\; \sum_{i=1}^{n-1} |x_{i+1} - x_i| \cdot \mathbb{E}\big[\Lambda_f(X) \,\big|\, X \in [x_i, x_{i+1}]\big]. \tag{9}$$
Note that for all $x \in [x_i, x_{i+1}]$, we have
$$\Lambda_f(x) \;\geq\; \max\left\{ \frac{|f(x) - f(x_i)|}{x - x_i},\; \frac{|f(x_{i+1}) - f(x)|}{x_{i+1} - x} \right\} \;\geq\; \frac{|f(x_{i+1}) - f(x_i)|}{x_{i+1} - x_i}.$$
Applying this to (9) yields
$$\mathbb{E}\,\Lambda_f(X) \;\geq\; \sum_{i=1}^{n-1} |x_{i+1} - x_i| \cdot \frac{|f(x_{i+1}) - f(x_i)|}{x_{i+1} - x_i} \;=\; V(P_n).$$
Finally, the tightness of the first claimed inequality is witnessed by the step function $f(x) = \mathbb{1}[x > 1/2]$, and that of the second by $f(x) = x$. □
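To spell out the verification (a routine computation we add for completeness): for $f = \mathbb{1}[x > 1/2]$, one has $\Lambda_f(x) = \frac{1}{|x - 1/2|}$, so $\mathbb{P}(\Lambda_f(X) \geq t) = \min\{1, 2/t\}$ and $\|f\|_W = \sup_{t > 0} t \min\{1, 2/t\} = 2 = 2V(f)$; for $f(x) = x$, $\Lambda_f \equiv 1$, whence $\|f\|_S = 1 = V(f)$.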

2.2. Proof of Theorem 1

The claimed containments are immediate from Theorem 2; only the separations remain to be shown. The first of these is obvious: the step function has bounded variation but infinite strong average Lipschitz seminorm [2] (Appendix I). We proceed with the second separation:
Lemma 3. 
There exists an $f : [0,1] \to [0,1]$ such that $V(f) = \infty$ but $\|f\|_W \leq 2$.
Proof. 
Let $f : [0,1] \to [0,1]$ be the piecewise linear function defined at the points $x_n = \frac{1}{n}$, $n \geq 1$, by
$$f\!\left(\frac{1}{n}\right) = \sum_{k=1}^n \frac{(-1)^{k+1}}{k}$$
and extended to $[0,1]$ by linear interpolation.
Clearly, $V(f) = \sum_{n=1}^{\infty} \frac{1}{n+1} = \infty$. To bound $\|f\|_W$, note that any $x, x' \in [0,1]$ witnessing $\frac{|f(x) - f(x')|}{|x - x'|} \geq t$ must also satisfy $|x - x'| \leq \frac{1}{t}$, since $|f| \leq 1$. Let $I_n$ denote the interval $\left[\frac{1}{n+1}, \frac{1}{n}\right]$; the slope of $f$ on $I_n$ has magnitude $\frac{1/(n+1)}{1/n - 1/(n+1)} = n$. If $\Lambda_f(x) \geq t$, then there is an $x'$ such that $\frac{|f(x) - f(x')|}{|x - x'|} \geq t$, and hence either $x$ or $x'$ lies in $I_n$ for some $n \geq t$. If $x \in I_n$ with $n \geq t$, then $x \leq \frac{1}{t}$. If, however, $x' \in I_n$ for some $n \geq t$, then $x' \leq \frac{1}{t}$, and since $|x - x'| \leq \frac{1}{t}$, we have $x \leq \frac{2}{t}$. We conclude that $\Lambda_f(x) \geq t$ implies $x \leq \frac{2}{t}$, and hence $\mathbb{P}(\Lambda_f(X) \geq t) \leq \frac{2}{t}$; this proves the claim. □
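A quick numerical sanity check of this construction (ours; the grid sizes are arbitrary choices) approximates the variation restricted to $[1/N, 1]$, which should grow like $\log N$, and $\sup_t t\,\mathbb{P}(\Lambda_f(X) \geq t)$, which should stay below 2:

import numpy as np

N = 400
n = np.arange(1, N + 1)
knots = 1.0 / n                              # x_n = 1/n, decreasing
vals = np.cumsum((-1.0) ** (n + 1) / n)      # f(1/n): partial alternating sums

xs = np.linspace(1.0 / N, 1.0, 1500)
f = np.interp(xs, knots[::-1], vals[::-1])   # np.interp needs increasing knots

q = np.abs(f[:, None] - f[None, :]) / np.maximum(
    np.abs(xs[:, None] - xs[None, :]), 1e-12)
lam = q.max(axis=1)                          # grid version of Lambda_f

var = np.abs(np.diff(vals)).sum()            # ~ log N: diverges as N grows
weak = max(t * (lam >= t).mean()
           for t in np.linspace(0.1, lam.max(), 400))
print(f"V(f) on [1/N,1] ~ {var:.2f}, sup_t t*P(Lambda_f >= t) ~ {weak:.2f}")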
Remark.
Another function with this property is $x \mapsto \sin\frac{1}{x}$.

2.3. Proof of Theorem 3

2.3.1. Proof That $\mathrm{fat}_\gamma(\mathcal{F}_W) = \infty$ Whenever $\gamma \leq L/6$

Consider the partition of $[0,1]$ into segments $I_n = [x_{n+1}, x_n]$, where $x_n = 2^{-n}$. We define $f(x_n) = (-1)^n \gamma$; this specifies $f$ at all endpoints of the $I_n$. For $x \in (x_{n+1}, x_n)$, we define
$$f(x) = \frac{(-1)^n \gamma}{|I_n|}(x - x_{n+1}) + \frac{(-1)^n \gamma}{|I_n|}(x - x_n),$$
i.e., $f$ is piecewise linear with slope $(-1)^n 4\gamma 2^n$ in $I_n$ (indeed, $|I_n| = 2^{-n} - 2^{-(n+1)} = 2^{-(n+1)}$, so the slope is $2(-1)^n \gamma / |I_n| = (-1)^n 4\gamma 2^n$). Similarly to Lemma 3, if $\frac{|f(x) - f(x')|}{|x - x'|} \geq t$, then $|x - x'| \leq \frac{2\gamma}{t}$. Now, suppose $\Lambda_f(x) \geq t$, i.e., there exists $x'$ with $\frac{|f(x) - f(x')|}{|x - x'|} \geq t$. This implies that either $x$ or $x'$ lies in $I_n$ for some $n \geq \log_2 \frac{t}{4\gamma}$ (the slope of the chord connecting $x$ and $x'$ lies between the slopes of the segments containing $x$ and $x'$). If $x \in I_n$ for some $n \geq \log_2 \frac{t}{4\gamma}$, then $x \leq \frac{4\gamma}{t}$. If, however, $x' \in I_n$ for some $n \geq \log_2 \frac{t}{4\gamma}$, then $x' \leq \frac{4\gamma}{t}$, and since $|x - x'| \leq \frac{2\gamma}{t}$, we have $x \leq \frac{6\gamma}{t}$. Since $\Lambda_f(x) \geq t$ implies $x \leq \frac{6\gamma}{t}$, we conclude that $\|f\|_W \leq 6\gamma$. An immediate corollary is that $\overline{\mathrm{Lip}}^{\,w}_L$ $\gamma$-shatters the infinite set $\{x_n\}_{n=1}^{\infty}$ for $\gamma \leq \frac{L}{6}$ (for an arbitrary labeling, setting $f(x_n) = y_n \gamma$ can only decrease the slopes, and the same argument applies), which is even stronger than having arbitrarily large $\gamma$-shattered sets. Note that this is close to tight, since for $\gamma > \frac{L}{2}$, we cannot $\gamma$-shatter even a set of two points: suppose $f(x_1) > \frac{L}{2}$ and $f(x_2) < -\frac{L}{2}$; then for $x \in [x_1, x_2]$ we have $\Lambda_f(x) > \frac{L}{|x_2 - x_1|}$, hence $\|f\|_W > L$, which means $\{x_1, x_2\}$ is not $\gamma$-shattered by $\overline{\mathrm{Lip}}^{\,w}_L$. □
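Remark (our elaboration). To make the last two-point step explicit: for $x \in (x_1, x_2)$,
$$\Lambda_f(x) \;\geq\; \max\left\{\frac{|f(x) - f(x_1)|}{x - x_1},\; \frac{|f(x_2) - f(x)|}{x_2 - x}\right\} \;\geq\; \frac{|f(x_1) - f(x_2)|}{x_2 - x_1} \;>\; \frac{L}{x_2 - x_1},$$
so taking $t = \frac{L}{x_2 - x_1}$ gives $t \cdot \mathbb{P}(\Lambda_f(X) \geq t) \geq \frac{L}{x_2 - x_1} \cdot (x_2 - x_1) = L$.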

2.3.2. Proof That $\mathrm{fat}_\gamma(\mathcal{F}_S) = 1 + \lfloor \frac{L}{2\gamma} \rfloor$ for $\gamma > 0$

The upper bound follows immediately from $V(f) \leq \|f\|_S$ (Theorem 2) together with the fat-shattering bound for the bounded variation class quoted in the Introduction [8]. For the lower bound, take a $2\gamma/L$-packing $x_1, \ldots, x_{1 + \lfloor L/(2\gamma) \rfloor}$ of $[0,1]$. For any labeling $y \in \{-1, 1\}^{1 + \lfloor L/(2\gamma) \rfloor}$, consider the piecewise linear interpolation $f$ of the points $(x_i, y_i \gamma)$, and observe that it satisfies $\Lambda_f(x) \leq \frac{2\gamma}{2\gamma/L} = L$ everywhere, whence $\|f\|_S \leq L$. □
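For a concrete instance (ours): with $L = 1$ and $\gamma = 1/4$, we get $1 + \lfloor L/(2\gamma) \rfloor = 3$, and the points $x_1 = 0$, $x_2 = 1/2$, $x_3 = 1$ form a $1/2$-packing; for any signs $y \in \{-1,1\}^3$, the piecewise linear interpolation of $(x_i, y_i/4)$ has all slopes of magnitude at most $(2 \cdot \frac{1}{4})/\frac{1}{2} = 1$, hence $\|f\|_S \leq \|f\|_{\mathrm{Lip}} \leq 1 = L$.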

Author Contributions

Conceptualization, A.E. and A.K.; validation, A.E. and A.K.; formal analysis, A.E. and A.K.; writing—review and editing, A.E. and A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Appell, J.; Banaś, J.; Merentes, N. Bounded Variation and Around; De Gruyter Series in Nonlinear Analysis and Applications; De Gruyter: Berlin/Heidelberg, Germany, 2014; Volume 17, pp. x+476. [Google Scholar]
  2. Ashlagi, Y.; Gottlieb, L.; Kontorovich, A. Functions with average smoothness: Structure, algorithms, and learning. J. Mach. Learn. Res. 2024, 25, 117:1–117:54. [Google Scholar]
  3. Hanneke, S.; Kontorovich, A.; Kornowski, G. Efficient Agnostic Learning with Average Smoothness. In Proceedings of the International Conference on Algorithmic Learning Theory, La Jolla, CA, USA, 25–28 February 2024; Volume 237, pp. 719–731. Available online: https://proceedings.mlr.press/v237/hanneke24a.html (accessed on 22 August 2025).
  4. Kornowski, G.; Hanneke, S.; Kontorovich, A. Near-Optimal Learning with Average Hölder Smoothness. In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, 10–16 December 2023; Available online: https://proceedings.neurips.cc/paper_files/paper/2023/hash/42afce512806ab874b9f99ed9a08055e-Abstract-Conference.html (accessed on 22 August 2025).
  5. Hagelstein, P.A. Weak $L^1$ norms of random sums. Proc. Am. Math. Soc. 2005, 133, 2327–2334. [Google Scholar] [CrossRef]
  6. Alon, N.; Ben-David, S.; Cesa-Bianchi, N.; Haussler, D. Scale-sensitive dimensions, uniform convergence, and learnability. J. ACM 1997, 44, 615–631. [Google Scholar] [CrossRef]
  7. Bartlett, P.L.; Long, P.M. Prediction, learning, uniform convergence, and scale-sensitive dimensions. J. Comput. Syst. Sci. 1998, 56, 174–190. [Google Scholar] [CrossRef]
  8. Anthony, M.; Bartlett, P.L. Neural Network Learning: Theoretical Foundations; Cambridge University Press: Cambridge, UK, 1999; pp. xiv+389. [Google Scholar] [CrossRef]
  9. Rudin, W. Real and Complex Analysis, 3rd ed.; McGraw-Hill, Inc.: Columbus, OH, USA, 1987. [Google Scholar]
