Abstract
This work determines the rate of pointwise and uniform convergence to the unit operator of the “normalized cusp neural network operators”. The cusp is a compactly supported activation function obtained as the composition of two general activation functions whose domain is the whole real line. These convergences are quantified via the modulus of continuity of the engaged function or of its derivative, in the form of Jackson-type inequalities. The composition of activation functions aims at more flexible and powerful neural networks, introducing for the first time the reduction of infinite domains to a single domain of compact support.
Keywords:
neural network approximation; cusp activation function; modulus of continuity; reduction of domain
MSC:
41A17; 41A25; 41A30; 41A36
1. Introduction
From AI and computer science, we have the following: in essence, composing activation functions in neural networks offers the advantage of potentially tailoring the network’s ability to learn and model complex, non-linear relationships in data. Here is a breakdown of the potential benefits:
- Enhanced Capacity for Complex Modeling:
- Diversification of Non-linearity: Different activation functions have different characteristics. For example, ReLU introduces sparsity, while Sigmoid squashes values into a range. By composing them, the network can potentially learn a wider variety of non-linear transformations and capture more intricate patterns in the data.
- Improved Training Dynamics:
- Mitigating Gradient Problems: Activation functions influence gradient flow during training. Using different activation functions can potentially help address issues like vanishing or exploding gradients, which hinder learning in deep networks.
- Faster Convergence: Certain activation functions, like ReLU, can accelerate the convergence of the training process compared to others like Sigmoid or Tanh. Combining different functions can potentially lead to faster training and competitive performance.
- Enhanced Generalization and Robustness:
- Better Generalization: By learning richer representations of the data through diverse activation functions, the network’s ability to generalize well to unseen data improves, reducing the risk of overfitting.
- Increased Robustness: Networks with carefully chosen activation functions can handle variations in input data more effectively, adapting to noise, missing data, or unexpected perturbations.
- Adaptation to Input Characteristics:
- Handling Diverse Data: Different activation functions can be suited to different data characteristics. For instance, tanh can be useful when dealing with data containing both positive and negative values.
- Potential for Architectural Interpretability:
- Insight into Learning: By using distinct activation functions, different parts of the network might become responsible for capturing specific features, which can potentially offer insights into how the model learns.
In summary, composing activation functions potentially allows for a more flexible and powerful neural network capable of
- Learning more complex patterns.
- Faster and more stable training.
- Better generalization to new data.
- Greater adaptability to diverse data.
Attention: While composing activation functions can offer benefits, it’s important to choose them judiciously and with consideration for the specific problem at hand, as some combinations might not be beneficial or could even lead to unwanted behaviors like exploding gradients. Empirical testing and validation are crucial when exploring different activation function compositions.
The author, greatly inspired and motivated by [1], pioneered quantitative neural network approximation, see [2], and has since published numerous papers and books, e.g., see [3].
In this article, we continue this trend.
In mathematical neural network approximation, AMS MathSciNet lists no articles related to the composition of activation functions, so this article is the first of its kind.
By composing activation functions, we pursue the benefits described in the first, extensive part of this introduction; most notably, this composition leads to an activation function of compact support, even though the initial activation functions had an infinite domain, the whole real line.
The resulting activation function is an open cusp of compact support. Our activation functions are very general, and the constructed neural network operators resemble the squashing operators of [2,3], as do the produced quantitative results.
As a result, the derived convergence inequalities are much simpler and more elegant.
Of great inspiration are the articles [4,5,6]. References [7,8,9] are foundational. Finally, references [10,11,12,13] represent recent important works.
2. Basics
Let h₁, h₂ : ℝ → (−1, 1) be general sigmoid activation functions, such that they are strictly increasing, with hᵢ(x) → 1 as x → +∞ and hᵢ(x) → −1 as x → −∞, i = 1, 2. Also, each hᵢ is strictly convex over (−∞, 0] and strictly concave over [0, +∞), with hᵢ(0) = 0.
Clearly, is strictly increasing and , and
that is
Furthermore,
Next, acting over let . Then, by convexity of there we have
and
i.e.,
So that is convex over
Similarly, over , we get: let . Then, by concavity of there we have
and
Therefore is concave over
Also, it is
So is a sigmoid activation function.
Next we consider the function
We observe that
that is
So can serve as a density function in general.
So we have h₂ : ℝ → (−1, 1), h₁|(−1,1) : (−1, 1) → (−1, 1), and the strictly increasing function H := h₁|(−1,1) ∘ h₂ : ℝ → (−1, 1), with the graph of H containing an arc of finite length, such that H(0) = 0, starting at (−1, h₁(h₂(−1))) and terminating at (1, h₁(h₂(1))). We call this arc also H. In particular, H is negative and convex over (−1, 0], and it is positive and concave over [0, 1).
So it has compact support [−1, 1] and it is like a squashing function, see [3], Ch. 1, p. 8.
From now on we will work with |H|, whose graph is a cusp joining the points (−1, |h₁(h₂(−1))|), (0, 0), and (1, h₁(h₂(1))), and which again has compact support [−1, 1]; all three of these points belong to the graph of |H|.
Typically, H has a steeper slope than that of h₂, but it is flatter and closer to the x-axis than h₂ is; e.g., tanh(tanh x) has asymptotes ±0.76, while tanh x has asymptotes ±1 (notice that tanh(1) ≈ 0.76). Clearly, H has applications in spiking neural networks.
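To make the construction concrete, here is a minimal numerical sketch, assuming the specific (illustrative) choice h₁ = h₂ = tanh; the names H and cusp below are ours and are not part of the formal development.

```python
import numpy as np

# Illustrative assumption: h1 = h2 = tanh, a standard sigmoid with range (-1, 1).
h1 = np.tanh
h2 = np.tanh

def H(x):
    """Composition h1 o h2, kept only on the compact support [-1, 1] (zero outside)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, h1(h2(x)), 0.0)

def cusp(x):
    """The cusp activation |H|: non-negative, supported on [-1, 1], with a corner at 0."""
    return np.abs(H(x))

if __name__ == "__main__":
    xs = np.array([-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5])
    print(np.round(cusp(xs), 4))          # 0 outside [-1, 1] and 0 at the origin
    print(round(float(h1(h2(1.0))), 4))   # endpoint height ~ 0.6421 < tanh(1) ~ 0.7616
```

Swapping h₁ or h₂ for another sigmoid with range (−1, 1) changes the endpoint heights |h₁(h₂(±1))| but preserves the compact support and the cusp at the origin.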
3. Background
Here we consider functions that are either continuous and bounded, or uniformly continuous.
The first modulus of continuity is given by
ω₁(f, δ) := sup{ |f(x) − f(y)| : x, y ∈ ℝ, |x − y| ≤ δ }, δ > 0.
Here we have that ω₁(f, δ) < ∞ for bounded f, and that ω₁(f, δ) → 0 as δ → 0 iff f is uniformly continuous.
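For readers who want to experiment numerically, the following is a minimal sketch (our own illustration) that estimates ω₁(f, δ) on a uniform grid; the interval [−5, 5], the grid size, and the test function are arbitrary assumptions.

```python
import numpy as np

def omega1(f, delta, a=-5.0, b=5.0, m=2001):
    """Grid estimate of the first modulus of continuity of f on [a, b].

    Approximates sup{|f(x) - f(y)| : |x - y| <= delta} by restricting x, y
    to m equally spaced points; a, b, m are illustrative choices.
    """
    xs = np.linspace(a, b, m)
    fx = f(xs)
    h = xs[1] - xs[0]
    w = int(np.floor(delta / h))          # largest index offset with |x - y| <= delta
    best = 0.0
    for k in range(1, w + 1):
        best = max(best, float(np.max(np.abs(fx[k:] - fx[:-k]))))
    return best

if __name__ == "__main__":
    f = np.sin                              # uniformly continuous test function
    for d in (1.0, 0.5, 0.25, 0.125):
        print(d, round(omega1(f, d), 4))    # values decrease toward 0 as delta -> 0
```

For the uniformly continuous choice f = sin, the printed values decrease toward 0 as δ → 0, in line with the remark above.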
In this article, we study the pointwise and uniform convergence, with rates, over the real line, to the unit operator, of the “normalized cusp neural network operators”,
where and , .
Notice is a positive linear operator with .
The terms in the ratio of sums (1) can be non-negative and make sense, iff , i.e., iff
In order to have the desired order of numbers
it is sufficient to assume that
When , , it is enough to assume , which implies (3), and
But the unique case contributes nothing and can be ignored.
Thus, without loss of generality, we can always take that
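Since the operators are of ratio-of-sums type, a hedged numerical sketch may help; the concrete form below (nodes k/n for k = −n², …, n², scaling n^(1−α), and α = 1/2) is only one common choice from the quantitative approximation literature and is not claimed to reproduce definition (1).

```python
import numpy as np

def cusp(x):
    """Illustrative cusp |tanh(tanh x)|, restricted to the compact support [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, np.abs(np.tanh(np.tanh(x))), 0.0)

def G(f, x, n, alpha=0.5):
    """Ratio-of-sums (normalized) operator built from the cusp.

    Assumed form: sum_k f(k/n) cusp(n**(1-alpha) * (x - k/n)) / sum_k cusp(...),
    with k = -n**2, ..., n**2 and alpha in (0, 1).  These choices are
    illustrative and need not coincide with definition (1).
    """
    k = np.arange(-n * n, n * n + 1)
    nodes = k / n
    weights = cusp(n ** (1.0 - alpha) * (x - nodes))
    s = weights.sum()
    if s == 0.0:                       # no active node near x; the ratio is undefined
        return float("nan")
    return float(np.dot(f(nodes), weights) / s)

if __name__ == "__main__":
    f = np.sin
    x = 0.3
    for n in (5, 10, 20, 40):
        print(n, round(abs(G(f, x, n) - f(x)), 6))   # the error generally shrinks with n
```

In this sketch only the nodes with |x − k/n| ≤ n^(α−1) contribute, so the weighted average localizes around x as n grows, which is what drives convergence with rates of the kind quantified below.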
Proposition 1
([2]). Let , . Let be the maximum number of integers contained in . Then
Note 1.
We would like to establish a lower bound on over the interval . By Proposition 1, we get that
We obtain , iff , which is always true.
So to have the desired order and over it is enough to consider
Also notice that , as
Denote by ⌊·⌋ the integral part of a number and by ⌈·⌉ its ceiling.
Thus, it is clear that
, and
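As a small self-check of the counting flavor of Proposition 1 and Note 1 (our own illustration, with interval endpoints chosen at random), the number of integers in a closed interval [a, b], a ≤ b, equals ⌊b⌋ − ⌈a⌉ + 1 and always lies between b − a − 1 and b − a + 1:

```python
import math
import random

def count_integers(a, b):
    """Number of integers k with a <= k <= b (assumes a <= b)."""
    return math.floor(b) - math.ceil(a) + 1

if __name__ == "__main__":
    random.seed(0)
    for _ in range(5):
        a = random.uniform(-10.0, 10.0)
        b = a + random.uniform(0.0, 10.0)
        c = count_integers(a, b)
        # Elementary bounds: b - a - 1 <= count <= b - a + 1.
        assert b - a - 1 <= c <= b - a + 1
        print(round(a, 3), round(b, 3), c)
```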
4. Main Results
Next come our first main results.
Theorem 1.
Let , , and let f be either continuous and bounded, or uniformly continuous. Then
where is the first modulus of continuity of f. Hence, , pointwise, given f is uniformly continuous.
When , we obtain
Hence , uniformly over , given that f is uniformly continuous.
Proof.
□
We continue with our second main result.
Theorem 2.
Proof.
With Taylor’s formula, we have
Call
Hence
Thus
where
So that
And hence
Next we estimate
where
and
where
The last part of inequality (18) comes from the following:
(i) Let then
i.e., when we get
Corollary 1
(to Theorem 2). It holds
Corollary 2
(to Theorem 1). Let , , and be such that , . Consider . Then
By (24), we derive the convergence of to f with rates, given f is uniformly continuous.
We finish with
Corollary 3
(to Theorem 2). In the assumptions of Theorem 2 and Corollary 2 we have
By (25) we derive again the convergence of to f with rates.
Funding
This research received no external funding.
Data Availability Statement
No new data were created or analyzed in this study. Data sharing is not applicable to this article.
Conflicts of Interest
The author declares no conflicts of interest.
References
- Cardaliaguet, P.; Euvrard, G. Approximation of a function and its derivative with a neural network. Neural Netw. 1992, 5, 207–220. [Google Scholar] [CrossRef]
- Anastassiou, G.A. Rate of convergence of some neural network operators to the unit—univariate case. J. Math. Anal. Appl. 1997, 212, 237–262. [Google Scholar] [CrossRef]
- Anastassiou, G.A. Intelligent Systems II: Complete Approximation by Neural Network Operators; Springer: Heidelberg, Germany; New York, NY, USA, 2016. [Google Scholar]
- Chen, Z.; Cao, F. The approximation operators with sigmoidal functions. Comput. Math. Appl. 2009, 58, 758–765. [Google Scholar] [CrossRef]
- Costarelli, D.; Spigler, R. Approximation results for neural network operators activated by sigmoidal functions. Neural Netw. 2013, 44, 101–106. [Google Scholar] [CrossRef] [PubMed]
- Costarelli, D.; Spigler, R. Multivariate neural network operators with sigmoidal activation functions. Neural Netw. 2013, 48, 72–77. [Google Scholar] [CrossRef] [PubMed]
- Haykin, S. Neural Networks: A Comprehensive Foundation, 2nd ed.; Prentice Hall: New York, NY, USA, 1998. [Google Scholar]
- McCulloch, W.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
- Mitchell, T.M. Machine Learning; WCB-McGraw-Hill: New York, NY, USA, 1997. [Google Scholar]
- Yu, D.S.; Cao, F.L. Construction and approximation rate for feed-forward neural network operators with sigmoidal functions. J. Comput. Appl. Math. 2025, 453, 116150. [Google Scholar] [CrossRef]
- Cen, S.; Jin, B.; Quan, Q.; Zhou, Z. Hybrid neural-network FEM approximation of diffusion coefficient in elliptic and parabolic problems. IMA J. Numer. Anal. 2024, 44, 3059–3093. [Google Scholar] [CrossRef]
- Coroianu, L.; Costarelli, D.; Natale, M.; Pantiş, A. The approximation capabilities of Durrmeyer-type neural network operators. J. Appl. Math. Comput. 2024, 70, 4581–4599. [Google Scholar] [CrossRef]
- Warin, X. The GroupMax neural network approximation of convex functions. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 11608–11612. [Google Scholar] [CrossRef] [PubMed]