Abstract
In this work, we establish univariate approximation with rates, both basic and fractional, of continuous functions taking values in an arbitrary Banach space, with domain a closed interval or the whole real line, by quasi-interpolation neural network operators. These approximations are achieved by deriving Jackson-type inequalities via the first modulus of continuity of the function at hand, or of its abstract integer-order derivative, or of its Caputo fractional derivatives. Our operators are expressed via a density function based on a q-deformed and λ-parameterized hyperbolic tangent activation sigmoid function. The convergences are pointwise and uniform. The associated feed-forward neural networks have one hidden layer.
Keywords:
q-deformed; λ-parameterized hyperbolic tangent activation function; abstract neural network approximation; abstract quasi-interpolation operator; modulus of continuity; abstract Caputo fractional derivative; fractional approximation
MSC:
26A33; 41A17; 41A25; 41A30; 46B25
1. Introduction
In [,], see Chapters 2–5, the author pioneered quantitative neural network approximation of continuous functions by precisely defined neural network operators of Cardaliaguet–Euvrard and “squashing” types, using the modulus of continuity of the given function or of its high-order derivative and deriving almost sharp Jackson-type inequalities. He dealt with both the univariate and multivariate cases. The “bell-shaped” and “squashing” functions defining these operators were taken to have compact support. Furthermore, in [] he provides the Nth-order asymptotic expansion for the error of weak approximation of these two operators to a particular natural class of smooth functions; for more, see Chapters 4–5 therein.
Motivated by [], the author continued his research on neural network approximation by employing the appropriate quasi-interpolation operators of sigmoidal and hyperbolic tangent type, which resulted in [,,,,], treating both the univariate and multivariate cases. He also completed the corresponding fractional cases [,,].
Let h be a general sigmoid function with , and the horizontal asymptotes. Of course, h is strictly increasing over . Let the parameter and . Then clearly and ; furthermore, it holds . Consequently the sigmoid has a graph inside the graph of , of course with the same asymptotes . Therefore has derivatives (gradients) different from zero, or not as close to zero, at more points x than does, thus killing a smaller number of neurons! Furthermore, of course it is more distant from than is. This is a highly desirable fact in neural network theory.
Brain asymmetry has been clearly documented in animals and humans in terms of structure, function and behavior. This observation reflects evolutionary, hereditary, developmental, experiential and pathological factors. It is therefore natural to consider deformed neural network activation functions and operators in our study. So this paper is a specific study under this philosophy of approaching reality as closely as possible.
Consequently, the author here performs q-deformed and λ-parameterized hyperbolic tangent function activated neural network approximations to continuous functions over closed intervals of reals or over the whole real line, with values in an arbitrary Banach space X. Finally, he deals with the related X-valued fractional approximation. All convergences here are quantitative, expressed via the first modulus of continuity of the function at hand or of its X-valued high-order derivative or X-valued fractional derivatives, and are given by almost attained Jackson-type inequalities.
Our closed intervals are not necessarily symmetric with respect to the origin. Some of our upper bounds on the error quantity are very flexible and general. In preparation for deriving our results, we describe important properties of the basic density function defining our operators, which is induced by a q-deformed and λ-parameterized hyperbolic tangent sigmoid function.
Feed-forward X-valued neural networks (FNNs) with one hidden layer, the only type of networks we use in this work, are mathematically expressed by
where for , are the thresholds, are the connection weights, are the coefficients, is the inner product of and x, and k is the activation function of the network. For more on neural networks, see [,,].
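For readers who prefer a computational view, here is a minimal NumPy sketch of such a one-hidden-layer network, assuming the usual form N(x) = Σ_j c_j k(⟨a_j, x⟩ + b_j); the particular weights, thresholds, and coefficients below are illustrative placeholders, not quantities prescribed in this paper.

```python
import numpy as np

def one_hidden_layer_fnn(x, a, b, c, activation=np.tanh):
    """Evaluate N(x) = sum_j c_j * k(<a_j, x> + b_j) for one input vector x.

    a : (m, d) connection weights (one row per hidden neuron)
    b : (m,)   thresholds
    c : (m,)   output coefficients
    activation : the activation function k of the network
    """
    hidden = activation(a @ x + b)   # inner products <a_j, x> shifted by the thresholds
    return c @ hidden                # linear read-out of the hidden layer

# Illustrative usage with random placeholder parameters.
rng = np.random.default_rng(0)
d, m = 3, 5                          # input dimension, number of hidden neurons
x = rng.standard_normal(d)
a, b, c = rng.standard_normal((m, d)), rng.standard_normal(m), rng.standard_normal(m)
print(one_hidden_layer_fnn(x, a, b, c))
```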
2. About the q-Deformed and λ-Parameterized Hyperbolic Tangent Function g_{q,λ}
Here all this background comes from [,].
We use , see (1), show that it is a sigmoid function, and present several of its properties related to the approximation by neural network operators. It will act as the activation function.
So, let us consider the function
We have that
We notice also that
That is
and
hence
It is
i.e.,
Furthermore,
i.e.,
We find that
therefore is strictly increasing.
Next we obtain ()
We observe that
So, in case of , we have that is strictly concave up, with
Furthermore, in case of , we have that is strictly concave down.
Clearly, is a shifted sigmoid function with , and , (a semi-odd function), see also [].
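A small numerical check may help here; it assumes the closed form g_{q,λ}(x) = (e^{λx} − q e^{−λx})/(e^{λx} + q e^{−λx}) used in the author's cited works and verifies, for one illustrative choice q = 0.5 and λ = 2, the basic sigmoid features discussed in this section: the horizontal asymptotes, the value at the origin, strict monotonicity, and the fact that g_{q,λ} is a shifted hyperbolic tangent.

```python
import numpy as np

def g(x, q, lam):
    # assumed q-deformed, lambda-parameterized hyperbolic tangent (form taken from the cited works)
    return (np.exp(lam * x) - q * np.exp(-lam * x)) / (np.exp(lam * x) + q * np.exp(-lam * x))

q, lam = 0.5, 2.0
x = np.linspace(-5.0, 5.0, 2001)
vals = g(x, q, lam)
print(vals[0], vals[-1])                      # close to the horizontal asymptotes -1 and 1
print(g(0.0, q, lam), (1 - q) / (1 + q))      # value at the origin: (1 - q)/(1 + q)
print(bool(np.all(np.diff(vals) > 0)))        # strictly increasing on the sampled grid
print(np.max(np.abs(vals - np.tanh(lam * x - 0.5 * np.log(q)))))  # identical to a shifted tanh
```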
By , , we consider the function
; . Notice that , so the x-axis is a horizontal asymptote.
We have that
Thus,
a deformed symmetry.
Next, we have that
Let , then and (by being strictly concave up for ), that is . Hence, is strictly increasing over
Let now , then , and , that is
Therefore is strictly decreasing over
Let us next consider the following. We have that
By
By
Clearly by (13) we obtain that , for
More precisely is concave down over , and strictly concave down over
Consequently has a bell-type shape over
Of course it holds
At , we have
Thus,
That is, is the only critical number of over . Hence at achieves its global maximum, which is
Conclusion: The maximum value of is
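The bell shape and the location of the global maximum can also be inspected numerically. The sketch below assumes that the density is built from g_{q,λ} in the standard way used in the author's related works, namely M_{q,λ}(x) = (1/4)(g_{q,λ}(x+1) − g_{q,λ}(x−1)), with the illustrative choice q = 0.5 and λ = 2; the comparison value ln q/(2λ) is the critical point obtained by differentiating that assumed form.

```python
import numpy as np

def g(x, q, lam):
    # q-deformed, lambda-parameterized hyperbolic tangent, written via tanh for numerical stability
    return np.tanh(lam * x - 0.5 * np.log(q))

def M(x, q, lam):
    # assumed density: a scaled difference of two shifted copies of g
    return 0.25 * (g(x + 1, q, lam) - g(x - 1, q, lam))

q, lam = 0.5, 2.0
x = np.linspace(-6.0, 6.0, 240001)
vals = M(x, q, lam)
print(x[np.argmax(vals)], np.log(q) / (2 * lam))  # numerical maximizer vs. ln(q)/(2*lambda)
print(bool(np.all(vals > 0)))                     # positivity: the graph is a bell above the x-axis
```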
We mention
Theorem 1
([]). We have that
Thus,
Similarly, it holds
However, , ∀
Hence,
and
It follows
Theorem 2
([]). It holds
So that is a density function on
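Under the same assumed form of M_{q,λ}, Theorems 1 and 2 can be illustrated numerically: the integer translates sum to 1 at every point, and the function integrates to 1 over the real line.

```python
import numpy as np

def g(x, q, lam):
    # assumed q-deformed, lambda-parameterized hyperbolic tangent, in numerically stable tanh form
    return np.tanh(lam * x - 0.5 * np.log(q))

def M(x, q, lam):
    return 0.25 * (g(x + 1, q, lam) - g(x - 1, q, lam))

q, lam = 0.5, 2.0
x0 = 0.37                                          # an arbitrary test point
ks = np.arange(-50, 51)
print(np.sum(M(x0 - ks, q, lam)))                  # partition of unity: should be ~1
t = np.linspace(-50.0, 50.0, 200001)
y = M(t, q, lam)
print(np.sum((y[:-1] + y[1:]) / 2 * np.diff(t)))   # trapezoid rule for the total mass: should be ~1
```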
We need the following result
Theorem 3
([]). Let , and with ; . Then,
where
Let ⌈·⌉ denote the ceiling of a number, and ⌊·⌋ its integral part.
Theorem 4
([]). Let and so that . For , we consider the number with and . Then,
We also mention
Remark 1
([]). (i) We have that
where
(ii) Let . For large n we always have . Furthermore, , iff . In general it holds
Let X be a Banach space.
Definition 1.
Let and . We introduce and define the X-valued linear neural network operators
For large enough n we always obtain Furthermore, iff The same is used for real valued functions. We study here the pointwise and uniform convergence of to with rates.
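As an illustration of Definition 1 in the scalar case X = ℝ, the sketch below assumes the operator has the standard quasi-interpolation form used in the author's earlier works, A_n(f, x) = Σ_{k=⌈na⌉}^{⌊nb⌋} f(k/n) M_{q,λ}(nx − k) / Σ_{k=⌈na⌉}^{⌊nb⌋} M_{q,λ}(nx − k) for x ∈ [a, b]; it is a sketch under that assumption, not a transcription of the displayed formula.

```python
import numpy as np

def g(x, q, lam):
    # assumed q-deformed, lambda-parameterized hyperbolic tangent, in numerically stable tanh form
    return np.tanh(lam * x - 0.5 * np.log(q))

def M(x, q, lam):
    return 0.25 * (g(x + 1, q, lam) - g(x - 1, q, lam))

def A_n(f, x, n, a, b, q=0.5, lam=2.0):
    """Assumed quasi-interpolation operator on [a, b] for a scalar-valued f."""
    ks = np.arange(np.ceil(n * a), np.floor(n * b) + 1)
    w = M(n * x - ks, q, lam)                # density weights at the scaled nodes
    return np.sum(f(ks / n) * w) / np.sum(w)

f, a, b = np.sin, 0.0, np.pi
for n in (10, 100, 1000):
    xs = np.linspace(a, b, 101)
    err = max(abs(A_n(f, x, n, a, b) - f(x)) for x in xs)
    print(n, err)                            # uniform error shrinking as n grows
```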
For convenience, we also call
(the same can be defined for real valued functions) that is
So that
Consequently, we derive that
where as in (25).
We will estimate the right hand side of the last quantity.
For that we need, for the first modulus of continuity
Similarly, it is defined for (uniformly continuous and bounded functions from into X), for (continuous and bounded X-valued), and for (uniformly continuous).
The fact or , is equivalent to , see [].
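In the scalar case the first modulus of continuity is ω₁(f, δ) = sup{|f(x) − f(y)| : x, y ∈ [a, b], |x − y| ≤ δ}, and it can be estimated by brute force on a grid; the sketch below is only an illustration of the definition, with a test function whose modulus is known exactly.

```python
import numpy as np

def modulus_of_continuity(f, a, b, delta, grid=2000):
    """Grid estimate of w1(f, delta) = sup_{|x - y| <= delta} |f(x) - f(y)| on [a, b]."""
    xs = np.linspace(a, b, grid)
    fx = f(xs)
    best = 0.0
    for i, x in enumerate(xs):
        close = np.abs(xs - x) <= delta       # all grid points within delta of x
        best = max(best, float(np.max(np.abs(fx[close] - fx[i]))))
    return best

# For f(x) = sqrt(x) on [0, 1] one has w1(f, delta) = sqrt(delta); the estimate should be close.
for d in (0.1, 0.01):
    print(d, modulus_of_continuity(np.sqrt, 0.0, 1.0, d), np.sqrt(d))
```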
We make
Definition 2.
When , or , we define
, , the X-valued quasi-interpolation neural network operator.
We give
Remark 2.
We have that
and
and
and, finally,
a convergent series in .
So, the series is absolutely convergent in X, hence it is convergent in X and . We denote by , for , similarly it is defined for
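For the scalar case, Definition 2 and the convergence discussion of Remark 2 can be illustrated as follows. The sketch assumes the whole-line operator has the series form B̄_n(f, x) = Σ_{k∈ℤ} f(k/n) M_{q,λ}(nx − k) used in the author's related works (no normalizing denominator, since the integer translates of M_{q,λ} sum to 1); the series is truncated to the indices with |nx − k| ≤ K, the remaining weights being numerically negligible.

```python
import numpy as np

def g(x, q, lam):
    # assumed q-deformed, lambda-parameterized hyperbolic tangent, in numerically stable tanh form
    return np.tanh(lam * x - 0.5 * np.log(q))

def M(x, q, lam):
    return 0.25 * (g(x + 1, q, lam) - g(x - 1, q, lam))

def B_bar_n(f, x, n, q=0.5, lam=2.0, K=60):
    """Assumed whole-line quasi-interpolation operator, truncated to |n*x - k| <= K."""
    center = int(round(n * x))
    ks = np.arange(center - K, center + K + 1)
    return np.sum(f(ks / n) * M(n * x - ks, q, lam))

f = lambda t: np.sin(t) / (1.0 + t * t)       # a bounded, uniformly continuous test function
x0 = 1.3
for n in (10, 100, 1000):
    print(n, abs(B_bar_n(f, x0, n) - f(x0)))  # pointwise error shrinking as n grows
```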
3. Main Results
We present a set of X-valued neural network approximations, with rates, to a given function.
Theorem 5.
Let , , , , , Then,
and
(ii)
We obtain that , pointwise and uniformly.
Proof.
Next we give
Theorem 6.
Let , , , Then
(i)
and
(ii)
For we obtain , pointwise and uniformly.
Proof.
We observe that
proving the claim. □
We need the X-valued Taylor’s formula in an appropriate form:
Theorem 7
([,]). Let , and , where and X is a Banach space. Let any . Then,
The derivatives , , are defined like the numerical ones, see [], p. 83. The integral in (46) is of Bochner type, see [].
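For orientation, a standard Banach-space-valued Taylor formula with Bochner-integral remainder, valid for instance for f ∈ C^N([a, b], X), N ∈ ℕ, reads as follows; it is quoted here only as background, under the hypotheses of Theorem 7.

```latex
f(y) \;=\; \sum_{i=0}^{N-1} \frac{(y-x)^{i}}{i!}\, f^{(i)}(x)
      \;+\; \frac{1}{(N-1)!} \int_{x}^{y} (y-t)^{N-1} f^{(N)}(t)\, dt,
      \qquad x, y \in [a, b].
```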
By [,] we have that: if , then and
Next, we discuss high-order X-valued neural network approximation using the smoothness of f.
Theorem 8.
Let , , , , , and . Then,
(i)
(ii) assume further , for some , it holds
and
(iii)
Again we obtain , pointwise and uniformly.
Proof.
It is lengthy and, being similar to [], is omitted. □
All integrals from now on are of Bochner type [].
We need
Definition 3
([]). Let , X be a Banach space, ; , ( is the ceiling of the number), . We assume that . We call the Caputo–Bochner left fractional derivative of order α:
If , we set the ordinary X-valued derivative (defined similarly to the numerical one, see [], p. 83), and also set
By [], exists almost everywhere in and .
If , then by [], hence
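For readers less familiar with fractional derivatives, here is a small scalar (X = ℝ) sketch of Definition 3 for an order 0 < α < 1, where the definition reduces to D^α_{*a} f(x) = (1/Γ(1−α)) ∫_a^x (x−t)^{−α} f′(t) dt. The endpoint singularity is removed by the substitution u = (x−t)^{1−α}, and the result is checked against the known closed form for f(t) = (t−a)².

```python
import numpy as np
from math import gamma

def caputo_left(f_prime, a, x, alpha, m=100_000):
    """Numerical Caputo left derivative of order 0 < alpha < 1 at a point x > a.

    Uses D^alpha f(x) = (1/Gamma(1-alpha)) * int_a^x (x-t)^(-alpha) f'(t) dt,
    with the substitution u = (x-t)^(1-alpha) removing the endpoint singularity.
    """
    U = (x - a) ** (1.0 - alpha)
    u = np.linspace(0.0, U, m)
    t = x - u ** (1.0 / (1.0 - alpha))
    vals = f_prime(t)                                   # the Jacobian cancels the singular kernel
    integral = np.sum((vals[:-1] + vals[1:]) / 2 * np.diff(u)) / (1.0 - alpha)
    return integral / gamma(1.0 - alpha)

# Check: for f(t) = (t - a)^2 one has D^alpha f(x) = 2 (x - a)^(2 - alpha) / Gamma(3 - alpha).
a, x, alpha = 1.0, 2.5, 0.5
approx = caputo_left(lambda t: 2.0 * (t - a), a, x, alpha)
exact = 2.0 * (x - a) ** (2.0 - alpha) / gamma(3.0 - alpha)
print(approx, exact)
```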
We mention
Definition 4
([]). Let , X be a Banach space, , . We assume that , where . We call the Caputo–Bochner right fractional derivative of order α:
We observe that for , and
By [], exists almost everywhere on and .
If , and by [], hence
We make
Remark 3
([]). Let , , , , . Then,
Thus, we observe
Consequently,
Similarly, let , , , , , then
So for , , , , , we find
and
By [] we obtain that , and by [] we obtain that
We present the following X-valued fractional approximation result by neural networks.
Theorem 9.
Let , , , , , , , , Then,
(i)
(ii) if , for , we have
(iii)
∀
and
(iv)
Above, when the sum
As we see, here we obtain X-valued fractional-type pointwise and uniform convergence with rates to the unit operator, as
Proof.
The proof is very lengthy and similar to []; therefore, it is omitted. □
Next we apply Theorem 9 for
Theorem 10.
Let , , , Then
(i)
and
(ii)
When we derive
Corollary 1.
Let , , , Then
(i)
and
(ii)
We make
Remark 4.
Some convergence analysis follows based on Corollary 1.
Then it holds
where
The other summand of the right hand side of (65), for large enough n, converges to zero at the speed , so it is about , where is a constant.
Then, for large enough , by (65), (68) and the above comment, we obtain that
where , converging to zero at the high speed of
In Theorem 5, for and for large enough , the speed is . So by (69), converges much faster to zero. The latter comes from the fact that we assumed differentiability of f. Notice that in Corollary 1 no initial condition is assumed.
Funding
This research received no external funding.
Data Availability Statement
Not applicable.
Conflicts of Interest
The author declares no conflicts of interest.
References
- Anastassiou, G.A. Rate of convergence of some neural network operators to the unit-univariate case. J. Math. Anal. Appl. 1997, 212, 237–262.
- Anastassiou, G.A. Quantitative Approximations; Chapman & Hall/CRC: Boca Raton, FL, USA; New York, NY, USA, 2001.
- Chen, Z.; Cao, F. The approximation operators with sigmoidal functions. Comput. Math. Appl. 2009, 58, 758–765.
- Anastassiou, G.A. Univariate hyperbolic tangent neural network approximation. Math. Comput. Model. 2011, 53, 1111–1132.
- Anastassiou, G.A. Multivariate hyperbolic tangent neural network approximation. Comput. Math. Appl. 2011, 61, 809–821.
- Anastassiou, G.A. Multivariate sigmoidal neural network approximation. Neural Netw. 2011, 24, 378–386.
- Anastassiou, G.A. Intelligent Systems: Approximation by Artificial Neural Networks; Intelligent Systems Reference Library; Springer: Berlin/Heidelberg, Germany, 2011; Volume 19.
- Anastassiou, G.A. Univariate sigmoidal neural network approximation. J. Comput. Anal. Appl. 2012, 14, 659–690.
- Anastassiou, G.A. Fractional neural network approximation. Comput. Math. Appl. 2012, 64, 1655–1676.
- Anastassiou, G.A. Intelligent Systems II: Complete Approximation by Neural Network Operators; Springer: Berlin/Heidelberg, Germany; New York, NY, USA, 2016.
- Anastassiou, G.A. Nonlinearity: Ordinary and Fractional Approximations by Sublinear and Max-Product Operators; Springer: Berlin/Heidelberg, Germany; New York, NY, USA, 2018.
- Haykin, S. Neural Networks: A Comprehensive Foundation, 2nd ed.; Prentice Hall: New York, NY, USA, 1998.
- McCulloch, W.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133.
- Mitchell, T.M. Machine Learning; McGraw-Hill: New York, NY, USA, 1997.
- Anastassiou, G.A. q-Deformed and lambda-parametrized hyperbolic tangent function based Banach space valued multivariate multi layer neural network approximations. Ann. Univ. Sci. Bp. Sect. Comp. 2023, in press.
- El-Shehawy, S.A.; Abdel-Salam, E.A.-B. The q-deformed hyperbolic secant family. Intern. J. Appl. Math. Stat. 2012, 29, 51–62.
- Anastassiou, G.A. General sigmoid based Banach space valued neural network approximation. J. Comput. Anal. Appl. 2023, 31, 520–534.
- Anastassiou, G.A. Vector fractional Korovkin type approximations. Dyn. Syst. Appl. 2017, 26, 81–104.
- Anastassiou, G.A. Strong right fractional calculus for Banach space valued functions. Rev. Proyecc. 2017, 36, 149–186.
- Anastassiou, G.A. A strong fractional calculus theory for Banach space valued functions. Nonlinear Funct. Anal. Appl. 2017, 22, 495–524.
- Shilov, G.E. Elementary Functional Analysis; Dover Publications, Inc.: New York, NY, USA, 1996.
- Mikusinski, J. The Bochner Integral; Academic Press: New York, NY, USA, 1978.
- Kreuter, M. Sobolev Spaces of Vector-Valued Functions. Master's Thesis, Ulm University, Ulm, Germany, 2015.
- Anastassiou, G.A.; Karateke, S. Parametrized hyperbolic tangent induced Banach space valued ordinary and fractional neural network approximation. Progr. Fract. Differ. Appl. 2023, in press.