1. Introduction and Related Work
Pioneering work in machine learning (ML) theory and computational approximation theory is integrating cutting-edge applications into our lives every day (see [1,2,3,4]). With the advancement of artificial intelligence (AI) technology, there is a growing need for more effective ML models, and the contribution of pure mathematics and physics cannot be ignored at this point.
Throughout the history of mathematics and physics, the phenomena of “symmetry” and “asymmetry” have been of great importance both in theory and in practice (see also [5]). In examining the literature, we have come across promising studies on how these phenomena are applied to real-world problems. It would be too assertive to say that the perspectives of different disciplines on symmetry–asymmetry phenomena, and the tools and practices they use, are exactly the same. However, it would not be wrong to say that they are based on similar roots in principle and philosophy.
In [6], Mattheakis et al. use ANNs and symplectic NN architectures to solve differential equations in accordance with Noether’s theorem from mathematical physics, building in the “physical symmetry” that governs the behavior of physical systems: translational symmetry provides conservation of momentum, and rotational symmetry provides conservation of angular momentum and energy. They also exploited the property of a function being odd or even (in the sense of symmetry with respect to the origin/axis in Euclidean geometry) for the generation of real-world data. In order to obtain better regression estimates than a standard feedforward multilayer perceptron (MLP) that is unaware of symmetry, a hidden layer called a “hub layer” was integrated into the classical MLP architecture; in this way, the desired odd or even behavior of the regression function could be achieved.
In [7], Faroughi et al. consider fundamental physics-based ANNs in three different categories, with applications to solid and fluid mechanics. They especially emphasize that working with sparse data is disadvantageous for the training process under the regime of classical ANNs. Viewed from the philosophy of inserting physical structures into NN architectures, the symmetry applications of Lagrangian neural networks in fundamental physics attract attention.
In [8], Lu et al. propose a new, elegant network called the deep operator network (DeepONet), inspired by the universal approximation theorem (UAT) for operators and its generalized version. They also establish that implicit operators (for example, integrals and fractional Laplacians) as well as explicit operators (such as those given by deterministic and stochastic differential equations) are learnable by DeepONet.
In [9], Goswami et al. provide a comprehensive literature review highlighting the usefulness of graph neural network operators, DeepONet, and Fourier neural network operators, together with their generalized versions, as well as their applications to the field of computational mechanics.
In another interesting work [10], Dong et al. handle the concepts of “symmetry”, “anti-symmetry”, and “non-symmetry” from the point of view of particle physics. In their data visualization, employing the Heaviside step function, they use three equations covering the anti-symmetric, symmetric, and fully anti-symmetric cases for a two-dimensional dataset. Based on their experimental studies, they also argue that the use of continuous symmetries from field theory (such as gauge and Lorentz symmetries) will open the door to promising results. They mostly study the performance of Equivariant Quantum Neural Networks (EQNNs) with discrete symmetries.
Another work that served as a driving force behind the “symmetrization” technique created by Anastassiou (see [11]) and used in this study is [12]. Tahmasebi and Jegelka revealed that it is possible to train artificial neural networks with “smaller” amounts of data by taking advantage of the “symmetry” present in datasets. They examined, from a differential-geometric perspective, ways to reduce the complexity of a given dataset in ANNs. More precisely, they succeeded in extending Weyl’s law, used primarily in spectral theory, to include symmetry in the evaluation of dataset complexity in machine learning. It is thought that ML models that incorporate symmetry will not only make more accurate predictions but will also fill a serious gap, especially in disciplines where training data is scarce.
In [13], Na and Park offer architecturally symmetric neural networks (NNs) in which even the weights act in a symmetric manner. The authors particularly draw attention to the “difficulties” of memory footprint and training time in the performance of ANN models. To overcome these difficulties at an optimal level, they highlight the “symmetric” structure of the model with examples such as the XOR neural network. The NN structure here is topologically symmetric; in other words, it has symmetric weights, connections, and even inputs. Ref. [14] is a comprehensive study of the approximation status of embedding infinite-dimensional inputs in classical neural network architectures, the approximation potential of symmetric and antisymmetric networks, and the prospects of learning simple symmetric functions by gradient techniques.
The concept of “permutation symmetries” in multilayer perceptrons, which began with the pioneering work of Hecht-Nielsen [15], has spawned enthusiasm among many researchers up to the present day. Albertini et al. [16] generalized this idea and investigated sign-flip symmetries in artificial neural networks by considering single functions. Later, Laurent et al. [17] investigated the concept of complexity in Bayesian neural network posteriors. They stated that considering symmetries in NN models yielded promising results and gathered considerable insights for future studies.
In [18], Vlačić and Bölcskei solve an open problem posed by Fefferman [19]. Accordingly, they develop a theory of the relationships between NNs realizing a given function by relating the underlying nonlinear functions and their symmetries. In fact, they point out that the architecture, weights, and biases of feedforward neural networks (FNNs) can be determined with respect to a given nonlinearity g.
In [20], Perin and Deny state that incorporating “symmetries in the sense of transformations with group actions” into the architecture of deep neural networks (DNNs) is promising in enabling ML models to produce more accurate predictions. Thus, they answer the questions of how and under what conditions DNNs can learn symmetries in a dataset.
In [21], Hutter considers symmetric and anti-symmetric NNs. He also emphasizes the essential features that an ideal approximation architecture should have. The main emphasis of this work is to prove the existence of an “equivalent MLP”, connecting symmetric MLP structures with the “universality” principle.
In this work, we benchmark the approximation performance of symmetrized NN operators against that of classical NN operators. Inspired by [12,22], consider any function with mirror symmetry in the Euclidean plane, that is, a function symmetric about the y-axis (an “even” function). For instance, imagine calculating the integral of such a function over a symmetric region. Since the whole region of integration, as is well known, consists of two equal subregions, it is sufficient to perform the calculation over one subregion. A similar logic is anticipated here: in ML models that take symmetry into account, relatively smaller datasets can be used, saving data and energy. As pointed out in [23], the popularity of programming languages such as Python, with their “number-crunching” mindset, continues to increase. With this motivation, in the current study we crown our numerical examples with graphics obtained using the Python programming language.
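To make this intuition concrete, the following minimal SymPy check (a sketch only; the even function f and the half-width a are illustrative choices, not quantities from this paper) verifies that the integral of an even function over a symmetric region equals twice the integral over one of its halves:

```python
import sympy as sp

x = sp.symbols('x')
a = sp.Symbol('a', positive=True)
f = x**2 * sp.cos(x)                     # an arbitrary even function: f(-x) = f(x)

full = sp.integrate(f, (x, -a, a))       # integral over the symmetric region [-a, a]
half = 2 * sp.integrate(f, (x, 0, a))    # twice the integral over the subregion [0, a]

print(sp.simplify(full - half))          # prints 0: one subregion suffices
```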
Beyond the above-mentioned papers, in [24], Cantarini and Costarelli consider a research problem related to “simultaneous approximation” for well-known neural network operators activated by classical logistic functions, with interesting Voronovskaja-type results. Again, in [25], density results for deep neural network operators are highlighted. Moreover, Costarelli also draws attention there to the fact that additional layers of NN operators can play a critical role in the degree of accuracy of the obtained approximations, so that increasing the number of hidden layers improves the approximation. Also, in [26], Costarelli and Spigler study the pointwise and uniform convergence of certain neural network operators. They consider these convergence cases by taking into account both the weights and the number of neurons of the network. They even propose solution methods based on sigmoidal functions for integro-differential equations and Volterra integral equations. Turkun and Duman [27] used regular summability methods to improve the convergence results of NN operators and increase the convergence rate. They also showed, with numerical examples and graphs, that they obtained better convergence results than those of classical methods. In [28], Hu et al. parameterized deep neural network weights to be symmetric. Thanks to these symmetry parameterizations, memory requirements are significantly reduced and computational efficiency is achieved; these improvements are especially valuable for mobile applications. Ref. [29] deals with an experimental study taking advantage of the “symmetry” property for credit risk modeling. The symmetry feature used in this credit risk modeling offers significant advantages, among them temporal invariance of risk models, consistent representation of risk factors, and symmetry in the financial network structures created by market participants. Peleshchak et al. studied the classification of mines made of different materials with a neural network structure in [30]. They showed that, in addition to the asymmetry of the neurons in the first and second hidden layers with respect to the symmetry plane between the hidden layers, the choice of activation functions also produced a significant change in the accuracy of the developed neural network model. Even when the symmetry of the number of neurons in the hidden layers is broken, a change in the value of the loss function is observed.
The current study is designed as follows. In Section 1, we provide an introduction and an in-depth literature survey. In Section 2, we give the theoretical background on the symmetrized density function and the activation function, which are the backbone of our symmetrized NN operators; this background is then used in the approximation theorems of the subsequent section. Section 3 is devoted to pointwise and uniform approximation results. In Section 4, which is the core part of the paper, tables of convergence rates obtained by employing classical NN operators and symmetrized NN operators on some special functions are given. Furthermore, these numerical results are interpreted with graphics created using the Python symbolic library SymPy and the Python numerical computation libraries SciPy and NumPy (see also [31]). In Section 5, the findings are discussed, and ideas that open doors to future work are suggested. In Appendix A, the Python code used throughout the article is shared as “open source” with curious researchers from all fields, especially mathematicians and subject experts (see [31,32,33,34,35]).
2. Regarding the Symmetrized Density Function and the Creation of the Symmetrized Neural Network (SNN) Operator
In this section, we first consider our activation function as in (1). Moreover, we create the density function in (2) by taking advantage of this activation. Inspired by [11], and following a different line of vision compared with other papers such as [5,24,26,27,36], our operators shall be derived from a so-called “symmetrized density function”, whose construction is given between (3) and (6).
Now, our activation function to be used here is as given in (1); the two quantities appearing in (1) are, respectively, the parameter and the deformation coefficient. For more, see [36,37,38]. We employ the density function defined in (2).
For the relevant range of the argument, we have (3) and (4). Adding (3) and (4), we obtain the symmetrized function, which is the key to this work: it is symmetric with respect to the y-axis; in other words, we have obtained an even function. By (18.18) of [37], the two constituent densities share the same maximum at symmetric points. By Theorem 18.1, p. 458, of [37], the corresponding estimate holds for all admissible arguments; consequently, we derive the stated bound for every such argument. By Theorem 18.2, p. 459, of [37], the normalization property holds, so that the symmetrized function is indeed a density function.
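As a concrete illustration of the symmetrization idea, the following minimal NumPy sketch assumes a q-deformed, λ-parametrized hyperbolic tangent activation in the spirit of [36,37,38] and a density built from a scaled difference of its unit shifts; the exact formulas and normalization used in this paper are those of (1)–(6), so the forms below (including the averaging over q and 1/q) are stated as assumptions for illustration only. The point of the sketch is that averaging the densities for the deformation coefficients q and 1/q yields an even function that still integrates to one:

```python
import numpy as np

def g(x, q, lam):
    # Assumed activation (cf. (1) and [36-38]): a q-deformed, lambda-parametrized hyperbolic tangent
    return (np.exp(lam * x) - q * np.exp(-lam * x)) / (np.exp(lam * x) + q * np.exp(-lam * x))

def M(x, q, lam):
    # Assumed density (cf. (2)): a scaled difference of unit shifts of the activation
    return 0.25 * (g(x + 1.0, q, lam) - g(x - 1.0, q, lam))

def M_sym(x, q, lam):
    # Symmetrized density: average of the densities for q and 1/q (an even function)
    return 0.5 * (M(x, q, lam) + M(x, 1.0 / q, lam))

q, lam = 1.5, 1.0
xs = np.linspace(-6.0, 6.0, 13)
print(np.allclose(M_sym(xs, q, lam), M_sym(-xs, q, lam)))   # True: symmetric about the y-axis

grid = np.linspace(-50.0, 50.0, 400001)                     # crude Riemann check of the normalization
print(np.trapz(M_sym(grid, q, lam), grid))                  # approximately 1.0
```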
By Theorem 18.3, p. 459, of [37], we have the next estimate. Let the interval endpoints and the parameters be as stated there; then the resulting bound holds, and consequently we obtain the estimate in which the relevant quantity is as in (12).
Denote the ceiling and the integer part of a real number by ⌈·⌉ and ⌊·⌋, respectively.
Theorem 1 ([11]). Let us take with and for each , Then exists such that for ; . Similarly, we consider such that and
Consequently, it holds that
so that
that is,
Theorem 2 ([11]). Let us pick such that ; for . In addition, let so that is valid for . Then, is obtained such that for .
Remark 1. (i) By Remark 18.5, p. 460 of [37], for at least some we have that , and for some such that . Therefore, it holds that . Hence, even if , because then , equivalently, is true by .
(ii) For sufficiently large and , is valid. So, in general, it holds that such that if and only if . Here, Y denotes a Banach space.
Definition 1. The Y-valued linear “symmetrized neural network (SNN) operator” is defined as below: for arbitrary such that .
Definition 2 ([39]). (i) We employ the universal “moduli of continuity” defined as below for , such that . (ii) (33) is also valid for any , or , where , and , respectively. (iii) or as .
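For readers who wish to experiment numerically, a grid-based estimate of the first modulus of continuity can be sketched as follows (a sketch only, using the standard scalar form ω₁(f, δ) = sup{|f(x) − f(y)| : |x − y| ≤ δ}; the Y-valued version actually used in (33) is in [39], and the function name omega_1, the test function, and the interval are illustrative choices):

```python
import numpy as np

def omega_1(f, delta, a=-1.0, b=1.0, grid_pts=2001):
    # Grid estimate of omega_1(f, delta) = sup{ |f(x) - f(y)| : x, y in [a, b], |x - y| <= delta }
    xs = np.linspace(a, b, grid_pts)
    fx = f(xs)
    best = 0.0
    for i in range(grid_pts):
        mask = np.abs(xs - xs[i]) <= delta
        best = max(best, np.max(np.abs(fx[mask] - fx[i])))
    return best

# For a Lipschitz function, omega_1(f, delta) <= L * delta; e.g., for cos on [-1, 1], L = sin(1)
print(omega_1(np.cos, 0.1))
```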
Definition 3. The “Y-valued quasi-interpolation SNN operator” is defined as for , or such that , .
3. Pointwise and Uniform Approximations by SNN Operators
We present a set of Y-valued symmetrized neural network (SNN) approximations to a given function, stated quantitatively.
Theorem 3. Let , , , , , Then,
and
We get that , pointwise and uniformly.
Next, we give
Theorem 4. Let , , , , , Then,
For we get , pointwise and uniformly.
A high-order approximation follows.
Theorem 5. Let , , , , , and . Then,
(ii) assume , for some ; thus, and . Again, we obtain , pointwise and uniformly.
Proof. The proof is lengthy and similar to that in [37]; we leave it to the attention of interested readers. □
All integrals from now on are of Bochner-type.
Definition 4 ([40]). Let , Y be a Banach space, ; ; ; . We call the Caputo–Bochner left fractional derivative of order ς: for all .
Definition 5 ([40]). Let , , , Y be a Banach space, . We assume that , where . We call the Caputo–Bochner right fractional derivative of order ς: for every .
A Y-valued fractional approximation follows:
Theorem 6. Let , , , , , , , , Then, for
(ii) if , for , we have and
Above, when the sum
We have thus established, quantitatively, the pointwise and uniform Y-valued fractional approximation to , the unit operator, as
Proof. The proof is long and similar to that in [11], so it is left to the interested reader. □
Next, we apply Theorem 6 for
Corollary 1. Let , , , , , Then,
and
When , we derive
Corollary 2. Let , , , , , Then,
and
4. Numerical Approach: Classical Neural Network (CNN) Operators Versus Symmetrized Neural Network (SNN) Operators
To the best of our knowledge, the approximation results and convergence speeds obtained with operators derived from the symmetrized density function within the theoretical framework of this article differ from those of other studies in the literature (for instance, [24,26,27]).
In other words, one of the gaps we noticed in the approximation theory literature is the need to improve the convergence results of ANN operators. The elegant “symmetrization technique” presented by Anastassiou helps to close this gap. Below, we present the numerical examples, graphical results, and convergence speeds obtained for different parameter choices using the tools we have introduced.
The “Difference” between the measures RHS and LHS, introduced in detail below, almost vanishes beyond a certain value of the operator index and with appropriate parameter choices. The fact that this difference comes much closer to zero, which is what is desired in approximation theory, with SNN operators than with CNN operators is what makes the present work interesting, and it points to our main contribution to the literature.
Three different test functions are used in the six examples given below. Approximating each of these test functions first with CNN operators and then with SNN operators, and computing the “Difference” measurements using Python 3.9, we see from the bolded numerical values at the bottom of the six tables that SNN operators give smaller “Difference” values than CNN operators. In addition, we support these results, obtained concretely with numerical data, with six graphical visualizations.
Notation 1 ([37]). Note that classical NN operators are defined for , , as follows: such that and .
Notation 2. Recall that CNN operators are defined for , , as in (52). Furthermore, being inspired by Theorem 3, let us introduce the concepts of the “Right-Hand Side (RHS)” and the “Left-Hand Side (LHS)” of the argument in (53) for “pointwise convergence”, defined as follows. For any α such that , where
Right-Hand Side (RHS) := the right-hand side of the estimate in (53),
Left-Hand Side (LHS) := the left-hand side of the estimate in (53),
and
Difference := RHS − LHS.
The three estimates RHS, LHS, and Difference shall serve as criteria for interpreting the speed of convergence of both classical NN (CNN) operators and symmetrized NN (SNN) operators, and for creating the convergence-speed tables using the Python 3.9 programming language.
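To connect these criteria with computable quantities, the following sketch evaluates a quasi-interpolation NN operator of the assumed form A_n(f, x) = Σ_k f(k/n) Z(nx − k) / Σ_k Z(nx − k) (in the spirit of (52) and Definition 3; the exact operators of this paper may differ in details), with Z being either the assumed classical density M or the symmetrized density M_sym from the Section 2 sketch, and records an empirical stand-in for the LHS, namely the largest pointwise error on a grid. The theoretical RHS of (53) is not reproduced here, and the helper names (nn_operator, empirical_lhs), the test function, and the interval [−1, 1] are illustrative assumptions only:

```python
import numpy as np

def g(x, q, lam):              # assumed activation, as in the Section 2 sketch
    return (np.exp(lam * x) - q * np.exp(-lam * x)) / (np.exp(lam * x) + q * np.exp(-lam * x))

def M(x, q=1.5, lam=1.0):      # assumed classical density
    return 0.25 * (g(x + 1.0, q, lam) - g(x - 1.0, q, lam))

def M_sym(x, q=1.5, lam=1.0):  # symmetrized density (even function)
    return 0.5 * (M(x, q, lam) + M(x, 1.0 / q, lam))

def nn_operator(f, x, n, Z, a=-1.0, b=1.0):
    # Assumed quasi-interpolation form: A_n(f, x) = sum_k f(k/n) Z(n x - k) / sum_k Z(n x - k),
    # with k running from ceil(n a) to floor(n b)
    k = np.arange(np.ceil(n * a), np.floor(n * b) + 1)
    w = Z(n * x - k)
    return np.sum(f(k / n) * w) / np.sum(w)

def empirical_lhs(f, n, Z, a=-1.0, b=1.0, pts=201):
    # Empirical "LHS": the largest pointwise error max_j |A_n(f, x_j) - f(x_j)| over a grid
    xs = np.linspace(a, b, pts)
    return max(abs(nn_operator(f, x, n, Z) - f(x)) for x in xs)

f = np.cos                                   # illustrative test function
for n in (5, 20, 100, 300):
    print(n, empirical_lhs(f, n, M), empirical_lhs(f, n, M_sym))
```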
Example 1. For the activation function defined in (1), , , and the density function in (2), , and the operators defined in (52), the convergence of the operators (blue), (red), (fuchsia), (aqua), (black) to (gold) is displayed in Figure 1. The convergence speeds of the CNN operators to the test function under the regime of certain parameters are given below in Table 1. Here, for ; RHS , ; LHS , and Difference := RHS − LHS.
Result: We observe that the Difference value decreases as the values increase from 5 to 300, which shows us that the approximation is promising beyond a certain value.
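For completeness, a minimal Matplotlib sketch in the spirit of Figure 1 is given below; it reuses nn_operator and M from the sketch following Notation 2, and the test function, interval, and index values are placeholders rather than the exact choices behind Table 1:

```python
import numpy as np
import matplotlib.pyplot as plt

# Reuses nn_operator and M from the sketch following Notation 2.
f = np.cos                                   # placeholder test function
xs = np.linspace(-1.0, 1.0, 400)
plt.plot(xs, f(xs), color="gold", label="f")
for n, color in ((5, "blue"), (20, "red"), (100, "aqua"), (300, "black")):
    ys = [nn_operator(f, x, n, M) for x in xs]
    plt.plot(xs, ys, color=color, label=f"n = {n} (CNN)")
plt.legend()
plt.title("CNN operators approaching f as the index grows (cf. Figure 1)")
plt.show()
```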
Example 2. For the activation function defined in (1), , , , and the density function in (6), , and the “symmetrized” operators defined in (32), the convergence of the operators (blue), (red), (fuchsia), (aqua), (black) to (gold) is displayed in Figure 2. As seen in Figure 2 and Table 2 below, the approximations by the SNN operators clearly give better results than those of the CNN operators. We observe this in both the numerical results and the graphical visuals; in other words, they agree perfectly. The convergence speeds of the SNN operators to the test function under the regime of certain parameters are given below in Table 2.
Example 3. For the activation function defined in (1), , , and the density function in (2), , and the operators defined in (52), the convergence of the operators (blue), (red), (fuchsia), (aqua), (black) to (gold) is displayed in Figure 3. The convergence speeds of the CNN operators to the test function under the regime of certain parameters are given below in Table 3.
Example 4. For the activation function defined in (1), , , and the density function in (6), ; and the “symmetrized” operators defined in (32), the convergence of the operators (blue), (red), (fuchsia), (aqua), (black) to (gold) is displayed in Figure 4. The convergence speeds of the SNN operators to the test function under the regime of certain parameters are given below in Table 4.
Example 5. For the activation function defined in (1), , , and the density function in (2), , and the operators defined in (52), the convergence of the operators (blue), (red), (fuchsia), (aqua), (black) to (gold) is displayed in Figure 5. The convergence speeds of the CNN operators to the test function under the regime of certain parameters are given below in Table 5.
Example 6. For the activation function defined in (1), , , and the density function in (6), , and the “symmetrized” operators defined in (32), the convergence of the operators (blue), (red), (fuchsia), (aqua), (black) to (gold) is displayed in Figure 6. The convergence speeds of the SNN operators to the test function under the regime of certain parameters are given below in Table 6.
As a result of all the numerical examples, we conclude that the Difference value decreases as the values increase from 5 to 300, which shows us that the approximation is promising beyond a certain value, especially for the symmetrized versions of the NN operators.
5. Conclusions and Future Remarks
We believe that this study makes a two-pronged contribution to the literature. On the one hand, the theoretical background of symmetrized univariate neural network operators has been emphasized once again. On the other hand, it has been shown with concrete examples and graphs how these symmetrized neural network (SNN) operators achieve better approximation results under certain parameter choices compared with the classical ones.
Given the major role that artificial neural networks (ANNs) and computational methods play in the advancement of scientific research, it is of great importance for society that any new developments in this field be supported by mathematical consistency and robustness.
Furthermore, we have presented mathematical verification and comparison using the open-source and flexible programming language Python 3.9 to provide evidence of the advantage of the symmetrized structure. Examining other studies in the literature, we find broad agreement on the effectiveness and potentially game-changing role of symmetric structures in approximation theory. In addition, our findings underline the effectiveness of symmetric structures in neural network architectures and their potential to profoundly transform outputs in machine learning.
In a nutshell, we verified the basic theorems of ANN operators in approximation theory with numerical examples, compared classical ANN operators with their symmetrized, convergence-accelerated counterparts, and showed that the symmetric structure in particular provides superior results.
We anticipate that this theoretical background will guide the latest applications of future artificial intelligence (AI) subfields such as “Geometric Deep Learning”, “Graph Learning”, “Graph Neural Networks”, and “Quantum Neural Networks”, as well as machine learning theory, applied analysis, and computational mathematics.
Lastly, there is undoubtedly still a lot to be done with NN operators activated by other types of activation functions, such as half-hyperbolic tangent functions, sigmoidal functions, and ReLU, in a symmetrized manner. In addition, in Appendix A, we share the Python 3.9 code for the numerical applications: the speed of approximation of the CNN and SNN operators to certain functions and their 2D visualizations, constructed using the Python libraries NumPy 1.21.0, Matplotlib 3.8, and the standard math module.