Inverse Result of Approximation for the Max-Product Neural Network Operators of the Kantorovich Type and Their Saturation Order

In this paper, we consider the max-product neural network operators of the Kantorovich type based on certain linear combinations of sigmoidal and ReLU activation functions. In general, it is well known that max-product type operators have applications in problems related to probability and fuzzy theory, involving both real and interval/set valued functions. In particular, here we face inverse approximation problems for the above family of sub-linear operators. We first establish their saturation order for a certain class of functions; i.e., we show that if a continuous and non-decreasing function f can be approximated with a rate of convergence higher than 1/n, as n goes to +∞, then f must be a constant. Furthermore, we prove a local inverse theorem of approximation; i.e., assuming that f can be approximated with a rate of convergence of 1/n, then f turns out to be a Lipschitz continuous function.


Introduction
The introduction of the max-product version of families of linear approximation operators is due to Bede, Coroianu and Gal (see, e.g., [1,2]) and it led to a new branch of approximation theory. The new theory of max-product operators has been deeply studied, and recently the above-mentioned authors summarized their results in a complete monograph [3].
In general, the max-product version of a sequence/net of linear operators is a family of nonlinear (more precisely, sub-linear) operators with better approximation properties than the original version: in many cases, the order of convergence is faster than that of the linear counterparts [4–6]. Further, the above operators can also be useful, for instance, in the applications of probability and fuzzy theory involving both real and interval/set valued functions (see, e.g., [7,8]).
In the present paper, we study the max-product form of the neural network (NN) operators of the Kantorovich type $K_n^{(M)}$, first introduced in [9] and recalled here in Definition 3 of Section 2. In general, NN operators (see [10]) are strictly related to the theory of artificial neural networks, which was introduced in order to provide a very simple model of the human brain that is able to reproduce all its main abilities [11–14].
Each basic element composing a neural network is called an artificial neuron; its behavior is regulated by suitable activation functions, which must represent the two possible states of the biological neuron: the activation and the quiet phases [15]. From the mathematical point of view, functions which better represent the latter fact are those with sigmoidal shape (see, e.g., [16]).
For the above reasons, in the present paper, we mainly consider the operators $K_n^{(M)}$ activated by suitable sigmoidal functions. Very useful examples of sigmoidal functions (in view of their importance in learning algorithms [17]) are, e.g., the logistic and the hyperbolic tangent functions [18].
However, independently of its biological meaning, a new unbounded activation function has also been introduced and deeply investigated in some recent papers. This function is the so-called rectified linear unit (ReLU) function (see, e.g., [19]), and it is simply defined as the positive part of x, for every x ∈ R. The ReLU activation has proved to be very suitable for training deep (i.e., multi-layer) neural networks, in view of the very simple form assumed by its derivative (whenever it exists).
Here, we show that the above operators can also be based on a certain finite linear combination of ReLU activation functions, and in this case, the approximation properties of $K_n^{(M)}$ are also preserved. Problems of interpolation or, more generally, of approximation are related to the topic of training a neural network by sample values belonging to a certain training set: this explains the interest in studying approximation results by means of NN operators in various contexts [15,20–24].
Indeed, as can be seen from the references therein, results in this sense have been studied deeply in terms of various aspects, such as the convergence and the order of approximation.
In this paper, we deal with the problem of the saturation order and of inverse results of approximation.
In general, the problem of establishing the saturation order for a family of operators $L_n$, n ∈ N (see [25–28]), consists in determining a class of functions D, a certain subclass E of trivial functions of D, and a positive non-increasing function ϕ(n), n ∈ N, such that there exists g ∈ D \ E with $\|L_n g - g\| = O(\varphi(n))$, as n → +∞, and with the property that, for any f ∈ D with $\|L_n f - f\| = o(\varphi(n))$, as n → +∞, it turns out that f ∈ E, and vice versa. Here, ‖·‖ denotes any suitable norm on D.
In this case, ϕ(n) is said to be the saturation order of the approximation process $L_n$, and it represents the best possible order of approximation that can be achieved on D by the above approximation operators. In the case of the max-product NN operators of the Kantorovich type, according to the studies given in [9,29], we expect that, for a certain subclass D of C([0, 1]) (endowed with the usual max-norm), if f ∈ D satisfies $\|K_n^{(M)} f - f\|_\infty = o(\varphi(n))$, as n → +∞, with ϕ(n) = 1/n, n ∈ N, then f is constant over [0, 1]. Hence, we also have that the trivial class of functions E is given by constant functions. Indeed, one of the main results that we establish in the present paper is exactly the proof of the above claim.
Further, since in [9] it has been proved that, in the space Lip([0, 1]), the order of approximation is exactly 1/n, as n goes to +∞, it is natural to ask whether the converse implication also holds.
In this paper, we prove a local version of such an inverse approximation theorem; i.e., if f can be approximated with a rate of convergence of 1/n, as n → +∞, then f turns out to be a Lipschitz continuous function.
Furthermore, we will denote by Lip([0, 1]) the subspace of C([0, 1]) of the Lipschitz continuous functions on [0, 1], i.e., the space of functions f for which there exists a positive constant L such that $|f(x) - f(y)| \le L\,|x - y|$ for every x, y ∈ [0, 1]. Finally, we also denote by ‖·‖∞ the classical max-norm. Obviously, all the above notations can be given by replacing the interval [0, 1] with any bounded or unbounded interval I ⊂ R. We now recall the definition of a sigmoidal function introduced by Cybenko [30].
In what follows, we consider non-decreasing sigmoidal functions σ that satisfy the conditions (Σ1), (Σ2), and (Σ3). Examples of such functions are given by $\sigma_\ell(x) := (1 + e^{-x})^{-1}$, x ∈ R (i.e., the so-called logistic function), and by $\sigma_h(x) := (\tanh(x) + 1)/2$, x ∈ R (i.e., the so-called hyperbolic tangent activation function). Note that both $\sigma_\ell(x)$ and $\sigma_h(x)$ satisfy (Σ3) for all α > 0 in view of their exponential decay as x → −∞.
Further, we can also recall the sigmoidal functions generated by the well-known central B-splines $M_n$ of order n ∈ N⁺, whose definition involves the function $(x)_+ := \max\{x, 0\}$. The sigmoidal function generated by $M_n$ is denoted by $\sigma_{M_n}$. Obviously, $\sigma_{M_n}$ satisfies assumption (Σ1) for every n ≥ 1. Further, since the $M_n$ have compact supports, contained in the intervals $[-\frac{n}{2}, \frac{n}{2}]$, it turns out that $\sigma_{M_n}$ also satisfies (Σ3) for every α > 0. Further, assumption (Σ2) is satisfied for n ≥ 1.
Now, considering (from now on) a function σ that satisfies the above assumptions, we can recall the density (kernel) function $\phi_\sigma$ associated with σ. For the function $\phi_\sigma$, the following lemma can be proved.
For a proof of conditions (i)-(iv), see [31]. Now, we introduce the following notation used in the literature (see, e.g., [3]) in order to define the so-called max-product type operators.
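As a numerical illustration of the density function recalled above, the following minimal sketch assumes the definition that is usual in the NN-operator literature, namely $\phi_\sigma(x) = \frac{1}{2}[\sigma(x+1) - \sigma(x-1)]$; this formula is an assumption here, since its display is not reproduced in the present text. The function names (`logistic`, `tanh_sigmoid`, `density`, `integer_translate_sum`) are hypothetical. The sketch checks, for the logistic and hyperbolic tangent activations, that the kernel is even and that its integer translates sum to approximately 1, which follows from a telescoping argument.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh_sigmoid(x: float) -> float:
    """Hyperbolic tangent sigmoidal function (tanh(x) + 1) / 2."""
    return 0.5 * (math.tanh(x) + 1.0)


def density(sigma, x: float) -> float:
    """ASSUMED kernel: phi_sigma(x) = (sigma(x + 1) - sigma(x - 1)) / 2."""
    return 0.5 * (sigma(x + 1.0) - sigma(x - 1.0))


def integer_translate_sum(sigma, x: float, half_width: int = 60) -> float:
    """Truncated sum of phi_sigma(x - k) over k = -half_width, ..., half_width."""
    return sum(density(sigma, x - k) for k in range(-half_width, half_width + 1))


if __name__ == "__main__":
    for name, sigma in (("logistic", logistic), ("tanh", tanh_sigmoid)):
        print(name, density(sigma, 0.0), integer_translate_sum(sigma, 0.3))
```

The truncation parameter `half_width` is only a numerical convenience: both activations approach their limits exponentially fast, so the neglected tail terms are far below machine precision.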
Now, we recall the following lemma that will be useful in order to show that the family of operators investigated in this paper are well-defined.
The definition of the max-product NN operators of the Kantorovich type can now be recalled.
Definition 3. Let f : [0, 1] → R be a bounded and locally integrable function and let n ∈ N⁺. The max-product NN operators of the Kantorovich type activated by σ are denoted by $K_n^{(M)}(f, x)$.
Clearly, in view of the properties established in Lemma 2, it turns out that $K_n^{(M)}(f, x)$ are well-defined.
Concerning the assumptions (Σ1), (Σ2), and (Σ3) above, we can observe that condition (Σ2) could be avoided by requiring that the sigmoidal function σ is such that σ(3) > σ(1) and that condition (iii) of Lemma 1 is fulfilled by $\phi_\sigma$. The main advantage of the latter fact is that one could apply all the approximation results established below also to discontinuous and non-smooth sigmoidal functions.
An example of a continuous but non-smooth sigmoidal function (given according to the above remark) is the so-called ramp function $\sigma_R$ (see [12]). In particular, $\sigma_R$ satisfies condition (Σ3) for all α > 0, and $\phi_{\sigma_R}$ turns out to be a function with compact support; moreover, $\sigma_R(3) > \sigma_R(1)$.
Note that (see [27]) the sigmoidal function $\sigma_{M_1}(3\,\cdot)$ coincides with the ramp function $\sigma_R$. Now, recalling the definition of the well-known rectified linear unit (ReLU) activation function (see, e.g., [19,32]), $\psi_{ReLU}(x) := (x)_+$, x ∈ R, it turns out that $\sigma_R$ can be written as a linear combination of ReLU activation functions. Thus, the density function $\phi_{\sigma_{M_1}(3\cdot)}$ can also be expressed in terms of the ReLU activation function. As a consequence of the latter relation, the NN operators $K_n^{(M)}$ activated by $\sigma_{M_1}(3\,\cdot)$ can be considered as an NN activated by the above linear combination of ReLU activation functions. Recently, it has been proved that $\psi_{ReLU}$ is very suitable for training deep (i.e., multi-layer) neural networks; see, e.g., [33,34]. For more details concerning $\psi_{ReLU}$, see also [19,35].
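The operators of Definition 3 can be explored numerically. The sketch below is not taken from the present text: it assumes the form commonly used for max-product Kantorovich NN operators on [0, 1], i.e., the maximum over k = 0, ..., n − 1 of the local mean $n \int_{k/n}^{(k+1)/n} f(t)\,dt$ weighted by $\phi_\sigma(nx - k)$, divided by the maximum of the weights, with the assumed kernel $\phi_\sigma(x) = \frac{1}{2}[\sigma(x+1) - \sigma(x-1)]$ and the logistic activation. The local means are approximated by a midpoint rule, and all function names are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def density(x: float) -> float:
    """ASSUMED kernel: (sigma(x + 1) - sigma(x - 1)) / 2 with logistic sigma."""
    return 0.5 * (logistic(x + 1.0) - logistic(x - 1.0))


def local_mean(f, k: int, n: int, m: int = 64) -> float:
    """Midpoint-rule approximation of n * integral of f over [k/n, (k+1)/n]."""
    h = 1.0 / (n * m)
    return sum(f(k / n + (i + 0.5) * h) for i in range(m)) / m


def kantorovich_max_product(f, n: int, x: float) -> float:
    """ASSUMED operator form: max_k(mean_k * phi(nx - k)) / max_k(phi(nx - k)).

    Intended for non-negative f on [0, 1] and x in [0, 1].
    """
    weights = [density(n * x - k) for k in range(n)]
    means = [local_mean(f, k, n) for k in range(n)]
    numerator = max(w * a for w, a in zip(weights, means))
    return numerator / max(weights)


if __name__ == "__main__":
    for n in (10, 50, 250):
        print(n, kantorovich_max_product(lambda t: t * t, n, 0.5))
```

Under this assumed form, constants are reproduced exactly, because the maximum in the numerator is then a constant multiple of the maximum in the denominator.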
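The link between ramp-type sigmoidal functions and the ReLU activation can be checked with a short sketch. Since the explicit formula of the ramp function is not reproduced above, the code works with a generic ramp that rises linearly from 0 at x = −c to 1 at x = c; the half-width `c` and all function names are hypothetical, and the kernel $\frac{1}{2}[\sigma(x+1) - \sigma(x-1)]$ is again an assumption. The sketch verifies that such a ramp is a difference of two ReLU terms, and that the corresponding kernel is a combination of four ReLU terms with compact support.

```python
def relu(x: float) -> float:
    """Rectified linear unit: the positive part of x."""
    return max(x, 0.0)


def ramp(x: float, c: float) -> float:
    """Generic ramp: 0 for x <= -c, linear on [-c, c], 1 for x >= c (c is hypothetical)."""
    return min(max((x + c) / (2.0 * c), 0.0), 1.0)


def ramp_from_relu(x: float, c: float) -> float:
    """The same ramp written as a linear combination of two ReLU terms."""
    return (relu(x + c) - relu(x - c)) / (2.0 * c)


def density_from_relu(x: float, c: float) -> float:
    """ASSUMED kernel (ramp(x + 1) - ramp(x - 1)) / 2 written with four ReLU terms."""
    return (
        relu(x + 1.0 + c) - relu(x + 1.0 - c) - relu(x - 1.0 + c) + relu(x - 1.0 - c)
    ) / (4.0 * c)


if __name__ == "__main__":
    for x in (-2.0, -0.25, 0.0, 0.25, 2.0):
        print(x, ramp(x, 0.5), ramp_from_relu(x, 0.5), density_from_relu(x, 0.5))
```

The kernel obtained in this way vanishes for |x| ≥ 1 + c, which matches the compact-support property mentioned for the density generated by the ramp function.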

The Saturation Order
It is well known that, if f ∈ C₊([0, 1]), the family $K_n^{(M)}(f, \cdot)$ converges uniformly to f (see [9]). Moreover, we also know that a quantitative estimate for the error of approximation holds, as n → +∞, with a constant M > 0, if condition (Σ3) is satisfied for α ≥ 1; the estimate is expressed in terms of $\omega(f, \delta) := \sup\{|f(x) - f(y)| : x, y \in [0, 1],\ |x - y| \le \delta\}$, δ > 0, which denotes the usual modulus of continuity of the function f ∈ C₊([0, 1]) (see, e.g., [36]). From the latter result, it turns out that, if the function f belongs to Lip([0, 1]), then the order of uniform approximation is 1/n, as n → +∞.
In this section, we study the saturation order for the NN operators of the Kantorovich type activated by sigmoidal functions; i.e., we show that 1/n is the best possible order of approximation that can be achieved for non-decreasing functions that belong to C₊([0, 1]).
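The modulus of continuity used in the quantitative estimate can be approximated on a uniform grid. The following sketch is only an illustration: the function name `modulus_of_continuity` and the grid size are hypothetical choices, and the supremum is replaced by a maximum over grid points, so the result is a lower approximation of ω(f, δ).

```python
import math


def modulus_of_continuity(f, delta: float, a: float = 0.0, b: float = 1.0,
                          grid: int = 1000) -> float:
    """Grid approximation of sup{|f(x) - f(y)| : x, y in [a, b], |x - y| <= delta}."""
    xs = [a + (b - a) * i / grid for i in range(grid + 1)]
    values = [f(x) for x in xs]
    # Largest index gap whose node distance does not exceed delta
    # (a tiny tolerance guards against floating-point rounding).
    max_shift = int(math.floor(delta * grid / (b - a) + 1e-9))
    best = 0.0
    for i in range(grid + 1):
        for j in range(i + 1, min(i + max_shift, grid) + 1):
            best = max(best, abs(values[j] - values[i]))
    return best


if __name__ == "__main__":
    for delta in (0.1, 0.01):
        print(delta,
              modulus_of_continuity(lambda t: t, delta),
              modulus_of_continuity(math.sqrt, delta))
```

For a Lipschitz function such as f(x) = x the computed values scale linearly with δ, whereas for the square root they scale like the square root of δ, which illustrates why the Lipschitz class is the natural setting for the rate 1/n.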
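The saturation phenomenon can be observed numerically for a non-constant function. The sketch below is an illustration under stated assumptions, not the construction of the present paper: it uses the assumed operator form (maximum of local means weighted by the kernel, divided by the maximum of the weights), the assumed kernel $\frac{1}{2}[\sigma(x+1) - \sigma(x-1)]$ with the logistic activation, and a grid approximation of the max-norm. All function names are hypothetical. For f(x) = x, it prints the approximate sup-error and the product n · error, which does not tend to zero.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def density(x: float) -> float:
    """ASSUMED kernel: (sigma(x + 1) - sigma(x - 1)) / 2 with logistic sigma."""
    return 0.5 * (logistic(x + 1.0) - logistic(x - 1.0))


def local_means(f, n: int, m: int = 16) -> list:
    """Midpoint-rule approximations of n * integral of f over [k/n, (k+1)/n]."""
    h = 1.0 / (n * m)
    return [sum(f(k / n + (i + 0.5) * h) for i in range(m)) / m for k in range(n)]


def sup_error(f, n: int, grid: int = 200) -> float:
    """Grid approximation of the max-norm error of the ASSUMED operator on [0, 1]."""
    means = local_means(f, n)
    worst = 0.0
    for i in range(grid + 1):
        x = i / grid
        weights = [density(n * x - k) for k in range(n)]
        value = max(w * a for w, a in zip(weights, means)) / max(weights)
        worst = max(worst, abs(value - f(x)))
    return worst


def identity(t: float) -> float:
    return t


if __name__ == "__main__":
    for n in (10, 20, 40):
        err = sup_error(identity, n)
        print(n, err, n * err)
```

A direct computation at x = 0 under the same assumptions already shows that n times the error stays above 1 for these values of n, which is consistent with 1/n being the best possible order for non-constant functions.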
In order to reach our main purpose, we need some preliminary lemmas.

Lemma 4.
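Let I ⊆ R be a bounded or unbounded interval and f ∈ C(I). Suppose in addition that there exists an absolute positive constant C with the property that, for every ε > 0, there exists n(ε) ∈ N⁺ such that, for any n ∈ N, n ≥ n(ε), and j ∈ Z with $\frac{j}{n}, \frac{j+1}{n} \in I$, we have $|f(\frac{j+1}{n}) - f(\frac{j}{n})| \le C\varepsilon/n$ (relation (4)). Then f is a constant function.
Proof. Let us choose arbitrary $x_0, y_0 \in I$, $x_0 < y_0$, and ε > 0. The continuity of f implies the existence of $n_0(\varepsilon)$ ∈ N⁺ such that, for any x, y ∈ I with $|x - x_0| \le \frac{1}{n_0(\varepsilon)}$ and $|y - y_0| \le \frac{1}{n_0(\varepsilon)}$, the values f(x) and f(y) are ε-close to $f(x_0)$ and $f(y_0)$, respectively. We now fix $n_1 = \max\{n(\varepsilon), n_0(\varepsilon), \frac{1}{y_0 - x_0}\}$, where n(ε) is the constant arising from the assumptions, and let us choose arbitrary n ∈ N such that $n \ge n_1$. Since $\frac{1}{n_1} \le y_0 - x_0$, it follows that there exist k ∈ Z and l ∈ N such that the nodes $\frac{k}{n}$ and $\frac{k+l}{n}$ are suitably close to $x_0$ and $y_0$, respectively. Applying successively the triangle inequality, $|f(x_0) - f(y_0)|$ can be estimated by the two differences at the endpoints plus the sum of the l increments of f between consecutive nodes. By relation (4), each of these increments is at most Cε/n. On the other hand, we observe that $\frac{k+l}{n} - \frac{k}{n} \le y_0 - x_0$, which implies that $l \le n(y_0 - x_0)$. Thus, we obtain $|f(x_0) - f(y_0)| \le 2\varepsilon + C\varepsilon\,(y_0 - x_0)$. Now, since ε > 0 has been chosen arbitrarily, passing to the infimum for ε > 0 in the previous inequality, we deduce that $f(x_0) = f(y_0)$. By the arbitrariness of $x_0$ and $y_0$, it turns out that f is a constant function on the whole I.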

Lemma 5.
Let I ⊆ R be a bounded or unbounded interval and f ∈ C(I) be a non-decreasing function with the property that, for any couple a, b, a < b, of inner points of I, and for every ε > 0, there exists n(a, b, ε) ∈ N⁺ such that, for any n ∈ N, n ≥ n(a, b, ε), and j ∈ Z such that $\frac{j+1}{n}, \frac{2j+1}{2n} \in [a, b]$, we have $f(\frac{j+1}{n}) - f(\frac{2j+1}{2n}) \le \varepsilon/n$ (relation (5)). Then f is a constant function over I.
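Proof. Let n ∈ N, n ≥ n(a, b, ε), be such that $\frac{1}{n} \le b - a$, and let us choose arbitrary j ∈ Z such that $a \le \frac{j}{n} \le \frac{j+1}{n} \le b$. We observe that, for any k ∈ N⁺, relation (5) can also be applied with j and n replaced by $2^k j$ and $2^k n$, respectively. Therefore, applying successively relation (5) with $j := 2^k j$ and $n := 2^k n$, we obtain a chain of inequalities for the increments of f along the dyadic refinements of the node $\frac{j}{n}$.
Taking, respectively, the sums of all the terms in the first and second parts of the previous inequalities, and since f is non-decreasing, we get an estimate for $|f(\frac{j+1}{n}) - f(\frac{j}{n})|$ of the form required in Lemma 4. By Lemma 4 it follows that f is constant in [a, b]. Since a and b are two arbitrary inner points of I and f is continuous, it easily results that f is constant in I.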
Note that Lemma 4 and Lemma 5 can also be derived from the monograph [3]. Now we can prove the main theorem of this section.
Proof. Let us choose arbitrary a, b ∈ (0, 1), a < b. Further, let n ∈ N⁺ be sufficiently large such that $\frac{1}{n} < b - a$ and $b < 1 - \frac{1}{2n}$. Now, we fix j ∈ {0, 1, . . . , n − 2} such that $\frac{j+1}{n}, \frac{2j+1}{2n} \in [a, b]$. By Lemma 3, and recalling that f is non-decreasing, suitable estimates for $K_n^{(M)}(f, \cdot)$ follow. Then, we can prove that for every ε > 0 there exists n(ε) ∈ N⁺ such that, for any n ∈ N⁺ with n ≥ n(ε) and j ∈ Z such that $\frac{j+1}{n}, \frac{2j+1}{2n} \in [a, b]$, the assumption of Lemma 5 is satisfied, and hence the proof follows by Lemma 5.
(as made in [9]) and then working with continuous and non-decreasing f : [a, b] → R⁺.

Local Inverse Result
The main aim of this section is to prove an inverse theorem of approximation. We will use a strategy similar to that presented in the previous section.
Lemma 6. Let I ⊆ R be a bounded or unbounded interval and f ∈ C(I). Suppose in addition that there exists an absolute positive constant C such that, for any sufficiently large n ∈ N and every j ∈ Z with $\frac{j}{n}, \frac{j+1}{n} \in I$, we have $|f(\frac{j+1}{n}) - f(\frac{j}{n})| \le C/n$. Then f ∈ Lip(I).
Proof. Let us choose arbitrary $x_0, y_0 \in I$, $x_0 < y_0$. Moreover, let ε > 0 with $\varepsilon < y_0 - x_0$. The continuity of f implies the existence of $n_0(\varepsilon)$ ∈ N⁺ such that, for any x, y ∈ I with $|x - x_0| \le \frac{1}{n_0(\varepsilon)}$ and $|y - y_0| \le \frac{1}{n_0(\varepsilon)}$, the values f(x) and f(y) are ε-close to $f(x_0)$ and $f(y_0)$, respectively. We now fix $n_1 = \max\{n_0(\varepsilon), \frac{1}{y_0 - x_0}\}$, and let us choose arbitrary n ∈ N with $n \ge n_1$. Proceeding as in the proof of Lemma 4, we get $|f(x_0) - f(y_0)| \le 2\varepsilon + C\,(y_0 - x_0)$, and since $\varepsilon < y_0 - x_0$, we get $|f(x_0) - f(y_0)| \le (C + 2)(y_0 - x_0)$. Then the thesis follows by the arbitrariness of $x_0$ and $y_0$.

Lemma 7.
Let I ⊆ R be a bounded or unbounded interval and f ∈ C(I) be a non-decreasing function with the property that, for any couple a, b, a < b, of inner points of I, there exists a C > 0 such that, for every sufficiently large n ∈ N and j ∈ Z with $\frac{j+1}{n}, \frac{2j+1}{2n} \in [a, b]$, we have $f(\frac{j+1}{n}) - f(\frac{2j+1}{2n}) \le C/n$. Then f ∈ Lip([a, b]).
Proof. Let n ∈ N be sufficiently large such that $\frac{1}{n} \le b - a$, and let us choose arbitrary j ∈ Z such that $a \le \frac{j}{n} \le \frac{j+1}{n} \le b$. Arguing as in Lemma 5, we get an estimate for the increments of f along the dyadic refinements of the node $\frac{j}{n}$. Taking the limit as k → +∞, and since f is non-decreasing, we obtain a bound of order 1/n for $|f(\frac{j+1}{n}) - f(\frac{j}{n})|$. Hence, by Lemma 6, it follows that f ∈ Lip([a, b]).
For results similar to those of Lemma 6 and Lemma 7 (only in the case of bounded intervals), one can see, e.g., [3]. Now we can finally prove a (local) inverse theorem of approximation.

Data Availability Statement:
The study did not report any data.