A Neural Network Approximation Based on a Parametric Sigmoidal Function

Abstract: It is well known that feed-forward neural networks can approximate functions when equipped with an appropriate activation function. In this paper, employing a new sigmoidal function with a parameter as an activation function, we consider a constructive feed-forward neural network approximation on a closed interval. The developed approximation method takes the simple form of a superposition of the parametric sigmoidal function. It is shown that the proposed method is very effective in the approximation of discontinuous functions as well as continuous ones. For some examples, the effectiveness of the presented method is demonstrated by comparing its numerical results with those of an existing neural network approximation method. Furthermore, the efficiency of the method in extended application to multivariate functions is also illustrated.


Introduction
Cybenko [1] and Funahashi [2] proved that any continuous function can be uniformly approximated on a compact set I ⊂ R^n by feed-forward neural networks (FNN) of the form

∑_{k=1}^{N} α_k σ(ω_k · x + θ_k),

where σ is called an activation function, ω_k ∈ R^n are weights, θ_k ∈ R are thresholds, and α_k ∈ R are coefficients. This is called the universal approximation theorem. Moreover, Hornik et al. [3] showed that any measurable function can be approximated on a compact set by an FNN of this form. Some constructive approximation methods by the FNN were developed in the literature [4-7].
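As a small illustration of this superposition form, the network can be evaluated directly in one dimension with the logistic activation; the particular weights, thresholds and coefficients below are arbitrary illustrative values, not taken from the paper.

```python
import numpy as np

def logistic(t):
    """The logistic sigmoid 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def fnn(x, alphas, omegas, thetas):
    """Evaluate the superposition sum_k alpha_k * sigma(omega_k * x + theta_k)."""
    x = np.asarray(x, dtype=float)
    return sum(a * logistic(w * x + t) for a, w, t in zip(alphas, omegas, thetas))

# Example: two steep logistics combine into a crude bump on [0, 1],
# near 1 around x = 0.5 and near 0 at the end-points.
x = np.linspace(0.0, 1.0, 5)
y = fnn(x, alphas=[1.0, -1.0], omegas=[50.0, 50.0], thetas=[-10.0, -40.0])
```

Even this two-neuron example shows how sums of shifted sigmoids can localize: the difference of two steep sigmoids is a plateau, which is the basic mechanism behind the constructive methods discussed below.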
Other examples of function approximation by the FNN can be found in the works of Cao et al. [8], Chui and Li [9], Ferrari and Stengel [10], and Suzuki [11]. In particular, the activation function σ is a basic architectural component of a neural network because it introduces non-linearity into the network; this is what allows artificial neural networks to learn complicated non-linear mappings from inputs to outputs.
In this paper, aiming at efficient approximation of data obtained from continuous or discontinuous functions on a closed interval, we develop a feed-forward neural network approximation method based on a sigmoidal activation function. First, in the following section, we propose a parametric sigmoidal function σ^[m] of the form (6) as an activation function. In Section 3 we construct an approximation formula S^[m]_N f(x) in (19) based on the proposed sigmoidal function σ^[m]. It is shown that S^[m]_N f(x) reproduces each given data value with error O(δ^m), 0 < δ < 1, for the parameter m large enough. This implies the so-called quasi-interpolation property of the presented FNN approximation.
Mathematics 2019, 7, 262; doi:10.3390/math7030262; www.mdpi.com/journal/mathematics

Furthermore, in order to reduce the interpolation errors near the end-points of the given interval, a correction formula (27) is introduced in Section 4. The efficiency of the presented FNN approximation is demonstrated by numerical results for data sets extracted from continuous and discontinuous functions. Here efficiency means that the proposed method requires fewer neurons to reach similar or lower error levels than the compared FNN approximation method using the conventional logistic function.
In addition, an extended FNN approximation formula for functions of two variables is proposed in Section 5, with some numerical examples showing the superiority of the presented FNN approximation method.

A Parametric Sigmoidal Function
The role of the activation function in artificial neural networks is to introduce non-linearity of the input data into the output of the network. One of the activation functions commonly used in practice is the sigmoidal function σ, which satisfies lim_{x→−∞} σ(x) = 0 and lim_{x→+∞} σ(x) = 1.
For example, two traditional sigmoidal functions are
(i) the Heaviside function: σ_H(x) = 0 for x < 0 and σ_H(x) = 1 for x ≥ 0;
(ii) the logistic function: σ_L(x) = 1/(1 + e^{−x}).
We recall the following approximation theorem from the literature [6].
Theorem 1. (Costarelli and Spigler [6]) For a bounded sigmoidal function σ and a function f ∈ C[a, b], let G_N f be a neural network approximation to f of the form (5). Then for every ε > 0 there exist an integer N > 0 and a real number ω > 0 such that |G_N f(x) − f(x)| < ε for all x ∈ [a, b].

Sigmoidal functions have been used in various applications including artificial neural networks (see the literature [12-17]). In this work we employ an algebraic-type sigmoidal function σ^[m], containing a parameter m > 0, as follows.
for a fixed L > 0. This function has the following properties.
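Since the display (6) did not survive into this text, the sketch below uses an algebraic sigmoid that is consistent with the stated properties (increasing, value 1/2 at the origin, steepening toward a unit step as m grows); the exact formula is an assumption for illustration, not a quotation of the paper's definition.

```python
import numpy as np

def sigma_m(x, m, L):
    """An assumed algebraic sigmoid with parameter m > 0:
        sigma(x) = (L + x)^m / ((L + x)^m + (L - x)^m)  for |x| < L,
    extended by 0 for x <= -L and by 1 for x >= L.
    It is increasing, sigma(0) = 1/2, and it tends to the unit step
    function pointwise as m -> infinity."""
    x = np.asarray(x, dtype=float)
    out = np.where(x >= L, 1.0, 0.0)   # 0 to the left of -L, 1 to the right of L
    inside = np.abs(x) < L
    p = (L + x[inside]) ** m
    q = (L - x[inside]) ** m
    out[inside] = p / (p + q)
    return out

# Sample the sigmoid on [-1, 1] for a moderate parameter value.
vals = sigma_m(np.linspace(-1.0, 1.0, 201), m=4, L=1.0)
```

Raising m sharpens the transition around 0 while keeping the range inside [0, 1], which is the behaviour the properties (A1)-(A3) below rely on.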

Constructing a Neural Network Approximation
Suppose that for a real-valued function f(x), a ≤ x ≤ b, a set of data {(x_k, f(x_k))} is given, where N ≥ 2 is an integer and the x_k are nodes on the interval [a, b]. For simplicity, we assume equally spaced nodes with spacing h = L/N, L = b − a. We can observe that, for m sufficiently large, the function σ^[m] with L = b − a in (6) satisfies (13), owing to the property (A2). Moreover, noting that σ^[m] is an increasing function as mentioned in (A1), and using the property (A3), we obtain (17) and (18). To find a lower bound of the parameter m we set

L · ((N − 1)^{1/m} − 1) / ((N − 1)^{1/m} + 1) = h (= L/N).

Solving this equation yields the lower bound m = m*,

m* = log(N − 1) / log((N + 1)/(N − 1)),

and for every m > m* the left-hand side above is less than h. The lower bound m* given in (16) will be used as a threshold for the parameter m in the numerical implementation of the proposed neural network approximation later.
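The lower bound can be computed in closed form. Assuming the threshold equation reads L · ((N−1)^{1/m} − 1)/((N−1)^{1/m} + 1) = h = L/N (a reconstruction of the garbled display, in which L cancels out), rearranging gives (N−1)^{1/m} = (N+1)/(N−1), and hence the sketch below.

```python
import math

def m_star(N):
    """Lower bound m* of the parameter m, obtained by solving
        L * ((N-1)**(1/m) - 1) / ((N-1)**(1/m) + 1) = h  (= L/N)
    for m.  Rearranging yields (N-1)**(1/m) = (N+1)/(N-1), hence
        m* = log(N-1) / log((N+1)/(N-1))."""
    return math.log(N - 1) / math.log((N + 1) / (N - 1))

# Check that m* satisfies the defining equation: at m = m* the
# left-hand side divided by L equals h/L = 1/N.
N = 10
t = (N - 1) ** (1.0 / m_star(N))
lhs = (t - 1.0) / (t + 1.0)
```

For N = 10 this gives m* ≈ 10.95, which is consistent with the later choices m = 2N and m = 4N lying above the threshold.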
Referring to the above features of σ^[m] in (13), (17) and (18), we propose a superposition of σ^[m], denoted S^[m]_N f(x), to approximate the given data as in (19). Theorem 2 states that the error estimate (20) holds at the interior nodes for some ξ_j ∈ (x_{j−1}, x_{j+1}), 0 < δ < 1 and a constant C_j, and that analogous estimates hold at the end-points.

Proof. Since σ^[m] is an increasing function and it satisfies the asymptotic behaviour in (8), for each 1 ≤ j ≤ N − 1 with m large enough we obtain the stated expansion, where the second equation results from the relation for σ^[m] based on the property (A3). Denoting by ∆ and ∆² the first and the second forward difference operators, respectively, and using the function θ(t) defined in (9), we arrive at the formula (20).

On the other hand, for x = x_0 and m large enough a similar expansion holds. Since ∆f_0 = f′(ξ_0) h for some ξ_0 ∈ (x_0, x_1) by the mean value theorem, we obtain the estimate at x_0 with a constant C_0. For x = x_N and m large enough the analogous estimate holds with a constant C_N. Thus the proof is completed.
Theorem 2 implies that, for N fixed (i.e., h fixed), the approximation errors of S^[m]_N f(x) at the nodes can be reduced arbitrarily by increasing the value of the parameter m.

The sum S^[m]_N f(x) in (19) can be rewritten as (22). Using a function ψ^[m], defined as in (24) with L = b − a and satisfying 0 ≤ ψ^[m](t) ≤ 1 for all t, we may rewrite it in the form (23). The formula (23) is a form of the feed-forward neural network based on the activation function ψ^[m], with constant weights w_k = 1 and thresholds x_k.
Under the assumption that m is large enough, the proposed quasi-interpolation S^[m]_N f(x) in (23) has the following properties. Graphs of the activation functions σ^[m]_k in Figure 1 illustrate the intuition behind the construction of the presented quasi-interpolation S^[m]_N f(x); Figure 2 includes the graphs of ψ^[m]_k(x).

It is well known that interpolants for continuous functions are guaranteed to be good if and only if the Lebesgue constants are small [15]. Regarding the formula (25) as an interpolation with equispaced points {x_k}, k = 0, 1, …, N, its Lebesgue function satisfies λ_N(x) ≈ 1 for all x, and thus the corresponding Lebesgue constant becomes Λ_N = ‖λ_N‖_∞ ≈ 1. Noting that for polynomial interpolation at equispaced points the Lebesgue constant grows exponentially, Λ_N ∼ 2^{N+1}/(e N log N) as N → ∞, we may expect that S^[m]_N f(x) will be better than polynomial interpolation in approximating any continuous function, at least.
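The polynomial side of this comparison is easy to verify: the Lebesgue constant of equispaced polynomial interpolation, computed below by brute force from the Lagrange basis on a dense grid, grows very rapidly with N, consistent with the quoted asymptotic 2^{N+1}/(eN log N).

```python
import numpy as np

def lebesgue_const(nodes, num=4001):
    """Approximate max_x sum_k |l_k(x)| for polynomial interpolation at the
    given nodes, where l_k are the Lagrange basis polynomials; the maximum
    is taken over a dense uniform grid."""
    xs = np.linspace(nodes[0], nodes[-1], num)
    lam = np.zeros_like(xs)
    for k, xk in enumerate(nodes):
        others = np.delete(nodes, k)
        # Lagrange basis polynomial l_k evaluated on the whole grid at once
        lk = np.prod((xs[:, None] - others) / (xk - others), axis=1)
        lam += np.abs(lk)
    return lam.max()

lam5 = lebesgue_const(np.linspace(-1.0, 1.0, 6))    # N = 5
lam10 = lebesgue_const(np.linspace(-1.0, 1.0, 11))  # N = 10
lam20 = lebesgue_const(np.linspace(-1.0, 1.0, 21))  # N = 20
```

Already at N = 20 the constant is in the thousands, whereas a bounded partition-of-unity basis with 0 ≤ ψ_k ≤ 1 keeps the corresponding quantity near 1.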

Correction Formula
In order to reduce the interpolation errors near the end-points of the given interval, that is, to make the formula (20) in Theorem 2 hold for all 0 ≤ j ≤ N, we employ two additional data values near the end-points. Using these additional data, we define a correction formula (27) of the approximation (23). To explore the effectiveness of the proposed approximation method (27), we consider the following examples, which were employed in the literature [6].
We compare the results of the presented method with those of the existing neural network approximation method (5) using the activation function σ = σ_L in (4). In the literature [6], it was proved that Theorem 1 holds if the weight ω is chosen suitably. In practice, we have used ω = N²/L in the implementation of the existing FNN G_N f(x) in (5) for the examples above. The high-level software Mathematica (V.10) has been used as the programming tool throughout the numerical experiments for the examples.

For the smooth function f_1 in Example 1, approximations of the proposed FNN S^[m]_N f_1(x) with a small number of neurons (N = 10) are shown in Figure 3 for each parameter value m = 10, 15, 20, 30. The higher the value of m, the more clearly S^[m]_N f_1(x) reveals the quasi-interpolation property shown in Theorem 2. Moreover, Figure 4 shows errors of S^[m]_N f_1(x) with m = 2N > m*, for m* the lower bound of m given in (16), compared with errors of G_N f_1(x) for N = 10, 20, 30, …, 80.

Approximations of S^[m]_N f_2(x) for various values N = 10, 20, 40, 80, with m = 4N, are given in Figure 6. One can see that the results of the presented method S^[m]_N f_2(x) are better than those of G_N f_2(x) shown in Figure 7. On the other hand, it is noted that the FNN approximations are free from the so-called Gibbs phenomenon, which generates wiggles (i.e., overshoots and undershoots) near a jump-discontinuity and which appears inevitably in partial-sum approximations composed of polynomial or trigonometric basis functions in general.
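The absence of overshoot can be made plausible with a sketch of the same assumed difference-form quasi-interpolant (again, the sigmoid formula and the midpoint shifts are illustrative assumptions, not the paper's displays): for data with a single jump, the approximant reduces to one shifted sigmoid and therefore stays between the two levels.

```python
import numpy as np

def sigma_m(x, m, L):
    # Assumed algebraic sigmoid (illustrative choice); its values lie in [0, 1].
    x = np.asarray(x, dtype=float)
    out = np.where(x >= L, 1.0, 0.0)
    inside = np.abs(x) < L
    p = (L + x[inside]) ** m
    q = (L - x[inside]) ** m
    out[inside] = p / (p + q)
    return out

def quasi_interp(x, nodes, fvals, m):
    # S f(x) = f_0 + sum_k (f_k - f_{k-1}) * sigma_m(x - midpoint_k)
    L = nodes[-1] - nodes[0]
    S = np.full_like(np.asarray(x, dtype=float), fvals[0])
    for k in range(1, len(nodes)):
        c = 0.5 * (nodes[k - 1] + nodes[k])
        S = S + (fvals[k] - fvals[k - 1]) * sigma_m(x - c, m, L)
    return S

nodes = np.linspace(0.0, 1.0, 41)      # N = 40
f = (nodes >= 0.5).astype(float)       # unit step: jump at x = 0.5
xs = np.linspace(0.0, 1.0, 2001)
S = quasi_interp(xs, nodes, f, m=160)  # m = 4N, as used for f_2
```

Since only one difference f_k − f_{k−1} is non-zero, S is a single sigmoid in [0, 1]: there is no overshoot or undershoot anywhere, in contrast to polynomial or trigonometric partial sums.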

Multivariate Approximation
For simplicity we consider a function of two variables g(x, y) on a region [a, b] × [c, d] ⊂ R², and assume that a set of data {g_ij = g(x_i, y_j)}, 0 ≤ i, j ≤ N, is given for equally spaced nodes, where σ^[m] is the parametric sigmoidal function in (6). Then, referring to the formula (25) under the assumption that m is large enough, we define an extended version of the FNN approximation to g as in (31). To test the efficiency of the presented method (31), we choose the functions of two variables below. In the numerical implementation for these examples the software gnuplot (V.5) was used, as it is rather fast for evaluating and graphing on a two-dimensional region.
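A tensor-product sketch of this two-variable extension could look as follows. The displays (24), (25) and (31) are not reproduced in this text, so the constructions below are illustrative assumptions: an assumed algebraic sigmoid, and bump functions ψ_k built as differences of midpoint-shifted sigmoids, which telescope to a partition of unity.

```python
import numpy as np

def sigma_m(x, m, L):
    # Assumed algebraic sigmoid (an illustrative choice, not the paper's (6)).
    x = np.asarray(x, dtype=float)
    out = np.where(x >= L, 1.0, 0.0)
    inside = np.abs(x) < L
    p = (L + x[inside]) ** m
    q = (L - x[inside]) ** m
    out[inside] = p / (p + q)
    return out

def psi_basis(x, nodes, m):
    """Bump functions psi_k, k = 0..N, as differences of shifted sigmoids;
    they sum to 1 (the sum telescopes) and concentrate near x_k as m grows."""
    L = nodes[-1] - nodes[0]
    c = 0.5 * (nodes[:-1] + nodes[1:])            # midpoints of the subintervals
    s = [sigma_m(x - ck, m, L) for ck in c]
    psis = [1.0 - s[0]]
    psis += [s[k - 1] - s[k] for k in range(1, len(c))]
    psis.append(s[-1])
    return np.stack(psis)                         # shape (N+1, len(x))

def tensor_quasi_interp(X, Y, nodes, G, m):
    """Tensor-product form S g(x, y) = sum_i sum_j g_ij psi_i(x) psi_j(y)."""
    return psi_basis(X, nodes, m).T @ G @ psi_basis(Y, nodes, m)

# Check reproduction of the data at the grid nodes for large m.
nodes = np.linspace(0.0, 1.0, 11)                 # N = 10
G = np.sin(np.pi * nodes)[:, None] * np.cos(np.pi * nodes)[None, :]
S = tensor_quasi_interp(nodes, nodes, nodes, G, m=120)
```

The one-dimensional basis is simply applied in each variable, so the quasi-interpolation property carries over to the grid nodes of the rectangle.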
Figure 8 shows the approximations of the presented method S^[m]_N g_i(x, y) to the test functions g_i(x, y), i = 1, 2, for N = 30 with m = 120. We can see that S^[m]_N g_i(x, y) approximates g_i(x, y) properly over the whole region, while the existing method G_N g_i(x, y) given in the literature [6] produces considerable errors, as shown in Figure 9.

Conclusions
In this work we proposed an FNN approximation method based on a new parametric sigmoidal activation function σ^[m]. It has been shown that the presented method, with the parameter m large enough, has the feature of quasi-interpolation at the given nodes. As a result, the presented method outperforms the existing FNN approximation method, as demonstrated by the numerical results for several examples of univariate continuous and discontinuous functions. Additionally, the effectiveness of the method in extended application to multivariate functions was illustrated.
Figure 2 shows the graphs of ψ^[m]_k(x) for the values m = 1, 2, 4, 16; ψ^[m]_k(x) becomes flatter both near the node x_k and far from the node as the parameter m increases.

Figure 1. Graphs of the sigmoidal functions σ^[m]_k(x) in (a) and those of ψ^[m]_k(x) in (b).