Series of Semihypergroups of Time-Varying Artificial Neurons and Related Hyperstructures

Abstract: Detailed analysis of the function of the multilayer perceptron (MLP) and its neurons, together with the use of time-varying neurons, allowed the authors to find an analogy with the use of structures of linear differential operators. This procedure allowed the construction of a group and a hypergroup of artificial neurons. In this article, focusing on semihyperstructures and using the above described procedure, the authors bring new insights into structures and hyperstructures of artificial neurons and their possible symmetric relations.


Introduction
As mentioned in the PhD thesis [1], neurons are the atoms of neural computation. Out of these simple computational units all neural networks are built up. The output computed by a neuron can be expressed using two functions, y = g(f(w, x)). The computation consists of several steps: In a first step the input to the neuron, x := {x_i}, is combined with the weights of the neuron, w := {w_i}, by the so-called propagation function f. This can be thought of as computing the activation potential from the pre-synaptic activities. Then from that result the so-called activation function g computes the output of the neuron. The weights, which mimic synaptic strength, constitute the adjustable internal parameters of the neuron. The process of adapting the weights is called learning [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18].
From the biological point of view it is appropriate to use an integrative propagation function. Therefore, a convenient choice is the weighted sum of the input, f(w, x) = \sum_i w_i x_i, that is, the activation potential equal to the scalar product of input and weights. This is, in fact, the most popular propagation function since the dawn of neural computation. However, it is often used in a slightly different form, with a special additional weight Θ called the bias. Applying the step function Θ(x) = 1 for x > 0 and Θ(x) = 0 for x < 0 as the above activation function yields the famous perceptron of Rosenblatt. In that case the function Θ works as a threshold.
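As a brief illustration (ours, not taken from the cited sources), the following Python sketch implements the weighted-sum propagation function together with the threshold activation just described; all names are illustrative.

```python
# A minimal sketch of Rosenblatt's perceptron: the weighted-sum propagation
# function f(w, x) = sum_i w_i x_i followed by a threshold activation.
# Variable and function names are ours, chosen for illustration only.

def propagation(w, x):
    """Activation potential: scalar product of weights and inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))

def theta(v):
    """Threshold activation: 1 for positive potential, 0 otherwise."""
    return 1 if v > 0 else 0

def perceptron(w, x, bias):
    """Output y = g(f(w, x)) with g = theta and the bias added to the potential."""
    return theta(propagation(w, x) + bias)

print(perceptron([0.5, -0.2], [1.0, 1.0], bias=-0.1))  # -> 1
```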
Let F : R → R be a general non-linear (or piece-wise linear) transfer function. Then the action of a neuron can be expressed by

y(k) = F\left(\sum_{i=0}^{m} w_i(k)\, x_i(k) + b\right),

where x_i(k) is the input value in discrete time k for i = 0, . . . , m, w_i(k) is the weight value in discrete time k for i = 0, . . . , m, b is the bias, and y(k) is the output value in discrete time k.
Notice that in some very special cases the transfer function F can also be linear. The transfer function defines the properties of the artificial neuron, and in principle it can be any mathematical function. Usually it is chosen on the basis of the problem that the artificial neuron (artificial neural network) needs to solve, and in most cases it is taken (as mentioned above) from the following set of functions: step function, linear function and non-linear (sigmoid) function [1,2,5,7,9,12,16,19].
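A minimal sketch (ours) of the discrete-time action y(k) = F(∑ w_i(k) x_i(k) + b) with the three transfer functions named above; the function names are our own.

```python
import math

# Sketch of a discrete-time neuron with a selectable transfer function F,
# covering the three usual choices mentioned in the text.

def step(v):    return 1.0 if v > 0 else 0.0
def linear(v):  return v
def sigmoid(v): return 1.0 / (1.0 + math.exp(-v))

def neuron_output(w_k, x_k, b, F=sigmoid):
    """w_k, x_k: sequences of weight and input values at discrete time k."""
    return F(sum(w * x for w, x in zip(w_k, x_k)) + b)

print(neuron_output([0.3, 0.7], [1.0, 2.0], b=-0.5, F=step))  # -> 1.0
```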
In what follows we consider a certain generalization of the classical artificial neurons mentioned above, such that the inputs x_i and weights w_i are functions of an argument t belonging to a linearly ordered (tempus) set T with the least element 0. As the index set we use the set C(J) of all continuous functions defined on an open interval J ⊂ R. Denote by W the set of all non-negative functions w : T → R; it forms a subsemiring of the ring of all real functions of one real variable. For r ∈ C(J) and n ∈ N, denote by Ne(\vec{w}_r) = Ne(w_{r,1}, . . . , w_{r,n}) the mapping

y_r(t) = \sum_{k=1}^{n} w_{r,k}(t)\, x_{r,k}(t) + b_r,

which will be called the artificial neuron with bias b_r ∈ R. By AN(T) we denote the collection of all such artificial neurons.
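The following sketch (ours) models such a time-varying neuron Ne(\vec{w}_r), whose weights and inputs are functions of t; the concrete weight and input functions are illustrative only.

```python
# Hedged sketch of a time-varying neuron Ne(w_r): weights and inputs are
# functions of t in T, and y_r(t) = sum_k w_{r,k}(t) * x_{r,k}(t) + b_r.

def make_neuron(weights, bias):
    """weights: list of callables w_k : T -> R; returns the map (inputs, t) -> y(t)."""
    def y(inputs, t):
        return sum(w(t) * x(t) for w, x in zip(weights, inputs)) + bias
    return y

ne = make_neuron([lambda t: 1.0 + t, lambda t: 2.0], bias=0.5)
inputs = [lambda t: t, lambda t: t ** 2]
print(ne(inputs, 2.0))  # (1+2)*2 + 2*4 + 0.5 = 14.5
```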
Neurons are usually denoted by capital letters X, Y or X_i, Y_i; nevertheless we also use the notation Ne(\vec{w}), where \vec{w} = (w_1, . . . , w_n) is the vector of weights [20][21][22].
We suppose, for the sake of simplicity, that the transfer functions (activation functions) ϕ, σ (or f) are the same for all neurons from the collection AN(T), and that this role is played by the identity function f(y) = y.
Feedforward multilayer networks are architectures in which the neurons are assembled into layers and the links between the layers go in one direction only, from the input layer to the output layer. There are no links between the neurons of the same layer. There may also be one or several hidden layers between the input and the output layer [5,9,16].
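For concreteness, a minimal forward pass through such a layered architecture might look as follows (our sketch; weights, biases and layer sizes are arbitrary stand-ins).

```python
# Sketch of a feedforward multilayer network: layers of neurons, links only
# forward, no intra-layer links. Pure Python, illustrative values.

def layer_forward(weight_matrix, biases, inputs, F):
    """One layer: each row of weight_matrix feeds one neuron of the layer."""
    return [F(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weight_matrix, biases)]

def mlp_forward(layers, inputs, F):
    """layers: list of (weight_matrix, biases), from input layer to output layer."""
    for W, b in layers:
        inputs = layer_forward(W, b, inputs, F)
    return inputs

identity = lambda v: v  # the text takes the identity as transfer function
net = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer, 2 neurons
       ([[1.0, 1.0]], [0.1])]                    # output layer, 1 neuron
print(mlp_forward(net, [2.0, 1.0], identity))    # -> [2.6]
```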

Preliminaries on Hyperstructures
From an algebraic point of view, it is useful to recall the terms and concepts used in the field of algebraic hyperstructures. A hypergroupoid is a pair (H, ·), where H is a (nonempty) set and · is a binary hyperoperation on the set H. Here, for sets A, B ⊆ H with A ≠ ∅ ≠ B, we define as usual

A · B = \bigcup_{a \in A,\, b \in B} a · b.

If a · (b · c) = (a · b) · c for all a, b, c ∈ H (the associativity axiom), then the hypergroupoid (H, ·) is called a semihypergroup. A semihypergroup is said to be a hypergroup if moreover the reproduction axiom

a · H = H = H · a \quad \text{for all } a ∈ H

is satisfied. Thus, hypergroups considered in this paper are hypergroups in the sense of F. Marty [23,24]. In some constructions it is useful to apply the following lemma (called also the Ends lemma, which has many applications; cf. [25][26][27][28][29]). Recall first that by a (quasi-)ordered semigroup we mean a triad (S, ·, ≤), where (S, ·) is a semigroup, (S, ≤) is a (quasi-)ordered set, i.e., a set S endowed with a reflexive and transitive binary relation "≤", and for all triads of elements a, b, c ∈ S the implication a ≤ b ⇒ a · c ≤ b · c, c · a ≤ c · b holds.

Lemma 1 (Ends Lemma). Let (S, ·, ≤) be a (quasi-)ordered semigroup. Define a binary hyperoperation * : S × S → P*(S) by a * b = {x ∈ S; a · b ≤ x}. Then (S, *) is a semihypergroup, which is commutative whenever the semigroup (S, ·) is commutative. If, moreover, (S, ·) is a group, then (S, *) is a hypergroup.
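As a simple illustration of Lemma 1 (our example, not taken from the cited sources), consider the ordered semigroup (N, +, ≤). The Ends lemma yields the hyperoperation

a * b = \{x ∈ N;\; a + b ≤ x\} = [a + b)_≤, \quad \text{e.g.} \quad 2 * 3 = \{5, 6, 7, \dots\},

and associativity can be checked directly: a * (b * c) = \bigcup_{x ≥ b + c} [a + x)_≤ = [a + b + c)_≤ = (a * b) * c.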
Notice that if (G, ·), (H, ·) are (semi-)hypergroups, then a mapping h : G → H is said to be a homomorphism of (G, ·) into (H, ·) if for any pair a, b ∈ G we have h(a · b) ⊆ h(a) · h(b). If for any pair a, b ∈ G the equality h(a · b) = h(a) · h(b) holds, the homomorphism h is called a good (or strong) homomorphism; cf. [30,31]. By End G we denote the endomorphism monoid of a semigroup (group) G.
Linear differential operators described in this article and used, e.g., in [29,42], are of the following form:

L(p_{n-1}, \dots, p_0)\, y(x) = y^{(n)}(x) + \sum_{k=0}^{n-1} p_k(x)\, y^{(k)}(x), \quad y ∈ C^n(J)

(where C^n(J) is the ring of all functions smooth up to order n, i.e., having derivatives up to order n defined on the interval J ⊆ R).

Definition 2 ([41,49]). Let (G, ·) be a semigroup and P ⊂ G, P ≠ ∅. A hyperoperation *_P : G × G → P(G) defined by (x, y) → xPy, i.e., x *_P y = xPy for any pair (x, y) ∈ G × G, is said to be the P-hyperoperation in G. If x *_P (y *_P z) = xPyPz = (x *_P y) *_P z holds for any triad x, y, z ∈ G, the P-hyperoperation is associative. If also the axiom of reproduction is satisfied, the hypergroupoid (G, *_P) is said to be a P-hypergroup.
Evidently, if (G, ·) is a group, then (G, *_P) is also a P-hypergroup. If the set P is a singleton, then the P-hyperoperation *_P is a usual single-valued operation.
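For example (an illustration of ours): in the group (Z, +) with P = {0, 1} we obtain

x *_P y = x + P + y = \{x + y,\; x + y + 1\},

a two-element hyperproduct, and reproduction holds since x *_P Z = \bigcup_{y ∈ Z} \{x + y, x + y + 1\} = Z.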

Lemma 2.
The triad (AN(T)_m, ·_m, ≤_m) (an algebraic structure endowed with an ordering) is a non-commutative ordered group.
A sketch of the proof was published in [21]. With the notation introduced above we get the following assertion: for any positive integer n ∈ N, n ≥ 2, and for any integer m such that 1 ≤ m ≤ n, define a mapping F : AN_n(T)_m → LA_n(T)_{m+1} by the following rule: for an arbitrary neuron Ne(\vec{w}_r) ∈ AN_n(T)_m, where \vec{w}_r = (w_{r,1}(t), . . . , w_{r,n}(t)) ∈ [C(T)]^n, we put F(Ne(\vec{w}_r)) = L(w_{r,1}, . . . , w_{r,n}) ∈ LA_n(T)_{m+1}. Then the mapping F : AN_n(T)_m → LA_n(T)_{m+1} is a homomorphism of the group (AN_n(T)_m, ·_m) into the group (LA_n(T)_{m+1}, •_{m+1}).

Now, using the construction described in Lemma 1, we obtain the final transposition hypergroup (called also a non-commutative join space). Denote by P(AN(T)_m)* the power set of AN(T)_m consisting of all nonempty subsets of this set, and define a binary hyperoperation *_m : AN(T)_m × AN(T)_m → P(AN(T)_m)* by

Ne(\vec{w}_r) *_m Ne(\vec{w}_s) = \{Ne(\vec{w}_u) ∈ AN(T)_m;\; Ne(\vec{w}_r) ·_m Ne(\vec{w}_s) ≤_m Ne(\vec{w}_u)\}.

Then (AN(T)_m, *_m) is a non-commutative hypergroup. We say that this hypergroup is constructed by using the Ends Lemma (cf. e.g., [8,25,29]); such hypergroups are called EL-hypergroups. The above defined invariant (also called normal) subgroup (AN_1(T)_m, ·_m) of the group (AN(T)_m, ·_m) is the carrier set of a subhypergroup of the hypergroup (AN(T)_m, *_m), and it has certain significant properties. Using a certain generalization of methods from [42] (p. 283) and investigating the constructed structures, we obtain the following result: for any positive integer n ∈ N, n ≥ 2, and for any integer m such that 1 ≤ m ≤ n, the hypergroup (AN(T)_m, *_m) is a transposition hypergroup (i.e., a non-commutative join space) such that (AN_1(T)_m, *_m) is its subhypergroup, which is
- closed (i.e., Ne(\vec{w}_r)/Ne(\vec{w}_s) ⊆ AN_1(T)_m and Ne(\vec{w}_r)\backslash Ne(\vec{w}_s) ⊆ AN_1(T)_m for all Ne(\vec{w}_r), Ne(\vec{w}_s) ∈ AN_1(T)_m),
- reflexive (i.e., Ne(\vec{w}_r)\backslash AN_1(T)_m = AN_1(T)_m/Ne(\vec{w}_r) for any neuron Ne(\vec{w}_r) ∈ AN(T)_m), and
- normal (i.e., Ne(\vec{w}_r) *_m AN_1(T)_m = AN_1(T)_m *_m Ne(\vec{w}_r) for any neuron Ne(\vec{w}_r) ∈ AN(T)_m).
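Since the hyperproduct a *_m b is an upper end and may be infinite, it is naturally represented by its characteristic function. The following sketch (ours) implements the Ends-lemma hyperoperation for an arbitrary group operation and ordering passed as parameters; the concrete operation ·_m and ordering ≤_m on neurons are given in the cited papers and are stand-ins here.

```python
# Hedged sketch of the Ends-lemma hyperoperation a *_m b = {x : a ._m b <=_m x}.
# `op` and `leq` are assumed parameters standing in for the operation and
# ordering on neurons defined in the cited papers.

def el_hyperproduct(op, leq, a, b):
    """Return the hyperproduct as a membership predicate (the set may be
    infinite, so we represent it by its characteristic function)."""
    ab = op(a, b)
    return lambda x: leq(ab, x)

# Toy instance on positive reals with multiplication and the usual order:
contains = el_hyperproduct(lambda a, b: a * b, lambda u, v: u <= v, 2.0, 3.0)
print(contains(6.0), contains(5.0))  # -> True False
```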

Remark 1.
A certain generalization of the formal (artificial) neuron can be obtained from the expression of a linear differential operator of the n-th order. Recall the expression of a formal neuron with inner potential ξ = \sum_{k=1}^{n} w_k x_k, where \vec{w} = (w_1, . . . , w_n) is the vector of weights. Using the bias b of the considered neuron and the transfer function σ, we can express the output as y(t) = σ(ξ + b). Now consider a tribal function u : J → R, where J ⊆ R is an open interval and u ∈ C^n(J); the inputs are derived from u as its derivatives \frac{d^k u(t)}{dt^k}. As weights we use the continuous functions w_k : J → R, k = 1, . . . , n − 1. Then the formula

y(t) = σ\left(\frac{d^n u(t)}{dt^n} + \sum_{k=1}^{n-1} w_k(t)\, \frac{d^k u(t)}{dt^k} + b\right)

describes the action of the neuron D_n, which will be called a formal (artificial) differential neuron. This approach allows us to use solution spaces of the corresponding linear differential equations.
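For instance (our worked case), for n = 2 the differential neuron D_2 with tribal function u ∈ C^2(J) and weight w_1 ∈ C(J) acts by

y(t) = σ\left(\frac{d^2 u(t)}{dt^2} + w_1(t)\, \frac{d u(t)}{dt} + b\right);

with σ = id, the tribal functions producing the zero output y ≡ 0 are exactly the solutions of the differential equation u''(t) + w_1(t) u'(t) + b = 0, which illustrates the link to solution spaces mentioned above.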
Then, defining

Ne(\vec{w}_p(t)) * Ne(\vec{w}_q(t)) = Ne(\vec{w}_p(t)) ·_m P ·_m Ne(\vec{w}_q(t)) = \{Ne(\vec{w}_p(t)) ·_m Ne(\vec{w}_u(t)) ·_m Ne(\vec{w}_q(t));\; u ∈ S\}

for any pair of neurons Ne(\vec{w}_p(t)), Ne(\vec{w}_q(t)) ∈ AN(T), we obtain a P-hypergroup of artificial time-varying neurons. If S is a singleton, i.e., P is a one-element subset of AN(T), the obtained structure is a variant of AN(T). Notice that any f ∈ End G for a group (G, ·) induces a good homomorphism of the P-hypergroups (G, *_P), (G, *_{f(P)}), and any automorphism creates an isomorphism between the above P-hypergroups. Let (Z, +) be the additive group of all integers. Let Ne(\vec{w}_s(t)) ∈ AN(T) be arbitrary but fixed, and put λ_s(Ne(\vec{w}_p(t))) = Ne(\vec{w}_s(t)) ·_m Ne(\vec{w}_p(t)) for any neuron Ne(\vec{w}_p(t)) ∈ AN(T). Further, denote by λ_s^r the r-th iteration of λ_s for r ∈ Z. Define the projection π_s : AN(T) × Z → AN(T) by π_s(Ne(\vec{w}_p(t)), r) = λ_s^r(Ne(\vec{w}_p(t))).
It is easy to see that we get a usual (discrete) transformation group, i.e., an action of (Z, +) (as the phase group) on the group AN(T). Thus the following two requirements are satisfied:
1. π_s(Ne(\vec{w}_p(t)), 0) = Ne(\vec{w}_p(t)) for any neuron Ne(\vec{w}_p(t)) ∈ AN(T),
2. π_s(Ne(\vec{w}_p(t)), r + u) = π_s(π_s(Ne(\vec{w}_p(t)), r), u) for any integers r, u ∈ Z and any artificial neuron Ne(\vec{w}_p(t)).
Notice that in dynamical system theory this structure is called a cascade.
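The following sketch (ours) shows such a cascade as iterated left translation by a fixed element; the concrete group operation ·_m on neurons is given in the cited papers, so a toy commutative operation on weight tuples stands in for it here.

```python
# Sketch of the cascade pi_s(a, r) = lambda_s^r(a): iterated left translation
# by a fixed element s, with negative r handled via the inverse of s.

def make_cascade(op, inv, s):
    """Action of (Z, +) by iterating the left translation lambda_s."""
    def pi(a, r):
        step = s if r >= 0 else inv(s)
        for _ in range(abs(r)):
            a = op(step, a)
        return a
    return pi

# Toy carrier (assumption, not the paper's operation): weight tuples under
# componentwise addition.
op  = lambda u, v: tuple(x + y for x, y in zip(u, v))
inv = lambda u: tuple(-x for x in u)
pi = make_cascade(op, inv, s=(1.0, 2.0))
a = (0.0, 0.0)
print(pi(a, 3), pi(pi(a, 1), 2))  # both (3.0, 6.0): pi(a, r+u) = pi(pi(a, r), u)
```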
On the phase set we will define a binary hyperoperation. For any pair of neurons Ne(\vec{w}_p(t)), Ne(\vec{w}_q(t)) ∈ AN(T) define

Ne(\vec{w}_p(t)) * Ne(\vec{w}_q(t)) = \{π_s(Ne(\vec{w}_p(t)), r);\; r ∈ Z\} ∪ \{π_s(Ne(\vec{w}_q(t)), r);\; r ∈ Z\}.

Then * : AN(T) × AN(T) → P(AN(T)) is a commutative binary hyperoperation, and since Ne(\vec{w}_p(t)), Ne(\vec{w}_q(t)) ∈ Ne(\vec{w}_p(t)) * Ne(\vec{w}_q(t)), we obtain that the hypergroupoid (AN(T), *) is an extensive commutative hypergroup.

Main Results
Now we will construct series of groups and hypergroups of artificial neurons using a certain analogy with the series of groups of differential operators described in [29].
It can be easily verified that both F_n and φ_n, for an arbitrary n ∈ N, are group homomorphisms. Evidently LA_{n−1}(J) ⊂ LA_n(J) for all n ∈ N. Thus we obtain complete sequences of ordinary linear differential operators with linking homomorphisms F_n, φ_n. Now consider the groups of time-varying neurons (AN(T)_m, ·_m) from Proposition 3 and the above defined homomorphism of the group (AN_n(T)_m, ·_m) into the group (LA_n(T)_{m+1}, •_{m+1}). Then we can modify the diagram accordingly. Using the Ends lemma and results of the theory of linear operators, we can also describe the linking morphisms in sequences of groups of linear differential operators.

Remark 2.
The second sequence of (2) can thus be bijectively mapped onto the sequence of hypergroups with the linking surjective homomorphisms F_n. Therefore, the bijective mapping of the above mentioned sequences is functorial.
Now we shift to the concept of an automaton. This concept was developed as a mathematical interpretation of real-life systems that work on a discrete time-scale. Using the binary operation of concatenation of chains of input symbols, we obtain automata with input alphabets endowed with the structure of a semigroup or a group. Considering mainly the structure given by the transition function and neglecting output functions with output sets, we reach a very useful generalization of the concept of automaton, called a quasi-automaton [29,31,48,49]. Let us introduce the concept of automata as an action of time-varying neurons. A system (A, S, δ) consisting of a nonempty set of states A ⊆ AN(T)_m formed by time-varying neurons, an arbitrary semigroup S of their inputs, and a mapping δ : A × S → A fulfilling the condition δ(δ(a, r), s) = δ(a, r · s) for arbitrary a ∈ A and r, s ∈ S, can be understood as an analogy of the concept of a quasi-automaton, a generalization of the Mealy-type automaton. The above condition is sometimes called the Mixed Associativity Condition (MAC). The just defined structures are also called actions of semihypergroups (H, ·) on sets A (called state sets).
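A minimal sketch (ours) of a quasi-automaton checking the MAC; the states and input semigroup below are toy stand-ins, whereas the paper takes states from AN(T)_m.

```python
# Sketch of a quasi-automaton (A, S, delta) satisfying the Mixed Associativity
# Condition delta(delta(a, r), s) = delta(a, r * s).
# Toy instance: input semigroup (N, +), integer states, delta(a, r) = a + r.

def delta(a, r):
    return a + r

a, r, s = 5, 2, 3
assert delta(delta(a, r), s) == delta(a, r + s)  # MAC holds for this delta
print(delta(delta(a, r), s))  # -> 10
```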
The neuron Ne(\vec{w}) acts as described above:

y(t) = F\left(\sum_{i=0}^{n} w_i(t)\, x_i(t) + b\right),

where w_i(t) is the weight value in continuous time t, b is a bias, and y(t) is the output value in continuous time t. Here the transfer function F is the identity function. Now suppose that the input functions x_i are differentiable up to an arbitrary order n. We consider linear differential operators L(m, w_n, . . . , w_0) : C^n(T) → C^n(T) with coefficient vectors (w_n, . . . , w_0) ∈ C^n(T) × · · · × C^n(T) = [C^n(T)]^{n+1}, defined by

L(m, w_n, . . . , w_0)\, x(t) = mb + \sum_{k=0}^{n} w_k(t)\, \frac{d^k x(t)}{dt^k}.

Then we denote by LNe_n(T) the additive Abelian group of linear differential operators L(m, w_n, . . . , w_0), where for L(m, w_n, . . . , w_0), L(s, w*_n, . . . , w*_0) ∈ LNe_n(T) with the bias b we define

L(m, w_n, . . . , w_0) + L(s, w*_n, . . . , w*_0) = L(m + s, w_n + w*_n, . . . , w_0 + w*_0).

Suppose that w_k(t) ∈ C^n(T) and define δ_n : C^n(T) × LNe_n(T) → C^n(T) by

δ_n(x(t), L(m, w_n, . . . , w_0)) = mb + x(t) + \sum_{k=0}^{n} w_k(t)\, \frac{d^k x(t)}{dt^k}, \quad x(t) ∈ C^n(T),

where w_n, . . . , w_0 are the weights corresponding to the inputs and b is the bias of the neuron corresponding to the operator L(m, w_n, . . . , w_0) ∈ LNe_n(T).
Theorem 3. Let LNe_n(T), C^n(T) be the above defined structures and δ_n : C^n(T) × LNe_n(T) → C^n(T) be the above defined mapping. Then the triad (C^n(T), LNe_n(T), δ_n) is an action of the group LNe_n(T) on the group C^n(T), i.e., a quasi-automaton with the state space C^n(T) and with the alphabet LNe_n(T) carrying the group structure of artificial neurons.
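The action δ_n can be computed symbolically. The following sketch (ours) evaluates the reconstructed formula δ_n(x(t), L(m, w_n, . . . , w_0)) = mb + x(t) + Σ_k w_k(t) d^k x(t)/dt^k with sympy; the chosen state x(t) and weights are illustrative.

```python
import sympy as sp

# Hedged sketch of the action delta_n on a state x(t) in C^n(T), following
# the formula given above; weights[k] is the coefficient w_k(t) of the
# k-th derivative (ascending order, unlike the paper's descending notation).

t = sp.symbols('t')

def delta_n(x, weights, m, b):
    return m * b + x + sum(w * sp.diff(x, t, k) for k, w in enumerate(weights))

x = sp.sin(t)
print(sp.simplify(delta_n(x, [1, t], m=2, b=0.5)))  # -> 1.0 + 2*sin(t) + t*cos(t)
```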
Consequently the mapping F_k : (AN_{n+k}(T)_m, ≤) → (AN_{n+k−1}(T)_m, ≤) is order-preserving, i.e., it is an order-homomorphism of hypergroups. The final result of our considerations is the following sequence of hypergroups of artificial neurons and linking homomorphisms:

\cdots \xrightarrow{F_{k+1}} (AN_{n+k}(T)_m, *_m) \xrightarrow{F_k} (AN_{n+k−1}(T)_m, *_m) \xrightarrow{F_{k−1}} \cdots \xrightarrow{F_1} (AN_n(T)_m, *_m).

Conclusions
Artificial neural networks and structured systems of artificial neurons have been discussed by a great number of researchers. They are an important part of artificial intelligence, with many useful applications in various branches of science and technology. Our considerations are based on an algebraic and analytic approach using a certain formal similarity with classical structures and new hyperstructures of differential operators. We discussed certain generalizations of classical artificial time-varying neurons and studied them using recently derived methods. The presented investigations allow further development.