On the Computability of Primitive Recursive Functions by Feedforward Artificial Neural Networks

Abstract: We show that, for a primitive recursive function h(x, t), where x is an n-tuple of natural numbers and t is a natural number, there exists a feedforward artificial neural network N(x, t) such that, for any n-tuple of natural numbers z and any positive natural number m, the first m + 1 terms of the sequence {h(z, t)} are the same as the terms of the tuple (N(z, 0), . . ., N(z, m)).


Introduction
Primitive recursive functions describe, albeit incompletely, the intuitive notion of a number-theoretic algorithm, a deterministic procedure to transform numerical inputs to numerical outputs in finitely many steps. This perception of primitive recursive functions as intuitive counterparts of number-theoretic algorithms may be rooted in the fact that any primitive recursive function can be mechanically constructed from a set of initial functions with finitely many applications of simple, well-defined operations of composition and primitive recursion. These functions and some of their properties have been investigated by Gödel [1], Péter [2,3], Kleene [4], Davis [5], and Rogers [6] in their studies of formal systems, foundations of mathematics, and computability theory. Although the confinement of the construction procedure to two operations may at first seem restrictive, many functions on natural numbers ordinarily encountered in mathematics and computer science are, in fact, primitive recursive (cf., e.g., Ch. 3 in [7]). Primitive recursive functions have been used to investigate the foundations of functional programming. Colson [8] presents a computational model in which a primitive recursive function is viewed as a rewriting system and gives a non-trivial necessary condition for an algorithm to be representable in the system. Paolini et al. [9] define a class of recursive permutations, which they call Reversible Primitive Permutations (RPP), and formalize it as a language that is sufficiently expressive to represent all primitive recursive functions. Petersen [10] uses induction and primitive recursion to develop resource-conscious logics where the repeated recycling of assumptions, e.g., repeated applications of the successor function f(n) = n + 1 to enumerate natural numbers, has costs.
Feedforward artificial neural networks have their origin in the research by McCulloch and Pitts [11], which describes neural events with propositional logic. McCulloch and Pitts assume that the human nervous system is a finite set of neurons, each of which has an excitation threshold. When a neuron's threshold is exceeded, the neuron generates an impulse that propagates to other neurons across synapses connecting them to the origin of the impulse. A fundamental insight by McCulloch and Pitts is that if the response of a neuron can be formalized as a logical proposition specifying its stimulus, then behaviors of complex networks of neurons can, in principle, be described with symbolic logic. Artificial neural networks entered mainstream computer science almost half a century after the research by McCulloch and Pitts when Rumelhart, Hinton, and Williams [12] discovered backpropagation, a method for training networks to modify synapse weights by minimizing error between the output and the ground truth. Different types of such networks have been shown to be universal approximators of some classes of functions (e.g., [13][14][15]). Artificial neural networks are increasingly used in embedded artificial intelligence (AI) systems, i.e., systems that run on computational devices with finite amounts of computer memory (e.g., [16]). We will refer to embedded AI as finite AI to emphasize the fact that finite AI systems are realized on computational devices with finite amounts of computational memory.
In this investigation, we seek to relate, in a formal way, primitive recursive functions and feedforward artificial neural networks by investigating whether it is possible, for a given primitive recursive function, to construct a feedforward artificial neural network that computes arbitrarily many values of the function's co-domain from the corresponding values of the function's domain. We hope that our investigation contributes to the knowledge of the classes of functions that can be not only approximated, but provably computed by feedforward artificial neural networks. In particular, we formalize feedforward artificial neural networks with recurrence equations, propose a formal definition of the concept of N-computability, i.e., the property of a function to be computed by a feedforward artificial neural network N, and prove several lemmas and theorems to show how feedforward artificial neural networks can be constructed to compute arbitrarily many consecutive values of any primitive recursive function. Since these networks consist of finite sets of neurons and are used in some finite AI systems [17,18], our investigation will be of interest to mathematicians and computer scientists interested in the computability theory of finite AI.
The remainder of our article is organized as follows. In Section 2, we review several definitions of primitive recursive functions, starting with the original definition by Gödel [1] and proceeding to the later definitions by Kleene [4], Davis [5], Rogers [6], Davis et al. [7], and Meyer and Ritchie [19]. This section gives the reader a historical bird's eye view of how the concept of primitive recursive function and its formalization have co-evolved in time. In Section 3, we state the notational conventions and give the definition of a primitive recursive function we use in this article. This section is intended for reference. In Section 4, we offer a formalization of feedforward artificial neural networks in terms of recurrence equations. In Section 5, we prove several lemmas and theorems that form the bulk of our theoretical investigation. In Section 6, we present some perspectives on the obtained results, and we summarize our conclusions in Section 7.

Recursive Functions
Gödel [1] (Sec. 2, p. 157) describes the class of number-theoretic functions as the class of functions whose domains are non-negative integers or n-tuples thereof and whose values are non-negative integers. Gödel [1] (Sec. 2, pp. 157-159) states that a number-theoretic function φ(x 1 , x 2 , . . ., x n ) is recursively defined in terms of the number-theoretic functions ψ(x 1 , x 2 , . . ., x n−1 ) and µ(x 1 , x 2 , . . ., x n+1 ) if φ is obtained from ψ and µ by the following schema:

φ(0, x 2 , . . ., x n ) = ψ(x 2 , . . ., x n ), φ(k + 1, x 2 , . . ., x n ) = µ(k, φ(k, x 2 , . . ., x n ), x 2 , . . ., x n ), (1)

where the equalities hold for all k, x 2 , . . ., x n . Gödel [1] (Sec. 2, p. 159) defines a number-theoretic function φ to be recursive if there exists a finite sequence of number-theoretic functions φ 1 , φ 2 , . . ., φ n = φ, where each function φ i , 1 ≤ i ≤ n, is a natural number constant, the successor function x + 1, or is defined from two preceding functions with Schema (1) or from one preceding function by substitution, i.e., the replacement of the arguments of a preceding function with some other preceding functions. Kleene [4] (Chap. IX, § 43, p. 219) defines a number-theoretic function to be primitive recursive if it is definable by a finite number of applications of the six schemata in (2):

(I) φ(x) = x + 1, (II) φ(x 1 , . . ., x n ) = q, (III) φ(x 1 , . . ., x n ) = x i , (IV) φ(x 1 , . . ., x n ) = ψ(χ 1 (x 1 , . . ., x n ), . . ., χ m (x 1 , . . ., x n )), (Va) φ(0) = q, φ(y + 1) = χ(y, φ(y)), (Vb) φ(0, x 2 , . . ., x n ) = ψ(x 2 , . . ., x n ), φ(y + 1, x 2 , . . ., x n ) = χ(y, φ(y, x 2 , . . ., x n ), x 2 , . . ., x n ), (2)

where m and n are positive integers, i is an integer such that 1 ≤ i ≤ n, q is a natural number, and ψ, χ 1 , . . ., χ m , and χ are number-theoretic functions with the indicated numbers of arguments.
where y (m) and x (n) are tuples of natural numbers with m and n elements, respectively. Davis [5] (Chap. 3, Sec. 4, p. 48) defines the operation of primitive recursion as the operation that uses Schema (4) to construct the function h(x (n+1) ) from the total functions f (x (n) ) and g(x (n+2) ), where x (n) , x (n+1) , and x (n+2) are tuples of natural numbers with n, n + 1, and n + 2 elements, respectively.
For a set of natural numbers A, Davis [5] (Chap. 3, Sec. 4, p. 49) defines an A-primitive recursive function, or a function primitive recursive in A, as a function that can be obtained by a finite number of applications of composition (cf. Schema (3)) and primitive recursion (cf. Schema (4)) from the functions S(x), U n i , and C A (x), where C A (x) is the characteristic function of A (i.e., C A (x) is a total function such that C A (x) = 1 if x ∈ A and C A (x) = 0 otherwise), and S(x) and U n i are identical to Kleene's Schemata (I) and (III) in (2). Davis [5] (Chap. 3, Sec. 4, p. 49) defines a function f to be primitive recursive if it is ∅-primitive recursive, where ∅ denotes the empty set.
Rogers [6] (Chap. 1, § 1.2, p. 6) defines the class C of primitive recursive functions as the smallest class of functions that contains the constant, successor, and projection functions and is closed under composition and primitive recursion. Davis et al. [7] (Chap. 3, Sec. 3, p. 42) define as initial the functions s(x) = x + 1, n(x) = 0, and u n i (x 1 , . . ., x n ) = x i , 1 ≤ i ≤ n, and define a function to be primitive recursive if it can be obtained from the initial functions by a finite number of applications of composition or primitive recursion, where primitive recursion is defined by Schema (7) (Chap. 3, Sec. 2, p. 40 in [7]) and Schema (8) (Chap. 3, Sec. 2, p. 41 in [7]). In Schema (7), k is a natural number and g is a total function of two variables. In Schema (8), f and g are total functions of n and n + 2 variables, respectively.
In this case, we say that Z computes f . If, in addition, f (x 1 , . . ., x n ) is a total function, then it is called computable.
In subsequent chapters of his monograph (cf. Chap. 2 and Chap. 3 in [5]), Davis separates the notion of computability from Turing machines to make it possible "to demonstrate the computability of quite complex functions without referring back to the original definition of computability in terms of Turing machines" (cf. Ch. 3, Sec. 1, p. 41 in [5]).
Davis et al. [7] (Chap. 2) continue this treatment of computability by designing the programming language L and then defining partially computable and computable functions in terms of L programs, viz., finite sequences of L instructions. In L, the unique variable Y is designated as the output variable to store the output of an L program P on a given input. X 1 , X 2 , . . . are input variables and Z 1 , Z 2 , . . . are internal variables. All variables refer to natural numbers. L has conditional dispatch instructions, line labels, elementary arithmetic operations, comparisons of natural numbers, and macros.
Davis et al. [7] (Chap. 2, Sec. 3, p. 27) define a computation of an L program P on some inputs x 1 , . . ., x m , m > 0, as a finite sequence of snapshots (s 1 , . . ., s k ), where each snapshot s i , 1 ≤ i ≤ k, k > 0, specifies the number of the instruction in P to be executed and the value of each variable in P, and where each subsequent snapshot is uniquely determined by the previous snapshot (Theorem 3.2, Chap. 4, Sec. 3, pp. 74-75 in [7]). The snapshot s 1 is the initial snapshot, where the values of all input variables are set to their initial values, the program instruction counter is set to 1, i.e., the number of the first instruction in P, and the values of all the other variables in P are set to 0. The snapshot s k in (s 1 , . . ., s k ) is a terminal snapshot, where the instruction counter is set to the number of the instructions in P plus 1. If some program P in L takes m inputs x 1 , . . ., x m , the value computed by P on these inputs is the value of the output variable Y in the terminal snapshot of the computation, if such a computation exists, and is undefined otherwise. The definitions of partially computable and computable functions are made by Davis et al. [7] (Chap. 2, Sec. 4, p. 30) in terms of L programs as follows.
Definition 2. An n-ary function f is partially computable if f is partial and there is an L program P such that Equation (10) holds for all x 1 , . . ., x n .
Definition 3. An n-ary function f is computable if it is total and partially computable.
This treatment of computable functions in terms of programs in a formal language is by no means the only one in the literature. For example, as early as 1967, Meyer and Ritchie [19] formalize primitive recursive functions as loop programs consisting of assignment and iteration statements similar to DO statements of the programming language FORTRAN.
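The snapshot semantics above can be made concrete with a short sketch. The toy instruction set below (`inc`, `dec`, `jnz`, `goto`) is our own illustrative stand-in, not the actual language L; what the sketch shows is only the shape of a computation as a sequence of snapshots, each uniquely determined by its predecessor, terminating when the counter exceeds the number of instructions:

```python
def run(program, inputs):
    """Step a toy register program through a sequence of snapshots.
    A snapshot pairs the instruction counter with the variable values; the
    terminal snapshot has counter == len(program) + 1, as in Davis et al. [7]."""
    variables = {"Y": 0}
    variables.update(inputs)
    counter = 1
    snapshots = [(counter, dict(variables))]
    while 1 <= counter <= len(program):
        op, var, target = program[counter - 1]
        if op == "inc":
            variables[var] = variables.get(var, 0) + 1
            counter += 1
        elif op == "dec":  # natural-number subtraction never goes below 0
            variables[var] = max(variables.get(var, 0) - 1, 0)
            counter += 1
        elif op == "jnz":  # jump to line `target` if var is nonzero
            counter = target if variables.get(var, 0) != 0 else counter + 1
        elif op == "goto":
            counter = target
        snapshots.append((counter, dict(variables)))
    return snapshots

# Copy X1 into Y; each snapshot is uniquely determined by the previous one.
prog = [
    ("jnz", "X1", 3),    # 1: if X1 != 0, go to line 3
    ("goto", None, 6),   # 2: jump past the last line, i.e., halt
    ("dec", "X1", None), # 3
    ("inc", "Y", None),  # 4
    ("goto", None, 1),   # 5: loop back
]
snapshots = run(prog, {"X1": 3})
print(snapshots[-1])  # terminal snapshot: counter is 6, Y holds 3
```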

Computability of Primitive Recursive Functions
Davis et al. [7] (Chap. 3, Sec. 3, p. 42) introduce the concept of a primitive recursively closed (PRC) class of functions, which is a class of total functions that contains the initial functions and any functions obtained from the initial functions by a finite number of applications of composition or primitive recursion. Davis et al. [7] (Chap. 3, Sec. 3, pp. 42-43) show that (1) the class of computable functions is PRC; (2) the class of primitive recursive functions is PRC; and (3) a function is primitive recursive if and only if it belongs to every PRC class. A corollary of (3) is that every primitive recursive function is computable.
Péter [2,3] shows that it is possible to define functions in terms of recursive equations that are not primitive recursive. In particular, Péter demonstrates that all unary primitive recursive functions are enumerable, i.e., φ 0 (x), φ 1 (x), φ 2 (x), . . . is an enumeration, with repetitions, of all unary primitive recursive functions. By Cantor's diagonalization (cf., e.g., pp. 6-8 in [4]), the unary function f (x) = φ x (x) + 1 is not in the enumeration and, hence, not primitive recursive. While f is not primitive recursive, it is computable (cf. Definition 3). Thus, the class of primitive recursive functions is a proper subset of computable functions and, in and of itself, cannot completely capture the intuitive notion of a number-theoretic algorithm. Péter's argument suffers no loss of generality, insomuch as any n-ary primitive recursive function, n > 1, can be reduced to an equivalent unary primitive recursive function (cf. Theorems 9.1 and 9.2, Chap. 4, Sec. 9, p. 108 in [7]). Kleene's separation of recursive functions into general recursive and primitive recursive may have been influenced by Péter's discovery (cited by Kleene [4] in Chap. XI, § 55, p. 272).
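The diagonal construction can be illustrated with a toy stand-in for the enumeration. The list below is ours and merely illustrative — it is a handful of unary functions, not the actual enumeration of all unary primitive recursive functions — but it shows why f (x) = φ x (x) + 1 cannot coincide with any function in the list:

```python
# A few unary functions standing in for the enumeration phi_0, phi_1, ...
phi = [lambda x: 0, lambda x: x + 1, lambda x: 2 * x, lambda x: x * x]

def f(x):
    # Cantor's diagonal function: f differs from phi_x precisely at input x.
    return phi[x](x) + 1

# f cannot equal any phi_i in the enumeration, since it disagrees at i.
for i in range(len(phi)):
    assert f(i) != phi[i](i)
print([f(i) for i in range(4)])  # → [1, 3, 5, 10]
```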
Rogers [6] (Chap. 1, § 1.2, p. 8) defines the Ackermann generalized exponential, a function for which there is no primitive recursive derivation, and formalizes it with the following recursive equations:
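For illustration, a minimal Python sketch of the closely related two-argument Ackermann–Péter function is given below; this is our choice of variant (Rogers works with the three-argument generalized exponential), but it exhibits the same essential property of growing faster than any primitive recursive function:

```python
def ackermann(m, n):
    """Two-argument Ackermann-Peter function: total and computable,
    but with no primitive recursive derivation."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

# Small values are tame; the explosive growth in m is what defeats
# primitive recursion: ackermann(4, 2) already has 19,729 decimal digits.
print(ackermann(2, 3))  # → 9
```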

Notational Conventions and Definitions
If f is a function, dom( f ) and codom( f ) are the domain and the co-domain of f . The expression f : A → B abbreviates the logical conjunction dom( f ) = A ∧ codom( f ) ⊆ B. The notation (a 1 , . . ., a n ) is used to denote ordered n-tuples or, simply, n-tuples over some set of numbers A. We will use bold lower-case variables, e.g., a, x, y, to refer to n-tuples. Thus, a = (13, 17, 19) is a 3-tuple over the set of natural numbers N = {0, 1, 2, . . .}. We will use the symbol N + to denote the set of positive natural numbers.
The individual elements of an n-tuple are not required to be distinct. If x is an n-tuple, then dim(x) = n, i.e., the number of elements in x. The 0-tuple is denoted as (). In calculus, a sequence is an ordered set of numbers in a one-to-one correspondence with N or N + (cf., e.g., Taylor [20], § 1.62, p. 67). Thus, if f : N → N, then { f (n)} denotes the sequence f (0), f (1), . . ., f (m), . . . with countably many elements or terms. In computability theory, the term sequence sometimes refers to an n-tuple (cf., e.g., Ch. 3, p. 60 in [7]). Thus, in order to avoid confusion, when we want to emphasize the fact that we are dealing with a finite number of ordered elements, we refer to the collection of these elements as a finite sequence, a tuple, or an m-tuple, where m is the number of the elements.
For n > 0, A n is the n-th Cartesian power of A, i.e., A n = {(a 1 , . . ., a n ) | a 1 , . . ., a n ∈ A}; e.g., R n is the n-th Cartesian power of R, where R is the set of real numbers. We use statements like a ∈ A n to mean that a is an n-tuple over A. We do not distinguish between 1-tuples and individual elements, e.g., a = (a), a ∈ A, and h(a) = h((a)) for some function h.
In formalizing feedforward artificial neural networks, it is sometimes convenient to treat n-tuples as vectors. Therefore, we occasionally use symbols like x, y, z to denote n-tuples. If x ∈ A n , then dim( x) = n and x[j], 1 ≤ j ≤ n, is the j-th element of x. E.g., if x = (1, 1, 11) ∈ N 3 , then x[1] = x[2] = 1 and x[3] = 11. If a ∈ A n and b ∈ A n , then a = b if and only if a[j] = b[j], 1 ≤ j ≤ n. The empty tuple is discarded in function arguments. E.g., if h : N → N, then h((), t) = h(t, ()) = h(t), t ∈ N. We occasionally separate individual arguments of functions from the remaining arguments combined into tuples. E.g., if f : N m+1 → N, x ∈ N, and y ∈ N m , then f (x, y) = f (x, y[1], . . ., y[m]). A function P : A n → {0, 1} is a predicate, where 1 arbitrarily designates logical truth and 0 logical falsehood. The symbols ¬, ∧, ∨, → respectively refer to logical not, logical and, logical or, and logical implication. P(x) is a shorthand for P(x) = 1, and ¬P(x) is a shorthand for P(x) = 0. If P and Q are predicates, then ¬P ∨ Q is logically equivalent to P → Q, i.e., ¬P ∨ Q ≡ P → Q. The symbols ∃ and ∀ refer to the logical existential (there exists) and universal (for all) quantifiers, respectively. Thus, the statement (∃x)P(x) is logically equivalent to the statement that P(x) holds for at least one x in dom(P), while the statement (∀x)P(x) is logically equivalent to the statement that P(x) holds for every x in dom(P).
Let, for 0 < k, n, f : N k → N, g j : N n → N, 1 ≤ j ≤ k, and x ∈ N n . We use the following definitions of composition and primitive recursion in our article. A function h : N n → N is obtained from f , g 1 , . . ., g k by composition if h is obtained from them by Schema (11):

h(x) = f (g 1 (x), . . ., g k (x)). (11)
Let k ∈ N and φ : N 2 → N be total. A function h : N → N is obtained from φ by primitive recursion if it is obtained from φ by Schema (12):

h(0) = k, h(t + 1) = φ(t, h(t)). (12)
Let f : N n → N and g : N n+2 → N be total. Then h : N n+1 → N is obtained from f and g by primitive recursion if h is obtained from f and g by Schema (13), where x ∈ N n :

h(x, 0) = f (x), h(x, t + 1) = g(t, h(x, t), x). (13)
If x ∈ N n , Schema (13) can be expressed in the vector notation as h( x, 0) = f ( x), h( x, t + 1) = g(t, h( x, t), x). Let the set of initial functions consist of the zero function n(x) = 0, the successor function s(x) = x + 1, and the projection functions u n i (x 1 , . . ., x n ) = x i , 1 ≤ i ≤ n.

Definition 4. A function is primitive recursive if it can be obtained from the initial functions by a finite number of applications of Schemata (11)-(13).
A corollary of Definition 4 is that if f is primitive recursive, then there is a sequence of functions φ 1 , . . ., φ n = f such that φ i , 1 ≤ i ≤ n, is either an initial function or is obtained from the previous functions in the sequence by composition or primitive recursion.
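The machinery of Definition 4 can be sketched directly in Python: the initial functions plus higher-order operators for Schemata (11) and (13). The function names below are ours, and the sketch realizes the recursion by iteration rather than by the formal schema, but each constructed function agrees with its schema pointwise:

```python
def s(x):          # successor, an initial function
    return x + 1

def n(x):          # zero function, an initial function
    return 0

def u(i):          # projection u^n_i as a family indexed by i (1-based)
    return lambda *xs: xs[i - 1]

def compose(f, *gs):
    """Schema (11): h(x) = f(g1(x), ..., gk(x))."""
    return lambda *xs: f(*(g(*xs) for g in gs))

def prim_rec(f, g):
    """Schema (13): h(x, 0) = f(x); h(x, t + 1) = g(t, h(x, t), x)."""
    def h(*args):
        *x, t = args
        acc = f(*x)
        for i in range(t):
            acc = g(i, acc, *x)
        return acc
    return h

# Addition is primitive recursive: add(x, 0) = x; add(x, t + 1) = s(add(x, t)).
add = prim_rec(u(1), lambda t, acc, x: s(acc))
print(add(3, 4))  # → 7
```

Multiplication follows the same pattern, `prim_rec(n, lambda t, acc, x: add(acc, x))`, illustrating the corollary above: each new function is built only from initial functions and previously constructed ones.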

Feedforward Artificial Neural Networks
A feedforward artificial neural network N z is a finite set of neurons, each of which is connected to a finite number of the neurons in the same set through synapses, i.e., directed weighted edges (cf. Figure 1). The neurons are organized into l layers E 1 , . . ., E l , where E 1 is the input layer, E l is the output layer, and E e , 1 < e < l, are the hidden layers. We use the term network synonymously with the term feedforward artificial neural network.
Let l z denote the number of layers in N z and n e i refer to the i-th neuron in layer E e , 1 ≤ e ≤ l z . The function nn z (e) : N + → N + specifies the number of neurons in layer E e of N z . We assume that N z is fully connected, i.e., there is a synapse from every neuron in layer E e to every neuron in layer E e+1 , 1 ≤ e < l z . Each synaptic weight w e i,j (cf. Figure 1) is a real number. The vector w e is the vector of all synaptic weights in N z from E e to E e+1 . Thus, w e = (w e 1,1 , . . ., w e 1,nn z (e+1) , . . ., w e nn z (e),1 , . . ., w e nn z (e),nn z (e+1) ).
We let w 0 = () and assume, without loss of generality, that, for any synaptic weight w e i,j , 0 ≤ w e i,j ≤ 1, because, if that is not the case, w e i,j can be so scaled. No loss of generality is introduced by the assumption of full connectivity either: a fully connected network can emulate a network that does not require full connectivity by setting the appropriate synaptic weights to zero, and, conversely, a network that is not fully connected can be made fully connected by adding synapses with zero weights as needed.
Each n e i , e > 1, computes an activation function α e i ( a e−1 , w e−1 ) : R dim( a e−1 ) × R dim( w e−1 ) → R, where a e−1 is the vector of the activations of the neurons in layer E e−1 , dim( a e−1 ) = nn z (e − 1), and dim( w e−1 ) = nn z (e − 1)nn z (e). If x is the input to N z , then a 1 = x, i.e., the neurons of the input layer pass their inputs through unchanged. The term feedforward means that the activations of the neurons are computed layer by layer from the input layer to the output layer, because the activation functions of the neurons in the next layer require only the weights of the synapses connecting the current layer with the next one and the activation values, i.e., the outputs of the activation functions of the neurons in the current layer. If x is the input vector, then

a 1 = (α 1 1 ( x, ()), . . ., α 1 nn z (1) ( x, ())) = x, a e = (α e 1 ( a e−1 , w e−1 ), . . ., α e nn z (e) ( a e−1 , w e−1 )), 1 < e ≤ l z .

The feedforward activation function f z that computes the activations of N z layer by layer can be defined as

f z ( x, 1) = a 1 = x, f z ( x, e + 1) = (α e+1 1 ( f z ( x, e), w e ), . . ., α e+1 nn z (e+1) ( f z ( x, e), w e )), 1 ≤ e < l z .

Thus, for some sets A and B, we define the function ζ z : A n → B m computed by N z as ζ z ( x) = f z ( x, l z ), i.e., the activations of the output layer.

Definition 5. A function f : A n → B m , for some sets A and B, is N-computable if there is a network N z such that, for all x = x ∈ A n , f ( x) = ζ z ( x).

If N z computes f , we refer to N z as N f (•) and use the expression N f (•) ( x) to denote its output on x. A network N z can include other networks. Let N j and N k be two networks such that ζ j : A m → B n and ζ k : B n → C k , for some sets A, B, C, and 0 < m, n, k. Then we can construct a new network N l by feeding the output of N j to N k so that ζ l : A m → C k (cf. Figure 2). We can generalize this case to a network that includes arbitrarily many networks whose outputs are the inputs to another network whose output is the output of the entire complex network (cf. Figure 3). Formally, let N z 1 , . .
., N z l be networks such that ζ z i : I n z i → O k z i , for some sets I and O, 0 < n z i , k z i , and 1 ≤ i ≤ l. Let, for some set S, a network N j compute a function of the combined outputs of N z 1 , . . ., N z l , i.e., y = (ζ z 1 ( x z 1 ), . . ., ζ z l ( x z l )) and s = ζ j ( y). In sum, the outputs of N z 1 , . . ., N z l are given to N j , and the output of N j is the output of the entire network (cf. Figure 3).

Figure 3. A network N z that includes networks N z 1 , . . ., N z l that take x z 1 , . . ., x z l as inputs and give their outputs to network N j (cf. Equation (22)). Thus, N z maps x z 1 , . . ., x z l to s.
We use the symbol N id to denote an identity network such that, for a = a ∈ A n , 0 < n, ζ id ( a) = a = ζ id (a) = a. One can think of N id as a single-layer network of n neurons, each of which outputs its input unchanged. Our formalization of feedforward artificial neural networks as finite sets of neurons and synapses organized in finitely many layers is in compliance with the original definition by McCulloch and Pitts (Sec. 2, p. 103 in [11]), who state that the neurons of a given network may be assigned designations c 1 , c 2 , . . ., c n . It is also in compliance with the subsequent definition by Rumelhart, Hinton, and Williams [12] as well as with modern treatments of neural networks by Nielsen [17] and Goodfellow, Bengio, and Courville [18] that continue to describe neural networks as finite sets of neurons and synapses.
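The layer-by-layer computation of f z can be sketched in pure Python. The weighted-sum-plus-ReLU activation below is our illustrative choice (the formalism permits arbitrary activation functions), and the weight layout mirrors the vectors w e of a fully connected network:

```python
def feedforward(x, weights, activation=lambda v: max(v, 0.0)):
    """Compute activations layer by layer. `weights[e][i][j]` is the weight of
    the synapse from neuron i of one layer to neuron j of the next (0-based),
    i.e., the w^e_{i,j} of a fully connected network."""
    a = list(x)  # a^1 = x: the input layer passes its input through
    for w in weights:
        n_next = len(w[0])
        a = [activation(sum(a[i] * w[i][j] for i in range(len(a))))
             for j in range(n_next)]
    return a  # activations of the output layer, i.e., zeta_z(x)

# A 3-layer network as in Figure 1: 2 inputs, 3 hidden neurons, 2 outputs.
w1 = [[0.5, 0.2, 0.1], [0.3, 0.4, 0.9]]    # layer 1 -> layer 2
w2 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # layer 2 -> layer 3
print(feedforward([1.0, 2.0], [w1, w2]))
```

Note that only the previous layer's activations and the weights between the two layers are needed at each step, which is exactly the feedforward property described above.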

N-Computability of Primitive Recursive Functions
Lemma 1. The initial functions are N-computable.
Proof. Let N n(•) : N → N be a network with a single input node n 1 1 and a single output node n 2 1 whose activation is 0 on any input, and let N s(•) : N → N be a network with a single input node n 1 1 and a single output node n 2 1 whose activation on input x is x + 1. Let N u n i (•) : N n → N be a network with n input nodes and a single output node whose activation on input a = (a[1], . . ., a[n]) is a[i]. Then, N n(•) , N s(•) , and N u n i (•) compute the initial functions n(x), s(x), and u n i , respectively.

We abbreviate N u n i (•) as N u(•) , because n and i are always evident from the context.
Proof. Since u n i is primitive recursive, c n i is primitive recursive by the definition by cases theorem and its corollary (cf. Theorem 5.4, Chap. 3, Sec. 5, pp. 50-51 in [7]). Let N c n i (•) be a network with n + 1 input nodes n 1 1 , . . ., n 1 n+1 , where the first n nodes receive the n corresponding values of x ∈ N n . Let N c n i (•) have one output node n 2 1 and let w 1 j,1 = 1, 1 ≤ j ≤ n + 1. Let the activation function of n 2 1 be defined so that N c n i (•) computes c n i .

Lemma 3. Let f be an N-computable function of k arguments, k > 0, and g 1 , . . ., g k be N-computable functions of n arguments each, n > 0. Let a function h of n arguments be obtained from f , g 1 , . . ., g k by Schema (11). Then, h is N-computable.
Proof. Let f , g 1 , . . ., g k be computable by the networks N f (•) , N g 1 (•) , . . ., N g k (•) . Then, for z ∈ N n , we have N h(•) (z) = N f (•) (N g 1 (•) (z), . . ., N g k (•) (z)) = f (g 1 (z), . . ., g k (z)) = h(z).

Lemma 4. For any k ∈ N, the constant function J k (x) = k is N-computable.

Proof. Let N n(•) and N s(•) be as constructed in Lemma 1. Let {N s(•) } k , k ≥ 0, denote a network that consists of a finite sequence of k networks N s(•) , where the first N s(•) receives its input from N n(•) and each subsequent N s(•) receives its input from the previous N s(•) (cf. Figure 2). Let s k (x) denote the k-fold composition of s with itself, i.e., s 1 (x) = s(x), s 2 (x) = s(s(x)), etc. Then, {N s(•) } k computes J k (x) = s k (0) = k.

The next lemma, Lemma 5, is a technical result for Lemma 6. The function x .− y is defined as x − y if x ≥ y and 0 otherwise.
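The construction of Lemma 4 — a zero network feeding a chain of k successor networks — can be sketched as follows; the function names are ours, and each Python function stands in for the network it is named after:

```python
def n_zero(x):
    """The network N_n: outputs 0 on any input."""
    return 0

def n_succ(x):
    """The network N_s: outputs x + 1."""
    return x + 1

def constant_net(k):
    """Chain k copies of N_s after N_n, so the composite computes
    J_k(x) = s^k(0) = k, as in Lemma 4."""
    def net(x):
        a = n_zero(x)
        for _ in range(k):
            a = n_succ(a)
        return a
    return net

print(constant_net(5)(42))  # → 5 regardless of the input
```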
Proof. Let N .−(•) have two input nodes n 1 1 , n 1 2 and one output node n 2 1 . Let w 1 1,1 = w 1 2,1 = 1 and let the activation function of n 2 1 be α 2 1 ( a, w 1 ) = max( a[1] − a[2], 0). Then, for a = a ∈ N 2 , we have N .−(•) ( a) = a[1] .− a[2].

Definition 6 confines the notion of N-computability of some function f (x, t) to the N-computability of the first k elements of the sequence { f (x, t)}, t ∈ N.

Definition 6. A function f : A n × N → B m , for some sets A and B, is N-computable elementwise for any k > 0 if there is a network N z such that, for any z ∈ A n , the first k + 1 terms of the sequence { f (z, j)} = f (z, 0), f (z, 1), . . ., f (z, k), . . . are the same as the terms of the tuple (N z (z, 0), . . ., N z (z, k)).

Thus, if a function f (x, t) is N-computable, it is N-computable elementwise for any positive k.

Lemma 6. Let φ : N 2 → N be N-computable elementwise and h(t) be a function obtained from φ by Schema (12). Then, h is N-computable elementwise.
Proof. Let φ be computable elementwise by N φ(•) . Let N h 0 (0) = N J k (0) = k, as constructed in Lemma 4. In the equations below, we abbreviate N n(•) (0) as 0, N J k (0) as k, N J t (0) as N J t , N .−(•) (x, y) as x .− y, and N h i (i) as N h i . Let N h t+1 (t + 1) = N φ(•) (t, N h t (t)). By induction on t, h(t) = N h(•) (t) (cf. Figure 4). Then, the first m + 1 terms of the sequence {h(t)} are the same as the terms of the tuple (N h(•) (0), . . ., N h(•) (m)) (cf. Figure 5 for m = 3).

Lemma 7. Let f : N n → N and g : N n+2 → N be N-computable elementwise and h : N n+1 → N be a function obtained from f and g by Schema (13). Then, h is N-computable elementwise.
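The unrolling used in Lemma 6 can be mimicked in Python: each "layer" applies φ once, and tap t of the resulting structure holds h(t). This is a sketch under our naming, not the network construction itself, but it computes the same m + 1 values:

```python
def unroll(phi, k, m):
    """Unroll Schema (12), h(0) = k and h(t + 1) = phi(t, h(t)), into
    m + 1 output taps, as in Lemma 6: tap t holds h(t) for 0 <= t <= m."""
    taps = [k]
    for t in range(m):
        taps.append(phi(t, taps[-1]))
    return taps

# h(0) = 1, h(t + 1) = (t + 1) * h(t), i.e., h(t) = t!
print(unroll(lambda t, h: (t + 1) * h, 1, 4))  # → [1, 1, 2, 6, 24]
```

As in the lemma, the number of taps — and hence the number of layers in the corresponding network — is fixed in advance by m, which is why only finitely many terms of {h(t)} are obtained.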
We can ask if the elementwise N-computability of h(x, t) (cf. Definition 6) can be generalized to N-computability. In other words, is it possible to have the sequences {h(x, t)} and {N(x, t)} agree term by term, i.e., h(x, t) = N(x, t), for all t ∈ N? Since N has a finite set of neurons organized into a finite number of layers, N can compute, per Lemmas 6 and 7, only the first m + 1 values of h(x, t), i.e., h(x, t), 0 ≤ t ≤ m, although m can be an arbitrarily large natural number. Thus, the answer to this question is negative.
Let us assume that N h(x,t) in Theorem 1 is allowed to have countably many neurons so that the number of neurons in the hidden layers of N h(x,t) is countable. Let ζ N (x, t) be the function computed by N h(x,t) . Since countably many neurons can be added to N h(x,t) to compute h(x, t) for any t, we have the sequence {ζ N (x, t)} = {N(x, t)}, on the one hand, and the sequence {h(x, t)}, on the other hand. Let f (x, t) = h(x, t) − ζ N (x, t). Since h(x, t) = ζ N (x, t), for any t ∈ N, { f (x, t)} is vacuously convergent, i.e., lim t→∞ f (x, t) = 0. Hence, we have the following theorem.

Theorem 2. Let h(x, t) be a primitive recursive function, x ∈ N n , n ≥ 0. Then there is a network N(x, t) with countably many neurons such that, for any z ∈ N n , the sequences {h(z, t)} and {ζ N (z, t)} agree term by term, i.e., h(z, t) = ζ N (z, t), t ∈ N.

Discussion
As mathematical objects, feedforward artificial neural networks are more computationally powerful than primitive recursive functions inasmuch as the former can compute functions over real numbers whereas the latter, by definition, cannot. E.g., one can define a network that computes the sum of n real numbers, which no primitive recursive function can compute. However, the situation changes when networks cease to be mathematical objects and become computational objects by being realized on finite memory devices. A finite memory device is a computational device with a finite amount of memory available for numerical computation [21]. Such a device is analogous to a human scribe with a pencil and an eraser who is to carry out a numerical computation by writing and erasing symbols from a finite alphabet on a finite number of paper sheets. Finite memory devices are different from the finite state automata of classical computability theory (e.g., a deterministic finite state machine (Chap. 2, Sec. 2.2 in [22]), a non-deterministic finite state machine (Chap. 2, Sec. 2.3 in [22]), a Mealy or Moore machine (Chap. 2, Sec. 2.7 in [22]), a push down automaton (Chap. 5 in [22]), or a Turing machine (Chap. 6 in [7])), because the latter do not put any bounds on the number of cells in their tapes available for computation. A finite state automaton of classical computability becomes a finite memory device only when the number of its tape cells available for computation is bounded by a natural number.
A real number x is signifiable on a finite memory device D j if and only if the finite amount of memory on D j can hold its sign, where a sign is a sequence of arbitrary symbols from a finite alphabet [21]. Thus, if the alphabet is { ".", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9" } and D j has 8 memory cells to represent a real number, then the real numbers 1.41, 1.414, 1.4142, 1.41421, and 1.414213 are signifiable on D j as "1.41", "1.414", "1.4142", "1.41421", and "1.414213", respectively, whereas the real numbers 1.4142135, 1.41421356, 1.414213562, 1.4142135623, and 1.41421356237 are not. A consequence of the finite amount of memory is that the set of real numbers signifiable on D j is finite and, hence, vacuously countable. To put it differently, Cantor's theorem (§ 2 in [23]) does not apply insomuch as the number of signifiable reals on D j in any interval (α, β), α, β ∈ R, α < β, is finite. Consequently, all computation of a feedforward artificial neural network N z : R n → R m , 0 < n, m, realized on D j can be packed into a unique natural number Ω z such that there exists a primitive recursive function f : N → N such that ζ z ( x) = a if and only if f (x̃) = ã, where x̃ uniquely corresponds to x and ã to a (cf. Theorem 1, pp. 15-17 in [21]). Theorem 1 is, after a fashion, the converse of Theorem 1 in [21] in the sense that it shows how one can construct a network from a primitive recursive function.
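The notion of signifiability can be illustrated with a short sketch; the alphabet and the cell count follow the example above, and `signifiable` is our name for the check:

```python
ALPHABET = set(".0123456789")

def signifiable(sign, cells):
    """A real number is signifiable on a device with `cells` memory cells iff
    its sign -- a string over the finite alphabet -- fits into the cells."""
    return len(sign) <= cells and set(sign) <= ALPHABET

# With 8 cells, "1.414213" fits but "1.4142135" does not.
print(signifiable("1.414213", 8), signifiable("1.4142135", 8))  # → True False
```

Since at most |ALPHABET|^cells distinct signs exist, the set of signifiable reals on such a device is finite, which is the countability observation made above.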
Theorem 2 shows that all values of a primitive recursive function can be computed exactly by a feedforward artificial neural network if the network is allowed to have countably many neurons. This purely theoretical result contributes to the growing collection of universality theorems on feedforward neural networks and various classes of functions (cf. Ch. 4 in [17]). Thus, Hornik et al. [13] show that multilayer feedforward networks with a single hidden layer of neurons with arbitrary squashing activation functions can approximate any Borel measurable function from one finite-dimensional space to another to any desired degree of accuracy so long as the number of the neurons in the hidden layer is unbounded. Gripenberg [14] shows that the general approximation property of feedforward perceptron networks is achievable when the number of perceptrons in each layer is bounded but the number of layers is allowed to grow to infinity and the perceptron activation functions are continuously differentiable and not linear. Guliyev and Ismailov [15] show that single hidden layer feedforward neural networks with the fixed weights of one and two neurons in the hidden layer approximate any continuous function on a compact subset of the real line and proceed to demonstrate that single hidden layer feedforward networks with fixed weights cannot approximate all continuous multivariate functions.
We conclude our discussion with a caveat about universality results of feedforward neural networks with unbounded numbers of neurons.While these results provide valuable theoretical insights, they may not hold much sway with computer scientists interested in computability properties of finite AI, because networks with unbounded numbers of neurons cannot be realized on computational devices with finite amounts of computational memory.

Conclusions
We have formalized feedforward artificial neural networks with recurrence equations and proposed a formal definition of the concept of N-computability, i.e., the property of a function to be computed by a feedforward artificial neural network N. We have shown that, for a primitive recursive function h(x, t), where x is an n-tuple of natural numbers and t is a natural number, there exists a feedforward artificial neural network N(x, t) such that, for any n-tuple of natural numbers z and any positive natural number m, the first m + 1 terms of the sequence {h(z, t)} agree elementwise with the tuple (N(z, 0), . . ., N(z, m)). Our investigation contributes to the knowledge of the classes of functions that can be computed by feedforward artificial neural networks. Since such networks are used in some finite AI systems, our investigation may be of interest to mathematicians and computer scientists interested in the computability theory of finite AI.

Figure 1. A 3-layer feedforward artificial neural network. Layer 1 includes the input neurons n 1 1 and n 1 2 . Layer 2 includes the neurons n 2 1 , n 2 2 , n 2 3 . Layer 3 includes the neurons n 3 1 , n 3 2 . The two arrows incoming into n 1 1 and n 1 2 signify that layer 1 is the input layer. The two arrows going out of n 3 1 and n 3 2 signify that layer 3 is the output layer. The weight of the synapse from n e i to n e+1 j is w e i,j , 1 ≤ e ≤ 3. E.g., w 1 1,1 is the weight of the synapse from n 1 1 to n 2 1 and w 2 3,1 is the weight of the synapse from n 2 3 to n 3 1 .

Figure 2. A chain network N l that consists of two networks N j (top) and N k (second from the top). The two bottom networks are functionally identical pictogrammatic renderings of the same network N l . In the third network from the top, the output y of N j is made explicit. In the bottom rendering of N l , y is implicit in the arrow from N j to N k . In sum, the output of N j is given to N k , and the output of N k is the output of N l . Thus, N l maps x to z.

Lemma 5. The function x .− y, defined as x − y if x ≥ y and 0 otherwise, is N-computable.

Figures 6 and 7 illustrate sample constructions of Lemma 7. If we treat h(t) as a shorthand for h((), t), then Lemmas 6 and 7 give us the following theorem.

Theorem 1. Let h(x, t) be a primitive recursive function, where x is an n-tuple of natural numbers and t is a natural number. Then, for any positive natural number m, there is a network N(x, t) such that, for any n-tuple of natural numbers z, the first m + 1 terms of the sequence {h(z, t)} are the same as the terms of the tuple (N(z, 0), . . ., N(z, m)).