On Correspondences between Feedforward Artificial Neural Networks on Finite Memory Automata and Classes of Primitive Recursive Functions

Abstract: When realized on computational devices with finite quantities of memory, feedforward artificial neural networks and the functions they compute cease being abstract mathematical objects and turn into executable programs generating concrete computations. To differentiate between feedforward artificial neural networks and their functions as abstract mathematical objects and the realizations of these networks and functions on finite memory devices, we introduce the categories of general and actual computabilities and show that there exist correspondences, i.e., bijections, between functions computable by trained feedforward artificial neural networks on finite memory automata and classes of primitive recursive functions.


Introduction
An offspring of McCulloch and Pitts' research on the foundations of cybernetics [1], artificial neural networks (ANNs) entered mainstream machine learning after the discovery of backpropagation by Rumelhart, Hinton, and Williams [2]. ANNs proved to be universal approximators of different classes of functions when no limits are imposed on the number of artificial neurons in any layer (arbitrary width) or on the number of hidden layers (arbitrary depth), and even with bounded widths and depths (e.g., [3][4][5]). ANNs cease being abstract mathematical objects when implemented in specific programming languages on computational devices with finite quantities of internal and external memory, to which we refer interchangeably in our article as finite memory devices (FMDs) and finite memory automata (FMA). To differentiate between functions computable by ANNs in principle and functions computable by ANNs realized on FMA, we introduce the categories of general and actual computabilities and show that there exist correspondences, i.e., bijections, between functions computable by trained feedforward ANNs (FANNs) on FMA and classes of primitive recursive functions.
Our article is organized as follows. In Section 2, we expound the terms, definitions, and notational conventions for functions and predicates espoused in this article and define the term finite memory automaton. In Section 3, we explicate the categories of general and actual computabilities and elucidate their similarities and differences. In Section 4, we formalize FANNs in terms of recursively defined functions. In Section 5, we present primitive recursive techniques to pack finite sets and Cartesian powers thereof into Gödel numbers. In Section 6, we use the set packing techniques of Section 5 to show that functions computable by trained FANNs implemented on FMA can be archived into natural numbers. In Section 7, we show how such archives can be used to define primitive recursive functions corresponding to functions computable by FANNs. In Section 8, we discuss theoretical and practical reasons for separating computability into the general and actual categories and pursue some implications of the theorems proved in Section 7. In Section 9, we summarize our conclusions. For the reader's convenience, Appendix A gives supplementary definitions, results, and examples that are referenced in the main text when relevant.

Functions and Predicates
If f is a function, dom(f) and codom(f) denote the domain and the co-domain of f, respectively. Statements such as f : S → R abbreviate the logical conjunction dom(f) = S ∧ codom(f) = R. A function f is partial on a set S if dom(f) is a proper subset of S, i.e., dom(f) ⊂ S. Thus, if S = N = {0, 1, 2, . . .} and f(x) = x^{1/3}, then f is partial on S, because dom(f) = {i^3 | i ∈ N} ⊂ N. If S and R are sets, then S = R is logically equivalent to the logical conjunction S ⊆ R ∧ R ⊆ S, i.e., S is a subset of R, and vice versa. If f is partial on S and z ∈ S, the following statements are equivalent: (1) z ∈ dom(f); (2) f is defined on z; (3) f(z) is defined; and (4) f(z) ↓. The following statements are also equivalent: (1) z ∉ dom(f); (2) f is undefined on z; (3) f(z) is undefined; and (4) f(z) ↑. If dom(f) = S, then f is total on S. Thus, f(x) = x + 1 is total on N. When f : S → R is a bijection, i.e., f is injective (one-to-one) and surjective (onto), f is a correspondence between S and R.
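As a concrete illustration of partiality (a sketch of ours, not part of the article's formalism; the function name is hypothetical), the cube-root function discussed above can be checked on N:

```python
def cube_root(x: int):
    """f(x) = x**(1/3) on N: defined only when x is a perfect cube."""
    i = 0
    while i * i * i < x:
        i += 1
    return i if i * i * i == x else None   # None models f(x) being undefined

# 27 is in dom(f): cube_root(27) == 3; 28 is not: cube_root(28) is None
```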

Finite Memory Automata
A finite memory device D_j is a physical or abstract automaton with a finite quantity of internal and external memory and an automated capability of executing programs, i.e., finite sequences of instructions written in a formalism, e.g., a programming language for D_j, and stored in the finite memory of D_j. Since bijections exist between expressions over any finite alphabet, i.e., a finite set of symbols or signs, and subsets of N [6], we call the memory of D_j numerical memory. The numerical memory consists of registers, each of which is a sequence of numerical unit cells, e.g., digital array cells, mechanical switches, and finite state machine tape cells. The quantity of numerical memory is the product of the number of registers and the number of unit cells in each register, i.e., this quantity is a natural number.
A real number x is signifiable on D_j iff a register on D_j can hold its sign. Put another way, a number is signifiable on D_j if, in a programming language L for D_j, the number's sign can be assigned to a variable, i.e., stored in a designated register. When D_j is clear from the context, we simply say that x is signifiable. A set or a sequence of numbers is signifiable if each number in the set or sequence is signifiable.
∆_j > 0 is the smallest positive signifiable real number on D_j iff for any signifiable x, there is no signifiable y such that x < y < x + ∆_j. The finite set of real numbers in the closed interval between 0 and 1 signifiable on D_j is

R^j_{0,1} = {x ∈ R | 0 ≤ x ≤ 1 ∧ x is signifiable on D_j}. (1)

We note, in passing, a notational convention in Equation (1) to which we adhere in our article: if D_j is an FMA, then the Latin letter j in subscripts or superscripts of symbols is used to emphasize that they are defined with respect to D_j. Thus, if D_j and D_k are two FMDs with different quantities of numerical memory, ∆_j ≠ ∆_k.

Lemma 1. If z = i∆_j is a maximal element of {x ∈ R | x = i∆_j < 1} and y = (i + 1)∆_j, then y ≥ 1.
R^j_{a,b} is the finite set of signifiable numbers in the closed interval from a to b such that there exists no signifiable number between any two consecutive members of R^j_{a,b} when the latter is sorted in non-descending order.
Lemmas 1 and 2 draw on the empirically verifiable fact manifested by division underflow errors in modern programming languages: given an FMD D_j and two signifiable real numbers a and b, with a < b, the set of signifiable real numbers in the closed interval between a and b is a proper finite subset of the set of real numbers R. Thus, bijections are possible between R^j_{a,b} and finite subsets of N. While these bijections may differ from FMA to FMA in that they depend on the exact quantity of memory on a given FMA, they differ only in terms of the cardinalities of their domains and co-domains: the larger the quantity of memory, the greater the cardinality. A constructive interpretation of Lemmas 1 and 2 is that if we take two signifiable real numbers a and b such that b − a ≥ ∆_j, we can effectively enumerate the elements of Z^j_{a,b} by iteratively adding increasing integer multiples of ∆_j to a until we reach b, i.e., a + z∆_j = b, or go slightly above it, i.e., a + (z − 1)∆_j < b < a + z∆_j, for z > 0.
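The constructive enumeration just described can be sketched as follows, with ∆_j fixed at 1/100 purely as an assumption for illustration (a real FMD fixes ∆_j by its register width); exact rationals stand in for signifiable number signs:

```python
from fractions import Fraction

DELTA = Fraction(1, 100)   # assumed smallest positive signifiable number

def enumerate_signifiable(a, b):
    """Enumerate a, a + DELTA, a + 2*DELTA, ... until b is reached or just exceeded."""
    grid, z = [], 0
    while a + z * DELTA < b:
        grid.append(a + z * DELTA)
        z += 1
    grid.append(a + z * DELTA)     # a + z*DELTA = b, or slightly above b
    return grid

points = enumerate_signifiable(Fraction(0), Fraction(1, 20))
# 6 grid points: 0, 1/100, 2/100, 3/100, 4/100, 5/100
```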

Computability: General vs. Actual
Computability theory lacks a uniform, commonly accepted formalism for computable, partially computable, and primitive recursive functions. The treatment of such functions in our article is based, in part, on the formalism by Davis, Sigal, and Weyuker (Chapters 2 and 3 in [7]), which has, in turn, much in common with Kleene's formalism (Chapter 9 in [8]). Alternative treatments include [9], where primitive recursive functions are formalized as loop programs consisting of assignment and iteration statements similar to DO statements in FORTRAN, and [10], where λ-calculus is used. These symbolically different treatments have one feature in common: computable, partially computable, and primitive recursive functions operate on natural numbers, and the underlying automata, explicit or implicit, on which these functions can, in principle, be executed if implemented as programs in some formalism, have access to infinite numerical memory. To distinguish computability in principle from computability on finite memory automata, we introduce the categories of general and actual computabilities.

General Computability
As our formalism in this section, we use the programming language L developed in Chapter 2 in [7] and subsequently used in that book to define partially computable, computable, and primitive recursive functions and to prove various properties thereof. An L program P is a finite sequence of L instructions. The unique variable Y is designated as the output variable, where the output of P on a given input is stored. X_1, X_2, . . . designate input variables, and Z_1, Z_2, . . . refer to internal variables, i.e., variables in P that are not input variables. No bounds are imposed on the magnitude of natural numbers assigned to variables. L has conditional dispatch instructions; line labels; elementary arithmetic operations on and comparisons of natural numbers; and macros, i.e., statements expandable into primitive L instructions.
A computation of P on some input x ∈ N^m, m > 0, is a finite sequence of snapshots (s_1, . . ., s_k), where each snapshot s_{1≤i≤k}, k > 0, specifies the number of the instruction in P to be executed and the value of each variable in P. The snapshot s_1 is the initial snapshot, where the values of all input variables are set to their initial values, the program instruction counter is set to 1, i.e., the number of the first instruction in P, and the values of all the other variables in P are set to 0. The snapshot s_k in (s_1, . . ., s_k) is a terminal snapshot, where the instruction counter is set to the number of the instructions in P plus 1. Not all snapshot sequences are computations. If (s_1, s_2, . . ., s_k) is a computation of P on x ∈ N^m, i.e., X_1 = x_1, X_2 = x_2, . . ., X_m = x_m, then there is a function that, given the text of P and a snapshot s_{1≤i<k} in the computation, generates the next snapshot s_{i+1} of the computation. This function can verify if (s_1, . . ., s_k) constitutes the computation of P on x. The existence of such functions implies that each instruction in L is interpreted unambiguously. If a program P in L takes m inputs, then ψ^{(m)}_P(x_1, x_2, . . ., x_m) denotes the value of Y in the terminal snapshot s_k if there exists a computation (s_1, . . ., s_k) of P on (x_1, x_2, . . ., x_m) and is undefined otherwise.
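A minimal sketch of snapshots and computations, under our own simplifying assumptions (a toy instruction set standing in for L, and Y pre-loaded with X1 for brevity, whereas in L proper all non-input variables start at 0):

```python
# A toy instruction set standing in for L: INC/DEC on a variable, JZ
# (conditional dispatch), and JMP. The program computes Y = X1 + X2.
PROG = [
    ("JZ", "X2", 5),   # 1: if X2 == 0, dispatch to 5 = len(PROG) + 1
    ("DEC", "X2"),     # 2: X2 <- X2 - 1
    ("INC", "Y"),      # 3: Y  <- Y + 1
    ("JMP", 1),        # 4: back to instruction 1
]

def run(x1, x2):
    """Return the computation of PROG on (x1, x2) as a list of snapshots."""
    env = {"X1": x1, "X2": x2, "Y": x1}        # Y pre-loaded with X1 (see above)
    ctr = 1
    snapshots = [(ctr, dict(env))]             # the initial snapshot s_1
    while ctr <= len(PROG):                    # terminal when ctr = len(PROG) + 1
        op = PROG[ctr - 1]
        if op[0] == "JZ":
            ctr = op[2] if env[op[1]] == 0 else ctr + 1
        elif op[0] == "JMP":
            ctr = op[1]
        else:
            env[op[1]] += 1 if op[0] == "INC" else -1
            ctr += 1
        snapshots.append((ctr, dict(env)))     # s_{i+1} from s_i and the text of PROG
    return snapshots

snaps = run(2, 3)   # terminal snapshot has counter 5 and Y = 5
```

Each snapshot pairs the instruction counter with the variable values, and the successor of each snapshot is computed from the program text alone, mirroring the unambiguous-interpretation point above.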
Let k ∈ N, n ∈ N^+, and let φ : N^2 → N, f : N^n → N, g : N^{n+2} → N be total. If h is obtained from φ by the recurrences in (9) or from f and g by the recurrences in (10), then h is obtained from φ, or from f and g, by primitive recursion or simply by recursion. The recurrences in (10) are isomorphic to Gödel's recurrences (Section 2, Equation (2) in [6]), where he introduces the concept of a recursively defined number-theoretic function. The three functions in (11) are the initial functions.
Definition 3. A function is primitive recursive if it can be obtained from the initial functions by a finite number of applications of composition and recursion in (8)-(10).
An implication of Definition 3 is that if f is a primitive recursive function, then there is a sequence of functions (f_1, . . ., f_n = f), n > 0, where every function in the sequence is an initial function or is obtained from the previous functions in the sequence by composition or recursion.
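Since the displays (9)-(11) are not reproduced here, the following sketch assumes the standard primitive-recursion scheme h(x, 0) = f(x), h(x, t + 1) = g(t, h(x, t), x) and shows addition and multiplication built from the initial functions (the Python names are ours):

```python
def s(x):                  # successor, one of the initial functions
    return x + 1

def prim_rec(f, g):
    """h from f and g: h(x, 0) = f(x), h(x, t + 1) = g(t, h(x, t), x)."""
    def h(x, t):
        acc = f(x)
        for i in range(t):             # unfold the recurrence t times
            acc = g(i, acc, x)
        return acc
    return h

add = prim_rec(lambda x: x,                    # add(x, 0) = x
               lambda t, acc, x: s(acc))       # add(x, t+1) = s(add(x, t))
mul = prim_rec(lambda x: 0,                    # mul(x, 0) = 0
               lambda t, acc, x: add(acc, x))  # mul(x, t+1) = mul(x, t) + x

# add(3, 4) == 7 and mul(3, 4) == 12
```

The pair (add, mul) is exactly a sequence (f_1, f_2) in the sense above: mul is obtained from add, which is obtained from the initial functions.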
A class C of total functions is primitive recursively closed (PRC) if the initial functions are in it and any function obtained from the functions in C by composition or recursion is also in C. It has been shown (Chapter 3 in [7]) that (1) the class of computable functions is PRC; (2) the class of primitive recursive functions is PRC; and (3) a function is primitive recursive iff it belongs to every PRC class. A corollary of (3) is that every primitive recursive function is computable.
If C includes all functions of a certain type, we refer to it as the class of those functions, e.g., the class of partially computable functions, the class of computable functions, the class of primitive recursive functions, etc. When we say that C is a class of functions of a certain type, we mean that C ⊆ C*, where C* is the class of all functions of that type.

Actual Computability
In general, the FMA defined in Section 2.2 is different from the finite state automata of classical computability theory, because the latter, e.g., a Turing machine (TM), do not impose any limitations on memory. A TM becomes an FMA iff the number of cells on its tape where it reads and writes symbols is finite. Analogously, a finite state automaton (FSA) of classical computability is an FMA iff there is a limit, expressed as a natural number, on the length of the input tape from which the FSA reads sign sequences over a given alphabet.
As is the case with general computability, we let P^j_L be an L program, i.e., a finite sequence of unambiguous instructions in a programming language L for an FMD D_j. Thus, if D_j is a physical computer with an operating system, e.g., Linux, a programming language for D_j can be Lisp, C, Perl, Python, etc. If D_j is an abstract FMA, e.g., a TM with a finite number of cells on its tape, then D_j is programmed with the standard quadruple formalism (Chapter 6 in [7]). If D_j is a mechanical device, then we assume that there is a formalism that consists of instructions such as "set switch i to position p", "turn handle full circle clockwise t times", etc. A state of D_j while executing P^j_L on some input x includes the number of the instruction in P^j_L to execute next and, depending on D_j, may include the contents of each register, the signs on the finite input tape, or the state of each mechanical switch. As we did with general computability, we call such a state a snapshot of D_j for P^j_L(x) and define a computation of P^j_L(x) on D_j to be a finite sequence of snapshots (s_1, . . ., s_k), k ≥ 1, where each subsequent snapshot is computed from the previous snapshot, the initial snapshot s_1 has the values of all the variables in P^j_L appropriately specified and the instruction counter of P^j_L set to 1, and the terminal snapshot s_k has the instruction counter set to the number of the instructions in P^j_L plus 1. We let Ψ^{(m)}_P(x) denote the number sign corresponding to the output of P^j_L(x) executed on D_j. It is irrelevant to our discussion where this number sign is stored (e.g., in a register, a section of a finite tape, or the sequence of the positions of the mechanical switches examined left to right or right to left, etc.) so long as it is understood that the output, whenever there is a computation, is unambiguously interpreted as a real number according to an interpretation fixed a priori.
Equation (12) of actual computability is interpreted so that, for any x ∈ R^m and z ∈ R signifiable on D_j, f(x) = z iff Ψ^{(m)}_P(x) = z, and f(x) ↑ iff Ψ^{(m)}_P(x) ↑. However, unlike Equation (7) of general computability, which is defined only on natural numbers, where every natural number is signifiable by implication, in actual computability, we have to make provisions for non-signifiable real numbers. Toward that end, we introduce the following inequality, which holds when a non-signifiable number is encountered during a computation of P^j_L(x):

f(x) ≠ Ψ^{(m)}_P(x). (13)

Inequality (13) can be illustrated with two examples. Let D_j have two cells per register, let f(x_1, x_2) = x_1 + x_2, and let P^j_L(x_1, x_2) be a program that implements f, i.e., adds the two number signs of x_1 and x_2 and puts the number sign of x_1 + x_2 in a designated output register. Let number signs be interpreted in standard decimal notation. Furthermore, if some number x is not signifiable on D_j, only the first two elementary signs of the number sign of x are placed into a register, i.e., number signs are truncated to fit into registers, as is common in many programming languages. Then, after "100" is truncated to "10", f(99, 1) = 100 ≠ Ψ^{(2)}_P(99, 1) = 10, and f(213, 13) = 226 ≠ Ψ^{(2)}_P(213, 13) = 34, because 213 is not signifiable on D_j and is truncated to "21". In both cases, f(x_1, x_2), as a mathematical object, is total, and there is a computation of P^j_L(x_1, x_2) on x_1 = 99, x_2 = 1 and on x_1 = 213, x_2 = 13, but during both computations, non-signifiable numbers, i.e., 100 and 213, are encountered.

Definition 5. A function f : R^m → R, m ∈ N^+, is actually computable on D_j if it is total, i.e., (∀x ∈ R^m) f(x) ↓, and actually partially computable.
A program P^j_L that implements an actually computable f(x) is guaranteed to have a computation for any signifiable x. However, Inequality (13) may still hold if a non-signifiable number is produced during a computation. Functions can be defined for a specific D_j so that they deal only with signifiable numbers, e.g., whose domains and co-domains are, respectively, finite signifiable proper subsets of R^m and R. The next definition characterizes these functions.

Definition 6. A function f : R^m → R, m ∈ N^+, is absolutely actually computable on D_j if it is actually computable and Inequality (13) holds for no computation of P^j_L(x), where x is signifiable on D_j.
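The truncation behavior in the two-cells-per-register example above can be sketched as follows (a toy model of ours; `store` and `psi_add` are hypothetical names):

```python
CELLS = 2   # two numerical unit cells per register, as in the example

def store(x: int) -> int:
    """Place the sign of x into a register, truncating to CELLS decimal digits."""
    return int(str(x)[:CELLS])

def psi_add(x1: int, x2: int) -> int:
    """The actual computation Psi of f(x1, x2) = x1 + x2 on the device."""
    r1, r2 = store(x1), store(x2)    # non-signifiable inputs are truncated
    return store(r1 + r2)            # ... and so is the output register

# f(99, 1) = 100 but psi_add(99, 1) = 10   ("100" truncated to "10")
# f(213, 13) = 226 but psi_add(213, 13) = 34 ("213" truncated to "21")
```

Both cases satisfy Inequality (13): the mathematical f is total, the device has a computation, yet the actual output disagrees with f.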
An implication of Definitions 4-6 is that if f : N^m → N satisfies Definition 4, it is partially computable according to Definition 1, and if it satisfies Definition 5 or 6, it is computable according to Definition 2, because, if no memory limitations are placed on registers, every natural number is signifiable.
We call an FMD D_j sufficiently significant if three conditions are satisfied. First, there exists a programming language L for D_j with the same control structures as the programming language L described in Section 3.1 such that L is (1) capable of signifying a finite subset of R and (2) capable of specifying the following operations on numbers: addition, subtraction, multiplication, division; assignment, i.e., setting the value of a register to a number sign; comparison, i.e., a = b, a < b, a > b, a ≤ b, a ≥ b, on any signifiable a and b; and the truncation of the signs of non-signifiable numbers to fit them into registers. Second, the finite memory of D_j suffices to hold L programs of length ≤ N ∈ N^+, where the length of a program is the number of instructions in it. Third, the finite memory of D_j suffices, in addition to holding a program of at most N instructions, to hold number signs in K ∈ N^+ registers.
Lemma 3. Let an FMA D_j be sufficiently significant with K ≥ 7, let a, b be signifiable with b − a ≥ ∆_j, and let a + z∆_j, z > 0, be the smallest signifiable number greater than or equal to b. Let µ^j_{a,b} be the bijection in (5). Then, µ^j_{a,b} is absolutely actually computable.

Proof (continuing the construction in Lemma 2). If x < b, P^j_L goes into a while loop with the condition ρ_5 < ρ_2, i.e., a + i∆_j < b. Inside the loop, when ρ_5 = ρ_6, ρ_3 is incremented by 1 and placed into the output register ρ_7, and P^j_L exits. Otherwise, the loop continues with ρ_3 incremented by 1. If x = b, P^j_L goes into a while loop with the condition ρ_5 ≤ ρ_2, i.e., a + i∆_j ≤ b, and keeps incrementing ρ_3 by 1 inside the loop. After the loop terminates, ρ_3 is incremented by 1 and placed into the output register ρ_7, and P^j_L exits.
A corollary of Lemma 3 is that (µ^j_{a,b})^{-1} is absolutely actually computable.
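A simplified sketch of the iteration in the proof of Lemma 3, with ∆_j = 1/10 assumed for illustration and Python variables standing in for the registers ρ_1, . . ., ρ_7:

```python
from fractions import Fraction

DELTA = Fraction(1, 10)   # assumed smallest positive signifiable number

def mu(a, b, x):
    """mu^j_{a,b}(x): 1-based index of signifiable x in the grid a, a + DELTA, ..."""
    i = 0                                  # plays the role of register rho_3
    while a + i * DELTA < b:               # the loop condition a + i*DELTA < b
        if a + i * DELTA == x:
            return i + 1                   # placed into the output register
        i += 1
    return i + 1                           # x is the last grid point, >= b

# On [0, 1]: mu(0, 1, 0) = 1, mu(0, 1, 3/10) = 4, mu(0, 1, 1) = 11
```

Every number the sketch touches (a, b, x, the grid points) stays in the signifiable grid, which is why the function is absolutely actually computable rather than merely actually computable.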

A Recursive Formalization of Feedforward Artificial Neural Networks
A trained feedforward artificial neural network (FANN) N^j_z implemented in a programming language L on a sufficiently significant FMA D_j is a finite set of artificial neurons, each of which is connected to a finite number of the neurons in the same set through synapses, i.e., directed weighted edges (see Figure 1). The neurons are organized into k + 1 layers E_0, E_1, . . ., E_k, with E_0 being the input layer, E_k being the output layer, and E_e, 0 < e < k, being the hidden layers. We let E^j_z denote the number of layers in N^j_z. In Figure 1, layer 1 includes the neurons n^1_0, n^1_1, and n^1_2; layer 2 includes the neurons n^2_0 and n^2_1; the arrows coming into n^0_0 and n^0_1 signify that layer 0 is the input layer; the two arrows going out of n^2_0 and n^2_1 signify that layer 2 is the output layer; w^e_{i,j}, 0 < e < 3, is the weight of the synapse from n^{e−1}_i to n^e_j, e.g., w^1_{0,0} is the weight of the synapse from n^0_0 to n^1_0, and w^2_{2,1} is the weight of the synapse from n^1_2 to n^2_1.
We assume that N^j_z is trained, i.e., the synapse weights are fixed, automatically or manually, and fully connected, i.e., there is a synapse from every neuron in layer E_{e−1} to every neuron in layer E_e. Each synapse has a weight, i.e., a signifiable real number, associated with it. We let w^e_{i,j}, 0 < e < E^j_z, denote the weight of the synapse from n^{e−1}_i to n^e_j (see Figure 1) and let w^e refer to a vector of all synaptic weights between E_{e−1} and E_e. We define w^0 = (). Thus, for the FANN N^j_z in Figure 1, w^2 = (w^2_{0,0}, w^2_{0,1}, w^2_{1,0}, w^2_{1,1}, w^2_{2,0}, w^2_{2,1}). We assume, without loss of generality, that all numbers in w^e are in R^j_{0,1} defined in (1), because, if that is not the case, they can be so scaled; nor is there any loss of generality associated with the assumption of full connectivity, because partial connectivity can be defined by setting the weights of the appropriate synapses to 0.
If R^j_{0,1} is abbreviated to R_{0,1}, each n^e_i in N^j_z, e > 0, computes an activation function α^e_i(a^{e−1}, w^e) : R_{0,1}^{nn(e−1)} → R_{0,1}, where a^{e−1} is the vector of the activations, i.e., real signifiable numbers, of the neurons in layer E_{e−1}. For e = 0, α^0_i(x, ()) = x_i, where x ∈ R_{0,1}^{nn(0)} and x_i ∈ R_{0,1}, 0 ≤ i < nn(0). Thus, if nn(0) = 3, as in Figure 1, then, given the input x = (x_0, x_1, x_2) = (0.0, 0.3, 0.6), α^0_0(x, ()) = 0.0. When N^j_z is implemented on a sufficiently significant D_j, all activation functions α^e_i(·) are absolutely actually computable. It is irrelevant to our discussion whether the activation functions are the same, e.g., sigmoid, for all or some neurons, or each neuron has its own activation function.
The term feedforward means that the activations of the neurons are computed layer by layer from the input layer to the output layer, because the activation functions of the neurons in the next layer require only the weights of the synapses connecting the next layer with the previous one and the activation values, i.e., the outputs of the activation functions, of the neurons in the previous layer. To define the activation vectors of individual layers, let

a^0 = (α^0_0(x, ()), . . ., α^0_{nn(0)−1}(x, ())), a^e = (α^e_0(a^{e−1}, w^e), . . ., α^e_{nn(e)−1}(a^{e−1}, w^e)), (16)

where 0 < e < E^j_z and x is an input vector. For each N^j_z, we define the absolutely actually computable function that N^j_z computes as A^j_z(x) = a^{E^j_z − 1}, i.e., the activation vector of the output layer on the input x.
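The layer-by-layer computation in (16) can be sketched as follows; the 2-3-2 shape, the sigmoid activation, and the weight values are illustrative assumptions of ours, not read off Figure 1:

```python
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def feedforward(x, weights):
    """weights[e][i][j] plays the role of w^{e+1}_{i,j}: the synapse n^e_i -> n^{e+1}_j."""
    a = list(x)                                    # a^0: inputs passed through
    for w in weights:                              # one weight matrix per layer
        a = [sigmoid(sum(a[i] * w[i][j] for i in range(len(a))))
             for j in range(len(w[0]))]            # a^e from a^{e-1} and w^e
    return a                                       # activations of the output layer

w1 = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]   # 2 inputs -> 3 hidden neurons
w2 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]] # 3 hidden -> 2 output neurons
out = feedforward([0.0, 0.3], [w1, w2])   # two output activations in (0, 1)
```

Note that each layer's activations depend only on the previous layer's activations and the weight vector between the two layers, exactly as the feedforward property requires.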

Finite Sets as Gödel Numbers
Our primitive recursive techniques to pack finite sets and Cartesian powers thereof into Gödel numbers in this section rely, in part, on our previous work on primitive recursive characteristics of chess [11], which, in turn, was based on several functions shown to be primitive recursive in [7]. For the reader's convenience, Appendix A.1 in Appendix A gives the functions shown to be primitive recursive in [7] and gives the necessary auxiliary definitions and theorems. Appendix A.2 in Appendix A gives the functions, or variants thereof, shown to be primitive recursive in [11]. When we use the functions from [7,11] in this section, we refer to their definitions in the above two sections of Appendix A as necessary.
Let G be a Gödel number (G-number) as defined in (A8). The primitive recursive predicate GP in (18) uses the bounded existential quantification of a primitive recursive predicate defined in (A2) and the primitive recursive functions (x)_i and Lt(x), respectively defined in (A9) and (A10).
The logical structure of GP is GP_1 ∧ {GP_2 ∨ GP_3}, where GP_1, GP_2, and GP_3 are primitive recursive predicates. The predicate GP holds for G-numbers with at least one element and whose elements themselves have the same length, i.e., the same number of elements, greater than 0. Thus, GP([[11], [10]]) holds.
Let G be a G-number and t ∈ N^+. Then, the primitive recursive function τ_3(G, t) computes a Gödel number whose components are Gödel numbers representing all sequences of t + 1 elements of G. Let S = {a_1, a_2, . . ., a_n} ⊂ N^+, S ≠ ∅, and G = [a_1, . . ., a_n]. An induction on t shows that, for t > 0, τ_3(G, t − 1) is a G-number representation of S^t in the sense that every t-tuple (a_{i_1}, . . ., a_{i_t}) ∈ S^t is represented by a component of τ_3(G, t − 1). For t > 1, let G^{t,j}_{a,b} be defined from G^j_{a,b} by τ_3 in (22) and, in particular, for a = 0 and b = 1, let G^{t,j}_{0,1} be defined analogously. Then, G^{t,j}_{0,1} is a G-number representation of (I^j_{0,1})^t, i.e., the t-th Cartesian power of I^j_{0,1}. Since both τ_3 and ∸ are primitive recursive functions, G^{t,j}_{a,b} ∈ N and G^{t,j}_{0,1} ∈ N are primitive recursively computable.
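Since (A8)-(A10) are in Appendix A and not reproduced here, the following sketch assumes the standard prime-power encoding [a_1, . . ., a_n] = Π_i p_i^{a_i} and its accessor (x)_i; the function names are ours:

```python
import math

def primes(n):
    """First n primes by trial division against the primes found so far."""
    ps, k = [], 2
    while len(ps) < n:
        if all(k % p for p in ps):
            ps.append(k)
        k += 1
    return ps

def gnum(seq):
    """[a_1, ..., a_n] = prod_i p_i^{a_i}: pack a sequence into one natural number."""
    return math.prod(p ** a for p, a in zip(primes(len(seq)), seq))

def component(g, i):
    """(g)_i: the exponent of the i-th prime in g, recovering the i-th element."""
    p, e = primes(i)[-1], 0
    while g % p == 0:
        g, e = g // p, e + 1
    return e

g = gnum([1, 2, 3, 4, 5])   # one natural number representing the whole sequence
# component(g, 2) == 2 and component(g, 5) == 5
```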

Ω Numbers
We recall that E^j_z > 0 is the number of layers in N^j_z. Then, for a hidden or output neuron n^e_i, 0 < e < E^j_z, let Ω^{j,e}_{z,i} be the natural number that encodes n^e_i.

FANNs and Primitive Recursive Functions
For 0 ≤ e < E^j_z, 0 ≤ i < nn(e), and x ∈ N, let α̃^e_i(x) be defined from x and e through the functions r(·) and asc(·), which are defined in (A6) and (A19), respectively. An example of computing α̃^e_i is given at the end of Appendix A.3 in Appendix A.
Since µ is an absolutely actually computable bijection, ζ_{nn(1)}(ã^1) = a^1 iff η_{nn(1)}(a^1) = ã^1, whence f(x, 1) = a^1 iff f̃(x̃, 1) = ã^1. Let us assume that f(x, e) = a^e iff f̃(x̃, e) = ã^e for e ≥ 1. Then, the equivalence extends to e + 1, and, for x̃ = η_{nn(0)}(x), the induction goes through.

Theorem 2. Let N^j be the set of FANNs implemented on a sufficiently significant FMA D_j, and let F^j be the set of corresponding absolutely actually computable functions of the FANNs in N^j, as defined in (33). There exists a bijection between N^j and a class of primitive recursive functions.
Let O^j be the set of the numbers Ω^j_z defined in (28), each of which uniquely corresponds to N^j_z ∈ N^j. Let F̃^j be a class of primitive recursive functions, one function per each Ω^j_z ∈ O^j, as defined in (34). Let λ_1 : N^j → A^j, λ_2 : A^j → O^j, and λ_3 : O^j → F̃^j be defined accordingly. Then, the composition λ_3 ∘ λ_2 ∘ λ_1 : N^j → F̃^j is a bijection.

Discussion
The definition of the finite memory device or automaton (FMD or FMA) in Section 2.2 has four main implications. First, a physical or abstract automaton is an FMD when its memory amount is quantifiable as a natural number. Second, characters and strings are not necessary, because bijections exist between any finite alphabet of symbols and natural numbers and, through Gödel numbering, between any strings over a finite alphabet and natural numbers, hence the term numerical memory used in the article. Third, an FSA of classical computability becomes an FMA when the quantity of its internal and external memory is finite, i.e., there is an upper bound in the form of a natural number on the quantity of the machine's memory. It is irrelevant for the scope of this investigation whether the input tape of an FSA, the input and output tapes of such FSA modifications as the Mealy and Moore machines (Chapter 2 in [12]) or the finite state transducers (Chapter 3 in [13]), and the input tape and the stack of a pushdown automaton (PDA) (Chapter 5 in [12]) are considered internal or external memory. Fourth, a universal Turing machine (UTM) (Chapter 6 in [7]) is an FMA when the number of its tape cells is bounded by a natural number, which a fortiori makes any physical computer an FMA. Thus, only one type of universal computer is needed to define all FMA it can simulate.
Consider a universal computer UC capable of executing the universal L program U_1 constructed to prove the Universality Theorem (Theorem 3.1, Chapter 3 in [7]). The computer UC, equivalent to a UTM, takes an arbitrary L program P and an input to that program in the form of a natural number stored in its input register X_1, which can be a Gödel number encoding an array of numbers; executes P on X_1 by encoding the memory of P as another Gödel number; and returns the output of P as a natural number, which can also be a Gödel number encoding a sequence of natural numbers, saved in its output register Y. Since characters and character sequences can be bijectively mapped to natural numbers, UC can simulate any FSA or a modification thereof, e.g., a Mealy machine, a Moore machine, a finite state transducer, or a PDA. Technically speaking, there is no need to distinguish between the Mealy and Moore machines, because they are equivalent (Theorems 2.6, 2.7, Chapter 2 in [12]). When a limit is placed on the numerical memory of UC by way of the number of registers it can use and the size of the numbers signifiable in them, the input and output registers included, UC immediately becomes an FMD, and so, a fortiori, does any device that UC is capable of simulating.
The separation of computability into the two overlapping categories, general and actual, is necessary for theoretical and practical reasons. A theoretical reason, generally accepted in classical computability theory, is that it is of no advantage to put any memory limitations on automata, or on the a priori counts of unit time steps that automata may take to execute programs that implement functions, in order to show that those functions are computable. Were it not the case, we would not be able to investigate what is computable in principle. Rogers [10] succinctly expresses this point of view: "[w]e thus require that a computation terminate after some finite number of steps; we do not insist on an a priori ability to estimate this number." An implication of the above assumption is that an automaton, explicit or implicit, on which the said computation is executed has access to, literally, astronomical quantities of numerical memory. For a thought experiment, consider an automaton programmable in L of Chapter 2 of [7] that we used in Section 3.1, and let a program P^j_L(n), n ∈ N^+, compute the G-number of the sequence (1, . . ., n), i.e., the function computed by P^j_L is f(n) = [1, . . ., n], as defined in (A8). Then, f(n) is a primitive recursive function and, hence, computable in the general sense of Definition 2. Thus, f(n) is signifiable for any n ∈ N^+ on the automaton. In particular, if n is the Eddington number, i.e., n = 10^80 ∈ N^+, estimating the number of hydrogen atoms in the observable universe [14], there is a computation and, by implication, a variable in P^j_L to which the G-number of (1, 2, . . ., 10^80) can be assigned.

The foregoing paragraph brings us to a practical reason for separating computability into the general and actual categories: it is of little use for an applied scientist who wants to implement a number-theoretic function f in a programming language L for an FMA D_j to know that f is generally computable and that an L program can, therefore, compute, in principle, some characteristic of arbitrarily large natural numbers, e.g., the Eddington number. If no natural number greater than some n ∈ N is signifiable on D_j, the scientist must make provisions in the program for the non-signifiable numbers in order to achieve feasible results with absolutely actually computable functions.
Theorem 1 shows that the computation of a trained FANN on a finite memory device can be packed into a unique natural number. Once packed, the natural number can be used as an archive, after a fashion, to look up natural numbers that correspond, in the bijective sense of the term, to the real vectors computed by the function A^j_z of an FANN N^j_z implemented on the device. The correspondence is such that, for any signifiable x, the output of N^j_z on x corresponds to the natural number computed by the associated primitive recursive function on the natural number corresponding to x.
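The archive idea can be sketched as follows, assuming the standard prime-power G-number encoding; the function names and the toy table f(i) = 2i on {1, . . ., 5} are illustrative assumptions of ours:

```python
from math import prod

def primes(n):
    ps, k = [], 2
    while len(ps) < n:
        if all(k % p for p in ps):
            ps.append(k)
        k += 1
    return ps

def archive(outputs):
    """Pack the whole table f(1), ..., f(n) into one natural number."""
    return prod(p ** a for p, a in zip(primes(len(outputs)), outputs))

def lookup(omega, x):
    """Recover f(x) from the archive alone: the exponent of the x-th prime."""
    p, e = primes(x)[-1], 0
    while omega % p == 0:
        omega, e = omega // p, e + 1
    return e

OMEGA = archive([2, 4, 6, 8, 10])   # toy table of f(i) = 2*i on {1, ..., 5}
# lookup(OMEGA, 3) == 6
```

Once OMEGA is fixed, every output is recovered by arithmetic on that single number, which is the sense in which a finite function's entire behavior is archived into one natural number.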

Conclusions
To differentiate between feedforward artificial neural networks and their functions as abstract mathematical objects and the realizations of these networks and functions on finite memory devices, we introduced the categories of general and actual computability.We showed that correspondences are possible between trained feedforward artificial neural networks on finite memory devices and classes of primitive recursive functions.We argued that there are theoretical and practical reasons why computability should be separated into these categories.The categories are overlapping in the sense that some functions belong in both categories.

Lemma 2.
If a, b are signifiable and b − a ≥ ∆_j, there exists a bijection ψ^j_{a,b} from R^j_{a,b} to {0, . . ., z} ⊂ N, z > 0. Shifting the co-domain from {0, . . ., z} to N^+ = {1, 2, 3, . . .}, we define the bijection µ^j_{a,b} in (5). Let P^j_L(x), x ∈ R^j_{a,b}, be a program for D_j that iterates from a to a + z∆_j ≥ b in positive unit integer increments of ∆_j until the k or z that satisfies the conditions in (3) is encountered, with the length of P^j_L ≤ N. Then, µ^j_{a,b} is absolutely actually computable. Proof. Since a, b, and a + z∆_j are signifiable, so are dom(µ^j_{a,b}) and codom(µ^j_{a,b}). The finite memory of D_j suffices to hold P^j_L, and P^j_L needs access to five signifiable numbers to iterate over dom(µ^j_{a,b}): a, b, i, ∆_j, and a + i∆_j. Since K ≥ 7, the signs of these numbers are placed in registers ρ_1, ρ_2, ρ_3, ρ_4, and ρ_5. After x ∈ dom(µ^j_{a,b}) is placed in register ρ_6, P^j_L sets ρ_3 to 0.

We let n^{j,e}_{z,i} refer to the i-th neuron in layer E_e of N^j_z, abbreviated n^e_i, because n^e_i always refers to a unique neuron in N^j_z. The function nn^j_z(e) : N → N^+ specifies the number of neurons in layer E_e of N^j_z and is abbreviated nn(e).

For each layer, the activation vector is a^{e+1} = (α^{e+1}_0(a^e, w^{e+1}), . . ., α^{e+1}_{nn(e+1)−1}(a^e, w^{e+1})); for e = 0, let f̃(x̃, 0) = (). The function f^j_z in (17) computes the feedforward activation of N^j_z layer by layer, where the relevant function is defined in (2) and ggn(·) is defined in (A17).

If we recall from Lemma 2 that I^j_{a,b} = {1, . . ., z + 1}, where a + z∆_j is the smallest signifiable real number ≥ b on D_j, we observe that G^j_{a,b} is a G-number representation of I^j_{a,b}. Thus, if we return to Example 2 and use the accessor function (x)_i in (A9), then for G^j_{0,1} = [1, 2, 3, 4, 5], we have (G^j_{0,1})_i = i for 1 ≤ i ≤ 5.

The only way for another FANN N^j_k on D_j to have Ω^j_k = Ω^j_z is to have the same number of layers, the same number of neurons in each layer, the same activation function in each neuron, and the same synapse weights between the same neurons, i.e., N^j_k = N^j_z. Appendix A.3 in Appendix A gives several examples of how the Ω numbers are computed for N^j_z in Figure 1.

Lemma 4.
Let µ^j_{0,1} be absolutely actually computable on a sufficiently significant FMA D_j, and let N^j_z be an FANN implemented on D_j. Let 0 ≤ i < nn(0), 0 ≤ k < nn(e), 0 < e < E^j_z, and let the synapse weights of N^j_z be signifiable on D_j. Then, Ω^{j,0}_{z,i} = Ω^0_i ∈ N and Ω^{j,e}_{z,i} = Ω^e_i ∈ N.

The output of A^j_z, i.e., A^j_z(x) = a, corresponds to the natural number ã computed by the primitive recursive function Ã^j_z, i.e., Ã^j_z(x̃) = ã, and the input x corresponds to the natural number x̃. Thus, A^j_z(x) = a iff Ã^j_z(x̃) = ã. Furthermore, the function Ã^j_z is computable in the general sense and is absolutely actually computable on any FMA where the natural number Ω^j_z is signifiable. The correspondence established in Theorem 2 should be construed so that the uniqueness of Ω^j_z does not imply the uniqueness of A^j_z, because the same function can be computed by different FANNs. What it implies is that, for any two different FANNs N^j_n and N^j_m, n ≠ m (e.g., different numbers of layers, different numbers of nodes in a layer, different activation functions, or different weights), implemented on the same FMA D_j, Ω^j_n ≠ Ω^j_m. However, it may be the case that A^j_m(x) = A^j_n(x) for any signifiable x and, consequently, Ã^j_m(x̃) = Ã^j_n(x̃).