Similarity Measures for Learning in Lattice Based Biomimetic Neural Networks

Abstract: This paper presents a novel lattice based biomimetic neural network trained by means of a similarity measure derived from a lattice positive valuation. For a wide class of pattern recognition problems, the proposed artificial neural network, implemented as a dendritic hetero-associative memory, delivers high percentages of successful classification. The memory is a feedforward dendritic network whose arithmetical operations are based on lattice algebra and can be applied to real multivalued inputs. In this approach, recognition tasks exploit the inherent capability of prototype-class pattern associations in a fast and straightforward manner, without the need for any iterative scheme subject to convergence issues. Using an artificially designed data set, we show how the proposed trained neural net classifies a test input pattern. Application to a few typical real-world data sets illustrates the overall network classification performance using different training and testing sample subsets generated randomly.


Introduction
The lattice neural network discussed in this paper is a biomimetic neural network. The term biomimetic refers to man-made systems of processes that imitate nature. Accordingly, biomimetic artificial neurons are man-made models of biological neurons, while biomimetic computational systems deal mostly with information processing in the brain. More specifically, biomimetic computational systems are concerned with such questions as how do neurons encode, transform and transfer information, and how this encoding and transfer of information can be expressed mathematically.
In the human as well as other mammal brains, every neuron has a cell body, named soma, and two kinds of physiological processes called, respectively, dendrites and axons [1]. Multiple dendrites conduct electric impulses toward the body of the cell whereas the axon conducts signals from the soma. Usually, dendrites have many branches forming complicated large trees and various types of dendrites are studded with many tiny branches known as spines. When present, dendrite spines are the main postsynaptic target for synaptic input. The input surface of the neuron is composed of the cell body and the dendrites. Those neurons receiving a firing signal coming from a presynaptic neuron are called postsynaptic neurons.
The axon hillock, usually located at the opposite pole of a neural cell, gives rise to the axon, a long fiber whose branches form the axonal tree or arborization. In some neurons, besides its terminal arborization, the axon may have branches at intervals along its length. In general, a branch of an axon ends in several tips, called nerve terminals, synaptic knobs or boutons, and the axon, being the main fiber branch of a neuron, carries electric signals to other neurons. An impulse traveling along an axon from the axon hillock propagates all the way through the axonal tree to the nerve terminals. The boutons of the branches make contact at synaptic sites of the cell body and the many dendrites of other neurons. The synapse is a specialized structure whereby neurons communicate without actual physical contact between the two neurons at the synaptic site. The synaptic knob is separated from the surface of the soma or dendrite by a very short space known as the synaptic cleft. The basic mechanisms of a synaptic structure are well known, and there are two types of synapses: excitatory synapses, which tend to depolarize the postsynaptic membrane and consequently excite the postsynaptic cell to fire impulses, and inhibitory synapses, which prevent the neuron from firing impulses in response to excitatory synapses.
In the cerebral cortex, the majority of synapses take place in the neural dendritic trees and much of the information processing is realized by the dendrites as brain studies have revealed [2][3][4][5][6][7][8].
A human brain has around 85 billion neurons, and the average number of synaptic connections a neuron may have with other nearby neurons is about 7,000 [9][10][11][12]. More specifically, a single neuron in the cerebral cortex has between 500 and 200,000 synapses, and an adult's cerebral cortex has an estimated 100 to 500 trillion synapses ($10^{14}$ to $5 \times 10^{14}$) [10,13,14,15]. In both volume and surface area of the brain, dendrites make up the largest component, spanning all cortical layers in every region of the cerebral cortex [2,4,16]. Thus, in order to model an artificial neural network that represents a biological brain network more faithfully, it is not possible to ignore dendrites and their spines, which cover more than 50% of a neuron's membrane. This is particularly true considering that several brain researchers have proposed that dendrites (not the neuron) are the basic computing devices of the brain. A neuron together with its associated dendritic structure can work as multiple, almost independent, functional subunits, where each subunit can implement different logical operations [3,4,16,17,18,19]. The interested reader may peruse the works of researchers [3,4,5,6,7,8,16,20,21] who have proposed possible biophysical mechanisms for dendritic computation of logical functions such as 'AND', 'NOT', 'OR', and 'XOR'.
It is in light of these observations that we modeled biomimetic artificial neural networks based on dendritic computing. The binary logic operations 'AND' and 'OR' are naturally extended to non-binary numbers by considering their arithmetical equivalence, respectively, with finding the minimum and maximum of two numbers. Thus, the unary logic operation 'NOT', together with min, max, and addition, belongs to the class of machine operations that contribute to the high speed performance of digital computers. This fact suggests selecting, as the principal computational foundation, the algebraic structure provided by the bounded lattice ordered group $(\mathbb{R}^{n}_{\pm\infty}, \vee, \wedge, +, +^{*})$ [22][23][24]. Recall that $\mathbb{R}_{\pm\infty}$ stands for the set of extended real numbers and that the binary operations of maximum, minimum, and extended additions are denoted, respectively, by $\vee$, $\wedge$, and $+$/$+^{*}$.
The core issue in this research is a novel method for learning in biomimetic lattice neural networks. However, currently biomimetic neural networks and lattice based computational intelligence are not part of mainstream artificial neural networks (ANNs) and artificial intelligence (AI). To acquaint readers that are unfamiliar with these topics, we organized this paper as follows: Section 2 deals with basic concepts from lattice theory that are essential conceptual background, while Section 3 provides a short introduction to lattice biomimetic neural networks. Section 4 discusses the construction of the biomimetic neural network during the learning stages, and the illustrative examples provided in Section 5 show that the proposed neural architecture based on lattice similarity measures can be trained to give high percentages of correct classification in multiclass real-world pattern recognition datasets. The paper ends with Section 6, where we give our conclusions and some relevant comments.

Lattice Theory Background Material
Lattice theory is based on the concept of partially ordered sets, while partially ordered sets rest on the notion of binary relations. More specifically, given a set X and R ⊂ X × X = {(x, y) : x, y ∈ X}, then R is called a binary relation on X. For example, set inclusion is a relation on any power set P(X) of a set X. In particular, if X is a set and S = {(A, B) : A ⊂ B with A, B ∈ P(X)}, then S is a binary relation on P(X). Note that this example shows that a pair of elements of P(X) need not be a member pair of the binary relation. In contrast, the relation of less or equal, denoted by ≤, between real numbers is the set {(x, y) : x ≤ y} ⊂ ℝ × ℝ. Here, each pair of elements of ℝ is related. These two examples of a binary relation on a set belong to a special case of binary relations known as partial order relations. We shall use the symbol ⪯ for denoting a binary relation on an unspecified set X.

Definition 1.
A relation ⪯ on a set P is called a partial order on P if and only if for every x, y, z ∈ P, the following three conditions are satisfied:
1. x ⪯ x (reflexivity),
2. x ⪯ y and y ⪯ x ⇒ x = y (antisymmetry), and
3. x ⪯ y and y ⪯ z ⇒ x ⪯ z (transitivity).
A set P together with a partial order ⪯, denoted by (P, ⪯), is called a partially ordered set or simply a poset. If x ⪯ y in a partially ordered set, then we say that x precedes y or that x is included in y, and that y follows x or that y includes x. If (P, ⪯) is a poset, then we define the notation x ≺ y, where x, y ∈ P, to mean that x ⪯ y and x ≠ y. The following theorem is a trivial consequence of these definitions.
Theorem 1. Suppose (P, ⪯) is a poset. Consequently,
1. if Q ⊂ P, then (Q, ⪯) is also a poset,
2. for every x ∈ P, x ⊀ x, and
3. if x ≺ y and y ≺ z, then x ≺ z, where x, y, z ∈ P.
If X is a set, then for any pair C, D ∈ P (X) the set {C, D} has a least upper bound and a greatest lower bound, namely C ∪ D and C ∩ D, respectively. Thus, (C ∩ D, C ∪ D) ∈ {(A, B) : A ⊂ B with A, B ∈ P (X)}. The greatest lower bound and least upper bound of a subset are commonly denoted by glb{C, D} and lub{C, D}, respectively. Similarly, if x, y ∈ R n , then lub{x, y} = x ∨ y and glb{x, y} = x ∧ y, so that (x ∧ y, x ∨ y) ∈ {(p, q) : p ≤ q and p, q ∈ R n }. The notions of least upper bound and greatest lower bound are key in defining the concept of a lattice.
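The componentwise join and meet on ℝ^n mentioned above can be sketched in a few lines of Python. This is only an illustration; the function names `join`, `meet`, and `precedes` are hypothetical and chosen here for readability.

```python
def join(x, y):
    """Least upper bound x ∨ y of two real vectors (componentwise maximum)."""
    return tuple(max(a, b) for a, b in zip(x, y))


def meet(x, y):
    """Greatest lower bound x ∧ y of two real vectors (componentwise minimum)."""
    return tuple(min(a, b) for a, b in zip(x, y))


def precedes(x, y):
    """Componentwise partial order: x ≤ y iff x_i ≤ y_i for every coordinate i."""
    return all(a <= b for a, b in zip(x, y))


x, y = (1.0, 4.0, 2.0), (3.0, 0.5, 2.0)
print(meet(x, y), join(x, y))             # (1.0, 0.5, 2.0) (3.0, 4.0, 2.0)
print(precedes(x, y))                     # False: x and y are not comparable
print(precedes(meet(x, y), join(x, y)))   # True: x ∧ y precedes x ∨ y
```

The example pair is deliberately incomparable, which shows that ℝ^n with the componentwise order is only partially ordered even though every pair still has a meet and a join.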
More generally, if P is a poset and X ⊂ P, then the infimum, denoted by inf(X), if it exists, is the greatest element in P that is less than or equal to all elements of X. Likewise, the supremum, written as sup(X), if it exists, is the least element in P that is greater than or equal to all elements of X. Consequently, the infimum and supremum correspond, respectively, to the greatest lower bound and the least upper bound.
A few fundamental types of posets are described next:
(1) A lattice is a partially ordered set L such that for any two elements x, y ∈ L, inf{x, y} and sup{x, y} exist. If L is a lattice, then we denote inf{x, y} by x ∧ y and sup{x, y} by x ∨ y. The expression x ∧ y is also referred to as the meet or min of x and y, while x ∨ y is referred to as the join or max of x and y.
(2) A sublattice of a lattice L is a subset X of L such that for each pair x, y ∈ X, we have that x ∧ y ∈ X and x ∨ y ∈ X.
(3) A lattice L is said to be complete if and only if for each of its subsets X, inf(X) and sup(X) exist. The symbols ⋀X and ⋁X are also commonly used for inf(X) and sup(X), respectively.
Suppose L is a lattice and also an additive Abelian group, which we denote by (L, +). Now, consider the function ϕ : L → L defined by ϕ(x) = −x. If x ⪯ y, then ϕ(x ∨ y) = −(x ∨ y) = −y and ϕ(x) ∧ ϕ(y) = −x ∧ −y = −y since −y ⪯ −x. Likewise, if y ⪯ x, then ϕ(x ∨ y) = −x and ϕ(x) ∧ ϕ(y) = −x. This verifies the dual equations:
$$\phi(x \vee y) = \phi(x) \wedge \phi(y) \quad\text{and}\quad \phi(x \wedge y) = \phi(x) \vee \phi(y), \tag{1}$$
signifying that the function ψ(x) = a + (−x) + b is a dual isomorphism for any fixed pair a, b ∈ L. Thus, in any lattice Abelian group the following identities hold:
$$a - (x \vee y) + b = (a - x + b) \wedge (a - y + b) \quad\text{and}\quad a - (x \wedge y) + b = (a - x + b) \vee (a - y + b). \tag{2}$$
These equations easily generalize to
$$a - \Big(\bigvee_{i=1}^{n} x_i\Big) + b = \bigwedge_{i=1}^{n} (a - x_i + b) \quad\text{and}\quad a - \Big(\bigwedge_{i=1}^{n} x_i\Big) + b = \bigvee_{i=1}^{n} (a - x_i + b); \tag{3}$$
hence, if b = 0, then
$$a - \bigvee_{i=1}^{n} x_i = \bigwedge_{i=1}^{n} (a - x_i) \quad\text{and}\quad a - \bigwedge_{i=1}^{n} x_i = \bigvee_{i=1}^{n} (a - x_i). \tag{4}$$
Some of the most useful computational tools for applications of lattice theory to real data sets are mappings of lattices to the real number system. One family of such mappings consists of valuation functions.

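Definition 2. A valuation on a lattice L is a function v : L → ℝ that satisfies
$$v(x) + v(y) = v(x \vee y) + v(x \wedge y) \quad \forall\, x, y \in L.$$
A valuation is said to be isotone if and only if x ⪯ y ⇒ v(x) ≤ v(y), and positive if and only if x ≺ y ⇒ v(x) < v(y).
The importance of valuations on lattices is due to their close association with various measures. Among these measures are pseudometrics and metrics.
Theorem 2. If L is a lattice and v is an isotone valuation on L, then the function d : L × L → ℝ defined by
$$d(x, y) = v(x \vee y) - v(x \wedge y)$$
satisfies, ∀ x, y, z, a ∈ L, the following conditions:
$$\begin{aligned} & d(x, y) \ge 0 \ \text{and}\ d(x, x) = 0, \\ & d(x, y) = d(y, x), \\ & d(x, z) \le d(x, y) + d(y, z), \\ & d(a \vee x, a \vee y) + d(a \wedge x, a \wedge y) \le d(x, y). \end{aligned}$$
An elegant proof of this theorem is provided by Birkhoff in [25]. In fact, the condition x ≺ y ⇒ v(x) < v(y), or equivalently, d(x, y) = 0 ⇔ x = y, yields the following corollary of Theorem 2.
Corollary 1. If L is a lattice and v is an isotone positive valuation on L, then d is a metric on L.
The metric d defined on a lattice L in terms of an isotone positive valuation is called a lattice metric or simply an ℓ-metric, and the pair (L, d) is called a metric lattice or a metric lattice space. The importance of ℓ-metrics is due to the fact that they can be computed using only the operations of ∨, ∧, and + for lattices that are additive Abelian groups. For the lattice group (ℝ^n, +), they require far less computational time than any ℓ_p metric whenever 1 < p < ∞. Just as different ℓ_p norms give rise to different ℓ_p metrics on ℝ^n, different positive valuations on a lattice will yield different ℓ-metrics. For instance, if L = ℝ^n, then the two positive valuations
$$v_1(x) = \sum_{i=1}^{n} x_i \quad\text{and}\quad v_\infty(x) = \bigvee_{i=1}^{n} x_i$$
define two different ℓ-metrics on L. In particular, we have:
Theorem 3. For x, y ∈ L, the induced metrics $d_1$ and $d_\infty$ on L × L are given by
$$d_1(x, y) = \sum_{i=1}^{n} |x_i - y_i| \quad\text{and}\quad d_\infty(x, y) = \bigvee_{i=1}^{n} |x_i - y_i|. \tag{10}$$
Proof. Considering (1) through (4) establishes the following equalities:
$$\begin{aligned} d_1(x, y) &= v_1(x \vee y) - v_1(x \wedge y) = \sum_{i=1}^{n} (x_i \vee y_i) - \sum_{i=1}^{n} (x_i \wedge y_i) \\ &= \sum_{i=1}^{n} \big[(x_i \vee y_i) + (-x_i \vee -y_i)\big] \\ &= \sum_{i=1}^{n} \big[(x_i - x_i) \vee (x_i - y_i) \vee (y_i - x_i) \vee (y_i - y_i)\big] \\ &= \sum_{i=1}^{n} \big[0 \vee (x_i - y_i) \vee (y_i - x_i)\big] = \sum_{i=1}^{n} |x_i - y_i|. \end{aligned}$$
Replacing the sum ∑ by the maximum operation and using an analogous argument proves the second equality in (10) of the theorem.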
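To make the relation between a valuation and its ℓ-metric concrete, the following minimal sketch computes the distance induced by the sum valuation v(x) = ∑ x_i using only max, min, and addition, and compares it with the ordinary ℓ_1 distance. It assumes the standard form d(x, y) = v(x ∨ y) − v(x ∧ y) for the induced distance; the names `v1`, `lattice_metric`, and `l1_distance` are hypothetical.

```python
def v1(x):
    """Sum valuation v(x) = x_1 + ... + x_n."""
    return sum(x)


def lattice_metric(x, y, valuation=v1):
    """Distance induced by a valuation: d(x, y) = v(x ∨ y) − v(x ∧ y)."""
    join = [max(a, b) for a, b in zip(x, y)]
    meet = [min(a, b) for a, b in zip(x, y)]
    return valuation(join) - valuation(meet)


def l1_distance(x, y):
    """Ordinary l1 distance, used here only as a cross-check."""
    return sum(abs(a - b) for a, b in zip(x, y))


x, y = (1.0, 4.0, 2.0), (3.0, 0.5, 2.0)
print(lattice_metric(x, y), l1_distance(x, y))   # 5.5 5.5
```

No absolute values, powers, or roots are needed on the lattice side, which is the computational advantage the text attributes to ℓ-metrics.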
In addition to -metrics, valuations also give rise to similarity measures. A similarity measure is a measure that for a given object x tries to decide how similar or dissimilar other objects are when compared to x. For objects represented by vectors, distance measures such as metrics, measure numerically how unlike or different two data points are, while similarity measures find numerically how alike two data points are. In short, a similarity measure is the antithesis of a distance measure since a higher value indicates a greater similarity, while for a distance measure a lower value indicates greater similarity. There exists an assortment of different similarity measures, depending on the sets, spaces, or lattices under consideration. Specifically, for lattices we have the following, Definition 3. If L is a lattice with inf(L) = O, then a similarity measure for y ∈ L is a mapping s : L × L → [0, 1] defined by the following conditions: The basic idea is that if y ∈ L has more features in common with z than any other x ∈ L or if y is closer to z than any other x ∈ L in some meaningful way, then s(x, z) < s(y, z). As an aside, there is a close relationship of similarity measures with fuzzy sets. Specifically, if X = L × L, then F = {((x, y), s(x, y)) : (x, y) ∈ X} is a fuzzy set with membership function s : X → [0, 1].

Lattice Biomimetic Neural Networks
In ANNs endowed with dendrites whose computation is based on lattice algebra, a set $N_1, \ldots, N_n$ of presynaptic neurons provides information through its axonal arborization to the dendritic trees of some other set $M_1, \ldots, M_m$ of postsynaptic neurons [26][27][28]. Figure 1 illustrates the neural axons and branches that go from the presynaptic neurons to the postsynaptic neuron $M_j$, whose dendritic tree has $K_j$ branches, denoted by $\tau^j_1, \ldots, \tau^j_{K_j}$, containing the synaptic sites upon which the axonal fibers of the presynaptic neurons terminate. The address or location of a specific synapse is defined by the quintuple $(i, j, k, h, \ell)$, where $i \in \{1, \ldots, n\}$, $j \in \{1, \ldots, m\}$, and $k \in \{1, \ldots, K_j\}$, meaning that a terminal axonal branch of $N_i$ has a bouton on the k-th dendritic branch $\tau^j_k$ of $M_j$. The index $h \in \{1, \ldots, \rho\}$ denotes the h-th synapse of $N_i$ on $\tau^j_k$, since there may be more than one terminal axonal branch of $N_i$ synapsing on $\tau^j_k$. The index $\ell \in \{0, 1\}$ classifies the type of the synapse, where $\ell = 0$ indicates that the synapse at $(i, j, k, h, \ell)$ is inhibitory (i.e., releases inhibitory neurotransmitters) and $\ell = 1$ indicates that the synapse is excitatory (releases excitatory neurotransmitters). The strength of the synapse $(i, j, k, h, \ell)$ corresponds to a real number, commonly referred to as the synaptic weight and customarily denoted by $w^{\ell}_{ijkh}$. Thus, if S denotes the set of synapses on the dendritic branches of the set of postsynaptic neurons $M_1, \ldots, M_m$, then w can be viewed as the function w : S → ℝ defined by $w(i, j, k, h, \ell) = w^{\ell}_{ijkh}$, where $w^{\ell}_{ijkh} \in \mathbb{R}$. In order to reduce notational overhead we simplify the synapse location and type as follows:
1. $(j, k, h, \ell)$ if n = 1 (single input neuron),
2. $(i, k, h, \ell)$ if m = 1 (single output neuron), in which case we denote its dendritic branches by $\tau_1, \ldots, \tau_K$ (multiple dendrites) or simply τ if K = 1 (single dendrite), and
3. $(i, j, k, \ell)$ if ρ = 1 (at most one synapse per dendrite).
The axon terminals of different presynaptic biological neurons that have synaptic sites on a single branch of the dendritic tree of a postsynaptic neuron may release dissimilar neurotransmitters, which, in turn, affect the receptors of the branch. Since the receptors serve as storage sites of the synaptic strengths, the resulting electrical signal generated by the branch is the result of the combination of the output of all its receptors. As the signal travels toward the cell's body it again combines with signals generated in the other branches of the dendritic tree. In the lattice based biomimetic model, the various biological synaptic processes due to dissimilar neurotransmitters are replaced by different operations of a lattice group. More specifically, if Ω = {∨, ∧, +} represents the operations of a lattice group G, then the generic symbols ⊕, ⊗, and ⊙ will mean that ⊕, ⊗, ⊙ ∈ Ω, but are not explicitly specified numerical operations. For instance, if $\bigoplus_{i=1}^{n} a_i = a_1 \oplus \cdots \oplus a_n$ and ⊕ = ∨, then $\bigoplus_{i=1}^{n} a_i = \bigvee_{i=1}^{n} a_i = a_1 \vee \cdots \vee a_n$, and if ⊕ = +, then $\bigoplus_{i=1}^{n} a_i = \sum_{i=1}^{n} a_i = a_1 + \cdots + a_n$. Let $x = (x_1, \ldots, x_n) \in G^n$ and let $p_{jk}$ be the switching value that signals the final outflow from the k-th branch reaching $M_j$; if excitatory, then $p_{jk} = 1$, or if inhibitory, then $p_{jk} = -1$. Also, let I(k) ⊆ {1, . . . , n} represent the index set corresponding to all presynaptic neurons with terminal axonal fibers that synapse on the k-th dendrite of $M_j$, and let ρ be the number of synaptic knobs of $N_i$ contacting branch $\tau^j_k$. Therefore, if $N_i$ sends the information value $x_i \in G$ via its axon and attached branches, the total output (or response) of a branch $\tau^j_k$ to the received input at its synaptic sites is given by the general formula:
$$\tau^j_k(x) = p_{jk} \bigoplus_{i \in I(k)} \bigoplus_{h=1}^{\rho} (-1)^{(1-\ell)} \left( x_i \otimes w^{\ell}_{ijkh} \right). \tag{11}$$
The cell body of $M_j$ receives $\tau^j_k(x)$, and its state is a function of the combined values processed by its dendritic structure.
Hence, the state of $M_j$ is computed as
$$\tau^j(x) = p_j \bigodot_{k=1}^{K_j} \tau^j_k(x), \tag{12}$$
where $p_j = \pm 1$ denotes the response of the cell to the received input. As explained before, $p_j = 1$ (excitation) means acceptance of the received input and $p_j = -1$ (inhibition) means rejection. This mimics the summation that occurs in the axonal hillock of biological neurons. In many applications of lattice neural networks (LNNs), the presynaptic neurons have at most one axonal bouton synapsing (ρ = 1) on any given dendritic branch $\tau^j_k$. In these cases, (11) simplifies to
$$\tau^j_k(x) = p_{jk} \bigoplus_{i \in I(k)} (-1)^{(1-\ell)} \left( x_i \otimes w^{\ell}_{ijk} \right). \tag{13}$$
As in most ANNs, the next state of $M_j$ is determined by an activation function $f_j$, which, depending on the problem domain, can be the identity function, a simple hard limiter, or a more complex function. The next state refers to the information being transferred via $M_j$'s axon to the next level neurons, or to the output if $M_j$ is an output neuron. Any ANN that is based on dendritic computing and employs equations of type (11) and (12), or (13) and (12), will be called a lattice biomimetic neural network (LBNN). In the technical literature, there exists a multitude of different models of lattice based neural networks. The matrix based lattice associative memories (LAMs) discussed in [22,24,29,30] and LBNNs are just a few examples of LNNs. What sets LBNNs apart from current ANNs is the inclusion of the following processes employed by biological neurons:
1. The use of dendrites and their synapses.
2. A presynaptic neuron $N_i$ can have more than one terminal branch on the dendrites of a postsynaptic neuron $M_j$.
3. If the axon of a presynaptic neuron $N_i$ has two or more terminal branches that synapse on different dendritic locations of the postsynaptic neuron $M_j$, then it is possible that some of the synapses are excitatory and others are inhibitory to the same information received from $N_i$.
4. The basic computations resulting from the information received from the presynaptic neurons take place in the dendritic tree of $M_j$.
5. As in standard ANNs, the number of input and output neurons is problem dependent. However, in contrast to standard ANNs, where the number of neurons in a hidden layer as well as the number of hidden layers are pre-set by the user or an optimization process, hidden layer neurons, dendrites, synaptic sites and weights, and axonal structures are grown during the learning process.
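The dendritic computation described above can be sketched as follows for the single-synapse case (ρ = 1). The sketch assumes that each synapse combines its input and weight by addition, that the synapse type ℓ enters through the sign factor (−1)^(1−ℓ), and that the branch and cell operations are interchangeable lattice operations passed in as ordinary Python functions. The names `branch_response` and `neuron_state` and the sample weights are hypothetical.

```python
from functools import reduce


def branch_response(x, synapses, p_branch=1, combine=min):
    """Response of one dendritic branch.

    synapses is a list of (i, weight, ell) triples, where i indexes the
    presynaptic neuron, ell = 1 marks an excitatory synapse and ell = 0 an
    inhibitory one. Each synapse contributes (-1)**(1 - ell) * (x[i] + weight);
    the contributions are merged with `combine` (min, max, or a sum).
    """
    terms = [(-1) ** (1 - ell) * (x[i] + w) for i, w, ell in synapses]
    return p_branch * reduce(combine, terms)


def neuron_state(x, branches, p_cell=1, branch_op=min, cell_op=min):
    """State of a neuron: branch responses merged with `cell_op`."""
    responses = [branch_response(x, s, combine=branch_op) for s in branches]
    return p_cell * reduce(cell_op, responses)


x = (2.0, 5.0)
branches = [
    [(0, -1.0, 1), (1, -6.0, 0)],  # terms: 2 - 1 = 1.0 and -(5 - 6) = 1.0
    [(0, 0.5, 1)],                 # term: 2 + 0.5 = 2.5
]
print(neuron_state(x, branches))   # 1.0
```

Passing `max` or `lambda a, b: a + b` as `branch_op` or `cell_op` yields other members of the same family of models without changing the structure of the code.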
Substituting specific lattice operations in the general Equations (11) and (12) results in a specific model of the computations performed by the postsynaptic neuron $M_j$. For instance, two distinct specific models are given by
$$\tau^j_k(x) = p_{jk} \bigwedge_{i \in I(k)} \bigwedge_{h=1}^{\rho} (-1)^{(1-\ell)} \left( x_i + w^{\ell}_{ijkh} \right) \quad\text{and}\quad \tau^j(x) = p_j \bigwedge_{k=1}^{K_j} \tau^j_k(x),$$
or
$$\tau^j_k(x) = p_{jk} \bigvee_{i \in I(k)} \bigvee_{h=1}^{\rho} (-1)^{(1-\ell)} \left( x_i + w^{\ell}_{ijkh} \right) \quad\text{and}\quad \tau^j(x) = p_j \sum_{k=1}^{K_j} \tau^j_k(x).$$
Unless otherwise mentioned, the lattice group (ℝ, ∧, ∨, +) will be employed when implementing Equations (11) and (12) or (13) and (12). In contrast to standard ANNs currently in vogue, we allow both negative and positive synaptic weights as well as weights of value zero. The reason for this is that these values correspond to positive weights if one chooses the algebraically equivalent lattice group (ℝ⁺, ∧, ∨, ×), where ℝ⁺ = {x ∈ ℝ : x > 0}. The equivalence is given by the bijection f : ℝ → ℝ⁺, which is defined by f(x) = exp(x). Consequently, negative weights correspond to small positive weights and zero weights to one.
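The equivalence between the additive and multiplicative lattice groups can be checked numerically. The short sketch below only illustrates the bijection f(x) = exp(x) stated above; the helper name `to_multiplicative` is hypothetical.

```python
import math


def to_multiplicative(w):
    """Map a weight from (R, ∧, ∨, +) to (R+, ∧, ∨, ×) via f(x) = exp(x)."""
    return math.exp(w)


a, b = -2.0, 0.75

# Addition is carried to multiplication (up to floating-point rounding).
print(math.isclose(to_multiplicative(a + b),
                   to_multiplicative(a) * to_multiplicative(b)))   # True

# Because exp is increasing, max and min are preserved.
print(to_multiplicative(max(a, b)) == max(to_multiplicative(a),
                                          to_multiplicative(b)))   # True

# A zero weight maps to one; a negative weight maps into (0, 1).
print(to_multiplicative(0.0))                 # 1.0
print(0.0 < to_multiplicative(-2.0) < 1.0)    # True
```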

Similarity Measure Based Learning for LBNNs
The focus of this section is on the pattern recognition capabilities of LBNNs, in particular on how a lattice biomimetic neural network learns to recognize distinct patterns. However, since the learning method presented here is based on a specific similarity measure, we begin our discussion by describing the measure used [31]. The lattice of interest in our discussion is $L = \mathbb{R}^{*} = \{x \in \mathbb{R}^n : x \ge 0\}$ with inf(L) = 0, while the similarity measure for y ∈ L is the mapping s : L × L → [0, 1] defined by:
$$s(x, y) = \frac{v(y)}{v(x \vee y)} \wedge \frac{v(x \wedge y)}{v(y)}, \tag{16}$$
where v is the isotone positive lattice valuation given by $v(x) = \sum_{i=1}^{n} x_i$. We used the lattice L = (ℝ*, ∨, ∧) in order to satisfy Condition (1) of Definition 3, and coordinates of pattern vectors considered here are nonnegative. Since data sets are finite, data sets consisting of pattern vectors that are subsets of ℝ^n always have an infimum $\mathbf{v}$ and a supremum $\mathbf{u}$. Thus, if Q ⊂ ℝ^n is a data set whose pattern vectors have both negative and nonnegative coordinates, simply compute $\mathbf{v} = \bigwedge_{q \in Q} q = \inf(Q)$ and $\mathbf{u} = \bigvee_{q \in Q} q = \sup(Q)$. Note that the hyperbox which proves that Condition (1) of Definition 3 is satisfied, and the remaining two conditions are similarly proven.
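A minimal sketch of such a valuation-based similarity measure is given below. It assumes that the measure has the form s(x, y) = min(v(y)/v(x ∨ y), v(x ∧ y)/v(y)) with v(x) = ∑ x_i, and that data with negative coordinates are first translated by the infimum of the data set; both points are assumptions of this sketch, and the names `valuation`, `similarity`, and `shift_to_nonnegative` are hypothetical. The reference pattern y must have a strictly positive valuation, otherwise the ratios are undefined.

```python
def valuation(x):
    """Sum valuation v(x) = x_1 + ... + x_n."""
    return sum(x)


def similarity(x, y):
    """Assumed form: s(x, y) = min(v(y) / v(x ∨ y), v(x ∧ y) / v(y)).

    Requires nonnegative coordinates and v(y) > 0.
    """
    join = [max(a, b) for a, b in zip(x, y)]
    meet = [min(a, b) for a, b in zip(x, y)]
    v_y = valuation(y)
    return min(v_y / valuation(join), valuation(meet) / v_y)


def shift_to_nonnegative(data):
    """Translate every pattern by the componentwise infimum of the data set."""
    inf = [min(column) for column in zip(*data)]
    return [[a - b for a, b in zip(q, inf)] for q in data]


print(similarity((2.0, 4.0), (4.0, 4.0)))   # 0.75
print(similarity((4.0, 4.0), (4.0, 4.0)))   # 1.0
print(shift_to_nonnegative([[-1.0, 2.0], [3.0, -2.0]]))
# [[0.0, 4.0], [4.0, 0.0]]
```

Both ratios lie in [0, 1] for nonnegative inputs because v(y) ≤ v(x ∨ y) and v(x ∧ y) ≤ v(y), so their minimum is a value in [0, 1] that equals 1 when x = y.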
There exist several distinct methods for learning in LBNNs. The method described here is novel in that it is based on the similarity measure given in (16). To begin with, suppose $Q = \{q^1, \ldots, q^k\} \subset \mathbb{R}^n$ is a data set consisting of prototype patterns, where each pattern $q^j$ belongs to one of m different classes. Here 1 < m < k, and we use the expression $q^j \in c_\lambda$ if $q^j$ belongs to class λ ∈ {1, . . . , m} by some predefined relationship. Letting $\mathbb{N}_m = \{1, \ldots, m\}$, the association of patterns and their class membership is a subset of $Q \times \mathbb{N}_m$ specified by $H = \{(q^j, c_\lambda) : q^j \in Q, \lambda \in \mathbb{N}_m\}$.
As in most learning methods for artificial neural networks, a lattice biomimetic neural network learns to recognize distinct patterns by using a subset of prototype patterns stored in a hetero-associative memory. Given the data set Q, learning in LBNNs begins with selecting a family of prototypes $P_p = \{q^{s_1}, \ldots, q^{s_\eta}\} \subset Q$. The selection is random, and the subscript p is a predefined percentage p% of the total number k of samples in Q.
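The random selection of the prototype family can be sketched as below. The function name `split_prototypes`, the use of Python's `round` to turn p% of k into an integer η, and the optional seeded generator are assumptions of this sketch; the text does not specify how the percentage is rounded.

```python
import random


def split_prototypes(dataset, percent, rng=None):
    """Randomly pick about percent% of the samples as prototypes.

    Returns (prototypes, test_set), where test_set is the complement of the
    prototypes with respect to the data set. Samples are returned in their
    original order.
    """
    rng = rng or random.Random()
    k = len(dataset)
    eta = round(k * percent / 100)       # rounding rule is an assumption
    indices = list(range(k))
    rng.shuffle(indices)
    chosen = set(indices[:eta])
    prototypes = [dataset[i] for i in range(k) if i in chosen]
    test_set = [dataset[i] for i in range(k) if i not in chosen]
    return prototypes, test_set


samples = [(float(i), float(i % 3)) for i in range(10)]
prototypes, test_set = split_prototypes(samples, 50, random.Random(0))
print(len(prototypes), len(test_set))    # 5 5
```

Calling the function repeatedly with different generators gives the different random prototype subsets used from run to run.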
After selecting the training set $P_p$, precompute the values $v(q^{s_j}) = \sum_{i=1}^{n} q^{s_j}_i$ for j = 1, . . . , η. These values will be stored at the synaptic sites of the LBNN, and in most practical situations η ≫ n. Knowing the dimension n and the size of the training set $P_p$, it is now an easy task to construct the network. As illustrated in Figure 2, the network has n input neurons denoted by $N_1, \ldots, N_n$, two hidden layers of neurons, and a layer of output neurons. The first hidden layer consists of two different types of neurons denoted by $A_j$ and $B_j$, where j = 1, . . . , η. Each neuron $A_j$ and $B_j$ has a single dendrite, with each dendrite having n synaptic sites. For the sake of simplicity we denote the dendrite of $A_j$ and of $B_j$ by $a_j$ and $b_j$, respectively. The second hidden layer has η + 1 neurons denoted by $C_j$, where j = 0, 1, . . . , η. Here $C_0$ has multiple dendrites, i.e., η dendrites denoted by $\tau^0_j$ for j = 1, . . . , η, with each dendrite having two synaptic sites. Any other neuron $C_j$ with j ≠ 0 has one dendrite, also with two synaptic sites. The output layer is made up of η neurons, denoted by $M_j$ for j = 1, . . . , η, with each neuron $M_j$ having a single dendrite with two synaptic sites. In what follows, we describe the internal workings of the network. For a given x ∈ ℝ*, the input neuron $N_i$ receives the input $x_i$, and this information is sent to each of the neurons $A_j$ and $B_j$. For each i = 1, . . . , n, the axonal arborization of $N_i$ consists of 2η terminal branches, with one terminal on each $a_j$ and $b_j$. The synaptic weight $\alpha^{\ell}_{ij}$ at the i-th synapse on $a_j$ is given by $\alpha^{\ell}_{ij} = q^{s_j}_i$ with ℓ = 1. Each synapse on $a_j$ at location (i, j) results in $x_i \vee q^{s_j}_i$ upon receiving the information $x_i$. The total response of the dendrite $a_j$ is given by the summation $a_j(x) = v(x \vee q^{s_j}) = \sum_{i=1}^{n} (x_i \vee q^{s_j}_i)$.
In a similar fashion, the synaptic weight $\beta^{\ell}_{ij}$ at the i-th synapse on $b_j$ is given by $\beta^{\ell}_{ij} = q^{s_j}_i$ with ℓ = 1. However, here, each synapse on $b_j$ at location (i, j) results in $x_i \wedge q^{s_j}_i$ upon receiving the information $x_i$, and each dendrite $b_j$ computes $b_j(x) = v(x \wedge q^{s_j}) = \sum_{i=1}^{n} (x_i \wedge q^{s_j}_i)$. The information $a_j(x)$ and $b_j(x)$ travels through the soma towards the axon hillock of the respective neurons, where the corresponding activation functions are given, for $A_j$ and $B_j$ respectively, by:
$$f_j(x) = \frac{v(q^{s_j})}{a_j(x)} \quad\text{and}\quad g_j(x) = \frac{b_j(x)}{v(q^{s_j})}. \tag{17}$$
The information $f_j(x)$ and $g_j(x)$ is transferred via the axonal arborization of the first hidden layer neurons to the dendrites of the second layer neurons. The presynaptic neurons of $C_0$ are all the neurons of the first hidden layer. One terminal axonal fiber of $A_j$ and one of $B_j$ terminate on $\tau^0_j$. The weight at each of the two synapses is $w^{\ell}_{a_j 0} = 0 = w^{\ell}_{b_j 0}$, where ℓ = 1 and $a_j$, $b_j$ are address labels for the respective terminal axonal fibers from $A_j$ and $B_j$. Thus, each synapse accepts the information $f_j(x)$ and $g_j(x)$. The total response of the dendrite is given by $\tau^0_j(x) = f_j(x) \wedge g_j(x)$. However, the total response of the neuron $C_0$ is given by:
$$\tau^0(x) = \bigvee_{j=1}^{\eta} \tau^0_j(x) = \bigvee_{j=1}^{\eta} \left[ f_j(x) \wedge g_j(x) \right]. \tag{18}$$
For j = 1, . . . , η, the presynaptic neurons for the neuron $C_j$ are the two neurons $A_j$ and $B_j$. Denoting the single dendrite of $C_j$ by $\tau_j$, one terminal axonal fiber of $A_j$ and one of $B_j$ terminate on $\tau_j$. In lockstep with $C_0$, the weight at each of the two synapses is $w^{\ell}_{a_j} = 0 = w^{\ell}_{b_j}$, where ℓ = 1 and $a_j$, $b_j$ are address labels for the respective terminal axonal fibers from $A_j$ and $B_j$. Again, the two synapses accept the information $f_j(x)$ and $g_j(x)$, and the response of the single dendrite is:
$$\tau_j(x) = f_j(x) \wedge g_j(x). \tag{19}$$
The activation function for $C_j$ is the identity function f(x) = x for all j ∈ {0, 1, . . . , η}. For the output layer, the presynaptic neurons for $M_j$ are the two neurons $C_j$ and $C_0$. As mentioned earlier, each output neuron $M_j$ has one dendrite $d_j$ with two synaptic regions, one for the terminal axonal bouton of $C_j$ and one for that of $C_0$.
The synaptic weight at the synapse of $C_j$ on $d_j$ is given by $w^{\ell}_j$, where ℓ = 1 and $w^1_j = 0$, while the synaptic weight at the synapse of $C_0$ on $d_j$ is given by $w^{\ell}_0$, with ℓ = 0 and $w^0_0 = 0$. Because the activation function of $C_j$ is the identity function, the input at the synapse with weight $w^1_j$ is $\tau_j(x)$, and since $w^1_j = 0$, the synapse accepts the input. Likewise, the input from neuron $C_0$ at the synapse with weight $w^0_0$ is $\tau^0(x)$. However, because ℓ = 0, the weight negates the input since $(-1)^{(1-\ell)} [\tau^0(x) + w^0_0] = -\tau^0(x)$. The dendrite $d_j$ adds the results of the synapses so that $d_j(x) = \tau_j(x) - \tau^0(x)$. This information flows to the hillock of $M_j$, and the activation function of $M_j$ is the hard-limiter f(z) = 1 if z ≥ 0 and f(z) = 0 if z < 0. Suppose that $q^{s_j} \in c_\lambda$ and $f[d_j(x)] = 1$. If for every k ∈ {1, . . . , η} \ {j} we have that $f[d_k(x)] = 0$, then we say that $x \in c_\lambda$, i.e., winner takes all. If there is another winner that is not a member of $c_\lambda$, then repeat the steps with a new randomly obtained set $P_p$. If after several tries a single winner cannot be found, it becomes necessary to increase the percentage of points in $P_p$. Note that the method just described can be simplified by eliminating the neuron $C_0$ and using the neurons $C_1$ to $C_\eta$ as the output neurons.
In that case, if there is one $\tau_j(x)$ such that $\tau_k(x) < \tau_j(x)$ ∀ k ∈ {1, . . . , η} \ {j}, then $x \in c_\lambda$, where $c_\lambda$ is the class of $q^{s_j}$. If there is more than one winner and another winner does not belong to class $c_\lambda$, then repeat the steps with a new set $P_p$ as described earlier.
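The layer-by-layer computation and the winner-takes-all decision can be sketched as follows. The sketch assumes that the activation functions of the first hidden layer are f_j = v(q^{s_j}) / a_j(x) and g_j = b_j(x) / v(q^{s_j}), that C_0 takes the maximum of the dendrite responses, and that the hard limiter outputs 1 for a nonnegative argument; these are assumptions of the sketch. The names `lbnn_forward` and `classify` and the toy prototypes are hypothetical, and all prototypes must have a strictly positive valuation.

```python
def valuation(x):
    """Sum valuation v(x) = x_1 + ... + x_n."""
    return sum(x)


def lbnn_forward(x, prototypes):
    """Return (tau, tau0, outputs) for input x and the stored prototypes."""
    tau = []
    for q in prototypes:
        v_q = valuation(q)                             # stored at the synapses
        a = sum(max(xi, qi) for xi, qi in zip(x, q))   # dendrite a_j: v(x ∨ q)
        b = sum(min(xi, qi) for xi, qi in zip(x, q))   # dendrite b_j: v(x ∧ q)
        f, g = v_q / a, b / v_q                        # assumed activations
        tau.append(min(f, g))                          # neuron C_j
    tau0 = max(tau)                                    # neuron C_0
    outputs = [1 if t - tau0 >= 0 else 0 for t in tau] # neurons M_j
    return tau, tau0, outputs


def classify(x, prototypes, labels):
    """Class of the winning prototype(s), or None if winners disagree."""
    _, _, outputs = lbnn_forward(x, prototypes)
    winning = {labels[j] for j, out in enumerate(outputs) if out == 1}
    return winning.pop() if len(winning) == 1 else None


prototypes = [(1.0, 1.0), (5.0, 5.0), (1.0, 5.0)]
labels = [1, 2, 1]
print(classify((4.0, 5.0), prototypes, labels))                 # 2
print(classify((3.0, 3.0), [(2.0, 4.0), (4.0, 2.0)], [1, 2]))   # None
```

A return value of `None` corresponds to the situation in which winners from different classes tie, which is the case where the text prescribes drawing a new random prototype set.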
We close our theoretical description by pointing out the important fact that an extensive foundation with respect to the similarity measure given in Equation (16), or more precisely the two separate expressions in (17), has been developed earlier, although with a different perspective in mind, in related areas such as fuzzy sets [32][33][34], fuzzy logic [35,36], and fuzzy neural networks [37]. For example, the scalar lattice functions f : L → ℝ and g : L → ℝ, built from the valuation v(x) = x for y ∈ L, were treated in [37]. Also, algorithms for computing subsethood and similarity for interval-valued fuzzy sets for the vectorial counterparts of f and g appear in [38].

Recognition Capability of Similarity Measure Based LNNs
Before discussing the issue of interest, we must mention that a previous LNN based on metrics appears in [39]. The proposed LBNN is trained in a fairly simple way in order to recognize prototype-class associations in the presence of test or non-stored input patterns. As described in Section 4, the network architecture is designed to work with a finite set of hetero-associations, which we denote by $H \subset Q \times \mathbb{N}_m$. Using the prototype-class pairs of a training subset randomly generated from the complete data set, all network weights are preassigned. After weight assignment, non-stored input patterns chosen from a test set can be used to probe the memory network. A test set is defined as the complement of the training set of exemplar or prototype patterns. Clearly, test patterns are elements of one of the m classes that the LBNN can recognize. If the known class of a given non-stored pattern matches the net output class, a correct classification or hit occurs; otherwise a misclassification error happens. Consequently, by computing the fraction of hits relative to each input set used to test the network, we can measure the recognition capability of the proposed LBNN.
In the following paragraphs, some pattern recognition problems are examined to show the classification performance of our LBNN model based on the similarity measure given in (19). For each one of the examples described next, a group of prototype subsets $P_p$ was randomly generated by fixing increasing percentages p% of the total number k of samples in a given data set Q. Selected percentages p belong to the range {50%, 60%, . . . , 90%}, and the generated test subsets, symbolized as $Q_p$, were obtained as complements of $P_p$ with respect to Q. Computation of the average fraction of hits for each selected percentage of all samples requires a finite number of trials or runs, here denoted by τ. Let $\mu_r$ be the number of misclassified test patterns in run r and µ the average over all runs; then the average fraction of hits is given by
$$f^{\,\text{hits}}_p = 1 - \frac{\mu}{k}, \quad\text{where}\quad \mu = \frac{1}{\tau} \sum_{r=1}^{\tau} \mu_r. \tag{20}$$
Note that, if |Q| = k, $|P_p|$, and $|Q_p|$ are the cardinalities of the data, prototype, and test sets, respectively, then $k = |P_p| + |Q_p|$. In (20), we set the number of runs to τ = 50 for any percentage p of the training population sample in order to stabilize the mean value µ. Although, for each run with the same value of p, the number of elements of $P_p$ and $Q_p$ does not change, the sample patterns belonging to each subset are different since they are selected in random fashion. Also, observe that a lattice biomimetic net trained for some p with a prototype subset $P_p$ can be tested either with the whole data set $Q = P_p \cup Q_p$ or with the test set $Q_p$.
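The averaging over runs can be sketched as below. The sketch assumes that the fraction of hits is one minus the mean number of misclassifications divided by the number of patterns presented to the network (the whole set Q, or the test set Q_p if only that set is used); the function name `average_fraction_of_hits` and the sample counts are hypothetical and are not results from the experiments.

```python
def average_fraction_of_hits(misclassified_per_run, n_tested):
    """Mean fraction of correctly classified patterns over several runs.

    misclassified_per_run: number of misclassified patterns in each run.
    n_tested: number of patterns presented to the network in every run
              (assumed to be the same in all runs).
    """
    runs = len(misclassified_per_run)
    mu = sum(misclassified_per_run) / runs   # average number of errors
    return 1.0 - mu / n_tested


# Four illustrative runs on a data set with 55 samples.
print(average_fraction_of_hits([2, 0, 1, 1], 55))
```

In an experiment following the protocol above, the list would hold τ = 50 entries, one per randomly drawn prototype subset.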
We will use a table format to display the computational results obtained for LBNN learning and classification of patterns for the example data sets, to give the mean capability performance in recognizing any element in Q. Each table is composed of six columns; the first column gives the dataset characteristics; the second column gives the percentage p of sample patterns used to generate the prototype and test subsets; the third column provides the number of randomly selected prototype patterns, and the fourth column gives the number of test patterns. The fifth column shows the average number of misclassified inputs calculated using the similarity lattice valuation measure and the sixth column gives the corresponding average fraction of hits for correct classifications.

Classification Performance on Artificial Datasets
The following two examples are designed to illustrate simple data sets with two and three attributes that can be represented graphically as scatter plots in two and three dimensions, respectively. We remark that both sets are built artificially and do not correspond to measurements taken from a real-world situation or application.
Our first artificial (synthetic) data set Q forms a discrete planar "X" shape with 55 points (samples), where the coordinates x and y correspond to its two features. The points are distributed in two classes $c_\lambda$, where λ ∈ {1, 2}. The corresponding 2D scatter plot is shown in Figure 3. Similarly, the second synthetic set Q consists of 618 samples defined in the first octant of $\mathbb{R}^3$. Points in class $c_1$ are randomly distributed in a hemisphere of radius 3.5 centered at (5, 5, 5) with a concentric hemispherical cavity of radius 1.5, and class $c_2$ points are randomly distributed in a sphere of radius 1.5 embedded in the cavity formed by the class $c_1$ points. Again, features are specified by the x, y, and z coordinates, and the corresponding three-dimensional scatter plot is depicted in Figure 4. Table 1 gives the numerical results for the "X-shape" (X-s) and "Hemisphere-sphere" (H-s) datasets.
The last column in Table 1 shows the high classification rates achieved by training the similarity valuation based LBNN with at least half the number of samples and repeating the learning procedure several times in random fashion. For the sake of completeness, we explain graphically, using the X-shaped dataset, how the lattice based neural network shown in Figure 2 assigns a class label to input patterns once the network is trained with a randomly generated prototype subset $P_p$ of Q, setting p = 55%. Specifically, Figure 5 displays the k = 55 points in Q that form the X-shaped set, where the point circles crossed with the symbol "×" (in olive green) denote class $c_1$ training data and the point circles marked with a "+" sign (over the red circles) denote class $c_2$ training data; together they total the 29 elements of $P_p$. In the same figure, the test points $x_5$, $x_{19}$, $x_{34}$, and $x_{50}$, selected from the 26 elements of $Q_p$, are shown as filled colored dots, and their classes are determined from the neural similarity lattice valuation measure response given in (18).
As can be seen in Figure 5, class λ = 1 is correctly assigned to the test points $x_5 = (3, 3)$ and $x_{19} = (9.5, 10.5)$, since the maximum similarity valuation measure computed using (18) is obtained, respectively, for the training points $q_5 = (3.5, 2)$ and $q_{11} = (10.5, 9)$, which are elements of $c_1$. Analogously, class λ = 2 is correctly assigned to the test points $x_{34} = (3, 10.5)$ and $x_{50} = (10.5, 4)$, since the maximum similarity valuation measure is found for the training points $q_{18} = (1.5, 11.5)$ and $q_{28} = (12, 4)$, which are elements of $c_2$. More specifically, an explicit expression corresponding to (18) is used to test any point $x_\zeta \in Q_p$.
We end our discussion of the X-shaped artificial dataset by showing the similarity valuation measure graphs of the selected test points $x_5, x_{19}, x_{34}, x_{50} \in Q_p$. Figure 6 displays, from top to bottom, the similarity measure curves, whose domain is the training subset $P_p$ and whose values lie in the interval [0, 1]. The maximum similarity value $\tau_0(x_\zeta)$ is marked on each curve, and the corresponding training pattern index within the set $P_p$ is found at the bottom of the dashed vertical line dropped from it. The LBNN then assigns the correct class to each of the selected test points, as depicted in the same figure with respect to the line dividing both classes.
Figure 6. Similarity measure curves for $x_\zeta \in Q_p$, where ζ = 5, 19, 34, 50. The class assigned to each test point is the class of the training point $q_j \in P_p$, j = 5, 11, 18, 28, at which the maximum similarity value occurs. Here, the maximum similarity values are 0.846, 0.928, 0.896, and 0.906, respectively, for $x_5$, $x_{19}$, $x_{34}$, and $x_{50}$.
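The class assignments discussed above can be checked numerically with a short Python sketch. Two assumptions are made here and should be read as such: the positive valuation is taken to be the sum of the coordinates, and the similarity between an input x and a prototype q is taken to be min{v(q)/v(x ∨ q), v(x ∧ q)/v(q)}, with ∨ and ∧ the coordinate-wise maximum and minimum. This form is inferred only because it reproduces the four maximum similarity values quoted for Figure 6; it is not copied from (18). Only the four training points named in the text are used as prototypes, not the full 29-element subset.

```python
def valuation(x):
    """Positive valuation assumed here: the sum of the coordinates."""
    return sum(x)

def similarity(x, q):
    """Assumed similarity: min( v(q)/v(x v q), v(x ^ q)/v(q) ).

    Requires vectors with positive entries so that no valuation is zero.
    """
    join = [max(a, b) for a, b in zip(x, q)]  # lattice join x v q
    meet = [min(a, b) for a, b in zip(x, q)]  # lattice meet x ^ q
    return min(valuation(q) / valuation(join), valuation(meet) / valuation(q))

def classify_with_score(prototypes, proto_labels, x):
    """Return (class, maximum similarity, prototype position) for input x."""
    scores = [similarity(x, q) for q in prototypes]
    j = max(range(len(scores)), key=scores.__getitem__)
    return proto_labels[j], scores[j], j

# Training points named in the text: q5, q11 (class 1) and q18, q28 (class 2).
prototypes = [(3.5, 2), (10.5, 9), (1.5, 11.5), (12, 4)]
proto_labels = [1, 1, 2, 2]
tests = {"x5": (3, 3), "x19": (9.5, 10.5), "x34": (3, 10.5), "x50": (10.5, 4)}

results = {name: classify_with_score(prototypes, proto_labels, x)
           for name, x in tests.items()}
for name, (label, score, j) in results.items():
    print(f"{name}: class {label}, similarity {score:.4f}, prototype #{j}")
```

For example, for $x_5 = (3, 3)$ and $q_5 = (3.5, 2)$ the two ratios are 5.5/6.5 ≈ 0.846 and 5/5.5 ≈ 0.909, so the similarity under this assumed form is 0.846, which agrees with the value reported for $x_5$.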

Classification Performance on Real-World Application Datasets
Various application examples available at the UCI Machine Learning Repository [40] are described and discussed in this subsection in order to exhibit the classification performance of the similarity valuation LBNN. The numerical results are compiled in Table 2, which has the same organization as explained in the previous subsection on artificial datasets. However, in Table 2 the block of rows for each example is separated by a horizontal line.

Example 1.
The "Iris" dataset has 150 samples, where each sample is described by four flower features (sepal length, sepal width, petal length, petal width), and the samples are equally distributed into three classes $c_1$, $c_2$, and $c_3$, corresponding, respectively, to the species Iris setosa, Iris versicolor, and Iris virginica. A high average fraction of hits, $f_p^{\mathrm{hits}} > 0.97$, is obtained for percentages p ≥ 50%. Used as an individual classifier, the similarity valuation trained LBNN performs comparably to linear and quadratic Bayesian classifiers [41], for which $f_{50}^{\mathrm{hits}} = 0.953$ and $f_{50}^{\mathrm{hits}} = 0.973$, respectively, and to an edge-effect fuzzy support vector machine [42], for which $f_{60}^{\mathrm{hits}} = 0.978$.

Example 2.
The "Column" dataset with 310 patient samples is specified by six biomechanical attributes derived from the shape and orientation of the pelvis and lumbar spine. Attributes 1 to 6 are numerical values of pelvic incidence, pelvic tilt, lumbar lordosis angle, sacral slope, pelvic radius, and grade of spondylolisthesis. Class $c_1$, patients diagnosed with disk hernia, has 60 samples; class $c_2$, patients diagnosed with spondylolisthesis, has 150 samples; and class $c_3$, normal patients, has 100 samples. Since feature 6 has several negative entries, the whole set is displaced to the positive orthant of $\mathbb{R}^6$ by adding $|\inf(Q)| = 11.06$ to every vector in Q. In this case, a high average fraction of hits occurs for percentages p greater than 80%, which is due to the presence of several interclass outliers. However, the LBNN, which requires much less computational effort, still compares well with other classifiers such as an SVM (support vector machine) or a GRNN (general regression neural network) [43] (with all outliers removed), both of which give $f_{80}^{\mathrm{hits}} = 0.965$.
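The displacement step can be sketched as follows, assuming that inf(Q) denotes the smallest entry over all vectors and all features of Q, so that a single scalar offset is added to every coordinate. The function name and the toy values are hypothetical and serve only as an illustration.

```python
def shift_to_nonnegative(samples):
    """Add |inf(Q)| to every entry when the smallest entry is negative."""
    lowest = min(min(row) for row in samples)
    offset = abs(lowest) if lowest < 0 else 0.0
    shifted = [[value + offset for value in row] for row in samples]
    return shifted, offset

# Toy two-feature data with one negative entry (hypothetical values).
toy = [[1.0, -2.5], [3.0, 0.5]]
shifted, offset = shift_to_nonnegative(toy)
print(offset, shifted)
```

With an exact offset the smallest entry maps to zero; a slightly larger, rounded offset such as the 11.06 quoted above would keep every entry strictly positive.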

Example 3.
The "Wine" dataset has 178 samples obtained from the chemical analysis of wines produced from three different cultivars (classes) of the same region in Italy. The features in each sample represent the quantities of 13 constituents: alcohol, malic acid, ash, alkalinity of ash, magnesium, phenols, flavonoids, nonflavonoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, and proline. Class $c_1$ has 59 samples, class $c_2$ has 71 samples, and class $c_3$ has 48 samples. In this last example, a high average fraction of hits occurs for percentages p greater than 80%, and the LBNN performance is quite good compared with other classifiers based on the leave-one-out technique, such as 1-NN (one-nearest neighbor), LDA (linear discriminant analysis), and QDA (quadratic discriminant analysis) [44], which give, correspondingly, $f_p^{\mathrm{hits}} = 0.961$, $f_p^{\mathrm{hits}} = 0.989$, and $f_p^{\mathrm{hits}} = 0.994$, where p = 99.44% and training must be repeated τ = 178 times. Although not shown in Table 2, the LBNN gives $f_{99}^{\mathrm{hits}} \approx 1$, since almost all samples in the given dataset are stored by the memory as prototype patterns. However, our LBNN model, for which $f_{75}^{\mathrm{hits}} = 0.942$, is outperformed by a small margin (0.055) by an FLNN (fuzzy lattice neural network) classifier, which gives $f_{75}^{\mathrm{hits}} = 0.997$ (leave-25%-out) [45].

Conclusions
This paper introduces a new lattice based biomimetic neural network structured as a two-hidden-layer dendritic lattice hetero-associative memory whose total neural response is computed using a similarity measure derived from a lattice positive valuation. The proposed model delivers a high rate of successful classification for any data pattern when the network learns randomly selected prototype patterns comprising from 50% up to 90% of the total number of patterns in a data set.
More specifically, the new LBNN model provides an intrinsic capacity to tackle multivariate multiclass problems in pattern recognition pertaining to applications whose features are specified by numerically measured data. Our network model incorporates a straightforward mechanism whose topology implements a similarity function defined by simple lattice arithmetical operations. This function measures the resemblance between a set of n-dimensional real vectors (prototype patterns) and an n-dimensional test input vector in order to determine the class of the input. Representative examples, such as the "Iris", the "Column", and the "Wine" datasets, were used to carry out several computational experiments to obtain the average classification performance of the proposed LBNN for diverse randomly generated test subsets. Furthermore, the proposed LBNN model can be applied in other areas such as cryptography [46] or image processing [47][48][49][50].
The results given in this paper are competitive and look promising. However, future work with the LBNN classifier contemplates computational enhancements and comparisons on other challenging artificial and experimental data sets. Additionally, further analysis is required to deal with important issues such as test set design, theoretical developments based on different lattice valuations, and comparisons with recently developed models based on lattice computing. We must point out that our classification performance experiments are currently limited by their implementation on standard high-speed sequential machines. Nonetheless, LBNNs as described here and in earlier works can operate in parallel using dedicated software or can be implemented in hardware to increase computational performance. Hence, a possible extension is to consider algorithm parallelization using GPUs or hardware realization as a neuromorphic system.