Article On the Entropy Based Associative Memory Model with Higher-Order Correlations

In this paper, an entropy-based associative memory model is proposed and applied to memory retrieval with an orthogonal learning model, and it is compared with the conventional model based on a quadratic Lyapunov functional that is minimized during the retrieval process. In the present approach, the updating dynamics are constructed from an entropy minimization strategy, which reduces asymptotically to the conventional dynamics as a special case when the higher-order correlations are ignored. By introducing the entropy functional, one may include higher-order correlation effects between neurons in a self-contained manner, without the heuristic coupling coefficients required in the conventional approach. In fact, we show that such higher-order coupling tensors are uniquely determined within the framework of the entropy-based approach. Numerical results show that the proposed approach realizes a much larger memory capacity than the quadratic Lyapunov functional approach, e.g., the associatron.


Introduction
During the past quarter century, a large number of autoassociative models have been extensively investigated on the basis of autocorrelation dynamics characterized by a quadratic Lyapunov functional to be minimized. Since the pioneering retrieval models proposed by Anderson [1], Kohonen [2], and Nakano [3], autoassociation models of neurons interconnected through an autocorrelation matrix have been analyzed theoretically by Amari [4], Amit et al. [5] and Gardner [6]. It is now well appreciated that the storage capacity of the autocorrelation model, i.e., the ratio of the number of stored pattern vectors $L$ that can be completely associated to the number of neurons $N$, which is called the relative storage capacity or loading rate and denoted as $\alpha_c = L/N$, is estimated to be $\alpha_c \sim 0.14$ at most for the autocorrelation learning model with the signum activation function (abbreviated as sgn($x$)) [7,8].
In contrast to the above-mentioned models with monotonic activation functions, neuro-dynamics with a nonmonotonic mapping were proposed by Morita [9], Yanai and Amari [10], and Shiino and Fukai [11]. They clarified that a nonmonotonic mapping in a neuro-dynamics model has a remarkable advantage in storage capacity, $\alpha_c \sim 0.27$-$0.4$, which is superior to that of the conventional association models with monotonic activation functions, e.g., the signum or sigmoidal function. Activation functions have therefore been considered worth investigating, not only for associative memory models but also for learning models in relation to chaos dynamics [12].
In the above-mentioned association models, the dynamics have been restricted to an updating rule based on a quadratic form of the Lyapunov functional that is minimized through the retrieval process. That is, the nonlinearity of the dynamics results from the nonlinear characteristics of the activation function rather than from the updating rule of the internal states, which is derived from the quadratic Lyapunov (energy) functional.
From this viewpoint, we propose a novel approach based on an entropy defined in terms of the overlaps, i.e., the inner products between the state vector and the embedded vectors. That is, in the present model the functional to be minimized is defined in terms of the entropy instead of the conventional quadratic functional. It will then be found that higher-order dynamics are involved in a self-contained manner in the present entropy-based approach. In Section 2, a theoretical framework based on the entropy approach is described, and the relationship between the present proposal and the conventional model with a quadratic Lyapunov functional is presented. Some numerical results are given in Section 3, and Section 4 is devoted to concluding remarks.

Theory
Let us consider an associative model with the embedded binary vectors $e_i^{(r)} = \pm 1$ ($1 \le i \le N$, $1 \le r \le L$), where $N$ and $L$ are the number of neurons and the number of embedded vectors to be retrieved, respectively. The state of the neural network is characterized in terms of the state vector $s_i$ ($1 \le i \le N$) and the internal states $\sigma_i$ ($1 \le i \le N$), which are related to each other by

$$s_i = f(\sigma_i) \quad (1 \le i \le N), \tag{1}$$

where $f(\cdot)$ is the activation function of the neuron.
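Then we introduce the following entropy, which is related to the overlaps:

$$I = -\frac{1}{2} \sum_{r=1}^{L} \left(m^{(r)}\right)^2 \log \left(m^{(r)}\right)^2, \tag{2}$$

where the overlaps $m^{(r)}$ ($r = 1, 2, \ldots, L$) are defined by

$$m^{(r)} = \sum_{i=1}^{N} e_i^{\dagger(r)} s_i. \tag{3}$$

Here the covariant vector $e_i^{\dagger(r)}$ is defined in terms of the following orthogonal relation:

$$\sum_{i=1}^{N} e_i^{\dagger(r)} e_i^{(r')} = \delta_{rr'}, \tag{4}$$

which is satisfied by

$$e_i^{\dagger(r)} = \sum_{r'=1}^{L} \left(\omega^{-1}\right)_{rr'} e_i^{(r')}, \qquad \omega_{rr'} = \sum_{i=1}^{N} e_i^{(r)} e_i^{(r')}, \tag{5-7}$$

where $\omega^{-1}$ denotes the inverse matrix of $\omega$.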
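The construction of the overlaps can be illustrated with a short sketch. It assumes that the covariant vectors are obtained from the embedded vectors through the inverse of the Gram matrix $\omega$ (the sum over neurons of products of embedded-vector components), which is one standard way to satisfy an orthogonality relation of this kind. The function names `gram_inverse`, `dual_vectors` and `overlaps` are hypothetical, and the three four-component patterns are chosen only for illustration.

```python
def gram_inverse(patterns):
    """Invert the Gram matrix omega[r][q] = sum_i e_i^(r) e_i^(q) by Gauss-Jordan elimination."""
    size = len(patterns)
    omega = [[float(sum(a * b for a, b in zip(patterns[r], patterns[q])))
              for q in range(size)] for r in range(size)]
    # Augment omega with the identity matrix: [omega | I].
    aug = [row + [1.0 if r == c else 0.0 for c in range(size)]
           for r, row in enumerate(omega)]
    for col in range(size):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("embedded vectors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[size:] for row in aug]


def dual_vectors(patterns):
    """Covariant vectors: e_dagger^(r) = sum_q (omega^-1)[r][q] * e^(q)."""
    inv = gram_inverse(patterns)
    size, n = len(patterns), len(patterns[0])
    return [[sum(inv[r][q] * patterns[q][i] for q in range(size))
             for i in range(n)] for r in range(size)]


def overlaps(duals, state):
    """Overlaps m^(r) = sum_i e_dagger_i^(r) * s_i."""
    return [sum(d * s for d, s in zip(dual, state)) for dual in duals]


# Three linearly independent but non-orthogonal patterns (illustrative only).
patterns = [[1, 1, 1, 1], [1, 1, 1, -1], [1, 1, -1, -1]]
duals = dual_vectors(patterns)
print(overlaps(duals, patterns[1]))
```

When the state coincides with a stored pattern, the overlaps form a Kronecker delta (up to rounding) even though the patterns themselves are not mutually orthogonal.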
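Then the entropy defined by Equation (2) is minimized under the following conditions:

$$\left(m^{(r)}\right)^2 = \delta_{r r_0} \quad (1 \le r \le L), \tag{8}$$

$$\sum_{r=1}^{L} \left(m^{(r)}\right)^2 = 1, \tag{9}$$

where $r_0$ labels the target pattern. That is, regarding $\left(m^{(r)}\right)^2$ ($1 \le r \le L$) as the probability distribution in Equation (2), a target pattern may be retrieved by minimizing the entropy $I$ with respect to $m^{(r)}$, or equivalently the state vector $s_i$, until Equation (8) and Equation (9) are satisfied. Therefore the entropy may be regarded as the functional to be minimized during the retrieval process of the autoassociation model, instead of the conventional quadratic Lyapunov functional, i.e., the energy functional $E$:

$$E = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} s_i^{\dagger} s_j, \tag{10}$$

where $s_i^{\dagger}$ is the covariant vector defined by

$$s_i^{\dagger} = \sum_{r=1}^{L} \sum_{j=1}^{N} e_i^{\dagger(r)} e_j^{\dagger(r)} s_j, \tag{11}$$

and the connection matrix $w_{ij}$ is defined by

$$w_{ij} = \sum_{r=1}^{L} e_i^{(r)} e_j^{\dagger(r)}. \tag{12}$$

Substituting Equation (12) into Equation (10), one readily finds

$$E = -\frac{1}{2} \sum_{r=1}^{L} \left(m^{(r)}\right)^2. \tag{13}$$

According to the steepest descent approach in the discrete-time model, the updating rule of the internal states $\sigma_i(t)$, with $s_i(t) = \mathrm{sgn}(\sigma_i(t))$, may be defined by

$$\sigma_i(t+1) = -\eta \frac{\partial I}{\partial s_i^{\dagger}}, \tag{14}$$

where $\eta$ ($> 0$) is a coefficient. Substituting Equation (2) and Equation (3) into Equation (14), and noting the following relation, which follows from Equation (11),

$$\frac{\partial m^{(r)}}{\partial s_i^{\dagger}} = e_i^{(r)}, \tag{15}$$

one readily derives

$$\sigma_i(t+1) = \eta \sum_{r=1}^{L} e_i^{(r)} m^{(r)}(t) \left[ 1 + \log \left(m^{(r)}(t)\right)^2 \right]. \tag{16}$$

Generalizing the above dynamics somewhat, we propose the following dynamic rule for the internal states, which unifies the conventional quadratic dynamics and the presently proposed entropy approach:

$$\sigma_i(t+1) = \eta \sum_{r=1}^{L} e_i^{(r)} m^{(r)}(t) \left[ 1 + \log \left( 1 - \alpha + \alpha \left(m^{(r)}(t)\right)^2 \right) \right]. \tag{17}$$

In the above expression, $\alpha$ ($0 < \alpha < 1$) is a control parameter of the present model. First, in the limit $\alpha \to 0$, the above dynamics reduce to the conventional autocorrelation dynamics:

$$\sigma_i(t+1) = \eta \sum_{r=1}^{L} e_i^{(r)} m^{(r)}(t) = \eta \sum_{j=1}^{N} w_{ij} s_j(t). \tag{18}$$

On the other hand, Equation (17) reduces to Equation (16) in the limit $\alpha \to 1$. Therefore one may interpolate between the autocorrelation dynamics ($\alpha \to 0$) and the entropy-based dynamics ($\alpha \to 1$) by means of the generalized approach defined by Equation (17). It is now worthwhile to examine the higher-order correlations contained in Equation (17) by expanding its right-hand side, which yields Equation (19).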
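A minimal sketch of one update step follows. It assumes the generalized rule takes the form $\sigma_i(t+1) = \eta \sum_r e_i^{(r)} m^{(r)}(t)\,[1 + \log(1 - \alpha + \alpha\, m^{(r)}(t)^2)]$, which is the form consistent with the two limits stated above (autocorrelation dynamics for $\alpha \to 0$, entropy-based dynamics for $\alpha \to 1$). The names `weighted_overlap`, `internal_states` and `sgn` are hypothetical, the overlaps are taken as given inputs, and the convention sgn(0) = +1 is an assumption.

```python
import math


def weighted_overlap(m, alpha):
    """Return m * (1 + log(1 - alpha + alpha * m^2)) for one overlap m.

    For alpha = 1 and m = 0 the logarithm diverges, but m * log(m^2) -> 0,
    so that limiting value is returned explicitly.
    """
    arg = 1.0 - alpha + alpha * m * m
    if arg <= 0.0:
        return 0.0
    return m * (1.0 + math.log(arg))


def internal_states(patterns, m, alpha, eta=1.0):
    """Internal states sigma_i = eta * sum_r e_i^(r) * weighted_overlap(m^(r))."""
    n = len(patterns[0])
    return [eta * sum(p[i] * weighted_overlap(mr, alpha)
                      for p, mr in zip(patterns, m))
            for i in range(n)]


def sgn(x):
    """Signum activation; sgn(0) = +1 is an assumed convention."""
    return 1 if x >= 0 else -1


# Two toy patterns and two given overlaps (illustrative values only).
patterns = [[1, -1], [1, 1]]
m = [0.5, 0.25]
print(internal_states(patterns, m, alpha=0.0))   # autocorrelation limit
print(internal_states(patterns, m, alpha=1.0))   # entropy-based limit
```

Under this assumed form, the factor $1 + \log m^2$ in the $\alpha \to 1$ limit is negative whenever $m^2 < 1/e$, so small overlaps enter the sum with reversed sign instead of simply adding to the field, as they do in the autocorrelation limit.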
Substituting Equation (3) into Equation (19), one may eventually derive the updating rule for the internal state, Equation (20), in which the lowest-order correlation reduces to the conventional quadratic framework expressed by Equation (10) and Equation (12) as $\alpha \to 0$. Furthermore, the higher-order connection tensors appearing in this updating rule are readily obtained as Equation (21). It should be borne in mind that all of the connection tensors are uniquely determined in terms of the embedded vectors $e_i^{(r)}$ and their covariant counterparts $e_i^{\dagger(r)}$ ($1 \le i \le N$, $1 \le r \le L$), which are related to each other according to Equation (5) to Equation (7). Thus the present approach substantially includes higher-order correlations beyond the conventional approach defined by Equation (11), in which the correlation between neurons is restricted to the second-order contribution corresponding to the quadratic Lyapunov functional given by Equation (10).
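One way to see how higher-order terms of this kind arise is sketched below. It assumes that the generalized rule contains the logarithmic factor $1 + \log(1 - \alpha + \alpha m^2)$; the coefficient symbol $c_n(\alpha)$ is introduced here only for illustration, and the expansion actually used in Equation (19) may be organized differently.

```latex
\begin{align*}
\log\bigl(1-\alpha+\alpha m^{2}\bigr)
  &= \log(1-\alpha) + \log\Bigl(1+\frac{\alpha}{1-\alpha}\,m^{2}\Bigr) \\
  &= \log(1-\alpha) + \sum_{n=1}^{\infty} c_n(\alpha)\, m^{2n},
  \qquad
  c_n(\alpha) = \frac{(-1)^{n+1}}{n}\Bigl(\frac{\alpha}{1-\alpha}\Bigr)^{n},
  \qquad \alpha m^{2} < 1-\alpha, \\[4pt]
\sigma_i(t+1)
  &= \eta \sum_{r=1}^{L} e_i^{(r)}
     \Bigl[\bigl(1+\log(1-\alpha)\bigr)\, m^{(r)}
     + \sum_{n=1}^{\infty} c_n(\alpha)\,\bigl(m^{(r)}\bigr)^{2n+1}\Bigr].
\end{align*}
```

Because each overlap is linear in the state vector, $m^{(r)} = \sum_j e_j^{\dagger(r)} s_j$, the term of degree $2n+1$ couples neuron $i$ to $2n+1$ state components through a tensor of the form $\sum_r e_i^{(r)} e_{j_1}^{\dagger(r)} \cdots e_{j_{2n+1}}^{\dagger(r)}$, scaled by a coefficient that depends only on $\alpha$. As $\alpha \to 0$, every $c_n(\alpha)$ vanishes and $1 + \log(1-\alpha) \to 1$, so only the linear (autocorrelation) term survives.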
For practical association of the stored patterns, the connection tensors of Equation (21) have to be utilised instead of the embedded vectors themselves, i.e., $e_i^{(r)}$ ($1 \le r \le L$).
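Putting the pieces of this section together, a retrieval loop may be sketched as follows. This is an illustration under several assumptions, not the implementation used for the reported results: the update uses the logarithmic factor $1 + \log(1 - \alpha + \alpha m^2)$, the control parameter follows a linear ramp $\alpha(t) = \alpha_{\max}\, t / T_{\max}$, sgn(0) is taken as +1, and the stored patterns are mutually orthogonal so that the covariant vectors are simply the patterns divided by $N$. All function and variable names are hypothetical.

```python
import math


def sgn(x):
    """Signum activation; sgn(0) = +1 is an assumed convention."""
    return 1 if x >= 0 else -1


def overlaps(duals, state):
    """Overlaps m^(r) = sum_i e_dagger_i^(r) * s_i."""
    return [sum(d * s for d, s in zip(dual, state)) for dual in duals]


def weighted_overlap(m, alpha):
    """m * (1 + log(1 - alpha + alpha * m^2)), with the m -> 0 limit at alpha = 1."""
    arg = 1.0 - alpha + alpha * m * m
    if arg <= 0.0:
        return 0.0
    return m * (1.0 + math.log(arg))


def retrieve(patterns, duals, state, t_max, alpha_max=1.0, eta=1.0):
    """Iterate the assumed update rule for t_max synchronous steps."""
    n = len(state)
    for t in range(1, t_max + 1):
        alpha = alpha_max * t / t_max          # assumed linear ramp
        m = overlaps(duals, state)
        state = [sgn(eta * sum(p[i] * weighted_overlap(mr, alpha)
                               for p, mr in zip(patterns, m)))
                 for i in range(n)]
    return state


# Three mutually orthogonal patterns with N = 8 (illustrative only).
patterns = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
duals = [[e / 8 for e in p] for p in patterns]   # valid only for orthogonal patterns
target = patterns[1]
noisy = [-target[0]] + target[1:]                # Hamming distance 1 from the target
result = retrieve(patterns, duals, noisy, t_max=5)
print(result == target)
```

For general (non-orthogonal) patterns, `duals` would instead have to be computed from the inverse Gram matrix. Starting the ramp near the autocorrelation limit matters in this toy case: applying the $\alpha = 1$ rule directly to the noisy state gives the small overlaps a large reversed-sign weight on the first step.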

Results
The embedded vectors are set to binary random vectors, $e_i^{(r)} = \mathrm{sgn}(z_i^{(r)})$, where $z_i^{(r)}$ ($1 \le i \le N$, $1 \le r \le L$) are zero-mean pseudo-random numbers between $-1$ and $+1$. For simplicity, the activation function, Equation (1), is set to $f(\sigma_i) = \mathrm{sgn}(\sigma_i)$, where sgn($\cdot$) denotes the signum function, which takes the value $+1$ for positive and $-1$ for negative arguments. The initial vector $s_i(0)$ is obtained from a target pattern by inverting $H_d$ of its components, where $e_i^{(r_0)}$ is the target pattern to be retrieved and $H_d$ is the Hamming distance between the initial vector $s_i(0)$ and the target vector. A retrieval is regarded as successful when Equation (8) and Equation (9) are satisfied, in which case the system is in a steady state such that $s_i(t+1) = s_i(t)$. To assess the retrieval ability of the present model, the success rate $S_r$ is defined as the rate of success over 1,000 trials with different sets of embedded vectors. To shift from the autocorrelation dynamics just after the initial state ($t \sim 1$) to the entropy-based dynamics ($t \sim T_{\max}$), the parameter $\alpha$ in Equation (17) was simply controlled according to Equation (31), where $T_{\max}$ and $\alpha_{\max}$ are the maximum values of the number of updating iterations according to Equation (17) and of $\alpha$, respectively.

Choosing $N = 200$, $\eta = 1$, $T_{\max} = 25$, $L/N = 0.5$ and $\alpha_{\max} = 1$, we first present an example of the dynamics of the overlaps in Figures 1(a)-(d) (entropy-based approach) and Figures 2(a)-(d) (associatron), in which the abscissa and the ordinate are the retrieval steps after the initial state and the overlaps derived from Equation (16), respectively. Therein the cross symbols (×) and the open circles (o) represent the success of retrieval, in which Equation (8) and Equation (9) are satisfied, and the entropy defined by Equation (2), respectively, for a retrieval process. In addition, the time dependence of the parameter $\alpha/\alpha_{\max}$ defined by Equation (31) is depicted as dots (.).

In Figures 1(a)-(d), after a transient state, it may be confirmed that the complete association corresponding to the conditions of Equation (8) and Equation (9) is achieved, even for such a relatively large Hamming distance of the initial vector from the target vector as $H_d/N = 0.1$-$0.15$. On the other hand, in Figures 2(a)-(d), trapping in a local minimum is found to be inevitable for $L/N = 0.5$ ($\gg 0.14$, which is the relative storage capacity for the autocorrelation model as discussed by Amari and Maginu [8]; see Concluding Remarks), in which Equation (8) and Equation (9) cannot be achieved even for $H_d/N \to 0$ with $L/N > 0.5$. In addition, one sees that the retrieval cannot be achieved beyond $H_d/N = 0.05$, as in Figures 2(c) and (d). From these results one may confirm the advantage of our approach over the conventional models based on the quadratic Lyapunov (energy) functionals.

Concluding Remarks
In the present paper, we have proposed an entropy-based association model in place of the conventional autocorrelation dynamics. From the numerical results, it was found that a large memory capacity may be achieved on the basis of the entropy approach. This advantage is considered to result from the fact that the present dynamics for updating the internal states, Equation (17), ensure that the entropy, Equation (2), is minimized under the conditions of Equation (8) and Equation (9), which correspond to the successful retrieval of a target pattern.
To conclude this work, we show in Figure 5 the dependence on the Hamming distance of the storage capacity, which is defined as the area covered by the success rate curves shown in Figures 3 and 4. Therein one again sees the great advantage of the present model, based on the entropy functional to be minimized, over the conventional quadratic form. In fact, the present model realizes a considerably larger storage capacity than the associatron over $H_d/N \sim 0$-$0.5$. Memory retrieval with the associatron becomes troublesome near $H_d/N = 0.5$, as seen in Figure 5, since the directional cosine between the initial vector and the target pattern eventually vanishes there. Even in such a case, the present model attains a remarkably large memory capacity because of the higher-order correlations involved in Equation (17) or Equation (21), as expected from Figure 3.
As a future problem, it seems worthwhile to incorporate chaotic dynamics into the present model by introducing a periodic activation function, such as a sinusoidal one, and to extend the approach to the autocorrelation model by replacing the covariant vectors $e_i^{\dagger(r)}$ with the embedded vectors $e_i^{(r)}$ in the present approach, so that the connection matrix $w_{ij}$ and the overlaps $m^{(r)}$ take their autocorrelation forms, corresponding to Equation (12) and Equation (15), respectively. A study of the entropy-based approach with Equation (20), i.e., the autocorrelation dynamics, is now in progress in relation to chaos dynamics [12]; it will be reported elsewhere in a separate paper and compared with the previous works [13,14] in the near future. Furthermore, it seems worthwhile to examine the truncation effects of the expansion tensors in Equation (21), which was not directly derived in our previous work [15], for practical applications related to hardware implementation.

Figure 1. The time dependence of overlaps of the present entropy based model defined by Equation (17).

Figure 5. The dependence of the storage capacity on the Hamming distance. Here symbols o and x are for the entropy based approach and the associatron, respectively.
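As a sketch of how a storage-capacity value of this kind could be computed, the snippet below integrates a success-rate curve with the trapezoidal rule. It assumes the curve is sampled as success rate versus loading rate $L/N$ at a fixed Hamming distance; the function name `capacity_area` is hypothetical, and the sample values are synthetic placeholders, not data from Figures 3 to 5.

```python
def capacity_area(loading_rates, success_rates):
    """Area under a sampled success-rate curve (trapezoidal rule)."""
    if len(loading_rates) != len(success_rates) or len(loading_rates) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    area = 0.0
    for k in range(len(loading_rates) - 1):
        width = loading_rates[k + 1] - loading_rates[k]
        mean_height = 0.5 * (success_rates[k] + success_rates[k + 1])
        area += width * mean_height
    return area


# Synthetic placeholder curve: perfect retrieval at low loading, none at high loading.
loading = [0.0, 0.25, 0.5, 0.75, 1.0]
success = [1.0, 1.0, 0.5, 0.0, 0.0]
print(capacity_area(loading, success))
```

Repeating this for each value of $H_d/N$ would give one point per Hamming distance, which is the kind of curve Figure 5 reports.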