Entropy 2010, 12(1), 136-147; https://doi.org/10.3390/e12010136

Article
On the Entropy Based Associative Memory Model with Higher-Order Correlations
Department of Electrical Engineering, Faculty of Engineering, Nagaoka University of Technology, Kamitomioka 1603-1, Nagaoka, Niigata 940-21, Japan
Received: 2 January 2010 / Accepted: 18 January 2010 / Published: 22 January 2010

Abstract

In this paper, an entropy based associative memory model is proposed and applied to memory retrieval with an orthogonal learning model, for comparison with the conventional model based on a quadratic Lyapunov functional that is minimized during the retrieval process. In the present approach, the updating dynamics is constructed on the basis of an entropy minimization strategy, which reduces asymptotically to the above-mentioned conventional dynamics as a special case in which the higher-order correlations are ignored. Through the introduction of the entropy functional, one may include higher-order correlation effects between neurons in a self-contained manner, without the heuristic coupling coefficients of the conventional approach. In fact, we show that such higher-order coupling tensors are uniquely determined in the framework of the entropy based approach. Numerical results show that the proposed approach realizes a much larger memory capacity than the quadratic Lyapunov functional approach, e.g., the associatron.
Keywords: association model; entropy; memory capacity; Lyapunov functional

1. Introduction

During the past quarter century, a large number of autoassociative models have been extensively investigated on the basis of the autocorrelation dynamics characterized by a quadratic Lyapunov functional to be minimized. Since the proposals of the pioneering retrieval models by Anderson [1], Kohonen [2], and Nakano [3], such autoassociation models of neurons interconnected through an autocorrelation matrix have been theoretically analyzed by Amari [4], Amit et al. [5] and Gardner [6]. It is by now well established that the storage capacity of the autocorrelation model, i.e., the ratio of the number of stored pattern vectors that can be completely associated, L, to the number of neurons, N, which is called the relative storage capacity or loading rate and denoted as αc = L/N, is at most αc ~ 0.14 for the autocorrelation learning model with the signum activation function (abbreviated as sgn(x)) [7,8].
In contrast to the above-mentioned models with monotonic activation functions, neuro-dynamics with a nonmonotonic mapping was recently proposed by Morita [9], Yanai and Amari [10], and Shiino and Fukai [11]. They clarified that a nonmonotonic mapping in a neuro-dynamics model offers a remarkable advantage in storage capacity, αc ~ 0.27–0.4, over the conventional association models with monotonic activation functions, e.g., the signum or sigmoidal function. Activation functions have therefore been considered worth investigating, not only for associative memory models but also for learning models in relation to chaos dynamics [12].
In the above-mentioned association models, the dynamics have been restricted to the updating rule on the basis of the quadratic form of the Lyapunov functionals to be minimized through the retrieval process. That is, the nonlinearity of the dynamics results from the nonlinear characteristics of the activation function rather than the updating rule of the internal states derived from the quadratic Lyapunov, or energy, functional form.
From the above-mentioned viewpoint, we propose a novel approach based on an entropy defined in terms of the overlaps, which are the inner products between the state vector and the embedded vectors. That is, in the present model the functional to be minimized is defined in terms of the entropy instead of the conventional quadratic functionals. It will then be found that higher-order dynamics is included in a self-contained manner in the present entropy based approach. In Section 2 a theoretical framework based on the entropy approach is described, together with the relationship between the present proposal and the conventional model with a quadratic Lyapunov functional to be minimized. Some numerical results are given in Section 3, and Section 4 is devoted to concluding remarks.

2. Theory

Let us consider an associative model with the embedded binary vectors $e_i^{(r)} = \pm 1$ $(1 \le i \le N,\ 1 \le r \le L)$ to be retrieved, where $N$ and $L$ are the number of neurons and the number of embedded vectors, respectively. The states of the neural network are characterized in terms of the state vector $s_i$ $(1 \le i \le N)$ and the internal states $\sigma_i$ $(1 \le i \le N)$, which are related to each other by:
$$ s_i = f(\sigma_i) \quad (1 \le i \le N), \tag{1} $$
where $f(\cdot)$ is the activation function of the neuron.
Then we introduce the following entropy, which is related to the overlaps:
$$ I = -\frac{1}{2} \sum_{r=1}^{L} m^{(r)2} \log m^{(r)2}, \tag{2} $$
where the overlaps $m^{(r)}$ $(r = 1, 2, \ldots, L)$ are defined by:
$$ m^{(r)} = \sum_{i=1}^{N} e_i^{\dagger(r)} s_i ; \tag{3} $$
where the covariant vector $e_i^{\dagger(r)}$ is defined in terms of the following orthogonal relation:
$$ \sum_{i=1}^{N} e_i^{\dagger(r)} e_i^{(s)} = \delta_{rs} \quad (1 \le r, s \le L), \tag{4} $$
$$ e_i^{\dagger(r)} = \sum_{r'=1}^{L} a_{rr'} e_i^{(r')}, \tag{5} $$
$$ a_{rr'} = (\omega^{-1})_{rr'}, \tag{6} $$
$$ \omega_{rr'} = \sum_{i=1}^{N} e_i^{(r)} e_i^{(r')}, \tag{7} $$
where $\omega^{-1}$ denotes the inverse matrix of $\omega$.
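The construction of the covariant vectors in Equations (5) to (7) can be checked numerically. The following is a minimal sketch, not the author's implementation: the helper names `invert` and `dual_vectors` and the three small hand-picked patterns (N = 8, L = 3) are assumptions made for illustration, and the matrix inverse is computed by plain Gauss-Jordan elimination.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("patterns are linearly dependent; omega is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dual_vectors(patterns):
    """Covariant vectors e_dagger[r][i] built from Equations (5)-(7)."""
    L, N = len(patterns), len(patterns[0])
    # Equation (7): Gram matrix of the embedded patterns.
    omega = [[sum(patterns[r][i] * patterns[q][i] for i in range(N))
              for q in range(L)] for r in range(L)]
    a = invert(omega)                                   # Equation (6)
    # Equation (5): linear combination of the embedded patterns.
    return [[sum(a[r][q] * patterns[q][i] for q in range(L)) for i in range(N)]
            for r in range(L)]


# Three linearly independent, but not mutually orthogonal, patterns (illustrative).
patterns = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1, 1, 1],
    [1, 1, -1, -1, 1, 1, -1, 1],
]
duals = dual_vectors(patterns)

# Equation (4): sum_i e_dagger_i^(r) * e_i^(s) should reproduce the Kronecker delta.
check = [[sum(d * e for d, e in zip(duals[r], patterns[s])) for s in range(3)]
         for r in range(3)]
```

The matrix `check` should agree with the identity up to rounding error, which is the orthogonal relation of Equation (4).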
Then the entropy defined by Equation (2) is minimized under the following conditions:
$$ |m^{(r)}| = \delta_{rs} \quad (1 \le r, s \le L), \tag{8} $$
and:
$$ \sum_{r=1}^{L} m^{(r)2} = 1. \tag{9} $$
That is, regarding $m^{(r)2}$ $(1 \le r \le L)$ as a probability distribution in Equation (2), a target pattern may be retrieved by minimizing the entropy $I$ with respect to $m^{(r)}$, or the state vector $s_i$, until Equation (8) and Equation (9) are satisfied. Therefore the entropy may be regarded as the functional to be minimized during the retrieval process of the auto-association model, instead of the conventional quadratic Lyapunov, i.e., energy, functional $E$:
$$ E = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} w_{ij} s_i^{\dagger} s_j, \tag{10} $$
where $s_i^{\dagger}$ is the covariant vector defined by:
$$ s_i^{\dagger} = \sum_{r=1}^{L} \sum_{j=1}^{N} e_i^{\dagger(r)} e_j^{\dagger(r)} s_j, \tag{11} $$
and the connection matrix $w_{ij}$ is defined in terms of:
$$ w_{ij} = \sum_{r=1}^{L} e_i^{(r)} e_j^{\dagger(r)}. \tag{12} $$
Substituting Equation (12) into Equation (10), one may readily find:
$$ E = -\frac{1}{2} \sum_{r=1}^{L} m^{(r)2}. \tag{13} $$
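To see how the entropy of Equation (2) differs from the quadratic energy of Equation (13), both can be evaluated on two hand-made sets of overlaps. This is only an illustrative sketch; the function names `entropy` and `energy` and the two example overlap lists are assumptions, not taken from the paper.

```python
import math


def entropy(overlaps):
    """Entropy I of Equation (2); a term with m = 0 is replaced by its limit, 0."""
    return -0.5 * sum(m * m * math.log(m * m) for m in overlaps if m != 0.0)


def energy(overlaps):
    """Quadratic energy E of Equation (13)."""
    return -0.5 * sum(m * m for m in overlaps)


retrieved = [1.0, 0.0, 0.0, 0.0]   # one pattern fully recalled: conditions (8) and (9)
mixed = [0.5, 0.5, 0.5, 0.5]       # squared overlaps spread evenly, still summing to 1

print("retrieved:", entropy(retrieved), energy(retrieved))
print("mixed:    ", entropy(mixed), energy(mixed))
```

Under the normalization of Equation (9) the energy takes the same value for both cases, whereas the entropy vanishes only for the retrieved state and equals log 2 for the evenly mixed one.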
According to the steepest descent approach in the discrete time model, the updating rule of the internal states $\sigma_i$, with $s_i = f(\sigma_i) = \mathrm{sgn}(\sigma_i)$, may be defined by:
$$ \sigma_i(t+1) = -\eta \frac{\partial I}{\partial s_i^{\dagger}} \quad (1 \le i \le N), \tag{14} $$
where $\eta$ (> 0) is a coefficient. Substituting Equation (2) and Equation (3) into Equation (14) and noting the following relation with the aid of Equation (11):
$$ m^{(r)} = \sum_{i=1}^{N} e_i^{\dagger(r)} s_i = \sum_{i=1}^{N} e_i^{(r)} s_i^{\dagger}, \tag{15} $$
one may readily derive the following relation:
$$ \sigma_i(t+1) = \eta \sum_{r=1}^{L} e_i^{(r)} m^{(r)}(t) \left[ 1 + \log \{ m^{(r)}(t) \}^2 \right]. \tag{16} $$
Generalizing the above dynamics somewhat, we propose the following dynamic rule for the internal states, which unifies the conventional quadratic dynamics and the presently proposed entropy approach as described below:
$$
\begin{aligned}
\sigma_i(t+1) &= \eta \sum_{r=1}^{L} e_i^{(r)} \Big\{ \sum_{j=1}^{N} e_j^{\dagger(r)} s_j(t) \Big\} \Big[ 1 + \log \Big\{ (1-\alpha) + \alpha \Big\{ \sum_{j=1}^{N} e_j^{\dagger(r)} s_j(t) \Big\}^2 \Big\} \Big] \\
&= \eta \sum_{r=1}^{L} e_i^{(r)} \{ m^{(r)}(t) \} \left[ 1 + \log \left\{ (1-\alpha) + \alpha \{ m^{(r)}(t) \}^2 \right\} \right].
\end{aligned}
\tag{17}
$$
In the above expression, $\alpha$ (0 < α < 1) is a control parameter of the present model. First, in the limit of α→0, the above dynamics reduces to the conventional autocorrelation dynamics:
$$ \sigma_i(t+1) = \eta \sum_{r=1}^{L} e_i^{(r)} \sum_{j=1}^{N} e_j^{\dagger(r)} s_j(t) = \eta \sum_{j=1}^{N} w_{ij} s_j(t). \tag{18} $$
On the other hand, Equation (17) reduces to Equation (16) in the case of α→1. Therefore one may control the dynamics between the autocorrelation (α→0) and the entropy based approach (α→1) by means of the generalized rule defined by Equation (17).
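The role of α can be illustrated by isolating the factor that multiplies each embedded vector in Equation (17). The sketch below is illustrative only; the function name `gain` and the guard for a vanishing argument of the logarithm are assumptions introduced here.

```python
import math


def gain(m, alpha):
    """Factor m * [1 + log((1 - alpha) + alpha * m**2)] appearing in Equation (17)."""
    arg = (1.0 - alpha) + alpha * m * m
    if arg <= 0.0:
        # Only reachable for alpha = 1 and m = 0; use the limit m * log(m**2) -> 0.
        return 0.0
    return m * (1.0 + math.log(arg))


for alpha in (0.0, 0.5, 1.0):
    print(alpha, [round(gain(m, alpha), 4) for m in (0.0, 0.25, 0.5, 1.0)])
```

For α = 0 the factor is simply m, which is the autocorrelation rule, and for m = 1 it equals 1 for every α. For α = 1 the factor becomes negative whenever m² < 1/e, so weakly overlapping patterns then enter the update with the opposite sign.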
Now it seems worthwhile to examine the higher-order correlations in Equation (17) by expanding its right-hand side as follows:
[Equation (19): rendered as an image in the source; not recoverable here]
where $\beta$ is defined by:
$$ \beta = \frac{\alpha}{1-\alpha}. \tag{20} $$
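The expansion can be sketched directly from Equation (17) and the definition of β. The lines below are a reconstruction under the assumption that the logarithm is expanded as a power series in $\beta m^2$ (valid for $\beta m^2 < 1$); they are not necessarily the form in which Equation (19) was originally printed, and $m$ stands for $m^{(r)}(t)$.

```latex
\begin{align*}
\log\{(1-\alpha) + \alpha m^2\}
  &= \log(1-\alpha) + \log(1 + \beta m^2), \qquad \beta = \frac{\alpha}{1-\alpha}, \\
  &= \log(1-\alpha) + \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\, \beta^{n} m^{2n}, \\
\sigma_i(t+1)
  &= \eta \sum_{r=1}^{L} e_i^{(r)} \Big[ \{1 + \log(1-\alpha)\}\, m^{(r)}(t)
     + \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\, \beta^{n} \{m^{(r)}(t)\}^{2n+1} \Big].
\end{align*}
```

Only odd powers of the overlaps appear, and every term with $n \ge 1$ vanishes as α→0 because β→0 in that limit.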
Substituting Equation (3) into Equation (19), one may eventually derive the following updating rule for the internal state:
[Equation (21): rendered as an image in the source; not recoverable here]
where $W^{(2n+1)}_{i j_1 \cdots j_{2n+1}}$ $(0 \le n < \infty)$ are the connection weight tensors between neurons, which involve higher-order correlations for $n \ge 1$ and are expressed by means of $e_i^{(r)}$ and $e_i^{\dagger(r)}$ by comparing Equation (19) and Equation (21). Of course the lowest-order connection weight $W^{(1)}_{i j_1}$ in Equation (21) corresponds to $w_{i j_1}$ in Equation (12), i.e.:
[Equation (22): rendered as an image in the source; not recoverable here]
Thus the lowest-order correlation reduces to the conventional quadratic framework expressed in terms of Equation (10) and Equation (12) as $\alpha \to 0$. Furthermore, for the higher-order connection tensors appearing in Equation (21), one may readily obtain the following results:
[Equation (23): rendered as an image in the source; not recoverable here]
It should be borne in mind here that all of the connection tensors $W^{(2n+1)}_{i j_1 \cdots j_{2n+1}}$ $(0 \le n < \infty)$ are uniquely determined in terms of the embedding vectors $e_i^{(r)}$ and $e_i^{\dagger(r)}$ $(1 \le i \le N,\ 1 \le r \le L)$, which are related to each other according to Equation (5) to Equation (7). Thus the present approach includes the higher-order correlations beyond the conventional approach defined by Equation (11), in which the correlation between neurons is restricted to the second-order contribution corresponding to the quadratic Lyapunov functional given by Equation (10). For practical association of the stored patterns, the connection tensors $W^{(n)}_{i j_1 \cdots j_n}$ $(1 \le n < \infty)$ defined by Equation (21) have to be utilised instead of the embedded vectors, i.e. $e_i^{(r)}$ and $e_i^{\dagger(r)}$ $(1 \le r \le L)$.
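One way to make such tensors explicit is sketched below. It assumes that the update is written as a power series in odd powers of the overlaps, with coefficients $1 + \log(1-\alpha)$ for the linear term and $(-1)^{n+1}\beta^{n}/n$ for the term of order $2n+1$, and it writes $e_j^{\dagger(r)}$ for the covariant vector of Equation (5). The normalization (for example, whether η is absorbed into the tensors) is a choice made here for illustration and may differ from the original Equations (21) to (23).

```latex
\begin{align*}
\{m^{(r)}\}^{2n+1}
  &= \sum_{j_1=1}^{N} \cdots \sum_{j_{2n+1}=1}^{N}
     e^{\dagger(r)}_{j_1} \cdots e^{\dagger(r)}_{j_{2n+1}}\, s_{j_1} \cdots s_{j_{2n+1}}, \\
\sigma_i(t+1)
  &= \eta \sum_{n=0}^{\infty} \sum_{j_1, \ldots, j_{2n+1}}
     W^{(2n+1)}_{i j_1 \cdots j_{2n+1}}\, s_{j_1}(t) \cdots s_{j_{2n+1}}(t), \\
W^{(1)}_{i j_1}
  &= \{1 + \log(1-\alpha)\} \sum_{r=1}^{L} e_i^{(r)} e^{\dagger(r)}_{j_1}
  \;\longrightarrow\; w_{i j_1} \quad (\alpha \to 0), \\
W^{(2n+1)}_{i j_1 \cdots j_{2n+1}}
  &= \frac{(-1)^{n+1}}{n}\, \beta^{n} \sum_{r=1}^{L}
     e_i^{(r)} e^{\dagger(r)}_{j_1} \cdots e^{\dagger(r)}_{j_{2n+1}} \quad (n \ge 1).
\end{align*}
```

In this form each tensor is fixed entirely by the embedded vectors and α, with no free coupling coefficients.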

3. Results

The embedded vectors are set to binary random vectors as follows:
$$ e_i^{(r)} = \mathrm{sgn}(Z_i^{(r)}) \quad (1 \le r \le L), \tag{24} $$
where $Z_i^{(r)}$ $(1 \le i \le N,\ 1 \le r \le L)$ are zero-mean pseudo-random numbers between −1 and +1. For simplicity, the activation function, Equation (1), is set to:
$$ s_i = f(\sigma_i) = \mathrm{sgn}(\sigma_i), \tag{25} $$
where $\mathrm{sgn}(\cdot)$ denotes the signum function defined by:
$$ \mathrm{sgn}(x) = \begin{cases} -1 & (x < 0) \\ 0 & (x = 0) \\ +1 & (x > 0) \end{cases} \tag{26} $$
The initial vector $s_i(0)$ $(1 \le i \le N)$ is set to:
$$ s_i(0) = \begin{cases} -e_i^{(s)} & (1 \le i \le H_d) \\ +e_i^{(s)} & (H_d + 1 \le i \le N), \end{cases} \tag{27} $$
where $e_i^{(s)}$ is the target pattern to be retrieved and $H_d$ is the Hamming distance between the initial vector $s_i(0)$ and the target vector $e_i^{(s)}$. The retrieval is successful if:
$$ m^{(s)}(t) = \sum_{i=1}^{N} e_i^{\dagger(s)} s_i(t) \tag{28} $$
results in 1 for $t \gg 1$, in which case the system is in a steady state such that:
$$ s_i(t+1) = s_i(t), \tag{29} $$
$$ \sigma_i(t+1) = \sigma_i(t). \tag{30} $$
To evaluate the retrieval ability of the present model, the success rate $S_r$ is defined as the rate of success over 1,000 trials with different embedded vector sets $e_i^{(r)}$ $(1 \le i \le N,\ 1 \le r \le L)$. To shift the dynamics from the autocorrelation dynamics just after the initial state (t ~ 1) to the entropy based dynamics (t ~ $T_{\max}$), the parameter α in Equation (17) was simply controlled by:
$$ \alpha = \frac{t}{T_{\max}} \alpha_{\max} \quad (0 \le t \le T_{\max}), \tag{31} $$
where $T_{\max}$ and $\alpha_{\max}$ are the maximum values of the number of iterations of the update according to Equation (17) and of α, respectively.
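The retrieval procedure described above can be sketched as a short loop. This is a toy illustration rather than the experiment reported here: the names `sgn`, `gain` and `retrieve` are hypothetical, the network has only N = 8 neurons and L = 3 patterns, and the patterns are chosen mutually orthogonal so that the covariant vectors reduce to e/N and no matrix inversion is needed.

```python
import math


def sgn(x):
    """Signum function of Equation (26)."""
    return (x > 0) - (x < 0)


def gain(m, alpha):
    """Factor m * [1 + log((1 - alpha) + alpha * m**2)] of Equation (17)."""
    arg = (1.0 - alpha) + alpha * m * m
    if arg <= 0.0:          # alpha = 1 and m = 0: use the limit m * log(m**2) -> 0
        return 0.0
    return m * (1.0 + math.log(arg))


def retrieve(patterns, duals, target, hamming, t_max=5, alpha_max=1.0, eta=1.0):
    """Iterate Equation (17) with the linear alpha schedule of Equation (31)."""
    n = len(target)
    # Equation (27): flip the first `hamming` components of the target pattern.
    state = [-e if i < hamming else e for i, e in enumerate(target)]
    for t in range(t_max + 1):
        alpha = alpha_max * t / t_max
        overlaps = [sum(d * s for d, s in zip(dual, state)) for dual in duals]
        sigma = [eta * sum(p[i] * gain(m, alpha) for p, m in zip(patterns, overlaps))
                 for i in range(n)]
        state = [sgn(x) for x in sigma]                 # Equation (25)
    return state


# Mutually orthogonal toy patterns (assumption), so that e_dagger = e / N.
patterns = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
N = len(patterns[0])
duals = [[e / N for e in p] for p in patterns]

final = retrieve(patterns, duals, target=patterns[1], hamming=1)
overlap = sum(d * s for d, s in zip(duals[1], final))   # Equation (28)
print(final, overlap)
```

For general random patterns the covariant vectors would have to be obtained from the inverse of ω as in Equations (5) to (7), and the success rate would be estimated by repeating such runs over many pattern sets.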
Choosing N = 200, η = 1, Tmax = 25, L/N = 0.5 and αmax = 1, we first present an example of the dynamics of the overlaps in Figures 1(a)–(d) (entropy based approach) and Figures 2(a)–(d) (associatron), in which the abscissa and the ordinate are the retrieval steps after the initial state and the overlaps derived from Equation (16), respectively. Therein the cross symbols (×) and the open circles (o) represent, for a retrieval process, the success of retrieval, in which Equation (8) and Equation (9) are satisfied, and the entropy defined by Equation (2), respectively. In addition, the time dependence of the parameter α/αmax defined by Equation (31) is depicted as dots (.). In Figures 1(a)–(d) it may be confirmed that, after a transient state, the complete association corresponding to the conditions of Equation (8) and Equation (9) is achieved, even for a relatively large Hamming distance of the initial vector from the target vector such as Hd/N = 0.1–0.15. On the other hand, in Figures 2(a)–(d), trapping in a local minimum is found to be inevitable for L/N = 0.5 (≫ 0.14, which is the relative storage capacity for the autocorrelation model as discussed by Amari and Maginu [8]; see Concluding Remarks), in which Equation (8) and Equation (9) cannot be achieved even for Hd/N→0 with L/N > 0.5. In addition, one may see that the retrieval cannot be achieved beyond Hd/N = 0.05, as in Figures 2(c) and (d). These results confirm the advantage of our approach over the conventional models based on the quadratic Lyapunov (energy) functionals.
Figure 1. The time dependence of overlaps of the present entropy based model defined by Equation (17).
Next, the dependence of the success rate Sr on the loading rate L/N is depicted in Figure 3 for various Hamming distances Hd with N = 100. For comparison, the corresponding results of the associatron model with α~0, i.e. Equation (11), are shown in Figure 4. Comparing Figure 3 with Figure 4, it is found that the present approach achieves a larger memory capacity than the conventional autocorrelation strategy. Therefore the presently proposed nonlinear dynamics with the higher-order correlations involved in Equation (17) or Equation (21), based on the entropy functional to be minimized, has a great advantage in storage capacity over the conventional one based on Equation (10) and Equation (18).
Figure 2. The time dependence of overlaps of the associatron defined by Equation (18).
Figure 3. The dependence of the success rate on the loading rate α = L/N of autoassociation model based on Equation (17) (entropy based approach).
Figure 4. The dependence of the success rate on the loading rate α = L/N of autoassociation model based on Equation (18) (associatron).
The depression of the success rate at L/N~1 in Figure 3 may be considered to result from the fact that:
$$ w_{ij} = \sum_{r=1}^{L} e_i^{(r)} e_j^{\dagger(r)} = \delta_{ij} \quad (L = N), \tag{32} $$
for which Equation (18) reads:
$$ \sigma_i(t+1) = \eta \sum_{j=1}^{N} w_{ij} s_j(t) = \eta s_i(t). \tag{33} $$
Thus, noting that $s_i(t) = \mathrm{sgn}(\sigma_i(t))$ and η > 0, one has:
$$ \sigma_i(t+1) = \eta s_i(t) = \eta s_i(0) \quad (t \ge 0), \tag{34} $$
that is, the state remains fixed at its initial value.
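The freezing of the state at L = N can be reproduced with a tiny example. The sketch below assumes four mutually orthogonal patterns on N = 4 neurons (so that the covariant vectors are simply e/N); the names `w`, `step` and `probe` are illustrative and not taken from the paper.

```python
def sgn(x):
    """Signum function of Equation (26)."""
    return (x > 0) - (x < 0)


# L = N = 4 mutually orthogonal patterns (assumption made to avoid a matrix inverse).
patterns = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]
N = len(patterns[0])
duals = [[e / N for e in p] for p in patterns]   # e_dagger = e / N for orthogonal patterns

# Connection matrix of Equation (32): w_ij = sum_r e_i^(r) * e_dagger_j^(r).
w = [[sum(p[i] * d[j] for p, d in zip(patterns, duals)) for j in range(N)]
     for i in range(N)]


def step(state, eta=1.0):
    """One update of Equation (33) followed by the signum activation."""
    sigma = [eta * sum(w[i][j] * state[j] for j in range(N)) for i in range(N)]
    return [sgn(x) for x in sigma]


probe = [1, -1, -1, -1]          # deliberately not one of the stored patterns
print(w)
print(step(probe))
```

Because `w` is the identity matrix here, `step` returns its input unchanged for any ±1 state, so an initial vector that differs from the target can never move towards it.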

4. Concluding Remarks

In the present paper, we have proposed an entropy based association model in place of the conventional autocorrelation dynamics. From the numerical results, it was found that a large memory capacity may be achieved on the basis of the entropy approach. This advantage of the present model is considered to result from the fact that the dynamics for updating the internal state, Equation (17), ensures that the entropy, Equation (2), is minimized under the conditions of Equation (8) and Equation (9), which correspond to the successful retrieval of a target pattern.
To conclude this work, we show in Figure 5 the dependence on the Hamming distance of the storage capacity, which is defined as the area under the success rate curves shown in Figure 3 and Figure 4. Therein one may see again the great advantage of the present model, based on the entropy functional to be minimized, over the conventional quadratic form. In fact, the present model realizes a considerably larger storage capacity than the associatron over Hd/N~0-0.5. The memory retrievals for the associatron become troublesome near Hd/N = 0.5, as seen in Figure 5, since the directional cosine between the initial vector and a target pattern eventually vanishes there. Even in such a case, the present model attains a remarkably large memory capacity because of the higher-order correlations involved in Equation (17) or Equation (21), as expected from Figure 3.
As a future problem, it seems worthwhile to introduce chaotic dynamics into the present model by means of a periodic activation function, such as a sinusoidal one, and to extend the approach to the autocorrelation model by replacing $e_i^{\dagger(r)}$ by $e_i^{(r)}/N$, in which case the connection matrix $w_{ij}$ and the overlaps $m^{(r)}$ read:
$$ w_{ij} = \frac{1}{N} \sum_{r=1}^{L} e_i^{(r)} e_j^{(r)}, \tag{35} $$
and:
$$ m^{(r)}(t) = \frac{1}{N} \sum_{i=1}^{N} e_i^{(r)} s_i(t), \tag{36} $$
respectively, corresponding to Equation (12) and Equation (15). The entropy based approach with Equation (20), i.e. autocorrelation dynamics, is now in progress in relation to chaos dynamics [12]; it will be reported elsewhere in a separate paper and compared with the previous works [13,14] in the near future. Furthermore, it seems worthwhile to examine the truncation effects of the expansion tensors as in Equation (21), which was not directly derived in our previous work [15], for practical applications related to hardware implementation.
Figure 5. The dependence of the storage capacity on the Hamming distance. Here symbols o and x are for the entropy based approach and the associatron, respectively.

Acknowledgments

This work was supported in part by the 21st Century COE (Center of Excellence) Program "Global Renaissance by Green Energy Revolution" and the Grant-in-Aid for Science Research (15300070) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

References

  1. Anderson, J.A. A simple neural network generating interactive memory. Math. Biosci. 1972, 14, 197–220.
  2. Kohonen, T. Correlation matrix memories. IEEE Trans. Comput. 1972, C-21, 353–359.
  3. Nakano, K. Associatron-a model of associative memory. IEEE Trans. 1972, SMC-2, 381–388.
  4. Amari, S. Neural theory of association and concept formation. Biol. Cybern. 1977, 26, 175–185.
  5. Amit, D.J.; Gutfreund, H.; Sompolinsky, H. Storing infinite numbers of patterns in a spin-glass model of neural networks. Phys. Rev. Lett. 1985, 55, 1530–1533.
  6. Gardner, E. Structure of metastable states in the Hopfield model. J. Phys. A: Math. Gen. 1986, 19, 1047–1052.
  7. Kohonen, T.; Ruohonen, M. Representation of associated pairs by matrix operators. IEEE Trans. 1973, 22, 701–702.
  8. Amari, S.; Maginu, K. Statistical neurodynamics of associative memory. Neural Networks 1988, 1, 63–73.
  9. Morita, M. Associative memory with nonmonotone dynamics. Neural Networks 1993, 6, 115–126.
  10. Yanai, H.-F.; Amari, S. Auto-associative memory with two-stage dynamics of non-monotonic neurons. IEEE Trans. Neural Networks 1996, 7, 803–815.
  11. Okada, M.; Shiino, M.; Fukai, T. Random and systematic dilutions of synaptic connections in a neural network with a nonmonotonic response functions. Phys. Rev. E 1998, 57, 2095–2103.
  12. Nakagawa, M. Chaos and Fractals in Engineering; World Scientific Inc.: Singapore, 1999.
  13. Yatsuki, S.; Miyajima, H. Associative ability of higher order neural networks. Neural Networks 1997, 2, 1299–1304.
  14. Gorban, A.N.; Mirkes, Y.M.; Wunsch, D.C. Higher order orthogonal tensor networks: Information capacity and reliability. Proc. Neural Networks 1997, 2, 1311–1314.
  15. Nakagawa, M. Entropy based associative model. Lect. Notes Comput. Sci. 2006, 4232/2006, 397–406.