Open Access

*Entropy*
**2010**,
*12*(1),
136-147;
https://doi.org/10.3390/e12010136

Article

On the Entropy Based Associative Memory Model with Higher-Order Correlations

Department of Electrical Engineering, Faculty of Engineering, Nagaoka University of Technology, Kamitomioka 1603-1, Nagaoka, Niigata 940-21, Japan

Received: 2 January 2010 / Accepted: 18 January 2010 / Published: 22 January 2010

## Abstract

In this paper, an entropy-based associative memory model is proposed and applied to memory retrieval with an orthogonal learning model, so as to compare it with the conventional model based on a quadratic Lyapunov functional minimized during the retrieval process. In the present approach, the updating dynamics are constructed on the basis of an entropy minimization strategy, which reduces asymptotically to the above-mentioned conventional dynamics as a special case when higher-order correlations are ignored. Through the introduction of the entropy functional, one may involve higher-order correlation effects between neurons in a self-contained manner, without any heuristic coupling coefficients as in the conventional approach. In fact, we shall show that such higher-order coupling tensors are uniquely determined in the framework of the entropy-based approach. From numerical results, it is found that the presently proposed approach realizes a much larger memory capacity than the quadratic Lyapunov functional approach, e.g., the associatron.

**Keywords:** association model; entropy; memory capacity; Lyapunov functional

## 1. Introduction

During the past quarter century, a large number of autoassociative models have been extensively investigated on the basis of autocorrelation dynamics characterized by a quadratic Lyapunov functional to be minimized. Since the proposals of the pioneering retrieval models by Anderson [1], Kohonen [2], and Nakano [3], such autoassociation models of neurons inter-connected through an autocorrelation matrix were theoretically analyzed by Amari [4], Amit et al. [5], and Gardner [6]. So far it has been well appreciated that the storage capacity of the autocorrelation model, i.e., the number of stored pattern vectors L to be completely associated versus the number of neurons N (called the relative storage capacity, or loading rate, and denoted as $\alpha_c = L/N$), is evaluated as $\alpha_c \simeq 0.14$ at most for the autocorrelation learning model with the signum activation function sgn(x) [7,8].

In contrast to the above-mentioned models with monotonic activation functions, neuro-dynamics with a nonmonotonic mapping was proposed by Morita [9], Yanai and Amari [10], and Shiino and Fukai [11]. They clarified that a nonmonotonic mapping in a neuro-dynamics model possesses a remarkable advantage in storage capacity, $\alpha_c \simeq 0.27$–$0.4$, superior to the conventional association models with monotonic activation functions such as the signum or sigmoidal function. Therefore activation functions have been considered worth investigating, not only in associative memory models but also in learning models in relation to chaos dynamics [12].

In the above-mentioned association models, the dynamics have been restricted to updating rules based on the quadratic form of Lyapunov functionals minimized through the retrieval process. That is, the nonlinearity of the dynamics results from the nonlinear characteristics of the activation function rather than from the updating rule of the internal states derived from the quadratic Lyapunov, or energy, functional form.

From the above-mentioned viewpoint, we shall propose a novel approach based on an entropy defined in terms of the overlaps, i.e., the inner products between the state vector and the embedded vectors. That is, in the present model the functional to be minimized is defined in terms of the entropy instead of the conventional quadratic functionals. It will then be found that higher-order dynamics are involved in a self-contained manner in the present entropy-based approach. In Section 2 a theoretical framework based on the entropy approach is described, together with the relationship between the present proposal and the conventional model with a quadratic Lyapunov functional to be minimized. Numerical results are given in Section 3, and Section 4 is devoted to concluding remarks.

## 2. Theory

Let us consider an associative model with the embedded binary vectors ${e}_{i}^{(r)} = \pm 1$ $(1 \le i \le N,\ 1 \le r \le L)$, where N and L are the number of neurons and the number of embedded vectors to be retrieved, respectively. The states of the neural network are characterized in terms of the state vector ${s}_{i}$ $(1 \le i \le N)$ and the internal states ${\sigma}_{i}$ $(1 \le i \le N)$, which are related to each other by:

$${s}_{i}=f({\sigma}_{i})\quad (1\le i\le N),$$

where $f(\cdot)$ is the activation function of the neuron.

Then we introduce the following entropy, which is related to the overlaps:

$$I=-\frac{1}{2}{\displaystyle \sum _{r=1}^{L}{m}^{(r)2}\,\mathrm{log}\left\{{m}^{(r)2}\right\}},$$

where the overlaps ${m}^{(r)}$ $(r = 1,2,\dots,L)$ are defined by:

$${m}^{(r)}={\displaystyle \sum _{i=1}^{N}{e}_{i}^{\dagger (r)}{s}_{i}},$$

and the covariant vector ${e}_{i}^{\dagger (r)}$ is defined in terms of the following orthogonality relation:

$${\displaystyle \sum _{i=1}^{N}}{e}_{i}^{\dagger (r)}{e}_{i}^{(s)}={\delta}_{rs}\quad (1\le r,s\le L),$$

that is:

$${e}_{i}^{\dagger (r)}={\displaystyle \sum _{{r}^{\prime}=1}^{L}}{a}_{r{r}^{\prime}}{e}_{i}^{({r}^{\prime})},$$

$${a}_{r{r}^{\prime}}={({\omega}^{-1})}_{r{r}^{\prime}},$$

$${\omega}_{r{r}^{\prime}}={\displaystyle \sum _{i=1}^{N}{e}_{i}^{(r)}{e}_{i}^{({r}^{\prime})}},$$

where ${\omega}^{-1}$ denotes the inverse matrix of ω. The entropy defined by Equation (2) is then minimized under the following conditions:

$$|{m}^{(r)}|={\delta}_{rs}\quad (1\le r,s\le L),$$

and:

$${\displaystyle \sum _{r=1}^{L}}{m}^{(r)2}=1,$$

where $s$ denotes the index of the target pattern.
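For concreteness, the covariant vectors of Equations (5) to (7) can be obtained numerically by inverting the overlap matrix ω. The following Python/NumPy sketch (array names are ours, not the paper's) checks the orthogonality relation above:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 200, 100                            # neurons, stored patterns (L < N)
e = np.sign(rng.uniform(-1, 1, (L, N)))    # embedded binary vectors e_i^(r)

omega = e @ e.T                  # omega_{rr'} = sum_i e_i^(r) e_i^(r'),  Eq. (7)
a = np.linalg.inv(omega)         # a_{rr'} = (omega^{-1})_{rr'},          Eq. (6)
e_dag = a @ e                    # covariant vectors e_i^{dag(r)},        Eq. (5)

# Orthogonality: sum_i e_i^{dag(r)} e_i^{(s)} = delta_{rs}
delta = e_dag @ e.T
assert np.allclose(delta, np.eye(L), atol=1e-8)
```

Since `e_dag @ e.T = inv(e @ e.T) @ (e @ e.T)`, the relation holds exactly whenever the L pattern vectors are linearly independent, which is generic for random binary vectors with L < N.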

That is, regarding ${m}^{(r)2}$ $(1\le r\le L)$ as a probability distribution in Equation (2), a target pattern may be retrieved by minimizing the entropy I with respect to ${m}^{(r)}$, or the state vector ${s}_{i}$, so that Equation (8) and Equation (9) are satisfied. Therefore the entropy may be considered a functional to be minimized during the retrieval process of the auto-association model, instead of the conventional quadratic Lyapunov, i.e., energy, functional E:

$$E=-\frac{1}{2}{\displaystyle \sum _{i=1}^{N}{\displaystyle \sum _{j=1}^{N}{W}_{ij}}}{s}_{i}^{\dagger}{s}_{j},$$

where ${s}_{i}^{\dagger}$ is the covariant state vector defined by:

$${s}_{i}^{\dagger}={\displaystyle \sum _{r=1}^{L}{\displaystyle \sum _{j=1}^{N}{e}_{i}^{\dagger (r)}}}{e}_{j}^{\dagger (r)}{s}_{j},$$

and the connection matrix ${W}_{ij}$ is defined in terms of:

$${W}_{ij}={\displaystyle \sum _{r=1}^{L}{e}_{i}^{(r)}}{e}_{j}^{\dagger (r)}.$$

Substituting Equation (12) into Equation (10), one may readily find:

$$E=-\frac{1}{2}{\displaystyle \sum _{r=1}^{L}{m}^{(r)2}}.$$
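This identity is easy to check numerically: evaluating the quadratic form of Equation (10) with the covariant state of Equation (11) reproduces $-\frac{1}{2}\sum_r m^{(r)2}$ exactly, because the covariant and embedded vectors are mutually orthonormal. A small sketch (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
L, N = 10, 50
e = np.sign(rng.uniform(-1, 1, (L, N)))   # embedded vectors e_i^(r)
e_dag = np.linalg.inv(e @ e.T) @ e        # covariant vectors, Eqs. (5)-(7)
s = np.sign(rng.uniform(-1, 1, N))        # an arbitrary binary state

W = e.T @ e_dag                  # connection matrix, Eq. (12)
s_dag = e_dag.T @ (e_dag @ s)    # covariant state, Eq. (11)
m = e_dag @ s                    # overlaps, Eq. (3)

E_quadratic = -0.5 * s_dag @ W @ s   # energy form, Eq. (10)
E_overlap = -0.5 * np.sum(m**2)      # overlap form shown above
assert np.isclose(E_quadratic, E_overlap)
```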

According to the steepest descent approach in the discrete-time model, the updating rule of the internal states, with ${s}_{i}=f({\sigma}_{i})=\mathrm{sgn}({\sigma}_{i})$, may be defined by:

$${\sigma}_{i}(t+1)=-\eta \frac{\partial I}{\partial {s}_{i}^{\dagger}}\quad (1\le i\le N),$$

where η ( > 0) is a coefficient. Substituting Equation (2) and Equation (3) into Equation (14), and noting the following relation with the aid of Equation (11):

$${m}^{(r)}={\displaystyle \sum _{i=1}^{N}{e}_{i}^{\dagger (r)}{s}_{i}}={\displaystyle \sum _{i=1}^{N}{e}_{i}^{(r)}{s}_{i}^{\dagger}},$$

one may readily derive the following relation:

$${\sigma}_{i}(t+1)=\eta {\displaystyle \sum _{r=1}^{L}{e}_{i}^{(r)}{m}^{(r)}(t)}\left[1+\mathrm{log}\left\{{m}^{(r)}(t)^{2}\right\}\right].$$

Generalizing somewhat the above dynamics, we propose the following dynamic rule for the internal states in order to unify the conventional quadratic dynamics as well as the presently proposed entropy approach as mentioned below:

$$\begin{array}{l}{\sigma}_{i}(t+1)=\eta {\displaystyle \sum _{r=1}^{L}{e}_{i}^{(r)}\left\{{\displaystyle \sum _{j=1}^{N}{e}_{j}^{(r)}{s}_{j}(t)}\right\}}\left[1+\mathrm{log}\left\{\left(1-\alpha \right)+\alpha {\left\{{\displaystyle \sum _{j=1}^{N}{e}_{j}^{(r)}{s}_{j}(t)}\right\}}^{2}\right\}\right]\\ =\eta {\displaystyle \sum _{r=1}^{L}{e}_{i}^{(r)}}\left\{{m}^{(r)}(t)\right\}\left[1+\mathrm{log}\left\{\left(1-\alpha \right)+\alpha {\left\{{m}^{(r)}(t)\right\}}^{2}\right\}\right].\end{array}$$
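A direct implementation of the update rule of Equation (17) takes only a few lines of NumPy. The sketch below (our own function name and signature) uses the covariant vectors for the overlaps, as in the orthogonal learning model of this section; for α < 1 the argument of the logarithm stays positive, so no special handling is needed:

```python
import numpy as np

def update(sigma, e, e_dag, alpha, eta=1.0):
    """One step of the generalized dynamics, Equation (17).

    e     : (L, N) embedded vectors e_i^(r)
    e_dag : (L, N) covariant vectors e_i^{dag(r)}
    alpha : control parameter; 0 -> autocorrelation, near 1 -> entropy dynamics
    """
    s = np.sign(sigma)                                # s_i = sgn(sigma_i), Eq. (1)
    m = e_dag @ s                                     # overlaps, Eq. (3)
    gain = 1.0 + np.log((1.0 - alpha) + alpha * m**2)
    return eta * (e.T @ (m * gain))                   # sigma_i(t+1), Eq. (17)
```

At α = 0 the gain factor reduces to 1 and the rule collapses to the autocorrelation dynamics, as discussed next.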

In the above expression α (0 < α < 1) is a control parameter of the present model, as follows. First, in the limit α→0, the above dynamics reduces to the conventional autocorrelation dynamics:

$${\sigma}_{i}(t+1)=\eta {\displaystyle \sum _{r=1}^{L}{e}_{i}^{(r)}{m}^{(r)}(t)}=\eta {\displaystyle \sum _{j=1}^{N}{W}_{ij}{s}_{j}(t)}.$$

On the other hand, Equation (17) results in Equation (16) in the case α→1. Therefore one may control the dynamics between the autocorrelation dynamics (α→0) and the entropy-based approach (α→1) on the basis of the presently proposed generalized approach defined by Equation (17).

Now it seems worthwhile to exhibit the higher-order correlations in Equation (17) by expanding its right-hand side as follows:

$${\sigma}_{i}(t+1)=\eta {\displaystyle \sum _{r=1}^{L}{e}_{i}^{(r)}{m}^{(r)}(t)}\left[1+\mathrm{log}(1-\alpha )+{\displaystyle \sum _{n=1}^{\infty}\frac{{(-1)}^{n+1}}{n}{\beta}^{n}{m}^{(r)}(t)^{2n}}\right],$$

where β is defined by:

$$\beta =\frac{\alpha}{1-\alpha}.$$

Substituting Equation (3) into Equation (19), one may eventually derive the following updating rule for the internal state, i.e.:

$${\sigma}_{i}(t+1)=\eta {\displaystyle \sum _{n=0}^{\infty}{\displaystyle \sum _{{j}_{1}=1}^{N}\cdots {\displaystyle \sum _{{j}_{2n+1}=1}^{N}{W}_{i{j}_{1}\dots {j}_{2n+1}}^{(2n+1)}{s}_{{j}_{1}}(t)\cdots {s}_{{j}_{2n+1}}(t)}}},$$

where ${W}_{i{j}_{1}\dots {j}_{2n+1}}^{(2n+1)}$ (0 ≤ n < ∞) are the connection weight tensors between neurons, involving the higher-order correlations for n ≥ 1, and are expressed by means of ${e}_{i}^{(r)}$ and ${e}_{i}^{\dagger (r)}$ by comparing Equation (19) and Equation (21). Of course, the lowest-order connection weight ${W}_{i{j}_{1}}^{(1)}$ in Equation (21) corresponds to ${W}_{i{j}_{1}}$ in Equation (12), i.e.:

$${W}_{i{j}_{1}}^{(1)}=\left\{1+\mathrm{log}(1-\alpha )\right\}{\displaystyle \sum _{r=1}^{L}{e}_{i}^{(r)}{e}_{{j}_{1}}^{\dagger (r)}}\to {W}_{i{j}_{1}}\quad (\alpha \to 0).$$

Thus the lowest correlation reduces to the conventional quadratic framework expressed in terms of Equation (10) and Equation (12) as $\alpha \to 0$. Furthermore, for the higher-order connection tensors appearing in Equation (21), one may readily obtain the following results:

$${W}_{i{j}_{1}\dots {j}_{2n+1}}^{(2n+1)}=\frac{{(-1)}^{n+1}}{n}{\beta}^{n}{\displaystyle \sum _{r=1}^{L}{e}_{i}^{(r)}{e}_{{j}_{1}}^{\dagger (r)}\cdots {e}_{{j}_{2n+1}}^{\dagger (r)}}\quad (n\ge 1).$$

It should be borne in mind here that all of the connection tensors ${W}_{i{j}_{1}\dots {j}_{2n+1}}^{(2n+1)}$ (0 ≤ n < ∞) are uniquely determined in terms of the embedding vectors ${e}_{i}^{(r)}$ and ${e}_{i}^{\dagger (r)}$ $(1\le i\le N,\ 1\le r\le L)$, which are related to each other according to Equation (5) to Equation (7). Thus the present approach substantially includes higher-order correlations beyond the conventional approach defined by Equation (11), in which the correlation between neurons is restricted to the second-order contribution corresponding to the quadratic Lyapunov functional given by Equation (10). For practical association of the stored patterns, the connection tensors defined by Equation (21) have to be utilized instead of the embedded vectors ${e}_{i}^{(r)}$ and ${e}_{i}^{\dagger (r)}$ $(1\le r\le L)$ themselves.

## 3. Results

The embedded vectors are set to binary random vectors as follows:

$${e}_{i}^{(r)}=\mathrm{sgn}({Z}_{i}^{(r)})\quad (1\le i\le N,\ 1\le r\le L),$$

where ${Z}_{i}^{(r)}$ $(1\le i\le N,\ 1\le r\le L)$ are zero-mean pseudo-random numbers between −1 and +1. For simplicity, the activation function in Equation (1) is set to:

$${s}_{i}=f({\sigma}_{i})=\mathrm{sgn}({\sigma}_{i}),$$

where sgn(•) denotes the signum function defined by:

$$\mathrm{sgn}(x)=\{\begin{array}{cc}\hfill -1\hfill & (x<0)\hfill \\ \hfill 0\hfill & (x=0)\hfill \\ \hfill +1\hfill & (x>0)\hfill \end{array}$$

The initial vector ${s}_{i}(0)$ $(1\le i\le N)$ is set to:

$${s}_{i}(0)=\{\begin{array}{cc}-{e}_{i}^{(s)}\hfill & \hfill (1\le i\le {H}_{d})\hfill \\ +{e}_{i}^{(s)}\hfill & \hfill ({H}_{d}+1\le i\le N)\hfill \end{array},$$

where ${e}_{i}^{(s)}$ is the target pattern to be retrieved and ${H}_{d}$ is the Hamming distance between the initial vector ${s}_{i}(0)$ and the target vector ${e}_{i}^{(s)}$. The retrieval is successful if the overlap:

$${m}^{(s)}(t)={\displaystyle \sum _{i=1}^{N}{e}_{i}^{\dagger (s)}{s}_{i}(t)}$$

results in 1 for $t\ge 1$, in which case the system is in a steady state such that:

$${s}_{i}(t+1)={s}_{i}(t),$$

$${\sigma}_{i}(t+1)={\sigma}_{i}(t).$$
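Putting the pieces of this section together, a single retrieval trial — random pattern generation, a corrupted initial state at Hamming distance ${H}_{d}$, and iteration of Equation (17) under the linear schedule for α given below (Equation (31)) — may be sketched as follows. Parameter values are taken from this section; the code itself is our own illustration, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, Hd = 200, 100, 20              # neurons, patterns (L/N = 0.5), Hamming distance
T_max, eta, alpha_max = 25, 1.0, 1.0

e = np.sign(rng.uniform(-1, 1, (L, N)))   # embedded binary vectors
e_dag = np.linalg.inv(e @ e.T) @ e        # covariant vectors, Eqs. (5)-(7)

target = 0
s = e[target].copy()
s[:Hd] *= -1                              # flip the first Hd bits of the target

for t in range(T_max):
    alpha = (t / T_max) * alpha_max       # schedule, Eq. (31); alpha stays below 1
    m = e_dag @ s                         # overlaps, Eq. (3)
    sigma = eta * (e.T @ (m * (1 + np.log((1 - alpha) + alpha * m**2))))
    s_new = np.sign(sigma)                # Eq. (1) with f = sgn
    if np.array_equal(s_new, s):          # steady state reached
        break
    s = s_new

overlap = e_dag[target] @ s               # m^(s); a value of 1 indicates success
```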

To see the retrieval ability of the present model, the success rate ${S}_{r}$ is defined as the rate of successful retrievals over 1,000 trials with different embedded vector sets ${e}_{i}^{(r)}$ $(1\le i\le N,\ 1\le r\le L)$. To shift from the autocorrelation dynamics just after the initial state ($t\sim 1$) to the entropy-based dynamics ($t\sim {T}_{\mathrm{max}}$), the parameter α in Equation (17) was simply controlled by:

$$\alpha =\frac{t}{{T}_{\mathrm{max}}}{\alpha}_{\mathrm{max}}\quad (0\le t\le {T}_{\mathrm{max}}),$$

where ${T}_{\mathrm{max}}$ and ${\alpha}_{\mathrm{max}}$ are the maximum values of the number of updating iterations according to Equation (17) and of α, respectively.

Choosing N = 200, η = 1, ${T}_{\mathrm{max}}$ = 25, L/N = 0.5 and ${\alpha}_{\mathrm{max}}$ = 1, we first present an example of the dynamics of the overlaps in Figures 1(a)−(d) (entropy-based approach) and Figures 2(a)−(d) (associatron), in which the abscissa and the ordinate give the retrieval steps after the initial state and the overlaps derived from Equation (16), respectively. Therein, for a retrieval process, the cross symbols (×) represent the success of retrieval, in which Equation (8) and Equation (9) are satisfied, and the open circles (o) represent the entropy defined by Equation (2). In addition, the time dependence of the parameter $\alpha /{\alpha}_{\mathrm{max}}$ defined by Equation (31) is depicted as dots (.). In Figures 1(a)−(d), after a transient state, it may be confirmed that the complete association corresponding to the conditions of Equation (8) and Equation (9) can be achieved, even for a relatively large Hamming distance of the initial vector from the target vector, ${H}_{d}/N$ = 0.1−0.15. On the other hand, in Figures 2(a)−(d), trapping in a local minimum is found to be inevitable for L/N = 0.5, far beyond 0.14, the relative storage capacity of the autocorrelation model as discussed by Amari and Maginu [8] (see Concluding Remarks); there Equation (8) and Equation (9) cannot be satisfied even for ${H}_{d}/N\to 0$ with L/N > 0.5. In addition, one sees that the retrieval cannot be achieved beyond ${H}_{d}/N$ = 0.05, as in Figures 2(c) and (d). From these results one may certainly confirm the advantage of our approach over the conventional models based on the quadratic Lyapunov (energy) functionals.

**Figure 1.** The time dependence of overlaps of the present entropy-based model defined by Equation (17).

Next, the dependence of the success rate ${S}_{r}$ on the loading rate L/N is depicted in Figure 3 for various Hamming distances ${H}_{d}$ with N = 100. For comparison, the corresponding results of the associatron model with α~0, i.e., Equation (11), are shown in Figure 4. Comparing Figure 3 and Figure 4, it is found that the present approach achieves a considerably larger memory capacity than the conventional autocorrelation strategy. Therefore the presently proposed nonlinear dynamics with the higher-order correlations involved in Equation (17) or Equation (21), based on the entropy functional to be minimized, has a great advantage in storage capacity over the conventional one based on Equation (10) and Equation (18).

**Figure 3.** The dependence of the success rate on the loading rate α = L/N of the autoassociation model based on Equation (17) (entropy-based approach).

**Figure 4.** The dependence of the success rate on the loading rate α = L/N of the autoassociation model based on Equation (18) (associatron).

The depression of the success rate at L/N~1 in Figure 3 may be considered to result from the fact that:

$${W}_{ij}={\displaystyle \sum _{r=1}^{L}{e}_{i}^{(r)}{e}_{j}^{\dagger (r)}}={\delta}_{ij}\quad (L=N),$$

so that Equation (18) reads:

$$\begin{array}{cc}\hfill {\sigma}_{i}(t+1)& =\eta {\displaystyle \sum _{j=1}^{N}{W}_{ij}{s}_{j}(t)}\hfill \\ & =\eta {s}_{i}(t).\hfill \end{array}$$

Thus, noting that ${s}_{i}(t)=\mathrm{sgn}({\sigma}_{i}(t))$ and η > 0, one has:

$$\begin{array}{cc}\hfill {\sigma}_{i}(t+1)& =\eta {s}_{i}(t)\hfill \\ & =\eta {s}_{i}(0)\quad (t\ge 0);\hfill \end{array}$$

that is, the dynamics simply locks in the initial state and no retrieval takes place.
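This degeneracy at L = N is easy to reproduce numerically: for N linearly independent patterns, the projection matrix of Equation (12) collapses to the identity, so the α→0 dynamics merely echoes the current state. A small sketch (the rank check guards against a degenerate random draw):

```python
import numpy as np

rng = np.random.default_rng(2)
N = L = 16
# draw until the pattern matrix is invertible (generic for random +/-1 vectors)
while True:
    e = np.sign(rng.uniform(-1, 1, (L, N)))
    if np.linalg.matrix_rank(e) == N:
        break

e_dag = np.linalg.inv(e @ e.T) @ e
W = e.T @ e_dag            # W_ij = sum_r e_i^(r) e_j^{dag(r)}, Eq. (12)
# W reduces to the identity when L = N, so sigma(t+1) = eta * s(t)
assert np.allclose(W, np.eye(N), atol=1e-8)
```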

## 4. Concluding Remarks

In the present paper, we have proposed an entropy-based association model in place of the conventional autocorrelation dynamics. From numerical results, it was found that a large memory capacity may be achieved on the basis of the entropy approach. This advantage in the association property of the present model results from the fact that the present dynamics for updating the internal state, Equation (17), ensures that the entropy, Equation (2), is minimized under the conditions of Equation (8) and Equation (9), which correspond to the successful retrieval of a target pattern.

To conclude this work, we show in Figure 5 the dependence of the storage capacity, defined as the area under the success rate curves of Figures 3 and 4, on the Hamming distance. Therein one may see again the great advantage of the present model, based on the entropy functional to be minimized, beyond the conventional quadratic form. In fact, the present model realizes a considerably larger storage capacity than the associatron over ${H}_{d}/N$ ~ 0−0.5. Memory retrieval with the associatron becomes troublesome near ${H}_{d}/N$ = 0.5, as seen in Figure 5, since the directional cosine between the initial vector and the target pattern eventually vanishes there. Remarkably, even in such a case, the present model attains a remarkably large memory capacity because of the higher-order correlations involved in Equation (17) or Equation (21), as expected from Figure 3.

As a future problem, it seems worthwhile to introduce chaotic dynamics into the present model by means of a periodic activation function such as a sinusoidal one, and to extend the autocorrelation model by replacing ${e}_{i}^{\dagger (r)}$ with ${e}_{i}^{(r)}/N$ in the present approach, in which case the connection matrix ${w}_{ij}$ and the overlaps ${m}^{(r)}$ read:

$${w}_{ij}=\frac{1}{N}{\displaystyle \sum _{r=1}^{L}{e}_{i}^{(r)}{e}_{j}^{(r)}},$$

and:

$${m}^{(r)}(t)=\frac{1}{N}{\displaystyle \sum _{i=1}^{N}{e}_{i}^{(r)}{s}_{i}(t)},$$

respectively, corresponding to Equation (12) and Equation (15). The entropy-based approach with this autocorrelation dynamics is now in progress in relation to chaos dynamics [12] and will be reported elsewhere as a separate paper, to be compared with the previous works [13,14]. Furthermore, it seems worthwhile to examine the truncation effects of the expansion tensors in Equation (21), which were not directly derived in our previous work [15], for practical applications related to hardware implementation.

**Figure 5.**The dependence of the storage capacity on the Hamming distance. Here symbols o and x are for the entropy based approach and the associatron, respectively.

## Acknowledgments

This work was supported in part by the 21st Century COE (Center of Excellence) Program "Global Renaissance by Green Energy Revolution" and a Grant-in-Aid for Scientific Research (15300070) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

## References

1. Anderson, J.A. A simple neural network generating interactive memory. *Math. Biosci.* **1972**, *14*, 197–220.
2. Kohonen, T. Correlation matrix memories. *IEEE Trans. Comput.* **1972**, *C-21*, 353–359.
3. Nakano, K. Associatron: A model of associative memory. *IEEE Trans. Syst. Man Cybern.* **1972**, *SMC-2*, 381–388.
4. Amari, S. Neural theory of association and concept formation. *Biol. Cybern.* **1977**, *26*, 175–185.
5. Amit, D.J.; Gutfreund, H.; Sompolinsky, H. Storing infinite numbers of patterns in a spin-glass model of neural networks. *Phys. Rev. Lett.* **1985**, *55*, 1530–1533.
6. Gardner, E. Structure of metastable states in the Hopfield model. *J. Phys. A: Math. Gen.* **1986**, *19*, 1047–1052.
7. Kohonen, T.; Ruohonen, M. Representation of associated pairs by matrix operators. *IEEE Trans. Comput.* **1973**, *C-22*, 701–702.
8. Amari, S.; Maginu, K. Statistical neurodynamics of associative memory. *Neural Networks* **1988**, *1*, 63–73.
9. Morita, M. Associative memory with nonmonotone dynamics. *Neural Networks* **1993**, *6*, 115–126.
10. Yanai, H.-F.; Amari, S. Auto-associative memory with two-stage dynamics of non-monotonic neurons. *IEEE Trans. Neural Networks* **1996**, *7*, 803–815.
11. Okada, M.; Shiino, M.; Fukai, T. Random and systematic dilutions of synaptic connections in a neural network with a nonmonotonic response function. *Phys. Rev. E* **1998**, *57*, 2095–2103.
12. Nakagawa, M. *Chaos and Fractals in Engineering*; World Scientific: Singapore, 1999.
13. Yatsuki, S.; Miyajima, H. Associative ability of higher order neural networks. *Proc. Int. Conf. Neural Networks* **1997**, *2*, 1299–1304.
14. Gorban, A.N.; Mirkes, Y.M.; Wunsch, D.C. Higher order orthogonal tensor networks: Information capacity and reliability. *Proc. Int. Conf. Neural Networks* **1997**, *2*, 1311–1314.
15. Nakagawa, M. Entropy based associative model. *Lect. Notes Comput. Sci.* **2006**, *4232*, 397–406.

© 2010 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).