1. Introduction
Artificial neural networks are a type of artificial intelligence computing system modeled on the human brain and nervous system. The mathematical theory of neural networks explores how mathematical models can be used to understand and manipulate them. In this regard, statistical learning theory is a valuable tool for studying the behavior of neural networks [1]. However, this paper focuses on a specific learning algorithm that is a biologically plausible alternative to the well-known backpropagation algorithm; for this purpose, we will use some results from dynamical systems theory.
The basic building block of all artificial neural networks (ANNs) is the neuron. Mathematically, a neuron is a function $n_w \colon \mathbb{R}^d \to \mathbb{R}$, parameterized by a weight vector $w \in \mathbb{R}^{d+1}$ and defined by
$$ n_w(x) = \rho\Big(w_0 + \sum_{i=1}^{d} w_i x_i\Big), \qquad (1) $$
where $\rho$ is a suitable, usually continuous, real function known as the activation function. A feed-forward neural network is built by the composition of many (perhaps billions) of such simple functions, organized in layers. Networks with more than three layers are known in the literature as deep neural networks. For instance, AlphaZero, the ANN that was trained to play the games of chess and Go with superhuman performance, consists of more than 80 layers, with each layer having thousands of neurons [2,3]. For other applications, like natural language understanding or translation, recurrent networks are used. These networks are not necessarily organized in layers and thus do not form an acyclic graph.
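To make Equation (1) concrete, the following minimal sketch (Python/NumPy) implements a single neuron and the layer-wise composition that defines a feed-forward network, assuming a sigmoid activation; all identifiers are illustrative and not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    """A common choice for the activation function rho."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(w, x, rho=sigmoid):
    """Equation (1): n_w(x) = rho(w_0 + sum_i w_i x_i).

    w has length d + 1; w[0] is the bias, w[1:] the input weights."""
    return rho(w[0] + np.dot(w[1:], x))

def feedforward(weight_matrices, x, rho=sigmoid):
    """Compose layers: each W has one row per neuron, laid out as
    [bias, input weights], so a layer computes rho(W @ [1, x])."""
    for W in weight_matrices:
        x = rho(W @ np.concatenate(([1.0], x)))
    return x

# A toy 2-3-1 network with random weights.
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((3, 3)), rng.standard_normal((1, 4))]
print(feedforward(Ws, np.array([0.5, -1.0])))
```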
The goal of training such an artificial neural network is to find the correct parameters so that the function implemented by the ANN satisfies certain conditions or minimizes some objective function. For example, in supervised learning, a training set of input–output pairs is used to adjust the parameters until the error between the output of the ANN and the true values is minimized, using gradient descent in the parameter space. Because the error is propagated from the output layer backwards, the algorithm used is known as backpropagation (see, for instance, [4], Chapter 5).
The backpropagation approach to supervised learning stands out as the most successful algorithm for training artificial neural networks. But despite ANNs being originally bio-inspired, backpropagation is widely considered to be biologically implausible [5]. Among other reasons, it (i) lacks local error representation, (ii) uses distinct forward and backward information passes and (iii) requires symmetric feedback weights. One path towards bridging the gap between biology and machine learning is thus to explore other learning paradigms, closer to biological reality.
Equilibrium propagation (EP) is an energy-based model for recurrent neural networks which operates by minimizing an energy function, uses a local learning rule, and employs just one kind of neural computation, thereby addressing restrictions (i) and (ii) [5]. This model considers continuous-time neural dynamics, a departure from backpropagation's discrete per-layer updates. However, it still requires symmetric feedback weights to work.
The original idea was followed by several advances addressing the challenges of convergence, scalability, and hardware efficiency while reinforcing theoretical ties to traditional backpropagation.
Laborieux et al. [6] extended EP to deep convolutional networks using symmetric nudging; their work on CIFAR-10 reports an 11.7% test error and lower memory use, though convergence is about 20% slower than backpropagation through time. Martin et al. [7] introduced the EqSpike algorithm for spiking networks and achieved 97.6% accuracy on MNIST. Foroushani et al. [8] implemented EP on an analog circuit and reported a 250-fold acceleration in the relaxation process relative to a Python baseline. Kiraz et al. [9] focused on parameter optimization by analyzing the effects of feedback current and learning rate, while Laborieux and Zenke [10] proposed holomorphic EP, which computes exact gradients using finite teaching signals and matches backpropagation performance on an ImageNet 32×32 benchmark.
The problem of symmetric feedback weights was tackled with the introduction of the DirEcted EP (DEEP) model, which allows for asymmetric feedback connections while abandoning the need for a global energy function [11]. The issue in doing so is that the convergence of the neuronal dynamics is no longer assured, because there is no energy function guiding the network's behavior. In addition, the weight update rule, while biologically inspired, does not seem to have any ties to the gradient of the objective function, and learning is thus, in general, not guaranteed.
The main contribution of the present work is a generalization of the DEEP model by adding leakage to non-input neurons, which solves the stability issues found in the earlier work. New conditions for the convergence of the neuronal dynamics of DEEP’s inference phase are established, and a different local weight update rule is proposed, with close ties to the gradient of the objective function. Moreover, sufficient conditions for a small-sized network to learn following the proposed weight update rule are also determined.
The remainder of the paper is organized as follows. In Section 2, common notation and the EP and DEEP models are introduced. Section 3 goes over the stability problems in the DEEP model. In Section 4, our generalization of the DEEP model is presented, along with new conditions for convergence of the inference and learning phases. Section 5 discusses a new learning rule and its ties to the gradient of the objective function. Section 6 concludes the paper.
2. Equilibrium Propagation and Directed Equilibrium Propagation
In this section, we begin by defining some notation to be used throughout this text. Then, we present both the original EP model and the DEEP model.
2.1. Notation
In Equation (1), $w_0$ is commonly referred to as the neuron's bias. In this text, the bias is represented as a fixed neuron (neuron 0) that maintains a state of one and connects to every non-input neuron. The weight associated with the connection from this neuron to a neuron i then becomes neuron i's bias.
For an architecture consisting of a total of $P + L + K$ neurons, where P are input neurons, L are hidden neurons, and K are output neurons, both EP and DEEP are entirely described by the following elements:
- A state vector $s(t)$, representing the neuronal activities. This state vector is composed of sub-vectors corresponding to the (fixed) input, the hidden neurons, and the output neurons, with $s_0 = 1$ corresponding to the bias neuron. The value $s_j$ can be biologically interpreted as the firing rate of neuron j;
- A weight matrix $W$, where $w_{ij}$ represents the weight of the connection from neuron i to neuron j;
- A system of continuous-time differential equations that governs the dynamics of the network.
From this point onward, the time dependency of the state variable will be omitted to simplify the notation.
2.2. Equilibrium Propagation
Originally introduced in [5], EP is a learning framework for energy-based models that employs a network of recurrently connected neurons with symmetric weights. Its goal is to provide a more biologically plausible alternative to conventional ANNs by circumventing the backpropagation algorithm.
This model is driven by an energy function F, which is the central object of its behavior: all quantities of interest (fixed points, cost function, objective function, gradient formula) can be defined or formulated directly in terms of F.
2.2.1. The Network
The quantity $w_{ij}$ is the weight associated with the connection between neurons i and j. Self-loops are excluded, meaning $w_{ii} = 0$ for all i, and input neurons do not form connections among themselves, i.e., $w_{ij} = 0$ whenever both i and j are input neurons.
In Figure 1, the architecture of a three-neuron network of the EP model is represented. The state vector is $s = (s_0, s_1, s_2, s_3)$, containing four elements: a constant bias term $s_0 = 1$, an input term $s_1$, a hidden term $s_2$ and an output term $s_3$. Each $s_i$ corresponds to the activity of the ith neuron. The synaptic connections are described by the weight matrix $W$ (note that $w_{01}$ is omitted from Figure 1 since neuron 1 is an input neuron and so $w_{01} = 0$).
2.2.2. Auxiliary Functions
There are some functions that need to be defined in order to establish the dynamics of the network. The internal energy function, E, depends only on the state of the network and is defined as
$$ E(s) = \frac{1}{2}\sum_{j} s_j^2 - \frac{1}{2}\sum_{i \neq j} w_{ij}\, s_i s_j. \qquad (2) $$
It should be noted that the notation in this equation differs slightly from the one presented in [5]. In particular, the notation used was simplified due to the alternative version of stochastic gradient descent employed (detailed in Equation (8)), under which $\rho(s_j) = s_j$ holds for all j.
The external energy function C, which corresponds to the cost function, is the mean squared error and is defined as
$$ C(s) = \frac{1}{2}\,\big\lVert s^{\text{out}} - y \big\rVert^2, \qquad (3) $$
where $y$ is the vector containing the target output for the training input and $s^{\text{out}}$ is the sub-vector of output-neuron states. The clamping factor, denoted by $\beta$, is the parameter that determines the extent to which the target output y influences the network. Incorporating this “influence parameter”, the total energy function F is then defined as
$$ F(s) = E(s) + \beta\, C(s). \qquad (4) $$
Finally, the loss function J is defined as the external energy at the equilibrium state $s^0$, i.e., $J = C(s^0)$, which is achieved after letting the neurons evolve during the first phase so as to minimize the internal energy function.
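For concreteness, the sketch below evaluates the three energies for a given state vector, under the simplified notation in which $\rho(s_j) = s_j$; the code and all identifiers are illustrative.

```python
import numpy as np

def internal_energy(s, W):
    """E(s) = 0.5*sum_j s_j^2 - 0.5*sum_{i != j} w_ij s_i s_j.
    The bias enters through neuron 0 with s_0 = 1; diag(W) = 0."""
    return 0.5 * np.sum(s**2) - 0.5 * s @ W @ s

def external_energy(s, y, out_idx):
    """C(s) = 0.5 * ||s_out - y||^2, the mean squared error cost."""
    return 0.5 * np.sum((s[out_idx] - y) ** 2)

def total_energy(s, W, y, out_idx, beta):
    """F = E + beta * C, with beta the clamping factor."""
    return internal_energy(s, W) + beta * external_energy(s, y, out_idx)
```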
2.2.3. The Dynamics and Training Algorithm
The EP dynamics are defined by a set of differential equations. The activity of each adjustable (i.e., non-input) neuron $s_j$ evolves in the negative direction of the gradient of the total energy,
$$ \frac{ds_j}{dt} = -\frac{\partial F}{\partial s_j}, \qquad (5) $$
where
$$ -\frac{\partial F}{\partial s_j} = \sum_{i \neq j} w_{ij}\, s_i - s_j + \beta\,(y_j - s_j)\,\mathbb{1}_{\{A\}}. \qquad (6) $$
Recall that the indicator function is defined as $\mathbb{1}_{\{A\}} = 1$ if the condition A holds and $\mathbb{1}_{\{A\}} = 0$ otherwise; here, A is the condition that j is an output neuron. As a result, the total energy of the system is guaranteed to decrease over time. To promote convergence and keep the firing rates confined within the interval $[0, 1]$, a different form of gradient descent is used in the neuronal update rule. Take $\epsilon$ as the smallest time interval considered between iterations. At iteration step $t + \epsilon$, the standard gradient descent update rule takes the form
$$ s_j^{t+\epsilon} = s_j^{t} - \epsilon\, \frac{\partial F}{\partial s_j}. \qquad (7) $$
However, in what follows, the update rule is modified to
$$ s_j^{t+\epsilon} = \rho\Big( s_j^{t} - \epsilon\, \frac{\partial F}{\partial s_j} \Big). \qquad (8) $$
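In code, the difference between (7) and (8) is a single application of ρ after the gradient step; a minimal sketch, assuming a hard-sigmoid ρ that clips states to $[0, 1]$ (identifiers illustrative):

```python
import numpy as np

def rho(s):
    """Hard sigmoid: identity on [0, 1], clipped outside."""
    return np.clip(s, 0.0, 1.0)

def step_standard(s, grad_F, eps):
    """Equation (7): plain gradient descent on the total energy F."""
    return s - eps * grad_F(s)

def step_modified(s, grad_F, eps):
    """Equation (8): same step, but rho keeps firing rates in [0, 1]."""
    return rho(s - eps * grad_F(s))
```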
The training algorithm has two distinct phases, corresponding to setting $\beta = 0$ (first phase) and $\beta > 0$ (second phase).
The first phase, which corresponds to the inference process, consists of, given a fixed input expressed by the values taken by the input neurons, allowing the network to evolve toward an equilibrium state with respect to Equation (5) with $\beta = 0$. In the second phase, commonly referred to as the learning phase, a small increment in $\beta$ introduces a perturbation to the system. This slight adjustment causes the output neurons to shift their activities closer to the target values, which, in turn, propagates through the network. The system then relaxes into a new equilibrium state with respect to (5) that reduces the loss function.
Let us denote the first and second equilibrium states by $s^0$ and $s^{\beta}$. The weight updates are then computed based on the gradients of the loss function, which, in the limit $\beta \to 0$, solely depend on $s^0$ and $s^{\beta}$:
$$ \frac{\partial J}{\partial w_{ij}} = \lim_{\beta \to 0} \frac{1}{\beta}\Big( \frac{\partial F}{\partial w_{ij}}(s^{\beta}) - \frac{\partial F}{\partial w_{ij}}(s^{0}) \Big) = -\lim_{\beta \to 0} \frac{1}{\beta}\big( s_i^{\beta} s_j^{\beta} - s_i^{0} s_j^{0} \big). \qquad (9) $$
This equation for the loss gradient was proved in [5].
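The two phases and the contrastive weight update suggested by Equation (9) can be combined into one training step. The sketch below does so using the simplified energies above, with symmetric weights and a zero diagonal; it is a toy illustration under those assumptions, with illustrative names and hyperparameters, not a reference implementation.

```python
import numpy as np

def rho(s):
    return np.clip(s, 0.0, 1.0)              # hard sigmoid

def dF_ds(s, W, y, out_idx, beta):
    """Gradient of F from Equation (6), for symmetric W with zero diag."""
    g = W @ s - s                             # -dE/ds
    g[out_idx] += beta * (y - s[out_idx])     # -beta * dC/ds on outputs
    return -g                                 # dF/ds

def relax(s, W, y, out_idx, free_idx, beta, eps=0.1, steps=200):
    """Iterate the modified update (8) on the non-clamped neurons
    until (approximate) equilibrium."""
    for _ in range(steps):
        g = dF_ds(s, W, y, out_idx, beta)
        s[free_idx] = rho(s[free_idx] - eps * g[free_idx])
    return s

def ep_step(s, W, y, out_idx, free_idx, beta=0.5, lr=0.05):
    s0 = relax(s.copy(), W, y, out_idx, free_idx, beta=0.0)   # phase 1
    sb = relax(s0.copy(), W, y, out_idx, free_idx, beta=beta) # phase 2
    # Contrastive update ~ -dJ/dw_ij from Equation (9).
    W += lr * (np.outer(sb, sb) - np.outer(s0, s0)) / beta
    np.fill_diagonal(W, 0.0)                  # keep w_ii = 0 (no self-loops)
    return W, s0, sb
```

Here `out_idx` indexes the output neurons (with `y` of matching length) and `free_idx` indexes all non-input, non-bias neurons.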
2.3. DirEcted Equilibrium Propagation (DEEP)
DEEP was proposed in [11] and consists of a generalization of the original EP model. Specifically, DEEP “seeks to improve the biological plausibility of the original EP model, by removing all structural restrictions and enabling the architecture of the model to be a complete directed (asymmetric connections) graph”. Figure 2 shows an example of such a neural network.
Due to the lack of symmetry of the synaptic weights, DEEP is not energy-based: its neuronal dynamics are governed by vector fields that are not gradient fields.
Fixing (clamping) the input neurons' states, the proposed neuronal dynamics is given by Equation (10), where $y$ is the vector of target/desired outputs and the cost function C is identical to the one specified in (3) for the original formulation. The training algorithm again consists of two distinct phases: the first phase, where $\beta = 0$, and the second phase, where $\beta > 0$. During both phases, the network reaches equilibrium states, denoted by $s^0$ in the first phase and $s^{\beta}$ in the second. The objective function is defined as the cost when the network is at its first equilibrium state, $J = C(s^0)$. In the inference phase, the activities of the input neurons are fixed, and the network evolves to the equilibrium state $s^0$, from which the output is read at the corresponding output neurons.
The learning rule used is given by Equation (11), where $n_1$ and $n_2$ denote the number of steps in a discretization of the first and second phases, respectively, and $s_i^{m}$ is the ith neuron's state after m steps.
General Stability Test
Consider a nonlinear, time-invariant dynamical system with state variable $x(t) \in \mathbb{R}^{n}$,
$$ \dot{x}(t) = f\big(x(t)\big), \qquad (12) $$
where $f \colon \mathbb{R}^{n} \to \mathbb{R}^{n}$ is a nonlinear continuously differentiable function (a $C^{1}$ map).
Definition 1 (Equilibrium state). An equilibrium state $x^{*}$ of the system represented by Equation (12) is one for which $f(x^{*}) = 0$.

Definition 2 (Stable in the sense of Lyapunov). A system is said to be stable in the sense of Lyapunov around $x^{*}$ if, for any $\varepsilon > 0$, there exists $\delta > 0$ such that if $\lVert x(0) - x^{*} \rVert < \delta$, then $\lVert x(t) - x^{*} \rVert < \varepsilon$ for all $t \geq 0$.

Definition 3 (Locally asymptotically stable). A system is said to be locally asymptotically stable around $x^{*}$ if it is stable in the sense of Lyapunov and there exists $\delta > 0$ such that, if $\lVert x(0) - x^{*} \rVert < \delta$, then $\lim_{t \to \infty} x(t) = x^{*}$.
The following result, which concerns the stability of nonlinear dynamical systems, is stated and proved in [11]:

Theorem 1 (Farinha). Let $x^{*}$ be an equilibrium state of Equation (12) and let $J$ be the Jacobian matrix of f evaluated at $x^{*}$. If, for every $i$, the following two conditions are satisfied:
- 1. $J_{ii} < 0$;
- 2. $\sum_{j \neq i} |J_{ij}| < |J_{ii}|$;
then $x^{*}$ is locally asymptotically stable.
It yields, as a corollary, sufficient conditions for the stability of the network's dynamics during an inference phase:

Corollary 1 (DEEP Inference Sufficient Conditions for Stability Verification). Let $x^{*}$ be an equilibrium state with respect to the dynamics given by Equation (10), and let $J$ be the Jacobian matrix of those dynamics evaluated at $x^{*}$. If, for every $i$, the following two conditions are satisfied:
- 1. $J_{ii} < 0$;
- 2. $\sum_{j \neq i} |J_{ij}| < |J_{ii}|$;
then $x^{*}$ is locally asymptotically stable.
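The conditions of Theorem 1 and Corollary 1 are straightforward to check numerically for a given Jacobian; a small illustrative sketch:

```python
import numpy as np

def locally_stable(J):
    """Check Theorem 1: every diagonal entry is negative and the ith
    Gershgorin radius R_i = sum_{j != i} |J_ij| satisfies R_i < |J_ii|,
    so every Gershgorin circle lies strictly in the left half-plane."""
    d = np.diag(J)
    radii = np.sum(np.abs(J), axis=1) - np.abs(d)
    return bool(np.all(d < 0) and np.all(radii < np.abs(d)))

J = np.array([[-2.0, 0.5, 0.3],
              [0.4, -1.5, 0.2],
              [0.1, 0.6, -1.0]])
print(locally_stable(J))  # True: all eigenvalues have negative real part
```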
4. Generalizing the DEEP Model
One way to improve the DEEP model and overcome the problem described in the previous section is to add a leakage term to the neurons. There are several ways of representing this in the model: one would be to introduce a new neuron and let the weight of its connection to the ith neuron be that neuron's leakage term; another, which simplifies the mathematical treatment of the model, is to allow the connections ending in input neurons to be different from 0. Since these neurons are clamped in both phases of training, they are unaffected by this change, while each of the remaining neurons gains a leakage term (which was previously equal to 0).
The dynamics of the network remain the same as in Equation (10), except that the first terms of the sum, associated with the new leakage connections, are now not necessarily 0.
We now present results on the stability of the altered DEEP model, which incorporates the newly added leakage; these results may also shed light on the model's need for leakage. We then show that these new stability conditions, as well as those of Corollary 1, are also enough to ensure stability in the subsequent learning phase ($\beta > 0$). Finally, an interesting link between Corollary 1 and the new conditions is described.
4.1. New Stability Conditions
Let us consider the discretized system, with $\epsilon$ as the smallest time interval considered, and let $x^{t}$ be the system's state at the instant t. Then, following the same version of stochastic gradient descent considered in Equation (8), now in vectorial form, we have
$$ x^{t+\epsilon} = \rho\big( x^{t} + \epsilon\, V(x^{t}) \big), $$
where V is the vector field, defined in (13), that governs the network's dynamics. This means we can view an inference phase as a fixed point iteration of the function $\Phi$, defined as
$$ \Phi(x) = \rho\big( x + \epsilon\, V(x) \big). $$
An equilibrium state with respect to the network's dynamics simply becomes a fixed point of $\Phi$. We assume $\rho$ is continuously differentiable and $0 \leq \rho' \leq 1$ (see the end of this subsection). We will need the following theorems (see, for instance, [13,14]):
Theorem 2. Let X be a convex set in $\mathbb{R}^{n}$ and $\Phi \colon X \to \mathbb{R}^{n}$ a function in $C^{1}(X)$. Letting $J_{\Phi}$ be the Jacobian of Φ (the matrix norm 1 of a matrix A is defined as $\lVert A \rVert_{1} = \max_{j} \sum_{i} |a_{ij}|$), if
$$ \sup_{x \in X} \lVert J_{\Phi}(x) \rVert_{1} < 1, $$
then the function Φ is contractive in X.

Theorem 3 (Fixed point theorem in $\mathbb{R}^{n}$). Let E be a finite-dimensional normed space and X a closed convex subset of E. Let Φ be a contractive function in X, such that $\Phi(X) \subseteq X$. Then, the following statements are valid:
- (1) Φ has a unique fixed point z in X.
- (2) If $(x^{k})_{k \in \mathbb{N}}$ is the sequence of terms in E such that $x^{0} \in X$ and $x^{k+1} = \Phi(x^{k})$, then $(x^{k})$ converges to z.
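The mechanics of Theorems 2 and 3 can be illustrated numerically: a toy affine map with $\lVert A \rVert_1 < 1$ is contractive, and iterating it from any point of X converges to its unique fixed point. The map below is illustrative, not the network's Φ.

```python
import numpy as np

def phi(x, A, b):
    """A toy affine map phi(x) = A x + b; contractive when ||A||_1 < 1."""
    return A @ x + b

A = np.array([[0.2, 0.3],
              [0.1, 0.4]])
b = np.array([0.1, 0.2])
print(np.linalg.norm(A, 1))          # matrix 1-norm (max column sum) = 0.7

x = np.zeros(2)
for _ in range(100):                 # fixed point iteration, Theorem 3(2)
    x = phi(x, A, b)
z = np.linalg.solve(np.eye(2) - A, b)  # exact fixed point of this toy map
print(np.allclose(x, z))             # True
```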
We take $X = [0, 1]^{n}$ and $E = \mathbb{R}^{n}$, which clearly satisfy the conditions of Theorem 3. Considering that the activation function restricts neurons' activations to the interval $[0, 1]$, $\Phi(X) \subseteq X$ is immediate. All we have to verify is that $\Phi$ is contractive in X. We are now ready to prove the following result:
Theorem 4 (DEEP Sufficient Conditions for Stability of an Inference Phase). The inference phase of the DEEP model converges to an equilibrium state if the activation function ρ used is such that $0 \leq \rho' \leq 1$ and if, for every non-input neuron j,
$$ \sum_{i \neq j} |w_{ij}| < 1 + \lambda_{j}, $$
where $\lambda_{j}$ is the jth neuron's leakage term.

Proof. For any suitable function F, let $J_{F}$ denote its Jacobian matrix. By Theorem 2,
$$ \sup_{x \in X} \lVert J_{\Phi}(x) \rVert_{1} < 1 \qquad (25) $$
suffices to guarantee convergence of DEEP's inference phase. Let $x \in X$. Since $\Phi = \rho \circ (I + \epsilon V)$, where I is the identity operator, we have
$$ J_{\Phi}(x) = J_{\rho}\big( x + \epsilon V(x) \big)\, \big( I + \epsilon\, J_{V} \big). $$
Since ρ acts componentwise with $0 \leq \rho' \leq 1$, $J_{\rho}$ is a diagonal matrix and $\lVert J_{\rho} \rVert_{1} \leq 1$. Due to the linearity of the vector field V defined in (13), $J_{V}$ is independent of x. This means
$$ \lVert I + \epsilon\, J_{V} \rVert_{1} < 1 $$
is enough to ensure (25), since we are taking the supremum over the product of $\lVert J_{\rho} \rVert_{1}$, which is not greater than 1, and $\lVert I + \epsilon\, J_{V} \rVert_{1}$, a constant real number strictly less than 1.
We now compute the sum of the jth column of $I + \epsilon J_{V}$, where $J_{V}$ is matrix (19). For sufficiently small $\epsilon$, the diagonal entry of the column is positive and its absolute value can be removed; after simplification, the requirement that each column sum be strictly less than 1 reduces, for every non-input neuron j, to the inequality in the statement of the theorem. □
This result can be interpreted as the DEEP model needing leakage to compensate for the existence of weights with negative values.
The sigmoid function and the hyperbolic tangent are examples of commonly used activation functions which satisfy Theorem 4's requirements on ρ. Note that the activation function introduced in (18) is not $C^{1}$; however, it is $C^{1}$ almost everywhere, and so arbitrarily good approximations by differentiable functions can be used.
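In practice, the sufficient condition at the heart of the proof, $\lVert I + \epsilon J_V \rVert_1 < 1$, can be verified directly for a candidate weight configuration; a minimal sketch with an illustrative Jacobian (the field's Jacobian is taken as given, so no assumption on its internal structure is needed):

```python
import numpy as np

def inference_contracts(J_V, eps):
    """Sufficient condition from the proof of Theorem 4: the matrix
    1-norm (max column sum) of I + eps*J_V is strictly below 1."""
    n = J_V.shape[0]
    return np.linalg.norm(np.eye(n) + eps * J_V, 1) < 1.0

# Illustrative Jacobian of a linear field with strong diagonal decay
# (leakage) and weak off-diagonal coupling; eps is the step size.
J_V = np.array([[-1.5, 0.3, -0.2],
                [0.4, -1.2, 0.1],
                [-0.3, 0.2, -1.4]])
print(inference_contracts(J_V, eps=0.1))  # True for this configuration
```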
4.2. On the Stability During Learning Phase ($\beta > 0$)
As previously stated, both Corollary 1 and Theorem 4 also ensure convergence of the learning phase, as shown in this section.
4.2.1. DEEP Model
In Section 3.2, it was mentioned that, in [11], Theorem 1 was proved using the Gershgorin Circle Theorem to guarantee that all eigenvalues of the matrix J have strictly negative real part. This was done by imposing the restriction that each circle be centered on the left half of the complex plane, with a radius smaller than the distance from its center to the origin, i.e.,
$$ J_{ii} < 0 \quad \text{and} \quad R_{i} < |J_{ii}|, $$
where $R_{i} = \sum_{j \neq i} |J_{ij}|$ is the ith Gershgorin circle's radius.
Let K be the number of output neurons. Noting that J is a square matrix with one row and column per non-input neuron, and that the cost function used is the mean squared error, when $\beta > 0$ the only change to J is that $\beta$ is subtracted from the diagonal entries corresponding to the K output neurons.
Assuming Corollary 1's conditions were satisfied during an inference phase of the DEEP model, we simply have to check that they still hold for $\beta > 0$. Subtracting $\beta$ from a diagonal entry leaves the corresponding Gershgorin radius unchanged while moving the circle's center further to the left: given that $\beta > 0$ and the previous condition holds, $R_{i} < |J_{ii}| < |J_{ii} - \beta|$.
Therefore, if Corollary 1's stability conditions held during DEEP's inference phase, they still do during its subsequent learning phase.
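This argument is easy to confirm numerically: subtracting β from the output neurons' diagonal entries leaves the Gershgorin radii unchanged and moves the circle centers further left. An illustrative check:

```python
import numpy as np

def gershgorin_ok(J):
    """Theorem 1's conditions: negative diagonal, radius < |diagonal|."""
    d = np.diag(J)
    r = np.sum(np.abs(J), axis=1) - np.abs(d)
    return bool(np.all(d < 0) and np.all(r < np.abs(d)))

J = np.array([[-2.0, 0.5],
              [0.4, -1.5]])
out = np.array([1])            # index of the single output neuron
J_beta = J.copy()
J_beta[out, out] -= 0.3        # the only change to J when beta > 0
print(gershgorin_ok(J), gershgorin_ok(J_beta))  # True True
```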
4.2.2. Theorem 4
The same is true of the new stability conditions defined in Theorem 4. Going back to Theorem 4's proof, and again considering that the only change in the matrix J is that $\beta$ is subtracted from the diagonal entries corresponding to output neurons, the diagonal term of each affected column of $I + \epsilon J_{V}$ acquires an extra $-\epsilon\beta$ contribution (Equation (35)). However, since $\beta > 0$, it can be argued that (35) still yields the required bound: for sufficiently small $\epsilon$, the affected diagonal entries remain positive, and subtracting $\epsilon\beta$ from them can only decrease the corresponding column sums. Either way, (35) holds and
$$ \lVert I + \epsilon\, J_{V}^{\beta} \rVert_{1} \leq \lVert I + \epsilon\, J_{V} \rVert_{1} < 1, $$
where $J_{V}^{\beta}$ denotes the Jacobian during the learning phase and the last inequality comes from our assumption that the network satisfied Theorem 4's conditions for stability.
4.3. Remark on the Link Between New and Previous Stability Conditions
Theorem 2 also holds if the matrix norm ∞ is used instead (see [13,14]). Then, the same reasoning used in proving Theorem 4 leads one to recover exactly the previous stability conditions (given by Corollary 1). Similarly, one can adapt Theorem 1's proof. Recall that the Gershgorin Circle Theorem was used to ensure that all the eigenvalues of matrix (19) have negative real parts. If we apply the same reasoning to the transpose of matrix (19), which has exactly the same eigenvalues, we obtain instead the new stability conditions. Two very different approaches to the same problem seem to yield the same results.
In this section, we presented the new DEEP model with leakage and proved stability conditions for both the learning and inference phases. In the next section, we will additionally introduce a learning rule inspired by [15] and analyze its relationship with the gradient of the objective function.
6. Conclusions
In this paper, we employed dynamical systems theory to rigorously analyze learning dynamics in models of biologically plausible neural networks, encompassing both feed-forward and recurrent architectures. By critically examining the DirEcted Equilibrium Propagation (DEEP) model introduced in [11], we identified inherent stability limitations arising from the absence of an energy function. To address these, we proposed an extension of the DEEP model with a neuronal leakage term and established, via theoretical analysis, that this modification ensures convergence in both the inference and learning phases.
Additionally, we clarified the relationship among various local learning rules, demonstrating that the update mechanisms in equilibrium propagation [5] align with those in [15], reinforcing the broader applicability of these learning principles.
Our investigation into learning with asymmetric (non-symmetric) feedback weights revealed that, although such systems depart from traditional energy-based frameworks, learning can still proceed under certain conditions. Specifically, for a small four-neuron network, we derived an explicit bound on the angle between the update vector and the true gradient, showing that effective learning is possible in this restricted case. However, generalizing these convergence guarantees to larger or more complex architectures remains an open challenge.
The main limitation of our study lies in its focus on small network sizes and specific architectural assumptions. While our theoretical results offer valuable insight, practical implementation and scalability in larger, real-world networks have yet to be established. Further work is needed to derive generalized angle bounds and convergence criteria for networks of arbitrary size and complexity, as well as to explore the empirical performance of these models on benchmark tasks.
We hope that our findings shed new light on the possibilities of biologically plausible learning in neural networks and encourage further research—both theoretical and experimental—into designing robust, scalable, and biologically inspired learning algorithms.