1. Introduction
Reservoir computing (RC) is a computationally cheap method for training dynamical system models; it was initially proposed for recurrent neural networks (RNNs) [
1]. Generally, RNNs are trained using gradient descent, but this is computationally expensive because the signal flow must be tracked through the network over a period of time. Jaeger et al. [
2] and Maass et al. [
3] found that RNNs achieve approximation tasks by training only a static function, converting network states to the output. Their methods were unified as RC, a framework that trains dynamical models by training static functions [
4,
5].
An RC model is a dynamical system model designed to be trained through RC, and it consists of a dynamical system called the “reservoir” and a static function called the “readout.” As shown in
Figure 1, the reservoir processes the input to the model first, and the readout maps the reservoir state to the output. In supervised learning, the model aims to approximate the given target system. During RC model training, only the readout is tuned for each target, but the reservoir does not adapt to targets. Empirically, an RC model performs better using a reservoir with complex dynamics. For example, echo state networks [
2] use an RNN with random parameters as a reservoir.
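As an illustration, the echo-state-network idea can be sketched in a few lines: the recurrent weights are drawn at random once and never trained, and only the states they produce are later passed to a readout. The sketch below is ours, not taken from the cited works; the sizes and the spectral-radius scaling of 0.9 are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random reservoir (never trained): an RNN with random weights.
n_res, n_in = 100, 1
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))        # input weights
W = rng.uniform(-0.5, 0.5, (n_res, n_res))          # recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # scale spectral radius below 1

def run_reservoir(inputs):
    """Drive the reservoir with an input sequence and collect its states;
    a readout would later map these states to outputs."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u))
        states.append(x.copy())
    return np.array(states)
```

Keeping the spectral radius below 1 is a common heuristic for obtaining fading memory; it is used here only to make the sketch well behaved.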
Another advantage of RC is that various dynamical systems can be used as the reservoir, even if they are difficult to train or adjust. Recently, physical RC, which uses a reservoir implemented as hardware, has been drawing attention in terms of energy efficiency and computation speed [
6,
7,
8]. Many studies have been conducted on various implementations of reservoirs, e.g., electric and electronic circuits [
9,
10,
11,
12,
13], networks [
14] and delay feedback systems [
15,
16] using optical elements, and spin torque oscillators [
17,
18,
19].
An RC model and its reservoir are said to be universal if the model can approximate an arbitrary target with arbitrary precision. The concept of universality concerning RC appeared simultaneously with RC itself [
3]. Maass et al. [
3] also proposed a sufficient condition for a continuous-time RC model to be universal. The sufficient condition in [
3] is the combination of the continuity and injectivity of the reservoir, which is a functional from input functions to output values. The injectivity of the reservoir is also a necessary condition for universality.
Sugiura et al. [
20] proposed the relaxed condition called the neighborhood separation property (NSP), which can be applied to more complex reservoirs with multiple equilibrium states. The authors of [
21] showed that a reservoir with a finite-dimensional output can satisfy the NSP but not the condition in [
3]. In [
21], another sufficient condition was proposed: the existence of a continuous inverse of the reservoir. This condition is also sufficient for the NSP and was used to show that the NSP can be satisfied. Relationships among universality and the reservoir conditions explained above are summarized in
Figure 2.
As mentioned above, several necessary conditions and sufficient conditions for universality are known, but a necessary and sufficient condition, i.e., one equivalent to universality, has remained unknown. Such a condition is critical because it gives an essential answer to the question of which properties of a reservoir enable approximation with an RC model. In this paper, we show that the NSP and the continuous inverse, which are sufficient conditions for a reservoir to be universal, are in fact equivalent to universality itself. Moreover, using the obtained equivalence, we show that a universal reservoir has a “pathological” property. As in previous studies, we consider a continuous-time RC model with a polynomial readout and evaluate the approximation using the maximum error. A dynamical system, e.g., a model, reservoir, or target, is treated as a functional from input functions to output values.
Our result concerning the conditions equivalent to universality can be extended to a general case where the input function space is compactifiable. Hence, our result may be applied to various types of RC not discussed in this paper, such as discrete-time ones.
The pathological property of a universal reservoir is that it has dense discontinuous points. As we show later, a universal reservoir has a continuous inverse map. Hence, if there is a continuous and universal reservoir, it is a homeomorphism. However, the infinite-dimensional space of input functions and the finite-dimensional space of output values cannot be homeomorphic. The same holds if we restrict the reservoir domain to an arbitrary open subset, i.e., a universal reservoir has a discontinuous point in any open subset. This result suggests that a universal reservoir is highly sensitive to inputs, and chaotic reservoirs, such as those described in [
19,
22,
23,
24], are necessary to achieve universality. These facts support the empirical rule that a complex reservoir tends to be effective and provide significant insight into the development of high-performance reservoirs.
Although considering noise in inputs and observations is important in practice, we focus on the deterministic and noiseless case for the following reasons. First, theoretical research on continuous-time RC remains limited even in the deterministic setting. Second, the definition of universality in the stochastic case is not straightforward and has not yet been established.
The remainder of this paper is structured as follows:
Section 2 provides preliminaries and describes RC and previous results. In
Section 3, we prove that the NSP and the continuous inverse of a reservoir are equivalent to universality. In
Section 4, we prove that a universal reservoir has dense discontinuous points. The main symbols used in this paper are summarized in
Table 1.
2. Preliminaries
We discuss a dynamical system represented as a functional on functions of time. Let
be a compact and convex set of input values and
be the limit of the speed of input change. We define the set
V of input functions as follows:
where
is the Euclidean norm, and
is defined as
One must choose
A wide enough and
K large enough to cover the input functions that one considers. In reality, such
A and
K may be unknown or may not exist, but we do not consider these cases here.
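As a concrete illustration, membership in the input set can be checked numerically for a sampled signal. The sketch below is ours: it assumes, for simplicity, that the input value set is a one-dimensional interval [A_min, A_max] and checks the speed limit K on a discrete grid.

```python
import numpy as np

def in_V(samples, t_grid, A_min, A_max, K, tol=1e-9):
    """Check whether a sampled input is admissible: its values stay in the
    compact interval [A_min, A_max] (a 1-D input value set, for simplicity)
    and its rate of change between grid points never exceeds K."""
    samples, t_grid = np.asarray(samples), np.asarray(t_grid)
    in_range = np.all((samples >= A_min - tol) & (samples <= A_max + tol))
    rates = np.abs(np.diff(samples) / np.diff(t_grid))
    return bool(in_range and np.all(rates <= K + tol))
```

The discrete rate check only approximates the true speed limit; a signal can satisfy it on a grid while violating it between samples.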
Because input functions are given on a finite time interval in practice, we define another set of input functions by restricting the domain of functions in
V. For a function
v and
, we write the restriction of
v to
as
. We define the input functions on a finite time interval as
Note that
is not defined for a specific
and contains input functions on time intervals of various lengths. The dynamical system that we discuss is a functional from
to
. In the real world, such a functional is a machine or a device that processes an input
v for a period
in real time and outputs its state at time 0.
For example, we define a functional using the following state-space system:
where
and
are the system state and input at time
. The initial state is
, and the derivative of the state is given by the function
of the state and input. In considering the fixed
, System (
3) determines
for
. Hence, we can define the functional
as
Note that time is shifted so that the input signal starts at time
and ends at time 0. The functional output
is the state to which the system transitions from the initial state
with the input
of the length
t. State-space systems can represent many physical systems and, as in this example, can themselves be represented as functionals. Therefore, our discussion of functionals covers a fairly wide range of dynamical systems.
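The functional defined by a state-space system can be evaluated numerically, e.g., with a forward Euler scheme: start from the fixed initial state at time −t, drive the system with the input, and return the state at time 0. The sketch below is illustrative; functional_output and its arguments are our names, and the step count is arbitrary.

```python
import numpy as np

def functional_output(f, x0, v, t, steps=1000):
    """Approximate the functional defined by the state-space system
    x'(s) = f(x(s), v(s)) on [-t, 0]: start from the fixed initial
    state x0 at time -t, apply the input v, and return the state at 0."""
    x = np.asarray(x0, dtype=float)
    h = t / steps
    for k in range(steps):
        s = -t + k * h                 # current time in [-t, 0]
        x = x + h * f(x, v(s))         # forward Euler step
    return x

# Example: a leaky integrator x' = -x + u driven by a sinusoidal input.
out = functional_output(lambda x, u: -x + u, [0.0], np.sin, 5.0)
```

For a constant input u ≡ 1 and a long interval, the leaky integrator's output approaches 1, matching the exact solution 1 − e^{−t}.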
In supervised learning, reservoir computing (RC) trains a model to approximate a functional we call the “target.” Let and be the target and the model, respectively. We consider the uniform approximation, which is evaluated based on the worst error, . The RC model is represented as the composition of two maps as . The map is the dynamical part of the model called the “reservoir,” and the map is the static part called the “readout.” To approximate the target, RC trains only the readout and fixes the reservoir because training the dynamical part of the model is technically difficult and computationally expensive. Because the reservoir is fixed, we can implement it not only as software on a general-purpose computer but also as hardware. In the field of physical reservoir computing, many physical phenomena have been studied as reservoirs and are promising in terms of computation speed and energy consumption.
Let be a set of targets, which are functionals from to . An RC model and its reservoir are said to be universal in if the model can approximate an arbitrary . Let be the set of polynomial functions from to . If we use a polynomial readout, training the model involves selecting a polynomial function from , and universality in is defined as follows.
Definition 1. Reservoir is said to be universal for uniform approximations in if
Polynomial readouts are not practical because of their limited generalization ability. However, they are theoretically tractable, and the discussion on them can be extended to other types of readouts that can approximate continuous functions. Hence, it suffices to discuss the theoretical aspects by assuming polynomial readouts, even if we use other types of continuous functions, such as feed-forward neural networks. The definition of universality using a polynomial readout and uniform approximation, like Definition 1, has been discussed since the earliest studies on reservoir computing [
3].
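For instance, with a polynomial readout, training reduces to an ordinary least-squares fit over monomial features of the collected reservoir states; the reservoir itself is left untouched. The sketch below is ours; poly_features and fit_readout are illustrative names.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree):
    """Monomial features of the reservoir states up to the given degree."""
    cols = [np.ones(len(X))]                      # constant term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), d):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)

def fit_readout(states, targets, degree=2):
    """Least-squares fit of a polynomial readout; only the readout is
    trained, as in RC -- the reservoir producing the states is fixed."""
    Phi = poly_features(states, degree)
    coef, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return lambda X: poly_features(X, degree) @ coef
```

A target that is itself a polynomial of the states (e.g., a product of two coordinates) is fitted exactly, up to numerical precision, once the degree is large enough.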
In previous studies, two sufficient conditions for universality were provided. One condition [
20] is the weakest known to date and has been shown to be satisfiable using the other [
21]. To explain these conditions, we need a metric on
. Let
be a non-increasing function that satisfies
. Using the supremum on the domain, we define the weighted norm
of the function
v as
where
. Let
for any
and define the map
as
For an input function, the map
returns the length of the interval on which that function is defined.
Let
be a strictly increasing, bounded, and continuous function. Using the weighted norm
and function
, we define the map
as
where
,
,
, and
. The map
d is a metric on
under the conditions we describe later. The first term of the distance (
8) compares the inputs on the intersection of their domains. The function
w assigns greater weight to the difference in the newer part of the inputs. The second term of (
8) compares the length of the two inputs via the function
. From the definition of
, the second term is negligible if
and
are sufficiently large. The following proposition [
20] provides the condition that makes
d a metric:
Proposition 1. Suppose that the following triangle inequality holds for any , , , and : Then, is a compact metric space, and is dense in . The triangle inequality (
9) guarantees that
d satisfies other triangle inequalities. The density of
is confirmed as follows: for any
,
converges to
as
. From the density of
, we have
, where the symbol
is the closure. An example of a pair,
, that makes
d a metric is shown by (38) in [
20].
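For intuition, the metric can be evaluated numerically for sampled inputs. The sketch below uses assumed, illustrative choices of the weight w and the function σ (the specific pair given by (38) in [20] is not reproduced here) and approximates the supremum on a grid.

```python
import numpy as np

# Assumed, illustrative choices; the paper's specific (w, sigma) pair
# is given by (38) in [20] and is not reproduced here.
w = lambda age: np.exp(-age)     # non-increasing weight; age = -t >= 0
sigma = np.tanh                  # strictly increasing, bounded, continuous

def metric_d(u, tau_u, v, tau_v, n=2001):
    """Distance between inputs u on [-tau_u, 0] and v on [-tau_v, 0]:
    a weighted sup-norm term on the intersection of their domains plus
    a term comparing their lengths through sigma."""
    t_min = -min(tau_u, tau_v)                   # intersection of domains
    t = np.linspace(t_min, 0.0, n)
    sup_term = np.max(w(-t) * np.abs(u(t) - v(t)))
    return sup_term + abs(sigma(tau_u) - sigma(tau_v))
```

The weight w gives the newest part of the inputs (t near 0) the largest influence, and because sigma is bounded, the length term becomes negligible for two sufficiently long inputs, as described above.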
In the rest of this paper, we assume that targets in
are uniformly continuous following the previous study [
3]. This assumption enables us to extend a functional
onto
, i.e., there is a continuous functional
such that
for any
. Such continuity on a compact set is needed for theoretical discussion on approximation.
The weakest known sufficient condition for universality is called the neighborhood separation property (NSP) [
20]. The NSP means that the reservoir maps the neighborhoods of distinct points to images that are separated from each other. For
and
, we define the set
as
Although the set
is similar to a general neighborhood, note that
holds even if
. The mathematical definition of the NSP of the reservoir
is the following.
Condition 1. For any distinct , , some satisfies
Another sufficient condition for universality is that the reservoir
has a uniformly continuous inverse, as shown below [
21].
Condition 2. There is a uniformly continuous map, , that satisfies .
Condition 2 means that the input function can be continuously reconstructed from the reservoir output and that there is a continuous map from the reservoir outputs to the target outputs through input functions. Condition 2 is sufficient for Condition 1 and was used to show that a universal reservoir exists. The Hahn–Mazurkiewicz theorem [
25] claims that there is a continuous surjection
, called a space-filling curve. Hence, we can obtain the reservoir
, satisfying Condition 2, by restricting the domain of
g for its image to be
and taking a right inverse of the restriction. This result can be easily extended to
with a general
.
3. Necessary and Sufficient Condition for Universality
In this section, we prove the following theorem:
Theorem 1. Let be the set of uniformly continuous functionals from to . Let be a bounded functional. Then, Equation (5), Condition 1, and Condition 2 are equivalent to one another. Theorem 1 means that a reservoir’s universality, NSP, and uniformly continuous inverse are equivalent. This is the first result that provides the necessary and sufficient conditions of a reservoir to be universal. Theorem 1 is proved as a corollary of the following generalization:
Theorem 2. Let X be a metric space with the metric d and be a compactification of X. Suppose that and that is bounded. Then, the following three conditions are equivalent to one another:
- (i)
For any uniformly continuous and , there is some polynomial function, , that satisfies - (ii)
For any distinct and , there is some that satisfieswhere for . - (iii)
There is a uniformly continuous map that satisfieswhere is the identity mapping on X.
From Proposition 1, Theorem 1 is a corollary of Theorem 2 where . Conditions (i)–(iii) correspond to the universality of Definition 1 and Conditions 1 and 2, respectively. Set X does not have to be a strict subset of , i.e., is allowed. Hence, instead of , we can consider the compact set V or as X. In this case, we obtain the same result as that of Theorem 1 for input functions on an infinite time interval. We prove Theorem 2 by proving the following propositions: (iii)⇒(i), ¬(iii)⇒¬(i), and ¬(iii)⇒¬(ii). To prove propositions premised on ¬(iii), we use the following lemma:
Lemma 1. Suppose that Condition (iii) does not hold on the same premise as Theorem 2. Then, there exist the Cauchy sequences and that satisfy Because of
and
, Equation (
14) means that even if two inputs to
g are close, the outputs are not necessarily close. More precisely,
g is not uniformly continuous if (
14) holds. Lemma 1 claims the converse of this proposition.
Proof of Lemma 1. First, we consider the case where no map
satisfies
, i.e.,
does not have a left inverse. Having no left inverse is equivalent to not being injective. Hence, there exist distinct
and
that satisfy
, and the sequences
and
, which do not change with
k, satisfy (
14).
Next, we consider the case where a map
satisfying
exists but is not uniformly continuous, i.e.,
Then, there exist
and the sequences
,
that satisfy
A bounded
means that
is bounded and closed, i.e., compact. An infinite sequence on a compact metric space includes a subsequence that converges on that space. Hence, there exist an infinite set
and
such that the subsequence
converges to
. From the first inequality in (
16), we have
Because
and
are included in
, there exist the sequences
and
that satisfy the following:
We show that
and
include subsequences that satisfy (
14). From (
17) and (
18), the subsequences
and
satisfy the second equation in (
14). Because
is compact, there is some infinite set,
, such that the subsequence
converges. Similarly, there is also some infinite set,
, such that the subsequence
converges. Using
g in (
18) and substituting
, the following holds:
From (
19) and the second inequality in (
16), we have
for any
. Therefore, the Cauchy sequences
and
satisfy the first equation in (
14), which proves Lemma 1. □
Theorem 2 is proved as follows:
Proof of Theorem 2. As explained above, we prove Propositions (iii)⇒(i), ¬(iii)⇒¬(i), ¬(iii)⇒¬(ii), and ¬(ii)⇒¬(iii).
Proof of (iii)⇒(i): Let
and
be an arbitrary uniformly continuous functional and an arbitrary real number. We define a function
as
. Because
, (
11) is equivalent to
The function
q is uniformly continuous because
and
g are uniformly continuous. Hence, there is a unique continuous extension,
, satisfying
for any
. From the Stone–Weierstrass theorem [
26], there is some polynomial function
that satisfies
for any
on the compact set
. The polynomial function
p also satisfies (
20), which proves (iii)⇒(i). The relationship among maps in the proof is shown in
Figure 3.
Proof of ¬(iii)⇒¬(i): From Lemma 1, there exist the Cauchy sequences
and
that satisfy (
14). We define distinct
and
as
Let
be a uniformly continuous function that satisfies
A specific definition of
is not necessary for the proof, but it can be defined as follows:
To prove by contradiction, we suppose that (i) holds and let
. Then, there is some polynomial function
that satisfies
From (
14) and the continuity of
p, the following holds:
However, from (
22) and the limit of (
24) when
, the following also holds:
This contradicts (
25), and we have ¬(i). The relationship among sets and sequences in the proof is shown in
Figure 4.
Proof of ¬(iii)⇒¬(ii): From Lemma 1, there exist the Cauchy sequences
and
that satisfy (
14). We define distinct
and
and
as
Let
be an arbitrary real number. From the first and second equations in (
27), there is a sufficiently large
that satisfies
and
. Because both
and
converge to
as
, we have
, which proves ¬(iii)⇒¬(ii). The relationship among sets and sequences in the proof is shown in
Figure 5.
Proof of ¬(ii)⇒¬(iii): Suppose that distinct
and
satisfy
for any
. Let
. We consider the sequences
and
that converge to
. To prove by contradiction, suppose that a uniformly continuous map
exists and satisfies
. Then, we have
and
. Therefore, although
and
converge to the same point
as
, we have
for any
. This means that
g is not uniformly continuous, and we have ¬(iii). The relationship among sets and sequences in the proof is shown in
Figure 6.
From the above four propositions, we have (iii)⇔(i) and (iii)⇔(ii), which proves Theorem 2. □
4. Pathological Property of Universal Reservoir
Although a previous study [
21] showed that a universal reservoir exists mathematically, whether we can physically construct one is still unknown. The authors of [
21] suggested that the proposed universal reservoir using a space-filling curve has an infinite number of discontinuous points and is difficult to implement. In this section, we show that all universal reservoirs have the same problem as the example in [
21]. The main result of this section is the following:
Theorem 3. Let be the set of uniformly continuous functionals from to . Let be a functional that is bounded and universal for uniform approximations in . Then, the set of points at which is discontinuous is dense in .
Theorem 3 says that a universal reservoir has a discontinuous point in every neighborhood of every point and is thus highly sensitive to input changes. Note that the discontinuity of
does not mean the discontinuity of the time derivative
of states in the state-space representation (
4) of
f. The same holds for
, which has
m outputs. Theorem 3 is proved using the contradiction that if
is continuous on some open set, two spaces with different dimensions are homeomorphic. To this end, we use the following lemma about a topological embedding:
Lemma 2. For any , , and any , there is some topological embedding .
Proof of Lemma 2. Let
be an arbitrary function. Let
and
l be arbitrary real and natural numbers, respectively. Let
, i.e., the domain of
v is
. Because the bounded function
in (
8) is continuous and monotonically increasing, there is some
satisfying
. Let an input value
satisfy
, where
A is the input value space and
K is the maximum Lipschitz constant in (
1). Using
T and
a, we define the map
as
where
is defined for
as
As shown in
Figure 7,
is an extension of
v to
with the values on the added domain defined as some internal division point between
and
a. The division rate
is piece-wise linear and takes the value of
at
for
, which are the borders of the linear pieces.
First, we show that holds for any . Because the convex set A includes a and , the image of is included in A. Because the Lipschitz constant of is at most , and holds, the Lipschitz constant of is less than or equal to K. Hence, we have . The functions v and have the same values on the intersection of their domain. Hence, from the definition of T, we have , i.e., .
Next, we show that
h is an embedding. A continuous bijection from a compact space to a metric space is a homeomorphism, and the domain
of
h is compact. Hence, if
h is a continuous injection, it is homeomorphic to its image, i.e., an embedding. As we explained, we have
for
. Hence, different
gives different
and
, i.e., the map
h is injective. Let
and
for
,
. Then, the distance between
and
is written as
This means that the map
h is continuous. Therefore, the map
is an embedding, which proves Lemma 2. □
Using Lemma 2, we prove Theorem 3 as follows:
Proof of Theorem 3. Let bounded
be universal for uniform approximations in
. To prove by contradiction, suppose that discontinuous points of
are not dense in
, i.e., there exist
and
such that
is continuous on
. From Theorem 1, universality is equivalent to Condition 2, and
has a continuous inverse map. Hence, the restriction
of
to
is a topological embedding because it and its inverse are continuous. From Lemma 2, there is a topological embedding
. The composition
is also an embedding, i.e.,
and
are homeomorphic. The relationship among the domains and images of
h and
is shown in
Figure 8.
We call the small inductive dimension simply a dimension and write the dimension of the space
A as
. Dimensions have the following two properties (see pages 3–4 of [
27]): First, two homeomorphic topological spaces have the same dimension. Second, a topological space has a dimension equal to its subspace or larger. Therefore, we have the contradiction of
which proves Theorem 3. □