Abstract Reservoir Computing

Abstract: Noise of any kind can be an issue when translating results from simulations to the real world. We suddenly have to deal with building tolerances, faulty sensors, or simply noisy sensor readings. This is especially evident in systems with many free parameters, such as the ones used in physical reservoir computing. By abstracting away these kinds of noise sources using intervals, we derive a regularized training regime for reservoir computing that operates on sets of possible reservoir states. Numerical simulations are used to show the effectiveness of our approach against different sources of errors that can appear in real-world scenarios, and to compare it with standard approaches. Our results support the application of interval arithmetic to improve the robustness of mass-spring networks trained in simulations.


Introduction
In recent years, physical reservoir computing has enjoyed increased popularity in a wide variety of fields. Since its emergence as a computational framework for RNNs in the form of echo state networks [1] and liquid state machines [2], reservoir computing has transcended the digital world and found various applications in physical systems. This is due to the nature of the fixed reservoir, which allows other dynamical systems to be used as reservoirs, given that they exhibit certain properties. Over the years, reservoir computing systems have used buckets of water [3], light [4,5], and soft robots [6,7]. This development has given rise to the field of physical reservoir computing. Recent advances in this field, like the Origami reservoir [8] or mass-spring networks [9], have extensively used numerical simulations as part of their research.
Depending on the application, numerical simulations are inevitable; think of systems that work in difficult-to-access environments, where a loss can be potentially hazardous or expensive, or where building is time-consuming and costly. Although such numerical simulations have improved in fidelity, it is not possible to accurately represent all facets of physical systems in simulated environments. This, in part, leads to a gap between simulations and reality, also called the sim2real gap.
In addition to errors caused by differences in fidelity, hardware issues are also a source of concern. Sensors might break, misbehave, or become susceptible to noise. Accounting for all possible sources of errors is difficult and time-consuming, but necessary to create reliable and safe systems. Fortunately, the field of formal verification for software systems gives us tools to handle the complexity of such situations.
One of these tools, abstract interpretation [10], helps in dealing with such uncertainties, by enveloping them in abstract objects. This idea has also recently found its way to the neural network community for verification [11][12][13], but also for training [13,14]. Abstracting single data-points with sets of points, e.g., by encompassing them in a hypercube, as illustrated in Figure 1, it becomes possible to work with the complete neighbourhood of such data-points, without having to sample all points that lie around it.
By using simple interval arithmetic [15], Senn and Kumazawa [16] have shown how this idea can be leveraged to train robust echo state networks. We build on this approach and show how it could be applied to create robust physical reservoir computing systems in simulations. Our contributions are: • An abstract regulariser leading to robust weights for reservoir computing systems • A closed form solution for the regression problem, using the abstract regulariser • Numerical study on the robustness of physical reservoir computing systems against different types of errors Physical reservoir computing in the form of mass-spring networks and how we use abstraction to improve the robustness of such systems is introduced in Section 2; furthermore, the datasets and types of errors considered, along with the general experimental setup, are shown. Then, the achieved results are introduced and discussed in Section 3. Finally, in Section 4, we give a conclusion and an outlook for future research endeavours. Figure 1. Visualisation of the abstraction of a single data-point to a set of points, representing a hypercube. The solid point represents the concrete point, and the dotted ones represent points that are now also considered using the abstraction.

Materials and Methods
In this section, we provide a brief introduction to reservoir computing (see Section 2.1), a physical implementation based on mass-spring networks (see Section 2.2), and how we can improve the robustness of such systems against noise introduced by the sim2real gap (see Section 2.3).

Reservoir Computing
Reservoir computing revolves around the exploitation of dynamical systems, called reservoirs, for computation. In terms of machine learning, we can divide the whole approach into two phases: the training phase (see Section 2.1.1), in which teacher forcing is employed, and an exploitation phase (see Section 2.1.2), in which we use the dynamical system for our computations (e.g., predictions). In the following, we provide a brief introduction to how training and exploitation in reservoir computing work, using a nonlinear map of the form:

x_t = ϕ(A x_{t−1} + B u_t), (1)

with column vectors x_t ∈ R^d and u_t ∈ R^f; matrices A ∈ R^{d×d} and B ∈ R^{d×f}; a nonlinear function ϕ : R^d → R^d (e.g., the hyperbolic tangent); and time-steps t = 1 . . . T. In addition, we enforced the spectral radius of A to be ρ(A) < 1; this allowed the system to exhibit some kind of short-term memory. The rationale behind using a map as an example is that neural network or physics simulation-based approaches to reservoir computing can be reduced to discrete dynamical systems of this form.
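The update map above can be sketched in a few lines. The experiments in this paper were implemented in Julia, but for illustration we use Python/NumPy here; the dimensions, the random seed, and the tanh nonlinearity are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, f = 50, 1                                  # illustrative state/input dimensions

A = rng.standard_normal((d, d))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))     # enforce spectral radius rho(A) < 1
B = rng.standard_normal((d, f))

def step(x, u):
    """One application of the nonlinear map x_t = phi(A x_{t-1} + B u_t)."""
    return np.tanh(A @ x + B @ u)

x = np.zeros(d)
for _ in range(100):                          # drive the system with random input
    x = step(x, rng.uniform(0.0, 0.5, size=f))
```

Rescaling A by 0.9 over its largest eigenvalue magnitude is one common way to satisfy the spectral radius condition; the bounded tanh then keeps the state trajectory from diverging.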

Training
The training of reservoir computing systems was done in a supervised fashion; as such, we needed, next to the input signal u, a target signal y as ground truth. We further defined a washout time 0 ≤ τ < T, which is the time that the system needs until its state x is entirely dependent on the input u. Using Equation (1), we then drove the dynamical system using the input signal u, and collected the states x_t as row vectors for each time-step t > τ into a matrix X ∈ R^{(T−τ)×d}. To conclude the training, the following equation was solved for the column vector w ∈ R^d:

X w = y; (2)

this is usually done using linear regression techniques.
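A minimal sketch of this collect-and-regress step (Python/NumPy for illustration; the random matrices stand in for the driven reservoir states and targets, and the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
T, tau, d = 500, 100, 50                     # illustrative sizes

# Stand-ins for the collected states x_t (one row per time-step t > tau)
# and for the ground-truth target signal y.
X = rng.standard_normal((T - tau, d))
y = rng.standard_normal(T - tau)

# Solve X w = y for w in the least-squares sense.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
```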

Exploitation
Once we have calculated our output weights w, we can calculate an output ŷ_t with:

ŷ_t = x_t^T w. (3)

Depending on the application, two ways of driving a reservoir computing system are possible.
Either in an open-loop of the form:

x_t = ϕ(A x_{t−1} + B u_t), (4)

or in a closed-loop:

x_t = ϕ(A x_{t−1} + B ŷ_{t−1}). (5)

As can be seen in Equations (4) and (5), the difference between a closed- and open-loop setup is how the input is generated. In the closed-loop setup, outputs were reused as inputs in the next time-step.
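The two driving modes differ only in where the next input comes from. A Python sketch under the same illustrative assumptions as before (random reservoir matrices, tanh nonlinearity, untrained stand-in weights):

```python
import numpy as np

rng = np.random.default_rng(2)
d, f = 20, 1
A = rng.standard_normal((d, d))
A *= 0.9 / max(abs(np.linalg.eigvals(A)))    # rho(A) < 1
B = rng.standard_normal((d, f))
w = rng.standard_normal(d)                   # stand-in for trained output weights

def open_loop(u_seq):
    """Drive the reservoir with an external input sequence."""
    x, outputs = np.zeros(d), []
    for u in u_seq:
        x = np.tanh(A @ x + B @ np.atleast_1d(u))
        outputs.append(x @ w)                # y_hat_t = x_t . w
    return outputs

def closed_loop(y0, steps):
    """Feed the previous output back as the next input."""
    x, y_hat, outputs = np.zeros(d), y0, []
    for _ in range(steps):
        x = np.tanh(A @ x + B @ np.atleast_1d(y_hat))
        y_hat = x @ w
        outputs.append(y_hat)
    return outputs
```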

Mass-Spring Networks
Mass-spring networks are, in principle, coupled nonlinear oscillators that have been popularised by Hauser et al. [9], but have also been proposed by Coulombe et al. [17]. Such systems can be used to approximate a variety of materials in simulations, like fabric [18], compliant materials as used in soft robotics [19], or flesh-like setups [20].
We use a mass-spring system, as shown in Figure 2, with nonlinear springs exhibiting a spring force of the form given in Equation (6), with ∆l being the spring displacement and k the spring constant. This emulates a compliant, elastic material with a force-displacement curve as shown in Figure 3.
To exploit such a system for computation, an input signal u is translated to a force f and applied to predetermined input masses (green in Figure 2). The network then starts to oscillate accordingly, and we can record the mass accelerations as the state x (cf. Equation (2)). The recorded states can then be used for training and exploitation, as described in Section 2.1.

Abstract Reservoir Computing
When physically building mass-spring systems, as introduced in Section 2.2, we have to deal with tolerances due to imperfections in the creation process. Therefore, the initial positions of the masses deviate from the positions assumed in the simulation; this is visualized in Figure 4. As a first step to deal with this problem, we can replace each component p_i of the location vector p of a mass with an interval or ball of the form (p_{i,centre}, p_{i,radius}), representing the possible positions of the mass (red rectangle in Figure 4). We call this a hyperrectangle or, in abstract interpretation terminology, a box. This abstraction was then directly used in the simulation using ball arithmetic [21,22]. Instead of concrete numbers for our states x_t, we then got state tuples of the form (x_{t,centre}, x_{t,radius}), which we collected in the matrices X_c and X_r, respectively. Senn and Kumazawa [16] proposed to use the additional information as constraints for the linear regression and use a splitting conic solver [23] to solve:

min_w ||X_c w − y_centre||²  subject to  X_r |w| ≤ y_radius. (7)

This approach has the advantage of having exact upper error bounds encoded in y_radius, which represent the maximum desired deviation from the concrete solution given as y_centre, but using a solver slows down the training significantly. By relaxing the requirement for an upper error bound, we can reformulate Equation (7) as follows:

min_w ||X_c w − y||² + ||X_r w||². (8)

Reformulating this as the cost function L allows us to derive a closed form solution, as shown in Equations (9) and (10):

L = (X_c w − y)^T (X_c w − y) + (X_r w)^T (X_r w); (9)

then, setting the derivative dL/dw equal to 0, we can solve for w:

w = (X_c^T X_c + X_r^T X_r)^{−1} X_c^T y. (10)

Using Equation (10), instead of solving Equation (7), we trade assurance of error bounds for a significant speed-up.

Figure 4. The light blue circle represents the exact location of the mass, and the red rectangle represents the area of possible locations due to tolerances in horizontal and vertical directions.
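The closed-form solution amounts to a single linear solve, with the radius term X_r^T X_r playing the role that λI plays in ridge regression. A Python/NumPy sketch with synthetic centre/radius matrices (all data here is illustrative):

```python
import numpy as np

def abstract_fit(Xc, Xr, y):
    """Closed-form weights minimising ||Xc w - y||^2 + ||Xr w||^2:
    w = (Xc^T Xc + Xr^T Xr)^{-1} Xc^T y."""
    return np.linalg.solve(Xc.T @ Xc + Xr.T @ Xr, Xc.T @ y)

# Illustrative data: state centres plus small interval radii.
rng = np.random.default_rng(3)
Xc = rng.standard_normal((200, 10))
Xr = 0.05 * np.abs(rng.standard_normal((200, 10)))
y = Xc @ rng.standard_normal(10)

w = abstract_fit(Xc, Xr, y)
```

Note that in the special case X_r^T X_r = λI, this reduces exactly to ridge regression, which is why the radius term can be read as an abstract regulariser.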
Such tolerances can change the dynamical behaviour of the network compared with its simulated counterpart, and pose a major obstacle when trying to transfer learnt parameters from simulation to real-world systems.

Experimental Setup
To evaluate our proposed approach, we implemented a numerical simulation of a mass-spring network using Julia 1.5 [24], and tested it with three datasets in open- and closed-loop setups for different types of error sources. The benchmark datasets are the same ones used by Goudarzi et al. [25], and were precomputed for 5000 time-steps; then, each point in each time-series was repeated 5 times (giving time-series with 15,000 data points); finally, we split them into training and testing sets. The training sets consisted of the first 10,000 time-steps, whereas for the testing sets, the remaining 5000 time-steps were used.

Mass-Spring Network
We use a mass-spring network, as depicted in Figure 2. All masses are first aligned to a regular grid, and then slightly displaced by a value ∆p ∈ U²(−0.25, 0.25). Then, each mass is connected with its 8-neighbourhood through non-linear springs based on Equation (6). The four corner masses (blue) are fixed, and the input signal is applied as a force to a single input mass (green). As sensor readings, we use the acceleration of each mass in the network.
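A sketch of this construction (Python for illustration; the grid size is an assumption, while the jitter range and 8-neighbourhood connectivity follow the description above):

```python
import numpy as np

rng = np.random.default_rng(4)
rows, cols = 5, 5                             # illustrative grid size

# Masses on a regular grid, each jittered by dp ~ U^2(-0.25, 0.25).
positions = np.array([[i, j] for i in range(rows) for j in range(cols)], float)
positions += rng.uniform(-0.25, 0.25, size=positions.shape)

# Connect each mass to its 8-neighbourhood with springs (stored as index pairs).
def idx(i, j):
    return i * cols + j

springs = set()
for i in range(rows):
    for j in range(cols):
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di, dj) != (0, 0) and 0 <= ni < rows and 0 <= nj < cols:
                    springs.add(tuple(sorted((idx(i, j), idx(ni, nj)))))
```

Storing each spring as a sorted index pair deduplicates the two directions of every connection, leaving one spring per neighbouring pair of masses.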

Hénon Time-Series
The Hénon time-series is based on the Hénon map introduced in 1976 [26]. Equation (11) was used to compute the time-series:

x_{t+1} = 1 − 1.4 x_t² + 0.3 x_{t−1}. (11)
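Generating the series is straightforward; a Python sketch with the classical parameters a = 1.4, b = 0.3 (the zero initial conditions are an assumption):

```python
def henon_series(n, a=1.4, b=0.3):
    """Hénon recurrence x_{t+1} = 1 - a*x_t^2 + b*x_{t-1}."""
    xs = [0.0, 0.0]                 # assumed initial conditions
    for _ in range(n):
        xs.append(1.0 - a * xs[-1] ** 2 + b * xs[-2])
    return xs[2:]

series = henon_series(1000)
```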
The Hénon time-series, as used in the experiments, is shown in Figure 5.

NARMA10 Time-Series
Non-linear autoregressive moving average (NARMA) tasks are widely used in the reservoir computing community as basic benchmarks. NARMA10 specifically is one of the most used benchmarks for reservoir computing and is defined as:

y_{t+1} = 0.3 y_t + 0.05 y_t Σ_{i=0}^{9} y_{t−i} + 1.5 u_{t−9} u_t + 0.1, (12)

with u_t ∈ U(0, 0.5) being drawn from a uniform distribution. The NARMA10 time-series, as used in the experiments, is shown in Figure 6.
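A direct transcription of the NARMA10 recurrence (Python; the sequence length and seed are illustrative):

```python
import numpy as np

def narma10(u):
    """NARMA10: y_{t+1} = 0.3 y_t + 0.05 y_t sum(y_{t-9..t}) + 1.5 u_{t-9} u_t + 0.1."""
    y = np.zeros(len(u))
    for t in range(9, len(u) - 1):
        y[t + 1] = (0.3 * y[t]
                    + 0.05 * y[t] * y[t - 9:t + 1].sum()
                    + 1.5 * u[t - 9] * u[t]
                    + 0.1)
    return y

rng = np.random.default_rng(5)
u = rng.uniform(0.0, 0.5, 2000)
y = narma10(u)
```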

NARMA20 Time-Series
The NARMA20 task is the same as the NARMA10 one, except with longer time dependencies and an additional non-linearity:

y_{t+1} = tanh(0.3 y_t + 0.05 y_t Σ_{i=0}^{19} y_{t−i} + 1.5 u_{t−19} u_t + 0.01). (13)

The NARMA20 time-series, as used in the experiments, is shown in Figure 7.
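The same generator with the longer window and the tanh non-linearity; the exact constants follow the common NARMA20 formulation and should be treated as an assumption:

```python
import numpy as np

def narma20(u):
    """NARMA20 recurrence with a tanh non-linearity keeping the output bounded."""
    y = np.zeros(len(u))
    for t in range(19, len(u) - 1):
        y[t + 1] = np.tanh(0.3 * y[t]
                           + 0.05 * y[t] * y[t - 19:t + 1].sum()
                           + 1.5 * u[t - 19] * u[t]
                           + 0.01)
    return y

rng = np.random.default_rng(6)
u = rng.uniform(0.0, 0.5, 2000)
y = narma20(u)
```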

Baselines
We compare the results of our proposed approach with the following two baselines:
• Training with ridge regression (classical model)
• Training with linear regression and added noise (noise model)
Using the sensor readings x_{τ+1}, . . . , x_T from the training simulation, the classical model was trained using Equation (14):

w = (X^T X + λ I)^{−1} X^T y, (14)

and the noise model's training was based on Equation (15):

w = (X̃^T X̃)^{−1} X̃^T y,  X̃ = X + N(0, σ²), (15)

with σ set to the amplitude of the current error parameter.
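Both baselines reduce to standard least-squares variants; a Python sketch (the regularisation strength λ and the synthetic data are illustrative assumptions):

```python
import numpy as np

def ridge_fit(X, y, lam=1e-6):
    """Classical baseline: ridge regression with L2 penalty lam."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def noise_fit(X, y, sigma, rng):
    """Noise baseline: ordinary least squares on noise-corrupted states,
    with sigma set to the amplitude of the current error parameter."""
    Xn = X + rng.normal(0.0, sigma, size=X.shape)
    w, *_ = np.linalg.lstsq(Xn, y, rcond=None)
    return w

rng = np.random.default_rng(8)
X = rng.standard_normal((300, 20))
y = X @ rng.standard_normal(20)

w_classical = ridge_fit(X, y)
w_noise = noise_fit(X, y, sigma=0.1, rng=rng)
```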

Sensor Augmentations
We tested each model under different types of errors that could potentially occur in a real-world scenario.
• Sensor Failure -Before testing, masses were randomly selected with a given probability p, and their readings were forced to 0 during testing.
• Sensor Noise -Gaussian noise with 0-mean and varying standard deviation σ is added during testing.
• Fixed Sensor Displacement -Sensor readings were displaced by a fixed value z.

• Initial Mass Displacement - The mass positions were randomly displaced by a random vector ∆x ∈ [−k, k]².
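The sensor-level augmentations act directly on the recorded state matrix; a Python sketch of the first three (the function names are ours, and the matrix is a stand-in for recorded sensor readings):

```python
import numpy as np

def sensor_failure(X, p, rng):
    """Zero out randomly selected sensors (columns) with probability p."""
    X = X.copy()
    X[:, rng.random(X.shape[1]) < p] = 0.0
    return X

def sensor_noise(X, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return X + rng.normal(0.0, sigma, size=X.shape)

def fixed_displacement(X, z):
    """Shift all sensor readings by a fixed value z."""
    return X + z

rng = np.random.default_rng(9)
X = rng.standard_normal((100, 16))   # stand-in for recorded sensor readings
```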
Each possible source of error was tested independently of the other error sources. The parameter ranges are shown in Table 1.

Table 1. The parameter ranges used for each type of simulated error.

Augmentation | Parameter Range

Results and Discussion
For each experiment, we measured the mean squared error (MSE), given by Equation (16), between the outputs ŷ generated by the system and the values y in the test sets:

MSE = (1/T) Σ_{t=1}^{T} (ŷ_t − y_t)². (16)

As can be seen in Figures 8-11, our proposed approach scores better regarding the MSE compared with the classical approach in all experiments. Compared with the training regime with noise, at high enough noise levels, the latter takes over or performs equally to our proposed training regime. Exact numbers can be found in Appendix A in Tables A1-A4.
When looking at the results when no noise is present in the input, our proposal surpasses both baselines, in any case. This can indicate that the abstract regulariser is generally a better choice than L2-regularisation. Considering the results of the experiments with noise present, our proposed approach gives more consistent results over the whole spectrum of noise amplitudes, except for sensor failures, i.e., missing sensor readings. In this case, both baselines also perform poorly.
One reason for the better performance, even without noise present, is the fact that ball arithmetic [21], as used in the experiments, also captures numerical imprecisions. This can be compared to adding noise to the system, and thus helps against overfitting. Looking at the standard deviations of the results (Tables A1-A4), we see that the abstract training regime leads to fewer variations of the output, indicating that our approach is more robust against randomness.

Figures 9-11 (panels (c)). MSE for simulated sensor noise, fixed simulated sensor perturbation, and simulated initial mass displacement, respectively, with the amplitude given by the x-axis, for the NARMA20 dataset.

Conclusions and Outlook
Although physical reservoir computing systems are becoming more and more important, and most experiments are based on numerical simulations, the direct transfer of trained systems from simulation to the real world has not been widely studied yet. We proposed a new training regime based on abstract interpretation to address some of the issues that are to be expected, such as building tolerances, sensor defects, and noise, and verified our approach in a series of simulated experiments. The results support the use of our abstract regulariser, not only in this setting but also in general, as it achieved lower error rates. This is most likely due to the interval arithmetic used in our experiments. When adding noise, the difference from the classical approach with L2-regularisation becomes even more significant. In contrast, the training regime with added noise improves the more noise is added. Therefore, in settings where high noise amplitudes are to be expected, that approach seems to be the better choice.
For future work, we envision a direct comparison between a physical reservoir computing system in simulation and in real life, e.g., one based on fabrics, as this comes close to the mass-spring systems used in the experiments. Another direction could be the nature of the noise captured by the abstraction. Currently, we only consider uniformly distributed abstractions, but most problems in nature are distributed differently, e.g., following a normal distribution. Being able to use different distributions for the abstractions would make it possible to tailor the training regime specifically to the problem at hand.
In either case, both directions will allow for a more widespread use of physical reservoir computing, as it would allow for rapid iterations in simulations and a direct transfer of results to the real world.

Acknowledgments:
The authors thank Carmen Mei Ling Frischknecht-Gruber of the Zurich University of Applied Sciences for her comments and proofreading while creating this manuscript.

Conflicts of Interest:
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.