Reservoir Computing with Delayed Input for Fast and Easy Optimisation

Reservoir computing is a machine learning method that solves tasks using the response of a dynamical system to a certain input. As the training scheme only involves optimising the weights of the responses of the dynamical system, this method is particularly suited for hardware implementation. Furthermore, the inherent memory of dynamical systems which are suitable for use as reservoirs mean that this method has the potential to perform well on time series prediction tasks, as well as other tasks with time dependence. However, reservoir computing still requires extensive task-dependent parameter optimisation in order to achieve good performance. We demonstrate that by including a time-delayed version of the input for various time series prediction tasks, good performance can be achieved with an unoptimised reservoir. Furthermore, we show that by including the appropriate time-delayed input, one unaltered reservoir can perform well on six different time series prediction tasks at a very low computational expense. Our approach is of particular relevance to hardware implemented reservoirs, as one does not necessarily have access to pertinent optimisation parameters in physical systems but the inclusion of an additional input is generally possible.


Introduction
Reservoir computing (RC) is a machine learning method that is particularly suited to solving dynamical tasks [1]. It was introduced as a way of using recurrent networks for machine learning but circumventing the costly training of the network weights [2]. The main principle underpinning reservoir computing is that the reservoir projects the inputs into a sufficiently high dimensional phase space such that it suffices to linearly sample the response of the reservoir in order to approximate the desired target for a given task. For this to work, the reservoir must fulfil certain criteria: the response to sufficiently different inputs must be linearly separable, the reservoir must be capable of performing nonlinear transforms, and the reservoir must have the fading memory property [2]. However, even when these criteria are fulfilled, the performance depends greatly on the dynamics of the reservoir. Hence, in the past two decades a lot of research in the reservoir computing community has focused on the optimisation of the reservoir parameters [3][4][5][6][7][8][9]. Furthermore, the optimisation of the reservoir is a task-specific problem [1,[10][11][12] and a universal reservoir, which performs well in a range of tasks, remains elusive.
In a recent paper [13], the authors aim to eliminate the issue of hyperparameter optimisation altogether by removing the reservoir. Their approach essentially takes the well-known nonlinear vector autoregression (NVAR) method, uses a less parsimonious approach to filling the feature vector, and adds Tikhonov regularisation. However, the method of [13] trades the optimisation of the reservoir hyperparameters for the optimisation of the feature vector elements and it cannot be asserted that the latter is generally less costly. Furthermore, one of the main factors driving research into reservoir computing forward is the possibility for hardware implementation [14][15][16][17][18][19], which is impractical when the reservoir is absent.
In this contribution we demonstrate a new approach that reduces the need for hyperparameter optimisation and is well suited to boosting the performance of physically implemented reservoir computers. Specifically, we show that, by adding a time-delayed version of the input for a given task, the performance of an unoptimised reservoir can be greatly improved. We demonstrate this by using one unaltered reservoir to perform six different time series prediction tasks. In each case the only optimisation parameters are the delay and input strength of the additional delayed input. The aim of this work is not to achieve the best possible performance, but rather to demonstrate that reasonable performance can be achieved for various tasks using the same reservoir and at a very low computational cost.
Using time-delayed input is a common approach for adding memory to feedforward networks [20][21][22][23] and is the basis of statistical forecasting methods [21,24]. However, despite the simplicity of this idea, to the best of our knowledge, time-delayed inputs have not been widely used to optimise the performance of reservoir computers. This may be because the focus has been on constructing reservoirs that have the necessary memory to perform a given task [1]. One study in which time-delayed inputs have been used to improve the performance of a time series prediction task is [25]. However, in [25], the manner in which the time-delayed input was constructed assumed that the memory requirements of the task monotonically decrease with increasing steps into the past and did not allow for the input scaling of the delayed input to be varied as a free parameter.
Our results are of particular relevance to the hardware implementation of reservoir computing, because in physical systems one does not always have access to the relevant hyperparameters necessary for optimisation of the task-dependent performance but it should always be possible to add an additional input.

Methods
In the following, we describe the reservoir computing concept, the model for the reservoir that we use, our proposed time-delayed input method, and the benchmarking tasks that are used to test our approach.

Reservoir Computing
In reservoir computing, the reservoir, which at this point can be treated as a black box, is fed an input and the response of the system is sampled a number of times. The responses are then linearly combined to approximate the desired output (see Figure 1a). The linear output weights are trained via linear regression, typically using Tikhonov regularisation or regularisation by noise [1]. A variant of reservoir computing, that is of particular relevance for hardware implementation, is time-multiplexed reservoir computing using only one nonlinear element [26]. In this scheme both the injection of the data into the reservoir and the filling of state matrix S occur sequentially. Typically, a mask is applied to the input data in order to diversify the response of the reservoir to the input. In the training phase, the reservoir is fed a sequence of training data of length K tr . A mask of length N v is applied to each element of the training data, where N v corresponds to the readout dimension (i.e., the number virtual nodes). Hence, there are N v K tr time-multiplexed inputs that are sequentially fed into the reservoir. The corresponding state matrix, which has the dimensions K tr × (N v + 1), is filled row by row with an additional bias term of 1 at the end of each row. The training step is then to find the (N v + 1) dimensional weight vector W that best approximatesô whereô is the vector of K tr target outputs. The solution to this linear problem is given by where λ is the Tikhonov regularisation parameter and I is the identity matrix. The vector x k out is the responses of the reservoir to an input I k and the corresponding output o k is generated by a weighted sum of these responses. The read-out weights W are trained. (b) Sketch of the memory cell reservoir described in Section 2.2, where a time-multiplexed input J(k) is fed into the reservoir (k = N v k + k , please see Figure 2 for the construction of the time-multiplexed input). The index n labels the memory cells (in total N) that are addressed via the coupling matrices K in and K out . K labels the feedback strength at which the output of the memory cells x out (k) (given by Equation (5)) is fed back into the nonlinearity G. The elements of the state matrix S are given by S k ,k = x out (N v k + k ), with k ∈ (1 . . . K tr ) and k ∈ (1 . . . N v ), i.e., one row of S corresponds to the vector of responses x k out in (a).

Error Measure
To quantify the performance of the reservoir computer we use the normalised root mean squared error (NRMSE), defined as whereô k are the target values, o k are the outputs produced by the reservoir computer, K o is the length of the vectorô, and var(ô) is the variance of the target sequence.

Reservoir Model
To investigate the effect of delayed input on a physically implemented reservoir computer, we model a physical system that is inspired by optical delay line reservoirs [27,28]. Delay line implementations have shown promise due to high throughput speeds [29]. However, complex network connectivity, achieved via the introduction of multiple delays, represents a significant experimental hurdle, or requires opto-electrical conversion of the signal for electronic storage, thereby forgoing the advantages of an all-optical implemen-tation. Recent developments in optical quantum memories with high bandwidth [30] and high capacity [31] allow for the on-demand storage and retrieval of optical pulses and thus the implementation of delays of arbitrary length, limited only by the coherence time of the optical memory, which can reach up to one second [32]. The reservoir model described below models a physical optical system including the optical memory for the reconfigurable and arbitrary coupling of the injected information (modeled as memory cells with input and output coupling), a nonlinear element (modeled as a semiconductor optical amplifier), and a short delay line, whose purpose is not for introducing delay, but to recouple existing information back into the optical system. A sketch of the envisaged setup is shown in Figure 1b. A time-multiplexed input is fed through a nonlinear element and then stored in the memory cells (described with the index n) with a certain input topology K n in . Combinations of the memory cells are partially read out with the output topology K n out and finally coupled back into the nonlinear element. Since the write and read-out process repeats in time, it is possible to realise time-varying read and write topologies. We describe this by adding the index m = k(mod M), where M is the period within which the coupling sequence repeats, giving the coupling matrix elements K mn in and K mn out . The map describing this process is given by the following. Let x n (k) be the state of the n th memory cell at time step k. The next time step is then given by where G(x) is the function describing the nonlinear element, and the matrices K in , K nin , and K out describe the (possibly time-varying) coupling into and out of the memory cells. The value of K describes the percentage of the output x out (k) that is coupled back into the nonlinear element and J(k) is the input. The coupling matrices have the dimensions MxN, where N is the number of memory cells. Note that the index m = k(mod M) depends on the time step k. For each iteration one row of the coupling matrices determines which memory cells are written into and which are read out of. K in gives the write sequence and K out the out-coupling sequence. These two matrices contain values from zero to one. For K in , the row sum must be one. The entries of the matrix K nin are This allows the memory cells with new input to be overwritten and those without to be updated according to how much was read out. The model described above allows for arbitrary coupling between the memory cells. For this study, we choose K in = K out = I, and M = N. This means, at every input cycle, one memory cell is overwritten and one is read out. For this choice of coupling, Equations (4) and (5) can be rewritten as We then choose N = N v + 1 where N v is the number of virtual nodes that will be used for the reservoir computing tasks. This coupling describes a type of ring coupling akin to delay-based reservoir computers with the feedback delay time τ = T + θ, where T is the input clock-cycle and θ is the virtual node separation [28,33]. Comparing the continuous and discrete cases gives τ → N, T → N v , and θ → 1. We choose such a simple coupling scheme as it has been demonstrated that such coupling topologies perform similarly to random coupling topologies [4]. Using Equation (6), the rows of the state matrix S are filled with N v sequential x out (k), i.e., S k ,k = x out (N v k + k ) and the bias S k ,(N v +1) = 1 (see Figure 1 for an illustration).
For the nonlinearity, we choose which describes the input response of a semiconductor optical amplifier [34,35].

Input and Mask
The reservoir input is given by a task-dependent time series and a time-delayed version of this time series. Before the data are fed into the reservoir, masks are applied to both input series. The masks consist of N v values drawn from a uniform distribution between 0 and 1. The final input is then given by where G 1 and G 2 are the input scaling factors, I(k ) is the input time series, d is the input delay, M 1 (k ) and M 2 (k ) are k th entries of the N v dimensional masking vectors, and J 0 is a constant offset. A sketch of the masked input sequence is shown in Figure 2.

Mackey-Glass
The Mackey-Glass equation is a delay differential equation which exhibits chaotic dynamics. The reservoir computing benchmarking task is to predict the time series s number of steps ahead in the chaotic regime. The Mackey-Glass equation is [36]: We use the standard parameters: τ = 17, n = 10, β = 0.2, and γ = 0.1. To create the input sequence I(k ), the time series generated by Equation (9) is sampled with a time step of dt = 1. The corresponding target sequence is then given by I(k + s).

NARMA10
NARMA10 is a commonly used benchmarking task that is defined by the iterative formula A n+1 = 0.3A n + 0.05A n 9 ∑ i=0 A n−i + 1.5u n−9 u n + 0.1, where u n are identically and independently drawn random numbers from a uniform distribution in the interval [0, 0.5] [37]. The reservoir input sequence I(k ) is given by the sequence of u n and the target sequence is given by the corresponding A n .

Lorenz
The Lorenz system [38] is given by With c 1 = 10, c 2 = 28 and c 3 = 8/3, this system exhibits chaotic dynamics. We use the x variable, sampled with a step size of dt = 0.02, as the input I(k ) for two time series prediction tasks. The first is one step ahead (s = 1) prediction of the x variable. The second is one step ahead (s = 1) cross-prediction of the z variable.

Simulation Conditions
For all tasks, the reservoir is initialised with an input sequence I(k ) of length 10,000. The system is then trained on K tr = 10,000 inputs. This is followed by another buffer of 10,000 inputs, before the performance is tested on a sequence of K te = 5000 inputs, unless stated otherwise. For each task, the reservoir parameters are kept identical and are as given in Table 1. The input scaling of the primary input G 1 (nondelayed input) and the offset J 0 are scaled such that the input range for each task is approximately [0.4, 1.3]. The scaling of the delayed input G 2 and the input delay d are used as the optimisation parameters. For each task, the performance is averaged over 100 realisations of the random masks and also, in the case of NARMA10, the random inputs.

Parameter
Value Parameter Value

Results
The performance of the reservoir with additional delayed input is tested on six tasks; we first consider Mackey-Glass time series prediction for one, three, and ten steps into the future. The results of the Mackey-Glass tasks, and their relation to the delayed input parameters, are depicted in Figure 3a-c. By inspecting the evolution of the performance error as a function of d and G 2 , an optimal performance and thus an optimal value for d can be identified (brightest light yellow region). This value, however, depends on the task and thus changes in between the panels. In order to quantify the impact of the delayed-input strength on the performance, we present scans of the delayed-input strength G 2 for the optimal input delay d, for each task, in Figure 4a-c. G 2 = 0 corresponds to the system without delayed input and should be used as the reference to quantify the performance boost due to the delayed input. For each of the three cases, the delayed input leads to a reduction in the NRMSE, ranging from 20% for s = 1 to over a factor three for s = 10. The optimal values for the delay and the input scaling G 2 vary depending on the number of steps s predicted into the future. In agreement with the results presented in [39], larger input scaling is required as s increases, indicating that nonlinear transforms become increasingly important. In terms of the absolute performance, similar results are achieved compared with other studies [39,40], despite the number of virtual nodes used in this study being significantly lower.
The results for the NARMA10 task are shown in Figures 3d and 4d. Without the delayed input (G 2 = 0) the performance of the reservoir is very poor. This is in contrast to the Mackey-Glass s = 1 for which the performance without delayed input (Figure 3a with G 2 = 0) is reasonable. Moreover, this finding supports the general observation that reservoir computers have to be optimised to individual tasks and perform poorly as universal approximators [1,10,11]. The inclusion of delayed input significantly reduced the NARMA10 error, reaching an NRMSE of about 0.3 for the input delay d = 9. In absolute terms, an NRMSE of 0.3 is within the range of typically quoted best values (NRMSE = 0.15-0.4) [4,10,[41][42][43][44], however, it is usually achieved with a much higher output dimension than the N v = 30 used here. The performance achieved in this study came at a very low computational cost. As a comparison, in [44] the authors investigate the influence of combining echo state networks with different timescales and achieve a best performance of just under 0.4 for the NRMSE, at a greater computational expense.  The remaining two tasks are one step ahead Lorenz x prediction and one step head Lorenz z cross-prediction, the results of which are shown in Figures 3e,f and 4e,f. In both cases there is an improvement in the performance with the correct choice of the delayed input. It is has been demonstrated that the Lorenz x one step ahead prediction requires only the very recent history of the x-variable time series [13], and we find the optimal input delay of d = 1 to be consistent with this prior knowledge. For the Lorenz z cross-prediction task, on the other hand, there is a strong dependence on the history of the Lorenz x variable. In this case, the best performance is achieved when the second input is delayed by d = 14 time steps. The optimal delayed-input scaling G 2 is larger for the Lorenz z task than the Lorenz x task (as seen by comparing the positions of the minima in Figure 4e,f), indicating that the cross-prediction task requires a greater degree of nonlinearity as well as a longer memory.
In order to demonstrate that the improvement in the performance with delayed input is not specific to the reservoir parameters used for Figure 3, in Figure 5 we show the NRMSE for the Mackey-Glass s = 10 task as a function of (a) the virtual node coupling strength K and (b) the coupling delay N (i.e., the number of memory cells). These parameters have a strong influence on the properties of the reservoir. In both cases the NRMSE without delayed input (orange dotted line) shows a large variation over the respective parameter ranges and is always larger than the error with optimised delayed input (blue dashed line). With optimised delayed input the variation in the error is comparatively small, demonstrating that the inclusion of the delayed input works well independent of the reservoir properties. The peak in the NRMSE at N = 30 in Figure 5b is a well-known resonance effect that occurs at resonances between the number of virtual nodes and the coupling range N, equivalent to clock time and delay resonances in time continuous systems [45]. To further demonstrate the universality of this method, we show the NARMA10 error with delayed input for a time continuous reservoir in Figure 6. In this case the reservoir is given by the Stuart-Landau equation with time-delayed feedback (see Appendix B). The reservoir parameters have not been optimised for the NARMA10 task, resulting in very poor performance without delayed input (G 2 = 0). With optimised delayed-input parameters reasonable performance is achieved, similar to the optimal results for the memory cell reservoir in Figure 3d. For the Stuart-Landau reservoir, optimal performance is achieved for the input delay d = 10, whereas, for the memory cell reservoir, the optimal input delay is d = 9. This is because the required input delay depends both on the dynamics of the reservoir as well as the memory requirements of the particular task. Figure 6. NRMSE for the NARMA10 task as a function of the delayed input parameters d and G 2 using the Stuart-Landau delay-based reservoir computer described in Appendix B.

Discussion
We have shown that, for various time series prediction tasks, including a delayed version of the input can lead to a substantial improvement in the performance of a reservoir. We have demonstrated this using a simple map describing a semiconductor optical amplifier nonlinearity and a ring-like coupling realised via memory cells. With this approach we were able to use one unaltered reservoir to perform well on six different tasks, each with different memory and nonlinear transform requirements. The performance boost due to the delayed input is achieved over a wide range of the reservoir parameters and was also demonstrated for a time continuous system, indicating that our approach is applicable to a wide range of reservoirs.
Our results are significant for a number of reasons. Firstly, we have demonstrated that computationally expensive hyperparameter optimisation can be circumvented by tuning only two input parameters. By including an additional delayed input, reasonable performance can be achieved using an unoptimised reservoir. Nevertheless, we note that, depending on the requirements for a given task, additional hyperparameter optimisation may be necessary. Secondly, to the best of our knowledge, this is the first demonstration of an identical reservoir performing well on such a large range of tasks. Thirdly, the simplicity of our approach means that it is well suited to be applied on physical reservoirs.
This study has raised several questions surrounding delay-based reservoir optimisation that require further investigation. For example, the optimal delayed-input parameters are task dependent and how these relate to a given task is not fully understood. The NARMA10 results presented in this study indicate that the optimal delayed-input parameters are related both to the reservoir and requirements of the task. This means that it may be possible to not only use reservoir computing for real-world time series prediction tasks, but also to gain insights into the dynamical systems being investigated. For example, in tasks such as El Niño prediction where the underlying dynamical system is very complex and the relevant physical processes are not fully understood [46]. Here, investigations surrounding delay-based input could provide critical insight into the involved timescales. Furthermore, the minimum requirements for a reservoir to yield good performance on a range of tasks by only tuning the delayed input parameters remain to be determined.
A natural extension of our proposed approach is to include multiple delayed input terms. This would bring the reservoir computing approach closer to classical statistical forecasting methods such as NVAR and could lead to a further improved performance, especially for tasks involving multiple disparate timescales. However, possible performance improvement with added input terms must be weighed against the associated increase in the computational cost as each added input adds two new optimisation parameters. Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable
Data Availability Statement: The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

RC
Reservoir computing NVAR Nonlinear vector autoregression NRMSE Normalised root mean squared error

Appendix A. Optimised Input Parameters
The optimised values of the delayed-input scaling d and the input delay d corresponding to Figure 5 are given in Figure A1. Figure A1. Values for the optimised input parameters G 2 (orange) and d (blue), corresponding to the Mackey-Glass s = 10 results depicted in Figure 5, as a function of (a) the virtual node coupling strength K and (b) the coupling delay N. All remaining parameters are as stated in Section 2.5.

Appendix B. Stuart-Landau Delay-Based Reservoir Computer
The Stuart-Landau system with time-delayed feedback is given by The parameter values are given in Table A1. The input sequence J(t) for this system comprises piece-wise constant steps of length θ = T/N v , T is the clock time [43]. For this system, we used regularisation by noise, meaning that we added Gaussian white noise of strength R noise to the state matrix S entries.