Neuro-Inspired Computing with Spin-VCSELs

: Delay-based reservoir computing (RC), a neuromorphic computing technique, has gathered lots of interest, as it promises compact and high-speed RC implementations. To further boost the computing speeds, we introduce and study an RC setup based on spin-VCSELs, thereby exploiting the high polarization modulation speed inherent to these lasers. Based on numerical simulations, we benchmarked this setup against state-of-the-art delay-based RC systems and its parameter space was analyzed for optimal performance. The high modulation speed enabled us to have more virtual nodes in a shorter time interval. However, we found that at these short time scales, the delay time and feedback rate heavily inﬂuence the nonlinear dynamics. Therefore, and contrary to other laser-based RC systems, the delay time has to be optimized in order to obtain good RC performances. We achieved state-of-the-art performances on a benchmark timeseries prediction task. This spin-VCSEL-based RC system shows a ten-fold improvement in processing speed, which can further be enhanced in a straightforward way by increasing the birefringence of the VCSEL chip.


Introduction
Reservoir computing (RC) is a neuromorphic computing technique which is gaining popularity rapidly in our age of Big Data and Digital Sustainability, because there is an urgent need for high-speed and energy-efficient computing techniques [1]. RC employs the transient dynamics of a nonlinear reservoir to map input data unto a high dimensional state space. An output layer can be constructed by sampling from this high dimensional state space and trained to perform tasks that are notoriously difficult for CPU-based algorithmic computing approaches, such as speech and pattern recognition, system identification and timeseries prediction [2][3][4][5]. The main advantage of RC is the simplification of the training procedure, as only a single layer of nodes needs to be trained and the larger part of the network (the reservoir) is left as it is.
The reservoir of an RC system can be any dynamical system that has an accessible high dimensional state space. Typically, the reservoirs are categorized into spatially and temporally distributed reservoirs. In spatially distributed reservoirs, the individual neurons (or nodes) can be accessed individually to read out their state value, very much like neural networks. Some successful examples of spatially distributed reservoirs are echo state networks [6,7], liquid state machines [8,9], a network of memristors [10], a network of on-chip ring resonators [5,11,12] or an array of VCSELs [13]. RC with temporally distributed reservoirs is generally known as a delay-based RC system [14]. A single nonlinear dynamical system is subjected to feedback, which creates a recurrent network. The nonlinear response of the RC system is enriched by preprocessing the input data with a step-wise constant mask. The nodes in this case can be accessed indirectly by sampling the temporal stream, hence we speak of virtual nodes. Photonic systems are especially interesting to this end, because of their high-speed response and energy efficiency. Some examples of photonic delay-based reservoir computing systems are a semiconductor laser subjected to feedback with electrical data injection [15,16] or optical data injection [17][18][19][20]. Delay-based RC lends itself very well for integrated design and the first results were achieved in [21,22]. An overview of all the (photonic) RC systems can be found in [3,5,21].
Due to the time-multiplexed nodes in a delay-based RC setup, the processing speed is inversely proportional to the duration of a single sample. This duration is determined by the number of nodes and the length of these nodes in the time domain. In [17], processing speeds around 10 MSa/s could be achieved, limited by the relaxation oscillations in the intensity. Follow-up research in [18] showed that the length of the nodes could be shortened thanks to phase dynamics occurring in the system, leading to speeds of 0.25 GSa/s. In [19], it was suggested to further increase the processing speed by spreading the number of nodes over different longitudinal modes available in the laser cavity.
Recently, ultra-fast modulation techniques have been demonstrated by relying on the polarization dynamics of spin-VCSELs [23][24][25]. Modulation speeds of up to 200 GHz were achieved in [26]. We conjecture that this fast modulation speed can be used to speed up delay-based RC systems. In this work, we numerically investigated a delay-based RC system using a spin-VCSEL and injecting the data via the pump ellipticity, such that we can employ the ultra-fast polarization dynamics to increase the processing speed. Previously, there have been numerical studies on delay-based RC systems using VCSELs, but they rather rely on the phase dynamics and the polarization modes only serve to increase the state diversity in the output layer via polarization multiplexing [27,28].
In the next section, we describe the model we used for the spin-VCSEL and we provide details on our RC setup. Afterwards, we present the results obtained from different parameter scans, accompanied by a discussion, followed by the conclusions.

The Spin-VCSEL
The spin-flip model as described in [26] is used to simulate the spin-VCSEL, as it has been shown to correctly describe the experimentally observed behavior of these lasers. We extend the model to incorporate the optical feedback, which gives us the following rate equations:Ė Here, E ± stands for the right (+) and left (-) circularly polarized components of the slowly varying amplitudes of the electric field, N is the total population inversion in the laser with a decay rate γ and n is the population difference between the spin-up and spin-down electrons with a decay rate γ s due to spin relaxation. The photon lifetime and linewidth enhancement factor are given by, respectively, τ p and α. The amplitude and phase anisotropies of the laser cavity are given by γ a and γ p . The term a + i p |E ± | 2 E ± takes into account saturation effects associated with the amplitude and phase of the field. In the last term of Equation (1), E ± (t − τ D ) is the optical feedback after a delay τ D , Ω is the constant feedback phase and η is the feedback rate. J ± are the time-dependent pumping rates of spin-up (+) and spin-down (-) electrons. In practice, as mentioned in [26], the spin-VCSEL has an electrical pump J 0 and a pulsed optical spin injection. The electrical pump contributes equally to the spin-up and spin-down populations, whereas the individual populations can be pumped separately with the optical spin injection. The pumping mechanism is further explained in Section 2.2, since it is also the mechanism used to inject data.
The values used for the different parameters are summarized in Table 1. The parameters of the spin-VCSEL were chosen to be the same as in [26], where a bitstream was being modulated at speeds of 240 Gbit/s. The electrical pump and optical pump amplitude were chosen after a few exploratory trials and in Section 3.3 an extensive scan was performed to find the optimized values. The number of nodes was scanned from 5 to 100, as the typical number lies in this range [19][20][21]. The node spacing is scanned over a range from 0.5 ps to 10 ps, which is centered around the period of the polarization oscillation π γ p , because the node spacing is typically of the same order of magnitude as the fastest time scale present in the system [3,14].

The RC Setup
A schematic overview of the simulated theoretical model is shown on the left in Figure 1. A spin-VCSEL is connected to a delay line, which has a round-trip time τ D . Data are injected in the VCSEL through the optical spin injection, such that we achieve the following pump rates: where J 0 is the previously mentioned electrical pump, δJ is the amplitude of the optical spin injection and A(t) is the normalized masked data that is to be processed. The masked data A(t) are obtained via the product I(t)M(t), where I(t) is the input data and M(t) is a sequence periodically repeating a mask. The masked data A(t) are normalized between 0 and 1. The mask consists of N step-wise constant levels of duration θ. Each datapoint in I(t) is held constant for a duration τ M = Nθ that corresponds to the mask length, as shown on the right in Figure 1. The purpose of the different levels in the mask is to create diversity in the nonlinear response of the VCSEL, such that we obtain a diverse set of virtual nodes spread along the delay line. The mask length τ M is often matched to the delay time τ D , however, in this paper, this is not the case and τ M can be much larger than τ D , such that a single masked datapoint is spread over multiple roundtrips of the delay line. A principal mask is generated for a maximum of 100 nodes by randomly selecting each mask level from the following set, (0, 0.25, 0.5, 0.75, 1). If N < 100, we used the first N values from this principal mask to obtain our mask. The principal mask is kept fixed. We only inject data in the spin-down carrier population, which contributes to the left circularly polarized mode. We do not have fixed values for the node separation θ, the number of nodes N, nor the delay time τ D or mask length τ M , because this is the first time that delay-based RC using spin-VCSELs is being investigated and we will scan these parameters for optimal RC performance. Typically, θ is closely related to the fastest timescale of the laser-based reservoir, which in this case is the polarization oscillation, which on its turn is linearly dependent on the inverse of the birefringence γ p . Hence, we expected θ to be in the range of 1-10 ps, which is a substantial improvement in comparison with node spacings of 20 ps used in other laser-based RC systems [19].
There are several ways in which the output layer of the RC can be constructed. The virtual nodes V i , shown in Figure 1, can either be formed by sampling the output power P = |E| 2 in a single mode (E + or E − ) or by concatenating the virtual nodes from both modes P + = |E + | 2 and P − = |E − | 2 , resulting in 2N nodes in contrast with the N nodes introduced by the mask. One could instead also construct an output layer consisting of the output polarization POL out = P + −P − P + +P − as virtual node values, but we found that the performance was negatively affected by this added hyperbolic nonlinearity. We briefly discuss this in our results in Section 3.1.
The training phase consists of feeding the setup with m masked datapoints and sampling the virtual nodes, so that we obtain the state matrix Q (m × 2N). We already know the expected output y expected for the m datapoints and hence, the 2N weights of the output layer can be calculated with the Moore-Penrose inverse Q + , such that: The obtained weights are then kept constant, such that unseen data can be fed to the system in order to test the performance. In this paper, the performance was benchmarked by the Santa Fe timeseries prediction task [29]. The data were from the Santa Fe timeseries competition [29] and consist of a univariate chaotic timeseries obtained from a NH 3 laser. The goal of the task was to predict the chaotic timeseries one step ahead. This task is frequently used to benchmark RC setups [17,18,22,27]. The first 3000 datapoints were used for the training and 1000 for testing. The performance was measured and indicated by the normalized mean square error (NMSE): where y(n) is the value predicted by the RC, y expected is the expected value for the given input and the symbols || . . . ||| and . . . stand for the norm and time average, respectively. The lower the N MSE is, the better the system performs. For the Santa Fe timeseries predic-tion task, the state-of-the-art performances for numeric RC simulations ranges between 0.01 and 0.1 [4,19,21,22].

Results and Discussion
Our setup has many parameters that affect the RC performance, hence we will perform scans along certain parameter spaces to obtain optimal parameter values.

The Role of Delay Time τ D
In previous studies, the delay time τ D is often matched to the mask length τ M [14,[17][18][19]. Initially, we did the same, such that (τ D = τ M ). This allowed us to observe the effects of feedback-induced dynamics (related to τ D ) on the performance and at the same time we could find a combination of N and θ that might work for our benchmark task (τ M = Nθ). In Figure 2a,b, we saw the results of these scans for a low feedback rate η = 1 ns −1 and for a high feedback rate η = 100 ns −1 , respectively. For the system with a low feedback rate, we saw a rather large area with good performance (lighter colors) in contrast with the results for the system with a high feedback rate. We would expect the best performing regions to be aggregated at higher values of N and around a particular value of θ, because the node state diversity becomes poor at low N and the node spacing has to be around the period of the polarization oscillations. Here, we see two contrasting plots for the different feedback rates. For η = 1 ns −1 , the absolute best performance was achieved at N = 50 and θ = 3 ps with N MSE = 0.044. For η = 100 ns −1 , the absolute best performance was achieved at N = 30 and θ = 0.5 ps with N MSE = 0.071. These best points were shown as red crosses in Figure 2a  On both plots, we see the regions with good performances (lighter colors) that coalesce around hyperbolic curves, corresponding to constant mask length τ M and the delay time τ D , as τ D = τ M . For the low feedback system, we found the best performance at τ D = τ M = 150 ps and a larger region of good performance at τ D = τ M = 228 ps. These are shown as white dashed lines in the left plot. For the strong feedback system, we found the best performance at τ D = τ M = 15 ps, shown as a white dashed line in the right plot. We observe that longer delay lines are required to obtain good performance for a low feedback rate and vice versa. Furthermore, for low feedback rates, the best performances are found near the middle of the hyperbolic line, showing a trade-off between the number of nodes N and the node spacing θ. For high feedback rates, a similar trend is seen, where the performance worsens as the number of nodes N is decreased. This makes sense as the diversity of states of the virtual nodes is reduced if N is too small and hence the nonlinear memory capacity of the system will deteriorate [30].
It seems that the delay time, in combination with the feedback rate, has a very profound effect on the performance of our setup via feedback-induced dynamics. This is in stark contrast with previous delay-based RC setups using edge-emitting semiconductor lasers, where the delay time τ D had no such significant role in the RC performance [20,31].
The results indicate that τ D and η affect the dynamical regime of the VCSEL considerably. This observation was supported by the findings in [32], where the nonlinear dynamics of the spin-VCSEL subjected to feedback was studied. The authors found that VCSELs connected to longer delay lines quickly move towards chaotic regimes with increasing feedback rates, whereas VCSELs connected to shorter delay lines would have various plateaus of steady state behavior, interspersed between chaotic regions, as the feedback rate is increased.
To further investigate the effect of τ D and η on the dynamical regime, we prepared two RC systems with the optimal parameters corresponding to the red crosses in Figure 2a,b. We studied the modal output power as the system was injected with a constant value instead of masked data. The resulting orbit diagrams of the modal output power, as the feedback rate was varied, are shown in Figure 3a,b. In Figure 3a, we see the orbit diagram for a long delay time τ D = 150 ps. The laser has a steady state behavior for the lowest feedback rate, but it quickly moves towards a chaotic regime via a period-doubling route. In contrast, for a shorter delay line τ D = 15 ps (orbit diagram shown in Figure 3b), we see plateaus of steady state behavior, interspersed by periodic regimes, similar to findings in [32]. In Figure 3c,d, we show the Santa Fe timeseries prediction performance as a function of the feedback rate for τ D = 150 ps and τ D = 15 ps, respectively. Furthermore, we show the performance for each possible output layer. From both plots, it is clear that the best performances were achieved when the output layer consists of virtual nodes from both P + and P − , whereas training on virtual nodes from a single mode gives the worst outcome.
Training on the output polarization POL out = P + −P − P + +P − is not recommended either, since its performance is almost always worse. We see a brief interval for η ∈ [50, 80] in Figure 3c,d where the output layer formed by the output polarization has a better performance than the rest. However, this performance is nowhere near the minimum N MSE.
From the literature, we know that the best performing reservoirs are typically tuned to be at the edge of a periodic or chaotic dynamical regime before data are injected into the system [3,33,34]. It is at η = 1 ns −1 for τ D = 150 ps and η = 100 ns −1 for τ D = 15 ps, where we see a switch from a steady state to a periodic regime. These are also the feedback rates with which we achieve the best N MSE values. Another local minimum N MSE was found around η = 15.5 ns −1 for τ D = 15 ps in Figure 3d, which corresponds to a switch from a steady state regime to a periodic regime in Figure 3b.
The findings from Figure 3 confirm our assumption that the delay time τ D and feedback rate were intrinsically linked to the RC performance as opposed to previous RC setups, where the delay time is several orders of magnitude larger, and where the delay can be kept fixed when the feedback rate is changed [14,17,18,27].

Decoupling τ D and τ M
Now that we established that the delay time τ D is intrinsically linked to the RC performance, we decouple the mask length τ M from the delay time τ D , such that we can find the optimal number of nodes N and optimal node spacing θ. There have already been studies where the delay time was longer than the mask length (τ D > τ M ), with state-ofthe-art performances [16,20,31]. This mismatch between mask length and delay time is favorable for RC performances, because the interconnection between nodes over multiple round trips becomes more complex. In our case, however, the delay time τ D is already fixed at rather low values. For the reservoir with weak feedback, the delay time is fixed at τ D = 150 ps and for the reservoir with strong feedback at τ D = 15 ps. Here, we expect that the case where τ D < τ M will be favored, since τ M = Nθ and we need a sufficient number of nodes N to be able to perform the tasks.
We did the same parameter scan as in Figure 2, but this time with constant delay times (τ D = 150 ps for η = 1 ns −1 and τ D = 15 ps for η = 100 ns −1 , corresponding to the white lines in Figure 1) and the results are shown in Figure 4. We again see the best performing regions coalesce along hyperbolic curves only corresponding to a constant mask length τ M this time. For the reservoir with low feedback, we find the best performance N MSE = 0.025 at θ = 5.5 ps and N = 95, corresponding to τ M = 522.5 ps. This point was depicted with a red cross in Figure 4. The mask length at this point corresponds to a processing speed of approximately 2 GSa/s. For the reservoir with strong feedback, shown in Figure 4b, we have a considerably larger part of parameter space which gives excellent performance. The best performance for this reservoir is N MSE = 0.012, found at θ = 7.5 ps and N = 55, corresponding to τ M = 412.5 ps. This point is depicted with a red cross. However, we observe another region with similar performances (N MSE around 0.015) around the hyperbolic curve corresponding to τ M = 188 ps (shown as a white dashed line). The points on this white curve have a shorter mask length and hence a higher processing speed as compared to the red cross. The processing speed for the points on this white line is 5.3 GSa/s, almost three times faster than that of the RC with low feedback. Both based on the N MSE as well as the processing speed, we found that our RC has superior qualities when operated with strong feedback rates.
It is worth noting for the discussion later on, that at the best processing speed, a single masked datapoint traverses the delay line τ M τ D = 3.5 times for the low feedback setting and 12.5 times for the high feedback setting.
We also note that the lower left corner of Figure 4 shows high N MSE values, i.e., worse performances, which corresponds to the area where τ D < τ M . In this region, either the number of nodes are not sufficient to deliver the nonlinear memory capacity needed to perform the task or the node spacing becomes too small for the laser to be able to follow the input data. These findings are similar to those in [20]. Similarly, in Figure 4b, we see the lower left corner turn darker, indicating worsening performances. This area is smaller, given that the delay time τ D = 15 ps is smaller than τ D = 150 ps in Figure 4a.

The Role of Pumping Parameters
Two other important parameters of the RC system are the electrical pump J 0 and the optical pump amplitude δJ. We scanned these two parameters, for a reservoir with weak feedback and a reservoir with strong feedback corresponding to the red crosses in Figure 4a,b. The result of this scan is shown in Figure 5a,b, respectively. We observe that the best performance for the reservoir with weak feedback is found at higher electrical bias as well as higher spin pump amplitude. For the reservoir with strong feedback, the best performances are found at lower electrical biases and a fairly wide interval of optical pump amplitudes (between 2J th and 8J th ). The points of best performances are indicated by the red crosses. For the low feedback reservoir, this point, corresponding to an N MSE of 0.015, was found at an electrical pump of 7.5J th and an optical pump amplitude of 9.5J th . For the high feedback reservoir, the lowest N MSE = 0.013 was found at an electrical pump of 3J th and an optical pump amplitude of 8J th . The pump parameters are optimized to rather high values for the reservoir with weak feedback and to low values for the reservoir with strong feedback. We presumed that these optimized parameter sets are such that there is a balance between the linear and nonlinear memory capacity needed for the Santa Fe task. In Ref. [31], in the case of a standard semiconductor laser, Köster et al. showed that the linear memory capacity drastically decreases as the mask becomes longer than the delay time. In our reservoirs, both with weak and strong feedback, the mask is longer than the delay time, namely τ M = 3.5τ D for weak feedback and τ M = 12.5τ D for strong feedback. The nonlinear memory capacity in these systems increases, since nodes related to one data sample (i.e., within one mask length) strongly mix with each other due to the mask looping around the delay line multiple times. This increase in nonlinear memory capacity comes at the expense of linear memory capacity, which is typically due to the interaction between nodes of subsequent data samples (i.e., over subsequent mask lengths). However, contrary to the system studied in [31], which have two characteristic timescales (namely the nodal response timescale and the feedback dynamics timescale), our system has three characteristic timescales that can couple nodes with each other. The first of these is the timescale introduced by the polarization dynamics, which governs the nodal response. The second is the timescale introduced by the feedback dynamics, which couples nodes separated by the delay time τ D and lastly, we have the timescale introduced by relaxation oscillations (ROs). The RO frequency and damping are strongly influenced by the pumping parameters.
We prepared two reservoirs to look at the effect of ROs on the interaction between nodes. One reservoir was set up with a weak feedback rate ( η = 1 ns −1 and τ D = 150 ps) and another with a strong feedback rate (η = 100 ns −1 and τ D = 15 ps). The other parameters of these reservoirs were optimized for best performance in the Santa Fe task. Both systems were injected with a constant stream of zeros until they reached a steady state regime. Then, we applied a perturbation of length θ and observed how this perturbation rippled through the system into the sum of the modal intensities.
The result is shown in Figure 6, where we showed the total output intensity response for the reservoir with weak feedback (in blue) and for the reservoir with strong feedback (in orange). For the sake of the discussion, we plotted the output versus the time in multiples of the mask length (τ M ), meaning that the blue curve (τ M = 522.5 ps) is slightly compressed compared to the orange curve (τ M = 412.5 ps). In addition, we noted that the perturbation has a duration θ, which was different for both optimized systems, namely θ = 5.5 ps for the weak feedback reservoir (blue curve) and θ = 7.5 ps for the strong feedback reservoir (orange curve). The oscillations found in the output of the weak feedback reservoir and strong feedback reservoir had frequencies of around 10 GHz and 5 GHz, respectively. This corresponds to typical RO frequencies found for VCSELs operated at given pump parameters. Furthermore, we see that the RO frequency and the damping increase as the pump parameters are increased.
For the reservoir with weak feedback (blue curve in Figure 6), we see that the response to the perturbation is short-lived. The perturbation introduced at the start of the data sample has an effect over approximately one mask-length. This means that the nodes related to one data sample (i.e., within one mask length) were interacting with each other thanks to the RO, but also nodes between subsequent samples (i.e., over subsequent mask lengths) interact with each other at the edges of the samples. The linear memory capacity that is lost due to the mask looping 3.5 times around the delay line is somewhat reinstated by the interaction between nodes of subsequent samples due to the RO.
For the reservoir with strong feedback (orange curve in Figure 6), we see that the response lingers in the system for at least two subsequent data samples. The mask loops for 12.5 times around the delay time, drastically decreasing the linear memory capacity according to Ref. [31], however, we see that the pumping parameters were optimized such that the RO damping rate was low and hence nodes over multiple samples could still interact with each other, which will counteract the decrease in linear memory capacity.
To conclude this section, we observed that the weak and strong feedback reservoirs could be fine tuned to give better performances, by adjusting the pumping parameters that affect the relaxation oscillations, which on its turn affects how nodes within (and over multiple) mask lengths interact with each other. Further detailed research into the linear and nonlinear memory capacities of the system can be useful to further interpret these results, but such a study is outside the scope of this paper. Figure 6. A reservoir with weak and another one with a strong feedback rate are prepared with a constant injection of zeros. A perturbation is introduced in the first node of the second datapoint and we observe how this perturbation ripples through the sum of the modal intensities. Parameters for the reservoir with weak feedback: η = 1 ns −1 , τ D = 150 ps, N = 95, θ = 5.5 ps, J 0 = 7.5J th and δJ = 9.5J th and for the reservoir with strong feedback: η = 100 ns −1 , τ D = 15 ps, N = 55, θ = 7.5 ps, J 0 = 3J th and δJ = 8J th .

Conclusions
We numerically investigated a delay-based reservoir computer using a spin-VCSEL, taking advantage of the high modulation speeds achievable with these lasers. We systematically scanned the RC performance as a function of the delay time, the mask length and the pumping parameters. In contrast with known photonic delay-based RC systems, this RC with spin-VCSELs shows a rather strong relation between the delay time and feedback rate on one hand and the RC performance on the other hand. A reservoir with weak feedback has better performances at longer delay times and vice versa for a reservoir with strong feedback. We further observe that the weak (strong) feedback regimes also favor different pumping parameters. We found that the reservoir with strong feedback has superior qualities, both based on the RC performance and the processing speed.
The RC system is benchmarked using the Santa Fe timeseries prediction task and has shown performances comparable to state-of-the-art delay-based RC systems, but at a considerable faster processing speed. The speed reached with this setup is around 5 GSa/s, an improvement by a factor of 10 compared to delay-based RC using single-mode semiconductor lasers [18,20], while maintaining the same error rate. The optimal node spacing is found to be between 5 and 8 ps, which is of the same order of magnitude as the inverse of the used birefringence (5 ps). Since the speed of this system is linked to the birefringence of the lasing cavity, the processing speeds can be increased in the future as the expertise on tuning the birefringence improves.