Application of Reservoir Computing for the Modeling of Nano-Contact Vortex Oscillator

The Nano-Contact Vortex Oscillator (NCVO) is a highly nonlinear spintronic device that can depict chaotic and nonchaotic behaviors according to the current flowing through it. The potential use of such a device in the future-generation computing systems requires the knowledge of a realistic model capable of describing its exact dynamics. In this paper, we firstly investigate the behavior of NCVO based on the power spectral analysis. Furthermore, we propose and demonstrate two efficient approaches of reservoir computing for the modeling of such a device. The performances of the proposed models are addressed in two ways. First, the generated time-varying signals are compared with the simulated magnetizations of the NCVO at different operating currents. Then, the power spectral analysis of one of the two models is carried out to examine its overall behavior over the complete DC current operating range and its ability to diagnose chaotic and non-chaotic regimes. The proposed models show quite promising results that can be counted on for further research.


Introduction
In the era of the Internet of Things (IoT) and Big Data (BD), a huge amount of data of different types is permanently generated and collected by sensing systems over time. The processing of these data strains the conventional CMOS circuits and systems to respect increasing hard timing requirements while trying to keep the overall power consumption in the very limited power budgets. Due to the increasing thermal issues, some limits of the conventional CMOS designs have already been reached and are at the origin of the new orientations that chip designers took in the last decade: heterogeneous Multiprocessor System-on-Chips (MPSoCs) [1]. Although these design orientations have brought some solutions, they are also at the origin of what we call the dark silicon paradigm [2]: the above mentioned thermal issues and stringent power budgets do not allow fully exploiting the overall computing power of a chip whereby through smart management, some portions of it are forcedly deactivated during its normal functioning. On the other hand, the large amount of data that needs to be processed imposes tight constraints not only on the necessary performances per watt that computing circuits and systems need to supply, but also on their overall energy efficiency.
In order to deal more efficiently with the constant computer efficiency demands and the diminishing technology benefits introduced by the technology scaling (leakage, variations, reliability), new beyond-CMOS emergent technologies must be considered. Many alternatives are being explored to meet these challenges. One of the solutions is to replace silicon with a less dissipative material such as graphene [3] or even to use magnetic materials whose bistable magnetization state and inherent non-volatility can be exploited to build Nanomagnetic Logic (NML) [4]; or their spintronic properties to design bioinspired systems [5]. Other approaches, such as quantum computing, propose to redesign

NCVO and the Physics Behind It
NCVO consists of an extended spin valve structure made of multiple stacked thin layers including two ferromagnetic layers (Figure 1a). The electrical current enters the device through a metallic NanoContact(NC) on top. Most of the flowing currents are perpendicular to the surface, producing a large Oersted-Ampère field with a circulating configuration. This allows the formation of a magnetic vortex only in the upper ferromagnetic layer below the NC. The remaining currents spread in the plane to farther regions and generate a spin transfer torque that moves the vortex core away from the center. At a given distance from the NC, this torque is compensated by a magnetic damping where the combination of these two torques with the precession torque (Figure 1c) leads to a self-sustained gyration of the vortex core around the NC (Figure 1b). In contrast, the magnetization of the lower magnetic cobalt layer remains uniform and is issued as a reference for measuring the Magnetoresistance (MR) of the spin valve. The resistance of the device is then proportional to the azimuthal angle of the vortex core. Since a bias DC current is applied to generate the vortex dynamics, due to the MR effect, an oscillating voltage V GMR at the vortex gyration frequency appears across the device [15].
To study this phenomenon, an NCVO was tested by micro-magnetic simulation for different DC bias current values I while recording the space averaging of the three components of the magnetization vector. For the value I, three different modes are recognized: • No-modulation mode: This mode appears for currents I in the range [12, 12.1] mA. The vortex revolves around the NC continuously. No change of polarity takes place. Thus, only a fundamental mode, which denotes the gyration frequency, is obtained with its harmonics in the power spectrum (see Figure 1d). • Modulation mode: This modulation frequency comes from a periodic relaxation of the vortex dynamics due to the core reversal. In this mode, the power spectrum shows two sideband frequencies ± f mod associated with the gyration frequency f 0 (see Figure 1e). f mod , which denotes the vortex polarity reversal, is said to be commensurably locked to f 0 . As a result, the vortex fulfills complete cycles around the NC before dropping toward the center to invert its direction. • Chaotic mode: As its name indicates, the vortex revolves with no definite rules. The vortex may drop at any time toward the center due to the non-compatibility of the gyration and modulation frequencies (see Figure 1f). Figure 1g shows the variation of the fraction f mod / f 0 as a function of I. The plateaus are a sign of phase-locking phenomenon, whereas the monotonic behavior comes from chaos. To verify this, the power spectrum density is plotted as a function of both the current and frequency (see Figure 1h). At the plateau, only a few frequencies are obtained: the fundamental frequency, its sidebands, and harmonics; whereas in the chaotic region, a large number of frequencies can be observed.

Modelling Methods
Real-world systems are inherently dynamic and complex. Their complexity is a result of a multitude of factors [18]: dynamic changing of the environment in which systems evolve or of their internal structure; tight coupling and interactions between the parts of systems; feedbacks on the environment dynamics and the system's; non-linearity giving rise to unpredicted and spontaneous features exhibiting chaos, self-organization, and pattern formation. To model such complex nonlinear systems, different approaches have been proposed in the literature [19,20]. In order to describe the complex dynamics of a system, a common approach is to use nonlinear system identification, consisting of several steps: data acquisition, system model hypothesis, parameter identification, and model test and validation. In the first step, the inputs and the outputs of the system are observed and acquired and will be used throughout all further steps. The system model hypothesis is vital because it needs to describe faithfully the behavior of the observed system, otherwise it will be rejected after the test and validation phases.

Si
Cu (10) Au (5) Cu (40) Au (40) Ta (5) Resist ( Among models used to describe the nonlinear dynamics of a system, different model classes have been reported [20][21][22][23][24][25][26][27]. Continuous-time polynomials represent the first class of models where, by using Ordinary Differential Equations (ODEs), the acquired data can be fitted [21]. The examples of nonlinear models described with ODEs are Lorenz [23] and Rössler systems [24]; the Lotka-Volterra model [28] suited for ecological systems as it efficiently describes the interaction of different species competing in nature [29]; the Kuramoto-Sivashinsky equation for chemical reaction modeling [21]. Complex chaotic dynamics can be modeled as well by delay differential equations [30]. This modeling technique has proven its efficiency in many fields, especially the population dynamics field [31]. Another large class of models used to represent nonlinear dynamics is neural networks [32][33][34]. These networks consist of neurons organized in layers that receive weighted inputs from all other neurons from neighboring layers in a feed-forward manner (no feedback loops) and compute their sum nonlinearly. Such systems can be described with coupled ODEs with a sigmoidal nonlinearity such as the hyperbolic tangent. Their use is well suited for physical problems as a small stimulus produces a linear response that saturates at a given large stimulus. However, the large number of requested parameters makes the system computationally expensive. In contrast to the latter system, sparse neural networks [33] just need a few parameters to mimic a chaotic behavior. The interaction matrices in such networks are circulant [34], corresponding to neurons arranged in a ring, each interacting identically to its neighbors. The main drawback of such networks is the ruinous effect of the initial conditions on the dynamics, which should be defined asymmetrically to avoid synchronization leading to chaos. Other classes of models used to describe the dynamics complexity of nonlinear systems are radial basis function models [35] (similar to NNs without weighted inputs between neurons in hidden layers), fuzzy logic [36], or rational models composed of the division of two polynomials [37].

Echo State Network
The Echo State Network (ESN) approach was first introduced by Herbert Jaeger in 2001 [38]. ESN is composed of K input units u, N internal network units x, and L output units y; combined with a set of weights ( Figure 2). After the input data are introduced, the neural network generates its internal states based on the fixed weights: the input matrix W in , the internal matrix W, and the backpropagation matrix W back . These intermediate values are exploited to train the output connections in the second part of the system: the trainable output matrix W out . The first part of the system, known as the reservoir, is a Recurrent Neural Network (RNN), which is not enrolled in the training phase. The second part, called the readout, is a classifying layer usually represented with a simple linear function.
The key point in this type of network is the reliance on the input history u(n), u(n − 1), . . . , u(0) for the computation of each internal state x(n); that is why it is described as "echo". Knowing that f = ( f 1 , .., f N ) is the output activation function of internal units, the activation process inside the network follows the following scheme: On the other side, the output is computed based on the current input, the current internal state, and the previous output values under the output activation function of output units f out = ( f out 1 , .., f out N ): In fact, the "echo state" property of a network is highly related to the internal matrix prepared for training [38,39]. For that, this matrix should be computed carefully, as will be seen later, whereas the remaining matrices of the reservoir can be freely chosen, as they do not affect the echo state property of an ESN.
The construction of the internal reservoir, as well as the output layer is done in the following manner. First, the computation of W is carried out by creating an untrained Dynamic Reservoir (DR) (W in , W, W back ), which has an echo state property and random values for W in and W back . The echo state property is ensured by |λ max | W < 1, where |λ max | is the spectral radius of the matrix W corresponding to the largest absolute value of its eigenvalues λ. The size N of W is generally chosen as T/10 < N < T/2. This matrix is then multiplied by (α/|λ max |) where α is the desired spectral radius of the final matrix W. There are no formal rules for the choice of α. However, this parameter can be tuned manually knowing that the large values increase the acceleration of the system's dynamics [39].
The training phase takes place by initializing the network to an arbitrary state, for example x(0) = 0. Then, by setting y(0) = 0, the internal states are computed using the activation formula of Equation (1). For n > T 0 (T 0 is the process washout time): the vector (u(n), x(n), y(n − 1)) is collected and inserted as a row of matrix M, and the vector tanh −1 (y(n)) is collected and inserted as a new row in the matrix D while going down in rows with time in both matrices.
Finally, the output matrix of the readout layer is calculated by the following equation: At this stage, the reservoir (W in , W, W back , W out ) is ready for use. It can be tested by computing the internal states Equation (1) and then the output units Equation (2) using the learned weight matrices. The whole process is summarized by a flowchart diagram in Figure 3. The obtained echo state network can be successfully used to model one time series. In the NCVO case, one time series corresponds to one set of magnetization signals (M x (t), M y (t) and M z (t)) obtained for one value of the input DC current. Therefore, it becomes impossible to capture all the sets of magnetization signals of the NCVO device using a single system. To do that, separately trained ESNs are needed. In other words, the number of echo state networks should be equal to the number of input datasets. In the next section, we expose a new methodology of training recurrent neural networks, which, by its memory, assures the easy switching of the different input datasets under the same weights. Consequently, the magnetization signals M x (t), M y (t), and M z (t) can be captured at different DC current values, previously trained on, using a single system.
W out W y nodes x nodes u nodes

Conceptors-Driven Network
The behavior of an NCVO varies with respect to the value of the direct current flowing through the device. Thus, there is a need for neural Long-Term Memory (LTM) [40] capable of saving and regenerating a large number of time series patterns. We suggest the use of conceptors-reservoir computing [41], which passes serially through two phases: the loading phase where patterns are installed in the network and the recall phase in which the patterns are regained according to the conceptor's value.
To model this system, an RNN with N neurons is built by randomly creating an N × N internal connection matrix W * , an N × K input weights matrix W in where K denotes the number of input datasets, and a N random bias vector b. In the loading phase, the scalar pattern p j (n) always drives the network by the following state-update equation: Note that j denotes the index of the pattern. Before starting, the network undergoes a training of the readout matrix where an M-dimensional white noise ν(n) is introduced to the network to collect the states: Accordingly, W out is computed via the optimization problem below: Note that the readout neuron is trained on a random signal, and thus, it should perform well for any temporal signal.
The input internalization weights matrix W is induced from the random matrix W * . This transformation enables the new reservoir to mimic the impact of the drivers p j in the absence of them. For that, the K patterns are fed into the reservoir by separate runs to compute the states x j according to Equation (4) for 1 j K. Consequently, W is calculated by minimizing the quadratic loss of Equation (7) after the washout period n 0 ends: Although, if the network is run just by iterating x(n + 1) = tanh(Wx(n) + b), the resulting input-free reservoir dynamics is entirely unpredictable. The basic role of the conceptors in the LTM is to guide the system to which temporal pattern it should engage at the exploitation time. Conceptors are N × N matrices, which interfere in the following manner: By referring to [41], the quadratic loss function for the computation of the matrices C j s is: and its explicit solution is as follows: where R j = E[x j (n)x j (n) T ] is the N × N correlation matrix of states x j (n), . 2 f ro denotes the squared Frobenius matrix norm, and γ is the aperture parameter that can be understood as a virtual scaling factor of the reservoir states. Note that R j is computed first without the intervention of the conceptors Equation (4). Afterwards, conceptor C j is calculated easily by Equation (10). This is done for each pattern separately.
After the RNN is completely installed, it enters the recall phase. At this stage, the system is able to obtain any pattern's state using its conceptor in the following sense: The final output is computed linearly from the network state through the readout weight matrix W out : This process is summarized in Figure 4 as a flowchart diagram.

Simulation Setup
The reference signals for training of both proposed models were obtained from the finite difference micromagnetic solver Mumax3 [42], which performs a numerical space and time integration of the Landau-Lifshitz-Gilbert equation including the different spin transfer torque terms. The mesh was composed of 512 cells in each of the x and y directions and one cell in the z direction (along the depth of the ferromagnetic layer). The cell size used was 2.5 × 2.5 µm 2 in the film plane (x and y direction) and 20 nm for the thickness. The current distribution and the Oersted field it generates were previously calculated with COMSOL [43], before being injected as a parameter in Mumax3. A precise description of the procedure used to realize these simulations can be found in the Supplementary Section in [15] and in [44]. The extracted M x (t) and M y (t) correspond to the time evolution of the average magnetization along the x and y directions, respectively. The measured signal on a real device is proportional to a linear combination of these two quantities. For the modeling work, the two approaches of RC were executed on the simulated data by running a programming code on MATLAB R2015a.

Generated Data from the Proposed Models
In this subsection, we present the dynamics generated by both the ESN and conceptors-driven RNN, presented in the two previous sections and try to depict the real dynamic magnetizations of NCVO (see Section 4.1). The data we processed were the spatial averaging of the two in-plane components of the NCVO's outputted magnetization (M x and M y ) obtained from micro-magnetic simulations and considered here as gold reference signals (see Section 4.1). The third dimension of magnetization was not studied as it only denotes the vortex polarity. The injected current ranged from 12 mA to 20 mA with a step of 0.1 mA. Therefore, by varying the injected current with a 0.1 mA step in the [12 mA, 20 mA] range in the micro-magnetic simulation framework, we obtained 81 reference datasets, which were further used for the training of both the ESN and conceptor-driven approaches. Each of the 81 input datasets had a data length of T = 10,000 samples with a time step t s = 10 −11 . The hyperparameter tuning of the ESN approach was done as in [45].
For the ESN, the number of neurons was set to 3000, and the spectral radius α was tuned for each input current separately. Seventy percent of the data was used for training, and the rest was left for testing. Some of the obtained results with this approach of different waveforms are shown in Figures 5 and 6 for both M x and M y , respectively. At I = 12 mA no modulation was captured; at I = 12.3 mA, 13.3 mA, and 16 mA, the system was said to be chaotic; and for the rest, it was non-chaotic.
On the contrary, the 3000 neuron conceptors-driven RNN was filled with the 81 training examples p j (n) one after the other before the regeneration phase. The conceptor C j of each pattern was computed after tuning its aperture α j . Thirty percent of the data length was dedicated for the recall phase. Similarly, the obtained results with this approach are shown in Figures 7 and 8 for both M x and M y , respectively.
At first glance, the two networks provided good tracking results in three modes: no-modulation, modulation, and even in the chaotic regime. The latter behavior occurred at 12.3 mA, 13.3 mA, and 16 mA, where the time-varying signals were slightly distorted. In all figures, the signals generated by the ESN and conceptor-driven RNN are presented in red, whereas the signals issued from the micro-magnetic simulations of the NCVO are in blue. In order to evaluate the tracking results of both approaches, a comparison in terms of the Mean Squared Error (MSE) between the generated signals obtained on testing data and reference signals (from micro-magnetic simulations) was carried out. These results are shown in Table 1. It can be noticed that the MSE for all currents for both magnetizations and approaches was in the [10 −4 , 10 −7 ] range. However, the ESN approach seemed more accurate for most of the current range for both magnetizations when compared to the reference signals from micro-magnetic simulations. This result was expected as for each current, the parameters of the ESN network were tuned separately to allow better tracking of the NCVO magnetizations corresponding to that current. However, the main drawback of this approach was that it was able to generate only one output signal per run. On the other hand, the conceptor-driven RNN approach gave comparable and for some currents even better tracking results when the generated signals on the testing data were compared with the reference ones from micro-magnetic simulations. Moreover, it had the advantage that once the network was trained, it could be used for all currents to generate the output signals only by changing the conceptor value.
As the ESN approach generated more accurate traces in terms of MSE when compared with the micro-magnetic reference signals, the generated results from testing data were also compared with these reference signals in terms of power spectral analysis. These results are shown in Figure 9. Figure 9e shows the power spectral densities for all observed DC currents in a 2D plot. Thus, for a given DC current, all frequencies generated by the proposed ESN model from testing data can be observed on the PSD color map along the vertical axis. For instance, the power spectra corresponding to three different DC currents generated by the ESN model from testing data are presented in Figure 9a-c for I = 12, 15.3, and 16.2 mA, respectively. These three different currents correspond to the three previously identified operation modes of the NCVO device. For 12 mA, only the fundamental frequency f 0 , corresponding to the NCVO gyration frequency, and its harmonics are observed. As explained in Section 2.1, this power spectrum is characteristic for the no-modulation regime of the NCVO device. For 15.3 mA, besides the fundamental frequency f 0 and its harmonics, the modulation frequency sidebands f mod can be visualized. Referring to Section 2.1, the modulation mode of the NCVO was identified from this power spectrum. Finally, for 16.2 mA, a frequency rich power spectrum was observed, which was characteristic of the third chaotic and incommensurable regime. Therefore, all three regimes of the NCVO described in Section 2.1 were all captured by the ESN model.    This plot extracted from the signals generated by the ESN model (in red) was compared with the f mod / f 0 of the reference system from micro-magnetic simulations (in blue) and is shown in Figure 9d. In the f mod / f 0 plot, the plateaus are signs of the synchronization regions where the NCVO device can be either in no-modulation or modulation mode. Moreover, each plateau was described with a rational number indicating the number of turns of the NCVO vortex before changing the polarity. On the other hand, the monotonic rate in the f mod / f 0 plot indicates non-stable and chaotic regions. By comparing the results presented in Figure 9, it can be noticed that the non-chaotic regions are still distinguished in the results generated by the ESN model where the value of f mod / f 0 in the intervals [12.9, 13] mA, [13.5, 15.6] mA, and [17,20] mA varies slightly around 1/6, 1/4, and 1/2, respectively. This means that the ESN model clearly distinguishes the modulation mode of the NCVO device (see Section 2.1). Moreover, the monotonic rates captured by the ESN model between these non-chaotic regions or plateaus are the sign of the chaotic mode behavior. On the other side, it can be noticed that the sideband frequencies cannot be extracted for some DC current values; these frequencies belong to the regions where chaos is dominant. This can be noticed in the 2D plot of PSD (Figure 9e), where for I belonging to the interval [12.2, 13.4] mA, the lower sideband frequency cannot be easily identified. Moreover, from Figure 9e, it can also be seen that the presented ESN modeling approach filters the harmonics of the gyration frequency, which are clearly visible in the PSD plot from micro-magnetic simulations presented in Figure 1.
In addition, to confirm the validity of the proposed ESN model to capture the time-varying nonlinear dynamics of the NCVO device along the whole DC current range, another way of presenting its generated PSD results from testing data is to plot the f mod / f 0 curve.

Conclusions
In the present work, two different modeling approaches were applied to model and describe the complex highly nonlinear dynamics of the spintronic device called the Nano-Contact Vortex Oscillator (NCVO). The behavior of this NCVO device can be controlled with the input DC current flowing though it and can be chaotic or nonchaotic with clearly identified frequency patterns. Two different reservoir computing based approaches were considered: the echo state network method was used to describe the NCVO's behavior for each DC current; and conceptor-based reservoir computing for all possible DC currents. Both modeling approaches gave good tracking results of the generated signals when compared with the simulated signals from micro-magnetic simulations with the MSE error in the range of [10 −4 , 10 −7 ]. The proposed ESN model generated from testing data more accurate results than the conceptor-based reservoir computing approach, but at the detriment of the system's complexity. Indeed, in the proposed ESN approach, one NCVO's state, which corresponds to one DC current, was described with a different ESN trained separately. On the other hand, the conceptor-based reservoir computing approach gave comparable and for some currents even better tracking results and had the advantage that once trained, it could be used for all DC currents only by changing the conceptor value. Moreover, the performances of the ESN approach were also compared with the simulation results from micro-magnetic simulations in terms of the power spectral density and f mod / f as a function of the applied DC current. The obtained results showed that the proposed modeling approaches allowed faithfully approximating the complex behavior of the NCVO device and gave quite promising insights for their future use as a basic block in future generation computing systems. Future work should consider the use of the proposed models as a building block of more complex computing systems.