Optical Extreme Learning Machines with Atomic Vapors

Extreme learning machines exploit nonlinear random projections to perform computing tasks on high-dimensional output spaces. Since training only occurs at the output layer, the approach has the potential to speed up the training process and the capacity to turn any physical system into a computing platform. Yet, since strong nonlinear dynamics are required, optical solutions operating at fast processing rates and low power can be hard to achieve with conventional nonlinear optical materials. In this context, this manuscript explores the possibility of using atomic gases in near-resonant conditions to implement an optical extreme learning machine, leveraging their enhanced nonlinear optical properties. Our results suggest that these systems have the potential not only to work as an optical extreme learning machine but also to perform these computations at the few-photon level, opening opportunities for energy-efficient computing solutions.


I. INTRODUCTION
In the last decades, the advent of artificial intelligence and neuromorphic architectures has completely reshaped the computing landscape. In particular, the logic procedures and arithmetic operations of the von Neumann paradigm may now be replaced by an alternative optimization-based approach. This establishes opportunities in two distinct directions. On one hand, on the side of algorithms, rules and strategies are now autonomously inferred from data-driven processes [1]. On the other hand, on the side of the hardware, co-locating processing and memory operations paves the way for simpler hardware solutions capable of competing with electronic devices [2-4].
From the perspective of hardware, one of the most promising architectures is the reservoir computing framework. This design leverages the nonlinear dynamics of a physical system to simplify the transfer of neuromorphic concepts to hardware implementations, allowing most physical systems to act as computing platforms. A particular case of a reservoir computer that uses neither temporal dependence nor system feedback is the extreme learning machine (ELM), an architecture closer to a common feedforward network and typically easier to implement [5-7]. In short, an extreme learning machine exploits an untrained nonlinear random transformation to project each element of an input space onto a high-dimensional output space. The learning process is then performed in this output space, i.e. at the output layer, which reduces the computational load and bypasses the need to fine-tune all the network weights [5].
ELMs and reservoir computers have the potential to simplify the deployment of alternative physical systems as effective computing platforms, from mechanical [8,9] and hydraulic [10-13] to optical systems [14-17], the latter being the main focus of this work. Yet, deploying an effective physical ELM requires control of the relevant nonlinear dynamics [7,18,19]. From the optical computing perspective, this means that one needs optical media with a tunable nonlinear optical response. Nonlinear crystals or thermo-optical media may be explored for that purpose, but their relatively weak nonlinearity may limit their applications, requiring either long propagation distances, which forbid miniaturization and increase losses, or higher laser powers, which can be prohibitive from the power-efficiency perspective.
In this context, atomic gases in near-resonant conditions may constitute a valuable alternative for the construction of such devices. Indeed, provided with a suitable level structure, one can engineer the light-matter interaction conditions to achieve a tunable optical medium with enhanced nonlinear susceptibilities at ultra-low intensities [20-23]. Motivated by this background, this manuscript explores how such near-resonant media can be utilized to deploy effective optical extreme learning machines (OELM). For that, we start by modeling the optical response of a typical N-type four-level atomic system to demonstrate how it can be utilized to obtain a highly nonlinear optical medium suitable for operation at low intensity levels. Then, after introducing the basic concepts of ELMs, we demonstrate how such a configuration can be envisioned and used as a processing solution, using a phase encoding scheme and the interferometric processes occurring during the nonlinear propagation of an optical beam inside a gas cell. By performing beam propagation numerical simulations under realistic physical conditions, we test the computing capabilities of these systems, demonstrating the importance of controlling the nonlinear properties of the system.

FIG. 1. Schematic of the proposed optical extreme learning machine configuration (exaggerated scales for illustration purposes). A. Each input of the dataset X(i) is encoded into the phase profile of the incident flat-top weak probe beam Ωp, which can be achieved with a spatial light modulator, for example. B. The probe beam propagates inside a cell filled with an atomic vapor with an N-type 4-level atomic system, which acts as the intermediate layer of the OELM. C. After the propagation, the speckle pattern is recovered at the end, with the pixel intensity values being utilized as the output state Y(i) for training the linear transformation W that completes the extreme learning machine architecture.

II. PROPAGATION OF AN OPTICAL BEAM IN AN ATOMIC MEDIUM UNDER NEAR-RESONANT CONDITIONS
Regarding the goal of a strong nonlinear optical response, multiple atomic systems with a variety of level structures are known to support such phenomenology through effects related to quantum state coherence [21,22]. In this manuscript, we focus on a typical 4-level atomic system [23-26] interacting with three continuous-wave electromagnetic fields, which is widely known in the literature to support giant cross-Kerr nonlinearities even at ultra-low intensity levels [23,24]. In particular, we choose a typical N-type configuration (see Figure 1). First, a weak probe field $E_p = \frac{1}{2}\mathcal{E}_p(\mathbf{r},z)e^{ik_p z-i\omega_p t}+\mathrm{c.c.}$, with envelope function $\mathcal{E}_p(\mathbf{r})$, center frequency $\omega_p$ and wave vector $k_p$, couples the levels $|1\rangle$ and $|3\rangle$. The additional transitions are driven with stronger fields: a second ground state $|2\rangle$ is coupled to the excited state $|3\rangle$ via a control field $E_c = \frac{1}{2}\mathcal{E}_c(\mathbf{r})e^{ik_c z-i\omega_c t}+\mathrm{c.c.}$, with envelope function $\mathcal{E}_c(\mathbf{r})$, center frequency $\omega_c$ and wave vector $k_c$; and a switching field $E_s = \frac{1}{2}\mathcal{E}_s(\mathbf{r})e^{ik_s z-i\omega_s t}+\mathrm{c.c.}$, with envelope function $\mathcal{E}_s(\mathbf{r})$, center frequency $\omega_s$ and wave vector $k_s$, couples the second ground state $|2\rangle$ to a second excited state $|4\rangle$.

Taking a semi-classical approach to the light-matter interaction [27,28] and neglecting the effects of the weaker probe beam on the dynamics of the control and switching fields, the propagation of the probe envelope under the paraxial approximation is given by

$$i\partial_z \mathcal{E}_p + \frac{1}{2k_p}\nabla_\perp^2 \mathcal{E}_p = -\frac{k_p}{2\epsilon_0}\mathcal{P}_p. \qquad (1)$$

In this framework, the coherent light-matter interaction is accounted for through the envelope $\mathcal{P}_p$ of the polarization density term that oscillates with $\omega_p$. For the current atomic medium it can be defined through $P_p = \eta\mu_{31}\rho_{31}e^{i(k_p z-\omega_p t)}+\mathrm{c.c.}$, where $\mu_{ij}$ and $\rho_{ij}$ are, respectively, the dipole moment for the transition $|i\rangle \rightarrow |j\rangle$ and the coherence terms of the density matrix operator $\rho$, and $\eta$ is the atomic density. To proceed, the dynamics of the atomic populations can be modeled by the master equation

$$\dot{\rho} = -\frac{i}{\hbar}\left[H,\rho\right] + \Gamma(\rho), \qquad (2)$$

where $H$ is the system Hamiltonian in the rotating frame, the Lindblad superoperator $\Gamma(\rho)$ accounts for all the decoherence processes of the system, and the Rabi frequencies for the transitions are defined as $\Omega_{p,c,s} = \mu_{31,32,24}E_{p,c,s}/\hbar$. Using the definition of optical susceptibility, $\chi = 2\eta\mu_{31}^2\rho_{31}/(\epsilon_0\hbar\Omega_p)$, we can simplify equation (1) to

$$i\partial_z \Omega_p + \frac{1}{2k_p}\nabla_\perp^2 \Omega_p + \frac{k_p}{2}\chi\,\Omega_p = 0. \qquad (4)$$

Equations (2) and (4) are then coupled through the susceptibility term, which may be obtained by solving the master equation. Assuming the steady-state solution, $\dot{\rho} = 0$, and making use of the rotating-wave approximation [28], equation (2) can be expanded into the form

$$\left(M_0 + \Omega_p M_p\right)\vec{\rho} = 0, \qquad (5)$$

where $\vec{\rho}$ stands for a vectorized form of the density matrix $\rho_{ij}$, and $M_0$ and $M_p$ are matrices related to the $\Omega_p$-independent and $\Omega_p$-dependent parts of the master equation, respectively. Recovering the weak probe beam assumption, i.e. $|\Omega_p| \ll |\Omega_s|, |\Omega_c|$, a perturbative expansion in the weak probe beam can be solved order by order, starting from the ground state as the zeroth-order solution. The results for this equation system are straightforward to obtain algebraically, but the general full expressions are typically too cumbersome [26]. Yet, in the simplified limit of negligible dephasing between the two ground states, assuming $\Gamma_{32} = \Gamma_{31} = \Gamma_{42} = \Gamma_{41} = \Gamma$ and the two-photon resonance condition $\Delta_p = \Delta_c = \Delta_s = \Delta$, the susceptibility separates into a linear part and a self-Kerr contribution proportional to $|\Omega_p|^2$, with the second approximation valid for sufficiently large detunings, $\Delta \gg \Gamma, \Omega_{p,c,s}$, and equal control and switching amplitudes. Rescaling the detuning as $\Delta/\gamma$, introducing the coefficient $\kappa = \eta\mu_{31}^2/(\epsilon_0\hbar\gamma)$, and dropping the primes, we obtain a dimensionless nonlinear Schrödinger equation (NSE) describing the evolution of the probe field,

$$i\partial_z \Omega_p + \frac{1}{2}\nabla_\perp^2 \Omega_p + \left(\alpha + g|\Omega_p|^2\right)\Omega_p = 0, \qquad (9)$$

where $\alpha$ is the linear coefficient, while the nonlinear term $g$, associated with a self-Kerr effect, is controlled by the detuning and the control and switching intensities. In the context of this work, the NSE model will be utilized to investigate numerically the propagation of a given envelope field $\Omega_p$ that contains the input information encoded in its wavefront. In the next section, we introduce how this propagation can be used to construct an optical computing system by establishing a parallel to an extreme learning machine architecture. Furthermore, leveraging the controllable parameters of the model, specifically the detuning $\Delta$, we investigate the impact of this choice on the overall performance of our OELM, showcasing the opportunities of using atomic systems for this specific purpose.
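To make the numerical procedure concrete, an equation of the form (9) can be integrated with a standard symmetric split-step Fourier scheme. The sketch below is a minimal illustration, not the production code behind the results presented later: the grid, step count, and the absorption of the constant linear coefficient $\alpha$ into a global phase are simplifying assumptions.

```python
import numpy as np

def split_step_nse(psi0, z_final, n_steps, g, dx):
    """Symmetric split-step Fourier integration of the dimensionless NSE
    i d(psi)/dz = -(1/2) Laplacian_perp(psi) - g |psi|^2 psi
    (the constant linear coefficient alpha only adds a global phase)."""
    n = psi0.shape[0]
    dz = z_final / n_steps
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    # half-step linear propagator: exp(i (dz/2) (1/2) (-(kx^2 + ky^2)))
    half_linear = np.exp(-0.25j * dz * (kx**2 + ky**2))
    psi = psi0.astype(complex)
    for _ in range(n_steps):
        psi = np.fft.ifft2(half_linear * np.fft.fft2(psi))
        psi = psi * np.exp(1j * dz * g * np.abs(psi) ** 2)  # full nonlinear step
        psi = np.fft.ifft2(half_linear * np.fft.fft2(psi))
    return psi
```

Both sub-steps are unitary, so the scheme conserves the total power of the field, a convenient sanity check for any implementation.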

III. BUILDING AN OPTICAL EXTREME LEARNING MACHINE
To build an effective optical ELM we first need to understand the inner workings of this architecture. For this section, we consider the task of producing predictions $P^{(i)}$ belonging to a space $\mathbb{R}^{N_{target}}$ for a total of $N_D$ input states $X^{(i)}$ belonging to the feature space $\mathbb{R}^{N_{input}}$ (with $N_{input}$ being the total number of features) and associated with ground-truth targets $T^{(i)}$ that belong to the space $\mathbb{R}^{N_{target}}$.
The ELM architecture comprises three stages. First, each input state $X^{(i)}$ is projected into an intermediate space as

$$Y_j\left(X^{(i)}\right) = G\left(w_j \cdot X^{(i)} + b_j\right), \quad j = 1,\dots,N_c, \qquad (12)$$

with $G$ being a nonlinear activation function, $N_c$ the number of output channels, and $w_j$ and $b_j$ the internal weights and bias for each channel. The second step concerns the learning stage. It is performed on this high-dimensional intermediate space and assumes that we can train a linear transformation $W$ from $\mathbb{R}^{N_c}$ to $\mathbb{R}^{N_{target}}$ that matches the target values $T$ by minimizing a given loss function. For example, $W$ can be an output weight matrix belonging to $\mathbb{R}^{N_{target}\times N_c}$ that, under the typical regularized ridge regression, is obtained from minimization of the loss function

$$\mathcal{L} = \sum_i \left\| W\, Y\left(X^{(i)}\right) - T^{(i)} \right\|^2 + \lambda \left\| W \right\|^2, \qquad (13)$$

considering all the inputs $i$ and associated targets $T^{(i)}$ of the training dataset. Note that while the ridge model is typically suitable for regression tasks, classification is also possible, for example by converting binary class targets to positive/negative values and keeping the sign of the prediction. Besides, under the theory of extreme learning machines [5-7], it is known that the universal approximation capabilities of this framework require two conditions on the activation function: i) an infinitely differentiable nonlinear activation function $G$; and ii) a random distribution of the weights $w_j$ [5].
With the architecture established, we can now discuss how to implement an OELM based on the propagation of an optical beam in our nonlinear optical medium. First, we embed each input state into the phase of a given probe beam, namely

$$\Omega_p\left(X^{(i)}, z=0\right) = \sum_j A_j(x,y)\, e^{i\phi\left(X^{(i)}_j\right)}, \qquad (14)$$

where $\phi(X^{(i)}_j)$ are encoding functions and $A_j$ intensity distributions working as embedding states. Then, we let the optical beam propagate inside the medium before recovering it at the output plane $z = z_L$, utilizing an intensity sensor (e.g. a camera) for that purpose. Taking the pixels as the output channels, we get

$$Y_j\left(X^{(i)}\right) = \left|\Omega_p\left(X^{(i)}, z=z_L\right)\right|^2_j, \qquad (15)$$

where the subscript $j = \{1,\dots,N_c\}$ on the right-hand side may, for example, refer to a pixel position $(x,y) = (\mathrm{mod}(j, N_x), \mathrm{int}(j/N_x))$ for a sensor of $N_x \times N_y$ pixels. Establishing a parallel with the definition in equation (12), it is straightforward to conclude that our nonlinear activation function is provided by a combination of multiple effects, namely the interference of the waves and the generated patterns when interrogated with an intensity sensor, and the nonlinear evolution of the optical state inside the nonlinear medium. While the first by itself does not warrant the necessary conditions for an ELM with universal approximation capabilities [7,16], it is known that the second may provide them if the strength of the nonlinearity is sufficiently high [7,19].
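A minimal sketch of the input layer in equation (14) could read as follows; the min-max phase map anticipates the encoding used in Section IV, and all function names are illustrative assumptions:

```python
import numpy as np

def encode(phases, amplitudes, R):
    """Input layer of eq. (14): superpose the embedding beams A_j with their
    feature-dependent phases, then apply a fixed random phase mask R."""
    field = np.zeros_like(R, dtype=complex)
    for A_j, phi_j in zip(amplitudes, phases):
        field += A_j * np.exp(1j * phi_j)
    return field * np.exp(1j * R)

def minmax_phase(f, f_min, f_max):
    """Min-max map of a feature value onto a phase in [-pi, pi]."""
    return 2.0 * np.pi * (f - f_min) / (f_max - f_min) - np.pi
```

Experimentally, both the feature-dependent phases and the random mask `R` could be imprinted with a single spatial light modulator, as suggested in Figure 1.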

IV. RESULTS
In this section, we present the results of numerical simulations of equation (9) to explore the performance of an OELM on regression and classification tasks, using realistic physical parameters to keep the analysis close to experimentally accessible conditions.

FIG. 2. Overview of the encoding for the regression task, with the dataset and targets presented in A. Panel B presents the flat-top intensity of the probe beam utilized, whereas C presents the encoding of one feature in the phase of the wavefront. D. A constant random phase matrix is also applied to the wavefront to warrant the speckle formation.

A. Regression of nonlinear functions
To understand the computing capabilities of our optical ELM we first focus on a typical regression task. For the purposes of this work, we have chosen to approximate the function $t(f_1) = \sin(4f_1)/f_1$ with $f_1 \in [-3,3]$, encoded in the input state as

$$\Omega_p\left(X^{(i)}, z=0\right) = \left[A_0(x,y) + A_1(x,y)\, e^{i\phi\left(X^{(i)}_1\right)}\right]e^{iR(x,y)}, \qquad (16)$$

with the encoding $\phi(X^{(i)}_1) = 2\pi\,\frac{f_1 - \min(f_1)}{\max(f_1) - \min(f_1)} - \pi$ obtained using a min-max normalization of the features (see Figure 2 for spatial distributions and encoding strategy). The additional fixed random phase distribution $R(x,y)$ warrants the generation of speckles over such a short distance and the randomness of the projection onto the output space, and can be applied experimentally using a spatial light modulator or an optical diffuser, for example. For each state, we propagate it numerically, taking the intensity at $z = z_L$ (i.e. imaging the intensity at the output plane) and recording a region of interest (ROI) of 60 × 60 pixels around the center $(x,y) = (0,0)$. We further downsample the ROI by averaging regions of 2 × 2 pixels, before randomly choosing $N_c$ superpixels as the output state $Y(X^{(i)})$. Note that the effective dimensionality of the output space may be higher or lower than $N_c$, as it is associated with multiple factors such as the type of activation function and the encoding strategy. Indeed, from a purely theoretical perspective, the only formal statement known is that if the matrix $H = [Y(X^{(1)}), \dots, Y(X^{(N_D)})]$ has $\mathrm{rank}(H) = N_D$, then the model can learn a dataset of $N_D$ elements with zero error, which would imply $N_D \leq N_c$ [5]. Still, this statement does not impose major constraints on performance, as one does not require $N_D \leq N_c$ for effective learning [16,19]. Indeed, a higher $N_c$ may even be detrimental to performance, as it can lead to overfitting issues that must be dealt with using convenient regularization techniques.
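The readout chain described above (ROI crop, 2 × 2 superpixel averaging, random channel selection) can be sketched as follows; the array sizes and function name are illustrative assumptions:

```python
import numpy as np

def readout(intensity, roi=60, block=2, n_channels=50, seed=0):
    """Crop a roi x roi region of interest around the beam centre, average
    block x block superpixels, and randomly select n_channels of them as
    the output state Y(X)."""
    cx, cy = intensity.shape[0] // 2, intensity.shape[1] // 2
    patch = intensity[cx - roi // 2: cx + roi // 2,
                      cy - roi // 2: cy + roi // 2]
    m = roi // block
    pooled = patch.reshape(m, block, m, block).mean(axis=(1, 3))
    rng = np.random.default_rng(seed)  # fixed seed: same channels for all inputs
    idx = rng.choice(pooled.size, size=n_channels, replace=False)
    return pooled.ravel()[idx]
```

The fixed seed matters: the same superpixels must be selected for every input state, otherwise the output channels are not comparable across the dataset.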
To explore this framework we test the system with the regression of the function $t(f_1)$. The dataset comprises 64 points equally distributed in the interval $[-3,3]$, with an 80-20% train-test split, and a standard ridge regression methodology from the sklearn Python library is used to train the output layer [30]. Taking the root mean squared error (RMSE) as the evaluation metric, the results obtained are presented in Figure 3. As we can see, both the dimensionality of the output space $N_c$ and the nonlinear strength $g$ are important for achieving a good performance in the regression task, with errors below 5%, meaning that both play a role as hyperparameters of the model. Yet, looking at subfigure 3C, one can see that the nonlinear dynamics are more important for achieving a good performance than simply increasing the number of output channels, meaning that the dimensionality is important but not sufficient. Indeed, this becomes evident when computing the error on the noisy dataset (the same input data but with 5% random noise added on top of it), for which one sees a decrease in the accuracy of the predictions with the increase of $N_c$, suggesting an overfitting of the model as $N_c$ grows.
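A minimal sketch of this training methodology, assuming the sklearn library and substituting the simulated optical outputs with synthetic tanh random features (the real output states come from the beam propagation), could read:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
f1 = np.linspace(-3.0, 3.0, 64)          # 64 equally spaced inputs
t = np.sin(4.0 * f1) / f1                # target function of Sec. IV A
# synthetic stand-in for the superpixel intensities of the optical ELM
Y = np.tanh(np.outer(f1, rng.normal(size=50)) + rng.normal(size=50))

# 80-20% train-test split and ridge-regressed readout, as described above
Y_tr, Y_te, t_tr, t_te = train_test_split(Y, t, test_size=0.2, random_state=0)
model = Ridge(alpha=1e-6).fit(Y_tr, t_tr)
rmse_train = np.sqrt(np.mean((model.predict(Y_tr) - t_tr) ** 2))
rmse_test = np.sqrt(np.mean((model.predict(Y_te) - t_te) ** 2))
```

Note that `np.linspace(-3, 3, 64)` never hits $f_1 = 0$, so the target is well defined at every sample.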

B. Classification of the Spiral Dataset
For the second case study, we focus on the typical two-class spiral dataset to get a grasp of the classification capabilities of our proposed implementation, as well as a clear picture of its generalization capabilities. Taking the dataset represented in Figure 4, we encoded each pair of features $(f_1, f_2)$ into an input state

$$\Omega_p\left(X^{(i)}, z=0\right) = \left[A_0(x,y) + A_1(x,y)\, e^{i\phi\left(X^{(i)}_1\right)} + A_2(x,y)\, e^{i\phi\left(X^{(i)}_2\right)}\right]e^{iR(x,y)}, \qquad (17)$$

with the encodings $\phi(X^{(i)}_j) = 2\pi\,\frac{f_j - \min(f_j)}{\max(f_j) - \min(f_j)} - \pi$ obtained using a min-max normalization of the features (see Figure 4 for spatial distributions). The associated vector on the output space for each input state was computed following the same strategy as utilized for the regression task.
Utilizing logistic regression as the classification model, we followed the same 80%-20% train-test split procedure and varied the number of channels $N_c$ and the nonlinear strength $g$, obtaining the results depicted in Figure 5. As in the regression task, the results again suggest that the output dimensionality and the nonlinear strength of the physical system are both important parameters, but only the nonlinear strength warrants good accuracy. In ideal conditions, train and test accuracies above 90% can be achieved, meaning that the model is not overfitting, validating our conceptual proposal as an effective OELM capable of performing nonlinear classification tasks.
To finalize, Figure 5 also presents the accuracy of predictions on a noisy dataset, i.e. propagating states with 5% random noise added on top of the initial state. As the performance does not vary significantly, the classification model appears robust and may generalize well to unseen data. Indeed, a better insight into the generalization capabilities of the model may be achieved by performing numerical simulations for each point of a 20 × 20 rectangular grid with $f_1 \in [-\pi,\pi]$ and $f_2 \in [-\pi,\pi]$, and predicting the associated label at each point using the previously trained model for $N_c = 300$. Figure 6 depicts the obtained results, showing that a correct spiral-like separation of the data is possible in a sufficiently nonlinear regime of the reservoir, demonstrating that generalization is possible but tightly connected to the strength of the nonlinearity.
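Under the sign-readout strategy mentioned in Section III, the two-spiral benchmark can be sketched as follows; the tanh random features stand in for the optical channels, and the dataset generator and projection scale are illustrative assumptions:

```python
import numpy as np

def two_spirals(n_per_class=100):
    """Standard two-spiral toy dataset with +1/-1 labels (illustrative)."""
    t = np.linspace(0.5, 3.0 * np.pi, n_per_class)
    arm = np.c_[t * np.cos(t), t * np.sin(t)] / (3.0 * np.pi)
    X = np.vstack([arm, -arm])               # second arm rotated by pi
    y = np.r_[np.ones(n_per_class), -np.ones(n_per_class)]
    return X, y

X, y = two_spirals()
rng = np.random.default_rng(1)
w = rng.normal(size=(2, 300)) * 4.0          # random projection (scale is a guess)
b = rng.uniform(-np.pi, np.pi, 300)
H = np.tanh(X @ w + b)                       # stand-in for the 300 optical channels
# ridge readout on +/-1 targets, with a sign decision, as suggested in Sec. III
W = np.linalg.solve(H.T @ H + 1e-6 * np.eye(300), H.T @ y)
acc = np.mean(np.sign(H @ W) == y)           # training accuracy
```

With 300 channels for 200 samples the ridge readout essentially interpolates the targets, mirroring the regime in which the optical ELM separates the spirals.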

V. DISCUSSION AND CONCLUDING REMARKS
The present manuscript reports a strategy to enable a physical implementation of an OELM using the nonlinear optical properties of atomic vapors in near-resonant conditions. The proposed implementation focuses on the use of an N-type configuration and the propagation of a weak probe beam assisted by strong coupling and switching fields. Following a perturbative approach, we derived an effective model for the propagation of the weak probe beam in the form of a nonlinear Schrödinger equation. Then, leveraging an encoding strategy based on the spatial modulation of the phase of the input probe beam, we established a connection between the physical system and the ELM architecture, demonstrating how one can benefit from the strong nonlinear optical properties of near-resonant media to enable an optical implementation of an ELM. Besides, by offering the possibility to control the nonlinearity strength with external parameters such as the detuning or the field intensities, the system presents an interesting playground to explore the crossover between linear and nonlinear response and to assess its impact on the performance of an optical ELM. The numerical results presented demonstrate how combining a sufficiently large output space with strong nonlinear dynamics enables the regression and classification of nonlinear problems. To approximate experimental conditions, realistic physical values together with synthetic noise were used to obtain the numerical results. Therefore, it is plausible to expect similar observations in experimental setups, which are soon to be explored.
Finally, putting the results in perspective by comparing them with previous approaches in free space [14,16], the system presented here benefits from the fact that the nonlinearity does not reside solely in the measurement of the intensity of the field at the output plane (commonly done with an electronic element such as a camera), but also in the propagation itself, which can be controlled externally. These findings constitute an important step toward all-optical computing schemes and toward establishing atomic vapors as possible building blocks of fast and robust neuromorphic all-optical computers.

FIG. 3. Results for the regression task with the optical ELM, measured with the root mean squared error (RMSE) metric. A. Change in performance with the increase of the number of output channels Nc for fixed g = 200. B. Change in performance with the increase of the nonlinearity parameter g for fixed Nc = 50. C. Performance on the test dataset with varying g and Nc, showing a clear tendency of increasing performance for stronger nonlinearities and larger output spaces.

FIG. 4. Overview of the encoding for the classification task, with the dataset and targets presented in A. Panel B presents the flat-top intensity of the probe beam utilized, whereas C presents the two-feature encoding scheme in the phase of the wavefront, for the point highlighted in red in panel A. D. A constant random phase matrix is also applied to the wavefront to warrant the speckle formation.

FIG. 6. Results for the two-spiral classification task with the optical ELM, regarding the generalization capabilities of the model for two distinct nonlinear parameters, A. g = 1.0 and B. g = 100.