An Online Iterative Linear Quadratic Approach for a Satisfactory Working Point Attainment at FERMI

Abstract: The attainment of a satisfactory operating point is one of the main problems in the tuning of particle accelerators. These are extremely complex facilities, characterized by the absence of a model that accurately describes their dynamics, and by an often persistent noise which, along with machine drifts, affects their behaviour in unpredictable ways. In this paper, we propose an online iterative Linear Quadratic Regulator (iLQR) approach to tackle this problem on the FERMI free-electron laser of Elettra Sincrotrone Trieste. It consists of a model identification performed by a neural network trained on data collected from the real facility, followed by the application of the iLQR in a Model-Predictive Control fashion. We perform several experiments, training the neural network with increasing amounts of data, in order to understand what level of model accuracy is needed to accomplish the task. We empirically show that the online iLQR requires, on average, fewer steps than a simple gradient ascent (GA), and needs a less accurate neural network to achieve the goal.


Introduction
A Free-Electron Laser (FEL) [1] is a highly tunable source of coherent radiation. Indeed, it has the widest wavelength range of all laser types, currently ranging from microwaves to X-rays. Such light sources are used for cutting-edge research in many different fields of physics, chemistry, and biology. The FEL principle is based on a beam of electromagnetic waves traveling collinearly with an electron beam inside a magnetic structure called modulator undulator. As a result of their superimposition, the electron beam is energy modulated. The energy modulation is then converted into a charge-density modulation that produces micro-bunching in the longitudinal direction according to the radiation wavelength. Finally, the micro-bunched electron beam radiates in phase with the electromagnetic waves amplifying the radiation until saturation. With a sufficiently high gain, the amplified radiation continues to interact with the electron beam traveling together with the radiation. This configuration is called Self-Amplified Spontaneous Emission (SASE) [2] and produces partially coherent radiation starting from an incoherent spontaneous emission. Even if the radiation has high peak power and good transverse coherence at saturation, the longitudinal coherence is affected by strong pulse-to-pulse spectro-temporal fluctuations. An improvement of the longitudinal coherence is provided by seeded FELs [3][4][5][6] where an external optical laser is injected with the electron beam in the modulator undulator. In this case, the energy modulation has better properties since it is seeded by the optical laser and not by incoherent spontaneous emission. Furthermore, a shorter magnetic path is required to amplify the output until saturation. As a result, a crucial parameter in seeded FELs is the longitudinal and transverse overlapping between electrons and laser, which is obtained by setting a proper working point.
In general, FEL optimization has always been a very demanding task. The system complexity, the lack of a complete theoretical model, the noise, and the unpredictability of machine drifts make it even harder. Over the years, several approaches have been proposed, some of them remaining proofs of principle, others being actually employed [7].
A model-free approach using Gradient Ascent and Extremum Seeking algorithms has been investigated on the FERMI FEL at Elettra Sincrotrone Trieste [8]. Furthermore, a multiphysics simulation tool kit called OCELOT [9] has been designed at the European XFEL in Hamburg for the study of FELs and synchrotron light sources. Some generic optimization algorithms such as Extremum Seeking, Nelder Mead, and Bayesian optimization based on Gaussian processes are already implemented in the framework. Gaussian process and Bayesian optimization have been also employed to tune the quadrupole currents at the Stanford Linear Accelerator Center (SLAC) [10,11] and to optimize the self-amplification power of the spontaneous emission of the Free electron LASer in Hamburg (FLASH) at the Deutsches Elektronen-SYnchrotron (DESY) [12]. In recent years, Machine Learning (ML) techniques have led to many improvements and successful implementations in the field of particle accelerators, from automated alignment of various devices with beam to optimising different parameters, see for example [13][14][15][16][17][18]. An overview of the opportunities provided by the application of ML for particle physics is given in [19]. In particular, the authors of [20,21] consider Reinforcement Learning (RL) for control and performance improvement. In [22], the authors advocate the use of artificial neural networks to model and control particle accelerators in combination with RL. Moreover, recent works [23][24][25][26][27][28][29] have presented RL methods used in the context of FELs. In [23,24], the FEL model and the policy are defined by neural networks in a simulation environment, while in [25] the authors present an application of RL on a real system.
In the present paper, we face the problem of the attainment of a satisfactory working point, i.e., the one resulting in a detected intensity greater than a given threshold. In particular, we are concerned with the alignment of the seed laser to properly tune the superimposition with the electron beam in the modulator undulator of the FERMI FEL. Such a challenging task has already been approached by some of the authors employing RL, both in its model-free and model-based variants [26,27,29]. Here, we investigate the use of an iterative Linear Quadratic Regulator (iLQR) approach [30] to deal with the problem at hand. That algorithm has been already successfully employed in the control of non-linear biological movement systems [30], and recently proposed for the trajectory correction on the Advanced Proton Driven Plasma Wakefield Acceleration Experiment (AWAKE) at CERN [31]. We describe the FEL dynamics using a state-space representation [32], where the association between the position of two tip-tilts and the resulting intensity is identified by training a Neural Network (NN) on real data samples [33][34][35]. We propose to use the iLQR in a Model-Predictive Control (MPC) fashion in order to deal with possible model-plant mismatches. We refer to the mentioned scheme as online iLQR. We evaluate the effectiveness of such an approach as the amount of data used for the NN training increases. Moreover, we compare its performance with a gradient ascent (GA), showing that the online iLQR needs fewer control steps for task achievement.
Clearly, training an NN is not the only way to identify a model of a nonlinear function, such as a regressor. Other possible approaches, for example, rely on Gaussian-Laplace [36] or exponential [37] models. However, they are limited to those problems in which it is possible to assume that the function to be approximated has a known shape.
The paper is organized in the following way. Section 2 presents a general description of the seed laser alignment system, as well as the proposed algorithm. In Section 3, the experimental configuration and the achieved results are reported and discussed. Finally, conclusions are drawn in the last section.

Materials and Methods
In the following, we first describe the main elements of FERMI involved in the considered task (Section 2.1). Subsequently, we recall the iterative Linear Quadratic Regulator (iLQR) method (Section 2.2) and describe an online implementation (Section 2.3), employed for the locally optimal feedback control of the FERMI process.

FERMI Facility
Despite the complex structure of FERMI, the alignment process can be described by the simple setup shown in Figure 1. Assuming a stable electron beam trajectory, the superimposition of the two beams is achieved by imposing a specific laser trajectory. In particular, by properly tuning the angle of two tip-tilt mirrors upstream of the facility (TT1 and TT2), a coherent optical radiation downstream of the chain is detectable by the intensity sensor (I 0 monitor). For a detailed description of the alignment process, and of each of the devices shown in Figure 1, we refer readers to [26,27]. Here, we only point out that the intensity detected by the I 0 monitor can be adjusted by properly controlling the pitch and yaw movements of the tip-tilts, which depend on the voltage regulation of some piezomotors (two for each tip-tilt mirror).
We denote by v_{i,p} (k) and v_{i,y} (k) the servo motor voltages at the k-th time instant, governing, respectively, the pitch and yaw angles of the i-th tip-tilt mirror (i = 1, 2). The angles are controlled incrementally, i.e., at each k-th time instant, the i-th tip-tilt mirror receives the pitch and yaw displacements as input (respectively denoted by δv_{i,p} (k) and δv_{i,y} (k)):

v_{i,p} (k+1) = v_{i,p} (k) + δv_{i,p} (k),   v_{i,y} (k+1) = v_{i,y} (k) + δv_{i,y} (k). (1)

Let X, U and I be, respectively, the state set, the control set and the output set of the system represented in Figure 1. We model it as a discrete-time dynamical system S whose dynamics follows:

x (k+1) = A x (k) + B u (k),
I (k) = f(x (k)), (2)

where x (k) = [v_{1,p} (k), v_{1,y} (k), v_{2,p} (k), v_{2,y} (k)]ᵀ ∈ X, u (k) = [δv_{1,p} (k), δv_{1,y} (k), δv_{2,p} (k), δv_{2,y} (k)]ᵀ ∈ U, A and B are two identity matrices, and f : X → I is a non-linear function. In other words, S is a Wiener system [38], i.e., a model consisting of a static nonlinear element preceded by a linear dynamic system. For our purpose, we represent f through an NN trained on data collected at the beginning of the whole procedure, as described in Section 3. Clearly, the dynamical system S (Equation (2)) is only a simplified approximation of the FERMI dynamics. Therefore, the employed control strategy must take into account possible model-plant mismatches.
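The Wiener structure of S can be sketched in a few lines of Python. The Gaussian-bump nonlinearity f and the optimum x_star below are made-up stand-ins for the NN-identified map, not the real FERMI response:

```python
import numpy as np

def wiener_step(x, u, f):
    """One step of the Wiener model S of Equation (2): integrator dynamics
    (A = B = identity) followed by the static nonlinearity f : X -> I."""
    x_next = x + u           # x(k+1) = A x(k) + B u(k), with A = B = I_4
    return x_next, f(x_next)

# Hypothetical stand-in for the NN-identified nonlinearity: a Gaussian bump
# centred on a made-up optimal tip-tilt configuration x_star.
x_star = np.array([0.2, -0.1, 0.05, 0.3])
f = lambda x: float(np.exp(-np.sum((x - x_star) ** 2)))

x, I = wiener_step(np.zeros(4), 0.1 * x_star, f)  # one commanded displacement
```

Note that the linear part only accumulates displacements; all the modeling difficulty is concentrated in the static map f, which is why the identification effort focuses on the NN.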

Iterative Linear Quadratic Regulator
Consider a discrete-time dynamical system of the form:

x (k+1) = g(x (k), u (k)), (3)

where x (k) ∈ X and u (k) ∈ U are, respectively, the state and the control input at the k-th time instant, while g : X × U → X is a non-linear function.
The Iterative Linear Quadratic Regulator [30] is an iterative procedure used to apply the Linear Quadratic Regulator (LQR) approach [39] to a non-linear system (Equation (3)). It results in a locally-optimal state-feedback control law able to minimize, over a finite horizon H, a quadratic cost of the form:

J = (x (H) − x*)ᵀ Q_H (x (H) − x*) + Σ_{k=0}^{H−1} [(x (k) − x*)ᵀ Q (x (k) − x*) + u (k)ᵀ R u (k)], (4)

where Q = Qᵀ ⪰ 0, Q_H = Q_Hᵀ ⪰ 0 and R = Rᵀ ≻ 0 are, respectively, the state cost matrix, the final state cost matrix, and the input cost matrix, while x* is the desired terminal state (we label positive semi-definiteness with ⪰ and positive definiteness with ≻). Let x_i = (x_i (0), …, x_i (H)) be an H-step state trajectory, result of a control sequence u_i at the i-th iteration of the procedure. The corresponding cost-to-go value J_i is obtained according to Equation (4). At each i-th iteration of the procedure:

1. the non-linear system is linearized around a nominal trajectory x_{i−1}, the result of a nominal control sequence u_{i−1} applied to the open-loop system;
2. a finite-horizon LQR problem is solved on the linearized system, providing as output a new control sequence u_i = vec(u_i (0), …, u_i (H−1)), where u_i (k) = u_{i−1} (k) − K (k) (x_i (k) − x_{i−1} (k)), K (0), …, K (H−1) are the gain matrices that solve the finite-horizon LQR problem and vec(·) is the vector operator.
The two steps are repeated, starting from the new trajectory x_i, until convergence, i.e., until the difference between two subsequent costs |J_i − J_{i−1}| falls below a tolerance (tol). The resulting output is an optimal control sequence u* = (u* (0), u* (1), …, u* (H−1)). More details of the iterative procedure can be found in [30].
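The backward/forward structure of the iteration can be sketched as follows. This is a minimal numpy rendition of the procedure in [30], using central finite differences for the linearization and omitting the regularization and line search found in robust implementations; it is not the code used in the experiments:

```python
import numpy as np

def ilqr(g, x0, us, x_star, Q, QH, R, iters=25, tol=1e-5, eps=1e-4):
    """Minimal iLQR sketch: g(x, u) is the nonlinear dynamics of Equation (3);
    the cost is the quadratic of Equation (4)."""
    n, m = x0.size, us[0].size

    def rollout(ctrl):
        xs = [x0]
        for u in ctrl:
            xs.append(g(xs[-1], u))
        return xs

    def total_cost(xs, ctrl):
        J = (xs[-1] - x_star) @ QH @ (xs[-1] - x_star)
        for x, u in zip(xs[:-1], ctrl):
            J += (x - x_star) @ Q @ (x - x_star) + u @ R @ u
        return J

    def linearize(x, u):
        # Central finite-difference Jacobians of g around the nominal point.
        A, B = np.empty((n, n)), np.empty((n, m))
        for j in range(n):
            d = np.zeros(n); d[j] = eps
            A[:, j] = (g(x + d, u) - g(x - d, u)) / (2 * eps)
        for j in range(m):
            d = np.zeros(m); d[j] = eps
            B[:, j] = (g(x, u + d) - g(x, u - d)) / (2 * eps)
        return A, B

    xs = rollout(us)
    J_prev = total_cost(xs, us)
    for _ in range(iters):
        # Backward pass: time-varying LQR on the linearized error dynamics.
        Vx, Vxx = 2 * QH @ (xs[-1] - x_star), 2 * QH
        gains = []
        for k in reversed(range(len(us))):
            A, B = linearize(xs[k], us[k])
            Qx = 2 * Q @ (xs[k] - x_star) + A.T @ Vx
            Qu = 2 * R @ us[k] + B.T @ Vx
            Qxx = 2 * Q + A.T @ Vxx @ A
            Quu = 2 * R + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            K = -np.linalg.solve(Quu, Qux)   # feedback gain
            kff = -np.linalg.solve(Quu, Qu)  # feedforward correction
            Vx = Qx + K.T @ Quu @ kff + K.T @ Qu + Qux.T @ kff
            Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
            gains.append((K, kff))
        gains.reverse()
        # Forward pass: roll the new feedback law through the nonlinear model.
        xs_new, us_new = [x0], []
        for k, (K, kff) in enumerate(gains):
            u = us[k] + kff + K @ (xs_new[k] - xs[k])
            us_new.append(u)
            xs_new.append(g(xs_new[k], u))
        J = total_cost(xs_new, us_new)
        xs, us = xs_new, us_new
        if abs(J_prev - J) < tol:  # convergence test on subsequent costs
            break
        J_prev = J
    return us

# Toy check on a linear plant (one iLQR iteration is then exact):
# drive a 2-dimensional state from the origin to x_goal in H = 3 steps.
g_lin = lambda x, u: x + u
x_goal = np.array([1.0, 2.0])
u_opt = ilqr(g_lin, np.zeros(2), [np.zeros(2) for _ in range(3)], x_goal,
             np.zeros((2, 2)), 1000 * np.eye(2), np.eye(2))
```

On a linear plant the linearization is exact and the procedure reduces to a single finite-horizon LQR solve; the iteration matters only when g is genuinely nonlinear.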

Online iLQR and Augmented State
In order to apply the iLQR approach to the problem at hand, two adjustments are needed.
First, the iLQR procedure described above is essentially open-loop, and thus it is potentially affected by modeling errors and disturbances. In order to provide the iLQR with the ability to properly address possible dynamical deviations between Equation (3) and the real plant to be controlled, we employ the iLQR in an MPC fashion [40]. In other words, only the first input of the optimal control sequence u* is actually applied, and the whole optimization is repeated at the subsequent time step. At each k-th time instant, the iLQR is fed with the state of the system x (k) and outputs a control sequence u* = (u* (k), …, u* (k+H−1)), the result of the iterative procedure performed on the internal dynamical model of Equation (3) (where H is the LQR horizon). Only the first control input u* (k) is applied to the system, and the process is repeated from the new state.
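The receding-horizon logic is independent of the particular planner and can be sketched generically; plan, plant_step and done below are placeholder interfaces introduced for illustration, not FERMI APIs:

```python
import numpy as np

def mpc_loop(plan, plant_step, x0, done, n_max=10):
    """MPC-style use of a trajectory optimizer: plan(x) returns a whole
    control sequence u*(k), ..., u*(k+H-1), but only its first element is
    applied to the plant before re-planning from the newly measured state."""
    x = x0
    for _ in range(n_max):
        if done(x):
            break
        u_seq = plan(x)              # full open-loop solve from current state
        x = plant_step(x, u_seq[0])  # apply only the first control input
    return x

# Toy illustration: a planner that always proposes half the remaining error.
target = np.array([1.0])
plan = lambda x: [0.5 * (target - x)] * 3
done = lambda x: abs(float(x[0] - target[0])) < 0.05
x_final = mpc_loop(plan, lambda x, u: x + u, np.array([0.0]), done)
```

Re-planning from the measured state at every step is what gives the scheme its robustness to model-plant mismatch: errors made by the internal model are corrected as soon as they are observed.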
Second, to allow the cost function in Equation (4) to depend on the intensity, we employ an augmented state x̃ (k) = [x (k), I (k)]ᵀ; thus, the model state equation becomes:

x̃ (k+1) = [A x (k) + B u (k); I (k) + h(x (k), u (k))], (5)

where h(x (k), u (k)) represents the change of intensity due to the input u (k) occurring at time k when the tip-tilt mirrors are in x (k). From the practical standpoint, when solving the iLQR, the function h is computed as the first-order approximation:

h(x (k), u (k)) = ∇f(x (k))ᵀ B u (k), (6)

where ∇ denotes the gradient.
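The augmented dynamics can be sketched as below; the equations follow the reconstruction given above, and the quadratic intensity surrogate f and its optimum x_opt are purely illustrative assumptions:

```python
import numpy as np

def augmented_step(x, I, u, grad_f):
    """One step of the augmented model of Equations (5)-(6): the tip-tilt
    state integrates the commanded displacement (A = B = identity), while the
    intensity channel is advanced with the first-order estimate of its change."""
    x_next = x + u
    I_next = I + grad_f(x) @ u  # h(x, u) = grad f(x)^T u
    return x_next, I_next

# Hypothetical quadratic intensity surrogate, for illustration only.
x_opt = np.array([0.2, -0.1, 0.05, 0.3])
f = lambda x: -float(np.sum((x - x_opt) ** 2))
grad_f = lambda x: -2.0 * (x - x_opt)

x1, I1 = augmented_step(np.zeros(4), f(np.zeros(4)), np.full(4, 0.01), grad_f)
```

For small displacements the first-order estimate tracks the true intensity change closely, which is what makes the linearization inside the iLQR acceptable at the scale of a single control step.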

Implementations and Results
We next describe the conducted experiments (Section 3.1), and then show the obtained results (Section 3.2). In particular, we investigate the behavior of the iLQR as the amount of data provided to the NN during training increases. Indeed, the iLQR relies on the model Equation (2), whose (static) nonlinearity is described by an NN. Thus, the first step of the procedure is, essentially, an identification step, in which input-output pairs are collected and employed to train the network, i.e., to approximate the function f of Equation (2). Moreover, we compare the proposed approach to a simple GA approach, which employs the same approximation.

Experimental Procedure
We perform the online iLQR procedure of Section 2.3 on the real FERMI facility. We model the alignment dynamics at FERMI with S (Equation (2)), where f is identified as explained in the following, and use it for performing each iLQR step of the online procedure. The resulting control inputs will be denoted by u*. We set a maximum number of steps N_max to obtain at least the 95% of a target intensity I* on the I 0 monitor. Hereafter, we will use x_R (k) and I_R (k) to denote the actual state and the actual detected intensity at the k-th time instant, while x (k) and I (k) are their respective counterparts in S. The pitch and yaw displacements performed at the k-th time instant on the FERMI are denoted by u_R (k). The experimental procedure consists of 6 subsequent runs of 30 episodes. At the beginning, a set of 30 initial states is randomly selected, to be used as initial conditions of the 30 episodes for all the 6 runs. A target intensity I* is set according to the operating condition of the manually tuned plant. The associated tip-tilts' state is denoted by x*. Each run starts by collecting N random (x_R (i), I_R (i)) samples, i.e., state and intensity pairs: we initialize the FERMI in a random initial state x_R (0), resulting in a detected intensity I_R (0) on the I 0 monitor, and we collect the data-set D = {(x_R (i), I_R (i))}_{i=0,…,N} by performing random pitch and yaw displacements u_R (0), …, u_R (N−1) on the two tip-tilt mirrors. The data collected at the beginning of the current run are added to the data collected so far, in such a way that the data-set becomes richer as new runs are performed. The collected data-set D is then used to train a fully connected multi-layer NN to emulate the state-intensity dependency described by the non-linear function f in Equation (2). We randomly split D into a training and a validation set (respectively composed of the 90% and the 10% of the data in D) and we train the NN to perform the regression task f : X → I.
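The identification phase of each run can be sketched as follows; the plant_f stand-in, the step_scale of the random displacements, and the random seed are illustrative assumptions, not FERMI values:

```python
import numpy as np

rng = np.random.default_rng(0)

def collect_dataset(plant_f, x0, n_samples, step_scale=0.05):
    """Identification sketch: random pitch/yaw displacements are applied and
    (state, intensity) pairs recorded. plant_f stands in for the real
    state-to-intensity response of the facility."""
    xs, Is = [x0], [plant_f(x0)]
    for _ in range(n_samples - 1):
        u = step_scale * rng.standard_normal(x0.size)  # random displacement
        xs.append(xs[-1] + u)
        Is.append(plant_f(xs[-1]))
    return np.array(xs), np.array(Is)

def train_val_split(X, y, val_frac=0.1):
    """Random 90/10 split used before fitting the NN regressor f : X -> I."""
    idx = rng.permutation(len(X))
    n_val = int(val_frac * len(X))
    return X[idx[n_val:]], y[idx[n_val:]], X[idx[:n_val]], y[idx[:n_val]]

X, y = collect_dataset(lambda x: float(np.exp(-np.sum(x ** 2))), np.zeros(4), 250)
X_tr, y_tr, X_val, y_val = train_val_split(X, y)
```

With N = 250 per run, the split yields 225 training pairs and 25 validation pairs; later runs re-fit the network on the cumulative data-set.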
Subsequently, 30 episodes of online iLQR are performed, in order to lead the FERMI intensity toward the 95% of the target I* from each of the 30 previously selected initial states. At each k-th time instant of each episode, given the target augmented state x̃* = [x*, I*]ᵀ, the iLQR algorithm is fed with the current state x_R (k) and intensity I_R (k), and outputs an optimal sequence u*, whose first element is applied to the system. Each episode ends either when I_R (k) ≥ 95% I*, or when the maximum number N_max of control steps is exceeded. Figure 2 shows a schematic representation of the above described procedure.

Figure 2. Schematic representation of a single run of the online iLQR. The termination occurs when I_R (k) ≥ 95% I*, or when the number of control steps exceeds N_max.
In order to allow a comparison, each episode is also repeated (i.e., starting from the same initial condition) performing a simple GA on the function f as approximated by the NN. Specifically, the control input at each k-th time instant is computed according to:

u_R (k) = ζ ∇f(x_R (k)), (7)

where ζ is the step size, and ∇f is computed analytically from the network weights and structure, which are known. Each run ends once both the online iLQR and the GA have been performed for all the 30 initial states. The experimental procedure is performed by allowing at most N_max = 10 steps for each episode, and by collecting N = 250 additional samples at each run. Thus, the NN f used in S is re-trained at each run with a data-set of increasing size (N_1 = 250, N_2 = 500, N_3 = 750, N_4 = 1000, N_5 = 1250, N_6 = 1500), each time starting from null initial weights. The main idea is to find the minimum number of samples for the NN training that brings the online iLQR approach to the 95% of the target intensity from each of the 30 starting conditions. We consider a fully connected multi-layer NN whose hyperparameters are shown in Table 1. The online iLQR procedure consists of LQR sub-problems of the form of Equation (4), with a horizon length H = 3, while Q = Q_H = diag{1, 1, 1, 1, 1000} and R = diag{1, 1, 1, 1} (where by diag{a_1, a_2, …, a_n} we denote a diagonal matrix whose diagonal entries are a_1, a_2, …, a_n). The iLQR is performed imposing tol = 0.00001 and a maximum number of iterations equal to 25. The GA, conversely, takes fixed steps of size ζ = 0.1 along the direction of the gradient. Figure 3 shows the results obtained performing the online iLQR for each run and for each episode. In particular, each subplot reports the final intensity I_R (k) (green) reached at each episode of a single run, starting from a particular initial intensity I_R (0) (blue), together with the 95% of a predefined target intensity I* (red).
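The analytic gradient exploited by the GA baseline (Equation (7)) can be illustrated on a one-hidden-layer tanh network; the real architecture is the one in Table 1, so the layer shapes below are made-up assumptions:

```python
import numpy as np

def nn_forward(x, W1, b1, W2, b2):
    """One-hidden-layer tanh network, a stand-in for the trained f."""
    h = np.tanh(W1 @ x + b1)
    return float(W2 @ h + b2)

def nn_grad(x, W1, b1, W2, b2):
    """Analytic input gradient from the known weights (chain rule through
    tanh): df/dx = W1^T [(1 - h^2) * W2^T]."""
    h = np.tanh(W1 @ x + b1)
    return W1.T @ ((1.0 - h ** 2) * W2.ravel())

def ga_step(x, grad, zeta=0.1):
    """Fixed-size gradient-ascent update of Equation (7): u = zeta * grad f(x)."""
    return x + zeta * grad(x)

# Illustrative weights and one GA step from a random state.
rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
W2, b2 = rng.standard_normal((1, 8)), rng.standard_normal(1)
x0 = rng.standard_normal(4)
x1 = ga_step(x0, lambda x: nn_grad(x, W1, b1, W2, b2))
```

Because the network weights are known after training, no finite-difference probing of the plant is needed: the ascent direction comes entirely from the surrogate model, which is also why the GA inherits the surrogate's inaccuracies.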
The results highlight the effectiveness of the proposed approach starting from the third run, where the NN is trained on a data-set composed of 750 samples, and the online iLQR reaches a final detected intensity greater than or equal to 95% I* for all the episodes. Figures 4 and 5 compare the performance of the online iLQR and of the GA (Section 3.2); better performance is achieved by the online iLQR, which requires a smaller total number of time-steps starting from the third test phase, the first completely successful one for both methods. The former shows the cumulative number of steps required by both methods for the achievement of the task, for each episode and each run. As evidenced by the last points of each subplot, the iLQR needs fewer steps than the GA, except in the first two runs, those in which the iLQR does not perform well (Figure 3). Figure 5, instead, reports the expected value and the variance of the number of steps required in each run by each of the two methods. Also in this case, it is apparent that the iLQR requires, on average, fewer steps (last two columns of Table 2).

Results
A common feature of all the performed experiments is that, as the amount of data provided for training the NN increases, the performance of both methods improves. This result is confirmed by Figure 6, showing the distribution of the error in the intensity estimation for each NN. To this purpose, we evaluated the NNs' capability of estimating the actual detected intensity in all the states visited during each episode of each run. The error distributions shrink their variance as the number of training data increases; moreover, they all present a mean value close to zero except for the first two, i.e., those trained on the two smallest data-sets. Table 2 summarizes the performance of both approaches in the different runs. It highlights that the iLQR reaches a 100% success rate starting from an NN with a mean squared error of 0.0072 in intensity estimation, while the GA needs a more accurate NN in order to produce a comparable performance.

Conclusions
We adapted the iLQR approach to the problem of beam alignment at FERMI. Specifically, we performed an identification step to obtain a computational model of the system (based on an NN), and we applied the iLQR in a closed-loop fashion. The proposed approach is generic, and thus it can potentially handle more complex dynamics and possible constraints on state and input (not considered here) [41]. We performed several experiments, with different amounts of data for training the NN. We have shown experimentally that the proposed approach is a viable one, and requires, on average, fewer adjustment steps to reach an acceptable working point than a GA approach based on the same computational model. We neglected the time-varying nature of the system, and thus it has to be expected that, in the long run, the computational model needs to be updated. However, the employed NN has a simple structure and few weights, and thus a possible strategy could be retraining the network periodically by employing the data collected during the operation.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

FEL	Free-Electron Laser
SASE	Self-Amplified Spontaneous Emission
ML	Machine Learning
RL	Reinforcement Learning
LQR	Linear Quadratic Regulator
iLQR	iterative Linear Quadratic Regulator
MPC	Model-Predictive Control
NN	Neural Network
GA	Gradient Ascent