Adaptive Machine Learning for Robust Diagnostics and Control of Time-Varying Particle Accelerator Components and Beams

Machine learning (ML) is growing in popularity for various particle accelerator applications, including anomaly detection (such as identifying faulty beam position monitors or RF faults), non-invasive diagnostics, and surrogate models. ML methods such as neural networks (NN) are useful because they can learn input-output relationships in large, complex systems from large data sets. Once trained, methods such as NNs give near-instant predictions of complex phenomena, which makes them especially appealing as surrogate models for speeding up large parameter space searches that would otherwise require computationally expensive simulations. However, rapidly time-varying systems are challenging for ML-based approaches because the actual system dynamics quickly drift away from the description provided by any fixed data set, degrading the predictive power of any ML method and limiting its applicability for real-time feedback control of rapidly time-varying accelerator components and beams. In contrast to ML methods, adaptive model-independent feedback algorithms are by design robust to un-modeled changes and disturbances in dynamic systems, but they are usually local in nature and susceptible to getting stuck in local extrema. In this work, we propose that the combination of adaptive feedback and machine learning, adaptive machine learning (AML), is a way to combine the global feature learning power of ML methods such as deep neural networks with the robustness of model-independent control. We present an overview of several ML and adaptive control methods, their strengths and limitations, and an overview of AML approaches. A simple code for the adaptive control algorithm used here can be downloaded from: https://github.com/alexscheinker/ES_adaptive_optimization


Introduction
Machine learning (ML) [1] tools such as neural networks (NN) [2], Gaussian processes (GP) [3], and reinforcement learning (RL), in which NNs are incorporated to represent system models and optimal feedbacks [4], have been growing in popularity for particle accelerator applications. Although these methods have been around for decades, their recent rise can be attributed to growth in computing power, with high performance computers and especially graphics processing units (GPUs) becoming very inexpensive. In addition, powerful and easy-to-use software packages such as TensorFlow are now freely available, allowing anyone to develop sophisticated ML tools tailored to their own accelerator problems.
Recent ML applications for accelerators include ML-enhanced genetic optimization [5]; surrogate models for simulation-based optimization studies and for estimating beam characteristics [6][7][8][9][10]; Bayesian and GP approaches for accelerator tuning [11][12][13][14][15][16]; various applications at the LHC, including optics corrections and detecting faulty beam position monitors [17][18][19]; powerful polynomial chaos expansion-based surrogate models for uncertainty quantification [20]; and RL tools for online accelerator optimization [21][22][23][24][25]. One challenge faced by many ML approaches is that accelerators and their beams change with time, so ML models trained on previously collected data lose accuracy: they are being applied to a different system than the one they were trained on. Providing accurate control and diagnostics for time-varying systems therefore requires a real-time, feedback-based adaptive ML approach.
Recently, novel adaptive feedback algorithms have been developed that can tune large groups of parameters simultaneously based only on noisy scalar measurements, with analytic proofs of convergence and analytically known guarantees on parameter update rates, which makes them especially well suited for particle accelerator problems [26]. Such methods can be easily implemented via custom Python scripts that read from and write to machine components through control systems such as EPICS [27], and they have been implemented in powerful optimization software such as OCELOT for online accelerator tuning [28]. The main benefit of adaptive methods is that they can be applied online, in real time, to drifting accelerator systems. For example, these methods have been applied to automatically and quickly maximize the output power of FEL light at both the LCLS and the European XFEL, compensating for un-modeled time variation in real time while optimizing 105 parameters simultaneously [29]. Another example of the benefit of these approaches is multi-objective optimization, which is typically done offline via extremely lengthy simulation studies. Adaptive methods have been demonstrated for real-time online multi-objective optimization of the electron beam line at AWAKE at CERN, simultaneously minimizing emittance growth and controlling the trajectory [30]. These adaptive methods have also been applied to online RL, in which optimal feedback control policies were learned directly from data [31]. They have also been demonstrated at FACET to provide non-invasive longitudinal phase space (LPS) diagnostics that can predict and actively track time-varying transverse deflecting cavity (TCAV) measurements as both accelerator components and initial beam distributions drift with time [32].
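To make the idea of tuning many parameters from a single noisy scalar measurement concrete, here is a minimal Python sketch of a bounded extremum-seeking update of the form $\dot{p}_j = \sqrt{\alpha\omega_j}\cos(\omega_j t + kC)$. We assume this form as a representative model-independent adaptive feedback; it is not the code from the repository linked above. The drifting optimum `theta`, the quadratic cost, the dither frequencies, and the gains `alpha` and `k` are all hypothetical choices for illustration.

```python
import numpy as np

def bounded_es(cost, p0, omegas, alpha=0.5, k=1.0, dt=5e-4, t_final=15.0):
    """Bounded extremum-seeking sketch (assumed form), forward-Euler integrated:
    dp_j/dt = sqrt(alpha * omega_j) * cos(omega_j * t + k * C(p, t)).
    Only the scalar cost value C is used, never its gradient."""
    p = np.array(p0, dtype=float)
    omegas = np.asarray(omegas, dtype=float)
    amplitude = np.sqrt(alpha * omegas)      # hard bound on each parameter's update rate
    n_steps = int(round(t_final / dt))
    history = np.empty((n_steps, p.size))
    for n in range(n_steps):
        t = n * dt
        c = cost(p, t)                       # one noisy scalar measurement per step
        p = p + dt * amplitude * np.cos(omegas * t + k * c)
        history[n] = p
    return history

# Hypothetical example: three parameters chasing a slowly drifting, unknown optimum.
rng = np.random.default_rng(0)

def theta(t):
    return np.array([1.0, -0.5, 0.5]) + 0.2 * np.sin(0.05 * t)

def noisy_cost(p, t):
    return float(np.sum((p - theta(t)) ** 2) + 0.01 * rng.standard_normal())

hist = bounded_es(noisy_cost, p0=[0.0, 0.0, 0.0],
                  omegas=200.0 * np.array([1.0, 1.37, 1.71]))  # distinct dither frequencies
print(np.linalg.norm(hist[-1] - theta(15.0)))  # small: p tracks the moving optimum
```

Each parameter receives its own dither frequency, and the update rate of parameter $j$ never exceeds $\sqrt{\alpha\omega_j}$ regardless of the measured cost, which illustrates how such methods can offer known bounds on parameter update rates.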

Adaptive Machine Learning
The first adaptive ML approach combining ML and adaptive feedback for time-varying particle accelerator systems was recently developed for real-time automatic control of the LPS of the LCLS electron beam [33]. This approach was shown to combine the strengths of each family of tools: the power of ML tools such as NNs to learn complex relationships directly from measured data, and the robust ability of adaptive feedback to handle time-varying, noisy, and unknown system dynamics. Adaptive ML also has the potential to overcome one of the main limitations in matching the predictions of models (physics-based or surrogate) to actual accelerator beams, by adaptively identifying the initial beam distributions entering the accelerator [34], knowledge of which is needed for accurate model-based predictions.
In this work, our primary interest is in developing adaptive controls and diagnostics for time-varying conditions, in which both the accelerator parameters and the initial beam entering the accelerator may change with time in unpredictable ways. Such uncertain time variation is one of the main challenges of accelerators, and it is what makes it so difficult to run online models as real-time non-invasive diagnostics: even a perfect model will not be predictive if it is not initialized with the correct parameter values or the correct initial beam distribution whose dynamic evolution it then simulates.

Unknown Time-varying Systems
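Control in the presence of uncertainty and time variation is extremely challenging even for linear systems. Linear systems are popular because they are simple local approximations of much more complicated nonlinear dynamics and can be solved analytically. Consider a general $n$-dimensional nonlinear system

$$\dot{x} = f(x, u), \qquad x \in \mathbb{R}^n, \quad u \in \mathbb{R}^m. \tag{1}$$

Within a small enough neighborhood of any point $(x_0, u_0) \in \mathbb{R}^n \times \mathbb{R}^m$, we can approximate the system (1) by a linear system based on the Jacobian matrices of $f(x, u)$. Via Taylor series expansion, for $\|(x, u) - (x_0, u_0)\| \ll 1$, we can approximate each $f_i(x, u)$ as

$$f_i(x, u) \approx f_i(x_0, u_0) + \sum_{j=1}^{n} \frac{\partial f_i}{\partial x_j}\,(x_j - x_{0,j}) + \sum_{k=1}^{m} \frac{\partial f_i}{\partial u_k}\,(u_k - u_{0,k}). \tag{2}$$

Thereby we approximate the nonlinear system (1) as

$$\dot{x} \approx f(x_0, u_0) + \frac{\partial f}{\partial x}\,(x - x_0) + \frac{\partial f}{\partial u}\,(u - u_0),$$

with all of the above derivatives calculated at the point $(x_0, u_0)$. For any point $(x_0, u_0)$, we can always change coordinates to $\tilde{x} = x - x_0$, $\tilde{u} = u - u_0$. Therefore, without loss of generality, we can consider the case $(x_0, u_0) = (0, 0)$, and from now on we ignore the term $f(0, 0)$, recognizing that it is just a constant disturbance. Next, if we define the constant matrices

$$A = \frac{\partial f}{\partial x}\Big|_{(0,0)} \in \mathbb{R}^{n \times n}, \qquad B = \frac{\partial f}{\partial u}\Big|_{(0,0)} \in \mathbb{R}^{n \times m},$$

we obtain the linear system

$$\dot{x} = Ax + Bu. \tag{3}$$

If a linear feedback control of the form $u = -Kx$ is used, the resulting closed-loop system is still linear,

$$\dot{x} = (A - BK)\,x = A_c\,x, \tag{4}$$

and can be solved analytically with the help of Laplace transforms to get

$$x(t) = \mathcal{L}^{-1}\left\{(sI - A_c)^{-1}\right\} x(0) = e^{A_c t}\,x(0). \tag{5}$$

Finally, the transient dynamics and stability of the system (4) are completely determined by the eigenvalues of the closed-loop system matrix $A_c$, because for any matrix $A_c$ there exists an invertible matrix $P$ such that

$$A_c = P J P^{-1}, \qquad J = \mathrm{diag}(J_1, \ldots, J_q), \tag{6}$$

where $J_i$ is a Jordan block associated with the eigenvalue $\lambda_i$ of $A_c$. Jordan blocks of order one are $J_i = \lambda_i$, and Jordan blocks of order $m_i$ are

$$J_i = \begin{pmatrix} \lambda_i & 1 & & \\ & \lambda_i & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_i \end{pmatrix} \in \mathbb{C}^{m_i \times m_i}. \tag{7}$$

Using the matrix expansion (6), we can rewrite the matrix exponential as

$$e^{A_c t} = P e^{J t} P^{-1} = P\,\mathrm{diag}\left(e^{J_1 t}, \ldots, e^{J_q t}\right) P^{-1}, \qquad e^{J_i t} = e^{\lambda_i t}\begin{pmatrix} 1 & t & \cdots & \frac{t^{m_i-1}}{(m_i-1)!} \\ & 1 & \ddots & \vdots \\ & & \ddots & t \\ & & & 1 \end{pmatrix},$$

to get the solution

$$x(t) = P e^{J t} P^{-1} x(0), \tag{8}$$

which, after some algebra, can be rewritten as

$$x(t) = \sum_{i=1}^{q} e^{\lambda_i t} \sum_{j=0}^{m_i - 1} t^j\, v_{i,j},$$

where the constant vectors $v_{i,j}$ depend on $P$ and $x(0)$. The eigenvalues of $A_c = A - BK$ have found their way into the exponential terms $e^{\lambda_i t}$. Expanding the eigenvalues into real and imaginary parts, $\lambda_i = \lambda_{r,i} + i\lambda_{I,i}$, we can write

$$e^{\lambda_i t} = e^{\lambda_{r,i} t}\left(\cos(\lambda_{I,i} t) + i\sin(\lambda_{I,i} t)\right).$$

Clearly, the imaginary parts of the eigenvalues define the resonant frequencies of the system, while the real parts control whether the trajectories of the dynamics converge or diverge exponentially.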
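As a minimal numerical check of the closed-loop solution and Jordan-block structure described above, the following Python sketch builds a hypothetical unstable two-state system, closes the loop with a hand-picked gain $K$, and compares the closed-form solution implied by a single $2\times 2$ Jordan block with a fourth-order Runge-Kutta integration. The matrices `A`, `B`, and `K` are illustrative assumptions, not taken from any accelerator.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [2.0, -1.0]])      # open loop: eigenvalues 1 and -2 (unstable)
B = np.array([[0.0],
              [1.0]])
K = np.array([[6.0, 3.0]])       # hypothetical gain chosen by hand
A_c = A - B @ K                  # [[0, 1], [-4, -4]]: repeated eigenvalue -2, one 2x2 Jordan block

def x_closed_form(t, x0):
    """x(t) = e^{A_c t} x(0) = e^{-2t} (I + t N) x(0), where N = A_c + 2I satisfies N @ N = 0."""
    N = A_c + 2.0 * np.eye(2)
    return np.exp(-2.0 * t) * (np.eye(2) + t * N) @ x0

def x_rk4(t_final, x0, dt=1e-3):
    """Reference solution of dx/dt = A_c x by classical Runge-Kutta integration."""
    x = np.array(x0, dtype=float)
    for _ in range(int(round(t_final / dt))):
        k1 = A_c @ x
        k2 = A_c @ (x + 0.5 * dt * k1)
        k3 = A_c @ (x + 0.5 * dt * k2)
        k4 = A_c @ (x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

print(np.linalg.eigvals(A))      # one eigenvalue with positive real part
print(np.linalg.eigvals(A_c))    # both approximately -2, so A_c is Hurwitz
x0 = np.array([1.0, 0.0])
print(x_closed_form(3.0, x0), x_rk4(3.0, x0))
```

The repeated eigenvalue was chosen deliberately: the $t\,e^{-2t}$ term produced by the order-two Jordan block is exactly the $t^j e^{\lambda_i t}$ structure of the general solution.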
If we choose our feedback controller $u = -Kx$ such that the closed-loop matrix $A - BK$ is Hurwitz, meaning that all of its eigenvalues have negative real parts, then all $\lambda_{r,i} < 0$, the system's trajectories converge exponentially to the origin, and the closed-loop linear system (4) is globally exponentially stable. This simple analysis of linear systems has led to the popularity of simple proportional-integral-derivative (PID) control, which is by far the most common form of feedback control. If, however, the original nonlinear system that we are controlling is time-varying,

$$\dot{x} = f(x, u, t), \tag{9}$$

then by the same arguments as above we can approximate the system (9) at any instant of time, within a small neighborhood of some point, by the linear time-varying system

$$\dot{x} = A(t)\,x + B(t)\,u, \tag{10}$$

but that is where the similarity ends. For time-varying systems an eigenvalue analysis is generally useless (except in some very special cases, such as periodic or arbitrarily slowly varying systems), and determining whether system (10) is stable, and therefore designing stabilizing controllers, is very difficult and requires a nonlinear Lyapunov analysis [35]. A few simple examples demonstrate why eigenvalues are uninformative for time-varying systems. Consider the system

$$\begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} = \begin{pmatrix} -1 + 1.5\cos^2(t) & 1 - 1.5\sin(t)\cos(t) \\ -1 - 1.5\sin(t)\cos(t) & -1 + 1.5\sin^2(t) \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$

which has constant eigenvalues with negative real parts, $\lambda_\pm = -0.25 \pm 0.25\sqrt{7}\,i$, but which is exponentially unstable, with solution

$$\begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \begin{pmatrix} e^{0.5t}\cos(t) & e^{-t}\sin(t) \\ -e^{0.5t}\sin(t) & e^{-t}\cos(t) \end{pmatrix} \begin{pmatrix} x_1(0) \\ x_2(0) \end{pmatrix}.$$

Another example is a linear time-varying system that has constant negative real eigenvalues $\lambda_i = -1$ and is also unstable. Conversely, the system

$$\begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} = \begin{pmatrix} -\frac{11}{2} + \frac{15}{2}\sin(12t) & \frac{15}{2}\cos(12t) \\ \frac{15}{2}\cos(12t) & -\frac{11}{2} - \frac{15}{2}\sin(12t) \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

has constant eigenvalues, one of which is positive real, $\lambda_i = \{2, -13\}$, but is stable. As these examples show, even when we know everything about a system, if it is time-varying its stability properties are not obvious. On top of this, a whole new level of difficulty is added when the system is also unknown.
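The two fully specified example systems above can be checked numerically. The following Python sketch evaluates their frozen-time eigenvalues at a few sample times and integrates each system from $x(0) = (1, 0)$ with a fixed-step fourth-order Runge-Kutta scheme; the integrator, step size, and horizon are our own illustrative choices.

```python
import numpy as np

def A_my(t):
    """First example: constant eigenvalues -0.25 +/- 0.25*sqrt(7) i, yet unstable."""
    s, c = np.sin(t), np.cos(t)
    return np.array([[-1 + 1.5 * c**2, 1 - 1.5 * s * c],
                     [-1 - 1.5 * s * c, -1 + 1.5 * s**2]])

def A_pos(t):
    """Second example: constant eigenvalues {2, -13}, yet stable."""
    s, c = np.sin(12 * t), np.cos(12 * t)
    return np.array([[-5.5 + 7.5 * s, 7.5 * c],
                     [7.5 * c, -5.5 - 7.5 * s]])

def simulate(A, x0, t_final, dt=1e-3):
    """RK4 integration of the linear time-varying system dx/dt = A(t) x."""
    x = np.array(x0, dtype=float)
    for n in range(int(round(t_final / dt))):
        t = n * dt
        k1 = A(t) @ x
        k2 = A(t + dt / 2) @ (x + dt / 2 * k1)
        k3 = A(t + dt / 2) @ (x + dt / 2 * k2)
        k4 = A(t + dt) @ (x + dt * k3)
        x = x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

for t in (0.0, 0.7, 2.0):   # frozen-time eigenvalues do not change with t
    print(t, np.linalg.eigvals(A_my(t)), np.linalg.eigvals(A_pos(t)))

x_my = simulate(A_my, [1.0, 0.0], 10.0)
x_pos = simulate(A_pos, [1.0, 0.0], 10.0)
print(np.linalg.norm(x_my))   # grows like e^{0.5 t}
print(np.linalg.norm(x_pos))  # decays toward zero
```

For the first system the computed norm matches the analytic solution above, $\|x(t)\| = e^{0.5t}$ for this initial condition, even though every frozen-time eigenvalue has a negative real part.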
Consider an extremely simple one-dimensional example of a one-parameter system with dynamics

$$\dot{x} = b(t)\,u(x, t), \tag{11}$$

where $u(x, t)$ is a feedback controller and $b(t)$ is an unknown time-varying function such as $b(t) = b\cos(\omega t)$. The stabilization of such a system with unknown control direction, $\mathrm{sign}(b(t))$, was an open problem in control theory for a long time. In 1985 a solution was proposed for the simple time-invariant case $b(t) \equiv b$ [36,37], but this solution suffered from an unbounded, growing overshoot for time-varying $b(t)$ with changing sign, which would eventually destabilize and destroy any physical system. In 2012, problem (11) was finally solved with a novel model-independent approach [38,39], whose feedback forces the dynamics (11) to have an average behavior described by a new system (12), whose trajectories remain within an arbitrarily small $\epsilon > 0$ of those of (11), and in which the unknown control direction $b(t)$ appears squared, so that the system can be stabilized automatically regardless of the sign of $b(t)$. Because of the arbitrarily close proximity of the trajectories of (12) to those of (11), stabilizing (12) is equivalent to stabilizing (11). The results in [38,39] actually solved the much more general problem of stabilizing an $n$-dimensional nonlinear time-varying unknown system of the form

$$\dot{x} = f(x, t) + g(x, t)\,u,$$

where $f$ and $g$ are both nonlinear, time-varying, and analytically unknown. The method has since been generalized further, with analytical proofs of stability, to non-differentiable systems as well as systems that are not affine in the control [40,41], of the form

$$\dot{x} = f(x, u, t),$$

and has been utilized in various particle accelerator applications [26,29-33].
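The following Python sketch illustrates problem (11) numerically. It assumes a bounded extremum-seeking style feedback $u = \sqrt{\alpha\omega}\cos(\omega t + kx^2)$, whose averaged closed loop is $\dot{\bar{x}} = -k\alpha\, b^2(t)\,\bar{x}$ (a Lie-bracket averaging result we state as an assumption, with $b(t)$ appearing squared as described above). The sign-changing coefficient `b(t)`, the gains, and the comparison with a naive fixed-sign proportional controller are hypothetical illustrations, not the exact controller of [38,39].

```python
import math

def b(t):
    """Hypothetical unknown, sign-changing control coefficient (time average -0.3)."""
    return math.cos(0.5 * t) - 0.3

def es_feedback(x, t, alpha=1.0, k=1.0, omega=200.0):
    """Bounded ES-style feedback (assumed form): u = sqrt(alpha*omega) * cos(omega*t + k*x**2)."""
    return math.sqrt(alpha * omega) * math.cos(omega * t + k * x ** 2)

def fixed_sign_feedback(x, t, k_p=1.0):
    """Naive proportional feedback that wrongly assumes b(t) > 0."""
    return -k_p * x

def simulate(controller, x0=1.0, t_final=30.0, dt=5e-4):
    """Forward-Euler simulation of dx/dt = b(t) * u(x, t), i.e. system (11)."""
    x, xs = x0, []
    for n in range(int(round(t_final / dt))):
        t = n * dt
        x += dt * b(t) * controller(x, t)
        xs.append(x)
    return xs

xs_es = simulate(es_feedback)
xs_naive = simulate(fixed_sign_feedback)
print(max(abs(x) for x in xs_es[-10000:]))  # last 5 s: small oscillation near the origin
print(abs(xs_naive[-1]))                    # grows by orders of magnitude
```

The ES-style loop never uses the sign of $b(t)$; the residual oscillation of amplitude roughly $|b|\sqrt{\alpha/\omega}$ shrinks as the dither frequency $\omega$ is increased.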

Machine Learning for Time-Varying Systems
One application of machine learning is to learn unknown system relationships directly from data. This has been especially popular in the accelerator community for surrogate models and diagnostics based on them. Consider a family of $N_p$ parameters in an accelerator, $p = (p_1, \ldots, p_{N_p})$, which might include the currents of magnet power supplies and RF cavity amplitude and phase settings throughout the accelerator. In principle, each set of parameter settings $p_i$ maps, via some complicated nonlinear function $F$, to an observable $O_i$ according to

$$O_i = F(p_i), \tag{13}$$

which may be, for example, the 2D $(z, E)$ LPS distribution of the accelerator beam at a particular location. An ML tool can learn a close approximation $\hat{F}$ of the unknown function $F$ directly from a data set $D$ that contains enough pairs of parameter settings and their corresponding beam distributions,

$$D = \{(p_i, O_i)\}_{i=1}^{N_D}, \qquad e = \sum_{i=1}^{N_D} \left\| O_i - \hat{F}(p_i) \right\|^2, \tag{14}$$

by minimizing some measure of error, such as $e$ above, between ML predictions and observed data over the entire data set. Such surrogate models have recently become popular for particle accelerator applications. For example, this approach has been used to map accelerator parameters to the LPS of the LCLS beam, with the approximation $\hat{F}$ taking the form of a deep neural network [8].
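The fitting step can be illustrated with a small Python sketch. To keep it short and deterministic, we assume a random-feature network (a fixed random ReLU hidden layer with a trained linear output layer fitted by ridge-regularized least squares) as a stand-in for the deep neural network, and a hypothetical three-parameter map `true_map` producing a 20-point profile in place of an accelerator.

```python
import numpy as np

Z = np.linspace(-1.0, 1.0, 20)                # hypothetical 20-point observable grid

def true_map(P):
    """Hypothetical stand-in for the unknown map F: 3 parameters -> 20-point profile."""
    center = 0.5 * P[:, [0]]                   # parameter 0 shifts the profile
    width = 0.3 + 0.1 * P[:, [1]] ** 2         # parameter 1 changes its width
    amplitude = 1.0 + 0.2 * P[:, [2]]          # parameter 2 scales it
    return amplitude * np.exp(-((Z - center) ** 2) / (2.0 * width ** 2))

def fit_surrogate(P, O, n_features=500, ridge=1e-2, seed=1):
    """Fit F_hat(p) = relu(p W + c) @ beta by minimizing sum_i ||O_i - F_hat(p_i)||^2 + ridge*||beta||^2."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(P.shape[1], n_features))    # fixed random hidden layer
    c = rng.uniform(-1.0, 1.0, size=n_features)

    def features(X):
        return np.maximum(X @ W + c, 0.0)            # ReLU activations

    H = features(P)
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_features), H.T @ O)  # trained output layer

    def F_hat(X):
        return features(X) @ beta

    return F_hat

rng = np.random.default_rng(0)
P_train = rng.uniform(-1.0, 1.0, size=(3000, 3))  # data set D: parameter settings ...
P_val = rng.uniform(-1.0, 1.0, size=(500, 3))
F_hat = fit_surrogate(P_train, true_map(P_train))  # ... and their observables
val_rmse = np.sqrt(np.mean((F_hat(P_val) - true_map(P_val)) ** 2))
print(val_rmse, np.std(true_map(P_val)))
```

A fit like this is only valid for the system that generated `P_train`; if the hidden map itself drifts, the validation error grows, which is the limitation discussed next.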
Both accelerator components and the initial beam distributions formed on photocathodes are known to vary unpredictably with time, and measurements are typically noisy and have arbitrary offsets; examples include phase shifts of RF cables and analog components due to temperature, and changing relationships between magnetic fields and power supplies due to hysteresis. Our goal is to tackle time-varying systems in which (13) is replaced with

$$O_i(t) = F\big(p_i(t), \rho_0(t), t\big), \tag{15}$$

where we emphasize that the parameters $p_i(t)$ drift with time, that the initial beam distribution $\rho_0(t)$ entering any accelerator section drifts with time, and that the overall relationship $F$ between beam observables and parameters drifts with time ($t$). Furthermore, we recognize that any observations of the accelerator and its beam are actually of the form $(\hat{p}_i(t), O_i)$ and $\hat{\rho}_0(t)$, where $\hat{p}_i(t)$ is an estimate of the parameters $p_i(t)$ and $\hat{\rho}_0(t)$ is an estimate of the uncertain initial beam distribution $\rho_0(t)$. Therefore, a surrogate model such as a deep neural network will only provide an approximation that is valid for a short time interval, and its performance will degrade as the accelerator's characteristics drift away from any collected data set. Our approach is to implement adaptive ML, which combines online simulations, adaptive feedback, and ML for systems of the form (15), and which can be summarized as

$$\hat{O}_i(t) = \hat{F}\big(\hat{p}_i(t), \hat{\rho}_0(t), w(t)\big), \tag{16}$$

where $\hat{F}$ represents an adaptive, hybrid model-ML, data-based learned representation of the observable $O$, in which a trained surrogate model such as a convolutional neural network initially predicts the observable $\hat{O}$ and the prediction is then refined by an online model.
The adaptive ML model $\hat{F}$ receives time-varying parameter estimates $\hat{p}(t)$ and an estimate of the beam's phase space $\hat{\rho}_0(t)$ from a second model, which is adaptively tuned based on the error $e_\psi(t)$ quantifying the mismatch between its prediction $\hat{\psi}(t)$ and the detected value $\psi(t)$ of a rich, non-invasively measured beam characteristic such as the beam's energy spread spectrum. The parameter estimates $\hat{p}(t)$, the ML weights $w(t)$, and the initial beam distribution estimate $\hat{\rho}_0(t)$ are all adaptively tuned using a model-independent adaptive feedback control method.

Controls and Diagnostics for a 22-Dimensional System
To demonstrate some of the ideas described above, we perform a simulation-based study of a high-dimensional system in which we control 22 parameters: the quadrupole magnets in the low energy beam transport (LEBT) of the Los Alamos Neutron Science Center (LANSCE) linear accelerator, as shown in Figure 1. With 22 dimensions, purely data-driven ML studies are very challenging, because even a very coarse grid search of 10 steps per parameter results in a staggering $10^{22}$ data points.
In this work we simulate beam dynamics according to the Kapchinsky-Vladimirsky (K-V) envelope equations [42], (18)-(20), which describe the evolution of the rms beam sizes $(X(z), Y(z))$. In these equations, $K$ is the beam perveance, a measure of how space-charge dominated the beam is; $\epsilon_x$ and $\epsilon_y$ are the beam emittances in the $x$ and $y$ planes, respectively; $I_n(z)$ are indicator functions that are non-zero at the locations of the quadrupole magnets; and $\Theta_n$ are the quadrupole focusing strengths, defined in terms of $G_n$, the magnetic field gradient of quadrupole $n$ in Gauss/cm; $r$, the magnet pole-tip radius; $N$, the number of turns per pole; $\nu$, the efficiency factor; and $I$, the magnet current.
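The simulation step can be sketched in Python. This minimal example integrates the rms K-V envelope equations in the standard textbook form $X'' + \kappa(z)X - \frac{K}{2(X+Y)} - \frac{\epsilon_x^2}{X^3} = 0$ and $Y'' - \kappa(z)Y - \frac{K}{2(X+Y)} - \frac{\epsilon_y^2}{Y^3} = 0$, with $\kappa(z) = \sum_n I_n(z)\Theta_n$. This form, the hard-edge quadrupole list format `(z_start, z_end, theta)`, SI units, and specifying $\Theta_n$ directly in $1/\mathrm{m}^2$ (rather than through the magnet current) are our assumptions, so constant factors may differ from the exact equations used in this study.

```python
import numpy as np

def kv_envelope(x0, y0, quads, K, eps_x, eps_y, L, n_steps=2000, xp0=0.0, yp0=0.0):
    """RK4 integration of the rms K-V envelope equations (assumed standard form).
    quads: hypothetical list of hard-edge quadrupoles (z_start, z_end, theta) with theta in 1/m^2;
    positive theta focuses in x and defocuses in y. Returns z, X(z), Y(z) in meters."""
    def kappa(z):
        # sum_n I_n(z) * Theta_n with I_n the indicator function of quadrupole n
        return sum(theta for (za, zb, theta) in quads if za <= z < zb)

    def rhs(z, s):
        X, Xp, Y, Yp = s
        space_charge = K / (2.0 * (X + Y))
        k = kappa(z)
        return np.array([Xp, -k * X + space_charge + eps_x**2 / X**3,
                         Yp, k * Y + space_charge + eps_y**2 / Y**3])

    dz = L / n_steps
    z = np.linspace(0.0, L, n_steps + 1)
    s = np.array([x0, xp0, y0, yp0], dtype=float)
    sizes = np.empty((n_steps + 1, 2))
    sizes[0] = s[[0, 2]]
    for i in range(n_steps):
        zi = z[i]
        k1 = rhs(zi, s)
        k2 = rhs(zi + dz / 2, s + dz / 2 * k1)
        k3 = rhs(zi + dz / 2, s + dz / 2 * k2)
        k4 = rhs(zi + dz, s + dz * k3)
        s = s + dz / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        sizes[i + 1] = s[[0, 2]]
    return z, sizes[:, 0], sizes[:, 1]

# Hypothetical usage: 2 mm x 1.75 mm beam through a weak focusing/defocusing quadrupole pair.
z, X, Y = kv_envelope(2e-3, 1.75e-3, quads=[(0.5, 0.6, 5.0), (1.0, 1.1, -5.0)],
                      K=1e-4, eps_x=1e-6, eps_y=1e-6, L=3.0)
```

For a drift with no space charge ($K = 0$) starting at a waist, the emittance term alone gives the exact solution $X(z)^2 = X_0^2 + \epsilon_x^2 z^2 / X_0^2$, which is a convenient check of the integrator.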
In designing an adaptive diagnostic, one must decide which parameter measurements are accurate and which might drift with time. In this case, we assume that we can accurately measure the magnet settings, and we create an adaptive ML model for this system that serves as a non-invasive diagnostic by mapping magnet settings to beam profiles along the accelerator section. This NN-based model will be referred to as $M_{\Theta \to X,Y}$; its output is two vectors of length 200 each, which represent estimates of $(X(z), Y(z))$ over the range $z \in [0, L = 11.7\ \mathrm{m}]$ with a longitudinal resolution of about 5.9 cm (11.7 m spanned by 200 points).
A typical surrogate model approach would simply run many iterations of the simulation (18)-(20) to learn a mapping from quadrupole settings to beam size along the LEBT. However, in our case, we know that the beam's initial size $(X(z = 0), Y(z = 0))$ coming out of the source will slowly and unpredictably drift with time according to some unknown time-varying functions $X(z = 0, t)$ and $Y(z = 0, t)$. Such a drift is impossible to measure in real time during accelerator operations because it requires lengthy wire scanner-based measurements that interrupt operations. Taking this into account, although we do not expect to be able to measure $(X(z = 0, t), Y(z = 0, t))$, we still design our adaptive diagnostic with estimates $(\hat{X}(z = 0, t), \hat{Y}(z = 0, t))$ as inputs. These estimated inputs give our ML model the flexibility to be adaptive and to track changes in real time. In order to update our estimates, we need one final ingredient: some measurement of the beam that can be compared to the ML-based prediction. In this case, the beam size at the end of the LEBT, $(X(z = L, t), Y(z = L, t))$, can be estimated in real time from the amount of current lost on a pair of vertical and horizontal slits. The plan is then to compare the model's predictions of the final horizontal and vertical beam sizes to their measurements, calculate an error, and use that error to adaptively tune the input beam size estimates with the methods in [40,41], so that they track the time-varying input beam distribution. A high-level overview of this adaptive ML diagnostic approach is shown in Figure 2, where the NN $M_{\Theta \to X,Y}$ has densely connected layers with ReLU activation functions. To train the model, we simulated the dynamics (18)-(20) 600 thousand times and used 590 thousand of the generated trajectories and their corresponding quadrupole magnet values for training, reserving 10 thousand for model validation.
Each simulation generated random initial conditions $(X(0), Y(0))$ by uniformly sampling distributions with mean values of 2 mm and 1.75 mm, respectively, over a range of ±0.5 mm, which is realistic based on experimental data. The quadrupole magnet values were also sampled from uniform distributions centered on a known good setup over a range of ±1.
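The tracking loop of this adaptive diagnostic can be sketched in Python. Here a fixed $2\times 2$ matrix `M` is a hypothetical stand-in for the trained network $M_{\Theta\to X,Y}$ at one quadrupole setting (mapping initial sizes to final sizes, in mm); the drift of the true initial sizes and the slit-measurement noise are invented for illustration; and the estimates are updated with a bounded extremum-seeking law $\dot{\hat{s}}_j = \sqrt{\alpha\omega_j}\cos(\omega_j t + kC)$, which we assume as a representative instance of the methods in [40,41]. Because the stand-in model is exact for known inputs, the only mismatch comes from the unknown initial sizes.

```python
import numpy as np

# Hypothetical stand-in for M_{Theta->X,Y} at a fixed quadrupole setting:
# maps initial sizes (X0, Y0) [mm] to predicted final sizes (X(L), Y(L)) [mm].
M = np.array([[1.5, 0.3],
              [0.2, 1.2]])

def surrogate_final_sizes(initial_sizes):
    return M @ initial_sizes

def true_initial_sizes(t):
    """Hypothetical slow, unknown drift of the source output (mm)."""
    return np.array([2.0 + 0.3 * np.sin(0.01 * t), 1.75 + 0.3 * np.cos(0.013 * t)])

def track_initial_sizes(t_final=20.0, dt=5e-4, alpha=0.5, k=1.0, noise=0.005, seed=0):
    rng = np.random.default_rng(seed)
    omegas = 200.0 * np.array([1.0, 1.37])    # distinct dither frequencies
    est = np.array([2.0, 1.75])               # start from the mean initial sizes
    for n in range(int(round(t_final / dt))):
        t = n * dt
        # noisy measurement of the final sizes (e.g. from slit current loss)
        measured = M @ true_initial_sizes(t) + noise * rng.standard_normal(2)
        predicted = surrogate_final_sizes(est)
        cost = float(np.sum((predicted - measured) ** 2))  # scalar mismatch only
        est = est + dt * np.sqrt(alpha * omegas) * np.cos(omegas * t + k * cost)
    return est, true_initial_sizes(t_final)

est, truth = track_initial_sizes()
print(est, truth)  # estimates follow the drifting initial beam sizes
```

Only the scalar mismatch between predicted and measured final sizes is fed back, so the same loop would apply unchanged if `surrogate_final_sizes` were replaced by a trained network.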
Using the root-mean-square error (RMSE) as the metric of accuracy for both the X and Y trajectories, Figure 3 shows a histogram of the sum of these errors for the 10k validation set, as well as the worst and best predictions from this set. In Figure 3 we also show the sum of these errors when the NN is given the average initial conditions instead of the correct ones.
Figure 3. Comparison of NN predictions when using the correct initial conditions and when using the mean value (panels: using correct initial conditions (X(0), Y(0)); using average initial conditions (X(0), Y(0))).
Figure 4 shows the beam envelopes for the best, average, and worst predictions of the NN when using the correct initial conditions as input. Finally, in Figure 5 we show the results of using this adaptive ML-based diagnostic to track time-varying initial beam sizes.
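The accuracy metric can be computed as in the following Python sketch. It assumes that predictions and simulated trajectories are stored as arrays of shape (number of samples, 200) and that the per-sample score is the RMSE of $X(z)$ plus the RMSE of $Y(z)$; the function name `summed_rmse` is hypothetical.

```python
import numpy as np

def summed_rmse(X_pred, X_true, Y_pred, Y_true):
    """Per-sample RMSE of X(z) plus RMSE of Y(z); all arrays have shape (n_samples, n_z)."""
    rmse_x = np.sqrt(np.mean((X_pred - X_true) ** 2, axis=1))
    rmse_y = np.sqrt(np.mean((Y_pred - Y_true) ** 2, axis=1))
    return rmse_x + rmse_y

# Toy usage with three samples whose X predictions are offset by 0, 1, and 2 (mm)
X_true = np.zeros((3, 200))
X_pred = np.array([0.0, 1.0, 2.0])[:, None] + X_true
Y_true = np.zeros((3, 200))
Y_pred = Y_true + 0.5
errors = summed_rmse(X_pred, X_true, Y_pred, Y_true)
best, worst = np.argmin(errors), np.argmax(errors)  # samples to plot as best/worst cases
counts, edges = np.histogram(errors, bins=10)       # data for an error histogram
```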
By a similar principle, it is possible to design a second model that can guide feedback control for accelerator tuning by solving the inverse problem of mapping beam sizes to the quadrupole values required to achieve them, referred to as $M_{X,Y \to \Theta}$. A high-level view of such an adaptive control approach is shown in Figure 6. Such a model would require a diagnostic for comparing achieved and desired beam profiles; this type of approach was first demonstrated at the LCLS for automatic control of the LPS, whose measurement was available in real time via a transverse deflecting cavity (TCAV) [33].

Conclusions
We have shown how combining adaptive model-independent methods with machine learning tools, an approach we call adaptive ML, unites the robustness of adaptive methods to time variation and uncertainty with the global and extremely fast predictions of complex nonlinear dynamics provided by ML. The key idea is to introduce extra inputs that can be tuned, based on some measurement, to modify the ML model's predictions. These inputs do not have to be physically measurable quantities, but they should have a physically meaningful effect on the predictions.