Fast, efficient and flexible particle accelerator optimisation using densely connected and invertible neural networks

Particle accelerators are enabling tools for scientific exploration and discovery in various disciplines. Finding optimized operation points for these complex machines is a challenging task, however, due to the large number of parameters involved and the underlying non-linear dynamics. Here, we introduce two families of data-driven surrogate models, based on deep and invertible neural networks, that can replace the expensive physics computer models. These models are employed in multi-objective optimisations to find Pareto optimal operation points for two fundamentally different types of particle accelerators. Our approach reduces the time-to-solution for a multi-objective accelerator optimisation up to a factor of 640 and the computational cost up to 98%. The framework established here should pave the way for future on-line and real-time multi-objective optimisation of particle accelerators.


In this work we introduce two new surrogate models, a forward model and an invertible model. Both are designed to model the machine at any position, overcoming a main limitation of existing approaches. Moreover, the proposed methods for building these models are general enough to be applicable to any kind of particle accelerator. We demonstrate the generality of the ansatz by considering a linear accelerator and a quasi-circular machine, i.e. a cyclotron. Furthermore, the fast surrogate models enable optimisations on time scales suitable for on-line and real-time multi-objective optimisation of particle accelerators. The forward model resembles existing models described e.g. in [8]. While previous work focused on approximating OPAL at only one position (usually at the end) of the accelerator (as in [8]), our new model provides an approximation to OPAL at a multitude of positions along the accelerator, allowing us to estimate beam properties along the entire machine, as depicted in Fig. 1d. To achieve this, our model takes both the design variables and the position along the accelerator as input and predicts the beam properties at that position. Consequently, our model is a complete replacement for OPAL and does not limit the choice of objectives and constraints for beam optimisations. Existing models cannot optimise beam properties at multiple positions at the same time; our forward model enables such sophisticated optimisations, which we demonstrate empirically. The invertible model is capable of solving both the forward and the inverse problem by providing two kinds of prediction, called the forward and the inverse prediction. The forward prediction approximates OPAL, i.e. it calculates the beam properties corresponding to a given machine configuration.
The inverse prediction takes a user-imposed target beam and returns a machine configuration that realises an approximation of such a beam. To incorporate expert intuition and knowledge into the optimisation, the result of the inverse prediction initialises a GA-based multi-objective optimisation, which then needs fewer generations to converge than with the usual random initialisation. We demonstrate the capabilities of the two surrogate models with the two real-world examples depicted in Fig. 1. The first example is the AWA, a linear accelerator. The second is the cyclotron for the Isotope Decay-at-Rest experiment, a proposed very-high-intensity electron-antineutrino source (from here on called the IsoDAR machine). We develop a forward model and an invertible model for both of these fundamentally different accelerators. Furthermore, we use them in a multi-objective accelerator optimisation to find machine configurations for desired beam properties. Physics Models and Datasets. For both accelerators we use the same underlying physics model, based on the OPAL accelerator simulation framework. The dynamics of an accelerator are given by a function y(s) = f(x, s), where the quantity x represents a vector of machine settings (design variables), the scalar s ∈ R denotes the position along the accelerator, and the vector y(s) ∈ R^n denotes the beam properties (quantities of interest). The function f represents the OPAL high-fidelity physics model. Our surrogate models learn the relationship between design variables and quantities of interest from examples, which means that we need to encode the physics of an accelerator in a dataset. We achieve this by sampling the space of design variables randomly and evaluating the sampled configurations at multiple positions along the machine. One sample in the dataset consists of the design variables, a position along the machine, and the corresponding quantities of interest.
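As an illustration, the sampling procedure described above can be sketched in a few lines. The bounds, the `run_opal` helper, and the loop over positions are hypothetical placeholders for the paper's actual tooling:

```python
import random

def latin_hypercube(n_samples, ranges, seed=0):
    """Latin-hypercube sample of the design-variable space.

    `ranges` lists one (low, high) bound per design variable. Each
    dimension is cut into n_samples equal strata; one point is drawn
    per stratum, and the strata are shuffled independently per
    dimension so every marginal is evenly covered.
    """
    rng = random.Random(seed)
    dims = len(ranges)
    samples = [[0.0] * dims for _ in range(n_samples)]
    for d, (lo, hi) in enumerate(ranges):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        for i, stratum in enumerate(strata):
            u = (stratum + rng.random()) / n_samples  # point inside the stratum
            samples[i][d] = lo + u * (hi - lo)
    return samples

# One dataset row = (design variables x, position s, quantities of interest y):
# for x in latin_hypercube(n, bounds):
#     for s in positions:
#         rows.append((x, s, run_opal(x, s)))  # run_opal = the expensive solver
```

Each design variable is thus sampled exactly once per stratum, which is what distinguishes the Latin hypercube from plain uniform sampling.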
Specifics of the Argonne Wakefield Accelerator Model. The AWA accelerator [11] consists mainly of an electron gun, several radio-frequency cavities for accelerating the electrons, and magnets for keeping the beam focused. The electron gun generates bunches that form a train of Gaussians in the longitudinal direction, separated by a peak-to-peak distance λ. Other electron-gun variables are the gun phase φ, the charge Q, the laser spot size (SIGXY), and the currents in the bucking/focusing and matching solenoids (BFS and MS). For details we refer to Fig. 1d. In total, we have nine design variables, denoted x ∈ R^9, that define the various operation points of the AWA. The ranges of x are determined by the physical limitations of the accelerator and are listed in Table 3. The quantities of interest (QoIs), y ∈ R^8, are: the transversal root-mean-square beam sizes σ_x, σ_y, the transversal normalised emittances ε_x, ε_y, the mean bunch energy E, the energy spread of the beam ∆E, and the correlations between transversal position and momentum, Corr(x, p_x) and Corr(y, p_y). A summary and more details about the QoIs are given in Table 2.
We build a labeled dataset of the AWA using OPAL and the latin hypercube procedure [12] to sample the search space of x, which results in a training/validation set of 18,065 points and a test set of 913 points. Specifics of the IsoDAR Cyclotron Model. The simulations are based on the latest IsoDAR [13,14,15] cyclotron design, with a nominal beam intensity of 5 mA of H2+ (equivalent to 10 mA of protons). Ongoing modeling efforts of IsoDAR consider the radial momentum p_r0, the injection radius r_0, the phase φ_rf of the radio frequency of the acceleration cavities, and the root-mean-square beam sizes σ_x, σ_y, σ_z at injection. The physical ranges of the machine settings are given in Table 4. The surrogate models predict the transversal and longitudinal root-mean-square beam sizes σ_x, σ_y, σ_z, the projected emittances ε_x, ε_y, ε_z, the beam halo parameters h_x, h_y, h_z, the energy E, the energy spread of the beam ∆E, and the particle loss N_l (see Table 2). For this example we have six machine settings x ∈ R^6 and twelve quantities of interest y ∈ R^12. All samples are taken at turn 95, in the vicinity of the extraction channel.
The labeled dataset, obtained with OPAL, consists of 5,000 random machine configurations. Among these, 1,000 samples are randomly selected and used as the test set, 800 form the validation set, and the remaining 3,200 samples form the training set.

Results
Forward and Invertible Models. We assess the quality of the surrogate models by evaluating the adjusted coefficient of determination (R̄²) and the relative prediction error at 95% confidence on the test set, i.e. on data that has not been used during the development of the models. For the AWA model, the R̄² values for the forward model are very close to one at nearly all positions along the machine, except inside the solenoids and cavities. Consequently, the variance of the dataset is explained well, as can be seen in Fig. 2a. Only the variance of the emittance is not captured well, due to a numerical artifact arising from the choice of coordinate system in the solenoids and cavities, respectively. However, the relative prediction error for this quantity is still less than 25% at 95% confidence, as depicted in Fig. 2c. The relative errors for the other quantities are even lower: less than 15% for σ_x (Fig. 2b) and less than 10% for ∆E (Fig. 2d). For the forward surrogate model of IsoDAR, the variance of the dataset is captured very well, with a minimum R̄² value of 0.95 among all predicted quantities except those in the z-direction, for which it ranges between 0.82 and 0.86. The relative prediction error of the forward model is at most 5.6%. Since we are mainly interested in the beam parameters, we evaluated the invertible model by first performing the inversion, computing machine settings from beam parameters, with the invertible model, and then a forward prediction with the forward model. For the AWA model the error at 95% confidence is almost 40% for the beam size, emittance, and energy spread. Nevertheless, the sampling error is only 20% for three out of four target beams in the test set. The energy, on the other hand, is obtained with almost perfect accuracy. For the IsoDAR model, the relative errors at 95% confidence range from 0.08% to 11%.
The corresponding R̄² values lie between 0.79 and 0.96, except for the longitudinal quantities, for which we obtained values between 0.65 and 0.77. The detailed values for the IsoDAR model can be found in Table 1, Fig. 7 and Fig. 8. Multi-objective Optimisation. To find good operation points for the accelerators, we solve multi-objective optimisation problems [4,5,8,16]. The standard approach is visualised in Fig. 6a: a physics-based model, like OPAL, is used together with a GA and random initialisation. We investigate two other approaches, depicted in Fig. 6b. First, we use a GA together with the forward model, instead of OPAL, and a random initialisation. In the second approach, the GA and the forward model are still employed, but in addition the invertible model provides a good initial guess for the optimisation. Performance Metrics for the Optimisation. Characterising an approximation to the Pareto front, i.e. the set of optimal solutions, encompasses two main aspects. First, we strive to converge quickly towards non-dominated quantities of interest. The range of the objectives in the non-dominated set is used to analyse the convergence behaviour. Second, we demand that the non-dominated quantities of interest are diverse, i.e. well spread out over the entire approximated Pareto front. The convex hull volume (V_ch), the number of solutions in the non-dominated set, the generational distance [17], and the hypervolume difference (i.e. the difference between the hypervolume of the optimal solution and the hypervolume of the Pareto front of each generation, with a reference point chosen close to the Nadir point [18]) assess this quality aspect. Finally, plotting projections of the non-dominated set provides a holistic view to compare the effect of the initialisation strategies (random vs invertible-model initialisation) on both the convergence and the diversity of the non-dominated set. Multi-objective Optimisation for AWA.
The constrained multi-objective optimisation problem represents a generic multi-objective optimisation task found in many past and future AWA experiments [11]. The optimisation of beam parameters at 26 m was required for a pilot experiment performed in 2020 regarding a new scheme for electron cooling. In addition, there were multiple constraints at various positions along the accelerator. We solve the optimisation problem twice, once initialising the GA population randomly and once initialising it with the invertible-model outcome, for which we used the target values for the beam parameters listed in Table 6.
The objective values for different generations are depicted in Fig. 3a. The optimisation initialised with the invertible model converged almost entirely after 10 generations. If random initialisation is used, more than 100 generations are needed to arrive at the same optimal configurations. The ranges of the beam properties in the non-dominated set confirm this observation (Fig. 3b-d). Using the invertible model leads to a bigger hypervolume even when the prediction uncertainty is accounted for (Fig. 3e). Furthermore, the optimisation initialised with the invertible model finds more solutions during the first 500 generations (Fig. 3f). These two observations imply that the initialisation with the invertible model finds more, and more diverse, non-dominated design variables. The non-dominated feasible machine settings are evaluated with OPAL (dots in Fig. 4a-c) to validate the predicted optimal objective values. All the OPAL-evaluated points lie within the region of uncertainty for the optimal objective values at 90% confidence. Fig. 4d gives deeper insight into the constraint on the correlation parameter Corr(x, p_x). Although the surrogate models found many configurations lying outside the allowed region, both initialisation strategies lead to at least some feasible configurations. Importantly, no machine configuration in the training/validation set is feasible according to the optimisation problem; nevertheless, some optimal points fulfill the correlation constraint. Multi-objective Optimisation for IsoDAR. For the IsoDAR case, we solve a two-objective optimisation problem similar to that introduced by Edelen et al. [8]. We aim to minimise simultaneously the projected emittance ε_x and the energy spread ∆E of the beam, without any constraints. As before, the optimisation problem is solved with a GA using the forward surrogate model.
First a random initialisation is used and then an initialisation using the invertible model with the target beam parameters in Table 5.
The convergence of the objective values for the two optimisation approaches is depicted in Fig. 5a. Both approaches lead to the same non-dominated set. The optimisation with the invertible-model initialisation converges faster and finds more optimal points already in the first few generations. The random-initialisation approach needs approximately 40 generations to reach the same level. Fig. 5b shows that both initialisation schemes ultimately lead to nearly the same number of solutions. The difference between the objective values of the two approaches in the first generations of the optimisation can also be seen in Fig. 5d, which depicts the ranges from the minimum to the maximum values of the non-dominated set for the two objectives. The performance-metric plots in Fig. 5c,g show the behaviour of the two approaches quantitatively. The generational distance for the invertible-model initialisation starts at a much lower value: the non-dominated sets found in the first generation are already closer to the final non-dominated set than the ones found with random initialisation. The lower hypervolume difference and the higher convex hull volume for the invertible-model initialisation indicate that the initial non-dominated set is more widely spread, but these advantages vanish after approximately 40 generations.
To validate the optimisation results, we randomly chose 100 points from the non-dominated sets of both approaches and simulated them with OPAL. We did not evaluate all non-dominated machine settings because of the high computational cost of OPAL for the IsoDAR example. As can be seen in Fig. 5e,f, nearly all of the OPAL points lie within the 90% confidence region estimated on the test set, although only a few points of the original OPAL dataset lie in this area. Computational Advantages. We compare solving an optimisation problem with OPAL to our approach of using a forward surrogate model instead of OPAL. There are two quantities to be considered for the speedup analysis: first, the time-to-solution t, measured in hours, and second, the computational cost c, measured in CPU hours. The detailed calculations for the improvement of both quantities can be found in Section D of the SI. Previous work [19] also calculated speedups, but those calculations do not include the computational cost or the entire cost of developing a surrogate model, because they neglect the hyperparameter scan. Moreover, the existing speedups refer to a model that is only capable of predicting the quantities of interest at one position. Our calculations refer to our novel forward surrogate model, which is capable of predicting quantities of interest at a plethora of positions, and include not only the generation of the dataset but also the training, the hyperparameter tuning, and the cost of running the optimisations themselves. The invertible model reduces the number of generations; however, we calculate the speedup for a fixed number of generations, so the effect of the invertible model is not included in the speedup calculations.
For the AWA (see Table 8), we calculate the relative improvement t_OPAL/t_surr ≈ 3.39 in terms of time-to-solution and c_OPAL/c_surr ≈ 1.83 in terms of computational cost. This approach is more than three times faster and requires approximately 45% less computational resources (see Eq. 2 in the SI). For IsoDAR the benefits are even more pronounced: t_OPAL/t_surr ≈ 640 and c_OPAL/c_surr ≈ 59.2. Hence, the use of a surrogate model saves more than 98% of the computational cost and time compared to the approach using OPAL.
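As a quick arithmetic check of the numbers above: a ratio r = c_OPAL/c_surr means the surrogate approach uses 1/r of the resources, i.e. saves a fraction 1 − 1/r:

```python
def savings(ratio):
    """Fraction of resources saved when the surrogate approach is
    `ratio` times cheaper: c_surr = c_OPAL / ratio, hence the saving
    is 1 - 1/ratio."""
    return 1.0 - 1.0 / ratio

print(f"AWA:    {savings(1.83):.1%} of the computational cost saved")
print(f"IsoDAR: {savings(59.20):.1%} of the computational cost saved")
```

This reproduces the quoted figures: roughly 45% for the AWA and just over 98% for IsoDAR.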

Methods
Forward Model. The forward surrogate model f̂ (Fig. 6c) is a fast-to-evaluate approximation to the expensive OPAL-based physics model f. While existing work [8] is restricted to approximating the accelerator at a single position s, usually at the end, we build a model for the AWA that approximates the dynamics of the machine at all positions. Without loss of generality, the IsoDAR model predicts the beam properties only at the end of the machine, as the ongoing design work requires. We follow the work by Edelen et al. [8] and use densely connected feedforward neural networks [20] as candidates for the forward model. We consider only architectures in which all hidden layers have the same width, to simplify the hyperparameter scan. For the AWA model the hidden layers use the ReLU activation and the output layer uses the tanh activation function, whereas for the IsoDAR model tanh is used as the activation for the hidden layers and the output-layer activation is linear. We use the mean absolute error (MAE) as the loss function for the AWA model and the mean squared error (MSE) for the IsoDAR model. We optimise the trainable parameters using the Adam algorithm with default parameters. The non-trainable parameters are selected using grid search and can be found in Table 7, along with further information about the models. All models (forward and invertible) are implemented using the TensorFlow framework [21] in version 2.0. For the hyperparameter scan of the IsoDAR model, the Ray Tune library [22] is additionally used. The models are trained and evaluated on the Merlin6 cluster at the Paul Scherrer Institute, using 12 CPU cores for the AWA dataset and one CPU core for the IsoDAR case.

Invertible Model. The invertible model (see Fig. 6d) is capable of performing two kinds of predictions: a forward prediction, approximating OPAL, and an inverse prediction, aiming to solve the inverse problem: given a target beam, the invertible model predicts a vector of design variables such that the corresponding beam is a good approximation of the target beam. We follow the ansatz of Ardizzone et al. [23] and refer to their work for the details of the architecture of the invertible model. Ardizzone et al. build a neural network consisting of invertible layers, so-called affine coupling blocks (see Fig. 6d). Each of them contains two neural networks sharing parameters. All internal networks of all affine coupling blocks have the same architecture: each consists of the same number of hidden layers of the same size, and all hidden neurons use the ReLU activation function. The solution of the inverse problem is not unique, hence a mechanism to select one solution is needed. This is realised by mapping the machine settings not only to the beam properties, but also to a latent space Z ⊂ R^{d_z} of dimension d_z. This space follows a known probability distribution, which allows random points in it to be sampled. Sampling a point in the latent space corresponds to selecting one solution of the inverse problem. For this reason, the inverse prediction is also called sampling.
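To make the invertibility concrete, the following is a minimal numpy sketch of a single affine coupling block in the spirit of Ardizzone et al.; the toy internal networks below are illustrative stand-ins, and a full block in their scheme transforms both halves in alternation:

```python
import numpy as np

def coupling_forward(x, s_net, t_net):
    """Forward pass of one affine coupling block (RealNVP-style):
    split the input, keep one half, and scale/shift the other half
    conditioned on the first via the internal networks s_net, t_net."""
    x1, x2 = np.split(x, 2)
    y2 = x2 * np.exp(s_net(x1)) + t_net(x1)
    return np.concatenate([x1, y2])

def coupling_inverse(y, s_net, t_net):
    """Exact inverse of the same block; no separate training needed."""
    y1, y2 = np.split(y, 2)
    x2 = (y2 - t_net(y1)) * np.exp(-s_net(y1))
    return np.concatenate([y1, x2])

# Toy internal networks (single ReLU layers with random weights); in the
# paper these are the densely connected internal networks of each block.
rng = np.random.default_rng(1)
W_s, W_t = rng.standard_normal((2, 4, 4))
s_net = lambda h: 0.1 * np.maximum(0.0, W_s @ h)
t_net = lambda h: np.maximum(0.0, W_t @ h)
```

The inverse is exact by construction, which is what lets a single trained network serve both the forward and the inverse prediction.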
For technical reasons, an invertible neural network requires the input and output vectors to have the same length. This is generally not the case. If the input and/or output vectors are too small, vectors containing noise of small magnitude are appended, so that the total dimension of the input and output vectors is d. We follow the advice of Ardizzone et al. [23] and allow padding not only of the smaller vector but of both, in order to increase the network width and therefore the model capacity.
The total dimension d is a hyperparameter. We denote the padding vectors x_pad ∈ R^p and y_pad ∈ R^q and obtain d = m + 1 + p = n + 1 + dim(Z) + q. The forward prediction is then written as ŷ = f̂(x, s)_{1:n}, where the subscript selects the beam properties from the padded output; the inverse prediction is correspondingly written as x̂ = f̂⁻¹(y, s)_{1:m}. In order to improve the performance of the invertible model we use a best-of-n strategy. For each inverse prediction (at inference time only), n_tries design-variable configurations are sampled, evaluated with the forward pass, and the best configuration is chosen as the final prediction. Note that we do not need to rely on a forward model, but only on the invertible model itself. We choose n_tries = 32 for both the IsoDAR and the AWA model, as we observed no significant improvement for larger values. All prediction errors are calculated with this particular choice. The loss function consists of multiple parts: the L_x loss ensures that the sampled machine-setting vectors x̂ follow the same distribution as the machine settings in the dataset, and the L_z loss ensures that the latent-space vectors follow the desired distribution. Both are realised as a Maximum Mean Discrepancy [24]. The L_y loss is the mean squared error between ŷ and y_true, and L_artificial is the sum of the MSEs of both x_pad and y_pad. The reconstruction loss L_r ensures that the inverse prediction is robust with respect to perturbations of small amplitude. The inverse prediction aims to generate machine settings x̂ = f̂⁻¹(y, s)_{1:m} such that the resulting beam ŷ = f(x̂, s) comes close to the desired target-beam properties. To estimate the prediction accuracy of the models, we reproduce vectors y from the test set. It is not reasonable to compare the sampled machine settings to the corresponding setting x in the test set, because multiple machine settings might realise similar beams. Instead, we perform a forward prediction with the forward surrogate model.
This allows us to estimate which beam properties the sampled configuration leads to, ỹ = f̂(x̂, s). That quantity is used to describe the error of the inverse prediction.
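The best-of-n inverse prediction described above can be sketched as follows; `inn_sample` and `inn_forward` are hypothetical stand-ins for the inverse and forward passes of a trained invertible network:

```python
import random

def inverse_predict(target_y, inn_sample, inn_forward, loss, n_tries=32, seed=0):
    """Best-of-n inverse prediction.

    `inn_sample(target_y, z)` stands in for the inverse pass of the
    invertible network for one latent draw z, and `inn_forward(x)` for
    its forward pass. n_tries latent vectors are drawn; each candidate
    machine setting is pushed through the forward pass, and the one
    whose predicted beam is closest to the target is returned.
    """
    rng = random.Random(seed)
    best_x, best_err = None, float("inf")
    for _ in range(n_tries):
        z = rng.gauss(0.0, 1.0)  # each latent draw selects one solution
        x = inn_sample(target_y, z)
        err = loss(inn_forward(x), target_y)
        if err < best_err:
            best_x, best_err = x, err
    return best_x
```

With toy stand-ins (e.g. `inn_sample = lambda y, z: y + z` and `inn_forward = lambda x: x`), the best of 32 draws lands close to the target, mirroring how the strategy needs only the invertible model itself and no separate forward model.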

Discussion and Outlook
We have introduced a novel, flexible forward surrogate model that is capable of emulating high-fidelity physics models at a plethora of positions along the accelerator. We have shown that forward surrogate models reduce both the time (by a speedup factor of 3.39 for the AWA and 640 for IsoDAR) and the computational cost (by 45% and 98%, respectively) of beam optimisations. The benefits increase if several optimisations need to be performed, because the time-consuming and computationally expensive part is the development of the model. Once the models are developed, the optimisations are almost free and can be performed in approximately 10 min on 12 cores. This allows experimenters to adapt to new situations by changing constraints and objectives and to rerun the optimisation with very little computational effort. Furthermore, the surrogate models themselves are useful in control-room settings, as they are fast enough to be used in a graphical user interface. This gives operators the possibility to quickly try out different machine settings. Despite all these benefits of the surrogate-model approach, one needs to be careful when applying it to problems with narrow constraints. Neural networks can only accurately model the machine-setting regimes represented in the dataset on which they are trained. If the feasible region of an optimisation problem is narrow, the models will hardly be able to find configurations that fulfill all constraints. One solution to this issue is to sample the feasible region more densely instead of sampling the design space uniformly. This is difficult because the feasible region in design space is usually unknown a priori. We have also presented an invertible model, which makes it possible to simulate the forward direction as well as the inverse, thus predicting machine settings that lead to desired beam parameters.
The invertible model alone has the potential to support the design of accelerators, and it could also be used to implement a failure-prediction system or an adaptive control system. Another use case of invertible surrogate models was demonstrated in this paper: they were shown to be able to bias the initialisation of a GA towards the optimal region, thereby incorporating the prior knowledge and experience of experimenters. This reduces the number of generations needed for convergence of the GA when a forward surrogate model is used. As the optimisation is almost free when a forward surrogate is employed, there is little incentive to go through the labour of developing an invertible model in that setting. However, if OPAL is used to evaluate all individuals, the reduced number of generations might lower the time and cost of the optimisation significantly. Using OPAL also solves the problem of the underrepresented feasible space, because OPAL is capable of modelling the relevant physics of a particle accelerator. Applying the biased initialisation to an OPAL optimisation might thus combine the advantage of needing fewer generations with the ability to accurately represent the feasible space; the accompanying savings might justify the cost of developing the invertible model. Further research is needed to investigate this prediction. Finally, the choice of the target vector is important, and additional research is needed to investigate its impact. The societal impact of accelerator science and technology will continue, with no end in sight [2,1,25,26]. With this research we add new computational tools and hence contribute to the quest of finding optimal accelerator designs and machine configurations, which in turn are likely to greatly reduce construction and operational costs and to improve physics performance.

Acknowledgements
We acknowledge the help of Dr. John Power from AWA and Dr. Daniel Winklehner from the IsoDAR collaboration.

A Predicted Quantities & Model Fidelity
Performance metrics. We make use of the adjusted coefficient of determination to assess the quality of the surrogate models. For each beam property and each position s it is computed over the test samples with s_i = s as

R̄² = 1 − (1 − R²) (N_test − 1) / (N_test − m − 1),   with   R² = 1 − Σ_i (y_i − ŷ_i)² / Σ_i (y_i − ȳ)²,

where N_test denotes the number of samples in the test set and m is the number of design variables, x_i ∈ R^m. This quantity can be interpreted as the fraction of variance in the data that is explained by the model. A perfect prediction corresponds to R̄² = 1.
The prediction uncertainty at confidence q is estimated by calculating the residuals of the predictions over the test-set samples, taking their absolute values, and then computing the q-th percentile of these values. The uncertainty is calculated separately for each beam property i and at every position s, using the samples corresponding to position s.
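A minimal sketch of the two metrics, assuming the standard adjusted-R² normalisation and a nearest-rank percentile (the paper's exact conventions may differ in detail):

```python
import math

def adjusted_r2(y_true, y_pred, m):
    """Adjusted coefficient of determination for one QoI at one
    position, with m design variables (standard formulation)."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)

def error_at_confidence(y_true, y_pred, q=0.95):
    """q-th percentile of the absolute residuals (nearest-rank)."""
    resid = sorted(abs(t - p) for t, p in zip(y_true, y_pred))
    k = min(len(resid) - 1, math.ceil(q * len(resid)) - 1)
    return resid[k]
```

Both quantities would be evaluated per beam property and per position, as described above.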
To measure the performance of the optimisation we use, among other metrics, the convex hull volume. To compute it, the residuals over the test set are approximated with a Gaussian distribution. This allows us to sample new points around the predicted values and to perform a Monte-Carlo estimate of the convex hull volume of the non-dominated set. IsoDAR model fidelity. Table 7: Parameters and properties of our surrogate models. EOM means end of the machine.
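A two-dimensional sketch of this Monte-Carlo estimate, with the residual model reduced to independent Gaussians per objective (an assumption made here for illustration):

```python
import random

def hull_area(points):
    """Area of the 2-D convex hull (monotone chain + shoelace)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def half(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]
    hull = half(pts) + half(list(reversed(pts)))
    n = len(hull)
    return 0.5 * abs(sum(hull[i][0] * hull[(i + 1) % n][1]
                         - hull[(i + 1) % n][0] * hull[i][1]
                         for i in range(n)))

def mc_hull_volume(front, sigma, n_draws=200, seed=0):
    """Monte-Carlo estimate: perturb each non-dominated point with the
    Gaussian residual model and average the hull area over draws."""
    rng = random.Random(seed)
    areas = []
    for _ in range(n_draws):
        cloud = [(x + rng.gauss(0.0, sigma[0]), y + rng.gauss(0.0, sigma[1]))
                 for x, y in front]
        areas.append(hull_area(cloud))
    return sum(areas) / len(areas)
```

With the residual spread set to zero the estimate collapses to the plain hull volume of the non-dominated set, which is a convenient sanity check.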

B Parameter Ranges used in the Optimisation
Explanation of the network architectures: a feedforward network with n_l hidden layers of width n_w is denoted as an n_l × n_w network. An invertible model consisting of n_b blocks, each containing internal networks of depth n_d and width n_w, is denoted as n_b × n_d × n_w. The loss functions were described in the main text; in each of them, N denotes the number of samples over which the loss is calculated.
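The notation can be expanded mechanically; this small helper is purely illustrative:

```python
def hidden_layout(spec):
    """Expand the architecture notation into hidden-layer widths,
    e.g. '4 x 200' -> [200, 200, 200, 200] for a feedforward network,
    and 'n_b x n_d x n_w' -> one width list per coupling block for an
    invertible model."""
    parts = [int(p) for p in spec.lower().replace("×", "x").split("x")]
    if len(parts) == 2:            # feedforward: depth x width
        n_l, n_w = parts
        return [n_w] * n_l
    n_b, n_d, n_w = parts          # invertible: blocks x depth x width
    return [[n_w] * n_d for _ in range(n_b)]
```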

D Computational Details

Implementation. The forward model and the invertible model are implemented in TensorFlow 2.0 [21]. They are trained on 12 CPU cores for the AWA dataset and one core for the IsoDAR model. Training and evaluation were executed on the Merlin6 cluster at the Paul Scherrer Institute. The hyperparameters of the IsoDAR model are found with a grid search using Ray Tune [22].
We solve both the AWA and IsoDAR optimisation problems using the NSGA-II algorithm [27] as implemented by the Python library pymoo [28], with default parameters. The general algorithm is shown in Alg. 1. The NSGA2() function is given the quantities of interest for the current generation, calculates the values of the objectives and the constraints, and suggests the next generation of machine settings based on the results. When we refer to the optimal solutions of generation g, we mean the non-dominated feasible solutions found in the generations up to and including generation g. Note that only the forward surrogate model is used to evaluate the objectives and constraints; OPAL is used exclusively to train the models and to validate the final optimal configurations. We solve both optimisation problems in two ways. First, we initialise the first generation randomly by uniformly sampling machine settings from their ranges (see Table 3 and Table 4). Second, the invertible model is used to bias the initialisation towards the optimal region. To achieve this, we provide a target vector of beam properties y_t at position s_t and ask the invertible model to sample individuals that realise this beam, see Alg. 2. In other words, we guess what a good beam looks like at one position and let the network calculate corresponding machine settings. This is what allows operators and experimenters to incorporate their experience and intuition into the optimisation.
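The notion of "optimal solutions of generation g" — the non-dominated feasible solutions accumulated up to generation g — can be sketched independently of pymoo (objectives are minimised):

```python
def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (both objective vectors minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def optimal_solutions(generations, feasible):
    """Non-dominated feasible objective vectors found in all
    generations up to and including the last one in `generations`
    (a list of lists of objective vectors)."""
    pool = [p for gen in generations for p in gen if feasible(p)]
    return [p for p in pool if not any(dominates(q, p) for q in pool)]
```

Infeasible individuals are discarded before the dominance filter, matching the restriction to feasible solutions above.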

Speedup Calculation
Let t_o be the time needed for a single OPAL simulation running on r_o cores. Let t_t be the time to train a surrogate model on r_t CPU cores, and t_p the time needed to evaluate one individual of the optimisation (the p stands for prediction). We assume that the prediction takes place on the same number of cores as the training. Let n be the number of unique machine configurations in the datasets; since the development of a surrogate model involves a training, a validation, and a test set, all of them are included in this number. Let n_h be the number of hyperparameter configurations to be tried for the model. A single optimisation requires running n_g generations, each consisting of n_i individuals. A summary of the parameters, along with their values for both models, is given in Table 8. The runtime of OPAL depends on the machine settings; t_o is obtained by measuring the runtime for generating the training/validation set and dividing by the number of unique machine settings in this set. For all calculations, we assume that we have infinite computational resources at our disposal. This means that all calculations can always exhaust the entire theoretically available parallelism. In practice, run times will usually be higher than the ones calculated under this assumption. The only step where the surrogate-model approach is limited by computational resources is the generation of the dataset. This step is embarrassingly parallel, so the surrogate model can adapt perfectly to limited computational resources. If only OPAL is used, however, the optimisation itself is also affected by limited resources. Even if only one individual of a generation cannot be evaluated in parallel with the others, the time needed for the optimisation already doubles, because the next generation of the optimisation algorithm cannot start before the parent generation is fully evaluated. Therefore, our assumption of infinite parallelism favours the OPAL-only approach.
For this reason, our calculations yield a lower bound for the speedup. We can now calculate the time-to-solution for an optimisation using the surrogate model. This involves the creation of the dataset and the model training (including the hyperparameter scan). The former is embarrassingly parallel, and with unlimited computational resources all design-variable configurations are evaluated at once; the same argument applies to the hyperparameter scan. The time-to-solution for an optimisation with the surrogate model is therefore the sum of the (fully parallel) dataset generation, the (fully parallel) hyperparameter scan, and the cost of running the optimisation itself. Solving n optimisation problems is equivalent to solving one problem using n times the number of generations, so solving more optimisation problems is equivalent to using more generations. Asymptotically, the improvement in terms of computational cost scales like the ratio of the cost of a single OPAL evaluation to that of a single surrogate evaluation. The relationship between the relative improvement and the number of optimisations can be seen in Fig. 9.
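Under a cost model of this kind, the amortisation effect can be illustrated with placeholder parameter values (not the measured values of Table 8):

```python
def cost_ratio(n_opt, t_o, r_o, t_t, r_t, t_p, n, n_h, n_g, n_i):
    """Computational cost (CPU hours) of n_opt optimisations with OPAL
    only versus with the surrogate, and their ratio. All parameter
    values passed in below are illustrative placeholders."""
    # OPAL only: every individual of every generation is a full simulation.
    c_opal = n_opt * n_g * n_i * t_o * r_o
    # Surrogate: dataset and hyperparameter scan are paid once; each
    # optimisation afterwards costs only cheap surrogate evaluations.
    c_surr = n * t_o * r_o + n_h * t_t * r_t + n_opt * n_g * n_i * t_p * r_t
    return c_opal / c_surr

# The one-off model-building cost is amortised, so the ratio grows with
# the number of optimisations towards the per-evaluation ratio
# (t_o * r_o) / (t_p * r_t).
```

The ratio therefore increases monotonically in the number of optimisations and is bounded above by the per-evaluation cost ratio, which is the asymptotic scaling stated above.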