IGLOO: An Iterative Global Exploration and Local Optimization Algorithm to Find Diverse Low-Energy Conformations of Flexible Molecules

: The exploration of the energy landscape of a chemical system is essential for understanding and predicting its observable properties. In most cases, this is a challenging task due to the high complexity of such landscapes, which often consist of multiple, possibly hierarchical basins that are difﬁcult to locate and thoroughly explore. In this study, we introduce a novel method, called IGLOO (Iterative Global Exploration and Local Optimization), which aims to achieve a more efﬁcient global exploration of the conformational space compared to existing techniques. The method utilizes a tree-based exploration inspired by the Rapidly exploring Random Tree (RRT) algorithm originating from robotics. IGLOO dynamically adjusts its exploration strategy to both homogeneously scan the landscape and focus on promising regions, avoiding redundant exploration. We evaluated IGLOO using models of two polypeptides and compared its performance to the traditional basin-hopping method and a hybrid method that also incorporates the RRT algorithm. We ﬁnd that IGLOO outperforms both alternative methods in terms of efﬁciently and comprehensively exploring the molecular conformational space. This approach can be easily generalized to other chemical systems such as molecules on surfaces or crystalline systems.


Introduction
Cost functions appear in many fields of science, economics, and engineering, wherever issues concerning the optimization, typically the minimization or maximization, of some (scalar) quantity-the so-called cost-are raised [1].Classical examples are the cost of a business strategy, the fitness of biological species or individuals, the objective function value of a trajectory in an optimal control problem, or the potential energy of an arrangement of atoms in space.Given a state space of the system of interest, the cost function assigns a real number to each state of the system; such a state is often called a microstate, configuration, or conformation (in the case of chemical systems), or a legitimate solution to a combinatorial optimization problem.Here, the state space can be a discrete set of states, like the nodes of a graph (often called a metagraph in the context of cost function analysis [2]), or exhibit a continuous structure, like a subspace of R n .Together with the neighborhood structure of the state space-in the case of a discrete state space, the connectivity of the metagraph-, the cost function can be represented by a so-called cost function landscape, which usually contains a multitude of local minima that are separated by a complex barrier structure.In this context, one should note that the choice of connectivity or neighborhood is essentially left to the researcher him/herself; since the choice of the neighborhood implicitly defines the dynamics of the system in most cases, one often speaks of the neighborhood in terms of the moveclass of the system-a terminology derived from the description of the exploration algorithms that are designed to study the landscape.Often, the number of local minima grows exponentially or even factorially with some parameter that represents the size of a system, such as the number of cities in a travelling salesman problem [3,4], the number of spins in a magnetic material, or the number of atoms in a cluster or molecule [5].For more general aspects of cost function, fitness, or energy landscapes, we refer to the literature [1].
In the case of a chemical or physical system, one frequently encounters the (potential) energy of the system as the quantity to be studied or minimized, and thus one speaks of the energy landscape of, e.g., the molecule of interest.Furthermore, in the case of such systems, the landscape is of much greater importance than just as a tool to find the lowest energy, since the laws of physics that describe the evolution of the system are embedded in the local shape of the potential energy surface (PES): the gradient of the potential energy yields the forces acting on the atoms, and thus prescribes the subsequent trajectory of, e.g., the atoms belonging to a molecule, such as the vibrations or translational motion of the molecule.
Thus, the study of the energy landscape of physical and chemical systems usually goes far beyond the obvious task of identifying the global minimum; for example, all lowenergy local minima surrounded by high enough barriers on the landscape correspond to (meta)stable configurations of, e.g., the molecule, which can be observed or synthesized, in principle, or transformed into each other.Therefore, determining all low-energy minima of a chemical system ranks as a goal of similar importance as finding the global minimum [1].
Furthermore, after the initial step of finding not only the global minimum but also as many side-minima as possible, one would study the energy barriers and entropy barriers of the system separating these minima.This requires the design and implementation of new classes of algorithms that, e.g., identify saddle points between local minima [6,7], or measure the entropic barriers between basins for portions of the landscape restricted to lie below energy lids, such as the threshold algorithm [8].
However, in a large number of instances, just identifying the global minimum without any prior information about the cost function landscape is already a highly nontrivial task that typically requires an enormous computational effort.This is due to the large size of the state space-prohibiting an exhaustive search via testing every microstate-and due to the complexity of the barrier structure, which makes straightforward steepest descent gradient-based minimizations (for continuous landscapes) or random downhill walks ineffective.Although some improvement is achieved by combining such greedy downhill algorithms with a systematic (nearly exhaustive) grid-like selection of starting points for the local minimization [9], this combination is still not very efficient in identifying the global minimum of the system, due to the enormous size of the state space and the large number of local minima present.Moreover, many of these minima would be rediscovered many times, thus resulting in a massive oversampling of the landscape's minima.
These basic-in principle generally applicable-algorithms are often specially adapted to certain classes of systems and energy landscapes, e.g., for the prediction of the conformers of molecules, such as the rotamers of chain-like molecules such as proteins or polysaccharides.Since one usually restricts the structure generation to molecules with a given bond network, only limited changes in the bond lengths are allowed, and the main degrees of freedom that are varied during the search are the bond angles and dihedral (torsion) angles.Ideally, one would compute the energy on the ab initio level.However, because of the high computational expense of quantum mechanical calculations, one commonly employs empirical molecular modeling potentials instead, possibly adding penalty terms that restrict the bond lengths and angles to certain "allowed" ranges which are physically and chemically sensible-according to experimental data and/or results from ab initio calculations.Such limits can be either implemented as parts of the energy function or via the moveclass, i.e., no new test configuration is allowed that violates these constraints.Moveclasses with such constraints on the allowed configurations are sometimes called "rule-based conformation generators" [24]; they are employed in the context of molecular structure prediction by, e.g., the OMEGA [28] method; for a review of these and other approaches, we refer to the literature [24].
Trying to find the optimal combination of computational speed, quality of the candidates-both regarding their energy and the accuracy of the parameters of the physical structure obtained-, and diversity of the solutions in the set of generated candidates, is a great challenge for systems that exhibit complex multi-minima landscapes.In particular, the goal to explore a wide range of the state space without losing much computational time by having to laboriously cross many energetic barriers as in, e.g., MD simulations, has led to the introduction of algorithms that attempt to perform large moves on the landscape without wasting too much time in exploring uninteresting high-energy regions of the landscape.Such so-called jumpmoves are common features of many modern algorithms, and an optimal choice of the moveclass is crucial for the success of a global optimization procedure [8]; for example, one class of searches which combines every jumpmove automatically with a local deterministic (gradient-based) or stochastic (downhill random walk) minimization, and therefore moves from local minimum to local minimum, has been called the basin-hopping algorithm (BH) [29,30].Similarly, the genetic algorithms, which belong to the class of multi-walker algorithms that generate new states from two (or more) old states by combining their parameter values, are nowadays often combined with a local minimization of the newly generated state [31].Nevertheless, oversampling is still a major problem.Even though the multiple discovery of the same minima is often interpreted as the heuristic criterion of the success (sometimes called "convergence") of the global optimization, thus turning oversampling from a bug into a feature, performing orders of magnitude more local optimizations than necessary greatly reduces the efficiency of the search algorithm.
Due to the wide variety of fields where global optimizations of cost functions take place, many of these algorithms have been re-invented, or have been transferred from one area of applications to another, where one often has to adapt the original algorithm to the specific aspects of the new class of systems under consideration.An example of such a transfer with adaptation is the application of the RRT (Rapidly exploring Random Tree) algorithm [32], which was developed in the field of robotics to solve the so-called motion planning problem [33], to the field of chemistry.Indeed, solving the robot motion planning problem requires algorithms to explore the state space of the robot system aiming to find feasible trajectories between an initial and a final state.The same type of algorithms, with some adaptations, can be applied to explore the energy landscape of atomic or molecular systems [34].
Of particularly great potential to yield a qualitative and not only quantitative improvement in the development of global optimization algorithms are those methods which combine tools and (sub)-algorithms from different fields into a new type of algorithm.In this study, we present the IGLOO (Iterative Global Exploration and Local Optimization) algorithm, which combines concepts from the RRT algorithm and the threshold algorithm to efficiently explore the low-energy regions of the landscape.In the past, an earlier version of this algorithm has been successfully applied to the study of disaccharide molecules on metal surfaces, predicting the shape of the molecules on the surface and thus allowing an interpretation of the STM (Scanning Tunneling Microscopy) images obtained in the experiment [35].
In the present investigation, we analyze the performance of this algorithm in detail, using two chain-type molecules as examples.The outcomes of these global optimizations are compared to the performance of two similar types of algorithms, a classical basinhopping algorithm [36] and a hybrid BH-RRT algorithm [37].

Materials and Methods
This section is structured in a top-down manner.First, the main principles of the global optimization algorithms considered in this comparative study are described.They are presented from the simplest to the most complex one: basin-hopping (BH), hybrid BH-RRT (HYBRID), and IGLOO.In our implementation, and for the sake of a fair comparison, these algorithms apply the same techniques to perform sampling and local optimization.These techniques as well as other implementation details are presented in the second subsection.The last subsection describes the molecular model and potential energy function considered for this investigation.

Basin-Hopping (BH)
Basin-hopping is the popular name [30] given to a "Monte Carlo-minimization" method initially proposed ten years earlier [29].The basic version of this algorithm, usually called monotonic basin-hopping [36] is presented in Algorithm 1. Starting from a given initial configuration q, this algorithm iterates the following steps until one of the conditions in the function StoppingCriteria is satisfied (see Section 2.2): (line 3) a relatively largeamplitude perturbation is applied to the current conformation q, aiming to escape from local basins; (line 4) a new conformation q new is generated by local energy minimization from the perturbed conformation q ; (line 5) the transition to the new local minimum is accepted or rejected based on the usual MetropolisTest [38].Following this stochastic acceptance test, the new conformation q new is accepted if its energy is lower than or equal to the one of the previous conformation q.Otherwise, it is accepted with a probability that decreases exponentially with the positive energy difference between the two configurations.The implementation of BH used for the comparative analysis in this work follows a multistart procedure, as presented in Algorithm 2. Aiming to cover the space more globally, it performs several rounds of the monotonic-BH algorithm, starting from randomly sampled conformations generated by the function SampleRoot.Note that we successfully applied this implementation of BH in previous work [39].

Algorithm 1: Monotonic-Basin-Hopping input : Conformational space variables C
Algorithm parameters and energy function P The HYBRID algorithm [37] combines the underlying principles of two methods: BH and RRT [32].RRT is a popular algorithm in the robotics community, originally developed to plan the motions of robotic systems, and subsequently extended and applied to explore the conformational space of molecules [34,39].RRT performs an iterative stochastic process that, starting from a given state, incrementally builds a tree that tends to cover the reachable regions of the search space.The tree's growth is guided by an intrinsic "Voronoi bias", enabling it to expand rapidly into unexplored regions.The idea of the HYBRID algorithm is to interleave exploration stages following the BH heuristic (i.e., digging into low-energy basins) and stages applying the RRT approach (i.e., biasing the search towards unexplored regions).
The pseudo-code in Algorithm 3 presents the main steps of HYBRID.The implementation applied in this work involves a mechanism similar to the multistart-BH, in which the exploration starts from different initial configurations sampled by the function SampleRoot.The number of initial configurations is defined in the parameters P and tested inside the function MaxNumberRoots.Then, the algorithm follows the main steps of RRT: (line 6) a conformation q rand is randomly sampled in the whole conformational space; (line 7) the nearest conformation q near to q rand among the already explored minima contained in T is selected; (line 8) the basic tree expansion step in the original RRT algorithm is replaced by an execution of the monotonic-BH algorithm starting from the selected q near .Since the probability of a conformation q near to be selected for expansion is proportional to the volume of its associated Voronoi cell (i.e., the subset of points in the conformational space closer to q near than to any other conformation contained in the tree), the tree tends to grow towards the least explored regions of the space.Note that the parameters used for the monotonic-BH within the HYBRID algorithm should ensure that the number of iterations of perturbation and local minimization stages is relatively small compared to the number of iterations within the multistart-BH, so that the efficiency of the exploration via the coupling of the BH with the RRT algorithm will be enhanced.

Iterative Global Exploration and Local Optimization (IGLOO)
Similar to the HYBRID algorithm, IGLOO relies on the exploration strategy of the robotics-inspired RRT algorithm.However, the exploration and local minimization steps are nested in a different way.The pseudo-code of IGLOO is presented in Algorithm 4. The first part of the algorithm (lines 2-5) samples n conformations that will be used as the initial roots of the exploration trees.These trees are constructed by the function TRRT-Exploration (line 7), which implements a multi-tree variant of the RRT algorithm that considers an energy threshold as a constraint for the exploration.This algorithm will be described in Section 2.2.Then, local energy minimization is performed from each of the conformations corresponding to the nodes of these trees (lines 8-10).The resulting set of local minima is filtered to eliminate conformations that are too similar, and to reduce the number of local minima taken into account for the following steps (line 11).Note that although the TRRT-Exploration generates conformations with a good dispersion in C, the subsequent local minimization usually produces clusters of conformations.The UpdateParameters function (line 12) adapts the energy threshold, the exploration step size, and the maximum number of roots for the next iteration of the algorithm.The three main steps of IGLOO-global exploration, local minimization, and filtering-are iterated until one of the conditions in the function StoppingCriteria is satisfied.To illustrate the behavior of IGLOO, Figure A1 provided in the appendix section presents results obtained along several iterations of the algorithm to find low-energy conformations of a simple system that has frequently been used in theoretical works: the alanine dipeptide.

Implementation Details
This section provides more detailed explanations of the main functions implemented in the algorithms described above: SampleRoot: This function randomly samples a conformation q from the domain defined for the conformational space variables C.This conformation is locally minimized using the LocalMinimization function described below.If the set T already contains other local minima, the distance between the new sample and the previous ones is computed, and the new sample is rejected if the minimum value of these distances is below a given threshold defined in the set of parameters P. In that case, the process is repeated until the new sample is sufficiently far from all the others.This strategy is inspired from the Poisson disk sampling process [40], and aims to guarantee a good dispersion of the points used to initialize the exploration.
LargeAmplitudePerturbation: As is generally done in Monte Carlo-type methods, this "random move" is not applied to all the variables simultaneously, but to a subset of them at each iteration.In the implementation used in this work, the LargeAmplitude-Perturbation is applied to a single variable at a time (we tested random moves applied to several variables simultaneously but the results did not improve for the test systems presented below).More precisely, these large-amplitude moves are applied only to a subset of the conformational space variables considered to be the most important ones, whereas the LocalMinimization function, explained below, operates on all the variables.Section 2.3 will explain the main and secondary variables for the type of molecules considered in this work.The bounds for the amplitude of the perturbation (i.e., the minimum and maximum step size) are defined in the set of parameters P. A random value between these bounds is sampled at each iteration.
LocalMinimization: The local minimization is based on a simple Monte Carlo method at very low temperature.Although this type of stochastic algorithm is in general less computationally efficient than deterministic gradient-based approaches, it is also less sensitive to local traps.Moreover, it does not require the computation of the derivatives of the objective function, which can be difficult and expensive in some cases.The algorithm iteratively applies small random perturbations to the current conformation, which are accepted or rejected based on the Metropolis test (as for the BH algorithm).As the temperature parameter used in this test is very low, the probability of accepting a locally perturbed conformation that increases energy is also very low, but not zero, allowing minimization to escape from small energy basins.
Our implementation considers two types of variables, main and secondary, as defined in Section 2.3.At each iteration, one of the two groups is alternatively selected.Then, a number of variables inside the group is randomly selected.The probability to perturb one or several variables is adapted with the number of iterations.During the first iterations, the probability of selecting one or several variables is of 50% in each case.Once a threshold percentage of rejected Metropolis tests is reached within a given number of consecutive iterations, only one variable is selected.The amplitude of the random perturbation applied to the selected variable(s) is adjusted between a maximum and a minimum value defined in the set of parameters P. Using a similar buffer strategy, this amplitude decreases each time that 50% of the Metropolis tests is rejected.To prevent the amplitude from decreasing too quickly, the buffer is reset after each amplitude reduction.The algorithm iterates until a stopping condition is satisfied.Similarly to the StoppingCriteria function explained below, the considered criteria are a maximum number of iterations, a limited computing time, or estimated convergence, based on a maximum number of consecutive rejections of Metropolis test.
TRRT-Exploration: The exploration algorithm implemented in this work combines ideas of the Transition-based RRT (TRRT) [34] and the threshold algorithm [8].The pseudocode is presented in Algorithm 5.The algorithm constructs several exploration trees starting from the given set of roots, aiming to cover the reachable regions of the conformational space.These trees are constructed by iterating the following steps: • (line 3) One of the trees T i is randomly selected.• (line 4) A conformation q rand is randomly sampled in a domain containing the selected tree.In practice, a convex polytope in C is computed for each tree, and updated when the tree grows.This polytope is enlarged in proportion to the size of the exploration step, and a conformation is randomly sampled within it.• (line 5) The nearest conformation to q rand , q near , stored in the nodes of the selected tree T i is chosen for extension.• (line 6) A new conformation q new is created by extending q near in the direction of q rand .This extension is implemented by moving along the interpolated path between q near and q rand of a given step size, provided in the set of parameters P. • (lines 7-8) A new node containing q new will be added to the corresponding tree if it satisfies a set of constraints.More precisely, the function ValidConformation tests the absence of clashes between non-bonded atoms and that the potential energy of the conformation is below a given threshold (defined in the set of parameters P).The function AddNode creates a new node and performs the merging of two trees when q new can be connected to a neighboring node in another tree.
The algorithm stops when StoppingCriteria based on a maximum number of iterations or on a limited computing time are satisfied.q rand ← SampleRandomConformation(C, T i ) 5 q near ← GetNearestNeighbor(T i , q rand ) 6 q new ← ExtendTree(P, q near , q rand ) 7 if ValidConformation(P, q new ) then 8 T ← AddNode(T i , q new ) 9 return T FilterConformations: The aim of this function is to reduce the number of local minima from which the next iteration of the IGLOO algorithm is initialized.
Similarity between conformations is based on two criteria: (i) the angular root-meansquare deviation (RMSD) between the values of the main variables; (ii) the difference between each main dihedral value.Two conformations are considered to be too similar if these two values are below given thresholds.In that case, the conformation with higher energy is removed.
StoppingCriteria: Several types of conditions can be considered to determine the end of the iterative process performed by the algorithms.They are based on: (i) a maximum number of iterations; (ii) a limited computing time; (iii) estimated convergence, based on the evolution of the lowest energy value (i.e., the global optimum).All these conditions are evaluated, and the first one to be satisfied stops the algorithm.

Molecular Model and Energy Function
For the comparative analyses of the algorithms presented in Section 3, we chose models of two polypeptides.We considered all-atom models, but assuming constant values for bond lengths and bond angles (i.e., the so-called rigid geometry assumption).Therefore, the conformational space variables we considered, C, consisted of all the bond torsions.As mentioned above, these variables are divided into two groups: the main variables involve all the backbone dihedral angles except the first one and the last one, and the secondary variables involve these two terminal backbone angles and the dihedral angles of the sidechains.The angular step size for the global exploration was empirically determined for each method/molecule pair.For the evaluation of the potential energy, we employed the AMBER parm96 force-field with an implicit representation of the solvent using the Generalized Born Approximation [41].

Software Availability
The algorithms presented in this section have been implemented in the Molecular Motion Algorithms (MoMA) software suite: https://moma.laas.fr/.This software is a prototype used for research purposes.The version used corresponds to the June 2023 release.Binaries can be provided upon request to the authors.

Results and Discussion
To analyse the performance of the algorithms, two molecules with different size and conformational behavior were chosen, as they allow different properties to be evaluated.The first molecule, Met-enkephalin, is an endogenous opioid pentapeptide (Tyr-Gly-Gly-Phe-Met) exhibiting a stable conformational state and various metastable ones [42].The second molecule, referred to as df-c-Myb in the following, is an heptapeptide (Ace-Lys-Gln-Cys-Arg-Glu-Arg-Ala-NMe) derived from the recognition helix of the c-Myb DNA-binding protein [43].It presents a more complex PES containing regions of the conformational space characterised by different structural motifs [44].A good characterization of the PES of these two molecules therefore requires an algorithm able to accurately locate the global minimum while exhaustively exploring the diversity of local minima, both with a high convergence rate.In the following, the performance of the three algorithms presented in Section 2.1 will be compared on the basis of (i) their convergence towards the global energy minimum, (ii) the associated atomic structures and computing costs, and (iii) their ability to explore different regions of the conformational space of the df-c-Myb PES corresponding to three different structural motifs: an α-helix (HLX) and two β-hairpins (HPN1 and HPN2).Note that we have used parallelized versions of the three algorithms.Therefore, the computing times mentioned below correspond to the sum of the CPU time for each thread.Given that we used 45 threads and that the speedup of the parallel version is almost linear with the number of threads, the wall-clock time is approximately the total CPU time divided by 45.
Figure 1 shows the evolution of the observed minimum energy as a function of the CPU time for the molecules used in our analyses.Here we depict, as a function of time, the value of the minimum energy averaged over 10 runs, together with a quartile-based estimate of the spread of the minimum energies observed.In the plot at the top, corresponding to Met-enkephalin, the IGLOO curve remains below the BH and HYBRID ones throughout the exploration.The IGLOO lowest energy decreases very rapidly until around 2 × 10 4 s, after which the slope decreases markedly and the average energy improves very little beyond 4 × 10 4 s.The overall behaviour of the HYBRID algorithm is similar to that of IGLOO but with a smaller initial slope and a stagnation of the observed minimum energy after 3 × 10 4 s.The BH curve looks very different, with a much shallower slope and large plateaus.After 6 × 10 4 s of exploration, the IGLOO energy (−224.43kcal/mol) is much lower than the energies of the other two algorithms (−221.63kcal/mol for HYBRID and −221.77kcal/mol for BH, respectively).The curves representing the evolution of the lowest energy of df-c-Myb (Figure 1, bottom panel) show a better performance of IGLOO in terms of convergence towards the minimum energy (−628.75kcal/mol after 1.4 × 10 5 s) compared to BH (−626.04kcal/mol) and HYBRID (−622.86 kcal/mol).The IGLOO curve has an almost constant slope up to 1.25 × 10 5 s, exploration time at which the average minimum energy reaches a plateau.Although its approach to low minimum energies is slower at the beginning, IGLOO surpasses the two other algorithms beyond 8.5 × 10 4 s.
The variability of the performance of the algorithms is illustrated by their first and third quartiles shown as dotted lines in Figure 1.These quartiles were computed from 10 independent runs of the algorithms, using time frames of 650 s for Met-enkephaline and 1450 s for df-c-Myb.For the Met-enkephalin and df-c-Myb, IGLOO exhibits an interquartile mean value of 0.7 ± 0.21 and 1.90 ± 0.71, respectively, vs. 1.34 ± 0.35 and 2.46 ± 1.74, respectively, for HYBRID, and 1.64 ± 0.64 and 3.03 ± 1.38, respectively, for BH.This reflects a lower variability of the results obtained from different executions of IGLOO compared to the two other methods.Due to the stochastic nature of the three algorithms, some runs deviate from their average behavior, as illustrated by the outliers shown in Figure 1.This confirms the need for a sufficient number of runs to obtain sufficient statistics for a meaningful comparative analysis of these methods.In summary, at short exploration times, the IGLOO algorithm locates structures of lower energy than the BH and HYBRID algorithms and with lower variability, demonstrating its ability to rapidly explore the PES with the objective of finding the global minimum.The lowest-energy structures identified by each global optimization method during long runs are depicted in Figure 2 for Met-enkephalin, and in Figure 3 for df-c-Myb, together with their energy and the execution time at which they were found.Regarding Met-enkephalin, the structures found by the three algorithms are similar and have very similar energies.The structure obtained by IGLOO was the one with the lowest-energy, followed by BH (+0.41 kcal/mol) and HYBRID (+0.80 kcal/mol).These energy differences come from the side-chain of residue 5 which breaks the stabilizing stacking interaction between the phenol group of residue 1 and the phenyl group of residue 4. The time needed for IGLOO to discover the lowest-energy structure of Met-enkephalin is around 2 and 20 times shorter than the one needed for the HYBRID and BH algorithms, respectively, to locate their optimal structures (which are still slightly less energetically favorable than the one found by IGLOO).This confirms the better performance of IGLOO in terms of convergence to the global minimum.The lowest-energy structures of df-c-Myb corresponding to each of the structural motifs (HPN1, HPN2 and HLX) are presented in Figure 3 for each of the three algorithms, including the time step when the structures were observed.A secondary structure was classified as a hairpin when it contains a β-turn H-bond ( 7 NH-4 O for HPN1 and 6 NH-3 O for HPN2) associated to at least one inter-strand H-bond.The possible inter-strand H-bonds are 9 NH-2 O, 3 NH-8 O, 8 NH-3 O and 4 NH- 7 O for HPN1 and 8 NH-1 O, 2 NH-7 O, 7 NH-2 O and 3 NH-6 O for HPN2.A secondary structure was classified as a helix when it contains at least two hydrogen bonds among 6 NH-2 O, 8 NH-4 O, 5 NH-1 O, 7 NH-3 O and 9 NH-5 O.A H-bond was considered to be present when the distance between a donor and an acceptor atom was shorter than a threshold value set to 3.3 Å.The three global optimization methods all succeed in visiting the three structural motif regions: HPN1, HPN2, and HLX.However, the methods differ in their ability to locate the lowest-energy structure for each motif.In the case of hairpins, IGLOO found a more favorable minimum than the ones obtained by BH (+17.52 kcal/mol for HPN1 and +1.41 kcal/mol for HPN2) and the HYBRID algorithm (+7.18 kcal/mol for HPN1 and +3.5 kcal/mol for HPN2).The performance with respect to the low-energy minima exhibiting helices diverges from that of the hairpins, with the BH method finding the best minimum; the best structures found by IGLOO and the HYBRID algorithm having energies 1.14 kcal/mol and 11.89 kcal/mol higher than the best one of the BH structure, respectively.We note that while the BH method finds low-energy helices, it encounters difficulties in the case of hairpins.The HYBRID algorithm explores the different regions of the PES but seems to perform poorly when it comes to refining the explorations locally.Finally, IGLOO manages to quickly identify diverse regions and perform refined local exploration of these regions, where the best minima in each region were found at similar CPU times during the simultaneous multi-basin exploration of the IGLOO runs.

Met-enkephalin
The two-dimensional projection of the minimized conformations issued from df-c-Myb PES explorations is depicted in Figure 4 for three evenly distributed CPU times (5 × 10 4 , 10 × 10 4 and 15 × 10 4 s).The two dimensions of the projection, the 4 Cα− 7 Cα and 2 Cα− 5 Cα interatomic distances, were chosen because they allow a clear spatial separation of the structural motifs of interest.Regions corresponding to HPN1, HPN2 and HLX structures can thus be approximately delineated on the 2D projection (ovals).The dark dots present on the 10 × 10 4 and 15 × 10 4 s snapshots correspond to minima already located at the CPU time of the previous snapshot, i.e., 5 × 10 4 and 10 × 10 4 s respectively.
The BH method struggles to explore the PES of df-c-Myb correctly, with some areas visited very infrequently (HPN2) or not at all (HLX) after 15 × 10 4 s of exploration, while the others are heavily oversampled.Indeed, very long times are needed to explore the conformational space corresponding to the helices, as shown by the ∼3 × 10 5 s needed to locate a low-energy helix (Figure 3).The BH local exploration also sometimes appears to be of poor quality, as evidenced by the high energy of the most stable HPN1 hairpin found by this algorithm, even though the algorithm has sampled a large part of the corresponding PES region.
The initial space visited by the HYBRID algorithm is more extended than the one visited by the BH method.After 5 × 10 4 s of exploration, minima are found in the three zones corresponding to the structural motifs.However, during the subsequent exploration, the algorithm concentrates on areas that have already been explored or those that are close to them, and struggles to extend the exploration to previously unvisited regions.Some areas therefore remain unexplored after 15 × 10 4 s while others appear to be oversampled.A major weakness of the HYBRID algorithm lies in its inability to properly explore locally: although the zones corresponding to the HPN1, HPN2, and HLX motifs are visited, the energies of the corresponding low-energy isomers found by the algorithm remain relatively high (Figure 3).
Finally, the IGLOO algorithm extends its initial exploration to the whole space and then concentrates on certain zones, including those corresponding to hairpins and helices.Its local refinement is also highly efficient, with the energy of the most stable isomer found by this method being the lowest one in the case of HPN1 and HPN2 and competitive for HLX (Figure 3).In summary, the BH method and the HYBRID algorithm struggle to cover the space to be explored, oversample certain areas, and show poor performance relative to local refinement.In contrast, the IGLOO algorithm explores the PES in a comprehensive fashion, focusing iteratively on the relevant basins.Furthermore, we note that IGLOO already reaches these relevant basins quite early during the search and thus allows to identify promising low-energy conformations with a rather small computational effort.

Conclusions and Outlook
In this study, we have presented a new global optimization algorithm, IGLOO, which combines the basic features of the RRT algorithm from the field of robotics, of the threshold algorithm that has in the past been used to study energy landscapes of various chemical and physical systems, and of repeated local stochastic quenches.Both the RRT and the threshold algorithm explore the landscape in many ways "orthogonal" to the standard global optimization and exploration procedures such as GA, SA, or BH.However, when combined with a moderately greedy local optimization algorithm, we find that IGLOO demonstrates a faster convergence to low-energy minima at a lower computational cost than the other two algorithms we have tested as a comparison (BH and HYBRID), even though they share tools such as frequent minimizations (BH and HYBRID) and the exploration capabilities of the RRT algorithm (HYBRID).In particular, IGLOO achieves a much smoother coverage of all important low-energy regions of the landscape and exhibits a high efficiency in exploring these zones on the landscape.Considering the tools common with the other algorithms, it appears that the threshold criterion likely makes a difference as it tends to focus the search on the low-energy regions of the landscape, without overly constraining the subsequent search inside these regions.In particular, by reducing the importance of lower-level energy barriers but still allowing the RRT-part of the search algorithm to stay in large regions that lie below a (possibly characteristic) threshold energy of the system, the combination of RRT, threshold, and local minimization results in a relatively fast yet efficient and homogeneous coverage of the various low-energy regions of the landscape.
In this "proof-of-principle" study, we have illustrated the ability of IGLOO to identify a representative set of low-energy conformations of flexible peptides.Nevertheless, we can envisage many other applications.A particularly interesting application would be the search for conformers of (small) organic molecules in the context of virtual screening for drug discovery, where IGLOO could be competitive with respect to other algorithms [24] thanks to its ability to rapidly find diverse low-energy conformations.IGLOO could also be applied to the generation of accessible conformations from a given one with minor modifications of the algorithm, incorporating additional features of the threshold algorithm.Indeed, the main aspect to be changed is the initial iteration of the algorithm, which should start from a given conformation instead of a set of randomly sampled states, and the energy threshold(s) controlling the accessibility should be initialized at the desired value(s).Going further, IGLOO could find applications in many other fields beyond molecular modeling, wherever global optimization methods are useful, including hyperparameter optimization in machine learning [45].Note, however, that due to the RRT-based exploration, IGLOO requires the definition of a suitable distance metric in the search space, which is not straightforward for all types of problems.
Concerning potential future applications of IGLOO to various types of chemical systems consisting of one or more molecules, either isolated or on a surface or within a medium, two issues are expected to arise: (i) the quality of the energy function, and (ii) the dimensionality and topology of the configuration space that needs to be explored.Regarding the energy function, full quantum mechanical evaluations of the energy are computationally very expensive and thus always a stretch for global landscape exploration.Nevertheless, energy functions with a good balance between accuracy and computational efficiency, e.g., based on density functional-based tight-binding (DFTB) [46], are available and have been successfully employed to globally explore energy landscapes of chemical systems [47].
The issue of the configuration space that needs to be smoothly but efficiently explored is a much more subtle one.We note that it will also appear in any application of IGLOO as a general global optimization algorithm for cost function landscapes drawn not only from robotics and chemistry but also from physics, biology, or economics.The concern here is the question whether the (T)RRT exploration methodology can be employed beyond compact state spaces such as the n-dimensional torus for a chain molecule or a multi-link robot arm where n angular variables characterize the microstates of the system.One class of systems for which (T)RRT appears to be very suitable are the landscapes of periodic approximants to crystalline solids, where the atom coordinates also exhibit a torus-like topology.While the threshold control and the stochastic quenches have proven their worth in many global landscapes studies, testing the (T)RRT feature of IGLOO for other classes of energy landscapes will be an important direction of future research.
The extension of IGLOO to a large variety of cost function problems will, of course, be accompanied by the fine-tuning of the current version of IGLOO, optimizing the parameters of the algorithm and possibly developing adaptive mechanisms for selection and modifying the IGLOO's control parameters based on information gathered about the landscape.Finally, the graph generation and exploration feature of the (T)RRT part of IGLOO should be a valuable tool for an efficient search for transition path candidates on the energy landscape.Combining this with the measurement of probability flows provided by the threshold algorithm could result in a promising approach for efficiently gaining deeper insights into the barrier structure of an energy landscape, which controls the stability of the minima configurations of a chemical system and governs the transformations among these structures and phases.
Another interesting direction for future work would be the extension of IGLOO to address general problems with high-dimensional state spaces.Up to now, we have successfully tested IGLOO on problems involving several dozen variables, but we can imagine difficulties in tackling problems involving hundreds or thousands of variables, which represent a huge challenge for global exploration.This extension may require a sophisticated parallel implementation of the algorithm.For the investigation presented in this work, we employed a basic multi-threading strategy exploiting the shared-memory architecture of current multi-core CPUs.The execution on larger computer clusters would need a more in-depth strategy, avoiding unnecessary communication between processes.Our previous work on the parallelization of RRT-like algorithms based on an automatic subdivision of space [48] could be a good starting point.In addition to the "high-level" parallelization of the algorithm for efficient execution on multiple CPUs, one can also envisage GPU-accelerated calculation of certain "low-level" functions inside the algorithm, such as energy or distance calculations.Thus, we expect IGLOO to prove to be a highly versatile tool for global optimization tasks and for the global exploration of complex energy landscapes in the future.

Algorithm 5 :
TRRT-Exploration input : Conformational space variables C Algorithm parameters and energy function P Initial conformations for the exploration (roots) Q output : Set of trees containing conformations T 1 T ← Q 2 while not StoppingCriteria(P, T ) do 3 T i ← SelectTree(T ) 4

Figure 1 .
Figure 1.Evolution of the Met-enkephalin (top) and df-c-Myb (bottom) lowest energy (in kcal/mol)for BH (red), HYBRID (blue) and IGLOO (green) algorithms as a function of CPU time (in s).The solid lines represent the average value over ten runs of each algorithm.The variability of their performance is illustrated by the colored area between the first and third quartile (dashed lines).Outliers are represented by red circles (BH), blue diamonds (HYBRID), and green crosses (IGLOO).

Figure 2 .Figure 3 .
Figure 2. Met-enkephalin lowest-energy structure found by BH (red), HYBRID (blue) and IGLOO (green) and associated energy (in kcal/mol) and CPU time (in s).Residue numbers are depicted in white.

Figure 4 .
Figure 4. Distribution of minimized conformations of df-c-Myb on a two-dimensional projection at three different CPU times (in s) for BH (red), HYBRID (blue), and IGLOO (green).The two dimensions are defined by the 4 Cα− 7 Cα and 2 Cα− 5 Cα interatomic distances (in Å).HPN1, HPN2 and HLX regions are depicted by oblique hatches, horizontal hatches and grids, respectively.On each thumbnail, the dark dots correspond to the minima already located during the CPU time corresponding to the previous thumbnail.

Figure A1 .
Figure A1.Two-dimensional projections, using the Φ and Ψ dihedral angles, of explored conformations of the alanine dipeptide after four iterations (a-d) of the IGLOO algorithm.Each sampled conformations is depicted with a point, and it is connected to its parent node in the tree by an edge.Each point and edge is colored on a gradient from gray to black, providing information on the order in which the conformations were sampled.The first conformations sampled are in light gray, the last in black.Once locally minimized, conformations are represented by white circles.Nodes selected as roots for the next iteration, which correspond to representative conformations in the different basins, are depicted by red crosses.(a) Extensive exploration of the potential energy surface with an energy threshold allowing the global exploration and the crossing of high-energy barriers.(b) Reduction of the exploration step, inducing closer proximity of the nodes, and of the energy threshold, revealing lower barriers.(c) Further reduction of the exploration step and energy threshold allows the achievement of a resolution enabling basin separation.(d) The threshold value reached limits the exploration to low-energy basins.