Improved Cluster Structure Optimization: Hybridizing Evolutionary Algorithms with Local Heat Pulses

Cluster structure optimization (CSO) refers to finding the globally minimal cluster structure with respect to a specific model and quality criterion, and is a computationally extraordinarily hard problem. Here we report a successful hybridization of evolutionary algorithms (EAs) with local heat pulses (LHPs). We describe the algorithm’s implementation and assess its performance with hard benchmark CSO cases. EA-LHP showed superior performance compared to regular EAs. Additionally, the EA-LHP hybrid is an unbiased, general CSO algorithm requiring no system-specific solution knowledge. These are compelling arguments for a wider future use of EA-LHP in CSO.


Introduction
Evolutionary algorithms (EAs) remain one of the prime strategies used to solve global optimization problems such as global cluster structure optimization (CSO). Global CSO attempts to find atomic or molecular cluster conformers with globally minimal energy, or with globally minimal deviations from measured properties [1]. This is of immediate interest in many areas-both in fundamental science (e.g., non-continuum nucleation [2] and solvation [3]) and in applied studies (e.g., aerosols [4], designed materials [5], combustion/astrophysics [6,7], and semiconductors [8][9][10][11], involving carbon clusters, silicon clusters, and many other systems). CSO is considered a non-deterministic polynomial-hard (NP-hard) problem; i.e., the number of minima scales exponentially with the dimensionality of the problem. Breaking through this exponential wall is hence the first challenge and main obstacle for any solution strategy. One of us has shown [12] that EAs can be made to locate global minima of Lennard-Jones-type atomic clusters in roughly cubic time complexity. This is a significant scaling reduction compared to the exponential scaling of brute-force enumerative solutions. However, we have shown these same EAs to solve many of the standard analytical benchmark functions [13] in almost linear (i.e., optimal) time complexity, which so far has not been achieved for CSO. CSO is therefore a difficult challenge for the scaling reduction ability of global optimization strategies.
Second, if we are to connect our simulation data with experiment-as should be our target as scientists-we require the hypersurface model to be accurate enough for this purpose. Unfortunately, this requirement means we must typically employ computationally expensive models for the hypersurface itself. For a given cluster size (i.e., independent of the size scaling), this generates a high prefactor on the computational cost, which seriously limits search space exploration and/or enforces restriction of the search to small clusters. This constitutes the second challenge for global CSO: prefactor reduction. Current cases in point are CSO studies performed directly at the density functional theory (DFT) level [14], which are limited to very small clusters despite the use of national supercomputer resources.
Efficiency improvements to the model (e.g., reduced-scaling techniques) and efficiency improvements to the search itself (e.g., those of the present work) alleviate some of the computational expense and hence can extend direct-DFT CSO to larger systems. However, given the huge prefactor of first-principles methods compared to force-field calculations (4-6 orders of magnitude), equally huge efficiency improvements would be necessary.
Prefactor reduction can be approached in fundamentally different ways. A very successful avenue is the inclusion of system-specific deterministic knowledge into the otherwise generic heuristic EA. However, this simpler approach relies on the similarity of the problem to be solved to the reference from which the deterministic knowledge was extracted. Beyond simple systems, such similarity may be hard to ensure, hence biasing the solution strategy in the wrong direction and potentially jeopardizing its ability to locate the correct global minimum.
A theoretically superior but practically harder avenue is the analysis of a given global optimization strategy for its strengths and weaknesses in hypersurface exploration and exploitation, followed by algorithmic improvements designed to eliminate weaknesses while maintaining strengths. This more generic approach ensures that the similarity requirement is less strict. Again, various options arise. We have contributed, e.g., niches [12] and graph-based technologies [15], and others have employed order parameters [16,17] (which are similar to niches, in guiding the search by criteria other than the property being optimized), bond transpositions in Monte-Carlo graph search [18], and graph partitioning [19].
Over the course of the last two decades, researchers in the field have hypothesized that hybridizing EAs with other solution strategies of complementary strength profiles would result in overall superior behavior. Proofs-of-concept to this effect were demonstrated. Attempts to evolve beyond this stage were mainly unsuccessful. For example, in unpublished work, both of the present authors independently attempted to hybridize EAs with Monte-Carlo with minimizations (MCM [20]; later known as basin-hopping [21]), but failed to arrive at a hybrid that performed significantly better than EAs alone. Augmenting MCM with short molecular-dynamics trajectories, as an indiscriminate heating of the entire cluster, has been successfully employed in two studies of transition-metal nanoalloys [22,23].
In this publication, we introduce a successful attempt to hybridize EAs with another global optimization strategy-local heat pulses (LHPs). We will first concisely introduce our EA implementation and our implementation of LHPs, with an emphasis on changes of the latter compared to the original. Subsequently, we will discuss our hybridization strategy in detail. We then present benchmark results for a selection of hard CSO problems and conclude with an outlook on how this hybrid strategy will enable us to connect theory and experiment in the future with a smaller computational footprint.
In 1997, Möbius, Hoffmann, and Schreiber [24] proposed "thermal cycling" as an improvement over simulated annealing. Briefly, thermal cycling consists of two steps: (a) "locally heating" the system by perturbing a small part of it (e.g., via a short series of Metropolis Monte-Carlo steps at a given temperature T, or by applying small changes in a randomly selected region), followed by (b) a quench (e.g., via local optimization). Steps (a) and (b) are repeated, while T is lowered gradually. For several traveling-salesman instances, substantial improvements over standard simulated annealing were found.
In 2011, Möbius and Schön [25] extended this concept to continuous problems, now under the name of "local heat pulses", and applied it to structure determination of the periodic bulk system Mg10Al4Ge2Si8O36, using a force field containing Buckingham and three-body terms. Again, they could demonstrate the superiority of LHP compared to standard simulated annealing, and also to multistart local search.
Clearly, LHP is historically and conceptually related to simulated annealing. However, it is also similar to MCM (aka basin-hopping): basically, both consist of cycling through two steps, namely getting out of the current minimum and quenching into the next one. The difference is in the first step: in MCM, as well as in typical Monte-Carlo mutation operators of genetic algorithm (GA)/EA optimization, this "stepping-out" is achieved by randomly mutating randomly chosen particles. In contrast, in LHP these mutations are not entirely random; instead, both the choice of particles and the extent of the mutation are spatially localized. In this sense, the mutation step of LHP can be interpreted as a "phenotype" version of a Monte-Carlo mutation, in which prior spatial neighborhoods of the cluster particles are preserved instead of being potentially destroyed. In GA/EA applications to CSO, the phenotype crossover by Deaven and Ho [12,26-28] turned out to be superior to the traditional genotype crossover, because the latter did not take spatial neighborhoods into account. Similarly, with the present work we show that the phenotype character of LHP tends to make it superior to traditional MC mutation, which acts in a spatially random fashion.
In the following, we will first introduce EAs and our strategy for hybridizing them with LHP. We will discuss details of our LHP implementation in light of the algorithmic requirements posed by hybridization. Subsequently, we will test the EA-LHP hybrid against the pure EA and LHP approaches on extremely hard CSO test cases and analyze its performance. We conclude with a summary and an outlook on future applications of EA-LHP.

Evolutionary Algorithms
EAs encompass a large set of algorithms and implementations. In the following, we restrict ourselves to the implementation employed here and highlight its features of relevance to this work. More exhaustive discussions can be found in References [29,30] for our implementation and References [31][32][33][34] for EAs in general.
EAs have a history of using biology-inspired terms to describe algorithmic steps. A convincing argument has been made that this is needlessly obfuscating [35,36]. In the following, we will hence limit our use of biological terms to a minimum.
Our pool-based EA maintains a set of energetically low-lying isomers or conformers (henceforth referred to as candidates) throughout the optimization. Each candidate contains an encoding of its solution string. For CSO, these are the nuclear coordinates as a mix of absolute Cartesian center-of-mass positions, Euler angles, and internal Cartesians. At startup, this pool is filled with initial solution candidates, based on randomly packing molecules and relaxing the resulting clusters. This is already a first step in gathering knowledge about possible solutions (e.g., on which particle pair-distances are presumably optimal). In fact, for very small clusters, the global minimum may already be in this initial pool. For larger clusters this quickly becomes unlikely, and in such a case simply increasing the initial pool is not a good strategy, since series of local optimizations from random starting structures will not suffice [37]. Instead, exchanging the partial knowledge embodied in the candidates and further varying these candidates is much more profitable. Hence, we next select two known candidates from the pool for further operations-one based on a Gaussian-weighted fitness distribution, the other randomly. First, solution knowledge is exchanged between the old candidates to form two new candidates (crossover). This exchange may happen based either on the numeric representation (genotype) or on the physical form (phenotype). We have found phenotype exchanges [26][27][28] by cutting through clusters and exchanging the parts above the cutting plane to be superior to other options [12]. Subsequently, the two new solution candidates can be augmented with new solution knowledge to explore the solution space further (mutation, with a configured probability). This operation may take the form of selecting and moving a few random atoms through a Monte-Carlo (MC) type move, a graph-based approach [15] in which the least connected molecules are identified and directed into a more connected position, or system-specific manipulations. In this contribution, we use LHP as a means to manipulate the cluster structure for better conformational space exploration. We discuss our implementation of LHP and its role in conformational space exploration below.
After these operations are completed, the candidate structures are exposed to fast quality filters. In the case of CSO, graph-based filters assert that the candidates neither have colliding nuclei nor parts of the cluster dissociated from the rest. This step ensures that only candidates of a minimal quality enter the next, computationally most intensive step: local optimization. Each candidate geometry is quenched to a local minimum of the chosen model. We employ classical force fields in this work (see below)-the fastest option available. Nevertheless, the local optimization steps account for the vast majority of the computational expense.
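A minimal distance-based stand-in for such quality filters could look as follows. This is an illustrative sketch, not OGOLEM's actual graph-based implementation: the cutoff values are assumptions, and the dissociation test is crude (it only flags fully isolated atoms, not detached multi-atom fragments).

```python
import numpy as np

def sanity_filter(coords, r_min=0.7, r_max=4.0):
    """Reject candidate geometries with colliding or dissociated atoms.

    coords : (n, 3) array-like of Cartesian positions.
    r_min  : below this distance, two atoms count as colliding (assumed value).
    r_max  : an atom with no neighbor closer than this counts as dissociated
             (assumed value; real filters would use per-element radii and a
             graph-connectivity check instead of this per-atom shortcut).
    """
    coords = np.asarray(coords, dtype=float)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)      # ignore self-distances
    if (dists < r_min).any():            # colliding nuclei
        return False
    if (dists.min(axis=1) > r_max).any():  # isolated atom -> dissociated
        return False
    return True
```

Candidates failing this cheap test are discarded before any expensive local optimization is attempted.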
The hybridization described so far employs LHP as a mutation step. Alternatively, we can integrate LHP as a replacement for the plain local optimizations; i.e., after regular crossover and mutation, instead of a simple local optimization, LHP is employed for a few iterations to sample the vicinity of the candidates.
We then compare the total energies of the two candidates after optimization, pick the lower-energy (i.e., better) one, and attempt to add it to the pool of candidates. The pool only accepts the candidate if a minimal diversity with respect to the other candidates in the pool can be ensured and if the energy of the candidate to be added is better than that of the worst individual in the pool. This minimal diversity is enforced either through a primitive comparison of total energies or through more advanced technologies such as niching [12]. If the candidate qualifies for addition, the pool size is kept below a configured maximum through removal of the worst individual(s).
Subsequently, the next two candidates are picked from the pool, and the cycle repeats until either a maximal number of steps is reached or the best candidate in the pool has an energy less than or equal to a configured cutoff. Obviously, the cycles can be parallelized with great efficiency, as the only serial step is the addition to the pool of candidates, which is computationally significantly cheaper than local optimization, even if force fields and non-trivial diversity measures are used.
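The pool cycle described above can be sketched as follows. This is a bare-bones illustration under stated assumptions, not OGOLEM's implementation: `energy`, `crossover`, `mutate`, and `local_opt` are user-supplied hooks, the Gaussian-weighted fitness selection is simplified to a two-way tournament, and diversity is reduced to the primitive energy-gap criterion.

```python
import random

def ea_pool_search(init_pool, energy, crossover, mutate, local_opt,
                   max_steps=1000, e_cut=None, pool_size=20,
                   mut_prob=0.5, min_gap=1e-6):
    """Minimal pool-based EA cycle: select two parents, cross over,
    (maybe) mutate, locally optimize, then attempt pool addition under
    a primitive energy-based diversity criterion."""
    pool = sorted((local_opt(c) for c in init_pool), key=energy)[:pool_size]
    for _ in range(max_steps):
        # one parent fitness-biased towards low energy (simplified
        # stand-in for Gaussian-weighted selection), one chosen randomly
        p1 = min(random.sample(pool, 2), key=energy)
        p2 = random.choice(pool)
        children = crossover(p1, p2)
        if random.random() < mut_prob:
            children = [mutate(c) for c in children]
        best = min((local_opt(c) for c in children), key=energy)
        # accept only if diverse (energy gap) and better than the worst
        if all(abs(energy(best) - energy(c)) > min_gap for c in pool) \
                and energy(best) < energy(pool[-1]):
            pool = sorted(pool + [best], key=energy)[:pool_size]
        if e_cut is not None and energy(pool[0]) <= e_cut:
            break
    return pool[0]
```

In the real algorithm these cycles run in parallel, with only the pool-addition step serialized.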
Lastly, for the sake of comparison, we have also enabled the option to use LHP for its intended purpose as a standalone global optimization. We want to note explicitly that we do not claim that our implementation is an optimal LHP for this purpose. Indeed, our semi-local tuning may be detrimental to its efficiency as a global optimization. Instead, we simply want to compare our implementation as part of the hybrid EA-LHP approach to it running alone. If optimal LHP performance is required, we currently refer to the original implementations. The LHP also supports resetting to either a known good geometry or reinitializing with a randomized solution candidate. This breaks the Markov chain of events of LHP and hence serves as a good benchmark for LHP's performance without considering accumulated system knowledge.

Local Heat Pulses and EA-LHP Hybrid
Our implementation of LHP follows the ideas of Möbius et al., but differs in key aspects. At the most basic level, local heat pulses go through a repeated protocol of selecting a set of atoms to perturb, perturbing them (i.e., heating them), and quenching the so-perturbed geometry by means of a local optimization. Finally, a criterion is applied to decide whether to accept the quenched geometry and base the next local heat cycle on it, or to revert to the previously used geometry instead.
As we stated before, integration with our existing EA protocol is straightforward: we simply use LHP as the solution augmentation step, in contrast to its original use as a complete global optimization strategy. The requirements for these two usages are somewhat different: a global optimization protocol must be able to efficiently explore the conformational space, exploit semi-local regions/funnels to find low-lying minima, and ensure enough diversity in solution candidates to progress. On the other hand, as a substep of the overall optimization protocol, the solution augmentation step has a reduced responsibility. It only needs to explore and exploit semi-local regions of conformational space. Other parts of the EA protocol have other responsibilities: the exchange of solution knowledge ensures long-range exploration, the use of local optimization ensures local exploitation, and the pool with its diversity criteria and potential niching ensures that progress will be made to locate the global minimum.
Hence, we can tune the LHP implementation for this intended purpose. LHP's perturbation of a set of atoms lends itself from the onset to being used as a semi-local exploitation step. We can further strengthen this aspect through how the to-be-perturbed nuclei are selected. Instead of randomly choosing atoms, we try to localize our selection operators. Beyond trivial global selectors for a fixed number or percentage of the nuclei, we have implemented semi-local phenotype selectors that create a random pulse origin either inside or on the surface of the cluster and then pick nuclei (or centers of mass of molecules) within a specified radius around this point for perturbation. The perturbation that follows is also localized, by weighting the strength of each move with a three-dimensional Gaussian centered at the pulse origin.
After the quenching local optimization, we apply either a strict criterion, in which only steps decreasing the energy are accepted and used for the next step, or a standard Metropolis criterion using an artificial temperature.
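A single heat-pulse cycle along these lines might be sketched as follows. This is a simplified illustration, not our production code: the pulse origin is placed on a randomly chosen particle, the hard selection radius and the Gaussian weighting are merged into one Gaussian falloff, and `local_opt` is an assumed quench hook; the pulse radius, kick size, and temperature defaults are illustrative assumptions.

```python
import numpy as np

def lhp_step(coords, energy, local_opt, pulse_radius=3.0, kick=0.5,
             temperature=None, rng=None):
    """One local-heat-pulse cycle: pick a random pulse origin at a particle,
    perturb nearby particles with Gaussian-decaying strength, quench via
    local_opt, and accept strictly downhill (temperature=None) or by a
    Metropolis criterion at an artificial temperature."""
    rng = rng if rng is not None else np.random.default_rng()
    origin = coords[rng.integers(len(coords))]           # pulse origin
    dists = np.linalg.norm(coords - origin, axis=1)
    weights = np.exp(-0.5 * (dists / pulse_radius) ** 2)  # 3D Gaussian falloff
    trial = coords + kick * weights[:, None] * rng.standard_normal(coords.shape)
    trial = local_opt(trial)                              # quench
    d_e = energy(trial) - energy(coords)
    if d_e <= 0.0:
        return trial, True                                # strict acceptance
    if temperature is not None and rng.random() < np.exp(-d_e / temperature):
        return trial, True                                # Metropolis acceptance
    return coords, False                                  # revert
```

Used as a mutation operator inside the EA, only a few such cycles are applied per candidate, since long-range exploration is handled by crossover and the pool.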
All implementations described in this publication are available free of charge under a BSD 4-clause license from https://www.ogolem.org as part of the OGOLEM framework.

Benchmark System Models
As explained in the introduction, one aim of the present work is to arrive at a CSO algorithm that offers improved performance compared to standard EAs but without relying on system-specific information; i.e., performance improvements should occur for very different application cases. Hence, we have tested our EA-LHP hybrid both with atomic clusters and with molecular ones, and also for clusters without and with directional interactions between the particles. These desirable characteristics can be covered with just two pure neutral cluster systems as test cases: Lennard-Jones (LJ) clusters and water clusters. Notably, the energy landscape of TIP4P (transferable intermolecular potential, 4 points) water clusters can reasonably be expected to differ strongly from that of LJ clusters, since TIP4P supports strongly directional molecular interactions and reasonably captures the propensity of water molecules to form ordered H-bonded chains and rings.

Lennard-Jones Clusters
LJ clusters are defined via total cluster energies

E_LJ = 4ε Σ_{i<j} [ (σ/r_ij)^12 − (σ/r_ij)^6 ],

with pair well depth ε, pair equilibrium distance r_e = 2^(1/6) σ, interatomic distances r_ij, and indices i, j enumerating the atoms. The LJ model is a reasonable first-order approximation for systems dominated by van-der-Waals interactions (e.g., rare gases), but even for this purpose it is not quantitatively accurate [38]. Of course, we use the LJ model here not for its (limited) application accuracy, but for two other reasons: it is a well-established benchmark system for CSO, and by construction LJ particles have exclusively pairwise and perfectly isotropic particle-particle interactions.
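In reduced units (ε = σ = 1), the total LJ energy above can be evaluated directly, e.g.:

```python
import numpy as np

def lj_energy(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy, summing 4*eps*[(sigma/r)^12 - (sigma/r)^6]
    over all unique atom pairs i < j."""
    coords = np.asarray(coords, dtype=float)
    i, j = np.triu_indices(len(coords), k=1)   # all pairs i < j
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    sr6 = (sigma / r) ** 6
    return float(np.sum(4.0 * epsilon * (sr6 ** 2 - sr6)))
```

At the pair equilibrium distance r_e = 2^(1/6) σ, a single pair contributes exactly −ε, as expected from the formula.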
For different cluster sizes n, homogeneous LJn clusters offer a wide range of CSO challenges, from very simple single-funnel landscapes to difficult deceptive ones, in which the global minimum-energy structure resides in a narrow funnel, separated by high barriers from the remaining search region, which is dominated by significantly different structures. These situations have been analyzed in detail [39]. From the well-known most challenging cases below n = 150 we pick n = 98, since in that case not just two structural types (icosahedral and decahedral) compete with each other in a close race, but the true global minimum is of yet another structural type, with Td symmetry [40]. Hence, for LJ98, a successful CSO run not only has to get beyond the icosahedral structures dominating most of the search landscape, but also has to differentiate between decahedral and Td as two elusive alternatives. Cartesian coordinates for these three minima are provided as Supplementary Materials.

Water Clusters
Water clusters differ from LJ clusters not only by being molecular; they also feature strongly directional interactions (hydrogen bonds) that favor far fewer than 12 nearest neighbors. Hence, search landscapes that differ strongly from those exhibited by LJ clusters can be expected. Compared to these basic features, the highly non-trivial choice of which water model to use is of secondary importance. Here, we chose the TIP4P model [41].
TIP4P water clusters are defined by

E_TIP4P = Σ_{m<n} [ A/r_OO^12 − B/r_OO^6 + Σ_{i,j} q_i q_j e^2 / r_ij ],

where indices m, n, with m < n, enumerate individual water molecules, r_OO is the O-O distance between molecules m and n, and indices i, j run over the charged sites of molecules m and n, respectively. Water molecules are rigid, with an OH bond length of 0.9572 Å and a bond angle of 104.52 degrees. Partial charges q of +0.52 sit on all H-atoms, while a partial charge of −1.04 sits on the so-called M-site on the HOH angle bisector, 0.15 Å away from the O-atom. The first two LJ-like terms are centered on the O-atoms themselves and prevent water molecules from collapsing onto each other. The two parameter values A = 6 × 10^5 Å^12 kcal/mol and B = 610 Å^6 kcal/mol complete the TIP4P definition. TIP4P water exhibits reasonable H-bond patterns and (in contrast to, e.g., TIP3P) a phase diagram that at least vaguely resembles the true one. Again, we use TIP4P here not for modeling true water clusters, but because it is also frequently employed as a CSO benchmark.
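As an illustration, the TIP4P energy expression can be sketched as follows. This is a simplified version in which each rigid monomer is assumed to be supplied with its sites already rotated and translated into the cluster frame (orientation handling is omitted); 332.0636 kcal mol⁻¹ Å e⁻² is the usual Coulomb conversion factor for charges in units of e and distances in Å.

```python
import numpy as np

K_COUL = 332.0636        # Coulomb constant, kcal/mol * Angstrom / e^2
A, B = 6.0e5, 610.0      # O-O repulsion/dispersion, kcal/mol * A^12 / A^6
Q_H, Q_M = 0.52, -1.04   # partial charges in units of e

def tip4p_monomer():
    """Site positions (Angstrom) of one rigid TIP4P water in a local frame:
    O at the origin, H-atoms in the xz-plane, M-site on the HOH bisector."""
    r_oh, theta = 0.9572, np.radians(104.52)
    h1 = r_oh * np.array([np.sin(theta / 2), 0.0, np.cos(theta / 2)])
    h2 = r_oh * np.array([-np.sin(theta / 2), 0.0, np.cos(theta / 2)])
    m = np.array([0.0, 0.0, 0.15])  # M-site, 0.15 A from O on the bisector
    return {"O": np.zeros(3), "charged": [(Q_H, h1), (Q_H, h2), (Q_M, m)]}

def tip4p_energy(molecules):
    """Total TIP4P energy (kcal/mol) of rigid waters, each given as a dict
    from tip4p_monomer() already placed in the cluster frame."""
    e = 0.0
    for a in range(len(molecules)):
        for b in range(a + 1, len(molecules)):
            r_oo = np.linalg.norm(molecules[a]["O"] - molecules[b]["O"])
            e += A / r_oo**12 - B / r_oo**6          # O-centered LJ-like terms
            for qi, ri in molecules[a]["charged"]:   # intermolecular Coulomb
                for qj, rj in molecules[b]["charged"]:
                    e += K_COUL * qi * qj / np.linalg.norm(ri - rj)
    return e
```

Since each monomer is overall neutral, the interaction between two well-separated molecules decays like a dipole-dipole term and vanishes at large distance.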
Homogeneous neutral water clusters modeled with the TIP4P potential are simple for CSO algorithms up to n = 15. Beyond this limit, not only do differences between different water potentials start to have qualitative effects [42], but the search landscapes also become significantly more challenging than for clusters consisting of isotropic atoms without directional bonding propensities. A case in point is n = 21 with TIP4P, where a low-energy local minimum with a reasonable-looking geometry is easy to find, while the true global minimum structure is very different and difficult to locate [43,44]-in other words, this is another deceptive search landscape. Cartesian coordinates for these two minima are provided as Supplementary Materials.

LJ98
For LJ98, we have compared an EA setup with LHP as the main mutation ingredient with another EA setup in which LHP is replaced by a standard Monte-Carlo mutation. In both cases, a pool size of 2000 individuals was chosen, and LJ neighborhood niching [45] was employed, as well as a mix of several phenotype crossover operators. Twenty-five percent of the mutations were done with our graph-based directed mutation [15]. In the remaining 75% of the mutation events, either LHP-mutation or MC-mutation was used. In both cases, the number of mutated particle positions was Gaussian-distributed, with a similar mean and width of the distribution, such that in most cases between two and five particle positions were changed.
Table 1 compares the numbers of global optimization steps needed until the true Td global minimum is found, starting from random seeds.
To cope with the typically large random scatter of non-deterministic search, 50 runs were performed in each case, and both average and median results are shown. Note that in each run, the maximum number of global optimization steps was set to 4.2 million, and unsuccessful runs were counted as successful after 4.2 million steps. Therefore, the results in Table 1 contain a bias, making bad results look better than they actually are. Despite this disadvantage, LHP mutation is clearly better than conventional Monte-Carlo mutation. We have used the phenotype-style spatial localization of LHP "heating" here (concerning both the selection of atoms to be moved and the amount of their movement), and this is the decisive difference between LHP and MC mutation in these runs. Therefore, it can be concluded that this phenotype mutation is beneficial-at least in this very hard CSO test case.

(H2O)21
For the (H2O)21 EA cases, we used a pool size of 84 individuals and 2 million global optimization steps. As for the LJ clusters, we employed a mix of two mutation operators here: 25% of the mutations were always done with our graph-based directed mutation [15]. The remaining 75% were either LHP-mutation or a standard Monte-Carlo mutation, here operating directly in the six-dimensional external-coordinate space of each water molecule (three positional coordinates of the center of mass and three Euler angles determining the rotational orientation), denoted extMC in the following. For LHP-mutation, the number of mutated molecules was chosen randomly via a Gaussian probability distribution; below we show results for three different choices of this Gaussian width.
In contrast to the LJ clusters, we used only a straightforward phenotype crossover operator (OGOLEM's sweden); this brings our CSO runs in a range where the effects of LHP are most easily visible.
Normally in our EA runs, each crossover/mutation global optimization step was finished by subjecting the resulting new clusters to a local geometry optimization.Here, we have performed additional extMC-EA-runs in which LHP was applied after crossover/mutation, but before local optimization.Hence, in these cases, LHP was added on top of a fully functional EA-based CSO.
In contrast to all this, we also used LHP closer to the original intention; i.e., in a stand-alone manner, without any EA ingredients.
The strongly deceptive search landscape of (H2O)21 in the TIP4P model leads to an even wider spread of the number of global optimization steps needed to find the true global minimum than for LJ98. Hence, in Table 2, for the cases mentioned above, we compare a different measure of success-namely, the percentage of runs in which the true global minimum is found within the prescribed 2 million steps. Note that in the "wide" case of the LHP-as-mutation distribution, there is a significantly non-zero chance that most or even all molecules are affected by an LHP instance: 19 of 21 molecules are mutated with a chance of 2%, and all molecules with a chance of 0.007%. Conversely, for the "narrow" case, these probabilities are much smaller (zero percent for 16 molecules and more, 1% for 12 molecules). However, as Table 2 indicates, this significantly reduces the effectiveness of LHP. Partly, this re-emphasizes the visual finding that the true global minimum and the more dominant region of the search landscape correspond to very different structures, and hence large changes in cluster structures are needed to "jump" from one funnel to the other. Nevertheless, when such large structure changes are done with LHP, they can be less disruptive to the local particle neighborhoods (in this case, also to the H-bond network), since in our LHP implementation the amount of positional and orientational mutation is spatially ordered and localized-in contrast to the completely randomized Monte-Carlo mutation ("extMC"). For this reason, with proper tuning, LHP can be better than extMC, as documented by Table 2.
When LHP was inserted as an additional step after normal crossover/mutation and before local optimization, two thirds of all runs were successful, which is roughly twice the success rate of extMC or LHP acting alone. At first sight, this is not very surprising, since then both extMC and LHP are applied in each global search step, so it seems reasonable that the combined success is the sum of the isolated successes. However, this also means that (at least in this case) LHP and extMC are largely "orthogonal" to each other; i.e., one search mode can achieve changes that the other cannot, and vice versa. In any case, this test shows that adding LHP to an otherwise standard EA scheme does indeed improve the overall search power significantly. However, it has to be remembered that in this test case we used a rather simplistic crossover/mutation set in our EA. With more sophisticated EA ingredients, the impact of adding LHP may well be smaller.
On the other hand, operating LHP by itself, without the usual EA ingredients (a pool of solutions being improved in parallel via crossover/mutation and selection), is clearly worse than even the simplistic EA used here, with or without LHP as mutation operator. Unexpectedly, periodically resetting standalone LHP to new random structures-in an effort to enhance exploration-does not help, but even lowers the success rate markedly.

Discussion and Conclusions
With hard and deceptive CSO application examples of very different characteristics, we have shown that EAs do benefit from hybridization with LHP, producing a combined search algorithm that is better than both of these ingredients on their own. This is a non-trivial result, since both ingredients alone had previously been shown to solve hard global optimization problems, and since no system-specific knowledge was employed in their combination.
Extending the perspective beyond CSO towards other global optimization problems, however, it should be emphasized that both EA and LHP-as implemented and used here-do profit from more general a-priori knowledge about CSO, namely locality in 3D space. Whatever the nature of the interactions between cluster particles may be (covalent bonds, hydrogen bonds, van-der-Waals or Coulomb interactions, etc.), all of them decay to zero with increasing distance. For all but the smallest clusters (for which CSO is not problematic anyway), this guarantees that localized changes in one region of the cluster do not strongly couple to changes in other, far-away regions. This may well be different in other global optimization problems. Hence, even for EA-LHP, by construction there still is no free lunch [46,47].

Table 1 .
Numbers of global optimization steps until the true Td global minimum is found, comparing local heat pulse (LHP) and standard Monte-Carlo (MC) as main mutation operators.

For (H2O)21 in the TIP4P model, we have again compared LHP mutation with conventional Monte-Carlo mutation. Additionally, we have also tested LHP as a pre-stage to local optimization within our standard global-local hybrid EA scheme, and LHP as a stand-alone global optimization algorithm, without any EA ingredients.

Table 2 .
Percentage of runs finding the true global minimum within 2 million steps for different cluster structure optimization (CSO) setups, as discussed in the text.