Quantum Chemical Microsolvation by Automated Water Placement

We developed a quantitative approach to quantum chemical microsolvation. Key in our methodology is the automatic placement of individual solvent molecules based on the free energy solvation thermodynamics derived from molecular dynamics (MD) simulations and grid inhomogeneous solvation theory (GIST). This protocol enabled us to rigorously define the number, position, and orientation of individual solvent molecules and to determine their interaction with the solute based on physical quantities. The generated solute–solvent clusters served as an input for subsequent quantum chemical investigations. We showcased the applicability, scope, and limitations of this computational approach for a number of small molecules, including urea, 2-aminobenzothiazole, (+)-syn-benzotriborneol, benzoic acid, and helicene. Our results show excellent agreement with the available ab initio molecular dynamics data and experimental results.


Introduction
Solvents play a pivotal role in solution phase chemistry. The efficiency of many chemical transformations ranging from organic reactions to (transition-)metal catalyzed processes is intricately linked to the solvent in which they are carried out [1][2][3][4]. As a result, the chemically relevant solution phase conformations of the reactants, intermediates, and products may differ significantly from their gas phase or solid-state counterparts and can even depend on the specific solvent [5]. This is in particular true for bio-or biomimetic molecules and biochemical reactions that take place in water.
To accurately account for the reactivity in the condensed phase, a reliable characterization is necessary. Despite experimental advances, the achieved temporal and spatial resolution is often not sufficient to decipher the impact of complex solvent effects. Theoretical investigations are an attractive alternative to fill this gap, but the accurate treatment of (explicit) solvation is still an active field of research in computational chemistry [5][6][7][8].
Yet, several strategies exist. In the simplest model, only bulk properties of the solvent are considered. These so-called implicit solvation models are computationally cost-effective and have long been applied in computational chemistry [9][10][11][12]. However, they fail if explicit solutesolvent interactions are expected to play a role. At the other end of the spectrum, condensed phase approaches exist, which take a large number of solvent molecules into account. Most accurately, solute and solvent are treated quantum-mechanically. However, such ab initio molecular dynamics (MD) approaches are quite expensive and limited to relatively small systems [13][14][15]. As a third strategy, explicit-implicit solvation approaches-also referred to as microsolvation-have emerged as a low-cost alternative. Here, a small number of explicit solvent molecules are incorporated in a quantum chemical calculation [8,[16][17][18][19][20].
However, microsolvation approaches implicate new challenges such as how many solvent molecules to include, where to put them, and how to orient them. Until today, numerous methodologies to microsolvation have been reported, which can be largely divided into two categories: bottom-up approaches, where starting from the solute alone, explicit solvent molecules are successively added, and top-down approaches, where starting from a condensed phase calculation-often a classical MD simulation-a few explicit solvent molecules are selected.
Bottom-up approaches comprise, for example, the cluster-continuum approach by Pliego and coworkers, where individual solvent molecules are added and the free energy of solvation, ∆G solv , is calculated quantum-mechanically. The number of solvents that minimizes ∆G solv is then taken for further investigations. Alternative approaches are based on an analysis of the molecular electrostatic potential to identify interaction sites [21][22][23]. Following a related idea, Sure et al. proposed a protocol based on the analysis of the screening charge densities [24]. Another idea is to start from "idealized" solvent geometries, such as the structures of clathrates, to describe microsolvated amino acids [25]. Simm and coworkers developed a stochastic approach, where interaction surfaces of solute and solvent are identified and randomly combined. Additional conformational sampling ensures generation of a structural ensemble [26]. More recently, global optimization procedures in combination with machine learning techniques have emerged to speed up the generation and identification of the most stable microsolvated clusters [27,28].
However, a general problem arises when taking the solute as starting point to build the microsolvation model: the question is how to define a reliable solute conformation without knowing it a priori. Convergence may be challenged if the structure changes drastically with increasing number of explicitly treated solvent molecules. In addition, evaluation of many solute-solvent cluster configurations may be necessary to identify the most stable arrangement, which is computationally demanding.
Top-down approaches circumvent this issue by starting the investigation from a condensed phase structure. Here, microsolvated solute structures are often derived from snapshots of a classical MD simulation or from Monte Carlo sampling. Typically, all solvent molecules within a certain distance to the solute are included in a follow-up quantum chemical calculation (for selected examples, see Refs. [17,[29][30][31]). Following a slightly different approach, Nocito and Baran developed an averaged condensed phase environment derived from snapshots of classical MD trajectories to be applied in quantum chemical calculations [32].
Several studies showed the practicality of such approaches to microsolvation. For example, solute-solvent cluster models were found to improve the description of electronic excitation spectra, [29,32,33] or of IR and Raman spectra [30]. However, in particular in top-down approaches, the number of explicit solvent molecules to be included in the cluster model is often justified based on the distance to the solute alone. A rigorous criterion to select the number of explicit solvent molecules to be included in the microsolvation model based on physical properties is missing. This can be attributed to the inherent difficulty in condensed phase (classical) MD simulations to define the solute-solvent interaction strength and to assign locations to the solvent molecules [34].
However, in biomolecular simulations, the quantification of solute-solvent interactions and the identification of hydration sites-that is, the location of water molecules that show slow or no movement during the simulation-are an important goal due to its relevance in drug design [35][36][37][38][39][40]. Several computational tools exist to aid this procedure [37,[39][40][41]. For example, grid inhomogeneous solvation theory (GIST) allows the analysis and thermodynamic quantification of the solute-solvent interaction on a grid over the whole trajectory [42,43]. Prior studies have shown that GIST can be used to identify favorable interaction sites in heterocycles [44,45] or in biomolecules, [43,46,47] for example, in antifreeze proteins [47] and kinases [46]. In the latter studies, the grid points were further analyzed to identify positions with high water density to visualize locations of strongly interacting solvent molecules. Yet, the only purpose of these initial studies was to enhance visualization, and the analysis determined oxygen positions only.
In the present study, we build on this idea to propose a rigorous protocol to quantum chemical microsolvation by automated water placement based on physical properties. We combined the GIST analysis with an evaluation of the averaged solvent density to place individual water molecules and to quantify their interaction with the solute. A crucial point was the development of an algorithm to assign hydrogen positions to obtain water orientations. Our approach is implemented in a fully automated Python script. The user can choose the number of explicit solvent molecules for subsequent quantum chemical studies based on their position and interaction strength. We chose five different small molecules, which differ in size and polarity, to test our methodology: urea and benzoic acid due to their biological relevance; (+)-syn-benzotriborneol because of its ability to act as a water receptor [48]; helicene because of its apolar nature; and 2-aminobenzothiazole, which can undergo water-assisted tautomerization [49]. All of these molecules have previously been investigated either experimentally by rotational spectroscopy as well as X-ray and neutron diffraction analysis or theoretically by quantum chemical or molecular dynamics studies [48][49][50][51][52][53][54][55][56][57][58][59].
After a brief introduction to GIST and to our methodology, the capabilities of our top-down approach to quantum chemical microsolvation are illustrated in the example of the previously introduced five small molecules. Thereby, we aimed at including a variety of compounds that differ in size and polarity.

Grid Inhomogeneous Solvation Theory (GIST)
Grid inhomogeneous solvation theory (GIST) is a method to calculate the free energy of solvation for a fixed solute conformation by evaluating (local) thermodynamics-that is, solute-solvent interactions-on a grid [42,43,46,47]. While a detailed derivation of the GIST formalism can be found elsewhere, [42,43] we briefly introduce the theoretical foundations here.
In GIST, the equations of inhomogeneous solvation theory (IST) [60] are solved numerically by sampling a constrained solute in solution via MD simulations and assigning solvent molecules to a three-dimensional grid. The result of this analysis is a three-dimensional density map of the solvent around the solute. From this map, the free energy of solvation ∆A solv can be calculated as a sum over the individual contributions of all grid voxels r according to Equation (1).
The free energy of solvation of each grid voxel r is split into an energy and an entropy term as described in Equation (2).
As our analysis is limited to water only, we refer to the solvent by the subscript "w." The entropy is split into a translational and an orientational term (Equations (4) and (5)), which are calculated as a Shannon entropy approximated by a nearest-neighbor approach, with R being the ideal gas constant, N r the total number of solvent molecules in voxel r during the simulation, and γ Euler's constant, which corrects for an asymptotic bias. N f is the number of frames of the simulation, ρ 0 is the reference density of the solvent model, d trans is the closest distance and ∆ω is the closest angular distance between two solvent molecules. By quantifying the individual enthalpic and entropic contributions to the free energy of solvation, GIST allows us to identify interaction sites, or rather grid voxels, which show strong interactions with the solute. Identification of such sites is essential for the location of solvent molecules in a microsolvation approach.

Computational Protocol to Microsolvation
Our computational protocol to quantum chemical microsolvation is shown in Figure 1. It started with sampling the solute in solution by molecular dynamics (MD) simulations (I). Second, we performed a grid inhomogeneous solvation theory (GIST) analysis of the obtained trajectory (II) to identify favorable interaction sites with the solute. Based on the interaction sites and the water density of the trajectory, individual water molecules were placed and oriented (III) and then ranked according to their interaction strength with the solute (IV). As a last step, the user could select the number of water molecules for subsequent optimizations (V). with R being the ideal gas constant, the total number of solvent molecules in voxel during the simulation, and Euler's constant, which corrects for an asymptotic bias. is the number of frames of the simulation, is the reference density of the solvent model, is the closest distance and Δ is the closest angular distance between two solvent molecules.
By quantifying the individual enthalpic and entropic contributions to the free energy of solvation, GIST allows us to identify interaction sites, or rather grid voxels, which show strong interactions with the solute. Identification of such sites is essential for the location of solvent molecules in a microsolvation approach.

Computational Protocol to Microsolvation
Our computational protocol to quantum chemical microsolvation is shown in Figure  1. It started with sampling the solute in solution by molecular dynamics (MD) simulations (I). Second, we performed a grid inhomogeneous solvation theory (GIST) analysis of the obtained trajectory (II) to identify favorable interaction sites with the solute. Based on the interaction sites and the water density of the trajectory, individual water molecules were placed and oriented (III) and then ranked according to their interaction strength with the solute (IV). As a last step, the user could select the number of water molecules for subsequent optimizations (V).
The workflow after the simulation was automated and autonomously handled by our program Free Energy Based Identification of Solvation Sites (FEBISS). MD Simulations and GIST Analysis. Prior to any analysis, an MD simulation of the solute in the solvent (here, in water) had to be performed. In case the solute showed significant dynamics, a clustering of the trajectory had to be performed to identify the most populated conformations. The most stable solute conformation or the cluster representatives were then simulated with the solute coordinates being constrained as required for GIST. The MD simulations can be performed with any software (any force field and water model) as long as the written topology and trajectory files can be read by the freely available open-source software Cpptraj [61].
The obtained trajectory was then analyzed with GIST to calculate the thermodynamic contributions of the solute-solvent interaction on a grid (with a spacing of 0.5 Å). As outlined above, GIST discretizes the trajectory and allows to identify solute-solvent interaction sites (grid voxels) with specific solvation thermodynamics. For the purpose of our computational protocol to microsolvation, we were solely interested in interaction sites (grid voxels) that showed a favorable (negative) free energy of solvation ΔAr and a high water density.
Water Placement. In our protocol, solvent positions were assigned based on the averaged solvent density around the solute and the discretized free energies of solvation. First, the algorithm identified the grid point of highest water density. A water molecule The workflow after the simulation was automated and autonomously handled by our program Free Energy Based Identification of Solvation Sites (FEBISS).
MD Simulations and GIST Analysis. Prior to any analysis, an MD simulation of the solute in the solvent (here, in water) had to be performed. In case the solute showed significant dynamics, a clustering of the trajectory had to be performed to identify the most populated conformations. The most stable solute conformation or the cluster representatives were then simulated with the solute coordinates being constrained as required for GIST. The MD simulations can be performed with any software (any force field and water model) as long as the written topology and trajectory files can be read by the freely available open-source software Cpptraj [61].
The obtained trajectory was then analyzed with GIST to calculate the thermodynamic contributions of the solute-solvent interaction on a grid (with a spacing of 0.5 Å). As outlined above, GIST discretizes the trajectory and allows to identify solute-solvent interaction sites (grid voxels) with specific solvation thermodynamics. For the purpose of our computational protocol to microsolvation, we were solely interested in interaction sites (grid voxels) that showed a favorable (negative) free energy of solvation ∆A r and a high water density.
Water Placement. In our protocol, solvent positions were assigned based on the averaged solvent density around the solute and the discretized free energies of solvation. First, the algorithm identified the grid point of highest water density. A water molecule was placed at this position and the density of one water molecule was subtracted from this grid point. If the density at this grid point was lower than that of a single water molecule, neighboring voxels were included until the sum of their densities suffices. The reference density of one water molecule was taken from the applied water model, here TIP3P [62]. The assignment procedure was repeated until 95% of the total water density was allocated to individual water atoms. Second, the free energy of the selected grid voxel was density-weighted and assigned to the positioned water molecule. This ensured a more accurate representation of the free energy of solvation in the case of insufficient sampling. The water molecules were assigned the free energy value from the grid voxel, where the oxygen nucleus was placed.
By this procedure, water molecules at points of high density and favorable free energy could be identified. Next, positions of the protons were retrieved by an algorithm sketched in Figure 2. Here, the positions of all protons were analyzed for each frame of the simulation. These proton positions were binned onto a smaller grid with a spacing of 0.1 Å, as shown in Figure 2A. The first proton was then placed on the voxel with the highest occupation number. For the second proton, our algorithm only considered positions resulting in a water angle that deviated from the equilibrium angle by less than 5 • , as shown in Figure 2B. Within this spherical segment, the second proton was placed at the voxel with the highest occupation number ( Figure 2C). The specific value of the H-O-H angle depends on the applied water model in the simulation. For our study, we used the TIP3P water model, [62] with an equilibrium H-O-H angle of 104.5 • .
Molecules 2021, 26, x 5 of 27 was placed at this position and the density of one water molecule was subtracted from this grid point. If the density at this grid point was lower than that of a single water molecule, neighboring voxels were included until the sum of their densities suffices. The reference density of one water molecule was taken from the applied water model, here TIP3P [62]. The assignment procedure was repeated until 95% of the total water density was allocated to individual water atoms. Second, the free energy of the selected grid voxel was density-weighted and assigned to the positioned water molecule. This ensured a more accurate representation of the free energy of solvation in the case of insufficient sampling. The water molecules were assigned the free energy value from the grid voxel, where the oxygen nucleus was placed. By this procedure, water molecules at points of high density and favorable free energy could be identified. Next, positions of the protons were retrieved by an algorithm sketched in Figure 2. Here, the positions of all protons were analyzed for each frame of the simulation. These proton positions were binned onto a smaller grid with a spacing of 0.1 Å, as shown in Figure 2A. The first proton was then placed on the voxel with the highest occupation number. For the second proton, our algorithm only considered positions resulting in a water angle that deviated from the equilibrium angle by less than 5°, as shown in Figure 2B. Within this spherical segment, the second proton was placed at the voxel with the highest occupation number ( Figure 2C). The specific value of the H-O-H angle depends on the applied water model in the simulation. For our study, we used the TIP3P water model, [62] with an equilibrium H-O-H angle of 104.5°. Free Energy Ranking. We assigned each placed water molecule the free energy value ΔA of the voxel where the oxygen was placed, weighted by the density of that voxel. The water molecules were then sorted by their ΔA values and displayed as a bar chart, where the height denotes their free energy and a color coding indicates their distance to the solute. We may categorize them into three groups: (i) water molecules with a distance < 3 Å, color-coded in orange: such water molecules typically describe the first solvation shell in hydrophilic regions of the solute; (ii) water molecules with distances between 3 Å < d < 6 Å, color-coded in blue: such molecules comprise either the second solvation shell in hydrophilic regions of the solute or the first solvation shell in hydrophobic regions; and (iii) water molecules with d > 6 Å, depicted in red: they are typical "bulk-like" molecules with almost no interaction with the solute.
Water Selection within the Graphical User Interface (GUI). A bar chart displays the ranking of the placed water molecules and serves as a graphical user interface (GUI). Users can select the water molecules for the microsolvated structure based on the Free Energy Ranking. We assigned each placed water molecule the free energy value ∆A of the voxel where the oxygen was placed, weighted by the density of that voxel. The water molecules were then sorted by their ∆A values and displayed as a bar chart, where the height denotes their free energy and a color coding indicates their distance to the solute. We may categorize them into three groups: (i) water molecules with a distance < 3 Å, color-coded in orange: such water molecules typically describe the first solvation shell in hydrophilic regions of the solute; (ii) water molecules with distances between 3 Å < d < 6 Å, color-coded in blue: such molecules comprise either the second solvation shell in hydrophilic regions of the solute or the first solvation shell in hydrophobic regions; and (iii) water molecules with d > 6 Å, depicted in red: they are typical "bulk-like" molecules with almost no interaction with the solute.
Water Selection within the Graphical User Interface (GUI). A bar chart displays the ranking of the placed water molecules and serves as a graphical user interface (GUI). Users can select the water molecules for the microsolvated structure based on the interaction energy (solvation free energy) and the distance to the solute by clicking on the individual bars. Complementary radial distribution functions (RDF) plots of the simulation including integral values can further aid the user in the selection of the number of solvent molecules. Implementation and Usage. To obtain microsolvated solute structures with our computational protocol, any MD simulation package can be used as long as it generates a trajectory and topology file that can be converted to be readable by the freely available open-source software Cpptraj [61]. Most conveniently, this can be achieved with the Amber package [63,64].
Besides the MD simulation, our Python program FEBISS handles the entire workflow as outlined above. It calls the GIST analysis and performs the newly implemented water placement-both steps are integrated in the C++ open-source software Cpptraj [61]. It generates the interactive bar charts so that the user can select the number of water molecules; it saves the solvated structures as pdb files and automatically displays them in PyMOL [65].
FEBISS is available on Github [66] and via pip for Python3 free of charge. To increase performance, our protocol makes use of the recently published GPU-accelerated GIST version, [67] while patch are files available on Github [68]. Our algorithm is directly implemented within the GIST analysis and can be invoked with the additional keyword febiss. Our Python program is able to install all dependencies automatically and performs the GIST analysis and water placement by calling Cpptraj. The analysis and the interactive bar chart can also be customized with an easy-to-use yaml input file.

Molecular Dynamics Simulation Details
All structures were generated with the Gaussian 16 program, [69] with the exception of (+)-syn-benzotriborneol, which was retrieved from the available crystal structure (CDCC code: 690687) [48]. The simulation setup, including the parameterization of the systems, as well as the final production runs were performed using the Amber18 simulation package [63,64]. For the parametrization, restrained electrostatic potential (RESP) charges [70] were calculated with HF/6-31G* for each system, using the Gaussian 16 program [69]. For the sake of comparison AM1-BCC charges [71] were also calculated for urea. This step was followed by the assignment of the atom types to all structures, using the generalized Amber force field (GAFF) [72] and the antechamber module of AMBERTools19 [64]. The systems were then solvated by applying an octahedral TIP3P [62] water box, maintaining a minimum distance of 20 Å between the solute and the edges of the box. The solvated systems were subjected to an equilibration run, following a thorough protocol by Wallnöfer et al. [73]. The final production runs were carried out in NVT ensembles, where the number of particles N, the volume V and the temperature T are kept constant with T = 300 K, using the GPU implementation of the pmemd module of AMBER 18, [64,74] whereby the temperature was controlled by applying the Langevin thermostat [75]. To achieve a higher performance, periodic boundary conditions and the smooth particlemesh Ewald approach [76] were applied in all simulations. Additionally, the SHAKE algorithm [77] was used to constrain bonds involving hydrogen atoms. Regarding the restrained NVT runs for the GIST analysis, the whole solute was restrained with a force constant of 100 kcal/mol Å 2 . The time step was set to 2 fs, and snapshots of the systems coordinates were saved every 10 ps. The final trajectories comprised 30,000 frames, which corresponds to 300 ns of NVT MD simulation time.

Data Analysis and Reference Settings
For the GIST analysis, we applied a 30 Å × 30 Å × 30 Å cubic grid with a spacing of 0.5 Å to obtain a reasonable balance between accuracy and convergence [43]. As all of the thermodynamic terms are state functions, reference values corresponding to the pure solvent state were needed. Throughout this paper, the reference solvent-solvent interaction was set to E ww = −9.533 kcal mol −1 (taken from the TIP3P model), [62] whereas E sw , S trans , and S rot were all set to zero.
All data analysis was handled by our program FEBISS, which calls the Cpptraj program [61] from the Amber tools package [64]. For the sake of comparison, we applied the 3-dimensional reference interaction site model (3D-RISM), [78] using the Solvent Analysis tool implemented in the MOE software [79]. To visualize the calculated 3D-RISM water densities, the Solvent Analysis Grid panel was used. We set a minimum distance of 15 Å between the solute and the edges of the solvent box and defined a tight convergence criterion. The remaining parameters were kept to their default values. We selected a cut-off of 2.9 Å and 1.9 Å for the RDFs of the water oxygen and hydrogen atoms, respectively, which aligns with the maximum RDFs determined in previously published results [50,52].

Quantum Chemical Calculations
Initial microsolvated structures retrieved from the FEBISS analysis were subjected to an optimization using density functional theory (DFT). All calculations were performed using the Turbomole electronic structure package (Version: 7.3) [80,81]. To comply with the already published data, all systems were optimized with the hybrid density functional B3LYP [82][83][84] and the triple-zeta basis set def2-TZVP [85]. Dispersion contributions in the DFT calculations were invoked by empirical atom-pairwise dispersion corrections with the Becke-Johnson damping [86,87]. In addition to the explicitly described water molecules, bulk solvent effects were treated implicitly, using the conductor-like screening model (COSMO) [88,89] as implemented in Turbomole. A dielectric constant of ε = 78.5 was predefined for all calculations. In this approach, the solute is placed in a cavity of a dielectric continuum. The cavity is defined by the solvent accessible surface obtained by applying increased van der Waals radii around each atom. In the case of explicit-implicit solvation models, this is done for each explicitly described molecule (solute and solvents). It has to be ensured that the cavity is meaningful and closed, and this was the case in all our calculations. Analysis of the Hessian verified that all optimized structures only had positive eigenvalues and were indeed energy minima.
The optimized geometries of the microsolvated clusters were depicted with the molecular visualization program PyMOL [65].

Results
We tested the applicability and accuracy of our automated solvent placement protocol on the following five systems: urea, 2-aminobenzothiazole, (+)-syn-benzotriborneol, benzoic acid, and helicene (see Scheme 1). The results of each investigated compound are discussed in order of decreasing polarity.
All data analysis was handled by our program FEBISS, which calls the Cpptraj program [61] from the Amber tools package [64]. For the sake of comparison, we applied the 3-dimensional reference interaction site model (3D-RISM), [78] using the Solvent Analysis tool implemented in the MOE software [79]. To visualize the calculated 3D-RISM water densities, the Solvent Analysis Grid panel was used. We set a minimum distance of 15 Å between the solute and the edges of the solvent box and defined a tight convergence criterion. The remaining parameters were kept to their default values. We selected a cutoff of 2.9 Å and 1.9 Å for the RDFs of the water oxygen and hydrogen atoms, respectively, which aligns with the maximum RDFs determined in previously published results [50,52].

Quantum Chemical Calculations
Initial microsolvated structures retrieved from the FEBISS analysis were subjected to an optimization using density functional theory (DFT). All calculations were performed using the Turbomole electronic structure package (Version: 7.3) [80,81]. To comply with the already published data, all systems were optimized with the hybrid density functional B3LYP [82][83][84] and the triple-zeta basis set def2-TZVP [85]. Dispersion contributions in the DFT calculations were invoked by empirical atom-pairwise dispersion corrections with the Becke-Johnson damping [86,87]. In addition to the explicitly described water molecules, bulk solvent effects were treated implicitly, using the conductor-like screening model (COSMO) [88,89] as implemented in Turbomole. A dielectric constant of ε = 78.5 was predefined for all calculations. In this approach, the solute is placed in a cavity of a dielectric continuum. The cavity is defined by the solvent accessible surface obtained by applying increased van der Waals radii around each atom. In the case of explicit-implicit solvation models, this is done for each explicitly described molecule (solute and solvents). It has to be ensured that the cavity is meaningful and closed, and this was the case in all our calculations. Analysis of the Hessian verified that all optimized structures only had positive eigenvalues and were indeed energy minima.
The optimized geometries of the microsolvated clusters were depicted with the molecular visualization program PyMOL [65].

Results
We tested the applicability and accuracy of our automated solvent placement protocol on the following five systems: urea, 2-aminobenzothiazole, (+)-synbenzotriborneol, benzoic acid, and helicene (see Scheme 1). The results of each investigated compound are discussed in order of decreasing polarity.

Urea-Water Complexes
We started the analysis with the urea molecule, which is the smallest compound of our test library. A 300 ns restrained NVT simulation was performed and the trajectory was subjected to a GIST analysis. The main steps of the FEBISS algorithm are depicted in Figure 3A: This process includes the identification of areas with high water density  Figure 3A (left)), the placement of water oxygen atoms ( Figure 3A (middle)), and the assignment of favorable water hydrogen orientations ( Figure 3A (right)). At the end of the FEBISS calculation, thermodynamic parameters for all water molecules were retrieved and the solvent molecules were ranked according to their free energy values. The resulting bar chart is depicted in Figure 3B. This graph allows the identification of thermodynamically favorable water molecules and, at the same time, permits the selection of water molecules with a predefined solute-water distance cut-off. The FEBISS graph of the urea system depicts 13 consecutive water molecules with a solute-water distance below 3 Å ( Figure 3B, orange bars). Six of these water molecules have a ∆A value lower than −10.00 kcal/mol. Additionally, we can point out one strongly bound water molecule with a ∆A value of −12.20 kcal/mol.

Urea-Water Complexes
We started the analysis with the urea molecule, which is the smallest compound of our test library. A 300 ns restrained NVT simulation was performed and the trajectory was subjected to a GIST analysis. The main steps of the FEBISS algorithm are depicted in Figure 3A: This process includes the identification of areas with high water density ( Figure 3A (left)), the placement of water oxygen atoms ( Figure 3A (middle)), and the assignment of favorable water hydrogen orientations ( Figure 3A (right)). At the end of the FEBISS calculation, thermodynamic parameters for all water molecules were retrieved and the solvent molecules were ranked according to their free energy values. The resulting bar chart is depicted in Figure 3B. This graph allows the identification of thermodynamically favorable water molecules and, at the same time, permits the selection of water molecules with a predefined solute-water distance cut-off. The FEBISS graph of the urea system depicts 13 consecutive water molecules with a solute-water distance below 3 Å ( Figure 3B, orange bars). Six of these water molecules have a ΔA value lower than -10.00 kcal/mol. Additionally, we can point out one strongly bound water molecule with a ΔA value of -12.20 kcal/mol. Ranking of the automatically placed water molecules around the urea molecule. The free energy of the first 50 water molecules is depicted. The distances between solute and water are color-coded, whereby orange denotes a solute-solvent distance of less than 3 Å; blue, a solute-water distance between 3 and 6 Å; and red, a solute-water distance of more than 6 Å.
Based on the FEBISS bar chart, we selected three urea-(H2O)n complexes, with n corresponding to 1, 6, and 13 water molecules. Both the urea-(H2O)n complexes directly The distances between solute and water are color-coded, whereby orange denotes a solute-solvent distance of less than 3 Å; blue, a solute-water distance between 3 and 6 Å; and red, a solute-water distance of more than 6 Å.
Based on the FEBISS bar chart, we selected three urea-(H 2 O) n complexes, with n corresponding to 1, 6, and 13 water molecules. Both the urea-(H 2 O) n complexes directly retrieved from the FEBISS calculation and the urea-(H 2 O) n complexes after DFT optimization with the COSMO solvation model are represented in Figure 4. The lowest ∆A value (FEBISS graph in Figure 3) matches a single water molecule located in between the two urea amide groups ( Figure 4A and Figure S2), with ∆A = − 12.20 kcal/mol. The interaction is characterized by a bifurcated hydrogen bond, formed by the water oxygen atom and the amide hydrogen atoms of urea. Regarding the urea-(H 2 O) 6 complex ( Figure 4B), we can observe three additional water molecules coordinating urea's carbonyl oxygen in an elliptical configuration with ∆A = −10.50 kcal/mol (compare also Figure S2). All three water molecules interact directly with the urea oxygen atom, whereby the average distance for the O urea -H water atom pair is 1.87 Å and 1.79 Å in the non-optimized and optimized water cluster, respectively. Furthermore, we can observe two more water molecules in the region that encompasses the urea oxygen atom and the amide groups. Concerning the urea-(H 2 O) 13 complex ( Figure 4C), we can see that seven additional water molecules are distributed around the urea molecule; these do not directly interact with the solute, but form hydrogen bonds with the already placed water molecules. For further validation, we compared our results with the urea-water interaction sites determined by the 3D reference interaction site model (3D-RISM). In Figure 5, an overlay of the optimized urea-(H2O)6 complex and the solvation sites obtained with the 3D-RISM methodology is shown. We find a very good agreement between the water positions obtained with our FEBISS algorithm and the high-density water locations determined with the 3D-RISM calculation. Both methodologies agree on the existence of a water molecule in between the two amide groups, as well as on water molecules surrounding the urea oxygen atom in a hemispherical manner. However, with our FEBISS algorithm, we were able to identify two more explicit water positions, which span the region between the urea oxygen and urea amide groups, which are less recognizable in the 3D-RISM results. It is noteworthy that the FEBISS chart and the water positioning looked different when AM1-BCC [71] rather than RESP [70] charges were used ( Figure S3). With AM1-BCC charges, four water molecules rather than three were found to be coordinated to the oxygen and one was localized between the amide groups.
When comparing radial distribution functions (RDFs) from our simulation to previously published ab initio molecular dynamics data, [52] we see a good agreement (see Figure S1 in Supplementary Materials). This fact proved that the sampling time of our simulation was sufficient, and that classical MD simulations are justified as input for our water placement algorithm FEBISS.
For further validation, we compared our results with the urea-water interaction sites determined by the 3D reference interaction site model (3D-RISM). In Figure 5, an overlay of the optimized urea-(H 2 O) 6 complex and the solvation sites obtained with the 3D-RISM methodology is shown. We find a very good agreement between the water positions obtained with our FEBISS algorithm and the high-density water locations determined with the 3D-RISM calculation. Both methodologies agree on the existence of a water molecule in between the two amide groups, as well as on water molecules surrounding the urea oxygen atom in a hemispherical manner. However, with our FEBISS algorithm, we were able to identify two more explicit water positions, which span the region between the urea oxygen and urea amide groups, which are less recognizable in the 3D-RISM results. In addition, we investigated the dipole moments of the urea-water clusters. While urea in implicit solvent has a dipole moment of 6.10 D, the addition of one water molecule increased this value to 9.24 D, whereas for the urea-(H2O)6 cluster, a dipole moment of 9.57 D was found. On further increasing the number of explicitly treated waters to 13, the dipole moment decreased to 2.98 D.

Aminobenzothiazole-Water Complexes
2-Aminobenzothiazole (ABT) served as another test system in our study (see Scheme 1). It exists in two tautomer conformations, the amino (T1) and the trans-imino (T2) structural isomers, depicted in Figure 6. Previous studies have shown that water assists the tautomerization process and the conversion from T1 to T2. A microsolvation model was required to calculate meaningful transition state structures and reaction barriers, while the number and location of the explicitly treated water molecules was determined by trial and error [49]. Our goal was to investigate whether our microsolvation protocol is able to yield reasonable microsolvated structures. Therefore, we analyzed both tautomeric forms with our FEBISS algorithm and assessed the number, location, and orientation of individual solvent molecules as well as their impact on the proton transfer.
Initial DFT structure optimizations in the implicit solvent confirmed the amino isomer (T1) to be more stable-by 5.4 kcal/mol in the gas phase and by 5.6 kcal/mol in the implicit solvent. The results of our microsolvation analysis are presented in Figure 7, where the FEBISS bar charts for both tautomers are shown. For T1, we observe six consecutive water molecules with a solute-water distance below 3 Å, of which two water molecules have a ΔA value lower than -10.00 kcal/mol ( Figure 7A). Regarding T2, there are also six water In addition, we investigated the dipole moments of the urea-water clusters. While urea in implicit solvent has a dipole moment of 6.10 D, the addition of one water molecule increased this value to 9.24 D, whereas for the urea-(H 2 O) 6 cluster, a dipole moment of 9.57 D was found. On further increasing the number of explicitly treated waters to 13, the dipole moment decreased to 2.98 D.

Aminobenzothiazole-Water Complexes
2-Aminobenzothiazole (ABT) served as another test system in our study (see Scheme 1). It exists in two tautomer conformations, the amino (T1) and the trans-imino (T2) structural isomers, depicted in Figure 6. Previous studies have shown that water assists the tautomerization process and the conversion from T1 to T2. A microsolvation model was required to calculate meaningful transition state structures and reaction barriers, while the number and location of the explicitly treated water molecules was determined by trial and error [49]. Our goal was to investigate whether our microsolvation protocol is able to yield reasonable microsolvated structures. Therefore, we analyzed both tautomeric forms with our FEBISS algorithm and assessed the number, location, and orientation of individual solvent molecules as well as their impact on the proton transfer. In addition, we investigated the dipole moments of the urea-water urea in implicit solvent has a dipole moment of 6.10 D, the addition of one increased this value to 9.24 D, whereas for the urea-(H2O)6 cluster, a dip 9.57 D was found. On further increasing the number of explicitly treated w dipole moment decreased to 2.98 D.

Aminobenzothiazole-Water Complexes
2-Aminobenzothiazole (ABT) served as another test system in our stu 1). It exists in two tautomer conformations, the amino (T1) and the tr structural isomers, depicted in Figure 6. Previous studies have shown tha the tautomerization process and the conversion from T1 to T2. A microso was required to calculate meaningful transition state structures and rea while the number and location of the explicitly treated water molecules w by trial and error [49]. Our goal was to investigate whether our microsolva able to yield reasonable microsolvated structures. Therefore, we analyzed b forms with our FEBISS algorithm and assessed the number, location, and individual solvent molecules as well as their impact on the proton transfer Initial DFT structure optimizations in the implicit solvent confirm isomer (T1) to be more stable-by 5.4 kcal/mol in the gas phase and by 5.6 implicit solvent. The results of our microsolvation analysis are presented in Figure  FEBISS bar charts for both tautomers are shown. For T1, we observe six con molecules with a solute-water distance below 3 Å, of which two water m ΔA value lower than -10.00 kcal/mol ( Figure 7A). Regarding T2, there are Initial DFT structure optimizations in the implicit solvent confirmed the amino isomer (T1) to be more stable-by 5.4 kcal/mol in the gas phase and by 5.6 kcal/mol in the implicit solvent.
The results of our microsolvation analysis are presented in Figure 7, where the FEBISS bar charts for both tautomers are shown. For T1, we observe six consecutive water molecules with a solute-water distance below 3 Å, of which two water molecules have a ∆A value lower than −10.00 kcal/mol ( Figure 7A). Regarding T2, there are also six water molecules ranked in a row, with a solute-water distance smaller than 3 Å (Figure 7B), while three water molecules have a ∆A value lower than −10.50 kcal/mol. If we inspect these water positions in more detail (Figure 7C), we find for T1 that the most tightly bound water is a distal water with a coordination to the amine nitrogen, whereas the next two water molecules are pre-aligned to form a network between the two nitrogen nuclei to facilitate proton transfer. In the case of the trans-imino isomer T2 (Figure 7D), the first two water molecules with ∆A = −10.60 kcal/mol can be considered to be pre-aligned to form a water chain and to facilitate proton transfer. The third water also coordinates to the imine nitrogen, but is almost perpendicular to the aromatic ring.

Benzotriborneol-Water Complexes
Our largest test compound was (+)-syn-benzotriborneol, which was synthesized by Fabris et al. [48]. The structure can be characterized as a C3-symmetric triterpene, with a central aromatic ring and three hydroxyl groups (Scheme 1). By means of X-ray crystal structure and neutron diffraction analysis, it was proven that the molecule acts as an Our FEBISS algorithm clearly predicts two water molecules to form an intermolecular hydrogen bonded chain to facilitate the proton transfer from the amino to the imino tautomer, and vice versa. Hence, we set up the microsolvated structures with two explicit water molecules to yield T1-(H 2 O) 2 and T2-(H 2 O) 2 and optimized them ( Figure 7E,F). The microsolvation shifts the equilibrium by decreasing the energy difference between the two isomers ∆E rel = E(T2-(H 2 O) 2 ) − E(T1-(H 2 O) 2 ) to 2.9 kcal/mol (gas phase) and 4.5 kcal/mol (implicit solvent), respectively-a finding in agreement with previously published results [49]. More strikingly, however, the predicted number of two water molecules forming a water chain and aiding the proton transfer process coincides with previous results found by trial and error [49]. Wazzan et al. reported that two water molecules minimize the barrier for proton transfer from T1 to T2, whereas chains consisting of one or three water molecules result in higher barriers.

Benzotriborneol-Water Complexes
Our largest test compound was (+)-syn-benzotriborneol, which was synthesized by Fabris et al. [48]. The structure can be characterized as a C 3 -symmetric triterpene, with a central aromatic ring and three hydroxyl groups (Scheme 1). By means of X-ray crystal structure and neutron diffraction analysis, it was proven that the molecule acts as an efficient water receptor. In total, three water molecules were found in the X-ray crystal structure of (+)-syn-benzotriborneol. Out of those three water molecules, one molecule is bound in the center of the molecule's cavity, forming hydrogen bonds with the surrounding hydroxyl groups. We aimed at reproducing those experimental outcomes and subjected (+)-syn-benzotriborneol to our FEBISS calculation.
The resulting bar chart is shown in Figure 8A, where the first 50 water moleculesranked according to their ∆A values-are shown. Out of those 50 water molecules, six consecutive water molecules have a solute-water distance below 3 Å. Most strikingly, we can observe a single water molecule with a ∆A value of around −13.00 kcal/mol, which clearly stands out from the rest of the water molecules.
To compare the X-ray crystal structure with the FEBISS water position, we plotted both in Figure 8B. Most remarkable, our microsolvation protocol exactly predicted the position of the most strongly bound central water molecule (∆A = −13.00 kcal/mol), where the water molecule and the hydroxyl groups of (+)-syn-benzotriborneol both act as hydrogen bond donor and acceptor. As visible from the overlay of the FEBISS and the crystal structure ( Figure 8B (right)), the two water molecules are in almost perfect alignment with a root mean square deviation (RMSD) of 0.215 Å. Indeed, DFT reoptimization hardly changed the structure ( Figure 8C (right)), with averaged distances of the water and the three hydroxyl groups being 1.87 Å, 1.82 Å (crystal), and 1.88 Å (FEBISS), respectively. Considering the next five predicted water positions, Figure 8B (middle) reveals that they form hydrogen bonds with the three hydroxy groups. They show also good agreement with the additional two crystal waters. The RMSD between the second most favorable FEBISS water and the crystal water is 0.481 Å, while it is 1.421 Å between the third crystal water and the nearest FEBISS water.

Benzoic Acid-Water Complexes
Due to the amphiphilic nature of benzoic acid (BA), we included the molecule in our study to investigate whether our algorithm can distinguish between the hydrophilic and hydrophobic site. To assess the algorithm's performance when working with charged molecules, we also studied the deprotonated benzoate.
The hydroxyl group of BA exists in two distinct orientations, which are denoted as syn and anti, whereas the syn conformation is by 6.0 kcal/mol (gas phase) and 3.2 kcal/mol (implicit solvent) lower in energy (compare Figure S4 of Supplementary Materials). This is in agreement with previous experimental and computational investigations, [53][54][55][56]58,59] where values range between 6.0 kcal/mol and 6.5 kcal/mol in the gas phase [55,56,59]. The FEBISS bar charts for syn-BA and anti-BA as well as benzoate are depicted in Figure 9A-C. All three graphs show the ranking of the first 50 automatically placed water molecules. Regarding syn-BA and anti-BA, we observe three water molecules, with a solute-water distance below 3 Å ( Figure 9A). Out of those three water molecules, two show similar ∆A values, while one water molecule stands out with a ∆A value of −11.20 kcal/mol and −10.60 kcal/mol, respectively. On the contrary, the bar chart of the BA anion shows numerous water molecules with a solute-water distance lower than 3 Å ( Figure 9C). In total, this includes 15 consecutive water molecules, which have ∆A values between −10.00 and −14.00 kcal/mol. These solvent molecules are almost exclusively located around the charged carboxylate group as visible from the initial microsolvated structures with n = 1, 2, 3, 6, 9, 15 displayed in Figure S5 in Supplementary Materials. Only for n = 15, we see water molecules being located on top of the phenyl ring.

Benzoic Acid-Water Complexes
Due to the amphiphilic nature of benzoic acid (BA), we included the molecule in our study to investigate whether our algorithm can distinguish between the hydrophilic and hydrophobic site. To assess the algorithm's performance when working with charged molecules, we also studied the deprotonated benzoate.
The hydroxyl group of BA exists in two distinct orientations, which are denoted as syn and anti, whereas the syn conformation is by 6.0 kcal/mol (gas phase) and 3.2 kcal/mol (implicit solvent) lower in energy (compare Figure S4 of Supplementary Materials). This is in agreement with previous experimental and computational investigations, [53][54][55][56]58,59] where values range between 6.0 kcal/mol and 6.5 kcal/mol in the gas phase [55,56,59]. The FEBISS bar charts for syn-BA and anti-BA as well as benzoate are depicted in Figure 9A-C. All three graphs show the ranking of the first 50 automatically placed water molecules. Regarding syn-BA and anti-BA, we observe three water molecules, with a solute-water distance below 3 Å ( Figure 9A). Out of those three water molecules, two show similar ΔA values, while one water molecule stands out with a ΔA value of -11.20 kcal/mol and -10.60 kcal/mol, respectively. On the contrary, the bar chart of the BA anion shows numerous water molecules with a solute-water distance lower than 3 Å ( Figure  9C). In total, this includes 15 consecutive water molecules, which have ΔA values between -10.00 and -14.00 kcal/mol. These solvent molecules are almost exclusively located around the charged carboxylate group as visible from the initial microsolvated structures with n = 1, 2, 3, 6, 9, 15 displayed in Figure S5 in Supplementary Materials. Only for n = 15, we see water molecules being located on top of the phenyl ring.  For the two conformers of BA, we investigated the energy difference between the synand the anti-structure in the gas phase and in implicit solvent for various cluster sizes BA-(H 2 O) n (n = 0-3). When looking at the listed energy differences of the complexes in the gas phase and in implicit solvent (Table 1), distinct differences in the energies can be found-lacking a systematic pattern: for example, for n = 1, a relative energy difference of −9.0 kcal/mol in favor of syn is determined for the gas phase in agreement with previous findings of −8.9 kcal/mol, [54] but this value increases to −13.4 kcal/mol for n = 2 before collapsing to −2.3 kcal/mol for n = 3. The energy spread is less pronounced for implicit solvent calculations (2nd row, Table 1), where energy differences are consistent with the exception of n = 2.
The microsolvated clusters optimized with an additional implicit solvent model are depicted in Figure 10. The microsolvated clusters in gas phase are shown in Figure S6 in Supplementary Materials. Almost all gas phase and implicit solvent structures are remarkably similar the only exception being syn-BA-(H 2 O), where the water molecule forms two H-bonds to the solute in the gas phase ( Figure S6) and one H-bond in the implicit solvent structure ( Figure 10). However, when comparing the synand anti-species with n = 1 and n = 2 ( Figure 10), it can be seen that the locations of the explicit solvent molecules are very different. In case of the BA-(H 2 O) 2 even the number of hydrogen bonds differs between the synand the anti-conformer, which skews the energy difference. For BA-(H 2 O) 3 , however, the two structures are reasonably similar. Consequently, the syn-anti energy differences in the gas phase and the implicit solvent are also similar with −2.3 kcal/mol and −2.2 kcal/mol, respectively. Table 1. Relative stability differences between the microsolvated synand anti-conformer of BA in the gas phase (∆E gas ) and in the implicit solvent (∆E solv ). Structures were optimized with B3LYP/def2-TZVP/D3. Relative electronic energy differences are given in kcal/mol. For the two conformers of BA, we investigated the energy difference between the syn-and the anti-structure in the gas phase and in implicit solvent for various cluster sizes BA-(H2O)n (n = 0-3). When looking at the listed energy differences of the complexes in the gas phase and in implicit solvent (Table 1), distinct differences in the energies can be found-lacking a systematic pattern: for example, for n = 1, a relative energy difference of -9.0 kcal/mol in favor of syn is determined for the gas phase in agreement with previous findings of -8.9 kcal/mol, [54] but this value increases to -13.4 kcal/mol for n = 2 before collapsing to -2.3 kcal/mol for n = 3. The energy spread is less pronounced for implicit solvent calculations (2nd row, Table 1), where energy differences are consistent with the exception of n = 2. Table 1. Relative stability differences between the microsolvated syn-and anti-conformer of BA in the gas phase (ΔEgas) and in the implicit solvent (ΔEsolv). Structures were optimized with B3LYP/def2-TZVP/D3. Relative electronic energy differences are given in kcal/mol. The microsolvated clusters optimized with an additional implicit solvent model are depicted in Figure 10. The microsolvated clusters in gas phase are shown in Figure S6 in Supplementary Materials. Almost all gas phase and implicit solvent structures are remarkably similar the only exception being syn-BA-(H2O), where the water molecule forms two H-bonds to the solute in the gas phase ( Figure S6) and one H-bond in the implicit solvent structure ( Figure 10). However, when comparing the syn-and anti-species with n = 1 and n = 2 ( Figure 10), it can be seen that the locations of the explicit solvent molecules are very different. In case of the BA-(H2O)2 even the number of hydrogen bonds differs between the syn-and the anti-conformer, which skews the energy difference. For BA-(H2O)3, however, the two structures are reasonably similar. Consequently, the syn-anti energy differences in the gas phase and the implicit solvent are also similar with -2.3 kcal/mol and -2.2 kcal/mol, respectively.

Helicene-Water Complexes
The most apolar molecule of our study is [4]-helicene, which belongs to the family of polycyclic aromatic hydrocarbons (PAHs) [90]. Such molecules and their interaction with water are of interest for studying ice-grain formation [51]. [4]-Helicene-water clusters have been previously investigated by Domingos et al., who provided experimental data on the arrangement of a single water molecule around [4]-helicene in the gas phase, using highresolution rotational spectroscopy. The results of their study served as further benchmark in evaluating the performance of our FEBISS algorithm.
Our automated solvent placement protocol yielded the free energy bar chart depicted in Figure 11A. We observe that none of the water molecules exceeds a ∆A value of −9.70 kcal/mol and that all solute-water distances correspond to values larger than 3 Å. An exception is the water ranked at position #12 with a solute-water distance of 2.9 Å, barely below 3 Å. Due to the small differences in free energy, a representation of [4]-helicene and the first 14 consecutive water molecules from our FEBISS calculation is shown in Figure 11B. It is evident that the algorithm hardly places any water molecules in close proximity to the aromatic rings, but rather locates them in the surroundings of the solute's hydrogen atoms.

Discussion
General Performance. We devised a computational protocol to quantum chemical microsolvation, where the location, orientation, and interaction of individual solvent molecules with the solute are automatically calculated based on MD and GIST. Our From the FEBISS bar chart and from inspection of the positioned water molecules, we could not infer which water positions would yield low-energy monohydrated [4]-helicene clusters (denoted as [4]-helicene-(H 2 O)). Hence, we generated 14 such clusters with the water molecules ranked at position #1 to #14 in the FEBISS graph and subjected them to DFT optimizations. Several low-energy clusters were found (Table S4, Supplementary Materials) and selected low-energy conformers are depicted in Figure 11C. They are denoted as 1-iii (∆E = 0.17 kcal/mol), 1-v (∆E = 0.15 kcal/mol), 1-vi (∆E = 0.21 kcal/mol), and 1-xi (∆E = 0.00 kcal/mol), whereby the number 1 is a shorthand notation for [4]-helicene-(H 2 O) and the small Roman number specifies the rank of the water molecule in the FEBISS bar chart. All other conformers were either identical to these or enantiomers.
The relative electronic energies are very similar for all four complexes and all clusters show a similar water arrangement, with the water hydrogen atoms directed toward one of the aromatic rings. In cluster 1-iii and 1-xi, the hydrogen atoms interact with ring number c and d, respectively. In 1-v, the water is located over rings a and b, whereas in cluster 1-vi, it is located over rings b and c. These results are in agreement with reported experimental and theoretical investigations with the exception of cluster 1-iii [51].

Discussion
General Performance. We devised a computational protocol to quantum chemical microsolvation, where the location, orientation, and interaction of individual solvent molecules with the solute are automatically calculated based on MD and GIST. Our methodology provides a rational approach to generate microsolvated clusters based on physical properties. The fully automated workflow is implemented in our Python program FEBISS. The results are presented in a bar chart, where the solvent molecules are ranked according to their free energy. By simultaneously displaying the distances to the solute, the user has two criteria to choose the number of solvent molecules for a following quantum chemical calculation. The algorithm to obtain the solvent orientations is limited to water at the moment but will be extended to arbitrary solvents in the future. Consequently, as of now, our FEBISS program can only generate solute-water clusters.
Of course, the performance of this protocol intrinsically depends on the applied force field and water model used for the initial MD calculations as well as the simulation time. These factors need to be evaluated for each investigated molecule. While the generalized Amber force field (GAFF) in combination with the TIP3P water model-which we usedwas shown to yield good results for small organic compounds, [91][92][93] individual molecules may profit from a more tailored force field [94]. As pointed out, the force field can easily be modified as long as the used software package generates a trajectory and topology file convertible to be readable by Cpptraj.
To obtain accurate results, it is also necessary to assign reasonable partial charges. As previously reported, RESP charges yield better results than AM1-BCC, [93] a finding we could confirm for urea (vide infra).
Overall, our top-down protocol to quantum chemical microsolvation yields very good agreement with available experimental and theoretical data; specific findings and limitations will be discussed for each investigated species.
Urea-Water Clusters. Urea's strong interaction with solvent molecules in its close proximity has been recognized by spectroscopic and theoretical investigations, even though the extent of this interaction is still under debate [95]. Theoretical investigations based on MD simulations tend to find higher hydration numbers than experimental studies with up to eight hydrogen bonds [50,[96][97][98]. In contrast, by dielectric electron relaxation spectroscopy Agieienko et al. found a hydration number of only 2 [92]. Our determined hydration pattern of urea suggests up to six strongly interacting water molecules, which is in agreement with previous theoretical studies [50,52,97,98]. In particular, the positions of our water molecules agree with 3D-RISM results [50] and with the areas of hydration determined by the ab initio MD study of Weiss and Hofer [52]. The latter study also finds a bifurcating water molecule localized between the two amide groups, which is consistent with our results. The three oxygen coordinating water molecules found in the here presented microsolvated urea complexes are in alignment with the average number of hydrogen bonds to the urea oxygen (2.07 ± 0.75) predicted by Weiss and Hofer [52]. Still, the number and position of the strongly interacting water molecules is sensitive to the parametrization scheme, in particular, to the assigned partial charges. The water placement obtained by employing AM1-BCC charges ( Figure S3) predicted four water molecules interacting with the urea oxygen and one with the two amide groups. This is clearly a poorer agreement with available reference data [50,52] concerning the number and positions of the water molecules.
Considering the dipole moment of urea in water, experimental studies found a value of µ eff = 10.0 ± 0.1 D in infinite solution [92]. While the calculated dipole moment of urea in implicit solvent is only 6.10 D, positioning one water molecule in between the amide groups (urea-(H 2 O)) already increased µ to > 9 D. The addition of five more waters to yield urea-(H 2 O) 6 had little effect on the dipole and for the large cluster, urea-(H 2 O) 13 , µ decreased to 2.98 D. This finding shows that a parallel alignment of solute and solvent dipole moments is beneficial [92]. It may also indicate that out of the 13 water molecules in close proximity to the solute, some might have more bulk-like properties or show high dynamics. Hence, calculating molecular properties from a single urea-(H 2 O) 13 cluster does not yield accurate results. Explicitly describing these "bulk-like" solvent molecules by one conformation is not adequate.
Aminobenzothiazole-Water Clusters. Concerning the amine-imine (T1 to T2) tautomerization process in ABT, our FEBISS algorithm predicted exactly two waters forming a hydrogen bonded chain and aiding the proton transfer reaction. These two solvent molecules were previously only identified by trial and error [49]. While these two bridging water molecules were correctly identified as strongly interacting solvent molecules by FEBISS, a distal amine-coordinated water was determined to have the most favorable free energy for T1 (see Figure 7). Obviously, to compare stabilities between two configurations, the microsolvated clusters T1 and T2 have to be structurally similar to obtain reliable results. Hence, the user has to select meaningful water positions based on the available data, such as solute-solvent interaction strength and location of solvent molecules.
Benzotriborneol-Water Clusters. Our FEBISS algorithm predicted the position of the most strongly bound central water molecule of the water receptor (+)-syn-benzotriborneol in quantitative agreement with experimental X-ray and neutron diffraction data [48]. Remarkably, not only the position of the oxygen atom but also the orientations of the hydrogens are in agreement with experiment-a finding that highlights the accuracy and reliability of our microsolvation protocol.
The positions of the next five water molecules are all in close proximity to the three hydroxyl groups as expected based on chemical intuition. Two of the solvent molecules are also in good agreement with X-ray crystal water positions ( Figure 8B (right)). However, differences are expected as packing effects and symmetry mates influence the position of crystal waters and the environment in the crystal structure clearly differs from that in solution.
Benzoic Acid-Water Clusters. Concerning the microsolvation of benzoic acid, the FEBISS bar charts (Figure 9) clearly show that our algorithm is able to distinguish between charged and neutral species as indicated by the distinct water placement. The benzoate anion attracts more water molecules to aid charge compensation than the protonated BA species, which is in alignment with chemical intuition. The comparison of the microsolvated synand anti-conformer of BA further illustrates that in any explicit-implicit solvation model, the microsolvated structures have to be similar to be comparable in energy. In particular, the number of hydrogen bonds must be carefully examined. If it differs as, for example, in the BA-(H 2 O) 2 cluster, it has to be evaluated whether this is inherent to the specific geometry or whether this is due to additional solvent-solvent interaction as found in our case. Still, energy differences between the synand anti-cluster calculated in the gas phase and in implicit solvation became very similar for three explicitly considered water molecules. As these are the only water molecules placed in close proximity to the solute (d < 3 Å, see Figure 9) by our FEBISS algorithm, it can be assumed that solvation effects of the COOH group are sufficiently described.
Comparing the cluster geometries, the predicted position of the most favorable water molecule in the gas phase optimized syn-BA-(H 2 O) complex ( Figure S6)-bridging the hydroxyl and the carbonyl group-agrees with spectroscopic microwave data [57] and theoretical studies [33,96]. For the implicit solvent optimization of syn-BA-(H 2 O), no such bridging was found but a single hydrogen bond was formed between BA and H 2 O (Figure 10). Such a structure further suggests that a single water molecule is not sufficient to describe the microsolvation environment present in the bulk phase. Larger clusters with n > 2 reported in a previous study [99] showed some differences with our structures. However, these deviations can be attributed to the different methodologies to retrieve the microsolvated molecules: Krishnakumar [99] et al. used MD snapshots, whereas we use averaged highly populated positions.
Helicene-Water Clusters. Helicene was the most challenging system for our FEBISS algorithm as the interaction with water was expected to be weak due to its apolar nature. Indeed, all predicted water positions had a distance of >3 Å and their interaction with the solute was barely larger than the value for a typical bulk water (see red bars in Figure 11A). In a pragmatic approach, we generated 14 [4]-helicene-(H 2 O) clusters and obtained four low-energy conformers. When comparing our computed results with the rotational spectroscopy data from Domingos et al., [51] we can see striking similarities regarding the binding mode and position of a single water molecule toward [4]-helicene. In agreement with their data, we found the most tightly bound water molecule (cluster 1-xi) to be aligned on top of ring d and c, and also found two more conformers with the water located on top of rings a and b (1-v) and rings b and c (1-vi), respectively. Our fourth cluster, 1-iii, with the water placement similar to 1-xi was not reported by Domingos, but it remains unclear why.
Although the ranking of the water molecules according to their free energy provided little insight into which solvent molecules to choose for subsequent quantum chemical studies, the FEBISS algorithm yielded rational and reasonable starting structures for optimization. Nonetheless, we would like to point out here that the poorer performance for helicene is not a weakness of the methodology per se but can be attributed to the apolar nature of the molecule, its low solubility in water, and potentially lower accuracy of the force field for such species.
Applicability. The FEBISS protocol provides a physically sound basis to obtain microsolvated clusters. However, the assigned water locations are averaged positions. If the solute-solvent cluster is dominated by one conformation, this is a good approximation. However, depending on the investigated system and property of interest, it might be necessary to consider multiple solute conformations. In addition, it might also be necessary to subject the obtained solute-solvent cluster to additional sampling of the water positions-this time at a quantum chemical or semi-empirical level to generate a structural ensemble that may be used for Boltzmann averaging.
Still, even when only a single conformer of a microsolvated cluster is considered, DFT was shown to be well suited to describe solute-solvent interactions, [100] supporting the added value of low-cost microsolvation approaches to approximately model (bulk) solvent effects.

Conclusions
In this paper, we presented a quantitative approach to quantum chemical microsolvation based on MD simulations and GIST analysis, which assigns water position and orientations based on physical quantities. Our protocol provides a rigorous basis as to where to locate the explicit solvent molecules, how to orient them, and how many to include, thereby allowing a rational setup of microsolvated clusters.
We demonstrated that our computational protocol is applicable to a number of different molecules ranging from biologically relevant urea and benzoic acid to apolar helicene.
We retrieved solute-solvent complexes that are in very good agreement with experimental and theoretical reference data. Although currently limited to water as a solvent, we will extend our algorithm to arbitrary solvents in the future.
Our methodology can be used to set up Quantum Mechanics/Molecular Mechanics models, where the number of solvent molecules in the QM region may be determined by our FEBISS algorithm. In addition, our protocol paves the way to study explicit solutesolvent interactions in large complexes that cannot be treated by ab initio MD.
Supplementary Materials: The following are available online. Figure S1: Radial distribution functions g(r) from the 300 ns simulation trajectory of urea. Table S1: Radial distribution functions (RDFs) for certain urea-water atom pairs, calculated from our MD simulation, in comparison to previously published results. Figure S2: Urea-(H 2 O) n complexes for n = 1 and n = 6 as obtained by the FEBISS algorithm. The free energy of the solvent molecules is given in kcal/mol and they are color-coded according to this value from strongly favored (magenta) to less favored (yellow). Figure  S3: FEBISS bar chart (A) and initial urea-(H 2 O) 5 cluster when AM1-BCC charges were used in the MD simulation. Figure S4: Quantum chemically optimized structures of syn-BA and anti-BA. The relative energies were calculated with B3LYP/def2-TZVP/D3 in water modelled as implicit. Table S2: Relative energies (kcal/mol) and dipole moments (Debye) of the synand anti-conformer of benzoic acid calculated with B3LYP/def2-TZVP/D3 in the gas phase and in implicit solvent. Figure S5: Benzoate-(H 2 O) n clusters as obtained from the FEBISS algorithm for n = 1-3 water molecules (top) and n = 6, 9, 15 water molecules (bottom). The water molecules are color-coded according to their interaction strength from magenta (most favorable) to yellow (less favorable). Figure S6: Gas phase optimized microsolvated cluster of benzoic acid in the synand anti-conformation. Table S3. Relative energies (kcal/mol) and dipole moments (Debye) of tautomer T1 and T2 calculated with B3LYP-D3(BJ)/def2-TZVP for in the gas phase and in implicit solvent. Figure S7a: Structural overview of the B3LYP/def2-TZVP/D3 optimized microsolvated clusters of T1-(H 2 O) n and T2(H 2 O) n with n = 1-3 water molecules. The initial water placement was done with our FEBISS algorithm. Figure S7b: Structural overview of the B3LYP/def2-TZVP/D3 optimized microsolvated clusters of T1-(H 2 O) n and T2-(H 2 O) n with n = 4-6 water molecules. The initial water placement was done with our FEBISS algorithm. Table S4