Interfacing CRYSTAL/AMBER to Optimize QM/MM Lennard–Jones Parameters for Water and to Study Solvation of TiO2 Nanoparticles

Metal oxide nanoparticles (NPs) are regarded as good candidates for many technological applications, where their functional environment is often an aqueous solution. The correct description of metal oxide electronic structure is still a challenge for local and semilocal density functionals, whereas hybrid functional methods provide an improved description, and local atomic function-based codes such as CRYSTAL17 outperform plane wave codes when it comes to hybrid functional calculations. However, the computational cost of hybrids are still prohibitive for systems of real sizes, in a real environment. Therefore, we here present and critically assess the accuracy of our electrostatic embedding quantum mechanical/molecular mechanical (QM/MM) coupling between CRYSTAL17 and AMBER16, and demonstrate some of its capabilities via the case study of TiO2 NPs in water. First, we produced new Lennard–Jones (LJ) parameters that improve the accuracy of water–water interactions in the B3LYP/TIP3P coupling. We found that optimizing LJ parameters based on water tri- to deca-mer clusters provides a less overstructured QM/MM liquid water description than when fitting LJ parameters only based on the water dimer. Then, we applied our QM/MM coupling methodology to describe the interaction of a 1 nm wide multilayer of water surrounding a spherical TiO2 nanoparticle (NP). Optimizing the QM/MM water–water parameters was found to have little to no effect on the local NP properties, which provide insights into the range of influence that can be attributed to the LJ term in the QM/MM coupling. The effect of adding additional water in an MM fashion on the geometry optimized nanoparticle structure is small, but more evident effects are seen in its electronic properties. We also show that there is good transferability of existing QM/MM LJ parameters for organic molecules–water interactions to our QM/MM implementation, even though these parameters were obtained with a different QM code and QM/MM implementation, but with the same functional.


Introduction
After the pioneering work of M. Levitt and A. Warshel [1], the fundamental QM/MM methodology of joining explicit electronic structure to classical potential functions is constantly being both applied and expanded upon [2][3][4][5][6][7]. While much work is being put into methods that go beyond the electrostatic embedding-type of additive QM/MM, which explicitly couples the two subsystems via a Coloumb term but leaves the MM charges fixed, this form of coupling still seems prevalent within additive QM/MM applications [8], most likely due to its generally accepted trade-off between accuracy and efficiency. Some of the authors involved in this work have previously demonstrated how electrostatic embedding QM/MM can be used to predict solvation dynamics of bond formation in a model catalyst system, which was later confirmed by experiment [9,10]. However, the QM/MM implementation employed previously [11] relies on a DFT code that at the time of writing does not yet support the calculation of forces from hybrid functionals. In contrast, the MPP-CRYSTAL17 [12][13][14] code has an efficient implementation of the exact exchange term [12]. This capability is crucial for successfully modeling semiconducting oxide systems, such as TiO 2 nanoparticles (NPs) in aqueous solution [15]. Metal oxide nanoparticles NPs are widely used in many technological applications, and especially TiO 2 nanomaterials are very popular because of their abundance, nontoxicity, high stability under a variety of conditions, and biocompatibility. Their uses ranges from traditional ones, such as in the pigment industry or in cosmetics, to more advanced fields, such as photocatalysis, photoelectrochemistry, fuel production, and nanomedicine [16][17][18][19]. Many, if not all, of the applications take place in aqueous environments, where water is often a main actor in the physics and chemistry of such processes [20,21]. However, the modeling of TiO 2 nanoparticles is computationally very demanding [22][23][24][25]: apart from requiring hybrid functionals, realistic models consist of 500-1000 atoms for NPs of at least 2 nm of diameter, which cannot be treated periodically. Therefore, such simulations require the use of parallelized, efficiently performing codes, such as MPP-CRYSTAL17. However, when also including the aqueous environment of the NP (easily around 3000 atoms), current computational power still does not allow for a full, exact-exchange-including DFT treatment. Thus, we here report on the development and benchmarking of an electrostatic embedding QM/MM framework that allows for simulations of systems of these sizes and geometries. The highly arched curvature of spherical NP surface is known to partially dissociate the first water monolayer around it [26,27]. In addition, the curved surface has long-range physical effects on surrounding water, as recently reported [28]. Thus, when preparing simulations of NPs, it is important to assess how much water should be included in the simulation to achieve the desired realism. To describe the aforementioned dissociation of the first monolayer, it is necessary to include it in the QM subsystem in the multiscale divisioning, when working with simple, rigid, point charge-based force fields. This means that there will almost exclusively be water close to the QM/MM intersection, which again motivates a thorough analysis/optimization of the water-water QM/MM coupling accuracy.
The paper is structured as follows: In Section 2, we present the results from the water-water QM/MM LJ re-parameterization and implementation (Section 2.1), before turning to solvation of a TiO 2 NP (Section 2.2). In Section 3.1, we analyse the consequences of choosing various LJ fitting strategies, assess the transferability of QM/MM LJ parameters between different electronic structure codes, and the extent of the influence of water-water LJ QM/MM interactions. Section 3.2 discusses the consequences of including more water in the NP simulations. Finally, the computational details of the various sections are presented in Section 4.2, before Section 5 wraps up the conclusions from our work. The appendices contain further benchmarks on our QM/MM implementation.

Optimizing the Lennard-Jones Parameters for QM/MM Water-Water Interactions
By comparing QM/MM interaction energies with their pure QM counterparts on a system that only contains one molecular species, we can assess the accuracy of our novel electrostatic embedding framework [11,31]. Since the method is focused on describing molecules and larger systems such as nanoparticles in aqueous solution, we start by focusing on neat water. Some of the main criteria of success for achieving an accurate QM/MM method are that: (1) the QM/MM coupled interaction energies do not over-or under-bind when compared to neither the chosen QM nor the MM models: and (2) the various possible combinations of QM/MM geometries and regions are as similar as possible, so that there will be no orientation-induced artifacts in the total energies and forces [2]. The QM/MM coupling model used in electrostatic embedding (see Section 4.1) leaves open two main routes for the tuning of the accuracy of the coupling: (1) through how the additional electrostatic term in the external potential in the DFT code is constructed [11,32]; and (2) via a re-parameterization of the Lennard-Jones potential that describes non-electrostatic interactions between the atoms in the QM and MM subsystem (Equation (4) in Section 4.1). Here, we focus on the second option. We have employed two strategies for optimizing QM/MM LJ parameters: The Dimer Fit (DF) is based on optimizing the LJ parameters to minimize the difference between the multiscale interaction energy curves of the water dimer and the average of the pure QM and pure MM interaction energy curves, a methodology akin to previous work on other systems [33]. The other strategy attempts to take into account that the structure of liquid water cannot be completely understood only analyzing the water dimer, and thus, the Cluster Fit (CF) is a global fit on interaction energies on water clusters of size 3-10 molecules per cluster. The details of the fits can be found in Section 4.2.2, which the very interested reader might benefit from reading first, before continuing to the results section. Apart from minimizing the errors in structure and energy introduced by partitioning the system, the fitting effort also leads to insights into the transferability of the molecular characteristics of the water dimer into the modeling of liquid water.
Hybrid functionals provide a better description of the water liquid structure, compared to their GGA relatives [34][35][36], although often semi-empirical corrections to GGA functionals are employed [37]. In contrast, for the MM subsystem, there are many known improvements to the TIP3P description of water, either within the same general model of static point charges [38], or even extending the description to account for the polarizability of water [39][40][41], and/or flexible molecules [41,42]. However, employing more advanced MM water potentials is outside the scope of this work because: (1) TIP3P is still ubiquitous in many QM/MM simulations, especially of very large total systems, where the added computational expense of going beyond pairwise potentials can become prohibitive. (2) When including the first solvation shell (or more) in the QM region and focusing on explicit solvation effects on the solute in particular, and not so much the solute-solvent interactions themselves, the outer layers of water become less important, thus making TIP3P an adequate choice.

Water Dimer
Throughout this section, we use the following terminology: "QM/MM" means that the hydrogen-donating molecule is in the QM region, while the hydrogen-accepting is described with TIP3P, and vice versa for "MM/QM". We address both configurations collectively as "the multiscale configurations", to avoid confusion. Figure 1 reports the structure of the water dimer considered and relevant quantities assessed for our QM/MM implementation.
The hydrogen bonding energy of the water dimer is calculated with the three different sets of LJ parameters, and compared to the pure B3LYP and pure TIP3P results in Figure 2. Using the TIP3P LJ parameters with our choice of functional and basis set within CRYSTAL17 produces overbound multiscale dimer binding curves for both configurations, but most prominently for the QM/MM configuration, where the electronic density of the hydrogen-donating molecule is modeled explicitly.
At very short distances, dominated by Pauli repulsion, the differences in binding energies between the multiscale and the B3LYP curves are large, indicating that a re-evaluation of the LJ parameters are warranted. For the multiscale binding curves using the DF LJ parameters (see Section 4.2.2 for details on the fitting strategies), the QM/MM curve becomes almost identical to the TIP3P result, even though the fit target is the average of the QM and the MM binding curves. The resulting MM/QM curve is also brought closer to its QM counterpart, and all overbinding has been removed from the model. The CF strategy seemingly produces binding curves that are still overbound around their minima, but it interestingly decreases the difference between the O-O distances with the two multiscale minimum values, effectively making the multiscale model more consistent with respect to geometric differences. See Appendix A for an analysis of the various relaxed multiscale geometries for all possible configurations of the water dimer.  Water dimer binding curves (counterpoise corrected), calculated using LJ parameters from: TIP3P (top); the dimer fit (middle); and the cluster fit (bottom). The starting geometry is obtained from the S22 database, where the water dimer was optimized using CCSD(T) [43], and all other intramolecular geometries are kept constant while scanning the O-O distance.

Water Clusters
To assess whether the results from the dimer-fitted LJ parameters transfer to larger systems of water, we analyze the difference in interaction QM/MM energies and their pure B3LYP counterparts: evaluated over a large dataset of water clusters ranging from n = 3 to 10 molecules in total. Since the hydrogen bonding network of liquid water is only fully realized in all three spatial dimensions when tackling clusters containing six or more molecules, including such systems into our fitting methodology seems justifiable in the quest for more transferable QM/MM LJ parameters for water-water interactions. Figure 3 shows box plots representing the distributions in interaction energy differences with respect to the pure B3LYP results. The blue patches represent the maximum difference in interaction energies from the pure B3LYP and pure TIP3P results. Using the TIP3P LJ parameters results in parts of the QM/MM ∆∆E-values falling below the blue patches, meaning that there is a significant overbinding with respect to the single-description results. We also observe a trend of the binding systematically being tightest for the multiscale configurations that have the most QM/MM interactions, resulting in "u-shaped" curves. When using the DF LJ parameters, this u-shape has disappeared, but, especially for the 8-and 9-mer clusters, the QM/MM interaction energies give significantly underbound clusters, as the ∆∆E-values end up over the upper limit of the blue patches. The CF parameters retain the u-shape, indicative of the same systematic increase in binding with number of QM/MM LJ terms, but all QM/MM distributions are now within the total difference spanned by the limits of the two pure descriptions. In all cases, spread of energy differences is largest when the multiscale description is furthest removed from the reference (i.e., 1 QM water vs. pure QM water), as one would expect for a well-behaved interface. The fact that the TIP3P and CF results are more similar is most likely because the TIP3P parameters are already optimized to reproduce bulk phase properties [2], which is discussed further in Section 3.1. The horizontal axis is firstly divided divided into eight subsets based on the total number of molecules in each size of water cluster, and then further divided into subsets defined by how many of the total amount of molecules are in the QM subsystem. Each box represents the distribution of interaction energy differences from all possible QM/MM combinations with that number of QM molecules. The boxes spans 50% (the IQR) of the distribution of energy differences, while the whiskers represent ±1.5 IQR. The three datasets are produced using the three different sets of LJ parameters: The original TIP3P parameters (blue), the dimer fit (DF) parameters (red), and the cluster fit (CF) parameters (green). The blue patches behind the box plots show the maximum difference between the two pure descriptions.

Nanoparticles
As mentioned in Section 1, it is important to explore the effects of including increasing amounts of water around NPs, since the nanoparticle/water interfacial effects are quite long range. Here, we have studied three systems: (A) the bare NP (TiO 2 ) 223 · 10H 2 O ( Figure 4a); (B) the NP with a first water monolayer (ML) adsorbed on the surface containing 134 molecules, ∼20% of which are dissociated, as discussed and detailed in Section 4.2.5 and in [28] (Figure 4b); and (C), comprised of system B, with a molecular mechanic (MM) region composed by 824 surrounding waters added around it ( Figure 4c). The figure shows the structures after geometry optimizations. Systems A and B were originally prepared for a previous study by some of us [28]. The geometry optimization for system C was carried out twice, once with the water in the MM region described with the CF LJ parameters, and once with the TIP3P LJ parameters, to assess any possible effects on the NP. and (c) with QM/MM for the nanoparticle with a water multilayer of 958 molecules, using CF LJ parameters. Titanium, oxygen and hydrogen atoms of the NP are in cyan, red and white, respectively. Oxygen atoms of molecular water are in blue, while oxygen and hydrogen atoms from the dissociated water molecules are in green and yellow, respectively.
First, the effect of water around the NP has been investigated with respect to the induced structural changes in the optimized geometries. In Figure 5, the simulated EXAFS spectra of the bare NP, of the NP with an adsorbed water monolayer and of the one with a water multilayer are compared with that of bulk anatase. The distribution is quite broad for the bare NP, while it is reduced upon water ML adsorption. The change is most drastic going from no water at all to some water, but, when the multilayer is added, there is only a further tendency of the distances, in particular of Ti-O ax , to be centred at the bulk value, as the grow-in of a new peak at this distance indicates. This is true for calculations performed with both the CF and TIP3P LJ parameters, which produce almost identical spectra. The effect on the NP electronic properties of the QM/MM relaxation has also been investigated. Figure 6 compares the total (DOS) and projected density of states (PDOS) on oxygen species for the bare NP, the NP with a water monolayer and with a water multilayer around it. The electronic structure of the NP in vacuum has been evaluated both with PBE and B3LYP functionals (Figure 6a,b, respectively). First, when switching from a GGA functional such as PBE to a functional including exact exchange, we observe an expected widening of the bandgap from 2.44 to 4.14 eV [15]. When a ML of water is adsorbed on the NP surface ( Figure 6c), we observe a 0.1 eV change in the band gap value, and a decrease of the work function by +0.97 eV. The effect is quite reduced, with a shift of +0.81 eV and +0.83 eV, when the system is immersed in a multilayer of water (Figure 6d), using CF or TIP3P LJ parameters, respectively.  Table 1 compares the values of σ OO and OO , used in the non-electrostatic term of the QM/MM coupling (Equation (4) in Section 4.1) obtained from our two different methods of fitting water-water QM/MM LJ parameters, and shows that optimizing LJ parameters only on the water dimer gives rise to a pairwise potential that is very different from both the TIP3P and CF potentials: while the van der Waals (vdW) radius is increased significantly, the potential well depth is over 10 times more shallow. On the contrary, the CF methodology reaches its optimum values by reducing the vdW radius and deepening the potential. The cluster-( Figure 3) and liquid-water benchmarks (Figures A3 and A4 in Appendix C) present a good case for the necessity of reaching beyond dimer systems when generating QM/MM parameters for solvents. This might not be particularly surprising, since the full three-dimensional hydrogen bonding network of water cannot be realized by molecular systems smaller than the water hexamer. Furthermore, the MM parameters are usually optimized to best describe condensed-phase systems that behave systematically more polarized, which again means that MM gas phase potentials might have overemphasized interactions [2]. The description of polarization in the QM system is more transferable between phases, therefore it should not have as overemphasized gas phase interactions as the MM description. Thus, when fitting the QM/MM interaction (partially) to the MM interaction (and partially to the QM interaction which is not correspondingly under-interacting), the over-emphasis carries over into the obtained parameters. Remembering that the DF method with its order of magnitude lower -value is just short of removing the attractive part of the LJ potential altogether, it is interesting to briefly revisit Figure 3. The two datasets with normal -values are seen to systematically bind hardest when the number of QM/MM LJ interactions is largest, giving these u-shaped collections of box plots. Since this feature is absent from the DF dataset, we can attribute this type of QM/MM overbinding (within the limits spanned by the pure QM pure MM difference) to the LJ potential. Since performing a fit on the entire dataset of clusters does not remove this behaviour, it must be inherent in the actual formulation of the interaction, and thus, it is necessary to improve on that, or go beyond electrostatic embedding QM/MM altogether, if one wishes to increase the accuracy even further.

Water-Water QM/MM LJ Re-Parameterization and Parameter Transferability Tests
It should be noted that when it comes to QM/MM methodologies that involve fitting of one sort or another, many much more "modern" strategies involving machine learning routines have been formulated [6], but how to apply such strategies to explicitly interacting QM and MM subsystems (e.g., electrostatic embedding QM/MM) seems not yet quite realized. The fitting methodology presented here is very basic in comparison, but has the advantage of only requiring re-evaluations of the pairwise LJ potential, which means that, if one were to replace the MM force field, no new QM simulations would have to be made to redo the fit, and, when the new parameters are obtained, there is no extra cost to the following simulations.
In the same vein as our water-water LJ re-parameterization, we have also tested a previous QM/MM re-parameterization of LJ parameters of organic molecule/water dimers to assess the more general accuracy of our novel CRYSTAL17/AMBER16 coupling, the details of which can be found in Appendix B. The QM/MM LJ parameters for this section were obtained from the work of Freindorf et al. When transferring the QM/MM LJ parameters optimized by Freindorf et al. to CRYSTAL17, the RMSD values increase from the results in the paper by 0.11 kcal/mol, 0.01 Å, and 0.98 degrees, for the energy, hydrogen bond distances and angles, respectively. Taking all possible calculational differences mentioned in Appendix B into account, we conclude that simply transferring the previously produced QM/MM parameters to a new QM/MM implementation can be done without a dramatic loss of accuracy. Lastly, we also note that we find no systematicity in any of the errors based on whether the role of the water molecule in the dimers is to act as a hydrogen-donor or -acceptor, which again indicates a robust QM/MM description, which is not significantly and/or systematically sensitive to the nature of the hydrogen bonding geometry over the QM/MM division.

Nanoparticles
For the simulated EXAFS spectra of the three NP systems (Figure 5), we saw that, while the most drastic changes happen when going from naked to ML hydrated NP, there were only fine differences between the QM monolayer and the QM/MM multilayer simulations, in particular associated to the emergence of a peak at the bulk anatase Ti-O ax distance in the QM/MM multilayer simulation. The effects of the QM/MM LJ re-parameterization on the NP structure were negligible, indicating that the inclusion of the first layer of water in the QM subsystem makes the NP structural properties robust to changes in the water-water QM/MM LJ coupling terms. This conclusion is thus also an input to the ongoing debate about the impact of the somewhat ad hoc treatment of non-electrostatic interactions over the QM/MM border. This study seems to show that, as long as this border is far enough removed from the main subject of the study, inaccuracies within the QM/MM non-electrostatic potential are of lesser concern.
The monolayer/multilayer differences could indicate that, when fully immersed in water, the nanoparticle undergoes a slow process of recrystallization, but this would have to be confirmed with molecular dynamics and thermal sampling. An experimental EXAFS of Rajh et al. [44] reported the partial restoration of the octahedral Ti coordination when enediol ligands adsorb to the surface of spherical TiO 2 nanoparticles.
With regards to the electronic properties of the NP in different surrounding conditions, it appears evident that inclusion of exact exchange is crucial for providing the correct electronic structure. The exact exchange term partially corrects for the electronic self-interaction error that is present in standard GGA functionals, such as PBE, and causes a large underestimation of the band gap, as is evident from Figure 6. We wish to comment on the fact that the DFTB+/AMBER approach used in previous work [28] is clearly faster than the current approach that does not employ the Tight-Binding approximation. The previous approach allows introducing a much larger number of water molecules to describe the bulk of water around the nanoparticle. However, it cannot correctly capture the electronic properties of the system being based on standard density functional theory methods.
When adding more water, up to a 1 nm thick layer, we do not observe a relevant effect on the band gap, but we observe an effect on the NP's work function with a shift of about 0.2 eV with both CF and TIP3P LJ parameters, due to the dipole orientation of the water molecules in the multilayer. In particular, one should consider that the surface dipole moment for a naked NP is negative and pointing outwards. In the first adsorbed water monolayer (Layer I in Figure 7), all molecules are bound through the water oxygen with the OH bonds directed towards the vacuum. This creates an opposite (positive) dipole moment (i.e., pointing towards the center of the NP), which destabilizes the band states by +0.97 eV (compare the position of the top of the valence band in Figure 6c with that in Figure 6b). The addition of further layers on top of the first adsorbed water monolayer (Layers II-V in Figure 7) mitigates this effect because the average radial dipole moment, created by the multilayer, is negative, causing a shift back of 0.2 eV (compare the position of the top of the valence band of Figure 6d with that in Figure 6c). It must be re-iterated that these observations pertain to a specific configuration of water molecules in the multilayer, as obtained through the relaxation from a starting random distribution (see Section 4.2.5). To assess whether these changes are carried over to a thermal average, one should run molecular dynamics trajectory/ies with extended sampling of configurations, which is however beyond the scope of this work and could be object of future studies.

The Basics of Electrostatic Embedding QM/MM
In electrostatic embedding QM/MM methodologies, the total energy of the entire system is a sum of the two subsystems, plus an interaction term: The MM region is here described using a point charge-based force field, with q i denoting the charges and R i their spatial coordinates and n(r) the electronic density of the QM subsystem. The interaction energy term E QM/MM (in atomic units) is: with Z α being the atomic number of atom α, running over all atoms in the QM subsystem. i runs over all N MM charges of the MM subsystem. In CRYSTAL17, the point charges are then formally considered "atoms" with zero mass, nuclear charge as given in the input, and the default assigned basis set is a single s gaussian with an exponent of 1,000,000. Thus, the point charges are included in the external potential, and the electronic density is converged under their influence. Likewise, CRYSTAL17 calculates the electrostatic forces from the density of the QM subsystem on the point charge centres.
The non-electrostatic electronic interactions, E NES , are in this work included through a Lennard-Jones pair potential between the atomic centers of each subsystem: with iα defining the well-depth, and σ iα the equillibrium distance of the potential. Thus, no non-electrostatic interactions are directly taken into consideration in the evaluation of the electronic density of the QM subsystem. The re-parameterizations via the DF and the CF strategies are carried out for iα and σ iα of Equation (4), for the atom types (elements) Oxygen and Hydrogen, combined via the Lorenz-Berthelot combining rules.

Computational Details
For the QM subsystems in all of the QM/MM simulations carried out for this work, the choice of functional has been (the CRYSTAL17 implementation of) B3LYP, comprised of Becke's exchange functional [45], 20% HF exchange, and the Lee-Yang-Parr correlation functional [46]. For the benchmarks, we opted for the CRYSTAL17 basis set 6-31G(d,p) [47], which resembles the basis set used in the work where we obtained the non-water QM/MM LJ parameters from [33]. The Anderson Convergence Accelerator was used for all CRYSTAL simulations [13], and the CRYSTAL standard convergence criteria for single point calculations were used. For water simulations including more than a single QM water molecule, the counterpoise correction scheme was employed to avoid basis set superposition errors: when calculating monomer energies of single molecules in clusters, ghost atoms were added at the positions of the other molecules, so that the same quality of basis set was kept constant, when evaluating interaction energies. As in previous work [11], the ASE-native TIP3P implementation [30,48] was used for the pure water benchmarks, thus avoiding the process of generating AMBER input files for these 6000+ calculations.

Water Dimers and Clusters
The interaction energy, or binding energy, is defined as ∆E int = E cluster − n m E m , where E m is the energy of monomer m in the cluster, and n is the total number of molecules in the cluster. Thus, e.g., for a QM/MM water dimer using TIP3P as the MM model, only the QM monomer has a monomer energy.
To transferably and generally minimize over-or underbinding of QM/MM systems in relation to the total difference between pure B3LYP and TIP3P models, we go beyond the water dimer, and extend the water dataset to encompass binding energy calculations of all N possible QM/MM combinations of n-molecule water clusters: where k is the number of QM molecules, and c n is the number of clusters with n total molecules in the dataset. The water cluster geometries were obtained from the work by the Bates and Tschumper [49], and the set of (H 2 O) n , n = 3-10 clusters provided by Temelso and coworkers [50], for a total of 6214 QM/MM single point energy calculations. First, a dataset of interaction energy differences ∆∆E = ∆E QMMM int − ∆E B3LYP int was produced using the TIP3P LJ parameters. The resulting dataset will then consist of a distribution of interaction energies per k, for each n. This means quite a few distributions will have to be held up against each other, which is most easily visualized through a box-plot, where the distribution is simplified by showing the interquartile range (IQR), or the middle 50% of the distribution as a box, as done in Figure 3.
Since only electrostatic interactions are included in the external potential in CRYSTAL17, one can simply subtract the non-electrostatic contributions to the interaction energies, and re-calculate the LJ term with any given LJ parameters, to achieve new total interaction energies, without having to redo the more expensive DFT calculations. This method was employed to obtain fitted LJ parameters that minimized the total the over-and underbinding of the entire water cluster dataset, as described in the following section.

Fitting Strategies
The dimer fit minimizes the difference between the QM/MM dimer binding curve and the average of the pure QM and the pure MM also in the 2.1 < r < 3.0 Å interval, by optimizing the LJ parameters that are used in the QM/MM non-electrostatic coupling term (Equation (4)). The MM-MM LJ interactions are not modified, i.e., these interactions are still evaluated using the TIP3P parameters. The fit was done using the curve_fit tool from the Scipy.optimize submodule for python. The interval is chosen to avoid giving weight to the very repulsive region at r < 2.1 Å, which would skew the overall interaction.
The cluster fit minimizes a modified chi-square difference, χ 2 , summed over all i cluster configurations, and normalized with the number of molecules in each cluster N i : where ∆E avg int,i is the average of the pure QM and pure MM interaction energy of cluster configuration i. The fit was carried out by using the basin hopping algorithm implemented in the Scipy python module, starting from TIP3P LJ parameters, but also allowing for non-zero LJ parameters on the hydrogen atoms, for a total of 4 degrees of freedom in the fit. The initial step size was 1 × 10 −4 eV or Å. Keeping in mind the physical meaning of the σand -parameters in the LJ potential, the following constraints were employed: 1 Å < σ OO < 6 Å, and 0 Å ≤ σ HH < 3 Å. After this initial exploration of the parameter space, a bootstrapping methodology was employed, spanning an initial grid of all four free parameters within the limits of the constraints. The minimizer used here was of the sequential least squares programming type, also from the scipy.minimize module. This provided the smallest χ 2 , with the parameters σ CF OO = 3.10031 Å, CF OO = 0.011402 eV, and curiously no non-zero hydrogen LJ parameters.

Liquid Water Simulations
In preparation for the QM/MM liquid water simulations, a cubic box with 27.96 Å sides containing 729 water molecules was equilibrated at 300 K with the TIP3P force field, using a Langevin thermostat with a friction coefficient of 0.02 a.u. After the equilibration was completed, the simulation was continued to produce initial configurations for spawning QM/MM MD simulations at 1 ps intervals from the equilibrated MM MD trajectory. Three sets of QM/MM MD simulations were prepared: (1) using the TIP3P LJ parameters; (2) using the DF LJ parameters; and (3) using the CF parameters. The simulation setups were otherwise identical: The QM subsystem was defined to include a single water molecule, and re-equilibration was carried out for 4 ps for each of the spawned QM/MM MD trajectories before sampling of, e.g., RDFs, was carried out. The RDFs were calculated using the VMD program [51]. The total sampled time for the CF LJ parameter-runs were 182 ps, 123 ps for the DF LJ parameter-runs, and 56 ps for the TIP3P runs. The results can be found in Appendix C.

Organic Molecule/Water Dimer Benchmarks
To benchmark the accuracy of our QM/MM methodology against pure DFT results on organic molecules, we used a subset of the database used by Freindorf et al. [33]. All dimers consisting of a single organic molecule and a water molecule from the original database were recreated, as well as two new systems, namely glycine and aspartic acid, were included in our dataset. Their geometries were then optimized with B3LYP/6-31G(d,p) only. Then, following Freindorf et al., QM/MM geometry optimizations were performed, with the MM subsystem consisting of the water molecule, using the original TIP3P LJ parameters, since the new parameters are only optimized for QM/MM water-water interactions. In these calculations, the MM atoms were fixed at their positions obtained from the previous calculations of the dimers, performed with the single-description QM level of theory. The geometry relaxations were carried out until the maximum force magnitude within the systems was below 0.01 eV/Å. The results can be found in Appendix B.

Nanoparticle Simulations
The anatase TiO 2 spherical nanoparticle (NP) model used in this work was obtained from a simulated annealing process described in a previous work of some of us [24]. The model has a stoichiometry of (TiO 2 ) 223 · 10H 2 O and it is characterized by a diameter of about 2.2 nm (see Figure 4a). A water monolayer around the NP was modeled by adding to the undercoordinated Ti atoms of the surface water molecules. The procedure is described in detail in ref. [28]. The most stable water monolayer around the NP is constituted by 134 water molecules and has an extent of dissociation α = 0.21 (see Figure 4b). α is defined as: where n H 2 O is the number of molecular and n OH,H of dissociated water molecules in the water monolayer. Finally, a spherical water shell of about 1 nm thickness was added around the NP with the water monolayer using the PACKMOL code [52]. The water density inside the shell is approximately of 1 g/cm 3 corresponding to 824 water molecules (see Figure 4c). For all the DFT calculations in this section, the all-electron basis sets used are O 8-411(d1), Ti 86-411(d41), and H 511(p1), as defined in [23]. In the case of the bare NP and the NP with the water monolayer, forces were relaxed to less than 0.02 eV/Å using the CRYSTAL17 code. For the bare NP, we also used the generalized gradient approximated (GGA) functional PBE [53], for comparison to B3LYP results. In the case of the NP surrounded by the multilayer, forces were relaxed to 0.05 eV/Å to decrease the overall computational cost, using CRYSTAL17/AMBER16 QM/MM approach. For the QM part, we used the B3LYP functional, while to describe the QM/MM non-electrostatic interaction we made two simulations, one using CF and one using TIP3P LJ parameters.
To simulate the direct-space extended X-ray adsorption fine structure (EXAFS) spectra, Gaussian convolution of peaks (σ = 0.005 Å) was centred at the distance lengths between each Ti atom and other atoms (O or Ti) from its first, second, and third coordination shells. Reference anatase bulk Ti-O representative distances have been calculated with DFT(B3LYP) to be, Ti-O eq = 1.946 Å and Ti-O ax = 2.000 Å for the first coordination sphere, Ti---Ti = 3.092 Å for the second coordination sphere and Ti---Ti = 3.789 Å and Ti---O = 3.939Å for the third and fourth coordination sphere, respectively.
Through another Gaussian convolution, this time of the eigenvalue coefficients (σ = 0.005), centred at the Kohn-Sham energy eigenvalue of each orbital, we simulated the total density of states (DOS) of the nanoparticles. For the projected density of states (PDOS), we used the coefficients in the linear combination of atomic orbitals (LCAO) of each molecular orbital, since summing the squares of the coefficients of all the atomic orbitals centred on a certain atom type results (after normalization) in the relative contribution of each atom type to a specific eigenstate.

Conclusions
With this work, we have made a QM/MM electrostatic embedding implementation that combines (MPP-)CRYSTAL17 and AMBER16, and tested it thoroughly. We have produced new water-water QM/MM LJ parameters for B3LYP/TIP3P, which improve the water-water coupling over the QM/MM border, and could help reduce problems in, e.g., future adaptive QM/MM methodologies, where the contents of the QM subsystem is dynamically updated. Through this part of the study, we have also made it clear that for solvents such as water, where the water-water interactions over the border are important, it is not always sufficient to optimize new LJ parameters from dimer interactions. This is of extra importance when the fit includes matching an average of QM and MM potentials, where the latter is already optimized to reproduce bulk phase properties, and thus have overemphasised gas-phase interactions.
We have also analysed the transferability of LJ parameters between different DFT codes and QM/MM implementations, and found that the QM/MM vs. pure-QM errors did not increase significantly, even though the parameters were produced with a different DFT code.
Finally, we have demonstrated that the implementation can handle geometry optimizations of TiO 2 nanoparticles immersed in water. In the same vein of assessing how large a role the QM/MM LJ parameters play in the total picture, we have shown that changing them has a negligible effect on the NP. This is perhaps not surprising, since the differences do not directly involve the NP itself: the MM-to-QM-water forces would have to affect the QM water shell structure rather drastically for this to affect the NP.
In addition, as the studies on neat QM/MM water have shown, if attempts to reduce computational costs further were to be made by describing all water layers with force fields, it is likely that re-parameterization of Ti and NP oxygen-water LJ parameters would be necessary, apart from also having to use a force field for water that can model dissociation.
Our analysis showed that the major changes to the nanoparticle take place already after adding the first water monolayer, and that adding further water layers can have a small effect on the structural properties, here seen in terms of a finer recrystallization. For the specific water configurations studied here, we observed a more evident effect on the electronic properties of the NP. Future work could be focused on studying if and how these effects could change with thermal sampling of the solvated nanoparticles.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:  Table A1 shows results from performing geometry optimizations using the three different LJ parameter sets. Note how the multiscale geometries most closely resemble each of the two single-model geometries depending what model is used to describe the lone pair-donating (hydrogen accepting) molecule in the water dimer. The CF LJ parameters provide the smallest difference in optimal O-O distance between the QM/MM and MM/QM configuration.

Appendix B. Organic Molecule/Water Dimer Benchmark
Appendix B.1. Structures and Nomenclature Figure A1 provides a visual overview of the benchmarked dimers, and the hydrogen-bonding geometries that the QM/MM interface needs to describe.  Figure A1. Structure and nomenclature of the different H-bond evaluated for different organic molecules/water pairs: acetic acid/water (1,2,3,4), methylamine/water (1,2), acetone/water (1), dimethyl ether/water (1), methanol/water (1,2), methane/water (1), imidazole/water (1,2), glycine/water (1,2,3,4,5), and aspartic acid/water (1,2,3,4,5). It is debatable whether the methane-water interaction can be called a hydrogen bond, but, since it is included in the original dataset, we also include it here. Table A2. Comparison of the hydrogen bond binding energy (E HB in kcal/mol), distance (r HB in Å) and angle (φ HB in degrees) as calculated with B3LYP and B3LYP/TIP3P. For the structure and nomenclature of the hydrogen bond evaluated for different organic molecules/water and amino acids/water pairs refer to Figure A1 in Appendix B.1.  Table A3. Root-mean-square deviations (RMSD) for hydrogen bond binding energies (E HB in kcal/mol), distances (r HB in Å), and angles (φ HB in degrees) between the pure B3LYP reference and the B3LYP/TIP3P calculations.

RMSD (E HB ) RMSD (r HB ) RMSD (φ HB )
0.77 0.121 3.7 Appendix B.3. Analysis Figure A2 shows the correlation of hydrogen bond energies, distances, and angles between the QM/MM simulations and the pure QM reference. The QM/MM angles and energies are evenly distributed around their QM reference, and no systematic differences can be observed for this dataset. For the hydrogen bond lengths, the correlation still seems linear, but with a tendency for the QM/MM model to systematically slightly overestimate the bond lengths. The total RMSD values for the dataset are 0.77 kcal/mol, 0.121 Å, and 3.7 degrees, for the binding energy, H-bond length, and H-bond angle, respectively. The authors report that their hydrogen bond energies are systematically too big, while the hydrogen bond distances are systematically too short [33]. Curiously, when we use their LJ parameters in our CRYSTAL17/AMBER16 QM/MM implementation, we only see, as shown in Figure A2, a systematic deviation of the hydrogen bond distances, and, in our case, the QM/MM bonds are too long. There are three main differences between the work of Freindorf et al. and the work carried out here: firstly, the DFT code used by Freindorf et al. is Q-Chem, which is different from CRYSTAL17 in many aspects, and could have an implementation of B3LYP that is not exactly the same as in CRYSTAL17. The authors did not mention if any reduced-scaling algorithms were employed, which we assume must mean that no such methods were used. However, some programs apply, e.g., the resolution of identity approximation by default, which could affect geometry optimizations, but, since no auxiliary basis sets are reported, this is most likely not the case. Secondly, there is not a complete 1:1 correspondence between the basis sets available to the two codes, e.g., CRYSTAL17 uses 5d functions in the 6-31G(d,p) basis set, where many other codes would use 6d. Lastly, the geometry optimizations could have been carried out with another type of optimization, and to different convergence criteria. Figure A3 compares QM/MM water RDFs to the RDF obtained from a pure TIP3P run. Since hybrid functionals such as B3LYP are still relatively costly, we have not made a pure B3LYP simulation for comparison. Instead, we have shown a result from the literature [36], which was restricted to 32 molecules and short trajectories due to cost, and run at 350 K, which impedes direct comparison. It is a well known strategy [54][55][56] to increase the temperature in AIMD simulations to counteract both the overstructuring of liquid water by many DFT functionals, and to nullify the lack of proton quantum effects in the dynamics. The overbinding in the TIP3P LJ parameter dimer binding curve is observed to carry over to liquid water, and results in an overstructuring: the amplitude of both the first peak, and the first well is the largest of all the datasets. The first peak of the DF and CF datasets are almost of the same height, with a peak magnitude of 2.92 and 2.86 for the CF and DF set, respectively. Both the first well and the second peak of the DF dataset have higher amplitudes than the CF dataset. The CF radius of the first solvation shell seems to lie in between the pure B3LYP and pure TIP3P result, whereas the shell is thinner for the two other QM/MM simulations. Three types of QM/MM simulations were made, one for each of the QM/MM LJ parameter sets: TIP3P, DF, and CF. The pure B3LYP data, simulated at 350 K, and for 32 water molecules was obtained from [36]. The higher temperature is known to lower the amplitude of the O-O peaks for both classical, pairwise force fields [54], and for DFT methodologies [55,56].

Appendix C. QM/MM Liquid Water
Angular distributions of the hydrogen bond accepting and donating water molecules are central quantities for characterizing the hydrogen bond network in liquid water. Figure A4 shows the sampled distributions for the three QM/MM water datasets produced in this work. Again, as was the case for the radial distributions, the dimer fitting methodology has reduced the water over-structuring relative to the TIP3P-structure, but not to the same degree as the cluster fit methodology. Especially for the α (and the connected β) angle, not only the amplitude has been decreased towards the TIP3P result, but also the peak position has moved.