Structured Waters Mediate Small Molecule Binding to G-Quadruplex Nucleic Acids

The role of G-quadruplexes in human cancers is increasingly well-defined. Accordingly, G-quadruplexes can be suitable drug targets and many small molecules have been identified to date as G-quadruplex binders, some using computer-based design methods and co-crystal structures. The role of bound water molecules in the crystal structures of G-quadruplex-small molecule complexes has been analyzed in this study, focusing on the water arrangements in several G-quadruplex ligand complexes. One is the complex between the tetrasubstituted naphthalene diimide compound MM41 and a human intramolecular telomeric DNA G-quadruplex, and the others are in substituted acridine bimolecular G-quadruplex complexes. Bridging water molecules form most of the hydrogen-bond contacts between ligands and DNA in the parallel G-quadruplex structures examined here. Clusters of structured water molecules play essential roles in mediating between ligand side chain groups/chromophore core and G-quadruplex. These clusters tend to be conserved between complex and native G-quadruplex structures, suggesting that they more generally serve as platforms for ligand binding, and should be taken into account in docking and in silico studies.


Results
Only the crystal structures containing acridine, berberine, and naphthalene diimide (ND) derivatives fulfilled the acceptance criteria summarized in the preceding section. Two co-crystal structures are available (Table 1) for the tetrasubstituted naphthalene diimide compound MM41 (Figure 1) complexed to intramolecular human telomeric GQs, for which it has high binding affinity. Structure PDB id 3UYH is at the higher resolution of the two and consequently a greater number of ligand-associated water molecules were observed in electron density maps and included in the final refined crystal structure [81]. Hence it was chosen for further detailed analysis (Tables 1 and 2). Structure 3CDM [124] has 2 G-quadruplexes, 158 water molecules, and 4 substituted naphthalene diimides in the asymmetric unit, i.e., 79 waters per G-quadruplex. However, few water molecules in this structure are resolved in the vicinity of the two stacked naphthalene diimide ligands compared to structure 3UYH. In addition, the nature of the naphthalene diimide substituents in 3CDM is not directly relevant to MM41, and hence not to CM03 or SOP1812, so this structure was not considered any further in the present analysis.

MM41 Side Chain Contacts and Water Environment
MM41 has two side chains terminating in N-methyl-piperazine groups and two with terminal morpholino groups. Each of these groups can be assumed to be protonated at physiological pH, with N-methyl-piperazine having a pK of 9.2 compared to the slightly less basic morpholino group, with a pK of 8.5 [81]. Figure 2a shows a view of structure 3UYH projected onto the planes of the G-quartets and the naphthalene diimide core, highlighting the grooves of the GQ. Each MM41 side chain is positioned in or close to the mouth of a GQ groove, although only three of the four end groups are actually situated within a groove. The fourth, having a terminal morpholino ring, is oriented away from the quartet plane, and a detailed examination of the crystal structure has indicated that rotation of the side chain to place the morpholino group into groove 4 is sterically hindered by the small surface area of the naphthalene diimide core compared to that of the quartet [81].

Figure 2. Views of the crystal structure of MM41 (with its carbon atoms colored magenta) bound to a human intramolecular telomeric G-quadruplex, PDB id 3UYH. (a) The view is projected onto the G-quartet plane and shows the extent of overlap with the naphthalene diimide core chromophore. The water molecules that are in direct or indirect contact with the MM41 molecule are shown as red spheres, with hydrogen bonds indicated by dashed lines. This and the subsequent figures were drawn using the ChimeraX package (https://www.cgl.ucsf.edu/chimerax/, last accessed on 16 December 2021) [125]. (b) A view of the 3UYH complex looking into groove 1. The four water molecules are shown that are hydrogen bonded to the morpholino group in this groove and the OAF carbonyl oxygen atom of the naphthalene diimide core. A nearby cluster of water molecules is also shown, embedded deep in the groove and adjacent to a TTA loop. (c) A view of the 3UYH complex midway between grooves 2 and 3, highlighting the group of four water molecules hydrogen bonding to an N-methyl-piperazine and a morpholino group in these grooves.

Table 2. Hydrogen bond interactions. (a) In structure 3UYH, involving the tetrasubstituted naphthalene diimide MM41, a human intramolecular telomeric G-quadruplex, and water molecules. Hydrogen-bond distances are shown (d1-2, in Å), together with the reported crystallographic B factor values (in Å²) for MM41-bound waters and associated MM41 and DNA atoms. MM41 atoms are highlighted in bold red type. Waters in direct contact with MM41 atoms are highlighted in bold blue type. Numbering is as in the PDB entry. (b) In structure PDB id 3CE5 [74], involving the trisubstituted compound BRACO19, a human intermolecular bimolecular telomeric G-quadruplex, and water molecules. Parameter definitions and color coding are as in Table 2a.

All four basic end groups of MM41 have their protonated nitrogen atoms in hydrogen bond/electrostatic contact with atoms in the GQ grooves (Table 2). However, only two of these contacts, each involving the terminal nitrogen atom of an N-methyl-piperazine group, have a direct nitrogen-phosphate group hydrogen bond interaction (N...OP distances of 2.9 and 3.1 Å). The three other end groups all have water contacts with ring nitrogen atoms (Figure 2b,c), which, in the case of the morpholino groups, are presumed to be protonated. Groove 1 (Figure 2b) has the morpholino group positioned at the mouth of the groove. A small linear cluster of four water molecules extends from the morpholino basic nitrogen atom, with one water contacting two further waters, which contact two neighboring phosphate oxygen atoms. One of these waters, contacting OP2 of dG10, also contacts a fourth water molecule, which in turn contacts the adjacent oxygen substituent on the naphthalene diimide chromophore. The other water, contacting a phosphate oxygen atom (OP2 G9), also contacts, and is thus the link to, a water molecule in a second water network that fills the rest of this groove. The N-methyl-piperazine group in groove 2 (Figure 2c), which is situated at the mouth of the groove, has its terminal nitrogen atom (NCA) in close contact with a phosphate oxygen atom, OP2 dT11, suggesting that this nitrogen atom carries a proton. The inner piperazine ring nitrogen atom contacts another linear group of four water molecules, which extends into groove 3 and terminates with a hydrogen bond contact with the second morpholino group. The second water in this array has a contact with a phosphate oxygen atom (OP2 dG16), and the third is in hydrogen bond contact with carbonyl oxygen atom OAG of the ND core.
The second N-methyl-piperazine ring, also situated at the mouth of groove 4, has a direct contact involving the outer ring nitrogen and a phosphate oxygen atom (Figure 2a), but does not have any associated water molecules.
The above section describes water molecules involved in ligand contacts; other waters fill out the remaining space in the grooves. The incomplete hydration observed in some grooves is most likely due to limitations of the crystal structure at 1.95 Å resolution, with only a fraction of the potential total number of water molecules located in electron density. Groove 2 is one of the more completely resolved grooves in terms of hydration (Figure 3), with waters embedded deep into the groove. This complex array of 12 water molecules, of which the majority are first-shell, hydrogen bonds with phosphate oxygens, O4 and O5 atoms, and guanine base edges (which form the floor of the groove). The net effect is to maintain the relative positions of the TTA loop and the groove.

Figure 3. Surface representation of a view into groove 2 in the 3UYH complex. The N-methyl-piperazine substituent of MM41 is oriented end-on and is colored blue. Note that the groove space is filled out by water molecules, colored red. The semi-transparent surface of the G-quadruplex is colored grey.

MM41 and Water Mobility
It is notable that there is remarkably little overlap between the ND four-ring core and the individual guanines in the top quartet, as seen in Figure 2a. The surface area of the ND core of MM41 is too small to allow simultaneous overlap with more than one guanine of the top G-quartet. Computational experiments were undertaken using two different energy minimization protocols (in the ARGUSLAB and AVOGADRO packages) to assess the effect of removing all the water molecules from the structure. Minimization in both cases resulted in movement of the ND core by ca. 1.5 Å to produce improved overlap with a guanine base of the G-quartet. The cationic groups in the side chains tended to move closer to the phosphate oxygen atoms. Attempts to dock the MM41 molecule onto the water-free GQ using AutoDock Vina 1.1.2, as installed within the database G4LDB 2.2, resulted in a series of almost equi-energetic poses in which three out of four side chains were positioned away from the grooves. This was not pursued further.
The side chain heterocyclic groups in MM41 have greater mobility than the ND core, as revealed by their individual atomic temperature factors (see the PDB entry for 3UYH and Table 2). The five nitrogen atoms in these terminal rings, which are hydrogen-bonded to waters or phosphate groups, have a mean B value of 51 Å², corresponding to a <U> of 0.8 Å. The water molecules in the mini cluster around and contacting the morpholino ring in groove 1 have lower B values, with most in the range 27-32 Å², corresponding to a <U> of 0.6 Å. The cluster of waters in groove 2/3 has slightly greater mobility.
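The <U> values quoted above are consistent with the standard isotropic relation B = 8π²<u²>. The following is a minimal Python sketch of that conversion; the function name is illustrative and is not taken from the original study.

```python
import math


def b_to_rms_displacement(b_factor: float) -> float:
    """Convert an isotropic B factor (in Å^2) to an rms displacement (in Å).

    Uses the standard relation B = 8 * pi^2 * <u^2>.
    """
    if b_factor < 0:
        raise ValueError("B factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# B values quoted in the text for MM41 end-group nitrogens and groove 1 waters
for b in (51.0, 27.0, 32.0):
    print(f"B = {b:4.1f} Å^2 -> <U> ≈ {b_to_rms_displacement(b):.2f} Å")
```

Rounded to one decimal place, this gives about 0.8 Å for B = 51 Å² and about 0.6 Å for the 27-32 Å² range.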
The extent to which water molecules located in the 3UYH crystal structure correspond to those found in the native structure was examined by superimposing on 3UYH the native crystal structure 1KF1 [126], overlaying the G-quartets (Figure 4a,b). Overlap of the quartets was good, as expected. However, the loops in the MM41-bound structure adopt distinct conformations compared to those in the native structure. Systematics of loop conformations in GQ crystal structures have been previously reported [95] and will not be further discussed here. Detailed comparison of water positions revealed that the cluster of four waters at the mouth of groove 1 that mediate between one of the morpholino end groups of MM41 and the G-quadruplex is also present in the native structure (Figure 4a), with distances between each pair of waters (i.e., a 3UYH water...1KF1 water) of 0.6-1.0 Å. Since these waters are mobile, with a <U> of 0.6 Å, they can be considered likely to occupy the same space. Three conserved water molecules are also present at the mouth of groove 3 (Figure 4b), close to the second morpholino end group.
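The conservation test described above (a water in one structure lying within about 1.0 Å of a water in the other, after superposition on the G-quartets) can be sketched as follows. This is an illustrative sketch only: the function name and the coordinates are hypothetical, and the coordinates are assumed to be already in a common frame.

```python
import math


def conserved_water_pairs(waters_a, waters_b, cutoff=1.0):
    """Pair each water in A with its nearest water in B if within cutoff (Å).

    Both coordinate lists must already be in the same frame, e.g. after
    superimposing the two structures on their G-quartets.
    Returns a list of (index_a, index_b, distance) tuples.
    """
    pairs = []
    for i, a in enumerate(waters_a):
        best_j, best_d = None, float("inf")
        for j, b in enumerate(waters_b):
            d = math.dist(a, b)
            if d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d <= cutoff:
            pairs.append((i, best_j, best_d))
    return pairs


# Hypothetical coordinates (Å) for illustration; not taken from 3UYH or 1KF1.
complex_waters = [(1.0, 2.0, 3.0), (5.0, 5.0, 5.0), (9.0, 0.0, 1.0)]
native_waters = [(1.5, 2.2, 3.4), (5.0, 7.5, 5.0)]

for i, j, d in conserved_water_pairs(complex_waters, native_waters):
    print(f"complex water {i} ~ native water {j}: {d:.2f} Å")
```

A nearest-neighbour search is used here for simplicity; for larger solvent sets a spatial index would be more efficient, but the criterion itself is unchanged.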

Water Mediation in Acridine-G-Quadruplex Structures
The crystal structure (PDB id 3CE5) of the complex between the experimental drug BRACO19 and a bimolecular human GQ [74] shows that, in common with the MM41 complex, the GQ has adopted a parallel topology. In both instances, the ligand is stacked onto one end of the GQ, onto a terminal G-quartet. The BRACO19 molecule has three cationic charges at physiological pH, one in each side chain pyrrolidino ring and one on the central ring nitrogen atom in the acridine ring. None of these are directly hydrogen-bonded to anionic phosphate groups. Instead (Table 2 and Figure 5a), they are hydrogen bonded to water molecules. The waters hydrogen bonded to the pyrrolidino cationic nitrogen atoms do eventually link indirectly to phosphate groups, via further water molecules. Water molecule W52, hydrogen bonded to the acridine central ring nitrogen, appears to play a crucial role, hydrogen bonding both to an O6 of a guanine from the adjacent stacked G-quartet and to N3 of a thymine in-plane with the acridine. W52 also hydrogen bonds to W53, which in turn hydrogen bonds to the carbonyl oxygen atom of one of the amide side chains on BRACO19. The other amide group is trans to this and its amide nitrogen atom hydrogen bonds to O4 of this thymine; this is the sole direct drug-GQ hydrogen bond, with the remaining five being water-mediated. An overlay of the MM41 and BRACO19 complexes (Figure 5c) shows that several of the key water molecules are conserved between the two structures, with distances between pairs of waters < 1 Å, using the argument outlined in the previous section.
Figure 4. Superposition of the G-quartets of native and MM41-complexed G-quadruplex crystal structures, 1KF1 and 3UYH, respectively. The native structure is colored light red, the ligand-bound is cyan, and the MM41 molecule is shown magenta. Only the water molecules in the groove are shown, colored as in their G-quadruplex structures. Those water molecules in the two structures that are < 1.0 Å to each other are enclosed in red circles. (a) Viewed into groove 1. (b) Viewed into groove 3.

Figure 5. (a) View of the BRACO19 complex with a human telomeric bimolecular G-quadruplex, as observed in the crystal structure PDB id 3CE5 [74], projected onto the acridine plane. The carbon atoms of the ligand are colored magenta and water molecules are colored as red spheres. Hydrogen bonds are shown as dotted lines. Some water molecules not directly involved in ligand interactions have been omitted from this view to enhance clarity. (b) View of the complex involving a disubstituted acridine with a fluorine atom attached to each terminal side chain pyrrolidino ring, bound to an Oxytricha nova bimolecular G-quadruplex [100]. Color coding is as in other figures, with the fluorine atoms colored yellow. (c) Overlay of structures 3UYH (G-quadruplex in cyan and MM41 in magenta) and 3CE5 (G-quadruplex in red and BRACO19 in dark blue), superimposed on the G-quartets, viewed into groove 1 of the 3UYH structure. Water molecules in the groove are colored as in their G-quadruplex structures. Those water molecules in the two structures that are < 1.0 Å distance to each other are enclosed in red circles.
Removal of the water molecules from this structure, followed by energy minimization, resulted in movement of the acridine by ca. 2.5 Å, enabling the acridine ring nitrogen atom to directly contact the thymine ring substituents. Such an arrangement has been observed in the series of co-crystal structures [75] involving disubstituted acridines with the bimolecular anti-parallel GQ from Oxytricha nova (Table 2). These structures, exemplified by the two high-resolution structures [100] with fluorine substituents in the pyrrolidino side chains, have direct O2 thymine and N3 hydrogen-bond contacts with the acridine ring and an amide carbonyl oxygen atom (Figure 5b). A water molecule is involved in mediating between a side chain amide and a phosphate oxygen atom. However, the Oxytricha nova complexes all have a distinct anti-parallel GQ topology, with the acridine constrained within a tetranucleotide diagonal loop, with little room for any associated water molecules.

Discussion
The number of water molecules located in nucleic acid and protein crystal structures is invariably less than the total present in the crystal lattice. Thus, crystal structures 3UYH and 3CE5, whose reported water molecules are analyzed here, both have an estimated 56% solvent, which corresponds to >400 water molecules. A small fraction of this, 51 and 54 water molecules, respectively, were observed in electron density maps [74,81]. These are almost all first or second shell immobilized waters.
Three principal findings emerge from the current analysis:
1. The morpholino end groups of MM41, which are assumed to be basic in the buffering conditions of the crystallization experiment and in biological solution, do not directly contact the GQ. Hydrogen bonding/electrostatic interactions with negative backbone phosphate groups were anticipated but were not observed. Instead, the basic ring nitrogen in each morpholino group hydrogen bonds to one of a group of four water molecules positioned in the mouth of the relevant grooves (1 and 3). The waters are in hydrogen bond contact with backbone phosphates. Similarly, the basic pyrrolidino side chain terminal groups of BRACO19 do not directly contact phosphate groups in its GQ complex, with water mediation being observed in the crystal environment.
2. A nitrogen atom on both N-methyl-piperazine groups of MM41, by contrast, directly hydrogen bonds to a backbone phosphate oxygen atom, implying greater basicity than morpholino for this end group.
3. The water clusters associated with the two morpholino groups of MM41 are highly conserved between the native and the MM41-bound GQ structures. There is also conservation of a number of the ligand-associated waters between the MM41 and BRACO19 structures, and by implication, between the native and BRACO19 structures.
We suggest that the conserved water clusters have relevance to the observed structure-activity relationships for MM41 derivatives [81]. Thus, replacing the morpholino groups with isosteric groups such as hexose or ether groups, which lack the morpholino hydrogen bonding ability, results in an almost complete loss of GQ affinity and reduced biological activity compared to MM41. It is also notable that none of the four side chains are deeply embedded in the GQ grooves, and one might have therefore expected reduced GQ affinity compared to analogues with longer side chains. This is not the case since, as observed here, the short side chains are effectively captured by hydrogen bonding to the conserved water clusters, which would have to be displaced by longer side chains. This would be at a significant entropic cost. The strategy of replacing two of the four strongly basic end groups characteristic of earlier ND compounds (see, for example, refs. [55,57,79]) by less strongly basic morpholino groups [18,19,81] has culminated in the design and evaluation of MM41, and subsequently compounds CM03 and SOP1812 (Figure 1), which are currently being assessed as pre-clinical candidates. The rationale for lowering the highly cationic nature of these ND compounds is that this could improve cellular uptake and tumor distribution while retaining GQ affinity. The present analysis has shown that this substitution has preserved the water structure around the perimeter of the grooves in which the morpholino groups bind, with no diminution of GQ affinity [90].
The mediating waters in the BRACO19 structure also have relevance to the observed structure-activity relationships for BRACO-GQ interactions [73,74], and for BRACO19 biology. Although BRACO19 may not be further developed as an anti-cancer drug [127], its activity against a variety of viral GQ targets [128][129][130] suggests the likelihood of further anti-viral analogue programs, for which the 3CE5 structure and its structured water features are of direct relevance.
The present analysis, although limited to two crystal structures, demonstrates that water molecules can play an active role in GQ-ligand recognition. This indicates that in silico and docking studies of ligand-GQ binding need to take account of reliably located explicit water molecules. It is concluded that their omission will lead to misleading conclusions on low-energy ligand binding states and interactions. Many such studies still tend to ignore the role of water and really require input from high-resolution crystal structures or reliable and well-validated water modelling/simulations. This has been recognized in several studies (for example, refs. [118,131]). Prediction of water positions and mobilities in ligand complexes can be made using molecular dynamics [131], although this has only rarely been used to date for GQ systems [132]. The prediction of water positions having low mobility in nucleic acids by use of a specially generated water force field together with statistical scoring has led to the development of an automated method, termed "SPLASH'EM" (Solvation Potential Laid around Statistical Hydration on Entire Macromolecules) [118], which has given results for duplex DNA and some RNA structures in good agreement with experiment. It will be interesting to see this type of approach used for those GQ ligand complexes for which there is high-resolution structural data, as well as therapeutically important GQ drug targets such as that from the KRAS promoter sequence [114]. Conserved water molecules have been identified in the grooves of this crystal structure [114], as well as in other high-resolution GQ native structures [115]. By analogy with structures 3UYH and 3CE5, these conserved and structured waters should be retained in docking studies. The present analysis indicates that such water platforms for ligand binding can form an essential part of the total low-energy GQ interaction complex.
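One possible way to act on the recommendation to retain structured waters when preparing a receptor for docking is to filter the water records of a PDB-format file selectively, rather than stripping all of them. The sketch below is illustrative only: the function names, chain identifier, residue numbers, and coordinates are hypothetical, and it relies solely on the fixed-column layout of ATOM/HETATM records.

```python
def pdb_line(record, serial, name, resname, chain, resseq, x, y, z):
    """Build a minimal fixed-column ATOM/HETATM line (demo helper)."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}"
            f"{resseq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


def filter_waters(pdb_lines, keep=frozenset()):
    """Drop water records, except waters listed in keep.

    keep holds (chain_id, residue_number) tuples for waters to retain.
    Non-water lines are passed through unchanged.
    """
    kept = []
    for line in pdb_lines:
        is_atom_record = line.startswith(("ATOM", "HETATM"))
        # PDB fixed columns: residue name 18-20, chain ID 22,
        # residue sequence number 23-26 (1-based).
        if is_atom_record and line[17:20] in ("HOH", "WAT"):
            key = (line[21], int(line[22:26]))
            if key not in keep:
                continue
        kept.append(line)
    return kept


# Hypothetical records for illustration; not taken from any PDB entry.
lines = [
    pdb_line("ATOM", 1, "N9", "DG", "A", 1, 0.0, 0.0, 0.0),
    pdb_line("HETATM", 2, "O", "HOH", "A", 101, 3.0, 1.0, 0.5),
    pdb_line("HETATM", 3, "O", "HOH", "A", 102, 8.0, 2.0, 4.0),
]

water_free = filter_waters(lines)                         # all waters removed
with_platform = filter_waters(lines, keep={("A", 101)})   # one water retained
print(len(water_free), len(with_platform))
```

Which waters to place in `keep` is the scientific decision; the conserved groove waters identified by structure comparison would be natural candidates.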
2. Having at least one water molecule contacting a ligand,
3. Hydrogen bonds were accepted in a structure if donor-acceptor distances were ≤3.25 Å and donor-hydrogen...acceptor angles were ≤30° from ideality, and
4. Relevance to current drug discovery.
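The hydrogen-bond acceptance criterion stated above can be expressed as a simple geometric test. The sketch below is a hedged illustration, not the procedure used in the study: it assumes that "ideality" means a linear donor-hydrogen...acceptor angle of 180° and that a hydrogen position is available (for example, a modelled one); the function names and coordinates are hypothetical.

```python
import math


def angle_deg(a, b, c):
    """Return the angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def is_hydrogen_bond(donor, hydrogen, acceptor,
                     max_dist=3.25, max_dev=30.0, ideal_angle=180.0):
    """Apply the distance and angle criteria quoted in the text.

    ideal_angle=180.0 is an assumption about what "ideality" means.
    """
    dist = math.dist(donor, acceptor)
    deviation = abs(ideal_angle - angle_deg(donor, hydrogen, acceptor))
    return dist <= max_dist and deviation <= max_dev


# Hypothetical geometries (Å), for illustration only.
print(is_hydrogen_bond((0, 0, 0), (1.0, 0, 0), (2.9, 0, 0)))  # linear, 2.9 Å
print(is_hydrogen_bond((0, 0, 0), (1.0, 0, 0), (3.4, 0, 0)))  # too long
print(is_hydrogen_bond((0, 0, 0), (0, 1.0, 0), (2.9, 0, 0)))  # badly bent
```

Only the first geometry satisfies both criteria; the second fails on distance and the third on angle.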
We excluded the daunomycin complexes with d(G4) and d(TG4T) (PDB ids 3TVB and 1O0K) from consideration, even though they are of high resolution and have large numbers of localized water molecules [116,117]. Their relationship to human GQ ligand complexes is unclear, because of their characteristics of multiple bound and stacked daunomycin molecules.

Conclusions
This study has analyzed data from earlier crystal structure analyses and has shown that first- and second-shell water molecules play an important role in the binding of two experimental small-molecule drugs, MM41 and BRACO19, to human telomeric G-quadruplexes. These waters mediate between cationic side-chain functional groups and phosphate backbones. They also directly bridge the chromophore core of the drugs and other G-quadruplex groups. Altogether, waters serve to maintain the drug molecules in their low-energy binding positions and their removal would result in incorrect drug positions. This has implications for drug design and virtual library screening and docking.