Phosphorus SAD Phasing for Nucleic Acid Structures: Limitations and Potential

Phasing of nucleic acid crystal diffraction data using the anomalous signal of phosphorus, P-SAD, at Cu Kα wavelength has been previously demonstrated using Z-DNA. Since the original work on P-SAD with Z-DNA there has been, with a notable exception, a conspicuous absence of applications of the technique to additional nucleic acid crystal structures. We have reproduced the P-SAD phasing of Z-DNA using a rotating-anode source and have attempted to phase a variety of nucleic acid crystals using P-SAD without success. A comparison of P-SAD using Z-DNA and a representative nucleic acid, the Dickerson-Drew dodecamer, is presented along with an S-SAD experiment using only two sulfurs to phase a 2'-thio modified DNA decamer. A theoretical explanation for the limitation of P-SAD applied to nucleic acids is presented to show that the relatively high atomic displacement parameter of phosphorus in the nucleic acid backbone is responsible for the lack of success in applying P-SAD to nucleic acid diffraction data.


Introduction
Anomalous dispersion phasing is an attractive approach to solving the phase problem in crystallography of biological macromolecules. In the best cases, experimental phases determined using anomalous dispersion techniques are capable of producing interpretable electron density maps without knowledge of the sequence of the protein or nucleic acid. The ability to exploit these techniques obviates the requirement for previously determined structures of similar conformation for molecular replacement phasing. However, the presence of a tractable anomalous signal depends on the presence of a suitable anomalous scatterer and the ability to collect diffraction data to adequate precision to extract that signal. In most cases, the anomalous scatterer is an artificial addition to the structure to be determined. Ideally, the anomalous scatterer is an atom naturally occurring in the molecule whose structure is to be determined, precluding any question of isomorphism.
Covalent modification of oligonucleotide-size fragments with bromine (also iodine) at the C5-position of uracil or cytosine in combination with the multiple-wavelength anomalous dispersion (MAD) [21] or single-wavelength anomalous dispersion (SAD) [22] techniques is a common phasing approach [23]. Halogenated nucleotides can be incorporated using both solid phase synthesis and in vitro transcription, the required reagents are commercially available and relatively cheap, and the bromine K edge is easily accessible at X-ray synchrotrons. However, despite its success the approach often fails: the carbon-halogen bond is light sensitive and halogenation can prevent crystallization or negatively affect the quality of crystals; the limited choice of incorporation sites constitutes a further disadvantage (e.g., [24][25][26]). Covalent incorporation of selenium into proteins in the form of Se-Met has greatly facilitated protein crystallography [27][28][29]. Selenium modification has recently also been established for native and chemically modified nucleic acids and offers considerable benefits over bromine modification in terms of phasing [30][31][32][33][34][35][36][37][38].
Phasing based on intrinsically present heavy atoms such as sulfur in proteins obviously precludes the need for covalent or non-covalent modification. Thus, SAD phasing using the anomalous signal of sulfur (S-SAD) at either copper or chromium Kα wavelengths on in-house rotating anode generators has enjoyed considerable success [39][40][41][42][43][44][45][46][47][48][49], even though the data are collected at energies far above the sulfur K absorption edge. Wang [50] formulated a theoretical limit for the anomalous signal for a successful SAD phasing at 0.6%. Ramagopal et al. [51] were able to phase crystals of glucose isomerase and xylanase with anomalous signals below 1% and establish the validity of the Wang limit for S-SAD.
By comparison, in the years since the original work on phosphorus SAD (P-SAD) phasing of Z-DNA was done by Dauter and Adamiak [52], there has been little evidence of success using the phosphorus signal to phase oligonucleotide crystal diffraction data. This is in spite of the theoretical utility of the phosphorus signal. Oligonucleotide crystals generally contain one phosphorus atom per nucleotide (the 5'-terminal residue in chemically synthesized material typically carries a free hydroxyl group). Therefore the average anomalous signal will not vary by molecular weight. So, why has P-SAD not become a method of choice for phasing nucleic acid diffraction data?
We have explored the possibility of P-SAD phasing for several DNA and RNA oligonucleotides of known or unknown crystal structure using data collected on an in-house source with copper Kα radiation. The work on the Z-DNA hexamer d(CGCGCG) follows and duplicates that by Dauter and Adamiak [52], who successfully phased diffraction data from a crystal of Z-DNA using 1.5418 Å data collected using a synchrotron source. The present work verifies the use of a rotating-anode source for phasing of Z-DNA diffraction data using P-SAD. Although a number of oligonucleotide crystals were examined in this study, only the Z-DNA crystal could be phased using P-SAD. It appears that the anomalous signal of phosphorus is too weak for P-SAD structure determination with even well diffracting crystals of relatively small oligonucleotides (e.g., the Dickerson-Drew dodecamer [2,53]). Rather than discuss each unsuccessful case, the Dickerson-Drew dodecamer, DDD d(CGCGAATTCGCG), is presented as representative of oligonucleotide crystals that would appear to be amenable to phasing by P-SAD but were not. The 2'-thio decamer crystal (GCGTAU*ACGC; U* = 2'-SMe-rU) is discussed as a case in which an oligonucleotide crystal could be phased using only two sulfur atoms, demonstrating that the structural form of oligonucleotide crystals does not account for the lack of success using phosphorus alone. A theoretical explanation for this unexpected lack of success is presented. While the method is not yet routinely feasible, new developments and methods of data collection may help to achieve the levels of precision needed to make P-SAD a viable technique.

Z-DNA
The Z-DNA crystal was flash-cooled directly from the mother liquor and maintained at 100 K during data collection. The data collection strategy determined using the Bruker Cosmo software (Bruker AXS, Karlsruhe, Germany) to maximize real redundancy required a total of 13 separate ω and φ scans with 2θ angles up to 67 degrees. A total of 2323.5 degrees of data were collected from the single crystal. Data collection statistics are presented in Table 1.
Data were collected to a maximum resolution of 0.95 Å, matching the resolution of data used by Dauter and Adamiak [52]. The quality of the data clearly exceeded that limit (see Table 1). Phasing of the integrated and scaled data using SHELXC/D/E [54] was done through the hkl2map gui [55] assuming 10 phosphorus atoms and a solvent content of 46%. The solvent content calculated using the Matthews Probability Calculator is 49.94% [56,57]. SHELXD was run on the prepared data for a total of 10,000 tries. A plot of the correlation coefficient calculated for all data, CCall, versus try number is presented in Figure 1. The vast majority of tries resulted in CCall around 20%. A quick look at the scatter plot would suggest that successful tries were those with a CCall greater than 40%, of which there are 65, or about 0.65% of all tries. However, tries with CCall as low as 24.59% still provided excellent phases. Successful tries in Figure 1 are denoted in red and unsuccessful tries in green. From this it can be seen that the success rate is much higher, 373 tries or about 3.73%. The CC for all data appears to be the best indicator of a successful try but is not perfect, as can be seen from the overlap of successful and unsuccessful tries. In fact, a try with a CCall value of 24.76% and a CCweak value of 8.91% was successful while a different try with an identical value for CCall but a CCweak of 5.37% failed. The cutoff of successful CCall values just below 25% is consistent with previous studies of protein SAD phasing [58]. It is interesting that there is not a clear clustering for successful and unsuccessful tries. The peak height for sites is plotted in Figure 2. This plot shows that the peak heights or occupancies calculated by SHELXD do not readily distinguish true results from noise. Although the cluster of best tries clearly shows all 10 sites with high occupancy, there are a larger number of less well defined tries that provide excellent phases without all sites being well defined. The density modification in SHELXE is capable of refining less well defined phosphorus atom positions as shown in Figure 3. There are 3 phosphorus atoms which appear among the top peaks in all trials while the remainder do not follow a definite sequence (see Table 2), so that the peak height found by SHELXD is only crudely correlated with atomic B-factor refined in SHELXL (see Table 3). All peaks found in the top try (correlation coefficient of 44.98%) are shown in Figure 4 to be closely aligned with the final refined Z-DNA structure. Refinement statistics for Z-DNA are given in Table 4. As a further test of the quality of the Z-DNA data, phasing was done using all data to 0.95 Å by direct methods using SIR2004 [59]. The Z-DNA data set was given to SIR2004 using the "nopatterson" option so that phasing was done using only the tangent method and the anomalous signal in the data was ignored (Figure 5a). Direct methods phasing was able to produce a map of sufficient quality to automatically place six atoms of the spermine molecule correctly (see Figure 5b). The resolution of the data is critical to the success of the direct methods tangent procedure given 240 non-hydrogen atoms in the asymmetric unit. The tangent procedure was unsuccessful when the data were limited to 1.1 Å resolution.
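The difference between the two success rates quoted above (tries above a CCall cutoff versus tries that actually gave usable phases) can be illustrated with a small tally. This is only a sketch: the tuple format and the `phased_ok` flag are hypothetical stand-ins for whatever check is used to decide that a try succeeded, and nothing here parses real SHELXD output.

```python
def success_rates(tries, cc_cutoff=40.0):
    """Return (% of tries above the CCall cutoff, % of tries that truly phased).

    tries: list of (cc_all_percent, phased_ok) tuples; this format is a
    hypothetical illustration, not a SHELXD file format.
    """
    n = len(tries)
    above_cutoff = sum(1 for cc, _ in tries if cc > cc_cutoff)
    truly_ok = sum(1 for _, ok in tries if ok)
    return 100.0 * above_cutoff / n, 100.0 * truly_ok / n


# Counts quoted in the text: 65 tries above 40% CCall and 373 successful
# tries, both out of 10,000.
print(100.0 * 65 / 10_000)   # 0.65
print(100.0 * 373 / 10_000)  # 3.73
```

A CCall cutoff alone therefore undercounts usable solutions, which is why the low-CCall tries are worth passing to density modification before being discarded.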

Dickerson-Drew Dodecamer
A large number of DDD crystals were used to collect diffraction data. Repeated attempts to phase the data using the anomalous signal of phosphorus were unsuccessful. Data for the representative crystal chosen for discussion here were collected at 100 K. The strategy determined using the Bruker Cosmo software required 25 separate ω and φ scans totaling 4638.5 degrees of data. Data collection statistics are presented in Table 1.
Attempts at phasing of the diffraction data were done using SHELXD through the hkl2map gui for a minimum of 10,000 tries. Phasing attempts were performed using a variety of data cut-offs in both resolution and I/σ(I). Phasing of the data was also attempted using SHARP [60] and SOLVE [61] without success.

2'-Thiomethyl-Modified DNA Decamer
Diffraction data were collected at 100 K to a resolution of 1.3 Å. The data collection strategy determined using the Bruker Cosmo software required 21 separate φ and ω scans with a total of 5,326 degrees of data collected from a single crystal. Data collection statistics are presented in Table 1. Refinement statistics for the 2'-thio decamer are given in Table 4.
Phasing of the diffraction data was done using SHELXD/E through the hkl2map gui using a search for 20 phosphorus atoms and a resolution cut-off of 1.6 Å. A scatter plot of the correlation coefficient for all data, CCall, versus try number over 10,000 tries is presented in Figure 6. Comparison of Figure 6 with Figure 1 demonstrates the much higher success rate for phasing with the 2'-thio decamer data than with the Z-DNA data, even given the exceptionally high quality of the Z-DNA data. In each successful phasing try the two sulfur atoms were the first anomalous scatterers found and have the highest occupancy values. It is common practice to use the phosphorus anomalous signal as an internal control for correctness of SAD phasing using strong anomalous scattering from such elements as platinum, thallium, and rubidium [62], so the phosphorus positions are likely due primarily to the quality of the sulfur phases.

Anomalous Signal
The relative level of difficulty in extracting an anomalous signal using phosphorus atoms can be appreciated through a graph of f′ and f″ of phosphorus along with the commonly used elements sulfur, selenium, and barium (see Figure 7). Comparing this to the specific instance of the sulfur and phosphorus signals at the in-house Cu Kα wavelength (Figure 8) illustrates the difficulty of obtaining a sufficient signal:noise ratio to adequately retrieve the anomalous signal.
It seems from these results that the most important factor for success is the quality of the data and thus the signal to noise ratio. While Dauter and Adamiak [52] demonstrated the value of redundancy for success in phasing Z-DNA diffraction data, it must be noted that Z-DNA crystals provide unusually high quality diffraction data extending to 0.6 Å using synchrotron radiation [7] and extending well beyond the present 0.95 Å resolution in-house, as seen by the data collection statistics for Z-DNA in Table 1. The anomalous contribution of the phosphorus atoms at copper Kα is generally represented as the ratio of the mean Bijvoet difference to the mean amplitude at a given diffraction angle θ:

$$\frac{\langle |\Delta F^{\pm}| \rangle}{\langle F \rangle} = \sqrt{2}\,\frac{\sqrt{N_A}\, f''_A}{\sqrt{N_P}\, f_{eff}} \qquad (1)$$

where N_A is the number of anomalous scatterers (phosphorus atoms), N_P is the number of protein atoms, f″_A is the imaginary anomalous scattering contribution of the anomalous scatterer, and f_eff = (1/N) Σ f_i is the effective scattering from an average atom at diffraction angle θ. The theoretical value of ∆F_ano/F for a nucleic acid crystal containing one phosphorus atom per base has been calculated by Dauter and Adamiak [52] to be about 2.0%. Calculation of ∆F_ano/F for actual diffraction data is necessarily overestimated and the degree of overestimation provides a measure of noise in the anomalous signal. A comparison of the ∆F_ano/F ratio plotted by resolution for the three data sets is presented in Figure 9. For Z-DNA, a number of apparently noise level solutions from SHELXD were identified after refinement in SHELXE (Figure 10). It may be that advances in refinement of anomalous scatterers such as the "free-lunch algorithm" [63] will facilitate the application of P-SAD.

Figure 9. The signal to noise ratio evaluated using ∆F_ano versus resolution, plotted for the three data sets: Z-DNA, DDD, and the 2'-thio decamer. The data were calculated by resolution bin using 10 bins each for plotting.
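The estimate of about 2% can be checked with a few lines of arithmetic. The sketch below assumes the usual form of the estimate, √2·√(N_A/N_P)·f″/f_eff, evaluated at zero scattering angle. The inputs are assumptions made for illustration: a composition of 4 phosphorus atoms among 82 non-hydrogen atoms (a d(GT)/d(AC) duplex step), f″ of phosphorus at Cu Kα taken as 0.434 e, and f_eff taken as 6.7.

```python
import math


def bijvoet_ratio(n_anom, n_total, f_dprime, f_eff=6.7):
    """Expected <|dF|>/<F> at zero scattering angle.

    n_anom   : number of anomalous scatterers (N_A)
    n_total  : number of non-hydrogen atoms (N_P)
    f_dprime : f'' of the anomalous scatterer, in electrons
    f_eff    : effective scattering of an average atom (assumed 6.7)
    """
    return math.sqrt(2.0) * math.sqrt(n_anom / n_total) * f_dprime / f_eff


# Assumed inputs: 4 P among 82 non-hydrogen atoms, f''(P, Cu K-alpha) = 0.434 e.
ratio = bijvoet_ratio(n_anom=4, n_total=82, f_dprime=0.434)
print(f"{100 * ratio:.1f}%")  # 2.0%
```

Because the ratio depends only on the fraction N_A/N_P, it is the same for any duplex length with one phosphorus per nucleotide, which is why the expected signal does not vary with molecular weight.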

Identification of Correct Solutions
In an evaluation of the Measurability of Bijvoet differences as an indicator for phasing success (see Table 5), the DDD data showed the highest percentage measurability of the three and Z-DNA the lowest. It may be that the contributions of experimental error and noise at this level of anomalous signal make this measure unreliable for P-SAD.
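For readers unfamiliar with the statistic, a minimal sketch of a measurability calculation is given here. It assumes the definition commonly attributed to Zwart [121]: the fraction of measured Bijvoet pairs for which both the anomalous difference and each mate are significant. The threshold of 3 and the quadrature estimate of σ(ΔI) are assumptions of this sketch, and the input tuple format is hypothetical.

```python
import math


def measurability(pairs, threshold=3.0):
    """Fraction of Bijvoet pairs with a significant anomalous difference.

    pairs: iterable of (I_plus, sig_plus, I_minus, sig_minus) tuples
    (hypothetical format). A pair counts when |dI|/sigma(dI) > threshold
    and min(I+/sig+, I-/sig-) > threshold.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    n_ok = 0
    for i_p, s_p, i_m, s_m in pairs:
        if s_p <= 0 or s_m <= 0:
            continue  # skip pairs without usable error estimates
        # Assumed error model: independent errors added in quadrature.
        sig_diff = math.sqrt(s_p ** 2 + s_m ** 2)
        diff_ok = abs(i_p - i_m) / sig_diff > threshold
        both_ok = min(i_p / s_p, i_m / s_m) > threshold
        if diff_ok and both_ok:
            n_ok += 1
    return n_ok / len(pairs)
```

Since σ(ΔI) enters the denominator, any underestimate of the intensity errors inflates the statistic, which is one way a data set that cannot be phased could still score well.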

Limitation of P-SAD
While in theory it might appear that P-SAD should be effective in phasing in-house data from oligonucleotide crystals, in practice it is rare that the phosphorus signal is sufficient to determine structure. Indeed, the one case in which the anomalous signal from phosphorus was sufficient to phase the data, Z-DNA, is now shown to be a special case. The first novel nucleic acid structure solved by P-SAD is the Z-DNA dodecamer, d(CGCGCGCGCGCG) [64]. While a significant milestone in the advancement of the technique, this may represent a special case given that the crystals were capable of diffraction to a resolution of 0.75 Å. Phasing was done using a synchrotron source, 22-ID APS, at a wavelength of 1.5418 Å to simulate copper Kα radiation. The success of P-SAD phasing in this instance is remarkable in that the multiplicity of the data with Friedel mates unmerged was only 5.7 overall and 3.7 in the highest resolution shell, 1.70-1.64 Å. Native data to the ultra-high resolution were collected separately. The relative thermal motion of the phosphate backbone was high in the Z-DNA dodecamer with 6 out of the 11 phosphates modeled in multiple conformations. This in itself suggests that P-SAD phasing can ultimately be successful in nucleic acid structure determinations. Even though very few nucleic acids will provide such ultra-high resolution, as will be discussed later, new techniques may be capable of providing adequate anomalous signal from even average crystals.
Early in our efforts to apply P-SAD to nucleic acid structure determination, a number of hypotheses were considered to explain the discrepancy between theory and practice in the application of P-SAD phasing. It was considered possible that the periodicity of base-stacking could be overwhelming the anomalous signal from phosphorus. Base-stacking dominates the Patterson maps and tends to produce exceptionally strong reflections at around 3.4 Å and multiples of that dimension, which could affect the signal to noise ratio of the anomalous signal. This does not appear to be the case, as anomalous Patterson maps are quite clear for the Z-DNA data.
The most likely cause for the lack of success for P-SAD phasing comes from the relative thermal motion of the P atoms as opposed to the non-anomalous atoms. This was shown for S-SAD phasing to be one reason why sulfurs were useful [65] in phasing protein diffraction data. An analysis of the impact of the difference in average B-factor for phosphorus in relation to the non-anomalously scattering atoms was made by calculating the theoretical anomalous signal versus scattering angle (see Figure 11) according to the equation:

Phosphorus Mobility
Here ∆B is the average difference in B factor of phosphorus versus the average B factor of all other non-hydrogen atoms in the model. For proteins, it was found that sulfur atoms tend to have lower B factors than the average of the other non-hydrogen atoms, and a survey of models in the PDB showed that ∆B for proteins was about +3 Å². A search of the PDB for double helical DNAs was made and resulted in a list of 34 structures as detailed in Table 6. From this list of models, ∆B for phosphorus was found to be -5.24 Å². However, three of the models were outliers with phosphorus B-factors lower than the average for all other non-hydrogen atoms. In these unpublished models, PDB ID codes 3F80, 1EHV, and 3GDA, an unpaired base with high B-factors is flipped out and skews the result. If these models are omitted, the average ∆B is about -6.0 Å². A more exhaustive study of the value of ∆B in nucleic acid crystals was undertaken to supplement the selected group of 34 structures. A cohort of 934 duplex DNA structures for which the B-factors were refined individually was examined, calculating ∆B by comparing PO2 versus only the remaining atoms in the DNA duplex. The value of ∆B was calculated for each structure individually and an overall mean value of ∆B was found to be -7.27 ± 5.05 Å². A scatter plot of the values of ∆B versus reported resolution of the models is presented in Figure 12. A regression of ∆B on resolution is plotted with the average ∆B shown with error bars, where the slope of the regression is -3.086 and the intercept is -1.49. From this it is clear that the resolution of the data bears some influence on the feasibility of P-SAD through its relation to ∆B.
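A per-model ∆B of the kind surveyed above could be computed along the following lines. This is a sketch under stated assumptions, not the procedure used for the survey: it reads only ATOM records in fixed-column PDB format, treats atoms named P, OP1, OP2 (or the older O1P, O2P) as the phosphate group, and takes ∆B as the mean B of the remaining non-hydrogen atoms minus the mean B of the phosphate atoms, so that more mobile phosphates give a negative value. The `atom_line` helper and the demo records are hypothetical.

```python
from statistics import mean

# Atom names treated as the phosphate group (current and older PDB naming).
PHOSPHATE_NAMES = {"P", "OP1", "OP2", "O1P", "O2P"}


def atom_line(serial, name, res, seq, b, elem):
    """Hypothetical helper: build a column-aligned PDB ATOM record for demos."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} A{seq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{b:6.2f}          {elem:>2s}")


def delta_b(pdb_lines):
    """Mean B of other non-H atoms minus mean B of phosphate atoms (A^2)."""
    phosphate, others = [], []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # HETATM records (modified residues, solvent) are ignored
        if line[76:78].strip().upper() == "H":
            continue  # hydrogens are excluded from both groups
        name = line[12:16].strip()   # atom name, columns 13-16
        b_factor = float(line[60:66])  # temperature factor, columns 61-66
        (phosphate if name in PHOSPHATE_NAMES else others).append(b_factor)
    return mean(others) - mean(phosphate)


def delta_b_from_resolution(d_min, slope=-3.086, intercept=-1.49):
    """Regression line quoted in the text: delta-B as a function of resolution."""
    return slope * d_min + intercept


demo = [
    atom_line(1, "P", "DG", 2, 30.0, "P"),
    atom_line(2, "OP1", "DG", 2, 32.0, "O"),
    atom_line(3, "OP2", "DG", 2, 34.0, "O"),
    atom_line(4, "C5'", "DG", 2, 24.0, "C"),
    atom_line(5, "N9", "DG", 2, 26.0, "N"),
]
print(delta_b(demo))  # negative: the phosphate atoms are the more mobile group
```

With this sign convention the values quoted for the surveyed DNA models (about -6 to -7 Å²) and for proteins (about +3 Å²) have opposite effects on the high-angle anomalous signal.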
The theoretical level of the anomalous signal for phosphorus was calculated to be about 2% using a value for f_eff of about 6.7. However, this is the value at zero scattering angle. In order to evaluate the scattering factor over the full range of scattering angles, the effective scattering was calculated using the four-Gaussian approximation with the Cromer-Mann coefficients for each of the elements C, N, O, and P [91]. To approximate the effective scattering factor for DNA, the relative abundance of each of the elements was tabulated from d(GT)/d(AC). This resulted in relative proportions of the elements, C:N:O:P, of 0.476:0.183:0.293:0.049. The scattering curves in Figure 11 were calculated for each of the elements and the effective scattering factor, f_eff, was calculated as the weighted sum of the individual element contributions. The theoretical value of the anomalous signal ⟨|∆F±|⟩/⟨F⟩ is plotted by scattering angle, sinθ/λ, for three values of ∆B (+3, 0.0, and -6.0) in Figure 13. From this plot, it is apparent that the higher atomic displacement parameter value of phosphorus atoms in nucleic acids versus the average atomic displacement parameter for non-anomalously scattering atoms is likely to be the primary contributing factor to the continued lack of success in the application of P-SAD.
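The calculation described above can be sketched as follows. Several things here are assumptions rather than the procedure actually used: the Cromer-Mann coefficient values are those commonly tabulated for neutral C, N, O and P and should be checked against [91] before quantitative use; f″ of phosphorus is taken as 0.434 e and treated as independent of angle; and the B-factor difference is assumed to enter as a Debye-Waller-type factor exp(∆B·s²) with s = sinθ/λ and ∆B defined as the mean B of the other atoms minus that of phosphorus.

```python
import math

# Four-Gaussian Cromer-Mann coefficients: (a1..a4), (b1..b4), c.
# Values as commonly tabulated; verify against reference [91] before use.
CROMER_MANN = {
    "C": ((2.3100, 1.0200, 1.5886, 0.8650), (20.8439, 10.2075, 0.5687, 51.6512), 0.2156),
    "N": ((12.2126, 3.1322, 2.0125, 1.1663), (0.0057, 9.8933, 28.9975, 0.5826), -11.529),
    "O": ((3.0485, 2.2868, 1.5463, 0.8670), (13.2771, 5.7011, 0.3239, 32.9089), 0.2508),
    "P": ((6.4345, 4.1791, 1.7800, 1.4908), (1.9067, 27.1570, 0.5260, 68.1645), 1.1149),
}

# Element proportions for d(GT)/d(AC) as given in the text.
FRACTIONS = {"C": 0.476, "N": 0.183, "O": 0.293, "P": 0.049}

F_DPRIME_P = 0.434  # assumed f'' of phosphorus at Cu K-alpha, in electrons


def form_factor(element, s):
    """Atomic scattering factor at s = sin(theta)/lambda (1/Angstrom)."""
    a, b, c = CROMER_MANN[element]
    return sum(ai * math.exp(-bi * s * s) for ai, bi in zip(a, b)) + c


def f_eff(s):
    """Composition-weighted sum of the element scattering factors."""
    return sum(frac * form_factor(el, s) for el, frac in FRACTIONS.items())


def anomalous_signal(s, delta_b):
    """Sketch of <|dF|>/<F> versus s, with an assumed exp(delta_b * s^2) term."""
    base = math.sqrt(2.0 * FRACTIONS["P"]) * F_DPRIME_P / f_eff(s)
    return base * math.exp(delta_b * s * s)


for s in (0.0, 0.25, 0.5):
    row = [anomalous_signal(s, db) for db in (3.0, 0.0, -6.0)]
    print(f"s={s:.2f}", " ".join(f"{100 * v:.2f}%" for v in row))
```

At zero angle the three curves coincide; they separate as s increases, and under these assumptions a negative ∆B progressively suppresses the gain in relative signal that would otherwise come from the fall-off of normal scattering.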

Potential for P-SAD
Even given the limitation of P-SAD phasing due to the mobility of the phosphates on the nucleic acid backbone, there is yet hope that recent advances in instrumentation and methodology will allow for data collection to the precision required for this challenging method.
Advances include new sources such as the X-ray Free Electron Laser (XFEL) [92]. Sulfur/chlorine SAD phasing has been reported using serial femtosecond crystallography with XFEL [93]. The development of XFEL techniques is very promising given the diffraction:damage ratios, although radiation damage has been shown to be a factor in serial femtosecond XFEL crystallography as well [94].
Recent advances in detectors and goniometers as well as X-ray sources have extended our ability to acquire high quality diffraction data. These include new multi-axis goniometers at synchrotron sources such as the ESRF/EMBL mini-κ goniometer and the Parallel Robotics Inspired Goniometer, PRIGo [95]. The PRIGo goniometer serves as an Eulerian cradle providing three circles for ω, χ, and φ angles.

Phasing and Data Collection
This study was done using an in-house rotating-anode X-ray source to determine the feasibility and best approaches for solving the phase problem without recourse to synchrotron data. This is a mostly practical choice, since most institutions employing crystallographers interested in nucleic acid structure will have in-house equipment. It should also be noted that rotating anodes do have some advantages over synchrotron beam lines in this regard [96]. The most obvious advantage is the greater degree of access and control, allowing for careful maintenance and alignment prior to a demanding data collection. The high redundancies required for P-SAD phasing are more difficult to achieve at synchrotrons due to both time limitations and the greater degree of radiation damage from synchrotron beams. A fundamental advantage in the use of a modern rotating-anode X-ray source is the lack of wavelength drift during data collection. Wavelength drift can introduce systematic error in synchrotron data that justifies choosing the high energy remote wavelength as the reference wavelength in MAD data collection, as the anomalous signal remains significant but is relatively insensitive to wavelength drift [58].
Specific strategies for acquiring precise native SAD data have been developed and demonstrated, primarily for S-SAD, involving either a Multi-Data Set (MDS) approach with a single crystal or sophisticated merging of data sets from multiple crystals in a Multi-Crystal Approach or MCA. For MDS approaches [97][98][99][100], the Garman limit for absorbed radiation dose [101] is determined. The resulting exposure time is then "dose-sliced" by dividing the exposure time to achieve the Garman limit by the number of data sets to be collected. The MCA technique [102,103] involves collecting complete data sets from as few as 5 and as many as 13 different crystals. One of the most critical elements of MCA is assuring that diffraction and lattice parameters from each of the crystals are statistically equivalent. Three tests for identification of statistical outliers were developed [102] and cluster analysis was used to verify equivalence of unit cell parameters and for both intensity and anomalous correlation between sets. Application of the two approaches was shown to be particularly effective for S-SAD phasing of a histidine kinase [104]. A tour de force application of the combination of the two data collection strategies has recently been used to solve three challenging structures by native SAD phasing [105]. In one case, the CRISPR-associated protein, Cas9, in complex with sgRNA and target DNA was solved using MDS data merged from 3 crystals. The complex crystallized in the relatively low symmetry space group, C2, with 1371 amino acids, 83 RNA nucleotides, and 39 DNA nucleotides in the asymmetric unit. The anomalous substructure comprised of 24 S atoms and 120 P atoms could be solved only after merging multiple low dose data sets collected in multiple orientations and from multiple crystals. This is the largest substructure solved thus far using native SAD phasing.
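The dose-slicing arithmetic described for the MDS approach is simple enough to sketch. The function name, its parameters and the numbers in the usage comment are hypothetical; in practice the time to reach the dose limit would come from a dose calculation for the actual beam and crystal.

```python
def dose_sliced_exposure(time_to_limit_s, n_datasets, frames_per_dataset):
    """Split the exposure budget evenly over data sets and frames.

    time_to_limit_s    : total exposure time (s) that reaches the dose limit
    n_datasets         : number of data sets to collect from the crystal
    frames_per_dataset : number of frames in each data set
    Returns (seconds per data set, seconds per frame).
    """
    if n_datasets < 1 or frames_per_dataset < 1:
        raise ValueError("need at least one data set and one frame")
    per_dataset = time_to_limit_s / n_datasets
    return per_dataset, per_dataset / frames_per_dataset


# Hypothetical example: a one-hour budget split over 8 data sets of 900 frames.
per_set, per_frame = dose_sliced_exposure(3600.0, 8, 900)
```

The even split is the simplest choice; it keeps every data set at the same fraction of the dose limit so that later sets can be dropped if they show signs of damage.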

Choice of Wavelength
Ideally, when collecting anomalous dispersion data the wavelength used will correspond to an absorption edge for the anomalous scatterer present in the crystal. For phosphorus, this would mean collecting data at the K-edge energy of 2.1455 keV, 5.7788 Å. This exceptionally low energy is not generally attainable at a synchrotron source due to absorption and heating in beamline components. The choice of wavelength will depend on a calculation taking into account absorption, crystal size and a number of other factors with the object of maximizing the signal as quantified in the statistic I/σ(I). Lower energies with corresponding higher values of f″ for phosphorus phasing will also suffer from an increase in radiation damage and increased absorption. Away from the absorption edge, absorption and, thus, the intensity of individual reflections will scale as the cube of the wavelength. Larger crystals may benefit from a higher energy, so that for native SAD, crystals of 50 µm or smaller would be favored. Experiments examining the relationship between crystal size and wavelength did not establish a clear relationship [106] and implicated individual sample quality. It was also suggested that small unit cells and higher symmetry space groups are detrimental by reducing the number of Bijvoet pairs at lower resolutions, making substructure determination in SHELXD more difficult. Longer wavelengths also present issues with data collection strategy. The highest resolution attainable for diffraction data collection will depend on the beamline equipment. If the data collection strategy is limited to a single rotation axis, then the higher resolution regions unreachable by the Ewald sphere increase with increasing wavelength and the highest resolution data measurable depends on the minimum crystal to detector distance available [107]. The mini-κ goniometer and the PRIGo goniometers are well suited to offset these issues. While it may not be a significant factor currently, the limiting sphere in diffraction means that the ultimate diffraction possible is limited to one-half of the wavelength of the radiation used, so that data collection at a wavelength of 4 Å limits diffraction resolution to 2 Å.
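The numerical relationships used in this section can be collected in a short sketch. The conversion constant hc ≈ 12.39842 keV·Å is standard; the limiting resolution follows from Bragg's law with sinθ ≤ 1; and the cubic scaling of absorption is the approximation stated above for wavelengths away from an absorption edge. The function names are illustrative only.

```python
HC_KEV_ANGSTROM = 12.39842  # Planck constant times speed of light, keV * Angstrom


def energy_kev(wavelength_angstrom):
    """Photon energy (keV) for a given wavelength (Angstrom)."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


def limiting_resolution(wavelength_angstrom):
    """Smallest d-spacing reachable: Bragg's law with sin(theta) = 1."""
    return wavelength_angstrom / 2.0


def relative_absorption(wavelength_angstrom, reference_angstrom):
    """Approximate absorption relative to a reference wavelength (lambda^3)."""
    return (wavelength_angstrom / reference_angstrom) ** 3


print(f"{energy_kev(5.7788):.4f} keV")   # phosphorus K edge quoted in the text
print(limiting_resolution(4.0))          # 2.0
print(f"{relative_absorption(2.2909, 1.5418):.2f}x")  # Cr vs Cu K-alpha
```

The last line gives a rough sense of the absorption penalty incurred in moving from copper to chromium radiation, which is the trade-off against the larger f″ at the longer wavelength.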
Synchrotron beamlines are attempting to approach the lower energies needed to enable techniques such as native SAD phasing. Previous experiments at synchrotron sources have recommended 2.1 Å as an optimum wavelength [75]; however, new developments at synchrotron sources have extended the range of energies available. The EMBL P13 beamline at the PETRA III synchrotron source in Hamburg is now capable of providing a variable focus size beam at wavelengths as long as 3.1 Å [108], at which the f″ for phosphorus becomes about 1.2 e⁻.
The choice of wavelength for in-house data collection is generally limited to either copper Kα, 8.046 keV and 1.5418 Å, or chromium Kα, 5.415 keV and 2.2909 Å. Copper is by far the most common source available for in-house data collection. The use of chromium targets has been thoroughly investigated and proven useful [97,98], although the increase in air-scatter generally requires the additional complication of a helium filled chamber between the crystal and detector face to maintain an acceptable signal:noise ratio.

Radiation Damage
The need for exceptional levels of multiplicity in diffraction data collection in order to extract the phosphorus anomalous signal implies a concomitant increase in exposure time and radiation dose and concern for potential issues with chemical changes during data collection. For nucleic acid crystallography, radiation damage, while a critically important factor, may not be as serious as it is for protein S-SAD. A recent study of a nucleoprotein complex [109] quantified the per-atom electron density changes over a wide range of radiation dose levels (1.3-25.0 MGy) and showed that RNA is much less sensitive to radiation-induced chemical changes than protein. And, unexpectedly, the normally radiation sensitive Glu and Asp residues within RNA binding pockets appear to have been protected by the association with RNA.
Software for simulation of radiation absorbed dose has been developed [110,111] that takes into account beam characteristics, crystal composition and exposure rates. These programs provide sophisticated calculations that should be used to provide guidance in setting up diffraction data collection strategies. This is particularly important when using the dose-sliced multiple-data set approach as discussed in Section 3.4.1.

Successful Application of P-SAD
The most remarkable application of P-SAD to date is the structure of a DNA dodecamer containing 5-formylcytosine (5fC), d(CTA-5fC-G-5fC-G-5fC-GTAG) [112], at 1.4 Å resolution and phased entirely using the anomalous signal of phosphorus in-house at copper Kα wavelength. The data were collected using a κ-axis goniometer and a Platinum 135 CCD area detector (Proteum X8, Bruker AXS, Madison, WI, USA) similar to the system used in the experiments reported here and utilizing that system's strategy software to maximize the redundancy of the data while evening out the sampling over the entire reciprocal lattice. The data were collected from a low resolution of 45.94 Å to a high of 1.60 Å with the highest resolution shell from 1.63-1.60 Å. The overall redundancy of the data was a remarkable 85.1 and for the highest resolution shell an even more remarkable 40.6. The higher redundancies were facilitated by the high symmetry of the crystal, a tetragonal lattice with space group P4(3)2(1)2. The signal:noise ratio of the data was reported as an I/σ(I) of 48.8 overall and 4.9 in the highest resolution shell, with a significant anomalous signal to a resolution of 2.2 Å. This successful application of P-SAD appears to be due to the remarkably high redundancy of the data. All eleven phosphorus atoms in the asymmetric unit were located by P-SAD phasing in PHENIX [113]. Location of the anomalous scattering atoms was facilitated by the symmetry of the nucleic acid duplex corresponding to a crystallographic symmetry element. Calculations made here for ∆B (as discussed in Section 3.3.1) for phosphorus using the model for the 5fC containing dodecamer, PDB ID 4QKK, showed that the average B-factor for P, OP1, and OP2 atoms in the backbone was 32.966 ± 6.37 Å² and for the other atoms in the model 25.283 ± 4.251 Å², leaving a nearly average ∆B of -7.68 Å². The successful phasing of the 5fC dodecamer suggests that, while highly challenging due to the relative mobility of the phosphates in the nucleic acid backbone, P-SAD is yet a worthwhile goal. It may be that more structures can be phased using P-SAD by taking full advantage of all the recent developments in instrumentation and data collection strategies.
The Dickerson-Drew Dodecamer, DDD, was crystallized from oligonucleotides purchased from Integrated DNA Technologies (Coralville, Iowa). Crystallization was done using sitting drop vapor diffusion against a reservoir of 40% MPD. Drops contained 1.2 mM DNA, 8 mM spermine•4HCl, 40 mM MgCl2, 20 mM sodium cacodylate, pH 6.9. Crystals were flash-cooled from the crystallization drop without further treatment.
The 2'-thio modified oligonucleotide DNA, GCGTAU*ACGC (U* = 2'-SMe-rU), was synthesized by solid-phase phosphoramidite chemistry and purified as previously described [115]. Crystals were grown by hanging drop vapor diffusion using the Nucleic Acid Miniscreen (Hampton Research, Aliso Viejo, CA). Crystals were flash-cooled without the need for additional cryoprotectant.

Diffraction Data Collection
All diffraction screening and data collection were performed using the Biomolecular Crystallography Facility in the Vanderbilt University Center for Structural Biology. All in-house data collection was performed using a Bruker-Nonius Microstar rotating anode X-ray generator equipped with a Proteum PT135 CCD area detector mounted on an X8 kappa goniometer with Montel confocal multilayer optics. Crystals were maintained at 100 K using a Bruker KryoFlex cryostat. Autoindexing, strategy calculations, and data reduction were done using Proteum2 software (Bruker-AXS, Madison, WI). Use of the COSMO module in the Proteum software in conjunction with the 4-circle goniometer allowed for high redundancy in data collection encompassing the full reciprocal space lattice. The system was aligned immediately prior or close to the time of data collection to ensure optimum performance during data collection. Scaled data were evaluated and reflection files written using XPREP [116].

Diffraction Data Phasing and Model Refinement
Direct methods phasing of diffraction data was done using SIR2004 [59]. Phasing using the anomalous signal from P atoms was done using SHELXC, SHELXD, SHELXE [117] and the hkl2map GUI [55]. The critical decision of resolution limit for phasing calculations was made by the hkl2map script using a cutoff of about 1.3 for the ratio of ∆F to its estimated standard deviation. Refinement was done using SHELXL [117]. Map calculation and structure visualization and manipulation were done using Coot [118] and PyMol [119]. Structure factor amplitudes calculated with scalepack2mtz after applying corrections and Wilson scaling were used to generate anomalous difference data. The ratio ∆Fano/F was then plotted versus resolution bin, using 10 bins of equal numbers of reflections for each data set.
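The binning of the anomalous difference ratio into 10 resolution bins of equal reflection count can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the per-bin ratio is taken as the mean |F(+) − F(−)| divided by the mean of the averaged amplitudes, the input is assumed to be a list of (d-spacing, F(+), F(−)) tuples already read from the reflection file, and the function name `anomalous_ratio_by_bin` is hypothetical.

```python
def anomalous_ratio_by_bin(reflections, n_bins=10):
    """Bin Bijvoet pairs by resolution with equal numbers of reflections.

    reflections: iterable of (d_spacing, f_plus, f_minus) tuples.
    Returns a list of (d_max, d_min, ratio) ordered from low to high
    resolution, where ratio = <|F+ - F-|> / <(F+ + F-) / 2> in that bin
    (an assumed definition for this sketch).
    """
    # Sort so that low resolution (large d-spacing) comes first.
    refl = sorted(reflections, key=lambda r: r[0], reverse=True)
    n = len(refl)
    if n < n_bins:
        raise ValueError("need at least one reflection per bin")
    bins = []
    for i in range(n_bins):
        # Integer slicing keeps bin populations equal to within one reflection.
        chunk = refl[i * n // n_bins:(i + 1) * n // n_bins]
        mean_diff = sum(abs(fp - fm) for _, fp, fm in chunk) / len(chunk)
        mean_f = sum(0.5 * (fp + fm) for _, fp, fm in chunk) / len(chunk)
        bins.append((chunk[0][0], chunk[-1][0], mean_diff / mean_f))
    return bins
```

Each returned tuple gives the resolution limits of a bin and its ratio, which can be plotted directly against resolution.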

Measurability
Zwart [121] discussed quality metrics applied to anomalous data in order to predict success or failure of phasing attempts. Sorted intensity data from XPREP output in scalepack format were used to calculate the anomalous signal:noise ratio (Equation 3). The average signal:noise ratio of a data set was calculated based on all measured Bijvoet pairs, and the Measurability of the anomalous signal was calculated as the fraction of all measured Bijvoet pairs satisfying the conditions:
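As a minimal sketch of this calculation, the code below assumes the criteria described by Zwart [121]: a Bijvoet pair counts as measurable when its anomalous difference exceeds three times its standard deviation and both mates are themselves measured above 3σ. The standard deviation of the difference is assumed to be the two individual standard deviations added in quadrature, all standard deviations are assumed to be positive, and the function name `anomalous_stats` is hypothetical.

```python
from math import hypot


def anomalous_stats(pairs, threshold=3.0):
    """Return (mean anomalous signal:noise, measurability).

    pairs: iterable of (i_plus, sig_plus, i_minus, sig_minus) tuples,
    one per measured Bijvoet pair, with positive sigmas.
    """
    ratios = []
    measurable = 0
    for i_plus, sig_plus, i_minus, sig_minus in pairs:
        # sigma of the difference: individual sigmas added in quadrature.
        ratio = abs(i_plus - i_minus) / hypot(sig_plus, sig_minus)
        ratios.append(ratio)
        weakest = min(i_plus / sig_plus, i_minus / sig_minus)
        if ratio > threshold and weakest > threshold:
            measurable += 1
    n = len(ratios)
    return sum(ratios) / n, measurable / n
```

The second returned value is the fraction of pairs passing both tests, so it always lies between 0 and 1.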

Figure 1. Scatter plot of the correlation coefficient for all data (CCall) calculated by SHELXD versus try number for Z-DNA.

Figure 2. Site peak heights as a percentage of the top peak for Z-DNA calculated by SHELXD for a range of tries grouped by CCall value. The top tries, CCall > 40.00, are shown in purple. Other successful tries, CCall < 40.00, are in green, and unsuccessful tries are shown in red.

Figure 3. Connectivity calculated by SHELXE plotted by density modification cycle for Z-DNA for a range of tries based on the correlation coefficient.

Figure 4. Patterson superposition minimum function (PSMF) peaks superimposed onto the refined Z-DNA structure, showing that the peaks do not fall on a single molecule.

Figure 5. (a) Model of Z-DNA resulting from direct methods phasing by SIR2004. Although element assignment is often incorrect, phasing by direct methods was quite good. In the figure, carbon atoms are green, oxygen atoms are red, phosphorus atoms are orange, and nitrogen atoms are blue. (b) Direct methods phasing using SIR2004 was able to place six atoms of the spermine molecule.

Figure 7. Plot of f′ and f″ for some of the more commonly used anomalously scattering atoms.

Figure 8. Plot of f′ and f″ for S and P atoms around the Cu Kα wavelength.

Figure 9. The signal:noise ratio evaluated using ∆Fano versus resolution, plotted for the three data sets: Z-DNA, DDD, and the 2'-thio decamer. The data were calculated by resolution bin using 10 bins each for plotting.
There are no well-established methods for identifying a correct solution other than examination of the electron density maps. There are a number of indicators, including various figures of merit. SHELXD provides a Patterson figure of merit (FOM) to assess the correspondence between calculated and observed Patterson maps and a correlation coefficient between calculated and observed E-values. A clear separation between clusters of correct and incorrect solutions by correlation coefficient, or a clear drop in occupancy values after the expected number of positions, is a valuable aid in recognizing success. However, in the case of

Figure 10. Scatter plot of correlation coefficient for all data, CCall, and related Patterson figure of merit, PatFOM, values over 10,000 tries for SHELXD with Z-DNA data. Tries resulting in useful phases are red and unsuccessful tries are green.

Figure 11. Plot of scattering factors calculated using Cromer-Mann coefficients in a four-Gaussian approximation. The effective scattering, f(eff), is calculated by weighting the scattering factor of each element by its relative proportion to the total number of non-hydrogen atoms in d(GT)/d(AC).
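The four-Gaussian approximation named in the Figure 11 caption has the form f(s) = Σ aᵢ·exp(−bᵢ·s²) + c with s = sin θ/λ = 1/(2d), and the effective scattering is a composition-weighted average of the elemental values. The sketch below illustrates both steps. The coefficients are quoted from memory of the standard International Tables values and should be checked against a primary tabulation before use, the function names are hypothetical, and the atom counts for d(GT)/d(AC) are not reproduced here, so the composition is left to the caller.

```python
from math import exp

# Cromer-Mann coefficients as (a1, b1, a2, b2, a3, b3, a4, b4, c).
# Assumed values; verify against International Tables before relying on them.
CROMER_MANN = {
    "C": (2.3100, 20.8439, 1.0200, 10.2075, 1.5886, 0.5687, 0.8650, 51.6512, 0.2156),
    "N": (12.2126, 0.0057, 3.1322, 9.8933, 2.0125, 28.9975, 1.1663, 0.5826, -11.529),
    "O": (3.0485, 13.2771, 2.2868, 5.7011, 1.5463, 0.3239, 0.8670, 32.9089, 0.2508),
    "P": (6.4345, 1.9067, 4.1791, 27.1570, 1.7800, 0.5260, 1.4908, 68.1645, 1.1149),
}


def scattering_factor(element, d_spacing):
    """Four-Gaussian scattering factor at resolution d_spacing (in Angstrom)."""
    coeffs = CROMER_MANN[element]
    s_squared = (1.0 / (2.0 * d_spacing)) ** 2  # (sin(theta)/lambda)^2
    gaussians = zip(coeffs[0:8:2], coeffs[1:8:2])  # (a_i, b_i) pairs
    return coeffs[8] + sum(a * exp(-b * s_squared) for a, b in gaussians)


def effective_scattering(composition, d_spacing):
    """Composition-weighted mean scattering factor.

    composition: dict mapping element symbol to its number of
    non-hydrogen atoms, e.g. a hypothetical {"C": 10, "N": 4, "O": 6, "P": 1}.
    """
    total = sum(composition.values())
    weighted = sum(count * scattering_factor(element, d_spacing)
                   for element, count in composition.items())
    return weighted / total
```

At very low resolution each elemental value approaches the atomic number, which gives a quick sanity check on the tabulated coefficients.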

Figure 12. Values of ∆B for 934 nucleic acid models from the PDB plotted against reported resolution. The regression line is shown in green, and the average value of ∆B, -7.237 Å², is shown with error bars in blue.

Figure 13. Plot of the calculated anomalous signal as a function of resolution, including the term for ∆B representing the relative atomic displacement factor for phosphorus atoms in relation to non-anomalously scattering atoms.

Table 2. Z-DNA phosphorus atoms found by SHELXD for tries with varying scores.

Table 3. Z-DNA phosphorus atoms found by SHELXD related to peak heights and refined B-factors.

Table 6. Structures examined to determine spacing and B-factors.