Aflatoxin B1–Formamidopyrimidine DNA Adducts: Relationships between Structures, Free Energies, and Melting Temperatures

Thermal stabilities of DNA duplexes containing Gua (g), α- (a) or β-anomer of formamidopyrimidine-N7-9-hydroxy-aflatoxin B1 (b) differ markedly (Tm: a<g<b), but the underlying molecular origin of this experimentally observed phenomenon is yet to be identified and determined. Here, by employing explicit-solvent molecular dynamics simulations coupled with free-energy calculations using a combined linear-interaction-energy/linear-response-approximation approach, we explain the quantitative differences in Tm in terms of three structural features (bulkiness, order, and compactness) and three energetical contributions (non-polar, electrostatic, and preorganized-electrostatic), and thus advance the current understanding of the relationships between structures, free energies, and thermal stabilities of DNA double helices.

b can exert its harmful influence on DNA information content and retrieval if, and only if, it persists for a biologically meaningful time. In this regard, a peculiar characteristic of dsDNA oligonucleotides containing b (dsDNA b ) is a higher, resistance-to-NER conferring [47], thermal stability (melting temperature, T m ) of the duplex relative to dsDNA a (∆T m ∼ 27 K) and dsDNA g (∆T m ∼ 13 K) [33,34,36,38,39,47], which has been qualitatively (but not quantitatively) ascribed, based on nuclear magnetic resonance (NMR) structures of dsDNA a and dsDNA b (Figure 2), to favorable stacking interactions [27,36,38,39,47] (and not to perturbed WC hydrogen bonding interactions, for these remain intact in both dsDNA a and dsDNA b ) [36,39,47]. And this brings us to defining the main object of our present inquiry, an attempt to answer the following question: What are the structural and energetical causes, not only qualitatively but also quantitatively, of the experimentally observed differences in the thermal stability of dsDNA g , dsDNA a , and dsDNA b ? For knowing the answer to this question-besides being the good per se (as one of the pieces of the aflatoxin puzzle)-may help us to proceed from particular observations to general principles that determine the structure and stability of N5-substituted FAPy lesions and intercalated bulky DNA adducts on a long journey toward a complete understanding of the stability and energetics of the DNA double helix in terms of the contributions of polar and non-polar interactions [120].
We approach the problem of quantifying the structural and energetical causes of the differences in the thermal stability between dsDNA g , dsDNA a , and dsDNA b theoretically in three steps: (1) generating ensembles of structures-and-energies of dsDNA and ssDNA models of DNA g [121][122][123], DNA a [39], and DNA b [36] using molecular dynamics (MD) simulations [124], (2) calculating absolute (∆G, dsDNA vs. ssDNA) and relative free energies (∆∆G, g vs. a vs. b) using a combination [125] of linear response approximation (LRA) [126] and linear interaction energy (LIE) methods [127], and (3) correlating the experimentally known melting temperatures [39,47] with the ensemble-derived structural signatures and free energies.

Structures
The lengths of helical rise between the base pairs 4 and 5 in the initial dsDNA models vs. PMD 1 (producing molecular dynamics; the polar state 1) structures of g, a, and b were, respectively, 3.4 vs. 3.0, 7.0 vs. 5.3, and 5.4 vs. 4.2 Å (the average of DNA 1 , the A·T variant, and DNA 2 , the G·C variant), from which the ranking by disorder was determined: g < b < a ( Figure 3A, top row). In contrast to the length of helical rise, the length of rise between the base pairs 4 and 5 does not distinguish a from b: g < a = b ( Figure 3A, middle row). The distances between the C1' atoms of the base-pair 5 in the initial dsDNA models of g, a, and b were 10.7, 11.0, and 10.3 Å, respectively, and the average occupancies of WC-5 (the Watson-Crick hydrogen bonds within the base-pair 5) in PMD 2 (the non-polar state 2) structures of g, a, and b were, respectively, 14, 11, and 17% (the average of DNA 1 and DNA 2 ); from these two measures the ranking by looseness was determined: b < g < a ( Figure 3A, bottom row). The same ranking, b < g < a, would be also obtained for the distances between the C1 atoms of the base-pair 5 in PMD structures, but only if ST, significance threshold, for the differences were ignored, because the distances in PMD 1 vs. PMD 2 structures of g, a, and b were, respectively, 10.7 vs. 11.9, 11.1 vs. 12.3, and 10.5 vs. 11.8 Å (the average of DNA 1 and DNA 2 ), and so the differences (δ) between g and b were δ ST = 0.3 Å. The same ranking, b < g < a, would be also obtained for the occupancy of the perturbed conformation of the nucleobase Cyt-16 ( Figure 3B), but only if ST for the differences in the occupancies were ignored, because the occupancies were 0.5, 1.6, and 0.2% (the average of DNA 1 and DNA 2 ), respectively, for g, a, and b, and so the differences (δ) between the occupancies were The dsDNA bending angles in PMD 1 structures of g, a, and b were, respectively, 25, 20, and 15 • (the average of DNA 1 and DNA 2 ), and the corresponding dsDNA bending angles in PMD 2 structures were 30, 25, and 20 • , respectively; from these two measures the ranking by curvature was determined: b < a < g ( Figure 3C).
The average occupancies of nWC (the non-Watson-Crick hydrogen bond involving the formyl group of the FAPy moiety and the exocyclic amino group of the 3 -neighboring Ade in DNA 1 ) in PMD 1 structures of dsDNA a , dsDNA b , ssDNA a , and ssDNA b were 31, 55, 6, and 30%, respectively, and the corresponding occupancies in PMD 2 structures were 2, 4, 1, and 3%, respectively; from these two measures the rankings by the stability of nWC in dsDNA 1 and ssDNA 1 , and the ranking by percentual difference in the stability of nWC between dsDNA 1 and ssDNA 1 , were determined: a < b, a < b, and a = b, respectively ( Figure 3D).

Free Energies
Contributions of the probes to the absolute free energies of dsDNA formation (Table S1) were obtained from interaction energies ( Table 1); relative free energies ( Table 2) were obtained from the corresponding absolute free energies. ∆G vdw1 of g, a, and b were, respectively, −1.0, −2.9, and −3.0 kcal/mol (the average of DNA 1 and DNA 2 ), from which the ranking by non-electrostatic contribution of the probe to the dsDNA formation, was determined: a = b < g. ∆G ele1 of g, a, and b were, respectively, −2.7, −0.5, and −1.4 kcal/mol (the average of DNA 1 and DNA 2 ), from which the ranking by electrostatic contribution of the probe in the polar state 1 to the dsDNA formation, was determined: g < b < a. ∆G ele2 of g, a, and b were, respectively, −0.4, 0.7, and −1.2 kcal/mol (the average of DNA 1 and DNA 2 ), from which the ranking by electrostatic contribution of the probe in the non-polar state 2-i.e., the ranking by the contribution of electrostatic preorganization to the dsDNA formation-was determined: b < g < a. ∆G ele of g, a, and b were, respectively, −3.1, 0.2, and −2.6 kcal/mol (the average of DNA 1 and DNA 2 ), from which the ranking by total electrostatic contribution of the probe to the dsDNA formation, was determined: g < b < a. ∆G of g, a, and b were, respectively, −4.1, −2.8, and −5.6 kcal/mol (the average of DNA 1 and DNA 2 ), from which the ranking by the total contribution of the probe to the free energy of the dsDNA formation, b < g < a, and the ranking by |∆∆G|, using g as the reference, a = b, were determined. The differences in ∆G between DNA 2 (the G·C variant) and DNA 1 (the A·T variant), ∆∆G dna = ∆G dna2 − ∆G dna1 , for g, a, and b were 0.4, −0.3, and 1.5 kcal/mol, respectively, from which the ratio of correct:incorrect signs of ∆∆G dna was determined: 1:2.

Correlations
The ranking by bulkiness, g < a = b, was the same as the inverse of the ranking by ∆G vdw1 , a = b < g; hence, the bulkier the probe, the more favorable its non-electrostatic contribution to the free energy of dsDNA formation: ∆G vdw1 ≈ −0.1 uB, where u is 1 kcal/mol and B is the number of non-hydrogen atoms in the probe ( Figure 4A). The ranking by disorder, g < b < a, was the same as the ranking by ∆G ele1 ; hence, the greater the disorder, the less favorable the electrostatic contribution from the polar state to the free energy of dsDNA formation: ∆∆G ele1 ≈ u∆D, where u is 1 kcal/(mol·Å) and D is the helical rise obtained from PMD 1 structures ( Figure 4B). The ranking by looseness, b < g < a, was the same as the ranking by ∆G ele2 ( Figure 4C); hence, the greater the looseness, the less favorable the electrostatic contribution from the non-polar state to the free energy of dsDNA formation: ∆∆G ele2 ≈ −0.3u∆L, where u is 1 kcal/mol and L is the WC occupancy obtained from PMD 2 structures. The ranking by ∆G, b < g < a, was the same as the inverse of the ranking by T m (melting temperature); hence, the more favorable the free energy of dsDNA formation, the higher the melting temperature: ∆T m ≈ −10u∆∆G, where u is 1 mol·K/kcal ( Figure 4D).

Relationships between Structures, Free Energies, and Melting Temperatures
Now, let us bring to the reader's attention the main question to which we seek answers in our present, theoretical work (in which we simulate the molecular dynamics of dsDNA decamers and ssDNA trimers-containing g, a, or b as the probe of non-bonded interactions-in aqueous solution), What are the structural and energetical causes, not only qualitatively but also quantitatively, of the experimentally observed differences in the thermal stability of dsDNA g , dsDNA a , and dsDNA b ?, and let us offer the reader an answer: The differences in the melting temperatures (a < g < b) can be explained (1) structurally by the differences in bulkiness (g < a = b; measured by the number of non-hydrogen atoms in the probe), disorder (g < b < a; measured by the average length of the helical rise between the base pairs 4 and 5 in the ensemble of PMD structures of dsDNA generated with the probe in the natural, charged state 1), and looseness (b < g < a; measured by the average occupancy of the WC hydrogen bonds between nucleobases belonging to the base pair 5 in the ensemble of PMD structures of dsDNA generated with the probe in the artificial, uncharged state 2), and (2) energetically by the differences in the non-electrostatic (a = b < g; ∆G vdw1 , calculated from the Lennard-Jones van der Waals interaction energies), electrostatic (g < b < a; ∆G ele1 , calculated from the Coulombic interaction energies obtained for the ensemble of PMD structures generated with the probe in its natural, charged state 1), and preorganized electrostatic (b < g < a; ∆G ele2 , calculated from the Coulombic interaction energies obtained for the ensemble of PMD structures generated with the probe in its artificial, uncharged state 2) free energy contributions of a given probe to the total free energies of dsDNA formation (b < g < a; ∆G, calculated as the sum of ∆G vdw1 , ∆G ele1 , and ∆G ele2 ).
Thus, the three structural attributes, bulkiness (which anticorrelates linearly with ∆G vdw1 ), disorder (which correlates linearly with ∆G ele1 ), and looseness (which correlates linearly with ∆G ele2 ), determine ∆G (which correlates linearly with ∆G ele2 and anticorrelates linearly with the melting temperature, T m ) as follows: the bulkier the nucleobase/adduct, and the less disordered the dsDNA, and the less loose the dsDNA, the higher the melting temperature. If the combined differences in the bulkiness (∆G vdw1 ) and disorder (∆G ele1 ) reflect the differences in the favorability of stacking interactions of g, a, and b with the nucleobases of the neighboring base-pairs-which, in general, is a reasonable assumption [128]-our results are qualitatively in agreement with other studies [36,38,39,47] but quantitatively unprecedented (for our computational work is the first of this kind). If the differences in ∆G ele2 , quantified here for the first time, do indeed reflect the electrostatic preorganization of the WC hydrogen bonding interactions involving the nucleobases of the base pair 5, we have identified a hitherto unknown contribution to the differences in thermal stability, and the current view of the intactness of these interactions in both a and b adducts [36,39,47] might need to be reconsidered. However, compactness is the contrary to looseness, and what is compact, is put together closely, and thus the single best measure of looseness might be the distance between C1' atoms of the base pair 5, which is the shortest in the NMR structure of dsDNA b [36] the longest in the NMR structure of dsDNA a [39] and intermediate in the standard B-DNA model of dsDNA g [121]. However, we did not find any explicit mentioning of these differences in the scientific literature. As for our PMD structures, they do, regardless of the charge state of the probe, preserve this ranking but only if we ignore our strict ST (significance threshold) of 0.3 Å. However, even if we do not ignore ST, which is our primary strategy, we can still clearly distinguish dsDNA a from dsDNA g from dsDNA b using this measure and so our current view is that the distance between C1' atoms of the base pair 5 is indeed a useful indicator of looseness.
A peculiar feature of DNA 1 is nWC (the intrastrand non-Watson-Crick hydrogen bond involving the formyl group of the FAPy moiety and the exocyclic amino group of the 3 -neighboring Ade) [36], which stabilizes the WC hydrogen-bonding interactions between the nucleobases in the base pairs 4-7 [36] but does not contribute to the differences in the melting temperature between DNA 1,g , DNA 1,a , and DNA 1,b [39,47], even though, compared to a, nWC involving b is more stable [39], and even though, compared to ssDNA, nWC in dsDNA is more stable [47]. In addition, while our PMD 1 and PMD 2 structures do not show the experimentally observed stabilizing effect of nWC on WC in the base pairs 4-7, they do agree with the remaining three experimental observations concerning nWC. Moreover, if our relative nWC occupancies (calculated as percentual differences) are quantitatively correct, the cause of the difference in the stability between nWC involving a and b resides in the geometric preferences of the adducts (and not in the differences between dsDNA and ssDNA).

Errors
Every measurement, no matter whether experimental or theoretical, is associated with errors: perfect measurement is impossible: every measurement is only approximate. In general, however, compared to experimental measurements, theoretical, computational results are prone to larger errors, because the latter do not actually observe real phenomena, but merely simulate (imitate) them, and they do so by using simplified models. The combined LRA-LIE approach, as employed in our present work, is no exception, despite the physical soundness and beautiful simplicity of the expression for the calculation of ∆G as the sum of ∆G ele1 (LRA), ∆G ele2 (LRA), and ∆G vdw1 (LIE) contributions [125][126][127]129,130]. Simply put, it is extremely difficult to calculate absolute binding free energies [127,129]; and the larger the molecules involved, the more degrees of freedom, and the bigger the problem [131]. Besides the problem with obtaining accurate ∆G values, there is also the issue of assessing the uncertainty of the ∆G values themselves [127]. We use three uncertainty metrics, namely, (1) convergence errors [127] (calculated as differences between ∆G values obtained from the first and second halves of PMD simulations), (2) standard deviations (calculated from ∆G values obtained from four parallel sets of PMD simulations), and (3) spreads (calculated as maximum differences between four parallel sets of ∆G values). While high convergence errors would imply that our 5.0 ns PMD simulations are too short, high standard deviations would imply that our four parallel sets of PMD simulations are too few; and large spreads would illustrate the necessity of generating multiple parallel sets of PMD simulations. The uncertainties in the rankings of the free energy values are low (∆G vdw1 : a < g and b < g; ∆G ele1 : g < a; ∆G ele2 : g < a), medium (∆G ele1 : g < b; ∆G: b < g), and high (∆G ele2 : b < g; ∆G: g < a), for the convergence errors smaller, similar (within ST of 0.3 kcal/mol), or greater than the unsigned free energies, and our confidence in the meaningfulness of the rankings are, correspondingly, high, medium, and low. However, no average convergence error in ∆G vdw1 , ∆G ele1 , ∆G ele2 , and ∆G is greater than 0.4, 1.5, 1.1, and 2.2 kcal/mol, respectively, and, therefore, if we adopt 2.0 kcal/mol as the threshold of good convergence [127,130,132] only the convergence error of ∆G for DNA a , and only due to the convergence error of ∆G for DNA 2,a , exceeds this good convergence threshold, and only by 0.2 kcal/mol. We would have to generate one or more additional sets of PMD simulations to lower the convergence error, but no improvement in the accuracy could be expected because our ∆G values for DNA g , DNA a , and DNA b are already perfectly linearly correlated with the corresponding melting temperatures.

Strengths and Weaknesses
Interpreting quantitative relationships based on three points requires extreme caution because the probability of fortuitous correlations is not negligible. We would like to emphasize that we use the original scaling parameters of the van der Waals (α = 0.161) and electrostatic (β = 0.500) interaction energies [127], because neither the refined scaling parameters (α = 0.180, β g = 0.430, β a = β b = 0.370) [133] nor free parametrization (all combinations of α and β from 0.000 to 1.000 by 0.001 increments) improves the linear correlation between the free energies and the corresponding melting temperatures, which is supportive of the physical meaning residing in the free energy contributions to ∆G, and we would caution against the use of α and β parameters as freely adjustable fudge factors if the purpose of obtaining binding free energies is truly scientific (Proclus): "For the task of science is the recognition of causes, and only when we recognize the causes of things do we say that we know them." If the original α and β values do not result in a good agreement between the calculated free energies and the corresponding experimental quantities, it is, in our opinion, less likely, due to the lack of robustness of the combined LRA-LIE approach but, rather, due to a problem with the molecular model or due to an insufficient sampling of the configurational space. The latter is the probable reason why our ∆G rankings for DNA variants with swapped identities of the nucleobases in the base pair 6 with respect to DNA 1 (T·A: g < b < a) and DNA 2 (C·G: g = b < a) were incorrect (and therefore not included in the dataset used for the interpretation of the relationships between structures, free energies, and melting temperatures).
The reliability of this computational approach depends, to a certain extent, on the availability of the corresponding experimental quantities. In addition if the differences between the experimental quantities translate into sub-1.0 kcal/mol differences in the calculated free energies, such as, in our case of DNA 1 vs. DNA 2 , it is difficult, if at all possible, to distinguish such small differences with high confidence, and the reason for this is, ultimately, as with any other computational or experimental technique, the detection limit, which is a function of both signal strength (sensitivity) and signal stability (noisiness). Only massively parallel PMD simulations would provide a definite answer about the true sensitivity and noisiness limits-and only for a given case, really, because the limits are partly case-specific-but such an undertaking is beyond the scope of our present work.

Structural Models
NMR structures of dsDNA decamers in B-conformation, PDB IDs 2KH3 [39] (a; model ID 1) and 1HM1 [36] (b; model ID 1) consisting of two complementary DNA strands (strand ID 1: 5 -CTATXYTTCA-3 , where X-5 is a or b, and Y-6 is Ade, A; and strand ID 2: 5 -TGAAZCATAG-3 , where Z-15 is Thy, T) were obtained from the Protein Data Bank [134], and named, for convenience, dsDNA 1,a and dsDNA 1,b , respectively; dsDNA 2,a and dsDNA 2,b were generated from their corresponding dsDNA 1 models by replacing the original nucleobases Y-6 and Z-15 with Gua and Cyt, respectively, for the purpose of exploring two sequence-specific effects: (1) an intra-strand non-WC hydrogen bond involving the formyl group of the FAPy moiety and the exocyclic amino group of the 3 -neighboring Ade, but not the Gua, and (2) two vs. three WC hydrogen bonds involving the complementary nucleobases Y·Z (A·T vs. G·C). Four single-stranded DNA models (ssDNA 1,a , ssDNA 1,b , ssDNA 2,a , and ssDNA 2,b ), were created by extracting the nucleotides 4-6 from the corresponding dsDNA models (5 -TXY-3 , where X-2 is a or b, and Y-3 is Ade or Gua); and these trimeric models were considered to be suitable approximations of the corresponding decameric strands 1 in the dissociated, single-stranded configuration [135,136]. The reference dsDNA 1,g model in the standard B-conformation [121] was built using X3DNA 2.1 [122,123] (fiber −seq=CTATXYTTCA −b, where X-5 is Gua and Y-6 is Ade), and the three remaining models-dsDNA 2,g , ssDNA 1,g , and ssDNA 2,g -were created analogously to the corresponding DNA a and DNA b models.

Energetical Models
Atom types, bond lengths, bond angles, torsion angles, and partial atomic charges of the natural nucleotides in the DNA models were described using the AMBER 95 Force Field [137,138]. The total charge of each 5 -terminal, non-terminal, and 3 -terminal natural nucleotide was, −0.3079, −1.0000, and −0.6921 e, respectively (amber11/data/leap/lib/DNA_CI.lib) [138]. Atom types, bond lengths, bond angles, and torsion angles of a and b were primarily described by the AMBER 95 Force Field [137], secondarily by the General AMBER Force Field (which is compatible with the AMBER 95 Force Field) [139], and tertiarily by the analogy to the AMBER 95 Force Field. Partial atomic charges of a and b were derived using the Restrained Electrostatic Potential (RESP) method [140], as implemented in AMBER 11 (resp −O −i resp.in −o resp.out −p resp.pch −t resp.chg −e esp) [138], applied to the quantum-mechanically calculated electrostatic potential (ESP) at the Hartree-Fock (HF) level of theory with 6-31G(d) basis set using Gaussian 09 (#HF 6−31G(d) opt scf = tight pop = MK iop(2/11 =1) iop(6/33 = 2)) [141] for 8-methyl-9-hydroxy-AFB 1 , with the carbon atom of the methyl group corresponding to the C1 atom of the dRib to which the FAPy moiety is attached. The net charges of the methyl groups in 8-methyl-9-hydroxy-AFB 1 were evenly distributed among all atoms of FAPy-N7-9-hydroxy-AFB 1 , and rounded to four decimal places; thus the original partial atomic charges of the dRib (including the atoms C1 and H1 ) and the net −1 e charge of X-5 nucleotide containing either a or b were preserved.

Solvation
The net −18 and −2 e charge of dsDNA and ssDNA models, respectively, was neutralized by an addition of 18 and 2 sodium ions (each with +1 e charge) on a grid surrounding the DNA: randomly, but not closer than 5 Å from any atom of the DNA, not farther than 18 Å from the geometrical center of the DNA, and not closer than 6 Å from each other. The resulting electroneutral complexes, composed of DNA and sodium ions, were immersed in a spherical grid-being centered at the geometrical center of DNA and having 28 Å in radius-of TIP3P water molecules [142] using the preparation program Qprep (version 5.03) from the molecular dynamics package Q (version 5.0) [124].

Simulation
The variable parts of the DNA g , DNA a , and DNA b models-i.e., g (15 atoms; net charge −0.0888 e), a (54 atoms; net charge −0.0888 e), and b (54 atoms; net charge −0.0888 e)-would be used as the probes (Tables S2 and S3; Figure S1), for which their van der Waals interaction energies (E vdw ), modeled using Lennard-Jones potential, and their electrostatic interaction energies (E ele ), modeled according to the Coulomb's law, with all the components of the surrounding environment-i.e., with the probe-less part of the DNA, the ions, and all the water molecules-would be collected from MD simulations for these are the potential energies from which the van der Waals (∆G vdw ) and electrostatic (∆G ele ) contributions of the probe to the free energy of dsDNA formation (∆G) would be obtained. MD simulations would be performed, separately, using the normally charged probe (state ID 1) and the uncharged probe (state ID 2; with all partial atomic charges set to 0 e), and the charged probe would be used for collecting E vdw1 , E ele1 , and E ele2 , where the integers 1 and 2 denote the simulated state.
The solvated DNA models, containing the probe in the state 1, were prepared for the subsequent collecting of structures and energies by a well-tried, continuous series of 12 equilibrating MD simulations (EMD) [143] using the simulation program Qdyn (version 5.04) from the molecular dynamics simulation package Q (Table S4): Water molecules were subjected to the surface-constraint all-atom solvent (SCAAS)-type boundary conditions [124]. DNA was prevented from moving toward the boundary of the simulation sphere, but not hindered in its tumbling motion, by being restrained to its geometrical center with a force constant of 2.0 kcal/(mol·Å 2 ) (dsDNA), or by having the C1 atom of the nucleotide X-2 restrained to its initial coordinates with a force constant of 50 kcal/(mol·Å 2 ) (ssDNA). No cut-off was applied to non-bonded interactions involving the atoms of the probe. Non-bonded interactions between atoms not belonging to the probe were evaluated explicitly and using the Local Reaction Field method for distances and > 10 Å, respectively. Four parallel equilibrations were generated for each solvated DNA model by executing the equilibrating MD simulation protocol with four different values of the seed for the pseudo-random number generator (which is used by Qdyn to generate initial velocities). The collecting of structures (every 1.0 ps) and energies (every 0.02 ps) was performed in the last 5000 ps of 5000 ps (with the probe being in the state 1) and 5500 ps (with the probe being in the state 2) of 96 producing MD simulations (PMD 1 and PMD 2 ), each of which was a natural continuation of the 12th EMD simulation (ensemble, constant NVT; temperature, 298.15 K; step size, 2 fs; bonds involving hydrogen atoms restrained using the SHAKE algorithm). Hence, 5000/250,000 and 480,000/24,000,000 structural/energetical configurations were harvested per PMD simulation and in total, respectively.

Measurement
The size of the probe was determined by the number of non-hydrogen atoms constituting the probe. Every 10th PMD structure was characterized-based on three attributes of bodies: locality, length, and angularity-using X3DNA. The presence of hydrogen-bonding interactions-both WC (involving complementary nucleobases, including FAPy) and non-WC (involving the formyl group of the FAPy moiety and the exocyclic amino group of the 3 -neighboring Ade)-was determined, for every PMD structure, using hydrogen · · · acceptor distance ( 2.5 Å) and donor-hydrogen · · · acceptor angle ( 135 • ) criteria, and hydrogen-bonding occupancies were calculated as fractions of the structures in the ensemble that satisfied these arbitrary, but stringent, geometrical standards for hydrogen-bonding [146]. The conformation of Cyt-16 in PMD structures of dsDNA was classified as perturbed when the distance between the geometrical centers of Cyt-16 and Gua-or FAPy-5 exceeded our arbitrary, but judicious, threshold of 8.0 Å.
The contribution of the probe to the absolute free energy of dsDNA formation was calculated from the average E vdw1 , E ele1 , and E ele2 interaction energies ( · · · ), which were extracted from the collections of energies from PMD simulations of dsDNA (ds) and ssDNA (ss) models using Qfep (version 5.01) from the molecular dynamics simulation package Q as follows (Equations (1)-(4)): [125] (1) where α = 0.161 and β 1 = β 2 = 0.500 = β [127,131]. The relative free energies of dsDNA g , dsDNA a , and dsDNA b formation were calculated as differences between the corresponding absolute free energies (Equation (5)): where f is either a or b.
The structural and energetical quantities obtained from parallel PMD simulations were simply averaged. The rounding error was set, arbitrarily, but judiciously, to 0.1 Å for distances, 1.0 • for angles, 1.0% for occupancies, and 0.1 kcal/mol for energies. The average energetical quantities were also calculated (1) separately for each of the four parallel sets of PMD simulations (for the purpose of assessing non-cumulative uncertainties in these quantities as standard deviations and spreads), and (2) separately for the first and last 130,000 energies (for the purpose of assessing non-cumulative convergences of these quantities as differences between the two averages). An arbitrary, but judicious, significance threshold (ST) of three times the rounding error was set for the structural and energetical differences between the DNA models distinguished by the identities of X-5, Y-6, and Z-15 nucleotides. The rankings of the DNA models according to the structural and energetical quantities, produced from the sums of integer significance scores obtained from the matrices of differences (δ) between the DNA models (−1, if δ −ST; 0, if −ST < δ < ST; 1, if ST δ), were compared with each other and with the ranking according to the experimental T m (a < g < b). In the cases of matching rankings, linear correlation coefficients (R 2 ; ST set to 0.96) were calculated by comparing the actual quantities, for the purpose of which only one set of experimental T m values-representing, approximately (within the experimental error of 1 K), the average of DNA 1 and DNA 2 -was used: T m,g = 312 K, T m,a = 298 K, and T m,b = 325 K, indicating that T m,g lies, approximately, in the middle between T m,a (∆T m,a−g = −14 K) and T m,b (∆T m,b−g = 13 K) [39,47].

Conclusions
Having identified the general attributes of a (bulky), b (as bulky as a), dsDNA a (disordered and loose) and dsDNA b (disordered, but less than dsDNA a , and compact), and thus having answered our main question, we ask ourselves: (1) How do the attributes of thermal (de)stabilization modulate (i) the efficiency and fidelity of the bypass of these lesions by the translesion-synthesis DNA polymerase ζ (which "preferentially misincorporates Ade opposite the lesion [64,65]," suggesting that this polymerase is "responsible for the predominant G·C→T·A mutation" induced by FAPy-AFB1 adducts) [64,65], (ii) the recognition of these lesions by the global-genome-repair-specific XPC-HR23B complex (which "screens the genome for damage on the basis of disrupted base-pairing instead of lesions per se") [88], and (iii) the subsequent dual incision (which is the "rate-limiting step of the nucleotide excision repair [86]," and which releases an oligonucleotide containing the lesion) [147], and (2) why is the thermal stabilization of DNA duplex by b (which is the dominant FAPy-N7-9-hydroxy-AFB 1 adduct in genomic DNA) [54] not common among bulky-and-intercalated-but-not-cross-linked DNA adducts [36]? These questions remain to be answered by future experimental and theoretical studies.
Supplementary Materials: The following are available online, Table S1: Contributions of the probes (g, a, or b) to the absolute free energy of dsDNA formation (kcal/mol), Table S2: Atom types and partial atomic charges of the Gua probe, Table S3: Atom types and partial atomic charges of the αand β-FAPy-N7-9-hydroxy-AFB 1 probes, Table S4: Parameters of 12-stage equilibrating molecular dynamics (EMD) simulations, Figure S1: Atom identifiers of the probes.

Conflicts of Interest:
The authors declare no conflict of interest.