Secondary Structure Adopted by the Gly-Gly-X Repetitive Regions of Dragline Spider Silk

Solid-state NMR and molecular dynamics (MD) simulations are presented to help elucidate the molecular secondary structure of poly(Gly-Gly-X), one of the most common repetitive structural motifs in the dragline silk proteins of orb-weaving spiders. The combination of NMR and computational experiments provides insight into the secondary structure of poly(Gly-Gly-X) segments and further supports the view that these regions are disordered and primarily non-β-sheet. Furthermore, the NMR and MD results together illustrate the possibility of several secondary structural elements in the poly(Gly-Gly-X) regions of dragline silks, including β-turns, 3₁₀-helices, and coil structures, with a negligible population of α-helix observed.


Introduction
Dragline spider silks have been extensively studied, often with the long-term goal of biomimicry [1][2][3]. Dragline spider silks are protein-based biopolymers, and understanding the proteins' primary and secondary structures is a critical step toward reproducing synthetic versions of this extraordinary fiber [4]. The ability to determine primary sequences through DNA analysis has provided amino acid sequences for a large number of dragline silks, as well as for the many other silks that spiders produce [5,6]. Hence, the next step is to determine the molecular secondary structure and dynamics of these sequenced proteins in spider dragline silk. Experimental tools for protein structure elucidation, such as nuclear magnetic resonance (NMR) spectroscopy and X-ray diffraction (XRD), have been used extensively to probe the secondary structures of the proteins that make up spider dragline silk [7][8][9][10][11][12][13][14][15]. They have provided many insights into the molecular structure and organization of the silk proteins. However, a complete picture of the structure and dynamics of spider dragline silk is still lacking because of the complex and amorphous nature of the biopolymer. The goal of determining a comprehensive protein secondary structure for spider dragline silk is aided by molecular dynamics (MD) simulations, which can play a critical role in connecting experimental restraints with plausible molecular structures [16][17][18]. Given the complexity of both the structure and the dynamics of biopolymers such as spider dragline silk, a synergistic effort between computational/theoretical biophysics and experimental structural biology will be required to obtain a true molecular-level understanding of their structure and dynamics [19].
This work is a first effort by the authors to combine recent solid-state NMR results with MD simulations to help elucidate the secondary structures found in the poly(Gly-Gly-X) regions of orb-weaving spider dragline silk.
Major ampullate (dragline) spider silk is a protein-rich biopolymer that, in orb-weaving spiders, is commonly made up of repetitive amino acid segments (or motifs) from two proteins, major ampullate spidroin 1 (MaSp1) and major ampullate spidroin 2 (MaSp2) [20]. Common repetitive segments or motifs include poly(Ala), poly(Gly-Ala), poly(Gly-Gly-X), and poly(Gly-Pro-Gly-X-X) [4]. The general picture that has emerged for major ampullate spider silk is that the poly(Ala) and flanking poly(Gly-Ala) segments form nanocrystalline β-sheet structures, while the rest is an amorphous, glycine-rich, flexible linking region in which poly(Gly-Gly-X) is the common motif in MaSp1 and poly(Gly-Pro-Gly-X-X) is the common motif in MaSp2 [18]. Previous NMR studies have shown that the poly(Gly-Pro-Gly-X-X) found in MaSp2 forms type II β-turn structures [21]. Additionally, solid-state NMR has provided evidence that the poly(Gly-Gly-X) found in MaSp1 forms 3₁-helical structures similar to polyglycine II [22][23][24][25][26][27][28][29]. The poly(Gly-Gly-X) sequence is also found in minor ampullate and flagelliform (capture spiral) silks. This sequence is of particular interest because the X residue is always drawn from a restricted set of amino acids [30,31], which frequently occur in the same order in each protein. In major and minor ampullate silks these are Leu, Tyr, Ala, and Gln; in flagelliform silk they are Ala, Val, Ser, and Tyr. In this paper, we combine experimental solid-state NMR results focused on the X residues with molecular dynamics simulations to better understand the secondary structure of poly(Gly-Gly-X) in MaSp1, which will provide a starting point for understanding the structure of this motif in the other silks.

Solid-State NMR
The consensus primary amino acid sequence for MaSp1 and the ¹³C cross-polarization magic angle spinning (CP-MAS) NMR spectrum of ¹³C-labeled N. clavipes dragline silk are shown in Figure 1a and 1b, respectively. N. clavipes dragline silk is MaSp1-rich, with a low MaSp2 content (~80:20, MaSp1:MaSp2); thus, it is primarily the MaSp1 protein that is characterized when investigating this silk. It should be noted, however, that minor contributions from amino acids in the MaSp2 protein cannot be discounted. For the X amino acids, the contribution from MaSp2 residues outside Gly-Gly-X motifs is believed to be mostly negligible, since Leu is entirely absent from MaSp2 and Tyr occurs in the same Gly-Gly-X motif in MaSp2 [6]. Gln is present in a non-Gly-Gly-X Gln-Gln motif in MaSp2 and could contribute to a minor extent. The dragline silk is ¹³C-enriched at Ala, Gly, Leu, Gln, and Tyr. Previous studies have shown that the Ala methyl (Cβ) resonance is heterogeneous, with at least two components at 17.4 and 20.9 ppm that have been assigned to Ala in a disordered 3₁-helix similar to polyglycine II and to Ala in ordered nanocrystalline β-sheet structures, respectively [29]. The Ala in 3₁-helical structures has been ascribed to Ala located in the repetitive Gly-Gly-X motif, while the Ala in β-sheet structures is located in the poly(Ala) and flanking poly(Gly-Ala) motifs of the primary sequence. By ¹³C-labeling the common X amino acids found in Gly-Gly-X (Gln, Tyr, Leu), we have been able to probe the secondary structure of this disordered domain further.
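As an illustration of the two-component Ala Cβ assignment described above, the following Python sketch assigns a fitted Ala Cβ peak position to the nearer of the two reported reference shifts (17.4 ppm for the 3₁-helix, 20.9 ppm for β-sheet). The nearest-reference rule and the `tolerance_ppm` cutoff are assumptions for illustration only; in practice the two components are obtained by deconvolving the CP-MAS lineshape, and `assign_ala_cb` is a hypothetical helper.

```python
# Reference Ala C-beta shifts (ppm) as reported in the text [29].
ALA_CB_REFERENCES = {
    "3_1-helix (Gly-Gly-X)": 17.4,
    "beta-sheet (poly(Ala)/poly(Gly-Ala))": 20.9,
}


def assign_ala_cb(peak_ppm, tolerance_ppm=1.0):
    """Assign an Ala C-beta peak to the nearest reference component.

    tolerance_ppm is a hypothetical cutoff: a peak farther than this from
    every reference is reported as 'unassigned'.
    """
    label, ref = min(ALA_CB_REFERENCES.items(), key=lambda kv: abs(kv[1] - peak_ppm))
    if abs(ref - peak_ppm) > tolerance_ppm:
        return "unassigned"
    return label


if __name__ == "__main__":
    # Hypothetical fitted peak positions, for illustration only.
    for peak in (17.5, 20.7, 19.1):
        print(f"{peak:5.1f} ppm -> {assign_ala_cb(peak)}")
```

A peak midway between the two references (such as 19.1 ppm here) is deliberately left unassigned rather than forced into either conformation.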
The ¹³C isotope enrichment permits 2D ¹³C-¹³C correlation experiments with dipolar assisted rotational resonance (DARR), from which the conformation-dependent ¹³C chemical shifts can be extracted for the various amino acids in the Gly-Gly-X motif. This 2D method is particularly useful for extracting all ¹³C chemical shifts for each site, including the CO chemical shifts that are completely overlapped in the 1D ¹³C CP-MAS spectrum (see Figure 1). The ¹³C chemical shifts are tabulated in Table 1. The results for Ala and Gly are similar to those of previous solid-state NMR studies, in which β-sheet and 3₁-helix components were observed with nearly identical chemical shifts [29]. For the other ¹³C-labeled X amino acids, Gln and Tyr, the chemical shifts indicate non-β-sheet conformations; the shifts do not match α-helical structures and are, for the most part, similar to those observed for Ala in a model 3₁-helix (see Table 1) [22][32][33][34][35]. Specifically, the Cα and CO shifts fall on the β-sheet side of random coil (lower ppm), while the Cβ shifts are close to random coil. This trend matches that observed for Ala in the model 3₁-helical structure, but it is also similar to β-turn chemical shift trends, depending on the position of the amino acid in the turn [36]. Lastly, the ¹³C chemical shift trends for Leu most closely match the random coil conformation. Overall, the solid-state NMR results show that the Gly-Gly-X motif adopts neither β-sheet nor α-helix and is best interpreted as a disordered structure with evidence for 3₁-helix, β-turn, and/or random coil conformations.
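The shift-trend argument above rests on secondary chemical shifts, Δδ = δ(observed) − δ(random coil). A minimal sketch of that bookkeeping, assuming the random-coil reference values are supplied by the reader (for example from a published random-coil table) and using a hypothetical 0.5 ppm cutoff for "close to random coil": β-sheet moves Cα and CO to lower ppm and Cβ to higher ppm, and α-helix does the opposite. The numbers in the usage example are placeholders, not the Table 1 data.

```python
# Sign of the secondary shift expected on the beta-sheet side, per nucleus:
# C-alpha and CO move to lower ppm, C-beta moves to higher ppm.
BETA_SIDE_SIGN = {"CA": -1, "CO": -1, "CB": +1}


def secondary_shifts(observed, random_coil):
    """Return delta = observed - random_coil (ppm) for each nucleus present in both."""
    return {nuc: observed[nuc] - random_coil[nuc] for nuc in observed if nuc in random_coil}


def classify(observed, random_coil, threshold_ppm=0.5):
    """Label each nucleus (CA, CO, or CB) as beta-sheet side, alpha-helix side,
    or near random coil. threshold_ppm is a hypothetical cutoff."""
    labels = {}
    for nuc, delta in secondary_shifts(observed, random_coil).items():
        if abs(delta) < threshold_ppm:
            labels[nuc] = "near random coil"
        elif delta * BETA_SIDE_SIGN[nuc] > 0:
            labels[nuc] = "beta-sheet side"
        else:
            labels[nuc] = "alpha-helix side"
    return labels


if __name__ == "__main__":
    # Placeholder values for one residue, NOT measured data.
    observed = {"CA": 55.0, "CO": 174.5, "CB": 29.3}
    random_coil = {"CA": 56.0, "CO": 176.0, "CB": 29.4}
    print(classify(observed, random_coil))
```

The pattern described in the text for Gln and Tyr (Cα and CO on the β-sheet side, Cβ near random coil) is exactly the kind of mixed result that this per-nucleus view exposes and that a single "β-sheet vs. helix" label would hide.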
The ¹³C-¹³C correlation method with long (1 s) DARR mixing permits observation of long-range contacts between adjacent amino acids, which allows one to identify the motifs in which the amino acids are located. For Gln, Tyr, and Leu, long-range contacts to Gly are observed at 42.1, 41.2, and 42.5 ppm, respectively, consistent with these amino acids residing in the Gly-rich Gly-Gly-X motif (see Figure 2). Importantly, the observed Gly correlations are consistent with Gly in a 3₁-helix (41.4-42.5 ppm), indicating that the common X amino acids in Gly-Gly-X exhibit correlations consistent with this conformation. However, as discussed above, although the shift trends agree with the 3₁-helical structure for the most part, the shifts for the X amino acids show some inconsistencies and closely overlap with β-turn and coil conformations. The MD simulations discussed in the next section help to clarify this ambiguity and to characterize the structure of Gly-Gly-X.

Figure 2. ¹³C-¹³C correlation spectrum collected with a long dipolar assisted rotational resonance (DARR) mixing period of 1 s. Short-range (intra-residue) and long-range (inter-residue) dipolar contacts are indicated with dashed black lines. Long-range dipolar contacts for (a) Leu-Gly and (b) Tyr-Gly in Gly-Gly-X repeats are indicated in red.

Molecular Dynamics Simulations
Simulations were performed on two MaSp1 mini-fibrils, each consisting of three planes of five identical strands. The systems differed in the arrangement of strands: in the anti-parallel/parallel (AP) system, the strands were anti-parallel within the planes and parallel between the planes, while in the anti-parallel/anti-parallel (AA) system, the strands were anti-parallel both within and between the planes. Representative temperature replica exchange molecular dynamics (TREX-MD) structures at 300 K are shown in Figure 3. In both systems, the poly(Ala) regions adopted a β-sheet configuration, but the length of the β-sheets differed: AP had the longer β-sheets, with more residues in this conformation than AA. Root-mean-square deviations (RMSD) of the backbone atomic positions were calculated using the averaged structure as a reference (Table 2). AP showed lower average RMSD values than AA, indicating less structural variation in AP. RMSD values for the spacer region were similar to the overall RMSD for AP and AA, indicating significantly higher mobility of the spacer regions. In both systems, a bend between the two poly(Ala) regions was observed (Figure 3, Table 2). This bending was decomposed into in-plane bending (within the plane of the sheets) and out-of-plane bending (out of the plane of the sheets). The AP bending was small for all three angles, whereas AA had larger bending angles, particularly for out-of-plane bending. The smaller bending angles of the AP system suggest that it can be packed more efficiently into larger structures than the AA system.
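The RMSD analysis above can be sketched with NumPy. This is a minimal illustration, assuming the backbone coordinates have already been extracted into an array of shape (frames, atoms, 3). Iteratively superimposing every frame onto the running average with the Kabsch algorithm is a common choice but is our assumption, since the fitting procedure is not specified here; the function names are hypothetical.

```python
import numpy as np


def kabsch_align(mobile, reference):
    """Least-squares superposition of mobile (N x 3) onto reference (N x 3); returns the moved copy."""
    mob_c = mobile - mobile.mean(axis=0)
    ref_c = reference - reference.mean(axis=0)
    h = mob_c.T @ ref_c                       # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))    # d = -1 would mean a reflection; correct it
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob_c @ rotation.T + reference.mean(axis=0)


def rmsd(a, b):
    """Root-mean-square deviation between two (N x 3) coordinate sets, without fitting."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def rmsd_to_average(frames, n_iter=5):
    """Per-frame RMSD to the averaged structure.

    frames: array of shape (n_frames, n_atoms, 3) with backbone coordinates.
    Frames are first fitted to frame 0, then repeatedly re-fitted to the
    running average (an assumed, commonly used procedure).
    """
    aligned = np.array([kabsch_align(f, frames[0]) for f in frames])
    reference = aligned.mean(axis=0)
    for _ in range(n_iter):
        reference = aligned.mean(axis=0)
        aligned = np.array([kabsch_align(f, reference) for f in aligned])
    return np.array([rmsd(f, reference) for f in aligned]), reference
```

Passing all backbone atoms, or only those of the spacer residues, gives the two kinds of values compared in Table 2. Note that the averaged structure is a geometric reference, not a physically realistic conformation.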

Table 2. Structural data of 300 K replica. Root-mean-square deviation (RMSD) in Å and angles in degrees. Secondary structural elements in the spacer regions (residues 82-107) are given as a percentage of all spacer residues; π-helices and bridges are not reported.
The secondary structure of the spacer region in the TREX-MD simulations is given in Table 2, and the Ramachandran plot for the spacer region is shown in Figure 4. The spacer regions were rich in β-turns and coils, but a significant fraction of β-sheet was found as well. A large fraction of the poly(Ala) β-sheets extended into the first five residues of the spacer region, whereas typically only a few isolated β-sheets were found in the center of the spacer regions; these generally consisted of a few residues and two to three strands. Thus, the high β-sheet population stemmed from a continuation of the poly(Ala) β-sheets; apart from these extensions, the spacer contained few β-sheets. The spacer was also poor in α-helices, with less than 1% of the spacer residues in α-helical conformation. The low abundance of α-helices and β-sheets in the spacer region agrees with the NMR results reported above and in the literature for the Gly-Gly-X region [22][23][24][25][26][27][28][29].
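The spacer percentages in Table 2 are per-residue fractions pooled over frames and strands. A minimal sketch of that tally, assuming the STRIDE output has already been parsed into one string per strand per frame, with one STRIDE code per residue 71-121 (the parsing step, data layout, and function name are hypothetical). π-helix (`I`) and bridge (`B`/`b`) residues still count toward the total but are not reported, mirroring the Table 2 caption.

```python
from collections import Counter

FIRST_RESIDUE = 71            # strands span MaSp1 residues 71-121
SPACER = range(82, 108)       # spacer residues 82-107, inclusive

# STRIDE one-letter codes that are reported; 'I' (pi-helix) and 'B'/'b'
# (bridge) are counted in the total but not reported.
REPORTED = {"H": "alpha-helix", "G": "3_10-helix", "E": "beta-sheet", "T": "turn", "C": "coil"}


def spacer_percentages(assignments):
    """Percentage of spacer residues in each reported STRIDE class.

    assignments: non-empty iterable of strings, one per strand per frame,
    each holding one STRIDE code per residue 71-121.
    """
    counts = Counter()
    total = 0
    for ss in assignments:
        for resnum in SPACER:
            counts[ss[resnum - FIRST_RESIDUE]] += 1
            total += 1
    return {name: 100.0 * counts[code] / total for code, name in REPORTED.items()}
```

Because every strand and frame contributes equally, a β-sheet that merely extends a few residues past the poly(Ala) block still raises the spacer β-sheet percentage, which is the effect described in the paragraph above.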
Figure 4. Ramachandran plot for the spacer region in the TREX-MD simulations. Data are shown for the 300 K replicas. Three basins (and their mirror images, indicated by primes) were found: basin I corresponds to β-turns and β-sheets, basin II to β-turns, 3₁-helices, and coils, and basin III to 3₁₀-helices and α-helices.

Based on NMR chemical shifts, it has previously been reported, and is also indicated in this study, that the spacer region is rich in 3₁-helices with (φ, ψ) values near (−90°, 150°) [22][23][24][25][26][27][28][29]. We indeed saw a large population of secondary structure elements with these dihedral angles in the simulations, but these were classified as β-turns in our MD analysis. In contrast to the NMR findings, very low 3₁-helical content was found in the spacer region (~1 per 6 simulation frames). The 3₁-helices that formed were rich in Gly (Table 3) and were three residues in length. Representative 3₁-helix structures are shown in Figure 5. The helices formed inter-strand hydrogen bonds, mostly with β-turns. Although the NMR chemical shifts of 3₁-helices overlap with those of β-turns, as do their dihedral angles, the low occurrence of 3₁-helices in the simulations likely indicates a deficiency of the force field in describing Gly-rich regions.
In the simulations, 3₁₀-helices also formed (4%), mostly consisting of three residues and occasionally of four (~10%). The (φ, ψ) distribution for the 3₁₀-helices peaked at (−70°, −25°) and (70°, 25°), close to the (−60°, −30°) and (60°, 30°) values for the ideal right- and left-handed 3₁₀-helix, respectively [37]. The (φ, ψ) distribution of the 3₁₀-helices partly overlapped with that of β-turns, in particular type I and its mirror image, type I' [37]. β-turns and 3₁₀-helices both have i→i+3 hydrogen bonding patterns, and their chemical shifts largely overlap. There are other structural similarities as well [38][39][40]; in fact, type III β-turns (which are excluded from our β-turn definition) correspond to a 3₁₀-helix [37].
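Because the β-turn, 3₁₀-helix, and 3₁-helix distributions overlap in (φ, ψ) space, it can help to see how a single dihedral pair relates to the reference points quoted in the text. The sketch below assigns a (φ, ψ) pair to the nearest reference point on the periodic Ramachandran map. This purely geometric rule is an illustration only and is not the classification used in the analysis (STRIDE also uses hydrogen bonding); the reference table and function names are hypothetical, with the 3₁ mirror point included because both handednesses were considered.

```python
import math

# Reference (phi, psi) points in degrees, taken from the text:
# 3_1-helix near (-90, 150) and its mirror image; ideal right- and
# left-handed 3_10-helices at (-60, -30) and (60, 30).
REFERENCES = {
    "3_1-helix": (-90.0, 150.0),
    "3_1-helix (mirror)": (90.0, -150.0),
    "right-handed 3_10-helix": (-60.0, -30.0),
    "left-handed 3_10-helix": (60.0, 30.0),
}


def angular_diff(a, b):
    """Smallest signed difference a - b between two angles (degrees), in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def nearest_reference(phi, psi):
    """Name of the reference point closest to (phi, psi), treating both angles as periodic."""
    def distance(ref):
        return math.hypot(angular_diff(phi, ref[0]), angular_diff(psi, ref[1]))
    return min(REFERENCES, key=lambda name: distance(REFERENCES[name]))
```

With this rule the two simulated 3₁₀ peaks, (−70°, −25°) and (70°, 25°), fall nearest the ideal right- and left-handed 3₁₀ points, respectively; the periodic difference matters near ψ = ±180°, where 3₁-like residues sit.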
In the simulations, interconversion between β-turns and 3₁₀-helices was frequently observed, including β-turns with (φ, ψ) angles matching those of the 3₁-helix (especially in the higher-temperature replicas), although these were short turns of only a few residues rather than continuous extended helices. Of interest was the relatively frequent occurrence of left-handed 3₁₀-helices (Table 4). The backbone (φ, ψ) distribution of these helices peaked at (70°, 25°), in contrast to (−70°, −25°) for right-handed 3₁₀-helices. Left-handed 3₁₀-helices are rare in ordinary proteins and typically involve Gly [37]. The high Gly content of the MaSp1 spacer region is atypical for proteins; moreover, the absence of tertiary structure in the amorphous spacer region might further promote the formation of left-handed 3₁₀-helices. Representative structures of the 3₁₀-helices are shown in Figure 6.
α-helical motifs were rare; when they occurred, they were on average 2-3 residues longer than the 3₁₀-helical motifs. α-helices formed only in the central spacer region, except for the (Ala)n α-helices in AA, which resulted from refolding of the poly(Ala) region. Since such refolding of the poly(Ala) region is unlikely to occur, the AA system is likely not representative of a spider silk mini-fibril. Ser-Gln-Gly (SQG) was present in most of the α-helical motifs, and a large percentage of α-helices contained Leu-Gly-Ser (LGS) motifs. Both LGS and SQG sequences also formed 3₁₀-helices, suggesting potential interconversion between these structures.

Table 4. 3₁₀-helices in the 300 K replicas. %G, %A, and %S indicate the occurrence of Gly, Ala, or Ser, respectively, given as a percentage of all residues in 3₁₀-helices. The average length of hydrogen bonds in the 3₁₀-helices (in Å) is also shown.


Discussion
TREX-MD simulations of two MaSp1 mini-fibrils that differed in strand arrangement indicated higher stability for the AP system, in which the strands were anti-parallel within the planes and parallel between them. The simulations showed that the β-sheets of the poly(Ala) region extend into the first residues of the spacer region. The remaining spacer region was poor in α-helices and β-sheets and consisted predominantly of β-turns and coils. The simulations showed very low 3₁-helical content, however, which might point to deficiencies in the force field. A minor fraction of 3₁₀-helices was found, with a high occurrence of left-handed 3₁₀-helices, which rarely occur in other proteins. The high Gly content and the absence of tertiary structure are thought to be responsible for the frequent formation of left-handed 3₁₀-helices in the disordered spacer region. Only short 3₁- and 3₁₀-helices were found: all 3₁-helices and most 3₁₀-helices consisted of three residues. Conversions between these two structural elements and β-turns were frequently observed. This variety of turns and 3₁- and 3₁₀-helices appears possible because of the high Gly content and the absence of tertiary structure restraints.
In principle, combining solid-state NMR with MD simulations is a powerful approach for determining the secondary structure of spider dragline silk and of the various repetitive motifs that make up the silk proteins. The solid-state NMR data provided convincing evidence that the Gly-Gly-X motif does not form α-helical or β-sheet structures, with some evidence for the polyglycine II 3₁-helical conformation. However, some of the observed chemical shifts also overlap with 3₁₀-helical, β-turn, and random coil chemical shifts, making the interpretation somewhat ambiguous. The MD simulations provided evidence for the presence of all of these structures, illustrating the disorder of the Gly-Gly-X spacer region and aiding the NMR interpretation. While the MD currently underestimates the 3₁-helical content, tuning the Gly force field parameters and using chemical shift restraints in the simulations should improve their accuracy. In this way, combining solid-state NMR and MD is anticipated to greatly enhance our ability to characterize the conformations of the various repetitive motifs that make up spider and other animal silks.


Materials
Mature female N. clavipes spiders were fed tap water and crickets once per week. Spiders were forcibly silked at a speed of 2 cm/s for 1 h every other day. The major ampullate (dragline) silk was separated from the minor ampullate silk under an optical microscope (Olympus, Waltham, MA, USA). To prepare isotope-enriched dragline silk, the spiders were fed 200 µL of a saturated solution of U-[¹³C, ¹⁵

Solid-State NMR Measurements
Solid-state NMR spectra were collected on a Varian VNMRS 400 MHz spectrometer equipped with a 1.6 mm cross-polarization magic angle spinning (CP-MAS) probe operating in triple-resonance mode (¹H/¹³C/¹⁵N). One-dimensional (1D) ¹H→¹³C CP-MAS and two-dimensional (2D) ¹³C-¹³C through-space correlation experiments with dipolar-assisted rotational resonance (DARR) [29] were performed at a spinning speed of 35 kHz. The CP condition consisted of a 1.6 µs ¹H π/2 pulse followed by a 1.0 ms ramped (6%) ¹H spin-lock pulse with a radio frequency (rf) field strength of 155 kHz at the ramp maximum, with the ¹³C channel matched to the −1 spinning sideband condition (rf field strength of 120 kHz). Typical experimental conditions included a 25 kHz sweep width and a recycle delay of 3.0 s, with two-pulse phase-modulated (TPPM) ¹H decoupling at an rf field strength of 130 kHz applied during acquisition. The 2D ¹³C-¹³C through-space correlation spectra were collected with 1024 points in the direct dimension, 320 complex t₁ points in the indirect dimension, and 32 scans, with spectral widths of 25 and 35 kHz in the direct and indirect dimensions, respectively. During the DARR mixing period, continuous wave (CW) irradiation was applied on the ¹H channel at the n = 1 rotary resonance condition (ω₁ = ωᵣ), with mixing times (τₘ) of 50 ms, 150 ms, and 1 s. The 2D spectra were processed with 100 Hz exponential line broadening in the direct dimension and, in the indirect dimension, a Gaussian function of the form exp(−(t/gf)²) with the constant gf equal to 0.0025. The ¹³C isotropic chemical shift was indirectly referenced to adamantane (38.56 ppm).
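Several of the acquisition and processing settings above are linked by simple relations that can be checked numerically. The NumPy sketch below builds the time axes from the quoted spectral widths (dwell time = 1/sweep width), evaluates the exponential and Gaussian window functions, and confirms that the quoted rf fields (155 kHz ¹H, 120 kHz ¹³C) at 35 kHz spinning correspond to the n = −1 sideband of the Hartmann-Hahn condition, written here as ν₁(¹³C) = ν₁(¹H) + n·νᵣ. We assume that gf is in seconds and that the exponential window has the usual exp(−π·LB·t) form; the spectrometer software may differ in detail, and all names are hypothetical.

```python
import numpy as np

# Parameters quoted in the text.
SW_DIRECT_HZ = 25_000.0      # direct-dimension spectral width
SW_INDIRECT_HZ = 35_000.0    # indirect-dimension spectral width
N_DIRECT = 1024              # points in the direct dimension
N_INDIRECT = 320             # complex t1 points
LB_HZ = 100.0                # exponential line broadening (direct dimension)
GF = 0.0025                  # Gaussian constant in exp(-(t/gf)^2); assumed to be in seconds


def time_axis(n_points, sw_hz):
    """Sampling times t_k = k / sw in seconds (dwell time 1/sw)."""
    return np.arange(n_points) / sw_hz


def exponential_window(t, lb_hz):
    """Exponential apodization adding lb_hz of Lorentzian width: exp(-pi * lb * t)."""
    return np.exp(-np.pi * lb_hz * t)


def gaussian_window(t, gf):
    """Gaussian apodization of the form exp(-(t/gf)^2)."""
    return np.exp(-(t / gf) ** 2)


def cp_sideband_order(nu1_h_hz, nu1_c_hz, nu_r_hz):
    """Sideband order n in the matching condition nu1(13C) = nu1(1H) + n * nu_r."""
    return (nu1_c_hz - nu1_h_hz) / nu_r_hz


t2 = time_axis(N_DIRECT, SW_DIRECT_HZ)
t1 = time_axis(N_INDIRECT, SW_INDIRECT_HZ)
w2 = exponential_window(t2, LB_HZ)
w1 = gaussian_window(t1, GF)
# A 2D FID of shape (N_INDIRECT, N_DIRECT) would be weighted as fid * np.outer(w1, w2).

print(f"direct acquisition time: {t2[-1] * 1e3:.2f} ms")
print(f"CP sideband order: {cp_sideband_order(155e3, 120e3, 35e3):+.0f}")
```

The same arithmetic shows why the DARR condition is easy to set here: at n = 1 rotary resonance, the ¹H CW field simply equals the 35 kHz spinning frequency.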

Molecular Dynamics Simulations
Simulations were performed on spider silk mini-fibrils consisting of MaSp1 residues 71-121, with the primary sequence GQGAGAAAAA-AGGAGQGGYG-GLGSQGAGRG-GLGGQGAGAA-AAAAAGGAGQ-G. Each strand consisted of two poly(Ala) regions separated by a spacer region, and was capped by acetyl and amine groups. The mini-fibrils were constructed from 3 planes of 5 strands, for a total of 15 identical strands, each initially built in an extended β-sheet conformation. Two systems were simulated in which the strands were oriented antiparallel within the planes and either parallel or antiparallel between the planes. These systems are designated AP (antiparallel within planes, parallel between planes) and AA (antiparallel within planes, antiparallel between planes), respectively.
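To make the residue numbering concrete, the Python sketch below indexes the simulated MaSp1 fragment (residues 71-121), extracts the spacer region (residues 82-107, as defined for Table 2), reports its Gly content, and lists every Gly-Gly-X motif together with the residue number of X. The helper names are hypothetical; the only input is the sequence quoted above.

```python
import re

# MaSp1 residues 71-121 as given in the text (hyphens mark blocks of ten).
SEQUENCE = "GQGAGAAAAA-AGGAGQGGYG-GLGSQGAGRG-GLGGQGAGAA-AAAAAGGAGQ-G".replace("-", "")
FIRST_RESIDUE = 71


def residues(start, end):
    """Subsequence for residue numbers start..end (inclusive)."""
    return SEQUENCE[start - FIRST_RESIDUE : end - FIRST_RESIDUE + 1]


def gly_gly_x_motifs(seq, offset=FIRST_RESIDUE):
    """(residue number of X, X) for every Gly-Gly-X with X != Gly.

    The zero-width lookahead lets matches overlap.
    """
    return [(m.start() + 2 + offset, m.group(1)) for m in re.finditer(r"(?=GG([^G]))", seq)]


spacer = residues(82, 107)
print(len(SEQUENCE), spacer)
print(f"Gly in spacer: {spacer.count('G')}/{len(spacer)}")
print(gly_gly_x_motifs(SEQUENCE))
print(residues(92, 94), residues(94, 96))
```

Run on this fragment, the spacer contains 15 Gly among its 26 residues, the motif search recovers Ala, Tyr, Leu, and Gln as the X residues (the restricted set noted in the Introduction), and the LGS and SQG tripeptides discussed for the α-helical motifs overlap at Ser94.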
To enhance conformational sampling, temperature replica exchange molecular dynamics (TREX-MD) [41,42] simulations were performed. In TREX-MD, multiple independent copies of the system (replicas) are run at different temperatures, and at regular intervals attempts are made to swap coordinates between replicas at neighboring temperatures. Each attempt is accepted or rejected according to an energy-based criterion that preserves detailed balance. To equilibrate the replicas at their chosen temperatures, conventional MD simulations were first performed with distance restraints between the Cα atoms of opposing sheets of the poly(Ala) regions; these distances were restrained by a flat-bottom potential with a force constant of 20 kcal/(mol Å), active beyond 6.0 Å. Each system was heated from 120 to 400 K over 1 ns, and the replicas were then cooled to their TREX-MD starting temperatures over a further 1 ns. The TREX-MD temperatures were selected from unrestrained TREX-MD trial runs so as to optimize replica swapping. In these trial runs, systems were simulated at 10 K intervals between 300 and 400 K; the heat capacity was then calculated from these exploratory simulations, and the phase transition corresponding to melting of the noncrystalline region was identified between 350 and 365 K. Because phase transitions are bottlenecks for replica exchange [43], smaller temperature spacings were used near the transition. Swapping was monitored in further trial runs, and extra replicas were inserted where needed. This optimization resulted in 37 […]. After temperature equilibration, a 5 ns per replica TREX-MD equilibration was performed, during which the positional restraints on the poly(Ala) regions were removed.
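The exchange step described above can be sketched in Python. The sketch assumes the standard Metropolis criterion for temperature replica exchange, p = min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B·T), and alternates between even and odd neighbor pairs so that no replica takes part in two swaps at once. Amber performs these exchanges internally; the function names, temperature ladder, and energies below are hypothetical placeholders and not the values used in this work.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def swap_probability(e_i: float, t_i: float, e_j: float, t_j: float) -> float:
    """Metropolis probability of exchanging the configurations held at
    temperatures t_i and t_j (K), given their potential energies (kcal/mol).

    p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) satisfies detailed balance.
    """
    beta_i, beta_j = 1.0 / (K_B * t_i), 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_neighbor_swaps(energies: list[float], temps: list[float],
                           offset: int, rng: random.Random) -> list[int]:
    """One exchange sweep over neighboring temperature pairs (k, k + 1).

    energies[k] is the potential energy of the configuration currently at temps[k].
    Pairs start at index `offset` (0 or 1). Accepted swaps permute `energies`
    in place; the indices k of accepted pairs are returned.
    """
    accepted = []
    for k in range(offset, len(temps) - 1, 2):
        p = swap_probability(energies[k], temps[k], energies[k + 1], temps[k + 1])
        if rng.random() < p:
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
            accepted.append(k)
    return accepted


if __name__ == "__main__":
    # Hypothetical ladder, denser inside the 350-365 K transition window.
    temps = [300.0, 310.0, 320.0, 330.0, 340.0, 350.0,
             355.0, 360.0, 365.0, 375.0, 390.0, 400.0]
    rng = random.Random(2014)
    # Hypothetical energies that rise with temperature, plus Gaussian noise.
    energies = [-5000.0 + 12.0 * (t - 300.0) + rng.gauss(0.0, 15.0) for t in temps]
    for sweep in range(4):
        print(sweep, attempt_neighbor_swaps(energies, temps, sweep % 2, rng))
```

Because the acceptance probability depends on the energy overlap between neighboring replicas, closely spaced temperatures raise the swap rate, which is why the spacing was narrowed near the 350-365 K transition.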
All simulations were performed with the GPU version of Amber12 [44] and the Amber99SB force field [45], using a generalized Born implicit solvent model [46] and Langevin dynamics with a friction coefficient of 5 ps⁻¹. Bonds involving hydrogen atoms were constrained with the SHAKE algorithm [47], which permitted a 2 fs time step. Swaps were attempted every 2 ps for all systems, with an average success rate between 50% and 70% for each replica, and coordinates were saved every 2 ps. After equilibration, an unrestrained production run of 60 ns per replica was performed for AP and 45 ns per replica for AA, for total production times of 2.6 and 1.9 µs, respectively. Simulations were run until all replicas had visited all temperatures, which took somewhat longer for the AP system. Secondary structures other than 3₁-helices were assigned with STRIDE [48], which uses the Kabsch and Sander rules [38] with stricter hydrogen-bond definitions for assigning α- and 3₁₀-helices and β-sheets, and Thornton's definitions [39,40] for turns. For this analysis, all β-turn types were grouped together except type III, which corresponds to one turn of a 3₁₀-helix and was excluded. STRIDE assignments were verified by visual inspection. Because STRIDE does not identify 3₁-helices, these were identified by visually inspecting stretches of three consecutive residues in which at least two residues had |ϕ| between 70° and 90° and |ψ| between 140° and 150°; absolute values were used so that both right- and left-handed helices were included. A total of 400 randomly selected structures were visually inspected for 3₁-helices; the sample statistics were used to calculate the overall occurrence of 3₁-helices, and bootstrapping was used to estimate errors. All structural analyses were performed on the 300 K replicas.
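The 3₁-helix screening and error estimate described above can be sketched in Python. The sketch flags three-residue windows using the stated |ϕ| and |ψ| ranges. It then computes a pooled occurrence with a bootstrap standard error by resampling whole structures with replacement. Two points are assumptions: that "occurrence" is pooled as residues in 3₁-helices over residues examined, and that the bootstrap resamples structures. The input data and function names are hypothetical, and the visual-inspection step itself is not modeled.

```python
import random

PHI_RANGE = (70.0, 90.0)    # |phi| window in degrees, from the text
PSI_RANGE = (140.0, 150.0)  # |psi| window in degrees, from the text


def in_31_region(phi: float, psi: float) -> bool:
    """True if |phi| and |psi| both fall inside the 3_1-helix screening windows.

    Absolute values admit both right- and left-handed conformations.
    """
    return (PHI_RANGE[0] <= abs(phi) <= PHI_RANGE[1]
            and PSI_RANGE[0] <= abs(psi) <= PSI_RANGE[1])


def candidate_windows(dihedrals: list[tuple[float, float]]) -> list[int]:
    """Start indices of three-residue windows in which at least two residues
    satisfy in_31_region; such windows would then be inspected visually."""
    flags = [in_31_region(phi, psi) for phi, psi in dihedrals]
    return [i for i in range(len(flags) - 2) if sum(flags[i:i + 3]) >= 2]


def bootstrap_occurrence(samples: list[tuple[int, int]], n_resamples: int = 10_000,
                         seed: int = 0) -> tuple[float, float]:
    """samples holds, per inspected structure, (residues in 3_1-helix, residues examined).

    Returns the pooled fraction and its bootstrap standard error.
    """
    rng = random.Random(seed)

    def pooled(s: list[tuple[int, int]]) -> float:
        return sum(h for h, _ in s) / sum(n for _, n in s)

    estimate = pooled(samples)
    boots = [pooled(rng.choices(samples, k=len(samples))) for _ in range(n_resamples)]
    mean = sum(boots) / n_resamples
    se = (sum((b - mean) ** 2 for b in boots) / (n_resamples - 1)) ** 0.5
    return estimate, se


if __name__ == "__main__":
    # Hypothetical (phi, psi) values for five consecutive residues.
    dihedrals = [(-75.0, 145.0), (-60.0, -40.0), (80.0, -148.0),
                 (-150.0, 150.0), (-85.0, 142.0)]
    print("candidate windows start at:", candidate_windows(dihedrals))
    # Hypothetical inspection results: 400 structures of 15 x 51 = 765 residues.
    rng = random.Random(1)
    samples = [(rng.randint(0, 6), 765) for _ in range(400)]
    frac, se = bootstrap_occurrence(samples)
    print(f"3_1-helix occurrence: {frac:.4f} +/- {se:.4f}")
```

Resampling whole structures rather than individual residues keeps correlated residues within a structure together, which avoids understating the error.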

Conclusions
Solid-state NMR and MD simulations were used together to illuminate the conformational structure of poly(Gly-Gly-X), one of the most common repetitive motifs in dragline spider silk proteins. The combined NMR and MD results provide new insight into the secondary structure of poly(Gly-Gly-X) segments and further support the view that these regions are disordered and primarily non-β-sheet. Furthermore, the combined NMR and MD results indicate that several secondary structural elements may be present in the poly(Gly-Gly-X) regions of dragline silks, including β-turns, 3₁₀-helices, and coil structures, with only a negligible population of α-helix observed. These solid-state NMR results and MD simulations highlight the complexity of this common spider silk protein motif. It is anticipated that this combined experimental NMR and computational MD approach will be a powerful tool for elucidating the conformational structure and hierarchical organization of other silk motifs whose structures remain undetermined.