Crystal Structure of Pyrrolysyl-tRNA Synthetase from a Methanogenic Archaeon ISO4-G1 and Its Structure-Based Engineering for Highly-Productive Cell-Free Genetic Code Expansion with Non-Canonical Amino Acids

Pairs of pyrrolysyl-tRNA synthetase (PylRS) and tRNAPyl from Methanosarcina mazei and Methanosarcina barkeri are widely used for site-specific incorporations of non-canonical amino acids into proteins (genetic code expansion). Previously, we achieved full productivity of cell-free protein synthesis for bulky non-canonical amino acids, including Nε-((((E)-cyclooct-2-en-1-yl)oxy)carbonyl)-L-lysine (TCO*Lys), by using Methanomethylophilus alvus PylRS with structure-based mutations in and around the amino acid binding pocket (first-layer and second-layer mutations, respectively). Recently, the PylRS·tRNAPyl pair from a methanogenic archaeon ISO4-G1 was used for genetic code expansion. In the present study, we determined the crystal structure of the methanogenic archaeon ISO4-G1 PylRS (ISO4-G1 PylRS) and compared it with those of structure-known PylRSs. Based on the ISO4-G1 PylRS structure, we attempted the site-specific incorporation of Nε-(p-ethynylbenzyloxycarbonyl)-L-lysine (pEtZLys) into proteins, but it was much less efficient than that of TCO*Lys with M. alvus PylRS mutants. Thus, the first-layer mutations (Y125A and M128L) of ISO4-G1 PylRS, with no additional second-layer mutations, increased the protein productivity with pEtZLys up to 57 ± 8% of that with TCO*Lys at high enzyme concentrations in the cell-free protein synthesis.

The PylRS·tRNAPyl pairs are useful for non-canonical amino acid incorporation because of their "orthogonality" (non-reactivity) to the 20 canonical aaRS·tRNA pairs in many

Overall Structure of ISO4-G1 PylRS
The genome of the methanogenic archaeon ISO4-G1 encodes an ISO4-G1 PylRS protein consisting of 273 amino acids [53], which is quite similar to MaPylRS [22,53] (Figure S1). ISO4-G1 PylRS was expressed very well as a soluble protein in Escherichia coli cells. The yield of the ISO4-G1 PylRS protein was over 100 mg per liter of E. coli culture (Figure S2), and the protein could be concentrated without aggregation to more than 20 mg/mL, which is comparable to the behavior of M. alvus PylRS. For crystallographic analysis, ISO4-G1 PylRS was purified to homogeneity by three column chromatography steps (Materials and Methods). Crystals of the ISO4-G1 PylRS protein were obtained with PEG3350 as the precipitant, and the crystal structure of the ISO4-G1 PylRS apo form was determined at 2.78-Å resolution (Figure 1, Materials and Methods). The asymmetric unit contains ten molecules of PylRS (five PylRS dimers: A/B, C/G, D/F, E/H, and J/I). The final model shows good geometry, and all residues are within the allowed regions of the Ramachandran plot, as evaluated by Procheck [77] and Molprobity [78] (Table S1).
Trp417, respectively), whereas the counterparts of Met128, Val167, and Ala221 in ISO4-G1 PylRS correspond to Leu309, Cys348, and Val401, respectively, in MmPylRSc. The present crystallographic analysis revealed the structural differences between ISO4-G1 PylRS, MaPylRS, and MmPylRSc (PDB: 2ZIM) (Figure 3). The ISO4-G1 PylRS Met128 and Val167 residues, which are conserved as Met129 and Val168, respectively, in MaPylRS, are bulkier than the corresponding Leu309 and Cys348 residues in MmPylRSc. Therefore, the internal volumes of the amino acid binding pockets of ISO4-G1 PylRS and MaPylRS are smaller than that of MmPylRSc.
Therefore, the β5-β6 hairpin undergoes random conformational changes. The Tyr204 side-chain in molecule D is disordered and thus might be in an intermediate form (Figure 5i). The β5-β6 hairpin in ISO4-G1 PylRS is similar to that of MaPylRS, while a remarkable difference exists between the β5-β6 hairpin in ISO4-G1 PylRS and the β7-β8 hairpin in MmPylRSc. As described previously, the β7-β8 hairpin in MmPylRSc is very flexible and adopts multiple conformations regardless of the substrate binding [52] (Figures 2 and 4). In the MmPylRSc structure (PDB: 2ZIM), the β7-β8 hairpin is bent in the middle, and the tip half of the β-hairpin is elevated. Tyr384, at the tip of the bent β7-β8 hairpin, is buried deeply within the active site [52,54]. While the β5-β6 hairpin in ISO4-G1 PylRS still assumes bent conformations, Tyr204 of ISO4-G1 PylRS is not as deeply accommodated within the active site as Tyr384 of MmPylRSc (Figures 2, 3 and 4a,b,h). In the crystal structure of the ISO4-G1 PylRS mutant for cyanopyridylalanine [32], Trp204, which is substituted for the strictly conserved Tyr residue in the PylRS family, at the tip of the bent β5-β6 hairpin is only slightly inserted into the active site as compared to Tyr204 of the ISO4-G1 PylRS apo form (Figures 4c and S5a).
In contrast, in the crystal structures of the acridonylalanine (and ATP/AMPPNP)-bound MaPylRS mutants [76], Tyr206 at the tip of the bent β5-β6 hairpin seems to penetrate more deeply within the active site than
In molecules A, C, E, F, and J, a π-π stacking interaction is observed between His225 and Trp237 (c-g), while in molecules B, G, H, and I, His225 is stabilized by a π-π stacking interaction with Tyr204 (h-l).

Structural Changes of Tyr204 and His225 in ISO4-G1 PylRS and Comparison with Those of MmPylRS and MaPylRS
Interestingly, the ISO4-G1 PylRS structure revealed that the His225 residue (corresponding to His227 in MaPylRS) undergoes drastic conformational changes in accordance with the location of the Tyr204 residue (corresponding to Tyr206 in MaPylRS) in the β5-β6 hairpin (Figures 2, 4, and 5). On the one hand, when Tyr204 is far from the amino acid binding pocket (the open conformation), a π-π stacking interaction is observed between the imidazole ring of His225 and the indole ring of Trp237. On the other hand, when Tyr204 is located inside the amino acid binding pocket (the closed conformation), the imidazole ring of His225 shifts and is stabilized by a π-π stacking interaction with the aromatic ring of Tyr204. This conformational change is not observed in the corresponding His227 residue of MaPylRS, according to the structure of the MaPylRS apo form (Figures 4d,e, 5 and S5b). However, the recently determined structure of the MaPylRS mutant in complex with the non-canonical amino acid acridonylalanine (and AMPPNP) revealed the conformational changes of residues 224-230, and the movement of His227 away from the active site upon acridonylalanine binding (Figures 4f and S5b) [76]. In contrast, no conformational changes of the corresponding Ile405 residue in MmPylRS are observed (Figures 4g,h and S5c). Accordingly, the structural changes of His225/His227 share common features with MaPylRS and ISO4-G1 PylRS, but not with Ile405 of MmPylRS.

Structure-Based Engineering of the First-Layer Residues in ISO4-G1 PylRS for Site-Specific Incorporation of Bulky Lysine Derivatives into Proteins by Cell-Free Protein Synthesis
Previously, we developed a system for genetic code expansion with bulky ZLys derivatives by using the MmPylRS (Y306A/Y384F)·tRNAPyl pair [8,63,69,71] and the MaPylRS (Y126A/M129L)·tRNAPyl pair [24,26]. To examine whether the ISO4-G1 PylRS·tRNAPyl pair is useful for genetic code expansion, we rationally engineered two ISO4-G1 PylRS mutants based on the PylRS structures. In the previous study, the MaPylRS (Y126A/M129L)·tRNAPyl and MaPylRS (Y126A/M129A)·tRNAPyl pairs successfully facilitated the site-specific incorporation of TCO*Lys and mAzZLys into proteins in an E. coli cell-free protein synthesis system [26]. Therefore, the Y126A/M129L and Y126A/M129A mutations of the "first-layer residues", which directly contact the substrate amino acids, were transplanted into the corresponding sites (Tyr125 and Met128) of ISO4-G1 PylRS. The Y125A mutation (corresponding to Y306A in M. mazei PylRS and Y126A in M. alvus PylRS) enlarges the ISO4-G1 PylRS active site pocket, which then becomes suitable for accommodating bulky non-canonical amino acids [26]. In ISO4-G1 PylRS, the Met128 side-chain protrudes into the amino acid binding pocket (Figure 3), which would reduce the pocket size as compared with that of MmPylRSc. The Leu and Ala mutations at position 128 would enlarge the inner space of the active site pocket (Figure 3). All of the ISO4-G1 PylRS mutant proteins were quite soluble, and more than 100 mg of each purified ISO4-G1 PylRS protein was obtained per liter of E. coli culture (Materials and Methods).
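The transplantation of mutations between orthologs relies on the residue equivalences quoted in the text. As a minimal sketch, the bookkeeping can be written as a small lookup table; the table lists only the correspondences stated in this article and is not a full sequence alignment, and the function name `transplant` and the keys `Mm`, `Ma`, and `ISO4` are illustrative labels rather than names from the original work.

```python
import re

# Equivalent positions quoted in the text (not a complete alignment).
# Keys are illustrative: Mm = M. mazei, Ma = M. alvus, ISO4 = ISO4-G1 PylRS.
EQUIVALENT_RESIDUES = [
    {"Mm": "Y306", "Ma": "Y126", "ISO4": "Y125"},  # first-layer residue
    {"Mm": "L309", "Ma": "M129", "ISO4": "M128"},  # first-layer residue
    {"Mm": "C348", "Ma": "V168", "ISO4": "V167"},
    {"Mm": "Y384", "Ma": "Y206", "ISO4": "Y204"},  # tip of the beta-hairpin
    {"Mm": "I405", "Ma": "H227", "ISO4": "H225"},  # second-layer residue
]


def transplant(mutation: str, source: str, target: str) -> str:
    """Rewrite a point mutation such as 'Y126A' in the target enzyme's numbering."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"not a point mutation: {mutation!r}")
    wild_type, position, replacement = match.groups()
    site = wild_type + position
    for row in EQUIVALENT_RESIDUES:
        if row[source] == site:
            # Keep the target's own wild-type residue and position.
            return row[target] + replacement
    raise KeyError(f"{site} has no listed equivalent in {source}")


iso4_mutant = "/".join(transplant(m, "Ma", "ISO4") for m in ["Y126A", "M129L"])
print(iso4_mutant)
```

Because the target's wild-type residue is taken from the table, the same helper also handles positions where the residue type differs between enzymes, such as Leu309 in MmPylRS versus Met128 in ISO4-G1 PylRS.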

Effects of the ISO4-G1 PylRS(Y125A/M128L) Concentration on Cell-Free Protein Synthesis with the Inefficient Amino Acid pEtZLys
Previously, we found that the protein productivities for the inefficient, bulky amino acid TCO*Lys can be enhanced by increasing the concentration of the M. alvus PylRS mutant [26]. To examine the effects of higher concentrations of the ISO4-G1 PylRS protein on non-canonical amino acid incorporation, cell-free protein synthesis with the highly inefficient, bulky, non-canonical amino acid pEtZLys, which is useful for alkyne-azide click chemistry [63,82], was performed by using various concentrations of the ISO4-G1 PylRS protein. The protein productivities for pEtZLys were enhanced from 8% (0.19 mg protein/mL reaction) to 57% (1.32 mg protein/mL reaction) of the N11-GFPS1 control protein when the concentration of the ISO4-G1 PylRS (Y125A/M128L) protein was increased from 10 to 75 µM (Figure 6c). Therefore, we achieved the highest-ever protein productivity for pEtZLys. The incorporations of the non-canonical amino acids into the N11-GFPS1 protein were confirmed by mass spectrometry analyses (Figure S6). These results confirmed that the efficient synthesis of the full-length N11-GFPS1 protein containing non-canonical amino acids occurs without any non-specific suppression of the UAG codon with canonical amino acids in the cell-free system.
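The relative and absolute yields quoted above can be cross-checked with simple arithmetic. The sketch below uses only the numbers given in the text (8% and 0.19 mg/mL at 10 µM; 57% and 1.32 mg/mL at 75 µM); the implied control yield is a back-calculation from rounded values and is not a figure reported by the authors.

```python
def implied_control_yield(yield_mg_per_ml: float, fraction_of_control: float) -> float:
    """Back-calculate the control yield from an absolute and a relative yield."""
    return yield_mg_per_ml / fraction_of_control


# Values quoted in the text for ISO4-G1 PylRS (Y125A/M128L) with pEtZLys.
control_from_low = implied_control_yield(0.19, 0.08)    # at 10 uM enzyme
control_from_high = implied_control_yield(1.32, 0.57)   # at 75 uM enzyme

enzyme_fold = 75 / 10      # fold increase in PylRS concentration
yield_fold = 1.32 / 0.19   # fold increase in protein yield

print(f"implied control yield: {control_from_low:.2f} and {control_from_high:.2f} mg/mL")
print(f"enzyme increased {enzyme_fold:.1f}-fold, yield increased {yield_fold:.1f}-fold")
```

Both back-calculations give a control yield of roughly 2.3 to 2.4 mg/mL, so the two data points are mutually consistent within rounding, and a 7.5-fold increase in enzyme concentration corresponds to an approximately 7-fold gain in yield.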

Effects of the Second-Layer Mutations of ISO4-G1 PylRS for Site-Specific Incorporation of Bulky Lysine Derivatives into Proteins by Cell-Free Protein Synthesis
We found that the productivities of N11-GFPS1 proteins containing ZLys, mAzZLys, and TCO*Lys obtained with the ISO4-G1 PylRS (Y125A/M128L) system were comparable to or higher than those from the MaPylRS (Y126A/M129L/H227I/Y228P) system (Figure 6a) [26]. Previously, the additional second-layer IP mutations (H227I/Y228P) of MaPylRS (Y126A/M129L) extensively improved the protein productivities for mAzZLys and TCO*Lys [26]. However, mutations of the second-layer residues His227 and Tyr228 in MaPylRS (corresponding to the Ile405 and Pro406 residues in MmPylRSc, respectively; Figures 4, S1, and S5) might affect the first-layer residues, which interact directly with substrate amino acids within the amino acid binding pocket [26].

Figure 7. Cell-free protein synthesis with non-canonical amino acids using ISO4-G1 PylRS with the H225A mutation. The N11-GFPS1 proteins were synthesized with the S30 extracts from E. coli B-60∆A::Z/pMINOR cells in the presence of non-canonical amino acids. Non-canonical amino acids were site-specifically incorporated into the N11-GFPS1 protein at position 17 in response to the UAG codon, by using the ISO4-G1 PylRS·tRNAPyl and ISO4-G1 PylRS (H225A)·tRNAPyl pairs for BocLys and PocLys. The yields of the N11-GFPS1 proteins containing non-canonical amino acids were estimated by fluorescence. Protein productivities with non-canonical amino acids were compared with those of the cell-free synthesis of wild-type N11-GFPS1 protein containing Ala at position 17 (WT control) and are shown on the bars. The values represent the means of three independent experiments with standard deviations.

Discussion
In the present study, we determined the crystal structure of ISO4-G1 PylRS, and by its structure-based engineering, we achieved full productivity of cell-free protein synthesis according to the expanded genetic code with a variety of bulky non-canonical amino acids. By introducing two mutations into the first layer of the amino acid-binding pocket in ISO4-G1 PylRS, we achieved full productivity of cell-free synthesis with ZLys, TCO*Lys, mAzZLys, and pAzZLys. The first-layer mutant of ISO4-G1 PylRS required no additional second-layer mutations for the full productivity with these bulky non-canonical amino acids. Even with the much bulkier and most inefficient non-canonical amino acid, pEtZLys, we finally achieved the highest-ever levels of protein productivity by using the ISO4-G1 PylRS (Y125A/M128L) protein at a 7.5-fold higher concentration than the standard protocol. So far, this drastic improvement of protein productivity for pEtZLys has never been accomplished with the M. mazei and M. alvus systems.
Previously, we introduced the Y126A mutation of MaPylRS (corresponding to the Y306A mutation of MmPylRS), and the M129L or M129A mutation in the first-layer residues [26]. We found that simply transplanting the MaPylRS (Y126A/M129L or Y126A/M129A) mutations into ISO4-G1 PylRS was appropriate for bulky non-canonical amino acids. The two ISO4-G1 PylRS mutants (Y125A/M128L and Y125A/M128A) with enlarged amino acid binding pockets achieved full productivity and showed much higher activities than those of MmPylRS (Y306A/Y384F) for ZLys, mAzZLys, pAzZLys, and TCO*Lys (Figures 3a and 6a). However, the full productivity level has not yet been achieved for more difficult non-canonical amino acids, such as pEtZLys. Because ISO4-G1 PylRS, as well as MaPylRS, is highly water-soluble, ISO4-G1 PylRS mutants can be used in the cell-free reaction at much higher concentrations than that of the standard protocol. Consequently, the yield of the pEtZLys-incorporated protein reached 1.3 mg/mL per cell-free reaction (57% productivity level of the control protein synthesis) when the concentration of the ISO4-G1 PylRS (Y125A/M128L) protein was increased up to 75 µM (Figure 6c).
A higher catalytic activity of ISO4-G1 PylRS than that of MaPylRS in the cell-free system was observed for the site-specific incorporation of Nε-(2-(trimethylsilyl)ethoxycarbonyl)-L-lysine into proteins [28]. The molecular mechanism underlying the higher catalytic activity of ISO4-G1 PylRS relative to MaPylRS and MmPylRS remains unknown. Based on the crystal structure of ISO4-G1 PylRS (Figures 2, 4 and 5), the β5-β6 hairpin may exist in a dynamic open-closed equilibrium, and the location and conformational change of the His225 residue appear to be important for the catalytic activity. The ISO4-G1 PylRS His225 residue is conserved as His227 in MaPylRS, where it undergoes a drastic conformational change upon non-canonical amino acid (and AMPPNP) binding (Figures 4 and S5) [76]. However, in MaPylRS, His227 does not interact with Tyr206 and Trp241, in contrast to the interactions of His225 with Tyr204 and Trp237 in ISO4-G1 PylRS. Replacing His225 with Ala in ISO4-G1 PylRS abolished the protein productivities for non-canonical amino acids (Figure 7). In the case of ISO4-G1 PylRS, the H225A mutation might disrupt the interactions that His225 otherwise forms with Tyr204 and Trp237. In the above-mentioned dynamic open-closed equilibrium of the hairpin, the degree of movement of the hairpin in ISO4-G1 PylRS may be comparable to those in MaPylRS and MmPylRS with respect to the tip positions in the open and closed forms, although we still lack ISO4-G1 PylRS structures bound to amino acid substrates (Figures 2, 3, 4 and S5). The interactions of His225 with Tyr204 and Trp237 in ISO4-G1 PylRS (Figure 5), which are not observed in MaPylRS, appear to be a driving force for the rapid conformational changes of the β5-β6 hairpin.
The elucidation of the molecular mechanism underlying the higher catalytic activities of the ISO4-G1 PylRS mutants based on the PylRS structures will lead to the development of a next-generation platform for producing non-canonical amino acid-incorporated proteins.
In the present study, we demonstrated that the ISO4-G1 PylRS system extensively improved the protein productivities, even for the very difficult, non-canonical amino acid pEtZLys, which had not been achieved by the MmPylRS and MaPylRS systems. The ISO4-G1 PylRS·tRNAPyl pair, rationally engineered based on the ISO4-G1 PylRS crystal structures, will serve as a more useful tool for next-generation genetic code expansion technologies.

Expression and Purification of PylRS Proteins
The pET28c vector plasmid containing the ISO4-G1 PylRS gene was transformed into the E. coli BL21-Gold (DE3) strain, and transformants were selected on LB agar plates supplemented with 50 µg/mL kanamycin. A single colony was grown at 37 °C in broth culture, containing 15 g tryptone, 7.5 g yeast extract, and 15 g NaCl per liter, supplemented with 50 µg/mL kanamycin. Expression of the N-terminally hexahistidine-tagged ISO4-G1 PylRS protein was induced with 1 mM IPTG when the OD600 reached 0.6. The cultivation temperature was then lowered to 20 °C, and the culture was continued overnight. The E. coli cells were collected by centrifugation and stored at −80 °C. The cells were resuspended in 50 mM potassium phosphate buffer (pH 7.4), containing 500 mM NaCl, 25 mM imidazole, 5 mM β-mercaptoethanol, and protease inhibitor cocktail (Complete-EDTA free ULTRA, Roche, Basel, Switzerland) (buffer C), and were sonicated on ice. The cell lysate was centrifuged at 15,000× g for 15 min at 4 °C, and the supernatant was applied to a HisTrap column (Cytiva, Uppsala, Sweden), which was equilibrated with buffer C. The protein was eluted with buffer C containing 400 mM imidazole, instead of 25 mM imidazole, and peak fractions were collected. The protein fractions were pooled, concentrated, and applied to a HiLoad 16/60 Superdex 200 column (Cytiva, Uppsala, Sweden), equilibrated with 30 mM potassium phosphate buffer (pH 7.4), containing 200 mM NaCl and 1 mM DTT. The eluted fraction was collected and dialyzed against 40 mM potassium phosphate buffer (pH 7.4), containing 50 mM NaCl and 1 mM DTT (buffer B). The histidine-tag peptide derived from the pET28c vector was cleaved with thrombin protease (1 U per 0.1 mg PylRS protein, Sigma-Aldrich, St. Louis, MO, USA) at 4 °C overnight. The dialyzed fraction was then loaded on a HiTrap Q column (Cytiva, Uppsala, Sweden), and after washing the column with buffer B, the bound proteins were eluted by a linear gradient of 50-635 mM NaCl.
The eluted fractions were pooled, concentrated, and applied to a HiLoad 16/60 Superdex 200 column (Cytiva, Uppsala, Sweden), equilibrated with 30 mM potassium phosphate buffer (pH 7.4), containing 200 mM NaCl and 1 mM DTT. The eluted fractions were collected, dialyzed against 10 mM Tris-HCl buffer (pH 8.0), containing 150 mM NaCl, 10 mM MgCl2 and 10 mM β-mercaptoethanol, and concentrated by ultracentrifugation to 16.2 mg/mL. Aliquots of the ISO4-G1 PylRS protein were flash-cooled in liquid nitrogen and stored at −80 °C. The MaPylRS and MmPylRS proteins were purified as described previously [26]. The histidine-tagged PylRS proteins were purified by chromatography on HisTrap and Superdex 200 HiLoad columns. After dialysis, the eluted PylRS proteins were concentrated by ultracentrifugation.

Preparation of tRNA Pyl Transcripts
The tRNAPyl molecules from the methanogenic archaeon ISO4-G1, M. alvus, and M. mazei were transcribed in vitro with T7 RNA polymerase, using the PCR-amplified DNA fragment as the template. The tRNA transcripts were precipitated with isopropanol, applied onto a Resource Q column (Cytiva, Uppsala, Sweden) equilibrated with 10 mM Tris-HCl buffer (pH 7.5), containing 5 mM MgCl2 and 50 mM NaCl, and eluted by a linear gradient of 0.05-0.7 M NaCl. The purified tRNAPyl transcripts were precipitated with ethanol and dissolved in 10 mM Tris-HCl buffer (pH 7.5) containing 5 mM MgCl2.

Crystallization, Data Collection, and Structure Determination
All crystallization screenings were performed by the sitting-drop vapor-diffusion method, by mixing 0.2 µL of the ISO4-G1 PylRS protein solution with 0.2 µL of reservoir solution, using a Mosquito liquid handling robot (TTP Labtech, now SPT Labtech, Melbourn, Hertfordshire, UK). Crystals were grown at 20 °C in conditions with 100 mM HEPES-NaOH buffer (pH 7.2), 20% PEG3350, and 200 mM KCl. The crystal was transferred to 100 mM HEPES-NaOH buffer (pH 7.2) containing 20% PEG3350, 200 mM KCl, and 18% trehalose, mounted on a nylon loop, and flash-cooled in liquid nitrogen. The X-ray diffraction datasets were collected at the beamline BL32XU in SPring-8 (Harima, Japan) at −173 °C and were processed with XDS [83]. The crystal of ISO4-G1 PylRS belongs to the space group P2₁2₁2₁, with unit cell parameters of a = 98.51 Å, b = 102.68 Å, c = 349.86 Å, and α = β = γ = 90°. The phase was calculated by the molecular replacement method with Phaser, using 6JP2 as the search model. Ten ISO4-G1 PylRS molecules were found per asymmetric unit, with a solvent content of 56.6%. Iterative cycles of model refinement by PHENIX [84] and manual model building with Coot [85] were performed. The Rwork and Rfree factors for the ISO4-G1 PylRS structure are shown in Table S1. The final model was validated with Molprobity [78] and Procheck [77]. Graphical images were prepared with PyMOL (http://pymol.sourceforge.net/ (accessed on 28 May 2020)). The statistics of the data collection and refinement are summarized in Table S1.
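The reported solvent content can be roughly reproduced from the unit cell with the standard Matthews relation, in which the Matthews coefficient is the asymmetric-unit volume per dalton of protein and the solvent fraction is about 1 − 1.23/V_M. The sketch below is an approximation under stated assumptions: the molecular mass is estimated as 273 residues × 110 Da, a rule-of-thumb average that is not given in the text, so the result only approximates the 56.6% reported by the authors. The function name `matthews` is illustrative.

```python
def matthews(a: float, b: float, c: float,
             n_sym_ops: int, n_molecules: int, mass_da: float) -> tuple[float, float]:
    """Return (Matthews coefficient in A^3/Da, solvent fraction) for an orthorhombic cell."""
    cell_volume = a * b * c  # valid because alpha = beta = gamma = 90 degrees
    v_m = cell_volume / (n_sym_ops * n_molecules * mass_da)
    solvent_fraction = 1.0 - 1.23 / v_m  # standard Matthews approximation
    return v_m, solvent_fraction


# Assumption: about 110 Da per residue for the 273-residue protein.
approx_mass_da = 273 * 110

v_m, solvent = matthews(
    a=98.51, b=102.68, c=349.86,   # unit cell parameters from the text (in angstroms)
    n_sym_ops=4,                   # P2(1)2(1)2(1) has four symmetry operators
    n_molecules=10,                # ten molecules per asymmetric unit
    mass_da=approx_mass_da,
)
print(f"V_M = {v_m:.2f} A^3/Da, solvent content = {solvent:.1%}")
```

With this crude mass estimate the solvent content comes out near 58%, in line with the reported 56.6%; a slightly larger molecular mass than the rule-of-thumb value would bring the two figures closer.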

Patents
A PCT international patent application [WO2020/045656 A1] related to this work has been filed.