Biochemical and Structural Characterization of Chi-Class Glutathione Transferases: A Snapshot on the Glutathione Transferase Encoded by sll0067 Gene in the Cyanobacterium Synechocystis sp. Strain PCC 6803

Glutathione transferases (GSTs) constitute a widespread superfamily of enzymes notably involved in detoxification processes and/or in specialized metabolism. In the cyanobacterium Synechocsytis sp. PCC 6803, SynGSTC1, a chi-class GST (GSTC), is thought to participate in the detoxification process of methylglyoxal, a toxic by-product of cellular metabolism. A comparative genomic analysis showed that GSTCs were present in all orders of cyanobacteria with the exception of the basal order Gloeobacterales. These enzymes were also detected in some marine and freshwater noncyanobacterial bacteria, probably as a result of horizontal gene transfer events. GSTCs were shorter of about 30 residues compared to most cytosolic GSTs and had a well-conserved SRAS motif in the active site (10SRAS13 in SynGSTC1). The crystal structure of SynGSTC1 in complex with glutathione adopted the canonical GST fold with a very open active site because the α4 and α5 helices were exceptionally short. A transferred multipolar electron-density analysis allowed a fine description of the solved structure. Unexpectedly, Ser10 did not have an electrostatic influence on glutathione as usually observed in serinyl-GSTs. The S10A variant was only slightly less efficient than the wild-type and molecular dynamics simulations suggested that S10 was a stabilizer of the protein backbone rather than an anchor site for glutathione.


Introduction
Glutathione transferases (GSTs) constitute a widespread superfamily of enzymes playing crucial roles in the cell notably in detoxification processes and in specialized secondary metabolism by catalyzing three major kinds of reactions. These include catalytic reactions where glutathione (GSH) is consumed (GSH-conjugation), reactions where GSH is not consumed (isomerization and dehalogenation) and reactions where GSH is oxidized (thiol-transferase and reduction activities) [1]. At the structural level, canonical GSTs are mainly dimeric proteins, and each subunit adopts a conserved fold composed of an Nterminal thioredoxin (TRX) domain linked to an all α-helical C-terminal domain. The active site of the enzyme is located in a cleft at the interface between both domains and contains a GSH-binding site (G site) and a hydrophobic-substrate binding site (H site). Depending by site-directed mutagenesis using the QuikChange site-directed mutagenesis kit (Agilent Technologies) and specific mutagenic primers listed in Table S1. The sequences have been confirmed by DNA sequencing.
The expression of recombinant SynGSTC1 and variants were performed at 37 • C using E. coli Rosetta2 (DE3) pLysS expression strain (Novagen) transformed with appropriate plasmid in LB medium supplemented with kanamycin (50 µg/mL) and chloramphenicol (34 µg/mL). When the cell culture reached an OD 600 nm of 0.7-0.8, the expression of the Syn-GSTC1 (or S10T or S10A or S10C or R11A) recombinant protein was induced with 0.1 mM isopropyl β-D-1-thio-galactopyranoside (IPTG) for 4 h at 37 • C. Cells were then harvested by centrifugation, resuspended in a 30 mM Tris-HCl buffer (pH 8.0) supplemented with 200 mM NaCl (lysis buffer) and stored at −20 • C until use. After the lysis of the cells by sonication, the resulting cell extract was centrifuged at 40,000× g for 20 min at 4 • C to remove cellular debris and aggregated proteins. After the addition of 10 mM imidazole, SynGSTC1 was purified from the soluble extract by gravity-flow chromatography on a nickel nitrilotriacetate (Ni-NTA) agarose resin (Qiagen, Hilden Germany). After a washing step with lysis buffer containing 20 mM imidazole, recombinant SynGSTC1 was eluted using lysis buffer supplemented with 250 mM imidazole. The fractions of interest were pooled, concentrated by ultrafiltration, subjected to a size exclusion chromatography using a Superdex TM 200 16/600 column connected to an ÄKTA-Purifier TM device (Cytiva) and eluted with lysis buffer. The purified recombinant protein was concentrated and finally stored at −20 • C. The concentration of SynGSTC1 recombinant protein was determined at 280 nm using a theoretical molar absorption coefficient of 28,420 M −1 ·cm −1 .

Crystallization, X-ray Data Collection, Processing and Refinement
A first screening of 288 crystallization conditions was carried out at the CRM2 crystallogenesis platform (University of Lorraine) with an Oryx 8 crystallogenesis robot (Douglas Instruments Ltd, Hungerford, UK). Crystals were optimized manually at 4 • C by the microbatch-under-oil method. Solutions of SynGSTC1 ant the variants contained 30-40 mg·mL −1 protein in 30 mM Tris buffer (pH 8.0) supplemented with 200 mM NaCl, 1 mM Tris(2-carboxyethyl)phosphine (TCEP) and 10 mM glutathione. SynGSTC1 was crystallized by mixing 1 µL of protein with 1 µL of solution consisting of 16% (w/v) PEG 8000, 40 mM potassium phosphate monobasic and 20% (w/v) glycerol (condition no. 32, Wizard™ Classic Crystallization Screen III, Rigaku, Tokyo, Japan). Preliminary X-ray diffraction experiments were carried out in-house on an Agilent SuperNova diffractometer (Rigaku Oxford Diffraction) equipped with a CCD detector. Data collections were carried out at the ESRF, on beamline FIP BM07 (ESRF, Grenoble, France) and (PX1 and PX2, SOLEIL, Gif-Sur-Yvette, France). Data sets were indexed and integrated with XDS [21], and scaled and merged with Aimless [22] from the CCP4 suite [23]. The structure of SynGSTC1 was solved by molecular replacement using MoRDa [24] with the coordinates of a GST from Rhodobacter sphaeroides (PDB entry 3LSZ) as the search model. Structures of SynGSTC1 and its variants were refined with BUSTER [25] and manually improved with Coot [26]. The validation of all structures was performed with the PDB validation service (http://validate.wwpdb.org, accessed on 30 September 2022). The coordinates and structure factors have been deposited in the Protein Data Bank (PDB entries 8AI8, 8AI9, 8AIB). Crystal data, diffraction and refinement statistics are shown in Table 1.
CC 1/2 is the correlation coefficient of the mean intensities between two random half-sets of data. R work = ∑ hkl ||F obs | − |F calc ||/ ∑ hkl |F obs |. In total, 5% of reflections were selected for R free calculation. RMSZ: root mean square Z-score. The MolProbity clashscore is the number of serious clashes per 1000 atoms. Values in parentheses are for highest resolution shell.

Structure Analysis Based on Electron Density Distribution
To calculate the electrostatic interaction energies between residues of SynGSTC1 active site and the glutathione ligand, the electron charge density of the complex based on the Hansen and Coppens multipolar model [27] was determined (method detailed in Supplementary Materials). The electron density parameters for the SynGSTC1-GSH complex were transferred from the ELMAM2 database2, which provides parameters averaged over experimental peptide electron densities from ultra-high resolution X-ray scattering data [28]. In addition, polarization effects due to the environment were estimated in the transferred electron density using the procedure described recently by Leduc et al. [29] and implemented in MoProViewer software (version 0.1.1302) [30]. The electrostatic interaction energy (E elec tot ) between the glutathione ligand and the SynGSTC1 active site residues was computed using Charger, which is a fast and analytical electrostatic energy calculation tool [31] also implemented in MoProViewer. E elec tot includes two terms, the electrostatic interaction permanent energy E elec perm and the polarization contribution E elec pol (hence E elec tot = E elec perm + E elec pol ). The MoProViewer database transfer tool enables an automatic parameter transfer on the structure with appropriate formal charge assignment (+1e for arginine and lysine, −1e for aspartate and glutamate, 0 for others). The His38 and His61 of SynGSTC1 were protonated on the Nε atom and the formal charge of glutathione was set to −1e. The procedure is detailed in the Supplementary Materials.

Molecular Dynamics Simulation
The Molecular Dynamics simulations presented in this study were based on the crystallographic structure of SynGSTC1 in complex with GSH. This system was immersed in a cubic simulation cell of length equal to 73 Å filled by a solvent of 9881 water molecules with 150 mM NaCl. The simulations were performed using NAMD 3.0 [32] with the CHARMM36 [33] force field for proteins and the TIP3P water model [34]. The parameters for the GSH ligand were generated by the CHARMM general force field (CGenFF) [35]. Long-range electrostatic forces were evaluated using the particle mesh Ewald algorithm with a grid spacing of 1.0 Å. A smoothed 12.0 Å spherical cutoff was applied to truncate the short-range van der Waals and electrostatic interactions. The temperature was maintained at 300 K thanks to the Langevin thermostat and the pressure at 1 atm thanks to the Langevin piston method. Covalent bonds involving hydrogen atoms were restrained to their equilibrium length by the Rattle algorithm [36] and the water molecules were constrained to their equilibrium geometry using the Settle algorithm [37]. In addition, a mass-repartitioning scheme was used to integrate the equations of motion with a time step of 4 fs, according to Hopkins et al. [38]. A smooth equilibration, along which the positions of the heavy atoms of the protein were restrained harmonically, was carried out during 8 ns before a non-restrained long equilibration of 100 ns. Then, the SynGSTC1-GSH complex was probed in production runs including a long simulation of 500 ns and five independent shorter simulations of 100 ns. These trajectories were visualized and analyzed using VMD [39]. These simulations were aimed at exploring the stability of the interactions between the glutathione and the active site as well as the flexibility of the protein interdomain linker.
For all activity assays, the recombinant protein, used at a concentration (3 µM) within the linear response range of the enzyme, was added after 2 min of preincubation and the variation of absorbance monitored using a Cary 50 spectrophotometer. The activity recorded without enzymes was subtracted and three independent experiments were performed at each substrate concentration. The kinetic parameters, apparent K m (Michaelis constant) and ksoftware cat (turnover number) were determined by fitting the data to the nonlinear regression Michaelis-Menten model in GraphPad Prism (version 8, GraphPad Software, Inc., San Diego, CA, USA). The k cat values were expressed as µmol of substrate oxidized per second per µmol of enzyme (i.e., the turnover number in s −1 ) using specific molar absorption coefficients of 9600 M −1 ·cm −1 at 340 nm for CDNB, 9250 M −1 ·cm −1 at 274 nm for BITC, 8890 M −1 ·cm −1 at 274 nm for PITC, 17700 M −1 ·cm −1 at 412 nm for PNP-butyrate and 6220 M −1 ·cm −1 at 340 nm for NADPH.

Phylogenetic Analysis
In total, 222 proteomes of the Cyanobacteria/Melainabacteria group were retrieved from the RefSeq database of the NCBI. These corresponded to 208 proteomes of Cyanobacteria labelled as RefSeq "reference proteomes" or from type strains, and 14 proteomes from noncyanobacterial lineages (i.e., Margulisbacteria, Melainabacteria, Gastranaerophilales) classified in the Cyanobacteria/Melainabacteria group (Table S2). The sequences of the 53 ribosomal protein families (rprots) were retrieved from the 222 proteomes using the riboDB database [40] (Table S3). The corresponding protein sequences were aligned using MAFFT v7.453 with the accurate option L-INS-I [41]. The resulting multiple alignments were trimmed with BMGE v1.2 using the BLOSUM30 substitution matrix [42]. The multiple alignments of the 52 rprots present in more than 30% of the 222 analyzed proteomes were combined to build a large supermatrix (222 sequences, 6430 amino acid positions) and used to build a phylogeny using the maximum likelihood method. The tree was inferred with IQ-TREE (multicore version 2.2.0 COVID-edition, June 2022) with the LG + C20 + F + R4 evolutionary model [43]. The branch robustness of the inferred tree was computed with the ultrafast bootstrap procedure implemented in IQ-TREE (1000 replicates). The resulting tree was rooted using the 14 noncyanobacterial sequences.
The 222 studied proteomes were queried with BLASTP using the GSTC1 sequence from the Synechocystis sp. PCC 6803 strain (RefSeq protein Id WP_010873500.1, locus tag SGL_RS13850) as seed. The 924 GST sequences displaying an E-value lower than 10 −3 were retrieved and aligned using MAFFT with the auto option. A total of 54 partial sequences were discarded from the analysis. A survey of the nr database at the NCBI identified 11 sequences of GSTC in noncyanobacterial bacteria. These 11 sequences were added to the cyanobacterial GSTC sequences. The 881 GSTC sequences were realigned with MAFFT with the L-INS-I option and trimmed using BMGE with the BLOSUM30 substitution matrix. The 104 kept amino acid positions were used to infer a phylogeny using FastTree v2 [44] with 20 rate categories of sites, the gamma optimization option, and the Le and Gascuel model [45]. The branch robustness of the inferred tree was estimated using the Shimodaira Hasegawa test (resampling the site likelihoods 1000 times). Finally, a phylogenetic analysis of the 147 cyanobacterial GSTC sequences was performed using FastTree and the same parameters (147 sequences, 110 amino acid positions).

Crystal Structure of SynGSTC1
In this study, the crystal structure of the glutathione transferase chi1 from Synechocystis sp. PCC 6803 (SynGSTC1) in complex with GSH is presented. We also solved the structures of two variants (S10T and R11A variants in complex with GSH) which did not show significant differences from the wild-type. The protein samples were cocrystallized with an excess of GSH (10:1) in the presence of TCEP to avoid oxidation of the GSH thiol group into sulfenic acid. SynGSTC1 crystallized in space group P4 3 2 1 2 with two polypeptide chains in the asymmetric unit. They formed a two-fold dimer that had a globular shape with molecular dimensions of approximately 55 Å× 55 Å × 45 Å (Figure 1). The dimer buried 1710 Å 2 of surface area for each monomer and was tightly stabilized by ten hydrogen bonds and six salt bridges (Table S4). At the core, a four-helix bundle consisting of the α3 and α4 helices of the two monomers buried aliphatic residues (L70 and L94 of chains A and B).
This interaction pattern was complemented by a lock-and-key motif where the F49 residue fitted into a low-polar cavity of the adjacent subunit (W92, F95, L117, L121) ( Figure 1). group into sulfenic acid. SynGSTC1 crystallized in space group P43212 with two polypeptide chains in the asymmetric unit. They formed a two-fold dimer that had a globular shape with molecular dimensions of approximately 55 Å× 55 Å × 45 Å (Figure 1). The dimer buried 1710 Å 2 of surface area for each monomer and was tightly stabilized by ten hydrogen bonds and six salt bridges (Table S4). At the core, a four-helix bundle consisting of the α3 and α4 helices of the two monomers buried aliphatic residues (L70 and L94 of chains A and B). This interaction pattern was complemented by a lock-and-key motif where the F49 residue fitted into a low-polar cavity of the adjacent subunit (W92, F95, L117, L121) ( Figure 1). Both subunits were very similar structures and could be superimposed within 0.33 Å root-mean-square deviation over 181 α-carbon atoms. The SynGSTC1 protomer adopted the conserved GST fold that was subdivided into two domains for clarity (Nterminal domain β1α1β2α2β3β4α3 and C-terminal domain α4α5α6α7α8). As mentioned in the introduction, the chain length of GSTCs (approximately 180 residues) was significantly shorter by at least 20 residues compared to most canonical GSTs [47]. The α4-α5 hairpin pattern was significantly shortened (roughly 10 residues) and the angle between these two helices (~42°) was twice that usually observed ( Figure 2). This "missing" region made the active site of SynGSTC1 very open, with no clear pocket for the hydrophobic substrate (H-site). Both motifs I (47-71) and II (129-147) played their expected structural roles. In motif I, the V52-P53 peptide bond was cis, and V52 formed the typical antiparallel β-sheet-like interaction with the cysteine moiety of GSH. Motif II contained the Ncap sequence 137 SVVD 140 where the side chains of the serine and aspartic acid residues contributed to the stabilization of the α6 helix [19]. The linker ( 76 ASTIPAD 82 ) between the N-and C-terminal domains was peculiar because it had no aliphatic or aromatic residue wedged between these two domains as usually observed [48,49]. The consequence was an interdomain linker without a unique conformation. The quality of the electron density allowed Both subunits were very similar structures and could be superimposed within 0.33 Å root-mean-square deviation over 181 α-carbon atoms. The SynGSTC1 protomer adopted the conserved GST fold that was subdivided into two domains for clarity (N-terminal domain β1α1β2α2β3β4α3 and C-terminal domain α4α5α6α7α8). As mentioned in the introduction, the chain length of GSTCs (approximately 180 residues) was significantly shorter by at least 20 residues compared to most canonical GSTs [47]. The α4-α5 hairpin pattern was significantly shortened (roughly 10 residues) and the angle between these two helices (~42 • ) was twice that usually observed ( Figure 2). This "missing" region made the active site of SynGSTC1 very open, with no clear pocket for the hydrophobic substrate (H-site). Both motifs I (47-71) and II (129-147) played their expected structural roles. In motif I, the V52-P53 peptide bond was cis, and V52 formed the typical antiparallel β-sheetlike interaction with the cysteine moiety of GSH. Motif II contained the Ncap sequence 137 SVVD 140 where the side chains of the serine and aspartic acid residues contributed to the stabilization of the α6 helix [19]. The linker ( 76 ASTIPAD 82 ) between the N-and C-terminal domains was peculiar because it had no aliphatic or aromatic residue wedged between these two domains as usually observed [48,49]. The consequence was an interdomain linker without a unique conformation. The quality of the electron density allowed the building of two major conformations (Figures 1 and S1). To investigate this property, we performed molecular dynamics simulations of the SynGSTC1-GSH complex in an aqueous environment. The simulation revealed a protein very stable with the linker as one of the most mobile regions. The time-evolution of the ϕ and ψ torsion angles of the linker residues revealed transitions between two main conformations during the trajectory ( Figure S2). Interestingly, these two conformations corresponded to those observed in the crystal structure. the building of two major conformations (Figures 1 and S1). To investigate this property, we performed molecular dynamics simulations of the SynGSTC1-GSH complex in an aqueous environment. The simulation revealed a protein very stable with the linker as one of the most mobile regions. The time-evolution of the φ and ψ torsion angles of the linker residues revealed transitions between two main conformations during the trajectory ( Figure S2). Interestingly, these two conformations corresponded to those observed in the crystal structure.

Structural Comparison
A search for the structural homologs using the Dali server (http://ekhidna2.biocenter. helsinki.fi/dali/, accessed on 30 September 2022) ranked proteobacterial nu GSTs and fungal GSTs from the Ure2p class at the top of the list [50]. The other hits included proteobacterial beta GSTs, insect delta GSTs, an unclassified proteobacterial GST and plant phi GSTs. To better depict the proximities of these structures, an additional multiple structural alignment was performed using the mTM-align server (https://yanglab.nankai.edu.cn/mTM-align/, accessed on 30 September 2022) [51] (Figure 2). The resulting dendrogram based on the pairwise alignment scores ( Figure S3) showed a distribution of the proteins into two clades, one of which containing SynGSTC1 and the unclassified proteobacterial GST (GST SMc00097 from Sinorhizobium meliloti 2011, PDB entry 4nhw). The latter had therefore the most similar structure to SynGSTC1. SMc00097 had one of the structural attributes of SynGSTC1, namely a SRAS motif at the beginning of the α1 helix in its active site (see below) ( Figure 2). The first serine residue adopted the same orientation and did not participate in the stabilization of GSH while the arginine residue did ( Figure S4). The closeness between SynGSTC1 and SMc00097 could be explained in a more comprehensive way by a domain-by-domain comparison. Indeed, the overall structures (i.e., both the N-ter and C-ter domains) of SynGSTC1 and SMc0097 overlapped well (Table S5). The proximity of SynGSTC1 with other hits (nu, beta, delta and phi GSTs) was rather due to the good overlap of N-terminal domains.

Active Site Structure and Its Analysis Using Transferred Multipolar Electron-Density
The active site contained GSH tightly bound to the G-site by numerous polar interactions (respectively, six, two and three for the γ-Glu, Cys and Gly moieties) (Figure 3). The GSH Cys moiety adopted two rotamers (m, χ 1 = −52 • and t, χ 1 = 172 • ) exposing the GSH thiol group towards the solvent (Figure 1). The three regular rotamers (p, m, and t) of the glutathione cysteine moiety were accessible during the molecular dynamics simulations with the frequencies of 0.35, 0.42 and 0.16, respectively ( Figure S5). The crystal structure did not reveal a strong polar interaction between the sulfur atom of GSH and the enzyme. The smallest distance was 3.8 Å with the amide group of R11. The Y5 residue, recently proposed as a catalytic residue [20], was far too distant to stabilize the GSH-thiolate group during catalysis as the Y5 hydroxyl group was 17 Å away from the GSH sulfur-atom. Based on the sequence analysis of SynGSTC1, we could have thought that S10 played an important role in catalysis. Indeed, this serine residue belongs to the 10 SRAS 13 motif, which is related to the CXXC active-site motif of thioredoxin [52]. The equivalent serine residue in Ser-GSTs (see introduction) is almost always found hydrogen-bonded to the GSH thiol group while this is not the case in SynGSTC1 [53,54]. Indeed, the OG atom invariably retained the same orientation in all structures (wild-type and variants), and was hydrogen bonded to the main chains of A7 and A12. This interaction network remained stable throughout most of the molecular dynamics simulations showing that S10 was important for the stabilization of the protein backbone. Whatever its conformation, this serine residue never formed a strong interaction with the GSH thiol group during the simulation ( Figure S6). 022, 12, x FOR PEER REVIEW 10 of 20 remained stable throughout most of the molecular dynamics simulations showing that S10 was important for the stabilization of the protein backbone. Whatever its conformation, this serine residue never formed a strong interaction with the GSH thiol group during the simulation ( Figure S6). The description of the interactions between a ligand and a protein is most often summarized by the list of residues involved, without quantifying the importance of each. We developed recently a fast and analytical procedure to estimate the electrostatic contribution of each residue to the ligand binding, based on a continuous distribution of electron density of experimental origin [31]. This method implemented in MoProViewer [30] was applied on SynGSTC1 in complex with GSH where the contributions of eleven residues were evaluated (distance cutoff of 3.5 Å away from GSH). This included eight residues from one chain (S10, R11, L33, H38, K51, V52, E64, S65 and N97) and three from the other (S98, T99 and R116). MoproSuite calculates electrostatic interaction energies that are divided into two contributions: a permanent electrostatic interaction term, , and a polarization one, , which can be interpreted as a molecular recognition term and an adaptation term, respectively ( Figure 4, Table S6). By definition, the polarization term is negative and makes the total interaction energy more favorable for all the active site residues and especially for charged residues [29]. We performed the calculations for the two GSH thiol orientations observed in the crystal structure. The orientation of the thiol group did not affect notably the GSH binding, from an electrostatic and dipolar-induction point of view (Table S6). Thus, the following analysis did not depend on the GSH conformation. The description of the interactions between a ligand and a protein is most often summarized by the list of residues involved, without quantifying the importance of each. We developed recently a fast and analytical procedure to estimate the electrostatic contribution of each residue to the ligand binding, based on a continuous distribution of electron density of experimental origin [31]. This method implemented in MoProViewer [30] was applied on SynGSTC1 in complex with GSH where the contributions of eleven residues were evaluated (distance cutoff of 3.5 Å away from GSH). This included eight residues from one chain (S10, R11, L33, H38, K51, V52, E64, S65 and N97) and three from the other (S98, T99 and R116). MoproSuite calculates electrostatic interaction energies E elec tot that are divided into two contributions: a permanent electrostatic interaction term, E elec perm , and a polarization one, E elec pol , which can be interpreted as a molecular recognition term and an adaptation term, respectively ( Figure 4, Table S6). By definition, the polarization term is negative and makes the total interaction energy more favorable for all the active site residues and especially for charged residues [29]. We performed the calculations for the two GSH thiol orientations observed in the crystal structure. The orientation of the thiol group did not affect notably the GSH binding, from an electrostatic and dipolar-induction point of view (Table S6). Thus, the following analysis did not depend on the GSH conformation. and total electrostatic interaction energies between the glutathione ligand and twelve residues of the SynGSTC1 active site are presented in kcal/mol. is computed using the electron density model transferred on the glutathione and the protein atoms, whereas is obtained after the electron density polarization procedure. Finally, is computed using = − , and represents the polarization contribution to the total electrostatic interaction energy. The reported energy values have been averaged over the two conformations of the glutathione (A and B) observed in the crystal structure and over the two monomers. The GSH formula has been added to highlight the proximity of the residues to the GSH moieties. The residues marked with a star (*) in the figure are not from the same monomer as glutathione. The numerical values of these energies and the associated standard deviations are available in the Supplementary Materials.
The permanent interaction energy pictures the electrostatic complementarity between the GSH chemical groups and the residues lining the binding site. GSH was assumed to bear three charges: a zwitterionic γ-glutamic acid moiety and a terminal glycine carboxylate group. The SynGSTC1 residues with the largest contributions were R11, K51, R116, which formed salt bridges with the GSH negative charges (Figure 3, Table S6). As an example, the energy values , and obtained for R11 were −56.6 kcal·mol −1 , −47.3 kcal·mol −1 and −9.3 kcal·mol −1 , respectively. S65 showed the most favorable among the uncharged residues ( = −39.3 kcal · mol ), and its contribution was close to those of R11 and R116 when the dipolar induction is included ( = −54.3 kcal · mol ). This serine residue was double-hydrogen-bonded to the γ-Glu carboxylate group. This interaction pattern, well conserved in GSTs, is ensured either by a serine residue or a threonine residue [55]. The negatively charged E64 residue was an interesting case because it had an unfavorable (13.6 kcal · mol ) that underwent a significant dipolar induction ( = −19.7 kcal · mol ) to interact with the positively charged N-terminal amine group of GSH ( = −6.0 kcal · mol , Figure 4, Table S6).
The major contributors for the GSH γ-Glu moiety, speaking, were therefore R11 via its guanidium group and S65 via its amide and hydroxyl groups (Figure 3). This showed that the site where the zwitterionic fragment of GSH was located was an electrophilic site. This property is verified in the crystallographic structures of glutathione-free GSTs because they often contain a negative ion in this site such as chloride, acetate or formate ions [56]. In addition, this electrophilic site was found to be catalytically The permanent E elec perm , polarization E elec pol and total E elec tot electrostatic interaction energies between the glutathione ligand and twelve residues of the SynGSTC1 active site are presented in kcal/mol. E elec perm is computed using the electron density model transferred on the glutathione and the protein atoms, whereas E elec tot is obtained after the electron density polarization procedure. Finally, E elec pol is computed using E elec pol = E elec tot − E elec perm , and represents the polarization contribution to the total electrostatic interaction energy. The reported energy values have been averaged over the two conformations of the glutathione (A and B) observed in the crystal structure and over the two monomers. The GSH formula has been added to highlight the proximity of the residues to the GSH moieties. The residues marked with a star (*) in the figure are not from the same monomer as glutathione. The numerical values of these energies and the associated standard deviations are available in the Supplementary Materials.
The permanent interaction energy E elec perm pictures the electrostatic complementarity between the GSH chemical groups and the residues lining the binding site. GSH was assumed to bear three charges: a zwitterionic γ-glutamic acid moiety and a terminal glycine carboxylate group. The SynGSTC1 residues with the largest contributions were R11, K51, R116, which formed salt bridges with the GSH negative charges (Figure 3, Table S6). As an example, the energy values E elec tot , E elec perm and E elec pol obtained for R11 were −56.6 kcal·mol −1 , −47.3 kcal·mol −1 and −9.3 kcal·mol −1 , respectively. S65 showed the most favorable E elec perm among the uncharged residues (E elec perm = −39.3 kcal·mol −1 ), and its contribution was close to those of R11 and R116 when the dipolar induction is included (E elec tot = −54.3 kcal·mol −1 ). This serine residue was double-hydrogen-bonded to the γ-Glu carboxylate group. This interaction pattern, well conserved in GSTs, is ensured either by a serine residue or a threonine residue [55]. The negatively charged E64 residue was an interesting case because it had an unfavorable E elec perm (13.6 kcal·mol −1 ) that underwent a significant dipolar induction (E elec pol = −19.7 kcal·mol −1 ) to interact with the positively charged N-terminal amine group of GSH (E elec tot = −6.0 kcal·mol −1 , Figure 4, Table S6). The major contributors for the GSH γ-Glu moiety, E elec perm speaking, were therefore R11 via its guanidium group and S65 via its amide and hydroxyl groups (Figure 3). This showed that the site where the zwitterionic fragment of GSH was located was an electrophilic site. This property is verified in the crystallographic structures of glutathione-free GSTs because they often contain a negative ion in this site such as chloride, acetate or formate ions [56]. In addition, this electrophilic site was found to be catalytically important as it hosts the γ-Glu carboxylate group which is presumed to decrease the pKa of the GSH thiol group [57]. The glycine part of GSH was surrounded by the two positively charged K51 and R116 residues, and by the lateral chain of H38 residue. These residues tightly stabilized the GSH C-terminal carboxylate group (Figures 3 and 4). Finally, the GSH Cys part was strongly stabilized by a single residue (V52) via two main-chain-main-chain hydrogen bonds (Figure 3). This twofold contribution was significantly lower compared to that of S65 probably because the V52-GSH interaction did not involve charged groups. The S10 residue, which was assumed to be the catalytic residue interacting with the thiol group, presented an unfavorable electrostatic interaction energy and did not contribute to the GSH stabilization (E elec tot = 1.3 kcal·mol −1 , Table S6). It also showed an almost zero polarization energy so this residue was not affected by the binding of the glutathione. This correlated well with the fact that the crystal structure of SynGSTC1 revealed no intermolecular interaction between S10 and GSH. The side chain of the "main" tyrosine residue of TyrGSTs (Tyrosine type GSTs) was always observed interacting with the thiol group of GSH in the crystal structures. The "main" serine residue of SerGSTs plays the same role in most known structures. We evaluated the electrostatic contribution of residues to GSH binding in a TyrGST (and a SerGST) containing a putative hydrogen bond between the tyrosine (serine) residue and GSH (Table S6). In both cases, the important residue (tyrosine or serine) provided a stabilizing effect on the ligand (E elec tot = −12 kcal·mol −1 and E elec tot = −7.3 kcal·mol −1 , respectively, Table S6). However, this contribution was never predominant. The main anchor points remained the positively charged residues that stabilized the terminal carboxylate groups of GSH.
We also investigated the structure-activity relationships of SynGSTC1 by targeting the first two residues of the 10 SRAS 13 motif, S10 being suspected to activate glutathione as in Ser-GSTs and R11 because of its ubiquity in GSTCs (see below in Section 3.5). The kinetic constants and the effect of pH on activities toward PITC were determined for S10T, S10A, S10C and R11A variants ( Table 2 and Figure S7). The optimal pH of Syn-GSTC1 WT (7.4 units) was in the same range as usually observed for GSTs [57]. The substitution of S10 by a threonine residue slightly decreased the optimal pH of the enzyme (6.9 vs. 7.4 for WT) and the catalytic efficiency (k cat /K m ) of the protein (2.1 × 10 5 M −1 ·s −1 for S10T vs. 3.7 × 10 5 M −1 ·s −1 for WT). This result was consistent with the crystal structure of S10T which was superimposable to the wild-type ( Figure S8). The bulkier threonine side chain did not impair GSH binding. Indeed, the GSH apparent affinity (K m ) was not altered in the S10T variant (Table 2). Furthermore, a sequence analysis of GSTCs (see below in Section 3.5) showed either a serine or a threonine as the first residue of the active site motif ( 10 SRAS 13 in SynGSTC1). S10A remained effective even though the decrease was greater than for S10T, being divided by 10 and 1.2 in S10A and S10T, respectively, compared to WT. All the kinetic parameters were affected roughly similarly. The crystal structure of Syn-GSTC1 did not show interaction between S10 and GSH. Instead, S10 was rather involved in stabilizing the β1-α1 loop in the close vicinity of the G-site (see above in Section 3.3) suggesting that the S10A substitution most likely disrupted the integrity of the active site. This resulted in a degradation of the catalytic constants and a moderate increase of the catalysis optimal pH (shift of 0.4 unit compared to WT). The R11A substitution also did not fully abolish the activity of the enzyme even though it decreased significantly (divided by a factor close to 250 as compared to WT). R11 formed a salt bridge with the N-terminal carboxylate group of GSH in the crystal structure (see above in Section 3.3). This interaction did not seem to be essential for the catalysis because the GSH apparent affinity in R11A was not much more degraded than in S10A (four and five times higher in S10A and R11A variants, respectively, compared to WT). The electrostatic influence of R11 on the catalytic process was, however, clear since the catalytic rate was 75 times lower in R11A compared to WT. This was accompanied by a significant one-unit increase in optimal pH suggesting a higher GSH-thiol pKa in the R11A variant than in the WT enzyme. These variations appeared small compared to those observed in eta GSTH1-1 from Agrobacterium tumefaciens, which harbored an arginine residue at the same position as in SynGSTC1. Indeed, the R34A mutation in AtuGSTH1-1 had a detrimental effect on the catalytic constant, which dropped by at least a factor of 5000 [58]. Finally, the substitution of S10 by a cysteine residue, had the same global effect as the S10A mutation (2.9 × 10 4 M −1 ·s −1 for S10C vs. 3.6 × 10 4 for S10A). Unlike the WT protein and other variants, the S10C enzyme was also active (k cat /K m of 3.36 × 10 3 ± 0.08 × 10 3 M −1 ·s −1 ) with HED, a substrate commonly used to characterize Grxs and cysteinyl-GSTs, indicating that this variant acquired a significant thiol-transferase activity. WT 3.73 × 10 5 ± 0.06 × 10 5 ND S10T 2.14 × 10 5 ± 0.04 × 10 5 ND S10A 2.89 × 10 4 ± 0.06 × 10 4 ND S10C 3.60 × 10 4 ± 0.06 × 10 4 3.36 × 10 3 ± 0.08 × 10 3 R11A 1.53 × 10 3 ± 0.02 × 10 3 ND The apparent K m values of SynGSTC1 wild-type and variants (S10T, S10A, S10C and R11A) were determined by varying substrate concentrations at a fixed saturating GSH concentration. The apparent K m and k cat values were calculated with Prism 8 software using the Michaelis-Menten equation as nonlinear regression model. Results are means ± S.D. (n = 3).

Comparative Genomic Analysis
A similarity-based survey of 222 reference proteomes of the Cyanobacteria/Melainabacteria group led to the identification of 870 full-length GSTC1 homologues (BLASTP E-value cutoff 10 −3 ). The phylogenetic analysis of these sequences led to a large tree ( Figure S9). According to this tree, the glutathione transferase chi1 from Synechocystis sp. PCC 6803 (SynGSTC1) belonged to a large group of 147 sequences displaying a SRAS motif or related motifs ( Figures S9 and S10 and Table S8). These 147 GSTC protein sequences displayed more than 35% of sequence identity and were largely distributed in Cyanobacteria, being present in 144 of the 208 analyzed cyanobacterial proteomes ( Figures 5 and S11 for high-quality version). In contrast, they were absent in the noncyanobacterial members of the Cyanobacteria/Melainabacteria group. More precisely, they were present in all cyanobacterial orders excepted Gloeobacterales, the oldest branching extant group of cyanobacteria [59]. 6803 (SynGSTC1) belonged to a large group of 147 sequences displaying a SRAS motif or related motifs ( Figures S9 and S10 and Table S8). These 147 GSTC protein sequences displayed more than 35% of sequence identity and were largely distributed in Cyanobacteria, being present in 144 of the 208 analyzed cyanobacterial proteomes ( Figure 5 and Figure  S11 for high-quality version). In contrast, they were absent in the noncyanobacterial members of the Cyanobacteria/Melainabacteria group. More precisely, they were present in all cyanobacterial orders excepted Gloeobacterales, the oldest branching extant group of cyanobacteria [59]. Figure 5. Phylogeny of the 222 proteomes of Cyanobacteria/Melainabacteria group considered in this study. The tree was inferred with IQ-TREE using the 52 rprots sequences present in more than 70% of the 222 proteomes (6430 amino acid sites, LG + C20 + F + R4 evolutionary model). The scale bar corresponds to the average number of substitutions per site. Gray circles correspond to ultrafast bootstrap values >90% (1000 replicates). The taxonomy of each proteome is indicated: Gloeobacterales (brown), Synechococcales (orange), Pseudanabaenales (pink), Gloeomargaritales (dark blue), Thermostichales (black), Oscillatoriales (light green), Chroococcales (yellow), Pleurocapsales (purple), Chroococcidiopsidales (dark green), Nostocales (light blue), and unclassified (gray). The 122 GSTC protein sequences harboring the SRAS motif are indicated with filled triangles, while the 25 GSTC sequences harboring variants of the SRAS motif are indicated with empty triangles. All the Figure 5. Phylogeny of the 222 proteomes of Cyanobacteria/Melainabacteria group considered in this study. The tree was inferred with IQ-TREE using the 52 rprots sequences present in more than 70% of the 222 proteomes (6430 amino acid sites, LG + C20 + F + R4 evolutionary model). The scale bar corresponds to the average number of substitutions per site. Gray circles correspond to ultrafast bootstrap values >90% (1000 replicates). The taxonomy of each proteome is indicated: Gloeobacterales (brown), Synechococcales (orange), Pseudanabaenales (pink), Gloeomargaritales (dark blue), Thermostichales (black), Oscillatoriales (light green), Chroococcales (yellow), Pleurocapsales (purple), Chroococcidiopsidales (dark green), Nostocales (light blue), and unclassified (gray). The 122 GSTC protein sequences harboring the SRAS motif are indicated with filled triangles, while the 25 GSTC sequences harboring variants of the SRAS motif are indicated with empty triangles. All the motifs are described in the Supplementary Table S8. The phylogeny of these 147 GSTC sequences is shown as Figure S10. A high-quality pdf version of the tree is provided as Figure S11 in the online Supplementary Materials.
To go further, we inferred the phylogeny of the 147 sequences displaying the SRAS motif (or related motifs) ( Table S9). As expected, due to the restricted number of amino acid positions retained after the alignment trimming, branch supports were overall low ( Figure S10). Despite this global lack of support, the resulting tree showed clearly that sequences harboring the SRAS motif and sequences harboring related motifs were mixed on the tree, indicating that the canonical SRAS motif was lost several times independently during the diversification of GSTCs. Furthermore, the topology of the tree also showed some inconsistencies with the phylogeny of species (Figures 5 and S10). For instance, some Chroococcales sequences emerged within Nostocales ( Figure S10), indicating that the evolutionary history of GSTCs harboring the SRAS (or related motifs) was impacted by horizontal gene transfers (HGTs) ( Figure S10). Interestingly, these HGTs also contributed to spread GSTC1 outside of Cyanobacteria, since homologues were found in a few noncyanobacterial bacteria ( Figure S9). Most of them were marine and freshwater bacteria and some were recently closely related to cyanobacteria as Planctomycetaceae bacterium TMED241. Indeed, it was found that this bacterium contains a circadian clock kaiABC operon, which is typically found in cyanobacteria [60].
The majority of the 147 GSTC sequences had a length of less than 190 amino acids (Table S10). A dozen had longer sequences because they contained extensions at the N-terminus and/or between the secondary structures. All GSTCs had a reduced C-ter domain with a shortening of helices α4 and α5 as observed in the crystal structure of SynGSTC1. The SRAS motif ( 10 SRAS 13 in SynGSTC1) was well conserved. The arginine residue was invariant, the first position was replaced in a few cases by a threonine residue and the last two positions were a bit more variable. Surprisingly other residues involved in the structural attributes of SynGSTC1 were not conserved such as the patch of leucine residues (L70 and L94) in the core of the dimer, or the key residue of the lock and key motif (F49), or quaternary contributors to the stabilization of GSH (S98, T99, R116) ( Figure 2). The sequence alignment revealed the conservation of 13 residues, most of which were located in the N-terminal domain (eight residues) and more precisely in the domain I (six residues) (Tables S8 and S9). The N-ter domain is generally better conserved than the C-ter domain because it contains an extended part of the active site [2]. In one subunit, the set of conserved residues was not centered on the active site but rather on the center of gravity of the monomer. The residues were distributed almost homogeneously around this center and most of them were located at a distance of less than 10 Å from it ( Figure S12). This distribution was consistent with what is usually observed in proteins, namely that the most conserved positions tend to be situated in the core of the protein or on functional surfaces [61]. While the structural role of these conserved residues is obvious, it is difficult to identify those that form the signature of GSTs chi and most of them have be shown conserved in a class of GSTs. Only N97 seemed specific to the GST chi class; it most likely contributed electrostatically to the active site, as it was located near the γ-Glu moiety of GSH and close to the guanidinium group of the SRAS motif (Table S9).

Conclusions
This study increased the knowledge on the biochemical characteristic acquired on the chi class of GSTs (GSTCs) and detailed for the first time the structural attributes of this GST class, specific to cyanobacteria. These short-sequence GSTs (~180 aa) had a three-dimensional structure with a very open active site because the α4 and α5 helices were significantly shorter than those usually observed. The glutathione substrate was tightly bound to the enzyme with its reactive center exposed to the solvent. The transfer of multipolar density parameters from small peptides to SynGSTC1 permitted the gradation of residues involved in GSH stabilization. The two carboxylate groups of GSH were the two chemical groups that best adhered to the protein.
GSTCs contained a SRAS conserved motif at the N-terminus of the α1 helix indicating that they belonged to the SerGST group because the first residue of the motif was a serine residue. However, this serine residue was not directly involved in the catalytic act as assumed in SerGSTs [9]. The SRAS motif appeared to constrain the conformation of the serine side chain towards the interior of the protein and not towards the thiol group of GSH. S10 (in SynGSTC1) had a weak and unfavorable electrostatic influence on GSH and its mutation did not drastically alter the catalytic properties of the enzyme. The denomination TyrGST, CysGST, SerGST and AtyGST (tyrosine type GST, . . . , Atypical GST) has the advantage of simplifying the confusing and cumbersome Greek letter classification. It is relevant in the case of TyrGSTs from a phylogenetic point of view [1]. It is also appropriate in the case of CysGSTs because the cysteine residue is covalently bound to the substrate in one step of the catalytic mechanism [62]. The disadvantage of the residue-based naming is its stigmatization on one residue that may not have a strong link to the activity of the enzyme as is the case for SynGSTC1.
Supplementary Materials: The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/biom12101466/s1; Table S1. List of the PCR and mutagenic primers used in this study; Table S2. List of the 222 studied proteomes; Table S3. List of the 53 ribosomal protein families present in bacteria according to riboDB database; Table S4. Strong intersubunit contacts in SynGSTC1; Table S5. Comparison of SynGSTC1 with its structural homologs; Table S6. Permanent, polarization and total electrostatic interaction energies in the active sites of SynGSTC1, of Epsilon 2 GST from Anopheles Gambiae (AgGSTE2) and of Alpha 1 GST from chicken (GgGSTA1); Table S7. Kinetic parameters of SynGSTC1 toward model substrates; Table S8. List of the 147 full-length protein sequences displaying a SRAS motif (or a related motif) identified in the 222 proteomes of Cyanobacteria/Melainabacteria group; Table S9. Invariant amino acid residues in the GST Chi Class; Table S10. Multiple sequence alignment of the 147 GSTCs (displaying a SRAS motif or a related motif) identified in the 222 proteomes of Cyanobacteria/Melainabacteria group; Figure  S1. Stereoviews of the 2mFo-DFc map of the SynGSTC1 inter-domain linker; Figure S2. Φ and Ψ torsion angles for the inter-domain linker residues in SynGSTC1 during the simulation; Figure S3. Structure-based phylogenetic tree of SynGSTC1 with structural homologs; Figure S4. Stereoview of the comparison of the SRAS motif in SynGSTC1 and in GST SMc00097 from Sinorhizobium meliloti 2011; Figure S5. N-Cα-Cβ-Sγ torsion angle of glutathione during molecular dynamics simulation of SynGSTC1; Figure S6. Interatomic distances (Å) between γ-oxygen atom of Ser10 and selected atoms during molecular dynamics simulation of SynGSTC1; Figure S7. Optimal reaction pH of SynGSTC1 and variants S10T, S10A, S10C and R11A; Figure S8. Structural comparison of the active site of SynGSTC1 WT with S10T and R11A variants; Figure S9. Phylogeny of the 870 GST sequences identified in the 222 studied proteomes of the Cyanobacteria / Melainabacteria group and the 11 GSTC-related sequences identified in noncyanobacterial bacteria; Figure S10. Phylogeny of the 147 cyanobacterial GST1 sequences harboring the SRAS motif or related motifs; Figure S11. Phylogeny of the 222 proteomes of Cyanobacteria/Melainabacteria group considered in this study (high quality version of the tree provided in Figure 5); Figure S12. Stereoview of the invariant amino acid residues in the GST Chi Class and WebLogos of aligned GSTCs from cyanobacteria. References [63][64][65][66][67][68]  Funding: This work was supported by a grant from "Agence Nationale pour la Recherche" as part of the "Investissements d'Avenir" program (ANR-11-LABX-0002-01 and ANR-17-CE20-0008-01). This work was supported by the "Institut Jean Barriol".

Institutional Review Board Statement: Not relevant for this study.
Informed Consent Statement: Not relevant for this study.