Hidden Glutathione Transferases in the Human Genome

With the development of accurate protein structure prediction algorithms, artificial intelligence (AI) has emerged as a powerful tool in the field of structural biology. AI-based algorithms have been used to analyze large amounts of protein sequence data including the human proteome, complementing experimental structure data found in resources such as the Protein Data Bank. The EBI AlphaFold Protein Structure Database (for example) contains over 230 million structures. In this study, these data have been analyzed to find all human proteins containing (or predicted to contain) the cytosolic glutathione transferase (cGST) fold. A total of 39 proteins were found, including the alpha-, mu-, pi-, sigma-, zeta- and omega-class GSTs, intracellular chloride channels, metaxins, multisynthetase complex components, elongation factor 1 complex components and others. Three broad themes emerge: cGST domains as enzymes, as chloride ion channels and as protein–protein interaction mediators. As the majority of cGSTs are dimers, the AI-based structure prediction algorithm AlphaFold-multimer was used to predict structures of all pairwise combinations of these cGST domains. Potential homo- and heterodimers are described. Experimental biochemical and structure data is used to highlight the strengths and limitations of AI-predicted structures.


Introduction
Since glutathione transferase (GST) activity was discovered in rat liver and was postulated to play a role in drug detoxification [1], decades of research has led to the identification and isolation of multiple classes of GST from bacteria to man with a considerable array of catalytic and binding activities. In 1991, the first glutathione transferase structure was determined [2,3], the pi-class isozyme from pig in complex with glutathione-sulfonic acid (PDB ID 2GSR). Several features of that structure are now known to be typical of the cytosolic GSTs (cGSTs): it is a dimer, and each monomer contains an N-terminal domain (NDT) (having the thioredoxin-like βαβαββα topology) and a unique C-terminal domain (CTD) composed of α-helixes. Many crystal structures of cGSTs have revealed the binding location of glutathione (GSH) in the N-terminal domain (the "G-site") and the adjacent binding site for (often hydrophobic) co-substrates (the "H-site"). While hundreds of GST structures have been reported in organisms ranging from bacteria to man, the set of human proteins known to adopt the cGST fold can be regarded as incomplete. However, recent advances in protein structure prediction provide the tools to discover and analyze these "hidden" GSTs. Not considered here are the microsomal GSTs, which are trimeric, integral membrane proteins, and the mitochondrial kappa-class GST that form a distinct family of thioredoxin-fold-containing proteins [4].
With the availability of deep learning algorithms such as AlphaFold [5] and RoseTTAfold [6], we now have tools to make reliable predictions of protein structures. Briefly, these artificial intelligence (AI) systems use a neural network to extract the relationship between a protein's sequence and its 3D structure based on existing experimental data. Employed for the bulk of this  Slightly less than half of the proteins identified fall into previously identified classes of GST, including the alpha, mu, pi, theta, zeta and omega classes (18 proteins). The alpha-, mu-and pi-class GSTs are common in mammalian genomes and are well known for their role in phase II detoxification. They form a distinct clade in the phylogram ( Figure S2). Most characterized reactions involve GSH-conjugating activity with electrophiles. Some GSTs have additional roles. For example, the human pi-class GST modulates the activities of the mitogen-activated protein kinase (MAPK) signaling pathway via direct interactions with apoptosis signal-regulating kinase (ASK1) [12] and c-Jun N-terminal kinase 1 (JNK1) [13].
Prostaglandin E Synthase 2 (PTGES2), also known as microsomal prostaglandin E synthase type 2, was found. It catalyzes conversion of PGH 2 to prostaglandin E 2 (PGE 2 ) [15]. In addition to cGST-like motifs, this protein contains an insertion between the N-and Cterminal domains that results in an unusual dimerization interaction.
Three proteins classified as metaxins (MTX1, MTX2 and MTX3), which are associated with mitochondrial import and trafficking [16], were found.
Six intracellular chloride channels (CLIC1 to CLIC6) were identified. The CLICs pose an interesting challenge for structure prediction algorithms. While most experimental CLIC protein structures are characterized by the cGST fold, CLICs have both soluble and integral membrane forms [17]. Furthermore, CLIC1 has been shown to adopt two distinct soluble conformations [18].
The remaining hits are relatively poorly characterized proteins: ganglioside-induced differentiation-associated protein 1 (GADP1), ganglioside-induced differentiation-associated protein-1-like 1 (GADP1L1), failed axon connections homolog (FAXC) and glutathione S-transferase C-terminal domain-containing protein (GSTCD). RMSD-based comparisons show that the predicted GST structures are very similar to high-resolution crystal structures of human wild-type proteins (where they are available) ( Table 2). In general, the greatest deviations occur in surface loops or near the termini. In general, these regions are poorly ordered in crystal structures and show lower pLDDT scores in predictions ( Figure S1). Specific cases will be detailed below.  1 RMSD values calculated in ChimeraX. The number of Cα atoms used for comparison is provided in parentheses. The first column gives the pruned set of residues that provide the best fit. The second provides RMSD over all matching residues. 2 T210S mutant. 3 C80S, C121S, C136S, C140S, C170S, C214S mutant. 4 From Macaca fascicularis. 5 Oxidized form. 6 S156D mutant.

Prediction of GST Homo-and HeteroDimers
AlphFold predicted homodimers of all human alpha-, mu-, pi-, theta-and omega-class GSTs and HPGDS are consistent with crystallographic results obtained to date. For example, the homodimers GSTA1-1, GSTM1-1 and GSTP1-1 superimpose with RMSD values of 0.64 Å (over 422 Cα atoms), 0.58 Å (over 434 Cα atoms) and 0.35 Å (416 Cα atoms), respectively (Figure 2a-c). Of note is PTGES2, which has an unusual mode of dimerization thanks to a 46-residue insertion between the N-and C-terminal domains ( Figure 1). Nevertheless, AlphaFold predicted the structure of this insertion and the dimerization interaction correctly (RMSD 1.92 Å over 548 Cα atoms) (Figure 2d). These RMSD values are similar to those obtained when comparing crystal structures of the same proteins.
interaction correctly (RMSD 1.92 Å over 548 Cα atoms) (Figure 2d). These RMSD valu are similar to those obtained when comparing crystal structures of the same proteins. There are relatively few biochemical studies of GST heterodimers. Within the alpha, mu-and theta-class GSTs, all isoforms are predicted, based on model confidence scores, to form heterodimers (Table S1). Data concerning the human alpha-class GSTs provide valuable context for the interpretation of predicted heterodimer models. Within the human alpha-class GSTs, the A4 isozyme, a phylogenetic outlier ( Figure S2), has residues in its interface that distinguish it from the other alpha-class isozymes. Co-expression of human GSTA4 and GSTA1 in E. coli shows that, while each subunit prefers to form homodimers, it is possible to form the GSTA1-4 heterodimer. While both subunits were active with substrates CDNB, HNE or ∆ 5 AD in the heterodimer, the specific activities and k cat values were lower than the average values of the parent isozymes [22]. By contrast, the catalytic efficiencies (k cat /K M ) of various heterodimers of rat alpha-class GSTs toward CDNB were predictable from the activities of their corresponding homodimers [23]. The model confidence for the predicted GSTA1-4 heterodimer structure is the same as for the GSTA1-1 homodimer (0.94). In fact, the model confidence for the GSTA1-4 heterodimer is higher than that for the GSTA4-4 homodimer (0.92). These data show that that model confidence scores should not be used as a proxy for multimerization preference or activity. While heterodimer formation is possible, and structures of heterodimers predicted with high confidence, they may not be preferred in vitro or in vivo.
An important aspect of structure prediction is the fidelity with which the algorithm reproduces the conformation of binding-site residues. Comparison of the active site of predicted and crystal structures of GSTA1-1 indicates that some side-chains show small differences in conformation. Of 14 residues in the active site, different rotamers were assigned to three residues in the model versus the crystal structure ( Figure 2e). Residue R13 is assigned the mtt-180 • rotamer in the model but adopts the mtt-85 • rotamer in the crystal structure. Residue Q54 is assigned the pt-20 • rotamer in the model but adopts the mt-30 • conformation in the crystal structures. Finally, residue M208 is assigned the mtm rotamer in the model but adopts the ttp rotamer in the crystal structure. The consequence of these differences is that the placement of side-chains is similar but not identical. It is important to note that, while AlphaFold does not build ligands into predicted structures, the effects of ligand binding nonetheless may be imprinted on the structures on which the AI was trained. Thus, while the C-terminal helix (α9) is disordered in the unliganded structures (e.g., PDB 1GSD), it is nonetheless present in the predicted structures in a conformation observed in ligand-bound GSTA1-1 structures.
Few crystal structures of cGST domain-based heterodimers have been deposited in the PDB. However, published examples offer the opportunity to assess the quality of AlphaFold predictions of such assemblies. Comparison of the human GSTM2-M3 crystal structure solved at 2.8 Å resolution (PDB 3GTU) with the predicted structure yielded a RMSD of 0.69 Å over 434 Cα atoms. Comparison of the 2.6 Å crystal structure of the heterodimer of GST-like domains from EPRS1 and AIMP2 (PDB 5A34) with the AlphaFold prediction yielded an RMSD of 0.71 Å over 349 Cα atoms. The crystal structure of the GST-like domains of MARS1 and EEF1E1 (PDB 4BL7) with the predicted structure gave an RMSD of 0.80 Å over 366 Cα atoms.
A remarkable result is the prediction, with high confidence, of heterodimers of alpha-, mu-, pi-and theta-class-as well as other classes of-GST. (Selected heterodimers are shown in Figure 3.) While heterodimers within classes of GST are well known, there are few reports of heterodimers between classes of GST. Heterodimers of mu-and pi-class GSTs have been observed by incubation of rat GSTM2 and pig GSTP1 enzymes in phosphate buffer at 4 • C for 24 h [24]. The same researchers were unable to detect heterodimers of rat GSTs A1, A2 or A3 with the pi-class GST. However, the dissociation kinetics under experimental conditions may have been too slow to allow detection. Biomolecules 2023, 13, x 9 of 20

Metaxins and FAXC
Metaxin (from the Greek µεταξύ; "between") was first identified in mice as essential for embryonic development [25] and later shown to be a mitochondrial outer membrane component [26]. In humans, three metaxins have been described (MTX1, MTX2 and MTX3). These proteins have been identified as components of the mitochondrial intermembrane space bridging (MIB) complex that includes SAM50 (also a component of the Sorting and Assembly Machinery or SAM complex), DNAJC11 and several other components [27]. Mandibuloacral dysplasia associated with MTX2 (MADaM) is a sever condition with symptoms including growth retardation and arises from homozygous null mutations in the human MTX2 gene [28].
The GST-like domains of the metaxins are predicted to incorporate helices inserted between the canonical α4 and α5 helices cover the "G-site", which in these models is blocked and would be unable to bind GSH ( Figure 4). In line with earlier predictions that MTX1 has a transmembrane domain near its C-terminus ( Figure 1; residues 421 to 443) that is predicted to play a role in apoptosis [29], a hydrophobic helix is predicted for this region. The pattern of pairwise dimer predictions involving MTX2 and its close relatives suggests that MTX2 acts as a key interaction domain. The structures with the highest MC scores are MTX2 with MTX1, MTX3 and FAXC. Predicted heterodimers of MTX1-MTX2 and FAXC-MTX2 are shown in Figure 4b,c, respectively. Curiously, the MTX2 homodimer appears to be disfavored (MC = 0.28) and is predicted to form heterodimers with MTX1, MTX3 and FAXC, but not to form homodimers (Table S1).
There is little experimental or clinical data concerning FAXC ("Failed Axon Connections Homolog"). Transcriptomics reveal high levels of expression in the brain [30]. One report describes patients with developmental delay due to a 6q16.1 deletion that included the FAXC gene [31]. The human protein with the highest sequence similarity to FAXC is MTX2, with which FAXC is predicted to form a homodimer (MC = 0.90). Like MTX1 and MTX3, AlphaFold predictions of FAXC with other GST domains produced low ipTM scores with the exception of MTX2 (MC = 0.90). These data and phylogenetic analysis (Figure S2) suggest that FAXC is an outlying member of the metaxin family and, like MTX1 and MTX3, may form a heterodimer with MTX2.

Metaxins and FAXC
Metaxin (from the Greek µεταξύ; "between") was first identified in mice as essential for embryonic development [25] and later shown to be a mitochondrial outer membrane component [26]. In humans, three metaxins have been described (MTX1, MTX2 and MTX3). These proteins have been identified as components of the mitochondrial intermembrane space bridging (MIB) complex that includes SAM50 (also a component of the Sorting and Assembly Machinery or SAM complex), DNAJC11 and several other components [27]. Mandibuloacral dysplasia associated with MTX2 (MADaM) is a sever condition with symptoms including growth retardation and arises from homozygous null mutations in the human MTX2 gene [28].
The GST-like domains of the metaxins are predicted to incorporate helices inserted between the canonical α4 and α5 helices cover the "G-site", which in these models is blocked and would be unable to bind GSH ( Figure 4). In line with earlier predictions that MTX1 has a transmembrane domain near its C-terminus ( Figure 1; residues 421 to 443) that is predicted to play a role in apoptosis [29], a hydrophobic helix is predicted for this region. The pattern of pairwise dimer predictions involving MTX2 and its close relatives suggests that MTX2 acts as a key interaction domain. The structures with the highest MC scores are MTX2 with MTX1, MTX3 and FAXC. Predicted heterodimers of MTX1-MTX2 and FAXC-MTX2 are shown in Figure 4b,c, respectively. Curiously, the MTX2 homodimer appears to be disfavored (MC = 0.28) and is predicted to form heterodimers with MTX1, MTX3 and FAXC, but not to form homodimers (Table S1).
There is little experimental or clinical data concerning FAXC ("Failed Axon Connections Homolog"). Transcriptomics reveal high levels of expression in the brain [30]. One report describes patients with developmental delay due to a 6q16.1 deletion that included the FAXC gene [31]. The human protein with the highest sequence similarity to FAXC is MTX2, with which FAXC is predicted to form a homodimer (MC = 0.90). Like MTX1 and MTX3, AlphaFold predictions of FAXC with other GST domains produced low ipTM scores with the exception of MTX2 (MC = 0.90). These data and phylogenetic analysis ( Figure S2) suggest that FAXC is an outlying member of the metaxin family and, like MTX1 and MTX3, may form a heterodimer with MTX2.

GDAP1 and GDAP1L1
Ganglioside-Induced Differentiation-Associated Protein 1 (GDAP1) is a mitocho drial outer membrane protein involved in mitochondrial fission. GDAP1 mutations a associated with the autosomal recessive neurological disorder Charcot-Marie-Tooth d ease type 4A. Mutations in GDAP1 that are associated with disease (mostly missense m tations) impede mitochondrial dynamics. Crystal structures of the mouse [32] and hum [33] GDAP1 homologs confirmed the presence of the cGST fold. Efforts to detect the GS conjugating activity of GDAP have yielded mixed results, with some groups detecting activity [32,34], and others detecting GSH-conjugating activity with EA, p-nitrobe zylchloride and EPNP [35]. GDAP1L1 is a paralogue of GDAP1 (59% sequence identi and appears to function in mitochondrial fission. Expression of GDAP1L1 in macrophag is implicated in T-cell-and dendritic-cell-driven skin inflammation disease [36]. GDA and GDAP1L1 appear to be distant relatives of the CLICs ( Figure S2).
In addition to the cGST domain, GDAP has a sequence inserted between regions c responding to helix α4 and α5, and an auto-inhibitory hydrophobic domain (HD1) f lowed by a trans-membrane domain (TMD) near its C-terminus (Figures 1 and 5a). Mi chondrial fission and GST activity are dependent on HD1. Huber and co-workers p posed that HD1 switches between an autoinhibited mode, where HD1 blocks the cataly site, and an active mode, where HD1 dissociates from the catalytic site and associates w the membrane [35]. HD1 is in a position to influence the position of the loop betwe strand β1 and helix α1. The GDAP1L1 sequence also features the HD1 and TMD motif

GDAP1 and GDAP1L1
Ganglioside-Induced Differentiation-Associated Protein 1 (GDAP1) is a mitochondrial outer membrane protein involved in mitochondrial fission. GDAP1 mutations are associated with the autosomal recessive neurological disorder Charcot-Marie-Tooth disease type 4A. Mutations in GDAP1 that are associated with disease (mostly missense mutations) impede mitochondrial dynamics. Crystal structures of the mouse [32] and human [33] GDAP1 homologs confirmed the presence of the cGST fold. Efforts to detect the GSHconjugating activity of GDAP have yielded mixed results, with some groups detecting no activity [32,34], and others detecting GSH-conjugating activity with EA, p-nitrobenzylchloride and EPNP [35]. GDAP1L1 is a paralogue of GDAP1 (59% sequence identity) and appears to function in mitochondrial fission. Expression of GDAP1L1 in macrophages is implicated in T-cell-and dendritic-cell-driven skin inflammation disease [36]. GDAP1 and GDAP1L1 appear to be distant relatives of the CLICs ( Figure S2).
In addition to the cGST domain, GDAP has a sequence inserted between regions corresponding to helix α4 and α5, and an auto-inhibitory hydrophobic domain (HD1) followed by a trans-membrane domain (TMD) near its C-terminus (Figures 1 and 5a). Mitochondrial fission and GST activity are dependent on HD1. Huber and co-workers proposed that HD1 switches between an autoinhibited mode, where HD1 blocks the catalytic site, and an active mode, where HD1 dissociates from the catalytic site and associates with the membrane [35]. HD1 is in a position to influence the position of the loop between strand β1 and helix α1. The GDAP1L1 sequence also features the HD1 and TMD motifs. Crystal structures of mouse and human GDAP1 omit extensions at the N-and Ctermini and are dimers with an atypical arrangement: the beta-sheets form a sandwich stabilized by an inter-chain disulfide bond between C88 of each monomer (Figure 5b). None of the GDAP1 crystal structures reported to date appear be competent at binding GSH due to helix α2 and flanking regions being disordered ( Figure 5b) and residue P78 adopting a trans conformation. The equivalent proline in catalytically active cGSTs is in the cis conformation, is essential for GSH binding and is the only completely conserved G-site residue in the cGSTs. Intriguingly, the AlphaFold model of GDAP1 adopts a classic cGST-like structure including cis-proline in this region compatible with GSH binding (Figure 5c). Another key difference between model and crystal structures is the insertion between helices α4 and α5 (residues 154 to 200 in GDAP). In the AlphaFold model, this region consists of two helices forming a lid over the active site that contacts helix α2 like that seen in the predicted metaxin structures. This region is mostly disordered in crystal structures (Figure 5b). However, the residues that are present do not align with the Al-phaFold model, leading to a higher RMSD (7.511 Å over 259 residues) compared to other models for which experimental structures are available ( Table 2).

CLICs
Paradoxically, CLICs were first identified as chloride ion channels, yet they have a soluble form that adopts the cGST fold [37]. Several CLICs have been reported to spontaneously integrate into lipid bilayers. In addition to overall topology, CLICs contain features conserved in GSTs including the G-site cis-proline. They contain a conserved motif located between strand β1 and helix α1 that is also observed in glutaredoxins: CP(F/Y)C. A similar motif is also seen on the Omega-class GSTs (CPFA and CPYS in human GSTO1 and O2, respectively). Crystal structures of mouse and human GDAP1 omit extensions at the N-and Ctermini and are dimers with an atypical arrangement: the beta-sheets form a sandwich stabilized by an inter-chain disulfide bond between C88 of each monomer (Figure 5b). None of the GDAP1 crystal structures reported to date appear be competent at binding GSH due to helix α2 and flanking regions being disordered ( Figure 5b) and residue P78 adopting a trans conformation. The equivalent proline in catalytically active cGSTs is in the cis conformation, is essential for GSH binding and is the only completely conserved G-site residue in the cGSTs. Intriguingly, the AlphaFold model of GDAP1 adopts a classic cGSTlike structure including cis-proline in this region compatible with GSH binding (Figure 5c).
Another key difference between model and crystal structures is the insertion between helices α4 and α5 (residues 154 to 200 in GDAP). In the AlphaFold model, this region consists of two helices forming a lid over the active site that contacts helix α2 like that seen in the predicted metaxin structures. This region is mostly disordered in crystal structures (Figure 5b). However, the residues that are present do not align with the AlphaFold model, leading to a higher RMSD (7.511 Å over 259 residues) compared to other models for which experimental structures are available ( Table 2).

CLICs
Paradoxically, CLICs were first identified as chloride ion channels, yet they have a soluble form that adopts the cGST fold [37]. Several CLICs have been reported to spontaneously integrate into lipid bilayers. In addition to overall topology, CLICs contain features conserved in GSTs including the G-site cis-proline. They contain a conserved motif located between strand β1 and helix α1 that is also observed in glutaredoxins: CP(F/Y)C.
A similar motif is also seen on the Omega-class GSTs (CPFA and CPYS in human GSTO1 and O2, respectively).
The predicted structures of the human CLICs all have the cGST fold and are in excellent agreement with experimentally determined structures (where available) ( Table 2). A noteworthy exception is the oxidized form of CLIC1, represented by PDB structure 1RK4. This contains an intramolecular disulfide between residues C24 and C59 and the N-terminal domain is rearranged [18]. The predicted model of CLIC1 matches the reduced, cGST-like form of CLIC1 (RMSD = 2.18 Å over 236 Cα atoms) and not the exceptional oxidized structure (RMSD = 7.47 Å over 213 Cα atoms) (Figure 6d). The predicted structures of the human CLICs all have the cGST fold and are in excellent agreement with experimentally determined structures (where available) ( Table 2). A noteworthy exception is the oxidized form of CLIC1, represented by PDB structure 1RK4. This contains an intramolecular disulfide between residues C24 and C59 and the N-terminal domain is rearranged [18]. The predicted model of CLIC1 matches the reduced, cGSTlike form of CLIC1 (RMSD = 2.18 Å over 236 Cα atoms) and not the exceptional oxidized structure (RMSD = 7.47 Å over 213 Cα atoms) (Figure 6d). CLIC 4, 5 and 6 have N-terminal extensions predicted to be largely disordered. The N-terminal extension on CLIC5 is predicted to contain an additional β-strand "β0" that runs parallel to strand β2 (Figure 6a,e). The N-terminal extension is deleted in the crystal structure of human CLIC5 (PDB ID 6Y2H) which, thus, does not contain this additional βstrand. The longest N-terminal extension (487 residues) in CLIC6 includes 14 copies of a decapeptide motif (consensus sequence AEGPAGDSVD; residues 150 to 295) [38]. This repeat region is predicted to form a right-handed, 15-stranded β-helix. The relationship between the secondary structure of the beta helix and the repeat is illustrated in Figure 7. CLIC 4, 5 and 6 have N-terminal extensions predicted to be largely disordered. The N-terminal extension on CLIC5 is predicted to contain an additional β-strand "β0" that runs parallel to strand β2 (Figure 6a,e). The N-terminal extension is deleted in the crystal structure of human CLIC5 (PDB ID 6Y2H) which, thus, does not contain this additional β-strand. The longest N-terminal extension (487 residues) in CLIC6 includes 14 copies of a decapeptide motif (consensus sequence AEGPAGDSVD; residues 150 to 295) [38]. This repeat region is predicted to form a right-handed, 15-stranded β-helix. The relationship between the secondary structure of the beta helix and the repeat is illustrated in Figure 7. The function of this domain is unknown. A DALI search of the PDB using this domain as a search model reveals an ice-binding protein from perennial ryegrass, Lolium perenne (PDB 3ULT), adhesin UspA1 from the Gram-negative bacterium Moraxella catarrhalis (PDB 3PR7) and tailspike protein TSP3 from bacteriophage CBA120 (PDB 6NW9). In keeping with their known monomeric structures, no human CLIC is predicted to form dimers (Table S1). Predictions of heterodimers of CLIC GST domains with other GST-domain proteins produced negative results except with AIMP2. In a study investigating the role of CLIC4 in pancreatic β-cell apoptosis, mass spectrometry experiments demonstrated an interaction between CLIC4 and AIMP2 [39].

GSTCD
GSTCD has been implicated in the development of Chronic Obstructive Pulmonary Disease. GSTCD −/− mice showed an increased lung TNF production in response to lipopolysaccharide. It was predicted to contain a methyltransferase domain [40].
Of the human proteins predicted to contain the cGST fold, the AlphaFold-predicted structure of the Glutathione S-transferase C-terminal domain-containing protein (GSTCD) is the greatest outlier (Figures 1 and 8). The thioredoxin domain has a region containing a two-stranded β-sheet (β1b + β1c) inserted between strand β1 and helix α2. Following strand β2, a loop and another beta strand (β2b) that forms part of the classic NTD beta sheet is inserted. While the NTD contains the cis-proline conserved in cGSTs, the G-site appears to be degenerate; superposition of the GSTCD model with cGST/GSH complexes show clashes with GSH (data not shown). Consistent with this is a lack of detectable GST activity [40]. With the exception of a short helix corresponding to helix α4, the rest of the CTD is rotated approximately 90° with respect to its usual position relative to the NTD. Between helices α5 and α6, a 93-residue helical bundle including a likely disordered 31 residue loop is inserted. Following helix α8 is another extended loop and, finally, the methyltransferase domain. A DALI search of the PDB using human GSTCD as a template yields the methyltransferase domain of Anabaena variabilis Hen1-C (PDB ID 3JWH) [41]. Hen1-C is responsible for methylation (using S-adenosyl methionine; SAM) of 2′-OH groups at the 3′ ends of small RNAs. Based on crystal structures, it is trivial to model Sadenosyl methionine in GSTCD (Figure 8b,c). A cluster of cysteine residues near the proposed SAM hint at a Zn 2+ binding site that may be involved in catalysis. The role that the cGST-like components of GSTCD play in function remains unclear, as these parts of the protein do not approach the putative SAM and substrate RNA-binding sites of the methyltransferase domain. In keeping with their known monomeric structures, no human CLIC is predicted to form dimers (Table S1). Predictions of heterodimers of CLIC GST domains with other GSTdomain proteins produced negative results except with AIMP2. In a study investigating the role of CLIC4 in pancreatic β-cell apoptosis, mass spectrometry experiments demonstrated an interaction between CLIC4 and AIMP2 [39].

GSTCD
GSTCD has been implicated in the development of Chronic Obstructive Pulmonary Disease. GSTCD −/− mice showed an increased lung TNF production in response to lipopolysaccharide. It was predicted to contain a methyltransferase domain [40].
Of the human proteins predicted to contain the cGST fold, the AlphaFold-predicted structure of the Glutathione S-transferase C-terminal domain-containing protein (GSTCD) is the greatest outlier (Figures 1 and 8). The thioredoxin domain has a region containing a two-stranded β-sheet (β1b + β1c) inserted between strand β1 and helix α2. Following strand β2, a loop and another beta strand (β2b) that forms part of the classic NTD beta sheet is inserted. While the NTD contains the cis-proline conserved in cGSTs, the G-site appears to be degenerate; superposition of the GSTCD model with cGST/GSH complexes show clashes with GSH (data not shown). Consistent with this is a lack of detectable GST activity [40]. With the exception of a short helix corresponding to helix α4, the rest of the CTD is rotated approximately 90 • with respect to its usual position relative to the NTD. Between helices α5 and α6, a 93-residue helical bundle including a likely disordered 31 residue loop is inserted. Following helix α8 is another extended loop and, finally, the methyltransferase domain. A DALI search of the PDB using human GSTCD as a template yields the methyltransferase domain of Anabaena variabilis Hen1-C (PDB ID 3JWH) [41]. Hen1-C is responsible for methylation (using S-adenosyl methionine; SAM) of 2 -OH groups at the 3 ends of small RNAs. Based on crystal structures, it is trivial to model S-adenosyl methionine in GSTCD (Figure 8b,c). A cluster of cysteine residues near the proposed SAM hint at a Zn 2+ binding site that may be involved in catalysis. The role that the cGST-like components of GSTCD play in function remains unclear, as these parts of the protein do not approach the putative SAM and substrate RNA-binding sites of the methyltransferase domain.

MCS Components
Aminoacyl-tRNA synthetases (ARSs) are enzymes that ligate amino acids to their corresponding tRNAs (reviewed by [42]). In eukaryotes, synthetases have been observed to form a complex termed the multisynthetase complex (MSC), an assembly held together by a variety of domains appended to the synthetases as well as structural adapter proteins. The human MSC includes nine ARSs (glutamyl-, prolyl-, isoleucyl-, leucyl-, methionyl-, glutaminyl-, lysyl-, arginyl-and aspartyl-tRNA synthetase). cGST domains appear in four MSC proteins: EPRS1 (glutamyl-prolyl-tRNA synthetase 1), MARS1 (MRS, methionyl-tRNA synthetase 1), AIMP2 (ARS-interacting multifunctional protein 2) and EEF1E1 (eukaryotic translation elongation factor 1 epsilon 1, also known as AIMP3). The cGST domains are essential to the MCS assembly through canonical and non-canonical cGST-dimerization interactions. EEF1E1 has a severely truncated N-terminal domain, missing strand β1 and helix α2. Conversely, AIMP2 has additional secondary structure elements in its N-terminal domain, including an N-terminal α-helix ("α0") and a strand introduced between β1 and β2 ("β2b") (Figures 1 and 9a). These novel features were correctly predicted by AlphaFold. Interestingly, EEF1E1, in addition to forming a heterodimer with MARS1, has also been observed to form a homodimer in an X-ray structure (PDB ID 2UZ8) [43]. The MC scores of the predicted complexes are 0.93 and 0.89, respectively.

X EEF1 Components
Eukaryotic elongation factor 1A (eEF1A) binds aminoacyl-tRNAs and delivers them to the ribosome A-site in a GTP-dependent manner. If the correct codon-anticodon interaction occurs, GTP hydrolysis is triggered at eEF1A. GDP-bound eEF1A is then released from the A-site. Translation-elongation factor complex eEF1B facilitates GDP/GTP exchange on eEF1A [44]. Like the MCS, eEF1B is formed from multiple subunits: eEF1Bα (also called elongation factor 1-beta; EF1B), eEF1Bβ (also called elongation factor 1-delta; EF1D) and eEF1Bγ (also known as EEF1G). Additionally, VARS1 interacts with eEF1 to α4 α1 β2 α5 α6 α7 β3 β4 α3 α8 α4 β1 α1 β2 α2 α5 α6 α7 β3 β4 α3 α8 α0 The crystal structures of EPRS1, AIMP2, EEF1E1 and MARS1 with a fragment of aspartyl-tRNA synthetase 1 (DARS1) (PDB ID 5Y6L) reveal canonical heterodimer interactions between AIMP2 and EPRS and between EEF1E1 and MARS1 and a non-canonical interaction between EPRS1 and EEF1E1 (Figure 9b) [19]. AlphaFold models of AIMP2-EPRS1 (MC score 0.93) and EEF1E1 and MARS1 (MC score 0.93) agree with the crystal structures. Despite the interaction between EPRS1 and EEF1E1 being a non-canonical dimerization interaction, AlphaFold correctly predicted the interaction (MC score 0.93). It should be noted that non-canonical complexes of the domains were also predicted, albeit with lower MC scores: EEF1E1-EPRS and EEF1E1-AIMP2 with MC scores 0.89 and 0.86, respectively. Intriguingly, some promiscuity in heterodimer formation is predicted with the four MCS cGST domains. MC scores over 0.9 were observed with GSTP1 with the theta-class GSTs, GSTP1 and CLIC1 (Table S1). The aforementioned CLIC4-AIMP2 complex has an MC of 0.84. Finally, prediction of the complex of the four subunits together was successful (Figure 9c).
Interestingly, EEF1E1, in addition to forming a heterodimer with MARS1, has also been observed to form a homodimer in an X-ray structure (PDB ID 2UZ8) [43]. The MC scores of the predicted complexes are 0.93 and 0.89, respectively.

X EEF1 Components
Eukaryotic elongation factor 1A (eEF1A) binds aminoacyl-tRNAs and delivers them to the ribosome A-site in a GTP-dependent manner. If the correct codon-anticodon interaction occurs, GTP hydrolysis is triggered at eEF1A. GDP-bound eEF1A is then released from the A-site. Translation-elongation factor complex eEF1B facilitates GDP/GTP exchange on eEF1A [44]. Like the MCS, eEF1B is formed from multiple subunits: eEF1Bα (also called elongation factor 1-beta; EF1B), eEF1Bβ (also called elongation factor 1-delta; EF1D) and eEF1Bγ (also known as EEF1G). Additionally, VARS1 interacts with eEF1 to form a "heavy" complex (eEF1H) [45]. Weak GST activity has been found in the rice EEF1G homolog [46].
Two components of EEF1 contain (or are predicted to contain) cGST domains: EEF1G and VARS1 (Figure 1). EEF1G is a two-domain protein. Unpublished crystal structures of the human EEF1G N-terminal domain with eEF1Bα (EF1B) (PDB ID 5DQS) and EEF1Bα (EF1D) (PDB ID 5JPO) reveal classic cGST homodimers (Figure 10a). The C-terminal domain of EEF1G was determined by NMR and consists of a five stranded anti-parallel β-sheet surrounded by α-helices [47]. There are no experimental structural data for VARS1. However, the structure predicted by AlphaFold has a cGST-domain at its N-terminus (residues 1 to 213) (Figure 10b). The VARS1 cGST domain is not predicted to contain a residue equivalent to the catalytic residues of catalytically active GSTs and therefore appears unlikely to have enzymatic activity. Reconstitution of the rabbit eEF1H complex in vitro showed that the cGST domain VARS1 interacts with eEF1Bβ [48]. The existence of the cGST domains of EEF1G a VARS1 prompts consideration of the possibilities for homo-and heterodimer format in EEF1H complex formation. Human EEF1G is observed to form classic cGST homo mers in available crystal structures (Figure 10a). However, no experimental structures ist for the VARS1 cGST domain. Pairwise predictions support both homo-and hetero mer formation of EEF1G and VARS1 cGST domains (Table S1). Interestingly, EEF1G/VARS1 heterodimer gives a higher MC score (0.91) than the VARS1 homodim (0.84; Figure 10b). Nevertheless, a VARS1 cGST-domain homodimer could be the basis the reported presence of two copies of VARS1 in the eEF1H complex [49]. Reconstitution of the rabbit eEF1H complex in vitro showed that the cGST domain of VARS1 interacts with eEF1Bβ [48]. The existence of the cGST domains of EEF1G and VARS1 prompts consideration of the possibilities for homo-and heterodimer formation in EEF1H complex formation. Human EEF1G is observed to form classic cGST homodimers in available crystal structures (Figure 10a). However, no experimental structures exist for the VARS1 cGST domain. Pairwise predictions support both homo-and heterodimer formation of EEF1G and VARS1 cGST domains (Table S1). Interestingly, the EEF1G/VARS1 heterodimer gives a higher MC score (0.91) than the VARS1 homodimer (0.84; Figure 10b). Nevertheless, a VARS1 cGST-domain homodimer could be the basis for the reported presence of two copies of VARS1 in the eEF1H complex [49].

Discussion
The recent advancements in protein structure prediction are leading a revolution in structural biology. While such advances pose obvious opportunities to accelerate research, it is important to determine the strengths and limitations of such tools. The algorithms described and used in this study do not predict the binding of co-factors, metal ions or post-translational modifications. There are also biases associated with the structures used to train the AI, which naturally represent a small fraction of all protein structures. In this study, a bias appears to manifest in the predicted structure of GDAP1, which resembles an archetypal cGST domain more than the available GDAP1 crystal structures. Nevertheless, high-quality predictions can inform experimental strategies. For enzymes such as GSTs, predicted structures could be used (for example) to identify active-site residues for mutagenesis and kinetics studies investigating catalytic mechanism or enzymesubstrate interactions. Where proteins act as adapters for protein-protein interactions, models can be used to identify likely binding interfaces that can again inform mutagenesis studies. For structure determination, the models predict disordered regions likely to inhibit crystallization. Constructs for expression could therefore be designed that truncate or omit such regions.
Based on available experimental data, predicted structures of human cGST-domaincontaining dimers appear to be largely correct. Again, one exception is GDAP1: the mode of dimerization is unlike any other known cGST-domain-containing dimer and was not correctly predicted. Nevertheless, the successes found here compare favorably with recent benchmarking studies. Yin and co-workers [50] tested AlphaFold against a set of 152 diverse heterodimer complexes and reported near-native structure predictions for 43% of models. Again, success rates will be influenced by the training data and, therefore, some classes of dimer will be more accurately predicted than others.
Comparisons of predictions of heterodimers with biochemical data yields important lessons. Biochemical detection of heterodimers of mu-and pi-class GSTs [24] support the possibilities presented by predictions of heterodimers between classes of cGSTs (Table  S1). As noted above, the propensity for heterodimer formation in alpha-class GSTs is not reflected in MC or ipTM scores. This suggests that these confidence scores should be interpreted as indicating that an interaction is feasible but not necessarily thermodynamically favorable. Multimer formation in vivo will depend on expression levels as well as thermodynamic stability.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/biom13081240/s1, Table S1: Model confidence, ipTM and pTM scores for pairwise predictions of cGST-domain-containing proteins. Where experimental structure data are available, pairs are highlighted in red. Figure S1: AlphaFold predictions of GST-domaincontaining proteins in the human genome shown in Cartoon form. Structures are colored by pLDDT (key at bottom right). Only the GST domains are shown, except for GSTCD where the whole protein is shown. Figure S2: Phylogram of 39 GST-domain sequences.  Table S1.

Conflicts of Interest:
The author declares no conflict of interest.