Crystal Structure of the Substrate-Binding Domain from Listeria monocytogenes Bile-Resistance Determinant BilE

BilE has been reported as a bile resistance determinant that plays an important role in colonization of the gastrointestinal tract by Listeria monocytogenes, the causative agent of listeriosis. The mechanism(s) by which BilE mediates bile resistance are unknown. BilE shares significant sequence similarity with ATP-binding cassette (ABC) importers that contribute to virulence and stress responses by importing quaternary ammonium compounds that act as compatible solutes. Assays using related compounds have failed to demonstrate transport mediated by BilE. The putative substrate-binding domain (SBD) of BilE was expressed in isolation and the crystal structure solved at 1.5 Å. Although the overall fold is characteristic of SBDs, the binding site varies considerably relative to the well-characterized homologs ProX from Archaeoglobus fulgidus and OpuBC and OpuCC from Bacillus subtilis. This suggests that BilE may bind an as-yet unknown ligand. Elucidation of the natural substrate of BilE could reveal a novel bile resistance mechanism.


Introduction
Listeria monocytogenes is a Gram-positive bacterium that causes listeriosis, a potentially fatal disease which imposes a significant global burden [1]. Infection usually occurs via contaminated food, in particular ready-to-eat varieties [2,3]. The success of Listeria monocytogenes as a food-borne pathogen is aided by its ability to survive and grow in extreme conditions found both in food-preparation environments and in the gastrointestinal tract, e.g., at low temperatures (down to −0.4 °C), at high salt concentrations (up to 10% w/v NaCl), or in the presence of bile [4-7]. Determining the molecular mechanisms by which Listeria monocytogenes survives these stresses could help to understand and control its proliferation and is thus of interest from a food safety and health perspective. It is also relevant to the rational design of probiotic bacteria [8,9].
One of the major challenges to bacterial growth and survival in the gastrointestinal tract is the presence of bile (for reviews, including mechanisms of toxicity and bacterial resistance, see References [10-13]). Bile is a digestive secretion produced by the liver and stored and concentrated in the gallbladder before being released into the lower intestine, where it aids in the emulsification and solubilization of fats. This is achieved by bile acids (also referred to as bile salts or bile alcohols, Figure S1), a class of amphipathic cholesterol-derived steroids, which are also potent antimicrobials due to their activity against biological membranes [12,14]. Several bile resistance mechanisms have been identified in Listeria monocytogenes and other bacteria, including the expression of multidrug resistance efflux pumps and modification of bile acids via the enzyme bile salt hydrolase [6,7,15-18].
Listeria monocytogenes BilE is a putative ATP-binding cassette (ABC) transporter that promotes bile tolerance in vitro and colonization of the gastrointestinal tract in vivo [15,16,18]. However, the specific mechanism(s) by which it achieves this are unknown. Many transporters are known to mediate bacterial stress responses and virulence through the uptake of compatible solutes, low molecular weight osmolytes that can be accumulated to high concentrations without interfering with cellular processes [19]. In Listeria monocytogenes, the ABC transporters Gbu and OpuC, which are homologous to BilE, as well as the functionally related secondary transporter BetL, contribute to osmo-, bile- and chill-tolerance by importing glycine betaine and carnitine [16,20-22]. Although it was initially hypothesized that BilE (then called OpuD) was an additional compatible solute transporter, it has since been demonstrated that it is unable to mediate transport of glycine betaine, carnitine, or choline (a metabolic precursor of glycine betaine), and plays no role in osmo- or chill-tolerance [18]. The observation that Listeria monocytogenes knockout cells accumulate higher levels of the bile component chenodeoxycholic acid led to the description of BilE as a bile exclusion system [18,23]. This is at odds with the similarity of BilE to Type I ABC importers, which are structurally and evolutionarily distinct from ABC exporters [24,25].
The main aim of this study was to generate structural information on BilE in order to make improved hypotheses about the mechanism(s) by which it mediates bile resistance. The soluble C-terminal domain of BilEB was crystallized and confirmed to have a substrate-binding protein fold belonging to structural subcluster F-III [26,27]. The characterized members of this group are all associated with ABC importers and bind compatible solutes. Although amino acid residues interacting directly with the substrate are largely conserved with characterized homologs, differences in surrounding residues mean that the overall architecture of the binding site varies significantly.

Sequence Analysis
BilE is encoded by the genes lmo1421 and lmo1422 in a genetic arrangement similar to the OpuA (BusA) system from Lactococcus lactis [28,29] (Figure 1A,C). The first gene in the operon encodes BilEA, the nucleotide-binding domain (NBD). The second gene encodes BilEB, a fusion protein containing both a transmembrane domain (TMD) and a substrate-binding domain (SBD). The predicted functional transporter is a homodimer (Figure 1B). Although the architecture of BilE is similar to Lactococcus lactis OpuA, the individual domains share more sequence similarity with the equivalent subunits from Listeria monocytogenes OpuC [30], Bacillus subtilis OpuB and OpuC [31], and Archaeoglobus fulgidus ProX [32,33] (Figure 1C). These are all Type I ABC importers that contribute to osmo- or thermotolerance through the import of quaternary ammonium compounds (QACs), which act as compatible solutes either directly (e.g., glycine betaine) or as metabolic precursors (e.g., choline).
Part of the justification for reclassifying BilE as a bile exporter was the identification of "bile acid permease signature sequences" conserved between BilEA and the NBDs of bile salt exporters Saccharomyces cerevisiae Ybt1 and mammalian ABCB11 [18]. We constructed an expanded sequence alignment, which includes the NBDs OpuCA and GbuA from Listeria monocytogenes, and see no evidence that BilEA shares unique sequence characteristics with the eukaryotic proteins (Figure 1D and Figure S2).

Figure 1.
[...] Table A1. Genes and proteins are color-coded based on the predicted function: nucleotide-binding domains (NBDs), purple long dashes; transmembrane domains (TMDs), solid green; substrate-binding domains (SBDs), orange short dashes. When two TMDs are present, values from the closest match are reported. Numbers indicate the percentage sequence identity (similarity) between equivalent domains based on pairwise alignments made with the program needle from EMBOSS (the European Molecular Biology Open Software Suite) [34]. (D) An extract from a multiple sequence alignment of selected NBDs. The region shown is the same as in Sleator et al. [18], but the alignment has been expanded to include proteins from the Listeria monocytogenes Gbu and OpuC systems. Residues are shaded according to percentage sequence identity. (E) Multiple sequence alignment of BilEB and the TMDs from Listeria monocytogenes OpuC. Numbering is according to BilEB. Sequences are shaded according to percentage sequence similarity. Bars above the alignment indicate transmembrane segments (TMSs) predicted by TOPCONS [35] (light green) or by comparison to crystal structures of homologous ABC importers (dark green). Lettering indicates the orientation of predicted TMSs relative to the cytoplasm (out-to-in or in-to-out), which is in agreement between the two methods.

In the ABC transporters that have been crystallized, the fold of the TMD varies significantly between importers and exporters [24]. The HHpred server [36], which is designed for remote protein homology detection and structure prediction, was used to search the Protein Data Bank (PDB) with residues 1-231 of BilEB as the query sequence. The top matches were to the permease subunits of six different Type I ABC importers from bacteria and archaea (Table A2). The sequence alignment produced by HHpred covers the five transmembrane segments (TMSs) that are conserved between all of these structures and which, after dimerization, make up the core of the transporter. The predicted positions of these TMSs in BilEB closely match topology predictions made using the TOPCONS server [35] (Figure 1E). The additional sixth TMS, predicted by TOPCONS, would position the predicted C-terminal SBD at the outer surface of the membrane, where it could serve as a receptor and bind ligands for subsequent transport. A comparison of BilEB with OpuCB and OpuCD from Listeria monocytogenes shows that the TMDs are similar in size, and that there is sequence similarity along the entire domain (Figure 1E).
Based on sequence homology, the C-terminus of BilEB contains a substrate-binding domain (SBD) belonging to substrate-binding protein (SBP) subcluster F-III [26,27]. All characterized members of this cluster are part of an ABC transporter and bind QACs via cation-π interactions. They can be further divided into two groups based on whether the quaternary ammonium of the substrate is coordinated via three tryptophan residues (e.g., OpuAC from Lactococcus lactis [37] and ProX from Escherichia coli [38]) or four tyrosines (e.g., ProX from Archaeoglobus fulgidus [33]). Interestingly, a sub-group also exists whose members share their tertiary structure with the Lactococcus lactis OpuAC group (including the tryptophan residues of the binding site) but in whose primary sequence the N- and C-terminal halves are swapped (e.g., OpuAC from Bacillus subtilis [39,40]). The sequence of BilEB is most similar to AfProX (33% identity, 51% similarity), although one binding site tyrosine is replaced by a phenylalanine. Figure 2 shows a multiple sequence alignment of BilEB, AfProX, and other selected homologs in which these four tyrosine residues are conserved: OpuBC and OpuCC from Bacillus subtilis [31]; OpuCC from Listeria monocytogenes [30], Staphylococcus aureus [41], Pseudomonas aeruginosa and Pseudomonas syringae [42]; OsmX (OpuCC) from Salmonella typhimurium [43,44]; and YehZ from Escherichia coli [45]. The known ligands of these proteins include glycine betaine, proline betaine, choline, acetylcholine, choline-O-sulfate, carnitine, and ectoine (Figure S3). Several other studied homologs were excluded from this analysis: OpuCC from Pseudomonas syringae pv. syringae B728a [42], because it is 97% identical to the included OpuCC from Pseudomonas syringae pv. syringae DC3000; YehZ from Salmonella typhimurium [46], because it is 88% identical to EcYehZ and its substrate is unknown; and ProX from Agrobacterium tumefaciens, because no data apart from crystal structures (PDB ID 4NE4 and 4ND9) are available.

Figure 2.
Multiple sequence alignment of BilEB and selected substrate-binding protein homologs. For a full list of organism names and protein accession numbers, see Table A1. The alignment has been truncated at the N-terminus to exclude signal sequences and residues not visible in the crystallographic structures. Sequences are numbered according to the unprocessed form of the protein, unless this varies from numbering used in published studies, in which case the numbering is consistent with the literature. Residues are shaded according to percentage sequence identity. The secondary structure of BilEB SBD is indicated above the alignment. Domain A is indicated in purple, domain B in green. Arrows indicate β-sheets; large loops indicate α-helices, small loops indicate 3₁₀-helices (secondary structure assigned by DSSP [47]). Orange symbols below the alignment indicate amino acid residues involved in substrate binding either through caging the quaternary ammonium (closed circles) or forming salt bridges or hydrogen bonds (open circles).

Structure of the Putative Substrate-Binding Domain from BilEB
We were unable to measure any glycine betaine transport or osmoprotective activity when BilE was expressed in Lactococcus lactis (data not shown). We therefore decided to focus on the putative substrate-binding domain (residues 231-504 of BilEB, hereafter referred to as BilEB SBD). BilEB SBD was successfully expressed in Escherichia coli with a His-tag and purified from the soluble fraction via affinity chromatography. The purified protein was analyzed by SEC-MALLS (size-exclusion chromatography coupled to static light-scattering, differential refractive index, and UV absorbance measurements) in order to determine its oligomeric state [48]. A single elution peak was observed with an estimated molecular mass of 32.4 kDa (Figure S4). This is in good agreement with the molecular mass of 33.4 kDa calculated using the ProtParam tool [49] and indicates that the isolated BilEB SBD is monomeric in solution. Differential scanning calorimetry was used to screen for substrate binding, but no significant changes in melting temperature were observed in the presence of glycine betaine, carnitine, choline, proline, or cholate (5 µM of protein and up to 5 mM of ligand at pH 7.5; data not shown).
The X-ray structure of BilEB SBD was determined at a resolution of 1.5 Å and deposited in the PDB under accession number 4Z7E (Table 1). Two protein molecules are present in the asymmetric unit but they differ only slightly, with a root-mean-square deviation of atomic positions (RMSD) of 0.33 Å over all atoms. Several areas of density were assigned to polyethylene glycol and propanoic acid from the crystallization solution, but for the most part these are external to the protein and are not biologically relevant. The exception is a molecule of propanoic acid in the putative binding site of BilEB SBD, which will be discussed in the following section.
The overall fold of BilEB SBD consists of two globular α/β domains joined by a hinge formed from two long flexible strands (Figure 3A). Each domain comprises a five-stranded β-sheet surrounded by α-helices. This fold is characteristic of SBPs of subcluster F-III [26,27]. As expected based on sequence similarity, a search for structural homologs in the PDB using the Dali server [50] gave the best match as the open, unliganded structure of AfProX (PDB ID 1SW5, Table A3). The RMSD is 1.7 Å over 267 aligned residues (out of a total of 270). Matches with Z-scores greater than 25 were also returned for the structures of SaOpuCC, AtProX, BsOpuBC, BsOpuCC, and EcYehZ (Table A3). Based on comparisons with AfProX and in keeping with the literature, we have defined domain A as containing the N- and C-termini (residues 235-338 and residues 440-504) and domain B as containing the central region of the protein (residues 339-439). The binding site of SBPs contains residues from both domains, which trap the ligand by moving towards each other in what is often described as a "Venus's-flytrap" mechanism [51] (Figure 3B). A comparison of the BilEB SBD structure with structural homologs shows that it was crystallized in the open conformation.
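The RMSD values quoted above summarize how far paired atoms lie from one another after superposition. As a minimal sketch of that calculation, assuming the two coordinate sets are already superposed and listed in the same atom order (the function name and the toy coordinates are illustrative and are not taken from the deposited model):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z).

    Assumes the structures are already superposed and the atoms are paired
    by index; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy example: every atom of the second set is displaced by 0.3 Å along z.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
chain_b = [(0.0, 0.0, 0.3), (1.0, 0.0, 0.3), (0.0, 1.0, 0.3)]
print(f"RMSD = {rmsd(chain_a, chain_b):.2f} Å")
```

Structure-comparison servers such as Dali additionally search for the optimal superposition and residue pairing, which this sketch deliberately leaves out.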

Binding Site Comparisons
AfProX, BsOpuBC and BsOpuCC have all been crystallized in complex with their natural substrates [33,53,54]. In each case the quaternary ammonium group sits in a cage whose sides are formed by four Tyr residues and whose base is the main chain of either an Asp or Asn. The other part of the substrate is coordinated by interactions with specific side-chains. In the case of AfProX, the cage is formed by Tyr 63, Tyr 111, Tyr 190, Tyr 214, and Asp 109, while the carboxylic tail of glycine betaine or proline betaine forms salt bridges with Lys 13 and Arg 149 (domain A and B, respectively) and a hydrogen bond with Thr 66 (domain A). These residues are conserved in BilEB SBD in both the primary sequence and the crystallographic model (Figures 2 and 3C), except for the substitution of Phe 292 for Tyr 63.
Studies of AfProX suggest that residues of the substrate-binding site from domain A undergo minimal conformational change between the open and closed forms of the protein [32,33]. Although BilEB SBD has only been crystallized in the open, unliganded conformation, we can still make some comments about the putative binding site. A comparison of domain A from BilEB SBD and AfProX shows that the position of the binding site residues is largely conserved (Figure 3C,D). The Phe 292 side chain of BilEB SBD, however, is tilted into the putative binding site. The corresponding residue in AfProX, Tyr 63, forms hydrogen bonds with Glu 17 and Gln 18 [33] (Figure 3D). This Glu residue is strictly conserved in characterized homologs (Figure 2) and appears to hydrogen bond with the equivalent Tyr in all the available crystal structures. Phe 292 is unable to form these interactions.
The positioning of Phe 292, as well as the substitution of Phe 15 and Gln 18 from AfProX with the less bulky Gly 243 and Pro 246 in BilEB SBD, results in a cavity on the face of domain A adjacent to the putative binding site (Figure 3E). Within this cavity is a molecule of propanoic acid bound by stacking interactions with Phe 292 and hydrogen bonding with Tyr 335. In chain B of the model, which contains the best density for the propanoic acid, two water molecules can also be clearly seen within this cavity. These are coordinated by hydrogen bonds with the propanoic acid, the main chain carbonyl of Pro 290, and the main chain amide of Lys 241. No such cavity is seen on the domain A face of AfProX, BsOpuBC or SaOpuCC (Figure 3F and Figure S5) [...] [53]. In the structure of BilEB SBD there are no nearby residues that would be able to fulfill this role.
These differences in the structure of BilEB SBD suggest that its natural ligand could vary significantly from those predicted based on sequence analysis. Unfortunately, the crystal structure does not lead directly to possible alternatives. Bile acids, besides being even larger than the extra space made available by the binding site cavity, do not make sense as an import substrate for a bile resistance protein; for bile acid tolerance the molecules would have to be exported rather than imported.

Conservation of BilEB
In order to determine if the structural features described here are unique to BilEB, we used the UniRef50 database [55] to generate a list of sequences with significant similarity to BilEB SBD [E-value of less than 0.0001 in a Basic Local Alignment Search Tool (BLAST) search] and less than 50% pairwise sequence identity. Each sequence in the list thus represents a cluster of one or more original sequences. Using AfProX as the query sequence resulted in a very similar list. We then excluded sequences that did not contain either a Tyr, Phe or Trp residue at the four positions corresponding to the binding site tyrosines of AfProX. Our final list of 198 sequences contains proteins with various domain organizations including isolated SBPs as well as TMD-SBD, SBD-TMD and even SBD-SBD fusions (Figure 4A, Supplementary File S1). Analysis with HHpred [36] suggests that these additional C-terminal SBDs belong to structural subclusters F-IV or E-II, both of which contain SBPs that bind amino acids and are associated with importers, channels or receptors [26,27]. We also identified three SBPs and four TMD-SBD fusions which contain a sequence swap similar to OpuAC from Bacillus subtilis [39,40] (Figure 4A).

Figure 4.
[...] (inset) reference sequences. Reference sequences for clusters containing SBPs discussed in this paper are shown in black. The reference sequence for the cluster containing PsOpuCC was removed during an automated trimming step. Topology predictions made by TOPCONS [35] are shown in green (TMSs) and purple (signal sequences). Orange indicates an additional SBP fold as predicted by HHpred [36]. Where sequences were manually altered to reflect a change in the primary sequence compared to the predicted tertiary structure (as described for Bacillus subtilis OpuAC), the shifted C-terminal portion is shown in blue. Dashed outlines indicate sequences containing a Gly residue at the position corresponding to Gly 243 of BilEB. The histogram below the main alignment shows sequence conservation. (B) Frequency plots showing the distribution of amino acids at corresponding positions of each alignment. Numbering shown is according to BilEB. (C) The ConSurf server [56] was used to map the conservation of each alignment onto the BilEB SBD structure. A molecule with the same color scheme as Figure 3 is included for clarity. Domains A and B are shown in dark grey and light grey, respectively. Residues predicted to form a cage around the quaternary ammonium are dark gold except for Phe 292, which is pale gold. Gly 243 and Pro 246 are colored pale blue. (D) Phylogenetic distribution of the sequences represented in each alignment. Values shown are the number of unique organisms in each taxon which have sequences in the UniRef50 (left column) or UniRef90 (right column) clusters. Images were prepared using Jalview [57], PyMOL [52], and iTOL [58].

The most common residues at the position corresponding to Gly 243 of BilEB are Phe (61.3%), Tyr (9.4%), Asp (7.3%), Gly (6.8%), Met (4.7%), Thr (3.1%), and Ser (2.1%) (Figure 4B). Each of these alternatives, except for Tyr, is represented in the characterized QAC-binding homologs (Figure 2). Expanding the 13 reference sequences containing a Gly at this position to the UniRef90 database (a maximum of 90% pairwise sequence identity) returned 399 sequences, all of which are TMD-SBD fusions (Figure 4A, inset, and Supplementary File S2). Gly 243 (98.8%), Pro 246 (99.0%) and Phe 292 (94.5%) are all highly conserved (Figure 4B). Mapping the conservation from each multiple sequence alignment onto the BilEB SBD crystallographic model shows that conserved residues are clustered around the putative binding site (Figure 4C). Within the Gly 243 subgroup the residues lining the cavity on the face of domain A are also conserved. This suggests that BilE is a member of a conserved subgroup with unknown transport specificity.
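Residue frequencies of this kind can be tallied from a single column of a multiple sequence alignment. The following is a minimal sketch, assuming the alignment is held as equal-length gapped strings and that gap characters are left out of the denominator (an assumption made here for illustration; the toy alignment is invented and the function name is hypothetical):

```python
from collections import Counter


def column_frequencies(aligned_seqs, column):
    """Percentage of each residue type at one alignment column.

    Gap characters ('-') are ignored, which is an assumption of this sketch.
    """
    residues = [seq[column] for seq in aligned_seqs if seq[column] != "-"]
    if not residues:
        return {}
    counts = Counter(residues)
    return {res: 100.0 * n / len(residues) for res, n in counts.most_common()}


# Invented five-sequence alignment; column 1 plays the role of the Gly 243 position.
toy_alignment = ["MFKT", "MFRT", "MGKT", "M-KT", "MYKS"]
for residue, percent in column_frequencies(toy_alignment, 1).items():
    print(f"{residue}: {percent:.1f}%")
```

Applied to a real alignment, the column index would have to be mapped from BilEB residue numbering to alignment coordinates first.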
The initial list of UniRef50 reference sequences represents genes from a wide range of taxa, including eukaryotes, archaea and bacteria ( Figure 4D). The highest number of sequences is found within the class Bacilli. The UniRef90 Gly 243 subgroup shows a similar phylogenetic distribution, although it is even more heavily biased towards the Bacilli. A closer look suggests that BilE homologs are widespread within genera known to contribute to the human microbiome including Bacillus, Enterococcus, Lactobacillus, Neisseria, Staphylococcus, and Streptococcus, as well as in the genus Lactococcus.

Cloning, Expression and Purification
The isolated substrate-binding domain of BilE was expressed in Escherichia coli MC1061 with an N-terminal tag consisting of 10 histidine residues and a tobacco etch virus (TEV) protease cleavage site (for the full amino acid sequence see Appendix B). To construct the expression plasmid, amino acid residues 231-504 of BilEB were amplified by polymerase chain reaction using the oligonucleotides 5′-ATGGTGAGAA TTTATATTTT CAAGGTTCGG ATAAAAAGGA AATTACAATT GCTGGTAAAT TAG-3′ and 5′-TGGGAGGGTG GGATTTTCAT TATTTAATAA TACCTTGATC TTTCAAATAG TCTTTGGCAA CTG-3′ and cloned into pBADnLIC as described by Geertsma and Poolman [59]. The resulting plasmid, pBAD-lmo1422SBDnLIC, was confirmed by sequencing the entire open reading frame.
For expression, Escherichia coli containing pBAD-lmo1422SBDnLIC were inoculated into LB medium containing 100 µg/mL ampicillin and grown at 37 °C with shaking to an OD₆₆₀ of 0.7. L-Arabinose was added to induce expression (final concentration 5 × 10⁻³% w/v) and the cultures were incubated for 2 h at 25 °C with shaking. Cells were harvested by centrifugation and resuspended in Buffer A (50 mM potassium phosphate pH 7.5, 200 mM NaCl) with 5 mM MgSO₄ and 100 µg/mL DNase. Cells were disrupted by a single pass through a cell disruptor (Constant Systems Ltd, Daventry, UK) at 25,000 psi and stored at −80 °C before purification.
BilEB SBD was purified by Ni²⁺-affinity and size-exclusion chromatography using ÄKTA systems (GE Healthcare Life Sciences, Eindhoven, The Netherlands). Disrupted cells were thawed on ice and the insoluble fraction removed by ultracentrifugation (80,000 rpm, 4 °C, 20 min). The lysate was diluted 3-fold in Buffer A, and imidazole added to a final concentration of 10 mM. The solution was passed over a Ni²⁺-sepharose column (approximately 1 mL bed volume per mg wet weight cells) which had been pre-equilibrated in Buffer B (20 mM Hepes pH 7.5, 200 mM NaCl) with 10 mM imidazole. The column was washed with three volumes of Buffer B with 75 mM imidazole, and eluted with Buffer C (20 mM Hepes pH 7.5, 150 mM NaCl) with 500 mM imidazole. Elution fractions were collected directly into tubes containing 1/100 volumes of 0.5 M Na-EDTA (to give a final concentration of 5 mM) before being pooled, diluted 2.5-fold in Buffer C, and concentrated using a Vivaspin column (GE Healthcare Life Sciences, Eindhoven, The Netherlands) with a molecular weight cut-off of 10 kDa. The concentrated sample was then passed over a Superdex 200 10/300 GL column (GE Healthcare Life Sciences, Eindhoven, The Netherlands) in Buffer C. Eluted fractions were again concentrated using a Vivaspin column.
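The EDTA concentration quoted above follows from simple dilution arithmetic. As a rough check, assuming that "1/100 volumes" means one volume of 0.5 M stock per 100 volumes of collected fraction (an interpretation made for this sketch; the function name is hypothetical):

```python
def final_concentration(stock_conc, stock_volume, sample_volume):
    """Concentration of a stock component after mixing it into a sample.

    Units are arbitrary but must be consistent: the result is in the units
    of stock_conc, and the two volumes must share one unit.
    """
    return stock_conc * stock_volume / (stock_volume + sample_volume)


# 0.5 M (500 mM) Na-EDTA, one volume of stock per 100 volumes of eluate.
edta_mM = final_concentration(stock_conc=500.0, stock_volume=1.0, sample_volume=100.0)
print(f"{edta_mM:.2f} mM EDTA")
```

Under this reading the result is just under 5 mM, consistent with the rounded value given in the protocol.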
Determination of the molecular weight of purified BilEB SBD by SEC-MALLS was performed as described previously [48], using a Superdex 200 10/300 GL column pre-equilibrated in Buffer C.

Crystallization
All crystallization experiments were performed at 4 °C. Initial crystallization trials were carried out using a high-throughput crystallization robot (Mosquito, TTP Labtech, Melbourn, UK), commercially available kits from Molecular Dimensions, and the sitting-drop vapor-diffusion technique. BilEB SBD was found to crystallize in condition 25 from PACT premier (0.1 M PCTP (sodium propionate, sodium cacodylate trihydrate, Bis-Tris propane) pH 4.0, 25% (w/v) PEG-1500). Further crystallization trials were carried out using the hanging-drop vapor-diffusion method. The crystal used to solve the structure was obtained by mixing equal volumes of BilEB SBD (20 mg/mL in Buffer C) with the reservoir solution (0.1 M PCTP pH 4.25, 25% w/v PEG-1500). Cryo-buffer consisted of PACT premier 25 with an additional 15% (w/v) PEG-1500.

Structure Determination and Refinement
X-ray diffraction data were collected at the X06SA (PXI) beamline at the SLS (Paul Scherrer Institut, Villigen, Switzerland) and processed with the XDS package [60]. The structure was solved by molecular replacement with the program Phaser [61] using the structure of open, unliganded Archaeoglobus fulgidus ProX (PDB code 1SW5) as the search model. Refinement was performed with REFMAC5 [62] and PHENIX [63], with additional manual adjustments made in Coot [64]. The data and refinement statistics are listed in Table 1.

Sequence and Structure Analysis
Percent sequence identity values are from pairwise sequence alignments generated by the EMBOSS needle program [34]. All multiple sequence alignments were made using Clustal Omega [65] at the EMBL-EBI server [66]. Topology predictions were made using the TOPCONS server [35].
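For readers unfamiliar with how a percent identity is read off a pairwise alignment, the sketch below counts identical aligned residues and divides by the total alignment length, gap columns included. That denominator is an assumption of this sketch about the reporting convention, and the aligned strings are invented; percent similarity would additionally require a substitution matrix and is not shown.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two gapped, equal-length aligned sequences.

    Identical residue pairs are counted; a gap never counts as a match.
    The denominator is the full alignment length (assumed convention).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Invented seven-column alignment with four identical positions.
seq_1 = "MKT-AYF"
seq_2 = "MRTLAYW"
print(f"{percent_identity(seq_1, seq_2):.1f}% identity")
```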
To generate a list of BilEB SBD homologs with lowered redundancy, a BLAST search was conducted against the UniRef50 database (accessed 9 September 2016) [55] using residues 231-504 of BilEB as the query sequence. Sequences shorter than 250 or longer than 650 amino acids were discarded. The resulting list was trimmed using MaxAlign [67], which removes sequences creating significant gaps in a multiple sequence alignment. This is designed to remove truncated or incorrectly annotated sequences, but in this case it also removed several proteins that appear to have a sequence swap similar to OpuAC of Bacillus subtilis [39,40]. These sequences were manually altered by removing the C-terminal portion and inserting it immediately following the predicted signal sequence or final TMS, and they were then added back to the list. Finally, any sequences that did not contain a Tyr, Phe, or Trp at the four positions corresponding to the binding site Tyr residues of AfProX were discarded. The final alignment contains 198 sequences of 267 to 650 amino acids in length. The members of each UniRef50 cluster whose representative sequence has a Gly at the position equivalent to Gly 243 of BilEB (n = 13) were gathered and reduced to 90% identity by mapping to the UniRef90 database. The same alignment and trimming procedures were applied, yielding a multiple sequence alignment with 399 sequences of 500 to 597 amino acids in length.
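The length and aromatic-residue filters described above can be sketched as follows. This is an illustration only: it assumes the alignment is a dictionary of gapped strings and that the four cage positions are known as alignment column indices; all names are hypothetical, the toy thresholds are scaled down, and the MaxAlign trimming and manual sequence-swap corrections are not reproduced.

```python
AROMATIC = {"Y", "F", "W"}  # Tyr, Phe or Trp allowed at the cage positions


def filter_homologs(alignment, cage_columns, min_len=250, max_len=650):
    """Keep sequences within the length limits that carry an aromatic
    residue at every cage column.

    alignment: dict mapping sequence name -> gapped aligned sequence.
    cage_columns: alignment column indices of the four cage positions.
    """
    kept = {}
    for name, aligned_seq in alignment.items():
        ungapped_length = len(aligned_seq.replace("-", ""))
        if not (min_len <= ungapped_length <= max_len):
            continue  # too short or too long
        if all(aligned_seq[col] in AROMATIC for col in cage_columns):
            kept[name] = aligned_seq
    return kept


# Invented toy alignment with scaled-down length limits.
toy = {
    "seqA": "MYKFAWGY",  # aromatic at columns 1, 3, 5, 7: kept
    "seqB": "MYKLAWGY",  # Leu at column 3: discarded
    "seqC": "-Y-F-W-Y",  # only 4 residues: discarded on length
}
kept = filter_homologs(toy, cage_columns=[1, 3, 5, 7], min_len=6, max_len=10)
print(sorted(kept))
```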

Conclusions
The C-terminus of BilE has a distinctive SBP fold, which supports the classification of BilE as a Type I ABC importer. Differences in the architecture of the binding site compared to characterized homologs indicate that the transport substrate(s) of BilE may vary significantly from those previously predicted (glycine betaine and other quaternary ammonium compounds which act as compatible solutes). This would explain why no transport activity has been observed, and opens up the possibility of discovering a novel mechanism of bile resistance. The crystal structure of BilEB SBD determined in this study allows for the identification of possible candidate substrates via in silico methods such as ligand docking.
Supplementary Materials: The following are available online at www.mdpi.com/2073-4352/6/12/162/s1: Figure S1: Chemical structures of the most abundant bile acids in humans; Figure S2: Multiple sequence alignment of selected NBDs; Figure S3: Ligands of substrate-binding proteins homologous to BilEB SBD; Figure S4: Determination of the oligomeric state of BilEB SBD; Figure S5: The binding site on the face of domain A; File S1: Multiple sequence alignment of UniRef50 reference sequences; File S2: Multiple sequence alignment of UniRef90 reference sequences.

Appendix A

Organism and Transporter; Protein Subunits and UniProtKB Accession Numbers
Pseudomonas syringae pv. syringae B728a (Ps) OpuC [42]; OpuCD ([...]

Table A3. Structural homologs of BilEB SBD found by searching the PDB using the Dali server [50]. The scores given are from the best match (highest Z-score/lowest RMSD) of individual chain-to-chain comparisons.