Determination and Dissection of DNA-Binding Specificity for the Thermus thermophilus HB8 Transcriptional Regulator TTHB099

Transcription factors (TFs) have been extensively researched in certain well-studied organisms, but far less so in others. Following the whole-genome sequencing of a new organism, TFs are typically identified through their homology with related proteins in other organisms. However, recent findings demonstrate that structurally similar TFs from distantly related bacteria are not usually evolutionary orthologs. Here we explore TTHB099, a cAMP receptor protein (CRP)-family TF from the extremophile Thermus thermophilus HB8. Using the in vitro iterative selection method Restriction Endonuclease Protection, Selection and Amplification (REPSA), we identified the preferred DNA-binding motif for TTHB099, 5′–TGT(A/g)NBSYRSVN(T/c)ACA–3′, and mapped potential binding sites and regulated genes within the T. thermophilus HB8 genome. Comparisons with expression profile data in TTHB099-deficient and wild type strains suggested that, unlike E. coli CRP (CRPEc), TTHB099 does not have a simple regulatory mechanism. However, we hypothesize that TTHB099 can be a dual-regulator similar to CRPEc.


Introduction
Transcription factors (TFs) are DNA-binding proteins that allow for modulation of transcription initiation in response to intracellular and extracellular changes. Over decades of research, there have been many advances in exploring the TF regulatory mechanisms that cells use to control their gene expression. More recently, technological innovations such as massively parallel sequencing and data science have expanded our interest in new model organisms and their adaptations. TFs are trans factors that bind to cis-regulatory elements, promoter or enhancer sequences known as TF binding sites (TFBSs). It has been reported that most bacterial TFBSs are found in the proximal region (about −100 to +20 bp from the transcription start site [TSS]) and distal regions (up to −200 bp from the TSS) [1][2][3]. Functionally, TFs are categorized into activators and suppressors, with a few of them being dual-regulators [4]. Regarding the number of genes regulated, TFs are classified into local or global regulators [5]. Such characteristics make up the mechanism of transcription regulation and help identify novel TFs.
Proteomic studies allow the grouping of TFs into families based on structural comparison studies. However, new findings have shown that structurally similar TFs from distantly related bacteria are not usually evolutionary orthologs [6]. A more comprehensive characterization of the TF regulatory network is achieved by identifying the TFBSs, the genes regulated, and the method of regulation. Advances in computational biology and data processing have given rise to inclusive databases that can predict structure and function for TFs in new model organisms [7]. However, most of these databases are built from experimental studies.
To gain insights into transcriptional regulatory networks in extremophile organisms, our laboratory has employed a novel biochemistry-based method, Restriction Endonuclease Protection, Selection, and Amplification (REPSA), to characterize several TFs in the extreme thermophilic model organism Thermus thermophilus HB8. To date, we have studied four tetracycline repressor protein (TetR) family transcriptional suppressors and have successfully identified their TFBSs [8][9][10][11]. Commonly, suppressors bind DNA with high affinity in the absence of small-molecule modulators/cofactors. In contrast, numerous transcriptional activators require small-molecule modulators in order to bind DNA, which complicates their analysis in vitro.
In this study, we explore the utility of REPSA to identify and characterize a potential thermophilic transcriptional activator, TTHB099. Protein sequence homology analysis indicates that TTHB099 is one of the four cAMP receptor protein (CRP) family members (TTHA1437, TTHA1359, TTHB099, and TTHA1567) in T. thermophilus HB8 and should bind palindromic DNA sequences as a homodimer [12]. However, despite having a cAMP binding domain, it does not require this cofactor to bind DNA. Here, we identified the preferred DNA-binding sequence for TTHB099 as the 16-mer motif: 5′-TGT(A/g)n(t/c)c(t/c)(a/g)g(a/g)n(T/c)ACA-3′. Furthermore, we used binding kinetics studies and mRNA expression data to validate potential biological roles of TTHB099.
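As an illustration of what a degenerate, palindromic motif of this kind implies, the following minimal sketch converts an IUPAC-coded motif into a regular expression and tests a candidate site for reverse-complement symmetry. The IUPAC string `TGTRNYCYRGRNYACA` is our own transcription of the bracketed motif above and ignores the upper/lower-case preference weighting; the function names are illustrative and are not part of any tool used in this study.

```python
import re

# IUPAC nucleotide codes mapped to the bases each code permits.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex with one character class per position."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(seq):
    """A DNA palindrome reads the same 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)


# Assumed IUPAC transcription of the 16-mer motif given in the text.
MOTIF = "TGTRNYCYRGRNYACA"
# 16-mer consensus probe sequence reported later in the text (ST2_099 insert).
PROBE = "TGTATTCTAGAATACA"

pattern = motif_to_regex(MOTIF)
print(bool(pattern.fullmatch(PROBE)), is_palindromic(PROBE))
```

Both checks succeed for the probe sequence, which is what one would expect for a site recognized by a homodimer binding two symmetric half-sites.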

Preferred TTHB099-Binding Sequences Selected Via REPSA
REPSA was used to select for TTHB099-binding sites present in a large pool (~60 billion molecules) of synthesized double-stranded DNA. Our selection library, ST2R24, has been successfully used in four previous studies [8][9][10][11]. IRDye® 700 (IRD7)-labeled library DNA was incubated with purified TTHB099 protein to permit specific binding, then challenged by a type IIS restriction endonuclease (IISRE). Sequence-specific binding of TTHB099 to a subset of the library protected those oligonucleotides from endonuclease activity, thereby permitting their amplification by PCR. Seven rounds of binding, IISRE cleavage, and PCR resulted in the enrichment of DNA resistant to IISRE cleavage when TTHB099 was present (Figure 1, Round 7). Note that in Round 4, substantial uncut DNA appeared on the IISRE control lane (-/F) as well as the test lane (+/F). This nonspecific cleavage inhibition has been observed before and has been ascribed to the selection of FokI cleavage-resistant sequences [8,13]. Thus, subsequent rounds of REPSA were performed with an alternative, albeit less efficient, IISRE (BpmI).
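To give a feel for the scale of such a selection, the sketch below works through some back-of-the-envelope arithmetic for a pool of ~60 billion molecules. It assumes that the library carries 24 fully randomized positions (inferred here from the name ST2R24, not stated in the text), uniform base composition, and sites lying entirely within the randomized region; all function names are illustrative.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

POOL_MOLECULES = 60e9     # "~60 billion molecules" from the text
RANDOM_LEN = 24           # ASSUMPTION: number of randomized positions in ST2R24
SITE_LEN = 16             # length of the palindromic TTHB099 motif


def picomoles(n_molecules):
    """Convert a number of molecules to picomoles."""
    return n_molecules / AVOGADRO * 1e12


def sequence_space(random_len):
    """Number of distinct sequences possible with random_len randomized positions."""
    return 4 ** random_len


def expected_exact_sites(pool_molecules, random_len, site_len):
    """Expected number of pool molecules carrying one exact site_len-mer.

    Each molecule offers (random_len - site_len + 1) windows, and a given
    window matches a fully specified site with probability 4**-site_len.
    For a palindromic site both strands read identically, so windows are
    counted once.
    """
    windows = random_len - site_len + 1
    if windows <= 0:
        return 0.0
    return pool_molecules * windows / 4 ** site_len


print(f"pool size: {picomoles(POOL_MOLECULES):.2f} pmol")
print(f"fraction of sequence space sampled (upper bound): "
      f"{POOL_MOLECULES / sequence_space(RANDOM_LEN):.1e}")
print(f"expected copies of one exact 16-mer: "
      f"{expected_exact_sites(POOL_MOLECULES, RANDOM_LEN, SITE_LEN):.0f}")
```

Under these assumptions any single exact 16-mer would be expected only on the order of a hundred times in the starting pool, which is one way to see why several rounds of selection and amplification are needed before binding sequences dominate.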
Before proceeding, it was prudent to validate our selection of TTHB099-binding sequences. To do so, REPSA-selected DNA was subjected to a restriction endonuclease protection assay (REPA), which closely resembles the binding and IISRE cleavage steps of REPSA [14]. The inclusion of a different fluorophore-labeled control DNA in these reactions permitted the discrimination of TTHB099-specific and nonspecific IISRE cleavage inhibition. Round 7 DNA exhibited the pattern of cleavage protection expected for a majority population of TTHB099-binding DNA (Figure 2A). However, Round 4 DNA exhibited TTHB099-independent cleavage protection of the selected DNA, consistent with a majority being refractory to cleavage by the IISRE FokI. Additional validation was achieved using an electrophoretic mobility shift assay (EMSA) to directly visualize TTHB099-DNA complexes. In this independent assay, different concentrations of TTHB099 protein were incubated with Round 1 and Round 7 DNA prior to electrophoresis (Figure 2B). The slower mobility of the DNA-protein complex present in Round 7 but not in Round 1 DNA indicated that a substantial portion of the selected sequences contained stable TTHB099 binding sites. The results from REPSA and EMSA encouraged further studies to determine TTHB099-DNA binding sequences.

Figure 2. … The presence (+) or absence (−) of TTHB099 and IISRE FokI (F) or BpmI (B) are indicated above each lane. The electrophoretic mobility of the intact (T) and cleaved (X) IRD8-labeled REPSAis control DNA (green, Tc and Xc), IRD7-labeled ST2R24 selection template (red, Ts and Xs), primer-dimers (D), as well as the IRD7_ST2R primer (P) are indicated at the right of the figure and color-coded to match the fluorescently labeled DNA present. (B) Shown are IR fluorescence images of electrophoretic mobility shift assays made with DNA mixtures obtained from Round 1 (left lanes) and Round 7 (right lanes) of REPSA selection incubated with increasing concentrations of TTHB099 protein (from left to right: 0, 5.06, 50.6, 506, and 5060 nM TTHB099). The electrophoretic mobility of a single protein-DNA complex (S) as well as the uncomplexed ST2R24 selection template (T) and IRD7_ST2R primer (P) are indicated at the right of the figure.

Identification and Characterization of TTHB099-Binding Motif
For massively parallel sequencing of the REPSA-selected DNA, Round 7 DNA was amplified with fusion PCR primers, purified, and amplified by emulsion PCR onto individual sequencing particles (ISPs). The enriched ISPs were subjected to next-generation semiconductor sequencing using an Ion Personal Genome Machine (PGM) system. The multiplexed sequencing run yielded 6,921,164 total bases, of which 6,169,384 were ≥ Q20, resulting in 120,585 reads of 57-bp mean length for the TTHB099 Round 7 DNA. A randomly selected set of 1000 reads was submitted to the web version (5.0.5) of Multiple Em for Motif Elicitation (MEME) and analyzed using default parameters with and without a palindromic filter [15]. The output position weight matrices displayed the best 23-mer motif without filters, with an E-value of 2.4 × 10⁻²²³⁴ (Figure 3A), and the best 16-mer palindromic motif, with an E-value of 2.4 × 10⁻²⁸⁷¹ (Figure 3B). These statistically significant results indicate that the identified motifs are likely consensus sequences for the TTHB099 transcription factor.
Noting that the nonpalindromic sequence logo is an extended version of the palindromic one, with seven marginally significant nucleotides upstream, we postulated that the palindromic logo is a better representation of the TTHB099 consensus DNA-binding sequence. To test this hypothesis, the 16-mer sequence 5′-TGTATTCTAGAATACA-3′ was incorporated into an ST2 background, yielding the probe ST2_099. A fixed concentration of IRD7-labeled ST2_099 was incubated with increasing concentrations of purified TTHB099 protein to permit specific binding, and the resulting products were analyzed by EMSA (Figure 4). We found that the TTHB099-ST2_099 complex exhibited electrophoretic mobility similar to that observed with the TTHB099-Round 7 DNA complex (Figure 2B).
Biolayer interferometry (BLI) was used to characterize TTHB099-consensus DNA interactions. This approach measures in vitro real-time interactions between macromolecules, including proteins and nucleic acids [16]. Our BLI analysis involved a biotinylated consensus sequence, ST2_099, affixed onto streptavidin sensors and interacting with increasing TTHB099 protein concentrations in solution. This provided a qualitative observation of protein-DNA association and dissociation kinetics (Figure 5A). The most substantial interactions were observed for the highest concentrations of TTHB099 (450 nM (red) and 150 nM (green)). An arbitrary DNA sequence, ST2_REPSAis, was tested as a control DNA (Figure 5B). It demonstrated binding interactions that were below our experimental detection levels, consistent with a low TTHB099-REPSAis affinity. Another outcome of this study was the quantitative evaluation of the TTHB099-consensus binding affinity. Association and dissociation rates were obtained by least-squares regression analysis in GraphPad Prism 8, and a dissociation constant was derived from those rates. TTHB099 interacting with its consensus sequence had a KD of 2.214 nM with an R² value of 0.9883.
Further characterization of TTHB099-DNA binding was made using selected point mutations of its consensus sequence and BLI. Binding kinetics data, including association rate (kon), dissociation rate (koff), and the dissociation constant, were derived for each of the mutated sequences and are displayed in Table 1. As observed with the m2 mutant, a single change in a highly conserved nucleotide of the consensus sequence decreased the binding affinity 15-fold. Even point mutations of less conserved positions (e.g., m5) decreased affinity 2-fold. These data suggest that TTHB099 binding to DNA is highly sequence-specific. Additionally, the nanomolar dissociation constant we observed indicates that our consensus sequence is a good representation of the native TTHB099's preferred sequences in T. thermophilus HB8. Notably, TTHB099-DNA binding is not affected by the absence or presence of the second messenger 3′,5′ cAMP, unlike its archetype protein CRPEc [17].
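The relationship between the kinetic rates and the dissociation constant can be made concrete with a short sketch. It assumes a simple 1:1 binding model, in which KD = koff/kon and the equilibrium fraction of DNA bound is [P]/([P] + KD); this is a common simplification and not necessarily the exact model fitted in GraphPad Prism. The rate constants below are hypothetical values chosen only to reproduce a KD of 2.214 nM; the KD values themselves are those quoted in the text.

```python
def kd_from_rates(k_on, k_off):
    """Dissociation constant (M) from k_on (M^-1 s^-1) and k_off (s^-1)."""
    return k_off / k_on


def fraction_bound(protein_nM, kd_nM):
    """Equilibrium fraction of DNA sites occupied under a 1:1 binding model."""
    return protein_nM / (protein_nM + kd_nM)


def fold_change_in_kd(kd_mutant_nM, kd_reference_nM):
    """How many times weaker (larger KD) a mutant site binds than the reference."""
    return kd_mutant_nM / kd_reference_nM


KD_CONSENSUS_NM = 2.214  # consensus KD reported in the text

# HYPOTHETICAL rate constants, chosen only so that k_off / k_on = 2.214 nM.
k_on_example = 1.0e5      # M^-1 s^-1
k_off_example = 2.214e-4  # s^-1
print(f"KD = {kd_from_rates(k_on_example, k_off_example) * 1e9:.3f} nM")

# Expected site occupancy at the TTHB099 concentrations highlighted in Figure 5A.
for conc_nM in (450, 150):
    print(f"{conc_nM} nM TTHB099: {fraction_bound(conc_nM, KD_CONSENSUS_NM):.3f} bound")

# Range of mutant KD values reported in the Discussion (4.86 to 33.6 nM).
for kd_mutant in (4.86, 33.6):
    print(f"KD {kd_mutant} nM is {fold_change_in_kd(kd_mutant, KD_CONSENSUS_NM):.1f}-fold weaker")
```

The two ratios, about 2.2 and 15.2, are consistent with the roughly 2-fold and 15-fold reductions in affinity described for the point mutants.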

T. thermophilus HB8 Genome-Wide Mapping of the TTHB099-Binding Motif
The Find Individual Motif Occurrences (FIMO) program was used to scan the T. thermophilus HB8 genome (GenBank uid13202 210) for the 16-mer palindromic sequence identified through MEME [18]. FIMO revealed 78 motif occurrences with a p-value of less than 0.0001. The top 25 results, with p-values ≤ 3.95 × 10⁻⁵, are shown in Table 2. The locations of these 25 sequences relative to the transcription start site of their proximal downstream genes were determined using the Kyoto Encyclopedia of Genes and Genomes (KEGG) and verified in the National Center for Biotechnology Information (NCBI) database [19,20]. Furthermore, operon predictions for each location were made using the Database of PrOkaryotic OpeRons (DOOR2) and BioCyc [21,22]. Sixteen of these sites were situated within the −200 to +20 nucleotide region most common for transcription activator binding. Furthermore, their proximal downstream genes were the first of their operons or single transcriptional units, making these sites stronger candidates for TF regulation. The other nine sites were omitted from further analysis because they were located further downstream, inside open reading frames, or, as in the case of TTHC003, too far upstream (−666 nucleotides).
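The filtering logic described above (a p-value cutoff followed by a positional window relative to the TSS) can be sketched as follows. This is only an illustration of the selection criteria, not the procedure actually used with KEGG and NCBI; the gene labels, offsets, and p-values are hypothetical placeholders, apart from the −666 offset, which mirrors the TTHC003 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifSite:
    gene: str       # proximal downstream gene (placeholder labels below)
    offset: int     # site position relative to the TSS in bp (negative = upstream)
    p_value: float  # motif match p-value


def in_regulatory_window(offset, upstream=-200, downstream=20):
    """True if the site lies within the -200 to +20 window around the TSS."""
    return upstream <= offset <= downstream


def select_candidates(sites, p_cutoff=1e-4):
    """Keep sites passing the p-value cutoff and the window, best p-value first."""
    kept = [
        site for site in sites
        if site.p_value < p_cutoff and in_regulatory_window(site.offset)
    ]
    return sorted(kept, key=lambda site: site.p_value)


# HYPOTHETICAL sites for illustration only.
sites = [
    MotifSite("gene_A", -45, 2.0e-6),   # passes both filters
    MotifSite("gene_B", 150, 1.0e-5),   # too far downstream (inside an ORF)
    MotifSite("gene_C", -666, 3.0e-5),  # too far upstream, as for TTHC003
    MotifSite("gene_D", -120, 5.0e-4),  # fails the p-value cutoff
]

print([site.gene for site in select_candidates(sites)])
```

In the study itself, the additional requirement that the downstream gene be the first of its operon or a single transcriptional unit was applied on top of this positional filter.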
To better ascertain a potential role for TTHB099 in regulating transcription, all 16 sequences selected from FIMO were analyzed for potential core promoter elements. Sequences extending 200 bp upstream and downstream of the FIMO-identified TTHB099-binding sites were evaluated in SoftBerry BPROM (Figure 6) [23]. Many sequences (9/16) contained a TTHB099-consensus sequence that overlapped with at least one promoter element (−35 box, −10 box, +1 start site). Those included TTHA0081/80, TTHA0507, TTHA0133, TTHA1833, TTHA1912, TTHA0202, TTHA0374, and TTHA1627. Three of the TTHB099-binding sequences, TTHA0506, TTHB089, and TTHA0201, were located upstream of the nearby −35 box. Conversely, TTHB088 and TTHA1626 had their putative TTHB099-binding sequences located downstream of the postulated promoter elements. There were no identified promoter elements near TTHA0132 and TTHA1911. It is not clear why BPROM was unable to identify any core promoter elements for these genes, but limitations could arise from a potential difference between core promoter elements in E. coli, the model organism used by BPROM, and those of T. thermophilus HB8.
Figure 6. … (Table 2). Blue nucleotides represent the longest open reading frames with a downstream orientation relative to the TTHB099 binding site; green nucleotides indicate open reading frames with the opposite orientation; black nucleotides indicate intergenic regions. Potential promoter elements (−35 and −10 boxes, +1 start site of transcription) are indicated with cyan highlighting; TTHB099-binding sites are indicated with yellow highlighting; overlapping TTHB099-binding and core promoter elements are indicated by green highlighting.
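The positional categories used in the promoter analysis above (overlapping a promoter element, upstream of the −35 box, downstream of the promoter elements, or no elements found) amount to a simple interval comparison. The sketch below illustrates that logic with hypothetical coordinates; it does not reproduce BPROM output, and the function names are our own.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_site(site, elements):
    """Classify a binding site relative to predicted core promoter elements.

    site and each entry of elements are inclusive (start, end) tuples in a
    common coordinate system that increases in the direction of transcription.
    """
    if not elements:
        return "no promoter elements found"
    if any(overlaps(site, element) for element in elements):
        return "overlaps promoter element"
    first_start = min(element[0] for element in elements)
    last_end = max(element[1] for element in elements)
    if site[1] < first_start:
        return "upstream of promoter elements"
    if site[0] > last_end:
        return "downstream of promoter elements"
    return "between promoter elements"


# HYPOTHETICAL coordinates relative to the start site: -35 box, -10 box, +1.
promoter = [(-35, -30), (-12, -7), (1, 1)]

print(classify_site((-45, -30), promoter))  # 16-bp site reaching into the -35 box
print(classify_site((-70, -55), promoter))  # site lying entirely upstream
print(classify_site((10, 25), promoter))    # site lying entirely downstream
print(classify_site((-45, -30), []))        # no elements predicted
```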

Validation of Potential TTHB099-Regulated Genes
In addition to analyzing the locations of the binding sequences relative to the TSS and to promoter elements, we investigated the affinity of TTHB099 protein for the selected sequences. To better understand how TTHB099 regulates genes identified through FIMO, all 16 sequences underwent binding kinetics analysis using BLI. As some TTHB099 binding sites are shared by two bidirectional promoters, only nine unique sequences were synthesized as biotinylated double-stranded oligonucleotides. Binding reactions containing four different concentrations of TTHB099 (450, 150, 50, and 17 nM) were tested against each binding site probe (Table 3). The strongest binding was observed for TTHA1833 and TTHB088/89, with KD values below 10 nM. The genes with binding affinities between 10 and 100 nM were TTHA1911/12, TTHA0506/07, and TTHA0080/81 in increasing order. TTHA1626/27, TTHA0132/33, and TTHA0201/02 displayed the weakest binding, with KD values > 100 nM, while binding to TTHA0374 could not be detected under our experimental conditions. Interestingly, these binding parameters do not always follow the sequence order defined by FIMO, suggesting that there could be other factors at play that are not considered by this in vitro analysis.
Further validation of TTHB099 involvement in the transcriptional regulation of these genes was sought through the analysis of prior DNA microarray studies, publicly available through the National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) [24]. A GEO2R comparison (SuperSeries GSE21875, Supplementary Table S2) of expression profile data from sets of TTHB099-deficient and wild type strains was used to determine if the absence of TTHB099 produced any substantial changes in the expression of the FIMO-identified genes. Of these genes, only TTHA1626 displayed substantially increased expression, with a logFC of 2.62. The remaining 15 genes had only small, non-significant changes, as shown in Table 4. Likewise, individual genes within their respective operons did not seem to have any significant changes.
As an additional approach to better understand potential gene regulation by TTHB099, we investigated the postulated biological functions of these genes. Many were reported only as encoding hypothetical proteins, which is fairly common in T. thermophilus. Several encoded proteins that may be involved in sugar metabolism (malate synthase, 3-isopropylmalate dehydratase), energy metabolism (3-isopropylmalate dehydratase large subunit, homoaconitate hydratase small subunit), or transport. Most interestingly, two genes (TTHA0134 and TTHA0507) are believed to encode transcriptional regulators. If so, their expression could complicate the identification of directly TTHB099-regulated genes by GEO2R.
Table 4 notes. … [19]. (LogFC) Log2-fold change between data obtained from TTHB099-deficient (accessions GSM530118/20/22) and wild-type (accessions GSM532194/5/6) T. thermophilus HB8 strains, SuperSeries GSE21875. (Adj. p-value) The p-value obtained following multiple testing corrections using the default Benjamini and Hochberg false discovery rate method [25].
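For readers less familiar with the two quantities reported in Table 4, the sketch below shows how a log2-fold change converts to a linear fold change and how Benjamini and Hochberg adjusted p-values are computed from raw p-values. It is a plain-Python illustration of the standard procedure, not the GEO2R implementation, and the raw p-values in the example are hypothetical.

```python
def fold_change(log_fc):
    """Convert a log2-fold change into a linear fold change."""
    return 2.0 ** log_fc


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by n / rank (rank 1 = smallest p-value), and a
    running minimum taken from the largest rank downward keeps the adjusted
    values monotonic and capped at 1.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# logFC of 2.62 reported for TTHA1626 corresponds to roughly a 6-fold increase.
print(f"{fold_change(2.62):.2f}-fold")

# HYPOTHETICAL raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in benjamini_hochberg(raw)])
```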
Another analysis of the GEO2R data focused on the genes that were affected most by the absence of TTHB099 (Table 5). These genes could be grouped into operons, suggesting that their expression was not affected by multiple unrelated TFs but rather by a fundamental regulatory mechanism involving TTHB099. The upregulated genes (50/67, 75%) were involved in the electron transport chain (ETC) of oxidative phosphorylation, carbohydrate metabolism, bacterial motility, and osmotic stress defense. The downregulated genes (17/67, 25%) were related to ribosomal proteins, ion ABC transporters, and ATPases. MEME analysis of the −300/+100 bp sequences upstream of each operon did not find our TTHB099 consensus sequence or reveal any additional binding motifs. Taken together, this suggests a complicated mechanism for the regulation of these genes that may not involve TTHB099 directly regulating their transcription.

Discussion
In this study, an in vitro iterative selection method, REPSA, was used to annotate the TTHB099 transcription regulator in T. thermophilus HB8. This, coupled with next-generation sequencing and MEME motif elicitation, allowed for the identification of the TTHB099-DNA binding motif, a 16-bp palindromic sequence, 5′-TGT(A/g)n(t/c)c(t/c)(a/g)g(a/g)n(T/c)ACA-3′, with a consensus half-site 5′-T₁G₂T₃(A/G)₄N₅(T/C)₆C₇(T/C)₈-3′. Binding kinetics between TTHB099 and its consensus sequence, as well as single point mutations within its half-site, were investigated using BLI. TTHB099 protein bound the 16-mer consensus sequence with high affinity (KD = 2.21 nM) and the point-mutated sequences with KD values in the range of 4.86 to 33.6 nM, with mutations at the second and third positions having the greatest effect. The different binding affinities for each mutated sequence mirrored the MEME results represented by the TTHB099 sequence logo. Our report is the first time a consensus sequence has been identified for TTHB099.
Interestingly, our sequence has a strong resemblance to the E. coli CRP (CRPEc) consensus sequence, 5′-AAATGTGATCTAGATCACATTT-3′ [26]. In both cases, the trimers "TGT" and "ACA" are highly conserved and are considered most significant for TF binding. This resemblance could be related to the homology between the two proteins previously reported by Agari et al. [12]. However, E. coli and T. thermophilus HB8 are not only phylogenetically distant, but they also live in entirely different environments, mesophilic and extremophilic, respectively [27]. Hence, the biological roles of TTHB099 need not necessarily be the same as those of CRPEc. This is most evident in the observation that TTHB099 does not require the second messenger 3′,5′ cAMP to bind DNA, which is required by CRPEc.
Having found and validated a consensus TTHB099-binding sequence, we mapped it onto the genome of T. thermophilus HB8 to help identify potential TTHB099-regulated genes. Using FIMO, the MEME-derived position weight matrix version of our consensus sequence recognized 78 sequences. The top 25 sequences with the best p-values were selected for further validation. It is important to note that the p-values derived were not as small as found in our previous studies, due to the ten poorly conserved positions in the middle of the TTHB099 consensus sequence palindrome, which affected the dynamic programming algorithm of FIMO. Our analysis of the TTHB099 binding site location relative to the TSS of the proximal downstream genes showed that almost half of the identified sites were located inside open reading frames, which is not typical for traditional transcription factors. Notably, no potential TTHB099 binding site was found near its own gene. This could imply that the TTHB099 TF by itself has no direct regulatory role over its operon litR (TTHB100, TTHB099, TTHB098) or the divergent crtB operon (TTHB101, TTHB102) that shares a common intergenic region. Autoregulation is a common feature for many prokaryotic TFs, including members of the CRP family, but may not be a characteristic of TTHB099 unless in an auxiliary fashion [28].
The promoter analysis revealed that nine TTHB099-binding sites overlapped with potential core promoter elements, a TF-promoter interaction characteristic of Class II transcription activators, as well as of transcription inhibition via steric hindrance. Additionally, three sequences are located upstream of the −35 box, fitting the Class I activator model, while two are downstream of the −10 box, a model used by both transcription activators and repressors. These variations in binding position suggest that TTHB099 could be either an activator or a suppressor. Indeed, the dual regulatory role is common in global regulators such as CRPEc [29]. Moreover, eight pairs of the TTHB099-binding sequences were found in the intergenic region of divergent genes, another characteristic of dual-regulators [30].
Biophysical studies performed with BLI were used to further our understanding of TTHB099 interaction with the identified sites. The equilibrium dissociation constants were below the micromolar range, showing that TTHB099 had some appreciable affinity for the tested sites. However, variations as high as 200-fold were observed. These KD changes did not follow any particular trends, such as the p-value order established through FIMO. Neither did the sites with the highest affinity have similarities in terms of promoter location or presumed manner of transcription regulation. For example, the TTHB099 binding sequence with the highest affinity (3.05 nM) was located in the intergenic region and overlapped with the −35 box upstream of TTHA1833. The TTHB099 binding sequences with the next lowest KD were also situated in the intergenic regions, but they were located upstream and downstream of the TTHB088/89 promoters, respectively. Such biophysical results emphasize the importance of experimental validation of theoretically determined sites.
Our BLI binding studies are limited to the simple interactions of purified protein with synthesized DNAs in the absence of any environmental or biological factors. Knowing that the transcription regulation apparatus can be complex, we decided to complement our in vitro study with data from in vivo expression profiles. Using publicly available expression profile data from the matched wild type and TTHB099-deficient T. thermophilus HB8 strains, operons of the 16 potentially regulated genes were investigated. We found that the mRNAs of these genes were not significantly affected by the deficiency of TTHB099. These results suggest that TTHB099 does not have, on its own, any appreciable regulatory roles over these genes in exponentially propagating wild type organisms.
Nonetheless, TTHB099 deficiency does appreciably affect the expression of several genes in exponentially propagating T. thermophilus HB8. We identified 19 operons, 12 of which were overexpressed (positively affected) in the deficient strains. The upregulated set of genes was involved in the electron transport chain (ETC) of oxidative phosphorylation, sugar metabolism, type IV pilin related proteins, and one osmotically inducible protein, consistent with TTHB099 being a transcriptional repressor. Conversely, there were seven under-expressed operons, or a total of 17 genes, in the TTHB099-deficient strains, suggesting that TTHB099 may act as an activator for these genes. The downregulated genes encoded ribosomal proteins, iron ABC transporters, and ATPases. Notably, the most affected operons in the TTHB099-deficient strain were involved in metabolic pathways that have been reported to be regulated by the archetype CRPEc [31]. For example, ribosome related genes were downregulated in the absence of TTHB099, similar to what Pal et al. reported for their evolutionary expressed CRPEc-deficient strains [32]. Likewise, iron transport genes were downregulated in the absence of TTHB099, similar to what was observed in the absence of CRPEc, as Zhang et al. reported [33]. Such results indicate that TTHB099 does have some biological functions similar to those of CRPEc. However, these regulatory roles do not seem to be affected by changes in cAMP concentration. Moreover, a MEME search for a consensus sequence among the 19 most-affected operons identified via the GEO data failed to bring up any significant motifs. Thus, the hypothesis of a simple regulatory mechanism is once more unsupported.
TT_P0055 from T. thermophilus HB27, an ortholog of TTHB099 with only one amino acid substitution (E77D), has been reported to be a positive regulator of the crtB operon, which in turn is involved in light-dependent carotenoid biosynthesis [33]. However, the reported functional effects of TT_P0055 on carotenoid production lack details on the mechanism of regulation and could indicate that TT_P0055 has indirect control over crtB activation. The homology between the HB27 and HB8 strains, particularly in this regulatory complex (TT_P0055 and TTHB099 proteins, their intergenic regions, and their crtB operons), would suggest similar biological functions for the two TFs. When analyzing the GEO expression data in the absence of TTHB099, there is no detectable change in crtB genes. These results could be attributed to the absence of light in the experimental conditions required to deplete the litR transcriptional repressor of TT_P0055, the latter positively regulating carotenoid production [34].
Because TTHB099 does not seem to have any observable binding to the crtB promoter, the study published by Ebright et al. centered on TTHB099 binding upstream of TTHB101 is based on a prediction that is not firmly established [35]. Hence, Ebright's claim that TTHB099 is a model class II transcription activator may need to be reconsidered in light of our new findings.
Looking for a connection between the genes found via the REPSA-identified consensus sequence and the genes affected by TTHB099 deficiency, as determined by GEO2R, we found that five of the affected operons (30 genes) had an upstream binding sequence identified by FIMO. Interestingly, these binding sites were located about 0.9 to 4 kbp upstream of the most affected operons. Such behavior could be explained by TTHB099 acting as an enhancer or silencer. These elements do exist in the prokaryotic world but not in large numbers. To date, the identified prokaryotic enhancers regulate only a few promoters used by σ54-directed RNA polymerases [36]. Knowing that T. thermophilus HB8 does not have a σ54 homolog, it becomes even more challenging to suggest that TTHB099 can function as an enhancer/silencer. Future studies could be designed to analyze potential interactions of TTHB099 with other TFs, supporting the hypothesis of a complex regulatory mechanism involving distal enhancer/silencer elements. As for TTHB099 being an activator or a suppressor, all our data point towards a dual regulatory role.

Preparation of Oligonucleotides
Single-stranded oligonucleotides used in this study (Supplementary Table S3) were obtained from Integrated DNA Technologies (Coralville, IA). ST2R24 library DNA used for the initial REPSA round was PCR amplified with primers ST2L and IRD7_ST2R for seven cycles to ensure maximal double-stranded DNA content with fully annealed randomized cassette regions. Subsequent REPSA round DNAs were PCR amplified for 6, 9, and 12 cycles to identify those products with optimal cassette integrity. Libraries for massively parallel semiconductor sequencing were prepared by a two-step fusion PCR process, using primers A_BC01_ST2R and trP1_ST2L as the initial set and A_uni and trP1_uni as the second set, as previously described [8]. Other duplex DNAs were prepared by conventional PCR amplification following the Taq DNA polymerase manufacturer's instructions. EMSA probes were amplified with primers ST2L and IRD7_ST2R, while nucleic acids used in BLI assays were amplified with primers ST2L and Bio_ST2R. The concentrations of the modified oligonucleotides were measured with a Qubit 3 Fluorometer following our protocol [37].

TTHB099 Protein Expression and Purification
TTHB099 protein was expressed following IPTG induction of E. coli BL21(DE3) bacteria transformed with plasmid PC014099-42 (obtained from RIKEN Bioresource Research Center) and purified from soluble bacterial extracts by heat-treatment as described in our previous study [11]. SDS-PAGE analysis of fractions from purification steps is shown in Supplementary Figure S2A and is consistent with a near quantitative recovery of TTHB099 protein. Analysis by quantitative densitometry with Coomassie Brilliant Blue staining indicated that the purified TTHB099 preparation had a final concentration of 50.6 µM (Supplementary Figure S2B).

TTHB099-Consensus Sequence Determination
REPSA selections with 50.6 nM TTHB099 were performed essentially as previously described [8], with the exception that 3.2 U FokI were used in Rounds 1-4 and 8 U BpmI were used in Rounds 5-7. Furthermore, the Round 1 reactions were seeded with 4.515 ng (100 fmol) ST2R24 DNA pool. The PCR amplification reactions were adjusted to use 560 nM of primers and 25 U NEB Taq polymerase. Finally, the annealing and elongation temperatures were adjusted to 58 °C and 68 °C, respectively.
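The stated equivalence of 4.515 ng and 100 fmol for the ST2R24 pool can be checked with a short calculation. The sketch below derives the molar mass implied by those two figures and, as a rough cross-check only, converts it to an approximate duplex length. The function names are hypothetical, and the figure of about 650 g/mol per base pair is a common average for double-stranded DNA, not a value taken from this study.

```python
def implied_molar_mass(mass_ng, amount_fmol):
    """Molar mass in g/mol implied by a mass (ng) and an amount (fmol)."""
    grams = mass_ng * 1e-9
    moles = amount_fmol * 1e-15
    return grams / moles


def approx_duplex_length(molar_mass, grams_per_mol_per_bp=650.0):
    """Approximate duplex length in bp.

    Assumption: an average of ~650 g/mol per base pair, a rule of thumb
    that ignores sequence composition and end modifications.
    """
    return molar_mass / grams_per_mol_per_bp


if __name__ == "__main__":
    molar_mass = implied_molar_mass(4.515, 100)
    print(f"Implied molar mass: {molar_mass:.0f} g/mol")
    print(f"Approximate length: {approx_duplex_length(molar_mass):.1f} bp")
```

The length estimate depends on the per-base-pair average chosen and should be read only as an order-of-magnitude check on the seeding amount.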
The amplicon library preparation, Ion PGM individual sequencing particle (ISP) preparation, Ion PGM semiconductor sequencing, and Ion Torrent server sequence processing were all performed as previously described [8]. Resulting raw sequences in fastq format (Supplementary Data S1) were further processed by our Sequencing1.java program [8] and DuplicatesFinder v 1.1 (http://proline.bic.nus.edu.sg/~asif/tools/DuplicateFinder.zip) to yield data (Supplementary Data S2) suitable for consensus sequence determination by web version 5.0.5 of Multiple Em for Motif Elicitation (MEME) (http://meme-suite.org/tools/meme) [15]. Position-weight matrices for the top three motifs were determined and displayed as sequence logos, from which a consensus sequence was derived.

Protein-DNA Binding Assays
Electrophoretic mobility shift assays (EMSA) with both libraries and defined DNA were performed as previously described [8], with a detailed protocol being available [38]. Note that EMSA experiments performed with REPSA selected DNAs contain multiple DNA species, including high concentrations of DNA primers, and should not be used to determine apparent binding affinities. Biolayer interferometry was performed as previously described [11], with the exception that only four concentrations of TTHB099 (17, 50, 150, 450 nM) were used for each DNA probe investigated. This was sufficient to yield global values for kon and koff rate constants as well as KD equilibrium binding constants with R² goodness-of-fit determinations of greater than 0.95 in all cases. A single BLI experiment was performed with 2.25 nM consensus (wt) probe, 200 nM TTHB099, and 100 nM 3′,5′-cAMP, to test the effects of cAMP on TTHB099-DNA binding. Its R² value was 0.92.
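For readers less familiar with how the rate constants relate to the equilibrium constant, the following minimal sketch assumes a simple 1:1 binding model, in which KD = koff/kon and the equilibrium fraction of DNA bound is [P]/([P] + KD) when protein is in excess. The rate constants in the example are hypothetical values chosen only to reproduce the 3.05 nM KD reported for the highest-affinity site; they are not measured values from this study, and the actual fitting was done with the instrument analysis described in [11].

```python
def dissociation_constant(k_on, k_off):
    """K_D in M from k_on (1/(M*s)) and k_off (1/s), assuming 1:1 binding."""
    return k_off / k_on


def fraction_bound(protein_nM, kd_nM):
    """Equilibrium fraction of DNA bound, assuming protein is in excess."""
    return protein_nM / (protein_nM + kd_nM)


if __name__ == "__main__":
    # Hypothetical rate constants, picked so that K_D equals 3.05 nM.
    k_on = 1.0e5      # 1/(M*s), illustrative only
    k_off = 3.05e-4   # 1/s, illustrative only
    kd_nM = dissociation_constant(k_on, k_off) * 1e9
    print(f"K_D = {kd_nM:.2f} nM")

    # The four TTHB099 concentrations used for each DNA probe.
    for conc_nM in (17, 50, 150, 450):
        print(conc_nM, "nM ->", round(fraction_bound(conc_nM, kd_nM), 3))
```

Under this model, a site with a KD in the low nanomolar range would be close to saturation at every concentration in the series, whereas weaker sites would span more of the binding curve.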

Conflicts of Interest:
The authors declare no conflict of interest. In addition, the funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.