Discovering the DNA-Binding Consensus of the Thermus thermophilus HB8 Transcriptional Regulator TTHA1359

Transcription regulatory proteins, also known as transcription factors, function as molecular switches modulating the first step in gene expression, transcription initiation. Cyclic-AMP receptor proteins (CRPs) and fumarate and nitrate reduction regulators (FNRs) compose the CRP/FNR superfamily of transcription factors, regulating gene expression in response to a spectrum of stimuli. In the present work, a reverse-genetic methodology was applied to the study of TTHA1359, one of four CRP/FNR superfamily transcription factors in the model organism Thermus thermophilus HB8. Restriction Endonuclease Protection, Selection, and Amplification (REPSA) followed by next-generation sequencing techniques and bioinformatic motif discovery allowed identification of a DNA-binding consensus for TTHA1359, 5′–AWTGTRA(N)6TYACAWT–3′, which TTHA1359 binds to with high affinity. By bioinformatically mapping the consensus to the T. thermophilus HB8 genome, several potential regulatory TTHA1359-binding sites were identified and validated in vitro. The findings contribute to the knowledge of TTHA1359 regulatory activity within T. thermophilus HB8 and demonstrate the effectiveness of a reverse-genetic methodology in the study of putative transcription factors.


Introduction
In bacteria, transcription regulatory proteins or transcription factors function as critical constituents of signal transduction networks, acting upon environmental and cellular cues to modulate the transcriptional program appropriately through their specific binding to control elements within targeted gene promoters [1]. While their functions are traditionally determined through genetic means, this is less feasible in many less-well-studied organisms, where putative transcription factor biological roles are instead often inferred from genomic organization and structural homology [2].
The cyclic-AMP receptor proteins (CRPs) and fumarate and nitrate reduction regulators (FNRs) compose the CRP/FNR superfamily of transcriptional regulators, a diverse subgroup of bacterial transcription factors regulating gene expression in response to a spectrum of stimuli [3]. Currently, insight into the CRP/FNR superfamily primarily derives from past and present research into the founding and representative members of the superfamily, E. coli CRP (CRP_Ec) and FNR (FNR_Ec), respectively [4,5]. Following complexation with the metabolite effector 3′,5′-cAMP, CRP_Ec homodimers adopt a conformation that allows them to bind to DNA sequences with the consensus 5′-AAATGTGAtctagaTCACATTT-3′, thereby regulating hundreds of genes involved with the catabolism of secondary carbon sources [6][7][8]. FNR_Ec, on the other hand, forms homodimers containing two [4Fe-4S] clusters under anaerobic conditions [9], allowing them to bind DNA sequences having the consensus 5′-TTGATnnnnATCAA-3′ and activating the expression of hundreds of genes involved with anaerobic respiration [10][11][12].
Thermus thermophilus HB8 [13] is a model extreme thermophile and is the subject of the Structural-Biological Whole Cell Project, which seeks to understand all cellular biological phenomena at an atomic level [14,15]. High-resolution, three-dimensional structures have been obtained for hundreds of its proteins, owing to their ease of crystallization and X-ray diffraction analysis. The T. thermophilus HB8 genome has been fully sequenced [16], and through homology studies, four CRP/FNR transcription factors have been identified: TTHA1359, TTHA1437, and TTHA1567, encoded by genes in the main chromosome, and TTHB099, encoded by a gene in the megaplasmid pTT27. Structural information has been obtained for three of these proteins, TTHA1359, TTHA1437, and TTHB099, with only TTHA1437 requiring cAMP binding to adopt a conformation conducive to DNA binding [17][18][19]. Additionally, information regarding DNA sequences recognized by these transcription factors and their regulated genes has been published for TTHA1359 and TTHA1437, thereby providing insights into their potential biological functions [17,18,20].
Our laboratory has pioneered a reverse-genetic approach to obtain insights into transcription factors in T. thermophilus HB8 [21][22][23][24][25]. This approach uses the iterative selection method Restriction Endonuclease Protection, Selection, and Amplification (REPSA) to define their consensus DNA-binding sequences and various bioinformatic methods to identify favored genomic binding sites, promoter element homologies, and potential biological roles. Here we describe our investigation of the T. thermophilus HB8 transcription factor TTHA1359, a CRP/FNR superfamily protein, also known as SdrP [18,20]. We found that TTHA1359 preferentially binds a consensus sequence 5′-AWTGTRA(N)6TYACAWT-3′ and identified several genes potentially regulated by this protein.

REPSA Selection of TTHA1359-Binding Sequences
REPSA selections were initiated with approximately 60 billion molecules of IRD7-labeled ST2R24 selection template and 34 nM purified TTHA1359 protein. This provided a good representation of all possible 14-bp recognition sequences (4^14/2 ≈ 134 million) with the potential to identify sequences having nanomolar binding affinity. The first two rounds of REPSA were performed using the type IIS restriction endonuclease (IISRE) FokI, while the final three rounds used the IISRE BpmI. This was done to avoid selecting any intrinsically FokI cleavage-resistant DNAs, which were observed previously [25]. Nondenaturing PAGE analysis of 5′-fluorophore-labeled DNA species through the course of REPSA selection shows that a TTHA1359-dependent, IISRE cleavage-resistant species was first observed with Round 5 DNA and constituted over 50% of the product DNA (Figure 1). These data are consistent with the successful selection of DNAs containing high-affinity TTHA1359-binding sequences.
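The library-coverage estimate in the preceding paragraph is simple arithmetic and can be checked with a short sketch. The function and variable names below are illustrative and not part of the published workflow. The division by two follows the text's approximation that a duplex sequence and its reverse complement are the same site.

```python
def distinct_sites(k: int) -> int:
    """Approximate number of distinct k-bp duplex sites, counting a sequence
    and its reverse complement once (the 4**k / 2 estimate used in the text)."""
    return 4 ** k // 2


# ~60 billion template molecules, as stated in the text.
library_molecules = 60_000_000_000

sites_14 = distinct_sites(14)
copies_per_site = library_molecules / sites_14

print(f"{sites_14:,} distinct 14-bp sites")
print(f"about {copies_per_site:.0f} template molecules per site")
```

Under these assumptions, each possible 14-bp site is represented by several hundred template molecules in the starting library, which is why the text describes the representation as good.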
To validate REPSA selection of high-affinity TTHA1359-binding sequences, an independent protein-DNA binding assay, EMSA (electrophoretic mobility shift assay), was performed on the initial and final selected populations of DNA. DNAs from Round 5 demonstrated a shift to a single, slower mobility species for almost all input DNA following incubation with 100 nM dimeric TTHA1359, with the first indication of this species being observed at 10 nM dimeric TTHA1359 ( Figure 2). No comparable effects were observed with Round 1 DNAs, even following incubation with 1000 nM dimeric TTHA1359. Taken together, these data indicate that REPSA was successful in selecting for DNAs that can form stable complexes with TTHA1359 and that a majority of our selected DNAs contained TTHA1359 binding sites.

Identification of Consensus TTHA1359-Binding Sequences
To determine consensus DNA-binding sequences for TTHA1359, REPSA Round 5 DNA was sequenced. Massively parallel semiconductor sequencing of a synthesized amplicon library yielded 9,516,545 total bases, of which 8,631,131 had a base-calling quality score ≥Q20, in 158,313 reads of 60 bp average length. Sequencing1.java refinement reduced this to 61,754 sequences. Duplicates constituted less than 0.5% of the refined sequences.
Sets of 1000 refined sequences were submitted to MEME analysis [26], both with and without palindromic filtering. The top non-palindromic motif was a 19-nt sequence within a 24-nt span and was present in 789/1000 input sequences, with a statistical significance E-value of 1.1 × 10^−2996. The top palindromic motif was 24 nt long, was found in 810/1000 input sequences, and had an E-value of 4.5 × 10^−1153. Both are rendered as sequence logos (Figure 3). Notably, the second-best motifs for each analysis, a 15-nt non-palindromic and a 16-nt palindromic motif, had far less significant E-values (4.2 × 10^−264 and 1.5 × 10^−51, respectively), in part due to their reduced lengths. We defined a 20-bp inverted repeat, derived primarily from the top non-palindromic motif, to serve as the TTHA1359-DNA binding consensus. This may be thought of as two 7-bp recognition elements in inverted orientation separated by a 6-bp spacer region and is shown in Figure 3C. Such a motif would be expected for a CRP-family protein, which typically binds spaced, inverted repeat sequences as homodimers [27].
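To make the structure of the consensus concrete, the following minimal sketch expands the IUPAC consensus 5′-AWTGTRA(N)6TYACAWT-3′ into a regular expression. It checks that the consensus equals its own reverse complement and searches a DNA string for exact matches. This is not how MEME or FIMO score sites, since both use probability matrices. The helper names and the test sequence are illustrative assumptions.

```python
import re

# IUPAC symbols used in the consensus, expanded to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
# Complement of each IUPAC symbol (W and N are self-complementary).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "W": "W", "R": "Y", "Y": "R", "N": "N"}

CONSENSUS = "AWTGTRA" + "N" * 6 + "TYACAWT"  # 7-bp element, 6-bp spacer, 7-bp element


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence written in the IUPAC symbols above."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_sites(dna: str, consensus: str = CONSENSUS):
    """Return (start, site) for every exact consensus match, overlaps included."""
    body = "".join(IUPAC[symbol] for symbol in consensus)
    pattern = re.compile(f"(?=({body}))")  # lookahead allows overlapping hits
    return [(m.start(), m.group(1)) for m in pattern.finditer(dna.upper())]


# Illustrative probe: a 20-bp site matching the consensus, flanked by GG/CC.
example = "GG" + "ATTGTGACACACATCACAAT" + "CC"
print(reverse_complement(CONSENSUS) == CONSENSUS)
print(find_sites(example))
```

Because the consensus is its own reverse complement, scanning one strand is sufficient for exact matches.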

Biophysical Characterization of TTHA1359-DNA Binding
Biolayer interferometry (BLI) assays, which ascertain binding kinetics in real time by measuring the optical interference pattern of reflected white light upon macromolecular interaction with a biosensor, were performed to characterize TTHA1359-DNA binding interactions [28]. Raw BLI data (dots) for a range of TTHA1359 concentrations interacting with consensus or control sequences are shown in Figure 4. Nonlinear regression analysis of these data yielded the best-fitted association and dissociation curves (solid lines). From these, kinetic parameters, including association (k_on) and dissociation (k_off) rate constants, were derived. These, together with the dissociation constants (K_D) and R² coefficients of determination, are reported in Table 1. We also utilized BLI to test TTHA1359 binding to point mutations (Table 1; wt_p*) and insertion/deletion mutants within the spacer region (Table 1; wt_s*) of our REPSA-identified consensus sequence. Several of the mutant consensus sequences and a neutral control sequence, REPSAis (ctl), could not have their binding parameters determined under our experimental conditions, ostensibly given their weak binding by TTHA1359 (i.e., K_D > 1000 nM). We found that TTHA1359 bound its consensus sequence with high affinity (3.447 nM), in line with DNA binding by other CRP proteins [6]. Single point mutations of this sequence in just one of the 7-bp recognition elements reduced binding by tenfold or more, depending on the location of the mutation within the consensus, and primarily mirrored each base's significance as ascertained by MEME. Interestingly, TTHA1359-DNA binding did not tolerate alterations in spacing between recognition elements, with either a single deletion (5-bp spacer) or addition (7-bp spacer) eliminating observable binding. Finally, TTHA1359 bound the CRP_Ec consensus sequence with higher affinity than the REPSA-identified consensus sequence. This may be the consequence of sequence differences on the CRP_Ec consensus periphery (Table S1, compare sequences ST2_1359_wt and ST2_CRP_Ec) or the presence of an alternating AC spacer region in the TTHA1359 consensus.
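The relationship between the fitted rate constants and the dissociation constant can be illustrated with a minimal sketch. It assumes a simple 1:1 binding model, which the text does not state explicitly. The rate constants below are hypothetical values chosen for illustration and are not the values in Table 1.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """K_D (M) from k_on (M^-1 s^-1) and k_off (s^-1) for 1:1 binding."""
    return k_off / k_on


def association_signal(t: float, conc: float, k_on: float, k_off: float,
                       r_max: float = 1.0) -> float:
    """Sensor response at time t (s) during association at analyte conc (M)."""
    k_obs = k_on * conc + k_off                  # observed rate constant
    r_eq = r_max * conc / (conc + k_off / k_on)  # plateau response
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_signal(t: float, r0: float, k_off: float) -> float:
    """Sensor response at time t (s) after switching to buffer, from r0."""
    return r0 * math.exp(-k_off * t)


# Hypothetical rate constants (assumption, not measured data).
K_ON, K_OFF = 2.0e5, 1.0e-3
print(f"K_D = {dissociation_constant(K_ON, K_OFF) * 1e9:.1f} nM")
for conc_nm in (5.7, 17, 51, 153):  # concentrations named in the Figure 4 caption
    print(conc_nm, round(association_signal(300, conc_nm * 1e-9, K_ON, K_OFF), 3))
```

In a global fit, curves at all protein concentrations share one k_on and one k_off, so K_D follows directly from their ratio.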

Figure 4.
Representative BLI association and dissociation data plots. Graphs depict raw association and dissociation step data measured during BLI experiments with (A) ST2_1359_wt DNA, the wild-type TTHA1359 consensus sequence; (B) ST2_CRP_Ec DNA, the E. coli CRP consensus sequence; and (C) ST2_1359_ctrl DNA, the REPSAis control sequence. Dots depict raw data points. Solid lines depict calculated best-fit lines for raw data points. Line colors pink, green, orange, and cyan correspond to 5.7, 17, 51, and 153 nM dimeric TTHA1359, respectively.
Table 1 (partial; only these probe sequences are recoverable): ATTtTGACACACATCACAAT, ATTGgGACACACATCACAAT, ATTGTtACACACATCACAAT, ATTGTGAcacacTCACAAT.

Exploration of Potential Regulatory TTHA1359-DNA Binding Sites in the T. thermophilus HB8 Genome
The motif scanning program FIMO (Find Individual Motif Occurrences) [29] was used to identify possible TTHA1359-binding sites within the T. thermophilus HB8 genome. Since the top non-palindromic motif discovered by MEME was a truncation of the consensus determined for TTHA1359, it was not directly imported into FIMO as previously described [21]. Instead, position-dependent letter-probability matrix data from positions 6-16 of the top non-palindromic motif were initially utilized to derive an extended 22-bp position-dependent letter-probability matrix. A text file suitable for FIMO upload and utilization was then written in MEME minimal motif format, containing the targeted version number of MEME, the extended motif alphabet and strand information, and the extended motif position-dependent letter-probability matrix (http://meme-suite.org/doc/memeformat.html?man_type=web [accessed on 13 February 2020]). The file was uploaded to FIMO v 5.0.5 and used to scan the GenBank Thermus thermophilus HB8 uid 13202 version 210 database for potential binding sites with statistically significant p-values less than 0.0001. The potential binding sites selected for further bioinformatic analysis were limited to those with p-values less than 5 × 10^−6. These were examined for their positions relative to mapped open reading frames (ORFs) in the T. thermophilus HB8 genome. Those in intergenic regions or within the −200 to +20 nucleotide region most common for transcription activator binding were subjected to BPROM identification of potential promoter elements. Examples of these analyses, corresponding to FIMO binding sites with p-values < 5 × 10^−6 and located in likely transcription regulatory regions, are shown in Table 2 and Figure 5, respectively.
Our promoter analyses found that 11 of the top 17 TTHA1359 genomic binding sites are situated in regions (intergenic, −200/+10) where bacterial transcriptional regulators typically reside. Five TTHA1359-binding sites were present in single, unidirectional promoters; three were shared by six opposing, bidirectional promoters. In each case, core promoter elements could be identified. Examples of different relationships between the TTHA1359 binding site and proximal core promoter elements were observed. Several had TTHA1359 sites upstream of the core promoter elements (e.g., TTHA0953, TTHA0987, TTHA0784, and TTHA0446). However, most had TTHA1359 sites overlapping core promoter elements, either the −35 box (e.g., TTHA0425, TTHA0447, and TTHA0533) or the −10 box (e.g., TTHA0080, TTHA0081, TTHA0954, and TTHA0534).

Figure 5.
[...] (Table 2). Names indicate the pairs of ORFs shown. The default is a rightward, downstream orientation, indicated with blue nucleotides. Reverse orientation genes have their names in brackets and are indicated with green nucleotides. Black nucleotides indicate intergenic regions. Potential core promoter elements (−35 and −10 boxes, +1 start site of transcription) were predicted using Softberry BPROM [30] and are indicated with cyan highlighting; TTHA1359-binding sites are indicated with yellow highlighting; overlapping TTHA1359-binding and core promoter elements are indicated by green highlighting.
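As an illustration of the MEME minimal motif format mentioned above, the sketch below writes a motif file containing a version line, alphabet, strand information, and a letter-probability matrix. The matrix here is not the authors' extended 22-bp matrix. It is a hypothetical stand-in built from the 20-bp IUPAC consensus, with an assumed small probability (0.01) assigned to disfavoured bases. The motif name and the `nsites` value are likewise placeholders.

```python
# Bases permitted by each IUPAC symbol used in the consensus.
IUPAC_BASES = {"A": "A", "C": "C", "G": "G", "T": "T",
               "W": "AT", "R": "AG", "Y": "CT", "N": "ACGT"}

CONSENSUS = "AWTGTRA" + "N" * 6 + "TYACAWT"


def consensus_to_matrix(consensus: str, pseudo: float = 0.01):
    """Turn an IUPAC consensus into rows of A, C, G, T probabilities.

    Disallowed bases get `pseudo`; the remainder is split evenly among the
    allowed bases, so every row sums to 1. (Illustrative assumption only.)
    """
    rows = []
    for symbol in consensus:
        allowed = IUPAC_BASES[symbol]
        leftover = 1.0 - pseudo * (4 - len(allowed))
        rows.append([leftover / len(allowed) if base in allowed else pseudo
                     for base in "ACGT"])
    return rows


def to_meme_minimal(name: str, matrix, version: str = "5.0.5",
                    nsites: int = 20) -> str:
    """Render a single motif as MEME minimal motif format text."""
    lines = [f"MEME version {version}", "",
             "ALPHABET= ACGT", "",
             "strands: + -", "",
             f"MOTIF {name}",
             f"letter-probability matrix: alength= 4 w= {len(matrix)} "
             f"nsites= {nsites} E= 0"]
    lines += [" ".join(f"{p:.6f}" for p in row) for row in matrix]
    return "\n".join(lines) + "\n"


motif_text = to_meme_minimal("TTHA1359_consensus_sketch",
                             consensus_to_matrix(CONSENSUS))
print(motif_text)
```

The optional background-frequency section is omitted here. For a GC-rich genome such as that of T. thermophilus, supplying genome-derived background frequencies would likely be preferable to relying on defaults.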

In Vitro Validation of TTHA1359-DNA Binding Sites in the T. thermophilus HB8 Genome
To validate TTHA1359-binding to FIMO-predicted gene promoters, we utilized the IISRE cleavage-protection assay REPA (restriction endonuclease protection assay) [31]. This assay is similar to REPSA; however, REPA uses defined DNA templates and excludes amplification and sequencing steps. We initially screened several promoter sequences and identified five FIMO-predicted promoter sequences that were resistant to IISRE cleavage in the presence of TTHA1359 compared to the REPSAis control probe ( Figure S1). Notably, the two promoter sequences that showed little to no evidence of cleavage protection (TTHA0954 and TTHA0533/4) had mutations in bases that we deemed essential based on our BLI analysis (Table 1).
We further analyzed the binding of TTHA1359 to the cleavage-resistant promoters by performing REPA with a titration of TTHA1359 (Figure 6). We observed cleavage protection for each promoter sequence tested, with some exhibiting more protection than others (TTHA0425 > TTHA0953 > TTHA0446/7, TTHA0954, TTHA0081). Importantly, no cleavage protection was observed for the control REPSAis DNA (green) in each case. Collectively, these results show that TTHA1359 is capable of binding these promoters and further validate our reverse-genetic and bioinformatic approach.

Bioinformatic Analysis of Potential TTHA1359-Regulated Genes
Further insights into the biological function of TTHA1359 as a transcriptional regulator were pursued through different bioinformatic approaches. For those FIMO-identified promoters potentially regulated by TTHA1359 that we validated in vitro, additional genes that could be part of a co-regulated operon were identified through BioCyc [32]. The resulting gene products and their biological roles were ascertained from information in the KEGG database [33]. Additionally, their gene expression changes between exponentially growing wild-type and isogenic TTHA1359-depleted strains were determined using GEO2R software and publicly available microarray data (GEO subseries GSE10369) [34]. These data are presented in Table 3. Some potential TTHA1359-regulated genes were involved in universal processes, including transcription (TTHA0953) and translation (TTHA0446, tRNA-Ala-3, and TTHA0083). Notably, those genes involved in translation were present in different transcriptional units (operons), suggestive of their coordinate regulation by TTHA1359. Others were involved in metabolic processes, including energy-related (TTHA0425) and sugar metabolism (TTHA0954 and TTHA0955). A single operon (TTHA0447-TTHA0451) containing transporter genes thought to be involved with quorum sensing was also identified. Finally, several genes (TTHA0080, TTHA0081, and TTHA0082) lacked substantial information regarding their gene products and biological roles. This is understandable, given that over 40% of the genes in T. thermophilus HB8 encode hypothetical proteins with unknown biological functions [33].

Discussion
Using the iterative selection method REPSA, massively parallel sequencing, and MEME motif discovery software, we defined a 20-bp consensus sequence for TTHA1359, 5′-AWTGTRA(N)6TYACAWT-3′. This consensus contains a spaced inverted repeat, characteristic of most CRP-family transcription factor binding sites [27]. In fact, it is quite reminiscent of the archetype E. coli CRP consensus sequence 5′-TGTGA(N)6TCACA-3′ [6]. This is somewhat surprising, as although both proteins have recognizable CRP-type HTH domains (TTHA1359: aa 117-189, CRP_Ec: aa 138-210), there is not an appreciable identity or homology between these domains, except in the region conferring sequence-specific DNA recognition (TTHA1359: VRETVTK, CRP_Ec: SRETVGR, pfam13545: TRETVSR). Differences between these HTH domains may be necessary to maintain structural integrity under different environmental conditions, thermophilic and mesophilic, respectively.
Biophysical characterization of TTHA1359-DNA binding found that TTHA1359 binds its consensus sequence with a dissociation constant of 3.4 nM. Single point mutation of this consensus resulted in a decrease in binding affinity, from 6.8-fold to greater than 1000-fold, the measurement limit of our standard assay. The locations of the most critical nucleotides in the consensus sequence, 5′-A(T/A)TGT(G/A)A(N)6T(C/T)ACA(A/T)T-3′ (underlined), correlated well with those emphasized in the MEME-derived sequence logo. This supports the validity of our REPSA approach. Not fully appreciated is the magnitude of the effect of single point mutations on binding affinity. For example, the related T. thermophilus HB8 CRP-family protein, TTHB099, recognizes a similar, albeit smaller, consensus sequence, 5′-TGT(A/g)N(Y)3(R)3N(T/c)ACA-3′, with some of the same nucleotides being most critical for binding affinity (underlined) [25]. However, mutation of these sites caused only a 6- to 15-fold decrease in binding affinity, far less than observed for comparable mutations in TTHA1359 binding sites. Thus, our findings suggest that structurally similar proteins can bind related consensus sequences yet exhibit different responses to specific point mutations. They also suggest that slight changes in the DNA sequence that certain motif scanning algorithms may tolerate (e.g., FIMO) can profoundly affect the potential for TTHA1359 binding under concentration-limiting conditions. This is most evident from our in vitro analysis of FIMO-predicted promoter sequences, in which point mutations at critical bases based on our consensus sequence resulted in little to no TTHA1359 binding. Interestingly, although the initial adenine in our consensus sequence is essential for TTHA1359 binding, its palindromic thymine appears to be less critical (Figure S1C). This is consistent with previous reports suggesting one side of the TTHA1359 palindromic sequence may be more selective than the other [18,20].
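The fold changes in binding affinity discussed here can be put on an energy scale with the standard relation ΔΔG = RT ln(K_D,mut / K_D,wt). The sketch below applies it to the fold changes quoted in the text. The temperature of 25 °C is an assumption, since the assay temperature is not given in this section, and the function name is illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_fold(fold_change: float, temp_k: float = 298.15) -> float:
    """Binding free-energy penalty (kcal/mol) for a given fold increase in K_D.

    temp_k defaults to 25 degrees C, an assumed value for illustration.
    """
    return R_KCAL * temp_k * math.log(fold_change)


# Fold changes quoted in the text for TTHA1359 and TTHB099 point mutants.
for fold in (6, 6.8, 15, 1000):
    print(f"{fold:>6}-fold weaker binding: {ddg_from_fold(fold):.2f} kcal/mol")
```

On this scale, a 6- to 15-fold effect corresponds to a penalty of roughly 1 to 2 kcal/mol, whereas a 1000-fold effect corresponds to about 4 kcal/mol under the assumed temperature.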
Like most CRP-family proteins, TTHA1359 preferentially binds a spaced, inverted-motif sequence as a homodimer. However, we found that the length of this spacer region is critical. TTHA1359, like TTHB099 and CRP_Ec, binds to a core inverted repeat, 5′-GT(X)10AC-3′, with 10 intervening base pairs. This spacing allows both CRP-family homodimer members to be on the same face of the DNA double helix. In the case of TTHA1359, changes in this spacing, either shortening or lengthening by a single base pair, result in a greater than 100-fold decrease in binding. Curiously, similar spacing consequences are not usually observed for CRP_Ec [36]. This demonstrates the unique importance of the spacing parameter for high-affinity TTHA1359-DNA binding.
To gain insights into the transcriptional functions of TTHA1359 in vivo, a GEO2R comparison between available microarray expression data for wild-type and isogenic TTHA1359-depleted T. thermophilus HB8 strains was performed. Expression changes in the genes immediately downstream of the FIMO-identified TTHA1359-binding sites, as well as other members of their transcriptional units (operons), were examined. Notably, only three of these genes (TTHA0425, TTHA0081, and TTHA0082) were among the top 100 GEO2R-identified responsive genes, while another (TTHA0953) was at position #102. Most of the remaining FIMO-identified genes had much higher rank numbers (i.e., appeared less responsive), either because of their low magnitude of expression change between wild-type and depleted strains, the low confidence in the results between multiple experiments, or both. While it is somewhat reassuring that our FIMO-identified sites with the highest correlation to the consensus TTHA1359 sequence were among the best GEO2R-identified TTHA1359-responsive genes, it is not wholly unexpected that a complete correlation between the two data sets was not observed. Transcriptional regulation in vivo is complex, relying on multiple proteins and co-factors. It can also be a consequence of indirect effects, e.g., regulation of transcriptional regulatory proteins beyond the one under investigation. Thus, while simple transcriptional repressors (e.g., TTHA0101, TTHA0973, and TTHB023) have shown a high degree of correlation between their FIMO-identified promoters and those genes exhibiting substantially increased expression in the depleted strains [22][23][24], we have found that putative dual-function transcriptional regulatory proteins (e.g., TTHB099) do not always exhibit such a direct relationship [25].
TTHA1359, also known as SdrP, has been investigated previously [18,20]. Using either changes in gene expression between wild-type and TTHA1359-depleted T. thermophilus HB8 strains or microarray data from 117 different environmental and chemical stress conditions, the authors were able to identify 16 gene promoters whose regulation by TTHA1359 could be recapitulated in an in vitro transcription assay. Analysis of their promoter regions allowed a refinement of the TTHA1359 consensus sequence to 5′-TTGTG(N)9CNC-3′, with these sites being located adjacent to or overlapping the −35 box. Taken together, their data suggest that SdrP likely functions as a Class II transcriptional activator to primarily regulate gene expression in response to oxidative stress.
Notably, only one gene, TTHA0425, was shared between those identified through our REPSA-based, reverse genetic approach and those identified through the more conventional genetic process. Such differences may reflect the intrinsic limitations and biases of the assays used, REPSA/REPA versus microarrays/in vitro transcription. However, it is intriguing that none of these TTHA1359 binding sites would be considered high affinity based on their sequences. Perhaps under oxidative stress conditions, TTHA1359 accumulates to micromolar concentrations, thereby activating its target genes with weak promoter binding sites. This increase in cellular protein concentration provides a reasonable model for gene regulation by TTHA1359, especially as it lacks a modulatory co-factor like 3′,5′-cAMP, which many CRP family transcription factors require.
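The concentration-dependence argument in the preceding paragraph can be sketched with a simple equilibrium occupancy calculation. The sketch assumes non-cooperative 1:1 binding and that free protein approximates total protein; both are simplifications. The 3.4 nM value is the consensus K_D reported in the text, while the 1000 nM "weak site" is a hypothetical value set at the stated measurement limit. The protein concentrations are also illustrative.

```python
def fractional_occupancy(protein_nm: float, kd_nm: float) -> float:
    """Equilibrium fraction of sites bound: [P] / ([P] + K_D), 1:1 binding."""
    return protein_nm / (protein_nm + kd_nm)


CONSENSUS_KD_NM = 3.4     # K_D for the consensus site, from the text
WEAK_SITE_KD_NM = 1000.0  # hypothetical weak site at the assay's stated limit

for protein_nm in (10.0, 100.0, 1000.0):  # illustrative protein concentrations
    strong = fractional_occupancy(protein_nm, CONSENSUS_KD_NM)
    weak = fractional_occupancy(protein_nm, WEAK_SITE_KD_NM)
    print(f"{protein_nm:7.0f} nM protein: consensus {strong:.2f}, weak site {weak:.2f}")
```

Under these assumptions, a weak site is essentially unoccupied at low nanomolar protein concentrations but becomes substantially occupied once the protein approaches micromolar levels, which is the behaviour the proposed model relies on.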

Oligonucleotides
Single-stranded oligodeoxyribonucleotides used in this study (Supplementary Table S1) were obtained from Integrated DNA Technologies (Coralville, IA, USA). Initial and subsequent selected ST2R24 REPSA selection libraries were prepared by PCR using unmodified ST2L and 5′-IRDye® 700-modified IRD7_ST2R primers, essentially as previously described [21]. These libraries contained ST2R24 selection templates, 73-bp double-stranded oligodeoxyribonucleotides with a central randomized 24-bp core flanked by defined sequences possessing IISRE-binding sites for FokI and BpmI [21]. Their design permitted the probing of sequence-specific protein-DNA binding through the inhibition of IISRE cleavage within the randomized core and the survival of intact templates for subsequent PCR amplification [37]. Defined, duplex DNA probes for biolayer interferometry (BLI) analyses were synthesized by PCR using ST2Ls and 5′-biotin-modified IRD7-ST2R primers, as previously described [21]. 5′-modified probe concentrations were measured with a Qubit 3 Fluorometer using our standard protocol [38].

TTHA1359 Protein
Purified full-length (1-202 aa) TTHA1359 protein was obtained from IPTG-induced E. coli BL21(DE3) bacteria transformed with the pET11a-derived expression plasmid PC011359-41 (RIKEN Bioresource Research Center) following heat-treatment of soluble bacterial extracts as previously described [24]. Coomassie-stained SDS-PAGE analysis of fractions through purification is shown in Supplementary Figure S2. Stock TTHA1359 is estimated to be 110 µM and greater than 95% pure.
The amplicon library preparation, Ion PGM individual sequencing particle preparation, Ion PGM semiconductor sequencing, and Ion Torrent server sequence processing were all performed as previously described [21]. The resulting fastq raw sequences (Supplementary Data S1) were processed using our Sequencing1.java program and DuplicatesFinder v 1.1 to yield data (Supplementary Data S2) suitable for consensus sequence determination using Multiple Em for Motif Elicitation (MEME) v 5.0.5 (http://meme-suite.org/tools/meme).

Informed Consent Statement: Not applicable.
Data Availability Statement: All data in this study are provided as Supplementary Materials or are publicly archived datasets. Links to these may be found in the Materials and Methods.