3.2. Establishment of an in Vitro Genomic DNA Binding Assay
We established a method to identify HY5 genomic binding sites that combines in vitro gDNA immunoprecipitation, as in DIP-chip [13], with next-generation sequencing. A detailed description is given in the Materials and Methods section. First, the recombinant HY5 protein His-HY5-HA was bound to beads, mixed with sheared gDNA fragments (150–200 bp) and incubated in vitro. After immunoprecipitation using anti-HA antibodies, gDNA fragments bound to His-HY5-HA were recovered and subjected to next-generation sequencing. The reads were mapped onto the Arabidopsis genome and peak binding locations were then detected in silico. We detected 2503 HY5-binding peaks in the Arabidopsis nuclear genome and 17 peaks in the mitochondrial or chloroplast genomes (MACS2 q-value < 0.01). Of the nuclear genome peaks, 2177 mapped to the regulatory regions of 3195 nuclear genes, where a regulatory region is defined as extending from 2000 bases upstream to 500 bases downstream of the gene (Figure 2A and Table S2). Of these, 60% are located either within 500 bases upstream (the promoter) or inside the genes (gene bodies), although some peaks mapped within 500 bases downstream. In addition, we compared the HY5-binding peak positions with previously identified transcription start sites (TSSs) [26]. Many of the positions were strongly associated with the 200-base (promoter) region upstream of the TSSs (Figure 2B). The remaining 326 nuclear peaks mapped to intergenic regions. We named this method, which identifies TF-binding sites from genomic fragments, gDNA binding sequencing (gDB-seq).
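The region definitions used above (2000 bases upstream to 500 bases downstream of a gene, with the first 500 upstream bases treated as the promoter) can be illustrated with a minimal sketch. This is not the pipeline used in this study: the `Gene` record, the function name `classify_peak`, the use of a single summit coordinate per peak and the inclusive boundaries are all assumptions made for illustration.

```python
from dataclasses import dataclass

UPSTREAM = 2000   # bases upstream of the gene counted as regulatory (from the text)
DOWNSTREAM = 500  # bases downstream of the gene counted as regulatory (from the text)
PROMOTER = 500    # upstream bases called "promoter" in the text


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; start < end on the reference, 1-based inclusive."""
    gene_id: str
    start: int
    end: int
    strand: str  # "+" or "-"


def classify_peak(summit: int, gene: Gene) -> str:
    """Classify a peak summit relative to one gene, taking strand into account."""
    if gene.start <= summit <= gene.end:
        return "gene_body"
    left_of_gene = summit < gene.start
    dist = gene.start - summit if left_of_gene else summit - gene.end
    # Left of a "+" gene or right of a "-" gene means upstream of the gene.
    is_upstream = left_of_gene == (gene.strand == "+")
    if is_upstream:
        if dist <= PROMOTER:
            return "promoter"
        if dist <= UPSTREAM:
            return "upstream"
    elif dist <= DOWNSTREAM:
        return "downstream"
    return "intergenic"


plus_gene = Gene("geneA", 10000, 12000, "+")   # hypothetical coordinates
minus_gene = Gene("geneB", 10000, 12000, "-")  # hypothetical coordinates
print(classify_peak(9700, plus_gene), classify_peak(12300, minus_gene))
```

A peak near two genes would receive one label per gene under this scheme, which is consistent with 2177 peaks mapping to 3195 genes, although how the original analysis resolved such cases is not stated here.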
Figure 1.
Recombinant His-HY5-HA protein produced in wheat embryo extract. (A) Coomassie Brilliant Blue (CBB) staining of the gel on which purified proteins were loaded. (B,C) Western blot analysis of the purified proteins using anti-HA antibody (B) and anti-HY5 antiserum (C). Asterisks indicate the positions of His-HY5-HA.
Using the GADEM program [24], we predicted five conserved sequence motifs from the sequences surrounding the 498 peaks with the highest peak scores. One of the five contains 5'-(G/T)(C/A)CACGT(C/G)-3', which is similar to the HY5-binding motif registered in the JASPAR database, indicating that the motif prediction is largely accurate (Figure 3A and Figure S1) [25]. Of the 2520 detected peaks, 2301 and 1976 contain at least one ACGT or CACGT consensus sequence, respectively, within 400 bases of the peak (Table S3). The number of ACGT- or CACGT-containing sequences in the gene-related regions tended to correlate with the peak scores (Figure 3B and Figure 3C). These observations indicate that gDB-seq of HY5 is sufficient to uncover both the binding motif and genomic binding sites.
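Counting ACGT or CACGT sequences around a peak, as described above, can be sketched as follows. The helper names are hypothetical, and two points are assumptions rather than details taken from the Methods: "within 400 bases of the peak" is read as 400 bases on either side of the summit, and sites are counted on both strands.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Regex form of the predicted motif 5'-(G/T)(C/A)CACGT(C/G)-3' quoted in the text.
PREDICTED_MOTIF = re.compile(r"[GT][CA]CACGT[CG]")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def window(seq: str, summit: int, flank: int = 400) -> str:
    """Slice `flank` bases on either side of a 0-based summit index (assumed reading)."""
    return seq[max(0, summit - flank): summit + flank + 1]


def count_sites(seq: str, core: str) -> int:
    """Count occurrences of `core` on either strand, allowing overlapping matches.

    A palindromic core such as ACGT is searched once; a non-palindromic core
    such as CACGT is also searched as its reverse complement (ACGTG).
    """
    seq = seq.upper()
    patterns = {core, reverse_complement(core)}
    return sum(len(re.findall(f"(?={p})", seq)) for p in patterns)


example = "ttACGTaaCACGTtt"  # hypothetical sequence, not taken from the data
print(count_sites(example, "ACGT"), count_sites(example, "CACGT"))
```

Under this both-strand convention a palindromic G-box (CACGTG) contributes two CACGT sites, one per strand; counting it once instead would be an equally defensible choice.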
Figure 2.
Peak positions derived from HY5 gDB-seq. (A) Positions of detected peaks around the HY5-binding genes. (B) Distribution of gDB-seq peak positions from transcription start sites (TSSs).
The HY5 gDB-seq data can be visualized using GBrowse at http://plant.psc.riken.jp/cgi-bin/gb2/gbrowse/arabidopsis. For example, the promoter region of AT1G12200 contains a peak with two CACGT-containing motif sequences (Figure 4). In ChIP-based methods, several factors in the nucleus may affect both the selection of binding sites by the TF and the TF’s binding affinity. In gDB-seq, by contrast, a uniform pool of gDNA fragments is mixed with recombinant TF protein, which avoids any involvement of other factors in the binding assay. gDB-seq therefore makes it possible to compare the relative affinity of a TF (peak score) for one binding site with that for other binding sites. For example, the score (218.0) of the peak located upstream of AT1G12200 (Figure 4) is higher than that (128.6) of the peak located upstream of AT1G10090, suggesting that HY5 binds upstream of AT1G12200 more strongly than it does upstream of AT1G10090 (Table S2).
Figure 3.
Motif prediction of HY5-binding sequence. (A) Results of the HY5-binding motif prediction and similarity search in the JASPAR database. Seventeen TFs with motifs similar to that of HY5 are shown under the predicted motif. The arrow indicates the HY5-binding motif registered in the JASPAR database. (B,C) Box plots of the distribution of peak scores by number of ACGT (B) and CACGT (C) sequences. p-values of peak scores between categories (0 and 1) and (2<) are shown above the box plots. The numbers of peaks in each category are 219 (0), 498 (1), 599 (2), 562 (3), 378 (4), 163 (5), 68 (6) and 33 (7<) for (B), and 544 (0), 809 (1), 659 (2), 344 (3), 113 (4) and 51 (5<) for (C).
Figure 4.
Genome browser view of an example of gDB-seq. There is one peak at the promoter of the AT1G12200 gene. Sequence around this peak contains two predicted motifs (CACGT). Peak score = 218.0. The HY5 gDB-seq data can be visualized using GBrowse located at
http://plant.psc.riken.jp/cgi-bin/gb2/gbrowse/arabidopsis.
Some studies have established novel in vitro high-throughput methodologies for identifying TF-binding motifs, such as SELEX (systematic evolution of ligands by exponential enrichment)-seq and PBM (protein-binding microarray) methods [30,31,32,33]. Compared with these methods, gDB-seq and ChIP-based methods are well suited to identifying candidate genes targeted by TFs as well as binding motifs. In addition, gDB-seq may uncover a larger number of possible physical binding sites of a TF than ChIP-based methods, because it eliminates in vivo influences such as environmental conditions, cell type and interactions with other cellular factors.
3.3. Comparison of gDB-Seq with ChIP-Chip
In previous ChIP-chip work, 3894 genes were estimated to be binding targets of HY5 [11], and 738 of them overlapped with 3103 light-regulated genes previously identified by microarray analysis of RNA from cotyledons, hypocotyls and roots [34]. To evaluate the gDB-seq results, we compared the 3195 HY5-binding candidate genes identified by gDB-seq with those from the in vivo ChIP-chip analysis and with the light-regulated genes. About 36.5% (1166) and 14.6% (468) of the gDB-seq candidate genes overlapped with the ChIP-chip genes and the light-regulated genes, respectively (Figure 5 and Table S2).
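The overlap percentages above are taken relative to the 3195 gDB-seq candidate genes. A minimal sketch of that calculation is given below; the function name and the toy gene identifiers are hypothetical, and only the three counts in the final lines come from the text.

```python
def overlap_summary(query: set[str], reference: set[str]) -> tuple[int, float]:
    """Return the number of shared genes and that number as a percentage of `query`."""
    shared = len(query & reference)
    return shared, 100.0 * shared / len(query)


# Toy identifiers for illustration only; they are not genes from this study.
gdb_seq_genes = {"geneA", "geneB", "geneC", "geneD"}
chip_chip_genes = {"geneB", "geneD", "geneX"}
print(overlap_summary(gdb_seq_genes, chip_chip_genes))

# The same arithmetic applied to the counts reported in the text.
pct_chip_chip = round(100 * 1166 / 3195, 1)
pct_light_regulated = round(100 * 468 / 3195, 1)
print(pct_chip_chip, pct_light_regulated)
```

Because the denominator is the gDB-seq set, the same 1166 shared genes would correspond to a different percentage if expressed relative to the 3894 ChIP-chip genes.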
Lee et al. selected and analyzed four genes, CAB1, CHS, RbcS1A and F3H, that were detected by ChIP-chip of HY5 [11]. While the F3H and RbcS1A genes were also predicted to be HY5-binding genes in this report, the CAB1 and CHS genes were not. When we examined these loci on the genome browser, we observed, instead of predicted peaks, much lower peaks comprising small numbers of sequenced reads at both the CAB1 and CHS loci (Figure 6). The ANNAT1 gene was also detected by ChIP-chip of HY5 [11], but not by gDB-seq. However, relaxing the MACS2 q-value cut-off to < 0.05 resulted in the detection of a peak located in the ANNAT1 gene (Figure 6), suggesting that the number of gDB-seq peaks may increase if different peak-prediction settings are used. Alternatively, because the DNA-protein binding buffer conditions could affect gDB-seq results, changes to the buffer may lead to results different from those in this report.
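The effect of the q-value cut-off can be illustrated with a short sketch. It assumes, without confirmation from the Methods, that peaks are available in the MACS2 narrowPeak format, in which the ninth column holds -log10(q-value); the function names and the example line are hypothetical.

```python
import math


def parse_narrowpeak_line(line: str) -> dict:
    """Parse the fields of one narrowPeak line that are needed for filtering."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3],
        "neg_log10_q": float(fields[8]),  # ninth column: -log10(q-value)
    }


def passes(neg_log10_q: float, q_cutoff: float) -> bool:
    """True if the peak's q-value is below `q_cutoff`."""
    return neg_log10_q > -math.log10(q_cutoff)


# Hypothetical peak line, not taken from the data.
peak = parse_narrowpeak_line("Chr1\t100\t350\tpeak_1\t50\t.\t4.2\t6.1\t5.0\t120")
print(passes(peak["neg_log10_q"], 0.01))

# A weak hypothetical peak with -log10(q) = 1.6 (q of roughly 0.025) is
# rejected at q < 0.01 but retained at q < 0.05.
print(passes(1.6, 0.01), passes(1.6, 0.05))
```

This filtering only shows which already-reported peaks survive each threshold; MACS2 itself would need to be run with the more permissive cut-off for additional peaks to be reported in the first place.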
Figure 5.
Venn diagram of overlaps between gDB-seq (gDNA binding sequencing), ChIP-chip (Chromatin immunoprecipitation-chip) [
11] and light-regulated genes [
34].
3.4. Association Study between gDB-Seq and Microarray in hy5 Mutant
To identify direct targets regulated by HY5 binding, we performed microarray analysis of three-day-old hy5 null mutants grown under continuous white light. We found that 1391 and 3242 genes were up-regulated and down-regulated, respectively, in hy5 mutants compared with wild type (twofold and p-value < 0.05). The microarray carried probes for 3050 of the 3195 HY5-binding genes; when these were plotted, the accumulation of most of their transcripts in the hy5 mutants was equivalent to that in wild-type plants (Figure 7A). However, 234 and 236 of the 3050 HY5-binding genes overlapped with the up-regulated and down-regulated transcripts in the hy5 mutants, respectively (Figure 7B and Table S4). These include novel HY5-regulated candidate genes that were not identified previously [11]. These results indicate that HY5-binding potential is not necessarily associated with altered accumulation of transcripts from the HY5-targeted genes, and that secondary effects of disrupting the HY5 gene had a much larger impact on the whole transcriptome than any direct effect.
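The twofold and p-value < 0.05 criterion can be sketched on log2-scaled signals as follows. The function name, the inclusive treatment of an exactly twofold change and the example values are assumptions for illustration; the statistical test that produced the p-values is not reproduced here.

```python
import math


def classify_change(log2_wt: float, log2_mutant: float, p_value: float,
                    fold: float = 2.0, alpha: float = 0.05) -> str:
    """Label a gene 'up', 'down' or 'unchanged' in the mutant relative to wild type."""
    if p_value >= alpha:
        return "unchanged"
    diff = log2_mutant - log2_wt   # log2 fold change, mutant versus wild type
    threshold = math.log2(fold)    # a twofold change equals 1.0 on the log2 scale
    if diff >= threshold:
        return "up"
    if diff <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical log2 signals and p-values, not values from Table S4.
print(classify_change(8.0, 9.5, 0.01))
print(classify_change(8.0, 6.5, 0.01))
print(classify_change(8.0, 8.4, 0.001))

# Share of HY5-binding genes on the array that changed in hy5 (counts from the text).
changed_fraction = (234 + 236) / 3050
print(round(100 * changed_fraction, 1))
```

With the reported counts, 470 of the 3050 HY5-binding genes on the array (about 15%) met the criterion in either direction, which is the basis for the statement that most bound genes show unchanged transcript accumulation in hy5.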
Figure 6.
Genome browser view of the CAB1, CHS and ANNAT1 loci. Arrows indicate possible peak positions, which were not detected in MACS2 (Model-based Analysis of ChIP-Seq) peak prediction.
Figure 7.
Microarray analysis in hy5 null mutant. (A) Dot plot of the accumulation of 3050 HY5-binding genes in wild type (WT) and hy5. LOG2 values of normalized signals were plotted. (B) Venn diagram of overlaps between HY5-binding genes and genes that showed a significant difference in microarray analysis in the hy5 mutant.
3.5. Role of HY5-Binding Potential in Response to Blue Light Exposure
As shown in Figure 5, several HY5-binding genes overlapped with light-regulated genes. To examine the transcript profiles of the HY5-binding genes in response to blue light exposure, three-day-old plants grown in the dark were transferred to blue light, grown for one hour under blue light, and harvested. We then performed directional RNA-seq analysis, which showed that 684 and 484 transcripts were up-regulated and down-regulated, respectively. The overall accumulation of transcripts from HY5-binding genes was comparable between dark and blue light (Figure 8A). Nevertheless, 80 down-regulated and 183 up-regulated genes in hy5 overlapped with HY5-binding genes (Figure 8B and Table S5). For example, the blue-light-inducible genes CRY3 and EFO1 possess gDB-seq peaks at the promoter and gene body, respectively (Figure 8C).
Figure 8.
Relationship between HY5-binding potential and early blue-light response. (A) Dot plot of the accumulation of 2631 HY5-binding genes in plants grown in the dark and under blue light for 1 hour. LOG10 values (range −4 to 4) of FPKMs (Fragments per kilobase of exon per million mapped reads) were plotted. (B) Venn diagram of the overlaps between HY5-binding genes (gDB-seq), blue-inducible genes (Blue_up) and blue-repressed genes (Blue_down). (C) Genome browser view of two examples (the CRY3 and EFO1 loci) of blue light-inducible genes with HY5-binding sites. (D) Semi-quantitative RT-PCR (Reverse Transcription-Polymerase Chain Reaction) analysis of nine HY5-binding genes, HY5 and ACT2. ACT2 accumulation was used as a loading control. The peak positions of gDB-seq are shown on the right.
To understand the role of HY5 binding in blue-inducible genes, we examined changes in the accumulation of transcripts of nine blue-inducible HY5-binding genes by semi-quantitative RT-PCR analysis. We detected delayed induction of eight transcripts, including MAPKKK13, JAC1 and F3H, in hy5 mutants compared with wild type (Figure 8D). This result suggests that HY5 binding positively regulates efficient induction of blue-light-inducible genes in vivo. Interestingly, HY5 may affect gene expression by binding not only promoters but also gene bodies, as at the EFO1 locus (Figure 8C and Figure 8D).
The result above is one example of how gDB-seq analysis can contribute to future research clarifying how TFs regulate transcription during environmental change in plants. Generally, in vivo ChIP-based methods identify physiological TF-binding sites, while in vitro gDB-seq reveals the candidate genes regulated by a TF. The information from gDB-seq will be useful for elucidating the whole picture of genes that are controlled by TFs.
HY5 regulates the blue-light signaling pathway through physical interaction with other bZIP-type TFs, HYH and GBF1 [18,35]. GBF1 acts antagonistically to HY5 and HYH in seedling development [18]. HY5 also interacts with some B-Box-containing TFs such as BBX21, BBX22 and BBX25 [36,37,38]. While BBX21 and BBX22 act as positive regulators of photomorphogenesis [36,37], BBX25 acts as a negative regulator by down-regulating BBX22 expression [38]. The output of gDB-seq ignores these interactions and chromatin states, which are often important for in vivo regulation of gene expression by HY5. Therefore, additional studies using other approaches may be required to reveal how the genes identified as HY5 targets by gDB-seq are regulated.