1. Introduction
Regulation of gene expression remains a pivotal subject of research, particularly in transgenic engineering [
1]. Theoretical frameworks have extensively utilized one-dimensional sequence data from gene-transcribed regions to elucidate the interactions between regulatory elements and transgene expression levels [
2,
3]. These studies aimed to enhance exogenous gene expression; however, challenges in eukaryotic transgene regulation persist, necessitating the development of controlled, transgenic expression systems.
Enhanced promoter strength increases transgene expression [
4]. However, owing to the intricate nature of eukaryotic gene regulation, not all robust promoters can achieve maximal protein expression and activity [
5]. Increasing the gene copy number is a straightforward approach; however, excessive copy numbers can be detrimental [
6]. Codon optimization within protein-coding sequences is a common strategy for boosting gene expression [
7,
8,
9]. Synonymous codon usage frequencies vary significantly among species such as
Escherichia coli [
10], yeasts [
11], animals [
12], and plants [
13]. Synonymous codon substitutions can alter post-translational modification sites, thereby affecting the protein structure, stability, and function [
14,
15]. Furthermore, enhancers can improve transgene expression but must be paired with strong promoters and may inadvertently affect host gene expression [
3,
16].
Eukaryotic chromosomes are organized into nucleosome arrays, with nucleosomes comprising ~147 bp DNA segments wrapped around histone octamers, each containing two copies of histones H2A, H2B, H3, and H4 [
17,
18]. Nucleosome positioning modulates the binding of DNA-binding proteins, thereby influencing gene transcription and expression [
19,
20]. In the transcriptional start region, there are two relatively well-positioned nucleosomes located upstream and downstream of the transcription start sites (TSSs), which are referred to as the −1 nucleosome and the +1 nucleosome, respectively [
21,
22]. Between the −1 nucleosome and the +1 nucleosome, there is a variable-length nucleosome-depleted region (NDR) [
23]. NDRs are prevalent across transcriptional regions in various organisms including yeast [
24], plants [
25],
D. melanogaster [
26],
C. elegans [
27], and humans [
28,
29]. Positioning of the −1 nucleosome upstream of the NDR is highly variable, whereas the +1 nucleosome downstream of the NDR remains relatively stable [
30]. The NDR serves as a critical functional element for transcription initiation, is enriched with transcription factor binding sites (TFBSs), and facilitates transcription factor entry into the chromatin. Transcription initiation is triggered by the binding of transcription factors, and the efficiency of this binding directly influences the transcription efficiency [
31,
32,
33].
In
Saccharomyces cerevisiae, the TSSs of most genes are located at the 3′-end of NDRs or within the +1 nucleosome. The −1 and +1 nucleosomes, along with the intervening NDR sequences, play pivotal roles in the regulation of gene expression [
34,
35,
36]. The efficiency of transcription factor (TF) binding to NDRs is closely linked to NDR length [
22,
37,
38,
39]. Generally, TFs bind to NDRs in three modes: (1) When the NDR is sufficiently long, TFs can easily bind without requiring upstream movement of the −1 nucleosome, resulting in high binding efficiency and typically high gene expression. (2) With shorter NDRs, the space for TF binding is limited, necessitating upstream movement of the −1 nucleosome to extend the NDR sequence. Nucleosome repositioning consumes energy, reduces the binding efficiency, and leads to moderate or weak gene expression. (3) For very short NDRs, a TF-binding space can be created by displacing the −1 nucleosomes. Thus, the NDR length between the −1 and +1 nucleosomes dictates TF binding efficiency and transcription efficacy.
Regulating transgenic expression based solely on one-dimensional DNA information has limitations. Regulation of gene expression at the nucleosome level is essential. In this study, we propose a method for regulating exogenous gene expression using the ±1 nucleosomes and NDR sequences of the transcriptional regulatory region in S. cerevisiae genes. To date, no study has reported the regulation of exogenous gene expression by NDRs. Therefore, we examined the −1 nucleosome, NDR sequence, and +1 nucleosome as transcriptional regulatory sequences in S. cerevisiae and experimentally verified the effect of NDR length on exogenous gene expression levels. This novel approach offers a new strategy for regulating exogenous gene expression.
2. Materials and Methods
2.1. Transcriptional Regulatory Region Design of Target Gene
In
S. cerevisiae, the composition of nucleosomes in gene transcription regions is closely related to the regulation of gene expression [
40]. To explore the relationship between NDR length and transgene expression, we designed two experiments. In experiment one, based on the location of the +1 nucleosome, the length of the NDR, and the expression level of the native gene of
S. cerevisiae, we obtained six transcriptional regulatory sequences, including three long NDRs and three short NDRs. The −1 nucleosome sequence, NDR sequence, and +1 nucleosome sequence in the transcriptional regulatory regions of
S. cerevisiae native genes were used as the transcriptional regulatory sequence of the transgene. The coding sequence of GFP was used as the exogenous gene. The
CYC1-terminator sequence was used as a uniform terminator to avoid effects of differing 3′-UTR sequences on transgene expression. The aim of this experiment was to test experimentally the relationship between NDR length and transgene expression.
Experiment two built on experiment one. The long NDR sequences were shortened by 80 bp and are referred to as NDR− sequences; the short NDR sequences were lengthened by 80 bp and are referred to as NDR+ sequences. The aim of this experiment was to explore the effect of changes in NDR length on the corresponding transgene expression levels. The center positions of the −1 and +1 nucleosomes in the transcription start regions of
S. cerevisiae native genes were obtained from the single-base pair precision localization data of
S. cerevisiae nucleosomes provided by Brogaard et al. [
41]. The schematic diagram of the experimental design and workflow is shown in
Figure S1.
2.2. Yeast Strain and Plasmid
The S. cerevisiae strain BY4741 (MATa his3∆1 leu2∆0 met15∆0 ura3∆0) was purchased from Wei Di Biotechnology Co., Ltd. (Shanghai, China) and cultivated at 28 °C in YPD medium [1% (w/v) yeast extract, 2% (w/v) bacto-peptone, and 2% (w/v) glucose]. NDR, GFP, and CYC1-ter DNA fragments were synthesized and cloned into the integration vector pAUR101 (Takara, Dalian, China).
2.3. RNA-Seq
RNA-Seq and data analysis were performed by Shanghai Ling EN Biotech. Briefly, total RNA was extracted from S. cerevisiae in the logarithmic growth phase using Yeast Processing Reagent (Takara, Dalian, China). RNA-seq transcriptome libraries were prepared using TruSeq™ (Illumina, San Diego, CA, USA). Libraries were then size-selected for cDNA target fragments ranging from 200 to 300 bp on a 2% agarose gel and amplified by polymerase chain reaction (PCR) for 15 cycles using Phusion DNA Polymerase (NEB, M0530S, Beijing, China). After quantification using TBS380 (Shanghai, China), paired-end libraries were sequenced on an Illumina NovaSeq 6000 sequencing platform.
2.4. PCR Analysis of −1 Nucleosome In Vitro
Genomic DNA was isolated from
S. cerevisiae using the Dr. GenTLE™ (from yeast) High Recovery (Takara, Dalian, China). Nucleosome DNA was prepared with a Nucleosome Assembly Kit (NEB, E5350, Beijing, China), utilizing the −1 nucleosome-positioning sequences. Isothermal amplification was conducted using both nucleosome-assembled and naked DNA as templates. The amplification reaction was carried out in a 20 μL system at 37 °C for 20 min using
Bsu DNA polymerase (Yeasen, Shanghai, China). The PCR primers used are provided in
Table S1 of Supplementary File S1.
2.5. Quantification of −1 Nucleosome Sequence In Vitro
S. cerevisiae cells with differential expression levels of six target genes (two groups per gene: one with high expression and one with low expression) were subjected to treatment with micrococcal nuclease (Takara, Dalian, China). DNA was extracted as described in
Section 2.4. An absolute quantitative polymerase chain reaction (qPCR) was performed using DNA as the template to quantify the amount of −1 nucleosome DNA. A standard curve was established by serially diluting plasmids containing the nucleosome-positioning sequences. The PCR conditions were as follows: initial denaturation at 95 °C for 30 s, followed by 40 cycles of denaturation at 95 °C for 5 s, annealing at 60 °C for 30 s, and melting curve detection from 65 °C to 95 °C. The samples were analyzed in three biological replicates and three technical replicates. The primers used for PCR are provided in
Table S2 of Supplementary File S1.
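The absolute quantification step described above can be sketched as follows. This is a minimal illustration, assuming the usual linear relationship between Ct and log10 of template copies; the dilution series and Ct values are invented for the example and are not data from this study, and the function names are hypothetical.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate template copies for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)

def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0

# Hypothetical 10-fold plasmid dilution series (copies per reaction) and Ct values.
standards = [1e7, 1e6, 1e5, 1e4, 1e3]
standard_cts = [14.0, 17.3, 20.6, 23.9, 27.2]

slope, intercept = fit_standard_curve(standards, standard_cts)
sample_copies = copies_from_ct(20.6, slope, intercept)
efficiency = amplification_efficiency(slope)
```

Comparing `copies_from_ct` between the high- and low-expression samples of one gene would give the change in −1 nucleosome DNA abundance that the assay is designed to detect.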
2.6. Yeast Transformation
The integrative plasmid, pAUR101-NDR-GFP, was linearized using the
EcoR I restriction enzyme. Following the LiAc/SS carrier DNA/PEG method described by Gietz and Schiestl [
42], the linearized plasmids were introduced into
S. cerevisiae BY4741 cells using the Yeast Transformation System 2 kit (Takara, Dalian, China). The plasmids were integrated into the
AUR1 locus of chromosome XI in
S. cerevisiae through homologous recombination.
2.7. PCR Detection of Transgenic S. cerevisiae
Genomic DNA was extracted from S. cerevisiae BY4741 using Dr. GenTLE™ (from yeast) High Recovery (Takara, Dalian, China). Polymerase chain reaction (PCR) was used to verify whether the GFP gene was integrated into the AUR1 locus on chromosome XI in S. cerevisiae. The primers used were as follows: GFP gene forward: 5′-TCTAAAGGTGAAGAATTATTCACTGGT-3′, and reverse: 5′-TTATTTGTACAATTCATCCATACCATGG-3′. The PCR conditions were as follows: pre-denaturation at 94 °C for 3 min; 30 cycles of denaturation at 94 °C for 15 s, annealing at 60 °C for 30 s, and extension at 72 °C for 50 s; and a final extension at 72 °C for 5 min. The PCR products were confirmed by 1% agarose gel electrophoresis and subsequently verified by Sanger sequencing.
2.8. Determination of Target Gene Copy Numbers
The copy number of GFP in the transformed cells was determined by SYBR-based quantitative real-time PCR (qPCR). Genomic DNA (gDNA) was harvested from the cells using Dr. GenTLE™ (from yeast) High Recovery (Takara, Dalian, China). A standard curve was generated by serial dilution of pEASY-T1 plasmid (TransGen, Beijing, China). The PCR conditions were as follows: pre-denaturation at 95 °C for 30 s, followed by 40 cycles of denaturation at 95 °C for 5 s and annealing at 60 °C for 30 s. A dissociation curve analysis was then performed to confirm the amplification of a single amplicon. The amplification was conducted with three biological and three technical replicates. The GFP copy number was calculated from the standard curve. The primers used for PCR are provided in
Table S2 of Supplementary File S1.
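As an illustration of how a plasmid dilution series is typically converted into copy numbers, the sketch below uses the common approximation of 650 g/mol per base pair. The helper names, the plasmid length in the example, and the haploid genome size of roughly 12.1 Mb are assumptions made for illustration; the text does not state how these conversions were performed.

```python
AVOGADRO = 6.022e23          # molecules per mole
GRAMS_PER_MOL_PER_BP = 650.0 # average mass of one double-stranded base pair (approximation)

def dna_copies(mass_ng, length_bp):
    """Number of double-stranded DNA molecules in mass_ng nanograms."""
    grams = mass_ng * 1e-9
    return grams * AVOGADRO / (length_bp * GRAMS_PER_MOL_PER_BP)

def copies_per_genome(target_copies, gdna_ng, genome_bp=12.1e6):
    """Target copies divided by the genome equivalents in the gDNA input.

    genome_bp defaults to an approximate haploid S. cerevisiae genome size (assumption).
    """
    return target_copies / dna_copies(gdna_ng, genome_bp)

# Hypothetical example: 1 ng of a 1000 bp standard template.
standard_copies = dna_copies(1.0, 1000)

# If qPCR reports as many GFP copies as there are genome equivalents in 10 ng gDNA,
# the estimate is one copy per genome.
single_copy = copies_per_genome(dna_copies(10.0, 12.1e6), gdna_ng=10.0)
```

A value close to 1 from `copies_per_genome` across all strains would correspond to comparable, single-copy integration.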
2.9. Analysis of GFP mRNA Expression
Total RNA was extracted from
S. cerevisiae grown to the log-growth phase using Yeast Processing Reagent (Takara, Dalian, China). RNA was reverse-transcribed into cDNA using the PrimeScript™ RT Reagent Kit with gDNA Eraser (Takara, Dalian, China). Genomic DNA was removed by incubating at 42 °C for 2 min. Reverse transcription was performed at 37 °C for 15 min and 85 °C for 5 s. RT-qPCR was performed using TB Green® Premix Ex Taq™ II (Tli RNaseH Plus) (Takara, Dalian, China) on a CFX96 Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA). The parameters were as follows: 95 °C for 30 s, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s, and a melt curve step of 95 °C for 10 s followed by 65 °C to 95 °C in increments of 0.5 °C. A Ct value of >37 was defined as a negative result. Normalization was performed using
β-actin RT-PCR. The relative mRNA expression levels of GFP were calculated using the comparative $2^{-\Delta\Delta Ct}$ or $2^{-\Delta Ct}$ method. Real-time qPCR was performed with three biological and three technical replicates. Primer-BLAST was used to design the qPCR primers for quantifying native gene transcripts so that the amplicons had similar sizes and amplification efficiencies. The amplicon lengths selected ranged from 100 to 150 bp. The final concentration of each primer was 200 nmol/L. The primers used are listed in
Table S3 of Supplementary File S1.
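The comparative Ct calculation named above can be sketched as follows. This is a minimal illustration of the standard formulas, assuming roughly 100% amplification efficiency for both target and reference; the Ct values are invented, and the function names are hypothetical. The Ct > 37 cutoff is taken from the text.

```python
NEGATIVE_CT = 37.0  # Ct values above this are treated as negative (from the text)

def rel_expr_dct(ct_target, ct_reference):
    """2^-dCt: target expression relative to the reference gene."""
    if ct_target > NEGATIVE_CT:
        return None  # no detectable target
    return 2.0 ** -(ct_target - ct_reference)

def rel_expr_ddct(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """2^-ddCt: target expression relative to a calibrator sample."""
    if ct_target > NEGATIVE_CT or ct_target_cal > NEGATIVE_CT:
        return None
    d_ct_sample = ct_target - ct_reference
    d_ct_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(d_ct_sample - d_ct_calibrator)

# Hypothetical Ct values: sample (GFP 22, reference 18), calibrator (GFP 25, reference 18).
vs_reference = rel_expr_dct(22.0, 18.0)
vs_calibrator = rel_expr_ddct(22.0, 18.0, 25.0, 18.0)
undetected = rel_expr_dct(38.5, 18.0)
```

In this example the sample Ct is three cycles earlier than the calibrator after normalization, which corresponds to an eightfold higher relative expression.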
2.10. Western Blot Analysis
S. cerevisiae BY4741 and the recombinant strains were cultured in 50 mL of YPD liquid medium for 48 h at 28 °C. Cells were collected by centrifugation at 8000× g for 1 min at 4 °C. The collected cells were washed twice with phosphate-buffered saline (PBS). The total protein was extracted using a Yeast Total Protein Extraction Kit (Sangon Biotech, Shanghai, China). The samples were boiled for 5 min, separated by 12.5% sodium dodecyl sulfate–polyacrylamide gel electrophoresis, and transferred to polyvinylidene difluoride membranes. Membranes were blocked with 5% non-fat milk at room temperature for 2 h, followed by incubation with primary antibodies (GFP: Cat#bs-0890R; β-actin: Cat# bs-0061R, 1:2000, Bioss, Beijing, China) at 4 °C overnight. To remove unbound primary antibodies, the membranes were washed thrice in TBST and incubated with peroxidase-conjugated anti-rabbit secondary antibodies (bs-0295G-HRP, 1:2000, Bioss, China) for 1 h at room temperature. Signals were detected using an enhanced chemiluminescence kit (Bio-Rad) and Fusion FX6 imaging system (Vilber Lourmat, Marne La Vallée, France).
2.11. Quantitative Assessment of GFP Protein
The concentration of the GFP protein was measured using a GFP ELISA kit (SED025Ge, Cloud-Clone, Wuhan, China) according to the manufacturer’s protocol. Samples were diluted at 1:100 with PBS and analyzed in duplicate. Standard curves for GFP expression were generated according to the manufacturer’s instructions. The absorbance of each sample was spectrophotometrically measured at 450 nm using a microplate reader (BioTek, Winooski, VT, USA). The units of GFP protein are shown in ng/mL. ELISA assays were conducted in triplicate.
2.12. Statistical Analysis
Statistical analyses were performed using GraphPad Prism 8 software (GraphPad Inc., San Diego, CA, USA). All data are presented as mean ± standard error of the mean. Two-sample t-tests were used to compare two independent groups, and paired t-tests were used to compare two paired groups. Differences were considered statistically significant at p < 0.05 (α = 0.05).
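For readers who want to reproduce the summary statistics outside GraphPad Prism, the sketch below computes the mean ± SEM and the t statistics for the two test types. It assumes the equal-variance (pooled) form of the two-sample test, which the text does not specify, and it stops at the t statistic; converting it to a p-value requires the t distribution with the stated degrees of freedom. The numbers are invented.

```python
import math
from statistics import mean, stdev

def mean_sem(values):
    """Mean and standard error of the mean."""
    return mean(values), stdev(values) / math.sqrt(len(values))

def two_sample_t(a, b):
    """Student's t statistic with pooled variance; df = len(a) + len(b) - 2."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2

def paired_t(a, b):
    """Paired t statistic on the per-pair differences; df = n - 1."""
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical measurements for two groups of three replicates.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0]

m, sem = mean_sem(group_a)
t_ind, df_ind = two_sample_t(group_a, group_b)
t_pair, df_pair = paired_t(group_a, group_b)
```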
3. Results
3.1. NDR Length and Expression Levels of Native Gene of S. cerevisiae
Studies have shown that NDR length in the transcriptional start region correlates closely with gene expression in
S. cerevisiae [
22]. To determine the validity of the theoretical conclusion, we examined the relationship between NDR length and gene expression levels. We obtained and analyzed 1258 genes from
S. cerevisiae, and the NDR lengths ranged between 80 bp and 370 bp. Among them, about 80% of NDR lengths were less than 150 bp, and 20% were greater than 150 bp. The length distribution of the 1258 NDR sequences is shown in
Figure 1A. We obtained the FPKM values from the RNA-seq data for
S. cerevisiae BY4741 and analyzed the relative mRNA expression levels of 1258 genes. The FPKM values were used as expression levels. Genes with NDRs longer than 150 bp showed approximately 30% higher relative mRNA expression levels compared to those with NDRs shorter than 150 bp (
Figure 1B). After excluding genes with extremely low FPKM values (FPKM < 1), a total of 86 genes were randomly selected from the 1258 native genes of
S. cerevisiae. To keep the statistical analyses reliable and reproducible and the two datasets balanced, we selected 37 genes with long NDRs (L-NDR) and 49 genes with short NDRs (S-NDR). L-NDRs were between 150 bp and 400 bp long, and S-NDRs were between 80 bp and 149 bp long. We examined the relative mRNA expression levels of the two groups of genes using RT-qPCR and analyzed the potential correlation between NDR length and gene expression. The results showed that the relative mRNA expression levels of genes with long NDRs were 15 times higher than those of genes with short NDRs (
p < 0.0001;
Figure 1C). The results indicated the possibility of a positive correlation between the length of the NDR and the level of gene expression in a portion of the genes under investigation.
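The grouping used in this section can be sketched as follows: genes are split at the 150 bp NDR threshold, genes with FPKM < 1 are dropped, and the mean expression of the two groups is compared. Applying the FPKM filter before grouping is an assumption made for this sketch, and the gene names and values are placeholders, not data from this study.

```python
from statistics import mean

THRESHOLD_BP = 150  # L-NDR: >= 150 bp, S-NDR: <= 149 bp (from the text)
MIN_FPKM = 1.0      # genes below this FPKM are excluded (from the text)

def split_by_ndr_length(genes, threshold=THRESHOLD_BP, min_fpkm=MIN_FPKM):
    """Return (long_ndr_fpkm, short_ndr_fpkm) lists from (name, ndr_bp, fpkm) tuples."""
    long_ndr, short_ndr = [], []
    for _name, ndr_bp, fpkm in genes:
        if fpkm < min_fpkm:
            continue  # skip genes with extremely low expression
        if ndr_bp >= threshold:
            long_ndr.append(fpkm)
        else:
            short_ndr.append(fpkm)
    return long_ndr, short_ndr

# Placeholder records: (gene name, NDR length in bp, FPKM).
genes = [
    ("geneA", 220, 30.0),
    ("geneB", 180, 50.0),
    ("geneC", 120, 10.0),
    ("geneD", 95, 30.0),
    ("geneE", 300, 0.5),  # excluded: FPKM < 1
]

long_fpkm, short_fpkm = split_by_ndr_length(genes)
fold_difference = mean(long_fpkm) / mean(short_fpkm)
```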
3.2. Structural Characteristics of Regulatory Sequence of Target Gene
Based on the potential positive correlation between NDR length and gene expression levels [
22,
25,
37], we extracted ±1 nucleosome sequences along with their corresponding NDR sequences from six
S. cerevisiae genes. Then, we utilized these sequences as transcriptional regulatory elements to construct a GFP expression system in
S. cerevisiae for investigating the relationship between NDR length and transgene expression level. The GFP expression system in
S. cerevisiae is illustrated in
Figure 2A.
We chose these six genes for two reasons. First, for the L-NDR model, the NDR is longer than 200 bp and native gene expression is high, whereas for the S-NDR model, the NDR is shorter than 140 bp and native gene expression is low. Second, the coding regions of the six native genes contain no introns. The chromatin structure of the ±1 nucleosome sequences and NDR sequences of the six genes is illustrated in
Figure 2B. The lengths of the −1 and +1 nucleosome sequences were set to 150 bp. The transcriptional regulatory sequences started 75 bp upstream from the center position of the −1 nucleosome and extended 75 bp downstream from the center position of the +1 nucleosome. The TSS was designated as position 0 and used as a reference for other locations.
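The coordinate rule described above can be sketched as follows. The region boundaries (75 bp beyond each nucleosome center, TSS at position 0) follow the text; treating the NDR as the gap between the two 150 bp nucleosome footprints is an assumption of this sketch, and the example centers are invented.

```python
NUCLEOSOME_BP = 150          # nucleosome sequence length used in the text
HALF = NUCLEOSOME_BP // 2    # 75 bp on each side of the nucleosome center

def regulatory_region(minus1_center, plus1_center):
    """Coordinates (relative to TSS = 0) of the regulatory region and its NDR.

    Intervals are half-open: [start, end).
    """
    if minus1_center >= plus1_center:
        raise ValueError("the -1 nucleosome must lie upstream of the +1 nucleosome")
    start = minus1_center - HALF      # 75 bp upstream of the -1 center
    end = plus1_center + HALF         # 75 bp downstream of the +1 center
    ndr_start = minus1_center + HALF  # downstream edge of the -1 nucleosome
    ndr_end = plus1_center - HALF     # upstream edge of the +1 nucleosome
    return {
        "start": start,
        "end": end,
        "ndr_length": max(0, ndr_end - ndr_start),
        "total_length": end - start,
    }

# Hypothetical centers: -1 nucleosome at -250, +1 nucleosome at +60.
region = regulatory_region(-250, 60)
```

With these example centers the region spans −325 to +135 and contains a 160 bp NDR flanked by two 150 bp nucleosome sequences.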
The six transcriptional regulatory sequences were categorized into three long NDRs (L-NDR) and three short NDRs (S-NDR), according to the lengths of the NDR sequences. The L-NDRs were longer than 200 bp, and the S-NDRs were in the range of 90–150 bp. For the YBL032W, YBL025W, YAL030W, YAR008W, and YBL071W-A genes, the translational start site, ATG, was located within the +1 nucleosome. For the YAR027W gene, the ATG was located downstream of the +1 nucleosome. The transcriptional regulatory sequences therefore contained partial coding sequences of the selected yeast genes. The coding sequence of the exogenous gene was connected to the 3′-end of the transcriptional regulatory sequence. To ensure that the reading frame of the exogenous gene was not disrupted, we chose partial coding sequences whose lengths were multiples of three.
In the six selected genes, we did not consider the relative locations of the TSS and the start codon ATG in the transcribed region. We focused mainly on how differences in NDR length affect the regulation of transgene expression. Information on the six transcriptional regulatory sequences is shown in
Table 1. The six transcriptional regulatory sequences are shown in
Supplementary File S2.
3.3. Impacts of Nucleosome Assembly and Disassembly on Native Gene Expression in S. cerevisiae
To investigate the impact of −1 nucleosomes on DNA replication, we used the −1 nucleosome sequences derived from the
YBL025W,
YBL032W,
YBL071W-A,
YAR008W,
YAR027W and
YAL030W genes along with commercially available nucleosome assembly kits for in vitro nucleosome assembly. Subsequently, isothermal amplification PCR was conducted, employing both naked DNA and nucleosome-assembled DNA sequences as templates. The results indicated that, compared with naked DNA, the amplification products significantly decreased in the presence of nucleosomes (
Figure 3A). This reduction was attributed to nucleosome DNA being wrapped by proteins, which prevented primer binding. Consequently, only a minor fraction of unencapsulated DNA fragments served as templates during isothermal amplification, thereby diminishing the amplification yield. These observations indicate that the formation of the −1 nucleosome impedes DNA replication, thereby decreasing the amplification efficiency.
We further analyzed the expression levels of L-NDR genes (
YBL025W,
YBL032W, and
YBL071W-A) and S-NDR genes (
YAR008W,
YAR027W, and
YAL030W) in
S. cerevisiae using RT-qPCR. For each gene, both high- and low-expression strains were examined to determine the effect of −1 nucleosomes on downstream native gene expression (
Figure 3B). Absolute quantitative PCR was performed using the −1 nucleosome DNA sequence as a template to evaluate the effects of −1 nucleosome disintegration or sliding on the DNA abundance. The results revealed no statistically significant difference in the amplification of the −1 nucleosome DNA sequence between the two groups of L-NDR genes. However, there was a significant reduction in −1 nucleosome DNA amplification in both the S-NDR groups (
Figure 3C). These results suggest that during transcription, the −1 nucleosome of S-NDR genes at the high expression stage may disintegrate or slide, increasing the NDR sequence length and available physical space for transcription factor binding, thereby enhancing transcription efficiency. Conversely, for L-NDR genes, the position of the −1 nucleosome and NDR sequence length likely remained unchanged, resulting in a minimal impact on their transcription levels.
3.4. Construction of the NDR− and NDR+ Sequences
To verify the impact of NDR length on gene expression, we manipulated the NDR length of six original transcriptional regulatory sequences and developed a GFP expression system employing these sequences as transcriptional regulatory elements. For long NDRs, an 80 bp DNA fragment was truncated near the 5’ end, reducing the NDR length to less than 150 bp, named NDR−. This reduction decreases the physical space for transcription factor binding to the DNA sequence, prolonging the transcription factor binding time and thereby reducing gene transcription efficiency. For short NDRs, an additional 80 bp was added to each original NDR sequence near the 5’ end, increasing the NDR length to more than 150 bp (NDR+). This extension expands the physical space for transcription factor binding and shortens the binding time, thereby enhancing the gene transcription efficiency. The length of the NDR is directly related to the accessibility of DNA. A shorter NDR means that the chromatin state of the region is tighter, making it difficult for transcription factors to bind to the target NDR. In contrast, a longer NDR provides an open chromatin environment that allows transcription factors to bind more easily and quickly, resulting in a relatively shorter binding time [
22,
37,
38,
39]. The structures of transcriptional regulatory sequences with altered NDR lengths are shown in
Figure 4. The NDR− and NDR+ lengths are listed in
Table 1. The details of NDR sequence modifications are as follows.
The NDR− transcriptional regulatory sequences are YBL071W-A, YBL032W, and YBL025W. In the YBL071W-A NDR sequence, an 80 bp DNA fragment was excised upstream of the −47 bp site, and the remaining NDR sequences were spliced to create a 128 bp NDR− sequence. For YBL032W, an 80 bp fragment was removed upstream of the −72 bp site, resulting in a 138 bp NDR− sequence after splicing. Similarly, in YBL025W, an 80 bp fragment excised upstream of the −134 bp site yielded a 134 bp NDR− sequence upon splicing.
The NDR+ transcriptional regulatory sequences are YAL030W, YAR008W, and YAR027W. In the YAL030W NDR sequence, an 80 bp DNA fragment spanning from −46 bp to −125 bp was inserted at the −126 bp position, extending the NDR+ sequence to 214 bp. For YAR008W, an 80 bp fragment from −36 bp to −115 bp was inserted at the −116 bp site, resulting in a 203 bp NDR+ sequence. In YAR027W, an 80 bp fragment between −19 bp and −98 bp was inserted at −99 bp, expanding the NDR+ sequence to 175 bp.
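The two sequence manipulations described above (excising 80 bp to make NDR−, and inserting a copy of an 80 bp fragment directly upstream of itself to make NDR+) can be sketched as string operations. The functions use 0-based indices into the NDR string rather than TSS-relative coordinates, which is a simplification, and the toy sequences are placeholders rather than the real yeast sequences.

```python
FRAGMENT_BP = 80  # size of the excised or duplicated fragment (from the text)

def shorten_ndr(ndr, cut_end, n=FRAGMENT_BP):
    """Excise the n bases immediately upstream of index cut_end and splice the rest."""
    if cut_end - n < 0 or cut_end > len(ndr):
        raise ValueError("excised fragment falls outside the NDR")
    return ndr[:cut_end - n] + ndr[cut_end:]

def lengthen_ndr(ndr, frag_start, n=FRAGMENT_BP):
    """Insert a copy of ndr[frag_start:frag_start + n] directly upstream of that fragment."""
    if frag_start < 0 or frag_start + n > len(ndr):
        raise ValueError("duplicated fragment falls outside the NDR")
    fragment = ndr[frag_start:frag_start + n]
    return ndr[:frag_start] + fragment + ndr[frag_start:]

# Toy 208 bp "long" NDR: removing 80 bp leaves 128 bp.
long_ndr = "A" * 100 + "C" * 80 + "G" * 28
ndr_minus = shorten_ndr(long_ndr, cut_end=180)

# Toy 134 bp "short" NDR: duplicating 80 bp gives 214 bp.
short_ndr = "T" * 20 + "ACGT" * 20 + "G" * 34
ndr_plus = lengthen_ndr(short_ndr, frag_start=20)
```

The toy lengths were chosen so that the results match the 128 bp NDR− and 214 bp NDR+ lengths reported for YBL071W-A and YAL030W, respectively.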
3.5. Analysis of Exogenous Gene
The integration site of the GFP gene on the chromosome of
S. cerevisiae is shown in
Figure 5A. To validate the integration of the GFP gene into the
S. cerevisiae chromosome, we employed the polymerase chain reaction (PCR) technique to ascertain the presence of the GFP gene using genomic DNA isolated from recombinant
S. cerevisiae as a template. The results demonstrated the presence of a 714 bp DNA fragment in the recombinant strain, but not in the control strain (
Figure 5B), indicating that the GFP gene had been integrated into the
S. cerevisiae chromosome. The integration was further validated by Sanger sequencing, which demonstrated a perfect alignment between the obtained sequence and the known GFP sequence. Then, the copy number of the GFP gene in recombinant
S. cerevisiae was determined using the quantitative PCR (qPCR) method. The results suggest that the copy numbers of GFP genes in the genome of
S. cerevisiae are comparable (
Figure 5C). In conclusion, the GFP gene was integrated into chromosome XI of the
S. cerevisiae genome.
3.6. Correlation Between NDR Length and Exogenous Gene Expression Level
To verify the relationship between NDR length and gene expression, three L-NDR and three S-NDR transcriptional regulatory sequences, along with the coding sequence of the GFP gene, were chemically synthesized, cloned into the pAUR101 vector, and introduced into
S. cerevisiae BY4741 cells for expression. The mRNA relative expression levels of the GFP gene under the regulation of each transcriptional regulatory sequence were detected by RT-qPCR. The results showed that the relative mRNA expression levels of GFP under the regulation of L-NDR sequences were all higher than those of S-NDR sequences (
Supplementary File S3: Figure S2A). The average relative expression level of mRNA under the regulation of L-NDR was three times higher than that of S-NDR sequences (
Figure 6A). Independent
t-test results showed that the expression differences were significant between L-NDR and S-NDR groups (
p < 0.01).
The protein expression was detected by ELISA and Western blotting. The results showed that the expression levels of the GFP protein under the regulation of L-NDR sequences were all higher than that of S-NDR sequences (
Supplementary File S3: Figure S2B), and the average expression level of the GFP protein under the regulation of L-NDR sequences was seven times higher than that of S-NDR sequences (
Figure 6B). Independent
t-tests revealed that the green fluorescent protein expression differences between L-NDR and S-NDR groups were significant (
p < 0.001). Using Western blot analyses, 27 kDa products were detected in the recombinant
S. cerevisiae protein samples (
Figure 6C). The green fluorescent protein expression levels followed the same trend as the relative mRNA expression levels.
3.7. Changing the NDR Length of the Same Gene Can Regulate Gene Expression
To further verify the effect of NDR length on exogenous gene expression, we compared GFP gene expression under L-NDR versus NDR− regulation, and under S-NDR versus NDR+ regulation, using RT-qPCR, ELISA, and Western blotting. GFP mRNA levels were 19-, 22-, and 9-fold lower, and protein levels 3-, 6-, and 4-fold lower, under NDR− regulation than under L-NDR regulation for the
YBL025W,
YBL032W, and
YBL071W-A genes, respectively (
Figure 7A,B). Western blot analysis detected consistent 27 kDa products (
Figure 7C).
Conversely, GFP mRNA levels were 11-, 33-, and 4-fold higher, and protein levels 6-, 15-, and 3-fold higher, under NDR+ regulation than under S-NDR regulation for the
YAR008W,
YAR027W, and
YAL030W genes, respectively (
Figure 7D,E). Western blot analyses also indicated 27 kDa products, corroborating the mRNA expression trends (
Figure 7F). These findings demonstrated that alterations in NDR length significantly influenced exogenous gene expression, with shorter NDRs downregulating gene expression and longer NDRs upregulating it.
4. Discussion
In
S. cerevisiae, nucleosome positioning is crucial for regulating gene expression [
32]. It has been observed that −1 nucleosomes in long nucleosome-depleted regions (NDRs) remain stable during transcription, whereas those in short NDRs may disassemble or shift upstream. This suggests a nuanced interaction between NDR length and transcriptional regulation [
43]. Genes with long NDRs allow transcription factors to bind directly to NDR sequences without displacing the −1 nucleosome, resulting in higher gene expression. In contrast, genes with short NDRs require energy to move the −1 nucleosome upstream, leading to lower gene expression (
Figure 8). Our experimental data support this view, although not all the genes conformed to this pattern. Future studies should delve deeper into the epigenetic mechanisms through which NDRs influence gene expression.
Currently, the main purpose of the transgenic design is to increase the level of exogenous gene expression based on the one-dimensional sequence information of the gene transcription start region, but research on how to regulate exogenous gene expression levels is still immature. We proposed a method that may regulate the expression of exogenous genes from the perspective of the chromosome structure of ±1 nucleosome and the NDR in the transcriptional regulatory region. It was found that the length of NDR sequences between the −1 nucleosome and the +1 nucleosome in the transcriptional start region of S. cerevisiae genes was closely related to gene expression levels. Based on the conclusion, we took the −1 nucleosome, the NDR sequence, and the +1 nucleosome as the transcriptional regulatory sequences to explore the effect of NDR length on the expression levels of GFP.
For the transcriptional regulatory sequences with long NDRs (>150 bp), the relative mRNA and protein expression levels of GFP were significantly higher than those obtained with short NDRs (<150 bp). When the long NDRs were shortened by 80 bp to less than 150 bp, the relative mRNA and protein expression levels of GFP decreased significantly. Conversely, when the short NDRs were lengthened by 80 bp to more than 150 bp, the relative mRNA and protein expression levels of GFP increased significantly. These results indicate that the length of the NDR sequence is a key factor in regulating the expression of exogenous genes: lengthening the NDR sequence can increase exogenous gene expression, whereas shortening it can decrease expression. As mentioned above, this may be related to the effect of NDR length on transcription factor binding. A long NDR is conducive to transcription factor binding, and the gene expression level is high. In contrast, a short NDR reduces the binding efficiency of transcription factors, resulting in decreased gene expression levels.
In eukaryotes, the regulation of gene expression is a complex process that is affected by a variety of factors, including the TATA box, DNA methylation, histone modification, and transcription factor binding. It has been demonstrated that the presence of nucleosomes impedes the formation of TATA-box and preinitiation complexes, making it essentially impossible for TATA-binding proteins and the RNA polymerase II transcription machinery to bind to nucleosomal DNA. Consequently, NDRs are generated at poly(dA:dT) tracts, which allows the transcription machinery to bind to the core promoter and initiate transcription [
36,
44]. NDRs are essential functional elements for initiating transcription and replication, providing entry points on chromatin for protein complexes involved in RNA transcription and DNA replication [
45,
46,
47]. Furthermore, histone modifications are of great importance in the regulation of chromatin-associated processes. These modifications include methylation, acetylation and ubiquitination of lysine residues, as well as phosphorylation of serine residues. They are essential for a number of functions, including transcription and DNA repair. Among these modifications, acetylation occurs predominantly on nucleosomes flanking the NDR [
48,
49,
50]. It has been demonstrated that ATP-dependent nucleosome remodeling complexes can localize +1 and −1 nucleosomes through the NDR [
47]. In conclusion, gene expression regulation involves a multitude of molecular mechanisms, and a comprehensive understanding of these mechanisms is of paramount importance.
Finally, it should be stressed that, in this study, we mainly focused on the effects of the −1 nucleosome, +1 nucleosome, and NDR length on exogenous gene expression. One study has shown that when large genomic regions from a foreign yeast species are introduced into
S. cerevisiae, the distance between nucleosomes is characteristic of
S. cerevisiae, and when foreign DNA is introduced into
S. cerevisiae, nucleosome-depleted regions may also occur in coding regions [
51]. Therefore, we thought that the transcription system would still form the structure of the −1 and +1 nucleosomes when it is integrated into the chromosome of
S. cerevisiae. Experimental analysis confirmed that the transgene design was consistent with this expectation, which means that the transcription system, including the NDR, preserves the nucleosome occupancy signatures in
S. cerevisiae. Meanwhile, it is important to note that this was only a preliminary study investigating the relationship between NDR length and the expression of exogenous genes, and there are some limitations. In the future, more precise design is needed to explore the regulation of exogenous gene expression. For example, first, more appropriate transcriptional regulatory sequences in
S. cerevisiae genes should be selected. The translation start site should be located at the 3′-end or downstream of the +1 nucleosome sequences in the selected transcriptional regulatory sequences. Then, the part of the coding sequence of the
S. cerevisiae gene is replaced by the coding sequence of the exogenous gene. This design not only ensures the integrity of the +1 nucleosome structure but also achieves the direct expression of the exogenous gene. Second, the sequence of the +1 nucleosome can be reconstructed. It was found that the central site of the +1 nucleosome is located at +143 bp in the coding sequence of the
MF1 gene in
S. cerevisiae BY4741. The length of the signal peptide sequence is 267 bp, which means that the +1 nucleosome is included in the signal peptide sequence. The signal peptide sequence can be used to construct the sequence of the +1 nucleosome. Subsequently, the coding sequence of exogenous gene can be connected with the signal peptide sequence to express the exogenous gene. Lastly, we can gradually change the length of the NDR to explore the expression changes of exogenous genes for different transcriptional regulatory sequences. Based on our design and the experiment results, we believe that the expression levels of exogenous genes and
S. cerevisiae genes can be regulated according to the actual demand. Further in-depth study of this issue is warranted. However, we cannot confirm whether this kind of NDR structure of
S. cerevisiae is applicable to other genomes. Nevertheless, NDRs are highly conserved across evolution [
21], so the design ideas presented here could be used to regulate transgenes in other genomes in the future.