Isolation and Comprehensive in Silico Characterisation of a New 3-Hydroxy-3-Methylglutaryl-Coenzyme A Reductase 4 (HMGR4) Gene Promoter from Salvia miltiorrhiza: Comparative Analyses of Plant HMGR Promoters

Salvia miltiorrhiza synthesises tanshinones with multidirectional therapeutic effects. These compounds have a complex biosynthetic pathway, whose first rate limiting enzyme is 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGR). In the present study, a new 1646 bp fragment of the S. miltiorrhiza HMGR4 gene consisting of a promoter, 5′ untranslated region and part of a coding sequence was isolated and characterised in silico using bioinformatics tools. The results indicate the presence of a TATA box, tandem repeat and pyrimidine-rich sequence, and the absence of CpG islands. The sequence was rich in motifs recognised by specific transcription factors sensitive mainly to light, salicylic acid, bacterial infection and auxins; it also demonstrated many binding sites for microRNAs. Moreover, our results suggest that HMGR4 expression is possibly regulated during flowering, embryogenesis, organogenesis and the circadian rhythm. The obtained data were verified by comparison with microarray co-expression results obtained for Arabidopsis thaliana. Alignment of the isolated HMGR4 sequence with other plant HMGRs indicated the presence of many common binding sites for transcription factors, including conserved ones. Our findings provide valuable information for understanding the mechanisms that direct transcription of the S. miltiorrhiza HMGR4 gene.


Introduction
Salvia miltiorrhiza Bunge, also known as Red sage, or Chinese sage, is an important species used in traditional Chinese medicine. The dried root is used alone or in combination with other herbs to treat various ailments including cardiovascular diseases, menstrual disorders and insomnia [1,2]. In addition, it has been found to have potential in treating cancer [3], Parkinson's [4] and Alzheimer's disease [5], as well as renal deficiency [6], hepatocirrhosis [7], acute lung injury [8], fibrosis [9], neuropathic pain [10], diabetes mellitus [11], or alcohol dependence [12]. The main bioactive compounds responsible for such medical properties are quinone diterpenoids (e.g., tanshinones) and phenolic acids. The tanshinones are biosynthesised from intermediates generated in mevalonate (MVA) and methylerythritol phosphate (MEP) pathways [13]. The key rate-limiting enzyme in the MVA pathway converting 3-hydroxy-3-methylglutaryl-coenzyme A (HMG-CoA) to MVA is HMG-CoA reductase (HMGR) [14]. The pivotal role of HMGR in plant metabolism is emphasised by the precise regulation of its function at the level of transcription, post-transcription, translation and post-translation [15,16]. To date, five S. miltiorrhiza HMGR gene sequences (HMGR-HMGR4) have been identified and deposited in the GenBank database [17][18][19]. A combination of cDNA sequence similarity searches with exon/intron structure indicates Figure 1. Isolated S. miltiorrhiza HMGR4 promoter sequence (1499 bp), 5′ untranslated region (5′UTR) (96 bp) and coding sequence fragment (51 bp). The potential TATA box, transcription start site (TSS), pyrimidine-rich sequence (PRS), tandem repeat and consensus sequences for hormone-, pathogen-, wounding-, light-, and anaerobic-responsive elements are signed and marked in pink on the strands. ABRE, abscisic acid-responsive element; ARE, anaerobic-responsive element; AuxRE, auxin-responsive element; BRRE, brassinosteroid-responsive element; ERE, ethylene-responsive element; LRE, light-responsive element; MJRE, methyl jasmonate-responsive element Moreover, 5369 TFBSs and 365 potentially interacting TFs described previously in A. thaliana were revealed using the PlantPan 2.0 tool (File S1). Each of the obtained TFs could interact with a number of binding sites. The TFBSs were identified at both strands of the examined sequence. The similarity score between the binding sites found in the S. miltiorrhiza HMGR4 promoter and those detected in A. thaliana ranged from 0.7 to 1.0.
Moreover, 5369 TFBSs and 365 potentially interacting TFs described previously in A. thaliana were revealed using the PlantPan 2.0 tool (File S1). Each of the obtained TFs could interact with a number of binding sites. The TFBSs were identified at both strands of the examined sequence. The similarity score between the binding sites found in the S. miltiorrhiza HMGR4 promoter and those detected in A. thaliana ranged from 0.7 to 1.0.
Additional analysis of the entire HMGR4 promoter sequence revealed the following commonly-known consensus sequences: two auxin-responsive elements (AuxREs); seven salicylic acid (SA)-responsive elements, including W box and TCA-elements; two brassinosteroid-responsive elements (BRREs); two ethylene-responsive elements (EREs); two abscisic acid-responsive elements (ABREs); four methyl jasmonate-responsive elements (MJREs); one pathogen-responsive element Eli box 3; three wounding-and pathogenresponsive elements WRE3; three anaerobic-responsive elements (AREs); and twenty-four light-responsive elements (LREs), including ATCT sequences, Box 4, GT1 motif, TCT motifs, GA motifs, AE box, Box I ( Figure 1). The vast majority of these results were in agreement with data provided by PlantPan 2.0, which demonstrated the presence of numerous binding sites for TFs responding to these types of stimulation (Table S1). Only anaerobic-sensitive TFs were not found. The most commonly-observed TFs were those representing the following families: GATA (light); MYB-related and Dof (auxin); WRKY (SA, wounding and pathogen); NAC; NAM (brassinosteroid and abscisic acid (ABA)); MYB-related (ethylene); and CAMTA (MeJa) ( Table S1). The consensus sequences listed above were distributed along the entire HMGR4 promoter, with some being located only a few nucleotides from each other; this allows more precise regulation of gene expression by dimerization of binding TFs. Such sequences included two SA-responsive TCA elements and two LREs ( Table 1). The search of the PlantPan 2.0 database revealed about 234 TFs that could interact with 915 TFBSs in the HMGR4 proximal promoter (File S2). To complement the findings described above, the binding sites located in the proximal promoter were analysed in terms of their response to external factors. Such data were available for 666 TFBSs (File S2). The results indicated that HMGR4 transcription may be most dependent on light, SA, bacterial infection and auxins, and less dependent on ABA, gibberellin, chitin, cold or salt stress ( Figure 2). The response to these agents was mainly associated with the following TF families: GATA (light); WRKY (SA, bacterial infection, chitin); bZIP (auxins); MYB-related, WRKY and C2H2 (ABA); MYB-related and MADS box (gibberellins); C2H2 and WRKY (cold); and WRKY and MYB-related (salt stress) (Table S2). To determine the stage of S. miltiorrhiza development at which the HMGR4 gene expression regulation is likely to occur, 450 TFBSs and interacting TFs located in its proximal promoter were examined (File S2). The findings indicated flowering, embryogenesis, organogenesis (root, shoot, leaf and flower development) and circadian rhythm ( Figure 3).  As HMGR genes are crucial for the production of intermediates in the biosynthesis pathway for tanshinones, a literature search was performed for TFs that positively regulate tanshinone production. Following this, based on the results obtained with the PlantPan 2.0 database, the S. miltiorrhiza HMGR4 promoter sequence was searched for the presence of binding sites for the 20 identified TFs; of these, the results indicated the presence of the following five TFs: BHLH6 (MYC2) (14 binding sites), BHLH74 (2 sites), BZIP20 (32 sites), WRKY2 (8 sites) and WRKY61 (9 sites) (File S1) [36][37][38][39][40]. A number of TFBSs were also found to be located in the proximal promoter region. The obtained results suggest that HMGR4 may play a role in the biosynthesis of tanshinones.
Furthermore, in silico analysis of the HMGR4 promoter and 5′UTR revealed potentially interacting miRNAs (Table 2). In total, 12 mature miRNAs were found, 8 binding within the promoter and 4 within the 5′UTR.   As HMGR genes are crucial for the production of intermediates in the biosynthesis pathway for tanshinones, a literature search was performed for TFs that positively regulate tanshinone production. Following this, based on the results obtained with the PlantPan 2.0 database, the S. miltiorrhiza HMGR4 promoter sequence was searched for the presence of binding sites for the 20 identified TFs; of these, the results indicated the presence of the following five TFs: BHLH6 (MYC2) (14 binding sites), BHLH74 (2 sites), BZIP20 (32 sites), WRKY2 (8 sites) and WRKY61 (9 sites) (File S1) [36][37][38][39][40]. A number of TFBSs were also found to be located in the proximal promoter region. The obtained results suggest that HMGR4 may play a role in the biosynthesis of tanshinones.
Furthermore, in silico analysis of the HMGR4 promoter and 5′UTR revealed potentially interacting miRNAs (Table 2). In total, 12 mature miRNAs were found, 8 binding within the promoter and 4 within the 5′UTR. As HMGR genes are crucial for the production of intermediates in the biosynthesis pathway for tanshinones, a literature search was performed for TFs that positively regulate tanshinone production. Following this, based on the results obtained with the PlantPan 2.0 database, the S. miltiorrhiza HMGR4 promoter sequence was searched for the presence of binding sites for the 20 identified TFs; of these, the results indicated the presence of the following five TFs: BHLH6 (MYC2) (14 binding sites), BHLH74 (2 sites), BZIP20 (32 sites), WRKY2 (8 sites) and WRKY61 (9 sites) (File S1) [36][37][38][39][40]. A number of TFBSs were also found to be located in the proximal promoter region. The obtained results suggest that HMGR4 may play a role in the biosynthesis of tanshinones.
Furthermore, in silico analysis of the HMGR4 promoter and 5 UTR revealed potentially interacting miRNAs (Table 2). In total, 12 mature miRNAs were found, 8 binding within the promoter and 4 within the 5 UTR.

Microarray and Next-Generation Sequencing (NGS) Co-Expression Data Analysis
The Protein BLAST analysis revealed that the coding sequence of S. miltiorrhiza HMGR4 (AEZ55673.1) was more similar to A. thaliana HMGR1 (NP_177775.2), with an identity of 73.76%, than to A. thaliana HMGR2 (NP_179329.1), with one of 69.48%. Additionally, a phylogenetic study of coding sequences indicated that HMGR4 from S. miltiorrhiza (JN831103.1) and HMGR1 from A. thaliana (NM_106299.4) were more closely related to each other ( Figure S1). Therefore, the A. thaliana HMGR1 gene (At1g76490) was used for further co-expression research. As a result of the conducted microarray analysis, 166 TF genes co-expressed with A. thaliana HMGR1 in the r range of 0.5-1.0 were found: 41 in AtGenExpress Hormone and Chemical Compendium, 37 in AtGenExpress Abiotic Stress Compendium, 34 in AtGenExpress Pathogen Compendium, 25 in AtGenExpress Tissue Compendium, and 29 in AtGenExpress Plus-Extended Tissue Compendium (Table S3). The RNA-seq analysis did not identify any TF genes co-expressed with A. thaliana HMGR1 in the WGCNA correlation range of 0.5-1.0.

Comparison of the in Silico HMGR4 Analysis Results with Microarray Co-Expression Data
The comparison identified 32 common TFs in the S. miltiorrhiza HMGR4 promoter (Table 3), with the most well-represented being TFs from the homeodomain-leucine zipper (HD-ZIP) and WRKY families. The common TFs participated mainly in response to hormones (ABA, ethylene, jasmonic acid and cytokinins), other abiotic factors (light, salt stress, water deprivation, heat and iron ion) and biotic agents (bacteria), embryogenesis, organogenesis (root development), flowering or, finally, tissue development (epidermis) ( Table 3).     These 32 common TFs were scanned with the Genomatix Pathway System for the presence of interactions between them. The identified relationships are presented in Figure 4. It was found that SVP, AGL18 and SPL3 proteins were involved together in the regulation of flowering, and PDF2 and ATML1 in epidermal specification in embryos, respectively. Furthermore, NAC3 and RD26 responded to high salinity, drought and ABA, while NAC3 and ZF2 supported resistance to the herbivore Spodoptera littoralis. EIL3 and EIL1 played roles in regulating the response to sulphur deficiency and in ethylene signalling, and EBP transcription was light modulated through the EIN2-EIN3/EIL1 pathway. These 32 common TFs were scanned with the Genomatix Pathway System for the presence of interactions between them. The identified relationships are presented in Figure 4. It was found that SVP, AGL18 and SPL3 proteins were involved together in the regulation of flowering, and PDF2 and ATML1 in epidermal specification in embryos, respectively. Furthermore, NAC3 and RD26 responded to high salinity, drought and ABA, while NAC3 and ZF2 supported resistance to the herbivore Spodoptera littoralis. EIL3 and EIL1 played roles in regulating the response to sulphur deficiency and in ethylene signalling, and EBP transcription was light modulated through the EIN2-EIN3/EIL1 pathway. Of the 32 common TFs that were recognised, 18 were found to interact with the TFBSs situated in the proximal promoter region ( Table 3). The ability of these 18 TFs to bind to DNA as dimers or multimers was also tested. The TFBSs identified for HD-ZIP (ATML1, PDF2, HDG1), WRKY (WRKY2, WRKY14, WRKY45, WRKY57, WRKY69) and Dof (DOF5.4) are given in Table 1; all are closely located to each other, and only separated by a few nucleotides. The existence of experimentally-determined interactions between ATML1 and PDF2 proteins was confirmed by the BioGRID database.

Comparison of S. miltiorrhiza HMGR Promoters
The S. miltiorrhiza HMGR1, HMGR2 and HMGR4 promoter sequences were analysed using the Common TFs tool. Based on the findings, common binding sites for TFs were recognised, and these were classified into 22 matrix families ( Figure 5, Table 4), with each single matrix family comprising identical or functionally-similar TFs identified by weight matrices. The following matrix families were found: Arabidopsis homeobox proteins (P$AHBP), L1 box (P$L1BX), MYB IIG-type binding sites (P$MIIG), DNA binding with one finger (P$DOFF), GT box elements (P$GTBX), MADS box proteins (P$MADS), MYB-like proteins (P$MYBL), MYB proteins with single DNA binding repeat (P$MYBS), NAC factors with transmembrane motif (P$NTMF), plant specific NAC proteins (P$NACF), transcription repressor KANADI (P$KAN1), W box family (P$WBXF), time-of-day-specific regulatory elements (P$TODS), nodulin consensus sequence 1 (P$NCS1), sweet potato DNA-binding factor with two WRKY domains (P$SPF1), zinc Of the 32 common TFs that were recognised, 18 were found to interact with the TFBSs situated in the proximal promoter region ( Table 3). The ability of these 18 TFs to bind to DNA as dimers or multimers was also tested. The TFBSs identified for HD-ZIP (ATML1, PDF2, HDG1), WRKY (WRKY2, WRKY14, WRKY45, WRKY57, WRKY69) and Dof (DOF5.4) are given in Table 1; all are closely located to each other, and only separated by a few nucleotides. The existence of experimentally-determined interactions between ATML1 and PDF2 proteins was confirmed by the BioGRID database.

Comparison of S. miltiorrhiza HMGR Promoters
The S. miltiorrhiza HMGR1, HMGR2 and HMGR4 promoter sequences were analysed using the Common TFs tool. Based on the findings, common binding sites for TFs were recognised, and these were classified into 22 matrix families ( Figure 5, Table 4), with each single matrix family comprising identical or functionally-similar TFs identified by weight matrices. The following matrix families were found: Arabidopsis homeobox proteins (P$AHBP), L1 box (P$L1BX), MYB IIG-type binding sites (P$MIIG), DNA binding with one finger (P$DOFF), GT box elements (P$GTBX), MADS box proteins (P$MADS), MYBlike proteins (P$MYBL), MYB proteins with single DNA binding repeat (P$MYBS), NAC factors with transmembrane motif (P$NTMF), plant specific NAC proteins (P$NACF), transcription repressor KANADI (P$KAN1), W box family (P$WBXF), time-of-day-specific regulatory elements (P$TODS), nodulin consensus sequence 1 (P$NCS1), sweet potato DNA-binding factor with two WRKY domains (P$SPF1), zinc finger proteins (P$ZFAT), light-responsive elements (P$LREM), protein secretory pathway elements (P$PSPE), CGCG box binding proteins (P$CGCG), proteins involved in programmed cell death response (P$PCDR), plant nitrate-responsive elements (P$PNRE) and finally, stomatal carpenter (P$SCAP). As can be seen in Figure 5, the distribution of the matrices within the HMGR1 and HMGR2 promoter sequences was strikingly similar. The above TF family analysis found that the S. miltiorrhiza HMGR1, HMGR2 and HMGR4 genes can be co-regulated in response to abiotic factors (auxins, gibberellins, ABA, SA, jasmonic acid, brassinosteroids, light, water deprivation, salt stress, cold or phosphate starvation), biotic factors (bacteria, fungi and viruses) and during root, stem, leaf and flower organogenesis (Table 4). finger proteins (P$ZFAT), light-responsive elements (P$LREM), protein secretory pathway elements (P$PSPE), CGCG box binding proteins (P$CGCG), proteins involved in programmed cell death response (P$PCDR), plant nitrate-responsive elements (P$PNRE) and finally, stomatal carpenter (P$SCAP). As can be seen in Figure 5, the distribution of the matrices within the HMGR1 and HMGR2 promoter sequences was strikingly similar. The above TF family analysis found that the S. miltiorrhiza HMGR1, HMGR2 and HMGR4 genes can be co-regulated in response to abiotic factors (auxins, gibberellins, ABA, SA, jasmonic acid, brassinosteroids, light, water deprivation, salt stress, cold or phosphate starvation), biotic factors (bacteria, fungi and viruses) and during root, stem, leaf and flower organogenesis (Table 4).  root, seed, stamen and xylem development; cellular cadmium ion homeostasis; gibberellin and flavonol biosynthesis; defence response to fungi; response to: ABA, chitin, salt stress, cold, water deprivation, phosphate starvation, potassium ion and light DNA binding with one finger <break/>(P$DOFF) secondary shoot, cotyledon and seed development; cell  shoot system and stomatal complex development; trichome morphogenesis; seed maturation and germination; cell size and growth; response to: auxin and water deprivation The FrameWorker tool indicated the existence of 10,000 10-element-frameworks within the S. miltiorrhiza HMGR1, HMGR2 and HMGR4 promoters. Two selected models are provided in Figure 6. The frameworks were created based on 52 matrix families common to the tested sequences, some of which are mentioned above in the Common TF results section. The matrix families were located on the positive or negative strand of the promoters.
The FrameWorker tool indicated the existence of 10,000 10-element-frameworks within the S. miltiorrhiza HMGR1, HMGR2 and HMGR4 promoters. Two selected models are provided in Figure 6. The frameworks were created based on 52 matrix families common to the tested sequences, some of which are mentioned above in the Common TF results section. The matrix families were located on the positive or negative strand of the promoters. The DiAlign TF tool analysis found the HMGR1 (GU367911.1) and HMGR2 (KF297286.1) promoters to demonstrate the greatest similarity (97%). In contrast, only 17% similarity was found between HMGR4 (KT921337.1) and HMGR2, and 14% between HMGR4 and HMGR1. The greatest number of identical areas was revealed in the proximal fragments of the analysed promoters, as well as in the beginning and the middle of the distal parts. Common overlapping TFBSs were identified in locations where all three tested sequences showed high local similarity; these included two binding sites for Arabidopsis homeobox proteins (P$AHBP), one site for SBP domain proteins (P$SBPD), one site for W box family proteins (P$WBXF), one site for DNA binding with one finger factors (P$DOFF), one GT box element (P$GTBX) and, finally, one L1 box (P$L1BX). The PlantPan 2.0 and MatInspector (Genomatix) databases indicated that P$AHBP proteins are mainly involved in the response to hormones (auxins, ABA, cytokinins and gibberellins) and the initiation and development of shoot, root and flower meristems. P$SBPD TFs are associated with inflorescence development, flowering and leaf epidermis differentiation. In turn, P$WBXF matrix family responds to hormonal stimulation (SA, ABA, ethylene and jasmonic acid), other abiotic factors (salt stress, wounding, osmotic stress, heat, water deprivation and cold) and biotic agents (bacteria, fungi and viruses), and also participate in leaf senescence. P$DOFF proteins are primarily involved in the regulation of flowering, circadian rhythm and in response to hormones (auxins and SA). P$GTBX factors participate in the organogenesis of flowers and shoots. In addition, P$L1BX pro- The DiAlign TF tool analysis found the HMGR1 (GU367911.1) and HMGR2 (KF297286.1) promoters to demonstrate the greatest similarity (97%). In contrast, only 17% similarity was found between HMGR4 (KT921337.1) and HMGR2, and 14% between HMGR4 and HMGR1. The greatest number of identical areas was revealed in the proximal fragments of the analysed promoters, as well as in the beginning and the middle of the distal parts. Common overlapping TFBSs were identified in locations where all three tested sequences showed high local similarity; these included two binding sites for Arabidopsis homeobox proteins (P$AHBP), one site for SBP domain proteins (P$SBPD), one site for W box family proteins (P$WBXF), one site for DNA binding with one finger factors (P$DOFF), one GT box element (P$GTBX) and, finally, one L1 box (P$L1BX). The PlantPan 2.0 and MatInspector (Genomatix) databases indicated that P$AHBP proteins are mainly involved in the response to hormones (auxins, ABA, cytokinins and gibberellins) and the initiation and development of shoot, root and flower meristems. P$SBPD TFs are associated with inflorescence development, flowering and leaf epidermis differentiation. In turn, P$WBXF matrix family responds to hormonal stimulation (SA, ABA, ethylene and jasmonic acid), other abiotic factors (salt stress, wounding, osmotic stress, heat, water deprivation and cold) and biotic agents (bacteria, fungi and viruses), and also participate in leaf senescence. P$DOFF proteins are primarily involved in the regulation of flowering, circadian rhythm and in response to hormones (auxins and SA). P$GTBX factors participate in the organogenesis of flowers and shoots. In addition, P$L1BX proteins are needed for epidermis development and seed germination. These data are available in File S1 and Table 4.

The Conservation of Plant HMGR Promoters
MEGA X software alignment of 36 sequences spanning the proximal promoters and 5 UTRs of the plant HMGR genes revealed the presence of conserved regions; these are marked in blue in Figure 7. These regions were detected in both the 5 UTRs and proximal promoters. In most of the tested sequences, PRS was identified within the preserved areas. Additional analysis with the DiAlign TF tool revealed the presence of conserved TFBSs. However, no TFBS was found to be conserved in any of the analysed sequences. The most conserved site was the TATA box, detected in 41.7% of the sequences. Two preserved binding sites for TFs belonging to the P$AHBP (Arabidopsis homeobox protein) and P$GCCF (GCC box family) families were detected in 27.8% of cases. Sites for P$TDTF (transposase-derived proteins), P$MYBL (MYB-like proteins) and P$L1BX (L1 box) proteins were identified in 25% of the sequences. Binding sites for P$DREB (dehydration responsive element binding factors) and P$ROOT (root hair-specific cis-elements in angiosperms) families were found in 22.2%. These TFBSs are highlighted in red in Figure 7. Apart from the conserved TATA box and PRS motifs, the investigated S. miltiorrhiza HMGR4 promoter was found to contain several other common binding sites for TFs, belonging to P$AHBP (Arabidopsis homeobox protein), P$GTBX (GT box elements), P$DOFF (DNA binding with one finger) and P$L1BX (L1 box); these were shared by 8-13% of the tested sequences. Nucleotide pairwise alignment of S. miltiorrhiza HMGR4 with the remaining tested sequences found 13 to 31% identity (mean 24.7%).

Discussion
Our study presents new data regarding the isolated S. miltiorrhiza HMGR4 promoter and 5 UTR and compares these sequences with other plant HMGRs.
Initially, the sequences were examined for the presence of certain distinctive motifs (Figure 1). One such motif found in the investigated sequences is the TATA box, which is estimated to be present in 30-50% of all known promoters [41] and 29% of A. thaliana promoters [42]. It was also detected in the S. miltiorrhiza HMGR2 promoter [43]. Previous studies in human and yeast models indicate that the TATA box is more common in promoters of highly-regulated genes and in those stimulated by stress factors and extracellular signals [44][45][46][47][48]; in contrast, TATA-less genes are more constitutively expressed and associated with key processes such as cell growth [44][45][46][47][48]. In addition, promoters containing the TATA box appear to have a more conserved sequence than those that do not [49].
The HMGR4 promoter is also characterised by the presence of a single tandem repeat. This motif is estimated to be present in only 25% of promoters [50], and is absent from the S. miltiorrhiza HMGR2 promoter [43]. As tandem repeats are more prone to mutation, which affects the length of the repeat and thus local nucleosome positioning and gene expression rate, genes whose promoters have tandem repeats show higher rates of transcription divergence [50].
Both the HMGR2 promoter and the studied HMGR4 promoter lack CpG islands [43]. The cytosines in the CG dinucleotides of the islands can be methylated, thus inhibiting gene expression [51,52]. However, the CpG cluster is not required for methylation since, in plants, it can also occur within the CHG and CHH sequences (H = A, T or C) [53].
A PRS is also detected in the 5 UTR of the described sequence. This is a rather rare observation, but not an unprecedented one, as a PRS has also been found in the 5 UTR of the S. miltiorrhiza HMGR2 gene [43]. It is believed to take part in the organisation of the spliceosomal complex [54].
Furthermore, the examined S. miltiorrhiza HMGR4 promoter sequence turned out to be rich in TFBSs recognised by specific TFs (File S1). The conducted research indicates that the number of promoter regulatory elements and interacting proteins positively correlates with divergence of gene expression [55]. The HMGR4 proximal promoter was found to contain consensus sequences mainly related to the response to light, SA, bacterial infection, auxins, ABA, gibberellin, chitin, cold or, finally, salt stress (Figure 2), suggesting that these factors may participate in gene regulation. One previous paper investigating the influence of external agents on S. miltiorrhiza HMGR4 found that treatment with 200 µM MeJa had no significant effect on HMGR4 expression in either leaves or roots [19]. It is important to note that the effect of these factors has been examined on other S. miltiorrhiza HMGR genes. Chen et al. found that only 100% red light slightly increased the expression of HMGR in hairy root culture, while other light types (e.g., 100% far-red, 100% blue, red:far-red, blue:far-red, red:blue, red:blue:UV) had an inhibitory effect [56]. In contrast, Wang et al. noted that UV-B enhanced the expression of HMGR1 in roots almost 5-fold compared to an untreated control [57]. Incubation of hairy root culture with 100 µM SA raised HMGR transcript level, peaking at three-times higher than baseline after 36 h [58]. In turn, 200 µM SA has been found to have a differential effect on HMGR2 promoter in leaf material [43]. A decrease in its activity was observed after 12, 24 and 48 h of treatment, while a 2.5-to 3-fold rise compared to the calibrator values was observed after 72 and 96 h. Bacteria appeared to be good activators of HMGR expression. The addition of Pseudomonas brassicacearum subsp. neoaurantiaca and Pseudomonas thivervalensis to S. miltiorrhiza root culture resulted in 2.1-and 1.5-fold enhancements in HMGR enzyme activity, respectively [59]. In addition, Streptomyces pactum Act12 increased HMGR1 expression by more than a factor of 35 on day 14 relative to the calibrator [60]. Exposure to 2.85 µM IAA and 2.88 µM gibberellic acid improved the activity of the HMGR2 promoter, resulting in manifold higher expression compared to the calibrator in 96 h [43]. In turn, 200 µM and 10 µM ABA upregulated HMGR1 and HMGR2, respectively [43,61]. Salt stress (50 mM, 100 mM, 200 mM, 300 mM NaCl) enhanced the expression and enzymatic activity of HMGR in leaves and roots over 48 h of exposure [62]. It was also found that 200 mM NaCl inhibited the level of HMGR1 transcript in leaves and roots as compared to the calibrator [63].
As non-coding regions are generally not highly conserved, from an evolutionary perspective, finding such motifs in the promoter or 5 UTR sequences suggest they have functional importance [64].
Within the studied 36 HMGR sequences, the most frequently-identified conserved motif was the TATA box ( Figure 7). This is a known sequence that has been conserved from Archaebacteria to humans [65]. The other TFBSs discussed in the Results section were shared by a much smaller number of tested sequences (27.8% or fewer).
The study also examined the possibility that more complex structures could be created by TFs interacting with the HMGR4 promoter. TFs participate in the regulation of gene expression as monomers, dimers (homo-and heterodimers) or multimers. Dimers and multimers are often preferred by nature because they allow specific interactions with the promoter sequence and bind with high affinity [66]. One TF monomer can create dimers or multimers with different functions, thus mediating the regulation of various genes, by forming bonds with multiple, but not random, protein partners [67]. Our analyses revealed the presence of closely-related TFBSs for the following TFs in the S. miltiorrhiza HMGR4 proximal promoter: HD-ZIP (ATML1, PDF2 and HDG1), WRKY (WRKY2, WRKY14, WRKY45, WRKY57 and WRKY69) and Dof (DOF5.4) ( Table 1). The HD-ZIP proteins are unique to the plant kingdom. TFs from the family are unable to bind to DNA as monomers [68]. They form homo-and heterodimers via the leucine zipper motif [67]. Meanwhile, ATML1 was able to create a heterodimer with its paralogue PDF2 in studies on Nicotiana benthamiana and A. thaliana [69,70], and to form homodimers in vitro [69,71]. It has been shown that WRKY TFs can interact with DNA as monomers or create homo-and heterodimers, especially those with a leucine zipper motif [72][73][74], WRKY2 protein was found to form homodimers in Hordeum vulgare [75], while WRKY45 created homodimers in vitro by exchanging β4-β5 strands in Oryza sativa [72]. The Dof TFs have a multifunctional domain that allows them to bind to DNA and interact with other proteins [76] and to establish homo-and heterodimers.
As miRNA is believed to regulate plant promoter activity at the transcription level, the investigated HMGR4 sequence was searched for miRNA binding sites and interacting miRNAs [27]. Of the 12 miRNAs potentially binding to the HMGR4 promoter and 5 UTR sequences (Table 2), non-conserved miR1128 and miR1436 were detected during deep sequencing in S. miltiorrhiza [77]. However, their significance in the regulation of gene expression has not yet been investigated at the experimental level.
The results of our present in silico analysis of the HMGR4 promoter and 5 UTR sequences constitute a strong basis for planning future necessary experiments on S. miltiorrhiza.

Isolation of the S. miltiorrhiza HMGR4 Promoter Sequence
Genomic DNA used for isolation of the HMGR4 promoter was obtained from young, fresh S. miltiorrhiza leaves and stems according to the method proposed by Khan et al. [78]. The DNA was analysed using a NanoPhotometer P300 (Implen, Munich, Germany) to determine its quantity and quality based on A 260 /A 280 and A 260 /A 230 ratios. The HMGR4 promoter region was isolated using GenomeWalker Universal Kit (Takara Bio, Kusatsu, Japan) according to the manufacturer's instructions. A 5 -terminal fragment of the HMGR4 gene, deposited in GenBank under accession number JN831103.1, was used as a target for designing GSP1 and GSP2 specific primers (Table S4). The PCR reactions were performed using the Advantage 2 PCR Kit (Takara Bio, Kusatsu, Japan). The amplified DNA fragments were TOPO-TA cloned into a pCRII-TOPO vector (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions. The inserts were Sanger sequenced (CoreLab, Medical University of Lodz, Lodz, Poland) with specific primers listed in Table S4. The HMGR4 promoter sequence was assembled using CodonCode Aligner software version 8.0.2 (CodonCode Corporation, Centerville, MA, USA).
The isolation and sequencing of the S. miltiorrhiza HMGR4 promoter took approximately one month.

In Silico Analysis of the S. miltiorrhiza HMGR4 Promoter Sequence
The obtained HMGR4 promoter sequence was characterised in silico using available tools and databases [79]. The promoter, TATA box, TSS and 5 UTR positions were identified with TSSP software (Softberry Inc., Mount Kisco, NY, USA) [80]. Tandem repeats, CpG islands, TFBSs and TFs were detected using PlantPan 2.0 [81]. The promoter sequence was screened for the presence of commonly-known consensus motifs reported in the published literature. Assuming that the functional TFBSs are concentrated mainly in the proximal promoters, special attention was paid to the promoter region lying within 300 bp from the TSS. The miRBase tool was used to search for miRNA binding sites and interacting miRNAs in the obtained promoter and 5 UTR sequences [82].

Microarray and NGS Co-Expression Data Analysis
Protein BLAST (NCBI, Bethesda, MD, USA) and MEGA X version 10.2.6 (Pennsylvania State University, State College, PA, USA) [83] were employed to determine which of the A. thaliana HMGR genes is a homologue of the S. miltiorrhiza HMGR4 gene. Analyses were performed on coding sequences. Expression Angler (BAR, Toronto, ON, Canada) [84] and Arabidopsis RNA-seq Database [85] were utilised to find TF genes co-expressed with the selected A. thaliana HMGR gene. The Expression Angler tool has access to the expression results for approximately 22,000 Arabidopsis genes, while the Arabidopsis RNA-seq Database integrates 28,164 publicly available Arabidopsis RNA-seq libraries. The following microarray dataset compendiums were used during the study: AtGenExpress Hormone and Chemical, AtGenExpress Abiotic Stress, AtGenExpress Pathogen, AtGenExpress Tissue, and AtGen-Express Plus-Extended Tissue. The Pearson's correlation coefficient (r) ranging from 0.50 to 1.00 (moderate to strong positive correlation) was applied to identify co-regulated genes. Information on the detected TFs was obtained from the UniProt database [86]. The collected results were compared with the in silico data found by PlantPan 2.0. The occurrence of interactions between the received common TFs was determined using Pathway System (Genomatix, Munich, Germany) and the BioGRID database version 4.4.201 [87].

Comparison of S. miltiorrhiza HMGR Promoters
The entire available S. miltiorrhiza HMGR promoter sequences, i.e., HMGR1 (GU367911.1), HMGR2 (KF297286.1), and HMGR4 (KT921337.1) were analysed with Common TFs, Frame-Worker and DiAlign TF tools from Genomatix, Munich, Germany. Common TFs was used for preliminary analysis of the common TFBSs and interacting TFs located anywhere in the investigated promoters. The search only included sites that were common to all three sequences. The similarity of the matrix to the tested sequences was set to the highest possible value, i.e., 0.05. The FrameWorker tool permitted the common, most complex framework of TFBSs to be extracted from the input promoters. Frameworks are defined as TFBSs that occur in the same order and in a specificed space range in all of the sequences. The DiAlign TF allowed for multiple alignment of the studied HMGR promoters, and revealed conserved regions and TFBSs located therein. The analyses using the Genomatix tools were based on matrix library version 11.3 and default search criteria.

Assessment of Conservation of Plant HMGR Promoters
The conservation of the 36 available HMGR promoters derived from plants such as A. thaliana, Arabidopsis lyrata, Glycine max, Gossypium hirsutum, Oryza sativa, Solanum lycopersicum, Zea mays and S. miltiorrhiza was assessed by aligning their sequences. Proximal promoters and 5 UTR sequences (each sequence 500 bases long) were obtained from Plant-Pan 3.0 [88] and NCBI Nucleotide databases with the participation of the UniProt [86]. Alignments were performed using the MUSCLE algorithm from the MEGA X software, version 10.2.6 [83]. TFBSs located in the conserved regions of the compared sequences were recognised with the DiAlign TF tool (Genomatix, Munich, Germany).

Conclusions
Regulation of S. miltiorrhiza HMGR4 gene expression can occur during flowering, embryogenesis, organogenesis and circadian rhythm, and are influenced mainly by factors such as light, SA, bacterial infection and auxins.
The presence of binding sites for TFs that promote the biosynthesis of tanshinones may indicate that the S. miltiorrhiza HMGR4 gene plays an important role in the production of these metabolites.
A comparison of TFBSs and TFs in the S. miltiorrhiza HMGR1, HMGR2, and HMGR4 promoter sequences indicates that these genes can be co-regulated in response to abiotic and biotic factors, and during organogenesis.
The S. miltiorrhiza HMGR4 promoter is not highly conserved. Future research on the S. miltiorrhiza HMGR4 promoter could be developed towards preparing promoter deletion mutants, and studying their transcriptional activity [89]. Moreover, mutagenesis of particular TFBSs could be suitable for experimental verification of their importance in response to biotic or abiotic factors [89]. The TFs or other regulatory proteins could be isolated using a yeast-one hybrid (Y1H) system and the promoter segments as bait [90]. Isolated TFs could be functionally characterised by studying their DNA binding properties, and their potential to increase expression of specific genes [91,92]. These studies could be verified by chromatin immunoprecipitation-sequencing (ChIP-seq) of DNA fragments that are associated with particular proteins [93]. Finally, regulatory networks of TFs and other proteins playing a pivotal role in the response to certain external factors could be built using transcriptomic RNA sequencing, and weighted gene co-expression network analysis (WGCNA) [94,95].
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants11141861/s1, Figure S1: Unrooted dendrogram of HMGR sequences from S. miltiorrhiza and A. thaliana; Table S1: TFs responsive to light, hormone, wounding and pathogen stimulation found in the entire S. miltiorrhiza HMGR4 promoter sequence using PlantPan 2.0 database; Table S2: TFs responsive to major abiotic and biotic factors found in the proximal S. miltiorrhiza HMGR4 promoter using the PlantPan 2.0 database; Table S3: TF genes co-expressed with A. thaliana HMGR1 found with the Expression Angler tool; Table S4: Primers used in the study.