1. Introduction
Plants have developed complex regulatory networks during long-term evolution to adapt to changing environments and to maintain normal growth and development [
1]. Among these, secondary metabolic pathways generate a wide range of bioactive compounds that play crucial roles in resisting pathogen invasion, coping with abiotic stresses and regulating physiological processes [
2], and also have broad applications in the pharmaceutical, food and industrial sectors [
3]. Transcription factors are central regulators of secondary metabolism, as they integrate external environmental signals with endogenous phytohormone pathways to precisely control the expression of biosynthetic enzyme genes [
4].
The AP2/ERF (APETALA2/Ethylene-Responsive Factor) superfamily is a large, plant-specific transcription factor family characterised by a highly conserved AP2 DNA-binding domain [
5]. It is widely involved in the regulation of growth and development [
6], hormone signal transduction [
7], and stress responses [
8]. With the rapid progress of plant genome sequencing, AP2/ERF family members have been systematically identified in various species, including approximately 147 members in Arabidopsis thaliana [
9], ~170 in rice [
10], ~301 in soybean [
11], ~132 in grapevine [
12], and ~200 in poplar [
13]. These studies have revealed their regulatory roles in cold, drought and salt stresses [
14,
15,
16], pathogen responses [
17], reactive oxygen species (ROS) scavenging [
18], as well as signalling pathways mediated by ethephon (ETH), jasmonic acid (JA) and abscisic acid (ABA) [
19,
20].
Beyond their canonical roles in development and stress adaptation, AP2/ERF transcription factors also participate in the regulation of specialised metabolism [
21]. ORCA2 and ORCA3 in
Catharanthus roseus activate multiple biosynthetic enzyme genes involved in terpenoid indole alkaloid production [
22]; AaERF1 and AaERF2 in
Artemisia annua enhance artemisinin biosynthesis [
23]; and in
A. thaliana, AtERF4 and AtERF8 influence the accumulation of phenylpropanoid-derived products such as anthocyanins [
24]. These findings suggest that AP2/ERF transcription factors act as key nodes in regulating plant secondary metabolism and may represent effective molecular targets for improving the yield of medicinal constituents.
Salvia miltiorrhiza (Danshen) is a traditional Chinese medicinal plant widely used for the treatment of cardiovascular and cerebrovascular diseases [
25]. Its major bioactive components include water-soluble phenolic acids and lipid-soluble diterpene quinones [
26]. Following the release of the Danshen genome, previous studies have preliminarily identified the AP2/ERF gene family in
S. miltiorrhiza, reporting approximately 170 members and classifying them into subfamilies such as AP2, ERF and RAV, while expression-based evidence suggested that some genes might be involved in the biosynthesis of tanshinones and phenolic acids [
27]. Despite these advances, most studies have mainly focused on gene number statistics or expression analysis of selected candidates, and comprehensive investigation of genome-scale features—such as overall evolutionary patterns, chromosomal distribution and duplication expansion modes, and intra-/inter-species synteny relationships—remains limited. In addition, systematic analyses of spatiotemporal expression patterns across tissues or developmental stages, as well as studies linking hormone-induced AP2/ERF responses with the accumulation of specialised metabolites (e.g., phenolic acids and tanshinones), are still scarce.
This study performed a comprehensive identification and analysis of the AP2/ERF family in S. miltiorrhiza based on the latest genome dataset, including phylogenetic relationships, gene structures, chromosomal localisation and synteny. By integrating phylogenetic information from Arabidopsis ERF proteins, we further screened four candidate SmAP2/ERF genes that may be involved in phenolic-acid biosynthesis, and analysed their tissue-specific expression patterns, hormone-responsive expression profiles, and subcellular localisation. This work provides a theoretical basis for understanding the evolutionary characteristics of the SmAP2/ERF family and lays a foundation for exploring their potential roles in phenolic-acid biosynthesis.
2. Results
2.1. Identification and Basic Characteristics of SmAP2/ERF Family Members
Based on the
S. miltiorrhiza genome dataset, a total of 169 putative AP2/ERF family members were identified and sequentially named
SmAP2/ERF1–
SmAP2/ERF169 according to their chromosomal order and physical positions (
Supplementary Table S1). With the exception of
SmAP2/ERF36, all members contained a complete AP2 domain.
Physicochemical property prediction showed that SmAP2/ERF proteins ranged from 93 to 645 aa in length, with SmAP2/ERF167 being the shortest (93 aa) and SmAP2/ERF72 the longest (645 aa). The predicted molecular weights varied from 10.45 to 70.59 kDa, and the theoretical isoelectric points (pI) ranged from 4.36 to 11.25, with 70 proteins showing pI values above 7. Most SmAP2/ERFs had an instability index (II) higher than 40, except for SmAP2/ERF107, SmAP2/ERF120, SmAP2/ERF135, and SmAP2/ERF77. The aliphatic index ranged from 44.43 to 91.37 (mean 63.11), and the average GRAVY value was −0.618, indicating that SmAP2/ERF proteins are overall hydrophilic.
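The GRAVY and aliphatic index values reported above follow standard definitions, which can be illustrated with a minimal Python sketch. It assumes the Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index, which are the definitions used by ExPASy ProtParam. The function names and the toy peptide are hypothetical and are not taken from the SmAP2/ERF dataset.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Negative values indicate an overall hydrophilic protein.
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)),
    where X is the mole percentage of each residue."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy_peptide = "MARKSDELVI"  # hypothetical sequence, for illustration only
    print(f"GRAVY = {gravy(toy_peptide):.3f}")
    print(f"Aliphatic index = {aliphatic_index(toy_peptide):.2f}")
```

Under these definitions, a family-wide mean GRAVY below zero corresponds to the overall hydrophilic character described above.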
Subcellular localisation prediction suggested that most SmAP2/ERF proteins are localised in the nucleus (140/169), while a smaller proportion were predicted to localise in the chloroplast (15), mitochondrion (7), or cytoplasm (6). One exception was SmAP2/ERF117, which was the only member predicted to localise to the peroxisome, highlighting additional diversity in subcellular distribution within this family.
2.2. Phylogenetic Relationships and Subfamily Classification of SmAP2/ERF Proteins
To characterise the conserved structural features of the AP2/ERF family in
S. miltiorrhiza, multiple sequence alignment was performed using SmAP2/ERF protein sequences, and low-confidence regions were removed prior to downstream analyses. The alignment showed that members of the AP2 subfamily contained typical double-repeat domains (AP2-R1 and AP2-R2) and largely retained conserved signature residues within the AP2 domain (
Supplementary Figure S1). Members of the RAV subfamily possessed both AP2 and B3 domains, consistent with the canonical architecture of this group. Sequence divergence within the AP2 domain further distinguished the ERF and DREB subfamilies. Among these,
SmAP2/ERF5 exhibited relatively high sequence similarity to the
Arabidopsis Soloist clade.
Based on these domain-feature comparisons, a maximum-likelihood phylogenetic tree was further constructed using combined AP2/ERF proteins from
S. miltiorrhiza and
A. thaliana to resolve evolutionary relationships (
Figure 1). The resulting phylogeny classified all members into five major clades (AP2, RAV, ERF, DREB, and Soloist), which agreed well with the domain compositions revealed by sequence alignment. Further subdivision grouped the DREB clade into six subgroups (A1–A6) and the ERF clade into six subgroups (B1–B6). This classification is consistent with the established
Arabidopsis reference system.
2.3. Gene Structure Features and Conserved Motif Distribution of SmAP2/ERFs
Gene structure analysis revealed pronounced differences in exon–intron organisation among SmAP2/ERF subfamilies. As shown in the right panel of
Figure 2, genes in the AP2 and Soloist clades exhibited more complex gene structures, whereas ERF and DREB members showed markedly simplified architectures. AP2 subfamily genes typically contained 3–10 exons and 2–9 introns, and the Soloist member comprised 7 exons and 6 introns. Most ERF/DREB genes were intronless (single-exon) or contained a single intron. The RAV clade displayed an intermediate level of structural complexity, with 1–5 exons and 0–4 introns. UTR annotations were more frequently observed in AP2, RAV and Soloist members, while most ERF/DREB genes lacked annotated UTRs in the current genome annotation.
Conserved domain analysis confirmed that all SmAP2/ERF proteins except SmAP2/ERF36 contained at least one complete AP2 domain. AP2 subfamily members harboured two AP2 repeats, RAV members featured an AP2 and a B3 domain, and ERF/DREB members possessed a single AP2 domain, consistent with the phylogenetic classification (
Figure 2, middle panel).
To further investigate sequence conservation and group-specific features, 10 conserved motifs were predicted using MEME. The distribution of these motifs is shown in the left panel of
Figure 2. Motif1, Motif2 and Motif3 were widely distributed across the majority of SmAP2/ERF proteins, suggesting that they represent core motifs associated with the AP2 DNA-binding domain. Clear subfamily-specific motif patterns were also observed: the AP2 subfamily was characterised by Motif5 and Motif6, ERF subgroups (B1–B6) commonly contained Motif8, and the DREB A1 subgroup possessed a unique combination of Motif9 and Motif10. Both the A1 and A4 subgroups retained Motif4, while members of the RAV and Soloist clades mainly preserved Motif1 and Motif2.
2.4. Chromosomal Distribution, Gene Duplication Events and Ka/Ks Analysis of SmAP2/ERFs
After mapping SmAP2/ERF genes onto the eight chromosomes of
S. miltiorrhiza, we found that these genes were unevenly distributed across the genome (
Figure 3). Chr1 harboured the largest number of SmAP2/ERF genes (35), followed by Chr5 (25) and Chr6 (23), whereas Chr8 contained the fewest (7). Owing to incomplete assembly of the reference genome, three SmAP2/ERF genes were located on unanchored scaffolds.
Tandem duplication analysis identified 26 tandem-duplication events distributed across Chr1–Chr8. Chr5 showed the highest number of tandem duplications (7 events), followed by Chr6 (6 events) and Chr7 (5 events), while only one tandem-duplication event was detected on each of Chr1, Chr2 and Chr8. Representative tandem gene clusters included SmAP2/ERF96–SmAP2/ERF97 on Chr5, SmAP2/ERF134–SmAP2/ERF135 on Chr6, and SmAP2/ERF153–SmAP2/ERF155 on Chr7.
To evaluate the selective pressure acting on duplicated genes, Ka, Ks and Ka/Ks ratios were calculated for tandem duplicate pairs. The results showed that all tandem duplicate pairs exhibited Ka/Ks < 1, indicating that these tandem duplicated SmAP2/ERF genes have mainly undergone purifying selection during evolution.
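The interpretation of Ka/Ks ratios used above can be sketched as a small Python helper. This is only an illustration of the conventional thresholds (Ka/Ks < 1 purifying, close to 1 neutral, > 1 positive selection); the function name, the tolerance and the example Ka and Ks values are hypothetical and do not come from the SmAP2/ERF tandem pairs.

```python
def selection_mode(ka, ks, tol=1e-9):
    """Return (Ka/Ks ratio, selection label) for one duplicated gene pair.

    ka: non-synonymous substitutions per non-synonymous site
    ks: synonymous substitutions per synonymous site
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        label = "neutral"
    elif ratio < 1.0:
        label = "purifying"
    else:
        label = "positive"
    return ratio, label


if __name__ == "__main__":
    # Hypothetical Ka and Ks values for two illustrative gene pairs.
    example_pairs = {"pair_1": (0.05, 0.25), "pair_2": (0.30, 0.20)}
    for name, (ka, ks) in example_pairs.items():
        ratio, label = selection_mode(ka, ks)
        print(f"{name}: Ka/Ks = {ratio:.2f} ({label})")
```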
2.5. Cis-Acting Element Composition in the Promoters of SmAP2/ERF Genes
Systematic annotation of cis-acting regulatory elements in the 2000 bp upstream promoter regions of SmAP2/ERF genes revealed that the predicted elements could be broadly classified into four categories: light-responsive elements, phytohormone-responsive elements, stress-related elements, and development-related elements (
Figure 4).
Among light-responsive elements, G-box was the most abundant motif, followed by typical light-related elements such as GT1-motif, TCT-motif, and GATA-motif. SmAP2/ERF127 harboured the highest number of light-responsive elements (20).
In the phytohormone-responsive category, ABRE was the most enriched element. CGTCA-motif and TGACG-motif associated with methyl jasmonate (MeJA) responsiveness were also detected, together with the TGA-element and TCA-element related to auxin and salicylic acid responses. SmAP2/ERF115 harboured the largest number of hormone-responsive elements (21).
For stress-related elements, ARE was the most frequently observed motif, together with representative stress-responsive elements such as LTR (low-temperature response) and MBS (drought response). SmAP2/ERF122 and SmAP2/ERF11 contained the highest numbers of stress-responsive elements (9 each).
Within development-related elements, circadian was the predominant motif, and SmAP2/ERF47 showed the highest number of development-related elements, suggesting a potential association with circadian-regulated processes.
It should be noted that the presence of these predicted cis-elements indicates regulatory potential and does not by itself confirm hormone or stress responsiveness of individual SmAP2/ERF genes.
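The 2000 bp upstream regions analysed in this section depend on gene orientation, which the following Python sketch makes explicit. It assumes 1-based inclusive coordinates as in GFF annotation and measures the promoter from the annotated gene boundary; the text does not state whether the transcription start site or the start codon was used, so that choice, the function name and the example coordinates are all assumptions.

```python
def promoter_interval(gene_start, gene_end, strand, chrom_length, upstream=2000):
    """Return the (start, end) of the upstream region, 1-based inclusive.

    For '+' strand genes the promoter lies before gene_start; for '-' strand
    genes it lies after gene_end. The interval is clipped to the chromosome,
    and None is returned when no upstream sequence is available.
    """
    if strand == "+":
        start = max(1, gene_start - upstream)
        end = gene_start - 1
    elif strand == "-":
        start = gene_end + 1
        end = min(chrom_length, gene_end + upstream)
    else:
        raise ValueError("strand must be '+' or '-'")
    if start > end:
        return None
    return start, end


if __name__ == "__main__":
    # Hypothetical gene coordinates on a 100 kb chromosome.
    print(promoter_interval(5001, 6000, "+", 100_000))
    print(promoter_interval(5001, 6000, "-", 100_000))
```

For minus-strand genes the extracted sequence would additionally need to be reverse-complemented before motif scanning.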
2.6. Intraspecies and Interspecies Synteny Relationships of SmAP2/ERFs
To further investigate the evolutionary relationships of the SmAP2/ERF family, we performed both intraspecies and interspecies synteny analyses. All 169 SmAP2/ERF genes were initially included in the synteny search; however, the three genes located on unanchored scaffolds did not exhibit detectable syntenic relationships with other SmAP2/ERFs and were therefore excluded from the duplicated gene pair statistics and synteny visualisations. The intraspecies collinearity analysis identified 49 segmentally duplicated SmAP2/ERF gene pairs in the
S. miltiorrhiza genome (
Figure 5). These duplicated pairs were mainly distributed in the DREB (21 pairs) and ERF (20 pairs) clades, while relatively fewer were detected in the AP2 (7 pairs) and RAV (1 pair) clades, indicating that segmental duplication contributed substantially to the expansion of multiple subfamilies.
For interspecies comparisons, substantial collinear relationships were detected between
S. miltiorrhiza and the selected reference species (
Figure 6). Specifically, 166 syntenic SmAP2/ERF gene pairs were identified between
S. miltiorrhiza and
A. thaliana. Syntenic gene pairs between
S. miltiorrhiza and
S. bowleyana,
S. hispanica, and
S. splendens numbered 193, 259, and 502, respectively, indicating extensive retention of collinear loci across these species.
2.7. Candidate Prioritisation and Preliminary Characterisation of Four SmAP2/ERFs Potentially Associated with Phenolic-Acid Metabolism
To prioritise SmAP2/ERF candidates potentially involved in phenylpropanoid-related regulation, candidate clades were screened using the combined phylogenetic tree. Four SmAP2/ERF members, SmAP2/ERF88, SmAP2/ERF110, SmAP2/ERF121 and SmAP2/ERF122, clustered with Arabidopsis thaliana ERFs reported to participate in phenylpropanoid-associated processes and were selected for subsequent characterisation. According to the ERF subgroup classification, SmAP2/ERF88 belongs to ERF-B3 and SmAP2/ERF110 to ERF-B4, whereas SmAP2/ERF121 and SmAP2/ERF122 were both assigned to ERF-B1.
Subcellular localisation assays were conducted using GFP fusion constructs transiently expressed in
Nicotiana benthamiana leaves via
Agrobacterium-mediated infiltration (
Figure 7). Confocal microscopy showed that, whereas the UBQ10-GFP empty vector control displayed diffuse fluorescence throughout the whole cell, GFP signals from
SmAP2/ERF88,
SmAP2/ERF110,
SmAP2/ERF121 and
SmAP2/ERF122 were specifically enriched in the nucleus, with no obvious signal detected in the cytoplasm. These observations confirm that all four candidate proteins are nuclear-localised, consistent with their predicted roles as transcription factors.
Tissue-specific expression profiling further revealed distinct spatial expression patterns among the four candidates (
Figure 8).
SmAP2/ERF88 was markedly enriched in aerial tissues, with transcript levels in leaves and petioles reaching approximately 36-fold and 19-fold those in roots, respectively, and it also showed moderate upregulation in stems (approximately 5-fold). In contrast,
SmAP2/ERF110 exhibited relatively minor variation across tissues, with only slight increases in leaves and petioles (approximately 2.3-fold and 1.7-fold, respectively). Compared with these two genes,
SmAP2/ERF121 showed a petiole-preferential pattern (approximately 3.3-fold) while remaining low in leaves (approximately 0.29-fold).
SmAP2/ERF122 reached its highest expression in petioles (approximately 10-fold) and maintained a moderate level in leaves (approximately 3.8-fold). Collectively, these results indicate that the four candidates display distinct tissue-preferential expression, with
SmAP2/ERF88 enriched in leaves and petioles, whereas
SmAP2/ERF122 shows the strongest preference for petioles.
2.8. Hormone-Induced Expression Responses of Candidate SmAP2/ERF Genes
Root tissues were selected for hormone-response assays so that hormone-responsive transcriptional changes could be compared with the rosmarinic acid (RA) and salvianolic acid B (SAB) contents measured in the same tissue. To investigate the phytohormone responsiveness of the four candidate SmAP2/ERF genes, transcript levels were quantified after treatments with ETH, ABA, salicylic acid (SA), MeJA and brassinolide (BR) (
Figure 9).
SmAP2/ERF88 showed the strongest responsiveness to ETH and SA, with transcript levels induced by approximately 12-fold and 7.6-fold, respectively, and it was also upregulated by MeJA and ABA (approximately 6.0-fold and 3.7-fold), whereas BR caused only weak induction (approximately 2.3-fold).
SmAP2/ERF110 displayed its strongest response to ABA, which increased expression by approximately 15-fold, while ETH, MeJA and SA also induced transcription to varying degrees (approximately 5.6-fold, 4.6-fold and 3.8-fold, respectively), and BR again produced only a minor effect (approximately 2.2-fold).
Compared with these two broadly inducible genes, SmAP2/ERF121 and SmAP2/ERF122 exhibited more selective response patterns. SmAP2/ERF121 was primarily induced by ABA (approximately 4.3-fold), whereas ETH and SA had negligible effects and BR repressed transcription to approximately 0.56-fold of the control. A similar trend was observed for SmAP2/ERF122, with ABA inducing expression (approximately 2.7-fold) and BR reducing transcript levels to approximately 0.63-fold. Overall, the four candidates showed distinct hormone-response modes, in which SmAP2/ERF88 and SmAP2/ERF110 were broadly responsive, whereas SmAP2/ERF121 and SmAP2/ERF122 were characterised by ABA-associated induction coupled with BR-mediated repression.
2.9. Hormone-Induced Transcriptional Responses of Key Phenolic-Acid Biosynthetic Genes
To assess how phytohormones reprogramme phenolic-acid biosynthesis at the transcriptional level, the expression of PAL, C4H, 4CL, TAT, HPPR, RAS and CYP98A14 was quantified after hormone treatments (
Figure 10). These genes exhibited node-specific and hormone-dependent responses, indicating preferential activation or repression of distinct pathway modules.
Under ETH, PAL, TAT and RAS were moderately induced (e.g., PAL ~1.38-fold; RAS ~1.75-fold), whereas 4CL and HPPR were strongly repressed (~0.23-fold and ~0.10-fold, respectively), suggesting an unbalanced activation across pathway steps. ABA markedly activated the C4H–4CL–RAS module (C4H ~10.79-fold; RAS ~13.98-fold) but simultaneously suppressed TAT (~0.59-fold) and downregulated CYP98A14 to near-background levels, implying a potential downstream constraint. MeJA and SA displayed broadly similar trends, with strong induction of 4CL (~3.83-fold in MeJA; ~3.25-fold in SA) and RAS (~9.50–9.73-fold), accompanied by reduced TAT (~0.54-fold in MeJA) and pronounced repression of CYP98A14 (~0.156-fold in MeJA; ~0.012-fold in SA). BR exerted an overall inhibitory effect on major pathway genes, including strong reductions in PAL (~0.17-fold) and TAT (~0.11-fold) and near-complete suppression of CYP98A14, whereas RAS showed only a mild increase (~1.63-fold). These results highlight that phytohormones regulate distinct pathway nodes and identify CYP98A14 as a highly hormone-sensitive downstream bottleneck that may contribute to hormone-dependent variation in phenolic-acid accumulation.
2.10. Effects of Phytohormone Treatments on RA and SAB Accumulation in Roots
Phenolic-acid accumulation was evaluated by quantifying RA and SAB contents in roots after three weeks of hormone treatment using HPLC (
Figure 11). This later time point, rather than the 12 h sampling used for transcript analyses, was chosen to capture longer-term metabolic responses. Hormone treatments caused divergent changes in RA and SAB accumulation, indicating differential effects on phenolic-acid metabolic output.
In the control (CK), the mean RA and SAB contents were 12.42 mg/g DW and 69.67 mg/g DW, respectively. ETH markedly increased RA accumulation to 26.91 mg/g DW (~2.17-fold) while decreasing SAB to 50.76 mg/g DW. Under SA treatment, RA remained close to the control level (12.89 mg/g DW), whereas SAB increased to 76.68 mg/g DW. ABA reduced both metabolites, with RA and SAB decreasing to 10.89 mg/g DW and 50.82 mg/g DW, respectively. MeJA and BR showed relatively minor effects on RA (11.39 mg/g DW and 12.17 mg/g DW, respectively), while SAB remained at intermediate levels (58.79 mg/g DW and 62.48 mg/g DW, respectively). Taken together, these results indicate that phytohormones reshape phenolic-acid accumulation in S. miltiorrhiza, with ETH preferentially promoting RA, SA favouring SAB accumulation, and ABA exerting inhibitory effects on both metabolites.
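The fold changes relative to the control can be recomputed directly from the mean contents reported above. The following Python sketch uses only those values (mg/g DW); the dictionary layout and function name are illustrative choices.

```python
# Mean (RA, SAB) contents in mg/g DW, as reported in the text.
CONTENTS = {
    "CK": (12.42, 69.67),
    "ETH": (26.91, 50.76),
    "SA": (12.89, 76.68),
    "ABA": (10.89, 50.82),
    "MeJA": (11.39, 58.79),
    "BR": (12.17, 62.48),
}


def fold_changes(contents, control="CK"):
    """Return {treatment: (RA fold, SAB fold)} relative to the control,
    rounded to two decimal places."""
    ra_ctrl, sab_ctrl = contents[control]
    return {
        treatment: (round(ra / ra_ctrl, 2), round(sab / sab_ctrl, 2))
        for treatment, (ra, sab) in contents.items()
        if treatment != control
    }


if __name__ == "__main__":
    for treatment, (ra_fc, sab_fc) in fold_changes(CONTENTS).items():
        print(f"{treatment}: RA {ra_fc}-fold, SAB {sab_fc}-fold")
    # ETH gives RA 2.17-fold and SAB 0.73-fold relative to CK.
```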
3. Discussion
The AP2/ERF gene family represents one of the largest transcription factor groups in plants and participates in a broad range of biological processes, including growth, phytohormone signalling and stress responses [
28]. Functional divergence is evident among different AP2/ERF subfamilies [
6], and even closely related members can play distinct roles across species or conditions [
29]. This study systematically characterised the AP2/ERF family in
S. miltiorrhiza and examined its expansion patterns, structural features and potential functional differentiation. Four candidates potentially involved in phenolic-acid metabolism were further prioritised, providing targets for dissecting specialised-metabolism regulatory networks in Danshen.
We identified 169 AP2/ERF family members in the
S. miltiorrhiza genome, a number comparable to that of model plants such as
A. thaliana [
9]. These genes displayed an uneven chromosomal distribution, suggesting that their evolution may have been shaped by region-specific genomic dynamics. Except for
SmAP2/ERF36, all members contained a complete AP2 domain. Similar cases have also been reported in
Brassica napus,
Gossypium raimondii, and
Nelumbo nucifera [
30,
31,
32]. This phenomenon may be attributed to random mutations or sequence deletions occurring in certain duplicated genes following tandem or segmental duplication, resulting in incomplete retention of the AP2 domain. The encoded proteins ranged from 93 to 645 amino acids in length, with molecular weights of 10.45–70.59 kDa. Most SmAP2/ERF proteins were predicted to localise in the nucleus, indicating that the AP2/ERF family in
S. miltiorrhiza mainly exerts regulatory functions in the nucleus, which is consistent with the typical localisation and function of transcription factors. Based on the established classification system in
Arabidopsis, the 169 SmAP2/ERF proteins were categorised into five major groups: AP2, RAV, ERF, DREB, and Soloist, which is consistent with classification schemes reported in most AP2/ERF family studies.
Although SmAP2/ERF proteins vary considerably in physicochemical properties such as length and molecular weight, their gene structures and key conserved motifs show high overall conservation. Members of the ERF/DREB subfamilies generally exhibited a single-exon structure, which may facilitate rapid transcriptional responses to environmental stimuli. AP2 and RAV subfamilies retained more complex exon–intron organisations, potentially reflecting their roles in developmental regulation. Similar structural patterns have been reported in
Cymbidium sinense and
Cinnamomum camphora [
33,
34], supporting a conserved relationship between structure and function in AP2/ERF families across plant species. In this study, we identified 10 conserved motifs; Motif1, Motif2 and Motif3 were highly conserved in most SmAP2/ERF members, suggesting that they represent the core AP2 domain features. Clear subfamily-specific motif signatures were also observed. The AP2 subfamily was enriched in Motif5 and Motif6, the ERF subfamily commonly contained Motif8, and the DREB A1 subgroup possessed Motif9 and Motif10. This motif divergence is consistent with the findings reported by Jiang et al. [
35].
Promoter regions are enriched in cis-acting regulatory elements, which are critical for controlling gene expression levels and specificity [
36]. By analysing promoter cis-elements, it is possible to infer the types of physiological cues or stress signals that may regulate gene expression. Here, we predicted cis-acting elements in the promoter regions of SmAP2/ERF genes and found that they were mainly associated with four categories: light signalling, phytohormone regulation, stress responses, and developmental processes. Light- and hormone-responsive elements were substantially more abundant than stress- and development-related elements, suggesting that SmAP2/ERF genes may participate in growth/development regulation and hormone signalling pathways in
S. miltiorrhiza.
Differences in gene family size among plant species are common in genome evolution and are typically shaped by genome structural variation and selection pressure. A total of 96 CcAP2/ERF genes and 189 PgAP2/ERF genes were identified in the genomes of
Coptis chinensis and
Panax ginseng, respectively [
37,
38]. The AP2/ERF families in both species exhibited evidence of tandem and segmental duplication, suggesting that gene duplication may contribute to the expansion and size variation in this family across species. In our study, we identified 26 tandem duplication events involving 47 SmAP2/ERF members in the
S. miltiorrhiza genome, and 49 segmentally duplicated SmAP2/ERF gene pairs were detected through comparative genomic analysis. Ka/Ks analysis of tandem-duplicated pairs showed ratios below 1, indicating that these pairs have mainly been subject to purifying selection, consistent with functional conservation after expansion. Comparative synteny analysis identified 166 SmAP2/ERF gene pairs between
S. miltiorrhiza and
A. thaliana. Syntenic gene pairs with
S. bowleyana,
S. hispanica and
S. splendens were 193, 259 and 502, respectively, indicating widespread retention of conserved collinear blocks.
The AP2/ERF transcription factor family is widely regarded as an important regulatory hub linking phytohormone signalling with biotic and abiotic stress responses. Many ERF members mediate the expression of defence-related genes through ethylene, jasmonate, and salicylic acid pathways, playing central roles in immune responses and environmental adaptation. In tomato, the JA-responsive AP2/ERF transcription factor
GAME9/JRE4 can directly bind the promoters of steroidal glycoalkaloid biosynthetic genes, thereby promoting the accumulation of defence-related specialised metabolites under stress conditions [
39]. Similarly, AP2/ERF transcription factors such as
ORCA2 and
ORCA3 in
Catharanthus roseus enhance the accumulation of specific specialised metabolites by activating genes involved in terpenoid indole alkaloid biosynthesis [
40]. Four candidate genes—
SmAP2/ERF88,
SmAP2/ERF110,
SmAP2/ERF121, and
SmAP2/ERF122—were prioritised using comparative phylogeny with
A. thaliana ERFs implicated in phenylpropanoid-associated regulation.
Because A. thaliana does not produce RA or SAB, phylogenetic proximity was used as an initial screening criterion rather than as a proxy for functional equivalence. Tissue expression analysis indicated clear spatial differentiation among these candidates in vegetative organs:
SmAP2/ERF88 was highly enriched in leaves,
SmAP2/ERF121 and
SmAP2/ERF122 were predominantly expressed in petioles, whereas
SmAP2/ERF110 maintained moderate expression across multiple aerial tissues. Hormone-induction assays further showed that
SmAP2/ERF88 and
SmAP2/ERF110 were strongly induced by multiple hormones including ETH, SA, MeJA and ABA, while
SmAP2/ERF121 and
SmAP2/ERF122 were mainly responsive to ABA and were notably repressed under BR treatment. These patterns suggest that
SmAP2/ERF88 and
SmAP2/ERF110 may function as key nodes integrating multiple defence-related signals, whereas
SmAP2/ERF121 and
SmAP2/ERF122 may preferentially participate in an ABA-driven regulatory module that is antagonised by BR. The present dataset is derived from coordinated responses to phytohormone treatments, supporting an associative relationship between candidate SmAP2/ERF expression and phenolic-acid outputs; direct transcriptional regulation of phenolic-acid biosynthetic genes remains to be experimentally established.
Phytohormone treatments triggered node-specific transcriptional reprogramming of key phenolic-acid biosynthetic genes, suggesting modular regulation of the pathway. ABA, MeJA and SA generally activated the C4H–4CL–RAS module but strongly suppressed CYP98A14, whereas ETH and BR exhibited distinct regulatory patterns across upstream and downstream steps. Consistently, hormone treatments produced divergent metabolic outputs, with ETH promoting RA accumulation, SA favouring SAB, and ABA reducing both metabolites. Considering the different time scales between early transcriptional responses and long-term metabolite accumulation, these results imply that hormone-dependent phenolic-acid biosynthesis is governed by coordinated control of multiple pathway nodes, and CYP98A14 may represent a key hormone-sensitive constraint.
The identification, classification and expression profiling of the SmAP2/ERF family provide valuable insights into the potential roles of AP2/ERF transcription factors in S. miltiorrhiza and establish a foundation for future functional studies. Transcription factor–promoter interaction assays, including yeast one-hybrid, EMSA, Dual-LUC, and ChIP–qPCR, will be required to evaluate promoter binding and regulatory capacity. Genetic perturbation approaches, such as overexpression or CRISPR-based editing in stable or hairy-root systems, will further test whether these candidate SmAP2/ERFs directly modulate phenolic-acid biosynthetic genes and influence RA and SAB accumulation.
4. Materials and Methods
4.1. Plant Materials, Growth Conditions and Hormone Treatments
S. miltiorrhiza seedlings were used in this study. Plants were grown in pots under greenhouse conditions (25 °C, 12 h light/12 h dark photoperiod, and ~60% relative humidity). The substrate consisted of peat soil: vermiculite: perlite (5:1:1, v/v/v).
Phytohormone treatments were performed using uniformly grown plants. Hormone solutions were applied by root irrigation (subirrigation): pots were placed in trays, and the hormone solutions were added to the trays. The substrate absorbed the solution from the bottom, ensuring continuous exposure, and the trays were replenished as needed to maintain a constant solution volume. Plants treated with distilled water served as controls (CK). The final working concentrations were 500 μM ETH, 100 μM ABA, 1 mM SA, 100 μM MeJA, and 0.1 μM BR. All phytohormones were purchased from SparkJade (Jinan, China). Root tissues were used for both transcript and metabolite analyses so that gene expression and phenolic-acid contents were measured in the same tissue. Roots were harvested at 12 h post-treatment for RNA extraction and qRT-PCR analysis (three biological replicates per treatment). Root samples collected after 3 weeks were used to quantify rosmarinic acid (RA) and salvianolic acid B (SAB). Each biological replicate represented an independent individual plant, and root tissues were harvested and processed separately.
4.2. Genome Datasets and Sequence Retrieval
The genome assembly and annotation of S. miltiorrhiza were retrieved from the Genome Warehouse (GWH; assembly accession: GWHAOSJ00000000). Predicted protein sequences and the corresponding GFF annotation file were used for family-wide identification and bioinformatic analyses.
AP2/ERF protein sequences of A. thaliana were retrieved from TAIR10 and used for homology searches and comparative phylogenetic analyses. Genome resources for S. bowleyana (GWH: GWHASIU00000000), S. hispanica (NCBI GenBank: GCA_023119035.1), and S. splendens (NCBI GenBank: GCA_004379255.2) were retrieved to support interspecies synteny analyses.
4.3. Identification of SmAP2/ERF Family Members
Candidate SmAP2/ERF proteins were identified by combining homology-based and domain-based approaches [41]. First, A. thaliana AP2/ERF proteins were used as queries to search the S. miltiorrhiza proteome using BLASTP (v2.17.0) with an E-value cutoff of ≤ 1 × 10⁻⁵. In parallel, the Hidden Markov Model (HMM) profile of the AP2 domain (PF00847) was downloaded from Pfam and used as the query for hmmsearch in HMMER (v3.4) with an E-value cutoff of ≤ 1 × 10⁻⁵. Candidates from the two strategies were merged and de-duplicated, and domain validation was conducted using SMART and NCBI-CDD [42]. Proteins containing at least one AP2 domain were retained as confirmed members of the SmAP2/ERF family.
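The merging step can be illustrated with a minimal Python sketch. It assumes that the BLASTP results were written in tabular format (`-outfmt 6`, E-value in the eleventh column) and that hmmsearch was run with `--tblout` (full-sequence E-value in the fifth column). The output formats are not stated in the text, so both are assumptions, and all function names and protein identifiers are hypothetical.

```python
E_VALUE_CUTOFF = 1e-5  # threshold stated in the text for both searches


def blast_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Collect subject IDs from BLAST tabular output (assumed -outfmt 6)."""
    hits = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        subject_id, e_value = fields[1], float(fields[10])
        if e_value <= cutoff:
            hits.add(subject_id)
    return hits


def hmmsearch_hits(tblout_text, cutoff=E_VALUE_CUTOFF):
    """Collect target IDs from hmmsearch output (assumed --tblout format)."""
    hits = set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split()
        target_id, e_value = fields[0], float(fields[4])  # full-sequence E-value
        if e_value <= cutoff:
            hits.add(target_id)
    return hits


def merge_candidates(blast_ids, hmm_ids):
    """Union of both hit sets; set semantics remove duplicate IDs."""
    return sorted(blast_ids | hmm_ids)


# Hypothetical example records (not real search results).
blast_example = "\n".join([
    "\t".join(["AT1G01", "Sm001", "80.0", "100", "5", "0",
               "1", "100", "1", "100", "1e-30", "200"]),
    "\t".join(["AT1G01", "Sm002", "30.0", "50", "20", "2",
               "1", "50", "1", "50", "0.01", "30"]),
])
hmm_example = "\n".join([
    "# target name  accession  query name  accession  E-value  score  bias",
    "Sm001 - AP2 PF00847.25 1e-20 70.0 0.1",
    "Sm003 - AP2 PF00847.25 2e-8 30.0 0.0",
])

candidates = merge_candidates(blast_hits(blast_example),
                              hmmsearch_hits(hmm_example))
print(candidates)
```

The merged list would then go to domain validation (SMART and NCBI-CDD in the text), which this sketch does not cover.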
Physicochemical properties, including molecular weight (MW), theoretical isoelectric point (pI), and instability index (II), were calculated using ExPASy (https://web.expasy.org, accessed on 8 July 2025). Subcellular localisation was predicted with WoLF PSORT (https://wolfpsort.hgc.jp/, accessed on 8 July 2025).
4.4. Phylogenetic Analysis and Subfamily Classification
A combined dataset consisting of AP2/ERF proteins from S. miltiorrhiza and A. thaliana was constructed for phylogenetic inference. Multiple sequence alignment was performed using MUSCLE in MEGA11, and poorly aligned regions were trimmed using trimAl. A maximum-likelihood phylogenetic tree was inferred using IQ-TREE (v3.0.1) with 1000 bootstrap replicates. Subfamily classification of SmAP2/ERF proteins was performed according to the established AP2/ERF system in A. thaliana [43]. Tree visualisation and annotation were performed using iTOL (https://itol.embl.de/, accessed on 28 July 2025).
4.5. Gene Structure and Conserved Motif Analysis
Gene structural features of SmAP2/ERF genes were extracted from the genome annotation file, and the corresponding CDS and protein sequences were retrieved accordingly. Conserved domains were annotated using NCBI Batch CD-Search. Conserved motifs were identified using MEME (https://meme-suite.org/meme/tools/meme, accessed on 15 July 2025) with the maximum number of motifs set to 10 and all other parameters left at their default values [44]. Gene structure, domain distribution, and motif composition were integrated and visualised using TBtools-II (v2.400) [45].
4.6. Chromosomal Localisation, Duplication and Evolutionary Analysis
Chromosomal coordinates of SmAP2/ERF genes were extracted from the GFF annotation, and their distribution patterns were visualised in TBtools-II (v2.400). Gene duplication events were identified using the Quick Run MCScanX Wrapper in TBtools-II (v2.400). Tandemly duplicated gene pairs were further extracted based on genomic proximity criteria. Selection pressure on tandem duplicate pairs was evaluated by calculating nonsynonymous (Ka) and synonymous (Ks) substitution rates and Ka/Ks ratios using the Simple Ka/Ks Calculator in TBtools-II (v2.400).
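The Ka/Ks ratio is conventionally read as follows: values below 1 suggest purifying selection, values near 1 suggest neutral evolution, and values above 1 suggest positive selection. The sketch below illustrates that reading only and does not reproduce the TBtools-II calculator. The function names, the tolerance around 1, and the example values are hypothetical.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def selection_mode(ratio, tolerance=1e-9):
    """Classify a Ka/Ks ratio using the conventional thresholds."""
    if ratio is None:
        return "undefined"
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


# Hypothetical tandem pairs: (gene A, gene B, Ka, Ks)
pairs = [
    ("SmERF_a", "SmERF_b", 0.10, 0.50),
    ("SmERF_c", "SmERF_d", 0.30, 0.20),
    ("SmERF_e", "SmERF_f", 0.05, 0.00),
]

for gene_a, gene_b, ka, ks in pairs:
    ratio = ka_ks_ratio(ka, ks)
    print(gene_a, gene_b, ratio, selection_mode(ratio))
```

Pairs with Ks equal to zero are reported as undefined, so they are neither dropped silently nor assigned an arbitrary ratio.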
4.7. Promoter Cis-Element and Synteny Analyses
Promoter sequences (2000 bp upstream of the CDS start site) were extracted for all SmAP2/ERF genes based on the genome annotation. Cis-acting elements were predicted using PlantCARE (https://bioinformatics.psb.ugent.be/webtools/plantcare/html/, accessed on 15 July 2025), and the element composition was summarised and visualised with TBtools-II (v2.400).
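The 2000 bp promoter window depends on strand orientation, as the following Python sketch illustrates. It assumes 1-based, inclusive GFF coordinates, clips the window at chromosome ends, and returns minus-strand promoters as the reverse complement. These conventions and all function names are assumptions for illustration and are not taken from the extraction tool actually used.

```python
def promoter_interval(cds_start, cds_end, strand, chrom_length, upstream=2000):
    """Return the 1-based inclusive (start, end) of the upstream window.

    cds_start and cds_end are the lowest and highest CDS coordinates in
    the GFF. Returns None if no upstream sequence exists.
    """
    if strand == "+":
        end = cds_start - 1
        start = max(1, end - upstream + 1)
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        start = cds_end + 1
        end = min(chrom_length, start + upstream - 1)
    else:
        raise ValueError("strand must be '+' or '-'")
    if start > end:
        return None
    return start, end


def reverse_complement(seq):
    """Reverse-complement a DNA string (A, C, G, T, N; case preserved)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def promoter_sequence(chrom_seq, cds_start, cds_end, strand, upstream=2000):
    """Extract the promoter in the gene's own 5'-to-3' orientation."""
    interval = promoter_interval(cds_start, cds_end, strand,
                                 len(chrom_seq), upstream)
    if interval is None:
        return ""
    start, end = interval
    seq = chrom_seq[start - 1:end]  # convert to 0-based slicing
    return reverse_complement(seq) if strand == "-" else seq


# Toy chromosome and a 3 bp window, for illustration only.
toy = "AACCGGTTAA"
print(promoter_sequence(toy, 5, 8, "+", upstream=3))
print(promoter_sequence(toy, 3, 6, "-", upstream=3))
```

Genes close to a scaffold end yield promoters shorter than 2000 bp under this clipping rule, which is worth checking before comparing element counts across genes.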
Collinearity analysis within S. miltiorrhiza was performed using MCScanX, and syntenic relationships were visualised using the Circos function in TBtools-II (v2.400). Interspecies synteny analyses were conducted between S. miltiorrhiza and A. thaliana, S. bowleyana, S. hispanica, and S. splendens. Dual synteny plots were generated using the Dual Synteny Plot for MCScanX module in TBtools-II (v2.400).
4.8. Candidate Prioritisation Based on Comparative Phylogeny
To prioritise candidate regulators potentially associated with phenolic-acid metabolism, AP2/ERF members implicated in specialised metabolism or phenylpropanoid-associated regulation in Arabidopsis thaliana were selected as references, including AtERF114, AtERF4, AtERF8, ORA59 (AtERF59), and AtERF012 [24,46,47,48]. Candidate SmAP2/ERF genes were prioritised by phylogenetic proximity to these reference factors.
Because A. thaliana does not synthesise RA or SAB, phylogenetic proximity was treated as a heuristic for initial screening rather than as evidence of functional equivalence.
4.9. RNA Extraction, qRT-PCR and Subcellular Localisation Assays
Total RNA was extracted using a Labgene RNA extraction kit (Labgene, Chengdu, China). RNA concentration and purity were assessed using a NanoDrop 2000 (Thermo Fisher Scientific, Waltham, MA, USA), and RNA integrity was examined via agarose gel electrophoresis. First-strand cDNA was synthesised using HiScript IV All-in-One Ultra RT SuperMix for qPCR (Vazyme, Nanjing, China). qRT-PCR was conducted on a Bio-Rad real-time PCR system using SYBR Green chemistry (Bio-Rad, Hercules, CA, USA). Each sample was analysed with three technical replicates, and primer sequences are provided in Supplementary Table S2. SmActin was used as the internal reference gene, and relative expression levels were calculated using the 2^(−ΔΔCt) method [49]. Statistical differences between each treatment group and the control were assessed by one-way ANOVA followed by Dunnett’s multiple comparisons test. Differences were considered statistically significant at p < 0.05.
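The relative quantification can be sketched in Python as follows. The sketch assumes that technical replicates are averaged before ΔCt is computed and that amplification efficiency is close to 100%, as the 2^(−ΔΔCt) method requires. The function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean target Ct minus mean reference Ct for one biological sample."""
    return mean(target_cts) - mean(reference_cts)


def relative_expression(sample_target, sample_reference,
                        control_target, control_reference):
    """Fold change of a treated sample relative to the control.

    ddCt = dCt(sample) - dCt(control); fold change = 2 ** (-ddCt).
    """
    ddct = (delta_ct(sample_target, sample_reference)
            - delta_ct(control_target, control_reference))
    return 2 ** (-ddct)


# Hypothetical Ct values: three technical replicates per measurement.
treated_gene = [22.0, 22.1, 21.9]
treated_actin = [18.0, 18.0, 18.0]
control_gene = [24.0, 24.1, 23.9]
control_actin = [18.0, 18.0, 18.0]

fold = relative_expression(treated_gene, treated_actin,
                           control_gene, control_actin)
print(round(fold, 2))
```

In this toy example the treated sample crosses the threshold about two cycles earlier than the control after normalisation to the reference gene, which corresponds to roughly a fourfold increase in transcript abundance.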
Tissue-specific expression analysis used roots, stems, leaves, and petioles collected from three independent 4-month-old S. miltiorrhiza plants. Each plant served as one biological replicate for each tissue. Transcript levels of candidate SmAP2/ERF genes were measured by qRT-PCR.
Subcellular localisation analysis was performed by cloning four candidate SmAP2/ERF coding sequences into the UBQ10-pCAMBIA1305-GFP vector to generate GFP fusion constructs. Recombinant plasmids were transformed into Agrobacterium tumefaciens strain GV3101 and transiently expressed in Nicotiana benthamiana leaves. GFP fluorescence was observed at 48 h post-infiltration using confocal laser scanning microscopy (Leica Microsystems, Wetzlar, Germany) with excitation at 488 nm and emission at 510 nm. The UBQ10-GFP empty vector served as the control.
4.10. Determination of Phenolic Acids by HPLC
Root samples were dried at 55 °C and finely ground into powder. Approximately 0.1 g of dried powder was extracted with 8 mL methanol by ultrasonication for 30 min, followed by volume adjustment to 10 mL with methanol. After settling, the supernatant was collected and filtered through a 0.22 μm membrane filter prior to HPLC analysis (Thermo Fisher Scientific, Waltham, MA, USA).
HPLC analysis was performed on a Thermo Fisher system equipped with a reverse-phase C18 column (250 mm × 4.6 mm, 5 μm). The mobile phase consisted of 0.05% phosphoric acid in water (A) and acetonitrile (B). The column temperature was maintained at 30 °C, the flow rate was set to 0.8 mL min⁻¹, and detection was conducted at 286 nm. The gradient programme was as follows: 0–10 min, 10–20% B; 10–35 min, 20–24% B; 35–45 min, 24–28% B [50]. Quantification of RA and SAB was performed using standard curves generated from authentic standards.
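The gradient programme and the conversion from extract concentration to tissue content can be expressed as a short Python sketch. It assumes linear ramps within each gradient segment, a linear standard curve (peak area = slope × concentration + intercept), and content reported per gram of dry powder. None of these details is stated in the text, and the slope, intercept, and peak area are hypothetical placeholders.

```python
# Gradient segments from the text: (start min, end min, %B start, %B end)
GRADIENT = [
    (0.0, 10.0, 10.0, 20.0),
    (10.0, 35.0, 20.0, 24.0),
    (35.0, 45.0, 24.0, 28.0),
]


def percent_b(time_min):
    """Percentage of solvent B at a given time, assuming linear ramps."""
    for t0, t1, b0, b1 in GRADIENT:
        if t0 <= time_min <= t1:
            return b0 + (b1 - b0) * (time_min - t0) / (t1 - t0)
    raise ValueError("time is outside the 0-45 min programme")


def concentration_from_area(peak_area, slope, intercept):
    """Invert an assumed linear standard curve to get mg/mL."""
    return (peak_area - intercept) / slope


def content_mg_per_g(conc_mg_per_ml, final_volume_ml=10.0, sample_mass_g=0.1):
    """Convert extract concentration to content per gram of dry powder.

    Defaults follow the extraction described in the text
    (about 0.1 g powder, final volume 10 mL).
    """
    return conc_mg_per_ml * final_volume_ml / sample_mass_g


# Hypothetical standard-curve parameters and peak area.
conc = concentration_from_area(peak_area=1050.0, slope=20000.0, intercept=50.0)
print(percent_b(5.0), percent_b(22.5))
print(round(content_mg_per_g(conc), 3))
```

Because the powder mass is only approximately 0.1 g, the weighed mass of each sample should be passed as `sample_mass_g` rather than relying on the default.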