Genes
  • Article
  • Open Access

28 September 2025

Genome-Wide Identification and Expression Analysis of the bHLH Transcription Factor Family in Lilium bakerianum var. rubrum

1 College of Landscape and Horticulture, Yunnan Agricultural University, Kunming 650201, China
2 Key Laboratory of Phytochemistry and Natural Medicines, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China
3 Institute for Advanced Study, Chengdu University, No. 2025 Chengluo Road, Chengdu 610106, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Plant Genetics and Genomics

Abstract

Background/Objectives: The basic helix–loop–helix (bHLH) transcription factor family regulates plant development, metabolism, and stress responses, yet its genome-wide composition remains unexplored in Lilium bakerianum var. rubrum (LBVR), an ornamental lily valued for its floral traits. This study aimed to identify, classify, and profile the bHLH family in LBVR using full-length transcriptomic resources. Methods: PacBio HiFi full-length transcriptome sequencing was combined with short-read RNA-seq for accurate structural annotation and expression quantification. Candidate bHLHs were identified with iTAK and with HMMER searches for the Pfam bHLH domain, and their physicochemical properties, secondary structures, motifs, and phylogenetic positions were examined. Expression patterns were analyzed across four floral stages (bud, initial bloom, full bloom, and late bloom). Results: About 90% of the full-length transcript isoforms were functionally annotated, and 113 high-confidence bHLH genes were identified. The proteins varied in molecular weight, isoelectric point, structural features, and motif composition. Phylogenetic analysis placed them into 13 clades consistent with Arabidopsis subfamilies, revealing lineage-specific expansions and contractions. Expression profiling showed that 95 genes were expressed (FPKM > 1) in at least one stage, with two transcriptional waves: a strong bud-to-initial-bloom activation and a secondary wave spanning anthesis. Seventeen genes were expressed exclusively at the bud stage, suggesting roles in early floral-organ initiation and pigmentation. Conclusions: This work provides the first genome-wide characterization of bHLHs in LBVR. The integrated sequencing approach generated a robust catalogue and a developmental expression map, offering candidates for functional studies and resources for lily breeding.

1. Introduction

The basic helix–loop–helix (bHLH) transcription factor (TF) family is one of the largest and most functionally diverse groups of DNA-binding proteins in eukaryotes []. Its members share a conserved bHLH domain of approximately 50–60 amino acids (aa), comprising a basic region followed by two amphipathic α-helices separated by a flexible loop []. The helix–loop–helix portion mediates homo- and heterodimerization, while the basic region enables specific recognition of E-box (CANNTG) elements in target gene promoters. In plants, bHLH proteins govern a broad spectrum of developmental and physiological processes, including cell proliferation, organ identity, responses to light and phytohormones, adaptation to drought and salinity stress, and the control of flavonoid and alkaloid biosynthesis [,]. The number of bHLH genes varies substantially among plant species, from over 160 loci in Arabidopsis thaliana [] and over 180 in rice (Oryza sativa) [] to fewer than 100 in basal land plants such as mosses []. Lineage-specific expansions have given rise to clade-specific subfamilies that exhibit neo- or subfunctionalization, yielding both deeply conserved regulators, such as those controlling stomatal development, and novel factors unique to particular taxa [,]. Functional characterization in model species has shown that certain bHLH clades are indispensable for core processes, whereas others contribute to species-specific traits such as pigment accumulation in fruits or the production of defensive metabolites [,]. More recently, transcriptomic analyses have implicated bHLH TFs in plant responses to temperature stress and in floral coloration, reflecting continued interest in this gene family [,].
These observations highlight the evolutionary plasticity of the bHLH family and the value of a genome-wide catalogue for any given plant species in revealing both conserved modules and unique regulatory innovations.
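The E-box consensus can be made concrete with a small scanning routine. The following is a minimal Python sketch, not part of this study's pipeline; the function name `find_eboxes` and the toy promoter are hypothetical. Because CANNTG is its own reverse complement, one pass over a single strand reports every E-box position on both strands.

```python
import re

# E-box consensus CANNTG (N = any nucleotide). The lookahead lets
# overlapping sites be reported, e.g. CACATG and CATGTG in CACATGTG.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, site) for every E-box in seq.

    CANNTG is its own reverse complement, so scanning one strand also
    reports sites on the opposite strand at the same coordinates.
    """
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq.upper())]


promoter = "ttCACGTGaaCAGCTGcc"  # toy sequence, not from LBVR
for pos, site in find_eboxes(promoter):
    print(pos, site)
```

In the toy example, CACGTG is the G-box variant of the E-box. Real promoter scans usually also weigh the flanking nucleotides, which can influence which bHLH dimers bind a given site.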
In the genus Lilium, several individual bHLH TFs have been characterized in recent years. For example, LpbHLH144 from Lilium pumilum enhances salt and alkali stress tolerance when expressed in transgenic tobacco []. In the Oriental hybrid lily ‘Siberia’, LoUDT1 was characterized as a bHLH gene essential for anther development []. LvbHLH13 from L. ‘Viviana’ positively regulates anthocyanin accumulation in petals by activating LvMYB5 [], and in the same cultivar, LibHLH22 and LibHLH63 were identified as positive regulators of volatile terpenoid biosynthesis that directly enhance the expression of key terpene pathway genes []. Beyond bHLH factors, other transcriptional regulators have also been implicated in lily traits: for instance, LhMYB114, together with structural genes such as LhDFR and LhANS-rr1, is involved in anthocyanin biosynthesis in flower buds of L. ‘Siberia’ [], while transcriptomic studies in the LA hybrid lily ‘Aladdin’ revealed that TFs including BLHs, ARFs, HD-ZIPs, AP2/ERFs, and SBPs participate in the hormonal and sugar-mediated control of stem bulblet formation []. Collectively, these studies highlight the importance of TFs, particularly bHLHs, in controlling development, pigment biosynthesis, secondary metabolism, and stress responses in lilies. However, most reports have focused on single-gene functional analyses or specific physiological pathways, and no genome-wide investigation of the bHLH gene family has yet been reported in lilies.
L. bakerianum var. rubrum (LBVR) is an ornamental lily endemic to the montane regions of Yunnan Province in Southwestern China []. It is distinguished by vibrant magenta-red petals, a strong floral fragrance, and graceful floral architecture, traits that make it highly prized in both commercial horticulture and traditional ornamental gardens. Beyond its aesthetic appeal, LBVR is used in ethnobotanical applications for its reported anti-inflammatory and antioxidant properties. Despite its economic, ecological, and cultural significance, however, no genome-wide survey of TFs has been conducted in this taxon. A systematic characterization of the bHLH family in LBVR is therefore needed to uncover the regulators underlying its distinctive floral traits, pigment biosynthesis, scent production, and stress resilience. Understanding the repertoire and expression dynamics of bHLH genes may also facilitate targeted breeding aimed at enhancing flower color intensity, extending vase life, and improving tolerance to abiotic stresses such as the temperature fluctuations encountered during commercial cultivation and postharvest storage.
High-throughput sequencing technologies, including the combination of short-read transcriptomics with third-generation long-read sequencing, provide powerful tools for mining genes involved in secondary metabolism in both medicinal and ornamental plants []. In this study, we conducted a genome-wide identification and characterization of bHLH TFs in LBVR using high-quality transcriptome data from floral tissues at four developmental stages. By integrating homology-based prediction and hidden Markov model (HMM)-based domain searches with conserved domain validation and analyses of protein properties, secondary structure, motif composition, phylogenetic relationships, and expression profiles, we defined a high-confidence bHLH gene set. This work provides new insights into the regulatory roles of bHLH TFs in floral development and specialized metabolism and offers a resource for future functional and breeding studies in LBVR and related ornamental species.

2. Materials and Methods

2.1. Full-Length Transcriptome Sequencing and Annotation

The plant material used for sequencing was collected from a mature LBVR individual growing on Changchong Mountain in Kunming, Yunnan Province, Southwestern China (25.1177° N, 102.7079° E). Floral tissues at four developmental stages (bud, initial bloom, full bloom, and late bloom; Figure S1) were collected and immediately frozen in liquid nitrogen to prevent RNA degradation. High-quality RNA was extracted from the pooled samples. After quality control of RNA purity, concentration, and integrity, full-length cDNA was synthesized from the mRNA and amplified by PCR. The amplified cDNA underwent DNA damage repair and end repair, and SMRTbell adapters were ligated to construct SMRTbell template libraries. High-fidelity (HiFi) long reads were generated by single-molecule sequencing on the PacBio Revio platform (Pacific Biosciences, Menlo Park, CA, USA).
High-accuracy circular consensus sequences (CCSs) were generated from subreads using the CCS tool in SMRT Link v10.1 (Pacific Biosciences, Menlo Park, CA, USA), requiring at least three full passes and a read quality of ≥0.9. CCS reads were then classified as full-length or non-full-length according to the presence of the 5′ primer, the 3′ primer, and a poly(A) tail. Full-length non-chimeric (FLNC) reads were processed with the Iso-Seq module of SMRT Link to cluster similar sequences, and each cluster was collapsed into a single consensus isoform. Redundant isoforms were removed using CD-HIT v4.6.1 [] at a 99% sequence identity threshold, and the completeness of the resulting non-redundant transcript set was assessed using BUSCO v3.0.2 [] against the OrthoDB database of lineage-specific single-copy orthologs.
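The read-level filters described above can be expressed as simple predicates. This is a minimal sketch assuming a hypothetical per-read record (`CCSRead` and its fields are illustrative, not an SMRT Link output format); it applies the stated pass and quality thresholds and the primer/poly(A) rule, but omits the chimera check that distinguishes FLNC reads.

```python
from dataclasses import dataclass


@dataclass
class CCSRead:
    """Hypothetical per-read summary; field names are illustrative only."""
    read_id: str
    passes: int            # number of full subread passes
    read_quality: float    # predicted read accuracy, 0 to 1
    has_5p_primer: bool
    has_3p_primer: bool
    has_polya: bool


def keep_ccs(read: CCSRead, min_passes: int = 3, min_rq: float = 0.9) -> bool:
    """Thresholds from the text: at least 3 full passes and quality >= 0.9."""
    return read.passes >= min_passes and read.read_quality >= min_rq


def is_full_length(read: CCSRead) -> bool:
    """Full-length: 5' primer, 3' primer and poly(A) tail all detected."""
    return read.has_5p_primer and read.has_3p_primer and read.has_polya


def classify(reads):
    """Split quality-passing reads into full-length and non-full-length IDs."""
    kept = [r for r in reads if keep_ccs(r)]
    full = [r.read_id for r in kept if is_full_length(r)]
    partial = [r.read_id for r in kept if not is_full_length(r)]
    return full, partial


reads = [
    CCSRead("r1", 5, 0.99, True, True, True),   # passes, full-length
    CCSRead("r2", 2, 0.99, True, True, True),   # too few passes
    CCSRead("r3", 4, 0.95, True, False, True),  # passes, 3' primer missing
]
print(classify(reads))  # (['r1'], ['r3'])
```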
The coding sequences (CDSs) of the assembled transcripts were predicted using TransDecoder v5.0.0 []. Functional annotation was performed with DIAMOND v2.0.15 [] by aligning the predicted protein sequences against the NCBI Non-Redundant Protein Sequence Database (NR) [], Swiss-Prot, TrEMBL [], eggNOG [], Clusters of Orthologous Groups of proteins (COG) [], Eukaryotic Orthologous Groups (KOG) [], and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [], with an E-value cutoff of < 1 × 10⁻⁵. Conserved protein domains and Gene Ontology (GO) annotations were obtained using InterProScan v5.34-73.0 [].
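As an illustration of the E-value filter, the sketch below keeps the best-scoring hit per query from a DIAMOND tabular report. It assumes DIAMOND's default 12-column `--outfmt 6` layout (the BLAST tabular columns); the toy rows and the `best_hits` helper are hypothetical and not the authors' code.

```python
import csv
import io

# Default BLAST/DIAMOND tabular (outfmt 6) columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EVALUE_CUTOFF = 1e-5


def best_hits(tsv_text: str, cutoff: float = EVALUE_CUTOFF) -> dict:
    """Keep, for each query, the highest-bitscore hit with E-value < cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= cutoff:
            continue  # fails the annotation threshold
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore, evalue)
    return best


# Toy rows: tx1 has two significant hits, tx2 only a weak one.
example = (
    "tx1\tsp|P1\t80.0\t200\t40\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "tx1\tsp|P2\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-20\t150\n"
    "tx2\tsp|P3\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.01\t40\n"
)
print(best_hits(example))
```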

2.2. Identification of bHLH TFs

Candidate bHLH TFs were identified by combining iTAK prediction with Pfam domain validation. Putative bHLH genes were first predicted using iTAK v1.7a [] with default plant-specific parameters and then screened for the conserved bHLH domain (PF00010) using HMMER v3.3.2 [] with an E-value cutoff of < 1 × 10⁻⁵. Only transcripts supported by both methods were retained as bHLH TFs.
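The two-method intersection can be sketched as follows, assuming `hmmsearch --tblout` output in which the Pfam HMM is the query and the predicted proteins are the targets (with `hmmscan` the roles of these columns are swapped). Whether the authors applied the cutoff to the full-sequence or the per-domain E-value is not stated; this hypothetical helper uses the full-sequence value.

```python
def pfam_supported(tblout_text: str, pfam_acc: str = "PF00010",
                   cutoff: float = 1e-5) -> set:
    """Targets hit by the given Pfam HMM in an hmmsearch --tblout table.

    Assumed columns: 1 = target (protein) ID, 4 = query accession with a
    version suffix, 5 = full-sequence E-value.
    """
    hits = set()
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment/header lines
        fields = line.split()
        target, query_acc, evalue = fields[0], fields[3], float(fields[4])
        if query_acc.split(".")[0] == pfam_acc and evalue < cutoff:
            hits.add(target)
    return hits


def high_confidence(itak_bhlh: set, hmm_bhlh: set) -> list:
    """Candidates are retained only when both methods agree."""
    return sorted(itak_bhlh & hmm_bhlh)


# Toy rows; the accession version suffix is arbitrary.
tbl = (
    "# target name  accession  query name  accession  E-value  score\n"
    "tx1  -  HLH  PF00010.31  2.3e-18  65.2\n"
    "tx2  -  HLH  PF00010.31  0.5  3.1\n"
    "tx3  -  HLH  PF00010.31  1e-09  40.0\n"
)
hmm_ids = pfam_supported(tbl)
print(high_confidence({"tx1", "tx2", "tx4"}, hmm_ids))  # ['tx1']
```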
The physicochemical properties of the identified bHLH proteins, including sequence length, molecular weight, isoelectric point, instability index, and grand average of hydropathicity (GRAVY), were calculated from their amino acid sequences using the ProteinAnalysis class of Biopython’s Bio.SeqUtils.ProtParam module. Subcellular localization was predicted using DeepLoc-2.0 [], a deep learning (DL)-based predictor trained on eukaryotic proteins, and secondary structures were predicted using NetSurfP-3.0 [], which uses DL models to estimate the probabilities of α-helix, β-strand, and coil structures. Conserved motifs were identified using MEME Suite v5.5.8 [] in classic mode, with the maximum number of motifs set to 15, motif widths of 6–50 aa, and the site distribution set to zero or one occurrence per sequence (ZOOPS).
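For reproducibility, the MEME settings above can be recorded as an explicit command. This is a sketch assuming the standard MEME command-line options (`-protein`, `-oc`, `-mod`, `-nmotifs`, `-minw`, `-maxw`); the input file name is hypothetical, and the exact command used in this study is not reported. The builder only assembles the argument list, so it runs without MEME installed.

```python
import shlex


def meme_command(fasta: str, outdir: str = "meme_out", nmotifs: int = 15,
                 minw: int = 6, maxw: int = 50) -> list[str]:
    """Build (without running) a MEME call mirroring the settings in the text."""
    return [
        "meme", fasta,
        "-protein",               # amino-acid alphabet
        "-oc", outdir,            # output directory, overwritten if present
        "-mod", "zoops",          # zero or one occurrence per sequence
        "-nmotifs", str(nmotifs),
        "-minw", str(minw),
        "-maxw", str(maxw),
    ]


cmd = meme_command("lbvr_bhlh_proteins.fa")  # hypothetical file name
print(shlex.join(cmd))
# With MEME installed, subprocess.run(cmd, check=True) would execute it.
```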

2.3. Phylogenetic Analysis and Classification of the bHLH TFs

A total of 153 A. thaliana bHLH protein sequences, using the longest isoform of each gene, were downloaded from PlantTFDB 4.0 [] and combined with the 113 LBVR bHLH proteins identified in this study. The sequences were aligned using MAFFT v7.525 [] in the high-accuracy L-INS-i (linsi) mode, and the alignment was trimmed with trimAl v1.4 [] in automated1 mode to remove poorly aligned and divergent regions. Maximum likelihood (ML) phylogenetic inference was then performed using RAxML v8.2.12 [] under the PROTGAMMAILGX model with 500 rapid bootstrap replicates.
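Reducing a multi-isoform download to one sequence per gene, as described above, amounts to grouping by locus and keeping the longest protein. A minimal sketch, assuming TAIR-style identifiers in which the isoform number follows the last dot; the loci in the example are toy identifiers, and `longest_isoforms` is a hypothetical helper.

```python
def longest_isoforms(records):
    """Map each locus to the ID of its longest protein isoform.

    records: iterable of (isoform_id, sequence) pairs with IDs shaped like
    'AT1G01010.1' (locus, dot, isoform number). Ties keep the first seen.
    """
    best = {}
    for iso_id, seq in records:
        locus = iso_id.rsplit(".", 1)[0]
        if locus not in best or len(seq) > len(best[locus][1]):
            best[locus] = (iso_id, seq)
    return {locus: iso for locus, (iso, _) in best.items()}


toy = [("AT9G00010.1", "MKV"), ("AT9G00010.2", "MKVLLA"), ("AT9G00020.1", "MA")]
print(longest_isoforms(toy))
```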

2.4. Transcriptome Sequencing and Expression Level Analysis

Floral tissues of LBVR were collected at the bud (ST1), initial bloom (ST2), full bloom (ST3), and late bloom (ST4) stages, with three biological replicates per stage. All samples were immediately frozen in liquid nitrogen and stored at −80 °C until RNA extraction to prevent RNA degradation. Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA), and high-quality RNA samples were used for library preparation. mRNA was enriched from total RNA using oligo(dT) magnetic beads and converted into cDNA libraries with the TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, CA, USA) according to the manufacturer’s protocol. The libraries were sequenced on the DNBSEQ-T7 platform (MGI Tech, Shenzhen, China) to generate 150 bp paired-end reads. Raw reads were quality-filtered using Trimmomatic v0.39 [], and the clean reads were aligned to the full-length transcriptome reference using Bowtie2 v2.5.4 []. Transcript expression was quantified as FPKM (fragments per kilobase of transcript per million mapped reads) using StringTie v1.3.3 [], and transcript-level read counts from StringTie were aggregated into count matrices with the prepDE.py script for differential expression analysis. Differential expression between successive floral stages (ST1 vs. ST2, ST2 vs. ST3, and ST3 vs. ST4) was analyzed with DESeq2 (Bioconductor); genes with |log2 fold change| > 1 and a Benjamini–Hochberg-adjusted p < 0.05 were classified as differentially expressed genes (DEGs).
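The DEG thresholds can be illustrated with a small sketch of the Benjamini–Hochberg step-up adjustment followed by the fold-change and adjusted-p cutoffs. It is illustrative only: DESeq2 computes Wald-test p-values and, by default, applies independent filtering before the BH adjustment, so its adjusted values will not generally match this stand-alone calculation. The helper names and the p-values and fold changes are hypothetical toy values.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_deg(log2fc, padj, lfc_cut=1.0, alpha=0.05):
    """Thresholds from the text: |log2 fold change| > 1 and adjusted p < 0.05."""
    return abs(log2fc) > lfc_cut and padj < alpha


pvals = [0.01, 0.04, 0.03, 0.005]
lfcs = [2.3, -1.5, 0.8, -3.0]
padj = bh_adjust(pvals)  # approximately [0.02, 0.04, 0.04, 0.02]
print([is_deg(l, p) for l, p in zip(lfcs, padj)])  # [True, True, False, True]
```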

3. Results

3.1. Summary of Transcriptome Assembly and Annotation

A total of 193,000 CCSs were generated, yielding approximately 409 million bases (Mb) of HiFi data with an average read length of 2120 bp. Of these, 181,609 (94.10%) were full-length non-chimeric (FLNC) reads, indicating high-quality transcript capture. Clustering produced 75,629 consensus isoforms with an average length of 1858 bp, 75,619 of which had high accuracy (>99%). After redundancy removal, a final set of 55,520 non-redundant transcript isoforms was obtained (Figure 1A). BUSCO analysis yielded a completeness score of 68.97%, indicating moderate representation of conserved single-copy orthologs in the assembled transcriptome (Figure 1B). Coding sequence prediction identified 52,216 open reading frames (ORFs), of which 29,660 were complete. In total, 49,810 transcript isoforms (89.72%) were annotated in at least one database (Figure 1C); 43,700 isoforms (78.71% of the total) were assigned GO terms, and 42,304 (76.20%) contained identifiable Pfam domains.
Figure 1. Summary of transcriptome assembly and annotation. (A) Workflow illustrating the processing steps of PacBio HiFi data. (B) BUSCO evaluation of the completeness of the transcriptome based on the conserved single-copy orthologs. (C) Bar chart showing the number of transcript isoforms annotated in different functional databases.
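The proportions in this section follow directly from the reported counts; the short calculation below makes the denominators explicit (FLNC reads relative to all CCS reads, annotation rates relative to the 55,520 non-redundant isoforms). It only re-derives the published percentages and adds no new data.

```python
# Counts reported in this section.
ccs_reads = 193_000
flnc_reads = 181_609
nonredundant = 55_520
annotated = {"any database": 49_810, "GO": 43_700, "Pfam": 42_304}


def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * part / whole, 2)


print("FLNC share of CCS:", pct(flnc_reads, ccs_reads))
for source, n in annotated.items():
    print(source, pct(n, nonredundant))
```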

3.2. Identification and Features of bHLH TFs

A total of 4388 TFs were predicted in LBVR using iTAK, of which 113 were classified as bHLH TFs. The domain-based search also detected 113 transcripts containing the conserved bHLH domain, and the two sets overlapped completely, yielding 113 high-confidence bHLH TFs. The predicted proteins ranged from 59 to 707 aa in length (Figure 2A), with molecular weights of 7.15 to 77.78 kDa (Figure 2B). Their isoelectric points varied from 5.04 to 10.05 (Figure 2C), and their instability indices from 37.23 to 96.46 (Figure 2D). All proteins had negative grand average of hydropathicity (GRAVY) values, ranging from −1.00 to −0.17 (Figure 2E), indicating that they are generally hydrophilic.
Figure 2. Physicochemical properties of bHLH proteins. Each panel shows a histogram overlaid with a kernel density estimation curve, representing the frequency and overall distribution of the respective property across all identified bHLH proteins. (A) Protein length. (B) Molecular weight. (C) Isoelectric point. (D) Instability index. (E) Grand average of hydropathicity.
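Because the GRAVY values carry the hydrophilicity conclusion, it may help to see how the index is defined. Biopython's `ProteinAnalysis.gravy()` uses the Kyte–Doolittle scale by default; the sketch below reimplements that average in plain Python for transparency. The `gravy` helper is hypothetical and is not the code used in this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue.

    A trailing stop symbol '*' is removed; any other non-standard residue
    raises KeyError. Negative values indicate an overall hydrophilic protein.
    """
    seq = protein.upper().rstrip("*")
    if not seq:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


print(gravy("MKRE"))  # charged residues dominate, so the value is negative
```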
A total of 111 bHLH proteins were predicted to localize exclusively to the nucleus, whereas baihe_transcript_30752 and baihe_transcript_55079 were predicted to localize to either the cytoplasm or the nucleus. Secondary structure prediction indicated that the bHLH proteins consist predominantly of coil regions (78.41%), followed by α-helices (18.94%) and a small proportion of β-strands (2.65%). The helical fraction is consistent with the helix–loop–helix (HLH) motif, whereas the high coil content suggests considerable structural flexibility outside the conserved domain. In addition to the conserved HLH domain (PF00010), domain composition analysis revealed that seven bHLH proteins contained the bHLH-MYC and R2R3-MYB transcription factor N-terminal domain (PF14215), suggesting potential functional diversification. Notably, baihe_transcript_54573 contained two tandem PF00010 domains, indicating a possible internal duplication. These multi-domain configurations suggest that certain bHLH members may participate in broader or more specialized regulatory pathways.
To further explore the structural diversity of the bHLH family, 15 distinct motifs with lengths of 21 to 50 aa were identified (Figure 3). Motif 1 was present in 81 of the 113 bHLH proteins and had the most significant (lowest) E-value, 1.5 × 10⁻¹⁶⁹⁹, corresponding well to the canonical bHLH domain. Motif 2 was the most frequent motif, occurring in 107 sequences, which suggests that it may represent a highly conserved auxiliary region shared by most bHLH members. The diversity in motif presence, order, and combination across bHLH proteins reflects both the conserved nature of the HLH core and evolutionary divergence in the flanking regions, which may contribute to differences in DNA-binding specificity or regulatory interactions.
Figure 3. Distribution of conserved motifs in bHLH transcription factors. Each horizontal bar represents a bHLH protein, with colored boxes indicating the positions and identities of up to 15 predicted motifs. The full list of gene IDs is provided in Table S1.

3.3. Subfamily Classification of bHLH Genes

To clarify the evolutionary relationships and potential functions of bHLH genes, a maximum likelihood (ML) phylogenetic tree was constructed from aligned bHLH protein sequences of A. thaliana and LBVR. The LBVR bHLH genes clustered into 13 distinct clades (Figure 4). In one clade, baihe_transcript_43017 was most closely related to AT5G65320 and clustered with AT1G49770, a well-characterized bHLH TF that regulates embryo development in A. thaliana []. In the largest clade, 21 LBVR genes clustered with 17 A. thaliana bHLH genes, several of which have well-established roles, including AT1G68920 in flowering-time regulation [], AT1G25330 in transmitting tract development [], and AT1G73830, a positive regulator of shade avoidance []. Several A. thaliana bHLH genes formed singleton branches that did not cluster with any LBVR homolog.
Figure 4. Maximum likelihood phylogenetic tree of bHLH transcription factors from L. bakerianum var. rubrum and A. thaliana. Bootstrap support values (>70%) from 1000 replicates are indicated at the corresponding nodes.

3.4. Expression Patterns of bHLH Genes

To explore the potential roles of bHLH genes during floral development in LBVR, we analyzed their expression across the four developmental stages (ST1–ST4). Transcript abundance was quantified as FPKM and examined through distribution profiling, stage-specific expression screening, and expression clustering, together with pairwise differential expression analysis between consecutive floral stages. Of the 113 bHLH genes, 95 (84.07%) had an average FPKM > 1 in at least one stage, indicating widespread transcriptional activity during floral development. Mean bHLH expression was highest at ST2 (13.52) and ST1 (11.40), suggesting early transcriptional activation before and during anthesis (Figure 5A). Expression then declined slightly at ST3 (9.12) and rose moderately at ST4 (10.59); the transient decrease at ST3 may reflect a regulatory shift associated with peak anthesis.
Figure 5. Expression patterns of bHLH genes in L. bakerianum var. rubrum across four floral stages. (A) Boxplots of log10-FPKM for all 113 genes were generated using three biological replicates per stage. (B) Temporal profiles of 17 ST1-specific genes with FPKM ≥ 5 only at the ST1 stage. (C) Heat-map with hierarchical clustering revealing three expression clusters, namely, cluster 1 with uniformly low expression, cluster 2 with ST1-biased expression, and cluster 3 with ST2-ST4-induced expression.
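For reference, the sketch below gives the textbook FPKM definition and the "expressed in at least one stage" rule used above (replicate-mean FPKM > 1). StringTie estimates abundances from read coverage over its own transcript models, so its FPKM values are not computed by this exact formula; the helper names and toy values are hypothetical.

```python
from statistics import mean


def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Textbook FPKM: fragments per kb of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


def is_expressed(stage_replicates: dict, threshold: float = 1.0) -> bool:
    """True if the replicate-mean FPKM exceeds the threshold in at least one stage."""
    return any(mean(reps) > threshold for reps in stage_replicates.values())


# 100 fragments on a 2 kb transcript in a library of 10 million mapped fragments
print(fpkm(100, 2000, 10_000_000))  # 5.0

gene = {"ST1": [0.2, 0.4, 0.3], "ST2": [1.5, 2.0, 1.1],
        "ST3": [0.1, 0.0, 0.2], "ST4": [0.5, 0.6, 0.4]}
print(is_expressed(gene))  # True: the ST2 mean is about 1.53
```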
Stage-specific expression screening was used to identify bHLH genes highly expressed at only one floral stage. Genes were defined as stage-specific if their average FPKM was ≥ 5 at exactly one stage and < 1 at all other stages. By this criterion, 17 bHLH genes were specific to ST1, and none to ST2, ST3, or ST4 (Figure 5B). These ST1-specific genes are likely involved in early floral-organ initiation and the developmental transition toward anthesis. Hierarchical clustering of the four-stage expression matrix resolved three main expression modules (Figure 5C): cluster 1 comprised 74 genes (65.49%) with low-to-moderate, uniform expression; cluster 2 comprised 22 genes (19.47%) with strongly ST1-biased expression, mirroring the stage-specific set; and cluster 3 comprised 16 genes (14.16%) that were up-regulated from ST2 onward, marking anthesis progression and later maturation. One highly expressed gene formed a separate outlier cluster. Together, these results point to two major expression waves, an early burst at ST1 and a later activation spanning ST2–ST4, through which bHLH TFs may coordinate successive phases of LBVR flower development.
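The stage-specificity rule stated above is strict: a gene must reach the upper threshold at one stage and stay below FPKM 1 everywhere else, so genes with intermediate expression at another stage are excluded. A minimal sketch of that rule, with a hypothetical helper name and toy values:

```python
def stage_specific(stage_means: dict, high: float = 5.0, low: float = 1.0):
    """Return the single stage where a gene is specifically expressed, else None.

    Criterion from the text: mean FPKM >= high at exactly one stage
    and < low at every other stage.
    """
    high_stages = [s for s, v in stage_means.items() if v >= high]
    if len(high_stages) != 1:
        return None
    target = high_stages[0]
    others_low = all(v < low for s, v in stage_means.items() if s != target)
    return target if others_low else None


print(stage_specific({"ST1": 12.0, "ST2": 0.4, "ST3": 0.2, "ST4": 0.0}))  # ST1
print(stage_specific({"ST1": 12.0, "ST2": 3.0, "ST3": 0.2, "ST4": 0.0}))  # None
```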
Pairwise differential expression analysis revealed transcriptome-wide changes that parallel the expression profiles of bHLH TFs. The largest change occurred between ST1 and ST2, with 31,536 DEGs (13,866 up-regulated and 17,670 down-regulated; Figure 6A). Subsequent transitions were smaller, with 10,190 DEGs between ST2 and ST3 and 10,899 between ST3 and ST4. The bHLH genes followed the same trend (Figure 6B): 68 of the 113 family members were differentially expressed between ST1 and ST2 (36 up-regulated and 32 down-regulated), compared with 19 between ST2 and ST3 (12 up-regulated and 7 down-regulated) and 26 between ST3 and ST4 (9 up-regulated and 17 down-regulated). Thus, most bHLH reconfiguration accompanies the large-scale transcriptional rewiring between the bud and initial bloom stages, whereas the later full and late bloom stages involve fewer bHLH adjustments, consistent with fine-tuning of anthesis progression and floral maturation.
Figure 6. Stage-to-stage differentially expressed genes (DEGs) in L. bakerianum var. rubrum. Stacked bars show up-regulated (yellow) and down-regulated (blue) genes. (A) Whole transcriptome. (B) bHLH genes only.

4. Discussion

Unlike conventional short-read RNA-seq, full-length transcriptome sequencing with PacBio HiFi reads captures complete cDNA molecules and reduces assembly artefacts, which is particularly valuable for gene family studies. In this study, we combined PacBio long reads with short-read RNA-seq to achieve both accurate structural annotation and reliable expression quantification of bHLH genes in LBVR. Two independent identification approaches, iTAK prediction and HMMER searches for the Pfam bHLH domain, converged on the same 113 bHLH genes, providing a robust catalogue; however, because the libraries were built from floral tissue only, bHLH genes expressed exclusively in other organs may be absent. This integrated approach not only increased confidence in the gene models but also enabled stage-resolved expression profiling across floral development, establishing a solid foundation for functional studies.
The bHLH family is one of the largest and most heterogeneous TF lineages in plants [,]. In LBVR, the 113 identified bHLHs represent a smaller repertoire than those of cereal monocots such as rice (~180) [], maize (>200) [], and wheat (>470) [] but are closer to the numbers reported for other ornamentals such as Dendrobium officinale (98) [] and Cymbidium ensifolium (94) []. This may suggest a lineage-specific contraction of the family in lilies, reflecting functional consolidation or unique adaptive pressures associated with perennial growth and ornamental traits. Phylogenetic analysis placed the LBVR bHLH proteins into 13 clades, largely consistent with the canonical Arabidopsis subfamilies, highlighting both evolutionary conservation and divergence. For instance, the clade containing the four Arabidopsis flavonoid-regulatory paralogues TT8 (AT4G09820), GL3 (AT5G41315), EGL3 (AT1G63650), and MYC1 (AT1G32640) was represented in LBVR by only two orthologous genes, indicating a lineage-specific contraction relative to Arabidopsis. Conversely, the clade housing flowering-time, transmitting-tract, and shade-avoidance regulators such as AT1G68920, AT1G25330, and AT1G73830 contains 21 LBVR paralogues compared with 17 Arabidopsis genes, indicating a lineage-specific expansion. Phylogenetic profiling therefore provides a framework for prioritizing bHLH candidates for functional assays, as in previous studies [,,].
Structural analyses further support this view. LBVR bHLHs displayed wide variation in molecular weight, isoelectric point, secondary-structure content, and motif organization. Such diversity parallels findings in other monocots and Arabidopsis, where subfamily-specific motifs often correlate with specialized roles in developmental or metabolic pathways [,,]. The structural heterogeneity observed in LBVR thus provides a molecular basis for potential functional divergence, particularly in processes unique to lilies, such as volatile biosynthesis and bulb development.
Expression profiling refined these evolutionary insights by highlighting functional candidates. Among the 113 genes, 95 were actively expressed in at least one floral stage, with two major transcriptional waves: a strong activation at the early bud stage and a secondary wave spanning initial to late bloom. Notably, 17 genes were exclusively expressed at the bud stage, suggesting critical roles in early floral-organ initiation. This early-stage bias contrasts with rice, where some bHLHs peak around anthesis [,,], but is consistent with observations in other lilies, where bHLHs regulate pigmentation and stress responses at pre-anthesis stages [,,]. Moreover, the majority of transcriptomic reprogramming occurred during the bud-to-initial-bloom transition, whereas later stages involved fewer adjustments, pointing to an early commitment of transcriptional regulation in LBVR floral development.
Overall, the smaller bHLH repertoire, the expansion of certain clades, and the strong early-stage transcriptional activation distinguish LBVR from both cereals and Arabidopsis. These findings suggest that lilies may rely on fewer but potentially more specialized bHLH regulators to coordinate floral development and ornamental traits, consistent with both evolutionary streamlining and adaptive specialization.

5. Conclusions

This study presents the first genome-wide characterization of the bHLH transcription factor family in LBVR. A total of 113 high-confidence bHLH genes were identified, fewer than in cereal monocots such as rice, maize, and wheat. Phylogenetic analysis grouped them into 13 distinct clades, reflecting both conserved regulators and lineage-specific variation. The bHLH proteins also showed considerable diversity in physicochemical properties, secondary structures, and MEME motifs, suggesting potential functional diversification. Expression profiling across four floral stages revealed two major transcriptional waves, with 17 genes specifically expressed at the early bud stage, suggesting roles in floral-organ initiation. These findings provide candidate regulators of floral development, pigmentation, and stress responses and offer a valuable resource for future functional and breeding studies in lilies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes16101153/s1, Figure S1. Representative photographs of L. bakerianum var. rubrum flowers at different developmental stages used for transcriptomic analysis. Table S1. List of bHLH transcription factor gene IDs in the same order as presented in Figure 3.

Author Contributions

Conceptualization, H.W. and Z.G.; methodology, Z.G. and M.W.; software, M.W., M.Z., J.C. and Z.G.; validation, M.W., M.Z., J.C. and Z.G.; formal analysis, Z.G. and M.W.; investigation, Z.G.; resources, H.W. and Z.G.; data curation, Z.G. and M.W.; writing—original draft preparation, H.W., Z.G. and M.W.; writing—review and editing, H.W., Z.G. and M.W.; visualization, Z.G. and M.W.; supervision, H.W.; project administration, H.W.; funding acquisition, H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Joint Special Project for Basic Agricultural Research of Yunnan Province (202301BD070001-144) and the Special Mission for the Flower Industry in Yao’an County, Yunnan Province (202304BI090030).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The transcriptome assembly and annotation have been deposited in Figshare under the DOI https://doi.org/10.6084/m9.figshare.30226042.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
bHLH: Basic Helix–Loop–Helix
LBVR: L. bakerianum var. rubrum
TF: Transcription Factor
HMM: Hidden Markov Model
aa: Amino Acids
GRAVY: Grand Average of Hydropathicity
ML: Maximum Likelihood
FPKM: Fragments per Kilobase of Transcript per Million Mapped Reads
DEGs: Differentially Expressed Genes
ORFs: Open Reading Frames
BUSCO: Benchmarking Universal Single-Copy Orthologs

References

  1. Jones, S. An overview of the basic helix-loop-helix proteins. Genome Biol. 2004, 5, 226.
  2. Atchley, W.R.; Fitch, W.M. A natural classification of the basic helix–loop–helix class of transcription factors. Proc. Natl. Acad. Sci. USA 1997, 94, 5172–5176.
  3. Pires, N.; Dolan, L. Origin and diversification of basic-helix-loop-helix proteins in plants. Mol. Biol. Evol. 2010, 27, 862–874.
  4. Buck, M.J.; Atchley, W.R. Phylogenetic analysis of plant basic helix-loop-helix proteins. J. Mol. Evol. 2003, 56, 742–750.
  5. Toledo-Ortiz, G.; Huq, E.; Quail, P.H. The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell 2003, 15, 1749–1770.
  6. Wei, K.; Chen, H. Comparative functional genomics analysis of bHLH gene family in rice, maize and wheat. BMC Plant Biol. 2018, 18, 309.
  7. Rensing, S.A.; Lang, D.; Zimmer, A.D.; Terry, A.; Salamov, A.; Shapiro, H.; Nishiyama, T.; Perroud, P.F.; Lindquist, E.A.; Kamisugi, Y.; et al. The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 2008, 319, 64–69.
  8. Heim, M.A.; Jakoby, M.; Werber, M.; Martin, C.; Weisshaar, B.; Bailey, P.C. The basic helix–loop–helix transcription factor family in plants: A genome-wide study of protein structure and functional diversity. Mol. Biol. Evol. 2003, 20, 735–747.
  9. MacAlister, C.A.; Ohashi-Ito, K.; Bergmann, D.C. Transcription factor control of asymmetric cell divisions that establish the stomatal lineage. Nature 2007, 445, 537–540.
  10. Dombrecht, B.; Xue, G.P.; Sprague, S.J.; Kirkegaard, J.A.; Ross, J.J.; Reid, J.B.; Fitt, G.P.; Sewelam, N.; Schenk, P.M.; Manners, J.M.; et al. MYC2 differentially modulates diverse jasmonate-dependent functions in Arabidopsis. Plant Cell 2007, 19, 2225–2245.
  11. Qi, Y.; Zhou, L.; Han, L.; Zou, H.; Miao, K.; Wang, Y. PsbHLH1, a novel transcription factor involved in regulating anthocyanin biosynthesis in tree peony (Paeonia suffruticosa). Plant Physiol. Biochem. 2020, 154, 396–408.
  12. Ma, Y.; Yang, G.; Duan, R.; Li, X.; Zeng, S.; Yan, Y.; Zheng, C.; Hu, Y. Transcriptome analysis of alfalfa (Medicago sativa L.) roots reveals overwintering changes in different varieties. Czech J. Genet. Plant Breed. 2024, 60, 97–104.
  13. Li, H.; Yao, Y.; An, L.; Li, X.; Cui, Y.; Bai, Y.; Yao, X.; Wu, K. Isolation and expression analysis of the HvnAnt2 gene in qingke barley (Hordeum vulgare L. var. nudum Hook. f.) varieties with different grain colours. Czech J. Genet. Plant Breed. 2024, 60, 107–118.
  14. So, K.; Ri, U.; Sun, S.; Che, H.; He, L.; Ri, H.; Zhang, Y. bHLH transcription factor from Lilium pumilum, LpbHLH144 confers the salt and alkali stress tolerance of tobacco. Plant Physiol. Biochem. 2025, 226, 110076.
  15. Yuan, G.; Wu, Z.; Liu, X.; Li, T.; Teng, N. Characterization and functional analysis of LoUDT1, a bHLH transcription factor related to anther development in the lily oriental hybrid Siberia (Lilium spp.). Plant Physiol. Biochem. 2021, 166, 1087–1095.
  16. An, W.; Sun, Y.; Gao, Z.; Liu, X.; Guo, Q.; Sun, S.; Zhang, M.; Han, Y.; Irfan, M.; Chen, L.; et al. LvbHLH13 regulates anthocyanin biosynthesis by activating the LvMYB5 promoter in lily (Lilium ‘Viviana’). Horticulturae 2024, 10, 926.
  17. Feng, Y.; Guo, Z.; Zhong, J.; Liang, Y.; Zhang, P.; Sun, M. The LibHLH22 and LibHLH63 from Lilium ‘Siberia’ can positively regulate volatile terpenoid biosynthesis. Horticulturae 2023, 9, 459.
  18. Fang, S.; Lin, M.; Ali, M.M.; Zheng, Y.; Yi, X.; Wang, S.; Chen, F.; Lin, Z. LhANS-rr1, LhDFR, and LhMYB114 regulate anthocyanin biosynthesis in flower buds of Lilium ‘Siberia’. Genes 2023, 14, 559.
  19. Zhang, K.; Lyu, T.; Lyu, Y. Transcriptional insights into lily stem bulblet formation: Hormonal regulation, sugar metabolism, and transcriptional networks in LA Lily ‘Aladdin’. Horticulturae 2024, 10, 171.
  20. Chinese Flora Editorial Committee. Flora Reipublicae Popularis Sinicae, Vol. 14: Liliaceae; Science Press: Beijing, China, 1980; p. 138.
  21. Wang, M.; Zhang, S.; Li, R.; Zhao, Q. Unraveling the specialized metabolic pathways in medicinal plant genomes: A review. Front. Plant Sci. 2024, 15, 1459533.
  22. Fu, L.; Niu, B.; Zhu, Z.; Wu, S.; Li, W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 2012, 28, 3150–3152.
  23. Simão, F.A.; Waterhouse, R.M.; Ioannidis, P.; Kriventseva, E.V.; Zdobnov, E.M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 2015, 31, 3210–3212.
  24. Haas, B.J. TransDecoder v5.0.0; GitHub Repository. Available online: https://github.com/TransDecoder/TransDecoder (accessed on 22 January 2025).
  25. Buchfink, B.; Xie, C.; Huson, D.H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 2015, 12, 59–60.
  26. Pruitt, K.D.; Tatusova, T.; Maglott, D.R. NCBI reference sequences (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007, 35, D61–D65.
  27. Bairoch, A.; Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000, 28, 45–48.
  28. Cantalapiedra, C.P.; Hernández-Plaza, A.; Letunic, I.; Bork, P.; Huerta-Cepas, J. eggNOG-mapper v2: Functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 2021, 38, 5825–5829.
  29. Tatusov, R.L.; Galperin, M.Y.; Natale, D.A.; Koonin, E.V. The COG database: A tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000, 28, 33–36.
  30. Koonin, E.V.; Fedorova, N.D.; Jackson, J.D.; Jacobs, A.R.; Krylov, D.M.; Makarova, K.S.; Mazumder, R.; Mekhedov, S.L.; Nikolskaya, A.N.; Rao, B.S.; et al. A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes. Genome Biol. 2004, 5, R7.
  31. Kanehisa, M.; Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28, 27–30.
  32. Jones, P.; Binns, D.; Chang, H.Y.; Fraser, M.; Li, W.; McAnulla, C.; McWilliam, H.; Maslen, J.; Mitchell, A.; Nuka, G.; et al. InterProScan 5: Genome-Scale Protein Function Classification. Bioinformatics 2014, 30, 1236–1240.
  33. Zheng, Y.; Jiao, C.; Sun, H.; Rosli, H.G.; Pombo, M.A.; Zhang, P.; Banf, M.; Dai, X.; Martin, G.B.; Giovannoni, J.J.; et al. iTAK: A program for genome-wide prediction and classification of plant transcription factors, transcriptional regulators, and protein kinases. Mol. Plant 2016, 9, 1667–1670.
  34. Eddy, S.R. Accelerated profile HMM searches. PLoS Comput. Biol. 2011, 7, e1002195.
  35. Thumuluri, V.; Almagro Armenteros, J.J.; Johansen, A.R.; Nielsen, H.; Winther, O. DeepLoc 2.0: Multi-label subcellular localization prediction using protein language models. Nucleic Acids Res. 2022, 50, W228–W234.
  36. Høie, M.H.; Kiehl, E.N.; Petersen, B.; Nielsen, M.; Winther, O.; Nielsen, H.; Hallgren, J.; Marcatili, P. NetSurfP-3.0: Accurate and fast prediction of protein structural features by protein language models and deep learning. Nucleic Acids Res. 2022, 50, W510–W515.
  37. Bailey, T.L.; Johnson, J.; Grant, C.E.; Noble, W.S. The MEME suite. Nucleic Acids Res. 2015, 43, W39–W49.
  38. Jin, J.; Tian, F.; Yang, D.C.; Meng, Y.Q.; Kong, L.; Luo, J.; Gao, G. PlantTFDB 4.0: Toward a central hub for transcription factors and regulatory interactions in plants. Nucleic Acids Res. 2017, 45, D1040–D1045.
  39. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
  40. Capella-Gutiérrez, S.; Silla-Martínez, J.M.; Gabaldón, T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 2009, 25, 1972–1973.
  41. Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 2014, 30, 1312–1313.
  42. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120.
  43. Langmead, B.; Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 2012, 9, 357–359.
  44. Pertea, M.; Pertea, G.M.; Antonescu, C.M.; Chang, T.-C.; Mendell, J.T.; Salzberg, S.L. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 2015, 33, 290–295.
  45. Kondou, Y.; Nakazawa, M.; Kawashima, M.; Ichikawa, T.; Yoshizumi, T.; Suzuki, K.; Ishikawa, A.; Koshi, T.; Matsui, R.; Muto, S.; et al. Retarded growth of embryo1, a new basic helix-loop-helix protein, expresses in endosperm to control embryo growth. Plant Physiol. 2008, 147, 1924–1935.
  46. Liu, Y.; Li, X.; Li, K.; Liu, H.; Lin, C. Multiple bHLH proteins form heterodimers to mediate CRY2-dependent regulation of flowering-time in Arabidopsis. PLoS Genet. 2013, 9, e1003861.
  47. Di Marzo, M.; Roig-Villanova, I.; Zanchetti, E.; Caselli, F.; Gregis, V.; Bardetti, P.; Chiara, M.; Guazzotti, A.; Caporali, E.; Mendes, M.A.; et al. MADS-box and bHLH transcription factors coordinate transmitting tract development in Arabidopsis thaliana. Front. Plant Sci. 2020, 11, 526.
  48. Cifuentes-Esquivel, N.; Bou-Torrent, J.; Galstyan, A.; Gallemí, M.; Sessa, G.; Salla Martret, M.; Roig-Villanova, I.; Ruberti, I.; Martínez-García, J.F. The bHLH proteins BEE and BIM positively modulate the shade avoidance syndrome in Arabidopsis seedlings. Plant J. 2013, 75, 989–1002.
  49. Zhang, T.; Lv, W.; Zhang, H.; Ma, L.; Li, P.; Ge, L.; Li, G. Genome-wide analysis of the basic Helix-Loop-Helix (bHLH) transcription factor family in maize. BMC Plant Biol. 2018, 18, 235.
  50. Xin, Z.; Huang, H.; Li, T.; Liu, L.; Du, X.; Li, G.; Zhang, K.; Wang, D.; Yang, Y. Comprehensive analysis of bHLH genes in wheat and functional characterization of TabHLH319 in salt tolerance. Plant Cell Rep. 2025, 44, 199.
  51. Wang, Y.; Liu, A. Genomic Characterization and expression analysis of Basic Helix-Loop-Helix (bHLH) family genes in traditional Chinese herb Dendrobium officinale. Plants 2020, 9, 1044.
  52. Wang, M.J.; Ou, Y.; Li, Z.; Zheng, Q.D.; Ke, Y.J.; Lai, H.P.; Lan, S.R.; Peng, D.H.; Liu, Z.J.; Ai, Y. Genome-wide identification and analysis of bHLH transcription factors related to anthocyanin biosynthesis in Cymbidium ensifolium. Int. J. Mol. Sci. 2023, 24, 3825.
  53. Liu, R.; Wang, Y.; Tang, S.; Cai, J.; Liu, S.; Zheng, P.; Sun, B. Genome-wide identification of the tea plant bHLH transcription factor family and discovery of candidate regulators of trichome formation. Sci. Rep. 2021, 11, 10764.
  54. Qin, S.; Liang, Y.; Ye, Y.; Wei, G.; Lin, Q.; Qin, W.; Wei, F. Genome-wide analysis of the bHLH gene family in Spatholobus suberectus identifies SsbHLH112 as a regulator of flavonoid biosynthesis. BMC Plant Biol. 2025, 25, 594.
  55. Wang, X.; Wang, B.; Yuan, F. Genome-wide identification of bHLH transcription factors and functional analysis in salt gland development of the recretohalophyte sea lavender (Limonium bicolor). Hortic. Res. 2024, 11, uhae036.
  56. Li, X.; Duan, X.; Jiang, H.; Sun, Y.; Tang, Y.; Yuan, Z.; Guo, J.; Liang, W.; Chen, L.; Yin, J.; et al. Genome-wide analysis of basic/helix-loop-helix transcription factor family in rice and Arabidopsis. Plant Physiol. 2006, 141, 1167–1184.
  57. Ranjan, R.; Khurana, R.; Malik, N.; Badoni, S.; Parida, S.K.; Kapoor, S.; Tyagi, A.K. bHLH142 regulates various metabolic pathway-related genes to affect pollen development and anther dehiscence in rice. Sci. Rep. 2017, 7, 43397.
  58. Ortolan, F.; Fonini, L.S.; Pastori, T.; Mariath, J.E.; Saibo, N.J.; Margis-Pinheiro, M.; Lazzarotto, F. Tightly controlled expression of OsbHLH35 is critical for anther development in rice. Plant Sci. 2021, 302, 110716.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
