Article

Genome-Wide Identification and Characterisation of the 4-Coumarate–CoA Ligase (4CL) Gene Family in Gastrodia elata and Their Transcriptional Response to Fungal Infection

1
Pharmacy School, Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China
2
College of Life Sciences, Guizhou Normal University, Guiyang 550025, China
3
Gastrodia Elata Industrial Technology Research Institute of Guizhou Province, Guiyang 550004, China
*
Authors to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(15), 7610; https://doi.org/10.3390/ijms26157610
Submission received: 16 June 2025 / Revised: 15 July 2025 / Accepted: 4 August 2025 / Published: 6 August 2025
(This article belongs to the Section Molecular Genetics and Genomics)

Abstract

Gastrodia elata Blume is an important medicinal orchid, yet its large-scale cultivation is increasingly threatened by fungal diseases. The 4-coumarate–CoA ligase (4CL) gene family directs a key step in phenylpropanoid metabolism and plant defence, but its composition and function in G. elata have not been investigated. We mined the G. elata genome for 4CL homologues, mapped their chromosomal locations, and analysed their gene structures, conserved motifs, phylogenetic relationships, promoter cis-elements and codon usage bias. Publicly available transcriptomes were used to examine tissue-specific expression and responses to fungal infection. Subcellular localisation of selected proteins was verified by transient expression in Arabidopsis protoplasts. Fourteen Ge4CL genes were identified and grouped into three clades. Two members, Ge4CL2 and Ge4CL5, were strongly upregulated in tubers challenged with fungal pathogens. Ge4CL2 localised to the nucleus, whereas Ge4CL5 localised to both the nucleus and the cytoplasm. Codon usage analysis suggested that Escherichia coli and Oryza sativa are suitable heterologous hosts for Ge4CL expression. This study provides the first genome-wide catalogue of 4CL genes in G. elata and suggests that Ge4CL2 and Ge4CL5 may participate in antifungal defence, although functional confirmation is still required. The dataset furnishes a foundation for functional characterisation and the molecular breeding of disease-resistant G. elata cultivars.

1. Introduction

Gastrodia elata Blume (Orchidaceae) is a prized traditional medicinal herb distributed chiefly in Guizhou, Yunnan, Jilin, Sichuan, and neighbouring provinces of China, and it is now listed as a nationally protected rare species [1,2]. Its tubers contain abundant bioactive constituents—phenols, flavonoids, polysaccharides, sterols, and others—that underpin diverse pharmacological activities [3,4,5,6]. Gastrodin, the principal phenolic glycoside, exhibits anti-inflammatory, anticancer, antiviral, and neuroprotective effects and is widely prescribed for neurological, cardiovascular, and hepatic disorders [7]. Commercial cultivation of G. elata has expanded steadily since artificial propagation was achieved in the late twentieth century [8]. However, large-scale production is increasingly compromised by fungal diseases that flourish under the warm, humid conditions required for Tianma cultivation, causing serious yield and quality losses [9,10]. Breeding disease-tolerant cultivars therefore represents a pressing need and requires functional dissection of genes that orchestrate antifungal defence in this species.
The 4-coumarate–CoA ligase (4CL; EC 6.2.1.12) gene family encodes enzymes that act at a pivotal branch point in phenylpropanoid metabolism and plays multifaceted roles in plant–pathogen interactions [11,12,13,14,15,16]. Infection by Pseudomonas syringae pv. maculicola markedly upregulates At4CL transcripts in Arabidopsis thaliana [17]. Overexpressing Gh4CL3 in cotton enhances lignin and flavonoid accumulation and confers systemic resistance to Verticillium dahliae [18], whereas ectopic expression of Os4CL3 or Os4CL5 in rice restricts Magnaporthe oryzae penetration by reinforcing early lignin deposition [19]. Genome-wide surveys have catalogued 4CL repertoires in numerous crops, including pomegranate [20], apple [21], cassava [22], Populus trichocarpa [23], and A. thaliana [24]. In contrast, investigations of G. elata have focused mainly on chemical diversity, pharmacological action, and functional food development [4,25]; comprehensive analyses of defence-related genes, including 4CLs, remain scarce.
In this study, we systematically identified the members of the 4CL gene family from the whole-genome data of G. elata and analysed their molecular characteristics with bioinformatic methods. We also examined the factors influencing codon usage bias and predicted suitable heterologous expression hosts for the Ge4CL genes from their codon usage characteristics. Transcriptome data were then used to reveal the tissue-specific expression patterns of the Ge4CLs and their expression in healthy and fungus-infected tubers. The subcellular localisation of two members, Ge4CL2 and Ge4CL5, was examined experimentally. This study clarifies the evolutionary characteristics and functional diversity of the 4CL gene family in G. elata, lays a foundation for functional studies of the family, and provides candidate genes for breeding G. elata with enhanced resistance to fungal disease.

2. Results

2.1. Identification and Chromosomal Location Analysis of G. elata Ge4CL Gene Family Members

Fourteen non-redundant 4CL genes (Ge4CL1–Ge4CL14) were recovered from the G. elata genome after BLASTP screening, Pfam domain confirmation, and AMP-binding domain verification with NCBI-CDD. These genes map to nine chromosomes (Figure 1). Chromosome 16 houses the largest cluster—Ge4CL12, Ge4CL13, and Ge4CL14—whereas chromosomes 2, 4, 5, 7, and 13 each contain a single locus. A tandem-duplication pair (Ge4CL3/Ge4CL4) was detected on chromosome 3; no additional tandem arrays were observed. The predicted protein lengths span 522–697 amino acids, corresponding to molecular masses of 55.7–77.4 kDa. The mean isoelectric point is 6.6, and the instability indices range from 27.81 to 49.47, classifying most members as stable (Table 1). The grand average of hydropathicity (GRAVY) values exceed 0 for all but Ge4CL2, Ge4CL8, Ge4CL11, and Ge4CL12, indicating that most isoforms are hydrophobic. PSORT analysis predicts primary localisation in the cytoplasm and plasma membrane, with subsidiary targeting to chloroplasts and the endoplasmic reticulum.
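As an illustration of the GRAVY index reported above, the following minimal sketch computes it as the mean Kyte-Doolittle hydropathy value per residue, which is the conventional definition. The function name `gravy` is hypothetical, and the example input is only the short AMP-binding signature reported for Motif 3, not a full-length Ge4CL protein; the values in Table 1 were produced by the authors' own tools.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Return the grand average of hydropathicity of a protein sequence.

    Non-standard characters (gaps, X, stop symbols) are ignored.
    """
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# Illustrative input only: the conserved AMP-binding signature, not a full protein.
print(round(gravy("SSGTTGLPKGV"), 3))
```

A positive result indicates a net hydrophobic sequence and a negative result a net hydrophilic one, which is how the sign of GRAVY is read in the paragraph above.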

2.2. Collinearity Analysis

To trace the origin and evolution of the Ge4CL family, we analysed both intra- and inter-species synteny. No segmental or tandem-duplication events were detected among the 14 loci within the G. elata genome (Figure 2a), indicating that the family size has remained stable during long-term evolution. Comparative mapping between G. elata and five reference genomes (Arabidopsis thaliana, Populus trichocarpa, Capsicum annuum, Oryza sativa, and Dendrobium officinale) revealed the strongest collinear relationships with the fellow orchid D. officinale (Figure 2b). Orthology with A. thaliana and C. annuum was limited, reflecting lineage-specific divergence and genomic rearrangement.

2.3. Phylogenetic Tree Analysis

A maximum-likelihood tree constructed from 63 plant 4CL proteins divided all members into three clades (Figure 3). Ge4CL4, Ge4CL5, and Ge4CL7 were grouped with At4CL3 in clade II, which participates in flavonoid biosynthesis. Most Ge4CL genes were clustered in clade III together with homologues from C. annuum, Selaginella moellendorffii, Physcomitrella patens, and Pinus spp., suggesting a shared evolutionary trajectory. None of the fourteen Ge4CL proteins were grouped with the canonical lignin-associated Group I defined by Endler et al. Two members (Ge4CL8 and Ge4CL12) were clustered within the flavonoid-associated Group II, whereas the remaining twelve proteins resided in the monocot-specific Group III previously reported for rice by Gui et al. This distribution indicates a lineage-specific expansion of Group III isoforms in G. elata.

2.4. Analysis of Conserved Motifs and Gene Structures

Ten conserved motifs (Motif 1–Motif 10) were detected across the 14 Ge4CL proteins (Figure 4a), and the functional domains of the family are highly conserved. Every member contains Motif 2 and Motif 3: Motif 3 carries the AMP-binding signature (SSGTTGLPKGV), and Motif 2 carries a second conserved sequence (GEICIRG) (Figure 4c). Gene structure analysis shows that the number of exons varies among the Ge4CL genes (Figure 4b) and that the introns of Ge4CL8 and Ge4CL11 are longer than those of the other members. These structural differences are consistent with functional differentiation among the Ge4CLs.

2.5. Analysis of Promoter Cis-Acting Elements

Promoter scanning uncovered 28 cis-elements that fall into three functional categories: hormonal response (e.g., ABA- and MeJA-responsive motifs), growth and development (light-responsive elements), and abiotic/biotic stress (drought- and defence-related motifs) (Figure 5). The heterogeneous distribution of these elements suggests that individual Ge4CL genes participate in distinct regulatory networks across development and stress adaptation. Notably, several defence-associated motifs were detected. The classic W-box (TTGACC/T), a binding site for WRKY transcription factors involved in fungal resistance, occurs in the promoters of Ge4CL2 and Ge4CL5. We also identified the TC-rich repeat (ATTTTCTTCA), which is linked to stress-responsive gene activation. In the hormone-responsive category, MeJA-associated CGTCA/TGACG motifs and ethylene-responsive elements (EREs) are widely distributed, whereas multiple ABRE sites suggest ABA-regulated transcription. The co-presence of W-box and MeJA/ethylene motifs in Ge4CL2 and Ge4CL5 aligns with their strong induction upon fungal infection, implying that these genes may be jointly regulated by hormone-mediated defence pathways.
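The W-box search described above can be sketched with a regular expression that scans both strands of a promoter for TTGACC or TTGACT. This is only an illustration under stated assumptions: the authors used PlantCARE, whose matching rules may differ, and the function name `find_wbox` and the example sequences are hypothetical.

```python
import re

# W-box core TTGAC[C/T] on the forward strand, and its reverse complement
# ([A/G]GTCAA) to catch sites located on the opposite strand.
WBOX_FORWARD = re.compile(r"TTGAC[CT]")
WBOX_REVERSE = re.compile(r"[AG]GTCAA")


def find_wbox(promoter: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every W-box match in a promoter."""
    seq = promoter.upper()
    hits = [(m.start(), "+") for m in WBOX_FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in WBOX_REVERSE.finditer(seq)]
    return sorted(hits)


# Hypothetical toy sequence, not a real Ge4CL promoter.
print(find_wbox("AATTGACCGGAGTCAAT"))
```

Positions are reported relative to the start of the supplied promoter string, so converting them to distances from the ATG would require knowing how the promoter was extracted.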

2.6. Analysis of Codon-Related Parameters of the Gene Family

The coding sequences (CDSs) of the Ge4CL gene family were analysed with CodonW and EMBOSS-CUSP (Table 2). At the third codon position, the average nucleotide frequencies were 23% A, 29% T, 37% C, and 33% G. The GC content was unevenly distributed across the three codon positions, which is consistent with the pattern reported for monocotyledonous plants. The total GC content of the Ge4CLs ranges from 41% to 66%, and the mean values of GC1, GC2, and GC3 are 56.11%, 43%, and 58%, respectively. The codon adaptation index (CAI) ranges from 0.18 to 0.23; the codon bias index (CBI) ranges from −0.07 to 0.14; the frequency of optimal codons (FOP) ranges from 0.35 to 0.49; the effective number of codons (ENC) ranges from 41.15 to 56.44, and all genes have a value greater than 35, indicating a weak codon bias; the number of synonymous codons (L_sym) ranges from 506 to 677; and the aromaticity of the protein (Aromo) ranges from 0.06 to 0.11.
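The position-specific GC values (GC1, GC2, GC3) discussed above can be illustrated with a short sketch that counts G and C at each codon position of an in-frame CDS. It is a simplification: CodonW's GC3s, for example, considers only synonymous third positions, which this sketch does not attempt. The function name and the toy sequence are hypothetical.

```python
def gc_by_codon_position(cds: str) -> tuple[float, float, float]:
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for offset in range(3):
        bases = seq[offset:usable:3]  # every base at this codon position
        fractions.append(sum(base in "GC" for base in bases) / len(bases))
    return (fractions[0], fractions[1], fractions[2])


# Toy two-codon example (ATG GCC), not a Ge4CL sequence.
print(gc_by_codon_position("ATGGCC"))
```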

2.7. Analysis of Influencing Factors on Codon Usage Bias of the Gene Family

ENC plot analysis (Figure 6a) placed two Ge4CL genes near the Wright expected curve, whereas the remaining members fell below it. The overall distribution followed the theoretical trend, indicating that mutation pressure largely shapes codon choice, with natural selection acting on a subset of genes. PR2 plot analysis (Figure 6b) revealed a slight preference for pyrimidines (T + C) over purines (A + G) at third-codon positions, supporting unequal mutation pressure. Neutral plot regression of GC12 against GC3s produced a slope of 0.36 (r = 0.89; Figure 6c), highlighting natural selection as the primary driver of codon usage, supplemented by mutation effects.
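For readers unfamiliar with these plots, the sketch below shows the two calculations behind them under common conventions: Wright's expected ENC as a function of GC3s, ENC = 2 + s + 29 / (s² + (1 − s)²), and an ordinary least-squares slope for the neutrality plot (GC12 regressed on GC3). The function names and the toy data are hypothetical; the figures in this article were produced in Origin, not with this code.

```python
def expected_enc(gc3s: float) -> float:
    """Wright's expected ENC when codon usage depends only on GC3s."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Slope of the least-squares line of ys on xs (e.g. GC12 on GC3)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy values only: three made-up genes lying on a line of slope 0.36.
gc3 = [0.4, 0.5, 0.6]
gc12 = [0.444, 0.48, 0.516]
print(expected_enc(0.5), round(ols_slope(gc3, gc12), 2))
```

Under the usual reading, a gene lying well below the expected curve has a lower ENC than GC3s alone would predict, and a neutrality slope close to 1 points to mutation pressure whereas a slope close to 0 points to selection.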

2.8. Selection of the Recipient System of the Ge4CLs

Because an efficient genetic transformation system for G. elata is not yet available, heterologous hosts were evaluated by comparing codon usage patterns. Codons exhibiting significant bias (frequency ratio ≤ 0.5 or ≥ 2.0) between the Ge4CL genes and the genomes of five model organisms were counted (Table 3). The Ge4CL genes differed significantly from Escherichia coli at 10 codons and from Saccharomyces cerevisiae at 20 codons, indicating that E. coli is the more suitable of the two microbial hosts for prokaryotic expression, whereas the Ge4CL codons would need to be optimised before eukaryotic expression in S. cerevisiae. Among the plant hosts, the numbers of significantly different codons relative to A. thaliana, Nicotiana tabacum, and Oryza sativa were 7, 16, and 1, respectively, indicating that O. sativa is the most suitable host for genetic transformation with the Ge4CLs.
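The counting rule described above can be sketched as follows, assuming that each codon usage table maps codons to a frequency (for instance per thousand codons, as in the Kazusa tables) and that the ratio is taken as gene frequency divided by host frequency. Both assumptions, the function name, and the toy numbers are illustrative and are not taken from Table 3.

```python
def count_divergent_codons(
    gene_freq: dict[str, float],
    host_freq: dict[str, float],
    low: float = 0.5,
    high: float = 2.0,
) -> list[str]:
    """Return codons whose gene/host frequency ratio is <= low or >= high."""
    divergent = []
    for codon, gene_value in gene_freq.items():
        host_value = host_freq.get(codon)
        if not host_value:
            # Codon absent from (or zero in) the host table: ratio undefined.
            continue
        ratio = gene_value / host_value
        if ratio <= low or ratio >= high:
            divergent.append(codon)
    return sorted(divergent)


# Made-up frequencies per thousand codons, for illustration only.
ge4cl = {"GCC": 30.0, "GCT": 10.0, "AAA": 20.0}
host = {"GCC": 10.0, "GCT": 25.0, "AAA": 18.0}
print(count_divergent_codons(ge4cl, host))
```

A host with fewer divergent codons is then read as the better match, which is the logic used to rank the five candidate hosts.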

2.9. Expression Patterns of the Ge4CLs in G. elata

RNA-Seq analysis (Figure 7a) revealed that most Ge4CL transcripts accumulate in mother tubers—both mature and juvenile—while showing moderate levels in daughter tubers. Ge4CL9 and Ge4CL13 were scarcely expressed in any tissue (TPM < 1). In contrast, Ge4CL10 maintained high expression across all four tissues (TPM > 8), implying a broad role in phenylpropanoid biosynthesis. Under pathogenic fungal infection (Figure 7b), Ge4CL2 and Ge4CL5 were significantly upregulated, suggesting that these genes help orchestrate defence-related secondary metabolite production.

2.10. Subcellular Localisation Analysis

To confirm subcellular distribution, the CDS fragments of Ge4CL2 and Ge4CL5 (lacking stop codons) were fused to GFP in pUC19 and transiently co-expressed with an NLS-RFP marker in Arabidopsis protoplasts. Confocal microscopy 18 h post-transformation (Figure 8) showed Ge4CL2-GFP fluorescence confined to nuclei, whereas Ge4CL5-GFP signals were detected in both nuclei and cytoplasm, consistent with nuclear localisation for Ge4CL2 and dual nuclear–cytoplasmic localisation for Ge4CL5.

3. Discussion

3.1. Evolutionary Expansion and Diversification of the Ge4CL Gene Family in G. elata

The identification of 14 distinct 4CL genes in G. elata represents a substantial expansion of the 4CL gene family compared to model plants such as Arabidopsis thaliana (4 genes) and Oryza sativa (5 genes) [20,26]. However, this is fewer than in species like Malus domestica (69 genes) and Gossypium hirsutum (34 genes), suggesting that the family in G. elata is both relatively small and distinctively shaped by its evolutionary pressures [17,27]. The phylogenetic analysis of these genes indicated their clear grouping into three distinct subfamilies, highlighting functional divergence [28]. While subfamily I is typically involved in the biosynthesis of lignin, a key secondary metabolite in many plants, G. elata’s 4CLs are not found in this subfamily. This absence supports the hypothesis that G. elata’s 4CLs have evolved to play a different role in secondary metabolism, perhaps involved in the biosynthesis of compounds like flavonoids, which are critical for its medicinal properties. Functional implications of the Group III enrichment: Dicots typically harbour two 4CL subclasses, with Group I enzymes channelling precursors into lignin and Group II enzymes fuelling flavonoid formation. A third subclass, Group III, appears to be restricted to monocots and is thought to support stress-responsive phenylpropanoid branches rather than structural lignification. The fact that twelve out of fourteen Ge4CL genes fall into Group III, while none remain in Group I, suggests that lignin-directed CoA ligase activity may be complemented by other acyl-activating enzymes in this achlorophyllous orchid; conversely, the expansion of Group III members could confer metabolic flexibility, allowing rapid redirection of phenylpropanoid flux toward antifungal defence or symbiotic interactions during subterranean growth. Biochemical assays and reverse-genetics studies will be required to verify whether individual Ge4CL isoforms specialise in these adaptive pathways.
Notably, the occurrence of tandem duplications on chromosome 3 could suggest that gene expansion in this species was driven by such duplications, a process commonly observed in other plant species (and particularly in plants that undergo selective pressure for pathogen resistance) [29]. These results offer important insights into the evolutionary forces that have shaped the 4CL family in G. elata, and potentially in other orchids or medicinal plants with similar evolutionary histories. Further comparison with the 4CL family in other medicinal plants may reveal species-specific adaptations in the synthesis of bioactive secondary metabolites. It is well documented that the biosynthesis of flavonoids, phenolic acids, and other secondary metabolites contributes significantly to the medicinal and therapeutic properties of plants [30,31]. Given that G. elata is used in traditional medicine for its antioxidant and anti-inflammatory properties, the absence of 4CL genes related to lignin biosynthesis but the presence of genes potentially involved in flavonoid synthesis could directly link the gene family’s evolution to the plant’s medicinal qualities. Moreover, the unique evolutionary trajectory of these genes in G. elata opens new avenues for the study of gene family diversification in other plants that may also possess therapeutic value, but with distinct metabolic pathways or molecular adaptations tailored to their ecological niches.

3.2. Role of Ge4CL Genes in Fungal Infection Response and Plant Defence Mechanisms

The expression analysis of Ge4CL genes in response to fungal infection in G. elata provided compelling evidence that certain members of the 4CL gene family, particularly Ge4CL2 and Ge4CL5, play an important role in the plant’s defence mechanism. These genes exhibited significant upregulation in infected tissues, suggesting their involvement in the plant’s antifungal response [32,33,34]. This aligns with findings in other species, where 4CL genes have been linked to the biosynthesis of secondary metabolites like phenylpropanoids, which are well known to participate in plants’ defence against pathogens [20,35,36,37]. In particular, phenolic compounds such as flavonoids and lignin derivatives can act as physical barriers or antimicrobial agents to thwart pathogen invasion [38,39]. The upregulation of Ge4CL2 and Ge4CL5 in response to fungal stress might therefore indicate that these genes contribute to the synthesis of such defence compounds, reinforcing the plant’s ability to resist fungal infections.
In addition to expression data, the subcellular localisation of Ge4CL2 and Ge4CL5 further supports their roles in plant defence. Ge4CL2’s predominant localisation to the nucleus suggests that it might be involved in the regulation of gene expression related to stress responses, potentially through interactions with transcription factors that modulate defence pathways. On the other hand, the dual localisation of Ge4CL5 in both the nucleus and cytoplasm hints at a broader functional scope, possibly including the regulation of metabolic pathways in addition to its involvement in defence. This dual localisation is significant, as it may indicate Ge4CL5’s participation in both transcriptional regulation and post-translational modifications, which are crucial for fine-tuning the plant’s response to external stressors such as fungal pathogens. These findings not only emphasise the central role of Ge4CL genes in stress responses but also open the door to further investigations into the precise biochemical and molecular pathways through which these genes contribute to G. elata’s antifungal properties.

3.3. Codon Usage Bias and Its Implications for Gene Expression in G. elata

The codon usage bias analysis conducted on the Ge4CL gene family provided insightful data on the evolutionary pressures shaping these genes [40]. The results indicated that the coding sequences of these genes exhibit a distinct bias, influenced by both mutational pressure and natural selection. This suggests that the Ge4CL genes in G. elata have undergone selective evolutionary pressures, likely driven by the plant’s adaptation to specific environmental conditions, including pathogen stress [41]. Interestingly, the codon usage patterns observed in G. elata’s 4CL genes were comparable to those in other well-studied plants such as Oryza sativa and Arabidopsis thaliana, which further supports the hypothesis that codon usage bias is a general feature of plant gene families, influenced by both evolutionary history and environmental adaptation. Moreover, this codon usage bias could have implications for the efficient expression of these genes, suggesting that G. elata’s 4CL genes are optimised for the particular ecological pressures that the plant faces.
Biological relevance of codon usage metrics: The 14 Ge4CL genes display medium codon adaptation indices (CAI = 0.18–0.23), suggesting a balanced usage that avoids excessive competition for tRNA pools. Notably, Ge4CL2 and Ge4CL5—the two transcripts most strongly induced upon Penicillium infection—lie at the upper end of this range, indicating that their codon composition closely matches that of high-expression housekeeping genes and, thus, favours rapid translation under stress. The lower effective number of codons for Ge4CL5 (ENC = 43.68) further points to a restricted set of preferred codons, likely paired with abundant tRNAs, which can accelerate elongation and boost the flux toward defence-related phenylpropanoid metabolites. In contrast, Ge4CL1, Ge4CL10, and Ge4CL13 exhibit GC contents > 0.60. Elevated GC may stabilise secondary mRNA structure while slightly lowering initiation efficiency, a feature congruent with their putative housekeeping or metabolic “buffer” roles under non-stress conditions. Together, these metrics suggest that codon usage in G. elata fine-tunes translational output, allowing rapid upregulation of key defence genes without globally perturbing protein synthesis homeostasis.

3.4. Evolutionary Implications of the Reduced 4CL Repertoire in G. elata

The present study identified 14 Ge4CL genes, a copy number that is intermediate within the range reported for angiosperms. Autotrophic species such as Arabidopsis thaliana, Zea mays, and Oryza sativa typically harbour 4–5 members, whereas lineage-specific duplications in woody or fruit crops can inflate the family to dozens of loci (e.g., 30 4CL/ACS-related genes in pear) [42]. Within Orchidaceae, the fully photosynthetic epiphyte Vanilla planifolia retains five 4CL genes [43], while draft genomes of Dendrobium spp. annotate ≈ 9–11 copies.
The intermediate yet streamlined complement in the achlorophyllous, fully mycoheterotrophic G. elata is consistent with relaxed selection on phenylpropanoid-derived lignin following the loss of leaves and the concomitant reduction in structural carbon demand. Comparative genomics of other mycoheterotrophic orchids and parasitic plants reveals parallel contractions in plastid and nuclear gene sets once photosynthesis is abandoned [44]. We therefore hypothesise that the modest downsizing of the 4CL family in G. elata reflects a balance between (i) conserving minimal enzymatic capacity for specialised metabolites such as gastrodin and (ii) genome-wide pressures to economise redundant functions. Testing this hypothesis will require broader taxon sampling (e.g., Epipogium, Cuscuta) and functional interrogation of lineage-specific 4CL paralogues via CRISPR knockout, enzyme activity assays, and targeted metabolomics, thereby clarifying whether convergent gene family contraction is a common hallmark of heterotrophic evolution.

3.5. Future Directions: Functional Validation and Applications in Plant Breeding

While this study provides a comprehensive genomic and bioinformatic characterisation of the 4CL gene family in Gastrodia elata, the functional validation of these genes remains a critical next step. Future research should focus on experimental approaches to confirm the roles of Ge4CL2 and Ge4CL5 in the biosynthesis of secondary metabolites involved in defence responses. Techniques such as quantitative real-time PCR (qRT-PCR) and RNA interference (RNAi) could be used to validate the expression patterns of these genes under different stress conditions, and to explore their precise involvement in fungal resistance. Additionally, gene knockout or overexpression systems, coupled with metabolic profiling, could help identify the specific metabolites produced by these genes and their role in plant defence.
Beyond functional validation, our transcriptome-based findings already hint at tangible avenues for the molecular breeding of G. elata varieties with improved resistance to fungal diseases. Pinpointing Ge4CL2 and Ge4CL5 as putative defence-related genes suggests that they could be harnessed—either through marker-assisted selection or targeted engineering—to boost phenylpropanoid-derived metabolites and thereby enhance both plant vigour and medicinal quality. Such insights may, in turn, stimulate parallel efforts in other medicinal species, ultimately expanding the repertoire of crops that combine strong disease resistance with high bioactive compound yields. We recognise, however, that these proposals rest on correlative transcriptome evidence. Owing to the current unavailability of mature tuber material and pending biosafety approval for Penicillium oxalicum inoculation, qRT-PCR and functional complementation assays could not be completed within the present revision cycle. We have therefore scheduled controlled infection experiments for the coming harvest season and secured funding to perform qRT-PCR validation and enzyme assays. Until those data are obtained, all functional inferences should be regarded as hypotheses that require experimental confirmation.

4. Materials and Methods

4.1. Identification and Chromosomal Localisation of Ge4CL Genes

Whole-genome sequence data for Gastrodia elata (assembly accession GWHBDNU00000000, National Genomics Data Center, CN) were downloaded and screened for 4-coumarate–CoA ligase (4CL) candidates. The reference set of Arabidopsis thaliana 4CL proteins (TAIR; https://www.arabidopsis.org/, accessed on 8 December 2024) was first queried against the G. elata proteome with BLASTP (E-value < 1 × 10⁻⁵). In parallel, the hidden Markov model for the AMP-binding/4CL domain (Pfam entry PF00501) was used in HMMER 3.3 searches. Sequences lacking the conserved AMP-binding motif were discarded after confirmation with the NCBI Conserved Domain Database. Genomic coordinates for the retained genes were extracted from the G. elata GFF3 annotation and visualised with TBtools. Basic physicochemical properties, including amino acid length, predicted molecular weight, and isoelectric point, were calculated with the same software, while subcellular localisation was predicted using PSORT (https://wolfpsort.hgc.jp/, accessed on 10 December 2024).
To relate gene position to expression behaviour, we re-examined two publicly available RNA-Seq datasets originating from the same geographical provenance (Changbai Mountains, Jilin, China). Healthy mature tubers (BioSample SAMN14380862) and naturally Penicillium oxalicum-infected mature tubers (BioSample SAMN14380861) of G. elata Bl. f. glauca were collected on the same day, surface-sterilised, and dissected at identical anatomical sites. Each condition comprised three biological replicates (≈100 mg tissue per replicate), immediately flash-frozen in liquid nitrogen. Illumina NovaSeq sequencing yielded 7.89 × 10¹⁰ and 6.45 × 10¹⁰ clean bases for the healthy and diseased groups, respectively. Raw reads were quality-filtered with FastQC and Trimmomatic, aligned to the reference genome with HISAT2, and normalised counts were generated in DESeq2. Genes with |log₂FC| ≥ 1 and FDR < 0.05 were considered to be differentially expressed. Concordance among biological replicates was high (Pearson r² = 0.96–0.98), underscoring the reliability of the expression data used in subsequent analyses.
BLASTP searches were performed with NCBI BLAST+ v2.10.1, using the four annotated 4CL proteins of Arabidopsis thaliana (AT1G51680, AT3G21240, AT1G65060, and AT3G21230) as queries. The search parameters were set to E-value < 1 × 10⁻⁵, word size 3, BLOSUM62 matrix, gap open 11, gap extend 1, and low-complexity filtering disabled (-seg no); the maximum target sequences option was raised to 5000 to avoid premature truncation. In parallel, HMMER v3.3.2 was run against the G. elata proteome using the Pfam profile PF00501 (AMP-binding/4CL domain). The union of BLASTP and HMMER hits was purged of redundancy and screened in the NCBI Conserved Domain Database; proteins lacking the diagnostic AMP-binding motif (cl00909) were discarded.
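The search settings listed above map onto standard BLAST+ and HMMER command-line options roughly as sketched below. The file names are hypothetical placeholders, the tabular output format is an assumption (the article does not state which output format was used), and the functions only build argument lists, so nothing is executed unless the lists are passed to `subprocess.run`.

```python
def blastp_command(query_fasta: str, proteome_db: str, out_tsv: str) -> list[str]:
    """Build a blastp call with the parameters reported in the Methods."""
    return [
        "blastp",
        "-query", query_fasta,       # the four A. thaliana 4CL proteins
        "-db", proteome_db,          # BLAST database of the G. elata proteome
        "-evalue", "1e-5",
        "-word_size", "3",
        "-matrix", "BLOSUM62",
        "-gapopen", "11",
        "-gapextend", "1",
        "-seg", "no",                # low-complexity filtering disabled
        "-max_target_seqs", "5000",
        "-outfmt", "6",              # assumption: tabular output
        "-out", out_tsv,
    ]


def hmmsearch_command(hmm_file: str, proteome_fasta: str, out_tbl: str) -> list[str]:
    """Build an hmmsearch call for the PF00501 profile."""
    return ["hmmsearch", "--tblout", out_tbl, hmm_file, proteome_fasta]


def candidate_union(blast_hits: list[str], hmmer_hits: list[str]) -> list[str]:
    """Merge both hit lists into a sorted, non-redundant candidate set."""
    return sorted(set(blast_hits) | set(hmmer_hits))


# Hypothetical file names and protein identifiers, for illustration only.
print(" ".join(blastp_command("At4CL.faa", "Gelata_proteome", "blast.tsv")))
print(candidate_union(["Ge01", "Ge02"], ["Ge02", "Ge03"]))
```

The merged candidate list would still need the conserved-domain check described above before any protein is accepted as a 4CL.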

4.2. Collinearity Analysis of the Ge4CL Gene Family

Chromosome length, gene coordinates, and gene density information were parsed from the G. elata genome and its GFF3 file. Intragenomic synteny was illustrated with the Advanced Circos function in TBtools. To examine inter-species collinearity, genome assemblies and annotations for A. thaliana, P. trichocarpa, C. annuum, O. sativa, and D. officinale were downloaded from NCBI and analysed in TBtools.

4.3. Phylogenetic Analysis of the Ge4CLs

Published 4CL amino acid sequences from diverse plant species were aligned, and a maximum-likelihood phylogenetic tree was built with the “One-Step Build a ML Tree” module in TBtools. Ultrafast bootstrap resampling (5000 replicates) was applied. The tree was refined in iTOL.

4.4. Conserved Motifs and Gene Structures of Ge4CLs

Conserved motifs were identified with MEME (http://meme-suite.org/, accessed on 12 December 2024), with the maximum number of motifs set to 10. Gene structures and motif distributions were visualised in TBtools alongside the phylogenetic tree.

4.5. Analysis of Cis-Acting Elements in the Promoters of Ge4CLs

Promoter regions (2 kb upstream of the ATG start codon) were extracted in TBtools. Cis-regulatory elements were predicted with PlantCARE, and their abundance was displayed as a heatmap generated in TBtools.
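The coordinate arithmetic behind a 2 kb upstream window is strand-dependent, as the sketch below illustrates with 1-based inclusive coordinates in the GFF3 style. It is an approximation under stated assumptions: it treats the annotated feature start as the reference point, whereas the article anchors the window at the ATG start codon, so in practice the CDS coordinates would be passed in. The function name and the example coordinates are hypothetical.

```python
from typing import Optional


def promoter_interval(
    feature_start: int,
    feature_end: int,
    strand: str,
    chrom_length: int,
    upstream: int = 2000,
) -> Optional[tuple[int, int]]:
    """Return the 1-based inclusive upstream window, clipped to the chromosome.

    Returns None when the feature abuts the chromosome end and no upstream
    sequence exists.
    """
    if strand == "+":
        start = max(1, feature_start - upstream)
        end = feature_start - 1
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        start = feature_end + 1
        end = min(chrom_length, feature_end + upstream)
    else:
        raise ValueError("strand must be '+' or '-'")
    if start > end:
        return None
    return (start, end)


# Hypothetical coordinates, not taken from the G. elata annotation.
print(promoter_interval(5001, 8000, "+", 100_000))
print(promoter_interval(5001, 8000, "-", 100_000))
```

For minus-strand genes the extracted window must also be reverse-complemented before motif scanning, so that positions are read in the direction of transcription.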

4.6. Analysis of Codon Usage Bias in the Ge4CLs

Codon composition and related parameters were calculated with CodonW and the CUSP program. The effective number of codons (ENC), the GC content at each codon position, and the frequency of A, T, C, and G at the third position of synonymous codons were determined. ENC plots, PR2 plots, and neutral plots were produced in Origin 2024.

4.7. Data Sources for Codon Usage Bias

Codon usage tables for Ge4CL genes were generated with the EMBOSS Explorer interface (http://www.bioinformatics.nl/emboss-explorer/, accessed on 13 December 2024). Genome-wide codon usage data for five model organisms—E. coli, N. tabacum, A. thaliana, S. cerevisiae, and O. sativa—were downloaded from the Kazusa Codon Usage Database (http://www.kazusa.or.jp/codon/, accessed on 13 December 2024). The codon usage metrics for each Ge4CL were compared with the genomic averages of the five reference organisms.

4.8. Analysis of Gene Family Expression Patterns

RNA-Seq datasets covering four G. elata tissues (mature tuber, juvenile tuber, mother tuber, and mother-of-juvenile tuber; SRA project SRP279888), together with mature tubers showing fungal disease symptoms and healthy controls (SRP268570), were obtained from the GelFAP v2.0 portal. Transcript abundance was expressed as log₂(TPM + 1). Expression heatmaps were drawn in TBtools.
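The log₂(TPM + 1) transform used for the heatmaps is shown below as a minimal sketch; the pseudocount of 1 keeps unexpressed genes (TPM = 0) at zero instead of an undefined logarithm. The function name is hypothetical, and the inputs are chosen to match the TPM cut-offs quoted in the Results rather than real measurements.

```python
import math


def log2_tpm(tpm: float) -> float:
    """Return log2(TPM + 1) for a non-negative TPM value."""
    if tpm < 0:
        raise ValueError("TPM cannot be negative")
    return math.log2(tpm + 1.0)


# TPM = 1 and TPM = 8 are the thresholds quoted for low and high expression.
for value in (0.0, 1.0, 8.0):
    print(value, round(log2_tpm(value), 2))
```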
Clean reads from the healthy and fungal-infected tuber libraries were processed in DESeq2 to obtain normalised gene-level counts. A gene was considered significantly differentially expressed when it met both of the following criteria: (i) an absolute log2(fold change) ≥ 1, corresponding to a ≥2-fold up- or downregulation, and (ii) a Benjamini–Hochberg adjusted p value (FDR) < 0.05. Wald tests (pairwise contrasts) were used for two-group comparisons, while likelihood ratio tests provided an ANOVA-like assessment in multi-group situations. All p values are two-tailed unless otherwise stated. The same thresholds were applied wherever “significant expression change” is discussed in the Results section.
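The two-part significance rule can be sketched as follows. This is a plain-Python illustration of the Benjamini–Hochberg adjustment and the stated cut-offs, not the DESeq2 code; the function names and example p values are invented. DESeq2 can also report missing adjusted p values after independent filtering, which this sketch does not handle.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def is_significant(log2fc: float, padj: float,
                   lfc_cut: float = 1.0, fdr_cut: float = 0.05) -> bool:
    """Apply both criteria: |log2FC| >= 1 and adjusted p < 0.05."""
    return abs(log2fc) >= lfc_cut and padj < fdr_cut


# Invented example: raw p values and log2 fold changes for four genes.
raw_p = [0.01, 0.04, 0.03, 0.005]
log2fc = [2.1, 0.4, -1.6, 3.0]
padj = benjamini_hochberg(raw_p)
print([is_significant(fc, p) for fc, p in zip(log2fc, padj)])
```

Applying both criteria together means a gene with a large fold change but a weak adjusted p value, or the reverse, is not called differentially expressed.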

4.9. Cloning and Subcellular Localisation of Ge4CL2 and Ge4CL5

Coding sequences of Ge4CL2 and Ge4CL5 were PCR-amplified from G. elata cDNA with gene-specific primers lacking stop codons. Amplicons were purified, ligated into pUC19-GFP via seamless cloning, and transformed into Escherichia coli DH5α by heat shock. Positive clones were verified by Sanger sequencing (Kumei Biotechnology, Changchun, China), and endotoxin-free plasmids were prepared with the EndoFree Plasmid Midi Kit (CWBIO, Beijing, China). pUC19-Ge4CL-GFP and NLS-RFP constructs were co-transfected into Arabidopsis thaliana mesophyll protoplasts. After 18–22 h of dark incubation at 22 °C, GFP and RFP signals were visualised with a confocal laser scanning microscope to determine subcellular localisation.

5. Conclusions

In this study, we carried out a genome-wide identification and characterisation of the 4CL gene family in G. elata and identified 14 members. Our findings indicate a possible role for Ge4CL genes, particularly Ge4CL2 and Ge4CL5, in the plant’s response to fungal infection, which suggests that they may be involved in the biosynthesis of defence-related secondary metabolites. This work provides candidate genes for functional validation and a starting point for breeding G. elata varieties with enhanced disease resistance. It also adds to the understanding of 4CL gene family evolution in a medicinal plant.

Author Contributions

Conceptualisation, Z.J. and Y.P.; methodology, W.S. and K.M.; software, Q.L. and S.S.; validation, S.Y. and T.T.; formal analysis, K.M.; investigation, S.S.; resources, W.S.; data curation, Q.L.; writing—original draft preparation, S.S.; writing—review and editing, Z.J. and K.M.; visualisation, S.S.; supervision, Y.P. and Z.J.; project administration, Z.J.; funding acquisition, Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Administration of Traditional Chinese Medicine, High-Level Key Discipline Construction Project of Traditional Chinese Medicine (Grant No. zyyzdxk-2023186); the Science and Technology Department of Guizhou Province, Full-Component Analysis and Functional Component Exploration of Dafang Gastrodia elata; and the Guizhou Provincial Gastrodia elata Industry Technology Research Program, Key Technology Research of the Entire Industry Chain of Guizhou Gastrodia elata. The APC was funded by the National Administration of Traditional Chinese Medicine.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The genome data of Gastrodia elata used in this study are publicly available from the National Genomics Data Center (NGDC) under accession number GWHBDNU00000000. Most of the transcriptome and gene expression data were obtained from the public database GelFAP v2.0 (http://www.gzybioinformatics.cn/Gelv2, accessed on 22 December 2024).

Acknowledgments

The authors would like to thank the Guizhou University of Traditional Chinese Medicine and Guizhou Normal University for providing laboratory facilities and administrative support. We also acknowledge the use of public databases including NGDC and GelFAP v2.0.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
4CL: 4-Coumarate–CoA Ligase
Ge4CL: Gastrodia elata 4CL Gene
G. elata: Gastrodia elata
TPM: Transcripts Per Million
NGDC: National Genomics Data Center
CDD: Conserved Domain Database
MEME: Multiple EM for Motif Elicitation
ENC: Effective Number of Codons
CAI: Codon Adaptation Index
CBI: Codon Bias Index
FOP: Frequency of Optimal Codons
GC: Guanine–Cytosine Content
GFF: General Feature Format
PSORT: Protein Subcellular Localisation Prediction Tool
TBtools: Toolbox for Biologists
GFP: Green Fluorescent Protein
RFP: Red Fluorescent Protein
RNAi: RNA Interference

References

  1. Hu, J.; Feng, Y.; Zhong, H.; Liu, W.; Tian, X.; Wang, Y.; Tan, T.; Hu, Z.; Liu, Y. Impact of climate change on the geographical distribution and niche dynamics of Gastrodia elata. PeerJ 2023, 11, e15741. [Google Scholar] [CrossRef] [PubMed]
  2. Zhao, F.; Yin, C.; Lai, Y.; Lin, H.; Jian, Z.; Tao, A. Extraction, purification, characteristics, bioactivities, application, and toxicity of Gastrodia R. Br. polysaccharides: A review. Int. J. Biol. Macromol. 2025, 301, 140084. [Google Scholar] [CrossRef]
  3. Su, Z.; Yang, Y.; Chen, S.; Tang, Z.; Xu, H. The processing methods, phytochemistry and pharmacology of Gastrodia elata Bl.: A comprehensive review. J. Ethnopharmacol. 2023, 314, 116467. [Google Scholar] [CrossRef] [PubMed]
  4. Li, C.; Li, J.; Wang, Y.Z. A Review of Gastrodia elata Bl.: Extraction, Analysis and Application of Functional Food. Crit. Rev. Anal. Chem. 2024, 1–30. [Google Scholar] [CrossRef] [PubMed]
  5. Xiao, G.; Tang, R.; Yang, N.; Chen, Y. Review on pharmacological effects of gastrodin. Arch. Pharm. Res. 2023, 46, 744–770. [Google Scholar] [CrossRef]
  6. Zhan, H.D.; Zhou, H.Y.; Sui, Y.P.; Du, X.L.; Wang, W.H.; Dai, L.; Sui, F.; Huo, H.R.; Jiang, T.L. The rhizome of Gastrodia elata Blume—An ethnopharmacological review. J. Ethnopharmacol. 2016, 189, 361–385. [Google Scholar] [CrossRef]
  7. Wang, Y.; Bai, M.; Wang, X.; Peng, Z.; Cai, C.; Xi, J.; Yan, C.; Luo, J.; Li, X. Gastrodin: A comprehensive pharmacological review. Naunyn Schmiedebergs Arch. Pharmacol. 2024, 397, 3781–3802. [Google Scholar] [CrossRef]
  8. Yu, C.J.; Wang, S.B. Research on Pollution-free Comprehensive Prevention and Control Technology for Sclerotium Rot of G. elata. Hubei Agric. Sci. 2010, 49, 92–94. [Google Scholar]
  9. Tian, W.G. Research on Fungal Diseases of G. elata in Zhaotong, Yunnan. Master’s Thesis, Yunnan University, Kunming, China, 2020. [Google Scholar]
  10. Zhang, J.Q.; Tang, X.; Xiao, C.H.; Xu, J.; Yuan, Q.S.; Wang, X.; Liu, D.H.; Zhang, G.W.; Liu, F.M.; Jiang, W.K.; et al. [Investigation, analysis and identification of disease of Gastrodia elata f. glauca]. Zhongguo Zhong Yao Za Zhi 2020, 45, 478–484. [Google Scholar]
  11. Kang, L.; Wu, Y.; Jia, Y.; Chen, Z.; Kang, D.; Zhang, L.; Pan, C. Nano-selenium enhances melon resistance to Podosphaera xanthii by enhancing the antioxidant capacity and promoting alterations in the polyamine, phenylpropanoid and hormone signaling pathways. J. Nanobiotechnol. 2023, 21, 377. [Google Scholar] [CrossRef]
  12. Sun, J.; Fan, Z.; Chen, Y.; Jiang, Y.; Lin, M.; Wang, H.; Lin, Y.; Chen, Y.; Lin, H. The effect of ε-poly-l-lysine treatment on molecular, physiological and biochemical indicators related to resistance in longan fruit infected by Phomopsis longanae Chi. Food Chem. 2023, 416, 135784. [Google Scholar] [CrossRef] [PubMed]
  13. Chen, X.; Fang, X.; Zhang, Y.; Wang, X.; Zhang, C.; Yan, X.; Zhao, Y.; Wu, J.; Xu, P.; Zhang, S. Overexpression of a soybean 4-coumaric acid: Coenzyme A ligase (GmPI4L) enhances resistance to Phytophthora sojae in soybean. Funct. Plant Biol. 2019, 46, 304–313. [Google Scholar] [CrossRef]
  14. Lavhale, S.G.; Kalunke, R.M.; Giri, A.P. Structural, functional and evolutionary diversity of 4-coumarate-CoA ligase in plants. Planta 2018, 248, 1063–1078. [Google Scholar] [CrossRef] [PubMed]
  15. Meng, L.; Zhou, R.; Liang, L.; Zang, X.; Lin, J.; Wang, Q.; Wang, L.; Wang, W.; Li, Z.; Ren, P. 4-Coumarate-CoA ligase (4-CL) enhances flavonoid accumulation, lignin synthesis, and fruiting body formation in Ganoderma lucidum. Gene 2024, 899, 148147. [Google Scholar] [CrossRef]
  16. Wei, C.; Wang, C.; Zhang, X.; Huang, W.; Xing, M.; Han, C.; Lei, C.; Zhang, Y.; Zhang, X.; Cheng, K.; et al. Histone deacetylase GhHDA5 negatively regulates Verticillium wilt resistance in cotton. Plant Physiol. 2024, 196, 2918–2935. [Google Scholar] [CrossRef]
  17. Ehlting, J.; Büttner, D.; Wang, Q.; Douglas, C.J.; Somssich, I.E.; Kombrink, E. Three 4-coumarate:coenzyme A ligases in Arabidopsis thaliana represent two evolutionarily divergent classes in angiosperms. Plant J. 1999, 19, 9–20. [Google Scholar] [CrossRef]
  18. Alariqi, M.; Ramadan, M.; Wang, Q.; Yang, Z.; Hui, X.; Nie, X.; Ahmed, A.; Chen, Q.; Wang, Y.; Zhu, L.; et al. Cotton 4-coumarate-CoA ligase 3 enhanced plant resistance to Verticillium dahliae by promoting jasmonic acid signaling-mediated vascular lignification and metabolic flux. Plant J. 2023, 115, 190–204. [Google Scholar] [CrossRef]
  19. Li, W.; Wang, K.; Chern, M.; Liu, Y.; Zhu, Z.; Liu, J.; Zhu, X.; Yin, J.; Ran, L.; Xiong, J.; et al. Sclerenchyma cell thickening through enhanced lignification induced by OsMYB30 prevents fungal penetration of rice leaves. New Phytol. 2020, 226, 1850–1863. [Google Scholar] [CrossRef]
  20. Wang, Y.; Guo, L.; Zhao, Y.; Zhao, X.; Yuan, Z. Systematic Analysis and Expression Profiles of the 4-Coumarate: CoA Ligase (4CL) Gene Family in Pomegranate (Punica granatum L.). Int. J. Mol. Sci. 2022, 23, 3509. [Google Scholar] [CrossRef] [PubMed]
  21. Ma, Z.H.; Nan, X.T.; Li, W.F.; Mao, J.; Chen, B.H. Comprehensive genomic identification and expression analysis 4CL gene family in apple. Gene 2023, 858, 147197. [Google Scholar] [CrossRef]
  22. Ran, F.; Xiang, C.; Wang, C.; Zang, Y.; Liu, L.; Wu, S.; Wang, C.; Cai, J.; Wang, D.; Min, Y. Identification of the 4CL family in cassava (Manihot esculenta Crantz) and expression pattern analysis of the Me4CL32 gene. Biochem. Biophys. Res. Commun. 2024, 735, 150731. [Google Scholar] [CrossRef]
  23. Zhang, C.; Zang, Y.; Liu, P.; Zheng, Z.; Ouyang, J. Characterization, functional analysis and application of 4-Coumarate: CoA ligase genes from Populus trichocarpa. J. Biotechnol. 2019, 302, 92–100. [Google Scholar] [CrossRef]
  24. Li, Y.; Kim, J.I.; Pysh, L.; Chapple, C. Four Isoforms of Arabidopsis 4-Coumarate:CoA Ligase Have Overlapping yet Distinct Roles in Phenylpropanoid Metabolism. Plant Physiol. 2015, 169, 2409–2421. [Google Scholar]
  25. Gong, M.Q.; Lai, F.F.; Chen, J.Z.; Li, X.H.; Chen, Y.J.; He, Y. Traditional uses, phytochemistry, pharmacology, applications, and quality control of Gastrodia elata Blume: A comprehensive review. J. Ethnopharmacol. 2024, 319 Pt 1, 117128. [Google Scholar] [CrossRef]
  26. Sun, H.; Li, Y.; Feng, S.; Zou, W.; Guo, K.; Fan, C.; Si, S.; Peng, L. Analysis of five rice 4-coumarate:coenzyme A ligase enzyme activity and stress response for potential roles in lignin and flavonoid biosynthesis in rice. Biochem. Biophys. Res. Commun. 2013, 430, 1151–1156. [Google Scholar] [CrossRef]
  27. Sun, S.C.; Xiong, X.P.; Zhang, X.L.; Feng, H.J.; Zhu, Q.H.; Sun, J.; Li, Y.J. Characterization of the Gh4CL gene family reveals a role of Gh4CL7 in drought tolerance. BMC Plant Biol. 2020, 20, 125. [Google Scholar] [CrossRef]
  28. Feng, X.; Wang, Y.; Zhang, N.; Gao, S.; Wu, J.; Liu, R.; Huang, Y.; Zhang, J.; Qi, Y. Comparative phylogenetic analysis of CBL reveals the gene family evolution and functional divergence in Saccharum spontaneum. BMC Plant Biol. 2021, 21, 395. [Google Scholar] [CrossRef]
  29. Nie, T.; Sun, X.; Wang, S.; Wang, D.; Ren, Y.; Chen, Q. Genome-Wide Identification and Expression Analysis of the 4-Coumarate: CoA Ligase Gene Family in Solanum tuberosum. Int. J. Mol. Sci. 2023, 24, 1642. [Google Scholar] [CrossRef]
  30. Atanasov, A.G.; Waltenberger, B.; Pferschy-Wenzig, E.M.; Linder, T.; Wawrosch, C.; Uhrin, P.; Temml, V.; Wang, L.; Schwaiger, S.; Heiss, E.H.; et al. Discovery and resupply of pharmacologically active plant-derived natural products: A review. Biotechnol. Adv. 2015, 33, 1582–1614. [Google Scholar] [CrossRef]
  31. Baker, D.D.; Chu, M.; Oza, U.; Rajgarhia, V. The value of natural products to future pharmaceutical discovery. Nat. Prod. Rep. 2007, 24, 1225–1244. [Google Scholar] [CrossRef]
  32. Gill, U.S.; Uppalapati, S.R.; Gallego-Giraldo, L.; Ishiga, Y.; Dixon, R.A.; Mysore, K.S. Metabolic flux towards the (iso)flavonoid pathway in lignin modified alfalfa lines induces resistance against Fusarium oxysporum f. sp. medicaginis. Plant Cell Environ. 2018, 41, 1997–2007. [Google Scholar]
  33. Shinde, B.A.; Dholakia, B.B.; Hussain, K.; Panda, S.; Meir, S.; Rogachev, I.; Aharoni, A.; Giri, A.P.; Kamble, A.C. Dynamic metabolic reprogramming of steroidal glycol-alkaloid and phenylpropanoid biosynthesis may impart early blight resistance in wild tomato (Solanum arcanum Peralta). Plant Mol. Biol. 2017, 95, 411–423. [Google Scholar] [CrossRef]
  34. Wang, S. Cloning of Phenylpropanoid Metabolism and Key Gene Cl4CL in the Resistance Response of Watermelon to Fusarium Wilt. Master’s Thesis, Hebei Agricultural University, Baoding, China, 2021. [Google Scholar]
  35. Zhang, M.; Tian, M.; Weng, Z.; Yang, Y.; Pan, N.; Shen, S.; Zhao, H.; Du, H.; Qu, C.; Yin, N. Genome-Wide Identification Analysis of the 4-Coumarate: Coa Ligase (4CL) Gene Family in Brassica U’s Triangle Species and Its Potential Role in the Accumulation of Flavonoids in Brassica napus L. Plants 2025, 14, 211. Plants 2025, 14, 211. [Google Scholar]
  36. Chen, X.; Wang, H.; Li, X.; Ma, K.; Zhan, Y.; Zeng, F. Molecular cloning and functional analysis of 4-Coumarate:CoA ligase 4(4CL-like 1) from Fraxinus mandshurica and its role in abiotic stress tolerance and cell wall synthesis. BMC Plant Biol. 2019, 19, 231. [Google Scholar] [CrossRef]
  37. Zhou, X.; Cao, J.; Liu, X.M.; Wang, L.N.; Zhang, W.W.; Ye, J.B.; Xu, F.; Cheng, S. Cloning and functional analysis of Gb4CL1 and Gb4CL2 from Ginkgo biloba. Plant Genome 2024, 17, e20440. [Google Scholar] [CrossRef]
  38. Chtioui, W.; Balmas, V.; Delogu, G.; Migheli, Q.; Oufensou, S. Bioprospecting Phenols as Inhibitors of Trichothecene-Producing Fusarium: Sustainable Approaches to the Management of Wheat Pathogens. Toxins 2022, 14, 72. [Google Scholar] [CrossRef]
  39. Irais, C.M.; María-de-la-Luz, S.G.; Dealmy, D.G.; Agustina, R.M.; Nidia, C.H.; Mario-Alberto, R.G.; Luis-Benjamín, S.G.; María-Del-Carmen, V.M.; David, P.E. Plant Phenolics as Pathogen-Carrier Immunogenicity Modulator Haptens. Curr. Pharm. Biotechnol. 2020, 21, 897–905. [Google Scholar] [CrossRef]
  40. Angellotti, M.C.; Bhuiyan, S.B.; Chen, G.; Wan, X.F. CodonO: Codon usage bias analysis within and across genomes. Nucleic Acids Res. 2007, 35 (Suppl. S2), W132–W136. [Google Scholar] [CrossRef]
  41. Gao, Y.; Lu, Y.; Song, Y.; Jing, L. Analysis of codon usage bias of WRKY transcription factors in Helianthus annuus. BMC Genom. Data 2022, 23, 46. [Google Scholar] [CrossRef]
  42. Cao, Y.; Han, Y.; Li, D.; Lin, Y.; Cai, Y. Systematic analysis of the 4-Coumarate: Coenzyme a ligase (4CL) related genes and expression profiling during fruit development in the Chinese pear. Genes 2016, 7, 89. [Google Scholar] [CrossRef]
  43. Kaur, A.; Sharma, K.; Pawar, S.V.; Sembi, J.K. Genome-wide characterization of PAL, C4H, and 4CL genes regulating the phenylpropanoid pathway in Vanilla planifolia. Sci. Rep. 2025, 15, 10714. [Google Scholar] [CrossRef] [PubMed]
  44. Zhou, L.; Chen, T.; Qiu, X.; Liu, J.; Guo, S. Evolutionary differences in gene loss and pseudogenization among mycoheterotrophic orchids in the tribe Vanilleae (subfamily Vanilloideae). Front. Plant Sci. 2023, 14, 1160446. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Physical locations of Ge4CLs in G. elata: Chromosome numbers are marked on the left side of the chromosomes in green font, and chromosome lengths are indicated in megabases (Mb). The red colours on the chromosomes represent the highest density, and blue indicates the lowest density.
Figure 2. Collinearity analysis of Ge4CLs in G. elata: (a) Intra-species collinearity analysis of Ge4CLs. (b) Collinearity analysis of 4CLs between G. elata and five other representative plants. The grey lines represent collinear blocks, and the red lines represent collinear 4CL gene pairs.
Figure 3. Maximum-likelihood (ML) phylogenetic tree of 4-coumarate–CoA ligase (4CL) proteins from G. elata (Ge) and 43 reference plant species. The tree was reconstructed in TBtools v1.138 using the JTT + G4 model and 5000 bootstrap replicates; bootstrap support values (%) are shown at the nodes. Subfamilies I–III are colour-shaded for clarity. Protein IDs of reference sequences are given in parentheses (e.g., At4CL1 = A. thaliana At1g51680; Os4CL1 = O. sativa AK069932.1; full list provided in Materials and Methods). Scale bar indicates the number of substitutions per site.
Figure 4. Conserved motifs and gene structure analysis: (a) Conserved motifs of Ge4CLs. (b) Gene structures of Ge4CLs. (c) Motif identifications of Ge4CLs.
Figure 5. Distribution of different types of cis-acting elements in Ge4CLs: The numbers in the heatmap represent the quantities of different types of cis-acting elements in the promoter regions.
Figure 6. Analysis of codon usage bias of the Ge4CLs: (a) ENC plot analysis. GC3s: GC content at the third position of the codon; (b) PR2 plot analysis. A3s: content of base A at the third position of the codon; T3s: content of base T at the third position of the codon; G3s: content of base G at the third position of the codon; C3s: content of base C at the third position of the codon. In the figure, x = 0.5 and y = 0.5 are used as reference lines for analysing codon preference. (c) Neutral plot analysis. GC3: GC content at the third position of the codon; GC12: average GC content at the first and second positions of the codon.
Figure 7. Expression profiles of Ge4CL genes based on public RNA-Seq data: (a) Tissue specificity across four developmental tissues: 1 = mature tuber (MT), 2 = juvenile tuber (JT), 3 = mother tuber of G. elata (MT-Ge), 4 = mother tuber of juvenile (MT-JT). (b) Comparison between fungal-diseased (FD) and healthy (HT) mature tubers of G. elata f. glauca. Heatmaps were generated from log2-transformed TPM values; colour scale ranges from red (high expression) to blue (low expression). Gene IDs are listed on the left.
Figure 8. Subcellular localisation analysis of Ge4CL2 and Ge4CL5 proteins: Bright Light: bright-field channel, showing cell morphology. NLS-RFP: red fluorescence channel; the NLS-RFP marker localises to the nucleus. GFP: green fluorescence channel, showing the localisation of the GFP fusion protein. Merged: overlay of the three channels. Scale bar: 25 µm.
Table 1. Physicochemical properties of the Ge4CLs.

| Gene Name | Gene ID | Number of Amino Acids | Molecular Weight (kDa) | pI | Instability Index | GRAVY | Subcellular Localisation |
|---|---|---|---|---|---|---|---|
| Ge4CL1 | GelC01G00992.1 | 547 | 58.15 | 8.82 | 49.47 | 0.163 | Plasma membrane |
| Ge4CL2 | GelC02G00823.1 | 670 | 74.27 | 7.15 | 33.04 | −0.064 | Cytoplasm |
| Ge4CL3 | GelC03G01279.2 | 553 | 60.16 | 8.51 | 41.85 | 0.095 | Plasma membrane |
| Ge4CL4 | GelC03G01284.1 | 560 | 60.16 | 6.29 | 35.89 | 0.179 | Plasma membrane |
| Ge4CL5 | GelC04G00359.1 | 551 | 60.10 | 5.42 | 43.37 | 0.06 | Chloroplast |
| Ge4CL6 | GelC04G01345.1 | 542 | 58.78 | 6.19 | 40.94 | 0.1 | Plasma membrane |
| Ge4CL7 | GelC05G00275.1 | 544 | 59.27 | 5.59 | 38.71 | 0.077 | Plasma membrane |
| Ge4CL8 | GelC06G00150.1 | 654 | 73.00 | 6.23 | 27.81 | −0.081 | Cytoplasm |
| Ge4CL9 | GelC06G00789.1 | 533 | 57.28 | 6.47 | 44.49 | 0.166 | Plasma membrane |
| Ge4CL10 | GelC10G00141.1 | 522 | 55.70 | 5.88 | 42.5 | 0.022 | Endoplasmic reticulum |
| Ge4CL11 | GelC12G00053.2 | 697 | 77.41 | 6.53 | 34.69 | −0.113 | Cytoplasm |
| Ge4CL12 | GelC16G00539.1 | 664 | 75.08 | 6.49 | 35.78 | −0.191 | Cytoplasm |
| Ge4CL13 | GelC16G00628.1 | 529 | 57.48 | 6.34 | 36.82 | 0.043 | Chloroplast |
| Ge4CL14 | GelC16G00735.1 | 697 | 76.24 | 6.47 | 36.1 | 0.005 | Plasma membrane |
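Table 2. Base composition of codons in the Ge4CL gene family.

| Gene Name | T3s | C3s | A3s | G3s | GC3s | GC | GC1 | GC2 | GC3 | CAI | CBI | Fop | ENC | L_sym | Aromo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ge4CL1 | 0.15 | 0.53 | 0.17 | 0.33 | 0.73 | 0.61 | 0.61 | 0.49 | 0.73 | 0.21 | 0.14 | 0.49 | 45.27 | 531 | 0.07 |
| Ge4CL2 | 0.43 | 0.24 | 0.32 | 0.27 | 0.42 | 0.43 | 0.51 | 0.37 | 0.42 | 0.20 | −0.07 | 0.38 | 56.44 | 646 | 0.10 |
| Ge4CL3 | 0.32 | 0.35 | 0.24 | 0.29 | 0.55 | 0.51 | 0.54 | 0.44 | 0.55 | 0.21 | 0.03 | 0.43 | 54.02 | 533 | 0.09 |
| Ge4CL4 | 0.22 | 0.41 | 0.17 | 0.41 | 0.68 | 0.58 | 0.61 | 0.44 | 0.68 | 0.22 | 0.07 | 0.45 | 53.90 | 546 | 0.07 |
| Ge4CL5 | 0.17 | 0.53 | 0.13 | 0.39 | 0.75 | 0.58 | 0.56 | 0.41 | 0.75 | 0.22 | 0.10 | 0.47 | 43.68 | 530 | 0.06 |
| Ge4CL6 | 0.30 | 0.38 | 0.23 | 0.29 | 0.56 | 0.51 | 0.54 | 0.43 | 0.56 | 0.20 | 0.02 | 0.42 | 56.44 | 531 | 0.08 |
| Ge4CL7 | 0.21 | 0.50 | 0.12 | 0.39 | 0.73 | 0.59 | 0.61 | 0.41 | 0.73 | 0.20 | 0.06 | 0.43 | 45.60 | 524 | 0.07 |
| Ge4CL8 | 0.45 | 0.21 | 0.33 | 0.28 | 0.40 | 0.43 | 0.52 | 0.37 | 0.40 | 0.20 | −0.06 | 0.38 | 55.16 | 634 | 0.11 |
| Ge4CL9 | 0.22 | 0.40 | 0.22 | 0.35 | 0.63 | 0.56 | 0.58 | 0.47 | 0.63 | 0.21 | 0.13 | 0.48 | 53.56 | 518 | 0.06 |
| Ge4CL10 | 0.16 | 0.48 | 0.15 | 0.41 | 0.74 | 0.61 | 0.61 | 0.48 | 0.74 | 0.23 | 0.13 | 0.49 | 45.28 | 513 | 0.07 |
| Ge4CL11 | 0.47 | 0.18 | 0.35 | 0.25 | 0.35 | 0.42 | 0.48 | 0.42 | 0.35 | 0.20 | −0.13 | 0.35 | 51.35 | 677 | 0.10 |
| Ge4CL12 | 0.43 | 0.19 | 0.42 | 0.24 | 0.35 | 0.41 | 0.50 | 0.38 | 0.35 | 0.18 | −0.12 | 0.35 | 49.18 | 639 | 0.11 |
| Ge4CL13 | 0.09 | 0.56 | 0.11 | 0.42 | 0.83 | 0.66 | 0.65 | 0.49 | 0.83 | 0.20 | 0.10 | 0.46 | 41.15 | 506 | 0.07 |
| Ge4CL14 | 0.41 | 0.25 | 0.30 | 0.28 | 0.43 | 0.46 | 0.53 | 0.42 | 0.43 | 0.21 | −0.02 | 0.41 | 55.96 | 675 | 0.09 |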
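Table 3. Comparison of codon biases between the Ge4CLs and representative species.

| Codon | AA | Ge/Ec | Ge/Nt | Ge/At | Ge/Sc | Ge/Os | Codon | AA | Ge/Ec | Ge/Nt | Ge/At | Ge/Sc | Ge/Os |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| UUU | Phe | 0.86 | 0.84 | 0.97 | 0.81 | 1.61 | UAU | Tyr | 0.6 | 0.73 | 0.89 | 0.69 | 1.3 |
| UUC | | 1.51 | 1.17 | 1.01 | 1.14 | 0.94 | UAC | | 1.58 | 1.37 | 1.35 | 1.25 | 1.23 |
| UUA | Leu | 0.47 | 0.6 | 0.64 | 0.31 | 1.33 | UAA | TER | 0.1 | 0.18 | 0.22 | 0.18 | 0.29 |
| UUG | | 1.21 | 0.7 | 0.75 | 0.57 | 1.06 | UAG | | 2 | 1.2 | 1.2 | 1.2 | 0.75 |
| CUU | | 1.21 | 0.73 | 0.73 | 1.42 | 1.15 | CAU | His | 0.85 | 0.78 | 0.76 | 0.77 | 0.93 |
| CUC | | 3.07 | 2.37 | 1.81 | 5.4 | 1.13 | CAC | | 1.34 | 1.13 | 1.13 | 1.26 | 0.71 |
| CUA | | 1.34 | 0.8 | 0.76 | 0.56 | 0.97 | CAA | Gln | 0.68 | 0.47 | 0.99 | 0.36 | 0.73 |
| CUG | | 0.57 | 2.09 | 2.17 | 2.03 | 1.01 | CAG | | 0.56 | 1 | 0.99 | 1.24 | 0.72 |
| AUU | Ile | 0.85 | 0.91 | 1.18 | 0.84 | 1.78 | AAU | Asn | 0.58 | 0.6 | 0.76 | 0.47 | 1.12 |
| AUC | | 1.3 | 1.82 | 1.37 | 1.47 | 1.3 | AAC | | 0.67 | 0.77 | 0.66 | 0.55 | 1.74 |
| AUA | | 1 | 0.95 | 1.06 | 0.75 | 1.51 | AAA | Lys | 0.56 | 0.63 | 0.67 | 0.49 | 1.29 |
| AUG | Met | 0.92 | 0.87 | 0.89 | 1.04 | 0.91 | AAG | | 1.99 | 0.91 | 0.93 | 0.99 | 0.94 |
| GUU | Val | 1.06 | 0.86 | 0.85 | 1.04 | 1 | GAU | Asp | 0.82 | 0.75 | 0.75 | 0.73 | 1.09 |
| GUC | | 1.79 | 2.1 | 1.83 | 1.98 | 1.16 | GAC | | 1.14 | 1.21 | 1.19 | 1.01 | 0.73 |
| GUA | | 0.71 | 0.82 | 0.94 | 0.79 | 1.37 | GAA | Glu | 0.77 | 0.75 | 0.79 | 0.59 | 1.25 |
| GUG | | 1.16 | 1.38 | 1.33 | 2.14 | 0.95 | GAG | | 1.67 | 0.8 | 1.01 | 1.69 | 0.84 |
| UCU | Ser | 1.20 | 0.79 | 0.62 | 0.67 | 1.24 | UGU | Cys | 1 | 0.6 | 1.56 | 0.73 | 0.95 |
| UCC | | 2.11 | 2.01 | 1.83 | 1.44 | 1.26 | UGC | | 2.29 | 1.75 | 1.75 | 2.63 | 1.02 |
| UCA | | 0.89 | 0.66 | 0.64 | 0.63 | 0.94 | UGA | TER | 0.73 | 0.8 | 0.67 | 1.42 | 0.67 |
| UCG | | 1.5 | 1.32 | 1.32 | 1.43 | 1 | UGG | Trp | 0.72 | 0.8 | 0.78 | 0.93 | 0.7 |
| CCU | Pro | 1.85 | 0.94 | 0.94 | 1.3 | 1.29 | CGU | Arg | 0.32 | 0.68 | 0.57 | 0.8 | 0.71 |
| CCC | | 2.43 | 2.29 | 2.85 | 2.22 | 1.25 | CGC | | 0.74 | 2.64 | 2.71 | 3.96 | 0.64 |
| CCA | | 1.16 | 0.54 | 0.66 | 0.58 | 0.75 | CGA | | 0.96 | 0.87 | 0.73 | 1.53 | 0.72 |
| CCG | | 1.03 | 3 | 1.74 | 2.83 | 0.83 | CGG | | 1.05 | 2.24 | 1.69 | 4.88 | 0.62 |
| ACU | Thr | 0.89 | 0.57 | 0.66 | 0.57 | 1.09 | AGU | Ser | 0.39 | 0.39 | 0.37 | 0.37 | 0.59 |
| ACC | | 0.77 | 1.5 | 1.41 | 1.14 | 0.97 | AGC | | 0.92 | 1.32 | 1.17 | 1.35 | 0.83 |
| ACA | | 0.8 | 0.7 | 0.77 | 0.68 | 1.04 | AGA | Arg | 1.39 | 0.62 | 0.52 | 0.46 | 0.94 |
| ACG | | 1.99 | 2.29 | 1.34 | 1.29 | 0.9 | AGG | | 0.93 | 0.96 | 1.06 | 1.27 | 0.73 |
| GCU | Ala | 1.08 | 0.65 | 0.72 | 0.96 | 1.04 | GGU | Gly | 0.5 | 0.53 | 0.53 | 0.49 | 0.8 |
| GCC | | 1.32 | 2.29 | 2.78 | 0.27 | 0.93 | GGC | | 1.23 | 2.26 | 2.75 | 2.58 | 0.86 |
| GCA | | 0.78 | 1.78 | 1.03 | 1.11 | 1.04 | GGA | | 1.63 | 0.96 | 0.92 | 2.04 | 1.4 |
| GCG | | 0.85 | 3.09 | 1.99 | 2.89 | 0.67 | GGG | | 1.56 | 2.78 | 1.88 | 3.2 | 1.12 |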
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Sha, S.; Mu, K.; Luo, Q.; Yao, S.; Tang, T.; Sun, W.; Ju, Z.; Pang, Y. Genome-Wide Identification and Characterisation of the 4-Coumarate–CoA Ligase (4CL) Gene Family in Gastrodia elata and Their Transcriptional Response to Fungal Infection. Int. J. Mol. Sci. 2025, 26, 7610. https://doi.org/10.3390/ijms26157610
