Design, Validation, and Application of Transcriptome-Based InDel Markers in Phalaenopsis-Type Dendrobium Varieties

Yu, Xiaoyun; Yao, Tongyan; Luo, Xiaoyan; Yi, Shuangshuang; Liao, Yi; Lu, Shunjiao

doi:10.3390/horticulturae11121459

by Xiaoyun Yu^1,2,3,4, Tongyan Yao^1,2,3,4, Xiaoyan Luo^1,2,3,4, Shuangshuang Yi^1,2,3,4, Yi Liao^1,2,3,4 and Shunjiao Lu^1,2,3,4,*

¹ Tropical Crops Genetic Resources Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China

² Key Laboratory of Crop Gene Resources and Germplasm Enhancement in Southern China, Ministry of Agriculture, Haikou 571101, China

³ Key Laboratory of Tropical Crops Germplasm Resources Genetic Improvement and Innovation of Hainan Province, The Engineering Technology Research Center of Tropical Ornamental Plant Germplasm Innovation and Utilization, Haikou 571101, China

⁴ National Key Laboratory for Tropical Crop Breeding, Sanya Research Institute, Chinese Academy of Tropical Agricultural Sciences, Sanya 572024, China

* Author to whom correspondence should be addressed.

Horticulturae 2025, 11(12), 1459; https://doi.org/10.3390/horticulturae11121459

Submission received: 21 October 2025 / Revised: 27 November 2025 / Accepted: 28 November 2025 / Published: 3 December 2025

(This article belongs to the Section Genetics, Genomics, Breeding, and Biotechnology (G2B2))


Abstract

The genetic improvement of Phalaenopsis-type Dendrobium, a valuable ornamental and medicinal orchid, is hindered by the lack of a complete reference genome. In this study, a transcriptome-based approach was employed to develop and validate insertion–deletion (InDel) markers for genetic analysis and variety identification. RNA-seq was performed on two distinct varieties, resulting in the de novo assembly of 156,108 unigenes. A bioinformatics pipeline was developed to identify 5083 high-quality InDel loci, from which 1029 potential markers were designed. Fifty primer pairs were selected and validated experimentally, with 84% successfully amplifying clear products, and 76% showing polymorphism. The polymorphism information content (PIC) of the markers ranged from 0.25 to 0.78, indicating their high potential for use in genetic diversity studies. These markers were used to classify 24 Phalaenopsis-type Dendrobium varieties into distinct genetic clusters. This work provides a scalable and robust platform for molecular breeding, DNA fingerprinting, and germplasm management in non-model species that lack a reference genome. By leveraging transcriptome data, these markers will contribute to the efficient genetic improvement of Dendrobium and other similar crops.

Keywords: Phalaenopsis-type Dendrobium; InDel markers; RNA-seq; genetic diversity

1. Introduction

The genus Dendrobium (Orchidaceae) encompasses a diverse group of orchids with significant ornamental and medicinal value. Phalaenopsis-type Dendrobium refers to a set of Dendrobium hybrids and varieties notable for their large, showy flowers that resemble those of the genus Phalaenopsis. These orchids are extensively cultivated in China and other parts of Asia for their esthetic appeal and pharmacological properties. Although the genomes of several orchid species, such as Cymbidium goeringii, Dendrobium officinale, Dendrobium chrysotoxum, and Dendrobium nobile [1,2,3,4], have been sequenced, Phalaenopsis-type Dendrobium still lacks a publicly available, comprehensive reference genome, which limits molecular marker design, genetic diversity analysis, and variety identification. The substantial genomic differences among Dendrobium species make existing reference genomes unsuitable for detailed analysis of Phalaenopsis-type varieties, hindering gene mapping, marker development, and DNA fingerprinting, all of which are crucial for molecular breeding and accurate variety identification.

Amplified fragment length polymorphism (AFLP) markers were used in early genetic diversity studies of Dendrobium to assess genetic relationships and variation [5,6]. Molecular markers such as simple sequence repeats (SSRs) and single-nucleotide polymorphisms (SNPs) are indispensable tools in plant genetics and breeding [7]. Insertion–deletions (InDels) represent another class of markers that offer unique advantages: they are co-dominant in inheritance, abundant across genomes, and can be easily detected through PCR and agarose gel electrophoresis [8]. SSR markers designed from transcriptomes have also been applied in Dendrobium [9,10]. Both SSR and InDel markers are based on PCR amplification, but unlike SSR markers, InDel markers do not require polyacrylamide gel electrophoresis, which involves toxic reagents, and they also shorten electrophoresis time. These features make InDel markers particularly advantageous in species lacking reference genomes, because they allow high-resolution genetic mapping and genotyping without the need for advanced sequencing platforms. AFLP, while also a powerful marker system, requires complex and labor-intensive procedures, including restriction enzyme digestion, and generates large datasets that are difficult to interpret. In contrast, InDel markers offer a simpler approach to genotyping, providing a more efficient alternative for genetic studies in non-model species such as Phalaenopsis-type Dendrobium. InDel markers have been successfully used in other plant species for genetic linkage analysis and diversity studies [8,11,12], but their development in Phalaenopsis-type Dendrobium has been limited by the absence of genomic data.

The diversity analysis of Orchidaceae plants based on different markers has demonstrated that the family exhibits a complex genetic background and high heterozygosity [13,14,15,16]. This indirectly suggests a high potential for conducting InDel marker analysis in Phalaenopsis-type Dendrobium. Given these challenges and opportunities, the aim of this study was to develop and validate a robust set of InDel markers for Phalaenopsis-type Dendrobium using transcriptome sequencing data. Due to the absence of a reference genome for Phalaenopsis-type Dendrobium and the substantial genomic differences between species within the Dendrobium genus, a de novo transcriptome assembly approach was employed. In contrast to reference-based assembly, which relies on a pre-existing genome for mapping reads, de novo assembly builds the transcriptome directly from sequencing data, making it particularly suitable for species lacking a reference genome. This approach has been successfully applied in similar studies, such as the development of de novo transcriptome assemblies and SSR markers in Brassica species [17] and the comparison of de novo vs. reference genome assembly for drought stress analysis in Brassica genotypes [18]. By leveraging the expressed portion of the genome (the transcriptome) from two varieties with distinct phenotypic traits, a bioinformatics pipeline and custom scripts were established to identify polymorphic InDel loci and to design primers flanking those loci. This workflow included transcriptome assembly, variant discovery, InDel filtering, primer design, and in silico specificity screening, with all steps automated through our scripts. The efficacy of these markers was then tested across a diverse panel of 24 Phalaenopsis-type Dendrobium varieties to assess their amplification success and polymorphism. 
The successful development and validation of these transcriptome-based InDel markers not only provides valuable tools for genetic diversity assessment, variety identification, and breeding programs in Phalaenopsis-type Dendrobium, but also demonstrates the feasibility of applying a reproducible transcriptome-derived marker development pipeline in non-model, highly heterozygous plant species lacking a complete reference genome.

2. Materials and Methods

2.1. Plant Materials

A panel of 24 hybrid Phalaenopsis-type Dendrobium varieties was collected to validate the developed markers. To ensure the broad applicability and high discriminatory power of the markers, these varieties were specifically selected based on their diverse pedigrees, representing parents from different genetic origins. These varieties include both local varieties and commercial cultivars primarily sourced from Southeast Asia and southern China, covering a wide range of phenotypic traits. The varieties included: Nopporn Green Star, Red Bull, Burana Emerald, Pop Eye, Burana White, Siri Gem, Burana Charming, Dendrobium Sripratum Delight, Swirl, Suriya Gold, Smile, Thongchai Gold, Enobi Purple, Leong Yok Kin, Asian Pearl, Caesar, Serene Chang ‘red’, Little Girl, Sonia Hiasakul, Pensuda, Rasa Saying, Burana Gold, Burbank Queen × Dal’s Jazz, Alice’s Stewart. Young leaves were collected from each variety for DNA extraction. Among these, two varieties—Sonia Hiasakul and Dendrobium Sripratum Delight—exhibit distinct phenotypic differences and were selected for transcriptome sequencing to discover polymorphic loci. The characteristics of Sonia Hiasakul include rhombic tepals with a flower color in the red-purple N79B shade, and the plant exhibits a spreading leaf habit. In contrast, the features of Dendrobium Sripratum Delight are tepals that are more ovoid in shape, with a flower color in the deep purple-red N80A shade, and the plant has a compact and erect leaf habit.

2.2. Transcriptome Data Acquisition and Analysis

To extract InDel variations and design molecular markers, Sonia Hiasakul and Dendrobium Sripratum Delight were selected as the varieties for transcriptome sequencing. Three biological replicates were selected for each parent, with the three most mature leaves from the top of each individual plant’s uppermost leaf cluster used as experimental replicates. Total RNA was extracted from young leaf tissue of the two selected varieties using the RNAprep Pure Plant Plus Kit (TIANGEN, Beijing, China), followed by RNase-free DNase I treatment (Takara Bio, Dalian, China) to remove contaminating genomic DNA. RNA quality and integrity were assessed by 1% agarose gel electrophoresis and quantified using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). High-quality RNA samples were subjected to polyA selection to enrich for mRNA, followed by the construction of complementary DNA (cDNA) libraries with the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (New England Biolabs, Ipswich, MA, USA) according to the manufacturer’s instructions. The libraries were sequenced on the Illumina HiSeq 2500 platform (Illumina, San Diego, CA, USA), generating 150 bp paired-end reads for high coverage transcriptome data.

To assess the quality of the transcriptome data, several key metrics were evaluated. Raw and clean reads were assessed using FastQC (https://github.com/s-andrews/FastQC) (accessed on 1 March 2025), and sequencing depth was quantified by reporting the total number of raw and clean bases, as well as the Q20 and Q30 scores. Adapter sequences and low-quality bases were trimmed using Trimmomatic [19] with the following parameters: ILLUMINACLIP (2:30:10 for adapter removal), LEADING and TRAILING (removal of leading and trailing bases with quality scores below 3), SLIDINGWINDOW (window size 4, required average quality 15), and MINLEN (36 bp). The transcriptome was assembled using Trinity [20] with default parameters, and the resulting contigs (transcript sequences) were clustered to produce a set of unigenes (unique representative transcripts per gene locus). To reduce redundancy, the longest transcript for each gene was extracted as the representative unigene sequence (output as a unigene.fasta file). Assembly completeness was assessed using BUSCO v6.0.0 [21] with the embryophyta_odb10 lineage dataset, which reports the percentage of conserved single-copy orthologs that are complete, fragmented, or missing in the assembly. The full Trinity assembly, which includes all isoforms, was also compared with the Unigene set, which retains only the longest transcript for each gene, to evaluate the impact of redundancy on the assembly and its downstream analysis.
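The longest-transcript-per-gene step can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes Trinity-style sequence identifiers of the form TRINITY_DN1_c0_g1_i2, in which the trailing _i<N> suffix marks the isoform and everything before it identifies the gene. The function names are hypothetical.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_id(header):
    """Strip the isoform suffix (assumed Trinity naming: ..._g1_i2 -> ..._g1)."""
    name = header.split()[0]
    return re.sub(r"_i\d+$", "", name)


def longest_per_gene(records):
    """Return {gene_id: (header, sequence)} keeping the longest isoform per gene."""
    best = {}
    for header, seq in records:
        gid = gene_id(header)
        if gid not in best or len(seq) > len(best[gid][1]):
            best[gid] = (header, seq)
    return best
```

In practice the records would come from the Trinity output file, for example `longest_per_gene(read_fasta(open("Trinity.fasta")))`, where the file name is illustrative.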

2.3. InDel Detection and Primer Design

For the identification of insertion–deletion polymorphisms, the clean RNA-seq reads from the two varieties were aligned back to the assembled unigene set using a short-read aligner (BWA), treating the assembled unigenes as a pseudo-reference. The alignment files were processed with SAMtools [22] and Picard tools to sort by coordinate, remove duplicate reads, and prepare the data for variant calling. InDel variant calling was conducted using the Genome Analysis Toolkit (GATK v3) [23] in UnifiedGenotyper or HaplotypeCaller mode, with parameters tuned for discovering small insertions and deletions in a pooled sample of two varieties.

Given the highly heterozygous nature of Phalaenopsis-type Dendrobium, each genomic locus can present multiple alleles. To focus on clear bi-allelic polymorphisms and simplify downstream analysis, the raw InDel calls were filtered with the following criteria: (1) retain only loci with up to two alternate alleles (any site with more than two allelic variants across the two varieties was excluded), and (2) require an indel length difference between reference and alternate allele of 15–29 bp. InDel loci not meeting these criteria were discarded. After filtering, the coordinates of the remaining InDels (relative to the unigene sequences) were used to extract the corresponding sequences from the unigene.fasta file for primer design. This extraction was automated with a custom Python (version 3.12.4) script (step1.extract_indel.py).
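The two filtering criteria can be sketched in Python as follows. This is an illustration rather than the authors' step1.extract_indel.py. It assumes tab-separated VCF input with the standard column order (CHROM, POS, ID, REF, ALT, ...), and it assumes that every alternate allele at a site must fall in the 15–29 bp range, which the text does not state explicitly. The function names are hypothetical.

```python
def indel_length_diff(ref, alt):
    """Absolute length difference between the reference and an alternate allele."""
    return abs(len(ref) - len(alt))


def passes_filter(ref, alts, min_len=15, max_len=29, max_alts=2):
    """Keep sites with at most `max_alts` alternate alleles, each 15-29 bp different.

    Assumption: all alternate alleles must be in range (not just one of them).
    """
    if not alts or len(alts) > max_alts:
        return False
    return all(min_len <= indel_length_diff(ref, a) <= max_len for a in alts)


def filter_vcf_lines(lines):
    """Return (chrom, pos, ref, alts) tuples for VCF records that pass the filter."""
    kept = []
    for line in lines:
        if line.startswith("#"):  # skip VCF header lines
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        alts = fields[4].split(",")
        if passes_filter(ref, alts):
            kept.append((chrom, pos, ref, alts))
    return kept
```

The retained coordinates are positions on the unigene sequences, so they can be used directly to pull flanking sequence from the unigene FASTA file.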

For each high-confidence InDel locus, polymerase chain reaction (PCR) primers flanking the indel region were designed. To ensure the primers would amplify a product spanning the indel, a 300 bp sequence window centered on each InDel (approximately 150 bp upstream and 150 bp downstream of the indel site) was extracted using another script (step2.150bp.py). This sequence window served as the template for primer design. Primer3 [24] was employed to design up to five primer pairs for each InDel locus, targeting an amplicon size of 260–280 bp that includes the indel. The primer design criteria were a primer length of 20–24 nucleotides, a melting temperature (Tm) of 56–60 °C (optimized around 58 °C), and a GC content of 40–60%. These parameters were chosen to ensure robust PCR performance and similar annealing temperatures across primers. This automated design process was implemented via a script (step3.primerdesign.py), generating a list of candidate primer pairs for each InDel.
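A minimal sketch of the window extraction and of the length and GC checks is given below. It does not reproduce step2.150bp.py or call Primer3. It assumes a 1-based indel position on the unigene, truncates the window at the ends of short unigenes, and leaves the melting-temperature calculation to Primer3. All function names are hypothetical.

```python
def extract_window(seq, indel_pos, flank=150):
    """Return (window, offset) for a window of up to 2*flank bp around the indel.

    indel_pos is assumed to be 1-based; offset is the 0-based indel position
    inside the returned window. The window is truncated at sequence ends.
    """
    idx = indel_pos - 1
    start = max(0, idx - flank)
    end = min(len(seq), idx + flank)
    return seq[start:end], idx - start


def gc_percent(primer):
    """GC content of a primer as a percentage."""
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def primer_ok(primer, min_len=20, max_len=24, min_gc=40.0, max_gc=60.0):
    """Check the length (20-24 nt) and GC (40-60%) criteria stated in the text."""
    return min_len <= len(primer) <= max_len and min_gc <= gc_percent(primer) <= max_gc
```

Truncating at the sequence ends is one possible design choice. An alternative would be to discard loci closer than 150 bp to either end of a unigene, since such windows may leave too little room for a 260–280 bp amplicon.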

The primer sequences generated for all candidate InDel markers were compiled and converted into FASTA format (step4.extract-primer.py) to facilitate specificity screening. To ensure that each primer pair would amplify a unique target (the intended InDel locus) and not bind elsewhere in the genome, an in silico specificity check was performed using BLAST (version 2.16.0+). Since a complete reference genome for Phalaenopsis-type Dendrobium was unavailable, the draft genome sequence of Dendrobium nobile [25], a different ecological type within the Dendrobium genus, was used as a surrogate reference genome. A BLAST database was constructed from the Dendrobium nobile genome, and each primer sequence (forward and reverse) was searched against this database using NCBI BLAST+ (blastn). BLAST hits with E-value < 10⁻⁵ were examined. If a primer had multiple significant hits, or a second-best hit with an E-value comparable to that of the best hit, it was considered non-unique, and any primer pair in which either primer showed multiple binding sites in the genome was removed. In practice, the BLAST output for the five candidate primer pairs per locus was analyzed sequentially: if the top-ranked primer pair was not unique, the next primer pair for that locus was evaluated, and so on. If none of the five primer pairs for a given locus was uniquely mapped, that InDel locus was deemed unsuitable and dropped from further consideration. This filtering step (implemented with script step5.blast.py) ensured that the remaining primers likely amplify only the specific target locus. After this specificity screening, at most one primer pair per locus was retained (the first pair that passed the uniqueness criterion). Finally, the information about all retained primer pairs (InDel locus ID, primer sequences, expected amplicon size, etc.) was collated into a final primer list using a script (step6.final.py). 
All custom scripts used in the bioinformatic pipeline are available in our GitHub repository (version 1.0.0) (https://github.com/biodendrobium/InDel-Marker-Design-Using-Transcriptome-Data) (accessed on 1 October 2025).
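The sequential selection of one primer pair per locus can be sketched as follows. This is a simplified illustration, not the logic of step5.blast.py. It assumes the BLAST results have already been parsed into a dictionary mapping each primer ID to a list of hit E-values, and it treats a primer as unique only if exactly one hit falls below the 10⁻⁵ threshold. The "comparable second-best hit" rule described above is not modeled, and the names are hypothetical.

```python
def is_unique(hit_evalues, max_evalue=1e-5):
    """Assumed uniqueness rule: exactly one hit with E-value below the threshold."""
    significant = [e for e in hit_evalues if e < max_evalue]
    return len(significant) == 1


def pick_primer_pair(candidates, hits):
    """Return the ID of the first candidate pair whose two primers are both unique.

    candidates: ordered list of (pair_id, forward_primer_id, reverse_primer_id)
    hits: dict mapping primer_id -> list of BLAST hit E-values
    Returns None when no pair passes, meaning the locus is dropped.
    """
    for pair_id, fwd, rev in candidates:
        if is_unique(hits.get(fwd, [])) and is_unique(hits.get(rev, [])):
            return pair_id
    return None
```

Because the candidates are evaluated in Primer3 rank order, the pair that is kept is the best-ranked pair that also passes the specificity check.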

2.4. Selection of Primer Pairs for Validation

From the pool of InDel primer pairs that passed the in silico filtering (1029 specific primer pairs in total), 50 primer pairs were selected for experimental validation. The selection was done randomly while ensuring that a range of InDel sizes and loci from different unigenes were represented. These 50 primer pairs were synthesized (Beijing Tsingke Biotech, Beijing, China) and used to genotype the full panel of 24 Phalaenopsis-type Dendrobium varieties described above. Genomic DNA from each variety was extracted from young leaves using a modified CTAB method [26].

PCR amplifications were carried out in a 20 μL reaction volume containing approximately 50 ng of template genomic DNA, 0.2 μM of each primer (forward and reverse), 200 μM of each dNTP, 2 μL of 10× PCR buffer (with Mg²⁺), and 0.5 U of rTaq DNA polymerase (Takara Bio, Beijing, China). The PCR cycling conditions were as follows: initial denaturation at 94 °C for 2 min; 35 cycles of denaturation at 94 °C for 30 s, annealing at 58 °C for 30 s, and extension at 72 °C for 30 s; and a final extension at 72 °C for 2 min. The PCR products were separated by electrophoresis on a 1.5% (w/v) agarose gel stained with 3% fluorescent DNA dye (GoldView, Zomanbio, Beijing, China). A 100 bp DNA ladder was run alongside to estimate fragment sizes. Gels were visualized under UV light and documented.

For each primer pair, the presence or absence of bands was recorded for all 24 samples. Primers that produced no PCR product in any sample were recorded as failures. Primers that produced a single band of the expected size in all samples were recorded as non-polymorphic, and primers that yielded bands of different sizes among the varieties were considered polymorphic. Each distinct banding pattern was assigned a genotype code: for instance, if a marker produced a single band of one size in one variety and a single band of a different size in another, these were scored as different alleles (homozygous genotypes). If a variety showed two bands (indicating heterozygosity for two different alleles), that pattern was recorded as a separate genotype. Thus, each unique band size or combination of band sizes was treated as a distinct genotype for analysis.

2.5. Data Analysis

The polymorphism information content (PIC) was calculated for each polymorphic marker to assess its informativeness. PIC is defined as:

\mathrm{PIC}_{i} = 1 - \sum_{j=1}^{n} p_{ij}^{2},

where p_ij is the frequency of the j-th allele of the i-th marker in the sample set and n is the number of alleles detected at that marker [11,27]. A higher PIC value (close to 1) indicates a more informative (highly polymorphic) marker, whereas a value of 0 indicates a monomorphic marker. Frequencies were calculated for each marker from the observed banding patterns (genotypes) in the 24 Phalaenopsis-type Dendrobium varieties, and PIC values were then computed accordingly.
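The PIC formula can be illustrated with a short Python sketch. It assumes that each marker is summarized as a list of allele (or banding-pattern) calls, one per scored sample, from which frequencies are derived. The function name and input layout are illustrative and not taken from the authors' scripts.

```python
from collections import Counter


def pic(calls):
    """PIC = 1 - sum of squared frequencies over the distinct calls of one marker."""
    counts = Counter(calls)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())
```

For example, a marker with two patterns at equal frequency gives 1 - (0.5² + 0.5²) = 0.5, and a monomorphic marker gives 0.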

To examine genetic relationships among the 24 Phalaenopsis-type Dendrobium varieties, a binary genotype matrix was compiled from the gel analysis of the polymorphic markers (scoring each variety for presence/absence of each allele at each locus). Using these data, a pairwise genetic similarity matrix was calculated based on Jaccard’s coefficient [28], which considers the proportion of shared alleles between each pair of varieties. The resulting genetic distance matrix was analyzed using the NTSYSpc software [29] to perform cluster analysis. A dendrogram was constructed using the UPGMA method, a widely used distance-based clustering algorithm, to visualize the genetic relationships. The robustness of clustering was visually assessed, and the dendrogram was drawn using MEGA X [30].
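Jaccard's coefficient for a pair of binary allele profiles can be sketched in Python as below. This is an illustration of the coefficient itself, not of the NTSYSpc workflow. It assumes each variety is represented by a list of 0/1 values in a fixed allele order, and it returns 0.0 when neither variety carries any allele, which is a convention chosen here. The names are hypothetical.

```python
def jaccard_similarity(a, b):
    """Shared alleles divided by alleles present in at least one of the two profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def similarity_matrix(profiles):
    """Return {(name1, name2): similarity} for all ordered pairs of varieties."""
    names = list(profiles)
    return {
        (m, n): jaccard_similarity(profiles[m], profiles[n])
        for m in names
        for n in names
    }
```

A distance for clustering can then be taken as 1 minus the similarity. UPGMA corresponds to average-linkage hierarchical clustering on such a distance matrix.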

3. Results

3.1. Transcriptome Assembly and InDel Identification

Following the experimental workflow illustrated in Figure 1, transcriptome sequencing was first performed on the two selected Phalaenopsis-type Dendrobium varieties, Sonia Hiasakul and Dendrobium Sripratum Delight, and sequencing quality was assessed using several key metrics (Table S1). For Sonia Hiasakul, an average of 10.4 GB of raw bases and 10.3 GB of clean bases were obtained, compared to 10.6 GB of raw bases and 10.6 GB of clean bases for Dendrobium Sripratum Delight. Both varieties had an error rate of 0.02%, with Sonia Hiasakul showing a Q20 score of 98.45% and a Q30 score of 95.27%, while Dendrobium Sripratum Delight had slightly higher Q20 and Q30 scores of 98.47% and 95.29%, respectively. The GC content was 46.75% for Sonia Hiasakul and 46.39% for Dendrobium Sripratum Delight. These results indicate that the sequencing data for both varieties are of high quality, with only minor differences in base yield and GC content. The transcriptome assembly was assessed for completeness using BUSCO analysis, which measures the presence of conserved single-copy orthologs (Table S2). The Trinity assembly contained 84.4% complete BUSCOs, comprising 32.1% single-copy and 52.3% duplicated BUSCOs. In contrast, the Unigene set contained 70.0% complete BUSCOs, with a much higher proportion of single-copy genes (67.9%) and far fewer duplicated genes (2.1%). Thus, the Trinity assembly is more complete but highly redundant, whereas the Unigene set is far less redundant at the cost of more fragmented and missing BUSCOs. Because transcript redundancy can lead to ambiguous read mapping and false-positive variant calls, the Unigene set was selected for InDel marker design to ensure higher marker specificity and accuracy.

Illumina sequencing yielded a high volume of reads for each variety, which were successfully assembled de novo using Trinity. The combined assembly (merging both varieties’ reads) resulted in 156,108 unigene sequences longer than 300 bp. The quality of the assembly was reflected in an N50 of approximately 1419 bp, indicating that half of the total assembled nucleotides were in contigs of length 1.4 kb or greater. The length of the unigenes ranged from 301 bp (the lower cutoff applied) up to 16,623 bp for the longest transcript, with a median unigene length of 579 bp. This assembly provided a substantial reference transcriptome for downstream polymorphism discovery, covering a wide range of expressed genes in Phalaenopsis-type Dendrobium.
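The N50 statistic quoted above can be computed from a list of sequence lengths as in the following Python sketch. It uses the common definition of N50 as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The function name is illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:  # this sequence pushes the cumulative sum past 50%
            return length
```

Applied to the unigene lengths, this calculation would be expected to reproduce the reported N50 of approximately 1419 bp.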

By aligning the reads of Sonia Hiasakul and Dendrobium Sripratum Delight back to the assembled unigenes and calling variants, a large number of candidate indel polymorphisms between the two varieties were identified. Sonia Hiasakul exhibited 104,940.1 InDel sites, while Dendrobium Sripratum Delight had 93,987.1 InDel sites (Table S3). In total, 173,995 unique InDel sites were detected across the transcriptome, accounting for shared and distinct polymorphisms between the two varieties. After applying the filtering criteria (bi-allelic indels with length 15–29 bp), 5083 InDel loci suitable for marker development were retained (Table S4). Thus, only about 2.9% of the initial indel calls met the strict criteria, which is expected given the prevalence of smaller indels and complex indels in a heterozygous genome. The distribution of indel lengths from 15 to 29 bp is shown in Figure 2. An inverse relationship between indel length and frequency was observed: indels of 15 bp were the most common (approximately 974 occurrences), whereas 29 bp indels were rare (only 7 occurrences), with the frequency generally decreasing as length increased. On average, each indel length within this range had about 339 occurrences, so the selected loci were biased toward the shorter end of the range simply because such indels occur more frequently. InDels shorter than 15 bp were excluded because they result in very subtle size differences between alleles, which may not be easily distinguishable on agarose gels. While increasing the agarose concentration could help resolve these small differences, it would also significantly increase the electrophoresis time, making the process less efficient. Therefore, focusing on InDels within the 15–29 bp range ensures more reliable and efficient genotyping.

3.2. Primer Design and Validation

From the 5083 selected InDel loci, up to five primer pairs were designed per locus, yielding a theoretical maximum of 25,415 primer pairs. In practice, some loci did not produce the full set of five candidate primer pairs (due to sequence constraints or low complexity flanking regions), resulting in a total of 12,104 primer pairs designed (Table S5). Each of these primer pairs was evaluated for specificity by BLAST alignment to the draft Dendrobium nobile genome. The majority of candidate primers were discarded during this in silico screening because they either matched multiple genomic locations or aligned to repetitive regions. After BLAST-based filtering, 1029 high-specificity InDel markers were obtained, each represented by a unique primer pair that targets a single locus unambiguously (Table S6). These markers are based on transcript sequences but were confirmed to be single-copy (or at least unique) in the genome to avoid multi-locus amplification.

To empirically assess the performance of our InDel markers, 50 primer pairs were randomly selected from the filtered set for laboratory validation (Table S7). These 50 markers were used in PCR amplifications across all 24 Phalaenopsis-type Dendrobium varieties. The results of this experimental validation are summarized as follows. Out of the 50 primer pairs tested, 42 yielded successful amplification of clear PCR products of the expected size in the agarose gel, whereas 8 primer pairs failed to amplify any product in any variety. Among the 42 successful markers, 38 showed polymorphic banding patterns, while 4 produced a single band of identical size in all varieties (monomorphic markers). Importantly, no non-specific bands were observed for any of the 42 successful primer pairs—each produced a single distinct band per genotype, with no smearing or extra fragments (Table S8). This indicates a high degree of primer specificity, likely due to the rigorous BLAST screening step. In summary, our success rate was 84% (42/50) for specific amplification, and 76% (38/50) of the tested markers revealed polymorphism among the tested varieties.
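The counts reported in this section can be checked with a few lines of Python. The numbers are taken directly from the text, and the variable names and the helper function are illustrative only.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


loci = 5083               # filtered InDel loci
pairs_per_locus = 5       # maximum primer pairs requested per locus
designed = 12104          # primer pairs actually designed
specific = 1029           # primer pairs retained after BLAST screening
tested, amplified, polymorphic = 50, 42, 38

theoretical_max = loci * pairs_per_locus           # 25,415 primer pairs
amplification_rate = percent(amplified, tested)    # 84% of tested pairs
polymorphism_rate = percent(polymorphic, tested)   # 76% of tested pairs
loci_with_marker = percent(specific, loci)         # share of loci with a specific marker
```

The last quantity shows that roughly one in five of the filtered loci ended up with a specific primer pair after screening against the Dendrobium nobile genome.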

For each polymorphic marker, the number of alleles observed was recorded. Most InDel markers detected 2 to 3 alleles across the 24 Phalaenopsis-type Dendrobium varieties, consistent with the expectation that many loci would be bi-allelic or triallelic in our sample. The marker with the highest allelic diversity showed six distinct banding patterns (genotypes) among the 24 orchids. On average, each polymorphic marker exhibited 1.9 allelic types in this set of varieties. An example of the genotyping results is presented in Figure 3, which displays gel images for four representative InDel markers across all 24 samples. Each marker produced clearly resolved bands, and the variation in band size among varieties is evident, demonstrating the co-dominant and polymorphic nature of these markers. Specifically, in lane 8 (representing Dendrobium Sripratum Delight) and lane 19 (representing Sonia Hiasakul), InDel variations were clearly observed, confirming the accuracy of the InDel marker design and validating the existence of polymorphisms between these two varieties.

3.3. Polymorphism Information Content

For each of the 38 polymorphic InDel markers validated, the polymorphism information content was calculated (Figure 4). PIC values ranged from 0.25 to 0.78 (Table S9), with an average of 0.52 across the 38 markers. This relatively high average indicates that, collectively, the markers are quite informative. More than half of these markers (approximately 55%) had PIC > 0.5, a threshold often used to define highly polymorphic markers in population genetic studies. Markers with PIC above 0.5 are considered very useful for distinguishing genotypes because they tend to have multiple alleles with intermediate frequencies, which underscores the utility of this marker set for genetic diversity analyses in Phalaenopsis-type Dendrobium. No marker had a PIC of 0 (since the monomorphic markers were excluded from this analysis), and only a few markers fell in the low-polymorphism range (PIC below 0.3). These PIC results indicate that transcriptome-derived InDel markers can reach levels of informativeness comparable to traditional SSR markers in orchids [31].

3.4. Phylogenetic Analysis

The genetic similarity analysis based on the 38 polymorphic InDel markers provided insight into the relationships among the 24 hybrid Phalaenopsis-type Dendrobium varieties. Pairwise genetic similarity coefficients (Jaccard’s similarity) ranged from 0.03 (indicating very low similarity between a particular pair of varieties) to 0.89 (indicating two varieties sharing a very high proportion of alleles). When converted to a distance metric as used by NTSYS, the distances ranged from 0.01 to 0.38, indicating a moderate level of genetic variation among the 24 Phalaenopsis-type Dendrobium varieties. Overall, this range of values suggests a moderate to high level of genetic similarity among many of the cultivated varieties, reflecting a potentially shared genetic background or common breeding stock, yet there is sufficient diversity to distinguish each variety and to form clusters of related varieties.

The genetic distance matrix generated from the InDel markers was subsequently used to construct a UPGMA tree (Figure 5). The resulting dendrogram revealed distinct clustering patterns among the hybrid Phalaenopsis-type Dendrobium varieties, reflecting their underlying marker-based genetic relationships. Although the analyzed accessions were derived from diverse breeding backgrounds and no shared parental combinations were documented, several accessions still formed clearly separated clusters. This indicates that the InDel markers capture genomic similarity that is not apparent from pedigree records alone and can discriminate even closely related varieties.

Such clustering patterns provide practical insight into the genetic structure and divergence of the studied germplasm. The clear group separation further suggests that these InDel markers are suitable for assessing genetic diversity, verifying varietal identity, and detecting cryptic relatedness or convergent selection footprints. Importantly, understanding this marker-defined structure can guide parental choice and the rational design of new hybrids with desirable trait combinations in Dendrobium.

4. Discussion

In this study, a new set of polymorphic InDel markers for Phalaenopsis-type Dendrobium was developed and validated using a transcriptome-driven approach. The challenges posed by the absence of a complete reference genome and the highly heterozygous nature of Phalaenopsis-type Dendrobium were addressed by focusing on expressed gene regions and applying stringent selection criteria for marker development. By sequencing the transcriptomes of two phenotypically distinct varieties and comparing their assembled sequences, nearly 174,000 raw InDel candidates were identified. The filtering steps (restricting the set to bi-allelic indels of moderate size, 15–29 bp) reduced this to a manageable set of 5083 high-confidence loci. This strategy enabled us to target indels that are likely to produce clear, distinguishable size differences on agarose gels and to avoid complex indels that might be difficult to genotype or interpret. Restricting loci to a single alternate allele also minimized complications from multi-allelic loci (which are common in orchids due to segmental duplications and polyploidy in some species). In essence, our workflow demonstrates how transcriptome data from just two individuals can yield a wealth of genetic markers, even in the absence of a reference genome.
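The filtering logic described above (bi-allelic indels of 15–29 bp) can be sketched as a simple pass over VCF records. This is a minimal illustration that assumes standard tab-separated VCF columns (CHROM, POS, ID, REF, ALT, ...); the function names and example records are hypothetical, and the authors' published scripts may apply additional quality filters.

```python
def indel_length(ref, alt):
    """Size difference between the reference and alternate alleles."""
    return abs(len(ref) - len(alt))


def keep_indel(ref, alts, min_len=15, max_len=29):
    """Keep only bi-allelic indels whose size is within [min_len, max_len]."""
    if len(alts) != 1:          # more than one ALT allele: multi-allelic locus
        return False
    return min_len <= indel_length(ref, alts[0]) <= max_len


def filter_indels(vcf_lines, min_len=15, max_len=29):
    """Yield (contig, position, ref, alt) for records passing the filter."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue            # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        contig, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        alts = alt.split(",")
        if keep_indel(ref, alts, min_len, max_len):
            yield contig, pos, ref, alts[0]


# Hypothetical records: a 20 bp deletion, a 2 bp deletion, a multi-allelic
# site, and a 30 bp insertion. Only the first should pass.
example_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "unigene1\t100\t.\tA" + "C" * 20 + "\tA\t50\tPASS\t.",
    "unigene2\t5\t.\tATG\tA\t50\tPASS\t.",
    "unigene3\t7\t.\tA\tA" + "G" * 16 + ",A" + "T" * 18 + "\t50\tPASS\t.",
    "unigene4\t9\t.\tA\tA" + "T" * 30 + "\t50\tPASS\t.",
]
kept = list(filter_indels(example_vcf))
```

The lower bound keeps alleles separable on agarose, while the upper bound and the single-ALT requirement avoid complex loci that are harder to score.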

The de novo transcriptome assembly provided a robust foundation for marker discovery, as evidenced by the large number of unigenes (156,108) and a substantial N50 of nearly 1.4 kb. By mining this assembly for variants between two divergent varieties, the natural genetic variation present in the Phalaenopsis-type Dendrobium gene pool was revealed. The identification of 5083 high-quality InDel loci reflects the genetic divergence between Sonia Hiasakul and Dendrobium Sripratum Delight; these two varieties contributed a substantial number of differences, which is not surprising given their distinct origins and traits. This also implies that other variety combinations might yield additional markers, and expanding to more transcriptomes could further enrich the marker resource for Phalaenopsis-type Dendrobium.

Transcriptome-derived variant calling can be influenced by sequence similarity among paralogous genes or alternative isoforms, a known limitation of RNA-seq-based polymorphism discovery. Highly similar paralogs or alternative splicing isoforms generate multiple nearly identical transcripts, so short reads may map ambiguously and produce false-positive variants. The Unigene set, which retains only the longest transcript for each Trinity gene, reduces this redundancy and thus provides a more refined representation of gene content, although it does not fully eliminate the possibility of misalignment. However, the relatively higher M (missing) and F (fragmented) BUSCO values in the Unigene assembly indicate that it has some gaps, which may lead to fewer InDel markers and lower marker coverage compared to the full Trinity assembly. While the Unigene set offers a less redundant depiction of the transcriptome, it may miss some of the genetic diversity captured by the Trinity assembly, which includes a broader range of gene isoforms and duplicated genes. To further mitigate these issues, future efforts could incorporate long-read sequencing technologies such as PacBio Iso-Seq or Oxford Nanopore, which provide full-length transcript sequences and help resolve isoforms and closely related paralogs. Additionally, clustering paralogous sequences based on similarity thresholds or applying more stringent mapping criteria could improve the accuracy of InDel detection by reducing the impact of redundancy and misalignment.

Our primer design and in silico filtering pipeline proved to be effective, yielding over a thousand candidate markers with a high likelihood of specificity. The BLAST screening against the draft Dendrobium nobile genome [25] was particularly crucial. Orchids are known for large and repetitive genomes; thus, cross-amplification of primers on non-target sequences is a serious concern. By removing primers that had multiple binding sites, the success rate of our markers in the lab was dramatically improved. The experimental validation results underscore this point: 84% of tested primer pairs produced a clear single band, and none of the successful amplifications showed spurious bands. This is a notably high success rate for de novo marker development. In similar studies without rigorous specificity checks, it is common to encounter a higher proportion of primers that amplify either nothing or multiple bands. Therefore, our approach can serve as a model for marker development in other species lacking reference genomes, combining transcriptome mining with genome filtration to identify reliable markers.
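The specificity screen described above can be sketched as a filter over BLAST tabular output (the standard 12-column `-outfmt 6` layout: qseqid, sseqid, pident, length, ...). The rule used here, keeping a marker only when each primer has exactly one full-length, 100%-identity hit, is an assumption for illustration; the identifiers and thresholds are hypothetical and may not match the criteria in the authors' pipeline.

```python
from collections import Counter


def count_hits(blast_lines, primer_lengths, min_identity=100.0):
    """Count full-length, high-identity genome hits per primer."""
    hits = Counter()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, pident, length = fields[0], float(fields[2]), int(fields[3])
        if pident >= min_identity and length >= primer_lengths[qseqid]:
            hits[qseqid] += 1
    return hits


def specific_markers(primer_pairs, hits):
    """Markers whose forward and reverse primers each hit exactly once."""
    return [
        marker
        for marker, (fwd, rev) in primer_pairs.items()
        if hits[fwd] == 1 and hits[rev] == 1
    ]


# Hypothetical primers (all 20 nt) and BLAST hits.
primer_lengths = {"M1_F": 20, "M1_R": 20, "M2_F": 20, "M2_R": 20}
primer_pairs = {"M1": ("M1_F", "M1_R"), "M2": ("M2_F", "M2_R")}
blast_lines = [
    "M1_F\tscaf1\t100.0\t20\t0\t0\t1\t20\t500\t519\t1e-4\t40.1",
    "M1_R\tscaf1\t100.0\t20\t0\t0\t1\t20\t780\t761\t1e-4\t40.1",
    "M2_F\tscaf2\t100.0\t20\t0\t0\t1\t20\t100\t119\t1e-4\t40.1",
    "M2_F\tscaf9\t100.0\t20\t0\t0\t1\t20\t900\t919\t1e-4\t40.1",
    "M2_R\tscaf2\t100.0\t20\t0\t0\t1\t20\t380\t361\t1e-4\t40.1",
    "M2_R\tscaf5\t95.0\t20\t1\t0\t1\t20\t10\t29\t1e-2\t32.0",
]
hits = count_hits(blast_lines, primer_lengths)
print(specific_markers(primer_pairs, hits))
```

In this example M2 is rejected because its forward primer has two perfect binding sites, which is the situation most likely to yield extra bands on a gel.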

The polymorphism rate is also encouraging: 38 of the 50 tested primer pairs (76%) were polymorphic. This indicates that a large fraction of the transcriptome-derived indels are truly variable across the broader set of Phalaenopsis-type Dendrobium germplasm, not just between the two varieties used for discovery. Compared to widely used SSR and SNP marker systems, the transcriptome-based InDel markers developed in this study offer a distinct balance of cost-efficiency, operational simplicity, and discriminatory power. While SSR markers are known for their high polymorphism and allelic diversity, their detection often necessitates polyacrylamide gel electrophoresis (PAGE), a process that is labor-intensive, time-consuming, and involves the use of toxic reagents. In contrast, the InDel markers designed here are resolvable on standard agarose gels, significantly reducing the genotyping workload and eliminating safety hazards associated with PAGE. On the other hand, while SNP markers provide high genomic density and are ideal for high-throughput automated platforms, they typically incur higher development and running costs per sample and require specialized equipment (e.g., fluorescence scanners or sequencers) that may not be available in basic breeding laboratories.

In terms of discriminatory power, our InDel markers exhibited an average PIC of 0.52, which is comparable to the informativeness of many Dendrobium SSR markers reported in previous studies (typically ranging from 0.5 to 0.8) [31]. Although SNPs and SSRs may offer higher density or allele numbers, respectively, the bi-allelic nature of InDel markers simplifies scoring and reduces genotyping errors. Furthermore, because these markers are derived from transcribed regions (CDS and UTRs), they target evolutionarily conserved sequences. This feature suggests they may possess higher cross-species transferability compared to genomic SSRs often located in rapidly evolving intergenic regions, potentially extending their utility to related Dendrobium species.

The genetic relationships revealed by our markers provide insights into the genetic structure of Phalaenopsis-type Dendrobium varieties. The relatively high genetic similarity among many varieties suggests that breeders may have used a limited core set of parent plants or that certain hybrids have been widely used in breeding, leading to a degree of relatedness among modern varieties. This is not unexpected in ornamental breeding, where a few popular hybrids can dominate lineages due to their desirable traits (e.g., large flower size or particular colors). However, the fact that clear clustering and a range of genetic distances (with some pairs of varieties being quite distinct) were still observed indicates that there is substantial genetic diversity that can be leveraged [32]. For breeding programs, this has two implications: (1) breeders can use the markers to identify genetically divergent parents to maximize heterosis or to introduce new variation (as per the concept of optimizing parental selection for mapping populations); and (2) the existing genetic clusters might correspond to particular trait groupings, so crossing between clusters could combine complementary traits (e.g., crossing a large-flowered cluster I variety with an early-flowering cluster III variety to generate progeny with both advantages). Our results also suggest that if the breeding pool remains too narrow (high similarity), it may be beneficial to introduce new germplasm (perhaps wild Dendrobium species or less-related hybrids) to broaden the genetic base. The markers developed here would be instrumental in monitoring the integration of such new germplasm and assessing its impact on genetic diversity [33].

From a methodological perspective, our work demonstrates the utility of transcriptome sequencing for marker development in a non-model organism. By focusing on the transcribed portions of the genome, much of the “noise” associated with repetitive DNA that plagues whole-genome approaches in complex genomes was avoided [34]. While the transcriptome-based approach efficiently avoids the complexity of repetitive genomic sequences, it is important to acknowledge its inherent biases. Since RNA-seq focuses exclusively on expressed regions (CDS and UTRs), genetic variations located in non-coding regulatory elements (such as promoters, introns, and enhancers) and intergenic regions are inevitably missed. Consequently, the InDel markers developed herein may underrepresent the total landscape of genetic variation, particularly in regulatory regions that drive expression differences. However, this limitation is counterbalanced by the fact that the identified markers are physically linked to functional genes. Unlike random genomic markers in intergenic spacers, these gene-targeted markers have a higher probability of being associated with phenotypic traits, making them particularly valuable for downstream applications such as marker-assisted selection and functional gene mapping. The fact that our markers are gene-based also means they could be useful for comparative studies: for example, many might amplify in related Dendrobium species or hybrids, potentially allowing cross-transfer of these markers to germplasm outside our current set. The validation of these markers across a broader range of Dendrobium species will be a priority in our future research to fully assess their transferability. Looking ahead, the integration of these transcriptome-derived markers with a physical map will be a critical step once a high-quality chromosome-level reference genome for Phalaenopsis-type Dendrobium becomes available. 
Since the markers are developed from specific unigenes, their sequences can be computationally anchored to the physical chromosomes using BLAST alignment. This physical mapping will not only reveal the genomic distribution and density of the markers but also help assign them to specific linkage groups. Aligning the genetic map constructed from these InDel markers with the physical genome will bridge the gap between phenotypes and genotypes, significantly accelerating the identification of candidate genes within quantitative trait loci (QTLs) and facilitating map-based cloning for key ornamental traits. Notably, the pipeline employed—de novo assembly, variant calling, filtering, primer design, and BLAST screening—is broadly applicable and can be replicated in other species when transcriptome data from two or more individuals with relevant differences are available [35].

In summary, the development of these InDel markers fills a significant gap for Phalaenopsis-type Dendrobium. Prior to this, molecular marker studies in Dendrobium largely relied on ISSR or AFLP techniques, or a limited number of SSR markers developed for specific species [36]. The new markers provide a more sequence-specific and reproducible toolkit. They can be used to establish DNA fingerprints for variety protection and identity verification, which is important given the high commercial value of certain orchid hybrids. They also enable more rigorous genetic diversity assessments to guide conservation efforts for germplasm collections and breeding repositories, ensuring that the full range of genetic variation is captured and maintained. Furthermore, in a breeding context, these markers can facilitate marker-assisted selection (MAS): for instance, if certain markers are found to be linked to desirable traits (like disease resistance or particular flower characteristics), breeders can screen seedlings for those markers rather than waiting for the plants to mature and bloom [37].

Beyond Phalaenopsis-type Dendrobium, both SSR and InDel markers have shown great potential in other orchid species. For example, SSR markers have been widely used for genetic diversity studies and cultivar identification in species like Cattleya [38], Dendrobium nobile [39], and Cymbidium [40], providing valuable tools for germplasm conservation and breeding. Similarly, the application of InDel markers in orchids has gained momentum due to their advantages in genotyping, including simpler detection methods and higher resolution in detecting genetic variation compared to traditional markers. In Orchidaceae genera such as Paphiopedilum, InDel markers have been effectively employed for genetic mapping, marker-assisted selection, and breeding programs aimed at improving disease resistance, flower traits, and adaptation to environmental stresses [41,42]. This study, by demonstrating the utility of InDel markers in Phalaenopsis-type Dendrobium, lays the groundwork for extending these markers to other plants for which only limited genomic resources are available [43].

One interesting observation from our results is that even with moderate genetic diversity detected, each of the 24 varieties was uniquely identifiable by a combination of these markers (a genetic “fingerprint”). This is crucial for variety registration and protection. It also implies that the number of markers can be further reduced to a core set for routine identification; for example, a panel of the top 10 most polymorphic InDel markers might suffice to distinguish all 24 varieties in our test (since none were identical across all those loci) [44]. As more varieties are added to the analysis, the panel might need to grow, but with 1029 markers available in our library, there is ample room to create highly discriminatory panels.
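One way to derive such a core panel is a greedy search that repeatedly adds the marker separating the largest number of still-indistinguishable variety pairs. The sketch below assumes a complete genotype table (no missing calls) and uses hypothetical marker names and calls; the greedy strategy is a heuristic chosen for illustration, not a procedure reported in this study, and it does not guarantee the smallest possible panel.

```python
from itertools import combinations


def greedy_core_panel(genotypes):
    """Pick markers until every pair of varieties differs at some marker.

    `genotypes` maps marker name -> tuple of calls, one per variety, with
    varieties in the same order for every marker.
    Returns (panel, unresolved_pairs).
    """
    n = len(next(iter(genotypes.values())))
    unresolved = set(combinations(range(n), 2))
    remaining = dict(genotypes)
    panel = []

    def resolved_by(marker):
        calls = remaining[marker]
        return {(i, j) for (i, j) in unresolved if calls[i] != calls[j]}

    while unresolved and remaining:
        best = max(remaining, key=lambda m: len(resolved_by(m)))
        gained = resolved_by(best)
        if not gained:
            break               # no remaining marker separates any pair
        panel.append(best)
        unresolved -= gained
        del remaining[best]
    return panel, unresolved


# Hypothetical calls for three markers across four varieties.
genotypes = {
    "M1": (0, 0, 1, 1),
    "M2": (0, 1, 0, 1),
    "M3": (0, 0, 0, 1),
}
panel, unresolved = greedy_core_panel(genotypes)
print(panel, unresolved)
```

If `unresolved` is non-empty at the end, the listed pairs of varieties share an identical fingerprint across all supplied markers, and further markers from the library would be needed.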

The approaches and findings here provide a framework for other non-model, highly heterozygous crops. Ornamental plants, many fruit trees, and medicinal herbs often lack reference genomes, yet with decreasing costs of RNA sequencing, the transcriptome-to-marker route is very feasible [45]. This study serves as a case example showing that high-throughput sequencing data can be translated into practical breeding tools even in the absence of complete genome information.

5. Conclusions

This study successfully demonstrates the utility of a transcriptome-based approach for the development of InDel markers in Phalaenopsis-type Dendrobium, a non-model orchid with no reference genome. The identification and validation of high-quality polymorphic markers offer a significant step forward in understanding the genetic diversity and relationships among different Dendrobium varieties. These markers can be used for variety identification, molecular breeding, and genetic improvement, providing a valuable tool for the management of Dendrobium germplasm. Furthermore, the custom bioinformatics pipeline developed in this study can be adapted for use in other species that lack reference genomes, enhancing the scope and applicability of high-throughput marker development. Overall, this work contributes to advancing the genetic research and breeding strategies of Dendrobium and other similar plants, facilitating future improvements in ornamental and medicinal crops.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/horticulturae11121459/s1, Table S1: Sequencing quality metrics for Sonia Hiasakul and Dendrobium Sripratum Delight transcriptomes; Table S2: BUSCO analysis results comparing the Trinity and Unigene transcriptome assemblies; Table S3: InDel variations in Sonia Hiasakul and Dendrobium Sripratum Delight; Table S4: Summary of InDel loci retained after filtering; Table S5: Primer design for the selected InDel loci; Table S6: Summary of BLAST-based filtering and high-specificity InDel markers; Table S7: InDel markers for experimental validation; Table S8: Amplification specificity of validated InDel markers in 24 Phalaenopsis-type Dendrobium varieties; Table S9: Polymorphism information content (PIC) values of validated polymorphic InDel markers.

Author Contributions

Conceptualization, methodology, investigation, data curation, writing—original draft, X.Y.; formal analysis, visualization, writing—review and editing, T.Y.; data analysis, X.L.; investigation, S.Y.; data analysis, writing—review and editing, Y.L.; data analysis, conceptualization, supervision, project administration, funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Hainan Province, Youth Fund Project (No. 324QN311), the Central Public-interest Scientific Institution Basal Research Fund (1630032022004 and 1630032023014), and the Earmarked Fund for CARS (CARS-23-G60).

Data Availability Statement

The datasets generated and analyzed in this study (raw sequencing reads, assembled unigenes, and variant call files) are available from the corresponding author on reasonable request. The custom scripts used for InDel mining and primer design are openly available in the GitHub repository “InDel-Marker-Design-Using-Transcriptome-Data” (http://github.com/biodendrobium/InDel-Marker-Design-Using-Transcriptome-Data) (accessed on 1 October 2025). All relevant data supporting the conclusions of this article are included within the article and its supplementary files.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AFLP	amplified fragment length polymorphism
InDel	insertion–deletion
PIC	polymorphism information content
SSRs	simple sequence repeats
SNPs	single-nucleotide polymorphisms
cDNA	complementary DNA
GATK	Genome Analysis Toolkit
Tm	melting temperature
PCR	polymerase chain reaction
PAGE	polyacrylamide gel electrophoresis
QTLs	quantitative trait loci

References

1. Xu, Q.; Niu, S.-C.; Li, K.-L.; Zheng, P.-J.; Zhang, X.-J.; Jia, Y.; Liu, Y.; Niu, Y.-X.; Yu, L.-H.; Chen, D.-F.; et al. Chromosome-Scale Assembly of the Dendrobium nobile Genome Provides Insights Into the Molecular Mechanism of the Biosynthesis of the Medicinal Active Ingredient of Dendrobium. Front. Genet. 2022, 13, 844622.
2. Chung, O.; Kim, J.; Bolser, D.; Kim, H.M.; Jun, J.H.; Choi, J.P.; Jang, H.D.; Cho, Y.S.; Bhak, J.; Kwak, M. A chromosome-scale genome assembly and annotation of the spring orchid (Cymbidium goeringii). Mol. Ecol. Resour. 2022, 22, 1168–1177.
3. Zhang, Y.; Zhang, G.-Q.; Zhang, D.; Liu, X.-D.; Xu, X.-Y.; Sun, W.-H.; Yu, X.; Zhu, X.; Wang, Z.-W.; Zhao, X.; et al. Chromosome-scale assembly of the Dendrobium chrysotoxum genome enhances the understanding of orchid evolution. Hortic. Res. 2021, 8, 183.
4. Niu, Z.; Zhu, F.; Fan, Y.; Li, C.; Zhang, B.; Zhu, S.; Hou, Z.; Wang, M.; Yang, J.; Xue, Q.; et al. The chromosome-level reference genome assembly for Dendrobium officinale and its utility of functional genomics research and molecular breeding study. Acta Pharm. Sin. B 2021, 11, 2080–2092.
5. Xiang, N.; Hong, Y.; Lam-Chan, L. Genetic analysis of tropical orchid hybrids (Dendrobium) with fluorescence amplified fragment-length polymorphism (AFLP). J. Am. Soc. Hortic. Sci. 2003, 128, 731–735.
6. Wahba, L.E.; Hazlina, N.; Fadelah, A.; Ratnam, W. Genetic relatedness among Dendrobium (Orchidaceae) species and hybrids using morphological and AFLP markers. Hortscience 2014, 49, 524–530.
7. Amom, T.; Nongdam, P. The use of molecular marker methods in plants: A review. Int. J. Curr. Res. Rev. 2017, 9, 1–7.
8. Hu, W.; Zhou, T.; Wang, P.; Wang, B.; Song, J.; Han, Z.; Chen, L.; Liu, K.; Xing, Y. Development of whole-genome agarose-resolvable LInDel markers in rice. Rice 2020, 13, 1.
9. Xu, M.; Liu, X.; Wang, J.-W.; Teng, S.-Y.; Shi, J.-Q.; Li, Y.-Y.; Huang, M.-R. Transcriptome sequencing and development of novel genic SSR markers for Dendrobium officinale. Mol. Breed. 2017, 37, 18.
10. Zhu, B.; Luo, X.; Gao, Z.; Hu, X.; Weng, Q. De novo transcriptome assembly and development of est-ssr markers of endangered Dendrebium nobile (Orchidaceae). Pak. J. Bot. 2022, 54, 483–489.
11. Liu, J.; Li, J.; Qu, J.; Yan, S. Development of Genome-Wide Insertion and Deletion Polymorphism Markers from Next-Generation Sequencing Data in Rice. Rice 2015, 8, 27.
12. Guo, G.; Zhang, G.; Pan, B.; Diao, W.; Liu, J.; Ge, W.; Gao, C.; Zhang, Y.; Jiang, C.; Wang, S. Development and application of InDel markers for Capsicum spp. based on whole-genome re-sequencing. Sci. Rep. 2019, 9, 3691.
13. He, T.; Ye, C.; Zeng, Q.; Fan, X.; Huang, T. Genetic diversity and population structure of cultivated Dendrobium nobile Lindl. in southwest of China based on genotyping-by-sequencing. Genet. Resour. Crop Evol. 2022, 69, 2803–2818.
14. Li, X.; Ding, X.; Chu, B.; Zhou, Q.; Ding, G.; Gu, S. Genetic diversity analysis and conservation of the endangered Chinese endemic herb Dendrobium officinale Kimura et Migo (Orchidaceae) based on AFLP. Genetica 2008, 133, 159–166.
15. Reddy, D.M.; Momin, K.C.; Singh, A.K.; Kumar, S.; Wangchu, L.; Bhargav, V. Genetic diversity of Dendrobium species revealed by simple sequence repeat (SSR) markers. Genet. Resour. Crop Evol. 2024, 71, 1453–1464.
16. Ye, M.; Wang, X.; Zhou, Y.; Huang, S.; Liu, A. Genetic diversity and population structure of cultivated Dendrobium huoshanense (C.Z. Tang et S.J. Cheng) using SNP markers generated from GBS analysis. Pak. J. Bot. 2021, 53, 1683–1690.
17. Singh, K.P.; Kumari, P.; Yadava, D.K. Development of de-novo transcriptome assembly and SSRs in allohexaploid Brassica with functional annotations and identification of heat-shock proteins for thermotolerance. Front. Genet. 2022, 13, 958217.
18. Privitera, G.F.; Treccarichi, S.; Nicotra, R.; Branca, F.; Pulvirenti, A.; Piero, A.R.L.; Sicilia, A. Comparative transcriptome analysis of B. oleracea L. var. italica and B. macrocarpa Guss. genotypes under drought stress: De novo vs reference genome assembly. Plant Stress 2024, 14, 100657.
19. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120.
20. Grabherr, M.G.; Haas, B.J.; Yassour, M.; Levin, J.Z.; Thompson, D.A.; Amit, I.; Adiconis, X.; Fan, L.; Raychowdhury, R.; Zeng, Q. Trinity: Reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nat. Biotechnol. 2011, 29, 644.
21. Tegenfeldt, F.; Kuznetsov, D.; Manni, M.; Berkeley, M.; Zdobnov, E.M.; Kriventseva, E.V. OrthoDB and BUSCO update: Annotation of orthologs with wider sampling of genomes. Nucleic Acids Res. 2025, 53, D516–D522.
22. Li, H.; Handsaker, B.; Wysoker, A.; Fennell, T.; Ruan, J.; Homer, N.; Marth, G.; Abecasis, G.; Durbin, R.; 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics 2009, 25, 2078–2079.
23. McKenna, A.; Hanna, M.; Banks, E.; Sivachenko, A.; Cibulskis, K.; Kernytsky, A.; Garimella, K.; Altshuler, D.; Gabriel, S.; Daly, M.; et al. The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20, 1297–1303.
24. Untergasser, A.; Cutcutache, I.; Koressaar, T.; Ye, J.; Faircloth, B.C.; Remm, M.; Rozen, S.G. Primer3—New capabilities and interfaces. Nucleic Acids Res. 2012, 40, e115.
25. Sherpa, R.; Devadas, R.; Suprasanna, P.; Bolbhat, S.N.; Nikam, T.D. First De novo whole genome sequencing and assembly of mutant Dendrobium hybrid cultivar ‘Emma White’. Gigabyte 2022, 2022, 1–8.
26. Murray, M.; Thompson, W. Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 1980, 8, 4321–4326.
27. Anderson, J.A.; Churchill, G.; Autrique, J.; Tanksley, S.; Sorrells, M. Optimizing parental selection for genetic linkage maps. Genome 1993, 36, 181–186.
28. Jaccard, P. Nouvelles recherches sur la distribution florale. Bull. Société Vaudoise Sci. Nat. 1908, 44, 223–270.
29. Rohlf, F. NTSYS-pc: Numerical Taxonomy and Multivariate Analysis System; Exeter Publishing: Exeter, UK, 1988.
30. Kumar, S.; Stecher, G.; Li, M.; Knyaz, C.; Tamura, K. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol. Biol. Evol. 2018, 35, 1547–1549.
31. Tsai, C.-C.; Shih, H.-C.; Wang, H.-V.; Lin, Y.-S.; Chang, C.-H.; Chiang, Y.-C.; Chou, C.-H. RNA-seq SSRs of moth orchid and screening for molecular markers across genus Phalaenopsis (Orchidaceae). PLoS ONE 2015, 10, e0141761.
32. Hsu, C.-C.; Chen, S.-Y.; Chiu, S.-Y.; Lai, C.-Y.; Lai, P.-H.; Shehzad, T.; Wu, W.-L.; Chen, W.-H.; Paterson, A.H.; Chen, H.-H. High-density genetic map and genome-wide association studies of aesthetic traits in Phalaenopsis orchids. Sci. Rep. 2022, 12, 3346.
33. Lai, Y.-S.; Chen, S.-Y.; Wu, Y.-J.; Chen, W.-H.; Chen, H.-H.; Lin, Y.-Y.; Lin, T.-C.; Lin, T.-J.; Kao, C.-F. Genetic profiles and phenotypic patterns in Taiwanese Phalaenopsis orchids: A two-step phenotype and genotype strategy using modified genetic distance algorithms. Front. Plant Sci. 2024, 15, 1416886.
34. Cahais, V.; Gayral, P.; Tsagkogeorga, G.; Melo-Ferreira, J.; Ballenghien, M.; Weinert, L.; Chiari, Y.; Belkhir, K.; Ranwez, V.; Galtier, N. Reference-free transcriptome assembly in non-model animals from next-generation sequencing data. Mol. Ecol. Resour. 2012, 12, 834–845.
35. Kim, J.M.; Lyu, J.I.; Lee, M.-K.; Kim, D.-G.; Kim, J.-B.; Ha, B.-K.; Ahn, J.-W.; Kwon, S.-J. Cross-species transferability of EST-SSR markers derived from the transcriptome of kenaf (Hibiscus cannabinus L.) and their application to genus Hibiscus. Genet. Resour. Crop Evol. 2019, 66, 1543–1556.
36. Guang, X.-M.; Xia, J.-Q.; Lin, J.-Q.; Yu, J.; Wan, Q.-H.; Fang, S.-G. IDSSR: An efficient pipeline for identifying polymorphic microsatellites from a single genome sequence. Int. J. Mol. Sci. 2019, 20, 3497.
37. Liu, Y.-C.; Lin, B.-Y.; Lin, J.-Y.; Wu, W.-L.; Chang, C.-C. Evaluation of chloroplast DNA markers for intraspecific identification of Phalaenopsis equestris cultivars. Sci. Hortic. 2016, 203, 86–94.
38. Cui, X.-Q.; Tang, X.; Huang, C.-Y.; Deng, J.-L.; Li, X.-L.; Zhang, Z.-B.; Lu, J.-S. Analysis of genetic diversity of Cattleya germplasms by using ISSR markers. Southwest China J. Agric. Sci. 2020, 33, 1383–1398.
39. Lu, J.-J.; Kang, J.-Y.; Feng, S.-G.; Zhao, H.-Y.; Liu, J.-J.; Wang, H.-Z. Transferability of SSR markers derived from Dendrobium nobile expressed sequence tags (ESTs) and their utilization in Dendrobium phylogeny analysis. Sci. Hortic. 2013, 158, 8–15.
40. Li, X.; Jin, F.; Jin, L.; Jackson, A.; Huang, C.; Li, K.; Shu, X. Development of Cymbidium ensifolium genic-SSR markers and their utility in genetic diversity and population structure analysis in cymbidiums. BMC Genet. 2014, 15, 124.
41. Li, D.-M.; Zhu, G.-F. High-density genetic linkage map construction and QTLs Identification Associated with four leaf-related traits in lady’s slipper orchids (Paphiopedilum concolor × Paphiopedilum hirsutissimum). Horticulturae 2022, 8, 842.
42. Yang, F.; Guo, Y.; Li, J.; Lu, C.; Wei, Y.; Gao, J.; Xie, Q.; Jin, J.; Zhu, G. Genome-wide association analysis identified molecular markers and candidate genes for flower traits in Chinese orchid (Cymbidium sinense). Hortic. Res. 2023, 10, uhad206.
43. El Caid, M.B.; Lachheb, M.; Lagram, K.; Wang, X.; Serghini, M.A. Ecotypic variation and environmental influence on saffron (Crocus sativus L.) vegetative growth: A multivariate performance analysis. J. Appl. Res. Med. Aromat. Plants 2024, 43, 100601.
44. Yang, T.; Gao, M.; Huang, S.; Zhang, S.; Zhang, X.; Li, T.; Yu, W.; Meng, P.; Shi, Q. Genetic diversity and DNA fingerprinting of Dendrobium officinale based on ISSR and scot markers. Appl. Ecol. Environ. Res. 2023, 21, 421–438.
45. Guo, J.; Huang, Z.; Sun, J.; Cui, X.; Liu, Y. Research progress and future development trends in medicinal plant transcriptomics. Front. Plant Sci. 2021, 12, 691838.

Figure 1. Experimental design flowchart summarizing the steps from transcriptome sequencing and de novo assembly to InDel marker development and validation in Phalaenopsis-type Dendrobium.

Figure 2. Distribution of InDel lengths (15–29 bp) identified between the two Phalaenopsis-type Dendrobium varieties.

Figure 3. Banding patterns of PCR products for four representative InDel markers across 24 Phalaenopsis-type Dendrobium varieties. Each panel (A–D) corresponds to a different InDel marker (marker names are indicated in the top-left corner). A 2000 bp DNA ladder (leftmost and rightmost lanes) was used as a size standard, and the target fragment sizes ranged from approximately 260 to 280 bp. Lanes from left to right represent the following hybrid orchid varieties: Nopporn Green Star, Red Bull, Burana Emerald, Pop Eye, Burana White, Siri Gem, Burana Charming, Dendrobium Sripratum Delight, Swirl, Suriya Gold, Smile, Thongchai Gold, Enobi Purple, Leong Yok Kin, Asian Pearl, Caesar, Serene Chang ‘red’, Little Girl, Sonia Hiasakul, Pensuda, Rasa Saying, Burana Gold, Burbank Queen × Dal’s Jazz, Alice’s Stewart.

Figure 4. Polymorphism information content (PIC) values for the 38 validated InDel markers.

Figure 5. UPGMA dendrogram of the 24 Phalaenopsis-type Dendrobium varieties based on genetic distances calculated from 38 InDel markers.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
