Article

Genome Survey Sequencing of Indigofera pseudotinctoria and Identification of Its SSR Markers

1
Chongqing Academy of Animal Sciences, Rongchang, Chongqing 402460, China
2
Chongqing Crop Germplasm Rongchang Forage Resource Bank, Rongchang, Chongqing 402460, China
3
College of Grassland Science and Technology, Sichuan Agricultural University, Chengdu 611130, China
*
Author to whom correspondence should be addressed.
Genes 2025, 16(9), 991; https://doi.org/10.3390/genes16090991
Submission received: 22 June 2025 / Revised: 5 August 2025 / Accepted: 22 August 2025 / Published: 23 August 2025
(This article belongs to the Section Plant Genetics and Genomics)

Abstract

Background: Indigofera pseudotinctoria, a traditional Chinese forage and medicinal plant widely used in East Asia, holds significant economic and agricultural value. Despite this, genomic information on I. pseudotinctoria remains lacking. Methods: In this study, we used genome survey sequencing to characterize the genome of this species. Results: The genome size of I. pseudotinctoria was estimated at 637–920 Mb, with a heterozygosity rate of 0.98% and a repeat rate of 66.3%. A total of 240,659 simple sequence repeat (SSR) markers were predicted in the genome of I. pseudotinctoria. Substantial differences were observed among nucleotide repeat types: mononucleotide repeats were predominant (75.00%), whereas pentanucleotide repeats were scarce (0.21%). Furthermore, among dinucleotide and trinucleotide repeats, the sequence motifs AT/AT (66.57%) and AAT/ATT (54.15%) were particularly abundant. Among the identified unigenes, 58,790 aligned with known genes in established databases, including 33,218 genes within the Gene Ontology (GO) database and 10,893 genes in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Conclusions: This study marks the first attempt to both sequence and delineate the genomic landscape of I. pseudotinctoria. It will serve as a foundational reference for subsequent genome-wide deep sequencing and the development of SSR molecular markers for I. pseudotinctoria research.

1. Introduction

Indigofera pseudotinctoria (2n = 2x = 16), commonly known as “Chinese Indigo”, is a flowering plant species belonging to the genus Indigofera in the Fabaceae family (Leguminosae) [1]. Its native habitat spans East Asia, including specific regions in China, Japan, and Korea [2]. Historically, I. pseudotinctoria has been used as a natural dye source and as a bioaccumulator of metals such as lead and cadmium [3,4]. Furthermore, I. pseudotinctoria is an agronomically and economically valuable perennial leguminous shrub. It is particularly renowned for its medicinal properties, which are attributed to its rich store of natural antioxidants, including flavonoids, polyphenols, and amino acids [5,6]. With these attributes, I. pseudotinctoria is a crucial asset within agricultural systems.
In recent years, driven by the increasing demand for natural medicinal plants, research on I. pseudotinctoria has gained significant importance [7]. However, ongoing habitat degradation driven by human activities hampers efforts to meet this demand while maintaining sustainability [8]. As a result, domesticating and cultivating this species has become a viable solution. With advances in cultivation techniques, the focus of future efforts is likely to shift towards the development of diverse varieties. Currently, the lack of genomic information hinders the full exploration and utilization of I. pseudotinctoria.
Due to their numerous advantages, such as robust reproducibility, co-dominance, substantial abundance, and straightforward applicability, SSR markers have become indispensable tools for analyzing genetic diversity and conducting linkage mapping [9,10]. Genomic SSRs and EST-SSRs act as complementary resources in plant genome mapping. Notably, recent efforts have included the selection and validation of 44 pairs of EST-SSR markers through transcriptome sequencing in I. szechuensis [11]. That project aimed to unravel the population genetic structure of the I. szechuensis complex. While EST-SSRs are useful for genetic analysis, they are limited by their relatively lower polymorphism and by their scarcity in non-coding genomic regions [12]. In contrast, genomic SSRs display greater polymorphism and a broader genomic distribution, which contributes to improved map coverage [13]. Regrettably, no markers based on genomic sequences are currently available for I. pseudotinctoria.
Next-generation sequencing (NGS) has emerged as a powerful approach for the efficient discovery of numerous simple sequence repeat (SSR) markers [9,14]. Compared to traditional methods, this technique not only dramatically enhances sequencing throughput but also significantly reduces both time consumption and overall experimental costs. By integrating NGS with K-mer frequency analysis, genomic survey analysis allows for the estimation of key genomic features, including genome size, GC content, heterozygosity level, and repeat sequence proportion [15,16]. This method has proven effective in predicting whole-genome sizes across multiple plant species, including potato (Solanum tuberosum) [17], wheat (Triticum aestivum) [18], and rice (Oryza sativa) [19]. In this study, we applied genomic survey analysis together with flow cytometry to explore the genome architecture of I. pseudotinctoria. Our research was guided by three main objectives: firstly, to determine the genome size, GC content, and heterozygosity levels of I. pseudotinctoria; secondly, to analyze the distribution patterns of SSR motifs throughout its genome using survey sequencing data; and thirdly, to conduct functional annotation through Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and KOG pathways. The results of this investigation provide insights into the genomic characteristics of I. pseudotinctoria, laying a foundation for future large-scale genome sequencing and genetic resource development.

2. Materials and Methods

2.1. Plant Materials

I. pseudotinctoria wild accessions were collected from Chongqing, China (29.10° N, 107.37° E). A selected individual plant (Germplasm ID: XKY20150201) was developed through multiple cycles of recurrent mass selection using progenitor germplasm derived from these wild accessions (Figure 1). This germplasm served as material for subsequent analyses, including flow cytometry, genome survey sequencing, and SSR characterization. Genomic DNA was extracted from young leaf tissues of I. pseudotinctoria using a Tiangen Biotech (Beijing, China) plant genomic DNA extraction kit according to the manufacturer’s instructions. Subsequently, the integrity and concentration of the extracted DNA samples were evaluated via 1% agarose gel electrophoresis.

2.2. Genome Size Estimation by Flow Cytometry

The genome size was determined using a Beckman CytoFLEX™ flow cytometer (Beckman, Pasadena, CA, USA). The protocol was optimized and modified from the method of Doležel and Bartoš [20]. Rice (O. sativa subsp. japonica cv. Nipponbare; 2n = 2x = 24, 1C DNA content = 389 Mb, GC content = 43.6%) [21] served as the internal reference standard. Approximately 100 mg of young leaves (3 weeks old) was collected, immediately wrapped in moist filter paper, and stored in an icebox until use. For rice, 1 mL of pre-chilled Otto I buffer was added. The tissue was quickly and finely chopped using a sharp single-edge razor blade, followed by incubation at 4 °C for 5 min. Then, 0.5 mL of Otto II buffer was added to facilitate better nuclear release. For I. pseudotinctoria, 1 mL of LB01 buffer was added. The tissue was finely chopped using a sharp single-edge razor blade, followed by incubation at 4 °C for 5 min. Then, 0.5 mL of LB01 buffer was added. The lysate was filtered through a 300-mesh nylon filter, and the filtrate was collected in a 1.5 mL centrifuge tube. The sample was centrifuged at 4 °C for 5 min at 1000 rpm. Then, 100 μL of the nuclear suspension was transferred to a sterile EP tube, mixed with 50 μL of propidium iodide (PI) staining solution, and incubated at 4 °C in the dark for 30 min. After staining, 1 mL of PBS was added to resuspend the nuclei, followed by centrifugation at 4 °C for 5 min at 1000 rpm. The supernatant was discarded, and the pellet was resuspended in 400 μL of PBS. The suspension was immediately subjected to flow cytometry analysis. Nuclear suspensions were prepared at a 1:1 (test:control) ratio for flow cytometric assays. A 488 nm (blue) argon laser was employed to examine a minimum of 5000 nuclei per sample.
Fluorescence detection and subsequent data processing were performed using Kaluza software version 3.1 (www.mybeckman.cn/flow-cytometry/software/kaluza, accessed on 30 March 2024), maintaining coefficient of variation (CV) values below 5% for both peaks. The nuclear DNA content of test samples was determined using the following computational approach: Estimated genome size = [(sample G0/G1 peak mean)/(standard G0/G1 peak mean)] × standard genome size.
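The ratio formula above can be expressed as a short Python sketch. The function name and the peak means used in the example call are hypothetical and chosen only for illustration; they are not the measured values behind Table 2. Only the 389 Mb rice standard comes from the text.

```python
def estimate_genome_size_mb(sample_peak_mean, standard_peak_mean, standard_genome_mb):
    """Genome size = (sample G0/G1 peak mean / standard G0/G1 peak mean) * standard size."""
    if standard_peak_mean <= 0:
        raise ValueError("standard peak mean must be positive")
    return sample_peak_mean / standard_peak_mean * standard_genome_mb


# Internal standard size stated in the text (rice cv. Nipponbare, 1C = 389 Mb).
RICE_1C_MB = 389.0

# Hypothetical fluorescence peak means, for illustration only (not measured data).
size_mb = estimate_genome_size_mb(
    sample_peak_mean=236.5,
    standard_peak_mean=100.0,
    standard_genome_mb=RICE_1C_MB,
)
print(f"Estimated genome size: {size_mb:.1f} Mb")
```

Because the estimate scales linearly with the peak ratio, any drift in either peak mean translates directly into the genome size, which is one reason the CV threshold on both peaks matters.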

2.3. Genome Survey Sequencing and Quality Control

A paired-end library with an insert size of 220 base pairs (bp) was constructed through the controlled fragmentation of genomic DNA, following the standardized procedure established by Illumina (Beijing, China). After library preparation, Nanjing Genepioneer Biotechnologies Co., Ltd. (Nanjing, China) generated the sequence data on an Illumina HiSeq 2500 sequencing platform. Base calling was performed using Illumina CASAVA 1.8. After sequencing, filtering and sequence data correction yielded clean reads.

2.4. Genome Sequencing Assembly and GC Content

For genome assembly, we employed SOAPdenovo2 software version 1.0 (https://github.com/aquaskyline/SOAPdenovo2, accessed on 20 February 2025) and ABySS software version 2.3.10 (https://github.com/bcgsc/abyss, accessed on 20 February 2025) [22,23], using the clean reads. A K-mer size of 17 was assessed with default parameters, and the optimal K-mer size was determined based on the N50 length [24]. Given that sequences under 200 bp were prone to originating from repetitive or low-quality sources, reads exceeding a length of 200 bases were selected for subsequent contig sequence realignment. Based on the paired-end relationships between reads and contigs, scaffolds were progressively constructed using paired-end inserts. To characterize the genome, we computed the average depth and GC content for each window, producing GC-depth plots and determining repeat content by stratifying GC clusters [25]. BUSCO software version 6.0 was employed for quantitative assessment of genome assembly and annotation completeness [26].
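Two of the quantities used in this section, the N50 length and the per-window GC content, can be illustrated with a minimal Python sketch. This is not the code used by SOAPdenovo2 or ABySS; the function names and the 10 kb default window are assumptions made for illustration, since the window size is not stated in the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no sequence lengths given")


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def windowed_gc(seq, window=10_000):
    """GC content of consecutive non-overlapping windows (window size is an assumption)."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]


print(n50([2, 3, 4, 5, 6]))
print(windowed_gc("GGCCAATT", window=4))
```

Pairing each window's GC fraction with its mean read depth gives the points of a GC-depth plot.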

2.5. Genomic SSR Identification and PCR Amplification

SSR markers were identified using MISA software (version 2.1, http://pgrc.ipk-gatersleben.de/misa/, accessed on 15 March 2025). SSR motifs were identified using the following thresholds: the minimum repeat number was ten for mononucleotide motifs, six for dinucleotide motifs, and five for trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide motifs. Compound SSRs were defined as sequences containing two or more distinct microsatellite repeats located close to one another.
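The repeat thresholds above can be illustrated with a small regular-expression scan in Python. This is a simplified sketch, not MISA itself: it reports only perfect repeats, does not detect compound SSRs, and the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # Lookahead lets us test every start position, including overlapping ones.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (k, min_rep - 1))
        last_end = 0
        for m in pattern.finditer(seq):
            start, motif = m.start(1), m.group(2)
            # Skip shifted copies of a run already reported and non-primitive motifs.
            if start < last_end or not is_primitive(motif):
                continue
            end = start + len(m.group(1))
            hits.append((start, motif, (end - start) // k))
            last_end = end
    return sorted(hits)


example = "G" + "A" * 12 + "C" + "AT" * 6 + "C" + "AAG" * 5 + "C"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

The primitive-motif check keeps a run such as AAAAAAAAAAAA from being reported a second time as a dinucleotide (AA) repeat.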
Because of the abundance of SSRs, 17 SSR loci were randomly selected from the tri- and tetranucleotide microsatellites, and corresponding primer pairs were designed for PCR amplification. These primer pairs were designed using Primer Premier 5.0 software (www.premierbiosoft.com/primerdesign/, accessed on 30 May 2025) (Table 1), with all oligonucleotides conforming to the following specifications: length between 20 and 26 nucleotides, amplification product size ranging from 100 to 300 base pairs, and optimal annealing temperatures of 50–60 °C. Primers were synthesized by Zhejiang Youkang Biotechnology Co., Ltd., Huzhou, China. The 20 µL PCR reaction system contained 1 µL genomic DNA (40 ng/µL), 0.5 µL (20 pmol/L) of each forward and reverse primer, 10 µL of 2X Taq-AS PCR Mix (BestEnzymes Biotech, Lianyungang, China), and 8 µL of double-distilled water. The PCR conditions were as follows: 2 min at 94 °C; 30 cycles of 94 °C for 20 s, 60 °C for 20 s, and 72 °C for 20 s; and a final extension at 72 °C for 3 min. The final PCR product was visualized by 2% agarose gel electrophoresis at 120 V for 30 min.

2.6. Gene Prediction and Annotation

We obtained a high-quality unigene library using Trinity software v2.15.1 [27]. The resulting unigenes then underwent comprehensive bioinformatics analysis, encompassing functional annotation and categorization. To this end, we employed the BLASTx comparison tool (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome, accessed on 20 April 2025) to compare unigenes against protein databases (with an E-value threshold of ≤ 1 × 10⁻⁵) [28]. The functional annotation of each unigene was inferred from the annotation of the similar proteins it matched. The protein databases employed for this purpose comprised NR (Non-Redundant Protein Database) (https://www.ncbi.nlm.nih.gov/protein, accessed on 20 April 2025), KOG (Clusters of Orthologous Groups for Eukaryotic Complete Genomes) [29], GO (Gene Ontology Database) [30,31], and KEGG (Kyoto Encyclopedia of Genes and Genomes) [32].
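As an illustration of the E-value cutoff, the following Python sketch filters BLAST hits and keeps one hit per unigene. It assumes, as an illustration only, that results are available in the standard 12-column tabular BLAST format (query ID, subject ID, ..., E-value in column 11, bit score in column 12); the output format used in this study is not stated, and the function name and sample identifiers are hypothetical.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-5):
    """Per query, keep the highest bit-score hit among those with E-value <= max_evalue."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up example rows (hypothetical identifiers and scores).
sample = "\n".join([
    "uni1\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-30\t200",
    "uni1\tprotB\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-40\t250",
    "uni2\tprotC\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.01\t35",
])
print(best_hits(sample))
```

Unigenes whose only hits exceed the cutoff, such as the hypothetical `uni2` here, are simply left unannotated.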

3. Results

3.1. Genome Size Estimation by Flow Cytometry

Flow cytometric analysis produced a high-resolution histogram (Figure 2). Both rice and I. pseudotinctoria exhibited sharp, narrow peaks without overlap between their measurement positions. The distinct particle clusters were well separated between the two species, confirming the reliability of using rice as an internal reference. The mean coefficients of variation (CV) of I. pseudotinctoria and the internal standard rice were 4.73% and 3.56%, respectively (Table 2). The genome size of I. pseudotinctoria was estimated at approximately 920 ± 2 Mb.

3.2. Genome Sequencing and Sequence Assembly

A total of 48.952 Gb of clean bases were obtained, with Q20 and Q30 values of 98.18% and 93.51%, respectively (Table 3). The assembly comprised 553,021 sequences with a total length of 431,452,197 base pairs (bp) (Table 4). The longest assembled sequence in the I. pseudotinctoria genome extended over 90,236 bp, while the N50 length reached 3506 bp. All clean reads were employed as query sequences for BLAST (Basic Local Alignment Search Tool) analysis against the NCBI (National Center for Biotechnology Information) Nucleotide Sequence Database (NT). This analysis identified the top five matching species: Abrus precatorius (9393; 15.98%), Camellia sinensis (8869; 15.09%), Spatholobus suberectus (5615; 9.6%), Glycine soja (4172; 7.10%), and Trifolium pratense (3355; 5.70%) (Figure S1).

3.3. Genome Size Estimation and GC Content

The complete set of clean reads was used to predict the genomic attributes of I. pseudotinctoria through K-mer analysis. Based on the 17-mer frequency distribution, the genome size was estimated at 637 Mb, which is 69% of the size (920 Mb) estimated via flow cytometry. Moreover, the heterozygosity rate and repeat sequence content were determined as 0.98% and 66.30%, respectively (Figure 3 and Table 5). The genome of I. pseudotinctoria is therefore complex, with high heterozygosity and a high proportion of repeat sequences. Additionally, the genomic GC content was 34.3% (Figure 4).
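The principle behind a K-mer-based size estimate can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the pipeline used in this study: the tool and formula are not given in the text, so the sketch uses the common approximation genome size ≈ total K-mer occurrences / depth of the main peak, and the `min_depth` cutoff for excluding likely error K-mers is an assumption. Dedicated tools additionally model heterozygous and repeat peaks.

```python
from collections import Counter


def kmer_spectrum(reads, k=17):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # ignore k-mers containing ambiguous bases
                counts[kmer] += 1
    return Counter(counts.values())


def estimate_genome_size(spectrum, min_depth=2):
    """Approximate genome size as total k-mer occurrences / main peak depth.

    min_depth (an assumed cutoff) discards low-multiplicity k-mers,
    which are mostly sequencing errors.
    """
    usable = {depth: n for depth, n in spectrum.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(usable, key=lambda depth: usable[depth])
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy spectrum: 50 error k-mers seen once, 100 genomic k-mers seen 10 times.
print(estimate_genome_size({1: 50, 10: 100}))
```

In a highly heterozygous, repeat-rich genome such as the one described here, the choice of the main peak is less clear-cut than in this toy spectrum, which is one reason K-mer and flow cytometry estimates can differ.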

3.4. Identification and Verification of SSRs

A total of 240,659 SSRs were identified in the draft genome sequence of I. pseudotinctoria (Table 6). Mononucleotide SSRs were the most prevalent, constituting 75.00% (180,491) of all SSRs. These were followed by dinucleotide SSRs (35,978; 14.95%), trinucleotide SSRs (20,213; 8.40%), tetranucleotide SSRs (3140; 1.30%), pentanucleotide SSRs (515; 0.21%), and hexanucleotide SSRs (322; 0.13%) (Table 6 and Figure 5). Among the dinucleotide repeats, the AT/AT motif predominated, accounting for 66.57% of the repeat units, followed by AG/CT (18.41%) and AC/GT (14.55%) (Figure 6A). For trinucleotide repeats, the most common motifs were AAT/ATT, AAG/CTT, AAC/GTT, and AGG/CCT, at 54.15%, 21.35%, 8.47%, and 2.03%, respectively (Figure 6B).
Given the numerous SSRs identified, we randomly selected 17 tri- and tetranucleotide microsatellite loci from the draft genome of I. pseudotinctoria and designed corresponding primer pairs. These selected loci predominantly contained five or six motif repeats. The amplification results demonstrated that 16 out of the 17 SSR loci could be successfully amplified (Figure 7), confirming the reliability and reproducibility of our genomic SSR marker identification.
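The share of each SSR class can be recomputed directly from the counts reported in this section. The short Python sketch below does so; the dictionary and function names are hypothetical, while the counts are those given in the text.

```python
# SSR counts per repeat class, as reported in the text (Table 6).
SSR_COUNTS = {
    "mononucleotide": 180_491,
    "dinucleotide": 35_978,
    "trinucleotide": 20_213,
    "tetranucleotide": 3_140,
    "pentanucleotide": 515,
    "hexanucleotide": 322,
}


def class_shares(counts):
    """Percentage of the total contributed by each class, rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


print(sum(SSR_COUNTS.values()))
for name, share in class_shares(SSR_COUNTS).items():
    print(f"{name}: {share}%")
```

A check of this kind is a quick way to confirm that class counts sum to the reported total before the percentages are tabulated.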

3.5. Gene Prediction and Annotation

A comprehensive total of 58,790 unigenes, constituting 98.91% of all unigenes, were meticulously matched and subsequently annotated within the Nr homologous database (Figure S1 and Table S1). Overall, 33,218 putative genes were classified into KOG functional categories (Figure 8A), among which the largest cluster was general function prediction (17,453; 52.54%), followed by post-translational modification (1867; 5.62%), signal transduction mechanisms (1497; 8.58%), and transcription (1086; 6.22%) (Table S2). Utilizing sequence homology, the 28,111 assembled transcripts were systematically assigned into GO terms, including biological process, cellular component, and molecular function (Figure 8B). Within biological processes, the category predominantly represented was cellular processes (18,916; 23.98%), succeeded by metabolic processes (17,548; 20.98%) and biological regulation (8241; 17.59%). For cellular components, the most prevalent category was the cell part (22,173; 24.52%), followed by organelle (17,039; 18.84%) and membrane (8826; 9.76%). In terms of molecular function, two major categories stood out: binding (15,191; 19.98%) and catalytic activity (15,100; 19.87%).
Furthermore, putative genes were assigned to 131 KEGG pathways, yielding a total of 57,636 associations (Figure 8C), among which 10,893 genes were associated with 20 metabolic pathways. Translation was the most prominent category (3025; 27.8%), followed by carbohydrate metabolism (1261; 11.58%), amino acid metabolism (739; 6.8%), and lipid metabolism (610; 5.6%). Additionally, organismal systems pathways, environmental information processing, genetic information processing, and metabolism KEGG pathways were linked to 332, 661, 2623, and 4549 genes, respectively.

4. Discussion

This study advances our understanding of the complex genomics of I. pseudotinctoria, a traditional Chinese botanical species of considerable economic value due to its applications in forage production and medicine. The relevance of our investigation is underscored by the paucity of genomic information for this species. Flow cytometry yielded a genome size of 920 Mb for I. pseudotinctoria (Table 2), whereas K-mer-based survey analysis estimated it to be 637 Mb (Table 5). The genome size measured by flow cytometry is larger than that obtained through K-mer analysis, a trend consistent with findings in other plant species such as Parrotia [33], Sophora alopecuroides [34], Cucumis sativus [35], and Panax ginseng [36]. Flow cytometry uses dissociated and stained nuclear suspensions as test samples, and thus the results can be influenced by plant cellular structures and secondary metabolites [37]. In contrast, K-mer-based survey methods are not affected by endogenous cellular substances; however, during second-generation sequencing, some genomic fragments may be lost due to the fragmentation and assembly processes, resulting in an underestimation of the genome size. Notably, the observed heterozygosity rate of 0.98% and repeat rate of 66.3% together indicate a genome with high heterozygosity and a high proportion of repeat sequences (Figure 3 and Table 5). This heterogeneity is expected to contribute to the species’ overall genetic diversity, thereby enhancing its capacity to adapt to and interact with its environment, which is consistent with its significant agricultural role.
Simple sequence repeat (SSR) markers are valuable tools for assessing population structure and genetic diversity within native populations of I. pseudotinctoria, as well as for evaluating genetic perturbations caused by non-native conspecifics. A previous study by Fan et al. [38] used amplified fragment length polymorphism (AFLP) technology to investigate the genetic diversity, differentiation, and structure of I. pseudotinctoria genotypes. It is important to note that there are significant methodological differences between AFLP (a dominant marker) and SSR (a co-dominant marker) techniques, primarily due to their fundamentally distinct patterns of genetic inheritance [39]. Previous studies have used microsatellite markers to assess genetic disturbance in native populations of I. pseudotinctoria resulting from the introduction of non-native conspecifics [40]. However, these studies were limited in their ability to evaluate the entire genome of I. pseudotinctoria. In the current study, a notable finding is the identification of 240,659 SSR loci within the genome of I. pseudotinctoria (Table 6). The observed variations in nucleotide repeat abundance indicated that mononucleotide repeats are the most abundant, while pentanucleotide repeats are conspicuously rare (Table 6). These patterns reflect the complex evolutionary forces that have shaped the genomic landscape of this species. Moreover, the prominence of specific sequence motifs, such as AT/AT and AAT/ATT within dinucleotide and trinucleotide repeats, respectively (Figure 6), suggests their potential involvement in critical functional aspects of the genome. Furthermore, SSR markers developed from functional genes are expected to significantly enhance the precision of marker-assisted selection and association mapping [41].
Genome-wide analysis identified 11,027 Pfam-annotated genes harboring SSRs, with enrichment analysis revealing that the top 10 functional domains [e.g., pentatricopeptide repeats (PPRs), protein kinases, and Myb genes] were associated with plant growth regulation and stress responses (Figure S2). These findings suggest SSR-containing genes may mediate important biological functions, providing valuable genetic markers for I. pseudotinctoria breeding programs.
GC content has been identified as one of three key factors contributing to sequencing bias on the Illumina sequencing platform [25]. GC content outside the optimal range (25–65%) results in uneven sequencing coverage, thereby compromising the accuracy and completeness of genome assembly [42]. In the present study, the GC content of I. pseudotinctoria (34.3%) was found to be moderate, which aligns with the typical range of 30% to 47% observed in most plant species [1,43]. Consequently, this moderate GC content is not expected to have significantly affected the quality of the genome sequencing [44].
Our investigation goes beyond mere sequence analysis by identifying 58,790 unigenes that align with known genes in widely recognized databases, particularly the GO and KEGG (Figure 8). This expansion enhances our comprehension of the underlying biological mechanisms of I. pseudotinctoria, offering insights into its molecular functionalities and potential pathways. The substantial representation of genes within these databases underscores the biological richness of this species, amplifying its significance in both agricultural and medicinal contexts [45]. Notably, our study highlights the presence of numerous genes mapped onto key pathways of KEGG within I. pseudotinctoria, potentially shedding light on its ability to synthesize essential amino acids, secondary metabolites, and bioactive compounds. This is particularly relevant when considering the plant’s historical medicinal application. Amino acids, beyond their primary roles, also serve as precursors in the biosynthesis of diverse secondary metabolites [46]. In I. pseudotinctoria, secondary metabolites likely play a vital role in protecting the plant against pathogens and herbivores, a trait that is notably consistent with its historical medicinal use [5]. Consequently, the identified genetic repertoire not only provides valuable insights for exploring the realm of secondary metabolites but also holds the potential to unearth distinctive bioactive compounds that might find application in various fields, including phytochemicals with potential health benefits, thus aligning with its medicinal traditions.

5. Conclusions

In conclusion, this study presents the first genome survey of I. pseudotinctoria and lays the groundwork for future studies. The genetic insights gained here will serve as a cornerstone for subsequent genome-wide investigations, deepening our understanding of this species’ biology. Furthermore, the identification of SSR markers holds promise for advancing breeding programs and molecular research. This study contributes to the fields of genomics, botany, and agriculture, providing a resource for the further exploration and utilization of I. pseudotinctoria.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes16090991/s1, Figure S1: Distribution of I. pseudotinctoria gene in Nr homologous database; Figure S2: Top 10 functional terms of genes containing SSR; Table S1: Basic function annotation; Table S2: KOG functional categories.

Author Contributions

Conceptualization, J.C. and Y.F.; methodology, J.C. and Q.R.; software, Y.X.; validation, J.C., Q.R. and W.H.; resources, Y.F.; data curation, J.Z. and X.M.; writing—original draft preparation, J.C.; writing—review and editing, J.C., J.Z. and W.H.; supervision, Y.F.; project administration, Q.R. and Y.F.; funding acquisition, Q.R. and Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Livestock Science and Technology Innovation Team Cultivation Project (grant number 22535C), Chongqing Technology Innovation and Application Development Project (grant number CSTB2023TIAD-KPX0024) and Chongqing performance incentive guide special project (grant number CSTB2023JXJL-YFX0034).

Data Availability Statement

Raw data and the genome assembly from this study were deposited in China National GeneBank Database under the ID: CNP0007515.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhou, S.M.; Wang, F.; Yan, S.Y.; Zhu, Z.M.; Gao, X.F.; Zhao, X.L. Phylogenomics and plastome evolution of Indigofera (Fabaceae). Front. Plant Sci. 2023, 14, 1186598.
  2. Cho, S.E.; Zhao, T.T.; Choi, I.Y.; Choi, Y.J.; Shin, H.D. First Report of Powdery Mildew Caused by Erysiphe trifoliorum on Indigofera amblyantha in Korea. Plant Dis. 2016, 100, 1954.
  3. Schrire, B. A review of tribe Indigofereae (Leguminosae–Papilionoideae) in Southern Africa (including South Africa, Lesotho, Swaziland & Namibia; excluding Botswana). S. Afr. J. Bot. 2013, 89, 281–283.
  4. Zhao, J.M.; Chen, J.; Xiong, Y.; He, W.; Xiong, Y.L.; Xu, Y.D.; Ma, H.Z.; Yu, Q.Q.; Li, Z.; Liu, L.; et al. Organelle genomes of Indigofera amblyantha and Indigofera pseudotinctoria: Comparative genome analysis, and intracellular gene transfer. Ind. Crops Prod. 2023, 198, 116674.
  5. Gerometta, E.; Grondin, I.; Smadja, J.; Frederich, M.; Gauvin-Bialecki, A. A review of traditional uses, phytochemistry and pharmacology of the genus Indigofera. J. Ethnopharmacol. 2020, 253, 112608.
  6. Bakasso, S.; Lamien-Meda, A.; Lamien, C.E.; Kiendrebeogo, M.; Millogo, J.; Ouedraogo, A.G.; Nacoulma, O.G. Polyphenol contents and antioxidant activities of five Indigofera species (Fabaceae) from Burkina Faso. Pak. J. Biol. Sci. 2008, 11, 1429–1435.
  7. Wang, W.L.; Xu, J.F.; Fang, H.Y.; Li, Z.J.; Li, M.H. Advances and challenges in medicinal plant breeding. Plant Sci. 2020, 298, 110573.
  8. Chen, S.L.; Yu, H.; Luo, H.M.; Wu, Q.; Li, C.F.; Steinmetz, A. Conservation and sustainable use of medicinal plants: Problems, progress, and prospects. Chin. Med. 2016, 11, 37.
  9. Taheri, S.; Abdullah, T.L.; Yusop, M.R.; Hanafi, M.M.; Sahebi, M.; Azizi, P.; Shamshiri, R.S. Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants. Molecules 2018, 23, 399.
  10. Younis, A.; Ramzan, F.; Ramzan, Y.; Zulfiqar, F.; Ahsan, M.; Lim, K.B. Molecular Markers Improve Abiotic Stress Tolerance in Crops: A Review. Plants 2020, 9, 1347.
  11. Guo, L.N.; Gao, X.F. Genetic diversity and population structure of Indigofera szechuensis complex (Fabaceae) based on EST-SSR markers. Gene 2017, 624, 26–33.
  12. Ellis, J.R.; Burke, J.M. EST-SSRs as a resource for population genetic analyses. Heredity 2007, 99, 125–132.
  13. Zhang, L.; Yuan, D.; Yu, S.; Li, Z.; Cao, Y.; Miao, Z.; Qian, H.; Tang, K. Preference of simple sequence repeats in coding and non-coding regions of Arabidopsis thaliana. Bioinformatics 2004, 20, 1081–1086.
  14. Zalapa, J.E.; Cuevas, H.; Zhu, H.Y.; Steffan, S.; Senalik, D.; Zeldin, E.; Mccown, B.; Harbut, R.; Simon, P. Using next-generation sequencing approaches to isolate simple sequence repeat (SSR) loci in the plant sciences. Am. J. Bot. 2012, 99, 193–208.
  15. Chen, Q.F.; Lan, C.W.; Zhao, L.; Wang, J.X.; Chen, B.S.; Chen, Y.P. Recent advances in sequence assembly: Principles and applications. Brief. Funct. Genom. 2017, 16, 361–378.
  16. Armstrong, J.; Fiddes, I.T.; Diekhans, M.; Paten, B. Whole-Genome Alignment and Comparative Annotation. Annu. Rev. Anim. Biosci. 2019, 7, 41–64.
  17. Liu, Y.H.; Zeng, Y.T.; Li, Y.M.; Liu, Z.; Wang, K.L.; Espley, R.V.; Allan, A.C.; Zhang, J.L. Genomic survey and gene expression analysis of the MYB-related transcription factor superfamily in potato (Solanum tuberosum L.). Int. J. Biol. Macromol. 2020, 164, 2450–2464.
  18. Li, Y.L.; Sun, A.L.; Wu, Q.; Zou, X.X.; Chen, F.L.; Cai, R.Q.; Xie, H.; Zhang, M.; Guo, X.H. Comprehensive genomic survey, structural classification and expression analysis of C2H2-type zinc finger factor in wheat (Triticum aestivum L.). BMC Plant Biol. 2021, 21, 380.
  19. Ouyang, Y.D.; Huang, X.L.; Lu, Z.H.; Yao, J.L. Genomic survey, expression profile and co-expression network analysis of OsWD40 family in rice. BMC Genom. 2012, 13, 100.
  20. Doležel, J.; Bartoš, J. Plant DNA Flow Cytometry and Estimation of Nuclear Genome Size. Ann. Bot. 2005, 95, 99–110.
  21. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 2005, 436, 793–800.
  22. Miller, J.R.; Koren, S.; Sutton, G. Assembly algorithms for next-generation sequencing data. Genomics 2010, 95, 315–327.
  23. Simpson, J.T.; Wong, K.; Jackman, S.D.; Schein, J.E.; Jones, S.J.M.; Birol, I. ABySS: A parallel assembler for short read sequence data. Genome Res. 2009, 19, 1117.
  24. Varshney, R.K.; Chen, W.B.; Li, Y.P.; Bharti, A.K.; Saxena, R.K.; Schlueter, J.A.; Donoghue, M.T.A.; Azam, S.; Fan, G.Y.; Whaley, A.M.; et al. Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat. Biotechnol. 2011, 30, 83–89.
  25. Cheung, M.S.; Down, T.A.; Latorre, I.; Ahringer, J. Systematic bias in high-throughput sequencing data and its correction by BEADS. Nucleic Acids Res. 2011, 39, e103.
  26. Simão, F.A.; Waterhouse, R.M.; Ioannidis, P.; Kriventseva, E.V.; Zdobnov, E.M. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 2015, 31, 3210–3212.
  27. Grabherr, M.G.; Haas, B.J.; Yassour, M.; Levin, J.Z.; Thompson, D.A.; Amit, I.; Adiconis, X.; Fan, L.; Raychowdhury, R.; Zeng, Q.D.; et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011, 29, 644–652.
  28. Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, J.H.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997, 25, 3389–3402. [Google Scholar] [CrossRef]
  29. Galperin, M.Y.; Vera Alvarez, R.; Karamycheva, S.; Makarova, K.S.; Wolf, Y.I.; Landsman, D.; Koonin, E.V. COG database update 2024. Nucleic Acids Res. 2024, 53, 356–363. [Google Scholar] [CrossRef]
  30. Ashburner, M.; Ball, C.A.; Blake, J.A.; Botstein, D.; Butler, H.; Cherry, J.M.; Davis, A.P.; Dolinski, K.; Dwight, S.S.; Eppig, J.T.; et al. Gene Ontology: Tool for the unification of biology. Nat. Genet. 2000, 25, 25–29. [Google Scholar] [CrossRef]
  31. Consortium, G.O. The Gene Ontology knowledgebase in 2023. Genetics 2023, 224, iyad031. [Google Scholar] [CrossRef]
  32. Kanehisa, M.; Furumichi, M.; Sato, Y.; Matsuura, Y.; Ishiguro-Watanabe, M. KEGG: Biological systems database as a model of the real world. Nucleic Acids Res. 2024, 53, 672–677. [Google Scholar] [CrossRef]
  33. Zhang, Y.Y.; An, Y.; Lin, F.; Ma, Q.Y.; Zhou, X.Y.; Jin, L.; Li, P.F.; Wang, Z.S. Estimation of Genome Size of Parrotia C. A. Mey. by Flow Cytometry and K-mer Analysis. J. Plant Genet. Resour. 2020, 22, 561–570. [Google Scholar]
  34. Huang, A.J.; Zhou, J.Y.; Li, T.Z.; Xing, Y.D.; Gao, F.; Zhou, Y.J. Flow cytometry and K-mer analysis estimates of genome size of Sophora alopecuroide. Chin. Tradit. Herb. Drugs 2019, 50, 6098–6102. [Google Scholar]
  35. Huang, S.W.; Li, R.Q.; Zhang, Z.H.; Li, L.; Gu, X.F. The genome of the cucumber, Cucumis sativus L. Nat. Genet. 2009, 41, 1275–1281. [Google Scholar]
  36. Zhang, X.Y.; Liu, Z.X.; Liao, B.S.; Xiao, S.M.; Xu, J.; Sheng, W. Estimation of Genome Size of Ginseng Based on Herbgenomics by Flow Cytometric Analysis and High-throughput Sequence. World Sci. Technol./Mod. Tradit. Chin. Med. Mater. Medica 2017, 19, 1724–1728. [Google Scholar]
  37. Temsch, E.M.; Koutecký, P.; Urfus, T.; Šmarda, P.; Doležel, J. Reference standards for flow cytometric estimation of absolute nuclear DNA content in plants. Cytom. Part A 2022, 101, 710–724. [Google Scholar] [CrossRef]
  38. Fan, Y.; Zhang, C.L.; Wu, W.D.; He, W.; Zhang, L.; Ma, X. Analysis of Genetic Diversity and Structure Pattern of Indigofera Pseudotinctoria in Karst Habitats of the Wushan Mountains Using AFLP Markers. Molecules 2017, 22, 1734. [Google Scholar] [CrossRef]
  39. Amiteye, S. Basic concepts and methodologies of DNA marker systems in plant molecular breeding. Heliyon 2021, 7, e08093. [Google Scholar] [CrossRef]
  40. Otao, T.; Kobayashi, T.; Uehara, K. Development and characterization of 14 microsatellite markers for Indigofera pseudotinctoria (Fabaceae). Appl. Plant Sci. 2016, 4, apps.1500110. [Google Scholar] [CrossRef]
  41. Du, Q.Z.; Pan, W.; Xu, B.H.; Li, B.L.; Zhang, D.Q. Polymorphic simple sequence repeat (SSR) loci within cellulose synthase (PtoCesA) genes are associated with growth and wood properties in Populus tomentosa. New Phytol. 2013, 197, 763–776. [Google Scholar] [CrossRef]
  42. Shangguan, L.F.; Han, J.; Kayesh, E.; Sun, X.; Zhang, C.Q.; Pervaiz, T.; Wen, X.C.; Fang, J.G. Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags. PLoS ONE 2013, 8, e69890. [Google Scholar] [CrossRef]
  43. Li, G.Q.; Song, L.X.; Jin, C.Q.; Li, M.; Gong, S.P.; Wang, Y.F. Genome survey and SSR analysis of Apocynum venetum. Biosci. Rep. 2019, 39, BSR20190146. [Google Scholar] [CrossRef] [PubMed]
  44. Claros, M.G.; Rocío, B.; Darío, G.F.; Benzerki, H.; Noé, F.P. Why Assembling Plant Genome Sequences Is So Challenging. Biology 2012, 1, 439–459. [Google Scholar] [CrossRef]
  45. Baxevanis, A.D.; Bateman, A. The Importance of Biological Databases in Biological Discovery. Curr. Protoc. Bioinform. 2015, 50, 1–8. [Google Scholar] [CrossRef] [PubMed]
  46. Chandel, N.S. Amino Acid Metabolism. Cold Spring Harb. Perspect. Biol. 2021, 13, a040584. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The morphological characteristics of I. pseudotinctoria. (A) The plant; (B) the leaves; (C) the tree trunk; (D) the flower; (E) the branch with inflorescence; (F) the branch with silique.
Figure 2. Genome size of I. pseudotinctoria analyzed by flow cytometry, with rice as the internal control. (A) Flow scatter diagram of cell mixed suspension. The x-axis represents the signal of forward scatter area (FSC-A), the y-axis represents the signal of side scatter area (SSC-A). Region B (red): G0/G1 nuclei of I. pseudotinctoria. Region C (pink purple): G0/G1 nuclei of rice. (B) Histogram of relative fluorescence intensity derived from nuclei isolated from rice and I. pseudotinctoria processed simultaneously. The x-axis represents the fluorescence intensity of PI fluorescence area (PE-A), the y-axis represents the number of cells. Peak A (rose red): G0/G1 nuclei of I. pseudotinctoria, Peak D (purple): G0/G1 nuclei of rice.
Figure 3. K-mer (k = 17) distribution curve of I. pseudotinctoria. The x-axis shows K-mer depth; the y-axis shows the proportion of K-mers at that depth, i.e., the frequency at that depth divided by the total frequency across all depths.
Figure 4. Average sequencing depth and GC content of I. pseudotinctoria. The x-axis represents the GC content and the y-axis represents the sequencing depth. The distribution of sequencing depth is shown on the right side, and the distribution of GC content is shown at the top. Each dot represents a contig; color from red through yellow to gray indicates dot density from high to low.
Figure 5. Distribution and frequency of SSR motif repeat numbers. The x-axis shows the number of repeats, the y-axis shows the microsatellite class (by motif length), and the z-axis shows the frequency of each microsatellite class.
Figure 6. Identification and characteristics of simple sequence repeat motifs. (A) Percentage of different motifs among dinucleotide repeats in I. pseudotinctoria; (B) percentage of different motifs among trinucleotide repeats in I. pseudotinctoria.
Figure 7. Genomic SSR-PCR amplification products from 17 primer pairs resolved by 2% agarose gel electrophoresis at 120 V.
Figure 8. Functional annotation of genes aligned by BLAST against the KOG, GO, and KEGG databases. (A) KOG functional categories; (B) Gene Ontology classification, with genes assigned to three categories: cellular components, molecular functions, and biological processes; (C) gene assignment to KEGG functional categories in I. pseudotinctoria.
Table 1. Primers for the 17 SSR markers used in the present study.
Locus | Repeat Motif | Primer Sequences (5′-3′), Forward Followed by Reverse
IP380501 | (GTT)5 | AATTTTTCCACGGGGTCTTCGTTGGTTTTATCCGTCGCTT
IP553716 | (ATA)5 | ATTGGTTGTGTGGACCGAATTCAAATTATTCCCTTATTCAAATTCA
IP885913 | (ATTT)5 | AATACAGGTGAGCAGTGCGATGAAATTCCACCACAATGGA
IP1040384 | (AAAT)5 | AATTGTCCTCGTGTTGTGAGGAATGGTGCGAATTTTATGCTT
IP1047571 | (GAT)5 | TCCTAAGCCACCACAAATCCCCATCTCCTACCTTCCAACTTC
IP1227591 | (AGA)5 | ACGAATCAGAAGAACAGGGCTCTCTCACAAACACCGACCA
IP1434377 | (GGC)5 | TTCGATTTGGATTTGCACTGAGAATGTTCTGCACCGTTCC
IP2408442 | (TATT)7 | CGCTGTTTAGGTTAACATTCCAACATCCCCATTAACTCAACATAG
IP10099562 | (TAC)5 | GGCCCTTTTCATTCCTTTTCACAACAAGGAGCTCTTCCCA
IP10099615 | (CTT)8 | TGCAGCAATGATGACATCTGTTGGCACCACATCAAACAGT
IP10125461 | (AAT)7 | GGAAGCTACTCTGCATCGGACATGCTCATCTCAGGCATGT
IP10125944 | (TCT)6 | ACCATTAGGCAGAGAGGCAATTGCACATGATTCGTTCTCC
IP10130648 | (TAC)5 | TGTCAGCTTTTGAAGCATGGGGCCAAAAGTGCAACATTCT
IP10130698 | (CCT)6 | CCTCCACCTCCCATGTAGAAAGCCACAAGCTACCTCAGGA
IP10130710 | (CAA)5 | GGGGTTATTCAGTCCCGTTTGACGCGACCCAATTGTAACT
IP10130769 | (GAT)6 | CGAGAGGTTAGGGGGAGATTCCCACAAATTAAGGGCATGA
IP10130792 | (ACA)7 | TTGCCACAAATACGCAAAAATTCTCAGGTCTGCTCTCGCT
Table 2. Statistics of flow cytometry data.
Genome Size (Mb), Mean ± SD | CV (%) of Standard | CV (%) of Sample
920 ± 2 | 4.73 | 3.56
Table 3. Statistics of sequencing data and quality assessment of I. pseudotinctoria.
Number of Raw Reads | Raw Base (Gbp) | Clean Base (Gbp) | Q20 (%) | Q30 (%) | GC Content (%)
365,359,150 | 54.804 | 48.952 | 98.18 | 93.51 | 35.96
Abbreviations: Q20, percentage of bases with quality value ≥ 20; Q30, percentage of bases with quality value ≥ 30.
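As a reading aid for the Q20 and Q30 columns, the following minimal sketch converts a Phred quality value to a base-call error probability using the standard relation Q = −10·log10(P). The function name is illustrative and is not taken from the authors' pipeline.

```python
def phred_to_error_prob(q: float) -> float:
    """Return the base-call error probability for Phred quality value q."""
    return 10 ** (-q / 10)


# Q20 corresponds to a 1-in-100 error rate, Q30 to a 1-in-1000 error rate.
for q in (20, 30):
    print(f"Q{q}: error probability = {phred_to_error_prob(q):.4f}")
```

Read this way, a Q30 share of 93.51% means that roughly 93.5% of bases have an estimated error probability of at most 0.001.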
Table 4. Statistics of assembled genome sequences for I. pseudotinctoria.
Total Length (bp) | Total Number | Max Length (bp) | N50 Length (bp) | N75 Length (bp) | GC Content (%)
431,452,197 | 553,021 | 90,236 | 3506 | 763 | 34.3
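The N50 and N75 values above follow the usual definition: the contig length at which contigs of that length or longer cover at least 50% (or 75%) of the total assembly. The sketch below illustrates that definition on made-up contig lengths; it is not the authors' code, and the toy data are hypothetical.

```python
def n_statistic(lengths, fraction=0.5):
    """Return the contig length L such that contigs of length >= L
    together cover at least `fraction` of the total assembly length."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    raise ValueError("lengths must be non-empty")


toy_contigs = [100, 200, 300, 400, 500]  # hypothetical lengths, not the real assembly
print("N50:", n_statistic(toy_contigs, 0.50))
print("N75:", n_statistic(toy_contigs, 0.75))
```

Because N75 requires covering a larger share of the assembly, it can never exceed N50, which matches the ordering of the two values in the table.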
Table 5. Statistical data from the 17-mer analysis.
K-mer | Depth | n_kmer | Genome Size (bp) | Heterozygous Ratio | Repeat Sequence Content
17 | 68 | 43,729,058,443 | 6.37 × 10^8 | 0.98% | 66.30%
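The genome size in a K-mer survey is commonly approximated as the total number of K-mers divided by the main-peak depth. The sketch below applies that ratio to the two values reported in the table. It yields roughly 643 Mb, slightly above the tabulated 6.37 × 10^8 bp; the difference presumably reflects a correction step (for example, discounting erroneous K-mers) that is not described in this section, so treat that explanation as an assumption.

```python
n_kmer = 43_729_058_443  # total number of 17-mers (from the 17-mer statistics)
depth = 68               # main-peak K-mer depth (from the 17-mer statistics)

# Uncorrected estimate: total K-mers divided by peak depth.
raw_estimate_bp = n_kmer / depth
print(f"Uncorrected estimate: {raw_estimate_bp / 1e6:.1f} Mb")
```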
Table 6. Simple sequence repeat types detected in the I. pseudotinctoria genome.
Searching Item | Number | Ratio (%)
Total number of sequences examined | 548,189 |
Total size of examined sequences (bp) | 430,490,629 |
Total number of identified SSRs | 240,659 | 100
Number of SSR-containing sequences | 122,195 | 50.78
Number of sequences containing more than 1 SSR | 46,239 | 19.21
Number of SSRs present in compound formation | 28,086 | 11.67
Mononucleotide | 180,491 | 75.00
Dinucleotide | 35,978 | 14.95
Trinucleotide | 20,213 | 8.40
Tetranucleotide | 3140 | 1.30
Pentanucleotide | 515 | 0.21
Hexanucleotide | 322 | 0.13
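The per-class ratios can be checked directly from the counts. The short sketch below recomputes each repeat class's share of all identified SSRs; the dictionary simply transcribes the counts listed above, and the variable names are illustrative.

```python
ssr_counts = {
    "Mononucleotide": 180_491,
    "Dinucleotide": 35_978,
    "Trinucleotide": 20_213,
    "Tetranucleotide": 3_140,
    "Pentanucleotide": 515,
    "Hexanucleotide": 322,
}

total_ssrs = sum(ssr_counts.values())
# Share of each repeat class among all identified SSRs, in percent.
ssr_ratios = {name: 100 * n / total_ssrs for name, n in ssr_counts.items()}

print(f"Total SSRs: {total_ssrs:,}")
for name, ratio in ssr_ratios.items():
    print(f"{name}: {ratio:.2f}%")
```

The six class counts sum to the reported total of 240,659 SSRs, so the class ratios should sum to 100% apart from rounding.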
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Chen, J.; Ran, Q.; Xu, Y.; Zhao, J.; Ma, X.; He, W.; Fan, Y. Genome Survey Sequencing of Indigofera pseudotinctoria and Identification of Its SSR Markers. Genes 2025, 16, 991. https://doi.org/10.3390/genes16090991