Inflorescence Transcriptome Sequencing and Development of New EST-SSR Markers in Common Buckwheat (Fagopyrum esculentum)

Common buckwheat (Fagopyrum esculentum Moench) is known for its adaptability, good nutrition, and medicinal and health care value. However, genetic studies of buckwheat have been hindered by limited genomic resources and genetic markers. In this study, Illumina HiSeq 4000 high-throughput sequencing technology was used to sequence the inflorescence transcriptomes of green-flower common buckwheat (Gr) with coarse pedicels and white-flower Ukrainian daliqiao (UD) with fine pedicels. A total of 118,448 unigenes were obtained, with an average length of 1248 bp and an N50 of 1850 bp. A total of 39,432 differentially expressed genes (DEGs) were identified, and most DEGs in the porphyrin and chlorophyll metabolism pathway were significantly upregulated in Gr. A total of 17,579 sequences containing SSR loci were detected, and 20,756 EST-SSR loci were found. The distribution frequency of EST-SSR in the transcriptome was 17.52%, and the average distribution density was 8.21 kb. A total of 224 pairs of primers were randomly selected for synthesis and tested on 35 varieties of common buckwheat and 13 varieties of Tartary buckwheat. The clustering results supported the previous conclusion that common buckwheat and Tartary buckwheat have a distant genetic relationship. The EST-SSR markers identified and developed in this study will help enrich the transcriptome information for buckwheat and support marker-assisted selection breeding.


Introduction
Common buckwheat (Fagopyrum esculentum Moench) is a medicinal and edible crop that belongs to the eudicot family Polygonaceae [1]. Owing to its short growing period, wide adaptability to different geographical environments, and strong resistance to extreme climates, common buckwheat is widely cultivated in temperate regions of Asia, Europe, and North America [2]. Common buckwheat seeds are rich in protein, fat, starch, vitamins, rutin, mineral elements, and vegetable cellulose, and they have preventive and therapeutic effects on cardiovascular diseases, diabetes, and constipation [3]. Therefore, a large number of studies have focused mainly on the biologically active ingredients of buckwheat, such as flavonoids and flavones [4], phytosterols [5], and fagopyrins [6].
Although common buckwheat has high nutritional value, the seed yield is low because of its self-incompatibility. The lack of genomic resources and tightly linked markers of important agronomic genes of buckwheat is an important factor restricting the molecular

RNA Sequencing and Functional Annotation of Unigenes
To establish the transcriptome library, total RNA was extracted from two replicates each of Gr_F and UD_F and sequenced on the Illumina HiSeq platform. The raw reads were deposited in the NCBI Sequence Read Archive (SRA) under the accession numbers SRR17325563, SRR17325562, SRR17325561, and SRR17325560. There were 48,327,536 raw reads for Gr_F_1 and 42,364,332 raw reads for Gr_F_2, while there were 54,440,478 raw reads for UD_F_1 and 51,885,264 raw reads for UD_F_2. After removing adaptors and low-quality data, Gr_F_1, Gr_F_2, UD_F_1, and UD_F_2 produced 47,630,570, 41,788,726, 52,870,344, and 50,344,286 clean reads, respectively. The GC content was 45.61%, 45.58%, 45.52%, and 45.54%, respectively, and the Q20 values were all greater than 97% (Table S1).
From the clean reads, Trinity software generated 177,125 transcripts with an average length of 932 bp and an N50 of 1657 bp; the minimum length was 201 bp and the maximum length was 16,788 bp. After Corset hierarchical clustering, the longest transcript in each cluster was taken as a unigene, which yielded 118,448 unigenes with an average length of 1248 bp and an N50 of 1850 bp; the minimum and maximum lengths were the same as for the transcripts, and the median length was 882 bp. By length interval, the largest class was 501 to 1000 bp (34,910 unigenes) and the smallest class was less than 301 bp (9402 unigenes) (Table 1). The 118,448 assembled unigenes were searched against seven databases to obtain comprehensive gene function information. The results showed that 67,950 (57.36%) of the unigenes had significant similarity to entries in the NCBI non-redundant (Nr) database, 43,056 (36.35%) to entries in the NT database, and 57,262 (48.34%) to entries in the Swiss-Prot database. In total, 10,798 (9.11%) unigenes were annotated in all seven databases and 77,428 (65.36%) unigenes were successfully annotated in at least one database (Table S2).
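For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it can be computed from a list of sequence lengths. The function names and the toy lengths are illustrative assumptions; they are not the assembly data or the script used in this study.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def mean_length(lengths):
    """Arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Toy lengths (hypothetical, for illustration only).
toy_lengths = [201, 350, 882, 1248, 1850, 2400, 3100]
print("N50:", n50(toy_lengths))
print("mean length:", round(mean_length(toy_lengths)))
```

Because the N50 weights long sequences more heavily than the mean does, an N50 above the average length (1850 bp vs. 1248 bp for the unigenes here) is the expected pattern for a transcriptome assembly.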
After GO annotation of the unigenes, the unigenes that were successfully annotated were classified according to the next level of the three GO categories "Biological process", "Cellular component", and "Molecular function" (Figure S1). According to the statistical results, the unigenes could be classified into 24 terms in "Biological process", and the three largest categories were "cellular process", "metabolic process", and "single-organism process". There were 21 terms in "Cellular component", and "cell", "cell part", and "organelle" were highly represented. In "Molecular function", there were 10 terms, of which "binding" and "catalytic activity" were the first and second most abundant categories.
KEGG is a database resource for understanding high-level functions and utilities of the biological system; we used KOBAS [30] software to test the statistical enrichment of differentially expressed unigenes in KEGG pathways. Of the 118,448 unigenes, 28,120 (23.74%) were significantly matched to the KEGG pathway database and assigned to five biochemical pathways (hierarchy 1), including 19 main pathways (hierarchy 2) (Figure S3). Among these five main classes, metabolism had the largest proportion (11,722, 55.14%), followed by genetic information processing (5876, 27%), while the last three were environmental information processing (1163, 5.47%), cellular processes (1391, 6.54%), and organismal systems (1106, 5.21%). The top three hierarchy 2 pathways were carbohydrate metabolism; translation; and folding, sorting, and degradation.
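The category shares quoted above can be re-derived from the counts. The sketch below assumes that the percentages are taken relative to the sum of the five hierarchy 1 counts (21,258) rather than to the 28,120 matched unigenes; this denominator is an inference from the reported figures and is not stated in the text. Under that assumption, the share for genetic information processing comes out at about 27.64%, which is compatible with the rounded value of 27% given in the text.

```python
# Counts per KEGG hierarchy 1 class, copied from the text.
kegg_counts = {
    "metabolism": 11722,
    "genetic information processing": 5876,
    "environmental information processing": 1163,
    "cellular processes": 1391,
    "organismal systems": 1106,
}

# Assumed denominator: the sum of the five class counts.
total = sum(kegg_counts.values())

# Percentage share of each class under that assumption.
shares = {name: 100 * count / total for name, count in kegg_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {kegg_counts[name]} ({pct:.2f}%)")
```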

Differential Expression Analysis
Observation of the pedicels of Gr and UD at the full flowering stage showed that the pedicel diameter of Gr was greater than that of UD, and the difference was extremely significant. The contents of chlorophyll a, chlorophyll b, and total chlorophyll in the inflorescence of Gr were significantly higher than in UD (Figure 2).
To detect differentially expressed genes (DEGs), the expression levels of the unigenes in Gr_F and UD_F were estimated. Differential expression analysis of the two inflorescence samples showed that 19,484 genes and 25,393 genes were each specifically expressed in one of the two samples. A total of 39,432 unigenes were differentially expressed between the two cultivars, of which 23,100 genes were significantly upregulated and 16,332 genes were significantly downregulated in the inflorescence of Gr_F (Table S3).
In the porphyrin and chlorophyll metabolism pathway [34][35][36], which leads to chlorophyll a and chlorophyll b, no differentially expressed genes were found for the step in which divinyl protochlorophyllide is converted to protochlorophyllide by 8-vinyl reductase (DVR). In the rest of the pathway, a total of 33 differentially expressed genes were detected, of which 27 were significantly upregulated and 6 were significantly downregulated in green-flower buckwheat (Figure 3).

Flavonoids are the main nutrient in buckwheat and phenylalanine is the direct precursor of flavonoid biosynthesis. The pigments of white flowers are mainly colorless flavonoids, such as flavones and flavonols. Therefore, we analyzed the differentially expressed genes in the phenylpropanoid biosynthesis pathway (ko00940) and detected 186 differentially expressed genes. Among them, there were 15 differentially expressed genes of PAL, C4H, and 4CL, 11 of which were downregulated in Gr_F; the PAL and C4H genes were all downregulated in Gr_F, which might reflect the synthesis of a large amount of flavonoids in white-flower buckwheat. The downstream genes F3H, F3'5'H, and FLS were downregulated in Gr_F and upregulated in UD_F (Figure 4).

The Frequency and Distribution of SSRs
All 118,448 assembled unigenes were screened for potential SSR loci using MISA software, and a total of 20,756 SSRs were mined from 17,579 unigenes (Table S4). The distribution frequency of EST-SSR in the transcriptome was 17.52%, and the average distribution density was 8.21 kb (Table S5). Among the unigenes containing SSRs, 2579 unigenes had more than one SSR, and 920 SSRs were present in compound formation (Table 2).
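As a quick consistency check on the figures above, the short sketch below recomputes the distribution frequency. It assumes that the frequency is defined as the number of SSR loci divided by the number of unigenes, which reproduces the reported 17.52%; the variable names are illustrative. The distribution density is not recomputed here because the total sequence length used for it is not given in this section.

```python
# Values copied from the text.
n_unigenes = 118_448       # assembled unigenes screened with MISA
n_ssr = 20_756             # EST-SSR loci detected
n_ssr_unigenes = 17_579    # unigenes containing at least one SSR
n_multi = 2_579            # unigenes containing more than one SSR

# Assumed definition: SSR loci per 100 unigenes.
frequency = 100 * n_ssr / n_unigenes

# Derived quantities that follow directly from the counts.
share_with_ssr = 100 * n_ssr_unigenes / n_unigenes
n_single = n_ssr_unigenes - n_multi

print(f"distribution frequency: {frequency:.2f}%")
print(f"unigenes with at least one SSR: {share_with_ssr:.2f}%")
print(f"unigenes with exactly one SSR: {n_single}")
```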

Primer Design and Validation of EST-SSR Markers
In this study, a total of 13,909 pairs of primers were developed from the 20,756 identified EST-SSR loci; the primers were 18-23 bp long and the expected amplified products were 100-300 bp (Table S6). After comparison with the transcriptome data of other varieties of Fagopyrum esculentum, part of the Fagopyrum esculentum genome database, and the F. tataricum genome database, 224 primer pairs were randomly selected and synthesized (Table S7) for homology cluster analysis of 35 varieties of common buckwheat and 13 varieties of Tartary buckwheat (Table S8). Of these, 92 (41.07%) pairs showed polymorphism among the varieties of common buckwheat and Tartary buckwheat (Figure S4). The genetic similarity coefficient of the 48 buckwheat varieties ranged from 0.38 to 0.99, and the varieties were divided into two groups at a threshold of 0.68 (Figure 6).

The first group comprised the common buckwheat varieties and the second group comprised the Tartary buckwheat varieties. This supports the previous conclusion that common buckwheat and Tartary buckwheat have a distant genetic relationship and indicates that the SSR primers developed in this study are accurate and practical.

Discussion
In a previous study, Logacheva revealed differentially expressed genes related to sugar biosynthesis and metabolism through comparative analysis of flower and inflorescence transcriptomes of common and Tartary buckwheat [28]. High-throughput mRNA sequencing technologies have also been used in the genetic research of buckwheat [17]. In this study, inflorescence transcriptomes of "green-flower buckwheat" and "white-flower buckwheat" were sequenced on an Illumina HiSeq 4000 platform. Compared with the transcriptome sequencing data of immature buckwheat seeds by Shi [7], we produced a larger number of transcripts (177,125 vs. 54,975) and a longer average transcript length (932 vs. 840 bp) (Table 1), and the higher N50 value of 1657 bp indicated that we generated a high-quality assembly of the inflorescence transcriptome for common buckwheat. The assembled transcripts in this study are appropriate for transcriptome analysis, gene identification, and marker development, and could be an important resource for research on shattering resistance in buckwheat in the future.
In this study, 10,798 (9.11%) unigenes were annotated in all of the Nr, NT, Pfam, KOG, Swiss-Prot, KEGG, and GO databases, and 77,428 (65.36%) unigenes were successfully annotated in at least one database. Compared with other species, the top match was Beta vulgaris (24.6% sequence identity), followed by Vitis vinifera (12.9%), Theobroma cacao (3.8%), Jatropha curcas (3.4%), and Nelumbo nucifera (3.4%) (Figure 1B). Shi found that the top-hit species in the taxonomic distribution of BLAST hits of common buckwheat was Vitis vinifera [7], which is consistent with our results. A similar taxonomic distribution of species also appeared in previous studies, such as those on buckwheat flower [28], immature seeds of buckwheat [7], and Prunus persica [37]. Among the species closest to buckwheat, genome sequencing of Vitis vinifera has been completed [38], which plays an important role in genome alignment and gene annotation of buckwheat.
Common buckwheat normally has white flowers, but scientists have also bred green- and red-flower buckwheat. Breeders in Ukraine found that green buckwheat is more fertile and has larger grains [39]. Studies showed that the green-flower phenotype and stout peduncle are regulated by a recessive gene [40]. Suzuki [41] found that the green-flower buckwheat had stouter peduncles and a lower shattering seed ratio, and Fang [42] found that the petals of green-flower buckwheat contain more chlorophyll than those of white-flower buckwheat and red-flower buckwheat. Based on these previous studies, transcriptome sequencing of inflorescences of Gr and UD was conducted in this study. A total of 33 DEGs were detected in the chlorophyll synthesis pathway, of which 27 genes were significantly upregulated in Gr; therefore, it was speculated that the green color of green-flower buckwheat is mainly caused by chlorophyll. The results of this study lay a foundation for future research on the key candidate genes for buckwheat petal color and shattering resistance and for the breeding of shattering-resistant varieties.
A total of 20,756 SSRs were mined in 17,579 unigenes, which provided rich information for the development of SSR markers in buckwheat. Excluding mono-nucleotide repeats, tri-nucleotide repeat motifs were the most abundant type (53.15%) of EST-SSR in this study (Table 2). This result was consistent with previous studies, such as those on Cucurbita pepo [43], Pinus tabuliformis [44], and castor bean [45], while other studies showed that di-nucleotide repeats were the most abundant type, such as in sesame [46] and oil palm [47]. The most abundant di- and tri-nucleotide motifs in this study were AT/AT (10.08%) and AAG/CTT (6.09%) (Table 3), respectively, where the di-nucleotide motif was consistent with the result of Hou [48] and the tri-nucleotide motif was consistent with the result of Shi [7]. These results are consistent with those of most plant species previously studied [49,50]. However, our nucleotide motif frequency was slightly different from that of legumes [51] and cereals [52]. The main reasons for the differences in the distribution frequency of SSR motif types are the different SSR search criteria, the different search algorithms, and the different selection pressures between plants.
Validation of SSRs discovered via transcriptome sequencing is the next step in building a working marker set for genetic improvement efforts. In previous studies, 10 polymorphic SSR markers were utilized in genetic diversity analysis of a common buckwheat population consisting of 41 accessions of diverse origin [17], and 17 (25%) SSRs exhibited polymorphism among D. officinale individuals [50]. A total of 20,756 EST-SSR loci were obtained and 13,909 pairs of primers were developed in this study, of which 224 pairs were synthesized and 92 (41.07%) showed polymorphism among the varieties. This polymorphic ratio (41.07%) was higher than those reported by Konishi [11] and Ma [17]. The similarity coefficient of the 48 buckwheat varieties ranged from 0.38 to 0.99, and 0.68 was used as the threshold to divide the varieties into two groups; most of the cultivars were grouped according to geographic distribution, mainly into Yunnan, Sichuan, Guizhou, and other regions. This indicates that the large number of new SSR markers developed in this study will be useful resources for genetic diversity analysis and genetic mapping studies and will play an important role in molecular marker-assisted selection breeding for Fagopyrum species.

Plant Materials and RNA Isolation
Two common cultivated varieties, namely, green-flower buckwheat (Gr) with green flowers and resistance to shattering and Ukraine daliqiao (UD) with white flowers and nonresistance to shattering, were cultivated in the test field at Southwest University, Chongqing, China, with normal field management during the growth period. In the full-bloom stage, the inflorescence was collected from the Gr and UD and placed in liquid nitrogen for RNA isolation. The total RNA of the samples was extracted with TRIzol Reagent (TIANGEN, Beijing, China), according to the manufacturer's instructions. The purity and contamination of the isolated RNA were monitored on 1% agarose gels, and these RNA samples were used for cDNA library construction.

cDNA Library Construction and Sequence Assembly
The same amount (1.5 µg) of total RNA was taken from each sample as input for library construction. Sequencing libraries were generated using a NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, San Diego, CA, USA) following the manufacturer's recommendations. In short, mRNAs were purified from total RNA using poly-T oligo-attached magnetic beads. The mRNAs were broken into short fragments by adding fragmentation buffer. First-strand cDNA was synthesized using random hexamer primers and M-MuLV Reverse Transcriptase (RNase H−), and second-strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. The remaining overhangs were converted into blunt ends using exonuclease/polymerase activities. The cDNA fragments with lengths of 150-200 bp were selected after purification, size selection, and adaptor ligation of the library fragments. Then, PCR was performed and the PCR products were purified to establish the final cDNA libraries. The library quality was assessed on an Agilent Bioanalyzer 2100 system.
The cDNA libraries were sequenced on the Illumina HiSeq platform by Bioinformatics Technology Co. Ltd., Beijing, China. Raw data (raw reads) in the FASTQ format were first processed through in-house Perl scripts. Clean reads were obtained by removing reads containing adapters, reads containing poly-N, and low-quality reads (reads in which bases with a Q-value ≤ 20 make up more than 50% of the read) from the raw data. Transcriptome assembly was based on Trinity [53] with min_kmer_cov set to 2 by default, where Trinity connects the contigs and obtains sequences defined as unigenes.
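The read-cleaning rules described above can be sketched as a simple filter. This is an illustrative Python sketch, not the in-house Perl scripts used in this study. The Phred+33 quality encoding, the 10% cutoff for N bases, the adapter sequence passed in by the caller, and all function and parameter names are assumptions made for the example; only the low-quality rule (more than 50% of bases with Q ≤ 20) comes from the text.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]


def keep_read(seq, qual, adapter,
              max_n_fraction=0.10,    # assumed poly-N cutoff (not given in the text)
              q_cutoff=20,            # from the text: low-quality base is Q <= 20
              max_low_fraction=0.50): # from the text: discard if more than 50% low
    """Return True if the read passes all three cleaning rules."""
    if not seq:
        return False
    # Rule 1: discard reads that contain the adapter sequence.
    if adapter and adapter in seq:
        return False
    # Rule 2: discard reads with too many undetermined bases (N).
    if seq.count("N") / len(seq) > max_n_fraction:
        return False
    # Rule 3: discard reads in which low-quality bases exceed the allowed share.
    low = sum(1 for q in phred_scores(qual) if q <= q_cutoff)
    return low / len(seq) <= max_low_fraction


# Hypothetical adapter and reads, for illustration only.
example_adapter = "AGATCGGAAGAGC"
print(keep_read("ACGTACGTAC", "IIIIIIIIII", example_adapter))  # high-quality read
print(keep_read("ACGTACGTAC", "##########", example_adapter))  # all bases at Q2
```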

Function Annotation and Expression Analysis of Unigenes
All of the assembled unigenes were searched in the following databases for the functional annotation and classification of unigenes: Nr (NCBI non-redundant protein sequences), Nt (NCBI non-redundant nucleotide sequences), Pfam (protein family), KOG/COG (Clusters of Orthologous Groups of proteins), Swiss-Prot (a manually annotated and reviewed protein sequence database), KO (KEGG Ortholog database), and GO (Gene Ontology). To further annotate the unigenes, GO annotation of unigenes was obtained using the Blast2GO program [54] with a cutoff E-value of 1 × 10⁻⁶. The unigene sequences were also aligned with the COG database to predict and classify function, and the pathway annotations of these unigenes were obtained with the KEGG database [55].
Gene expression levels were estimated using RSEM [56] for each sample. The transcriptome obtained via Trinity was used as the reference sequence (ref), and the clean reads of each sample were mapped to the ref. The read count for each gene was obtained from the mapping results. Differential expression analysis of two samples was performed using the DEGseq R package [57]. The resulting p-values were adjusted using Benjamini and Hochberg's method for controlling the false discovery rate, and differentially expressed genes were defined as those with an adjusted p-value < 0.05. Then, GO function enrichment analysis and KEGG metabolic pathway enrichment analysis were performed for the differentially expressed (upregulated and downregulated) genes.
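The Benjamini and Hochberg adjustment mentioned above can be written in a few lines. The sketch below is a plain Python illustration of the standard step-up procedure, not the DEGseq implementation; the function name, the gene identifiers, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four unigenes.
genes = ["unigene_A", "unigene_B", "unigene_C", "unigene_D"]
raw_p = [0.001, 0.2, 0.03, 0.8]
adj_p = benjamini_hochberg(raw_p)

# Differentially expressed genes: adjusted p-value < 0.05, as in the text.
degs = [g for g, p in zip(genes, adj_p) if p < 0.05]
print(adj_p)
print(degs)
```

In this toy example, the third gene has a raw p-value below 0.05 but an adjusted value above it, which illustrates why the adjusted rather than the raw p-value is used as the threshold.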

SSR Mining and Primer Design
The Microsatellite software (MISA) [58] was used to detect the microsatellites within the unigenes in this study that were longer than 1000 bp, and the analysis parameters of MISA were set to the default. SSRs were defined as repeats of motifs one to six nucleotides in size. The minimum number of repeats for each unit size was as follows: mono-10, dimer-6, trimer-5, tetramer-5, pentamer-5, and hexamer-5. The primer pairs were designed using Primer3. The major parameters for designing SSR primers were: (1) primer length from 18 to 22 bases, (2) PCR product size from 100 to 300 bp, (3) melting temperature between 55 and 61 °C with 59 °C being the optimal annealing temperature, and (4) GC content of 45-65% with an optimum of 50%. A total of 224 EST-SSR primers were randomly selected and synthesized by Beijing Genomics Institute Co. Ltd. (Beijing, China) to evaluate the application value of this set of EST-SSR markers.
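To make the repeat thresholds above concrete, the following is a simplified sketch of a perfect-SSR scan using the same unit sizes and minimum repeat numbers. It is not MISA and does not reproduce its algorithm; in particular, compound SSRs and overlaps between hits of different unit sizes are not handled. The function names and the test sequence are hypothetical.

```python
# Minimum repeat number per motif length, taken from the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(sequence):
    """Return (start, end, motif, repeats) for each perfect SSR found.

    Coordinates are 0-based and the end is exclusive.
    """
    seq = sequence.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        i = 0
        while i + unit * min_rep <= len(seq):
            motif = seq[i:i + unit]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                # Count consecutive copies of the motif starting at i.
                reps = 1
                while seq[i + reps * unit:i + (reps + 1) * unit] == motif:
                    reps += 1
                if reps >= min_rep:
                    hits.append((i, i + reps * unit, motif, reps))
                    i += reps * unit
                    continue
            i += 1
    return sorted(hits)


# Hypothetical sequence containing (AT)6, (AAG)5, and (A)10.
toy_sequence = "CGTCG" + "AT" * 6 + "GCTGC" + "AAG" * 5 + "CTC" + "A" * 10 + "G"
for hit in find_ssrs(toy_sequence):
    print(hit)
```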

PCR Amplification and Experimental Evaluation of Microsatellite Markers
To assess the amplification efficiency of the newly developed SSRs and evaluate them experimentally, 35 common buckwheat accessions and 13 Tartary buckwheat accessions were used, and the total genomic DNA of each accession was extracted according to a modified CTAB method [59]. The PCR amplification system had a total volume of 10 µL, comprising 1.0 µL template DNA, 1.0 µL 10× PCR buffer, 0.2 µL dNTPs (2.5 mM), 1.5 µL MgCl2 (1.5 mM), 0.5 µL each of the forward and reverse primers (1 µM each), and 0.1 µL Taq polymerase (2 U/µL), with ddH2O added to bring the volume to 10 µL. The PCR amplification program was as follows: initial denaturation at 95 °C for 5 min; 40 cycles of 95 °C for 30 s, 55-60 °C for 30 s, and 72 °C for 30 s; and a final extension at 72 °C for 15 min. PCR amplification results were detected using polyacrylamide gel electrophoresis, and the polymorphic bands were counted and processed. Coefficients of genetic similarity of the 48 buckwheat germplasm resources were calculated using the SIMQUAL program of the NTSYS-pc software [60], and a clustering graph for the materials was constructed using the UPGMA algorithm of the SAHN module.
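The similarity and clustering step can be illustrated with a small sketch. This is not NTSYS-pc: the text does not state which coefficient was selected in SIMQUAL, so the simple matching coefficient is assumed here, and UPGMA is implemented directly as average-linkage merging that stops when the best between-cluster similarity falls below the threshold. The accession names and the 0/1 band scores are hypothetical.

```python
from itertools import combinations


def simple_matching(a, b):
    """Share of band positions scored identically (assumed coefficient)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def upgma_groups(profiles, threshold):
    """Merge clusters by average similarity until it drops below threshold."""
    names = list(profiles)
    sim = {
        frozenset((p, q)): simple_matching(profiles[p], profiles[q])
        for p, q in combinations(names, 2)
    }
    clusters = [(name,) for name in names]

    def avg_sim(c1, c2):
        # UPGMA: unweighted mean over all member pairs of the two clusters.
        total = sum(sim[frozenset((p, q))] for p in c1 for q in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = max(combinations(clusters, 2), key=lambda pair: avg_sim(*pair))
        if avg_sim(*best) < threshold:
            break
        clusters = [c for c in clusters if c not in best] + [best[0] + best[1]]
    return [sorted(c) for c in clusters]


# Hypothetical presence (1) / absence (0) scores for eight bands.
band_scores = {
    "common_1": [1, 1, 0, 1, 0, 0, 1, 0],
    "common_2": [1, 1, 0, 1, 0, 1, 1, 0],
    "common_3": [1, 0, 0, 1, 0, 0, 1, 0],
    "tartary_1": [0, 0, 1, 0, 1, 1, 0, 1],
    "tartary_2": [0, 0, 1, 0, 1, 0, 0, 1],
}

# 0.68 is the threshold reported in this study.
groups = upgma_groups(band_scores, threshold=0.68)
print(sorted(groups))
```

With these toy scores, the accessions separate into one common buckwheat group and one Tartary buckwheat group, mirroring the kind of two-group split reported for the 48 accessions.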

Conclusions
In this study, transcriptome sequencing was carried out by extracting RNA from two materials, Gr_F and UD_F; 118,448 unigenes were obtained with a total sequence length of 147,868,721 bp, and 77,428 unigenes were annotated in at least one database. A total of 20,756 EST-SSR loci were mined in 17,579 unigenes, from which 13,909 pairs of primers were developed. After preliminary screening, 224 pairs of primers were randomly selected and synthesized for homology cluster analysis of 35 varieties of common buckwheat and 13 varieties of Tartary buckwheat, and the 48 buckwheat varieties were divided into two groups at a similarity coefficient of 0.68. The results of transcriptome sequencing and assembly, primer sequencing, and differential expression analysis will provide a theoretical basis for species classification, germplasm conservation, genetic diversity analysis, and molecular marker-assisted breeding of buckwheat.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants11060742/s1. Figure S1. GO functional classification. The X-axis shows the GO term of the next level of the three categories and the Y-axis shows the numbers of unigenes annotated under the terms. Figure S2. KOG function classification. The X-axis shows the functional classes of KOG and the Y-axis shows the numbers of unigenes in each group. Figure S3. KEGG pathway assignment. The X-axis shows the number of unigenes annotated in the pathway and the proportion of the number in the total number of unigenes annotated. The Y-axis shows the name of the KEGG metabolic pathway. (A) Cellular processes, (B) environmental information processing, (C) genetic information processing, (D) metabolism, and (E) organismal systems. Figure S4. Polyacrylamide gel electrophoresis map for species verification of Primer SWU_Fe0156. Lane 1-35: