Comparative Transcriptome Analysis Reveals Adaptive Evolution of Notopterygium incisum and Notopterygium franchetii, Two High-Alpine Herbal Species Endemic to China

The extreme conditions (e.g., cold, low oxygen, and strong ultraviolet radiation) of the high mountains provide an ideal natural laboratory for studies on speciation and the adaptive evolution of organisms. Up to now, few genome/transcriptome-based studies have been carried out on how plants adapt to conditions at extremely high altitudes. Notopterygium incisum and Notopterygium franchetii (Notopterygium, Apiaceae) are two endangered high-alpine herbal plants endemic to China. To explore the molecular genetic mechanisms of adaptation to high altitudes, we performed high-throughput RNA sequencing (RNA-seq) to characterize the transcriptomes of the two species. In total, more than 130 million sequence reads, 81,446 and 63,153 unigenes with total lengths of 86,924,837 and 62,615,693 bp, were generated for the two herbal species, respectively. OrthoMCL analysis identified 6375 single-copy orthologous genes between N. incisum and N. franchetii. In total, 381 positively-selected candidate genes were identified for both plants by using estimations of the non-synonymous to synonymous substitution rate. At least 18 of these genes potentially participate in RNA splicing, DNA repair, glutathione metabolism and the plant–pathogen interaction pathway, which were further enriched in various functional gene categories possibly responsible for environment adaptation in high mountains. Meanwhile, we detected various transcription factors that regulated the material and energy metabolism in N. incisum and N. franchetii, which probably play vital roles in the tolerance to stress in surroundings. In addition, 60 primer pairs based on orthologous microsatellite-containing sequences between the both Notopterygium species were determined. Finally, 17 polymorphic microsatellite markers (SSR) were successfully characterized for the two endangered species. Based on these candidate orthologous and SSR markers, we detected that the adaptive evolution and species divergence of N. incisum and N. franchetii were significantly associated with the extremely heterogeneous environments and climatic oscillations in high-altitude areas. This work provides important insights into the molecular mechanisms of adaptation to high-altitudes in alpine herbal plants.


Introduction
Extreme environments provide a natural laboratory for studies on the processes of speciation and adaptive evolution of organisms [1]. In particular, the extremely high-alpine environment is known for its harsh conditions, characterized by severe cold, intense ultraviolet radiation, hypoxia, poor soils, and low CO 2 pressure [2,3]. Therefore, survival in extreme environments is very challenging for most organisms. Nevertheless, many plant species can thrive in the cold and hypoxic conditions of high-alpine areas [4,5]. Understanding how organisms adapt to these various extreme environments can make a significant contribution to the study of species divergence and evolutionary ecology. Generally, in response to the bioclimatic conditions, plants can generate particular structures and physiological features, and furthermore they can adapt to complex ecological conditions [6]. For example, some alpine plants are able to activate antioxidants, such as ascorbate oxidase (APX), catalase (CAT), and glutathione reductase (GR), proline and abscisicacid, conferring plants with tolerance to the alpine environment [7][8][9].
In recent years, genome/transcriptome sequencing has been proven to be an effective and rapid method for determining adaptive evolution and differential gene expression in high-altitude plants, e.g., Kobresia pygmaea [7], Potentilla saundersiana [10], Lamiophlomis rotata [11] and Lobelia [12]. Some studies demonstrated that the alpine plants have various morphological and physiological response strategies to adapt to high-elevation environments. For example, researchers have revealed that some genes are significantly involved in energy metabolism, hypoxia response under positive selection, and rapid adaptation [10][11][12]. Meanwhile, RNA-sequencing (RNA-seq) can also provide important genetic information on the development of molecular markers (e.g., co-dominant microsatellite markers (SSRs) and single-copy nuclear genes) [13]. These genetic markers identified by RNA-sequencing have also been widely applied to study population genetic divergence and species conservation [1,5,13]. However, up to now, few genome/transcriptome-based investigations have been devoted to the molecular mechanisms of high-altitude adaptation and evolution in alpine plants.
Notopterygium incisum C. C. Ting ex H. T. Chang and Notopterygium franchetii H. de Boissieuas are two important traditional Chinese medicine plants endemic to China. Both species are mainly distributed in the high-alpine meadows and forests (N. incisum, 3500-5100m) and sub-alpine shrubs (N. franchetii, 1700-4500 m) of Northwest China [14]. They provide excellent models for studying speciation and plant adaptation to extreme environments. In recent decades, wild population resources of the two species has rapidly decreased due to the effects of climatic oscillations and human activities [14]. They are listed as endangered herbal species in the IUCN Red List, and urgent management and conservation are required [15]. Information on population genetic structure and gene diversity are important to formulate effective conservation and management strategies of wild plant resources. However, most previous studies of the two species have mainly focused on their physiological and morphological characteristics [16][17][18], phylogenetic relationships [19,20] and pharmacognosy [21]. Little is known about their population genetics and species divergence information. In addition, the absence of genomic and transcriptomic resources has largely hindered the studies of adaptive evolution of two species to high-altitude areas.
Here, we present large-scale transcriptomes for N. incisum and N. franchetii that were derived using RNA-seq. Our aims were to: (1) identify candidate genes associated with adaptation to alpine environments through analysis of the functional positively selected genes between two species; (2) increase the limited molecular genetic resources of Notopterygium species; and (3) discover transcription factors (TFs) and genus-specific SSR genetic markers for two endangered species.

Sequencing and De Novo Assembly
In total, 56,759,848 and 61,120,020 sequence reads were generated for N. incisum and N. franchetii, respectively. After trimming low-quality sequences, we obtained 55,833,880 and 60,052,340 clean reads for the two species, respectively. All of the clean reads were then administered to the Trinity program to assemble transcripts. For N. incisum, 81,446 unigenes were obtained with a mean length of 1067 bp and an N50 size of 1435 bp. For N. franchetii, 63,153 unigenes were obtained with a mean length of 991 bp and an N50 size of 1405 bp (Table 1, Figure 1).

Functional Annotation
The complete set of unigenes from both Notopterygium species was obtained against the NR, GO, Swiss-prot, KOG and KEGG databases. In total, 69,237 (85.0%) sequences from N. incisum and 47,774 (75.65%) from N. franchetii generated at least one significant alignment with an available sequence in BLASTX searches ( Table 2). The top hit species were similar in both Notopterygium datasets. The majority of the top hits were genes derived from Vitis vinifera, Sesamum indicum and Theobroma cacao ( Figure S1).

Functional Annotation
The complete set of unigenes from both Notopterygium species was obtained against the NR, GO, Swiss-prot, KOG and KEGG databases. In total, 69,237 (85.0%) sequences from N. incisum and 47,774 (75.65%) from N. franchetii generated at least one significant alignment with an available sequence in BLASTX searches ( Table 2). The top hit species were similar in both Notopterygium datasets. The majority of the top hits were genes derived from Vitis vinifera, Sesamum indicum and Theobroma cacao ( Figure S1). The GO analysis distributions of the unigenes for both herbal species were also highly similar ( Figure 2). The GO classification separated the unigenes into 47 functional groups representing biological process (127,779 unigenes from N. incisum and 71,679 from N. franchetii), cellular component (90,356 unigenes from N. incisum and 55,483 from N. franchetii) and molecular function (56,434 unigenes from N. incisum and 33,848 from N. franchetii). Large groups of unigenes from both endangered species (48,515 and 29,474 from N. incisum and N. franchetii, respectively) were assigned to GO categories. We then used the public KOG database to elucidate the orthologous functions of the unigenes. The distributions of unigene function classification for the two herbal species were also similar ( Figure S2). The GO analysis distributions of the unigenes for both herbal species were also highly similar ( Figure 2). The GO classification separated the unigenes into 47 functional groups representing biological process (127,779 unigenes from N. incisum and 71,679 from N. franchetii), cellular component (90,356 unigenes from N. incisum and 55,483 from N. franchetii) and molecular function (56,434 unigenes from N. incisum and 33,848 from N. franchetii). Large groups of unigenes from both endangered species (48,515 and 29,474 from N. incisum and N. franchetii, respectively) were assigned to GO categories. We then used the public KOG database to elucidate the orthologous functions of the unigenes. The distributions of unigene function classification for the two herbal species were also similar ( Figure S2).

Putative Orthologs, Substitution Rates, and Species Divergence between N. incisum and N. franchetii
To compare the conservativeness between N. incisum and N. franchetii, we estimated 56,999 pairs of putative orthologous unigenes between two species applying the reciprocal best hit method with Blastp algorithm (E-values < 10 −7 ). The remaining protein coding genes that could not be assigned to orthologous groups were considered as species-specific expressed genes. After rigorous filtration, we finally obtained 6375 pairs of single-copy orthologs between the two species by OrthoMCL analysis. These orthologs were used in the subsequent evolutionary analysis of synonymous (Ks) and nonsynonymous (Ka) substitution rates. Finally, a total of 3823 pairs of single copy orthologous genes were identified. Of these orthologs, 381 pairs with a Ka/Ks value > 1 were identified indicating positive selection, and 857 had a Ka/Ks ratio between 0.5 and 1, suggesting weak purifying selection ( Figure S3) [22].
The divergence times between N. incisum and N. franchetii can be estimated by utilizing the peak synonymous rates (Ks) of orthologous pairs [13]. In this study, a peak of Ks distribution between two species was observed at 0.011 ± 0.021 ( Figure 3). We obtained an approximately estimate of the divergence time (T) between two endangered species according to the simple formula: T = K/2r [23], where r is the synonymous substitution rate, and is considered to be 1.5 × 10 −8 substitutions/synonymous site/year for the dicots [23]; K is genetic divergence of synonymous substitutions between orthologs. We calculated the period of the species divergence between N. incisum and N. franchetii to be roughly 1.06 ± 0.71 Mya, which falls between the largest glaciation in the Middle Pleistocene [24,25].

Putative Orthologs, Substitution Rates, and Species Divergence between N. incisum and N. franchetii
To compare the conservativeness between N. incisum and N. franchetii, we estimated 56,999 pairs of putative orthologous unigenes between two species applying the reciprocal best hit method with Blastp algorithm (E-values < 10 −7 ). The remaining protein coding genes that could not be assigned to orthologous groups were considered as species-specific expressed genes. After rigorous filtration, we finally obtained 6375 pairs of single-copy orthologs between the two species by OrthoMCL analysis. These orthologs were used in the subsequent evolutionary analysis of synonymous (Ks) and nonsynonymous (Ka) substitution rates. Finally, a total of 3823 pairs of single copy orthologous genes were identified. Of these orthologs, 381 pairs with a Ka/Ks value > 1 were identified indicating positive selection, and 857 had a Ka/Ks ratio between 0.5 and 1, suggesting weak purifying selection ( Figure S3) [22].
The divergence times between N. incisum and N. franchetii can be estimated by utilizing the peak synonymous rates (Ks) of orthologous pairs [13]. In this study, a peak of Ks distribution between two species was observed at 0.011 ± 0.021 ( Figure 3). We obtained an approximately estimate of the divergence time (T) between two endangered species according to the simple formula: T = K/2r [23], where r is the synonymous substitution rate, and is considered to be 1.5 × 10 −8 substitutions/synonymous site/year for the dicots [23]; K is genetic divergence of synonymous substitutions between orthologs. We calculated the period of the species divergence between N. incisum and N. franchetii to be roughly 1.06 ± 0.71 Mya, which falls between the largest glaciation in the Middle Pleistocene [24,25].

Functions and Metabolite Pathway Analysis under Positive Selection
To detect genes that might be involved in the adaptation to high altitudes, we used 381 pairs of PSGs in orthologous unigenes as the candidate genes, then combined rigorous selection methods to annotate all of the genes based on their functional roles (Table S2 and Figure S4). Firstly, genes derived from significant enrichment analysis included GO and KEGG enrichments, which included putative candidate genes and related metabolic pathways. Secondly, we utilized the functional annotated information for every PSG and pathways to identify the resistance genes reported in previous experimental studies.
In enrichment analyses, we separated the orthologs into two databases: N. incisum and N. franchetii with Ka/Ks > 1. In analysis of the GO category with a minimum of five hits, 5 GO-categories annotated to 9 ortholog pairs were detected to be over-represented (Fisher's exact test, p-value < 0.05) in the two Notopterygium databases (Table 3). Among these ortholog genes under positive selection with Ka/Ks > 1, most were enriched with functions related to RNA splicing, mismatch repair and acetate metabolic process. In addition, KEGG enrichment analysis revealed that three predicated metabolic pathways were associated with glutathione metabolism, plant-pathogen interaction and ribosome biogenesis in eukaryotes.

Functions and Metabolite Pathway Analysis under Positive Selection
To detect genes that might be involved in the adaptation to high altitudes, we used 381 pairs of PSGs in orthologous unigenes as the candidate genes, then combined rigorous selection methods to annotate all of the genes based on their functional roles (Table S2 and Figure S4). Firstly, genes derived from significant enrichment analysis included GO and KEGG enrichments, which included putative candidate genes and related metabolic pathways. Secondly, we utilized the functional annotated information for every PSG and pathways to identify the resistance genes reported in previous experimental studies.
In enrichment analyses, we separated the orthologs into two databases: N. incisum and N. franchetii with Ka/Ks > 1. In analysis of the GO category with a minimum of five hits, 5 GO-categories annotated to 9 ortholog pairs were detected to be over-represented (Fisher's exact test, p-value < 0.05) in the two Notopterygium databases (Table 3). Among these ortholog genes under positive selection with Ka/Ks > 1, most were enriched with functions related to RNA splicing, mismatch repair and acetate metabolic process. In addition, KEGG enrichment analysis revealed that three predicated metabolic pathways were associated with glutathione metabolism, plant-pathogen interaction and ribosome biogenesis in eukaryotes. Meanwhile, we identified 18 candidate PSGs from Go and KEGG enrichment analysis. These PSGs were mainly involved in RNA-binding protein (RBPs), DNA mismatch repair protein (MSH4), and cysteine and histidine-rich domain-containing protein RAR1, etc., in two Notopterygium species (Table 3, Tables S2 and S3). Specifically, most of the genes were involved in resistance genes (GO0008380, GO0032300, GO0031072, KO04620) or enzymes (GO0008083, GO0019783, KO00480, KO03008) in plant metabolite biosynthesis pathways, and were differentially expressed in the two species. However, we noticed that many more unigenes were related to disease resistance (KO04620), antioxidants (KO00480), and cold stress (GO32300, GO0031072, GO0008380), especially antioxidants associated with the glutathione metabolism pathway. As an important antioxidant, glutathione is synthesized from glycine, glutamate and methionine-derived cysteine ( Figure 4A). Glutathione is able to recycle with the oxidation/reduction cycle of its reduced (GSH) and oxidized (GSSG) forms [26]. The abundance of glutathione increases when plants have tolerance to oxidative stimuli [27]. Two genes (GP and RSG), encoding the enzymes and regulatory proteins in glutathione biosynthesis, were differentially expressed and highly expressed in high altitude species N. incisum (Table S3). Another important metabolic pathway is with respect to the plant-pathogen interactions, here, N. incisum and N. franchetii showed differential modulation of the expression genes that encoded five proteins included in this pathway ( Figure 4B). PAMP-triggered immunity constituted the first line of inducible defense against infectious diseases. Several genes involved in this primary response were linked to cytosolic Ca 2+ concentrations, including calcium-dependent protein kinase (CDPK), calcium-binding protein CML (CaM/CML) and the cyclic nucleotide gated channel (CNGC), which showed positive selection in both species. Meanwhile, the increase of Ca 2+ also have regulated the production of ROS and enhanced the tolerance of plants to environment stress. Two pathogen-related genes like PR1, RAR1 were identified, which were upregulated in N. incisum compared with N. franchetii (Table S3). protein CML (CaM/CML) and the cyclic nucleotide gated channel (CNGC), which showed positive selection in both species. Meanwhile, the increase of Ca 2+ also have regulated the production of ROS and enhanced the tolerance of plants to environment stress. Two pathogen-related genes like PR1, RAR1 were identified, which were upregulated in N. incisum compared with N. franchetii (Table S3).

Transcription Factor Identification
Transcription factors (TFs) played an important role in the plant growth and development [29]. Here, a total of 2696 and 1547 potential TFs were identified by comparing the unigenes of N. incisum and N. franchetii, respectively. The potential TFs were distributed in 53 families, such as bHLH, MYB, WAKY, ERF, bZIP, and so on ( Figure 5). Among these TF gene families, we identified some TFs were

Transcription Factor Identification
Transcription factors (TFs) played an important role in the plant growth and development [29]. Here, a total of 2696 and 1547 potential TFs were identified by comparing the unigenes of N. incisum and N. franchetii, respectively. The potential TFs were distributed in 53 families, such as bHLH, MYB, WAKY, ERF, bZIP, and so on ( Figure 5). Among these TF gene families, we identified some TFs were under positive selection, and most of them were differentially expressed, which may regulate the downstream target genes involved in stimulus responses in Notopterygium (Tables 3 and S3). under positive selection, and most of them were differentially expressed, which may regulate the downstream target genes involved in stimulus responses in Notopterygium (Table 3 and Table S3).

SSR Identification
To assess the quality of the transcriptome assembly and characterize new molecular genetic markers, 6375 pairs of single copy homologous genes were used to develop potential SSRs. A total of 1184/1007 (N. incisum/N. franchetii) SSR loci primers were identified using the MISA program, of which 155/126 sequences contained more than one SSR repeat ( Table 3). The most common repeat type was di-nucleotide, followed by tri-nucleotide, tetra-nucleotide, hexa-nucleotide, and pentanucleotide ( Figure S5). The most abundant repeat motif of di-nucleotide was AT/AT, followed by AC/GT and AG/CT. Among the units of tri-nucleotide, the dominant type was AAG/CTT, followed by ATC/ATG and AGC/CTG (FigureS5).

Genetic Diversity and Divergence of N. incisum and N. franchetii
For evaluating genetic diversity and divergence of N. incisum and N. franchetii, 45 individuals (Table S1) from nine natural populations were analyzed using 17 polymorphic primers selected from 60 SSR genetic markers (Table S6). Based on these polymorphic primers, the number of alleles (Na) per locus for the two species ranged from one to three, with an average value of 1.71. Mean expected heterozygosity (HE) across all loci in N. incisum was 0.201 which was higher than that in N. franchetii (HE = 0.192). Mean Shannon's information index (I) values were 0.336 and 0.288 for N. incisum and N. franchetii, respectively (Table S7).
The most likely number of clusters (K) of STRUCTURE analysis can be inferred by the peak of ΔK. We found that K = 2 was the optimal K value for SSR ( Figure S6), representing two species, respectively. Following 20 independent STRUCTURE runs with K = 2, individuals from N. incisum were assigned to one cluster with high probability, whereas those samples of N. franchetii were assigned to the other cluster with a similarly high probability. When K = 3, population E, HA and HH of N. incisum exhibits significant genetic divergence and structure ( Figure S6).

De Novo Assembly and Functional Annotation
In the present study, the large-scale transcriptomes of two Notopterygium species were characterized by de novo sequencing and assembly methods [30]. In total, 55.8 million and 60.1 million clean reads were generated for N. incisum and N. franchetii, respectively. Also, the unigenes sizes obtained in this study (average length of 1067 bp for N. incisum and 991 bp for N. franchetii) are longer than those

SSR Identification
To assess the quality of the transcriptome assembly and characterize new molecular genetic markers, 6375 pairs of single copy homologous genes were used to develop potential SSRs. A total of 1184/1007 (N. incisum/N. franchetii) SSR loci primers were identified using the MISA program, of which 155/126 sequences contained more than one SSR repeat ( Table 3). The most common repeat type was di-nucleotide, followed by tri-nucleotide, tetra-nucleotide, hexa-nucleotide, and penta-nucleotide ( Figure S5). The most abundant repeat motif of di-nucleotide was AT/AT, followed by AC/GT and AG/CT. Among the units of tri-nucleotide, the dominant type was AAG/CTT, followed by ATC/ATG and AGC/CTG ( Figure S5).

Genetic Diversity and Divergence of N. incisum and N. franchetii
For evaluating genetic diversity and divergence of N. incisum and N. franchetii, 45 individuals (Table S1) from nine natural populations were analyzed using 17 polymorphic primers selected from 60 SSR genetic markers (Table S6). Based on these polymorphic primers, the number of alleles (Na) per locus for the two species ranged from one to three, with an average value of 1.71. Mean expected heterozygosity (H E ) across all loci in N. incisum was 0.201 which was higher than that in N. franchetii (H E = 0.192). Mean Shannon's information index (I) values were 0.336 and 0.288 for N. incisum and N. franchetii, respectively (Table S7).
The most likely number of clusters (K) of STRUCTURE analysis can be inferred by the peak of ∆K. We found that K = 2 was the optimal K value for SSR ( Figure S6), representing two species, respectively. Following 20 independent STRUCTURE runs with K = 2, individuals from N. incisum were assigned to one cluster with high probability, whereas those samples of N. franchetii were assigned to the other cluster with a similarly high probability. When K = 3, population E, HA and HH of N. incisum exhibits significant genetic divergence and structure ( Figure S6).

De Novo Assembly and Functional Annotation
In the present study, the large-scale transcriptomes of two Notopterygium species were characterized by de novo sequencing and assembly methods [30]. In total, 55.8 million and 60.1 million clean reads were generated for N. incisum and N. franchetii, respectively. Also, the unigenes sizes obtained in this study (average length of 1067 bp for N. incisum and 991 bp for N. franchetii) are longer than those previously published herbal plants using similar technologies [14,31], thus indicating that the transcriptome sequencing data were well assembled for two Notopterygium species.
In addition, a substantial part of the unigenes of both species (85.0%/75.65%) could be annotated using four public protein databases and most were involved in plant proteins. Generally, the present study fills an important knowledge gap, and provides relevant transcriptome information regarding the pattern of gene expression and functional categories. Intriguingly, a higher number of unigenes was obtained for two Notopterygium species, and the number of annotated unigenes for N. incisum (81,446) was greater than that for N. franchetii (63,153). This difference suggested the potential to discover novel genes specific to the aim plant, and which might play a key role in species adaptations and evolution [32].

Species Divergence between N. incisum and N. franchetii
Generally, N. incisum inhabits the higher mountain areas than N. franchetii. They encounter the different environmental conditions and have been shown to have different tolerances during their growth and development [33]. The harsh high-altitude environment imposes strong selective pressures on high alpine plant species. Candidate adaptive genes for high altitudes identified in our study greatly enhance our understanding of the complex genetic architecture, which serves as an important basis for comparative genomic studies of adaptation and species divergence at high elevations.
In addition, peak values of the Ks distribution of orthologs between species often shows species divergence events [34], and this method has been successfully applied in the inference of such divergence events [35,36]. We dated the age of the species divergence between N. incisum and N. franchetii according to the Ks distribution; it approximately at 1.06 ± 0.71 Mya, which was significantly associated with the rapid uplift of the Qinghai-Tibetan Plateau and the largest glaciation recycles in the Middle Pleistocene [24,25].

Detecting Candidate Genes under Positive Selection
Ka/Ks values are widely used to elucidate protein coding genes under positive selection [37]. We focused on the function of positively selected genes between two Notopterygium species (Table  S2). These genes associated with GO functional categories clearly indicated the molecular and cellular events involved in N. incisum and N. franchetii ( Figure S4). We identified several interesting candidate genes exhibiting rapid evolution pattern with signs of strong selection (Ka/Ks > 1) in different functional groups that are highly likely associated with the environmental adaptation and stress response. Among them, four genes (ORTHOMCL16216, ORTHOMCL16135, ORTHOMCL16124, ORTHOMCL17922) were found to most likely be associated with the adaptive process to high-elevation environments. Five (ORTHOMCL12869, ORTHOMCL14798, ORTHOMCL15032, ORTHOMCL17505, ORTHOMCL18047) candidate PSGs are involved in links to RBPs, corresponding to the functional group of RNA splicing. Plant genomes encode a variety of RBPs which play important roles in plant growth and stress responses [38]. Serine/arginine-rich (SR) proteins play key roles in precursor mRNA splicing and have been proven to be crucial in plant adaptation to various environments [39]. In addition, one candidate PSG, MSH4, can be related to the mismatch repair (MMR) system. Similarly, MMR proteins play an important role in maintaining genome stability in organisms [40]. Meanwhile, we also have found that the vital gene RAR1 is a component of defense signaling pathways, requiring activation to prevent the spread of infection in the process of plants disease resistance.
Meanwhile, a series of other defense responses including glutathione metabolism and plant-pathogen interaction was identified ( Figure 4A), which was of vital importance in the process of adaptive evolution and species divergence of plants [41]. We also detected a large number of expressed genes encoding various enzymes associated with glutathione metabolism, indicating that glutathione may play a key role in the adaptation to the high mountain environments for the two endangered species (Table S4).
In addition, most of the expressed genes encoding various enzymes involved in plant-pathogen interaction metabolism were also predicted (Table S5). The PAMP-triggered immunity pathway (PTI) was activated ( Figure 4B), particularly by the up-regulation of the expression level of calcium-dependent protein kinase CDPK. Therefore, the activation of the PTI inhibits pathogen of N. incisum and N. franchetii, which made these two species have more widely adaptation to the harsh environments [42,43].
Furthermore, some important transcription factors were determined in two high altitude species. Generally, TFs can regulate the downstream genes involved in responses to environment stress, e.g., MYB, NAC, WRKY and ERF etc. [29,44]. These TFs have been proven to be significantly associated with various environment stresses in many plants [36,45,46]. Firstly, NAC proteins constituted one of the largest families of plant transcription factors [47]. Genes from this family participated in a large number of biological processes including developmental programs, defense, and environment stress responses [48,49]. In this study, two unigenes (ORTHOMCL17007 and ORTHOMCL20774) were identified in NAC gene networks. Secondly, the WRKY transcription factor may be associated with plant abiotic stresses such as cold and drought conditions [50,51]. Two unigenes (ORTHOMCL11146 and ORTHOMCL13242) were annotated as WRKY transcription factors which were under positive selection in two Notopterygium species. Thirdly, we found two genes (ORTHOMCL16173 and ORTHOMCL17851) belonging to the MYB family TFs, which were induced in various environment stress responses and tolerances [52,53]. In addition, basic helix-loop-helix (bHLH) TFs comprised a large TF family and acted as crucial regulators in response to stress in plants [54,55] were identified. Another two of the TFs, BkERF2.1, DcERF1, are both associated with enhanced disease resistance. BkERFs contained an ERF/AP2 DNA binding domain, which mediated the expression of defense-related genes in plants [56]. DcERF1, belonging to the ethylene-responsive element-binding factor (ERF) family (related to plant defense [57,58], regulation of secondary metabolism and the flower senescence) was also found in this study.

Polymorphic SSR Markers and Population Structure
Genetic markers based on transcriptome sequences are effective for studying population structure, diversity and species divergence [59]. To validate the determined SSR markers, we amplified the predicted SSR primers via PCR reaction. Of the 60 pair primers that were selected for PCR validation, 28 (46.67%) produced clear DNA bands. This PCR success rate was higher than that reported for some herbal plants such as alfalfa (30%) [60] and tree peony (47.30%) [61]. Among the working primers, we excluded 11 primers which showed polymorphism in one species but could not amplify clear bands in another, and then 17 polymorphic primer pairs were selected as genus-specific SSR markers to evaluate genetic diversity and divergence of N. incisum and N. franchetii. Genetic diversity analyses suggested that most of these primers showed higher polymorphism in N. incisum than in N. franchetii. The mean expected heterozygosity (H E ) across all loci in N. incisum was 0.201 and was higher than that in N. franchetii (H E = 0.192). This was probably caused by the much more limited distributional range and natural populations of N. incisum than N. franchetii. Meanwhile, according to the STRUCTURE cluster analysis, we found that all populations were clustered into two groups ( Figure S6) and these showed a clear demarcation between accessions from two species ( Figure S6). The results implied that there may exist high genetic divergence or/and barriers between N. incisum and N. franchetii. In short, our current study indicated that the SSR primers from two species transcriptomes are reliable and could be used to study the population genetics and species divergence of Notopterygium in the future.

Plant Material
The fresh tissues of N. incisum and N. franchetii used in this study were collected in June 2013, from the Taibai Mountain (33 • 59 57.322" N, 107 • 48 25.527" E, 3500 m), Shaanxi province, and Gansu province (33 • 29 09.655" N, 104 • 31 30.857" E, 1800 m), in the western region of China. To obtain as many expressed genes as possible, three different tissues were sampled for each species, including leaves, stems and flowers. We immediately froze tissue samples in liquid nitrogen and stored them at −80 • C in a cryogenic refrigerator until RNA extraction. Fresh young leaves of 45 individuals of both Notopterygium species were also collected from Sichuan, Gansu, and Shaanxi, China for SSR development and validation (Table S1). Samples were dried instantly using silica gel for DNA extraction.

RNA Preparation, Illumina Paired-End Sequencing
We isolated the total RNA of two Notopterygium species using the RNeasy Kit (Qiagen, Hilden, Germany) [13]. A mixed sample with equal volumes (RNAs of leaves, stems and flowers) was prepared as a single pooled RNA sample for each species. cDNA preparation and RNA-Seq were carried out using these pooled samples. Double-stranded cDNA was synthesized and sequencing adaptors were ligated according to the manufacturer's instructions (Illumina, San Diego, CA, USA). The ligated products were then purified with AMPureXP beads and amplified for the construction of cDNA libraries [62]. The libraries were sequenced using the Illumina HiSeq TM 2000 (Novogene, Tianjin, China) after completion of cDNA libraries. The raw transcriptome datasets of N. incisum and N. franchetii were deposited in the NCBI Sequence Read Archive (accession numbers SRR5573673 and SRR5659683, respectively).

Transcriptome Assembly and Function Annotation
We obtained clean reads by filtering the adaptor sequences, low-quality regions (bQ20) and sequences shorter than 50 bp [63]. The program Trinity was employed for de novo assembly with all clean reads (http://trinityrnaseq.sourceforge.net/) [64]. BLASTx was used to align unigene sequences to the protein databases NCBI non-redundant (NR), the Swiss-Prot protein database (Swiss-Prot, in UniProt), the Kyoto Encyclopedia of Genes and Genomes (KEGG, www.genome.jp/kegg/) [28], and the Clusters of enKaryotic Orthologous Groups of proteins (KOG, http://www.ncbi.nlm.nih. gov/KOG/) [65].The highest similarity protein sequences with E-values < 10 −5 were selected and then analyzed. According to NR annotation, Gene Ontology (GO) annotation and classification of all unigenes was acquired by the Blast2Go program [66] and WEGO software [67], which comprehensively describe the attributes of genes and gene products in species [68].

Ortholog Identification and Sequence Alignment
We used the BLASTx and ESTScan software to predicate the coding sequences (CDSs and protein sequences) [69]. The latter is aimed at unigenes that do not align with the previously mentioned protein databases. Then, the predicted CDSs allowed us to evaluate orthologous groups between N. incisum and N. franchetii. Blastp search of the predicted CDS regions of two species transcriptomes with an E-values < 10 −7 was carried out. For such comparisons, the purpose of the alignment is to appraise gene families and potential orthologs [32]. These analyses were performed with the program OrthoMCL [70].

Substitution Rate Estimation and Selection Analyses
Putative single-copy orthologs were assessed by clustering protein sequences between N. incisum and N. franchetii (OrthoMCL). The nonsynonymous (Ka), synonymous (Ks), and Ka/Ks values were computed using the software KaKs_Calculator (http://code.google.com/p/kaks-calculator/wiki/ KaKs_Calculatoref) [71]. We used the Fisher's exact test to validate the efficiency of the Ka and Ks values for each single ortholog. We excluded candidate orthologs with a synonymous (Ks) substitution value >0.1, as these may be paralogs [13,22]. Rigorous criteria were used to identify the positive selection genes (PSGs): PSGs were required to have a Ka/Ks value higher than 1 [5]. To speculate on the putative functions of unigenes specifically over-expressed in orthologous groups, GO and KEGG pathways enrichment analyses were performed by comparing homologous genes between two species. Multiple testing was implemented by using the false discovery rate [72], with a cut-off of 0.05 to determine significant over expression.

Transcription Factor Mining
In order to identify some positively selected genes and detect the adaption mechanisms in the two high-alpine herbal species, we analyzed the transcription factors represented in N. incisum and N. franchetii transcriptomes; all assembled and annotated unigenes were aligned to the plant transcription factor database (PlantTFdb; http://plntfdb.bio.uni-potsdam.de/v3.0/downloads.php) with application of the program HMMER [73].

Microsatellite Marker Detection
We used the program MISA (http://pgrc.ipk-gatersleben.de/misa/) [74] to mine the microsatellite markers of N. incisum and N. franchetii. The criteria of SSR search were as follows: di-, tri-, tetra-, penta-and hexa-nucleotide motifs with at least a 6, 5, 4, 4 and 4 repeats, respectively [75]. The maximum size of interruption allowed between two different SSRs in a compound sequence was 100 bp. SSR primer pairs were designed using the software Primer Premier 6.0 (PremierBiosoft International, Palo Alto, CA, USA). A random selection of tetra-to penta-nucleotide microsatellites were used to design locus-specific primers for two species [76,77].

DNA Isolation and Primer Selection
We isolated genomic DNA of dried leaves using the cetyltrimethylammonium bromide method [78]. The quality of DNA was determined by 1% agarose gel electrophoresis. Sixty primer pairs were detected with 45 samples of N. incisum and N. franchetii. Polymerase chain reaction (PCR) amplifications were administered with a 10 µL reactive volume containing 1 µL of DNA template (10-40 ng/µL), 5 µL of 2X Taq mix, 0.4 µL of the forward primer (1 mM), 1.6 µL of the reverse primer (1 mM), and 1.4 µL of ddH 2 O. PCR amplification conditions consisted of an initial denaturation step at 95 • C for 5 min, followed by 32 cycles of 95 • C for 30 s, 53-57 • C for 90 s, and 72 • C for 60 s; and a final elongation step of 7 min at 72 • C. All loci were run separately in 10% non-denaturing polyacrylamide gel and then visualized by silver staining protocols. The band size was determined using the program Quantity One (Bio-Rad Laboratories, Hercules, CA, USA) with PBR322 DNA/MspI as the molecular size standard.

Microsatellite DNA Analysis
A set of statistical tests on SSRs, including allelic richness (Na), Shannon's information index (I), and observed and expected heterozygosity (Ho and H E , respectively), were performed via the software GenALEx version 6.5 (Canberra, Australia) [79]. Genetic structure of all populations was estimated by clustering method based on a Bayesian model with the software package STRUCTURE [80]. The data set without prior population information was analyzed using an admixture model and independent allelic frequencies [81]. A total of 20 independent simulations were run for K from 1 to 6 with 2 × 10 5 burn-in steps followed by 5 × 10 5 MCMC steps. The program STRUCTURE HARVESTR (Santa Cruz, CA, USA) was employed to evaluate the most likely number (K) of genetic clusters using the delta K criterion [82]; the inferred clusters were drawn as colored box-plots using the bio-software DISTRUCT (Los Angeles, CA, USA) [83].