Transcriptome Profiling of a Common Mistletoe Species Parasitizing Four Typical Host Species in Urban Southwest China

Comparing gene expressions among parasitic plants infecting different host species can have significant implications for understanding host–parasite interactions. Taxillus nigrans is a common hemiparasitic species in Southwest China that parasitizes a variety of host species. However, a lack of nucleotide sequence data to date has hindered transcriptome-level research on T. nigrans. In this study, the transcriptomes of T. nigrans individuals parasitizing four typical host species (Broussonetia papyrifera (Bpap), a broad-leaved tree species; Cryptomeria fortunei (Cfor), a coniferous tree species; Cinnamomum septentrionale (Csep), an evergreen tree species; and Ginkgo biloba (Gbil), a deciduous-coniferous tree species) were sequenced, and the expression profiles and metabolic pathways were compared among hosts. A total of greater than 400 million reads were generated in nine cDNA libraries. These were de novo assembled into 293823 transcripts with an N50 value of 1790 bp. A large number of differentially expressed genes (DEGs) were identified when comparing T. nigrans individuals on different host species: Bpap vs. Cfor (1253 DEGs), Bpap vs. Csep (864), Bpap vs. Gbil (517), Cfor vs. Csep (259), Cfor vs. Gbil (95), and Csep vs. Gbil (40). Four hundred and fifteen unigenes were common to all six pairwise comparisons; these were primarily associated with Cytochrome P450 and environmental adaptation, as determined in a KEGG enrichment analysis. Unique unigenes were also identified, specific to Bpap vs. Cfor (808 unigenes), Bpap vs. Csep (329 unigenes), Bpap vs. Gbil (87 unigenes), Cfor vs. Csep (108 unigenes), Cfor vs. Gbil (32 unigenes), and Csep vs. Gbil comparisons (23 unigenes); partial unigenes were associated with the metabolism of terpenoids and polyketides regarding plant hormone signal transduction. Weighted gene co-expression network analysis (WGCNA) revealed four modules that were associated with the hosts. These results provide a foundation for further exploration of the detailed molecular mechanisms involved in plant parasitism.


Introduction
Parasitic plants are a diverse group of 4750 species that obtain water, mineral nutrients, and carbon from other plants using a specialized feeding organ called a haustorium; within the angiosperms, parasitism has evolved independently at least twelve times [1,2]. The majority of parasitic plants are hemiparasites, which feed directly on other plants but also maintain green leaves and photosynthesize [3]. Many host characteristics interact to determine parasite performance, including nitrogen content [4], carbon content [5], the presence of secondary compounds [6,7]), host condition [8], defenses and immunity [9], biomass [10], and genotype [11]. This complexity has impeded research into hemiparasite host range evolution, particularly as many of the host variables are confounded (depending on the host species).
The ecological significance of parasitic plants depends on the host plant's preferences and specificity; most mistletoe species have a wide host range and may attach to a diversity of host plants belonging to co-occurring plant species. For instance, Parentucellia viscosa, Rhinanthus minor, and Triphysaria versicolor are known to parasitize 27, 50, and 25 different plant species, respectively [12,13]. However, among all possible host species, each mistletoe species may have a preferred host(s), for which it shows higher specificity than other host species [14]. Santalum acuminatum performs better (as measured by its biomass, percent cover, and haustoria abundance) when parasitizing nitrogen-fixing woody plants [15]; the haustoria of Thesium chinense are also larger when parasitizing the Fabaceae species as compared to other host species [16]. This enhanced performance on leguminous hosts appears to result from a combination of less effective anti-parasitic defenses and the availability of sufficient amounts of easily absorbed nitrogen-containing compounds within the host, rather than being a direct consequence of nitrogen fixation [17]. In addition, germination stimulants also affect parasite host selectivity [6,18].
The Loranthaceae family comprises approximately 70 genera and 950 species, all of which are hemiparasites [19]. Within the family, Taxillus nigrans is a hemiparasitic species that retains the ability to photosynthesize while obtaining water, carbon, and nutrients from host plants via the haustoria [19,20]. T. nigrans can only be propagated by the seeds. It produces fleshy berries that are eaten by birds, promoting seed dispersal; seeds rapidly germinate when deposited on a suitable host, forming new haustoria and drilling into the host's cortex [21]. The semiparasitism seen in T. nigrans is similar to that of other Loranthaceae species, such as Agelanthus natalitius and Struthanthus aff. polyanthus [19,22]. T. nigrans can parasitize 51 host species [23], causing the death of branches or the entire plant of the host. However, very little is known about the mechanisms driving host species preference.
With the advent of next-generation sequencing (NGS), the de novo assembly and analysis of parasitic plant transcriptomes have offered important insights into gene expression differences related to host species identity [24]. Here, RNA-seq was used to generate transcriptome profiles for T. nigrans individuals parasitizing four different host species (Broussonetia papyrifera, Cinnamomum septentrionale, Cryptomeria fortune, and Ginkgo biloba) to reveal underlying molecular differences. A fully annotated transcriptome was obtained and used as a reference to examine expression differences associated with each host species. These results will be helpful in understanding gene expression in the mistletoe species colonizing different host species.

Sample Collection
In July 2016, fresh T. nigrans were collected from four different host species (Figure 1), namely, Broussonetia papyrifera (Bpap), Cinnamomum septentrionale (Csep), Cryptomeria fortunei (Cfor), and Ginkgo biloba (Gbil), on the Wangjiang Campus of Sichuan University, Chengdu, Sichuan, China. T. nigrans has 41 known host species [21]; the host species selected for this study included two deciduous broad-leaved species (Gbil and Bpap), a deciduous coniferous species (Cfor), and an evergreen broad-leaved species (Csep). For the leaf samples, three biological replicates were collected per host species. Samples were immediately frozen in liquid nitrogen and then stored in a −80 • C refrigerator prior to RNA extraction. All samples were used for RNA extraction and RNA-Seq library construction.

RNA-Seq Library Preparation and Sequencing
Total RNA was extracted from collected leaves using Guanidine thiocyanate (Sigma, 50983, Beijing, China)-Chloroform (Sigma, 472476, Beijing, China) according to the manufacturer's instructions. For each host species, two hundred milligrams of leaf tissue was ground in liquid nitrogen to extract total RNA. The total RNA samples were then treated with a DNA-free TM DNA Removal Kit (Ambion, AM1906, Shanghai, China) to remove contaminating genomic DNA. RNA purity was checked using a Nano-Photometer spectrophotometer (IMPLEN, Westlake Village, CA, USA). Before cDNA synthesis, RNA concentrations were measured using a Qubit RNA Assay Kit (Life Technologies, Q32852, Shanghai, China), and RNA integrity was assessed using the RNA Nano 6000 Assay Kit for the Agilent Bioanalyzer 2100 system (Agilent Technologies, Santa Clara, CA, USA).
A total of 3 µg of RNA per sample was used as input material for sample preparation. Twelve sequencing libraries (i.e., three replicates per host species) were generated using the NEBNext Ultra TM RNA Library Prep Kit for Illumina (NEB, Beijing, China) following the provided protocol. Clustering of the index-coded samples was then performed with the cBot Cluster Generation System using a TruSeq PE Cluster Kit v3-cBot-HS (Illumina, Shanghai, China) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina HiSeq 2000 platform by Novogene Bioinformatics Technology Co., Ltd. (Beijing, China), and 150-bp paired-end reads were generated.
A total of 3 µg of RNA per sample was used as input material for the sample preparation. Twelve sequencing libraries (i.e., three replicates per host species) were generated using the NEBNext Ultra TM RNA Library Prep Kit for Illumina (NEB, Beijing, China) following the provided protocol. Clustering of the index-coded samples was then performed with the cBot Cluster Generation System using a TruSeq PE Cluster Kit v3-cBot-HS (Illumina, Beijing, China) according to the manufacturer's instructions. After cluster generation, the library preparations were sequenced on an Illumina HiSeq 2000 platform by Novogene Bioinformatics Technology Co., Ltd. (Beijing, China), and 150-bp paired-end reads were generated.

Preprocessing of Illumina Reads and De Novo Transcriptome Assembly
Raw reads were filtered using Trimmomatic [25] to obtain high-quality, clean reads for assembly: adapter sequences, reads containing poly-N runs (≥10%), and low quality (sQ ≤ 5) reads were removed. The number of sequence duplications, GC content, Q20, and Q30 was then calculated for the clean reads. All downstream analyses were based on clean data. The clean data of all samples were uploaded to the Sequence Read Archive (SRA) with accession number PRJNA851568. De novo assembly of the transcriptome from the RNA-seq data was performed using the Trinity software package [26]), with min_kmer_cov set to two and default values for other parameters. Benchmarking Universal Single-Copy Orthologues (BUSCO) provides measures for quantitative assessment of transcriptome completeness based on evolutionarily informed expectations of gene content from conserved single-copy orthologs [27]. In this study, we evaluated the completeness of the assembly using BUSCO based on the embryophyta_odb10 database.

Functional Annotation of the Transcriptome
To annotate the assembled T. nigrans transcriptome, transcripts were aligned against the eggNOG Nr, and Swiss-Prot databases using BLASTP with a significance threshold of E ≤ 10 −5 . For functional categorization, the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were analyzed and annotated using the TBtools [28].

Differential Expression Analysis and Co-Expression Network Analysis
To estimate the abundance of the de novo assembled transcripts, RSEM [29] was used; this protocol assesses transcript abundance based on the mapping of RNA-seq reads to the assembled transcriptome. The DESeq2 and edgeR in IDEAMEX [30] were used to identify differentially expressed genes (DEGs) among the host species. To minimize the false positive rate, only transcripts with threshold p values < 0.05 and absolute values of log 2 ( f old change) > 1 (as screened by IDEAMEX) were considered differentially expressed. The differentially expressed transcripts were divided into six sets, each comparing two host species, as follows: set I (Bpap vs. Cfor), set II (Bpap vs. Csep), set III (Bpap vs. Gbil), set IV (Cfor vs. Csep), set V (Cfor vs. Gbil), and set VI (Csep vs. Gbil). Pearson correlation coefficients were then calculated for all sets using gene expression data. A GO term enrichment analysis and KEGG pathways were performed for the DEGs using TBtools. We performed a co-expression network analysis using the WGCNA package [31] based on log 2 (FPKM + 1) values.

Transcriptome Sequencing and De Novo Assembly of T. nigrans Transcriptome
Fresh leaves of T. nigrans collected from four host species were used to construct cDNA libraries, which were subsequently sequenced on an Illumina HiSeq 4000 platform. For each host species, three biological replicates were analyzed. After filtering the raw sequencing data, a total of 5,553,239,700 (Bpap-001), 6 Table 1). The de novo assembly yielded 332,439 transcripts, with an N50 size of 1790 bp ( Table 2). The BUSCO analysis showed 92.3% of 1614 single-copy genes in the embryophyta_odb10 database ( Table 2). The results of the functional annotations of all transcripts are shown in Table S1.
After identifying the DEGs for each pairwise comparison, the subset of DEGs common to all samples was then determined. A total of 415 common unigenes were identified based on DESeq2 and degeR; these are presented in a Venn diagram and volcano map ( Figure 2B,D). Partially common unigenes encoding putative transcription factors, including members of the cytochrome P450 family, HSP, ABC transporter family, UDPglycosyltransferase family, and zinc finger families (Table S1). In addition, unique DEGs were identified for the Bpap vs.

Transcriptome Assembly and Annotation
Advances in transcriptome sequencing technology and data mining platforms have led to rapid progress in comparative transcriptomics for non-model, non-crop plants, such as parasitic plants [32][33][34][35][36][37][38]. Here, transcriptomes were generated and compared for T. nigrans plants parasitizing four different host species. High-throughput sequencing generated more than 405 million clean reads from all samples, yielding approximately 248,751 unigenes after assembly. The assembled transcriptome comprised mostly short transcripts, as seen for other de novo assembled plant transcriptomes, perhaps as a result of the assembly algorithm used [39,40]. The contig N50 was 1790 bp, indicating that the Trinity assembly was of high quality. The assembly and annotation of the T. nigrans transcriptome provides a valuable resource for investigating the processes and pathways involved in T. nigrans host adaptation. For example, transcriptomes produced for the parasitic plants Arceuthobium sichuanense and Cuscuta campestris were used to identify DEGs among samples [24,40].

Gene Categories Associated with Plant Development and Host Selection
For some parasitic plants, such as Arceuthobium sichuanense and Cuscuta campestris, the establishment of vascular connections with the host plant allows the acquisition of nutrients and solutes; this nutrient sink can significantly reduce the biomass of the host [40,41]. In this study, based on the results of GO and KEGG enrichment, a large number of common unigenes (shared across T. nigrans individuals) were primarily related to carbohydrate metabolism, including glycolysis/gluconeogenesis; pyruvate metabolism; glyoxylate and dicarboxylate metabolism; and amino sugar and nucleotide sugar metabolism. Although the transcriptomic data were not directly related to the parasitic properties of T. nigrans, studies of other parasitic plants have shown that metabolism is related to parasitism [40][41][42][43].
In addition, unigenes involved in the energy metabolism in photosynthetic organisms (i.e., oxidative phosphorylation and carbon fixation) were also enriched in T. nigrans. We also found that part of the unigenes related to the circadian rhythm, cytochrome P450, metabolism of terpenoids and polyketides, and response to abscisic acid. Cytochrome P450 may contribute to environmental and host adaptation [44,45]. Abscisic acid is thought to be involved in the development [46,47]. The observed results may thus reflect the adaptive adjustment of T. nigrans to host selection and the environment. Furthermore, unigenes were identified specific to the Bpap vs. Cfor comparison (n = 808), Bpap vs. Csep comparison (n = 329), Bpap vs. Gbil comparison (n = 87), Cfor vs. Csep comparison (n = 108), Cfor vs. Gbil comparison (n = 32), and Csep vs. Gbil comparison (n = 23), most of which were annotated using the Gene Ontology and KEGG databases. In all annotated comparisons, most of the genes were related to the biosynthesis and metabolism of secondary metabolites. Unique genes of each comparison showed similarities in functional enrichment, such as Bpap vs. Cfor and Csep vs. Cfor comparisons involving the metabolism of terpenoids and polyketides; in Bpap vs. Csep and Csep vs. Cfor comparisons, we found genes related to plant hormone signal transduction that may be related to host differences. In addition, partial unigenes were associated with the plasma membrane, response to endogenous stimuli, ion binding, and organic hydroxy compound metabolic processes. Previous studies have shown that these factors are associated with plant defenses [48][49][50]. We identified co-expressed gene sets that were expressed in four samples via the weighted gene co-expression network analysis (WGCNA), a method widely used in the comparative transcriptome analysis [51][52][53][54]. These results indicate that each sample was associated with one or two co-expression modules that reflected the gene regulatory processes specific to each sample. We anticipate that the comprehensive information presented here will serve as a crucial resource to understand host differences and commonalities.

Conclusions
In summary, RNA-seq analysis revealed both host-specific and common pathways involved in T. nigrans parasitism; these results provide a foundation for further study of the molecular factors and functions underlying adaptation to different host plants. The putative key genes and pathways identified here may also be used to explore the molecular factors and mechanisms explaining host diversity. The analysis represents a critical step in promoting the development of molecular ecology tools for parasite-host systems and sustainable integrated management programs.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author.