PacBio Long-Read Sequencing Reveals the Transcriptomic Complexity and Aux / IAA Gene Evolution in Gnetum (Gnetales)

: The genus Gnetum includes pantropical trees, shrubs and lianas, with unresolved phylogenetic relationships with other seed plant groups. Despite the reference genome for this genus being recently published, the molecular mechanisms that regulate the reproductive organ development of Gnetum remain unclear. A previous study showed that indole-3-acetic acid is involved in the regulation of female strobili of Gnetum , while the diversity and evolution of indole-3-acetic acid-related genes—the Aux / IAA genes—have never been investigated in Gnetales. Thus, a pooled sample from di ﬀ erent developmental stages of female strobili in Gnetum luofuense C.Y. Cheng was sequenced using PacBio single-molecular long-read technology (SMRT) sequencing. PacBio SMRT sequencing generated a total of 53,057 full-length transcripts, including 2043 novel genes. Besides this, 10,454 alternative splicing (AS) events were detected with intron retention constituting the largest proportion (46%). Moreover, 1196 lncRNAs were identiﬁed, and 8128 genes were found to possess at least one poly (A) site. A total of 3179 regulatory proteins, including 1413 transcription factors (e.g., MADS-box and bHLHs), 477 transcription regulators (e.g., SNF2), and 1289 protein kinases (e.g., RLK / Pelles) were detected, and these protein regulators probably participated in the female strobili development of G. luofuense . In addition, this is the ﬁrst study of the Aux / IAA genes of the Gnetales, and we identiﬁed 6, 7 and 12 Aux / IAA genes from Gnetum luofuense , Welwitschia mirabilis , and Ephedra equistina , respectively. Our phylogenetic analysis reveals that Aux / IAA genes from the gymnosperms tended to cluster and possessed gene structures as diverse as those in angiosperms. Moreover, the Aux / IAA genes of the Gnetales might possess higher molecular evolutionary rates than those in other gymnosperms. The sequencing of the full-length transcriptome paves the way to uncovering molecular mechanisms that regulate reproductive organ development in gymnosperms.


Introduction
The order Gnetales comprises three extant genera: Ephedra L., Welwitschia Hook.f. and Gnetum L. The gross morphology of the Gnetales, especially their reproductive organs, dramatically differs from that of non-Gnetalean gymnosperms [1][2][3]. The systematic position of the Gnetales remain unresolved among extant seed plant groups, and hypotheses proposed based on morphological and molecular data are constantly contradictory [1,[4][5][6][7][8]. Gnetum, comprising around 40 species, is a pantropical genus with the main life-forms of trees, shrubs and lianas [9][10][11]. As dioecious, a female strobilus of Gnetum is featured by several rolls of involucres-for each roll, multiple reproductive units closely connect to a AS, APA and lncRNAs in gymnosperms have never been investigated to date using PacBio sequencing technique. Therefore, in this present study, we adopted the full-length transcriptome from G. luofuense female strobili at different developmental stages using PacBio SMRT sequencing. Illumina-based RNA sequencing was applied to refine the PacBio transcript isoforms by short-read error correction. Further isoform analyses were performed to reveal the diversity of AS isoforms and APA events and lncRNAs. In addition, the diversity of Aux/IAA genes was surveyed in the Gnetales and their phylogeny, structural features, and molecular evolutionary rates were investigated. The first full-length transcriptome of reproductive organs in the Gnetales provides a valuable resource for investigating the reproductive development and evolution of Aux/IAA genes in gymnosperms.

Plant Materials and RNA Extraction
The plant material used for full-length transcriptome sequencing was collected from a mature female individual of G. luofuense cultivated in Bamboo Garden, Sun Yat-Sen University on 8 May 2018. To reveal the transcriptome of the entire developmental process, female strobili at four different developmental stages as defined by Lan, Liu, Shi, Deng, Jiang and Chang [18] were pooled (Figure 1), snap-frozen in liquid nitrogen, and stored at −80 • C. The collected material was divided into two samples: one for PacBio SMRT sequencing, and the other for Illumina sequencing. The voucher for the two specimens (No. CH003) was deposited in the herbarium of Sun Yat-Sen University, SYS.
Total RNA was extracted from the two samples using an RNA kit (Qiagen, Valencia, CA, USA) following the manufacturer's instructions. The relic DNA was removed using RNase-free DNase set (Qiagen), and the cleaned RNA samples were assessed by gel electrophoresis with 1% agarose. Besides this, the purity and integrity of extracted RNA was assessed by NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA) and Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA), respectively. however, AS, APA and lncRNAs in gymnosperms have never been investigated to date using PacBio sequencing technique. Therefore, in this present study, we adopted the full-length transcriptome from G. luofuense female strobili at different developmental stages using PacBio SMRT sequencing. Illumina-based RNA sequencing was applied to refine the PacBio transcript isoforms by short-read error correction. Further isoform analyses were performed to reveal the diversity of AS isoforms and APA events and lncRNAs. In addition, the diversity of Aux/IAA genes was surveyed in the Gnetales and their phylogeny, structural features, and molecular evolutionary rates were investigated. The first full-length transcriptome of reproductive organs in the Gnetales provides a valuable resource for investigating the reproductive development and evolution of Aux/IAA genes in gymnosperms.

Plant Materials and RNA Extraction
The plant material used for full-length transcriptome sequencing was collected from a mature female individual of G. luofuense cultivated in Bamboo Garden, Sun Yat-Sen University on 8 May 2018. To reveal the transcriptome of the entire developmental process, female strobili at four different developmental stages as defined by Lan, Liu, Shi, Deng, Jiang and Chang [18] were pooled ( Figure  1), snap-frozen in liquid nitrogen, and stored at −80 °C. The collected material was divided into two samples: one for PacBio SMRT sequencing, and the other for Illumina sequencing. The voucher for the two specimens (No. CH003) was deposited in the herbarium of Sun Yat-Sen University, SYS.
Total RNA was extracted from the two samples using an RNA kit (Qiagen, Valencia, CA, USA) following the manufacturer's instructions. The relic DNA was removed using RNase-free DNase set (Qiagen), and the cleaned RNA samples were assessed by gel electrophoresis with 1% agarose. Besides this, the purity and integrity of extracted RNA was assessed by NanoDrop spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA) and Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA), respectively.

cDNA Construction and PacBio SMRT Sequencing
The high-quality mRNA (RNA integrity value >7.0) was transcribed into full-length cDNA using a SMARTer PCR cDNA Synthesis kit (Clontech, CA, USA). After PCR amplification, quality control and purification, RNA samples were subjected to cDNA damage/terminal reparation and the connection of SMRT dumbbell-type adapters. A SMRTBell Template library was constructed based

cDNA Construction and PacBio SMRT Sequencing
The high-quality mRNA (RNA integrity value >7.0) was transcribed into full-length cDNA using a SMARTer PCR cDNA Synthesis kit (Clontech, CA, USA). After PCR amplification, quality control and purification, RNA samples were subjected to cDNA damage/terminal reparation and the connection of SMRT dumbbell-type adapters. A SMRTBell Template library was constructed based on the cDNA production using a SMRTBell Template Prep Kit (Pacific Biosciences, CA, USA). The insert size of the cDNA library was measured by an Agilent 2100 Bioanalyzer, and the concentration of the cDNA library was assessed using Qubit 2.0 flurometer (Life Technologies, CA, USA). One SMRT cell was sequenced on a PacBio RS II platform at Biomarker Technologies Inc., Beijing, China. The PacBio raw reads have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database (accession number SAMN12911136).

Illumina Library Construction and Sequencing
An RNA sample that possesses Poly (A) was segregated from the total RNA using oligo (dT) magnetic beads. A fragmentation buffer was applied to randomly split the enriched RNA into small pieces, and the first strand of cDNA was synthesized using hexamers and reverse transcriptase (Superscript III, Invitrogen). After removing the RNA templates, the second strand of cDNA was subsequently synthesized using deoxyribonucleotide triphosphates (dNTPs), RNase H and DNA polymerase I (Sigma-Aldrich). The synthesized double-strand cDNA was purified with AMPure XP beads. After terminal reparation and A-tailing, the short cDNA fragments were subjected to ligation with Illumina adaptors and purification with AMPure XP beads. The cDNA library was created with PCR amplification, and its quality and concentration were assessed by an Agilent 2100 Bioanalyzer and Qubit 2.0 flurometer, respectively. The final cDNA library was sequenced using an Illumina Hiseq 4000 system. The raw reads generated from Illumina sequencing have been deposited in the NCBI SRA database (accession number SAMN12911137).

PacBio Long Read Processing and Genome Mapping
Reads of inserts (ROIs) were generated with the following settings of parameters: full passes ≥0, a predicted accuracy of circular consensus sequence (CCS) >0.8 and a minimum length of 50 bp. These ROIs were further divided into full-length (FL) reads and non-full-length (nFL) reads with regard to the presence of 5 and 3 cDNA primers and a 3 poly (A) tail. All ROIs, comprising both FLs and nFLs, were clustered by the Iterative Clustering for Error Correction (ICE) algorithm to obtain consensus isoforms. To obtain high-quality isoforms, full-length nonchimeric (FLNC) transcripts from the Information Concealment Engine (ICE) algorithm were corrected using the Quiver algorithm (https://www.pacb.com/applications/rna-sequencing) with a post-correction accuracy above 99%. The remaining low-quality isoforms were corrected from the Illumina sequenced data with the Proovread tool version 2.12 [54].
Full-length nonchimeric and corrected low-quality transcripts were mapped against the reference genome of G. montanum [= G. luofuense, 4] using GMAP [55] with the following setting parameters: -cross-species, -allow-close-indels=0. The software cDNA Cupcake (https://github.com/Magdoll/ cDNA_Cupcake/wik) was applied to remove the redundant transcripts with a minimum alignment accuracy of 0.9 and minimum coverage of 0.85. Mapped transcripts with length differences merely on their 5 ends were not considered to be redundant. The assessment of transcriptome completeness was performed with regard to the conserved content of transcripts using BUSCO (Benchmarking Universal Single Copy Orthologs) [56]. The mapped transcripts were compared to conserved proteins, and the percentages of aligned and partially aligned transcripts were calculated. In addition, the novel genes were identified, annotated and classified into gene families using Pfam.

AS and APA Analysis
The GMAP output files in the BAM and gff format were used for subsequent gene structure analysis. Alternative splicing (AS) events were identified using Astalavista (version 3.0, Barcelona, Spain) [57], and five types of AS were identified: alternative 3 splice site (A3SS), alternative 5 splice site (A5SS), exon skipping (ES), intron retention (IR), and mutually exclusive exon (MEE). In addition, alternative polyadenylation (APA) was analyzed using the TAPIS pipeline (Colorado, USA) [39]. The detection of motifs was performed on the sequence of 50 nucleotides upstream of poly (A) sites in all transcripts using the web service MEME [58].

Identification of CDS, Regulatory Proteins and lncRNAs
The coding sequences (CDS) of corrected isoforms were identified with TransDecoder [62] concerning the length of the open reading frames (ORFs), log-likelihood score and similarity with amino acid sequences in the Pfam database. Based on the putative protein sequences, three types of regulatory proteins-i.e., transcription factors (TFs), transcript regulators (TRs) and protein kinases (PKs)-were identified via searching Plant TFDB4.0 (The Plant Transcription Factor Database 4.0, http://planttfdb.cbi.pku.edu.cn) using iTAK (version 15.03, NY, USA) [63].
To predict lncRNAs (long noncoding RNAs), four analysis methods including CPC (Coding Potential Calculator) [64], CNCI (Coding-Non-Coding Index) [65], CPAT (Coding Potential Assessment Tool) [66], and Pfam were applied to distinguish nonprotein and protein coding transcripts. Putative protein-coding transcripts were filtered out, and the potential lncRNAs fulfilled the thresholds that the length of transcripts should be more than 200 nt and that they should possess at least two exons. The identified lncRNAs were classified into four groups, lincRNA, antisense-lncRNA, sense-lncRNA, and intronic-lncRNA, with regard to the lncRNA location on the reference genome of G. luofuense. In addition, target genes on the genome of G. luofuense regulated by lncRNAs were predicted using LncTar (version 1.0, Tianjin, China) [67]. Two approaches of gene prediction were applied since lncRNAs regulates target genes either by cis or by trans [68,69]. One approach considered that the expression of lncRNAs differs from that of mRNA within every 100 kbp on each chromosome, whereas the other approach merely took the sequence complementation between lncRNAs and mRNAs into account.

qRT-PCR Validation and Ka/Ks Analysis
To validate the expression of Aux/IAA genes, six Aux/IAA genes (i.e., GluIAA1-6) were designed using Primer Premier (version 5.0, PREMIER Biosoft International, Palo Alto, CA, USA). Primer sequences and the qRT-PCR protocol are listed in Table S1. Two micrograms of RNA were extracted from the plant material used for PacBio and Illumina sequencing and was subjected to cDNA synthesis. The RT-PCR was performed under the following conditions: 5 min at 95°C (1 cycle), 10 s at 95°C, 10 s at 60-65°C and 10 s at 72°C (30 cycles), and 5 min at 72°C (1 cycle). Given the ACTIN gene expression as the reference, the relative expression of six Aux/IAA genes as normalized with the ∆∆Ct-method [73].
Sequences of Aux/IAA genes for each species mentioned above were aligned using MUSCLE (version 3.8.31, Cambridge, UK) [74].Values of the nonsynonymous substitution rate (Ka) and the synonymous substitution (Ks) between paralogous gene pairs were calculated using DNAsp (version 6.12.03, Barcelona, Spain) [75] with the default setting. The ratios of Ka/Ks for each gene pair and their standard deviations were calculated and presented in a box plot using ggplot2 package (version 2.2.1, TX, USA) [75] implemented in R version 3.4.4 [76]. The Student t-test-with double tales and different standard deviation-was performed to examine the Ka/Ks values in pairs between different seed plant species. In addition, a phylogeny that comprises 11 land plant species was taken from TimeTree (PA, USA) [77].

Motif Prediction and Phylogenetic Analysis
A phylogeny was constructed based on 72 full-length Aux/IAA amino acid sequences derived from A. trichopoda, A. thaliana, E. equisetina, G. biloba, G. luofuense, P. taeda, W. mirabilis Hook.f., and Z. mays and was applied to identify motifs using the web server MEME. Three parameters were set according to Wu, Liu, Wang, Li, Liu, Tan, He, Bai and Ma [24]: an optimum motif width of 6-50, a maximum number of motifs of 5, and zero or one occurrence per sequence for each motif site.
Since Aux/IAA genes were dramatically variable, we merely applied the four conservative domains to build up a phylogeny, and the 72 canonical sequences were aligned using MUSCLE. The best-fit model for the amino acid alignment-JTT+Γ-was selected according to AICc values (corrected Akaike Information Criterion) using ProtTest (version 3.2, Vigo, Spain) [78]. A maximum likelihood tree with 1000 pseudo-replicates of simulated bootstraps was created using RAxML-HPC2 version 8.2.12 [79] implemented in web service CIPRES Gateway version 3.3 (http://www.phylo.org). One Aux/IAA gene from Physcomitrella patens (Hedw.) Bruch & Schimp was set as the outgroup.

Assembly of Full-Length Transcriptome and Genome Mapping
PacBio sequencing generated 710,392 ROIs with an average length of 1812 bp, and the mean read quality of each insert was 0.944 ( Figure S1A). After the deletion of 30,101 short reads (4.2%), the remaining ROIs were classified into 483,475 full-length reads (FLs, 68.1%), comprising 3551 full-length chimeric reads (0.5%) and 480,225 full-length nonchimeric reads (67.6%, FLNC), and 196,816 non-full-length reads (nFLs, 27.7%) (Figure 2A). The FLNC reads were clustered using the ICE algorithm and yielded 254,573 consensus isoforms with a mean length of 1859 bp ( Figure 2B). Together with non-full-length sequences, the consensus isoform in each cluster was polished, resulting in 176,391 high quality isoforms (HQs) and 77,034 low quality isoforms (LQs). All polished consensus isoforms were corrected by Illumina sequencing data and yielded 253,273 corrected isoforms, with an N50 length of 2076 bp and a mean read length of 1837 bp. All corrected isoforms (253, 273) were further mapped against the reference genome with GMAP. After deleting the unmapped and redundant isoforms (18,521,7.32%), the remaining 234,752 isoforms (92.68%) were assembled into 53,057 transcripts with a mean length of 1992 bp ( Figure 2C). The completeness of our transcript dataset was assessed with BUSCO, and the result revealed that this dataset consisted of 61.5% complete and 3.8% partial BUSCO orthologues ( Figure 2D). The assembled 53,057 transcripts were functionally annotated by searching against the databases NR, SwissProt, KOG, COG, GO, Pfam and KEGG, and a total of 26,305 transcripts (49.6%) were annotated ( Figure 3A, Table S2). A total of 25,837 transcripts was annotated by the NR database, and we compared the transcript sequences to the NR databases and found five species ( Figure 3B)-i.e., Picea sitchensis (Bong.) Carrière, 7119; A. trichopoda, 2070; Nelumbo nucifera Gaertn., 1729; Vitis vinifera L., 829; and Elaeis guineensis Jacq., 787-obtained the largest five-number of transcripts homologous to G. luofuense. A total of 20,762 transcripts was annotated by Pfam and further classified into 3919 gene families (Table S3). The top five gene families with the largest enriched transcripts were the protein kinase domain (931), protein tyrosine kinase (931), leucine rich repeats (824), pentatricopeptide repeat (PPR) repeat (607), and pentatricopeptide repeat domain (516). The results of GO analysis show that 14,044 transcripts were classified into three categories: "biological process", "cellular components" and "molecular functions" ( Figure 3C). The biological process category was mainly distributed in "metabolic processes" (9409), "cellular process" (6944), and "single-organism process" (5988). Transcripts involved in the cellular component category consisted of "cell part" (6301), "cell" (6133) and "organelle" (4816). Transcripts in the molecular function category were mainly enriched for "catalytic activity" (7806), "binding" (5482), and "transporter activity" (816). Moreover, 10,140 transcripts were annotated by KEGG pathways (Figure 3D), and the largest numbers of the transcripts were enriched in five KEGG pathways: "phenylpropanoid biosynthesis" (387 transcripts, KEGG orthology number, ko00940), "starch and sucrose metabolism" (351, ko00500),

AS and APA Analysis
After aligning to the reference genome, a total of 10,454 AS events was detected, including 4745 intron retention (46%), 2531 alternative 3′ splice sites (24%), 1800 exon skippings (17%), 1235 alternative 5′ splice sites (12%), and 143 mutually exclusive exons (1%) ( Figure 4A, Table S4). In total, 9155 genes were detected to possess alternative splicing events, with the isoform numbers ranging one to 55. The largest isoform number-55-was found among three genes: TnS000130997g02, TnS000368105g01, and TnS000008647g03. To identify the differential polyadenylation sites at the 3 end of transcripts, a total of 8128 genes was found to possess at least one poly (A) site, of which 1405 genes possess at least five poly (A) sites ( Figure 4B, Table S5). The nucleotide composition both upstream (50 nts) and downstream (50 nts) was analyzed to assess the nucleotide bias of all poly (A) cleavage sites. We found an enrichment of uracil (U) 50 nts upstream and adenine (A) downstream of the cleavages site in 3 UTRs (untranslated regions, Figure 4C). Moreover, we performed a MEME analysis to detect motifs at 50 nts upstream of the poly (A) site of all transcripts to identify cis-elements for polyadenylation. Three conserved motifs-(AAAACA), (GGGCGC) and (CGCCGC)-were identified 25 nts upstream of poly (A) sites ( Figure 4D).

Identification of CDS, Regulatory Proteins and lncRNAs
To identify CDS, 31,928 ORFs were identified with a mean length of 2076 bp with Transdecoder. We found 29,430 5 UTRs with a mean length of 466 bp and 31,714 3 UTRs with a mean length of 577 bp. Of the detected ORFs, 24,605 (77%) with a mean length of 2174 bp were CDS, which were composed of start and stop codons. The number and length distribution of the identified CDS are displayed in Figure 5A. Based on the CDS detected, a total of 3179 regulatory proteins was identified, including 1413 transcription factors (TFs), 477 transcript regulators (TRs) and 1289 protein kinases (PKs) ( Figure 5B). The amino acid sequences of the identified regulatory proteins are listed in Appendix A2.
After aligning to the reference genome, a total of 10,454 AS events was detected, including 4745 intron retention (46%), 2531 alternative 3′ splice sites (24%), 1800 exon skippings (17%), 1235 alternative 5′ splice sites (12%), and 143 mutually exclusive exons (1%) ( Figure 4A, Table S4). In total, 9155 genes were detected to possess alternative splicing events, with the isoform numbers ranging one to 55. The largest isoform number-55-was found among three genes: TnS000130997g02, TnS000368105g01, and TnS000008647g03. To identify the differential polyadenylation sites at the 3′ end of transcripts, a total of 8128 genes was found to possess at least one poly (A) site, of which 1405 genes possess at least five poly (A) sites ( Figure 4B, Table S5). The nucleotide composition both upstream (50 nts) and downstream (50 nts) was analyzed to assess the nucleotide bias of all poly (A) cleavage sites. We found an enrichment of uracil (U) 50 nts upstream and adenine (A) downstream of the cleavages site in 3′ UTRs (untranslated regions, Figure 4C). Moreover, we performed a MEME analysis to detect motifs at 50 nts upstream of

Aux/IAA Gene Identification, qRT-PCR and Adaptive Evolution Analysis
This is the first study investigating Aux/IAA genes in the Gnetales, and a total of six Aux/IAA genes were found among the transcription regulators (TRs) in G. luofuense: TnS000498063g52 (named GluIAA1 hereafter), TnS000653177g04 (GluIAA2), TnS000867017g28 (GluIAA3), TnS000053353g02 (GluIAA4), TnS000142615g19 (GluIAA5), TnS000994775g01 (GluIAA6). This number of Aux/IAA genes is congruent with that derived from a genome-wide search of G. luofuense. The expression of the six Aux/IAA genes identified was further verified with qRT-PCR ( Figure 6A). In addition, we found seven Aux/IAA genes in W. mirabilis, 12 genes in E. equisetina, and six genes in G. biloba. All amino acid sequences of the identified Aux/IAA genes are listed in Appendix A4.
were predicted to regulate considerably different gene numbers using two prediction approaches: 628 genes were regulated by 722 lncRNAs in cis (Table S6), while 11,967 genes were regulated by 1195 lncRNAs in trans (Table S7).

Aux/IAA Gene Identification, qRT-PCR and Adaptive Evolution Analysis
This is the first study investigating Aux/IAA genes in the Gnetales, and a total of six Aux/IAA genes were found among the transcription regulators (TRs) in G. luofuense: TnS000498063g52 (named GluIAA1 hereafter), TnS000653177g04 (GluIAA2), TnS000867017g28 (GluIAA3), TnS000053353g02 (GluIAA4), TnS000142615g19 (GluIAA5), TnS000994775g01 (GluIAA6). This number of Aux/IAA genes is congruent with that derived from a genome-wide search of G. luofuense. The expression of the six Aux/IAA genes identified was further verified with qRT-PCR ( Figure 6A). In addition, we found seven Aux/IAA genes in W. mirabilis, 12 genes in E. equisetina, and six genes in G. biloba. All amino acid sequences of the identified Aux/IAA genes are listed in Appendix A4.
A pairwise comparison of Ka/Ks values among Aux/IAA genes reveals that the average Ka/Ks values in G. luofuense and E. foeminea were both above 1, whereas the average Ka/Ks value was 1 in W. mirabilis ( Figure 6B). There was no significance between every two gnetalean genera with regard to Ka/Ks values in pairs using Student t-test: however, the Ka/Ks values of G. luofuense were significantly larger than those in G. biloba (p-value = 5.89 × 10 −8 ) and P. taeda (p-value = 7.73 × 10 −4 ), respectively, whereas no significant difference was found between G. luofuense and P. abies (p-value = 0.2281). Besides, the Ka/Ks values of G. luofuense were significantly larger than those of Z. mays (pvalue = 2.18 × 10 −3 ) while significantly smaller than that of A. thaliana (p-value = 1.61 × 10 −3 ) and S. moellendorffii (p-value = 1.82 × 10 −6 ).

Motif Prediction and Phylogenetic Analysis
The motif composition of Aux/IAA genes was investigated with MEME, and the results show that five conservative motifs were identified ( Figure 6C): motif 1, corresponding to domain I of Aux/IAA genes, was characterized by possessing a conservative sequence "LxLxLx"; motif 2, corresponding to domain II, had a conservative sequence "VGWPPV"; motif 3, corresponding to domain III, was characteristic of conservative sequences "VKVxMDG…RK"; motif 4, corresponding to domain IV, had conservative sequences "GDxMLVGDVPW" and "KRxRxMK"; and motif 5 was characterized by possessing newly detected conservative sequences "GLAPRxxEK". The structures A pairwise comparison of Ka/Ks values among Aux/IAA genes reveals that the average Ka/Ks values in G. luofuense and E. foeminea were both above 1, whereas the average Ka/Ks value was 1 in W. mirabilis ( Figure 6B). There was no significance between every two gnetalean genera with regard to Ka/Ks values in pairs using Student t-test: however, the Ka/Ks values of G. luofuense were significantly larger than those in G. biloba (p-value = 5.89 × 10 −8 ) and P. taeda (p-value = 7.73 × 10 −4 ), respectively, whereas no significant difference was found between G. luofuense and P. abies (p-value = 0.2281). Besides, the Ka/Ks values of G. luofuense were significantly larger than those of Z. mays (p-value = 2.18 × 10 −3 ) while significantly smaller than that of A. thaliana (p-value = 1.61 × 10 −3 ) and S. moellendorffii (p-value = 1.82 × 10 −6 ).
The deep divergence of the phylogeny based on canonical Aux/IAA sequences in different seed plant groups usually has low statistical support ( Figure 6B). Nevertheless, the results of phylogenetic analysis reveal that the 72 Aux/IAA genes were divided into two groups: type A and type B genes. Type A genes formed a monophyletic group with good support (clade A, bootstrap values, BS=85) while type B formed two paraphyletic groups-i.e., clades B and C-but with low support. Moreover, Aux/IAA sequences from gymnosperms tend to form a clade, as with the cases found in clades D, E, F, G, H and I, but all the clades received low statistic support.

Full-Length Transcriptome and Gene Annotation
A total of 253,273 corrected isoforms was generated in the present study, with a mean read length of 1837 bp. The full-length transcripts can overcome the problems confronted by the short-read sequencing of conventional RNA sequencing. Using PacBio sequencing technique, 31,928 ORFs and 2043 novel genes were identified, suggesting that full-length transcript sequence information provides useful resources to improve genome annotation in G. luofuense. A final of 53,057 transcripts were assembled, and 26,305 genes from 3896 gene families were annotated against the seven databases (NR, SwissProt, KOG, COG, GO, Pfam and KEGG) ( Figure 3A). Among the ten KEGG pathways annotated ( Figure 3D), "starch and sucrose metabolism", "purine metabolism", "ribosome", "biosynthesis of amino acids", and "carbon metabolism" might participate in the secretion of sugary pollination drops in G. gnemon and G. luofuense as the response to insect pollination [80]. Besides, KEGG pathways in "phenylpropanoid biosynthesis", "plant hormone signal transduction", and "plant-pathogen interaction" might affect fertile reproductive unites development of the two species [80]. Besides, the annotated genes were enriched in the top ten GO enrichment categories ( Figure 3C). Our results are congruent with that eight categories i.e., "metabolic process", "cellular process", "response to stimulus", "catalytic activity", "binding", "cell part", "cell" and "organelle" are all densely enriched in the transcripts of female reproductive organs in Gingko [81] and conifers [82]. These results provide useful information to infer the conservative functional genes in regulating the development of female strobili in gymnosperms.

AS Analysis
In the present study, we found~46% cases of intron retention,~24% of three alternative 3 splice sites,~17% of exon skipping,~12% of five alternative 3 splice sites, and~1% mutually exclusive exons ( Figure 4A). Our results significantly differ from the proportion of AS events identified in Arabidopsis, with~40% intron retention,~15.5% three alternative 3 splice sites,~17% exon skipping,~7.5% five alternative 3 splice sites, and~8% mutually exclusive exons [31]. A similar proportion of intron retention (~40%) was detected in other angiosperms; e.g., maize [83] and sorghum [39]. In other species of angiosperms, intron retention also accounts for the majority of cases, such as bamboo [49], Amborella trichopoda [50] and strawberry [51], red clover [52], sugar cane [53]. Thus, the results corroborate the suggestion that intron retention constituted the majority of AS events in seed plants [84]. In addition, alternative spliced genes are usually over-represented in order to respond to abiotic stress in angiosperm plants [85,86]. Compared to angiosperms, AS events that associated with the resistance of abiotic stress have been rarely reported in gymnosperms. One study shows that the retention of the first intron in genes that encodes β-hydroxyacyl ACP dehydratase was associated with response to cold temperature during seed imbibition and the regulation of the lipids biosynthesis in Picea mariana [87].Another case shows that anthocyanidin synthase gene in Gingko biloba (named GbANS), which possesses alternative splicing, participate into a series of abiotic stresses, e.g., UV-B, cold, and abscisic acid etc. [88]. Accordingly, the 10,454 AS events with varied types detected in the female strobili of G. luofuense pave the way to detecting the responses of reproductive organs in gymnosperms to abiotic stress in further studies.

APA Analysis
Besides AS events, alternative polyadenylation is able to produce different lengths of transcripts and contributed to the diversity of mRNA [35,89,90]. APA events affect different methods of 3 termination, which comprise the protein coding region and 3 UTRs [35,38,89,91]. A previous study shows that alternative polyadenylation of genes Lhcb2 and Lhcb3, affects the stability and transcription of mRNAs during the leaf development of G. biloba [92]. In angiosperms, the different types of the cleavage and polyadenylation of mRNAs play a role in floral development and the flowering time of Arabidopsis [90,93,94], the detection of APA events in the present study, a total of 8128 genes to have poly A sites ( Figure 4B), is helpful for elucidating the female strobili development in G. luofuense. Moreover, we found a clear bias towards nucleotide composition 50 nt upstream and downstream of all poly (A) cleavages ( Figure 4C). The nucleotide composition bias found in G. luofuense is congruent with angiosperms (e.g., Arabidopsis [36,37] and Sorghum [39]) and non-Gnetalean gymnosperms (e.g., Gingko [95]). In addition, we found three overrepresented motifs-(AAAACA), (GGGCGC) and (CGCCGC)-at 25 nucleotides upstream from the poly (A) sits of all the transcripts ( Figure 4D). These motifs, however, all remarkably differ from a significantly over-represented motif-(AAUAAA) detected in angiosperms (e.g., Arabidopsis [39] and Trifolium [52]) and non-Gnetalean gymnosperms (e.g., Pinus [96] and Gingko [95,97]).

LncRNA Analysis
Our results show that a total of 1196 lncRNAs with the mean length 1258 bp were identified for the major developmental stages of G. luofuense female strobili. The lncRNAs in G. luofuense were shorter than the full-length transcripts with the mean length of 1992bp, which is consistent with the lncRNAs detected in Picea abies [98]. We found 488 lincRNA (40.80%), 20 antisense-lncRNA (1.67%), 238 intronic-lincRNA (19.90%), 417 sense-lncRNA (34.87%) and 33 others (2.76%). The number and proportion of lncRNAs in G. luofuense dramatically differ from the 1323 lncRNAs from G. biloba, which possessed 762 lincRNA (57.60%), 211 antisense-lncRNA (15.95%), 100 intronic-lincRNA (7.56%) and 280 sense-lncRNA (21.16%) [99]. Besides, a very recent published full-length trancriptome of G. bioloba consisted of 1270 lncRNAs, comprising 628 lincRNA (49.45%), 88 antisense-lncRNA (6.93%), 259 intronic-lincRNA (20.39%), 267 sense-lncRNA (21.02%) and 28 others (2.20%) [95]. The pronounced interspecific and infraspecific variation in lncRNA numbers corroborates the statement that evolutionarily conserved lncRNAs account for quite a small proportion of RNAs, making up 2-5.5%, probably due to the consequence of rapid evolution [100][101][102]. In addition, lncRNAs have effects on their target genes in the manner of either in cis or in trans [99]. Provided that the concentration of lncRNA is at a low level, they tend to execute in cis, regulating the expression of the genes in the vicinity, whereas at a higher level, lncRNAs can perform in trans instead, acting on the genes regardless of their location [68,69]. Our results show that 1093 target genes were predicted to be regulated by lncRNA in cis and 11,967 genes in trans. These results pave the way to characterizing the functions of lncRNAs in the organ development of Gnetum.

Regulator Proteins
The identification of regulator proteins is important to elucidate the molecular mechanism that regulates the female strobili development of G. luofuense. Our results also show that a total of 3179 regulatory proteins from 176 families were identified, comprising 1413 transcription factors (TFs), 477 transcript regulators (TRs), and 1289 protein kinases (PKs) ( Figure 4B). TFs of MADS-box genes play an essential role in reproductive organ evolution of seed plants [17,103]. We found 38 TFs belong to the MADS-box gene family, of which 24 TFs derive from MIKC c -group genes. One previous study reported that three MIKC c -group genes-GpMADS1, GpMADS3 and GpMADS4-were involved in the female strobili development of G. parvifolium [13]. Besides this, MIKC c -group genes-e.g., GGM2, GGM3, GGM9 and GGM11-participate in the sexual determination and development of female strobili in G. gnemon [14,17,104]. A genome-wide research of MADS-box genes yielded a total of 35 MIKC c -group genes [105], the majority of which were detected to express female strobili and fertile reproductive ovules of G. luofuense [80].
TRs of SNF2 genes were shown to participate in the shoot development and flowering in A. thaliana [106]. In the present study, 31 SNF2 TRs in the female strobili of G. luofuense were identified. In addition, a previous study reported a series of membrane-bound transcription factors in seed plants via the whole genome research, e.g., 3 bHLHs, 4 C2H2, and 5 WRKY TFs in Picea abies [107]. We identified considerably more TFs-92 C2H2, 109 bHLH, and 41 WRKY TFs in G. luofuense. The C2H2, bHLH and WRKY TFs were reported to differentially express during floral organ development in Brassica rapa [108]. Besides this, bHLH TFs mediate the carpel margin tissues and petal sizes of Arabidopsis' flowers [109,110]. Thus, the identified membrane-bound transcription factors might play roles in the female strobili development of G. luofuense. Moreover, we identified 795 RLK/Pelle PKs, accounting for a relatively large proportion (25%) of the detected regulator proteins. The RLK/Pelle family is the largest family of PKs, and over 600 and 1000 RLK/Pelle PKs were found in Arabidopsis and rice, respectively [111]. One gene, CLAVATA1, facilitates the enlargement of the shoot and flower meristems, suggesting an important role in controlling meristem size and floral organ development in A. thaliana [112,113]. In gymnosperm, one gene PsRLK was examined to involve in the seedling development of in Pinus Sylvestris, this gene was strictly expressed in the specialized phloem cells and functioned as a putative receptor-like protein [114].

Diversity and Molecular Evolutionary Rates of Aux/IAA Genes
Aux/IAA genes are a large gene family that is involved in varied developmental processes of plants including embryogenesis, apical dominance, vascular differentiation, lateral root imitation and shoot elongation [23,24,115]. We found 82 Aux/IAA transcription factors from G. luofuense, corresponding to six genes (namely GluIAA1-GluIAA6), and the expression of the six genes was further validated by qRT-PCR ( Figure 6A). Besides of the Aux/IAA genes from Gnetum, we found seven genes from W. mirabilis (i.e., WmiIAA1-WmiIAA7) and 12 genes from E. equistina (i.e., EeqIAA1-EeqIAA12), respectively. The Aux/IAA gene numbers of the three gnetalean genera were all considerably less than those from 13 species (ranging from 16 in A. trichopoda to 63 from G. max) that represent major lineages of angiosperms and P. abies (31) [24]. Considering the low abundance of Aux/IAA genes detected in P. patens (3), S. moellendorffii (9), and G. biloba (6), the diversity of Aux/IAA genes probably dramatically increased during the evolution of land plants. The statement is in accordance with the previous study [24]. In addition, the molecular evolutionary rates were estimated with regard to Ka/Ks values using paralogous Aux/IAA gene pairs for seed plants ( Figure 6B). We found the Ka/Ks values of Aux/IAA paralogous gene pairs were overlapping within the Gnetales. It is noticeable that the average Ka/Ks value in G. luofuense and E. equisetina were both above 1, implying a relatively large proportion of the Aux/IAA paralogous gene pairs might experience positive selection during the course of evolution. The average Ka/Ks value of W. mirabilis was close to 1, suggesting the trend of neutral selection for Aux/IAA paralogous gene pairs in W. mirabilis. In contrast, however, a relatively large proportion of Ka/Ks values in P. abies and all Ka/Ks values of G. biloba and P. taeda were lower than 1, implying that these Aux/IAA paralogous gene pairs experienced purifying selection. Thus, Aux/IAA genes might have evolved faster than those in other non-Gnetalean gymnosperms. Compared to gymnosperms, Ka/Ks values of Aux/IAA paralogous gene pairs exhibited a more diverse pattern among the four angiosperm species, which is in accordance with the results found in a previous study [24].

Phylogenetic Analysis and Structural Detection of Aux/IAA Genes
Phylogenetic analysis is a powerful method to understand the evolution of Aux/IAA genes. Previous studies have made efforts to resolve the phylogenetic relationships of Aux/IAA genes for many angiosperms-e.g., O. sativa [27], Solanum lycopersicum L. [72], and Z. mays [29]-and one gymnosperm species-P. taeda [26]. A recent phylogeny of Aux/IAA genes was constructed based on 253 canonical sequences from 17 land plant species, and was resolved into five clades i.e., clade A-E [24]. The taxon sampling of this 17-species phylogeny; however, this was biased, with merely one gymnosperms species (i.e., P. abies) sampled. A new phylogeny of the present study included the Aux/IAA genes in G. biloba and three Gnetalean genera and shows that the Aux/IAA genes were classified into two types; i.e., type A and type B ( Figure 6C). This classification is largely in accordance with that defined in a previous study [27]. Aux/IAA genes from the Gnetales and other gymnosperms tend to segregate from those from angiosperms (examples shown in clades D, E, F, G, H and I, Figure 5C). Nevertheless, the statement requires further validation based on a well-resolved phylogeny of Aux/IAA genes with an extensive sampling of seed plants. In addition, a total of five motifs was identified ( Figure 6C), of which four motifs were detected as the four conservative domains of Aux/IAA genes [22][23][24]. Interestingly, a new motif with the conservative amino acids "GLAPRxxEK" was identified, which is located in the vicinity of domain IV. The new motif is commonly seen in gymnosperms, especially in the Gnetales, with quite a few present in angiosperms. Moreover, considering the gene structure, a total of eight types were identified for the 72 Aux/IAA genes from the land plant species. The results corroborate the suggestion that Aux/IAA genes already existed before the divergence of gymnosperms and angiosperms [24,26].

Conclusions
This is the first full-length transcriptome of reproductive organs in gymnosperms using PacBio SMRT sequencing. A total of 53,057 transcripts were assembled and 2043 novel genes were identified, improving genome annotation and gene function during the female strobili development of G. luofuense. In addition, AS, APA and lncRNA for all detected transcripts were reported and discussed, paving the way for the analysis of post-transcriptional modification and identification of the regulation mechanism in further studies of reproductive organs in Gnetum. Finally, this is the first study reporting Aux/IAA genes in the Gnetales, and a total of six, seven and 12 Aux/IAA genes were identified for Gnetum, Ephedra and Welwitschia, respectively. The results of phylogenetic analysis and molecular rate assessment corroborate the suggestion that Aux/IAA genes in the Gnetales might have experienced a different evolutionary course from those of other non-Gnetalean groups. Additional work is needed to elucidate the molecular pathways and regulation of the reproductive organs in association with Aux/IAA genes in the Gnetales.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/10/11/1043/s1, Figure S1: ROI and FLNC length distribution; Table S1: Primer information of Aux/IAA genes and qRT-PCR protocol; Table S2: All transcripts annotated by searching against varied databases; Table S3: Gene families and transcript information annotated by searching the Pfam database; Table S4: Alternative splicing events and location sites on the reference genome of G. luofuense; Table S5: Information of polyadenylation events detected from G. luofuense female strobili; Table S6: lncRNAs and corresponding cis-regulated genes; Table S7: LncRNAs and corresponding cis-regulated genes.