Differential Gene Expression with an Emphasis on Floral Organ Size Differences in Natural and Synthetic Polyploids of Nicotiana tabacum (Solanaceae)

Floral organ size, especially the size of the corolla, plays an important role in plant reproduction by facilitating pollination efficiency. Previous studies have outlined a hypothesized organ size pathway. However, the expression and function of many of the genes in the pathway have only been investigated in model diploid species; therefore, it is unknown how these genes interact in polyploid species. Although correlations between ploidy and cell size have been shown in many systems, it is unclear whether there is a difference in cell size between naturally occurring and synthetic polyploids. To address these questions comparing floral organ size and cell size across ploidy, we use natural and synthetic polyploids of Nicotiana tabacum (Solanaceae) as well as their known diploid progenitors. We employ a comparative transcriptomics approach to perform analyses of differential gene expression, focusing on candidate genes that may be involved in floral organ size, both across developmental stages and across accessions. We see differential expression of several known floral organ candidate genes including ARF2, BIG BROTHER, and GASA/GAST1. Results from linear models show that ploidy, cell width, and cell number positively influence corolla tube circumference; however, the effect of cell width varies by ploidy, and diploids have a significantly steeper slope than both natural and synthetic polyploids. These results demonstrate that polyploids have wider cells and that polyploidy significantly increases corolla tube circumference.


Introduction
Organ growth is highly regulated, controlled by both intrinsic and extrinsic factors [1]. Organ size is controlled by either cell growth or cell proliferation, with the combination of the two often being the highest contributor [2]. Cell proliferation occurs early in development followed by cell expansion [3,4], with an indication that cell division and organ growth are to a certain degree independent processes [5]. However, a few studies have shown a co-regulation of cell division and expansion, albeit with different genetic pathways [6][7][8]. The relative contribution of each phase is variable among species [9,10], likely contributing to different mating systems and pollinator syndromes [11][12][13][14]. Flower size has important ecological implications such as pollinator visitation and pollination success [15][16][17] in wild and domesticated plants (as reviewed in [18]). Across angiosperms, closely related species have similar floral architecture in general, yet the size and shape can be highly labile and highly heritable [19].
Nicotiana polyploids and their progenitors to address the following questions related to changes in floral organ size. Are increases in tube width of polyploids due to increased cell width compared to their diploid progenitors? Do the natural and synthetic polyploids of N. tabacum show similar patterns of differential gene expression, both within and among taxa at specific developmental time points? Do any of the known floral size candidate genes show differential expression? Do we see clear associations between organ size genes, overall flower size, and changes in cell size? The analyses presented here will advance our understanding of floral evolution after whole genome duplication and the association of genome size and cell size in floral tissue while also providing an additional comparison to determine if known candidate genes impacting floral organ size show similar patterns of differential expression to previously investigated taxa.

Plant Material
Plant material was grown in greenhouses with natural sunlight and temperatures between 10 and 30 • C at the University of California, Riverside. The following accessions were used: Nicotiana sylvestris A04750326 (Radboud University, Nijmegen, The Netherlands); N. tomentosiformis BRNO 4103 (acquired from A. Kovařík, Brno, Czech Republic); N. tabacum 095-55 (IPK Gatersleben, Germany); N. tabacum 'Chulumani' (collected in the field in Bolivia by S. Knapp); and three first-generation synthetic lines, QM20, QM24, and QM25 (created by K.Y. Lim at Queen Mary, University of London by crossing 4x autotetraploid N. sylvestris and 4x autotetraploid N. tomentosiformis; Figure 1). Because multiple accessions of N. tabacum were used throughout the study, we will refer to plant lines as accessions.
Genes 2020, 11, x FOR PEER REVIEW 3 of 24 Most studies investigating the genetics of flower size have focused on model organisms. There is a large gap in our understanding across the angiosperm tree of life in terms of which genes are responsible for observed differences in floral size and if any clade specific patterns exist. Here, we use Nicotiana polyploids and their progenitors to address the following questions related to changes in floral organ size. Are increases in tube width of polyploids due to increased cell width compared to their diploid progenitors? Do the natural and synthetic polyploids of N. tabacum show similar patterns of differential gene expression, both within and among taxa at specific developmental time points? Do any of the known floral size candidate genes show differential expression? Do we see clear associations between organ size genes, overall flower size, and changes in cell size? The analyses presented here will advance our understanding of floral evolution after whole genome duplication and the association of genome size and cell size in floral tissue while also providing an additional comparison to determine if known candidate genes impacting floral organ size show similar patterns of differential expression to previously investigated taxa.

Flower Size and Cell Size
To measure differences in cell size and cell number, we stained fresh corolla tissue at anthesis in 0.1% aniline blue in 1N K 3 PO 4 for 2 h. We collected the tissue from one half of the mouth of the corolla tube, just below the floral limb ( Figure S1). Stained tissues were mounted on microscope slides and Genes 2020, 11, 1097 4 of 24 imaged on a Leica DM5500 B fluorescent microscope with the DAPI filter using tiling to create a mosaic image of the entire tissue. We measured corolla tube circumference using Fiji [98] with the length of the mounted tissue just below the floral limb ( Figure S1) serving as a proxy for corolla tube width. Number of cells was also counted, and the width of 100 consecutive cells were measured, starting at the dorsal side of the flower. We used four flowers each from N. sylvestris; N. tomentosiformis; N. tabacum 095-55; N. tabacum 'Chulumani'; and the synthetic N. tabacum lines QM20, QM24, and QM25.
To determine if differences in cell size, cell number, and ploidy influence the corolla tube circumference at the corolla mouth, we ran a Linear Model in R [99] using cell number, average cell width, and ploidy as predictor variables. To investigate which factors were significant in the model, we started by including two-way interactions and compared the fit of this model to that of models with a single factor removed using the drop1 function in the lmerTest package (version 3.0.1; [100]). We chose a reduced model (corolla tube circumference~width + ploidy + width:ploidy + cell number) in which all factors included had a significant effect on corolla tube circumference and only removed factors that did not alter the fit of the model. We checked the assumptions of normality and constant variance, and the data were appropriate. We used our chosen model to predict corolla tube circumference using the full range of cell width and cell number values in the dataset and calculated 95% confidence intervals (mean ± 2SE). We used the car::anova function to determine whether model variables significantly affected corolla tube circumference and also performed post hoc pairwise comparisons using the lsmeans function [101] to investigate the significant interaction between ploidy and cell width. In these pairwise analyses, we compared the corolla tube circumference of diploids, natural polyploids, and synthetic polyploids as predicted by our best model at minimum, 1st quartile, median, 3rd quartile, and maximum cell width values. We ran linear models to determine whether ploidy affects both cell width and cell number (width~ploidy and cell number~ploidy) and performed ANOVAs using the aov function with post hoc Tukey tests (α = 0.05) to determine whether diploids, natural polyploids, or synthetic polyploids had significantly different cell widths and cell numbers. We plotted the predicted data based on the continuous effects in R using the geom_smooth function in ggplot2 (version 3.0.0; [102]). We plotted the discrete effects as strip plots and plotted strip plots for cell width, cell number, and corolla tube circumference for all accessions. R scripts were uploaded to GitHub (https://github.com/elizabethwmccarthy/).

Transcriptome Sequencing and Analyses
Flower material from three developmental time points (60%, 85%, and 95% mean corolla length at anthesis including the floral limb, the part of the flower that opens, mean calculated from a minimum of five flowers per accession) from three biological replicates was flash frozen in liquid nitrogen for six taxa: N. sylvestris, N. tomentosiformis, N. tabacum 095-55, N. tabacum 'Chulumani', synthetic N. tabacum QM24, and synthetic N. tabacum QM25. Due to lack of material, only two biological replicates for N. tomentosiformis at 85% and QM25 at 95% were collected. The RNeasy Mini Plant Kit (Qiagen, Hilden, Germany) was used for RNA extraction, followed by DNase treatment using the Turbo-DNase Kit (Ambion, Vilnius, Lithuania). Strand-specific libraries were constructed from mRNA as previously described by [103] and sequenced on an Illumina HiSeq2500 (Illumina, San Diego, CA, USA) with 1 × 85 bp reads. Raw reads were processed using TrimGalore (version 0.4.2; [104]) to remove adapters and filter with a minimum quality score of 30 and a minimum length of 60 bp.
Oxford Nanopore minION long reads were sequenced for stages 60% and 95% for both progenitor species. A total of 100 ng of total RNA was used as input for the cDNA-PCR Barcoding library kit (SQK-PCS109 with SQK-PBK004 barcodes). Four barcoded libraries were pooled with equal volumes. The loaded flowcell was run for 48 h, generating 5.86 million reads and 6.56 GB of raw data, followed by flushing (kit EXP-WSH003) and loading the remaining library for another 24 h, generating an additional 588 thousand reads and 656 MB of data. Fastq files were called from the produced fast5 files using guppy (version 3.4.1; Oxford Nanopore) with a minimum quality score of 7. Fastq reads were demultiplexed using porechop (version 0.2.4; https://github.com/rrwick/Porechop) and error corrected using FMLRC (version 1.0.0; [105]) by first building an index using the cffq function in MSBWT (version 0.3.0; [106]) from the Illumina sequencing reads. FMLRC has been shown to outperform other error correction approaches [107,108]. The resulting corrected reads were filtered with seqkit (version 0.12.0; [109]) to keep reads larger than 800 bp. Accession numbers to the Sequencing Read Archive for each accession are provided in Table S1.
A de novo transcriptome representing N. tabacum was necessary to map individual reads to characterize expression patterns because previous attempts to investigate differential gene expression using the reference genomes [110,111] resulted in a large number of unmapped reads. To minimize homeolog bias during assembly, a species level assembly for both N. sylvestris and N. tomentosiformis were merged in silico to represent N. tabacum. Multiple approaches were undertaken to find the most suitable combination of programs and reads to generate the most complete assemblies in terms of number of contigs, N50, and percent of Solanales Benchmarking Universal Single-Copy Orthologs (BUSCO; version 3.0.2; [112]). Assemblies were carried out with SPAdes (version 3.13.1; [113]), Trinity (version 2.8.4; [114,115]), and TransAbyss (version 2.2.1; [116]). Default parameters for SPAdes were employed. For TransAbyss, a kmer of 64 was used. Default parameters were used for Trinity with the exception of requiring a minimum contig length of 250 bp. Trinity and TransAbyss assemblies were done on the high-performance computing cluster at the University of California, Riverside, while the SPAdes assemblies were done on the CUPAC (Cornell University Plant Anatomy Collection) server. Additional Illumina data from previous studies were downloaded from the European Nucleotide Archive (ENA ; Table S1) and processed with TrimGalore with a minimum quality of 30 and minimum length of 75 bp. Cleaned reads from each accession were normalized to 100× coverage using the BBNorm function in BBTools (version 38.16; http://jgi.doe.gov/data-and-tools/bbtools). The different combinations of data and programs were as follows: (1) SPAdes Illumina data, (2) SPAdes Illumina + ENA + Nanopore, (3) SPAdes ENA + Nanopore, (4) Trinity Illumina, (5) Trinity ENA + Nanopore, and (6) TransAbyss ENA + Nanopore. Assembly quality was estimated by calculating N50 and N90 values using the R package CNEr (version 1.18.1; [117]) after removing contigs smaller than 250 bp using seqkit (version 0.12.0). The percent of BUSCO genes in each assembly was determined using the Solanales library (solanaceae_odb10) with the transcriptome function. Plots were made in R (version 3.6.0; [99]) using the generate_plot.py script.
Filtered assemblies for N. sylvestris and N. tomentosiformis using ENA and Nanopore data were annotated using Trinotate [118] and TransDecoder (version 5.0.2; [115]). The assembly fasta file was prepped for alignment using RSEM (version 1.3.0; [119]) and bowtie2 (version 2.3.4.1; [120]). Contigs were compared to the SwissProt database with ncbi-blast (version 2.7.1+; [121] using BLASTX and BLASTP. The translated peptide assemblies were analyzed with OrthoVenn2 [122] to compare the number of overlapping proteins and the unique proteins in each species. Differential gene expression was determined using limma (version 3.40.6; [123]) and Trimmed Mean of M values (TMM) normalization. Two sets of analyses were performed: one comparing 60%, 85%, and 95% of anthesis length developmental stages within a taxon and one comparing taxa at each developmental stage. The resulting BAM files were sorted and indexed using samtools (version 1.9; [124]) and visualized in IGV (version 2.5.3; [125][126][127]) to check for quality of mapping to candidate genes. Candidate genes were identified via the Trinotate annotation file. For known genes that were not annotated, we used a local BLAST (version 2.2.31) analysis using the TransDecoder CDS file with a minimum cutoff of length 50 bp and minimum e-value 1.0 × 10 −20 . Plots were generated to visualize differentially expressed genes using ggplot2. We plotted the log 2 fold change of differentially expressed genes for each pairwise comparison using violin plots to represent all differentially expressed genes and overlaid strip plots representing the TMM differentially expressed values of organ size genes of interest. Clustering of differentially expressed transcripts was done in the Trinity pipeline with supplied perl scripts using hierarchical clustering of FPKM (Fragments Per Kilobase Million) expression values.

Cell width, Cell Number, and Ploidy Positively Influence Corolla Tube Circumference
Based on previous studies showing wider corolla tubes in allopolyploids than expected [90,91], we tested the hypothesis that increases in tube width are due to increased cell width in polyploids. Cell size, cell number, and corolla tube circumference vary across accessions (Figure 2A-C). Ploidy positively influences cell width (F 2,25 = 23.67, p = 1.71 × 10 −6 ), natural polyploids have significantly wider cells than diploids (TukeyHSD, p = 0.006), and synthetic polyploids have significantly wider cells than both diploids (TukeyHSD, p = 1 × 10 −6 ) and natural polyploids (TukeyHSD, p = 0.012; Figure 2A and Table 1). Ploidy also affects cell number (F 2,25 = 4.892, p = 0.016), and synthetic polyploids have significantly fewer cells than diploids and natural polyploids (TukeyHSD, p = 0.015; Figure 2B and Table 2). We found that cell width (F 1,21 = 181.47, p = 4.39 × 10 −14 ), cell number (F 1,21 = 240.12, p = 5.74 × 10 −13 ), and ploidy (F 2,21 = 6.56, p = 0.0061) all have a significant positive effect on corolla tube circumference by using a linear model ( Figure 2D-F and Table 3; Table 4). Additionally, the interaction between cell width and ploidy is significant (F 2,21 = 7.84, p = 0.0029; Table 4). This suggests that cell number affects corolla tube circumference independent of ploidy, but that the effect of cell width varies by ploidy. Both natural polyploids and synthetic polyploids have significantly less steep positive slopes for the effects of cell width on corolla tube circumference than diploids (p = 0.01 and p = 0.005, respectively; Figure 2D and Table 3), which implies that an increase in cell width in polyploids will not influence corolla tube circumference as much as an increase in cell width in diploids. However, natural polyploids show a marginal difference and synthetic polyploids have significantly larger corolla tube circumferences than diploids at 3rd quantile cell width values, and both natural and synthetic polyploids have a significantly larger corolla tube circumference than diploids at the maximum cell width value (Table S2). There is no difference in corolla tube circumference between diploids, natural polyploids, and synthetic polyploids at 1st quartile or median cell width values, whereas natural polyploids seem to have smaller corolla tube circumferences than diploids at the minimum cell width value (Table S2). In summary, cell width, ploidy, the interaction of width by ploidy, and cell number are all crucial for determining corolla tube circumference (Table 4).

Transcriptome Assemblies
Nanopore sequencing generated 5,380,172 high-quality reads. Specifically, 1,728,564 reads were generated for N. sylvestris 60% and 995,999 reads were generated at 95%. For N. tomentosiformis, there were 992,903 reads generated at 60% and 1,662,706 reads generated at 95%. After removing reads shorter than 800 bp, a total of 1,058,332 reads for N. sylvestris with a mean read length of 1171 bp and 952,350 reads for N. tomentosiformis with a mean read length of 1150 bp remained.
The number of assembled contigs, associated N50 values, and represented BUSCO genes varies greatly among the different assemblies. The only combination that failed to assemble was Trinity using ENA + Nanopore with the -long_reads option. For N. sylvestris, in terms of the size and number of contigs, the best assembly was with SPAdes using ENA + Nanopore. This assembly had 37,710 contigs with a N50 of 2356 bp and N90 of 788 bp (Table 5). Using just Illumina data generated an assembly with 44,350 contigs and an N50 of 1716 bp. Combining all data generated 39,620 contigs and a N50 of 2254 bp. The assembly with all data was slightly worse than ENA + Nanopore with close to 2500 additional contigs and a 100 bp smaller N50 value; even though there were more total reads, the Illumina data from developmental stages were short (85 bp) and only single-end. The worst assemblies were the Trinity assembly (119,790 contigs, N50 of 815) and the TransAbyss assembly with ENA + Nanopore (100,463 contigs, N50 of 1172).
A similar pattern of quality of assembly was seen for N. tomentosiformis with the SPAdes ENA + Nanopore assembly producing 39,950 contigs, a N50 of 2258 bp, and a N90 of 643 bp. The second-best assembly was the SPAdes Illumina only, with the SPAdes Illumina + ENA + Nanopore generating a much worse assembly in terms of number of contigs (51,360 contigs vs. 47,568 contigs) but a better N50 value (2258 bp vs. 1704 bp). Again, the worst assemblies were the Trinity Illumina and the TransAbyss assemblies.
For completeness in terms of percent BUSCO genes present for N. sylvestris, the best assemblies were SPAdes Illumina + ENA + Nanopore, SPAdes ENA + Nanopore, and TransAbyss. For complete single-copy genes, all three ranged from 84.9 to 86.8%, with 6-7.6% genes missing. The Trinity assembly and SPAdes Illumina assemblies had the lowest percentage of complete single copy genes, the largest percentage of fragmented genes, and the largest percentage of missing genes ( Figure 3 and Table 5). For N. tomentosiformis, the best assemblies for complete single-copy genes are the same, but with the TransAbyss assembly having slightly higher percentage of complete single-copy genes and lower fragmented genes. The Trinity and SPAdes Illumina assemblies were the worst, with the Trinity assembly coming in far below the others in terms of complete single-copy genes and over double the amount of fragmented and missing genes.
Genes 2020, 11, x FOR PEER REVIEW 9 of 24 using ENA + Nanopore with the --long_reads option. For N. sylvestris, in terms of the size and number of contigs, the best assembly was with SPAdes using ENA + Nanopore. This assembly had 37,710 contigs with a N50 of 2356 bp and N90 of 788 bp (Table 5). Using just Illumina data generated an assembly with 44,350 contigs and an N50 of 1716 bp. Combining all data generated 39,620 contigs and a N50 of 2254 bp. The assembly with all data was slightly worse than ENA + Nanopore with close to 2500 additional contigs and a 100 bp smaller N50 value; even though there were more total reads, the Illumina data from developmental stages were short (85 bp) and only single-end. The worst assemblies were the Trinity assembly (119,790 contigs, N50 of 815) and the TransAbyss assembly with ENA + Nanopore (100,463 contigs, N50 of 1172). A similar pattern of quality of assembly was seen for N. tomentosiformis with the SPAdes ENA + Nanopore assembly producing 39,950 contigs, a N50 of 2258 bp, and a N90 of 643 bp. The secondbest assembly was the SPAdes Illumina only, with the SPAdes Illumina + ENA + Nanopore generating a much worse assembly in terms of number of contigs (51,360 contigs vs. 47,568 contigs) but a better N50 value (2258 bp vs. 1704 bp). Again, the worst assemblies were the Trinity Illumina and the TransAbyss assemblies.
For completeness in terms of percent BUSCO genes present for N. sylvestris, the best assemblies were SPAdes Illumina + ENA + Nanopore, SPAdes ENA + Nanopore, and TransAbyss. For complete single-copy genes, all three ranged from 84.9 to 86.8%, with 6-7.6% genes missing. The Trinity assembly and SPAdes Illumina assemblies had the lowest percentage of complete single copy genes, the largest percentage of fragmented genes, and the largest percentage of missing genes ( Figure 3 and Table 5). For N. tomentosiformis, the best assemblies for complete single-copy genes are the same, but with the TransAbyss assembly having slightly higher percentage of complete single-copy genes and lower fragmented genes. The Trinity and SPAdes Illumina assemblies were the worst, with the Trinity assembly coming in far below the others in terms of complete single-copy genes and over double the amount of fragmented and missing genes.
We used the SPAdes ENA + Nanopore assemblies for further analyses, and these assemblies have a total of 6115 shared protein clusters for both progenitor species, with an additional 460 protein clusters unique to N. tomentosiformis and 480 protein clusters unique to N. sylvestris ( Figure S2 and Table S3). . Percent complete single copy, fragmented, and missing BUSCO genes for both N. sylvestris and N. tomentosiformis across the five different assemblies performed: including the Nanopore longreads gave an increase of complete single copy genes and decrease in missing genes compared to short-read only, regardless of assembler.

SPAdes Illumina
TransAbyss ENA + Nanopore SPAdes Illumina + ENA + Nanopore SPAdes ENA + Nanopore N. tomentosiformis Figure 3. Percent complete single copy, fragmented, and missing BUSCO genes for both N. sylvestris and N. tomentosiformis across the five different assemblies performed: including the Nanopore long-reads gave an increase of complete single copy genes and decrease in missing genes compared to short-read only, regardless of assembler.
We used the SPAdes ENA + Nanopore assemblies for further analyses, and these assemblies have a total of 6115 shared protein clusters for both progenitor species, with an additional 460 protein clusters unique to N. tomentosiformis and 480 protein clusters unique to N. sylvestris ( Figure S2 and Table S3). Table 5. Assembly statistics for each transcriptome assembly performed for N. sylvestris and N. tomentosiformis including the assembler used, data that went into the assembly, number of contigs, N50, N90, percent complete single-copy BUSCO genes, percent fragmented BUSCO genes, and percent missing BUSCO genes.

Differential Expression
Through development, there is a pattern in which the 85% and 95% stages are most similar based on gene expression ( Figure S3) while the most differentially expressed (DE) genes occur in comparisons between 60% and 95% (Table 6). Within N. sylvestris, of the total 37,097 transcripts, there are 35 DE transcripts between 60% and 85%, 537 DE transcripts between 60% and 95%, and zero DE transcripts between 85% and 95% using a cutoff of False Discover Rate (FDR) = 0.05 and a 4-fold change in expression (Table 6). Within N. tomentosiformis, of the 39,950 transcripts, there are 227 DE transcripts between 60% and 85%, 801 DE transcripts between 60% and 95%, and no DE transcripts between 85% and 95%. For the natural polyploids N. tabacum 'Chulumani' and 095-55, the comparisons with the most DE transcripts are between 60% and 95% with 4289 and 3796, respectively. In comparison, the synthetic N. tabacum accessions QM24 and QM25 show 2914 and 2720 transcripts differentially expressed. Table 6. Number of differentially expressed contigs within a taxon through the three developmental stages: 60% of anthesis, 85% of anthesis, and 95% of anthesis. The number of differentially expressed contigs with a 4-fold change in expression and FDR less than 0.05 are reported, along with the total number of transcripts present in the given comparison. When looking at clustering of expression patterns across all taxa, there is a cluster of natural polyploids and a cluster of synthetic polyploids, with the two progenitor species being quite distinct from all polyploids ( Figure S4). In fact, based on patterns of DE transcripts, N. tomentosiformis appears to be more similar to all N. tabacum accessions at 60% and 95% while N. sylvestris is more similar to the N. tabacum accessions at 85%. At 60%, we see 43,147 DE transcripts between the progenitor species. Between the two natural polyploids, there are 781 DE transcripts, and between the two synthetic polyploids, there are 177 DE transcripts. The same pattern exists at 85% with 42,275 DE transcripts between progenitor lines, 605 DE transcripts between the natural polyploids, and 228 DE transcripts between the synthetic polyploids. Within the 95% comparisons, there are 479 DE transcripts between natural polyploids but no differentially expressed transcripts between the synthetic polyploids. Between the diploid progenitors at 95%, there are 43,023 DE transcripts.
Hierarchical clustering based on differential expression for the 60% comparisons yielded 8 subclusters (Figure 4). In the first four subclusters, representing 44,852 transcripts, both natural and synthetic polyploids show expression patterns similar to one or the other progenitor. Subcluster 5 with 573 transcripts shows that the synthetic polyploids and N. sylvestris have similar expression patterns whereas the natural polyploids and N. tomentosiformis have similar expression patterns. Subcluster 6 with 375 transcripts shows the opposite pattern, with the synthetic lines similar to N. tomentosiformis and the natural polyploids similar to N. sylvestris. The remaining two clusters show transgressive expression patterns with the synthetic polyploids and natural polyploids showing the highest expression in subcluster 7 (474 transcripts) and subcluster 8 (30 transcripts), respectively. The breakdown of annotated transcripts in each subcluster is presented in Table S4.
For comparisons within the 85% developmental stage, 12 subclusters were identified. Five subclusters comprising 42,080 transcripts show a pattern with the polyploids being similar to one of the progenitors ( Figure S5). Subcluster 4 (353 transcripts) and subcluster 7 (646 transcripts) show similar expressions between the synthetic lines and N. sylvestris, whereas the natural polyploids resemble N. tomentosiformis. Subcluster 12 (18 transcripts) shows the opposite pattern, with the natural polyploids similar to N. sylvestris and the synthetic lines similar to N. tomentosiformis. Subcluster 8 (520 transcripts) shows the synthetic polyploids displaying transgressive upregulated expression. Three subclusters show unique patterns not found in the 60% developmental stage. In subcluster 5 (1986 transcripts), both the synthetic and natural polyploids appear to have an intermediate expression between the two progenitor species. Subcluster 9 (215 transcripts) shows that the natural polyploid N. tabacum 095-55 is similar to N. tomentosiformis, whereas N. tabacum 'Chulumani' is similar to N. sylvestris. Subcluster 10 (122 transcripts) shows that N. tabacum 095-55 is similar to N. sylvestris, whereas N. tabacum 'Chulumani' is intermediate between the two progenitors. The breakdown of annotated contigs in each subcluster is presented in Table S5.    Table S6.
We were able to identify many of the known candidate genes involved in floral organ size. In general, we see more candidate genes differentially expressed within the different polyploid accessions in comparisons between 60% and 95% ( Figure 5); however, we do see differential expression within polyploid accessions in at least a few genes across all stages. A few candidate genes show differential expression within N. sylvestris or N. tomentosiformis through development except for BPE which is upregulated in 95% in N. sylvestris compared to 60% ( Figure S7). As expected, comparisons between polyploids and one of their diploid progenitors show that polyploids display upregulation of the other progenitor homeolog ( Figure S8). The N. sylvestris copy of GASA/GAST1 appears to have higher levels of expression in all polyploids compared to the expression found within N. sylvestris across all developmental stages ( Figure S8). We also see downregulation of some of the N. sylvestris genes that act as inhibitors, such as BIG BROTHER (BB) and BIGGER ORGANS, in the polyploids compared to expression in N. sylvestris as well as downregulation of the N. tomentosiformis copy of BB in polyploids compared to expression in N. tomentosiformis ( Figure S8).
Genes 2020, 11, x FOR PEER REVIEW 13 of 24 expression within polyploid accessions in at least a few genes across all stages. A few candidate genes show differential expression within N. sylvestris or N. tomentosiformis through development except for BPE which is upregulated in 95% in N. sylvestris compared to 60% ( Figure S7). As expected, comparisons between polyploids and one of their diploid progenitors show that polyploids display upregulation of the other progenitor homeolog ( Figure S8). The N. sylvestris copy of GASA/GAST1 appears to have higher levels of expression in all polyploids compared to the expression found within N. sylvestris across all developmental stages ( Figure S8). We also see downregulation of some of the N. sylvestris genes that act as inhibitors, such as BIG BROTHER (BB) and BIGGER ORGANS, in the polyploids compared to expression in N. sylvestris as well as downregulation of the N. tomentosiformis copy of BB in polyploids compared to expression in N. tomentosiformis ( Figure S8). Comparing the natural and synthetic polyploids at 60%, we see downregulation of the N. tomentosiformis copy of BB in N. tabacum 095-55 compared to all other polyploids ( Figure 6). For QM24, the N. tomentosiformis copy of ARF2 appears to be downregulated compared to the two natural polyploids, whereas the transcripts in QM25 do not show differences. The N. sylvestris copy of GASA/GAST1 appears to be upregulated in QM25 compared to the natural polyploids, whereas the transcript in QM24 does not show any difference.
Genes 2020, 11, x FOR PEER REVIEW 14 of 24 represent differentially expressed candidate floral organ size genes. Positive logFC values represent genes upregulated in the first stage listed in the comparison, whereas negative logFC values represent genes upregulated in the second stage listed. The dashed gray lines across the plot denote the log2 = |2| cutoff for differentially expressed genes. Sylv = N. sylvestris, tomf = N. tomentosiformis.
Comparing the natural and synthetic polyploids at 60%, we see downregulation of the N. tomentosiformis copy of BB in N. tabacum 095-55 compared to all other polyploids ( Figure 6). For QM24, the N. tomentosiformis copy of ARF2 appears to be downregulated compared to the two natural polyploids, whereas the transcripts in QM25 do not show differences. The N. sylvestris copy of GASA/GAST1 appears to be upregulated in QM25 compared to the natural polyploids, whereas the transcript in QM24 does not show any difference.  In the comparisons at 85%, we see upregulation of both progenitor copies of ARGOS in the natural polyploid N. tabacum 'Chulumani'. The N. tomentosiformis copies of GASA/GAST1 appear downregulated in QM24 compared to the natural polyploids and QM25, whereas copies from both progenitors appear upregulated in QM25 compared to both natural polyploids. We also see upregulation of the N. tomentosiformis copy of BB in both QM24 and QM25 compared to N. tabacum 095-55, whereas BB from N. sylvestris is upregulated in both QM24 and QM25 compared to N. tabacum 'Chulumani'. Additionally, the LONGIFOLIA genes from N. sylvestris appear to be upregulated in both synthetic lines compared to N. tabacum 'Chulumani' but show no expression differences with N. tabacum 095-55.
At 95%, we see consistent upregulation of the N. tomentosiformis copy of GASA/GAST1 in QM24 compared to natural polyploids (we also see upregulation in QM25 versus N. tabacum 'Chulumani'), while in QM25, the N. sylvestris copies of GASA/GAST1 show upregulation compared to the natural polyploids. We generally see upregulation of BB in synthetic lines compared to the natural polyploids (except in QM24 compared to N. tabacum 'Chulumani').

Discussion
Nicotiana polyploids have significantly wider cells than diploids, with synthetic polyploids having even wider cells than natural polyploids, and the ploidy factor in our model, which includes three levels-diploids, natural polyploids, and synthetic polyploids-significantly increases corolla tube circumference. Synthetic polyploids show a decrease in the number of differentially expressed transcripts within a taxon through development compared to natural polyploids. Clustering by expression patterns shows a relatively small number of transcripts displaying transgressive expression in either the natural or synthetic polyploids compared to the progenitors, while even fewer transcripts show an intermediate expression profile. Many of the known candidate genes for floral organ size show differential expression, specifically ARF2, BB, and GASA/GAST1.
Liqin et al. [49] showed upregulation of BB in diploids compared to synthetic triploids and synthetic tetraploids of Populus and suggested that BB acts as a species-specific organ size checkpoint. More importantly, Liqin et al. [49] put forward the hypothesis that polyploids may need higher expression levels of BB to obtain normal organ size. BIG BROTHER was originally described as limiting organ size by restricting the period of proliferative growth [26]. Here, we see higher expression levels of the N. tomentosiformis copy across all three developmental time points in the synthetic polyploids than the natural polyploids. In addition to upregulation of BB compared to otherN. tabacum accessions, the synthetic polyploid QM25, which has the largest flowers of all the polyploids investigated, shows higher expression of GASA/GAST1 than the natural polyploids, especially the copy from N. sylvestris. Members of the GASA family, in particular GASA4, are expressed in developing roots and flowers [24]. Indeed, in Saltugilia (Polemoniaceae), GASA was expressed at the highest levels at mature stages, which corresponded to cell elongation in the petal tube [50,128]. The accession QM25 has the longest corolla tube examined excluding N. sylvestris (Figure 1). The interaction between a gene that represses proliferation (such as BB [26]) and a gene that promotes cellular elongation (such as GASA [24]) in polyploids is intriguing, especially since the upregulated homeolog of the repressor is from the shorter flowered progenitor and the upregulated homeolog that promotes elongation comes from the longer flowered progenitor. In addition, the N. tomentosiformis copy of ARF2, which is involved in promoting cell proliferation [34], is downregulated in QM24 compared to the natural polyploids; this downregulation may play a role in restricting cell division in this accession because QM24 has the fewest cells of the accessions examined ( Figure 2B).

Differences in Expression
We see that natural and synthetic polyploids have expression patterns similar to one progenitor for the vast majority of transcripts across all developmental stages (Figure 4 and Figures S5 and S6). In total, there are 1612 transcripts in which the synthetic lines and N. sylvestris share similar expression patterns, whereas the natural polyploids are similar to N. tomentosiformis. Conversely, there are only 769 transcripts where the synthetic lines have similar expression to N. tomentosiformis, whereas the natural polyploids were similar to N. sylvestris. This demonstrates that the expression patterns in the synthetic lines tend to more closely resemble those of the maternal progenitor, N. sylvestris, whereas those in the natural polyploids tend to be more like the paternal progenitor, N. tomentosiformis.
The scaling of transcription with cell size has been shown previously [129][130][131], yet gene expression scales independently of ploidy [132,133] with no universal linear relationship [134]. We see distinct differences in cell width between natural and synthetic polyploids ( Figure 2) and that natural and synthetic polyploids form two distinct clusters based on patterns of differentially expressed genes across developmental stages. Even with the same genomic input, we see clear evidence at the phenotypic and transcriptomic levels that the two types of polyploids are distinctly different, with clear differences in floral size (Figure 1).

De Novo Transcriptome Assembly
Before undertaking the de novo approach, we tried appropriate analyses using the published genomes currently available for both progenitor species, N. sylvestris and N. tomentosiformis [110], as well as for N. tabacum [111]. Likely due to the total number of contigs in each assembly (253,917 contigs for N. sylvestris, 159,598 contigs for N. tomentosiformis, and 1,084,432 contigs for N. tabacum) and possible fragmentation of genes, we were unable to get clear results for differentially expressed genes, including genes associated with the anthocyanin pathway [135], which based on our sampling of developmental stage should have shown differential expression. The generated de novo assemblies were only slightly worse in regards to missing data than the published genomes with 6% and 9.2% missing BUSCO genes compared to 2.2% and 2.5%, which provided the power to characterize differential expression at a finer scale than just being able to characterize the broadest gene ontology terms [136]. The de novo assemblies proved better than the assembled genomes with fewer fragmented contigs of candidate genes of organ size and flower color. Similar to other recent studies, there is a wide distribution of quality of assemblies based on the programs and methods used [137,138]; in our case, we saw that SPAdes outperformed Trinity in regards to completeness of BUSCO genes and number of contigs. Adding just over a million long-reads to the N. sylvestris transcriptome decreased the number of contigs by nearly 7000 and increased the N50 values of the assembly by approximately 500 bp while increasing the complete single copy BUSCO genes from 78.5% to 86.4% (Figure 3). Similarly, for N. tomentosiformis, adding 950 thousand long-reads decreased the number of contigs by over 7700, increased the N50 from 1704 bp to 2365 bp, and increased the percent complete single copy genes from 79.5% to 82.7%. The addition of long-reads allowed for differentiation between N. sylvestris and N. tomentosiformis homeologs, which was not possible with only the Illumina data [139].

Cell Size Increase Results in Wider Corolla Tubes
Young allopolyploids tend to evolve shorter and wider corolla tubes than expected based on the mean of the morphology of their progenitors, which may allow for a greater variety of pollinator types to access the nectar rewards of these flowers [90,91]. Cell width, cell number, and ploidy positively influence corolla tube circumference, and polyploids have significantly wider cells than diploids, with either the same number of cells or fewer ( Figure 2). Thus, the wider tubes in allopolyploids result largely from an increase in cell width. The increase of cell size may result from increased growth during longer cell cycle times associated with increased genome content [140].
Other studies have also found a correlation between higher ploidy and increased cell size [59,[63][64][65]141]. These results suggest that an increase in cell size may be a direct consequence of the increase in genome size that is inherent in polyploidization [58]. Further evidence from Arabidopsis thaliana diploid, triploid, and tetraploid hybrids indicate that guard cell size increases with ploidy but that hybrids of the same ploidy as their progenitors do not display an increase [142]. Therefore, the increase in cell size is driven by polyploidy, not hybridization. Thus, the wider tubes observed in young Nicotiana polyploids may be a direct result of increased cell width at polyploid origin due to genome duplication. Similarly, flower size increased with polyploidy in colchicine-induced autotetraploids of pomegranate (Punica granatum; [143], Phlox [144], and Gerbera [145]; in natural autopolyploid populations [146]; and in synthetic allopolyploid lines of A. thaliana [142,146] and Brassica napus [147]. Intriguingly, synthetic allopolyploids have significantly wider cells than naturally occurring allopolyploids that arose approximately 0.6 million years ago [87]. This suggests that increased cell size in polyploids may attenuate over time [71], which correlates with our results that young polyploids tend to have wider corolla tubes than expected while older polyploids do not [90]. However, this attenuation of cell size in natural polyploids does not seem to be due solely to reduction of genome size following polyploidy because the natural N. tabacum genome has only downsized 3.7% compared to the expected sum of its progenitors [148]. In addition, these cell size results are from a single allopolyploid species represented by only two natural accessions and three synthetic lines. Another possibility is that the colchicine treatment and tissue culture processes used in the creation of the synthetic lines has affected cell size in the synthetic allopolyploids. However, a study separating the effects of polyploidy and colchicine treatment found that guard cell length was primarily affected by ploidy [149]. In addition, conical cells in floral limb tissue of Nicotiana rustica (approximately 0.7 million years old) are nearly the sum of those of its progenitor species while conical cells of three species of Nicotiana section Repandae (approximately 4.3 million years old) [87] are intermediate in area between those of their progenitors [150]. This intermediate cell size is consistent with the fact that tube width in section Repandae is intermediate between progenitor morphology, as expected [91]. The potential reversion to a diploid-like cell size over time may explain why older polyploids do not display the same trends in corolla tube width evolution as young polyploids [90]. Again, this reversion to a diploid-like cell size does not seem to be correlated to genome size because two of the section Repandae species examined have genome sizes that have increased following polyploidy based on the sum of their diploid progenitors [148].

Conclusions
Inclusion of Oxford Nanopore long-read data significantly improved our transcriptome assemblies such that they contained a higher percentage of BUSCO genes and allowed for differentiation between progenitor homeologs. Floral organ size genes ARF2, BIG BROTHER, and GASA/GAST1 show upregulation in natural and synthetic N. tabacum accessions compared to the progenitor species. The synthetic polyploid accessions also showed upregulation of the candidate genes compared to the natural polyploids. The highest expression of candidate genes was observed in earlier stages of flower development. The wider corolla tubes observed in young allopolyploids are predominantly due to an increase in cell width, perhaps resulting from whole genome duplication, which suggests that morphological change toward more generalist pollination may be a direct consequence of polyploidy.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/9/1097/s1. Figure S1: Diagram of corolla tube sampling, Figure S2: OrthoVenn diagram comparisons of protein clusters between progenitor transcriptomes, Figure S3: Hierarchical cluster of gene expression between developmental stages with a taxon, Figure S4: Hierarchical cluster of gene expression between taxa within a developmental stage, Figure S5: Subclustering of transcripts across all taxa at 85% anthesis, Figure S6: Subclustering of transcripts across all taxa at 95% anthesis, Figure S7: Differentially expressed candidate genes within each progenitor species at different developmental stages, Figure S8: Differentially expressed candidate genes between all comparisons at different developmental stages, Table S1: Sequencing information for each accession, Table S2: Pairwise comparisons of ploidy categories, Table S3: Protein comparison information between progenitor species, Table S4: Contigs for each subcluster at 60% anthesis, Table S5: Contigs for each subcluster at 85% anthesis, Table S6: Contigs for each subcluster at 95% anthesis.