Integrated SMRT Technology with UMI RNA-Seq Reveals the Hub Genes in Stamen Petalody in Camellia oleifera

: Male sterility caused by stamen petalody is a key factor for a low fruit set rate and a low yield of Camellia oleifera but can serve as a useful genetic tool because it eliminates the need for artiﬁcial emasculation. However, its molecular regulation mechanism still remains unclear. In this study, transcriptome was sequenced and analyzed on two types of bud materials, stamen petalody mutants and normal materials, at six stages of stamen development based on integrated single-molecule real-time (SMRT) technology with unique molecular identiﬁers (UMI) and RNA-seq technology to identify the hub genes responsible for stamen petalody in C. oleifera . The results show that a large number of alternative splicing events were identiﬁed in the transcriptome. A co-expression network analysis of MADSs and all the differentially expressed genes between the mutant stamens and the normal materials showed that four MADS transcription factor genes, CoSEP3.1 , CoAGL6 , CoSEP3.2 , and CoAP3 , were predicted to be the hub genes responsible for stamen petalody. Among these four, the expression patterns of CoAGL6 and CoSEP3.2 were consistently high in the mutant samples, but relatively low in the normal samples at six stages, while the patterns of CoSEP3.1 and CoAP3 were initially low in mutants and then were upregulated during development but remained relatively high in the normal materials. Furthermore, the genes with high connectivity to the hub genes showed signiﬁcantly different expression patterns between the mutant stamens and the normal materials at different stages. qRT-PCR results showed a similar expression pattern of the hub genes in the RNA-seq. These results lay a solid foundation for the directive breeding of C. oleifera varieties and provide references for the genetic breeding of ornamental Camellia varieties.


Introduction
Camellia oleifera (also known as oil-seed camellia), is an evergreen shrub or small evergreen tree belonging to the genus Camellia of the family Theaceae. It is distributed in the Yangtze River basin and further south of China. In southern China, C. oleifera is an important source of edible oil derived from woody plants. The oil of C. oleifera is high in unsaturated fatty acids and has shown medicinal effects, making it a high-quality edible oil. When the major cultivar of C. oleifera, "Huashuo", is compared with normal C. oleifera flowers, the stamen of the male sterile mutant shows remarkable petalody, leading to sterility. In normal C. oleifera flowers, the stamen shows no abnormal morphology, allowing for the normal dispersal of pollen. Male sterility caused by stamen petalody is a key factor for a low fruit set rate and a low yield of C. oleifera [1,2]. However, male sterility-a relatively common reproductive trait in plants-can serve as a useful genetic tool because it eliminates the need for artificial emasculation. Thus, this condition affects research and field applications in economic forests due to the outbreeding enhancement. Petaloid stamens are an extremely important ornamental characteristic in plants of the same genus.
MADS genes (MADSs) have been reported to be widely involved in the regulation of flower organ development in plants [3][4][5][6]. The development in Arabidopsis thaliana can be interpreted with an ABCE model. In this model, the development of stamens is regulated by B, C, and E class genes, all of which belong to the MADSs; however, some of them have a short expression duration and a low abundance [7].
RNA sequencing (RNA-seq) technology has rapidly become a useful tool for the analysis of differential gene expressions among samples at the transcriptome level in the past decade. With the development of this technology, single-molecule real-time (SMRT) sequencing based on Pacific Biosciences System has made it possible to obtain a highquality full-length transcriptome due to its advantage of ultra-long reads. At the same time, it can provide reference information for the analysis of the alternative splicing (AS) of genes among samples of non-model species without genomes [8][9][10][11]. Polymerase chain reaction (PCR)-induced errors in the next-generation sequencing can be corrected by introducing unique molecular identifiers (UMI) into the transcriptome library as a molecular barcode; this process results in a more accurate expression of data, especially for the quantitative expression of low abundance genes [12][13][14][15].
Petaloid stamens of C. oleifera have a crucial value for artificial hybrid breeding. However, their molecular regulation mechanism still remains unclear. In this study, SMRT technology combined with UMI RNA-seq technology was used to compare the differentially expressed genes (DEGs) between the stamen petalody mutant and the normal stamen at six stamen development stages in order to identify the hub genes responsible for stamen petalody in C. oleifera. The gene regulatory network was comprehensively analyzed.

Plant Materials
Developing flower buds of C. oleifera were sampled from ten-year-old trees with petaloid stamen mutants and normal stamens in the germplasm in Dongcheng Town, Wangcheng District, Changsha, Hunan Province (113 • 21 E, 28 • 05 N). This area has an average annual precipitation of 1380 mm, an average annual temperature of 19.3 • C, and an annual accumulated temperature of 5463 • C. In accordance with our previous studies on the development cycle of the bud samples [16], three biological replicates were collected separately at six stages in 2020 as follows: from 20 June to 28 June, stage 1 (S1) was the formation stage of the stamen primordium (when the stamen primordium cells elongate longitudinally and the primordium protrudes from the torus); from 29 June to 6 July, stage 2 (S2) was the inner stamen formation stage (the innermost 1-2 whorls of stamens are the first to form); from 7 July to 13 July, stage 3 (S3) was the formation stage of the outer stamens (the rest of the stamens are then formed); from 14 July to 21 July, stage 4 (S4) was the differentiation stage of anthers and filaments (the anthers and filaments are separated with clear boundaries); from 22 July to 2 August, stage 5 (S5) was the differentiation stage of the pollen sac (four pollen sacs begin to appear with distinct differences from the septum); from 3 August to 11 August, stage 6, (S6) was the formation stage of the stamen (formation of the pollen sac). The sepals and pedicels were quickly removed from all buds before further study.

Paraffin Section Microscopy
Paraffin section microscopy was performed in accordance with our previous study [17] with a modification. The buds were placed in Carnoy's solution for 10 h to fully eliminate the air from the material. They were then transferred to a 70% alcohol solution and stored in a refrigerator at 4 • C for later use. The fixed material was properly trimmed and then dyed with a 70% haematoxylin solution, washed with water, and dehydrated using alcohol of various concentrations. The sample was made transparent with xylene, embedded in paraffin, and then sliced with a paraffin microtome (Leica RM2235, Heidelberg, Germany) with a thickness of 10 µm. A permanent cover sheet was made with Canada balsam, and photographs were acquired using an optical microscope (Leica DM2500, Heidelberg, Germany).

Scanning Electron Microscopy
The buds were fixed with a 2.5% glutaraldehyde fixative solution (prepared with 0.1 mol·L −1 phosphate buffer) for 2 h, washed with a phosphate buffer (0.1 mol·L −1 ), and then fixed with a 1% osmium fixative solution (prepared with 0.1 mol·-L −1 phosphate buffer) for another 2 h. After washing with a phosphate buffer (0.1 mol·L −1 ), the materials were dehydrated with 30%, 50%, 70%, 80%, 90%, 95%, and 100% ethanol, and then transitioned to 100% tert-butanol to be freeze-dried. The samples were placed on a sample table in an ion sputterer for 30 s and then photographed using a scanning electron microscope (HiTACHI TM4000, Tokyo, Japan).

RNA Isolation, Library Construction and Sequencing
The RNAs were isolated from a total of 36 bud samples (two types of materials at six stages with three biological replicates) by using a RNAprep Pure Plant Kit (Polysaccharides and Polyphenolics-rich) (Qiangen, Beijing, China) in accordance with the kit instructions. The integrity of all the RNA samples was checked through agarose gel electrophoresis and the Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA, USA). The integrity of all the samples was greater than 8.00, and these samples were used for further study. All RNA samples were mixed in equal mole amounts into an RNA pool. mRNAs were enriched by Oligo(dT) magnetic beads for a full-length cDNA library. These samples were prepared in accordance with the Isoform Sequencing protocol (Iso-Seq) using the Clontech SMARTer PCR cDNA Synthesis Kit and the BluePippin Size Selection System protocol as described by Pacific Biosciences (PN 100-092-800-03), and then sequenced using a SMRT approach based on the PacBio sequencing platform. The 36 libraries labeled with UMI were constructed. In contrast to the traditional sequencing library construction, the mRNAs were enriched by the rRNA deletion method, and each cDNA molecule was ligated with a unique barcode sequence adapter before amplification. The constructed libraries were sequenced using the pair-end approach based on the Illumina NovaSeq system. All sequencing processes were completed in Novogene Co., LTD. (Beijing, China).

Sequence Processing and Analysis
The PacBio official software package SMRTLink v10.1 was used to process the raw SMRT data. Subread sequences were first obtained, and circular consensus sequences were obtained by correction among subreads. The sequences were then divided into fulllength sequences and non-full-length sequences in accordance with whether the sequences contained 5 and 3 end primers and a polyA tail. Full-length sequences were clustered using the hierarchical n*log (n) algorithm to obtain the cluster consensus sequence. These sequences were then polished to obtain high-quality consensus sequences, which were corrected by Illumina data, and redundancy was eliminated using software CD-HIT v4.8.1 [18] for subsequent analysis.
The processed SMRT sequences were used as a reference for the assembly of UMI Illumina reads. The clean reads of each sample were compared with the reference with the parameters of the comparison software Bowtie2 v2.4.3 [19] of the RSEM software package [20] with end-to-end and sensitive modes. Other parameters were set to default. The results were counted to obtain the read count value of each gene, and the fragments per kilobase of transcript sequence per million base pairs (FPKM) transformation was performed to obtain the gene expression level. DEGs were screened by comparing the genes expressed in two types of materials at the same developmental stage.
The gene function annotation was conducted on the sequences that were not redundant with CD-HIT software [18] versus databases including NR, NT, PFAM, KOG/COG, Swiss-Prot, KEGG, and GO. The GO enrichment analysis of the DEGs was implemented using the GOseq R package (http://www.bioconductor.org/packages/release/bioc/html/ goseq.html (accessed on 27 July 2020)). The DEGs were then screened by comparing the genes expressed in two types of materials at the same developmental stage by using the DEGSeq R package (http://www.bioconductor.org/packages/release/bioc/html/ DEGseq.html (accessed on 29 July 2020)). The p values were adjusted using the Benjamini Hochberg method. A corrected p-value of 0.05 and a log2 (Fold change) of 1.5 were set as the thresholds for significantly differential expressions.

Alternative Splicing Analysis
SUPPA v2.3 (https://github.com/comprna/SUPPA (accessed on 1 August 2020)) was used to calculate the expression weight (psi) of AS on the basis of the TPM values. The differential AS of the two types was performed using the significance test of psi. The dpsi value was adjusted using the Mann-Whitney U test method. The absolute dpsi value of 0.1 and a p value of 0.05 were set as the thresholds for significantly differential alternative splices.

Identification of MADSs
The MADS transcription factors (TFs) were predicted on ITAK software [21]. The MADS TFs of the model plants A. thaliana were downloaded from TAIR (https://www. arabidopsis.org/ (accessed on 3 February 2021)). The phylogenetic tree of the MADSs from both C. oleifera and A. thaliana was inferred on the basis of protein sequences under maximum likelihood on MEGA v6 [22] and then visualized on software iTOL v5 [23].

Co-Expression Network Analysis
The expression data of all the DEGs and the MADSs were selected for weighted coexpression network analysis (WGCNA) on the R package WGCNA v 1.47 [24,25] with a merge cut height value of 0.25 and a minimum module size of 50. The Kendall model was used to analyze the correlation between the gene expression and the phenotype in order to reduce the sensitivity to outliers [26,27]. The module with the highest correlation with the phenotype was selected, and all the genes in the module were annotated. They were enriched and related to the flower organ development. The genes related were then extracted and visualized on Cytospace software, and the hub genes were screened in accordance with the degree of node connectivity [28,29].

Expression Analysis
The expression patterns of the hub gene and the relatively high connectives were analyzed on the basis of log2 (FPKM). A heatmap of the expression patterns was constructed on TBtools v1.092 software [30] for the mean expression in three biological replicates of each gene at each stage. The clustering of similar gene expression patterns was also completed by TBtools v1.092. Quantitative real-time PCR (qRT-PCR) reactions were conducted by using a Talent qPCR kit in accordance with the operating manual (TIAGEN, Beijing, China) with three technical replicates. The FPKM of the housekeeping gene CoGAPDH was relatively stable in this study. Thus, it was used as an internal control to normalize the relative expression of all the verified genes. The specific primers are listed in the Supplementary Materials section in Table S1.

Observation of Stamen Development
At the first two stages, the difference between the mutant and the normal stamens was unremarkable ( Figure 1A2-F2). At (S3), a slight difference was observed on the top of the stamens in the mutant, which showed subtle directional tendencies of a finger-shape ( Figure 1A3-C3), whereas the normal stamens were only in a finger-shape ( Figure 1D3-F3). At (S4), the top of the finger-shaped stamens in the mutant group showed petaloid direction  Figure 1A4-C4), whereas the normal stamens began to form anthers ( Figure 1D4-F4). At (S5), the upper pollen sacs in the mutant stamens differed significantly from those of the normal stamens ( Figure 1A5-C5). At (S6), the petaloid stamens in the mutants spread downward ( Figure 1A6-C6), whereas the apical part of the anther in the normal stamens retained meristematic tissue ( Figure 1D6-F6). Our previous studies showed that the development of microspores occurs in a normal bud [31]. In the fully opened flower, the anthers of the mutant stamens did not form complete or normal pollen sac structures, resembling petals ( Figure 1A7,B7). By contrast, the anthers of the normal stamens formed full pollen sacs ( Figure 1C7,D7).

Observation of Stamen Development
At the first two stages, the difference between the mutant and the normal stamens was unremarkable ( Figure 1A2-F2). At (S3), a slight difference was observed on the top of the stamens in the mutant, which showed subtle directional tendencies of a fingershape ( Figure 1A3-C3), whereas the normal stamens were only in a finger-shape ( Figure  1D3-F3). At (S4), the top of the finger-shaped stamens in the mutant group showed petaloid direction ( Figure 1A4-C4), whereas the normal stamens began to form anthers (Figure 1D4-F4). At (S5), the upper pollen sacs in the mutant stamens differed significantly from those of the normal stamens ( Figure 1A5-C5). At (S6), the petaloid stamens in the mutants spread downward ( Figure 11A6-C6), whereas the apical part of the anther in the normal stamens retained meristematic tissue ( Figure 1D6-F6). Our previous studies showed that the development of microspores occurs in a normal bud [31]. In the fully opened flower, the anthers of the mutant stamens did not form complete or normal pollen sac structures, resembling petals ( Figure 1A7,B7). By contrast, the anthers of the normal stamens formed full pollen sacs ( Figure 1C7,D7).

Sequence Processing
Full-length transcriptome sequencing was performed on 36 bud materials with the same amount of mixed RNA samples to complete gene sequences. All raw data were deposited in the National Genomics Data Center, China, under BioProject acc. PRJCA004583. A total of 23,603,965 clean reads were generated from SMRT sequencing, resulting in 41,717 genes with an N50 of 3131 after processing, where 15,704 genes were more than 3000 bp long. The length distribution of subreads is shown in the Supplementary Materials section, Figure S1, and the length distribution of the unigenes is shown in the Supplementary Materials section, Figure S2

Alternative Splicing
AS can be accurately identified and analyzed with the advantage of full-length transcriptome sequencing reading length. The results show that a large number of AS events are found in the expressed sequences ( Figure 2). A total of 4088 genes containing two AS isoforms were found, accounting for 31% of the total. Three of the seven AS types were dominant, where 165 genes belonged to the retained intron type, accounting for 4.17% of the total AS genes, followed by the alternative-3 splice-sites type with 317 genes, and the alternative-5 splice-sites type with 135 genes. A total of 341 AS isoforms were differentially expressed between the mutant and the normal samples.

MADS TFs
MADS TFs are vital regulators of floral organ development, especially in A. thaliana, a well-studied model plant. In this study, a total of 42 MADSs were identified. Phylogenetic analysis showed that the MADSs expressed in the development of flower buds in C. oleifera were clustered with multiple A. thaliana MADS subfamilies including SEP, AGL, AP3, FLC, AGL17, SVP, SOC1, AG, AP1, and AGL6 (Figure 3), indicating the diverse regulatory functions of the MADSs in C. oleifera.

Identification of Hub Genes
WGCNA could cluster genes into modules with similar expression patterns, analyze the correlation between the clustering modules and specific traits, and could further identify the hub genes in the network. The results show that the highest correlation coefficient between the module and the stamen trait was 0.74. The gene network of the module indicated that four genes show a high connectivity. These four genes were the hub genes in the stamen petaloid in C. oleracea (Figure 4). Further annotation analysis indicated that the four hub genes belonged to the MADS TF family, including two SEP3, a AGL6, and a AP3, which were designated as CoSEP3.1, CoSEP3.2, CoAGL6, and CoAP3. Genes with a relatively high connectivity to the hub genes were involved in flower organ development, including floral organ morphogenesis (HUA2, SEU, TSO1), hormone response (ARF2, GID1C), AS (CLO, PAPS1), and signal transduction (MKK5, SERK).

Expression Analysis
The expression patterns were obviously different in the mutant and normal samples of the hub genes and their connectivities were detected by RNA-seq ( Figure 5A). These expression profiles were clustered into three main groups, and the four hub genes were all clustered in the third main group. The expressions of two hub genes, CoSEP3.2 and CoAGL6, were consistently high in the mutant samples, but were relatively low in the normal samples, while expressions of the other two hub genes, CoSEP3.1 and CoAP3, were initially low in the mutant buds and then upregulated with development, but remained relatively high in the normal buds. Furthermore, the genes with a high connectivity to the hub genes showed significantly different expression patterns at different stages between the mutant and the normal samples. qRT-PCR results showed a similar expression pattern of the hub genes in the RNA-seq ( Figure 5B). AS is a common post-transcriptional process to enhance the proteome diversity in eukaryotes and is not exempted in the development of plant flower organs. Up to 60% of intron-containing genes in A. thaliana undergo AS. More than 1700 isoforms were differentially expressed between the inflorescent meristem stages and the flower development stages, suggesting that AS is involved in the transition during the floral development in

Discussion
The stamen petalody phenomenon is common in plants, some of which are stable, and some of which change with the variation in environmental conditions [32,33]. The mutant of C. oleifera has a stable phenotype of stamen petrification according to our previous observation, enabling this study to be conducted.
AS is a common post-transcriptional process to enhance the proteome diversity in eukaryotes and is not exempted in the development of plant flower organs. Up to 60% of intron-containing genes in A. thaliana undergo AS. More than 1700 isoforms were differentially expressed between the inflorescent meristem stages and the flower development stages, suggesting that AS is involved in the transition during the floral development in this model plant [34]. In Magnolia stellate, one of the three AS isoforms of AG orthologous, expressed in developing stamens and carpels, can mimic the flower phenotype of the ag mutant and can produce double flowers with the homeotic transformation of stamens into petals in ectopic expressed A. thaliana, showing different functions from the two other isoforms [35]. In rice, OsMADS3, an AG ortholog, has two AS isoforms that differ in only one serine residue. Only the isoform, lacking serine residue, can specify stamens and carpels in ag mutant flowers [36]. Consistent with the previous reports, the numbers of AS isoforms were identified in the transcriptomes of all samples in this study by using the full-length transcriptome as a reference. These AS isoforms are probably involved in the development of certain flower organs.
Gene expression studies based on RNA-seq technology usually generate a large amount of expression data, and WGCNA enables the identification of the hub genes that have a high connectivity with other genes, which are usually the TFs that play a crucial role [24]. On the basis of RNA-Seq and WGCNA, four MADSs as hub genes were identified in the regulation of stamen petaloid in double flowers of Lagerstroemia speciosa [37]. Eighteen genes, including 7 MADSs and 11 other TF genes, were found to be involved in petaloid stamens in Paeonia lactiflora [38]. This finding indicated that MADSs are widely involved in the regulation of stamen petalody in non-model plant species. Similar to previous reports, in this study, four MADSs, including two SEPs, a AGL6, and a AP3, were identified to be involved in petaloid stamen determination in C. oleracea.
The molecular mechanism of flower organ development is best understood in the model plant A. thaliana. This mechanism was proposed as a ABCE model, where B-class MADSs specified either petals or stamens, and C-class MADSs specified stamens and carpels; they act in a combinatorial manner to determine the floral meristem identity [39][40][41]. The specification of the stamen is mainly regulated by the expression of a C-class gene AGAMOUS [42,43]. However, plants have intensely expanded the functional diversity of the MADSs during evolution by the formation of heterotetrameric complexes. SEPALLATA3 (SEP3), an E-class gene, has been regarded as a redundant mediator in flower organ specification [44], but has been later reported to function as a central protein-protein interaction hub, driving tetramerization with other MADSs, such as AG, PISTILLATA (PI), AP3, and even other SEP, to specify the determination of floral organs [45][46][47][48]. In petunia, the phenotype of AG-silenced plants has a similar flower phenotype to that of SEP3-silenced plants. Thirdwhorl stamens in the plants were transformed into petaloid organs, and the two genes were verified to interact [49]. In Zingiberales plants, SEP-like genes have undergone several duplication events giving rise to multiple copies, and AGL6, sister to the SEP-like genes, play a crucial role in the alteration of stamen into petaloid organs [50]. In line with previous studies, in this study, four MADSs, including two SEP3, a AP3, and a AGL6, were identified as hub genes in petaloid stamen determination in C. oleracea. Although the homologous of AG of A. thaliana was found in this study, it was not identified as the hub gene. This finding suggests that this determination may not be made by a single gene but by a combination of MADSs in C. oleracea. However, the expression patterns of the two SEP3 genes were almost opposite in the two types of materials, which might be because their expressions needed to reach a certain balance and interact with other MADSs to determine the petaloid stamen in mutant C. oleracea or other unknown reasons. Therefore, the aim of future research should be to verify the function of the hub genes through genetic transformation.

Conclusions
Transcriptome sequencing was performed and analyzed on two types of materials, stamen petalody mutants and normal materials, at six stamen development stages on the basis of integrated SMRT technology with UMI RNA-seq technology. This process was performed to identify the hub genes responsible for the stamen petalody of C. oleifera. The results showed that numbers of AS isoforms were identified in the transcriptomes. Four MADSs as hub genes responsible for stamen petalody were identified. Among them, CoSEP3.1 was the most important, followed by CoAGL6, CoSEP3.2, and CoAP3. These results lay a solid foundation for the directive breeding of C. oleifera varieties and provide references for the genetic breeding of ornamental Camellia varieties.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/f12060749/s1, Figure S1: length distribution of subreads from SMRT sequencing; Figure S2: length distribution of unigenes. The X-axis represents length (bp), the Y-axis represents the number of genes; Figure S3: Venn diagram of unigene annotation. The sum of the numbers in each large circle represents the number of unigenes for the database annotation, and the overlapping parts of the circles represent unigenes annotation results that are common among databases; Figure S4: DEGs at each stage. DEGs were then screened by comparing the genes expressed in two types of materials at the same developmental stage. Corrected p-value of 0.05 and log2 (Fold change) of 1.5 were set as the threshold for significantly differential expression, values are means of three biological replicates; Table S1: specific primers used for qRT-PCR verification; Table S2: total reads of each sample and mapped reads against the reference sequence. Total reads represent clean data of sequencing reads after quality control; Total mapped represent the number of clean reads that can be matched to the reference sequence.