Identification and Validation of Reference Genes for Gene Expression Analysis in Schima superba

Real-time quantitative PCR (RT-qPCR) is a reliable and high-throughput technique for gene expression studies, but its accuracy depends on the expression stability of reference genes. Schima superba is a fast-growing timber species with strong resistance. However, thus far, reliable reference gene identifications have not been reported in S. superba. In this study, 19 candidate reference genes were selected and evaluated for their expression stability in different tissues of S. superba. Three software programs (geNorm, NormFinder, and BestKeeper) were used to evaluate the reference gene transcript stabilities, and comprehensive stability ranking was generated by the geometric mean method. Our results show that SsuACT was the most stable reference gene and that SsuACT + SsuRIB was the best reference gene combination for different tissues. Finally, the stable and less stable reference genes were verified using SsuSND1 expression in different tissues. To our knowledge, this is the first report to verify appropriate reference genes for normalizing gene expression in S. superba for different tissues, which will facilitate the future elucidation of gene regulations in this species and useful references for relative species.


Introduction
Currently, plant gene expression analysis methods include Northern blot, in situ hybridization, RT-PCR, and real-time quantitative PCR (RT-qPCR). RT-qPCR has been widely used in molecular biology research, and expression analysis is realized by real-time detection of fluorescence signal changes in the whole PCR reaction process due to its high sensitivity, accuracy, specificity, throughput capability, and cost-effectiveness [1][2][3][4][5]. However, the accuracy of relative quantification in RT-qPCR is always affected by many variables, such as RNA quality, integrity, reverse transcription efficiency, and amplification efficiency [2,6]. To ensure accurate results and eliminate errors, it is necessary to use one or more stable reference genes to normalize the expression data of target genes [7].
Reference genes, also known as housekeeping genes, refer to a class of genes that can be stably expressed in different tissues and organs. In plant research, the commonly used reference genes are mainly the genes that constitute the cytoskeleton or participate in the cells' basic biochemical metabolic activities, including actin (ACT), β-tubulin (TUB), ribosomal RNA (18S rRNA, 26S rRNA), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), ubiquitin (UBQ), and elongation factors 1 α (EF-1α) [8][9][10]. However, studies have shown that the expression levels of these genes are specific and not stable across species, under different treatments. A reference gene suitable for all conditions does not exist [11][12][13]. Therefore, in the qualitative and quantitative research of genes, it is necessary to select the appropriate reference genes based on the specific experimental conditions [14].
Schima superba is an evergreen broad-leaved tree in Theaceae, and it is valued commercially for its timber and fire protection [15][16][17]. Theaceae contains about 700 species, which have important economic value, such as the tea plant (Camellia sinensis), with its special drinking value, the traditional oil tree (C. oleifera), which produces high-quality edible seed oil, and the ornamental plant C. azalea, with its attractive flowers. The genus Schima has approximately 20 species and is mainly distributed in southern China and the adjacent parts of East Asia, with 13 species (six endemic) present in China [18]. Some reference genes of Theaceae have been reported. For example, β-actin could be used as a reference gene for tissues and GAPDH for mature leaves and callus in C. sinensis [19]. TUA-3, ACT7α, and CESA were relatively stable in the different tissues of six oil-tea Camellia spp. [20]. TUA and GAPDH were optimal reference genes in different organs, as well as TUB and UBQ in petals in C. azalea [21]. However, there have been few reports on gene expression analysis and reference gene expression stability in S. superba. Only Yang [22] used C. oleifera's GAPDH to verify the differential expression genes between self-and cross-pollinated S. superba, but the expression abundance and stability have not been reported. With the completion of genome sequencing and construction of a high-density genetic map [23], molecular design breeding and molecular-assisted breeding of S. superba have been carried out gradually. Therefore, it is necessary to select appropriate reference genes for gene expression analysis.

Plant Materials
Plant materials were collected from the germplasm bank of S. superba clones (119 • 06 E, 28 • 03 N) in Zhejiang Longquan Academy of Forestry Sciences. The clones were 25year-old mother trees grafted with scions from Jianou, Fujian (118 • 31 E, 27 • 8 N) in 2008. The gene bank covers an area of 6.7 hm 2 and is located at an altitude of 200-300 m in the subtropical monsoon region. The relative humidity of the area is 79%, and the average annual rainfall is 1664.8-1706.2 mm.
In August 2020, different tissues were collected, including secondary xylem, secondary phloem, mature leaf, bud, annual fruit, and root of tissue culture seedlings sub-cultured for 60 days. It was a hot summer, with abundant rainfall and a monthly temperature average of 35 • C. Each sample was set up in biological triplicates. The samples were frozen with liquid nitrogen and stored at −80 • C.

Selection of Candidate Reference Genes
Eighteen candidate reference genes (GeneBank accessions: MW770873-770890) with a stable expression in the different tissues and ages of S. superba were selected according to transcriptome data from our laboratory (unpublished, Novogene, Beijing, China) ( Table S1). Eleven of them are often used as housekeeping genes in model plant species. The ColGAPDH (C. oleifera) (GeneBank accession: KC337052) gene was used as a control [22], and the sequence consistency was only 38.68% with SsuGAPDH. According to their CDS sequences blasted from genomic data of S. superba (unpublished, Novo-gene, Beijing, China) (Table S2), the primers were designed on the web using Primer 3.0 (http://www.primer3plus.com/primer3web/primer3web_input.htm, accessed on August 2020) and synthesized by Sangon Biotech Co., Ltd. (Shanghai, China), as shown in Table S3.

RT-qPCR Analysis
The total RNA was extracted using the RNAprep Pure Plant Plus Kit (polysaccharides and polyphenolics-rich) (Code No. DP441, TIANGEN, Beijing, China) and was stored at −80 • C. RNA was assessed using 1% agarose gel electrophoresis and quantified with a Nanodrop ND-2000 ultra-micro nucleic acid protein analyzer (Thermo, Waltham, MA, USA). The RNA samples with A260/A280 ratios between 1.8 and 2.1 (Table S4) were used to synthesize the first strand cDNA with the PrimeScript TM RT master mix (Perfect Real Time) (Code No. RR036A, Takara, Kyoto, Japan) in 20-µL reaction mixtures, once they were adjusted to the same concentration of RNA to 1 µg. The cDNA was diluted 1:9 with nuclease-free water prior to RT-qPCR analysis.
RT-qPCR was restricted to the following guidelines (Applied Biosystems Q7 The primer specificity was verified by the presence of a single peak in the melt curve analysis during the RT-qPCR process. Three independent biological replicates and three technical repetitions were performed for each of the quantitative PCR experiments. The threshold cycle (Ct) was measured automatically, and correlation coefficients (R 2 ) together with slope were calculated from the standard curve based on a tenfold series dilution of the cDNA templates. The corresponding RT-qPCR efficiencies (E) for each gene were determined from the given slope.

Validation of Identified Reference Genes
SND1 (GeneBank accession: MW796194) was selected as the target gene to validate the reliability of the identified reference genes from transcriptome data (unpublished, Novogene, Beijing, China) ( Table S1). The gene expression profiles at different tissues were normalized using the most and least stable reference gene and ColGAPDH. Sample collections and experiments were performed as described above.

Statistical Data Analysis
The average Ct value was calculated from three biological replicates and three technical replicates. Relative gene expression levels were calculated using the 2 − Ct method [25]. GeNorm, NormFinder, and BestKeeper algorithms were used to evaluate the stability of 19 candidate reference genes. GeNorm and NormFinder calculated the average expression stability values based on the 2 −∆Ct value [26,27]. BestKeeper calculated the standard deviation (SD), coefficient of variance (CV), and correlation coefficient (r) based on the Ct value [28], using geometric means to provide a comprehensive stability evaluation of candidate reference genes.

Candidate Reference Genes and PCR Amplification
Eighteen candidate reference genes were selected from the transcriptome of S. superba, and ColGAPDH was cited from Yang [22]. The presence of a single PCR product of the expected size ( Figure S1) and a single peak in the melting curve ( Figure S2) confirmed specific amplification. The amplification efficiency (E) of all PCR reactions ranged from 93.47% for SsuTUA1 to 109.03% for SsuUBCJ2 ( Table 1), suggesting that these genes were suitable for further gene expression analysis. Meanwhile, the standard curves showed good linear relationships, with correlation coefficients (R 2 ) above 0.99 (Table 1).

Ct Values of Candidate Reference Genes
To assess the expression stability of 19 candidate reference genes in different tissues, the transcript abundances were presented as their Ct values. The Ct values varied in different tissues, from 16.752 (SsuMet2) to 33.379 (SsuCas), while the mean Ct values varied from 18.032 (SsuMet2) to 25.556 (SsuMDH) ( Table 2). The Ct range > 4 of SsuCas, SsuUBC17, SsuUBC2, and SsuUDP in different tissues suggested that these genes varied greatly and were unstable in different tissues.

Analysis of Reference Gene Stability Using Three Bioinformatic Programs
To reduce analysis error, candidate gene stability ranking in different tissues was determined separately using geNorm, NormFinder, and BestKeeper to screen out the reference genes suitable for experimental treatment and provide a beneficial reference for subsequent research.
The geNorm program was used to rank the gene expression stability by calculating the average expression stability values (M) based on the 2 −∆Ct value [26]. The smaller the M value of the reference gene, the more stably it was expressed. Meanwhile, if M > 1.5, it was not suitable as a reference gene [26]. The M values of tested genes evaluated by geNorm are shown in Figure 1. SsuTUA1 and SsuRIB were ranked as the two most stable genes in different tissues, while SsuCas and SsuUDP were the two least stable genes.  The pairwise variation value (Vn/Vn+1), calculated by geNorm, determined the optimal number of reference genes. When Vn/Vn+1 < 0.15, the optimal number of reference genes is n, otherwise, the number is n+1 [26]. In this study, except for V18/V19, the other value of Vn/Vn+1 < 0.15 (Figure 2), indicating two reference genes, would be sufficient for gene normalization, and an increase did not improve sensitivity. NormFinder ranked the expression stability of reference genes by calculating the average pairwise variation in one gene relative to other candidate genes. The smaller the stability value, the more suitable it is as a reference gene [27]. For different tissues, the most stable gene was SsuACT, followed by SsuRIB, while the least stable gene was SsuCas, which was not included with the genes selected by geNorm (Figure 3). Expression stability is represented by the standard deviation (SD), coefficient of variance (CV), and correlation coefficient (r) of Ct values in the BestKeeper program, and the most stable reference genes were identified as those with the lowest SD and CV and the most r [28]. In this study, SsuACT and SsuUBCJ2 were identified as the most stable genes for different tissues, while SsuUBC17, SsuTUB, SsuUBC2, SsuUDP, and SsuCas were unstable because of SD > 1 (Table 3). The rankings of the 19 tested genes were not perfectly consistent among geNorm, NormFinder, and BestKeeper (Table 4). To provide a comprehensive evaluation of candidate reference genes, further analysis was carried out using the geometric mean, which integrates geNorm, Normfinder, and BestKeeper. The comprehensive ranking, recommended by the geometric mean method, is shown in Table 4, and SsuACT was the most stable gene for different tissues. The best combination of reference genes was determined based on the optimal number calculated by geNorm and the ranking list obtained using the geometric mean method. Therefore, SsuACT + SsuRIB was found to be the best combination of reference genes for different tissues.

Validation of the Identified Reference Genes
To examine the reliability of the candidate reference genes for normalization, SsuSND1 expression profiles in different tissues were normalized using the two most stable candidate reference genes (SsuACT and SsuRIB), a combination of stable genes (SsuACT + SsuRIB), and the least stable reference gene (SsuCas), as well as ColGAPDH (Figure 4). When SsuACT, SsuRIB, SsuACT + SsuRIB, and ColGAPDH were used for normalization, the expression patterns of SsuSND1 were similar, and the relative expression of xylem, leaf, and fruit was higher than the others. SsuSND1 was hardly expressed in the bud and root, but the expression was most appropriate for SsuACT and SsuACT + SsuRIB. However, as analyzed by SsuCas, the expression pattern was not compatible, and the expression levels were too high in the fruit and too low in the bud. It was suggested that the selected reference genes were reliable.

Discussion
RT-qPCR is a common technique in molecular biology research [29]. In the analysis process, reference genes are often used to reduce or correct the errors in the quantitative process of target genes. Therefore, the selection of an appropriate reference gene is the key to realizing the research of target gene expression under different experimental conditions or tissues [30]. S. superba has economic value for its timber, and the wood is used for furniture and construction. According to the phylogenetic analysis of Theaceae, Theeae and Gordonieae are closely related to each other, and the genera Schima belongs to Gordonieae [31]. Moreover, we constructed a high-density genetic map and obtained 168 QTLs for 14 phenotypes [23], but it was not focused on their molecular function or gene expression. Therefore, to carry out the follow-up experiment smoothly, a stable and suitable reference gene would be selected and evaluated for the normalization of gene expression analysis by RT-qPCR in our research.
To avoid the limitations of using only a single software analysis, three bioinformatic programs (geNorm, NormFinder, and BestKeeper) were used to evaluate the expression stability of candidate reference genes in our analysis. The basis for evaluating gene stability in geNorm is the use of each gene's 2 −∆Ct value to calculate the M value [26]. Meanwhile, geNorm can determine the optimal number of reference genes required for quantitative analyses. In this study, gene expression analysis needs two reference genes to achieve the best performance. The NormFinder algorithm is similar to geNorm, using the 2 −∆Ct value as the relative expression to calculate the stability of gene expression [27]. BestKeeper focuses on the standard coefficient variation (SD) and variation correlation coefficient (CV) to screen the stability of internal reference genes [28]. The rankings from different programs showed some substantial discrepancies (Table 4). For instance, SsuTUA1 and SsuRIB were the best reference genes identified by geNorm (Figure 1), while SsuACT was evaluated as the best by NormFinder ( Figure 3) and BestKeeper (Table 3). Differences in rankings among these programs have also been reported in other studies [12,32,33], which are likely the result of the different algorithms that they employ [34]. Therefore, to provide a comprehensive evaluation of candidate reference genes, the geometric mean was used to generate a comprehensive stability ranking, and the best combinations were determined based on the optimal number of reference genes calculated by geNorm. SsuACT and SsuACT + SsuRIB were the most stable reference gene and combination for different tissues (xylem, phloem, leaf, bud, fruit, and root). However, in the study processes, we found that the Ct values of 19 candidate reference genes in seeds were significantly greater than in other tissues, so the transcription levels fluctuated greatly in the seven tissues (root, phloem, xylem, leaf, bud, fruit, and seed). The rankings of the three programs were contradictory; thus, we excluded seeds and analyzed only the remaining six tissues. Notably, SsuRIB was the best reference gene predicted among the seven tissues by NormFinder, which indicated that SsuRIB could also be used to normalize the gene expression of seeds, because NormFinder is more suitable for the situation when the genes' transcription level fluctuates greatly [27,28].
Actin is widely used as a reference gene in plants. In this study, SsuACT was also the most stable gene. SsuRIB and SsuCal7 are novel genes screened from the genome and transcriptome of S. superba, and they have not been reported as reference genes in other species. ColGAPDH was not suitable as a reference gene for S. superba because of its low expression and poor stability. We compared the reference genes in S. superba, C. sinensis, C. oleifera, and C. azalea in Theaceae. The optimal reference genes in different tissues were SsuACT, SsuRIB, and SsuTUA1 in S. superba; β-actin in C. sinensis [19]; TUA-3, ACT7α, and CESA in C. oleifera [20]; and TUA and GAPDH in C. azalea [21]. Therefore, ACT could be used as the reference gene in S. superba, C. sinensis, and C. oleifera, and TUA could be used in S. superba, C. oleifera, and C. azalea. Therefore, we presume that ACT and TUA have wide applicability as reference genes in Theaceae.
When a certain tissue is targeted for investigating gene expression, usually the combination of reference genes should be considered. Hence, to screen the best combination of reference genes for each tissue, we set the moderate Ct value standard to 20-25 and selected the gene with the lowest SD value. The results show that the best combinations of reference genes for each tissue were different, and the best gene combination was Ssu-TUA2 and ColGAPDH for leaves; SsuMDH and SsuUBC2 for buds; SsuCal7 and SsueIF5 for fruits; SsuCas and SsuMDH for the phloem; SsuGAPDH and SsuGTP for roots; and ColGAPDH and SsuUBCJ2 for the xylem (Table 2). Differences in the best combination of reference genes between each tissue and different tissues were significant, which indicated that it is necessary to select the appropriate reference genes based on the specific experimental conditions.
To validate the suitability of the identified reference genes, SsuSND1 expression patterns were investigated in different tissues using different reference genes. The expression patterns normalized by SsuCas were not compatible with SsuACT + SsuRIB. The data once again demonstrate that reference genes play a key role in normalizing the data from RT-qPCR, and the use of inappropriate reference genes may lead to inaccurate results. Moreover, NAC plays a crucial role in the formation and development of the apical meristem [35], lateral root [36], and secondary wall [37]. SND1 plays a similar role in S. superba, and the relative expression of secondary xylem was higher than other tissues.

Conclusions
As far as our knowledge goes, this study is the first systematic report on the selection and verification of reliable stable reference genes for different tissues in S. superba, showing that SsuACT was the most stable reference gene, and that SsuACT + SsuRIB was the best combination for different tissues of S. superba. This study provides a basis for gene expression in S. superba and Theaceae.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/genes12050732/s1, Figure S1: Agarose gel electrophoresis analysis of total RNA and PCR products, Figure S2: Melting curves of candidate reference genes and target gene in S. superba, Table S1: Candidate reference genes and target gene in RNA-seq libraries of S. superba, Table S2: CDS of candidate reference genes and target gene, Table S3: Primers used for RT-qPCR of reference genes and target gene, Table S4: Quality of total RNA from different tissues in S. superba.
Author Contributions: Z.Y., R.Z., and Z.Z. designed the research; Z.Y. conducted the experiment, analyzed the data, and wrote the manuscript. All authors have read and agreed to the published version of the manuscript.