Selection and Validation of Reference Genes for Gene Expression Studies Using Quantitative Real-Time PCR in Prunus Necrotic Ringspot Virus-Infected Cucumis sativus

Several members of the genus Ilarvirus infect fruit trees and are distributed worldwide. Prunus necrotic ringspot virus (PNRSV) is one of the most prevalent viruses, causing significant losses. Cucumis sativus can be infected by several ilarviruses, leading to obvious symptoms, including PNRSV, which suggests that cucumbers could be good hosts for the study of the pathogenesis of ilarviruses. Real-time quantitative PCR is an optimal choice for studying gene expression because of its simplicity and its fast and high sensitivity, while its accuracy is highly dependent on the stability of the reference genes. In this study, we assessed the stability of eleven reference genes with geNorm, NormFinder, ΔCt method, BestKeeper, and the ranking software, RefFinder. The results indicated that the combined use of EF1α and F-BOX was the most accurate normalization method. In addition, the host genes AGO1, AGO4, and RDR6 were selected to test the reliability of the reference genes. This study provides useful information for gene expression analysis during PNRSV infection and will facilitate gene expression studies associated with ilarvirus infection.


Introduction
Gene expression analysis is one of the important and efficient approaches to investigating gene function and revealing phenotypic differences in biological research [1]. Microarray analysis [2], transcript profiling [3], and real-time quantitative polymerase chain reaction (RT-qPCR) [4] are common methods used to identify gene expression patterns and compare RNA levels at transcriptional levels among different populations. In particular, RT-qPCR has been widely utilized in the quantification of the expression levels of single or several genes because of its simplicity, speed, sensitivity, accuracy, and relatively low cost [5]. RT-qPCR has also been used to evaluate the accuracy of microarray analysis and transcriptome data [6]. The selection of appropriate reference genes to ensure the accuracy of transcript normalization is both a prerequisite and the most prominent challenge in the performance of RT-qPCR analyses [7]. The use of reference genes that are not stable has a significant impact on the results, even leading to different conclusions [8]. The ideal reference genes are usually stably expressed across all the compared samples or groups, regardless of the experimental conditions, tissue differences, or treatments [9]. However, there is no universal gene that can satisfy all of the requirements [10]. Some housekeeping genes, including ACTIN, TUBULIN, UBIQUITIN, and GAPDH, ae the most commonly used reference genes in RT-qPCR tests; however, they vary substantially under different conditions [11][12][13]. A growing body of research indicates that it is critically important to evaluate the stability of the candidate reference genes experimentally for accurate quantification analysis case-by-case, which depends on specific species, the sample or tissue type, different treatments, experimental conditions, etc. [14,15].
Prunus necrotic ringspot virus (PNRSV) is a member of the genus Ilarvirus in the family Bromoviridae, which was first identified in peach (Prunus persica L.) in 1941. It is one of the most economically important and prevalent viruses, and mainly infects stonefruit trees, including peach [28], nectarine (P. persica) [29], almond (P. dulcis) [30], sweet cherry (P. avium) [31], sour cherry (P. cerasus) [32], plum (P. domestica) [33], and apricot (P. armeniaca) [34], causing significant losses; it can also infect apple (Malus domestica) [35], rose (Rosa chinensis) [36], cucumber (Cucumis sativus) [37], hops (Humulus) [38], etc. According to research by Pallas et al., PNRSV infection in Prunus species can reduce bud-take in nurseries, decrease fruit growth by 10-30%, induce fruit-yield losses by 20-60%, and delay fruit maturity [39]. Recently, many studies focused on the development of viral detection techniques and the culture of virus-free propagation materials, which reduced the incidence of PNRSV greatly in the production [40]. However, there is little documented evidence as to its pathogenic mechanism. Cucumber is the natural host of PNRSV and shows symptoms of mosaic leaves and severe stunting once infected ( Figure 1) [41]. Furthermore, cucumber can also be infected by other ilarviruses from woody plants, which cause obvious symptoms in laboratory conditions involving agrobacterium-mediated infectious clones, such as apple mosaic virus [42] (ApMV), apple necrotic mosaic virus (ApNMV, data not shown), and prune dwarf virus [43] (PDV). Furthermore, the availability of the complete genomic sequence of cucumber and the successful application of virus-induced gene silencing (VIGS) vectors on cucumber, such as cucumber green mottle mosaic virus (CGMMV)-based [44] and tobacco ringspot virus (TRSV)-based vectors [45], has also contributed to the functional genomics research on cucumber and increased the utilization of cucumber as a model plant in the mechanistic research on ilarviruses from woody plants.
In this study, eleven potential reference genes (GAPDH, L23, PP2A, F-BOX, EF1α, TUA, CYP, CACS, TIP41, UBI-1, and UBI-ep) were selected according to previous research, and we evaluated which of them would be suitable for normalization analysis in specific conditions (at 12 days post-PNRSV-infection) in C. sativus. After the validation of the primer specificity, four programs (geNorm, NormFinder, ∆Ct, and BestKeeper) and the ranking software RefFinder were used to analyze the stability of the candidate reference genes, which indicated that EF1α and F-BOX used in combination were the most accurate normalization method. In addition, the three key host genes associated with the viral infection, AGO1, AGO4, and RDR6, were selected to test the reliability of the reference genes. This study is expected to provide a basis for reference-gene normalization in the context of PNRSV infection in C. sativus in future gene-expression research.

Plant Materials and Virus Inoculation
Seedlings of C. sativus were planted in a growth room under control conditions (28 • C day and 24 • C night, 16 h light and 8 h dark). The viral inoculation was conducted as previously reported. The PNRSV infectious clones with two vectors (one included RNA1 and RNA2 of PNRSV isolate Pch12, and another included RNA3 of PNRSV isolate Pch12) were transformed into Agrobacterium tumefaciens EHA105 and grown in Luria-Bertani medium containing kanamycin (100 mg/L) and rifampicin (25 mg/L). The agrobacterium cells were harvested by centrifugation and resuspended in the infiltration buffer (10 mM MgCl 2 , 10 mM MES, and 150 µM acetosyringone) [37]. After incubation for 2 h at room temperature, the culture was diluted to an optical density of 1.0 at OD 600 nm. After mixing at equal volumes, the cultures were agroinfiltrated into the two expanded cotyledons of seedlings that were about 1 week old. The agroinfiltrated plants were grown in the same conditions.

Total RNA Isolation and First-Strand cDNA Synthesis
Total RNA was extracted from the new leaves of mock-inoculated or PNRSV-inoculated cucumber seedlings 12 days post-inoculation (dpi), using TRIzol reagent (TianGen, Beijing, China), according to the manufacturer's instructions. One ug of total RNA was treated with DNase I and then a synthesis of first-strand cDNA was performed with random primer and M-MLV reverse transcriptase, using the PrimeScriptTM RT reagent Kit with gDNA Eraser (Perfect Real Time) (RR047A, Takara, Dalian, China). The mixture was incubated at 37 • C for 30 min and 85 • C for 5 s. The resulting cDNA can be stored at −80 • C for long-term use.

Selection of Candidate Reference Genes and Primer Design
Eleven widely used reference genes (GAPDH, L23, PP2A, F-BOX, EF1α, TUA, CYP, CACS, TIP41, UBI-1, and UBI-ep) were selected for this study. The reference sequences of these genes can be obtained from the NCBI database, and the accession numbers are listed in Table 1. The primer pairs of GAPDH, L23, PP2A, and F-BOX were designed using DNAMAN software, version 5.0 (Lynnon Biosoft, Quebec, QC, Canada). The conditions were set with the following parameters: melting temperature of 58-62 • C, GC content of 40-60%, primer length of 20-25 nt, and amplicon length of 80-250 bp. The specific primers of EF1α, TUA, CYP, CACS, TIP41, UBI-1, and UBI-ep were based on previous studies. The primer sequences and the associated parameters are listed in Table 1. The specificity of primers was evaluated by PCR amplification using cucumber cDNA as template. The amplified products were subjected to 2% agarose gel electrophoresis and sequenced to verify the targeted gene.

RT-qPCR Analysis
After the validation of primer specificity, RT-qPCR reactions were carried out in 20 µL volume containing 10 µL of TB Green Premix Ex Taq II (Tli RNaseH Plus) (2×), 0.8 µL of each primer (10 µM), 0.4 µL of ROX Reference Dye II (50×), 2 µL of template (the diluted cDNA of different samples), and 6 µL of distilled water. Every treatment contained at least three biological replicates with three technical replicates in clear 96-well or 384-well plates. The RT-qPCR analyses were performed in the Applied Biosystems QuantStudio 6 Flex Real-Time PCR systems (Thermo fisher scientific, Waltham, MA, USA). The PCR program was set as follows: 95 • C for 30 s (denaturation), followed by 40 cycles of 5 s at 95 • C, 20 s at 58 • C, and 35 s at 72 • C, before, finally, dissociation stage was added to generate a melting curve (95 • C for 15 s, 60 • C for 1 min, and 95 • C for 15 s) to verify the specificity of PCR amplification. The E and R 2 were calculated from the raw data using LinRegPCR software (Academic Medical Centre, Amsterdam, the Netherlands).

Data Analyses for Expression Stability
Four tools, based on different algorithms, were selected to evaluate the stability of the 11 candidate reference genes, as follows: geNorm, NormFinder, ∆Ct, and BestKeeper. The Ct values were obtained from the QuantStudio 6 Flex Real-Time PCR systems and exported into an Excel datasheet (Microsoft Excel 2013). The Ct values were directly used or converted into the desired form according to the requirements of the programs. For geNorm and NormFinder analysis, the Ct values were converted into the relative quantities using the formula 2 -∆Ct (∆Ct = Ct sample − Ct mininum ). For BestKeeper analysis, the raw Ct values were directly used. The geNorm software, which is based on the principle that the expression ratio of two reference genes should be constant in all samples regardless of different groups, calculates the gene-expression M-value as the average pairwise variation of one gene with the other candidate genes. The cut-off value of 1.5 is used to assess the stability of the reference genes and the gene with lowest M-value refers is the most stable. In addition, geNorm calculates the optimal number of reference genes required for accurate normalization by pairwise variation (V n/n+1 ). The value of V n/n+1 < 0.15 indicates that no extra reference genes are required for normalization. NormFinder is used to rank the stability of the reference genes by the parameters of stability value (SV) within and among groups. Reference genes with the lowest SVs are considered the most stable reference gene. The ∆Ct method identifies the potential reference genes by comparing the relative expression between gene pairs. BestKeeper analyses determine the most stable reference genes with the lowest coefficient of variation (CV) and the lowest relative standard deviation (SD). The candidate genes with SD higher than 1.0 are considered inconsistent and are cut off. The r and p-value are also important parameters to be considered. RefFinder is an additional web-based tool, which was used to comprehensively rank the order of the 11 candidate genes.

Relative Quantification of AGO1, AGO4, and RDR6
The EF1α and F-BOX, the two most stably expressed genes, were used in combination as reference genes for the relative quantification of AGO1, AGO4, and RDR6. The geometric mean of the Ct values of two reference genes (multiplying the Ct values of the two reference genes and then taking the square root) was used as the reference Ct value for each biological replicate. The mock-inoculated samples were used as control and the relative quantifications of AGO1, AGO4, and RDR6 in PNRSV-inoculated samples were calculated by 2 -∆∆Ct method. The experiments were replicated three times, with at least three plants each time.

Assessment of Primer Specificity and Amplification Efficiency
Eleven candidate reference genes were selected from among previously used genes for normalization in cucumber or other species. The sequences of the primers and their associated parameters are listed in Table 1. To identify the primer specificity, the amplified products were analyzed using 2% agarose gel electrophoresis. Only one clear band of the expected size appeared in each lane, and no primer dimers or non-specific amplification could be detected (Figure 2). The specific bands were cut off and sent to Sanger-sequencing. Blastn verified that the amplicons were from the targeted genes (data not shown). To further assess the primer specificity, RT-qPCR was conducted. A single peak in the melting curve was obtained after the amplification of all 11 genes (Figure 3). The corresponding PCR amplification efficiencies (E) and the linear relationships between the Ct values and the log-transformed copies indicated by correlation coefficients (R 2 ) for all the tested reference genes were calculated from the RT-qPCR data ( Table 1). The E values ranged from 91.99% to 96.24%, which were all within the acceptable range of 90-105%. Furthermore, the R 2 values ranging from 0.9924 to 0.9990 confirmed the specificity of the primer pairs.   To further evaluate the expression stability of the candidate reference genes, four methods based on different algorithms were used to calculate the expression stability: geNorm, NormFinder, ∆Ct method, and BestKeeper.

geNorm Analysis
The raw Ct values from RT-qPCR were transformed into quantities for the geNorm analysis. The average gene-expression stability measurement (M-value) of the 11 candidate genes were obtained through the program, and the genes were ranked based on the Mvalue, from highest to lowest. The genes with the highest M-values showed lower stability, while the genes with the lowest M-values indicated higher stability. All of the 11 selected genes showed an acceptable level of expression stability with M < 0.702, which was below the cut-off value of 1.5 suggested by geNorm (Figure 5a). F-BOX and UBI-1 were the most stable genes, followed by EF1α, and the TUA and TIP41 genes showed the least stability according to geNorm. In addition, the optimal number of reference genes required for reliable normalization was also determined by geNorm. The pairwise value for two genes (V 2/3 ) was 0.107, which was lower than the threshold of 0.15 (Figure 5b). This result indicated that two reference genes were sufficient for accurate analyses of the gene expression in the context of PNRSV infection. The optimal number of reference genes required for effective qRT-PCR data normalization by geNorm. Pairwise variation (V n /V n+1 ) between the normalization factors NF n and NF n+1 (NF n is the expression values for the n first-ranked candidate reference genes). (c) Expression stability of the candidate reference genes analyzed by NormFinder. A lower stability value indicates more stable gene expression. (d) The STDEV values analyzed by ∆Ct method. A lower value indicates more stable gene expression.

NormFinder Analysis
NormFinder was used to evaluate the stability of the reference genes based on the SV parameter. Lower SV values meant higher stability. The SVs of the 11 candidate genes are shown in Table 2 and ranked in Figure 5c. The results of the NormFinder analysis illustrated that the two most stable genes were EF1α and F-BOX, and the two least stable candidate genes were TUA and TIP41.

∆.Ct Analysis
The comparative ∆Ct method identified that EF1α and F-BOX were the most two suitable genes, and TUA and TIP41 were the least suitable (Figure 5d). The ranking order of the 11 candidate genes was similar to that obtained by NormFinder analysis.

Bestkeeper Analysis
The three variables, SD, coefficient of correlation (r), and CV play key roles in geneexpression stability analyses using the Bestkeeper program. Candidate genes with SD values > 1 should be excluded, since they are considered inconsistent. In the study, the analysis of the data from all the candidate genes using Bestkeeper showed that all the candidate reference genes had SD values < 1. Across all the candidate genes, TUA had relatively high SD and CV values (SD TUA = 0.81 and CV TUA = 3.66), which can be considered to have been eliminated in the subsequent analysis (Table 3). In addition, the Bestkeeper analysis showed that the p-values of the genes CYP, GAPDH, UBI-ep, CACS, and TIP41 were over 0.05 (0.105 < p-value < 0.216), which meant they could also be excluded from the subsequent ranking. The remaining candidate genes were ranked from the most stable, with the highest coefficient of correlation, to the least stable, with the lowest value. EF1α (r = 0.873; p-value = 0.001) was the most stable gene, followed by PP2A (r = 0.845; p-value = 0.001) and F-BOX (r = 0.826; p-value = 0.001) ( Table 2). These three genes (EF1α, F-BOX, and PP2A) were also the most stable genes identified by NormFinder analysis and the ∆Ct method.

RefFinder Analysis
To obtain a final overall ranking, RefFinder, a web-based comprehensive evaluation platform, was performed, based on the results of the above four programs. According to the RefFinder analysis, EF1α and F-BOX were the top two suitable reference genes for normalizing the transcripts in PNRSV-infected C. sativus, whereas TIP41 and TUA were the least stable genes ( Table 2). According to the optimal number of reference genes evaluated by geNorm, EF1α and F-BOX were selected as the optimal multiple reference genes for normalization in PNRSV-infected C. sativus at 12 dpi.

Influence of Different Selection of Reference Genes on the Relative Expression of Target Genes
To investigate the influences of different reference genes on the expression of the target mRNA, we selected AGO1, AGO4, and RDR6, which are the important genes in RNA interference (RNAi)-based antiviral immunity (Figure 6), for additional RT-qPCR analyses. To identify the primer specificity, the amplified products were analyzed using 2% agarose gel electrophoresis and the melting curve ( Figures S1 and S2). The most stable genes, EF1α and F-BOX, were used in combination for normalization, while the less stable genes, TIP41 and TUA, were also used as comparison. The relative expression levels of AGO1 and AGO4 showed significant upregulation, and the relative expression level of RDR6 showed no significant difference after PNRSV infection when we normalized using EF1α and F-BOX in combination. The AGO1, AGO4, and RDR6 genes exhibited similar expression trends when using TIP41 as the reference gene, while the upregulated folds were much larger than when using EF1α and F-BOX in combination, which indicated that the relative expressions were obviously overestimated (Figure 6a-c). By contrast, the relative expression of AGO1 and AGO4 in the PNRSV-inoculated C. sativus showed quite different trends when TUA was used for normalization, and did not show any differences compared with mock-inoculated plants. Figure 6. Relative expression of (a) AGO1, (b) AGO4, and (c) RDR6 in PNRSV-infected C. sativus with different reference genes. EF1α and F-BOX were the most stable genes, while TIP41 and TUA were the least stable genes. (d) The relative expression levels of AGO1, AGO4, and RDR6 were calculated by high-throughput RNA sequencing data. * means p < 0.05, ** means p < 0.01, and *** means p < 0.0001 (based on t-test analysis).
Moreover, the relative expression levels of AGO1, AGO4, and RDR6 were calculated by high-throughput RNA sequencing data, which indicated that the expression of the AGO1 in the PNRSV-inoculated C. sativus was 2.28 times that in healthy plants, the expression of AGO4 in PNRSV-inoculated C. sativus was 1.61 times that in healthy plants, and the expression level of RDR6 showed no significant difference after PNRSV infection (Figure 6d). The relative expression levels of AGO1, AGO4, and RDR6 by transcriptome analyses were similar to the RT-qPCR results, which validated the use of EF1α and F-BOX as normalization genes.

Discussion
RT-qPCR is one of the most widely used methods for quantifying gene expression changes [5]. Relative changes in gene expression can be determined without knowing the absolute quantity of the reference genes, while the accuracy and repeatability of RT-qPCR analysis highly depends on many factors, such as sample quality, RNA quality and integrity, primer specificity, PCR conditions and amplification efficiency, etc. Based on these factors, the selection of a suitable reference gene is of primary importance [18,47]. In this study, we proposed a systematic process through which to identify the optimal reference genes in the leaf tissues of C. sativus under PNRSV infection. Firstly, the primer specificity of the candidate reference genes was experimentally validated by PCR amplification, Sanger sequencing, and melting curves. Secondly, the Ct value, E value, and R 2 were achieved for the tested reference genes. The suitable Ct values, 95% < E < 105%, and R 2 > 99%, are the prerequisite for using the 2 -∆∆Ct method for data analysis [48]. Subsequently, four methods based on different algorithms and one platform for ranking order were used to evaluate the expression stability of the candidate reference genes [48,49]. geNorm is considered one of the best methods to determine the most stable genes and the optimal number of genes [16]. F-BOX, UBI-1, and EF1α were the most stable genes and TUA and TIP41 showed the least stability according to geNorm. At the same time, geNorm indicated that two reference genes were sufficient in our search. NormFinder uses the variation among candidate genes to rank, taking into account intra-group and inter-group variation, and reduces bias through co-regulation [17]. When we used NormFinder, EF1α and F-BOX were the most stable genes. The ranking results showed almost no differences between the NormFinder and ∆Ct methods. BestKeeper is a useful approach for calculating the coefficients of correlation, SD and CV [18]. The EF1α was the most stable gene and TUA had a high r value, but its high SD and CV values suggested instability. Combining these four methods, EF1α and F-BOX were the top-ranked reference genes, while TUA and TIP41 were the lowest=ranked, according to the RefFinder analysis. Finally, three virus-immunity-associated genes were selected to validate the importance of the selection of suitable reference genes.
Several viruses infecting woody plants in the genus Ilarvirus cause serious damages in the fruit industry [39]. PNRSV induced significant losses in stone-fruit trees when it broke out in Turkey, India, etc. [50,51]. The ApMV and ApNMV viruses were identified as having induced apple mosaic disease, which is characterized by the symptoms of mosaic leaves and occurs widely in major apple-producing areas around the world [52,53]. PDV, another ilarvirus infecting stone-fruit species, caused deteriorated fruit marketability and reduced fruit yields [39]. The fact that the infectious clones of the above four viruses caused clear symptoms in cucumbers, the breakthrough of transgenic technology, and the establishment of VIGS systems on cucumbers, have prompted the application of cucumber as a useful and crucial experimental material to facilitate studies on the infection mechanism of ilarviruses. The study of pathogenic mechanisms is very important in the unravelling the viral infections and provide clues for the development of resistant materials [54]. Therefore, the validation of reference genes for gene expression studies using RT-qPCR after viral infection in C. sativus is highly necessary. Recently, some studies have identified a variety of reference genes in cucumbers at different developmental stages and under different biotic or abiotic stresses. TUA and UBI-ep were the most stably expressed genes when the cucumber seedlings at the second true leaf stage were treated with hormones, and EF1α showed a relatively stable expression level when the seedlings were treated with salt and drought stress [47,55]. The genes CACS, TIP41, F-BOX, and EF1α showed the highest expression stability under different nitrogen nutrition regimens and the combined use of three or four references helped to obtain reliable results [48]. In terms of biotic stress, EF1α and GAPDH were the most reliable reference genes for normalizing the miRNA expression after cucumber green mottle mosaic virus infection in leaf, root, and stem samples [56]. In our research, we identified EF1α and F-BOX as the most suitable reference genes among the eleven candidate genes for normalization in the PNRSV-infected leaf tissues of C. sativus at 12 dpi. By contrast, the validated reference genes of TUA, TIP41, CACS, and UBI-ep in the previous experiments were relatively unstable and ranked lower in the stability ranking, which further indicated that there is no universal reference gene that can be stably expressed and used under all conditions for all kinds of tissue. The stability of reference genes depends on the experimental parameters, and the selection of reference genes should be studied in advance, case by case, to prevent subsequent false data interpretation and conclusions.
RNAi-mediated plant immunity is believed to be a universal pathway against viruses [57]. To further validate our conclusion, we investigated the relative expression of AGO1, AGO4, and RDR6 using the normalization factor generated with EF1α and F-BOX in combination, which showed that AGO1 expression was greatly and significantly upregulated in PNRSVinfected leaf tissues compared with the mock-inoculated control plants, and suggested that AGO1 may play an important role in PNRSV infection. Furthermore, when using TUA or TIP41 as reference genes, the differential expression of AGO1 and AGO4 was ignored or overemphasized. To support these conclusions, transcriptome-wide high-throughput RNA sequencing was conducted to provide a more accurate, unbiased estimate of the transcript abundance of AGO1, AGO4, and RDR6. The changes in the expression trends were similar to the RT-qPCR results. These findings further indicate that the use of appropriate reference genes is crucial to obtain accurate results.
In this study, we assessed eleven candidate reference genes in C. sativus in the context of PNRSV infection and determined that the combined use of EF1α and F-BOX is the best choice. This study represents the first attempt to select suitable reference genes in C. sativus in the context of ilarvirus infection, which will facilitate the functional study of genes related to viral infection.