Selection of Appropriate Reference Genes for Gene Expression Analysis under Abiotic Stresses in Salix viminalis

Salix viminalis is a fast growing willow species with potential as a plant used for biomass feedstock or for phytoremediation. However, few reference genes (RGs) for quantitative real-time polymerase chain reaction (qPCR) are available in S. viminalis, thereby limiting gene expression studies. Here, we investigated the expression stability of 14 candidate reference genes (RGs) across various organs exposed to five abiotic stresses (cold, heat, drought, salt, and poly-metals). Four RGs ranking algorithms, namely geNormPLUS, BestKeeper, NormFinder, and GrayNorm were applied to analyze the qPCR data and the outputs were merged into consensus lists with RankAggreg, a rank aggregation algorithm. In addition, the optimal RG combinations were determined with geNormPLUS and GrayNorm. The genes that were the most stable in the roots were TIP41 and CDC2. In the leaves, TIP41 was the most stable, followed by EF1b and ARI8, depending on the condition tested. Conversely, GAPDH and β-TUB, two genes commonly used for qPCR data normalization were the least stable across all organs. Nevertheless, both geNormPLUS and GrayNorm recommended the use of a combination of genes rather than a single one. These results are valuable for research of transcriptomic responses in different S. viminalis organs.


Introduction
Perennial woody plants are present across the world and cover about 30% of the land surface [1]. Trees are a source of numerous economical products, such as timber, fuel, food, resins, and oil, but they also provide a range of ecosystem services including carbon storage, maintenance of wildlife habitats, and soil stabilization [2]. Unlike herbaceous plants, they usually have a longer life cycle, spanning from decades to centuries. Therefore, they have to withstand numerous challenges throughout their life span, under the form of both biotic and abiotic stress constraints. Abiotic stresses are the main factors affecting plant distribution and primary production [3]. From this perspective, unravelling stress acclimation and resistance mechanisms could help us to optimize plant primary production [4].
Basket willow (Salix viminalis L.) is a fast-growing tree species with a wide Eurasian distribution that can be found in cool, temperate, and boreal climates. It is a pioneer species of riparian environments with a moderate to high metal tolerance [5,6]. It is one of the most important species used in basketry and its fast growth makes it amongst the most used species in intensive short-rotation agro-forestry systems in northern temperate regions [7]. In addition to this, its highly developed root system and Relative water content of control-and drought-treated soil were measure at 0,1,4,6, and 7 days after stress exposure for each plant sampled. Chlorophyll fluorescence of young leaves was measured at each sampling time, and the F V /F M ratio was calculated as an easily obtainable stress marker [21]. Two different trends were observed in the chlorophyll fluorescent values (Table 1). Plants exposed to cold stress showed an initial decrease in F V /F M , followed by a subsequent recovery. On the contrary, the F V /F M ratio of plants exposed to drought and heat stress decreased from the start of the experiment throughout the whole duration of the experiment. Plants rooted in metal contaminated soil for two months had the lowest F V /F M ratio and their leaves showed signs of mild chlorosis. The precise chlorophyll fluorescence values for all plants can be found in Table S1. Table 1. (a) Sampling time of the different organs of 8 months old willows exposed to different abiotic stress factors for 12 days. R: roots, L: leaves, O: other organs (roots stele, roots cortex, stem xylem and bark). (b) Evolution of the F V /F M ratios during the time course of the experiment. Values given are the averaged F V /F M ratios ± standard deviation of the measures made on three biological replicates. On day 7, only one plant exposed to drought stress still had its leaves. Dark green, F V /F M ratio between 0.825 and 0.850; light green, 0.800-0.825; yellow, 0.775-0.800; orange, 0.750-0.775; red, F V /F M ratio under 0.750. (c) Evolution of the soil relative water content (RWC) in the soil of control and drought-exposed plants. Plants exposed to salt, cold, and heat stress were watered at the same frequency as control plants. (

Abbreviation
Gene The efficiency of the primers (E) was calculated (Table 2) using serial five-fold dilutions (cf. Materials and Methods, melting curves can be found in Figure S1). The quantification cycle (C q ) was measured and descriptive statistics were performed on the raw data ( Figure 1). Quantification cycles were then transformed when needed. The expression stability of the candidate RGs was assessed with four algorithms: geNorm PLUS [24], BestKeeper [25], NormFinder [26], and GrayNorm [27]. We obtained a total of five lists ranking the genes by their stability and two optimal gene combinations (from geNorm PLUS and GrayNorm). Since the different algorithms produced different ranking lists, we used RankAggreg to compute a consensus RGs list for each condition. In addition, we compared the optimal combinations of genes computed by geNorm PLUS and GrayNorm. Validation of the RGs was performed using both the quantification cycle values and the primers efficiency. The efficiency of the primers (E) was calculated (Table 2) using serial five-fold dilutions (cf. Materials and Methods, melting curves can be found in Figure S1). The quantification cycle (Cq) was measured and descriptive statistics were performed on the raw data ( Figure 1). Quantification cycles were then transformed when needed. The expression stability of the candidate RGs was assessed with four algorithms: geNorm PLUS [24], BestKeeper [25], NormFinder [26], and GrayNorm [27]. We obtained a total of five lists ranking the genes by their stability and two optimal gene combinations (from geNorm PLUS and GrayNorm). Since the different algorithms produced different ranking lists, we used RankAggreg to compute a consensus RGs list for each condition. In addition, we compared the optimal combinations of genes computed by geNorm PLUS and GrayNorm. Validation of the RGs was performed using both the quantification cycle values and the primers efficiency. Figure 1. Data analysis workflow. Cycles of quantification (Cq) and PCR efficiency (E) were determined by qPCR and then used as input for the different gene ranking algorithms. Contrary to the other ranking algorithms, BestKeeper only used Cq as input data. The ranking list produced by the different algorithms were then used as input data with RankAggreg to produce a consensus list. Grey boxes: raw data and descriptive statistics, green boxes: algorithms, circles: ranking algorithms output, round-cornered rectangles: final outputs, blue circles: Data used to rank gene stability, orange circles: Data used to determine the optimal gene combination. Stab.: NormFinder stability value, Avg M: Average expression stability M, Ind.: GrayNorm ranking taking only single gene combinations into account, Comb.: GrayNorm ranking taking all gene combinations into account.
Out of the 14 candidate RGs tested, GAPDH and β-TUB displayed a huge variation in their expression pattern (Figure 2). GAPDH is more expressed in leaves than in roots, as it has a role in the Calvin-Benson cycle in addition to is function in glycolysis. Conversely β-TUB is on average more expressed in the roots than in the leaves. As the presence of genes showing an important variation in Cq can lead to incorrect rankings [25], GAPDH and β-TUB were not taken into account when analyzing stability of RGs in all organs grouped together.
The expression level of all the candidate RGs shows a mild but non-significant increasing trend over time under control condition in both the leaves and the roots ( Figures S2 and S3). While a similar trend was observed under cold and heat conditions in the roots for most candidate RGs, CYP and β-TUB show a significant increasing trend and GAPDH displays a decreasing trend. Trends are not that marked in the leaves exposed to cold and heat. While all candidate RGs except CYP show an increasing trend in the leaves of plants exposed to salt, both non-significant increasing and decreasing trends were observed in the roots of the plants exposed to the same conditions. All the genes show an increasing trend in both the leaves and the roots of plants exposed to drought condition, except for CYP and VHAC in the leaves and GAPDH in the roots. On the other hand, β-TUB displays a strong and significant increase in Cq values over time in leaves and roots of plants exposed to drought stress. Cycles of quantification (C q ) and PCR efficiency (E) were determined by qPCR and then used as input for the different gene ranking algorithms. Contrary to the other ranking algorithms, BestKeeper only used C q as input data. The ranking list produced by the different algorithms were then used as input data with RankAggreg to produce a consensus list. Grey boxes: raw data and descriptive statistics, green boxes: algorithms, circles: ranking algorithms output, round-cornered rectangles: final outputs, blue circles: Data used to rank gene stability, orange circles: Data used to determine the optimal gene combination. Stab.: NormFinder stability value, Avg M: Average expression stability M, Ind.: GrayNorm ranking taking only single gene combinations into account, Comb.: GrayNorm ranking taking all gene combinations into account.
Out of the 14 candidate RGs tested, GAPDH and β-TUB displayed a huge variation in their expression pattern (Figure 2). GAPDH is more expressed in leaves than in roots, as it has a role in the Calvin-Benson cycle in addition to is function in glycolysis. Conversely β-TUB is on average more expressed in the roots than in the leaves. As the presence of genes showing an important variation in C q can lead to incorrect rankings [25], GAPDH and β-TUB were not taken into account when analyzing stability of RGs in all organs grouped together.
The expression level of all the candidate RGs shows a mild but non-significant increasing trend over time under control condition in both the leaves and the roots ( Figures S2 and S3). While a similar trend was observed under cold and heat conditions in the roots for most candidate RGs, CYP and β-TUB show a significant increasing trend and GAPDH displays a decreasing trend. Trends are not that marked in the leaves exposed to cold and heat. While all candidate RGs except CYP show an increasing trend in the leaves of plants exposed to salt, both non-significant increasing and decreasing trends were observed in the roots of the plants exposed to the same conditions. All the genes show an increasing trend in both the leaves and the roots of plants exposed to drought condition, except for CYP and VHAC in the leaves and GAPDH in the roots. On the other hand, β-TUB displays a strong and significant increase in C q values over time in leaves and roots of plants exposed to drought stress.

Figure 2.
Boxplot of the quantification cycle (Cq) values for 14 candidate reference genes in S. viminalis roots and leaves exposed to various abiotic stresses. Eight-month old cuttings were exposed to drought, cold, heat, metal or salt stress in triplicate for up to two months before being sampled.
Expression stability of the candidate RGs was analyzed in the roots, the focus of our study. Nevertheless, we also provide RGs to use in studies focused on the leaves and/or various organ combinations in the Tables S2-S4.

Reference Genes Ranking
geNorm PLUS is a quantity-based algorithm with a pairwise comparison approach. Quantitybased algorithms take both Cq values and E into account during the computation. The key principle behind a pairwise comparison approach is that ideal RGs display similar expression patterns across the different samples. Based on this hypothesis, expression levels of ideal RGs are highly correlated between them and ideal RGs can therefore be easily detected. One drawback of this assumption is that co-regulated genes are artificially ranked amongst the most stable genes, independently of their intrinsic gene expression stability [26].
To rank RGs, geNorm PLUS computes for each gene a measure of stability (named M) based on the average pairwise variation between this gene and the remaining candidate RGs. The algorithm also calculates another stability value (called Average expression stability M) by doing a stepwise exclusion of the least stable gene and averaging the M value of the remaining RGs.
In addition to ranking genes, geNorm PLUS is able to determine the minimal number of RGs that should be used to obtain an accurate normalization. To do so, the software estimates the pairwise variation between two normalization factors (NF) composed of an increasing number of reference genes (namely Vn/n+1, with n being the n most stable genes). A large variation indicates that the added gene (+1) has a significant effect on normalization and should be kept for the calculation of the NF. On the contrary, if the variation falls below a threshold value (fixed to 0.15), n is considered the minimum number of RGs to use for accurate normalization [24]. Alternatively, n+1 genes can be used Figure 2. Boxplot of the quantification cycle (C q ) values for 14 candidate reference genes in S. viminalis roots and leaves exposed to various abiotic stresses. Eight-month old cuttings were exposed to drought, cold, heat, metal or salt stress in triplicate for up to two months before being sampled.
Expression stability of the candidate RGs was analyzed in the roots, the focus of our study. Nevertheless, we also provide RGs to use in studies focused on the leaves and/or various organ combinations in the Tables S2-S4.

Reference Genes Ranking
geNorm PLUS is a quantity-based algorithm with a pairwise comparison approach. Quantity-based algorithms take both C q values and E into account during the computation. The key principle behind a pairwise comparison approach is that ideal RGs display similar expression patterns across the different samples. Based on this hypothesis, expression levels of ideal RGs are highly correlated between them and ideal RGs can therefore be easily detected. One drawback of this assumption is that co-regulated genes are artificially ranked amongst the most stable genes, independently of their intrinsic gene expression stability [26].
To rank RGs, geNorm PLUS computes for each gene a measure of stability (named M) based on the average pairwise variation between this gene and the remaining candidate RGs. The algorithm also calculates another stability value (called Average expression stability M) by doing a stepwise exclusion of the least stable gene and averaging the M value of the remaining RGs.
In addition to ranking genes, geNorm PLUS is able to determine the minimal number of RGs that should be used to obtain an accurate normalization. To do so, the software estimates the pairwise variation between two normalization factors (NF) composed of an increasing number of reference genes (namely V n/n+1 , with n being the n most stable genes). A large variation indicates that the added gene (+1) has a significant effect on normalization and should be kept for the calculation of the NF. On the contrary, if the variation falls below a threshold value (fixed to 0.15), n is considered the minimum number of RGs to use for accurate normalization [24]. Alternatively, n+1 genes can be used for normalization when V n/n+1 reaches a minimum [28]. This has the advantage of providing a minimum of three RGs for normalization, which reduces the risk of having co-regulated genes. Table 3. Potential reference genes (RGs) to normalize qPCR data in Salix viminalis roots exposed to various abiotic stresses. Plants were exposed to the stresses for up to two months. RGs expression stability is given by the "Average expression stability M" value as determined by geNorm PLUS . Lower stability value means greater stability. Stab.: gene stability value.

Rank
In this experiment, CDC2 ranked amongst the most stable genes across all the tested conditions (Table 3). TIP41 and ACT also ranked well, although their ranking was more variable depending on the conditions tested. GAPDH and β-TUB were among the least stable genes for each condition. CYP showed a variable stability pattern i.e., while it was the most stable gene under control and cold conditions, it varied between the third and twelfth position under the other conditions. The gene eTIF5 had a similar pattern although it was generally slightly less stable than CYP.
BestKeeper is another software based on pairwise comparison. To rank the candidate RGs, the algorithm computes several stability measures, which allow the users to select the most stable genes. First, it computes the standard deviation (SD) of the C q values for each gene (the lower the more stable). In parallel, it combines all the candidate RGs into an index computed based on the geometrical average of the C q s of all the RGs for each sample. It then estimates the coefficient of determination (r 2 ) between each candidate RGs and the index (the closer to 1 the better). In addition, BestKeeper measures sample integrity by comparing for each RG, the sample C q values with the index C q value.
As the original excel-based algorithm allows to test a maximum of 10 genes, we used ctrlGene, a package based on R [29], after having verified it produced the same results as the original spreadsheets. When less than 10 reference genes were used and efficiency fixed to 2, the results obtained with the R-based BestKeeper package were the same as for the excel-based BestKeeper, with the exception of SD [x-fold] (data not shown). When using the efficiencies calculated from our primers, only the "[x-fold]" measures changed. This means that BestKeeper could not be classified as a quantity-based algorithm. Both the correlations and the descriptive statistics were calculated based on the C q values. The efficiency factor, a quantitative factor, was then only taken into account for the calculation of the [x-fold] values [17]. As the correct [x-fold] values can easily be manually computed, we used the R package that allowed us to analyze the stability of 14 reference genes at the same time. Table 4. Potential RGs to normalize qPCR data in Salix viminalis roots exposed to various abiotic stresses. Plants were exposed to the stresses for up to two months. Gene expression stability was calculated with BestKeeper. Two stability values are given: standard deviation SD [+/− C q ] (the lower, the better) and by the coefficient of determination (r 2 ) (the closer to 1, the better). Genes with SD [+/− Cq] over 1 should be discarded. Color scale: the bluer, the more stable. Based on the calculated standard deviation, TIP41 was globally the most stable gene, followed by CDC2 (Table 4). In six out of seven conditions tested, GAPDH had a SD > 1, indicating a low stability. Similarly, β-TUB had a SD > 1 in four conditions. Under the heat condition, β-TUB, CYP and UCEE2 were shown to be unstable. The coefficient of determination shows that CDC2 and TIP41 were highly correlated to the BestKeeper index while β-TUB and GAPDH were the least correlated to it.

Conditions
NormFinder is a quantity-model-based software. Model-based algorithms use complex statistical models to compute the variation between the expression of genes in samples belonging to different biological groups (conditions, time points, organs . . . ). These approaches have two advantages over pairwise comparison algorithms: (1) They are theoretically less sensitive to co-regulated genes and (2) they take biological groups into account and thereby reduce the errors introduced by systematic intergroup variation [26]. However, compared to the pairwise approach, the robustness of a model-based approach is more dependent on the number of reference genes and conditions tested. For example, the NormFinder manual recommends to use a minimum of three candidate genes and two samples per group, and optimally 5-10 genes and eight samples per group.
To rank RGs, NormFinder evaluates the overall variation of the candidate RGs, but also the variation existing between sample groups. The algorithm then merges the intra-and intergroup variations values into a stability index allowing the user to select the most stable genes. The software can also give the best combination of two genes that together are the most stable. Table 5. Potential RGs to normalize qPCR data in Salix viminalis roots exposed to various abiotic stresses. Plants were exposed to the stresses for up to two months. Stability value of candidate RGs was determined by NormFinder. Genes are ranked from top to bottom in order of decreasing stability, lower values indicate higher stability. Stab.: gene stability value. GrayNorm is another quantity-model-based algorithm specifically designed to identify RGs that limit uncertainty introduced during data normalization. GrayNorm is based on the principle that non-normalized data of GOIs contain both biologically meaningful expression differences (signal) and technical variation (noise). Signal and noise are fixed once the experiment is finished and only the NF can further change the uncertainty level. Therefore, GrayNorm identifies candidate RGs that introduce the least uncertainty during normalization.
GrayNorm works by computing the NF for each condition and each possible RG combination. It then calculates the average NF per condition in relation to the control. Finally, it orders the different RG combinations based either on the coefficient of variation of these NF averaged per condition (CV inter ), or on the cumulative deviation of the NF [27,30]. In our study, CV inter was used as the ranking factor. Interestingly, by computing all the possible RG combinations, GrayNorm can group RGs showing opposite expression variability trends, thereby producing a NF that smoothens individual gene expression variability [26]. Table 6. Potential RGs to normalize qPCR data in Salix viminalis roots exposed to various abiotic stresses. Plants were exposed to the stresses for up to two months. Gene expression stability was calculated with GrayNorm. Genes are ranked according to their internal coefficient of variation (CV inter  Although ACT and OTUp rank best in most of the conditions, their stability depends highly on the conditions to which the plants are exposed. In general, TIP41 was always found amongst the higher ranked ( Table 6). The gene ARI8 was also generally detected as stable although its rank varied importantly depending on the conditions the plants were exposed to. Interestingly, GAPDH was amongst the most stable genes when all the conditions were grouped together and under heat stress, where it was second and third respectively. On the other hand, CDC2 ranked on average at the seventh place while it was determined as one of the most stable genes with the other algorithms.

Consensus Ranking List
In this study, four different algorithms were used to analyze the expression stability of 14 candidate RGs. However, the algorithms yielded different ranking lists due to their different approach. In order to provide a comprehensive result, we used RankAggreg [31] to compute a consensus RG list for each condition.
RankAggreg is an algorithm specifically designed to aggregate large ranking lists. To do so, it first generates all the candidate consensus lists possible. It then computes the distance between the input lists and the candidate consensus list for each of the possible candidate consensus list. Two functions can be used to compute the distance: Spearman footrule distance (which was used) or Kendall's tau distance. Finally, RankAggreg selects the consensus list which yields the minimum distance value. As this task can be time consuming with long (>10) input lists, RankAggreg has built-in Cross-Entropy Monte Carlo and Genetic algorithms to reduce the time of computation required.
Predictably, CDC2 and TIP41 were amongst the most stable genes in all analyses (Table 7). CYP showed a high variation pattern as it ranked first in control and cold conditions but eighth and twelfth when all conditions were grouped and under heat stress, respectively. GAPDH and β-TUB were the least stable genes across all conditions, which is expected as they were ranked last by most algorithms. Table 7. Consensus ranking of the 14 candidate RGs to normalize qPCR data in Salix viminalis roots exposed to various abiotic stresses for up to two months. Plants were exposed to the stresses for up to two months. This consensus ranking list was generated by RankAggreg. Genes are listed from top to bottom in order of decreasing expression stability.  EF1b

Optimal Combination of Reference Genes to Use
Out of the four algorithms used, two (geNorm PLUS and GrayNorm) allow to compute the optimum combination of reference genes to use (cf. supra for the principles).
Based on our data, geNorm PLUS recommends the use of two reference genes to normalize the expression of GOI ( Figure 3). Indeed, for every condition, the value of V 2 /V 3 is under the threshold value of 0.15. However, based on the recommendations of Ling and Salvaterra [28], more genes should be used. For example, under control condition, the eight most stable genes could be used to normalize gene expression. However, the optimal number of RGs to use for normalization is highly dependent on the conditions studied, and ranges from four under heat stress to 11 for salt-exposed roots.
Based on our data, GrayNorm recommends to use between one and five reference genes to normalize GOIs expression values ( Table 8). The number of reference genes to use for optimal normalization was highly dependent on the condition, it was one, for roots of plants exposed to metals and under control condition, two, under heat stress and when grouping all the stresses together, three, under drought and salt stresses and five, under cold stress. Interestingly, RG combinations given by GrayNorm are not always the combination of the most stable genes. Most of the time, optimal combinations consist of genes classified as highly stable and genes individually classified as having lower stability values. For example, under cold conditions, GrayNorm suggests to use the four most stable genes (OTUp, TIP41, ARI8, ACT) and one less stable gene (GAPDH, ranked 11 out of 14).

Optimal Combination of Reference Genes to Use
Out of the four algorithms used, two (geNorm PLUS and GrayNorm) allow to compute the optimum combination of reference genes to use (cf. supra for the principles).
Based on our data, geNorm PLUS recommends the use of two reference genes to normalize the expression of GOI (Figure 3). Indeed, for every condition, the value of V2/V3 is under the threshold value of 0.15. However, based on the recommendations of Ling and Salvaterra [28], more genes should be used. For example, under control condition, the eight most stable genes could be used to normalize gene expression. However, the optimal number of RGs to use for normalization is highly dependent on the conditions studied, and ranges from four under heat stress to 11 for salt-exposed roots.
Based on our data, GrayNorm recommends to use between one and five reference genes to normalize GOIs expression values ( Table 8). The number of reference genes to use for optimal normalization was highly dependent on the condition, it was one, for roots of plants exposed to metals and under control condition, two, under heat stress and when grouping all the stresses together, three, under drought and salt stresses and five, under cold stress. Interestingly, RG combinations given by GrayNorm are not always the combination of the most stable genes. Most of the time, optimal combinations consist of genes classified as highly stable and genes individually classified as having lower stability values. For example, under cold conditions, GrayNorm suggests to use the four most stable genes (OTUp, TIP41, ARI8, ACT) and one less stable gene (GAPDH, ranked 11 out of 14).

Gene Stability in the Leaves and Other Organs
In the leaves, TIP41 was the most stable gene across all conditions, according to the consensus list (cf. Table S2). ARI8 displayed a high expression stability under most conditions but falls at the fifth and seventh place under cold and drought conditions, respectively. Similarly, EF1b ranked well except under heat stress and when all conditions were grouped (where it ranked at the sixth position). In the leaves, CDC2 showed an important variation in its stability as it ranked first under salt and heat condition, but ninth under drought condition. Interestingly, CYP was the second least stable gene in average, while it ranked as the most stable gene in roots exposed to control and cold conditions. As in roots, β-TUB, GAPDH and PT1 were amongst the least stable genes.
When the data from the roots and the leaves were merged together, TIP41 was the most stably expressed gene, according to the consensus list (cf. Table S3). Under most conditions, EF1b and CDC2 were stably expressed when both organs were grouped together. Apart from β-TUB and GAPDH, which displayed a huge variation in their expression pattern, α-TUB, OTUp, and PT1 were not stable under any tested conditions. To normalize qPCR data across various organs under control conditions, eTIF5, TIP41, CDC2, and EF1b were the most stable genes (cf . Table S4). Overall, TIP41 was the most stably expressed gene across all conditions and organs.

Reference Genes Validation
The validity of the candidate RGs identified was tested in the different organs and conditions by studying the expression profiles of three reporter genes: HSP17 (a 17.6 kDa heat shock protein, SapurV1A.0393s0170), ADC (arginine decarboxylase, SapurV1A.0091s0150), and CAT (catalase, SapurV1A.0016s0660).
The small heat shock protein HSP17 has a chaperone-like activity and its expression changes when exposed to temperature and osmotic stresses [32]. Arginine decarboxylase is an enzyme playing a role in the synthesis of putrescine, a polyamine, which plays a role in the tolerance to various abiotic stresses among which osmotic stress, salinity, hypoxia, and cold [33]. The last tested gene codes for catalase, a H 2 O 2 -scavenging enzyme with expression changes in response to osmotic, temperature, and oxidative stresses [34].
It was assumed that the stress treatment would affect the expression level of the reporter genes but not the expression of the RGs. The data was analyzed with qBase PLUS and normalized using CDC2, TIP41, ARI8, ACT, and EF1b for both the roots and the leaves (figure for the leaves are found in Figure  S4), since these genes were the most stable according to the consensus list.
As can be seen in Figure 4, the ADC expression level increased in roots exposed to cold and drought stresses. On the contrary, it decreased in the roots exposed to heat et metals. CAT expression was induced in the roots by both drought and heat stresses. Drought and heat stresses both significantly induced HSP17 expression. Salt stress did not significantly affect the expression level of any of these genes in the roots, except a mild decrease of expression of ADC during the first day. Interestingly, salt stress had a significant effect on the expression level of both ADC and HSP17 in the leaves on the first day after the stress application ( Figure S4).

Reference genes validation
The validity of the candidate RGs identified was tested in the different organs and conditions by studying the expression profiles of three reporter genes: HSP17 (a 17.6 kDa heat shock protein, SapurV1A.0393s0170), ADC (arginine decarboxylase, SapurV1A.0091s0150), and CAT (catalase, SapurV1A.0016s0660).
The small heat shock protein HSP17 has a chaperone-like activity and its expression changes when exposed to temperature and osmotic stresses [32]. Arginine decarboxylase is an enzyme playing a role in the synthesis of putrescine, a polyamine, which plays a role in the tolerance to various abiotic stresses among which osmotic stress, salinity, hypoxia, and cold [33]. The last tested gene codes for catalase, a H2O2-scavenging enzyme with expression changes in response to osmotic, temperature, and oxidative stresses [34].
It was assumed that the stress treatment would affect the expression level of the reporter genes but not the expression of the RGs. The data was analyzed with qBase PLUS and normalized using CDC2, TIP41, ARI8, ACT, and EF1b for both the roots and the leaves (figure for the leaves are found in Figure  S4), since these genes were the most stable according to the consensus list.
As can be seen in Figure 4., the ADC expression level increased in roots exposed to cold and drought stresses. On the contrary, it decreased in the roots exposed to heat et metals. CAT expression was induced in the roots by both drought and heat stresses. Drought and heat stresses both significantly induced HSP17 expression. Salt stress did not significantly affect the expression level of any of these genes in the roots, except a mild decrease of expression of ADC during the first day. Interestingly, salt stress had a significant effect on the expression level of both ADC and HSP17 in the leaves on the first day after the stress application ( Figure S4). . Normalized relative expression level of three stress-responsive genes in the roots of S. viminalis exposed to various abiotic stresses for up to two months. Data was normalized using ACT, ARI8, CDC2, EF1b and TIP41. Means that share a letter are not significantly different at α = 0.05.

Figure 4.
Normalized relative expression level of three stress-responsive genes in the roots of S. viminalis exposed to various abiotic stresses for up to two months. Data was normalized using ACT, ARI8, CDC2, EF1b and TIP41. Means that share a letter are not significantly different at α = 0.05.

Discussion
Currently, qPCR is widely used to perform gene expression analysis [15]. However, the use of non-stable RGs for data normalization can lead to improper analysis and conclusions [20]. Therefore, the aim of this study was to identify RGs in S. viminalis with a stable expression level across various conditions and organs. While validation of appropriate RGs has been done in many plants, most studies focus on herbaceous plants [30] and no RGs are publicly available for the closely related S. purpurea, which has been sequenced.
Due to the high potential of S. viminalis in phytoremediation, together with its high primary productivity, we performed RG validation for leaves and roots exposed to various conditions. The study showed that the potential RGs had different expression profiles depending on the organ studied, which highlights the need to perform condition-and organ-specific RG validation before every gene expression experiment.
Chlorophyll fluorescence was measured and the F V /F M ratio was calculated and used as a stress marker. Interestingly, the F V /F M ratio of the drought-exposed plants only slightly decreased during the six first days but on the seventh, two of the three plants had lost their leaves. Conversely, the plants exposed to cold showed a drop in F V /F M value the first day and subsequently recovered, although the stress was maintained at the same level. According to Lichtenthaler (1998) [35], stress has a dose-effect relationship i.e., while some stressors have a low effect on plant survival at low "concentration", higher "concentration" or longer exposure time could lead to significant damage to the plant integrity (exhaustion phase). Conversely, some stresses can induce a temporary distress (alarm phase) which act as a stimulatory signal for the plant and induces its hardening. The F V /F M ratio could indicate the alarm and exhaustion phases in the stress response rather than the application of a stressor to the plant.
Four different algorithms, geNorm PLUS , BestKeeper, NormFinder, and GrayNorm were used to analyze the expression stability of 14 candidate RGs. While the four algorithms produced globally similar rankings, some punctual but important differences could be observed between the algorithms' outputs. For example, while geNorm PLUS , BestKeeper and NormFinder ranked GAPDH amongst the least stable genes across all conditions, GrayNorm ranked this gene among the most stable ones when all conditions were grouped and under heat stress. These kinds of discrepancies arise from the fact that each algorithm is based on specific premises and working hypotheses. However, under most conditions, the same general pattern could be observed across the different algorithms. More specifically, CDC2 and TIP41 always ranked amongst the most stable genes across all conditions; CYP displayed a high expression stability under control and cold conditions but a lower expression stability under the other conditions; ACT, ARI8, and EF1b were globally stable; and GAPDH, β-TUB, and PT1 were the least stable genes. Nevertheless, the variations existing between the outputs of the different algorithms were smoothened by the use of RankAggreg, which allowed us to produce a consensus list.
In the consensus list, CDC2 and TIP41 were the most stable RGs across all conditions. CDC2 codes for a cyclin dependent kinase (CDK1/CDKA), which is involved in eukaryotic cell cycle regulation. In association with various cyclins, CDK1/CDKA is known for its role in the control of the transition from one cell cycle phase to another [36]. It is interesting that its expression remained stable during metal stress, as cadmium has been reported to block the cell cycle at the G2 checkpoint in the roots of Lactuca sativa [37]. TIP41 codes for a 41kDa protein interacting with TAP42, which is involved in the regulation of cellular growth in response to the nutrient status and the environment [38]. It has been shown to be induced by long-term exposure to NaCl [38], but this was not observed in our study. CYP codes for a cyclophilin peptidyl prolyl isomerase involved in a wide range of basal functions, such as proper protein folding and regulation of plant growth and development. Some cyclophilin isoforms have been observed to be induced by abiotic stresses [39], but this was not the case for the cyclophilin isoform used in the present study. UCEE2 and ARI8 both code for proteins involved in the ubiquitination process (respectively in ubiquitin conjugation and ubiquitin ligation) [40]. Proteins involved in ubiquitination are constitutively expressed and have been reported to be reliable RGs in various species and under diverse stresses [16]. Interestingly, while ARI8 was relatively stable, UCEE2 expression stability varied from relatively stable (sixth most stable gene in all conditions grouped) to unstable (thirteenth under control condition). GAPDH, which was amongst the least stable genes, is involved in glycolysis but also in non-metabolic functions during abiotic stress [41]. It was less expressed in the roots than in the leaves, as it also plays a role in the Calvin-Benson cycle.
The expression level of the different candidate RGs was globally stable over time in the roots of S. viminalis under control condition. The same pattern could be observed for every RGs except GAPDH in the roots of plants exposed to cold and heat. No clear trend could be observed in the roots exposed to salt stress as some genes showed increasing expression level and other decreasing. Under drought conditions, a significant increase in the C q s value could be observed for most genes, except for GAPDH, which displays opposite trend. Curiously, candidate RGs that showed different trends than the other RGs (for example CYP, β-TUB and GAPDH in the roots under heat stress) were ranked as less stable by the algorithms. This is due to the fact that ideal RGs expression patterns are expected to be highly correlated between them.
Interestingly, GAPDH and β-TUB, which are both widely used as reference genes in plants, were ranked as the least stable genes under most conditions. This is worrisome, as standardization with non-stable RGs can lead to misinterpretations of data, all the more since often only one RG is used. For example, let us consider an experiment in which we want to investigate the effect of metal exposure on the expression level of some GOI in the roots of S. viminalis. If a GOI is standardized with GAPDH, which has a maximum variation of 4.04 C q between control roots and roots exposed to metals, there could potentially be a [(amplification factor) maximum variation = 1.88 4.04 =] 12.81-fold difference in GOI quantification compared to the situation where we use an ideal RG, which would have no variation across samples and conditions. However, as mentioned previously, perfect RGs do not exist, and even TIP41, which was ranked as the most stable gene under metal exposure, displayed a maximum variation of 2.24-fold. As such, the use of multiple RGs is recommended, as it cancels out the variation in expression observed for individual RG [17,26].
The optimum combination of genes given by geNorm PLUS and GrayNorm were not the same. This was expected, as the two algorithms have a different approach to determine the best combination of RGs. Indeed, geNorm PLUS combines genes based on their individual stability i.e., it groups together the n most stable genes, as such the addition of a new RG (n+1) does not significantly affect the normalization factor. On the contrary, GrayNorm groups genes based on the stability of their NF. This can lead to the proposed use of genes with poor individual stability but which, grouped together, have a higher stability. In other words, geNorm PLUS computes the combination of the best RGs while GrayNorm calculates the best combination of RGs. Nevertheless, it is clear from both the geNorm PLUS and GrayNorm output that the optimal number of genes to use for gene expression normalization depends on the condition and the organ that is targeted, as it was already observed in Arabidopsis thaliana [42], Petunia hybrid [43], Populus euphratica [30], and Salix psammophila [22] amongst others. Nevertheless, a minimum of three stable genes should be sufficient for data normalization in most studies [28].
For the validation phase, we used three genes known to change in expression during exposure to abiotic stresses. In general, the stress-responsive genes displayed the same response patterns in both roots and leaves. However, some differences were observed between both organs. For example, while metals did not induce ADC in the roots, its expression level increased in the leaves. Similarly, while cold did not significantly altered CAT expression in the roots, it was significantly down-regulated in the cold-exposed leaves.

Plant Cultivation and Sampling
Green cuttings of approximately 15 cm and 5 mm diameter of one individual of S. viminalis were rooted in containers filled with a mix of 25 kg of potting soil and 17.5 kg sand under greenhouse conditions for 5 months. Shortly after bud break, rooted cuttings were transferred to pots (2 cuttings per pots) containing the same mixture as previously used. Three of these pots were supplemented with a mixture of metals under the form of metal chloride (final concentration of 3 ppm Cd, 60 ppm Ni, 1000 ppm Zn, 350 ppm Cu), and the rooted cuttings were allowed to grow for two months, forming the metal-exposed samples. After two months of growth in the greenhouse, the cuttings were subjected to abiotic stresses and sampled at several time points (cf. Table 1). Four cuttings were exposed to cold (9/7 • C, 16/8 photoperiod) and four to heat (35/25 • C, 16/8 photoperiod) and sampled after 1, 4, and 7 days. For salt stress, 200 mM of NaCl were solubilized in water and gradually added during the first day of the treatment; cuttings were subsequently watered with tap water and sampled after 1, 4, and 12 days. For drought stress, plants were left without watering and sampled at days 4, 6, and 7. Organs from plants under control conditions (no treatment) were sampled at each of these time points. For each of these treatments, organs from the two cuttings growing in the same pot were pooled as one biological replicate. At each sampling points, chlorophyll fluorescence was measured on the third fully expanded leaf using a Plant Efficiency Analyzer (Hansatech Instruments, Pentney, England). More precisely, intact leaves were dark-adapted for 30 min and then minimal fluorescence yield (F 0 ) was determined with a diode emitting red light (650 nm). Maximal fluorescence yield (F M ) was determined after exposure to a saturating flash of light. The variable fluorescent yield (F V ) was calculated as F M -F 0 , and the F V /F M ratio was used as a stress marker. In addition, the volumetric water content (RWC) was recorded for the pots of control and drought-exposed plants at each sampling time with a Field Scout Digital Moisture Sensor (Turf-Tec International, Tallahassee, FL, USA). For each pot, the final value recorded was the average of 5 measurements taken at different places.
On the 12 th day, coarse roots xylem and phloem, stem cortex and bark were also collected from control plants. All the organs were sampled in biological triplicate and snap-frozen in liquid nitrogen and stored at −80 • C before subsequent use.

Total RNA Isolation and cDNA Synthesis
Samples (250 mg) were ground into a fine powder using mortar and pestle in liquid nitrogen. Total RNA was extracted using a modified CTAB-buffer extraction protocol [44]. Total RNA was purified and treated with DNase I using the RNeasy plant mini kit (Qiagen, Leusden, The Netherlands) according to the manufacturer's instructions. RNA purity and quantity were assessed by measuring the absorbance at 230, 260, and 280 nm using a Nanodrop ND1000 spectrophotometer (Thermo Scientific, Villebon-sur-Yvette, France). Total RNAs integrity was assessed using the RNA Nano 6000 Assay (Agilent Technologies, Diegem, Belgium) and a 2100 Bioanalyzer (Agilent technologies, Santa Clara, CA, USA) with parameters adapted to plant RNA profiles. All samples had sufficient purity and integrity scores. Precise RINs as well as 230/260 and 280/260 ratios are available in Table S5. Subsequently, one microgram of total RNA was retro-transcribed using ProtoScript II First Strand cDNA Synthesis Kit (NEB, Leiden, The Netherlands), following the manufacturer's instructions.
For each primer pair, standard curves were obtained using a 5-fold dilution series of cDNA template over seven dilution points, starting from a concentration of 12.5 ng/µL. The coefficient of determination (r 2 ) and slope (S) values were obtained from the standard curves. All the coefficients of determination were above 0.99 (Table 2, Table S6). Primer efficiencies were calculated with qBase PLUS (ver. 3.5; https://www.qbaseplus.com; Hellemans et al., 2007 [46]). All efficiencies ranged from 83% to 115%.
For the qPCR analysis, 4 ng of cDNA were used as a template. The reactions were performed in a 384-wells plate prepared with a liquid handling robot (epMotion 5073, Eppendorf, Hamburg, Germany). The cDNA was amplified using Takyon low ROX SYBR qPCR MasterMix dTTP Blue Kit (Eurogentech, Liège, Belgium) on a Viia 7 Real-Time PCR System (Thermo Fisher, Waltham, MA, USA) in a final volume of 10 µL. No-template as well as non-RT controls ensured that the samples were free from contamination. The reactions were performed in technical triplicates for each biological independent replicate. The PCR conditions consisted of an initial denaturation at 95 • C for 3 min, followed by 40 cycles of denaturation at 95 • C for 10 sec and annealing/extension at 60 • C for 60 sec. A melting curve analysis was performed at the end of each experiment to check the specificity of the amplified products.

Descriptive Analyses
For all the analyses, data obtained from the Viia 7 Real-Time PCR Software were trimmed and exported into Excel datasheets. The descriptive analyses were performed using R (v. 3.5.1; https://www. r-project.org). The boxblots of the quantification cycle values were made with the median value of each technical replicate using the package ggplot2 (v. 3.2.0; https://cran.r-project.org/web/packages/ggplot2/). Stability over time was assessed by plotting the median value of each technical replicate as a function of the days. Linear regression and 95% confidence intervals were calculated by ggplot2.

Determination of RG Expression Stability with geNormPLUS, BestKeeper, NormFinder, and GrayNorm
The geNorm PLUS version included in qBase PLUS was used, with the primer efficiencies previously calculated and all parameters set to default. As the original BestKeeper Excel spreadsheet (ver. 1.0; https://www.gene-quantification.de/bestkeeper.html) can only compute gene stability for a maximum of 10 candidate RGs, the R-based package ctrlGene (ver. 1.0.0; https://cran.r-project.org/web/packages/ ctrlGene/) was used after verifying it yielded the same results as the Excel-based solution (cf. Results). NormFinder software (ver. 0.953; http://moma.dk/normfinder-software) is a Visual Basic Application based on Excel to rank RG expression stability. Input data was first transformed as required and a unique group identifier was set for each combination of organ, sampling day and condition. GrayNorm (ver. 1.1; https://github.com/gjbex/GrayNorm) is a python-based (ver. 2.7.x) algorithm able to compute the combination of genes that introduces the least uncertainty during gene expression normalization. It was used on Anaconda (ver. 1.9.2; https://www.anaconda.com/distribution/) and control groups were defined as the samples from day 0, with no treatment. When RG stability in more than one organ was computed, roots were also set as control group.

Consensus Ranking of Candidate RGs with RankAggreg
In order to generate a consensus from the data produced by geNorm PLUS , BestKeeper, NormFinder, and GrayNorm, we aggregated the obtained ranking lists by applying the RankAggreg (ver. 0.6.5) package of R software as done previously [16,43,47]. RankAggreg is a package which provides algorithms able to combine different ranking lists. Based on the size of our ranking lists, we used the Cross-Entropy Monte Carlo algorithm. The ranking list previously generated were used as input with the following parameters: Distance was calculated using the Spearman's Footrule function, rho was set at 0.1, the seed at 75, and the "convIn" argument at 50.

Reference Genes Validation
The expression level of the stress-responsive genes studied in the various organs and conditions was analyzed using qBase PLUS and normalized using the five most stable RGs as indicated in the "Results" section. The expression level of the stress-responsive genes, expressed as "Calibrated Normalized Relative Quantities" (CNRQs) as calculated by qBase PLUS , was exported to Excel datasheet. A one-way ANOVA (followed by a Tukey-Kramer post-hoc test) taking all the possible "day x condition" combination as factors was performed on the log Efficiency transformed CNRQs using R and the agricolae package (https://cran.r-project.org/web/packages/agricolae).

Conclusions
The expression stability of 14 candidate RGs was assessed in Salix viminalis leaves and roots exposed for 12 days to five abiotic stress conditions. This assessment was done using four algorithms and the different rankings they produced were aggregated into a consensus list. Out of the 14 candidate RGs, TIP41 and CDC2 were globally the most stable genes across all conditions. Other RGs that should be used in combination with TIP41 and CDC2 were organ and condition dependent. Our results provide a panel of reference genes that can be used when performing a gene expression analysis in different organs and under various stresses in S. viminalis. These reference genes will be useful for unwinding the expression patterns and function of genes involved in stress tolerance. This report also emphasizes the MIQE recommendations. Amongst those, it clearly outlines the importance to use more than one reference gene, which unfortunately still occurs a lot. In addition, this report emphasizes the importance to validate widely used reference genes before performing an experiment with specific plants and stresses.