Identification of Reference Genes for Quantitative Gene Expression Studies in Pinus massoniana and Its Introgression Hybrid

qRT-PCR is a powerful molecular research tool to study the regulation of gene expression. However, to accurately calculate gene expression levels, an experiment should include proper reference genes that show no changes in their expression level. Pinus massoniana, P. hwangshanensis, and their introgression hybrid in Mountain Lushan, China, are an ideal model for studying introgression and speciation. Although some research on reference gene selection for P. massoniana has been reported before, no studies on this subject have been performed where P. massoniana and its introgression hybrid were evaluated simultaneously. Here, we investigated ten genes (upLOC, SDH, ACT, EF, TOC75, DMWD, FBOX, PGK1, UBQ, and CL2417C7) identified from transcriptome data of these two taxa for reference gene potential. These ten genes were then screened across multiple tissues such as cone, young and mature stems, and young needles according to qRT-PCR thermal cycling and dissociation. Correlation coefficient, amplification efficiency, and cycle threshold value (Ct) range were applied to evaluate the reliability of each gene. The stability of candidate reference gene expression was calculated using three algorithms: geNorm, NormFinder, and BestKeeper. Base on the reliability and stability, we then offered a list of genes of recommended and not recommended for seven different tissue type and species. Our results demonstrated that different sample lines require different genes as reference to evaluate.


Introduction
Masson pine (Pinus massoniana) is an economically important conifer species that is naturally distributed in southern China.Its wood is often used for standing timber and furniture manufacturing.Resin, a secretion from its canals, can be used for the maintenance of musical instrument strings, and is an ingredient in several medicines [1,2].Because of its fast growth and high biomass, it has been planted extensively in south China during the past fifty years for rapid vegetation recovery and additional ecological purposes.
Huangshan pine (P.hwangshanensis) is a species native only to China.Although it grows at a considerably slower rate than P. massoniana, it has ornamental value as a horticultural plant in higher altitude regions.
Around the middle and lower reaches of the Yangtze River, P. massoniana mainly grows at an altitude below 700 m (above sea level, a.s.l.), while P. hwangshanensis mainly grows above 900 m (a.s.l.).On Mountain Lushan, situated in this area, both pines have their own ecological niche according to the distribution mentioned above.An introgression hybrid pine (hereafter referred to as the Z pine) can be found directly adjacent to the habitats of both P. massoniana and P. hwangshanensis (Figure 1).There are several morphological differences between P. massoniana and P. hwangshanensis [3], while the Z pine displays phenotypic characteristics of both parental pines [4,5].However, it also shows novel phenotypes, such as an ultra-low pinecone ripening and seed germination rate, compared to both P. massoniana and P. hwangshanensis [6].This distinct characteristic shows that genetic incompatibilities during the process of fertilization and/or even embryonic development might exist between both parents of the Z pine.Our previous research has shown that a group of differentially expressed genes (DEGs) related to reproduction were expressed at a much lower level in the Z pine than in P. massoniana, possibly causing the abnormal reproductive phenotype in the Z pine [7].
Forests 2019, 10, x FOR PEER REVIEW 2 of 13 (a.s.l.).On Mountain Lushan, situated in this area, both pines have their own ecological niche according to the distribution mentioned above.An introgression hybrid pine (hereafter referred to as the Z pine) can be found directly adjacent to the habitats of both P. massoniana and P. hwangshanensis (Figure 1).There are several morphological differences between P. massoniana and P. hwangshanensis [3], while the Z pine displays phenotypic characteristics of both parental pines [4,5].However, it also shows novel phenotypes, such as an ultra-low pinecone ripening and seed germination rate, compared to both P. massoniana and P. hwangshanensis [6].This distinct characteristic shows that genetic incompatibilities during the process of fertilization and/or even embryonic development might exist between both parents of the Z pine.Our previous research has shown that a group of differentially expressed genes (DEGs) related to reproduction were expressed at a much lower level in the Z pine than in P. massoniana, possibly causing the abnormal reproductive phenotype in the Z pine [7].Understanding how the expression of certain genes of interest is regulated in space, time, and different tissues is a crucial approach to further understanding about their biological function.Various methods can be applied to study gene expression: With microarrays, northern blots, RNAseq, and qRT-PCR (quantitative real-time polymerase chain reaction) being considered some of the more common ways, RNA-seq and qRT-PCR being the most popular.After qRT-PCR was introduced in the late twentieth century [8], it has been widely applied to quantify gene expression in life science research and medical diagnosis.qRT-PCR has a series of advantages, including high accuracy, sensitivity, and reproducibility.Nevertheless, due to its high sensitivity, the qRT-PCR assay might be affected by RNA or cDNA quality, primer specificity, amplification efficiency [9], etc.Therefore, optimal normalization methods are required for sample-to-sample comparison.A series of strategies has been introduced for normalizing qRT-PCR data [9][10][11][12][13].In general, in order to evaluate the expression quantity of a gene, a reference gene is required for various samples and/or experimental conditions (e.g., developmental stages, different organs, treatments, etc.).A reference gene is a gene that shows unchanging expression among all samples and/or experimental treatments [14] and can then be used to normalize and calibrate results [15].Therefore, obtaining a reliable reference gene is a critical step before being able to perform qRT-PCR in further research.
However, no single universal reference gene is found acceptable to evaluate all species or samples.Due to the varying PCR amplification efficiency of the target sequence, a different reference gene could be required.Reference genes with poor availability could lead to significant biases and faulty interpretation of the data.Therefore, determining a reference gene is an essential step before starting the experimental process.
Genes such as ACT (actin), β-TUB (β-tubulin), GAPDH (glyceraldehyde 3-phosphate dehydrogenases), EF (elongation factor), and 18S rRNA have commonly been assigned as reference Understanding how the expression of certain genes of interest is regulated in space, time, and different tissues is a crucial approach to further understanding about their biological function.Various methods can be applied to study gene expression: With microarrays, northern blots, RNA-seq, and qRT-PCR (quantitative real-time polymerase chain reaction) being considered some of the more common ways, RNA-seq and qRT-PCR being the most popular.After qRT-PCR was introduced in the late twentieth century [8], it has been widely applied to quantify gene expression in life science research and medical diagnosis.qRT-PCR has a series of advantages, including high accuracy, sensitivity, and reproducibility.Nevertheless, due to its high sensitivity, the qRT-PCR assay might be affected by RNA or cDNA quality, primer specificity, amplification efficiency [9], etc.Therefore, optimal normalization methods are required for sample-to-sample comparison.A series of strategies has been introduced for normalizing qRT-PCR data [9][10][11][12][13].In general, in order to evaluate the expression quantity of a gene, a reference gene is required for various samples and/or experimental conditions (e.g., developmental stages, different organs, treatments, etc.).A reference gene is a gene that shows unchanging expression among all samples and/or experimental treatments [14] and can then be used to normalize and calibrate results [15].Therefore, obtaining a reliable reference gene is a critical step before being able to perform qRT-PCR in further research.
However, no single universal reference gene is found acceptable to evaluate all species or samples.Due to the varying PCR amplification efficiency of the target sequence, a different reference gene could be required.Reference genes with poor availability could lead to significant biases and faulty interpretation of the data.Therefore, determining a reference gene is an essential step before starting the experimental process.
Genes such as ACT (actin), β-TUB (β-tubulin), GAPDH (glyceraldehyde 3-phosphate dehydrogenases), EF (elongation factor), and 18S rRNA have commonly been assigned as reference genes in recent research, we call them 'classical' reference genes.In addition, several novel reference Forests 2019, 10, 787 3 of 13 genes have emerged from recent research.For example, RPL29 (60S ribosomal protein L29) was found to be the most stable reference gene in Bemisia tabaci (Hemiptera: Aleyrodidae) under biotic conditions including host plant, acquisition of a plant virus, developmental stage, tissue from body region of the adult, and whitefly biotype [16].RPS4 (ribosomal protein S4) and UBQ (ubiquitin), a novel and classical reference gene, were found to be the optimal combination of reference genes used to study turbot (Scophthalmus maximus) gonad development before fertilization [17].TIP41 (tonoplast intrinsic protein) and NTB (nucleotide tract-binding protein) could serve as reliable reference genes across all tissues and at different developmental stages in bamboo (Phyllostachys edulis) [18].
As the number of transcriptomic studies on various species increased, more and more unigenes were discovered and were assigned as reference genes for non-model species.These novel transcriptomic reference genes usually expressed a better stability than their classical counterparts.Our study is the first attempt to identify reference genes in multiple tissues of P. massoniana and its introgression hybrid.This study could offer a guideline for reference gene identification used for research into hybridization.

Preparation of Samples
Cones, mature needles, and young stems of mature P. massoniana and the Z pine were collected.Due to existing individual variation in open-pollinated wild pine, we conducted a mix-up strategy in sample collection to minimize biological variation within each organ sample: Three to five tree individuals were assigned to be collected from, tissues were then harvested and randomized per tissue.After that the mixed materials were subpackaged into RNase-free 50 mL centrifuge tubes or ziplock bags, which were frozen in liquid nitrogen or dry ice immediately before storing in a −80 • C ultra-low temperature freezer.
Young needles were collected from seedlings of P. massoniana and the Z pine.Seeds of each taxa were surface sterilized for 2 h with a 0.4% (w/v) solution of KMnO 4 (potassium permanganate) and rinsed 5 times with sterile distilled water.Seeds were then moved into a germination box.A layer of cotton (saturated with sterile distilled water) was placed at the bottom of the germination box for retaining sufficient moisture, seeds were then placed on top.Germination boxes were positioned in a versatile environmental test chamber (MLR-352H, Panasonic Healthcare Co., Ltd., Ehime, Japan) under a 16:8 h day:night photoperiod cycle at 25 • C (day) and 22 • C (night), with a humidity setting of 70% (day) and 75% (night).During germination, seedlings with a root length of more than 1 cm were chosen to transfer to Pindstrup Substrate (Pindstrup Mosebrug A/S, Denmark) inside pots with 16 cm in diameter and grown under conditions identical to those used for germination.Young needles were carefully collected in RNase-free 15 mL centrifuge tubes and were frozen in liquid nitrogen, then transferred into an ultra-low temperature freezer (−80 • C) for storage.
More information of samples could be found in Table 1 and Figure 2.

Candidate Reference Gene Selection and Primer Design
Candidate reference genes were selected from previous transcriptome data of P. massoniana and Z pine cones [7].These transcriptome data are available on the NCBI Sequence Read Archive (SRA) under the BioProject accession number PRJNA482692 [7].Candidate reference genes were screened based on the following criteria: Their FPKM (fragments per kilobase of transcript per million fragments mapped) values were stable at different developmental stages, the coefficient of variation (CV) of the FPKM value can be no larger than 0.5.
Details of candidate reference genes and their corresponding primers are listed in Table 2.

Candidate Reference Gene Selection and Primer Design
Candidate reference genes were selected from previous transcriptome data of P. massoniana and Z pine cones [7].These transcriptome data are available on the NCBI Sequence Read Archive (SRA) under the BioProject accession number PRJNA482692 [7].Candidate reference genes were screened based on the following criteria: Their FPKM (fragments per kilobase of transcript per million fragments mapped) values were stable at different developmental stages, the coefficient of variation (CV) of the FPKM value can be no larger than 0.5.
Details of candidate reference genes and their corresponding primers are listed in Table 2.

RNA Extraction, cDNA Template Preparation, and Quantitative Real-Time PCR (qRT-PCR)
Total RNA extraction of all samples was conducted using a Bioteke Plant RNA Extraction Kit (RP3301, Bioteke Corporation, Beijing, China).The 260 nm/280 nm UV absorption value of each RNA sample was measured using a Nanodrop 2000 (Thermo Fisher Scientific, Waltham, MA, USA) to confirm its purity.The RIN (RNA integrity number) was measured using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).cDNA templates were synthesized using the Vazyme HiScript II Q RT SuperMix for qPCR (R223-01, Vazyme Biotech Co.,Ltd., Nanjing, China).A total of 1 µg RNA was chosen for cDNA synthesis of each sample.After that, each reaction (20 µL) was diluted to 200 µL (10-fold diluted) and stored at −20 • C.After running the thermal cycling and dissociation, the amplification and melting curves were obtained for evaluating the quality of the candidate reference genes amplification product.Each sample was tested with three technical replicates.
To assess the qRT-PCR amplification efficiency, the original cDNA solution was diluted 10-, 50-, 250-, 1250-, 6250-fold, respectively.All these solutions were used as templates for qRT-PCR amplification.A standard curve for each reference gene was drawn by using the average value of the three Ct (cycle threshold) technical replicates and its corresponding log solution concentration.The amplification efficiency was calculated using a standard Formula (1), where AE is the amplification efficiency and slope is the slope of the standard curve.The slope and correlation coefficient were automatically computed by Microsoft Office Excel when drawing the standard curve.

Stability Evaluation of Candidate Reference Genes
Three different algorithms, including geNorm [20], NormFinder [21], and BestKeeper [22] were assigned as tools to evaluate the stability of each candidate reference gene.All three algorithms are operated in Microsoft Office Excel, where geNorm and NormFinder need to enable 'Macros' in Excel to launch, while BestKeeper does not need this operation.The Ct relative expression value was the only correct data format to put into geNorm and NormFinder before evaluating, while in BestKeeper the Ct value can be put in directly.The Excel Add-in geNorm searches for the one or two most stable reference genes by computing the gene expression normalization factor, an M value, according to the geometric mean of a series of candidate reference genes.The M value indicates the stability level of every single gene, with the least stable gene having the maximum M value and the most stable gene the minimum M value.Usually, genes with an M value higher than 1.5 would be identified as unstable.Furthermore, a pairwise variation value offered by geNorm determines how many genes should be chosen as additional reference gene.The pairwise variation has a v n /v n+1 form value: If this value is bigger than 0.15 (threshold of default), then the ideal number of combinations would be 'n+1 .Otherwise it would be 'n' as recommended.
The Excel-based VBA (Visual Basic for Applications) applet NormFinder provides an algorithm evaluating stability of a group of genes.It can assess the expression variation of entire groups of genes with intra-and inter-group comparisons simultaneously.Like geNorm, it uses an M value to indicate the stability of a gene, with a lower number equating to better stability.
Finally, the Excel-based tool BestKeeper, computes a series of indicators such as standard deviation, correlation coefficient, coefficient of variable, aiming to find which genes are more stable.Here, we adopted the correlation coefficient as a single parameter to decide the most stable gene, the closer to 1 the better.

Reference Gene Evaluation
Ten candidate reference genes were tested in different P. massoniana and Z pine tissues (cone, young stem, mature needle, and young needle).The PCR product melting curves showed that all genes and primers had their unique amplification specificity (Figures S1-S8).Correlation coefficient (R), slope and amplification efficiency (AE) of each gene under every tissue were calculated (Figures S1-S8), also.The correlation coefficient ranged from 0.9610 to 0.9999, while the amplification efficiency range is 1.22 to 2.86.Concerning the correlation coefficient, in both species and all tissues, EF was greater than 0.99, while ACT, PGK1, and CL2417C7 were greater than 0.985.One sample, amplification of the FBOX gene in MYN tissue, had the lowest correlation coefficient at 0.9610.The SDH and FBOX genes showed the highest amplification efficiency across all samples, while upLOC, ACT, TOC75, PGK1, and CL2417C7 had a lower value (Figure 3).
Forests 2019, 10, x FOR PEER REVIEW 6 of 13 candidate reference genes.The M value indicates the stability level of every single gene, with the least stable gene having the maximum M value and the most stable gene the minimum M value.Usually, genes with an M value higher than 1.5 would be identified as unstable.Furthermore, a pairwise variation value offered by geNorm determines how many genes should be chosen as additional reference gene.The pairwise variation has a vn/vn+1 form value: If this value is bigger than 0.15 (threshold of default), then the ideal number of combinations would be 'n+1′.Otherwise it would be 'n' as recommended.
The Excel-based VBA (Visual Basic for Applications) applet NormFinder provides an algorithm evaluating stability of a group of genes.It can assess the expression variation of entire groups of genes with intra-and inter-group comparisons simultaneously.Like geNorm, it uses an M value to indicate the stability of a gene, with a lower number equating to better stability.
Finally, the Excel-based tool BestKeeper, computes a series of indicators such as standard deviation, correlation coefficient, coefficient of variable, aiming to find which genes are more stable.Here, we adopted the correlation coefficient as a single parameter to decide the most stable gene, the closer to 1 the better.

Reference Gene Evaluation
Ten candidate reference genes were tested in different P. massoniana and Z pine tissues (cone, young stem, mature needle, and young needle).The PCR product melting curves showed that all genes and primers had their unique amplification specificity (Figures S1-8).Correlation coefficient (R), slope and amplification efficiency (AE) of each gene under every tissue were calculated (Figure S1-8), also.The correlation coefficient ranged from 0.9610 to 0.9999, while the amplification efficiency range is 1.22 to 2.86.Concerning the correlation coefficient, in both species and all tissues, EF was greater than 0.99, while ACT, PGK1, and CL2417C7 were greater than 0.985.One sample, amplification of the FBOX gene in MYN tissue, had the lowest correlation coefficient at 0.9610.The SDH and FBOX genes showed the highest amplification efficiency across all samples, while upLOC, ACT, TOC75, PGK1, and CL2417C7 had a lower value (Figure 3).  1.

Z pine Has a Higher Expression Abundance than P. massoniana Across Ten Candidate Genes
The expression level of ten candidate reference genes in all assayed tissues (cone, young stem, mature needle, young needle) of P. massoniana and Z pine has a wide range according to their Ct (~16 to ~28 for both species) (Figure 4).EF has the lowest Ct value in both species, indicating that it shows the highest expression abundance.The FBOX Ct value is the highest, indicating that it has the lowest

Z pine Has a Higher Expression Abundance than P. massoniana Across Ten Candidate Genes
The expression level of ten candidate reference genes in all assayed tissues (cone, young stem, mature needle, young needle) of P. massoniana and Z pine has a wide range according to their Ct (~16 to ~28 for both species) (Figure 4).EF has the lowest Ct value in both species, indicating that it shows the highest expression abundance.The FBOX Ct value is the highest, indicating that it has the lowest expression abundance.In addition, Ct fluctuations in PGK1 are smallest in P. massoniana, while TOC75 and UBQ expression varies the least in Z pine, indicating that their expression abundance is very similar across different organs for their respective species.SDH shows the highest variation in expression level among all genes tested, which was not really to serve as a reference gene anymore.Overall, Ct values from P. massoniana are significantly higher than those of the Z pine, possibly reflecting that the Z pine has a higher basic expression abundance.All genes show a Ct distribution range between 15 and 30, which is an acceptable range for a potential reference gene.
Forests 2019, 10, x FOR PEER REVIEW 7 of 13 expression abundance.In addition, Ct fluctuations in PGK1 are smallest in P. massoniana, while TOC75 and UBQ expression varies the least in Z pine, indicating that their expression abundance is very similar across different organs for their respective species.SDH shows the highest variation in expression level among all genes tested, which was not really to serve as a reference gene anymore.
Overall, Ct values from P. massoniana are significantly higher than those of the Z pine, possibly reflecting that the Z pine has a higher basic expression abundance.All genes show a Ct distribution range between 15 and 30, which is an acceptable range for a potential reference gene.

Candidate Reference Genes Show Variable Stability Across Sample Lines
The stability of these ten genes was assessed using the M and r (correlation coefficient) values derived from either geNorm, NormFinder, and BestKeeper.Ranking the M and r value of each gene offers information about their relative performance; therefore, we created a parameter called 'ranking score' which is the sum of the M and r values, with smaller values being better (Table S1).Both the exact values (of M and r) and ranking score were assigned to decide which gene would perform best as a reference gene.
Within both cone samples (MCN + ZCN), UBQ, DMWD, and ACT were expressed at the most stable level in geNorm, NormFinder, and BestKeeper, respectively (Figure 5a), with ACT having the lowest ranking score (Table S1).This makes ACT the most stably expressed reference gene in cone tissue, while EF and TOC75 showed the most unstable expression.Within young stem samples (MYS + ZYS), PGK1 was the most stably expressed according to all three algorithms, followed by ACT and EF; TOC75 was the most unstably expressed in young stem tissue (Figure 5b).Within mature needle tissue (MMN + ZMN), FBOX, DMWD, and upLOC showed the most stable expression according to geNorm, NormFinder, and BestKeeper, respectively (Figure 5c).According to their ranking score, FBOX was the most stably expressed (Table S1), CL2417C7 the most unstable.Considering that FBOX has a relatively high amplification efficiency (Figure 3b), which may cause uncertainty in normalization, it was excluded from the recommended list.Within young needle tissue (MYN + ZYN), almost all genes, excluding CL2417C7 which was more unstable, performed similarly, with upLOC, UBQ, DMWD, and ACT showing the highest stability (Figure 5d).
Analyzing the combined data from all P. massoniana tissues (Figure 5e), we found that upLOC and ACT showed the most stable expression using the geNorm and BestKeeper algorithms.Even though DMWD was deemed the most stable using NormFinder, it had a relatively low ranking using the remaining two algorithms.Therefore, we conclude that ACT and upLOC are the most stably 4. Range of Ct values for ten reference gene candidates across different P. massoniana and Z pine tissues.Ct distribution is presented as a box-plot, indicating the interquartile range (box), median (horizon line in box), average (small square), maximum and minimum value (upper and lower whisker cap).

Candidate Reference Genes Show Variable Stability Across Sample Lines
The stability of these ten genes was assessed using the M and r (correlation coefficient) values derived from either geNorm, NormFinder, and BestKeeper.Ranking the M and r value of each gene offers information about their relative performance; therefore, we created a parameter called 'ranking score' which is the sum of the M and r values, with smaller values being better (Table S1).Both the exact values (of M and r) and ranking score were assigned to decide which gene would perform best as a reference gene.
Within both cone samples (MCN + ZCN), UBQ, DMWD, and ACT were expressed at the most stable level in geNorm, NormFinder, and BestKeeper, respectively (Figure 5a), with ACT having the lowest ranking score (Table S1).This makes ACT the most stably expressed reference gene in cone tissue, while EF and TOC75 showed the most unstable expression.Within young stem samples (MYS + ZYS), PGK1 was the most stably expressed according to all three algorithms, followed by ACT and EF; TOC75 was the most unstably expressed in young stem tissue (Figure 5b).Within mature needle tissue (MMN + ZMN), FBOX, DMWD, and upLOC showed the most stable expression according to geNorm, NormFinder, and BestKeeper, respectively (Figure 5c).According to their ranking score, FBOX was the most stably expressed (Table S1), CL2417C7 the most unstable.Considering that FBOX has a relatively high amplification efficiency (Figure 3b), which may cause uncertainty in normalization, it was excluded from the recommended list.Within young needle tissue (MYN + ZYN), almost all genes, excluding CL2417C7 which was more unstable, performed similarly, with upLOC, UBQ, DMWD, and ACT showing the highest stability (Figure 5d).and DMWD in that order are the most suitable for functioning as reference gene, with only PGK1 being unsuitable for this purpose (Figure 5g).We listed all (non-) recommended reference genes for each tissue type and species in Table 3.
We next analyzed the optimal number of reference genes that should be used for relative expression normalization using geNorm (Figure 5h).According to the criteria mentioned in Section 2.4, all pairwise variations of all tissue types expressed below 0.15 (threshold of default), therefore a combination of two genes would be the optimal number for every normalization.Analyzing the combined data from all P. massoniana tissues (Figure 5e), we found that upLOC and ACT showed the most stable expression using the geNorm and BestKeeper algorithms.Even though DMWD was deemed the most stable using NormFinder, it had a relatively low ranking using the remaining two algorithms.Therefore, we conclude that ACT and upLOC are the most stably expressed potential reference genes in P. massoniana.According to Figure 5e and Table S1, TOC75 was the most unstably expressed gene we analyzed across all P. massoniana samples.We then found upLOC and FBOX to be the most stably expressed reference genes across all Z pine tissues (Figure 5f).However, FBOX was not included in our recommended list due to its higher amplification efficiency (Figure 3b).In addition, we found that the PGK1 gene is not suitable for gene expression normalization in the Z pine due to its low ranking across all three algorithms.Finally, combining data from both P. massoniana and the Z pine within all tissue types, we conclude that upLOC, ACT, FBOX, and DMWD in that order are the most suitable for functioning as reference gene, with only PGK1 being unsuitable for this purpose (Figure 5g).We listed all (non-) recommended reference genes for each tissue type and species in Table 3. M_All: all P. massoniana samples (MCN+MYS+MMN+MYS); Z_All: all Z pine samples (ZCN+ZYS+ZMN+ZYN); MZ_All: all P. massoniana and Z pine samples; details of sample code please see Table 1.
We next analyzed the optimal number of reference genes that should be used for relative expression normalization using geNorm (Figure 5h).According to the criteria mentioned in Section 2.4, all pairwise variations of all tissue types expressed below 0.15 (threshold of default), therefore a combination of two genes would be the optimal number for every normalization.

Discussion
Acquisition of compatible reference genes is a prerequisite for obtaining reliable qRT-PCR results.The utility of a reference gene must be experimentally validated for particular tissues or cell types and specific experimental designs [14].Several traditional reference genes such as the actin, elongation factor, and GAPDH gene families have been frequently used in a number of different species, especially in model organisms like Danio rerio [23], Drosophila melanogaster [24], Schmidtea mediterranea, and Macrostomum lignano [25].However, they are not always suitable as reference gene under every possible experimental condition.As the scope of research continues to expand, other unconventional novel reference genes like uroporphyrin III C-methyltransferase, HcaT MFS transporter, L-idonate/5-ketogluconate/gluconate transporter in Escherichia coli [26], Type II metacaspases and GYF domain-containing protein in Brassica napus [27], have emerged.Evaluation of reference genes via transcriptome research in conifers has been reported in recent years [28].However, reference gene selection for more than one conifer taxa simultaneously has rarely been described [29].
In this paper, we aimed to determine optimal reference genes from P. massoniana and its introgression hybrid, the Z pine, in each of several chosen tissues: cone, young stem, mature and young needle.P. massoniana, and Z pine display significant phenotypic differences not only in appearance, but also at the transcriptome level [7].Finding appropriate reference genes for both taxa is an essential step for downstream experiments.After conducting sequence alignment and annotation using our previously generated transcriptomic data, we selected ten candidate reference genes (upLOC, SDH, ACT, EF, TOC75, DMWD, FBOX, PGK1, UBQ, and CL2417C7).
A series of assessments were conducted before calculating their stability value.All candidate genes and their primers should generate a single PCR product according to their melting curve (Figures S1-S8), which is a basic requirement for any reference gene.The correlation coefficient would be good if it closer to 1 and the amplification efficiency would be suitable when it is not too high.We found that most of our selected 10 genes indeed adhered to these criteria.Their amplification efficiency ranged from 1.22 to 2.86, which is higher than in some species like Davidia involucrata [30], Euphorbia esula [31], Phyllostachys edulis [18], but close to more related species such as Pinus pinaster and Picea abies [29].Furthermore, recent research has reported both low [32] and high [33] amplification efficiencies for P. massoniana genes.What causes this difference is still poorly understood?FBOX has a higher amplification efficiency compared to the other genes (Figure 3b); considering the accuracy of normalization, it was disqualified from becoming a reference gene (Table 3), although it was higher-ranked in the MMN+ZMN (all mature needle) and Z_ALL (all Z pine) samples.
During a qRT-PCR reaction, fluorescence values are recorded during every cycle and represent the amount of product amplified up until that point in the amplification reaction [9].If the sample contains more DNA template, it would take less cycles to reach the point at which the fluorescence signal could be detected [34].This point is called the cycle threshold [9].The cycle threshold, or Ct, is the main parameter obtained from a qRT-PCR reaction.Lower Ct values equate to a higher gene abundance.We found that the Z pine shows overall lower Ct values for all tested genes, in other words, it shows higher expression of these genes compared to P. massoniana.It is currently not clear why this is the case, however, since significant expression level differences exist between these two taxa, it is advised to perform stability expression analysis separately for each species if possible.
We used three different algorithms to evaluate the expression stability of candidate reference genes: geNorm [20], NormFidner [21], and BestKeeper [22].From each algorithm, we used both the stability value and the ranking score of each gene to determine their suitability as reference gene across multiple tissue types.The three algorithms did not always give the same ranking of gene expression stability (Figure 5a-g).In particular, geNorm and NormFinder showed a similar ranking across most samples, while BestKeeper gave a significantly different outcome.The reason for this phenomenon could be that geNorm and NormFinder requires a ∆Ct input while BestKeeper uses the raw Ct data.This difference in reference gene evaluation could also be seen in Undaria pinnatifida [35].However, studies have also been reported in which NormFinder and BestKeeper provided similar outcomes [36].Multiple control reference genes are essential if possible [10].Therefore, the parameter of pairwise variation in geNorm was calculated (Figure 5h) as a reference.
For each tissue type and species, we found different recommended and not recommended reference genes (Table 3).The upLOC gene is the most widely suitable reference gene, as it can be used for young and mature needle tissues, as well as combined tissue samples.In addition, ACT is a suitable reference gene for cone and mature needle tissue and combined tissue samples.Several candidate reference genes, such as TOC75 and CL2417C7 showed too much expression variability in several tissues to be usable as a reference gene.Interestingly, the PGK1 gene, which performs well as a reference gene for young stem tissue, showed too much expression variation in combined tissue samples.This analysis shows that suitability as a reference gene depends on tissue type, which could indicate that such genes show clear tissue specific expression.These findings suggest that prior to analyzing gene expression in a novel tissue type, reference gene suitability should be examined.
The actin gene family originates from a common ancestor in various species [37].Some species have three types of actin genes: α-, βand γ-actin.In plants, more than ten types of actin exist.β-actin has been widely used as reference gene across countless studies.Yet, even though actin is a celebrity in the 'reference gene club', it could not always be successfully applied as a reference gene; for example, in Medicago sativa [38], Symbiodinium [39], and Ganoderma lucidum [40].However, we tested ACT in normalization of a series target genes of cones from P. massoniana and Z pine in a recent study (as introduced in Table 3) and obtained an ideal result [7], which indicated that despite ACT not always being reliable, it might be in this case anyway.Phosphoglycerate kinase 1 (PGK1) is an ATP-generating glycolytic enzyme that forms part of the glycolytic pathway.Reports of this enzyme mainly concern human health and disease [41,42], where it also acts as reference gene in human pathology examinations [43].Little research has been published on dystrophia myotonica WD repeat-containing protein (DMWD), which plays a role in myotonic dystrophy, a complex neuromuscular disorder in mammals [44,45].The uncharacterized protein LOC103705956 (upLOC) is a novel gene that was initially found in transcriptome data of P. massoniana and Z pine cones [7].It showed high expression stability across multiple tissues in this study (Table 3), implying that it could be a potential valuable reference gene used for expression normalization, perhaps even for P. hwangshanensis and other species in the Pinus genus.

Conclusions
In this study we tested ten candidate reference genes, both classic and novel, from the transcriptome data of P. massoniana and Z pine cones.Our results show that ACT and upLOC are the most consistently well behaving reference genes taken across all tissue types and species, while PGK1 is the most stably expressed specifically in young stem samples.Our study has further shown that reference genes should be chosen carefully, for no single gene can fit every tissue even in the same or closely related species.

Figure 1 .
Figure 1.Schematic indicating the main distribution areas of P. massoniana, P.hwangshanensis, and their introgression hybrid, the Z pine, on Mt.Lushan, Jiangxi Province, China.a.s.l.: above sea level.

Figure 1 .
Figure 1.Schematic indicating the main distribution areas of P. massoniana, P.hwangshanensis, and their introgression hybrid, the Z pine, on Mt.Lushan, Jiangxi Province, China.a.s.l.: above sea level.
An Applied Biosystems 7500 PCR cycler (Thermo Fisher Scientific Corporation, CA, USA) was used to run the qRT-PCR.The Vazyme ChamQ SYBR qPCR Master Mix (Q331-02, Vazyme Biotech Co.,Ltd., Nanjing, China) was chosen as reaction reagent kit.For each qRT-PCR sample, 4 µL cDNA was added to a 20 µL reaction volume, containing 10 µL of 2× ChamQ SYBR qPCR Master Mix (Low ROX Premixed) and 0.4 µL of each primer.Thermal cycling was performed with an auto increment step of 30 s at 95 • C, and followed by 40 cycles of 10 s at 95 • C, 34 s at 60 • C. Dissociation analysis was performed with a default step of 15 s at 95 • C, and followed by 60 s at 60 • C, then 15 s at 95 • C.

Figure 3 .
Figure 3. Correlation coefficient and amplification efficiency of ten candidate reference genes across all samples.(a) Correlation coefficient; (b) amplification efficiency.The legend on the right displays colored sample code information, see Table 1.

Figure 3 .
Figure 3. Correlation coefficient and amplification efficiency of ten candidate reference genes across all samples.(a) Correlation coefficient; (b) amplification efficiency.The legend on the right displays colored sample code information, see Table 1.
Figure 3. Correlation coefficient and amplification efficiency of ten candidate reference genes across all samples.(a) Correlation coefficient; (b) amplification efficiency.The legend on the right displays colored sample code information, see Table 1.

Figure 4 .
Figure 4. Range of Ct values for ten reference gene candidates across different P. massoniana and Z pine tissues.Ct distribution is presented as a box-plot, indicating the interquartile range (box), median (horizon line in box), average (small square), maximum and minimum value (upper and lower whisker cap).

Table 1 .
Details of sample collection.

Table 2 .
Reference genes and their primers in this study.

Table 2 .
Reference genes and their primers in this study.

Table 3 .
The recommended and NOT recommended reference genes for each tissue type/species.

Table 3 .
The recommended and NOT recommended reference genes for each tissue type/species.