Data Mining Nitrogen-Responsive Gene Expression for Source–Sink Relations and Indicators of N Status in Potato

: Potato tuber yields depend on nitrogen (N) supply, which a ﬀ ects source–sink relations. Transcriptome sequencing of the foliar source using a single ﬁeld trial identiﬁed gene expression responsive to 180 kg N ha − 1 . The expression of N-responsive genes was further analyzed in the next stage using a NanoString nCounter over an expanded number of foliar samples from seven ﬁeld trials with varying N rates, sites, and cultivars. Least absolute shrinkage and selection operator (LASSO) regression models of gene expression predictive of yield, total plant N uptake, and tuber-speciﬁc gravity (proxy for dry matter) were built. Genes in the LASSO model for yield were associated with source–sink partitioning. A key gene regulating tuberization and senescence, StSP6A Flowering locus T , was found in the LASSO model predicting tuber yield, but not the other models. An aminotransferase involved in photorespiratory N assimilation and amino acid biosynthesis was found in all LASSO models. Other genes functioning in amino acid biosynthesis and integration of sulfur (S) and N metabolism were also found in the yield prediction model. The study provides insights on N responses in foliage of potato plants that a ﬀ ect source–sink partitioning. Additionally, N-responsive genes predictive of yield are candidate indicators of N status. test Cross-validated test mean squared error (MSE) and adjusted R-squared results on training were both These results were compared to cross-validated MSE and adjusted R-squared results from a linear all for assessing LASSO10 biological processes that contribute to N assimilation and sink partitioning. Of interest were di ﬀ erences between the genes in the total N uptake and yield models. The total N uptake model included genes involved in amino acid and nitrate transport that were not present in the yield model. These results suggest that these transport activities were associated with N uptake—more so than sink partitioning. The yield model included genes involved in sulfur sensing and transport. These results suggest integration of N and S metabolism is associated with N-responsive source–sink partitioning. Other studies also reported ﬁndings that sulfur


Introduction
Tubers are underground starch-storage extensions from the stem, which are used for vegetative propagation and are the major sink organ of the potato (Solanum tuberosum L.). Tuber yield is dependent Agronomy 2020, 10 on radiation intercepted by leaves and is responsive to N fertilization [1]. Increasing N supply increases plant canopy development, which then increases radiation reception leading to increases in photosynthates used for tuber production [2][3][4]. Sub-optimal N supply leads to growth retardation and low tuber yield and quality. However, application of N in excess of the plant's N requirement can contribute to environmental pollution, particularly, nitrate leaching to groundwater and emissions of nitrous oxide, a greenhouse gas [5]. High N rates can also result in decreased tuber quality [6,7]. There are complex relationships between N supply and source-sink relations in tuber production. Under commercial potato production, roots take up N from soils via transporters mostly as nitrate but also in the form of ammonium [8]. These inorganic nitrogenous compounds are assimilated into organic compounds by a series of enzymatic processes occurring primarily in leaves resulting in synthesis of amino acids [9]. The foliar response to N contributing to source-sink relations in tuber production is not well-understood. Studies have shown that N supply affects leaf size and duration rather than photosynthetic efficiency [10][11][12][13]. To better understand source responses to N that affect sink development, the current study examined gene expression in source tissue (leaves of the potato vine) mid-season to find N-responsive biological processes associated with tuber yield. Gene expression associated with tuber dry matter (as measured using specific gravity as a proxy trait) and total plant N uptake were also compared. Knowledge of N-responsive processes contributing to tuber yield is needed to improve N use efficiency in potato plants and to develop indicators of crop N status to guide the optimization of N fertilizer application. Application rates of N are currently mainly guided using predictions of mineral N supply from the soil, which does not take into account in-season variation in growth conditions [5,14]. In-season monitoring of N status during crop growth can provide more precise determination of the N requirement needed to meet economic and environmental goals [15,16]. Development of a foliar diagnostic is advantageous as the tissue is above ground and accessible for sampling. Foliar gene expression responsive to N was identified in other studies and has potential application in monitoring crop N status [17][18][19][20][21][22]. In addition to N response, genes with functions associated with source-sink relations have even greater relevance to crop N status, and identification of these genes was the goal of the current study. The current study used data from a single field trial to identify N-responsive gene expression in potato foliage. Two methods were used to identify differentially expressed genes with N supply-CuffDiff [17,23] and uncorrelated shrunken centroids [24]. The foliar expression of the N-responsive genes was then analyzed over seven field trials where samples used for analysis of gene expression were taken from the plants in the same plots used for collection of crop data-yield, total N uptake and specific gravity. Multiple field trials enabled yield to be determined across a range of soil and environmental conditions, which provided a more accurate picture of sink partitioning under N supply compared to greenhouse trials using potted plants. In the current study, transcriptome sequencing was used to identify genes differentially expressed with N supply using a single field trial [17]. These N-responsive genes were further analyzed using the NanoString nCounter platform [25]. This platform quantifies gene expression by direct counting of mRNA transcripts hybridized to probes labelled with fluorescent tags using an imaging system embedded in the instrument. The nCounter system does not require enzymatic steps, hence analysis of large numbers of samples can be done in shorter amounts of time with less laboratory handling compared to transcriptome sequencing and RT-PCR. The capacity for analysis of large sample numbers and relatively low cost of nCounter analysis makes it amenable as a diagnostic tool for gene expression compatible with in-season monitoring of N status in potato plants. The nCounter system was used to quantify gene expression for a large set of samples assembled from seven replicated field trials at four locations across Canada involving five cultivars of potato grown at varying N rates. A novel data mining approach with LASSO was used to analyze the nCounter data to find gene expression predictive of tuber yield, total plant N uptake, and tuber-specific gravity. Studies by others have demonstrated the effectiveness of LASSO in mining gene expression to find genes involved in seed coat mucilage and pectin metabolism in Arabidopsis thaliana [26]. LASSO both selects a subset of highly correlated genes and performs ridge regression to make predictions [27]. As such, genes selected in the LASSO analysis provided further insight into N-responsive molecular processes associated with yield, total plant N uptake, and tuber-specific gravity. This information was used to identify candidate indicators for N status.

Field Trials
Tubers were planted at four locations in Canada (Carberry, MB; Péribonka, QC; Harrington, PE; and Fredericton, NB), in a total of seven field trials, to capture a range of soil and environmental conditions ( Table 1 and Supplementary Table S1). The 30-year average mean temperature and precipitation during the growing season is presented in Table 1 to illustrate the variation in the sites. Leaves were sampled for gene expression between 42 and 50 days after planting (DAP) from the seven trials, which was coordinated with the timing of a second application of N in split N fertilization (Table 1) [6]. Cultivars planted at each site varied. In total there were five cultivars (Russet Burbank, Jemseg, Shepody, Atlantic, and Classic Russet). Each cultivar was planted in four replicate plots of 15 hills (plants) for each N treatment for each trial. Various rates and sources of mineral N fertilizer were applied depending on the site (see Supplementary  Table S1 and S2). Nitrogen fertilizer was applied at planting at all sites and a second application of fertilizer was applied at dates indicated in Table 1 at three of the sites. For data analysis, the total N rate from both applications was used. Samples treated with total N rates of 106 and 120 kg N ha −1 were combined as a 106-120 treatment rather than as separate treatments, and similarly total N rates of 180 and 200 kg N ha −1 were combined as a 180-200 treatment. Combining samples for the treatment was supported by data that showed that there were no statistically significant differences in yield, specific gravity, or total N uptake response between the 106 and 120 kg N ha −1 treatments or between the 180 to 200 kg N ha −1 treatments (data not shown).
Total yield (tuber mass per hectare) was determined by weighing tubers in each plot that had a known area. Whole plant samples (4) were removed from each plot. Vines, stolons, and readily recoverable roots were washed, oven-dried, and weighed to determine dry matter yield. Dried plant tissue was ground and total N concentration was determined by dry combustion [28]. Tuber-specific gravity was used as a proxy for tuber dry matter [29]. Specific gravity was calculated on approximately 4.5 kg of tubers with marketable size categories from each plot using the weight in air minus weight in water method [30].

RNA Extraction
For each plot, the last fully expanded leaf from each of 15 hills (plants) from each plot was sampled using a hole punch 42-50 days after planting. The 15 leaf disk samples were pooled in a 2 mL screw cap tube with 750 µL of RNALater (Ambion, Austin, TX, USA). Leaf disks were stored at 4°C for up to two weeks. RNALater was removed and leaves were rinsed twice in borate buffer (200 mM sodium borate decahydrate (Borax), pH 9.0, 30 mM ethylene glycol bis (β-aminoethyl ether)-N-N'-tetraacetic acid (EGTA), 1% (w/v) sodium dodecyl sulfate (SDS), and 10 mM dithiothreitol) plus 1% (w/v) sodium deoxycholate and 2% (w/v) polyvinylpyrrolidone (PVP) (Mr 40,000)) [31,32] at room temperature. The borate buffer was drained and the leaves were flash-frozen in liquid nitrogen. Leaf tissue was ground using two metal ball bearings in a FastPrep24 homogenizer (MP Biomedical, Solon, OH, USA). Samples were pre-extracted in 1 mL borate buffer. The lysate supernatant (200 µL) was used for RNA extraction with the Biomek NXP Laboratory Automation Workstation (Beckman Coulter, Mississauga, ON, Canada) using the RNAdvance Tissue Kit (Beckman Coulter, Mississauga, ON, Canada) according to the manufacturer's instructions for liquid samples. The RNA concentration and quality were determined using a NanoDrop 1000 Spectrophotometer (Thermo Scientific, Dartmouth, NS, Canada) and 2100 Bioanalyzer (Aglient Technologies, Santa Clara, CA, USA), respectively. There was a total of 162 samples in the study. Table 1. Field trials. The locations of the trial are followed by the latitude and longitude coordinates in decimal degrees in brackets. Fertilizer was applied at planting and at a second application for three of the sites. The planting, second application, and harvest dates are indicated as month/day/year (m/d/y). Second application and harvest dates are followed by days after planting (DAP). The soil type, the 30-year average temperature of the growing season from May to October (Avg°C), and average precipitation in mm from May to October (precipitation (mm)) are presented.

Transcriptome Sequencing Data Analysis
Genome-wide gene expression analysis was done using transcriptome sequencing data from a previous study [17] that used leaves from the GE2012 trial (Table 1, Figure 1). Briefly, leaves were sampled from the GE2012 trial from three cultivars (Russet Burbank, Atlantic, and Shepody) treated with two fertilizer N rates, 0 (control) and 180 kg N ha −1 (treated) in a randomized complete block design with four blocks. Sampling of leaves was done at two dates-25 July and 8 August 2012. Forty-eight samples from the GE2012 trial were used for the transcriptome analysis. National Centre for Biotechnology Information Gene Expression Omnibus (NCBI GEO) accession for the data is GSE75926. Differentially expressed genes between the control and treated samples were identified using the negative binomial t-test in CuffDiff for the three cultivars at two time points [17]. The fragments per kilobase of transcript per million mapped reads (FPKM) from the transcriptome sequencing data was further analyzed using another approach in the current study. FPKM was log 2 transformed and analyzed using uncorrelated shrunken centroids (USC) [24] as implemented in Multi-experiment Viewer (MeV4.9) [33] with delta 2.5, #bins 20, and rho 0.9. USC is a supervised machine learning classification algorithm that was used to select genes that can classify samples into control and treatment groups based on gene expression. All of the differentially expressed genes from the previous and the current transcriptome sequencing data analyses were combined as the set of N-responsive foliar genes for further analysis using nCounter.

nCounter Gene Expression Analysis
Multiplex analysis of the expression of 68 genes was done using the nCounter Digital Analyzer (NanoString Technologies, Inc., Seattle, WA, USA). Details of nCounter multiplex gene expression analysis are described elsewhere [25]. The natural logarithm was used to transform data prior to analysis. The genes included five reference genes and 63 N-responsive experimental genes (Supplementary Table S1). The probe sequences used in the nCounter CodeSet were from the potato genome reference sequence [34] and are in Supplementary Table S3. Thirty-six of the experimental genes were identified using the USC analysis described above. Another 24 genes were identified

nCounter Gene Expression Analysis
Multiplex analysis of the expression of 68 genes was done using the nCounter Digital Analyzer (NanoString Technologies, Inc., Seattle, WA, USA). Details of nCounter multiplex gene expression analysis are described elsewhere [25]. The natural logarithm was used to transform data prior to analysis. The genes included five reference genes and 63 N-responsive experimental genes (Supplementary Table S1). The probe sequences used in the nCounter CodeSet were from the potato genome reference sequence [34] and are in Supplementary Table S3. Thirty-six of the experimental genes were identified using the USC analysis described above. Another 24 genes were identified using a negative binomial t-test from a previous study [17] and three were selected based on previous studies using nCounter [19,21]. There were four glutaredoxin genes identified using USC; however, the nCounter assay had a single probe for detecting Sotub04g010640 (glutaredoxin) due to the similarity in sequence between the glutaredoxin genes. This probe can detect all four genes, therefore expression for the Sotub04g010640 was representative of any and all the glutaredoxin genes. However, the transcriptome sequencing analysis indicated that only Sotub04g010640 showed differential expression with N supplementation. Unique probe sequences were found for all of the other genes. The five reference genes were adenine phosphoribosyl transferase (aprt), tubulin, cyclophilin [35,36], elongation factor 1-α(ef1α) [37], and COX1 [38]. The nCounter data was adjusted using the manufacturer-provided spiked positive and negative controls according to the manufacturer's instructions. The geometric mean of the reference genes was calculated and used to normalize the experimental genes on a per sample basis [39,40]. The 48 samples from GE2012 used for transcriptome sequencing were analyzed using nCounter and Spearman rank correlation was performed using SYSTAT v. 13 (Systat Software, San Jose, USA) and used to compare expression data from nCounter and transcriptome sequencing. For the LASSO analysis 24 of the samples from GE2012 (from the first time point at 47 DAP) were used along with another 138 samples from the six other field trials for a total of 162 samples (Table 1, Figure 1). A heat map of the log e transformed nCounter gene expression data was generated using Multi-Experiment Viewer (version MeV4.9) [33].

Statistical Analysis
The nCounter data for 63 N-responsive genes from 162 samples in the seven field trials ( Figure 1) were used for analysis. Selection of genes was done using LASSO, a regularization method, to fit a regression on the nCounter gene expression data against yield, total N uptake, and specific gravity. Coefficients for the model are calculated using the following criteria [27].
where x ij is the observation i for the variable j, y i is the response for observation i, and RSS is the residual sum of squares. By adding the penalty term λ p j=1 β j , small coefficients will shrink to zero, as regulated by the tuning parameter λ. In the current analysis λ was chosen through ten-fold cross validation via the glmnet package in the R statistical computing software (https://www.R-project.org/) [41]. The data were separated into training and testing sets and evaluated through a ten-fold cross validation. Outliers were removed.
For each fold of the cross-validation, 100 bootstrapped samples of the training data were analyzed using the LASSO model. A reduced model was determined for each fold by selecting variables that were included over the bootstrapped samples more than 50% of the time for that particular fold. That is, if a variable's coefficient was greater than zero more than 50 times across the 100 bootstraps within a given fold in cross-validation, it was selected to be included in the regression model. Then, genes that were common across all ten folds were selected for the final set of variables (LASSO10).
The model using the LASSO10 reduced set of genes was then fitted to the training data using linear regression and evaluated using the test data. Cross-validated test mean squared error (MSE) and adjusted R-squared results on training data were both calculated. These results were compared to cross-validated MSE and adjusted R-squared results from a linear model including all genes for assessing improvement of the fit with the reduced LASSO10 models.
SpudDB-http://solanaceae.plantbiology.msu.edu/index.shtml [42]-was used to identify Arabidopsis thaliana genes that aligned with genes in the LASSO reduced models. The Arabidopsis Information Resource-www.arabidopsis.org [43]-was used to search further information on gene function.

Identification of N-Responsive Genes Using Transcriptome Sequencing
Transcriptome sequence data from a single field trial (GE2012, Table 1) were mined for genes showing differential expression between 0 (control) and 180 kg N ha −1 (treated) treatments to find N-responsive genes ( Figure 1). A previous study used the negative binomial t-test in CuffDiff to identify differential expression for each of the three cultivars-Atlantic, Shepody, and Russet Burbank-in the GE2012 trial at two time points [17]. There were 39 genes showing differential expression across the cultivars and time points. In the current study, USC was used to filter a set of genes that can classify samples as control (0 N) and treated (supplemented with 180 kg N ha −1 ) using the same data as Gálvez et al. [17]. Forty-one genes were found to classify samples as control and treated ( Table 2). Eighteen of these genes overlapped with the differentially expressed genes in the CuffDiff analysis from Galvez et al. [17]. There were 23 genes in the USC classifier that did not show differential expression with CuffDiff. Many of the genes unique to USC analysis included those with low expression levels. The differences between the CuffDiff and USC analyses indicate that each method has biases, and there are advantages to identification of N-responsive genes using more than one analytical method. The differentially expressed genes from Galvez et al. [17] and the USC classifier genes were combined to generate a 63-gene nCounter assay for gene expression analysis ( Table 2, Supplementary Tables S1  and S3). The nCounter assay had decreased sample processing and was less expensive to run over large numbers of samples. This made nCounter suited to analysis of a larger number of samples collected in multiple field trials. A Spearman rank correlation analysis was done to compare gene expression from transcriptome sequencing (FPKM) and nCounter ( Figure 2). The results indicate that the R 2 is 0.84, which provided validation for gene expression from these two platforms. my 2020, 10, x FOR PEER REVIEW 8 iases, and there are advantages to identification of N-responsive genes using more than tical method. The differentially expressed genes from Galvez et al. [17] and the USC class were combined to generate a 63-gene nCounter assay for gene expression analysis ( Tab  lementary Table S1 and S3). The nCounter assay had decreased sample processing and was sive to run over large numbers of samples. This made nCounter suited to analysis of a la er of samples collected in multiple field trials. A Spearman rank correlation analysis was d pare gene expression from transcriptome sequencing (FPKM) and nCounter ( Figure 2). s indicate that the R 2 is 0.84, which provided validation for gene expression from these rms. igure 2. Correlation between nCounter and FPKM from transcriptome sequencing. The same RNA amples were used for analysis for nCounter and FPKM values. Values for each of the samples were raphed. The R 2 is from Spearman rank correlation.  Table 2. Differentially expressed genes from transcriptome sequencing analysis of leaves from the GE2012 trial. Foliar samples from the GE2012 were used for transcriptome sequencing [17]. Genes in gray and white boxes were found to be differentially expressed in Gálvez et al. [17] using CuffDiff. The same transcriptome sequencing data was re-analyzed in the current study using uncorrelated shrunken centroids (USC) for classifying control and treated samples to identify differentially expressed genes. Genes in gray boxes show differential expression in both CuffDiff and USC analyses. Genes showing differential expression for just USC (blue boxes) and CuffDiff (white boxes) are shown. Genes in green boxes were examined in previous studies [19,21], but were not found to be differentially expressed in the transcriptome sequencing analysis. The fragments per kilobase of transcript per million mapped reads (FPKM) were averaged for control (0 N) and treated (180 kg N ha −1 ) samples over the three cultivars and two time points from the GE2012 trial. The iTAG ID for the genes is from Potato Genome Ssequencing Consoritum (PGSC) v4.03. The nCounter ID are the labels for the genes in the nCounter assay (Supplementary Table S1). Letters on the right indicate the LASSO10 model that included the gene. Y = LASSO10 yield model, N = LASSO10 total N uptake model, S = LASSO10 specific gravity model.

N-Responsive Gene Expression Predictive of Yield, Total N Uptake and Specific Gravity
The N-responsive genes identified from transcriptome sequencing were further analyzed in seven field trials (Table 1) using nCounter. Supplementary Table S3a-c summarizes the cultivars and N rate treatments for each of the plots at the seven field trial sites. In each of the field trials, the foliar tissue used for gene expression analysis was sampled from plants in same plots used to collect crop data. Figure 3 shows a representative photo of potato plants, cultivar Russet Burbank, at 0 N (control) and 180 kg N ha −1 (treated) at 45 and 80 DAP to demonstrate crop response to N supply. N fertilizer is typically applied at one time and rate at planting. Alternatively, split N-fertilization practices involve a reduction in applied N at planting followed by a second application of N. Information on a crop's N status at the timing of the second application is needed to optimize the N rate. Hence, gene expression analysis was done on plants between 42-50 DAP to coordinate with the timing of a second application in split N fertilization. Figure 4 is a heat map of the average log e -transformed nCounter gene expression for each of the plots from the seven field trials. The average yield, total N uptake, and specific gravity for each plot in each of the field trials is shown on the right side of the heat map. Yield, specific gravity, and total N uptake data are also presented as box plots in Supplementary Figures S1-S3. The raw data for yield, total N uptake and specific gravity is in Supplementary Table S1. The N rate was observed to affect yield and total N uptake, but specific gravity had low levels of variation between applied N rates.

N-Responsive Gene Expression Predictive of Yield, Total N Uptake and Specific Gravity
The N-responsive genes identified from transcriptome sequencing were further analyzed in seven field trials (Table 1) using nCounter. Supplementary Tables S3a-c summarizes the cultivars and N rate treatments for each of the plots at the seven field trial sites. In each of the field trials, the foliar tissue used for gene expression analysis was sampled from plants in same plots used to collect crop data. Figure 3 shows a representative photo of potato plants, cultivar Russet Burbank, at 0 N (control) and 180 kg N ha −1 (treated) at 45 and 80 DAP to demonstrate crop response to N supply. N fertilizer is typically applied at one time and rate at planting. Alternatively, split N-fertilization practices involve a reduction in applied N at planting followed by a second application of N. Information on a crop's N status at the timing of the second application is needed to optimize the N rate. Hence, gene expression analysis was done on plants between 42-50 DAP to coordinate with the timing of a second application in split N fertilization. Figure 4 is a heat map of the average loge-transformed nCounter gene expression for each of the plots from the seven field trials. The average yield, total N uptake, and specific gravity for each plot in each of the field trials is shown on the right side of the heat map. Yield, specific gravity, and total N uptake data are also presented as box plots in Supplementary Figures S1-S3. The raw data for yield, total N uptake and specific gravity is in Supplementary Table S1. The N rate was observed to affect yield and total N uptake, but specific gravity had low levels of variation between applied N rates.  The goal of the study was to associate foliar N-responsive gene expression with yield, total N uptake, and specific gravity. LASSO regression analysis was used for this purpose. Regression models were built to predict the response variables yield, specific gravity, and total N uptake using gene expression, N rate, and the non-numerical categorical variables (factors). The factors in the analysis were site and cultivar. The full LASSO model was built with factors, N rate, and gene expression for all 63 genes. A model was also built using just factors and another with just the 63 genes (all genes). Additionally, subsets of genes identified from the feature selection process as part of LASSO were used to build reduced-gene models ( Table 2 and Figure 4). The predicted yield, specific gravity, and total N uptake from each of the LASSO models were fitted to the actual crop data and the adjusted R 2 and MSE from the analysis are shown in Table 3. The factors-only model had the best fit of all the models. These results demonstrate that variations due to site and cultivar have significant effects on the response variables and need to be considered in development of diagnostic tools for N status monitoring. The all-genes and LASSO 10 reduced-gene models also predicted yield with reasonable accuracies, with the LASSO10 reduced-gene models having better predictive accuracies than the all-genes model, due to the reduced model variance. These results confirm that genes eliminated in the selection process did not contribute greatly to the prediction of the response variables and that LASSO selection identified key genes for predicting yield, specific gravity, and total N uptake. The adjusted R 2 value for the LASSO10 reduced model for yield was 0.57, and total N uptake was 0.58 compared to specific gravity, which had a lower value of 0.35 (Table 3). The genes selected in the LASSO10 reduced-genes models for predicting yield, total N uptake, and specific gravity are indicated in Table 2.
Agronomy 2020, 10, x FOR PEER REVIEW 13 of 22 Figure 4. Heat map of nCounter gene expression. The loge gene expression quantified using nCounter and averaged over the four replicates is represented by a blue-yellow gradient where yellow is the highest expression and blue is the lowest. The nCounter gene ID for the 63 genes is indicated at the top of the heat map. The colored bars over the nCounter gene IDs indicate which genes were included in the LASSO10 reduced-gene model: red for yield, green for total N uptake, and light blue for specific gravity. The genes without a colored bar over them were not included in any of the LASSO10 reducedgene models. The average yield, total N uptake, and specific gravity for each of the plots where the leaf samples were taken is indicated in the table adjacent to the heat map. The trial name, cultivar, and N rate treatment for each sample is also indicated in the table.
The goal of the study was to associate foliar N-responsive gene expression with yield, total N uptake, and specific gravity. LASSO regression analysis was used for this purpose. Regression models were built to predict the response variables yield, specific gravity, and total N uptake using gene expression, N rate, and the non-numerical categorical variables (factors). The factors in the analysis were site and cultivar. The full LASSO model was built with factors, N rate, and gene expression for all 63 genes. A model was also built using just factors and another with just the 63 genes (all genes). Additionally, subsets of genes identified from the feature selection process as part of LASSO were used to build reduced-gene models ( Table 2 and Figure 4). The predicted yield, specific gravity, and total N uptake from each of the LASSO models were fitted to the actual crop data and the adjusted R 2 and MSE from the analysis are shown in Table 3. The factors-only model had the best fit of all the models. These results demonstrate that variations due to site and cultivar have significant effects on the response variables and need to be considered in development of The log e gene expression quantified using nCounter and averaged over the four replicates is represented by a blue-yellow gradient where yellow is the highest expression and blue is the lowest. The nCounter gene ID for the 63 genes is indicated at the top of the heat map. The colored bars over the nCounter gene IDs indicate which genes were included in the LASSO10 reduced-gene model: red for yield, green for total N uptake, and light blue for specific gravity. The genes without a colored bar over them were not included in any of the LASSO10 reduced-gene models. The average yield, total N uptake, and specific gravity for each of the plots where the leaf samples were taken is indicated in the table adjacent to the heat map. The trial name, cultivar, and N rate treatment for each sample is also indicated in the table. Table 3. Summary statistics for least absolute shrinkage and selection operator (LASSO) analysis. Adjusted R 2 and mean squared error (MSE) from fitting the linear regression model for each of the crop parameters of yield, specific gravity, and total N uptake to the test data are presented. The factors were site and cultivar. All genes were the entire set of 63 genes in the nCounter analysis. LASSO10 genes were the genes selected from the 63 genes in the reduced regression models. The full model included N rates, expression of all genes and factors for prediction of each of the crop parameters of yield, specific gravity, and total N uptake. The factors model included just the factors, and the all-genes model was just the 63 genes. The LASSO10 genes model included just the selected genes from Table 2 for predicting each of the crop parameters with a reduced set of genes.

Functional Analysis of Genes in the LASSO10 Reduced Models
Analysis of the functions for N-responsive genes in the LASSO10 reduced models predictive of yield, total N uptake, and specific gravity were analyzed using SpudDB [42] and literature searching (Supplementary File S1). This information was used to categorize the genes into functional groupings. There were some functional groups that were found to be unique to each LASSO10 model and several that were shared between models (Figures 4 and 5).
The most important crop parameter responding to N is yield. There were 22 genes in the reduced LASSO10 model for yield. The model included 15 genes that were upregulated and seven downregulated with N supplementation (Table 2, Figure 4, Supplementary File S1). The upregulated genes functioned in senescence, protein degradation, amino acid biosynthesis and repair, nitrate assimilation, intracellular signaling, sulfate transport and sensing, redox signaling, and detoxification ( Figure 5). The downregulated genes had functions in tuberization, senescence, sulfate transport, polyamine catabolism, nitrate sensing, gene expression regulation, lipid biosynthesis, and amino acid biosynthesis ( Figure 5). The genes in the LASSO10 yield model are associated with source-sink partitioning and their expression has potential application as indicators of crop N status.
N uptake from the soil is important to plant growth and development. Foliar processes responsive to N supplementation associated with total plant N uptake were examined. The LASSO10 reduced model for total N uptake had 19 genes of which 15 were upregulated and four were downregulated with N supplementation (Table 2, Figure 4, Supplementary File S1). The upregulated genes had functions in nitrate sensing, transport and assimilation, amino acid biosynthesis and repair, senescence, protein degradation, redox regulation, light signaling, glycolysis, redox regulation and lipid metabolism ( Figure 5). The downregulated genes had functions in senescence, chlorophyll degradation, amino transport, sulfate transport and polyamine catabolism ( Figure 5).
Examination of genes in the LASSO10 model for yield that overlap with the reduced-gene model for total N uptake provided insight into biological processes connecting total N uptake and yield. There were eight genes present in both of the LASSO10 models (Table 2, Figure 4, Supplementary File S1). These genes were also in the LASSO10 model for specific gravity. The N supplementation upregulated genes functioned in nitrate assimilation, senescence and protein degradation, amino acid biosynthesis and repair, intracellular signaling and lipid metabolism ( Figure 5). The downregulated genes functioned in sulfate transport and polyamine catabolism ( Figure 5). The gene showing the greatest differential expression with N supplementation was Sotub10g018540 aminotransferase like protein, which functions in nitrate assimilation in photorespiration (Table 2, Figure 4 and Supplementary File S1), was present in all three LASSO10 models.

Functional Analysis of Genes in the LASSO10 Reduced Models
Analysis of the functions for N-responsive genes in the LASSO10 reduced models predictive of yield, total N uptake, and specific gravity were analyzed using SpudDB [42] and literature searching (Supplementary File S1). This information was used to categorize the genes into functional groupings. There were some functional groups that were found to be unique to each LASSO10 model and several that were shared between models (Figures 4 and 5). The most important crop parameter responding to N is yield. There were 22 genes in the reduced LASSO10 model for yield. The model included 15 genes that were upregulated and seven downregulated with N supplementation (Table 2, Figure 4, Supplementary File S1). The upregulated genes functioned in senescence, protein degradation, amino acid biosynthesis and repair, nitrate assimilation, intracellular signaling, sulfate transport and sensing, redox signaling, and detoxification ( Figure 5). The downregulated genes had functions in tuberization, senescence, sulfate transport, polyamine catabolism, nitrate sensing, gene expression regulation, lipid biosynthesis, and amino acid biosynthesis ( Figure 5). The genes in the LASSO10 yield model are associated with source-sink partitioning and their expression has potential application as indicators of crop N status.
N uptake from the soil is important to plant growth and development. Foliar processes N-responsive biological processes associated with source-sink partitioning but not N uptake were also of interest. Differences in the LASSO10 reduced-gene model for yield and the total N uptake model provided a way to examine these processes. Genes in the yield model but not total N uptake included N-responsive upregulated genes with functions in senescence, intracellular signaling, sulfate sensing and transport, redox signaling, detoxification, and amino acid biosynthesis ( Figure 5). The downregulated genes had functions in tuberization, senescence, nitrate sensing, regulation of gene expression, lipid biosynthesis, and amino acid biosynthesis ( Figure 5). A key gene in the yield model but not in the other models was Sotub05g028860 flowering locus T protein, which functions in signaling for tuberization and senescence in potato (Supplementary File S1).
Additionally, genes in the LASSO10 total N uptake model but not the yield model were also examined. The N supplementation upregulated genes had functions in glycolysis and nitrate sensing, transport and regulation, senescence, protein degradation, redox regulation, and light signaling ( Figure 5). The downregulated genes functioned in senescence, chlorophyll degradation, and amino acid transport ( Figure 5).
N rate was not found to have a significant effect on specific gravity, a proxy for dry matter, in the current study (Supplementary Figure S3). The variation in specific gravity in the field trials was mostly due to site and cultivar variation. A LASSO10 reduced model for specific gravity was built and it had 25 genes of which 19 were upregulated and six were downregulated with N. The upregulated genes had functions in sucrose transport, peptide transport, starch breakdown, amino acid biosynthesis and repair, redox signaling and regulation, nitrate and sulfate assimilation, detoxification, light signaling, intracellular signaling, and lipid metabolism ( Figure 5). The downregulated genes had functions in sulfate transport, polyamine catabolism, nitrate sensing, gene expression, lipid biosynthesis, and amino acid biosynthesis ( Figure 5). The results suggest that biological processes responding to N in foliage were associated with dry matter accumulation in tubers, however, site and cultivar variation confounded the effect of the N rate. Genes unique to the specific gravity model were of interest as these genes provided insight into foliar functions associated with production of dry matter that were not associated with yield and total N uptake. Genes unique to the specific gravity model had functions in sucrose transport, starch breakdown, peptide transport, amino acid biosynthesis, sulfate assimilation, and redox signaling ( Figure 5).

Discussion
Nitrogen in soils is an environmental cue initiated in roots that triggers molecular responses in plants to enable them to capitalize on nutrient abundance or adjust to nutrient depletion. The current study examines N supplementation, which is a condition of nutrient abundance that many crops are under in agricultural production systems. In potato, N taken up through roots induces responses in leaves (source) that impact development of tuber sink tissues. N is a key nutrient for potato to reach optimal yield and quality; however, over-application results in nitrate leaching to groundwater and greenhouse gas emissions [5]. A goal of the study was to integrate N responses in source tissues (leaves) with sink partitioning to foster a better understanding of N use efficiency in potato. N-responsive gene expression in source tissues also has potential application to in-season monitoring of N status [17][18][19][20][21][22], which is key to optimizing N fertilization to achieve maximal yields and quality [44]. N-responsive genes associated with source-sink partitioning that have a functional connection to N status and are of particular interest as diagnostic indicators.
A major sink tissue for potato is the tuber, which was quantified as yield (mass/area) in field plots. Specific gravity, a proxy for tuber dry matter, is an indicator for tuber quality that was also quantified. Total plant N included both source and sink tissues and was a measure of the capacity of the crop to respond to N supply. The strategy was to first identify genes with expression in leaves responding to N supply, then find which of these genes are associated with tuber yield using multiple field trials and data mining with LASSO. The functions of these genes were then examined to find N-responsive molecular processes associated with source-sink partitioning. LASSO regression models for total plant N uptake and tuber-specific gravity were also built and compared to the yield model. The study demonstrated the effectiveness of LASSO for selecting genes and fitting regressions involving their expression.
Tuber yields increase with N supply to a plateau where additional N does not stimulate further increases in yield [45,46], as was observed in the current study. N is both a plant nutrient and signaling molecule for growth and developmental phase changes [47,48], and in potato increased N supply delays tuberization [49][50][51]. Studies also reported that increased N supply increases leaf size and duration rather than photosynthetic efficiency in potato [10][11][12][13]. The signal for tuberization and vine senescence involves inducing expression of the potato homolog of Flowering locus T, StSP6A, which is transcribed and translated in leaves and transported via phloem to stolons as a peptide [52]. In Arabidopsis thaliana Flowering locus T interacts with a b-ZIP transcription factor FD and functions in the regulation of gene expression [53]. In potato stolons, StSP6A was also found to bind and block sucrose transporter SWEET11 to reduce leakage of sucrose to the apoplastic phloem in coordination with increased symplastic loading during tuberization [54,55]. These findings demonstrate that StSP6A functions as a tuberigen and inducer of senescence in leaves, and a regulator of source-sink partitioning in stolons. The decreased expression of StSP6A in leaves observed with N supplementation in the current study concurred with other studies [17,20], and is thought to underlie the delay in tuberization and vine senescence under high N rates. The gene expression profile showed that several genes associated with senescence were decreased in expression under N supplementation in coordination with StSP6A. Increased leaf area and duration under increased N supply means greater production of photosynthates and increased resources for N remobilization to sink tissues during subsequent senescence [56,57]. Hence, delaying tuberization and vine senescence under abundant N supply in potato allows the plant to maximize resources for partitioning to tuber production later in the growing season.
Infection by the fungal pathogen Verticillium dahliae induced early senescence and lowered tuber yields [58], which has parallels with low N rates. A quantitative trait loci (QTL) for V. dahliae disease resistance was mapped to the St.CDF1 locus [59]. The St.CDF1 gene encodes a transcriptional regulator that functions in the pathway controlling expression of StSP6A [52,60]. These results support a role for St.CDF1-StSP6A regulatory pathway for tuberization in source-sink partitioning and in V. dahliae resistance in potato. Heat stress also induces a delay in tuberization [61]. Repression of StSP6A gene expression was also found with heat stress and involved post-transcriptional regulation mediated by miRNA [62]. In contrast to N supplementation and resistance to V. dahliae, the delay in tuberization with heat stress reduced partitioning from source to sink in potato [63][64][65]. Where N supply increased leaf area, heat stress was demonstrated to decrease leaf size [66] and reduce photosynthetic efficiency [67,68]. These studies suggest that increasing the amount of source tissue in coordination with delaying tuberization, was important in increasing sink potential.
Increasing leaf life span and area increases photosynthesis and other foliar metabolic processes including photorespiratory N assimilation [69]. Photorespiration requires Rubisco binding to O 2 as opposed to CO 2 in the light-independent reactions of photosynthesis. The photorespiratory pathway coordinates enzymatic steps in the mitochondria, cytosol, peroxisome, and chloroplast in leaves and results in the uptake of N in the form of nitrate and the biosynthesis of amino acids [9]. The results showed that a gene encoding a peroxisomal enzyme, alanine-glyoxylate aminotransferase, which functions in the photorespiratory pathway, increased in expression with N supplementation and was included in LASSO10 reduced models for yield, total N uptake, and specific gravity. This enzyme catalyzes the conversion of alanine and glyoxylate to glycine and pyruvate [70]. Other genes involved in amino acid biosynthesis were also included in the gene models. These results provide evidence that photorespiratory N assimilation and amino acid biosynthesis are associated with N-responsive source-sink partitioning. The results concur with other studies showing that N supplementation increased foliar amino acids in potato [71].
The analysis of the effects of N supplementation on foliar gene expression associated with yield was of greatest relevance to understanding source-sink partitioning. Total N uptake and specific gravity were also analyzed in comparison provided insight on biological processes that contribute to N assimilation and sink partitioning. Of interest were differences between the genes in the total N uptake and yield models. The total N uptake model included genes involved in amino acid and nitrate transport that were not present in the yield model. These results suggest that these transport activities were associated with N uptake-more so than sink partitioning. The yield model included genes involved in sulfur sensing and transport. These results suggest integration of N and S metabolism is associated with N-responsive source-sink partitioning. Other studies also reported findings that sulfur assimilation was responsive to N nutrition in plants [72]. Increases in amino acid biosynthesis under N supplementation require S as well as N since all proteins contain sulfur amino acids. Integration of S transport and assimilation with N supplementation was observed in the current study. These results also demonstrated the importance of maintaining S sufficiency under N supplementation for optimizing tuber yield in potato.
Specific gravity is a proxy for tuber dry matter [29], which is an important fry-processing quality trait for potato. Nitrogen was reported to have varying effects on specific gravity at lower N rates [73]. However, decreases in specific gravity occur when N is applied in excess [3,46,73]. In the current study, increasing N rates did not have a significant effect on specific gravity. The variation in specific gravity in the study was relatively limited in magnitude, and was due to other factors, including site and cultivar. The specific gravity model was found to have an adjusted R 2 of 0.35 which was the lowest of the three models. These results indicate that the N-responsive genes were not as strongly associated with specific gravity as yield and total N uptake. Among the genes included in the specific gravity model but not the others were a gene encoding a sucrose transporter and another encoding alpha-glucosidase, a gene involved in starch breakdown. These results demonstrate association of foliar molecular processes controlling carbohydrate metabolism and sucrose transport with dry matter accumulation in tubers.
The connection between N supplementation response and biological functions in source-sink partitioning makes genes in the LASSO10 reduced model for yield are good candidate indicators for in-season crop N status monitoring. Monitoring crop N status is important for split N fertilizer application, which enables optimization of N to plant demand [44]. For these reasons, our study targeted the timing of the second application of N for gene expression analysis to enable development of a N status indicator compatible with split fertilizer application. The findings of the study also indicate that site and cultivar variation affect yield and the other crop parameters. This means that LASSO gene expression prediction models for yield can vary with site and cultivar. In this case, gene expression indicators for monitoring crop N status may need to be calibrated specifically for each cultivar and site.

Conclusions
The LASSO10 gene expression analysis was effective in making novel association of biological processes in foliar source tissues with N-responsive sink development. The findings provided a connection between foliar gene expression and N-responsive source-sink partitioning, including roles for foliar signaling of tuberization, senescence, photorespiratory N assimilation, amino acid biosynthesis, and integration of S metabolism with N supply. The functional connection of the expression of a gene with source-sink partitioning supports selection of the gene as a candidate indicator of crop N status for potato.  Figure S1 Boxplots of (a) yield, (b) specific gravity, and (c) total N uptake for the different field trials; Figure S2 Boxplots of (a) yield, (b) specific gravity, and (c) total N uptake for the different varieties; Figure S3 Boxplots of (a) yield, (b) specific gravity, and (c) total N uptake for the different total applied N rates.