Confounding Factors in Container-Based Drought Tolerance Assessments in Solanum tuberosum

Abstract: Potato is an important food crop with high water-use efficiency but low drought tolerance. The bottleneck in drought tolerance breeding is phenotyping in managed field environments. Fundamental research on drought tolerance is predominantly done in container-based test systems in controlled environments. However, the portability of results from these systems to performance under field conditions is debated. Thus, we analyzed the effects of climate conditions, container size, starting material, and substrate on yield and drought tolerance assessment of potato genotypes compared to field trials. A leave-one-out assessment indicated that a minimum of three field trials is required for stable tolerance prediction. The tolerance ranking was highly reproducible under controlled conditions, but weakly correlated with field performance. Changing to variable climate conditions, increasing container size, and substituting cuttings by seed tubers did not improve the correlation. Substituting horticultural substrate by sandy soil resulted in yield and tuber size distributions similar to those under field conditions. However, as the effect of the treatment × genotype × substrate interaction on yield was low, drought tolerance indices that depend on relative yields can also be assessed on horticultural substrate. Realistic estimates of tuber yield and tuber size distribution, however, require the use of soil-based substrates.


Introduction
Global climate change models predict altered precipitation patterns and increased air temperatures, which result in more arid conditions during the cultivation period of many crops [1]. The alleviation of drought effects by irrigation is restricted by rising competition from industry and domestic use for dwindling fresh water resources [2]. Water-saving irrigation techniques like drip irrigation and deficit irrigation provide partial solutions [3]. Thus, there is a high demand for water-efficient crops [4,5]. Among the most important crops, potato (Solanum tuberosum) has the highest water use efficiency with regard to the food calories produced per water volume [3,6]. However, tetraploid potatoes (Solanum tuberosum ssp. tuberosum L.) are drought-sensitive because of shallow rooting and low recovery capacity after drought stress [7][8][9][10]. Studies on wild ancestors of S. tuberosum and on modern European potato cultivars found significant genetic variation for drought tolerance as a basis for the breeding of drought-tolerant potato genotypes [11][12][13][14][15]. Traditionally, potato breeding is based on rather slow phenotypic recurrent selection [6,16,17]. In wheat and maize, selection on yield in arid environments has been shown to significantly improve yield under arid conditions [18,19]. However, although these traditional techniques are effective, they are too slow to meet the pressing demand for improved water efficiency of crops within the next decade. Modern methods like marker-assisted selection based on genetic or metabolic markers, or genomic selection, can be employed early in the breeding process and reduce the breeding cycle from ten to four years [6,20,21,22]. The identification of markers and the training of marker models for the target population require the phenotyping of many genotypes for drought tolerance.
Ideally, the phenotyping is done in multi-environment trials and managed environments, where field trials in arid target environments are repeated over several years [2]. Thus, plants are exposed to the timing and degree of drought stress, the combination with heat or salinity stress, and the agricultural management that are typical for on-farm production in the target environment [2,23,24]. Large scientific consortia and commercial companies have shown the power of this approach, especially for maize [2,24]. However, smaller companies and scientific settings need proxy methods that allow mimicking agricultural drought stress scenarios under climate chamber or greenhouse conditions. Fundamental research on drought stress, especially on the molecular drought stress response of Arabidopsis thaliana, mainly relies on axenic agar culture. In these systems, drought stress is applied by removing plants from the medium and exposing them to intense short-term drought stress [25]. Alternatively, plants are cultivated on agar plates, in which the medium's water potential is decreased by osmolytes like mannitol or PEG [26][27][28]. The latter system differs from the situation in the field by the high humidity (low water vapor pressure deficit, VPD) of the air space in agar plates and by the high osmotic component of the substrate water potential. In contrast, low water potentials in natural soils have a small osmotic component unless the soils are also saline. Furthermore, dry soils have a high resistance to water movement [29]. As drought stress researchers are generally aware of the shortcomings of agar-based systems, findings from these systems are checked in soil-grown plants. In these approaches, plants are grown on peat-based horticultural substrate under climate chamber conditions, in climate-controlled glasshouses, or in screenhouses with variable climate conditions [14,30,31,32].
In the fastest screens, drought tolerance is quantified by assessing the physiological response or the survival after stopping the irrigation and reapplying water after a set period [33][34][35]. This approach can be misleading if the size of the compared genotypes differs significantly, as the lower transpiration of smaller plants will result in a higher soil water content at the end of the period without irrigation. Furthermore, survival assays are very far from the agricultural situation, in which yield, not survival, is the target trait [36]. Thus, most researchers perform drought tolerance assessment by measuring the plant's response to a reduced soil water potential that is kept constant by adding the water lost by evapotranspiration. Controlled irrigation is done laboriously by manual weighing and watering or automatically in phenotyping platforms [32,37,38]. This approach compensates for the effect of plant size on water use. The results on drought tolerance QTL in maize and barley found in these systems are similar to those found under field conditions [38,39]. Mimicking this approach by manual weighing and watering is possible but laborious, which imposes a considerable restriction on the number of testable genotypes [40,41]. Furthermore, the constant water content approach differs from the field setting. In the field, all plants receive the same amount of water by precipitation or irrigation. The plants can only reach more water if they extend their roots into deeper soil layers and thus make use of excess precipitation in humid times. The impact of this drought avoidance adaptation on yield under drought stress is very difficult to assess in experiments where plants grow in rather small pots [42,43]. However, not every researcher interested in drought tolerance in potato has access either to large managed field trials or to sophisticated phenotyping platforms.
Thus, we aimed to establish which factors are critical for the assessment of drought tolerance in container trials. To this end, we performed a meta-analysis on the yield and drought tolerance data of potato genotypes in container-based experiments and field trials. Drought stress was applied by deficit irrigation, in which stressed plants received a fixed percentage of the irrigation volume required for optimal growth. Drought tolerance was assessed by the DRYMp (deviation of relative starch yield from the median of parent check genotypes) [14,21]. Drought trials were conducted on three populations with a total of 90 genotypes in controlled environments and under variable climate conditions in a screenhouse. The yield and tolerance results were compared to those from field trials conducted on three different sites between 2011 and 2018. The analysis revealed that a minimum of three field trials is required to obtain a stable drought tolerance ranking of the genotypes. Among the factors analyzed, pot size and planting material (in vitro cuttings or seed tubers) had only a weak effect on the tolerance assessment. In contrast, the substrate type considerably affected how well observations from field trials were reproduced in container experiments.

Materials and Methods
In this study, we performed a meta-analysis on data from 32 drought stress trials (Table 1) performed between 2011 and 2020 on three different populations of potato (Solanum tuberosum ssp. tuberosum L.). Population 1 contained 34 German potato cultivars, predominantly optimized for starch production, from nine different breeders. The data on this population have been analyzed and published with respect to the variability of drought tolerance and the relationship between yield potential and drought tolerance [14]. The data were reanalyzed in the context of this manuscript to illustrate the problem linked to container trials. Population 2 contained four cultivars (A, D, E, R) from population 1 and 60 offspring of the crosses A × E (31 genotypes) and E × R (29 genotypes) [21]. Ten of these offspring (five of each cross), the four cultivars A, D, E, and R, and seven additional cultivars from population 1 were studied in population 3. Further details on the genotypes are given in Supplementary File S1, Figure S1.

Experiments on Population 1
The drought stress trials on population 1 were performed in parallel either in pots or in managed field environments (for details, see Table 1 and [14]). In the pot trials, plants were grown from in vitro cuttings and cultivated in 4-L pots on horticultural substrate in a screenhouse (2 experiments) or in a climate-controlled glasshouse (CE, 4 experiments). Temperature in the CE was set to 21 °C (16 h day) and 18 °C (8 h night) and artificial light was provided by 62,400-W-AgrosonT lamps/100 m². Natural light penetration through the roof was approximately 50% of the outside light intensity. The drought treatment was imposed by reducing the volume of the irrigation water to a fixed percentage (30% or 70%) of the volume required for optimal water supply. The optimal water supply was determined by weighing representative pots to estimate when the soil water content had decreased below 40% of full capacity. Drought stress (treatment = s) was applied by reducing the water supply to 30% (6 experiments) or 70% (Experiment PCH11) of the irrigation volume received by the control plants that were cultivated under optimal water supply (treatment = c). The reduction was done by increasing the intervals between irrigations rather than by decreasing the amount of water given during the irrigation event. Water stress was applied from two weeks after planting until tuber harvest. The spatial design was a randomized split-plot design.
Table 1. Metadata on potato drought stress trials. Trial code: first letter: F = field trial, P = pot trial, B = bigbag trial; second letter: T = starting material tuber, C = starting material cutting; third letter: S = sandy soil-based substrate, H = horticultural substrate; final letters: counter. P = population, Culture = culture id in the plant database [44], n = number of replicates, Pl = number of plants per replicate, G = number of genotypes, start date = planting date, end date = date of haulm destruction, SI = stress index [14].
In the field trials, plants were grown from tubers on sand in Potsdam-Golm (Golm, 52°23′55″ N, 13°3′56″ E) or sandy loam in Dethlingen (51°57′17″ N, 10°07′33″ E) and Groß-Lüsewitz (54°04′12″ N, 12°20′19″ E) between 2011 and 2013 (8 experiments, see Table 1). In Groß-Lüsewitz, drought-stressed plants were grown under rainout shelters and received no irrigation after emergence. On the two other sites, drought treatment relied on the low natural precipitation and low soil water capacity on the sites. Drought treatment by differential irrigation was started after plant emergence. On the site Potsdam-Golm, control plants received 10 L/m² when wilting occurred at midday. Drought-stressed plants were irrigated with 10 L/m² when wilting was observed at 7:00. On the site Dethlingen, control plants were irrigated with an overhead irrigation system once the soil moisture prediction model of the Deutscher Wetterdienst predicted that the soil water content had decreased below 50% of the field capacity. Stressed plants received precipitation but no irrigation. The stress index SI (1 minus relative starch yield [14]) is given in Table 1. The drought score before and after flowering can be found in supplemental material Table S3 at [14]. The spatial design was a randomized block design; details on the number of replicates and plants per replicate are given in Table 1.

In all experiments, drought tolerance was assessed based on the tuber starch yield. Tubers with a diameter >10 mm were harvested and weighed, and the starch content was determined gravimetrically or by dry weight determination (CE experiments). The starch yield was calculated as the product of starch content and tuber fresh weight. For further experimental details, see [14]. Yield and climate data are available at E!DAL [45,46].

Experiments on Population 2
All drought stress trials on population 2 were performed under naturally variable climate conditions. Plants grown from cuttings were cultivated in pots on horticultural substrate under a shelter (PCH19, PCH24) or in 30-L bigbags with horticultural substrate in a polytunnel screenhouse (BCH15, BCH20). Drought stress was applied as described before, with drought-stressed plants receiving 50% of the water volume supplied to control plants.
In parallel, genotypes of population 2 were grown from tubers on the three field sites as described above in 2015 and 2016. The onset of drought stress can be retrieved from the irrigation data shown in Figure 3 of [21] and in file 'climate_VALDIS_vpd_water_supply.csv' available at E!DAL [47]. Drought stress and yield phenotyping were done as described before. Plants in the stress treatment were grown under rainout shelters at the sites Groß-Lüsewitz and Golm. The experimental design was a randomized block design; details on replicates, start date, and end date are available in Table 1. Planting density was 44,000 plants/ha. Further experimental details are given in [21]. Original climate and phenotyping data can be retrieved from E!DAL [47].

Experiments on Population 3
Field experiments on population 3 (21 genotypes) were performed in 2017 and 2018 in the field site Potsdam-Golm as described in [21]. Four bigbag experiments were performed between 2017 and 2020 in the polytunnel screenhouse at the MPI-MP Golm. As in the field trials, plants were grown from tubers that were chitted for at least eight weeks at 6-8 °C in dim light before planting (for planting dates, see Table 1). In 2017 to 2019, all 20 genotypes were grown on 30 L peat-based horticultural substrate, a mixture of 1 part sand and 2 parts substrate type T (Kausek, Mittenwalde, Germany) mixed with 30 g Novatec classic per 30-L bigbag (Experiments BTH25, BTH27 and BTH29). Additionally, the cultivars A, D, E, and R were grown on sand (A horizon of a Podsol from Potsdam-Golm, Germany) fertilized with the same amount and type of fertilizer (experiment BTS30). In 2020, all 20 cultivars were grown on the soil-based system (BTS32) and the cultivars A, D, E and R were additionally grown in a horticultural substrate-based system (BTH31). The experimental design was a randomized-block design with treatment as block and genotype as subplots (for replicates and plants per replicate, see Table 1).
Stress treatment was started at BBCH15 and applied as described before: during the stress treatment, plants received the same amount of water per irrigation as the control plants, but the interval between irrigations was increased to achieve a final irrigation volume in the stress treatment equivalent to 50% of the optimal water supply in the controls. In addition to the control treatment (cc) and the stress treatment (ss), an early (sc) and a late (cs) stress treatment were applied. The treatment switch was performed at flowering stage and all plants were watered to 60% of the soil water capacity at the switch date. Plants on early stress received reduced irrigation before the switch date, plants on late stress after the switch date. Details of the times and volumes of irrigation and precipitation can be found in Supplementary File S1, Figure S2, and in file 'Water_2017_2020.csv' at E!DAL [48]. Humidity and temperature of the soil were measured every hour at a depth of 15-20 cm below the surface in randomly chosen replicates of the cultivars A, D, E, and R in all treatments by Plantcare sensors (Plantcare, Ruchikon, Switzerland). In the field trials, treatments ss and cs (2017) or sc (2018) grew under a rainout shelter; the other treatments were grown without a shelter. The air temperature was measured every 15 min (polytunnel) or every 1 min (field) by an HC2-S3 sensor, shielded with an SS3 radiation shield (UP Umweltanalytische Produkte, Cottbus), approximately 1.5 m above ground at a weather station adjacent to the experimental sites in the polytunnel and the field. The experiments were terminated by the removal of the shoots when a thermal sum of 1400 day × °C (d °C) was reached.
The thermal sum was calculated as the sum of the average between minimum and maximum air temperature starting from the day of tuber planting with a base temperature of 6 °C and a maximum temperature of 30 °C [21]. Within two weeks after termination of the experiments, tubers with a diameter >10 mm were harvested manually. Tuber fresh weight, starch content and tuber number in the size classes below 35 mm, 35 mm to 60 mm, and above 60 mm were determined as described before [14]. All data are available at E!DAL [48].
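The thermal sum rule can be illustrated with a short Python sketch. It is not the authors' implementation. The text gives the base temperature (6 °C), the maximum temperature (30 °C), and the 1400 d °C target, but not how the limits are applied. The sketch assumes that the daily minimum and maximum are each clipped to the range 6-30 °C before averaging and that the base temperature is then subtracted. The function names are hypothetical.

```python
def thermal_sum(daily_min_max, base=6.0, cap=30.0):
    """Accumulate degree-days (d °C) over a sequence of (t_min, t_max) pairs.

    Assumption (not stated in the text): t_min and t_max are clipped to
    [base, cap] before averaging, and base is subtracted from the daily mean.
    """
    total = 0.0
    for t_min, t_max in daily_min_max:
        t_min = min(max(t_min, base), cap)
        t_max = min(max(t_max, base), cap)
        total += (t_min + t_max) / 2.0 - base
    return total


def days_to_target(daily_min_max, target=1400.0, base=6.0, cap=30.0):
    """Return the 1-based day, counted from planting, on which the thermal
    sum first reaches the target, or None if it is never reached."""
    total = 0.0
    for day, pair in enumerate(daily_min_max, start=1):
        total += thermal_sum([pair], base, cap)
        if total >= target:
            return day
    return None
```

Under these assumptions, a day with a minimum of 10 °C and a maximum of 20 °C contributes 9 d °C.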

Statistical Evaluation
All statistical evaluations were performed in SAS (version 9.4, SAS Institute, Cary); scripts are available on request from the corresponding author.
Starch yield (SY) was calculated per replicate as the product of tuber fresh weight and starch content. Relative starch yield (RelSY) was calculated per replicate of each genotype G_a by dividing the starch yield of the replicates with stress treatments (T_s = s, ss, sc or cs) by the mean starch yield of all replicates of the same genotype G_a from treatment c or cc of the same experiment E_b.
The deviation of the relative starch yield from the experimental median (DRYM) [21] was calculated as the difference between RelSY of a genotype G_a from treatment T_s and the median of the relative starch yield of all cultivars from the same experiment E_b and treatment T_s.
The deviation of the relative starch yield from the parental median (DRYMp) [21] was calculated as the difference between RelSY of a genotype from treatment T_s and the median of the relative starch yield of the parent cultivars (G_p = A, R, and E) from the same experiment E_b and treatment T_s.
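The three definitions above can be sketched in Python for a single experiment and a single stress treatment. This is an illustration only; the study itself used SAS. The text does not say whether the reference median is taken over replicate values or over genotype means, so the sketch assumes genotype means. The function names and the data layout (a dictionary mapping genotype to a list of replicate values) are hypothetical.

```python
from statistics import mean, median


def relative_starch_yield(stress_sy, control_sy):
    """RelSY per stressed replicate: starch yield divided by the mean
    starch yield of the same genotype's control replicates."""
    control_mean = mean(control_sy)
    return [sy / control_mean for sy in stress_sy]


def drym(rel_sy_by_genotype, reference=None):
    """Deviation of each genotype's mean RelSY from a reference median.

    reference=None uses all genotypes (DRYM); passing the parent check
    genotypes, e.g. ("A", "E", "R"), gives DRYMp.
    Assumption: the median is taken over genotype means, not replicates.
    """
    means = {g: mean(values) for g, values in rel_sy_by_genotype.items()}
    ref_genotypes = means.keys() if reference is None else reference
    ref_median = median(means[g] for g in ref_genotypes)
    return {g: m - ref_median for g, m in means.items()}
```

A positive value marks a genotype that kept more of its control yield under stress than the reference median, a negative value one that kept less.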
Spearman correlation analysis was performed with proc CORR on the mean DRYMp values for each genotype for the experiments performed on population 1. We performed a leave-one-out analysis (LOOA) to estimate the power of a subset of field trials to assess drought tolerance. For LOOA, m (m = 2-7) experiments were selected randomly 21 times from the subset of eight field experiments on population 1 using proc SURVEYSELECT. The mean DRYMp was calculated for the selected experiments (DRYMp(m)) and for all eight experiments (DRYMp(8)). The Spearman correlation coefficient was calculated for the correlation between DRYMp(m) and DRYMp(8).
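The subsampling procedure can be sketched in Python with the standard library. It stands in for the SAS procedures named above and does not reproduce them. The sketch assumes that each experiment supplies one DRYMp value per genotype, in the same genotype order. The function names, the data layout, and the fixed random seed are hypothetical.

```python
import random
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def subset_correlations(drymp, m, draws=21, seed=0):
    """Correlate the mean DRYMp over m randomly drawn experiments with the
    mean over all experiments; returns one Spearman coefficient per draw.

    drymp maps experiment id -> list of DRYMp values, one per genotype.
    """
    rng = random.Random(seed)
    experiments = sorted(drymp)
    n_genotypes = len(drymp[experiments[0]])

    def mean_over(selected):
        return [mean(drymp[e][g] for e in selected) for g in range(n_genotypes)]

    full = mean_over(experiments)
    return [spearman(mean_over(rng.sample(experiments, m)), full)
            for _ in range(draws)]
```

Calling `subset_correlations` once for each m and plotting the resulting coefficients would give one distribution per subset size.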
All analyses of variance and covariance on the effects of cultivar and treatment on tuber number, tuber fresh weight and starch yield were performed with proc GLM (method SS3), which adjusts the analysis for unbalanced designs. The Ryan-Einot-Gabriel-Welsch (REGW) test, which adjusts the test statistic for multiple testing, was used for means comparisons.
For population 1 and population 2, we tested the effects of type (pot, field for population 1; pot, bigbag and field for population 2), treatment (control, stress) and genotype (34 cultivars for population 1, 63 genotypes for population 2) on starch yield per plant. The analysis was performed separately on each population. Correlation analysis on the relationship between DRYMp values calculated from starch yields in different test systems was performed on mean DRYMp values for each cultivar, calculated separately for the test systems pot, bigbag, and field (proc CORR, method Pearson correlation).
For population 3, data analysis was performed on the data for the four cultivars A, D, E, and R that had also been part of population 1 and 2. For a direct comparison of field and bigbag trials grown from seed tubers, the effect of genotype, trial type, treatment, year, and their interaction on tuber fresh weight FW and starch yield SY was analyzed for the experiments conducted in 2017 and 2018. To assess the substrate effect, an ANOVA on the effects of substrate, treatment, genotype and their interaction on FW and SY data was performed separately for the years 2019 and 2020, as a joint analysis revealed a highly significant interaction between substrate, treatment, and year.
The effect of genotype (A, D, E, R) and planting material (cuttings, tubers) on tuber numbers in the three size classes S, M, and L was analyzed in the datasets from bigbag trials with horticultural substrate performed in 2017 to 2020. Data from control and stress (ss) treatments were analyzed separately.

Comparison of Potato Cultivars in Field and Pot Experiments
In the first years of the study (2011 to 2013), 34 potato cultivars (population 1) were characterized for starch yield and drought tolerance in eight field trials (F) and six pot (P) experiments. Plants were grown under optimal (control) and reduced water supply (drought) until tuber maturity. Field trials and pot experiments differed in several parameters, namely starting material (tubers (T) in field trials, cuttings (C) in pot experiments), substrate volume (field versus 4-L pots) and substrate type (soil (S) in field trials, peat-based horticultural (H) substrate in pot trials). Furthermore, all field trials and two of the pot experiments were conducted under naturally variable climate conditions, whereas four pot trials (marked with an asterisk in Figures 1 and 2) were grown in a climate-controlled glasshouse with minimal (<4 K/24 h) temperature fluctuations. As a readout parameter, starch yield (SY) was determined as the product of harvested tuber mass and tuber starch content (Figure 1). The analysis of variance (ANOVA) indicated that SY was significantly higher in the field trials than in the pot trials (Table 2). Furthermore, the ANOVA revealed a significant interaction between trial type (F versus P) and treatment (control versus drought) and a significant interaction between trial type, treatment, and cultivar. This suggests that the drought stress response of cultivars may depend on the trial system.
To establish how well drought tolerance under field conditions was predictable from pot-based drought tolerance trials, we calculated the correlation between the mean drought tolerance indices estimated for each cultivar for each of the twelve different experiments. The drought tolerance index was the deviation of the relative starch yield from the experimental median of all cultivars (DRYM) or of the check cultivars (DRYMp) [14,21]. The normalization to a set of cultivars that were used in all experiments allows comparisons over many years and all populations. The heat map (Figure 2a) shows the Spearman correlation coefficient (by color code) and the type I error probability (Prob) values. The predominantly blue color indicates that most of the correlation coefficients were around zero or even negative, indicating a lack of reproducibility between the experiments. The exceptions are pot trials conducted at the same site, most of which show a highly significant (Prob < 0.001) correlation between the DRYMp values of the cultivars. This close correlation was observed in both test systems, the climate-controlled greenhouse and the experiments conducted under variable climate conditions. However, neither test system showed a close correlation with any of the field trials, nor did the field trials correlate well with each other.
To estimate the power of a single field trial to assess drought tolerance, we compared the mean DRYMp(m) calculated from a randomly selected number of m experiments to the mean DRYMp(8) calculated from all eight experiments by calculating the Spearman correlation coefficient. The distribution of the Spearman correlation coefficient is shown in Figure 2b. The median of the distribution increases from 0.50 for m = 1 to 0.80 for m = 4. When fewer than three experiments are performed, non-significant correlations between DRYMp(m) and DRYMp(8) are likely, as indicated by Spearman correlation coefficients below the significance threshold in Figure 2b. Thus, we concluded that single field experiments do not allow a stable assessment of a cultivar's drought tolerance. Therefore, we switched to assessing the drought tolerance of cultivars as the mean tolerance estimated in a specified system. Indeed, the Pearson correlation between the DRYM calculated from all six pot and all field trials was positive (0.42) and significant (α = 0.05) [14]. The tolerance ranking of the cultivars in pot trials still differed from the ranking in the field trials, as indicated by the significant interaction between genotype, treatment, and trial type (Prob = 0.054, Table 2). The main hypothesis was that the different response of the genotypes may have been caused by the much larger soil volume available to field-grown plants compared to pot-grown plants. The larger rooting volume supports a larger plant and thus a higher tuber yield, which may have resulted in the differences in tuber starch yield discernible in Figure 1. It also means that the drought tolerance of genotypes in which drought tolerance resulted from the ability to forage additional water with longer roots could have been underestimated in small pots. In the next step, we thus tested the effect of larger substrate volumes on drought tolerance assessments.

Effect of Pot Size on Drought Tolerance Assessment
In the second series of experiments, starch yield and drought tolerance were determined for population 2, which contained 3 or 4 potato cultivars and 60 genotypes obtained from crosses between three cultivars (for details, see [21]). The genotypes of population 2 were grown under optimal and reduced water supply in six field trials, two pot experiments and two bigbag experiments (Figure 3a). The bigbag contained 30 L of a horticultural substrate and thus provided a rooting volume comparable to the upper 30 cm of soil in a field trial system, which is the main rooting zone of potato. The plant density in the bigbag setup was similar to that on the field sites. In all trial systems, plants grew under naturally variable light and temperature conditions. The distribution of the starch yield data for the 10 experiments is shown in Figure 3b. Tuber starch yield of bigbag-grown plants and pot-grown plants was significantly lower than starch yield obtained from field-grown plants (REGWQ test, alpha = 0.05). However, the bigbag-grown plants achieved on average 85% of the field SY, whereas the SY of pot-grown plants was on average 24% of the field SY. The ANOVA (see Table 3) again revealed a significant effect of the test system on SY as well as a significant interaction between test system and genotype.
The power to predict drought tolerance in the field from container experiments was tested by correlation analysis on the mean drought tolerance indices DRYMp. The mean was calculated for each cultivar from several experiments, as the predictive power of single field experiments was insufficient (see above). The Pearson correlation coefficient between DRYMp from pot experiments and DRYMp from bigbag experiments was positive and highly significant (p = 0.0003), suggesting that the substrate volume is less relevant for the assessment of drought tolerance than previously assumed. However, the Pearson correlation between the mean DRYMp in the field and the DRYMp obtained in pot or bigbag experiments was not significant (Prob = 0.31 (pot) or Prob = 0.69 (bigbag)). Figure 4 indicates that many genotypes that responded tolerantly under field conditions and thus had a DRYMp > 0 showed a negative DRYMp (sensitive) in pot and bigbag experiments. Thus, increasing the substrate volume and moving from controlled environments to variable climate conditions did not solve the problem of insufficient tolerance prediction from container experiments. The other two variables differing between field and container experiments are the starting material (tubers versus cuttings) and the substrate, which is sandy soil in the field and peat-based horticultural substrate in the container.
Agronomy 2021, 11, x FOR PEER REVIEW 11 of 24 positive and highly significant (p = 0.0003), suggesting that the substrate volume is less relevant for the assessment of the drought tolerance as previously assumed. However, the Pearson correlation between the mean DRYMp in the field and the DRYMp obtain in pot or bigbag experiments was not significant (Prob = 0.31 (pot) or Prob = 0.69 (bigbag)). Figure 4 indicates that many genotypes that responded tolerantly under field conditions and thus had a DRYMp of > 0 showed a negative DRYMp (sensitive) in pot and bigbag experiments. Thus, increasing the substrate volume and moving from controlled environments to variable climate conditions has not solved the problem of insufficient tolerance prediction from container experiments. The other two variables differing between field and container experiments are the starting material-tubers versus cuttings-and the substrate, which is sandy soil in the field and peat-based horticultural substrate in the container. . Green symbol tolerant cultivar A, red symbol sensitive cultivars E and R, blue symbols genotypes obtained from crosses (see [21]).
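The correlation analysis described above can be sketched in a few lines. This is a minimal illustration, not the analysis pipeline used in the study: the function names and the example index values are hypothetical, and the tolerance indices are assumed to be already computed per genotype and experiment.

```python
import math


def mean_index_per_genotype(experiments):
    """Average a tolerance index per genotype over several experiments.

    `experiments` is a list of dicts mapping genotype name -> index value.
    A genotype missing from an experiment is averaged over fewer trials.
    """
    sums, counts = {}, {}
    for experiment in experiments:
        for genotype, value in experiment.items():
            sums[genotype] = sums.get(genotype, 0.0) + value
            counts[genotype] = counts.get(genotype, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def correlate_systems(system_a, system_b):
    """Pearson r between two test systems over the genotypes they share."""
    shared = sorted(set(system_a) & set(system_b))
    return pearson_r([system_a[g] for g in shared],
                     [system_b[g] for g in shared])


# Hypothetical index values for illustration only (not data from the study).
field_trials = [{"A": 0.25, "E": -0.25, "R": 0.0},
                {"A": 0.75, "E": -0.75, "R": 0.0}]
pot_trials = [{"A": -0.5, "E": 0.5, "R": 0.0}]

field_mean = mean_index_per_genotype(field_trials)
pot_mean = mean_index_per_genotype(pot_trials)
r_field_vs_pot = correlate_systems(field_mean, pot_mean)
```

Averaging over experiments before correlating mirrors the reasoning in the text that single field trials are too noisy to serve as a reference. A significance test for the coefficient would additionally require the t distribution, for example from a statistics library.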

Effect of Starting Material and Substrate
The effect of starting material and substrate was tested from 2017 to 2020 in a third population, which contained cultivars from population 1 and crossing lines from population 2 (for details, see Supplemental File, Supplemental Figure S1). Drought stress was applied in three different forms: as early stress before flowering, as late stress after flowering, and as long-term stress. Early stress resulted in a starch yield almost twice as high as that under late and long-term stress (see Supplemental File S1, Figure S5). However, correlation analysis revealed significant correlations (Pearson coefficient > 0.88, Prob < 0.0001) between the starch yields measured under all three stress conditions. Thus, the ranking of the cultivars with respect to starch yield was unaffected by the timing and duration of the stress.
The drought-resistant cultivars A and D and the sensitive cultivars R and E were part of all three populations. In the experiments on population 3, all plants were grown from tubers. The bigbag experiments were performed in a polytunnel adjacent to the field site on which the field experiments were grown. Horticultural substrate was used in four bigbag experiments, and sandy soil was used additionally in two bigbag experiments.
The average tuber yield and starch yield of all four cultivars were significantly lower (REGW test, α = 0.05) in plants grown in bigbags with horticultural substrate (BTH25 and BTH27) than in plants grown in the adjacent field (FTS26 and FTS28) (Figure 5; statistics in Table 4). In addition to the significant effect of the test system on SY and FW, there was a significant interaction between cultivar and test system. While cultivar E showed the lowest SY and FW of all cultivars under optimal and stress conditions in the bigbag system, it yielded highest when grown in the field. Interestingly, there was no significant interaction between test system, genotype, and treatment, suggesting that the lower tuber and starch yield in the bigbag system did not affect the cultivar-specific response to drought.
Subsequently, we compared the effect of the substrate on tuber fresh weight (FW) and SY of plants grown from tubers on sand or horticultural substrate in the bigbag system in 2019 (BTH29 and BTS30) and 2020 (BTH31 and BTS32) (Figure 6). In both years, cultivar and substrate had a significant effect on FW and SY (see Table 5). A significant interaction between cultivar and substrate on FW and SY indicated that the cultivars responded differently to the different substrates. In 2019 and 2020, plants grown on sand under optimal conditions reproduced the ranking of SY found under field conditions, with the highest yields in cultivar E and the lowest yields in cultivar D. This ranking was also found on horticultural substrate in 2020, but not in 2019. Within a given treatment, e.g., drought treatment ss, both substrate types received equal amounts of irrigation water. Nevertheless, there was a significant substrate × treatment interaction on SY in both years and on FW in 2020 (Table 5). In 2019, drought-stressed plants on horticultural substrate yielded significantly less than those on sandy soil, although the shoots of the plants were larger on horticultural substrate than on sand.
Figure 6. Starch yield (a,b) and tuber yield (c,d) of four cultivars grown on horticultural substrate (H, black) or sandy soil (S, blue) under optimal (cc H, cc S) and reduced (ss H, ss S) water supply. For the results of the ANOVA, see Table 5.
The substrate also affected tuber form and size distribution (Figure 7). On sandy soil, cultivars E (Figure 7b) and R (Figure 7c) produced the large, oblong tubers that are typical for these cultivars when grown under field conditions. In contrast, plants grown on horticultural substrate produced roundish tubers. As an overview, Figure 7a displays the mean tuber numbers in the size classes large (L, >60 mm), medium (M, 35-60 mm), and small (S, <35 mm) for the four cultivars grown in bigbags from cuttings or tubers on horticultural substrate, or from tubers on sandy soil in bigbags or in the field. Plants grown under optimal conditions produced more tubers per plant and a higher percentage of medium tubers than plants grown under drought stress conditions.
Table 5. Result of an ANOVA on the effect of substrate, treatment, year, and genotype on tuber starch yield (SY) and tuber fresh weight (FW) of potato cultivars cultivated in bigbags. Substrate: peat-based horticultural substrate, sandy soil; treatment: optimal irrigation, long-term drought; genotype: potato genotypes A, D, E, and R. DF = degrees of freedom, F = F-value, Prob = probability. Bold print indicates Prob < 0.05.
In the field (FTS26 and FTS28), all four cultivars produced 2 to 4 large tubers per plant under optimal water supply. In the bigbag system, plants predominantly produced large tubers when grown from tubers on sandy soil (BTS30 and BTS32). Cultivar D was the exception and developed large tubers under all conditions. The number of small tubers seemed to be higher on horticultural substrate than on sand, especially in cultivars A and D. However, this could have been a sampling artefact, as small tubers are more easily detected in horticultural substrate than in sand. The starting material (cuttings or tubers) had only a marginal effect on the tuber numbers of plants that grew on horticultural substrate (for statistics, see Supplemental File, Table S1).
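The size classes used for the tuber counts (L, >60 mm; M, 35-60 mm; S, <35 mm) can be expressed as a simple binning rule. The sketch below is illustrative only: the function names are hypothetical, the measured dimension is assumed to be a tuber diameter in millimetres, and the example measurements are invented.

```python
from collections import Counter


def size_class(diameter_mm):
    """Assign a tuber to class L (>60 mm), M (35-60 mm) or S (<35 mm)."""
    if diameter_mm > 60:
        return "L"
    if diameter_mm >= 35:
        return "M"
    return "S"


def size_distribution(diameters_mm):
    """Count the tubers of one plant per size class, always reporting L, M, S."""
    counts = Counter(size_class(d) for d in diameters_mm)
    return {cls: counts.get(cls, 0) for cls in ("L", "M", "S")}


# Invented measurements for a single plant, for illustration only.
example_plant = [72.0, 64.5, 48.0, 41.0, 35.0, 28.5]
example_counts = size_distribution(example_plant)
```

Reporting all three classes even when a count is zero keeps the per-plant records comparable across cultivars and substrates.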

In summary, substrate had a higher impact on the assessment of potato performance in container trials than planting material, container size, and climate conditions.

Controlled Water Supply versus Controlled Soil Water Content
In the experiments in this study, drought stress was applied by reducing the water supply to the plant to a fixed percentage of the optimal water supply. This approach differs substantially from the widely used approach of constantly reduced soil humidity [37,38,41]. In the constant humidity approach, the water content is kept at a level below the optimal soil water content by daily addition of the amount of water lost by evapotranspiration. This amount is quantified by pot weighing, corrected by estimates of the plant weight [37]. In this approach, plants with a high transpiration rate receive more water than plants with a low transpiration rate. Thus, differences between water-spending drought-sensitive and water-saving drought-tolerant genotypes [2,49] are hard to detect unless the effect of genotype on the irrigation volumes is carefully evaluated. The approach of constant water supply mimics the situation in the field, in which an area-dependent volume of water is available to the plant. The constant volume approach reaches its limits when plants of very different size are investigated. In this case, small plants will be subjected to soil water contents that are higher than optimal, while big plants experience suboptimal soil humidity. This problem is aggravated if pot sizes are small compared to the size of the plant. In the field, larger plants will root in a larger soil volume and hence forage water from a larger soil area, limited by competition from neighboring plants. Hence, planting density has a large impact on the response of maize and other cereals to water deficit [2]. Optimum planting density in potato depends on the cultivar, seed tuber weight, and the production goal with respect to total yield and required tuber size distribution [50,51]. Planting density affects the vertical root distribution of potato: high planting density results in a lower percentage of deep roots per leaf area [52]. This change has been associated with the negative effect of high planting densities on yield stability under water-limiting conditions [50,52]. Differences in root growth have been linked to differences in drought tolerance between two cultivars that became apparent when plants were cultivated in deep soil layers (>40 cm) [53]. The substrate volume in our bigbag system mimicked the planting density of 4.4 plants m⁻² in our field trials but could not provide the deeper soil layers, thus potentially underestimating the drought tolerance of deep-rooting genotypes like the late-maturing cultivar E. However, under irrigated conditions and on sandy soil, the highest root length densities were found in the top 30 to 40 cm [54,55], making the restriction of the rooting depth in the bigbag system less influential when sandy soil was used (see discussion below).
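The difference between the two irrigation regimes discussed in this section can be made concrete with a small calculation. The sketch below is a deliberately simplified illustration under stated assumptions: the function names, the daily evapotranspiration values, and the optimal supply volume are hypothetical, and plant growth and soil water dynamics are ignored.

```python
def water_constant_soil_content(daily_evapotranspiration_l):
    """Total irrigation when each day's measured water loss is replaced.

    This mimics the constant soil humidity approach: the pot is weighed and
    the water lost by evapotranspiration is added back every day.
    """
    return sum(daily_evapotranspiration_l)


def water_fixed_supply(optimal_daily_supply_l, stress_fraction, days):
    """Total irrigation when every plant gets a fixed share of the optimum.

    This mimics the constant supply approach: the volume does not depend on
    how much water the individual plant has used.
    """
    return optimal_daily_supply_l * stress_fraction * days


DAYS = 10
# Hypothetical daily water use (litres) of two contrasting genotypes.
water_spender = [0.5] * DAYS
water_saver = [0.25] * DAYS

# Constant soil water content: the spender is given twice as much water.
spender_constant = water_constant_soil_content(water_spender)
saver_constant = water_constant_soil_content(water_saver)

# Fixed supply at 50% of an assumed optimum of 0.5 L per day: equal volumes.
spender_fixed = water_fixed_supply(0.5, 0.5, DAYS)
saver_fixed = water_fixed_supply(0.5, 0.5, DAYS)
```

Under the constant soil water content regime the irrigation volume becomes a genotype-dependent variable, which is why the text recommends evaluating the effect of genotype on irrigation volumes when that regime is used.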

Controlled Condition versus Managed Field Environments
Major differences between drought stress experiments in academic settings and drought stress in plant production are the timing of drought stress within the life cycle of the plant, the degree of stress, and the micrometeorological conditions before and during the stress [42]. The timing of the stress with regard to the crop's life cycle is critical [42,56]. Potato is most sensitive to insufficient water supply in the period 15 to 20 days before flowering, during and after flowering, and during tuber filling [10,57,58]. In our study on population 3, we subjected potato genotypes to reduced water supply before (early stress) or after (late stress) flowering. The tuber yield was significantly lower after late stress than after early stress (see Supplemental File S1, Supplemental Figure S5). Likewise, cereals like rice and maize are most sensitive to drought during flowering and grain filling [2,59]. In contrast, a substantial part of academic research on drought stress is performed during early development. Hatzig et al. assessed the performance of rapeseed seedlings to estimate the transgenerational effect of drought stress [60]. Muscolo et al. analyzed the response of lentil seedlings to short-term salt and PEG stress [32]. Bundig et al. screened potato genotypes on young cuttings in tissue culture [61]. Degenkolbe et al. studied the response of rice to drought stress during vegetative growth [41]. Resistance to drought during early development is critical for the establishment of crops and forest trees [42,62,63]. However, early drought tolerance is not necessarily linked to drought tolerance in later growth stages and thus to yield stability in agronomic environments [64]. In our study, however, the cultivar ranking according to tuber starch yield under stress was unaffected by the timing or the duration of the stress, although a much higher yield was achieved after early stress than after late stress.
Stress intensity is another difference between field and container trials. The much lower stress index (SI) of field experiments compared to bigbag and pot experiments may have affected the ranking. The role of stress intensity for the tolerance ranking of the cultivars in population 1 can be inferred from the correlation analysis between the pot trials. The drought tolerance indices in experiment PCH11, which had a low stress index of 0.42, were significantly correlated (Prob < 0.05) with the drought tolerance indices in the other three experiments conducted in controlled environments, which had high stress indices of >0.65. Thus, stress intensity does not seem to affect the genotype ranking in potato. The high correlation between the experiments furthermore suggests that the experimental design had sufficient power to estimate the tolerance indices. However, these experiments required substantial resources (100 m² of climate-controlled glasshouse for three months for one experiment), and their power to predict yield under field conditions was limited. Controlled environments differ from field environments with respect to both the root environment (see below) and the micrometeorological conditions. Poorter et al. reviewed the multitude of differences between controlled environments and field conditions [65]. The consequences of these differences have been shown, especially for Arabidopsis thaliana [66]. Mishra et al. found that phenotypes of Arabidopsis thaliana found under controlled conditions do not correlate with those found in the field [67]. Annunziata et al. showed that diel metabolism and expression of clock genes differ between controlled conditions with constant temperatures and highly variable natural conditions [68].
In our setup, light intensity in the controlled environment experiments was variable but substantially lower than in the field, as the greenhouse roof structure absorbs and scatters light and some of the experiments were conducted off-season. The internal intensity of natural light in the glasshouse is about 50% of the external intensity. Temperature variability was much lower under controlled conditions, with a difference of less than 4 K between day and night temperatures. In contrast, median temperatures on subsequent days could differ by up to 10 K under field conditions (see Supplemental File S1, Supplemental Figure S3). High maximum temperatures in the field were also linked to high vapor pressure deficits (VPD), which reached more than 2 kPa during field experiments on the site Golm [21,47]. In contrast, the set values of 21 °C/60% RH in the glasshouse amount to a VPD of 0.43. Together with the lower radiation intensity and the lack of wind, this causes a substantially lower transpiration in climate-controlled glasshouses compared to the open field. As this difference also affects the morphology of leaves and plants, high-end climate simulation facilities have been built to provide the option to simulate realistic atmospheric conditions, including wind [69].
In contrast to controlled environments, atmospheric conditions in the screenhouses and the rainout shelters are much more field-like. Light and temperatures are as variable as in the field, while soil volume and composition as well as water supply can be controlled.
Temperatures in polytunnel screenhouses and, to a lesser extent, rainout shelters tend to be higher than in the open field, and consequently VPDs are higher in polytunnels, too. In contrast, light intensities are lower in polytunnel screenhouses than in the field. Manufacturers state transmission rates of up to 90% [70]; however, dust deposition can decrease transmission rates to 67% (own measurements). The effect of the light intensity reduction on yield will be highest when light intensity is low during tuber filling after canopy closure, when the leaf area index has reached its maximum. Additionally, wind enhances transpiration rates more in the field than in polytunnels.
The major advantage of polytunnel screenhouses is the low per-area cost for construction and maintenance. This makes areas of several 100 m² affordable for large experiments with many genotypes or larger containers.

The Hidden Half: Planting Material, Pot Size, and Soil
The second important aspect of this study concerns the compartment in which the root, the hidden half [71] of the plant, is cultivated. The closer attention to the root compartment was triggered by the insightful review of Passioura [72] and by the puzzling observation that cultivar E, which was among the highest yielding in the field, showed low tuber yields in pots under controlled conditions.
Our first hypothesis was that 4-liter pots prevented the optimal development of cultivar E, which produces large shoots [21]. Therefore, we started cultivating plants in substantially bigger containers, namely bigbags with a volume of 30 L and a height of 30 cm. However, E again showed lower yield under optimal water supply than the three other check cultivars A, D, and R. Solanum tuberosum ssp. tuberosum cultivars root to a depth of 100 cm but develop the highest root density in the upper 30 cm, especially when irrigated [54,73,74]. Potato cultivars differ in root development with respect to both the root density in the plow layer (upper 30 cm) and their root length. Late-maturing genotypes tend to produce deeper roots [73][74][75]. The stronger root development in late-maturing cultivars is related to the longer root growth period, which terminates with the end of leaf production. Cultivar E has the latest development compared to A, D, and R, based on the BBCH assessments under field conditions and the tuber maturity score [21]. Thus, a rooting horizon of 30 cm may have been insufficient for cultivar E to reach its full potential for water uptake and yield. In spite of the higher root density in the upper 30 cm, deeper roots can contribute substantially to water uptake [76]. The interaction between rooting depth and water supply on yield gets even more complicated in irrigated crops, which take up most of the water from the upper layers. In our system, irrigation was performed with large volumes equivalent to 10 L per m², thus rewetting more than the superficial layers of the soil. Thus, cultivar E could have benefited from its ability to take up water and nutrients from greater soil depth in the field while being cultivated under suboptimal conditions in bigbags on peat.
The second factor that differed between field and container experiments was the starting material. In most pot experiments, the plants were started from in vitro-propagated cuttings and thus relied on successful photosynthesis from the very beginning. In field trials, as in production systems, potato crops are always started from seed tubers. Seed tubers provide a substantial amount of resources in the form of storage starch for the establishment of the plant. Thus, the size of the seed tuber has an impact on the performance of the crop [50,77]. Therefore, tuber size was carefully controlled in our experiments to make sure that all cultivars and treatments were started with the same seed tuber weights. The comparison of the bigbag experiments from 2017 to 2020, which were started from tubers, with the bigbag experiments of 2015 and 2016, which were started from cuttings, revealed substantial differences in shoot development. Plants grown from cuttings lodged two to three weeks earlier than those grown from tubers. Nevertheless, the effect of the interaction between experiment type and genotype on tuber yield remained significant in the experiments performed in 2017 and 2018 (Table 4). Thus, the planting material did not explain the difference in cultivar ranking between field and container experiments.
Finally, the substrate turned out to be the essential factor for the reproduction of field results in container trials. When potato plants were grown on sandy substrate instead of horticultural substrate, the ranking of the genotypes with respect to yield was the same as in the field. The effect of substrate on tuber and starch yield was significant in both trial years; however, the direction of the effect was opposite. This suggests an interaction between substrate type and the water demand caused by the atmospheric conditions throughout the experiment. In 2019, water demand was almost twice as high as in 2020, mainly due to a period of high temperatures and thus high VPD between 40 and 60 days from planting in 2019 (see Supplemental File S1, Supplemental Figures S2 and S3). The substrate effect was not due to soil temperature differences, as soil temperatures were surprisingly similar. The substrate × genotype interaction suggests that differences in water uptake, presumably due to different root development, may underlie the substrate effect on cultivar ranking. The later-maturing cultivar E may have been better able to respond to high water use by increased root growth than the other cultivars, which mature earlier and thus terminate root growth earlier [75].
In addition to the effect on yield, substrate affected tuber quality. The typical tuber size distribution of the cultivars (see Figure 7) and the cultivar-specific tuber forms that are typically found in the field were reproduced after cultivation in bigbags with sandy substrate. On peat, tubers tended to be less oblong and the percentage of large tubers was decreased. Tuber size distribution and form are highly relevant for the market value of potato crops for the fresh market or for chip production [78]. Thus, a realistic assessment of tuber quality in container trials requires the use of soil-based substrates.
Interestingly, the 3-way interaction of genotype × substrate × treatment was nonsignificant, indicating that in spite of the striking effect of the substrate on absolute yield, its effect on drought tolerance index DRYMp, which is based on relative yield, was low. Thus, it is possible to use horticultural substrate when assessing drought tolerance based on tolerance scores DRYM, DRYMp, or the stress tolerance index STI, which are based on relative yield and are weakly affected by the yield potential [14].
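Why indices built on relative yield tolerate a shift in absolute yield level can be shown with a short calculation. The sketch below rests on assumptions that are not taken from this study: it uses the commonly cited form of the stress tolerance index, STI = (Yp × Ys) / mean(Yp)², with Yp the yield under optimal and Ys the yield under stress conditions; the DRYM and DRYMp definitions are not reproduced here; the yield values are invented; and the substrate effect is modelled as a purely multiplicative factor.

```python
def relative_yield(yield_stress, yield_control):
    """Yield under stress as a fraction of the yield under optimal supply."""
    return yield_stress / yield_control


def stress_tolerance_index(yield_control, yield_stress, mean_control):
    """STI in its commonly cited form: Yp * Ys / (mean Yp)^2."""
    return (yield_control * yield_stress) / (mean_control ** 2)


def tolerance_indices(control, stress):
    """Return {genotype: (relative yield, STI)} for one experiment."""
    mean_control = sum(control.values()) / len(control)
    return {
        g: (relative_yield(stress[g], control[g]),
            stress_tolerance_index(control[g], stress[g], mean_control))
        for g in control
    }


# Invented yields (arbitrary units) on a soil-based substrate.
control_soil = {"A": 80.0, "D": 60.0, "E": 100.0, "R": 90.0}
stress_soil = {"A": 60.0, "D": 30.0, "E": 50.0, "R": 45.0}

# Assumption: a second substrate halves every yield by the same factor.
SCALE = 0.5
control_other = {g: y * SCALE for g, y in control_soil.items()}
stress_other = {g: y * SCALE for g, y in stress_soil.items()}

indices_soil = tolerance_indices(control_soil, stress_soil)
indices_other = tolerance_indices(control_other, stress_other)
```

Both indices come out identical for the two substrates, because the common factor cancels. A real substrate effect need not be purely multiplicative, so the sketch only illustrates why a non-significant genotype × substrate × treatment interaction is compatible with large differences in absolute yield.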
The use of soil instead of horticultural substrate poses some challenges. Horticultural substrates are quality-controlled by manufacturers to provide homogeneous quality within and between batches and an optimal pore size distribution. In contrast, homogeneous and quality-controlled soil supplies are more difficult to find. We tested quartz sand as an alternative to sandy soil in controlled-environment experiments and found it impossible to establish potato cultures from cuttings, even when the cuttings were pre-cultivated in peat pots. One of the problems with soil use is the loss of soil structure during pot filling. When dry soil is filled into pots, the porous structure that has been established in the soil during numerous freeze-thaw and drying-rewetting cycles is lost, and the soil is in a single-grain structure with a very adverse pore distribution [72]. We found that the structure can be improved when the soil-filled containers are kept in the polytunnel over winter with occasional rewetting. However, our sandy soil originated from an A-horizon and thus contained more organic matter than pure quartz sand. Additional problems with the use of soil instead of horticultural substrate arise from the risk of waterlogging when soil is used in small pots [72]. Furthermore, the higher specific weight of mineral soil (typically 1.4 kg/L) compared to horticultural substrate (typically 0.7 kg/L) makes the logistics of substrate transport, pot filling, and pot movement challenging. Nevertheless, natural soils in deep growth columns have been successfully used in drought tolerance trials on potato, maize, and pearl millet [49,74,79].
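The logistic consequence of the higher specific weight can be estimated directly from the figures given above. The sketch below is a rough calculation under stated assumptions: it uses the typical specific weights quoted in the text and the 30 L bigbag volume, ignores water content, and the function name and the example of 100 containers are hypothetical.

```python
def substrate_mass_kg(volume_l, specific_weight_kg_per_l):
    """Mass of substrate needed to fill a container of the given volume."""
    return volume_l * specific_weight_kg_per_l


BIGBAG_VOLUME_L = 30.0          # bigbag volume stated in the text
SOIL_KG_PER_L = 1.4             # typical value for mineral soil (from the text)
HORTICULTURAL_KG_PER_L = 0.7    # typical value for horticultural substrate

mass_soil = substrate_mass_kg(BIGBAG_VOLUME_L, SOIL_KG_PER_L)
mass_horticultural = substrate_mass_kg(BIGBAG_VOLUME_L, HORTICULTURAL_KG_PER_L)

# Hypothetical experiment size, to show how the difference scales.
N_CONTAINERS = 100
extra_mass_total = N_CONTAINERS * (mass_soil - mass_horticultural)
```

With these typical values, a soil-filled bigbag weighs about 42 kg compared to about 21 kg for horticultural substrate, so the amount of material to be moved doubles for any given number of containers.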
Thus, we conclude that the use of soil-based substrates instead of peat-based horticultural substrate, while less critical for the assessment of the relative yield/yield stability under drought conditions, leads to a better estimate of yield potential, yield under stress, and tuber quality in potato.

Conclusions
The analysis of nine years of drought tolerance experiments on potato indicated that a minimum of three trials is required to obtain a stable estimate of drought tolerance in potato genotypes. We tested the effect of climate conditions, container size, starting material, and substrate on the correlation between the performances of potato genotypes in container experiments compared to field experiments. Realistic estimates on relative yields under drought can be obtained in container systems on horticultural substrate. However, genotype assessment with respect to yield potential, absolute yield under stress, and tuber size distribution requires the use of soil-based substrates.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/agronomy11050865/s1, in Supplemental File S1, Figure S1: Potato genotypes in test populations 1 to 3. Figure S2: Water volumes in experiments on population 3. Figure S3: Air temperature in experiments on population 3. Figure S4: Soil temperature in experiments on population 3. Figure S5: Mean starch yield of cultivars in population 3 under early, late and long-term drought. Table S1: ANOVA on the effect of planting material on tuber size distribution.